Documentation
TukiCode Docs
The local-first AI terminal assistant for Windows developers. Learn how to install, configure, and master TukiCode.
Getting Started
TukiCode runs on Windows, macOS, and Linux. Install in seconds with a single command.
Installation
Windows — PowerShell 5.1+
macOS & Linux — bash / zsh
Installs tuki to ~/.local/bin and updates your shell profile. On macOS, if the binary is blocked on
first run:
xattr -d com.apple.quarantine ~/.local/bin/tuki
Configuration
After installation, you need to configure which AI model TukiCode will use. Run the following command to start the setup wizard:
This will launch an interactive configuration wizard where you can:
- Select your AI provider: Choose between Ollama, OpenRouter, Anthropic, or Google (Gemini).
- Enter the model name: Type the specific model identifier (e.g., tencent/hy3-preview:free).
- Provide your API key: If you selected a cloud provider, paste your API key securely. TukiCode encrypts and stores this key locally.
- Configure Advanced Settings: (Optional) Adjust the context window size and max tokens limit depending on your chosen model's capabilities.
For OpenRouter, you can use a single API key for all supported models, including free ones.
TukiCode is designed to work with models that support native tool-calling. This is required for proper agent behavior, including file operations, command execution, and multi-step reasoning.
Recommended free models (with tool-calling support):
- tencent/hy3-preview:free (recommended)
- moonshotai/kimi-k2.5
- zhipu/glm-4.6
- deepseek/deepseek-chat-v3.2
We strongly recommend tencent/hy3-preview:free for the best balance between performance, reliability, and tool execution in free-tier environments.
Quick Start
After installation and configuration, launch TukiCode from any terminal:
Working Modes
TukiCode adapts to your workflow through three distinct operating modes. Each one routes your input through a different internal pipeline, giving you full control over autonomy and speed.
Mode Comparison
| Mode | Pipeline | Confirmation | Best For |
|---|---|---|---|
| Chat | AgentLoop directly | Per tool (based on autonomy) | Questions, quick fixes, explanations |
| Plan | Planner → show steps → confirm → Executor | Always asks before executing | Multi-step tasks you want to review first |
| Build | Planner → show steps → Executor | No confirmation — runs immediately | Full project scaffolding, autonomous builds |
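The three pipelines in the table can be sketched as a small dispatcher. This is only an illustration: `run_mode` and the callables passed to it (`agent_loop`, `planner`, `executor`, `confirm`, `show`) are hypothetical stand-ins, not TukiCode's actual API.

```python
from typing import Callable, List


def run_mode(
    mode: str,
    message: str,
    agent_loop: Callable[[str], str],
    planner: Callable[[str], List[str]],
    executor: Callable[[List[str]], str],
    confirm: Callable[[List[str]], bool],
    show: Callable[[List[str]], None] = lambda steps: None,
) -> str:
    """Route a message through the pipeline described for each mode."""
    if mode == "chat":
        # Chat: the message goes straight into the agent loop.
        return agent_loop(message)
    if mode in ("plan", "build"):
        steps = planner(message)
        show(steps)  # both modes display the plan first
        # Plan asks before executing; Build starts immediately.
        if mode == "plan" and not confirm(steps):
            return "plan rejected"
        return executor(steps)
    raise ValueError(f"unknown mode: {mode}")
```

The only difference between the Plan and Build branches is the `confirm` call, which mirrors the Confirmation column above.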
Chat Mode (Default)
The standard conversational interface. Your message goes directly into the ReAct loop: the agent thinks, calls tools, observes results, and repeats until it has a complete answer. No structured plan is created — the model decides each action on the fly based on what it discovers.
Best for: asking questions, getting code explanations, or doing quick isolated tasks.
- Entry point: AgentLoop.run_turn()
- Planning: None (model reasons internally via <thinking> tags)
- Confirmation: Controlled by your /autonomy level
Plan Mode
When you have a complex task, Plan Mode uses a dedicated Planner module to break it into a precise, numbered list of atomic steps before any code is written. You review and approve the plan, and only then does the Executor run each step.
Example: "Refactor the authentication module to use JWT."
- Your message is sent to Planner.generate_plan(), which calls the LLM and receives a structured JSON array of steps.
- The plan is displayed in full in the chat.
- The agent pauses and asks: "Do you want to execute this plan? (y/n)"
- If confirmed, Executor.execute_plan() runs each step sequentially via the AgentLoop, with automatic retries and model fallback on failure.
(Demonstration of Plan Mode creating and executing steps)
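As a rough sketch of the Plan Mode flow, the snippet below parses a JSON array of steps and runs them with retries and model fallback. The step format (plain strings), the `run_step` callable, and the retry count are assumptions made for illustration; the real Planner and Executor may work differently.

```python
import json
from typing import Callable, List


def parse_plan(raw: str) -> List[str]:
    """Parse an LLM reply that is expected to be a JSON array of step strings."""
    steps = json.loads(raw)
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError("plan must be a JSON array of strings")
    return steps


def execute_plan(
    steps: List[str],
    run_step: Callable[[str, str], str],  # hypothetical: (step, model) -> result
    models: List[str],
    max_retries: int = 2,
) -> List[str]:
    """Run steps in order; retry on each model, then fall back to the next one."""
    results = []
    for step in steps:
        last_error = None
        for model in models:
            for _ in range(max_retries):
                try:
                    results.append(run_step(step, model))
                    break  # step succeeded on this model
                except RuntimeError as exc:
                    last_error = exc
            else:
                continue  # retries exhausted, try the next model
            break  # leave the model loop after a success
        else:
            raise RuntimeError(f"step failed on all models: {step}") from last_error
    return results
```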
Build Mode
Build Mode is designed for autonomous, end-to-end project generation. It follows the same Planner pipeline as Plan Mode, but skips the confirmation step — the agent shows you the plan and begins executing immediately.
It also supports plan resumption: if a previous build was interrupted, Build Mode picks up from the last pending step automatically.
- If there are pending steps in planner_state.json (from a previous session), execution resumes from where it left off.
- If there is no existing plan, Planner.generate_plan() is called automatically. The generated plan is shown in the chat before execution starts.
- Executor.execute_plan() runs immediately, and no confirmation is asked.
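Plan resumption could look roughly like the sketch below. The layout of planner_state.json is not documented here, so the schema in the comment (a `steps` list whose items carry a `status` field) and the `load_pending_steps` helper are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import List, Optional


def load_pending_steps(state_path: Path) -> Optional[List[dict]]:
    """Return the pending steps of a saved plan, or None if nothing can be resumed."""
    if not state_path.exists():
        return None
    state = json.loads(state_path.read_text(encoding="utf-8"))
    # Assumed schema: {"steps": [{"description": str, "status": "done" | "pending"}]}
    pending = [s for s in state.get("steps", []) if s.get("status") == "pending"]
    return pending or None
```

When the helper returns None, the caller would generate a fresh plan; otherwise it would hand the pending steps to the executor.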
Build vs Plan: Use Plan when you want to review and approve the strategy before anything runs. Use Build when you trust the agent and want to go as fast as possible — you still see the plan, but execution starts right away.
Core Features
Understand the key capabilities that make TukiCode unique.
Three-Tier Autonomy System
TukiCode gives you total control over the agent's decision-making
process. Adjust the autonomy level with /autonomy:
Low — Maximum Safety
The agent is "on a leash." It pauses and asks for your explicit confirmation for every single action—reading files, searching, or applying fixes.
Medium — Balanced
The agent can explore your files and list directories autonomously to gather context. It stops and waits for approval before writing code, deleting files, or executing shell commands.
High — Unstoppable Speed
Say "Yes" once per turn. After your first approval, TukiCode executes all necessary steps in its reasoning loop autonomously until it reaches the final solution.
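The three levels can be summarized as a single decision function. The tool names and the `needs_confirmation` helper are hypothetical; this is a sketch of the rules described above, not the actual implementation.

```python
# Hypothetical names for the tools that Medium autonomy may run unprompted.
READ_ONLY_TOOLS = {"read_file", "list_dir"}


def needs_confirmation(level: str, tool: str, approved_this_turn: bool = False) -> bool:
    """Decide whether the agent must pause and ask before running a tool."""
    if level == "low":
        return True  # every action is confirmed
    if level == "medium":
        return tool not in READ_ONLY_TOOLS  # writes, deletes, shell need approval
    if level == "high":
        return not approved_this_turn  # one "Yes" covers the rest of the turn
    raise ValueError(f"unknown autonomy level: {level}")
```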
Interactive Project Explorer
TukiCode includes a live directory tree in the left panel of the terminal interface:
- Visual Context: Always see exactly where you are in your project structure.
- Smart Selection: Clicking any file in the tree instantly tells the agent to read and analyze it—the fastest way to feed context without typing paths.
Non-Touch Git Policy
Your project's history belongs to you. TukiCode is hard-coded to
ignore any .git related commands. It will
never initialize, commit, or push changes. This ensures your version control
remains clean and strictly manual.
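One way such a policy could be enforced is a guard that inspects each shell command before it runs. The `is_git_command` helper below is a hypothetical sketch; it only checks a single simple command and would not catch git invoked inside a chained or scripted command.

```python
import shlex


def is_git_command(command: str) -> bool:
    """Return True if the command invokes git or names a .git path."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input: refuse rather than guess
    if not tokens:
        return False
    program = tokens[0].rsplit("/", 1)[-1].lower()
    if program in ("git", "git.exe"):
        return True
    # Block arguments that point into a .git directory, but not .gitignore.
    return any(".git" in token.split("/") for token in tokens[1:])
```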
AI Models & Hardware Requirements
A detailed, honest guide to choosing the right AI models and understanding the hardware required to run TukiCode.
Recommended Models
Ollama (Local) — No limits, no costs, offline
Ollama is the ideal choice for TukiCode because it has no rate limits, works entirely offline, guarantees total privacy, and has zero cost per token.
| Model | Required RAM | Required VRAM | Estimated Speed |
|---|---|---|---|
| qwen2.5-coder:7b | 8GB RAM | 6GB VRAM | ~50-70 tok/s on RTX 3060 |
| qwen2.5-coder:14b | 16GB RAM | 10GB VRAM | ~40-60 tok/s on RTX 3060 |
| qwen2.5-coder:32b | 32GB RAM | 20GB VRAM | ~20-30 tok/s on RTX 3060 |
Installation commands:
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:14b
ollama pull qwen2.5-coder:32b
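To decide which of these models to pull, the RAM and VRAM figures from the table can be encoded directly. The `largest_fitting_model` helper is a hypothetical convenience, and the numbers are copied from the table rather than measured.

```python
from typing import Optional

# (model, required RAM in GB, required VRAM in GB), smallest to largest
MODELS = [
    ("qwen2.5-coder:7b", 8, 6),
    ("qwen2.5-coder:14b", 16, 10),
    ("qwen2.5-coder:32b", 32, 20),
]


def largest_fitting_model(ram_gb: float, vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose requirements the machine meets."""
    fitting = [name for name, ram, vram in MODELS if ram_gb >= ram and vram_gb >= vram]
    return fitting[-1] if fitting else None
```

For example, a machine with 16GB RAM and a 12GB GPU would get qwen2.5-coder:14b, while one with less than 6GB VRAM gets no match.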
OpenRouter (Cloud) — Requires internet & API key
OpenRouter allows you to use cloud models, including free and paid options. The free models have important limitations you should understand before using them.
Recommended free models for TukiCode:
- tencent/hy3-preview:free: Best for agents, supports native tool calling
- qwen/qwen3-coder-480b:free: Most powerful available for free, 262K context
- deepseek/deepseek-r1:free: Excellent reasoning for complex tasks
- meta-llama/llama-3.3-70b-instruct:free: Stable and reliable for simple tasks
Important limitations of OpenRouter free models:
- Rate limits: 20 requests per minute and 200 requests per day max.
- Frequent timeouts: During peak hours (US time), models can take over 60 seconds to respond, causing timeout errors.
- Variable availability: Free models can be saturated or go offline without notice.
- Inconsistent quality: During high demand, the quality of responses may degrade.
- Not recommended for: Long tasks like full project migrations, generating multiple large files, or intensive work sessions.
TukiCode features an automatic fallback system between free models, but this does not completely eliminate interruptions.
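To stay under the free-tier caps listed above, a client can track its own request timestamps. The sketch below is an assumption-laden illustration: `FreeTierBudget` is a hypothetical class, and it treats the daily cap as a rolling 24-hour window, which may not match how OpenRouter actually counts.

```python
from collections import deque
from typing import Deque


class FreeTierBudget:
    """Track request timestamps against per-minute and per-day caps."""

    def __init__(self, per_minute: int = 20, per_day: int = 200) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        self._sent: Deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` (in seconds) if both caps allow it."""
        # Forget requests that are older than 24 hours.
        while self._sent and now - self._sent[0] >= 86_400:
            self._sent.popleft()
        in_last_minute = sum(1 for t in self._sent if now - t < 60)
        if in_last_minute >= self.per_minute or len(self._sent) >= self.per_day:
            return False
        self._sent.append(now)
        return True
```

A caller would pass `time.monotonic()` as `now` and, when `try_acquire` returns False, either wait or switch to another model.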
Recommended paid models via OpenRouter (guaranteed high quality):
- anthropic/claude-sonnet-4-5: $3/1M input tokens, $15/1M output tokens
- anthropic/claude-haiku-3-5: $0.80/1M input tokens, $4/1M output tokens (best value)
- openai/gpt-4o: $2.50/1M input tokens, $10/1M output tokens
For reference: an intensive 30-minute session with TukiCode consumes
approximately 10,000-40,000 tokens depending on the task. With claude-haiku-3-5, that equals less than $0.20 USD.
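The estimate can be checked with simple arithmetic using the claude-haiku-3-5 prices listed above. The `session_cost_usd` helper is a hypothetical name, and the split between input and output tokens in the second example is an assumption.

```python
def session_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in USD for a session, given per-million-token prices."""
    total = input_tokens * input_price_per_million + output_tokens * output_price_per_million
    return total / 1_000_000


# Worst case: all 40,000 tokens billed at the higher output rate ($4 per 1M).
worst_case = session_cost_usd(0, 40_000, 0.80, 4.00)

# Assumed split: 30,000 input tokens and 10,000 output tokens.
mixed_case = session_cost_usd(30_000, 10_000, 0.80, 4.00)
```

Even the worst case comes to about $0.16, which is consistent with the "less than $0.20" figure.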
Hardware Requirements
Using TukiCode with Ollama (local)
Recommended Minimum:
- CPU: 8 modern cores (Intel i7 10th gen+ / AMD Ryzen 5 5000+)
- RAM: 16GB
- GPU: NVIDIA RTX 3060 (12GB VRAM) or equivalent
- Storage: 20GB free space for models
Optimal:
- CPU: Intel i9 / AMD Ryzen 9 or Apple Silicon
- RAM: 32GB+
- GPU: NVIDIA RTX 3080+ / RTX 4070+ or Apple M1/M2/M3
- Storage: NVMe SSD with 50GB free space
Apple Silicon (M1/M2/M3/M4):
Special mention — Apple Silicon's unified memory is ideal for Ollama.
An M1 with 16GB can run qwen2.5-coder:14b at ~30-40 tokens/second.
An M3 with 16GB is even better. It is probably the best value option to
use TukiCode with Ollama.
Verified Compatible GPUs:
- NVIDIA RTX 3060 (12GB) [YES] — fluid with qwen2.5-coder:14b
- NVIDIA RTX 2060 (8GB) [YES] — fluid with qwen2.5-coder:7b
- NVIDIA RTX 4070 (12GB) [YES] — fluid with qwen2.5-coder:32b
- AMD RX 6700 XT (12GB) [YES] — compatible with ROCm
- Intel Arc A770 (16GB) [WARNING] — experimental support
- NVIDIA GTX 1080 (8GB) [WARNING] — works but slow
- GPUs with less than 6GB VRAM [NO] — not recommended for Ollama
What if my PC doesn't have a dedicated GPU?
If your machine does not have a dedicated GPU or has less than 6GB
VRAM, Ollama will run on CPU and will be very slow (2-5
tokens/second). In that case, OpenRouter with a paid model like claude-haiku-3-5 is the most practical option — fast, reliable, and almost zero cost.
Using TukiCode with OpenRouter (cloud)
- Any computer with internet connection
- No GPU required
- Minimum RAM: 4GB (to run TukiCode itself)
- Python 3.10+
Which model should I use?
| Situation | Recommendation |
|---|---|
| I have a GPU with 6GB+ VRAM | Ollama with qwen2.5-coder:7b or 14b |
| I have a Mac with Apple Silicon | Ollama with qwen2.5-coder:14b |
| I don't have a GPU but want free options | OpenRouter free models (with limitations) |
| I want the best possible experience | OpenRouter claude-haiku-3-5 (< $0.20/session) |
| I work in an enterprise / need strict privacy | Local Ollama is mandatory |
| I'm a student with a basic PC | OpenRouter free + patience during peak hours |
Commands in chat
TukiCode uses a slash-command system to manage your AI experience.
/model — Intelligence Switching
The most powerful command in TukiCode. Switch between AI models:
- Without arguments: Opens a semi-transparent modal to select from Ollama (local), Gemini, Claude, or OpenRouter models.
- Smart Detection: Type /model gemini-1.5-pro and the agent automatically configures itself.
- Auto-Config: If a cloud model requires an API key you haven't provided, TukiCode pops up a secure input box to save it permanently.
- History: TukiCode remembers every model you use, keeping your favorites at the top of the list.
/autonomy — Control the Agent
Toggle how much TukiCode asks for permission. Use
/autonomy high for quick iterations or
/autonomy low when you want to review every
single line the agent reads or writes.
/risk — Sensitivity Management
Adjust how the agent classifies the "danger" of its tools:
- Low Risk: Standard operations.
- Medium/High Risk: System-altering commands.
TukiCode uses this setting to determine when to trigger security warnings based on your current autonomy level.
/copy — Export Your Code
Found the perfect solution? Type /copy
to instantly copy the last code block generated by the AI to your clipboard.
Paste directly into VS Code, IntelliJ, or any editor.
/history — Project Continuity
Shows a list of your most recent sessions. Helps you keep track of what you've worked on and maintain context across different parts of your development cycle.
/clear & /exit
- /clear: Wipes the current terminal log for a distraction-free environment.
- /exit: Saves your current settings and safely shuts down the agent.
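A slash-command system like this usually starts with a small parser that splits the command name from its arguments. The sketch below is an assumption about how that might look; `parse_slash_command` is a hypothetical helper, and only the command names come from this page.

```python
from typing import List, Tuple

# Command names documented in this section.
KNOWN_COMMANDS = {"model", "autonomy", "risk", "copy", "history", "clear", "exit"}


def parse_slash_command(line: str) -> Tuple[str, List[str]]:
    """Split '/model gemini-1.5-pro' into ('model', ['gemini-1.5-pro'])."""
    text = line.strip()
    if not text.startswith("/"):
        raise ValueError("not a slash command")
    parts = text[1:].split()
    if not parts or parts[0] not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {text}")
    return parts[0], parts[1:]
```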
Architecture
Understand how TukiCode thinks and processes your requests.
Asynchronous Architecture
TukiCode operates on a native asynchronous engine (httpx and
asyncio). This ensures the terminal remains incredibly
responsive and non-blocking, even while performing heavy background
LLM generation or executing large shell processes concurrently.
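The idea behind a non-blocking engine can be shown with plain asyncio. In this minimal sketch, `fake_llm_call` stands in for an awaited HTTP request and `ui_ticks` stands in for interface updates; both names are hypothetical, and no real network call is made.

```python
import asyncio
from typing import Tuple


async def fake_llm_call(delay: float) -> str:
    """Stand-in for an awaited HTTP request to a model provider."""
    await asyncio.sleep(delay)
    return "llm reply"


async def ui_ticks(count: int, interval: float) -> int:
    """Stand-in for UI refreshes that keep running while the request is pending."""
    ticks = 0
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks += 1
    return ticks


async def demo() -> Tuple[str, int]:
    # Both coroutines share one event loop, so neither blocks the other.
    reply, ticks = await asyncio.gather(fake_llm_call(0.05), ui_ticks(5, 0.01))
    return reply, ticks


def run_demo() -> Tuple[str, int]:
    return asyncio.run(demo())
```

Because every wait is an `await`, the interface coroutine keeps ticking while the simulated request is in flight.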
ReAct Logic: How TukiCode "Thinks"
Unlike simple chatbots, TukiCode follows a ReAct (reason and act) loop:
1. Thinking
The agent analyzes your request and writes down its internal reasoning.
2. Planning
It creates a step-by-step roadmap of which files to read and which tools to use.
3. Execution
It acts upon your files locally, showing real-time progress in the terminal.
4. Final Response
Once the task is done, it provides a concise summary and the final result.
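The four stages above can be condensed into a generic reason-and-act loop. This is a sketch under stated assumptions: `react_loop`, the `think` callable, and the tool dictionary are hypothetical, and the real agent loop is likely more involved.

```python
from typing import Callable, Dict, List, Tuple


def react_loop(
    request: str,
    think: Callable[[List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> str:
    """Alternate between reasoning and tool calls until a final answer appears.

    think(history) returns ("final", answer) or (tool_name, tool_input).
    """
    history = [f"user: {request}"]
    for _ in range(max_steps):
        action, payload = think(history)  # thinking and planning
        if action == "final":
            return payload  # final response
        observation = tools[action](payload)  # execution
        history.append(f"{action}({payload}) -> {observation}")
    raise RuntimeError("no final answer within max_steps")
```

The `max_steps` cap is a design choice in this sketch that keeps a confused model from looping forever.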
MVC Architecture
More details: TukiCode GitHub.