Prerequisite: Keep your client up to date
We continuously improve context management and API caching in every release, so that as models evolve we keep delivering the best cost-efficiency.
Download the latest version: get the latest SoloEnt client from our website.
The core equation
Token usage = input size × number of calls
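The equation can be made concrete with a back-of-envelope sketch; the token counts below are illustrative assumptions, not measurements.

```python
# Back-of-envelope sketch of the core equation:
# token usage ≈ average input size per call × number of calls.
def estimate_tokens(avg_input_tokens: int, num_calls: int) -> int:
    """Total input-token spend across a session."""
    return avg_input_tokens * num_calls

# Loading a whole chapter (~8,000 tokens) for 10 edit requests:
whole_chapter = estimate_tokens(8_000, 10)  # 80,000 tokens
# Selecting only the target paragraph (~400 tokens) instead:
one_paragraph = estimate_tokens(400, 10)    # 4,000 tokens, 20x cheaper
```

Every technique below attacks one of the two factors: shrink the input, or reduce the number of calls.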
High impact — apply every session
1. Tighten the context window
Only show the AI what it actually needs. When you’re writing chapter 47, it doesn’t need chapter 1. When you’re polishing one line of dialogue, it doesn’t need the whole chapter.
What to do:
- Activate only the documents relevant to the current scene. When drafting a chapter, load just the directly relevant settings, the chapter outline, and limited context
- Maintain a `SoloEnt.md` so the AI can absorb context from a single file instead of pulling in many
- Use `@` for precise references, or hold `shift` and drag specific files into the chat; don’t open or read everything by default
- When editing dialogue, select only the target paragraph, not the entire chapter
- Close unused document references after each scene
2. Replace long prose with short directives
The AI doesn’t need your background framing, only what to do and how to do it. SoloEnt already provides the system prompt; you don’t need to repeat the setup in chat.
Token-heavy: “I’m writing a novel, and in this chapter I want the mood to feel tense, so could you maybe help me make this paragraph a bit tenser?”
Concise: “Make this paragraph tenser. Keep the length.”
3. Audit the Rules you’re loading
Rules are the most overlooked silent token sink: they’re force-loaded on every request. Trim them:
- Load chapter-writing Rules only when actually writing chapters
- Delete “You are…” role-play preambles (the AI already knows what it is)
- Use lists instead of paragraphs — same information, half the tokens
- Audit Rules quarterly and remove anything the AI has already internalized
Medium impact — build good daily habits
4. Light tasks deserve light models
Not every task needs the strongest model.
| Task type | Best model (when quality matters) | Light model (when you can trade quality) |
|---|---|---|
| Brainstorming, outline generation, consistency checks | Sonnet | Haiku, GLM |
| Prose writing, dialogue polish, scene expansion | Gemini | Doubao, DeepSeek |
| Complex plot design, deep style imitation, long-form logic threading | Opus | Sonnet, GLM |
| First-draft generation, outline drafting | GLM, DeepSeek | Open-source models |
5. Work in steps; don’t ask for the full output in one shot
Don’t probe by regenerating: asking for a 2,000-word chapter and restarting whenever you don’t like it is the most wasteful pattern there is. Recommended flow (chapter writing example): agree on the outline first, then draft scene by scene, confirming direction before each expansion.
Each step costs only a few tokens, and each one continues only after you’ve confirmed direction; total spend is far below repeated full regenerations.
Use Plan mode: before executing, switch to Plan mode and align on direction, structure, and key details over a few lightweight turns. Then switch back to execute. Plan mode burns very few tokens, and one round of alignment saves enormous spend on repeated regeneration.
6. Open new windows often; don’t keep extending old chats
Every chat window carries its history: the longer you talk, the bigger every subsequent input becomes, because the full history is replayed. A window that’s run for dozens of turns can spend most of its budget on “historical baggage” alone.
Suggestions:
- After finishing one self-contained task, open a new window for the next
- Don’t polish dialogue, discuss outlines, and edit settings in the same window
- If a window has grown long and you need to regenerate, prefer a fresh window with only the necessary context
- Re-activate the right context by referencing `SoloEnt.md` or `@`-mentioning specific files
Good habit: one window, one job
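The “historical baggage” effect can be sketched numerically: because the full history is replayed, cumulative input grows roughly quadratically with turn count. The per-turn token figure below is an illustrative assumption.

```python
# Each turn replays the full history, so the nth turn sends roughly
# n turns' worth of tokens, and cumulative spend is an arithmetic
# series. 500 tokens per turn is an illustrative assumption.
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Sum of replayed history across all turns of one window."""
    return sum(tokens_per_turn * t for t in range(1, turns + 1))

one_long_window = cumulative_input_tokens(500, 40)         # 410,000 tokens
four_short_windows = 4 * cumulative_input_tokens(500, 10)  # 110,000 tokens
```

Under these assumptions, splitting forty turns into four fresh windows costs roughly a quarter of the tokens for the same amount of conversation.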
7. Tell the AI to edit, not rewrite
Without constraints, the AI tends to re-emit the whole passage, so explicitly tell it what to change.
Triggers a full rewrite: “Make this chapter better.”
Edits in place: “In the third paragraph only, tighten the dialogue; change nothing else.”
Advanced — deeper optimization
8. Codify high-frequency flows as Workflows
If every chapter you write begins with the same ritual (review the previous summary, confirm character emotions, read the chapter outline), turn it into a Workflow. The only parameter is the chapter number; everything else is assembled automatically. The prompt tokens per call become a fixed minimum instead of a randomly inflated value, and consistency improves at the same time.
Outcome: consistency + token savings
9. Use a local model as the “draft layer”
Run an open-source model locally with LM Studio to produce the first draft (marginal cost: zero). Then use the cloud model for one final polish pass: small token spend, large quality lift.
Hardware reference:
| RAM | Model size | Suitable for |
|---|---|---|
| 16 GB | 7B parameters | Drafting |
| 32 GB | 13B parameters | More stable quality |
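The draft layer can be sketched as a small script. This assumes LM Studio is running its OpenAI-compatible local server at the default `http://localhost:1234`; the model name, prompt wording, and helper names are placeholders, not part of SoloEnt.

```python
import json
import urllib.request

# Hypothetical "local draft" helper. Assumes LM Studio's OpenAI-compatible
# server is running at its default address; "local-model" stands in for
# whichever model you have loaded.
LOCAL_URL = "http://localhost:1234/v1/chat/completions"

def build_draft_request(outline: str, model: str = "local-model") -> dict:
    """Chat-completion payload for a zero-marginal-cost first draft."""
    return {
        "model": model,
        "messages": [
            {"role": "user",
             "content": f"Write a first draft of this scene:\n{outline}"},
        ],
        "temperature": 0.8,
    }

def request_draft(outline: str) -> str:
    """POST the request to the local server and return the draft text."""
    req = urllib.request.Request(
        LOCAL_URL,
        data=json.dumps(build_draft_request(outline)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Only the polish pass then goes to the cloud model, carrying the local draft instead of the full source context.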
In one sentence
Control the context and state your need precisely — don’t over-engineer the prompt. That’s the core of saving tokens.
Next steps
Choose your plan
Compare plans and pricing
Manage subscription
Balance, invoices, and cancellation