Prerequisite: Keep your client up to date
We continuously improve context management and API caching in every release, so that as models evolve we keep delivering the best cost-efficiency.
Download the latest version: get the latest SoloEnt client from our website.
The core equation
Token usage = input size × number of calls
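The equation can be made concrete with a back-of-envelope sketch; the token counts below are illustrative assumptions, not measurements.

```python
# Back-of-envelope sketch of the core equation:
# token usage ≈ average input size per call × number of calls.
def estimate_tokens(avg_input_tokens: int, num_calls: int) -> int:
    """Total input-token spend across a session."""
    return avg_input_tokens * num_calls

# Loading a whole chapter (~8,000 tokens) for 10 edit requests:
whole_chapter = estimate_tokens(8_000, 10)  # 80,000 tokens
# Selecting only the target paragraph (~400 tokens) instead:
one_paragraph = estimate_tokens(400, 10)    # 4,000 tokens, 20x cheaper
```

Every technique below attacks one of the two factors: shrink the input, or reduce the number of calls.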
High impact — apply every session
1. Tighten the context window
Only show the AI what it actually needs. When you’re writing chapter 47, it doesn’t need chapter 1. When you’re polishing one line of dialogue, it doesn’t need the whole chapter.
What to do:
- Activate only the documents relevant to the current scene. When drafting a chapter, load just the directly relevant settings, the chapter outline, and limited context
- Maintain a `SoloEnt.md` so the AI can absorb context from a single file instead of pulling in many
- Use `@` for precise references, or hold `shift` and drag specific files into the chat; don’t open or read everything by default
- When editing dialogue, select only the target paragraph, not the entire chapter
- Close unused document references after each scene
2. Replace long prose with short directives
The AI doesn’t need your background framing, only what to do and how to do it. SoloEnt already provides the system prompt; you don’t need to repeat the setup in chat.
Token-heavy: “I’m writing a novel, and in this chapter I want the mood to feel tense, so could you maybe help me make this paragraph a bit tenser?”
Concise: “Make this paragraph tenser. Keep the length.”
3. Audit the Rules you’re loading
Rules are the most overlooked silent token sink: they’re force-loaded on every request. Trim them:
- Load chapter-writing Rules only when actually writing chapters
- Delete “You are…” role-play preambles (the AI already knows what it is)
- Use lists instead of paragraphs — same information, half the tokens
- Audit Rules quarterly and remove anything the AI has already internalized
Medium impact — build good daily habits
4. Light tasks deserve light models
Not every task needs the strongest model.
| Task type | Best model (when quality matters) | Light model (when you can trade quality) |
|---|---|---|
| Brainstorming, outline generation, consistency checks | Sonnet | Haiku, GLM |
| Prose writing, dialogue polish, scene expansion | Gemini | Doubao, DeepSeek |
| Complex plot design, deep style imitation, long-form logic threading | Opus | Sonnet, GLM |
| First-draft generation, outline drafting | GLM, DeepSeek | Open-source models |
5. Work in steps; don’t ask for the full output in one shot
Don’t probe by regenerating: asking for a 2,000-word chapter and restarting whenever you don’t like it is the most wasteful pattern there is. Recommended flow (chapter writing example): agree on the outline first, then draft scene by scene, confirming direction before each expansion.
Each step costs only a few tokens, and each one continues only after you’ve confirmed direction; total spend is far below repeated full regenerations.
Use Plan mode: before executing, switch to Plan mode and align on direction, structure, and key details over a few lightweight turns. Then switch back to execute. Plan mode burns very few tokens, and one round of alignment saves enormous spend on repeated regeneration.
6. Open new windows often; don’t keep extending old chats
Every chat window carries its history: the longer you talk, the bigger every subsequent input becomes, because the full history is replayed. A window that’s run for dozens of turns can spend most of its budget on “historical baggage” alone.
Suggestions:
- After finishing one self-contained task, open a new window for the next
- Don’t polish dialogue, discuss outlines, and edit settings in the same window
- If a window has grown long and you need to regenerate, prefer a fresh window with only the necessary context
- Re-activate the right context by referencing `SoloEnt.md` or `@`-mentioning specific files
Good habit: one window, one job
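The “historical baggage” effect can be sketched numerically: because the full history is replayed, cumulative input grows roughly quadratically with turn count. The per-turn token figure below is an illustrative assumption.

```python
# Each turn replays the full history, so the nth turn sends roughly
# n turns' worth of tokens, and cumulative spend is an arithmetic
# series. 500 tokens per turn is an illustrative assumption.
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Sum of replayed history across all turns of one window."""
    return sum(tokens_per_turn * t for t in range(1, turns + 1))

one_long_window = cumulative_input_tokens(500, 40)         # 410,000 tokens
four_short_windows = 4 * cumulative_input_tokens(500, 10)  # 110,000 tokens
```

Under these assumptions, splitting forty turns into four fresh windows costs roughly a quarter of the tokens for the same amount of conversation.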
7. Tell the AI to edit, not rewrite
Without constraints, the AI tends to re-emit the whole passage, so explicitly tell it what to change.
Triggers a full rewrite: “Make this chapter better.”
Edits in place: “In the third paragraph only, tighten the dialogue; change nothing else.”
Advanced — deeper optimization
8. Codify high-frequency flows as Workflows
If every chapter you write begins with the same ritual (review the previous summary, confirm character emotions, read the chapter outline), turn it into a Workflow. The only parameter is the chapter number; everything else is assembled automatically. The prompt tokens per call become a fixed minimum instead of a randomly inflated value, and consistency improves at the same time.
Outcome: consistency + token savings
9. Use a local model as the “draft layer”
Run an open-source model locally with LM Studio to produce the first draft (marginal cost: zero). Then use the cloud model for one final polish pass: small token spend, large quality lift.
Hardware reference:
| RAM | Model size | Suitable for |
|---|---|---|
| 16 GB | 7B parameters | Drafting |
| 32 GB | 13B parameters | More stable quality |
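The draft layer can be sketched as a small script. This assumes LM Studio is running its OpenAI-compatible local server at the default `http://localhost:1234`; the model name, prompt wording, and helper names are placeholders, not part of SoloEnt.

```python
import json
import urllib.request

# Hypothetical "local draft" helper. Assumes LM Studio's OpenAI-compatible
# server is running at its default address; "local-model" stands in for
# whichever model you have loaded.
LOCAL_URL = "http://localhost:1234/v1/chat/completions"

def build_draft_request(outline: str, model: str = "local-model") -> dict:
    """Chat-completion payload for a zero-marginal-cost first draft."""
    return {
        "model": model,
        "messages": [
            {"role": "user",
             "content": f"Write a first draft of this scene:\n{outline}"},
        ],
        "temperature": 0.8,
    }

def request_draft(outline: str) -> str:
    """POST the request to the local server and return the draft text."""
    req = urllib.request.Request(
        LOCAL_URL,
        data=json.dumps(build_draft_request(outline)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Only the polish pass then goes to the cloud model, carrying the local draft instead of the full source context.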
In one sentence
Control the context and state your need precisely — don’t over-engineer the prompt. That’s the core of saving tokens.
Next steps
Choose your plan
Compare plans and pricing
Manage subscription
Balance, invoices, and cancellation