# Save on Tokens

> How to use tokens efficiently and lower your monthly SoloEnt bill

## Prerequisite: Keep your client up to date

Every release improves context management and API caching, so staying on the latest version keeps your costs as low as possible as models evolve.

<Card title="Download the latest version" icon="download" href="https://soloent.ai">
  Get the latest SoloEnt client from our website
</Card>

## The core equation

<Note>
  **Token usage = input size × number of calls**
</Note>

Once you internalize this, the playbook becomes simple: shrink each input and cut wasted calls.
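To make the equation concrete, here is a minimal Python sketch. The numbers are illustrative assumptions rather than SoloEnt measurements, and `session_tokens` is a hypothetical helper name.

```python
def session_tokens(input_tokens_per_call: int, calls: int) -> int:
    """Total input tokens for a session: input size x number of calls."""
    return input_tokens_per_call * calls


# Illustrative figures only (assumed, not measured).
baseline = session_tokens(input_tokens_per_call=8_000, calls=30)
trimmed = session_tokens(input_tokens_per_call=4_000, calls=20)

# Fraction of tokens saved by shrinking both factors.
saving = 1 - trimmed / baseline
print(f"baseline={baseline}, trimmed={trimmed}, saving={saving:.0%}")
```

Because the two factors multiply, halving the input and cutting a third of the calls saves more than either change would alone.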

## High impact — apply every session

### 1. Tighten the context window

Only show the AI what it actually needs. When you're writing chapter 47, it doesn't need chapter 1. When you're polishing one line of dialogue, it doesn't need the whole chapter.

**What to do**:

* Activate only the documents relevant to the current scene. When drafting a chapter, load only the directly relevant settings, the chapter outline, and limited context
* Maintain a [`SoloEnt.md`](../tips/SoloEnt) so the AI can absorb context from a single file instead of pulling in many
* Use `@` for precise references, or hold `Shift` and drag specific files into the chat. Don't open or read everything by default
* When editing dialogue, select only the target paragraph, not the entire chapter
* Close unused document references after each scene

<Tip>
  Estimated savings: **40–60%**
</Tip>

### 2. Replace long prose with short directives

The AI doesn't need your background framing — only what to do and how to do it. SoloEnt already provides the system prompt; you don't need to repeat the setup in chat.

**Token-heavy**:

```text
You are a professional novel-writing assistant. Please rewrite this dialogue
to feel more tense, so the reader senses the strain between the two characters,
while keeping each character's voice consistent…
```

**Token-light**:

```text
Rewrite dialogue: increase tension, preserve voice
```

Save your recurring directives as a [Skill](../tips/skills), so one click replaces retyping the description each time.

<Tip>
  Estimated savings: **20–35%**
</Tip>

### 3. Audit the Rules you're loading

[Rules](../tips/rules) are the most overlooked silent token sink — they're force-loaded on every request.

**Trim them**:

* Load chapter-writing Rules only when actually writing chapters
* Delete "You are…" role-play preambles (the AI already knows what it is)
* Use lists instead of paragraphs: the same information, often in about half the tokens
* Audit Rules quarterly and remove anything the AI already follows without being told

<Tip>
  Estimated savings: **15–30%**
</Tip>

## Medium impact — build good daily habits

### 4. Light tasks deserve light models

Not every task needs the strongest model.

| Task type                                                            | Best model (when quality matters) | Light model (when you can trade quality) |
| -------------------------------------------------------------------- | --------------------------------- | ---------------------------------------- |
| Brainstorming, outline generation, consistency checks                | Sonnet                            | Haiku, GLM                               |
| Prose writing, dialogue polish, scene expansion                      | Gemini                            | Doubao, DeepSeek                         |
| Complex plot design, deep style imitation, long-form logic threading | Opus                              | Sonnet, GLM                              |
| First-draft generation, outline drafting                             | GLM, DeepSeek                     | Open-source models                       |

<Tip>
  Estimated savings: **50–70%** on light-task workloads
</Tip>

### 5. Work in steps; don't ask for the full output in one shot

Don't explore by regenerating. Asking for a 2,000-word chapter and restarting whenever you don't like the result is **the most wasteful pattern there is**.

**Recommended flow** (chapter writing example):

<Steps>
  <Step title="Outline first">
    Have the AI produce the chapter structure and beats
  </Step>

  <Step title="Then expand">
    Once the outline is right, draft the prose
  </Step>

  <Step title="Tone and style pass">
    Polish locally at the end
  </Step>
</Steps>

Each step costs few tokens, and you only move on once you've confirmed the direction, so total spend is far below that of repeated full regenerations.
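As a rough illustration of why the stepwise flow is cheaper, the sketch below compares the two patterns. All token counts are assumed for the example (including the guess that a 2,000-word chapter is about 2,700 tokens), the function names are hypothetical, and input and output tokens are simply added together.

```python
def regenerate_cost(context: int, chapter: int, attempts: int) -> int:
    """Every attempt resends the context and re-emits the whole chapter."""
    return attempts * (context + chapter)


def stepwise_cost(context: int, outline: int, chapter: int, polish: int) -> int:
    """Outline first, expand once, then polish one small passage."""
    outline_step = context + outline            # context in, outline out
    expand_step = context + outline + chapter   # context + outline in, chapter out
    polish_step = context + 2 * polish          # passage in, revised passage out
    return outline_step + expand_step + polish_step


# Assumed sizes in tokens; adjust to your own project.
full_retries = regenerate_cost(context=3_000, chapter=2_700, attempts=4)
stepwise = stepwise_cost(context=3_000, outline=300, chapter=2_700, polish=200)
print(f"four full regenerations: {full_retries}, stepwise: {stepwise}")
```

The exact gap depends on how many retries you would otherwise need; the point is that the expensive chapter-length output is produced once instead of several times.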

**Use [Plan mode](../tips/plan-mode)**: before executing, switch to Plan mode and align on direction, structure, and key details over a few lightweight turns. Then switch back to execute. Plan mode burns very few tokens, and one round of alignment saves enormous spend on repeated regeneration.

```text
[Plan mode]
This chapter has A and B reconciling, but I want to plant a seed for C.
What structures could work?
→ Align on direction and beats

[Execute mode]
Write the prose using structure 2
```

<Tip>
  Estimated savings: **30–50%** on iterative work
</Tip>

### 6. Open new windows often; don't keep extending old chats

Every chat window carries its history — the longer you talk, the bigger every subsequent input becomes, because the full history is replayed. A window that's run for dozens of turns can spend most of its budget on "historical baggage" alone.
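The growth described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: every turn resends the full history, message and reply sizes are constant, and prompt caching is ignored, so real bills will differ. The function name and all figures are hypothetical.

```python
def replayed_input_tokens(
    turns: int, message: int, reply: int, base_context: int
) -> int:
    """Sum the input tokens of a chat in which each turn replays all history."""
    total = 0
    history = 0
    for _ in range(turns):
        # Input for this turn: fixed context + everything said so far + new message.
        total += base_context + history + message
        # The new message and its reply join the history for the next turn.
        history += message + reply
    return total


# Assumed sizes in tokens.
one_long_window = replayed_input_tokens(turns=40, message=100, reply=400, base_context=2_000)
four_short_windows = 4 * replayed_input_tokens(turns=10, message=100, reply=400, base_context=2_000)
print(f"one 40-turn window: {one_long_window}, four 10-turn windows: {four_short_windows}")
```

In this model the history term grows with the square of the turn count, which is why splitting the same work across fresh windows lowers the total even though the base context is paid again in each one.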

**Suggestions**:

* After finishing one self-contained task, open a new window for the next
* Don't polish dialogue, discuss outlines, and edit settings in the same window
* If a window has grown long and you need to regenerate, prefer a fresh window with only the necessary context
* Re-activate the right context by referencing [`SoloEnt.md`](../tips/SoloEnt) or `@` specific files

<Note>
  Good habit: **one window, one job**
</Note>

<Tip>
  Estimated savings: **10–30%** over time
</Tip>

### 7. Tell the AI to edit, not rewrite

Without constraints, the AI tends to re-emit the whole passage. So **explicitly tell it what to change**.

**Triggers a full rewrite**:

```text
Improve this passage
```

**Edits only**:

```text
Only change paragraph 3 — slow the pacing of the sentences. Output only the
revised paragraph; nothing else.
```

Add "no explanation" / "no summary" — preambles and post-ambles cost tokens too.

<Tip>
  Estimated savings: **20–40%** on polish work
</Tip>

## Advanced — deeper optimization

### 8. Codify high-frequency flows as Workflows

If every chapter you write begins with the same ritual — review the previous summary, confirm character emotions, read the chapter outline — turn it into a [Workflow](../tips/workflows). The only parameter is the chapter number; everything else is assembled automatically.

Prompt tokens per call settle at a fixed minimum instead of an unpredictably inflated amount, and consistency improves at the same time.
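The idea of "one parameter, everything else assembled" can be illustrated with a small Python template. This is not SoloEnt's Workflow format, which this page does not show; the file paths, the `CHAPTER_RITUAL` template, and `build_prompt` are all hypothetical.

```python
from string import Template

# Hypothetical file layout; replace with the paths used in your own project.
CHAPTER_RITUAL = Template(
    "Review summary: @summaries/ch$prev.md\n"
    "Confirm character emotions: @characters/emotions.md\n"
    "Read outline: @outlines/ch$num.md\n"
    "Then draft chapter $num."
)


def build_prompt(chapter: int) -> str:
    """Fill the fixed ritual with the only thing that varies: the chapter number."""
    return CHAPTER_RITUAL.substitute(prev=chapter - 1, num=chapter)


print(build_prompt(47))
```

Since the wording never changes between chapters, the prompt length stays constant and only the referenced files differ.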

<Note>
  Outcome: **consistency + token savings**
</Note>

### 9. Use a local model as the "draft layer"

Run an open-source model locally with [LM Studio](../resources/local-llms) to produce the first draft (marginal cost: zero). Then use the cloud model for one final polish pass — small token spend, large quality lift.

**Hardware reference**:

| RAM   | Model size     | Suitable for        |
| ----- | -------------- | ------------------- |
| 16 GB | 7B parameters  | Drafting            |
| 32 GB | 13B parameters | More stable quality |

For prolific writers this can cut cloud spend by **60% or more**.

## In one sentence

<Note>
  Control the context and state your need precisely — don't over-engineer the prompt. That's the core of saving tokens.
</Note>

Short Rules, precise references, the right model for the task — do all three and your monthly token bill can drop by more than half, with no loss in writing quality.

## Next steps

<CardGroup cols={2}>
  <Card title="Choose your plan" icon="credit-card" href="./choose-your-plan">
    Compare plans and pricing
  </Card>

  <Card title="Manage subscription" icon="gear" href="./manage-subscription">
    Balance, invoices, and cancellation
  </Card>
</CardGroup>
