Cost Optimization Actionables

I do not like cost guides that start with twenty knobs and no baseline.

In one real OpenClaw setup, the measured cost over 30 days was about $0.35. That is already cheap. So the goal here is not to panic and shave fractions of a cent off an already tiny bill. The useful question is simpler: which changes still have good ROI if usage grows, and which ones are just tuning for the sake of tuning.

What I Would Measure First

Before touching config, I would check four things in a real session:

  1. /status
  2. /usage full
  3. /usage cost
  4. /context detail

Those are the surfaces OpenClaw already exposes for token usage, context size, and local cost summaries.

I would also make sure model pricing is configured under:

models.providers.<provider>.models[].cost

OpenClaw can only show dollar estimates when that pricing metadata exists. If pricing is missing, you only get token counts. Also note one important detail from the current docs: OAuth sessions hide dollar cost, so API-key traffic is the clearer baseline when you want actual spend visibility.

What the Real Numbers Usually Change

The first useful realization from a cheap setup is that you may not have an absolute cost problem today.

That matters because it changes the priority order:

  1. Keep what is already working.
  2. Fix the recurring overhead that gets paid on every message.
  3. Only then start tuning session lifecycle and model mix.

If your current spend is already low, the job is to preserve that as traffic grows, not to blindly optimize every setting in sight.

Highest-ROI Changes First

1. Keep the tool surface as narrow as you can

OpenClaw injects the tool list and descriptions into the prompt on every run. In practice, that often becomes one of the biggest fixed chunks of recurring context.

That is why the highest-ROI change is often not a model swap. It is reducing the tool surface to what the agent actually needs day to day.

If this is a personal Telegram agent, I would be especially suspicious of broad tool sets that include things like heavy browser flows, background orchestration, TTS, or other tools that are technically available but rarely used.

The rule here is simple: if a tool is not solving a real recurring problem, it should not be in the always-loaded prompt.
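As a sketch of what a narrowed surface can look like, here is a hypothetical per-agent allowlist. The key names (`agents`, `tools`) and the tool names themselves are illustrative only, not OpenClaw's documented schema; check the real tool configuration docs before copying anything:

```json
{
	"agents": {
		"telegram-assistant": {
			"tools": ["read_file", "web_search", "send_message"]
		}
	}
}
```

The point is the shape, not the names: a short explicit list per agent, instead of one global everything-enabled set that gets re-injected on every run.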

2. Do not casually raise max_output_tokens if it is already capped

If output is already capped at a sane level and quality is fine, that cap is doing real work. It is one of the easiest ways to stop a low-cost setup from drifting upward without noticing.

I would treat any increase there as a product decision, not a casual config cleanup.
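To see why, the ceiling math is worth a quick sketch. The $2.00-per-million output rate here matches the example pricing snippet later in this document; your real rate may differ:

```python
# Worst-case output cost of a single reply at a given cap.
# $2.00 per million output tokens is the illustrative rate used
# in this document's pricing snippet; substitute your real rate.
OUTPUT_PRICE_PER_MTOK = 2.00

def worst_case_output_cost(max_output_tokens: int) -> float:
    """Dollar cost if one reply fills the entire output cap."""
    return max_output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000

# Raising the cap from 1024 to 4096 quadruples the per-reply ceiling:
print(worst_case_output_cost(1024))  # 0.002048
print(worst_case_output_cost(4096))  # 0.008192
```

The absolute numbers are tiny, which is exactly the point: the cap is what keeps them tiny as reply volume grows.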

3. Use cache policy by agent role, not one global default

OpenClaw supports per-model and per-agent cache behavior, and that is a better lever than one blanket policy.

The pattern I would use is:

  1. Long research or deep interactive sessions: longer cache retention.
  2. Bursty alerts or short utility flows: little or no cache retention.

That matches how the traffic actually behaves instead of pretending all agents have the same cost profile.
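Expressed as config, the split might look like this. The key names and durations are illustrative, not OpenClaw's documented schema; the real per-agent cache settings may be spelled differently:

```json
{
	"agents": {
		"research": {
			"cacheRetention": "1h"
		},
		"alerts": {
			"cacheRetention": "0"
		}
	}
}
```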

4. Use cache-TTL pruning for long or intermittent sessions

This is one of the cleaner cost controls OpenClaw has.

Session pruning trims old tool results from live context without rewriting the transcript on disk. In OpenClaw, this is the documented way to enable that behavior for providers where it is not already auto-enabled:

{
	"contextPruning": {
		"mode": "cache-ttl",
		"ttl": "5m"
	}
}

Why I like it: it reduces the amount of stale tool output that has to be re-cached after idle gaps, which is exactly the kind of recurring cost that grows quietly.

5. Be selective with heartbeat

Heartbeat is useful, but it is not free.

OpenClaw’s own token-cost docs explicitly frame heartbeat as a way to keep cache warm across idle gaps. That can be worth it if you are protecting a large expensive prompt from repeated cache writes. It is not worth it just because the feature exists.

So my default position is:

  1. If heartbeat is not configured, that is fine.
  2. If it is configured, it should have a specific operational reason.
  3. If the point is cache retention, set it just under the provider cache TTL instead of guessing.
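The third point is plain arithmetic. A sketch, assuming a 5-minute provider cache TTL and a margin to absorb scheduling jitter (both numbers are illustrative, not OpenClaw defaults):

```python
# Pick a heartbeat interval that lands just under the provider's
# cache TTL, so each heartbeat refreshes the cache before it expires
# instead of paying a fresh cache write after it lapses.
CACHE_TTL_SECONDS = 5 * 60      # illustrative provider cache TTL
SAFETY_MARGIN_SECONDS = 30      # absorb scheduling/delivery jitter

heartbeat_interval = CACHE_TTL_SECONDS - SAFETY_MARGIN_SECONDS
print(heartbeat_interval)  # 270 seconds, i.e. 4m30s
```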

6. Trim bootstrap files and skill descriptions before chasing exotic tricks

OpenClaw injects bootstrap files and skill metadata into the prompt. That includes files like AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, and BOOTSTRAP.md when present.

That means prompt bloat is real, and it is persistent. If one of those files is large, repetitive, or full of wording that does not change runtime behavior, it is charging rent on every turn.

Same idea for skill descriptions: keep them short enough to be useful and no longer.

7. Reduce tool-output bloat before it forces compaction sooner than needed

Large tool outputs are one of the fastest ways to make a session more expensive than it looks.

There are a few boring but effective controls here:

  1. Filter or summarize large tool outputs before they become long-lived context.
  2. Use /compact deliberately when a session has clearly moved past an older chunk of work.
  3. Lower agents.defaults.imageMaxDimensionPx carefully in image-heavy workflows.
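The first control above can be sketched as a guard that runs before a tool result enters long-lived context. The threshold and the head/tail split are illustrative choices, not OpenClaw behavior:

```python
def clamp_tool_output(text: str, max_chars: int = 4000) -> str:
    """Keep short outputs intact; keep only head and tail of long ones.

    The middle of a huge tool result (pages of logs, HTML, JSON dumps)
    is usually the least useful part to pay rent on every turn.
    """
    if len(text) <= max_chars:
        return text
    head = text[: max_chars // 2]
    tail = text[-(max_chars // 2):]
    dropped = len(text) - len(head) - len(tail)
    return f"{head}\n... [{dropped} chars truncated] ...\n{tail}"
```

Run once at ingestion time, this bounds what a single tool call can permanently add to context, which delays forced compaction.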

These are not flashy changes, but they work.

Pricing Visibility I Would Set Up Early

If I want dollar visibility instead of just token counts, I would populate model pricing in:

{
	"models": {
		"providers": {
			"openai": {
				"models": [
					{
						"id": "gpt-5-mini",
						"cost": {
							"input": 0.25,
							"cacheRead": 0.025,
							"output": 2.0
						}
					}
				]
			}
		}
	}
}

The important part is not the exact snippet. The important part is using current provider pricing and putting it in the documented cost fields so /status and /usage have something real to work with.
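Once those fields exist, the dollar math is simple. A sketch using the rates from the snippet above, assuming they are priced per million tokens (confirm the unit your config actually expects):

```python
# Estimate dollar cost from token counts and per-million-token rates.
# Rates match the example snippet above; check them against current
# provider pricing before trusting the output.
RATES = {"input": 0.25, "cacheRead": 0.025, "output": 2.0}  # $ per 1M tokens

def estimate_cost(input_tokens: int, cache_read_tokens: int,
                  output_tokens: int) -> float:
    return (
        input_tokens * RATES["input"]
        + cache_read_tokens * RATES["cacheRead"]
        + output_tokens * RATES["output"]
    ) / 1_000_000

# e.g. 10k fresh input, 50k cache reads, 2k output:
print(round(estimate_cost(10_000, 50_000, 2_000), 5))  # 0.00775
```

Note how cheap cache reads are relative to fresh input at these rates; that asymmetry is why keeping the cache warm matters more than almost any other knob.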

If provider pricing changes, update the numbers. Do not treat old screenshots or old blog posts as a source of truth.

Changes I Would Not Rush

There are a few ideas I would leave alone until the simpler wins are done.

  1. Forced periodic resets. An easy way to break continuity for unclear savings.
  2. Fancy memory or vectorization ideas without a measured before-and-after.
  3. Huge-context thinking as a baseline operating mode instead of an exception path.

Those can all be valid in some setups. They are just not where I would start.

My Starting Order

If I had to work through this in order, I would do it like this:

  1. Get real visibility with /status, /usage full, /usage cost, and /context detail.
  2. Tighten the tool surface.
  3. Trim bootstrap and skill overhead.
  4. Enable or tune cache-TTL pruning where it makes sense.
  5. Revisit cache retention and heartbeat only after that.
  6. Move to model segmentation once the prompt overhead is already under control.

Validation Pattern

For every cost change, I would keep the process boring:

  1. Record the baseline.
  2. Change one thing.
  3. Re-measure.
  4. Check one or two quality-critical flows.
  5. Keep the change only if the savings are real and the behavior still holds up.
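The loop above reduces to one boring predicate. Everything here is a sketch: the savings threshold is an arbitrary example, and the quality check stands in for whatever your real flows are:

```python
def keep_change(baseline_cost: float, new_cost: float,
                quality_ok: bool, min_savings: float = 0.05) -> bool:
    """Keep a cost change only if quality holds and savings are real.

    min_savings is a fraction of baseline (5% here, chosen arbitrarily);
    anything below it is likely measurement noise.
    """
    if not quality_ok:
        return False
    savings = (baseline_cost - new_cost) / baseline_cost
    return savings >= min_savings

print(keep_change(1.00, 0.80, quality_ok=True))   # True: 20% savings
print(keep_change(1.00, 0.99, quality_ok=True))   # False: noise-level savings
print(keep_change(1.00, 0.50, quality_ok=False))  # False: quality regressed
```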

That is slower than cargo-cult tuning, but it is how you avoid ending up with a cheap system that is also worse.