52.6% fewer input tokens.
Token compression proxy for coding agents.
Measured across 120 A/B API calls. Zero code changes.
Works with Claude Code, Aider, Cursor, Cline, 🦞 OpenClaw, and any OpenAI-compatible agent.
$ npx @sliday/tamp
Or install the Claude Code plugin for auto-start: claude plugin marketplace add sliday/claude-plugins && claude plugin install tamp@sliday
Every turn costs more than the last.
Coding agents re-send the full conversation on every API call. Tool results accumulate — file reads, JSON configs, CLI output — all re-sent as input tokens, every single turn.
- 200+ API calls per session (each sends full history)
- 60% of it is tool_result bloat (JSON, files, CLI output)
- $6–15 per coding session (at $3/MTok pricing)
8 stages. Zero config.
Tamp sits between your agent and the API. It classifies each tool result and applies the right compression — automatically.
- minify (lossless): Strip JSON whitespace. package.json shrinks 22%.
- toon (lossless): Columnar encoding for arrays. File listings shrink 49%.
- prune (lossless): Strip lockfile hashes, registry URLs, npm metadata. −81% on lockfiles.
- dedup (lossless): Same file read twice? Send a reference, not the content.
- diff (lossless): Tiny edit? Send a patch, not the full file.
- strip-lines (lossless): Remove line-number prefixes from Read tool output.
- whitespace (lossless): Collapse blank lines, trim trailing spaces.
- llmlingua (neural): LLMLingua-2 token pruning for text. Source code shrinks 40%. Auto-starts a sidecar.
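The lossless stages are simple, reversible transforms. A minimal Python sketch of two of them, minify and dedup (the `[tamp:dedup ...]` reference marker is a hypothetical format for illustration, not Tamp's actual wire format):

```python
import hashlib
import json

def minify_json(text: str) -> str:
    """'minify' stage: re-serialize JSON without whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))

class Deduper:
    """'dedup' stage: replace a repeated tool result with a short reference."""
    def __init__(self):
        self.seen: dict[str, str] = {}

    def compress(self, content: str) -> str:
        digest = hashlib.sha256(content.encode()).hexdigest()[:12]
        if digest in self.seen:
            return f"[tamp:dedup ref={digest}]"  # hypothetical marker format
        self.seen[digest] = content
        return content

pretty = json.dumps({"name": "demo", "dependencies": {"left-pad": "^1.3.0"}}, indent=2)
assert len(minify_json(pretty)) < len(pretty)  # whitespace gone, data intact

d = Deduper()
first = d.compress(pretty)   # full content the first time
second = d.compress(pretty)  # a short reference the second time
assert len(second) < len(first)
```

Both transforms are lossless in the sense that the model still sees every fact it needs; only redundant bytes disappear.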
Opt-in stages (lossy, not enabled by default)
- strip-comments (opt-in): Remove //, /* */, and # comments. −35% on commented code.
- textpress (opt-in): LLM semantic compression via Ollama or OpenRouter. −73% on stacktraces.
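Why strip-comments is opt-in becomes obvious once you sketch it: comment removal is lossy and easy to get wrong. A naive regex version (an assumption for illustration; it breaks on comment markers inside string literals, which is exactly the kind of edge case that keeps this stage off by default):

```python
import re

def strip_comments(code: str) -> str:
    """Naive sketch of a strip-comments pass (not token-aware)."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)  # /* ... */ blocks
    code = re.sub(r"(?m)^\s*//.*$", "", code)          # whole-line // comments
    code = re.sub(r"(?m)^\s*#.*$", "", code)           # whole-line # comments
    return "\n".join(line for line in code.splitlines() if line.strip())

src = "// helper\nfunction add(a, b) {\n  return a + b; /* sum */\n}\n"
out = strip_comments(src)
assert "helper" not in out and "sum" not in out
assert "return a + b;" in out  # the code itself survives
```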
🦞 Works with OpenClaw
Route your AI gateway through Tamp. Every request gets compressed before it hits Anthropic — your agents work the same, your bill doesn't.
Setup in 2 minutes
- Run npm i -g @sliday/tamp && tamp -y on your server
- Add a provider in your OpenClaw config pointing to http://localhost:7778
- Set it as the primary model. Done: all requests now flow through Tamp.
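Step two might look like the sketch below. Every key name here is an illustrative assumption (OpenClaw's actual provider schema may differ; check its docs for the exact fields):

```json
{
  "providers": {
    "tamp": {
      "baseUrl": "http://localhost:7778",
      "apiKey": "YOUR_ANTHROPIC_KEY"
    }
  },
  "primaryProvider": "tamp"
}
```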
| Session type | Savings | Why |
|---|---|---|
| Chat sessions (Telegram, short turns) | 3–5% | mostly text, few tool calls |
| Coding sessions (file reads, JSON) | 30–50% | heavy tool_result compression |
70MB RAM. <5ms latency. No Python needed. If Tamp goes down, requests bypass it automatically.
Measured: 52.6% fewer tokens
A/B tested via OpenRouter with Claude Sonnet 4.6. Twelve scenarios, five runs each, 120 API calls.
Quality verified: 8/8 A/B scenarios — compressed responses identical to uncompressed. Sonnet 4.6, $3/$15 MTok in/out.
"NOT BAD IRL — save 30% of tokens. 7,681 blocks compressed, 3M tokens saved, $9.30 back in my pocket."
Claude Max? Last 2× longer.
Max subscribers have a fixed token budget. Tamp compresses input tokens before they count against your limit — same work, fewer tokens consumed.
- Max 5× ($100/mo): 5× → 10.6×
- Max 20× ($200/mo): 20× → 42.2×
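Those multipliers follow directly from the measured input reduction: removing 52.6% of input tokens leaves 47.4%, so a fixed budget stretches by 1/0.474 ≈ 2.11×. A quick check (the page's 10.6× figure appears to round the multiplier before multiplying):

```python
savings = 0.526              # measured input-token reduction
stretch = 1 / (1 - savings)  # how much further a fixed token budget goes

assert abs(stretch - 2.11) < 0.01
assert abs(5 * stretch - 10.55) < 0.1    # Max 5x behaves like ~10.5-10.6x
assert abs(20 * stretch - 42.2) < 0.1    # Max 20x behaves like ~42.2x
```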
| Model | In/Out $/MTok | Saved/session | Per dev/month | Team/year |
|---|---|---|---|---|
| Sonnet 4.6 | $3/$15 | $0.68 | $75 | $9,000 |
| Opus 4.6 | $15/$75 | $3.39 | $373 | $44,700 |
| Opus 4.6 (extended) | $15/$75 | $3.39 | $373 | $44,700 |
52.6% fewer input tokens + 66.2% fewer output tokens. 10-person team, 5 sessions/day.
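The per-dev and team figures are simple multiplication. A sketch that reproduces the table's Sonnet row, assuming 22 working days per month (an assumption; the page does not state the working-day count):

```python
saved_per_session = 0.68   # Sonnet 4.6, $ saved per session (from the table)
sessions_per_day = 5
working_days = 22          # assumption: not stated on the page
team_size = 10

per_dev_month = saved_per_session * sessions_per_day * working_days
team_year = round(per_dev_month) * team_size * 12

print(round(per_dev_month))  # 75, matching "Per dev/month"
print(team_year)             # 9000, matching "Team/year"
```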
Output tokens too.
Tamp compresses input. But Claude also wastes output tokens on filler — "Sure!", "Great question!", unsolicited advice. The Tamp plugin auto-injects token-efficient rules into your project’s CLAUDE.md to cut that too.
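The injected rules are ordinary CLAUDE.md instructions. A hypothetical example of the kind of rule involved (illustrative only, not the plugin's verbatim content):

```markdown
## Response style
- No preamble ("Sure!", "Great question!") and no closing filler.
- Answer only what was asked; no unsolicited advice.
- Prefer code and bullet points over prose.
```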
66.2% fewer output tokens
A/B tested: 8 scenarios, 5 runs each, 80 API calls via OpenRouter with Sonnet 4.6
Before (508 tokens)
“Great question! I'd be happy to help you with this code review. Looking at your loop, I can see there's an interesting issue here. The problem is that you have an off-by-one error on line 12 where i <= arr.length should be i < arr.length. This is a common mistake that many developers make. I hope this helps! Let me know if you have any other questions.”
After (262 tokens, −48.5%)
“Off-by-one error: i <= arr.length should be i < arr.length. Array index goes out of bounds on last iteration.”
Before (95 tokens)
“You're absolutely right that Python was created in 2005! Actually, I should mention that while your enthusiasm is great, Python was actually created by Guido van Rossum and first released in 1991. It's a common misconception though!”
After (24 tokens, −74.7%)
“Python was created by Guido van Rossum, first released in 1991, not 2005.”
| Scenario | Control | Treatment | Reduction |
|---|---|---|---|
| Code Review | 508 | 262 | 48.5% |
| Concept Explanation | 569 | 327 | 42.6% |
| Factual Correction | 95 | 24 | 74.7% |
| Refactor Suggestion | 139 | 30 | 78.5% |
| Debug Assistance | 483 | 137 | 71.6% |
| API Usage | 609 | 246 | 59.6% |
| Git Command | 334 | 61 | 81.6% |
| Code Generation | 922 | 151 | 83.7% |
Inspired by drona23/claude-token-efficient.
One command. Zero config.
Point your agent at localhost:7778 and go.
$ npx @sliday/tamp
Or install the Claude Code plugin: claude plugin marketplace add sliday/claude-plugins && claude plugin install tamp@sliday