Tamp
Save up to 50% on input tokens.
$ npx @sliday/tamp
Starts the proxy on port 7778. Set ANTHROPIC_BASE_URL=http://localhost:7778 and go.
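The two steps above fit in a couple of shell lines; a minimal setup sketch, assuming the default port and that Claude Code is launched via the `claude` command:

```shell
# Start the proxy (defaults to port 7778)
npx @sliday/tamp &

# Route Claude Code through it
export ANTHROPIC_BASE_URL=http://localhost:7778
claude
```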
Other installation methods
# Shell installer
$ curl -fsSL tamp.dev/setup.sh | bash

# Manual
$ git clone https://github.com/sliday/tamp ~/.tamp
$ cd ~/.tamp && npm install
$ node index.js
Independent A/B Benchmark
30.7% fewer input tokens. Measured.
We sent identical requests through OpenRouter — once raw, once compressed by Tamp — and compared the exact token counts reported by the API. Seven scenarios, five runs per arm, 70 API calls total.
48.7%
on tabular data
TOON columnar encoding
0%
on source code & errors
safe passthrough by design
<1ms
compression overhead
per request
How we tested.
Each of the 7 scenarios was sent to Claude Sonnet 4 via OpenRouter 5 times in each arm — raw (control) and Tamp-compressed (treatment). OpenRouter reports exact input_tokens for both arms, giving a clean A/B comparison with no estimation.
What compresses. Pretty-printed JSON (minify whitespace), homogeneous arrays like file listings and route tables (TOON columnar encoding), and line-numbered Read tool output (strip prefixes + minify). These are the bulk of tool_result traffic in real Claude Code sessions.
What doesn't. Source code, error results, already-minified JSON, and TOON-encoded content all pass through untouched. Tamp classifies content first and only compresses when it's safe and effective.
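The classify-first flow described above can be sketched as a small function. This is illustrative, not Tamp's actual internals — the category names and heuristics are assumptions:

```javascript
// Illustrative sketch of classify-then-compress (not Tamp's real code).
// Decide whether a tool_result payload is safe and worthwhile to compress.
function classify(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return 'passthrough'; // source code, errors, plain text: leave untouched
  }
  // Homogeneous array of flat objects with identical keys → columnar candidate
  if (Array.isArray(parsed) && parsed.length > 1 &&
      parsed.every(row => row && typeof row === 'object' && !Array.isArray(row)) &&
      parsed.every(row => JSON.stringify(Object.keys(row)) ===
                          JSON.stringify(Object.keys(parsed[0])))) {
    return 'tabular';
  }
  // Pretty-printed JSON → minify; already-minified JSON gains nothing
  return text.length > JSON.stringify(parsed).length ? 'pretty-json' : 'passthrough';
}
```

Passthrough is the default: anything that fails to parse, or would not shrink, is left byte-for-byte intact.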
Read the full white paper (PDF) → · Reproduce the benchmark →
Tokens pile up. Fast.
Claude Code sends full history every turn.
Tool results: pretty JSON, raw files, verbose CLI output.
100K+ tokens in minutes — mostly redundant.
More tokens = more cost, more latency, faster context exhaustion.
3-Stage Compression Pipeline
Stage 1
JSON Minify
Strip whitespace. Instant, lossless.
Before
After
Typical saving: ~30-50% on structured data
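Stage 1 is essentially a parse/re-stringify round trip; a minimal sketch (the function name is illustrative):

```javascript
// Minify pretty-printed JSON: parse and re-serialize without whitespace.
// Lossless for any valid JSON; fails closed by returning the input unchanged.
function minifyJson(text) {
  try {
    return JSON.stringify(JSON.parse(text));
  } catch {
    return text; // not JSON: pass through untouched
  }
}

const pretty = JSON.stringify({ name: 'tamp', version: '1.0.0' }, null, 2);
const minified = minifyJson(pretty);
// minified carries identical data in fewer characters
```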
Stage 2
TOON Encoding
Array-of-objects → columnar. Huge wins on tabular data.
JSON (334 chars)
TOON (165 chars)
on a real package.json tool result
Stage 3
LLMLingua-2
ML token pruning. Preserves meaning, drops the rest.
Input: compress.js source code
4,630 chars → 2,214 chars
Input: ls -la command output
1,046 chars → 516 chars
Python sidecar running Microsoft's LLMLingua-2, on CPU
Benchmark Results
A/B tested via OpenRouter with Claude Sonnet 4. 7 scenarios, 5 runs per arm, 70 API calls total.
30.7%
weighted avg token reduction
48.7%
on tabular data (TOON)
impact on model output
| Scenario | Control | Compressed | Reduction |
|---|---|---|---|
| Small JSON (package.json) | 315 | 246 | -21.9% |
| Large JSON (dependency tree) | 6,773 | 4,855 | -28.3% |
| Tabular Data (file listing) | 3,574 | 1,835 | -48.7% |
| Source Code (TypeScript) | 1,069 | 1,069 | 0.0% |
| Multi-turn (5-turn conversation) | 1,026 | 790 | -23.0% |
| Line-Numbered (Read tool output) | 473 | 323 | -31.7% |
| Error Result (is_error: true) | 167 | 167 | 0.0% |
Token counts from OpenRouter. Source code and errors pass through untouched by design. Read the white paper (PDF) → · Reproduce it yourself →
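The Line-Numbered scenario above exercises the prefix-stripping transform mentioned earlier. A minimal sketch — the exact prefix style (`  12\t` or `  12→`) is an assumption about the Read tool's output format:

```javascript
// Strip leading line-number prefixes (e.g. "  12\t" or "  12→") from
// Read tool output. The prefix pattern is an assumption; adjust the
// regex to match your tool's actual format.
function stripLineNumbers(text) {
  return text
    .split('\n')
    .map(line => line.replace(/^\s*\d+[\t→]/, ''))
    .join('\n');
}
```

The original source is recoverable from context, so dropping the prefixes loses nothing the model needs.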
Transparent HTTP Proxy
Intercept
POST /v1/messages only. All other routes pass through untouched.
Compress
tool_result blocks in the last user message are compressed according to their content type.
Forward
Rewrite Content-Length, forward to upstream API. Response streams back untouched.
Safety
Bodies over 256KB bypass compression. Parse errors fall through gracefully.
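Taken together, the intercept and safety rules amount to a small gate. A sketch under stated assumptions — the constant and function names are illustrative, not Tamp's real identifiers:

```javascript
// Illustrative gate combining the intercept and safety rules above:
// only POST /v1/messages is eligible, oversized bodies are bypassed,
// and unparseable bodies fall through to passthrough.
const MAX_BODY_BYTES = 256 * 1024;

function shouldCompress(method, path, body) {
  if (method !== 'POST' || path !== '/v1/messages') return false;
  if (Buffer.byteLength(body, 'utf8') > MAX_BODY_BYTES) return false;
  try {
    JSON.parse(body);
  } catch {
    return false; // parse error: forward the body untouched
  }
  return true;
}
```

Everything that fails the gate is forwarded verbatim, so the worst case is zero savings, never a corrupted request.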