Tamp

Token compression proxy for coding agents.

50.5% fewer input tokens. Zero code changes.

$ npx @sliday/tamp

Works with Claude Code, Aider, Cursor, Cline, and any OpenAI-compatible agent.

Other installation methods
# Shell installer
$ curl -fsSL tamp.dev/setup.sh | bash

# Manual
$ git clone https://github.com/sliday/tamp ~/.tamp
$ cd ~/.tamp && npm install
$ node index.js

Tokens pile up. Fast.

Turn 1: ~5K input tokens → Turn 10: ~25K → Turn 20: ~80K → Turn 30: 100K+

200+ API calls per session, each sending the full history

60% of that input is tool_result bloat: pretty-printed JSON, raw files, CLI output

$6–15 per coding session at $3/Mtok input pricing

Every turn re-sends everything. Tamp compresses before it leaves your machine.

5-Stage Compression Pipeline

Claude Code / Aider / Cursor / Cline → tamp:7778 → Anthropic / OpenAI / Gemini

1. JSON Minify — strip whitespace, lossless

Before

{ "name": "tamp", "version": "0.1.0", "type": "module", "dependencies": { "@toon-format/toon": "^2.1.0" } }

After

{"name":"tamp","version":"0.1.0","type":"module","dependencies":{"@toon-format/toon":"^2.1.0"}}
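Stage 1 is a parse-and-reserialize round trip. A minimal sketch (the real stage lives in compress.js and may handle more edge cases):

```javascript
// Minify JSON by parsing and re-serializing without whitespace.
// Lossless: the parsed value is unchanged, only formatting is dropped.
// Invalid JSON is returned untouched, mirroring the pass-through rule.
function minifyJson(text) {
  try {
    return JSON.stringify(JSON.parse(text));
  } catch {
    return text; // not JSON: fall through untouched
  }
}

const pretty = '{ "name": "tamp",  "version": "0.1.0" }';
console.log(minifyJson(pretty)); // {"name":"tamp","version":"0.1.0"}
```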
2. TOON Encoding — objects → columnar format

JSON (86 chars)

[{"name":"a.js","size":1024}, {"name":"b.js","size":2048}, {"name":"c.js","size":512}]

TOON (46 chars)

name[3]{a.js|b.js|c.js} size[3]{1024|2048|512}
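The columnar idea fits in a few lines. This is an illustrative encoder that reproduces the shape shown above, not the actual @toon-format/toon API:

```javascript
// Sketch of columnar encoding for a uniform array of flat objects:
// one "key[count]{v1|v2|...}" group per field instead of repeating
// every key in every element. Illustrative only -- the real
// @toon-format/toon wire format may differ in detail.
function toColumnar(rows) {
  const keys = Object.keys(rows[0]);
  return keys
    .map((k) => `${k}[${rows.length}]{${rows.map((r) => r[k]).join('|')}}`)
    .join(' ');
}

const files = [
  { name: 'a.js', size: 1024 },
  { name: 'b.js', size: 2048 },
  { name: 'c.js', size: 512 },
];
console.log(toColumnar(files));
// name[3]{a.js|b.js|c.js} size[3]{1024|2048|512}
```

Repeated keys are the main overhead in JSON arrays of objects, so the saving grows with row count.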
3. LLMLingua-2 — ML token pruning for text content

compress.js source code

4,630 chars → 2,214 chars

-52.2%
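LLMLingua-2 ranks tokens with a trained classifier, so there is no short faithful sketch. As a toy stand-in for the idea of lossy token pruning, here is a naive stopword dropper (the real model is far more selective and context-aware):

```javascript
// Toy illustration of lossy token pruning: drop common low-information
// English words. NOT the LLMLingua-2 algorithm, which uses a trained
// token classifier rather than a fixed word list.
const STOPWORDS = new Set(['the', 'a', 'an', 'of', 'to', 'is', 'that', 'and', 'in']);

function prune(text) {
  return text
    .split(/\s+/)
    .filter((w) => !STOPWORDS.has(w.toLowerCase()))
    .join(' ');
}

const sample = 'The proxy compresses the body of the request before it is sent to the upstream API';
console.log(prune(sample));
// "proxy compresses body request before it sent upstream API"
```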

Each stage runs only when effective. Error results are never compressed.

Live Output

Every request shows compression stats, token savings, and cumulative $ saved.

Compression:
   minify      — Strip JSON whitespace (lossless)
   toon        — Columnar array encoding (lossless)
   strip-lines — Remove line-number prefixes
   whitespace  — Collapse blank lines, trim trailing
   llmlingua   — Neural text compression (LLMLingua-2)

anthropic /v1/messages 45.2k — 5 blocks compressed, -32.0%
  block[0] 12.5k→8.2k -34.4% 3340→2201 tok [toon]
  block[1] 2.1k→1.4k -33.3% 560→374 tok [minify]
  block[2] 890→534 -40.0% 237→142 tok [minify]
  block[3] skipped (error)
  block[4] 1.1k→712 -35.3% 293→190 tok [normalize]
  session 31.4k chars, 8686 tokens saved across 8 blocks (32.0% avg) $0.0261 saved @ $3/Mtok

anthropic /v1/messages 2.1k — no tool blocks

Measured: 50.5% fewer tokens

A/B tested via OpenRouter with Claude Sonnet 4.6. Twelve scenarios, five runs each, 120 API calls total.

Scenario Control tokens Compressed tokens Reduction
Small JSON (package.json) 315 246 -21.9%
Large JSON (dependency tree + prune) 6,774 2,380 -64.9%
Tabular Data (file listing) 3,574 1,835 -48.7%
Multi-turn (5-turn, all messages) 1,027 700 -31.8%
Line-Numbered JSON (Read tool output) 473 323 -31.7%
Lockfile (npm deps + prune) 3,444 640 -81.4%
Duplicate Read (same file twice + dedup) 490 308 -37.1%
All-Message 10-turn (historical compression) 2,218 1,718 -22.5%
Error Result (is_error: true) — skipped by design 167 167 0.0%

How we tested

Identical requests via OpenRouter. Raw vs. compressed. Exact input_tokens from the API. No estimation.

What compresses

Eight techniques: minify, TOON, strip-lines, whitespace, LLMLingua-2, dedup, diff, and field pruning.

What doesn't

Error results, already-minified JSON, and TOON content pass through untouched. Safety first.

Read the white paper · Reproduce the benchmark

~$46/month saved per developer

At Sonnet 4.6 pricing ($3/Mtok input), Tamp saves ~$0.31 per 200-request session (≈103K input tokens saved). Heavy users running 5 sessions/day see ~$46/month. For a 10-person team: ~$5,500/year.

Per Session:    $0.31     (200 requests)
Per Developer:  $46/mo    (5 sessions/day)
10-Person Team: $5,500/yr (Tamp itself is free and open source)

Based on Sonnet 4.6 ($3/Mtok) with 60% compressible traffic. Opus 4.6 ($5/Mtok) saves ~$77/dev/month.

Claude Max? Last 2× longer.

Max subscribers have a fixed token budget per session. Tamp compresses input tokens before they count against your limit — same work, fewer tokens consumed.

Max 5× — $100/mo: effective 10.1× (+2× more usage)

Max 20× — $200/mo: effective 40.4× (+2× more usage)

Model Input $/MTok Saved/session Per dev/month 10-person team/yr
Sonnet 4.6 $3 $0.31 $46 $5,500
Opus 4.6 $5 $0.52 $77 $9,200
Opus 4.6 (extended >200k) $10 $1.03 $155 $18,500

Math: 50.5% fewer input tokens → 1/(1−0.505) = 2.02× more requests per budget. 5 sessions/day assumed.
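The arithmetic behind the table and the multiplier, under the stated assumptions (≈103K input tokens saved per 200-request session, 5 sessions/day, 30 days/month, 10 developers):

```javascript
// Dollar savings and budget multiplier under the stated assumptions.
const tokensSavedPerSession = 103_000; // ~50.5% of a 200-request session
const reduction = 0.505;

function savings(pricePerMtok) {
  const perSession = (tokensSavedPerSession / 1e6) * pricePerMtok;
  return {
    perSession: perSession.toFixed(2),                    // dollars per session
    perDevMonth: (perSession * 5 * 30).toFixed(0),        // 5 sessions/day, 30 days
    teamYear: (perSession * 5 * 30 * 12 * 10).toFixed(0), // 10 developers
  };
}

console.log(savings(3)); // Sonnet 4.6 input pricing

// Same token budget funds 1/(1 - 0.505) more requests.
console.log((1 / (1 - reduction)).toFixed(2)); // 2.02
```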

Transparent HTTP Proxy

Intercept

POST /v1/messages only. All other routes pass through untouched.

Compress

tool_result blocks in the last user message are compressed per content type.

Forward

Rewrite Content-Length, forward to upstream API. Response streams back untouched.

Safety

Bodies over 256KB bypass compression. Parse errors fall through gracefully.
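The Compress and Safety steps can be sketched as a pure body-rewriting function. Names like rewriteBody are illustrative, not Tamp's actual internals, and the compressor here is a stand-in; the real pipeline applies all of its stages:

```javascript
// Sketch of the proxy's request-body rewrite for POST /v1/messages.
// Compresses tool_result blocks in the last user message, skips error
// results, and bypasses oversized or unparseable bodies. The compressor
// below is a stand-in (JSON minification only).
const MAX_BODY = 256 * 1024; // safety threshold from the Safety step

function minifyJson(text) {
  try { return JSON.stringify(JSON.parse(text)); } catch { return text; }
}

function rewriteBody(rawBody) {
  if (Buffer.byteLength(rawBody, 'utf8') > MAX_BODY) return rawBody; // >256KB: bypass
  let req;
  try { req = JSON.parse(rawBody); } catch { return rawBody; } // parse errors fall through
  const lastUser = [...(req.messages || [])].reverse().find((m) => m.role === 'user');
  if (!lastUser || !Array.isArray(lastUser.content)) return rawBody;
  for (const block of lastUser.content) {
    if (block.type !== 'tool_result' || block.is_error) continue; // never compress errors
    if (typeof block.content === 'string') block.content = minifyJson(block.content);
  }
  return JSON.stringify(req); // caller recomputes Content-Length from this
}
```

The forwarding layer then sets Content-Length to the new body's byte length before sending upstream; the response streams back untouched.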

What's Next

Extended thinking block compression
Response caching for repeated tool calls
Per-session dashboards with live stats
Configurable compression aggressiveness
Cloudflare Workers edge deployment

Try it now

$ npx @sliday/tamp