Tamp

Save up to 50% on input tokens.

$ npx @sliday/tamp

Starts the proxy on port 7778. Set ANTHROPIC_BASE_URL=http://localhost:7778 and go.

~50% compression on real traffic
Other installation methods
# Shell installer
$ curl -fsSL tamp.dev/setup.sh | bash

# Manual
$ git clone https://github.com/sliday/tamp ~/.tamp
$ cd ~/.tamp && npm install
$ node index.js

Independent A/B Benchmark

30.7% fewer input tokens. Measured.

We sent identical requests through OpenRouter — once raw, once compressed by Tamp — and compared the exact input token counts reported by the API. Seven scenarios, five runs per condition, 70 API calls total.

Token reduction by scenario: Small JSON 21.9%, Large JSON 28.3%, Tabular Data 48.7%, Source Code 0%, Multi-turn 23.0%, Line-Numbered 31.7%, Error Result 0%.

48.7%

on tabular data
TOON columnar encoding

0%

on source code & errors
safe passthrough by design

<1ms

compression overhead
per request

How we tested. Each of 7 scenarios was sent to Claude Sonnet 4 via OpenRouter 5 times in each condition: raw (control) and Tamp-compressed (treatment). OpenRouter reports exact input_tokens for both, giving a clean A/B comparison with no estimation.

What compresses. Pretty-printed JSON (minify whitespace), homogeneous arrays like file listings and route tables (TOON columnar encoding), and line-numbered Read tool output (strip prefixes + minify). These are the bulk of tool_result traffic in real Claude Code sessions.

What doesn't. Source code, error results, already-minified JSON, and TOON-encoded content all pass through untouched. Tamp classifies content first and only compresses when it's safe and effective.
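The classify-then-compress gate can be sketched in a few lines. The heuristics below are illustrative assumptions in the spirit of the description above, not Tamp's actual rules:

```javascript
// Illustrative content classifier: compress only when safe and effective.
// The exact heuristics Tamp uses are assumptions here.
function classify(text) {
  try {
    const parsed = JSON.parse(text);
    if (Array.isArray(parsed) && parsed.length > 1 &&
        parsed.every((x) => x !== null && typeof x === "object")) {
      return "tabular";          // homogeneous array: TOON candidate
    }
    return text.includes("\n") ? "pretty-json" : "minified-json";
  } catch {
    // Read tool output uses "<number>→<line>" prefixes
    if (/^\s*\d+→/m.test(text)) return "line-numbered";
    return "passthrough";        // source code, errors, anything else
  }
}

// Strip the Read tool's line-number prefixes before minifying.
function stripLineNumbers(text) {
  return text.replace(/^\s*\d+→/gm, "");
}
```

Anything that fails classification falls into "passthrough", which is how source code and error results end up untouched.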

Read the full white paper (PDF) → · Reproduce the benchmark →

Tokens pile up. Fast.

Claude Code sends full history every turn.

Tool results: pretty JSON, raw files, verbose CLI output.

100K+ tokens in minutes — mostly redundant.

More tokens = more cost, more latency, faster context exhaustion.

3-Stage Compression Pipeline

1
JSON Minify
Strip whitespace. Lossless.
2
TOON Encode
Objects → columnar format.
3
LLMLingua-2
ML token pruning.

Stage 1

JSON Minify

Strip whitespace. Instant, lossless.

Before

{ "name": "tamp", "version": "0.1.0", "type": "module", "dependencies": { "@toon-format/toon": "^2.1.0" } }

After

{"name":"tamp","version":"0.1.0","type":"module","dependencies":{"@toon-format/toon":"^2.1.0"}}

Typical saving: ~30-50% on structured data
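In Node, all of Stage 1 is essentially a parse/re-stringify round trip (a minimal sketch, not Tamp's actual code):

```javascript
// Lossless JSON minification: parse, then re-stringify without whitespace.
// Falls back to the original text if it isn't valid JSON.
function minifyJson(text) {
  try {
    return JSON.stringify(JSON.parse(text));
  } catch {
    return text; // not JSON: pass through untouched
  }
}

const pretty = `{
  "name": "tamp",
  "version": "0.1.0"
}`;
console.log(minifyJson(pretty)); // {"name":"tamp","version":"0.1.0"}
```

The try/catch is what makes the stage safe: invalid JSON simply passes through.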

Stage 2

TOON Encoding

Array-of-objects → columnar. Huge wins on tabular data.

JSON (86 chars)

[{"name":"a.js","size":1024}, {"name":"b.js","size":2048}, {"name":"c.js","size":512}]

TOON (46 chars)

name[3]{a.js|b.js|c.js} size[3]{1024|2048|512}
-50.6%

on a real package.json tool result
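The columnar idea behind the example above can be sketched in a few lines. This mirrors the output format shown here; the real encoder is the @toon-format/toon package:

```javascript
// Columnar re-encoding of an array of uniform objects: each key is stated
// once, followed by all of its values. Illustrative sketch of the format
// shown above, not the @toon-format/toon implementation.
function toColumnar(rows) {
  const keys = Object.keys(rows[0]);
  return keys
    .map((k) => `${k}[${rows.length}]{${rows.map((r) => r[k]).join("|")}}`)
    .join(" ");
}

const files = [
  { name: "a.js", size: 1024 },
  { name: "b.js", size: 2048 },
  { name: "c.js", size: 512 },
];
console.log(toColumnar(files));
// name[3]{a.js|b.js|c.js} size[3]{1024|2048|512}
```

The win grows with row count: every repeated key name is paid for once instead of once per object.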

Stage 3

LLMLingua-2

ML token pruning. Preserves meaning, drops the rest.

Input: compress.js source code

4,630 chars → 2,214 chars

-52.2%

Input: ls -la command output

1,046 chars → 516 chars

-50.7%

Python sidecar running Microsoft's LLMLingua-2 on CPU
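Calling the sidecar from the Node proxy might look like this. The endpoint, port, and payload shape are assumptions, not Tamp's actual protocol; the fail-open behavior matches the proxy's safety posture:

```javascript
// Hypothetical HTTP call to a local LLMLingua-2 sidecar. The endpoint, port,
// and JSON shape here are illustrative assumptions.
async function pruneTokens(text, rate = 0.5,
                           endpoint = "http://localhost:7779/compress") {
  try {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, rate }),
    });
    if (!res.ok) return text;
    const { compressed } = await res.json();
    return typeof compressed === "string" ? compressed : text;
  } catch {
    return text; // sidecar unreachable: fail open, pass through uncompressed
  }
}
```

If the sidecar is down or returns anything unexpected, the original text goes upstream unchanged.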

Benchmark Results

A/B tested via OpenRouter with Claude Sonnet 4. 7 scenarios, 5 runs per condition, 70 API calls total.

30.7%

weighted avg token reduction

48.7%

on tabular data (TOON)

0%

impact on model output

Scenario                            Control   Compressed   Reduction
Small JSON (package.json)               315          246      -21.9%
Large JSON (dependency tree)          6,773        4,855      -28.3%
Tabular Data (file listing)           3,574        1,835      -48.7%
Source Code (TypeScript)              1,069        1,069        0.0%
Multi-turn (5-turn conversation)      1,026          790      -23.0%
Line-Numbered (Read tool output)        473          323      -31.7%
Error Result (is_error: true)           167          167        0.0%

Token counts from OpenRouter. Source code and errors pass through untouched by design. Read the white paper (PDF) → · Reproduce it yourself →

Transparent HTTP Proxy

Claude Code → tamp:7778 → Anthropic API

Intercept

POST /v1/messages only. All other routes pass through untouched.

Compress

Last user message → tool_result blocks get compressed per content type.

Forward

Rewrite Content-Length, forward to upstream API. Response streams back untouched.

Safety

Bodies over 256KB bypass compression. Parse errors fall through gracefully.
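The four steps above reduce to a gate plus a rewrite. Function names and the exact traversal below are illustrative, not Tamp's internals:

```javascript
// Sketch of the proxy's gating and rewrite step.
const MAX_BODY = 256 * 1024; // larger bodies bypass compression entirely

function shouldCompress(method, path, bodyLength) {
  // Only POST /v1/messages is rewritten; every other route passes through.
  return method === "POST" && path === "/v1/messages" && bodyLength <= MAX_BODY;
}

// Compress string tool_result blocks in the last user message,
// skipping error results by design.
function compressToolResults(body, compressFn) {
  const last = [...(body.messages || [])].reverse()
    .find((m) => m.role === "user");
  if (!last || !Array.isArray(last.content)) return body;
  for (const block of last.content) {
    if (block.type === "tool_result" && !block.is_error &&
        typeof block.content === "string") {
      block.content = compressFn(block.content);
    }
  }
  return body;
}
```

After the rewrite, the proxy recomputes Content-Length and forwards the request; the response streams back untouched.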

What's Next

Extended thinking block compression
Response caching for repeated tool calls
Per-session dashboards with live stats
Configurable compression aggressiveness
Cloudflare Workers edge deployment