Tamp

Save up to 50% on input tokens.

$ npx @sliday/tamp

Starts the proxy on port 7778. Set ANTHROPIC_BASE_URL=http://localhost:7778 and go.

~50% compression on real traffic
Other installation methods
# Shell installer
$ curl -fsSL tamp.dev/setup.sh | bash

# Manual
$ git clone https://github.com/sliday/tamp ~/.tamp
$ cd ~/.tamp && npm install
$ node index.js

Independent A/B Benchmark

30.7% fewer input tokens. Measured.

We sent identical requests through OpenRouter — once raw, once compressed by Tamp — and compared the exact input token counts reported by the API. Seven scenarios, five runs per condition, 70 API calls total.

Token reduction by scenario: Small JSON 21.9%, Large JSON 28.3%, Tabular Data 48.7%, Source Code 0%, Multi-turn 23.0%, Line-Numbered 31.7%, Error Result 0%.

48.7%

on tabular data
TOON columnar encoding

0%

on source code & errors
safe passthrough by design

<1ms

compression overhead
per request

How we tested. Each of 7 scenarios was sent to Claude Sonnet 4 via OpenRouter 5 times in each condition: raw (control) and Tamp-compressed (treatment). OpenRouter reports exact input_tokens for both, giving a clean A/B comparison with no estimation.

What compresses. Pretty-printed JSON (minify whitespace), homogeneous arrays like file listings and route tables (TOON columnar encoding), and line-numbered Read tool output (strip prefixes + minify). These are the bulk of tool_result traffic in real Claude Code sessions.

What doesn't. Source code, error results, already-minified JSON, and TOON-encoded content all pass through untouched. Tamp classifies content first and only compresses when it's safe and effective.
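The classify-then-compress gate can be sketched in a few lines. The heuristics below are illustrative assumptions in the spirit of the description above, not Tamp's actual rules:

```javascript
// Illustrative content classifier: compress only when safe and effective.
// The exact heuristics Tamp uses are assumptions here.
function classify(text) {
  try {
    const parsed = JSON.parse(text);
    if (Array.isArray(parsed) && parsed.length > 1 &&
        parsed.every((x) => x !== null && typeof x === "object")) {
      return "tabular";          // homogeneous array: TOON candidate
    }
    return text.includes("\n") ? "pretty-json" : "minified-json";
  } catch {
    // Read tool output uses "<number>→<line>" prefixes
    if (/^\s*\d+→/m.test(text)) return "line-numbered";
    return "passthrough";        // source code, errors, anything else
  }
}

// Strip the Read tool's line-number prefixes before minifying.
function stripLineNumbers(text) {
  return text.replace(/^\s*\d+→/gm, "");
}
```

Anything that fails classification falls into "passthrough", which is how source code and error results end up untouched.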

Read the full white paper (PDF) → · Reproduce the benchmark →

Tokens pile up. Fast.

Claude Code sends full history every turn.

Tool results: pretty JSON, raw files, verbose CLI output.

100K+ tokens in minutes — mostly redundant.

More tokens = more cost, more latency, faster context exhaustion.

3-Stage Compression Pipeline

1
JSON Minify
Strip whitespace. Lossless.
2
TOON Encode
Objects → columnar format.
3
LLMLingua-2
ML token pruning.

Stage 1

JSON Minify

Strip whitespace. Instant, lossless.

Before

{ "name": "tamp", "version": "0.1.0", "type": "module", "dependencies": { "@toon-format/toon": "^2.1.0" } }

After

{"name":"tamp","version":"0.1.0","type":"module","dependencies":{"@toon-format/toon":"^2.1.0"}}

Typical saving: ~30-50% on structured data
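In Node, all of Stage 1 is essentially a parse/re-stringify round trip (a minimal sketch, not Tamp's actual code):

```javascript
// Lossless JSON minification: parse, then re-stringify without whitespace.
// Falls back to the original text if it isn't valid JSON.
function minifyJson(text) {
  try {
    return JSON.stringify(JSON.parse(text));
  } catch {
    return text; // not JSON: pass through untouched
  }
}

const pretty = `{
  "name": "tamp",
  "version": "0.1.0"
}`;
console.log(minifyJson(pretty)); // {"name":"tamp","version":"0.1.0"}
```

The try/catch is what makes the stage safe: invalid JSON simply passes through.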

Stage 2

TOON Encoding

Array-of-objects → columnar. Huge wins on tabular data.

JSON (86 chars)

[{"name":"a.js","size":1024}, {"name":"b.js","size":2048}, {"name":"c.js","size":512}]

TOON (46 chars)

name[3]{a.js|b.js|c.js} size[3]{1024|2048|512}
-50.6%

on a real package.json tool result
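The columnar idea behind the example above can be sketched in a few lines. This mirrors the output format shown here; the real encoder is the @toon-format/toon package:

```javascript
// Columnar re-encoding of an array of uniform objects: each key is stated
// once, followed by all of its values. Illustrative sketch of the format
// shown above, not the @toon-format/toon implementation.
function toColumnar(rows) {
  const keys = Object.keys(rows[0]);
  return keys
    .map((k) => `${k}[${rows.length}]{${rows.map((r) => r[k]).join("|")}}`)
    .join(" ");
}

const files = [
  { name: "a.js", size: 1024 },
  { name: "b.js", size: 2048 },
  { name: "c.js", size: 512 },
];
console.log(toColumnar(files));
// name[3]{a.js|b.js|c.js} size[3]{1024|2048|512}
```

The win grows with row count: every repeated key name is paid for once instead of once per object.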

Stage 3

LLMLingua-2

ML token pruning. Preserves meaning, drops the rest.

Input: compress.js source code

4,630 chars → 2,214 chars

-52.2%

Input: ls -la command output

1,046 chars → 516 chars

-50.7%

Python sidecar running Microsoft's LLMLingua-2 on CPU
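Calling the sidecar from the Node proxy might look like this. The endpoint, port, and payload shape are assumptions, not Tamp's actual protocol; the fail-open behavior matches the proxy's safety posture:

```javascript
// Hypothetical HTTP call to a local LLMLingua-2 sidecar. The endpoint, port,
// and JSON shape here are illustrative assumptions.
async function pruneTokens(text, rate = 0.5,
                           endpoint = "http://localhost:7779/compress") {
  try {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, rate }),
    });
    if (!res.ok) return text;
    const { compressed } = await res.json();
    return typeof compressed === "string" ? compressed : text;
  } catch {
    return text; // sidecar unreachable: fail open, pass through uncompressed
  }
}
```

If the sidecar is down or returns anything unexpected, the original text goes upstream unchanged.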

Benchmark Results

A/B tested via OpenRouter with Claude Sonnet 4. 7 scenarios, 5 runs per condition, 70 API calls total.

30.7%

weighted avg token reduction

48.7%

on tabular data (TOON)

0%

impact on model output

Scenario                            Control   Compressed   Reduction
Small JSON (package.json)               315          246      -21.9%
Large JSON (dependency tree)          6,773        4,855      -28.3%
Tabular Data (file listing)           3,574        1,835      -48.7%
Source Code (TypeScript)              1,069        1,069        0.0%
Multi-turn (5-turn conversation)      1,026          790      -23.0%
Line-Numbered (Read tool output)        473          323      -31.7%
Error Result (is_error: true)           167          167        0.0%

Token counts from OpenRouter. Source code and errors pass through untouched by design. Read the white paper (PDF) → · Reproduce it yourself →

Transparent HTTP Proxy

Claude Code → tamp:7778 → Anthropic API

Intercept

POST /v1/messages only. All other routes pass through untouched.

Compress

Last user message → tool_result blocks get compressed per content type.

Forward

Rewrite Content-Length, forward to upstream API. Response streams back untouched.

Safety

Bodies over 256KB bypass compression. Parse errors fall through gracefully.
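The four steps above reduce to a gate plus a rewrite. Function names and the exact traversal below are illustrative, not Tamp's internals:

```javascript
// Sketch of the proxy's gating and rewrite step.
const MAX_BODY = 256 * 1024; // larger bodies bypass compression entirely

function shouldCompress(method, path, bodyLength) {
  // Only POST /v1/messages is rewritten; every other route passes through.
  return method === "POST" && path === "/v1/messages" && bodyLength <= MAX_BODY;
}

// Compress string tool_result blocks in the last user message,
// skipping error results by design.
function compressToolResults(body, compressFn) {
  const last = [...(body.messages || [])].reverse()
    .find((m) => m.role === "user");
  if (!last || !Array.isArray(last.content)) return body;
  for (const block of last.content) {
    if (block.type === "tool_result" && !block.is_error &&
        typeof block.content === "string") {
      block.content = compressFn(block.content);
    }
  }
  return body;
}
```

After the rewrite, the proxy recomputes Content-Length and forwards the request; the response streams back untouched.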

What's Next

Extended thinking block compression
Response caching for repeated tool calls
Per-session dashboards with live stats
Configurable compression aggressiveness
Cloudflare Workers edge deployment