Every developer using AI coding assistants has felt the sting: you ask a simple question about a React re-render bug, and the model responds with a 1,200-token essay. It opens with “Great question!”, fills the middle with transitional phrases nobody reads, and closes with “Let me know if you need anything else!”
That’s not just annoying; it’s expensive. Every one of those filler tokens costs real money.
Caveman, now at 52.3K stars on GitHub with 2.8K forks, takes a brutally simple approach to fixing this: it tells your AI to shut up and just answer. And since its initial viral moment, it’s evolved from a single Claude Code plugin into a full three-part ecosystem supporting over 30 AI coding agents.
The Problem: You’re Paying for Small Talk
Here’s the comparison from Caveman’s own README that makes the waste obvious. Same question, explaining why a React component re-renders due to reference changes:
Normal Claude (69 tokens)
“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”
Caveman Claude (19 tokens)
“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”
Same fix. 72% fewer tokens. Brain still big.
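The bug both answers describe comes down to referential equality. Here is a minimal plain-TypeScript sketch of the mechanism — no React runtime involved, and `memoize` is a hypothetical stand-in for what `useMemo` does with an empty dependency array:

```typescript
// Two inline object literals are never the same reference, so a shallow
// prop comparison (React.memo, PureComponent) treats them as changed props.
const styleA = { color: "red" };
const styleB = { color: "red" };
console.log(styleA === styleB); // false: same shape, different reference

// A memoized factory returns the same reference on every call, which is
// the stabilizing trick useMemo performs inside a component.
function memoize<T extends object>(factory: () => T): () => T {
  let cached: T | undefined;
  return () => {
    if (cached === undefined) cached = factory();
    return cached;
  };
}

const getStyle = memoize(() => ({ color: "red" }));
console.log(getStyle() === getStyle()); // true: stable reference, no re-render
```

Either way the lesson is identical; Caveman just charges you 19 tokens for it instead of 69.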
The Caveman Ecosystem (New in 2026)
Caveman is no longer just one tool. It’s grown into three complementary projects:
| Project | Purpose | Key Benefit |
|---|---|---|
| caveman | Compress agent output | ~65% mean output token reduction |
| cavemem | Cross-agent persistent memory | Stores memories in compressed format (SQLite/FTS5) to reduce token bloat when recalling context |
| cavekit | Specification-driven build loops | Reduces agent “guessing” with structured autonomous workflows |
Plus two powerful sub-tools:
- `caveman-compress`, which cuts ~46% of input tokens from files like `CLAUDE.md` every session
- `caveman-shrink`, an MCP proxy middleware for even deeper compression
Four Compression Levels
Caveman now offers four modes, including a 文言文 (Classical Chinese) mode:
| Level | Style | Example Output |
|---|---|---|
| Lite | Trims filler, keeps complete grammar | “Your component re-renders because you create a new object reference each render. Wrap it in useMemo.” |
| Full (default) | Drops articles, fragments sentences | “New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.” |
| Ultra | Heavy abbreviation, telegram-style | “Inline obj prop → new ref → re-render. useMemo.” |
| 文言文 | Classical Chinese compression | “物出新參照,致重繪。useMemo Wrap之。” |
The core design principle remains unchanged: compress the language, never the code. Technical nouns, function names, file paths, and code blocks pass through completely untouched. Only natural-language fluff gets stripped.
Official Benchmark Numbers
From Caveman’s own benchmarks, 65% mean output reduction across tasks, with a range of 22–87%:
```
┌─────────────────────────────────────┐
│ TOKENS SAVED    ████████ 65%        │
│ TECHNICAL ACC.  ████████ 100%       │
│ SPEED INCREASE  ████████ ~3x        │
│ VIBES           ████████ OOG        │
└─────────────────────────────────────┘
```
The savings claim is backed by peer-reviewed research showing that forcing concise output doesn’t reduce accuracy — it can actually improve it. The model stops performing “helpful assistant theater” and focuses compute on the actual problem.
Key benefits:
- Faster responses — fewer tokens to generate = lower latency
- Easier to read — no wall of text, just the answer
- Same accuracy — all technical info preserved, only fluff dropped
- Save money — 65% mean output reduction (range 22–87%)
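To make the “save money” bullet concrete, here is a back-of-envelope estimate. The price and volume below are hypothetical placeholders, not Caveman’s numbers; substitute your provider’s actual output-token rate:

```typescript
// Hypothetical inputs: swap in your real numbers.
const outputPricePerMTok = 15;          // $ per 1M output tokens (assumed)
const monthlyOutputTokens = 20_000_000; // 20M output tokens/month (assumed)
const meanReduction = 0.65;             // Caveman's claimed mean output cut

const baselineCost = (monthlyOutputTokens / 1_000_000) * outputPricePerMTok;
const cavemanCost = baselineCost * (1 - meanReduction);

console.log(`baseline: $${baselineCost.toFixed(2)}`); // baseline: $300.00
console.log(`caveman:  $${cavemanCost.toFixed(2)}`);  // caveman:  $105.00
```

At the bottom of the claimed 22–87% range the same math still saves real money; at the top it’s dramatic.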
Setup: One Command, 30+ Agents
The installation has been dramatically simplified. A single command auto-detects every supported agent on your system:
Universal Installer (Recommended)
```shell
# macOS / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
```

```powershell
# Windows (PowerShell)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
```
This detects 30+ agents automatically: Claude Code, Gemini CLI, Codex, Cursor, Windsurf, Cline, GitHub Copilot, Continue, Kilo, Roo, Augment, Aider Desk, Amp, JetBrains Junie, Kiro CLI, Mistral Vibe, OpenHands, Qwen Code, Tabnine, Trae, Warp, Replit Agent, Antigravity, and more. It runs each agent’s native install, skips what you don’t have, and is safe to re-run.
Manual Install (Per Agent)
```shell
# Claude Code
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

# Gemini CLI
gemini extensions install https://github.com/JuliusBrussee/caveman

# Cursor / Windsurf / Cline / Copilot
npx skills add JuliusBrussee/caveman -a <cursor|windsurf|cline|github-copilot>
```
Installer Flags
| Flag | Effect |
|---|---|
| `--minimal` | Skip extras, just install the plugin/extension |
| `--all` | Also drop per-repo rule files into current directory |
| `--with-hooks` | Install Claude Code hooks + statusline + stats badge |
| `--with-mcp-shrink` | Register the caveman-shrink MCP proxy |
| `--with-init` | Write auto-start rules for Cursor, Windsurf, Cline, Copilot, etc. |
| `--dry-run` | Preview what would be installed |
| `--list` | Show all detected agents |
Using It: Simple Voice Commands
Once installed, control Caveman with natural language:
- Activate: type `/caveman` in your prompt (or `$caveman` in Codex)
- Deactivate: say “stop caveman” or “switch back to normal”
- Change level: `/caveman ultra`, or just say “switch to 文言文 mode”
- Commit mode: `caveman-commit` for tight commit messages
- Review mode: `caveman-review` for one-line code reviews
- Compress input: `caveman-compress` to shrink context files
The transitions are seamless; no session restart needed.
The Deeper Point: Are You Paying for AI’s “Emotional Labor”?
Caveman is a small project with a big insight: a significant chunk of what we pay AI models for is performative politeness, the greetings, hedging, and empathetic sign-offs that pad every reply. That’s fine in a customer-facing chatbot. In a development tool, it’s pure waste.
When you force a model to skip the social performance and focus entirely on the technical answer, something interesting happens: the answers actually get better. Published research confirms that concise output constraints can improve model reasoning accuracy. The model stops maintaining a “helpful assistant persona” and focuses its compute on the actual problem.
That might be the real value of Caveman: not just the cost savings, but the quality improvement that comes from letting your tools be tools.
Who Should Use This?
- Anyone using AI coding assistants: it now supports 30+ agents, not just Claude Code
- Heavy API users spending significant monthly budgets on tokens
- Teams where multiple developers share a token budget
- Solo developers on free tiers who want to stretch their allocation
- Anyone tired of scrolling past AI pleasantries to find the actual answer
Who might skip it:
- If you’re writing documentation or client-facing content where natural language matters
- If you’re new to coding and benefit from verbose explanations
My Take After Two Weeks
I started on full and kept it there for most tasks. The savings numbers in the benchmarks above — 65% average — are accurate for prose-heavy responses, but in a real coding session most of your tokens come from file reads and conversation history, not Claude’s explanations. My actual per-session savings were closer to 8-12%. Still worth it when you’re running 15+ sessions a day.
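That gap between the headline 65% and my ~10% is just arithmetic: the output cut only applies to the output share of the session. A quick sketch, where the 15% output share is my rough guess rather than anything measured:

```typescript
// Session-level savings = (output's share of session tokens) × (output reduction).
const outputShare = 0.15;     // assumed: most session tokens are input/context
const outputReduction = 0.65; // benchmark mean output reduction

const sessionSavings = outputShare * outputReduction;
console.log(`${(sessionSavings * 100).toFixed(1)}% of total session tokens`);
// ≈ 9.8%, in line with the 8-12% I saw
```

If your workflow is lighter on file reads, your output share (and thus your savings) will be higher.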
What I didn’t expect: I started sending fewer follow-up messages. Without the hedging and filler, Claude’s first attempt was more focused, and I stopped needing to say “just give me the fix” as a second prompt. That saved more tokens than the compression itself, honestly.
I turn it off for debugging sessions where I’m genuinely confused and want verbose reasoning. And architecture discussions — I actually want Claude to think out loud when I’m evaluating trade-offs. But for everyday refactoring and code reviews, full is staying on permanently. The ultra level is a novelty; lite is barely noticeable. full hits the sweet spot.
Final Verdict
What started as a viral joke about caveman-speak has grown into a serious, well-engineered ecosystem at 52.3K stars. With four compression levels, a universal one-command installer for 30+ agents, input compression via caveman-compress, persistent memory via cavemem, and real-world savings of 22–87% on output tokens, Caveman has become essential infrastructure for cost-conscious AI-assisted development.
The developer efficiency divide might not be about which model you use; it might be about whether you have the nerve to tell your AI to stop talking and start working.
Free & Open Source | GitHub Repository