What I
actually shipped.
I've been saying Claude Code 20–50×'ed my output. You asked how I'd prove it. So I had Claude run the math on itself. Every number comes from git log, filesystem, and the JSONL session transcripts. No vibes. Raw CSVs, the Python audit script, and the three companion sub-reports are all in the linked folder. Swap any weight, re-run.
How much is 20–50×?
Here's the spread.
Three different ways of asking the question. Each lands in the 20–50× band.
Plain version: every hour I was at the keyboard, about 19 human-work-equivalent hours came out. Over 108 days, that's roughly 2,156 workdays of throughput against my 728–1,092 keyboard-hours (91 active days × 8–12 hrs each). The 20–50× rhetoric holds only against a "no-AI 2026" baseline, which doesn't describe anyone actually working. Against real peers with AI, ~12–20× is the honest answer.
The three formulas.
1. Artifact-weighted Equivalent Human-Days
Each ship category gets a weight: how many workdays a solo principal designer-dev without AI would need for one of them. I calibrated against my own pre-2025 pace. Simply Smart Home: $1.5M to $5.5M. Tantum: 3 startups a year. iO Theater ticket sales: 50% to 75%.
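The arithmetic is a weighted sum over shipped artifacts divided by calendar days. A minimal sketch, where the weights below are hypothetical placeholders for illustration, not the audit's actual values (those are in the companion folder):

```python
# Formula 1: artifact-weighted equivalent human-days.
# Weights here are HYPOTHETICAL placeholders, not the audit's real values.

CALENDAR_DAYS = 108

# category: (count shipped, workdays a solo principal without AI needs per one)
artifacts = {
    "enterprise_site": (7, 40),   # hypothetical weight
    "portfolio_site": (2, 15),    # hypothetical weight
    "book_chapter": (12, 3),      # hypothetical weight
    "cognograph_mvp": (1, 250),   # capped at 250, per the audit's lowball rule
}

equivalent_workdays = sum(count * weight for count, weight in artifacts.values())
multiplier = equivalent_workdays / CALENDAR_DAYS
print(f"{equivalent_workdays} equivalent workdays -> {multiplier:.1f}x over {CALENDAR_DAYS} days")
```

Swapping any weight and re-running is the whole point: the counts are fixed by git and the filesystem, so the multiplier moves only with the weights you believe.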
2. Activity-volume (compressed parallel work)
Counts what a single human physically cannot do serially. Parallel subagent dispatches, tool-call time savings, my own labor. All summed into one equivalent-human-workdays number, divided by 108 calendar days.
3. Per keyboard-hour output
What most people mean by "output multiplier." Per hour I'm at the keyboard, how many human-work-equivalent hours come out, including the agents running in parallel. This is the one that's hardest to argue with.
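The per-hour calculation, sketched with the headline numbers quoted above (2,156 equivalent workdays, 728–1,092 keyboard-hours, 8-hour workday):

```python
# Formula 3: human-work-equivalent hours out per keyboard-hour in.
# Inputs are the post's own headline numbers.

WORKDAY_HOURS = 8
equivalent_workdays = 2156       # total human-work-equivalent workdays out
keyboard_hours = (728, 1092)     # 91 active days x 8-12 hrs/day

equivalent_hours = equivalent_workdays * WORKDAY_HOURS
lo = equivalent_hours / keyboard_hours[1]             # ~15.8x at 12 hrs/day
hi = equivalent_hours / keyboard_hours[0]             # ~23.7x at 8 hrs/day
mid = equivalent_hours / (sum(keyboard_hours) / 2)    # ~19x at the midpoint
print(f"{lo:.0f}-{hi:.0f}x per keyboard-hour, midpoint ~{mid:.0f}x")
```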
What I shipped,
what I measured.
Every number comes from three sources. git log across all my repos. filesystem scan of clients, plugins, docs, chapters. JSONL parsing of 7,189 Claude Code session transcripts across 15 project directories (I run Claude Code from multiple cwds; the original single-dir scan missed ~30% of my activity).
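The JSONL pass can be sketched as a streaming per-line parser, one row in memory at a time, which is what makes 30M+ rows tractable. The field names below are illustrative guesses at a transcript schema, not the actual format the real session-audit.py handles:

```python
import json
from collections import Counter
from pathlib import Path

def audit_sessions(root: Path) -> Counter:
    """Stream every .jsonl transcript under root, line by line,
    so the full row set never has to fit in memory at once."""
    totals = Counter()
    for path in root.rglob("*.jsonl"):
        totals["sessions"] += 1
        with path.open(encoding="utf-8") as f:
            for line in f:
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or corrupt rows
                # Field names below are illustrative, not the real schema.
                if event.get("type") == "tool_use":
                    totals["tool_calls"] += 1
                    if event.get("tool") == "Agent":
                        totals["subagent_dispatches"] += 1
                usage = event.get("usage", {})
                totals["io_tokens"] += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
                totals["cache_read_tokens"] += usage.get("cache_read_tokens", 0)
    return totals
```

Running this across multiple project directories, not just one, is what closed the ~30% gap the original single-dir scan left.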
Shipped artifacts (in window)
| Category | Count |
|---|---|
| Enterprise-tier sites / engagements | 7 |
| Portfolio-tier sites shipped | 2 |
| Sites in active build (≥50%) | 3 |
| Provisional patents filed (77 claims drafted) | 4 |
| Trademarks filed with methodology framework | 1 |
| Book chapters drafted (publication-quality) | 12 |
| OSS plugins / skills / framework | 8 |
| Cognograph MVP (Electron + React + WebGL + SaaS, ~50K LOC) | 1 |
| Forge (automated design-audit product) | 1 |
| Plans / substantive docs written | 75+ |
Activity volume
| Signal | Count |
|---|---|
| Git commits (Stefan-authored, 9 repos) | 1,578 |
| Lines of code added (gross) | 3,018,688 |
| Active commit days / 108 | 80 |
| Active days across git + Claude Code / 108 | 91 |
| Claude Code sessions (main + subagent, 15 project dirs) | 7,189 |
| Tool calls | 220,269 |
| Subagent dispatches (parallel work units) | 4,861 |
| Input + output tokens | 128.3M |
| Cache-read tokens (context reuse) | 40.4B |
2024 same-window baseline on every activity axis: 0. Claude Code did not exist in this form.
Data nugget: cognograph-02 cwd contributed 1,799 sessions in window but zero subagent dispatches. That project pre-dated my Agent-tool adoption. Subagents only show up in the workspace cwd starting March. Cache-read / I+O ratio is ~315×, which you'd expect from heavy long-session context persistence via the prompt cache.
What I checked,
what I skipped.
- The "20–50×" phrase was mine, not Claude's. I said it first, casually, on 2026-04-06 while drafting a Claude cancellation survey. Claude later flagged that exact line as marketing filler in a different draft and cut it. This audit is the first time anyone ran the math. It wasn't built to justify the claim; it was built to find out if it held.
- Every weight is in the open. Counts are objective (git, filesystem, JSONL). Weights are subjective (workdays a principal solo would need per artifact). If you don't like my weights, swap them in. Artifacts in the companion folder: session-audit.py (streaming parser, per-line JSON, handles 30M+ rows), session-audit-raw-multiproject.csv (per-session rows), audit-totals-multiproject.json (aggregates), plus 01-git-audit.md / 02b-session-audit-multiproject.md / 03-ship-artifacts.md.
- Four rounds of self-correction. Every time I pushed back on the first pass, the number went up, not down. First-pass weights were at median-senior instead of principal-tier. The per-hour measure initially left out agent work (structurally wrong). The ship list got expanded three times because I kept remembering clients Claude had missed. When corrections move in one direction only, the first pass was conservative, not sympathetic.
- Cognograph's weight is a lowball on purpose. A solo designer-frontend without AI isn't shipping Cognograph (Electron + React + custom WebGL shader pipeline + SaaS backend, ~50K LOC source) in 108 days. Realistically impossible. I capped the weight at 250 workdays as a proxy, because "impossible" doesn't compute.
- Baseline choice drives half the number. Against "solo designer-frontend without AI, 2026" (a hypothetical nobody actually is), the artifact multiplier is 19–41×. Against "top-decile solo peer with Cursor + Copilot + Sonnet" (what your competitors actually use), it's 8–18×. I default to the stricter baseline in the headline because the no-AI comparison is a ghost. Calibration anchor for both: my own pre-2025 pace — Simply Smart Home drove 233% YOY, Tantum shipped 15+ startups on broadcast timelines, iO Theater ticket sales moved 50% to 75%.
- Confounders I can't control for. Some output credit belongs to (a) 15 years of prior expertise — PFD methodology is assembled from stuff I already knew, not invented in window; (b) Cognograph's pre-2026 foundations; (c) urgent runway, which changes working hours and focus regardless of tooling. "Claude's multiplier" is entangled with "Stefan's 2026 conditions." The numbers don't disaggregate that.
- Activity-volume has a double-count risk I corrected for. Subagent dispatches execute their own tool calls, so those tool calls appear both in the 220,269 total AND as subagent parallel work. The stricter read deduplicates ~50% of tool calls against the subagent bucket and cuts per-call time savings from 5 min to 1 min. Without that correction, the activity multiplier is 39× (inflated); with it, 20×.
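The correction mechanics for the tool-call bucket alone can be sketched as follows. The raw count is from the audit; the 50% dedup share and the 5-min-to-1-min savings are the post's stated assumptions; the full activity-volume formula also adds subagent work and my own labor, which this sketch omits:

```python
# Double-count correction for the tool-call bucket of the activity formula.
# TOOL_CALLS is the audit's raw count; the dedup share and per-call minutes
# are the post's stated correction parameters.

TOOL_CALLS = 220_269
WORKDAY_MIN = 8 * 60  # assumes an 8-hour workday

def tool_call_workdays(dedup_share: float, minutes_saved_per_call: float) -> float:
    """Workdays of time savings credited to tool calls, after removing
    the share already counted inside subagent dispatches."""
    counted_calls = TOOL_CALLS * (1 - dedup_share)
    return counted_calls * minutes_saved_per_call / WORKDAY_MIN

inflated = tool_call_workdays(dedup_share=0.0, minutes_saved_per_call=5)
corrected = tool_call_workdays(dedup_share=0.5, minutes_saved_per_call=1)
print(f"tool-call bucket: {inflated:.0f} workdays inflated vs {corrected:.0f} corrected")
```

The ~10× drop in this one bucket is what pulls the overall activity multiplier from 39× down to 20× once the other buckets are added back in.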
- The fact that's hardest to argue with: 4,861 subagent dispatches in 108 days. Work that literally didn't exist as a human-possible category before the tool shipped. That alone is ~1,800 workdays of parallel execution, nearly 17× one person's 108-day calendar capacity. Before I count anything I did myself.