The proof, not the pitch — measured, not estimated
Reproduced locally on SigMap v7.30: whole-repo extraction across 405 repositories, 51 real coding tasks answered with vs without SigMap, and a BM25 re-ranker that lifts retrieval. We also A/B-tested a real agent (Devin) — and report that result honestly below, win or not. Every figure is a real measurement; the full method and raw data are open.
- ~99% fewer tokens (98.7% overall, 405 repos)
- 96× cheaper context (51 real coding tasks)
- 82.4% retrieval hit@5 (BM25 re-ranker, +7 pts)
1 · Token reduction at scale — 321 repositories
1,765,696,549 → 23,427,118 tokens: a 98.7% overall reduction (95.6% average per repo). 84 of 405 repos use languages SigMap doesn't yet parse and are excluded from the headline.
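The headline percentage can be checked directly from the two totals. The sketch below also illustrates one plausible reason the overall figure (98.7%) sits above the per-repo average (95.6%): the overall number is token-weighted, so large repos dominate it. The function names and the two-repo sample are illustrative assumptions, not SigMap code or measured data.

```python
# Totals quoted in the text above.
TOKENS_BEFORE = 1_765_696_549
TOKENS_AFTER = 23_427_118


def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens removed going from `before` to `after`."""
    return (1 - after / before) * 100


def overall_reduction(repos: list[tuple[int, int]]) -> float:
    """Token-weighted reduction: sum every repo first, then compare totals."""
    total_before = sum(before for before, _ in repos)
    total_after = sum(after for _, after in repos)
    return reduction_pct(total_before, total_after)


def average_reduction(repos: list[tuple[int, int]]) -> float:
    """Unweighted mean of each repo's own reduction."""
    return sum(reduction_pct(b, a) for b, a in repos) / len(repos)


print(f"{reduction_pct(TOKENS_BEFORE, TOKENS_AFTER):.1f}%")  # 98.7%

# Hypothetical (before, after) pairs, NOT measured data:
# one large repo that compresses well, one small repo that compresses less.
sample = [(1_000_000, 10_000), (10_000, 2_000)]
print(f"overall: {overall_reduction(sample):.1f}%")
print(f"average: {average_reduction(sample):.1f}%")
```

In the hypothetical sample the small repo pulls the unweighted average down while barely moving the token-weighted total, which is the same direction as the gap between the two reported figures.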
2 · Real coding tasks — 51 tasks, with vs without SigMap
For each task we measured the tokens an LLM needs to answer using the whole repo versus only the files SigMap ranks. Tokens are model-reported; cost is derived.
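"Cost is derived" can be read as: multiply the model-reported token count by a per-token price. The sketch below assumes a single flat input price, in which case the cost ratio equals the token ratio and the price cancels out. The price and the token counts are hypothetical placeholders, not values from the benchmark.

```python
def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Derived cost, assuming a flat price per million input tokens."""
    return tokens / 1_000_000 * usd_per_million


def cost_ratio(tokens_whole_repo: int, tokens_ranked: int,
               usd_per_million: float) -> float:
    """How many times cheaper the ranked-files context is."""
    return (cost_usd(tokens_whole_repo, usd_per_million)
            / cost_usd(tokens_ranked, usd_per_million))


def reduction_from_ratio(ratio: float) -> float:
    """Convert an N-times-cheaper ratio into a percentage reduction."""
    return (1 - 1 / ratio) * 100


# Hypothetical task: 960,000 tokens for the whole repo, 10,000 for ranked files.
HYPOTHETICAL_PRICE = 3.0  # USD per million tokens, an assumption
print(cost_ratio(960_000, 10_000, HYPOTHETICAL_PRICE))

# A 96x ratio corresponds to roughly a 99% reduction.
print(f"{reduction_from_ratio(96):.1f}%")  # 99.0%
```

Under this flat-price assumption the choice of model price does not change the ratio; it only scales the absolute dollar figures.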
3 · Does it also make a real agent faster? — we tested it honestly
The savings above are about context size and cost, and they're deterministic. A separate question is whether smaller context also makes an autonomous agent finish faster. We A/B-tested one (Devin) on these tasks — A = task only, B = SigMap context first — at 3 reps each. The honest result: too close to call.
What this means: this is not evidence that SigMap slows an agent down; the two are statistically even. The agent's run-to-run variance is simply large (some runs exceeded our 30-min measurement cap), so we can't yet claim a speedup either way. An early single run looked like a big win (~61%) but didn't hold up across 3 reps, so we're showing you the real result, not that number. SigMap's proven value is the ~99% smaller, 96× cheaper, better-retrieved context above. What your agent does with that head start is the open question we're still measuring.
What these numbers don't say
- 84 of 405 repos use languages SigMap doesn't yet parse (Clojure/Lua/C/C++/Haskell); they are excluded from the headline, not hidden.
- Retrieval precision is the ceiling: even the BM25 re-ranker (82.4% hit@5) sometimes surfaces a neighbour file, not the exact target.
- We did NOT find a reproducible agent wall-clock speedup: a 3-rep Devin A/B came out within noise (8.4 vs 8.0 min on completed runs), with high variance and some sessions exceeding our 30-min cap. The token/cost savings above are deterministic; the agent-speed question is still open.
- Devin's ACUs (its billing unit) aren't exposed by the API, only on its dashboard, so cost-per-task on the agent side is not yet measured.
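For readers unfamiliar with the hit@5 metric in the list above, here is a minimal sketch of how such a score is commonly computed: a task counts as a hit when the target file appears among the top five ranked files. This is the standard definition, assumed here rather than taken from SigMap's evaluation code, and the file names are hypothetical.

```python
def hit_at_k(ranked_files: list[str], target: str, k: int = 5) -> bool:
    """True when the target file is among the top-k ranked files."""
    return target in ranked_files[:k]


def hit_rate(results: list[tuple[list[str], str]], k: int = 5) -> float:
    """Percentage of tasks whose target file appears in the top k."""
    hits = sum(hit_at_k(ranked, target, k) for ranked, target in results)
    return hits / len(results) * 100


# Hypothetical tasks: (ranked file list, file that actually holds the answer).
tasks = [
    (["auth.py", "session.py", "db.py"], "session.py"),        # hit
    (["utils.py", "cache.py", "api.py"], "cache_store.py"),    # neighbour file, miss
]
print(f"hit@5: {hit_rate(tasks):.1f}%")  # hit@5: 50.0%
```

The second hypothetical task shows the failure mode described above: a closely related file is ranked, but the exact target is not, so the task scores as a miss.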