Day 35: Testing AI Development Tools: Semble
I am always looking for ways to improve my development workflow, so when I saw Semble, a search tool for coding agents, and its promise of returning fewer tokens, I decided to test it right away.
Benchmark report
BENCHMARK REPORT

Runner              Seconds  Turns     Cost  InputTok  OutputTok  CacheCreate  CacheRead  Changed
-------------------------------------------------------------------------------------------------
claude-no-semble     138.82      5  $0.7690         5      2,336       19,906     70,333        0
claude-with-semble   182.83     27  $0.9135        29      5,306       38,500  1,080,097        0

Fastest: claude-no-semble at 138.82s
Lowest reported cost: claude-no-semble at $0.7690
Lowest output tokens: claude-no-semble with 2,336
I tested Semble on the AudioTutor repo using Claude Code. The task was to trace the "More explanation" / explanation-cache flow across UI, API, server logic, tests, and docs.
Runner                    Time  Turns     Cost  Output tokens  Cache read
-------------------------------------------------------------------------
Claude without Semble  138.82s      5  $0.7690          2,336      70,333
Claude with Semble     182.83s     27  $0.9135          5,306   1,080,097
TLDR: Semble did not win in this test. It made the agent slower (182.83s vs. 138.82s), more expensive ($0.9135 vs. $0.7690), and more turn-heavy (27 turns vs. 5).
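To put "slower, more expensive, and more turn-heavy" into relative terms, here is a minimal Python sketch that divides the with-Semble figures by the no-Semble figures. The numbers are copied from the benchmark report; the dictionary layout and the `ratios` helper are my own illustrative names, not part of Semble or Claude Code, and this is a single run per configuration, so treat the ratios as indicative only.

```python
# Figures copied from the benchmark report (one run per configuration).
runs = {
    "claude-no-semble": {
        "seconds": 138.82, "turns": 5, "cost_usd": 0.7690,
        "output_tokens": 2336, "cache_read": 70333,
    },
    "claude-with-semble": {
        "seconds": 182.83, "turns": 27, "cost_usd": 0.9135,
        "output_tokens": 5306, "cache_read": 1080097,
    },
}


def ratios(baseline, candidate):
    """Return candidate / baseline for every metric in the baseline run."""
    return {name: candidate[name] / baseline[name] for name in baseline}


# A ratio above 1.0 means the Semble run used more of that resource.
overhead = ratios(runs["claude-no-semble"], runs["claude-with-semble"])
for name, ratio in overhead.items():
    print(f"{name:>14}: {ratio:.2f}x")
```

On these figures the Semble run took about 1.32x the time, 1.19x the cost, and 5.4x the turns, with cache reads growing by roughly 15x.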
The takeaway: unfortunately, a no-go. This tool might be useful for context-heavy apps, but for exact-code tracing tasks, normal search was better here. The important benchmark is not "does the search tool return fewer tokens?" but: does the whole agent finish faster, cheaper, and with equal or better quality?
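One way to make that whole-agent question concrete is a small check like the sketch below. It is only an illustration under assumptions: `AgentRun` and `wins_end_to_end` are hypothetical names, and the `quality` field is a placeholder score that my benchmark did not record, so the comparison falls back to time and cost when it is missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    name: str
    seconds: float
    cost_usd: float
    # Hypothetical quality score (higher is better); not part of the report.
    quality: Optional[float] = None


def wins_end_to_end(candidate: AgentRun, baseline: AgentRun) -> bool:
    """True only if the candidate is faster AND cheaper AND not worse in quality."""
    faster = candidate.seconds < baseline.seconds
    cheaper = candidate.cost_usd < baseline.cost_usd
    if candidate.quality is None or baseline.quality is None:
        # Assumption: with no quality scores, judge on time and cost alone.
        quality_ok = True
    else:
        quality_ok = candidate.quality >= baseline.quality
    return faster and cheaper and quality_ok


# Time and cost copied from the benchmark report.
baseline = AgentRun("claude-no-semble", seconds=138.82, cost_usd=0.7690)
with_semble = AgentRun("claude-with-semble", seconds=182.83, cost_usd=0.9135)

print(wins_end_to_end(with_semble, baseline))  # False
```

The check deliberately ignores how many tokens the search tool itself returns: only the totals for the finished agent run count.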
David Tsalani