Day 35: Testing AI Development Tools: Semble

I am always looking for ways to improve my development workflow, so when I saw Semble and its promise, I decided to test it right away.

BENCHMARK REPORT
Runner                     Seconds  Turns      Cost     InputTok    OutputTok   CacheCreate    CacheRead    Changed
------------------------------------------------------------------------------------------------------------------------
claude-no-semble            138.82      5   $0.7690            5        2,336        19,906       70,333          0
claude-with-semble          182.83     27   $0.9135           29        5,306        38,500    1,080,097          0

Fastest: claude-no-semble at 138.82s
Lowest reported cost: claude-no-semble at $0.7690
Lowest output tokens: claude-no-semble with 2,336

I tested Semble on the AudioTutor repo using Claude Code. The task was to trace the "More explanation" / explanation-cache flow across UI, API, server logic, tests, and docs.

Runner                 Time      Turns   Cost     Output tokens   Cache read
----------------------------------------------------------------------------
Claude without Semble   138.82s       5   $0.7690          2,336       70,333
Claude with Semble      182.83s      27   $0.9135          5,306    1,080,097

TL;DR: Semble did not win in this test. It made the agent about 32% slower (182.83s vs 138.82s), about 19% more expensive ($0.9135 vs $0.7690), and far more turn-heavy (27 turns vs 5).
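As a rough sketch, the overhead can be worked out as simple ratios from the figures in the table. The dictionary keys and the `relative_change` helper are names made up for this illustration; they are not part of Semble or Claude Code.

```python
# Figures copied from the benchmark table above.
runs = {
    "claude-no-semble": {
        "seconds": 138.82, "turns": 5, "cost_usd": 0.7690,
        "output_tokens": 2336, "cache_read": 70333,
    },
    "claude-with-semble": {
        "seconds": 182.83, "turns": 27, "cost_usd": 0.9135,
        "output_tokens": 5306, "cache_read": 1080097,
    },
}


def relative_change(baseline, candidate):
    """Return the candidate/baseline ratio for every metric in the baseline."""
    return {metric: candidate[metric] / baseline[metric] for metric in baseline}


ratios = relative_change(runs["claude-no-semble"], runs["claude-with-semble"])
for metric, ratio in ratios.items():
    # A ratio above 1.0 means the Semble run used more of that resource.
    print(f"{metric:>14}: {ratio:.2f}x")
```

The cache-read ratio is the one that stands out: the Semble run read roughly fifteen times as many cached tokens, which fits with it taking 27 turns instead of 5.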

The takeaway: unfortunately, a no-go. Semble might be useful for context-heavy apps, but for exact-code tracing tasks like this one, normal search was better. The important benchmark question is not "does the search tool return fewer tokens?" but "does the whole agent finish faster, cheaper, and with equal or better quality?"
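That whole-agent criterion could be written down as a small check, sketched here under some assumptions. The `Run` type and `tool_wins` function are hypothetical names, and the `quality` field is a placeholder: this test did not produce a quality score, so both runs are given the same value purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    seconds: float
    cost_usd: float
    quality: float  # hypothetical score (e.g. from manual review); not measured in this test


def tool_wins(baseline: Run, with_tool: Run) -> bool:
    """True only if the tool-assisted run is faster, cheaper, and at least as good."""
    return (
        with_tool.seconds < baseline.seconds
        and with_tool.cost_usd < baseline.cost_usd
        and with_tool.quality >= baseline.quality
    )


# Time and cost come from the benchmark table; quality=1.0 is a placeholder
# that assumes both runs produced equally good answers.
baseline = Run(seconds=138.82, cost_usd=0.7690, quality=1.0)
with_semble = Run(seconds=182.83, cost_usd=0.9135, quality=1.0)

print(tool_wins(baseline, with_semble))  # False: slower and more expensive
```

Requiring all three conditions at once is deliberate: a search tool that returns fewer tokens per call can still lose if the agent needs more turns to reach the same answer.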

David Tsalani