The features we deleted

Most "building in public" posts are about what someone shipped. This one is about what we took out: six commands, an entire category of work, and a backend we'd already built. Removing them is the work I'm most confident was right.

If the first post was the question and the second was where the answer lives, this is the messy middle: the part where we'd built too much and had to subtract our way back to something that starts in the first minute.

The line in the sand: no LLM in the CLI

Early on, the command-line tool could do clever things. You could ask it open-ended questions and it would call out to a model and reason for you. It felt impressive in a demo.

It was also slow, non-deterministic, and quietly demanding. Clever-in-the-CLI meant you needed a model configured before the tool was useful, meant every invocation could hit the network, meant the same command could give you two different answers on two runs. For a thing an agent shells out to thousands of times, "sometimes it phones a model and thinks for a while" is not a feature. It's a tax.

So we drew a line and we've held it: no LLM in the CLI. The command-line tool does the deterministic, fast, local things (memory, the code graph, full-text search) and returns the same answer every time, with no key and no model server required to start. Anything that genuinely needs a model lives server-side, opt-in, where it isn't in the hot path of every single call.

Drawing that line meant deleting things that crossed it.

The six commands we removed

In one release we cut a pile of commands that had grown up around the "clever CLI" idea: the ones that wrapped a model to ask, plan, spec, snapshot, recount history, and verify things for you. Six of them, gone in a single pass.

Some of them I genuinely liked. But every one of them either pulled an LLM into the CLI (violating the line above) or duplicated something the agent calling Spelunk was already better placed to do itself. Spelunk's job is to be the fast, factual layer the agent queries, not a small, worse agent of its own. Once that was clear, the commands weren't hard calls. They were just overdue.

The tool got smaller. It also got faster to explain, which is its own kind of feature: the surface you have to describe to a new user shrank to the three things that actually matter (memory, the code graph, and full-text search).

Zero infrastructure, by deletion

The bigger subtraction was about what you need before the tool is useful.

The early shape assumed a setup step: stand something up, build an index, then start. We kept asking why that step had to be there, and the honest answer was that it didn't, not for the core. Memory writes through to git notes, so it needs no store of its own. The code graph and full-text search read straight from your code and your git history. None of that requires an index or a server.

So the default became index-free. You run the tool inside a repo and it works: the session entry point gives an agent useful context immediately, with nothing provisioned. Semantic search, which genuinely does need an index and an embedding endpoint, became the opt-in step for people who want concept-level search, rather than a toll everyone pays at the door.
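To make "index-free" concrete, here is a minimal sketch of full-text search that reads straight from the working tree on every call. It is an illustration under stated assumptions, not Spelunk's actual code: the function name `search`, the skip list, and the size cap are all hypothetical. The only point is that nothing has to be built or provisioned first, and that the same tree gives the same answer in the same order.

```python
from pathlib import Path

# Hypothetical skip list for this sketch; a real tool would likely honor .gitignore.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def search(root, needle, max_bytes=1_000_000):
    """Return (relative_path, line_number, line) for every line containing needle.

    No index is consulted: every call walks the files under root directly.
    Paths are visited in sorted order so repeated runs give identical output.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.stat().st_size > max_bytes:
            continue  # skip very large files rather than scan them
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((rel.as_posix(), number, line.strip()))
    return hits
```

Calling `search(".", "load_user")` inside a repo returns matches immediately. The trade-off is that each call pays for a full scan, which is the price of having no setup step.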

This is where I have to be precise, because it's the kind of thing engineers check. While we were chasing zero-infrastructure memory we prototyped two ways of putting memory into git. One wrote through to git notes, anchored to commits, carrying provenance, cloning with the repo. The other was a lower-level backend that wrote into git's own metadata more directly. We kept the notes write-through. We dropped the lower-level one. It was real, it was built, and it didn't earn the extra complexity it added over plain notes. The shipped mechanism is git-notes write-through; the other was a prototype we learned from and removed. Saying that plainly is the whole point of a post like this.
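For readers who haven't used git notes, the general shape of a notes write-through can be sketched with nothing but the stock `git notes` subcommands. This is a hedged illustration, not the shipped implementation: the ref name `memory-sketch` and the helpers `remember` and `recall` are hypothetical names invented for the example.

```python
import subprocess

# Hypothetical notes ref for this sketch; git expands it to refs/notes/memory-sketch.
NOTES_REF = "memory-sketch"


def remember(repo, text, commit="HEAD"):
    """Append text to the note attached to commit (creates the note if absent)."""
    subprocess.run(
        ["git", "-C", str(repo), "notes", f"--ref={NOTES_REF}",
         "append", "-m", text, commit],
        check=True,
        capture_output=True,
    )


def recall(repo, commit="HEAD"):
    """Return the note text attached to commit, or "" if there is none."""
    result = subprocess.run(
        ["git", "-C", str(repo), "notes", f"--ref={NOTES_REF}", "show", commit],
        capture_output=True,
        text=True,
    )
    # `git notes show` exits non-zero when the commit has no note under this ref.
    return result.stdout if result.returncode == 0 else ""
```

Usage would look like `remember(".", "auth lives in services/auth")` followed by `recall(".")`. The memory lives inside the repository's own object store, anchored to a commit, with no separate database. One caveat about stock git rather than about Spelunk: notes refs are not part of the default fetch refspec, so a tool built this way has to fetch and push `refs/notes/*` explicitly for notes to travel between clones.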

Dropping sources we'd already built

The last category is the least glamorous and maybe the most honest. We'd built ways of pulling context from the codebase (the harvest sources), and some of them simply didn't earn their keep. There was a catch-all source that swept in more than it should have; it produced noise faster than signal, and we cut it.

It's uncomfortable to delete a thing you built. There's a sunk-cost reflex that says keep it, it might be useful, someone might want it. But a feature nobody reaches for isn't neutral: it's surface area. It's another thing to document, another flag to explain, another way for a first-time user to take a wrong turn. Removing it made the remaining sources easier to trust, because every one that's left is one we'd actually recommend.

What subtraction bought

Every one of these deletions pointed in the same direction. Keeping the LLM out of the CLI made the tool deterministic and fast. Index-free-by-default made it start in the first minute. Dropping the extra backend and the weak sources made it smaller to learn and easier to trust.

None of that is a roadmap of exciting new capabilities. It's the opposite: a list of things that are no longer there. But I think a project tells you more about itself by what it threw away than by what it kept. The features we deleted are why the ones that remain start fast and give the same answer every time.

The next posts in the series go back to the additions: why the memory deliberately doesn't belong to any one agent vendor, and what changes when a whole team's agents draw from the same memory instead of each working alone.

Spelunk is open source and code-aware, callable from whatever agent you already use. Repo and docs: spelunk.cloud.