Self-hosting spelunk-server

Run spelunk-server as a shared service so your team can sync project memory. This page covers Docker, configuration, and production notes.

spelunk-server does two jobs:

Local inference server (automatic). From v0.8.0 the CLI starts a local instance in the background to provide embeddings and LLM inference (see getting started). This needs no setup, and the rest of this page does not apply to it.
Team memory server (optional, deployed). The same binary, run as a long-lived service, lets a team share project memory (decisions, context, requirements) without sharing code. Each developer's code index stays local; only memory entries travel to the server.

This page covers the second case: running spelunk-server as a deployed, shared service.

Quick start (Docker)

# Clone the repository
git clone https://github.com/spelunk-cloud/spelunk
cd spelunk

# Start the server (no auth — dev only)
docker compose up -d

# Verify
curl http://localhost:7777/v1/health
# → {"status":"ok","version":"0.8.0","capabilities":["memory"],...}
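For scripts that need to confirm the server is up before continuing, the health response can be checked mechanically. The following is a minimal sketch, assuming the body contains a "status":"ok" pair as in the sample output above; the health_ok helper name is illustrative and not part of spelunk.

```shell
# health_ok: read a /v1/health JSON body on stdin and succeed only if
# it reports "status":"ok". A plain pattern match, not a full JSON parse.
health_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}
```

Typical use would be `curl -sS http://localhost:7777/v1/health | health_ok && echo "server is up"`.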

With an API key (recommended)

# Generate a key
export SPELUNK_SERVER_KEY=$(openssl rand -hex 32)

# Start (the exported key is read from the environment)
docker compose up -d

# Save the key — you'll need to distribute it to your team
echo "SPELUNK_SERVER_KEY=$SPELUNK_SERVER_KEY"

Client configuration

Each developer adds a .spelunk/config.toml at the project root (commit it — it contains no secrets):

# .spelunk/config.toml — commit this
server_url = "http://spelunk.internal:7777"
project_id = "my-awesome-app"

Personal config — ~/.config/spelunk/config.toml (never commit this; it can hold secrets):

# ~/.config/spelunk/config.toml
server_key = "your-shared-api-key"

Or use an environment variable instead of the personal config file:

export SPELUNK_SERVER_KEY=your-shared-api-key

The legacy memory_server_url / memory_server_key TOML keys remain accepted as deprecated aliases for server_url / server_key.

project_id is required when server_url points at a non-loopback address. If server_url is a loopback address (127.0.0.1, localhost, ::1), project_id may be omitted — spelunk derives one from the project's git remote (or a local path hash if there's no remote).
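The loopback rule above can be sketched as a small pre-flight check, for example in a setup script that warns before a developer runs against a shared server without a project_id. This is an illustration only: host_of and needs_project_id are hypothetical helper names, the check covers just the three loopback forms listed above, and URLs with embedded credentials are not handled.

```shell
# host_of URL: print the host part of an http(s) URL.
# IPv6 literals are returned without their brackets.
host_of() {
  local rest="${1#*://}"     # drop the scheme
  rest="${rest%%/*}"         # drop the path
  case "$rest" in
    \[*\]*) rest="${rest#\[}"; printf '%s\n' "${rest%%\]*}" ;;  # [::1]:7777 -> ::1
    *)      printf '%s\n' "${rest%%:*}" ;;                      # host:port -> host
  esac
}

# needs_project_id URL: succeed (exit 0) when the URL is not loopback,
# which is the case where project_id must be set explicitly.
needs_project_id() {
  case "$(host_of "$1")" in
    127.0.0.1|localhost|::1) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `needs_project_id "http://spelunk.internal:7777" && echo "set project_id in .spelunk/config.toml"`.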

Migrating existing local memory

If team members have existing local memory.db entries, push them to the server once .spelunk/config.toml is set up:

spelunk memory push

This reads the local memory database and sends all active entries to the server. Archived entries are skipped by default; pass --include-archived to push them.

Managing the server from the CLI

The same spelunk server subcommands used for the local autostarted server also work against a server you run yourself, when invoked on the host running it:

spelunk server start [--port <n>] [--bin <path>] [--db <path>]
spelunk server stop
spelunk server status
spelunk server logs [-n <lines>]

Subcommand	Notes
`start`	Idempotent; tries `--port` (default 7777) then 7778–7787 on collision; auto-binds `127.0.0.1`
`stop`	SIGTERM the running daemon and wait for exit
`status`	Print PID, port, instance id, and uptime
`logs`	Print the last N lines of the server log (`-n`, default 50)

Runtime state lives under ~/.local/state/spelunk/ (server.pid, server.port, server.log).
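Because start may fall back to another port on collision, a script on the same host cannot assume 7777. One way to find the port in use is to read the state file. The sketch below assumes server.port holds the port number as plain text, which this page does not confirm; spelunk server status is the supported way to get the same information. The spelunk_local_url helper is illustrative.

```shell
# spelunk_local_url [STATE_DIR]: print the base URL of the local server.
# ASSUMPTION: server.port contains only the port number as text.
spelunk_local_url() {
  local dir="${1:-$HOME/.local/state/spelunk}"
  local port
  [ -r "$dir/server.port" ] || { echo "no server.port in $dir" >&2; return 1; }
  port="$(tr -d '[:space:]' < "$dir/server.port")"
  case "$port" in
    ''|*[!0-9]*) echo "unexpected server.port contents" >&2; return 1 ;;
  esac
  printf 'http://127.0.0.1:%s\n' "$port"
}
```

A caller might then run `curl "$(spelunk_local_url)/v1/health"`.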

Multiple projects

One server instance supports multiple projects. Each project has its own namespace — entries from project_id = "api" are invisible to clients configured with project_id = "frontend". Projects are auto-created on first write — no registration step required.

Embedding dimension

All clients writing to the same project must use the same embedding model. The server records the embedding dimension on the first write and rejects subsequent writes with a different dimension.

Default: 768 dimensions (EmbeddingGemma 300M).

If your team uses a different model, configure the server at startup:

docker compose run spelunk-server --embedding-dim 1024

Or via compose environment:

environment:
  SPELUNK_EMBEDDING_DIM: "1024"

Production deployment

docker-compose.yml is the recommended minimal deployment — just spelunk-server plus a named volume for the SQLite database.

Key considerations:

Put the server behind a VPN or private subnet (the API key is the app-level guard; network-level access control is the real security boundary)
The SQLite WAL-mode database handles 2–20 concurrent writers comfortably
Back up the volume (spelunk.db) with your normal database backup process
spelunk-server serves plain HTTP with no TLS, so do not expose it directly on a public or shared-network interface; bind it to 127.0.0.1 and put a TLS-terminating reverse proxy (Caddy, nginx) in front of it

Running without Docker

# Build
cargo build --release --bin spelunk-server

# Run, bound to loopback
./target/release/spelunk-server \
  --db /var/lib/spelunk/spelunk.db \
  --port 7777 \
  --host 127.0.0.1 \
  --key your-api-key

Or with the API key via environment variable:

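# Generate the key once and keep it for your team; a key generated inline in the launch command is never displayed
export SPELUNK_SERVER_KEY=$(openssl rand -hex 32)
spelunk-server --port 7777 --host 127.0.0.1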

Reverse proxy (Caddy)

spelunk.example.com {
    reverse_proxy 127.0.0.1:7777
}

caddy run --config /etc/caddy/Caddyfile

Reverse proxy (nginx)

server {
    listen 443 ssl;
    server_name spelunk.example.com;

    ssl_certificate     /etc/letsencrypt/live/spelunk.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/spelunk.example.com/privkey.pem;

    location / {
        proxy_pass         http://127.0.0.1:7777;
        proxy_http_version 1.1;
        proxy_set_header   Host $host;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;

        # The memory stream is server-sent events — don't buffer it.
        proxy_set_header   Connection '';
        proxy_buffering    off;
        proxy_read_timeout 1h;
    }
}

The proxy_buffering off / long read-timeout block matters: spelunk memory watch uses server-sent events, and default nginx buffering would stall it.
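To check that events pass through the proxy unbuffered, it can help to read the stream directly. The sketch below follows the standard server-sent events framing (data: lines, with a blank line ending each event); the format of spelunk's event payloads is not described on this page, so the helper prints them unparsed. The sse_data name is illustrative.

```shell
# sse_data: read an SSE stream on stdin and print each event's data payload.
# Comment lines (starting with ":") and other fields (event:, id:) are ignored.
sse_data() {
  awk '
    { sub(/\r$/, "") }                        # tolerate CRLF line endings
    /^data:/ {
      line = substr($0, 6); sub(/^ /, "", line)   # strip "data:" and one space
      buf = (have ? buf "\n" line : line); have = 1; next
    }
    /^$/ { if (have) { print buf; buf = ""; have = 0 } }  # blank line ends an event
  '
}
```

Against a live server this could be used as `curl -sS -N -H "Authorization: Bearer $SPELUNK_SERVER_KEY" "https://spelunk.example.com/v1/projects/my-awesome-app/memory/stream" | sse_data`. If events show up only in bursts, or not until the connection closes, buffering is still enabled somewhere along the path.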

systemd unit

# /etc/systemd/system/spelunk-server.service
[Unit]
Description=spelunk-server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=spelunk
Environment=SPELUNK_SERVER_KEY=your-shared-api-key
ExecStart=/usr/local/bin/spelunk-server --port 7777 --host 127.0.0.1 --db /var/lib/spelunk/spelunk.db
Restart=on-failure
RestartSec=5

# Hardening — the server only needs its data dir and loopback.
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/var/lib/spelunk

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now spelunk-server
sudo systemctl status spelunk-server

Full stack with Ollama (Linux/NVIDIA only)

docker-compose.full.yml adds Ollama for server-side LLM inference. This requires Linux + NVIDIA GPU + nvidia-container-toolkit. It does not work on Apple Silicon (Docker runs in a Linux VM without GPU passthrough).

SPELUNK_SERVER_KEY=your-key docker compose -f docker-compose.full.yml up -d

Pointing a remote agent at the server

On a remote host (or in its container), set:

export SPELUNK_SERVER_URL=https://spelunk.example.com
export SPELUNK_SERVER_KEY=your-shared-api-key

spelunk check                 # should report the server reachable over TLS
spelunk search "auth tokens"

The agent's network path to spelunk.example.com is yours to provide — a VPN, Tailscale, or a public DNS record. Spelunk does not tunnel traffic; it just needs the URL to resolve and the TLS proxy to answer.

API reference

All routes require Authorization: Bearer <key> except /v1/health.

GET    /v1/health
GET    /v1/projects
POST   /v1/projects/{project_id}/memory
GET    /v1/projects/{project_id}/memory           ?kind=&limit=&archived=
GET    /v1/projects/{project_id}/memory/{id}
POST   /v1/projects/{project_id}/memory/search
DELETE /v1/projects/{project_id}/memory/{id}
POST   /v1/projects/{project_id}/memory/{id}/archive
POST   /v1/projects/{project_id}/memory/{id}/supersede
GET    /v1/projects/{project_id}/memory/since     ?t=<epoch>&limit=
GET    /v1/projects/{project_id}/memory/stream    (Server-Sent Events)
GET    /v1/projects/{project_id}/memory/harvested-shas
GET    /v1/projects/{project_id}/stats
POST   /v1/projects/{project_id}/index/embed      (embedding proxy — vectors not stored)
POST   /v1/projects/{project_id}/search           (query embedding proxy for CLI KNN)
POST   /v1/projects/{project_id}/explore          (SSE — LLM reasoning loop)
POST   /v1/projects/{project_id}/llm/complete     (SSE — raw LLM completion)
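As an illustration of the bearer-token requirement, the routes above can be called with a thin curl wrapper. This is a sketch rather than an official client: spelunk_api and SPELUNK_DRY_RUN are hypothetical names, while SPELUNK_SERVER_URL and SPELUNK_SERVER_KEY are the variables used earlier on this page.

```shell
# spelunk_api METHOD PATH: send an authenticated request to the server.
# With SPELUNK_DRY_RUN set, print the method and URL instead of sending.
spelunk_api() {
  local method="$1" path="$2"
  local url="${SPELUNK_SERVER_URL%/}$path"   # avoid a double slash
  if [ -n "${SPELUNK_DRY_RUN:-}" ]; then
    printf '%s %s\n' "$method" "$url"
    return 0
  fi
  curl -sS -X "$method" \
    -H "Authorization: Bearer $SPELUNK_SERVER_KEY" \
    "$url"
}
```

For example, `spelunk_api GET /v1/projects` or `spelunk_api GET /v1/projects/my-awesome-app/stats`. Request bodies for the POST routes are not documented on this page, so the wrapper does not attempt to build them.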

Conflict detection

When POST /v1/projects/{project_id}/memory is called, the server checks whether a semantically similar entry already exists (cosine similarity >= 0.92). If a conflict is detected, the response is HTTP 409 with a JSON body:

{
  "stored": true,
  "id": 42,
  "conflicts": [
    { "id": 37, "title": "Previous similar entry", "similarity": 0.97 }
  ]
}

The new entry is stored with a contradicts edge to the conflicting entry. Clients should log or display this warning. Configure the threshold with the --conflict-threshold flag (0.0–1.0, default 0.92).
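To make the threshold concrete, the comparison can be reproduced on toy vectors. The sketch below computes cosine similarity with awk and applies the 0.92 default; it illustrates the arithmetic only and is not the server's implementation. The cosine and is_conflict helpers are hypothetical.

```shell
# cosine A B: A and B are space-separated vectors of equal length.
# Prints the cosine similarity to 4 decimals; fails on mismatched or zero vectors.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m || n == 0) exit 1             # dimensions must match
    for (i = 1; i <= n; i++) { dot += x[i]*y[i]; na += x[i]*x[i]; nb += y[i]*y[i] }
    if (na == 0 || nb == 0) exit 1           # similarity undefined for zero vectors
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# is_conflict SIMILARITY [THRESHOLD]: succeed when similarity >= threshold.
is_conflict() {
  awk -v s="$1" -v t="${2:-0.92}" 'BEGIN { exit !(s + 0 >= t + 0) }'
}
```

Usage: `sim="$(cosine "0.1 0.9 0.2" "0.1 0.8 0.3")" && is_conflict "$sim" && echo "would be flagged at the default threshold"`. Lowering the threshold flags more entries as conflicts, and raising it flags fewer.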

What's next

Memory guide — kinds, supersede chains, harvesting, and memory push/since/watch
Config reference — server_url, server_key, project_id, and deprecated aliases
CLI reference — spelunk server and spelunk memory subcommands
