Self-hosting spelunk-server
Run spelunk-server as a shared team service so a team can sync project memory — Docker, configuration, and production notes.
spelunk-server does two jobs:
- Local inference server (automatic). From v0.8.0 the CLI starts a local instance for you in the background to provide embeddings and LLM inference (see getting started). This mode needs no setup, and the rest of this page does not apply to it.
- Team memory server (optional, deployed). The same binary, run as a long-lived service, lets a team share project memory (decisions, context, requirements) without sharing code. Each developer's code index stays local; only memory entries travel to the server.
This page covers the second case: running spelunk-server as a deployed,
shared service.
Quick start (Docker)
# Clone and build
git clone https://github.com/spelunk-cloud/spelunk
cd spelunk
# Start the server (no auth — dev only)
docker compose up -d
# Verify
curl http://localhost:7777/v1/health
# → {"status":"ok","version":"0.8.0","capabilities":["memory"],...}
With an API key (recommended)
# Generate a key
export SPELUNK_SERVER_KEY=$(openssl rand -hex 32)
# Start
SPELUNK_SERVER_KEY=$SPELUNK_SERVER_KEY docker compose up -d
# Save the key — you'll need to distribute it to your team
echo "SPELUNK_SERVER_KEY=$SPELUNK_SERVER_KEY"
Client configuration
Each developer adds a .spelunk/config.toml at the project root (commit it —
it contains no secrets):
# .spelunk/config.toml — commit this
server_url = "http://spelunk.internal:7777"
project_id = "my-awesome-app"
Personal config, in ~/.config/spelunk/config.toml (never commit this; it can
hold secrets):
# ~/.config/spelunk/config.toml
server_key = "your-shared-api-key"
Or use an environment variable instead of the personal config file:
export SPELUNK_SERVER_KEY=your-shared-api-key
The legacy memory_server_url / memory_server_key TOML keys remain accepted as deprecated aliases for server_url / server_key.
project_id is required when server_url points at a non-loopback address.
If server_url is a loopback address (127.0.0.1, localhost, ::1),
project_id may be omitted — spelunk derives one from the project's git
remote (or a local path hash if there's no remote).
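The loopback rule above can be sketched as a small validation step. This is only an illustration of the rule as documented, not spelunk's actual code; the function names `project_id_required` and `validate_config` are hypothetical, and the sketch treats exactly the three hosts listed above as loopback.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hosts the documentation lists as loopback addresses.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def project_id_required(server_url: str) -> bool:
    """Return True when server_url points at a non-loopback host."""
    host = urlsplit(server_url).hostname  # strips the port and IPv6 brackets
    return host not in LOOPBACK_HOSTS


def validate_config(server_url: str, project_id: Optional[str]) -> None:
    """Raise ValueError if project_id is missing for a remote server."""
    if project_id_required(server_url) and not project_id:
        raise ValueError(
            f"project_id is required when server_url is {server_url!r}"
        )
```

With this sketch, `validate_config("http://localhost:7777", None)` passes, while the same call with `http://spelunk.internal:7777` raises.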
Migrating existing local memory
If team members have existing local memory.db entries, push them to the
server once .spelunk/config.toml is set up:
spelunk memory push
This reads the local memory database and sends all active entries to the
server. Archived entries are skipped by default; pass --include-archived to
push them.
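The selection that push performs can be pictured as a simple filter. The sketch below is an assumption-laden illustration: the entry shape (a dict with an `archived` boolean) and the function name `entries_to_push` are hypothetical and are not taken from the spelunk schema.

```python
from typing import Iterable


def entries_to_push(entries: Iterable[dict], include_archived: bool = False) -> list:
    """Keep active entries; keep archived ones only when asked to."""
    return [
        entry
        for entry in entries
        # Assumed field: "archived" marks an entry as archived.
        if include_archived or not entry.get("archived", False)
    ]
```

Calling it with `include_archived=True` mirrors passing --include-archived on the command line.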
Managing the server from the CLI
The same spelunk server subcommands used for the local autostarted server
also work against a server you run yourself, when invoked on the host running
it:
spelunk server start [--port <n>] [--bin <path>] [--db <path>]
spelunk server stop
spelunk server status
spelunk server logs [-n <lines>]
| Subcommand | Notes |
|---|---|
| start | Idempotent; tries --port (default 7777) then 7778–7787 on collision; auto-binds 127.0.0.1 |
| stop | SIGTERM the running daemon and wait for exit |
| status | Print PID, port, instance id, and uptime |
| logs | Print the last N lines of the server log (-n, default 50) |
Runtime state lives under ~/.local/state/spelunk/ (server.pid,
server.port, server.log).
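The port fallback described for start can be sketched as follows. This is a hedged illustration rather than the CLI's implementation: the names `candidate_ports`, `port_is_free`, and `pick_port` are hypothetical, and the behaviour when a non-default --port collides is assumed to fall back to the same 7778–7787 range.

```python
import socket
from typing import Callable, Optional

DEFAULT_PORT = 7777
FALLBACK_PORTS = range(7778, 7788)  # 7778..7787 inclusive


def candidate_ports(requested: int = DEFAULT_PORT) -> list:
    """Requested port first, then the fallback range, without duplicates."""
    return [requested] + [p for p in FALLBACK_PORTS if p != requested]


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port on loopback; a failed bind means it is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(
    requested: int = DEFAULT_PORT,
    is_free: Callable[[int], bool] = port_is_free,
) -> Optional[int]:
    """Return the first free candidate port, or None if all are taken."""
    for port in candidate_ports(requested):
        if is_free(port):
            return port
    return None
```

Passing `is_free` as a parameter keeps the selection logic testable without opening real sockets.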
Multiple projects
One server instance supports multiple projects. Each project has its own
namespace — entries from project_id = "api" are invisible to clients
configured with project_id = "frontend". Projects are auto-created on first
write — no registration step required.
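A toy in-memory model may help picture the namespacing. It assumes nothing about the real storage layer (which is SQLite); the class `MemoryStore` and its methods are hypothetical and only mirror the two documented behaviours: isolation by project_id and creation on first write.

```python
class MemoryStore:
    """Toy model: one list of entries per project_id."""

    def __init__(self) -> None:
        self._projects = {}

    def write(self, project_id: str, entry: dict) -> None:
        # The project namespace is created on first write.
        self._projects.setdefault(project_id, []).append(entry)

    def list(self, project_id: str) -> list:
        # Reads never create a project and never cross namespaces.
        return list(self._projects.get(project_id, []))

    def projects(self) -> list:
        return sorted(self._projects)
```

In this model, an entry written under "api" never appears in `list("frontend")`, and reading an unknown project does not register it.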
Embedding dimension
All clients writing to the same project must use the same embedding model. The server records the embedding dimension on the first write and rejects subsequent writes with a different dimension.
Default: 768 dimensions (EmbeddingGemma 300M).
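The first-write rule can be sketched as a small guard. This is an illustration under assumptions, not the server's code: `ProjectDimensionGuard` and `DimensionMismatch` are hypothetical names, and the guard is shown per project with an optional preconfigured dimension.

```python
from typing import Optional, Sequence


class DimensionMismatch(ValueError):
    """Raised when a write uses a different embedding dimension."""


class ProjectDimensionGuard:
    def __init__(self, expected_dim: Optional[int] = None) -> None:
        # None means "not yet known"; the first write fixes it.
        self.expected_dim = expected_dim

    def check(self, vector: Sequence[float]) -> None:
        dim = len(vector)
        if self.expected_dim is None:
            self.expected_dim = dim  # record the dimension on first write
            return
        if dim != self.expected_dim:
            raise DimensionMismatch(
                f"expected {self.expected_dim} dimensions, got {dim}"
            )
```

A client that switches from a 768-dimension model to a 1024-dimension one would hit the mismatch on its next write.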
If your team uses a different model, configure the server at startup:
docker compose run spelunk-server --embedding-dim 1024
Or via compose environment:
environment:
  SPELUNK_EMBEDDING_DIM: "1024"
Production deployment
docker-compose.yml is the recommended minimal deployment — just
spelunk-server plus a named volume for the SQLite database.
Key considerations:
- Put the server behind a VPN or private subnet (the API key is the app-level guard; network-level access control is the real security boundary)
- The SQLite WAL-mode database handles 2–20 concurrent writers comfortably
- Back up the volume (spelunk.db) with your normal database backup process
- spelunk-server terminates plain HTTP; do not expose it directly on a public or shared-network interface. Bind it to 127.0.0.1 and put a TLS-terminating reverse proxy (Caddy, nginx) in front of it
Running without Docker
# Build
cargo build --release --bin spelunk-server
# Run, bound to loopback
./target/release/spelunk-server \
--db /var/lib/spelunk/spelunk.db \
--port 7777 \
--host 127.0.0.1 \
--key your-api-key
Or with the API key via environment variable:
SPELUNK_SERVER_KEY=$(openssl rand -hex 32) \
spelunk-server --port 7777 --host 127.0.0.1
Reverse proxy (Caddy)
spelunk.example.com {
    reverse_proxy 127.0.0.1:7777
}
caddy run --config /etc/caddy/Caddyfile
Reverse proxy (nginx)
server {
    listen 443 ssl;
    server_name spelunk.example.com;
    ssl_certificate /etc/letsencrypt/live/spelunk.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/spelunk.example.com/privkey.pem;
    location / {
        proxy_pass http://127.0.0.1:7777;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # The memory stream is server-sent events; don't buffer it.
        proxy_set_header Connection '';
        proxy_buffering off;
        proxy_read_timeout 1h;
    }
}
The proxy_buffering off / long read-timeout block matters: spelunk memory watch uses server-sent events, and default nginx buffering would stall it.
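To see why buffering matters, it helps to look at how a client consumes server-sent events: each event is a run of `data:` lines closed by a blank line, and the client can only act once that blank line arrives. The sketch below parses the standard SSE framing; the payload format of spelunk's stream is not documented here, so payloads are treated as opaque strings, and `iter_sse_data` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in the stream."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data_parts.append(value)
```

If a proxy holds back lines until its buffer fills, the terminating blank line is delayed and the loop above yields nothing, which is the stall the nginx settings avoid.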
systemd unit
# /etc/systemd/system/spelunk-server.service
[Unit]
Description=spelunk-server
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=spelunk
Environment=SPELUNK_SERVER_KEY=your-shared-api-key
ExecStart=/usr/local/bin/spelunk-server --port 7777 --host 127.0.0.1 --db /var/lib/spelunk/spelunk.db
Restart=on-failure
RestartSec=5
# Hardening — the server only needs its data dir and loopback.
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/var/lib/spelunk
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now spelunk-server
sudo systemctl status spelunk-server
Full stack with Ollama (Linux/NVIDIA only)
docker-compose.full.yml adds Ollama for server-side LLM inference. This
requires Linux + NVIDIA GPU + nvidia-container-toolkit. It does not work on
Apple Silicon (Docker runs in a Linux VM without GPU passthrough).
SPELUNK_SERVER_KEY=your-key docker compose -f docker-compose.full.yml up -d
Pointing a remote agent at the server
On a remote host (or in its container), set:
export SPELUNK_SERVER_URL=https://spelunk.example.com
export SPELUNK_SERVER_KEY=your-shared-api-key
spelunk check # should report the server reachable over TLS
spelunk search "auth tokens"
The agent's network path to spelunk.example.com is yours to provide: a
VPN, Tailscale, or a public DNS record. Spelunk does not tunnel traffic; it
just needs the URL to resolve and the TLS proxy to answer.
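For scripted environments, a reachability check outside the CLI might look like the sketch below. It relies only on facts stated on this page (the /v1/health and /v1/projects routes, and Bearer authentication on every route except health); the helper names are hypothetical, and the shape of the /v1/projects response is not assumed.

```python
import json
import urllib.request


def health_url(server_url: str) -> str:
    """Build the health endpoint URL, tolerating a trailing slash."""
    return server_url.rstrip("/") + "/v1/health"


def auth_headers(server_key: str) -> dict:
    """Header required by every route except /v1/health."""
    return {"Authorization": f"Bearer {server_key}"}


def check_health(server_url: str, timeout: float = 5.0) -> dict:
    """GET /v1/health (no key needed) and return the decoded JSON body."""
    with urllib.request.urlopen(health_url(server_url), timeout=timeout) as resp:
        return json.load(resp)


def list_projects_request(server_url: str, server_key: str) -> urllib.request.Request:
    """Prepare an authenticated GET /v1/projects request (not sent here)."""
    return urllib.request.Request(
        server_url.rstrip("/") + "/v1/projects",
        headers=auth_headers(server_key),
    )
```

A script would typically read SPELUNK_SERVER_URL and SPELUNK_SERVER_KEY from the environment, call `check_health` first, and only then send authenticated requests.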
API reference
All routes require Authorization: Bearer <key> except /v1/health.
GET /v1/health
GET /v1/projects
POST /v1/projects/{project_id}/memory
GET /v1/projects/{project_id}/memory ?kind=&limit=&archived=
GET /v1/projects/{project_id}/memory/{id}
POST /v1/projects/{project_id}/memory/search
DELETE /v1/projects/{project_id}/memory/{id}
POST /v1/projects/{project_id}/memory/{id}/archive
POST /v1/projects/{project_id}/memory/{id}/supersede
GET /v1/projects/{project_id}/memory/since ?t=<epoch>&limit=
GET /v1/projects/{project_id}/memory/stream (Server-Sent Events)
GET /v1/projects/{project_id}/memory/harvested-shas
GET /v1/projects/{project_id}/stats
POST /v1/projects/{project_id}/index/embed (embedding proxy; vectors not stored)
POST /v1/projects/{project_id}/search (query embedding proxy for CLI KNN)
POST /v1/projects/{project_id}/explore (SSE: LLM reasoning loop)
POST /v1/projects/{project_id}/llm/complete (SSE: raw LLM completion)
Conflict detection
When POST /v1/projects/{project_id}/memory is called, the server checks
whether a semantically similar entry already exists (cosine similarity >=
0.92). If a conflict is detected, the response is HTTP 409 with a JSON
body:
{
  "stored": true,
  "id": 42,
  "conflicts": [
    { "id": 37, "title": "Previous similar entry", "similarity": 0.97 }
  ]
}
The new entry is stored with a contradicts edge to the conflicting entry.
Clients should log or display this warning. Configure the threshold with the
--conflict-threshold flag (0.0–1.0, default 0.92).
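The check can be sketched in two halves: the similarity test the server is described as applying, and a client-side helper that turns the 409 body into warnings. Both are illustrations under assumptions. How the server actually searches for similar entries is not documented here, so the linear scan is a stand-in, and the names `find_conflicts` and `format_conflict_warnings` are hypothetical.

```python
import math
from typing import Mapping, Sequence

DEFAULT_CONFLICT_THRESHOLD = 0.92


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_conflicts(
    new_vector: Sequence[float],
    existing: Mapping[int, Sequence[float]],
    threshold: float = DEFAULT_CONFLICT_THRESHOLD,
) -> list:
    """Return (entry_id, similarity) pairs at or above the threshold."""
    hits = []
    for entry_id, vector in existing.items():
        similarity = cosine_similarity(new_vector, vector)
        if similarity >= threshold:
            hits.append((entry_id, similarity))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def format_conflict_warnings(body: dict) -> list:
    """Turn a 409 response body into one warning line per conflict."""
    return [
        f"entry {body['id']} conflicts with #{c['id']} "
        f"({c['title']}, similarity {c['similarity']:.2f})"
        for c in body.get("conflicts", [])
    ]
```

Because the 409 response still carries "stored": true, a client should treat it as a warning to surface rather than as a failed write.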
What's next
- Memory guide: kinds, supersede chains, harvesting, and memory push/since/watch
- Config reference: server_url, server_key, project_id, and deprecated aliases
- CLI reference: spelunk server and spelunk memory subcommands