Mnemosyne: The Hybrid Memory Layer

Tier 3 · Real Builds

Before this, read:

MCP basics — Mnemosyne exposes its API as MCP tools
Anatomy of an agent — agents are the consumers of this layer

Every agent system eventually hits the same problem. The researcher finishes a run, writes a report, and the next session starts from scratch: same gaps re-detected, same facts re-extracted, same context re-loaded. The CRM grows to hundreds of markdown files, and “find someone you talked to last month” becomes a grepping expedition. The context window fills with files that are mostly irrelevant to the current task.

The solution is a real memory layer — one that knows not just what you’ve stored, but how the things you’ve stored relate to each other.

That’s Mnemosyne.

What it is

Mnemosyne is the hybrid memory layer for the agent system. It is not a Mem0 account, a hosted vector database, or a LangChain abstraction. It is two embedded databases and a local embedding model running on the Mac Mini, stitched together by a Python module at ~/agent-system/agents/shared/memory.py:

Kùzu — an embedded graph database (~/clawd/memory/graph.kuzu/). Every CRM person is a Person node. Projects, domains, interactions, and research chunks become nodes and edges. Kùzu lets you ask “who does this person know” or “what projects is this domain connected to” without a network call.
Qdrant — a local file-backed vector store (~/clawd/memory/qdrant/). Three collections: crm_people (one embedding per person), project_chunks, and plaud_chunks. No Docker required; Qdrant runs embedded.
nomic-embed-text via Ollama — the embedding model. 768-dimensional, runs locally, costs $0 per query.
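The embedding step can be sketched with only the standard library. This assumes Ollama's default local endpoint (`http://localhost:11434`) and its `/api/embed` route; the helper names `build_embed_request` and `embed` are hypothetical and not taken from memory.py.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # Ollama's default local port (assumed)
EMBED_MODEL = "nomic-embed-text"
EMBED_DIM = 768  # output size of nomic-embed-text


def build_embed_request(text: str) -> urllib.request.Request:
    """Build (but do not send) an embedding request for one string."""
    body = json.dumps({"model": EMBED_MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Send the request to a running Ollama server and return one vector."""
    with urllib.request.urlopen(build_embed_request(text), timeout=30) as resp:
        payload = json.load(resp)
    vector = payload["embeddings"][0]  # /api/embed returns a list of vectors
    if len(vector) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dims, got {len(vector)}")
    return vector
```

Because nothing leaves localhost, the per-query cost is the electricity the Mac Mini draws.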

The storage spec came from an audit in April 2026 that found the existing “vector search” module (agents/ai_os/vector_search.py) was 432 lines of dead code — indexed by cron every two hours, called by zero application agents. That was not a small finding. The whole context-loading path was dump-and-pray: the project chat loaded README + WORKPLAN + CHANGELOG tail + LINKS and crammed ~40KB into every system prompt regardless of what the task actually needed.

The `hybrid_retrieve()` function

The public API for reading from Mnemosyne is:

```python
from agents.shared.memory import hybrid_retrieve

hits = hybrid_retrieve("Matt Madden MBA analytics", top_k=8)
```

Each Hit has a score, a source (e.g. "crm_people:matt-madden"), text, and metadata. The retrieval path:

Vector search — embed the query with nomic-embed-text, search across all DEFAULT_COLLECTIONS (crm_people, project_chunks, plaud_chunks), return top candidates per collection.
Graph expansion — for each hit that maps to a Kùzu node, run a 1-hop expansion: find adjacent nodes (related people, projects, interactions) and promote their scores.
Score fusion — combine vector score and graph-expansion score with configurable weights (vector heavier by default, per AD-05). Return ranked hits.
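To make the three steps concrete, here is a toy in-memory version. It is a sketch, not the memory.py implementation. It assumes cosine similarity, a per-collection cut of 5, fusion weights of 0.7/0.3 (the actual AD-05 weights are not given here), and a graph keyed by the same `collection:id` strings used in `Hit.source`. `toy_hybrid_retrieve` is a hypothetical name, and it takes a precomputed query vector instead of embedding the query itself.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Hit:
    score: float
    source: str  # e.g. "crm_people:matt-madden"
    text: str
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_hybrid_retrieve(
    query_vec: list[float],
    collections: dict[str, dict[str, list[float]]],  # collection -> point id -> vector
    graph: dict[str, list[str]],  # source -> 1-hop neighbors
    texts: dict[str, str],  # source -> display text
    top_k: int = 8,
    per_collection: int = 5,
    w_vec: float = 0.7,  # assumed weights: vector heavier, as AD-05 prescribes
    w_graph: float = 0.3,
) -> list[Hit]:
    # 1. Vector search: keep the best `per_collection` points from every collection.
    vec_scores: dict[str, float] = {}
    for name, points in collections.items():
        ranked = sorted(points.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
        for point_id, vec in ranked[:per_collection]:
            vec_scores[f"{name}:{point_id}"] = cosine(query_vec, vec)

    # 2. Graph expansion: each 1-hop neighbor inherits the best score of a hit it touches.
    graph_scores: dict[str, float] = {}
    for source, score in vec_scores.items():
        for neighbor in graph.get(source, []):
            graph_scores[neighbor] = max(graph_scores.get(neighbor, 0.0), score)

    # 3. Score fusion: weighted sum; nodes reached only via the graph get vector score 0.
    hits = []
    for source in vec_scores.keys() | graph_scores.keys():
        v, g = vec_scores.get(source, 0.0), graph_scores.get(source, 0.0)
        hits.append(Hit(w_vec * v + w_graph * g, source, texts.get(source, ""), {"vector": v, "graph": g}))
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:top_k]
```

A node that is both a vector hit and a neighbor of another hit collects both terms, which is the "promotion" the pipeline describes. A node found only through the graph can still outrank weak vector matches.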

The one-hop expansion is the move that vector-only retrieval cannot make. If you search for a person, you get back not just their embedding but their connections: the companies they’re linked to, the projects they’ve touched, the interaction-log edges. That is context a similarity search would miss entirely.

The bridge at bridge/routes/projects.py reads from Mnemosyne when the USE_HYBRID_RETRIEVE=true flag is set, with a file-dump fallback if the layer is unavailable.

Three MCP tools

Because Mnemosyne runs as an MCP server, every agent session can call it through tools rather than importing Python:

```text
memory.semantic_search   — pure vector, no graph expansion
memory.cypher_query      — raw Kùzu query (for graph traversals you write yourself)
memory.hybrid_retrieve   — the full pipeline described above
```

The MCP surface means a session can retrieve from Mnemosyne at runtime without loading the library or connecting to the databases directly. The bridge issues these as ordinary tool calls.
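On the wire, an MCP tool call is a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The toy dispatcher below shows only that routing step. The handlers are hypothetical stubs, the argument names `query` and `top_k` are assumptions, and a real server would normally be built on an MCP SDK that also handles `initialize` and `tools/list`.

```python
import json
from typing import Any, Callable


# Hypothetical stubs standing in for the real Mnemosyne handlers.
def stub_semantic_search(query: str, top_k: int = 8) -> list[dict]:
    return [{"source": "crm_people:stub", "score": 1.0}][:top_k]


def stub_cypher_query(query: str) -> list[dict]:
    return []


def stub_hybrid_retrieve(query: str, top_k: int = 8) -> list[dict]:
    return stub_semantic_search(query, top_k)


TOOLS: dict[str, Callable[..., Any]] = {
    "memory.semantic_search": stub_semantic_search,
    "memory.cypher_query": stub_cypher_query,
    "memory.hybrid_retrieve": stub_hybrid_retrieve,
}


def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 `tools/call` request by dispatching on the tool name."""
    req = json.loads(raw)
    params = req.get("params", {})

    def reply(**body: Any) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), **body})

    if req.get("method") != "tools/call":
        return reply(error={"code": -32601, "message": "Method not found"})
    tool = TOOLS.get(params.get("name", ""))
    if tool is None:
        return reply(error={"code": -32602, "message": f"Unknown tool: {params.get('name')}"})
    result = tool(**params.get("arguments", {}))
    # MCP tool results carry a list of content items; the JSON goes in a text item here.
    return reply(result={"content": [{"type": "text", "text": json.dumps(result)}], "isError": False})
```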

Re-ingesting the CRM

The seed dataset for Mnemosyne is the CRM. As of mid-2026, ~/clawd/memory/entities/people/ holds approximately 475 person files. That count was taken after a garbage-stub sweep quarantined 176 files that were really prompt fragments or Plaud sentence fragments. To re-ingest:

```shell
python3.12 -m agents.ai_os.memory_ingest_crm
```

This walks every people/*.md file, embeds each one, upserts into the crm_people Qdrant collection, and creates a Person node in Kùzu with the slug as the primary key. The graph edges (KNOWS, WORKS_AT, INTERACTED_ON) are populated from the metadata frontmatter and interaction log in each CRM file.
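The per-file step can be sketched as "parse frontmatter and log, emit one node plus its edges." The CRM's real frontmatter keys are not shown here, so this sketch assumes hypothetical keys `name`, `company`, and `knows`, and log lines of the form `- YYYY-MM-DD: ...`. It stops before the database writes: a real ingest would upsert the node into Kùzu and the file's embedding into the crm_people collection.

```python
import re
from pathlib import Path

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n?(.*)", re.DOTALL)
LOG_LINE = re.compile(r"^- (\d{4}-\d{2}-\d{2}):", re.MULTILINE)  # assumed log format


def parse_frontmatter(block: str) -> dict:
    """Parse simple `key: value` and `key: [a, b]` lines (not full YAML)."""
    meta: dict = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return meta


def person_record(path: Path, text: str) -> tuple[dict, list[tuple[str, str, str]]]:
    """Return a Person node keyed by slug plus (from, REL, to) edge triples."""
    slug = path.stem  # the filename slug is the primary key
    match = FRONTMATTER.match(text)
    meta, body = (parse_frontmatter(match.group(1)), match.group(2)) if match else ({}, text)
    node = {"slug": slug, "name": meta.get("name", slug)}

    knows = meta.get("knows", [])
    if isinstance(knows, str):  # a single unbracketed value
        knows = [knows]
    edges = [(slug, "KNOWS", other) for other in knows]
    if meta.get("company"):
        edges.append((slug, "WORKS_AT", meta["company"]))
    edges += [(slug, "INTERACTED_ON", day) for day in LOG_LINE.findall(body)]
    return node, edges
```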

Why local and embedded

The design decision to use local embedded databases rather than hosted services was deliberate:

Cost. hybrid_retrieve() runs at $0 per query. A hosted vector service holding 475+ people, thousands of project chunks, and hundreds of ingested Plaud recordings would cost real money at agent query volumes.

Latency. Kùzu’s embedded query time is sub-millisecond for 1-hop traversals. No network round-trip means Mnemosyne can be called inline from a heartbeat without meaningfully slowing it down.

Privacy. CRM data includes interaction notes, contact details, and domain-sensitive context that should not leave the machine.

The off-ramp is documented in the mnemosyne PRD: if the graph ever outgrows embedded Kùzu (measured by query latency degrading past 100ms), the migration path is to a self-hosted Neo4j-compatible instance. The schema is designed to make that migration mechanical.
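The 100ms trigger is only useful if something measures it. Here is a minimal sketch of such a check, assuming the PRD threshold is applied to a 95th-percentile latency over repeated runs. The percentile, sample count, and function names are assumptions, not documented behavior.

```python
import statistics
import time
from typing import Callable

LATENCY_BUDGET_MS = 100.0  # off-ramp threshold stated in the PRD


def p95_latency_ms(run_query: Callable[[], object], samples: int = 50) -> float:
    """Time `samples` calls of run_query and return the 95th-percentile latency in ms."""
    if samples < 2:
        raise ValueError("need at least two samples for a percentile")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(timings, n=20, method="inclusive")[-1]


def should_migrate(run_query: Callable[[], object], samples: int = 50) -> bool:
    return p95_latency_ms(run_query, samples) > LATENCY_BUDGET_MS
```

In practice `run_query` would wrap a representative 1-hop Kùzu traversal against the live graph, run on a schedule so the drift toward the threshold is visible well before it is crossed.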

What agents gain

Before Mnemosyne, a multi-hop question like “which contacts know someone at an HVAC company with retiring owners in the Mountain West?” was structurally impossible — vector similarity has no concept of “two hops away” or “knows.” With the graph layer, that becomes a Cypher traversal.
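Here is what that traversal could look like, assuming a hypothetical schema: an `Organization` node table with `industry`, `region`, and `owner_retiring` properties, reached via the WORKS_AT edge. The Cypher string is what you might pass to memory.cypher_query. The pure-Python function computes the same two-hop join on toy dictionaries so the logic can be checked without a database.

```python
from dataclasses import dataclass

# Hypothetical Kùzu Cypher; labels and property names are assumptions.
TWO_HOP_CYPHER = """
MATCH (c:Person)-[:KNOWS]->(p:Person)-[:WORKS_AT]->(o:Organization)
WHERE o.industry = 'HVAC' AND o.owner_retiring = true AND o.region = 'Mountain West'
RETURN DISTINCT c.slug, p.slug, o.name
"""


@dataclass(frozen=True)
class Org:
    name: str
    industry: str
    region: str
    owner_retiring: bool


def warm_paths(
    knows: dict[str, set[str]],  # contact slug -> slugs they know (hop 1)
    works_at: dict[str, str],  # person slug -> org id (hop 2)
    orgs: dict[str, Org],
) -> set[tuple[str, str, str]]:
    """Return (contact, intermediary, org name) for every qualifying two-hop path."""
    paths = set()
    for contact, acquaintances in knows.items():
        for person in acquaintances:
            org = orgs.get(works_at.get(person, ""))
            if org and org.industry == "HVAC" and org.owner_retiring and org.region == "Mountain West":
                paths.add((contact, person, org.name))
    return paths
```

Vector similarity alone cannot answer this, because the answer (the contact) need not share any vocabulary with the query. The match lives entirely in the edges.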

More practically: when the executive assistant drafts a follow-up email, it can call hybrid_retrieve("Sam Davenport scheduling this week") and get back not just Sam’s CRM entry but any adjacent calendar entries, recent interactions, and linked project context — in one call, ranked by relevance to the query.

That is the difference between dump-and-pray and memory that knows what it’s looking for.
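The consuming side of that executive-assistant call could be as simple as packing the ranked hits into a fixed context budget instead of the ~40KB dump. This is a hypothetical sketch. `pack_context`, the 4000-character default, and the `[source] text` layout are illustrative choices, and the `Hit` here carries only the fields it needs.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    score: float
    source: str
    text: str


def pack_context(hits: list[Hit], budget_chars: int = 4000) -> str:
    """Greedily keep the highest-scoring hits that fit within budget_chars."""
    parts: list[str] = []
    used = 0
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        block = f"[{hit.source}] {hit.text}"
        cost = len(block) + (1 if parts else 0)  # +1 for the joining newline
        if used + cost > budget_chars:
            continue  # too big; a smaller, lower-ranked hit may still fit
        parts.append(block)
        used += cost
    return "\n".join(parts)
```

Tagging each block with its source lets the model cite which CRM entry or interaction a statement came from.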

Next: Why hybrid retrieval beats vector-only — the concrete case for the graph layer using the CRM and open-loops as examples.