Why Hybrid Retrieval Beats Vector-Only

Tier 3 · Real Builds · 7 min read

Before this, read:

Mnemosyne: the hybrid memory layer — what the layer is before you argue for its design

Vector search is genuinely good. You embed a query, embed your documents, rank by cosine similarity, return the top-K. For “what does this document say about X?” it works. For a lot of agent retrieval tasks, it is the right tool.
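A minimal sketch of that embed-and-rank loop, using toy three-dimensional vectors in place of real embeddings. The document names and vector values are made up for illustration; a real system would get its vectors from an embedding model and store them in a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" (illustrative values only).
docs = {
    "hvac_notes": [0.9, 0.1, 0.0],
    "mortgage_roadmap": [0.1, 0.9, 0.2],
    "utah_trip": [0.7, 0.2, 0.6],
}

hits = top_k([1.0, 0.0, 0.1], docs, k=2)
for doc_id, score in hits:
    print(f"{doc_id}: {score:.3f}")
```

Note what the ranking has access to: one vector per document and nothing else. No relationship between two documents enters the score.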

The problems start when you need to answer questions that are inherently relational.

What vector similarity cannot do

Consider three questions an agent might need to answer:

“Who should I ask about HVAC deals in Utah?” Vector search can find people whose descriptions mention HVAC or Utah. It cannot tell you which of your contacts has a direct connection to someone in that space, because that relationship is not in any one person’s embedding. It’s an edge between two nodes.
“What open loops are blocked on Felix?” Say you have 15 open loops in the system. A vector search for “Felix” will score Felix’s CRM entry highly and probably miss that three open loops have him in their metadata, because those loops aren’t in the same embedding space as the person record.
“Show me everything the system learned about mnemosyne in April.” Pure similarity retrieval has no notion of time. Timestamps on graph nodes (recorded_at today, plus valid_from once the bi-temporal model ships) give you temporal slicing that a vector distance score cannot.

These are not edge cases. They come up constantly in a personal agent system that tracks people, projects, commitments, and interactions across multiple domains. The common structure: you need to cross the boundary between two kinds of things and retrieve based on how they’re related, not just how similar their text is.

The 1-hop expansion in practice

Mnemosyne’s hybrid_retrieve() adds graph expansion on top of vector ranking. When a hit maps to a node in the Kùzu graph, the system runs a 1-hop traversal — follow every edge from that node one step, collect adjacent nodes, and promote their scores based on edge weight and recency.

What this looks like concretely: you search for “Felix Vivanco consulting” and the vector step returns Felix’s CRM entry. The graph step then finds:

A MENTIONED_IN edge to an open loop about the Subefy Dropbox review
A WORKS_AT edge to Vivanco Mortgage
An INTERACTED_ON edge dated 2026-06-07 (the mortgage roadmap meeting)

All three of those nodes get score-boosted and appear in the returned hits. Without the graph step, you’d get Felix’s CRM entry and nothing else.
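The expansion step can be sketched in plain Python. This is not Mnemosyne's actual code: the edge tuple layout, the exponential recency decay, the half_life_days and boost parameters, the edge weights, and every date except the 2026-06-07 meeting are assumptions chosen for illustration. The article only says that neighbor scores are promoted by edge weight and recency.

```python
from datetime import date

# Hypothetical edge list: (source, edge_type, target, weight, edge_date).
EDGES = [
    ("person:felix", "MENTIONED_IN", "loop:subefy-dropbox-review", 1.0, date(2026, 6, 1)),
    ("person:felix", "WORKS_AT", "org:vivanco-mortgage", 0.8, date(2026, 1, 15)),
    ("person:felix", "INTERACTED_ON", "meeting:mortgage-roadmap", 0.6, date(2026, 6, 7)),
    # Two hops away from Felix; a 1-hop expansion must not reach it.
    ("org:vivanco-mortgage", "EMPLOYS", "person:someone-else", 0.5, date(2026, 1, 15)),
]

def expand_one_hop(hits, edges, today, half_life_days=90.0, boost=0.5):
    """Merge vector hits with their 1-hop graph neighbors.

    hits maps node_id -> vector score. Each neighbor gets a promoted score
    derived from the hit it hangs off, the edge weight, and the edge's age.
    """
    results = dict(hits)
    for src, _edge_type, dst, weight, edge_date in edges:
        if src not in hits:
            continue  # only follow edges that leave a direct vector hit
        age_days = (today - edge_date).days
        recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
        promoted = hits[src] * boost * weight * recency
        results[dst] = max(results.get(dst, 0.0), promoted)
    return results

results = expand_one_hop({"person:felix": 0.9}, EDGES, today=date(2026, 6, 30))
for node_id, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node_id}: {score:.3f}")
```

This sketch follows outgoing edges only. Whether the real traversal also follows incoming edges is not stated here, so treat the direction as one more assumption.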

The CRM as a graph seed

The CRM is the most obvious case for graph-backed retrieval. Consider what a flat embedding of a person file can and can’t represent:

Vector can represent: name, role, how you met, what you discussed, key facts.

Vector cannot represent: the path from this person to another person through a shared company, the degree of separation between two contacts, the fact that four separate open loops all depend on a response from this person.

When Mnemosyne ingests the CRM (via python3.12 -m agents.ai_os.memory_ingest_crm), each person file becomes both an embedding in the crm_people collection AND a Person node in Kùzu with edges populated from the metadata. The graph edges are where the relationships live.
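To make the dual write concrete, here is a rough sketch of turning one person record into both an embedding payload and a graph node with edges. The record fields (slug, name, role, notes, company, open_loops), the ID prefixes, and the ingest_person helper are hypothetical; the real person-file schema and ingest code are not shown in this article.

```python
from dataclasses import dataclass, field

@dataclass
class IngestResult:
    embed_text: str   # text that would be sent to the embedding model
    node: dict        # properties for the Person node
    edges: list = field(default_factory=list)  # (source_id, edge_type, target_id)

def ingest_person(person: dict) -> IngestResult:
    """Split one person record into a vector payload and graph structure."""
    person_id = f"person:{person['slug']}"

    # Free text goes to the vector side.
    parts = [person["name"], person.get("role"), person.get("notes")]
    embed_text = ". ".join(p for p in parts if p)

    # Structured metadata goes to the graph side.
    node = {"id": person_id, "label": "Person", "name": person["name"]}
    edges = []
    if person.get("company"):
        edges.append((person_id, "WORKS_AT", f"org:{person['company']}"))
    for loop_id in person.get("open_loops", []):
        edges.append((person_id, "MENTIONED_IN", f"loop:{loop_id}"))
    return IngestResult(embed_text, node, edges)

result = ingest_person({
    "slug": "felix-vivanco",
    "name": "Felix Vivanco",
    "company": "vivanco-mortgage",
    "open_loops": ["subefy-dropbox-review"],
})
print(result.node)
print(result.edges)
```

The point of the split is that the same source file feeds two stores: prose that benefits from similarity search, and metadata that only makes sense as edges.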

This is why the mnemosyne README calls the CRM the “seed dataset” — it’s not the most complex use of the graph, but it’s the one that exists right now and that proves the retrieval path works.

The open-loops pairing

The CLAUDE.md loops doctrine says: “open-loop entries that are machine-checkable conditions get a paired watcher.” Mnemosyne extends this idea to retrieval: open loops that reference a person or project should be graph-linked to those entities so that searching for the person or project surfaces the pending loops automatically.

When you query for someone who has three pending open loops — a draft email waiting for approval, a follow-up call to schedule, a Dropbox review to complete — a vector search returns their CRM entry and maybe their most recent interaction note. A graph query returns all three loops through their MENTIONED_IN or BLOCKS edges.
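The loop lookup reduces to filtering a node's edges by type. A small sketch, assuming edges are stored as (source, edge_type, target) tuples pointing from the person to the loop; the loop IDs, the edge direction, and the pending_loops helper are illustrative, not the actual schema.

```python
# Hypothetical edges for one contact with three pending loops.
EDGES = [
    ("person:felix", "BLOCKS", "loop:draft-email-approval"),
    ("person:felix", "MENTIONED_IN", "loop:follow-up-call"),
    ("person:felix", "MENTIONED_IN", "loop:dropbox-review"),
    ("person:felix", "WORKS_AT", "org:vivanco-mortgage"),
]

def pending_loops(person_id, edges, loop_edge_types=("MENTIONED_IN", "BLOCKS")):
    """Return loop IDs reachable from person_id over loop-bearing edge types."""
    return sorted(
        dst
        for src, edge_type, dst in edges
        if src == person_id
        and edge_type in loop_edge_types
        and dst.startswith("loop:")
    )

loops = pending_loops("person:felix", EDGES)
print(loops)
```

All three loops come back regardless of how their text is worded, because the match is on structure rather than on similarity to the query.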

The executive assistant can ask “what’s outstanding with Felix?” and get a complete picture from a single hybrid_retrieve() call. That’s the practical payoff.

The dead-code audit that forced this

The argument for adding the graph layer was not theoretical. In April 2026, an audit found that agents/ai_os/vector_search.py, 432 lines of vector indexing and search code, was being run by a cron job to re-index every two hours and was called by zero agents. The retrieval path bypassed it entirely.

The diagnosis: vector-only retrieval was insufficient for the actual queries the agents needed to make, so agents fell back to dumping files directly into the system prompt. That’s the failure mode. When vector search doesn’t answer the real question, you don’t build a better vector search — you build dump-and-pray at scale.

Vector alone (what existed)

Substring match across 372+ markdown files. Re-indexed every 2 hours. Called by zero agents. Everything loaded raw into the system prompt.

Hybrid retrieval (what Mnemosyne provides)

Semantic search + 1-hop graph expansion across CRM, projects, and Plaud. Ranked hits with source provenance. Single call from any agent or MCP tool.

The honest status

As of mid-2026, Mnemosyne v1 is operational: Kùzu + Qdrant installed, CRM ingested, hybrid_retrieve() live, MCP tools registered, and the bridge reading from it behind a feature flag. The Obsidian vault ingest is deferred (blocked on iCloud permissions in the agent session), and the researcher-emitted triples that would make the graph truly rich are a v2 item.

The bi-temporal fact model (tracking when something became true vs. when it was recorded) is in the PRD but not shipped yet. That matters for questions like “who did JD talk to in April about X?” — right now the recorded_at timestamps are on Kùzu nodes but the full valid_from/valid_to bi-temporal model is v3.
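The difference between the two time axes is easier to see in code. The sketch below assumes a simple Fact record carrying valid_from, valid_to, and recorded_at; the field layout beyond those three names, the TALKED_TO predicate, and the sample data are invented for illustration and do not reflect the PRD's actual design.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date           # when it became true in the world
    valid_to: Optional[date]   # None means still true
    recorded_at: date          # when the system learned it

def true_during(facts, start, end):
    """Facts whose validity interval overlaps [start, end]."""
    return [
        f for f in facts
        if f.valid_from <= end and (f.valid_to is None or f.valid_to >= start)
    ]

def recorded_during(facts, start, end):
    """Facts the system ingested within [start, end]."""
    return [f for f in facts if start <= f.recorded_at <= end]

# A conversation that happened in April but was only ingested in May.
facts = [
    Fact("person:jd", "TALKED_TO", "person:contact-a",
         valid_from=date(2026, 4, 10), valid_to=date(2026, 4, 10),
         recorded_at=date(2026, 5, 2)),
]

april = (date(2026, 4, 1), date(2026, 4, 30))
print(len(true_during(facts, *april)))      # matched on valid time
print(len(recorded_during(facts, *april)))  # missed on recorded time
```

With recorded_at alone, a question about what happened in April silently becomes a question about what was ingested in April. Those answers diverge whenever ingestion lags the event.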

The system is better than dump-and-pray. It is not yet the Roman Forum JD described in the README — the version where the researcher emits triples, ACQUISITOR traverses deal networks, and every agent compounds on top of a shared knowledge graph. Getting from here to there is what v2 through v5 are for.

Next: The CRM: every person becomes an entity — how the auto-CRM utility works, how entity files are structured, and why 475 people are in the graph.