Mnemosyne
Multimodal personal knowledge base — Robert's curated content across many domains, retrieved through a content-type-aware MCP surface.
- MCP server name: mnemosyne
- Prompt snippet: prompts/tools/mnemosyne.md
- Project repo: /home/robert/git/mnemosyne
What It Is
Mnemosyne is "the memory of everything you know" — Robert's content-type-aware multimodal knowledge management system built on Neo4j vector storage and Qwen3-VL embeddings. Unlike a generic vector store, Mnemosyne knows what kind of thing a document is (a novel, a textbook, an album, a journal entry, a business proposal) and adjusts chunking, embedding, and retrieval accordingly.
It is a retrieval surface, not a synthesis engine. Tools return ranked evidence — chunks plus metadata. The calling agent reads the chunks and forms the answer, citing chunk UIDs back so Robert can trace what informed the response.
Concepts
Library — the top-level container. Each library has a library_type that drives chunking, embedding, and re-ranking strategy.
Collection — a named group of items inside a library (a novel series, a multi-volume manual).
Item — an indexed document or file. Only items with embedding_status = "completed" appear in search results.
Chunk — a text segment of an item. search returns a text_preview (~500 chars); use get_chunk for the full text.
Library Types
| library_type | Content |
|---|---|
| fiction | Novels, short stories. Cover art available. |
| nonfiction | General non-fiction prose. |
| technical | Manuals, textbooks, docs. Diagrams and code-like content. |
| music | Lyrics, liner notes, album artwork. |
| film | Scripts, synopses, stills. |
| art | Catalogs, descriptions, the artwork itself. |
| journal | Personal entries; temporal/reflective. |
| business | Proposals, marketing, sales, strategy. Commercial context. |
| finance | Statements, tax, market commentary. Quote figures exactly. |
Scoping queries to the right library_type matters. A search for "Stoic philosophy" against the finance library returns useless results.
MCP Tools
Recommended workflow
list_libraries
→ search(query, library_type=..., library_uid=...)
→ get_chunk(chunk_uid) # only when text_preview is insufficient
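A minimal sketch of the search and get_chunk steps of this workflow, under stated assumptions: `call_tool` is a hypothetical stand-in for whatever MCP client the agent uses, `retrieve_evidence` is an illustrative name, and the `"text"` key on the get_chunk result is assumed because the response shape is not documented here.

```python
from typing import Any, Callable

# Hypothetical MCP client hook: takes a tool name and an arguments dict,
# returns the tool's decoded JSON result.
ToolCaller = Callable[[str, dict], dict]


def retrieve_evidence(call_tool: ToolCaller, query: str, library_type: str,
                      need_full_text: bool = False, top_n: int = 3) -> list:
    """Run a scoped search; fetch full chunk text only when asked to."""
    result: dict[str, Any] = call_tool(
        "search", {"query": query, "library_type": library_type}
    )
    evidence = []
    for candidate in result["candidates"][:top_n]:
        entry = {
            "chunk_uid": candidate["chunk_uid"],  # kept so the answer can cite it
            "item_title": candidate["item_title"],
            "text": candidate["text_preview"],
        }
        if need_full_text:
            # Assumption: get_chunk returns the full text under a "text" key.
            full = call_tool("get_chunk", {"chunk_uid": candidate["chunk_uid"]})
            entry["text"] = full["text"]
        evidence.append(entry)
    return evidence
```

Keeping `need_full_text` off by default mirrors the guidance above: the preview is usually enough to judge relevance, and the extra call is made only when it is not.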
search
Hybrid retrieval: vector + full-text + concept-graph candidates fused by RRF (Reciprocal Rank Fusion), with optional Synesis re-ranking.
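For readers unfamiliar with RRF, the fusion step can be sketched as follows. This is the textbook formula (each list contributes 1 / (k + rank) per candidate), not Mnemosyne's actual implementation; the function name and the constant k = 60 are assumptions, since this document does not state the value used.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict, k: int = 60) -> list:
    """Fuse best-first lists of chunk UIDs into one scored ranking.

    `rankings` maps a strategy name ("vector", "fulltext", "graph") to a
    list of chunk UIDs ordered best-first.
    """
    scores = defaultdict(float)
    for ranked_uids in rankings.values():
        for rank, uid in enumerate(ranked_uids, start=1):
            scores[uid] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by UID for determinism.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


fused = reciprocal_rank_fusion({
    "vector": ["a", "b", "c"],
    "fulltext": ["b", "a"],
    "graph": ["c", "b"],
})
print([uid for uid, _ in fused])  # ['b', 'a', 'c']
```

A chunk that appears in several lists, even at middling ranks, outranks one that tops a single list. That is why "b" wins here.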
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The search query |
| library_uid | str \| None | None | Restrict to one library by UID |
| library_type | str \| None | None | Restrict by library type (table above) |
| collection_uid | str \| None | None | Restrict to one collection by UID |
| limit | int | 20 | Max candidates to return |
| rerank | bool | True | Apply Synesis re-ranking |
| include_images | bool | True | Include matching images in the response |
| search_types | list[str] \| None | ["vector", "fulltext", "graph"] | Which retrieval strategies to run |
Returns:
{
"query": "...",
"candidates": [
{
"chunk_uid": "...",
"item_uid": "...",
"item_title": "...",
"library_type": "...",
"text_preview": "... (~500 chars) ...",
"score": 0.92,
"source": "vector|fulltext|graph"
}
],
"images": [...],
"total_candidates": 42,
"search_time_ms": 85,
"reranker_used": true,
"reranker_model": "...",
"search_types_used": ["vector", "fulltext", "graph"]
}
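A short sketch of turning that response into citation lines an agent could attach to its answer. It reads only fields shown in the response above; the function name and the `min_score` cutoff are illustrative assumptions, since the score scale is not specified here.

```python
def cite_candidates(response: dict, min_score: float = 0.5) -> list:
    """Return one citation line per candidate at or above min_score."""
    lines = []
    for c in response.get("candidates", []):
        if c["score"] >= min_score:
            lines.append(
                f'{c["item_title"]} [{c["chunk_uid"]}] '
                f'({c["source"]}, score {c["score"]:.2f})'
            )
    return lines


example = {
    "candidates": [
        {"chunk_uid": "ch_1", "item_title": "Meditations",
         "score": 0.92, "source": "vector"},
        {"chunk_uid": "ch_2", "item_title": "Tax Notes",
         "score": 0.10, "source": "fulltext"},
    ]
}
print(cite_candidates(example))  # ['Meditations [ch_1] (vector, score 0.92)']
```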
get_chunk
Full text of a single chunk by UID. Use when text_preview is insufficient.
| Parameter | Type | Description |
|---|---|---|
| chunk_uid | str | The chunk UID from a search result |
list_libraries
Enumerate libraries the caller is authorized to read.
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | int | 50 | Max libraries (capped at 200) |
| offset | int | 0 | Pagination offset |
list_collections
Enumerate collections, optionally filtered to one library.
| Parameter | Type | Default | Description |
|---|---|---|---|
| library_uid | str \| None | None | Filter to one parent library |
| limit | int | 50 | Max collections (capped at 200) |
| offset | int | 0 | Pagination offset |
list_items
Enumerate indexed documents or files. Check embedding_status — only "completed" items appear in search.
| Parameter | Type | Default | Description |
|---|---|---|---|
| collection_uid | str \| None | None | Filter to one collection |
| library_uid | str \| None | None | Filter to one library |
| limit | int | 50 | Max items (capped at 200) |
| offset | int | 0 | Pagination offset |
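The limit/offset pattern shared by the list tools can be sketched as a generator. `call_tool` is a hypothetical MCP client hook, and the `"items"` response key is an assumption because the list_items response shape is not documented here; only `embedding_status` comes from this document.

```python
from typing import Callable, Iterator


def iter_items(call_tool: Callable[[str, dict], dict],
               page_size: int = 50, **filters) -> Iterator[dict]:
    """Yield every item visible to the caller, one page at a time."""
    offset = 0
    while True:
        page = call_tool(
            "list_items", {"limit": page_size, "offset": offset, **filters}
        )
        items = page["items"]  # Assumption: results live under an "items" key.
        yield from items
        if len(items) < page_size:
            break  # a short page means there is nothing further to fetch
        offset += page_size


def searchable_items(items) -> list:
    """Keep only items that can actually appear in search results."""
    return [i for i in items if i.get("embedding_status") == "completed"]
```

Comparing `len(searchable_items(...))` against the full count is a quick way to see how much of a library is still being embedded.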
get_health
Pallas-compatible health probe. No auth required.
{
"status": "ok | degraded | error",
"checks": {
"neo4j": {"status": "ok", "duration_ms": 2.1},
"s3": {"status": "ok", "duration_ms": 8.4},
"embedding": {"status": "ok", "model": "...", "duration_ms": 0.3}
}
}
Neo4j or S3 failures → error (critical). Missing or unconfigured embedding model → degraded (non-critical).
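The severity rule above can be sketched as a small aggregation over the `checks` object. This illustrates the stated rule and is not the server's code; `overall_status` and `CRITICAL_CHECKS` are hypothetical names.

```python
CRITICAL_CHECKS = {"neo4j", "s3"}  # a failure here makes the probe "error"


def overall_status(checks: dict) -> str:
    """Collapse per-dependency checks into ok / degraded / error."""
    status = "ok"
    for name, check in checks.items():
        if check.get("status") == "ok":
            continue
        if name in CRITICAL_CHECKS:
            return "error"  # critical dependency down: stop immediately
        status = "degraded"  # non-critical (e.g. embedding model missing)
    return status
```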
Authentication
All tools except get_health require a Bearer token in the Authorization header. Three credential types:
| Type | Issued by | Lifetime | Scope |
|---|---|---|---|
| Opaque MCPToken | Mnemosyne admin | Long-lived (optional expiry) | allowed_libraries list on the token row; per-tool ACL available |
| Per-turn JWT (iss=daedalus) | Daedalus chat | ≤10 minutes | libs claim (list of Library UIDs) |
| Team JWT (iss=mnemosyne, typ=team) | Mnemosyne | 10-year lifetime | Resolved live from TeamWorkspaceAssignment → Neo4j Library.workspace_id. Revoked via active_jti rotation. |
Every authenticated request resolves to a resolved_libraries list — the set of Library UIDs the caller may read. Tools enforce this list at the query layer. Empty list = authenticated but sees nothing (fail-closed). No auth = also fail-closed.
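The fail-closed rule can be sketched as a pure function over the resolved list. This is an illustration under assumptions: `authorized_scope` is a hypothetical name, and the real enforcement happens at the query layer, which this document does not show.

```python
from typing import Optional


def authorized_scope(resolved_libraries: Optional[list],
                     requested_library_uid: Optional[str] = None) -> list:
    """Return the Library UIDs a query may touch; [] means return nothing."""
    if not resolved_libraries:
        return []  # no auth (None) or an empty list: both fail closed
    if requested_library_uid is None:
        return list(resolved_libraries)  # unscoped query sees all allowed libraries
    if requested_library_uid in resolved_libraries:
        return [requested_library_uid]
    return []  # asking for a library outside the list yields nothing
```

Returning an empty scope rather than raising keeps the behavior described above: an unauthorized caller sees no results, not an error.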
Who Uses Mnemosyne
All regular agents have access via team-based authentication. Each team's token resolves to the libraries appropriate for that team's domain:
- Personal team: all personal-relevant libraries (fiction, nonfiction, technical, music, film, art, journal, business, finance). Each agent self-filters by library_type based on their domain.
- Work team: business-focused libraries; supporting reference (Ann reaches for nonfiction; Alan reaches for business strategy material).
- Engineering team: technical libraries and reference (Harper for build references; Scotty for runbooks and incident records).
Within a team, each agent is responsible for searching the right library_type for their work — there's no per-agent ACL inside a team token. Searching the wrong library type returns useless results, not an error.
What It's Good For
- Searching Robert's curated knowledge across libraries — books, music, journal entries, business documents, reference material
- Multimodal queries — find a book cover, an album sleeve, a screenshot alongside text
- "Did I read something about X" / "what did I write about Y on what date"
- Pulling source material Robert has actually curated, rather than guessing from training data
- Following graph relationships through the underlying Neo4j vector store (Author → Book → Topic; Artist → Album → Track)
What It's Not Good For
- General web knowledge — that's Argos
- Anything not yet ingested — Mnemosyne only knows what's been indexed
- Synthesis or "give me the answer" — Mnemosyne returns chunks; the calling agent synthesizes
- Real-time information (status, news) — content is ingested, not live
- Writing — Mnemosyne is a retrieval surface; ingestion happens through Daedalus and admin tooling
Known Gotchas
- It's retrieval, not answers. Always cite chunk_uid so Robert can verify.
- library_type matters. Searching the wrong library type returns nothing useful. Use list_libraries if uncertain.
- text_preview is ~500 chars. Often enough for the agent to decide whether the chunk is relevant; not enough for synthesis. Call get_chunk for the full text only when you need it.
- Only embedding_status = "completed" items appear in search. A library with items in progress will show fewer results than list_items suggests.
- Empty results may mean the index isn't ready in this environment. get_health will report degraded if the embedding model is missing. Surface that, don't silently confabulate.
- Fail-closed auth. No token = no results. Empty allowed-library list = also no results. Distinguish "I searched and found nothing" from "I'm not authorized"; list_libraries returning an empty set is the tell for the latter.
- include_images=True by default. When images aren't relevant, set it to False to reduce noise and tokens.
- Re-ranking has a cost. rerank=True (default) gives better precision but adds latency. For exploratory queries, rerank=False is fine; for the query that produces the final answer, leave reranking on.
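When a search comes back empty, the checks described in these gotchas can be combined into one decision. The sketch below takes values the agent has already extracted (the list returned by list_libraries and the `status` field from get_health) so that it assumes nothing about response shapes; the function name and return labels are hypothetical.

```python
def explain_empty_search(libraries: list, health_status: str) -> str:
    """Classify why a search returned no candidates."""
    if not libraries:
        return "not_authorized"   # empty library list is the tell for missing access
    if health_status != "ok":
        return "index_not_ready"  # e.g. "degraded" when the embedding model is missing
    return "no_matches"           # authorized and healthy: nothing was found
```

The agent should report whichever label applies, so that a missing index or missing access is never presented as "nothing found".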