Robert Helewka b7e0dc927f docs(personal): restructure bourdain docs to separate system prompt

Refactor documentation to distinguish character reference from AI
system prompt. Removed user context and persona definitions.
System prompt instructions moved to prompts/personal/bourdain.md.

2026-05-21 06:53:04 -04:00

9.0 KiB

Raw Blame History

Mnemosyne

Multimodal personal knowledge base — Robert's curated content across many domains, retrieved through a content-type-aware MCP surface.

MCP server name: mnemosyne
Prompt snippet: prompts/tools/mnemosyne.md
Project repo: /home/robert/git/mnemosyne

What It Is

Mnemosyne is "the memory of everything you know" — Robert's content-type-aware multimodal knowledge management system built on Neo4j vector storage and Qwen3-VL embeddings. Unlike a generic vector store, Mnemosyne knows what kind of thing a document is (a novel, a textbook, an album, a journal entry, a business proposal) and adjusts chunking, embedding, and retrieval accordingly.

It is a retrieval surface, not a synthesis engine. Tools return ranked evidence — chunks plus metadata. The calling agent reads the chunks and forms the answer, citing chunk UIDs back so Robert can trace what informed the response.

Concepts

Library — the top-level container. Each library has a library_type that drives chunking, embedding, and re-ranking strategy.

Collection — a named group of items inside a library (a novel series, a multi-volume manual).

Item — an indexed document or file. Only items with embedding_status = "completed" appear in search results.

Chunk — a text segment of an item. search returns a text_preview (~500 chars); use get_chunk for the full text.

Library Types

| `library_type` | Content |
| --- | --- |
| `fiction` | Novels, short stories. Cover art available. |
| `nonfiction` | General non-fiction prose. |
| `technical` | Manuals, textbooks, docs. Diagrams and code-like content. |
| `music` | Lyrics, liner notes, album artwork. |
| `film` | Scripts, synopses, stills. |
| `art` | Catalogs, descriptions, the artwork itself. |
| `journal` | Personal entries; temporal/reflective. |
| `business` | Proposals, marketing, sales, strategy. Commercial context. |
| `finance` | Statements, tax, market commentary. Quote figures exactly. |

Scoping queries to the right library_type matters. A search for "Stoic philosophy" against the finance library returns useless results.

MCP Tools

Recommended workflow

list_libraries
  → search(query, library_type=..., library_uid=...)
    → get_chunk(chunk_uid)    # only when text_preview is insufficient
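
The workflow above can be sketched in Python. This is a minimal illustration, not Mnemosyne's client code: `call_tool` is a hypothetical transport that sends a tool name and arguments to the MCP server and returns parsed JSON, and the response shapes for `list_libraries` (`{"libraries": [...]}`) and `get_chunk` (`{"text": "..."}`) are assumptions, since this page does not document them.

```python
from typing import Any, Callable

# Hypothetical transport: call_tool(tool_name, arguments) returns parsed JSON.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def retrieve(call_tool: ToolCaller, query: str, library_type: str,
             fetch_full_text: bool = False) -> list[dict[str, Any]]:
    """Walk list_libraries -> search -> (optionally) get_chunk for one query."""
    # ASSUMPTION: list_libraries returns {"libraries": [{"uid", "library_type"}, ...]}.
    libraries = call_tool("list_libraries", {}).get("libraries", [])
    if not any(lib.get("library_type") == library_type for lib in libraries):
        return []  # no library of that type is visible to this caller

    result = call_tool("search", {"query": query, "library_type": library_type})
    candidates = result.get("candidates", [])

    if fetch_full_text:
        for cand in candidates:
            # ASSUMPTION: get_chunk returns {"text": "..."}.
            chunk = call_tool("get_chunk", {"chunk_uid": cand["chunk_uid"]})
            cand["full_text"] = chunk.get("text", "")
    return candidates
```

Leaving `fetch_full_text` off by default mirrors the advice to call `get_chunk` only when `text_preview` is insufficient.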

`search`

Hybrid retrieval: vector + full-text + concept-graph candidates fused by RRF (Reciprocal Rank Fusion), with optional Synesis re-ranking.
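
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the general technique: each retrieval strategy contributes `1 / (k + rank)` for every chunk it returns, and the contributions are summed. It is an illustration of RRF in general, not Mnemosyne's implementation; the function name and the constant `k = 60` (a commonly used value) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk UIDs into one list, best first.

    rankings maps a strategy name ("vector", "fulltext", "graph") to the
    chunk UIDs that strategy returned, ordered from best to worst.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_uids in rankings.values():
        for rank, uid in enumerate(ranked_uids, start=1):
            scores[uid] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by UID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A chunk that appears near the top of several lists outranks one that tops a single list, which is why fusing the three strategies tends to be more robust than any one alone.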

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | str | required | The search query |
| `library_uid` | str \| None | None | Restrict to one library by UID |
| `library_type` | str \| None | None | Restrict by library type (table above) |
| `collection_uid` | str \| None | None | Restrict to one collection by UID |
| `limit` | int | 20 | Max candidates to return |
| `rerank` | bool | True | Apply Synesis re-ranking |
| `include_images` | bool | True | Include matching images in the response |
| `search_types` | list[str] \| None | `["vector", "fulltext", "graph"]` | Which retrieval strategies to run |
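
One way to keep calls consistent with the parameters above is a small argument builder. The following is a sketch under assumptions: the `SearchArgs` class and its `to_arguments` method are hypothetical helpers, not part of Mnemosyne, and omitting an unset optional parameter so that the server applies its own default is an assumed behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Library types listed in the table earlier on this page.
LIBRARY_TYPES = {"fiction", "nonfiction", "technical", "music", "film",
                 "art", "journal", "business", "finance"}


@dataclass
class SearchArgs:
    """Hypothetical helper mirroring the documented `search` parameters."""
    query: str
    library_uid: str | None = None
    library_type: str | None = None
    collection_uid: str | None = None
    limit: int = 20
    rerank: bool = True
    include_images: bool = True
    search_types: list[str] | None = None  # None: let the server use its default

    def to_arguments(self) -> dict[str, Any]:
        if self.library_type is not None and self.library_type not in LIBRARY_TYPES:
            raise ValueError(f"unknown library_type: {self.library_type!r}")
        args: dict[str, Any] = {
            "query": self.query,
            "limit": self.limit,
            "rerank": self.rerank,
            "include_images": self.include_images,
        }
        # Send optional scoping parameters only when they are set.
        for key in ("library_uid", "library_type", "collection_uid", "search_types"):
            value = getattr(self, key)
            if value is not None:
                args[key] = value
        return args
```

For example, `SearchArgs("Stoic philosophy", library_type="nonfiction", include_images=False).to_arguments()` scopes the query to the right library type and drops images from the response.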

Returns:

{
  "query": "...",
  "candidates": [
    {
      "chunk_uid": "...",
      "item_uid": "...",
      "item_title": "...",
      "library_type": "...",
      "text_preview": "... (~500 chars) ...",
      "score": 0.92,
      "source": "vector|fulltext|graph"
    }
  ],
  "images": [...],
  "total_candidates": 42,
  "search_time_ms": 85,
  "reranker_used": true,
  "reranker_model": "...",
  "search_types_used": ["vector", "fulltext", "graph"]
}
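
Because callers are expected to cite chunk UIDs, a response like the one above can be reduced to citation strings. This is a minimal sketch: the `cite` function and its citation format are hypothetical, and it relies only on the `candidates` fields shown in the example response.

```python
from typing import Any


def cite(response: dict[str, Any], top_n: int = 3) -> list[str]:
    """Return citation strings for the top_n highest-scoring candidates."""
    ranked = sorted(response.get("candidates", []),
                    key=lambda c: c["score"], reverse=True)
    return [f'{c["item_title"]} [chunk_uid={c["chunk_uid"]}]' for c in ranked[:top_n]]
```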

`get_chunk`

Full text of a single chunk by UID. Use when text_preview is insufficient.

| Parameter | Type | Description |
| --- | --- | --- |
| `chunk_uid` | str | The chunk UID from a `search` result |

`list_libraries`

Enumerate libraries the caller is authorized to read.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `limit` | int | 50 | Max libraries (capped at 200) |
| `offset` | int | 0 | Pagination offset |

`list_collections`

Enumerate collections, optionally filtered to one library.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `library_uid` | str \| None | None | Filter to one parent library |
| `limit` | int | 50 | Max collections (capped at 200) |
| `offset` | int | 0 | Pagination offset |

`list_items`

Enumerate indexed documents or files. Check embedding_status — only "completed" items appear in search.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `collection_uid` | str \| None | None | Filter to one collection |
| `library_uid` | str \| None | None | Filter to one library |
| `limit` | int | 50 | Max items (capped at 200) |
| `offset` | int | 0 | Pagination offset |
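
The `limit`/`offset` pair on the list tools supports straightforward paging. The sketch below walks every page of `list_items` and then keeps only items that search can see. It rests on assumptions: `call_tool` is a hypothetical transport returning parsed JSON, and the `{"items": [...]}` response shape is not documented on this page.

```python
from typing import Any, Callable, Iterator

MAX_PAGE = 200  # documented cap on `limit`


def iter_items(call_tool: Callable[[str, dict[str, Any]], Any],
               library_uid: str, page_size: int = 50) -> Iterator[dict[str, Any]]:
    """Yield every item in a library, one page at a time."""
    page_size = min(page_size, MAX_PAGE)
    offset = 0
    while True:
        # ASSUMPTION: list_items returns {"items": [...]}.
        page = call_tool("list_items", {"library_uid": library_uid,
                                        "limit": page_size,
                                        "offset": offset}).get("items", [])
        yield from page
        if len(page) < page_size:  # a short page means there is nothing left
            return
        offset += page_size


def searchable(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only items whose embedding_status is "completed"."""
    return [item for item in items if item.get("embedding_status") == "completed"]
```

Comparing `len(searchable(items))` with `len(items)` gives a quick read on how much of a library is still being embedded.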

`get_health`

Pallas-compatible health probe. No auth required.

{
  "status": "ok | degraded | error",
  "checks": {
    "neo4j":     {"status": "ok", "duration_ms": 2.1},
    "s3":        {"status": "ok", "duration_ms": 8.4},
    "embedding": {"status": "ok", "model": "...", "duration_ms": 0.3}
  }
}

A Neo4j or S3 failure sets the overall status to error (critical). A missing or unconfigured embedding model sets it to degraded (non-critical).
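
The status rule can be expressed as a small function. This sketch only illustrates the documented rule; how the server actually computes the overall status is not shown on this page, and the function name and the treatment of any non-ok embedding check as degraded are assumptions.

```python
from typing import Any

CRITICAL_CHECKS = {"neo4j", "s3"}  # a failure here makes the service unusable


def overall_status(checks: dict[str, dict[str, Any]]) -> str:
    """Derive "ok", "degraded", or "error" from per-dependency checks."""
    failed = {name for name, check in checks.items() if check.get("status") != "ok"}
    if failed & CRITICAL_CHECKS:
        return "error"
    if failed:
        return "degraded"
    return "ok"
```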

Authentication

All tools except get_health require a Bearer token in the Authorization header. Three credential types:

| Type | Issued by | Lifetime | Scope |
| --- | --- | --- | --- |
| Opaque `MCPToken` | Mnemosyne admin | Long-lived (optional expiry) | `allowed_libraries` list on the token row; per-tool ACL available |
| Per-turn JWT (`iss=daedalus`) | Daedalus chat | ≤10 minutes | `libs` claim (list of Library UIDs) |
| Team JWT (`iss=mnemosyne`, `typ=team`) | Mnemosyne | 10-year lifetime | Resolved live from `TeamWorkspaceAssignment` → Neo4j `Library.workspace_id`. Revoked via `active_jti` rotation. |

Every authenticated request resolves to a resolved_libraries list: the set of Library UIDs the caller may read. Tools enforce this list at the query layer. An empty list means the caller is authenticated but sees nothing (fail-closed). A request with no credentials is likewise fail-closed.
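
The fail-closed rule can be illustrated with a scope check. This is a sketch of the idea, not the server's enforcement code: the `effective_scope` function is hypothetical, and representing "no auth" as `None` is an assumption.

```python
from __future__ import annotations


def effective_scope(resolved_libraries: list[str] | None,
                    requested_library_uid: str | None = None) -> list[str]:
    """Return the Library UIDs a query may touch; an empty list means none."""
    if not resolved_libraries:
        return []  # no credentials, or an empty grant: fail closed
    if requested_library_uid is None:
        return list(resolved_libraries)  # unscoped query sees everything granted
    # A scoped query is allowed only if the library is in the caller's grant.
    return [requested_library_uid] if requested_library_uid in resolved_libraries else []
```

Every path that lacks an explicit grant ends in an empty scope, so a missing check can never widen access.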

Who Uses Mnemosyne

All regular agents have access via team-based authentication. Each team's token resolves to the libraries appropriate for that team's domain:

Personal team — all personal-relevant libraries (fiction, nonfiction, technical, music, film, art, journal, business, finance). Each agent self-filters by library_type based on their domain.
Work team — business-focused libraries; supporting reference (Ann reaches for nonfiction; Alan reaches for business strategy material).
Engineering team — technical libraries and reference (Harper for build references; Scotty for runbooks and incident records).

Within a team, each agent is responsible for searching the right library_type for their work — there's no per-agent ACL inside a team token. Searching the wrong library type returns useless results, not an error.

What It's Good For

Searching Robert's curated knowledge across libraries — books, music, journal entries, business documents, reference material
Multimodal queries — find a book cover, an album sleeve, a screenshot alongside text
"Did I read something about X" / "what did I write about Y on what date"
Pulling source material Robert has actually curated, rather than guessing from training data
Following graph relationships in the underlying Neo4j graph (Author → Book → Topic; Artist → Album → Track)

What It's Not Good For

General web knowledge — that's Argos
Anything not yet ingested — Mnemosyne only knows what's been indexed
Synthesis or "give me the answer" — Mnemosyne returns chunks; the calling agent synthesizes
Real-time information (status, news) — content is ingested, not live
Writing — Mnemosyne is a retrieval surface; ingestion happens through Daedalus and admin tooling

Known Gotchas

It's retrieval, not answers. Always cite chunk_uid so Robert can verify.
library_type matters. Searching the wrong library type returns nothing useful. Use list_libraries if uncertain.
text_preview is ~500 chars. Often enough for the agent to decide whether the chunk is relevant; not enough for synthesis. Call get_chunk for the full text only when you need it.
Only embedding_status = "completed" items appear in search. A library with items in progress will show fewer results than list_items suggests.
Empty results may mean the index isn't ready in this environment. get_health will report degraded if the embedding model is missing. Surface that instead of silently confabulating.
Fail-closed auth. No token = no results. Empty allowed-library list = also no results. Distinguish "I searched and found nothing" from "I'm not authorized" — list_libraries returning an empty set is the tell for the latter.
include_images=True by default. When images aren't relevant, set it to False to reduce noise and tokens.
Re-ranking has a cost. rerank=True (default) gives better precision but adds latency. For exploratory queries, rerank=False is fine; for the query that produces the final answer, leave reranking on.
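
Several of the gotchas above come down to telling apart three causes of an empty search result. The sketch below shows one way to do that. It rests on assumptions: `call_tool` is a hypothetical transport returning parsed JSON, the `{"libraries": [...]}` shape for `list_libraries` is not documented on this page, and the returned messages are illustrative.

```python
from typing import Any, Callable


def diagnose_empty(call_tool: Callable[[str, dict[str, Any]], Any]) -> str:
    """Explain why a search came back empty, checking the cheapest causes first."""
    health = call_tool("get_health", {})
    if health.get("status") != "ok":
        return f"index not ready: health is {health.get('status')}"
    # ASSUMPTION: list_libraries returns {"libraries": [...]}.
    if not call_tool("list_libraries", {}).get("libraries"):
        return "not authorized: no readable libraries"
    return "searched and found nothing"
```

An agent can surface the returned string to Robert directly, which avoids presenting an authorization or indexing problem as a genuine lack of material.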
