mnemosyne

Author	SHA1	Message	Date
Robert Helewka	2a8a3d75b4	docs(readme): document operations + Daedalus integration endpoints Adds a "Running Mnemosyne" section with the three commands needed to operate the system: Django web app (gunicorn), MCP server (uvicorn on :22091), and Celery worker — with notes on the embedding queue that the Daedalus ingest task depends on. Adds the Ouranos host map (Portia / Ariel / Oberon / Nyx / Memcached), one-time setup commands (migrate, setup_neo4j_indexes, load_library_types), the Daedalus integration endpoints table, and the two new library types (business, finance) in the existing Library Types table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 06:27:46 -04:00
Robert Helewka	5527cf6bdb	feat(search,mcp): workspace-scope search and add get_health MCP tool Workspace scoping is the integration's security-critical property: an agent in workspace A must never see content from workspace B or from any global library, regardless of what the calling LLM tries. Adds `workspace_id` to SearchRequest with __post_init__ normalization that converts empty strings to None — so "" cannot slip through as a truthy filter at the Cypher boundary. Extracts the workspace scope clause to a single string and appends it to all five search queries (vector, fulltext-chunk, fulltext-concept, graph, image): ($workspace_id IS NULL AND lib.workspace_id IS NULL OR lib.workspace_id = $workspace_id) Either workspace-only or global-only — never both — and the operator precedence is bracketed so a refactor can't accidentally widen it. A test verifies the literal clause string for that exact reason. Adds `workspace_id` as a parameter to every MCP tool (`search`, `get_chunk`, `list_libraries`, `list_collections`, `list_items`). Deliberately undocumented in tool docstrings so the calling LLM is never told the parameter exists — it is system-injected by Daedalus's chat path and force-overwritten before reaching Mnemosyne. Mnemosyne also validates the value but the security guarantee is enforced upstream. Adds the `get_health` MCP tool per the Pallas health spec: returns ok / degraded / error after probing Neo4j, S3, and the embedding model registration. Used by Daedalus's existing health poller. Updates the server INSTRUCTIONS string to advertise the new tool and the two new library types (business, finance). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 06:27:32 -04:00
Robert Helewka	f2af28d96d	feat(api): add workspace + ingest REST endpoints for Daedalus Adds the REST API surface that Daedalus calls to manage workspace lifecycle and dispatch file ingestion. All endpoints under /library/api/: POST /workspaces/ create workspace (idempotent on workspace_id; library_type frozen) GET /workspaces/{workspace_id}/ workspace status with item/chunk counts DELETE /workspaces/{workspace_id}/ delete workspace + reachable content; concept-safe (orphan-only Concept GC; concepts referenced elsewhere are preserved) POST /ingest/ queue a file for ingest. Idempotent on (library, source_ref, hash): same triple → return existing job; new hash → supersede. GET /jobs/{job_id}/ poll job status POST /jobs/{job_id}/retry/ re-dispatch a failed job GET /jobs/?status=&library_uid= list recent jobs Workspace-Library lookup uses the unique workspace_id index added in the schema commit. Concept GC runs as a separate transaction after item/chunk delete so partial failures don't leave the global graph corrupted. Tests cover serializer validation, IngestJob ORM behavior, the (library, source_ref, hash) idempotency query pattern, and auth boundaries on every new endpoint. Cypher correctness is validated by manual end-to-end testing — no live Neo4j in unit tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 06:27:08 -04:00
Robert Helewka	c485a8560c	feat(ingest): add Daedalus cross-bucket S3 fetch + ingest_from_daedalus task Adds DAEDALUS_S3_* settings (read-only credentials for the Daedalus bucket) and a small `daedalus_s3.py` helper that fetches a file from Daedalus's bucket and writes it into Mnemosyne's bucket via default_storage. Adds the Celery task `library.tasks.ingest_from_daedalus`. Given an IngestJob row, it: 1. Resolves the target Library (by library_uid). 2. Supersedes a prior Item with the same source_ref but different content_hash by deleting the old Item + chunks first. 3. Fetches from Daedalus S3, copies into items/{item_uid}/original.{ext}. 4. Creates the Item node, links it to a default Collection. 5. Runs the existing EmbeddingPipeline.process_item. 6. Marks the job completed with chunks/concepts counts. Failures retry up to 3× with exponential backoff; final failure marks the job failed with the exception text. Routed to the embedding queue so single-worker setups must consume it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 06:26:48 -04:00
Robert Helewka	33658fbc8d	feat(library): add business + finance types, workspace_id, IngestJob Adds two new content-type-aware library types — `business` for proposals/marketing/strategy (used by the work-team agents) and `finance` for statements/tax/market commentary (used by Garth). Each ships with chunking config, embedding/reranker instructions, an LLM-context prompt that forbids fabricating financial figures, and a vision prompt. Adds a unique-indexed `workspace_id` property to `Library` so a node can be scoped to a Daedalus workspace. Null means a global library; non-null means workspace-scoped. Search Cypher (added in a later commit) enforces the boundary. Adds an `IngestJob` Django ORM model — separate from neomodel — that tracks asynchronous ingestion lifecycle (Daedalus → S3 → Celery → embedding pipeline) with idempotency on (library, source_ref, hash). Migration 0001_initial creates the table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 06:26:26 -04:00
Robert Helewka	81426327bf	feat(mcp): store MCP tokens as SHA-256 hashes instead of plaintext Replace plaintext token storage with SHA-256 hashes so leaked database contents cannot be used to authenticate. Plaintext is generated, shown once at creation time, and never persisted. - Add `hash_token()` helper and `MCPTokenManager.create_token()` that returns `(instance, plaintext)`. - Replace `token` field with indexed `token_hash`; look up bearers by hashing the incoming value. - Update dashboard, management command, and admin to surface plaintext only at creation. Disable admin "add" since it cannot reveal plaintext. - Migration drops the old `token` column and adds `token_hash`; pre-existing tokens are invalidated and must be reissued.	2026-04-27 09:01:36 -04:00
Robert Helewka	2df22941d2	feat: replace server-side RAG with MCP retrieval primitives - Remove Phase 4 RAG pipeline in favor of retrieval-only architecture - Add FastMCP server exposing search, get_chunk, list_libraries tools - Mount MCP endpoints (streamable HTTP + SSE) via Starlette in ASGI config - Update README to clarify Mnemosyne is a retrieval engine, not RAG - Let calling LLMs drive synthesis and iterative retrieval themselves	2026-04-26 15:34:26 -04:00
Robert Helewka	388b37e471	fix(search): require library match and preserve raw scores for RRF Replace OPTIONAL MATCH with MATCH for Library-Collection-Item paths to ensure results are properly scoped to libraries, and remove per-query score normalization since RRF fuses results by rank rather than score magnitude.	2026-04-26 06:35:11 -04:00
Robert Helewka	4a35aa126f	refactor(settings): replace DATABASE_URL with explicit DB env vars Replace the single `DATABASE_URL` connection string with individual environment variables (`APP_DB_NAME`, `APP_DB_USER`, `APP_DB_PASSWORD`, `DB_HOST`, `DB_PORT`) for more granular database configuration control.	2026-04-13 10:23:03 +00:00
Robert Helewka	634845fee0	feat: add Phase 3 hybrid search with Synesis reranking Implement hybrid search pipeline combining vector, fulltext, and graph search across Neo4j, with cross-attention reranking via Synesis (Qwen3-VL-Reranker-2B) `/v1/rerank` endpoint. - Add SearchService with vector, fulltext, and graph search strategies - Add SynesisRerankerClient for multimodal reranking via HTTP API - Add search API endpoint (POST /search/) with filtering by library, collection, and library_type - Add SearchRequest/Response serializers and image search results - Add "nonfiction" to library_type choices - Consolidate reranker stack from two models to single Synesis service - Handle image analysis_status as "skipped" when analysis is unavailable - Add comprehensive tests for search pipeline and reranker client	2026-03-29 18:09:50 +00:00
Robert Helewka	fb38a881d9	Add vision model support to LLM Manager admin and rename index for clarity	2026-03-29 17:03:59 +00:00
Robert Helewka	90db904959	Add vision analysis capabilities to the embedding pipeline - Introduced a new vision analysis service to classify, describe, and extract text from images. - Enhanced the Image model with fields for OCR text, vision model name, and analysis status. - Added a new "nonfiction" library type with specific chunking and embedding configurations. - Updated content types to include vision prompts for various library types. - Integrated vision analysis into the embedding pipeline, allowing for image analysis during document processing. - Implemented metrics to track vision analysis performance and usage. - Updated UI components to display vision analysis results and statuses in item details and the embedding dashboard. - Added migration for new vision model fields and usage tracking.	2026-03-22 15:14:34 +00:00
Robert Helewka	6585beed20	Add download functionality for items and images with presigned URLs	2026-03-22 12:08:44 +00:00
Robert Helewka	1379e0d425	Add logging configuration to prevent Celery from overriding Django's logging setup	2026-03-21 13:23:56 +00:00
Robert Helewka	99bdb4ac92	Add Themis application with custom widgets, views, and utilities - Implemented custom form widgets for date, time, and datetime fields with DaisyUI styling. - Created utility functions for formatting dates, times, and numbers according to user preferences. - Developed views for profile settings, API key management, and notifications, including health check endpoints. - Added URL configurations for Themis tests and main application routes. - Established test cases for custom widgets to ensure proper functionality and integration. - Defined project metadata and dependencies in pyproject.toml for package management.	2026-03-21 02:00:18 +00:00
Robert	e99346d014	Initial commit	2026-03-18 23:01:09 +00:00
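
Commit 5527cf6bdb describes two pieces that work together: an empty `workspace_id` is normalized to `None` in `SearchRequest.__post_init__`, and one bracketed scope clause is appended to every search query. The sketch below illustrates that idea under stated assumptions. The clause text is quoted from the commit message; the `query` field, the constant name `WORKSPACE_SCOPE_CLAUSE`, and the helper `in_scope` are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Optional

# Clause text as quoted in the commit message; the constant name is an assumption.
WORKSPACE_SCOPE_CLAUSE = (
    "($workspace_id IS NULL AND lib.workspace_id IS NULL "
    "OR lib.workspace_id = $workspace_id)"
)


@dataclass
class SearchRequest:
    query: str                          # hypothetical field, for illustration
    workspace_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Treat "" as "no workspace" so it never reaches Cypher as a filter value.
        if self.workspace_id == "":
            self.workspace_id = None


def in_scope(requested: Optional[str], library_workspace: Optional[str]) -> bool:
    """Pure-Python mirror of the clause, for illustration only."""
    if requested is None:
        # No workspace requested: only global libraries match.
        return library_workspace is None
    # Workspace requested: only that workspace's libraries match.
    return library_workspace == requested
```

Because the clause carries its own outer parentheses, joining it to other predicates with `AND` cannot change how the inner `AND`/`OR` is grouped, which is presumably why the commit tests the literal string.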

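Commit 81426327bf replaces plaintext MCP tokens with SHA-256 hashes: the plaintext is generated, shown once, and only the hash is stored and looked up. A minimal sketch of that flow is shown below. The name `hash_token` and the "return the plaintext once at creation" behaviour come from the commit message; the `InMemoryTokenStore` class, its `authenticate` method, and the use of `secrets.token_urlsafe` are assumptions standing in for the real Django model and manager.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def hash_token(plaintext: str) -> str:
    """Return the hex SHA-256 digest of a bearer token."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


class InMemoryTokenStore:
    """Hypothetical stand-in for the token table, keyed by token_hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}  # token_hash -> owner label

    def create_token(self, owner: str) -> Tuple[str, str]:
        # Generate the plaintext, persist only its hash, return the plaintext once.
        plaintext = secrets.token_urlsafe(32)
        token_hash = hash_token(plaintext)
        self._by_hash[token_hash] = owner
        return token_hash, plaintext

    def authenticate(self, bearer: str) -> Optional[str]:
        # Look up by hashing the incoming value; the plaintext is never stored.
        return self._by_hash.get(hash_token(bearer))
```

An unsalted fast hash is reasonable here only because the tokens are long random strings rather than user-chosen passwords; that reasoning is a general observation, not a statement from the commit.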

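Commit 388b37e471 drops per-query score normalization on the grounds that RRF fuses results by rank rather than by score magnitude. The sketch below shows generic reciprocal rank fusion to make that point concrete. The function name `rrf_fuse`, the document ids, and the constant `k = 60` (a commonly used default) are assumptions; the repository's actual fusion code is not shown in the log.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def rrf_fuse(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Only the position in each list matters, never the raw score.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]      # hypothetical ids from vector search
fulltext_hits = ["b", "a", "d"]    # hypothetical ids from fulltext search
graph_hits = ["b", "c"]            # hypothetical ids from graph search
fused = rrf_fuse([vector_hits, fulltext_hits, graph_hits])
```

Since the inputs are plain ordered lists, rescaling the scores inside any one strategy leaves the fused order unchanged as long as that strategy's own ordering is preserved.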
66 Commits