Commit Graph

64 Commits

Author SHA1 Message Date
f2af28d96d feat(api): add workspace + ingest REST endpoints for Daedalus
Adds the REST API surface that Daedalus calls to manage workspace
lifecycle and dispatch file ingestion. All endpoints under /library/api/:

  POST   /workspaces/                   create workspace (idempotent on
                                        workspace_id; library_type frozen)
  GET    /workspaces/{workspace_id}/    workspace status with item/chunk
                                        counts
  DELETE /workspaces/{workspace_id}/    delete workspace + reachable
                                        content; concept-safe (orphan-only
                                        Concept GC; concepts referenced
                                        elsewhere are preserved)

  POST   /ingest/                       queue a file for ingest. Idempotent
                                        on (library, source_ref, hash):
                                        same triple → return existing job;
                                        new hash → supersede.
  GET    /jobs/{job_id}/                poll job status
  POST   /jobs/{job_id}/retry/          re-dispatch a failed job
  GET    /jobs/?status=&library_uid=    list recent jobs
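
The idempotency rule for POST /ingest/ can be sketched in memory as follows. This is an illustration only, not the endpoint's code: `Job`, `queue_ingest`, the `superseded_by` field, and the dict-based store are hypothetical names, and the real lookup goes through the IngestJob ORM model.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional, Tuple


@dataclass
class Job:
    """Hypothetical stand-in for an IngestJob row."""
    job_id: int
    library: str
    source_ref: str
    content_hash: str
    superseded_by: Optional[int] = None


_ids = count(1)
_jobs: Dict[Tuple[str, str, str], Job] = {}


def queue_ingest(library: str, source_ref: str, content_hash: str) -> Tuple[Job, bool]:
    """Return (job, created). The same triple returns the existing job."""
    key = (library, source_ref, content_hash)
    existing = _jobs.get(key)
    if existing is not None:
        return existing, False

    job = Job(next(_ids), library, source_ref, content_hash)
    # A new hash for the same (library, source_ref) supersedes older jobs.
    for (lib, ref, _hash), old in _jobs.items():
        if lib == library and ref == source_ref and old.superseded_by is None:
            old.superseded_by = job.job_id
    _jobs[key] = job
    return job, True
```

Calling `queue_ingest` twice with the same triple yields the same `Job` object; calling it again with a different hash creates a new job and marks the earlier one as superseded.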

Workspace-Library lookup uses the unique workspace_id index added in the
schema commit (33658fbc8d). Concept GC runs as a separate transaction after
the item/chunk delete so partial failures don't leave the global graph
corrupted.
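
The orphan-only Concept GC described above can be sketched with plain sets. This is a simplified model, not the Cypher the commit ships: `concept_refs` (a mapping from concept name to the ids of items that reference it) and `delete_items_then_gc` are hypothetical names, and the two steps stand in for the two separate transactions.

```python
from typing import Dict, List, Set


def delete_items_then_gc(concept_refs: Dict[str, Set[str]],
                         deleted_items: Set[str]) -> List[str]:
    """Drop references held by deleted items, then remove orphan concepts.

    Returns the names of the concepts that were garbage-collected.
    """
    # Step 1 (item/chunk delete): remove references from the deleted items.
    for refs in concept_refs.values():
        refs -= deleted_items

    # Step 2 (separate GC pass): only concepts with no remaining
    # references are removed; concepts referenced elsewhere survive.
    orphans = sorted(name for name, refs in concept_refs.items() if not refs)
    for name in orphans:
        del concept_refs[name]
    return orphans
```

Because step 2 only ever removes concepts with an empty reference set, a failure between the two steps leaves extra orphans behind rather than deleting a concept that another library still uses.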

Tests cover serializer validation, IngestJob ORM behavior, the
(library, source_ref, hash) idempotency query pattern, and auth
boundaries on every new endpoint. Cypher correctness is validated by
manual end-to-end testing, since the unit tests do not run against a live
Neo4j.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:27:08 -04:00
c485a8560c feat(ingest): add Daedalus cross-bucket S3 fetch + ingest_from_daedalus task
Adds DAEDALUS_S3_* settings (read-only credentials for the Daedalus bucket)
and a small `daedalus_s3.py` helper that fetches a file from Daedalus's
bucket and writes it into Mnemosyne's bucket via default_storage.

Adds the Celery task `library.tasks.ingest_from_daedalus`. Given an
IngestJob row, it:
  1. Resolves the target Library (by library_uid).
  2. Supersedes a prior Item with the same source_ref but different
     content_hash by deleting the old Item + chunks first.
  3. Fetches from Daedalus S3, copies into items/{item_uid}/original.{ext}.
  4. Creates the Item node, links it to a default Collection.
  5. Runs the existing EmbeddingPipeline.process_item.
  6. Marks the job completed with chunks/concepts counts.

Failures are retried up to 3× with exponential backoff; the final failure
marks the job as failed and records the exception text. The task is routed
to the embedding queue, so single-worker setups must consume that queue.
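
The retry policy can be sketched without Celery as follows. The helper name `run_with_retries`, the 2-second base delay, and the doubling factor are assumptions for illustration; the commit message only states "up to 3×" and "exponential backoff".

```python
import time
from typing import Any, Callable, Tuple


def run_with_retries(task: Callable[[], Any],
                     max_retries: int = 3,
                     base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[str, Any]:
    """Run task, retrying failures with exponentially growing delays.

    Returns ("completed", result) on success, or ("failed", exception text)
    once the retries are exhausted.
    """
    attempt = 0
    while True:
        try:
            return "completed", task()
        except Exception as exc:
            if attempt >= max_retries:
                # Final failure: report the exception text.
                return "failed", str(exc)
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing a fake `sleep` makes the schedule easy to inspect in tests without actually waiting.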

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:26:48 -04:00
33658fbc8d feat(library): add business + finance types, workspace_id, IngestJob
Adds two new content-type-aware library types — `business` for
proposals/marketing/strategy (used by the work-team agents) and `finance`
for statements/tax/market commentary (used by Garth). Each ships with
chunking config, embedding/reranker instructions, an LLM-context prompt
that forbids fabricating financial figures, and a vision prompt.

Adds a unique-indexed `workspace_id` property to `Library` so a node
can be scoped to a Daedalus workspace. Null means a global library;
non-null means workspace-scoped. Search Cypher (added in a later
commit) enforces the boundary.

Adds an `IngestJob` Django ORM model — separate from neomodel — that
tracks asynchronous ingestion lifecycle (Daedalus → S3 → Celery →
embedding pipeline) with idempotency on (library, source_ref, hash).
Migration 0001_initial creates the table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:26:26 -04:00
81426327bf feat(mcp): store MCP tokens as SHA-256 hashes instead of plaintext
Replace plaintext token storage with SHA-256 hashes so leaked database
contents cannot be used to authenticate. Plaintext is generated, shown
once at creation time, and never persisted.

- Add `hash_token()` helper and `MCPTokenManager.create_token()` that
  returns `(instance, plaintext)`.
- Replace `token` field with indexed `token_hash`; look up bearers by
  hashing the incoming value.
- Update dashboard, management command, and admin to surface plaintext
  only at creation. Disable admin "add" since it cannot reveal plaintext.
- Migration drops the old `token` column and adds `token_hash`;
  pre-existing tokens are invalidated and must be reissued.
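
A minimal sketch of the hash-only storage scheme, using the standard library. `hash_token` mirrors the helper named above, but its body here is an assumption; `create_token` and `authenticate` operate on a plain dict and are hypothetical stand-ins for `MCPTokenManager.create_token()` and the bearer lookup.

```python
import hashlib
import secrets
from typing import Dict, Optional


def hash_token(plaintext: str) -> str:
    """Return the hex SHA-256 digest of a token."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def create_token(store: Dict[str, str], name: str) -> str:
    """Generate a token, persist only its hash, and return the plaintext once."""
    plaintext = secrets.token_urlsafe(32)
    store[hash_token(plaintext)] = name
    return plaintext


def authenticate(store: Dict[str, str], bearer: str) -> Optional[str]:
    """Look up a bearer value by hashing it; the plaintext is never stored."""
    return store.get(hash_token(bearer))
```

Since the store is keyed by digest, a dump of it contains no value that can be presented as a bearer token.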
2026-04-27 09:01:36 -04:00
2df22941d2 feat: replace server-side RAG with MCP retrieval primitives
- Remove Phase 4 RAG pipeline in favor of retrieval-only architecture
- Add FastMCP server exposing search, get_chunk, list_libraries tools
- Mount MCP endpoints (streamable HTTP + SSE) via Starlette in ASGI config
- Update README to clarify Mnemosyne is a retrieval engine, not RAG
- Let calling LLMs drive synthesis and iterative retrieval themselves
2026-04-26 15:34:26 -04:00
388b37e471 fix(search): require library match and preserve raw scores for RRF
Replace OPTIONAL MATCH with MATCH for Library-Collection-Item paths to
ensure results are properly scoped to libraries, and remove per-query
score normalization since RRF fuses results by rank rather than score
magnitude.
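
For context, a minimal sketch of Reciprocal Rank Fusion shows why normalizing scores per query is unnecessary: only each result's position in its list enters the fused score. The function name `rrf_fuse` and the constant `k = 60` (the value commonly used for RRF) are assumptions, not taken from this repository.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]
fulltext_hits = ["b", "c", "a"]
print(rrf_fuse([vector_hits, fulltext_hits]))  # ['b', 'a', 'c']
```

The raw similarity or BM25-style scores never appear in the computation, so their scales do not need to be comparable across search strategies.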
2026-04-26 06:35:11 -04:00
4a35aa126f refactor(settings): replace DATABASE_URL with explicit DB env vars
Replace the single `DATABASE_URL` connection string with individual
environment variables (`APP_DB_NAME`, `APP_DB_USER`, `APP_DB_PASSWORD`,
`DB_HOST`, `DB_PORT`) for more granular database configuration control.
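
A hedged sketch of how these variables might be assembled into Django's `DATABASES` setting. The helper `database_settings`, the PostgreSQL engine, and the host/port defaults are assumptions; the commit message names only the five environment variables.

```python
import os
from typing import Any, Dict, Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Dict[str, Any]]:
    """Build a Django-style DATABASES dict from individual env vars."""
    return {
        "default": {
            # Assumed backend; the commit does not name the engine.
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["APP_DB_NAME"],
            "USER": env["APP_DB_USER"],
            "PASSWORD": env["APP_DB_PASSWORD"],
            # Assumed defaults for local development.
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }
```

Reading the required variables with `env[...]` makes a missing name, user, or password fail at startup with a `KeyError` instead of surfacing later as a connection error.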
2026-04-13 10:23:03 +00:00
634845fee0 feat: add Phase 3 hybrid search with Synesis reranking
Implement hybrid search pipeline combining vector, fulltext, and graph
search across Neo4j, with cross-attention reranking via Synesis
(Qwen3-VL-Reranker-2B) `/v1/rerank` endpoint.

- Add SearchService with vector, fulltext, and graph search strategies
- Add SynesisRerankerClient for multimodal reranking via HTTP API
- Add search API endpoint (POST /search/) with filtering by library,
  collection, and library_type
- Add SearchRequest/Response serializers and image search results
- Add "nonfiction" to library_type choices
- Consolidate reranker stack from two models to single Synesis service
- Handle image analysis_status as "skipped" when analysis is unavailable
- Add comprehensive tests for search pipeline and reranker client
2026-03-29 18:09:50 +00:00
fb38a881d9 Add vision model support to LLM Manager admin and rename index for clarity
2026-03-29 17:03:59 +00:00
90db904959 Add vision analysis capabilities to the embedding pipeline
- Introduced a new vision analysis service to classify, describe, and extract text from images.
- Enhanced the Image model with fields for OCR text, vision model name, and analysis status.
- Added a new "nonfiction" library type with specific chunking and embedding configurations.
- Updated content types to include vision prompts for various library types.
- Integrated vision analysis into the embedding pipeline, allowing for image analysis during document processing.
- Implemented metrics to track vision analysis performance and usage.
- Updated UI components to display vision analysis results and statuses in item details and the embedding dashboard.
- Added migration for new vision model fields and usage tracking.
2026-03-22 15:14:34 +00:00
6585beed20 Add download functionality for items and images with presigned URLs
2026-03-22 12:08:44 +00:00
1379e0d425 Add logging configuration to prevent Celery from overriding Django's logging setup
2026-03-21 13:23:56 +00:00
99bdb4ac92 Add Themis application with custom widgets, views, and utilities
- Implemented custom form widgets for date, time, and datetime fields with DaisyUI styling.
- Created utility functions for formatting dates, times, and numbers according to user preferences.
- Developed views for profile settings, API key management, and notifications, including health check endpoints.
- Added URL configurations for Themis tests and main application routes.
- Established test cases for custom widgets to ensure proper functionality and integration.
- Defined project metadata and dependencies in pyproject.toml for package management.
2026-03-21 02:00:18 +00:00
e99346d014 Initial commit
2026-03-18 23:01:09 +00:00