Commit Graph

83 Commits

Author SHA1 Message Date
56e977ffb5 fix(library): normalize MIME types to file extensions in Daedalus ingest
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m9s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m15s
Daedalus may send `file_type` as a MIME type (e.g. `text/markdown`) rather
than a bare extension. Add a `_normalize_file_type` helper with a MIME→ext
lookup table and sensible fallbacks so ingested items are stored with
proper extensions like `md` instead of `text/markdown`.
2026-05-04 12:39:54 -04:00
37bb38ee43 fix(mnemosyne): use STORAGES config for S3 health check
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 49s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m14s
Update `_check_s3` to read S3 settings from the `STORAGES` dict instead of
deprecated top-level `AWS_*` settings. Skip the check when local storage
is enabled and return an error early if no bucket is configured.
2026-05-04 12:26:50 -04:00
cbe7921938 fix(deploy): use /ready/ healthcheck and /srv/mnemosyne path
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m9s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m31s
- Change app healthcheck from /live/ to /ready/ to verify full
  readiness including dependencies (DB, Neo4j, S3)
- Increase healthcheck timeout from 5s to 10s to accommodate
  dependency checks
- Add S3 bucket connectivity check to readiness probe
- Update deployment documentation to use /srv/mnemosyne instead
  of /opt/mnemosyne as the compose project directory
2026-05-04 09:23:36 -04:00
de0d7a4317 docs(mnemosyne): update integration doc for container deployment
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 50s
CVE Scan & Docker Build / build-and-push (push) Successful in 4m2s
2026-05-04 08:56:49 -04:00
e34b7f46a5 feat(mcp_server): add --password option to ensure_service_user command
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m2s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m15s
2026-05-04 08:43:55 -04:00
df2e495660 docs: add Red Panda Django Standards V1-02
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 49s
CVE Scan & Docker Build / build-and-push (push) Successful in 42s
Introduces the Red Panda Approval standards document for Django projects,
covering environment setup, directory structure, dependency pinning,
Docker Compose per-service environment scoping, nginx reverse-proxy
configuration (Docker DNS, X-Forwarded-Proto preservation, access-log
filtering, internal allowlists), and Memcached deployment notes.
2026-05-04 07:47:08 -04:00
c9328c58fc refactor(nginx): overhaul config with dynamic resolution and media serving
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m12s
CVE Scan & Docker Build / build-and-push (push) Successful in 1m5s
- Add Docker DNS resolver to prevent stale upstream IPs after container restarts
- Preserve X-Forwarded-Proto from HAProxy for correct HTTPS detection
- Mount mnemosyne-media volume for direct /media/ serving
- Add IP allowlisting for probe/metrics endpoints (RFC1918 + loopback)
- Fix access_log inheritance so probe paths are properly suppressed
- Expand inline documentation covering routing model and conventions
2026-05-04 07:41:15 -04:00
003f958f7b docs(env): expand .env.example into full compose interpolation template
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 51s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m3s
Replace the minimal placeholder .env.example with a comprehensive template
documenting every variable consumed by docker-compose.yaml, organized by
service (Django core, HTTP, Postgres, Neo4j, Memcached, S3/MinIO, Daedalus,
Celery/RabbitMQ, etc.). Clarifies that this file is rendered from an Ansible
Jinja2 template with vaulted secrets in production, and distinguishes it
from the in-tree mnemosyne/.env used for bare-Python development.
2026-05-04 07:04:28 -04:00
d84f0e548b Docker Compose: Set pull policy to always
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 53s
CVE Scan & Docker Build / build-and-push (push) Successful in 43s
2026-05-03 20:06:38 -04:00
72bd4b381d Port number adjustments
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 50s
CVE Scan & Docker Build / build-and-push (push) Successful in 56s
2026-05-03 19:56:01 -04:00
7185d326eb feat(docker): rename web service to app, add nginx as web
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 53s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m0s
Reorganize Docker Compose services: the Django/gunicorn container is now
`app` and nginx is `web`, better reflecting their roles. Add a dedicated
gunicorn configuration and install curl in the runtime image for health
checks.

Update documentation to reflect:
- Neo4j migration from ariel.incus to a dedicated umbriel.incus instance
- Rationale for requiring a dedicated Neo4j instance (single-tenancy
  assumptions, label/index isolation, schema ownership)
- New service naming in compose commands and log tailing examples
2026-05-03 19:35:27 -04:00
a2c885cf34 feat(library): add workspace-scoped search and JWT auth for Daedalus
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 52s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m32s
- Extend library list endpoint with `include_workspace` and
  `with_item_count` query params to support Daedalus registry mirroring
- Expand search scope clause to three modes: workspace-only, workspace
  plus allowed user libraries, and global
- Add `allowed_libraries` field to SearchRequest for Phase-2 JWT claims
- Introduce JWT-based actor resolution using a synthetic service user
  (`MCP_JWT_SERVICE_USERNAME`) for Daedalus-originated requests
2026-05-03 17:36:06 -04:00
e5618973fc docs(integration): mark Phases 1+2 as implemented; add Phase 3 stub
The integration doc was forward-looking spec but most of it now ships:

  Phase 1 (REST workspace + ingest API for Daedalus)         implemented
  Phase 2 (MCP server: search/get_chunk/list_*/get_health)   implemented
  Phase 3 (per-turn signed-token access control)            📋 deferred

Updated:
- Tool table reflects actual implementation (search, get_chunk,
  list_libraries, list_collections, list_items, get_health) instead
  of the speculative names (search_knowledge, search_by_category, etc.)
- Project structure matches the as-built layout (tools/discovery.py
  exists; no separate browse.py).
- REST API table covers both workspace lifecycle endpoints and ingest
  endpoints, with correct routes (/library/api/...).
- Ingest request schema includes content_hash and workspace_id
  (the actual idempotency key on the Mnemosyne side).
- Celery task description matches library.tasks.ingest_from_daedalus
  rather than the placeholder embed_item.
- Phase 6 checklist marks Phases 1+2 done; adds Phase 3 (per-turn
  token access control) with a per-Mnemosyne-side TODO list pointing
  at the matching Daedalus-side §9 design.

Internal MCP port stays 22091; public access via nginx on 23090.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 21:54:05 -04:00
236d9e2e74 feat(deploy): production docker compose stack + Gitea CI image build
Adds a complete deployment surface for production:

  Dockerfile               multi-stage 3.12-slim build, collectstatic
                           baked into the image, runs as non-root mnemosyne
                           uid/gid 1000.
  docker/entrypoint.sh     dispatches `web | mcp | worker | beat | migrate
                           | setup | shell` from a single image, so every
                           service in compose runs the same artifact.
  docker-compose.yaml      five services: static-init (one-shot copies
                           statics into the shared volume on every up),
                           web (gunicorn), mcp (uvicorn), worker (celery),
                           nginx. External services (Postgres, Neo4j,
                           RabbitMQ, S3, Memcached, embedder, reranker)
                           reached over the 10.10.0.0/24 internal network
                           and configured via mnemosyne/.env.
  nginx/mnemosyne.conf     reverse proxy: /library/* and /admin/* → web,
                           /mcp/* → mcp, /static/* → volume, /metrics
                           internal-network-only (127/8 + RFC1918), /healthz
                           proxies to /mcp/health for liveness probes.
  .gitea/workflows/        CVE scan + image build, image pushed to
                           git.helu.ca/r/mnemosyne. Trivy scans pyproject
                           extras (dev/test/lint/docs) and the built image.
  pyproject.toml           adds [test], [lint], [docs] extras so the CI
                           pip-compile step has something to resolve.

README documents the bring-up flow (`docker compose run --rm web migrate`,
then `setup`, then `up -d`), day-to-day commands, and the env-var values
that need adjusting for production (DEBUG=False, KVDB_LOCATION pointing
at the external memcached, AWS keys filled in, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 12:05:23 -04:00
1cd556c3f6 fix(asgi): redirect /mcp → /mcp/ for clients that omit the trailing slash
Starlette's Mount("/mcp", ...) only matches /mcp/* paths. A POST to bare
/mcp falls through to the catch-all Django mount and returns 404. The
fast-agent MCP client and the README example both used the no-slash URL,
so the validator was never able to initialize a session — every call
landed in django.request.

Adds a 307 redirect at /mcp so any client URL works, and points the
validator config at /mcp/ directly to skip the redirect round-trip.
Also gitignores fastagent.jsonl (a runtime log file fast-agent writes
into the working directory).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 12:04:42 -04:00
e2a6d45b77 chore(validator): drop .env, keep all config in FastAgent YAMLs
OPENAI_BASE_URL was duplicated between .env and fastagent.config.yaml;
the YAML is authoritative, so .env is dead weight. Removing the .env
template and gitignore entry, updating README to reflect.

The real fastagent.secrets.yaml stays gitignored;
fastagent.secrets.yaml.example remains as the documented schema.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 07:01:52 -04:00
97a14fb03a feat(validator): add bare FastAgent + Pallas validator for Mnemosyne MCP
A self-contained sub-project under validator/ that wraps Mnemosyne's MCP
server in a single FastAgent. Use it to confirm — outside of Daedalus —
that Mnemosyne's MCP transport works, every tool registers, args/responses
round-trip, and an LLM can actually drive the tools.

The validator is its own Pallas-consuming project with its own pyproject
(pallas-mcp + fast-agent-mcp), agents.yaml, and fastagent.config.yaml —
matching the pattern used by Iolaus and other Pallas consumers. It does
not import Mnemosyne Python code; it only speaks MCP over HTTP.

The agent never sets workspace_id, so all calls run against the global
scope (libraries with workspace_id IS NULL). Workspace-scoped validation
will come once Daedalus's chat path is wired (Daedalus injects
workspace_id server-side, force-overwriting whatever the LLM produces).

Default model is openai.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf served by
llama.cpp at nyx.helu.ca:22079/v1. Token provisioning via
`python manage.py create_mcp_token --user <u> --name validator`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:53:48 -04:00
2a8a3d75b4 docs(readme): document operations + Daedalus integration endpoints
Adds a "Running Mnemosyne" section with the three commands needed to
operate the system: Django web app (gunicorn), MCP server (uvicorn on
:22091), and Celery worker — with notes on the embedding queue that
the Daedalus ingest task depends on.

Adds the Ouranos host map (Portia / Ariel / Oberon / Nyx / Memcached),
one-time setup commands (migrate, setup_neo4j_indexes, load_library_types),
the Daedalus integration endpoints table, and the two new library types
(business, finance) in the existing Library Types table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:27:46 -04:00
5527cf6bdb feat(search,mcp): workspace-scope search and add get_health MCP tool
Workspace scoping is the integration's security-critical property: an
agent in workspace A must never see content from workspace B or from
any global library, regardless of what the calling LLM tries.

Adds `workspace_id` to SearchRequest with __post_init__ normalization
that converts empty strings to None — so "" cannot slip through as a
truthy filter at the Cypher boundary. Extracts the workspace scope
clause to a single string and appends it to all five search queries
(vector, fulltext-chunk, fulltext-concept, graph, image):

  ($workspace_id IS NULL AND lib.workspace_id IS NULL
   OR lib.workspace_id = $workspace_id)

Either workspace-only or global-only — never both — and the operator
precedence is bracketed so a refactor can't accidentally widen it. A
test verifies the literal clause string for that exact reason.

Adds `workspace_id` as a parameter to every MCP tool (`search`,
`get_chunk`, `list_libraries`, `list_collections`, `list_items`).
Deliberately undocumented in tool docstrings so the calling LLM is never
told the parameter exists — it is system-injected by Daedalus's chat
path and force-overwritten before reaching Mnemosyne. Mnemosyne also
validates the value but the security guarantee is enforced upstream.

Adds the `get_health` MCP tool per the Pallas health spec: returns
ok / degraded / error after probing Neo4j, S3, and the embedding
model registration. Used by Daedalus's existing health poller.
Updates the server INSTRUCTIONS string to advertise the new tool and
the two new library types (business, finance).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:27:32 -04:00
f2af28d96d feat(api): add workspace + ingest REST endpoints for Daedalus
Adds the REST API surface that Daedalus calls to manage workspace
lifecycle and dispatch file ingestion. All endpoints under /library/api/:

  POST   /workspaces/                   create workspace (idempotent on
                                        workspace_id; library_type frozen)
  GET    /workspaces/{workspace_id}/    workspace status with item/chunk
                                        counts
  DELETE /workspaces/{workspace_id}/    delete workspace + reachable
                                        content; concept-safe (orphan-only
                                        Concept GC; concepts referenced
                                        elsewhere are preserved)

  POST   /ingest/                       queue a file for ingest. Idempotent
                                        on (library, source_ref, hash):
                                        same triple → return existing job;
                                        new hash → supersede.
  GET    /jobs/{job_id}/                poll job status
  POST   /jobs/{job_id}/retry/          re-dispatch a failed job
  GET    /jobs/?status=&library_uid=    list recent jobs

Workspace-Library lookup uses the unique workspace_id index added in the
schema commit. Concept GC runs as a separate transaction after item/chunk
delete so partial failures don't leave the global graph corrupted.

Tests cover serializer validation, IngestJob ORM behavior, the
(library, source_ref, hash) idempotency query pattern, and auth
boundaries on every new endpoint. Cypher correctness is validated by
manual end-to-end testing — no live Neo4j in unit tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:27:08 -04:00
c485a8560c feat(ingest): add Daedalus cross-bucket S3 fetch + ingest_from_daedalus task
Adds DAEDALUS_S3_* settings (read-only credentials for the Daedalus bucket)
and a small `daedalus_s3.py` helper that fetches a file from Daedalus's
bucket and writes it into Mnemosyne's bucket via default_storage.

Adds the Celery task `library.tasks.ingest_from_daedalus`. Given an
IngestJob row, it:
  1. Resolves the target Library (by library_uid).
  2. Supersedes a prior Item with the same source_ref but different
     content_hash by deleting the old Item + chunks first.
  3. Fetches from Daedalus S3, copies into items/{item_uid}/original.{ext}.
  4. Creates the Item node, links it to a default Collection.
  5. Runs the existing EmbeddingPipeline.process_item.
  6. Marks the job completed with chunks/concepts counts.

Failures retry up to 3× with exponential backoff; final failure marks
the job failed with the exception text. Routed to the embedding queue
so single-worker setups must consume it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:26:48 -04:00
33658fbc8d feat(library): add business + finance types, workspace_id, IngestJob
Adds two new content-type-aware library types — `business` for
proposals/marketing/strategy (used by the work-team agents) and `finance`
for statements/tax/market commentary (used by Garth). Each ships with
chunking config, embedding/reranker instructions, an LLM-context prompt
that forbids fabricating financial figures, and a vision prompt.

Adds a unique-indexed `workspace_id` property to `Library` so a node
can be scoped to a Daedalus workspace. Null means a global library;
non-null means workspace-scoped. Search Cypher (added in a later
commit) enforces the boundary.

Adds an `IngestJob` Django ORM model — separate from neomodel — that
tracks asynchronous ingestion lifecycle (Daedalus → S3 → Celery →
embedding pipeline) with idempotency on (library, source_ref, hash).
Migration 0001_initial creates the table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:26:26 -04:00
81426327bf feat(mcp): store MCP tokens as SHA-256 hashes instead of plaintext
Replace plaintext token storage with SHA-256 hashes so leaked database
contents cannot be used to authenticate. Plaintext is generated, shown
once at creation time, and never persisted.

- Add `hash_token()` helper and `MCPTokenManager.create_token()` that
  returns `(instance, plaintext)`.
- Replace `token` field with indexed `token_hash`; look up bearers by
  hashing the incoming value.
- Update dashboard, management command, and admin to surface plaintext
  only at creation. Disable admin "add" since it cannot reveal plaintext.
- Migration drops the old `token` column and adds `token_hash`;
  pre-existing tokens are invalidated and must be reissued.
2026-04-27 09:01:36 -04:00
2df22941d2 feat: replace server-side RAG with MCP retrieval primitives
- Remove Phase 4 RAG pipeline in favor of retrieval-only architecture
- Add FastMCP server exposing search, get_chunk, list_libraries tools
- Mount MCP endpoints (streamable HTTP + SSE) via Starlette in ASGI config
- Update README to clarify Mnemosyne is a retrieval engine, not RAG
- Let calling LLMs drive synthesis and iterative retrieval themselves
2026-04-26 15:34:26 -04:00
388b37e471 fix(search): require library match and preserve raw scores for RRF
Replace OPTIONAL MATCH with MATCH for Library-Collection-Item paths to
ensure results are properly scoped to libraries, and remove per-query
score normalization since RRF fuses results by rank rather than score
magnitude.
2026-04-26 06:35:11 -04:00
4a35aa126f refactor(settings): replace DATABASE_URL with explicit DB env vars
Replace the single `DATABASE_URL` connection string with individual
environment variables (`APP_DB_NAME`, `APP_DB_USER`, `APP_DB_PASSWORD`,
`DB_HOST`, `DB_PORT`) for more granular database configuration control.
2026-04-13 10:23:03 +00:00
634845fee0 feat: add Phase 3 hybrid search with Synesis reranking
Implement hybrid search pipeline combining vector, fulltext, and graph
search across Neo4j, with cross-attention reranking via Synesis
(Qwen3-VL-Reranker-2B) `/v1/rerank` endpoint.

- Add SearchService with vector, fulltext, and graph search strategies
- Add SynesisRerankerClient for multimodal reranking via HTTP API
- Add search API endpoint (POST /search/) with filtering by library,
  collection, and library_type
- Add SearchRequest/Response serializers and image search results
- Add "nonfiction" to library_type choices
- Consolidate reranker stack from two models to single Synesis service
- Handle image analysis_status as "skipped" when analysis is unavailable
- Add comprehensive tests for search pipeline and reranker client
2026-03-29 18:09:50 +00:00
fb38a881d9 Add vision model support to LLM Manager admin and rename index for clarity 2026-03-29 17:03:59 +00:00
90db904959 Add vision analysis capabilities to the embedding pipeline
- Introduced a new vision analysis service to classify, describe, and extract text from images.
- Enhanced the Image model with fields for OCR text, vision model name, and analysis status.
- Added a new "nonfiction" library type with specific chunking and embedding configurations.
- Updated content types to include vision prompts for various library types.
- Integrated vision analysis into the embedding pipeline, allowing for image analysis during document processing.
- Implemented metrics to track vision analysis performance and usage.
- Updated UI components to display vision analysis results and statuses in item details and the embedding dashboard.
- Added migration for new vision model fields and usage tracking.
2026-03-22 15:14:34 +00:00
6585beed20 Add download functionality for items and images with presigned URLs 2026-03-22 12:08:44 +00:00
1379e0d425 Add logging configuration to prevent Celery from overriding Django's logging setup 2026-03-21 13:23:56 +00:00
99bdb4ac92 Add Themis application with custom widgets, views, and utilities
- Implemented custom form widgets for date, time, and datetime fields with DaisyUI styling.
- Created utility functions for formatting dates, times, and numbers according to user preferences.
- Developed views for profile settings, API key management, and notifications, including health check endpoints.
- Added URL configurations for Themis tests and main application routes.
- Established test cases for custom widgets to ensure proper functionality and integration.
- Defined project metadata and dependencies in pyproject.toml for package management.
2026-03-21 02:00:18 +00:00
e99346d014 Initial commit 2026-03-18 23:01:09 +00:00