# Phase 1: Foundation

## Objective

Establish the project skeleton, Neo4j data model, Django integration, and content-type system. At the end of this phase, you can create libraries, collections, and items via Django admin, and the Neo4j graph is populated with the correct node/relationship structure.

## Deliverables

### 1. Django Project Skeleton

- Rename the configuration module from `mnemosyne/mnemosyne/` to `mnemosyne/config/` per Red Panda Standards
- Create `pyproject.toml` at the repo root with floor-pinned dependencies
- Create `.env` / `.env.example` for environment variables (never commit `.env`)
- Use a single `settings.py` that reads `.env` via `django-environ`
- Configure dual databases: PostgreSQL (Django auth/config) + Neo4j (content graph)
- Install and configure `django-neomodel` for Neo4j OGM integration
- Configure `djangorestframework` for the API
- Configure Celery + RabbitMQ (Async Task pattern)
- Configure the S3 storage backend via Incus buckets (MinIO-backed, Terraform-provisioned)
- Configure structured logging for Loki integration via Alloy

### 2. Django Apps

| App | Purpose | Database |
|-----|---------|----------|
| `themis` (installed) | User profiles, preferences, API key management, navigation, notifications | PostgreSQL |
| `library/` | Libraries, Collections, Items, Chunks, Concepts | Neo4j (neomodel) |
| `llm_manager/` | LLM API/model config, usage tracking | PostgreSQL (ported from Spelunker) |

> **Note:** Themis replaces `core/`. User profiles, timezone preferences, theme management, API key storage (encrypted, Fernet), and standard navigation are all provided by Themis. No separate `core/` app is needed. If SSO (Casdoor) or Organization models are required in future, they will be added as separate apps following the SSO and Organization patterns.

### 3. Neo4j Graph Model (neomodel)

```python
# library/models.py
from neomodel import (
    ArrayProperty,
    DateTimeProperty,
    FloatProperty,
    IntegerProperty,
    JSONProperty,
    RelationshipTo,
    StringProperty,
    StructuredNode,
    StructuredRel,
    UniqueIdProperty,
)


class Library(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    library_type = StringProperty(required=True)  # fiction, technical, music, film, art, journal
    description = StringProperty(default='')

    # Content-type configuration (defaults loaded by load_library_types)
    chunking_config = JSONProperty(default=dict)  # callable default avoids a shared mutable {}
    embedding_instruction = StringProperty(default='')
    reranker_instruction = StringProperty(default='')
    llm_context_prompt = StringProperty(default='')

    created_at = DateTimeProperty(default_now=True)

    collections = RelationshipTo('Collection', 'CONTAINS')


class Collection(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(required=True)
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)
    created_at = DateTimeProperty(default_now=True)

    items = RelationshipTo('Item', 'CONTAINS')
    library = RelationshipTo('Library', 'BELONGS_TO')


# Relationship models used by Item. Their properties are not specified in this
# phase; these placeholders must be defined before Item so the references resolve.
class ReferencesRel(StructuredRel):
    pass


class RelatedToRel(StructuredRel):
    pass


class Item(StructuredNode):
    uid = UniqueIdProperty()
    title = StringProperty(required=True)
    item_type = StringProperty(default='')
    s3_key = StringProperty(default='')
    content_hash = StringProperty(index=True)
    file_type = StringProperty(default='')
    file_size = IntegerProperty(default=0)
    metadata = JSONProperty(default=dict)
    created_at = DateTimeProperty(default_now=True)
    updated_at = DateTimeProperty(default_now=True)

    chunks = RelationshipTo('Chunk', 'HAS_CHUNK')
    images = RelationshipTo('Image', 'HAS_IMAGE')
    concepts = RelationshipTo('Concept', 'REFERENCES', model=ReferencesRel)
    related_items = RelationshipTo('Item', 'RELATED_TO', model=RelatedToRel)


class Chunk(StructuredNode):
    uid = UniqueIdProperty()
    chunk_index = IntegerProperty(required=True)
    chunk_s3_key = StringProperty(required=True)
    chunk_size = IntegerProperty(default=0)
    text_preview = StringProperty(default='')  # First 500 chars for full-text index
    embedding = ArrayProperty(FloatProperty())  # 4096d vector
    created_at = DateTimeProperty(default_now=True)

    mentions = RelationshipTo('Concept', 'MENTIONS')


class Concept(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    concept_type = StringProperty(default='')  # person, place, topic, technique, theme
    embedding = ArrayProperty(FloatProperty())  # 4096d vector

    related_concepts = RelationshipTo('Concept', 'RELATED_TO')


class Image(StructuredNode):
    uid = UniqueIdProperty()
    s3_key = StringProperty(required=True)
    image_type = StringProperty(default='')  # cover, diagram, artwork, still, photo
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)
    created_at = DateTimeProperty(default_now=True)

    embeddings = RelationshipTo('ImageEmbedding', 'HAS_EMBEDDING')


class ImageEmbedding(StructuredNode):
    uid = UniqueIdProperty()
    embedding = ArrayProperty(FloatProperty())  # 4096d multimodal vector
    created_at = DateTimeProperty(default_now=True)
```

### 4. Neo4j Index Setup

Management command: `python manage.py setup_neo4j_indexes`

Creates vector indexes (4096d, cosine similarity) on the embedding properties, full-text indexes (e.g. over `Chunk.text_preview`), and constraint indexes.

### 5. Content-Type System

Default library type configurations are loaded via a management command (`python manage.py load_library_types`). A management command is preferred over fixtures because these configurations will evolve across releases, and the command can be re-run idempotently to update defaults without overwriting per-library customizations.

Default configurations:

| Library Type | Chunking Strategy | Embedding Instruction | LLM Context |
|-------------|-------------------|----------------------|-------------|
| fiction | chapter_aware | narrative retrieval | "Excerpts from fiction..." |
| technical | section_aware | procedural retrieval | "Excerpts from technical docs..." |
| music | song_level | music discovery | "Song lyrics and metadata..." |
| film | scene_level | cinematic retrieval | "Film content..." |
| art | description_level | visual/stylistic retrieval | "Artwork descriptions..." |
| journal | entry_level | temporal/reflective retrieval | "Personal journal entries..." |

### 6. Admin & Management UI

`django-neomodel`'s admin support is limited: `StructuredNode` models don't participate in Django's ORM, so standard `ModelAdmin`, filters, search, and inlines don't work. Instead:

- **Custom admin views** for Library, Collection, and Item CRUD using Cypher/neomodel queries, rendered in Django admin's template structure
- **DRF management API** (`/api/v1/library/`, `/api/v1/collection/`, `/api/v1/item/`) for programmatic access and future frontend consumption
- Library CRUD includes content-type configuration editing
- Collection/Item views support filtering by library, type, and date
- All admin views extend `themis/base.html` for consistent navigation

### 7. LLM Manager (Port from Spelunker)

Copy and adapt the `llm_manager/` app from Spelunker:

- `LLMApi` model (OpenAI-compatible API endpoints)
- `LLMModel` model (with new `reranker` and `multimodal_embed` model types)
- `LLMUsage` tracking
- **API key storage uses Themis `UserAPIKey`**: LLM Manager does not implement its own encrypted key storage. API credentials for LLM providers are stored in Themis's Fernet-encrypted `UserAPIKey` model with `key_type='api'` and an appropriate `service_name` (e.g., "OpenAI", "Arke"). `LLMApi` references credentials by service-name lookup against the requesting user's Themis keys.

Schema additions to Spelunker's `LLMModel`:

| Field | Change | Purpose |
|-------|--------|---------|
| `model_type` | Add choices: `reranker`, `multimodal_embed` | Support Qwen3-VL reranker and embedding models |
| `supports_multimodal` | New `BooleanField` | Flag models that accept image+text input |
| `vector_dimensions` | New `IntegerField` | Embedding output dimensions (e.g., 4096) |

### 8. Infrastructure Wiring (Ouranos)

All connections follow Ouranos DNS conventions: use `.incus` hostnames, never hardcode IPs.
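To enforce this convention rather than rely on review, settings could fail fast when a connection URL contains a literal IP address. Below is a minimal stdlib sketch. The helper name `hardcoded_ip_vars` and the choice of variables to check are illustrative assumptions. The variable names are taken from this phase's settings.

```python
import ipaddress
from collections.abc import Mapping
from urllib.parse import urlsplit

# Connection settings from this phase that carry a host in URL form.
# The list is an illustrative assumption; extend it as services are wired in.
CONNECTION_VARS = (
    "DATABASE_URL",
    "NEOMODEL_NEO4J_BOLT_URL",
    "CELERY_BROKER_URL",
    "AWS_S3_ENDPOINT_URL",
)


def hardcoded_ip_vars(env: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: names of connection variables whose host is a literal IP.

    DNS names such as ``ariel.incus`` pass; IPv4/IPv6 literals are reported.
    """
    offenders = []
    for name in CONNECTION_VARS:
        url = env.get(name)
        if not url:
            continue  # unset variables are left to required-setting checks
        host = urlsplit(url).hostname
        if host is None:
            continue
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, so it is a hostname
        offenders.append(name)
    return offenders
```

In `config/settings.py` this could run against `os.environ` once `.env` has been read. If the returned list is not empty, settings would raise `django.core.exceptions.ImproperlyConfigured` naming the offending variables.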
| Service | Host | Connection | Settings Variable |
|---------|------|------------|-------------------|
| PostgreSQL | `portia.incus:5432` | Database `mnemosyne` (must be provisioned) | `DATABASE_URL` |
| Neo4j (Bolt) | `ariel.incus:25554` | Neo4j 5.26.0 | `NEOMODEL_NEO4J_BOLT_URL` |
| Neo4j (HTTP) | `ariel.incus:25584` | Browser/API access | — |
| RabbitMQ | `oberon.incus:5672` | Message broker | `CELERY_BROKER_URL` |
| S3 (Incus) | Terraform-provisioned Incus bucket | MinIO-backed object storage | `AWS_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_STORAGE_BUCKET_NAME` |
| Arke LLM Proxy | `sycorax.incus:25540` | LLM API routing | Configured per `LLMApi` record |
| SMTP (dev) | `oberon.incus:22025` | smtp4dev test server | `EMAIL_HOST`, `EMAIL_PORT` |
| Loki (logs) | `prospero.incus:3100` | Via Alloy agent (host-level, not app-level) | — |
| Casdoor SSO | `titania.incus:22081` | Future: SSO pattern | — |

**Terraform provisioning required before Phase 1 deployment:**

- PostgreSQL database `mnemosyne` on Portia
- Incus S3 bucket for Mnemosyne content storage
- HAProxy route: `mnemosyne.ouranos.helu.ca` → `puck.incus:` (port TBD, assign next available in 22xxx range)

**Development environment (local):**

- PostgreSQL for the Django ORM on `portia.incus`
- Local Neo4j instance, or `ariel.incus` via SSH tunnel
- `django.core.files.storage.FileSystemStorage` in place of S3 (tests/dev)
- `CELERY_TASK_ALWAYS_EAGER=True` for synchronous task execution

### 9. Testing Strategy

Follows Red Panda Standards: Django `TestCase`, with separate test files per module.
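Several test modules for this phase need a live Neo4j. The skip condition for them relies on a `neo4j_available()` helper that this plan names but does not define. Below is a minimal stdlib sketch. It assumes that a TCP connect to the host and port in `NEOMODEL_NEO4J_BOLT_URL` is an adequate probe. That only proves something is listening, not that authentication will succeed. The signature and the localhost fallback are assumptions.

```python
import os
import socket
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_BOLT_PORT = 7687  # Neo4j's standard Bolt port; Ouranos uses a custom one


def neo4j_available(bolt_url: Optional[str] = None, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if a TCP connection to the Bolt endpoint succeeds.

    Falls back to NEOMODEL_NEO4J_BOLT_URL, then to a local default. This only
    proves that something is listening; it does not check credentials.
    """
    url = bolt_url or os.environ.get("NEOMODEL_NEO4J_BOLT_URL", "bolt://localhost:7687")
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    port = parts.port or DEFAULT_BOLT_PORT
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

A test module would then guard its Neo4j-backed classes with `@unittest.skipUnless(neo4j_available(), "Neo4j not available")`. Because the decorator argument is evaluated at import time, a short timeout keeps test discovery fast when Neo4j is down.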
| Test File | Scope |
|-----------|-------|
| `library/tests/test_models.py` | Neo4j node creation, relationships, property validation |
| `library/tests/test_content_types.py` | `load_library_types` command, configuration retrieval per library |
| `library/tests/test_indexes.py` | `setup_neo4j_indexes` command execution |
| `library/tests/test_api.py` | DRF endpoints for Library/Collection/Item CRUD |
| `library/tests/test_admin_views.py` | Custom admin views render and submit correctly |
| `llm_manager/tests/test_models.py` | LLMApi, LLMModel creation, new model types |
| `llm_manager/tests/test_api.py` | LLM Manager API endpoints |

**Neo4j test strategy:**

- Tests use a dedicated Neo4j test database (separate from development/production)
- `NEOMODEL_NEO4J_BOLT_URL` is overridden in test settings to point to the test database
- Each test class clears the test database in `setUp` / `tearDown` using `db.clear_neo4j_database()` (`from neomodel import db`); this wipes every node, which is only safe because the database is dedicated to tests
- CI/CD (Gitea Runner on Puck) uses a Docker Neo4j instance for isolated test runs
- For local development without Neo4j, tests that require Neo4j are skipped via `@unittest.skipUnless(neo4j_available(), "Neo4j not available")`

## Dependencies

```toml
# pyproject.toml (excerpt): floor-pinned with ceiling per Red Panda Standards
[project]
dependencies = [
    "Django>=5.2,<6.0",
    "djangorestframework>=3.14,<4.0",
    "django-neomodel>=0.1,<1.0",
    "neomodel>=5.3,<6.0",
    "neo4j>=5.0,<6.0",
    "celery>=5.3,<6.0",
    "django-storages[boto3]>=1.14,<2.0",
    "django-environ>=0.11,<1.0",
    "psycopg[binary]>=3.1,<4.0",
    "dj-database-url>=2.1,<3.0",
    "shortuuid>=1.0,<2.0",
    "gunicorn>=21.0,<24.0",
    "cryptography>=41.0,<45.0",
    "flower>=2.0,<3.0",
    "pymemcache>=4.0,<5.0",
    "django-heluca-themis",
]
```

## Success Criteria

- [ ] Config module renamed to `config/`, `pyproject.toml` at repo root with floor-pinned deps
- [ ] Settings load from environment variables via `django-environ` (`.env.example` provided)
- [ ] Django project runs with dual PostgreSQL + Neo4j databases
- [ ] Can create Library → Collection → Item through custom admin views
- [ ] DRF API endpoints return Library/Collection/Item data
- [ ] Neo4j graph shows correct node types and relationships
- [ ] Content-type configurations loaded via `load_library_types` and retrievable per library
- [ ] LLM Manager ported from Spelunker; uses Themis `UserAPIKey` for credential storage
- [ ] S3 storage configured against Incus bucket (Terraform-provisioned) and tested
- [ ] Celery worker connects to RabbitMQ on Oberon
- [ ] Structured logging configured (JSON format, compatible with Loki/Alloy)
- [ ] Tests pass for all Phase 1 apps (library, llm_manager)
- [ ] HAProxy route provisioned: `mnemosyne.ouranos.helu.ca`
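The structured-logging criterion asks for JSON output that Loki/Alloy can ingest. Below is a minimal stdlib sketch. It assumes the host-level Alloy agent collects one JSON object per line from the process's output. The field names and the `config.logging` module path are illustrative assumptions, not fixed by this plan.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object.

    Field names are an illustrative choice, not a Loki requirement.
    """

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # json.dumps escapes the traceback's newlines, so each event stays on one line
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


# Django LOGGING fragment wiring the formatter in via dictConfig's "()" factory key.
# "config.logging.JsonFormatter" is a hypothetical module path.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"json": {"()": "config.logging.JsonFormatter"}},
    "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "json"}},
    "root": {"handlers": ["console"], "level": "INFO"},
}
```

Keeping the formatter in plain Python, rather than adding a JSON-logging dependency, avoids another pin in `pyproject.toml`. The trade-off is that extra context fields must be added to `payload` explicitly.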