# Phase 1: Foundation

## Objective

Establish the project skeleton, Neo4j data model, Django integration, and content-type system. At the end of this phase, you can create libraries, collections, and items through the custom admin views, and the Neo4j graph is populated with the correct node/relationship structure.

## Deliverables

### 1. Django Project Skeleton

- Rename configuration module from `mnemosyne/mnemosyne/` to `mnemosyne/config/` per Red Panda Standards
- Create `pyproject.toml` at repo root with floor-pinned dependencies
- Create `.env` / `.env.example` for environment variables (never commit `.env`)
- Use a single `settings.py` that reads its configuration from `.env` via `django-environ`
- Configure dual-database: PostgreSQL (Django auth/config) + Neo4j (content graph)
- Install and configure `django-neomodel` for Neo4j OGM integration
- Configure `djangorestframework` for the REST API
- Configure Celery + RabbitMQ (Async Task pattern)
- Configure S3 storage backend via Incus buckets (MinIO-backed, Terraform-provisioned)
- Configure structured logging for Loki integration via Alloy
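
For the structured-logging item, a minimal sketch of a JSON formatter follows. It assumes the stdlib `logging` module, one JSON object per line on stdout for the host-level Alloy agent to collect, and a hypothetical `config/log_format.py` module holding the `JsonFormatter` class. The field names are illustrative, not a Loki requirement.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the JSON document rather than as extra lines.
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


# settings.py excerpt; "config.log_format" is a hypothetical module path.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"json": {"()": "config.log_format.JsonFormatter"}},
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",  # StreamHandler defaults to stderr
            "formatter": "json",
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}
```

Emitting one object per line keeps each event self-contained, so the collector does not need multiline parsing rules for tracebacks.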

### 2. Django Apps

| App | Purpose | Database |
|-----|---------|----------|
| `themis` (installed) | User profiles, preferences, API key management, navigation, notifications | PostgreSQL |
| `library/` | Libraries, Collections, Items, Chunks, Concepts | Neo4j (neomodel) |
| `llm_manager/` | LLM API/model config, usage tracking | PostgreSQL (ported from Spelunker) |

> **Note:** Themis replaces `core/`. User profiles, timezone preferences, theme management, API key storage (encrypted, Fernet), and standard navigation are all provided by Themis. No separate `core/` app is needed. If SSO (Casdoor) or Organization models are required in future, they will be added as separate apps following the SSO and Organization patterns.

### 3. Neo4j Graph Model (neomodel)

```python
# library/models.py
from neomodel import (
    ArrayProperty,
    DateTimeProperty,
    FloatProperty,
    IntegerProperty,
    JSONProperty,
    RelationshipTo,
    StringProperty,
    StructuredNode,
    StructuredRel,
    UniqueIdProperty,
)


# Relationship models used by Item below. Their edge properties are not yet
# specified in this phase, so these are placeholders.
class ReferencesRel(StructuredRel):
    """Item -[:REFERENCES]-> Concept."""


class RelatedToRel(StructuredRel):
    """Item -[:RELATED_TO]-> Item."""


class Library(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    library_type = StringProperty(required=True)  # fiction, technical, music, film, art, journal
    description = StringProperty(default='')

    # Content-type configuration; chunking_config is serialized to a JSON string.
    # A callable default gives each node its own dict instead of one shared dict.
    chunking_config = JSONProperty(default=dict)
    embedding_instruction = StringProperty(default='')
    reranker_instruction = StringProperty(default='')
    llm_context_prompt = StringProperty(default='')

    created_at = DateTimeProperty(default_now=True)
    collections = RelationshipTo('Collection', 'CONTAINS')


class Collection(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(required=True)
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)

    created_at = DateTimeProperty(default_now=True)
    items = RelationshipTo('Item', 'CONTAINS')
    library = RelationshipTo('Library', 'BELONGS_TO')


class Item(StructuredNode):
    uid = UniqueIdProperty()
    title = StringProperty(required=True)
    item_type = StringProperty(default='')
    s3_key = StringProperty(default='')
    content_hash = StringProperty(index=True)
    file_type = StringProperty(default='')
    file_size = IntegerProperty(default=0)
    metadata = JSONProperty(default=dict)

    created_at = DateTimeProperty(default_now=True)
    updated_at = DateTimeProperty(default_now=True)

    chunks = RelationshipTo('Chunk', 'HAS_CHUNK')
    images = RelationshipTo('Image', 'HAS_IMAGE')
    concepts = RelationshipTo('Concept', 'REFERENCES', model=ReferencesRel)
    related_items = RelationshipTo('Item', 'RELATED_TO', model=RelatedToRel)


class Chunk(StructuredNode):
    uid = UniqueIdProperty()
    chunk_index = IntegerProperty(required=True)
    chunk_s3_key = StringProperty(required=True)
    chunk_size = IntegerProperty(default=0)
    text_preview = StringProperty(default='')  # First 500 chars for full-text index
    embedding = ArrayProperty(FloatProperty())  # 4096d vector

    created_at = DateTimeProperty(default_now=True)
    mentions = RelationshipTo('Concept', 'MENTIONS')


class Concept(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    concept_type = StringProperty(default='')  # person, place, topic, technique, theme
    embedding = ArrayProperty(FloatProperty())  # 4096d vector

    related_concepts = RelationshipTo('Concept', 'RELATED_TO')


class Image(StructuredNode):
    uid = UniqueIdProperty()
    s3_key = StringProperty(required=True)
    image_type = StringProperty(default='')  # cover, diagram, artwork, still, photo
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)

    created_at = DateTimeProperty(default_now=True)
    embeddings = RelationshipTo('ImageEmbedding', 'HAS_EMBEDDING')


class ImageEmbedding(StructuredNode):
    uid = UniqueIdProperty()
    embedding = ArrayProperty(FloatProperty())  # 4096d multimodal vector
    created_at = DateTimeProperty(default_now=True)
```

### 4. Neo4j Index Setup

Management command: `python manage.py setup_neo4j_indexes`

Creates vector indexes (4096 dimensions, cosine similarity) on the `embedding` properties, full-text indexes, and uniqueness constraints (each backed by an index).
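
One way the command could build its Cypher is sketched below, assuming Neo4j 5.x `CREATE ... IF NOT EXISTS` syntax so that reruns are no-ops. The labels and properties come from the models above; the index and constraint names, the `build_index_statements` helper, and the choice of `Item.title` as a second full-text target are assumptions.

```python
# Hypothetical helper for library/management/commands/setup_neo4j_indexes.py
VECTOR_DIMENSIONS = 4096

# (index name, node label, property) for every 4096d embedding in the model.
VECTOR_INDEXES = [
    ("chunk_embedding", "Chunk", "embedding"),
    ("concept_embedding", "Concept", "embedding"),
    ("image_embedding", "ImageEmbedding", "embedding"),
]

# (index name, node label, properties) for full-text search.
FULLTEXT_INDEXES = [
    ("chunk_text_preview", "Chunk", ["text_preview"]),
    ("item_title", "Item", ["title"]),  # assumption: titles are searchable too
]

# Every node type gets a uniqueness constraint on its uid.
UID_LABELS = ["Library", "Collection", "Item", "Chunk", "Concept", "Image", "ImageEmbedding"]


def build_index_statements(dimensions: int = VECTOR_DIMENSIONS) -> list[str]:
    """Return idempotent Cypher statements: constraints, vector, then full-text indexes."""
    statements = []
    for label in UID_LABELS:
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_uid IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.uid IS UNIQUE"
        )
    for name, label, prop in VECTOR_INDEXES:
        statements.append(
            f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
            f"FOR (n:{label}) ON (n.{prop}) "
            f"OPTIONS {{indexConfig: {{`vector.dimensions`: {dimensions}, "
            f"`vector.similarity_function`: 'cosine'}}}}"
        )
    for name, label, props in FULLTEXT_INDEXES:
        on_each = ", ".join(f"n.{p}" for p in props)
        statements.append(
            f"CREATE FULLTEXT INDEX {name} IF NOT EXISTS "
            f"FOR (n:{label}) ON EACH [{on_each}]"
        )
    return statements
```

Inside the command's `handle()`, each string could be run with `neomodel.db.cypher_query(statement)`. Keeping statement generation separate from execution lets the test suite check the Cypher without a live database.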

### 5. Content-Type System

Default library type configurations loaded via management command (`python manage.py load_library_types`). A management command is preferred over fixtures because these configurations will evolve across releases, and the command can be re-run idempotently to update defaults without overwriting per-library customizations.

Default configurations:

| Library Type | Chunking Strategy | Embedding Instruction | LLM Context |
|-------------|-------------------|----------------------|-------------|
| fiction | chapter_aware | narrative retrieval | "Excerpts from fiction..." |
| technical | section_aware | procedural retrieval | "Excerpts from technical docs..." |
| music | song_level | music discovery | "Song lyrics and metadata..." |
| film | scene_level | cinematic retrieval | "Film content..." |
| art | description_level | visual/stylistic retrieval | "Artwork descriptions..." |
| journal | entry_level | temporal/reflective retrieval | "Personal journal entries..." |
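
The "re-run idempotently without overwriting per-library customizations" rule can be reduced to a pure merge function. This is a minimal sketch under one assumed policy: a field counts as customized once it is non-empty, so a rerun only fills empty string fields and missing `chunking_config` keys. The `DEFAULTS` mapping, its `strategy` key, and `merge_defaults` are hypothetical; the strategy and instruction labels come from the table (standing in for the real instruction text), and the truncated LLM context prompts are left out.

```python
from copy import deepcopy

# Hypothetical defaults keyed by library_type; values echo the table above.
DEFAULTS = {
    library_type: {
        "chunking_config": {"strategy": strategy},
        "embedding_instruction": instruction,
    }
    for library_type, strategy, instruction in [
        ("fiction", "chapter_aware", "narrative retrieval"),
        ("technical", "section_aware", "procedural retrieval"),
        ("music", "song_level", "music discovery"),
        ("film", "scene_level", "cinematic retrieval"),
        ("art", "description_level", "visual/stylistic retrieval"),
        ("journal", "entry_level", "temporal/reflective retrieval"),
    ]
}


def merge_defaults(current: dict, defaults: dict) -> tuple[dict, list[str]]:
    """Return (merged fields, names of fields that changed).

    Assumed policy: non-empty values are customizations and are never overwritten.
    """
    merged = deepcopy(current)
    changed = []
    for field, default in defaults.items():
        if isinstance(default, dict):
            # JSON config: add only the keys the library does not define yet.
            config = merged.setdefault(field, {})
            missing = {k: v for k, v in default.items() if k not in config}
            if missing:
                config.update(deepcopy(missing))
                changed.append(field)
        elif not merged.get(field):
            merged[field] = default
            changed.append(field)
    return merged, changed
```

The command would read each Library node's fields, merge them with the defaults for its `library_type`, and save only when the list of changed fields is non-empty, which makes a second run a no-op.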

### 6. Admin & Management UI

`django-neomodel`'s admin support is limited — `StructuredNode` models don't participate in Django's ORM, so standard `ModelAdmin`, filters, search, and inlines don't work. Instead:

- **Custom admin views** for Library, Collection, and Item CRUD using Cypher/neomodel queries, rendered in Django admin's template structure
- **DRF management API** (`/api/v1/library/`, `/api/v1/collection/`, `/api/v1/item/`) for programmatic access and future frontend consumption
- Library CRUD includes content-type configuration editing
- Collection/Item views support filtering by library, type, and date
- All admin views extend `themis/base.html` for consistent navigation
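
Because the ORM-based admin filters are unavailable, the library/type/date filters have to be expressed in Cypher. A minimal sketch of a parameterized query builder follows. It assumes the `(:Library)-[:CONTAINS]->(:Collection)-[:CONTAINS]->(:Item)` path from the models and that neomodel's `DateTimeProperty` stores `created_at` as a UTC epoch float, so the date bound is converted the same way. `build_item_query` and its parameters are hypothetical names.

```python
from datetime import datetime, timezone


def build_item_query(library_uid=None, item_type=None, created_after=None, limit=50):
    """Return (cypher, params) for a filtered, newest-first Item listing."""
    clauses = []
    params = {"limit": limit}
    if library_uid:
        match = (
            "MATCH (:Library {uid: $library_uid})-[:CONTAINS]->"
            "(:Collection)-[:CONTAINS]->(i:Item)"
        )
        params["library_uid"] = library_uid
    else:
        match = "MATCH (i:Item)"
    if item_type:
        clauses.append("i.item_type = $item_type")
        params["item_type"] = item_type
    if created_after is not None:
        if created_after.tzinfo is None:
            # Assumption: naive datetimes are UTC, matching neomodel's epoch storage.
            created_after = created_after.replace(tzinfo=timezone.utc)
        clauses.append("i.created_at >= $created_after")
        params["created_after"] = created_after.timestamp()
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    # DISTINCT: an item reachable through several collections is listed once.
    cypher = f"{match}{where} RETURN DISTINCT i ORDER BY i.created_at DESC LIMIT $limit"
    return cypher, params
```

The pair could be passed to `neomodel.db.cypher_query(cypher, params)` and each row inflated with `Item.inflate(row[0])`. Passing values as parameters rather than formatting them into the string keeps user-supplied filter values out of the query text.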

### 7. LLM Manager (Port from Spelunker)

Copy and adapt `llm_manager/` app from Spelunker:
- `LLMApi` model (OpenAI-compatible API endpoints)
- `LLMModel` model (with new `reranker` and `multimodal_embed` model types)
- `LLMUsage` tracking
- **API key storage uses Themis `UserAPIKey`** — LLM Manager does not implement its own encrypted key storage. API credentials for LLM providers are stored via Themis's Fernet-encrypted `UserAPIKey` model with `key_type='api'` and appropriate `service_name` (e.g., "OpenAI", "Arke"). `LLMApi` references credentials by service name lookup against the requesting user's Themis keys.

Schema additions to Spelunker's `LLMModel`:

| Field | Change | Purpose |
|-------|--------|---------|
| `model_type` | Add choices: `reranker`, `multimodal_embed` | Support Qwen3-VL reranker and embedding models |
| `supports_multimodal` | New `BooleanField` | Flag models that accept image+text input |
| `vector_dimensions` | New `IntegerField` | Embedding output dimensions (e.g., 4096) |

### 8. Infrastructure Wiring (Ouranos)

All connections follow Ouranos DNS conventions — use `.incus` hostnames, never hardcode IPs.

| Service | Host | Connection | Settings Variable |
|---------|------|------------|-------------------|
| PostgreSQL | `portia.incus:5432` | Database `mnemosyne` (must be provisioned) | `DATABASE_URL` |
| Neo4j (Bolt) | `ariel.incus:25554` | Neo4j 5.26.0 | `NEOMODEL_NEO4J_BOLT_URL` |
| Neo4j (HTTP) | `ariel.incus:25584` | Browser/API access | — |
| RabbitMQ | `oberon.incus:5672` | Message broker | `CELERY_BROKER_URL` |
| S3 (Incus) | Terraform-provisioned Incus bucket | MinIO-backed object storage | `AWS_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_STORAGE_BUCKET_NAME` |
| Arke LLM Proxy | `sycorax.incus:25540` | LLM API routing | Configured per `LLMApi` record |
| SMTP (dev) | `oberon.incus:22025` | smtp4dev test server | `EMAIL_HOST` |
| Loki (logs) | `prospero.incus:3100` | Via Alloy agent (host-level, not app-level) | — |
| Casdoor SSO | `titania.incus:22081` | Future: SSO pattern | — |
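
Since the convention forbids hardcoded IPs, `settings.py` could fail fast on any connection URL that uses an IP literal or a host outside the `.incus` zone. A minimal sketch, assuming a hypothetical `require_incus_host` helper applied to the rows above with `.incus` hosts; local overrides such as `localhost` through an SSH tunnel need an explicit allow-list, shown here as an assumed `allowed` parameter.

```python
import ipaddress
from urllib.parse import urlsplit


def require_incus_host(url: str, allowed: tuple[str, ...] = ()) -> str:
    """Return url unchanged if its host follows the Ouranos DNS convention.

    Raises ValueError for IP literals or hosts outside the .incus zone,
    unless the host is listed in `allowed` (e.g. "localhost" in development).
    """
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"No host in URL: {url!r}")
    if host in allowed:
        return url
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal; check the DNS suffix below
    else:
        raise ValueError(f"Hardcoded IP address in URL: {url!r}")
    if not host.endswith(".incus"):
        raise ValueError(f"Host {host!r} is not an .incus name")
    return url
```

Wrapping each environment lookup (for example the value of `NEOMODEL_NEO4J_BOLT_URL`) in this check surfaces a misconfigured `.env` at startup instead of at the first failed connection.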

**Terraform provisioning required before Phase 1 deployment:**
- PostgreSQL database `mnemosyne` on Portia
- Incus S3 bucket for Mnemosyne content storage
- HAProxy route: `mnemosyne.ouranos.helu.ca` → `puck.incus:<port>` (port TBD, assign next available in 22xxx range)

**Development environment (local):**
- PostgreSQL for the Django ORM on `portia.incus`
- Local Neo4j instance or `ariel.incus` via SSH tunnel
- `django.core.files.storage.FileSystemStorage` in place of S3 (tests/dev)
- `CELERY_TASK_ALWAYS_EAGER=True` for synchronous task execution
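
The S3/filesystem switch could live in the single `settings.py` as Django's `STORAGES` setting. A minimal sketch, assuming django-storages' `storages.backends.s3.S3Storage` backend (with its `bucket_name` and `endpoint_url` options) and a hypothetical `USE_S3` flag; S3 credentials are left to the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` settings from the table above.

```python
def build_storages(use_s3: bool, bucket_name: str = "", endpoint_url: str = "",
                   media_root: str = "media") -> dict:
    """Return a STORAGES setting: Incus S3 bucket in deployment, local files otherwise."""
    staticfiles = {"BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage"}
    if use_s3:
        default = {
            "BACKEND": "storages.backends.s3.S3Storage",
            "OPTIONS": {"bucket_name": bucket_name, "endpoint_url": endpoint_url},
        }
    else:
        # Tests and local development: plain files under media_root.
        default = {
            "BACKEND": "django.core.files.storage.FileSystemStorage",
            "OPTIONS": {"location": media_root},
        }
    return {"default": default, "staticfiles": staticfiles}
```

With `django-environ`, the call might read `STORAGES = build_storages(env.bool("USE_S3", default=False), env("AWS_STORAGE_BUCKET_NAME", default=""), env("AWS_S3_ENDPOINT_URL", default=""))`, alongside `CELERY_TASK_ALWAYS_EAGER = env.bool("CELERY_TASK_ALWAYS_EAGER", default=False)`, so development and deployment differ only in `.env`.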

### 9. Testing Strategy

Follows Red Panda Standards: Django `TestCase`, separate test files per module.

| Test File | Scope |
|-----------|-------|
| `library/tests/test_models.py` | Neo4j node creation, relationships, property validation |
| `library/tests/test_content_types.py` | `load_library_types` command, configuration retrieval per library |
| `library/tests/test_indexes.py` | `setup_neo4j_indexes` command execution |
| `library/tests/test_api.py` | DRF endpoints for Library/Collection/Item CRUD |
| `library/tests/test_admin_views.py` | Custom admin views render and submit correctly |
| `llm_manager/tests/test_models.py` | LLMApi, LLMModel creation, new model types |
| `llm_manager/tests/test_api.py` | LLM Manager API endpoints |

**Neo4j test strategy:**
- Tests use a dedicated Neo4j test database (separate from development/production)
- `NEOMODEL_NEO4J_BOLT_URL` overridden in test settings to point to test database
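- Each test class clears the test database in `setUp` / `tearDown` with `neomodel.clear_neo4j_database(neomodel.db)`, because Django `TestCase` transaction rollback covers only PostgreSQL, not Neo4j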
- CI/CD (Gitea Runner on Puck) uses a Docker Neo4j instance for isolated test runs
- For local development without Neo4j, tests that require Neo4j are skipped via `@unittest.skipUnless(neo4j_available(), "Neo4j not available")`
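
`neo4j_available()` is referenced above but not defined. A minimal sketch, assuming a plain TCP probe of the Bolt host and port is enough to decide whether to skip; it does not check credentials (the `neo4j` driver's `Driver.verify_connectivity()` would be the stricter option). The default URL and one-second timeout are assumptions.

```python
import os
import socket
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_BOLT_URL = "bolt://localhost:7687"  # assumption: local Neo4j on the default Bolt port


def neo4j_available(url: Optional[str] = None, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the Bolt host/port opens within `timeout` seconds.

    This only proves that something is listening; it does not authenticate.
    """
    url = url or os.environ.get("NEOMODEL_NEO4J_BOLT_URL", DEFAULT_BOLT_URL)
    try:
        parts = urlsplit(url)
        if not parts.hostname:
            return False
        address = (parts.hostname, parts.port or 7687)  # .port raises ValueError if malformed
        with socket.create_connection(address, timeout=timeout):
            return True
    except (OSError, ValueError):
        return False
```

Because the test settings override the Bolt URL, test modules could call it as `neo4j_available(settings.NEOMODEL_NEO4J_BOLT_URL)` inside the `skipUnless` decorator; the decorator is evaluated at import time, so the short timeout keeps collection fast when Neo4j is down.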

## Dependencies

```toml
# pyproject.toml: floor-pinned with ceiling per Red Panda Standards
[project]
dependencies = [
    "Django>=5.2,<6.0",
    "djangorestframework>=3.14,<4.0",
    "django-neomodel>=0.1,<1.0",
    "neomodel>=5.3,<6.0",
    "neo4j>=5.0,<6.0",
    "celery>=5.3,<6.0",
    "django-storages[boto3]>=1.14,<2.0",
    "django-environ>=0.11,<1.0",
    "psycopg[binary]>=3.1,<4.0",
    "dj-database-url>=2.1,<3.0",
    "shortuuid>=1.0,<2.0",
    "gunicorn>=21.0,<24.0",
    "cryptography>=41.0,<45.0",
    "flower>=2.0,<3.0",
    "pymemcache>=4.0,<5.0",
    "django-heluca-themis",
]
```

## Success Criteria

- [ ] Config module renamed to `config/`, `pyproject.toml` at repo root with floor-pinned deps
- [ ] Settings load from environment variables via `django-environ` (`.env.example` provided)
- [ ] Django project runs with dual PostgreSQL + Neo4j databases
- [ ] Can create Library → Collection → Item through custom admin views
- [ ] DRF API endpoints return Library/Collection/Item data
- [ ] Neo4j graph shows correct node types and relationships
- [ ] Content-type configurations loaded via `load_library_types` and retrievable per library
- [ ] LLM Manager ported from Spelunker; uses Themis `UserAPIKey` for credential storage
- [ ] S3 storage configured against Incus bucket (Terraform-provisioned) and tested
- [ ] Celery worker connects to RabbitMQ on Oberon
- [ ] Structured logging configured (JSON format, compatible with Loki/Alloy)
- [ ] Tests pass for all Phase 1 apps (library, llm_manager)
- [ ] HAProxy route provisioned: `mnemosyne.ouranos.helu.ca`