# Phase 1: Foundation

## Objective

Establish the project skeleton, Neo4j data model, Django integration, and content-type system. By the end of this phase, libraries, collections, and items can be created through the custom admin views (rendered within Django admin), and the Neo4j graph is populated with the correct node and relationship structure.

## Deliverables

### 1. Django Project Skeleton

- Rename the configuration module from `mnemosyne/mnemosyne/` to `mnemosyne/config/` per Red Panda Standards
- Create `pyproject.toml` at the repo root with floor-pinned dependencies
- Create `.env` / `.env.example` for environment variables (never commit `.env`)
- Use a single `settings.py`, configured from `.env` via `django-environ`
- Configure dual databases: PostgreSQL (Django auth/config) + Neo4j (content graph)
- Install and configure `django-neomodel` for Neo4j OGM integration
- Configure `djangorestframework` for the API
- Configure Celery + RabbitMQ (Async Task pattern)
- Configure the S3 storage backend via Incus buckets (MinIO-backed, Terraform-provisioned)
- Configure structured logging for Loki integration via Alloy
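
The structured-logging item can be met with the standard library alone. A minimal sketch, assuming the app writes one JSON object per line to stdout and the host-level Alloy agent collects that stream; `JsonFormatter` and its field names are hypothetical, not an existing Themis or Red Panda component.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Tracebacks stay inside the JSON string, so each record is one line
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


# Candidate settings.LOGGING value; "()" tells dictConfig to build the formatter
# from this callable instead of logging.Formatter.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"json": {"()": JsonFormatter}},
    "handlers": {
        "stdout": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",
            "formatter": "json",
        }
    },
    "root": {"handlers": ["stdout"], "level": "INFO"},
}
```

Django passes `LOGGING` to `logging.config.dictConfig`; in `settings.py` the `()` entry would normally be a dotted string path to wherever the formatter lives.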

### 2. Django Apps

| App | Purpose | Database |
|-----|---------|----------|
| `themis` (installed) | User profiles, preferences, API key management, navigation, notifications | PostgreSQL |
| `library/` | Libraries, Collections, Items, Chunks, Concepts | Neo4j (neomodel) |
| `llm_manager/` | LLM API/model config, usage tracking | PostgreSQL (ported from Spelunker) |

> **Note:** Themis replaces `core/`. User profiles, timezone preferences, theme management, API key storage (Fernet-encrypted), and standard navigation are all provided by Themis, so no separate `core/` app is needed. If SSO (Casdoor) or Organization models are required in the future, they will be added as separate apps following the SSO and Organization patterns.

### 3. Neo4j Graph Model (neomodel)

```python
# library/models.py
from neomodel import (
    ArrayProperty,
    DateTimeProperty,
    FloatProperty,
    IntegerProperty,
    JSONProperty,
    RelationshipTo,
    StringProperty,
    StructuredNode,
    StructuredRel,
    UniqueIdProperty,
)


# Relationship models must exist before Item references them via model=.
# Edge properties are not yet specified for Phase 1, so these are empty placeholders.
class ReferencesRel(StructuredRel):
    """(:Item)-[:REFERENCES]->(:Concept)"""


class RelatedToRel(StructuredRel):
    """(:Item)-[:RELATED_TO]->(:Item)"""


class Library(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    library_type = StringProperty(required=True)  # fiction, technical, music, film, art, journal
    description = StringProperty(default='')

    # Content-type configuration (JSONProperty is stored as a JSON string).
    # default=dict is a callable, so each node gets its own empty dict.
    chunking_config = JSONProperty(default=dict)
    embedding_instruction = StringProperty(default='')
    reranker_instruction = StringProperty(default='')
    llm_context_prompt = StringProperty(default='')

    created_at = DateTimeProperty(default_now=True)
    collections = RelationshipTo('Collection', 'CONTAINS')


class Collection(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(required=True)
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)

    created_at = DateTimeProperty(default_now=True)
    items = RelationshipTo('Item', 'CONTAINS')
    # A separate BELONGS_TO edge: Library.collections.connect() does not create it
    library = RelationshipTo('Library', 'BELONGS_TO')


class Item(StructuredNode):
    uid = UniqueIdProperty()
    title = StringProperty(required=True)
    item_type = StringProperty(default='')
    s3_key = StringProperty(default='')
    content_hash = StringProperty(index=True)
    file_type = StringProperty(default='')
    file_size = IntegerProperty(default=0)
    metadata = JSONProperty(default=dict)

    created_at = DateTimeProperty(default_now=True)
    # default_now only fills the value on creation; refresh it explicitly on updates
    updated_at = DateTimeProperty(default_now=True)

    chunks = RelationshipTo('Chunk', 'HAS_CHUNK')
    images = RelationshipTo('Image', 'HAS_IMAGE')
    concepts = RelationshipTo('Concept', 'REFERENCES', model=ReferencesRel)
    related_items = RelationshipTo('Item', 'RELATED_TO', model=RelatedToRel)


class Chunk(StructuredNode):
    uid = UniqueIdProperty()
    chunk_index = IntegerProperty(required=True)
    chunk_s3_key = StringProperty(required=True)
    chunk_size = IntegerProperty(default=0)
    text_preview = StringProperty(default='')  # First 500 chars for full-text index
    embedding = ArrayProperty(FloatProperty())  # 4096d vector

    created_at = DateTimeProperty(default_now=True)
    mentions = RelationshipTo('Concept', 'MENTIONS')


class Concept(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    concept_type = StringProperty(default='')  # person, place, topic, technique, theme
    embedding = ArrayProperty(FloatProperty())  # 4096d vector

    related_concepts = RelationshipTo('Concept', 'RELATED_TO')


class Image(StructuredNode):
    uid = UniqueIdProperty()
    s3_key = StringProperty(required=True)
    image_type = StringProperty(default='')  # cover, diagram, artwork, still, photo
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)

    created_at = DateTimeProperty(default_now=True)
    embeddings = RelationshipTo('ImageEmbedding', 'HAS_EMBEDDING')


class ImageEmbedding(StructuredNode):
    uid = UniqueIdProperty()
    embedding = ArrayProperty(FloatProperty())  # 4096d multimodal vector
    created_at = DateTimeProperty(default_now=True)
```

### 4. Neo4j Index Setup

Management command: `python manage.py setup_neo4j_indexes`

Creates vector indexes (4096 dimensions, cosine similarity), full-text indexes, and constraint indexes.
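
A sketch of the DDL such a command might issue, written as a pure function so it can be tested without a running database. It assumes Neo4j 5 `CREATE ... IF NOT EXISTS` syntax, vector indexes on every 4096d `embedding` property in the model above, a full-text index on `Chunk.text_preview`, and `uid` uniqueness constraints; the helper and the index/constraint names are hypothetical. neomodel's own label installation may already create the `uid` constraints, and `IF NOT EXISTS` is meant to make re-runs harmless.

```python
VECTOR_DIMENSIONS = 4096

# (label, property) pairs holding 4096d embeddings in the Phase 1 model
VECTOR_TARGETS = [("Chunk", "embedding"), ("Concept", "embedding"), ("ImageEmbedding", "embedding")]
# (label, [properties]) pairs for full-text search
FULLTEXT_TARGETS = [("Chunk", ["text_preview"])]
UNIQUE_UID_LABELS = ["Library", "Collection", "Item", "Chunk", "Concept", "Image", "ImageEmbedding"]


def build_index_statements(dimensions: int = VECTOR_DIMENSIONS) -> list[str]:
    """Return idempotent Cypher DDL statements for setup_neo4j_indexes (hypothetical helper)."""
    statements = []
    for label, prop in VECTOR_TARGETS:
        statements.append(
            f"CREATE VECTOR INDEX {label.lower()}_{prop}_vector IF NOT EXISTS "
            f"FOR (n:{label}) ON (n.{prop}) "
            "OPTIONS {indexConfig: {"  # plain string: braces are literal Cypher map syntax
            f"`vector.dimensions`: {dimensions}, "
            "`vector.similarity_function`: 'cosine'}}"
        )
    for label, props in FULLTEXT_TARGETS:
        fields = ", ".join(f"n.{p}" for p in props)
        statements.append(
            f"CREATE FULLTEXT INDEX {label.lower()}_fulltext IF NOT EXISTS "
            f"FOR (n:{label}) ON EACH [{fields}]"
        )
    for label in UNIQUE_UID_LABELS:
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_uid_unique IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.uid IS UNIQUE"
        )
    return statements
```

Inside the command's `handle()`, each statement would be executed with `neomodel.db.cypher_query(statement)`.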

### 5. Content-Type System

Default library type configurations are loaded via a management command (`python manage.py load_library_types`). A management command is preferred over fixtures because these configurations will evolve across releases, and the command can be re-run idempotently to update defaults without overwriting per-library customizations.

Default configurations:

| Library Type | Chunking Strategy | Embedding Instruction | LLM Context |
|-------------|-------------------|----------------------|-------------|
| fiction | chapter_aware | narrative retrieval | "Excerpts from fiction..." |
| technical | section_aware | procedural retrieval | "Excerpts from technical docs..." |
| music | song_level | music discovery | "Song lyrics and metadata..." |
| film | scene_level | cinematic retrieval | "Film content..." |
| art | description_level | visual/stylistic retrieval | "Artwork descriptions..." |
| journal | entry_level | temporal/reflective retrieval | "Personal journal entries..." |
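
One possible merge policy for the idempotent re-run: fill blank fields from the defaults and merge dictionary-valued config key by key, so a library's own values always win while newly added default keys still propagate. This is a sketch under assumptions: the registry layout, the `strategy` key inside `chunking_config`, and the helper name are hypothetical, and the instruction values are the short labels from the table rather than full instruction text (which this plan does not specify). One limitation: a scalar default that changes between releases will not reach libraries that already hold the old value.

```python
from copy import deepcopy

# Hypothetical default registry; keys mirror Library properties.
DEFAULT_LIBRARY_TYPES = {
    "fiction": {"chunking_config": {"strategy": "chapter_aware"}, "embedding_instruction": "narrative retrieval"},
    "technical": {"chunking_config": {"strategy": "section_aware"}, "embedding_instruction": "procedural retrieval"},
    "music": {"chunking_config": {"strategy": "song_level"}, "embedding_instruction": "music discovery"},
    "film": {"chunking_config": {"strategy": "scene_level"}, "embedding_instruction": "cinematic retrieval"},
    "art": {"chunking_config": {"strategy": "description_level"}, "embedding_instruction": "visual/stylistic retrieval"},
    "journal": {"chunking_config": {"strategy": "entry_level"}, "embedding_instruction": "temporal/reflective retrieval"},
}


def apply_defaults(current: dict, library_type: str) -> dict:
    """Return a copy of a library's config with defaults filled in; set values are never overwritten."""
    defaults = DEFAULT_LIBRARY_TYPES[library_type]
    merged = deepcopy(current)
    for field, default in defaults.items():
        value = merged.get(field)
        if isinstance(default, dict):
            # Key-wise merge: new default keys are added, library-specific keys win
            merged[field] = {**deepcopy(default), **(value or {})}
        elif not value:
            # Blank scalar fields ('' or missing) take the default
            merged[field] = default
    return merged
```

Because `apply_defaults(apply_defaults(x, t), t) == apply_defaults(x, t)`, running `load_library_types` repeatedly leaves already-merged libraries unchanged.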

### 6. Admin & Management UI

`django-neomodel`'s admin support is limited: `StructuredNode` models don't participate in Django's ORM, so standard `ModelAdmin` features (filters, search, inlines) don't work. Instead:

- **Custom admin views** for Library, Collection, and Item CRUD using Cypher/neomodel queries, rendered in Django admin's template structure
- **DRF management API** (`/api/v1/library/`, `/api/v1/collection/`, `/api/v1/item/`) for programmatic access and future frontend consumption
- Library CRUD includes content-type configuration editing
- Collection/Item views support filtering by library, type, and date
- All admin views extend `themis/base.html` for consistent navigation
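
The library/type/date filtering can be pushed into a single parameterized Cypher query. A minimal sketch, assuming the `Library -[:CONTAINS]-> Collection -[:CONTAINS]-> Item` path from the model above and treating "type" as `Item.item_type`; the helper name and its defaults are hypothetical. Values travel as query parameters rather than being interpolated, which keeps user input out of the query text.

```python
from datetime import datetime
from typing import Optional


def build_item_filter_query(
    library_uid: Optional[str] = None,
    item_type: Optional[str] = None,
    created_after: Optional[datetime] = None,
    created_before: Optional[datetime] = None,
    limit: int = 50,
) -> tuple[str, dict]:
    """Build a parameterized Cypher query for an Item list view (hypothetical helper)."""
    match = "MATCH (c:Collection)-[:CONTAINS]->(i:Item)"
    conditions, params = [], {"limit": limit}
    if library_uid:
        match = "MATCH (l:Library {uid: $library_uid})-[:CONTAINS]->(c:Collection)-[:CONTAINS]->(i:Item)"
        params["library_uid"] = library_uid
    if item_type:
        conditions.append("i.item_type = $item_type")
        params["item_type"] = item_type
    # neomodel's DateTimeProperty stores a POSIX timestamp (float);
    # pass timezone-aware datetimes so .timestamp() is unambiguous.
    if created_after:
        conditions.append("i.created_at >= $created_after")
        params["created_after"] = created_after.timestamp()
    if created_before:
        conditions.append("i.created_at < $created_before")
        params["created_before"] = created_before.timestamp()
    where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    # DISTINCT guards against an Item that sits in more than one Collection
    query = f"{match}{where} RETURN DISTINCT i ORDER BY i.created_at DESC LIMIT $limit"
    return query, params
```

A custom admin view could then run it with `neomodel.db.cypher_query(query, params, resolve_objects=True)` to get inflated `Item` nodes back.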

### 7. LLM Manager (Port from Spelunker)

Copy and adapt the `llm_manager/` app from Spelunker:

- `LLMApi` model (OpenAI-compatible API endpoints)
- `LLMModel` model (with new `reranker` and `multimodal_embed` model types)
- `LLMUsage` tracking
- **API key storage uses Themis `UserAPIKey`.** LLM Manager does not implement its own encrypted key storage. API credentials for LLM providers are stored via Themis's Fernet-encrypted `UserAPIKey` model with `key_type='api'` and an appropriate `service_name` (e.g., "OpenAI", "Arke"). `LLMApi` references credentials by service name lookup against the requesting user's Themis keys.
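
The service-name lookup in the last bullet can be sketched in plain Python. `StoredKey` is a hypothetical stand-in for a decrypted `UserAPIKey` row: only `key_type` and `service_name` come from this plan, while `user_id`, `secret`, the case-insensitive match, and the error on duplicates are assumptions. In the Django app, the list comprehension would become a `UserAPIKey` queryset filter scoped to the requesting user.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StoredKey:
    """Stand-in for a decrypted Themis UserAPIKey row (fields partly hypothetical)."""
    user_id: int
    key_type: str
    service_name: str
    secret: str


def resolve_llm_credential(keys: Iterable[StoredKey], user_id: int, service_name: str) -> str:
    """Return the requesting user's API secret for an LLMApi's service name."""
    wanted = service_name.casefold()  # "Arke" and "arke" refer to the same service
    matches = [
        k for k in keys
        if k.user_id == user_id and k.key_type == "api" and k.service_name.casefold() == wanted
    ]
    if not matches:
        raise LookupError(f"No 'api' key for service {service_name!r}")
    if len(matches) > 1:
        # Ambiguous: refuse rather than silently picking one credential
        raise LookupError(f"Multiple 'api' keys for service {service_name!r}")
    return matches[0].secret
```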

Schema additions to Spelunker's `LLMModel`:

| Field | Change | Purpose |
|-------|--------|---------|
| `model_type` | Add choices: `reranker`, `multimodal_embed` | Support Qwen3-VL reranker and embedding models |
| `supports_multimodal` | New `BooleanField` | Flag models that accept image+text input |
| `vector_dimensions` | New `IntegerField` | Embedding output dimensions (e.g., 4096) |

### 8. Infrastructure Wiring (Ouranos)

All connections follow Ouranos DNS conventions: use `.incus` hostnames and never hardcode IP addresses.

| Service | Host | Connection | Settings Variable |
|---------|------|------------|-------------------|
| PostgreSQL | `portia.incus:5432` | Database `mnemosyne` (must be provisioned) | `DATABASE_URL` |
| Neo4j (Bolt) | `ariel.incus:25554` | Neo4j 5.26.0 | `NEOMODEL_NEO4J_BOLT_URL` |
| Neo4j (HTTP) | `ariel.incus:25584` | Browser/API access | — |
| RabbitMQ | `oberon.incus:5672` | Message broker | `CELERY_BROKER_URL` |
| S3 (Incus) | Terraform-provisioned Incus bucket | MinIO-backed object storage | `AWS_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_STORAGE_BUCKET_NAME` |
| Arke LLM Proxy | `sycorax.incus:25540` | LLM API routing | Configured per `LLMApi` record |
| SMTP (dev) | `oberon.incus:22025` | smtp4dev test server | `EMAIL_HOST` |
| Loki (logs) | `prospero.incus:3100` | Via Alloy agent (host-level, not app-level) | — |
| Casdoor SSO | `titania.incus:22081` | Future: SSO pattern | — |
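
The DNS convention can be checked mechanically at startup. A minimal sketch, assuming settings are read from a mapping such as `os.environ`: it flags any host-bearing variable from the table whose host is a literal IP address, while allowing loopback addresses for local SSH tunnels. The function name and the loopback exception are assumptions.

```python
import ipaddress
from collections.abc import Mapping
from urllib.parse import urlsplit

# Host-bearing settings variables from the wiring table
HOST_SETTINGS = (
    "DATABASE_URL",
    "NEOMODEL_NEO4J_BOLT_URL",
    "CELERY_BROKER_URL",
    "AWS_S3_ENDPOINT_URL",
    "EMAIL_HOST",
)


def settings_with_hardcoded_ips(env: Mapping[str, str]) -> list[str]:
    """Return names of settings whose host is a literal, non-loopback IP address."""
    offenders = []
    for name in HOST_SETTINGS:
        value = (env.get(name) or "").strip()
        if not value:
            continue
        # URLs (postgres://, bolt://, amqp://, https://) vs. a bare host such as EMAIL_HOST
        host = urlsplit(value).hostname if "://" in value else value
        if not host:
            continue
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            continue  # a DNS name such as ariel.incus: compliant
        if not address.is_loopback:  # 127.0.0.1 / ::1 allowed for SSH tunnels
            offenders.append(name)
    return offenders
```

`settings.py` could raise `django.core.exceptions.ImproperlyConfigured` when the returned list is non-empty.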

**Terraform provisioning required before Phase 1 deployment:**

- PostgreSQL database `mnemosyne` on Portia
- Incus S3 bucket for Mnemosyne content storage
- HAProxy route: `mnemosyne.ouranos.helu.ca` → `puck.incus:<port>` (port TBD; assign the next available port in the 22xxx range)

**Development environment (local):**

- PostgreSQL for the Django ORM on `portia.incus`
- Local Neo4j instance, or `ariel.incus` via SSH tunnel
- `django.core.files.storage.FileSystemStorage` in place of S3 (tests/dev)
- `CELERY_TASK_ALWAYS_EAGER=True` for synchronous task execution
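
The S3/filesystem switch maps onto Django's `STORAGES` setting. A minimal sketch, assuming django-storages' `storages.backends.s3.S3Storage` backend (available from the pinned 1.14 floor) and the `AWS_*` variable names from the wiring table; the helper name and the `use_s3` toggle are hypothetical. Missing S3 variables fail fast with a `KeyError`.

```python
from collections.abc import Mapping


def build_storages(env: Mapping[str, str], use_s3: bool) -> dict:
    """Return a Django STORAGES value: Incus S3 bucket when deployed, local files in dev/tests."""
    static = {"BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage"}
    if not use_s3:
        # FileSystemStorage writes under MEDIA_ROOT by default
        return {
            "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
            "staticfiles": static,
        }
    return {
        "default": {
            "BACKEND": "storages.backends.s3.S3Storage",
            "OPTIONS": {
                # MinIO-backed Incus bucket: endpoint_url points away from AWS
                "endpoint_url": env["AWS_S3_ENDPOINT_URL"],
                "access_key": env["AWS_ACCESS_KEY_ID"],
                "secret_key": env["AWS_SECRET_ACCESS_KEY"],
                "bucket_name": env["AWS_STORAGE_BUCKET_NAME"],
            },
        },
        "staticfiles": static,
    }
```

In `settings.py` this might read `STORAGES = build_storages(os.environ, use_s3=not DEBUG)`, alongside `CELERY_TASK_ALWAYS_EAGER` for the same local profile.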

### 9. Testing Strategy

Follows Red Panda Standards: Django `TestCase`, separate test files per module.

| Test File | Scope |
|-----------|-------|
| `library/tests/test_models.py` | Neo4j node creation, relationships, property validation |
| `library/tests/test_content_types.py` | `load_library_types` command, configuration retrieval per library |
| `library/tests/test_indexes.py` | `setup_neo4j_indexes` command execution |
| `library/tests/test_api.py` | DRF endpoints for Library/Collection/Item CRUD |
| `library/tests/test_admin_views.py` | Custom admin views render and submit correctly |
| `llm_manager/tests/test_models.py` | LLMApi, LLMModel creation, new model types |
| `llm_manager/tests/test_api.py` | LLM Manager API endpoints |

**Neo4j test strategy:**

- Tests use a dedicated Neo4j test database (separate from development/production)
- `NEOMODEL_NEO4J_BOLT_URL` is overridden in test settings to point to the test database
- Tests clear the graph in `setUp` / `tearDown` using neomodel's `clear_neo4j_database` utility, because Django's `TestCase` transaction rollback covers only Django-managed databases, not Neo4j
- CI/CD (Gitea Runner on Puck) uses a Docker Neo4j instance for isolated test runs
- For local development without Neo4j, tests that require Neo4j are skipped via `@unittest.skipUnless(neo4j_available(), "Neo4j not available")`
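
The `neo4j_available()` helper used in the skip decorator is not defined in this plan. A minimal sketch, assuming a TCP probe of the host and port in `NEOMODEL_NEO4J_BOLT_URL` is enough to decide whether to skip; it proves only that the Bolt port is open, not that credentials are valid.

```python
import os
import socket
from typing import Optional
from urllib.parse import urlsplit


def neo4j_available(url: Optional[str] = None, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the Bolt endpoint succeeds (hypothetical helper)."""
    url = url or os.environ.get("NEOMODEL_NEO4J_BOLT_URL")
    if not url:
        return False
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 7687  # 7687 is Neo4j's default Bolt port
    if not host:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

Because `skipUnless` evaluates its condition at import time, the probe runs once per test module rather than once per test.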

## Dependencies

```toml
# pyproject.toml: floor-pinned with ceiling per Red Panda Standards
[project]
dependencies = [
    "Django>=5.2,<6.0",
    "djangorestframework>=3.14,<4.0",
    "django-neomodel>=0.1,<1.0",
    "neomodel>=5.3,<6.0",
    "neo4j>=5.0,<6.0",
    "celery>=5.3,<6.0",
    "django-storages[boto3]>=1.14,<2.0",
    "django-environ>=0.11,<1.0",
    "psycopg[binary]>=3.1,<4.0",
    "dj-database-url>=2.1,<3.0",
    "shortuuid>=1.0,<2.0",
    "gunicorn>=21.0,<24.0",
    "cryptography>=41.0,<45.0",
    "flower>=2.0,<3.0",
    "pymemcache>=4.0,<5.0",
    "django-heluca-themis",
]
```

## Success Criteria

- [ ] Config module renamed to `config/`; `pyproject.toml` at repo root with floor-pinned deps
- [ ] Settings load from environment variables via `django-environ` (`.env.example` provided)
- [ ] Django project runs with dual PostgreSQL + Neo4j databases
- [ ] Can create Library → Collection → Item through custom admin views
- [ ] DRF API endpoints return Library/Collection/Item data
- [ ] Neo4j graph shows correct node types and relationships
- [ ] Content-type configurations loaded via `load_library_types` and retrievable per library
- [ ] LLM Manager ported from Spelunker; uses Themis `UserAPIKey` for credential storage
- [ ] S3 storage configured against Incus bucket (Terraform-provisioned) and tested
- [ ] Celery worker connects to RabbitMQ on Oberon
- [ ] Structured logging configured (JSON format, compatible with Loki/Alloy)
- [ ] Tests pass for all Phase 1 apps (library, llm_manager)
- [ ] HAProxy route provisioned: `mnemosyne.ouranos.helu.ca`