Phase 1: Foundation

Objective

Establish the project skeleton, Neo4j data model, Django integration, and content-type system. At the end of this phase, you can create libraries, collections, and items through custom Django admin views, and the Neo4j graph is populated with the correct node/relationship structure.

Deliverables

1. Django Project Skeleton

  • Rename configuration module from mnemosyne/mnemosyne/ to mnemosyne/config/ per Red Panda Standards
  • Create pyproject.toml at repo root with floor-pinned dependencies
  • Create .env / .env.example for environment variables (never commit .env)
  • Use a single settings.py, configured from .env via django-environ
  • Configure dual-database: PostgreSQL (Django auth/config) + Neo4j (content graph)
  • Install and configure django-neomodel for Neo4j OGM integration
  • Configure djangorestframework for API
  • Configure Celery + RabbitMQ (Async Task pattern)
  • Configure S3 storage backend via Incus buckets (MinIO-backed, Terraform-provisioned)
  • Configure structured logging for Loki integration via Alloy
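The structured-logging item can be illustrated with a minimal sketch that uses only the standard library. It assumes the Alloy agent collects the process output at host level and forwards it to Loki, so the application only has to emit one JSON object per line. The `JsonFormatter` class name and the field names in the payload are illustrative assumptions, not part of the project.

```python
import json
import logging
import logging.config
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep tracebacks inside the JSON object so each event stays on one line.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


# Shape of a Django LOGGING setting that routes everything through the formatter.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"json": {"()": JsonFormatter}},
    "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "json"}},
    "root": {"handlers": ["console"], "level": "INFO"},
}

if __name__ == "__main__":
    logging.config.dictConfig(LOGGING)
    logging.getLogger("mnemosyne.library").info("library created: %s", "fiction")
```

In settings.py the `"()"` entry would more commonly be a dotted path string to the formatter class; the class object is used here only to keep the sketch self-contained.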

2. Django Apps

| App | Purpose | Database |
|---|---|---|
| themis (installed) | User profiles, preferences, API key management, navigation, notifications | PostgreSQL |
| library/ | Libraries, Collections, Items, Chunks, Concepts | Neo4j (neomodel) |
| llm_manager/ | LLM API/model config, usage tracking | PostgreSQL (ported from Spelunker) |

Note: Themis replaces core/. User profiles, timezone preferences, theme management, API key storage (encrypted, Fernet), and standard navigation are all provided by Themis. No separate core/ app is needed. If SSO (Casdoor) or Organization models are required in future, they will be added as separate apps following the SSO and Organization patterns.

3. Neo4j Graph Model (neomodel)

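# library/models.py
from neomodel import (
    ArrayProperty,
    DateTimeProperty,
    FloatProperty,
    IntegerProperty,
    JSONProperty,
    RelationshipTo,
    StringProperty,
    StructuredNode,
    StructuredRel,
    UniqueIdProperty,
)


class ReferencesRel(StructuredRel):
    """Item -[REFERENCES]-> Concept. Relationship properties are not specified in this document."""


class RelatedToRel(StructuredRel):
    """Item -[RELATED_TO]-> Item. Relationship properties are not specified in this document."""
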
class Library(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    library_type = StringProperty(required=True)  # fiction, technical, music, film, art, journal
    description = StringProperty(default='')
    
    # Content-type configuration (chunking_config is serialized to a JSON string)
    chunking_config = JSONProperty(default=dict)  # callable default avoids sharing one mutable dict
    embedding_instruction = StringProperty(default='')
    reranker_instruction = StringProperty(default='')
    llm_context_prompt = StringProperty(default='')
    
    created_at = DateTimeProperty(default_now=True)
    collections = RelationshipTo('Collection', 'CONTAINS')


class Collection(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(required=True)
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)
    
    created_at = DateTimeProperty(default_now=True)
    items = RelationshipTo('Item', 'CONTAINS')
    library = RelationshipTo('Library', 'BELONGS_TO')


class Item(StructuredNode):
    uid = UniqueIdProperty()
    title = StringProperty(required=True)
    item_type = StringProperty(default='')
    s3_key = StringProperty(default='')
    content_hash = StringProperty(index=True)
    file_type = StringProperty(default='')
    file_size = IntegerProperty(default=0)
    metadata = JSONProperty(default=dict)
    
    created_at = DateTimeProperty(default_now=True)
    updated_at = DateTimeProperty(default_now=True)
    
    chunks = RelationshipTo('Chunk', 'HAS_CHUNK')
    images = RelationshipTo('Image', 'HAS_IMAGE')
    concepts = RelationshipTo('Concept', 'REFERENCES', model=ReferencesRel)
    related_items = RelationshipTo('Item', 'RELATED_TO', model=RelatedToRel)


class Chunk(StructuredNode):
    uid = UniqueIdProperty()
    chunk_index = IntegerProperty(required=True)
    chunk_s3_key = StringProperty(required=True)
    chunk_size = IntegerProperty(default=0)
    text_preview = StringProperty(default='')  # First 500 chars for full-text index
    embedding = ArrayProperty(FloatProperty())  # 4096d vector
    
    created_at = DateTimeProperty(default_now=True)
    mentions = RelationshipTo('Concept', 'MENTIONS')


class Concept(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True, required=True)
    concept_type = StringProperty(default='')  # person, place, topic, technique, theme
    embedding = ArrayProperty(FloatProperty())  # 4096d vector
    
    related_concepts = RelationshipTo('Concept', 'RELATED_TO')


class Image(StructuredNode):
    uid = UniqueIdProperty()
    s3_key = StringProperty(required=True)
    image_type = StringProperty(default='')  # cover, diagram, artwork, still, photo
    description = StringProperty(default='')
    metadata = JSONProperty(default=dict)
    
    created_at = DateTimeProperty(default_now=True)
    embeddings = RelationshipTo('ImageEmbedding', 'HAS_EMBEDDING')


class ImageEmbedding(StructuredNode):
    uid = UniqueIdProperty()
    embedding = ArrayProperty(FloatProperty())  # 4096d multimodal vector
    created_at = DateTimeProperty(default_now=True)

4. Neo4j Index Setup

Management command: python manage.py setup_neo4j_indexes

Creates vector indexes (4096d cosine), full-text indexes, and constraint indexes.
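As a rough sketch of what the command might execute, the helper below builds the Cypher DDL for the three index kinds. The index and constraint names, and the choice of which labels and properties to index, are assumptions inferred from the graph model in section 3; the document does not list them. The `CREATE VECTOR INDEX` form assumes a recent Neo4j 5.x server such as the 5.26.0 instance named under Infrastructure Wiring.

```python
VECTOR_DIMENSIONS = 4096  # matches the 4096d embeddings in the graph model

# name -> (label, property); all names here are hypothetical.
VECTOR_INDEXES = {
    "chunk_embedding": ("Chunk", "embedding"),
    "concept_embedding": ("Concept", "embedding"),
    "image_embedding": ("ImageEmbedding", "embedding"),
}
FULLTEXT_INDEXES = {"chunk_text_preview": ("Chunk", ["text_preview"])}
UNIQUE_CONSTRAINTS = {
    "library_name_unique": ("Library", "name"),
    "concept_name_unique": ("Concept", "name"),
}


def build_index_statements() -> list[str]:
    """Return idempotent Cypher statements (every one uses IF NOT EXISTS)."""
    statements = []
    for name, (label, prop) in VECTOR_INDEXES.items():
        statements.append(
            f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
            f"FOR (n:{label}) ON (n.{prop}) "
            "OPTIONS {indexConfig: {"
            f"`vector.dimensions`: {VECTOR_DIMENSIONS}, "
            "`vector.similarity_function`: 'cosine'}}"
        )
    for name, (label, props) in FULLTEXT_INDEXES.items():
        columns = ", ".join(f"n.{p}" for p in props)
        statements.append(
            f"CREATE FULLTEXT INDEX {name} IF NOT EXISTS "
            f"FOR (n:{label}) ON EACH [{columns}]"
        )
    for name, (label, prop) in UNIQUE_CONSTRAINTS.items():
        statements.append(
            f"CREATE CONSTRAINT {name} IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
        )
    return statements


if __name__ == "__main__":
    for statement in build_index_statements():
        print(statement)
```

Inside the management command, each statement could be passed to neomodel's `db.cypher_query(statement)`. Because every statement uses `IF NOT EXISTS`, re-running the command should be harmless.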

5. Content-Type System

Default library type configurations loaded via management command (python manage.py load_library_types). A management command is preferred over fixtures because these configurations will evolve across releases, and the command can be re-run idempotently to update defaults without overwriting per-library customizations.

Default configurations:

| Library Type | Chunking Strategy | Embedding Instruction | LLM Context |
|---|---|---|---|
| fiction | chapter_aware | narrative retrieval | "Excerpts from fiction..." |
| technical | section_aware | procedural retrieval | "Excerpts from technical docs..." |
| music | song_level | music discovery | "Song lyrics and metadata..." |
| film | scene_level | cinematic retrieval | "Film content..." |
| art | description_level | visual/stylistic retrieval | "Artwork descriptions..." |
| journal | entry_level | temporal/reflective retrieval | "Personal journal entries..." |
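One way the idempotent behaviour of load_library_types could work is sketched below: defaults fill only fields that are still empty, so a re-run leaves customized fields alone. This is an assumption about the merge rule, not the project's implementation. The `apply_defaults` helper and the `{"strategy": ...}` shape of `chunking_config` are hypothetical, and only two of the six library types are shown, with prompt text abbreviated as in the table.

```python
import copy

# Hypothetical subset of the default configurations listed above.
DEFAULT_LIBRARY_TYPES = {
    "fiction": {
        "chunking_config": {"strategy": "chapter_aware"},
        "embedding_instruction": "narrative retrieval",
        "llm_context_prompt": "Excerpts from fiction...",
    },
    "technical": {
        "chunking_config": {"strategy": "section_aware"},
        "embedding_instruction": "procedural retrieval",
        "llm_context_prompt": "Excerpts from technical docs...",
    },
}


def apply_defaults(current: dict, defaults: dict) -> dict:
    """Return a copy of `current` with empty fields filled from `defaults`.

    Fields that already hold a non-empty value are treated as per-library
    customizations and are left untouched.
    """
    merged = dict(current)
    for field, value in defaults.items():
        if not merged.get(field):
            # Deep-copy so libraries never share one mutable default object.
            merged[field] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    library = {"embedding_instruction": "my custom instruction", "chunking_config": {}}
    print(apply_defaults(library, DEFAULT_LIBRARY_TYPES["fiction"]))
```

A limitation of this rule is that a library still holding an older default is indistinguishable from a customized one, so revised defaults would not propagate to it. Handling that case would need extra bookkeeping, such as recording which defaults version each field came from.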

6. Admin & Management UI

django-neomodel's admin support is limited — StructuredNode models don't participate in Django's ORM, so standard ModelAdmin, filters, search, and inlines don't work. Instead:

  • Custom admin views for Library, Collection, and Item CRUD using Cypher/neomodel queries, rendered in Django admin's template structure
  • DRF management API (/api/v1/library/, /api/v1/collection/, /api/v1/item/) for programmatic access and future frontend consumption
  • Library CRUD includes content-type configuration editing
  • Collection/Item views support filtering by library, type, and date
  • All admin views extend themis/base.html for consistent navigation
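Since the standard ORM filters are unavailable, the filtering described above might be expressed as a parameterized Cypher query. The sketch below is one possible shape. The `build_item_query` function is hypothetical, the traversal follows the CONTAINS relationships from the graph model, and the date comparison assumes neomodel stores `DateTimeProperty` values as epoch seconds.

```python
from datetime import datetime, timezone
from typing import Optional


def build_item_query(
    library_uid: Optional[str] = None,
    item_type: Optional[str] = None,
    created_after: Optional[datetime] = None,
) -> tuple[str, dict]:
    """Build a Cypher query and its parameters for listing Items."""
    clauses = []
    params = {}
    if library_uid is not None:
        clauses.append("l.uid = $library_uid")
        params["library_uid"] = library_uid
    if item_type is not None:
        clauses.append("i.item_type = $item_type")
        params["item_type"] = item_type
    if created_after is not None:
        if created_after.tzinfo is None:
            # Treat naive datetimes as UTC (an assumption of this sketch).
            created_after = created_after.replace(tzinfo=timezone.utc)
        clauses.append("i.created_at >= $created_after")
        params["created_after"] = created_after.timestamp()
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    query = (
        "MATCH (l:Library)-[:CONTAINS]->(:Collection)-[:CONTAINS]->(i:Item)"
        + where
        + " RETURN i ORDER BY i.created_at DESC"
    )
    return query, params


if __name__ == "__main__":
    query, params = build_item_query(item_type="book")
    print(query)
    print(params)
```

User input only ever reaches the query through parameters, never through string interpolation. A view could pass the result to neomodel's `db.cypher_query(query, params, resolve_objects=True)` to get `Item` instances back.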

7. LLM Manager (Port from Spelunker)

Copy and adapt llm_manager/ app from Spelunker:

  • LLMApi model (OpenAI-compatible API endpoints)
  • LLMModel model (with new reranker and multimodal_embed model types)
  • LLMUsage tracking
  • API key storage uses Themis UserAPIKey — LLM Manager does not implement its own encrypted key storage. API credentials for LLM providers are stored via Themis's Fernet-encrypted UserAPIKey model with key_type='api' and appropriate service_name (e.g., "OpenAI", "Arke"). LLMApi references credentials by service name lookup against the requesting user's Themis keys.

Schema additions to Spelunker's LLMModel:

| Field | Change | Purpose |
|---|---|---|
| model_type | Add choices: reranker, multimodal_embed | Support Qwen3-VL reranker and embedding models |
| supports_multimodal | New BooleanField | Flag models that accept image+text input |
| vector_dimensions | New IntegerField | Embedding output dimensions (e.g., 4096) |

8. Infrastructure Wiring (Ouranos)

All connections follow Ouranos DNS conventions — use .incus hostnames, never hardcode IPs.

| Service | Host | Connection | Settings Variable |
|---|---|---|---|
| PostgreSQL | portia.incus:5432 | Database mnemosyne (must be provisioned) | DATABASE_URL |
| Neo4j (Bolt) | ariel.incus:25554 | Neo4j 5.26.0 | NEOMODEL_NEO4J_BOLT_URL |
| Neo4j (HTTP) | ariel.incus:25584 | Browser/API access | |
| RabbitMQ | oberon.incus:5672 | Message broker | CELERY_BROKER_URL |
| S3 (Incus) | Terraform-provisioned Incus bucket | MinIO-backed object storage | AWS_S3_ENDPOINT_URL, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_STORAGE_BUCKET_NAME |
| Arke LLM Proxy | sycorax.incus:25540 | LLM API routing | Configured per LLMApi record |
| SMTP (dev) | oberon.incus:22025 | smtp4dev test server | EMAIL_HOST |
| Loki (logs) | prospero.incus:3100 | Via Alloy agent | (host-level, not app-level) |
| Casdoor SSO | titania.incus:22081 | Future: SSO pattern | |
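A small startup check can catch a missing variable from the table before Django tries to connect. The following is a sketch under stated assumptions: the `missing_settings` helper is hypothetical, the set of variables treated as required is taken from the table, and every credential and the S3 endpoint in `EXAMPLE_ENV` is a placeholder, not a real value.

```python
import os
from typing import Mapping

# Variables named in the Settings Variable column above.
REQUIRED_SETTINGS = (
    "DATABASE_URL",
    "NEOMODEL_NEO4J_BOLT_URL",
    "CELERY_BROKER_URL",
    "AWS_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_STORAGE_BUCKET_NAME",
    "EMAIL_HOST",
)

# Placeholder values in the style of a .env.example file; hosts follow the
# .incus naming convention, user names and passwords are invented.
EXAMPLE_ENV = {
    "DATABASE_URL": "postgres://mnemosyne:change-me@portia.incus:5432/mnemosyne",
    "NEOMODEL_NEO4J_BOLT_URL": "bolt://neo4j:change-me@ariel.incus:25554",
    "CELERY_BROKER_URL": "amqp://mnemosyne:change-me@oberon.incus:5672//",
    "AWS_S3_ENDPOINT_URL": "https://<incus-bucket-endpoint>",
    "AWS_ACCESS_KEY_ID": "change-me",
    "AWS_SECRET_ACCESS_KEY": "change-me",
    "AWS_STORAGE_BUCKET_NAME": "mnemosyne",
    "EMAIL_HOST": "oberon.incus",
}


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are absent or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```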

Terraform provisioning required before Phase 1 deployment:

  • PostgreSQL database mnemosyne on Portia
  • Incus S3 bucket for Mnemosyne content storage
  • HAProxy route: mnemosyne.ouranos.helu.ca → puck.incus:<port> (port TBD, assign next available in 22xxx range)

Development environment (local):

  • PostgreSQL for Django ORM on 'portia.incus'
  • Local Neo4j instance or ariel.incus via SSH tunnel
  • django.core.files.storage.FileSystemStorage in place of the S3 backend (tests/dev)
  • CELERY_TASK_ALWAYS_EAGER=True for synchronous task execution

9. Testing Strategy

Follows Red Panda Standards: Django TestCase, separate test files per module.

| Test File | Scope |
|---|---|
| library/tests/test_models.py | Neo4j node creation, relationships, property validation |
| library/tests/test_content_types.py | load_library_types command, configuration retrieval per library |
| library/tests/test_indexes.py | setup_neo4j_indexes command execution |
| library/tests/test_api.py | DRF endpoints for Library/Collection/Item CRUD |
| library/tests/test_admin_views.py | Custom admin views render and submit correctly |
| llm_manager/tests/test_models.py | LLMApi, LLMModel creation, new model types |
| llm_manager/tests/test_api.py | LLM Manager API endpoints |

Neo4j test strategy:

  • Tests use a dedicated Neo4j test database (separate from development/production)
  • NEOMODEL_NEO4J_BOLT_URL overridden in test settings to point to test database
  • Each test class clears its nodes in setUp / tearDown using neomodel.db.clear_neo4j_database()
  • CI/CD (Gitea Runner on Puck) uses a Docker Neo4j instance for isolated test runs
  • For local development without Neo4j, tests that require Neo4j are skipped via @unittest.skipUnless(neo4j_available(), "Neo4j not available")
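The `neo4j_available()` helper named in the last bullet is not defined in this document. One possible sketch is a short TCP probe against the Bolt endpoint. It only checks that the port accepts connections, not that Bolt authentication succeeds. The default URL is a placeholder, and plain `unittest.TestCase` stands in for Django's `TestCase` so the example stays self-contained.

```python
import os
import socket
import unittest
from typing import Optional
from urllib.parse import urlparse

DEFAULT_BOLT_URL = "bolt://neo4j:neo4j@localhost:7687"  # placeholder for local runs


def neo4j_available(bolt_url: Optional[str] = None, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the Bolt host and port succeeds."""
    url = urlparse(bolt_url or os.environ.get("NEOMODEL_NEO4J_BOLT_URL", DEFAULT_BOLT_URL))
    host = url.hostname or "localhost"
    port = url.port or 7687
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


@unittest.skipUnless(neo4j_available(), "Neo4j not available")
class LibraryModelTests(unittest.TestCase):
    def test_requires_neo4j(self):
        # Real tests would create and query nodes here.
        self.assertTrue(neo4j_available())
```

The probe runs once at import time when the decorator is evaluated, so the short timeout keeps test collection fast on machines without Neo4j.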

Dependencies

# pyproject.toml — floor-pinned with ceiling per Red Panda Standards
dependencies = [
    "Django>=5.2,<6.0",
    "djangorestframework>=3.14,<4.0",
    "django-neomodel>=0.1,<1.0",
    "neomodel>=5.3,<6.0",
    "neo4j>=5.0,<6.0",
    "celery>=5.3,<6.0",
    "django-storages[boto3]>=1.14,<2.0",
    "django-environ>=0.11,<1.0",
    "psycopg[binary]>=3.1,<4.0",
    "dj-database-url>=2.1,<3.0",
    "shortuuid>=1.0,<2.0",
    "gunicorn>=21.0,<24.0",
    "cryptography>=41.0,<45.0",
    "flower>=2.0,<3.0",
    "pymemcache>=4.0,<5.0",
    "django-heluca-themis",
]

Success Criteria

  • Config module renamed to config/, pyproject.toml at repo root with floor-pinned deps
  • Settings load from environment variables via django-environ (.env.example provided)
  • Django project runs with dual PostgreSQL + Neo4j databases
  • Can create Library → Collection → Item through custom admin views
  • DRF API endpoints return Library/Collection/Item data
  • Neo4j graph shows correct node types and relationships
  • Content-type configurations loaded via load_library_types and retrievable per library
  • LLM Manager ported from Spelunker; uses Themis UserAPIKey for credential storage
  • S3 storage configured against Incus bucket (Terraform-provisioned) and tested
  • Celery worker connects to RabbitMQ on Oberon
  • Structured logging configured (JSON format, compatible with Loki/Alloy)
  • Tests pass for all Phase 1 apps (library, llm_manager)
  • HAProxy route provisioned: mnemosyne.ouranos.helu.ca