Add Themis application with custom widgets, views, and utilities

- Implemented custom form widgets for date, time, and datetime fields with DaisyUI styling.
- Created utility functions for formatting dates, times, and numbers according to user preferences.
- Developed views for profile settings, API key management, and notifications, including health check endpoints.
- Added URL configurations for Themis tests and main application routes.
- Established test cases for custom widgets to ensure proper functionality and integration.
- Defined project metadata and dependencies in pyproject.toml for package management.

# Mnemosyne
*"The electric light did not come from the continuous improvement of candles."* — Oren Harari
**The memory of everything you know.**
Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI models. Named after the Titan goddess of memory and mother of the nine Muses, Mnemosyne doesn't just store your knowledge — it understands what kind of knowledge it is, connects it through relationships, and makes it all searchable through text, images, and natural language.
## What Makes This Different
Most existing knowledge base tools treat all documents identically: text in, chunks out, vectors stored. A novel and a PostgreSQL manual get the same treatment.
Mnemosyne knows the difference:
- **A textbook** has chapters, an index, technical terminology, and pedagogical structure. It's chunked accordingly, and when an LLM retrieves results, it knows this is instructional content.
- **A novel** has narrative flow, characters, plot arcs, dialogue. The LLM knows to interpret results as creative fiction.
- **Album artwork** is a visual asset tied to an artist, genre, and era. It's embedded multimodally — searchable by both image similarity and text description.
- **A journal entry** is personal, temporal, reflective. The LLM treats it differently than a reference manual.
This **content-type awareness** flows through every layer: chunking strategy, embedding instructions, re-ranking, and the final LLM prompt.
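The dispatch from content type to chunking strategy can be sketched as a small registry. This is a minimal illustration, not Mnemosyne's actual configuration: the type names, token counts, and `llm_hint` strings below are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkingStrategy:
    """How a document of a given content type is split before embedding."""
    splitter: str    # structural unit to split on
    chunk_size: int  # target tokens per chunk
    overlap: int     # tokens shared between adjacent chunks
    llm_hint: str    # context injected into the retrieval prompt

# Hypothetical registry -- the real types live in the content-type system.
STRATEGIES = {
    "textbook": ChunkingStrategy("section", 1024, 128, "instructional reference material"),
    "novel": ChunkingStrategy("scene", 2048, 256, "creative fiction; preserve narrative flow"),
    "journal": ChunkingStrategy("entry", 512, 0, "personal, temporal, reflective writing"),
}

def strategy_for(content_type: str) -> ChunkingStrategy:
    # Unknown types fall back to a generic fixed-size split.
    return STRATEGIES.get(content_type, ChunkingStrategy("paragraph", 512, 64, "general content"))
```

The same lookup can feed the embedding-instruction and prompt-injection stages, so one declaration drives the whole pipeline.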
## Core Architecture
| Component | Technology | Purpose |
|-----------|-----------|---------|
| **Knowledge Graph** | Neo4j 5.x | Relationships + vector storage (no dimension limits) |
| **Multimodal Embeddings** | Qwen3-VL-Embedding-8B | Text + image + video in unified vector space (4096d) |
| **Multimodal Re-ranking** | Qwen3-VL-Reranker-8B | Cross-attention precision scoring |
| **Text Fallback** | Qwen3-Reranker (llama.cpp) | Text-only re-ranking via GGUF |
| **Web Framework** | Django 5.x + DRF | Auth, admin, API, content management |
| **Object Storage** | S3/MinIO | Original content + chunk text storage |
| **Async Processing** | Celery + RabbitMQ | Document embedding, graph construction |
| **LLM Interface** | MCP Server | Primary interface for Claude, Copilot, etc. |
| **GPU Serving** | vLLM + llama.cpp | Local model inference |
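Because Neo4j 5.x supports native vector indexes, chunk embeddings can live directly beside graph relationships. A sketch of what that looks like from Python, assuming the official `neo4j` driver; the index name, `:Chunk` label, and `.embedding` property are illustrative assumptions, while the 4096 dimensions match the embedding row above:

```python
# Neo4j 5.x Cypher for a vector index over chunk embeddings.
# Label and property names (:Chunk, .embedding) are illustrative.
CREATE_CHUNK_INDEX = """
CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 4096,
  `vector.similarity_function`: 'cosine'
}}
"""

# Nearest-neighbour lookup via the built-in vector index procedure.
QUERY_TOP_K = """
CALL db.index.vector.queryNodes('chunk_embeddings', $k, $query_embedding)
YIELD node, score
RETURN node.text AS text, score
"""

def create_index(uri: str, auth: tuple[str, str]) -> None:
    from neo4j import GraphDatabase  # third-party driver, imported lazily
    with GraphDatabase.driver(uri, auth=auth) as driver:
        driver.execute_query(CREATE_CHUNK_INDEX)
```

Keeping vectors in the graph is what lets a single Cypher query combine similarity search with relationship traversal.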
## Library Types
| Library | Example Content | Multimodal? | Graph Relationships |
|---------|----------------|-------------|-------------------|
| **Fiction** | Novels, short stories | Cover art | Author → Book → Character → Theme |
| **Technical** | Textbooks, manuals, docs | Diagrams, screenshots | Product → Manual → Section → Procedure |
| **Music** | Lyrics, liner notes | Album artwork | Artist → Album → Track → Genre |
| **Film** | Scripts, synopses | Stills, posters | Director → Film → Scene → Actor |
| **Art** | Descriptions, catalogs | The artwork itself | Artist → Piece → Style → Movement |
| **Journals** | Personal entries | Photos | Date → Entry → Topic → Person/Place |
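Each library's relationship chain can be treated as data rather than hard-coded queries. A hypothetical helper that renders a chain from the table into a Cypher path pattern; the single `RELATES_TO` relationship type is a placeholder assumption (a real schema would use typed relationships like `WROTE` or `APPEARS_IN`):

```python
# Node-label chains from the table above (a subset, for illustration).
LIBRARY_SCHEMAS = {
    "fiction": ["Author", "Book", "Character", "Theme"],
    "music": ["Artist", "Album", "Track", "Genre"],
    "journals": ["Date", "Entry", "Topic", "PersonOrPlace"],
}

def path_pattern(library: str) -> str:
    """Render a library's node chain as a Cypher path for MATCH clauses."""
    labels = LIBRARY_SCHEMAS[library]
    # e.g. (:Artist)-[:RELATES_TO]->(:Album)-[:RELATES_TO]->(:Track)...
    return "-[:RELATES_TO]->".join(f"(:{label})" for label in labels)
```

Driving traversal queries from a per-library schema keeps graph construction and search consistent when a new library type is added.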
## Search Pipeline
```
Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search
→ Candidate Fusion → Qwen3-VL Re-ranking → Content-Type Context Injection
→ LLM Response with Citations
```
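The candidate-fusion step merges ranked lists from the three retrieval legs (vector, graph, full-text). The pipeline above doesn't name the algorithm, so as one plausible choice, here is reciprocal rank fusion, which rewards candidates that several retrievers rank highly:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked candidate-ID lists into one.

    Each candidate scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by multiple retrievers float to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Chunk c2 appears in all three legs, so it wins despite never ranking first.
fused = reciprocal_rank_fusion([["c1", "c2"], ["c3", "c2"], ["c2", "c4"]])
# fused[0] == "c2"
```

The fused list is then handed to the Qwen3-VL re-ranker, which does the expensive cross-attention scoring on a much smaller candidate set.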
## Heritage
Mnemosyne's RAG pipeline architecture is inspired by [Spelunker](https://git.helu.ca/r/spelunker), an enterprise RFP response platform. The proven patterns — hybrid search, two-stage RAG (responder + reviewer), citation-based retrieval, and async document processing — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships.
## Running Celery Workers
Mnemosyne uses Celery with RabbitMQ for async document embedding. From the `mnemosyne/` directory:
```bash
# Development — single worker, all queues
celery -A mnemosyne worker -l info -Q celery,embedding,batch
# Or skip workers entirely with eager mode (.env):
CELERY_TASK_ALWAYS_EAGER=True
```
**Production — separate workers:**
```bash
celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h # GPU-bound embedding
celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h # Batch orchestration
celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h # LLM API validation
```
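The queue split above is typically driven by task routing in the Django/Celery settings. A sketch, assuming the standard `CELERY_`-namespaced settings convention; the task module paths are hypothetical, not Mnemosyne's real modules:

```python
# settings.py (sketch) -- route GPU-bound and orchestration tasks to
# dedicated queues so each worker above pulls only its own kind of work.
CELERY_TASK_ROUTES = {
    "documents.tasks.embed_document": {"queue": "embedding"},  # GPU-bound
    "documents.tasks.run_batch": {"queue": "batch"},           # orchestration
    # everything else falls through to the default "celery" queue
}
CELERY_TASK_ACKS_LATE = True           # re-deliver if a worker dies mid-task
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # don't let one worker hoard long GPU jobs
```

Late acks plus a prefetch multiplier of 1 is a common reliability pairing for long-running tasks: a crashed embedding worker loses at most the one task it was executing.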
**Scheduler & Monitoring:**
```bash
celery -A mnemosyne beat -l info # Periodic task scheduler
celery -A mnemosyne flower --port=5555 # Web monitoring UI
```
See [Phase 2: Celery Workers & Scheduler](docs/PHASE_2_EMBEDDING_PIPELINE.md#celery-workers--scheduler) for full details on queues, reliability settings, and task progress tracking.
## Documentation
- **[Architecture Documentation](docs/mnemosyne.html)** — Full system architecture with diagrams
- **[Phase 1: Foundation](docs/PHASE_1_FOUNDATION.md)** — Project skeleton, Neo4j data model, content-type system
- **[Phase 2: Embedding Pipeline](docs/PHASE_2_EMBEDDING_PIPELINE.md)** — Qwen3-VL multimodal embedding
- **[Phase 3: Search & Re-ranking](docs/PHASE_3_SEARCH_AND_RERANKING.md)** — Hybrid search + re-ranker
- **[Phase 4: RAG Pipeline](docs/PHASE_4_RAG_PIPELINE.md)** — Content-type-aware generation
- **[Phase 5: MCP Server](docs/PHASE_5_MCP_SERVER.md)** — LLM integration interface
- **[Phase 6: Backport to Spelunker](docs/PHASE_6_BACKPORT_TO_SPELUNKER.md)** — Proven patterns flowing back