feat: add Daedalus application configuration, database setup, and monitoring alerts
244
docs/daedalus.md
Normal file
@@ -0,0 +1,244 @@
# Daedalus — Deployment Requirements

All infrastructure runs within the Agathos Incus sandbox. Hosts are resolved via DNS using the `.incus` suffix.

---

## 1. HAProxy — Titania

**Host:** `titania.incus`
**Domain:** `daedalus.ouranos.helu.ca`

HAProxy on Titania terminates TLS and routes traffic to Daedalus on Puck. Casdoor SSO enforces authentication before requests reach the backend.

```haproxy
frontend https
    acl host_daedalus hdr(host) -i daedalus.ouranos.helu.ca
    use_backend daedalus if host_daedalus

backend daedalus
    option httpchk GET /api/health
    server puck puck.incus:22181 check
```

**Requirements:**
- ACL entry in the HAProxy `frontend https` block
- Backend definition with health check on `/api/health`
- Casdoor application configured for `daedalus.ouranos.helu.ca` (same pattern as other Agathos services)
- TLS certificate covering `daedalus.ouranos.helu.ca` (wildcard or SAN)

---

## 2. PostgreSQL — Portia

**Host:** `portia.incus`
**Port:** 5432
**Database:** `daedalus`

Stores conversation history, workspace configuration, user preferences, and file metadata (S3 keys).

**Provisioning:**
```sql
CREATE USER daedalus WITH PASSWORD '<from Ansible Vault>';
CREATE DATABASE daedalus OWNER daedalus;
```

**Connection string (backend `.env`):**
```
DAEDALUS_DATABASE_URL=postgresql+asyncpg://daedalus:<password>@portia.incus:5432/daedalus
```
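
A password pulled from Ansible Vault may contain characters such as `@`, `:` or `/`, which are reserved in URLs and would corrupt the connection string if pasted in verbatim. The following is a minimal sketch of building the URL with percent-encoding; the helper name `build_database_url` is hypothetical and not part of the Daedalus codebase, and the defaults simply mirror the values listed above.

```python
from urllib.parse import quote


def build_database_url(
    password: str,
    user: str = "daedalus",
    host: str = "portia.incus",
    port: int = 5432,
    database: str = "daedalus",
) -> str:
    """Return an asyncpg-style SQLAlchemy URL with credentials percent-encoded.

    Hypothetical helper for illustration only; defaults come from this document.
    """
    # safe="" makes quote() also encode '/', so no reserved character
    # from the password can leak into the URL structure.
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    return f"postgresql+asyncpg://{enc_user}:{enc_password}@{host}:{port}/{database}"
```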

**Schema management:** Alembic migrations run from the backend container/service on Puck. No DDL is applied by hand.

**Tables (created by Alembic):**
- Workspaces
- Conversations / messages
- File metadata (S3 keys, filenames, content types, sizes)
- User preferences

---

## 3. Prometheus Scraping — Prospero

**Scraper:** `prospero.incus`
**Target:** `puck.incus:22181/metrics`

Daedalus exposes a `/metrics` endpoint in Prometheus text format. Prospero's Prometheus must be configured to scrape it.

**Prometheus scrape config:**
```yaml
- job_name: daedalus
  scrape_interval: 15s
  metrics_path: /metrics
  static_configs:
    - targets:
        - puck.incus:22181
```

**Key metric families:**
| Prefix | Category |
|--------|----------|
| `daedalus_up`, `daedalus_build_info` | Application health |
| `daedalus_http_requests_total`, `daedalus_http_request_duration_seconds` | HTTP traffic |
| `daedalus_mcp_*` | MCP connection and request metrics |
| `daedalus_agent_*` | Agent interaction metrics |
| `daedalus_file_*`, `daedalus_s3_*` | File and S3 operations |
| `daedalus_client_*` | Browser telemetry (exceptions, Web Vitals) |
| `daedalus_feature_usage_total` | Feature usage counters |

**Network requirement:** The `/metrics` endpoint is restricted to internal networks (`10.10.0.0/24`, `172.16.0.0/12`, `127.0.0.0/8`) in the Nginx config.
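
To reason about which scrapers the restriction admits, the allowlist can be modelled in a few lines. This is only a sketch of the matching logic using the three CIDR ranges listed above; the real enforcement lives in the Nginx config, and the function name `metrics_access_allowed` is made up for illustration.

```python
from ipaddress import ip_address, ip_network

# CIDR ranges copied from the network requirement above.
ALLOWED_NETWORKS = [
    ip_network(cidr)
    for cidr in ("10.10.0.0/24", "172.16.0.0/12", "127.0.0.0/8")
]


def metrics_access_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the internal ranges.

    Illustrative model only; it does not read or replace the Nginx rules.
    """
    addr = ip_address(client_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)
```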

**Alerting rules** (evaluated by Prometheus on Prospero, routed through Alertmanager):
| Alert | Condition | Severity |
|-------|-----------|----------|
| `DaedalusDown` | `daedalus_up == 0` for 1m | critical |
| `DaedalusMCPDisconnected` | `daedalus_mcp_connections_active == 0` for 5m | warning |
| `DaedalusHighErrorRate` | HTTP 5xx > 5% for 5m | warning |
| `DaedalusClientExceptionSpike` | Client exceptions > 10/min | warning |
| `DaedalusSlowResponses` | p95 > 5s for 5m | warning |
| `DaedalusMCPLatency` | MCP p95 > 30s for 5m | warning |
| `DaedalusS3Errors` | S3 error rate > 1% for 5m | warning |

---

## 4. S3 Object Storage — MinIO

**Provider:** MinIO on Incus (provisioned by Terraform)
**Bucket:** `daedalus`

Stores workspace file uploads. Metadata lives in PostgreSQL; actual bytes live in S3.

**Key layout:**
```
workspaces/{workspace_id}/files/{file_id}/{filename}
```
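
A small helper makes the key layout concrete. The sketch below follows the template above; stripping directory components from `filename` is an added assumption (the source does not say how filenames are sanitised), and `object_key` is a hypothetical name.

```python
def object_key(workspace_id: str, file_id: str, filename: str) -> str:
    """Build the S3 object key for an uploaded workspace file.

    Hypothetical helper following the documented key layout.
    """
    # Assumption: keep only the final path component so a client-supplied
    # name such as "../../x" cannot add extra segments to the key.
    base_name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    if not base_name:
        raise ValueError("filename must contain a non-empty base name")
    return f"workspaces/{workspace_id}/files/{file_id}/{base_name}"
```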

**Backend environment variables:**
```
DAEDALUS_S3_ENDPOINT=http://<minio-host>:9000
DAEDALUS_S3_ACCESS_KEY=<from Ansible Vault>
DAEDALUS_S3_SECRET_KEY=<from Ansible Vault>
DAEDALUS_S3_BUCKET=daedalus
```
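
One way the backend could consume these variables is to load them once at startup and fail fast when any is missing. The sketch below assumes nothing about how Daedalus actually reads its settings; `S3Settings` and `load_s3_settings` are hypothetical names, and only the four variable names come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class S3Settings:
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str


# Maps dataclass fields to the environment variables listed above.
_ENV_NAMES = {
    "endpoint": "DAEDALUS_S3_ENDPOINT",
    "access_key": "DAEDALUS_S3_ACCESS_KEY",
    "secret_key": "DAEDALUS_S3_SECRET_KEY",
    "bucket": "DAEDALUS_S3_BUCKET",
}


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3Settings:
    """Read the S3 settings from env, raising if any variable is unset or empty."""
    missing = [name for name in _ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return S3Settings(**{field: env[name] for field, name in _ENV_NAMES.items()})
```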

**Requirements:**
- Terraform resource for the bucket (same pattern as Casdoor and LobeChat S3 buckets)
- Access key / secret key stored in Ansible Vault
- Credentials are never exposed to the frontend; all file access flows through FastAPI

---

## 5. Application Runtime — Puck

**Host:** `puck.incus`
**Nginx port:** 22181 (proxied by HAProxy on Titania)
**Uvicorn port:** 8000 (internal only, behind Nginx)

### Production Deployment

**Systemd service** (`/etc/systemd/system/daedalus-api.service`):
```ini
[Unit]
Description=Daedalus API (FastAPI)
After=network.target

[Service]
Type=simple
User=daedalus
WorkingDirectory=/srv/daedalus/backend
ExecStart=/srv/daedalus/venv/bin/uvicorn daedalus.main:app --host 127.0.0.1 --port 8000 --workers 2
Restart=always
RestartSec=5
Environment=DAEDALUS_ENV=production
StandardOutput=journal
StandardError=journal
SyslogIdentifier=daedalus

[Install]
WantedBy=multi-user.target
```

**Nginx** serves the SvelteKit static build from `/srv/daedalus/frontend/build` and proxies `/api/*` and `/metrics` to Uvicorn on `127.0.0.1:8000`.

**Directory layout (production):**
```
/srv/daedalus/
├── backend/          # Python source
├── frontend/build/   # Static SPA build
├── venv/             # Python virtualenv
└── .env              # Environment configuration
```

### Docker Compose (Development)

For local development on Puck, two containers run on a Docker bridge network:

| Service | Image | Port |
|---------|-------|------|
| `api` | Built from `./backend/Dockerfile` | 8000 (internal) |
| `nginx` | `nginx:alpine` | 22181 → 80 |

```bash
docker compose up --build
```

---

## 6. Logging — Alloy → Loki

**No log files.** Structured JSON goes to stdout.

| Environment | Log path |
|-------------|----------|
| Production (systemd) | stdout → journal → syslog → Alloy → Loki (prospero) |
| Development (Docker) | stdout → Docker log driver → Alloy → Loki (prospero) |

Alloy on Puck is already configured to ship container and journal logs to Loki on Prospero. The `SyslogIdentifier=daedalus` tag allows filtering in Grafana with `{unit="daedalus"}`.
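
For reference, emitting structured JSON on stdout needs only the standard library. The sketch below is one possible setup, not the Daedalus logging code: the field names (`timestamp`, `level`, `logger`, `message`) and the names `JsonFormatter` and `configure_logging` are assumptions chosen for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback as text so it stays on the same JSON line.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO) -> None:
    """Send all root-logger output to stdout as JSON; no file handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```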

---

## 7. LLM Proxy — Sycorax

**Host:** `sycorax.incus`

FastAgent MCP servers route LLM API calls through Arke on Sycorax for multi-provider model routing (OpenAI, Anthropic, etc.). Daedalus does not call Sycorax directly; it communicates with FastAgent servers over MCP Streamable HTTP, and those agents use Sycorax.

---

## 8. DNS Summary

| FQDN | Resolves to | Purpose |
|------|-------------|---------|
| `daedalus.ouranos.helu.ca` | Titania (HAProxy) | Public entry point |
| `puck.incus` | Puck | Application host (Nginx + Uvicorn) |
| `portia.incus` | Portia | PostgreSQL |
| `prospero.incus` | Prospero | Prometheus, Loki, Grafana |
| `titania.incus` | Titania | HAProxy + Casdoor SSO |
| `sycorax.incus` | Sycorax | LLM proxy (Arke) |

---

## 9. Deployment Checklist

### Infrastructure (Terraform / Ansible)
- [ ] Create `daedalus` database and user on Portia
- [ ] Create `daedalus` S3 bucket in MinIO (Terraform)
- [ ] Store DB password and S3 credentials in Ansible Vault
- [ ] Add Prometheus scrape target on Prospero
- [ ] Add Prometheus alerting rules on Prospero
- [ ] Add Grafana dashboard on Prospero
- [ ] Configure HAProxy backend + ACL on Titania
- [ ] Configure Casdoor application for `daedalus.ouranos.helu.ca`

### Application (Puck)
- [ ] Create `/srv/daedalus` directory structure
- [ ] Create `daedalus` system user
- [ ] Set up Python virtualenv and install backend dependencies
- [ ] Build SvelteKit frontend (`npm run build`)
- [ ] Deploy `.env` from Ansible Vault
- [ ] Install Nginx site config
- [ ] Install and enable systemd service
- [ ] Run Alembic migrations (`alembic upgrade head`)
- [ ] Verify `/api/health` returns `{"status": "ok"}`
- [ ] Verify `/metrics` is reachable from Prospero
- [ ] Verify `daedalus.ouranos.helu.ca` loads the SPA through HAProxy
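
The health verification step in the checklist above can be scripted. The following is a rough sketch using only the standard library; `is_healthy` and `check_health` are hypothetical names, and the check assumes the endpoint returns a JSON object whose `status` field is `"ok"`, as stated in the checklist.

```python
import json
from urllib.request import urlopen


def is_healthy(body: bytes) -> bool:
    """Return True if body is a JSON object with status == "ok"."""
    try:
        data = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch {base_url}/api/health and validate the response body."""
    with urlopen(f"{base_url}/api/health", timeout=timeout) as response:
        return response.status == 200 and is_healthy(response.read())
```

Run from a host inside the sandbox, `check_health("http://puck.incus:22181")` would exercise the Nginx port that HAProxy also targets; a connection failure surfaces as an exception rather than `False`.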