docs: rewrite README with structured overview and quick start guide

Replaces the minimal project description with a comprehensive README
including a component overview table, quick start instructions, common
Ansible operations, and links to detailed documentation. Aligns with
Red Panda Approval™ standards.
2026-03-03 12:49:06 +00:00
parent c7be03a743
commit b4d60f2f38
219 changed files with 34586 additions and 2 deletions

docs/anythingllm.md Normal file

@@ -0,0 +1,334 @@
# AnythingLLM
## Overview
AnythingLLM is a full-stack application that provides a unified interface for interacting with Large Language Models (LLMs). It supports multi-provider LLM access, document intelligence (RAG with pgvector), AI agents with tools, and Model Context Protocol (MCP) extensions.
**Host:** Rosalind
**Role:** go_nodejs_php_apps
**Port:** 22084 (internal), accessible via `anythingllm.ouranos.helu.ca` (HAProxy)
## Architecture
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Client      │────▶│     HAProxy     │────▶│   AnythingLLM   │
│  (Browser/API)  │     │    (Titania)    │     │   (Rosalind)    │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
         ┌───────────────────────┬───────────────────────┤
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   PostgreSQL    │     │   LLM Backend   │     │   TTS Service   │
│   + pgvector    │     │  (pan.helu.ca)  │     │  (FastKokoro)   │
│    (Portia)     │     │    llama-cpp    │     │   pan.helu.ca   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
### Directory Structure
AnythingLLM uses a native Node.js deployment with the following directory layout:
```
/srv/anythingllm/
├── app/ # Cloned git repository
│ ├── server/ # Backend API server
│ │ ├── .env # Environment configuration
│ │ └── node_modules/
│ ├── collector/ # Document processing service
│ │ ├── hotdir -> ../hotdir # SYMLINK (critical!)
│ │ └── node_modules/
│ └── frontend/ # React frontend (built into server)
├── storage/ # Persistent data
│ ├── documents/ # Processed documents
│ ├── vector-cache/ # Embedding cache
│ └── plugins/ # MCP server configs
└── hotdir/ # Upload staging directory (actual location)
/srv/collector/
└── hotdir -> /srv/anythingllm/hotdir # SYMLINK (critical!)
```
### Hotdir Path Resolution (Critical)
The server and collector use **different path resolution** for the upload directory:
| Component | Code Location | Resolves To |
|-----------|--------------|-------------|
| **Server** (multer) | `STORAGE_DIR/../../collector/hotdir` | `/srv/collector/hotdir` |
| **Collector** | `__dirname/../hotdir` | `/srv/anythingllm/app/collector/hotdir` |
Both paths must point to the same physical directory. This is achieved with **two symlinks**:
1. `/srv/collector/hotdir` → `/srv/anythingllm/hotdir`
2. `/srv/anythingllm/app/collector/hotdir` → `/srv/anythingllm/hotdir`
⚠️ **Important**: The collector ships with an empty `hotdir/` directory. The Ansible deploy must **remove** this directory before creating the symlink, or file uploads will fail with "File does not exist in upload directory."
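
The resolution rules in the table can be checked from a shell. The sketch below is illustrative and not part of the deployment: it assumes `STORAGE_DIR` is `/srv/anythingllm/storage` (consistent with the layout above, but not stated explicitly), relies on GNU `realpath` and `readlink -f`, and uses two hypothetical helper names, `lexical_path` and `same_physical_dir`.

```bash
# Assumption: STORAGE_DIR is the storage/ directory from the layout above.
STORAGE_DIR="/srv/anythingllm/storage"

# Collapse ".." lexically without touching the filesystem
# (GNU realpath: -m tolerates missing components, -s leaves symlinks unexpanded).
lexical_path() {
    realpath -m -s -- "$1"
}

# Succeed only if both paths resolve, through any symlinks,
# to the same existing directory.
same_physical_dir() {
    local a b
    a="$(readlink -f -- "$1")" || return 1
    b="$(readlink -f -- "$2")" || return 1
    [ -d "$a" ] && [ "$a" = "$b" ]
}

# Server side of the table: STORAGE_DIR/../../collector/hotdir
lexical_path "${STORAGE_DIR}/../../collector/hotdir"   # prints /srv/collector/hotdir
```

On Rosalind, `same_physical_dir /srv/collector/hotdir /srv/anythingllm/app/collector/hotdir && echo OK` should print `OK` once both symlinks are in place. A leftover real `hotdir/` directory inside the collector makes the check fail.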
### Key Integrations
| Component | Host | Purpose |
|-----------|------|---------|
| PostgreSQL + pgvector | Portia | Vector database for RAG embeddings |
| LLM Provider | pan.helu.ca:22071 | Generic OpenAI-compatible llama-cpp |
| TTS Service | pan.helu.ca:22070 | FastKokoro text-to-speech |
| HAProxy | Titania | TLS termination and routing |
| Loki | Prospero | Log aggregation |
## Terraform Resources
### Host Definition
AnythingLLM runs on **Rosalind**, which is already defined in `terraform/containers.tf`:
| Attribute | Value |
|-----------|-------|
| Image | noble |
| Role | go_nodejs_php_apps |
| Security Nesting | true |
| AppArmor | unconfined |
| Port Range | 22080-22099 |
No Terraform changes are required: AnythingLLM uses port 22084, which falls within Rosalind's existing range.
## Ansible Deployment
### Playbook
```bash
cd ansible
source ~/env/agathos/bin/activate
# Deploy PostgreSQL database first (if not already done)
ansible-playbook postgresql/deploy.yml
# Deploy AnythingLLM
ansible-playbook anythingllm/deploy.yml
# Redeploy HAProxy to pick up new backend
ansible-playbook haproxy/deploy.yml
# Redeploy Alloy to pick up new log source
ansible-playbook alloy/deploy.yml
```
### Files
| File | Purpose |
|------|---------|
| `anythingllm/deploy.yml` | Main deployment playbook |
| `anythingllm/anythingllm-server.service.j2` | Systemd service for server |
| `anythingllm/anythingllm-collector.service.j2` | Systemd service for collector |
| `anythingllm/env.j2` | Environment variables template |
### Variables
#### Host Variables (`host_vars/rosalind.incus.yml`)
| Variable | Description | Default |
|----------|-------------|---------|
| `anythingllm_user` | Service account user | `anythingllm` |
| `anythingllm_group` | Service account group | `anythingllm` |
| `anythingllm_directory` | Installation directory | `/srv/anythingllm` |
| `anythingllm_port` | Service port | `22084` |
| `anythingllm_db_host` | PostgreSQL host | `portia.incus` |
| `anythingllm_db_port` | PostgreSQL port | `5432` |
| `anythingllm_db_name` | Database name | `anythingllm` |
| `anythingllm_db_user` | Database user | `anythingllm` |
| `anythingllm_llm_base_url` | LLM API endpoint | `http://pan.helu.ca:22071/v1` |
| `anythingllm_llm_model` | Default LLM model | `llama-3-8b` |
| `anythingllm_embedding_engine` | Embedding engine | `native` |
| `anythingllm_tts_provider` | TTS provider | `openai` |
| `anythingllm_tts_endpoint` | TTS API endpoint | `http://pan.helu.ca:22070/v1` |
#### Vault Variables (`group_vars/all/vault.yml`)
| Variable | Description |
|----------|-------------|
| `vault_anythingllm_db_password` | PostgreSQL password |
| `vault_anythingllm_jwt_secret` | JWT signing secret (32+ chars) |
| `vault_anythingllm_sig_key` | Signature key (32+ chars) |
| `vault_anythingllm_sig_salt` | Signature salt (32+ chars) |
Generate secrets with:
```bash
openssl rand -hex 32
```
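
To fill in the three signing secrets in one go, a small loop can print ready-to-paste YAML lines. This is a convenience sketch, not part of the playbooks. `gen_secret` and `print_anythingllm_secrets` are hypothetical helper names, and only the values that the table marks as "32+ chars" are covered.

```bash
# Print a 64-character hex string (32 random bytes).
gen_secret() {
    openssl rand -hex 32
}

# Emit one vault.yml line per secret that the table above marks as "32+ chars".
print_anythingllm_secrets() {
    local name
    for name in jwt_secret sig_key sig_salt; do
        printf 'vault_anythingllm_%s: "%s"\n' "$name" "$(gen_secret)"
    done
}
```

Running `print_anythingllm_secrets` writes three lines to standard output. Paste them into the vault file through `ansible-vault edit` rather than redirecting them to disk unencrypted.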
## Configuration
### Environment Variables
| Variable | Description | Source |
|----------|-------------|--------|
| `JWT_SECRET` | JWT signing secret | `vault_anythingllm_jwt_secret` |
| `SIG_KEY` | Signature key | `vault_anythingllm_sig_key` |
| `SIG_SALT` | Signature salt | `vault_anythingllm_sig_salt` |
| `VECTOR_DB` | Vector database type | `pgvector` |
| `PGVECTOR_CONNECTION_STRING` | PostgreSQL connection | Composed from host_vars |
| `LLM_PROVIDER` | LLM provider type | `generic-openai` |
| `EMBEDDING_ENGINE` | Embedding engine | `native` |
| `TTS_PROVIDER` | TTS provider | `openai` |
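
The table notes that `PGVECTOR_CONNECTION_STRING` is composed from host variables. The actual template lives in `anythingllm/env.j2` and is not reproduced here. The sketch below only illustrates the standard PostgreSQL URI form using the defaults from the Variables section. `pg_uri` is a hypothetical helper, and the password is assumed to contain no characters that need percent-encoding.

```bash
# Build a PostgreSQL connection URI: postgresql://user:password@host:port/dbname
# Assumption: the password contains no characters that need percent-encoding.
pg_uri() {
    local user="$1" password="$2" host="$3" port="$4" dbname="$5"
    printf 'postgresql://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$dbname"
}

# Defaults from host_vars/rosalind.incus.yml; the password is a placeholder.
pg_uri anythingllm 'your-secure-password' portia.incus 5432 anythingllm
# prints: postgresql://anythingllm:your-secure-password@portia.incus:5432/anythingllm
```

A hex password generated with `openssl rand -hex 32` is URI-safe, so it avoids the encoding question entirely.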
### External Access
AnythingLLM is accessible via HAProxy on Titania:
| URL | Backend |
|-----|---------|
| `https://anythingllm.ouranos.helu.ca` | `rosalind.incus:22084` |
The HAProxy backend is configured in `host_vars/titania.incus.yml`.
## Monitoring
### Loki Logs
| Log Source | Labels |
|------------|--------|
| Server logs | `{unit="anythingllm-server.service"}` |
| Collector logs | `{unit="anythingllm-collector.service"}` |
Logs are collected via systemd journal → Alloy on Rosalind → Loki on Prospero.
**Grafana Query:**
```logql
{unit=~"anythingllm.*"} |= ``
```
### Health Check
```bash
# From any sandbox host
curl http://rosalind.incus:22084/api/ping
# Via HAProxy (external)
curl -k https://anythingllm.ouranos.helu.ca/api/ping
```
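
After a deploy or restart the server may need a moment before `/api/ping` answers. A retry loop around the same `curl` call can be sketched as follows. `wait_for_ping` is a hypothetical helper, and the defaults of 30 attempts with a 2-second pause are arbitrary choices, not values taken from the playbooks.

```bash
# Poll a URL until curl reports success or the attempts run out.
# Usage: wait_for_ping URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ping() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors (>= 400) as failure, -sS: quiet but still report errors,
        # --max-time: cap each try at 5 seconds
        if curl -fsS --max-time 5 -o /dev/null "$url"; then
            echo "healthy after ${i} attempt(s): ${url}"
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "no healthy response from ${url} after ${attempts} attempt(s)" >&2
    return 1
}
```

For example, `wait_for_ping http://rosalind.incus:22084/api/ping` returns 0 as soon as the server responds and 1 if it never does, which makes it usable as a gate in a script.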
## Operations
### Start/Stop
```bash
# SSH to Rosalind
ssh rosalind.incus
# Manage via systemd
sudo systemctl start anythingllm-server # Start server
sudo systemctl start anythingllm-collector # Start collector
sudo systemctl stop anythingllm-server # Stop server
sudo systemctl stop anythingllm-collector # Stop collector
sudo systemctl restart anythingllm-server # Restart server
sudo systemctl restart anythingllm-collector # Restart collector
```
### Logs
```bash
# Real-time server logs
journalctl -u anythingllm-server -f
# Real-time collector logs
journalctl -u anythingllm-collector -f
# Grafana (historical)
# Query: {unit=~"anythingllm.*"}
```
### Upgrade
Pull latest code and redeploy:
```bash
ansible-playbook anythingllm/deploy.yml
```
## Vault Setup
Add the following secrets to `ansible/inventory/group_vars/all/vault.yml`:
```bash
ansible-vault edit ansible/inventory/group_vars/all/vault.yml
```
```yaml
# AnythingLLM Secrets
vault_anythingllm_db_password: "your-secure-password"
vault_anythingllm_jwt_secret: "your-32-char-jwt-secret"
vault_anythingllm_sig_key: "your-32-char-signature-key"
vault_anythingllm_sig_salt: "your-32-char-signature-salt"
```
## Follow-On Tasks
### MCP Server Integration
AnythingLLM supports the Model Context Protocol (MCP) for extending AI agent capabilities. Planned integrations with existing MCP servers:
| MCP Server | Host | Tools |
|------------|------|-------|
| MCPO | Miranda | Docker management |
| Neo4j MCP | Miranda | Graph database queries |
| GitHub MCP | (external) | Repository operations |
Configure MCP connections via the AnythingLLM Admin UI after the initial deployment.
### Casdoor SSO
For single sign-on integration, configure AnythingLLM to authenticate via Casdoor OAuth2. This requires:
1. Creating an application in Casdoor admin
2. Configuring OAuth2 environment variables in AnythingLLM
3. Optionally using OAuth2-Proxy for transparent authentication
## Troubleshooting
### File Upload Fails with "File does not exist in upload directory"
**Symptom:** Uploading files via the UI returns a 500 Internal Server Error with the message "File does not exist in upload directory."
**Cause:** The server uploads files to `/srv/collector/hotdir`, but the collector looks for them in `/srv/anythingllm/app/collector/hotdir`. If these aren't the same physical directory, uploads fail.
**Solution:** Verify symlinks are correctly configured:
```bash
# Check symlinks
ls -la /srv/collector/hotdir
# Should show: /srv/collector/hotdir -> /srv/anythingllm/hotdir
ls -la /srv/anythingllm/app/collector/hotdir
# Should show: /srv/anythingllm/app/collector/hotdir -> /srv/anythingllm/hotdir
# If collector/hotdir is a directory (not symlink), fix it:
sudo rm -rf /srv/anythingllm/app/collector/hotdir
sudo ln -s /srv/anythingllm/hotdir /srv/anythingllm/app/collector/hotdir
sudo chown -h anythingllm:anythingllm /srv/anythingllm/app/collector/hotdir
sudo systemctl restart anythingllm-collector
```
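### Service Won't Start
AnythingLLM runs as native systemd services, not in Docker. Check the unit status and recent logs:
```bash
sudo systemctl status anythingllm-server anythingllm-collector
journalctl -u anythingllm-server -u anythingllm-collector -n 100 --no-pager
```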
Verify PostgreSQL connectivity:
```bash
psql -h portia.incus -U anythingllm -d anythingllm
```
### Database Connection Issues
Ensure pgvector extension is enabled:
```bash
psql -h portia.incus -U postgres -d anythingllm -c "SELECT * FROM pg_extension WHERE extname = 'vector';"
```
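
If that query returns no rows, the extension can be created by a role that is allowed to do so. The sketch below assumes the `postgres` superuser is reachable in the same way as in the check above, and that the PostgreSQL playbook has not already taken care of this. `ensure_pgvector` is a hypothetical helper name.

```bash
# Create the pgvector extension if it is missing.
# Usage: ensure_pgvector HOST DBNAME [ADMIN_USER]
ensure_pgvector() {
    local host="$1" dbname="$2" admin="${3:-postgres}"
    # ON_ERROR_STOP=1 makes psql exit non-zero if the statement fails.
    psql -h "$host" -U "$admin" -d "$dbname" -v ON_ERROR_STOP=1 \
        -c 'CREATE EXTENSION IF NOT EXISTS vector;'
}
```

For this deployment the call would be `ensure_pgvector portia.incus anythingllm`. `IF NOT EXISTS` makes it safe to repeat.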
### LLM Provider Issues
Test LLM endpoint directly:
```bash
curl http://pan.helu.ca:22071/v1/models
```
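
To confirm that the configured model is actually being served, the `/models` listing can be searched for its id. This sketch assumes the endpoint returns an OpenAI-style JSON list. The id reported by llama-cpp may differ from the `llama-3-8b` default, so treat a miss as a prompt to inspect the raw listing. `llm_has_model` is a hypothetical helper name.

```bash
# Succeed if the /models listing contains the given model id as a quoted string.
# Usage: llm_has_model BASE_URL MODEL_ID
llm_has_model() {
    local base_url="$1" model="$2"
    # A failed curl yields empty input, so grep (and the function) returns non-zero.
    curl -fsS --max-time 10 "${base_url}/models" \
        | grep -Fq "\"${model}\""
}
```

For example, `llm_has_model http://pan.helu.ca:22071/v1 llama-3-8b && echo "model available"`.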