Docker Compose doesn't pull newer images for existing tags
-----------------------------------------------------------
# Issue
Running `docker compose up` on a service tagged `:latest` does not check the registry for a newer image. The container keeps running the old image even though a newer one has been pushed upstream.
## Symptoms
- `docker compose up` starts the container immediately using the locally cached image
- `docker compose pull` or `docker pull <image>:latest` successfully downloads a newer image
- After pulling manually, `docker compose up` recreates the container with the new image
- The `community.docker.docker_compose_v2` Ansible module with `state: present` behaves identically — no pull check
# Explanation
Docker's default behaviour is: **if an image with the requested tag exists locally, use it without checking the registry.** The `:latest` tag is not special — it's just a regular mutable tag. Docker does not treat it as "always fetch the newest." It is simply the default tag applied when no tag is specified.
When you run `docker compose up`:
1. Docker checks if `image:latest` exists in the local image store
2. If yes → use it, no registry check
3. If no → pull from registry
This means a stale `:latest` can sit on your host indefinitely while the upstream registry has a completely different image behind the same tag. The only way Docker knows to pull is if:
- The image doesn't exist locally at all
- You explicitly tell it to pull
The same applies to the Ansible `community.docker.docker_compose_v2` module — `state: present` maps to `docker compose up` behaviour, so no pull check occurs unless you tell it to.
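The decision above can be condensed into a tiny shell function — an illustrative sketch of the documented behaviour, not Docker's actual code:

```shell
# Hypothetical sketch of Docker's default image resolution.
# local_images: space-separated list of tags present in the local store.
resolve_image() {
  wanted="$1"
  local_images="$2"
  for img in $local_images; do
    if [ "$img" = "$wanted" ]; then
      echo "use-local"   # tag exists locally: no registry check at all
      return
    fi
  done
  echo "pull"            # tag missing locally: fetch from registry
}
```

For example, `resolve_image "app:latest" "app:latest db:16"` prints `use-local` even if the registry holds a newer `app:latest` behind the same tag.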
# Solution
Two complementary fixes ensure images are always checked against the registry.
## 1. Docker Compose — `pull_policy: always`
Add `pull_policy: always` to the service definition in `docker-compose.yml`:
```yaml
services:
my-service:
image: registry.example.com/my-image:latest
pull_policy: always # Check registry on every `up`
container_name: my-service
...
```
With this set, `docker compose up` will always contact the registry and compare the local image digest with the remote one. If they match, no download occurs — it's a lightweight check. If they differ, the new image layers are pulled.
Valid values for `pull_policy`:
| Value | Behaviour |
|-------|-----------|
| `always` | Always check the registry before starting |
| `missing` | Only pull if the image doesn't exist locally (default) |
| `never` | Never pull, fail if image doesn't exist locally |
| `build` | Always build the image (for services with `build:`) |
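The table maps to a small decision function — again a sketch of the documented behaviour, not Compose's real implementation:

```shell
# pull_decision <policy> <image-exists-locally: yes|no>
# Echoes what `docker compose up` does for the image under each pull_policy.
pull_decision() {
  case "$1" in
    always)  echo "check-registry" ;;   # digest compare on every up
    missing) [ "$2" = "yes" ] && echo "use-local" || echo "pull" ;;
    never)   [ "$2" = "yes" ] && echo "use-local" || echo "fail" ;;
    build)   echo "build" ;;            # requires a build: section
  esac
}
```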
## 2. Ansible — `pull: always` on `docker_compose_v2`
Add `pull: always` to the `community.docker.docker_compose_v2` task:
```yaml
- name: Start service
community.docker.docker_compose_v2:
project_src: "{{ service_directory }}"
state: present
pull: always # Check registry during deploy
```
Valid values for `pull`:
| Value | Behaviour |
|-------|-----------|
| `always` | Always pull before starting (like `docker compose pull && up`) |
| `missing` | Only pull if image doesn't exist locally |
| `never` | Never pull |
| `policy` | Defer to `pull_policy` defined in the compose file |
## Why use both?
- **`pull_policy` in compose file** — Protects against manual `docker compose up` on the host
- **`pull: always` in Ansible** — Ensures automated deployments always get the freshest image
They are independent mechanisms. The Ansible `pull` parameter runs a pull step before compose up, regardless of what the compose file says. Belt and suspenders.
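Roughly, `pull: always` makes the deploy behave like the two-step sequence below — sketched here with a stubbed `docker` function so the trace is visible without a daemon:

```shell
# Stub: print each docker invocation instead of running it (illustration only)
docker() { echo "would run: docker $*"; }

# What the Ansible task with `pull: always` effectively does:
docker compose pull     # explicit registry check/pull first
docker compose up -d    # then the normal up
```

Running this prints `would run: docker compose pull` followed by `would run: docker compose up -d`.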
# Agathos Fix
Applied to `ansible/gitea_mcp/` as the first instance. The same pattern should be applied to any service using mutable tags (`:latest`, `:stable`, etc.).
**docker-compose.yml.j2:**
```yaml
services:
gitea-mcp:
image: docker.gitea.com/gitea-mcp-server:latest
pull_policy: always
...
```
**deploy.yml:**
```yaml
- name: Start Gitea MCP service
community.docker.docker_compose_v2:
project_src: "{{ gitea_mcp_directory }}"
state: present
pull: always
```
# When you DON'T need this
- **Pinned image tags** (e.g., `postgres:16.2`, `grafana/grafana:11.1.0`) — The tag is immutable, so there's nothing newer to pull. Using `pull: always` here just adds a redundant registry check on every deploy.
- **Locally built images** — If the image is built by `docker compose build`, use `pull_policy: build` instead.
- **Air-gapped / offline hosts** — `pull: always` will fail if the registry is unreachable. Use `missing` or `never`.
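For instance, a compose file can mix both cases — pinned tags simply omit `pull_policy` and keep the default, while mutable tags opt in (service names here are hypothetical):

```yaml
services:
  db:
    image: postgres:16.2                    # pinned: default `missing` policy is fine
  app:
    image: registry.example.com/app:latest
    pull_policy: always                     # mutable tag: check registry on every up
```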
# Verification
```bash
# Check what image a running container is using
docker inspect --format='{{.Image}}' gitea-mcp
# Compare local digest with remote
docker images --digests docker.gitea.com/gitea-mcp-server
# Force pull and check if image ID changes
docker compose pull
docker compose up -d
```

Docker won't start inside Incus container
------------------------------------------
# Issue
Running Docker inside Incus has worked for years, but a recent Ubuntu package update caused it to fail.
## Symptoms
Docker containers won't start with the following error:
```
docker compose up
Attaching to neo4j
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open sysctl net.ipv4.ip_unprivileged_port_start file: reopen fd 8: permission denied
```
The culprit is AppArmor. The host runs AppArmor, and Incus applies an AppArmor profile to containers with `security.nesting=true` that blocks Docker from writing to `/proc/sys/net/ipv4/ip_unprivileged_port_start`.
# Solution (Automated)
The fix requires **both** host-side and container-side changes. These are now automated in our infrastructure:
## 1. Terraform - Host-side fix
In `terraform/containers.tf`, all containers with `security.nesting=true` now include:
```terraform
config = {
"security.nesting" = true
"raw.lxc" = "lxc.apparmor.profile=unconfined"
}
```
This tells Incus not to load any AppArmor profile for the container.
## 2. Ansible - Container-side fix
In `ansible/docker/deploy.yml`, Docker deployment now creates a systemd override:
```yaml
- name: Create AppArmor workaround for Incus nested Docker
ansible.builtin.copy:
content: |
[Service]
Environment=container="setmeandforgetme"
dest: /etc/systemd/system/docker.service.d/apparmor-workaround.conf
```
This tells Docker to skip loading its own AppArmor profile.
# Manual Workaround
If you need to fix this manually (e.g., before running Terraform/Ansible):
## Step 1: Force unconfined mode from the Incus host
```bash
# On the HOST (pan.helu.ca), not in the container
incus config set <container-name> raw.lxc "lxc.apparmor.profile=unconfined" --project agathos
incus restart <container-name> --project agathos
```
## Step 2: Disable AppArmor for Docker inside the container
```bash
# Inside the container
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/apparmor-workaround.conf <<EOF
[Service]
Environment=container="setmeandforgetme"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```
Reference: [ktz.blog](https://blog.ktz.me/proxmox-9-broke-my-docker-containers/)
# Verification
Tested on Miranda (2025-12-28):
```bash
# Before fix - fails with permission denied
$ ssh miranda.incus "docker run hello-world"
docker: Error response from daemon: failed to create task for container: ... permission denied
# After applying both fixes
$ ssh miranda.incus "docker run hello-world"
Hello from Docker!
# Port binding also works
$ ssh miranda.incus "docker run -d -p 8080:80 nginx"
# Container starts successfully
```
# Security Considerations
Setting `lxc.apparmor.profile=unconfined` only disables the AppArmor profile that Incus applies **to** the container. The host's AppArmor daemon continues running and protecting the host itself.
Security layers with this fix:
- Host AppArmor ✅ (still active)
- Incus container isolation ✅ (namespaces, cgroups)
- Container AppArmor ❌ (disabled with unconfined)
- Docker container isolation ✅ (namespaces, cgroups)
For sandbox/dev environments, this tradeoff is acceptable since:
- The Incus container is already isolated from the host
- We're not running untrusted workloads
- Production uses VMs + Docker without Incus nesting
# Explanation
What happened is that a recent update on the host (probably the incus and/or apparmor packages that landed in Ubuntu 24.04) started feeding the container a new AppArmor profile that contains this rule (or one very much like it):
```
deny @{PROC}/sys/net/ipv4/ip_unprivileged_port_start rw,
```
That rule is not present in the profile that ships with plain Docker, but it is present in the profile that Incus now attaches to every container that has `security.nesting=true` (the flag you need to run Docker inside Incus).
Because the rule is a `deny`, it overrides any later `allow`, so Docker's own profile (which allows the write) is ignored and the kernel returns `permission denied` the first time Docker/runc tries to write the value that tells the kernel which ports an unprivileged user may bind to.
So the container itself starts fine, but as soon as Docker tries to start any of its own containers, the AppArmor policy that Incus attached to the nested container blocks the write and the whole Docker container creation aborts.
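In AppArmor policy language, `deny` rules take precedence over allow rules regardless of ordering, so a profile fragment like the following (illustrative, not the exact Incus profile) blocks the write even when a broader allow rule is present:

```
# Illustrative AppArmor fragment: the deny wins no matter where it appears
deny @{PROC}/sys/net/ipv4/ip_unprivileged_port_start rw,
@{PROC}/sys/net/ipv4/** rw,   # this allow cannot override the deny above
```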
The two workarounds remove the enforcing profile:
1. **`raw.lxc = lxc.apparmor.profile=unconfined`** — Tells Incus "don't load any AppArmor profile for this container at all", so the offending rule is never applied.
2. **`Environment=container="setmeandforgetme"`** — sets the environment variable Docker's systemd unit looks for. When Docker sees a `container` variable it skips loading its default AppArmor profile. The value literally does not matter; the variable only has to exist.
Either way you end up with no AppArmor policy on the nested Docker container, so the write to `ip_unprivileged_port_start` succeeds and your containers start again.
**In short:** Recent Incus added a deny rule that clashes with Docker's need to tweak that sysctl; disabling the profile (host-side or container-side) is the quickest fix until the profiles are updated to allow the operation.