Docker won't start inside Incus container
------------------------------------------
# Issue
Running Docker inside Incus has worked for years, but a recent Ubuntu package update caused it to fail.
## Symptoms
Docker containers won't start with the following error:
```
docker compose up
Attaching to neo4j
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open sysctl net.ipv4.ip_unprivileged_port_start file: reopen fd 8: permission denied
```
The issue is AppArmor on Incus containers. The host has AppArmor, and Incus applies an AppArmor profile to containers with `security.nesting=true` that blocks Docker from writing to `/proc/sys/net/ipv4/ip_unprivileged_port_start`.
# Solution (Automated)
The fix requires **both** host-side and container-side changes. These are now automated in our infrastructure:
## 1. Terraform - Host-side fix
In `terraform/containers.tf`, all containers with `security.nesting=true` now include:
```terraform
config = {
  "security.nesting" = true
  "raw.lxc"          = "lxc.apparmor.profile=unconfined"
}
```
This tells Incus not to load any AppArmor profile for the container.
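To confirm that no nested container was missed, a check along these lines could be run on the host. This is only a sketch: the helper name `find_unpatched` is made up for illustration, and it assumes `incus list` is asked for the `security.nesting` and `raw.lxc` config keys as CSV columns and that nesting is stored literally as `true`.

```bash
#!/bin/sh
# Hypothetical helper: reads "name,security.nesting,raw.lxc" CSV rows on stdin
# and prints the names of containers that have nesting enabled but no
# unconfined AppArmor override in raw.lxc.
find_unpatched() {
    while IFS=, read -r name nesting rawlxc; do
        # Assumption: the flag is stored as the literal string "true".
        [ "$nesting" = "true" ] || continue
        case "$rawlxc" in
            *lxc.apparmor.profile=unconfined*) ;;   # override present
            *) printf '%s\n' "$name" ;;             # override missing
        esac
    done
}

# Usage (on the host):
#   incus list --project ouranos -f csv -c n,security.nesting,raw.lxc | find_unpatched
```

An empty result would suggest every nested container carries the override. A multi-line `raw.lxc` value would not survive this simple CSV parsing, so treat the output as a hint rather than proof.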
## 2. Ansible - Container-side fix
In `ansible/docker/deploy.yml`, Docker deployment now creates a systemd override:
```yaml
- name: Create AppArmor workaround for Incus nested Docker
  ansible.builtin.copy:
    content: |
      [Service]
      Environment=container="setmeandforgetme"
    dest: /etc/systemd/system/docker.service.d/apparmor-workaround.conf
```
This tells Docker to skip loading its own AppArmor profile.
# Manual Workaround
If you need to fix this manually (e.g., before running Terraform/Ansible):
## Step 1: Force unconfined mode from the Incus host
```bash
# On the HOST (pan.helu.ca), not in the container
incus config set <container-name> raw.lxc "lxc.apparmor.profile=unconfined" --project ouranos
incus restart <container-name> --project ouranos
```
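Before moving on, it may be worth confirming that the override was actually stored. The following is a minimal sketch; `has_unconfined_override` is a hypothetical helper name, and it only inspects the container's own config (not values inherited from profiles).

```bash
#!/bin/sh
# Hypothetical helper: succeeds if the container's raw.lxc contains the
# unconfined AppArmor override.
#   $1 = container name, $2 = Incus project
has_unconfined_override() {
    incus config get "$1" raw.lxc --project "$2" \
        | grep -qF 'lxc.apparmor.profile=unconfined'
}

# Usage (on the host):
#   has_unconfined_override <container-name> ouranos && echo "override set"
```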
## Step 2: Disable AppArmor for Docker inside the container
```bash
# Inside the container
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/apparmor-workaround.conf <<EOF
[Service]
Environment=container="setmeandforgetme"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```
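To check that the drop-in took effect, one option is to ask the daemon which security options it reports. This is a sketch under the assumption that an AppArmor-enabled daemon lists `name=apparmor` in `docker info`; the helper name `docker_uses_apparmor` is made up for illustration.

```bash
#!/bin/sh
# Hypothetical helper: succeeds if the running Docker daemon reports AppArmor
# among its security options.
docker_uses_apparmor() {
    docker info --format '{{json .SecurityOptions}}' | grep -qF 'name=apparmor'
}

# Usage (inside the container):
#   systemctl show docker --property=Environment   # should list the container variable
#   if docker_uses_apparmor; then echo "AppArmor still enabled"; else echo "AppArmor disabled"; fi
```

If AppArmor still shows up after the restart, the drop-in was probably not read; re-running `systemctl daemon-reload` and checking the file path is a reasonable first step.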
Reference: [ktz.blog](https://blog.ktz.me/proxmox-9-broke-my-docker-containers/)
# Verification
Tested on Miranda (2025-12-28):
```bash
# Before fix - fails with permission denied
$ ssh miranda.incus "docker run hello-world"
docker: Error response from daemon: failed to create task for container: ... permission denied
# After applying both fixes
$ ssh miranda.incus "docker run hello-world"
Hello from Docker!
# Port binding also works
$ ssh miranda.incus "docker run -d -p 8080:80 nginx"
# Container starts successfully
```
# Security Considerations
Setting `lxc.apparmor.profile=unconfined` only disables the AppArmor profile that Incus applies **to** the container. AppArmor on the host stays enabled and continues to protect the host itself.
Security layers with this fix:
- Host AppArmor ✅ (still active)
- Incus container isolation ✅ (namespaces, cgroups)
- Container AppArmor ❌ (disabled with unconfined)
- Docker container isolation ✅ (namespaces, cgroups)
For sandbox/dev environments, this tradeoff is acceptable since:
- The Incus container is already isolated from the host
- We're not running untrusted workloads
- Production uses VMs + Docker without Incus nesting
# Explanation
A recent update on the host (probably the `incus` and/or `apparmor` packages that landed in Ubuntu 24.04) started applying a new AppArmor profile to the container. That profile contains this rule (or one very much like it):
```
deny @{PROC}/sys/net/ipv4/ip_unprivileged_port_start rw,
```
That rule is not present in the profile that ships with plain Docker, but it is present in the profile that Incus now attaches to every container that has `security.nesting=true` (the flag you need to run Docker inside Incus).
Because the rule is a `deny`, it takes precedence over any `allow` rule regardless of order. Docker's own profile (which allows the write) therefore has no effect, and the kernel returns `permission denied` the first time Docker/runc tries to write the value that tells the kernel which ports an unprivileged user may bind to.
So the container itself starts fine, but as soon as Docker tries to start any of its own containers, the AppArmor policy that Incus attached to the nested container blocks the write and the whole Docker container creation aborts.
The two workarounds remove the enforcing profile:
1. **`raw.lxc = lxc.apparmor.profile=unconfined`**: Tells Incus "don't load any AppArmor profile for this container at all", so the offending rule is never applied.
2. **`Environment=container="setmeandforgetme"`**: Sets the `container` environment variable for the Docker daemon. When the daemon sees that variable set, it treats AppArmor as unavailable and skips loading the Docker-default AppArmor profile. The value itself does not matter; the variable only has to be set to something non-empty.
With both changes in place you end up with no AppArmor policy on the nested Docker containers, so the write to `ip_unprivileged_port_start` succeeds and your containers start again.
**In short:** Recent Incus added a deny rule that clashes with Docker's need to tweak that sysctl; disabling the profiles (host-side and container-side) is the quickest fix until the profiles are updated to allow the operation.
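To see which AppArmor label the kernel actually applies to a container, the label of its init process can be read from the host. The sketch below assumes that `incus info` prints a `PID: <n>` line for a running container; `container_init_pid` is a hypothetical helper name.

```bash
#!/bin/sh
# Hypothetical helper: extracts the init PID from `incus info` output on stdin.
# Assumption: the output contains a line of the form "PID: <number>".
container_init_pid() {
    awk '$1 == "PID:" { print $2; exit }'
}

# Usage (on the host, as root):
#   pid=$(incus info <container-name> --project ouranos | container_init_pid)
#   cat "/proc/$pid/attr/current"
```

Reading the label from the host matters here, because a process inside the container may see its own label differently from how the host sees it.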