Docker won't start inside Incus container
------------------------------------------

# Issue

Running Docker inside Incus has worked for years, but a recent Ubuntu package update caused it to fail.

## Symptoms

Docker containers won't start with the following error:

```
docker compose up
Attaching to neo4j
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open sysctl net.ipv4.ip_unprivileged_port_start file: reopen fd 8: permission denied
```

The cause is AppArmor. The host has AppArmor enabled, and Incus applies an AppArmor profile to containers with `security.nesting=true` that blocks Docker (more precisely runc, as the error shows) from writing to `/proc/sys/net/ipv4/ip_unprivileged_port_start` while it sets up a new container.

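A quick way to confirm that the Incus container is confined by AppArmor is to read the label of a process running inside it. The sketch below assumes the label is exposed at `/proc/self/attr/current` (some newer kernels also provide `/proc/self/attr/apparmor/current`); `apparmor_label` is a hypothetical helper name, not part of any tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the AppArmor label stored in an attr file and
# return 0 only if the label is "unconfined".
# Assumption: the label lives at /proc/self/attr/current; pass another path
# (e.g. /proc/self/attr/apparmor/current) if your kernel differs.
apparmor_label() {
  local attr_file="${1:-/proc/self/attr/current}"
  local label

  if [[ ! -r "$attr_file" ]]; then
    echo "cannot read $attr_file (AppArmor may be disabled)" >&2
    return 2
  fi

  # The kernel may NUL-terminate the label: drop NULs and let command
  # substitution strip the trailing newline.
  label="$(tr -d '\0' < "$attr_file")"
  echo "$label"

  # Confined labels look like "<profile> (enforce)" or "<profile> (complain)".
  [[ "$label" == "unconfined" ]]
}
```

Run it inside the container: on an affected container it should print a label ending in `(enforce)` and return non-zero; after the host-side fix and a container restart it should print `unconfined`.
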
# Solution (Automated)

The fix requires **both** host-side and container-side changes. These are now automated in our infrastructure:

## 1. Terraform - Host-side fix

In `terraform/containers.tf`, all containers with `security.nesting=true` now include:

```terraform
config = {
  "security.nesting" = true
  "raw.lxc"          = "lxc.apparmor.profile=unconfined"
}
```

This tells Incus not to load any AppArmor profile for the container. Like other `raw.lxc` changes, it only takes effect when the container is (re)started.

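To confirm the Terraform change actually reached Incus, you can read `raw.lxc` back on the host with `incus config get`. The sketch below uses a hypothetical helper name and checks that the value contains the unconfined line; because `raw.lxc` may hold several newline-separated LXC keys, it matches line by line instead of comparing the whole string:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a raw.lxc value contains
# "lxc.apparmor.profile=unconfined" on its own line (spaces around "=" allowed).
raw_lxc_is_unconfined() {
  local raw="$1"
  grep -Eq '^[[:space:]]*lxc\.apparmor\.profile[[:space:]]*=[[:space:]]*unconfined[[:space:]]*$' <<< "$raw"
}

# Example (on the HOST; container name is a placeholder):
#   value="$(incus config get <container-name> raw.lxc --project ouranos)"
#   raw_lxc_is_unconfined "$value" && echo "unconfined" || echo "still confined"
```
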
## 2. Ansible - Container-side fix

In `ansible/docker/deploy.yml`, Docker deployment now creates a systemd override:

```yaml
# ansible.builtin.copy does not create missing parent directories,
# so /etc/systemd/system/docker.service.d must already exist.
- name: Create AppArmor workaround for Incus nested Docker
  ansible.builtin.copy:
    content: |
      [Service]
      Environment=container="setmeandforgetme"
    dest: /etc/systemd/system/docker.service.d/apparmor-workaround.conf
```

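This makes the Docker daemon skip loading its own `docker-default` AppArmor profile for the containers it starts. The drop-in only takes effect after `systemctl daemon-reload` and a Docker restart.
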
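After the reload and restart, you can confirm that the running daemon actually inherited the variable. `systemctl show docker --property=Environment` reports what the unit sets; the sketch below instead inspects the live process environment in `/proc/<pid>/environ`, which is NUL-separated and readable only by root or the process owner. The helper name is hypothetical, and the usage line assumes `pidof` is available:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a NUL-separated environ file
# (e.g. /proc/<pid>/environ) defines a non-empty "container" variable.
environ_has_container_var() {
  local environ_file="$1"
  # Turn NULs into newlines so grep sees one variable per line.
  tr '\0' '\n' < "$environ_file" | grep -Eq '^container=.+'
}

# Example (inside the Incus container, as root):
#   environ_has_container_var "/proc/$(pidof dockerd)/environ" \
#     && echo "drop-in is active" \
#     || echo "drop-in not active"
```
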
# Manual Workaround

If you need to fix this manually (e.g., before running Terraform/Ansible):

## Step 1: Force unconfined mode from the Incus host

```bash
# On the HOST (pan.helu.ca), not in the container.
# Note: "incus config set" replaces any existing raw.lxc value for the container.
incus config set <container-name> raw.lxc "lxc.apparmor.profile=unconfined" --project ouranos
incus restart <container-name> --project ouranos
```

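If several nested containers in the project need the same change, it can be safer to generate the commands and review them before running anything. A sketch, assuming you feed it container names one per line (for example from `incus list --project ouranos --format csv --columns n`); `plan_unconfined` is a hypothetical name and only prints commands, it never runs them:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read container names (one per line) on stdin and
# print the Step 1 commands for each, without executing them.
# Note: "incus config set ... raw.lxc" replaces any existing raw.lxc value.
plan_unconfined() {
  local project="$1"
  local name
  while IFS= read -r name; do
    if [[ -z "$name" ]]; then
      continue   # skip blank lines
    fi
    printf 'incus config set %q raw.lxc "lxc.apparmor.profile=unconfined" --project %q\n' \
      "$name" "$project"
    printf 'incus restart %q --project %q\n' "$name" "$project"
  done
}
```

`incus list` also returns VMs, which do not take `raw.lxc`, so trim the list to the nested Docker containers before piping the reviewed output to a shell.
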
## Step 2: Disable AppArmor for Docker inside the container

```bash
# Inside the container
sudo mkdir -p /etc/systemd/system/docker.service.d
# Quoted 'EOF' prevents shell expansion; > /dev/null hides tee's echo of the content.
sudo tee /etc/systemd/system/docker.service.d/apparmor-workaround.conf > /dev/null <<'EOF'
[Service]
Environment=container="setmeandforgetme"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```

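When Step 2 is scripted or rerun, it is convenient to write the drop-in only when its content differs, so Docker is restarted only when something changed. A minimal sketch; `install_apparmor_dropin` is a hypothetical name, and the directory is a parameter so it can be tried against a scratch path first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the Docker AppArmor drop-in into a directory,
# printing "changed" if the file was created or updated, "unchanged" otherwise.
install_apparmor_dropin() {
  local dir="${1:-/etc/systemd/system/docker.service.d}"
  local file="$dir/apparmor-workaround.conf"
  local want
  want=$'[Service]\nEnvironment=container="setmeandforgetme"\n'

  mkdir -p "$dir" || return 1
  # Compare the desired content byte-for-byte with the existing file, if any.
  if [[ -f "$file" ]] && printf '%s' "$want" | cmp -s - "$file"; then
    echo "unchanged"
    return 0
  fi
  printf '%s' "$want" > "$file" || return 1
  echo "changed"
}
```

Shell functions are not visible to `sudo`, so run it from a root shell and reload only on change: `[ "$(install_apparmor_dropin)" = changed ] && systemctl daemon-reload && systemctl restart docker`.
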
Reference: [ktz.blog](https://blog.ktz.me/proxmox-9-broke-my-docker-containers/)

# Verification

Tested on Miranda (2025-12-28):

```bash
# Before fix - fails with permission denied
$ ssh miranda.incus "docker run hello-world"
docker: Error response from daemon: failed to create task for container: ... permission denied

# After applying both fixes
$ ssh miranda.incus "docker run hello-world"
Hello from Docker!

# Port binding also works
$ ssh miranda.incus "docker run -d -p 8080:80 nginx"
# Container starts successfully
```

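To tell this specific failure apart from other Docker start errors (for example in a smoke test), it is enough to match the sysctl name together with `permission denied` in the daemon's error message. A sketch with a hypothetical helper name; the matched text comes from the full error shown under Symptoms:

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a Docker error message.
# Prints "apparmor-sysctl-denied" for the failure described in this note,
# "other-error" for anything else.
classify_docker_error() {
  local message="$1"
  if grep -Eq 'ip_unprivileged_port_start.*permission denied' <<< "$message"; then
    echo "apparmor-sysctl-denied"
  else
    echo "other-error"
  fi
}

# Example smoke test (inside the Incus container):
#   if ! out="$(docker run --rm hello-world 2>&1)"; then
#     classify_docker_error "$out"
#   fi
```
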
# Security Considerations

Setting `lxc.apparmor.profile=unconfined` only disables the AppArmor profile that Incus applies **to** the container. AppArmor itself stays enabled in the host kernel, and the host's own profiles remain loaded and keep protecting the host.

Security layers with this fix:

- Host AppArmor ✅ (still active)
- Incus container isolation ✅ (namespaces, cgroups)
- Container AppArmor ❌ (disabled with unconfined)
- Docker AppArmor profile ❌ (skipped because of the `container` environment variable)
- Docker container isolation ✅ (namespaces, cgroups)

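To back the "Host AppArmor still active" line, you can check on the host that the kernel still reports AppArmor as enabled. A sketch assuming the standard `/sys/module/apparmor/parameters/enabled` flag, which reads `Y` when AppArmor is enabled; the helper name is hypothetical, and the `aa-enabled` utility from the AppArmor tools offers a similar check:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the AppArmor "enabled" module parameter
# starts with "Y". The path can be overridden for testing.
host_apparmor_enabled() {
  local flag_file="${1:-/sys/module/apparmor/parameters/enabled}"
  local value
  [[ -r "$flag_file" ]] || return 1
  value="$(< "$flag_file")"
  [[ "$value" == Y* ]]
}

# Example (on the HOST):
#   host_apparmor_enabled && echo "host AppArmor enabled" || echo "host AppArmor disabled"
```
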
For sandbox/dev environments, this tradeoff is acceptable since:

- The Incus container is already isolated from the host
- We're not running untrusted workloads
- Production uses VMs + Docker without Incus nesting

# Explanation

What happened is that a recent update on the host (probably the `incus` and/or `apparmor` packages that landed in Ubuntu 24.04) started attaching a new AppArmor profile to the container that contains this rule (or one very much like it):

```
deny @{PROC}/sys/net/ipv4/ip_unprivileged_port_start rw,
```

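To check whether the profile Incus generates for a container really contains such a rule, you can search the generated profiles on the host. The default directory below is an assumption (it is where a native Incus install is expected to keep generated profiles; adjust it for your packaging), and the helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list deny rules mentioning ip_unprivileged_port_start
# in a directory of AppArmor profiles, printed as file:line:rule.
# Assumption: Incus writes generated profiles under the default directory below.
find_port_start_denies() {
  local profile_dir="${1:-/var/lib/incus/security/apparmor/profiles}"
  grep -rnE '^[[:space:]]*(audit[[:space:]]+)?deny[[:space:]].*ip_unprivileged_port_start' "$profile_dir"
}
```

It exits 0 when a matching rule is found and 1 when none is, so it can be used directly in an `if`.
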
That rule is not present in the profile that ships with plain Docker, but it is present in the profile that Incus now attaches to every container that has `security.nesting=true` (the flag you need to run Docker inside Incus).

Because the rule is a `deny`, it takes precedence over any `allow` (in AppArmor, deny rules win regardless of where they appear), so Docker's own profile (which allows the write) cannot override it. Docker sets `net.ipv4.ip_unprivileged_port_start`, the first port number an unprivileged process may bind to, to `0` in each new container by default, so the kernel returns `permission denied` the first time runc tries to write that value.

So the Incus container itself starts fine, but as soon as Docker tries to start one of its own containers, the AppArmor policy that Incus attached to the Incus container blocks the write and container creation aborts.

The two workarounds remove the enforcing AppArmor profiles:

1. **`raw.lxc = lxc.apparmor.profile=unconfined`** tells Incus "don't load any AppArmor profile for this container at all", so the offending rule is never applied.

2. **`Environment=container="setmeandforgetme"`** is the magic variable Docker looks for. The systemd unit passes it to the Docker daemon, which treats a set `container` variable as a sign that it is itself running inside a container and skips loading the `docker-default` AppArmor profile. The value literally does not matter; the variable only has to be set to something non-empty.

With both in place there is no AppArmor policy left on the nested Docker containers, so the write to `ip_unprivileged_port_start` succeeds and your containers start again.

**In short:** Recent Incus added a deny rule that clashes with Docker's need to set that sysctl; disabling the profiles (host-side and container-side) is the quickest fix until the profiles are updated to allow the operation.