1
0
This commit is contained in:
Wonderfall 2022-04-03 02:02:42 +02:00
parent a1e0e5e49b
commit f033e1637d

View File

@ -100,7 +100,7 @@ As mentioned just above, [user namespaces](https://www.man7.org/linux/man-pages/
`whoami && sleep 60` in the container will return root, but `ps -fC sleep` on the host will show us the PID of another user. That is nice, but it has limitations and therefore shouldn't be considered as a real sandbox. In fact, the paradox is that [user namespaces are attack surface](https://lists.archlinux.org/pipermail/arch-general/2017-February/043066.html) (and vulnerabilities are still being found [years later](https://www.openwall.com/lists/oss-security/2022/01/29/1)), and it's common to restrict them to privileged users (`kernel.unprivileged_userns_clone=0`). That is fine for Docker with its traditional root daemon, but Podman expects you to let unprivileged users interact with user namespaces (so essentially privileged code).
Enabling `userns-remap` in Docker shouldn't be a subtitute for running unprivileged application containers (where applicable). User namespaces are mostly useful if you intend to run full-fledged OS containers which need root in order to function, but that is out of the scope of the container technologies mentioned in this article; for them, I'd argue exposing such a vulnerable attack surface from the host kernel for dubious sandboxing benefits isn't an interesting trade-off to make.
Enabling `userns-remap` in Docker shouldn't be a substitute for running unprivileged application containers (where applicable). User namespaces are mostly useful if you intend to run full-fledged OS containers which need root in order to function, but that is out of the scope of the container technologies mentioned in this article; for them, I'd argue exposing such a vulnerable attack surface from the host kernel for dubious sandboxing benefits isn't an interesting trade-off to make.
### The no_new_privs bit
After ensuring root isn't used in your containers, you should look into setting the `no_new_privs` bit. [This Linux feature](https://docs.kernel.org/userspace-api/no_new_privs.html) restricts syscalls such as `execve()` from granting privileges, which is what you want to restrict in-container privilege escalation. This flag can be set for a given container in a Compose file: