It can't, easily. Docker should not be naively treated as a security solution; it's very easy to misconfigure:
- The Docker daemon runs as root: any user in the docker group effectively also has sudo (e.g. via --privileged)
- Ports exposed by Docker punch through the firewall
- In general, you can break the security boundary towards root (not just your own user!) by mounting the wrong things, setting the wrong flags, etc.
What Docker primarily gives you is a stupid (good!) solution for having a reproducible, resettable environment. But containers (read: magic isolated box) are not really a good tool for reasoning about security on Linux, imo.
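To make the docker-group point concrete, here's a hedged sketch: bind-mounting the host's root filesystem lets any docker-group member run commands as root against the host, no sudo involved. (Guarded so it degrades gracefully where Docker isn't installed or the daemon isn't running.)

```shell
# Sketch: a docker-group member mounts the host's / into a container
# and runs a command as root on it. No sudo required.
if command -v docker >/dev/null 2>&1; then
  docker run --rm -v /:/host alpine chroot /host id 2>/dev/null \
    || echo "docker present but demo unavailable"
else
  echo "docker not installed; demo skipped"
fi
```

When the demo runs, `id` reports uid=0 with full access to the host's files — which is why docker-group membership is equivalent to root.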
If you are a beginner, as a first step make sure you don't run services as the sudo-capable/root user. Then I would recommend looking into systemd services: you can configure all the Linux sandboxing features Docker uses, and more. This composes well with Podman, which gives you a reproducible environment (a drop-in replacement for Docker) confined to an unprivileged user.
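As a sketch of what that can look like (the unit name and binary path are made up for illustration), a hardened systemd unit can turn on several of the same kernel sandboxing primitives containers rely on:

```ini
# /etc/systemd/system/myservice.service -- hypothetical example
[Unit]
Description=Example sandboxed service

[Service]
ExecStart=/usr/local/bin/myservice
DynamicUser=yes            ; run as a transient unprivileged user
NoNewPrivileges=yes        ; block setuid-based privilege escalation
ProtectSystem=strict       ; /usr, /boot, /etc become read-only
ProtectHome=yes            ; hide /home, /root, /run/user
PrivateTmp=yes             ; private /tmp via a mount namespace
PrivateDevices=yes         ; minimal /dev
RestrictNamespaces=yes     ; forbid creating new namespaces

[Install]
WantedBy=multi-user.target
```

`systemd-analyze security myservice` will score a unit's sandboxing and suggest further directives to tighten.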
I agree with what you wrote, and would add that your service's executables and scripts should also not be owned by the user they run as.
It's unfortunately very common to install, for example, a project as the "ubuntu" user and also run it as the "ubuntu" user. But this arrangement effectively turns any kind of file-overwrite vulnerability into a remote-code-execution vulnerability.
Owning executables as root:root with permissions 0755, and running them as a separate unprivileged user, is the standard approach.
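A minimal sketch of that layout. The paths and the service name are placeholders; on a real host you'd install into /usr/local/bin and `chown root:root` (which needs root), so a temp prefix is used here to keep it runnable unprivileged:

```shell
# Install a "service binary" with the standard 0755 permissions:
# owner can write, everyone can read/execute, runtime user cannot modify.
PREFIX=$(mktemp -d)
printf '#!/bin/sh\necho hello from myservice\n' > "$PREFIX/myservice.src"
install -m 0755 "$PREFIX/myservice.src" "$PREFIX/myservice"
stat -c '%a' "$PREFIX/myservice"   # prints: 755
"$PREFIX/myservice"                # prints: hello from myservice
```

On the real host the last piece is running it as the separate account, e.g. `sudo -u svc /usr/local/bin/myservice` or a `User=` line in the systemd unit.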
> - Ports exposed by Docker punch through the firewall
I've been using ufw-docker [1] to force ufw and docker to cooperate. Without it, Docker ports really do get exposed to the Internet. As far as I can tell, it does its job correctly. Is there another problem I am not aware of?
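Another low-tech mitigation, assuming something on the host (e.g. a reverse proxy) handles public traffic: publish container ports on loopback only, so Docker's iptables rules never expose them externally in the first place. A sketch (the service name is made up):

```yaml
# docker-compose.yml fragment: bind the published port to 127.0.0.1
services:
  web:
    image: nginx:alpine
    ports:
      - "127.0.0.1:8080:80"   # reachable only from the host itself
```

The same works on the CLI as `docker run -p 127.0.0.1:8080:80 ...` — the loopback bind keeps the DNAT rule from matching external traffic.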
Docker keeps well-behaved programs well behaved. You can escape in one line of shell.
How? Like, if I have a Debian-slim container running, is it possible to "break out" onto the host?
Yup, that's trivially easy if you have permission to use mknod and mount. (And if the filesystem namespace looks like it normally does, all you need is mount.)
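A hedged probe for those two permissions (the device numbers 8,1 = /dev/sda1 are an assumption about the host's disk). Note that default Docker containers actually keep CAP_MKNOD but drop CAP_SYS_ADMIN, so it's usually the mount step that fails:

```shell
# Probe the two permissions the classic escape needs. With both of
# them (e.g. --privileged), 'mount $d/hostdisk /mnt && chroot /mnt'
# would hand over the host's root filesystem.
d=$(mktemp -d)
if mknod "$d/hostdisk" b 8 1 2>/dev/null; then
  echo "mknod: allowed"
else
  echo "mknod: denied"
fi
if mount -t tmpfs none "$d" 2>/dev/null; then
  echo "mount: allowed -- mounting the host disk and chrooting would follow"
  umount "$d"
else
  echo "mount: denied"
fi
rm -rf "$d"
```

Run unprivileged, both probes print "denied" — which is exactly the protection being discussed.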
Docker is for organizing things for yourself, just like directories are. If you want actual isolation you have to take extra steps.
EDIT: and I feel I should add that those extra steps are exactly what most server software does automatically when it chroots itself. Again, Docker is really just for organizing things.
For those not intimately familiar with containers (docker/podman), can you link to a brief blog post that covers this in more detail for further reading? Much appreciated.
> Docker is for organizing things for yourself, just like directories are.
Services have the following dependencies: static data files; configuration files; executable code/binaries; library dependencies.
In days of yore, you'd need to download/install all of that ^ on each machine where "service A" needed to run. Developers would run and test "service A" on Ubuntu 18.04, but production servers had to run Ubuntu 16.04 because "service X", which also runs on the same server, needs a library that hasn't been ported to 18.04 yet.
But "service A" needs a library that was never available on 16.04. Welcome to dependency hell!
Containers bundle all of those dependencies into one object that can be downloaded directly onto the host server, ready for the "service A" process to execute. Now it doesn't matter that production servers are running 16.04. Everything "service A" needs is stored inside the container blob (including some minimal Ubuntu 18.04 stuff).
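As a sketch, the bundling looks like this in a Dockerfile (the package and binary names are made up for illustration):

```dockerfile
# Everything "service A" needs ships inside the image, regardless of
# what distro release the production host runs.
FROM ubuntu:18.04
# 'libfoo1' stands in for the library that never existed on 16.04:
RUN apt-get update \
    && apt-get install -y --no-install-recommends libfoo1 \
    && rm -rf /var/lib/apt/lists/*
COPY service-a /usr/local/bin/service-a
CMD ["/usr/local/bin/service-a"]
```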
The magic that lets this happen: containers reuse the host server's OS kernel. Running a new Ubuntu 18.04 container does not start a new OS kernel. The host's kernel starts a cgroup'd process that is 'firewalled' off from all other processes [0], and that process starts your container's services and processes (the 18.04 'OS' services and your binary/code/executable).
Short/simpler version: containers share the core of the underlying operating system on the host server.
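You can see this sharing directly: every process, containerized or not, carries namespace and cgroup handles under /proc, and they all point into the same single kernel. A quick look (the ID values vary per system):

```shell
# Each entry names one isolation dimension (pid, mnt, net, ...).
# Inside a container these are the same kinds of handles as on the
# host -- a container just gets different IDs for some of them.
ls /proc/self/ns
readlink /proc/self/ns/pid    # e.g. pid:[4026531836]
head -n 3 /proc/self/cgroup   # which cgroups this process belongs to
```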
> If you want actual isolation you have to take extra steps.
Unfortunately, this means containers share the core of the underlying operating system on the host server.
Containers not being isolated from the host server's OS can present a security risk, because escaping the container lets you "do bad things to the host server". [1]
In cases where that is a problem you mostly have two choices:
* use VMs instead (a completely isolated OS instance is started for each service, cannot interact with the host OS at all -- this uses a lot more memory/cpu)
* use rootless containers [2] (container processes run as an unprivileged user inside a user namespace -- escaping the container only gets you that user's privileges)
[0]: https://en.wikipedia.org/wiki/Cgroups
[1]: by default the Docker daemon and all the container processes it starts run as root, which means escaping out of a container in a default Docker installation is as bad as giving someone root.
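One way to see the user-namespace mapping that rootless containers rely on is /proc/self/uid_map. Outside any user namespace it typically shows the identity mapping over the whole UID range; inside a rootless container, UID 0 maps to your unprivileged host UID:

```shell
# Columns: UID inside the namespace, UID outside, length of the range.
# In a plain host shell this is typically "0 0 4294967295" (identity
# map); in a rootless container the first line maps container-root to
# your own unprivileged UID instead.
cat /proc/self/uid_map
```

So "root inside" a rootless container is, from the kernel's point of view, just your regular user — which is what bounds the damage of an escape.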
> Yup that's trivially easy if you have permissions to use mknod and mount.
Docker containers don't have mount permissions by default.