> Looks like each container gets its own lightweight Linux VM.
That sounds pretty heavyweight. A project with 12 containers will run 12 kernels instead of 1?
Curious to see metrics on this approach.
This is the approach used by Kata Containers/Firecracker. It's not much heavier than the shared-kernel approach, but has significantly better security. A bug in the container runtime doesn't immediately break the separation between containers.
The performance overhead of the VM is minimal; the main tradeoff is container startup time.
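A minimal sketch of that tradeoff on a Linux host, assuming Kata Containers is installed and registered with Docker under the runtime name `kata` (the name and binary path depend on how it was set up in /etc/docker/daemon.json):

```
# Shared-kernel runc vs. per-container microVM: the Kata run pays
# a guest-kernel boot on every container start.
time docker run --rm alpine true                  # default runc runtime
time docker run --rm --runtime=kata alpine true   # boots a lightweight VM first
```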
I wonder why Apple cared enough about the security aspect to take the isolated-VM approach over the shared-VM approach. It seems unlikely that Apple hardware is going to be used to host containerized applications in production, where this would be more of a concern. On the other hand, it's more likely to be used for development purposes, where the memory overhead could be the bigger concern.
> Seems unlikely that Apple hardware is going to be used to host containerized applications in production
I imagine this is certainly happening already inside Apple datacenters.
One of the use cases for this feature is for macOS desktop apps to run Linux sidecars, so this needed to be secure for end user devices.
RAM overhead can be nontrivial. Each kernel has its own page cache.
On a non-Linux OS that should be offset by being able to allocate RAM separately to each container, instead of the current approach in Docker Desktop where a static slice of your system memory is always allocated to the Docker VM.
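For reference, that static slice is just a Docker Desktop setting; on recent Docker Desktop for Mac builds it lives in a JSON file (path and key names here are from memory and may differ across versions):

```
# Excerpt from ~/Library/Group Containers/group.com.docker/settings.json
# The full amount is reserved for the VM up front, whether or not any
# container actually needs it:
#
#   { "cpus": 4, "memoryMiB": 8192 }
```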
This is a feature targeting developers, or perhaps apps running on end-user machines, where page-cache sharing between applications or containers doesn't typically yield much of a RAM saving.
The Linux kernel overhead itself, while non-trivial, is still very manageable in those settings. AWS Nitro's stripped-down VM kernel is about 40 MB; I suppose Apple's solution will be similar.
Is that not the premise of docker?
No, it's the opposite: the entire premise of Docker over VMs is that you run one instance of all the shared OS stuff, so it takes fewer resources than a VM, and the portable images are smaller because they don't contain an OS image.
The premise is containerization, not necessarily particular resource usage by the host running the containers.
For hosted services, you want to choose: is it worth running a single kernel with a lot of containers for the cost savings from shared resources, or isolating them by making them separate VMs? There are certainly container products which lean towards the latter, at least by default.
For development it matters a lot less, as long as the sum resources of containers you are planning to run don't overload the system.
The VM option is relatively new; the original idea was to provide that isolation without the weight of a VM. Also, I'm not sure that Docker didn't coin the word "containerization". I've always associated it specifically with the kind of packaging Docker provides and don't remember it being mentioned around VMs.
With Windows containers you can choose whether the kernel is shared across containers or not; it is only in Linux containers mode that the kernel gets shared.
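Concretely, that's the `--isolation` flag when running Windows containers (the image tag here is just an example, and process isolation generally requires the container and host Windows versions to match):

```
# Shares the host Windows kernel:
docker run --rm --isolation=process mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd /c ver

# Gets its own utility VM and kernel:
docker run --rm --isolation=hyperv mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd /c ver
```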
Nope, docker uses the host's kernel, so there are zero additional kernels.
On non-Linux, you obviously need an additional kernel running (the Linux kernel). In this case, there are N additional kernels running.
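Easy to check on a Linux host (the version string below is just an example):

```
$ uname -r                          # kernel the host is running
6.8.0-45-generic
$ docker run --rm alpine uname -r   # a container reports the same kernel
6.8.0-45-generic
```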
> On non-Linux, you obviously need an additional kernel running (the Linux kernel).
That seems to be true in practice, but I don't think it's obviously true. As WSL1 shows, it's possible to make an emulation layer for Linux syscalls on top of quite a different operating system.
I would draw the opposite conclusion from the WSL1 attempt.
It was a strategy that failed in practice and needed to be replaced with a VM-based approach.
The Linux kernel has a huge surface area, with some subtle behavior in it. There was no economic way to replicate all of that and keep it up to date in a proprietary kernel, especially as VM tech is well established and reusable.
WSL1 wasn't really a VM though? IIRC it was implementing syscalls over the Windows kernel.
Indeed, WSL1 isn't a VM. As I said, it's just:
> an emulation layer for Linux syscalls on top of quite a different operating system.
My point was that, in principle, it could be possible to implement Linux containers on another OS without using VMs.
However, as you said (and so did I), in practice no one has. Probably because it's just not worth the effort compared to just using a VM. Especially since all your containers can share a single VM, so you end up only running 2 kernels (rather than e.g. 11 for 10 containers). That's exactly how Docker on WSL2 works.
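You can see that shared VM on the Windows side; Docker Desktop registers its own WSL2 distribution next to yours (exact distro names vary by version):

```
PS> wsl --list --verbose
  NAME              STATE      VERSION
* Ubuntu            Running    2
  docker-desktop    Running    2
```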
gVisor has basically re-implemented most of the syscall API, but only when the host is also Linux.
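For the curious: gVisor plugs into Docker as the `runsc` runtime (assuming it has been registered in /etc/docker/daemon.json), and its user-space kernel even announces itself in the container's boot log:

```
docker run --rm --runtime=runsc alpine dmesg
```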
I think that's the point. You don't have to run the full kernel to run some Linux tools.
Though I don't think it ever supported Docker. And it wasn't really expected to, since the entire namespaces+cgroups stuff is way deeper than some surface-level syscall shims.
> On non-Linux, you obviously need an additional kernel running (the Linux kernel)
Only "obvious" for running Linux processes using Linux container facilities (cgroups)
Windows has its own native facilities allowing Windows processes to be containerised. It just so happens that in addition to that, there's WSL2 at hand to run Linux processes (containerised or not).
There is nothing preventing Apple from implementing Darwin-native facilities so that Darwin processes could be containerised. It would actually be very nice to be able to distribute/spin up arbitrary macOS environments with some minimal CLI + CLT base† and run build/test stuff without having to spawn full-blown macOS VMs.
† "base" in the BSD sense.
eh docker desktop nowadays runs VMs even on Linux
Docker Desktop is non-free proprietary software that isn't very good anyway.
I could imagine one Linux kernel running in a VM (on top of macOS) and then containers inside that guest OS. So: 1 base instance (macOS), 1 guest Linux kernel, 12 containers sharing that guest kernel.
That's how Docker Desktop for Mac works. With Apple's approach you get 12 VMs with 12 Linux kernels.