They use a lot of similar techniques. One big difference is that docker uses user namespaces and flatpak does not. I'm not sure about the reasoning, but it's probably a combo of "not trusting user namespaces" (disagree) and user namespaces requiring privileges to use.
It sounds like the bigger issue isn't that the underlying technologies are fundamentally better or worse, but that the de facto configurations are worse. In particular, the median docker container cannot write to my home directory. The median flatpak can.
Despite the ordering, the "no updates" problem seems like a far worse issue than "most of the sandboxing is ineffective". It seems pretty clear to me that a lot of apps need wide access, and the first person who builds a good UX for granting that access safely will do us all a big security favor, but we're not there yet. Sometimes I really want my text editor to edit my bashrc. Maybe that should require a privilege escalation; that's fine.
Docker has support for user namespaces but it's off by default, and I've never actually seen someone use them (I'm sure people do, but the way the support was implemented is fairly half-baked in a variety of ways, for a variety of understandable but still disappointing reasons).
LXC/LXD's user namespace implementation actually privilege-separates different containers from each other (while also being able to "punch out" parts of the mapping so that you can share stuff between containers without needing to share the entire uid_map).
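To make the "punch out" idea concrete: each line of /proc/<pid>/uid_map is "ID-inside-ns ID-outside-ns length". Below is a minimal sketch, assuming a container that gets its own private 65536-uid host range but should share exactly one uid with the host (or with other containers). The helper names `punch_out` and `render` are hypothetical and this is not LXC's actual implementation; it just splits the private range around the shared uid.

```python
from typing import List, Tuple

# One uid_map extent: (inside_start, outside_start, count),
# in the same order as a line of /proc/<pid>/uid_map.
Extent = Tuple[int, int, int]


def punch_out(base: int, size: int, inside: int, shared_outside: int) -> List[Extent]:
    """Map container uids [0, size) to host uids [base, base + size),
    except that container uid `inside` maps to host uid `shared_outside`.
    (Hypothetical helper for illustration.)"""
    if not 0 <= inside < size:
        raise ValueError("punched-out uid must lie inside the mapped range")
    if base <= shared_outside < base + size:
        raise ValueError("shared host uid must not overlap the private range")
    extents: List[Extent] = []
    if inside > 0:
        # Private range below the punched-out uid.
        extents.append((0, base, inside))
    # The single shared uid.
    extents.append((inside, shared_outside, 1))
    after = size - inside - 1
    if after > 0:
        # Private range above the punched-out uid, skipping base + inside.
        extents.append((inside + 1, base + inside + 1, after))
    return extents


def render(extents: List[Extent]) -> str:
    """Format extents as the text you would write to /proc/<pid>/uid_map."""
    return "".join(f"{i} {o} {c}\n" for i, o, c in extents)


if __name__ == "__main__":
    print(render(punch_out(100000, 65536, 1000, 1000)), end="")
```

With base 100000 and uid 1000 shared, this yields three extents ("0 100000 1000", "1000 1000 1", "1001 101001 64535"), so two containers with different private bases can both see host uid 1000 without sharing the rest of their mappings.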
> user namespaces requiring privileges to use
Not always. See https://github.com/rootlesscontainers (a project I work on -- currently you can run Kubernetes as an unprivileged user with some caveats about multi-node setups but we're working on it) or LXC's unprivileged containers.
And in cases where you need to have multi-user mappings (which isn't necessary for most user applications because they wouldn't be able to setuid anyway!) you can just use "newuidmap" and "newgidmap".
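As a rough sketch of how a rootless runtime might drive "newuidmap": it looks up the user's delegated ranges in /etc/subuid (lines of the form "name:start:count"), then asks newuidmap to map container uid 0 to the caller's own uid and the following uids onto the subordinate range. The helper names are hypothetical, and this only builds the argument list (`newuidmap <pid> <uid> <loweruid> <count> ...`); it does not run anything. Note that real /etc/subuid entries may also use a numeric uid instead of a user name, which this sketch ignores.

```python
from typing import List, Tuple


def parse_subuid(text: str, user: str) -> List[Tuple[int, int]]:
    """Return (start, count) ranges delegated to `user` in /etc/subuid-style text.
    (Hypothetical helper; matches on user name only.)"""
    ranges: List[Tuple[int, int]] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, and defensively skip anything that looks like a comment.
        if not line or line.startswith("#"):
            continue
        name, start, count = line.split(":")
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges


def newuidmap_argv(pid: int, host_uid: int, ranges: List[Tuple[int, int]]) -> List[str]:
    """Build argv for newuidmap: container uid 0 -> caller's own uid,
    then container uids 1.. -> the subordinate ranges, in order."""
    argv = ["newuidmap", str(pid), "0", str(host_uid), "1"]
    inside = 1
    for start, count in ranges:
        argv += [str(inside), str(start), str(count)]
        inside += count
    return argv


if __name__ == "__main__":
    subuid = "alice:100000:65536\nbob:165536:65536\n"
    print(newuidmap_argv(4242, 1000, parse_subuid(subuid, "alice")))
```

For the single-user case mentioned above you'd skip all of this and just map your own uid to 0, which needs no helper at all.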
In fact, bubblewrap has supported precisely this use case and the use of user namespaces for a while. Of course, user namespaces wouldn't really help with protecting against home directory attacks -- if you're running as the same user (but in a user namespace) and you bind-mount the home directory then it can obviously write to said home directory.
Creating user namespaces has still caused security vulnerabilities quite recently. But with seccomp you can block it inside a container (which is what Docker and LXC do by default, for instance), and as a container runtime there's little reason to worry about that, because you are using user namespaces to increase the security of your sandbox.
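A seccomp filter is essentially a per-syscall decision on the syscall number and its register arguments. The sketch below only simulates that decision in Python; it is not a real BPF filter and not Docker's or LXC's actual profile. It assumes a simplified policy: deny clone/unshare when the CLONE_NEWUSER bit is set, and fail clone3 with ENOSYS, since its flags live in a struct behind a pointer that seccomp cannot inspect (runtimes commonly do this to force a fallback to clone). The function name is hypothetical.

```python
import errno

# Flag value from <linux/sched.h>.
CLONE_NEWUSER = 0x10000000


def filter_decision(syscall: str, flags: int = 0) -> int:
    """Simulated seccomp decision (hypothetical): 0 means allow,
    a negative errno mimics SECCOMP_RET_ERRNO denying the call."""
    if syscall == "clone3":
        # Flags are behind a pointer, so a filter can't check them; refuse outright.
        return -errno.ENOSYS
    if syscall in ("clone", "unshare") and flags & CLONE_NEWUSER:
        # Creating a new user namespace from inside the sandbox is denied.
        return -errno.EPERM
    return 0


if __name__ == "__main__":
    for call, fl in [("unshare", CLONE_NEWUSER), ("clone", 0), ("clone3", 0)]:
        print(call, hex(fl), filter_decision(call, fl))
```

The point is that the process inside the container simply gets EPERM when it tries to create a nested user namespace, so any kernel bug reachable only through that path is off the table.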
> I'm not sure about the reasoning, but it's probably a combo of "not trusting user namespaces" (disagree) and user namespaces requiring privileges to use.
binctr looks like an interesting solution to tackle this issue.
Or https://rootlesscontaine.rs/ [1]. runc has had upstream support for this for quite a while (binctr predates it by a bit, but the LXC support for it predates all of this by several years). If you want to run this in production, please use this -- or LXC -- rather than the PoC that Jess wrote a few years ago. umoci[2] also has rootless support (though it doesn't use user namespaces) for image manipulation (extraction and diff generation).
I worked quite a bit on getting this userspace stuff together (though of course the kernel work was done by much more clever people than myself :P).
And many docker users run privileged containers, because then they don't need to troubleshoot permissions. That doesn't mean the underlying system is flawed; it just means people take the lazy way around it.
I'm thinking of all the blog posts from a few years ago about setting things up on CentOS.
Step 1: disable SELinux. That was never recommended, but the blog writers didn't want to go into detail about managing SELinux, or didn't understand it.
You're right! It's not the fault of the underlying system, it's the fault of the lazy people who work around it trivially.
With that said, some people might consider a system that is much easier to trivially work around than to use properly to be one possessed of a wonderful, glorious, bountiful collection of opportunities to improve its design. Such systems are not bad! Not by any means! They just could, perhaps, be somewhat better.
All of that said, I do think a sandbox-based system probably shouldn't allow things inside the sandbox to say "Don't sandbox me bro". That seems less than maximally wise, even if it does also seem super convenient.
It should also be noted that bind-mounting docker.sock is equivalent (or much worse -- it's easier to exploit at least) to using privileged containers, and an exceptionally large number of people do this (you see it in many blog posts and project installation scripts).