On 9/17/25 07:27, Alyssa Ross wrote:
> Demi Marie Obenour writes:
>
>> On 9/10/25 11:11, Alyssa Ross wrote:
>>> This all sounds fine, BUT there are a couple of important things to bear
>>> in mind:
>>>
>>> • This needs to be maintainable. I don't know how much code this is
>>>   going to be or how complex it's going to be, but that this will be
>>>   totally custom does make me a bit concerned.
>>
>> This should not be too difficult. It's the same system calls used by
>> container managers, so if there is a problem it should be possible to
>> get help fairly easily. bubblewrap
>
> bubblewrap? :)

Bubblewrap is a bit more complex than I would like, and doesn't support
useful features like non-recursive bind mounts. I don't know if minijail
supports them, but it might well.

>>> • These services are part of our TCB anyway. Sandboxing only gets us
>>>   defense in depth. With that in mind, it's basically never going to
>>>   be worth adding sandboxing if it adds any amount of attack surface.
>>>   One example of that would be user namespaces. They've been a
>>>   consistent source of kernel security issues, and it might be better
>>>   to turn them off entirely than to use them for sandboxing stuff
>>>   that's trusted anyway.
>>
>> Sandboxing virtiofsd is going to be really annoying and will definitely
>> come at a performance cost. The most efficient way to use virtiofsd
>> is to give it CAP_DAC_READ_SEARCH in the initial user namespace and
>> delegate _all_ access control to it. This allows virtiofsd to use
>> open_by_handle_at() for all filesystem access. Unfortunately,
>> this also allows virtiofsd to open any file on the filesystem, ignoring
>> all discretionary access control checks. I don't think Landlock would
>> work either. SELinux or SMACK might work, but using them is
>> significantly more complicated.
>>
>> If one wants to sandbox virtiofsd, one either needs to
>> use --cache=never or run into an effective resource leak
>> (https://gitlab.com/virtio-fs/virtiofsd/-/issues/194).
>> My hope is that in the future the problem will be solved
>> by DAX and an in-kernel shrinker that is aware of the host
>> resources it is using. Denial of service would be prevented
>> by cgroups on the host, addressing the objection mentioned
>> in the issue comments.
>
> Do we not trust virtiofsd's built-in sandboxing?

I do trust it, provided that it can be verified (for example, by dumping
the state of the process at runtime). However, unrestricted
open_by_handle_at() can open any file on the system, as long as the file's
filesystem supports open_by_handle_at(). Therefore, sandboxing virtiofsd
and using handles for all filesystem access are incompatible.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
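The point that unrestricted open_by_handle_at() defeats sandboxing can be illustrated with a minimal Python sketch. It assumes Linux with a libc that exports name_to_handle_at(2) and open_by_handle_at(2) (glibc does, since 2.14) and calls them through ctypes; path_to_handle() and open_handle() are hypothetical helper names, not virtiofsd code. Encoding a path into a handle needs no capability. Reopening it needs CAP_DAC_READ_SEARCH, and the mount fd passed to open_by_handle_at() only tells the kernel which filesystem to look in, so a chroot or mount namespace does not limit which file on that filesystem the handle resolves to. Since the kernel does not authenticate handle contents, a process holding the capability can in principle construct handles for files it never looked up.

```python
import ctypes
import os

# Assumption: Linux; CDLL(None) exposes the libc symbols already loaded.
_libc = ctypes.CDLL(None, use_errno=True)

AT_FDCWD = -100
MAX_HANDLE_SZ = 128  # from <fcntl.h>


class FileHandle(ctypes.Structure):
    # Mirrors struct file_handle. f_handle is a flexible array member in C,
    # so the maximum size is allocated up front.
    _fields_ = [
        ("handle_bytes", ctypes.c_uint),
        ("handle_type", ctypes.c_int),
        ("f_handle", ctypes.c_ubyte * MAX_HANDLE_SZ),
    ]


_libc.name_to_handle_at.argtypes = [
    ctypes.c_int, ctypes.c_char_p, ctypes.POINTER(FileHandle),
    ctypes.POINTER(ctypes.c_int), ctypes.c_int,
]
_libc.name_to_handle_at.restype = ctypes.c_int
_libc.open_by_handle_at.argtypes = [
    ctypes.c_int, ctypes.POINTER(FileHandle), ctypes.c_int,
]
_libc.open_by_handle_at.restype = ctypes.c_int


def _raise_errno(*args):
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err), *args)


def path_to_handle(path):
    """Encode path as an opaque handle (hypothetical helper). No capability needed."""
    fh = FileHandle(handle_bytes=MAX_HANDLE_SZ)
    mount_id = ctypes.c_int()
    rc = _libc.name_to_handle_at(
        AT_FDCWD, os.fsencode(path), ctypes.byref(fh), ctypes.byref(mount_id), 0
    )
    if rc != 0:
        _raise_errno(path)
    return fh, mount_id.value


def open_handle(mount_path, fh, flags=os.O_RDONLY):
    """Reopen a file from its handle (hypothetical helper).

    The kernel requires CAP_DAC_READ_SEARCH. mount_path only selects the
    filesystem; with the capability in the initial user namespace, the file
    the handle names need not lie beneath mount_path.
    """
    mount_fd = os.open(mount_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fd = _libc.open_by_handle_at(mount_fd, ctypes.byref(fh), flags)
        if fd < 0:
            _raise_errno()
        return fd
    finally:
        os.close(mount_fd)
```

Run unprivileged, open_handle() should fail with PermissionError (EPERM). With CAP_DAC_READ_SEARCH, it should return a descriptor for the same inode even when mount_path is an unrelated directory on the same filesystem, which is why path-based confinement does not bound what a handle-based server can read.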
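On verifying a sandbox at runtime: one possible approach, not necessarily the one meant above, is to read /proc/<pid>/status for the running daemon and check its effective capabilities, no_new_privs flag, and seccomp mode. The sketch below assumes a Linux /proc; parse_status(), summarize() and inspect_process() are hypothetical names. CAP_DAC_READ_SEARCH is capability bit 2. Mount confinement would also need /proc/<pid>/mountinfo, which this sketch leaves out.

```python
CAP_DAC_READ_SEARCH = 2  # capability bit number from <linux/capability.h>


def parse_status(text):
    """Split /proc/<pid>/status lines ("Key:\tvalue") into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def summarize(fields):
    """Pick out the fields that describe how confined a process is."""
    cap_eff = int(fields.get("CapEff", "0"), 16)
    return {
        "cap_eff": cap_eff,
        # If set (and the process is in the initial user namespace), the
        # process can reopen files by handle, subject to any seccomp filter.
        "dac_read_search": bool((cap_eff >> CAP_DAC_READ_SEARCH) & 1),
        "no_new_privs": fields.get("NoNewPrivs") == "1",
        # 0 = disabled, 1 = strict, 2 = filter
        "seccomp_mode": int(fields.get("Seccomp", "0")),
    }


def inspect_process(pid="self"):
    """Summarize the sandbox-relevant state of a running process."""
    with open(f"/proc/{pid}/status") as f:
        return summarize(parse_status(f.read()))
```

A result with dac_read_search set would mean the process can use handles as described above, so any mount-namespace sandbox around it would not, on its own, limit which files it can open.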