On 9/19/25 09:17, Alyssa Ross wrote:
> Demi Marie Obenour writes:
>
>> On 9/17/25 07:27, Alyssa Ross wrote:
>>> Demi Marie Obenour writes:
>>>
>>>> On 9/10/25 11:11, Alyssa Ross wrote:
>>>>> • These services are part of our TCB anyway. Sandboxing only gets us
>>>>> defense in depth. With that in mind, it's basically never going to
>>>>> be worth adding sandboxing if it adds any amount of attack surface.
>>>>> One example of that would be user namespaces. They've been a
>>>>> consistent source of kernel security issues, and it might be better
>>>>> to turn them off entirely than to use them for sandboxing stuff
>>>>> that's trusted anyway.
>>>>
>>>> Sandboxing virtiofsd is going to be really annoying and will definitely
>>>> come at a performance cost. The most efficient way to use virtiofsd
>>>> is to give it CAP_DAC_READ_SEARCH in the initial user namespace and
>>>> delegate _all_ access control to it. This allows virtiofs to use
>>>> open_by_handle_at() for all filesystem access. Unfortunately,
>>>> this also allows virtiofsd to open any file on the filesystem, ignoring
>>>> all discretionary access control checks. I don't think Landlock would
>>>> work either. SELinux or SMACK might work, but using them is
>>>> significantly more complicated.
>>>>
>>>> If one wants to sandbox virtiofsd, one either needs to
>>>> use --cache=never or run into an effective resource leak
>>>> (https://gitlab.com/virtio-fs/virtiofsd/-/issues/194).
>>>> My hope is that in the future the problem will be solved
>>>> by DAX and an in-kernel shrinker that is aware of the host
>>>> resources it is using. Denial of service would be prevented
>>>> by cgroups on the host, addressing the objection mentioned
>>>> in the issue comments.
>>>
>>> Do we not trust virtiofsd's built-in sandboxing?
>>
>> I do trust it, provided that it is verifiable (by dumping the state
>> of the process at runtime).
>> However, allowing unrestricted
>> open_by_handle_at() allows opening any file on the system, conditioned
>> only on the filesystem supporting open_by_handle_at(). Therefore,
>> sandboxing and using handles for all filesystem access are incompatible.
>
> Wouldn't it be limited to only files on the same filesystem, since you
> have to pass a mount FD to open_by_handle_at()?

It would, but I think that different mounts count as the same filesystem
for this purpose. open_by_handle_at() in privileged mode bypasses the VFS
layer and goes straight to the underlying filesystem driver. File handles
have low enough entropy that they can be guessed.

> That's still bad though. So then to start with we just want to make
> sure it doesn't have CAP_DAC_READ_SEARCH, and then we hope that
> something comes along to address the limitations of that?

This is correct. There is already one idea for that, which is to
cryptographically sign (and possibly encrypt) file handles so that
one cannot guess them. This would ensure that one cannot get a
file handle without using name_to_handle_at(), which already does
access checks.

In the future, it might make sense for virtiofsd to talk to a userspace
filesystem implementation. Depending on how this is implemented, it
might or might not be possible to sandbox virtiofsd in this case.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
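
For illustration only, the handle-signing idea could look roughly like
the sketch below. This is not how virtiofsd or the kernel works today.
The function names, the choice of HMAC-SHA256, the tag layout, and
binding the tag to a mount ID are all assumptions made for the example.
The point is just that a handle is accepted only if it carries a tag
that could not have been produced without the issuer's secret key, so
guessing the low-entropy handle bytes is no longer enough.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def sign_handle(key: bytes, mount_id: int, handle: bytes) -> bytes:
    """Hypothetical issuer side: return handle || HMAC(key, mount_id || handle)."""
    msg = mount_id.to_bytes(8, "little") + handle
    tag = hmac.new(key, msg, hashlib.sha256).digest()
    return handle + tag


def verify_handle(key: bytes, mount_id: int, signed: bytes) -> Optional[bytes]:
    """Hypothetical checker side: return the raw handle if the tag is valid, else None."""
    if len(signed) < TAG_LEN:
        return None
    handle, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    msg = mount_id.to_bytes(8, "little") + handle
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    # compare_digest runs in constant time, avoiding a timing side channel
    return handle if hmac.compare_digest(tag, expected) else None
```

In this sketch, only a handle that passes verify_handle() would ever be
handed to open_by_handle_at(). A handle that was guessed, modified, or
replayed against a different mount ID is rejected.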