From: Demi Marie Obenour <demiobenour@gmail.com>
To: Alyssa Ross <hi@alyssa.is>
Cc: Spectrum OS Development <devel@spectrum-os.org>
Subject: Re: Sandboxing strategy
Date: Wed, 10 Sep 2025 16:35:30 -0400
Message-ID: <37d4dc51-4ee1-4ccf-8d37-2272988b9361@gmail.com>
In-Reply-To: <87o6ritk9q.fsf@alyssa.is>
On 9/10/25 11:11, Alyssa Ross wrote:
> Demi Marie Obenour <demiobenour@gmail.com> writes:
>
>> I was thinking about how to sandbox the various per-VM daemons
>> and came up with the following strategy:
>>
>> - Each VM gets its own PID and mount namespace and set of user IDs.
>
> Didn't you say to me we couldn't do PID namespaces without support from
> s6?

I was mistaken about this. Without direct support in s6, there is no
way to avoid having a persistent process outside the PID namespace as
s6's direct child, but that is harmless.

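For illustration, the arrangement looks roughly like the following.
This is a minimal sketch, not the planned implementation: the names
run_as_pidns_init and payload_check_pid1 are made up for the example,
and unshare(CLONE_NEWPID) needs CAP_SYS_ADMIN, so it fails with EPERM
when run unprivileged. The caller stays in the original PID namespace
(as the supervisor's direct child), and its first fork() becomes PID 1
of the new namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical payload: succeeds only if it really is PID 1. */
static int payload_check_pid1(void)
{
    return getpid() == 1 ? 0 : 1;
}

/*
 * Run payload as PID 1 of a new PID namespace while the caller stays
 * outside. Returns the payload's exit status, 128 + signal number if
 * it was killed, or -1 with errno set on failure.
 */
static int run_as_pidns_init(int (*payload)(void))
{
    assert(payload != NULL);

    /* Only children forked after this call enter the new namespace. */
    if (unshare(CLONE_NEWPID) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(payload());

    /* The caller remains outside the namespace and reaps PID 1. */
    int status;
    pid_t r;
    do {
        r = waitpid(child, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
```

When the child exits, the kernel kills everything left in the
namespace, which is the cleanup behaviour described further down.
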
>> - Mount namespace includes /proc, /sys, /dev, and the host rootfs.
>>
>> - Each service gets its own /tmp and /dev/shm if they are needed at all.
>
> Just a question: if we put services into cgroups, does use of tmpfs get
> charged to the appropriate cgroup?

It definitely should, especially if the tmpfs is mounted from inside
the cgroup. Whether it actually does I don't know.

>> - virtiofsd gets r/w access to the VM private storage.
>>
>> - IPC namespaces are irrelevant because the kernel is
>> built without System V IPC or POSIX message queues.
>>
>> - Sending signals between services in the namespace is blocked
>> by Landlock. Landlock also blocks ptrace() and other nastiness,
>> as well as communication via abstract AF_UNIX sockets.
>>
>> - Since AF_UNIX abstract sockets between services are blocked by
>> Landlock and Spectrum builds without IP or even Ethernet on the
>> host there is no need for network namespacing.
>
> It doesn't currently, just to be clear. (I'm still putting off using a
> custom kernel config on the host until we have better tooling for
> keeping up with Nixpkgs.)

Makes sense.

>> - The sandbox manager is PID 1 in the VM's PID namespace.
>> When s6 tells it to shut down, it tries to gracefully shut
>> down the VM. After a timeout or once the VM has shut down,
>> it exits, and Linux automatically kills all the processes
>> and cleans up the mount namespace.
>>
>> - The sandbox manager uses prctl(PR_SET_PDEATHSIG) to ensure it
>> dies if the parent s6 process dies. This requires s6 to provide
>> its own PID to avoid races, but that is easy to implement.
>>
>> All of this behavior will be hard-coded into C and Rust source code,
>> so it will be vastly simpler than a generic program that must support
>> many use-cases.
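The PR_SET_PDEATHSIG race mentioned above can be closed along these
lines. This is a minimal sketch, assuming the supervisor's PID reaches
the process somehow: the expected_parent parameter and the function
name are hypothetical, not an existing s6 interface. One caveat:
getppid() returns 0 when the parent lives in an outer PID namespace,
so the comparison is only meaningful in a process that shares a PID
namespace with its parent, such as the helper that stays outside the
VM's namespace.

```c
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to send sig to this process when its parent dies,
 * then confirm the parent is still the one we expected.
 * Returns 0 on success, -1 if prctl() failed or the parent is not
 * expected_parent (for example because it already exited).
 */
static int arm_parent_death_signal(pid_t expected_parent, int sig)
{
    assert(sig > 0);

    if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig) == -1)
        return -1;

    /*
     * If the parent died before prctl() took effect, no signal will
     * ever arrive; we have been reparented, so getppid() differs.
     */
    if (getppid() != expected_parent)
        return -1;

    return 0;
}
```

A caller would typically exit immediately on -1 rather than continue
without a live supervisor.
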
>
> This all sounds fine, BUT there are a couple of important things to bear
> in mind:
>
> • This needs to be maintainable. I don't know how much code this is
> going to be or how complex it's going to be, but that this will be
> totally custom does make me a bit concerned.

This should not be too difficult. It's the same system calls used by
container managers, so if there is a problem it should be possible to
get help fairly easily. bubblewrap

> • These services are part of our TCB anyway. Sandboxing only gets us
> defense in depth. With that in mind, it's basically never going to
> be worth adding sandboxing if it adds any amount of attack surface.
> One example of that would be user namespaces. They've been a
> consistent source of kernel security issues, and it might be better
> to turn them off entirely than to use them for sandboxing stuff
> that's trusted anyway.

Sandboxing virtiofsd is going to be really annoying and will definitely
come at a performance cost. The most efficient way to use virtiofsd
is to give it CAP_DAC_READ_SEARCH in the initial user namespace and
delegate _all_ access control to it. This allows virtiofsd to use
open_by_handle_at() for all filesystem access. Unfortunately,
this also allows virtiofsd to open any file on the filesystem, ignoring
all discretionary access control checks. I don't think Landlock would
work either. SELinux or SMACK might work, but using them is
significantly more complicated.
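
To make the handle-based access concrete, here is a minimal sketch of
a name_to_handle_at()/open_by_handle_at() round trip. It is not taken
from virtiofsd; reopen_by_handle is a made-up name, and the sketch
assumes that name lives on the same filesystem as dir. Obtaining the
handle needs no privilege, but open_by_handle_at() fails with EPERM
unless the caller has CAP_DAC_READ_SEARCH, and once it has that
capability the open is not subject to the usual permission checks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Look up name relative to dir, convert it to a file handle, and
 * reopen it from the handle alone. Returns a read-only fd, or -1
 * with errno set.
 */
static int reopen_by_handle(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    /* Doubles as the "mount fd" that identifies the filesystem. */
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd == -1)
        return -1;

    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (fh == NULL) {
        close(dirfd);
        errno = ENOMEM;
        return -1;
    }
    fh->handle_bytes = MAX_HANDLE_SZ;

    int mount_id;
    int fd = -1;
    if (name_to_handle_at(dirfd, name, fh, &mount_id, 0) == 0)
        fd = open_by_handle_at(dirfd, fh, O_RDONLY | O_CLOEXEC);

    /* Preserve errno from the failing call across cleanup. */
    int saved = errno;
    free(fh);
    close(dirfd);
    errno = saved;
    return fd;
}
```
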

If one wants to sandbox virtiofsd, one must either use --cache=never
or accept an effective resource leak
(https://gitlab.com/virtio-fs/virtiofsd/-/issues/194).
My hope is that in the future the problem will be solved
by DAX and an in-kernel shrinker that is aware of the host
resources it is using. Denial of service would be prevented
by cgroups on the host, addressing the objection mentioned
in the issue comments.

--
Sincerely,
Demi Marie Obenour (she/her/hers)