I was thinking about how to sandbox the various per-VM daemons and came
up with the following strategy:

- Each VM gets its own PID and mount namespace and set of user IDs.
- Mount namespace includes /proc, /sys, /dev, and the host rootfs.
- Each service gets its own /tmp and /dev/shm if they are needed at all.
- virtiofsd gets r/w access to the VM private storage.
- IPC namespaces are irrelevant because the kernel is built without
  System V IPC or POSIX message queues.
- Sending signals between services in the namespace is blocked by
  Landlock.  Landlock also blocks ptrace() and other nastiness, as well
  as communication via abstract AF_UNIX sockets.
- Since AF_UNIX abstract sockets between services are blocked by
  Landlock and Spectrum builds without IP or even Ethernet on the host
  there is no need for network namespacing.
- The sandbox manager is PID 1 in the VM's PID namespace.  When s6 tells
  it to shut down, it tries to gracefully shut down the VM.  After a
  timeout or once the VM has shut down, it exits, and Linux
  automatically kills all the processes and cleans up the mount
  namespace.
- The sandbox manager uses prctl(PR_SET_PDEATHSIG) to ensure it dies if
  the parent s6 process dies.  This requires s6 to provide its own PID
  to avoid races, but that is easy to implement.

All of this behavior will be hard-coded into C and Rust source code, so
it will be vastly simpler than a generic program that must support many
use-cases.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
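
For illustration, the race-free use of prctl(PR_SET_PDEATHSIG) mentioned above could look roughly like the sketch below. The helper name arm_parent_death_signal and the way the supervisor's PID reaches the process are hypothetical, not taken from any existing code. The sketch also assumes the check runs where the parent is visible in the caller's own PID namespace: getppid() returns 0 when the parent lives in a different PID namespace, so from inside a fresh namespace this comparison would have to be done earlier or replaced by another mechanism.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: arm a parent-death signal and detect the case
 * where the parent exited before the prctl() call took effect.
 *
 * expected_parent is the PID the supervisor passed to us (an assumption:
 * e.g. via argv or the environment).  Returns 0 on success and -1 if
 * prctl() failed or the current parent is not the expected one.
 */
static int arm_parent_death_signal(pid_t expected_parent, int sig)
{
    assert(sig > 0);

    /* Ask the kernel to send `sig` to this process when its parent dies. */
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0, 0, 0) != 0)
        return -1;

    /*
     * If the parent died before the prctl() above, we were already
     * reparented and no signal will ever arrive.  Comparing getppid()
     * with the PID the supervisor gave us closes that window.
     */
    if (getppid() != expected_parent)
        return -1;

    return 0;
}
```

A caller would presumably invoke this early in main() with the PID supplied by the supervisor and exit immediately if it returns -1, since that means the parent is already gone.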