From: Stefano Garzarella <sgarzare@redhat.com>
To: Demi Marie Obenour <demiobenour@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
"Eric Dumazet" <edumazet@google.com>,
"Arseniy Krasnov" <avkrasnov@salutedevices.com>,
"Bobby Eshleman" <bobbyeshleman@meta.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"David S . Miller" <davem@davemloft.net>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
netdev@vger.kernel.org, eric.dumazet@gmail.com,
"Arseniy Krasnov" <AVKrasnov@sberdevices.ru>,
"Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
"virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
"Manuel Stoeckl" <code@mstoeckl.com>,
"Alyssa Ross" <hi@alyssa.is>,
"Spectrum Development" <devel@spectrum-os.org>,
"systemd development" <systemd-devel@lists.freedesktop.org>,
"Val Packett" <val@invisiblethingslab.com>
Subject: Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
Date: Mon, 25 May 2026 11:14:17 +0200 [thread overview]
Message-ID: <CAGxU2F5yG1VJ6eeawAtxcgE-uEQugsG00ZrTBmqz6=AF3JyksA@mail.gmail.com> (raw)
In-Reply-To: <96977898-81a0-4c79-a95f-0b288e0572a6@gmail.com>
On Mon, 25 May 2026 at 04:37, Demi Marie Obenour <demiobenour@gmail.com> wrote:
>
> On 5/23/26 17:00, Demi Marie Obenour wrote:
> > On 5/8/26 06:11, Stefano Garzarella wrote:
> >> On Fri, May 08, 2026 at 05:58:06AM -0400, Michael S. Tsirkin wrote:
> >>> On Fri, May 08, 2026 at 11:41:21AM +0200, Stefano Garzarella wrote:
> >>>> On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote:
> >>>>> On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote:
> >>>>>> On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote:
> >>>>>>> On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>>>>> For now, we're already doing something:
> >>>>>>>> merging the skuffs if they don't have EOM set.
> >>>>>>>
> >>>>>>>
> >>>>>>> Right that's good. You could go further and merge with EOM too
> >>>>>>> if you stick the info about message boundaries somewhere else.
> >>>>>>
> >>>>>> This adds a lot of complexity IMO, but we can try.
> >>>>>>
> >>>>>> Do you have something in mind?
> >>>>>
> >>>>> BER is clearly overkill but here's a POC that claude made for me,
> >>>>> just to give u an idea. It's clearly has a ton of issues,
> >>>>> for example I dislike how GFP_ATOMIC is handled.
> >>>>
> >>>> Okay, I somewhat understand, but clearly this isn't net material
> >>>
> >>> I doubt we have many other options given reverting the regression was
> >>> ruled out.
> >>
> >> As Eric pointed out, we can't revert it.
> >
> > Could there be an option to disable the mitigation in guests, for the
> > situation where the host is trusted? There are VMMs that implement
> > AF_VSOCK in userspace with a backing AF_UNIX socket.
Here the main problem was an attack to the host from the guest.
Can the application set the max (IIRC 4G) of SO_VM_SOCKETS_BUFFER_SIZE ?
> >
> >>>> so for now
> >>>> I think the best thing to do is to merge the fixup I sent (or something
> >>>> similar):
> >>>> https://lore.kernel.org/netdev/20260508092330.69690-1-sgarzare@redhat.com/
> >>>
> >>> I reviewed that one, problem is it's a spec violation/change that we'll
> >>> have to support forever.
> >>
> >> I have a few points to make on this, but let's discuss them there.
> >>
> >>>
> >>>> This is a major change that should be merged with more caution.
> >>>> Could this have too much of an impact on performance?
> >>>>
> >>>> Thanks,
> >>>> Stefano
> >>>
> >>> It's really a POC, real patch is left as an excersise for the reader
> >>> :).
> >>
> >> eheh, I see, but honestly, this overcomplication scares me. I'll try to
> >> think it over.
> >>
> >>> The correct approach IMHO is to only start using this
> >>> when we wasted a lot of memory on small packets.
> >>>
> >>> For example, if sum(truesize) >= buf size.
> >>>
> >>> then we'll not see a perf impact unless it's already pathological.
> >>
> >> Agree on this, which is similar to what I'm doing in that patch.
> >> Reducing the advertised buf_alloc only in pathological cases (e.g.
> >> overhead > buf_alloc).
> >
> > This isn't enough to prevent data loss due to race conditions.
Yep, we didn't go in that direction.
> >
> > I'm CCing the virtio-comment list and a few others.
> >
> > Right now, any application that needs to send massive amount of
> > data over a vsock is simply broken. This isn't just theoretical.
> > It's causing real-world problems for users of Waypipe.
> > Waypipe forwards Wayland protocol messages over AF_VSOCK,
> > so it can send a large amount of traffic over the socket.
> > See https://gitlab.freedesktop.org/mstoceckl/waypipe/work_items/165.
> >
> > If one is willing to mutate the ring buffer in-place, or to maintain
> > an auxiliary counter, it's possible to store all messages with bounded
> > (in practice) overhead. Specifically:
> >
> > - If the first byte of a block of data is nonzero, it's a
> > variable-length length. 1 byte for messages less than 128 bytes.
> >
> > - If the first byte of a block of data is zero, the subsequent bytes
> > are a variable-length counter that stores the number of consecutive
> > zero-byte messages.
> >
> > That adds a lot of complexity, which is very unfortunate for something
> > that needs to be backported to stable kernels. I also suspect it
> > requires all access to the ring buffer to take a lock rather than
> > being lock-free. But it's the only approach that I can think of that
> > can work with the current spec.
Yep, we were discussing something like that in this thread, but we are
still working on that.
That said, any help on preparing patches for this would be more than welcome.
> >
> > Could there at least be a normative note stating that drivers and
> > devices should treat each message as consuming 1024 bytes + the size
> > of the message itself, and warning that anything that doesn't is
> > going to be broken in practice?
We would like to fix the issue. This is a transitive situation.
> >
> > I'm CCing Val Packet (of Invisible Things Lab) and Alyssa Ross
> > (of Spectrum) because both of them are working on systems that rely
> > critically on vsock.
>
> Update: I see that patches have been upstreamed (with CC: stable) that
> reset the connection instead of data loss.
Yep, hope that helps the situation. But again, can setting the max
SO_VM_SOCKETS_BUFFER_SIZE helps those applications?
Thanks,
Stefano
prev parent reply other threads:[~2026-05-25 9:14 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <afn0ZdvZWswBuDMm@sgarzare-redhat>
[not found] ` <CANn89iLs8DOWJwDpf_ARoMrV+6b2tbhEJ=VVzeC8gCm5dRGaig@mail.gmail.com>
[not found] ` <afoF_cHfl6ygcupM@sgarzare-redhat>
[not found] ` <20260506113554-mutt-send-email-mst@kernel.org>
[not found] ` <afxVH2So9BbZ3Gta@sgarzare-redhat>
[not found] ` <20260507074113-mutt-send-email-mst@kernel.org>
[not found] ` <afyMCyBvZpzWrLtO@sgarzare-redhat>
[not found] ` <20260507163710-mutt-send-email-mst@kernel.org>
[not found] ` <af2sXvVCBT05XF0D@sgarzare-redhat>
[not found] ` <20260508055343-mutt-send-email-mst@kernel.org>
[not found] ` <af21ZGlANY6UUCKx@sgarzare-redhat>
2026-05-23 21:00 ` [PATCH net] vsock/virtio: fix potential unbounded skb queue Demi Marie Obenour
2026-05-25 2:37 ` Demi Marie Obenour
2026-05-25 9:14 ` Stefano Garzarella [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGxU2F5yG1VJ6eeawAtxcgE-uEQugsG00ZrTBmqz6=AF3JyksA@mail.gmail.com' \
--to=sgarzare@redhat.com \
--cc=AVKrasnov@sberdevices.ru \
--cc=avkrasnov@salutedevices.com \
--cc=bobbyeshleman@meta.com \
--cc=code@mstoeckl.com \
--cc=davem@davemloft.net \
--cc=demiobenour@gmail.com \
--cc=devel@spectrum-os.org \
--cc=edumazet@google.com \
--cc=eperezma@redhat.com \
--cc=eric.dumazet@gmail.com \
--cc=hi@alyssa.is \
--cc=horms@kernel.org \
--cc=jasowang@redhat.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stefanha@redhat.com \
--cc=systemd-devel@lists.freedesktop.org \
--cc=val@invisiblethingslab.com \
--cc=virtio-comment@lists.linux.dev \
--cc=virtualization@lists.linux.dev \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://spectrum-os.org/git/crosvm
https://spectrum-os.org/git/doc
https://spectrum-os.org/git/mktuntap
https://spectrum-os.org/git/nixpkgs
https://spectrum-os.org/git/spectrum
https://spectrum-os.org/git/ucspi-vsock
https://spectrum-os.org/git/www
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).