On 9/7/25 16:28, Yureka wrote: > > On 9/7/25 21:25, Demi Marie Obenour wrote: >> On 9/6/25 10:12, Yureka Lilian wrote: >>> The xdp-forwarder's purpose is implementing the functionality needed >>> within the net-vm (a VM running the Linux drivers for any physical >>> interfaces on the spectrum system). >>> >>> In the future, the net-vm will load the included XDP programs on the >>> passed-through physical interfaces as well as the downstream virtio >>> interface going into the router (recognized by its special MAC address). >>> >>> The net-vm needs to multiplex between the physical interfaces, as there >>> might be several interfaces in the same IOMMU-group. >>> >>> For this, the XDP program loaded on the physical interfaces >>> (`prog_physical.o`) applies a VLAN tag corresponding to the interface id >>> and redirects the packets to the router interface (identified by the >>> `router_iface` bpf map). In the other direction the XDP program loaded on >>> the router interface (`prog_router.o`) removes one layer of VLAN tagging >>> and redirects the packets to the interface read from the VLAN tag. >>> >>> The helper program `set_router_iface` is used to update the >>> `router_iface` >>> bpf map to point to the interface passed as argument to the program. >>> >>> Signed-off-by: Yureka Lilian >>> --- >>> pkgs/default.nix | 5 + >>> tools/default.nix | 15 +- >>> tools/meson.build | 5 + >>> tools/meson_options.txt | 4 + >>> tools/xdp-forwarder/include/parsing_helpers.h | 273 ++++++++++++++++++ >>> tools/xdp-forwarder/include/rewrite_helpers.h | 145 ++++++++++ >>> tools/xdp-forwarder/meson.build | 38 +++ >>> tools/xdp-forwarder/prog_physical.c | 37 +++ >>> tools/xdp-forwarder/prog_router.c | 43 +++ >>> tools/xdp-forwarder/set_router_iface.c | 29 ++ >>> 10 files changed, 591 insertions(+), 3 deletions(-) >>> create mode 100644 tools/xdp-forwarder/include/parsing_helpers.h >>> create mode 100644 tools/xdp-forwarder/include/rewrite_helpers.h >>> create mode 100644 tools/xdp-forwarder/meson.build >>> create mode 100644 tools/xdp-forwarder/prog_physical.c >>> create mode 100644 tools/xdp-forwarder/prog_router.c >>> create mode 100644 tools/xdp-forwarder/set_router_iface.c [...] >>> diff --git a/tools/xdp-forwarder/include/parsing_helpers.h >>> b/tools/xdp-forwarder/include/parsing_helpers.h >>> new file mode 100644 >>> index 0000000..3d240cd >>> --- /dev/null >>> +++ b/tools/xdp-forwarder/include/parsing_helpers.h >>> @@ -0,0 +1,273 @@ >>> +/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-clause) */ >>> +/* Vendored from https://github.com/xdp-project/xdp-tutorial/blob/d3d3eed6ea9a63d1302bfa8b5a8e93862bfe11f0/common/parsing_helpers.h */ It looks like your mail client wrapped the lines. I demangled them in the C code. >>> [...] >>> +/* >>> + * struct vlan_hdr - vlan header >>> + * @h_vlan_TCI: priority and VLAN ID >>> + * @h_vlan_encapsulated_proto: packet type ID or len >>> + */ >>> +struct vlan_hdr { >>> + __be16 h_vlan_TCI; >>> + __be16 h_vlan_encapsulated_proto; >>> +}; I think this (and all the subsequent structs) should be __attribute__((packed)) but am not sure. [...] >>> diff --git a/tools/xdp-forwarder/include/rewrite_helpers.h >>> b/tools/xdp-forwarder/include/rewrite_helpers.h >>> new file mode 100644 >>> index 0000000..2deae9a >>> --- /dev/null >>> +++ b/tools/xdp-forwarder/include/rewrite_helpers.h >>> @@ -0,0 +1,145 @@ >>> +/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-clause) */ >>> +/* Vendored from >>> https://github.com/xdp-project/xdp-tutorial/blob/d3d3eed6ea9a63d1302bfa8b5a8e93862bfe11f0/common/rewrite_helpers.h >>> */ >>> +/* >>> + * This file contains functions that are used in the packetXX XDP programs to >>> + * manipulate on packets data. The functions are marked as __always_inline, and >>> + * fully defined in this header file to be included in the BPF program. >>> + */ >>> + >>> +#ifndef __REWRITE_HELPERS_H >>> +#define __REWRITE_HELPERS_H >>> + >>> +#include >>> +#include >>> +#include >>> +#include >>> + >>> +#include >>> +#include >>> + >>> +/* Pops the outermost VLAN tag off the packet. Returns the popped VLAN ID on >>> + * success or negative errno on failure. >>> + */ >>> +static __always_inline int vlan_tag_pop(struct xdp_md *ctx, struct ethhdr *eth) >>> +{ >>> + void *data_end = (void *)(long)ctx->data_end; >>> + struct ethhdr eth_cpy; >>> + struct vlan_hdr *vlh; >>> + __be16 h_proto; >>> + int vlid; >>> + >>> + if (!proto_is_vlan(eth->h_proto)) >>> + return -1; >>> + >>> + /* Careful with the parenthesis here */ >>> + vlh = (void *)(eth + 1); >>> + >>> + /* Still need to do bounds checking */ >>> + if (vlh + 1 > data_end) >>> + return -1; >>> + >>> + /* Save vlan ID for returning, h_proto for updating Ethernet header */ >>> + vlid = bpf_ntohs(vlh->h_vlan_TCI); >>> + h_proto = vlh->h_vlan_encapsulated_proto; >>> + >>> + /* Make a copy of the outer Ethernet header before we cut it off */ >>> + __builtin_memcpy(ð_cpy, eth, sizeof(eth_cpy)); >>> + >>> + /* Actually adjust the head pointer */ >>> + if (bpf_xdp_adjust_head(ctx, (int)sizeof(*vlh))) >>> + return -1; >>> + >>> + /* Need to re-evaluate data *and* data_end and do new bounds checking >>> + * after adjusting head >>> + */ >>> + eth = (void *)(long)ctx->data; >>> + data_end = (void *)(long)ctx->data_end; >>> + if (eth + 1 > data_end) >>> + return -1; >>> + >>> + /* Copy back the old Ethernet header and update the proto type */ >>> + __builtin_memcpy(eth, ð_cpy, sizeof(*eth)); >>> + eth->h_proto = h_proto; >>> + >>> + return vlid; >>> +} >> In Spectrum, eth is always the packet data, so this should be able to be >> simplified to: [...] > > Not upstreamable I agree. > [...] >> In the Spectrum use-case (where the Ethernet header is always the one in >> the packet) this should be able to be simplified to (not tested): [...] > > Not upstreamable That makes sense. >> This also avoids needing the 'eth' parameter, as it is now assigned in the >> function body before being read from. I'm not sure this would be > you're not sure this would be what? Should have added "upstreamable". > [...] >>> +#endif /* __REWRITE_HELPERS_H */ >>> diff --git a/tools/xdp-forwarder/meson.build >>> b/tools/xdp-forwarder/meson.build >>> new file mode 100644 >>> index 0000000..7e60c11 >>> --- /dev/null >>> +++ b/tools/xdp-forwarder/meson.build >>> @@ -0,0 +1,38 @@ >>> +# SPDX-License-Identifier: EUPL-1.2+ >>> +# SPDX-FileCopyrightText: 2025 Yureka Lilian >>> + >>> +libbpf = dependency('libbpf', version : '1.6.2') >>> + >>> +executable('set_router_iface', 'set_router_iface.c', >>> + dependencies : libbpf, >>> + install : true) >>> + >>> +clang = find_program('clang') >>> + >>> +bpf_o_cmd = [ >>> + clang.full_path(), >>> + '-fno-stack-protector', >>> + '-O2', >>> + '-target', 'bpf', >>> + '-I', meson.current_source_dir() + '/include', >>> + '-g', >>> + '-c', >>> + '@INPUT@', >>> + '-o', >>> + '@OUTPUT@', >>> +] >> I recommend adding -fno-strict-overflow and -fno-strict-aliasing >> here. I don't know if those are implied by -target bpf, but better >> safe than sorry. > Will do. > >>> +prog_router_o = custom_target( >>> + input : 'prog_router.c', >>> + output : 'prog_router.o', >>> + command : bpf_o_cmd, >>> + install: true, >>> + install_dir: 'lib/xdp') >>> + >>> +prog_physical_o = custom_target( >>> + input : 'prog_physical.c', >>> + output : 'prog_physical.o', >>> + command : bpf_o_cmd, >>> + install: true, >>> + install_dir: 'lib/xdp') >> Meson allows providing a dependency file so that the >> target is remade if any of the headers are changed. >> Clang is able to produce such a file (-MD -MP -MF file). > > I mostly got inspiration from the systemd meson files. I'm not very > familiar with meson, but I will try to add this. I think bpf_o_cmd = [ clang.full_path(), '-fno-stack-protector', '-fno-strict-aliasing', '-fno-strict-overflow', '-Wall', '-Wextra', '-O2', '-target', 'bpf', '-I', meson.current_source_dir() + '/include', '-g', '-c', '-o', '@OUTPUT@', '-MD', '-MP', '-MF', '@DEPFILE@', '--', '@INPUT@', ] prog_physical_o = custom_target( input: 'prog_physical.c', output: 'prog_physical.o', depfile: 'prog_physical.o.dep', command: bpf_o_cmd, install: true, install_dir: 'lib/xdp') should work. '--' is optional, of course; I am in the habit of adding it because it does matter in other contexts. This is also untested, but it is consistent with the Meson documentation (https://mesonbuild.com/Reference-manual_functions.html#custom_target). The above command includes the extra command-line options I mentioned (-Wall -Wextra -fno-strict-overflow -fno-strict-aliasing). > [...] >>> +SEC("xdp") >>> +int physical(struct xdp_md *ctx) >>> +{ >>> + void *data_end = (void *)(long)ctx->data_end; >>> + void *data = (void *)(long)ctx->data; >> Would it make sense to use char * here? I know that Linux uses >> arithmetic on void *, but it makes the > it makes the [... what]? Whoops, forgot to finish the sentence. It makes the code easier to understand for those familiar with userspace programming. >>> + struct hdr_cursor nh; >>> + nh.pos = data; >>> + >>> + struct ethhdr *eth; >>> + if (parse_ethhdr(&nh, data_end, ð) < 0) >>> + return XDP_DROP; >>> + >>> + >>> + if (ctx->ingress_ifindex < 1 || ctx->ingress_ifindex > 4096) >>> + return XDP_DROP; I think this is off-by-1 and should be: if (ctx->ingress_ifindex < 1 || ctx->ingress_ifindex > VLAN_VID_MASK) return XDP_DROP; >>> + vlan_tag_push(ctx, eth, ctx->ingress_ifindex); >> Looks like a missing check for the return value. > Thanks, will fix. You're welcome. > [...] >>> + int vlid = vlan_tag_pop(ctx, eth); >>> + if (vlid < 0) { >>> + return XDP_DROP; >>> + } >> I don’t think Spectrum uses bits other than the VLAN ID, >> and VLAN 0 is reserved, so perhaps: >> >> if (vlid < 1 || vlid > 4096) { >> return XDP_DROP; >> } > The "< 0" check is strictly an error check. In success cases, > vlan_tag_pop returns exactly the VLAN ID. Makes sense. A check for the VLAN ID being valid could be done but would be separate (I think `(vlid & VLAN_VID_MASK) != 0`). > [...] >> I simplified the code into the following (only compile-tested): >> >> /* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-clause) */ >> /* >> * This file contains parsing functions that can be used in eXDP programs. The >> * functions are marked as __always_inline, and fully defined in this header >> * file to be included in the BPF program. >> * >> * Each helper parses a packet header, including doing bounds checking, and >> * returns the type of its contents if successful, and -1 otherwise. >> * >> * For Ethernet and IP headers, the content type is the type of the payload >> * (h_proto for Ethernet, nexthdr for IPv6), for ICMP it is the ICMP type field. >> * All return values are in host byte order. >> */ >> >> #include >> #include >> #include >> #include >> #include >> #include >> >> /* >> * struct vlan_hdr - vlan header >> * @h_vlan_TCI: priority and VLAN ID >> * @h_vlan_encapsulated_proto: packet type ID or len >> */ >> struct vlan_hdr { >> __be16 h_vlan_TCI; >> __be16 h_vlan_encapsulated_proto; >> } __attribute__((__packed__)); >> >> struct eth_vlan_hdr { >> struct ethhdr eth; >> struct vlan_hdr vlan; >> } __attribute__((__packed__)); >> >> static __always_inline int proto_is_vlan(__u16 h_proto) >> { >> return !!(h_proto == bpf_htons(ETH_P_8021Q) || >> h_proto == bpf_htons(ETH_P_8021AD)); >> } >> >> /* Pushes a new VLAN tag after the Ethernet header. Returns 0 on success, >> * -1 on failure. >> */ >> static __always_inline int vlan_tag_push(struct xdp_md *ctx, int vlid) >> { >> struct eth_vlan_hdr *hdr; >> char *data_end; >> char *data; >> >> /* Check for valid Ethernet frame */ >> if ((unsigned long)ctx->data + ETH_ZLEN > (unsigned long)ctx->data_end) >> return -1; >> >> /* Add space in front of the packet */ >> if (bpf_xdp_adjust_head(ctx, -(int)sizeof(struct vlan_hdr))) >> return -1; >> >> /* Must re-fetch header pointers */ >> data_end = (char *)(long)ctx->data_end; >> data = (char *)(long)ctx->data; >> >> /* Verifier requires this (redundant) bounds check */ >> if (data + ETH_ZLEN + sizeof(struct vlan_hdr) > data_end) >> return -1; >> >> /* Move source and destination MAC addresses, populate VLAN tag >> * with ID and proto, and set outer Ethernet header to VLAN type. >> */ >> __builtin_memmove(data, data + sizeof(struct vlan_hdr), >> offsetof(struct ethhdr, h_proto)); >> hdr = (struct eth_vlan_hdr *)data; >> /* TODO: should this be ETH_P_8021AD? */ >> hdr->eth.h_proto = bpf_htons(ETH_P_8021Q); >> hdr->vlan.h_vlan_TCI = bpf_htons(vlid); >> return 0; >> } >> >> /* Pops the outermost VLAN tag off the packet. Returns the popped VLAN ID on >> * success or negative errno on failure. >> */ >> static __always_inline int vlan_tag_pop(struct xdp_md *ctx) >> { >> >> char *data_end = (void *)(long)ctx->data_end; >> struct ethhdr *eth = (void *)(long)ctx->data; >> struct vlan_hdr *vlh; >> int vlid; >> >> /* Check that the produced Ethernet frame will be long enough */ >> if ((unsigned long)eth + ETH_ZLEN + sizeof(*vlh) > >> (unsigned long)data_end) >> return -1; >> >> /* Careful with the parenthesis here */ >> vlh = (void *)(eth + 1); >> >> if (!proto_is_vlan(eth->h_proto)) >> return -1; >> >> /* Save vlan ID for returning */ >> vlid = bpf_ntohs(vlh->h_vlan_TCI); >> >> /* Move the source and destination MAC addresses. >> * This moves the Ethernet header by 4 bytes, so >> * be sure to cast to char * before doing pointer >> * arithmetic. >> */ >> __builtin_memmove((char *)eth + sizeof(*vlh), eth, >> offsetof(struct ethhdr, h_proto)); >> >> /* Actually adjust the head pointer */ >> if (bpf_xdp_adjust_head(ctx, (int)sizeof(*vlh))) >> return -1; >> >> return vlid; >> } >> >> // The maps are actually not used by this program, but just included >> // to keep the reference-counted pin alive before any physical interfaces >> // are added. >> struct { >> __uint(type, BPF_MAP_TYPE_DEVMAP); >> __type(key, int); >> __type(value, int); >> __uint(max_entries, 1); >> __uint(pinning, LIBBPF_PIN_BY_NAME); >> } physical_iface SEC(".maps"); >> >> struct { >> __uint(type, BPF_MAP_TYPE_DEVMAP); >> __type(key, int); >> __type(value, int); >> __uint(max_entries, 1); >> __uint(pinning, LIBBPF_PIN_BY_NAME); >> } router_iface SEC(".maps"); >> >> SEC("xdp") >> int physical(struct xdp_md *ctx) >> { >> if (ctx->ingress_ifindex < 1 || ctx->ingress_ifindex > 4096) >> return XDP_DROP; >> >> if (vlan_tag_push(ctx, ctx->ingress_ifindex)) >> return XDP_DROP; >> >> return bpf_redirect_map(&physical_iface, 0, 0); >> } >> >> SEC("xdp") >> int router(struct xdp_md *ctx) >> { >> int vlid = vlan_tag_pop(ctx); >> >> /* Negative VLAN IDs indicate errors, and 0 is reserved and invalid. >> * Values above 4096 indicate that the priority field is used, which >> * isn't suppported out of simplicity. >> */ >> if (vlid < 1 || vlid > 4096) >> return XDP_DROP; >> >> return bpf_redirect(vlid, 0); >> } > > I'm not sure what you intend with this. If there is no significant flaws > in the upstream code, why should we write our own version? Valid point. I do think it is safe to set VLAN_MAX_DEPTH to 0, as Spectrum doesn't parse VLAN tags. -- Sincerely, Demi Marie Obenour (she/her/hers)