
Add VXLAN/EVPN support with flood list management #504

Draft
rjarry wants to merge 29 commits into DPDK:main from rjarry:vxlan

Conversation


@rjarry rjarry commented Feb 14, 2026

Add VXLAN interface type with encapsulation/decapsulation datapath nodes. Each VXLAN interface maintains a per-VNI flood list of remote VTEPs used for BUM traffic ingress replication.

The flood list API is transport-agnostic, designed to accommodate future SRv6 EVPN support. VXLAN VTEP is the first registered flood type. A dispatch layer routes add/del/list operations to type-specific callbacks.

FRR integration is wired up for bridge interfaces, VXLAN interfaces, FDB entries and flood lists. This enables BGP EVPN type-2 (MAC/IP) and type-3 (IMET) route exchange with remote PEs.

Also fix interface running state not being set on creation. This prevented FRR from seeing logical interfaces as operationally up.

Summary by CodeRabbit

  • New Features

    • Added bridge interface type with member management, MAC learning, and flooding capabilities.
    • Added VXLAN tunnel interface support with VNI configuration and VTEP management.
    • Added Forwarding Database (FDB) management for MAC learning and aging.
    • Added CLI commands for bridge, VXLAN, FDB, and flood management.
  • Bug Fixes

    • Fixed synchronization timing in resource cleanup operations.
    • Improved control queue draining for proper resource deallocation.
  • Chores

    • Expanded build system for new L2 module infrastructure.

grep -q '10.' can match any three-character sequence starting with '10'
(e.g. 100, 10a, 10:, etc.), so it can also match parts of IPv6
link-local addresses.

Use the full address with -F/--fixed-strings so that no special regexp
characters are interpreted: we want a verbatim match.
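The difference can be demonstrated directly (the addresses here are
illustrative, not the ones from the actual test):

```shell
# '.' is a regex metacharacter, so '10.' matches '10' followed by any
# character, including the '10a' inside this IPv6 link-local address:
printf 'fe80::10ab:1/64\n' | grep -q '10.' && echo "false match"

# With -F and the full address, only a verbatim occurrence can match:
printf 'fe80::10ab:1/64\n' | grep -qF '10.0.0.1/24' || echo "no match"
```

The first command prints "false match", the second prints "no match".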

Fixes: 74228b7 ("cli: add address flush command")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
There is no "name" argument available when creating an interface. The
name is the first argument.

Fixes: 9d5152f ("smoke: add VRF configuration tests")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
When removing a port which is the xconnect peer of another one,
iface_from_id(iface->domain_id) will return NULL since the interface was
deleted.

Program terminated with signal SIGSEGV, Segmentation fault.
  xconnect_process at modules/infra/datapath/xconnect.c:36
                      if (peer->type == GR_IFACE_TYPE_PORT) {
  __rte_node_process at subprojects/dpdk/lib/graph/rte_graph_worker_common.h:216
  rte_graph_walk_rtc at subprojects/dpdk/lib/graph/rte_graph_model_rtc.h:42
  rte_graph_walk at subprojects/dpdk/lib/graph/rte_graph_worker.h:38
  gr_datapath_loop at modules/infra/datapath/main_loop.c:252

Check the return value and drop the packet in that case.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When an interface leaves VRF mode (e.g. reconfigured as cross-connect),
any IPv4 and IPv6 addresses previously configured on it become invalid.
Likewise, when an interface moves to a different VRF, its addresses
belong to the old VRF and need to be removed.

Subscribe to GR_EVENT_IFACE_POST_RECONFIG in both IPv4 and IPv6 address
modules. On reconfiguration, flush all addresses when the interface is
no longer in VRF mode or has moved to a different VRF. For IPv6, also
reinitialize link-local and well-known multicast addresses when entering
VRF mode or changing VRFs.

Extend the IPv6 add/del smoke test to exercise VRF reassignment and
cross-connect mode transitions.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When an interface is removed, GR_EVENT_IFACE_PRE_REMOVE is handled by
both nexthop_iface_cleanup() in nexthop.c and address-family-specific
handlers in ip/control/address.c and ip6/control/address.c (added in
6a1362c "ip,ip6: flush addresses on interface mode change").

The event handler execution order is not guaranteed. If
nexthop_iface_cleanup() runs first, it destroys local address nexthops
by decrementing their ref_count to zero. When the address-family handler
runs next, it accesses already-freed nexthops via nexthop_info_l3(),
leading to use-after-free.

Skip local address nexthops (NH_LOCAL_ADDR_FLAGS) in
nh_cleanup_interface_cb(), leaving their cleanup to addr4_delete() and
addr6_delete() which properly remove them from the per-interface address
vector and handle associated routes.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When a bond is destroyed, its member ports are detached but remain
without a VRF assignment. When a port is destroyed, its peer interfaces
(other ports whose domain_id points to this port) lose their domain
reference.

In both cases, reassign the orphaned ports to the default VRF and fire
GR_EVENT_IFACE_POST_RECONFIG so that address-family handlers can flush
stale addresses and reinitialize as needed.

Export vrf_default_get_or_create() so it can be used from bond and port
teardown paths.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When rte_rcu_qsbr_dq_enqueue() fails in DQ mode, the deleted key slot
is never freed and becomes permanently leaked. Also, when
rte_hash_add_key_data() overwrites an existing key, the old data pointer
is silently lost. With RCU-protected readers still potentially accessing
the old data, there is no safe way to free it.

Add two patches from an upstream series [1]:

- Fall back to synchronous reclamation instead of only logging an error
  when the RCU defer queue enqueue fails on key deletion.
- When RCU is configured with a free_key_data_func callback,
  automatically defer-free the old data pointer on overwrite.

The third patch from that series (adding a new rte_hash_replace API) is
not needed since the free_key_data_func callback is sufficient.

[1] https://patches.dpdk.org/project/dpdk/list/?series=37352

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When outputting on a VLAN interface, the local iface variable is
reassigned to the parent interface after VLAN tag insertion. The
subsequent UP status check and TX stats increment then use this
reassigned pointer, accounting them on the parent instead of the
original VLAN interface.

Use d->iface which still references the original VLAN interface
for the status check and stats increment.

Fixes: 7701685 ("port: add dedicated port_tx functions")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Bridge members that are not VLAN interfaces (trunk ports) need to
carry the VLAN ID through the output path so that the Ethernet
header can be built with the correct 802.1Q tag. iface_output
unconditionally clears d->vlan_id to zero for non-VLAN interfaces,
discarding the VLAN ID set during input processing.

Only set d->vlan_id when the output interface is actually a VLAN
type. Clear it instead at the points where it is no longer needed:
in eth_output after the Ethernet header has been built, and in the
control plane injection path where no VLAN context exists.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
A future change will require calling control_queue_push() from
gr_event_push() which lives in main/. If control_queue stays in the
infra module, this would create a circular dependency between main and
infra.

Move control_queue.c and gr_control_queue.h to main/ and replace the
event-based drain mechanism with explicit control_queue_drain() calls
from iface_destroy() and nexthop_destroy() after the RCU sync.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Pass rte_lcore_id() to rte_rcu_qsbr_synchronize() instead of
RTE_QSBR_THRID_INVALID to exclude the calling thread from the quiescent
state wait. This is needed to allow creating objects from datapath
workers.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Event notifications must be processed on the control plane thread.
Modify gr_event_push() to detect when it is called from a datapath
worker and use the control queue to defer the notification to the
control plane event loop.

This enables datapath nodes (such as bridge MAC learning) to create
MAC entries on the fly without blocking the control plane.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a new l2 module with a bridge interface type that allows
grouping multiple member interfaces (ports, VLANs, bonds) into a single
L2 broadcast domain.

The bridge maintains a list of members and supports configurable MAC
learning, BUM traffic flooding, per-bridge ageing timer and a custom
MAC address. Members are switched to GR_IFACE_MODE_BRIDGE when attached
and restored to the default VRF when the bridge is destroyed.

FDB management and datapath nodes for actual packet forwarding will
follow in subsequent commits.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Implement a forwarding database backed by an RCU-protected rte_hash
with a dedicated rte_mempool for entries. The hash is configured with
a free_key_data_func callback so that deleted entries are automatically
returned to the pool after RCU synchronization.

Entries can be added/deleted/flushed via the API and can also be
dynamically learned from the datapath via fdb_learn(). A periodic
ageing timer evicts learned entries that have not been refreshed
within the bridge ageing_time. Static entries configured by the user
are never aged out. FDB entries associated with a member or bridge are
automatically purged on detach or bridge destruction.

The FDB table size defaults to 4096 entries and can be changed at
runtime via the config set/get API, provided the table is empty.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add bridge_input and bridge_flood datapath nodes.

bridge_input receives packets from member interfaces via
GR_IFACE_MODE_BRIDGE. It learns source MAC addresses into the FDB
(unless GR_BRIDGE_F_NO_LEARN is set), then looks up the destination.

Known unicast destinations are forwarded to the learned output
interface. Unknown unicast, broadcast and multicast are sent to
bridge_flood. Hairpin packets (destination is the source interface)
are dropped. When the destination is the bridge interface itself,
packets are sent to eth_input for local processing.

bridge_flood replicates each packet to all bridge members except the
ingress interface, and to the bridge interface itself. The first
output reuses the original mbuf, subsequent ones are cloned.

When GR_BRIDGE_F_NO_FLOOD is set, the packet is dropped instead.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Create a bridge with three member ports and verify L2 forwarding
between namespaces, L3 reachability to the bridge interface address,
and overwriting a dynamic FDB entry with a static one.

Also check that detaching a member and deleting the bridge properly
clean up FDB entries.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
iface_create copies the requested configuration into the iface
struct early via iface->base = conf->base. When the interface is
created with GR_IFACE_F_UP, the flag is already set by the time
iface_set_up_down runs. The down-to-up transition condition
(!(flags & UP) && up) evaluates to false, so GR_IFACE_S_RUNNING
is never set and GR_EVENT_IFACE_STATUS_UP is never pushed.

This only affects logical interfaces (bridges, VXLAN, VLANs).
Physical ports are not affected because their set_up_down callback
manages the running state independently via the DPDK link status
event.

This prevents FRR from seeing logical interfaces as operationally
up (IFF_RUNNING), which in turn prevents EVPN from advertising
IMET routes for VXLAN interfaces.

Clear the UP flag before calling iface_set_up_down so the
transition fires normally.

Fixes: 9a61e92 ("iface: send status events on admin state changes")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
@rjarry rjarry marked this pull request as draft February 14, 2026 00:06

coderabbitai bot commented Feb 14, 2026

Walkthrough

This pull request introduces comprehensive Layer 2 (L2) support with new bridge and VXLAN interface types, Forwarding Database (FDB) management, and VXLAN flood (VTEP) capabilities. The control plane adds bridge member management, FDB learning and aging, and flood entry tracking. The datapath implements bridge and VXLAN packet processing nodes with learning, flooding, and tunnel encapsulation/decapsulation. FRR integration extends MAC and VTEP support through dplane operations. The event system is refactored to defer notifications via a control queue. New CLI modules enable bridge, VXLAN, FDB, and flood management. Infrastructure is extended with new interface types, VRF handling improvements, and QSBR synchronization updates. Integration tests validate bridge connectivity, VXLAN tunneling, and EVPN/VXLAN interoperability.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@modules/infra/control/group_nexthop.c`:
- Line 152: The call to rte_rcu_qsbr_synchronize(gr_datapath_rcu(),
rte_lcore_id()) is using rte_lcore_id() from a control thread that is not
registered as a QSBR reader; replace the second argument with
RTE_QSBR_THRID_INVALID so the call becomes
rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID) whenever
invoked from control-plane threads (same change for any other control-plane
calls that pass rte_lcore_id()); ensure only datapath reader threads keep using
their registered thread IDs (registration happens via
rte_rcu_qsbr_thread_register in the datapath main loop).

In `@modules/l2/cli/vxlan.c`:
- Around line 73-77: arg_vrf currently returns 0 when the user omits the
ENCAP_VRF argument, but the code treats 0 as success and unconditionally sets
GR_VXLAN_SET_ENCAP_VRF, causing encap_vrf to be overwritten; fix by storing the
arg_vrf return value (e.g. int ret = arg_vrf(c, p, "ENCAP_VRF",
&vxlan->encap_vrf_id)), returning on ret < 0, and only setting set_attrs |=
GR_VXLAN_SET_ENCAP_VRF when ret > 0 (meaning the user actually supplied
ENCAP_VRF), leaving vxlan->encap_vrf_id untouched when the argument is absent.

In `@modules/l2/control/bridge.c`:
- Around line 60-77: bridge_detach_member currently resets member->mode to
GR_IFACE_MODE_VRF but leaves member->vrf_id as GR_VRF_ID_UNDEF; update
bridge_detach_member to restore the member's VRF by calling
vrf_default_get_or_create() and assigning the returned vrf id to member->vrf_id
and incrementing its refcount via vrf_incref (mirroring bridge_fini behavior),
then set member->mode = GR_IFACE_MODE_VRF so the detached iface has a valid VRF.

In `@modules/l2/control/vxlan.c`:
- Around line 281-287: The vtep_flood_del function mutates the shared
flood_vteps array in-place (swap-and-decrement) without RCU protection, causing
a data-race with datapath readers; change vtep_flood_del to follow the
copy-on-write + RCU pattern used by vtep_flood_add: allocate a new flood_vteps
buffer, copy entries from the old array excluding entry->vtep.addr (preserving
order if add does), set the new pointer and updated n_flood_vteps atomically
(using the same RCU/atomic swap helper used by vtep_flood_add), schedule the old
buffer to be freed after the RCU grace period, and keep the
gr_event_push(GR_EVENT_FLOOD_DEL, entry) call; reference vtep_flood_del,
vtep_flood_add, flood_vteps, n_flood_vteps, and gr_event_push when making the
change.
- Around line 50-83: The delete uses cur->encap_vrf_id after it was overwritten,
so rte_hash_del_key is built with the new encap_vrf_id instead of the old one;
fix by capturing the old encap_vrf_id (and old vni if needed) before mutating
cur (e.g., read old_vrf = cur->encap_vrf_id and build cur_key from old_vrf and
cur->vni) or postpone assigning cur->encap_vrf_id until after the hash
delete/add sequence; update the code around cur->encap_vrf_id, cur_key,
rte_hash_del_key, next_key and rte_hash_add_key_data accordingly so the deletion
targets the original {old_vni, old_vrf}.

In `@modules/l2/datapath/vxlan_output.c`:
- Around line 75-79: vxlan_output currently assigns ip_output_mbuf_data(m)->nh =
fib4_lookup(...) without checking for NULL and always sends packets to
IP_OUTPUT; change vxlan_output to check the result of fib4_lookup (the value
stored in ip_output_mbuf_data(m)->nh) and if it is NULL enqueue the packet to
the BAD_NEXTHOP edge (the declared but unused BAD_NEXTHOP path) instead of
forwarding to IP_OUTPUT, otherwise continue to set edge = IP_OUTPUT and enqueue
as before; update the enqueue logic around rte_node_enqueue_x1(graph, node,
edge, m) so the chosen edge reflects this NULL-check.
🧹 Nitpick comments (3)
frr/if_grout.c (1)

369-378: Variable add shadows outer bool add on line 356.

struct gr_fdb_add_req *add (line 370) shadows the bool add declared at line 356. This works correctly due to block scoping, but it's a latent maintenance trap — a future refactor could easily reference the wrong add.

Proposed fix — rename inner variable
 	if (add) {
-		struct gr_fdb_add_req *add = req;
-		add->exist_ok = true;
-		add->fdb.iface_id = ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx));
-		add->fdb.bridge_id = ifindex_frr_to_grout(dplane_ctx_mac_get_br_ifindex(ctx));
-		add->fdb.vlan_id = dplane_ctx_mac_get_vlan(ctx);
-		add->fdb.flags = dplane_ctx_mac_get_dp_static(ctx) ? GR_FDB_F_STATIC : 0;
-		memcpy(&add->fdb.mac, dplane_ctx_mac_get_addr(ctx), sizeof(add->fdb.mac));
-		add->fdb.vtep = dplane_ctx_mac_get_vtep_ip(ctx)->s_addr;
+		struct gr_fdb_add_req *add_req = req;
+		add_req->exist_ok = true;
+		add_req->fdb.iface_id = ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx));
+		add_req->fdb.bridge_id = ifindex_frr_to_grout(dplane_ctx_mac_get_br_ifindex(ctx));
+		add_req->fdb.vlan_id = dplane_ctx_mac_get_vlan(ctx);
+		add_req->fdb.flags = dplane_ctx_mac_get_dp_static(ctx) ? GR_FDB_F_STATIC : 0;
+		memcpy(&add_req->fdb.mac, dplane_ctx_mac_get_addr(ctx), sizeof(add_req->fdb.mac));
+		add_req->fdb.vtep = dplane_ctx_mac_get_vtep_ip(ctx)->s_addr;
 		req_type = GR_FDB_ADD;
modules/l2/api/gr_l2.h (1)

44-49: Bit 36 skipped in VXLAN reconfiguration flags.

GR_VXLAN_SET_LOCAL is bit 35, GR_VXLAN_SET_MAC jumps to bit 37. Bit 36 is unused. If intentional (reserved for a future attribute), no problem. If a typo, it won't cause a bug now but could cause confusion later.

modules/l2/control/fdb.c (1)

329-346: Redundant fdb_max_entries assignment.

Line 342 sets fdb_max_entries = req->max_entries, but fdb_reconfig (line 79) already does the same assignment. Harmless, but the duplicate write could be removed.

Comment on lines +50 to +83
if (set_attrs & GR_VXLAN_SET_ENCAP_VRF) {
uint16_t vrf = next->encap_vrf_id;
uint16_t old = cur->encap_vrf_id;

if (vrf == GR_VRF_ID_UNDEF)
vrf = vrf_default_get_or_create();

if (vrf != old && vrf_incref(vrf) < 0)
return -errno;

if (old != GR_VRF_ID_UNDEF)
vrf_decref(old);

cur->encap_vrf_id = vrf;
}

if (set_attrs & (GR_VXLAN_SET_VNI | GR_VXLAN_SET_ENCAP_VRF)) {
const struct vxlan_key next_key = {rte_cpu_to_be_32(next->vni), next->encap_vrf_id};
const struct vxlan_key cur_key = {rte_cpu_to_be_32(cur->vni), cur->encap_vrf_id};

if (rte_hash_lookup(vxlan_hash, &next_key) >= 0)
return errno_set(EADDRINUSE);

if (next->vni == 0 || next->vni > 0xffffff)
return errno_set(ERANGE);

rte_hash_del_key(vxlan_hash, &cur_key);

ret = rte_hash_add_key_data(vxlan_hash, &next_key, iface);
if (ret < 0)
return errno_log(-ret, "rte_hash_add_key_data");

cur->vni = next->vni;
}

⚠️ Potential issue | 🔴 Critical

Bug: cur_key for hash deletion uses the already-updated encap_vrf_id.

When GR_VXLAN_SET_ENCAP_VRF is set, line 63 updates cur->encap_vrf_id to the new value before the hash key update block at line 66. Consequently, cur_key on line 68 is built with the new encap_vrf_id, not the old one. The rte_hash_del_key on line 76 attempts to delete a key that doesn't exist yet, leaving the old hash entry ({old_vni, old_vrf}) as a stale orphan.

Proposed fix: capture old key before mutating cur
+	// Save old key components before ENCAP_VRF update may overwrite them.
+	const struct vxlan_key old_key = {rte_cpu_to_be_32(cur->vni), cur->encap_vrf_id};
+
 	if (set_attrs & GR_VXLAN_SET_ENCAP_VRF) {
 		uint16_t vrf = next->encap_vrf_id;
 		uint16_t old = cur->encap_vrf_id;
 
 		if (vrf == GR_VRF_ID_UNDEF)
 			vrf = vrf_default_get_or_create();
 
 		if (vrf != old && vrf_incref(vrf) < 0)
 			return -errno;
 
 		if (old != GR_VRF_ID_UNDEF)
 			vrf_decref(old);
 
 		cur->encap_vrf_id = vrf;
 	}
 
 	if (set_attrs & (GR_VXLAN_SET_VNI | GR_VXLAN_SET_ENCAP_VRF)) {
 		const struct vxlan_key next_key = {rte_cpu_to_be_32(next->vni), next->encap_vrf_id};
-		const struct vxlan_key cur_key = {rte_cpu_to_be_32(cur->vni), cur->encap_vrf_id};
 
 		if (rte_hash_lookup(vxlan_hash, &next_key) >= 0)
 			return errno_set(EADDRINUSE);
 
 		if (next->vni == 0 || next->vni > 0xffffff)
 			return errno_set(ERANGE);
 
-		rte_hash_del_key(vxlan_hash, &cur_key);
+		rte_hash_del_key(vxlan_hash, &old_key);
 
 		ret = rte_hash_add_key_data(vxlan_hash, &next_key, iface);
🤖 Prompt for AI Agents
In `@modules/l2/control/vxlan.c` around lines 50 - 83, The delete uses
cur->encap_vrf_id after it was overwritten, so rte_hash_del_key is built with
the new encap_vrf_id instead of the old one; fix by capturing the old
encap_vrf_id (and old vni if needed) before mutating cur (e.g., read old_vrf =
cur->encap_vrf_id and build cur_key from old_vrf and cur->vni) or postpone
assigning cur->encap_vrf_id until after the hash delete/add sequence; update the
code around cur->encap_vrf_id, cur_key, rte_hash_del_key, next_key and
rte_hash_add_key_data accordingly so the deletion targets the original {old_vni,
old_vrf}.

Comment on lines +75 to +79
ip_output_mbuf_data(m)->nh = fib4_lookup(vxlan->encap_vrf_id, d->vtep);

edge = IP_OUTPUT;
next:
rte_node_enqueue_x1(graph, node, edge, m);

⚠️ Potential issue | 🟠 Major


Missing NULL check for fib4_lookup result — error handling deferred to downstream node.

fib4_lookup can return NULL when no route is found, but vxlan_output doesn't check this locally. Instead, the packet proceeds to ip_output, which detects the NULL and routes it via NO_ROUTE to an error handler. While this works, it's inefficient and inconsistent with other modules (dnat44, icmp_output, ip_input). The BAD_NEXTHOP edge is declared but never used.

Proposed fix
 		ip_output_mbuf_data(m)->nh = fib4_lookup(vxlan->encap_vrf_id, d->vtep);
 
-		edge = IP_OUTPUT;
+		if (unlikely(ip_output_mbuf_data(m)->nh == NULL)) {
+			edge = BAD_NEXTHOP;
+		} else {
+			edge = IP_OUTPUT;
+		}
🤖 Prompt for AI Agents
In `@modules/l2/datapath/vxlan_output.c` around lines 75 - 79, vxlan_output
currently assigns ip_output_mbuf_data(m)->nh = fib4_lookup(...) without checking
for NULL and always sends packets to IP_OUTPUT; change vxlan_output to check the
result of fib4_lookup (the value stored in ip_output_mbuf_data(m)->nh) and if it
is NULL enqueue the packet to the BAD_NEXTHOP edge (the declared but unused
BAD_NEXTHOP path) instead of forwarding to IP_OUTPUT, otherwise continue to set
edge = IP_OUTPUT and enqueue as before; update the enqueue logic around
rte_node_enqueue_x1(graph, node, edge, m) so the chosen edge reflects this
NULL-check.

Introduce the VXLAN interface type for the L2 module. A VXLAN
interface carries a VNI (VXLAN Network Identifier), a local VTEP
address used as the outer IP source, an encapsulation VRF for
underlay routing, and a configurable UDP destination port (default
4789).

VXLAN interfaces are keyed by (VNI, encap_vrf_id) in a lockfree
RCU-protected hash table so that the datapath can resolve incoming
tunneled packets to the correct interface without locks.

VXLAN interfaces are intended to be attached to a bridge domain.
All L2 traffic entering the bridge is forwarded transparently over
the VXLAN tunnel. The local VTEP address must already be configured
in the encapsulation VRF.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
VXLAN uses UDP port 4789 by default but allows configuring a custom
destination port per interface. Allow the control plane to register
additional UDP ports at runtime as aliases for an already registered
port, reusing the same datapath edge.

Use reference counting so that multiple interfaces sharing the same
non-default port do not interfere with each other during teardown.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Wire up the VXLAN interface's configurable destination port to the
L4 input node. When a non-default port is configured, register it
as an alias for the standard VXLAN port (4789) so that the datapath
delivers matching UDP packets to the vxlan_input node.

Unregister the alias when the port changes or the interface is
destroyed.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a transport-agnostic flood list framework for BUM traffic
(Broadcast, Unknown unicast, Multicast). In EVPN, each PE maintains
a flooding list built from IMET routes (RFC 8365, RFC 9572). The
entries in this list differ depending on the overlay encapsulation:
VXLAN uses a remote VTEP IPv4 address and a VNI, while SRv6 would
use a 128-bit SID.

The API defines a gr_flood_entry structure with a type discriminant
and a union, allowing future encapsulation types (e.g. SRv6 SIDs)
to be added without changing the API request types. A dispatch
layer in control/flood.c routes add/del/list operations to
type-specific callbacks registered at init time.

Implement the VXLAN VTEP flood type (GR_FLOOD_T_VTEP). Each VXLAN
interface maintains a per-VNI array of remote VTEP addresses used
by the vxlan_flood datapath node for ingress replication. The array
is replaced atomically with an RCU synchronization barrier so that
the datapath never sees a partially updated list.

CLI commands are exposed under "flood vtep add/del/show".

Signed-off-by: Robin Jarry <rjarry@redhat.com>
In a VXLAN overlay, the bridge needs to know which remote VTEP to
use when sending unicast frames to a learned MAC address. Add a
VTEP IPv4 address field to FDB entries so that known unicast
traffic can be sent directly to the correct tunnel endpoint instead
of being flooded to all VTEPs.

When bridge_input learns a MAC address from a VXLAN member
interface, it records the source VTEP from the decapsulated
packet's outer IP header. When forwarding to a known destination,
the stored VTEP address is passed to the output path via the mbuf
private data so that vxlan_output can build the correct outer
header.

Only set the VTEP field when the source interface is actually a
VXLAN type to avoid storing uninitialized data from other packet
paths (control plane, local bridge traffic).

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add three datapath nodes for VXLAN packet processing.

vxlan_input decapsulates incoming UDP/4789 packets. It strips the
outer UDP and VXLAN headers, resolves the inner VNI to a VXLAN
interface via the RCU-protected hash table, records the source VTEP
from the outer IP header into the mbuf private data, and forwards
the inner Ethernet frame to iface_input for bridge processing.

vxlan_output encapsulates outgoing frames for a known destination
VTEP. It prepends a pre-built IP/UDP/VXLAN header template
initialized by the control plane, fills in the per-packet fields
(destination VTEP, UDP length, IP length, checksum), and hashes the
inner flow to select an ephemeral source port for underlay ECMP
(RFC 7348 Section 5). The FIB lookup for the outer IP uses the
encapsulation VRF, not the bridge domain.

vxlan_flood handles BUM traffic by replicating the frame to every
VTEP in the flood list via ingress replication. The original mbuf
is sent to the first VTEP and clones are created for the rest.

The bridge_flood node is updated to steer VXLAN member traffic
through vxlan_flood instead of direct iface_output.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Set up a VXLAN overlay between grout and a Linux netns peer. Grout
runs a bridge with a VXLAN member (VNI 100) and the Linux side
mirrors the topology with a kernel VXLAN device enslaved to a Linux
bridge. Both sides have flood lists configured with each other's
VTEP address for BUM traffic replication.

The test verifies L3 connectivity over the tunnel by having the
Linux side ping the bridge address. This exercises the full path:
ARP resolution over VXLAN, FDB learning from decapsulated traffic,
and ICMP echo reply via the VXLAN output encapsulation.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report bridge interfaces to FRR as ZEBRA_IF_BRIDGE with their MAC
address. Tag members with ZEBRA_IF_SLAVE_BRIDGE and propagate the
bridge ifindex so that FRR can associate them with the correct master.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report VXLAN interfaces to FRR's zebra as ZEBRA_IF_VXLAN with the
associated L2 VNI information. This allows FRR's EVPN control
plane to discover which VNIs are locally configured and advertise
them via BGP IMET routes to remote PEs.

The VXLAN L2 info includes the VNI, the local VTEP address, and
the underlay interface index so that zebra can correlate the tunnel
with the correct underlay routing context.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Synchronize bridge FDB entries bidirectionally between grout and
FRR. This is required for EVPN to advertise locally learned MAC
addresses via BGP type-2 routes and to install remotely learned
MACs into the bridge forwarding table.

Subscribe to FDB add/del/update events from grout and translate
them to dplane MAC install/delete operations for zebra. In the
reverse direction, handle DPLANE_OP_MAC_INSTALL/DELETE from FRR
and convert them to GR_FDB_ADD/DEL API calls. The VTEP address
is propagated in both directions so that remote MACs are associated
with the correct tunnel endpoint.

Self-event suppression is enabled on the FDB subscriptions to
prevent feedback loops when FRR installs a MAC that was originally
learned by grout.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Handle DPLANE_OP_VTEP_ADD and DPLANE_OP_VTEP_DELETE operations
from FRR's EVPN control plane. When BGP learns a remote VTEP via
an IMET route (EVPN Route Type 3), zebra pushes the VTEP to the
dataplane provider which translates it to a GR_FLOOD_ADD/DEL
request with GR_FLOOD_T_VTEP type.

This allows BGP EVPN to dynamically manage the per-VNI flood lists
used for BUM traffic ingress replication, replacing the need for
static flood list configuration via the CLI.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Set up a full EVPN/VXLAN topology between FRR+grout and a
standalone FRR+Linux peer. Each side runs a bridge with a VXLAN
member (VNI 100) and a host namespace. Both peers run iBGP with
the l2vpn evpn address-family and advertise-all-vni.

The test verifies that EVPN type-3 (IMET) routes are exchanged so
that both sides install each other's VTEP in their flood lists.
It then verifies end-to-end L2 connectivity by pinging between the
two host namespaces through the VXLAN overlay, which exercises
type-2 (MAC/IP) route advertisement and FDB synchronization.

Signed-off-by: Robin Jarry <rjarry@redhat.com>