Skip to content

Add L2 bridge interface support#491

Open
rjarry wants to merge 16 commits intoDPDK:mainfrom
rjarry:bridge
Open

Add L2 bridge interface support#491
rjarry wants to merge 16 commits intoDPDK:mainfrom
rjarry:bridge

Conversation

@rjarry
Copy link
Collaborator

@rjarry rjarry commented Jan 29, 2026

Implement L2 bridging functionality with MAC learning, flooding, and per-bridge configuration options.

This series introduces a new bridge interface type that can attach port and bond interfaces as members. Packets received on member interfaces are forwarded based on learned MAC addresses, with unknown unicast, broadcast and multicast traffic flooded to all other members.

The first commits prepare the infrastructure by moving the control queue to avoid circular dependencies and allowing nexthop creation from datapath workers. This enables the bridge input node to learn MAC addresses on the fly without blocking the control plane.

The L2 nexthop type stores (bridge_id, vlan_id, mac) tuples with automatic ageing based on per-bridge configurable timeout. Static entries can be created via CLI to override learned ones.

Bridge configuration supports disabling flooding (NO_FLOOD) and MAC learning (NO_LEARN) flags, custom MAC address, and up to 64 member interfaces. The frr plugin exposes bridges to zebra as ZEBRA_IF_BRIDGE with proper slave type on members.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added L2 bridge interface type with member management, supporting PORT, VLAN, and BOND interfaces as members.
    • Introduced Forwarding Database (FDB) for bridge learning and MAC address management.
    • Added bridge configuration: MAC address, ageing time, and flood/learn behavior flags.
    • Added CLI commands for bridge and FDB management: create, configure, and query operations.
    • Enhanced interface lifecycle with improved VRF binding and cleanup mechanisms.
    • Improved IPv6 address initialization and lifecycle management on interface events.
  • Bug Fixes

    • Fixed null-pointer dereference in cross-connect processing.
    • Enhanced RCU synchronization for proper resource cleanup across dataplane operations.
  • Tests

    • Added comprehensive bridge and L3 routing test coverage.

@coderabbitai

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as outdated.

@rjarry rjarry force-pushed the bridge branch 2 times, most recently from abf123b to 2155207 Compare January 29, 2026 16:28
coderabbitai[bot]

This comment was marked as outdated.

@rjarry rjarry marked this pull request as draft February 1, 2026 20:47
@rjarry rjarry force-pushed the bridge branch 3 times, most recently from 9283839 to 21e11ee Compare February 5, 2026 11:01
@rjarry rjarry marked this pull request as ready for review February 6, 2026 09:34
coderabbitai[bot]

This comment was marked as outdated.

@rjarry rjarry marked this pull request as draft February 11, 2026 13:37
@rjarry rjarry force-pushed the bridge branch 3 times, most recently from 73085ff to 0cb64f6 Compare February 13, 2026 12:17
grep -q '10.' can match any 3 characters that start with '10' (e.g. 100,
10a, 10:, etc.). So it can match parts of IPv6 link local addresses.

Use full address and -F/--fixed-strings to avoid any special regexp
characters. We want verbatim match.

Fixes: 74228b7 ("cli: add address flush command")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
There is no "name" argument available when creating an interface. The
name is the first argument.

Fixes: 9d5152f ("smoke: add VRF configuration tests")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
When removing a port which is the xconnect peer of another one,
iface_from_id(iface->domain_id) will return NULL since the interface was
deleted.

Program terminated with signal SIGSEGV, Segmentation fault.
  xconnect_process at modules/infra/datapath/xconnect.c:36
                      if (peer->type == GR_IFACE_TYPE_PORT) {
  __rte_node_process at subprojects/dpdk/lib/graph/rte_graph_worker_common.h:216
  rte_graph_walk_rtc at subprojects/dpdk/lib/graph/rte_graph_model_rtc.h:42
  rte_graph_walk at subprojects/dpdk/lib/graph/rte_graph_worker.h:38
  gr_datapath_loop at modules/infra/datapath/main_loop.c:252

Check the return value and drop the packet in that case.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When an interface leaves VRF mode (e.g. reconfigured as cross-connect),
any IPv4 and IPv6 addresses previously configured on it become invalid.
Likewise, when an interface moves to a different VRF, its addresses
belong to the old VRF and need to be removed.

Subscribe to GR_EVENT_IFACE_POST_RECONFIG in both IPv4 and IPv6 address
modules. On reconfiguration, flush all addresses when the interface is
no longer in VRF mode or has moved to a different VRF. For IPv6, also
reinitialize link-local and well-known multicast addresses when entering
VRF mode or changing VRFs.

Extend the IPv6 add/del smoke test to exercise VRF reassignment and
cross-connect mode transitions.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When an interface is removed, GR_EVENT_IFACE_PRE_REMOVE is handled by
both nexthop_iface_cleanup() in nexthop.c and address-family-specific
handlers in ip/control/address.c and ip6/control/address.c (added in
6a1362c "ip,ip6: flush addresses on interface mode change").

The event handler execution order is not guaranteed. If
nexthop_iface_cleanup() runs first, it destroys local address nexthops
by decrementing their ref_count to zero. When the address-family handler
runs next, it accesses already-freed nexthops via nexthop_info_l3(),
leading to use-after-free.

Skip local address nexthops (NH_LOCAL_ADDR_FLAGS) in
nh_cleanup_interface_cb(), leaving their cleanup to addr4_delete() and
addr6_delete() which properly remove them from the per-interface address
vector and handle associated routes.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When a bond is destroyed, its member ports are detached but remain
without a VRF assignment. When a port is destroyed, its peer interfaces
(other ports whose domain_id points to this port) lose their domain
reference.

In both cases, reassign the orphaned ports to the default VRF and fire
GR_EVENT_IFACE_POST_RECONFIG so that address-family handlers can flush
stale addresses and reinitialize as needed.

Export vrf_default_get_or_create() so it can be used from bond and port
teardown paths.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
When rte_rcu_qsbr_dq_enqueue() fails in DQ mode, the deleted key slot
is never freed and becomes permanently leaked. Also, when
rte_hash_add_key_data() overwrites an existing key, the old data pointer
is silently lost. With RCU-protected readers still potentially accessing
the old data, there is no safe way to free it.

Add two patches from an upstream series [1]:

- Fall back to synchronous reclamation instead of only logging an error
  when the RCU defer queue enqueue fails on key deletion.
- When RCU is configured with a free_key_data_func callback,
  automatically defer-free the old data pointer on overwrite.

The third patch from that series (adding a new rte_hash_replace API) is
not needed since the free_key_data_func callback is sufficient.

[1] https://patches.dpdk.org/project/dpdk/list/?series=37352

Signed-off-by: Robin Jarry <rjarry@redhat.com>
@rjarry rjarry force-pushed the bridge branch 3 times, most recently from f94fef6 to 1fe044a Compare February 13, 2026 16:26
@rjarry rjarry marked this pull request as ready for review February 13, 2026 16:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main/control_queue.c (1)

50-52: ⚠️ Potential issue | 🟠 Major

drained accumulates ARRAY_DIM(items) instead of count — may exit the loop prematurely.

Line 52 adds the array capacity (ARRAY_DIM(items)) to drained rather than the actual number of items dequeued (count). If the ring has more than ARRAY_DIM(items) pending items but a burst returns fewer, drained inflates faster than actual progress, causing the loop to hit the CONTROL_QUEUE_SIZE cap before all items are processed.

Proposed fix
-	drained += ARRAY_DIM(items);
+	drained += count;
🤖 Fix all issues with AI agents
In `@modules/l2/control/bridge.c`:
- Around line 31-52: bridge_attach_member currently allows adding the same iface
multiple times; add a duplicate check similar to bond_attach_member by iterating
br->members[0..br->n_members-1] and if any entry equals the incoming member
return 0 (no-op) before appending. Use the same comparison semantics as
bond_attach_member, keep the check before the capacity check (br->n_members ==
ARRAY_DIM(br->members)), and do not change member state when returning early.

In `@modules/l2/control/fdb.c`:
- Around line 378-381: The ageing calculation uses CLOCKS_PER_SEC to convert the
microsecond value returned by gr_clock_us() which is misleading; update the
conversion in the age calculation (the code using fdb->last_seen, gr_clock_us(),
and age) to divide by an explicit microsecond-per-second constant (e.g. 1000000
or a US_PER_SEC constant) instead of CLOCKS_PER_SEC so the unit conversion is
explicit and robust; add a named constant US_PER_SEC and use (now -
fdb->last_seen) / US_PER_SEC when computing age.
- Around line 417-422: fdb_pool allocated in fdb_reconfig is never freed in
fdb_fini causing a mempool leak; update fdb_fini to check fdb_pool and call the
appropriate mempool free (e.g. rte_mempool_free or the project-specific free
function) and set fdb_pool to NULL, in addition to the existing cleanup of
ageing_timer and fdb_hash so both the hash and the mempool are released on
shutdown.

Comment on lines 31 to 52
static int bridge_attach_member(struct iface *bridge, struct iface *member) {
struct iface_info_bridge *br = iface_info_bridge(bridge);

switch (member->type) {
case GR_IFACE_TYPE_PORT:
case GR_IFACE_TYPE_VLAN:
case GR_IFACE_TYPE_BOND:
break;
default:
return errno_set(EMEDIUMTYPE);
}

if (br->n_members == ARRAY_DIM(br->members))
return errno_set(EUSERS);

br->members[br->n_members++] = member;
member->domain_id = bridge->id;
member->vrf_id = GR_VRF_ID_UNDEF;
member->mode = GR_IFACE_MODE_BRIDGE;

return 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the bond_attach_member function to see if it has a duplicate check
cat modules/infra/control/bond.c | sed -n '170,185p'

Repository: DPDK/grout

Length of output: 529


🏁 Script executed:

# Let's also find the caller of bridge_attach_member to understand the attach flow
rg -n 'bridge_attach_member' --type c

Repository: DPDK/grout

Length of output: 234


🏁 Script executed:

# Check how iface attach_domain is called to understand if there's higher-level protection
rg -n 'attach_domain|detach_domain' modules/infra/control/iface.c -A 5 -B 5

Repository: DPDK/grout

Length of output: 2246


🏁 Script executed:

# Look for any domain attachment safeguards in the iface module
rg -n 'struct iface' modules/infra/control/iface.h -A 3 | head -40

Repository: DPDK/grout

Length of output: 123


Missing duplicate member check in bridge_attach_member.

If the same interface is attached to a bridge multiple times, it will occupy multiple slots in br->members and cause inconsistencies on detach (only the first slot is removed). The analogous bond_attach_member in modules/infra/control/bond.c explicitly checks for and returns early on duplicates—bridge should do the same.

Proposed fix
 static int bridge_attach_member(struct iface *bridge, struct iface *member) {
 	struct iface_info_bridge *br = iface_info_bridge(bridge);
 
 	switch (member->type) {
 	case GR_IFACE_TYPE_PORT:
 	case GR_IFACE_TYPE_VLAN:
 	case GR_IFACE_TYPE_BOND:
 		break;
 	default:
 		return errno_set(EMEDIUMTYPE);
 	}
 
+	for (unsigned i = 0; i < br->n_members; i++) {
+		if (br->members[i] == member)
+			return 0; // already a member
+	}
+
 	if (br->n_members == ARRAY_DIM(br->members))
 		return errno_set(EUSERS);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static int bridge_attach_member(struct iface *bridge, struct iface *member) {
struct iface_info_bridge *br = iface_info_bridge(bridge);
switch (member->type) {
case GR_IFACE_TYPE_PORT:
case GR_IFACE_TYPE_VLAN:
case GR_IFACE_TYPE_BOND:
break;
default:
return errno_set(EMEDIUMTYPE);
}
if (br->n_members == ARRAY_DIM(br->members))
return errno_set(EUSERS);
br->members[br->n_members++] = member;
member->domain_id = bridge->id;
member->vrf_id = GR_VRF_ID_UNDEF;
member->mode = GR_IFACE_MODE_BRIDGE;
return 0;
}
static int bridge_attach_member(struct iface *bridge, struct iface *member) {
struct iface_info_bridge *br = iface_info_bridge(bridge);
switch (member->type) {
case GR_IFACE_TYPE_PORT:
case GR_IFACE_TYPE_VLAN:
case GR_IFACE_TYPE_BOND:
break;
default:
return errno_set(EMEDIUMTYPE);
}
for (unsigned i = 0; i < br->n_members; i++) {
if (br->members[i] == member)
return 0; // already a member
}
if (br->n_members == ARRAY_DIM(br->members))
return errno_set(EUSERS);
br->members[br->n_members++] = member;
member->domain_id = bridge->id;
member->vrf_id = GR_VRF_ID_UNDEF;
member->mode = GR_IFACE_MODE_BRIDGE;
return 0;
}
🤖 Prompt for AI Agents
In `@modules/l2/control/bridge.c` around lines 31 - 52, bridge_attach_member
currently allows adding the same iface multiple times; add a duplicate check
similar to bond_attach_member by iterating br->members[0..br->n_members-1] and
if any entry equals the incoming member return 0 (no-op) before appending. Use
the same comparison semantics as bond_attach_member, keep the check before the
capacity check (br->n_members == ARRAY_DIM(br->members)), and do not change
member state when returning early.

Comment on lines +378 to +381
continue;

age = (now - fdb->last_seen) / CLOCKS_PER_SEC;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Ageing arithmetic uses CLOCKS_PER_SEC to convert gr_clock_us() microseconds — semantically misleading.

gr_clock_us() returns microseconds. Dividing by CLOCKS_PER_SEC (which is 1,000,000 on POSIX) happens to produce seconds, but only by coincidence — CLOCKS_PER_SEC is defined for clock() ticks, not for microsecond values. If gr_clock_us() ever changes unit, or if this runs on a non-POSIX platform, this silently breaks.

Consider using 1000000 or a US_PER_SEC constant explicitly.

🤖 Prompt for AI Agents
In `@modules/l2/control/fdb.c` around lines 378 - 381, The ageing calculation uses
CLOCKS_PER_SEC to convert the microsecond value returned by gr_clock_us() which
is misleading; update the conversion in the age calculation (the code using
fdb->last_seen, gr_clock_us(), and age) to divide by an explicit
microsecond-per-second constant (e.g. 1000000 or a US_PER_SEC constant) instead
of CLOCKS_PER_SEC so the unit conversion is explicit and robust; add a named
constant US_PER_SEC and use (now - fdb->last_seen) / US_PER_SEC when computing
age.

Comment on lines 417 to 422
static void fdb_fini(struct event_base *) {
if (ageing_timer != NULL)
event_free(ageing_timer);

rte_hash_free(fdb_hash);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

fdb_pool is never freed in fdb_fini — resource leak.

fdb_reconfig allocates both fdb_hash and fdb_pool, but fdb_fini only frees the hash. The mempool leaks on shutdown.

Proposed fix
 static void fdb_fini(struct event_base *) {
 	if (ageing_timer != NULL)
 		event_free(ageing_timer);
 
 	rte_hash_free(fdb_hash);
+	rte_mempool_free(fdb_pool);
 }
🤖 Prompt for AI Agents
In `@modules/l2/control/fdb.c` around lines 417 - 422, fdb_pool allocated in
fdb_reconfig is never freed in fdb_fini causing a mempool leak; update fdb_fini
to check fdb_pool and call the appropriate mempool free (e.g. rte_mempool_free
or the project-specific free function) and set fdb_pool to NULL, in addition to
the existing cleanup of ageing_timer and fdb_hash so both the hash and the
mempool are released on shutdown.

When outputting on a VLAN interface, the local iface variable is
reassigned to the parent interface after VLAN tag insertion. The
subsequent UP status check and TX stats increment then use this
reassigned pointer, accounting them on the parent instead of the
original VLAN interface.

Use d->iface which still references the original VLAN interface
for the status check and stats increment.

Fixes: 7701685 ("port: add dedicated port_tx functions")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Bridge members that are not VLAN interfaces (trunk ports) need to
carry the VLAN ID through the output path so that the Ethernet
header can be built with the correct 802.1Q tag. iface_output
unconditionally clears d->vlan_id to zero for non-VLAN interfaces,
discarding the VLAN ID set during input processing.

Only set d->vlan_id when the output interface is actually a VLAN
type. Clear it instead at the points where it is no longer needed:
in eth_output after the Ethernet header has been built, and in the
control plane injection path where no VLAN context exists.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
A future change will require calling control_queue_push() from
gr_event_push() which lives in main/. If control_queue stays in the
infra module, this would create a circular dependency between main and
infra.

Move control_queue.c and gr_control_queue.h to main/ and replace the
event-based drain mechanism with explicit control_queue_drain() calls
from iface_destroy() and nexthop_destroy() after the RCU sync.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Pass rte_lcore_id() to rte_rcu_qsbr_synchronize() instead of
RTE_QSBR_THRID_INVALID to exclude the calling thread from the quiescent
state wait. This is needed to allow creating objects from datapath
workers.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Event notifications must be processed on the control plane thread.
Modify gr_event_push() to detect when it is called from a datapath
worker and use the control queue to defer the notification to the
control plane event loop.

This enables datapath nodes (such as bridge MAC learning) to create
MAC entries on the fly without blocking the control plane.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a new l2 module with a bridge interface type that allows
grouping multiple member interfaces (ports, VLANs, bonds) into a single
L2 broadcast domain.

The bridge maintains a list of members and supports configurable MAC
learning, BUM traffic flooding, per-bridge ageing timer and a custom
MAC address. Members are switched to GR_IFACE_MODE_BRIDGE when attached
and restored to the default VRF when the bridge is destroyed.

FDB management and datapath nodes for actual packet forwarding will
follow in subsequent commits.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Implement a forwarding database backed by an RCU-protected rte_hash
with a dedicated rte_mempool for entries. The hash is configured with
a free_key_data_func callback so that deleted entries are automatically
returned to the pool after RCU synchronization.

Entries can be added/deleted/flushed via the API and can also be
dynamically learned from the datapath via fdb_learn(). A periodic
ageing timer evicts learned entries that have not been refreshed
within the bridge ageing_time. Static entries configured by the user
are never aged out. FDB entries associated with a member or bridge are
automatically purged on detach or bridge destruction.

The FDB table size defaults to 4096 entries and can be changed at
runtime via the config set/get API, provided the table is empty.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add bridge_input and bridge_flood datapath nodes.

bridge_input receives packets from member interfaces via
GR_IFACE_MODE_BRIDGE. It learns source MAC addresses into the FDB
(unless GR_BRIDGE_F_NO_LEARN is set), then looks up the destination.

Known unicast destinations are forwarded to the learned output
interface. Unknown unicast, broadcast and multicast are sent to
bridge_flood. Hairpin packets (destination is the source interface)
are dropped. When the destination is the bridge interface itself,
packets are sent to eth_input for local processing.

bridge_flood replicates each packet to all bridge members except the
ingress interface, and to the bridge interface itself. The first
output reuses the original mbuf, subsequent ones are cloned.

When GR_BRIDGE_F_NO_FLOOD is set, the packet is dropped instead.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Create a bridge with three member ports and verify L2 forwarding
between namespaces, L3 reachability to the bridge interface address,
and overwriting a dynamic FDB entry with a static one.

Also check that detaching a member and deleting the bridge properly
clean up FDB entries.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant