Conversation
This comment was marked as outdated.
This comment was marked as outdated.
abf123b to
2155207
Compare
9283839 to
21e11ee
Compare
73085ff to
0cb64f6
Compare
grep -q '10.' can match any 3 characters that start with '10' (e.g. 100, 10a, 10:, etc.). So it can match parts of IPv6 link local addresses. Use full address and -F/--fixed-strings to avoid any special regexp characters. We want verbatim match. Fixes: 74228b7 ("cli: add address flush command") Signed-off-by: Robin Jarry <rjarry@redhat.com>
There is no "name" argument available when creating an interface. The name is the first argument. Fixes: 9d5152f ("smoke: add VRF configuration tests") Signed-off-by: Robin Jarry <rjarry@redhat.com>
When removing a port which is the xconnect peer of another one,
iface_from_id(iface->domain_id) will return NULL since the interface was
deleted.
Program terminated with signal SIGSEGV, Segmentation fault.
xconnect_process at modules/infra/datapath/xconnect.c:36
if (peer->type == GR_IFACE_TYPE_PORT) {
__rte_node_process at subprojects/dpdk/lib/graph/rte_graph_worker_common.h:216
rte_graph_walk_rtc at subprojects/dpdk/lib/graph/rte_graph_model_rtc.h:42
rte_graph_walk at subprojects/dpdk/lib/graph/rte_graph_worker.h:38
gr_datapath_loop at modules/infra/datapath/main_loop.c:252
Check the return value and drop the packet in that case.
Signed-off-by: Robin Jarry <rjarry@redhat.com>
When an interface leaves VRF mode (e.g. reconfigured as cross-connect), any IPv4 and IPv6 addresses previously configured on it become invalid. Likewise, when an interface moves to a different VRF, its addresses belong to the old VRF and need to be removed. Subscribe to GR_EVENT_IFACE_POST_RECONFIG in both IPv4 and IPv6 address modules. On reconfiguration, flush all addresses when the interface is no longer in VRF mode or has moved to a different VRF. For IPv6, also reinitialize link-local and well-known multicast addresses when entering VRF mode or changing VRFs. Extend the IPv6 add/del smoke test to exercise VRF reassignment and cross-connect mode transitions. Signed-off-by: Robin Jarry <rjarry@redhat.com>
When an interface is removed, GR_EVENT_IFACE_PRE_REMOVE is handled by both nexthop_iface_cleanup() in nexthop.c and address-family-specific handlers in ip/control/address.c and ip6/control/address.c (added in 6a1362c "ip,ip6: flush addresses on interface mode change"). The event handler execution order is not guaranteed. If nexthop_iface_cleanup() runs first, it destroys local address nexthops by decrementing their ref_count to zero. When the address-family handler runs next, it accesses already-freed nexthops via nexthop_info_l3(), leading to use-after-free. Skip local address nexthops (NH_LOCAL_ADDR_FLAGS) in nh_cleanup_interface_cb(), leaving their cleanup to addr4_delete() and addr6_delete() which properly remove them from the per-interface address vector and handle associated routes. Signed-off-by: Robin Jarry <rjarry@redhat.com>
When a bond is destroyed, its member ports are detached but remain without a VRF assignment. When a port is destroyed, its peer interfaces (other ports whose domain_id points to this port) lose their domain reference. In both cases, reassign the orphaned ports to the default VRF and fire GR_EVENT_IFACE_POST_RECONFIG so that address-family handlers can flush stale addresses and reinitialize as needed. Export vrf_default_get_or_create() so it can be used from bond and port teardown paths. Signed-off-by: Robin Jarry <rjarry@redhat.com>
When rte_rcu_qsbr_dq_enqueue() fails in DQ mode, the deleted key slot is never freed and becomes permanently leaked. Also, when rte_hash_add_key_data() overwrites an existing key, the old data pointer is silently lost. With RCU-protected readers still potentially accessing the old data, there is no safe way to free it. Add two patches from an upstream series [1]: - Fall back to synchronous reclamation instead of only logging an error when the RCU defer queue enqueue fails on key deletion. - When RCU is configured with a free_key_data_func callback, automatically defer-free the old data pointer on overwrite. The third patch from that series (adding a new rte_hash_replace API) is not needed since the free_key_data_func callback is sufficient. [1] https://patches.dpdk.org/project/dpdk/list/?series=37352 Signed-off-by: Robin Jarry <rjarry@redhat.com>
f94fef6 to
1fe044a
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
main/control_queue.c (1)
50-52:⚠️ Potential issue | 🟠 Major
drainedaccumulatesARRAY_DIM(items)instead ofcount— may exit the loop prematurely.Line 52 adds the array capacity (
ARRAY_DIM(items)) todrainedrather than the actual number of items dequeued (count). If the ring has more thanARRAY_DIM(items)pending items but a burst returns fewer,drainedinflates faster than actual progress, causing the loop to hit theCONTROL_QUEUE_SIZEcap before all items are processed.Proposed fix
- drained += ARRAY_DIM(items); + drained += count;
🤖 Fix all issues with AI agents
In `@modules/l2/control/bridge.c`:
- Around line 31-52: bridge_attach_member currently allows adding the same iface
multiple times; add a duplicate check similar to bond_attach_member by iterating
br->members[0..br->n_members-1] and if any entry equals the incoming member
return 0 (no-op) before appending. Use the same comparison semantics as
bond_attach_member, keep the check before the capacity check (br->n_members ==
ARRAY_DIM(br->members)), and do not change member state when returning early.
In `@modules/l2/control/fdb.c`:
- Around line 378-381: The ageing calculation uses CLOCKS_PER_SEC to convert the
microsecond value returned by gr_clock_us() which is misleading; update the
conversion in the age calculation (the code using fdb->last_seen, gr_clock_us(),
and age) to divide by an explicit microsecond-per-second constant (e.g. 1000000
or a US_PER_SEC constant) instead of CLOCKS_PER_SEC so the unit conversion is
explicit and robust; add a named constant US_PER_SEC and use (now -
fdb->last_seen) / US_PER_SEC when computing age.
- Around line 417-422: fdb_pool allocated in fdb_reconfig is never freed in
fdb_fini causing a mempool leak; update fdb_fini to check fdb_pool and call the
appropriate mempool free (e.g. rte_mempool_free or the project-specific free
function) and set fdb_pool to NULL, in addition to the existing cleanup of
ageing_timer and fdb_hash so both the hash and the mempool are released on
shutdown.
| static int bridge_attach_member(struct iface *bridge, struct iface *member) { | ||
| struct iface_info_bridge *br = iface_info_bridge(bridge); | ||
|
|
||
| switch (member->type) { | ||
| case GR_IFACE_TYPE_PORT: | ||
| case GR_IFACE_TYPE_VLAN: | ||
| case GR_IFACE_TYPE_BOND: | ||
| break; | ||
| default: | ||
| return errno_set(EMEDIUMTYPE); | ||
| } | ||
|
|
||
| if (br->n_members == ARRAY_DIM(br->members)) | ||
| return errno_set(EUSERS); | ||
|
|
||
| br->members[br->n_members++] = member; | ||
| member->domain_id = bridge->id; | ||
| member->vrf_id = GR_VRF_ID_UNDEF; | ||
| member->mode = GR_IFACE_MODE_BRIDGE; | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's check the bond_attach_member function to see if it has a duplicate check
cat modules/infra/control/bond.c | sed -n '170,185p'Repository: DPDK/grout
Length of output: 529
🏁 Script executed:
# Let's also find the caller of bridge_attach_member to understand the attach flow
rg -n 'bridge_attach_member' --type cRepository: DPDK/grout
Length of output: 234
🏁 Script executed:
# Check how iface attach_domain is called to understand if there's higher-level protection
rg -n 'attach_domain|detach_domain' modules/infra/control/iface.c -A 5 -B 5Repository: DPDK/grout
Length of output: 2246
🏁 Script executed:
# Look for any domain attachment safeguards in the iface module
rg -n 'struct iface' modules/infra/control/iface.h -A 3 | head -40Repository: DPDK/grout
Length of output: 123
Missing duplicate member check in bridge_attach_member.
If the same interface is attached to a bridge multiple times, it will occupy multiple slots in br->members and cause inconsistencies on detach (only the first slot is removed). The analogous bond_attach_member in modules/infra/control/bond.c explicitly checks for and returns early on duplicates—bridge should do the same.
Proposed fix
static int bridge_attach_member(struct iface *bridge, struct iface *member) {
struct iface_info_bridge *br = iface_info_bridge(bridge);
switch (member->type) {
case GR_IFACE_TYPE_PORT:
case GR_IFACE_TYPE_VLAN:
case GR_IFACE_TYPE_BOND:
break;
default:
return errno_set(EMEDIUMTYPE);
}
+ for (unsigned i = 0; i < br->n_members; i++) {
+ if (br->members[i] == member)
+ return 0; // already a member
+ }
+
if (br->n_members == ARRAY_DIM(br->members))
return errno_set(EUSERS);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static int bridge_attach_member(struct iface *bridge, struct iface *member) { | |
| struct iface_info_bridge *br = iface_info_bridge(bridge); | |
| switch (member->type) { | |
| case GR_IFACE_TYPE_PORT: | |
| case GR_IFACE_TYPE_VLAN: | |
| case GR_IFACE_TYPE_BOND: | |
| break; | |
| default: | |
| return errno_set(EMEDIUMTYPE); | |
| } | |
| if (br->n_members == ARRAY_DIM(br->members)) | |
| return errno_set(EUSERS); | |
| br->members[br->n_members++] = member; | |
| member->domain_id = bridge->id; | |
| member->vrf_id = GR_VRF_ID_UNDEF; | |
| member->mode = GR_IFACE_MODE_BRIDGE; | |
| return 0; | |
| } | |
| static int bridge_attach_member(struct iface *bridge, struct iface *member) { | |
| struct iface_info_bridge *br = iface_info_bridge(bridge); | |
| switch (member->type) { | |
| case GR_IFACE_TYPE_PORT: | |
| case GR_IFACE_TYPE_VLAN: | |
| case GR_IFACE_TYPE_BOND: | |
| break; | |
| default: | |
| return errno_set(EMEDIUMTYPE); | |
| } | |
| for (unsigned i = 0; i < br->n_members; i++) { | |
| if (br->members[i] == member) | |
| return 0; // already a member | |
| } | |
| if (br->n_members == ARRAY_DIM(br->members)) | |
| return errno_set(EUSERS); | |
| br->members[br->n_members++] = member; | |
| member->domain_id = bridge->id; | |
| member->vrf_id = GR_VRF_ID_UNDEF; | |
| member->mode = GR_IFACE_MODE_BRIDGE; | |
| return 0; | |
| } |
🤖 Prompt for AI Agents
In `@modules/l2/control/bridge.c` around lines 31 - 52, bridge_attach_member
currently allows adding the same iface multiple times; add a duplicate check
similar to bond_attach_member by iterating br->members[0..br->n_members-1] and
if any entry equals the incoming member return 0 (no-op) before appending. Use
the same comparison semantics as bond_attach_member, keep the check before the
capacity check (br->n_members == ARRAY_DIM(br->members)), and do not change
member state when returning early.
| continue; | ||
|
|
||
| age = (now - fdb->last_seen) / CLOCKS_PER_SEC; | ||
|
|
There was a problem hiding this comment.
Ageing arithmetic uses CLOCKS_PER_SEC to convert gr_clock_us() microseconds — semantically misleading.
gr_clock_us() returns microseconds. Dividing by CLOCKS_PER_SEC (which is 1,000,000 on POSIX) happens to produce seconds, but only by coincidence — CLOCKS_PER_SEC is defined for clock() ticks, not for microsecond values. If gr_clock_us() ever changes unit, or if this runs on a non-POSIX platform, this silently breaks.
Consider using 1000000 or a US_PER_SEC constant explicitly.
🤖 Prompt for AI Agents
In `@modules/l2/control/fdb.c` around lines 378 - 381, The ageing calculation uses
CLOCKS_PER_SEC to convert the microsecond value returned by gr_clock_us() which
is misleading; update the conversion in the age calculation (the code using
fdb->last_seen, gr_clock_us(), and age) to divide by an explicit
microsecond-per-second constant (e.g. 1000000 or a US_PER_SEC constant) instead
of CLOCKS_PER_SEC so the unit conversion is explicit and robust; add a named
constant US_PER_SEC and use (now - fdb->last_seen) / US_PER_SEC when computing
age.
| static void fdb_fini(struct event_base *) { | ||
| if (ageing_timer != NULL) | ||
| event_free(ageing_timer); | ||
|
|
||
| rte_hash_free(fdb_hash); | ||
| } |
There was a problem hiding this comment.
fdb_pool is never freed in fdb_fini — resource leak.
fdb_reconfig allocates both fdb_hash and fdb_pool, but fdb_fini only frees the hash. The mempool leaks on shutdown.
Proposed fix
static void fdb_fini(struct event_base *) {
if (ageing_timer != NULL)
event_free(ageing_timer);
rte_hash_free(fdb_hash);
+ rte_mempool_free(fdb_pool);
}🤖 Prompt for AI Agents
In `@modules/l2/control/fdb.c` around lines 417 - 422, fdb_pool allocated in
fdb_reconfig is never freed in fdb_fini causing a mempool leak; update fdb_fini
to check fdb_pool and call the appropriate mempool free (e.g. rte_mempool_free
or the project-specific free function) and set fdb_pool to NULL, in addition to
the existing cleanup of ageing_timer and fdb_hash so both the hash and the
mempool are released on shutdown.
When outputting on a VLAN interface, the local iface variable is reassigned to the parent interface after VLAN tag insertion. The subsequent UP status check and TX stats increment then use this reassigned pointer, accounting them on the parent instead of the original VLAN interface. Use d->iface which still references the original VLAN interface for the status check and stats increment. Fixes: 7701685 ("port: add dedicated port_tx functions") Signed-off-by: Robin Jarry <rjarry@redhat.com>
Bridge members that are not VLAN interfaces (trunk ports) need to carry the VLAN ID through the output path so that the Ethernet header can be built with the correct 802.1Q tag. iface_output unconditionally clears d->vlan_id to zero for non-VLAN interfaces, discarding the VLAN ID set during input processing. Only set d->vlan_id when the output interface is actually a VLAN type. Clear it instead at the points where it is no longer needed: in eth_output after the Ethernet header has been built, and in the control plane injection path where no VLAN context exists. Signed-off-by: Robin Jarry <rjarry@redhat.com>
A future change will require calling control_queue_push() from gr_event_push() which lives in main/. If control_queue stays in the infra module, this would create a circular dependency between main and infra. Move control_queue.c and gr_control_queue.h to main/ and replace the event-based drain mechanism with explicit control_queue_drain() calls from iface_destroy() and nexthop_destroy() after the RCU sync. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Pass rte_lcore_id() to rte_rcu_qsbr_synchronize() instead of RTE_QSBR_THRID_INVALID to exclude the calling thread from the quiescent state wait. This is needed to allow creating objects from datapath workers. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Event notifications must be processed on the control plane thread. Modify gr_event_push() to detect when it is called from a datapath worker and use the control queue to defer the notification to the control plane event loop. This enables datapath nodes (such as bridge MAC learning) to create MAC entries on the fly without blocking the control plane. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a new l2 module with a bridge interface type that allows grouping multiple member interfaces (ports, VLANs, bonds) into a single L2 broadcast domain. The bridge maintains a list of members and supports configurable MAC learning, BUM traffic flooding, per-bridge ageing timer and a custom MAC address. Members are switched to GR_IFACE_MODE_BRIDGE when attached and restored to the default VRF when the bridge is destroyed. FDB management and datapath nodes for actual packet forwarding will follow in subsequent commits. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Implement a forwarding database backed by an RCU-protected rte_hash with a dedicated rte_mempool for entries. The hash is configured with a free_key_data_func callback so that deleted entries are automatically returned to the pool after RCU synchronization. Entries can be added/deleted/flushed via the API and can also be dynamically learned from the datapath via fdb_learn(). A periodic ageing timer evicts learned entries that have not been refreshed within the bridge ageing_time. Static entries configured by the user are never aged out. FDB entries associated with a member or bridge are automatically purged on detach or bridge destruction. The FDB table size defaults to 4096 entries and can be changed at runtime via the config set/get API, provided the table is empty. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add bridge_input and bridge_flood datapath nodes. bridge_input receives packets from member interfaces via GR_IFACE_MODE_BRIDGE. It learns source MAC addresses into the FDB (unless GR_BRIDGE_F_NO_LEARN is set), then looks up the destination. Known unicast destinations are forwarded to the learned output interface. Unknown unicast, broadcast and multicast are sent to bridge_flood. Hairpin packets (destination is the source interface) are dropped. When the destination is the bridge interface itself, packets are sent to eth_input for local processing. bridge_flood replicates each packet to all bridge members except the ingress interface, and to the bridge interface itself. The first output reuses the original mbuf, subsequent ones are cloned. When GR_BRIDGE_F_NO_FLOOD is set, the packet is dropped instead. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Create a bridge with three member ports and verify L2 forwarding between namespaces, L3 reachability to the bridge interface address, and overwriting a dynamic FDB entry with a static one. Also check that detaching a member and deleting the bridge properly clean up FDB entries. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Implement L2 bridging functionality with MAC learning, flooding, and per-bridge configuration options.
This series introduces a new bridge interface type that can attach port and bond interfaces as members. Packets received on member interfaces are forwarded based on learned MAC addresses, with unknown unicast, broadcast and multicast traffic flooded to all other members.
The first commits prepare the infrastructure by moving the control queue to avoid circular dependencies and allowing nexthop creation from datapath workers. This enables the bridge input node to learn MAC addresses on the fly without blocking the control plane.
The L2 nexthop type stores (bridge_id, vlan_id, mac) tuples with automatic ageing based on per-bridge configurable timeout. Static entries can be created via CLI to override learned ones.
Bridge configuration supports disabling flooding (NO_FLOOD) and MAC learning (NO_LEARN) flags, custom MAC address, and up to 64 member interfaces. The frr plugin exposes bridges to zebra as ZEBRA_IF_BRIDGE with proper slave type on members.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests