Skip to content

Segmentation fault when adding a short-lived node to an executor #777

@adamlm

Description

@adamlm

Bug report

Description

I am trying to set up an executor "spinner thread" that will spin a specified rclcpp::Executor until commanded to stop. My goal is to construct the executor and spinner thread in one scope then add nodes to that executor in another scope. However, the rclcpp::Executor classes cause a segmentation fault under certain conditions. There appears to be a race condition in the RMW layer that manifests with short-lived nodes.

  • Operating System: Ubuntu 22.04 (Docker container); Ubuntu 22.04 host

  • Installation type: Binaries

  • Version or commit hash:

    • ros-humble-rclcpp: 16.0.10

    • ros-humble-rmw-fastrtps-cpp: 6.2.7

    • ros-humble-fastrtps: 2.6.8

    • ros-humble-rmw-cyclonedds-cpp: 0.10.4

    • ros-humble-cyclonedds: 1.3.4

  • DDS implementation:

    • Fast-DDS (Fast-RTPS)
    • Cyclone DDS
  • Client library (if applicable): rclcpp

  • Compiler:

    • Clang 14
    • GCC 11.4.0

Steps to reproduce issue

Minimal reproducible example

#include <atomic>
#include <memory>
#include <thread>

#include <rclcpp/executors/multi_threaded_executor.hpp>
#include <rclcpp/node.hpp>

auto function() -> void
{
    rclcpp::init(0, nullptr);

    rclcpp::executors::MultiThreadedExecutor executor;

    std::atomic_bool stop_spinning{false};
    auto spinner_thread{std::thread([&] {
        while (rclcpp::ok() && !stop_spinning.load()) {
            executor.spin_some();
        }
    })};

    {
        auto node{std::make_shared<rclcpp::Node>("node")};
        executor.add_node(node);
    }

    stop_spinning.store(true);
    spinner_thread.join();

    rclcpp::shutdown();
}

auto main() -> int
{
    while (true) {
        function();
    }

    return 0;
}

Steps

  • Compile and run the above example.
  • Wait for a segmentation fault error. It usually happens within a minute.

Expected behavior

The program runs indefinitely.

Actual behavior

The program exits due to a segmentation fault.

Program output when compiled with Clang

Segmentation fault (core dumped)

Program output when compiled with GCC

Segmentation fault (core dumped)

In (only) one of the many runs I did, I got the following output (when compiled with GCC):

terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  failed to add guard condition to wait set: guard condition implementation is invalid, at ./src/rcl/guard_condition.c:172, at ./src/rcl/wait.c:460
Aborted (core dumped)

Additional information

Relevant core dump

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f4154232e3f in rmw_fastrtps_shared_cpp::__rmw_wait(char const*, rmw_subscriptions_s*, rmw_guard_conditions_s*, rmw_services_s*, rmw_clients_s*, rmw_events_s*, rmw_wait_set_s*, rmw_time_s const*) ()
   from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
[Current thread is 1 (Thread 0x7f41531e8640 (LWP 1202848))]
(gdb) bt
#0  0x00007f4154232e3f in rmw_fastrtps_shared_cpp::__rmw_wait(char const*, rmw_subscriptions_s*, rmw_guard_conditions_s*, rmw_services_s*, rmw_clients_s*, rmw_events_s*, rmw_wait_set_s*, rmw_time_s const*) ()
    at /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#1  0x00007f415428a887 in rmw_wait(rmw_subscriptions_t*, rmw_guard_conditions_t*, rmw_services_t*, rmw_clients_t*, rmw_events_t*, rmw_wait_set_t*, rmw_time_t const*)
    (subscriptions=<optimized out>, guard_conditions=<optimized out>, services=<optimized out>, clients=<optimized out>, events=<optimized out>, wait_set=<optimized out>, wait_timeout=0x7f41531d5510) at ./src/rmw_wait.cpp:33
#2  0x00007f42759b1848 in rcl_wait () at /opt/ros/humble/lib/librcl.so
#3  0x00007f4274f2470c in rclcpp::Executor::wait_for_work(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) (this=0x7ffdbe823c50, timeout=...) at /usr/include/c++/11/chrono:521
#4  0x00007f4274f28853 in rclcpp::Executor::spin_some_impl(std::chrono::duration<long, std::ratio<1l, 1000000000l> >, bool) (this=0x7ffdbe823c50, max_duration=..., exhaustive=false) at /usr/include/c++/11/chrono:537
#5  0x000063ca4387dbde in function()::$_0::operator()() const (this=0x63ca44fc84e8) at <redacted_executable_path>:17
#6  0x000063ca4387db1d in std::__invoke_impl<void, function()::$_0>(std::__invoke_other, function()::$_0&&) (__f=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#7  0x000063ca4387daad in std::__invoke<function()::$_0>(function()::$_0&&) (__fn=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#8  0x000063ca4387da85 in std::thread::_Invoker<std::tuple<function()::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x63ca44fc84e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:259
#9  0x000063ca4387da55 in std::thread::_Invoker<std::tuple<function()::$_0> >::operator()() (this=0x63ca44fc84e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:266
#10 0x000063ca4387d979 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<function()::$_0> > >::_M_run() (this=0x63ca44fc84e0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211
#11 0x00007f4273705253 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007f427338dac3 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6
#13 0x00007f427341f850 in  () at /usr/lib/x86_64-linux-gnu/libc.so.6

Other notes

  • The issue occurs when using either the SingleThreadedExecutor or MultiThreadedExecutor.

  • No segmentation fault occurs* when relocating node to the outer scope.

    auto node{std::make_shared<rclcpp::Node>("node")};
    
    {
        executor.add_node(node);
    }
  • No segmentation fault occurs* when using Cyclone DDS

  • No segmentation fault occurs* when sleeping the main thread for a short duration before adding node to the executor.

    {
        auto node{std::make_shared<rclcpp::Node>("node")};
        std::this_thread::sleep_for(std::chrono::seconds(1));
        executor.add_node(node);
    }

* I concluded a segmentation fault was unlikely after running the program for several minutes.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions