-
Notifications
You must be signed in to change notification settings - Fork 268
Description
Bug report
Hi all, thank you for maintaining this wonderful library!
I've been encountering an intermittent bug that has left my head scratching. Unfortunately recently it started occurring more often, so I sat down and spent some time tracking down what was going on. Now, I've got a good idea of the problem- but I need some more expert eyes to decide on a good solution. I'll lay out all my findings and some perceived paths forward in this bug report.
Required Info:
- Operating System:
- Ubuntu 22.04, within a docker image that is also running Ubuntu 22.04
- Installation type:
- Binaries as installed from ros2 apt.
- Version or commit hash:
- rclpy==3.3.7, python3.10
- DDS implementation:
- CycloneDDS
- Client library (if applicable):
- rclpy==3.3.7, python3.10
Steps to reproduce issue
There are multiple places where this race condition manifests itself. I will demonstrate how to reproduce it when calling send_goal_async. Take the ActionClient.send_goal_async and add a sleep(#) directly below the line sequence_number = self._client_handle.send_goal_request(request). For example:
def send_goal_async(self, goal, feedback_callback=None, goal_uuid=None):
"""
...
"""
if not isinstance(goal, self._action_type.Goal):
raise TypeError()
request = self._action_type.Impl.SendGoalService.Request()
request.goal_id = (
self._generate_random_uuid() if goal_uuid is None else goal_uuid
)
request.goal = goal
sequence_number = self._client_handle.send_goal_request(request)
sleep(5) # <--- this line causes this race condition nearly 100% of the time
if sequence_number in self._pending_goal_requests:
raise RuntimeError(
"Sequence ({}) conflicts with pending goal request".format(
sequence_number
)
)
if feedback_callback is not None:
# TODO(jacobperron): Move conversion function to a general-use package
goal_uuid = bytes(request.goal_id.uuid)
self._feedback_callbacks[goal_uuid] = feedback_callback
future = Future()
self._pending_goal_requests[sequence_number] = future
self._goal_sequence_number_to_goal_id[sequence_number] = request.goal_id
future.add_done_callback(self._remove_pending_goal_request)
# Add future so executor is aware
self.add_future(future)
return futureExpected behavior
I would expect 'send_goal_async' to run a bit slower than usual, then return a future. That future might complete quite quickly (or already be complete).
Actual behavior
The future never completes, and I see the following line in the logs
Ignoring unexpected goal response. There may be more than one action server for the action '{ACTION_NAME}'Additional information
By my sleuthing, it appears that this error is caused by the following important lines being called:
client:Thread1
send_goal_asyncis called, and within it:sequence_number = self._client_handle.send_goal_request(request)self._goal_sequence_number_to_goal_id[sequence_number] = request.goal_id
client:Thread2
- A response to a goal request is received in the client C bindings, causing it to return True to
is_ready(...) execute(...)is calledif sequence_number in self._goal_sequence_number_to_goal_id:evaluates to False, becauseclient:Thread1has not reached its step 3- It ignores the
taken_data, and thus the future will never haveset_resultcalled.
server:Thread1
- A goal request is received, and responded to
Cause Summary
There is a race condition when sending goal requests, because the goal request can complete before the sequence_number is tracked in the self._goal_sequence_number_to_goal_id dictionary. Thus execute is called before it is tracked, and the taken_data is ignored, assumed to be a response from a duplicate action server.
I believe this same race condition affects all usages of the client API, such as _cancel_goal_async, or _get_result_async.
Recommendations
- Put a lock around every usage of
self._client_handleso that python-side objects can be edited and synced up. That way ifexecuteis called it will block until theself._client_handleisn't being used, and the_goal_sequence_number_to_goal_idwill be filled in. - Alternatively, create locks for the various in-memory dictionaries that are used in
execute:
- _goal_handles
- _pending_goal_requests
- _goal_sequence_number_to_goal_id
- _pending_cancel_requests
- _pending_result_requests
- _result_sequence_number_to_goal_id