Skip to content

Conversation

@Juliaj
Copy link
Collaborator

@Juliaj Juliaj commented Dec 3, 2025

Purpose

Supports #225, Design SLAM RAI features

  • This PR introduces rai_semap, a semantic map memory system that stores object detections with their 3D locations in the SLAM map frame. The system enables spatial-semantic queries like "where did I see a cup?" and provides persistent memory for robot exploration and task planning.

Proposed Changes

  • design doc & readme.
  • This is an early-stage implementation. The main goal is to gather feedback on the design and approach before further development. I'm also looking for ideas to deal with following challenges,
    • Deduplication: One of the main challenges is handling multiple detections of the same physical object. The system uses spatial merging with class-specific thresholds, point cloud-based matching when available, and confidence filtering to merge duplicate detections. However, distinguishing between a moved object and a new object instance remains a challenge that needs further refinement.

    • Object Detection Accuracy: The accuracy of object detection directly impacts the quality of the semantic map. The system includes confidence thresholds and bounding box size filtering, but detection accuracy depends on GroundingDINO.

Testing

  • New unit tests have been added.
  • The current design has been validated with the rosbot-xl demo.

@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.32%. Comparing base (1567ab1) to head (d4c85be).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #727   +/-   ##
=======================================
  Coverage   65.32%   65.32%           
=======================================
  Files          78       78           
  Lines        3386     3386           
=======================================
  Hits         2212     2212           
  Misses       1174     1174           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@maciejmajek maciejmajek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After pyproject.toml changes I was able to run the semap feature. Amazing work!

The design.md mentions various algorithms for spatial/temporal deduplication. Before I reflect on that, I need to read some literature on this topic.

Since this PR is huge, I will continue the review tomorrow.


@abstractmethod
def spatial_query(
self, center: Point, radius: float, filters: Optional[Dict[str, Any]] = None
Copy link
Member

@maciejmajek maciejmajek Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I feel like interwining non ROS 2 code with ROS 2 types creates confusion and limits future integrations. Consider using src/rai_core/rai/types. Such separation validates the ros2 directory you've created. Let me know what you think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such separation validates the ros2 directory you've created.

Indeed, time to walk the talk.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice adapter pattern: rai/types is an adapter between ROS 2 and the Python system.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out to be a bit more work than I anticipated! I ended up using both the Pose from rai.types and the ROS2 message type. See the struggle here

The issue is that ROS2 transform functions (tf2_geometry_msgs) require ROS2 message types, but we also need rai.types.Pose for the Pydantic models. Using PoseType = Any with arbitrary_types_allowed=True lets Pydantic accept both, which works but isn't ideal. If you have suggestions for a cleaner approach, I'm all ears!

self.wait_tool = WaitForSecondsTool()
self._nav_action_ready = False

def wait_for_nav_action_server(self, timeout_sec: float = 60.0) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from rai.communication.ros2 import wait_for_ros2_actions - I think we need to add support for timeout in ros2 waiters. The boilerplate below is what we want to eliminate with ros2connectors & their tooling

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since there is no timeout in waiters, its ok as it is

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked as #728

Comment on lines 235 to 242
def _initialize_tf(self):
"""Initialize TF buffer and listener.

Note: ROS2Connector already creates _tf_buffer and _tf_listener internally.
We access the connector's TF buffer to avoid duplication.
"""
# Use the connector's internal TF buffer (already created in __init__)
self.tf_buffer = self._tf_buffer
Copy link
Member

@maciejmajek maciejmajek Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was left over code, removed.

map_frame_id = self._get_string_parameter("map_frame_id")
map_resolution = self._get_double_parameter("map_resolution")

backend = SQLiteBackend(database_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should most likely be configurable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a tool that can retrieve object types previously seen in a location.

Example:

  • The model is asked to bring an object that can appear in different shapes such as a bottle, can, or cup.
  • Finding the correct item might take three turns when using QuerySemanticMapTool, but only two turns if it can refer to a SeenObjects record.

At some point this may require context compression, but that can be handled by a future general module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this do ?

class GetSeenObjectsTool(BaseTool):
    """Tool for retrieving object types previously seen in a location."""

    name: str = "get_seen_objects"
    description: str = (
        "Get a list of object types (e.g., 'bottle', 'cup', 'table') that have "
        "been previously seen in a location. Useful for discovering what objects "
        "exist before querying for specific instances."
    )

    args_schema: Type[GetSeenObjectsToolInput] = GetSeenObjectsToolInput

    memory: SemanticMapMemory

    def _run(self, location_id: Optional[str] = None) -> str:
        """Get list of distinct object classes seen in a location."""
        pass

I've added skeleton code to support this ask and actual implementation will come in subsequent PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, cool

vision_detection_id=vision_detection_id,
metadata=metadata if metadata else existing_ann.metadata,
)
self.memory.backend.update_annotation(updated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From architectural (LoD) standpoint, I think that the node should not reach that far (memory.backend). Would it make sense to expose update annotation or similar in memory?


def _process_detection(
self,
detection,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important type missing

# It performs object detection and 3D pose computation, which are general perception
# tasks not specific to semantic mapping. Consider moving to rai_perception when
# that package has ROS2 node infrastructure in place.
class DetectionPublisher(ROS2Connector):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use ROS2Connector through dependency injection. I'm not opposed to inheritance, just would love to know your reasoning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, they're in the ros2 folder and conceptually are ROS2 connectors. Time to walk the talk of loose coupling. I went ahead and refactored it. With dependency injection, the call chain is a bit longer than direct calling.

@maciejmajek
Copy link
Member

@Juliaj I've finished the review. Huge feature!

@Juliaj
Copy link
Collaborator Author

Juliaj commented Dec 10, 2025

I've finished the review.
@maciejmajek

Huge thanks again for all the detailed review feedback! I've deferred a few items for follow-up—one is tracked as an issue, and the other involves moving code to rai_perception. I'll work on these in a subsequent PR to keep this one more focused.

Copy link
Member

@maciejmajek maciejmajek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@maciejmajek maciejmajek merged commit 148bf74 into main Dec 14, 2025
8 checks passed
@maciejmajek maciejmajek deleted the jj/feat/rai_slam branch December 14, 2025 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants