Grounded SRL for Human-Robot Interaction

Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot Commands

The project introduces multimodal models for grounded semantic role labelling, together with the generation of synthetic domestic images. The generated dataset is conditioned on linguistic and environmental constraints extracted from the HuRIC dataset, enabling experiments in Situated Human-Robot Interaction (HRI).

The paper was published at EMNLP 2025 and is available at https://aclanthology.org/2025.emnlp-main.1212/.


Overview

The repository provides the methods to train and evaluate multimodal models targeting Grounded Semantic Role Labelling (G-SRL) in domestic environments using synthetically generated images. To this end, it offers a complete pipeline for generating and processing synthetic visual data for robotic command understanding. The pipeline supports the following steps (a minimal sketch of how they chain together follows the list):

  • Extraction of constraints from HuRIC annotations
  • Prompt generation for synthetic image creation
  • Image generation using diffusion models
  • Automatic bounding box annotation
  • Consistency checking with visual LLMs
  • Filtering and selection of top-ranked samples
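
The sketch below only illustrates this flow, assuming hypothetical step functions passed in as callables; the actual entry points and their names are those documented in image_generator/.

# Hedged sketch of the pipeline flow above; the step callables are
# hypothetical placeholders, not the repository's real API.
from typing import Any, Callable, List

def run_pipeline(
    huric_example: Any,
    extract_constraints: Callable,   # 1. constraints from HuRIC annotations
    build_prompt: Callable,          # 2. prompt for synthetic image creation
    generate_image: Callable,        # 3. diffusion-based image generation
    add_bounding_boxes: Callable,    # 4. automatic bounding-box annotation
    consistency_score: Callable,     # 5. consistency check with a visual LLM
    n_candidates: int = 8,
    top_k: int = 2,
) -> List[Any]:
    """Chain the six steps and keep only the top-ranked samples (step 6)."""
    constraints = extract_constraints(huric_example)
    prompt = build_prompt(constraints)
    candidates = [generate_image(prompt) for _ in range(n_candidates)]
    annotated = [add_bounding_boxes(img, constraints) for img in candidates]
    scored = [(img, consistency_score(img, constraints)) for img in annotated]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [img for img, _ in scored[:top_k]]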

Main Components

This repository includes two primary components:

1. training_models/

Contains the training and evaluation scripts for applying MiniCPM-V 2.6 to the G-SRL task using the generated and validated dataset.
Refer to the README in training_models/ for configuration and usage.
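
As a quick way to inspect the model outside the provided scripts, the snippet below follows the standard Hugging Face inference pattern for openbmb/MiniCPM-V-2_6; it is a minimal sketch, and the image path and question are illustrative placeholders rather than anything shipped with this repository.

# Minimal MiniCPM-V 2.6 inference sketch (standard Hugging Face usage);
# the image path and the question are illustrative placeholders.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-2_6", trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-V-2_6", trust_remote_code=True)

image = Image.open("synthetic_kitchen.png").convert("RGB")  # a generated domestic scene
question = "Take the mug on the table. Which object in the image is the Theme of the command?"
msgs = [{"role": "user", "content": [image, question]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)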

2. image_generator/

A self-contained pipeline to create a set of images for the G-SRL dataset. It includes:

  • Constraint and prompt generation
  • Diffusion-based image generation
  • Automatic bounding box labelling
  • Visual consistency evaluation
  • Top-k image selection

Refer to the README in image_generator/ for full details.
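
To make the individual stages more concrete, the hedged snippet below shows how steps such as diffusion-based generation and bounding-box labelling can be realised with off-the-shelf models (stabilityai/stable-diffusion-2-1 and the google/owlvit-base-patch32 zero-shot detector); these checkpoints and the prompt are assumptions for illustration, not necessarily the components used in image_generator/.

# Hedged illustration of two stages with off-the-shelf models; the
# checkpoints and the prompt are assumptions, not the repository's choices.
import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline as hf_pipeline

prompt = "a photorealistic kitchen with a red mug on a wooden table"

# Diffusion-based image generation
sd = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image = sd(prompt).images[0]

# Automatic bounding-box labelling with a zero-shot object detector
detector = hf_pipeline(
    "zero-shot-object-detection", model="google/owlvit-base-patch32", device=0
)
for det in detector(image, candidate_labels=["a mug", "a table"]):
    print(det["label"], round(det["score"], 3), det["box"])  # box: xmin/ymin/xmax/ymax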


Setup Instructions

The prerequisites and environment setup are common to the entire project. You will need:

  • CUDA-capable GPUs
  • NVIDIA CUDA drivers installed
  • Python + Conda

Create the environment as follows:

export CUDA_HOME=/usr/local/cuda
conda env create -f environment.yml
conda activate visual_grounding
./install_requirements.sh
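
After activating the environment, a quick sanity check (not part of the repository's scripts) confirms that PyTorch can see the GPUs:

# Optional sanity check; not a repository script.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))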

Getting Started

Each subfolder includes a dedicated README to walk you through its functionality. A typical workflow consists of:

  1. Running the image generation pipeline (image_generator/)
  2. Using the generated images to train or evaluate a model (training_models/)

If you only want to train the MiniCPM models (Step 2), you can use our publicly available datasets by setting the correct paths.
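
As an illustration of what "setting the correct paths" can look like, the snippet below uses purely hypothetical file names and locations; the actual option names and dataset layout are those described in training_models/.

# Purely hypothetical path wiring; the real option names and dataset layout
# are documented in training_models/ and may differ.
from pathlib import Path

DATA_ROOT = Path("/data/gsrl_synthetic")   # where the downloaded dataset was unpacked
TRAIN_FILE = DATA_ROOT / "train.json"      # hypothetical split file names
EVAL_FILE = DATA_ROOT / "eval.json"
IMAGE_DIR = DATA_ROOT / "images"

for path in (TRAIN_FILE, EVAL_FILE, IMAGE_DIR):
    assert path.exists(), f"Missing expected path: {path}"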


Documentation

Please refer to the subfolder READMEs for detailed instructions on each component.


Citation

@inproceedings{hromei-etal-2025-grounded,
    title = "Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot Commands",
    author = "Hromei, Claudiu Daniel  and
      Scaiella, Antonio  and
      Croce, Danilo  and
      Basili, Roberto",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1212/",
    pages = "23758--23781",
    ISBN = "979-8-89176-332-6",
}
