Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot Commands
The project introduces multimodal models for grounded semantic role labelling, together with the generation of synthetic images of domestic scenes. The generated dataset is conditioned on linguistic and environmental constraints extracted from the HuRIC dataset, enabling experiments in Situated Human-Robot Interaction (HRI).
The paper has been accepted to EMNLP 2025 and is available at https://aclanthology.org/2025.emnlp-main.1212/.
The repository provides the code to train and evaluate multimodal models for Grounded Semantic Role Labelling (G-SRL) in domestic environments using synthetically generated images, together with a complete pipeline for generating and processing synthetic visual data for robotic command understanding. The pipeline supports the following steps (a small illustrative sketch of the first two follows the list):
- Extraction of constraints from HuRIC annotations
- Prompt generation for synthetic image creation
- Image generation using diffusion models
- Automatic bounding box annotation
- Consistency checking with visual LLMs
- Filtering and selection of top-ranked samples
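To give a concrete flavour of the first two steps, the snippet below sketches how a text-to-image prompt might be assembled from a simplified, hypothetical HuRIC-style annotation. The field names and the prompt template are illustrative assumptions, not the repository's actual schema; the real constraint extraction and prompt generation are implemented in image_generator/.

```python
# Minimal sketch of constraint extraction and prompt generation.
# The annotation below is a simplified, hypothetical stand-in for a HuRIC
# example; the real schema and extraction logic live in image_generator/.

def build_prompt(annotation: dict) -> str:
    # Collect the entities the command refers to (e.g. "a book", "a table").
    objects = [arg["lexical_reference"] for arg in annotation["arguments"]]
    room = annotation.get("room", "a domestic room")
    # A simple template conditioning the image generator on the scene constraints.
    return (
        f"A realistic photo of {room}, containing "
        + ", ".join(objects)
        + ", indoor lighting, eye-level view"
    )

example = {
    "command": "take the book on the table",
    "room": "a living room",
    "arguments": [
        {"role": "Theme", "lexical_reference": "a book"},
        {"role": "Source", "lexical_reference": "a table"},
    ],
}

print(build_prompt(example))
# A realistic photo of a living room, containing a book, a table, indoor lighting, eye-level view
```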
This repository includes two primary components:
training_models/: contains the training and evaluation scripts for applying MiniCPM-V 2.6 to the G-SRL task using the generated and validated dataset.
Refer to the README in training_models/ for configuration and usage.
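For orientation, here is a minimal inference sketch of MiniCPM-V 2.6 that follows the model's public Hugging Face model card. It is not the repository's training or evaluation code, and the G-SRL prompt shown is a simplified placeholder; refer to training_models/ for the actual scripts and prompt format.

```python
# Minimal MiniCPM-V 2.6 inference sketch, following the model's public
# Hugging Face model card. The checkpoint is the base model and the prompt
# is a simplified placeholder; the actual G-SRL prompt format and training
# code are in training_models/.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("generated_scene.png").convert("RGB")  # a synthetic image from the pipeline
command = "take the book on the table"
question = (
    f"Command: {command}\n"
    "Identify the semantic frame evoked by the command and, for each argument, "
    "the bounding box of the corresponding object in the image."
)
msgs = [{"role": "user", "content": [image, question]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```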
image_generator/: a self-contained pipeline to create the set of synthetic images for the G-SRL dataset. It includes:
- Constraint and prompt generation
- Diffusion-based image generation (see the sketch after this list)
- Automatic bounding box labelling (an illustrative detection sketch appears further below)
- Visual consistency evaluation
- Top-k image selection
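As a hedged illustration of the diffusion step referenced in the list above, the snippet below generates one candidate image from a scene prompt with the Hugging Face diffusers library. The SDXL checkpoint and sampling settings are assumptions made for this example; the diffusion model and parameters actually used are configured in image_generator/.

```python
# Illustrative text-to-image generation with Hugging Face diffusers.
# The checkpoint and sampling settings are assumptions for this example;
# the diffusion model actually used is configured in image_generator/.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "A realistic photo of a living room, containing a book, a table, indoor lighting"
generator = torch.Generator("cuda").manual_seed(0)  # fixed seed for reproducible candidates

image = pipe(prompt=prompt, num_inference_steps=30, generator=generator).images[0]
image.save("candidate_000.png")
```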
Refer to the README in image_generator/ for full details.
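The automatic bounding box labelling step can be approximated with an open-vocabulary object detector. The sketch below uses OWL-ViT from transformers purely as an illustration of the idea; the detector, text queries, and confidence threshold used by the actual pipeline are those documented in image_generator/.

```python
# Illustrative open-vocabulary detection to label a generated image with boxes.
# OWL-ViT is used here only as an example detector; the labelling model and
# threshold used by the actual pipeline may differ (see image_generator/).
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
detector = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("candidate_000.png").convert("RGB")
queries = [["a book", "a table"]]  # the objects mentioned in the command

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = detector(**inputs)

target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.2, target_sizes=target_sizes
)[0]

for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(queries[0][int(label)], [round(v, 1) for v in box.tolist()], round(score.item(), 3))
```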
The prerequisites and environment setup are common to the entire project. You will need:
- CUDA-capable GPUs
- NVIDIA CUDA drivers installed
- Python + Conda
Create the environment as follows:
```bash
export CUDA_HOME=/usr/local/cuda
conda env create -f environment.yml
conda activate visual_grounding
./install_requirements.sh
```

Each subfolder includes a dedicated README to walk you through its functionality. A typical workflow consists of:
1. Running the image generation pipeline (image_generator/)
2. Using the generated images to train or evaluate a model (training_models/)
If you only want to train the MiniCPM models (Step 2), you can use our publicly available datasets by setting the correct paths.
Please refer to the subfolder READMEs for detailed instructions on each component.
If you find this work useful, please cite the paper:

```bibtex
@inproceedings{hromei-etal-2025-grounded,
title = "Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot Commands",
author = "Hromei, Claudiu Daniel and
Scaiella, Antonio and
Croce, Danilo and
Basili, Roberto",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.1212/",
pages = "23758--23781",
ISBN = "979-8-89176-332-6",
}
```