# GRAID: Generating Reasoning questions from Analysis of Images via Discriminative artificial intelligence
- Install uv (optional if you already have it): `curl -LsSf https://astral.sh/uv/install.sh | sh` (or see the uv installation guide)
- Create a virtual environment: `uv venv`
- Activate it: `source .venv/bin/activate` (or use direnv with the provided `.envrc`)
- Install dependencies: `uv sync`
- Install all backends: `uv run install_all`
Generate high-quality VQA datasets for modern ML workflows:
```bash
# Interactive mode with step-by-step guidance
graid generate-dataset
```

Key Features:
- Object Filtering: Smart allowable sets for focused object detection
- Multi-Model Ensemble: Weighted Boxes Fusion (WBF) for improved accuracy
- Flexible Configuration: JSON configs for reproducible experiments
- HuggingFace Hub Integration: Direct upload to share datasets
- PIL Image Support: Ready for modern vision-language models
- Rich Metadata: Comprehensive dataset documentation
Quick Examples:
```bash
# Generate with specific object types (autonomous driving focus)
uv run graid generate-dataset --allowable-set "person,car,truck,bicycle,traffic light"

# Multi-model ensemble for enhanced accuracy
uv run graid generate-dataset --config examples/wbf_ensemble.json

# Upload directly to HuggingFace Hub
uv run graid generate-dataset --upload-to-hub --hub-repo-id "your-org/dataset-name"

# List all valid COCO objects
uv run graid generate-dataset --list-objects
```

Create reusable configurations for systematic experiments:
Basic Configuration:

```json
{
  "dataset_name": "bdd",
  "split": "val",
  "models": [
    {
      "backend": "detectron",
      "model_name": "faster_rcnn_R_50_FPN_3x",
      "confidence_threshold": 0.7
    },
    {
      "backend": "mmdetection",
      "model_name": "co_detr",
      "confidence_threshold": 0.6
    }
  ],
  "use_wbf": true,
  "wbf_config": {
    "iou_threshold": 0.6,
    "model_weights": [1.0, 1.2]
  },
  "allowable_set": ["person", "car", "truck", "bus", "motorcycle", "bicycle"],
  "confidence_threshold": 0.5,
  "batch_size": 4
}
```
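If you prefer to stay in Python, a config like the one above can also drive the programmatic API. This is a hypothetical sketch: the `generate_dataset` signature follows the Bring Your Own Data example below, and whether it accepts raw model dicts straight from a JSON config is an assumption to verify against your installed version.

```python
# Hypothetical sketch: driving generation from a JSON config in Python.
# The generate_dataset signature mirrors the "Bring Your Own Data"
# example below; passing raw model dicts from the config is an
# assumption to verify against your installed GRAID version.
import json

from graid.data.generate_dataset import generate_dataset

with open("examples/wbf_ensemble.json") as f:
    cfg = json.load(f)

dataset = generate_dataset(
    dataset_name=cfg["dataset_name"],
    split=cfg["split"],
    models=cfg["models"],
    allowable_set=cfg.get("allowable_set"),
    save_path=cfg.get("save_path", "./datasets/wbf_vqa"),
)
```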
Advanced Configuration with Custom Questions and Transforms:

```json
{
  "dataset_name": "bdd",
  "split": "val",
  "models": [
    {
      "backend": "ultralytics",
      "model_name": "yolov8x.pt",
      "confidence_threshold": 0.6
    }
  ],
  "use_wbf": false,
  "allowable_set": ["person", "car", "bicycle", "motorcycle", "traffic light"],
  "confidence_threshold": 0.5,
  "batch_size": 2,
  "questions": [
    { "name": "HowMany", "params": {} },
    { "name": "Quadrants", "params": { "N": 3, "M": 3 } },
    { "name": "WidthVsHeight", "params": { "threshold": 0.4 } },
    { "name": "LargestAppearance", "params": { "threshold": 0.35 } },
    { "name": "MostClusteredObjects", "params": { "threshold": 80 } }
  ],
  "transforms": {
    "type": "yolo_bdd",
    "new_shape": [640, 640]
  },
  "save_path": "./datasets/custom_bdd_vqa",
  "upload_to_hub": true,
  "hub_repo_id": "your-org/bdd-reasoning-dataset",
  "hub_private": false
}
```
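To make the question parameters above concrete: a `Quadrants` question with `N: 3, M: 3` divides each image into a 3×3 grid and asks which cell contains an object. The sketch below shows the underlying geometry; it is illustrative only, not GRAID's internal implementation.

```python
# Illustrative geometry behind a Quadrants-style question with N=3, M=3:
# split the image into an N x M grid and report which cell contains the
# center of a detected bounding box. Not GRAID's internal code.
def quadrant_of(box, image_w, image_h, N=3, M=3):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    col = min(int(cx / image_w * N), N - 1)  # clamp boxes on the right edge
    row = min(int(cy / image_h * M), M - 1)  # clamp boxes on the bottom edge
    return row, col

# A box centered in a 1280x720 frame falls in the middle cell (1, 1).
print(quadrant_of((600, 330, 680, 390), 1280, 720))
```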
Custom Model Configuration:

```json
{
  "dataset_name": "custom",
  "split": "train",
  "models": [
    {
      "backend": "detectron",
      "model_name": "custom_retinanet",
      "custom_config": {
        "config": "path/to/config.yaml",
        "weights": "path/to/model.pth"
      }
    },
    {
      "backend": "ultralytics",
      "model_name": "custom_yolo",
      "custom_config": {
        "model_path": "path/to/custom_yolo.pt"
      }
    }
  ],
  "transforms": {
    "type": "yolo_bdd",
    "new_shape": [832, 832]
  },
  "questions": [
    { "name": "IsObjectCentered", "params": {} },
    { "name": "LeftOf", "params": {} },
    { "name": "RightOf", "params": {} }
  ]
}
```

Bring Your Own Data: GRAID supports any PyTorch-compatible dataset:

```python
from graid.data.generate_dataset import generate_dataset
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Your custom dataset implementation."""

    def __len__(self):
        # Return the number of samples in your dataset
        ...

    def __getitem__(self, idx):
        # Return: (image_tensor, optional_annotations, metadata)
        # Annotations are only needed for mAP/mAR evaluation;
        # for VQA generation, only images are required.
        ...

# Generate a HuggingFace dataset from your data
dataset = generate_dataset(
    dataset_name="custom",
    split="train",
    models=your_models,
    allowable_set=["person", "vehicle"],
    save_path="./datasets/custom_vqa",
)
```

Key Point: Custom datasets only require images for VQA generation. Annotations are optional and only needed if you want to evaluate model performance with mAP/mAR metrics.
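As a concrete starting point, here is a minimal folder-backed dataset sketch. The three-tuple item structure follows the comment in the stub above; the exact structure GRAID expects should be checked against your installed version.

```python
# Minimal folder-backed dataset sketch. The (image, annotations, metadata)
# item structure follows the stub above; verify against your GRAID version.
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import pil_to_tensor

class FolderImageDataset(Dataset):
    """Serves every .jpg under a directory; images only, no annotations."""

    def __init__(self, root: str):
        self.paths = sorted(Path(root).glob("*.jpg"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        image = pil_to_tensor(Image.open(self.paths[idx]).convert("RGB"))
        annotations = None  # optional: only needed for mAP/mAR evaluation
        metadata = {"file_name": self.paths[idx].name}
        return image, annotations, metadata
```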
Combine predictions from multiple models using Weighted Boxes Fusion (WBF) for enhanced detection accuracy; a standalone sketch follows the list below:
- Improved precision through model consensus
- Configurable fusion parameters and model weights
- Supports mixed backends (Detectron2 + MMDetection + Ultralytics)
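GRAID wires the fusion up internally from the `wbf_config` block shown earlier. For intuition, here is a minimal standalone sketch using the third-party `ensemble-boxes` package (an assumption: GRAID's internal fusion code may differ), mirroring `iou_threshold: 0.6` and `model_weights: [1.0, 1.2]` from the basic configuration.

```python
# Standalone WBF sketch using the third-party `ensemble-boxes` package
# (pip install ensemble-boxes); illustrative, not GRAID's internal code.
from ensemble_boxes import weighted_boxes_fusion

# Per-model predictions; boxes are [x1, y1, x2, y2] normalized to [0, 1].
boxes_list = [
    [[0.10, 0.10, 0.40, 0.50]],  # model 1 (e.g., Faster R-CNN)
    [[0.12, 0.11, 0.42, 0.52]],  # model 2 (e.g., Co-DETR)
]
scores_list = [[0.90], [0.75]]
labels_list = [[0], [0]]         # both detect class 0 ("person")

# Mirrors "iou_threshold": 0.6 and "model_weights": [1.0, 1.2] above.
boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list,
    weights=[1.0, 1.2], iou_thr=0.6, skip_box_thr=0.5,
)
print(boxes, scores, labels)  # one fused box with a consensus score
```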
Focus datasets on specific object categories; a validation sketch follows the list:
- Common presets: Autonomous driving, indoor scenes, animals
- Interactive selection: Visual picker from 80 COCO categories
- Manual specification: Comma-separated object lists
- Validation: Automatic checking against COCO standard
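A hypothetical sketch of what the validation step amounts to (`COCO_CLASSES` and `validate_allowable_set` are illustrative stand-ins, truncated here; GRAID checks against all 80 categories):

```python
# Hypothetical validation helper; GRAID performs an equivalent check
# against the full 80-category COCO list.
COCO_CLASSES = {
    "person", "bicycle", "car", "motorcycle", "bus", "truck",
    "traffic light",
    # ...remaining COCO categories omitted for brevity
}

def validate_allowable_set(raw: str) -> list[str]:
    """Parse a comma-separated object list and reject non-COCO names."""
    requested = [name.strip() for name in raw.split(",") if name.strip()]
    unknown = [name for name in requested if name not in COCO_CLASSES]
    if unknown:
        raise ValueError(f"Not valid COCO categories: {unknown}")
    return requested

print(validate_allowable_set("person, car, traffic light"))
```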
Generated datasets include the following (a loading sketch follows the list):
- PIL Images: Direct compatibility with vision-language models
- Rich Annotations: Bounding boxes, confidence scores, object classes
- Structured QA Pairs: Question templates with precise answers
- Comprehensive Metadata: Model info, generation parameters, statistics
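Once generated (or uploaded), a dataset can be consumed like any other HuggingFace dataset. The repo id and column names below are illustrative assumptions; check the generated dataset card for the exact schema.

```python
# Load a GRAID-generated dataset from the HuggingFace Hub. The repo id
# and column names are illustrative assumptions; see your dataset card.
from datasets import load_dataset

ds = load_dataset("your-org/bdd-reasoning-dataset", split="train")
example = ds[0]
example["image"].show()  # a PIL.Image, ready for vision-language models
print(example["question"], "->", example["answer"])
```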
|  | Detectron2 | MMDetection | Ultralytics |
|---|---|---|---|
| Object Detection | ✓ | ✓ | ✓ |
| Instance Segmentation | ✓ | ✓ | ✓ |
| WBF Ensemble | ✓ | ✓ | ✓ |
|  | BDD100K | nuImages | Waymo |
|---|---|---|---|
| Object Detection | ✓ | ✓ | ✓ |
| Instance Segmentation | ✓ | ✓ | ✓ |
| HuggingFace Export | ✓ | ✓ | ✓ |
- Detectron2: `faster_rcnn_R_50_FPN_3x`, `retinanet_R_101_FPN_3x`
- MMDetection: `co_detr`, `dino`, `rtmdet`
- Ultralytics: `yolov8x`, `yolov10x`, `yolo11x`, `rtdetr-x`
This framework enables systematic evaluation of:
- Vision-Language Models: Generate targeted VQA benchmarks
- Object Detection Methods: Compare model performance on specific object types
- Reasoning Capabilities: Create challenging spatial and counting questions
- Domain Adaptation: Generate domain-specific evaluation sets
- Ensemble Methods: Evaluate fusion strategies across detection models
Generated datasets undergo comprehensive validation; an illustrative cleanup sketch follows the list:
- Model Verification: Automatic testing of model loading and inference
- Annotation Quality: Confidence score filtering and duplicate removal
- Metadata Integrity: Complete provenance tracking for reproducibility
- Format Compliance: COCO-standard annotations with HuggingFace compatibility
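To illustrate the annotation-quality step, here is a sketch of confidence filtering followed by IoU-based duplicate removal. The thresholds, helper names, and data layout are assumptions, not GRAID internals.

```python
# Illustrative annotation cleanup: drop low-confidence boxes, then
# remove near-duplicates by IoU. Assumptions, not GRAID internals.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def clean_detections(dets, conf_thr=0.5, dup_iou=0.9):
    """dets: list of (box, score, label); returns the filtered list."""
    kept = []
    for box, score, label in sorted(dets, key=lambda d: -d[1]):
        if score < conf_thr:
            continue  # below the confidence threshold
        if any(lbl == label and iou(box, b) > dup_iou for b, _, lbl in kept):
            continue  # near-duplicate of a higher-scoring box
        kept.append((box, score, label))
    return kept
```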
Interactive CLI: User-friendly prompts for dataset and model selection

```bash
uv run graid generate
```

Available Commands:
```bash
uv run graid --help           # Show help
uv run graid list-models      # List available models
uv run graid list-questions   # List available question types with parameters
uv run graid info             # Show project information
uv run graid generate-dataset # Modern HuggingFace generation

# Interactive features
uv run graid generate-dataset --interactive-questions  # Select questions interactively
uv run graid generate-dataset --list-questions         # Show available questions
```

GRAID is open source software licensed under the Apache License 2.0. This applies to both the GRAID framework code and any datasets generated using GRAID.
Important: When using GRAID with source datasets (BDD100K, Waymo, nuImages, etc.), you must also comply with the original source dataset license terms.