CABiNet (Context Aggregation Network) is a dual-branch convolutional neural network for real-time semantic segmentation. It delivers competitive accuracy at a significantly lower computational cost than state-of-the-art methods, and the architecture is specifically optimized for autonomous systems and real-time applications.
- High Performance: Achieves 75.9% mIoU on Cityscapes test set at 76 FPS (NVIDIA RTX 2080Ti)
- Edge Deployment: 8 FPS on Jetson Xavier NX for embedded applications
- Dual-Branch Architecture: Combines high-resolution spatial detailing with efficient context aggregation
- Lightweight Design: Reduced computational overhead through optimized global and local context blocks
- Multi-Scale Support: Effective feature extraction across different scales
| Dataset | mIoU | FPS (RTX 2080Ti) | FPS (Jetson Xavier NX) |
|---|---|---|---|
| Cityscapes | 75.9% | 76 | 8 |
| UAVid | 63.5% | 15 | - |
The CABiNet architecture employs a dual-branch design that balances spatial detail preservation against contextual understanding (a minimal sketch follows the list below):
- Spatial Branch: Maintains high-resolution features for precise boundary detection
- Context Branch: Lightweight global aggregation and local distribution blocks for capturing long-range and local dependencies
- Feature Fusion Module (FFM): Normalizes and selects optimal features for scene segmentation
- Deep Supervision: An auxiliary loss at the context-branch bottleneck encourages better representation learning
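For orientation, the sketch below shows how such a dual-branch layout fits together in PyTorch. It is a minimal, hypothetical stand-in, not the repository's implementation; all module shapes and names are illustrative (the real model lives in `src/models/cabinet.py`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchSketch(nn.Module):
    """Illustrative dual-branch segmenter; NOT the actual CABiNet code."""

    def __init__(self, n_classes: int = 19):
        super().__init__()
        # Spatial branch: shallow, keeps 1/8-resolution detail for boundaries
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, 3, 2, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, 2, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, 2, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
        )
        # Context branch: deeper, lower-resolution path standing in for the
        # backbone plus global-aggregation and local-distribution blocks
        self.context = nn.Sequential(
            nn.Conv2d(3, 64, 3, 4, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, 4, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
        )
        # Stand-in for the Feature Fusion Module: concat + 1x1 projection
        self.fuse = nn.Conv2d(256, n_classes, 1)

    def forward(self, x):
        detail = self.spatial(x)    # high-res features, 1/8 scale
        context = self.context(x)   # long-range context, 1/16 scale
        context = F.interpolate(context, size=detail.shape[2:],
                                mode="bilinear", align_corners=False)
        logits = self.fuse(torch.cat([detail, context], dim=1))
        return F.interpolate(logits, size=x.shape[2:],
                             mode="bilinear", align_corners=False)
```

For a 512×512 input, `detail` sits at 1/8 resolution and `context` at 1/16; fusion happens at the 1/8 scale and the logits are upsampled back to the input size.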
Comparison of semantic segmentation results on the Cityscapes validation set:
From top to bottom: Input RGB images, SwiftNet predictions, CABiNet predictions (red boxes highlight improvements), ground truth
Performance on the UAVid validation set for aerial imagery:
Columns: Input images, State-of-the-art predictions, CABiNet predictions (white boxes show improvements)
- Python 3.8 or higher
- CUDA-capable GPU (recommended)
- Conda or pip for package management
1. Clone the repository:

   ```bash
   git clone https://github.com/dronefreak/CABiNet.git
   cd CABiNet
   ```

2. Create and activate the environment:

   ```bash
   # Using conda with the provided environment file
   conda env create -f environment.yml
   conda activate cabinet

   # Or install into a local environment
   mkdir env/
   conda env create -f environment.yml --prefix env/cabinet
   conda activate env/cabinet
   ```

3. Install the package:

   ```bash
   pip install -e .
   ```

Alternatively, set up with pip and a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -e .
```

```
CABiNet/
├── src/
│   ├── models/                  # Neural network architectures
│   │   ├── cabinet.py           # Main CABiNet model implementation
│   │   ├── cab.py               # Context Aggregation Block (plug-and-play module)
│   │   ├── mobilenetv3.py       # MobileNetV3 backbone implementations
│   │   ├── layers/              # Shared layer components
│   │   └── constants.py         # Model configuration constants
│   ├── datasets/                # Data loading and preprocessing
│   │   ├── cityscapes.py        # Cityscapes dataset loader
│   │   ├── uavid.py             # UAVid dataset loader
│   │   └── transform.py         # Data augmentation pipeline
│   ├── scripts/                 # Training and evaluation scripts
│   │   ├── train.py             # Model training
│   │   ├── evaluate.py          # Model evaluation (single/multi-scale)
│   │   └── visualize.py         # Visualization and demo
│   └── utils/                   # Utility functions
│       ├── loss.py              # Loss functions (OHEM, Focal Loss)
│       ├── optimizer.py         # Custom optimizer with warmup
│       ├── logger.py            # Logging utilities
│       ├── profiler.py          # Performance profiling tools
│       └── exceptions.py        # Custom exception classes
├── configs/                     # Configuration files
│   ├── train.yaml               # Training configuration (Hydra)
│   ├── dataset/                 # Dataset-specific configs
│   │   ├── cityscapes.yaml
│   │   └── uavid.yaml
│   ├── model/                   # Model-specific configs
│   │   ├── mobilenetv3_large.yaml
│   │   └── mobilenetv3_small.yaml
│   ├── cityscapes_info.json     # Cityscapes label information
│   └── UAVid_info.json          # UAVid label information
├── tests/                       # Test suite
│   ├── unit/                    # Unit tests
│   ├── integration/             # Integration tests
│   └── conftest.py              # Shared test fixtures
├── legacy/                      # Legacy configuration files
└── .github/                     # GitHub workflows and documentation
    └── workflows/               # CI/CD pipelines
```
Models (`src/models/`)

- `cabinet.py`: Complete implementation of the CABiNet architecture, including the spatial branch, context branch, and feature fusion modules
- `cab.py`: Context Aggregation Block, a plug-and-play module that can be integrated into other PyTorch models (see the usage sketch below)
- `mobilenetv3.py`: MobileNetV3-Large and MobileNetV3-Small backbone implementations with pretrained weight loading
- `layers/common.py`: Reusable layer components (DepthwiseConv, DepthwiseSeparableConv)
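Because `cab.py` is plug-and-play, wiring it into another model might look like the sketch below. The class name and constructor arguments are assumptions made for illustration; check `src/models/cab.py` for the actual interface:

```python
import torch
from src.models.cab import ContextAggregationBlock  # class name assumed

# Enrich an intermediate feature map with aggregated context;
# the in_channels argument is a guess at the real signature.
features = torch.randn(1, 128, 64, 64)          # e.g. a backbone output
cab = ContextAggregationBlock(in_channels=128)  # hypothetical constructor
enriched = cab(features)                        # same spatial size expected
```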
Datasets (`src/datasets/`)

- `cityscapes.py`: Cityscapes dataset loader with label remapping and thread-safe preprocessing (remapping illustrated below)
- `uavid.py`: UAVid dataset loader with patch-based processing for large aerial images
- `transform.py`: Comprehensive data augmentation including geometric, photometric, and regularization transforms
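The label remapping mentioned for `cityscapes.py` collapses the raw Cityscapes ids onto the 19 training classes. Below is a minimal illustration using the standard Cityscapes id-to-trainId table; the loader's actual mapping is driven by `configs/cityscapes_info.json`:

```python
import numpy as np

# Standard Cityscapes raw id -> train id for the 19 evaluation classes;
# all other ids fall back to 255 (ignore label).
ID_TO_TRAINID = {7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7,
                 21: 8, 22: 9, 23: 10, 24: 11, 25: 12, 26: 13, 27: 14,
                 28: 15, 31: 16, 32: 17, 33: 18}

def remap_labels(label: np.ndarray) -> np.ndarray:
    """Map a raw Cityscapes label image to train ids (illustrative)."""
    out = np.full_like(label, 255)  # ignore label by default
    for raw_id, train_id in ID_TO_TRAINID.items():
        out[label == raw_id] = train_id
    return out
```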
Scripts (`src/scripts/`)

- `train.py`: Main training script with Hydra configuration, mixed-precision training, and evaluation
- `evaluate.py`: Multi-scale evaluation with sliding-window inference (core idea sketched below)
- `visualize.py`: Visualization script for generating prediction overlays
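For intuition, multi-scale evaluation amounts to running the model at several input scales and averaging the resized logits before the argmax. The sketch below shows only that core idea; it omits the sliding-window part and assumes the model returns raw logits:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_logits(model, image, scales=(0.75, 1.0, 1.25)):
    """Average class logits over multiple input scales (illustrative)."""
    _, _, h, w = image.shape
    total = 0.0
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s,
                               mode="bilinear", align_corners=False)
        logits = model(scaled)  # assumes raw logits are returned
        total = total + F.interpolate(logits, size=(h, w),
                                      mode="bilinear", align_corners=False)
    return total / len(scales)

# prediction = multi_scale_logits(model, image).argmax(dim=1)
```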
Utilities (`src/utils/`)

- `loss.py`: OHEM Cross-Entropy and Focal Loss implementations (OHEM sketched below)
- `optimizer.py`: Custom optimizer with polynomial learning-rate decay and warmup
- `profiler.py`: Performance profiling for inference time, memory usage, and FLOPs analysis
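As background on the OHEM loss: the idea is to average cross-entropy over only the hardest pixels. The sketch below is a common formulation with assumed defaults, not necessarily what `src/utils/loss.py` implements:

```python
import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, target, thresh=0.7, min_kept=100_000,
                       ignore_index=255):
    """Cross-entropy over the hardest pixels only (illustrative sketch).

    Pixels whose true-class probability is below `thresh` count as hard;
    at least `min_kept` pixels are always retained.
    """
    pixel_loss = F.cross_entropy(logits, target, ignore_index=ignore_index,
                                 reduction="none").view(-1)
    # Loss value corresponding to the probability threshold
    loss_thresh = -torch.log(torch.tensor(thresh, device=logits.device))
    sorted_loss, _ = torch.sort(pixel_loss, descending=True)
    n_min = min(min_kept, sorted_loss.numel())
    if sorted_loss[n_min - 1] > loss_thresh:
        kept = sorted_loss[sorted_loss > loss_thresh]  # many hard pixels
    else:
        kept = sorted_loss[:n_min]                     # keep hardest n_min
    return kept.mean()
```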
Configuration (`configs/`)

- `train.yaml`: Main training configuration with hyperparameters and paths (see the composition example below)
- `dataset/*.yaml`: Dataset-specific configurations (paths, preprocessing parameters)
- `model/*.yaml`: Model architecture configurations (backbone selection, feature dimensions)
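Since the configs are composed by Hydra, they can also be loaded programmatically, e.g. to inspect the resolved settings. A small sketch, assuming `configs/train.yaml` is the root config and Hydra ≥ 1.2 (`config_path` is resolved relative to the calling file):

```python
from hydra import compose, initialize

# Compose the root config with a dataset override, without running training
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="train", overrides=["dataset=uavid"])

print(cfg.dataset)  # resolved UAVid-specific settings
```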
To train on Cityscapes:

1. Download the dataset from the Cityscapes website:
   - `gtFine_trainvaltest.zip` (241 MB) - Ground-truth labels
   - `leftImg8bit_trainvaltest.zip` (11 GB) - RGB images

2. Extract and configure:

   ```bash
   # Extract datasets
   unzip gtFine_trainvaltest.zip -d data/cityscapes/
   unzip leftImg8bit_trainvaltest.zip -d data/cityscapes/

   # Update dataset path in configs/dataset/cityscapes.yaml
   ```

3. Start training:

   ```bash
   export CUDA_VISIBLE_DEVICES=0
   python src/scripts/train.py
   ```

To train on UAVid:

1. Download the dataset from the UAVid website (under the Downloads section).

2. Configure and train:

   ```bash
   # Update dataset path in configs/dataset/uavid.yaml
   python src/scripts/train.py dataset=uavid
   ```
Evaluate a trained model on the validation set:
```bash
# Single-scale evaluation
python src/scripts/evaluate.py --model-path experiments/model_best.pth

# Multi-scale evaluation (better accuracy, slower)
python src/scripts/evaluate.py --model-path experiments/model_best.pth --multi-scale
```

Generate prediction visualizations:

```bash
python src/scripts/visualize.py
```

Benchmark model performance:

```python
from src.utils.profiler import PerformanceProfiler
from src.models.cabinet import CABiNet

model = CABiNet(n_classes=19, mode="large")
profiler = PerformanceProfiler(model)

# Run comprehensive benchmark
results = profiler.run_full_benchmark(
    input_size=(1, 3, 512, 512),
    num_iterations=100,
)

print(f"Average FPS: {results['timing']['fps']:.2f}")
print(f"Peak Memory: {results['memory']['peak_mb']:.2f} MB")
```

Pretrained weights for the MobileNetV3 backbones are available in the `src/models/pretrained_backbones/` directory.
Note: Full CABiNet pretrained models on Cityscapes and UAVid will be available soon.
Run the test suite to verify installation:
```bash
# Run all tests
pytest tests/

# Run with coverage report
pytest tests/ --cov=src --cov-report=html

# Run specific test category
pytest tests/unit/         # Unit tests only
pytest tests/integration/  # Integration tests only
```

See `tests/README.md` for detailed testing documentation.
The project uses several tools to maintain code quality:
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
- pytest: Testing framework
Install pre-commit hooks to automatically check code quality:
```bash
pip install pre-commit
pre-commit install

# Run manually on all files
pre-commit run --all-files
```

Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:
- Setting up the development environment
- Code style and conventions
- Testing requirements
- Pull request process
If you find this work helpful, please consider citing our papers:
ICRA 2021:
```bibtex
@INPROCEEDINGS{9560977,
  author={Kumaar, Saumya and Lyu, Ye and Nex, Francesco and Yang, Michael Ying},
  booktitle={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  title={CABiNet: Efficient Context Aggregation Network for Low-Latency Semantic Segmentation},
  year={2021},
  pages={13517-13524},
  doi={10.1109/ICRA48506.2021.9560977}
}
```

ISPRS Journal 2021:

```bibtex
@article{YANG2021124,
  title={Real-time Semantic Segmentation with Context Aggregation Network},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={178},
  pages={124-134},
  year={2021},
  issn={0924-2716},
  doi={10.1016/j.isprsjprs.2021.06.006},
  url={https://www.sciencedirect.com/science/article/pii/S0924271621001647},
  author={Michael Ying Yang and Saumya Kumaar and Ye Lyu and Francesco Nex},
  keywords={Semantic segmentation, Real-time, Convolutional neural network, Context aggregation network}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
For questions, issues, or collaboration opportunities:
- Email: kumaar324@gmail.com
- Issues: Please use the GitHub issue tracker
- Pull Requests: Contributions are welcome via pull requests
This work was conducted at the University of Twente, Faculty of Geo-Information Science and Earth Observation (ITC).


