
A high-throughput, full-stack open AI research lab removing the critical bottlenecks of AI systems.
- [2025/12] 🚀 We released a PyTorch-native INT8 quantization API for TorchAO, enabling INT8 inference with up to 4× memory reduction. Read the report.
- [2025/12] 🌐 We welcomed our First Batch of remote researchers working on open-source AI infrastructure.
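The memory saving behind the INT8 release can be illustrated with a minimal pure-Python sketch (hypothetical helper names, not the TorchAO API): each float32 weight (4 bytes) is mapped to an int8 value (1 byte) plus one shared scale, which is where the up-to-4× reduction comes from.

```python
# Symmetric per-tensor INT8 weight quantization, illustrative sketch only.
# Real implementations operate on tensors and per-channel scales.

def quantize_int8(weights):
    """Map float weights to int8 values plus a shared float scale."""
    scale = max(abs(w) for w in weights) / 127.0  # symmetric range [-127, 127]
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    """Recover approximate float weights."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)   # q == [50, -127, 2, 100]
approx = dequantize_int8(q, scale)
# Each int8 value needs 1 byte vs. 4 bytes for float32: a 4x memory reduction.
```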
AER Labs is an open AI research lab focused on advancing AI systems across the full stack. Our vision is not just to optimize AI systems, but to remove the critical bottlenecks that prevent widespread adoption.
We work on:
- GPU Orchestration: Efficient resource management for distributed GPU workloads.
- Inference Optimization: KV cache optimization, quantization, and efficient attention mechanisms.
- Agentic AI: Workflow orchestration and efficient serving for autonomous applications.
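The KV-cache idea behind much of this inference work can be sketched in a few lines (a toy single-head example with hypothetical helpers, not production code): each decoded token's key/value pair is appended to a cache, so attention over the prefix is never recomputed.

```python
import math

def attend(query, keys, values):
    """Single-head scaled dot-product attention over cached keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    d_v = len(values[0])
    return [sum(p * v[i] for p, v in zip(probs, values)) for i in range(d_v)]

kv_cache = {"keys": [], "values": []}

def decode_step(query, key, value):
    """Append the new token's K/V to the cache, then attend over the cache."""
    kv_cache["keys"].append(key)
    kv_cache["values"].append(value)
    return attend(query, kv_cache["keys"], kv_cache["values"])

out = decode_step([1.0, 0.0], [1.0, 0.0], [2.0, 3.0])
# With a single cached token, attention returns that token's value: [2.0, 3.0]
```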
AER Labs drives research in critical infrastructure areas:
- State-of-the-art Inference Optimization
  - High-throughput serving infrastructure.
  - Custom CUDA/Triton kernels for quantization.
  - Support for dynamic activation quantization (INT8×INT8).
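Dynamic activation quantization (INT8×INT8) can be reduced to a short sketch, assuming symmetric per-tensor scales (the helper names are hypothetical): weights are quantized ahead of time, activations are quantized per input at runtime, and the integer accumulator is rescaled by the product of the two scales.

```python
def quantize(xs):
    """Symmetric per-tensor quantization of floats to int8 plus a scale."""
    scale = max(abs(x) for x in xs) / 127.0
    return [round(x / scale) for x in xs], scale

def int8_dot(activations, weight_q, weight_scale):
    """INT8xINT8 dot product with dynamic (per-call) activation quantization."""
    act_q, act_scale = quantize(activations)           # quantized at runtime
    acc = sum(a * w for a, w in zip(act_q, weight_q))  # integer accumulate
    return acc * act_scale * weight_scale              # rescale back to float

weight_q, weight_scale = quantize([0.2, -0.4, 0.6])    # quantized once, offline
y = int8_dot([1.0, 0.5, -0.25], weight_q, weight_scale)
# y approximates the float dot product 1.0*0.2 + 0.5*(-0.4) + (-0.25)*0.6 = -0.15
```

Real kernels do the integer accumulation in INT32 on tensor cores; the rescale step is identical in spirit.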
- Production-Grade Generative AI
  - LLM Serving: Optimized serving for large language models and multimodal systems.
  - Vertical AI: Domain-specific solutions for quantitative finance and cybersecurity.
- Flexible Research Frameworks
  - Deep Learning: Novel architectures and training techniques.
  - Benchmarking: Comprehensive comparison of serving frameworks (vLLM, SGLang, TGI).
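The core of a serving benchmark reduces to a tokens-per-second loop. A hedged sketch of that shape (`generate` is a placeholder standing in for a real framework client such as a vLLM, SGLang, or TGI endpoint; real harnesses also measure latency percentiles and run requests concurrently):

```python
import time

def generate(prompt, max_tokens=32):
    """Placeholder for a framework inference call; sleeps to simulate latency."""
    time.sleep(0.001)
    return ["tok"] * max_tokens

def throughput(prompts, max_tokens=32):
    """Return generated tokens per second across a batch of prompts."""
    start = time.perf_counter()
    total = sum(len(generate(p, max_tokens)) for p in prompts)
    elapsed = time.perf_counter() - start
    return total / elapsed

tps = throughput(["hello"] * 8)
```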
We are actively recruiting for our next batch of researchers and developers.
- Apply Now: Join the First Batch
- Speaker Series: Join sessions with experts from NVIDIA, AMD, and ByteDance. View Events
- Read our Research: Explore our technical deep dives. Blog
For questions and collaborations, please reach out:
- General Inquiries: contact@aerlabs.tech