Hierarchical RAG architecture scaling to 693K chunks on consumer hardware (4GB VRAM). Features 3-address routing, hybrid vector+graph fusion, and SetFit classification.
Updated Feb 11, 2026 - Python
Adaptive Hybrid Quantization Framework for deploying 7B+ LLMs on low-VRAM devices (e.g., GTX 1050). Features surgical block alignment and Numba-accelerated inference.
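The framework's "surgical block alignment" isn't documented here, but the core idea behind most low-VRAM quantization schemes is blockwise absmax quantization: split the weights into small blocks and store 4-bit codes plus one scale per block. A minimal NumPy sketch (not this repo's implementation; function names are illustrative):

```python
import numpy as np

def quantize_blockwise(weights, block_size=64):
    """Absmax 4-bit blockwise quantization: each block stores int4 codes
    in [-7, 7] plus one float32 scale (the block's max absolute value)."""
    flat = weights.ravel().astype(np.float32)
    pad = (-len(flat)) % block_size          # pad so blocks divide evenly
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0                # avoid divide-by-zero on all-zero blocks
    codes = np.round(blocks / scales * 7).astype(np.int8)  # fits in 4 bits
    return codes, scales

def dequantize_blockwise(codes, scales, shape):
    """Reconstruct float weights from codes and per-block scales."""
    blocks = codes.astype(np.float32) / 7 * scales
    return blocks.ravel()[: int(np.prod(shape))].reshape(shape)

w = np.random.randn(8, 8).astype(np.float32)
codes, scales = quantize_blockwise(w, block_size=16)
w_hat = dequantize_blockwise(codes, scales, w.shape)
```

Each block's worst-case error is bounded by its scale divided by 14, which is why smaller blocks trade a little extra scale storage for noticeably better reconstruction.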
One-click Windows installer for Z-Image Turbo AI image generation. Optimized for low-VRAM GPUs (4GB+). Features Gradio web UI, automatic setup, and GGUF model support.
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
A ComfyUI workflow for low-VRAM users.
🎥 Generate high-quality videos on budget hardware with the Wan 2.2 14B Low-VRAM Workflow for ComfyUI, optimized for smooth performance and quick results.
Notebooks and workflows configured to run Wan 2.2 Animate inference smoothly with ComfyUI on Kaggle T4 GPUs.
A privacy-first Generative AI pipeline for prototyping 3D-style game assets on consumer hardware. Optimized for low-VRAM (4GB) GPUs using PyTorch, Diffusers, and Streamlit.
Technical Showcase: 22B True-MoE Engine running on 6GB VRAM (GTX 1060). Demonstrates "Surgical" NF4 quantization, dynamic expert swapping, and the custom "Grace Hopper" pipeline.
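The repo's "surgical" swap logic isn't shown on this page, but dynamic expert swapping in a Mixture-of-Experts model generally reduces to a budget-capped cache: keep only the experts the router recently used resident in VRAM and evict the least-recently-used one when the budget is hit. A toy Python sketch under that assumption (class and parameter names are illustrative):

```python
from collections import OrderedDict

class ExpertCache:
    """Keep at most `budget` experts 'resident' (e.g. on the GPU);
    evict the least-recently-used expert when the budget is exceeded."""
    def __init__(self, budget, load_fn):
        self.budget = budget
        self.load_fn = load_fn        # loads expert weights (e.g. from CPU RAM)
        self.resident = OrderedDict() # insertion order tracks recency

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)   # mark as recently used
        else:
            if len(self.resident) >= self.budget:
                self.resident.popitem(last=False)  # evict least-recently-used
            self.resident[expert_id] = self.load_fn(expert_id)
        return self.resident[expert_id]

cache = ExpertCache(budget=2, load_fn=lambda i: f"weights[{i}]")
for eid in [0, 1, 0, 2]:          # router's expert picks, token by token
    cache.get(eid)
print(list(cache.resident))       # [0, 2] — expert 1 was evicted
```

Because only a few experts fire per token, a small resident set plus fast host-to-device transfer is what lets a 22B MoE run in a 6GB budget.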
A production-ready, frugal, sovereign AI system that orchestrates India's open-source language models to achieve state-of-the-art reasoning on consumer hardware through Test-Time Compute (TTC) and Cognitive Serialization.
Audit local LLM function calling and agentic reliability. Visual tool-use benchmarking for quantized models on YOUR hardware.
🚀 Run modern 7B LLMs on legacy 4GB GPUs without crashes, breaking the VRAM barrier for developers facing GPU limitations.