RAG Fallback Strategies - A production-ready Python library that adds intelligent fallback mechanisms to RAG (Retrieval-Augmented Generation) systems, preventing silent failures and improving answer quality.
Installation • Documentation • Examples • Contributing
Before: RAG systems return "Not found" even when relevant data exists
After: Automatic query variations find answers that initial queries miss
Before: No visibility into LLM costs, unexpected bills
After: Real-time cost tracking and budget enforcement
Before: User queries don't match document phrasing → no results
After: LLM-generated query variations increase retrieval success rate
Before: RAG systems return low-quality answers without retry
After: Confidence scoring with automatic retry on low-confidence results
- Multiple Fallback Strategies: Query variations, semantic expansion, re-ranking, and more
- Cost Awareness: Built-in token tracking and budget management
- Framework Agnostic: Works with LangChain, LlamaIndex, and custom retrievers
- Production Ready: Comprehensive error handling, logging, and metrics
- Configurable: Easy to customize and extend
- Open-Source First: Works completely free with HuggingFace, Ollama, and FAISS
- Transparent: See all intermediate steps, costs, and metrics
- Well Tested: Comprehensive examples and test coverage
# Basic installation
pip install ragfallback
# With open-source components (recommended for free usage)
pip install ragfallback[huggingface,sentence-transformers,faiss]
# With paid providers (optional)
pip install ragfallback[openai]

from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import create_huggingface_llm, create_open_source_embeddings, create_faiss_vector_store
from langchain.docstore.document import Document
# Python documentation content
documents = [
Document(
page_content="Python is a high-level programming language known for simplicity and readability. It supports multiple programming paradigms and has an extensive standard library.",
metadata={"source": "python_intro.pdf"}
)
]
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
retriever = AdaptiveRAGRetriever(vector_store=vector_store, llm=llm, embedding_model=embeddings)
result = retriever.query_with_fallback(question="What is Python?")
print(result.answer)

Output:
Python is a high-level programming language known for simplicity and readability.
💡 Note: The LLM answer comes from the HuggingFace Inference API; embeddings and vector similarity search run locally via sentence-transformers and FAISS.
All examples demonstrate production-ready implementations.
To see actual outputs, run any example:
python examples/open_source_example.py
python examples/huggingface_example.py
python examples/complete_example.py

Code:
from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import (
create_huggingface_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Python documentation content
documents = [
Document(
page_content="Python lists are mutable sequences created with square brackets: my_list = [1, 2, 3]. Methods include append() to add items, remove() to delete items, and len() to get length.",
metadata={"source": "python_lists.pdf"}
),
Document(
page_content="Python dictionaries store key-value pairs: person = {'name': 'Alice', 'age': 30}. Access values using keys: person['name']. Use get() method for safe access.",
metadata={"source": "python_dicts.pdf"}
),
]
# Create components (all free, no API keys!)
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
# Create retriever
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
fallback_strategy="query_variations",
max_attempts=3
)
# Query
result = retriever.query_with_fallback(
question="How do I create a list in Python?"
)
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Attempts: {result.attempts}")
print(f"Cost: ${result.cost:.4f}")Output:
Answer: Python lists are mutable sequences created with square brackets: my_list = [1, 2, 3].
Confidence: 92.00%
Attempts: 1
Cost: $0.0000
Note: Uses HuggingFace Inference API for query variations and answer generation. Confidence scores are calculated from document retrieval results.
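The library computes confidence internally; as a rough illustration of how a retrieval-based score can be derived, here is a minimal sketch that scores a query against retrieved document embeddings using cosine similarity. The function name and formula are illustrative assumptions, not ragfallback's actual implementation.

```python
# Hypothetical confidence scoring from embedding similarity -- illustrative
# only; ragfallback's real scoring logic is internal to the library.
import numpy as np

def retrieval_confidence(query_vec: np.ndarray, doc_vecs: list) -> float:
    """Best cosine similarity between the query and retrieved docs, clamped to [0, 1]."""
    if not doc_vecs:
        return 0.0
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    best = max(cosine(query_vec, np.asarray(d)) for d in doc_vecs)
    return max(0.0, min(1.0, best))
```

A score below the retriever's min_confidence threshold is what triggers another fallback attempt.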
Code:
from ragfallback import AdaptiveRAGRetriever, CostTracker, MetricsCollector
from ragfallback.utils import (
create_openai_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Example documents (metadata values are just for tracking - not actual files)
documents = [
Document(page_content="Product X costs $99.", metadata={"source": "pricing.pdf"}),
]
# Setup cost tracking
cost_tracker = CostTracker(budget=5.0) # $5 budget
metrics = MetricsCollector()
# Create components
embeddings = create_open_source_embeddings() # Free
vector_store = create_faiss_vector_store(documents, embeddings) # Free
llm = create_openai_llm(model="gpt-4o-mini") # Paid (requires OPENAI_API_KEY)
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
cost_tracker=cost_tracker,
metrics_collector=metrics,
max_attempts=3
)
# Query multiple times
questions = [
"What is the price of Product X?",
"How much does Product X cost?",
]
for question in questions:
    result = retriever.query_with_fallback(question=question, enforce_budget=True)
    print(f"Q: {question}")
    print(f"A: {result.answer}\n")
# Display metrics
stats = metrics.get_stats()
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")
# Display cost report
report = cost_tracker.get_report()
print(f"Total Cost: ${report['total_cost']:.4f}")
print(f"Budget Remaining: ${report['budget_remaining']:.4f}")Output:
Q: What is the price of Product X?
A: Product X costs $99.
Q: How much does Product X cost?
A: Product X costs $99.
Success Rate: 100.00%
Average Confidence: 0.90
Total Cost: $0.0024
Budget Remaining: $4.9976
Note: Cost tracking uses token counts from LLM API calls. Metrics are collected from query executions.
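To make the cost figures above concrete, here is a back-of-the-envelope sketch of token-based pricing. The rate table and helper function are illustrative assumptions, not ragfallback's internal pricing data.

```python
# Illustrative token-based cost math; the per-token rates here are example
# values, not ragfallback's built-in pricing table.
PRICE_PER_1M_TOKENS = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}  # USD, assumed

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rates = PRICE_PER_1M_TOKENS[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# A query that used 1,200 prompt tokens and 150 completion tokens:
print(f"${call_cost('gpt-4o-mini', 1200, 150):.6f}")  # $0.000270
```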
Code:
from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import (
create_huggingface_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Example documents (metadata is just for tracking - not actual files)
documents = [
Document(
page_content="The CEO of Acme Corp is John Smith.",
metadata={"source": "leadership.pdf"}
),
]
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
max_attempts=3,
min_confidence=0.7
)
# Query with different phrasings
result = retriever.query_with_fallback(
question="Who leads Acme Corp?",
return_intermediate_steps=True
)
print(f"Final Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Total Attempts: {result.attempts}\n")
# Show intermediate steps
if result.intermediate_steps:
    print("Intermediate Steps:")
    for step in result.intermediate_steps:
        print(f"  Attempt {step['attempt']}: '{step['query']}'")
        print(f"    Confidence: {step['confidence']:.2%}")

Output:
Final Answer: The CEO of Acme Corp is John Smith.
Confidence: 88.00%
Total Attempts: 2
Intermediate Steps:
Attempt 1: 'Who leads Acme Corp?'
Confidence: 75.00%
Attempt 2: 'Who is the leader of Acme Corp?'
Confidence: 88.00%
Note: Query variations are generated by LLM calls. Each attempt uses a different query formulation, and confidence is calculated from document retrieval results.
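As a sketch of how such variations can be produced with any LangChain-compatible LLM, the snippet below prompts the model for rewrites and splits the response by line. The prompt wording and parsing are assumptions; ragfallback's own variation strategy may differ.

```python
# Minimal query-variation sketch, assuming a LangChain LLM with .invoke();
# the prompt and line-based parsing are illustrative, not ragfallback's own.
def generate_variations(llm, question: str, n: int = 3) -> list:
    prompt = (
        f"Rewrite the following question {n} different ways, preserving its "
        f"meaning. Return one rewrite per line.\n\nQuestion: {question}"
    )
    response = llm.invoke(prompt)
    text = getattr(response, "content", response)  # chat vs. completion models
    return [line.strip() for line in text.splitlines() if line.strip()]
```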
Code:
from ragfallback import AdaptiveRAGRetriever, CostTracker, MetricsCollector
from ragfallback.utils import (
create_huggingface_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Step 1: Prepare documents (metadata is just for tracking - not actual files)
documents = [
Document(
page_content="Acme Corp revenue: $10M. Employees: 50. Founded: 2020.",
metadata={"source": "company_data.pdf"}
),
]
# Step 2: Create components
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
# Step 3: Setup tracking
cost_tracker = CostTracker()
metrics = MetricsCollector()
# Step 4: Create retriever
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
cost_tracker=cost_tracker,
metrics_collector=metrics,
fallback_strategy="query_variations",
max_attempts=3,
min_confidence=0.7
)
# Step 5: Query
result = retriever.query_with_fallback(
question="What is Acme Corp's revenue?",
context={"company": "Acme Corp"},
return_intermediate_steps=True
)
# Step 6: Display results
print("="*60)
print("QUERY RESULTS")
print("="*60)
print(f"Question: What is Acme Corp's revenue?")
print(f"Answer: {result.answer}")
print(f"Source: {result.source}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Attempts: {result.attempts}")
print(f"Cost: ${result.cost:.4f}")
# Step 7: Display metrics
print("\n" + "="*60)
print("METRICS")
print("="*60)
stats = metrics.get_stats()
print(f"Total Queries: {stats['total_queries']}")
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")Output:
============================================================
QUERY RESULTS
============================================================
Question: What is Acme Corp's revenue?
Answer: Acme Corp revenue: $10M.
Source: company_data.pdf
Confidence: 92.00%
Attempts: 1
Cost: $0.0000
============================================================
METRICS
============================================================
Total Queries: 1
Success Rate: 100.00%
Average Confidence: 0.92
Note: Metrics are collected from query executions. Confidence scores are calculated using document retrieval and answer quality assessment.
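For intuition, the stats above could be produced by a simple aggregator like the sketch below. This stand-in is illustrative and does not mirror MetricsCollector's internals.

```python
# Illustrative aggregation of per-query results into summary stats; a
# hypothetical stand-in for MetricsCollector, not its actual implementation.
from dataclasses import dataclass, field

@dataclass
class SimpleMetrics:
    confidences: list = field(default_factory=list)
    successes: int = 0

    def record(self, confidence: float, success: bool) -> None:
        self.successes += int(success)
        self.confidences.append(confidence)

    def get_stats(self) -> dict:
        n = len(self.confidences)
        return {
            "total_queries": n,
            "success_rate": self.successes / n if n else 0.0,
            "avg_confidence": sum(self.confidences) / n if n else 0.0,
        }
```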
Build a research assistant that answers questions about companies:
retriever = AdaptiveRAGRetriever(...)
result = retriever.query_with_fallback(
question="What is the company's revenue?",
context={"company": "Acme Corp"}
)

Use Case: Company research, competitive intelligence, due diligence
Answer questions from large document collections:
retriever = AdaptiveRAGRetriever(...)
result = retriever.query_with_fallback(
question="What are the key findings?",
return_intermediate_steps=True
)

Use Case: Legal document analysis, research papers, technical documentation
Production systems with budget limits:
cost_tracker = CostTracker(budget=10.0)
retriever = AdaptiveRAGRetriever(
...,
cost_tracker=cost_tracker
)
result = retriever.query_with_fallback(
question="...",
enforce_budget=True
)

Use Case: Production APIs, SaaS applications, high-volume systems
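Beyond enforce_budget, you can guard a query loop yourself using the documented CostTracker report. The $0.01 cutoff below is an arbitrary example threshold, and the snippet assumes the cost_tracker, retriever, and questions objects from the cost-tracking example above.

```python
# Stop querying once the tracked budget runs low; the $0.01 floor is an
# example threshold, not a library default.
for question in questions:
    if cost_tracker.get_report()["budget_remaining"] < 0.01:
        print("Budget nearly exhausted; stopping.")
        break
    result = retriever.query_with_fallback(question=question, enforce_budget=True)
    print(result.answer)
```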
Completely free setup using only open-source components:
# All free, no API keys!
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)

Use Case: Personal projects, learning, prototyping, privacy-sensitive applications
Note: The PDF file references in examples (like "pricing.pdf") are just example metadata values, not actual files. They're used to demonstrate how document metadata works.
In practice, you'd load documents from various sources:
from langchain.docstore.document import Document
from langchain_community.document_loaders import PyPDFLoader, TextLoader
# Option 1: Load from actual PDF files
loader = PyPDFLoader("path/to/your/document.pdf")
documents = loader.load()
# Option 2: Load from text files
loader = TextLoader("path/to/your/document.txt")
documents = loader.load()
# Option 3: Create Document objects manually (as shown in examples)
documents = [
Document(
page_content="Your content here...",
metadata={"source": "your_file.pdf", "page": 1}
)
]
# Option 4: Load from web pages, databases, etc.
# Use any LangChain document loader

The `metadata["source"]` field is just for tracking where documents came from; it doesn't need to point to an actual file.
The main retriever class:
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
fallback_strategy="query_variations", # Default
max_attempts=3, # Max retry attempts
min_confidence=0.7, # Minimum confidence threshold
cost_tracker=cost_tracker, # Optional cost tracking
metrics_collector=metrics # Optional metrics
)

Result object with metadata:
result = retriever.query_with_fallback(question="...")
# Access properties
result.answer # The answer string
result.source # Source document
result.confidence # Confidence score (0.0-1.0)
result.attempts # Number of attempts made
result.cost # Cost in USD
result.intermediate_steps  # List of all attempts (if return_intermediate_steps=True)

Track and manage costs:
cost_tracker = CostTracker(budget=10.0) # $10 budget
# After queries
report = cost_tracker.get_report()
print(f"Total Cost: ${report['total_cost']:.4f}")
print(f"Budget Remaining: ${report['budget_remaining']:.4f}")Track performance metrics:
metrics = MetricsCollector()
# After queries
stats = metrics.get_stats()
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")Open-Source (Free, No API Keys):
- ✅ HuggingFace Inference API - Use HuggingFace models via API (free tier available, easiest!)
- ✅ HuggingFace Transformers - Run HuggingFace models locally (requires transformers & torch)
- ✅ Ollama - Run LLMs locally (llama3, llama2, mistral, etc.)
Paid (Require API Keys):
- ✅ OpenAI - GPT-4, GPT-3.5, GPT-4o-mini
- ✅ Anthropic - Claude 3 (Opus, Sonnet, Haiku)
- ✅ Cohere - Command models
Open-Source (Free, No API Keys):
- ✅ HuggingFace - sentence-transformers models (all-MiniLM-L6-v2, etc.)
- ✅ Ollama - Local embedding models (nomic-embed-text)
Paid (Require API Keys):
- ✅ OpenAI - text-embedding-3-small, text-embedding-3-large
Open-Source (Free, Local):
- ✅ FAISS - Facebook AI Similarity Search (local, fast)
- ✅ ChromaDB - Open-source embedding database (local)
- ✅ Qdrant - Vector database (can run locally or in the cloud)
Paid (Cloud Services):
- ✅ Pinecone - Managed vector database (requires API key)
- ✅ Weaviate - Can be self-hosted or cloud
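Because the retriever is framework agnostic, swapping providers should only change the component setup. As an illustration, the sketch below wires in Chroma and Ollama through langchain_community; it assumes the documents and embeddings from the earlier examples, and that AdaptiveRAGRetriever accepts these LangChain objects directly.

```python
# Sketch: swapping in Chroma (local vector store) and Ollama (local LLM);
# assumes `documents` and `embeddings` from the earlier examples exist.
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from ragfallback import AdaptiveRAGRetriever

vector_store = Chroma.from_documents(documents, embeddings)  # local, no API key
llm = Ollama(model="llama3")  # requires a running Ollama server

retriever = AdaptiveRAGRetriever(
    vector_store=vector_store,
    llm=llm,
    embedding_model=embeddings,
)
```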
- legal_document_analysis.py - Legal contract analysis with ambiguous queries, cross-references, high-stakes decisions
- medical_research_synthesis.py - Medical research synthesis with conflicting studies, evidence levels, source attribution
- financial_risk_analysis.py - Financial risk assessment with regulatory compliance, multi-factor analysis, budget tracking
- multi_domain_synthesis.py - Enterprise knowledge base with cross-domain queries, priority resolution, complex reasoning
- python_docs_example.py - Python documentation Q&A
- tech_support_example.py - Technical support knowledge base
- complete_example.py - Full feature demonstration
- huggingface_example.py - Machine learning documentation Q&A
- open_source_example.py - Open-source setup example
- paid_llm_example.py - Paid LLM integration
- basic_usage.py - Basic usage example
Option 1: HuggingFace Inference API (Easiest - No Local Models!)
# Install dependencies
pip install ragfallback[huggingface,sentence-transformers,faiss]
# Run HuggingFace example
python examples/huggingface_example.py

Option 2: Ollama (Local)
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3
# Install dependencies
pip install ragfallback[sentence-transformers,faiss]
# Run example
python examples/open_source_example.py

Option 3: Local HuggingFace Models
# Install with transformers support
pip install ragfallback[transformers,sentence-transformers,faiss]
# Run HuggingFace example (choose local mode)
python examples/huggingface_example.py

No API keys needed!
| Feature | LangChain MultiQueryRetriever | ragfallback |
|---|---|---|
| Query Variations | ✅ | ✅ |
| Fallback Strategies | ❌ | ✅ (multiple strategies) |
| Cost Tracking | ❌ | ✅ |
| Budget Management | ❌ | ✅ |
| Confidence Scoring | ❌ | ✅ |
| Metrics Collection | ❌ | ✅ |
| Framework Agnostic | ❌ | ✅ |
| Open-Source First | ❌ | ✅ |
from ragfallback.strategies.base import FallbackStrategy
from langchain_core.language_models import BaseLanguageModel
class MyCustomStrategy(FallbackStrategy):
    def generate_queries(self, original_query, context, attempt, llm):
        # Your custom logic
        return [original_query + " expanded"]
retriever = AdaptiveRAGRetriever(
...,
fallback_strategies=[MyCustomStrategy()]
)

# Paid LLM + Open-source vector store + Open-source embeddings
llm = create_openai_llm(model="gpt-4o-mini") # Paid
embeddings = create_open_source_embeddings() # Free
vector_store = create_faiss_vector_store(documents, embeddings)  # Free

Contributions are welcome! Please read our Contributing Guidelines before submitting a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
MIT License - See LICENSE file for details.
See CHANGELOG.md for version history and changes.
Built on top of LangChain and inspired by production RAG systems.
# 1. Install library
pip install -e .
# 2. Verify installation (tests all core functionality)
python verify_library.py
# 3. Run all examples
python run_all_examples.py

Expected: All 6 verification tests pass ✅
# Install test dependencies
pip install -r requirements-dev.txt
# Run all tests
pytest tests/ -v
# Run with coverage
pytest --cov=ragfallback --cov-report=html

Simple Examples (No API keys needed):
python examples/python_docs_example.py
python examples/tech_support_example.py

Advanced Examples (Require HuggingFace Inference API - free tier):
python examples/legal_document_analysis.py
python examples/medical_research_synthesis.py
python examples/financial_risk_analysis.py
python examples/multi_domain_synthesis.py

For the complete installation and testing guide, see INSTALL_AND_RUN.md.
Made with ❤️ for the RAG community