A robust template for debugging and testing High-Performance APIs

daretechie/api-reliability-suite

API Reliability & Debugging Suite


😀 The Reality

"Our API crashed at 3 AM and we have no idea why." "This endpoint is slow, but we can't figure out where the bottleneck is."

Most APIs are built without proper observability, security, or testing, which makes debugging a nightmare when it matters most.

✅ The Solution

This suite is a production-ready blueprint for building "High-Reliability" APIs. It moves beyond basic tutorials to deliver an Enterprise-Grade Product.

🚀 Key Capabilities

| Feature | Description |
| --- | --- |
| 🔍 Observable | Every request is traced via OpenTelemetry. Logs are structured (JSON) and correlated across services. |
| 🤖 Intelligent | Integrated AI Agent (Groq/OpenAI) analyzes error logs and suggests root-cause fixes automatically. |
| 🛡️ Secure | Enterprise protection patterns including Rate Limiting (Token Bucket) and JWT Authentication. |
| 🔗 Distributed | Trace Propagation is built-in. Context is preserved from Client → API → External Services. |
| 💥 Resilient | Circuit Breakers prevent cascading failures when downstream dependencies go offline. |
| 📊 Visual | Includes a production-grade Grafana Dashboard for real-time SLO & Error Budget tracking. |
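
To make the Token Bucket idea concrete, here is a minimal, standard-library sketch. It is not the code in `src/core/rate_limit.py`; the class name `TokenBucket`, the `is_allowed` helper, and the in-memory `buckets` store are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_rate` tokens per second."""

    capacity: float
    refill_rate: float
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        # Start full so a new client can burst immediately.
        self.tokens = self.capacity
        self.updated_at = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Spend one token and return True if the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical per-IP store; a real deployment would likely use shared storage.
buckets: Dict[str, TokenBucket] = {}


def is_allowed(client_ip: str, capacity: float = 5, refill_rate: float = 1.0) -> bool:
    """Look up (or create) the bucket for this IP and try to spend a token."""
    bucket = buckets.setdefault(client_ip, TokenBucket(capacity, refill_rate))
    return bucket.allow()
```

A request handler would call `is_allowed(client_ip)` and answer with HTTP 429 when it returns `False`.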

πŸ›οΈ Architecture: Hexagonal (Ports & Adapters)

This project follows Hexagonal Architecture to decouple business logic from infrastructure. The core stays testable and maintainable, and adapters can be swapped without touching it.

```mermaid
graph TD
    subgraph "External World"
        Client[Client Request]
        Jaeger[Jaeger Tracing]
        Prom[Prometheus]
    end

    subgraph "Infrastructure Adapters (Driving)"
        API["FastAPI Entrypoint<br/>(src.main)"]
    end

    subgraph "Core Application (The Domain)"
        direction TB
        Service["Auth Service<br/>(src.services.auth_service)"]
        Domain["Domain Models<br/>(src.domain.models)"]
    end

    subgraph "Infrastructure Adapters (Driven)"
        Repo["User Repository<br/>(src.infrastructure.user_repository)"]
        LLM["AI Adapter<br/>(src.core.llm)"]
        Logger["Structlog Adapter<br/>(src.core.logging)"]
        HTTP["Instrumented HTTP Client<br/>(src.infrastructure.http_client)"]
    end

    subgraph "Observability Sidecars"
        OTel[OpenTelemetry Collector]
        Metrics[Metrics Instrumentator]
    end

    Client --> API
    API --> Service
    Service --> Domain
    Service --> Repo

    API -.-> OTel
    API -.-> Metrics
    LLM -.-> API
    HTTP -.-> OTel

    OTel --> Jaeger
    Metrics --> Prom
```
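
The ports-and-adapters split in the diagram can be sketched in a few lines of plain Python. The class names below (`User`, `UserRepository`, `InMemoryUserRepository`, `AuthService`) are hypothetical and only mirror the boxes in the diagram; the real modules may be organized differently.

```python
import hmac
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class User:
    """Domain model. Plain-text password only to keep the sketch short."""

    username: str
    password: str


class UserRepository(Protocol):
    """Port: what the core needs from storage, with no framework types."""

    def get(self, username: str) -> Optional[User]:
        ...


class InMemoryUserRepository:
    """Driven adapter: one possible implementation of the port."""

    def __init__(self, users: List[User]) -> None:
        self._users = {user.username: user for user in users}

    def get(self, username: str) -> Optional[User]:
        return self._users.get(username)


class AuthService:
    """Core logic: depends on the port only, never on FastAPI or a database."""

    def __init__(self, repo: UserRepository) -> None:
        self._repo = repo

    def authenticate(self, username: str, password: str) -> bool:
        user = self._repo.get(username)
        # Constant-time comparison avoids leaking how many characters matched.
        return user is not None and hmac.compare_digest(user.password, password)


service = AuthService(InMemoryUserRepository([User("demo", "secret123")]))
print(service.authenticate("demo", "secret123"))
```

Because `AuthService` only sees the `UserRepository` port, a unit test can pass in the in-memory adapter while production wiring passes in a database-backed one.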

📂 The Codebase Explained

Features you might miss if you don't look closely:

| Path | Feature | Why it matters |
| --- | --- | --- |
| src/core/middleware.py | Correlation ID | Injects X-Correlation-ID into every request. Connects logs across microservices. |
| src/core/config.py | Fail-Fast Settings | Uses Pydantic v2 to validate env vars on startup. If a required key is missing, the app exits at startup instead of failing silently later. |
| src/infrastructure/http_client.py | Distributed Tracing | An instrumented client that automatically passes trace headers to external APIs (OpenAI, Stripe, etc.). |
| src/core/circuit_breaker.py | Circuit Breaker | Tracks failures of external calls. When a dependency is down, the breaker "trips" and returns cached data immediately instead of letting requests hang. |
| src/core/rate_limit.py | Token Bucket Limiter | Prevents abuse by limiting requests per IP. |
| src/services/ | Hexagonal Logic | Business logic is pure Python. It doesn't know what "FastAPI" is, making it easy to test. |
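
The circuit breaker behavior described in the table (trip after repeated failures, then serve cached data) can be sketched as follows. This is an illustration, not the contents of `src/core/circuit_breaker.py`; the class name, parameter names, and defaults are assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Open after `failure_threshold` consecutive failures, retry after `reset_timeout`."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._last_good: Any = None

    @property
    def is_open(self) -> bool:
        """True while calls should be short-circuited."""
        if self._opened_at is None:
            return False
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func: Callable[[], Any]) -> Any:
        if self.is_open:
            # Fail fast: do not touch the dependency, serve the cached value.
            return self._last_good
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return self._last_good
        # Success closes the breaker and refreshes the cache.
        self._failures = 0
        self._opened_at = None
        self._last_good = result
        return result
```

Once `reset_timeout` has passed, the next call goes through as a trial: a success closes the breaker, a failure reopens it.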

🚀 Day 2 Operations: Monitoring & Tracing

1. The "WOW" Dashboard (Real-Time)

We provide a pre-built Grafana Dashboard (infra/grafana/dashboard.json) that tracks:

  • SLO Tracking: Error Budget Burn Rate.
  • Experience: P99 latency (the response time that 99% of requests stay at or below).
  • Resilience: Live Circuit Breaker status.
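
For readers new to these metrics, the arithmetic behind the first two panels can be sketched in Python. The dashboard itself computes them from Prometheus data; the function names and sample numbers here are illustrative assumptions.

```python
from typing import List


def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly on schedule.
    """
    if total == 0:
        return 0.0
    allowed_error_rate = 1.0 - slo_target
    return (errors / total) / allowed_error_rate


def p99(latencies_ms: List[float]) -> float:
    """Nearest-rank 99th percentile: 99% of samples are at or below this value."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


# 5 failures in 1000 requests against a 99.9% SLO burns budget about 5x too fast.
print(round(burn_rate(errors=5, total=1000, slo_target=0.999), 2))
print(p99([float(ms) for ms in range(1, 101)]))
```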

How to Import:

  1. Open Grafana (http://localhost:3030 - admin/admin).
  2. Dashboards → New → Import.
  3. Upload infra/grafana/dashboard.json.
  4. Select the Prometheus data source and click Load.

2. Distributed Tracing (Jaeger)

See the lifecycle of every request.

  1. Spin up Infrastructure: docker compose up -d
  2. Generate Traffic: make run and hit endpoints.
  3. View Traces: http://localhost:16686

Run Full Stack (Recommended)

```bash
# Build image
make docker-build

# Start API + Prometheus + Jaeger + Grafana
make stack-up
```

Stop All Services

```bash
make stack-down
```

API Endpoints

| Endpoint | Auth | Resilience | Description |
| --- | --- | --- | --- |
| GET /health | ❌ | ✅ Rate Limit | Health check |
| GET /slow | ❌ | ❌ | Simulates slow request (tracing demo) |
| POST /login | ❌ | ❌ | Get JWT token (demo/secret123) |
| GET /protected | ✅ | ❌ | Protected route (requires JWT) |
| GET /external-api | ❌ | ✅ Circuit Breaker | Demonstrates fault tolerance & fallback |
| GET /debug/summarize-errors | ✅ | ✅ Rate Limit | AI analyzes logs and returns insights 🤖 |
| GET /metrics | ❌ | ❌ | Prometheus metrics for Grafana 📊 |
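
The `/login` and `/protected` pair boils down to issuing a signed token and verifying it on later requests. The sketch below shows that flow with only the standard library, assuming HS256 signing and hypothetical function names; the README does not say which JWT library or algorithm the project uses, and production code should rely on a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(subject: str, secret: str, ttl_seconds: int = 900,
                now: Optional[float] = None) -> str:
    """What a login handler would return after checking credentials."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    """What a protected route would run before doing any work."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(header_b64 + "." + payload_b64, secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

A client would send the token from `/login` in an `Authorization: Bearer <token>` header, which is the usual convention; check the project's OpenAPI docs for the exact request shape.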

🧠 AI-Powered Debugging

This project includes a Self-Healing AI Agent that reads app.json logs and provides actionable insights.

How to use:

  1. Set an API key in .env: GROQ_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY.
  2. Hit the /debug/summarize-errors endpoint (requires auth).
  3. Receive a JSON summary of root causes and fixes.
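
Before any model is involved, the agent has to turn raw log lines into something compact. The sketch below shows one way that pre-processing step might look. It assumes JSON-lines logs with `level` and `event` keys (structlog's conventional names); the function names and prompt wording are hypothetical, and the actual agent in `src/core/llm` may work differently.

```python
import json
from collections import Counter
from typing import Iterable


def collect_errors(lines: Iterable[str]) -> Counter:
    """Count error events in JSON-lines logs, skipping lines that do not parse."""
    counts: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if str(record.get("level", "")).lower() in {"error", "critical"}:
            counts[str(record.get("event", "unknown"))] += 1
    return counts


def build_prompt(counts: Counter, top_n: int = 5) -> str:
    """Summarize the most frequent errors into a prompt for the model."""
    rows = [f"- {event} (x{n})" for event, n in counts.most_common(top_n)]
    return "Suggest likely root causes and fixes for these errors:\n" + "\n".join(rows)


sample = [
    '{"level": "error", "event": "db timeout"}',
    '{"level": "info", "event": "request finished"}',
    '{"level": "error", "event": "db timeout"}',
]
print(build_prompt(collect_errors(sample)))
```

Sending counts rather than the raw file keeps the prompt small and avoids shipping full request payloads to a third-party API.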

👷 Developer Tools

This project uses Ruff for linting and Pre-Commit for quality checks.

```bash
# Install git hooks (runs automatically on commit)
make install-hooks

# Run tests
make test

# Format code manually
make format
```

💖 Support This Project

If this template helps you, consider sponsoring my work!

🀝 Hire Me

Looking for a developer who understands API reliability, security, and DevOps? πŸ“§ adelekedare2012@gmail.com | LinkedIn
