"Our API crashed at 3 AM and we have no idea why." "This endpoint is slow, but we can't figure out where the bottleneck is."
Most APIs are built without proper observability, security, or testing, which makes debugging a nightmare when it matters most.
This suite is a production-ready blueprint for building "High-Reliability" APIs. It moves beyond basic tutorials to deliver an Enterprise-Grade Product.
| Feature | Description |
|---|---|
| Observable | Every request is traced via OpenTelemetry. Logs are structured (JSON) and correlated across services. |
| Intelligent | An integrated AI agent (Groq/OpenAI) analyzes error logs and suggests root-cause fixes automatically. |
| Secure | Enterprise protection patterns, including Rate Limiting (Token Bucket) and JWT Authentication. |
| Distributed | Trace propagation is built in. Context is preserved from Client → API → External Services. |
| Resilient | Circuit Breakers prevent cascading failures when downstream dependencies go offline. |
| Visual | Includes a production-grade Grafana Dashboard for real-time SLO and Error Budget tracking. |
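The token bucket idea behind the rate limiter can be sketched in a few lines. This is a minimal, standard-library illustration and not the code in `src/core/rate_limit.py`; the class names `TokenBucket` and `PerClientLimiter`, the parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = capacity          # start full so short bursts are allowed
        self._last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False when the bucket is empty."""
        now = self._clock()
        elapsed = now - self._last_refill
        # Refill lazily, based on how much time passed since the last call.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last_refill = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key (for example, the caller's IP address)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._refill_rate = refill_rate
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_key: str) -> bool:
        bucket = self._buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill_rate, self._clock)
            self._buckets[client_key] = bucket
        return bucket.allow()
```

In an API, a request for which `allow()` returns `False` would typically be answered with HTTP 429. A real deployment would also need to evict idle buckets and, with several workers, share state in something like Redis.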
This project follows Hexagonal Architecture to decouple business logic from infrastructure. This keeps the system testable and maintainable, and lets individual adapters be swapped without touching the core.
```mermaid
graph TD
    subgraph "External World"
        Client[Client Request]
        Jaeger[Jaeger Tracing]
        Prom[Prometheus]
    end
    subgraph "Infrastructure Adapters (Driving)"
        API["FastAPI Entrypoint<br/>(src.main)"]
    end
    subgraph "Core Application (The Domain)"
        direction TB
        Service["Auth Service<br/>(src.services.auth_service)"]
        Domain["Domain Models<br/>(src.domain.models)"]
    end
    subgraph "Infrastructure Adapters (Driven)"
        Repo["User Repository<br/>(src.infrastructure.user_repository)"]
        LLM["AI Adapter<br/>(src.core.llm)"]
        Logger["Structlog Adapter<br/>(src.core.logging)"]
        HTTP["Instrumented HTTP Client<br/>(src.infrastructure.http_client)"]
    end
    subgraph "Observability Sidecars"
        OTel[OpenTelemetry Collector]
        Metrics[Metrics Instrumentator]
    end
    Client --> API
    API --> Service
    Service --> Domain
    Service --> Repo
    API -.-> OTel
    API -.-> Metrics
    LLM -.-> API
    HTTP -.-> OTel
    OTel --> Jaeger
    Metrics --> Prom
```
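To make the ports-and-adapters split concrete, here is a small sketch of how a service can depend on an interface rather than on a database or web framework. The names `User`, `UserRepository`, `InMemoryUserRepository`, `AuthService.authenticate`, and `hash_password` are hypothetical and are not taken from the repository's modules. The SHA-256 hash is used only to keep the example dependency-free; a real service would use a salted, slow password hash.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class User:
    """Domain model: plain data, no framework imports."""
    username: str
    password_hash: str


class UserRepository(Protocol):
    """Port: the only thing the service knows about storage."""

    def get_by_username(self, username: str) -> Optional[User]: ...


def hash_password(password: str) -> str:
    # Demo only: unsalted SHA-256 is NOT suitable for real password storage.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


class AuthService:
    """Core logic: depends on the port, not on FastAPI or a database driver."""

    def __init__(self, users: UserRepository) -> None:
        self._users = users

    def authenticate(self, username: str, password: str) -> bool:
        user = self._users.get_by_username(username)
        if user is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(user.password_hash, hash_password(password))


class InMemoryUserRepository:
    """Driven adapter: satisfies the port structurally, handy for tests."""

    def __init__(self, users: List[User]) -> None:
        self._by_name: Dict[str, User] = {u.username: u for u in users}

    def get_by_username(self, username: str) -> Optional[User]:
        return self._by_name.get(username)
```

Because `AuthService` only sees the `UserRepository` protocol, a test can pass in the in-memory adapter while production wiring passes a database-backed one, with no change to the service itself.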
Features you might miss if you don't look closely:
| Path | Feature | Why it matters |
|---|---|---|
| `src/core/middleware.py` | Correlation ID | Injects `X-Correlation-ID` into every request. Connects logs across microservices. |
| `src/core/config.py` | Fail-Fast Settings | Uses Pydantic v2 to validate env vars on startup. If a key is missing, the app crashes immediately (safe) rather than failing silently later. |
| `src/infrastructure/http_client.py` | Distributed Tracing | An instrumented client that automatically passes trace headers to external APIs (OpenAI, Stripe, etc.). |
| `src/core/circuit_breaker.py` | Circuit Breaker | Tracks external failures. If an API is down, it "trips" and returns cached data instantly, preventing the system from hanging. |
| `src/core/rate_limit.py` | Token Bucket Limiter | Prevents abuse by limiting requests per IP. |
| `src/services/` | Hexagonal Logic | Business logic is pure Python. It doesn't know what "FastAPI" is, making it easy to test. |
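The circuit breaker row describes a small state machine: closed (calls pass through), open (calls are skipped and a cached value is returned), and half-open (one probe call is allowed after a cool-down). The following is a simplified sketch under those assumptions, not the contents of `src/core/circuit_breaker.py`. The class name, the `failure_threshold` and `reset_timeout` parameters, and the choice to return the last good result as the fallback are illustrative.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Stops calling a failing dependency and serves the last good result instead."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None   # None means the circuit is closed
        self._last_good: Any = None               # cached fallback value

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"                    # cool-down elapsed: allow one probe
        return "open"

    def call(self, func: Callable[[], Any]) -> Any:
        if self.state == "open":
            return self._last_good                # fail fast, dependency is not called
        try:
            result = func()
        except Exception:
            self._failures += 1
            # Trip on reaching the threshold, or re-open after a failed probe.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            return self._last_good
        # Success closes the circuit and refreshes the cache.
        self._failures = 0
        self._opened_at = None
        self._last_good = result
        return result
```

This sketch swallows the exception and returns the cached value even before the breaker trips; a production version would usually log the failure and might re-raise when no cached value exists yet.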
We provide a pre-built Grafana Dashboard (infra/grafana/dashboard.json) that tracks:
- SLO Tracking: Error Budget Burn Rate.
- Experience: P99 Latency (the slowest 1% of requests).
- Resilience: Live Circuit Breaker status.
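For readers new to these terms, the two headline numbers can be worked through by hand. The sketch below uses the common definition of burn rate (observed error rate divided by the error budget, where the budget is `1 - SLO target`) and a nearest-rank percentile. It illustrates the definitions only; the dashboard itself computes these from Prometheus metrics, and the function names here are made up for the example.

```python
import math
from typing import Sequence


def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    1.0 means errors arrive exactly at the rate the SLO allows;
    2.0 means the budget would be exhausted in half the SLO window.
    """
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo_target        # e.g. 0.001 for a 99.9% SLO
    return (errors / total) / error_budget


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # 2 failures in 1000 requests against a 99.9% SLO burns budget about twice as fast as allowed.
    print(round(burn_rate(errors=2, total=1000, slo_target=0.999), 3))
    latencies_ms = list(range(1, 101))     # 1 ms .. 100 ms
    print(percentile(latencies_ms, 99))    # latency that 99% of requests stay at or under
```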
How to Import:
- Open Grafana (`http://localhost:3030`, admin/admin).
- Go to Dashboards → New → Import.
- Upload `infra/grafana/dashboard.json`.
- Select the Prometheus datasource and click Load.
See the lifecycle of every request.
- Spin up Infrastructure: `docker compose up -d`
- Generate Traffic: `make run` and hit endpoints.
- View Traces: `http://localhost:16686`
```bash
# Build image
make docker-build

# Start API + Prometheus + Jaeger + Grafana
make stack-up

# Stop the stack
make stack-down
```

| Endpoint | Auth | Resilience | Description |
|---|---|---|---|
| `GET /health` | ❌ | ✅ Rate Limit | Health check |
| `GET /slow` | ❌ | ❌ | Simulates slow request (tracing demo) |
| `POST /login` | ❌ | ❌ | Get JWT token (demo/secret123) |
| `GET /protected` | ✅ | ❌ | Protected route (requires JWT) |
| `GET /external-api` | ❌ | ✅ Circuit Breaker | Demonstrates fault tolerance and fallback |
| `GET /debug/summarize-errors` | ✅ | ✅ Rate Limit | AI analyzes logs and returns insights |
| `GET /metrics` | ❌ | ❌ | Prometheus metrics for Grafana |
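The `/login` and `/protected` pair relies on a signed token that the server can verify without storing session state. The sketch below shows how an HS256 JWT is signed and checked using only the standard library. It is meant to explain the mechanism and is not the project's implementation, which may well use a JWT library; `encode_jwt`, `decode_jwt`, and the `now` parameter are names invented for this example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def encode_jwt(claims: Dict[str, Any], secret: str) -> str:
    """Build header.payload.signature for the HS256 algorithm."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_jwt(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Verify signature, algorithm, and expiry; raise ValueError on any failure."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Check the signature before trusting anything inside the token.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

A login handler would call `encode_jwt` with a subject and an `exp` claim after checking credentials; a protected route would call `decode_jwt` on the bearer token and respond with HTTP 401 if it raises.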
This project includes a Self-Healing AI Agent that reads app.json logs and provides actionable insights.
How to use:
- Set an API key in `.env`: `GROQ_API_KEY`, `OPENAI_API_KEY`, or `GOOGLE_API_KEY`.
- Hit the `/debug/summarize-errors` endpoint (requires auth).
- Receive a JSON summary of root causes and fixes.
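Before any model is involved, an endpoint like this has to reduce a JSON log file to something small enough to send. The sketch below shows one plausible pre-processing step: read JSON lines, keep error-level records, count them by message, and build a prompt. The field names `level` and `event` follow structlog's usual JSON output but are assumptions about this project's log schema, and `collect_errors` and `build_prompt` are hypothetical helpers; the actual LLM call is deliberately left out.

```python
import json
from collections import Counter
from typing import Iterable


def collect_errors(lines: Iterable[str]) -> Counter:
    """Count error-level log records by their message, skipping unparseable lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                      # tolerate partial or corrupted lines
        if not isinstance(record, dict):
            continue
        # Assumed schema: "level" holds the severity, "event" holds the message.
        if str(record.get("level", "")).lower() in {"error", "critical"}:
            counts[str(record.get("event", "<no event>"))] += 1
    return counts


def build_prompt(counts: Counter, top_n: int = 5) -> str:
    """Turn the most frequent errors into a compact prompt for a language model."""
    if not counts:
        return "No errors found in the log."
    rows = [f"- {event} (x{n})" for event, n in counts.most_common(top_n)]
    return "Suggest likely root causes and fixes for these errors:\n" + "\n".join(rows)


if __name__ == "__main__":
    sample = [
        '{"level": "info", "event": "request_started"}',
        '{"level": "error", "event": "upstream timeout"}',
        '{"level": "error", "event": "upstream timeout"}',
        'not json at all',
    ]
    print(build_prompt(collect_errors(sample)))
```

Aggregating first keeps the prompt short and avoids sending raw log lines, which may contain tokens or personal data, to a third-party API.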
This project uses Ruff for linting and Pre-Commit for quality checks.
```bash
# Install git hooks (runs automatically on commit)
make install-hooks

# Run tests
make test

# Format code manually
make format
```

If this template helps you, consider sponsoring my work!
Looking for a developer who understands API reliability, security, and DevOps? 📧 adelekedare2012@gmail.com | LinkedIn

