Production-style monitoring platform for HTTP services, inspired by systems like Datadog, New Relic, and UptimeRobot.
This project demonstrates how to design and implement a real-world monitoring system with background workers, alerting semantics, historical metrics, and a modern dashboard, with a focus on observability, reliability, and testability.
This repo simulates a lean version of real-world monitoring platforms. The goal is to demonstrate distributed design, incident detection, and observability concepts end to end.
- JWT-based authentication
- Monitor CRUD with strict ownership enforcement
- Secure, multi-user architecture
- Background scheduler (async, non-blocking)
- Periodic HTTP health checks
- Response time & availability tracking
- Time-series check run storage
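A single health check with a timeout could be sketched roughly as follows. This is not the project's actual `httpCheck.ts`. The `CheckResult` shape, the `runHttpCheck` name, the 5-second default timeout, and the rule that only 2xx responses count as healthy are all assumptions made for illustration. The fetch implementation is injectable so the function can be exercised without network access.

```typescript
// Hypothetical result shape; the project's real check-run schema may differ.
export interface CheckResult {
  ok: boolean;
  statusCode: number | null;
  responseTimeMs: number;
  error?: string;
}

// Minimal fetch-like signature so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ status: number }>;

export async function runHttpCheck(
  url: string,
  timeoutMs = 5000, // assumed default
  fetchImpl: FetchLike = fetch, // global fetch, available in Node.js >= 18
): Promise<CheckResult> {
  // Abort the request if it takes longer than timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return {
      ok: res.status >= 200 && res.status < 300, // assumption: only 2xx is healthy
      statusCode: res.status,
      responseTimeMs: Date.now() - startedAt,
    };
  } catch (err) {
    // Network errors and timeouts are both recorded as failed checks.
    return {
      ok: false,
      statusCode: null,
      responseTimeMs: Date.now() - startedAt,
      error: err instanceof Error ? err.message : String(err),
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Returning a result object instead of throwing keeps every outcome, including timeouts, storable as a check run.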
- DOWN alerts after consecutive failures
- RECOVERY alerts only after confirmed DOWN
- Guaranteed semantics:
  - No duplicate DOWN alerts
  - RECOVERY only after an actual outage
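The alerting semantics above can be expressed as a small pure function over the previous state and the latest check outcome. The sketch below is illustrative and not the project's `alertRules.ts`. The names `AlertState` and `evaluateCheck`, and the default threshold of 3 consecutive failures, are assumptions.

```typescript
type MonitorStatus = "UP" | "DOWN";
type AlertType = "DOWN" | "RECOVERY";

// Hypothetical per-monitor alerting state.
interface AlertState {
  status: MonitorStatus; // last confirmed status
  consecutiveFailures: number;
}

interface Evaluation {
  next: AlertState;
  alert: AlertType | null; // null means "emit nothing"
}

export function evaluateCheck(
  state: AlertState,
  checkOk: boolean,
  failureThreshold = 3, // assumed default
): Evaluation {
  if (checkOk) {
    // RECOVERY is emitted only when the monitor was confirmed DOWN.
    const alert: AlertType | null = state.status === "DOWN" ? "RECOVERY" : null;
    return { next: { status: "UP", consecutiveFailures: 0 }, alert };
  }
  const consecutiveFailures = state.consecutiveFailures + 1;
  // DOWN fires once, on the UP -> DOWN transition, so it cannot be duplicated.
  const crossed = state.status === "UP" && consecutiveFailures >= failureThreshold;
  return {
    next: { status: crossed ? "DOWN" : state.status, consecutiveFailures },
    alert: crossed ? "DOWN" : null,
  };
}
```

Because the function has no I/O and no clock, a sequence of outcomes can be replayed in a unit test and the emitted alerts compared exactly.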
- Monitors list & details
- Check history: GET /monitors/:id/checks
- Alerts: GET /alerts
- Summary statistics: GET /monitors/:id/summary?windowHours=24
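To illustrate what a summary over a `windowHours` window might compute, here is a minimal in-memory sketch. The `CheckRun` and `MonitorSummary` field names are hypothetical, and averaging latency over successful checks only is an assumption. The real endpoint may aggregate in MongoDB and return different fields.

```typescript
// Hypothetical check-run shape; field names are assumptions.
interface CheckRun {
  checkedAt: Date;
  ok: boolean;
  responseTimeMs: number;
}

interface MonitorSummary {
  totalChecks: number;
  upChecks: number;
  uptimePercent: number | null; // null when the window has no checks
  avgResponseTimeMs: number | null; // null when no check succeeded
}

export function summarize(
  runs: CheckRun[],
  windowHours: number,
  now: Date = new Date(),
): MonitorSummary {
  // Keep only runs that fall inside the trailing window.
  const cutoff = now.getTime() - windowHours * 60 * 60 * 1000;
  const inWindow = runs.filter((r) => r.checkedAt.getTime() >= cutoff);
  const up = inWindow.filter((r) => r.ok);
  const total = inWindow.length;
  const latencySum = up.reduce((sum, r) => sum + r.responseTimeMs, 0);
  return {
    totalChecks: total,
    upChecks: up.length,
    uptimePercent: total === 0 ? null : (up.length / total) * 100,
    // Assumption: latency is averaged over successful checks only.
    avgResponseTimeMs: up.length === 0 ? null : latencySum / up.length,
  };
}
```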
- Authentication flow (login / register)
- Protected routes
- Monitors overview
- Monitor details:
  - Uptime summary
  - Latency & availability charts
  - Check history table
- Alerts page with polling and filtering
apps/
├── api
│   ├── modules
│   │   ├── auth
│   │   ├── monitors
│   │   ├── checkruns
│   │   └── alerts
│   ├── engine
│   │   ├── monitoringEngine.ts
│   │   ├── httpCheck.ts
│   │   └── alertRules.ts
│   ├── middleware
│   └── config
└── web
    ├── pages
    ├── ui
    └── api
Backend
- Node.js + TypeScript
- Express
- MongoDB (time-series style collections)
- Background monitoring engine (in-process workers)
Frontend
- React + Vite
- Tailwind CSS
- React Query
- Recharts
Infrastructure
- Docker & Docker Compose
- MongoDB container
- CI-ready setup
- Node.js ≥ 18
- Docker Desktop (or Docker Engine)
From the repository root:
docker compose up --build
Backend API:
http://localhost:4000
Health check:
GET http://localhost:4000/health
In a new terminal:
cd apps/web
npm install
npm run dev
Frontend:
http://localhost:5173
Useful endpoints to demonstrate alerting and recovery:
| Scenario | URL |
|---|---|
| Stable uptime | https://www.google.com |
| Real API | https://api.github.com |
| Status failure | https://httpstat.us/500 |
| Timeout | https://httpstat.us/200?sleep=10000 |
| Guaranteed DOWN | http://127.0.0.1:1 |
- Alerting logic extracted into pure functions for deterministic testing
- Monitoring engine decoupled from the request lifecycle
- Ownership enforced at the query level (no cross-user data leakage)
- Time-series data modeled explicitly (check runs)
- Dockerized backend for reproducible execution
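Enforcing ownership at the query level means the owner's id is part of every lookup filter, so a monitor that belongs to another user is indistinguishable from one that does not exist. The sketch below uses an in-memory array as a stand-in for a MongoDB collection. The `userId` field name and the helper names are assumptions, not the project's actual schema.

```typescript
// Hypothetical monitor document; the real schema may use different field names.
interface MonitorDoc {
  _id: string;
  userId: string;
  url: string;
}

interface OwnedFilter {
  _id: string;
  userId: string;
}

// Always combine the resource id with the authenticated user's id.
export function ownedMonitorFilter(monitorId: string, userId: string): OwnedFilter {
  return { _id: monitorId, userId };
}

// In-memory stand-in for a collection lookup that takes the same filter.
export function findOwnedMonitor(docs: MonitorDoc[], filter: OwnedFilter): MonitorDoc | null {
  return docs.find((d) => d._id === filter._id && d.userId === filter.userId) ?? null;
}

const monitors: MonitorDoc[] = [
  { _id: "m1", userId: "alice", url: "https://api.github.com" },
];

// The owner gets the document; any other user gets null,
// which a handler would typically map to a 404 response.
const forOwner = findOwnedMonitor(monitors, ownedMonitorFilter("m1", "alice"));
const forOther = findOwnedMonitor(monitors, ownedMonitorFilter("m1", "bob"));
```

Compared with fetching by id and checking the owner afterwards, this leaves no code path where an unowned document is loaded at all.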
- Unit tests for alerting rules
- Integration tests for API endpoints
- In-memory MongoDB for deterministic tests
- Frontend tested via component & query-level testing
The backend API is covered by Jest integration tests (supertest + in-memory MongoDB).
Currently, the API layer has ~94% line coverage.
For practicality, the coverage report explicitly excludes:
- src/engine/: the long-running monitoring engine
- src/config/: environment/config wiring
These folders contain infrastructure and long-running process code, which would require a different testing strategy. For this project, the focus is on:
- HTTP endpoints behavior (auth, monitors, alerts, check runs)
- Validation and error handling
- Ownership/authorization rules
MIT License © Ali Romia
Ali Romia
Software Engineer
- GitHub: https://github.com/Aliromia21
- LinkedIn: https://www.linkedin.com/in/aliromia/