[Q2-P2] Add synthetic monitoring with continuous SDK health checks #28

@karlwaldman

Description

Problem

We only exercise the SDK when:

  • Someone runs tests manually
  • CI runs on commits
  • Customers use it in production

Gap: There is no continuous validation that the SDK works in production.

Solution

Add synthetic monitoring that runs realistic SDK queries continuously:

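# monitoring/synthetic/sdk_health_check.py

import os
import time
from datetime import datetime, timedelta

import schedule
from oilpriceapi import OilPriceAPI

# Same variable name the CronJob manifest injects from the secret.
MONITOR_KEY = os.environ['MONITOR_API_KEY']


def send_metric(name, value):
    """Placeholder: replace with a call to the metrics backend."""
    print(f'METRIC {name}={value}')


def send_alert(message):
    """Placeholder: replace with a call to the alerting channel."""
    print(f'ALERT {message}')


def health_check():
    """Run realistic SDK queries."""
    client = OilPriceAPI(api_key=MONITOR_KEY)
    today = datetime.now().date()

    checks = [
        ('current_price', lambda: client.prices.get('WTI_USD')),
        ('1_week_historical', lambda: client.historical.get(
            'WTI_USD',
            today - timedelta(days=7),
            today
        )),
        ('1_month_historical', lambda: client.historical.get(
            'WTI_USD',
            today - timedelta(days=30),
            today
        )),
    ]

    for name, check in checks:
        # perf_counter is monotonic, so clock adjustments cannot skew durations
        start = time.perf_counter()
        try:
            check()
        except Exception as e:
            # One failing check must not stop the remaining checks
            send_metric(f'synthetic.{name}.failure', 1)
            send_alert(f'Synthetic check failed: {name}: {e}')
        else:
            duration = time.perf_counter() - start
            send_metric(f'synthetic.{name}.success', 1)
            send_metric(f'synthetic.{name}.duration', duration)


# Run every 5 minutes
schedule.every(5).minutes.do(health_check)

if __name__ == '__main__':
    # schedule only registers the job; this loop is what actually runs it.
    # Under an external scheduler (cron, CronJob), call health_check() once
    # and exit instead of looping.
    health_check()
    while True:
        schedule.run_pending()
        time.sleep(1)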

Deploy

# Deploy as cron job or lambda
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sdk-synthetic-monitoring
spec:
  schedule: "*/5 * * * *"  # Every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never  # required: Job pods must use Never or OnFailure
          containers:
          - name: health-check
            image: oilpriceapi/sdk-monitor:latest
            env:
            - name: MONITOR_API_KEY
              valueFrom:
                secretKeyRef:
                  name: monitoring-secrets
                  key: api-key
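
Because the CronJob above already triggers a run every 5 minutes, the container only needs to execute the checks once and exit. A minimal sketch of such a run-once flow follows. The `run_checks` and `exit_code` helpers are hypothetical names introduced here for illustration and are not part of the script in this issue; the real check list would wrap the SDK calls shown earlier.

```python
import time
from typing import Callable, Dict, List, Tuple

# A check is a (name, zero-argument callable) pair, as in the health check script.
Check = Tuple[str, Callable[[], object]]


def run_checks(checks: List[Check]) -> Dict[str, dict]:
    """Run each check once and record success, duration (seconds), and error text."""
    results = {}
    for name, check in checks:
        start = time.perf_counter()
        try:
            check()
            ok, error = True, None
        except Exception as exc:  # one failing check must not stop the others
            ok, error = False, str(exc)
        results[name] = {
            "ok": ok,
            "duration": time.perf_counter() - start,
            "error": error,
        }
    return results


def exit_code(results: Dict[str, dict]) -> int:
    """Return 0 when every check passed, 1 otherwise."""
    return 0 if all(r["ok"] for r in results.values()) else 1
```

An entry point could then end with `sys.exit(exit_code(run_checks(checks)))`, so that a nonzero exit status lets Kubernetes record the Job run as failed. Whether a failed check should fail the Job, or only emit a metric, is a design choice left open here.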

Alerts

ALERT: Synthetic check failure rate >10% over 15min
ALERT: Synthetic check duration >2x baseline
ALERT: Synthetic check not running (missing data)
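
The three alert conditions can be sketched as a small evaluation function. This is an illustration only: the `Sample` record, the `evaluate_alerts` name, the caller-supplied `baseline`, and the use of the median duration are all assumptions, since the issue does not define how the baseline is computed. In practice these rules would more likely live in the alerting system than in Python.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Sample:
    timestamp: float  # seconds since epoch
    ok: bool          # True if the synthetic check succeeded
    duration: float   # seconds


def evaluate_alerts(samples: List[Sample], now: float, baseline: float,
                    window: float = 15 * 60) -> List[str]:
    """Return the names of the alert conditions that currently hold."""
    recent = [s for s in samples if 0 <= now - s.timestamp <= window]

    # Missing data: no sample at all inside the window.
    if not recent:
        return ["not_running"]

    alerts = []

    # Failure rate above 10% over the window.
    failure_rate = sum(1 for s in recent if not s.ok) / len(recent)
    if failure_rate > 0.10:
        alerts.append("failure_rate")

    # Duration above 2x baseline, using the median of successful runs (assumed).
    ok_durations = [s.duration for s in recent if s.ok]
    if ok_durations and median(ok_durations) > 2 * baseline:
        alerts.append("slow")

    return alerts
```

Note that with one run every 5 minutes, a 15-minute window holds only about three samples per check, so a single failure (1 of 3, roughly 33%) already exceeds the 10% threshold.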

Acceptance Criteria

  • Synthetic monitoring script created
  • Deployed and running every 5 minutes
  • Alerts configured
  • Dashboard shows synthetic check results
  • Runbook for responding to failures

Estimated Effort

Time: 4 hours

Metadata

Assignees

No one assigned

    Labels

    priority: medium - Should be fixed eventually
    quadrant: q2 - Important, Not Urgent (Schedule)
    type: monitoring - Monitoring, observability, and alerting
