A workshop on Layered Recovery Architecture and breaking circular dependencies in platform engineering.
This workshop explores the "Layer Cake" architecture pattern for platform recovery, drawing lessons from:
- Atlassian's CPR (Continuous PaaS Recovery) Program: How they migrated from a tangled "ball of mud" to a layered architecture to enable disaster recovery.
- Netflix's Chaos Engineering: Using a steady-state hypothesis and fault injection to validate resilience.
The core problem addressed is Circular Dependencies (e.g., Service A needs Service B, Service B needs Service A), which make cold-start recovery impossible during a total outage.
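To make the problem concrete, here is a minimal sketch of how a circular dependency could be found in a service graph with a depth-first search. The service names and the `find_cycle` helper are hypothetical illustrations, not taken from setup.org.

```python
from typing import Dict, List, Optional

# Hypothetical service graph: each service maps to the services it needs.
DEPS: Dict[str, List[str]] = {
    "service-a": ["service-b"],
    "service-b": ["service-a"],
    "database": [],
}


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of services, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {service: WHITE for service in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the path loops back.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for service in deps:
        if color[service] == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(DEPS))
```

A graph for which `find_cycle` returns a cycle has no valid cold-start order: every service on the cycle is waiting for another one on it.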
All content is available in setup.org.
- Dependency Mapping: Identify and visualize circular dependencies in a service graph.
- Layer Assignment: Implement an algorithm that assigns each service to a layer N such that it depends only on services in layers below N.
- Recovery Simulation: Simulate a disaster recovery scenario to verify that the layered architecture allows for a bottom-up restoration of services.
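One possible way to approach the layer assignment objective is sketched below. It assumes the graph holds hard dependencies only and uses the standard library's `graphlib.TopologicalSorter` (available from Python 3.9). The `assign_layers` name and the example services are hypothetical, and the workshop's own solution may differ.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List

# Hypothetical hard-dependency graph: service -> services it cannot start without.
EXAMPLE: Dict[str, List[str]] = {
    "database": [],
    "iam": ["database"],
    "api": ["iam", "database"],
    "web": ["api"],
}


def assign_layers(hard_deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Give each service the lowest layer that sits above all of its dependencies.

    Raises graphlib.CycleError if the graph contains a circular dependency.
    """
    layers: Dict[str, int] = {}
    # static_order() yields every service after all of its dependencies.
    for service in TopologicalSorter(hard_deps).static_order():
        deps = hard_deps.get(service, [])
        # Services without dependencies land in layer 0.
        layers[service] = 1 + max((layers[d] for d in deps), default=-1)
    return layers


if __name__ == "__main__":
    try:
        for name, layer in sorted(assign_layers(EXAMPLE).items(), key=lambda kv: kv[1]):
            print(f"layer {layer}: {name}")
    except CycleError as err:
        print("circular dependency:", err.args[1])
```

Restoring services in ascending layer order then gives a bottom-up recovery sequence, since every dependency of a service sits in a strictly lower layer.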
- Python 3.9+
- Emacs (optional, for Org mode interaction)
- Make
Initialize the workshop:
    make setup
You can run the solution code for the exercises using:
    make test
A component in layer N can only have hard dependencies on lower layers (N-1, N-2, etc.).
- Hard Dependency: Service cannot function/start without it (e.g., Database, IAM).
- Soft Dependency: Service works with reduced functionality (e.g., Logging, Metrics).
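The distinction matters most during a cold start. The sketch below simulates a total outage and brings services up in waves, where a service may start as soon as all of its hard dependencies are running; soft dependencies are ignored for start-up. The service names and the `simulate_cold_start` helper are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical services: name -> (hard dependencies, soft dependencies).
SERVICES: Dict[str, Tuple[List[str], List[str]]] = {
    "database": ([], ["logging"]),
    "iam": (["database"], ["logging"]),
    "logging": (["iam"], []),
    "api": (["iam", "database"], ["logging", "metrics"]),
    "metrics": (["api"], []),
}


def simulate_cold_start(
    services: Dict[str, Tuple[List[str], List[str]]]
) -> Tuple[List[List[str]], List[str]]:
    """Return (waves, stuck): start-up waves and services that can never start."""
    running: Set[str] = set()
    waves: List[List[str]] = []
    while True:
        # A service is startable once every hard dependency is already running.
        wave = sorted(
            name
            for name, (hard, _soft) in services.items()
            if name not in running and all(dep in running for dep in hard)
        )
        if not wave:
            break
        running.update(wave)
        waves.append(wave)
    stuck = sorted(set(services) - running)
    return waves, stuck


if __name__ == "__main__":
    waves, stuck = simulate_cold_start(SERVICES)
    for number, wave in enumerate(waves):
        print(f"wave {number}: {', '.join(wave)}")
    print("stuck:", stuck or "none")
```

In this example graph, `database` lists `logging` only as a soft dependency. If that edge were treated as hard, `database`, `iam`, and `logging` would form a cycle and no service could start.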
- Open Source Chaos & Resilience Testing Tools: A curated list of tools for evaluating resilience in production, IaC, and platform tooling.