Description
This issue captures the architectural vision and roadmap discussed in PR #293, focusing on modernizing WLED's codebase to address technical debt while improving performance and maintainability.
Thanks @ewowi for starting this discussion with a good question 😃
AI generated summary
Current State: Critical Issues Identified
1. Concurrency Model: "Wishful Single-Threading"
- Problem: Originally single-threaded ESP8266 code now running on multi-core ESP32
- Impact: Race conditions, data corruption, crashes
- Example: a preset corruption bug that persisted for 4 years, caused by concurrent file writes (PR #293)
2. Architectural Debt
- Global state accessible everywhere (no ownership model)
- Multiple mutexes without clear acquisition order (deadlock risk)
- Lock hierarchy violations throughout codebase
- Magic numbers and hidden dependencies (e.g., `realtimeLock(65000)`, where 65000 turned out to mean 'infinite')
3. Testing & Validation Gap
- No automated testing (unit, integration, or concurrency stress tests)
- No static analysis in CI pipeline
- Bugs discovered in production after years
- No ThreadSanitizer or other sanitizer validation
Vision: WLED-MM 2.0 and Beyond
Core Principles
- Safety First: Eliminate race conditions and data corruption
- Performance Through Architecture: Proper design unlocks hardware capabilities
- Maintainability: Clear module boundaries and ownership model
- Multi-Platform: ESP32 optimized, Linux-ready for large installations
10-Year Roadmap
Phase 1: Stabilization (Year 1-2) - "Stop the Bleeding"
Q1-Q2: Concurrency Foundation
- Formalize lock hierarchy and document it
- Create mutex helper classes with RAII (no manual lock/unlock)
- Audit all global state access
- Add ThreadSanitizer builds to CI
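A documented lock order is easier to keep if the code refuses out-of-order acquisition. The sketch below is one possible shape, not existing WLED code: `OrderedMutex`, `OrderedLock`, and the numeric levels are hypothetical names, and it uses `std::mutex` so it can be exercised on a host build. On ESP32 the same idea would presumably wrap a FreeRTOS semaphore, and a firmware build may prefer an assert over an exception.

```cpp
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical sketch: every mutex carries a level. A thread may only take
// a mutex whose level is strictly lower than the lowest level it holds.
class OrderedMutex {
public:
    explicit OrderedMutex(int level) : level_(level) {}
    int level() const { return level_; }
    std::mutex& raw() { return mtx_; }
private:
    std::mutex mtx_;
    const int level_;
};

// RAII guard: checks the order, locks, and restores the previous level on
// destruction. Assumes locks are released in reverse order of acquisition.
class OrderedLock {
public:
    explicit OrderedLock(OrderedMutex& m) : m_(m), previous_(currentLevel()) {
        if (m.level() >= previous_) {
            throw std::logic_error("lock hierarchy violation");
        }
        m_.raw().lock();
        currentLevel() = m.level();
    }
    ~OrderedLock() {
        currentLevel() = previous_;
        m_.raw().unlock();
    }
    OrderedLock(const OrderedLock&) = delete;
    OrderedLock& operator=(const OrderedLock&) = delete;
private:
    // Lowest level currently held by this thread (INT_MAX = nothing held).
    static int& currentLevel() {
        thread_local int level = INT_MAX;
        return level;
    }
    OrderedMutex& m_;
    int previous_;
};
```

With, say, a level-200 mutex for `strip` and a level-100 mutex for the preset file, taking `strip` first and the file second is accepted, while the reverse order fails immediately instead of deadlocking occasionally in the field.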
Q3-Q4: Critical Bug Fixes
- Fix preset corruption (file mutex + atomic writes)
- Eliminate known race conditions
- Add error recovery mechanisms
- Create "safe mode" boot option
Deliverable: WLED-MM 1.0 LTS - "Production Ready"
Phase 2: Modernization (Year 2-4) - "Refactor Core"
Year 2: Module Boundaries
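// Target: clear module boundaries (parameter types below are assumed)
#include <cstdint>
struct Segment;       // defined elsewhere
struct SegmentProps;  // defined elsewhere
class SegmentManager {
public:
  Segment* createSegment();
  void updateSegment(uint8_t id, const SegmentProps& props);
  // All segment access goes through here
};
Year 3: Data Model Refactor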
- Replace global variables with managed state objects
- Introduce ownership model
- Create read/write interfaces with clear contracts
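As a rough illustration of what a managed state object with a read/write contract could look like, here is a minimal sketch. `LightState` and `StateStore` are hypothetical names invented for this example, and `std::mutex` stands in for whatever primitive the firmware would actually use.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical example state; real WLED state is much larger.
struct LightState {
    uint8_t brightness = 128;
    bool on = true;
};

// The store owns the state. Readers get a copy (snapshot); writers pass a
// function that runs while the store holds its lock.
class StateStore {
public:
    LightState read() const {
        std::lock_guard<std::mutex> guard(mtx_);
        return state_;
    }
    template <typename Fn>
    void update(Fn&& fn) {
        std::lock_guard<std::mutex> guard(mtx_);
        fn(state_);
    }
private:
    mutable std::mutex mtx_;
    LightState state_;
};
```

Callers never see a reference to the live object, so there is no way to modify it without going through `update`, which is the ownership property the list above asks for.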
Year 4: Architecture Documentation
- Generate architecture diagrams
- Document threading model
- Create contribution guidelines with code patterns
Deliverable: WLED-MM 2.0 - "Maintainable Architecture"
Phase 3: Quality Systems (Year 4-6) - "Engineering Rigor"
Testing Infrastructure
- Unit tests for core algorithms (effects, color math)
- Integration tests for protocols
- Mock framework for hardware
- Concurrency stress tests
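Color math is a natural starting point for unit tests because it is pure computation with no hardware dependency. The sketch below is illustrative only: `blendChannel` and `blendColor` are hypothetical helpers written for this example, not WLED's own blend routines, and the checks are plain `static_assert`s so they run at compile time on any host.

```cpp
#include <cstdint>

// Hypothetical helper: linear blend of two 8-bit channels with rounding.
// amount = 0 returns a, amount = 255 returns b.
constexpr uint8_t blendChannel(uint8_t a, uint8_t b, uint8_t amount) {
    return static_cast<uint8_t>((a * (255 - amount) + b * amount + 127) / 255);
}

// Blend two packed 32-bit colors one byte at a time.
constexpr uint32_t blendColor(uint32_t c1, uint32_t c2, uint8_t amount) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const uint8_t a = static_cast<uint8_t>((c1 >> shift) & 0xFF);
        const uint8_t b = static_cast<uint8_t>((c2 >> shift) & 0xFF);
        out |= static_cast<uint32_t>(blendChannel(a, b, amount)) << shift;
    }
    return out;
}

// Endpoint and identity properties, checked at compile time.
static_assert(blendChannel(0, 255, 0) == 0, "amount 0 keeps the first color");
static_assert(blendChannel(0, 255, 255) == 255, "amount 255 gives the second color");
static_assert(blendChannel(10, 10, 77) == 10, "blending equal values is a no-op");
static_assert(blendColor(0x00FF0000u, 0x000000FFu, 0) == 0x00FF0000u, "packed endpoint");
```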
Static Analysis Integration
- Clang-Tidy in CI
- ThreadSanitizer nightly builds
- Memory leak detection
- Complexity metrics tracking
Observability
- Structured logging
- Performance counters exposed via API
- Diagnostic mode
Deliverable: WLED-MM 2.5 - "Enterprise Grade"
Phase 4: Next-Gen Features (Year 6-10) - "Innovation"
AI/ML Integration
- On-device effect generation using TinyML
- Adaptive brightness based on usage patterns
- Automatic optimization for LED types
Distributed Systems
- Multi-device orchestration (100+ devices in sync)
- Mesh networking for large installations
- Cloud integration
Performance Optimization
- Hardware acceleration for ESP32-S3 (vector instructions; the chip has no GPU)
- Hardware-accelerated DMA
- Sub-millisecond effect latency
Deliverable: WLED-MM 3.0 - "Smart LED Platform"
Immediate Action Items (Next 6 Months)
P0: Fix Corruption Issues
- Implement `presetFileMux` (PR #293)
- Atomic file writes (temp + rename)
- Add file corruption detection + recovery
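The temp + rename pattern can be sketched as follows. This is a host-side illustration using C++17 `std::filesystem`, with a hypothetical helper name `writeFileAtomic`; on POSIX systems `rename` replaces the target atomically, but whether the ESP32 filesystem gives the same guarantee would need to be verified, and the sketch does not force data to stable storage.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: write the new content to "<target>.tmp", then rename
// it over the target. Readers see either the old file or the complete new
// one, never a half-written file.
bool writeFileAtomic(const std::filesystem::path& target, const std::string& data) {
    std::filesystem::path tmp = target;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << data;
        out.flush();
        if (!out) return false;  // write or flush failed
    }                            // stream closed here, before the rename
    std::error_code ec;
    std::filesystem::rename(tmp, target, ec);
    if (ec) {
        std::filesystem::remove(tmp, ec);  // best-effort cleanup
        return false;
    }
    return true;
}
```

Concurrent writers would still need to be serialized (for example by the `presetFileMux` mentioned above), since two tasks writing the same temp file would race with each other.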
P0: Concurrency Safety
- Document lock acquisition order
- Create `ScopedLock` helper class
- Audit and fix all concurrent access to `strip`
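A `ScopedLock` helper might look roughly like the sketch below. It is written against `std::timed_mutex` so it compiles on a host; the class layout and the `owns()` accessor are assumptions for illustration, and an ESP32 version would more likely wrap `xSemaphoreTake` / `xSemaphoreGive`.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical RAII helper: tries to take the mutex for at most `timeout`
// and releases it automatically when the guard goes out of scope.
class ScopedLock {
public:
    ScopedLock(std::timed_mutex& m, std::chrono::milliseconds timeout)
        : m_(m), owns_(m.try_lock_for(timeout)) {}
    ~ScopedLock() {
        if (owns_) m_.unlock();
    }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

    // True if the lock was acquired within the timeout.
    bool owns() const { return owns_; }
private:
    std::timed_mutex& m_;
    bool owns_;
};
```

Callers check `owns()` and bail out on failure. Every early return then releases the lock correctly, which manual lock/unlock pairs tend to get wrong.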
P1: Developer Tools
- Add `--enable-thread-sanitizer` build option
- Create concurrency stress test suite
- Set up nightly CI builds with sanitizers
P2: Code Quality
- Replace magic numbers with named constants
- Add error code enums (not just log messages)
- Create coding standards document
Performance Strategy
ESP32-S3 Optimization
- Dual-core utilization: Network on Core 0, Rendering on Core 1
- DMA optimization: Async LED output with callbacks
- SIMD instructions: Use ESP32-S3 vector operations for color math
- Lock-free buffers: Double-buffering with atomic swaps for realtime paths
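The double-buffering item could take a shape like the following. This is a sketch under explicit assumptions: exactly one producer task (e.g. the network receiver) and exactly one consumer task (the renderer), with `FrameDoubleBuffer` being a hypothetical class name. The consumer flips the front index only when a complete frame is waiting, so neither side ever touches the buffer the other is using.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-producer / single-consumer double buffer (hypothetical sketch).
class FrameDoubleBuffer {
public:
    explicit FrameDoubleBuffer(std::size_t ledCount) {
        bufs_[0].assign(ledCount, 0);
        bufs_[1].assign(ledCount, 0);
    }

    // Producer: copy a frame into the back buffer. Returns false (frame
    // dropped) if the previously published frame was not picked up yet.
    bool publish(const std::vector<uint32_t>& frame) {
        if (ready_.load(std::memory_order_acquire)) return false;
        std::vector<uint32_t>& back = bufs_[1 - front_.load(std::memory_order_relaxed)];
        std::copy_n(frame.begin(), std::min(frame.size(), back.size()), back.begin());
        ready_.store(true, std::memory_order_release);
        return true;
    }

    // Consumer: flip to the new frame if one is ready, then return the
    // front buffer. The reference stays valid until the next acquire().
    const std::vector<uint32_t>& acquire() {
        if (ready_.load(std::memory_order_acquire)) {
            front_.store(1 - front_.load(std::memory_order_relaxed),
                         std::memory_order_relaxed);
            ready_.store(false, std::memory_order_release);
        }
        return bufs_[front_.load(std::memory_order_relaxed)];
    }

private:
    std::vector<uint32_t> bufs_[2];
    std::atomic<int> front_{0};      // written by the consumer only
    std::atomic<bool> ready_{false}; // set by producer, cleared by consumer
};
```

Dropping a frame when the renderer is behind is a design choice of this sketch. A third buffer would let the producer keep writing without ever dropping or waiting, at the cost of more RAM.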
Performance Targets
- ESP32-S3: 10,000 LEDs @ 60 FPS
- RasPi 4: 50,000 LEDs @ 60 FPS
- API response: <10ms
- Effect switching: <5ms latency
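A quick back-of-the-envelope check shows what the LED-count targets imply for output hardware. The calculation below assumes WS2812-class LEDs (800 kbit/s, 24 bits per LED, so 30 µs per LED) and ignores the reset/latch gap between frames; other LED types, such as RGBW or clocked SPI LEDs, would give different numbers.

```cpp
#include <cstdint>

// Assumption: WS2812-style timing, 24 bits at 1.25 us per bit.
constexpr uint32_t kMicrosPerLed = 30;

constexpr uint32_t frameBudgetMicros(uint32_t fps) {
    return 1000000u / fps;
}

// How many LEDs one serial output can refresh within a single frame.
constexpr uint32_t maxLedsPerOutput(uint32_t fps) {
    return frameBudgetMicros(fps) / kMicrosPerLed;
}

// Number of parallel outputs needed for a given LED count (rounded up).
constexpr uint32_t outputsNeeded(uint32_t leds, uint32_t fps) {
    return (leds + maxLedsPerOutput(fps) - 1) / maxLedsPerOutput(fps);
}

static_assert(maxLedsPerOutput(60) == 555, "about 555 LEDs per output at 60 FPS");
static_assert(outputsNeeded(10000, 60) == 19, "10,000 LEDs need 19 outputs");
static_assert(outputsNeeded(50000, 60) == 91, "50,000 LEDs need 91 outputs");
```

Under these assumptions the targets are bounded by the LED protocol rather than by CPU speed: a single output driving 10,000 LEDs would take about 300 ms per frame, so reaching 60 FPS depends on parallel outputs (or faster LED protocols) as much as on rendering performance.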
Testing & Analysis Tools
QEMU Integration
# ESP32 emulation with sanitizers
# (sanitizers are enabled at compile time with -fsanitize=thread; they are not a QEMU option)
qemu-system-xtensa -M esp32 -kernel wled.elf
CI Pipeline
# Automated testing workflow
- ThreadSanitizer builds
- Concurrency stress tests
- Memory leak detection
- Performance regression tests
Multi-Platform Architecture
┌─────────────────────────────────────────┐
│         WLED Application Layer          │
│    (Effects, Presets, UI, JSON API)     │
└────────────────────┬────────────────────┘
                     │
┌────────────────────┴────────────────────┐
│       Platform Abstraction Layer        │
│   ┌────────────┐    ┌──────────────┐    │
│   │ ESP32 HAL  │    │  Linux HAL   │    │
│   └────────────┘    └──────────────┘    │
└─────────────────────────────────────────┘
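In code, the layering in the diagram could come down to the application layer depending only on a small interface, with one implementation per platform. The sketch below uses hypothetical names (`LedOutput`, `MockLedOutput`, `renderSolid`). The mock implementation doubles as the kind of hardware stand-in that the testing phase calls for.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical HAL interface: the only thing the application layer sees.
class LedOutput {
public:
    virtual ~LedOutput() = default;
    virtual void show(const uint32_t* pixels, std::size_t count) = 0;
};

// Test double that records what would have been sent to the LEDs. An ESP32
// or Linux implementation would drive real hardware here instead.
class MockLedOutput : public LedOutput {
public:
    void show(const uint32_t* pixels, std::size_t count) override {
        lastFrame.assign(pixels, pixels + count);
        ++frames;
    }
    std::vector<uint32_t> lastFrame;
    std::size_t frames = 0;
};

// Application-layer code: no platform headers, only the interface.
inline void renderSolid(LedOutput& out, uint32_t color, std::size_t count) {
    std::vector<uint32_t> frame(count, color);
    out.show(frame.data(), frame.size());
}
```

Whether a virtual call per frame is acceptable on the ESP32 hot path is an open question. A compile-time selection of the HAL (templates or build flags) would avoid the indirection at the cost of some flexibility.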
Success Metrics
Stability
- Zero corruption bugs in production (6 months)
- Mean time between crashes > 30 days
- 99.9% successful API calls
Maintainability
- New contributor onboarding time < 1 week
- Bug fix time < 48 hours
- Code review coverage > 80%
Performance
- Effect latency < 10ms
- API response time < 100ms
- Support 10,000+ LEDs per device
Related Issues & PRs
- PR #293: Pixelforge backport, UI stability improvements, UDP realtime speedup (foundational mutex work)
- Preset corruption root cause identified after 4 years (file write race condition)
Key Insights
"The current codebase has defensive overhead everywhere. With proper architecture, we eliminate runtime checks and improve both performance AND correctness."
"WLED is at an inflection point: beloved by community, rich features, but architectural debt threatens long-term viability. The journey has begun with PR #293's mutex improvements."
Next Steps
- Community Discussion: Gather feedback on this roadmap
- Working Groups: Form teams for Phase 1 priorities (Concurrency, Testing, Performance)
- Documentation: Start architectural documentation alongside code improvements
- Upstream Coordination: Share learnings and fixes with upstream WLED community
Note: This roadmap is a living document. Feedback and contributions welcome! The goal is incremental improvement while maintaining backward compatibility and community momentum.