
(discussion) WLED architecture and development vision for the next 10 years #299

@softhack007


This issue captures the architectural vision and roadmap discussed in PR #293, focusing on modernizing WLED's codebase to address technical debt while improving performance and maintainability.

Thanks @ewowi for starting this discussion with a good question 😃


AI-generated summary

Current State: Critical Issues Identified

1. Concurrency Model: "Wishful Single-Threading"

2. Architectural Debt

  • Global state accessible everywhere (no ownership model)
  • Multiple mutexes without clear acquisition order (deadlock risk)
  • Lock hierarchy violations throughout codebase
  • Magic numbers and hidden dependencies (e.g., `realtimeLock(65000)`, which turned out to mean 'infinite')
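When a code path truly needs two locks at once, the lock-order risk in the second bullet has a standard C++ remedy: acquire them together instead of one by one. A minimal sketch, assuming C++17: `std::scoped_lock` locks several mutexes using a deadlock-avoidance algorithm, so the two tasks below cannot deadlock even though they list the mutexes in opposite orders. `segmentMutex`, `fileMutex` and the task functions are hypothetical, not existing WLED-MM symbols.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for two WLED-MM locks (not actual WLED globals).
std::mutex segmentMutex;
std::mutex fileMutex;
int sharedCounter = 0;  // protected by both mutexes in this sketch

// The tasks name the mutexes in opposite orders. Locking them one at a time
// in these orders could deadlock; std::scoped_lock locks both through
// std::lock's deadlock-avoidance algorithm and unlocks them on scope exit.
void taskA(int iterations) {
  for (int i = 0; i < iterations; ++i) {
    std::scoped_lock lock(segmentMutex, fileMutex);
    ++sharedCounter;
  }
}

void taskB(int iterations) {
  for (int i = 0; i < iterations; ++i) {
    std::scoped_lock lock(fileMutex, segmentMutex);
    ++sharedCounter;
  }
}

// Runs both tasks concurrently and returns the final counter value.
int runBothTasks(int iterations) {
  assert(iterations >= 0);
  sharedCounter = 0;
  std::thread a(taskA, iterations);
  std::thread b(taskB, iterations);
  a.join();
  b.join();
  return sharedCounter;
}
```

This only covers the case where both locks are taken at the same point; it does not replace a documented lock hierarchy.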

3. Testing & Validation Gap

  • No automated testing (unit, integration, or concurrency stress tests)
  • No static analysis in CI pipeline
  • Bugs surface in production only after years
  • No ThreadSanitizer or other sanitizer validation

Vision: WLED-MM 2.0 and Beyond

Core Principles

  1. Safety First: Eliminate race conditions and data corruption
  2. Performance Through Architecture: Proper design unlocks hardware capabilities
  3. Maintainability: Clear module boundaries and ownership model
  4. Multi-Platform: ESP32 optimized, Linux-ready for large installations

10-Year Roadmap

Phase 1: Stabilization (Years 1-2) - "Stop the Bleeding"

Q1-Q2: Concurrency Foundation

  • Formalize lock hierarchy and document it
  • Create mutex helper classes with RAII (no manual lock/unlock)
  • Audit all global state access
  • Add ThreadSanitizer builds to CI
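A documented lock hierarchy is easier to keep when violations fail loudly. The sketch below, assuming C++11 or later, follows the well-known "hierarchical mutex" pattern: each lock gets a numeric level, and a thread may only take a lock whose level is lower than every lock it already holds. The levels and the `segmentLock`/`fileLock` names are hypothetical; on firmware built without exceptions, the `throw` would become a log message plus abort.

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <stdexcept>

// Runtime-checked lock hierarchy: locking must go from high to low levels.
class HierarchicalMutex {
 public:
  explicit HierarchicalMutex(unsigned level) : level_(level) {}

  void lock() {
    // Refuse before blocking, so a violation can never turn into a deadlock.
    if (currentLevel_ <= level_) throw std::logic_error("lock hierarchy violated");
    inner_.lock();
    previousLevel_ = currentLevel_;  // only the owning thread writes this
    currentLevel_ = level_;
  }

  void unlock() {
    assert(currentLevel_ == level_ && "unlock out of order");
    currentLevel_ = previousLevel_;
    inner_.unlock();
  }

 private:
  std::mutex inner_;
  const unsigned level_;
  unsigned previousLevel_ = 0;
  static thread_local unsigned currentLevel_;  // lowest level held by this thread
};

thread_local unsigned HierarchicalMutex::currentLevel_ = UINT_MAX;

// Hypothetical levels: segment state may be locked before file access, never after.
HierarchicalMutex segmentLock(200);
HierarchicalMutex fileLock(100);
```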

Q3-Q4: Critical Bug Fixes

  • Fix preset corruption (file mutex + atomic writes)
  • Eliminate known race conditions
  • Add error recovery mechanisms
  • Create "safe mode" boot option

Deliverable: WLED-MM 1.0 LTS - "Production Ready"

Phase 2: Modernization (Years 2-4) - "Refactor Core"

Year 2: Module Boundaries

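```cpp
// Target: clear module boundaries
// (SegmentProps and the uint8_t id type are placeholders)
class SegmentManager {
 public:
  Segment* createSegment();
  void updateSegment(uint8_t id, const SegmentProps& props);
  // All segment access goes through here
};
```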
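One way to make "all segment access goes through here" actually hold is to never hand out pointers at all. This sketch is a hypothetical variant of the target API: it returns ids and copies instead of `Segment*`, so no caller can touch a segment after the lock has been released. `SegmentProps` and its three fields are placeholders; the real `Segment` carries far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified segment properties.
struct SegmentProps {
  uint16_t start = 0;
  uint16_t stop = 0;
  uint8_t brightness = 255;
};

// The manager owns all segments; every read and write passes through one mutex.
class SegmentManager {
 public:
  // Creates a segment and returns its id.
  size_t createSegment(const SegmentProps& props) {
    std::lock_guard<std::mutex> lock(mutex_);
    segments_.push_back(props);
    return segments_.size() - 1;
  }

  // Returns false if the id is unknown.
  bool updateSegment(size_t id, const SegmentProps& props) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (id >= segments_.size()) return false;
    segments_[id] = props;
    return true;
  }

  // Returns a copy, so the caller never holds a reference into shared state.
  std::optional<SegmentProps> getSegment(size_t id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    if (id >= segments_.size()) return std::nullopt;
    return segments_[id];
  }

 private:
  mutable std::mutex mutex_;
  std::vector<SegmentProps> segments_;
};
```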

Year 3: Data Model Refactor

  • Replace global variables with managed state objects
  • Introduce ownership model
  • Create read/write interfaces with clear contracts
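The three bullets can be combined into one small building block: a value that can only be read as a consistent copy and only written inside a callback that runs under the object's own lock. A minimal sketch, assuming C++11; `Guarded` and `LightState` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>

// A managed state object: the value is private and guarded by its own mutex.
template <typename T>
class Guarded {
 public:
  explicit Guarded(T initial) : value_(std::move(initial)) {}

  // Read contract: returns a consistent copy, never a reference into state.
  T snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return value_;
  }

  // Write contract: fn receives T& and runs while the lock is held,
  // so it must not block or take other locks.
  template <typename Fn>
  void write(Fn&& fn) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::forward<Fn>(fn)(value_);
  }

 private:
  mutable std::mutex mutex_;
  T value_;
};

// Hypothetical example of state that would otherwise live in globals.
struct LightState {
  uint8_t brightness = 128;
  bool on = true;
};
```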

Year 4: Architecture Documentation

  • Generate architecture diagrams
  • Document threading model
  • Create contribution guidelines with code patterns

Deliverable: WLED-MM 2.0 - "Maintainable Architecture"

Phase 3: Quality Systems (Years 4-6) - "Engineering Rigor"

Testing Infrastructure

  • Unit tests for core algorithms (effects, color math)
  • Integration tests for protocols
  • Mock framework for hardware
  • Concurrency stress tests
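A hardware mock mostly needs a seam between effect code and the LED driver. Below is a sketch of such a seam together with a unit-testable color function, assuming C++11. `LedOutput`, `MockLedOutput`, `blend` and `renderFill` are hypothetical, and `blend` is a simple linear mix, not WLED's own color-blending code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hardware seam: effects produce pixels, an output driver sends them.
class LedOutput {
 public:
  virtual ~LedOutput() = default;
  virtual void show(const std::vector<uint32_t>& pixels) = 0;
};

// Test double: records frames instead of driving GPIO/DMA.
class MockLedOutput : public LedOutput {
 public:
  void show(const std::vector<uint32_t>& pixels) override { frames.push_back(pixels); }
  std::vector<std::vector<uint32_t>> frames;
};

// Pure color math: linear blend of two 0xRRGGBB colors, amount 0..255.
uint32_t blend(uint32_t a, uint32_t b, uint8_t amount) {
  uint32_t out = 0;
  for (int shift = 0; shift <= 16; shift += 8) {  // blue, green, red
    uint32_t ca = (a >> shift) & 0xFF;
    uint32_t cb = (b >> shift) & 0xFF;
    out |= ((ca * (255 - amount) + cb * amount) / 255) << shift;
  }
  return out;
}

// Minimal "effect": fills the strip with one blended color and shows it.
void renderFill(LedOutput& out, size_t count, uint32_t a, uint32_t b, uint8_t amount) {
  out.show(std::vector<uint32_t>(count, blend(a, b, amount)));
}
```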

Static Analysis Integration

  • Clang-Tidy in CI
  • ThreadSanitizer nightly builds
  • Memory leak detection
  • Complexity metrics tracking

Observability

  • Structured logging
  • Performance counters exposed via API
  • Diagnostic mode
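Performance counters that the render loop updates every frame should not add a lock to the hot path. A sketch using `std::atomic` with relaxed ordering, assuming the counters are independent statistics; the field names and the JSON layout are hypothetical, not an existing WLED-MM endpoint.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

struct PerfCounters {
  std::atomic<uint32_t> framesRendered{0};
  std::atomic<uint32_t> framesDropped{0};
  std::atomic<uint32_t> maxFrameTimeUs{0};

  // Called once per frame from the render task.
  void recordFrame(uint32_t frameTimeUs, bool dropped) {
    framesRendered.fetch_add(1, std::memory_order_relaxed);
    if (dropped) framesDropped.fetch_add(1, std::memory_order_relaxed);
    // Lock-free "store max": on failure prev is reloaded, and the loop stops
    // once our value is stored or a larger one is already there.
    uint32_t prev = maxFrameTimeUs.load(std::memory_order_relaxed);
    while (frameTimeUs > prev &&
           !maxFrameTimeUs.compare_exchange_weak(prev, frameTimeUs,
                                                 std::memory_order_relaxed)) {
    }
  }

  // For an API handler. Fields are read one by one, so the result is not
  // a single atomic snapshot of all three values.
  std::string toJson() const {
    return "{\"frames\":" + std::to_string(framesRendered.load()) +
           ",\"dropped\":" + std::to_string(framesDropped.load()) +
           ",\"maxFrameUs\":" + std::to_string(maxFrameTimeUs.load()) + "}";
  }
};
```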

Deliverable: WLED-MM 2.5 - "Enterprise Grade"

Phase 4: Next-Gen Features (Years 6-10) - "Innovation"

AI/ML Integration

  • On-device effect generation using TinyML
  • Adaptive brightness based on usage patterns
  • Automatic optimization for LED types

Distributed Systems

  • Multi-device orchestration (100+ devices in sync)
  • Mesh networking for large installations
  • Cloud integration

Performance Optimization

  • GPU acceleration for ESP32-S3
  • Hardware-accelerated DMA
  • Sub-millisecond effect latency

Deliverable: WLED-MM 3.0 - "Smart LED Platform"

Immediate Action Items (Next 6 Months)

P0: Fix Corruption Issues

  1. Implement `presetFileMux` (PR #293: "Pixelforge backport, UI stability improvements, speedup for UDP real-time")
  2. Atomic file writes (temp + rename)
  3. Add file corruption detection + recovery
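Item 2 can be sketched with standard C++ file streams: write the complete new content to a temporary file, close it, then rename it over the original, so an interrupted write leaves either the old file or the new one. The sketch assumes a `rename()` that atomically replaces an existing destination, as POSIX specifies; whether the ESP32 filesystem layer gives the same guarantee should be verified. `writeFileAtomically`, `readFile` and the `.tmp` suffix are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Temp + rename. Note: ofstream::flush() does not fsync; a production
// version would also sync the temp file to storage before renaming.
bool writeFileAtomically(const std::string& path, const std::string& content) {
  const std::string tmpPath = path + ".tmp";
  {
    std::ofstream out(tmpPath, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << content;
    out.flush();
    if (!out) {
      std::remove(tmpPath.c_str());
      return false;
    }
  }  // temp file is closed here, before the rename
  if (std::rename(tmpPath.c_str(), path.c_str()) != 0) {
    std::remove(tmpPath.c_str());
    return false;
  }
  return true;
}

// Reads a whole file; returns an empty string if it cannot be opened.
std::string readFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::ostringstream ss;
  ss << in.rdbuf();
  return ss.str();
}
```

A leftover `.tmp` file at boot then signals an interrupted write, which is one possible input for the detection and recovery in item 3.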

P0: Concurrency Safety

  1. Document lock acquisition order
  2. Create `ScopedLock` helper class
  3. Audit and fix all concurrent access to `strip`
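A minimal sketch of the `ScopedLock` helper from item 2, assuming it wraps a mutex that supports timed acquisition. `std::timed_mutex` stands in here so the example runs on a host; on ESP32 the same shape could wrap a FreeRTOS semaphore instead. The constructor signature, `owns()`, and `stripMutex` are assumptions, not an agreed API.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// RAII lock with timeout: unlocking is tied to scope exit, so no code path
// can forget it, and a timeout is reported instead of blocking forever.
class ScopedLock {
 public:
  ScopedLock(std::timed_mutex& m, std::chrono::milliseconds timeout)
      : mutex_(m), owned_(m.try_lock_for(timeout)) {}
  ~ScopedLock() {
    if (owned_) mutex_.unlock();
  }

  ScopedLock(const ScopedLock&) = delete;
  ScopedLock& operator=(const ScopedLock&) = delete;

  // Callers must check this before touching the protected data.
  bool owns() const { return owned_; }

 private:
  std::timed_mutex& mutex_;
  const bool owned_;
};

std::timed_mutex stripMutex;  // hypothetical lock guarding `strip`
```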

P1: Developer Tools

  1. Add `--enable-thread-sanitizer` build option
  2. Create concurrency stress test suite
  3. Set up nightly CI builds with sanitizers
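A concurrency stress test can start as simply as several threads hammering one shared structure while readers check an invariant. The sketch assumes C++11 threads on a host build; the `Range` struct and its invariant are hypothetical. With the mutex in place the function should return 0; deleting the two `lock_guard` lines turns it into a data race that a host build compiled with `-fsanitize=thread` (GCC or Clang) will report.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Two fields that must change together. Invariant: stop == start + length.
struct Range {
  uint32_t start = 0;
  uint32_t length = 10;
  uint32_t stop = 10;
};

// Returns the number of invariant violations the readers observed.
int runRangeStressTest(int writers, int readers, int iterations) {
  Range range;
  std::mutex m;
  std::atomic<int> violations{0};
  std::vector<std::thread> threads;

  for (int w = 0; w < writers; ++w) {
    threads.emplace_back([&, w] {
      for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(m);
        range.start = static_cast<uint32_t>(w * iterations + i);
        range.stop = range.start + range.length;
      }
    });
  }
  for (int r = 0; r < readers; ++r) {
    threads.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(m);
        if (range.stop != range.start + range.length) violations.fetch_add(1);
      }
    });
  }
  for (auto& t : threads) t.join();
  return violations.load();
}
```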

P2: Code Quality

  1. Replace magic numbers with named constants
  2. Add error code enums (not just log messages)
  3. Create coding standards document
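Items 1 and 2 sketched in C++, assuming C++11. The constant name is hypothetical; only the value 65000 and its 'infinite' meaning come from the `realtimeLock(65000)` example earlier in this issue. The error codes are illustrative, not an agreed list.

```cpp
#include <cassert>
#include <cstdint>

// Named constant instead of a bare 65000 at each call site.
constexpr uint16_t kRealtimeTimeoutInfinite = 65000;

// Scoped enum, so callers can branch on errors instead of parsing log text.
enum class ErrorCode : uint8_t {
  Ok = 0,
  FileNotFound,
  FileCorrupt,
  LockTimeout,
  OutOfMemory,
};

// Human-readable text for logs and API responses.
const char* toString(ErrorCode e) {
  switch (e) {
    case ErrorCode::Ok: return "ok";
    case ErrorCode::FileNotFound: return "file not found";
    case ErrorCode::FileCorrupt: return "file corrupt";
    case ErrorCode::LockTimeout: return "lock timeout";
    case ErrorCode::OutOfMemory: return "out of memory";
  }
  return "unknown";
}

// Hypothetical helper: the intent is readable wherever it is used.
bool isInfiniteTimeout(uint16_t timeout) { return timeout == kRealtimeTimeoutInfinite; }
```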

Performance Strategy

ESP32-S3 Optimization

  • Dual-core utilization: Network on Core 0, Rendering on Core 1
  • DMA optimization: Async LED output with callbacks
  • SIMD instructions: Use ESP32-S3 vector operations for color math
  • Lock-free buffers: Double-buffering with atomic swaps for realtime paths
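A plain double buffer cannot safely be swapped while the output task may still be sending the old front buffer, so this sketch of the last bullet uses a triple buffer, the usual lock-free extension of double buffering for one producer (the renderer) and one consumer (the LED output). It assumes C++11 atomics; `TripleBuffer` and its method names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Single-producer/single-consumer triple buffer. The producer owns `back`,
// the consumer owns `front`, and the third buffer (`middle`) is exchanged
// atomically between them: neither side waits, and the consumer never sees
// a half-written frame.
template <typename Frame>
class TripleBuffer {
 public:
  // Producer side: render into back(), then publish() it.
  Frame& back() { return buffers_[back_]; }
  void publish() {
    // Move the finished buffer into the middle slot and mark it fresh.
    uint8_t prev = middle_.exchange(static_cast<uint8_t>(back_ | kFresh),
                                    std::memory_order_acq_rel);
    back_ = static_cast<uint8_t>(prev & kIndexMask);
  }

  // Consumer side: swap in the newest frame, if one was published.
  bool fetch() {
    if ((middle_.load(std::memory_order_acquire) & kFresh) == 0) return false;
    uint8_t prev = middle_.exchange(front_, std::memory_order_acq_rel);
    front_ = static_cast<uint8_t>(prev & kIndexMask);
    return true;
  }
  const Frame& front() const { return buffers_[front_]; }

 private:
  static constexpr uint8_t kFresh = 0x4;      // bit 2: middle holds an unread frame
  static constexpr uint8_t kIndexMask = 0x3;  // bits 0-1: buffer index
  Frame buffers_[3]{};
  std::atomic<uint8_t> middle_{1};
  uint8_t back_ = 0;   // touched only by the producer
  uint8_t front_ = 2;  // touched only by the consumer
};
```

If the renderer publishes twice before the output fetches, the older frame is simply skipped, which is usually the right behavior for realtime LED output.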

Performance Targets

  • ESP32-S3: 10,000 LEDs @ 60 FPS
  • RasPi 4: 50,000 LEDs @ 60 FPS
  • API response: <10ms
  • Effect switching: <5ms latency
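The frame-rate targets translate into tight budgets, made explicit here as compile-time arithmetic. The LED timing is an assumption not stated in the roadmap: WS281x-style LEDs at 800 kbit/s, i.e. 24 bits x 1.25 us = 30 us of wire time per LED, ignoring the reset gap.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kFps = 60;
constexpr uint64_t kFrameBudgetNs = 1'000'000'000 / kFps;  // 16,666,666 ns

// Rendering time available per LED if one core does all the work.
constexpr uint64_t budgetPerLedNs(uint64_t leds) { return kFrameBudgetNs / leds; }

// Assumed wire time per LED (24 bits at 800 kbit/s).
constexpr uint64_t kWireTimePerLedNs = 30'000;

// Parallel data outputs needed so each output's wire time fits in one frame.
constexpr uint64_t outputsNeeded(uint64_t leds) {
  return (leds + kFrameBudgetNs / kWireTimePerLedNs - 1) /
         (kFrameBudgetNs / kWireTimePerLedNs);  // ceil(leds / 555)
}

static_assert(budgetPerLedNs(10'000) == 1'666, "ESP32-S3 target: ~1.7 us per LED");
static_assert(budgetPerLedNs(50'000) == 333, "RasPi 4 target: ~0.33 us per LED");
static_assert(outputsNeeded(10'000) == 19, "10,000 such LEDs @ 60 FPS need >= 19 outputs");
```

Under that assumption one data line carries at most about 555 LEDs at 60 FPS, so the ESP32-S3 target implies many parallel outputs in addition to fast rendering.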

Testing & Analysis Tools

QEMU Integration

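```bash
# ESP32 emulation (Espressif's QEMU fork; flash image name is a placeholder)
qemu-system-xtensa -nographic -M esp32 -drive file=wled_flash.bin,if=mtd,format=raw
# Sanitizers are compiler flags, not QEMU options: ThreadSanitizer runs need a
# 64-bit host build compiled with -fsanitize=thread
```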

CI Pipeline

```yaml
# Automated testing workflow
- ThreadSanitizer builds
- Concurrency stress tests
- Memory leak detection
- Performance regression tests
```

Multi-Platform Architecture

```
┌─────────────────────────────────────────┐
│        WLED Application Layer           │
│  (Effects, Presets, UI, JSON API)       │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────┴──────────────────────┐
│         Platform Abstraction Layer      │
│  ┌────────────┐        ┌──────────────┐ │
│  │  ESP32 HAL │        │  Linux HAL   │ │
│  └────────────┘        └──────────────┘ │
└─────────────────────────────────────────┘
```
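In code, the Platform Abstraction Layer boundary could be a small abstract interface that the application layer depends on. A sketch, assuming C++11; `PlatformHal`, `LinuxHalStub` and `renderSolid` are hypothetical. An ESP32 implementation would provide the same two functions on top of ESP-IDF drivers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The only platform surface the application layer is allowed to see.
class PlatformHal {
 public:
  virtual ~PlatformHal() = default;
  virtual uint32_t millis() const = 0;                             // monotonic time
  virtual bool writePixels(const std::vector<uint32_t>& rgb) = 0;  // push one frame
};

// Host-side stub for Linux builds and tests: records frames, fakes time.
class LinuxHalStub : public PlatformHal {
 public:
  uint32_t millis() const override { return now_; }
  bool writePixels(const std::vector<uint32_t>& rgb) override {
    lastFrame = rgb;
    ++framesWritten;
    return true;
  }
  void advance(uint32_t ms) { now_ += ms; }

  std::vector<uint32_t> lastFrame;
  int framesWritten = 0;

 private:
  uint32_t now_ = 0;
};

// Application-layer code depends only on the interface.
void renderSolid(PlatformHal& hal, size_t count, uint32_t color) {
  hal.writePixels(std::vector<uint32_t>(count, color));
}
```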

Success Metrics

Stability

  • Zero corruption bugs reported in production over a 6-month period
  • Mean time between crashes > 30 days
  • 99.9% successful API calls

Maintainability

  • New contributor onboarding time < 1 week
  • Bug fix time < 48 hours
  • Code review coverage > 80%

Performance

  • Effect latency < 10ms
  • API response time < 100ms
  • Support 10,000+ LEDs per device

Related Issues & PRs

Key Insights

"The current codebase has defensive overhead everywhere. With proper architecture, we eliminate runtime checks and improve both performance AND correctness."

"WLED is at an inflection point: beloved by the community and rich in features, but its architectural debt threatens long-term viability. The journey has begun with PR #293's mutex improvements."

Next Steps

  1. Community Discussion: Gather feedback on this roadmap
  2. Working Groups: Form teams for Phase 1 priorities (Concurrency, Testing, Performance)
  3. Documentation: Start architectural documentation alongside code improvements
  4. Upstream Coordination: Share learnings and fixes with upstream WLED community

Note: This roadmap is a living document. Feedback and contributions are welcome! The goal is incremental improvement while maintaining backward compatibility and community momentum.

Metadata

  • Assignees: No one assigned
  • Labels: enhancement (New feature or request)
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
