
Raspberry Pi 5 hard freeze under high NVMe I/O load (Frigate / InfluxDB) – system remains pingable but SSH and journald hang #7184

@Boldfor

Description


Describe the bug

On a Raspberry Pi 5, the system occasionally enters a complete freeze under high NVMe I/O load.
The device remains reachable via ICMP (ping works), but:

  • SSH becomes unresponsive
  • systemd-journald stops processing logs
  • no further kernel messages are emitted
  • the system does not recover without a power cycle

The issue occurs very rarely (weeks/months of stable uptime), but is reproducible by triggering heavy I/O, e.g.:

  • scrubbing a long Frigate video timeline (NVMe read-heavy)
  • querying a full year of data in openHAB backed by InfluxDB

This looks like an I/O or PCIe/NVMe stall rather than a userspace crash or OOM condition.
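
If any console (e.g. a serial console) is still partially responsive during a freeze, a blocked-task dump via magic SysRq could confirm the stall. A minimal sketch, assuming SysRq is set up beforehand; I have not captured this yet:

    # allow all SysRq functions (run before the freeze)
    echo 1 | sudo tee /proc/sys/kernel/sysrq

    # during a freeze: dump all tasks stuck in uninterruptible (D) sleep to the kernel log
    echo w | sudo tee /proc/sysrq-trigger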

Steps to reproduce the behaviour

  1. Raspberry Pi 5 running Raspberry Pi OS (64-bit), kernel 6.6.x+rpt-rpi-2712
  2. System booted from NVMe (via Pineboards AI Bundle, M-Key)
  3. Run Docker containers including:
     • Frigate (video recordings on NVMe)
     • InfluxDB
     • openHAB
     • RaspberryMatic / OpenCCU
  4. Trigger high sustained NVMe load, e.g.:
     • scrub through a long Frigate video timeline
     • query a large time range (e.g. 1 year) in openHAB that hits InfluxDB
  5. After some time (minutes), the system freezes:
     • ping still works
     • SSH and journald hang
     • no recovery without power cycle

The issue does not happen under normal load and may take weeks to recur.
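
For a more deterministic reproduction independent of Frigate/InfluxDB, a synthetic load generator might trigger the same stall. A rough sketch using fio (untested for this issue; the runtime, job count and file size are arbitrary assumptions, not values verified to trigger the freeze):

    sudo apt install fio

    # sustained 4k random reads against the NVMe root filesystem,
    # roughly mimicking timeline scrubbing and large InfluxDB queries
    fio --name=nvme-stress --directory=/var/tmp \
        --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
        --numjobs=4 --size=2G --time_based --runtime=600 --group_reporting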

Device(s)

Raspberry Pi 5

System

Hardware:

  • Raspberry Pi 5 Model B
  • Pineboards AI Bundle (NVMe M-Key + E-Key, incl. Hailo 8L – not actively used)
  • NVMe SSD (used as root filesystem and for Docker data)

Kernel:

  • 6.6.74+rpt-rpi-2712 (Raspberry Pi kernel)

OS:

  • Raspberry Pi OS 64-bit (Bookworm)

Storage:

  • Root filesystem on NVMe

USB devices:

  • Homematic RF stick (directly connected via USB extension cable)
  • Zigbee + Z-Wave + Amber dongles (via active USB hub)

Note: the issue also occurred previously with a passive hub.

Containers (Docker):

  • openHAB
  • InfluxDB
  • Frigate
  • RaspberryMatic / OpenCCU

Logs

Relevant observations from logs before/during freezes:

  • NVMe timeouts and aborts:
      nvme nvme0: I/O timeout, aborting
      I/O error, dev nvme0n1
  • Journald overwhelmed or stalled:
      /dev/kmsg buffer overrun, some messages lost
      systemd-journald watchdog timeout
  • No classic OOM signature before the freeze
  • No soft-lockup watchdog messages
  • After reboot, the previous journal is sometimes marked as uncleanly shut down

Full logs around the freeze are attached to this issue.
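
Because journald stops processing logs during the freeze, the most interesting kernel messages are probably lost. On the next occurrence I can try to capture them out-of-band; a sketch, where the IP addresses, interface name and MAC address are placeholders for my network:

    # on the Pi: stream kernel messages over UDP to another host via netconsole
    sudo modprobe netconsole netconsole=6665@192.168.1.50/eth0,6666@192.168.1.10/aa:bb:cc:dd:ee:ff

    # on the receiving host: listen for the messages
    nc -u -l -p 6666        # or "nc -u -l 6666", depending on the netcat variant

    # additionally, make the journal persistent so pre-freeze messages survive the power cycle
    sudo mkdir -p /var/log/journal
    sudo systemctl restart systemd-journald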

Additional context

The system can run perfectly stable for weeks or months

The freeze correlates strongly with high NVMe I/O pressure

USB power issues are unlikely:

  • active USB hub in use
  • freeze still occurs

Ping remaining functional suggests:

  • kernel still partially alive
  • likely blocked kernel threads (I/O / PCIe / NVMe path)
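
If kernel threads really are blocked in the I/O path, the hung task detector should catch them. A sketch of what I could enable ahead of time (assuming the Raspberry Pi kernel is built with CONFIG_DETECT_HUNG_TASK; 120 s is just the common default timeout, not a value derived from the logs):

    # log a stack trace for any task stuck in uninterruptible (D) state longer than 120 s
    sudo sysctl -w kernel.hung_task_timeout_secs=120

    # optionally panic instead of staying frozen, and auto-reboot 30 s later
    sudo sysctl -w kernel.hung_task_panic=1
    sudo sysctl -w kernel.panic=30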

Similar behavior has been observed with both Frigate (video I/O) and InfluxDB (large queries)

If helpful, I can:

  • test kernel parameters (e.g. NVMe power state limits; see the sketch after this list)
  • enable the hung task panic described above for better diagnostics
  • provide additional traces on next occurrence
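
For the kernel parameter test, the candidates I have in mind are the NVMe APST latency limit and PCIe ASPM, both commonly suggested workarounds for NVMe stalls on the Pi 5 rather than confirmed fixes for this issue:

    # append to the single line in /boot/firmware/cmdline.txt, then reboot
    nvme_core.default_ps_max_latency_us=0 pcie_aspm=off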
