Describe the bug
On a Raspberry Pi 5, the system occasionally enters a complete freeze under high NVMe I/O load.
The device remains reachable via ICMP (ping works), but:
- SSH becomes unresponsive
- systemd-journald stops processing logs
- no further kernel messages are emitted
- the system does not recover without a power cycle
The issue occurs very rarely (weeks/months of stable uptime), but is reproducible by triggering heavy I/O, e.g.:
- scrubbing a long Frigate video timeline (NVMe read-heavy)
- querying a full year of data in openHAB backed by InfluxDB
This looks like an I/O or PCIe/NVMe stall rather than a userspace crash or OOM condition.
Steps to reproduce the behaviour
- Raspberry Pi 5 running Raspberry Pi OS (64-bit), kernel 6.6.x+rpt-rpi-2712
- System booted from NVMe (via Pineboards AI Bundle, M-Key)
- Run Docker containers including:
  - Frigate (video recordings on NVMe)
  - InfluxDB
  - openHAB
  - RaspberryMatic / OpenCCU
- Trigger high sustained NVMe load, e.g.:
  - Scrub through a long Frigate video timeline
  - Query a large time range (e.g. 1 year) in openHAB that hits InfluxDB
- After some time (minutes), the system freezes:
  - ping still works
  - SSH and journald hang
  - no recovery without power cycle
The issue does not happen under normal load and may require weeks to reoccur.
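For reproduction without waiting on Frigate/InfluxDB, something like the fio run below might approximate the same sustained NVMe read pressure. This is only a sketch and unverified as a trigger; it assumes fio is installed, and the file name, size, and job parameters are arbitrary:

```
# Hypothetical synthetic load: sustained direct random reads against a file on
# the NVMe root filesystem, roughly mimicking timeline scrubbing / large queries.
# Assumes fio is installed (sudo apt install fio); all parameters are arbitrary.
sudo fio --name=nvme-read-stress \
         --filename=/var/tmp/fio-testfile \
         --size=8G \
         --rw=randread \
         --bs=128k \
         --iodepth=32 \
         --ioengine=libaio \
         --direct=1 \
         --runtime=600 \
         --time_based \
         --numjobs=4
```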
Device(s)
Raspberry Pi 5
System
Hardware:
- Raspberry Pi 5 Model B
- Pineboards AI Bundle (NVMe M-Key + E-Key, incl. Hailo 8L – not actively used)
- NVMe SSD (used as root filesystem and for Docker data)
Kernel:
- 6.6.74+rpt-rpi-2712 (Raspberry Pi kernel)
OS:
- Raspberry Pi OS 64-bit (Bookworm)
Storage:
- Root filesystem on NVMe
USB devices:
- Homematic RF stick (directly connected via USB extension cable)
- Zigbee + Z-Wave + Amber dongles (via active USB hub)
  - the issue also occurred previously with a passive hub
Containers (Docker):
- openHAB
- InfluxDB
- Frigate
- RaspberryMatic / OpenCCU
Logs
Relevant observations from logs before/during freezes:
- NVMe timeouts and aborts:
  nvme nvme0: I/O timeout, aborting
  I/O error, dev nvme0n1
- Journald overwhelmed or stalled:
  /dev/kmsg buffer overrun, some messages lost
  systemd-journald watchdog timeout
- No classic OOM signature before the freeze
- No soft-lockup watchdog messages
- After reboot, previous journal sometimes marked as uncleanly shut down
Full logs around the freeze are attached to this issue.
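If useful, I can also attach NVMe health and controller error-log data; the commands below are what I would run (assuming nvme-cli and smartmontools are installed, and that /dev/nvme0 is the correct device on this system):

```
# Assumes nvme-cli and smartmontools are installed
# (sudo apt install nvme-cli smartmontools).
sudo nvme smart-log /dev/nvme0   # SMART/health data for the controller
sudo nvme error-log /dev/nvme0   # controller error-log entries (aborted commands etc.)
sudo smartctl -a /dev/nvme0      # full SMART report as a cross-check
```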
Additional context
- The system can run perfectly stably for weeks or months
- The freeze correlates strongly with high NVMe I/O pressure
- USB power issues are unlikely:
  - an active USB hub is in use
  - the freeze still occurs
- Ping remaining functional suggests:
  - the kernel is still partially alive
  - likely blocked kernel threads (I/O / PCIe / NVMe path)
- Similar behaviour has been observed with both Frigate (video I/O) and InfluxDB (large queries)
If helpful, I can:
- test kernel parameters (e.g. NVMe power state limits; see the sketch below)
- enable hung task panic for better diagnostics
- provide additional traces on the next occurrence
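For the first two points, this is roughly what I would try. It is only a sketch: the specific values are my own assumptions rather than recommended settings, and the hung-task sysctls only exist if hung task detection is compiled into the kernel.

```
# Disable NVMe APST and PCIe ASPM via the kernel command line.
# On Raspberry Pi OS Bookworm this is /boot/firmware/cmdline.txt; the options
# are appended to the existing single line (values are assumptions, not tuned):
#   nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

# Make a hung task panic instead of silently stalling, so the next freeze
# leaves a stack trace of the blocked I/O path (timeout value is arbitrary):
sudo sysctl -w kernel.hung_task_timeout_secs=120
sudo sysctl -w kernel.hung_task_panic=1

# On demand, dump blocked (D-state) tasks to the kernel log while the system
# is still responsive, as a baseline for comparison (requires sysrq enabled):
echo w | sudo tee /proc/sysrq-trigger
```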