kernel: claim src/dst at device_run, not at buf_done (fixes panic from #7) #8

Merged
marfrit merged 1 commit from noether/kernel-claim-bufs-at-device-run into main 2026-05-21 11:54:53 +00:00

1 commit

Commit f10a26d883 by claude-noether: kernel: claim src/dst at device_run, not at buf_done

Hard reboot observed on higgs (Pi CM5) during the first mpv vaapi-copy
playback against the freshly-deployed r28+g79256dc stack — kernel
panic, no persistent journal, no recoverable trace.  Bug introduced
by the daedalus-v4l2#6 reorder fix (#7).

Cause
-----
The new completion path runs `v4l2_m2m_job_finish` on SRC_CONSUMED
even when the dst_buf is still parked (waiting for a future
HAS_PIXELS).  job_finish moves the m2m_ctx back to IDLE, and the
scheduler dispatches the next device_run.  That device_run calls
`v4l2_m2m_next_dst_buf`, which returns the head of the CAPTURE
ready-queue.  The head is STILL the parked dst_buf, because we never
removed it.  Two inflight entries now reference the same vb2_buffer.
The later HAS_PIXELS triggers `v4l2_m2m_dst_buf_remove_by_buf` on a
vb2_buffer whose list_head is no longer linked to that queue, and
`list_del()` then writes through stale next/prev pointers, corrupting
whatever ELSE now lives at those addresses.
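
The aliasing step can be reproduced in a small userspace model.  This
is only a sketch, not the driver code: `model_buf`, `model_queue` and
the `model_*` helpers are hypothetical stand-ins for the vb2_buffer and
the CAPTURE rdy_queue, and `model_queue_peek` mimics only the
"return the head without unlinking" behaviour of
`v4l2_m2m_next_dst_buf`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model, not driver code: a FIFO standing in for the rdy_queue. */
struct model_buf {
	int id;
	struct model_buf *next;
};

struct model_queue {
	struct model_buf *head;
	struct model_buf *tail;
};

/* Append a buffer to the tail of the ready queue. */
static void model_queue_push(struct model_queue *q, struct model_buf *b)
{
	assert(b != NULL);
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
}

/* Return the head without unlinking it (peek semantics). */
static struct model_buf *model_queue_peek(const struct model_queue *q)
{
	return q->head;
}

/*
 * Two back-to-back device_run calls that only peek.  Returns 1 when both
 * runs were handed the same buffer, i.e. the aliasing described above.
 */
int model_peek_only_aliases(void)
{
	struct model_buf a = { .id = 1 }, b = { .id = 2 };
	struct model_queue q = { NULL, NULL };
	struct model_buf *first, *second;

	model_queue_push(&q, &a);
	model_queue_push(&q, &b);

	first = model_queue_peek(&q);   /* run 1: dst parked, job finished */
	second = model_queue_peek(&q);  /* run 2: head was never removed */

	return first == second;
}
```

In this model `model_peek_only_aliases()` reports that both runs saw
buffer 1, even though buffer 2 was available behind it.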

Fix
---
Take both src and dst off `m2m_ctx`'s rdy_queue at device_run — as
soon as `v4l2_m2m_next_*_buf` has peeked them and all early-exit
validation has passed.  After that, the daemon owns both halves
exclusively via the inflight item; the m2m scheduler can't re-issue
them on the next device_run.  The completion path drops the redundant
`_remove_by_buf` calls: the buffers are already detached from the
rdy_queue, so `buf_done` alone is correct.
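
A minimal userspace sketch of why claiming at dispatch closes the
window.  It assumes a simplified FIFO; `model_queue_claim` is a
hypothetical helper that unlinks the head the way a remove at
device_run would, and none of these names exist in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model, not driver code. */
struct model_buf {
	int id;
	struct model_buf *next;
};

struct model_queue {
	struct model_buf *head;
	struct model_buf *tail;
};

/* Append a buffer to the tail of the ready queue. */
static void model_queue_push(struct model_queue *q, struct model_buf *b)
{
	assert(b != NULL);
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
}

/* Claim the head: unlink it so a later dispatch cannot see it again. */
static struct model_buf *model_queue_claim(struct model_queue *q)
{
	struct model_buf *b = q->head;

	if (!b)
		return NULL;
	q->head = b->next;
	if (!q->head)
		q->tail = NULL;
	b->next = NULL;
	return b;
}

/*
 * Two back-to-back device_run calls that claim at dispatch.  Returns 1
 * when the runs received distinct buffers and the queue was drained.
 */
int model_claim_gives_distinct_buffers(void)
{
	struct model_buf a = { .id = 1 }, b = { .id = 2 };
	struct model_queue q = { NULL, NULL };
	struct model_buf *first, *second;

	model_queue_push(&q, &a);
	model_queue_push(&q, &b);

	first = model_queue_claim(&q);   /* run 1 owns buffer 1, may park it */
	second = model_queue_claim(&q);  /* run 2 gets buffer 2, not buffer 1 */

	return first == &a && second == &b && q.head == NULL;
}
```

Because the first run's buffer is already off the queue, a parked
buffer cannot be handed out a second time, and completion has nothing
left to unlink.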

Matches the amphion `vdec.c`/`venc.c` pattern (which also claims at
device_run for the same reason: amphion's encode pipeline parks
output buffers across multiple frames waiting for the codec to
finish, structurally the same as our H.264 B-frame DPB parking).

`fail_buf_error` learns about the new `claimed` flag and skips the
`v4l2_m2m_*_buf_remove` calls when the buffers have already been
removed by-buf at device_run.
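
The error-path rule can be sketched the same way.  This is a userspace
model under assumptions: the real `fail_buf_error` signature and the
layout of the inflight item are not shown in this commit message, so
`model_job`, `model_fail_buf_error` and the `MODEL_BUF_*` states are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model, not driver code. */
enum model_state { MODEL_BUF_QUEUED, MODEL_BUF_ERROR };

struct model_buf {
	int id;
	enum model_state state;
	struct model_buf *next;
};

struct model_queue {
	struct model_buf *head;
	struct model_buf *tail;
};

/* Hypothetical inflight item: the buffer plus the "claimed" flag. */
struct model_job {
	struct model_buf *dst;
	int claimed;
};

static void model_queue_push(struct model_queue *q, struct model_buf *b)
{
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
}

static struct model_buf *model_queue_claim(struct model_queue *q)
{
	struct model_buf *b = q->head;

	if (!b)
		return NULL;
	q->head = b->next;
	if (!q->head)
		q->tail = NULL;
	b->next = NULL;
	return b;
}

/* Error path: unlink only if the buffer is still queued, then mark it. */
void model_fail_buf_error(struct model_queue *q, struct model_job *job)
{
	if (!job->claimed) {
		assert(q->head == job->dst);  /* unclaimed means still at head */
		model_queue_claim(q);
	}
	job->dst->state = MODEL_BUF_ERROR;
}

/* Returns 1 when both the early and the late failure leave the queue sane. */
int model_fail_paths_keep_queue_sane(void)
{
	struct model_buf a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
	struct model_queue q = { NULL, NULL };
	struct model_job late, early;

	model_queue_push(&q, &a);
	model_queue_push(&q, &b);
	model_queue_push(&q, &c);

	/* Late failure: buffer 1 was claimed, so the queue must not change. */
	late.dst = model_queue_claim(&q);
	late.claimed = 1;
	model_fail_buf_error(&q, &late);
	if (q.head != &b || a.state != MODEL_BUF_ERROR)
		return 0;

	/* Early failure: buffer 2 was only peeked, so it is unlinked here. */
	early.dst = q.head;
	early.claimed = 0;
	model_fail_buf_error(&q, &early);

	return q.head == &c && b.state == MODEL_BUF_ERROR &&
	       c.state == MODEL_BUF_QUEUED;
}
```

Without the flag, the late failure in this model would unlink buffer 2
instead of the already-claimed buffer 1.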

Verified
--------
Builds clean against 6.18.29+rpt-rpi-2712.  Field test pending —
deploy via marfrit-packages bump in lock-step with the daemon
(which doesn't need to change for this fix; PROTO_VERSION stays at 1).
2026-05-21 13:49:44 +02:00