kernel: per-ctx vb2 lock — Firefox multi-process VAAPI unblock #3
Firefox YouTube playback works on a Pi 5 with daedalus_v4l2
Bug
daedalus_queue_init was wiring src_vq->lock and dst_vq->lock to ctx->dev->m2m_lock, a device-wide mutex. Every vb2 ioctl (S_FMT, REQBUFS, QBUF, DQBUF, STREAMON, ...) was therefore serialised across all concurrent clients of /dev/video0.
For single-client consumers (the test_m2m_* harness, simple ffmpeg jobs) this was invisible. Firefox, however, spawns separate content, RDD, and GPU processes that each open /dev/video0 and run the libva probe simultaneously. There the contention surfaced as EBUSY from one libva session's S_FMT(OUTPUT_MPLANE) while another session was mid-streamon.
Result: VAAPI initialised in one Firefox process, but the other processes that also wanted to decode (e.g. the content process handling the actual video tags) bailed out at S_FMT before any frame was queued.
Fix
Each open() gets its own ctx->vb_mutex, initialised in daedalus_open and destroyed in daedalus_release (and in the err_ctrl unwind path). With per-context vb2_queue locks, concurrent clients no longer fight.
cedrus / rkvdec / hantro all use the per-ctx vb mutex pattern for the same reason; this mirrors them.
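The difference between the two lock scopes can be illustrated with a small userspace model. This is only a sketch built on pthreads, not the driver code: every `model_*` name and the `lock_scope` enum are hypothetical stand-ins that echo the identifiers above, and `pthread_mutex_trylock` returning EBUSY is used purely as a way to observe that two contexts share a lock. It does not claim to reproduce how the kernel path arrives at EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the driver's real types. */
struct model_dev {
    pthread_mutex_t m2m_lock;          /* device-wide lock */
};

struct model_queue {
    pthread_mutex_t *lock;             /* pointer to whichever mutex guards the queue */
};

struct model_ctx {
    struct model_dev *dev;
    pthread_mutex_t vb_mutex;          /* per-open() lock */
    struct model_queue src_vq, dst_vq;
};

enum lock_scope { LOCK_PER_DEVICE, LOCK_PER_CTX };

/* Models open(): allocate the context, init its mutex, wire both queues. */
struct model_ctx *model_open(struct model_dev *dev, enum lock_scope scope)
{
    struct model_ctx *ctx = calloc(1, sizeof(*ctx));
    pthread_mutex_t *lock;

    if (!ctx)
        return NULL;
    ctx->dev = dev;
    if (pthread_mutex_init(&ctx->vb_mutex, NULL) != 0) {
        free(ctx);
        return NULL;
    }
    /* Old wiring: shared device lock. New wiring: this context's own lock. */
    lock = (scope == LOCK_PER_CTX) ? &ctx->vb_mutex : &dev->m2m_lock;
    ctx->src_vq.lock = lock;
    ctx->dst_vq.lock = lock;
    return ctx;
}

/* Models release(): destroy the per-context mutex before freeing. */
void model_release(struct model_ctx *ctx)
{
    pthread_mutex_destroy(&ctx->vb_mutex);
    free(ctx);
}

/*
 * Hold a's queue lock, then try to take b's.
 * Returns 0 if the locks are independent, EBUSY if they are the same mutex.
 */
int model_contend(struct model_ctx *a, struct model_ctx *b)
{
    int ret = pthread_mutex_lock(a->src_vq.lock);

    assert(ret == 0);
    ret = pthread_mutex_trylock(b->src_vq.lock);
    if (ret == 0)
        pthread_mutex_unlock(b->src_vq.lock);
    pthread_mutex_unlock(a->src_vq.lock);
    return ret;
}
```

In this model, two contexts opened with `LOCK_PER_DEVICE` make `model_contend` return EBUSY, while two opened with `LOCK_PER_CTX` make it return 0, which is the independence the per-context mutex is meant to provide.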
Verification
On higgs (Pi CM5, kernel 6.18.29):
MOZ_VA_API_ENABLED=1 LIBVA_DRIVER_NAME=v4l2_request firefox
Daemon journal shows clean per-frame REQ_DECODE / decoder:OK pairs.
Zero EBUSY in firefox stderr or daemon journal during the session.
Throughput headroom: ~230 fps decoded at 640x368 in the test, roughly 7× the 30 fps @ 1080p Pi 5 Fourier target, with the daemon's libavcodec dlopen path doing all the decode work on the CPU.
Closes
The Pi 5 Fourier campaign's H.264-via-libva-via-daedalus end-to-end path for multi-process VAAPI consumers (Firefox; Chromium and mpv should benefit in the same way).
Generated with Claude Code
daedalus_queue_init was wiring both src_vq->lock and dst_vq->lock to ctx->dev->m2m_lock, a device-wide mutex. That serialises every vb2 ioctl (S_FMT, REQBUFS, QBUF, DQBUF, STREAMON, ...) across ALL concurrent clients of /dev/video0.

For a single-client consumer like the test_m2m_* tools it doesn't matter; for Firefox, which spawns separate content + RDD + GPU processes that each open /dev/video0 and run libva probe simultaneously, the contention showed up as EBUSY from one libva session's S_FMT(OUTPUT_MPLANE) when another session was mid-streamon on the same device.

Observable on higgs (Pi CM5):

    $ MOZ_VA_API_ENABLED=1 LIBVA_DRIVER_NAME=v4l2_request firefox
    ...
    v4l2-request: phase 8.10: opened daedalus_v4l2 at video_fd=32
    ...
    v4l2-request: cap_pool_init: 24 slots ready
    v4l2-request: Unable to set format for type 10: Device or resource busy

After this fix, each open() gets its own ctx->vb_mutex and the per-context vb2_queue locks are independent, so Firefox's multi-process VAAPI clients no longer fight each other. YouTube playback on higgs runs through daedalus at ~230 fps sustained (640x368, libavcodec dlopen path), 7× headroom over the 30fps target.

cedrus / rkvdec / hantro all use the per-ctx vb mutex pattern for the same reason. This mirrors them.

Lifecycle:
- mutex_init in daedalus_open (right after the kzalloc that creates ctx, before v4l2_fh_init).
- mutex_destroy in daedalus_release (after v4l2_fh_exit, before kfree), and in the err_ctrl unwind path in daedalus_open.

Verified end-to-end on higgs:
- rmmod + modprobe the rebuilt .ko.
- Restart daedalus-v4l2.service.
- Firefox YouTube playback engages VAAPI, daemon journal shows cookie=1..N codec=3 (H.264) REQ_DECODE / decoder:OK pairs with unique per-frame fnv1a hashes.
- No EBUSY in either firefox stderr or daemon journal during the entire session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>