Firefox VAAPI present path: NV12 dmabuf zero-copy not engaging; YUV→RGB lands CPU-side, stalls the daemon decode loop #5
Symptom
With the libva → kernel → daedalus_v4l2 daemon path fully alive on Pi CM5 (PR #4; the daemon journal shows per-frame REQ_DECODE codec=3 ... meta=h264 with "decoder: OK 1280x720" at 3-5 ms latency), Firefox YouTube playback of a 30 fps 720p H.264 stream is visibly jumpy. The sustained decode rate measures ~23 fps (577 "decoder: OK" events in 25 s). That is a ~23 % source-frame drop on what should be the easy case, given the documented "30fps@1080p is fine" floor.
What isn't the bottleneck
Ruled out on higgs (Pi CM5, kernel 6.18.29+rpt-rpi-2712) during the 2026-05-21 diagnostic session:
- vcgencmd get_throttled = 0x0: no under-voltage, frequency-capping or throttling flags, either current or since boot.
- MIN_CAP_POOL = 24 (cap_pool.h:70); the per-session log confirms "cap_pool_init: 24 slots ready". rkvdec / cedrus / hantro don't enforce minimums beyond what userspace asks for, and our daedalus kernel-side queue_setup already enforces min 2 on top, so buffer-pipelining headroom is fine.
- The MediaPDecoder #N ProcessDecode line shows mDuration=33333µs (30 fps source), and mTime advances in even 33333 µs steps. Firefox knows the frame rate.
- The MediaPDecoder #N workers all hit IsHardwareAccelerated=1 simultaneously, with no EBUSY. PR #3 is doing its job under real load.
What IS the bottleneck (verified)
With MOZ_LOG=Dmabuf:5,WaylandDMABuf:5,VideoBridge:5,WebRender:3 set, there are two smoking guns:
(a) Firefox's VideoFramePool for VAAPI is empty
This is the zero-copy pool in dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp. With a pool size of 0, every decoded frame must be uploaded by some other (CPU-side) route. VideoFramePool is the channel that would import the V4L2 CAPTURE NV12 dmabufs into Firefox's GPU process as DMABufSurfaceYUV without copying.
(b) RDD → GPU texture-construct IPC fails
The RDD process is trying to hand a video texture through the PVideoBridge IPC to the GPU/parent process, and the GPU process is rejecting it as malformed. This is consistent with a failed or missing dmabuf descriptor in the texture-construct message.
(c) Compositor doesn't advertise NV12 dmabuf acceptance
Dmabuf-feedback from kwin_wayland (from MOZ_LOG Dmabuf:5): no NV12. The compositor only accepts RGB-family dmabufs, so even if Firefox's VideoFramePool worked, the daemon's NV12 CAPTURE buffers couldn't go straight to the compositor; they need a YUV→RGB step on the way through.
Cascade
- … VIDIOC_EXPBUF).
- VASurfaceIDs via vaCreateSurfaces2 / vaExportSurfaceHandle.
- FFmpegVideoFramePool should enroll them as DMABufSurfaceYUV for zero-copy GL/EGL import. Instead, the pool size is 0.
Where to look
This is a daedalus-adjacent companion issue, not a daemon bug. Investigation surface:
- firefox-fourier (mozilla-central), dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp: why does VAAPI surface enrollment produce a 0-size pool? Are we hitting an early-bailout path on a vaExportSurfaceHandle failure, or a format-modifier mismatch (V3D may not advertise NV12 as a sampler-importable format on the EGL_EXT_image_dma_buf_import side)?
- libva-v4l2-request-fourier, vaExportSurfaceHandle: does the descriptor it produces from a V4L2 CAPTURE buffer carry the modifiers Firefox/Mesa-V3D expect (DRM_FORMAT_MOD_LINEAR for V4L2_PIX_FMT_NV12)?
- Mesa-V3D: does EGL_EXT_image_dma_buf_import accept NV12 with modifier=LINEAR and a two-plane layout? Some V3D builds historically rejected NV12 dmabufs.
Architectural note
The right fix is to make the YUV→RGB step land on the V3D (VC VII) GPU as a zero-copy chain (daemon NV12 dmabuf → EGLImage → GL YUV-sampling shader → compositor), NOT to optimise the CPU-side fallback path. CPU-NEON improvements inside the daemon would be a "convoluted pipeline doing nothing" (mfritsche, 2026-05-21).
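The first link of that chain (NV12 dmabuf → EGLImage) comes down to the attribute list handed to eglCreateImageKHR with target EGL_LINUX_DMA_BUF_EXT. The C below is a minimal sketch, assuming the CAPTURE buffer uses the contiguous single-buffer V4L2_PIX_FMT_NV12 layout (Y rows, then interleaved CbCr rows in the same dmabuf) with DRM_FORMAT_MOD_LINEAR. struct nv12_dmabuf, nv12_egl_attribs and the SKETCH_* macros are hypothetical names. The macro values are copied from the Khronos eglext.h and the kernel's drm_fourcc.h so the sketch builds without EGL or libdrm installed; real code should include those headers. It only shows what a well-formed two-plane NV12/LINEAR import request contains. Whether Mesa-V3D accepts it is still the open question listed under "Where to look".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror <EGL/eglext.h> and <drm_fourcc.h>; use the real headers in
 * production code. EGLint is a 32-bit signed integer, hence int32_t below. */
#define SKETCH_EGL_NONE                           0x3038
#define SKETCH_EGL_HEIGHT                         0x3056
#define SKETCH_EGL_WIDTH                          0x3057
#define SKETCH_EGL_LINUX_DRM_FOURCC_EXT           0x3271
#define SKETCH_EGL_DMA_BUF_PLANE0_FD_EXT          0x3272
#define SKETCH_EGL_DMA_BUF_PLANE0_OFFSET_EXT      0x3273
#define SKETCH_EGL_DMA_BUF_PLANE0_PITCH_EXT       0x3274
#define SKETCH_EGL_DMA_BUF_PLANE1_FD_EXT          0x3275
#define SKETCH_EGL_DMA_BUF_PLANE1_OFFSET_EXT      0x3276
#define SKETCH_EGL_DMA_BUF_PLANE1_PITCH_EXT       0x3277
#define SKETCH_EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT 0x3443
#define SKETCH_EGL_DMA_BUF_PLANE0_MODIFIER_HI_EXT 0x3444
#define SKETCH_EGL_DMA_BUF_PLANE1_MODIFIER_LO_EXT 0x3445
#define SKETCH_EGL_DMA_BUF_PLANE1_MODIFIER_HI_EXT 0x3446

#define SKETCH_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define SKETCH_DRM_FORMAT_NV12       SKETCH_FOURCC('N', 'V', '1', '2')
#define SKETCH_DRM_FORMAT_MOD_LINEAR 0ULL

/* Hypothetical description of one decoded CAPTURE buffer: a single dmabuf
 * holding the Y plane followed by the interleaved CbCr plane. */
struct nv12_dmabuf {
    int fd;                /* dmabuf fd, e.g. obtained via VIDIOC_EXPBUF */
    uint32_t width;        /* visible width in pixels */
    uint32_t height;       /* visible height in pixels */
    uint32_t pitch;        /* bytes per row (bytesperline), >= width */
    uint32_t coded_height; /* allocated Y rows, >= height */
};

/* Byte offset of the CbCr plane inside the single NV12 buffer. */
static uint32_t nv12_uv_offset(const struct nv12_dmabuf *b)
{
    return b->pitch * b->coded_height;
}

/* Bytes needed: full-height Y plane plus half-height CbCr plane. */
static size_t nv12_total_size(const struct nv12_dmabuf *b)
{
    return (size_t)b->pitch * b->coded_height
         + (size_t)b->pitch * ((b->coded_height + 1) / 2);
}

/* Fill `out` with an EGL_LINUX_DMA_BUF_EXT attribute list for this buffer.
 * Returns the entry count including the EGL_NONE terminator, or -1 if `cap`
 * is too small. Both planes reference the same fd at different offsets. */
static int nv12_egl_attribs(const struct nv12_dmabuf *b, uint64_t modifier,
                            int32_t *out, size_t cap)
{
    const int32_t mod_lo = (int32_t)(uint32_t)(modifier & 0xffffffffu);
    const int32_t mod_hi = (int32_t)(uint32_t)(modifier >> 32);
    const int32_t attribs[] = {
        SKETCH_EGL_WIDTH, (int32_t)b->width,
        SKETCH_EGL_HEIGHT, (int32_t)b->height, /* visible, not coded, height */
        SKETCH_EGL_LINUX_DRM_FOURCC_EXT, (int32_t)SKETCH_DRM_FORMAT_NV12,
        /* Plane 0: luma at the start of the buffer */
        SKETCH_EGL_DMA_BUF_PLANE0_FD_EXT, b->fd,
        SKETCH_EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        SKETCH_EGL_DMA_BUF_PLANE0_PITCH_EXT, (int32_t)b->pitch,
        SKETCH_EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT, mod_lo,
        SKETCH_EGL_DMA_BUF_PLANE0_MODIFIER_HI_EXT, mod_hi,
        /* Plane 1: interleaved CbCr after the coded luma rows */
        SKETCH_EGL_DMA_BUF_PLANE1_FD_EXT, b->fd,
        SKETCH_EGL_DMA_BUF_PLANE1_OFFSET_EXT, (int32_t)nv12_uv_offset(b),
        SKETCH_EGL_DMA_BUF_PLANE1_PITCH_EXT, (int32_t)b->pitch,
        SKETCH_EGL_DMA_BUF_PLANE1_MODIFIER_LO_EXT, mod_lo,
        SKETCH_EGL_DMA_BUF_PLANE1_MODIFIER_HI_EXT, mod_hi,
        SKETCH_EGL_NONE,
    };
    const size_t n = sizeof attribs / sizeof attribs[0];

    assert(b != NULL && out != NULL);
    if (cap < n)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = attribs[i];
    return (int)n;
}
```

For a 1280x720 frame with a 1280-byte pitch, the CbCr plane starts at byte 921600 and the buffer needs 1382400 bytes. H.264 codes 1080p as 1088 rows (16-row macroblocks). So if the CAPTURE format reports 1088 coded rows, plane 1 for a 1920-byte pitch starts at 2088960 rather than at 1920 × 1080, and getting that offset wrong is one way to end up with a rejected or corrupted import. On a real system the list would go to eglCreateImageKHR(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT, NULL, attribs). The modifier entries are only valid when the display also advertises EGL_EXT_image_dma_buf_import_modifiers.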
Repro
Grep ff.log for VideoFramePool / PVideoBridge::Msg_PTextureConstructor. Watch journalctl -u daedalus-v4l2 for inter-REQ_DECODE gaps > 20 ms during steady-state playback.
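The journal half of the Repro can be scripted. The C below is a minimal sketch. It assumes the REQ_DECODE timestamps have already been pulled out of journalctl -u daedalus-v4l2 into an array of microseconds (for example from -o short-precise output, which prints microsecond timestamps). The parsing step and the names decode_gaps / struct decode_gap_stats are hypothetical. It reports the mean request rate over the window, how many intervals exceed a threshold, and the longest interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of one steady-state window of REQ_DECODE timestamps. */
struct decode_gap_stats {
    double mean_fps;    /* (n - 1) intervals divided by the window span */
    size_t gaps_over;   /* intervals strictly longer than the threshold */
    int64_t max_gap_us; /* longest single interval, in microseconds */
};

/* ts_us: non-decreasing REQ_DECODE timestamps in microseconds.
 * Parsing them out of the journal is assumed to happen elsewhere. */
static struct decode_gap_stats decode_gaps(const int64_t *ts_us, size_t n,
                                           int64_t threshold_us)
{
    struct decode_gap_stats s = { 0.0, 0, 0 };

    assert(n == 0 || ts_us != NULL);
    if (n < 2)
        return s; /* need at least one interval */

    for (size_t i = 1; i < n; i++) {
        int64_t gap = ts_us[i] - ts_us[i - 1];
        if (gap > threshold_us)
            s.gaps_over++;
        if (gap > s.max_gap_us)
            s.max_gap_us = gap;
    }

    int64_t span = ts_us[n - 1] - ts_us[0];
    if (span > 0)
        s.mean_fps = (double)(n - 1) * 1e6 / (double)span;
    return s;
}
```

With the Repro threshold, the call is decode_gaps(ts, n, 20000). The threshold is a parameter rather than a constant because the nominal interval for a 30 fps source is 1/30 s ≈ 33.3 ms, so the useful cut-off depends on how far ahead the decode queue is being fed. As a cross-check on the Symptom numbers, 577 events over 25 s is 577 / 25 ≈ 23.1 fps, about 23 % below the 30 fps source.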