Iteration 5 — Phase 0 (substrate / motivation / inventory)
Opens 2026-05-05 immediately after iteration 4 close (phase8_iteration4_close.md, fork commit b81ce69, campaign close 67494ae).
Predecessor close-out summary (iteration 4 → iteration 5)
iter4 was the first iteration that closed Track A — the iter1+iter2+iter3 frame-11 EINVAL carryover. Three correctness fixes landed in fork:
- 74d8dd1: DPB fields = V4L2_H264_FRAME_REF + skip stale entries (FFmpeg-semantics match)
- 385dee1: fresh request_fd per frame (THE load-bearing fix)
- b81ce69: B-slice L1 reflist .fields copy-paste
Plus diagnostic instrumentation (a12d299, 4892656, f21bdf0) accumulated during the diagnostic journey.
iter4 verified Track A via mpv direct stress test on ohm: 2130 BeginPictures over 90s with 0 EINVAL of any kind — real-time HW decode through libva-v4l2-request-fourier without MOZ_DISABLE_RDD_SANDBOX=1. Track F (sandbox patch) from iter3 stays GREEN; the campaign now has a working H.264 decode pipeline through libva on hantro.
The campaign's original substrate question — "make multi-planar libva work on Rockchip hantro for production VAAPI consumers" — is empirically achieved at the libva-side decode layer.
Iteration 5 candidate research questions
A. DEBUG instrumentation sweep (carried from iter1+iter2+iter3+iter4)
Remove all accumulated diagnostic instrumentation commit-by-commit, building cleanly between each removal. End state: zero request_log() calls in non-error paths, no patch-0011 sentinel write in EndPicture, no msync workaround (or document why it stays). Driver source builds clean and vaapi-copy + vaapi smoke tests still green.
Inventory of instrumentation to remove (or keep, as decided per item):
- iter1 ENTER traces in surface entry points (CreateBuffer, BeginPicture, etc.)
- iter1 patch-0011 sentinel write in EndPicture
- iter1 patch-0010 CAPTURE/OUTPUT hex-dumps in SyncSurface
- iter1 msync(MS_SYNC|MS_INVALIDATE) workaround in SyncSurface (probably keep: was load-bearing for cache coherency)
- iter1 POC sentinel strip (KEEP: load-bearing for ffmpeg-vaapi consumers)
- iter1 patch-0014 EACCES retry-skip in v4l2_get_controls (KEEP: load-bearing reflective behavior)
- iter1 slice_header bit-precise parser + dec_ref_pic_marking_bit_size etc. (KEEP: fixes hantro hw decode)
- iter3 Y2 v1 in v4l2_ioctl_controls (REMOVE: superseded by iter4 Y2 v3)
- iter4 Y2 v3 with TRY_EXT_CTRLS retry (REMOVE: fault no longer reproduces)
- iter4 DPB census + per-entry dump (REMOVE: fault no longer reproduces)
- iter4 per-control TRY isolation (REMOVE: fault no longer reproduces)
Why first: required prerequisite for any upstream snapshot (iter5 candidate F). Was deferred at iter1+iter2+iter3+iter4. Smaller scope than C or F, fits in any iteration's slack.
Risk: removing instrumentation that's actually load-bearing. Each removal verified by re-running mpv + Firefox + vainfo smoke tests.
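The per-removal verification loop described for candidate A can be sketched as below. This is a minimal illustration, not the campaign's tooling: `sweep`, `SweepResult`, and the four callables are hypothetical names, and in practice `remove`/`restore` would wrap git operations while `build`/`smoke` would wrap meson+ninja and the vainfo + mpv + Firefox smoke tests.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SweepResult:
    removed: List[str] = field(default_factory=list)  # safely dropped
    kept: List[str] = field(default_factory=list)     # removal regressed something


def sweep(items: List[str],
          remove: Callable[[str], None],
          restore: Callable[[str], None],
          build: Callable[[], bool],
          smoke: Callable[[], bool]) -> SweepResult:
    """Remove one instrumentation item at a time, verifying after each step."""
    result = SweepResult()
    for item in items:
        remove(item)
        if build() and smoke():
            result.removed.append(item)
        else:
            # Treat the item as load-bearing until the regression is explained.
            restore(item)
            result.kept.append(item)
    return result


if __name__ == "__main__":
    # Toy model: the smoke test only passes while the msync workaround is present.
    present = {"enter-traces", "msync-workaround", "dpb-census"}
    outcome = sweep(sorted(present), present.discard, present.add,
                    build=lambda: True,
                    smoke=lambda: "msync-workaround" in present)
    print(outcome)
```

Restoring immediately on failure keeps every intermediate state buildable and green, so a load-bearing item shows up as a single entry in `kept` rather than as a broken tree several removals later.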
B. mpv libplacebo --vo=gpu segfault (carried from iter3 substrate, never in iter3 or iter4 scope)
Resolve the segfault on LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi --vo=gpu after 4 frames on bbb_1080p30 when Vulkan init fails (VK_ERROR_INITIALIZATION_FAILED).
Symptom (captured iter3 substrate): Vulkan init fails, mpv falls through to GPU non-vulkan path, decode runs for 4 frames cleanly, then Unable to request buffers: Device or resource busy (REQBUFS EBUSY mid-stream), then bizarre CreateSurfaces2: surf_width=16 surf_height=16 sizes[1]=1050626 (uninitialized memory shape), then SIGSEGV.
Hypothesis (iter3-era): cap_pool resolution-change path doesn't fully drain CAPTURE before REQBUFS → kernel returns EBUSY → driver pushes ahead with garbage → mmap or pool-init crashes. Could be a Mesa update side effect.
iter4 evidence point: mpv + --vo=null works for 2130 frames. So the issue is consumer-side compositor path, not libva-side decode. Diagnosis path: --vo=null (works) vs --vo=gpu (segfault) → bisect by mpv flags.
Risk: may surface a Mesa or libplacebo bug we can't fix from the libva side.
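When bisecting by mpv flags, it helps to detect the failure signature in captured stderr mechanically. The sketch below assumes the log lines look like the strings quoted in the symptom description (the EBUSY message, then a CreateSurfaces2 line carrying surf_width, surf_height and sizes[1]); the function name and the exact line layout are assumptions, not the driver's documented log format.

```python
import re
from typing import Dict, Optional

EBUSY_MARK = "Unable to request buffers: Device or resource busy"
FIELD_RE = re.compile(r"(surf_width|surf_height|sizes\[1\])=(\d+)")


def find_ebusy_signature(log_text: str) -> Optional[Dict[str, int]]:
    """Return the fields of the first CreateSurfaces2 line after a REQBUFS EBUSY.

    Returns None when no EBUSY message is followed by a CreateSurfaces2 line.
    """
    seen_ebusy = False
    for line in log_text.splitlines():
        if EBUSY_MARK in line:
            seen_ebusy = True
            continue
        if seen_ebusy and "CreateSurfaces2" in line:
            return {key: int(value) for key, value in FIELD_RE.findall(line)}
    return None


if __name__ == "__main__":
    sample = (
        "Unable to request buffers: Device or resource busy\n"
        "CreateSurfaces2: surf_width=16 surf_height=16 sizes[1]=1050626\n"
    )
    print(find_ebusy_signature(sample))
```

Running this over the stderr capture of each bisect step gives a yes/no answer per flag combination without waiting for the SIGSEGV itself.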
C. Performance binding cell (deferred from iter1+iter2+iter3+iter4)
Establish a measurement protocol for HW vs SW decode on this rig: drop counts, effective FPS, browser CPU%, scanout-plane residency for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW (sandbox-on), SW baseline}. Anchor in phaseN_evidence/.
Why: anchors all iter1+iter2+iter3+iter4 claims to numbers. Carried four iterations. iter4's mpv stress test is a partial perf measurement (2130 frames clean, but no CPU%/drop count anchor).
Pairing potential: A (DEBUG sweep) before C — perf measurements should be on a clean instrumentation-free build. Or, run a baseline-vs-iter4 comparison BEFORE the sweep to capture the value of each instrumentation point.
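One possible shape for a row of the perf table is sketched below. `PerfRow` and its field names are hypothetical; drop count and CPU% are optional because the iter4 stress run did not record them. The example row reuses the iter4 figures (2130 BeginPictures over 90s) and assumes one BeginPicture per decoded frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerfRow:
    path: str                          # e.g. "mpv vaapi-copy"
    frames_decoded: int
    wall_seconds: float
    frames_dropped: Optional[int] = None   # None = not measured
    cpu_percent: Optional[float] = None    # None = not measured

    @property
    def effective_fps(self) -> float:
        return self.frames_decoded / self.wall_seconds

    def realtime_factor(self, nominal_fps: float) -> float:
        """Decode speed relative to the stream's nominal frame rate."""
        return self.effective_fps / nominal_fps

    @property
    def drop_ratio(self) -> Optional[float]:
        if self.frames_dropped is None:
            return None
        total = self.frames_decoded + self.frames_dropped
        return self.frames_dropped / total if total else 0.0


if __name__ == "__main__":
    iter4 = PerfRow("mpv vaapi-copy --vo=null", frames_decoded=2130, wall_seconds=90.0)
    print(f"{iter4.path}: {iter4.effective_fps:.2f} fps, drops={iter4.drop_ratio}")
```

Keeping unmeasured columns as None rather than 0 makes it visible in the table which claims are anchored and which are still open.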
D. V4L2_MEMORY_DMABUF (carried from iter2+iter3+iter4)
Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation. iter2 Fix 3 was statistical (LRU mitigation); Option B is architectural (userspace owns the buffer).
Why: the cap_pool LRU is empirically working but doesn't formally close the DMA-BUF lifecycle race window. Option B closes it.
Risk: highest unknown. Possibly requires kernel work. Hantro on this kernel may not support V4L2_MEMORY_DMABUF at all; gstreamer's v4l2slh264dec uses MMAP only. Worth a probe before commit.
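The probe suggested above could look roughly like this: issue VIDIOC_REQBUFS with count=0 and memory=V4L2_MEMORY_DMABUF on the CAPTURE queue and see whether the kernel accepts it. Constants and struct layout follow linux/videodev2.h; the ioctl number assumes the generic Linux _IOC encoding (as on arm64 and x86). `probe_dmabuf` and the device path are placeholders, and the sketch assumes a freshly opened decoder node, since REQBUFS with count=0 frees any buffers already allocated on that file handle.

```python
import fcntl
import os
import struct
import sys

# Values from <linux/videodev2.h>
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9
V4L2_MEMORY_DMABUF = 4
V4L2_BUF_CAP_SUPPORTS_DMABUF = 1 << 2

# struct v4l2_requestbuffers: __u32 count, type, memory, capabilities;
# __u8 flags; __u8 reserved[3]
REQBUFS_FMT = "=IIIIB3x"

# _IOWR('V', 8, struct v4l2_requestbuffers) under the generic _IOC layout
VIDIOC_REQBUFS = (3 << 30) | (struct.calcsize(REQBUFS_FMT) << 16) | (ord("V") << 8) | 8


def probe_dmabuf(path: str) -> bool:
    """Return True if the CAPTURE queue accepts V4L2_MEMORY_DMABUF."""
    fd = os.open(path, os.O_RDWR)
    try:
        req = bytearray(struct.pack(
            REQBUFS_FMT, 0, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
            V4L2_MEMORY_DMABUF, 0, 0))
        try:
            fcntl.ioctl(fd, VIDIOC_REQBUFS, req)  # fills `req` in place
        except OSError:
            return False  # typically EINVAL when the memory type is unsupported
        capabilities = struct.unpack(REQBUFS_FMT, req)[3]
        return bool(capabilities & V4L2_BUF_CAP_SUPPORTS_DMABUF)
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].startswith("/dev/video"):
    print(f"{sys.argv[1]}: DMABUF CAPTURE supported = {probe_dmabuf(sys.argv[1])}")
```

A negative result here would settle the "may not support V4L2_MEMORY_DMABUF at all" question cheaply, before any gbm allocation code is written.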
E. Multi-context libva safety (Sonnet review 9.6 from iter1, carried iter2/3/4)
Make the backend safe for two concurrent libva contexts in the same process (e.g. Firefox tab playing one video while another tab plays a different resolution). LAST_OUTPUT_WIDTH/HEIGHT is a process-global static; cap_pool is per-driver_data but the V4L2 device is shared.
Why: four iterations carried this. Real consumers (Firefox multi-tab, mpv-while-Firefox) would surface it. With Track A fixed, this becomes the next architectural correctness piece.
Risk: moderate. The fix shape is similar to iter2 Fix 1 (per-context state instead of process-global) but applied to more state.
F. Bootlin / Mozilla upstreaming (combined from iter3 candidate G + iter4 carryover)
File the Mozilla Bugzilla bug for /dev/media* + V4L2-stateless RDD sandbox with the iter3 firefox-fourier patch. File a bootlin issue on bootlin/libva-v4l2-request with iter1+iter2+iter3+iter4 patches as a cohesive working set.
Why: with Track A fixed, the libva-v4l2-request-fourier fork has empirical proof of working H.264 decode on hantro for any libva consumer. The patches are upstreamable in shape, just need the DEBUG sweep (A) cleanup first.
Stance: per feedback_no_upstream.md, no PR/MR/bug-file happens without explicit operator instruction. F is gated on operator decision.
G. PGO-disabled Firefox rebuild
Rebuild firefox-fourier without --enable-profile-generate=cross to get a release-quality binary suitable for performance measurement and Firefox-side stress testing.
Why: iter3's PGO-instrumented binary is 3.6 GB libxul.so and decodes at ~0.23x realtime under sandbox. iter4 verified Track A via mpv direct because the PGO Firefox couldn't reach 720+ frames in 90s. A clean Firefox-fourier build would let iter5 do Firefox-side stress testing.
Risk: ~2h rebuild on boltzmann. The infrastructure is in place (firefox-fourier LXD container persists). Edit the PKGBUILD to skip PGO, rebuild, redeploy.
Pairing potential: G + C (rebuild + perf measurement) is natural. G + B (rebuild + libplacebo investigation through Firefox-side path) is also possible.
H. New codec / hardware (deferred from iter1+ scope)
Extend to MPEG-2 (next codec per iter1 lock) or to fresnel RK3399 / ampere RK3588 hardware (next platforms).
Why: the campaign's original locked scope was H.264-first then MPEG-2; ohm RK3568 first then fresnel and ampere/boltzmann. With ohm+H.264 working, the natural extensions become possible.
Risk: new hardware iterations are their own can-of-worms. Probably one-codec-OR-one-hardware per iteration.
Recommended pairings
- A + F (DEBUG sweep + upstream prep). Most natural sequence — sweep makes the patches mailing-list-ready. Smallest combined scope.
- A + C (sweep + perf). Sweep first to get clean measurements, then C anchors the campaign-wide claims.
- B alone (libplacebo) — separate consumer-side investigation, doesn't share authoring with anything else.
- E alone (multi-context safety) — architectural correctness piece, requires focused attention.
- G + C or G + B (PGO-disabled rebuild + perf or libplacebo) — Firefox-side validation matrix.
State that carries (re-verified 2026-05-05 close)
- Hardware: ohm RK3568 hantro G1/G2, kernel 6.19.10. ohm.vpn access path. Plasma 6 Wayland session interactive.
- Userspace: firefox 150.0.1 stock + firefox-fourier 150.0.1-1.1 (PGO-instrumented) at /opt/firefox-fourier/, libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- Test fixture: /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 sha256 dcf8a7170fbd....
- Driver installed: /usr/lib/dri/v4l2_request_drv_video.so post-iter4 (sha256 to recompute on iter5 start; rebuild on ohm via meson+ninja in /tmp/libva-src to redeploy with iter4 commits).
- Build container: firefox-fourier LXD on boltzmann, ssh -J boltzmann builder@firefox-fourier. Persistent. Source still extracted at /build/aur/firefox-fourier/src/firefox-150.0.1/ with iter3 patches applied; incremental rebuilds via ./mach build.
- Phase 7 scripts: /home/mfritsche/iter3_phase7_evidence.sh + /tmp/run_phase7_v2.sh on ohm.vpn.
- mpv stress test command: LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy --vo=null --no-audio bbb_1080p30_h264.mp4 (proven Track A verifier).
- References cache: references/ffmpeg-kwiboo/ (FFmpeg V4L2-request reference), references/linux-mainline/ (kernel hantro source), references/firefox-master/ (Mozilla sandbox source).
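Since both the test fixture and the installed driver are tracked by sha256, a small re-verification helper at iteration start may be useful. The sketch below is an assumption about how that check could be scripted; the function names are hypothetical, and only the truncated digest prefix recorded above is compared, because the full hash is not given here.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large fixtures do not load into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_prefix(path: str, expected_prefix: str) -> bool:
    """Compare against a (possibly truncated) recorded hex digest."""
    return sha256_file(path).startswith(expected_prefix.lower())


if __name__ == "__main__":
    fixture = "/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4"
    try:
        print(fixture, "ok" if matches_prefix(fixture, "dcf8a7170fbd") else "MISMATCH")
    except FileNotFoundError:
        print(fixture, "not present on this host")
```

The same `sha256_file` call can produce the post-iter4 driver hash that is still marked "to recompute on iter5 start".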
State that does NOT carry
- Performance numbers. Same caveat as iter1+iter2+iter3+iter4. Candidate C is the natural anchor.
- iter4 driver build state on ohm: /tmp/libva-src is tmpfs-volatile; rsync+rebuild from rpi at iter5 start.
Tooling and measurement-instrument inventory
Carried from iter4:
- strace -f -e trace=openat,close,ioctl for libva-side V4L2 ioctl tracing
- sudo ftrace events/v4l2/* events/vb2/* events/dma_fence/* for kernel-side V4L2/vb2 lifecycle
- sudo dmesg -w for kernel-side warnings
- mpv --frames=N --vo=null with stderr capture for libva stress
- mpv --frames=N --vo=gpu with stderr capture for full-pipeline (will surface candidate B's segfault)
- Firefox MOZ_LOG=PlatformDecoderModule:5,VideoBridge:5 (under firefox-fourier, no MOZ_DISABLE_RDD_SANDBOX needed)
- Operator visual inspection on real screen (load-bearing for "frames reach screen" claims)
- iter3 Y2 v1 + iter4 Y2 v3 + iter4 DPB census + iter4 per-control TRY iso (ALL up for removal in candidate A)
Likely needed for specific iter5 candidates:
- For A (sweep): per-removal smoke test recipe (vainfo + mpv vaapi-copy + Firefox-fourier 30s).
- For B (libplacebo): mpv --vo=gpu minimal repro, possibly Mesa bisect or rollback.
- For C (perf): pidstat -u -p $(pidof ...) for CPU%, Mali-G52 freq via /sys/class/devfreq/fde60000.gpu, scanout-plane query (Wayland ext-output-management is hard; may need ftrace).
- For D (DMABUF): gbm_bo_create test program + exploratory VIDIOC_QBUF with memory=V4L2_MEMORY_DMABUF.
- For G (PGO-disabled rebuild): edit firefox-fourier PKGBUILD to skip --enable-profile-generate=cross, ./mach build incremental, redeploy via 600 MB tarball.
In-scope (LOCKING DEFERRED — Phase 1 user input)
To be locked at Phase 1 from candidates A..H above. Recommended pairings flagged per candidate.
Out-of-scope (LOCKED 2026-05-05 for iteration 5)
- Track A re-test (DONE in iter4 — 2130 frames clean is anchored evidence).
- Track F re-test (DONE in iter3 — sandbox patch verified end-to-end).
- New codecs OUTSIDE H.264 / MPEG-2 (VP8/VP9/AV1/HEVC out per iter1 lock).
- Bootlin/Mozilla upstream PR/MR/bug-file unless explicitly tasked at Phase 1 (candidate F is the gated option).
Phase 1 success criterion (will lock after user picks candidate)
Pre-lock template:
- For candidate A: "Driver source builds clean with zero request_log() calls in non-error paths, all iter1+iter3+iter4 DEBUG commits removed (or explicitly justified-and-kept), vaapi-copy + mpv smoke tests still green at 2000+ frames clean."
- For candidate B: "mpv --hwdec=vaapi --vo=gpu decodes ≥30s of bbb_1080p30 without segfault, or root cause documented as Mesa/libplacebo upstream issue with operator-actionable workaround."
- For candidate C: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW (when G done), SW baseline} across drop count + CPU% + frame timing on bbb_1080p30; reproducible via documented script."
- For candidate D: "vaapi-copy + vaapi --vo=null still produce real frames with V4L2_MEMORY_DMABUF-backed CAPTURE buffers; race window mathematically eliminated."
- For candidate E: "Two concurrent libva contexts decode independently without cross-context state corruption (verified via two simultaneous mpv processes on different fixtures)."
- For candidate F: "Mozilla Bugzilla bug filed with iter3 firefox-fourier patch; bootlin issue filed against libva-v4l2-request with iter1-iter4 patch series."
- For candidate G: "Firefox-fourier rebuilt without --enable-profile-generate=cross, deployed to ohm, plays bbb_1080p30 at sustained ≥24 fps with HW decode through firefox-fourier sandbox."
- For candidate H: per sub-target (MPEG-2 codec OR fresnel/ampere hardware).
Stop point
Phase 1 lock requires user input — pick from A..H (and any pairing). Recommended primary: A + F (DEBUG sweep + upstream prep). With Track A fixed, the fork is ready for upstream submission once the diagnostic noise is gone. F is gated on explicit operator instruction; if F is "no" this iteration, A alone is the natural close-the-instrumentation-loop iteration.
Alternative leans:
- A + C if perf measurement is a higher priority than upstream prep
- B alone if mpv libplacebo regression matters more than cleanup
- G + B if Firefox-side stress + libplacebo are the priority
After lock, iter5 phases 2..8 proceed autonomously per "Stop only if user is needed."