libva-multiplanar/phase8_iteration4_close.md
marfrit 67494ae7ee Iteration 4 close — Track A locked, three-iteration carryover resolved
The iter1+iter2+iter3 frame-11 EINVAL is empirically eliminated. mpv
direct stress test on ohm via patched libva-v4l2-request-fourier:

  RequestBeginPicture:     2130
  RequestSyncSurface:      4254
  S_EXT_CTRLS EINVAL:      0
  Unable to set control(s): 0
  Generic EINVAL:          0
  ENETDOWN:                0

2130 frames at 24 fps = real-time HW decode (>98% of 2160-frame max
in 90 seconds wall time). Track A's Phase 1 success criterion crushed.

Three correctness fixes (4 fork commits):
- 74d8dd1: DPB fields=V4L2_H264_FRAME_REF + skip stale entries
- 385dee1: fresh request_fd per frame (THE load-bearing fix)
- b81ce69: B-slice L1 reflist .fields copy-paste

Plus diagnostic instrumentation (a12d299, 4892656, f21bdf0) deferred
to iter5 sweep alongside earlier iter1/iter3 instrumentation.

Three new memory entries: kernel obfuscation extends to compound TRY,
request_fd lifecycle (fresh per frame), FFmpeg as empirical authority.
README iteration table updated.

Carries to iter5 substrate: DEBUG sweep, mpv libplacebo segfault,
multi-context libva safety, PGO Firefox rebuild, eventual upstream
prep (Mozilla bug + bootlin libva-v4l2-request).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:29:43 +00:00


Iteration 4 close (Phase 8) — Track A locked, three-iteration carryover resolved

Opened 2026-05-05 (just after iter3 close), closing 2026-05-05 same day. Locked candidate: Track A solo — fix the iter1+iter2+iter3 carryover frame-11 EINVAL. Substrate path 2 selected: diff our DPB+DECODE_PARAMS construction vs FFmpeg's libavcodec/v4l2_request_h264.c::fill_dpb.

Verdict: GREEN

Track A's load-bearing defect is empirically resolved. mpv direct stress test on ohm via patched libva-v4l2-request-fourier:

$ LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy --vo=null bbb_1080p30_h264.mp4
After 90s wall time:
  RequestBeginPicture:     2130     (was: bailed at 11 in iter3)
  RequestSyncSurface:      4254
  S_EXT_CTRLS EINVAL:      0
  "Unable to set control(s)": 0
  Generic EINVAL:          0
  ENETDOWN:                0

2130 frames in 90 seconds of wall time, against a theoretical maximum of 2160 frames at 24 fps, is about 98.6%: real-time HW decode. The libva-side decode pipeline now sustains BBB-class H.264 content for the full run without V4L2 errors.

What landed

libva-v4l2-request-fourier fork commits

The fix is split into three correctness commits + three debug-instrumentation commits, in apply order:

  1. a12d299 iter4 DEBUG: Y2 v3 — retry with TRY_EXT_CTRLS (instrumentation)
  2. 74d8dd1 iter4 partial fix: DPB fill matches FFmpeg semantics (correctness)
    • dpb[].fields = V4L2_H264_FRAME_REF for every valid entry
    • Skip entries with valid && !used
  3. 4892656 iter4 DEBUG: pre-S_EXT_CTRLS DPB census + per-entry dump (instrumentation)
  4. 385dee1 iter4 fix: fresh request_fd per frame (load-bearing) (correctness)
    • In RequestSyncSurface, replace media_request_reinit(request_fd) with close(request_fd); surface_object->request_fd = -1;
    • Forces next BeginPicture to allocate a fresh fd via media_request_alloc
    • This is THE fix that crossed the threshold. All three of 74d8dd1, 385dee1 and b81ce69 are correctness improvements, but 385dee1 (the second of the three correctness commits; item 4 in this list) is the one that flipped the outcome from "frame-11 EINVAL" to "2130 frames clean."
  5. f21bdf0 iter4 DEBUG: per-control TRY isolation (instrumentation — was the diagnostic that pivoted us from "bad control content" to "bad fd state")
  6. b81ce69 iter4 fix: B-slice L1 reflist .fields copy-paste (correctness; pre-existing iter1+ bug caught by Phase 5 review)

libva-multiplanar campaign artifacts

  • phase0_findings_iter4.md — substrate (7 candidates, locked A solo)
  • phase2_iter4_situation.md — kernel V4L2 control validation analysis
  • phase4_iter4_plan.md — diagnostic journey + fix authoring narrative
  • phase5_iter4_review.md — sonnet review (initial YELLOW → GREEN after C1+C2 resolved)
  • phase8_iteration4_close.md — this file

Diagnostic lessons (for memory + future iterations)

Kernel obfuscation extends to compound controls under TRY_EXT_CTRLS

The v4l2-ctrls-api.c:222-224 comment promised that TRY_EXT_CTRLS would report error_idx for the specific failing control. Empirically, for our compound H.264 controls + request_fd path, TRY also returned error_idx == count. Either the comment is outdated or the cluster-commit failure path bypasses the per-control update for both S and TRY. Practical diagnostic implication: don't rely on TRY to pinpoint compound-control failures; use per-control TRY isolation instead — submit each control in a count=1 v4l2_ext_controls and observe individual results.

"All controls fail individually" → request_fd state, not content

The breakthrough diagnostic: when every individual control fails on its own with the same EINVAL, the request_fd is in a bad state, and the control values are not the problem. Pivot from content-correctness to lifecycle-correctness investigation. This is cheap to test with per-control TRY isolation, i.e. one TRY per control: for i in 0..N { TRY([control_i]) }.
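The decision rule can be written down as a small classifier over the per-control results. This is a sketch under assumptions: the enum and function names are hypothetical, and the input is taken to be one errno value per control (0 for success), as a per-control TRY loop would produce.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum try_verdict {
    TRY_ALL_OK,           /* nothing failed in isolation */
    TRY_SUSPECT_CONTENT,  /* some controls failed, or errnos differ */
    TRY_SUSPECT_FD_STATE  /* every control failed with the same errno */
};

/* errs[i] is 0 if control i passed its isolated TRY, else its errno. */
static enum try_verdict classify_try_results(const int *errs, size_t count)
{
    size_t failed = 0;
    int first_err = 0;
    int same_err = 1;

    assert(errs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (errs[i] == 0)
            continue;
        if (failed == 0)
            first_err = errs[i];
        else if (errs[i] != first_err)
            same_err = 0;
        failed++;
    }

    if (failed == 0)
        return TRY_ALL_OK;
    if (failed == count && same_err)
        return TRY_SUSPECT_FD_STATE;
    return TRY_SUSPECT_CONTENT;
}
```

The "same errno" condition is deliberately strict: a mix of different errors across all controls is treated as a content problem, since it does not match the uniform-EINVAL pattern observed in iter4.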

media_request_alloc is cheaper than chasing reinit-state semantics

The kernel's MEDIA_REQUEST_IOC_REINIT after queue+wait is supposed to be sufficient to clean a request for reuse, but for some surface-recycle pattern in our cap_pool it left the fd in a state that S_EXT_CTRLS rejected. We don't fully understand why. Allocating a fresh fd (MEDIA_IOC_REQUEST_ALLOC + close per frame) sidesteps the question. The cost is one MEDIA_IOC_REQUEST_ALLOC ioctl plus one close() per frame in place of the reinit, which is lost in the noise of the V4L2 stack overhead.
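The fresh-fd-per-frame lifecycle might look like the sketch below. It is an illustration of the pattern only: `surface_sketch`, `surface_begin_request` and `surface_end_request` are hypothetical stand-ins for the driver's surface object and its BeginPicture/SyncSurface paths, and the allocator is injected so the example runs without a media device. In the driver the allocator role is played by media_request_alloc.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the driver's surface object. */
struct surface_sketch {
    int request_fd; /* -1 means "no request currently allocated" */
};

/* Hypothetical allocator hook, shaped like media_request_alloc(media_fd). */
typedef int (*request_alloc_fn)(int media_fd);

/* BeginPicture side: allocate a request only if the surface holds none. */
static int surface_begin_request(struct surface_sketch *s, int media_fd,
                                 request_alloc_fn alloc)
{
    assert(s != NULL && alloc != NULL);
    if (s->request_fd < 0)
        s->request_fd = alloc(media_fd);
    return s->request_fd;
}

/* SyncSurface side: drop the request instead of reinitialising it, so
 * the next BeginPicture is forced to allocate a fresh fd. */
static void surface_end_request(struct surface_sketch *s)
{
    assert(s != NULL);
    if (s->request_fd >= 0) {
        close(s->request_fd);
        s->request_fd = -1;
    }
}

/* Stub allocator for dry runs only: any valid fd will do. */
static int stub_alloc_calls;
static int stub_alloc(int media_fd)
{
    (void)media_fd;
    stub_alloc_calls++;
    return open("/dev/null", O_RDONLY);
}
```

Resetting request_fd to -1 is what ties the two halves together: the begin path never sees a previously used fd, so whatever state the reinit left behind cannot leak into the next frame.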

dpb[].fields is mandatory, not optional

For frame-coded streams, V4L2_H264_FRAME_REF (= TOP_FIELD_REF | BOTTOM_FIELD_REF) must be set on every valid DPB entry. The kernel's reflist builder skips entries with fields == 0. UAPI doc Documentation/userspace-api/media/v4l/ext-ctrls-codec-stateless.rst says so explicitly. Our pre-iter4 driver had fields zero-initialized and never written.
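A trimmed-down sketch of the DPB fill rule follows. The struct layouts and the function name are hypothetical simplifications (the real kernel entry has more members), the constants are redefined locally with the values used in linux/v4l2-controls.h so the example builds without kernel headers, and setting the ACTIVE flag on every emitted entry is an assumption rather than something the text above states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the UAPI values, redefined for a header-free build. */
#define V4L2_H264_TOP_FIELD_REF         0x1
#define V4L2_H264_BOTTOM_FIELD_REF      0x2
#define V4L2_H264_FRAME_REF             0x3
#define V4L2_H264_DPB_ENTRY_FLAG_VALID  0x01
#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE 0x02
#define DPB_SLOTS 16

/* Hypothetical, simplified driver-side and kernel-side entries. */
struct drv_dpb_entry { bool valid; bool used; uint16_t frame_num; };
struct out_dpb_entry { uint16_t frame_num; uint8_t fields; uint32_t flags; };

/* Copy live references into out[], skipping empty and stale slots.
 * Returns the number of entries written. */
static size_t fill_dpb_sketch(const struct drv_dpb_entry *in, size_t n_in,
                              struct out_dpb_entry out[DPB_SLOTS])
{
    size_t n_out = 0;

    assert(in != NULL && out != NULL);
    memset(out, 0, sizeof(out[0]) * DPB_SLOTS);
    for (size_t i = 0; i < n_in && n_out < DPB_SLOTS; i++) {
        if (!in[i].valid || !in[i].used)
            continue; /* empty slot, or stale (valid && !used) */
        out[n_out].frame_num = in[i].frame_num;
        out[n_out].fields = V4L2_H264_FRAME_REF; /* both fields, frame-coded */
        /* Assumption: an entry still in use is an active reference. */
        out[n_out].flags = V4L2_H264_DPB_ENTRY_FLAG_VALID |
                           V4L2_H264_DPB_ENTRY_FLAG_ACTIVE;
        n_out++;
    }
    return n_out;
}
```

Unused output slots stay zeroed, so their fields value is 0 and a reflist builder that skips fields == 0 ignores them.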

FFmpeg's libavcodec/v4l2_request_h264.c is the empirical reference for V4L2-stateless H.264

Whenever our driver disagrees with FFmpeg semantically and we can't find documentation, FFmpeg is right. references/ffmpeg-kwiboo/libavcodec/v4l2_request_h264.c::fill_dpb_entry was the source-of-truth that surfaced two of the three correctness fixes this iteration.

State that carries to iter5

  • Hardware: ohm RK3568 hantro G1/G2, kernel 6.19.10 — unchanged.
  • Userspace: firefox 150.0.1 stock + firefox-fourier 150.0.1-1.1 (PGO-instrumented, 3.6 GB libxul.so) at /opt/firefox-fourier/ — unchanged.
  • Driver installed: /usr/lib/dri/v4l2_request_drv_video.so. The last recorded sha256 is 46c6e2e078697d27... (post-DPB fix), which predates b81ce69, so the driver must be rebuilt and redeployed, and the hash re-confirmed, before iter5 starts.
  • Test fixture: bbb_1080p30_h264.mp4, sha256 dcf8a7170fbd... — unchanged.
  • Build container: firefox-fourier LXD on boltzmann — unchanged, persistent.
  • Phase 7 evidence script: /home/mfritsche/iter3_phase7_evidence.sh on ohm.vpn — unchanged.
  • mpv stress-test command (iter4-introduced): documented above.

State that does NOT carry

  • The throttling imposed by the PGO-instrumented firefox-fourier binary. iter4 verified Track A via mpv directly because that binary could not reach 720+ frames in 90s. iter5 may want a clean, PGO-disabled Firefox rebuild for sustained Firefox-side stress testing.
  • /tmp/ff-fourier-stderr-v2.log and /tmp/mpv-iter4.log are tmpfs-volatile.

Documented limitations carried into iteration 5 substrate

  • DEBUG instrumentation density (carried from iter1/iter2/iter3/iter4 backlog). Driver now carries iter1 ENTER/CAPTURE-dump traces + msync workaround, iter1+ POC sentinel strip, iter3 Y2 v1, iter4 Y2 v3 + per-control TRY iso + DPB census. The iter5 sweep is the natural next iteration.
  • mpv libplacebo --vo=gpu segfault (carried from iter3 substrate, never in scope for iter3 or iter4). vaapi-copy + --vo=null works (iter4 verification), but the libplacebo Vulkan-fallback path still segfaults. iter5 candidate.
  • Multi-context libva safety (Sonnet 9.6 from iter1) — still carried. iter4's mpv test was single-context; concurrent-libva not exercised.
  • PGO profile generation under sandbox (iter3 Phase 6 finding) — --enable-profile-generate=cross PGO step still requires X11/Wayland that the LXC container can't provide. iter5 Firefox rebuild may want PGO disabled or a different rig.
  • Bootlin upstream prep — with iter4's load-bearing fix landed, the fork is significantly closer to upstreamability. Per feedback_no_upstream.md, no PR/MR happens without explicit operator instruction. But iter5 DEBUG sweep + the Mozilla bug filing (iter3 candidate G) become natural prerequisites.

Lessons distilled to memory

  • feedback_kernel_obfuscation_compound.md (NEW) — V4L2 S_EXT_CTRLS deliberately hides which compound control failed (sets error_idx = count after validate_ctrls fails for set=true). The kernel comment in v4l2-ctrls-api.c claims TRY_EXT_CTRLS escapes the obfuscation, but empirically TRY also returns error_idx == count for compound H.264 controls. Use per-control TRY isolation (count=1 for each control individually) to pinpoint which one fails or, if all fail, conclude the request_fd state is the issue.

  • feedback_request_fd_lifecycle.md (NEW) — when every individual control fails on the same fd with EINVAL, the fd's state is bad — not the control content. Allocating a fresh fd per frame (MEDIA_IOC_REQUEST_ALLOC + close per cycle) is cheaper to verify than chasing kernel MEDIA_REQUEST_IOC_REINIT lifecycle semantics. iter4's load-bearing fix uses this pattern. Cost: ~1 ioctl pair per frame, negligible on the V4L2 stack.

  • reference_ffmpeg_v4l2_request_is_authority.md (NEW) — libavcodec/v4l2_request_h264.c::fill_dpb_entry is the working reference for V4L2-stateless H.264 control construction. iter4 surfaced two correctness fixes by direct comparison: dpb[].fields = V4L2_H264_FRAME_REF and "skip stale entries (= entries not in the consumer's current ReferenceFrames[])." When semantics disagree and no documentation resolves the disagreement, FFmpeg is the empirical authority. Cached locally at references/ffmpeg-kwiboo/.