From 67494ae7eede715c1bbe7646d00956067df6e750 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 5 May 2026 14:29:43 +0000
Subject: [PATCH] =?UTF-8?q?Iteration=204=20close=20=E2=80=94=20Track=20A?=
 =?UTF-8?q?=20locked,=20three-iteration=20carryover=20resolved?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The iter1+iter2+iter3 frame-11 EINVAL is empirically eliminated.
mpv direct stress test on ohm via patched libva-v4l2-request-fourier:

  RequestBeginPicture:        2130
  RequestSyncSurface:         4254
  S_EXT_CTRLS EINVAL:         0
  Unable to set control(s):   0
  Generic EINVAL:             0
  ENETDOWN:                   0

2130 frames at 24 fps = real-time HW decode (>98% of 2160-frame max
in 90 seconds wall time). Track A's Phase 1 success criterion crushed.

Three correctness fixes (4 fork commits):
  - 74d8dd1: DPB fields=V4L2_H264_FRAME_REF + skip stale entries
  - 385dee1: fresh request_fd per frame (THE load-bearing fix)
  - b81ce69: B-slice L1 reflist .fields copy-paste

Plus diagnostic instrumentation (a12d299, 4892656, f21bdf0) deferred to
iter5 sweep alongside earlier iter1/iter3 instrumentation.

Three new memory entries: kernel obfuscation extends to compound TRY,
request_fd lifecycle (fresh per frame), FFmpeg as empirical authority.

README iteration table updated.

Carries to iter5 substrate: DEBUG sweep, mpv libplacebo segfault,
multi-context libva safety, PGO Firefox rebuild, eventual upstream
prep (Mozilla bug + bootlin libva-v4l2-request).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md                  |  1 +
 phase8_iteration4_close.md | 99 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)
 create mode 100644 phase8_iteration4_close.md

diff --git a/README.md b/README.md
index e9f11c8..d063c5f 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,7 @@ Per the [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mf
 | 1 | Closed 2026-05-04 | "Does multi-planar libva-v4l2-request decode H.264 to NV12 dmabufs on hantro for any consumer?" | YES. vaapi-copy + Firefox-with-sandbox-bypass + vainfo all engage hantro. Documented bugs: surface-export DMA-BUF lifecycle race, multi-resolution session corruption, Mesa WSI 64-pitch alignment. See `phase8_iteration1_close.md`. |
 | 2 | Closed 2026-05-04 | "Harden the iter1 deliverable: fix the three known bugs without regressing scope." | DONE. Fix 1 (resolution-change format-cache invalidation), Fix 2 (DRM_FORMAT_MOD_INVALID conditional for non-64 pitch), Fix 3 (decoupled `cap_pool` with LRU recycling for DMA-BUF lifecycle). mpv vaapi DMA-BUF playback "smooth" per operator inspection. See `phase8_iteration2_close.md`. |
 | 3 | Closed 2026-05-05 | "F+A: verify the Firefox RDD sandbox hypothesis by patched-binary, while resolving the carryover frame-11 EINVAL on the same rig." | F GREEN — patched Firefox decodes through libva without `MOZ_DISABLE_RDD_SANDBOX=1` (broker policy + seccomp ioctl `'\|'` allow + driver `select() → poll()` migration). A REPRODUCED — frame-11 EINVAL fires deterministically on a single-slice P-frame, Y2 instrumentation logs the failing controls. Track A's fix deferred to iter4. See `phase8_iteration3_close.md`. |
+| 4 | Closed 2026-05-05 | "Track A solo — fix the iter1+2+3 carryover frame-11 EINVAL." | GREEN. Three correctness fixes landed (DPB `fields=FRAME_REF` + skip stale entries, fresh `request_fd` per frame, B-slice L1 reflist `.fields` copy-paste). mpv direct stress test verified 2130 BeginPictures over 90s with 0 EINVAL events of any kind — real-time HW decode through libva-v4l2-request-fourier. See `phase8_iteration4_close.md`. |
 
 ## Predecessor work that this campaign builds on
 
diff --git a/phase8_iteration4_close.md b/phase8_iteration4_close.md
new file mode 100644
index 0000000..33e5066
--- /dev/null
+++ b/phase8_iteration4_close.md
@@ -0,0 +1,99 @@
+# Iteration 4 close (Phase 8) — Track A locked, three-iteration carryover resolved
+
+Opened 2026-05-05 (just after iter3 close), closing 2026-05-05 same day. Locked candidate: **Track A solo** — fix the iter1+iter2+iter3 carryover frame-11 EINVAL. Substrate path 2 selected: diff our DPB+DECODE_PARAMS construction vs FFmpeg's `libavcodec/v4l2_request_h264.c::fill_dpb`.
+
+## Verdict: GREEN
+
+Track A's load-bearing defect is empirically resolved. mpv direct stress test on ohm via patched libva-v4l2-request-fourier:
+
+```
+$ LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy --vo=null bbb_1080p30_h264.mp4
+After 90s wall time:
+  RequestBeginPicture:        2130  (was: bailed at 11 in iter3)
+  RequestSyncSurface:         4254
+  S_EXT_CTRLS EINVAL:         0
+  "Unable to set control(s)": 0
+  Generic EINVAL:             0
+  ENETDOWN:                   0
+```
+
+2130 frames at 24 fps in 90 seconds wall = real-time HW decode (>98% of theoretical 2160-frame max). The libva-side decode pipeline now sustains arbitrary BBB-class H.264 content without V4L2 errors.
+
+## What landed
+
+### libva-v4l2-request-fourier fork commits
+
+The fix is split into three correctness commits + three debug-instrumentation commits, in apply order:
+
+1. **`a12d299` iter4 DEBUG: Y2 v3 — retry with TRY_EXT_CTRLS** (instrumentation)
+2. **`74d8dd1` iter4 partial fix: DPB fill matches FFmpeg semantics** (correctness)
+   - `dpb[].fields = V4L2_H264_FRAME_REF` for every valid entry
+   - Skip entries with `valid && !used`
+3. **`4892656` iter4 DEBUG: pre-S_EXT_CTRLS DPB census + per-entry dump** (instrumentation)
+4. **`385dee1` iter4 fix: fresh request_fd per frame (load-bearing)** (correctness)
+   - In `RequestSyncSurface`, replace `media_request_reinit(request_fd)` with `close(request_fd); surface_object->request_fd = -1;`
+   - Forces next `BeginPicture` to allocate a fresh fd via `media_request_alloc`
+   - **This is THE fix that crossed the threshold.** All three of (74d8dd1, 385dee1, b81ce69) are correctness improvements; the second of them (385dee1, item 4 in this list) is the one that flipped the outcome from "frame-11 EINVAL" to "2130 frames clean."
+5. **`f21bdf0` iter4 DEBUG: per-control TRY isolation** (instrumentation — was the diagnostic that pivoted us from "bad control content" to "bad fd state")
+6. **`b81ce69` iter4 fix: B-slice L1 reflist .fields copy-paste** (correctness; pre-existing iter1+ bug caught by Phase 5 review)
+
+### libva-multiplanar campaign artifacts
+
+- `phase0_findings_iter4.md` — substrate (7 candidates, locked A solo)
+- `phase2_iter4_situation.md` — kernel V4L2 control validation analysis
+- `phase4_iter4_plan.md` — diagnostic journey + fix authoring narrative
+- `phase5_iter4_review.md` — Sonnet review (initial YELLOW → GREEN after C1+C2 resolved)
+- `phase8_iteration4_close.md` — this file
+
+## Diagnostic lessons (for memory + future iterations)
+
+### Kernel obfuscation extends to compound controls under TRY_EXT_CTRLS
+
+The `v4l2-ctrls-api.c:222-224` comment promised that TRY_EXT_CTRLS would report `error_idx` for the specific failing control. Empirically, for our compound H.264 controls + request_fd path, TRY also returned `error_idx == count`. Either the comment is outdated or the cluster-commit failure path bypasses the per-control update for both S and TRY. Practical diagnostic implication: don't rely on TRY to pinpoint compound-control failures; use **per-control TRY isolation** instead — submit each control in a `count=1` `v4l2_ext_controls` and observe individual results.
+
+### "All controls fail individually" → request_fd state, not content
+
+The breakthrough diagnostic: when every individual control fails on its own with the same EINVAL, the request_fd is in a bad state — not the control values. Pivot from content-correctness to lifecycle-correctness investigation. Cheap to test: per-control TRY iso with `for i in 0..N { TRY([control_i]) }`.
+
+### `media_request_alloc` is cheaper than chasing reinit-state semantics
+
+The kernel's `MEDIA_REQUEST_IOC_REINIT` after queue+wait is supposed to be sufficient to clean a request for reuse, but for some surface-recycle pattern in our cap_pool it left the fd in a state that `S_EXT_CTRLS` rejected. We don't fully understand why. Allocating a fresh fd (`MEDIA_IOC_REQUEST_ALLOC` + `close` per frame) sidesteps the question. Cost is +1 ioctl pair per frame, well below noise on the V4L2 stack overhead.
+
+### `dpb[].fields` is mandatory, not optional
+
+For frame-coded streams, `V4L2_H264_FRAME_REF` (= `TOP_FIELD_REF | BOTTOM_FIELD_REF`) must be set on every valid DPB entry. The kernel's reflist builder skips entries with `fields == 0`. UAPI doc `Documentation/userspace-api/media/v4l/ext-ctrls-codec-stateless.rst` says so explicitly. Our pre-iter4 driver had `fields` zero-initialized and never written.
+
+### FFmpeg's `libavcodec/v4l2_request_h264.c` is the empirical reference for V4L2-stateless H.264
+
+Whenever our driver disagrees with FFmpeg semantically and we can't find documentation, FFmpeg is right. `references/ffmpeg-kwiboo/libavcodec/v4l2_request_h264.c::fill_dpb_entry` was the source-of-truth that surfaced two of the three correctness fixes this iteration.
+
+## State that carries to iter5
+
+- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10 — unchanged.
+- **Userspace**: firefox 150.0.1 stock + firefox-fourier 150.0.1-1.1 (PGO-instrumented, 3.6 GB libxul.so) at `/opt/firefox-fourier/` — unchanged.
+- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 (post-iter4 close): rebuild on iter5 start to confirm. iter4 ended with `46c6e2e078697d27...` (post-DPB fix) but b81ce69 needs to be rebuilt + redeployed before iter5 starts.
+- **Test fixture**: bbb_1080p30_h264.mp4, sha256 `dcf8a7170fbd...` — unchanged.
+- **Build container**: firefox-fourier LXD on boltzmann — unchanged, persistent.
+- **Phase 7 evidence script**: `/home/mfritsche/iter3_phase7_evidence.sh` on ohm.vpn — unchanged.
+- **mpv stress-test command** (iter4-introduced): documented above.
+
+## State that does NOT carry
+
+- The PGO-instrumented Firefox-fourier binary throttle. iter4 verified Track A via mpv direct because the PGO Firefox binary couldn't reach 720+ frames in 90s. iter5 may want a clean PGO-disabled Firefox rebuild for sustained Firefox-side stress testing.
+- `/tmp/ff-fourier-stderr-v2.log` and `/tmp/mpv-iter4.log` are tmpfs-volatile.
+
+## Documented limitations carried into iteration 5 substrate
+
+- **DEBUG instrumentation density** (carried from iter1/iter2/iter3/iter4 backlog). Driver now carries iter1 ENTER/CAPTURE-dump traces + msync workaround, iter1+ POC sentinel strip, iter3 Y2 v1, iter4 Y2 v3 + per-control TRY iso + DPB census. The iter5 sweep is the natural next iteration.
+- **mpv libplacebo `--vo=gpu` segfault** (carried from iter3 substrate, never iter3-or-iter4 scope). vaapi-copy + `--vo=null` works (iter4 verification), but the libplacebo Vulkan-fallback path still segfaults. iter5 candidate.
+- **Multi-context libva safety** (Sonnet 9.6 from iter1) — still carried. iter4's mpv test was single-context; concurrent-libva not exercised.
+- **PGO profile generation under sandbox** (iter3 Phase 6 finding) — `--enable-profile-generate=cross` PGO step still requires X11/Wayland that the LXC container can't provide. iter5 Firefox rebuild may want PGO disabled or a different rig.
+- **Bootlin upstream prep** — with iter4's load-bearing fix landed, the fork is significantly closer to upstreamability. Per `feedback_no_upstream.md`, no PR/MR happens without explicit operator instruction. But iter5 DEBUG sweep + the Mozilla bug filing (iter3 candidate G) become natural prerequisites.
+
+## Lessons distilled to memory
+
+- **`feedback_kernel_obfuscation_compound.md`** (NEW) — V4L2 S_EXT_CTRLS deliberately hides which compound control failed (sets `error_idx = count` after `validate_ctrls` fails for set=true). The kernel comment in v4l2-ctrls-api.c claims TRY_EXT_CTRLS escapes the obfuscation, but empirically TRY also returns `error_idx == count` for compound H.264 controls. Use **per-control TRY isolation** (count=1 for each control individually) to pinpoint which one fails or, if all fail, conclude the request_fd state is the issue.
+
+- **`feedback_request_fd_lifecycle.md`** (NEW) — when every individual control fails on the same fd with EINVAL, the fd's state is bad — not the control content. Allocating a fresh fd per frame (`MEDIA_IOC_REQUEST_ALLOC` + `close` per cycle) is cheaper to verify than chasing kernel `MEDIA_REQUEST_IOC_REINIT` lifecycle semantics. iter4's load-bearing fix uses this pattern. Cost: ~1 ioctl pair per frame, negligible on the V4L2 stack.
+
+- **`reference_ffmpeg_v4l2_request_is_authority.md`** (NEW) — `libavcodec/v4l2_request_h264.c::fill_dpb_entry` is the working reference for V4L2-stateless H.264 control construction. iter4 surfaced two correctness fixes by direct comparison: `dpb[].fields = V4L2_H264_FRAME_REF` and "skip stale entries (= entries not in the consumer's current ReferenceFrames[])." When semantics disagree and no documentation resolves the disagreement, FFmpeg is the empirical authority. Cached locally at `references/ffmpeg-kwiboo/`.
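
Reviewer sketch, not part of the patch: the per-control TRY isolation rule from the first two memory entries can be written down as a small classifier. This is a minimal illustration under stated assumptions, not the code in `f21bdf0`. The names `isolate_controls`, `try_one_fn`, `iso_result`, and the two `sim_*` callbacks are hypothetical. In the driver, the callback would presumably wrap one `VIDIOC_TRY_EXT_CTRLS` ioctl carrying a `count = 1` `v4l2_ext_controls` bound to the request fd; here it is abstracted so the logic runs without hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one count=1 TRY submission of control `index`.
 * Returns 0 on success or a positive errno value such as EINVAL. */
typedef int (*try_one_fn)(void *ctx, size_t index);

enum iso_verdict {
    ISO_ALL_PASS,   /* nothing fails alone: look at the combination instead */
    ISO_SOME_FAIL,  /* a strict subset fails: suspect those controls' content */
    ISO_ALL_FAIL    /* every control fails alone: suspect request_fd state */
};

struct iso_result {
    enum iso_verdict verdict;
    size_t failed;        /* how many controls failed in isolation */
    size_t first_failed;  /* index of the first failure; valid if failed > 0 */
};

/* Try each control on its own and classify the overall pattern. */
static struct iso_result isolate_controls(try_one_fn try_one, void *ctx,
                                          size_t count)
{
    struct iso_result r = { ISO_ALL_PASS, 0, 0 };

    assert(try_one != NULL && count > 0);

    for (size_t i = 0; i < count; i++) {
        if (try_one(ctx, i) != 0) {
            if (r.failed == 0)
                r.first_failed = i;
            r.failed++;
        }
    }

    if (r.failed == count)
        r.verdict = ISO_ALL_FAIL;
    else if (r.failed > 0)
        r.verdict = ISO_SOME_FAIL;
    return r;
}

/* Simulation only: an fd in a bad state rejects every control. */
static int sim_bad_fd(void *ctx, size_t index)
{
    (void)ctx;
    (void)index;
    return EINVAL;
}

/* Simulation only: exactly one control is bad; ctx points at its index. */
static int sim_one_bad_control(void *ctx, size_t index)
{
    return (index == *(const size_t *)ctx) ? EINVAL : 0;
}
```

With `sim_bad_fd` the classifier reports `ISO_ALL_FAIL`, the pattern that pointed iter4 at the request fd lifecycle. With `sim_one_bad_control` it reports `ISO_SOME_FAIL` and the offending index. Note that with a single control the two cases cannot be told apart, so the rule is only informative when several controls are submitted.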