From 8dd3f399639f7a40bc7b70654a784ffcff09c94a Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 5 May 2026 13:02:25 +0000
Subject: [PATCH] =?UTF-8?q?Iteration=204=20Phase=201=20lock=20=E2=80=94=20?=
 =?UTF-8?q?A=20solo=20(frame-11=20EINVAL=20fix)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Track A locked solo. Pairing options A+B and A+D both deferred to iter5+; Track A is the load-bearing carry from iter1+iter2+iter3, fix-loop wants the focus.

Phase 1 success criterion: ≥30s (≥720 frames @ 24fps) of bbb_1080p30 decoded by patched-Firefox-fourier on ohm without S_EXT_CTRLS EINVAL, with operator visual ack of frames rendering in the Firefox window.

Diagnosis path: read hantro_g1_h264_dec.c set_params validation, diff our DECODE_PARAMS / SLICE_PARAMS / SPS / PPS struct construction vs FFmpeg reference, speculative-fix loop on ohm. Sonnet 7.5 (mid-stream non-IDR DPB state) is the suspect surface.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase0_findings_iter4.md | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/phase0_findings_iter4.md b/phase0_findings_iter4.md
index 57536ed..9a8b3f9 100644
--- a/phase0_findings_iter4.md
+++ b/phase0_findings_iter4.md
@@ -122,30 +122,25 @@ Likely needed for specific iter4 candidates:
 - For D (perf): `pidstat -u -p $(pidof firefox)` for CPU%; Mali-G52 freq via `/sys/class/devfreq/fde60000.gpu`; compositor scanout query (Wayland `ext-output-management`?) is harder.
 - For E (DMABUF): `gbm_bo_create` userspace test program; VIDIOC_QBUF type=V4L2_MEMORY_DMABUF exploratory path.
 
-## In-scope (LOCKING DEFERRED — Phase 1 user input)
+## In-scope (LOCKED 2026-05-05 for iteration 4) — A solo
 
-To be locked at Phase 1 from candidates A..G above. Recommended pairings flagged per candidate.
+**Track A.** Identify which V4L2 control field's content the kernel rejects on the 11th `vaBeginPicture` in `bbb_1080p30_h264.mp4` and fix it in `libva-v4l2-request-fourier`. Diagnosis path per substrate candidate A:
+
+1. **Read kernel hantro validation**: `drivers/staging/media/hantro/hantro_g1_h264_dec.c::set_params()` (and any predecessor V4L2-stateless H.264 validators on this kernel — `v4l2-h264.c`, `v4l2-mem2mem.c`, etc.) on ohm.
+2. **Diff our struct construction at frame-11 against FFmpeg reference**: leverage existing `diff_against_ffmpeg.md` scaffold; focus on DECODE_PARAMS reference picture list + flags state for non-IDR P-frames.
+3. **Speculative fix**: rebuild driver on ohm, retest with `/tmp/run_phase7_v2.sh`. ~30 sec rebuild + 35 sec test = ~1 min cycle. Sonnet review 7.5 ("mid-stream non-IDR DPB state") is the suspect surface.
+
+Pairing decision: **solo**. iter3 substrate suggested A+B or A+D, user locked **A solo**. iter5 will pick up the natural follow-on (DEBUG sweep, perf, or whatever).
 
 ## Out-of-scope (LOCKED 2026-05-05 for iteration 4)
 
-- New codecs (MPEG-2, VP8, VP9, AV1, HEVC) — H.264-only scope holds from iter1+iter2+iter3.
-- New target hardware on the libva side (fresnel RK3399, ampere RK3588) — separate iteration after ohm path is hardened. Boltzmann remains a build host only, not a libva target.
-- Bootlin upstreaming PR — `feedback_no_upstream.md` holds; no PRs unless explicitly tasked. Same for Mozilla bug-file (candidate G is gated on operator decision).
-- HEVC re-introduction (stripped in fourier port; no hantro G2 HEVC validation in operator's test corpus).
+- Candidates B, C, D, E, F, G — deferred to iter5+. None block A.
+- Same standing OOS from iter3: new codecs, new target hardware, bootlin/Mozilla upstream PR/MR, HEVC.
 
-## Phase 1 success criterion (will lock after user picks candidate)
+## Phase 1 success criterion (LOCKED 2026-05-05)
 
-Pre-lock template:
-- For candidate A: "patched-Firefox-fourier on ohm decodes ≥30s of bbb_1080p30 without `Unable to set control(s): Invalid argument` emerging in driver stderr; visual ack confirms frames reach the Firefox window."
-- For candidate B: "Driver source builds clean with zero `request_log()` calls in non-error paths, zero patch-0011 sentinel writes, Y2 instrumentation removed (if A also done); vaapi-copy + vaapi smoke tests still green."
-- For candidate C: "`mpv --hwdec=vaapi --vo=gpu` plays bbb_1080p30 to completion (60s+) without segfault, regardless of Vulkan init outcome."
-- For candidate D: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW, SW baseline} across drop count + CPU% + frame timing on bbb_1080p30; reproducible from operator instructions documented in iter4 substrate."
-- For candidate E: "vaapi-copy + vaapi --vo=gpu still produce real frames with V4L2_MEMORY_DMABUF-backed CAPTURE buffers; race window mathematically eliminated."
-- For candidate F: "Two concurrent libva contexts on the same V4L2 device decode independently without cross-context state corruption."
-- For candidate G: "Mozilla bug filed with iter3 patch attached; bootlin/libva-v4l2-request bug filed with `select() → poll()` patch attached."
+**Track A:** patched-Firefox-fourier on ohm decodes **≥30s of bbb_1080p30 (≥720 frames at 24 fps)** without `Unable to set control(s): Invalid argument` emerging in driver stderr. Anchored evidence: Y2 `S_EXT_CTRLS EINVAL` count = 0 over the run, Phase 7 evidence script verdict line "Track A: GREEN", and visual ack confirming frames render in the Firefox window (the load-bearing op-side check that decode output actually reaches the screen, not just succeeds at the libva layer).
 
 ## Stop point
 
-**Phase 1 lock requires user input** — pick from A..G (and any pairing). Recommended primary: **A** (Track A is the carried-three-iterations defect; rig + Y2 in place from iter3 close; tightest cycle-time of any candidate to fix loop). Recommended pairing: **A + B** (defect fix + DEBUG sweep cleanup) or **A + D** (defect fix + perf anchor "before/after" the fix).
-
-After lock, iter4 phases 2..8 proceed autonomously per "Stop only if user is needed."
+Phase 1 LOCKED on A solo. iter4 proceeds to Phase 2 (situation analysis: read hantro_g1_h264_dec.c set_params validation on ohm, identify which fields are checked at request-level), Phase 3 (baseline anchor: re-run iter3's autonomous Phase 7 rig to confirm the EINVAL still reproduces with the same control IDs/sizes), Phase 4 (write the fix, possibly multiple iterations of speculative fix → rebuild → retest), Phase 5 (sonnet review of fix), Phase 6 (deploy + smoke), Phase 7 (verify), Phase 8 (close). Stop only if user input is needed (e.g. the fix surfaces a multi-way design choice, or kernel-side state tracking emerges as required).
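
Reviewer sketch (not part of the commit): the machine-checkable half of the locked Track A criterion could be evaluated roughly as below. This is a minimal illustration under stated assumptions, not the actual Phase 7 evidence script. The function name `track_a_verdict`, the `frames_decoded` input, and the "RED" label are hypothetical; only the stderr message text, the 30 s / 24 fps figures, and the "GREEN" verdict word come from the patch. The operator visual ack cannot be automated and is deliberately left out.

```python
# Hypothetical check for the locked Track A criterion.
# EINVAL_MARKER is the stderr message quoted in the success criterion.
EINVAL_MARKER = "Unable to set control(s): Invalid argument"

MIN_SECONDS = 30                 # ">=30s" from the criterion
FPS = 24                         # "at 24 fps" from the criterion
MIN_FRAMES = MIN_SECONDS * FPS   # 30 * 24 = 720 frames


def track_a_verdict(stderr_text: str, frames_decoded: int) -> str:
    """Return "GREEN" only if no EINVAL line appears in the driver stderr
    and at least MIN_FRAMES frames were decoded; otherwise "RED".

    frames_decoded is assumed to be supplied by the caller; how the real
    rig counts frames is not stated in the patch.
    """
    einval_count = sum(
        1 for line in stderr_text.splitlines() if EINVAL_MARKER in line
    )
    if einval_count == 0 and frames_decoded >= MIN_FRAMES:
        return "GREEN"
    return "RED"  # "RED" is an assumed label for any failing run
```

A clean run of exactly 720 frames passes, while a single EINVAL line or a run of 719 frames fails, which mirrors the "count = 0 over the run" wording of the criterion.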