# Iteration 1: Phase 8 close (2026-05-04)

## Deliverable

A libva-v4l2-request fork (`marfrit/libva-v4l2-request-fourier`, master at commit `c036a44`) that engages the hantro G1/G2 hardware H.264 decoder on PineTab2 RK3568 end-to-end for VA-API consumers, producing real decoded NV12 pixel content.

## Boolean-correctness criterion (sharpened mid-iteration)

> Consumer engages the libva backend AND kernel produces real pixel output, verified on a real VO via operator inspection.

| Consumer | Result |
|---|---|
| vainfo | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles |
| mpv `--hwdec=vaapi-copy --vo=null` | ✓ HW engaged, contract trace clean |
| mpv `--hwdec=vaapi-copy --vo=gpu` (live session) | ✓ bunny renders (operator-confirmed) |
| mpv `--hwdec=vaapi --vo=gpu` (live session) | ⚠ HW engaged, real pixels, but DMA-BUF lifecycle race causes visual stutter; deferred to iteration 2 |
| Firefox 150 (live Plasma Wayland session) | ✓ sustained HW decode (100+ slice_header parses, RDD memory-maps `/dev/video1`), bunny renders |
| chromium-fourier 149 | n/a (uses chromium-internal V4L2 backend, bypasses libva entirely; not a libva validator) |

**Met for vaapi-copy + Firefox + vainfo.** **Partial for vaapi (DMA-BUF)**: engagement and decode work; lifecycle race produces visible artifacts. Iteration 2 target.

## What landed (fork commits ahead of bootlin upstream)

```
c036a44 image: fully populate VAImageFormat per VAAPI spec for NV12
ac891a0 surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle
fdfee2d DEBUG: log SyncSurface RETURN to confirm clean exit before crash
21ae311 DEBUG: ENTER on CreateBuffer + BeginPicture for frame-1 crash narrowing
92f5b25 DEBUG: ENTER on buffer/image entry points to localize Firefox RDD crash
7da2b27 DEBUG: ENTER logging at libva entry points to trace Firefox call flow
6be3f3b h264: rate-limit V4L2 readback EACCES warning to once per process
a047926 DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback
2517a12 DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis
37c0e72 surface: re-set OUTPUT format on resolution change
9de1be3 h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields
d41a4b9 h264: always submit SCALING_MATRIX + populate pps num_ref_idx
74b3793 STUDY.md: pointer to libva-multiplanar campaign Phase 0
8594d74..c45fea9 (18 patches) Step 1 reconciliation from marfrit-packages PKGBUILD
```

The four load-bearing functional commits:

- `9de1be3`: slice-header bit-parser populates DECODE_PARAMS bit-size fields hantro G1 reads into MMIO registers (the load-bearing decode fix; without it the kernel writes all zeros)
- `d41a4b9`: always submit SCALING_MATRIX + set the PPS_FLAG; populate pps num_ref_idx (decode prerequisites)
- `37c0e72`: re-set OUTPUT format on resolution change (probe-then-real-resolution mpv pattern)
- `ac891a0`: honor `VA_EXPORT_SURFACE_SEPARATE_LAYERS` flag in vaExportSurfaceHandle (the load-bearing Firefox fix)

The DEBUG/diagnostic commits (`a047926`, `2517a12`, `6be3f3b`, `7da2b27`, `92f5b25`, `21ae311`, `fdfee2d`, plus pre-existing 0010/0011/0014) stay until iteration 2's DMA-BUF lifecycle work needs a clean baseline. Per Phase 5 review: clean sweep before any "this is what we'd push to bootlin" snapshot.

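
As a rough illustration of what the bit-parsing in `9de1be3` involves: H.264 slice headers are coded with variable-length Exp-Golomb fields (`ue(v)` / `se(v)`), so the only way to learn how many bits a syntax element occupies is to parse up to it and record the bit position before and after. The sketch below is not the fork's code. `struct bitreader`, `br_read_ue`, `br_read_se`, and `measure_poc_bits` are hypothetical names, only the `pic_order_cnt_type == 0` case is covered, and the input is assumed to have emulation-prevention bytes already stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader (illustration only). */
struct bitreader {
    const uint8_t *data;
    size_t size_bits;
    size_t pos; /* current bit position from the start of data */
};

void br_init(struct bitreader *br, const uint8_t *data, size_t size_bytes)
{
    br->data = data;
    br->size_bits = size_bytes * 8;
    br->pos = 0;
}

/* Read n bits (n <= 32). Bits past the end of the buffer read as 0. */
uint32_t br_read_bits(struct bitreader *br, unsigned n)
{
    uint32_t v = 0;

    assert(n <= 32);
    while (n--) {
        uint32_t bit = 0;

        if (br->pos < br->size_bits)
            bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* ue(v): count leading zero bits, then read that many suffix bits. */
uint32_t br_read_ue(struct bitreader *br)
{
    unsigned zeros = 0;

    while (zeros < 32 && br_read_bits(br, 1) == 0)
        zeros++;
    if (zeros == 0)
        return 0;
    if (zeros >= 32)
        return UINT32_MAX; /* malformed input; sentinel value */
    return ((1u << zeros) - 1u) + br_read_bits(br, zeros);
}

/* se(v): odd code numbers map to positive values, even ones to negative. */
int32_t br_read_se(struct bitreader *br)
{
    uint32_t k = br_read_ue(br);

    return (k & 1u) ? (int32_t)((k + 1u) / 2u) : -(int32_t)(k / 2u);
}

/*
 * Bits consumed by the picture-order-count syntax when
 * pic_order_cnt_type == 0: pic_order_cnt_lsb is a fixed-width field,
 * delta_pic_order_cnt_bottom (if present) is se(v).
 */
uint32_t measure_poc_bits(struct bitreader *br, unsigned poc_lsb_bits,
                          int has_delta_bottom)
{
    size_t start = br->pos;

    (void)br_read_bits(br, poc_lsb_bits);
    if (has_delta_bottom)
        (void)br_read_se(br);
    return (uint32_t)(br->pos - start);
}
```

The measured width is the kind of value that mainline's `struct v4l2_ctrl_h264_decode_params` carries in `pic_order_cnt_bit_size`; `dec_ref_pic_marking_bit_size` would be measured the same way around the reference-marking syntax. A real parser also has to walk every preceding slice-header field in order, which this sketch leaves out.
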
## Lessons distilled to memory

- **`feedback_read_consumer_source_first.md`** (NEW): when a specific consumer fails and others don't, read its source FIRST before generating speculative theories. Iteration 1 burned ~3h on Firefox crash/fallback hypotheses; the actual answer (`VA_EXPORT_SURFACE_SEPARATE_LAYERS`) was 30 lines of FFmpegVideoDecoder.cpp.
- **`feedback_one_consumer_success_is_not_validation.md`** (added mid-iteration): mpv tolerated several spec violations Firefox correctly rejected. The strictest consumer drives the boolean-correctness criterion, not the easiest-to-test one.
- **`feedback_kernel_source_audit_for_uapi_contract.md`** (Sonnet's Phase 5 review crystallization): when userspace fills V4L2/UAPI control fields, read the kernel driver source to see which fields drive MMIO writes. Patch 0008's "empirical question: does hantro tolerate zero?" was the wrong resolution; reading hantro_g1_h264_dec.c gave the answer instantly.
- **`feedback_stdout_is_data_too.md`** (NEW): `feedback_phase3_no_theatre.md` says stdout doesn't *replace* deeper data acquisition; it doesn't say to ignore stdout entirely. mpv's `Dropped: N` counter and V: timeline are the user-visible quality signal that strace/ftrace can't see.
- **`feedback_no_premature_closure.md`** (added early-iteration, campaign-specific): never frame "close the campaign" as a peer option to substantive work-remaining.

## Predecessor claims that were either reframed or invalidated

| Claim (source) | Reframing in iteration 1 |
|---|---|
| "vainfo + mpv probes work end-to-end" (STUDY.md, 2026-04-26) | True at libva engagement layer; **wrong at kernel-decode layer** until commit `9de1be3` landed. The original test never inspected pixel content. |
| "chromium-fourier 149 = libva-multi-planar working" (`fourier_attribution` cell A) | chromium-fourier uses **its own internal V4L2 stateless decoder** (chromium-side "Step 2" patches), bypassing libva entirely. Cell A's HW decode IS happening; the mechanism is chromium-internal, not libva. The 83 pp browser-CPU finding stands; the path attribution does not. |
| "Step 1 patches deliver working decode" (predecessor close-out) | Step 1 engaged libva correctly but **left the decode broken at the kernel layer** because patch 0008's bit-size open question was unresolved. Iteration 1's commit `9de1be3` is the actual fix. |
| "patch 0011 sentinel test reliably detects decode failure" (predecessor patch comment) | The sentinel test had a **cache-coherency bug** that consistently showed sentinel-survives even when the kernel had DMA-overwritten the buffer. Iteration 1's commit `a047926` adds `msync(MS_SYNC\|MS_INVALIDATE)` to fix the readback. |

## Phase 0 substrate carry-over to iteration 2

State that carries (re-verified in iteration 1):

- ohm RK3568 hantro G1/G2 on `/dev/video1` + `/dev/media0`, multi-planar V4L2 stateless, kernel 6.19.10
- mainline V4L2 stateless H.264 control IDs (no out-of-tree kernel patches needed)
- libva `2.23.0`, libva-utils `2.22.0`, mpv `0.41.0-3`, Firefox `150.0.1`, Mesa `26.0.5`
- Test clip: `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` (sha256 dcf8a7170fbd49bb...)
- Build harness: `meson setup --buildtype=release && ninja` directly on ohm; `marfrit/libva-v4l2-request-fourier` deployed to `/usr/lib/dri/v4l2_request_drv_video.so` (sha256 of latest: per ohm `pacman -Qi`)
- VIDIOC_G_EXT_CTRLS readback returns EACCES on this rig (kernel-side, not actionable from userspace; rate-limited warning per process)
- mpv (vaapi-copy + vaapi DMA-BUF), vainfo, and Firefox 150 in a live Plasma Wayland session all engage the libva backend and reach hantro

State that does NOT carry (to be re-acquired per `feedback_replicate_baseline_first.md`):

- Performance numbers: drop counts, effective FPS, browser CPU%, scanout-plane residency. Iteration 1 saw `64/300` drops in 12s for vaapi-copy and `29/300` for vaapi (DMA-BUF), but these are not anchors; re-measure in iteration 2 on a consistent rig.
- Multi-resolution kernel-state behavior: iteration 1 surfaced corruption when Firefox played multi-video pages (Mozilla homepage); iteration 2 needs to re-measure with the deferred fix in scope.

Open questions for iteration 2:

1. **DMA-BUF EXPBUF refcount lifecycle** (Task #39): V4L2 re-queues a CAPTURE buffer while the consumer still holds the EXPBUF'd fd → physical memory is overwritten under the consumer → mpv `--vo=gpu --hwdec=vaapi` stutter. Load-bearing for iteration 2.
2. **WSI pitch alignment** (Task #40 part 1): Mesa rejects pitch=864 (only 16-aligned, needs 64+); this breaks 864-wide videos in Firefox. Need to round up the reported pitch OR set a different DRM_FORMAT_MOD.
3. **Multi-resolution kernel-state** (Task #40 part 2): after an 864→1920 sequence, the kernel CAPTURE format reverts to 48×48; our `LAST_OUTPUT_WIDTH/HEIGHT` cache misses the corruption. Need REQBUFS(0)+S_FMT resync on detecting a format mismatch.
4. **VIDIOC_G_EXT_CTRLS EACCES probe** (Sonnet review 7.1): move the readback before `MEDIA_REQUEST_IOC_QUEUE` and confirm the timing. If EACCES persists, file a kernel issue.
5. **Firefox seek-to-non-IDR** (Sonnet review 7.5): verify Firefox handles a stream where the first played frame isn't an IDR (mid-stream seek).
6. **`SET_FORMAT_OF_OUTPUT_ONCE` removal completeness** (Sonnet review 7.3): the global was replaced with `LAST_OUTPUT_WIDTH/HEIGHT` tracking; document the architectural state and verify multi-context use.
7. **`// HACK` block in surface.c** (Sonnet review 7.4): refactor when multi-codec support (MPEG-2) is exercised.
8. **`num_ref_idx_l0/l1_default_active_minus1` source for multi-slice** (Sonnet review 7.2): only matters for multi-slice streams with explicit per-slice override. Defer until a real test stream surfaces it.

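
For open question 2, the arithmetic behind "round up the reported pitch" can be sketched as below. This is only the rounding step under stated assumptions: `align_up` is a hypothetical helper, not a function from the fork, and 64 is taken from the "needs 64+" observation above rather than from any Mesa documentation. Whether a rounded-up pitch is actually valid depends on the stride the kernel wrote into the CAPTURE buffer, which this sketch does not check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two.
 * Hypothetical helper for illustration only.
 */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With these definitions, `align_up(864, 64)` yields 896, while `align_up(1920, 64)` leaves 1920 unchanged, which is consistent with 1080p clips working and 864-wide clips being rejected.
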
## Phase 1 lock for iteration 1: closed

Iteration 1's locked Phase 1 criterion ("consumer engages backend AND produces real pixel output, verified on real VO") is met for vainfo + mpv vaapi-copy + Firefox. Met-with-caveat for mpv vaapi (DMA-BUF): engagement + decode correct, lifecycle bug deferred.

The deliverable is real and operator-validated; iteration 2 hardens it.

## Bootlin upstream outlook (per `feedback_no_upstream.md`, no PRs unless asked)

The Step 1 patch series + the slice-header parser + SEPARATE_LAYERS honor + LAST_OUTPUT_WIDTH tracking together represent ~2000 lines of changes against bootlin's dormant upstream. The DEBUG patches (0010, 0011, 0014, 3F readback, surface-export instrumentation) need to come out before any upstream-submission snapshot. Refactoring the `// HACK` block, fixing the DMA-BUF lifecycle, and adding multi-resolution support are required for upstreamability.

Iteration 2 → 3 → eventual upstream is the natural arc, gated by operator decision per `feedback_no_upstream.md`.