Files
libva-multiplanar/phase8_iteration1_close.md
marfrit 7e66abb72f Phase 8: iteration 1 close — deliverable lands for vaapi-copy + Firefox + vainfo
Iteration 1 close. Boolean-correctness criterion met for vainfo +
mpv vaapi-copy + Firefox 150 in live Plasma session. mpv vaapi
(DMA-BUF) engages and decodes correctly but stutters due to a
DMA-BUF lifecycle race — deferred to iteration 2.

Four load-bearing functional commits on the fork:
  9de1be3 — slice-header bit-parser populates DECODE_PARAMS bit-size
            fields hantro G1 reads into MMIO registers
  d41a4b9 — always submit SCALING_MATRIX + populate pps num_ref_idx
  37c0e72 — re-set OUTPUT format on resolution change (mpv probe-pattern)
  ac891a0 — honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle

Lessons distilled to memory:
  feedback_read_consumer_source_first (NEW)
  feedback_one_consumer_success_is_not_validation
  feedback_kernel_source_audit_for_uapi_contract (Sonnet Phase 5)
  feedback_stdout_is_data_too (NEW)
  feedback_no_premature_closure

Predecessor claims either reframed or invalidated:
- 'vainfo + mpv probes work end-to-end' was true at libva engagement
  layer, wrong at kernel-decode layer until 9de1be3
- 'chromium-fourier 149 = libva-multi-planar working' was wrong about
  mechanism; chromium-fourier uses chromium-internal V4L2 backend,
  not libva. The 83 pp browser-CPU finding from fourier_attribution
  cell A stands; the path attribution does not.
- 'patch 0011 sentinel test reliably detects decode failure' had a
  cache-coherency bug fixed in a047926.

Open questions carried to iteration 2: DMA-BUF EXPBUF refcount
lifecycle (load-bearing), WSI pitch alignment for non-64-aligned
widths, multi-resolution kernel-state corruption, plus six items
from Sonnet's Phase 5 review.

Iteration 2 opens with these as Phase 0 substrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:46:07 +00:00

98 lines
9.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 1 — Phase 8 close (2026-05-04)
## Deliverable
A libva-v4l2-request fork (`marfrit/libva-v4l2-request-fourier`, master at commit `c036a44`) that engages the hantro G1/G2 hardware H.264 decoder on PineTab2 RK3568 end-to-end for VA-API consumers, producing real decoded NV12 pixel content.
## Boolean-correctness criterion (sharpened mid-iteration)
> Consumer engages the libva backend AND kernel produces real pixel output, verified on a real VO via operator inspection.
| Consumer | Result |
|---|---|
| vainfo | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles |
| mpv `--hwdec=vaapi-copy --vo=null` | ✓ HW engaged, contract trace clean |
| mpv `--hwdec=vaapi-copy --vo=gpu` (live session) | ✓ bunny renders (operator-confirmed) |
| mpv `--hwdec=vaapi --vo=gpu` (live session) | ⚠ HW engaged, real pixels, but DMA-BUF lifecycle race causes visual stutter — deferred to iteration 2 |
| Firefox 150 (live Plasma Wayland session) | ✓ sustained HW decode (100+ slice_header parses, RDD memory-maps `/dev/video1`), bunny renders |
| chromium-fourier 149 | n/a (uses chromium-internal V4L2 backend, bypasses libva entirely — not a libva validator) |
**Met for vaapi-copy + Firefox + vainfo.** **Partial for vaapi (DMA-BUF)** — engagement and decode work; lifecycle race produces visible artifacts. iteration 2 target.
## What landed (fork commits ahead of bootlin upstream)
```
c036a44 image: fully populate VAImageFormat per VAAPI spec for NV12
ac891a0 surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle
fdfee2d DEBUG: log SyncSurface RETURN to confirm clean exit before crash
21ae311 DEBUG: ENTER on CreateBuffer + BeginPicture for frame-1 crash narrowing
92f5b25 DEBUG: ENTER on buffer/image entry points to localize Firefox RDD crash
7da2b27 DEBUG: ENTER logging at libva entry points to trace Firefox call flow
6be3f3b h264: rate-limit V4L2 readback EACCES warning to once per process
a047926 DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback
2517a12 DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis
37c0e72 surface: re-set OUTPUT format on resolution change
9de1be3 h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields
d41a4b9 h264: always submit SCALING_MATRIX + populate pps num_ref_idx
74b3793 STUDY.md: pointer to libva-multiplanar campaign Phase 0
8594d74..c45fea9 (18 patches) Step 1 reconciliation from marfrit-packages PKGBUILD
```
The four load-bearing functional commits:
- `9de1be3` — slice-header bit-parser populates DECODE_PARAMS bit-size fields hantro G1 reads into MMIO registers (the load-bearing decode fix; without it the kernel writes all zeros)
- `d41a4b9` — always submit SCALING_MATRIX + set the PPS_FLAG; populate pps num_ref_idx (decode prerequisites)
- `37c0e72` — re-set OUTPUT format on resolution change (probe-then-real-resolution mpv pattern)
- `ac891a0` — honor `VA_EXPORT_SURFACE_SEPARATE_LAYERS` flag in vaExportSurfaceHandle (the load-bearing Firefox fix)
The DEBUG/diagnostic commits (`a047926`, `2517a12`, `6be3f3b`, `7da2b27`, `92f5b25`, `21ae311`, `fdfee2d`, plus pre-existing 0010/0011/0014) stay until iteration 2's DMA-BUF lifecycle work needs a clean baseline. Per Phase 5 review: clean sweep before any "this is what we'd push to bootlin" snapshot.
## Lessons distilled to memory
- **`feedback_read_consumer_source_first.md`** (NEW) — when a specific consumer fails and others don't, read its source FIRST before generating speculative theories. Iteration 1 burned ~3h on Firefox crash/fallback hypotheses; the actual answer (`VA_EXPORT_SURFACE_SEPARATE_LAYERS`) was 30 lines of FFmpegVideoDecoder.cpp.
- **`feedback_one_consumer_success_is_not_validation.md`** (added mid-iteration) — mpv tolerated several spec violations Firefox correctly rejected. The strictest consumer drives the boolean-correctness criterion, not the easiest-to-test one.
- **`feedback_kernel_source_audit_for_uapi_contract.md`** (Sonnet's Phase 5 review crystallization) — when userspace fills V4L2/UAPI control fields, read the kernel driver source for which fields drive MMIO writes. Patch 0008's "empirical question — does hantro tolerate zero?" was the wrong resolution; reading hantro_g1_h264_dec.c gave the answer instantly.
- **`feedback_stdout_is_data_too.md`** (NEW) — `feedback_phase3_no_theatre.md` says stdout doesn't *replace* deeper data acquisition; it doesn't say ignore stdout entirely. mpv's `Dropped: N` counter and V: timeline are the user-visible quality signal that strace/ftrace can't see.
- **`feedback_no_premature_closure.md`** (added early-iteration, campaign-specific) — never frame "close the campaign" as a peer option to substantive work-remaining.
## Predecessor claims that were either reframed or invalidated
| Claim (source) | Reframing in iteration 1 |
|---|---|
| "vainfo + mpv probes work end-to-end" (STUDY.md, 2026-04-26) | True at libva engagement layer; **wrong at kernel-decode layer** until commit `9de1be3` landed. The original test never inspected pixel content. |
| "chromium-fourier 149 = libva-multi-planar working" (`fourier_attribution` cell A) | chromium-fourier uses **its own internal V4L2 stateless decoder** (chromium-side "Step 2" patches), bypassing libva entirely. Cell A's HW decode IS happening; the mechanism is chromium-internal, not libva. The 83 pp browser-CPU finding stands; the path attribution does not. |
| "Step 1 patches deliver working decode" (predecessor close-out) | Step 1 engaged libva correctly but **left the decode broken at the kernel layer** because patch 0008's bit-size open question was unresolved. Iteration 1's commit `9de1be3` is the actual fix. |
| "patch 0011 sentinel test reliably detects decode failure" (predecessor patch comment) | The sentinel test had a **cache-coherency bug** that consistently showed sentinel-survives even when the kernel had DMA-overwritten the buffer. Iteration 1's commit `a047926` adds `msync(MS_SYNC|MS_INVALIDATE)` to fix the readback. |
## Phase 0 substrate carry-over to iteration 2
State that carries (re-verified in iteration 1):
- ohm RK3568 hantro G1/G2 on `/dev/video1` + `/dev/media0`, multi-planar V4L2 stateless, kernel 6.19.10
- mainline V4L2 stateless H.264 control IDs (no out-of-tree kernel patches needed)
- libva `2.23.0`, libva-utils `2.22.0`, mpv `0.41.0-3`, Firefox `150.0.1`, Mesa `26.0.5`
- Test clip: `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` (sha256 dcf8a7170fbd49bb...)
- Build harness: `meson setup --buildtype=release && ninja` directly on ohm; `marfrit/libva-v4l2-request-fourier` deployed to `/usr/lib/dri/v4l2_request_drv_video.so` (sha256 of latest: per ohm `pacman -Qi`)
- VIDIOC_G_EXT_CTRLS readback returns EACCES on this rig (kernel-side, not actionable from userspace; rate-limited warning per process)
- mpv (vaapi-copy + vaapi DMA-BUF), vainfo, Firefox 150 in live Plasma Wayland session — all engage the libva backend and reach hantro
State that does NOT carry (to be re-acquired per `feedback_replicate_baseline_first.md`):
- Performance numbers: drop counts, effective FPS, browser CPU%, scanout-plane residency. Iteration 1 saw `64/300` drops in 12s for vaapi-copy and `29/300` for vaapi (DMA-BUF), but these are not anchors — re-measure in iteration 2 with consistent rig.
- Multi-resolution kernel-state behavior: iteration 1 surfaced corruption when Firefox played multi-video pages (Mozilla homepage); iteration 2 needs to re-measure with the deferred fix in scope.
Open questions for iteration 2:
1. **DMA-BUF EXPBUF refcount lifecycle** (Task #39) — V4L2 re-queues a CAPTURE buffer while consumer holds EXPBUF'd fd → physical memory overwritten under consumer → mpv `--vo=gpu --hwdec=vaapi` stutter. Load-bearing for iteration 2.
2. **WSI pitch alignment** (Task #40 part 1) — Mesa rejects pitch=864 (only 16-aligned, needs 64+); breaks 864-wide videos in Firefox. Need to round up reported pitch OR set a different DRM_FORMAT_MOD.
3. **Multi-resolution kernel-state** (Task #40 part 2) — after a 864→1920 sequence, kernel CAPTURE format reverts to 48×48; our `LAST_OUTPUT_WIDTH/HEIGHT` cache misses the corruption. Need REQBUFS(0)+S_FMT resync on detecting format mismatch.
4. **VIDIOC_G_EXT_CTRLS EACCES probe** (Sonnet review 7.1) — move readback before `MEDIA_REQUEST_IOC_QUEUE` and confirm the timing. If EACCES persists, file a kernel issue.
5. **Firefox seek-to-non-IDR** (Sonnet review 7.5) — verify Firefox handles a stream where the first played frame isn't an IDR (mid-stream seek).
6. **`SET_FORMAT_OF_OUTPUT_ONCE` removal completeness** (Sonnet review 7.3) — the global was replaced with `LAST_OUTPUT_WIDTH/HEIGHT` tracking; document the architectural state and verify multi-context use.
7. **`// HACK` block in surface.c** (Sonnet review 7.4) — refactor when codec multi-codec support (MPEG-2) is exercised.
8. **`num_ref_idx_l0/l1_default_active_minus1` source for multi-slice** (Sonnet review 7.2) — only matters for multi-slice streams with explicit per-slice override. Defer until a real test stream surfaces it.
## Phase 1 lock for iteration 1: closed
Iteration 1's locked Phase 1 criterion ("consumer engages backend AND produces real pixel output, verified on real VO") is met for vainfo + mpv vaapi-copy + Firefox. Met-with-caveat for mpv vaapi (DMA-BUF) — engagement + decode correct, lifecycle bug deferred. The deliverable is real and operator-validated; iteration 2 hardens it.
## Bootlin upstream outlook (per `feedback_no_upstream.md`, no PRs unless asked)
The Step 1 patch series + the slice-header parser + SEPARATE_LAYERS honor + LAST_OUTPUT_WIDTH tracking together represent ~2000 lines of changes against bootlin's dormant upstream. The DEBUG patches (0010, 0011, 0014, 3F readback, surface-export instrumentation) need to come out before any upstream-submission snapshot. Refactoring the `// HACK` block, fixing the DMA-BUF lifecycle, and adding multi-resolution support are required for upstreamability. Iteration 2 → 3 → eventual upstream is the natural arc, gated by operator decision per `feedback_no_upstream.md`.