Phase 8: iteration 1 close — deliverable lands for vaapi-copy + Firefox + vainfo
Iteration 1 close. Boolean-correctness criterion met for vainfo +
mpv vaapi-copy + Firefox 150 in a live Plasma session. mpv vaapi
(DMA-BUF) engages and decodes correctly but stutters due to a
DMA-BUF lifecycle race; deferred to iteration 2.

Four load-bearing functional commits on the fork:
  9de1be3: slice-header bit-parser populates DECODE_PARAMS bit-size
           fields hantro G1 reads into MMIO registers
  d41a4b9: always submit SCALING_MATRIX + populate pps num_ref_idx
  37c0e72: re-set OUTPUT format on resolution change (mpv probe-pattern)
  ac891a0: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle

Lessons distilled to memory:
  feedback_read_consumer_source_first (NEW)
  feedback_one_consumer_success_is_not_validation
  feedback_kernel_source_audit_for_uapi_contract (Sonnet Phase 5)
  feedback_stdout_is_data_too (NEW)
  feedback_no_premature_closure

Predecessor claims either reframed or invalidated:
  - 'vainfo + mpv probes work end-to-end' was true at the libva
    engagement layer, wrong at the kernel-decode layer until 9de1be3
  - 'chromium-fourier 149 = libva-multi-planar working' was wrong about
    mechanism; chromium-fourier uses a chromium-internal V4L2 backend,
    not libva. The 83 pp browser-CPU finding from fourier_attribution
    cell A stands; the path attribution does not.
  - 'patch 0011 sentinel test reliably detects decode failure' had a
    cache-coherency bug fixed in a047926.

Open questions carried to iteration 2: DMA-BUF EXPBUF refcount
lifecycle (load-bearing), WSI pitch alignment for non-64-aligned
widths, multi-resolution kernel-state corruption, plus six items
from Sonnet's Phase 5 review.

Iteration 2 opens with these as Phase 0 substrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,97 @@
# Iteration 1: Phase 8 close (2026-05-04)

## Deliverable

A libva-v4l2-request fork (`marfrit/libva-v4l2-request-fourier`, master at commit `c036a44`) that engages the hantro G1/G2 hardware H.264 decoder on PineTab2 RK3568 end-to-end for VA-API consumers, producing real decoded NV12 pixel content.

## Boolean-correctness criterion (sharpened mid-iteration)

> Consumer engages the libva backend AND kernel produces real pixel output, verified on a real VO via operator inspection.

| Consumer | Result |
|---|---|
| vainfo | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles |
| mpv `--hwdec=vaapi-copy --vo=null` | ✓ HW engaged, contract trace clean |
| mpv `--hwdec=vaapi-copy --vo=gpu` (live session) | ✓ bunny renders (operator-confirmed) |
| mpv `--hwdec=vaapi --vo=gpu` (live session) | ⚠ HW engaged, real pixels, but DMA-BUF lifecycle race causes visual stutter; deferred to iteration 2 |
| Firefox 150 (live Plasma Wayland session) | ✓ sustained HW decode (100+ slice_header parses, RDD memory-maps `/dev/video1`), bunny renders |
| chromium-fourier 149 | n/a (uses chromium-internal V4L2 backend and bypasses libva entirely, so it is not a libva validator) |

**Met for vaapi-copy + Firefox + vainfo.** **Partial for vaapi (DMA-BUF):** engagement and decode work, but the lifecycle race produces visible artifacts. Iteration 2 target.

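
The lifecycle race on the DMA-BUF path can be pictured with a toy model. The sketch below is not the fork's code and makes no V4L2 calls; `CaptureBuffer` and its methods are hypothetical names. It assumes the failure mode is a CAPTURE buffer being handed back to the decoder while a consumer still holds an exported handle to it, and shows one possible guard: a per-buffer count of outstanding exports that blocks the re-queue.

```python
from dataclasses import dataclass


@dataclass
class CaptureBuffer:
    """Toy stand-in for one V4L2 CAPTURE buffer (hypothetical model)."""
    index: int
    export_refs: int = 0   # exported handles the consumer still holds
    queued: bool = False   # True while the decoder may write into it

    def export(self) -> None:
        # Models handing a DMA-BUF fd for this buffer to the consumer.
        self.export_refs += 1

    def release(self) -> None:
        # Models the consumer closing its fd.
        if self.export_refs == 0:
            raise RuntimeError("release without matching export")
        self.export_refs -= 1

    def can_requeue(self) -> bool:
        # Safe only when nobody can still be reading the pixels.
        return self.export_refs == 0 and not self.queued

    def requeue(self) -> None:
        if not self.can_requeue():
            raise RuntimeError(f"buffer {self.index} still in use")
        self.queued = True


buf = CaptureBuffer(index=0)
buf.export()                      # consumer imports the frame
blocked = not buf.can_requeue()   # decoder must not reuse it yet
buf.release()                     # consumer is done with the frame
buf.requeue()                     # now reuse is safe
```

Whether a userspace count like this is sufficient, or whether the consumer's fd lifetime is not observable from the driver at all, is exactly the open question iteration 2 has to answer.
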
## What landed (fork commits ahead of bootlin upstream)

```
c036a44 image: fully populate VAImageFormat per VAAPI spec for NV12
ac891a0 surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle
fdfee2d DEBUG: log SyncSurface RETURN to confirm clean exit before crash
21ae311 DEBUG: ENTER on CreateBuffer + BeginPicture for frame-1 crash narrowing
92f5b25 DEBUG: ENTER on buffer/image entry points to localize Firefox RDD crash
7da2b27 DEBUG: ENTER logging at libva entry points to trace Firefox call flow
6be3f3b h264: rate-limit V4L2 readback EACCES warning to once per process
a047926 DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback
2517a12 DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis
37c0e72 surface: re-set OUTPUT format on resolution change
9de1be3 h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields
d41a4b9 h264: always submit SCALING_MATRIX + populate pps num_ref_idx
74b3793 STUDY.md: pointer to libva-multiplanar campaign Phase 0
8594d74..c45fea9 (18 patches) Step 1 reconciliation from marfrit-packages PKGBUILD
```

The four load-bearing functional commits:

- `9de1be3`: slice-header bit-parser populates the DECODE_PARAMS bit-size fields that hantro G1 reads into MMIO registers (the load-bearing decode fix; without it the kernel writes all zeros)
- `d41a4b9`: always submit SCALING_MATRIX + set the PPS_FLAG; populate pps num_ref_idx (decode prerequisites)
- `37c0e72`: re-set OUTPUT format on resolution change (mpv's probe-then-real-resolution pattern)
- `ac891a0`: honor the `VA_EXPORT_SURFACE_SEPARATE_LAYERS` flag in vaExportSurfaceHandle (the load-bearing Firefox fix)

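
To make the bit-size idea behind `9de1be3` concrete, here is a minimal sketch of how a parser can measure the number of bits a slice-header syntax element occupies. It is not the fork's parser: `BitReader` and `measure` are hypothetical names, emulation-prevention bytes are ignored, and the input is a hand-built bit string, not a real slice header. The assumption is that fields such as `dec_ref_pic_marking_bit_size` and `pic_order_cnt_bit_size` in the V4L2 decode-params control are obtained this way, by recording the bit position before and after parsing the relevant elements.

```python
class BitReader:
    """MSB-first bit reader over bytes (no emulation-prevention handling)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # position in bits from the start of data

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb code, ue(v) in the H.264 spec."""
        leading_zeros = 0
        while self.read_bits(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def measure(reader, parse):
    """Return (parsed value, number of bits the parse consumed)."""
    start = reader.pos
    value = parse(reader)
    return value, reader.pos - start


# Bit string 1 | 010 | 00100, zero-padded to two bytes.
reader = BitReader(bytes([0b10100010, 0b00000000]))
first = measure(reader, BitReader.read_ue)    # (0, 1)
second = measure(reader, BitReader.read_ue)   # (1, 3)
third = measure(reader, BitReader.read_ue)    # (3, 5)
```

The point of the `measure` wrapper is that the size is a by-product of parsing: variable-length codes mean the bit count cannot be known without actually decoding the element.
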
The DEBUG/diagnostic commits (`a047926`, `2517a12`, `6be3f3b`, `7da2b27`, `92f5b25`, `21ae311`, `fdfee2d`, plus pre-existing 0010/0011/0014) stay until iteration 2's DMA-BUF lifecycle work needs a clean baseline. Per the Phase 5 review, they get a clean sweep before any "this is what we'd push to bootlin" snapshot.

## Lessons distilled to memory

- **`feedback_read_consumer_source_first.md`** (NEW): when a specific consumer fails and others don't, read its source FIRST before generating speculative theories. Iteration 1 burned ~3h on Firefox crash/fallback hypotheses; the actual answer (`VA_EXPORT_SURFACE_SEPARATE_LAYERS`) was 30 lines of FFmpegVideoDecoder.cpp.
- **`feedback_one_consumer_success_is_not_validation.md`** (added mid-iteration): mpv tolerated several spec violations that Firefox correctly rejected. The strictest consumer drives the boolean-correctness criterion, not the easiest-to-test one.
- **`feedback_kernel_source_audit_for_uapi_contract.md`** (crystallized from Sonnet's Phase 5 review): when userspace fills V4L2/UAPI control fields, read the kernel driver source to see which fields drive MMIO writes. Patch 0008's "empirical question — does hantro tolerate zero?" was the wrong resolution; reading hantro_g1_h264_dec.c gave the answer instantly.
- **`feedback_stdout_is_data_too.md`** (NEW): `feedback_phase3_no_theatre.md` says stdout doesn't *replace* deeper data acquisition; it doesn't say to ignore stdout entirely. mpv's `Dropped: N` counter and V: timeline are the user-visible quality signal that strace/ftrace can't see.
- **`feedback_no_premature_closure.md`** (added early-iteration, campaign-specific): never frame "close the campaign" as a peer option while substantive work remains.

## Predecessor claims that were either reframed or invalidated

| Claim (source) | Reframing in iteration 1 |
|---|---|
| "vainfo + mpv probes work end-to-end" (STUDY.md, 2026-04-26) | True at the libva engagement layer; **wrong at the kernel-decode layer** until commit `9de1be3` landed. The original test never inspected pixel content. |
| "chromium-fourier 149 = libva-multi-planar working" (`fourier_attribution` cell A) | chromium-fourier uses **its own internal V4L2 stateless decoder** (chromium-side "Step 2" patches), bypassing libva entirely. Cell A's HW decode IS happening; the mechanism is chromium-internal, not libva. The 83 pp browser-CPU finding stands; the path attribution does not. |
| "Step 1 patches deliver working decode" (predecessor close-out) | Step 1 engaged libva correctly but **left the decode broken at the kernel layer** because patch 0008's bit-size open question was unresolved. Iteration 1's commit `9de1be3` is the actual fix. |
| "patch 0011 sentinel test reliably detects decode failure" (predecessor patch comment) | The sentinel test had a **cache-coherency bug** that consistently showed sentinel-survives even when the kernel had DMA-overwritten the buffer. Iteration 1's commit `a047926` adds `msync(MS_SYNC|MS_INVALIDATE)` to fix the readback. |

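
For readers unfamiliar with the sentinel idea, the check itself can be sketched in a few lines. This is an illustration only: `SENTINEL`, `arm_sentinel`, and `sentinel_survives` are hypothetical names, and a plain `bytearray` has no split between CPU cache and DMA-written memory, so the sketch cannot reproduce the coherency bug. It only shows the logic that the bug defeated: on the real rig, the CPU's view of the buffer has to be refreshed before this comparison means anything.

```python
SENTINEL = 0xA5  # hypothetical fill byte; any recognisable pattern works


def arm_sentinel(buf: bytearray) -> None:
    """Fill the buffer with the sentinel before queuing it for decode."""
    buf[:] = bytes([SENTINEL]) * len(buf)


def sentinel_survives(buf: bytes) -> bool:
    """True if every byte is still the sentinel, i.e. nothing wrote pixels."""
    return all(b == SENTINEL for b in buf)


capture = bytearray(16)
arm_sentinel(capture)
before = sentinel_survives(capture)   # True: nothing decoded yet

capture[0:4] = b"\x10\x80\x10\x80"    # stand-in for the decoder's DMA write
after = sentinel_survives(capture)    # False: content was overwritten
```

A stale read makes `sentinel_survives` return `True` after a successful decode, which is why the test reported failures that were not there.
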
## Phase 0 substrate carry-over to iteration 2

State that carries (re-verified in iteration 1):

- ohm RK3568 hantro G1/G2 on `/dev/video1` + `/dev/media0`, multi-planar V4L2 stateless, kernel 6.19.10
- mainline V4L2 stateless H.264 control IDs (no out-of-tree kernel patches needed)
- libva `2.23.0`, libva-utils `2.22.0`, mpv `0.41.0-3`, Firefox `150.0.1`, Mesa `26.0.5`
- Test clip: `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` (sha256 dcf8a7170fbd49bb...)
- Build harness: `meson setup --buildtype=release && ninja` directly on ohm; `marfrit/libva-v4l2-request-fourier` deployed to `/usr/lib/dri/v4l2_request_drv_video.so` (sha256 of latest: per ohm `pacman -Qi`)
- VIDIOC_G_EXT_CTRLS readback returns EACCES on this rig (kernel-side, not actionable from userspace; warning rate-limited to once per process)
- mpv (vaapi-copy + vaapi DMA-BUF), vainfo, and Firefox 150 in a live Plasma Wayland session all engage the libva backend and reach hantro

State that does NOT carry (to be re-acquired per `feedback_replicate_baseline_first.md`):

- Performance numbers: drop counts, effective FPS, browser CPU%, scanout-plane residency. Iteration 1 saw `64/300` drops in 12s for vaapi-copy and `29/300` for vaapi (DMA-BUF), but these are not anchors; re-measure in iteration 2 with a consistent rig.
- Multi-resolution kernel-state behavior: iteration 1 surfaced corruption when Firefox played multi-video pages (Mozilla homepage); iteration 2 needs to re-measure with the deferred fix in scope.

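
When those numbers are re-measured, the drop counter can be pulled from captured mpv output and turned into a ratio. The sketch below is one way to do that under stated assumptions: `last_dropped` and `drop_ratio` are hypothetical helpers, the sample status lines are illustrative rather than captured from the rig, and `64/300` is read as dropped frames over total frames.

```python
import re

DROPPED_RE = re.compile(r"Dropped:\s*(\d+)")


def last_dropped(status_output: str) -> int:
    """Return the last 'Dropped: N' value seen in captured output, or 0."""
    matches = DROPPED_RE.findall(status_output)
    return int(matches[-1]) if matches else 0


def drop_ratio(dropped: int, total_frames: int) -> float:
    return dropped / total_frames


# Illustrative status lines, not captured from the rig.
sample = (
    "AV: 00:00:01 / 00:10:34 (0%) A-V: 0.000\r"
    "AV: 00:00:12 / 00:10:34 (2%) A-V: 0.000 Dropped: 64\r"
)
copy_ratio = drop_ratio(last_dropped(sample), 300)   # vaapi-copy: 64/300
dmabuf_ratio = drop_ratio(29, 300)                   # vaapi (DMA-BUF): 29/300
```

Under that reading, the iteration 1 figures correspond to roughly 21.3% and 9.7% dropped frames. They are still not anchors; the helper only makes the next measurement comparable.
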
Open questions for iteration 2:

1. **DMA-BUF EXPBUF refcount lifecycle** (Task #39): V4L2 re-queues a CAPTURE buffer while the consumer still holds the EXPBUF'd fd, so physical memory is overwritten under the consumer and mpv `--vo=gpu --hwdec=vaapi` stutters. Load-bearing for iteration 2.
2. **WSI pitch alignment** (Task #40 part 1): Mesa rejects pitch=864 (a multiple of 32 but not of 64; it needs 64+), which breaks 864-wide videos in Firefox. Need to round up the reported pitch OR set a different DRM_FORMAT_MOD.
3. **Multi-resolution kernel-state** (Task #40 part 2): after an 864→1920 sequence, the kernel CAPTURE format reverts to 48×48; our `LAST_OUTPUT_WIDTH/HEIGHT` cache misses the corruption. Need a REQBUFS(0)+S_FMT resync on detecting a format mismatch.
4. **VIDIOC_G_EXT_CTRLS EACCES probe** (Sonnet review 7.1): move the readback before `MEDIA_REQUEST_IOC_QUEUE` and confirm the timing. If EACCES persists, file a kernel issue.
5. **Firefox seek-to-non-IDR** (Sonnet review 7.5): verify Firefox handles a stream where the first played frame isn't an IDR (mid-stream seek).
6. **`SET_FORMAT_OF_OUTPUT_ONCE` removal completeness** (Sonnet review 7.3): the global was replaced with `LAST_OUTPUT_WIDTH/HEIGHT` tracking; document the architectural state and verify multi-context use.
7. **`// HACK` block in surface.c** (Sonnet review 7.4): refactor when multi-codec support (MPEG-2) is exercised.
8. **`num_ref_idx_l0/l1_default_active_minus1` source for multi-slice** (Sonnet review 7.2): only matters for multi-slice streams with an explicit per-slice override. Defer until a real test stream surfaces it.

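
The "round up the reported pitch" option from the pitch-alignment question is simple arithmetic, sketched here. `align_up` and `REQUIRED_ALIGN` are hypothetical names, and the value 64 is an assumption taken from the "needs 64+" note, not a figure checked against Mesa.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


REQUIRED_ALIGN = 64  # assumption taken from the "needs 64+" note

pitch_864 = align_up(864, REQUIRED_ALIGN)    # 896
pitch_1920 = align_up(1920, REQUIRED_ALIGN)  # 1920, already aligned
```

Reporting a padded pitch only helps if the buffer is really laid out with that stride; if the kernel's bytes-per-line stays at 864, the consumer would read rows at the wrong offsets. That is why the list keeps the alternative of a different modifier open.
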
## Phase 1 lock for iteration 1: closed

Iteration 1's locked Phase 1 criterion ("consumer engages backend AND produces real pixel output, verified on real VO") is met for vainfo + mpv vaapi-copy + Firefox. It is met with a caveat for mpv vaapi (DMA-BUF): engagement and decode are correct, and the lifecycle bug is deferred. The deliverable is real and operator-validated; iteration 2 hardens it.

## Bootlin upstream outlook (per `feedback_no_upstream.md`, no PRs unless asked)

The Step 1 patch series + the slice-header parser + SEPARATE_LAYERS honor + LAST_OUTPUT_WIDTH tracking together represent ~2000 lines of changes against bootlin's dormant upstream. The DEBUG patches (0010, 0011, 0014, 3F readback, surface-export instrumentation) need to come out before any upstream-submission snapshot. Refactoring the `// HACK` block, fixing the DMA-BUF lifecycle, and adding multi-resolution support are required for upstreamability. Iteration 2 → 3 → eventual upstream is the natural arc, gated by operator decision per `feedback_no_upstream.md`.