Phase 7 verification: vaapi-copy works, DMA-BUF surface-export bug surfaces

Live Plasma 6 Wayland session retest of all 4 target consumers
against fork commit 6be3f3b.

Results:
- vainfo: ✓ no regression (7 H.264 + 2 MPEG-2 profiles)
- mpv --hwdec=vaapi-copy --vo=gpu: ✓ bunny (Phase 6 success
  re-confirmed in live session)
- mpv --hwdec=vaapi --vo=gpu: ⚠ solid blue frame
- Firefox 150 (live session): ⚠ engages libva for 1 frame
  (gets real pixels per slice_header parse log), then falls
  back to FFmpeg(FFVPX) software for sustained playback
- chromium-fourier 149: ✓ no regression but ORTHOGONAL — uses
  chromium's own V4L2 stateless decoder, bypasses libva entirely

Tests A (mpv vaapi) and B (Firefox) converge on the same
DMA-BUF surface-export bug: vaExportSurfaceHandle in libva-
v4l2-request produces a DMA-BUF that Mesa/Firefox can't render
correctly — likely wrong DRM_FORMAT modifier or plane offset/
stride mismatch with hantro's tile-padded NV12 (sizeimage=
3,655,712 vs vanilla 3,133,440 for 1920x1088).

Also disambiguated: chromium-fourier 149's decode path does
NOT go through libva-v4l2-request — uses chromium's own V4L2
backend (Step-2 chromium-side patch). Reframes the 2026-05-03
fourier_attribution cell-A wheat verdict's path validation.

Boolean-correctness criterion (sharpened): met for vaapi-copy,
not for vaapi (DMA-BUF). Phase 1 lock should wait until both
paths work. Iteration 2 (perf) is gated on the DMA-BUF path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:25:16 +00:00
parent b18c6f70f9
commit a052d5d7cd
+72
@@ -0,0 +1,72 @@
# Phase 7 verification — 2026-05-04
In-session retest of all four target consumers (vainfo, mpv, Firefox 150, chromium-fourier 149) against the campaign deliverable at fork commit `6be3f3b`, which includes `d41a4b9` (SCALING_MATRIX/num_ref_idx), `9de1be3` (slice-header parser), and `a047926`/`6be3f3b` (observability hardening). Tested in a live Plasma 6 Wayland session on ohm.
## Summary
| Consumer | Path | Engages our libva? | Real pixels? | Verdict |
|---|---|---|---|---|
| vainfo | enumerate-only | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles | n/a | ✓ no regression |
| mpv `--hwdec=vaapi-copy` (vo=null) | vaapi-copy → CPU | ✓ sustained (10/10 frames in smoke run) | ✓ ftrace `vb2_buf_done bytesused=3655712`, ymin/ymax variance > 0 mid-frame | ✓ |
| **mpv `--hwdec=vaapi-copy` (vo=gpu live)** | vaapi-copy → CPU → GL upload | ✓ sustained | **✓ bunny visible (operator-confirmed)** | **✓ Phase 6 success** |
| mpv `--hwdec=vaapi` (vo=gpu live) | vaapi → DMA-BUF → GL import | ✓ engages | ⚠ **solid blue frame** | ⚠ surface-export bug |
| Firefox 150 (live session) | RDD → vaapi → DMA-BUF | ✓ 1 frame, real pixels (slice_header parse logged once) | ⚠ **falls back to SW after frame 0** (lsof: 0 /dev/video1 holders during sustained playback; 21,522 ProcessDecode via FFmpeg(FFVPX) PDM) | ⚠ same DMA-BUF bug → SW fallback |
| chromium-fourier 149 | chromium-internal V4L2 stateless | ✗ uses own `media/gpu/v4l2/v4l2_video_decoder_backend_stateless` path; **bypasses libva entirely** | ✓ bunny visible, chrome://gpu = HW | ✓ no regression but **orthogonal** to libva fix |
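
The "ymin/ymax variance > 0" check in the vo=null row can be expressed as a small helper. This is a minimal sketch, not the script used in the run: the function names are hypothetical, and it assumes the Y plane starts at byte 0 of the captured NV12 buffer with a known stride.

```python
def luma_range(buf: bytes, width: int, height: int, stride: int) -> tuple[int, int]:
    """Return (ymin, ymax) over the visible luma samples of an NV12 buffer.

    Assumption: the Y plane starts at offset 0 and each row is `stride` bytes.
    """
    if stride < width:
        raise ValueError("stride must be at least the visible width")
    if len(buf) < stride * height:
        raise ValueError("buffer is shorter than the Y plane")
    ymin, ymax = 255, 0
    for row in range(height):
        # Only look at the visible part of each row, not the stride padding.
        line = buf[row * stride : row * stride + width]
        ymin = min(ymin, min(line))
        ymax = max(ymax, max(line))
    return ymin, ymax


def looks_decoded(buf: bytes, width: int, height: int, stride: int) -> bool:
    """True when the luma plane is not a single flat value."""
    ymin, ymax = luma_range(buf, width, height, stride)
    return ymax > ymin
```

A non-flat luma plane only shows that something wrote varying data into the buffer. It does not replace the visual check on a real VO that the criterion below requires.
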
## Boolean-correctness criterion (sharpened)
The Phase 1 criterion as amended in `phase0_evidence/2026-05-04-kernel-trace/findings.md`:
> consumer engages backend AND kernel produces decoded pixel output (verified by visual inspection on a real VO, not by sentinel test)
**Met for the vaapi-copy path** (mpv-vaapi-copy + vo=gpu = bunny). **Not met for the vaapi (DMA-BUF) path** (blue frame in mpv, SW fallback in Firefox).
## Surface-export bug (the next iteration)
Test A (mpv `--hwdec=vaapi --vo=gpu`) and Test B (Firefox 150) converge on the same failure mode: the libva backend's `vaExportSurfaceHandle` produces a DMA-BUF that downstream consumers can't render correctly.
- **mpv `--hwdec=vaapi --vo=gpu`** GL-imports the DMA-BUF as a texture and renders it → solid blue. The kernel-side decode is good (proven by vaapi-copy path showing the bunny), so the corruption is at the export → import → render layer.
- **Firefox 150** evaluates the first HW-decoded frame (presumably the same wrong, "blue" DMA-BUF), apparently concludes that the path does not render correctly, and **falls back to FFmpeg(FFVPX) software decode** for sustained playback.
- **chromium-fourier 149** uses an entirely different code path (chromium-internal V4L2 stateless decoder, bypassing libva), so this bug doesn't affect it. The 2026-05-03 `fourier_attribution` cell A success was via that internal path, not via libva-v4l2-request — a fact this Phase 7 work disambiguated for the first time.
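
The Firefox fallback was detected by counting holders of the decoder node with lsof. A rough equivalent that can be polled during playback is sketched below. `/dev/video1` is the node named in the table above; the function name is hypothetical, and reading other processes' `fd` directories normally requires matching ownership or root.

```python
import os


def device_holders(device: str = "/dev/video1", proc_root: str = "/proc") -> dict[int, list[str]]:
    """Map PID -> list of fd numbers (as strings) whose link target is `device`."""
    holders: dict[int, list[str]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed between listdir and readlink
            if target == device:
                holders.setdefault(int(entry), []).append(fd)
    return holders


if __name__ == "__main__":
    found = device_holders()
    print(f"{len(found)} process(es) hold /dev/video1: {sorted(found)}")
```

An empty result during sustained playback corresponds to the "0 /dev/video1 holders" observation, i.e. the consumer is no longer using the hardware decoder.
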
Likely root causes (in order to investigate):
1. **Wrong DRM format modifier**: hantro G1 NV12 has `sizeimage=3,655,712` for 1920×1088, versus 3,133,440 for vanilla NV12 (+522,272 bytes, assumed here to be tile padding). If `vaExportSurfaceHandle` reports `DRM_FORMAT_NV12` with `DRM_FORMAT_MOD_LINEAR` while the buffer actually needs a hantro-specific tile/coded-format modifier, Mesa will read the buffer at the wrong byte offsets → garbage chroma → blue-tinted output.
2. **Wrong plane offset/stride**: NV12 has Y plane followed by interleaved UV. If our export reports `offset[1]` (UV plane start) at the wrong byte, Mesa reads the wrong region for chroma.
3. **Missing colorspace hint**: DMA-BUF doesn't carry colorspace per se, but the consumer needs YUV→RGB conversion matrix info. mpv typically infers BT.709 limited-range for HD content, so this is a less likely culprit but should be ruled out.
4. **Missing surface-side cache flush**: Similar to the patch-0011 cache bug — if our export doesn't ensure the DMA-BUF is cache-coherent at the time of export, GL import might see stale data. But hantro CMA is probably uncached/coherent on ARM64 by default; less likely.
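
The size figures behind causes 1 and 2 can be reproduced with a few lines of arithmetic. The sketch below assumes a tightly packed layout (stride equal to width). It also tests one alternative reading of the extra 522,272 bytes: that they are a trailing per-macroblock area (64 bytes per 16×16 macroblock plus 32 bytes, which to our reading matches the motion-vector storage the mainline hantro driver appends to H.264 capture buffers) rather than tile padding. That reading is a hypothesis to confirm against the kernel source, not a finding of this test run.

```python
def nv12_linear_size(width: int, height: int) -> int:
    """Bytes in a tightly packed linear NV12 frame: Y plane plus half-size interleaved UV plane."""
    return width * height * 3 // 2


WIDTH, HEIGHT = 1920, 1088
REPORTED_SIZEIMAGE = 3_655_712  # sizeimage observed for the hantro capture buffer

linear = nv12_linear_size(WIDTH, HEIGHT)
extra = REPORTED_SIZEIMAGE - linear

# Expected start of the UV plane if the layout is linear with stride == width.
uv_offset = WIDTH * HEIGHT

# Hypothesis (assumption): the extra bytes are 64 per macroblock plus 32.
macroblocks = (WIDTH // 16) * (HEIGHT // 16)
mv_guess = 64 * macroblocks + 32

print(linear, extra, uv_offset, macroblocks, mv_guess)
# 3133440 522272 2088960 8160 522272
```

The delta matches the per-macroblock guess exactly. If that reading holds, the picture data may already be linear NV12 with the UV plane at byte 2,088,960, which would shift suspicion from the modifier (cause 1) toward the reported offsets and pitches (cause 2).
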
`surface-handle export, dmabuf modifier negotiation` is **explicitly in the campaign's locked scope** per `phase0_findings.md`. This is iteration 1's Phase 4→6 cycle #2.
## What this means for Phase 1 lock + iteration 2 (performance)
- Phase 1 boolean-correctness criterion is **partially met** — vaapi-copy works end-to-end, vaapi (DMA-BUF) does not. Phase 1 lock should not happen until both paths work, OR the criterion should be sharpened further to specify which path counts.
- Iteration 2 (performance: SW baseline vs HW with libva-multiplanar) is **gated** on the DMA-BUF path working, because real-world consumers (Firefox especially, but also production mpv configs) prefer DMA-BUF over the system-memory copy path. vaapi-copy involves an extra CPU-side copy that adds latency and CPU load — using it as the perf comparison would understate the HW benefit.
## Phase 6 follow-up: chromium-fourier path is orthogonal
The 2026-05-03 `fourier_attribution` Phase 5 review's cell-A "wheat" verdict for chromium-fourier-with-Step-1-patches rested on the assumption that Step 1 = libva-v4l2-request multi-planar work. Today's Phase 7 work shows that chromium-fourier's actual decode path **does not go through libva-v4l2-request**: it uses chromium's built-in V4L2 stateless decoder (the ChromeOS-mature `media/gpu/v4l2/v4l2_video_decoder_backend_stateless`), enabled via a chromium-side patch ("Step 2"). Our libva fix neither helps nor hurts that path.
That doesn't invalidate the cell-A vs cell-B 83 pp browser-CPU finding (HW decode IS happening for chromium-fourier; it's just via a different driver). But it does mean:
- The chromium-fourier-internal path doesn't depend on libva-v4l2-request fixes.
- Iteration 2's perf comparison should use mpv (vaapi-copy and, once fixed, vaapi) and Firefox as the libva-driven consumers, with chromium-fourier as a separate performance reference (not a libva validator).
- A vanilla Chromium build forced through libva would be a true validator; chromium-fourier is not.
## Artifacts
Live-session test outputs (preserved on ohm `/tmp/`):
- `/tmp/test-A-mpv.{stdout,stderr,pid}` — mpv vaapi vo=gpu (blue)
- `/tmp/test-B-firefox.{stdout,stderr,pid}` — Firefox live session
- `/tmp/firefox-vaapi-test/firefox_test_B.log.{moz_log,child-1.moz_log}` — MOZ_LOG (1 slice_header parse, 21k+ ProcessDecode via FFVPX)
- `/tmp/test-C-chromium.{stdout,stderr,pid}` — chromium-fourier 149 (no libva engagement)
Not pulled into the campaign repo: they are easy to regenerate, and the per-process log files are large.
## Next iteration of Phase 4→6 (open)
1. Read libva-v4l2-request `src/surface.c::ExportSurfaceHandle` (or wherever vaExportSurfaceHandle lives in the fork's code).
2. Compare against FFmpeg `references/ffmpeg-kwiboo/libavcodec/v4l2_request.c` (or `hwcontext_v4l2request.c` per the b57fbbe head) for surface-export details — DRM_FORMAT, modifier, plane offsets, colorspace hint.
3. Diagnose the blue frame: instrument the export call to log what it returns. Cross-check against what mpv reports about the imported texture format/modifier when run with `--msg-level=vd=v,vo=v`.
4. Implement fix. Build. Real-VO test. Bunny via `--hwdec=vaapi --vo=gpu` = win.
5. Retest Firefox: should now stay engaged instead of falling back.
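
Once step 3 logs the exported values, they can be checked mechanically before involving mpv or Mesa. The sketch below is one possible checker, under stated assumptions: `ExportedNV12` is a hypothetical flattened record whose fields loosely mirror what libva's DRM PRIME surface descriptor carries (format, modifier, object size, per-plane offset and pitch), and the rules only cover a single-object, two-plane NV12 export.

```python
from dataclasses import dataclass

DRM_FORMAT_MOD_LINEAR = 0


def fourcc(code: str) -> int:
    """Pack a four-character code the way DRM fourcc values are built (little-endian)."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


DRM_FORMAT_NV12 = fourcc("NV12")


@dataclass
class ExportedNV12:
    """Hypothetical flattened view of one logged export result."""
    drm_format: int
    modifier: int
    width: int
    height: int
    object_size: int
    y_offset: int
    y_pitch: int
    uv_offset: int
    uv_pitch: int


def check_export(d: ExportedNV12) -> list[str]:
    """Return human-readable findings; an empty list means nothing looked off."""
    findings = []
    if d.drm_format != DRM_FORMAT_NV12:
        findings.append("format is not DRM_FORMAT_NV12")
    if d.modifier != DRM_FORMAT_MOD_LINEAR:
        findings.append("non-linear modifier: importer must support it")
    if d.y_pitch < d.width or d.uv_pitch < d.width:
        findings.append("pitch smaller than visible width")
    y_end = d.y_offset + d.y_pitch * d.height
    if d.uv_offset < y_end:
        findings.append("UV plane overlaps Y plane")
    # NV12 chroma has half the rows of luma (rounded up for odd heights).
    uv_end = d.uv_offset + d.uv_pitch * ((d.height + 1) // 2)
    if uv_end > d.object_size:
        findings.append("planes extend past the end of the buffer object")
    return findings
```

For example, a 1920×1088 export with `object_size=3655712`, pitches of 1920, `y_offset=0`, and `uv_offset=2088960` passes every check, while the same record with `uv_offset=0` is flagged for overlapping planes. A clean result does not prove the export is correct; it only rules out the layout errors listed above.
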