From a052d5d7cd1c965b1e85a8284a683b066c260a19 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Mon, 4 May 2026 13:25:16 +0000
Subject: [PATCH] Phase 7 verification: vaapi-copy works, DMA-BUF surface-export bug surfaces
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Live Plasma 6 Wayland session retest of all 4 target consumers against
fork commit 6be3f3b. Results:

- vainfo: ✓ no regression (7 H.264 + 2 MPEG-2 profiles)
- mpv --hwdec=vaapi-copy --vo=gpu: ✓ bunny (Phase 6 success
  re-confirmed in live session)
- mpv --hwdec=vaapi --vo=gpu: ⚠ solid blue frame
- Firefox 150 (live session): ⚠ engages libva for 1 frame (gets real
  pixels per slice_header parse log), then falls back to FFmpeg(FFVPX)
  software for sustained playback
- chromium-fourier 149: ✓ no regression but ORTHOGONAL — uses
  chromium's own V4L2 stateless decoder, bypasses libva entirely

Tests A (mpv vaapi) and B (Firefox) converge on the same DMA-BUF
surface-export bug: vaExportSurfaceHandle in libva-v4l2-request
produces a DMA-BUF that Mesa/Firefox can't render correctly — likely
wrong DRM_FORMAT modifier or plane offset/stride mismatch with hantro's
tile-padded NV12 (sizeimage=3,655,712 vs vanilla 3,133,440 for
1920x1088).

Also disambiguated: chromium-fourier 149's decode path does NOT go
through libva-v4l2-request — uses chromium's own V4L2 backend (Step-2
chromium-side patch). Reframes the 2026-05-03 fourier_attribution
cell-A wheat verdict's path validation.

Boolean-correctness criterion (sharpened): met for vaapi-copy, not for
vaapi (DMA-BUF). Phase 1 lock should wait until both paths work.
Iteration 2 (perf) is gated on the DMA-BUF path.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase7_findings.md | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 phase7_findings.md

diff --git a/phase7_findings.md b/phase7_findings.md
new file mode 100644
index 0000000..d74f9ec
--- /dev/null
+++ b/phase7_findings.md
@@ -0,0 +1,72 @@
+# Phase 7 verification — 2026-05-04
+
+In-session retest of all four target consumers (vainfo, mpv, Firefox 150, chromium-fourier 149) against the campaign deliverable as of fork commit `6be3f3b` (with `d41a4b9` SCALING_MATRIX/num_ref_idx + `9de1be3` slice-header parser + `a047926`/`6be3f3b` observability hardening). Live Plasma 6 Wayland session on ohm.
+
+## Summary
+
+| Consumer | Path | Engages our libva? | Real pixels? | Verdict |
+|---|---|---|---|---|
+| vainfo | enumerate-only | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles | n/a | ✓ no regression |
+| mpv `--hwdec=vaapi-copy` (vo=null) | vaapi-copy → CPU | ✓ sustained (10/10 frames in smoke run) | ✓ ftrace `vb2_buf_done bytesused=3655712`, ymin/ymax variance > 0 mid-frame | ✓ |
+| **mpv `--hwdec=vaapi-copy` (vo=gpu live)** | vaapi-copy → CPU → GL upload | ✓ sustained | **✓ bunny visible (operator-confirmed)** | **✓ Phase 6 success** |
+| mpv `--hwdec=vaapi` (vo=gpu live) | vaapi → DMA-BUF → GL import | ✓ engages | ⚠ **solid blue frame** | ⚠ surface-export bug |
+| Firefox 150 (live session) | RDD → vaapi → DMA-BUF | ✓ 1 frame, real pixels (slice_header parse logged once) | ⚠ **falls back to SW after frame 0** (lsof: 0 /dev/video1 holders during sustained playback; 21,522 ProcessDecode via FFmpeg(FFVPX) PDM) | ⚠ same DMA-BUF bug → SW fallback |
+| chromium-fourier 149 | chromium-internal V4L2 stateless | ✗ uses own `media/gpu/v4l2/v4l2_video_decoder_backend_stateless` path; **bypasses libva entirely** | ✓ bunny visible, chrome://gpu = HW | ✓ no regression but **orthogonal** to libva fix |
+
+## Boolean-correctness criterion (sharpened)
+
+The Phase 1 criterion as amended in `phase0_evidence/2026-05-04-kernel-trace/findings.md`:
+
+> consumer engages backend AND kernel produces decoded pixel output (verified by visual inspection on a real VO, not by sentinel test)
+
+**Met for the vaapi-copy path** (mpv-vaapi-copy + vo=gpu = bunny). **Not met for the vaapi (DMA-BUF) path** (blue frame in mpv, SW fallback in Firefox).
+
+## Surface-export bug (the next iteration)
+
+Test A and Test B converge on the same failure mode: the libva backend's `vaExportSurfaceHandle` produces a DMA-BUF that downstream consumers can't render correctly.
+
+- **mpv `--hwdec=vaapi --vo=gpu`** GL-imports the DMA-BUF as a texture and renders it → solid blue. The kernel-side decode is good (proven by vaapi-copy path showing the bunny), so the corruption is at the export → import → render layer.
+- **Firefox 150** evaluates the first frame's HW-decoded result (presumably also receives a "blue" / wrong DMA-BUF), determines the path doesn't render correctly, and **falls back to FFmpeg(FFVPX) software decode** for sustained playback.
+- **chromium-fourier 149** uses an entirely different code path (chromium-internal V4L2 stateless decoder, bypassing libva), so this bug doesn't affect it. The 2026-05-03 `fourier_attribution` cell A success was via that internal path, not via libva-v4l2-request — a fact this Phase 7 work disambiguated for the first time.
+
+Likely root causes (in order to investigate):
+
+1. **Wrong DRM_FORMAT**: hantro G1 NV12 has `sizeimage=3,655,712` for 1920×1088 (vs vanilla 3,133,440 = +522,272 bytes of tile padding). If `vaExportSurfaceHandle` reports `DRM_FORMAT_NV12` with **linear modifier** instead of a hantro-specific modifier (e.g. `DRM_FORMAT_MOD_LINEAR` vs a tile/coded-format modifier), Mesa will read the buffer at wrong byte offsets → garbage chroma → blue tinted output.
+2. **Wrong plane offset/stride**: NV12 has Y plane followed by interleaved UV. If our export reports `offset[1]` (UV plane start) at the wrong byte, Mesa reads the wrong region for chroma.
+3. **Missing colorspace hint**: DMA-BUF doesn't carry colorspace per se, but the consumer needs YUV→RGB conversion matrix info. mpv typically infers BT.709 limited-range for HD content, so this is a less likely culprit but should be ruled out.
+4. **Missing surface-side cache flush**: Similar to the patch-0011 cache bug — if our export doesn't ensure the DMA-BUF is cache-coherent at the time of export, GL import might see stale data. But hantro CMA is probably uncached/coherent on ARM64 by default; less likely.
+
+`surface-handle export, dmabuf modifier negotiation` is **explicitly in the campaign's locked scope** per `phase0_findings.md`. This is iteration 1's Phase 4→6 cycle #2.
+
+## What this means for Phase 1 lock + iteration 2 (performance)
+
+- Phase 1 boolean-correctness criterion is **partially met** — vaapi-copy works end-to-end, vaapi (DMA-BUF) does not. Phase 1 lock should not happen until both paths work, OR the criterion should be sharpened further to specify which path counts.
+- Iteration 2 (performance: SW baseline vs HW with libva-multiplanar) is **gated** on the DMA-BUF path working, because real-world consumers (Firefox especially, but also production mpv configs) prefer DMA-BUF over the system-memory copy path. vaapi-copy involves an extra CPU-side copy that adds latency and CPU load — using it as the perf comparison would understate the HW benefit.
+
+## Phase 6 follow-up: chromium-fourier path is orthogonal
+
+The 2026-05-03 `fourier_attribution` Phase 5 review's cell-A "wheat" verdict for chromium-fourier-with-Step-1-patches was load-bearing on the assumption that Step 1 = libva-v4l2-request multi-planar work. Today's Phase 7 work shows that chromium-fourier's actual decode path **does not go through libva-v4l2-request** — it uses chromium's built-in V4L2 stateless decoder (the chromeos-mature `media/gpu/v4l2/v4l2_video_decoder_backend_stateless`), enabled via a chromium-side patch ("Step 2"). Our libva fix neither helps nor hurts that path.
+
+That doesn't invalidate the cell-A vs cell-B 83 pp browser-CPU finding (HW decode IS happening for chromium-fourier; it's just via a different driver). But it does mean:
+
+- The chromium-fourier-internal path doesn't depend on libva-v4l2-request fixes.
+- Iteration 2's perf comparison should use mpv (vaapi-copy and once fixed, vaapi) and Firefox as the libva-driven consumers, with chromium-fourier as a separate performance reference (not a libva validator).
+- A vanilla Chromium build forced through libva would be a true validator; chromium-fourier is not.
+
+## Artifacts
+
+Live-session test outputs (preserved on ohm `/tmp/`):
+- `/tmp/test-A-mpv.{stdout,stderr,pid}` — mpv vaapi vo=gpu (blue)
+- `/tmp/test-B-firefox.{stdout,stderr,pid}` — Firefox live session
+- `/tmp/firefox-vaapi-test/firefox_test_B.log.{moz_log,child-1.moz_log}` — MOZ_LOG (1 slice_header parse, 21k+ ProcessDecode via FFVPX)
+- `/tmp/test-C-chromium.{stdout,stderr,pid}` — chromium-fourier 149 (no libva engagement)
+
+Not pulled to campaign repo — they're rebuilt easily and the per-process log files are large.
+
+## Next iteration of Phase 4→6 (open)
+
+1. Read libva-v4l2-request `src/surface.c::ExportSurfaceHandle` (or wherever vaExportSurfaceHandle lives in the fork's code).
+2. Compare against FFmpeg `references/ffmpeg-kwiboo/libavcodec/v4l2_request.c` (or `hwcontext_v4l2request.c` per the b57fbbe head) for surface-export details — DRM_FORMAT, modifier, plane offsets, colorspace hint.
+3. Diagnose the blue-frame: instrument the export call to log what's returned. Cross-check against what mpv's `--msg-level=vd=v --msg-level=vo=v` reports about the imported texture format/modifier.
+4. Implement fix. Build. Real-VO test. Bunny via `--hwdec=vaapi --vo=gpu` = win.
+5. Retest Firefox: should now stay engaged instead of falling back.
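
The size figures in root cause 1 and the offset/stride check in root cause 2 can be sanity-checked with a short calculation before anyone instruments the export call. The sketch below is illustrative only. It assumes a plain linear NV12 layout with pitch equal to width, and the `desc` dictionary is a hypothetical stand-in for whatever the export path logs (it is not the libva descriptor type). The last helper records an arithmetic observation: the 522,272-byte surplus equals 64 bytes per 16×16 macroblock plus 32. That would be consistent with per-macroblock data appended after the picture rather than padding inside the planes, but nothing in this patch confirms it.

```python
"""Sanity checks for the NV12 numbers quoted in phase7_findings.md."""

MB = 16                      # H.264 macroblock edge, in pixels
DRM_FORMAT_MOD_LINEAR = 0    # value defined in drm_fourcc.h


def fourcc(code):
    """Pack a 4-character code little-endian, as fourcc_code() does."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


DRM_FORMAT_NV12 = fourcc("NV12")


def nv12_linear_layout(width, height, pitch=None):
    """Expected layout of a linear NV12 buffer (assumption: no padding)."""
    pitch = width if pitch is None else pitch
    y_size = pitch * height            # full-resolution luma plane
    uv_size = pitch * (height // 2)    # interleaved CbCr, half height
    return {"pitch": pitch, "uv_offset": y_size, "size": y_size + uv_size}


def surplus_matches_per_mb_data(sizeimage, width, height):
    """True if the bytes beyond linear NV12 equal 64 per macroblock + 32.

    Observation from arithmetic only; what the surplus holds is unverified.
    """
    surplus = sizeimage - nv12_linear_layout(width, height)["size"]
    macroblocks = (width // MB) * (height // MB)
    return surplus == 64 * macroblocks + 32


def check_export(desc, height):
    """List mismatches between a logged export and linear NV12.

    `desc` is a hypothetical dict with keys fourcc, modifier,
    offsets (2 ints) and pitches (2 ints).
    """
    problems = []
    expected = nv12_linear_layout(desc["pitches"][0], height)
    if desc["fourcc"] != DRM_FORMAT_NV12:
        problems.append("fourcc is not NV12")
    if desc["modifier"] != DRM_FORMAT_MOD_LINEAR:
        problems.append("modifier is not linear")
    if desc["offsets"][0] != 0:
        problems.append("Y plane does not start at offset 0")
    if desc["offsets"][1] != expected["uv_offset"]:
        problems.append("UV offset != pitch * height")
    if desc["pitches"][1] != desc["pitches"][0]:
        problems.append("UV pitch differs from Y pitch")
    return problems
```

For 1920×1088 the linear layout gives a UV offset of 2,088,960 and a total of 3,133,440 bytes, matching the "vanilla" figure above. If the logged export passes `check_export` and the frame is still blue, the layout hypotheses (1 and 2) become less likely and the remaining causes move up the list.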