libva-multiplanar/phase7_findings.md
marfrit a052d5d7cd Phase 7 verification: vaapi-copy works, DMA-BUF surface-export bug surfaces
Live Plasma 6 Wayland session retest of all 4 target consumers
against fork commit 6be3f3b.

Results:
- vainfo: ✓ no regression (7 H.264 + 2 MPEG-2 profiles)
- mpv --hwdec=vaapi-copy --vo=gpu: ✓ bunny (Phase 6 success
  re-confirmed in live session)
- mpv --hwdec=vaapi --vo=gpu: ⚠ solid blue frame
- Firefox 150 (live session): ⚠ engages libva for 1 frame
  (gets real pixels per slice_header parse log), then falls
  back to FFmpeg(FFVPX) software for sustained playback
- chromium-fourier 149: ✓ no regression but ORTHOGONAL — uses
  chromium's own V4L2 stateless decoder, bypasses libva entirely

Tests A (mpv vaapi) and B (Firefox) converge on the same
DMA-BUF surface-export bug: vaExportSurfaceHandle in libva-
v4l2-request produces a DMA-BUF that Mesa/Firefox can't render
correctly — likely wrong DRM_FORMAT modifier or plane offset/
stride mismatch with hantro's tile-padded NV12 (sizeimage=
3,655,712 vs vanilla 3,133,440 for 1920x1088).

Also disambiguated: chromium-fourier 149's decode path does
NOT go through libva-v4l2-request — uses chromium's own V4L2
backend (Step-2 chromium-side patch). Reframes the 2026-05-03
fourier_attribution cell-A wheat verdict's path validation.

Boolean-correctness criterion (sharpened): met for vaapi-copy,
not for vaapi (DMA-BUF). Phase 1 lock should wait until both
paths work. Iteration 2 (perf) is gated on the DMA-BUF path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:25:16 +00:00

# Phase 7 verification — 2026-05-04
In-session retest of all four target consumers (vainfo, mpv, Firefox 150, chromium-fourier 149) against the campaign deliverable as of fork commit `6be3f3b` (with `d41a4b9` SCALING_MATRIX/num_ref_idx + `9de1be3` slice-header parser + `a047926`/`6be3f3b` observability hardening). Live Plasma 6 Wayland session on ohm.
## Summary
| Consumer | Path | Engages our libva? | Real pixels? | Verdict |
|---|---|---|---|---|
| vainfo | enumerate-only | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles | n/a | ✓ no regression |
| mpv `--hwdec=vaapi-copy` (vo=null) | vaapi-copy → CPU | ✓ sustained (10/10 frames in smoke run) | ✓ ftrace `vb2_buf_done bytesused=3655712`, ymin/ymax variance > 0 mid-frame | ✓ |
| **mpv `--hwdec=vaapi-copy` (vo=gpu live)** | vaapi-copy → CPU → GL upload | ✓ sustained | **✓ bunny visible (operator-confirmed)** | **✓ Phase 6 success** |
| mpv `--hwdec=vaapi` (vo=gpu live) | vaapi → DMA-BUF → GL import | ✓ engages | ⚠ **solid blue frame** | ⚠ surface-export bug |
| Firefox 150 (live session) | RDD → vaapi → DMA-BUF | ✓ 1 frame, real pixels (slice_header parse logged once) | ⚠ **falls back to SW after frame 0** (lsof: 0 /dev/video1 holders during sustained playback; 21,522 ProcessDecode via FFmpeg(FFVPX) PDM) | ⚠ same DMA-BUF bug → SW fallback |
| chromium-fourier 149 | chromium-internal V4L2 stateless | ✗ uses own `media/gpu/v4l2/v4l2_video_decoder_backend_stateless` path; **bypasses libva entirely** | ✓ bunny visible, chrome://gpu = HW | ✓ no regression but **orthogonal** to libva fix |
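The "0 /dev/video1 holders" evidence in the Firefox row came from `lsof`. A minimal sketch of the same check, assuming a Linux `/proc` layout, is shown below. The function name `device_holders` and the `proc_root` parameter are illustrative, not part of any tool used in the campaign.

```python
import os


def device_holders(device_path, proc_root="/proc"):
    """Return sorted PIDs that hold device_path open, by scanning <proc_root>/<pid>/fd."""
    target = os.path.realpath(device_path)
    holders = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "cpuinfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # Process exited meanwhile, or we lack permission to inspect it.
            continue
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if link == target:
                holders.add(int(entry))
    return sorted(holders)
```

Calling `device_holders("/dev/video1")` repeatedly during playback would show whether the decoder node stays open after the first frame. Processes owned by other users are only visible when run with sufficient privileges.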
## Boolean-correctness criterion (sharpened)
The Phase 1 criterion as amended in `phase0_evidence/2026-05-04-kernel-trace/findings.md`:
> consumer engages backend AND kernel produces decoded pixel output (verified by visual inspection on a real VO, not by sentinel test)
**Met for the vaapi-copy path** (mpv-vaapi-copy + vo=gpu = bunny). **Not met for the vaapi (DMA-BUF) path** (blue frame in mpv, SW fallback in Firefox).
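For context, the "ymin/ymax variance > 0 mid-frame" pre-check cited in the summary table can be sketched as below. This is an illustration of that kind of sentinel test, assuming a linear NV12 buffer with a known stride; the helper names are hypothetical. It only inspects the decoded buffer, so it cannot detect an export-side fault such as the blue frame, which is why the criterion insists on visual inspection.

```python
def luma_range(frame, width, height, stride, row=None):
    """Return (ymin, ymax) over one row of the Y plane of a linear NV12 buffer."""
    if row is None:
        row = height // 2  # sample the middle of the frame by default
    if not 0 <= row < height:
        raise ValueError("row outside the Y plane")
    start = row * stride
    line = frame[start:start + width]
    if len(line) < width:
        raise ValueError("buffer too short for the requested row")
    return min(line), max(line)


def looks_decoded(frame, width, height, stride):
    """True when the sampled luma row is not a flat fill."""
    ymin, ymax = luma_range(frame, width, height, stride)
    return ymax > ymin
```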
## Surface-export bug (the next iteration)
Test A (mpv `--hwdec=vaapi --vo=gpu`) and Test B (Firefox 150) converge on the same failure mode: the libva backend's `vaExportSurfaceHandle` produces a DMA-BUF that downstream consumers can't render correctly.
- **mpv `--hwdec=vaapi --vo=gpu`** GL-imports the DMA-BUF as a texture and renders it → solid blue. The kernel-side decode is good (proven by vaapi-copy path showing the bunny), so the corruption is at the export → import → render layer.
- **Firefox 150** engages the HW path for the first frame only (one slice_header parse logged), then **falls back to FFmpeg(FFVPX) software decode** for sustained playback. The trigger has not been confirmed from Firefox's side; the working assumption is that it receives the same wrong DMA-BUF as mpv and rejects the path.
- **chromium-fourier 149** uses an entirely different code path (chromium-internal V4L2 stateless decoder, bypassing libva), so this bug doesn't affect it. The 2026-05-03 `fourier_attribution` cell A success was via that internal path, not via libva-v4l2-request — a fact this Phase 7 work disambiguated for the first time.
Likely root causes (in order to investigate):
1. **Wrong DRM format modifier**: hantro G1 NV12 has `sizeimage=3,655,712` for 1920×1088 (vs vanilla 3,133,440, i.e. +522,272 bytes of tile padding). If `vaExportSurfaceHandle` reports `DRM_FORMAT_NV12` with `DRM_FORMAT_MOD_LINEAR` while the buffer actually needs a hantro-specific tile/coded-format modifier, Mesa will read the buffer at the wrong byte offsets → garbage chroma → blue output.
2. **Wrong plane offset/stride**: NV12 has a Y plane followed by an interleaved UV plane. If our export reports `offset[1]` (UV plane start) or either plane's pitch incorrectly, Mesa reads the wrong region for chroma.
3. **Missing colorspace hint**: DMA-BUF doesn't carry colorspace per se, but the consumer needs YUV→RGB conversion matrix info. mpv typically infers BT.709 limited-range for HD content, so this is a less likely culprit but should be ruled out.
4. **Missing surface-side cache flush**: Similar to the patch-0011 cache bug — if our export doesn't ensure the DMA-BUF is cache-coherent at the time of export, GL import might see stale data. But hantro CMA is probably uncached/coherent on ARM64 by default; less likely.
`surface-handle export, dmabuf modifier negotiation` is **explicitly in the campaign's locked scope** per `phase0_findings.md`. This is iteration 1's Phase 4→6 cycle #2.
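As an arithmetic cross-check on the sizes quoted in root causes 1 and 2, the sketch below recomputes the unpadded NV12 size and decomposes the surplus. The only inputs are the figures from this document; the per-macroblock interpretation is an inference from the numbers, not a confirmed property of the driver.

```python
def nv12_linear_size(width, height):
    """Bytes in an unpadded NV12 frame: full-size Y plane plus half-height interleaved UV."""
    y_plane = width * height
    uv_plane = width * (height // 2)
    return y_plane + uv_plane


WIDTH, HEIGHT = 1920, 1088
REPORTED_SIZEIMAGE = 3_655_712  # from the ftrace / format query quoted above

linear = nv12_linear_size(WIDTH, HEIGHT)
surplus = REPORTED_SIZEIMAGE - linear
macroblocks = (WIDTH // 16) * (HEIGHT // 16)  # 16x16 H.264 macroblocks
per_mb, remainder = divmod(surplus, macroblocks)

print(linear, surplus, macroblocks, per_mb, remainder)
# -> 3133440 522272 8160 64 32
```

The surplus is exactly 64 bytes per macroblock plus 32 bytes. That pattern would also fit a fixed per-macroblock area appended after the picture rather than wider rows, in which case the stride would remain 1920 and the UV plane would start at 1920 × 1088 = 2,088,960. The `bytesperline` value the driver reports for the capture format is the authoritative way to tell the two apart.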
## What this means for Phase 1 lock + iteration 2 (performance)
- Phase 1 boolean-correctness criterion is **partially met** — vaapi-copy works end-to-end, vaapi (DMA-BUF) does not. Phase 1 lock should not happen until both paths work, OR the criterion should be sharpened further to specify which path counts.
- Iteration 2 (performance: SW baseline vs HW with libva-multiplanar) is **gated** on the DMA-BUF path working, because real-world consumers (Firefox especially, but also production mpv configs) prefer DMA-BUF over the system-memory copy path. vaapi-copy involves an extra CPU-side copy that adds latency and CPU load — using it as the perf comparison would understate the HW benefit.
## Phase 6 follow-up: chromium-fourier path is orthogonal
The 2026-05-03 `fourier_attribution` Phase 5 review's cell-A "wheat" verdict for chromium-fourier-with-Step-1-patches rested on the assumption that Step 1 = libva-v4l2-request multi-planar work. Today's Phase 7 work shows that chromium-fourier's actual decode path **does not go through libva-v4l2-request**: it uses chromium's built-in V4L2 stateless decoder (`media/gpu/v4l2/v4l2_video_decoder_backend_stateless`, mature on ChromeOS), enabled via a chromium-side patch ("Step 2"). Our libva fix neither helps nor hurts that path.
That doesn't invalidate the cell-A vs cell-B 83 pp browser-CPU finding (HW decode IS happening for chromium-fourier; it's just via a different driver). But it does mean:
- The chromium-fourier-internal path doesn't depend on libva-v4l2-request fixes.
- Iteration 2's perf comparison should use mpv (vaapi-copy and, once fixed, vaapi) and Firefox as the libva-driven consumers, with chromium-fourier as a separate performance reference (not a libva validator).
- A vanilla Chromium build forced through libva would be a true validator; chromium-fourier is not.
## Artifacts
Live-session test outputs (preserved on ohm `/tmp/`):
- `/tmp/test-A-mpv.{stdout,stderr,pid}` — mpv vaapi vo=gpu (blue)
- `/tmp/test-B-firefox.{stdout,stderr,pid}` — Firefox live session
- `/tmp/firefox-vaapi-test/firefox_test_B.log.{moz_log,child-1.moz_log}` — MOZ_LOG (1 slice_header parse, 21k+ ProcessDecode via FFVPX)
- `/tmp/test-C-chromium.{stdout,stderr,pid}` — chromium-fourier 149 (no libva engagement)
Not pulled into the campaign repo: they are easy to regenerate and the per-process log files are large.
## Next iteration of Phase 4→6 (open)
1. Read libva-v4l2-request `src/surface.c::ExportSurfaceHandle` (or wherever vaExportSurfaceHandle lives in the fork's code).
2. Compare against FFmpeg `references/ffmpeg-kwiboo/libavcodec/v4l2_request.c` (or `hwcontext_v4l2request.c` per the b57fbbe head) for surface-export details — DRM_FORMAT, modifier, plane offsets, colorspace hint.
3. Diagnose the blue frame: instrument the export call to log what's returned. Cross-check against what mpv's `--msg-level=vd=v,vo=v` reports about the imported texture format/modifier.
4. Implement fix. Build. Real-VO test. Bunny via `--hwdec=vaapi --vo=gpu` = win.
5. Retest Firefox: should now stay engaged instead of falling back.
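Once step 3 logs the exported descriptor, a quick consistency check along the following lines could narrow down which root cause applies. This is a sketch under assumptions: the dictionary layout is hypothetical and simply flattens the `fourcc`, object `size` and `drm_format_modifier`, and per-plane `offset` and `pitch` fields of libva's `VADRMPRIMESurfaceDescriptor` for a single-object export; `check_nv12_export` is not an existing function in the fork.

```python
DRM_FORMAT_NV12 = int.from_bytes(b"NV12", "little")  # fourcc code, 0x3231564E
DRM_FORMAT_MOD_LINEAR = 0


def check_nv12_export(desc, width, height, expect_modifier=DRM_FORMAT_MOD_LINEAR):
    """Return a list of problems; an empty list means the descriptor is self-consistent."""
    problems = []
    if desc["fourcc"] != DRM_FORMAT_NV12:
        problems.append(f"fourcc is {desc['fourcc']:#x}, expected NV12")
    if desc["modifier"] != expect_modifier:
        problems.append(f"modifier is {desc['modifier']:#x}, expected {expect_modifier:#x}")

    y_off, uv_off = desc["offsets"]
    y_pitch, uv_pitch = desc["pitches"]
    if y_pitch < width:
        problems.append("Y pitch is smaller than the frame width")
    if uv_pitch < width:
        problems.append("UV pitch is smaller than the frame width")
    if uv_off < y_off + y_pitch * height:
        problems.append("UV plane starts inside the Y plane")
    if uv_off + uv_pitch * (height // 2) > desc["size"]:
        problems.append("planes extend past the end of the buffer")
    return problems


# Hypothetical logged descriptor for a 1920x1088 surface.
example = {
    "fourcc": DRM_FORMAT_NV12,
    "modifier": DRM_FORMAT_MOD_LINEAR,
    "size": 3_655_712,
    "offsets": (0, 1920 * 1088),
    "pitches": (1920, 1920),
}
print(check_nv12_export(example, 1920, 1088))
```

The check only tests internal consistency. It cannot tell whether the buffer contents really are linear, so a clean result still needs to be compared against what the kernel driver and mpv report.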