marfrit a052d5d7cd Phase 7 verification: vaapi-copy works, DMA-BUF surface-export bug surfaces
Live Plasma 6 Wayland session retest of all 4 target consumers
against fork commit 6be3f3b.

Results:
- vainfo: ✓ no regression (7 H.264 + 2 MPEG-2 profiles)
- mpv --hwdec=vaapi-copy --vo=gpu: ✓ bunny (Phase 6 success
  re-confirmed in live session)
- mpv --hwdec=vaapi --vo=gpu: ⚠ solid blue frame
- Firefox 150 (live session): ⚠ engages libva for 1 frame
  (gets real pixels per slice_header parse log), then falls
  back to FFmpeg(FFVPX) software for sustained playback
- chromium-fourier 149: ✓ no regression but ORTHOGONAL — uses
  chromium's own V4L2 stateless decoder, bypasses libva entirely

Tests A (mpv vaapi) and B (Firefox) converge on the same
DMA-BUF surface-export bug: vaExportSurfaceHandle in libva-
v4l2-request produces a DMA-BUF that Mesa/Firefox can't render
correctly — likely wrong DRM_FORMAT modifier or plane offset/
stride mismatch with hantro's tile-padded NV12 (sizeimage=
3,655,712 vs vanilla 3,133,440 for 1920x1088).

Also disambiguated: chromium-fourier 149's decode path does
NOT go through libva-v4l2-request — uses chromium's own V4L2
backend (Step-2 chromium-side patch). Reframes the 2026-05-03
fourier_attribution cell-A wheat verdict's path validation.

Boolean-correctness criterion (sharpened): met for vaapi-copy,
not for vaapi (DMA-BUF). Phase 1 lock should wait until both
paths work. Iteration 2 (perf) is gated on the DMA-BUF path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:25:16 +00:00


Phase 7 verification — 2026-05-04

Retest of all four target consumers (vainfo, mpv, Firefox 150, chromium-fourier 149) in a live Plasma 6 Wayland session on ohm, against the campaign deliverable as of fork commit 6be3f3b. That state includes d41a4b9 (SCALING_MATRIX/num_ref_idx), 9de1be3 (slice-header parser) and a047926/6be3f3b (observability hardening).

Summary

| Consumer | Path | Engages our libva? | Real pixels? | Verdict |
| --- | --- | --- | --- | --- |
| vainfo | enumerate-only | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles | n/a | ✓ no regression |
| mpv --hwdec=vaapi-copy (vo=null) | vaapi-copy → CPU | ✓ sustained (10/10 frames in smoke run) | ✓ ftrace vb2_buf_done bytesused=3655712; ymin/ymax variance > 0 mid-frame | |
| mpv --hwdec=vaapi-copy (vo=gpu, live) | vaapi-copy → CPU → GL upload | ✓ sustained | ✓ bunny visible (operator-confirmed) | ✓ Phase 6 success |
| mpv --hwdec=vaapi (vo=gpu, live) | vaapi → DMA-BUF → GL import | ✓ engages | solid blue frame | ⚠ surface-export bug |
| Firefox 150 (live session) | RDD → vaapi → DMA-BUF | ✓ 1 frame | real pixels on frame 0 (slice_header parse logged once), then falls back to SW (lsof: 0 /dev/video1 holders during sustained playback; 21,522 ProcessDecode via FFmpeg(FFVPX) PDM) | ⚠ same DMA-BUF bug → SW fallback |
| chromium-fourier 149 | chromium-internal V4L2 stateless | ✗ uses its own media/gpu/v4l2/v4l2_video_decoder_backend_stateless path; bypasses libva entirely | ✓ bunny visible, chrome://gpu = HW | ✓ no regression, but orthogonal to the libva fix |

Boolean-correctness criterion (sharpened)

The Phase 1 criterion as amended in phase0_evidence/2026-05-04-kernel-trace/findings.md:

consumer engages backend AND kernel produces decoded pixel output (verified by visual inspection on a real VO, not by sentinel test)

Met for the vaapi-copy path: mpv --hwdec=vaapi-copy --vo=gpu shows the bunny. Not met for the vaapi (DMA-BUF) path: mpv shows a solid blue frame, and Firefox falls back to software decode.

Surface-export bug (the next iteration)

Test A (mpv --hwdec=vaapi) and Test B (Firefox) converge on the same failure mode: the DMA-BUF produced by the libva backend's vaExportSurfaceHandle cannot be rendered correctly by downstream consumers.

  • mpv --hwdec=vaapi --vo=gpu imports the DMA-BUF into GL as a texture and renders it. The result is solid blue. The kernel-side decode is good (the vaapi-copy path shows the bunny from the same decode), so the corruption is in the export → import → render layer.
  • Firefox 150 HW-decodes the first frame (presumably receiving the same wrong DMA-BUF). It then apparently concludes that the path does not render correctly and falls back to FFmpeg(FFVPX) software decode for sustained playback.
  • chromium-fourier 149 uses an entirely different code path (chromium-internal V4L2 stateless decoder, bypassing libva), so this bug doesn't affect it. The 2026-05-03 fourier_attribution cell A success was via that internal path, not via libva-v4l2-request — a fact this Phase 7 work disambiguated for the first time.

Likely root causes (in order to investigate):

  1. Wrong DRM format or modifier: hantro G1 NV12 has sizeimage=3,655,712 for 1920×1088, versus 3,133,440 for vanilla NV12 (+522,272 bytes, assumed here to be tile padding). Suppose vaExportSurfaceHandle reports DRM_FORMAT_NV12 with DRM_FORMAT_MOD_LINEAR when the buffer actually needs a hantro-specific tile/coded-format modifier. Mesa will then read the buffer at the wrong byte offsets → garbage chroma → blue-tinted output.
  2. Wrong plane offset/stride: NV12 is a Y plane followed by an interleaved UV plane. If our export reports offset[1] (the UV plane start) or the pitches wrongly, Mesa reads chroma from the wrong region.
  3. Missing colorspace hint: DMA-BUF doesn't carry colorspace per se, but the consumer needs YUV→RGB conversion matrix info. mpv typically infers BT.709 limited-range for HD content, so this is a less likely culprit but should be ruled out.
  4. Missing surface-side cache flush: this would resemble the patch-0011 cache bug. If our export doesn't ensure the DMA-BUF is cache-coherent at export time, the GL import might see stale data. Hantro's CMA buffers are probably uncached/coherent on ARM64 by default, though, so this is less likely.

Surface-handle export and DMA-BUF modifier negotiation are explicitly in the campaign's locked scope per phase0_findings.md. This is iteration 1's Phase 4→6 cycle #2.

What this means for Phase 1 lock + iteration 2 (performance)

  • Phase 1 boolean-correctness criterion is partially met — vaapi-copy works end-to-end, vaapi (DMA-BUF) does not. Phase 1 lock should not happen until both paths work, OR the criterion should be sharpened further to specify which path counts.
  • Iteration 2 (performance: SW baseline vs HW with libva-multiplanar) is gated on the DMA-BUF path working, because real-world consumers (Firefox especially, but also production mpv configs) prefer DMA-BUF over the system-memory copy path. vaapi-copy involves an extra CPU-side copy that adds latency and CPU load — using it as the perf comparison would understate the HW benefit.
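A rough estimate shows the size of that copy. It assumes vaapi-copy moves one full coded 1920×1088 NV12 frame out of the decoder buffer per decoded frame, and it takes 30 fps purely as an illustrative rate. Neither figure was measured in this run.

$$
B_{\text{frame}} = 1920 \times 1088 \times \tfrac{3}{2} = 3{,}133{,}440\ \text{bytes},
\qquad
B_{\text{frame}} \times 30\ \text{s}^{-1} = 94{,}003{,}200\ \text{bytes/s} \approx 94\ \text{MB/s}
$$

Under those assumptions, that is about 94 MB/s of CPU-driven copying that a zero-copy DMA-BUF import avoids, before counting the subsequent GL upload on the vaapi-copy path.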

Phase 6 follow-up: chromium-fourier path is orthogonal

The 2026-05-03 fourier_attribution Phase 5 review's cell-A "wheat" verdict for chromium-fourier-with-Step-1-patches rested on the assumption that Step 1 = libva-v4l2-request multi-planar work. Today's Phase 7 work shows that chromium-fourier's actual decode path does not go through libva-v4l2-request. Instead, it uses chromium's built-in V4L2 stateless decoder (the ChromeOS-mature media/gpu/v4l2/v4l2_video_decoder_backend_stateless), enabled via a chromium-side patch ("Step 2"). Our libva fix neither helps nor hurts that path.

That doesn't invalidate the cell-A vs cell-B 83 pp browser-CPU finding (HW decode IS happening for chromium-fourier; it's just via a different driver). But it does mean:

  • The chromium-fourier-internal path doesn't depend on libva-v4l2-request fixes.
  • Iteration 2's perf comparison should use mpv (vaapi-copy and once fixed, vaapi) and Firefox as the libva-driven consumers, with chromium-fourier as a separate performance reference (not a libva validator).
  • A vanilla Chromium build forced through libva would be a true validator; chromium-fourier is not.

Artifacts

Live-session test outputs (preserved in /tmp/ on ohm):

  • /tmp/test-A-mpv.{stdout,stderr,pid} — mpv vaapi vo=gpu (blue)
  • /tmp/test-B-firefox.{stdout,stderr,pid} — Firefox live session
  • /tmp/firefox-vaapi-test/firefox_test_B.log.{moz_log,child-1.moz_log} — MOZ_LOG (1 slice_header parse, 21k+ ProcessDecode via FFVPX)
  • /tmp/test-C-chromium.{stdout,stderr,pid} — chromium-fourier 149 (no libva engagement)

Not pulled into the campaign repo: they are easy to regenerate, and the per-process log files are large.

Next iteration of Phase 4→6 (open)

  1. Read libva-v4l2-request src/surface.c::ExportSurfaceHandle (or wherever vaExportSurfaceHandle lives in the fork's code).
  2. Compare against FFmpeg references/ffmpeg-kwiboo/libavcodec/v4l2_request.c (or hwcontext_v4l2request.c per the b57fbbe head) for surface-export details — DRM_FORMAT, modifier, plane offsets, colorspace hint.
  3. Diagnose the blue frame: instrument the export call to log what it returns. Cross-check against what mpv reports about the imported texture format/modifier with --msg-level=vd=v,vo=v.
  4. Implement the fix, build, and test on a real VO. Win condition: the bunny via --hwdec=vaapi --vo=gpu.
  5. Retest Firefox: should now stay engaged instead of falling back.