Files
libva-multiplanar/phase8_iteration1_close.md
marfrit 7e66abb72f Phase 8: iteration 1 close — deliverable lands for vaapi-copy + Firefox + vainfo
Iteration 1 close. Boolean-correctness criterion met for vainfo +
mpv vaapi-copy + Firefox 150 in live Plasma session. mpv vaapi
(DMA-BUF) engages and decodes correctly but stutters due to a
DMA-BUF lifecycle race — deferred to iteration 2.

Four load-bearing functional commits on the fork:
  9de1be3 — slice-header bit-parser populates DECODE_PARAMS bit-size
            fields hantro G1 reads into MMIO registers
  d41a4b9 — always submit SCALING_MATRIX + populate pps num_ref_idx
  37c0e72 — re-set OUTPUT format on resolution change (mpv probe-pattern)
  ac891a0 — honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle

Lessons distilled to memory:
  feedback_read_consumer_source_first (NEW)
  feedback_one_consumer_success_is_not_validation
  feedback_kernel_source_audit_for_uapi_contract (Sonnet Phase 5)
  feedback_stdout_is_data_too (NEW)
  feedback_no_premature_closure

Predecessor claims either reframed or invalidated:
- 'vainfo + mpv probes work end-to-end' was true at libva engagement
  layer, wrong at kernel-decode layer until 9de1be3
- 'chromium-fourier 149 = libva-multi-planar working' was wrong about
  mechanism; chromium-fourier uses chromium-internal V4L2 backend,
  not libva. The 83 pp browser-CPU finding from fourier_attribution
  cell A stands; the path attribution does not.
- 'patch 0011 sentinel test reliably detects decode failure' had a
  cache-coherency bug fixed in a047926.

Open questions carried to iteration 2: DMA-BUF EXPBUF refcount
lifecycle (load-bearing), WSI pitch alignment for non-64-aligned
widths, multi-resolution kernel-state corruption, plus six items
from Sonnet's Phase 5 review.

Iteration 2 opens with these as Phase 0 substrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:46:07 +00:00

9.9 KiB
Raw Permalink Blame History

Iteration 1 — Phase 8 close (2026-05-04)

Deliverable

A libva-v4l2-request fork (marfrit/libva-v4l2-request-fourier, master at commit c036a44) that engages the hantro G1/G2 hardware H.264 decoder on PineTab2 RK3568 end-to-end for VA-API consumers, producing real decoded NV12 pixel content.

Boolean-correctness criterion (sharpened mid-iteration)

Consumer engages the libva backend AND kernel produces real pixel output, verified on a real VO via operator inspection.

Consumer Result
vainfo ✓ enumerates 7 H.264 + 2 MPEG-2 profiles
mpv --hwdec=vaapi-copy --vo=null ✓ HW engaged, contract trace clean
mpv --hwdec=vaapi-copy --vo=gpu (live session) ✓ bunny renders (operator-confirmed)
mpv --hwdec=vaapi --vo=gpu (live session) ⚠ HW engaged, real pixels, but DMA-BUF lifecycle race causes visual stutter — deferred to iteration 2
Firefox 150 (live Plasma Wayland session) ✓ sustained HW decode (100+ slice_header parses, RDD memory-maps /dev/video1), bunny renders
chromium-fourier 149 n/a (uses chromium-internal V4L2 backend, bypasses libva entirely — not a libva validator)

Met for vaapi-copy + Firefox + vainfo. Partial for vaapi (DMA-BUF) — engagement and decode work; lifecycle race produces visible artifacts. iteration 2 target.

What landed (fork commits ahead of bootlin upstream)

c036a44 image: fully populate VAImageFormat per VAAPI spec for NV12
ac891a0 surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle
fdfee2d DEBUG: log SyncSurface RETURN to confirm clean exit before crash
21ae311 DEBUG: ENTER on CreateBuffer + BeginPicture for frame-1 crash narrowing
92f5b25 DEBUG: ENTER on buffer/image entry points to localize Firefox RDD crash
7da2b27 DEBUG: ENTER logging at libva entry points to trace Firefox call flow
6be3f3b h264: rate-limit V4L2 readback EACCES warning to once per process
a047926 DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback
2517a12 DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis
37c0e72 surface: re-set OUTPUT format on resolution change
9de1be3 h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields
d41a4b9 h264: always submit SCALING_MATRIX + populate pps num_ref_idx
74b3793 STUDY.md: pointer to libva-multiplanar campaign Phase 0
8594d74..c45fea9 (18 patches) Step 1 reconciliation from marfrit-packages PKGBUILD

The four load-bearing functional commits:

  • 9de1be3 — slice-header bit-parser populates DECODE_PARAMS bit-size fields hantro G1 reads into MMIO registers (the load-bearing decode fix; without it the kernel writes all zeros)
  • d41a4b9 — always submit SCALING_MATRIX + set the PPS_FLAG; populate pps num_ref_idx (decode prerequisites)
  • 37c0e72 — re-set OUTPUT format on resolution change (probe-then-real-resolution mpv pattern)
  • ac891a0 — honor VA_EXPORT_SURFACE_SEPARATE_LAYERS flag in vaExportSurfaceHandle (the load-bearing Firefox fix)

The DEBUG/diagnostic commits (a047926, 2517a12, 6be3f3b, 7da2b27, 92f5b25, 21ae311, fdfee2d, plus pre-existing 0010/0011/0014) stay until iteration 2's DMA-BUF lifecycle work needs a clean baseline. Per Phase 5 review: clean sweep before any "this is what we'd push to bootlin" snapshot.

Lessons distilled to memory

  • feedback_read_consumer_source_first.md (NEW) — when a specific consumer fails and others don't, read its source FIRST before generating speculative theories. Iteration 1 burned ~3h on Firefox crash/fallback hypotheses; the actual answer (VA_EXPORT_SURFACE_SEPARATE_LAYERS) was 30 lines of FFmpegVideoDecoder.cpp.
  • feedback_one_consumer_success_is_not_validation.md (added mid-iteration) — mpv tolerated several spec violations Firefox correctly rejected. The strictest consumer drives the boolean-correctness criterion, not the easiest-to-test one.
  • feedback_kernel_source_audit_for_uapi_contract.md (Sonnet's Phase 5 review crystallization) — when userspace fills V4L2/UAPI control fields, read the kernel driver source for which fields drive MMIO writes. Patch 0008's "empirical question — does hantro tolerate zero?" was the wrong resolution; reading hantro_g1_h264_dec.c gave the answer instantly.
  • feedback_stdout_is_data_too.md (NEW) — feedback_phase3_no_theatre.md says stdout doesn't replace deeper data acquisition; it doesn't say ignore stdout entirely. mpv's Dropped: N counter and V: timeline are the user-visible quality signal that strace/ftrace can't see.
  • feedback_no_premature_closure.md (added early-iteration, campaign-specific) — never frame "close the campaign" as a peer option to substantive work-remaining.

Predecessor claims that were either reframed or invalidated

Claim (source) Reframing in iteration 1
"vainfo + mpv probes work end-to-end" (STUDY.md, 2026-04-26) True at libva engagement layer; wrong at kernel-decode layer until commit 9de1be3 landed. The original test never inspected pixel content.
"chromium-fourier 149 = libva-multi-planar working" (fourier_attribution cell A) chromium-fourier uses its own internal V4L2 stateless decoder (chromium-side "Step 2" patches), bypassing libva entirely. Cell A's HW decode IS happening; the mechanism is chromium-internal, not libva. The 83 pp browser-CPU finding stands; the path attribution does not.
"Step 1 patches deliver working decode" (predecessor close-out) Step 1 engaged libva correctly but left the decode broken at the kernel layer because patch 0008's bit-size open question was unresolved. Iteration 1's commit 9de1be3 is the actual fix.
"patch 0011 sentinel test reliably detects decode failure" (predecessor patch comment) The sentinel test had a cache-coherency bug that consistently showed sentinel-survives even when the kernel had DMA-overwritten the buffer. Iteration 1's commit a047926 adds `msync(MS_SYNC

Phase 0 substrate carry-over to iteration 2

State that carries (re-verified in iteration 1):

  • ohm RK3568 hantro G1/G2 on /dev/video1 + /dev/media0, multi-planar V4L2 stateless, kernel 6.19.10
  • mainline V4L2 stateless H.264 control IDs (no out-of-tree kernel patches needed)
  • libva 2.23.0, libva-utils 2.22.0, mpv 0.41.0-3, Firefox 150.0.1, Mesa 26.0.5
  • Test clip: /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 (sha256 dcf8a7170fbd49bb...)
  • Build harness: meson setup --buildtype=release && ninja directly on ohm; marfrit/libva-v4l2-request-fourier deployed to /usr/lib/dri/v4l2_request_drv_video.so (sha256 of latest: per ohm pacman -Qi)
  • VIDIOC_G_EXT_CTRLS readback returns EACCES on this rig (kernel-side, not actionable from userspace; rate-limited warning per process)
  • mpv (vaapi-copy + vaapi DMA-BUF), vainfo, Firefox 150 in live Plasma Wayland session — all engage the libva backend and reach hantro

State that does NOT carry (to be re-acquired per feedback_replicate_baseline_first.md):

  • Performance numbers: drop counts, effective FPS, browser CPU%, scanout-plane residency. Iteration 1 saw 64/300 drops in 12s for vaapi-copy and 29/300 for vaapi (DMA-BUF), but these are not anchors — re-measure in iteration 2 with consistent rig.
  • Multi-resolution kernel-state behavior: iteration 1 surfaced corruption when Firefox played multi-video pages (Mozilla homepage); iteration 2 needs to re-measure with the deferred fix in scope.

Open questions for iteration 2:

  1. DMA-BUF EXPBUF refcount lifecycle (Task #39) — V4L2 re-queues a CAPTURE buffer while consumer holds EXPBUF'd fd → physical memory overwritten under consumer → mpv --vo=gpu --hwdec=vaapi stutter. Load-bearing for iteration 2.
  2. WSI pitch alignment (Task #40 part 1) — Mesa rejects pitch=864 (only 16-aligned, needs 64+); breaks 864-wide videos in Firefox. Need to round up reported pitch OR set a different DRM_FORMAT_MOD.
  3. Multi-resolution kernel-state (Task #40 part 2) — after a 864→1920 sequence, kernel CAPTURE format reverts to 48×48; our LAST_OUTPUT_WIDTH/HEIGHT cache misses the corruption. Need REQBUFS(0)+S_FMT resync on detecting format mismatch.
  4. VIDIOC_G_EXT_CTRLS EACCES probe (Sonnet review 7.1) — move readback before MEDIA_REQUEST_IOC_QUEUE and confirm the timing. If EACCES persists, file a kernel issue.
  5. Firefox seek-to-non-IDR (Sonnet review 7.5) — verify Firefox handles a stream where the first played frame isn't an IDR (mid-stream seek).
  6. SET_FORMAT_OF_OUTPUT_ONCE removal completeness (Sonnet review 7.3) — the global was replaced with LAST_OUTPUT_WIDTH/HEIGHT tracking; document the architectural state and verify multi-context use.
  7. // HACK block in surface.c (Sonnet review 7.4) — refactor when codec multi-codec support (MPEG-2) is exercised.
  8. num_ref_idx_l0/l1_default_active_minus1 source for multi-slice (Sonnet review 7.2) — only matters for multi-slice streams with explicit per-slice override. Defer until a real test stream surfaces it.

Phase 1 lock for iteration 1: closed

Iteration 1's locked Phase 1 criterion ("consumer engages backend AND produces real pixel output, verified on real VO") is met for vainfo + mpv vaapi-copy + Firefox. Met-with-caveat for mpv vaapi (DMA-BUF) — engagement + decode correct, lifecycle bug deferred. The deliverable is real and operator-validated; iteration 2 hardens it.

Bootlin upstream outlook (per feedback_no_upstream.md, no PRs unless asked)

The Step 1 patch series + the slice-header parser + SEPARATE_LAYERS honor + LAST_OUTPUT_WIDTH tracking together represent ~2000 lines of changes against bootlin's dormant upstream. The DEBUG patches (0010, 0011, 0014, 3F readback, surface-export instrumentation) need to come out before any upstream-submission snapshot. Refactoring the // HACK block, fixing the DMA-BUF lifecycle, and adding multi-resolution support are required for upstreamability. Iteration 2 → 3 → eventual upstream is the natural arc, gated by operator decision per feedback_no_upstream.md.