libva-multiplanar/phase0_findings.md
marfrit 365764fffb Phase 0 amendment: hantro writes zeros, sentinel test cache-buggy
Re-baselined libva-v4l2-request decode path with kernel-side
observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug)
and visual disambiguator (mpv --vo=gpu in operator's live Plasma
session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame:
   ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088
   + hantro tile padding). dmesg completely silent — no
   hantro/vpu/decode/error/warn messages.

2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a
   solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE.
   Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would
   render cream). Both colors are consistent with the kernel
   writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same
   buffer GL-imported as DMA-BUF with different colorspace → blue).

3. Patch 0011 sentinel test has a cache-coherency bug: writes
   0xab via cached surface_object->destination_map[0] mmap, never
   invalidates cache before readback. So the readback always
   shows the stale sentinel even when kernel DMA-overwrote it
   with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly
   invalidate cache and see the real (zero) contents.

This corrects the previous Phase 0 verdicts twice in one day:
- Original commit f15ba8b ("the 2026-04-26 picture holds") was
  wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel
  output, sentinel survives") was half right: kernel does write,
  writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is
  silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis:
"All zeros → kernel did write 0x00s (overwriting our sentinel),
and the apparent 'no picture' output is the kernel-side decode
actually producing zeros (e.g. parser rejected the bitstream)."
That hypothesis was right; we just couldn't confirm it via the
sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't
engage hantro" — it's "hantro engages but its parser produces
zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS
readback to verify writes stick, diff against FFmpeg's
v4l2_request_h264.c (proven working on hantro), verify SPS
completeness, resolve patch 0008's slice_header bit_size open
question, dyndbg the hantro module, etc. Phase 1 boolean-
correctness criterion needs a working pixel-content check before
lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:39:42 +00:00

# Phase 0 — libva-multiplanar
This campaign's substrate, locked research question, and pre-Phase-1 inventory work. Adapted from the prior `STUDY.md` in the fork (`libva-v4l2-request-fourier/STUDY.md` as of commit `e0acc33`, which has now been replaced with a pointer to this file) and re-framed against the 8(+1)-phase loop discipline.
## Campaign-contained data discipline (governing rule)
Per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md) Phase 0 + [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md):
This campaign acquires its own measurement data in-session. Predecessor work (the fork's prior `STUDY.md`, `ohm_gl_fix/phase6/step1/` audit, `fourier_attribution` cell-A vs cell-B numbers) is documented for **state** carry-over — file:line pointers, contract analyses, build recipes, kernel-UAPI rename catalog, the V4L2-request multi-planar API map — but its measurement claims (e.g. "vainfo enumerates seven H.264 profiles cleanly", "Brave wall is chromeos pipeline as of 2026-04-26") are **reference history** until re-verified in-session. The 2026-04-26 failure-mode finding may have drifted; re-establish before relying on it.
## Research question (LOCKED 2026-05-04)
> **"Make libva-v4l2-request accepted at all by VA-API consumers on PineTab2 RK3568, providing access to the hantro G1/G2 hardware decoder for H.264 and MPEG-2, end-to-end. Performance metrics are explicitly deferred to a follow-up iteration."**
Pass/fail is **boolean correctness**, not throughput:
- Does the consumer dlopen `v4l2_request_drv_video.so`?
- Does it complete the VA-API surface lifecycle calls without falling back to SW?
- Does an actual V4L2 request-API ioctl sequence (`VIDIOC_QBUF` with attached SPS/PPS controls + a request fd → `MEDIA_REQUEST_IOC_QUEUE` → `VIDIOC_DQBUF` of a populated CAPTURE buffer) land on hantro?
If yes → done for the iteration. Frame-rate / CPU% / drops measurement is a separate iteration whose binding cells will be locked separately.
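
The third criterion can be checked mechanically against a captured ioctl trace. The following is a minimal sketch, not the campaign's tooling: it assumes trace lines shaped roughly like strace output (`ioctl(fd, NAME, ...) = ret`), and the helper names `parse_ioctls` and `request_path_ok` are hypothetical. It verifies only that no ioctl in the trace failed and that the three steps appear in order.

```python
import re

# Assumed line shape, loosely modelled on strace output:
#   ioctl(7, VIDIOC_QBUF, ...) = 0
#   ioctl(9, MEDIA_REQUEST_IOC_QUEUE) = -1 EINVAL (Invalid argument)
IOCTL_RE = re.compile(r"ioctl\(\d+,\s*([A-Z0-9_]+).*\)\s*=\s*(-?\d+)")

# Per-frame order taken from the criterion above.
REQUIRED_ORDER = ["VIDIOC_QBUF", "MEDIA_REQUEST_IOC_QUEUE", "VIDIOC_DQBUF"]


def parse_ioctls(lines):
    """Return (name, return_value) for every line that looks like an ioctl."""
    calls = []
    for line in lines:
        match = IOCTL_RE.search(line)
        if match:
            calls.append((match.group(1), int(match.group(2))))
    return calls


def request_path_ok(lines):
    """True if no ioctl failed and REQUIRED_ORDER appears as an ordered subsequence."""
    calls = parse_ioctls(lines)
    if any(ret < 0 for _, ret in calls):
        return False
    # Membership tests on an iterator consume it, which enforces ordering.
    remaining = iter(name for name, _ in calls)
    return all(step in remaining for step in REQUIRED_ORDER)
```

A passing result says nothing about what the returned CAPTURE buffer contains, so a check like this covers the ioctl layer only and needs a separate pixel-content check alongside it.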
## Mechanism the question targets
Hantro VPU on RK3568 exposes its decode interface as a **multi-planar V4L2 stateless** device (`/dev/video1`, `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE` + `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE`, request-API for control submission). VA-API consumers (mpv, Firefox via libavcodec, Chromium/Brave via its own decoder, vainfo as smoke test) speak libva, not V4L2 directly. The bridge they expect is `libva-v4l2-request` — a libva backend that translates `vaCreateSurfaces2` / `vaBeginPicture` / `vaRenderPicture` / `vaEndPicture` into the V4L2-stateless protocol.
Bootlin's upstream `libva-v4l2-request` (dormant since 2021) was written for **single-plane** sunxi-cedrus. None of the other public forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) ship multi-planar end-to-end. Collabora's strategic replacement `cros-codecs` is Rust + bypasses libva and is not shipping soon — leaving a hole that this campaign closes.
External pointers:
- Mozilla bug 1833354 / 1965646 — Firefox HW decode on RK3566/RK3588 explicitly requires `libva-v4l2-request`, not `v4l2-m2m`.
- Bootlin upstream (dormant): <https://github.com/bootlin/libva-v4l2-request>
## Predecessor close-out summary (state carry-over, not data)
### From `~/src/ohm_gl_fix/phase6/step1/` (closed 2026-05-02, contract-correct port snapshot)
Patches `0001..0018` against an early multi-planar branch of `libva-v4l2-request`, plus the audit at `audit_0008_decode_params_2026-05-01.md`. Most relevant for this campaign:
- `0008-h264-decode-params-correctness.patch` — `V4L2_CTRL_TYPE_H264_DECODE_PARAMS` (`DECODE_PARAMS`) shape verified against `hantro_h264.c` kernel source.
- `0012-h264-omit-scaling-matrix-frame-based.patch` — contract-correct gating of `SCALING_MATRIX` control by `matrix_set` rather than decode mode (one of the canonical examples of "Phase-3-derived implementation considered harmful" in `feedback_dev_process.md`).
- vainfo enumerates H.264 profiles cleanly with these patches against `chromium-fourier 149` binary, confirmed by `fourier_attribution` cell-A (54 % browser CPU, fps 24.0). **State**: the patches map cleanly onto a multi-planar libva-v4l2-request and represent a correctness baseline.
The Step 1 patches must be reconciled against the libva-v4l2-request-fourier `master` (12 commits ahead of bootlin tip). Either fold-in (preferred), or supersede the fork's WIP commits with the audit-anchored Step 1 set, or document why a divergent path makes sense.
### From `libva-v4l2-request-fourier/` (the fork, now sub-tree of this campaign)
Carry-over **state** (re-verify before treating as current):
- 12 commits ahead of bootlin `a3c2476`. Six "build cleanly against current kernel UAPI" commits (`V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` rename; missing `utils.h` include; HEVC strip; `h264-ctrls.h` shim with `V4L2_CID_MPEG_VIDEO_H264_*` → `V4L2_CID_STATELESS_H264_*` aliases; `struct v4l2_ctrl_h264_slice_params` shape updates; `tiled_yuv.S` aarch64 stub).
- Five probe + control flow fix commits (`src/video.c` NV12 multi-plane format entry; `src/surface.c` MPLANE probe fallback; eager probe in `RequestInit`; `src/context.c` rename pass; **WIP**: `STREAMON` defer in `RequestCreateContext` — the V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THEN STREAMON; deferring lets `vaCreateContext` succeed but proper sequencing is the next phase).
- `src/utils.c` diagnostic logging tee to `/tmp/libva-fourier.log` (will revert before any final).
- Recent (2026-05-02) WIP entry-point tracing across `surface.c`, `image.c`, `buffer.c`, `context.c` for Brave's libva surface stack instrumentation.
The build artifact is a ~265 KB `.so`. `vainfo` + `mpv --hwdec=vaapi` enumerated profiles end-to-end as of 2026-04-26.
### From `~/src/fourier_attribution/` (closed 2026-05-04 with Phase 5 review)
- Cell A (chromium-fourier 149 with Step 1 + Step 2 patches): `browser_cpu_median = 54.4 %`, `effective_fps = 24.0`, `drops_60s = 12`. **The libva-multi-planar path is engaged here** — this is what end-to-end success looks like at the workload level.
- Cell B (stock Brave 1.89 / Chromium 147): `browser_cpu_median = 137 %`, `fps = 23.18`, `drops_60s = 16`. **Brave's libva path falls back to SW** because of the chromeos-pipeline gating documented in `STUDY.md` § "Brave's failure is not in our driver".
- The 83 pp browser-CPU gap is the campaign-relevant signal that "multi-planar libva is the binding decode-side enabler" — but Sonnet's Phase 5 review correctly flagged this is confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149) was identified as the cheapest disambiguator.
**Phase 7 verification gate (LOCKED 2026-05-04)**: when this campaign's Phase 6 lands a working multi-planar libva-v4l2-request, Phase 7 will retest `fourier_attribution` cell B (Brave) and the deferred cell E (vanilla Chromium 149) on this campaign's deliverable — that retroactively answers the chromium-fourier wheat verdict's confound.
### From `~/src/kwin_overlay_subsurface/` and `~/src/x11-session-research/` (orthogonal)
The NV12-scanout-plane gap on rockchip-drm RK3568 (Plane 39 the only NV12-LINEAR plane; Plane 45 advertises zero NV12 modifiers; X server doesn't program either with NV12 regardless of session server) is **orthogonal** to this campaign. libva is decode-side; the scanout gap is display-side. Don't confuse them. This campaign's deliverable does not unstick that. The display-side absorbs the NV12 → RGB GL-composite step in KWin (kept cheap by `kwin-fourier`'s `watchDmaBuf` fix per the `fourier_attribution` cell-D evidence).
## Current ohm state (carry-over from `fourier_attribution`)
- Kernel: `6.19.10-danctnix1-1-pinetab2`
- Mesa: `1:26.0.5-1`
- Plasma 6.6.4 Wayland session
- `qt6-base-fourier 1:6.11.0-3`, `qt6-xcb-private-headers-fourier 1:6.11.0-3`, `kwin-fourier 1:6.6.4-3` installed (cell-A package state restored end of `fourier_attribution`)
- `chromium-fourier 149` binary at `/tmp/chromium-ohm-gl-fix-step2/chrome` (Step 1 + Step 2 engaged)
- `brave-bin 1:1.89.145-1` (Chromium 147 base, control browser)
- governor `performance`, baloo disabled
- hantro on `/dev/video1`, `/dev/media0` — multi-planar V4L2 stateless
The fork tree at `~/src/libva-multiplanar/libva-v4l2-request-fourier/` is on commit `e0acc33` (master) with no uncommitted changes. Build harness: `meson setup` + `ninja` directly on ohm (small library, no distcc per operator instruction).
## In-scope (LOCKED 2026-05-04)
- libva-v4l2-request **backend only**. Libva front-end (the API library) is mature and supports multi-planar; out of scope for this campaign. Revisit only if Phase 2 source-read surfaces a specific front-end gap.
- Hardware target: **ohm RK3568 hantro G1/G2 first iteration only**. Other devices (fresnel RK3399 hantro G1, ampere/boltzmann RK3588 VDPU381) are explicit future iterations after the ohm path is solid. RK3588 in particular needs VDPU381 driver code that doesn't exist in the fork yet.
- Codecs: H.264 first; MPEG-2 next. HEVC explicitly out (kernel CIDs renamed, RK3566 has no HW HEVC, current fork stripped HEVC per the build-cleanly stack).
- Test consumers (LOCKED 2026-05-04):
  - `vainfo` — smoke test, enumerates profiles + entrypoints
  - `mpv --hwdec=vaapi` — most directly testable end-to-end consumer for HW decode validation
  - Firefox via `media.ffmpeg.vaapi.enabled` + `LIBVA_DRIVER_NAME=v4l2_request` — primary "real consumer" target per Mozilla bug 1965646
  - chromium-fourier 149 — regression check (cell A confirmed working; verify still works under any fork changes)
  - Brave 1.89 — *deferred* test consumer; the chromeos-pipeline gating documented in `STUDY.md` is upstream to libva and probably not fixable from this campaign's seat. Test it for completeness; don't gate Phase 7 on it.
## Out-of-scope (LOCKED 2026-05-04)
- Front-end libva.
- Other hardware (fresnel, ampere, boltzmann) — separate iterations.
- HEVC, VP8, VP9, AV1.
- Userspace bitstream parsing (kernel V4L2-stateless does this; library forwards parameters).
- HEVC RFC reference frame compression (Rockchip-specific, kernel disabled on ohm).
- Performance metrics. **Explicitly deferred to a follow-up iteration.** Do not lock Phase 1 binding cells around CPU%, fps, drops_60s, or panfrost freq.
- KWin / Wayland scanout-plane work (orthogonal; different campaigns closed).
- `cros-codecs` Rust replacement (out per `user_stance_rust.md`).
- Bootlin / Collabora upstreaming. Per `feedback_no_upstream.md`: no PRs, no MRs, no bug reports unless explicitly tasked. Bootlin upstream is dormant; the question of engaging Hans de Goede / Jernej Škrabec / Collabora when this campaign reaches a defensible state is a separate explicit decision.
## Open questions before Phase 1 lock
1. **In-session re-verification of the 2026-04-26 failure-mode finding** — is it still "vainfo + mpv probes work end-to-end; Brave wall is chromeos pipeline upstream of libva"? Phase 0 inventory must confirm or update before binding cells lock.
2. **Step 1 reconciliation** — fold-in `ohm_gl_fix/phase6/step1/0001..0018` to libva-v4l2-request-fourier `master`, supersede fork WIP, or run a divergent branch? Phase 2 source-read should make the call before Phase 4 plan.
3. **Firefox configuration** — does `media.ffmpeg.vaapi.enabled=true` + `LIBVA_DRIVER_NAME=v4l2_request` + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` work as documented? Phase 0 inventory item.
4. **`STREAMON` ordering on hantro** — STUDY.md flags this as the load-bearing pending fix: "set both queue formats up front, queue the first buffer with controls attached, then `STREAMON` both queues". Verify against `gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c` and `FFmpeg/libavcodec/v4l2_request*` — both proven working on the same hardware. This is Phase 6 implementation work but the audit needs to land in Phase 2.
5. **`V4L2_EVENT_SOURCE_CHANGE` handling** — needed for resolution-change streams; not strictly required for the fixed-resolution `bbb_1080p30_h264.mp4` test clip. Defer to Phase 6+ iteration if first-frame decode succeeds without it.
## Open questions resolved in this exchange
- *libva fork scope*: backend only.
- *Hardware target lock*: ohm first; others future iterations.
- *Test corpus*: vainfo, mpv `--hwdec=vaapi`, Firefox VAAPI, chromium-fourier 149, Brave 1.89 (deferred).
- *Phase 1 success criterion*: boolean correctness ("libva accepted + providing access to hardware decoder"). Performance metrics deferred.
- *Cell E folded into Phase 7 verification gate*: confirmed.
- *distcc*: no — small library, builds on ohm directly.
- *Gitea repo for campaign root*: create `marfrit/libva-multiplanar` empty now; don't push until something publish-worthy lands.
## What Phase 0 will deliver (regardless of detail)
1. **Re-verify the failure-mode finding in-session.** Build the current fork on ohm, install to `/usr/lib/dri/v4l2_request_drv_video.so`, run `vainfo` and `mpv --hwdec=vaapi` on `bbb_1080p30_h264.mp4`. Capture syscall/strace + V4L2 ioctl trace. Compare against the 2026-04-26 STUDY.md picture; loop back to Phase 2 if rig differs.
2. **Reconcile Step 1 (`ohm_gl_fix/phase6/step1/0001..0018`) against fork master.** Map each Step 1 patch to a fork commit (or to a missing slot). Decide fold-in vs supersede vs branch-and-keep.
3. **Verify Firefox configuration end-to-end.** Stock Firefox + `media.ffmpeg.vaapi.enabled=true` + LIBVA env vars — does it engage our backend, fall back to SW, or fail to load? Phase 0 inventory item.
4. **Phase 0 baseline anchor (in-session N=3-equivalent).** For the boolean-success criterion, the "anchor" is more like a contract trace than a metric distribution: capture the V4L2 request-API ioctl sequence on a known-working consumer (chromium-fourier 149 binary on ohm — already engages this libva path per cell A) for 1 frame's decode, in-session, before any fork modifications. That trace is the spec the Phase 6 implementation must reproduce.
## In-session re-verification result (2026-05-04)
Items #1 and #4 above executed against the substrate that was actually deployed on ohm. Full write-up: [`phase0_evidence/2026-05-04/findings.md`](phase0_evidence/2026-05-04/findings.md). Headline:
- **Item #1 — 2026-04-26 picture HOLDS** at boolean-correctness level. vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; `mpv --hwdec=vaapi-copy` decodes 68 H.264 frames end-to-end through the full V4L2-stateless contract on hantro (`/dev/video1` + `/dev/media0`) with zero EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring Phase 2 loopback.
- **Item #4 — contract trace captured** for mpv vaapi-copy. The chromium-fourier-as-spec-source plan from Phase 0 substrate is no longer blocking — mpv's trace is a clean reproducible substitute (same backend, same per-frame lifecycle: `MEDIA_REQUEST_IOC_REINIT` → per-request `S_EXT_CTRLS` → `QBUF` + `MEDIA_REQUEST_IOC_QUEUE` → `DQBUF`). Chromium trace remains worth capturing as cross-validation but isn't needed to lock Phase 1.
- **Substrate inventory shift**: the installed `/usr/lib/dri/v4l2_request_drv_video.so` on ohm is **not** built from `libva-v4l2-request-fourier/master`. It's `libva-v4l2-request-ohm-gl-fix 1.0.0.r0.ga3c2476-2`, built on **boltzmann** 2026-05-02 from `~/src/marfrit-packages/arch/libva-v4l2-request-ohm-gl-fix/PKGBUILD` (which applies `fourier-local.patch` + Step 1 patches `0001..0018` on top of bootlin tarball `a3c2476`). The git fork at `e8c3937` is a *pre-Step-1* substrate — it has the multi-planar wedge + HEVC strip + UAPI shim + STREAMON-defer WIP, but lacks `0002..0018` (request_pool, conditional PRED_WEIGHTS, ANNEX_B start codes, fill DECODE_PARAMS from VAAPI, no CAPTURE S_FMT, SCALING_MATRIX matrix_set predicate, level_idc, POC sentinel strip, DPB picnum, P/B-frame flags). **Rebuilding from the fork as-is would be a regression** — Phase 0 deliverable #2 (Step 1 reconciliation) is upstream of any "build from fork and install" step. The "Build + install on ohm" section below describes the *target* recipe once reconciliation lands; the *current* binary on ohm matches its build chain via the marfrit-packages PKGBUILD on boltzmann.
- **Rig caveat**: `mpv --hwdec=vaapi --vo=null` fails with `Could not create device.` because vo=null doesn't provide a DRM context to vaapi proper — this is mpv-side, not libva. Headless test rigs (SSH session) must use `--hwdec=vaapi-copy` or run inside a real Plasma/X session.
Phase 0 deliverables status: #1 ✓, #2 ✓ (Step 1 reconciled into fork master and pushed; see `libva-v4l2-request-fourier/` git log), #3 ⚠ partial (see below), #4 ✓.
## Firefox engagement test (Phase 0 deliverable #3, 2026-05-04)
### Headless run (Xvfb, SSH-driven)
Stock Firefox 150.0.1 + `media.ffmpeg.vaapi.enabled=true` + `LIBVA_DRIVER_NAME=v4l2_request` env, executed under Xvfb on ohm. Full write-up: [`phase0_evidence/2026-05-04-firefox/findings.md`](phase0_evidence/2026-05-04-firefox/findings.md).
**Result**: Firefox's RDD process dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability probe then immediately closes them; never reaches `vaInitialize`. Gfx-environment platform-fitness check rejects VAAPI under Xvfb's software-framebuffer-with-no-DRI rig. Not a libva-side fault. Re-test in live session needed.
### Live Plasma Wayland session run — and follow-up kernel-side disambiguation
Same Firefox profile + LIBVA env, executed inside the operator's active Plasma 6 Wayland session (XDG_SESSION_TYPE=wayland, XDG_RUNTIME_DIR=/run/user/1001). Full write-up: [`phase0_evidence/2026-05-04-firefox-live/findings.md`](phase0_evidence/2026-05-04-firefox-live/findings.md).
**Result, two-layer**:
| Layer | Verdict |
|---|---|
| libva engagement (driver dlopen, contract lifecycle) | ✓ — clean. Single-frame attempt, all V4L2-stateless ioctls (REQUEST_ALLOC → S_FMT → CREATE_BUFS → STREAMON → S_EXT_CTRLS → QBUF + REQUEST_QUEUE → DQBUF + EXPBUF) succeed, no EINVAL on the request-API path. |
| **Kernel produces decoded pixel output** | **✗ — hantro returns CAPTURE buffer with patch-0011 sentinel `0xab` unchanged**. |
| Consumer reaction | Firefox detected the failed first frame and silently fell back to SW decode. User-visible: BBB plays normally for 5+ minutes via SW (operator-confirmed at t=337s playback time). |
**Cross-checked against the prior mpv vaapi-copy run**: re-examined `phase0_evidence/2026-05-04/mpv_vaapi_copy_2026-05-04.stderr`: **68 of 68 mpv CAPTURE buffers show the same sentinel-survives pattern**. mpv's `--vo=null` consumed all 68 sentinel buffers as if they were valid NV12 frames; the failure was invisible. OUTPUT bytes are byte-for-byte identical between mpv and Firefox (same IDR slice, both via libavcodec).
### Implication: prior Phase 0 verdict (commit `f15ba8b`) was wrong
The 2026-04-26 STUDY claim of "vainfo + mpv probes work end-to-end" — repeated in the prior Phase 0 in-session re-verification commit — held only at the **libva-engagement** layer. At the **kernel-decode** layer, hantro produces no decoded output for either consumer. The patch-0011 sentinel test (in the deployed Step 1 build) was authored to detect exactly this; the predecessor close-out apparently didn't grep for it, and the contract-trace cleanliness was mistaken for end-to-end success.
Phase 0 deliverable status corrections:
- **#1** (re-verify failure-mode finding) — ✗ **AMENDED**: contract trace lands cleanly, kernel produces no decoded pixels.
- **#3** (Firefox configuration end-to-end) — ✓ engagement confirmed in live Plasma session; pixel-content failure mode identical to mpv.
- **#4** (Phase 0 baseline anchor) — ✗ **AMENDED**: captured trace describes Step 1's userspace behaviour, not the kernel-side spec Phase 6 must reproduce.
### Kernel-side re-baseline (2026-05-04) — corrects the prior verdict AGAIN
ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug enabled while running mpv `--hwdec=vaapi-copy --frames=2`. Full write-up: [`phase0_evidence/2026-05-04-kernel-trace/findings.md`](phase0_evidence/2026-05-04-kernel-trace/findings.md).
| Layer | Result |
|---|---|
| ftrace `vb2_buf_done` for CAPTURE_MPLANE | **`bytesused=3655712`** (full NV12 + hantro tile padding) reported every frame. **Kernel signals successful full-buffer write.** |
| dmesg | Completely silent. No hantro/vpu/decode/fail/error/reject/einval/warn. |
| Real-VO disambiguator (operator inspection in live session) | `--hwdec=vaapi-copy --vo=gpu`: **solid GREEN frame**. `--hwdec=vaapi --vo=gpu`: **solid BLUE frame**. NV12-with-Y=0,UV=0 BT.709-converted = green; same buffer via DMA-BUF GL import with different colorspace = blue. **Neither shows the sentinel mid-beige pattern; neither shows real bunny pixels.** |
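
The green reading in the last row can be reproduced arithmetically. The sketch below applies the standard BT.709 limited-range YCbCr to RGB conversion to a single sample; the function names are illustrative, and it assumes the renderer treats the buffer as 8-bit limited-range BT.709. It covers only the all-zero case.

```python
# BT.709 luma coefficients (standard values).
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def clamp8(value):
    """Round and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, round(value)))


def bt709_limited_to_rgb(y, cb, cr):
    """Convert one 8-bit limited-range BT.709 YCbCr sample to 8-bit RGB."""
    luma = (y - 16) * 255.0 / 219.0   # limited-range luma spans 16..235
    pb = (cb - 128) * 255.0 / 224.0   # limited-range chroma spans 16..240
    pr = (cr - 128) * 255.0 / 224.0
    r = luma + 2.0 * (1.0 - KR) * pr
    b = luma + 2.0 * (1.0 - KB) * pb
    g = luma - (2.0 * KB * (1.0 - KB) / KG) * pb - (2.0 * KR * (1.0 - KR) / KG) * pr
    return clamp8(r), clamp8(g), clamp8(b)


# An all-zero NV12 sample: red and blue clamp to 0, only green stays positive.
print(bt709_limited_to_rgb(0, 0, 0))
```

With Y, Cb and Cr all zero, both chroma offsets are strongly negative, which pushes red and blue below zero while adding to green. That matches the solid green frame seen with `--hwdec=vaapi-copy --vo=gpu`.
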
**Corrected verdict**: hantro accepts the request, returns success, **and writes ALL ZEROS to the CAPTURE buffer**. The patch-0011 sentinel test we relied on is misleading — it has a **cache-coherency bug**. Patch 0011 writes `0xab` via cached `surface_object->destination_map[0]` mmap, but neither `0010-DEBUG-hex-dump` nor any other read path in libva-v4l2-request invalidates the cache after DQBUF. So the readback always shows the stale sentinel, hiding the fact that the kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.
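
A coherent readback would bracket the CPU read with a cache sync. The sketch below shows the DMA-BUF variant as a standalone probe, under several assumptions: the CAPTURE buffer has been exported as a DMA-BUF (the trace shows `EXPBUF`), `mapping` is an mmap of that fd, and the ioctl encoding is the asm-generic one used on arm64. The helper names are hypothetical, and whether this bracket also covers patch 0011's own `destination_map[0]` mapping is not established here. A fix inside the library would issue the same ioctl from C.

```python
import fcntl
import struct

# Values from the Linux UAPI header linux/dma-buf.h.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2


def _iow(type_char, nr, size):
    """Encode a write-direction ioctl number (asm-generic layout)."""
    return (1 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# struct dma_buf_sync holds a single __u64 flags field, hence size 8.
DMA_BUF_IOCTL_SYNC = _iow("b", 0, 8)


def dmabuf_sync(fd, flags):
    """Issue DMA_BUF_IOCTL_SYNC on a DMA-BUF fd with the given flags."""
    fcntl.ioctl(fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags))


def read_coherent(dmabuf_fd, mapping, length):
    """Copy `length` bytes out of an mmap of the DMA-BUF inside a READ sync bracket."""
    dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    try:
        return bytes(mapping[:length])
    finally:
        dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
```

The `START`/`END` pair is what tells the kernel to make the mapping coherent for CPU reads; skipping it leaves the read free to return stale cache lines.
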
**Bug surface narrows substantially.** The path is:
- libva engagement: ✓
- Contract trace: ✓ no EINVAL, all ioctls succeed
- Hantro request acceptance: ✓ kernel reports success
- **Hantro produces meaningful pixel output: ✗ writes ALL ZEROS** — almost certainly the bitstream parser silently rejects something (per patch-0011's own commit-message hypothesis: "the apparent 'no picture' output is the kernel-side decode actually producing zeros, e.g. parser rejected the bitstream")
This is consistent with a control-submission bug (something in SPS/PPS/DECODE_PARAMS is off), not a fundamental "we can't drive hantro" problem. Phase 6 work direction sharpens accordingly.
### Phase 6 priority list (revised after kernel-side baseline)
1. **Fix the patch-0011 sentinel test** (or replace it). Add `msync(MS_SYNC|MS_INVALIDATE)` or DMA-BUF cache sync before the readback. Without this, future debugging is unreliable in exactly the same way.
2. **VIDIOC_G_EXT_CTRLS readback** of the request fd before QUEUE — confirms our writes actually stick at the V4L2 layer (e.g. POC sentinel actually stripped to 0 by patch-0015, level_idc actually set, etc.).
3. **Diff our per-frame control set against FFmpeg's `v4l2_request_h264.c`** (proven working on hantro, downstream branch `code.ffmpeg.org/Kwiboo/FFmpeg.git v4l2-request-n8.1`). Identify any field FFmpeg sets that we don't.
4. **Verify SPS submission completeness**: VAAPI's `VAPictureParameterBufferH264` doesn't carry the full SPS — we may need to derive `profile_idc` / `seq_parameter_set_id` / `log2_max_frame_num_minus4` / `pic_order_cnt_type` / `log2_max_pic_order_cnt_lsb_minus4` / `max_num_ref_frames` from VAAPI fields or by parsing the slice header.
5. **DECODE_PARAMS slice_header bit_size fields** (patch 0008's never-resolved question): if hantro requires them for parse, our zeros could be the silent-reject trigger.
6. **dyndbg on hantro module**: reload with `dyndbg="file drivers/media/platform/verisilicon/* +pmflt"` to surface compiled-in `dev_dbg` calls for the next probe.
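
Items 2 and 3 both reduce to comparing two sets of control fields: what was written against what `VIDIOC_G_EXT_CTRLS` reads back, or this backend's values against FFmpeg's. A minimal sketch of that comparison follows, assuming each control has already been flattened into a `{field_name: value}` mapping by some other step; `diff_controls` is a hypothetical helper and the flattening itself is not shown.

```python
def diff_controls(ours, reference):
    """Compare two {field_name: value} mappings and report where they disagree."""
    missing = sorted(set(reference) - set(ours))   # fields only the reference sets
    extra = sorted(set(ours) - set(reference))     # fields only we set
    changed = {
        name: (ours[name], reference[name])
        for name in sorted(set(ours) & set(reference))
        if ours[name] != reference[name]
    }
    return {"missing": missing, "extra": extra, "changed": changed}
```

Used for item 2, a non-empty `changed` entry would mean a written value did not stick at the V4L2 layer. Used for item 3, `missing` lists fields the reference sets that this backend never populates.
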
Phase 1 boolean-correctness criterion now must include pixel-content verification — but the verification can't rely on patch 0011 in its current form. Either fix patch 0011's cache sync, or use a different check: e.g. mpv `--vo=image-sequence` and inspect the dumped frame, or a small C reproducer that maps the buffer with proper cache flags and computes a luma histogram.
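
The luma-histogram idea can be prototyped offline before writing the C reproducer. The sketch below classifies the luma plane of a raw NV12 dump; it assumes the dump was taken through a path that sees real memory contents, and the function names and the `"constant"` label are illustrative. Because the reported `bytesused` (3655712) exceeds the unpadded 1920x1088 NV12 size, the buffer length is checked only as a lower bound.

```python
from collections import Counter

SENTINEL = 0xAB  # fill byte written by the patch-0011 test


def nv12_plane_sizes(width, height):
    """Return (luma_bytes, chroma_bytes) for an unpadded NV12 frame."""
    luma = width * height
    return luma, luma // 2


def classify_luma(buf, width, height):
    """Classify the luma plane as 'zeros', 'sentinel', 'constant' or 'varied'."""
    luma_bytes, _ = nv12_plane_sizes(width, height)
    if luma_bytes == 0 or len(buf) < luma_bytes:
        raise ValueError("buffer shorter than one luma plane")
    histogram = Counter(buf[:luma_bytes])
    if len(histogram) > 1:
        return "varied"
    (value,) = histogram  # exactly one distinct byte value
    return {0: "zeros", SENTINEL: "sentinel"}.get(value, "constant")
```

Only `"varied"` is compatible with a real decoded picture; `"zeros"` and `"sentinel"` correspond to the two failure readings discussed above.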
## Source-read references (carry-over from STUDY.md)
For Phase 2 source-read and Phase 6 implementation:
- **FFmpeg** — `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec `v4l2_request_h264.c`. Already multi-planar, already works on hantro. Closest-API canonical example. Active downstream: `code.ffmpeg.org/Kwiboo/FFmpeg/` branch `v4l2-request-n8.1`. 2024-08 v2 patchset on the FFmpeg list.
- **GStreamer v4l2codecs** — `gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c` + `gstv4l2codecsh264dec.c`. Canonical multi-planar S_FMT / REQBUFS / EXPBUF + request-API control submission on the exact Rockchip drivers we target.
- **Chromium** — `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}` + `v4l2_queue.cc`. ChromeOS-mature multi-planar; higher abstraction than we need but useful for surface lifecycle / request-fd tracking patterns.
## Test fixtures
- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source). Already present at `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` on ohm from the `fourier_attribution` campaign. Pull via hertz `lxc file pull` if not present elsewhere.
- **Reference path that already works on the same hardware**: `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink` — 6 % CPU, zero drops on ohm. That's the ceiling at the workload-end; libva path is expected to match within rounding once accepted. (Ceiling info noted; *not* a Phase 1 binding cell — performance is deferred.)
## Build + install on ohm
- `meson setup build && ninja -C build` directly on ohm. Small library; ~265 KB `.so`. **No distcc** (operator instruction; not enough work to be worth the orchestration).
- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
- Activate: `LIBVA_DRIVER_NAME=v4l2_request` + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` + `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
- Once the port works: package as `marfrit/libva-v4l2-request-fourier` next to `ffmpeg-v4l2-request-git`, with `provides=(libva-v4l2-request-git)` shape. (Out of Phase 1 scope — packaging is post-Phase-7.)
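
For scripted test runs, the activation variables can be applied per process rather than exported session-wide. This is a small sketch under the assumption that the consumer is launched from Python; `backend_env` and `run_with_backend` are hypothetical helpers, and the variable names and values are the ones listed under "Activate" above.

```python
import os
import subprocess

BACKEND_ENV = {
    "LIBVA_DRIVER_NAME": "v4l2_request",
    "LIBVA_V4L2_REQUEST_VIDEO_PATH": "/dev/video1",
    "LIBVA_V4L2_REQUEST_MEDIA_PATH": "/dev/media0",
}


def backend_env(base=None):
    """Return a copy of the environment with the backend variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(BACKEND_ENV)
    return env


def run_with_backend(argv):
    """Run a consumer (for example ["vainfo"]) with the backend selected."""
    return subprocess.run(argv, env=backend_env(), capture_output=True, text=True)
```

Keeping the variables scoped to the child process avoids leaking the backend selection into other VA-API consumers running in the same session.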