365764fffb
Re-baselined libva-v4l2-request decode path with kernel-side observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug) and visual disambiguator (mpv --vo=gpu in operator's live Plasma session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame: ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088 + hantro tile padding). dmesg completely silent — no hantro/vpu/decode/error/warn messages.
2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE. Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would render cream). Both colors are consistent with the kernel writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same buffer GL-imported as DMA-BUF with different colorspace → blue).
3. Patch 0011 sentinel test has a cache-coherency bug: writes 0xab via cached surface_object->destination_map[0] mmap, never invalidates cache before readback. So the readback always shows the stale sentinel even when kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.

This corrects the previous Phase 0 verdicts twice in one day:

- Original commit f15ba8b ("the 2026-04-26 picture holds") was wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel output, sentinel survives") was half right: kernel does write, writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis: "All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent 'no picture' output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream)." That hypothesis was right; we just couldn't confirm it via the sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't engage hantro" — it's "hantro engages but its parser produces zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS readback to verify writes stick, diff against FFmpeg's v4l2_request_h264.c (proven working on hantro), verify SPS completeness, resolve patch 0008's slice_header bit_size open question, dyndbg the hantro module, etc. Phase 1 boolean-correctness criterion needs a working pixel-content check before lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
222 lines
26 KiB
Markdown
# Phase 0 — libva-multiplanar

This campaign's substrate, locked research question, and pre-Phase-1 inventory work. Adapted from the prior `STUDY.md` in the fork (`libva-v4l2-request-fourier/STUDY.md` as of commit `e0acc33`, which has now been replaced with a pointer to this file) and re-framed against the 8(+1)-phase loop discipline.

## Campaign-contained data discipline (governing rule)

Per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md) Phase 0 + [`feedback_replicate_baseline_first.md`](../../.claude/projects/-home-mfritsche-src-kwin-overlay-subsurface/memory/feedback_replicate_baseline_first.md):

This campaign acquires its own measurement data in-session. Predecessor work (the fork's prior `STUDY.md`, `ohm_gl_fix/phase6/step1/` audit, `fourier_attribution` cell-A vs cell-B numbers) is documented for **state** carry-over — file:line pointers, contract analyses, build recipes, kernel-UAPI rename catalog, the V4L2-request multi-planar API map — but its measurement claims (e.g. "vainfo enumerates seven H.264 profiles cleanly", "Brave wall is chromeos pipeline as of 2026-04-26") are **reference history** until re-verified in-session. The 2026-04-26 failure-mode finding may have drifted; re-establish before relying on it.

## Research question (LOCKED 2026-05-04)

> **"Make libva-v4l2-request accepted at all by VA-API consumers on PineTab2 RK3568, providing access to the hantro G1/G2 hardware decoder for H.264 and MPEG-2, end-to-end. Performance metrics are explicitly deferred to a follow-up iteration."**

Pass/fail is **boolean correctness**, not throughput:

- Does the consumer dlopen `v4l2_request_drv_video.so`?
- Does it complete the VA-API surface lifecycle calls without falling back to SW?
- Does an actual V4L2 request-API ioctl (`VIDIOC_QBUF` with attached SPS/PPS controls + a request fd → `MEDIA_REQUEST_IOC_QUEUE` → `VIDIOC_DQBUF` of a populated CAPTURE buffer) land on hantro?

If yes → done for the iteration. Frame-rate / CPU% / drops measurement is a separate iteration whose binding cells will be locked separately.

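The third check above amounts to an ordered-subsequence test over the ioctls a consumer issues. The following is a minimal sketch, assuming the ioctl names have already been extracted from a trace into a list; the helper name `contains_in_order` and the sample trace are illustrative and not part of the campaign tooling. It only shows that the calls happened in order, not that the CAPTURE buffer was populated.

```python
# Ordered-subsequence check: did QBUF -> REQUEST_QUEUE -> DQBUF occur in that order?
REQUIRED = ["VIDIOC_QBUF", "MEDIA_REQUEST_IOC_QUEUE", "VIDIOC_DQBUF"]


def contains_in_order(trace, required=REQUIRED):
    """Return True if every name in `required` appears in `trace` in that order.

    Other ioctls may be interleaved; `name in it` consumes the iterator up to
    the first match, so later names must appear after earlier ones.
    """
    it = iter(trace)
    return all(name in it for name in required)


# Hypothetical one-frame trace, for illustration only.
sample_trace = [
    "VIDIOC_S_FMT",
    "VIDIOC_S_EXT_CTRLS",
    "VIDIOC_QBUF",
    "MEDIA_REQUEST_IOC_QUEUE",
    "VIDIOC_DQBUF",
]
landed = contains_in_order(sample_trace)
```

A `True` result here is necessary but not sufficient for the pass criterion, since it says nothing about what the dequeued buffer contains.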
## Mechanism the question targets

Hantro VPU on RK3568 exposes its decode interface as a **multi-planar V4L2 stateless** device (`/dev/video1`, `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE` + `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE`, request-API for control submission). VA-API consumers (mpv, Firefox via libavcodec, Chromium/Brave via its own decoder, vainfo as smoke test) speak libva, not V4L2 directly. The bridge they expect is `libva-v4l2-request` — a libva backend that translates `vaCreateSurfaces2` / `vaBeginPicture` / `vaRenderPicture` / `vaEndPicture` into the V4L2-stateless protocol.

Bootlin's upstream `libva-v4l2-request` (dormant since 2021) was written for **single-plane** sunxi-cedrus. None of the other public forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) ship multi-planar end-to-end. Collabora's strategic replacement `cros-codecs` is Rust + bypasses libva and is not shipping soon — leaving a hole that this campaign closes.

External pointers:

- Mozilla bug 1833354 / 1965646 — Firefox HW decode on RK3566/RK3588 explicitly requires `libva-v4l2-request`, not `v4l2-m2m`.
- Bootlin upstream (dormant): <https://github.com/bootlin/libva-v4l2-request>

## Predecessor close-out summary (state carry-over, not data)

### From `~/src/ohm_gl_fix/phase6/step1/` (closed 2026-05-02, contract-correct port snapshot)

Patches `0001..0018` against an early multi-planar branch of `libva-v4l2-request`, plus the audit at `audit_0008_decode_params_2026-05-01.md`. Most relevant for this campaign:

- `0008-h264-decode-params-correctness.patch` — H.264 `DECODE_PARAMS` shape (`V4L2_CTRL_TYPE_H264_DECODE_PARAMS`) verified against `hantro_h264.c` kernel source.
- `0012-h264-omit-scaling-matrix-frame-based.patch` — contract-correct gating of `SCALING_MATRIX` control by `matrix_set` rather than decode mode (one of the canonical examples of "Phase-3-derived implementation considered harmful" in `feedback_dev_process.md`).
- vainfo enumerates H.264 profiles cleanly with these patches against `chromium-fourier 149` binary, confirmed by `fourier_attribution` cell-A (54 % browser CPU, fps 24.0). **State**: the patches map cleanly onto a multi-planar libva-v4l2-request and represent a correctness baseline.

The Step 1 patches must be reconciled against the libva-v4l2-request-fourier `master` (12 commits ahead of bootlin tip). Either fold-in (preferred), or supersede the fork's WIP commits with the audit-anchored Step 1 set, or document why a divergent path makes sense.

### From `libva-v4l2-request-fourier/` (the fork, now sub-tree of this campaign)

Carry-over **state** (re-verify before treating as current):

- 12 commits ahead of bootlin `a3c2476`. Six "build cleanly against current kernel UAPI" commits (`V4L2_PIX_FMT_H264_SLICE_RAW` → `V4L2_PIX_FMT_H264_SLICE` rename; missing `utils.h` include; HEVC strip; `h264-ctrls.h` shim with `V4L2_CID_MPEG_VIDEO_H264_*` → `V4L2_CID_STATELESS_H264_*` aliases; `struct v4l2_ctrl_h264_slice_params` shape updates; `tiled_yuv.S` aarch64 stub).
- Five probe + control flow fix commits (`src/video.c` NV12 multi-plane format entry; `src/surface.c` MPLANE probe fallback; eager probe in `RequestInit`; `src/context.c` rename pass; **WIP**: `STREAMON` defer in `RequestCreateContext` — the V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THEN STREAMON; deferring lets `vaCreateContext` succeed but proper sequencing is the next phase).
- `src/utils.c` diagnostic logging tee to `/tmp/libva-fourier.log` (will revert before any final).
- Recent (2026-05-02) WIP entry-point tracing across `surface.c`, `image.c`, `buffer.c`, `context.c` for Brave's libva surface stack instrumentation.

The build artifact is a ~265 KB `.so`. `vainfo` + `mpv --hwdec=vaapi` enumerated profiles end-to-end as of 2026-04-26.

### From `~/src/fourier_attribution/` (closed 2026-05-04 with Phase 5 review)

- Cell A (chromium-fourier 149 with Step 1 + Step 2 patches): `browser_cpu_median = 54.4 %`, `effective_fps = 24.0`, `drops_60s = 12`. **The libva-multi-planar path is engaged here** — this is what end-to-end success looks like at the workload level.
- Cell B (stock Brave 1.89 / Chromium 147): `browser_cpu_median = 137 %`, `fps = 23.18`, `drops_60s = 16`. **Brave's libva path falls back to SW** because of the chromeos-pipeline gating documented in `STUDY.md` § "Brave's failure is not in our driver".
- The 83 pp browser-CPU gap is the campaign-relevant signal that "multi-planar libva is the binding decode-side enabler" — but Sonnet's Phase 5 review correctly flagged this is confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149) was identified as the cheapest disambiguator.

**Phase 7 verification gate (LOCKED 2026-05-04)**: when this campaign's Phase 6 lands a working multi-planar libva-v4l2-request, Phase 7 will retest `fourier_attribution` cell B (Brave) and the deferred cell E (vanilla Chromium 149) on this campaign's deliverable — that retroactively answers the chromium-fourier wheat verdict's confound.

### From `~/src/kwin_overlay_subsurface/` and `~/src/x11-session-research/` (orthogonal)

The NV12-scanout-plane gap on rockchip-drm RK3568 (Plane 39 the only NV12-LINEAR plane; Plane 45 advertises zero NV12 modifiers; X server doesn't program either with NV12 regardless of session server) is **orthogonal** to this campaign. libva is decode-side; the scanout gap is display-side. Don't confuse them. This campaign's deliverable does not unstick that. The display-side absorbs the NV12 → RGB GL-composite step in KWin (kept cheap by `kwin-fourier`'s `watchDmaBuf` fix per the `fourier_attribution` cell-D evidence).

## Current ohm state (carry-over from `fourier_attribution`)

- Kernel: `6.19.10-danctnix1-1-pinetab2`
- Mesa: `1:26.0.5-1`
- Plasma 6.6.4 Wayland session
- `qt6-base-fourier 1:6.11.0-3`, `qt6-xcb-private-headers-fourier 1:6.11.0-3`, `kwin-fourier 1:6.6.4-3` installed (cell-A package state restored end of `fourier_attribution`)
- `chromium-fourier 149` binary at `/tmp/chromium-ohm-gl-fix-step2/chrome` (Step 1 + Step 2 engaged)
- `brave-bin 1:1.89.145-1` (Chromium 147 base, control browser)
- governor `performance`, baloo disabled
- hantro on `/dev/video1`, `/dev/media0` — multi-planar V4L2 stateless

The fork tree at `~/src/libva-multiplanar/libva-v4l2-request-fourier/` is on commit `e0acc33` (master) with no uncommitted changes. Build harness: `meson setup` + `ninja` directly on ohm (small library, no distcc per operator instruction).

## In-scope (LOCKED 2026-05-04)

- libva-v4l2-request **backend only**. Libva front-end (the API library) is mature and supports multi-planar; out of scope for this campaign. Revisit only if Phase 2 source-read surfaces a specific front-end gap.
- Hardware target: **ohm RK3568 hantro G1/G2 first iteration only**. Other devices (fresnel RK3399 hantro G1, ampere/boltzmann RK3588 VDPU381) are explicit future iterations after the ohm path is solid. RK3588 in particular needs VDPU381 driver code that doesn't exist in the fork yet.
- Codecs: H.264 first; MPEG-2 next. HEVC explicitly out (kernel CIDs renamed, RK3566 has no HW HEVC, current fork stripped HEVC per the build-cleanly stack).
- Test consumers (LOCKED 2026-05-04):
  - `vainfo` — smoke test, enumerates profiles + entrypoints
  - `mpv --hwdec=vaapi` — most directly testable end-to-end consumer for HW decode validation
  - Firefox via `media.ffmpeg.vaapi.enabled` + `LIBVA_DRIVER_NAME=v4l2_request` — primary "real consumer" target per Mozilla bug 1965646
  - chromium-fourier 149 — regression check (cell A confirmed working; verify still works under any fork changes)
  - Brave 1.89 — *deferred* test consumer; the chromeos-pipeline gating documented in `STUDY.md` is upstream to libva and probably not fixable from this campaign's seat. Test it for completeness; don't gate Phase 7 on it.

## Out-of-scope (LOCKED 2026-05-04)

- Front-end libva.
- Other hardware (fresnel, ampere, boltzmann) — separate iterations.
- HEVC, VP8, VP9, AV1.
- Userspace bitstream parsing (kernel V4L2-stateless does this; library forwards parameters).
- HEVC RFC reference frame compression (Rockchip-specific, kernel disabled on ohm).
- Performance metrics. **Explicitly deferred to a follow-up iteration.** Do not lock Phase 1 binding cells around CPU%, fps, drops_60s, or panfrost freq.
- KWin / Wayland scanout-plane work (orthogonal; different campaigns closed).
- `cros-codecs` Rust replacement (out per `user_stance_rust.md`).
- Bootlin / Collabora upstreaming. Per `feedback_no_upstream.md`: no PRs, no MRs, no bug reports unless explicitly tasked. Bootlin upstream is dormant; the question of engaging Hans de Goede / Jernej Škrabec / Collabora when this campaign reaches a defensible state is a separate explicit decision.

## Open questions before Phase 1 lock

1. **In-session re-verification of the 2026-04-26 failure-mode finding** — is it still "vainfo + mpv probes work end-to-end; Brave wall is chromeos pipeline upstream of libva"? Phase 0 inventory must confirm or update before binding cells lock.
2. **Step 1 reconciliation** — fold-in `ohm_gl_fix/phase6/step1/0001..0018` to libva-v4l2-request-fourier `master`, supersede fork WIP, or run a divergent branch? Phase 2 source-read should make the call before Phase 4 plan.
3. **Firefox configuration** — does `media.ffmpeg.vaapi.enabled=true` + `LIBVA_DRIVER_NAME=v4l2_request` + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` work as documented? Phase 0 inventory item.
4. **`STREAMON` ordering on hantro** — STUDY.md flags this as the load-bearing pending fix: "set both queue formats up front, queue the first buffer with controls attached, then `STREAMON` both queues". Verify against `gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c` and `FFmpeg/libavcodec/v4l2_request*` — both proven working on the same hardware. This is Phase 6 implementation work but the audit needs to land in Phase 2.
5. **`V4L2_EVENT_SOURCE_CHANGE` handling** — needed for resolution-change streams; not strictly required for the fixed-resolution `bbb_1080p30_h264.mp4` test clip. Defer to Phase 6+ iteration if first-frame decode succeeds without it.

## Open questions resolved in this exchange

- *libva fork scope*: backend only.
- *Hardware target lock*: ohm first; others future iterations.
- *Test corpus*: vainfo, mpv `--hwdec=vaapi`, Firefox VAAPI, chromium-fourier 149, Brave 1.89 (deferred).
- *Phase 1 success criterion*: boolean correctness ("libva accepted + providing access to hardware decoder"). Performance metrics deferred.
- *Cell E folded into Phase 7 verification gate*: confirmed.
- *distcc*: no — small library, builds on ohm directly.
- *Gitea repo for campaign root*: create `marfrit/libva-multiplanar` empty now; don't push until something publish-worthy lands.

## What Phase 0 will deliver (regardless of detail)

1. **Re-verify the failure-mode finding in-session.** Build the current fork on ohm, install to `/usr/lib/dri/v4l2_request_drv_video.so`, run `vainfo` and `mpv --hwdec=vaapi` on `bbb_1080p30_h264.mp4`. Capture syscall/strace + V4L2 ioctl trace. Compare against the 2026-04-26 STUDY.md picture; loop back to Phase 2 if rig differs.
2. **Reconcile Step 1 (`ohm_gl_fix/phase6/step1/0001..0018`) against fork master.** Map each Step 1 patch to a fork commit (or to a missing slot). Decide fold-in vs supersede vs branch-and-keep.
3. **Verify Firefox configuration end-to-end.** Stock Firefox + `media.ffmpeg.vaapi.enabled=true` + LIBVA env vars — does it engage our backend, fall back to SW, or fail to load? Phase 0 inventory item.
4. **Phase 0 baseline anchor (in-session N=3-equivalent).** For the boolean-success criterion, the "anchor" is more like a contract trace than a metric distribution: capture the V4L2 request-API ioctl sequence on a known-working consumer (chromium-fourier 149 binary on ohm — already engages this libva path per cell A) for 1 frame's decode, in-session, before any fork modifications. That trace is the spec the Phase 6 implementation must reproduce.

## In-session re-verification result (2026-05-04)

Items #1 and #4 above executed against the substrate that was actually deployed on ohm. Full write-up: [`phase0_evidence/2026-05-04/findings.md`](phase0_evidence/2026-05-04/findings.md). Headline:

- **Item #1 — 2026-04-26 picture HOLDS** at boolean-correctness level. vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; `mpv --hwdec=vaapi-copy` decodes 68 H.264 frames end-to-end through the full V4L2-stateless contract on hantro (`/dev/video1` + `/dev/media0`) with zero EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring Phase 2 loopback.
- **Item #4 — contract trace captured** for mpv vaapi-copy. The chromium-fourier-as-spec-source plan from Phase 0 substrate is no longer blocking — mpv's trace is a clean reproducible substitute (same backend, same per-frame lifecycle: `MEDIA_REQUEST_IOC_REINIT` → per-request `S_EXT_CTRLS` → `QBUF`+`MEDIA_REQUEST_IOC_QUEUE` → `DQBUF`). Chromium trace remains worth capturing as cross-validation but isn't needed to lock Phase 1.
- **Substrate inventory shift**: the installed `/usr/lib/dri/v4l2_request_drv_video.so` on ohm is **not** built from `libva-v4l2-request-fourier/master`. It's `libva-v4l2-request-ohm-gl-fix 1.0.0.r0.ga3c2476-2`, built on **boltzmann** 2026-05-02 from `~/src/marfrit-packages/arch/libva-v4l2-request-ohm-gl-fix/PKGBUILD` (which applies `fourier-local.patch` + Step 1 patches `0001..0018` on top of bootlin tarball `a3c2476`). The git fork at `e8c3937` is a *pre-Step-1* substrate — it has the multi-planar wedge + HEVC strip + UAPI shim + STREAMON-defer WIP, but lacks `0002..0018` (request_pool, conditional PRED_WEIGHTS, ANNEX_B start codes, fill DECODE_PARAMS from VAAPI, no CAPTURE S_FMT, SCALING_MATRIX matrix_set predicate, level_idc, POC sentinel strip, DPB picnum, P/B-frame flags). **Rebuilding from the fork as-is would be a regression** — Phase 0 deliverable #2 (Step 1 reconciliation) is upstream of any "build from fork and install" step. The "Build + install on ohm" section below describes the *target* recipe once reconciliation lands; the *current* binary on ohm matches its build chain via the marfrit-packages PKGBUILD on boltzmann.
- **Rig caveat**: `mpv --hwdec=vaapi --vo=null` fails with `Could not create device.` because vo=null doesn't provide a DRM context to vaapi proper — this is mpv-side, not libva. Headless test rigs (SSH session) must use `--hwdec=vaapi-copy` or run inside a real Plasma/X session.

Phase 0 deliverables status: #1 ✓, #2 ✓ (Step 1 reconciled into fork master and pushed; see `libva-v4l2-request-fourier/` git log), #3 ⚠ partial (see below), #4 ✓.

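A claim such as "zero EINVAL/EAGAIN/EBUSY on the request-API path" can be re-checked mechanically from a saved `strace` log. The sketch below is one way to do it, assuming the log uses strace's usual `ioctl(fd, NAME, ...) = -1 ERRNO (text)` line shape; the function name `tally_errors` and the two sample lines are illustrative and were not taken from the campaign's evidence files.

```python
import re
from collections import Counter

# Matches e.g. "ioctl(7, VIDIOC_QBUF, {...}) = -1 EINVAL (Invalid argument)".
IOCTL_RE = re.compile(r"ioctl\(\d+, (\w+).*\)\s+= (-?\d+)(?: (E[A-Z0-9]+))?")


def tally_errors(lines, watched=("EINVAL", "EAGAIN", "EBUSY")):
    """Count failed ioctls per (ioctl name, errno) for the watched errnos."""
    errors = Counter()
    for line in lines:
        match = IOCTL_RE.search(line)
        if match and match.group(3) in watched:
            errors[(match.group(1), match.group(3))] += 1
    return errors


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "ioctl(7, VIDIOC_QBUF, {type=V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE}) = 0",
    "ioctl(7, VIDIOC_S_EXT_CTRLS, {count=3}) = -1 EINVAL (Invalid argument)",
]
sample_errors = tally_errors(sample_log)
```

An empty counter over the full log corresponds to the "zero errors" statement; as with the trace itself, it says nothing about the content of the decoded buffers.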
## Firefox engagement test (Phase 0 deliverable #3, 2026-05-04)

### Headless run (Xvfb, SSH-driven)

Stock Firefox 150.0.1 + `media.ffmpeg.vaapi.enabled=true` + `LIBVA_DRIVER_NAME=v4l2_request` env, executed under Xvfb on ohm. Full write-up: [`phase0_evidence/2026-05-04-firefox/findings.md`](phase0_evidence/2026-05-04-firefox/findings.md).

**Result**: Firefox's RDD process dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability probe then immediately closes them; never reaches `vaInitialize`. Gfx-environment platform-fitness check rejects VAAPI under Xvfb's software-framebuffer-with-no-DRI rig. Not a libva-side fault. Re-test in live session needed.

### Live Plasma Wayland session run — and follow-up kernel-side disambiguation

Same Firefox profile + LIBVA env, executed inside the operator's active Plasma 6 Wayland session (XDG_SESSION_TYPE=wayland, XDG_RUNTIME_DIR=/run/user/1001). Full write-up: [`phase0_evidence/2026-05-04-firefox-live/findings.md`](phase0_evidence/2026-05-04-firefox-live/findings.md).

**Result, two-layer**:

| Layer | Verdict |
|---|---|
| libva engagement (driver dlopen, contract lifecycle) | ✓ — clean. Single-frame attempt, all V4L2-stateless ioctls (REQUEST_ALLOC → S_FMT → CREATE_BUFS → STREAMON → S_EXT_CTRLS → QBUF + REQUEST_QUEUE → DQBUF + EXPBUF) succeed, no EINVAL on the request-API path. |
| **Kernel produces decoded pixel output** | **✗ — hantro returns CAPTURE buffer with patch-0011 sentinel `0xab` unchanged**. |
| Consumer reaction | Firefox detected the failed first frame and silently fell back to SW decode. User-visible: BBB plays normally for 5+ minutes via SW (operator-confirmed at t=337s playback time). |

**Cross-checked against the prior mpv vaapi-copy run**: re-examined `phase0_evidence/2026-05-04/mpv_vaapi_copy_2026-05-04.stderr` — **68 of 68 mpv CAPTURE buffers show the same sentinel-survives pattern**. mpv's `--vo=null` consumed all 68 sentinel buffers as if they were valid NV12 frames; the failure was invisible. OUTPUT bytes are byte-for-byte identical between mpv and Firefox (same IDR slice, both via libavcodec).

### Implication: prior Phase 0 verdict (commit `f15ba8b`) was wrong

The 2026-04-26 STUDY claim of "vainfo + mpv probes work end-to-end" — repeated in the prior Phase 0 in-session re-verification commit — held only at the **libva-engagement** layer. At the **kernel-decode** layer, hantro produces no decoded output for either consumer. The patch-0011 sentinel test (in the deployed Step 1 build) was authored to detect exactly this; the predecessor close-out apparently didn't grep for it, and the contract-trace cleanliness was mistaken for end-to-end success.

Phase 0 deliverable status corrections:

- **#1** (re-verify failure-mode finding) — ✗ **AMENDED**: contract trace lands cleanly, kernel produces no decoded pixels.
- **#3** (Firefox configuration end-to-end) — ✓ engagement confirmed in live Plasma session; pixel-content failure mode identical to mpv.
- **#4** (Phase 0 baseline anchor) — ✗ **AMENDED**: captured trace describes Step 1's userspace behaviour, not the kernel-side spec Phase 6 must reproduce.

### Kernel-side re-baseline (2026-05-04) — corrects the prior verdict AGAIN

ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug enabled while running mpv `--hwdec=vaapi-copy --frames=2`. Full write-up: [`phase0_evidence/2026-05-04-kernel-trace/findings.md`](phase0_evidence/2026-05-04-kernel-trace/findings.md).

| Layer | Result |
|---|---|
| ftrace `vb2_buf_done` for CAPTURE_MPLANE | **`bytesused=3655712`** (full NV12 + hantro tile padding) reported every frame. **Kernel signals successful full-buffer write.** |
| dmesg | Completely silent. No hantro/vpu/decode/fail/error/reject/einval/warn. |
| Real-VO disambiguator (operator inspection in live session) | `--hwdec=vaapi-copy --vo=gpu`: **solid GREEN frame**. `--hwdec=vaapi --vo=gpu`: **solid BLUE frame**. NV12-with-Y=0,UV=0 BT.709-converted = green; same buffer via DMA-BUF GL import with different colorspace = blue. **Neither shows the sentinel mid-beige pattern; neither shows real bunny pixels.** |

**Corrected verdict**: hantro accepts the request, returns success, **and writes ALL ZEROS to the CAPTURE buffer**. The patch-0011 sentinel test we relied on is misleading — it has a **cache-coherency bug**. Patch 0011 writes `0xab` via cached `surface_object->destination_map[0]` mmap, but neither `0010-DEBUG-hex-dump` nor any other read path in libva-v4l2-request invalidates the cache after DQBUF. So the readback always shows the stale sentinel, hiding the fact that the kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.

**Bug surface narrows substantially.** The path is:

- libva engagement: ✓
- Contract trace: ✓ no EINVAL, all ioctls succeed
- Hantro request acceptance: ✓ kernel reports success
- **Hantro produces meaningful pixel output: ✗ writes ALL ZEROS** — almost certainly the bitstream parser silently rejects something (per patch-0011's own commit-message hypothesis: "the apparent 'no picture' output is the kernel-side decode actually producing zeros, e.g. parser rejected the bitstream")

This is consistent with a control-submission bug (something in SPS/PPS/DECODE_PARAMS is off), not a fundamental "we can't drive hantro" problem. Phase 6 work direction sharpens accordingly.

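Two numbers in the re-baseline table can be sanity-checked with plain arithmetic: the `bytesused` figure and the green rendering of an all-zero NV12 buffer. The sketch below assumes the common 8-bit limited-range BT.709 coefficients rounded to three decimals; the function names are illustrative. It also notes that the bytes beyond the bare NV12 frame happen to equal 64 bytes per 16×16 macroblock plus 32. Reading that as per-macroblock auxiliary storage is an assumption of this sketch and not something the trace establishes.

```python
def nv12_bytes(width, height):
    """Size of an 8-bit NV12 frame: full luma plane plus half-size chroma plane."""
    return width * height * 3 // 2


def bt709_limited_to_rgb(y, u, v):
    """Convert one 8-bit limited-range BT.709 YUV sample to clamped 8-bit RGB."""
    c, d, e = y - 16, u - 128, v - 128
    r = 1.164 * c + 1.793 * e
    g = 1.164 * c - 0.213 * d - 0.533 * e
    b = 1.164 * c + 2.112 * d
    return tuple(max(0, min(255, round(x))) for x in (r, g, b))


BYTESUSED = 3655712                          # from the vb2_buf_done trace
frame = nv12_bytes(1920, 1088)               # 3133440
extra = BYTESUSED - frame                    # 522272
macroblocks = (1920 // 16) * (1088 // 16)    # 8160

# An all-zero sample: red and blue clamp to 0, leaving only green.
zero_pixel = bt709_limited_to_rgb(0, 0, 0)   # (0, 77, 0)
```

The dark-green result for `(0, 0, 0)` matches the operator's observation for the `vaapi-copy` path; the blue result on the DMA-BUF import path depends on how the importer interprets the planes and is not modelled here.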
### Phase 6 priority list (revised after kernel-side baseline)

1. **Fix the patch-0011 sentinel test** (or replace it). Add `msync(MS_SYNC|MS_INVALIDATE)` or DMA-BUF cache sync before the readback. Without this, future debugging is unreliable in exactly the same way.
2. **VIDIOC_G_EXT_CTRLS readback** of the request fd before QUEUE — confirms our writes actually stick at the V4L2 layer (e.g. POC sentinel actually stripped to 0 by patch-0015, level_idc actually set, etc.).
3. **Diff our per-frame control set against FFmpeg's `v4l2_request_h264.c`** (proven working on hantro, downstream branch `code.ffmpeg.org/Kwiboo/FFmpeg.git v4l2-request-n8.1`). Identify any field FFmpeg sets that we don't.
4. **Verify SPS submission completeness**: VAAPI's `VAPictureParameterBufferH264` doesn't carry the full SPS — we may need to derive `profile_idc` / `seq_parameter_set_id` / `log2_max_frame_num_minus4` / `pic_order_cnt_type` / `log2_max_pic_order_cnt_lsb_minus4` / `max_num_ref_frames` from VAAPI fields or by parsing the slice header.
5. **DECODE_PARAMS slice_header bit_size fields** (patch 0008's never-resolved question): if hantro requires them for parse, our zeros could be the silent-reject trigger.
6. **dyndbg on hantro module**: reload with `dyndbg="file drivers/media/platform/verisilicon/* +pmflt"` to surface compiled-in `dev_dbg` calls for the next probe.

Phase 1 boolean-correctness criterion now must include pixel-content verification — but the verification can't rely on patch 0011 in its current form. Either fix patch 0011's cache sync, or use a different check: e.g. mpv `--vo=image-sequence` and inspect the dumped frame, or a small C reproducer that maps the buffer with proper cache flags and computes a luma histogram.

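A pixel-content check that avoids the stale-cache trap needs two pieces: a cache sync bracketing the CPU read, and a classification of what the luma plane holds. The sketch below illustrates both in Python rather than C, assuming the CAPTURE buffer has been exported as a DMA-BUF (for example via `VIDIOC_EXPBUF`) and mapped from that fd. The constants follow `<linux/dma-buf.h>` with the generic ioctl encoding used on x86 and arm64; the helper names and the `mapping` argument are hypothetical.

```python
import fcntl
import struct

# Flag values from <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2
# _IOW('b', 0, struct dma_buf_sync); the struct is a single __u64 (8 bytes).
DMA_BUF_IOCTL_SYNC = (1 << 30) | (8 << 16) | (ord("b") << 8) | 0


def dmabuf_sync(fd, flags):
    """Issue DMA_BUF_IOCTL_SYNC on a DMA-BUF fd with the given flags."""
    fcntl.ioctl(fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags))


def classify_luma(luma, sentinel=0xAB):
    """Label a luma plane as empty, all-zero, untouched sentinel, or mixed."""
    if not luma:
        return "empty"
    values = set(luma)
    if values == {0}:
        return "all-zero"
    if values == {sentinel}:
        return "sentinel"
    return "mixed"


def read_luma_verdict(fd, mapping, luma_size):
    """Classify the luma plane of `mapping` inside a START/END sync bracket."""
    dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    try:
        return classify_luma(bytes(mapping[:luma_size]))
    finally:
        dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
```

For a decoded frame the expected verdict is `"mixed"`; `"all-zero"` would match the current failure mode, and `"sentinel"` would mean the kernel never wrote the buffer at all. Whether the existing `destination_map[0]` mapping can be synced this way depends on how it was created, which the priority list leaves open.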
## Source-read references (carry-over from STUDY.md)

For Phase 2 source-read and Phase 6 implementation:

- **FFmpeg** — `libavcodec/v4l2_request.c`, `v4l2_request_buffer.c`, per-codec `v4l2_request_h264.c`. Already multi-planar, already works on hantro. Closest-API canonical example. Active downstream: `code.ffmpeg.org/Kwiboo/FFmpeg/` branch `v4l2-request-n8.1`. 2024-08 v2 patchset on the FFmpeg list.
- **GStreamer v4l2codecs** — `gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c` + `gstv4l2codecsh264dec.c`. Canonical multi-planar S_FMT / REQBUFS / EXPBUF + request-API control submission on the exact Rockchip drivers we target.
- **Chromium** — `media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc}` + `v4l2_queue.cc`. ChromeOS-mature multi-planar; higher abstraction than we need but useful for surface lifecycle / request-fd tracking patterns.

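When one of the reference implementations above is used as a comparison point for our per-frame controls, the comparison reduces to a field-level diff. The sketch below is a minimal version, assuming both control sets have already been dumped into flat name-to-value dictionaries; the function name and every value in the sample dictionaries are hypothetical, and only the field names come from this document.

```python
def missing_or_zero(ours, reference):
    """Return fields the reference sets to a non-zero value that we leave unset or zero."""
    return {
        name: ref_value
        for name, ref_value in reference.items()
        if ref_value and not ours.get(name)
    }


# Hypothetical dumps, for illustration only.
reference_sps = {
    "profile_idc": 100,
    "level_idc": 40,
    "pic_order_cnt_type": 0,
    "max_num_ref_frames": 4,
}
our_sps = {
    "profile_idc": 100,
    "level_idc": 0,
    "pic_order_cnt_type": 0,
}
gaps = missing_or_zero(our_sps, reference_sps)
```

Fields that are legitimately zero in the reference are ignored by design, so this only surfaces candidates for the "field the reference sets that we don't" question; it cannot detect fields where both sides are non-zero but disagree.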
## Test fixtures

- **Test clip**: `/moviedata/fourier-test/bbb_1080p30_h264.mp4` on doppler (SHA-16 `dcf8a7170fbd49bb`, 1920×1080 H.264, 24 fps source). Already present at `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` on ohm from the `fourier_attribution` campaign. Pull via hertz `lxc file pull` if not present elsewhere.
- **Reference path that already works on the same hardware**: `gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink` — 6 % CPU, zero drops on ohm. That's the ceiling at the workload-end; libva path is expected to match within rounding once accepted. (Ceiling info noted; *not* a Phase 1 binding cell — performance is deferred.)

## Build + install on ohm

- `meson setup build && ninja -C build` directly on ohm. Small library; ~265 KB `.so`. **No distcc** (operator instruction; not enough work to be worth the orchestration).
- Install path: `/usr/lib/dri/v4l2_request_drv_video.so`.
- Activate: `LIBVA_DRIVER_NAME=v4l2_request` + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` + `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
- Once the port works: package as `marfrit/libva-v4l2-request-fourier` next to `ffmpeg-v4l2-request-git`, with `provides=(libva-v4l2-request-git)` shape. (Out of Phase 1 scope — packaging is post-Phase-7.)

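The activation variables above can be bundled into a small launcher so every test consumer sees the same environment. This is a sketch only, assuming `vainfo` is on `PATH`; the helper names `v4l2_request_env` and `run_vainfo` are illustrative and not part of the fork.

```python
import os
import subprocess


def v4l2_request_env(video="/dev/video1", media="/dev/media0", base=None):
    """Return a copy of the environment with the three LIBVA activation variables set."""
    env = dict(os.environ if base is None else base)
    env.update({
        "LIBVA_DRIVER_NAME": "v4l2_request",
        "LIBVA_V4L2_REQUEST_VIDEO_PATH": video,
        "LIBVA_V4L2_REQUEST_MEDIA_PATH": media,
    })
    return env


def run_vainfo(env=None):
    """Run vainfo under the activation environment and return the completed process."""
    return subprocess.run(
        ["vainfo"],
        env=env if env is not None else v4l2_request_env(),
        capture_output=True,
        text=True,
        check=False,
    )
```

Calling `run_vainfo()` on ohm and inspecting `stdout` for the profile list would be the smoke test; the same `v4l2_request_env()` dictionary can be passed to `subprocess.run` for mpv or Firefox.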