Re-baselined libva-v4l2-request decode path with kernel-side observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug) and visual disambiguator (mpv --vo=gpu in operator's live Plasma session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame: ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088 + hantro tile padding). dmesg completely silent: no hantro/vpu/decode/error/warn messages.
2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE. Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would render cream). Both colors are consistent with the kernel writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same buffer GL-imported as DMA-BUF with different colorspace → blue).
3. Patch 0011 sentinel test has a cache-coherency bug: writes 0xab via cached surface_object->destination_map[0] mmap, never invalidates cache before readback. So the readback always shows the stale sentinel even when kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.

This corrects the previous Phase 0 verdicts twice in one day:

- Original commit f15ba8b ("the 2026-04-26 picture holds") was wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel output, sentinel survives") was half right: kernel does write, writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis: "All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent 'no picture' output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream)."

That hypothesis was right; we just couldn't confirm it via the sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't engage hantro"; it's "hantro engages but its parser produces zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS readback to verify writes stick, diff against FFmpeg's v4l2_request_h264.c (proven working on hantro), verify SPS completeness, resolve patch 0008's slice_header bit_size open question, dyndbg the hantro module, etc. Phase 1 boolean-correctness criterion needs a working pixel-content check before lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 — libva-multiplanar
This campaign's substrate, locked research question, and pre-Phase-1 inventory work. Adapted from the prior STUDY.md in the fork (libva-v4l2-request-fourier/STUDY.md as of commit e0acc33, which has now been replaced with a pointer to this file) and re-framed against the 8(+1)-phase loop discipline.
Campaign-contained data discipline (governing rule)
Per feedback_dev_process.md Phase 0 + feedback_replicate_baseline_first.md:
This campaign acquires its own measurement data in-session. Predecessor work (the fork's prior STUDY.md, ohm_gl_fix/phase6/step1/ audit, fourier_attribution cell-A vs cell-B numbers) is documented for state carry-over — file:line pointers, contract analyses, build recipes, kernel-UAPI rename catalog, the V4L2-request multi-planar API map — but its measurement claims (e.g. "vainfo enumerates seven H.264 profiles cleanly", "Brave wall is chromeos pipeline as of 2026-04-26") are reference history until re-verified in-session. The 2026-04-26 failure-mode finding may have drifted; re-establish before relying on it.
Research question (LOCKED 2026-05-04)
"Make libva-v4l2-request accepted at all by VA-API consumers on PineTab2 RK3568, providing access to the hantro G1/G2 hardware decoder for H.264 and MPEG-2, end-to-end. Performance metrics are explicitly deferred to a follow-up iteration."
Pass/fail is boolean correctness, not throughput:
- Does the consumer dlopen v4l2_request_drv_video.so?
- Does it complete the VA-API surface lifecycle calls without falling back to SW?
- Does an actual V4L2 request-API ioctl (VIDIOC_QBUF with attached SPS/PPS controls + a request fd → MEDIA_REQUEST_IOC_QUEUE → VIDIOC_DQBUF of a populated CAPTURE buffer) land on hantro?
If yes → done for the iteration. Frame-rate / CPU% / drops measurement is a separate iteration whose binding cells will be locked separately.
Mechanism the question targets
Hantro VPU on RK3568 exposes its decode interface as a multi-planar V4L2 stateless device (/dev/video1, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE + V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, request-API for control submission). VA-API consumers (mpv, Firefox via libavcodec, Chromium/Brave via its own decoder, vainfo as smoke test) speak libva, not V4L2 directly. The bridge they expect is libva-v4l2-request — a libva backend that translates vaCreateSurfaces2 / vaBeginPicture / vaRenderPicture / vaEndPicture into the V4L2-stateless protocol.
Bootlin's upstream libva-v4l2-request (dormant since 2021) was written for single-plane sunxi-cedrus. None of the other public forks (jernejsk, ndufresne, pH5, jc-kynesim, ArtSvetlakov) ship multi-planar end-to-end. Collabora's strategic replacement cros-codecs is Rust + bypasses libva and is not shipping soon — leaving a hole that this campaign closes.
External pointers:
- Mozilla bug 1833354 / 1965646: Firefox HW decode on RK3566/RK3588 explicitly requires libva-v4l2-request, not v4l2-m2m.
- Bootlin upstream (dormant): https://github.com/bootlin/libva-v4l2-request
Predecessor close-out summary (state carry-over, not data)
From ~/src/ohm_gl_fix/phase6/step1/ (closed 2026-05-02, contract-correct port snapshot)
Patches 0001..0018 against an early multi-planar branch of libva-v4l2-request, plus the audit at audit_0008_decode_params_2026-05-01.md. Most relevant for this campaign:
- 0008-h264-decode-params-correctness.patch: V4L2_CTRL_TYPE_FWHT_PARAMS / DECODE_PARAMS shape verified against hantro_h264.c kernel source.
- 0012-h264-omit-scaling-matrix-frame-based.patch: contract-correct gating of SCALING_MATRIX control by matrix_set rather than decode mode (one of the canonical examples of "Phase-3-derived implementation considered harmful" in feedback_dev_process.md).
- vainfo enumerates H.264 profiles cleanly with these patches against chromium-fourier 149 binary, confirmed by fourier_attribution cell-A (54 % browser CPU, fps 24.0).

State: the patches map cleanly onto a multi-planar libva-v4l2-request and represent a correctness baseline.
The Step 1 patches must be reconciled against the libva-v4l2-request-fourier master (12 commits ahead of bootlin tip). Either fold-in (preferred), or supersede the fork's WIP commits with the audit-anchored Step 1 set, or document why a divergent path makes sense.
From libva-v4l2-request-fourier/ (the fork, now sub-tree of this campaign)
Carry-over state (re-verify before treating as current):
- 12 commits ahead of bootlin a3c2476. Six "build cleanly against current kernel UAPI" commits (V4L2_PIX_FMT_H264_SLICE_RAW → V4L2_PIX_FMT_H264_SLICE rename; missing utils.h include; HEVC strip; h264-ctrls.h shim with V4L2_CID_MPEG_VIDEO_H264_* → V4L2_CID_STATELESS_H264_* aliases; struct v4l2_ctrl_h264_slice_params shape updates; tiled_yuv.S aarch64 stub).
- Five probe + control flow fix commits (src/video.c NV12 multi-plane format entry; src/surface.c MPLANE probe fallback; eager probe in RequestInit; src/context.c rename pass; WIP: STREAMON defer in RequestCreateContext, since the V4L2 stateless protocol on hantro requires OUTPUT format → SPS controls → first slice queued → THEN STREAMON; deferring lets vaCreateContext succeed but proper sequencing is the next phase).
- src/utils.c diagnostic logging tee to /tmp/libva-fourier.log (will revert before any final).
- Recent (2026-05-02) WIP entry-point tracing across surface.c, image.c, buffer.c, context.c for Brave's libva surface stack instrumentation.
The build artifact is a ~265 KB .so. vainfo + mpv --hwdec=vaapi enumerated profiles end-to-end as of 2026-04-26.
From ~/src/fourier_attribution/ (closed 2026-05-04 with Phase 5 review)
- Cell A (chromium-fourier 149 with Step 1 + Step 2 patches): browser_cpu_median = 54.4 %, effective_fps = 24.0, drops_60s = 12. The libva-multi-planar path is engaged here; this is what end-to-end success looks like at the workload level.
- Cell B (stock Brave 1.89 / Chromium 147): browser_cpu_median = 137 %, fps = 23.18, drops_60s = 16. Brave's libva path falls back to SW because of the chromeos-pipeline gating documented in STUDY.md § "Brave's failure is not in our driver".
- The 83 pp browser-CPU gap is the campaign-relevant signal that "multi-planar libva is the binding decode-side enabler", but Sonnet's Phase 5 review correctly flagged that this is confounded with the Brave-147-vs-Chromium-149 base-version delta. Cell E (vanilla Chromium 149) was identified as the cheapest disambiguator.
Phase 7 verification gate (LOCKED 2026-05-04): when this campaign's Phase 6 lands a working multi-planar libva-v4l2-request, Phase 7 will retest fourier_attribution cell B (Brave) and the deferred cell E (vanilla Chromium 149) on this campaign's deliverable — that retroactively answers the chromium-fourier wheat verdict's confound.
From ~/src/kwin_overlay_subsurface/ and ~/src/x11-session-research/ (orthogonal)
The NV12-scanout-plane gap on rockchip-drm RK3568 (Plane 39 the only NV12-LINEAR plane; Plane 45 advertises zero NV12 modifiers; X server doesn't program either with NV12 regardless of session server) is orthogonal to this campaign. libva is decode-side; the scanout gap is display-side. Don't confuse them. This campaign's deliverable does not unstick that. The display-side absorbs the NV12 → RGB GL-composite step in KWin (kept cheap by kwin-fourier's watchDmaBuf fix per the fourier_attribution cell-D evidence).
Current ohm state (carry-over from fourier_attribution)
- Kernel: 6.19.10-danctnix1-1-pinetab2
- Mesa: 1:26.0.5-1
- Plasma 6.6.4 Wayland session
- qt6-base-fourier 1:6.11.0-3, qt6-xcb-private-headers-fourier 1:6.11.0-3, kwin-fourier 1:6.6.4-3 installed (cell-A package state restored end of fourier_attribution)
- chromium-fourier 149 binary at /tmp/chromium-ohm-gl-fix-step2/chrome (Step 1 + Step 2 engaged)
- brave-bin 1:1.89.145-1 (Chromium 147 base, control browser)
- governor performance, baloo disabled
- hantro on /dev/video1, /dev/media0 (multi-planar V4L2 stateless)
The fork tree at ~/src/libva-multiplanar/libva-v4l2-request-fourier/ is on commit e0acc33 (master) with no uncommitted changes. Build harness: meson setup + ninja directly on ohm (small library, no distcc per operator instruction).
In-scope (LOCKED 2026-05-04)
- libva-v4l2-request backend only. Libva front-end (the API library) is mature and supports multi-planar; out of scope for this campaign. Revisit only if Phase 2 source-read surfaces a specific front-end gap.
- Hardware target: ohm RK3568 hantro G1/G2 first iteration only. Other devices (fresnel RK3399 hantro G1, ampere/boltzmann RK3588 VDPU381) are explicit future iterations after the ohm path is solid. RK3588 in particular needs VDPU381 driver code that doesn't exist in the fork yet.
- Codecs: H.264 first; MPEG-2 next. HEVC explicitly out (kernel CIDs renamed, RK3566 has no HW HEVC, current fork stripped HEVC per the build-cleanly stack).
- Test consumers (LOCKED 2026-05-04):
  - vainfo: smoke test, enumerates profiles + entrypoints
  - mpv --hwdec=vaapi: most directly testable end-to-end consumer for HW decode validation
  - Firefox via media.ffmpeg.vaapi.enabled + LIBVA_DRIVER_NAME=v4l2_request: primary "real consumer" target per Mozilla bug 1965646
  - chromium-fourier 149: regression check (cell A confirmed working; verify still works under any fork changes)
  - Brave 1.89: deferred test consumer; the chromeos-pipeline gating documented in STUDY.md is upstream to libva and probably not fixable from this campaign's seat. Test it for completeness; don't gate Phase 7 on it.
Out-of-scope (LOCKED 2026-05-04)
- Front-end libva.
- Other hardware (fresnel, ampere, boltzmann) — separate iterations.
- HEVC, VP8, VP9, AV1.
- Userspace bitstream parsing (kernel V4L2-stateless does this; library forwards parameters).
- HEVC RFC reference frame compression (Rockchip-specific, kernel disabled on ohm).
- Performance metrics. Explicitly deferred to a follow-up iteration. Do not lock Phase 1 binding cells around CPU%, fps, drops_60s, or panfrost freq.
- KWin / Wayland scanout-plane work (orthogonal; different campaigns closed).
- cros-codecs Rust replacement (out per user_stance_rust.md).
- Bootlin / Collabora upstreaming. Per feedback_no_upstream.md: no PRs, no MRs, no bug reports unless explicitly tasked. Bootlin upstream is dormant; the question of engaging Hans de Goede / Jernej Škrabec / Collabora when this campaign reaches a defensible state is a separate explicit decision.
Open questions before Phase 1 lock
- In-session re-verification of the 2026-04-26 failure-mode finding: is it still "vainfo + mpv probes work end-to-end; Brave wall is chromeos pipeline upstream of libva"? Phase 0 inventory must confirm or update before binding cells lock.
- Step 1 reconciliation: fold-in ohm_gl_fix/phase6/step1/0001..0018 to libva-v4l2-request-fourier master, supersede fork WIP, or run a divergent branch? Phase 2 source-read should make the call before Phase 4 plan.
- Firefox configuration: does media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=v4l2_request + LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 work as documented? Phase 0 inventory item.
- STREAMON ordering on hantro: STUDY.md flags this as the load-bearing pending fix: "set both queue formats up front, queue the first buffer with controls attached, then STREAMON both queues". Verify against gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c and FFmpeg/libavcodec/v4l2_request* (both proven working on the same hardware). This is Phase 6 implementation work but the audit needs to land in Phase 2.
- V4L2_EVENT_SOURCE_CHANGE handling: needed for resolution-change streams; not strictly required for the fixed-resolution bbb_1080p30_h264.mp4 test clip. Defer to Phase 6+ iteration if first-frame decode succeeds without it.
Open questions resolved in this exchange
- libva fork scope: backend only.
- Hardware target lock: ohm first; others future iterations.
- Test corpus: vainfo, mpv --hwdec=vaapi, Firefox VAAPI, chromium-fourier 149, Brave 1.89 (deferred).
- Phase 1 success criterion: boolean correctness ("libva accepted + providing access to hardware decoder"). Performance metrics deferred.
- Cell E folded into Phase 7 verification gate: confirmed.
- distcc: no; small library, builds on ohm directly.
- Gitea repo for campaign root: create marfrit/libva-multiplanar empty now; don't push until something publish-worthy lands.
What Phase 0 will deliver (regardless of detail)
1. Re-verify the failure-mode finding in-session. Build the current fork on ohm, install to /usr/lib/dri/v4l2_request_drv_video.so, run vainfo and mpv --hwdec=vaapi on bbb_1080p30_h264.mp4. Capture syscall/strace + V4L2 ioctl trace. Compare against the 2026-04-26 STUDY.md picture; loop back to Phase 2 if rig differs.
2. Reconcile Step 1 (ohm_gl_fix/phase6/step1/0001..0018) against fork master. Map each Step 1 patch to a fork commit (or to a missing slot). Decide fold-in vs supersede vs branch-and-keep.
3. Verify Firefox configuration end-to-end. Stock Firefox + media.ffmpeg.vaapi.enabled=true + LIBVA env vars: does it engage our backend, fall back to SW, or fail to load? Phase 0 inventory item.
4. Phase 0 baseline anchor (in-session N=3-equivalent). For the boolean-success criterion, the "anchor" is more like a contract trace than a metric distribution: capture the V4L2 request-API ioctl sequence on a known-working consumer (chromium-fourier 149 binary on ohm, which already engages this libva path per cell A) for 1 frame's decode, in-session, before any fork modifications. That trace is the spec the Phase 6 implementation must reproduce.
In-session re-verification result (2026-05-04)
Items #1 and #4 above executed against the substrate that was actually deployed on ohm. Full write-up: phase0_evidence/2026-05-04/findings.md. Headline:
- Item #1: 2026-04-26 picture HOLDS at boolean-correctness level. vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; mpv --hwdec=vaapi-copy decodes 68 H.264 frames end-to-end through the full V4L2-stateless contract on hantro (/dev/video1 + /dev/media0) with zero EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring Phase 2 loopback.
- Item #4: contract trace captured for mpv vaapi-copy. The chromium-fourier-as-spec-source plan from Phase 0 substrate is no longer blocking; mpv's trace is a clean reproducible substitute (same backend, same per-frame lifecycle: MEDIA_REQUEST_IOC_REINIT → per-request S_EXT_CTRLS → QBUF + MEDIA_REQUEST_IOC_QUEUE → DQBUF). Chromium trace remains worth capturing as cross-validation but isn't needed to lock Phase 1.
- Substrate inventory shift: the installed /usr/lib/dri/v4l2_request_drv_video.so on ohm is not built from libva-v4l2-request-fourier/ master. It's libva-v4l2-request-ohm-gl-fix 1.0.0.r0.ga3c2476-2, built on boltzmann 2026-05-02 from ~/src/marfrit-packages/arch/libva-v4l2-request-ohm-gl-fix/PKGBUILD (which applies fourier-local.patch + Step 1 patches 0001..0018 on top of bootlin tarball a3c2476). The git fork at e8c3937 is a pre-Step-1 substrate: it has the multi-planar wedge + HEVC strip + UAPI shim + STREAMON-defer WIP, but lacks 0002..0018 (request_pool, conditional PRED_WEIGHTS, ANNEX_B start codes, fill DECODE_PARAMS from VAAPI, no CAPTURE S_FMT, SCALING_MATRIX matrix_set predicate, level_idc, POC sentinel strip, DPB picnum, P/B-frame flags). Rebuilding from the fork as-is would be a regression, so Phase 0 deliverable #2 (Step 1 reconciliation) is upstream of any "build from fork and install" step. The "Build + install on ohm" section below describes the target recipe once reconciliation lands; the current binary on ohm matches its build chain via the marfrit-packages PKGBUILD on boltzmann.
- Rig caveat: mpv --hwdec=vaapi --vo=null fails with "Could not create device." because vo=null doesn't provide a DRM context to vaapi proper; this is mpv-side, not libva. Headless test rigs (SSH session) must use --hwdec=vaapi-copy or run inside a real Plasma/X session.
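The per-frame lifecycle named in the contract-trace item can be checked mechanically against a captured ioctl log. The following is a minimal sketch, assuming the strace output has already been reduced to a plain list of ioctl names (that parsing step is not shown). The function name `count_complete_frames` is hypothetical, and the first frame of a real session may begin with REQUEST_ALLOC rather than REINIT, which this sketch does not handle.

```python
# Expected per-frame order, taken from the contract-trace item above.
FRAME_CYCLE = [
    "MEDIA_REQUEST_IOC_REINIT",
    "VIDIOC_S_EXT_CTRLS",
    "VIDIOC_QBUF",
    "MEDIA_REQUEST_IOC_QUEUE",
    "VIDIOC_DQBUF",
]


def count_complete_frames(ioctl_names):
    """Count how many times FRAME_CYCLE occurs, in order, in the trace.

    Ioctls outside the cycle are ignored. An immediate repeat of the step
    just accepted (e.g. QBUF/DQBUF once per queue) is tolerated. Any other
    cycle ioctl arriving out of order raises ValueError.
    """
    step = 0      # index of the next expected ioctl in FRAME_CYCLE
    last = None   # most recently accepted cycle ioctl
    frames = 0
    for name in ioctl_names:
        if name not in FRAME_CYCLE:
            continue
        if name == FRAME_CYCLE[step]:
            last = name
            step = (step + 1) % len(FRAME_CYCLE)
            if step == 0:
                frames += 1
        elif name == last:
            continue
        else:
            raise ValueError(f"{name} seen while expecting {FRAME_CYCLE[step]}")
    return frames


# Hand-written illustrative trace, not captured data.
example = [
    "VIDIOC_S_FMT",
    "MEDIA_REQUEST_IOC_REINIT", "VIDIOC_S_EXT_CTRLS",
    "VIDIOC_QBUF", "VIDIOC_QBUF", "MEDIA_REQUEST_IOC_QUEUE",
    "VIDIOC_DQBUF", "VIDIOC_DQBUF", "VIDIOC_EXPBUF",
    "MEDIA_REQUEST_IOC_REINIT", "VIDIOC_S_EXT_CTRLS",
    "VIDIOC_QBUF", "MEDIA_REQUEST_IOC_QUEUE", "VIDIOC_DQBUF",
]
frames_seen = count_complete_frames(example)
```

A check of this kind only confirms ordering at the ioctl layer; it says nothing about what the kernel wrote into the CAPTURE buffer.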
Phase 0 deliverables status: #1 ✓, #2 ✓ (Step 1 reconciled into fork master and pushed; see libva-v4l2-request-fourier/ git log), #3 ⚠ partial (see below), #4 ✓.
Firefox engagement test (Phase 0 deliverable #3, 2026-05-04)
Headless run (Xvfb, SSH-driven)
Stock Firefox 150.0.1 + media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=v4l2_request env, executed under Xvfb on ohm. Full write-up: phase0_evidence/2026-05-04-firefox/findings.md.
Result: Firefox's RDD process dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability probe then immediately closes them; never reaches vaInitialize. Gfx-environment platform-fitness check rejects VAAPI under Xvfb's software-framebuffer-with-no-DRI rig. Not a libva-side fault. Re-test in live session needed.
Live Plasma Wayland session run — and follow-up kernel-side disambiguation
Same Firefox profile + LIBVA env, executed inside the operator's active Plasma 6 Wayland session (XDG_SESSION_TYPE=wayland, XDG_RUNTIME_DIR=/run/user/1001). Full write-up: phase0_evidence/2026-05-04-firefox-live/findings.md.
Result, two-layer:
| Layer | Verdict |
|---|---|
| libva engagement (driver dlopen, contract lifecycle) | ✓ — clean. Single-frame attempt, all V4L2-stateless ioctls (REQUEST_ALLOC → S_FMT → CREATE_BUFS → STREAMON → S_EXT_CTRLS → QBUF + REQUEST_QUEUE → DQBUF + EXPBUF) succeed, no EINVAL on the request-API path. |
| Kernel produces decoded pixel output | ✗ — hantro returns CAPTURE buffer with patch-0011 sentinel 0xab unchanged. |
| Consumer reaction | Firefox detected the failed first frame and silently fell back to SW decode. User-visible: BBB plays normally for 5+ minutes via SW (operator-confirmed at t=337s playback time). |
Cross-checked against the prior mpv vaapi-copy run: re-examined phase0_evidence/2026-05-04/mpv_vaapi_copy_2026-05-04.stderr — 68 of 68 mpv CAPTURE buffers show the same sentinel-survives pattern. mpv's --vo=null consumed all 68 sentinel buffers as if they were valid NV12 frames; the failure was invisible. OUTPUT bytes are byte-for-byte identical between mpv and Firefox (same IDR slice, both via libavcodec).
Implication: prior Phase 0 verdict (commit f15ba8b) was wrong
The 2026-04-26 STUDY claim of "vainfo + mpv probes work end-to-end" — repeated in the prior Phase 0 in-session re-verification commit — held only at the libva-engagement layer. At the kernel-decode layer, hantro produces no decoded output for either consumer. The patch-0011 sentinel test (in the deployed Step 1 build) was authored to detect exactly this; the predecessor close-out apparently didn't grep for it, and the contract-trace cleanliness was mistaken for end-to-end success.
Phase 0 deliverable status corrections:
- #1 (re-verify failure-mode finding) — ✗ AMENDED: contract trace lands cleanly, kernel produces no decoded pixels.
- #3 (Firefox configuration end-to-end) — ✓ engagement confirmed in live Plasma session; pixel-content failure mode identical to mpv.
- #4 (Phase 0 baseline anchor) — ✗ AMENDED: captured trace describes Step 1's userspace behaviour, not the kernel-side spec Phase 6 must reproduce.
Kernel-side re-baseline (2026-05-04) — corrects the prior verdict AGAIN
ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug enabled while running mpv --hwdec=vaapi-copy --frames=2. Full write-up: phase0_evidence/2026-05-04-kernel-trace/findings.md.
| Layer | Result |
|---|---|
| ftrace vb2_buf_done for CAPTURE_MPLANE | bytesused=3655712 (full NV12 + hantro tile padding) reported every frame. Kernel signals successful full-buffer write. |
| dmesg | Completely silent. No hantro/vpu/decode/fail/error/reject/einval/warn. |
| Real-VO disambiguator (operator inspection in live session) | --hwdec=vaapi-copy --vo=gpu: solid GREEN frame. --hwdec=vaapi --vo=gpu: solid BLUE frame. NV12-with-Y=0,UV=0 BT.709-converted = green; same buffer via DMA-BUF GL import with different colorspace = blue. Neither shows the sentinel mid-beige pattern; neither shows real bunny pixels. |
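Two numbers in the table above can be sanity-checked with plain arithmetic. The sketch below is illustrative only: it splits bytesused into the NV12 payload and a remainder, and converts an all-zero NV12 sample to RGB with the standard BT.709 limited-range matrix. The remainder happens to equal 64 bytes per 16×16 macroblock plus 32; reading that as a per-macroblock area appended by the driver is an assumption here, not something the trace shows. The helper names are hypothetical.

```python
def nv12_size(width, height):
    """Bytes in an 8-bit NV12 frame: full-res Y plane + half-res interleaved UV."""
    return width * height * 3 // 2


def bt709_limited_to_rgb(y, u, v):
    """Convert one 8-bit BT.709 limited-range YCbCr sample to clamped 8-bit RGB."""
    c = (y - 16) * 255.0 / 219.0     # luma expanded to full range
    d = (u - 128) * 255.0 / 224.0    # Cb expanded to full range
    e = (v - 128) * 255.0 / 224.0    # Cr expanded to full range
    r = c + 1.5748 * e
    g = c - 0.1873 * d - 0.4681 * e
    b = c + 1.8556 * d

    def clamp(x):
        return max(0, min(255, round(x)))

    return clamp(r), clamp(g), clamp(b)


BYTESUSED = 3655712                               # from the ftrace row above
payload = nv12_size(1920, 1088)                   # 3133440
remainder = BYTESUSED - payload                   # 522272
macroblocks = (1920 // 16) * (1088 // 16)         # 120 * 68 = 8160
matches_mb_layout = remainder == 64 * macroblocks + 32

zero_frame_rgb = bt709_limited_to_rgb(0, 0, 0)    # (0, 77, 0): dark green
```

The all-zero sample clamps to zero red and blue with a positive green channel, which agrees with the solid green frame reported for the vaapi-copy path.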
Corrected verdict: hantro accepts the request, returns success, and writes ALL ZEROS to the CAPTURE buffer. The patch-0011 sentinel test we relied on is misleading — it has a cache-coherency bug. Patch 0011 writes 0xab via cached surface_object->destination_map[0] mmap, but neither 0010-DEBUG-hex-dump nor any other read path in libva-v4l2-request invalidates the cache after DQBUF. So the readback always shows the stale sentinel, hiding the fact that the kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.
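One way to make a CPU readback see what the device wrote is the DMA-BUF sync ioctl bracket. The sketch below shows only the shape of that bracket and rests on several assumptions: that the CAPTURE buffer has been exported (for example via VIDIOC_EXPBUF) and is read through an mmap of that dma-buf fd rather than through the existing V4L2 mapping, and that the ioctl number follows the generic `_IOC` layout used on arm64 and x86. `read_synced` is a hypothetical helper, and the driver itself would do the equivalent in C.

```python
import fcntl
import struct

# Flag values from <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2


def _iow(type_char, nr, size):
    """Encode _IOW() with the asm-generic ioctl layout (assumed: arm64/x86)."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# struct dma_buf_sync { __u64 flags; }
DMA_BUF_IOCTL_SYNC = _iow("b", 0, struct.calcsize("Q"))


def read_synced(dmabuf_fd, mapping, length):
    """Copy `length` bytes out of `mapping` inside a CPU-read sync bracket.

    `mapping` is assumed to be an mmap of `dmabuf_fd` itself.
    """
    start = struct.pack("Q", DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    end = struct.pack("Q", DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, start)
    try:
        return bytes(mapping[:length])
    finally:
        fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, end)
```

Whether this or an msync-style invalidate is the right fix for the patch-0011 readback depends on how destination_map is actually mapped, which this sketch does not establish.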
Bug surface narrows substantially. The path is:
- libva engagement: ✓
- Contract trace: ✓ no EINVAL, all ioctls succeed
- Hantro request acceptance: ✓ kernel reports success
- Hantro produces meaningful pixel output: ✗ writes ALL ZEROS — almost certainly the bitstream parser silently rejects something (per patch-0011's own commit-message hypothesis: "the apparent 'no picture' output is the kernel-side decode actually producing zeros, e.g. parser rejected the bitstream")
This is consistent with a control-submission bug (something in SPS/PPS/DECODE_PARAMS is off), not a fundamental "we can't drive hantro" problem. Phase 6 work direction sharpens accordingly.
Phase 6 priority list (revised after kernel-side baseline)
- Fix the patch-0011 sentinel test (or replace it). Add msync(MS_SYNC|MS_INVALIDATE) or DMA-BUF cache sync before the readback. Without this, future debugging is unreliable in exactly the same way.
- VIDIOC_G_EXT_CTRLS readback of the request fd before QUEUE: confirms our writes actually stick at the V4L2 layer (e.g. POC sentinel actually stripped to 0 by patch-0015, level_idc actually set, etc.).
- Diff our per-frame control set against FFmpeg's v4l2_request_h264.c (proven working on hantro, downstream branch code.ffmpeg.org/Kwiboo/FFmpeg.git v4l2-request-n8.1). Identify any field FFmpeg sets that we don't.
- Verify SPS submission completeness: VAAPI's VAPictureParameterBufferH264 doesn't carry the full SPS, so we may need to derive profile_idc / seq_parameter_set_id / log2_max_frame_num_minus4 / pic_order_cnt_type / log2_max_pic_order_cnt_lsb_minus4 / max_num_ref_frames from VAAPI fields or by parsing the slice header.
- DECODE_PARAMS slice_header bit_size fields (patch 0008's never-resolved question): if hantro requires them for parse, our zeros could be the silent-reject trigger.
- dyndbg on hantro module: reload with dyndbg="file drivers/media/platform/verisilicon/* +pmflt" to surface compiled-in dev_dbg calls for the next probe.
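The readback and FFmpeg-diff items in the list above both come down to comparing two sets of control field values. The sketch below is one possible shape for that comparison, assuming each control set has already been flattened into a dict of field name to integer (obtaining those dicts from VIDIOC_G_EXT_CTRLS or from an FFmpeg trace is not shown). The function name and every sample value are hypothetical and for illustration only.

```python
def diff_controls(ours, reference):
    """Return {field: (our_value, reference_value)} for every mismatch.

    A field missing on one side is reported with None on that side.
    """
    fields = sorted(set(ours) | set(reference))
    return {
        field: (ours.get(field), reference.get(field))
        for field in fields
        if ours.get(field) != reference.get(field)
    }


# Made-up values, not measurements from either implementation.
ours = {
    "sps.profile_idc": 100,
    "sps.level_idc": 0,
    "sps.max_num_ref_frames": 4,
}
reference = {
    "sps.profile_idc": 100,
    "sps.level_idc": 40,
    "sps.max_num_ref_frames": 4,
    "sps.pic_order_cnt_type": 0,
}
mismatches = diff_controls(ours, reference)
```

With these made-up inputs the result flags `sps.level_idc` as differing and `sps.pic_order_cnt_type` as absent on our side, which is the kind of short list the bisect needs.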
Phase 1 boolean-correctness criterion now must include pixel-content verification, but the verification can't rely on patch 0011 in its current form. Either fix patch 0011's cache sync, or use a different check: e.g. mpv --vo=image and inspect the dumped frame, or a small C reproducer that maps the buffer with proper cache flags and computes a luma histogram.
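The luma-histogram idea can be prototyped offline on a dumped frame before committing to a C reproducer. This is a minimal sketch, assuming the dump is raw NV12 with stride equal to width (no row padding) and that the buffer was read through a coherent path; `classify_luma` and its return labels are hypothetical names.

```python
from collections import Counter

SENTINEL = 0xAB  # fill byte written by the patch-0011 test


def classify_luma(frame, width, height):
    """Classify the Y plane of a raw NV12 frame.

    Returns 'zeros' (every luma byte is 0), 'sentinel' (every luma byte is
    0xAB), 'flat' (one other single value), or 'content' (more than one value).
    """
    plane_size = width * height
    luma = bytes(frame[:plane_size])
    if len(luma) < plane_size:
        raise ValueError("buffer shorter than one luma plane")
    histogram = Counter(luma)
    if len(histogram) > 1:
        return "content"
    value = next(iter(histogram))
    if value == 0:
        return "zeros"
    if value == SENTINEL:
        return "sentinel"
    return "flat"


# Tiny synthetic 4x2 frames (12 bytes each in NV12).
zeros_result = classify_luma(bytes(12), 4, 2)
sentinel_result = classify_luma(bytes([SENTINEL]) * 12, 4, 2)
content_result = classify_luma(bytes(range(12)), 4, 2)
```

A 'content' verdict is necessary but not sufficient for a correct decode; it only rules out the two failure signatures seen so far.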
Source-read references (carry-over from STUDY.md)
For Phase 2 source-read and Phase 6 implementation:
- FFmpeg: libavcodec/v4l2_request.c, v4l2_request_buffer.c, per-codec v4l2_request_h264.c. Already multi-planar, already works on hantro. Closest-API canonical example. Active downstream: code.ffmpeg.org/Kwiboo/FFmpeg/ branch v4l2-request-n8.1. 2024-08 v2 patchset on the FFmpeg list.
- GStreamer v4l2codecs: gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c + gstv4l2codecsh264dec.c. Canonical multi-planar S_FMT / REQBUFS / EXPBUF + request-API control submission on the exact Rockchip drivers we target.
- Chromium: media/gpu/v4l2/v4l2_video_decoder_backend_stateless.{h,cc} + v4l2_queue.cc. ChromeOS-mature multi-planar; higher abstraction than we need but useful for surface lifecycle / request-fd tracking patterns.
Test fixtures
- Test clip: /moviedata/fourier-test/bbb_1080p30_h264.mp4 on doppler (SHA-16 dcf8a7170fbd49bb, 1920×1080 H.264, 24 fps source). Already present at /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 on ohm from the fourier_attribution campaign. Pull via hertz lxc file pull if not present elsewhere.
- Reference path that already works on the same hardware: gst-launch-1.0 filesrc ! qtdemux ! h264parse ! v4l2slh264dec ! waylandsink (6 % CPU, zero drops on ohm). That's the ceiling at the workload-end; libva path is expected to match within rounding once accepted. (Ceiling info noted; not a Phase 1 binding cell, since performance is deferred.)
Build + install on ohm
- meson setup build && ninja -C build directly on ohm. Small library; ~265 KB .so. No distcc (operator instruction; not enough work to be worth the orchestration).
- Install path: /usr/lib/dri/v4l2_request_drv_video.so.
- Activate: LIBVA_DRIVER_NAME=v4l2_request + LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 + LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0.
- Once the port works: package as marfrit/libva-v4l2-request-fourier next to ffmpeg-v4l2-request-git, with provides=(libva-v4l2-request-git) shape. (Out of Phase 1 scope; packaging is post-Phase-7.)