Re-baselined libva-v4l2-request decode path with kernel-side observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug) and visual disambiguator (mpv --vo=gpu in operator's live Plasma session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame: ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088 + hantro tile padding). dmesg completely silent: no hantro/vpu/decode/error/warn messages.
2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE. Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would render cream). Both colors are consistent with the kernel writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same buffer GL-imported as DMA-BUF with different colorspace → blue).
3. Patch 0011 sentinel test has a cache-coherency bug: writes 0xab via cached surface_object->destination_map[0] mmap, never invalidates cache before readback. So the readback always shows the stale sentinel even when kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.

This corrects the previous Phase 0 verdicts twice in one day:

- Original commit f15ba8b ("the 2026-04-26 picture holds") was wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel output, sentinel survives") was half right: kernel does write, writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis: "All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent 'no picture' output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream)." That hypothesis was right; we just couldn't confirm it via the sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't engage hantro"; it's "hantro engages but its parser produces zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS readback to verify writes stick, diff against FFmpeg's v4l2_request_h264.c (proven working on hantro), verify SPS completeness, resolve patch 0008's slice_header bit_size open question, dyndbg the hantro module, etc. Phase 1 boolean-correctness criterion needs a working pixel-content check before lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 deliverable #1 + #4 amendment — kernel-side observability (2026-05-04)
In-session re-baseline of the libva-v4l2-request decode path with kernel-side observability (ftrace v4l2/vb2/dma_fence events + dmesg + dynamic_debug on v4l2-core/videobuf2 + a real-VO visual disambiguator). Supersedes the userspace-only baseline from phase0_evidence/2026-05-04/findings.md.
Verdict
Hantro accepts the request, reports successful buffer write, and produces all-zero output. The patch-0011 sentinel test that we previously read as "kernel never wrote the buffer" was misleading due to a cache-coherency bug in the patch's mmap read — kernel DMA-writes the buffer (with zeros), and the cached userspace mmap continues to show the stale sentinel 0xab until something invalidates the cache. Tools that DO invalidate (vaapi-copy via libva's vaMapBuffer, GL DMA-BUF import via Mesa) see the real contents (zeros).
This corrects the previous Phase 0 verdict twice in one day:
- Original commit f15ba8b: "the 2026-04-26 picture holds". Wrong; the contract trace was clean but pixel content wasn't checked.
- Revised commit e892cea: "kernel produces no decoded pixel output, sentinel survives". Half right; kernel does write, but writes zeros, and the sentinel test was reading stale cache.
- Now (this finding): kernel writes, writes ALL ZEROS, hantro is silently failing the bitstream parse or some control validation.
The actual bug surface narrows substantially. Phase 6 work is now bisecting the control-submission / bitstream-parse path, not investigating "why doesn't kernel get called."
Evidence
ftrace v4l2/vb2 events (132 lines, full mpv vaapi-copy --frames=2 lifecycle)
For every CAPTURE_MPLANE QBUF/DQBUF cycle, hantro reports:
vb2_buf_done: ... index = 0, type = 9, bytesused = 3655712, timestamp = 1777893356955599000
type=9 is V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE. bytesused=3655712 is the 1920×1088 NV12 frame (3,133,440 bytes) plus 522,272 bytes of hantro-specific trailing data in the same plane. Kernel signals a successful full-buffer write. Identical signature for every one of 10 CAPTURE buffers in the trace.
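The bytesused figure can be decomposed by hand. The sketch below does the arithmetic for an 8-bit NV12 frame and computes what is left over. The helper names are hypothetical, and the last function encodes an assumption that is not taken from the trace: that the remainder is 64 bytes of per-macroblock auxiliary data plus a 32-byte tail, which is how I recall the hantro driver sizing its H.264 motion-vector area. It happens to match the number exactly, but it should be checked against the driver source before being relied on.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in an 8-bit NV12 frame: a full-resolution Y plane followed by
 * an interleaved UV plane at half the size. */
size_t nv12_frame_bytes(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + (width * height) / 2;
}

/* Bytes reported in bytesused that are not NV12 pixel data. */
size_t trailing_bytes(size_t bytesused, size_t width, size_t height)
{
    size_t frame = nv12_frame_bytes(width, height);

    assert(bytesused >= frame);
    return bytesused - frame;
}

/* ASSUMPTION (not from the trace): 16x16 macroblocks, 64 bytes of
 * auxiliary data per macroblock, plus a fixed 32-byte tail. */
size_t assumed_mb_aux_bytes(size_t width, size_t height)
{
    return 64 * (width / 16) * (height / 16) + 32;
}
```

For 1920×1088 the frame is 3,133,440 bytes, the remainder of 3655712 is 522,272 bytes, and 64 × 120 × 68 + 32 is also 522,272. Either way, bytesused here equals the full plane size, so it says nothing about whether real pixels were produced.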
OUTPUT_MPLANE QBUF for each frame shows realistic per-slice sizes: 6272 (IDR), 108 (small P), 109, 144299 (B-frame size), 117, 122, etc. Real H.264 bitstream, real per-frame sizes.
v4l2_dqbuf reports bytesused=0 to userspace — that's the multi-plane top-level field which is always 0; the per-plane bytesused lives in m.planes[] and isn't traced by these events.
dmesg — completely silent
$ grep -iE "hantro|vpu|decode|fail|error|reject|einval|warn" dmesg.txt | wc -l
0
No kernel-side error indication. Hantro doesn't log when its bitstream parser rejects something at this severity tier. (Worth turning on full dynamic_debug for the verisilicon driver in a future probe.)
Visual disambiguator — the load-bearing data point
After the prior commit (e892cea) declared the sentinel test conclusive, the operator ran two real-VO tests in their live Plasma 6 Wayland session:
| Variant | Decode path | Render path | Result |
|---|---|---|---|
| 1 | --hwdec=vaapi-copy (vaMapBuffer copies CAPTURE → system memory, then GL upload) | --vo=gpu | Solid GREEN frame |
| 2 | --hwdec=vaapi (CAPTURE stays as DMA-BUF) | --vo=gpu (Mesa imports DMA-BUF as GL texture) | Solid BLUE frame |
Solid green for variant 1 means: NV12 luma=0, chroma=0, BT.709 limited-range expansion → green is the canonical "all-zeros NV12 displayed as RGB" color. Solid blue for variant 2 means same buffer, different colorspace/format interpretation by Mesa's DMA-BUF importer. Neither shows the sentinel mid-beige (0xab on both Y and UV would render as cream/beige), and neither shows real bunny pixels.
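The "all-zero NV12 is green" step can be checked numerically. The sketch below converts one 8-bit Y/Cb/Cr triple to RGB using the standard BT.709 limited-range coefficients. It is an illustration only: the type and function names are made up here, and mpv's actual shader path (rounding, gamma handling, colourspace tagging) is assumed to be close enough for a sanity check rather than reproduced.

```c
#include <stdint.h>

struct rgb8 {
    uint8_t r, g, b;
};

/* Clamp a normalised value to [0, 1] and scale to 8 bits with rounding. */
static uint8_t unit_to_u8(double v)
{
    if (v < 0.0)
        v = 0.0;
    if (v > 1.0)
        v = 1.0;
    return (uint8_t)(v * 255.0 + 0.5);
}

/* BT.709, limited range: Y in [16, 235], Cb/Cr in [16, 240]. */
struct rgb8 bt709_limited_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
    double yn = (y - 16) / 219.0;
    double pb = (cb - 128) / 224.0;
    double pr = (cr - 128) / 224.0;
    struct rgb8 out = {
        unit_to_u8(yn + 1.5748 * pr),
        unit_to_u8(yn - 0.1873 * pb - 0.4681 * pr),
        unit_to_u8(yn + 1.8556 * pb),
    };

    return out;
}
```

With (0, 0, 0) the red and blue channels clamp to zero and only green survives, which matches variant 1. Under the same coefficients a uniform (0xab, 0xab, 0xab) buffer saturates red and blue with green at roughly 58%, a light pink tone, so an intact sentinel would still be clearly distinguishable from both the green and the blue frame.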
This rules out "decode produces real pixels" (would be bunny). It rules out "kernel never wrote buffer" (would be sentinel beige in variant 1, since vaapi-copy does its own cache invalidation and would see whatever's in physical memory — and the strace already showed the sentinel was the last thing written there pre-DMA).
The only remaining explanation: kernel DMA-writes zeros to the buffer, which overwrites the sentinel in physical memory. vaapi-copy and Mesa DMA-BUF import both correctly cache-invalidate and see the zeros. Patch 0011's sentinel-readback in libva-v4l2-request's picture.c::RequestEndPicture reads from a stale cache view — sees the sentinel that's no longer in physical memory.
Cache-coherency bug in patch 0011 — root cause
Patch 0011 instruments src/picture.c::RequestEndPicture to write 0xab × 32 into surface_object->destination_map[0] immediately before VIDIOC_QBUF. The mmap pointer was set up by libva-v4l2-request's surface.c via mmap(MAP_SHARED, V4L2_MEMORY_MMAP fd). On hantro/RK3568 (CMA-backed, dma-contig allocator), this mmap is CACHED by default unless the queue requests V4L2_MEMORY_DMABUF or the driver applies non-cached attrs. After QBUF → kernel DMA write → DQBUF, the userspace cached view is stale relative to physical memory.
Patch 0010's hex-dump (which reads back the buffer for the DEBUG output) reads from the same cached mmap — which is why the dump consistently shows ab ab ab ab ...: the cache line containing the first 32 bytes was filled by the userspace 0xab write and never invalidated.
The fix for the test (so it stops misleading future probes):
- msync(p, N, MS_SYNC | MS_INVALIDATE) after DQBUF, before reading
- OR call DMA-BUF cache sync via DMA_BUF_IOCTL_SYNC if using an EXPBUF'd fd
- OR allocate via V4L2_MEMORY_DMABUF instead of MMAP (changes the ownership model, so it is a different fix)
- OR use VIDIOC_PREPARE_BUF with cache-flush flags (not supported by hantro M2M IIUC)
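Of these options, the DMA_BUF_IOCTL_SYNC route is the easiest to sketch in isolation. The helper below brackets a CPU read with the start/end sync calls from linux/dma-buf.h. It assumes the CAPTURE buffer was exported with VIDIOC_EXPBUF and that map points at an mmap of that exported fd; both function names are hypothetical and nothing here is taken from patch 0011 itself.

```c
#include <errno.h>
#include <linux/dma-buf.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>

/* Issue one DMA_BUF_IOCTL_SYNC call, retrying if interrupted. */
static int dmabuf_sync(int dmabuf_fd, __u64 flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Copy len bytes out of a dma-buf mapping inside a READ sync bracket,
 * so the CPU view is made coherent with what the device wrote.
 * Returns 0 on success, -1 with errno set on failure. */
int read_capture_synced(int dmabuf_fd, const void *map, void *dst, size_t len)
{
    if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) == -1)
        return -1;

    memcpy(dst, map, len);

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

Called after VIDIOC_DQBUF, the bytes in dst are then what the hex-dump should print instead of reading destination_map[0] directly. Whether this alone resolves the stale read on RK3568 is untested.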
The fix doesn't matter for production — patches 0010/0011 are explicitly DEBUG-only (PKGBUILD comments say "Removed before upstream submission"). What matters is the test was load-bearing for diagnosis and was wrong. The whole cascade of "decode is broken" → "we need kernel observability" → this finding came from trusting it.
What hantro is actually doing (and not doing)
Hantro's H.264 frontend likely does this for our requests:
- Accepts the V4L2_BUF + REQUEST_FD pair
- Reads the V4L2_CTRL_TYPE_H264_* controls from the request
- Validates them; if validation fails silently, falls through
- Parses the OUTPUT slice
- If parse fails (e.g. SPS/PPS mismatch with slice data, scaling matrix shape wrong, decode_mode wrong), the hardware pipeline still runs but produces zeros — because the decode "kernel" (the actual silicon pipeline) gets garbage matrix coefficients or stops at slice header
- vb2_set_plane_payload(buf, 0, full_buffer_size) is called regardless, which explains the bytesused=3655712 signal even though no real pixels were written
This is consistent with what patch 0011's commit message anticipated:
All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent "no picture" output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream).
That hypothesis was right; we just couldn't confirm it via the sentinel test (cache bug) and went down the wrong rabbit hole.
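Once the readback goes through a coherent view, the three outcomes the patch-0011 commit message distinguishes can be told apart mechanically. A minimal sketch of such a classifier, with hypothetical names; it only inspects the probed bytes (32 in patch 0011) and assumes the sentinel byte is non-zero:

```c
#include <stddef.h>
#include <stdint.h>

enum probe_result {
    PROBE_SENTINEL_INTACT, /* device never wrote the probed bytes */
    PROBE_ALL_ZERO,        /* device wrote, but only zeros */
    PROBE_OTHER_DATA       /* something else: plausibly decoded pixels */
};

/* Classify the probed prefix of a CAPTURE buffer after DQBUF.
 * The result is only meaningful if buf is a coherent view of memory. */
enum probe_result classify_probe(const uint8_t *buf, size_t len, uint8_t sentinel)
{
    size_t zeros = 0, sentinels = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0)
            zeros++;
        else if (buf[i] == sentinel)
            sentinels++;
    }

    if (len > 0 && sentinels == len)
        return PROBE_SENTINEL_INTACT;
    if (len > 0 && zeros == len)
        return PROBE_ALL_ZERO;
    return PROBE_OTHER_DATA;
}
```

A 32-byte prefix is a weak proxy for "the frame decoded", so PROBE_OTHER_DATA should be read as "not obviously broken" rather than as a pass.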
Phase 6 priority list (substantially narrowed)
The bug isn't "we can't engage hantro." The bug is "hantro engages but its parser produces zeros." Bisect the control submission:
- Use VIDIOC_G_EXT_CTRLS to read controls back from the request fd and compare against what 0008-h264-fill-decode-params-from-vaapi.patch populated. Any discrepancy = our writes aren't sticking. Caveat: the V4L2 documentation lists EACCES for getting a control from a request that has not yet completed, so the readback most likely has to run after the request completes rather than before QUEUE.
- Compare the per-frame control set against FFmpeg's v4l2_request_h264.c (proven working on hantro, per Bootlin's downstream branch v4l2-request-n8.1). Diff: which fields does FFmpeg set that we don't?
- Verify SPS submission: the V4L2_CTRL_TYPE_H264_SPS control needs the full SPS struct. Patch 0013/0018 set level_idc; verify the other SPS fields (profile_idc, seq_parameter_set_id, log2_max_frame_num_minus4, pic_order_cnt_type, log2_max_pic_order_cnt_lsb_minus4, max_num_ref_frames, etc.). These are not directly available from VAAPI's VAPictureParameterBufferH264, so we need to derive them from the bitstream or rely on VAAPI passing them.
- DECODE_PARAMS bit_size fields (patch 0008's open question, never resolved): if hantro uses these for slice header parsing in FRAME_BASED mode, our zeros could trigger silent reject. FFmpeg sets these by parsing the slice header bit-precisely.
- POC sentinel: re-run with VIDIOC_G_EXT_CTRLS readback after patch-0015's strip to confirm the values reach V4L2 as 0/0, not 65536/65536.
- reference_ts for the IDR frame: even though IDRs have no references, hantro may still validate the field. Should be 0 / monotonic from frame 0.
Next observable I should add to the rig
For the next decode probe:
- Enable dyndbg="file drivers/media/platform/verisilicon/* +pmflt" on the hantro module to surface any compiled-in dev_dbg calls.
- (If module reload is acceptable) reload hantro with dyndbg set at module-insert time.
- Add ftrace function-graph for hantro_* symbols to see the kernel decode path.
- Add a v4l2-ctl based reproducer (smaller and easier to instrument than mpv): feed bbb_h264.h264 (Annex B raw) directly via v4l2-ctl --stream-out-mmap. The request-API shape makes this awkward, so it may need a small C reproducer instead.
Phase 0 status (re-revised)
- #1 Re-verify failure-mode finding: ✗ kernel side completes successfully but produces zero-pixel output. Bitstream parser silently fails. Original 2026-04-26 STUDY claim was wrong at the pixel-content layer.
- #2 Step 1 reconciliation: ✓ done in commit 74b3793.
- #3 Firefox engagement: ✓ engages libva in live session; same zero-pixel decode failure as mpv. Falls back to SW after frame 0.
- #4 Phase 0 baseline anchor: partial. Userspace contract trace ✓, kernel-side ftrace ✓ (this run), but the patch-0011 sentinel test we relied on for pixel-content verification is buggy. Need to fix or replace the pixel-content check before Phase 1 lock.
Artifacts
- ftrace.txt: 132 lines, v4l2/vb2/dma_fence events for the mpv vaapi-copy --frames=2 run
- dmesg.txt: 63 lines, no hantro/vpu/decode/error/warn messages
- mpv.stderr: 40 lines, the .so DEBUG dumps (10 sentinel-survivor CAPTURE reads, all of which we now know are cache-stale)
- mpv.stdout: 107 lines, normal mpv playback log (Using hardware decoding (vaapi-copy).)