# Phase 0 deliverable #1 + #4 amendment: kernel-side observability (2026-05-04)

In-session re-baseline of the libva-v4l2-request decode path with kernel-side observability (ftrace v4l2/vb2/dma_fence events + dmesg + dynamic_debug on v4l2-core/videobuf2 + a real-VO visual disambiguator). Supersedes the userspace-only baseline from `phase0_evidence/2026-05-04/findings.md`.

## Verdict

**Hantro accepts the request, reports a successful buffer write, and produces all-zero output.** The patch-0011 sentinel test that we previously read as "kernel never wrote the buffer" was misleading due to a cache-coherency bug in the patch's mmap read: the kernel DMA-writes the buffer (with zeros), and the cached userspace mmap continues to show the stale sentinel `0xab` until something invalidates the cache. Tools that DO invalidate (vaapi-copy via libva's vaMapBuffer, GL DMA-BUF import via Mesa) see the real contents (zeros).

**This is the second correction to the Phase 0 verdict in one day**:

- Original prior commit `f15ba8b`: "the 2026-04-26 picture holds". Wrong: the contract trace was clean, but pixel content wasn't checked.
- Revised commit `e892cea`: "kernel produces no decoded pixel output, sentinel survives". Half right: the kernel does write, but it writes zeros, and the sentinel test was reading stale cache.
- Now (this finding): **kernel writes, writes ALL ZEROS, hantro is silently failing the bitstream parse or some control validation.**

The actual bug surface narrows substantially. Phase 6 work is now bisecting the control-submission / bitstream-parse path, not investigating "why doesn't kernel get called."

## Evidence

### ftrace v4l2/vb2 events (132 lines, full mpv vaapi-copy --frames=2 lifecycle)

For every CAPTURE_MPLANE QBUF/DQBUF cycle, hantro reports:

```
vb2_buf_done: ... index = 0, type = 9, bytesused = 3655712, timestamp = 1777893356955599000
```

`type=9 = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE`, `bytesused=3655712 = 1920×1088 NV12 + hantro tile padding`.
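
The `bytesused` value can be decomposed arithmetically. The NV12 part follows directly from the frame size. The remainder is modelled here as per-macroblock auxiliary storage (64 bytes per 16×16 macroblock plus a 32-byte trailer), which is an assumption about how the driver sizes H.264 CAPTURE buffers, not something the trace shows. All function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 16x16 macroblocks needed to cover `pixels` along one axis. */
size_t mb_count(size_t pixels)
{
    return (pixels + 15) / 16;
}

/* NV12 payload: full-resolution Y plane plus half-size interleaved CbCr. */
size_t nv12_size(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0); /* 4:2:0 needs even dimensions */
    return width * height * 3 / 2;
}

/*
 * ASSUMPTION (not established by the trace): the extra space is 64 bytes
 * per macroblock plus a 32-byte trailer, appended after the NV12 planes.
 */
size_t assumed_extra_size(size_t width, size_t height)
{
    return 64 * mb_count(width) * mb_count(height) + 32;
}
```

For 1920×1088 this gives 3,133,440 bytes of NV12 and 522,272 bytes of assumed extra area, which together equal the traced 3,655,712. The match is suggestive but does not by itself prove what the extra area holds.
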
**Kernel signals a successful full-buffer write.** Identical signature for every one of 10 CAPTURE buffers in the trace.

OUTPUT_MPLANE QBUF for each frame shows realistic per-slice sizes: 6272 (IDR), 108 (small P), 109, 144299 (B-frame size), 117, 122, etc. Real H.264 bitstream, real per-frame sizes.

`v4l2_dqbuf` reports `bytesused=0` to userspace. That is the multi-plane top-level field, which is always 0; the per-plane bytesused lives in `m.planes[]` and isn't traced by these events.

### dmesg: completely silent

```
$ grep -iE "hantro|vpu|decode|fail|error|reject|einval|warn" dmesg.txt | wc -l
0
```

No kernel-side error indication. Hantro doesn't log when its bitstream parser rejects something at this severity tier. (Worth turning on full dynamic_debug for the verisilicon driver in a future probe.)

### Visual disambiguator: the load-bearing data point

After the prior commit (`e892cea`) declared the sentinel test conclusive, the operator ran two real-VO tests in their live Plasma 6 Wayland session:

| Variant | Decode path | Render path | Result |
|---|---|---|---|
| 1 | `--hwdec=vaapi-copy` (vaMapBuffer copies CAPTURE → system memory, then GL upload) | `--vo=gpu` | **Solid GREEN frame** |
| 2 | `--hwdec=vaapi` (CAPTURE stays as DMA-BUF) | `--vo=gpu` (Mesa-imports DMA-BUF as GL texture) | **Solid BLUE frame** |

Solid green for variant 1 means: NV12 luma=0, chroma=0, BT.709 limited-range expansion → green is the canonical "all-zeros NV12 displayed as RGB" color. Solid blue for variant 2 means same buffer, different colorspace/format interpretation by Mesa's DMA-BUF importer. **Neither shows the sentinel mid-beige (`0xab` on both Y and UV would render as cream/beige), and neither shows real bunny pixels.**

This rules out "decode produces real pixels" (would be bunny).
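
The green reading for an all-zero NV12 sample can be checked numerically. The sketch below applies the standard 8-bit BT.709 limited-range expansion with clamping; the coefficients are rounded, and the exact matrix mpv or the GPU used in this session is an assumption. `struct rgb8` and `bt709_limited_to_rgb` are hypothetical names.

```c
#include <stdint.h>

struct rgb8 {
    uint8_t r, g, b;
};

/* Clamp to [0, 255] and round to nearest. */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* 8-bit limited-range BT.709 Y'CbCr to full-range R'G'B'. */
struct rgb8 bt709_limited_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
    double yl = 1.164383 * ((double)y - 16.0); /* 255 / 219 luma scale */
    double pb = (double)cb - 128.0;            /* centred chroma */
    double pr = (double)cr - 128.0;
    struct rgb8 out = {
        clamp_u8(yl + 1.792741 * pr),
        clamp_u8(yl - 0.213249 * pb - 0.532909 * pr),
        clamp_u8(yl + 2.112402 * pb),
    };
    return out;
}
```

With these coefficients, `bt709_limited_to_rgb(0, 0, 0)` clamps red and blue to 0 and leaves green at about 77, a flat dark green, which matches the variant 1 observation.
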
The visual result also rules out "kernel never wrote buffer" (that would be sentinel beige in variant 1, since vaapi-copy does its own cache invalidation and would see whatever's in physical memory, and the strace already showed the sentinel was the last thing written there pre-DMA).

The only remaining explanation: **kernel DMA-writes zeros to the buffer**, which overwrites the sentinel in physical memory. vaapi-copy and Mesa DMA-BUF import both correctly cache-invalidate and see the zeros. Patch 0011's sentinel-readback in libva-v4l2-request's `picture.c::RequestEndPicture` reads from a stale cache view and sees a sentinel that's no longer in physical memory.

## Cache-coherency bug in patch 0011: root cause

Patch 0011 instruments `src/picture.c::RequestEndPicture` to write `0xab × 32` into `surface_object->destination_map[0]` immediately before VIDIOC_QBUF. The mmap pointer was set up by libva-v4l2-request's `surface.c` via `mmap(MAP_SHARED, V4L2_MEMORY_MMAP fd)`. On hantro/RK3568 (CMA-backed, dma-contig allocator), this mmap is **CACHED by default unless the queue requests V4L2_MEMORY_DMABUF or the driver applies non-cached attrs**.

After QBUF → kernel DMA write → DQBUF, the userspace cached view is stale relative to physical memory. Patch 0010's hex-dump (which reads back the buffer for the DEBUG output) reads from the same cached mmap, which is why the dump consistently shows `ab ab ab ab ...`: the cache line containing the first 32 bytes was filled by the userspace 0xab write and never invalidated.

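
A readback that does not depend on the state of the CPU cache can be sketched as follows, assuming the CAPTURE buffer has been exported with `VIDIOC_EXPBUF` and mapped through the resulting dma-buf fd. Patch 0011 as described maps the V4L2 buffer directly, so this is a variation on the test, not the existing code. `dmabuf_read_coherent` is a hypothetical helper; `DMA_BUF_IOCTL_SYNC` and the `DMA_BUF_SYNC_*` flags come from `<linux/dma-buf.h>`.

```c
#include <errno.h>
#include <linux/dma-buf.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Copy `len` bytes out of a dma-buf mapping, bracketing the CPU read with
 * begin/end CPU-access calls so the exporter can do cache maintenance.
 * ASSUMPTION: `map` was obtained by mmap() on `dmabuf_fd` itself.
 * Returns 0 on success or a negative errno value.
 */
int dmabuf_read_coherent(int dmabuf_fd, const void *map, void *dst, size_t len)
{
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ,
    };

    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
        return -errno;

    memcpy(dst, map, len);

    sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
        return -errno;

    return 0;
}
```

The helper would be called after DQBUF and before any hex dump, so the dump reads the copied bytes instead of the long-lived mapping. Whether the exporter actually performs cache maintenance on this platform still needs to be confirmed on the rig.
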

The fix for the test (so it stops misleading future probes):

- `msync(p, N, MS_SYNC | MS_INVALIDATE)` after DQBUF before reading
- OR call DMA-BUF cache sync via `DMA_BUF_IOCTL_SYNC` if using EXPBUF'd fd
- OR allocate via V4L2_MEMORY_DMABUF instead of MMAP (changes the ownership model, so a different kind of fix)
- OR use VIDIOC_PREPARE_BUF with cache-flush flags (not supported by hantro M2M IIUC)

The fix doesn't matter for production: patches 0010/0011 are explicitly DEBUG-only (PKGBUILD comments say "Removed before upstream submission"). What matters is **the test was load-bearing for diagnosis and was wrong**. The whole cascade of "decode is broken" → "we need kernel observability" → this finding came from trusting it.

## What hantro is actually doing (and not doing)

Hantro's H.264 frontend likely does this for our requests:

1. Accepts the V4L2_BUF + REQUEST_FD pair
2. Reads the V4L2_CTRL_TYPE_H264_* controls from the request
3. Validates them; if validation fails silently, falls through
4. Parses the OUTPUT slice
5. If parse fails (e.g. SPS/PPS mismatch with slice data, scaling matrix shape wrong, decode_mode wrong), the hardware pipeline still runs but produces zeros, because the decode "kernel" (the actual silicon pipeline) gets garbage matrix coefficients or stops at slice header
6. `vb2_set_plane_payload(buf, 0, full_buffer_size)` is called regardless, which would explain the bytesused=3655712 signal even though no real pixels were written

This is consistent with what patch 0011's commit message anticipated:

> All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent "no picture" output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream).

That hypothesis was **right**; we just couldn't confirm it via the sentinel test (cache bug) and went down the wrong rabbit hole.

## Phase 6 priority list (substantially narrowed)

The bug isn't "we can't engage hantro." The bug is "hantro engages but its parser produces zeros."
Bisect the control submission:

1. **Use `VIDIOC_G_EXT_CTRLS` to read controls back from the request fd before QUEUE.** Compare against what `0008-h264-fill-decode-params-from-vaapi.patch` populated. Any discrepancy = our writes aren't sticking.
2. **Compare the per-frame control set against FFmpeg's `v4l2_request_h264.c`** (proven working on hantro, per Bootlin's downstream branch `v4l2-request-n8.1`). Diff: which fields does FFmpeg set that we don't?
3. **Verify SPS submission:** the V4L2_CTRL_TYPE_H264_SPS control needs the full SPS struct. Patch 0013/0018 set level_idc; verify the other SPS fields (`profile_idc`, `seq_parameter_set_id`, `log2_max_frame_num_minus4`, `pic_order_cnt_type`, `log2_max_pic_order_cnt_lsb_minus4`, `max_num_ref_frames`, etc.). VAAPI's `VAPictureParameterBufferH264` does not carry all of these directly, so we need to derive them from the bitstream or rely on VAAPI passing them.
4. **DECODE_PARAMS bit_size fields** (patch 0008's open question, never resolved): if hantro uses these for slice header parsing in FRAME_BASED mode, our zeros could trigger a silent reject. FFmpeg sets these by parsing the slice header bit-precisely.
5. **POC sentinel:** re-run with VIDIOC_G_EXT_CTRLS readback after patch-0015's strip to confirm the values reach V4L2 as 0/0, not 65536/65536.
6. **`reference_ts` for the IDR frame**: even though IDRs have no references, hantro may still validate the field. Should be 0 / monotonic from frame 0.

## Next observables I should add to the rig

For the next decode probe:

- Enable `dyndbg="file drivers/media/platform/verisilicon/* +pmflt"` on the hantro module to surface any compiled-in `dev_dbg` calls.
- (If module reload is acceptable) reload hantro with `dyndbg` set at module-insert time.
- Add ftrace function-graph for `hantro_*` symbols to see the kernel decode path.
- Add a v4l2-ctl based reproducer (smaller and easier to instrument than mpv): feed bbb_h264.h264 (Annex B raw) directly via `v4l2-ctl --stream-out-mmap`, although the request-API shape makes this awkward. May need to write a small C reproducer.

## Phase 0 status (re-revised)

- **#1 Re-verify failure-mode finding**: ✗ kernel side completes successfully but produces zero-pixel output. Bitstream parser silently fails. Original 2026-04-26 STUDY claim was wrong at the pixel-content layer.
- **#2 Step 1 reconciliation**: ✓ done in commit `74b3793`.
- **#3 Firefox engagement**: ✓ engages libva in live session; same zero-pixel decode failure as mpv. Falls back to SW after frame 0.
- **#4 Phase 0 baseline anchor**: partial. Userspace contract trace ✓, kernel-side ftrace ✓ (this run), but the patch-0011 sentinel test we relied on for pixel-content verification is buggy. **Need to fix or replace the pixel-content check** before Phase 1 lock.

## Artifacts

- `ftrace.txt`: 132 lines, v4l2/vb2/dma_fence events for the mpv vaapi-copy --frames=2 run
- `dmesg.txt`: 63 lines, no hantro/vpu/decode/error/warn messages
- `mpv.stderr`: 40 lines, the .so DEBUG dumps (10 sentinel-survivor CAPTURE reads, all of which we now know are cache-stale)
- `mpv.stdout`: 107 lines, normal mpv playback log (`Using hardware decoding (vaapi-copy).`)