365764fffb
Re-baselined libva-v4l2-request decode path with kernel-side observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug) and visual disambiguator (mpv --vo=gpu in operator's live Plasma session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame: ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088 + hantro tile padding). dmesg completely silent: no hantro/vpu/decode/error/warn messages.
2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE. Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would render cream). Both colors are consistent with the kernel writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same buffer GL-imported as DMA-BUF with different colorspace → blue).
3. Patch 0011 sentinel test has a cache-coherency bug: writes 0xab via cached surface_object->destination_map[0] mmap, never invalidates cache before readback. So the readback always shows the stale sentinel even when kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.

This corrects the previous Phase 0 verdicts twice in one day:

- Original commit f15ba8b ("the 2026-04-26 picture holds") was wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel output, sentinel survives") was half right: kernel does write, writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis: "All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent 'no picture' output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream)."

That hypothesis was right; we just couldn't confirm it via the sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't engage hantro"; it's "hantro engages but its parser produces zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS readback to verify writes stick, diff against FFmpeg's v4l2_request_h264.c (proven working on hantro), verify SPS completeness, resolve patch 0008's slice_header bit_size open question, dyndbg the hantro module, etc. Phase 1 boolean-correctness criterion needs a working pixel-content check before lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Phase 0 deliverable #1 + #4 amendment — kernel-side observability (2026-05-04)

In-session re-baseline of the libva-v4l2-request decode path with kernel-side observability: ftrace v4l2/vb2/dma_fence events, dmesg, dynamic_debug on v4l2-core/videobuf2, and a real-VO visual disambiguator. Supersedes the userspace-only baseline from `phase0_evidence/2026-05-04/findings.md`.

## Verdict

**Hantro accepts the request, reports a successful buffer write, and produces all-zero output.** The patch-0011 sentinel test that we previously read as "kernel never wrote the buffer" was misleading because of a cache-coherency bug in the patch's mmap read: the kernel DMA-writes the buffer (with zeros), and the cached userspace mmap keeps showing the stale sentinel `0xab` until something invalidates the cache. Tools that DO invalidate (vaapi-copy via libva's vaMapBuffer, GL DMA-BUF import via Mesa) see the real contents (zeros).

**This corrects the previous Phase 0 verdict twice in one day**:

- Original prior commit `f15ba8b`: "the 2026-04-26 picture holds". Wrong: the contract trace was clean but pixel content wasn't checked.
- Revised commit `e892cea`: "kernel produces no decoded pixel output, sentinel survives". Half right: the kernel does write, but it writes zeros, and the sentinel test was reading stale cache.
- Now (this finding): **kernel writes, writes ALL ZEROS, hantro is silently failing the bitstream parse or some control validation.**

The actual bug surface narrows substantially. Phase 6 work is now bisecting the control-submission / bitstream-parse path, not investigating "why doesn't kernel get called."

## Evidence

### ftrace v4l2/vb2 events (132 lines, full mpv vaapi-copy --frames=2 lifecycle)

For every CAPTURE_MPLANE QBUF/DQBUF cycle, hantro reports:

```
vb2_buf_done: ... index = 0, type = 9, bytesused = 3655712, timestamp = 1777893356955599000
```

`type = 9` is `V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE`, and `bytesused = 3655712` is 1920×1088 NV12 (3,133,440 bytes) plus hantro tile padding (the remaining 522,272 bytes). **Kernel signals a successful full-buffer write.** The signature is identical for every one of the 10 CAPTURE buffers in the trace.
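
The size decomposition can be checked with a few lines of arithmetic. This is a minimal sketch assuming 8-bit NV12 at 12 bits per pixel. The helper `hantro_h264_extra_bytes` and its formula (64 bytes per 16×16 macroblock plus 32) are an assumption about how the driver sizes the extra region, not something this trace confirms.

```python
WIDTH, HEIGHT = 1920, 1088   # coded size from the trace
BYTESUSED = 3655712          # bytesused reported by vb2_buf_done


def nv12_bytes(width: int, height: int) -> int:
    """8-bit NV12: full-resolution Y plane plus a half-size interleaved UV plane."""
    return width * height * 3 // 2


def hantro_h264_extra_bytes(width: int, height: int) -> int:
    """ASSUMPTION (not confirmed here): 64 bytes per 16x16 macroblock plus 32."""
    macroblocks = (width // 16) * (height // 16)
    return 64 * macroblocks + 32


pixels = nv12_bytes(WIDTH, HEIGHT)
remainder = BYTESUSED - pixels
print(pixels, remainder, hantro_h264_extra_bytes(WIDTH, HEIGHT))
```

The assumed formula reproduces the remainder exactly, which suggests the extra bytes may be per-macroblock decoder metadata rather than tile padding. That reading should be confirmed against the driver source before relying on it.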

OUTPUT_MPLANE QBUF for each frame shows realistic per-slice sizes: 6272 (IDR), 108 (small P), 109, 144299 (B-frame size), 117, 122, etc. This is a real H.264 bitstream with real per-frame sizes.

`v4l2_dqbuf` reports `bytesused=0` to userspace. That is the multi-plane top-level field, which is always 0; the per-plane bytesused lives in `m.planes[]` and isn't traced by these events.
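
To check the "identical signature for every buffer" claim mechanically, rather than by eye, the trace can be tallied. The sketch below assumes event lines carry `key = value` pairs in the same shape as the `vb2_buf_done` line quoted above; the function names are illustrative and not part of any existing rig.

```python
import re
from collections import Counter

# Matches "key = 123" pairs as they appear in the quoted vb2_buf_done line.
FIELD_RE = re.compile(r"(\w+) = (-?\d+)")


def parse_vb2_buf_done(line: str):
    """Return the integer fields of a vb2_buf_done line, or None for other events."""
    if "vb2_buf_done:" not in line:
        return None
    return {key: int(value) for key, value in FIELD_RE.findall(line)}


def bytesused_by_type(lines):
    """Count (type, bytesused) pairs across all vb2_buf_done events."""
    counts = Counter()
    for line in lines:
        fields = parse_vb2_buf_done(line)
        if fields is not None and "type" in fields and "bytesused" in fields:
            counts[(fields["type"], fields["bytesused"])] += 1
    return counts


sample = "vb2_buf_done: ... index = 0, type = 9, bytesused = 3655712, timestamp = 1777893356955599000"
print(parse_vb2_buf_done(sample))
```

Running `bytesused_by_type(open("ftrace.txt"))` should then yield a single `(9, 3655712)` key if every CAPTURE completion really carries the same payload size.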

### dmesg: completely silent

```
$ grep -iE "hantro|vpu|decode|fail|error|reject|einval|warn" dmesg.txt | wc -l
0
```

No kernel-side error indication. If hantro's bitstream parser is rejecting something, it does not log it at this severity tier. (Worth turning on full dynamic_debug for the verisilicon driver in a future probe.)

### Visual disambiguator: the load-bearing data point

After the prior commit (`e892cea`) declared the sentinel test conclusive, the operator ran two real-VO tests in their live Plasma 6 Wayland session:

| Variant | Decode path | Render path | Result |
|---|---|---|---|
| 1 | `--hwdec=vaapi-copy` (vaMapBuffer copies CAPTURE → system memory, then GL upload) | `--vo=gpu` | **Solid GREEN frame** |
| 2 | `--hwdec=vaapi` (CAPTURE stays as DMA-BUF) | `--vo=gpu` (Mesa-imports DMA-BUF as GL texture) | **Solid BLUE frame** |

Solid green for variant 1 means NV12 luma=0 and chroma=0: after BT.709 limited-range expansion, green is the canonical "all-zeros NV12 displayed as RGB" color. Solid blue for variant 2 means the same buffer under a different colorspace/format interpretation by Mesa's DMA-BUF importer. **Neither shows the sentinel mid-beige (`0xab` on both Y and UV would render as cream/beige), and neither shows real bunny pixels.**
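
The "all zeros render green" step can be reproduced numerically. This is a minimal sketch assuming 8-bit limited-range BT.709 with the standard rounded matrix coefficients and simple clamping. mpv's actual shader path may differ in detail, and the function name is illustrative.

```python
def ycbcr709_limited_to_rgb(y: int, cb: int, cr: int):
    """Convert one 8-bit limited-range BT.709 Y'CbCr sample to 8-bit full-range RGB."""
    yn = (y - 16) / 219.0      # luma normalised to 0..1
    pb = (cb - 128) / 224.0    # chroma normalised to -0.5..0.5
    pr = (cr - 128) / 224.0
    r = yn + 1.5748 * pr
    g = yn - 0.1873 * pb - 0.4681 * pr
    b = yn + 1.8556 * pb

    def clamp(v: float) -> float:
        return max(0.0, min(1.0, v))

    return tuple(round(255 * clamp(v)) for v in (r, g, b))


zeros = ycbcr709_limited_to_rgb(0, 0, 0)
sentinel = ycbcr709_limited_to_rgb(0xAB, 0xAB, 0xAB)
print(zeros, sentinel)
```

Under these assumptions the all-zero sample has no red or blue component and a nonzero green one, matching variant 1. The `0xab` sample saturates red and blue, which lands closer to a light pink-magenta than to beige, so the expected sentinel hue is worth rechecking. Either way it is neither green nor blue, so the argument is unaffected.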

This rules out "decode produces real pixels" (would be bunny). It also rules out "kernel never wrote buffer" (would be sentinel beige in variant 1, since vaapi-copy does its own cache invalidation and would see whatever is in physical memory, and the strace already showed the sentinel was the last thing written there pre-DMA).

The only remaining explanation: **kernel DMA-writes zeros to the buffer**, which overwrites the sentinel in physical memory. vaapi-copy and Mesa DMA-BUF import both correctly cache-invalidate and see the zeros. Patch 0011's sentinel-readback in libva-v4l2-request's `picture.c::RequestEndPicture` reads from a stale cache view and sees a sentinel that is no longer in physical memory.

## Cache-coherency bug in patch 0011: root cause

Patch 0011 instruments `src/picture.c::RequestEndPicture` to write `0xab × 32` into `surface_object->destination_map[0]` immediately before VIDIOC_QBUF. The mmap pointer was set up by libva-v4l2-request's `surface.c` via `mmap(MAP_SHARED, V4L2_MEMORY_MMAP fd)`. On hantro/RK3568 (CMA-backed, dma-contig allocator), this mmap is **CACHED by default unless the queue requests V4L2_MEMORY_DMABUF or the driver applies non-cached attrs**. After QBUF → kernel DMA write → DQBUF, the userspace cached view is stale relative to physical memory.

Patch 0010's hex-dump (which reads back the buffer for the DEBUG output) reads from the same cached mmap, which is why the dump consistently shows `ab ab ab ab ...`: the cache line containing the first 32 bytes was filled by the userspace 0xab write and never invalidated.
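
The failure mode described here can be shown with a toy model. This is only an illustration of the reasoning, not a model of the real ARM cache hierarchy: the class and method names are hypothetical, and it assumes the CPU write reaches physical memory before the device writes.

```python
class ToyCachedMapping:
    """Toy model: CPU reads hit a private cache; device DMA bypasses it."""

    def __init__(self, size: int):
        self.physical = bytearray(size)  # what DMA and invalidating readers see
        self.cache = {}                  # offset -> byte held in the CPU cache

    def cpu_write(self, offset: int, data: bytes) -> None:
        for i, value in enumerate(data):
            self.cache[offset + i] = value
            self.physical[offset + i] = value  # assume flushed before QBUF

    def dma_write(self, offset: int, data: bytes) -> None:
        # Device write: physical memory changes, the CPU cache is not updated.
        self.physical[offset:offset + len(data)] = data

    def cpu_read(self, offset: int, length: int) -> bytes:
        return bytes(
            self.cache.get(offset + i, self.physical[offset + i])
            for i in range(length)
        )

    def invalidate(self) -> None:
        self.cache.clear()


buf = ToyCachedMapping(64)
buf.cpu_write(0, b"\xab" * 32)   # patch 0011 writes the sentinel
buf.dma_write(0, bytes(64))      # decoder overwrites the buffer with zeros
stale = buf.cpu_read(0, 32)      # readback without invalidation
buf.invalidate()
fresh = buf.cpu_read(0, 32)      # readback after invalidation
print(stale.hex(), fresh.hex())
```

The first readback corresponds to what patches 0010/0011 report; the second corresponds to what vaapi-copy and the Mesa import see.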

The fix for the test (so it stops misleading future probes) is one of:

- `msync(p, N, MS_SYNC | MS_INVALIDATE)` after DQBUF and before reading (unverified: Linux documents `MS_INVALIDATE` in terms of other mappings of the same file, not CPU cache maintenance for device DMA, so confirm it actually helps here)
- OR call DMA-BUF cache sync via `DMA_BUF_IOCTL_SYNC` if using an EXPBUF'd fd
- OR allocate via V4L2_MEMORY_DMABUF instead of MMAP (changes the ownership model, so it is a different kind of fix)
- OR use VIDIOC_PREPARE_BUF with cache-flush flags (not supported by hantro M2M, as far as I understand)
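
For the `DMA_BUF_IOCTL_SYNC` option, a rig-side readback could look roughly like the sketch below. It assumes the CAPTURE buffer has already been exported with VIDIOC_EXPBUF and mapped. The constants mirror `<linux/dma-buf.h>`, and the ioctl number is computed for the generic ioctl layout used on arm64 and x86. `read_coherent` and its parameters are hypothetical names. The patch itself is C, where the same call is `ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync)`.

```python
import fcntl
import struct

# Flag values as defined in <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2


def _iow(ioc_type: str, nr: int, size: int) -> int:
    """Encode _IOW(type, nr, size); ASSUMES the asm-generic ioctl bit layout."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(ioc_type) << 8) | nr


# struct dma_buf_sync { __u64 flags; } is 8 bytes.
DMA_BUF_IOCTL_SYNC = _iow("b", 0, 8)


def dma_buf_sync(dmabuf_fd: int, flags: int) -> None:
    """Issue one DMA_BUF_IOCTL_SYNC call on an exported dma-buf fd."""
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags))


def read_coherent(dmabuf_fd: int, mapping, length: int) -> bytes:
    """Bracket a CPU read of `mapping` with SYNC_START / SYNC_END."""
    dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    try:
        return bytes(mapping[:length])
    finally:
        dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
```

Whether this is enough on hantro/RK3568 depends on how the exporter implements its `begin_cpu_access` hook, so the result should be cross-checked against the vaapi-copy path before trusting it.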

The fix doesn't matter for production: patches 0010/0011 are explicitly DEBUG-only (PKGBUILD comments say "Removed before upstream submission"). What matters is that **the test was load-bearing for diagnosis and was wrong**. The whole cascade of "decode is broken" → "we need kernel observability" → this finding came from trusting it.

## What hantro is actually doing (and not doing)

Hantro's H.264 frontend likely does this for our requests:

1. Accepts the V4L2_BUF + REQUEST_FD pair
2. Reads the V4L2_CTRL_TYPE_H264_* controls from the request
3. Validates them; if validation fails silently, falls through
4. Parses the OUTPUT slice
5. If parse fails (e.g. SPS/PPS mismatch with slice data, scaling matrix shape wrong, decode_mode wrong), the hardware pipeline still runs but produces zeros, because the decode "kernel" (the actual silicon pipeline) gets garbage matrix coefficients or stops at the slice header
6. `vb2_set_plane_payload(buf, 0, full_buffer_size)` is called regardless, which would explain the bytesused=3655712 signal even though no real pixels were written

This is consistent with what patch 0011's commit message anticipated:

> All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent "no picture" output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream).

That hypothesis was **right**; we just couldn't confirm it via the sentinel test (cache bug) and went down the wrong rabbit hole.

## Phase 6 priority list (substantially narrowed)

The bug isn't "we can't engage hantro." The bug is "hantro engages but its parser produces zeros." Bisect the control submission:

1. **Use `VIDIOC_G_EXT_CTRLS` to read controls back from the request fd before queuing the request.** Compare against what `0008-h264-fill-decode-params-from-vaapi.patch` populated. Any discrepancy means our writes aren't sticking.
2. **Compare the per-frame control set against FFmpeg's `v4l2_request_h264.c`** (proven working on hantro, per Bootlin's downstream branch `v4l2-request-n8.1`). Diff: which fields does FFmpeg set that we don't?
3. **Verify SPS submission:** the V4L2_CTRL_TYPE_H264_SPS control needs the full SPS struct. Patch 0013/0018 set level_idc; verify the other SPS fields (`profile_idc`, `seq_parameter_set_id`, `log2_max_frame_num_minus4`, `pic_order_cnt_type`, `log2_max_pic_order_cnt_lsb_minus4`, `max_num_ref_frames`, etc.). VAAPI's `VAPictureParameterBufferH264` does not carry all of these directly, so we need to derive them from the bitstream or rely on VAAPI passing them.
4. **DECODE_PARAMS bit_size fields** (patch 0008's open question, never resolved): if hantro uses these for slice header parsing in FRAME_BASED mode, our zeros could trigger a silent reject. FFmpeg sets these by parsing the slice header bit-precisely.
5. **POC sentinel:** re-run with VIDIOC_G_EXT_CTRLS readback after patch-0015's strip to confirm the values reach V4L2 as 0/0, not 65536/65536.
6. **`reference_ts` for the IDR frame**: even though IDRs have no references, hantro may still validate the field. Should be 0 / monotonic from frame 0.
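
Items 1, 2 and 5 all reduce to a field-by-field comparison of two control snapshots. A minimal sketch of that comparison follows, assuming each snapshot has already been flattened to a `{field_name: value}` dict. How the snapshots are obtained (readback, or a dump from FFmpeg) is outside this sketch, and the values shown are illustrative, not captured data.

```python
def diff_controls(written: dict, read_back: dict) -> dict:
    """Return {field: (written, read_back)} for every field that differs or is missing."""
    mismatches = {}
    for field in sorted(set(written) | set(read_back)):
        w = written.get(field)
        r = read_back.get(field)
        if w != r:
            mismatches[field] = (w, r)
    return mismatches


# Illustrative values only, NOT taken from a real run.
written = {"profile_idc": 100, "level_idc": 40, "pic_order_cnt_type": 0}
read_back = {"profile_idc": 100, "level_idc": 0, "pic_order_cnt_type": 0}
print(diff_controls(written, read_back))
```

An empty result for the written-versus-readback pair would clear item 1, leaving the FFmpeg diff in item 2 as the more informative comparison.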

## Next observable I should add to the rig

For the next decode probe:

- Enable `dyndbg="file drivers/media/platform/verisilicon/* +pmflt"` on the hantro module to surface any compiled-in `dev_dbg` calls.
- (If module reload is acceptable) reload hantro with `dyndbg` set at module-insert time.
- Add ftrace function-graph for `hantro_*` symbols to see the kernel decode path.
- Add a v4l2-ctl based reproducer (smaller and easier to instrument than mpv): feed bbb_h264.h264 (Annex B raw) directly via `v4l2-ctl --stream-out-mmap`. The request-API shape makes this awkward, so it may need a small C reproducer instead.
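
The first and third items can be scripted so that the next probe is repeatable. The sketch below only builds and prints the writes by default. It assumes debugfs is mounted at `/sys/kernel/debug` and tracefs at `/sys/kernel/tracing`, which may not match the rig, and the function names are illustrative.

```python
from pathlib import Path

# ASSUMED mount points; adjust for the target system.
DYNDBG_CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")
TRACING = Path("/sys/kernel/tracing")


def build_plan(driver_glob="drivers/media/platform/verisilicon/*",
               symbol_glob="hantro_*"):
    """Return the (path, text) writes for dyndbg plus function_graph tracing."""
    return [
        (DYNDBG_CONTROL, f"file {driver_glob} +pmflt"),
        (TRACING / "set_ftrace_filter", symbol_glob),
        (TRACING / "current_tracer", "function_graph"),
        (TRACING / "tracing_on", "1"),
    ]


def apply_plan(plan, dry_run=True):
    """Print the equivalent shell commands, or perform the writes (needs root)."""
    for path, text in plan:
        if dry_run:
            print(f"echo '{text}' > {path}")
        else:
            path.write_text(text + "\n")


apply_plan(build_plan())
```

Keeping the plan as data makes it easy to record exactly which knobs were set alongside the captured `ftrace.txt` and `dmesg.txt`.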

## Phase 0 status (re-revised)

- **#1 Re-verify failure-mode finding:** ✗ kernel side completes successfully but produces zero-pixel output. Bitstream parser silently fails. Original 2026-04-26 STUDY claim was wrong at the pixel-content layer.
- **#2 Step 1 reconciliation:** ✓ done in commit `74b3793`.
- **#3 Firefox engagement:** ✓ engages libva in live session; same zero-pixel decode failure as mpv. Falls back to SW after frame 0.
- **#4 Phase 0 baseline anchor:** partial. Userspace contract trace ✓, kernel-side ftrace ✓ (this run), but the patch-0011 sentinel test we relied on for pixel-content verification is buggy. **Need to fix or replace the pixel-content check** before Phase 1 lock.
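
A replacement pixel-content check can be very small once the readback itself is trustworthy. This is a sketch of the classification step only. It assumes the bytes passed in were read after cache invalidation and cover only the region the sentinel was written to; the function name and labels are illustrative.

```python
SENTINEL = 0xAB


def classify_capture(data: bytes) -> str:
    """Classify a CAPTURE readback taken AFTER cache invalidation."""
    if not data:
        raise ValueError("empty buffer")
    if data.count(0) == len(data):
        return "all-zero"   # kernel wrote, but wrote zeros
    if data.count(SENTINEL) == len(data):
        return "sentinel"   # kernel never touched this region
    return "mixed"          # something else was written, possibly real pixels


print(classify_capture(bytes(32)), classify_capture(b"\xab" * 32))
```

A "mixed" result is necessary but not sufficient for a correct decode, so the Phase 1 criterion would still need a comparison against a software-decoded reference frame.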

## Artifacts

- `ftrace.txt`: 132 lines, v4l2/vb2/dma_fence events for the mpv vaapi-copy --frames=2 run
- `dmesg.txt`: 63 lines, no hantro/vpu/decode/error/warn messages
- `mpv.stderr`: 40 lines, the .so DEBUG dumps (10 sentinel-survivor CAPTURE reads, all of which we now know are cache-stale)
- `mpv.stdout`: 107 lines, normal mpv playback log (`Using hardware decoding (vaapi-copy).`)