Phase 0 amendment: hantro writes zeros, sentinel test cache-buggy

Re-baselined the libva-v4l2-request decode path with kernel-side
observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug)
and a visual disambiguator (mpv --vo=gpu in the operator's live
Plasma session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame:
   ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088
   + hantro tile padding). dmesg completely silent — no
   hantro/vpu/decode/error/warn messages.

2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a
   solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE.
   Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would
   render cream). Both colors are consistent with the kernel
   writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same
   buffer GL-imported as DMA-BUF with different colorspace → blue).

3. Patch 0011 sentinel test has a cache-coherency bug: writes
   0xab via cached surface_object->destination_map[0] mmap, never
   invalidates cache before readback. So the readback always
   shows the stale sentinel even when kernel DMA-overwrote it
   with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly
   invalidate cache and see the real (zero) contents.

This is the second correction to the Phase 0 verdict in one day:
- Original commit f15ba8b ("the 2026-04-26 picture holds") was
  wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel
  output, sentinel survives") was half right: kernel does write,
  writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is
  silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis:
"All zeros → kernel did write 0x00s (overwriting our sentinel),
and the apparent 'no picture' output is the kernel-side decode
actually producing zeros (e.g. parser rejected the bitstream)."
That hypothesis was right; we just couldn't confirm it via the
sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't
engage hantro" — it's "hantro engages but its parser produces
zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS
readback to verify writes stick, diff against FFmpeg's
v4l2_request_h264.c (proven working on hantro), verify SPS
completeness, resolve patch 0008's slice_header bit_size open
question, dyndbg the hantro module, etc. Phase 1 boolean-
correctness criterion needs a working pixel-content check before
lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:39:42 +00:00
parent e892cea858
commit 365764fffb
6 changed files with 487 additions and 8 deletions
+30 -8
@@ -146,7 +146,7 @@ Stock Firefox 150.0.1 + `media.ffmpeg.vaapi.enabled=true` + `LIBVA_DRIVER_NAME=v
**Result**: Firefox's RDD process dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability probe then immediately closes them; never reaches `vaInitialize`. Gfx-environment platform-fitness check rejects VAAPI under Xvfb's software-framebuffer-with-no-DRI rig. Not a libva-side fault. Re-test in live session needed.
### Live Plasma Wayland session run — INVERTS PRIOR PHASE 0 VERDICT
### Live Plasma Wayland session run — and follow-up kernel-side disambiguation
Same Firefox profile + LIBVA env, executed inside the operator's active Plasma 6 Wayland session (XDG_SESSION_TYPE=wayland, XDG_RUNTIME_DIR=/run/user/1001). Full write-up: [`phase0_evidence/2026-05-04-firefox-live/findings.md`](phase0_evidence/2026-05-04-firefox-live/findings.md).
@@ -169,14 +169,36 @@ Phase 0 deliverable status corrections:
- **#3** (Firefox configuration end-to-end) — ✓ engagement confirmed in live Plasma session; pixel-content failure mode identical to mpv.
- **#4** (Phase 0 baseline anchor) — ✗ **AMENDED**: captured trace describes Step 1's userspace behaviour, not the kernel-side spec Phase 6 must reproduce.
**Phase 1 lock should be deferred** until: (a) the boolean-correctness criterion is sharpened to require pixel-content verification (sentinel-overwrite check, NV12 luma min/max sanity, etc.), and (b) Phase 0 includes a kernel-side observability layer (ftrace `events/v4l2/`, `dmesg` for silent decode errors) so we can characterize *why* hantro is silent. The Step 1 18-patch series engages libva but doesn't make hantro decode — Phase 6 has substantive work.
### Kernel-side re-baseline (2026-05-04) — corrects the prior verdict AGAIN
Likely failure-mode candidates (priority order, from patch comments):
1. `reference_ts` not propagated (per patch-0017 commit body: "hantro doesn't read pic_num, uses reference_ts")
2. DECODE_PARAMS slice_header bit_size fields all zero (patch 0008's open question, never resolved)
3. POC sentinel still leaking past patch-0015's strip (DEBUG dump runs *before* the strip; need post-strip verification via VIDIOC_G_EXT_CTRLS)
4. level_idc over-allocation interaction (patch 0013 → 0018 transition)
5. `V4L2_EVENT_SOURCE_CHANGE` not handled (open Q #5)
ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug enabled while running mpv `--hwdec=vaapi-copy --frames=2`. Full write-up: [`phase0_evidence/2026-05-04-kernel-trace/findings.md`](phase0_evidence/2026-05-04-kernel-trace/findings.md).
| Layer | Result |
|---|---|
| ftrace `vb2_buf_done` for CAPTURE_MPLANE | **`bytesused=3655712`** (full NV12 + hantro tile padding) reported every frame. **Kernel signals successful full-buffer write.** |
| dmesg | Completely silent. No hantro/vpu/decode/fail/error/reject/einval/warn. |
| Real-VO disambiguator (operator inspection in live session) | `--hwdec=vaapi-copy --vo=gpu`: **solid GREEN frame**. `--hwdec=vaapi --vo=gpu`: **solid BLUE frame**. NV12-with-Y=0,UV=0 BT.709-converted = green; same buffer via DMA-BUF GL import with different colorspace = blue. **Neither shows the sentinel mid-beige pattern; neither shows real bunny pixels.** |
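Two figures in the table can be reproduced by hand, and the sketch below does so. The helper names are ours, not from libva-v4l2-request or the kernel. The first part splits `bytesused=3655712` into the bare NV12 payload for 1920x1088 and the remainder. That remainder equals 64 bytes per 16x16 macroblock plus 32 bytes; treating it as a per-macroblock side area that hantro appends is an assumption, and only the arithmetic is certain. The second part converts an all-zero NV12 sample with BT.709 limited-range coefficients to check the "green" reading.

```c
#include <stddef.h>
#include <stdint.h>

/* Bare NV12 payload: full-resolution Y plane plus a half-height
 * interleaved CbCr plane, 12 bits per pixel in total. */
static size_t nv12_payload_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Hypothetical helper: 64 bytes per 16x16 macroblock plus a 32-byte
 * tail. That hantro appends exactly this area is an assumption. */
static size_t per_mb_area_bytes(size_t width, size_t height)
{
    return 64 * (width / 16) * (height / 16) + 32;
}

struct rgb8 {
    uint8_t r, g, b;
};

/* Clamp to [0, 255] and round to the nearest integer. */
static uint8_t clamp_to_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* BT.709 limited-range YCbCr to 8-bit RGB. */
static struct rgb8 bt709_limited_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
    double yn = (y - 16) / 219.0;   /* luma scaled to about 0..1 */
    double pb = (cb - 128) / 224.0; /* chroma scaled to about -0.5..0.5 */
    double pr = (cr - 128) / 224.0;
    struct rgb8 out;

    out.r = clamp_to_u8(255.0 * (yn + 1.5748 * pr));
    out.g = clamp_to_u8(255.0 * (yn - 0.1873 * pb - 0.4681 * pr));
    out.b = clamp_to_u8(255.0 * (yn + 1.8556 * pb));
    return out;
}
```

For 1920x1088 the payload is 3133440 bytes and the per-macroblock term is 522272 bytes, which sum to the traced 3655712. `bt709_limited_to_rgb(0, 0, 0)` clamps red and blue to 0 and leaves green at roughly 77, i.e. a dark solid green.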
**Corrected verdict**: hantro accepts the request, returns success, **and writes ALL ZEROS to the CAPTURE buffer**. The patch-0011 sentinel test we relied on is misleading — it has a **cache-coherency bug**. Patch 0011 writes `0xab` via cached `surface_object->destination_map[0]` mmap, but neither `0010-DEBUG-hex-dump` nor any other read path in libva-v4l2-request invalidates the cache after DQBUF. So the readback always shows the stale sentinel, hiding the fact that the kernel DMA-overwrote it with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly invalidate cache and see the real (zero) contents.
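A minimal sketch of the missing cache maintenance, assuming the CAPTURE buffer has been exported as a DMA-BUF fd (for example with `VIDIOC_EXPBUF`) and that the readback goes through an mmap of that fd. Whether patch 0011's `destination_map[0]` mapping is obtained that way is not established here, so this is one possible shape for the fix, not the patch itself. The function names are hypothetical; `DMA_BUF_IOCTL_SYNC` and its flags come from `<linux/dma-buf.h>`.

```c
#include <errno.h>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Issue one DMA_BUF_IOCTL_SYNC call, retrying if a signal interrupts it. */
static int dmabuf_sync(int dmabuf_fd, unsigned long long flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Call after DQBUF and before the CPU reads the mapping, so stale
 * cache lines (such as a previously written sentinel) are dropped. */
static int dmabuf_begin_cpu_read(int dmabuf_fd)
{
    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
}

/* Call once the CPU has finished reading the mapping. */
static int dmabuf_end_cpu_read(int dmabuf_fd)
{
    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The hex dump or sentinel comparison would sit between the two calls. Both return -1 with `errno` set on failure, so a readback that silently skipped the sync can at least be detected.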
**Bug surface narrows substantially.** The path is:
- libva engagement: ✓
- Contract trace: ✓ no EINVAL, all ioctls succeed
- Hantro request acceptance: ✓ kernel reports success
- **Hantro produces meaningful pixel output: ✗ writes ALL ZEROS** — almost certainly the bitstream parser silently rejects something (per patch-0011's own commit-message hypothesis: "the apparent 'no picture' output is the kernel-side decode actually producing zeros, e.g. parser rejected the bitstream")
This is consistent with a control-submission bug (something in SPS/PPS/DECODE_PARAMS is off), not a fundamental "we can't drive hantro" problem. Phase 6 work direction sharpens accordingly.
### Phase 6 priority list (revised after kernel-side baseline)
1. **Fix the patch-0011 sentinel test** (or replace it). Add `msync(MS_SYNC|MS_INVALIDATE)` or a DMA-BUF cache sync (`DMA_BUF_IOCTL_SYNC`) before the readback. Without this, future debugging will be unreliable in exactly the same way.
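2. **VIDIOC_G_EXT_CTRLS readback** against the request fd (`which = V4L2_CTRL_WHICH_REQUEST_VAL`): confirms our writes actually stick at the V4L2 layer (e.g. POC sentinel actually stripped to 0 by patch-0015, level_idc actually set, etc.). The V4L2 documentation lists `EACCES` for getting a control from a request that has not completed yet, so the readback may have to run after completion rather than before QUEUE.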
3. **Diff our per-frame control set against FFmpeg's `v4l2_request_h264.c`** (proven working on hantro, downstream branch `code.ffmpeg.org/Kwiboo/FFmpeg.git v4l2-request-n8.1`). Identify any field FFmpeg sets that we don't.
4. **Verify SPS submission completeness**: VAAPI's `VAPictureParameterBufferH264` doesn't carry the full SPS — we may need to derive `profile_idc` / `seq_parameter_set_id` / `log2_max_frame_num_minus4` / `pic_order_cnt_type` / `log2_max_pic_order_cnt_lsb_minus4` / `max_num_ref_frames` from VAAPI fields or by parsing the slice header.
5. **DECODE_PARAMS slice_header bit_size fields** (patch 0008's never-resolved question): if hantro requires them for parse, our zeros could be the silent-reject trigger.
6. **dyndbg on hantro module**: reload with `dyndbg="file drivers/media/platform/verisilicon/* +pmflt"` to surface compiled-in `dev_dbg` calls for the next probe.
Phase 1 boolean-correctness criterion must now include pixel-content verification, but the verification can't rely on patch 0011 in its current form. Either fix patch 0011's cache sync, or use a different check: e.g. mpv `--vo=image` and inspect the dumped frames, or a small C reproducer that maps the buffer with proper cache flags and computes a luma histogram.
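The luma check from the paragraph above can be sketched as a pure function over an already-mapped, already-synchronised Y plane. The struct and function names are hypothetical. The caller is assumed to supply the plane pointer, visible width and height, and row stride in bytes, and mapping and cache maintenance are deliberately left outside this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct luma_stats {
    uint8_t min;
    uint8_t max;
    size_t histogram[256]; /* number of samples per luma value */
};

/* Walk the visible part of the Y plane row by row, honouring the stride,
 * and collect min, max and a 256-bin histogram. */
static void nv12_luma_stats(const uint8_t *y_plane, size_t width,
                            size_t height, size_t stride,
                            struct luma_stats *out)
{
    memset(out, 0, sizeof(*out));
    out->min = 255;

    for (size_t row = 0; row < height; row++) {
        const uint8_t *line = y_plane + row * stride;

        for (size_t x = 0; x < width; x++) {
            uint8_t v = line[x];

            out->histogram[v]++;
            if (v < out->min)
                out->min = v;
            if (v > out->max)
                out->max = v;
        }
    }
}

/* A plane whose samples all share one value (all zeros, or an untouched
 * sentinel fill) cannot be a real decoded picture. */
static int luma_is_flat(const struct luma_stats *s)
{
    return s->min == s->max;
}
```

A flat plane with `max == 0` matches the all-zeros case, and a flat plane at the sentinel value matches an untouched buffer. Any natural frame should spread across many histogram bins. An empty plane (zero width or height) leaves `min` above `max` and is reported as not flat, so callers should reject that case separately.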
## Source-read references (carry-over from STUDY.md)