iter5 Phase 3: baseline — 4/5 libva codecs race-lose, MPEG-2 wins, kdirect clean

5-codec sweep matrix on linux-fresnel-fourier 7.0-1 confirms:
- libva path returns all-zero cap_pool init pattern for H.264 (mostly),
  HEVC, VP9, VP8 (always). MPEG-2 wins the race (fastest hantro decode).
- kernel-direct ffmpeg-v4l2request hwdownload byte-matches SW for all
  4 race-losing codecs.
- B4 cosmetic init-probe EINVAL noise reproduced on hantro (2 ioctls per
  codec); MPEG-2 + VP8 stateless control submissions then succeed (= 0).

iter4 P7's "RGB(0,0x4c,0)" pattern corrected to all-zero raw bytes
(the 0x4c was YUV→RGB conversion of all-zero NV12). Same SHA shape
as iter3's hantro b34860e0 blocker fingerprint.

Control-payload strace anchors persisted as phase-7 invariants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 05:14:57 +00:00
parent 9941523f1f
commit 3c05564e99
2 changed files with 82 additions and 0 deletions
Binary file not shown.
@@ -0,0 +1,82 @@
# Iteration 5 — Phase 3 (baseline)
Captured 2026-05-11 06:23-07:07 CEST on fresnel `linux-fresnel-fourier 7.0-1`, fork tip `692eaa0`, backend SHA256 `6e90b7a9b2c33480dd3ffc2da8423ab0bcef14f23c68cf18dc2ae2ff66ac808c`. Establishes the pre-fix anchor for iter5 Phase 7 four-criterion verification.
## Substrate state at Phase 3 open
- Kernel: `7.0.0-fresnel-fourier #1 SMP PREEMPT Sat May 9 19:24:42 CEST 2026 aarch64`.
- Device topology this boot: `/dev/video2 + /dev/media0` = hantro-vpu (decoder, MPEG-2 + VP8); `/dev/video3 + /dev/media1` = rkvdec (H.264 + HEVC + VP9); `/dev/media2` = uvcvideo.
- Fork tip: `692eaa0` (iter4 Phase 7 fix-forward). Unchanged from iter5 Phase 0.
- Test fixtures intact in `~/fourier-test/`: `bbb_1080p30_h264.mp4`, `bbb_720p10s_hevc.mp4`, `bbb_720p10s_vp9.webm`, `bbb_720p10s_mpeg2.ts`, `bbb_720p10s_vp8.webm`.
## Pre-fix YUV sweep — Bug 2 reproduction matrix
Method: ffmpeg-vaapi-hwdownload (via libva backend), ffmpeg-v4l2request-hwdownload (kernel-direct, bypasses cap_pool), and ffmpeg SW. All in `format=nv12,format=yuv420p,-f rawvideo`, 3 frames each. Hash is SHA256 of the raw YUV bytes. Hantro env override used for MPEG-2 + VP8; rkvdec env override for the others.
| Codec | Fixture | libva hash | kdirect hash | sw hash | libva==kdirect | libva==sw | kdirect==sw |
|---|---|---|---|---|---|---|---|
| H.264 1080p30 | `bbb_1080p30_h264.mp4` | `71ac099b…` | `1e7a0bc9…` | `1e7a0bc9…` | ✗ | ✗ | **✓** |
| HEVC 720p | `bbb_720p10s_hevc.mp4` | `06b2c5a0…` | `9340b832…` | `9340b832…` | ✗ | ✗ | **✓** |
| VP9 720p | `bbb_720p10s_vp9.webm` | `06b2c5a0…` | `4f1565e8…` | `4f1565e8…` | ✗ | ✗ | **✓** |
| MPEG-2 720p | `bbb_720p10s_mpeg2.ts` | `19eefbf4…` | `19eefbf4…` | `7be8cad7…` | **✓** | ✗ | ✗ |
| VP8 720p | `bbb_720p10s_vp8.webm` | `06b2c5a0…` | `136ce5cb…` | `136ce5cb…` | ✗ | ✗ | **✓** |
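
The three equality columns above can be recomputed from the raw dumps. The following is a minimal sketch, not the `sweep.sh` runner itself. It assumes the dumps are named `libva_<codec>.yuv`, `kdirect_<codec>.yuv` and `sw_<codec>.yuv` in one directory; the `sw_` name is an assumption made by analogy with the other two, and `sweep_matrix` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_row(libva_hash, kdirect_hash, sw_hash):
    """Build the three equality columns of the sweep matrix for one codec."""
    return {
        "libva==kdirect": libva_hash == kdirect_hash,
        "libva==sw": libva_hash == sw_hash,
        "kdirect==sw": kdirect_hash == sw_hash,
    }


def sweep_matrix(directory, codecs):
    """Hash <path>_<codec>.yuv for each decode path (assumed file naming)."""
    rows = {}
    for codec in codecs:
        hashes = [
            sha256_file(Path(directory) / f"{path}_{codec}.yuv")
            for path in ("libva", "kdirect", "sw")
        ]
        rows[codec] = compare_row(*hashes)
    return rows
```

A row where only `kdirect==sw` is true corresponds to the Bug 2 signature in the table; a row where only `libva==kdirect` is true corresponds to the MPEG-2 case.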
Findings:
1. **Bug 2 reproduces on 4 of 5 codecs** (H.264, HEVC, VP9, VP8). The libva-backend cap_pool readback returns wrong bytes — not kernel-decoded pixels. Kernel-direct path is byte-identical to SW reference for the same 4 codecs (cleanly proves the kernel decode itself is correct).
2. **MPEG-2 actually passes through libva**: `libva_mpeg2 == kdirect_mpeg2`. SW differs from both at the same magnitude as in other codecs' HW↔SW comparisons (well-known hantro G1 MPEG-2 idct/dequant precision drift; not a Bug-2 issue). MPEG-2 wins the cap_pool race because hantro MPEG-2 decode completes faster than the userspace cap_pool readback path can issue its read.
3. **Three 720p codecs share the same libva-broken hash** `06b2c5a0c01e515d009c0bfbe0e61fafb105a54da5ec621104915cd5949849e8`. Verified to be SHA256 of `4 147 200 = 1280 × 720 × 1.5 × 3` zero bytes (`python3 -c "import hashlib; print(hashlib.sha256(b'\x00' * 4147200).hexdigest())"`). The cap_pool init pattern is **all-zero**, not `RGB(0, 0x4c, 0)` as iter4 Phase 7 reported. The earlier `0x4c` description was the BT.709 limited-range YUV→RGB conversion of all-zero NV12 rendered through Pillow / ffmpeg PNG encoding.
4. **H.264 1080p has a different broken hash** `71ac099b…` for two reasons: the output is 1920×1080 (9 331 200 bytes for 3 frames, so even an all-zero dump would hash differently), and it is not entirely zero. Per-byte histogram: 99.99% of bytes are 0x00, with traces of frame-1 keyframe content (first 16 bytes of frame 1 = `81818080807f7f7f7f7f7f8080808181`, neutral grey chroma). Frames 2 and 3 are fully zero. This confirms the "H.264 keyframe sometimes wins the race, inter frames always lose" pattern iter4 Phase 7 documented.
5. **Timing-race framing validated.** Decode-speed ordering matches race-loss pattern:
- MPEG-2 hantro: fastest → never loses → libva works.
- H.264 rkvdec: keyframe-only intermittent → loses on inter → libva mostly broken.
- HEVC / VP9 / VP8: always loses → libva always broken.
The race is between kernel `vb2_buffer_done(VB2_BUF_STATE_DONE)` (which currently signals nothing in dma_resv) and the userspace cap_pool readback path's `read()` / `mmap()` of the dmabuf. The RFC v2 patches close the window by attaching a real dma_fence at buf_queue and signalling it at buffer-done; userspace consumers waiting on dma_resv-implicit-sync (or anything that resolves through it) then block correctly until decode finishes.
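
The arithmetic behind findings 3 and 4 can be reproduced with a short sketch. It computes the expected dump size and all-zero SHA256 for a given resolution, and applies a textbook floating-point BT.709 limited-range conversion to an all-zero YUV sample. The function names are hypothetical, and real converters (swscale, Pillow) use fixed-point arithmetic, so the green value may differ by one from what this prints: truncation gives `0x4c`, rounding to nearest would give `0x4d`.

```python
import hashlib


def yuv420_bytes(width, height, frames):
    """Size of an 8-bit 4:2:0 dump: 1.5 bytes per pixel per frame."""
    return width * height * 3 // 2 * frames


def all_zero_sha256(width, height, frames):
    """SHA256 of an all-zero dump of the given geometry."""
    return hashlib.sha256(bytes(yuv420_bytes(width, height, frames))).hexdigest()


def _clamp(value):
    return min(255.0, max(0.0, value))


def bt709_limited_to_rgb(y, u, v):
    """Textbook BT.709 limited-range YUV -> RGB, clamped to [0, 255]."""
    kr, kb = 0.2126, 0.0722
    kg = 1.0 - kr - kb
    luma = (y - 16) * 255.0 / 219.0
    cb = (u - 128) * 255.0 / 224.0
    cr = (v - 128) * 255.0 / 224.0
    r = luma + 2 * (1 - kr) * cr
    g = luma - 2 * kb * (1 - kb) / kg * cb - 2 * kr * (1 - kr) / kg * cr
    b = luma + 2 * (1 - kb) * cb
    return _clamp(r), _clamp(g), _clamp(b)


if __name__ == "__main__":
    print(yuv420_bytes(1280, 720, 3))    # 720p, 3 frames
    print(yuv420_bytes(1920, 1080, 3))   # 1080p, 3 frames
    print(all_zero_sha256(1280, 720, 3))
    print([hex(int(c)) for c in bt709_limited_to_rgb(0, 0, 0)])
```

Red and blue clamp to zero because both come out strongly negative, which is why an all-zero NV12 buffer renders as a dark green rather than black.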
## Control-payload anchors per codec
Method: `strace -ff -v -x -s 999999 -e trace=ioctl` over ffmpeg-vaapi 3-frame decode for each codec. Per-codec `VIDIOC_S_EXT_CTRLS` submission summary:
| Codec | Device | Total S_EXT_CTRLS | Success | EINVAL | EINVAL cause |
|---|---|---|---|---|---|
| H.264 | rkvdec | 14 | 14 | 0 | — |
| HEVC | rkvdec | 15 | 15 | 0 | — |
| VP9 | rkvdec | 13 | 13 | 0 | — |
| MPEG-2 | hantro | 8 | 6 | 2 | B4: H.264 + HEVC init probes EINVAL |
| VP8 | hantro | 15 | 13 | 2 | B4: same |
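
The per-codec counts in this table can be derived from the strace output with a small parser. This is a sketch under assumptions, not the `anchors.sh` script: it assumes the usual strace line shape, `ioctl(<fd>, VIDIOC_S_EXT_CTRLS, {...}) = <ret> [ERRNO ...]`, and `summarize_s_ext_ctrls` is a hypothetical helper name. Unfinished or resumed lines are simply skipped.

```python
import re
from collections import Counter

# Matches a completed VIDIOC_S_EXT_CTRLS call and captures the return value
# plus the symbolic errno, if strace printed one.
_S_EXT_CTRLS = re.compile(
    r"ioctl\(\d+, VIDIOC_S_EXT_CTRLS, .*\)\s+= (-?\d+)(?: ([A-Z]+))?"
)


def summarize_s_ext_ctrls(lines):
    """Count total, successful, and per-errno failed S_EXT_CTRLS calls."""
    counts = Counter()
    for line in lines:
        match = _S_EXT_CTRLS.search(line)
        if not match:
            continue
        counts["total"] += 1
        if match.group(1) == "0":
            counts["success"] += 1
        else:
            counts[match.group(2) or "other_error"] += 1
    return dict(counts)
```

Feeding it the concatenated `trace.*` files for one codec should yield the Total, Success and EINVAL columns for that row.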
The 2 EINVAL submissions on hantro paths are the B4 cosmetic init-probe noise documented in Phase 0: backend submits `0xa40900 H264_DECODE_MODE / 0xa40901 H264_START_CODE` and `0xa40a95 HEVC_DECODE_MODE / 0xa40a96 HEVC_START_CODE` at context-init unconditionally. Hantro doesn't expose these CIDs (it's MPEG-2 + VP8 only), so the kernel rejects with EINVAL. The backend proceeds; the actual codec controls (`0xa409dc/dd/de` MPEG-2 or `0xa409c8` VP8) submit cleanly afterward at `= 0`.
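
The hex control IDs quoted above follow from the stateless codec control base plus a per-control offset. The sketch below reproduces that arithmetic. The class value and offsets are written from memory of the mainline `v4l2-controls.h` UAPI header and should be checked against the target kernel tree; the short names are abbreviations of the `V4L2_CID_STATELESS_*` macros, and the MPEG-2 names are my reading of which three controls `0xa409dc/dd/de` are.

```python
# Assumed from the mainline UAPI header; verify against the kernel in use.
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00A40000
V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900

# Offsets from V4L2_CID_CODEC_STATELESS_BASE for the controls named in the text.
STATELESS_CID_OFFSETS = {
    "H264_DECODE_MODE": 0,
    "H264_START_CODE": 1,
    "VP8_FRAME": 200,
    "MPEG2_SEQUENCE": 220,
    "MPEG2_PICTURE": 221,
    "MPEG2_QUANTISATION": 222,
    "HEVC_DECODE_MODE": 405,
    "HEVC_START_CODE": 406,
}


def stateless_cid(name):
    """Return the numeric control ID for a stateless codec control."""
    return V4L2_CID_CODEC_STATELESS_BASE + STATELESS_CID_OFFSETS[name]


if __name__ == "__main__":
    for name in STATELESS_CID_OFFSETS:
        print(f"{stateless_cid(name):#x} {name}")
```

Under these assumptions the output matches the IDs seen in the strace payloads, which makes it easy to tell the B4 probe controls apart from the real codec controls when reading a trace.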
Full per-PID strace artifacts persisted at `iter5_phase3_baseline.tgz/anchors_<codec>/trace.*` — bundle pulled to noether at `~/src/fresnel-fourier/iter5_phase3_baseline.tgz` (25 MB).
These are the iter5 baseline control-payload anchors. Phase 7 will re-run the same strace on the kernel-agent-produced `linux-fresnel-fourier 7.0.kafr2` substrate; each codec's `VIDIOC_S_EXT_CTRLS` payload should byte-match the iter5 baseline. **Predicted: byte-identical match for all 5 codecs**, because the kernel patches in iter5 don't touch control-handling code (they only add the `vb2_buffer_attach_release_fence` opt-in call at `*_buf_queue`).
## Open questions resolved at Phase 3
- ~~"Bug 3 — hantro UAPI drift"~~ — **Resolved null** at Phase 2 source-read. Empirical Phase 3 confirms: MPEG-2 controls submit at `= 0`, VP8 controls submit at `= 0` (modulo the 2 cosmetic EINVAL B4 init probes).
- "Does the cap_pool race window manifest as all-zero or non-zero pattern?" — **All-zero**, confirmed.
- "Does kernel-direct decode produce SW-equivalent YUV for all 5 codecs?" — **Yes for 4 of 5** (H.264, HEVC, VP9, VP8). MPEG-2 HW differs from SW at the codec-precision level, unrelated to Bug 2.
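
The all-zero answer can be spot-checked per frame rather than per file, which also exposes the H.264 case where only the first frame carries decoded content. A minimal sketch, assuming an 8-bit 4:2:0 raw dump already read into memory; `zero_fraction_per_frame` is a hypothetical helper, not part of the persisted scripts.

```python
def zero_fraction_per_frame(data, width, height):
    """Return the fraction of 0x00 bytes in each complete 4:2:0 frame."""
    frame_size = width * height * 3 // 2
    fractions = []
    # Step through whole frames only; a trailing partial frame is ignored.
    for offset in range(0, len(data) - frame_size + 1, frame_size):
        frame = data[offset:offset + frame_size]
        fractions.append(frame.count(0) / frame_size)
    return fractions
```

For a dump that lost the race on every frame, each entry would be `1.0`; a frame that won the race shows a clearly lower fraction.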
## Open questions still pending Phase 4
- Does the videobuf2-core.c RFC v2 helper patch apply cleanly v6.12 → v7.0? Pending boltzmann reconnection for the `git apply --3way` verification.
- Does `vb2_buffer_attach_release_fence()` need any v7.0-specific adjustments (e.g., dma_resv API rename or signature drift)? Phase 4 verifies during rebase.
- Will the rkvdec consumer patch need any rkvdec-specific buf_queue refactor (e.g., the buf_queue happens before a delayed-decode worker, in which case the fence should be attached later)? Phase 4 reviews rkvdec.c around line 954 in context.
## Artifacts persisted
- `iter5_phase3_baseline.tgz` (25 MB): `sweep.sh` (runner), `anchors.sh` (anchor capture), `*.yuv` (15 raw YUV files: 5 libva + 5 kdirect + 5 sw), `*.log` (15 ffmpeg stderr logs), `anchors_<codec>/trace.*` (5 codecs × per-PID strace files, ~85 files total).
## Phase 4 preview
With the empirical confirmation that:
- Bug 2 is consistently the timing race on all 4 broken codecs (and the MPEG-2 path that "works" actually wins the same race);
- Kernel-direct decode byte-matches SW for the 4 codecs with an HW=SW expectation (MPEG-2 HW differs from SW only at the codec-precision level);
- Backend control-payload submissions are stable: all succeed on rkvdec (14/15/13 for H.264/HEVC/VP9, all `= 0`), and all succeed on hantro apart from the 2 cosmetic B4 init-probe EINVALs per codec;
Phase 4 plan is now fully informed: produce the 4-patch series (3 rebased + 1 new rkvdec consumer), update `fleet/fresnel.yaml`, build via kernel-agent (or manual fallback) on boltzmann, install on fresnel, re-run the sweep, verify `libva_<codec>.yuv == kdirect_<codec>.yuv` for all 5 codecs (the post-fix-equivalent of the "all 5 pass" criterion).