iter5 Phase 0 loopback: real Bug 2 is surface.c:173 hardcoded OUTPUT format

Empirical strace of all 5 codecs through libva shows that VIDIOC_S_FMT on OUTPUT_MPLANE ships pixelformat V4L2_PIX_FMT_H264_SLICE for EVERY profile. HEVC controls submitted on an H264_SLICE OUTPUT queue → kernel rkvdec silently rejects/no-ops → CAPTURE stays in its cap_pool init state (all-zero).

Per-codec Bug 2 taxonomy:
- HEVC, VP9, VP8: OUTPUT format mismatch on rkvdec/hantro-strict → 100% zero
- MPEG-2: format mismatch, but hantro tolerates it → works
- H.264: format right by coincidence; keyframe decodes, inter frames all-zero (Bug 4, separate, deferred from iter5b)

Site: src/surface.c:173 `unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE`. Same bug class as feedback_unconditional_codec_state.md (iter4 h264_start_code = true).

iter5b new Phase 1: fix surface.c to switch pixelformat on config_object->profile. 4 criteria locked, all backend-side, no kernel patches. RFC v2 series filed back to backlog for a future DMABUF-import-consumer campaign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:21:41 +00:00
parent 0adfb11fff
commit cd34ec1918
1 changed files with 147 additions and 0 deletions
@@ -0,0 +1,147 @@
+# Iteration 5 — Phase 0 loopback (re-root-cause Bug 2)
+
+Captured 2026-05-11 mid-day after Phase 5 review CRIT-1 invalidated the iter5 Phase 4 plan. iter5 returns to Phase 0 per `feedback_dev_process.md`. The vb2_dma_resv-RFC-v2-as-Bug-2-fix hypothesis is **rejected**. This document captures the new empirical evidence and re-frames iter5.
+
+## What Phase 5 + this loopback found
+
+The original Bug 2 framing (Phase 0 Candidate B, Phase 2 situation, Phase 3 baseline, Phase 4 plan) was: "userspace cap_pool readback races ahead of kernel decoder completion; vb2_dma_resv RFC v2 closes the race." The Phase 5 reviewer empirically traced producer→primitive→consumer-read-site through the actual libva backend code path. The trace showed that the fence mechanism never reaches the MMAP+EXPBUF path the backend uses. The author re-verified this at the surface.c source level: `RequestSyncSurface` already calls `media_request_wait_completion` and `v4l2_dequeue_buffer`, which together block until decode-DONE. The fence would therefore have been a no-op for the libva path even if it had reached the right resv.
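
The blocking step in `RequestSyncSurface` can be illustrated with a minimal sketch. It is not the backend's code. It assumes the media request API convention that a completed request fd reports POLLPRI; the backend may use `select()` on the exception set instead. `wait_request_done` is a hypothetical name, and the follow-up VIDIOC_DQBUF appears only as a comment.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: block until a media request completes.
 * A completed request is reported as POLLPRI on the request fd.
 * Returns 0 on completion, -ETIMEDOUT on timeout, a negative errno on failure.
 */
static int wait_request_done(int request_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: poll again */
            return -errno;
        }
        if (rc == 0)
            return -ETIMEDOUT;         /* decoder did not finish in time */
        if (pfd.revents & POLLPRI)
            return 0;                  /* request completed */
        return -EIO;                   /* POLLERR/POLLHUP/POLLNVAL without POLLPRI */
    }
}

/*
 * Sketch of the caller's order: wait_request_done() first, then VIDIOC_DQBUF
 * on the CAPTURE buffer, and only then read its pages. Once both succeed the
 * decoder has finished writing, so the readback cannot race it.
 */
```

This ordering is why an extra fence cannot change the result on this path. The wait and the dequeue already guarantee completion before any readback.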
+
+So why are pages all-zero?
+
+This loopback's empirical investigation answers it.
+
+### Empirical finding: OUTPUT pixel format hardcoded H264_SLICE
+
+`/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/src/surface.c:173`:
+
+```c
+unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;
+```
+
+This is the OUTPUT-side pixel format the backend sets on the V4L2 queue. **It is hardcoded, regardless of which profile is active.** When the backend then submits HEVC controls (`V4L2_CID_STATELESS_HEVC_*`) on an OUTPUT buffer queued with H264_SLICE format, the kernel rkvdec driver sees a fundamental contract mismatch:
+
+- OUTPUT buffer format claims: H.264 NAL slices.
+- Submitted controls claim: HEVC SPS/PPS/decode_params/slice_params/scaling_matrix.
+
+The kernel does not decode and logs no error in dmesg (silent rejection). The CAPTURE buffer stays in its cap_pool init state (all-zero). Userspace then reads it via `vaDeriveImage` + `vaMapBuffer` + ffmpeg `hwdownload` and gets the unmodified all-zero pages.
+
+This is the **same class of bug** as the memory note `feedback_unconditional_codec_state.md`: codec-specific state set unconditionally, without profile gating. The iter4 fix for `h264_start_code` gated it on the H.264/HEVC profiles; the pixelformat in `surface.c` needs an analogous gate.
+
+### Per-codec Bug 2 taxonomy (post-loopback)
+
+| Codec | OUTPUT format in libva | Match? | Phase 3 libva result | Phase 3 explanation |
+|---|---|---|---|---|
+| H.264 | H264_SLICE | **right** | 99.99% zero + traces of keyframe content | Format right; **separate H.264 inter-frame bug** — race or something else |
+| HEVC | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
+| VP9 | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
+| MPEG-2 | H264_SLICE | **wrong but tolerated** | Real decoded pixels (libva == kdirect) | Hypothesis: hantro ignores the OUTPUT format mismatch for MPEG-2 and dispatches on control class; alternatively, it got lucky on timing. |
+| VP8 | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch on hantro → unlike MPEG-2, VP8 doesn't tolerate it |
+
+Empirically verified on 2026-05-11 by inspecting the strace logs on fresnel at `/tmp/iter5_fmt/<codec>/trace.*`. Each codec's first `VIDIOC_S_FMT` on `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE` ships pixelformat `S264` (= `V4L2_PIX_FMT_H264_SLICE`, FOURCC 0x34363253). No codec's trace contains a later corrective S_FMT.
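
Reading a trace like this depends on decoding the 32-bit pixelformat correctly. Below is a small helper that assumes the kernel's `v4l2_fourcc()` packing, where the first character goes in the least significant byte. `FOURCC` and `fourcc_to_str` are hypothetical local names, not kernel API.

```c
#include <stdint.h>

/* Same packing as the kernel's v4l2_fourcc(): first character in the low byte. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Render a raw pixelformat word (e.g. a hex value from a trace) as 4 ASCII chars. */
static void fourcc_to_str(uint32_t fmt, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char ch = (unsigned char)((fmt >> (8 * i)) & 0xff);
        out[i] = (ch >= 0x20 && ch < 0x7f) ? (char)ch : '.';   /* mask non-printables */
    }
    out[4] = '\0';
}
```

With this packing, 0x34363253 decodes to `S264` (V4L2_PIX_FMT_H264_SLICE), which matches the trace reading above.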
+
+### What this means for iter1-iter4 close validity
+
+iter2 (HEVC), iter3 (VP8), and iter4 (VP9) all closed via **transitive proof**: the backend's `VIDIOC_S_EXT_CTRLS` payload byte-matched the kernel-direct ffmpeg-v4l2request anchor. That proof verified the **control-payload contract**, not the OUTPUT pixel format. It was technically sound for what it claimed (controls correctly shaped), but it **silently masked** the OUTPUT format bug. The test never decoded a frame through libva and read the pixels; it only compared control bytes.
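
To make the gap concrete: a check that compares only the control payload will accept a session whose OUTPUT format is wrong. The sketch below is illustrative only. `struct session_trace` and both functions are hypothetical, not part of the campaign's tooling. The test data assumes the kernel-direct anchor sets HEVC_SLICE ('S265') on OUTPUT while libva sets H264_SLICE ('S264').

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of one decode session extracted from a trace. */
struct session_trace {
    uint32_t output_pixfmt;       /* pixelformat from S_FMT on OUTPUT_MPLANE */
    const uint8_t *ctrl_payload;  /* concatenated S_EXT_CTRLS payload bytes */
    size_t ctrl_len;
};

/* What a controls-only transitive proof compares. */
static bool controls_match(const struct session_trace *a, const struct session_trace *b)
{
    return a->ctrl_len == b->ctrl_len &&
           memcmp(a->ctrl_payload, b->ctrl_payload, a->ctrl_len) == 0;
}

/* Stricter: the OUTPUT queue format must match as well as the controls. */
static bool session_contract_match(const struct session_trace *a, const struct session_trace *b)
{
    return a->output_pixfmt == b->output_pixfmt && controls_match(a, b);
}
```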
+
+iter1 (MPEG-2) closed via direct pixel verification, and it happens to work because hantro tolerates the wrong OUTPUT format. iter1's "PASS direct" stands.
+
+H.264 (T4) is in the carry-over from libva-multiplanar's ohm work. It also happens to use the default H264_SLICE format. The iter1 P0 cross-validator verified keyframes decoded through libva (real pixels at the +30s seek). However, the new Phase 3 data shows that H.264 inter frames also come out all-zero. **That's a separate Bug 4** (H.264 inter-frame race or sync gap). The iter1 P0 testing didn't expose it because mpv's `--vo=image --frames=2 --start=+30s` happens to hit a keyframe.
+
+### Bug taxonomy after loopback
+
+| Bug | Root cause | Affects | Severity |
+|---|---|---|---|
+| **Bug 2** | `surface.c:173` hardcodes OUTPUT format H264_SLICE for every profile | HEVC, VP9, VP8 (rkvdec, and hantro where it checks the format strictly) | HIGH: codec-class bug, masks 3 of 5 codecs |
+| **Bug 3** | None — Phase 5 confirmed Bug 3 doesn't exist as UAPI drift | (n/a) | (n/a) |
+| **Bug 4** | (new, from Phase 3 + this loopback) H.264 inter frames produce all-zero pages through libva even though OUTPUT format matches | H.264 inter frames specifically | MEDIUM — keyframes work, inter frames don't |
+
+## Re-locked Phase 1 success criteria (iter5b)
+
+The original 4 criteria remain partially valid. Criterion 1 changes target (backend fix, not kernel patch). Criterion 2 (substrate ships from kernel-agent) is dropped: the new fix needs no kernel patches. Criterion 3 (no codec-contract regression) stays. Criterion 4 (5/5 direct) stays as the bar.
+
+> **"Fix the libva backend's OUTPUT pixel-format hardcoding (surface.c:173) so that each profile sets the correct V4L2_PIX_FMT_*_SLICE / *_FRAME on the OUTPUT_MPLANE buffer. After fix: ffmpeg-vaapi-hwdownload for HEVC + VP9 + VP8 produces YUV byte-identical to the kernel-direct + SW reference. MPEG-2 continues to pass. H.264 keyframes continue to decode correctly through libva. H.264 inter-frame Bug 4 is OUT OF SCOPE for iter5b; deferred to a follow-up iteration."**
+
+### Pass/fail (boolean, iter5b)
+
+1. **Bug 2 closed for HEVC, VP9, VP8** — `libva_<codec>.yuv == kdirect_<codec>.yuv == sw_<codec>.yuv` (SHA256-equal raw YUV bytes), for `bbb_720p10s_{hevc.mp4, vp9.webm, vp8.webm}`, 3-frame test, on the current `linux-fresnel-fourier 7.0-1` kernel (no kernel patches).
+2. **No regression on MPEG-2** — `libva_mpeg2.yuv == kdirect_mpeg2.yuv` still holds (matches Phase 3 baseline). MPEG-2 already worked; the OUTPUT format fix should not break it.
+3. **H.264 keyframe still decodes** — H.264 first frame (keyframe) through libva still produces real content (`81 81 80 80 …` neutral chroma row at byte offset 0 of frame 1). Bug 4 (inter frames all-zero) is acceptable for iter5b close — out of scope, recorded as backlog. H.264 still passes via the iter1-P0 keyframe-seek path.
+4. **Control-payload anchors hold** — `VIDIOC_S_EXT_CTRLS` payload for each codec on the fixed backend byte-matches the iter5 Phase 3 anchor. Backend control-handling code is unchanged; only the OUTPUT pixel format setup changes.
+
+Clean iter5b close = 4/4 criteria green.
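
Criteria 1 and 2 are SHA256 equality checks, so on failure they do not show where the outputs diverge. Below is a minimal sketch for localizing the first mismatch. It assumes 8-bit 4:2:0 raw output (NV12 or I420) at 1280x720, written frame after frame. The geometry struct and helper names are hypothetical. Confirm the actual hwdownload pixel layout before relying on the luma/chroma split.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed raw layout: 8-bit 4:2:0, frames stored back to back. */
struct yuv420_geom {
    size_t width, height;
};

static size_t frame_bytes(struct yuv420_geom g)
{
    return g.width * g.height * 3 / 2;   /* Y plane plus two quarter-size chroma planes */
}

/* Returns the first differing byte offset, or len if the buffers are equal. */
static size_t first_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return i;
    return len;
}

/* Map a byte offset to its frame index and whether it falls in the Y plane. */
static void locate(struct yuv420_geom g, size_t off, size_t *frame, int *in_luma)
{
    size_t fb = frame_bytes(g);
    *frame = off / fb;
    *in_luma = (off % fb) < g.width * g.height;
}
```

For 1280x720 this gives 1,382,400 bytes per frame, so a 3-frame dump should be 4,147,200 bytes.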
+
+## Phase 2 source-read targets for iter5b
+
+The fix site is `src/surface.c:173`. The supporting per-profile mapping table needs to derive from:
+
+- VA-API VAProfile* enum (`/usr/include/va/va.h`).
+- Backend's existing profile dispatch (e.g., `config.c::RequestCreateConfig` switch, `picture.c::codec_set_controls` dispatch).
+- Kernel UAPI per-codec OUTPUT pixel formats (FOURCC values in `<linux/videodev2.h>`; the stateless codec control IDs live in `<linux/v4l2-controls.h>`).
+
+Expected mapping:
+
+| VA profile (object_config->profile) | V4L2_PIX_FMT_* | Already in config.c? (grep at iter5 Phase 0) |
+|---|---|---|
+| VAProfileH264* | H264_SLICE (0x34363253 = 'S','2','6','4') | yes, lines 151-154 |
+| VAProfileHEVCMain | HEVC_SLICE (0x35363253 = 'S','2','6','5') | yes, lines 165-168 |
+| VAProfileMPEG2* | MPEG2_SLICE (0x5332474D = 'M','G','2','S') | yes, lines 140-143 |
+| VAProfileVP8Version0_3 | VP8_FRAME (0x46385056 = 'V','P','8','F') | yes, lines 175-178 |
+| VAProfileVP9Profile0 | VP9_FRAME (0x46395056 = 'V','P','9','F') | yes, lines 185-188 |
+
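A sketch of the per-profile gate, using the FOURCCs defined in `<linux/videodev2.h>`. `enum codec_family` is a hypothetical stand-in for the VAProfile* families. Real code would switch on the VAProfile values and use the kernel's V4L2_PIX_FMT_* macros directly. Returning 0 for an unknown family is an illustrative design choice: it lets the caller fail loudly instead of falling back to H264_SLICE.

```c
#include <stdint.h>

/* Same packing as the kernel's v4l2_fourcc(). */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* OUTPUT-queue formats as defined in <linux/videodev2.h>. */
#define PIX_FMT_H264_SLICE  FOURCC('S', '2', '6', '4')
#define PIX_FMT_HEVC_SLICE  FOURCC('S', '2', '6', '5')
#define PIX_FMT_MPEG2_SLICE FOURCC('M', 'G', '2', 'S')
#define PIX_FMT_VP8_FRAME   FOURCC('V', 'P', '8', 'F')
#define PIX_FMT_VP9_FRAME   FOURCC('V', 'P', '9', 'F')

/* Hypothetical stand-in for the VAProfile families the backend accepts. */
enum codec_family { FAM_H264, FAM_HEVC, FAM_MPEG2, FAM_VP8, FAM_VP9, FAM_UNKNOWN };

/* Profile-gated replacement for the hardcoded H264_SLICE.
 * Returns 0 for an unknown family so the caller can reject it. */
static uint32_t output_pixfmt_for(enum codec_family fam)
{
    switch (fam) {
    case FAM_H264:  return PIX_FMT_H264_SLICE;
    case FAM_HEVC:  return PIX_FMT_HEVC_SLICE;
    case FAM_MPEG2: return PIX_FMT_MPEG2_SLICE;
    case FAM_VP8:   return PIX_FMT_VP8_FRAME;
    case FAM_VP9:   return PIX_FMT_VP9_FRAME;
    default:        return 0;
    }
}
```
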
+config.c already knows the mapping (it uses it for profile-enumeration probes). The iter5b Phase 4 plan has three candidate approaches:
+- Thread the active profile from CreateContext through to CreateSurfaces.
+- Defer the OUTPUT-side `v4l2_set_format` from CreateSurfaces to CreateContext, when the profile is known.
+- Look up the active context's profile from `driver_data` at CreateSurfaces time.
+
+Which approach is cleanest is a Phase 4 plan question, not a Phase 0 question.
+
+## Open question for iter5b Phase 4
+
+Where in the VA-API lifecycle should the OUTPUT format S_FMT happen?
+
+- **Option α**: At CreateSurfaces (current site), but read profile from `driver_data->current_profile` which must be set at CreateConfig OR CreateContext. Simplest patch.
+- **Option β**: Defer the S_FMT + CREATE_BUFS lifecycle entirely from CreateSurfaces to CreateContext (when profile is unambiguously known). Larger refactor but architecturally cleanest.
+- **Option γ**: Trigger S_FMT lazily at first BeginPicture, when profile is definitely active. Requires checking format on every BeginPicture and conditionally REQBUFS(0)+S_FMT+CREATE_BUFS — heavyweight if BeginPicture fires often.
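
Option α's state threading can be sketched as follows. One ordering detail matters: `vaCreateContext` takes the render-target surface list as an argument, so decoders typically create surfaces before the context. If that holds for the callers here, the profile would have to be recorded at CreateConfig time for CreateSurfaces to see it. `struct driver_state` and both functions are hypothetical. The real backend would keep this in `driver_data` under its own naming.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-private state for option alpha. */
struct driver_state {
    bool     have_profile;
    uint32_t output_pixfmt;   /* recorded when the config is created */
};

/* Called from the CreateConfig path once the profile has been validated. */
static void on_create_config(struct driver_state *st, uint32_t pixfmt)
{
    st->output_pixfmt = pixfmt;
    st->have_profile = true;
}

/* Called from CreateSurfaces before S_FMT on OUTPUT_MPLANE.
 * Returns 0 and writes the format, or -1 if no profile is known yet. */
static int pixfmt_for_surfaces(const struct driver_state *st, uint32_t *pixfmt)
{
    if (!st->have_profile)
        return -1;            /* refuse rather than fall back to H264_SLICE */
    *pixfmt = st->output_pixfmt;
    return 0;
}
```

A single slot assumes one active config per driver instance. A process that holds configs for two codecs at once would need the format looked up per config instead.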
+
+Phase 4 of iter5b picks one of these.
+
+## Bug 4 — H.264 inter frame race (OUT OF SCOPE for iter5b)
+
+Empirical signature (Phase 3 + this loopback):
+- H.264 keyframe: real content (`81 81 80 80 …` chroma row in frame 1 first 16 bytes).
+- H.264 inter frames: all-zero (frames 2 and 3 are zero in every byte).
+- This pattern is consistent (not intermittent), which is not what the original "race" hypothesis would predict.
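
The signature above is a per-frame property. A tiny classifier makes it checkable run after run. This is a minimal sketch: it assumes a raw dump of equal-sized frames, where the frame size is width*height*3/2 for 8-bit 4:2:0 (an assumption about the output layout). Both function names are hypothetical. For the 3-frame H.264 test, the current expectation is {non-zero, zero, zero}.

```c
#include <stddef.h>
#include <stdint.h>

/* Count zero bytes in a region of a raw YUV dump. */
static size_t count_zero(const uint8_t *p, size_t n)
{
    size_t z = 0;
    for (size_t i = 0; i < n; i++)
        z += (p[i] == 0);
    return z;
}

/* Set is_zero[k] to 1 if frame k consists entirely of zero bytes, else 0. */
static void classify_frames(const uint8_t *buf, size_t frame_bytes,
                            size_t nframes, int *is_zero)
{
    for (size_t k = 0; k < nframes; k++)
        is_zero[k] = count_zero(buf + k * frame_bytes, frame_bytes) == frame_bytes;
}
```

The same `count_zero` total over the whole dump gives the "100% zero" or "99.99% zero" figures used in the Phase 3 table.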
+
+Possible causes:
+- DPB-related issue specifically affecting inter frames (kernel rejects inter decode because reference list is malformed).
+- Some other backend-side state that's wrong for inter (e.g., decode_params flags).
+- An actual race that happens to lose every time for inter (decode is slower for inter because of motion compensation, while keyframes can finish before the readback).
+
+Filed as **Bug 4**, to be investigated in a future iteration (likely iter6). It is outside iter5b's scope: the Bug 2 OUTPUT format fix alone is enough for 3 codecs (HEVC, VP9, VP8) to gain direct-verification PASS status.
+
+After iter5b: campaign scoreboard becomes "5/5 with 4 direct + 1 (H.264 inter) partial" — better than the current "5/5 with 1 direct (MPEG-2) + 4 transitive."
+
+## Substrate at loopback open
+
+- Kernel: `linux-fresnel-fourier 7.0-1`. Unchanged.
+- Fork tip: `692eaa0`. Unchanged.
+- Backend installed: SHA256 `6e90b7a9b2c33480…`. Unchanged.
+- Test fixtures: unchanged.
+- Boltzmann: reachable as of Phase 4/Phase 5; not needed for iter5b (no kernel work).
+- vb2_dma_resv RFC v2 patches: still local at `~/src/linux-rfc/`. Filed back to the kernel-agent backlog for a future campaign that targets the DMABUF-import consumer path (KWin/Mesa).
+
+## Memory rules touched
+
+- New: `feedback_trace_fix_mechanism_to_consumer.md` (pinned at start of this loopback per operator instruction).
+- `feedback_unconditional_codec_state.md` — applied: surface.c's pixelformat hardcode is the same class as the h264_start_code unconditional-set bug iter4 fixed. The lesson generalizes.
+- `feedback_review_empirical_over_theoretical.md` — Phase 5 review embodied Direction 2 by tracing producer→consumer empirically.
+
+## What iter5b looks like
+
+1. **Phase 1 lock** (this doc): bullets above. 4 criteria.
+2. **Phase 2 source-read**: pick implementation option α/β/γ. Cite line numbers.
+3. **Phase 3 baseline**: no re-run needed. The Phase 3 sweep is already done, and its baseline is captured at `iter5_phase3_baseline.tgz`.
+4. **Phase 4 plan**: patch shape, exact diff for surface.c, ancillary changes to track profile in driver_data.
+5. **Phase 5 review**: sonnet-architect, focused on the profile-threading mechanism.
+6. **Phase 6 implementation**: backend patch + rebuild + install. Estimated <100 LOC.
+7. **Phase 7 verification**: re-run Phase 3 sweep with the fixed backend; expect libva == kdirect for HEVC+VP9+VP8.
+8. **Phase 8 close**: campaign scoreboard updated.
+
+Estimated cadence: half a session for Phases 2-7. The fix is small, verification is fast, and no kernel build is needed.