iter5 Phase 0 loopback: real Bug 2 is surface.c:173 hardcoded OUTPUT format

Empirical strace of all 5 codecs through libva shows VIDIOC_S_FMT on
OUTPUT_MPLANE ships pixelformat V4L2_PIX_FMT_H264_SLICE for EVERY
profile. HEVC controls submitted on H264_SLICE OUTPUT → kernel rkvdec
silently rejects/no-ops → CAPTURE stays in cap_pool init (all-zero).

Per-codec Bug 2 taxonomy:
- HEVC, VP9, VP8: OUTPUT format mismatch on rkvdec/hantro-strict → 100% zero
- MPEG-2: format mismatch but hantro tolerates → works
- H.264: format right by coincidence; keyframe decodes, inter all-zero
  (Bug 4, separate, deferred from iter5b)

Site: src/surface.c:173 `unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE`.
Same bug class as feedback_unconditional_codec_state.md
(iter4 h264_start_code = true).

iter5b new Phase 1: fix surface.c to switch pixelformat on
config_object->profile. 4 criteria locked, all backend-side, no kernel
patches. RFC v2 series filed back to backlog for a future
DMABUF-import-consumer campaign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:21:41 +00:00
parent 0adfb11fff
commit cd34ec1918
@@ -0,0 +1,147 @@
# Iteration 5 — Phase 0 loopback (re-root-cause Bug 2)
Captured 2026-05-11 mid-day after Phase 5 review CRIT-1 invalidated the iter5 Phase 4 plan. iter5 returns to Phase 0 per `feedback_dev_process.md`. The vb2_dma_resv-RFC-v2-as-Bug-2-fix hypothesis is **rejected**. This document captures the new empirical evidence and re-frames iter5.
## What Phase 5 + this loopback found
The original Bug 2 framing (Phase 0 Candidate B, Phase 2 situation, Phase 3 baseline, Phase 4 plan) was: "userspace cap_pool readback races ahead of kernel decoder completion; vb2_dma_resv RFC v2 closes the race." Phase 5 reviewer empirically traced producer→primitive→consumer-read-site through the actual libva backend code path and found the fence mechanism never reaches the MMAP+EXPBUF path the backend uses. The author re-verified at the surface.c source level: `RequestSyncSurface` already does `media_request_wait_completion + v4l2_dequeue_buffer`, which already block until decode-DONE. The fence would have been a no-op for the libva path even if it had reached the right resv.
So why are pages all-zero?
This loopback's empirical investigation answers it.
### Empirical finding: OUTPUT pixel format hardcoded H264_SLICE
`/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/src/surface.c:173`:
```c
unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;
```
This is the OUTPUT-side pixel format the backend sets on the V4L2 queue. **It is hardcoded, regardless of which profile is active.** When the backend then submits HEVC controls (`V4L2_CID_STATELESS_HEVC_*`) on an OUTPUT buffer queued with H264_SLICE format, the kernel rkvdec driver sees a fundamental contract mismatch:
- OUTPUT buffer format claims: H.264 NAL slices.
- Submitted controls claim: HEVC SPS/PPS/decode_params/slice_params/scaling_matrix.
The kernel does not decode and logs no error in dmesg (silent rejection). The CAPTURE buffer stays in the cap_pool init state (all-zero). When userspace reads via `vaDeriveImage + vaMapBuffer + ffmpeg hwdownload`, it reads the unmodified all-zero pages.
This is the **same class of bug** as memory `feedback_unconditional_codec_state.md`: codec-specific state set unconditionally without profile gating. The iter4 fix for `h264_start_code` gated it on H.264/HEVC profiles. Surface.c's pixelformat needs an analogous gate.
### Per-codec Bug 2 taxonomy (post-loopback)
| Codec | OUTPUT format in libva | Match? | Phase 3 libva result | Phase 3 explanation |
|---|---|---|---|---|
| H.264 | H264_SLICE | **right** | 99.99% zero + traces of keyframe content | Format right; **separate H.264 inter-frame bug** — race or something else |
| HEVC | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
| VP9 | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
| MPEG-2 | H264_SLICE | **wrong but tolerated** | Real decoded pixels (libva == kdirect) | hantro is single-codec; ignores OUTPUT format mismatch, dispatches on control class. Or got lucky on timing. |
| VP8 | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch on hantro → unlike MPEG-2, VP8 doesn't tolerate it |
Empirically verified via strace inspection at fresnel `/tmp/iter5_fmt/<codec>/trace.*` 2026-05-11: each codec's first `VIDIOC_S_FMT` on `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE` ships pixelformat `S264` (= `V4L2_PIX_FMT_H264_SLICE`, FOURCC 0x34363253). No corrective re-S_FMT later in any codec's trace.
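The `S264` / `0x34363253` correspondence above can be checked by hand. The following is a minimal sketch, assuming the usual V4L2 FOURCC packing (first character in the least significant byte). `FOURCC` and `fourcc_to_str` are hypothetical names introduced here for illustration, not backend or kernel symbols.

```c
#include <stdint.h>

/* Same packing as the kernel's v4l2_fourcc() macro: first character in the
 * least significant byte. Redefined locally so the sketch builds without
 * <linux/videodev2.h>. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical helper: unpack a FOURCC into a NUL-terminated 4-char string,
 * which is how the pixelformat appears in an strace dump. */
static void fourcc_to_str(uint32_t fourcc, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((fourcc >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Calling `fourcc_to_str(0x34363253, buf)` fills `buf` with the four characters `S`, `2`, `6`, `4`, matching the value seen in every codec's trace.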
### What this means for iter1-iter4 close validity
iter2 (HEVC), iter3 (VP8), and iter4 (VP9) all closed via **transitive proof**: backend's `VIDIOC_S_EXT_CTRLS` payload byte-matched the kernel-direct ffmpeg-v4l2request anchor. That proof verified the **control-payload contract**, not the OUTPUT pixel format. The transitive proof was technically sound for what it claimed (controls correctly shaped), but **silently masked** the OUTPUT format bug — because the test never actually decoded a frame through libva and read pixels; it only compared control bytes.
iter1 (MPEG-2) closed via direct pixel verification, and it happens to work because hantro tolerates the wrong OUTPUT format. iter1's "PASS direct" stands.
H.264 (T4) is in the carry-over from libva-multiplanar's ohm work. It also happens to use the default H264_SLICE format. Decoded keyframes through libva were verified at iter1 P0 cross-validator (real pixels at +30s seek). But the new Phase 3 data shows inter frames go all-zero for H.264 too. **That's a separate Bug 4** (H.264 inter-frame race or sync gap) that the iter1 P0 testing didn't expose because mpv's `--vo=image --frames=2 --start=+30s` happens to hit a keyframe.
### Bug taxonomy after loopback
| Bug | Root cause | Affects | Severity |
|---|---|---|---|
| **Bug 2** | `surface.c:173` hardcodes OUTPUT format H264_SLICE for every profile | HEVC, VP9, VP8 (rkvdec + hantro that strictly checks format) | HIGH — codec-class bug, masks 3 of 5 codecs |
| **Bug 3** | None — Phase 5 confirmed Bug 3 doesn't exist as UAPI drift | (n/a) | (n/a) |
| **Bug 4** | (new, from Phase 3 + this loopback) H.264 inter frames produce all-zero pages through libva even though OUTPUT format matches | H.264 inter frames specifically | MEDIUM — keyframes work, inter frames don't |
## Re-locked Phase 1 success criteria (iter5b)
The original 4 criteria remain partially valid. Criterion 1 changes target (backend fix, not kernel patch). Criterion 2 (substrate ships from kernel-agent) is dropped because the new fix needs no kernel patches. Criterion 3 (no codec-contract regression) stays. Criterion 4 (5/5 direct) stays as the bar.
> **"Fix the libva backend's OUTPUT pixel-format hardcoding (surface.c:173) so that each profile sets the correct V4L2_PIX_FMT_*_SLICE / *_FRAME on the OUTPUT_MPLANE buffer. After fix: ffmpeg-vaapi-hwdownload for HEVC + VP9 + VP8 produces YUV byte-identical to the kernel-direct + SW reference. MPEG-2 continues to pass. H.264 keyframes continue to decode correctly through libva. H.264 inter-frame Bug 4 is OUT OF SCOPE for iter5b; deferred to a follow-up iteration."**
### Pass/fail (boolean, iter5b)
1. **Bug 2 closed for HEVC, VP9, VP8**: `libva_<codec>.yuv == kdirect_<codec>.yuv == sw_<codec>.yuv` (SHA256-equal raw YUV bytes), for `bbb_720p10s_{hevc.mp4, vp9.webm, vp8.webm}`, 3-frame test, on the current `linux-fresnel-fourier 7.0-1` kernel (no kernel patches).
2. **No regression on MPEG-2**: `libva_mpeg2.yuv == kdirect_mpeg2.yuv` still holds (matches Phase 3 baseline). MPEG-2 already worked; the OUTPUT format fix should not break it.
3. **H.264 keyframe still decodes**: H.264 first frame (keyframe) through libva still produces real content (`81 81 80 80 …` neutral chroma row at byte offset 0 of frame 1). Bug 4 (inter frames all-zero) is acceptable for iter5b close: out of scope, recorded as backlog. H.264 still passes via the iter1-P0 keyframe-seek path.
4. **Control-payload anchors hold**: `VIDIOC_S_EXT_CTRLS` payload for each codec on the fixed backend byte-matches the iter5 Phase 3 anchor. Backend control-handling code is unchanged; only the OUTPUT pixel format setup changes.
Clean iter5b close = 4/4 criteria green.
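The checks behind criteria 1 to 3 reduce to two operations on raw YUV dumps: "are two dumps byte-identical" and "how many bytes are zero". The sketch below illustrates both, using a direct byte comparison as a stand-in for the SHA256 comparison named in the criteria. The helper names are hypothetical and not part of the test harness.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count bytes equal to zero in a raw YUV dump.
 * A count equal to len corresponds to the "100% zero" rows in the taxonomy. */
static size_t count_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0)
            zeros++;
    return zeros;
}

/* Hypothetical helper: returns 1 when both dumps have the same length and
 * the same bytes, 0 otherwise. */
static int dumps_identical(const unsigned char *a, size_t a_len,
                           const unsigned char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

Under this sketch, criterion 1 holds for a codec when `dumps_identical` returns 1 for the libva dump against both the kernel-direct and the SW reference dumps.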
## Phase 2 source-read targets for iter5b
The fix site is `src/surface.c:173`. The supporting per-profile mapping table needs to derive from:
- VA-API VAProfile* enum (`/usr/include/va/va.h`).
- Backend's existing profile dispatch (e.g., `config.c::RequestCreateConfig` switch, `picture.c::codec_set_controls` dispatch).
- Kernel UAPI per-codec OUTPUT pixel formats (`<linux/videodev2.h>`, FOURCC values); the matching stateless codec controls live in `<linux/v4l2-controls.h>`.
Expected mapping:
| VA profile (object_config->profile) | V4L2_PIX_FMT_* | Already in config.c?(grep at iter5 Phase 0) |
|---|---|---|
| VAProfileH264* | H264_SLICE (0x34363253 = 'S','2','6','4') | yes line 151-154 |
| VAProfileHEVCMain | HEVC_SLICE (0x35363253 = 'S','2','6','5') | yes line 165-168 |
| VAProfileMPEG2* | MPEG2_SLICE | yes line 140-143 |
| VAProfileVP8Version0_3 | VP8_FRAME | yes line 175-178 |
| VAProfileVP9Profile0 | VP9_FRAME | yes line 185-188 |
config.c already knows the mapping (uses it for profile-enumeration probes). Phase 4 plan for iter5b is to thread the active profile from CreateContext through to CreateSurfaces, OR defer the OUTPUT-side `v4l2_set_format` from CreateSurfaces to CreateContext when the profile is known, OR look up the active context's profile from `driver_data` at CreateSurfaces time.
Which approach is cleanest is a Phase 4 plan question, not a Phase 0 question.
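Whichever option Phase 4 picks, the fix needs a profile-to-FOURCC lookup. The following is a minimal sketch of such a lookup, not the backend's actual code. The `sketch_profile` enum stands in for the `VAProfile*` values from `<va/va.h>`, and `FOURCC` mirrors the kernel's `v4l2_fourcc()` packing so the block builds without kernel or libva headers. All names are assumptions for illustration.

```c
#include <stdint.h>

/* Same packing as v4l2_fourcc(): first character in the low byte. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Stand-in for the VAProfile values; the real code would switch on
 * the profile stored in the config object. */
enum sketch_profile {
    SKETCH_PROFILE_MPEG2,
    SKETCH_PROFILE_H264,
    SKETCH_PROFILE_HEVC,
    SKETCH_PROFILE_VP8,
    SKETCH_PROFILE_VP9,
    SKETCH_PROFILE_UNKNOWN
};

/* Returns the OUTPUT-queue FOURCC for a profile, or 0 when the profile is
 * unmapped, so the caller can fail instead of silently defaulting to H.264. */
static uint32_t output_fourcc_for_profile(enum sketch_profile profile)
{
    switch (profile) {
    case SKETCH_PROFILE_MPEG2: return FOURCC('M', 'G', '2', 'S'); /* MPEG2_SLICE */
    case SKETCH_PROFILE_H264:  return FOURCC('S', '2', '6', '4'); /* H264_SLICE */
    case SKETCH_PROFILE_HEVC:  return FOURCC('S', '2', '6', '5'); /* HEVC_SLICE */
    case SKETCH_PROFILE_VP8:   return FOURCC('V', 'P', '8', 'F'); /* VP8_FRAME */
    case SKETCH_PROFILE_VP9:   return FOURCC('V', 'P', '9', 'F'); /* VP9_FRAME */
    default:                   return 0;
    }
}
```

Returning 0 for an unmapped profile is a design choice of this sketch: it makes the "codec-specific state set without profile gating" failure mode visible at the call site rather than falling back to a default format.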
## Open question for iter5b Phase 4
Where in the VA-API lifecycle should the OUTPUT format S_FMT happen?
- **Option α**: At CreateSurfaces (current site), but read profile from `driver_data->current_profile` which must be set at CreateConfig OR CreateContext. Simplest patch.
- **Option β**: Defer the S_FMT + CREATE_BUFS lifecycle entirely from CreateSurfaces to CreateContext (when profile is unambiguously known). Larger refactor but architecturally cleanest.
- **Option γ**: Trigger S_FMT lazily at first BeginPicture, when profile is definitely active. Requires checking format on every BeginPicture and conditionally REQBUFS(0)+S_FMT+CREATE_BUFS — heavyweight if BeginPicture fires often.
Phase 4 of iter5b picks one of these.
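To make the cost argument for Option γ concrete, here is a small sketch of the per-BeginPicture check it would need. The check itself is a single comparison; the expensive REQBUFS(0) + S_FMT + CREATE_BUFS sequence would only run when the wanted format differs from the one already set. The struct and function names are hypothetical, and the real `driver_data` layout and ioctl calls are deliberately omitted.

```c
#include <stdint.h>

/* Hypothetical per-driver state; not the backend's real driver_data. */
struct sketch_output_state {
    uint32_t active_fourcc; /* 0 until the first S_FMT has been issued */
};

/* Returns 1 when the OUTPUT queue must be torn down and set up again,
 * 0 when the currently configured format already matches. */
static int output_needs_reconfigure(const struct sketch_output_state *st,
                                    uint32_t wanted_fourcc)
{
    return st->active_fourcc != wanted_fourcc;
}

/* Records the format after a successful S_FMT; the ioctl sequence itself
 * is left out of this sketch. */
static void output_mark_configured(struct sketch_output_state *st,
                                   uint32_t fourcc)
{
    st->active_fourcc = fourcc;
}
```

In this sketch a stream that keeps one profile pays for the reconfigure once, on its first BeginPicture, and every later call takes the cheap path.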
## Bug 4 — H.264 inter frame race (OUT OF SCOPE for iter5b)
Empirical signature (Phase 3 + this loopback):
- H.264 keyframe: real content (`81 81 80 80 …` chroma row in frame 1 first 16 bytes).
- H.264 inter frame: all-zero (frame 2 + 3 byte-by-byte zero).
- This pattern is consistent (not intermittent), contrary to what the original "race" hypothesis would predict.
Possible causes:
- DPB-related issue specifically affecting inter frames (kernel rejects inter decode because reference list is malformed).
- Some other backend-side state that's wrong for inter (e.g., decode_params flags).
- An actual race that happens to lose every time for inter (decode is slower for inter because of motion compensation, while keyframes can finish before the readback).
Filed as **Bug 4**, to be investigated by a future iteration (likely iter6). It is outside iter5b's scope: the Bug 2 OUTPUT format fix is enough for 3 codecs (HEVC, VP9, VP8) to gain direct-verification PASS status.
After iter5b: campaign scoreboard becomes "5/5 with 4 direct + 1 (H.264 inter) partial" — better than the current "5/5 with 1 direct (MPEG-2) + 4 transitive."
## Substrate at loopback open
- Kernel: `linux-fresnel-fourier 7.0-1`. Unchanged.
- Fork tip: `692eaa0`. Unchanged.
- Backend installed: SHA256 `6e90b7a9b2c33480…`. Unchanged.
- Test fixtures: unchanged.
- Boltzmann: reachable as of Phase 4/Phase 5; not needed for iter5b (no kernel work).
- vb2_dma_resv RFC v2 patches: still local at `~/src/linux-rfc/`. Filed back to the kernel-agent backlog for a future campaign that targets the DMABUF-import consumer path (KWin/Mesa).
## Memory rules touched
- New: `feedback_trace_fix_mechanism_to_consumer.md` (pinned at start of this loopback per operator instruction).
- `feedback_unconditional_codec_state.md` — applied: surface.c's pixelformat hardcode is the same class as the h264_start_code unconditional-set bug iter4 fixed. The lesson generalizes.
- `feedback_review_empirical_over_theoretical.md` — Phase 5 review embodied Direction 2 by tracing producer→consumer empirically.
## What iter5b looks like
1. **Phase 1 lock** (this doc): bullets above. 4 criteria.
2. **Phase 2 source-read**: pick implementation option α/β/γ. Cite line numbers.
3. **Phase 3 baseline**: reuse the existing Phase 3 sweep (baseline already captured at `iter5_phase3_baseline.tgz`). No new baseline needed.
4. **Phase 4 plan**: patch shape, exact diff for surface.c, ancillary changes to track profile in driver_data.
5. **Phase 5 review**: sonnet-architect, focused on the profile-threading mechanism.
6. **Phase 6 implementation**: backend patch + rebuild + install. Estimated <100 LOC.
7. **Phase 7 verification**: re-run Phase 3 sweep with the fixed backend; expect libva == kdirect for HEVC+VP9+VP8.
8. **Phase 8 close**: campaign scoreboard updated.
Estimated cadence: half a session for Phase 2-7. The fix is small, the verification is fast, no kernel build needed.