# Iteration 5: Phase 0 loopback (re-root-cause Bug 2)

Captured 2026-05-11 mid-day after Phase 5 review CRIT-1 invalidated the iter5 Phase 4 plan. iter5 returns to Phase 0 per `feedback_dev_process.md`. The vb2_dma_resv-RFC-v2-as-Bug-2-fix hypothesis is **rejected**. This document captures the new empirical evidence and re-frames iter5.

## What Phase 5 + this loopback found

The original Bug 2 framing (Phase 0 Candidate B, Phase 2 situation, Phase 3 baseline, Phase 4 plan) was: "userspace cap_pool readback races ahead of kernel decoder completion; vb2_dma_resv RFC v2 closes the race."

The Phase 5 reviewer empirically traced producer→primitive→consumer-read-site through the actual libva backend code path and found the fence mechanism never reaches the MMAP+EXPBUF path the backend uses. The author re-verified at the surface.c source level: `RequestSyncSurface` already does `media_request_wait_completion + v4l2_dequeue_buffer`, which already block until decode-DONE. The fence would have been a no-op for the libva path even if it had reached the right resv.

So why are pages all-zero? This loopback's empirical investigation answers it.

### Empirical finding: OUTPUT pixel format hardcoded H264_SLICE

`/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/src/surface.c:173`:

```c
unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;
```

This is the OUTPUT-side pixel format the backend sets on the V4L2 queue. **It is hardcoded, regardless of which profile is active.**

When the backend then submits HEVC controls (`V4L2_CID_STATELESS_HEVC_*`) on an OUTPUT buffer queued with H264_SLICE format, the kernel rkvdec driver sees a fundamental contract mismatch:

- OUTPUT buffer format claims: H.264 NAL slices.
- Submitted controls claim: HEVC SPS/PPS/decode_params/slice_params/scaling_matrix.

The kernel doesn't decode (it logs no error in dmesg: silent rejection). The CAPTURE buffer stays in the cap_pool init state (all-zero).
When userspace `vaDeriveImage + vaMapBuffer + ffmpeg hwdownload` reads, it reads the unmodified all-zero pages.

This is the **same class of bug** as memory `feedback_unconditional_codec_state.md`: codec-specific state set unconditionally without profile gating. The iter4 fix for `h264_start_code` gated it on H.264/HEVC profiles. Surface.c's pixelformat needs an analogous gate.

### Per-codec Bug 2 taxonomy (post-loopback)

| Codec | OUTPUT format in libva | Match? | Phase 3 libva result | Phase 3 explanation |
|---|---|---|---|---|
| H.264 | H264_SLICE | **right** | 99.99% zero + traces of keyframe content | Format right; **separate H.264 inter-frame bug** (race or something else) |
| HEVC | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
| VP9 | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
| MPEG-2 | H264_SLICE | **wrong but tolerated** | Real decoded pixels (libva == kdirect) | hantro is single-codec; ignores OUTPUT format mismatch, dispatches on control class. Or got lucky on timing. |
| VP8 | H264_SLICE | **wrong** | 100% zero | OUTPUT format mismatch on hantro → unlike MPEG-2, VP8 doesn't tolerate it |

Empirically verified via strace inspection at fresnel `/tmp/iter5_fmt//trace.*` 2026-05-11: each codec's first `VIDIOC_S_FMT` on `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE` ships pixelformat `S264` (= `V4L2_PIX_FMT_H264_SLICE`, FOURCC 0x34363253). No corrective re-S_FMT later in any codec's trace.

### What this means for iter1-iter4 close validity

iter2 (HEVC), iter3 (VP8), and iter4 (VP9) all closed via **transitive proof**: backend's `VIDIOC_S_EXT_CTRLS` payload byte-matched the kernel-direct ffmpeg-v4l2request anchor. That proof verified the **control-payload contract**, not the OUTPUT pixel format.
The transitive proof was technically sound for what it claimed (controls correctly shaped), but **silently masked** the OUTPUT format bug, because the test never actually decoded a frame through libva and read pixels; it only compared control bytes.

iter1 (MPEG-2) closed via direct pixel verification, and it happens to work because hantro tolerates the wrong OUTPUT format. iter1's "PASS direct" stands.

H.264 (T4) is in the carry-over from libva-multiplanar's ohm work. It also happens to use the default H264_SLICE format. Decoded keyframes through libva were verified at iter1 P0 cross-validator (real pixels at +30s seek). But the new Phase 3 data shows inter frames go all-zero for H.264 too. **That's a separate Bug 4** (H.264 inter-frame race or sync gap) that the iter1 P0 testing didn't expose because mpv's `--vo=image --frames=2 --start=+30s` happens to hit a keyframe.

### Bug taxonomy after loopback

| Bug | Root cause | Affects | Severity |
|---|---|---|---|
| **Bug 2** | `surface.c:173` hardcodes OUTPUT format H264_SLICE for every profile | HEVC, VP9, VP8 (rkvdec + hantro that strictly checks format) | HIGH: codec-class bug, masks 3 of 5 codecs |
| **Bug 3** | None: Phase 5 confirmed Bug 3 doesn't exist as UAPI drift | (n/a) | (n/a) |
| **Bug 4** | (new, from Phase 3 + this loopback) H.264 inter frames produce all-zero pages through libva even though OUTPUT format matches | H.264 inter frames specifically | MEDIUM: keyframes work, inter frames don't |

## Re-locked Phase 1 success criteria (iter5b)

The original 4 criteria are partially still valid. Criterion 1 changes target (backend fix, not kernel patch). Criterion 2 (substrate ships from kernel-agent) is dropped, since no kernel patches are needed for the new fix. Criterion 3 (no codec-contract regression) stays. Criterion 4 (5/5 direct) stays as the bar.
> **"Fix the libva backend's OUTPUT pixel-format hardcoding (surface.c:173) so that each profile sets the correct V4L2_PIX_FMT_*_SLICE / *_FRAME on the OUTPUT_MPLANE buffer. After fix: ffmpeg-vaapi-hwdownload for HEVC + VP9 + VP8 produces YUV byte-identical to the kernel-direct + SW reference. MPEG-2 continues to pass. H.264 keyframes continue to decode correctly through libva. H.264 inter-frame Bug 4 is OUT OF SCOPE for iter5b; deferred to a follow-up iteration."**

### Pass/fail (boolean, iter5b)

1. **Bug 2 closed for HEVC, VP9, VP8**: `libva_.yuv == kdirect_.yuv == sw_.yuv` (SHA256-equal raw YUV bytes), for `bbb_720p10s_{hevc.mp4, vp9.webm, vp8.webm}`, 3-frame test, on the current `linux-fresnel-fourier 7.0-1` kernel (no kernel patches).
2. **No regression on MPEG-2**: `libva_mpeg2.yuv == kdirect_mpeg2.yuv` still holds (matches Phase 3 baseline). MPEG-2 already worked; the OUTPUT format fix should not break it.
3. **H.264 keyframe still decodes**: H.264 first frame (keyframe) through libva still produces real content (`81 81 80 80 …` neutral chroma row at byte offset 0 of frame 1). Bug 4 (inter frames all-zero) is acceptable for iter5b close: out of scope, recorded as backlog. H.264 still passes via the iter1-P0 keyframe-seek path.
4. **Control-payload anchors hold**: `VIDIOC_S_EXT_CTRLS` payload for each codec on the fixed backend byte-matches the iter5 Phase 3 anchor. Backend control-handling code is unchanged; only the OUTPUT pixel format setup changes.

Clean iter5b close = 4/4 criteria green.

## Phase 2 source-read targets for iter5b

The fix site is `src/surface.c:173`. The supporting per-profile mapping table needs to derive from:

- VA-API VAProfile* enum (`/usr/include/va/va.h`).
- Backend's existing profile dispatch (e.g., `config.c::RequestCreateConfig` switch, `picture.c::codec_set_controls` dispatch).
- Kernel UAPI per-codec OUTPUT pixel formats (``, FOURCC values).
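
To make the FOURCC side of these sources concrete, here is a minimal sketch of a per-codec OUTPUT format lookup. It is not the backend's code: `SKETCH_FOURCC`, `enum sketch_codec`, and `sketch_output_pixelformat` are hypothetical names, and the enum is a stand-in for the VAProfile families so the block builds without `va.h` or kernel headers. The character codes are the ones the mainline V4L2 UAPI header uses for the stateless codec formats as far as I know; they should be checked against the installed header before being relied on.

```c
#include <stdint.h>

/* Same byte packing as the kernel's v4l2_fourcc() macro: the first
 * character lands in the least-significant byte. Re-declared here
 * (hypothetical name) so the sketch compiles without kernel headers. */
#define SKETCH_FOURCC(a, b, c, d)                          \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) |                \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical stand-in for the VAProfile families the backend handles. */
enum sketch_codec {
    SKETCH_CODEC_H264,
    SKETCH_CODEC_HEVC,
    SKETCH_CODEC_MPEG2,
    SKETCH_CODEC_VP8,
    SKETCH_CODEC_VP9,
};

/* Return the OUTPUT-queue pixel format for a codec, or 0 if unknown.
 * Assumption: character codes mirror the mainline UAPI definitions of
 * V4L2_PIX_FMT_{H264_SLICE,HEVC_SLICE,MPEG2_SLICE,VP8_FRAME,VP9_FRAME}. */
static uint32_t sketch_output_pixelformat(enum sketch_codec codec)
{
    switch (codec) {
    case SKETCH_CODEC_H264:
        return SKETCH_FOURCC('S', '2', '6', '4');
    case SKETCH_CODEC_HEVC:
        return SKETCH_FOURCC('S', '2', '6', '5');
    case SKETCH_CODEC_MPEG2:
        return SKETCH_FOURCC('M', 'G', '2', 'S');
    case SKETCH_CODEC_VP8:
        return SKETCH_FOURCC('V', 'P', '8', 'F');
    case SKETCH_CODEC_VP9:
        return SKETCH_FOURCC('V', 'P', '9', 'F');
    }
    return 0; /* unknown codec: caller should refuse to S_FMT */
}
```

Returning 0 for an unknown codec, instead of falling back to H264_SLICE, is a deliberate choice in this sketch: a silent default is exactly the failure mode this loopback found.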
Expected mapping:

| VA profile (object_config->profile) | V4L2_PIX_FMT_* | Already in config.c? (grep at iter5 Phase 0) |
|---|---|---|
| VAProfileH264* | H264_SLICE (0x34363253 = 'S','2','6','4') | yes, lines 151-154 |
| VAProfileHEVCMain | HEVC_SLICE (0x35363253 = 'S','2','6','5') | yes, lines 165-168 |
| VAProfileMPEG2* | MPEG2_SLICE | yes, lines 140-143 |
| VAProfileVP8Version0_3 | VP8_FRAME | yes, lines 175-178 |
| VAProfileVP9Profile0 | VP9_FRAME | yes, lines 185-188 |

config.c already knows the mapping (uses it for profile-enumeration probes). Phase 4 plan for iter5b is to thread the active profile from CreateContext through to CreateSurfaces, OR defer the OUTPUT-side `v4l2_set_format` from CreateSurfaces to CreateContext when the profile is known, OR look up the active context's profile from `driver_data` at CreateSurfaces time. Which approach is cleanest is a Phase 4 plan question, not a Phase 0 question.

## Open question for iter5b Phase 4

Where in the VA-API lifecycle should the OUTPUT format S_FMT happen?

- **Option α**: At CreateSurfaces (current site), but read profile from `driver_data->current_profile` which must be set at CreateConfig OR CreateContext. Simplest patch.
- **Option β**: Defer the S_FMT + CREATE_BUFS lifecycle entirely from CreateSurfaces to CreateContext (when profile is unambiguously known). Larger refactor but architecturally cleanest.
- **Option γ**: Trigger S_FMT lazily at first BeginPicture, when profile is definitely active. Requires checking format on every BeginPicture and conditionally REQBUFS(0)+S_FMT+CREATE_BUFS; heavyweight if BeginPicture fires often.

Phase 4 of iter5b picks one of these.

## Bug 4: H.264 inter frame race (OUT OF SCOPE for iter5b)

Empirical signature (Phase 3 + this loopback):

- H.264 keyframe: real content (`81 81 80 80 …` chroma row in frame 1 first 16 bytes).
- H.264 inter frame: all-zero (frame 2 + 3 byte-by-byte zero).
- This pattern is consistent (not intermittent), unlike what the original "race" hypothesis would predict.

Possible causes:

- DPB-related issue specifically affecting inter frames (kernel rejects inter decode because reference list is malformed).
- Some other backend-side state that's wrong for inter (e.g., decode_params flags).
- An actual race that happens to lose every time for inter (decode is slower for inter because of motion compensation, while keyframes can finish before the readback).

Filed as **Bug 4**. To be investigated by a future iteration (likely iter6). Not iter5b's surface: the Bug 2 OUTPUT format fix is enough for 3 codecs (HEVC, VP9, VP8) to gain direct-verification PASS status.

After iter5b: campaign scoreboard becomes "5/5 with 4 direct + 1 (H.264 inter) partial", better than the current "5/5 with 1 direct (MPEG-2) + 4 transitive."

## Substrate at loopback open

- Kernel: `linux-fresnel-fourier 7.0-1`. Unchanged.
- Fork tip: `692eaa0`. Unchanged.
- Backend installed: SHA256 `6e90b7a9b2c33480…`. Unchanged.
- Test fixtures: unchanged.
- Boltzmann: reachable as of Phase 4/Phase 5; not needed for iter5b (no kernel work).
- vb2_dma_resv RFC v2 patches: still local at `~/src/linux-rfc/`. Filed back to the kernel-agent backlog for a future campaign that targets the DMABUF-import consumer path (KWin/Mesa).

## Memory rules touched

- New: `feedback_trace_fix_mechanism_to_consumer.md` (pinned at start of this loopback per operator instruction).
- `feedback_unconditional_codec_state.md`: applied. surface.c's pixelformat hardcode is the same class as the h264_start_code unconditional-set bug iter4 fixed. The lesson generalizes.
- `feedback_review_empirical_over_theoretical.md`: Phase 5 review embodied Direction 2 by tracing producer→consumer empirically.

## What iter5b looks like

1. **Phase 1 lock** (this doc): bullets above. 4 criteria.
2. **Phase 2 source-read**: pick implementation option α/β/γ. Cite line numbers.
3. **Phase 3 baseline**: re-run Phase 3 sweep (already done; baseline already captured at `iter5_phase3_baseline.tgz`). No new baseline needed.
4. **Phase 4 plan**: patch shape, exact diff for surface.c, ancillary changes to track profile in driver_data.
5. **Phase 5 review**: sonnet-architect, focused on the profile-threading mechanism.
6. **Phase 6 implementation**: backend patch + rebuild + install. Estimated <100 LOC.
7. **Phase 7 verification**: re-run Phase 3 sweep with the fixed backend; expect libva == kdirect for HEVC+VP9+VP8.
8. **Phase 8 close**: campaign scoreboard updated.

Estimated cadence: half a session for Phase 2-7. The fix is small, the verification is fast, no kernel build needed.
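
The Phase 7 check comes down to two questions per frame: is the libva output still all-zero, and does it match the kernel-direct output byte for byte? A minimal sketch of both checks follows. The function names are hypothetical and not part of the existing sweep scripts, and a direct `memcmp` stands in for the SHA256 comparison named in the pass/fail criteria (equivalent for deciding equality when both buffers are available locally).

```c
#include <stddef.h>
#include <string.h>

/* Count zero bytes in a raw frame buffer. A result equal to len is the
 * "100% zero" signature; a slightly smaller one corresponds to the
 * "99.99% zero" pattern. Hypothetical helper name. */
static size_t sketch_count_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0)
            zeros++;
    }
    return zeros;
}

/* Return 1 when two raw frame buffers are byte-identical, else 0.
 * Used here in place of comparing SHA256 digests. Hypothetical name. */
static int sketch_frames_equal(const unsigned char *a, size_t a_len,
                               const unsigned char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

A frame passes when `sketch_frames_equal(libva, n, kdirect, n)` returns 1 and `sketch_count_zero_bytes(libva, n)` is well below `n`; checking both guards against the degenerate case where the two outputs match only because both are empty.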