# Iteration 6 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock Opens 2026-05-12 immediately after iter5b-β close ([`phase8_iteration5b_close.md`](phase8_iteration5b_close.md), commit `9a14cc2`). Per `feedback_dev_process.md` Phase 0, this document captures iter6's substrate state, the iter5b-β-surfaced bug inventory, and candidate research questions for Phase 1 lock. iter5b-β was the first iteration to break the "codec N + 1" pattern. iter6 inherits an even messier menu: 3 named bugs from iter5b, the iter4-B1+B4 carry-overs, and the option-Φ candidates from the original iter5 Phase 0 doc that weren't picked. ## Substrate state (verified 2026-05-12 at iter5b-β close) | Property | Value | Notes | |---|---|---| | Kernel | `7.0.0-fresnel-fourier` | `linux-fresnel-fourier 7.0-1` kernel-agent product; unchanged through iter5b. | | Boot device numbering (today) | rkvdec `/dev/video1+/dev/media0`, hantro-vpu-dec `/dev/video3+/dev/media1` | Different from yesterday; iter4-B1 still open. | | Fork tip | `70196f8` (β + Commit D) | On noether + fresnel + gitea. | | Backend installed | `/usr/lib/dri/v4l2_request_drv_video.so` SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8` | β architecture; OUTPUT lifecycle owned by CreateContext. | | Codec scoreboard | 5/5 with 2 direct (VP9, MPEG-2) + 3 mixed (H.264 keyframe-partial Bug 4, VP8 partial Bug 6, HEVC transitive-only direct-FAIL Bug 5) | iter5b-β closed VP9 directly; others remain mixed. | ## Bug inventory after iter5b-β ### Active bugs with explicit reproduction signatures **Bug 4 — H.264 inter-frame race-loss** *(carried from iter4 Phase 7)* - Signature: H.264 keyframe decodes correctly through libva; inter frames return all-zero pages. - Reproduce: `ffmpeg -hwaccel vaapi -i bbb_1080p30_h264.mp4 -frames:v 3 -vf hwdownload,format=nv12,format=yuv420p -f rawvideo out.yuv`. Hash `71ac099b…`. 99.99% zero; frame 1 first 16 bytes = `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81` (real chroma); frames 2, 3 fully zero. - Hypothesis surface: slot rotation, partial DPB, or some inter-specific submission gap. **Bug 5 — HEVC libva DQBUF returns FLAG_ERROR** - Signature: every HEVC libva DQBUF (both OUTPUT and CAPTURE) sets `V4L2_BUF_FLAG_ERROR`. Kernel rkvdec rejects the decode. CAPTURE stays at cap_pool init pattern (all-zero). - Reproduce: same shape as Bug 4, with `bbb_720p10s_hevc.mp4`. Hash `06b2c5a0…` = all-zero. - Pre-existing: Phase 3 baseline anchor trace (pre-iter5b) also showed FLAG_ERROR on every HEVC DQBUF. iter2's "PASS via transitive proof" verified backend's control PAYLOAD matched kdirect's payload — but the kernel rejected the libva submission regardless. Some V4L2 protocol contract aspect differs between libva backend and ffmpeg-v4l2request that the transitive proof didn't capture (request_fd binding order, sequence number, ioctl sequencing, extra control needed, etc.). - Difficulty estimate: medium-to-high. Need to diff the actual V4L2 ioctl streams between libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC and find what's different on the wire. **Bug 6 — VP8 libva produces non-zero but non-matching output** - Signature: VP8 libva runs decode (no DQBUF ERROR), produces real-looking content (256 unique bytes, frame 1 first 16 = `93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80`), but output bytes diverge from kdirect's `136ce5cb…`. Pre-iter5b VP8 was all-zero (the format-mismatch issue); β unblocked decode; what's left is a different bug. - Reproduce: VP8 fixture, same harness. Hash `bcc57ed5…` (libva) vs `136ce5cb…` (kdirect == sw). 74.8% zero in libva output suggests partial fill. - Hypothesis surface: cap_pool slot rotation off-by-one, partial buffer fill per frame, or per-frame DPB sync issue. - Difficulty estimate: medium. ### Backend-class backlog items (carried forward, none touched in iter5b-β) - **iter4-B1** — auto-detect picks wrong device on per-boot enumeration shuffle. Cost: 1-2 min per session re-mapping. Backend-only fix; medium scope (proper media-topology decoder/encoder discrimination via `MEDIA_ENT_F_PROC_VIDEO_DECODER`). - **iter4-B2** — mpv-vaapi `Could not create device` for VP9. Consumer-side. - **iter4-Q6** — per-segment quant-scale lossy mapping (VP9). - **iter4-COLOR_RANGE** — VAAPI exposes no color_range field. - **B3** — picture.c BeginPicture profile-aware reset. - **B4** — context.c log suppression for unsupported codec controls (the EINVAL B4 cosmetic noise on hantro init probes). - **B5** — mpeg2 vbv_buffer_size polish. - **B6** — h265 SPS bitstream-parse fidelity gap. - **L3** — vaDeriveImage cache-stale. ### Substrate-class items (iter5 originally targeted, now contextualized) - **vb2_dma_resv RFC v2 patches**: tracked at `~/src/linux-rfc/`. Verified by iter5 Phase 5 review to NOT be the right fix for the libva backend's MMAP+EXPBUF readback path (different memory model than the patches address). Still useful for DMABUF-import compositor paths (KWin, Mesa). Not in iter6's libva-decode-correctness critical path. - **panfrost IOMMU_CACHE**: separate sibling work-stream; not in iter6 scope. ## Candidate research questions for iter6 ### Candidate F — Bug 5: HEVC libva decode kernel-rejection > *"Identify and fix the V4L2 protocol contract difference that causes kernel rkvdec to reject HEVC decode via the libva backend while accepting it via ffmpeg-v4l2request. After fix: libva HEVC == kdirect HEVC == SW (byte-identical YUV)."* **Approach sketch**: strace both libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC on the same fixture; diff the ioctl streams; find the divergence (likely in S_EXT_CTRLS sequencing, MEDIA_REQUEST_IOC_QUEUE ordering, or buffer-binding ordering); patch libva backend's HEVC path or shared infrastructure. **Pros**: HEVC has the strongest "we claimed PASS but the proof was partial" stigma. Closing it directly upgrades iter2's transitive PASS to direct PASS, matching VP9's iter5b-β upgrade. **Cons**: HEVC backend code is substantial (`h265.c` was rewritten at iter2). Diff debugging via strace + source-read may take multiple sessions. Risk: the difference may be a long-standing assumption requiring substantial refactor. ### Candidate G — Bug 6: VP8 libva partial output > *"Identify and fix the cause of VP8 libva producing 74.8%-zero output (rather than the byte-identical kdirect output). After fix: libva VP8 == kdirect VP8 == SW."* **Approach sketch**: characterize the zero regions in libva_vp8 (which frames? which rows/columns? which planes?); compare with kdirect_vp8 at byte level; trace cap_pool slot binding and per-frame DPB submission. Likely a slot-rotation or partial-fill bug. **Pros**: VP8 is simpler than HEVC. The decode is already running (DQBUF success). Probably a small fix once root-caused. **Cons**: Diagnostic surface is fuzzier than HEVC's (kernel succeeds, output diverges). ### Candidate H — Bug 4: H.264 inter-frame race-loss > *"Fix H.264 inter-frame decode through libva so all frames (not just keyframes) produce correct pixels."* **Approach sketch**: similar to Bug 6. H.264 has the strongest keyframe-vs-inter discrimination — decode happens for keyframes (consistent real content in frame 1) but inter frames produce zero. Likely DPB-related: reference frame indices, request_fd lifecycle, or per-frame ordering. **Pros**: H.264 is the primary codec for most consumers. Fixing inter unlocks real video playback through libva. iter2-iter5b touched H.264 in passing; deeper investigation has been deferred since iter4. **Cons**: H.264 DPB management is intricate (B-slice L1 ref lists, fresh request_fd per frame, etc. — iter4 already touched these). The remaining bug may be subtle. ### Candidate I — Re-anchor iter6+ regression hashes on β substrate > *"Lock the now-stable per-codec hashes (VP9 `4f1565e8…`, MPEG-2 `19eefbf4…`, H.264 keyframe-partial `71ac099b…`, VP8 partial `bcc57ed5…`, HEVC all-zero `06b2c5a0…`) as iter6+ regression invariants. Verify each iter6+ patch against these anchors."* **Approach sketch**: codify the Phase 7 v2 sweep as a regression test. Add to `tests/` in the fork. Each iter6+ PR runs the sweep and compares. **Pros**: cheap; establishes a reproducible regression baseline. **Cons**: doesn't fix any bug. Maintenance work, not delivery work. ### Candidate J — iter4-B1 auto-detect device discrimination > *"Make backend auto-detect select the right V4L2 decode device on every boot regardless of `/dev/media*` enumeration order. No more env-override-per-session."* **Approach sketch**: walk media topology, require `MEDIA_ENT_F_PROC_VIDEO_DECODER` on the entities, prefer decoder-by-codec mapping. ~100 LOC in `request.c`. **Pros**: removes per-session friction. Mechanical fix. **Cons**: doesn't fix any decode bug. Quality-of-life. ## Out-of-scope items (carried unchanged) - Performance metrics (Candidate D from iter5 Phase 0) — still blocked by pixel-correctness gaps in HEVC, VP8, H.264-inter. Defer to a post-correctness iteration. - Front-end libva. - Other hardware (ohm, ampere/boltzmann). - AV1. - `cros-codecs` Rust replacement. - Bootlin / Collabora upstreaming. ## Recommendation If pressed: **Candidate H (Bug 4 H.264 inter)** for impact (H.264 is the most consumer-relevant codec), **Candidate F (Bug 5 HEVC)** for diagnostic-clarity practice (the kernel-direct comparison strace is a clean delta-finding exercise that re-validates the iter5b-β β architecture), or **Candidate G (Bug 6 VP8)** for fast iteration (simpler codec, smaller suspect surface). If multiple iterations are planned, the natural sequence is **G → H → F**: fix the simplest first (VP8), build technique, apply to harder cases. If iter6 should specifically MATCH the difficulty of iter5b-β (medium): **G or H**. If iter6 should specifically EXPAND on iter5b-β's architectural cleanup work: **J** (auto-detect harden) is the architectural fit; small backend change, removes a long-standing fragility. ## Locked research question (iteration 6, 2026-05-12) User pick: **Candidate G — Bug 6 VP8 partial output.** > *"Identify and fix the cause of VP8 libva producing 74.8%-zero output with traces of real content (hash `bcc57ed5…`) rather than the byte-identical kdirect output (`136ce5cb…`). After fix: `libva_vp8.yuv == kdirect_vp8.yuv == sw_vp8.yuv` for `bbb_720p10s_vp8.webm` 3-frame test. No regression on VP9, MPEG-2, H.264 keyframe-partial state, or HEVC."* ### Pass/fail (boolean) 1. **VP8 libva == kdirect**: `cmp -s libva_vp8.yuv kdirect_vp8.yuv` returns 0 on the standard 3-frame sweep. Both equal `136ce5cb…` (the iter5b-β kdirect anchor). 2. **VP9 unchanged**: `libva_vp9.yuv == kdirect_vp9.yuv == 4f1565e8…` (iter5b-β's PASS preserved). 3. **MPEG-2 unchanged**: `libva_mpeg2.yuv == kdirect_mpeg2.yuv == 19eefbf4…` (iter5b-β's maintained state preserved). 4. **H.264 keyframe-partial unchanged**: `libva_h264.yuv == 71ac099b…` (Bug 4 still deferred to a future iteration; no new H.264 regression introduced). 5. **HEVC unchanged**: `libva_hevc.yuv == 06b2c5a0…` all-zero (Bug 5 still deferred; no new HEVC regression). 6. **Control-payload anchors hold**: `VIDIOC_S_EXT_CTRLS` payloads on the 5-codec sweep byte-match the iter5 Phase 3 anchors. iter6 changes shouldn't touch control submission. Clean iter6 close = all six criteria green. Bug 6 is the only NEW behavior; the other four codecs must hold their iter5b-β state. ### Scope locks **In scope**: - `src/vp8.c` — VP8 backend control assembly + slice submission. - `src/picture.c` — VP8 dispatch in `codec_set_controls`, VP8 buffer-type cases, BeginPicture/EndPicture flow. - `src/surface.c` — `surface_bind_slot` (slot-to-surface binding for CAPTURE). - `src/cap_pool.c` / `request_pool.c` — slot lifecycle for VP8 path. - `surface.h` `params.vp8` union. - Any shared infrastructure that VP8 touches (request_fd lifecycle, DPB binding). **Out of scope**: - VP9 / H.264 / HEVC / MPEG-2 code paths (read-only for regression-verify). - Kernel patches (this is a backend-side bug per the empirical evidence: kernel succeeded the DQBUF, decode happened, output diverges). - Performance metrics. - Bug 4, Bug 5, all other backlog items. ### Phase 2 source-read targets For the upcoming Phase 2 situation analysis: - `src/vp8.c` — full file. iter3 wrote this; ~300 LOC. - `src/picture.c::codec_set_controls` — VP8 dispatch site. - `src/picture.c::codec_store_buffer` — VP8 buffer-type cases (Picture, Slice, IQMatrix, ProbabilityData). - `src/surface.c::surface_bind_slot` — destination_data fill. - `src/cap_pool.c::cap_pool_acquire` — slot selection logic. - Kernel UAPI `` — `V4L2_CID_STATELESS_VP8_FRAME` + `struct v4l2_ctrl_vp8_frame`. - FFmpeg `libavcodec/v4l2_request_vp8.c` (Kwiboo's downstream) — kernel-direct VP8 reference. - Phase 3 baseline strace at `iter5_phase3_baseline.tgz/anchors_vp8/` — control-payload anchor. ### Phase 3 baseline (re-acquire if needed) iter5 Phase 3 baseline already captured VP8 anchors. iter6 Phase 3 will re-verify on the current β backend + add: byte-level comparison of `libva_vp8.yuv` vs `kdirect_vp8.yuv` to identify where the divergence is (which frames? which planes? which spatial regions?). ### Predicted iter6 difficulty Small. The decode is already running through the kernel cleanly (no DQBUF ERROR like HEVC). The divergence is in cap_pool slot rotation or partial fill — most likely a 5-50 LOC fix once root-caused. Phase 2 + Phase 3 may take more time than Phase 6 implementation. ### Phase 1 → Phase 2 handoff iter6 Phase 1 is locked above. iter6 Phase 2 reads VP8 source and produces a situation analysis.