# Iteration 5 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock Opens 2026-05-10 immediately after iter4 close ([`iter4_phase7_close.md`](iter4_phase7_close.md), commit `9d2b7c1`). Per `feedback_dev_process.md` Phase 0, this document captures iter5's substrate state + open issue inventory and proposes candidate research questions for Phase 1 lock. **This iteration breaks pattern.** iter1..iter4 all targeted "the next codec." With VP9 closed at iter4, the campaign-level scope ("5 codecs boolean-correct") is technically green (4/5 direct + 1/5 transitive). iter5 must therefore answer a *different* class of research question than "add codec N+1." Three substantive candidates surfaced from iter4 close are recorded below; Phase 1 lock picks one. ## Substrate state (verified 2026-05-10 21:50 CEST on fresnel) | Property | Value | Notes | |---|---|---| | Kernel | `linux-fresnel-fourier 7.0-1` | kernel-agent product. v7.0 + 3 PBP DTS patches. **NOT** besser-direct. `vb2_dma_resv` + panfrost iommu-cache excluded pending RFC v2. | | Boot device numbering (today) | `/dev/video0` rga · `/dev/video1` vpu-enc · `/dev/video2` vpu-dec (hantro MPEG-2/VP8) · `/dev/video3` rkvdec (H.264/HEVC/VP9) · `/dev/video4,5` USB cams | `/dev/media0=hantro` · `/dev/media1=rkvdec` · `/dev/media2=uvc`. **Different from iter4 Phase 6 boot** — auto-detect picks hantro encoder first (iter4-B1 not fixed). | | Fork tip | `692eaa0` | iter4 Phase 7 fix-forward (`h264_start_code` profile-gating) | | Backend installed | `/usr/lib/dri/v4l2_request_drv_video.so` SHA256 `6e90b7a9b2c33480dd3ffc2da8423ab0bcef14f23c68cf18dc2ae2ff66ac808c` | matches fork tip build | | Codec scoreboard | 5/5 PASS (4 direct + 1 transitive) | iter1 MPEG-2 · iter2 HEVC · T4 H.264 · iter3 VP8 · iter4 VP9 | ## Open issue inventory from iter4 close Carried forward into iter5 (any/all can be locked as the iter5 target): ### Substrate-class issues (block direct verification through libva backend) **Bug 2 — cap_pool readback regression substrate-wide** on `linux-fresnel-fourier 7.0-1`. Symptom: libva-backend ffmpeg-vaapi-hwdownload returns the cap_pool init pattern (`RGB(0,0x4c,0)` constant) for all 4 already-shipping codecs at iter4 Phase 7 capture. H.264 partial exception (keyframe reads correctly, inter frames stuck). Hash `93dd9db5...` appears across VP9 + MPEG-2. Sibling to iter3 dma_resv issue but different signature: constant `0x4c` instead of all-zero. Kernel-direct ffmpeg-v4l2request works fine (iter4 Leg 3 PASS). So the broken path is libva → cap_pool → kernel binding, not the kernel. **Bug 3 — hantro UAPI drift** on MPEG-2 + VP8: kernel `Unable to set control(s)` rejection at submission time. Phase 2 of iter1 (6.19.9-99) had `v4l2_ctrl_vp9_frame` size 144 / `_compressed_hdr` 1947; iter4 Phase 3 (7.0-1) had 168 / 2040. Probably struct-size or new-required-field drift on hantro controls (MPEG-2/VP8 sequence/picture/quant), analogous to the VP9 struct evolution. ### Backend-class issues (iter4 Phase 6 backlog) - **iter4-B1** — Auto-detect picks first-enumerated `/dev/media*` driver-match without discriminating decoder vs encoder. Today's boot enumerates hantro-vpu **encoder** (`rk3399-vpu-enc`) at `/dev/media0`; auto-detect picks it for VP9 (wrong). Workaround: explicit `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 LIBVA_V4L2_REQUEST_VIDEO_PATH=... LIBVA_V4L2_REQUEST_MEDIA_PATH=...`. Fix: walk should require `MEDIA_ENT_F_PROC_VIDEO_DECODER` entity in the topology, not just driver-name match. - **iter4-B2** — mpv-vaapi `Could not create device` for VP9. Consumer-side, not backend. - **iter4-Q6** — per-segment `seg_param[s].luma_ac_quant_scale` → kernel `feature_data[s][ALT_Q]` is lossy for non-BBB segmentation-enabled streams. Dead code for BBB. - **iter4-COLOR_RANGE** — VAAPI exposes no `color_range`; backend leaves `V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING` clear. Wrong for full-range JPEG-encoded VP9. - **B3** — picture.c BeginPicture profile-aware reset. iter1-4 added per-profile reset lines without refactoring. - **B4** — context.c log suppression for `Unable to set control(s)` warnings from codecs the bound driver doesn't support. - **B5** — mpeg2 vbv_buffer_size polish (iter1 S2 finding). - **B6** — h265 SPS bitstream-parse fidelity gap. - **L3** — vaDeriveImage cache-stale. ### Regression invariants (iter1-3 hashes invalidated by substrate change) iter4 Phase 7 documented all 4 prior codecs FAIL on `linux-fresnel-fourier 7.0-1`: | Codec | Driver | iter4 P7 verdict | Underlying cause | |---|---|---|---| | H.264 | rkvdec | PARTIAL (keyframe-only) | Bug 2 | | HEVC | rkvdec | FAIL (constant fill) | Bug 2 | | MPEG-2 | hantro | FAIL + `Unable to set control(s)` | Bug 2 + Bug 3 | | VP8 | hantro | FAIL + `Unable to set control(s)` | Bug 2 + Bug 3 | | VP9 | rkvdec | FAIL until fix-forward | iter4-specific (start-code prepend), closed | The campaign-level "5/5 codecs passing" claim is true on iter4's substrate frame-of-reference (transitive proof discipline) but NOT byte-identical on the new substrate. Until iter5 work establishes Bug 2 + Bug 3 closure, iter5+ regression test means **transitive proof** + control-payload byte-compare, not direct pixel hashes. ## Candidate iter5 research questions ### Candidate A — Re-anchor 4-codec regression block on `linux-fresnel-fourier 7.0-1` > **"Restore direct pixel verification for at least 3 of {H.264, HEVC, MPEG-2, VP8} on the new kernel substrate. Lock the regression-test reference hashes for iter5+ regression-block-of-5."** Pass/fail: For ≥3 codecs of the four iter1-3 codecs, ffmpeg-vaapi-hwdownload + libva backend produces a SHA256 (or transitive-proof equivalent) that locks as iter5+ regression invariant. The 4-codec block in iter5 binding cells is GREEN. **Scope**: Investigate Bug 2 (cap_pool readback) — is it a libva-backend cap_pool refactor needed, or a kernel patch? Investigate Bug 3 (hantro UAPI drift) — what changed in 7.0-1 struct shapes for MPEG-2/VP8 controls? Both can be backend-only OR cross-cutting into kernel-agent. **Pro**: Establishes a re-grounded baseline for iter5+ campaign claims. Without it, every future iteration must say "re-verify against substrate" before declaring regression-block clean. **Con**: Bug 2 might require kernel patches (RFC v2 vb2_dma_resv hasn't landed); fix-forward backend may not be enough. Scope risk: could expand into a kernel-iteration. ### Candidate B — Substrate stabilization: land vb2_dma_resv + hantro UAPI drift fixes in `kernel-agent` > **"Remove the cap_pool readback workaround across the campaign by landing kernel patches in the `linux-fresnel-fourier` product. Direct verification becomes possible for all 5 codecs."** Pass/fail: Post-iter5 kernel ships with patches; ffmpeg-vaapi-hwdownload returns correct pixels for all 5 codecs (no transitive proof needed); the iter4 transitive proof for VP9 can be replaced by direct pixel hash. **Scope**: Cross-product work in `kernel-agent` (sibling memory `project_kernel_agent.md`). vb2_dma_resv RFC v2 is in design upstream — track + land. Hantro UAPI drift root cause may be in linux-media tree, not local patch. **Pro**: Solves the *real* underlying issue. iter1-4 all become direct-verifiable; transitive proofs become belt-and-suspenders. iter6+ can lock pixel hashes confidently. **Con**: Large scope. Kernel-iteration timing, possibly multi-week. May not be ready for a single iter5 close cycle. Memory `reference_dmabuf_resv_blocker.md` notes RFC v1 rejected, v2 in design — landing window uncertain. ### Candidate C — Auto-detect device-discovery harden (iter4-B1) > **"Backend auto-detect picks the right decoder for each codec on every boot, without env-override workarounds. Validate against the iter4-style enumeration shuffle (today: hantro-vpu encoder enumerates first)."** Pass/fail: With auto-detect (no env vars), `vainfo` enumerates all 5 codec profiles correctly. ffmpeg-vaapi VP9 + H.264 + HEVC route to rkvdec; MPEG-2 + VP8 route to hantro-vpu-dec. Tested on at least 2 reboots with different device-numbering orders. **Scope**: backend-only. ~50-100 LOC change in `request.c::v4l2_request_init` to enrich the media topology walk: require `MEDIA_ENT_F_PROC_VIDEO_DECODER` on at least one entity, OR scan media-controller entities for decode-capable function, OR maintain a codec-to-card-name map. **Pro**: Smallest scope. Pure-backend fix. Removes a campaign-wide fragility. Mechanical. **Con**: Doesn't address Bug 2 + Bug 3 (substrate-class blockers). iter5 close would say "5/5 codecs auto-route correctly" but still need transitive proof to verify any of them actually decode. ### Candidate D — Performance / measurement iteration > **"Quantify per-codec performance on fresnel — CPU utilization, frames-per-second, drop count over a 60s window — for H.264/HEVC/VP9 paths. Establish iter5 binding-cell numbers for future performance-regression-checks."** Pass/fail: For each of {H.264 1080p30, HEVC 720p, VP9 720p}, mpv `--hwdec=vaapi --frames=10s` reports CPU% < some-threshold, no dropped frames, libva path engaged. The numbers themselves become the iter5+ binding-cell anchors. **Scope**: pure measurement + harness. NO backend code changes. Maybe a `perf` / `top` / `mpv --status` harness script. Reproducible per-codec invocations. **Pro**: Campaign README locked this as "follow-up iteration after correctness lands." Correctness has landed. Now is the formally-next iteration. NOT blocked by Bug 2/3 (perf is engagement-side, not pixel-side). **Con**: Bug 2 + Bug 3 still block the iter1-3 codecs from reporting *correct* pixels — performance numbers without pixel correctness are vacuous (could measure 60fps of constant green). MPEG-2 + VP8 may not even submit-controls cleanly. Likely needs Candidate A or B first. ## Recommendation If pressed to recommend a sequence: **A → C → B → D**. A first because regression invariants underpin everything else. C next because the fragility hits the user every boot. B because it's the real fix but timing-uncertain. D last because it depends on A. A and C could plausibly be one iteration if scope cooperates; B is a separate beast; D is genuinely "after correctness." ## Locked research question (iteration 5, 2026-05-10) User pick: **Candidate B — Bug 2 only, with rkvdec consumer added. Kernel-agent local-carry acceptable.** **Phase 2 source-read finding (2026-05-10 mid-Phase): Bug 3 does not exist as a UAPI drift.** Empirical struct-by-struct diff of `include/uapi/linux/v4l2-controls.h` between v6.12 (the RFC v2 base) and v7.0 (the kernel-agent product baseline for `linux-fresnel-fourier 7.0-1`) shows `v4l2_ctrl_mpeg2_{sequence,picture,quantisation}` and `v4l2_ctrl_vp8_frame` byte-identical across both kernels. On-fresnel re-trace of MPEG-2 decode with the correct hantro-decoder bind (`/dev/video2 + /dev/media0` on the 2026-05-10 boot) shows the actual MPEG-2 control submissions (`0xa409dc/dd/de`) returning `= 0` (success), with five successful frame decodes. The "Unable to set control(s): Invalid argument" log noise on hantro was traced to the backend's init-time H.264 + HEVC decode_mode/start_code probes (`0xa40900/01 + 0xa40a95/96`) — these EINVAL on hantro because hantro is MPEG-2/VP8 only, but have no functional impact on the MPEG-2/VP8 decode that follows. This is the **B4 backlog item** ("context.c log suppression for unsupported codec controls"), present on every device that doesn't support the full codec set, not a kernel bug. Bug 3 collapses to "B4 cosmetic noise on hantro, dropped from iter5 scope." > **"Land the vb2_dma_resv RFC v2 patch series (3 existing patches: videobuf2 helper + hantro consumer + rockchip-rga consumer) PLUS a new fourth rkvdec consumer patch in the `linux-fresnel-fourier` kernel product, so that every shipping codec on RK3399 — H.264, HEVC, VP9 via rkvdec; MPEG-2, VP8 via hantro-vpu-dec — produces directly-readable pixels through the libva backend's cap_pool path on the next kernel product shipped from kernel-agent. The transitive-proof discipline becomes optional belt-and-suspenders, not the only viable verifier."** ### Pass/fail (boolean, per `feedback_dev_process.md` Phase 1) **Amended 2026-05-10 post-Phase-2 source-read.** Dropped from 5 criteria to 4 after the empirical Bug-3 collapse documented above. MPEG-2 + VP8 fold into the Bug-2-closes block — the hantro cap_pool readback is the only blocker for them, identical in shape to the rkvdec block. The B4 cosmetic init-probe noise on hantro is out of iter5 scope (backlog). 1. **Bug 2 closed — cap_pool readback returns kernel-decoded pixels on the new kernel, for both decoder blocks.** For all 5 shipping codecs (H.264 1080p30 + HEVC 720p + VP9 720p via rkvdec; MPEG-2 720p + VP8 720p via hantro), `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload,format=nv12,format=yuv420p -f rawvideo` via the libva backend on the new kernel returns raw YUV byte-identical to the SW reference. The `RGB(0,0x4c,0)` constant-fill signature is gone campaign-wide. 2. **Substrate ships from `kernel-agent`.** The new kernel is built and promoted via the kernel-agent pipeline (per memory `project_kernel_agent.md` — `ka-promote` + `ka-close --status success` semantics, OR the manual build path if `ka-*` CLI isn't operational yet) and tagged `linux-fresnel-fourier 7.0.kafr2` per the fleet manifest's versioning scheme. Local-carry patches in the kernel-agent product are acceptable for iter5 close — mainline-landing is bonus, not required. Patches are scope-tagged per agent discipline (Hard rule #7): vb2 helper under `subsystem/media/videobuf2/`, three consumer opt-ins under `subsystem/media//`. 3. **No regression on already-shipping codec contracts.** The Phase 3 anchor payloads (FRAME + COMPRESSED_HDR + slice for all 5 codecs) on the new kernel still byte-match the iter1-4 captures, modulo the documented S4 uv_mode VP9 carve-out. The libva backend tip `692eaa0` is unmodified; only the kernel substrate changes. The B4 init-probe EINVAL noise is unchanged (not in scope) but does not cause functional regression. 4. **Five-codec direct-verification block.** With the new kernel + unmodified backend tip `692eaa0`, all 5 codecs pass criterion-4 *directly* (not transitively). VP9 in particular: byte-identical YUV from libva-backend hwdownload to SW reference. MPEG-2 + VP8 pass directly through hantro hwdownload (previously transitive-only since iter3). The campaign scoreboard becomes "5/5 direct" (was "4 direct + 1 transitive" at iter4 close). A clean iter5 close has all four criteria green. Phase 7 → Phase 4 loopback per `feedback_dev_process.md` if any fail. ### Scope locks **In scope:** - vb2_dma_resv RFC v2 patch series — 3 existing patches authored by operator (Markus Fritsche) on `~/src/linux-rfc/`: - `fbe8bf57a media: videobuf2: add dma_resv release-fence helper` — opt-in API `vb2_buffer_attach_release_fence()` + per-queue fence context. - `14a68fcf0 media: hantro: attach dma_resv release fence at buf_queue` — one-line consumer opt-in. - `89b699508 media: rockchip-rga: attach dma_resv release fence at buf_queue` — one-line consumer opt-in. - **New rkvdec consumer patch (iter5 contribution)** — same one-line pattern, target `drivers/media/platform/rockchip/rkvdec/rkvdec.c::rkvdec_buf_queue` (line ~959 at v7.0, post-staging-promotion path). Closes Bug 2 for rkvdec (H.264 + HEVC + VP9 paths). - Rebase of the 4-patch series from v6.12 base → v7.0 base (`linux-fresnel-fourier 7.0-1` substrate). hantro_v4l2.c + rga-buf.c unchanged 6.12→7.0 → patches should apply mostly clean. videobuf2-core.c needs Phase 2 fine-grained re-check. - Kernel-agent fleet manifest update (`fleet/fresnel.yaml`): add the 4-patch group under `includes:`, remove from `Explicitly NOT included` comment block, bump version to `7.0.kafr2`. - Kernel-agent product build + install on fresnel via boltzmann build-host. `ka-*` CLI verbs aren't implemented yet (per memory `project_kernel_agent.md`) — fallback to the manual build path until they land. - Re-verification matrix on new kernel: all 5 codecs through libva, direct pixel hash, no transitive proof needed for any of them. **Out of scope:** - **Bug 3 as originally framed — dropped, doesn't exist as UAPI drift** (see Phase 2 source-read finding at top of locked-question block). - B4 "context.c log suppression for unsupported codec controls" — the H.264/HEVC init-probe EINVAL noise on hantro. Backlog, separate iteration. - iter4-B1 (auto-detect device discrimination) — backend-side fragility, separate iteration. - iter4-B2 (mpv-vaapi create-device failure) — consumer-side. - iter4-Q6, COLOR_RANGE, B3/B5/B6/L3 backlog — backend, not kernel. - panfrost IOMMU_CACHE — sibling issue not on Bug-2 critical path. Fleet manifest comment "ship together with vb2_dma_resv" is informational, but iter5 ships vb2_dma_resv standalone unless panfrost work surfaces unexpectedly. - Performance metrics — that's Candidate D, deferred again. - Front-end libva (campaign-wide, unchanged). - AV1 (no decoder block). - Bootlin / Collabora upstreaming — local-carry acceptable per user pick. Stays default-deferred. - Mainline merge — bonus, not gating. ### Open questions to resolve in Phase 0 inventory deepening / Phase 2 source-read 1. **vb2_dma_resv RFC v2 status** — what is the current patch revision on linux-media? Is it apply-able to v7.0 directly, or does it need rebase? Phase 2 task. 2. **Hantro 7.0 driver delta** — diff `drivers/media/platform/verisilicon/hantro_{mpeg2,vp8}.c` between v6.19.9 and v7.0 upstream. What controls' struct shapes changed? What new mandatory fields are there? Phase 3 baseline. 3. **kernel-agent operational status** — memory notes the agent is spec'd 2026-05-09, not yet operational; `ka-*` CLI verbs not implemented. Does iter5 trigger building the agent, or use the manual kernel-build path as fallback? Coordinate with the operator at Phase 1 close. 4. **Patch attribution** — vb2_dma_resv is somebody else's patch (linux-media author). For local-carry the kernel-agent's patch metadata needs the upstream author preserved + a "Backported-by: claude-noether " trailer. Phase 6 detail. 5. **Cross-substrate verification** — once new kernel ships, must we also rebuild libva backend against the new kernel UAPI headers? Probably no (backend reads UAPI at runtime, not build-time), but Phase 3 confirms. 6. **Test corpus** — same 5 fixtures (bbb_*) as iter1-4. New kernel could break the fixture-parse path if hantro's new UAPI requires different bitstream framing — unlikely but Phase 3 verifies. 7. **Reboot discipline** — installing a new kernel requires fresnel reboot. Schedule with the operator (Pinebook Pro is a working machine). ### Phase 2 source-read targets - `drivers/media/v4l2-core/videobuf2-dma-contig.c` (or wherever vb2_dma_resv lives upstream — RFC v2 patch context). - `drivers/media/platform/verisilicon/hantro_mpeg2.c` + `hantro_vp8.c` — v6.19 vs v7.0 delta. - `drivers/media/v4l2-core/v4l2-ctrls-defs.c` — control-struct evolution for VP8 + MPEG-2. - `/usr/include/linux/v4l2-controls.h` on fresnel (currently 7.0 headers). - Memory `project_kernel_agent.md` — agent operating envelope. - Memory `reference_fresnel_kernel_substrate.md` — current product version + excluded-patch list. - linux-media archive — RFC v2 thread + reviewer feedback. ### Phase 3 baseline (predicted shape) Phase 3 captures: - Verbatim FRAME + COMPRESSED_HDR (VP9) + slice (H.264/HEVC/MPEG-2/VP8) control payloads on `linux-fresnel-fourier 7.0-1` (current kernel) for all 5 codecs. Strace + `extract.py` from iter4 close evidence applies directly. - Raw YUV from ffmpeg-vaapi-hwdownload on the current kernel (will be all-`0x4c` constant-fill = init pattern for ≥3 codecs, confirms Bug 2 is reproduced). - Raw YUV from ffmpeg-v4l2request kernel-direct on the current kernel for all 5 codecs (the reference, since kernel-direct bypasses cap_pool). - Kernel `Unable to set control(s)` log capture for hantro MPEG-2 + VP8 (confirms Bug 3 reproduced). - Header diff: `linux-fresnel-fourier 7.0-1` vs `linux-eos-arm 6.19.9-99` `v4l2-controls.h` + hantro driver source. Once Phase 3 lands, Phase 4 designs patches against the diff. ### Predicted iter5 difficulty + shape - **vs iter1-3 (codec adds, ~300-800 LOC libva backend)**: iter5 is kernel-side. Patch scope likely small (vb2_dma_resv is reportedly ~50-100 lines mainline; hantro UAPI drift may be smaller still, just struct-size guards). Build + promote pipeline is the time sink. - **vs iter4 (single new codec)**: iter5 touches NO codec. It touches the kernel that codecs run on. - **Predicted scope**: 1-3 kernel patches (~50-200 LOC), kernel build + install per kernel-agent flow, libva-backend regression matrix re-run. No fork patches expected unless Phase 3 surfaces a backend-side amplification of Bug 2/3. - **Most likely Phase 7 failure mode**: vb2_dma_resv RFC v2 doesn't apply cleanly to v7.0; hantro 7.0 driver delta turns out to need a downstream patch the kernel maintainers haven't shipped yet; cap_pool readback fails for reasons beyond what vb2_dma_resv addresses. ### Predecessor close-out summary Per the campaign 8(+1)-phase loop, iter4 closed at `9d2b7c1` (campaign repo) + fork tip `692eaa0`. iter4's transitive proof was the *workaround* for Bug 2; iter5 sets out to *remove* that workaround for the rkvdec block, and to *remove the regression-block falsification* introduced by Bug 3 on the hantro block. Memory rules carry forward (no rule retirement expected at iter5 open): - `feedback_dev_process` (campaign-wide) - `feedback_gitea_as_claude_noether` (updated at iter4 close) - `feedback_no_session_termination_attempts` - `feedback_review_empirical_over_theoretical` (both directions) - `feedback_unconditional_codec_state` (iter4 lesson, applies to any new backend flag work) - `feedback_pixel_compare_in_yuv_not_png` (iter4 lesson, applies to criterion 4/5) - `feedback_hw_decode_engagement_check` (iter3 lesson) - `feedback_runtime_enumerates_allowlists` (iter3 lesson) - `reference_dmabuf_resv_blocker` (iter5 STARTS HERE — this is the bug iter5 closes) - `reference_fresnel_kernel_substrate` (iter5 produces the next entry's version bump) - `project_kernel_agent` (iter5 is the first campaign-side exercise of the agent flow) iter5 → iter6 handoff prediction: - If Bug 2 + Bug 3 close cleanly: iter6 picks up Candidate A (regression block re-anchored on new substrate, formal iter5+ binding-cell numbers) or Candidate C (auto-detect harden). - If Bug 2 closes but Bug 3 doesn't: iter6 = Bug 3 alone, or iter6 swaps the iter5 plan to iter5b mid-loop. - If neither closes: Phase 7 → Phase 4 loopback within iter5, OR Phase 7 → Phase 0 reset. ## Memory state at iter5 open Active rules (`MEMORY.md` index): 11 entries. Recently added at iter4 close: - `feedback_unconditional_codec_state` (iter4 Phase 7 root-cause) - `feedback_pixel_compare_in_yuv_not_png` (iter4 Phase 7 close) - `feedback_gitea_as_claude_noether` (updated — SSH push now wired, user is `gitea` at port 2222) No new memory needed at iter5 open until Phase 1 locks the target.