8acfca3fe0
Five Phase 1 criteria: Bug 2 closed (cap_pool readback returns real pixels through libva); Bug 3 closed (hantro MPEG-2 + VP8 controls accepted on new kernel); patches ship from kernel-agent (local-carry acceptable, mainline bonus); zero codec-contract regression vs iter4; 5/5 direct-verification block restored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
223 lines
20 KiB
Markdown
223 lines
20 KiB
Markdown
# Iteration 5 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
|
|
|
|
Opens 2026-05-10 immediately after iter4 close ([`iter4_phase7_close.md`](iter4_phase7_close.md), commit `9d2b7c1`). Per `feedback_dev_process.md` Phase 0, this document captures iter5's substrate state + open issue inventory and proposes candidate research questions for Phase 1 lock.
|
|
|
|
**This iteration breaks pattern.** iter1..iter4 all targeted "the next codec." With VP9 closed at iter4, the campaign-level scope ("5 codecs boolean-correct") is technically green (4/5 direct + 1/5 transitive). iter5 must therefore answer a *different* class of research question than "add codec N+1." Three substantive candidates surfaced from iter4 close are recorded below; Phase 1 lock picks one.
|
|
|
|
## Substrate state (verified 2026-05-10 21:50 CEST on fresnel)
|
|
|
|
| Property | Value | Notes |
|
|
|---|---|---|
|
|
| Kernel | `linux-fresnel-fourier 7.0-1` | kernel-agent product. v7.0 + 3 PBP DTS patches. **NOT** besser-direct. `vb2_dma_resv` + panfrost iommu-cache excluded pending RFC v2. |
|
|
| Boot device numbering (today) | `/dev/video0` rga · `/dev/video1` vpu-enc · `/dev/video2` vpu-dec (hantro MPEG-2/VP8) · `/dev/video3` rkvdec (H.264/HEVC/VP9) · `/dev/video4,5` USB cams | `/dev/media0=hantro` · `/dev/media1=rkvdec` · `/dev/media2=uvc`. **Different from iter4 Phase 6 boot** — auto-detect picks hantro encoder first (iter4-B1 not fixed). |
|
|
| Fork tip | `692eaa0` | iter4 Phase 7 fix-forward (`h264_start_code` profile-gating) |
|
|
| Backend installed | `/usr/lib/dri/v4l2_request_drv_video.so` SHA256 `6e90b7a9b2c33480dd3ffc2da8423ab0bcef14f23c68cf18dc2ae2ff66ac808c` | matches fork tip build |
|
|
| Codec scoreboard | 5/5 PASS (4 direct + 1 transitive) | iter1 MPEG-2 · iter2 HEVC · T4 H.264 · iter3 VP8 · iter4 VP9 |
|
|
|
|
## Open issue inventory from iter4 close
|
|
|
|
Carried forward into iter5 (any/all can be locked as the iter5 target):
|
|
|
|
### Substrate-class issues (block direct verification through libva backend)
|
|
|
|
**Bug 2 — cap_pool readback regression substrate-wide** on `linux-fresnel-fourier 7.0-1`. Symptom: libva-backend ffmpeg-vaapi-hwdownload returns the cap_pool init pattern (`RGB(0,0x4c,0)` constant) for all 4 already-shipping codecs at iter4 Phase 7 capture. H.264 partial exception (keyframe reads correctly, inter frames stuck). Hash `93dd9db5...` appears across VP9 + MPEG-2. Sibling to iter3 dma_resv issue but different signature: constant `0x4c` instead of all-zero. Kernel-direct ffmpeg-v4l2request works fine (iter4 Leg 3 PASS). So the broken path is libva → cap_pool → kernel binding, not the kernel.
|
|
|
|
**Bug 3 — hantro UAPI drift** on MPEG-2 + VP8: kernel `Unable to set control(s)` rejection at submission time. Phase 2 of iter1 (6.19.9-99) had `v4l2_ctrl_vp9_frame` size 144 / `_compressed_hdr` 1947; iter4 Phase 3 (7.0-1) had 168 / 2040. Probably struct-size or new-required-field drift on hantro controls (MPEG-2/VP8 sequence/picture/quant), analogous to the VP9 struct evolution.
|
|
|
|
### Backend-class issues (iter4 Phase 6 backlog)
|
|
|
|
- **iter4-B1** — Auto-detect picks first-enumerated `/dev/media*` driver-match without discriminating decoder vs encoder. Today's boot enumerates hantro-vpu **encoder** (`rk3399-vpu-enc`) at `/dev/media0`; auto-detect picks it for VP9 (wrong). Workaround: explicit `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 LIBVA_V4L2_REQUEST_VIDEO_PATH=... LIBVA_V4L2_REQUEST_MEDIA_PATH=...`. Fix: walk should require `MEDIA_ENT_F_PROC_VIDEO_DECODER` entity in the topology, not just driver-name match.
|
|
- **iter4-B2** — mpv-vaapi `Could not create device` for VP9. Consumer-side, not backend.
|
|
- **iter4-Q6** — per-segment `seg_param[s].luma_ac_quant_scale` → kernel `feature_data[s][ALT_Q]` is lossy for non-BBB segmentation-enabled streams. Dead code for BBB.
|
|
- **iter4-COLOR_RANGE** — VAAPI exposes no `color_range`; backend leaves `V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING` clear. Wrong for full-range JPEG-encoded VP9.
|
|
- **B3** — picture.c BeginPicture profile-aware reset. iter1-4 added per-profile reset lines without refactoring.
|
|
- **B4** — context.c log suppression for `Unable to set control(s)` warnings from codecs the bound driver doesn't support.
|
|
- **B5** — mpeg2 vbv_buffer_size polish (iter1 S2 finding).
|
|
- **B6** — h265 SPS bitstream-parse fidelity gap.
|
|
- **L3** — vaDeriveImage cache-stale.
|
|
|
|
### Regression invariants (iter1-3 hashes invalidated by substrate change)
|
|
|
|
iter4 Phase 7 documented all 4 prior codecs FAIL on `linux-fresnel-fourier 7.0-1`:
|
|
|
|
| Codec | Driver | iter4 P7 verdict | Underlying cause |
|
|
|---|---|---|---|
|
|
| H.264 | rkvdec | PARTIAL (keyframe-only) | Bug 2 |
|
|
| HEVC | rkvdec | FAIL (constant fill) | Bug 2 |
|
|
| MPEG-2 | hantro | FAIL + `Unable to set control(s)` | Bug 2 + Bug 3 |
|
|
| VP8 | hantro | FAIL + `Unable to set control(s)` | Bug 2 + Bug 3 |
|
|
| VP9 | rkvdec | FAIL until fix-forward | iter4-specific (start-code prepend), closed |
|
|
|
|
The campaign-level "5/5 codecs passing" claim is true on iter4's substrate frame-of-reference (transitive proof discipline) but NOT byte-identical on the new substrate. Until iter5 work establishes Bug 2 + Bug 3 closure, iter5+ regression test means **transitive proof** + control-payload byte-compare, not direct pixel hashes.
|
|
|
|
## Candidate iter5 research questions
|
|
|
|
### Candidate A — Re-anchor 4-codec regression block on `linux-fresnel-fourier 7.0-1`
|
|
|
|
> **"Restore direct pixel verification for at least 3 of {H.264, HEVC, MPEG-2, VP8} on the new kernel substrate. Lock the regression-test reference hashes for iter5+ regression-block-of-5."**
|
|
|
|
Pass/fail: For ≥3 codecs of the four iter1-3 codecs, ffmpeg-vaapi-hwdownload + libva backend produces a SHA256 (or transitive-proof equivalent) that locks as iter5+ regression invariant. The 4-codec block in iter5 binding cells is GREEN.
|
|
|
|
**Scope**: Investigate Bug 2 (cap_pool readback) — is it a libva-backend cap_pool refactor needed, or a kernel patch? Investigate Bug 3 (hantro UAPI drift) — what changed in 7.0-1 struct shapes for MPEG-2/VP8 controls? Both can be backend-only OR cross-cutting into kernel-agent.
|
|
|
|
**Pro**: Establishes a re-grounded baseline for iter5+ campaign claims. Without it, every future iteration must say "re-verify against substrate" before declaring regression-block clean.
|
|
|
|
**Con**: Bug 2 might require kernel patches (RFC v2 vb2_dma_resv hasn't landed); fix-forward backend may not be enough. Scope risk: could expand into a kernel-iteration.
|
|
|
|
### Candidate B — Substrate stabilization: land vb2_dma_resv + hantro UAPI drift fixes in `kernel-agent`
|
|
|
|
> **"Remove the cap_pool readback workaround across the campaign by landing kernel patches in the `linux-fresnel-fourier` product. Direct verification becomes possible for all 5 codecs."**
|
|
|
|
Pass/fail: Post-iter5 kernel ships with patches; ffmpeg-vaapi-hwdownload returns correct pixels for all 5 codecs (no transitive proof needed); the iter4 transitive proof for VP9 can be replaced by direct pixel hash.
|
|
|
|
**Scope**: Cross-product work in `kernel-agent` (sibling memory `project_kernel_agent.md`). vb2_dma_resv RFC v2 is in design upstream — track + land. Hantro UAPI drift root cause may be in linux-media tree, not local patch.
|
|
|
|
**Pro**: Solves the *real* underlying issue. iter1-4 all become direct-verifiable; transitive proofs become belt-and-suspenders. iter6+ can lock pixel hashes confidently.
|
|
|
|
**Con**: Large scope. Kernel-iteration timing, possibly multi-week. May not be ready for a single iter5 close cycle. Memory `reference_dmabuf_resv_blocker.md` notes RFC v1 rejected, v2 in design — landing window uncertain.
|
|
|
|
### Candidate C — Auto-detect device-discovery harden (iter4-B1)
|
|
|
|
> **"Backend auto-detect picks the right decoder for each codec on every boot, without env-override workarounds. Validate against the iter4-style enumeration shuffle (today: hantro-vpu encoder enumerates first)."**
|
|
|
|
Pass/fail: With auto-detect (no env vars), `vainfo` enumerates all 5 codec profiles correctly. ffmpeg-vaapi VP9 + H.264 + HEVC route to rkvdec; MPEG-2 + VP8 route to hantro-vpu-dec. Tested on at least 2 reboots with different device-numbering orders.
|
|
|
|
**Scope**: backend-only. ~50-100 LOC change in `request.c::v4l2_request_init` to enrich the media topology walk: require `MEDIA_ENT_F_PROC_VIDEO_DECODER` on at least one entity, OR scan media-controller entities for decode-capable function, OR maintain a codec-to-card-name map.
|
|
|
|
**Pro**: Smallest scope. Pure-backend fix. Removes a campaign-wide fragility. Mechanical.
|
|
|
|
**Con**: Doesn't address Bug 2 + Bug 3 (substrate-class blockers). iter5 close would say "5/5 codecs auto-route correctly" but still need transitive proof to verify any of them actually decode.
|
|
|
|
### Candidate D — Performance / measurement iteration
|
|
|
|
> **"Quantify per-codec performance on fresnel — CPU utilization, frames-per-second, drop count over a 60s window — for H.264/HEVC/VP9 paths. Establish iter5 binding-cell numbers for future performance-regression-checks."**
|
|
|
|
Pass/fail: For each of {H.264 1080p30, HEVC 720p, VP9 720p}, mpv `--hwdec=vaapi --frames=10s` reports CPU% < some-threshold, no dropped frames, libva path engaged. The numbers themselves become the iter5+ binding-cell anchors.
|
|
|
|
**Scope**: pure measurement + harness. NO backend code changes. Maybe a `perf` / `top` / `mpv --status` harness script. Reproducible per-codec invocations.
|
|
|
|
**Pro**: Campaign README locked this as "follow-up iteration after correctness lands." Correctness has landed. Now is the formally-next iteration. NOT blocked by Bug 2/3 (perf is engagement-side, not pixel-side).
|
|
|
|
**Con**: Bug 2 + Bug 3 still block the iter1-3 codecs from reporting *correct* pixels — performance numbers without pixel correctness are vacuous (could measure 60fps of constant green). MPEG-2 + VP8 may not even submit-controls cleanly. Likely needs Candidate A or B first.
|
|
|
|
## Recommendation
|
|
|
|
If pressed to recommend a sequence: **A → C → B → D**.
|
|
|
|
A first because regression invariants underpin everything else. C next because the fragility hits the user every boot. B because it's the real fix but timing-uncertain. D last because it depends on A.
|
|
|
|
A and C could plausibly be one iteration if scope cooperates; B is a separate beast; D is genuinely "after correctness."
|
|
|
|
## Locked research question (iteration 5, 2026-05-10)
|
|
|
|
User pick: **Candidate B — both Bug 2 + Bug 3 in one close, kernel-agent local-carry acceptable.**
|
|
|
|
> **"Land vb2_dma_resv (Bug 2) AND fix hantro UAPI drift (Bug 3) in the `linux-fresnel-fourier` kernel product so that every shipping codec on RK3399 — H.264, HEVC, VP9 via rkvdec; MPEG-2, VP8 via hantro-vpu-dec — produces directly-readable pixels through the libva backend's cap_pool path, on the next kernel product shipped from kernel-agent. The transitive-proof discipline becomes optional belt-and-suspenders, not the only viable verifier."**
|
|
|
|
### Pass/fail (boolean, per `feedback_dev_process.md` Phase 1)
|
|
|
|
1. **Bug 2 closed — cap_pool readback returns kernel-decoded pixels on the new kernel.** For all 3 rkvdec codecs (H.264 1080p30, HEVC 720p, VP9 720p), `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload,format=nv12,format=yuv420p -f rawvideo` via the libva backend on the new kernel returns raw YUV byte-identical to the SW reference (or transitive equivalent that explicitly demonstrates cap_pool is delivering kernel-written pages, not init pattern). The `RGB(0,0x4c,0)` constant-fill signature is gone.
|
|
|
|
2. **Bug 3 closed — hantro MPEG-2 + VP8 controls accepted on the new kernel.** `ffmpeg -hwaccel vaapi ... bbb_720p10s_{mpeg2.ts,vp8.webm}` via the libva backend on the new kernel runs without kernel `Unable to set control(s)` log messages, and produces YUV byte-matching the SW reference (with or without transitive proof depending on Bug 2 status for the hantro path specifically).
|
|
|
|
3. **Substrate ships from `kernel-agent`.** The new kernel is built and promoted via the kernel-agent pipeline (per memory `project_kernel_agent.md` — `ka-promote` + `ka-close --status success` semantics) and tagged `linux-fresnel-fourier 7.0-2` (or whatever version the agent produces next). Local-carry patches in the kernel-agent product are acceptable for iter5 close — mainline-landing is bonus, not required. Patch scope-tagged per agent discipline (Hard rule #7).
|
|
|
|
4. **No regression on already-shipping codec contracts.** The Phase 3 anchor payloads (FRAME + COMPRESSED_HDR + slice for all 5 codecs) on the new kernel still byte-match the iter1-4 captures, modulo the documented S4 uv_mode VP9 carve-out. The libva backend tip `692eaa0` is unmodified; only the kernel substrate changes.
|
|
|
|
5. **Five-codec direct-verification block.** With the new kernel + unmodified backend tip `692eaa0`, all 5 codecs pass criterion-4 *directly* (not transitively). VP9 in particular: byte-identical YUV from libva-backend hwdownload to SW reference. The campaign scoreboard becomes "5/5 direct" (was "4 direct + 1 transitive").
|
|
|
|
A clean iter5 close has all five criteria green. Phase 7 → Phase 4 loopback per `feedback_dev_process.md` if any fail.
|
|
|
|
### Scope locks
|
|
|
|
**In scope:**
|
|
|
|
- vb2_dma_resv kernel patch (RFC v2 carry, or local equivalent). Source: linux-media list RFC v2 thread (resume from `reference_dmabuf_resv_blocker.md`).
|
|
- Hantro UAPI drift root-cause + fix. Probably struct-size / field-additions in `drivers/media/platform/verisilicon/hantro_*.c` between 6.19 → 7.0. Phase 3 baseline diffs the headers in both kernels.
|
|
- Kernel-agent product build + promote pipeline (per `project_kernel_agent.md`). Iter5 is the first user-driven exercise of `ka-promote` + `ka-close` for the fresnel-fourier campaign tag.
|
|
- Re-verification matrix on new kernel: all 5 codecs through libva, direct pixel hash.
|
|
|
|
**Out of scope:**
|
|
|
|
- iter4-B1 (auto-detect device discrimination) — backend-side fragility, separate iteration.
|
|
- iter4-B2 (mpv-vaapi create-device failure) — consumer-side.
|
|
- iter4-Q6, COLOR_RANGE, B3/B4/B5/B6/L3 backlog — backend, not kernel.
|
|
- panfrost IOMMU_CACHE — sibling issue not on Bug-2/Bug-3 critical path.
|
|
- Performance metrics — that's Candidate D, deferred again.
|
|
- Front-end libva (campaign-wide, unchanged).
|
|
- AV1 (no decoder block).
|
|
- Bootlin / Collabora upstreaming — local-carry acceptable per user pick. Stays default-deferred.
|
|
- Mainline merge — bonus, not gating.
|
|
|
|
### Open questions to resolve in Phase 0 inventory deepening / Phase 2 source-read
|
|
|
|
1. **vb2_dma_resv RFC v2 status** — what is the current patch revision on linux-media? Is it apply-able to v7.0 directly, or does it need rebase? Phase 2 task.
|
|
2. **Hantro 7.0 driver delta** — diff `drivers/media/platform/verisilicon/hantro_{mpeg2,vp8}.c` between v6.19.9 and v7.0 upstream. What controls' struct shapes changed? What new mandatory fields are there? Phase 3 baseline.
|
|
3. **kernel-agent operational status** — memory notes the agent is spec'd 2026-05-09, not yet operational; `ka-*` CLI verbs not implemented. Does iter5 trigger building the agent, or use the manual kernel-build path as fallback? Coordinate with the operator at Phase 1 close.
|
|
4. **Patch attribution** — vb2_dma_resv is somebody else's patch (linux-media author). For local-carry the kernel-agent's patch metadata needs the upstream author preserved + a "Backported-by: claude-noether <sentinel-email>" trailer. Phase 6 detail.
|
|
5. **Cross-substrate verification** — once new kernel ships, must we also rebuild libva backend against the new kernel UAPI headers? Probably no (backend reads UAPI at runtime, not build-time), but Phase 3 confirms.
|
|
6. **Test corpus** — same 5 fixtures (bbb_*) as iter1-4. New kernel could break the fixture-parse path if hantro's new UAPI requires different bitstream framing — unlikely but Phase 3 verifies.
|
|
7. **Reboot discipline** — installing a new kernel requires fresnel reboot. Schedule with the operator (Pinebook Pro is a working machine).
|
|
|
|
### Phase 2 source-read targets
|
|
|
|
- `drivers/media/v4l2-core/videobuf2-dma-contig.c` (or wherever vb2_dma_resv lives upstream — RFC v2 patch context).
|
|
- `drivers/media/platform/verisilicon/hantro_mpeg2.c` + `hantro_vp8.c` — v6.19 vs v7.0 delta.
|
|
- `drivers/media/v4l2-core/v4l2-ctrls-defs.c` — control-struct evolution for VP8 + MPEG-2.
|
|
- `/usr/include/linux/v4l2-controls.h` on fresnel (currently 7.0 headers).
|
|
- Memory `project_kernel_agent.md` — agent operating envelope.
|
|
- Memory `reference_fresnel_kernel_substrate.md` — current product version + excluded-patch list.
|
|
- linux-media archive — RFC v2 thread + reviewer feedback.
|
|
|
|
### Phase 3 baseline (predicted shape)
|
|
|
|
Phase 3 captures:
|
|
|
|
- Verbatim FRAME + COMPRESSED_HDR (VP9) + slice (H.264/HEVC/MPEG-2/VP8) control payloads on `linux-fresnel-fourier 7.0-1` (current kernel) for all 5 codecs. Strace + `extract.py` from iter4 close evidence applies directly.
|
|
- Raw YUV from ffmpeg-vaapi-hwdownload on the current kernel (will be all-`0x4c` constant-fill = init pattern for ≥3 codecs, confirms Bug 2 is reproduced).
|
|
- Raw YUV from ffmpeg-v4l2request kernel-direct on the current kernel for all 5 codecs (the reference, since kernel-direct bypasses cap_pool).
|
|
- Kernel `Unable to set control(s)` log capture for hantro MPEG-2 + VP8 (confirms Bug 3 reproduced).
|
|
- Header diff: `linux-fresnel-fourier 7.0-1` vs `linux-eos-arm 6.19.9-99` `v4l2-controls.h` + hantro driver source.
|
|
|
|
Once Phase 3 lands, Phase 4 designs patches against the diff.
|
|
|
|
### Predicted iter5 difficulty + shape
|
|
|
|
- **vs iter1-3 (codec adds, ~300-800 LOC libva backend)**: iter5 is kernel-side. Patch scope likely small (vb2_dma_resv is reportedly ~50-100 lines mainline; hantro UAPI drift may be smaller still, just struct-size guards). Build + promote pipeline is the time sink.
|
|
- **vs iter4 (single new codec)**: iter5 touches NO codec. It touches the kernel that codecs run on.
|
|
- **Predicted scope**: 1-3 kernel patches (~50-200 LOC), kernel build + install per kernel-agent flow, libva-backend regression matrix re-run. No fork patches expected unless Phase 3 surfaces a backend-side amplification of Bug 2/3.
|
|
- **Most likely Phase 7 failure mode**: vb2_dma_resv RFC v2 doesn't apply cleanly to v7.0; hantro 7.0 driver delta turns out to need a downstream patch the kernel maintainers haven't shipped yet; cap_pool readback fails for reasons beyond what vb2_dma_resv addresses.
|
|
|
|
### Predecessor close-out summary
|
|
|
|
Per the campaign 8(+1)-phase loop, iter4 closed at `9d2b7c1` (campaign repo) + fork tip `692eaa0`. iter4's transitive proof was the *workaround* for Bug 2; iter5 sets out to *remove* that workaround for the rkvdec block, and to *remove the regression-block falsification* introduced by Bug 3 on the hantro block.
|
|
|
|
Memory rules carry forward (no rule retirement expected at iter5 open):
|
|
- `feedback_dev_process` (campaign-wide)
|
|
- `feedback_gitea_as_claude_noether` (updated at iter4 close)
|
|
- `feedback_no_session_termination_attempts`
|
|
- `feedback_review_empirical_over_theoretical` (both directions)
|
|
- `feedback_unconditional_codec_state` (iter4 lesson, applies to any new backend flag work)
|
|
- `feedback_pixel_compare_in_yuv_not_png` (iter4 lesson, applies to criterion 4/5)
|
|
- `feedback_hw_decode_engagement_check` (iter3 lesson)
|
|
- `feedback_runtime_enumerates_allowlists` (iter3 lesson)
|
|
- `reference_dmabuf_resv_blocker` (iter5 STARTS HERE — this is the bug iter5 closes)
|
|
- `reference_fresnel_kernel_substrate` (iter5 produces the next entry's version bump)
|
|
- `project_kernel_agent` (iter5 is the first campaign-side exercise of the agent flow)
|
|
|
|
iter5 → iter6 handoff prediction:
|
|
|
|
- If Bug 2 + Bug 3 close cleanly: iter6 picks up Candidate A (regression block re-anchored on new substrate, formal iter5+ binding-cell numbers) or Candidate C (auto-detect harden).
|
|
- If Bug 2 closes but Bug 3 doesn't: iter6 = Bug 3 alone, or iter6 swaps the iter5 plan to iter5b mid-loop.
|
|
- If neither closes: Phase 7 → Phase 4 loopback within iter5, OR Phase 7 → Phase 0 reset.
|
|
|
|
## Memory state at iter5 open
|
|
|
|
Active rules (`MEMORY.md` index): 11 entries. Recently added at iter4 close:
|
|
|
|
- `feedback_unconditional_codec_state` (iter4 Phase 7 root-cause)
|
|
- `feedback_pixel_compare_in_yuv_not_png` (iter4 Phase 7 close)
|
|
- `feedback_gitea_as_claude_noether` (updated — SSH push now wired, user is `gitea` at port 2222)
|
|
|
|
No new memory needed at iter5 open until Phase 1 locks the target.
|