Files
fresnel-fourier/phase0_findings_iter5.md
T
marfrit 8acfca3fe0 iter5 Phase 0: lock Candidate B — vb2_dma_resv + hantro UAPI drift in linux-fresnel-fourier
Five Phase 1 criteria: Bug 2 closed (cap_pool readback returns real
pixels through libva); Bug 3 closed (hantro MPEG-2 + VP8 controls
accepted on new kernel); patches ship from kernel-agent (local-carry
acceptable, mainline bonus); zero codec-contract regression vs iter4;
5/5 direct-verification block restored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:54:54 +00:00

20 KiB

Iteration 5 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-10 immediately after iter4 close (iter4_phase7_close.md, commit 9d2b7c1). Per feedback_dev_process.md Phase 0, this document captures iter5's substrate state + open issue inventory and proposes candidate research questions for Phase 1 lock.

This iteration breaks pattern. iter1..iter4 all targeted "the next codec." With VP9 closed at iter4, the campaign-level scope ("5 codecs boolean-correct") is technically green (4/5 direct + 1/5 transitive). iter5 must therefore answer a different class of research question than "add codec N+1." Three substantive candidates surfaced from iter4 close are recorded below; Phase 1 lock picks one.

Substrate state (verified 2026-05-10 21:50 CEST on fresnel)

Property Value Notes
Kernel linux-fresnel-fourier 7.0-1 kernel-agent product. v7.0 + 3 PBP DTS patches. NOT besser-direct. vb2_dma_resv + panfrost iommu-cache excluded pending RFC v2.
Boot device numbering (today) /dev/video0 rga · /dev/video1 vpu-enc · /dev/video2 vpu-dec (hantro MPEG-2/VP8) · /dev/video3 rkvdec (H.264/HEVC/VP9) · /dev/video4,5 USB cams /dev/media0=hantro · /dev/media1=rkvdec · /dev/media2=uvc. Different from iter4 Phase 6 boot — auto-detect picks hantro encoder first (iter4-B1 not fixed).
Fork tip 692eaa0 iter4 Phase 7 fix-forward (h264_start_code profile-gating)
Backend installed /usr/lib/dri/v4l2_request_drv_video.so SHA256 6e90b7a9b2c33480dd3ffc2da8423ab0bcef14f23c68cf18dc2ae2ff66ac808c matches fork tip build
Codec scoreboard 5/5 PASS (4 direct + 1 transitive) iter1 MPEG-2 · iter2 HEVC · T4 H.264 · iter3 VP8 · iter4 VP9

Open issue inventory from iter4 close

Carried forward into iter5 (any/all can be locked as the iter5 target):

Substrate-class issues (block direct verification through libva backend)

Bug 2 — cap_pool readback regression substrate-wide on linux-fresnel-fourier 7.0-1. Symptom: libva-backend ffmpeg-vaapi-hwdownload returns the cap_pool init pattern (RGB(0,0x4c,0) constant) for all 4 already-shipping codecs at iter4 Phase 7 capture. H.264 partial exception (keyframe reads correctly, inter frames stuck). Hash 93dd9db5... appears across VP9 + MPEG-2. Sibling to iter3 dma_resv issue but different signature: constant 0x4c instead of all-zero. Kernel-direct ffmpeg-v4l2request works fine (iter4 Leg 3 PASS). So the broken path is libva → cap_pool → kernel binding, not the kernel.

Bug 3 — hantro UAPI drift on MPEG-2 + VP8: kernel Unable to set control(s) rejection at submission time. Phase 2 of iter1 (6.19.9-99) had v4l2_ctrl_vp9_frame size 144 / _compressed_hdr 1947; iter4 Phase 3 (7.0-1) had 168 / 2040. Probably struct-size or new-required-field drift on hantro controls (MPEG-2/VP8 sequence/picture/quant), analogous to the VP9 struct evolution.

Backend-class issues (iter4 Phase 6 backlog)

  • iter4-B1 — Auto-detect picks first-enumerated /dev/media* driver-match without discriminating decoder vs encoder. Today's boot enumerates hantro-vpu encoder (rk3399-vpu-enc) at /dev/media0; auto-detect picks it for VP9 (wrong). Workaround: explicit LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 LIBVA_V4L2_REQUEST_VIDEO_PATH=... LIBVA_V4L2_REQUEST_MEDIA_PATH=.... Fix: walk should require MEDIA_ENT_F_PROC_VIDEO_DECODER entity in the topology, not just driver-name match.
  • iter4-B2 — mpv-vaapi Could not create device for VP9. Consumer-side, not backend.
  • iter4-Q6 — per-segment seg_param[s].luma_ac_quant_scale → kernel feature_data[s][ALT_Q] is lossy for non-BBB segmentation-enabled streams. Dead code for BBB.
  • iter4-COLOR_RANGE — VAAPI exposes no color_range; backend leaves V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING clear. Wrong for full-range JPEG-encoded VP9.
  • B3 — picture.c BeginPicture profile-aware reset. iter1-4 added per-profile reset lines without refactoring.
  • B4 — context.c log suppression for Unable to set control(s) warnings from codecs the bound driver doesn't support.
  • B5 — mpeg2 vbv_buffer_size polish (iter1 S2 finding).
  • B6 — h265 SPS bitstream-parse fidelity gap.
  • L3 — vaDeriveImage cache-stale.

Regression invariants (iter1-3 hashes invalidated by substrate change)

iter4 Phase 7 documented all 4 prior codecs FAIL on linux-fresnel-fourier 7.0-1:

Codec Driver iter4 P7 verdict Underlying cause
H.264 rkvdec PARTIAL (keyframe-only) Bug 2
HEVC rkvdec FAIL (constant fill) Bug 2
MPEG-2 hantro FAIL + Unable to set control(s) Bug 2 + Bug 3
VP8 hantro FAIL + Unable to set control(s) Bug 2 + Bug 3
VP9 rkvdec FAIL until fix-forward iter4-specific (start-code prepend), closed

The campaign-level "5/5 codecs passing" claim is true on iter4's substrate frame-of-reference (transitive proof discipline) but NOT byte-identical on the new substrate. Until iter5 work establishes Bug 2 + Bug 3 closure, iter5+ regression test means transitive proof + control-payload byte-compare, not direct pixel hashes.

Candidate iter5 research questions

Candidate A — Re-anchor 4-codec regression block on linux-fresnel-fourier 7.0-1

"Restore direct pixel verification for at least 3 of {H.264, HEVC, MPEG-2, VP8} on the new kernel substrate. Lock the regression-test reference hashes for iter5+ regression-block-of-5."

Pass/fail: For ≥3 codecs of the four iter1-3 codecs, ffmpeg-vaapi-hwdownload + libva backend produces a SHA256 (or transitive-proof equivalent) that locks as iter5+ regression invariant. The 4-codec block in iter5 binding cells is GREEN.

Scope: Investigate Bug 2 (cap_pool readback) — is it a libva-backend cap_pool refactor needed, or a kernel patch? Investigate Bug 3 (hantro UAPI drift) — what changed in 7.0-1 struct shapes for MPEG-2/VP8 controls? Both can be backend-only OR cross-cutting into kernel-agent.

Pro: Establishes a re-grounded baseline for iter5+ campaign claims. Without it, every future iteration must say "re-verify against substrate" before declaring regression-block clean.

Con: Bug 2 might require kernel patches (RFC v2 vb2_dma_resv hasn't landed); fix-forward backend may not be enough. Scope risk: could expand into a kernel-iteration.

Candidate B — Substrate stabilization: land vb2_dma_resv + hantro UAPI drift fixes in kernel-agent

"Remove the cap_pool readback workaround across the campaign by landing kernel patches in the linux-fresnel-fourier product. Direct verification becomes possible for all 5 codecs."

Pass/fail: Post-iter5 kernel ships with patches; ffmpeg-vaapi-hwdownload returns correct pixels for all 5 codecs (no transitive proof needed); the iter4 transitive proof for VP9 can be replaced by direct pixel hash.

Scope: Cross-product work in kernel-agent (sibling memory project_kernel_agent.md). vb2_dma_resv RFC v2 is in design upstream — track + land. Hantro UAPI drift root cause may be in linux-media tree, not local patch.

Pro: Solves the real underlying issue. iter1-4 all become direct-verifiable; transitive proofs become belt-and-suspenders. iter6+ can lock pixel hashes confidently.

Con: Large scope. Kernel-iteration timing, possibly multi-week. May not be ready for a single iter5 close cycle. Memory reference_dmabuf_resv_blocker.md notes RFC v1 rejected, v2 in design — landing window uncertain.

Candidate C — Auto-detect device-discovery harden (iter4-B1)

"Backend auto-detect picks the right decoder for each codec on every boot, without env-override workarounds. Validate against the iter4-style enumeration shuffle (today: hantro-vpu encoder enumerates first)."

Pass/fail: With auto-detect (no env vars), vainfo enumerates all 5 codec profiles correctly. ffmpeg-vaapi VP9 + H.264 + HEVC route to rkvdec; MPEG-2 + VP8 route to hantro-vpu-dec. Tested on at least 2 reboots with different device-numbering orders.

Scope: backend-only. ~50-100 LOC change in request.c::v4l2_request_init to enrich the media topology walk: require MEDIA_ENT_F_PROC_VIDEO_DECODER on at least one entity, OR scan media-controller entities for decode-capable function, OR maintain a codec-to-card-name map.

Pro: Smallest scope. Pure-backend fix. Removes a campaign-wide fragility. Mechanical.

Con: Doesn't address Bug 2 + Bug 3 (substrate-class blockers). iter5 close would say "5/5 codecs auto-route correctly" but still need transitive proof to verify any of them actually decode.

Candidate D — Performance / measurement iteration

"Quantify per-codec performance on fresnel — CPU utilization, frames-per-second, drop count over a 60s window — for H.264/HEVC/VP9 paths. Establish iter5 binding-cell numbers for future performance-regression-checks."

Pass/fail: For each of {H.264 1080p30, HEVC 720p, VP9 720p}, mpv --hwdec=vaapi --frames=10s reports CPU% < some-threshold, no dropped frames, libva path engaged. The numbers themselves become the iter5+ binding-cell anchors.

Scope: pure measurement + harness. NO backend code changes. Maybe a perf / top / mpv --status harness script. Reproducible per-codec invocations.

Pro: Campaign README locked this as "follow-up iteration after correctness lands." Correctness has landed. Now is the formally-next iteration. NOT blocked by Bug 2/3 (perf is engagement-side, not pixel-side).

Con: Bug 2 + Bug 3 still block the iter1-3 codecs from reporting correct pixels — performance numbers without pixel correctness are vacuous (could measure 60fps of constant green). MPEG-2 + VP8 may not even submit-controls cleanly. Likely needs Candidate A or B first.

Recommendation

If pressed to recommend a sequence: A → C → B → D.

A first because regression invariants underpin everything else. C next because the fragility hits the user every boot. B because it's the real fix but timing-uncertain. D last because it depends on A.

A and C could plausibly be one iteration if scope cooperates; B is a separate beast; D is genuinely "after correctness."

Locked research question (iteration 5, 2026-05-10)

User pick: Candidate B — both Bug 2 + Bug 3 in one close, kernel-agent local-carry acceptable.

"Land vb2_dma_resv (Bug 2) AND fix hantro UAPI drift (Bug 3) in the linux-fresnel-fourier kernel product so that every shipping codec on RK3399 — H.264, HEVC, VP9 via rkvdec; MPEG-2, VP8 via hantro-vpu-dec — produces directly-readable pixels through the libva backend's cap_pool path, on the next kernel product shipped from kernel-agent. The transitive-proof discipline becomes optional belt-and-suspenders, not the only viable verifier."

Pass/fail (boolean, per feedback_dev_process.md Phase 1)

  1. Bug 2 closed — cap_pool readback returns kernel-decoded pixels on the new kernel. For all 3 rkvdec codecs (H.264 1080p30, HEVC 720p, VP9 720p), ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload,format=nv12,format=yuv420p -f rawvideo via the libva backend on the new kernel returns raw YUV byte-identical to the SW reference (or transitive equivalent that explicitly demonstrates cap_pool is delivering kernel-written pages, not init pattern). The RGB(0,0x4c,0) constant-fill signature is gone.

  2. Bug 3 closed — hantro MPEG-2 + VP8 controls accepted on the new kernel. ffmpeg -hwaccel vaapi ... bbb_720p10s_{mpeg2.ts,vp8.webm} via the libva backend on the new kernel runs without kernel Unable to set control(s) log messages, and produces YUV byte-matching the SW reference (with or without transitive proof depending on Bug 2 status for the hantro path specifically).

  3. Substrate ships from kernel-agent. The new kernel is built and promoted via the kernel-agent pipeline (per memory project_kernel_agent.mdka-promote + ka-close --status success semantics) and tagged linux-fresnel-fourier 7.0-2 (or whatever version the agent produces next). Local-carry patches in the kernel-agent product are acceptable for iter5 close — mainline-landing is bonus, not required. Patch scope-tagged per agent discipline (Hard rule #7).

  4. No regression on already-shipping codec contracts. The Phase 3 anchor payloads (FRAME + COMPRESSED_HDR + slice for all 5 codecs) on the new kernel still byte-match the iter1-4 captures, modulo the documented S4 uv_mode VP9 carve-out. The libva backend tip 692eaa0 is unmodified; only the kernel substrate changes.

  5. Five-codec direct-verification block. With the new kernel + unmodified backend tip 692eaa0, all 5 codecs pass criterion-4 directly (not transitively). VP9 in particular: byte-identical YUV from libva-backend hwdownload to SW reference. The campaign scoreboard becomes "5/5 direct" (was "4 direct + 1 transitive").

A clean iter5 close has all five criteria green. Phase 7 → Phase 4 loopback per feedback_dev_process.md if any fail.

Scope locks

In scope:

  • vb2_dma_resv kernel patch (RFC v2 carry, or local equivalent). Source: linux-media list RFC v2 thread (resume from reference_dmabuf_resv_blocker.md).
  • Hantro UAPI drift root-cause + fix. Probably struct-size / field-additions in drivers/media/platform/verisilicon/hantro_*.c between 6.19 → 7.0. Phase 3 baseline diffs the headers in both kernels.
  • Kernel-agent product build + promote pipeline (per project_kernel_agent.md). Iter5 is the first user-driven exercise of ka-promote + ka-close for the fresnel-fourier campaign tag.
  • Re-verification matrix on new kernel: all 5 codecs through libva, direct pixel hash.

Out of scope:

  • iter4-B1 (auto-detect device discrimination) — backend-side fragility, separate iteration.
  • iter4-B2 (mpv-vaapi create-device failure) — consumer-side.
  • iter4-Q6, COLOR_RANGE, B3/B4/B5/B6/L3 backlog — backend, not kernel.
  • panfrost IOMMU_CACHE — sibling issue not on Bug-2/Bug-3 critical path.
  • Performance metrics — that's Candidate D, deferred again.
  • Front-end libva (campaign-wide, unchanged).
  • AV1 (no decoder block).
  • Bootlin / Collabora upstreaming — local-carry acceptable per user pick. Stays default-deferred.
  • Mainline merge — bonus, not gating.

Open questions to resolve in Phase 0 inventory deepening / Phase 2 source-read

  1. vb2_dma_resv RFC v2 status — what is the current patch revision on linux-media? Is it apply-able to v7.0 directly, or does it need rebase? Phase 2 task.
  2. Hantro 7.0 driver delta — diff drivers/media/platform/verisilicon/hantro_{mpeg2,vp8}.c between v6.19.9 and v7.0 upstream. What controls' struct shapes changed? What new mandatory fields are there? Phase 3 baseline.
  3. kernel-agent operational status — memory notes the agent is spec'd 2026-05-09, not yet operational; ka-* CLI verbs not implemented. Does iter5 trigger building the agent, or use the manual kernel-build path as fallback? Coordinate with the operator at Phase 1 close.
  4. Patch attribution — vb2_dma_resv is somebody else's patch (linux-media author). For local-carry the kernel-agent's patch metadata needs the upstream author preserved + a "Backported-by: claude-noether " trailer. Phase 6 detail.
  5. Cross-substrate verification — once new kernel ships, must we also rebuild libva backend against the new kernel UAPI headers? Probably no (backend reads UAPI at runtime, not build-time), but Phase 3 confirms.
  6. Test corpus — same 5 fixtures (bbb_*) as iter1-4. New kernel could break the fixture-parse path if hantro's new UAPI requires different bitstream framing — unlikely but Phase 3 verifies.
  7. Reboot discipline — installing a new kernel requires fresnel reboot. Schedule with the operator (Pinebook Pro is a working machine).

Phase 2 source-read targets

  • drivers/media/v4l2-core/videobuf2-dma-contig.c (or wherever vb2_dma_resv lives upstream — RFC v2 patch context).
  • drivers/media/platform/verisilicon/hantro_mpeg2.c + hantro_vp8.c — v6.19 vs v7.0 delta.
  • drivers/media/v4l2-core/v4l2-ctrls-defs.c — control-struct evolution for VP8 + MPEG-2.
  • /usr/include/linux/v4l2-controls.h on fresnel (currently 7.0 headers).
  • Memory project_kernel_agent.md — agent operating envelope.
  • Memory reference_fresnel_kernel_substrate.md — current product version + excluded-patch list.
  • linux-media archive — RFC v2 thread + reviewer feedback.

Phase 3 baseline (predicted shape)

Phase 3 captures:

  • Verbatim FRAME + COMPRESSED_HDR (VP9) + slice (H.264/HEVC/MPEG-2/VP8) control payloads on linux-fresnel-fourier 7.0-1 (current kernel) for all 5 codecs. Strace + extract.py from iter4 close evidence applies directly.
  • Raw YUV from ffmpeg-vaapi-hwdownload on the current kernel (will be all-0x4c constant-fill = init pattern for ≥3 codecs, confirms Bug 2 is reproduced).
  • Raw YUV from ffmpeg-v4l2request kernel-direct on the current kernel for all 5 codecs (the reference, since kernel-direct bypasses cap_pool).
  • Kernel Unable to set control(s) log capture for hantro MPEG-2 + VP8 (confirms Bug 3 reproduced).
  • Header diff: linux-fresnel-fourier 7.0-1 vs linux-eos-arm 6.19.9-99 v4l2-controls.h + hantro driver source.

Once Phase 3 lands, Phase 4 designs patches against the diff.

Predicted iter5 difficulty + shape

  • vs iter1-3 (codec adds, ~300-800 LOC libva backend): iter5 is kernel-side. Patch scope likely small (vb2_dma_resv is reportedly ~50-100 lines mainline; hantro UAPI drift may be smaller still, just struct-size guards). Build + promote pipeline is the time sink.
  • vs iter4 (single new codec): iter5 touches NO codec. It touches the kernel that codecs run on.
  • Predicted scope: 1-3 kernel patches (~50-200 LOC), kernel build + install per kernel-agent flow, libva-backend regression matrix re-run. No fork patches expected unless Phase 3 surfaces a backend-side amplification of Bug 2/3.
  • Most likely Phase 7 failure mode: vb2_dma_resv RFC v2 doesn't apply cleanly to v7.0; hantro 7.0 driver delta turns out to need a downstream patch the kernel maintainers haven't shipped yet; cap_pool readback fails for reasons beyond what vb2_dma_resv addresses.

Predecessor close-out summary

Per the campaign 8(+1)-phase loop, iter4 closed at 9d2b7c1 (campaign repo) + fork tip 692eaa0. iter4's transitive proof was the workaround for Bug 2; iter5 sets out to remove that workaround for the rkvdec block, and to remove the regression-block falsification introduced by Bug 3 on the hantro block.

Memory rules carry forward (no rule retirement expected at iter5 open):

  • feedback_dev_process (campaign-wide)
  • feedback_gitea_as_claude_noether (updated at iter4 close)
  • feedback_no_session_termination_attempts
  • feedback_review_empirical_over_theoretical (both directions)
  • feedback_unconditional_codec_state (iter4 lesson, applies to any new backend flag work)
  • feedback_pixel_compare_in_yuv_not_png (iter4 lesson, applies to criterion 4/5)
  • feedback_hw_decode_engagement_check (iter3 lesson)
  • feedback_runtime_enumerates_allowlists (iter3 lesson)
  • reference_dmabuf_resv_blocker (iter5 STARTS HERE — this is the bug iter5 closes)
  • reference_fresnel_kernel_substrate (iter5 produces the next entry's version bump)
  • project_kernel_agent (iter5 is the first campaign-side exercise of the agent flow)

iter5 → iter6 handoff prediction:

  • If Bug 2 + Bug 3 close cleanly: iter6 picks up Candidate A (regression block re-anchored on new substrate, formal iter5+ binding-cell numbers) or Candidate C (auto-detect harden).
  • If Bug 2 closes but Bug 3 doesn't: iter6 = Bug 3 alone, or iter6 swaps the iter5 plan to iter5b mid-loop.
  • If neither closes: Phase 7 → Phase 4 loopback within iter5, OR Phase 7 → Phase 0 reset.

Memory state at iter5 open

Active rules (MEMORY.md index): 11 entries. Recently added at iter4 close:

  • feedback_unconditional_codec_state (iter4 Phase 7 root-cause)
  • feedback_pixel_compare_in_yuv_not_png (iter4 Phase 7 close)
  • feedback_gitea_as_claude_noether (updated — SSH push now wired, user is gitea at port 2222)

No new memory needed at iter5 open until Phase 1 locks the target.