Files
fresnel-fourier/phase4_iter7_plan.md
T
marfrit 8ce6372ef8 iter7 Phase 4: plan — split iter4-B1 into B1a (this iter, encoder/decoder) + B1b (defer, multi-decoder routing)
Phase 2 source-read found iter4-B1 conflates two sub-bugs:
- B1a: walk picks encoder when it should pick decoder. SMALL FIX
  (~100-150 LOC). Add MEDIA_ENT_F_PROC_VIDEO_DECODER entity check
  in find_video_node_via_topology; two-pass prefer rkvdec.
- B1b: multi-decoder routing (rkvdec for H.264/HEVC/VP9 + hantro
  for MPEG-2/VP8 from one backend instance). Bigger arch fix
  ~200-400 LOC. DEFERRED.

iter7 ships B1a. Phase 1 criteria amended:
- Auto-detect always picks a decoder, never an encoder.
- Prefer rkvdec over hantro (rkvdec serves 3 of 5 codecs).
- 2 reboots verify stability.
- vainfo lists rkvdec's 3 codecs minimum.
- No regression on iter5b-β / iter6 state.

Phase 6 will use MEDIA_IOC_G_TOPOLOGY's entities+links arrays to
match V4L node entities to decoder-proc entities. Two-pass walk:
pass-1 rkvdec only, pass-2 any decoder.

Empirical baseline: on 2026-05-12 boot, /dev/media0=rkvdec (only
decoder), /dev/media1=hantro-vpu (encoder AND decoder both inside),
/dev/media2=uvc. Fix must skip encoder when accepting media1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:40:53 +00:00

6.3 KiB

Iteration 7 — Phase 4 (plan)

Phase 1 criteria scoped to B1a (encoder/decoder discrimination) only — B1b (multi-decoder routing) is bigger and stays backlog. Phase 4 specifies a minimum-LOC fix that satisfies all 5 amended criteria.

Empirical baseline (Phase 3 evidence)

On the 2026-05-12 boot:

/dev/media driver Entities (relevant) Resulting /dev/video
/dev/media0 rkvdec rkvdec-source, rkvdec-proc, rkvdec-sink /dev/video1 (decoder; the only V4L node)
/dev/media1 hantro-vpu rockchip,rk3399-vpu-enc-{source,proc,sink} (entities 1, 3, 6) → /dev/video2 ; AND rockchip,rk3399-vpu-dec-{source,proc,sink} (entities 15, 17, 20) → /dev/video3 TWO V4L nodes inside one media device
/dev/media2 uvcvideo (USB camera) not relevant

The current find_video_node_via_topology walks interfaces[] (MEDIA_INTF_T_V4L_VIDEO) and picks the FIRST one. For /dev/media0 (rkvdec) that's /dev/video1 = correct. For /dev/media1 (hantro-vpu) that COULD be /dev/video2 (encoder) — wrong if the interface ordering puts encoder first.

The fix needs to:

  • Recognize hantro-vpu's encoder entity vs decoder entity inside the same media device.
  • Prefer rkvdec over hantro-vpu when both are present (rkvdec serves 3 of 5 codecs; hantro serves 2 of 5).

Contract clauses

C1 — Add find_decoder_video_node_via_topology to request.c

Replaces the current find_video_node_via_topology (which picks the first V4L interface). New logic:

  1. MEDIA_IOC_G_TOPOLOGY fetches both entities[] and interfaces[] and links[].
  2. Iterate entities[] looking for function == MEDIA_ENT_F_PROC_VIDEO_DECODER. Collect the matching entity IDs (typically 1, but multiple is fine).
  3. For each decoder-proc entity:
    • Find linked V4L "source" or "sink" entities (entity ↔ entity links).
    • Find the interface (MEDIA_INTF_T_V4L_VIDEO) linked to either of those V4L entities (interface ↔ entity links).
    • Resolve the interface's devnode.major/minor to /dev/videoN.
    • Return that.
  4. If no decoder entity found, return -1 (this media device has no decoder; caller skips).

The links graph: in v2 topology, each media_v2_link has source_id and sink_id referring to either entity IDs or interface IDs. The flags field distinguishes data links from interface links.

Predicted size: ~80-120 LOC for the new function, replacing ~40 LOC of the old one. Net +40-80 LOC.

C2 — Prefer rkvdec in known_decoder_drivers

The existing array already lists "rkvdec" first, then "hantro-vpu". Walking /dev/media* in numeric order means whichever media device hits the rkvdec match first wins. On RK3399, rkvdec is usually at /dev/media0 or /dev/media1 — already-first-discovered most of the time.

But if rkvdec enumerates at /dev/media2 and hantro at /dev/media0, the walk picks hantro first. To prefer rkvdec ALWAYS: do TWO passes — first pass accepts only rkvdec; second pass accepts any decoder. This is mechanical; ~20 LOC.

Alternative: do ONE pass but break early ONLY on rkvdec; keep hantro match as "best so far" and continue scanning. ~10 LOC.

Predicted approach: simpler 2-pass.

C3 — LIBVA_V4L2_REQUEST_NO_AUTODETECT escape hatch preserved

The current code at request.c:279+ honors LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 to revert to legacy hardcoded paths. iter7 doesn't touch that branch.

C4 — Inline comment update

Replace the existing comment block at request.c:60-89 explaining the walk. Document the entity-function check, the two-pass preference, and remove the iter4-B1 backlog reference (since iter7 closes the encoder/decoder discrimination part).

C5 — Verification matrix

Phase 7 sweep:

  1. vainfo (no env override) — must list ≥3 codecs (rkvdec's H.264/HEVC/VP9 at minimum).
  2. ffmpeg-vaapi H.264 fixture — must engage decoder; verify via strace /dev/video<rkvdec> opened.
  3. Optional: explicit env-override route for MPEG-2/VP8 still works (B1b carry).
  4. 5-codec sweep equivalent to iter6 baseline — verify VP9 still PASS, MPEG-2 still PASS (with env override), no regression.
  5. Reboot fresnel; re-verify steps 1-4.

C6 — Phase 8 close

Document: B1a closed; B1b still open (multi-decoder dispatch needs ~200-400 LOC architectural work). Update backlog accordingly.

Risk register

Risk Probability Impact Mitigation
MEDIA_IOC_G_TOPOLOGY's links array doesn't link interface→entity the way I expect LOW MEDIUM Phase 5 reviewer empirically checks link structure on fresnel; Phase 6 instruments before-implementation to dump topology
Existing iter4 Commit Z code paths break under refactor LOW LOW Keep escape hatch; revert is easy
cedrus / sun4i_csi (Allwinner drivers) may not set MEDIA_ENT_F_PROC_VIDEO_DECODER UNKNOWN LOW Falls back via the two-pass: pass-1 rkvdec specifically, pass-2 any decoder including cedrus if it has the entity function; if cedrus doesn't, the fix is too strict for that platform. Not a fresnel-fourier blocker.
H.264 / HEVC / VP9 paths regress (somehow) LOW HIGH Phase 7 verifies; revert single iter7 commit if so

Predicted iter7 cadence

  • Phase 5: ~30 min (small surface).
  • Phase 6: ~45 min (single commit, ~100-150 LOC).
  • Phase 7: ~30 min (sweep + reboot).
  • Phase 8: ~15 min (close doc).

Total: ~2 hours wallclock.

Phase 5 review concerns to invite

  1. Entity function semantics: does MEDIA_ENT_F_PROC_VIDEO_DECODER actually appear on hantro-dec-proc and rkvdec-proc entities? Empirically check on fresnel.
  2. Link discovery: does the links graph use MEDIA_LNK_FL_INTERFACE_LINK flag to distinguish entity↔entity (data) from interface↔entity (control)? Verify.
  3. Multi-decoder-in-one-media-device case: my fix handles 2 decoders in one media device by returning the FIRST one. For hantro-vpu media device on RK3399 there's only ONE decoder entity (the dec-proc). For SoCs with multiple decoder cores in one media device — undefined for now; first-match acceptable.
  4. Phase 7 reboot test: how do we exercise the case where hantro encoder enumerates first? On the current boot rkvdec=/dev/media0, hantro=/dev/media1. The fix should handle BOTH orders. Forcing a re-enumeration may need a USB-cam unplug-replug or similar. Phase 7 attempts but accepts "single boot order verified" as sufficient.