From 8ce6372ef8174b0a1f7f4d26289139b5feeda772 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Tue, 12 May 2026 23:40:53 +0000 Subject: [PATCH] =?UTF-8?q?iter7=20Phase=204:=20plan=20=E2=80=94=20split?= =?UTF-8?q?=20iter4-B1=20into=20B1a=20(this=20iter,=20encoder/decoder)=20+?= =?UTF-8?q?=20B1b=20(defer,=20multi-decoder=20routing)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 2 source-read found iter4-B1 conflates two sub-bugs: - B1a: walk picks encoder when it should pick decoder. SMALL FIX (~100-150 LOC). Add MEDIA_ENT_F_PROC_VIDEO_DECODER entity check in find_video_node_via_topology; two-pass prefer rkvdec. - B1b: multi-decoder routing (rkvdec for H.264/HEVC/VP9 + hantro for MPEG-2/VP8 from one backend instance). Bigger arch fix ~200-400 LOC. DEFERRED. iter7 ships B1a. Phase 1 criteria amended: - Auto-detect always picks a decoder, never an encoder. - Prefer rkvdec over hantro (rkvdec serves 3 of 5 codecs). - 2 reboots verify stability. - vainfo lists rkvdec's 3 codecs minimum. - No regression on iter5b-β / iter6 state. Phase 6 will use MEDIA_IOC_G_TOPOLOGY's entities+links arrays to match V4L node entities to decoder-proc entities. Two-pass walk: pass-1 rkvdec only, pass-2 any decoder. Empirical baseline: on 2026-05-12 boot, /dev/media0=rkvdec (only decoder), /dev/media1=hantro-vpu (encoder AND decoder both inside), /dev/media2=uvc. Fix must skip encoder when accepting media1. Co-Authored-By: Claude Opus 4.7 (1M context) --- phase4_iter7_plan.md | 94 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 94 insertions(+) create mode 100644 phase4_iter7_plan.md diff --git a/phase4_iter7_plan.md b/phase4_iter7_plan.md new file mode 100644 index 0000000..0af8a06 --- /dev/null +++ b/phase4_iter7_plan.md @@ -0,0 +1,94 @@ +# Iteration 7 — Phase 4 (plan) + +Phase 1 criteria scoped to B1a (encoder/decoder discrimination) only — B1b (multi-decoder routing) is bigger and stays backlog. Phase 4 specifies a minimum-LOC fix that satisfies all 5 amended criteria. + +## Empirical baseline (Phase 3 evidence) + +On the 2026-05-12 boot: + +| /dev/media | driver | Entities (relevant) | Resulting /dev/video | +|---|---|---|---| +| `/dev/media0` | `rkvdec` | rkvdec-source, rkvdec-proc, rkvdec-sink | /dev/video1 (decoder; the only V4L node) | +| `/dev/media1` | `hantro-vpu` | `rockchip,rk3399-vpu-enc-{source,proc,sink}` (entities 1, 3, 6) → /dev/video2 ; AND `rockchip,rk3399-vpu-dec-{source,proc,sink}` (entities 15, 17, 20) → /dev/video3 | TWO V4L nodes inside one media device | +| `/dev/media2` | `uvcvideo` | (USB camera) | not relevant | + +The current `find_video_node_via_topology` walks `interfaces[]` (`MEDIA_INTF_T_V4L_VIDEO`) and picks the FIRST one. For /dev/media0 (rkvdec) that's /dev/video1 = correct. For /dev/media1 (hantro-vpu) that COULD be /dev/video2 (encoder) — wrong if the interface ordering puts encoder first. + +The fix needs to: +- Recognize hantro-vpu's encoder entity vs decoder entity inside the same media device. +- Prefer rkvdec over hantro-vpu when both are present (rkvdec serves 3 of 5 codecs; hantro serves 2 of 5). + +## Contract clauses + +### C1 — Add `find_decoder_video_node_via_topology` to request.c + +Replaces the current `find_video_node_via_topology` (which picks the first V4L interface). New logic: + +1. `MEDIA_IOC_G_TOPOLOGY` fetches both `entities[]` and `interfaces[]` and `links[]`. +2. Iterate `entities[]` looking for `function == MEDIA_ENT_F_PROC_VIDEO_DECODER`. Collect the matching entity IDs (typically 1, but multiple is fine). +3. For each decoder-proc entity: + - Find linked V4L "source" or "sink" entities (entity ↔ entity links). + - Find the interface (`MEDIA_INTF_T_V4L_VIDEO`) linked to either of those V4L entities (interface ↔ entity links). + - Resolve the interface's `devnode.major/minor` to /dev/videoN. + - Return that. +4. If no decoder entity found, return -1 (this media device has no decoder; caller skips). + +The links graph: in v2 topology, each `media_v2_link` has `source_id` and `sink_id` referring to either entity IDs or interface IDs. The `flags` field distinguishes data links from interface links. + +Predicted size: ~80-120 LOC for the new function, replacing ~40 LOC of the old one. Net +40-80 LOC. + +### C2 — Prefer rkvdec in `known_decoder_drivers` + +The existing array already lists `"rkvdec"` first, then `"hantro-vpu"`. Walking `/dev/media*` in numeric order means whichever media device hits the rkvdec match first wins. On RK3399, rkvdec is usually at /dev/media0 or /dev/media1 — already-first-discovered most of the time. + +But if rkvdec enumerates at /dev/media2 and hantro at /dev/media0, the walk picks hantro first. To prefer rkvdec ALWAYS: do TWO passes — first pass accepts only rkvdec; second pass accepts any decoder. This is mechanical; ~20 LOC. + +Alternative: do ONE pass but break early ONLY on rkvdec; keep hantro match as "best so far" and continue scanning. ~10 LOC. + +Predicted approach: simpler 2-pass. + +### C3 — `LIBVA_V4L2_REQUEST_NO_AUTODETECT` escape hatch preserved + +The current code at request.c:279+ honors `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` to revert to legacy hardcoded paths. iter7 doesn't touch that branch. + +### C4 — Inline comment update + +Replace the existing comment block at request.c:60-89 explaining the walk. Document the entity-function check, the two-pass preference, and remove the iter4-B1 backlog reference (since iter7 closes the encoder/decoder discrimination part). + +### C5 — Verification matrix + +Phase 7 sweep: +1. `vainfo` (no env override) — must list ≥3 codecs (rkvdec's H.264/HEVC/VP9 at minimum). +2. ffmpeg-vaapi H.264 fixture — must engage decoder; verify via strace `/dev/video` opened. +3. Optional: explicit env-override route for MPEG-2/VP8 still works (B1b carry). +4. 5-codec sweep equivalent to iter6 baseline — verify VP9 still PASS, MPEG-2 still PASS (with env override), no regression. +5. Reboot fresnel; re-verify steps 1-4. + +### C6 — Phase 8 close + +Document: B1a closed; B1b still open (multi-decoder dispatch needs ~200-400 LOC architectural work). Update backlog accordingly. + +## Risk register + +| Risk | Probability | Impact | Mitigation | +|---|---|---|---| +| MEDIA_IOC_G_TOPOLOGY's links array doesn't link interface→entity the way I expect | LOW | MEDIUM | Phase 5 reviewer empirically checks link structure on fresnel; Phase 6 instruments before-implementation to dump topology | +| Existing iter4 Commit Z code paths break under refactor | LOW | LOW | Keep escape hatch; revert is easy | +| `cedrus` / `sun4i_csi` (Allwinner drivers) may not set MEDIA_ENT_F_PROC_VIDEO_DECODER | UNKNOWN | LOW | Falls back via the two-pass: pass-1 rkvdec specifically, pass-2 any decoder including cedrus if it has the entity function; if cedrus doesn't, the fix is too strict for that platform. Not a fresnel-fourier blocker. | +| H.264 / HEVC / VP9 paths regress (somehow) | LOW | HIGH | Phase 7 verifies; revert single iter7 commit if so | + +## Predicted iter7 cadence + +- Phase 5: ~30 min (small surface). +- Phase 6: ~45 min (single commit, ~100-150 LOC). +- Phase 7: ~30 min (sweep + reboot). +- Phase 8: ~15 min (close doc). + +Total: ~2 hours wallclock. + +## Phase 5 review concerns to invite + +1. **Entity function semantics**: does `MEDIA_ENT_F_PROC_VIDEO_DECODER` actually appear on hantro-dec-proc and rkvdec-proc entities? Empirically check on fresnel. +2. **Link discovery**: does the links graph use `MEDIA_LNK_FL_INTERFACE_LINK` flag to distinguish entity↔entity (data) from interface↔entity (control)? Verify. +3. **Multi-decoder-in-one-media-device case**: my fix handles 2 decoders in one media device by returning the FIRST one. For hantro-vpu media device on RK3399 there's only ONE decoder entity (the dec-proc). For SoCs with multiple decoder cores in one media device — undefined for now; first-match acceptable. +4. **Phase 7 reboot test**: how do we exercise the case where hantro encoder enumerates first? On the current boot rkvdec=/dev/media0, hantro=/dev/media1. The fix should handle BOTH orders. Forcing a re-enumeration may need a USB-cam unplug-replug or similar. Phase 7 attempts but accepts "single boot order verified" as sufficient.