From fc44a1e63c53bbbfe862c8ad26a0129a6474f322 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 12 May 2026 23:25:18 +0000
Subject: [PATCH] =?UTF-8?q?iter7=20Phase=200=20lock:=20iter4-B1=20auto-det?=
 =?UTF-8?q?ect=20harden=20=E2=80=94=20require=20MEDIA=5FENT=5FF=5FPROC=5FV?=
 =?UTF-8?q?IDEO=5FDECODER?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Backend-only ~30-80 LOC. Walk media-topology entities (already partially
done at iter4 Commit Z); require at least one entity with function ==
MEDIA_ENT_F_PROC_VIDEO_DECODER. Eliminates the hantro encoder false-match
that breaks vainfo + ffmpeg-vaapi on boots where the encoder enumerates
first.

5 boolean Phase 1 criteria locked. No kernel work. No pixel-correctness
chasing. Quality-of-life delivery; removes per-session env-override
friction.

Predicted lowest-difficulty iteration since iter1. 2-3 hours wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase0_findings_iter7.md | 121 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)
 create mode 100644 phase0_findings_iter7.md

diff --git a/phase0_findings_iter7.md b/phase0_findings_iter7.md
new file mode 100644
index 0000000..1530b2c
--- /dev/null
+++ b/phase0_findings_iter7.md
@@ -0,0 +1,121 @@
+# Iteration 7 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
+
+Opens 2026-05-12 immediately after iter6 PARTIAL close ([`phase8_iteration6_close.md`](phase8_iteration6_close.md), commit `8ce00d3`).
+
+iter6 narrowed Bug 6 to kernel-side (H-E) and closed PARTIAL. iter7 pivots to a smaller, lower-risk delivery: **iter4-B1 auto-detect device discrimination**. Pure backend fix, no kernel work, no pixel-correctness chasing.
+
+## Locked research question (iteration 7)
+
+> *"Backend auto-detect picks the correct V4L2 decode device on every fresnel boot, regardless of `/dev/media*` enumeration order. After fix: a fresh-boot `vainfo` lists all 5 codec profiles correctly without any `LIBVA_V4L2_REQUEST_*` env override."*
+
+### Pass/fail (boolean)
+
+1. **Fresh-boot vainfo enumerates all 5 codecs**. `ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request vainfo'` (no `LIBVA_V4L2_REQUEST_VIDEO_PATH` / `_MEDIA_PATH` override) lists `VAProfileH264*` + `VAProfileHEVCMain` + `VAProfileVP9Profile0` + `VAProfileMPEG2*` + `VAProfileVP8Version0_3`.
+2. **Auto-detect correctly routes H.264/HEVC/VP9 to rkvdec**. `ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 1 -f null -` without env override engages rkvdec (verifiable via strace showing `/dev/video` opened).
+3. **Auto-detect correctly routes MPEG-2/VP8 to hantro-vpu-dec**. Same shape for MPEG-2 + VP8 fixtures; strace shows hantro decoder opened (NOT hantro encoder at `/dev/video`).
+4. **No regression on any locked iter5b-β / iter6 state**. VP9 still PASS direct, MPEG-2 still PASS, H.264 keyframe-partial still unchanged, HEVC + VP8 still in their existing partial states (Bugs 5/6 not in iter7 scope).
+5. **Multi-boot stability**: at least 2 reboots of fresnel (different `/dev/media*` enumeration orders if achievable) confirm auto-detect routes correctly each time.
+
+Clean iter7 close = all 5 criteria green. Phase 7 → Phase 4 loopback per `feedback_dev_process.md` if any fail.
+
+## Mechanism the question targets
+
+Per `phase4_iter5b_plan_v2.md` C5 risk-register and iter5b Phase 7 retro: today's auto-detect at `request.c::v4l2_request_init` walks `/dev/media*` in enumeration order and picks the first one whose `MEDIA_IOC_DEVICE_INFO.driver` name matches an allow-list (`{rkvdec, hantro-vpu, cedrus, sun4i_csi}`). The allow-list doesn't distinguish decoders from encoders.
+
+On RK3399 today, `hantro-vpu` is the kernel driver name for BOTH:
+- `/dev/media0` or `/dev/media1` (boot-dependent) → `rockchip,rk3399-vpu-enc` (the encoder card)
+- `/dev/media0` or `/dev/media1` (boot-dependent) → `rockchip,rk3399-vpu-dec` (the decoder card)
+
+The walk picks the first hantro-vpu match, which is sometimes the encoder. The encoder doesn't expose decode formats, so vainfo enumerates nothing and ffmpeg-vaapi fails.
+
+iter4 Phase 6 Commit Z established the media-topology-walk pattern (better than enumeration-order `/dev/video*`). iter4 Phase 7 + iter5/iter5b/iter6 still hit the issue because the topology walk reads only the driver name, not the entity types.
+
+### The fix shape
+
+Walk `/dev/media*`, do `MEDIA_IOC_DEVICE_INFO` (driver name check), THEN walk media-topology entities and require at least one entity with function `MEDIA_ENT_F_PROC_VIDEO_DECODER`. Only accept the media device if a decoder entity is present.
+
+This rejects the encoder card. Predicted fix size: ~30-80 LOC in `request.c`.
+
+## Substrate state at iter7 open
+
+| Property | Value |
+|---|---|
+| Kernel | `7.0.0-fresnel-fourier` (linux-fresnel-fourier 7.0-1). Unchanged from iter5b/iter6. |
+| Fork tip | `70196f8` (iter5b-β Phase 6 Commit D). Unchanged through iter6. |
+| Backend installed | SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8`. Unchanged. |
+| Test fixtures | unchanged. |
+| Bugs 4/5/6 | still open, deferred to future iterations. |
+| iter6 narrowing | Bug 6 confirmed kernel-side (H-E); 4 of 5 hypotheses eliminated. |
+
+## Scope locks
+
+**In scope**:
+- `src/request.c::v4l2_request_init` auto-detect path.
+- Media-topology entity-walk via `MEDIA_IOC_G_TOPOLOGY` (already partially used per iter4 Commit Z).
+- Add `MEDIA_ENT_F_PROC_VIDEO_DECODER` entity-function check to the topology walk.
+- Optional: scope the driver-name allow-list to encoder/decoder-aware variants if the kernel exposes them.
+- 5-codec sweep regression-verify on the fixed backend.
+
+**Out of scope**:
+- Any pixel-correctness chasing (Bugs 4/5/6).
+- Kernel patches.
+- Performance metrics.
+- Multi-decoder per-driver-data routing (the "use rkvdec for some codecs + hantro for others on the same backend instance" challenge — known as iter4-B1's "walk-and-pick-first" sub-issue).
+- Front-end libva.
+- AV1 / other-hardware.
+
+## Phase 2 source-read targets
+
+- `src/request.c::v4l2_request_init` — current auto-detect implementation (iter4 Commit Z `7f8fa93`).
+- `` — `MEDIA_IOC_G_TOPOLOGY`, `MEDIA_ENT_F_*` enum.
+- iter4 Phase 6 commit Z body — what the walk does today.
+
+## Phase 3 baseline
+
+iter4-B1 is well-known: an env override is required on every boot. Phase 3 captures the empirical baseline:
+1. Fresh boot. Enumerate `/dev/media*` driver names.
+2. `vainfo` with auto-detect (no env override). Observe what gets picked.
+3. Show that on a boot where the hantro-vpu encoder enumerates first, vainfo lists NO profiles.
+
+iter6 Phase 3 already captured device-mapping artifacts inadvertently (per logs). Phase 3 of iter7 may reuse them.
+
+## Phase 4 plan shape (predicted)
+
+Mechanical:
+1. After `MEDIA_IOC_DEVICE_INFO` matches the allow-list driver name, do `MEDIA_IOC_G_TOPOLOGY` (already happens at iter4 Commit Z).
+2. Walk the topology's entities array. For each entity, check the `function` field.
+3. Accept the device only if AT LEAST ONE entity has `function == MEDIA_ENT_F_PROC_VIDEO_DECODER`.
+4. Else skip and continue.
+
+LOC estimate: 30-80 LOC in `request.c`. One commit. Maybe a follow-up commit for any cosmetic logging.
+
+## Phase 5 review concerns to invite
+
+- Does the v7.0-fresnel-fourier kernel's hantro / rkvdec set `MEDIA_ENT_F_PROC_VIDEO_DECODER` on the right entities? Verify empirically by reading the topology of each media device.
+- Edge case: a media device with both encoder AND decoder entities (e.g., some SoCs have one combined video subsystem). Would the new code accept it? Yes (decoder entity present) — that's correct.
+- Edge case: a media device with no entity-type info (older kernels). Fall back to the current driver-name-only check, or refuse the device? Phase 4 picks.
+
+## Predicted iter7 cadence
+
+Small: roughly 15-30 min per phase.
+
+- Phase 0: this doc.
+- Phase 2: source-read request.c + topology UAPI. ~15 min.
+- Phase 3: baseline empirical capture. ~15 min.
+- Phase 4: plan. ~15 min.
+- Phase 5: sonnet-architect review. ~30 min.
+- Phase 6: implement, build, install. ~30 min.
+- Phase 7: verify, reboot test. ~30 min.
+- Phase 8: close. ~15 min.
+
+Total: 2-3 hours wallclock, contingent on fresnel reboot availability.
+
+## What "iteration 7 close" looks like
+
+Per `feedback_dev_process.md` Phase 8:
+- All 5 Phase 1 criteria green.
+- `phase8_iteration7_close.md` summarizing the commit + verification.
+- Memory entry update: `iter4-B1` removed from backlog; auto-detect hardening documented (or folded into an existing media-topology rule if one exists).
+- Campaign scoreboard: unchanged on pixel-correctness axis; +1 quality-of-life delivery (no more env-override per session).
+
+Predicted iter7 difficulty: **lowest of any iter since iter1**. Pure backend mechanical fix. No new bug classes anticipated.
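The Phase 4 plan shape in the findings doc above can be sketched in C. Its three steps are:
- a driver-name check,
- then `MEDIA_IOC_G_TOPOLOGY`,
- then acceptance only if some entity has `function == MEDIA_ENT_F_PROC_VIDEO_DECODER`.

This is a minimal sketch against the kernel's `<linux/media.h>` UAPI, not the actual `request.c` change. The following are hypothetical:
- the helper names `driver_allowed`, `has_decoder_entity`, `media_fd_is_decoder` and `find_decoder_media_node`;
- the 16-node scan limit.

The sketch refuses a device with no decoder entity rather than falling back to the driver-name-only check. That is one of the two options the Phase 5 edge case leaves for Phase 4 to pick.

```c
/*
 * Sketch of the Phase 4 plan shape against the <linux/media.h> UAPI.
 * All helper names are hypothetical, not the names used in src/request.c.
 */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/* Fallbacks for older UAPI headers that predate these constants. */
#ifndef MEDIA_ENT_F_PROC_VIDEO_ENCODER
#define MEDIA_ENT_F_PROC_VIDEO_ENCODER (MEDIA_ENT_F_BASE + 0x4007)
#endif
#ifndef MEDIA_ENT_F_PROC_VIDEO_DECODER
#define MEDIA_ENT_F_PROC_VIDEO_DECODER (MEDIA_ENT_F_BASE + 0x4008)
#endif

/* Driver-name allow-list as described under "Mechanism the question targets". */
static const char *const driver_allow_list[] = {
    "rkvdec", "hantro-vpu", "cedrus", "sun4i_csi",
};

/* Today's only filter: 1 if the driver name is on the allow-list. */
static int driver_allowed(const char *driver)
{
    size_t n = sizeof(driver_allow_list) / sizeof(driver_allow_list[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(driver, driver_allow_list[i]) == 0)
            return 1;
    return 0;
}

/* New filter: 1 if any entity advertises the video-decoder function. */
static int has_decoder_entity(const struct media_v2_entity *ents, uint32_t n)
{
    assert(ents != NULL || n == 0);
    for (uint32_t i = 0; i < n; i++)
        if (ents[i].function == MEDIA_ENT_F_PROC_VIDEO_DECODER)
            return 1;
    return 0;
}

/* 1 = accept, 0 = skip, -1 = ioctl or allocation failure. */
static int media_fd_is_decoder(int fd)
{
    struct media_device_info info;
    struct media_v2_topology topo;
    struct media_v2_entity *ents;
    int ret;

    assert(fd >= 0);
    memset(&info, 0, sizeof(info));
    if (ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info) < 0)
        return -1;
    info.driver[sizeof(info.driver) - 1] = '\0';
    if (!driver_allowed(info.driver))
        return 0;

    /* Pass 1: with every ptr_* field left at 0 the kernel only fills counts. */
    memset(&topo, 0, sizeof(topo));
    if (ioctl(fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0)
        return -1;
    if (topo.num_entities == 0)
        return 0;

    /* Pass 2: fetch the entity array itself. */
    ents = calloc(topo.num_entities, sizeof(*ents));
    if (ents == NULL)
        return -1;
    topo.ptr_entities = (uint64_t)(uintptr_t)ents;
    if (ioctl(fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0) {
        free(ents); /* e.g. the graph grew between the two calls */
        return -1;
    }
    ret = has_decoder_entity(ents, topo.num_entities);
    free(ents);
    return ret;
}

/* Walk /dev/media0../dev/media15 in order; returns 0 and fills path on success. */
static int find_decoder_media_node(char *path, size_t len)
{
    for (int n = 0; n < 16; n++) {
        snprintf(path, len, "/dev/media%d", n);
        int fd = open(path, O_RDWR);
        if (fd < 0)
            continue; /* absent or inaccessible node */
        int ok = media_fd_is_decoder(fd);
        close(fd);
        if (ok == 1)
            return 0;
    }
    return -1;
}
```

`MEDIA_IOC_G_TOPOLOGY` is called twice. The first call, with every `ptr_*` field left at 0, only reports the counts; the second fetches the entity array. If the media graph changes between the two calls, the second call can fail, and the sketch reports that as `-1` rather than retrying. A caller would use `find_decoder_media_node()` to get the first qualifying `/dev/media*` path in enumeration order. It still needs its own policy for the `-1` case.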