Backend-only ~30-80 LOC. Walk media-topology entities (already partially done at iter4 Commit Z); require at least one entity with function == MEDIA_ENT_F_PROC_VIDEO_DECODER. Eliminates the hantro encoder false-match that breaks vainfo + ffmpeg-vaapi on every other reboot. 5 boolean Phase 1 criteria locked. No kernel work. No pixel-correctness chasing. Quality-of-life delivery; removes per-session env-override friction. Predicted lowest-difficulty iteration since iter1. 2-3 hours wallclock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 7 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
Opens 2026-05-12 immediately after iter6 PARTIAL close (phase8_iteration6_close.md, commit 8ce00d3).
iter6 narrowed Bug 6 to kernel-side (H-E) and closed PARTIAL. iter7 pivots to a smaller, lower-risk delivery: iter4-B1 auto-detect device discrimination. Pure backend fix, no kernel work, no pixel-correctness chasing.
Locked research question (iteration 7)
"Backend auto-detect picks the correct V4L2 decode device on every fresnel boot, regardless of `/dev/media*` enumeration order. After fix: a fresh-boot `vainfo` lists all 5 codec profiles correctly without any `LIBVA_V4L2_REQUEST_*` env override."
Pass/fail (boolean)
- Fresh-boot vainfo enumerates all 5 codecs. `ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request vainfo'` (no `LIBVA_V4L2_REQUEST_VIDEO_PATH` / `_MEDIA_PATH` override) lists `VAProfileH264*` + `VAProfileHEVCMain` + `VAProfileVP9Profile0` + `VAProfileMPEG2*` + `VAProfileVP8Version0_3`.
- Auto-detect correctly routes H.264/HEVC/VP9 to rkvdec. `ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 1 -f null -` without env override engages rkvdec (verifiable via strace showing `/dev/video<rkvdec>` opened).
- Auto-detect correctly routes MPEG-2/VP8 to hantro-vpu-dec. Same shape for MPEG-2 + VP8 fixtures; strace shows hantro decoder opened (NOT hantro encoder at `/dev/video<hantro-enc>`).
- No regression on any locked iter5b-β / iter6 state. VP9 still PASS direct, MPEG-2 still PASS, H.264 keyframe-partial still unchanged, HEVC + VP8 still in their existing partial states (Bugs 5/6 not in iter7 scope).
- Multi-boot stability: at least 2 reboots of fresnel (different `/dev/media*` enumeration orders if achievable) confirm auto-detect routes correctly each time.
Clean iter7 close = all 5 criteria green. Phase 7 → Phase 4 loopback per feedback_dev_process.md if any fail.
Mechanism the question targets
Per phase4_iter5b_plan_v2.md C5 risk-register and iter5b Phase 7 retro: today's auto-detect at request.c::v4l2_request_init walks /dev/media* in enumeration order, picks the first one whose MEDIA_IOC_DEVICE_INFO.driver name matches an allow-list ({rkvdec, hantro-vpu, cedrus, sun4i_csi}). The allow-list doesn't discriminate decoder vs encoder.
On RK3399 today, hantro-vpu is the kernel driver name for BOTH:
- `/dev/media0` or `/dev/media1` (boot-dependent) → `rockchip,rk3399-vpu-enc` (the encoder card)
- `/dev/media0` or `/dev/media1` (boot-dependent) → `rockchip,rk3399-vpu-dec` (the decoder card)
The walk picks the first hantro-vpu match, which is sometimes the encoder. The encoder doesn't expose decode formats; vainfo enumerates nothing; ffmpeg-vaapi fails.
iter4 Phase 6 Commit Z established the media-topology-walk pattern (better than enumeration-order /dev/video*). iter4 Phase 7 + iter5/iter5b/iter6 still hit the issue because the topology walk reads the driver name only, not the entity types.
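To make the failure mode concrete, here is a minimal sketch of a driver-name-only check. The helper name `driver_name_allowed` and the table layout are hypothetical; only the four driver names come from the allow-list quoted above, and the real code in request.c may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <linux/media.h>

/* Allow-list as quoted in this document; the real table in request.c may differ. */
static const char *const allowed_drivers[] = {
    "rkvdec", "hantro-vpu", "cedrus", "sun4i_csi",
};

/* Hypothetical helper: the driver-name-only test the current walk relies on. */
static bool driver_name_allowed(const struct media_device_info *info)
{
    size_t n = sizeof(allowed_drivers) / sizeof(allowed_drivers[0]);

    assert(info != NULL);
    for (size_t i = 0; i < n; i++) {
        /* info->driver is a fixed-size field; bound the comparison to it. */
        if (strncmp(info->driver, allowed_drivers[i], sizeof(info->driver)) == 0)
            return true;
    }
    return false;
}
```

Both hantro media devices report the same `driver` string, so a check of this shape accepts whichever one enumerates first. Nothing in `struct media_device_info.driver` separates encoder from decoder.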
The fix shape
Walk /dev/media*, do MEDIA_IOC_DEVICE_INFO (driver name check), THEN walk media-topology entities and require at least one entity with function MEDIA_ENT_F_PROC_VIDEO_DECODER. Only accept the media device if a decoder entity is present.
This rejects the hantro encoder's media device, because it exposes no decoder entity. Predicted fix size: ~30-80 LOC in request.c.
Substrate state at iter7 open
| Property | Value |
|---|---|
| Kernel | 7.0.0-fresnel-fourier (linux-fresnel-fourier 7.0-1). Unchanged from iter5b/iter6. |
| Fork tip | 70196f8 (iter5b-β Phase 6 Commit D). Unchanged through iter6. |
| Backend installed | SHA 2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8. Unchanged. |
| Test fixtures | unchanged. |
| Bugs 4/5/6 | still open, deferred to future iterations. |
| iter6 narrowing | Bug 6 confirmed kernel-side (H-E); 4 of 5 hypotheses eliminated. |
Scope locks
In scope:
- `src/request.c::v4l2_request_init` auto-detect path.
- Media-topology entity-walk via `MEDIA_IOC_G_TOPOLOGY` (already partially used per iter4 Commit Z).
- Add `MEDIA_ENT_F_PROC_VIDEO_DECODER` entity-function check to the topology walk.
- Optional: scope the driver-name allow-list to encoder/decoder-aware variants if the kernel exposes them.
- 5-codec sweep regression-verify on the fixed backend.
Out of scope:
- Any pixel-correctness chasing (Bugs 4/5/6).
- Kernel patches.
- Performance metrics.
- Multi-decoder per-driver-data routing (the "use rkvdec for some codecs + hantro for others on the same backend instance" challenge — known as iter4-B1's "walk-and-pick-first" sub-issue).
- Front-end libva.
- AV1 / other-hardware.
Phase 2 source-read targets
- `src/request.c::v4l2_request_init`: current auto-detect implementation (iter4 Commit Z, `7f8fa93`).
- `<linux/media.h>`: `MEDIA_IOC_G_TOPOLOGY`, `MEDIA_ENT_F_*` enum.
- iter4 Phase 6 commit Z body: what the walk does today.
Phase 3 baseline
iter4-B1 is well-known: env-override required per boot. Phase 3 captures the empirical baseline:
- Fresh boot. Enumerate `/dev/media*` driver names.
- `vainfo` with auto-detect (no env override). Observe what gets picked.
- Show that on the boot where hantro-vpu encoder enumerates first, vainfo lists NO profiles.
iter6 Phase 3 already captured device-mapping artifacts inadvertently (per logs). Phase 3 of iter7 may reuse that.
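The "enumerate driver names" step of the baseline could be captured with a small probe like the one below. This is a sketch only: `probe_media_node` and `dump_media_drivers` are hypothetical names, and scanning by numeric index (`/dev/media0`, `/dev/media1`, ...) is an assumption about how the nodes are laid out on fresnel.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/* Hypothetical helper: fill *info for one media node. 0 on success, -1 on failure. */
static int probe_media_node(const char *path, struct media_device_info *info)
{
    int fd, ret;

    assert(path != NULL && info != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    memset(info, 0, sizeof(*info));
    ret = ioctl(fd, MEDIA_IOC_DEVICE_INFO, info);
    close(fd);
    return ret < 0 ? -1 : 0;
}

/* Print one line per responding node in /dev/media0 .. /dev/media<max_nodes-1>.
 * Returns the number of nodes that answered MEDIA_IOC_DEVICE_INFO. */
static int dump_media_drivers(FILE *out, int max_nodes)
{
    int found = 0;

    for (int i = 0; i < max_nodes; i++) {
        char path[32];
        struct media_device_info info;

        snprintf(path, sizeof(path), "/dev/media%d", i);
        if (probe_media_node(path, &info) < 0)
            continue; /* missing node or ioctl failure: skip it */

        fprintf(out, "%s driver=%.16s model=%.32s\n", path, info.driver, info.model);
        found++;
    }
    return found;
}
```

Running `dump_media_drivers(stdout, 8)` once per boot and diffing the output across reboots would show directly whether the enumeration order changed.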
Phase 4 plan shape (predicted)
Mechanical:
- After `MEDIA_IOC_DEVICE_INFO` matches the allow-list driver name, do `MEDIA_IOC_G_TOPOLOGY` (already happens at iter4 Commit Z).
- Walk the topology's entities array. For each entity, check `function` field.
- Accept the device only if AT LEAST ONE entity has `function == MEDIA_ENT_F_PROC_VIDEO_DECODER`.
- Else skip and continue.
LOC estimate: 30-80 LOC in request.c. One commit. Maybe a follow-up commit for any cosmetic logging.
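The steps above can be sketched as follows. This is not the request.c implementation: the function names `entities_have_decoder` and `media_fd_has_decoder` are hypothetical, and the sketch assumes the usual two-call `MEDIA_IOC_G_TOPOLOGY` pattern (first call with null array pointers to get counts, second call with `ptr_entities` set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/media.h>

/* Pure check: true if at least one entity is a video decoder. */
static bool entities_have_decoder(const struct media_v2_entity *ents, uint32_t count)
{
    assert(ents != NULL || count == 0);
    for (uint32_t i = 0; i < count; i++) {
        if (ents[i].function == MEDIA_ENT_F_PROC_VIDEO_DECODER)
            return true;
    }
    return false;
}

/* Returns 1 if the media device has a decoder entity, 0 if it has none,
 * -1 if the topology could not be read. */
static int media_fd_has_decoder(int media_fd)
{
    struct media_v2_topology topo;
    struct media_v2_entity *ents;
    int result;

    /* First call: all array pointers are 0, so only the counts are filled in. */
    memset(&topo, 0, sizeof(topo));
    if (ioctl(media_fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0)
        return -1;
    if (topo.num_entities == 0)
        return 0;

    ents = calloc(topo.num_entities, sizeof(*ents));
    if (ents == NULL)
        return -1;

    /* Second call: request the entities array only; other pointers stay 0. */
    topo.ptr_entities = (uintptr_t)ents;
    if (ioctl(media_fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0)
        result = -1;
    else
        result = entities_have_decoder(ents, topo.num_entities) ? 1 : 0;

    free(ents);
    return result;
}
```

Keeping the entity scan separate from the ioctl plumbing means the accept/reject rule can be unit-tested against hand-built entity arrays without a fresnel board.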
Phase 5 review concerns to invite
- Does the v7.0-fresnel-fourier kernel's hantro / rkvdec set `MEDIA_ENT_F_PROC_VIDEO_DECODER` on the right entities? Verify empirically by reading the topology of each media device.
- Edge case: a media device with both encoder AND decoder entities (e.g., some SoCs have one combined video subsystem). Would the new code accept it? Yes (decoder entity present), and that's correct.
- Edge case: a media device with no entity-type info (older kernels). Fall back to current driver-name-only check, or refuse the device? Phase 4 picks.
Predicted iter7 cadence
Small. ~15-30 min for each phase.
- Phase 0: this doc.
- Phase 2: source-read request.c + topology UAPI. ~15 min.
- Phase 3: baseline empirical capture. ~15 min.
- Phase 4: plan. ~15 min.
- Phase 5: sonnet-architect review. ~30 min.
- Phase 6: implement, build, install. ~30 min.
- Phase 7: verify, reboot test. ~30 min.
- Phase 8: close. ~15 min.
Total: 2-3 hours wallclock, contingent on fresnel reboot availability.
What "iteration 7 close" looks like
Per feedback_dev_process.md Phase 8:
- All 5 Phase 1 criteria green.
- `phase8_iteration7_close.md` summarizing the commit + verification.
- Memory entry update: `iter4-B1` removed from backlog; auto-detect harden documented (or fold into existing media-topology rule if it exists).
- Campaign scoreboard: unchanged on pixel-correctness axis; +1 quality-of-life delivery (no more env-override per session).
Predicted iter7 difficulty: lowest of any iter since iter1. Pure backend mechanical fix. No new bug classes anticipated.