iter7 Phase 0 lock: iter4-B1 auto-detect harden — require MEDIA_ENT_F_PROC_VIDEO_DECODER
Backend-only, ~30-80 LOC. Walk media-topology entities (already partially done at iter4 Commit Z); require at least one entity with function == MEDIA_ENT_F_PROC_VIDEO_DECODER. Eliminates the hantro encoder false-match that breaks vainfo + ffmpeg-vaapi on every other reboot.

5 boolean Phase 1 criteria locked. No kernel work. No pixel-correctness chasing. Quality-of-life delivery; removes per-session env-override friction. Predicted lowest-difficulty iteration since iter1. 2-3 hours wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,121 @@
# Iteration 7: Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-12 immediately after iter6 PARTIAL close ([`phase8_iteration6_close.md`](phase8_iteration6_close.md), commit `8ce00d3`).

iter6 narrowed Bug 6 to kernel-side (H-E) and closed PARTIAL. iter7 pivots to a smaller, lower-risk delivery: **iter4-B1 auto-detect device discrimination**. Pure backend fix, no kernel work, no pixel-correctness chasing.
## Locked research question (iteration 7)

> *"Backend auto-detect picks the correct V4L2 decode device on every fresnel boot, regardless of `/dev/media*` enumeration order. After fix: a fresh-boot `vainfo` lists all 5 codec profiles correctly without any `LIBVA_V4L2_REQUEST_*` env override."*
### Pass/fail (boolean)

1. **Fresh-boot vainfo enumerates all 5 codecs**. `ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request vainfo'` (no `LIBVA_V4L2_REQUEST_VIDEO_PATH` / `_MEDIA_PATH` override) lists `VAProfileH264*` + `VAProfileHEVCMain` + `VAProfileVP9Profile0` + `VAProfileMPEG2*` + `VAProfileVP8Version0_3`.
2. **Auto-detect correctly routes H.264/HEVC/VP9 to rkvdec**. `ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 1 -f null -` without env override engages rkvdec (verifiable via strace showing `/dev/video<rkvdec>` opened).
3. **Auto-detect correctly routes MPEG-2/VP8 to hantro-vpu-dec**. Same shape for MPEG-2 + VP8 fixtures; strace shows hantro decoder opened (NOT hantro encoder at `/dev/video<hantro-enc>`).
4. **No regression on any locked iter5b-β / iter6 state**. VP9 still PASS direct, MPEG-2 still PASS, H.264 keyframe-partial still unchanged, HEVC + VP8 still in their existing partial states (Bugs 5/6 not in iter7 scope).
5. **Multi-boot stability**: at least 2 reboots of fresnel (different `/dev/media*` enumeration orders if achievable) confirm auto-detect routes correctly each time.

Clean iter7 close = all 5 criteria green. Phase 7 → Phase 4 loopback per `feedback_dev_process.md` if any fail.
## Mechanism the question targets

Per `phase4_iter5b_plan_v2.md` C5 risk-register and iter5b Phase 7 retro: today's auto-detect at `request.c::v4l2_request_init` walks `/dev/media*` in enumeration order and picks the first device whose `MEDIA_IOC_DEVICE_INFO.driver` name matches an allow-list (`{rkvdec, hantro-vpu, cedrus, sun4i_csi}`). The allow-list doesn't discriminate decoder vs encoder.

On RK3399 today, `hantro-vpu` is the kernel driver name for BOTH:

- `/dev/media0` or `/dev/media1` (boot-dependent) → `rockchip,rk3399-vpu-enc` (the encoder card)
- `/dev/media0` or `/dev/media1` (boot-dependent) → `rockchip,rk3399-vpu-dec` (the decoder card)

The walk picks the first hantro-vpu match, which is sometimes the encoder. The encoder doesn't expose decode formats, so vainfo enumerates no profiles and ffmpeg-vaapi fails.

iter4 Phase 6 Commit Z established the media-topology-walk pattern (an improvement over walking `/dev/video*` in enumeration order). iter4 Phase 7 + iter5/iter5b/iter6 still hit the issue because the topology walk reads the driver name only, not the entity functions.
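To make the failure mode concrete, here is a minimal simulation of a driver-name-only walk run against two enumeration orders. It is a sketch, not the code in `request.c`: `struct media_node`, `driver_is_allowed`, and `pick_first_by_driver` are hypothetical names, and the `is_decoder` flag stands in for information the current walk never reads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one probed /dev/media* node. */
struct media_node {
    const char *driver;  /* what MEDIA_IOC_DEVICE_INFO.driver would report */
    bool is_decoder;     /* ground truth; the name-only walk never consults it */
};

/* Allow-list as described in the text above. */
static bool driver_is_allowed(const char *driver)
{
    static const char *const allow[] = {
        "rkvdec", "hantro-vpu", "cedrus", "sun4i_csi",
    };

    for (size_t i = 0; i < sizeof(allow) / sizeof(allow[0]); i++)
        if (strcmp(driver, allow[i]) == 0)
            return true;
    return false;
}

/* Driver-name-only walk: index of the first allowed node, or -1 if none. */
static int pick_first_by_driver(const struct media_node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (driver_is_allowed(nodes[i].driver))
            return (int)i;
    return -1;
}
```

With the encoder enumerated first, `pick_first_by_driver` returns the encoder's index; with the order swapped it returns the decoder's. Both nodes report `hantro-vpu`, so nothing in the name check can tell them apart.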
### The fix shape

Walk `/dev/media*`, do `MEDIA_IOC_DEVICE_INFO` (driver name check), THEN walk media-topology entities and require at least one entity with function `MEDIA_ENT_F_PROC_VIDEO_DECODER`. Only accept the media device if a decoder entity is present.

This excludes the hantro encoder device from auto-detect. Predicted fix size: ~30-80 LOC in `request.c`.
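A minimal sketch of the entity check, assuming the usual two-call `MEDIA_IOC_G_TOPOLOGY` pattern (a first call to learn the counts, a second call with an entity buffer). The function names `entities_have_decoder` and `media_fd_has_decoder` are illustrative and are not taken from `request.c`; the real change may be structured differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/media.h>

/* True if any entity in the array advertises the video-decoder function. */
static bool entities_have_decoder(const struct media_v2_entity *ents, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        if (ents[i].function == MEDIA_ENT_F_PROC_VIDEO_DECODER)
            return true;
    return false;
}

/*
 * Returns 1 if the media device has a decoder entity, 0 if it has none,
 * and -1 (errno set) if the topology could not be read.
 */
static int media_fd_has_decoder(int media_fd)
{
    struct media_v2_topology topo;
    struct media_v2_entity *ents;
    int ret;

    /* First call with all array pointers zero: only the counts are filled. */
    memset(&topo, 0, sizeof(topo));
    if (ioctl(media_fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0)
        return -1;
    if (topo.num_entities == 0)
        return 0;

    ents = calloc(topo.num_entities, sizeof(*ents));
    if (!ents)
        return -1;

    /* Second call: request the entity array only. */
    topo.ptr_entities = (uint64_t)(uintptr_t)ents;
    if (ioctl(media_fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0) {
        int saved = errno;

        free(ents);
        errno = saved;
        return -1;
    }

    ret = entities_have_decoder(ents, topo.num_entities) ? 1 : 0;
    free(ents);
    return ret;
}
```

Keeping the predicate separate from the ioctl wrapper lets the decoder test be exercised against synthetic entity arrays without hardware.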
## Substrate state at iter7 open

| Property | Value |
|---|---|
| Kernel | `7.0.0-fresnel-fourier` (linux-fresnel-fourier 7.0-1). Unchanged from iter5b/iter6. |
| Fork tip | `70196f8` (iter5b-β Phase 6 Commit D). Unchanged through iter6. |
| Backend installed | SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8`. Unchanged. |
| Test fixtures | unchanged. |
| Bugs 4/5/6 | still open, deferred to future iterations. |
| iter6 narrowing | Bug 6 confirmed kernel-side (H-E); 4 of 5 hypotheses eliminated. |
## Scope locks

**In scope**:

- `src/request.c::v4l2_request_init` auto-detect path.
- Media-topology entity-walk via `MEDIA_IOC_G_TOPOLOGY` (already partially used per iter4 Commit Z).
- Add `MEDIA_ENT_F_PROC_VIDEO_DECODER` entity-function check to the topology walk.
- Optional: scope the driver-name allow-list to encoder/decoder-aware variants if the kernel exposes them.
- 5-codec sweep regression-verify on the fixed backend.

**Out of scope**:

- Any pixel-correctness chasing (Bugs 4/5/6).
- Kernel patches.
- Performance metrics.
- Multi-decoder per-driver-data routing (the "use rkvdec for some codecs + hantro for others on the same backend instance" challenge, known as iter4-B1's "walk-and-pick-first" sub-issue).
- Front-end libva.
- AV1 / other-hardware.
## Phase 2 source-read targets

- `src/request.c::v4l2_request_init`: current auto-detect implementation (iter4 Commit Z `7f8fa93`).
- `<linux/media.h>`: `MEDIA_IOC_G_TOPOLOGY` and the `MEDIA_ENT_F_*` entity-function constants.
- iter4 Phase 6 Commit Z body: what the walk does today.
## Phase 3 baseline

iter4-B1 is well-known: an env override is required on every boot. Phase 3 captures the empirical baseline:

1. Fresh boot. Enumerate `/dev/media*` driver names.
2. `vainfo` with auto-detect (no env override). Observe what gets picked.
3. Show that on the boot where hantro-vpu encoder enumerates first, vainfo lists NO profiles.

iter6 Phase 3 already captured device-mapping artifacts inadvertently (per logs). Phase 3 of iter7 may reuse them.
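Step 1 of the baseline (enumerating `/dev/media*` driver names) can be sketched with `MEDIA_IOC_DEVICE_INFO` alone. The helper names `media_query_info` and `media_list_drivers` are hypothetical, and probing a fixed range of node numbers is an assumption made for brevity. The `model` field is printed next to `driver` in case it helps tell the two hantro cards apart; that is not established here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/* Fills *info for the media node at path; returns 0 on success, -1 on error. */
static int media_query_info(const char *path, struct media_device_info *info)
{
    int fd = open(path, O_RDWR);
    int ret;

    if (fd < 0)
        return -1;
    memset(info, 0, sizeof(*info));
    ret = ioctl(fd, MEDIA_IOC_DEVICE_INFO, info);
    close(fd);
    return ret < 0 ? -1 : 0;
}

/*
 * Prints one line per readable /dev/media0 .. /dev/media<max_nodes - 1>
 * and returns how many nodes answered the ioctl.
 */
static int media_list_drivers(int max_nodes)
{
    int found = 0;

    for (int i = 0; i < max_nodes; i++) {
        char path[32];
        struct media_device_info info;

        snprintf(path, sizeof(path), "/dev/media%d", i);
        if (media_query_info(path, &info) < 0)
            continue; /* missing node or permission problem: skip it */

        /* Precision limits guard against non-terminated fixed-size fields. */
        printf("%s driver=%.16s model=%.32s\n", path, info.driver, info.model);
        found++;
    }
    return found;
}
```

Running `media_list_drivers(8)` on two different boots and diffing the output would show whether the enumeration order actually changed between them.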
## Phase 4 plan shape (predicted)

Mechanical:

1. After `MEDIA_IOC_DEVICE_INFO` matches the allow-list driver name, do `MEDIA_IOC_G_TOPOLOGY` (already happens at iter4 Commit Z).
2. Walk the topology's entities array. For each entity, check the `function` field.
3. Accept the device only if AT LEAST ONE entity has `function == MEDIA_ENT_F_PROC_VIDEO_DECODER`.
4. Otherwise skip the device and continue the walk.

LOC estimate: 30-80 LOC in `request.c`. One commit. Maybe a follow-up commit for any cosmetic logging.
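The four steps can be sketched as a selection loop over already-probed nodes, which makes the order-independence easy to check. This is an illustration under assumptions: `struct probed_media`, `has_decoder_entity`, and `pick_decoder_media` are hypothetical names, and the record stands in for whatever `request.c` keeps after the two ioctls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/media.h>

#define MAX_FUNCS 8

/* Hypothetical record of what steps 1-2 learned about one /dev/media* node. */
struct probed_media {
    bool driver_allowed;            /* step 1: driver name is on the allow-list */
    uint32_t functions[MAX_FUNCS];  /* step 2: media_v2_entity.function values */
    size_t num_functions;
};

/* Step 3: at least one entity must be a video decoder. */
static bool has_decoder_entity(const struct probed_media *m)
{
    for (size_t i = 0; i < m->num_functions; i++)
        if (m->functions[i] == MEDIA_ENT_F_PROC_VIDEO_DECODER)
            return true;
    return false;
}

/* Walks nodes in enumeration order; returns the accepted index or -1. */
static int pick_decoder_media(const struct probed_media *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].driver_allowed)
            continue;
        if (!has_decoder_entity(&devs[i]))
            continue; /* step 4: skip and keep walking */
        return (int)i;
    }
    return -1;
}
```

Whichever position the encoder-only node occupies, the loop skips it and returns the node that carries a decoder entity, which is the property criterion 5 checks across reboots.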
## Phase 5 review concerns to invite

- Do the hantro / rkvdec drivers in the `7.0.0-fresnel-fourier` kernel set `MEDIA_ENT_F_PROC_VIDEO_DECODER` on the right entities? Verify empirically by reading the topology of each media device.
- Edge case: a media device with both encoder AND decoder entities (e.g., some SoCs have one combined video subsystem). Would the new code accept it? Yes, because a decoder entity is present, and that is the desired behavior.
- Edge case: a media device with no entity-function info (older kernels). Fall back to the current driver-name-only check, or refuse the device? Phase 4 picks.
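The two edge cases reduce to a small decision function. A sketch with hypothetical enum and function names; how a probe ends up as `TOPO_UNKNOWN` (for example, a failing topology ioctl) is an assumption, and which `unknown_policy` to use is exactly the choice left to Phase 4.

```c
#include <stdbool.h>

/* Outcome of the entity walk for one media device. */
enum topo_probe {
    TOPO_HAS_DECODER,  /* at least one decoder entity (encoder may also exist) */
    TOPO_NO_DECODER,   /* topology readable, no decoder entity */
    TOPO_UNKNOWN,      /* entity-function info unavailable */
};

/* The open policy question for devices without entity-function info. */
enum unknown_policy {
    UNKNOWN_FALLBACK_TO_DRIVER_NAME,  /* behave like the current walk */
    UNKNOWN_REFUSE,                   /* never accept without proof */
};

static bool accept_media_device(bool driver_allowed, enum topo_probe probe,
                                enum unknown_policy policy)
{
    if (!driver_allowed)
        return false;

    switch (probe) {
    case TOPO_HAS_DECODER:
        return true;   /* covers combined encoder + decoder devices */
    case TOPO_NO_DECODER:
        return false;  /* the encoder-only case this iteration targets */
    case TOPO_UNKNOWN:
        return policy == UNKNOWN_FALLBACK_TO_DRIVER_NAME;
    }
    return false;
}
```

Falling back keeps older kernels working at the cost of reintroducing the name-only ambiguity on them; refusing is stricter but would force the env override there.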
## Predicted iter7 cadence

Small: roughly 15-30 min per phase.

- Phase 0: this doc.
- Phase 2: source-read `request.c` + topology UAPI. ~15 min.
- Phase 3: baseline empirical capture. ~15 min.
- Phase 4: plan. ~15 min.
- Phase 5: sonnet-architect review. ~30 min.
- Phase 6: implement, build, install. ~30 min.
- Phase 7: verify, reboot test. ~30 min.
- Phase 8: close. ~15 min.

Total: 2-3 hours wallclock, contingent on fresnel reboot availability.
## What "iteration 7 close" looks like

Per `feedback_dev_process.md` Phase 8:

- All 5 Phase 1 criteria green.
- `phase8_iteration7_close.md` summarizing the commit + verification.
- Memory entry update: `iter4-B1` removed from backlog; auto-detect harden documented (or fold into existing media-topology rule if it exists).
- Campaign scoreboard: unchanged on pixel-correctness axis; +1 quality-of-life delivery (no more env-override per session).

Predicted iter7 difficulty: **lowest of any iter since iter1**. Pure backend mechanical fix. No new bug classes anticipated.