fresnel-fourier/phase0_findings_iter7.md
marfrit fc44a1e63c iter7 Phase 0 lock: iter4-B1 auto-detect harden — require MEDIA_ENT_F_PROC_VIDEO_DECODER
Backend-only ~30-80 LOC. Walk media-topology entities (already partially
done at iter4 Commit Z); require at least one entity with function ==
MEDIA_ENT_F_PROC_VIDEO_DECODER. Eliminates the hantro encoder false-match
that breaks vainfo + ffmpeg-vaapi on every other reboot.

5 boolean Phase 1 criteria locked. No kernel work. No pixel-correctness
chasing. Quality-of-life delivery; removes per-session env-override
friction.

Predicted lowest-difficulty iteration since iter1. 2-3 hours wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:25:18 +00:00


Iteration 7 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-12 immediately after iter6 PARTIAL close (phase8_iteration6_close.md, commit 8ce00d3).

iter6 narrowed Bug 6 to kernel-side (H-E) and closed PARTIAL. iter7 pivots to a smaller, lower-risk delivery: iter4-B1 auto-detect device discrimination. Pure backend fix, no kernel work, no pixel-correctness chasing.

Locked research question (iteration 7)

"Backend auto-detect picks the correct V4L2 decode device on every fresnel boot, regardless of /dev/media* enumeration order. After fix: a fresh-boot vainfo lists all 5 codec profiles correctly without any LIBVA_V4L2_REQUEST_* env override."

Pass/fail (boolean)

  1. Fresh-boot vainfo enumerates all 5 codecs. ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request vainfo' (no LIBVA_V4L2_REQUEST_VIDEO_PATH / _MEDIA_PATH override) lists VAProfileH264* + VAProfileHEVCMain + VAProfileVP9Profile0 + VAProfileMPEG2* + VAProfileVP8Version0_3.
  2. Auto-detect correctly routes H.264/HEVC/VP9 to rkvdec. ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 1 -f null - without env override engages rkvdec (verifiable via strace showing /dev/video<rkvdec> opened).
  3. Auto-detect correctly routes MPEG-2/VP8 to hantro-vpu-dec. Same shape for MPEG-2 + VP8 fixtures; strace shows hantro decoder opened (NOT hantro encoder at /dev/video<hantro-enc>).
  4. No regression on any locked iter5b-β / iter6 state. VP9 still PASS direct, MPEG-2 still PASS, H.264 keyframe-partial still unchanged, HEVC + VP8 still in their existing partial states (Bugs 5/6 not in iter7 scope).
  5. Multi-boot stability: at least 2 reboots of fresnel (different /dev/media* enumeration orders if achievable) confirm auto-detect routes correctly each time.

Clean iter7 close = all 5 criteria green. Phase 7 → Phase 4 loopback per feedback_dev_process.md if any fail.

Mechanism the question targets

Per phase4_iter5b_plan_v2.md C5 risk-register and iter5b Phase 7 retro: today's auto-detect at request.c::v4l2_request_init walks /dev/media* in enumeration order and picks the first device whose MEDIA_IOC_DEVICE_INFO driver name matches an allow-list ({rkvdec, hantro-vpu, cedrus, sun4i_csi}). The allow-list matches on driver name only, so it cannot tell a decoder from an encoder when both share one driver name.

On RK3399 today, hantro-vpu is the kernel driver name for BOTH:

  • /dev/media0 or /dev/media1 (boot-dependent) → rockchip,rk3399-vpu-enc (the encoder card)
  • /dev/media0 or /dev/media1 (boot-dependent) → rockchip,rk3399-vpu-dec (the decoder card)

The walk picks the first hantro-vpu match, which is sometimes the encoder. The encoder doesn't expose decode formats; vainfo enumerates nothing; ffmpeg-vaapi fails.
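The enumeration-order dependence can be reproduced in a small model. This is a sketch only, assuming each /dev/mediaN node can be reduced to its driver name plus a flag recording whether it is really the decoder; `struct media_node`, `driver_allowed`, and `pick_first_by_driver` are hypothetical names for illustration, not code from request.c.

```c
#include <stddef.h>
#include <string.h>

/* Allow-list as described above; matching is by driver name only. */
static const char *const kAllowList[] = {
    "rkvdec", "hantro-vpu", "cedrus", "sun4i_csi"
};

/* Hypothetical stand-in for one /dev/mediaN node. */
struct media_node {
    const char *driver; /* what MEDIA_IOC_DEVICE_INFO would report */
    int is_decoder;     /* ground truth the name-only walk never looks at */
};

static int driver_allowed(const char *driver)
{
    for (size_t i = 0; i < sizeof(kAllowList) / sizeof(kAllowList[0]); i++)
        if (strcmp(driver, kAllowList[i]) == 0)
            return 1;
    return 0;
}

/* Returns the index of the first allow-listed node, or -1 if none match. */
static int pick_first_by_driver(const struct media_node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (driver_allowed(nodes[i].driver))
            return (int)i;
    return -1;
}
```

With two nodes that both report hantro-vpu, the function always returns index 0, so whether the result is the decoder depends entirely on which node the kernel enumerated first.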

iter4 Phase 6 Commit Z established the media-topology-walk pattern (better than enumeration-order /dev/video*). iter4 Phase 7 + iter5/iter5b/iter6 still hit the issue because the topology walk reads the driver name only, not the entity functions.

The fix shape

Walk /dev/media*, do MEDIA_IOC_DEVICE_INFO (driver name check), THEN walk media-topology entities and require at least one entity with function MEDIA_ENT_F_PROC_VIDEO_DECODER. Only accept the media device if a decoder entity is present.

This rejects the hantro encoder device, because its topology contains no decoder entity. Predicted fix size: ~30-80 LOC in request.c, consistent with the Phase 4 estimate.
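The acceptance test itself reduces to a scan over the entity array. A minimal sketch, assuming the entities have already been fetched into an array of `struct media_v2_entity` from `<linux/media.h>`; the function name `topology_has_decoder` is hypothetical and does not come from request.c.

```c
#include <stddef.h>
#include <linux/media.h>

/*
 * Returns 1 if at least one entity reports the video-decoder function,
 * 0 otherwise (including an empty array).
 */
static int topology_has_decoder(const struct media_v2_entity *ents, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ents[i].function == MEDIA_ENT_F_PROC_VIDEO_DECODER)
            return 1;
    return 0;
}
```

An encoder-only device would be expected to report MEDIA_ENT_F_PROC_VIDEO_ENCODER on its processing entity instead, so the scan returns 0 and the device is skipped. Whether the fresnel kernel actually sets these functions as expected is the open question listed under the Phase 5 review concerns.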

Substrate state at iter7 open

| Property | Value |
| --- | --- |
| Kernel | 7.0.0-fresnel-fourier (linux-fresnel-fourier 7.0-1). Unchanged from iter5b/iter6. |
| Fork tip | 70196f8 (iter5b-β Phase 6 Commit D). Unchanged through iter6. |
| Backend installed SHA | 2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8. Unchanged. |
| Test fixtures | Unchanged. |
| Bugs 4/5/6 | Still open, deferred to future iterations. |
| iter6 narrowing | Bug 6 confirmed kernel-side (H-E); 4 of 5 hypotheses eliminated. |

Scope locks

In scope:

  • src/request.c::v4l2_request_init auto-detect path.
  • Media-topology entity-walk via MEDIA_IOC_G_TOPOLOGY (already partially used per iter4 Commit Z).
  • Add MEDIA_ENT_F_PROC_VIDEO_DECODER entity-function check to the topology walk.
  • Optional: scope the driver-name allow-list to encoder/decoder-aware variants if the kernel exposes them.
  • 5-codec sweep regression-verify on the fixed backend.

Out of scope:

  • Any pixel-correctness chasing (Bugs 4/5/6).
  • Kernel patches.
  • Performance metrics.
  • Multi-decoder per-driver-data routing (the "use rkvdec for some codecs + hantro for others on the same backend instance" challenge — known as iter4-B1's "walk-and-pick-first" sub-issue).
  • Front-end libva.
  • AV1 / other-hardware.

Phase 2 source-read targets

  • src/request.c::v4l2_request_init — current auto-detect implementation (iter4 Commit Z 7f8fa93).
  • <linux/media.h>: MEDIA_IOC_G_TOPOLOGY and the MEDIA_ENT_F_* entity-function constants.
  • iter4 Phase 6 commit Z body — what the walk does today.

Phase 3 baseline

iter4-B1 is well-known: env-override required per boot. Phase 3 captures the empirical baseline:

  1. Fresh boot. Enumerate /dev/media* driver names.
  2. vainfo with auto-detect (no env override). Observe what gets picked.
  3. Show that on the boot where hantro-vpu encoder enumerates first, vainfo lists NO profiles.
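Step 1 of the baseline could be captured with a small helper along these lines. This is a sketch under assumptions: `media_driver_name` and `dump_media_drivers` are hypothetical names, and the probed range of 16 node numbers is an arbitrary illustrative bound rather than anything the backend uses.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/*
 * Copies the driver name of the media device at `path` into `out`.
 * Returns 0 on success, -1 if the node cannot be opened or queried.
 */
static int media_driver_name(const char *path, char *out, size_t out_len)
{
    struct media_device_info info;
    int fd, rc;

    if (out_len == 0)
        return -1;
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    memset(&info, 0, sizeof(info));
    rc = ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info);
    close(fd);
    if (rc < 0)
        return -1;

    /* info.driver is a 16-byte field; bound the copy accordingly. */
    snprintf(out, out_len, "%.16s", info.driver);
    return 0;
}

/* Prints one line per media node that answers MEDIA_IOC_DEVICE_INFO. */
static void dump_media_drivers(FILE *stream)
{
    char path[32], driver[32];

    for (int i = 0; i < 16; i++) { /* illustrative bound */
        snprintf(path, sizeof(path), "/dev/media%d", i);
        if (media_driver_name(path, driver, sizeof(driver)) == 0)
            fprintf(stream, "%s driver=%s\n", path, driver);
    }
}
```

Calling `dump_media_drivers(stdout)` once per boot, before running vainfo, would record the enumeration order that the auto-detect walk saw on that boot.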

iter6 Phase 3 already captured device-mapping artifacts inadvertently (per logs). Phase 3 of iter7 may reuse that.

Phase 4 plan shape (predicted)

Mechanical:

  1. After MEDIA_IOC_DEVICE_INFO matches the allow-list driver name, do MEDIA_IOC_G_TOPOLOGY (already happens at iter4 Commit Z).
  2. Walk the topology's entities array. For each entity, check function field.
  3. Accept the device only if AT LEAST ONE entity has function == MEDIA_ENT_F_PROC_VIDEO_DECODER.
  4. Else skip and continue.

LOC estimate: 30-80 LOC in request.c. One commit. Maybe a follow-up commit for any cosmetic logging.
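The four steps above can be sketched as follows, assuming the driver-name check has already passed for the device. MEDIA_IOC_G_TOPOLOGY is conventionally called twice: once with all array pointers zero to learn the counts, then again with a buffer for the arrays of interest. The function names `media_fd_has_decoder` and `media_path_has_decoder` are hypothetical, and this is not the code from iter4 Commit Z.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/*
 * Returns 1 if the media device behind `fd` exposes a decoder entity,
 * 0 if it does not, -1 if the topology could not be read.
 */
static int media_fd_has_decoder(int fd)
{
    struct media_v2_topology topo;
    struct media_v2_entity *ents;
    int found = 0;

    /* First call: pointers are zero, so the kernel only reports counts. */
    memset(&topo, 0, sizeof(topo));
    if (ioctl(fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0)
        return -1;
    if (topo.num_entities == 0)
        return 0;

    ents = calloc(topo.num_entities, sizeof(*ents));
    if (!ents)
        return -1;

    /* Second call: request the entity array only. */
    topo.ptr_entities = (uintptr_t)ents;
    if (ioctl(fd, MEDIA_IOC_G_TOPOLOGY, &topo) < 0) {
        free(ents);
        return -1;
    }

    for (uint32_t i = 0; i < topo.num_entities; i++) {
        if (ents[i].function == MEDIA_ENT_F_PROC_VIDEO_DECODER) {
            found = 1;
            break;
        }
    }
    free(ents);
    return found;
}

/* Convenience wrapper: open the node, probe it, close it. */
static int media_path_has_decoder(const char *path)
{
    int fd = open(path, O_RDWR);
    int rc;

    if (fd < 0)
        return -1;
    rc = media_fd_has_decoder(fd);
    close(fd);
    return rc;
}
```

The three-way return value keeps "no decoder" distinct from "topology unreadable", which leaves the older-kernel fallback decision to the caller rather than baking it into the probe.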

Phase 5 review concerns to invite

  • Does the v7.0-fresnel-fourier kernel's hantro / rkvdec set MEDIA_ENT_F_PROC_VIDEO_DECODER on the right entities? Verify empirically by reading the topology of each media device.
  • Edge case: a media device with both encoder AND decoder entities (e.g., some SoCs have one combined video subsystem). Would the new code accept it? Yes (decoder entity present) — that's correct.
  • Edge case: a media device with no entity-type info (older kernels). Fall back to current driver-name-only check, or refuse the device? Phase 4 picks.
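The two edge cases can be made explicit as a small decision function. This is only a sketch of the choice Phase 4 has to make, not a recommendation; `enum probe_result`, `enum fallback_policy`, and `accept_device` are hypothetical names introduced for illustration.

```c
/* Outcome of the topology probe for one media device. */
enum probe_result {
    PROBE_NO_TOPOLOGY = -1, /* entity info unavailable (e.g. older kernel) */
    PROBE_NO_DECODER = 0,   /* topology read, no decoder entity found */
    PROBE_HAS_DECODER = 1   /* at least one decoder entity found */
};

/* What to do when the topology cannot be read. */
enum fallback_policy {
    FALLBACK_REFUSE,          /* skip the device */
    FALLBACK_DRIVER_NAME_ONLY /* keep today's driver-name-only behaviour */
};

/* Returns 1 to accept the device, 0 to skip it and continue the walk. */
static int accept_device(int driver_allowed, enum probe_result probe,
                         enum fallback_policy policy)
{
    if (!driver_allowed)
        return 0;

    switch (probe) {
    case PROBE_HAS_DECODER:
        /* Also covers a combined encoder + decoder device. */
        return 1;
    case PROBE_NO_DECODER:
        return 0;
    case PROBE_NO_TOPOLOGY:
        return policy == FALLBACK_DRIVER_NAME_ONLY;
    }
    return 0;
}
```

Making the fallback a parameter keeps both options testable until the plan picks one.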

Predicted iter7 cadence

Small: roughly 15-30 min per phase.

  • Phase 0: this doc.
  • Phase 2: source-read request.c + topology UAPI. ~15 min.
  • Phase 3: baseline empirical capture. ~15 min.
  • Phase 4: plan. ~15 min.
  • Phase 5: sonnet-architect review. ~30 min.
  • Phase 6: implement, build, install. ~30 min.
  • Phase 7: verify, reboot test. ~30 min.
  • Phase 8: close. ~15 min.

Total: 2-3 hours wallclock, contingent on fresnel reboot availability.

What "iteration 7 close" looks like

Per feedback_dev_process.md Phase 8:

  • All 5 Phase 1 criteria green.
  • phase8_iteration7_close.md summarizing the commit + verification.
  • Memory entry update: iter4-B1 removed from backlog; auto-detect harden documented (or fold into existing media-topology rule if it exists).
  • Campaign scoreboard: unchanged on pixel-correctness axis; +1 quality-of-life delivery (no more env-override per session).

Predicted iter7 difficulty: lowest of any iter since iter1. Pure backend mechanical fix. No new bug classes anticipated.