User pick. 6 boolean criteria locked: VP8 libva==kdirect; no regression on VP9/MPEG-2/H.264-keyframe/HEVC; control-payload anchors hold. Scope: src/vp8.c, src/picture.c VP8 dispatch + buffer cases, src/surface.c surface_bind_slot, cap_pool slot lifecycle. No kernel work. Backend-side fix expected (decode runs through kernel cleanly; output diverges in slot rotation or partial fill). Predicted small: 5-50 LOC once root-caused. Phase 2 + Phase 3 likely take more wallclock than Phase 6 implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 KiB
Iteration 6 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
Opens 2026-05-12 immediately after iter5b-β close (phase8_iteration5b_close.md, commit 9a14cc2). Per feedback_dev_process.md Phase 0, this document captures iter6's substrate state, the iter5b-β-surfaced bug inventory, and candidate research questions for Phase 1 lock.
iter5b-β was the first iteration to break the "codec N + 1" pattern. iter6 inherits an even messier menu: 3 named bugs from iter5b, the iter4-B1+B4 carry-overs, and the option-Φ candidates from the original iter5 Phase 0 doc that weren't picked.
Substrate state (verified 2026-05-12 at iter5b-β close)
| Property | Value | Notes |
|---|---|---|
| Kernel | 7.0.0-fresnel-fourier |
linux-fresnel-fourier 7.0-1 kernel-agent product; unchanged through iter5b. |
| Boot device numbering (today) | rkvdec /dev/video1+/dev/media0, hantro-vpu-dec /dev/video3+/dev/media1 |
Different from yesterday; iter4-B1 still open. |
| Fork tip | 70196f8 (β + Commit D) |
On noether + fresnel + gitea. |
| Backend installed | /usr/lib/dri/v4l2_request_drv_video.so SHA 2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8 |
β architecture; OUTPUT lifecycle owned by CreateContext. |
| Codec scoreboard | 5/5 with 2 direct (VP9, MPEG-2) + 3 mixed (H.264 keyframe-partial Bug 4, VP8 partial Bug 6, HEVC transitive-only direct-FAIL Bug 5) | iter5b-β closed VP9 directly; others remain mixed. |
Bug inventory after iter5b-β
Active bugs with explicit reproduction signatures
Bug 4 — H.264 inter-frame race-loss (carried from iter4 Phase 7)
- Signature: H.264 keyframe decodes correctly through libva; inter frames return all-zero pages.
- Reproduce:
ffmpeg -hwaccel vaapi -i bbb_1080p30_h264.mp4 -frames:v 3 -vf hwdownload,format=nv12,format=yuv420p -f rawvideo out.yuv. Hash71ac099b…. 99.99% zero; frame 1 first 16 bytes =81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81(real chroma); frames 2, 3 fully zero. - Hypothesis surface: slot rotation, partial DPB, or some inter-specific submission gap.
Bug 5 — HEVC libva DQBUF returns FLAG_ERROR
- Signature: every HEVC libva DQBUF (both OUTPUT and CAPTURE) sets
V4L2_BUF_FLAG_ERROR. Kernel rkvdec rejects the decode. CAPTURE stays at cap_pool init pattern (all-zero). - Reproduce: same shape as Bug 4, with
bbb_720p10s_hevc.mp4. Hash06b2c5a0…= all-zero. - Pre-existing: Phase 3 baseline anchor trace (pre-iter5b) also showed FLAG_ERROR on every HEVC DQBUF. iter2's "PASS via transitive proof" verified backend's control PAYLOAD matched kdirect's payload — but the kernel rejected the libva submission regardless. Some V4L2 protocol contract aspect differs between libva backend and ffmpeg-v4l2request that the transitive proof didn't capture (request_fd binding order, sequence number, ioctl sequencing, extra control needed, etc.).
- Difficulty estimate: medium-to-high. Need to diff the actual V4L2 ioctl streams between libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC and find what's different on the wire.
Bug 6 — VP8 libva produces non-zero but non-matching output
- Signature: VP8 libva runs decode (no DQBUF ERROR), produces real-looking content (256 unique bytes, frame 1 first 16 =
93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80), but output bytes diverge from kdirect's136ce5cb…. Pre-iter5b VP8 was all-zero (the format-mismatch issue); β unblocked decode; what's left is a different bug. - Reproduce: VP8 fixture, same harness. Hash
bcc57ed5…(libva) vs136ce5cb…(kdirect == sw). 74.8% zero in libva output suggests partial fill. - Hypothesis surface: cap_pool slot rotation off-by-one, partial buffer fill per frame, or per-frame DPB sync issue.
- Difficulty estimate: medium.
Backend-class backlog items (carried forward, none touched in iter5b-β)
- iter4-B1 — auto-detect picks wrong device on per-boot enumeration shuffle. Cost: 1-2 min per session re-mapping. Backend-only fix; medium scope (proper media-topology decoder/encoder discrimination via
MEDIA_ENT_F_PROC_VIDEO_DECODER). - iter4-B2 — mpv-vaapi
Could not create devicefor VP9. Consumer-side. - iter4-Q6 — per-segment quant-scale lossy mapping (VP9).
- iter4-COLOR_RANGE — VAAPI exposes no color_range field.
- B3 — picture.c BeginPicture profile-aware reset.
- B4 — context.c log suppression for unsupported codec controls (the EINVAL B4 cosmetic noise on hantro init probes).
- B5 — mpeg2 vbv_buffer_size polish.
- B6 — h265 SPS bitstream-parse fidelity gap.
- L3 — vaDeriveImage cache-stale.
Substrate-class items (iter5 originally targeted, now contextualized)
- vb2_dma_resv RFC v2 patches: tracked at
~/src/linux-rfc/. Verified by iter5 Phase 5 review to NOT be the right fix for the libva backend's MMAP+EXPBUF readback path (different memory model than the patches address). Still useful for DMABUF-import compositor paths (KWin, Mesa). Not in iter6's libva-decode-correctness critical path. - panfrost IOMMU_CACHE: separate sibling work-stream; not in iter6 scope.
Candidate research questions for iter6
Candidate F — Bug 5: HEVC libva decode kernel-rejection
"Identify and fix the V4L2 protocol contract difference that causes kernel rkvdec to reject HEVC decode via the libva backend while accepting it via ffmpeg-v4l2request. After fix: libva HEVC == kdirect HEVC == SW (byte-identical YUV)."
Approach sketch: strace both libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC on the same fixture; diff the ioctl streams; find the divergence (likely in S_EXT_CTRLS sequencing, MEDIA_REQUEST_IOC_QUEUE ordering, or buffer-binding ordering); patch libva backend's HEVC path or shared infrastructure.
Pros: HEVC has the strongest "we claimed PASS but the proof was partial" stigma. Closing it directly upgrades iter2's transitive PASS to direct PASS, matching VP9's iter5b-β upgrade.
Cons: HEVC backend code is substantial (h265.c was rewritten at iter2). Diff debugging via strace + source-read may take multiple sessions. Risk: the difference may be a long-standing assumption requiring substantial refactor.
Candidate G — Bug 6: VP8 libva partial output
"Identify and fix the cause of VP8 libva producing 74.8%-zero output (rather than the byte-identical kdirect output). After fix: libva VP8 == kdirect VP8 == SW."
Approach sketch: characterize the zero regions in libva_vp8 (which frames? which rows/columns? which planes?); compare with kdirect_vp8 at byte level; trace cap_pool slot binding and per-frame DPB submission. Likely a slot-rotation or partial-fill bug.
Pros: VP8 is simpler than HEVC. The decode is already running (DQBUF success). Probably a small fix once root-caused.
Cons: Diagnostic surface is fuzzier than HEVC's (kernel succeeds, output diverges).
Candidate H — Bug 4: H.264 inter-frame race-loss
"Fix H.264 inter-frame decode through libva so all frames (not just keyframes) produce correct pixels."
Approach sketch: similar to Bug 6. H.264 has the strongest keyframe-vs-inter discrimination — decode happens for keyframes (consistent real content in frame 1) but inter frames produce zero. Likely DPB-related: reference frame indices, request_fd lifecycle, or per-frame ordering.
Pros: H.264 is the primary codec for most consumers. Fixing inter unlocks real video playback through libva. iter2-iter5b touched H.264 in passing; deeper investigation has been deferred since iter4.
Cons: H.264 DPB management is intricate (B-slice L1 ref lists, fresh request_fd per frame, etc. — iter4 already touched these). The remaining bug may be subtle.
Candidate I — Re-anchor iter6+ regression hashes on β substrate
"Lock the now-stable per-codec hashes (VP9
4f1565e8…, MPEG-219eefbf4…, H.264 keyframe-partial71ac099b…, VP8 partialbcc57ed5…, HEVC all-zero06b2c5a0…) as iter6+ regression invariants. Verify each iter6+ patch against these anchors."
Approach sketch: codify the Phase 7 v2 sweep as a regression test. Add to tests/ in the fork. Each iter6+ PR runs the sweep and compares.
Pros: cheap; establishes a reproducible regression baseline.
Cons: doesn't fix any bug. Maintenance work, not delivery work.
Candidate J — iter4-B1 auto-detect device discrimination
"Make backend auto-detect select the right V4L2 decode device on every boot regardless of
/dev/media*enumeration order. No more env-override-per-session."
Approach sketch: walk media topology, require MEDIA_ENT_F_PROC_VIDEO_DECODER on the entities, prefer decoder-by-codec mapping. ~100 LOC in request.c.
Pros: removes per-session friction. Mechanical fix.
Cons: doesn't fix any decode bug. Quality-of-life.
Out-of-scope items (carried unchanged)
- Performance metrics (Candidate D from iter5 Phase 0) — still blocked by pixel-correctness gaps in HEVC, VP8, H.264-inter. Defer to a post-correctness iteration.
- Front-end libva.
- Other hardware (ohm, ampere/boltzmann).
- AV1.
cros-codecsRust replacement.- Bootlin / Collabora upstreaming.
Recommendation
If pressed: Candidate H (Bug 4 H.264 inter) for impact (H.264 is the most consumer-relevant codec), Candidate F (Bug 5 HEVC) for diagnostic-clarity practice (the kernel-direct comparison strace is a clean delta-finding exercise that re-validates the iter5b-β β architecture), or Candidate G (Bug 6 VP8) for fast iteration (simpler codec, smaller suspect surface).
If multiple iterations are planned, the natural sequence is G → H → F: fix the simplest first (VP8), build technique, apply to harder cases.
If iter6 should specifically MATCH the difficulty of iter5b-β (medium): G or H.
If iter6 should specifically EXPAND on iter5b-β's architectural cleanup work: J (auto-detect harden) is the architectural fit; small backend change, removes a long-standing fragility.
Locked research question (iteration 6, 2026-05-12)
User pick: Candidate G — Bug 6 VP8 partial output.
"Identify and fix the cause of VP8 libva producing 74.8%-zero output with traces of real content (hash
bcc57ed5…) rather than the byte-identical kdirect output (136ce5cb…). After fix:libva_vp8.yuv == kdirect_vp8.yuv == sw_vp8.yuvforbbb_720p10s_vp8.webm3-frame test. No regression on VP9, MPEG-2, H.264 keyframe-partial state, or HEVC."
Pass/fail (boolean)
- VP8 libva == kdirect:
cmp -s libva_vp8.yuv kdirect_vp8.yuvreturns 0 on the standard 3-frame sweep. Both equal136ce5cb…(the iter5b-β kdirect anchor). - VP9 unchanged:
libva_vp9.yuv == kdirect_vp9.yuv == 4f1565e8…(iter5b-β's PASS preserved). - MPEG-2 unchanged:
libva_mpeg2.yuv == kdirect_mpeg2.yuv == 19eefbf4…(iter5b-β's maintained state preserved). - H.264 keyframe-partial unchanged:
libva_h264.yuv == 71ac099b…(Bug 4 still deferred to a future iteration; no new H.264 regression introduced). - HEVC unchanged:
libva_hevc.yuv == 06b2c5a0…all-zero (Bug 5 still deferred; no new HEVC regression). - Control-payload anchors hold:
VIDIOC_S_EXT_CTRLSpayloads on the 5-codec sweep byte-match the iter5 Phase 3 anchors. iter6 changes shouldn't touch control submission.
Clean iter6 close = all six criteria green. Bug 6 is the only NEW behavior; the other four codecs must hold their iter5b-β state.
Scope locks
In scope:
src/vp8.c— VP8 backend control assembly + slice submission.src/picture.c— VP8 dispatch incodec_set_controls, VP8 buffer-type cases, BeginPicture/EndPicture flow.src/surface.c—surface_bind_slot(slot-to-surface binding for CAPTURE).src/cap_pool.c/request_pool.c— slot lifecycle for VP8 path.surface.hparams.vp8union.- Any shared infrastructure that VP8 touches (request_fd lifecycle, DPB binding).
Out of scope:
- VP9 / H.264 / HEVC / MPEG-2 code paths (read-only for regression-verify).
- Kernel patches (this is a backend-side bug per the empirical evidence: kernel succeeded the DQBUF, decode happened, output diverges).
- Performance metrics.
- Bug 4, Bug 5, all other backlog items.
Phase 2 source-read targets
For the upcoming Phase 2 situation analysis:
src/vp8.c— full file. iter3 wrote this; ~300 LOC.src/picture.c::codec_set_controls— VP8 dispatch site.src/picture.c::codec_store_buffer— VP8 buffer-type cases (Picture, Slice, IQMatrix, ProbabilityData).src/surface.c::surface_bind_slot— destination_data fill.src/cap_pool.c::cap_pool_acquire— slot selection logic.- Kernel UAPI
<linux/v4l2-controls.h>—V4L2_CID_STATELESS_VP8_FRAME+struct v4l2_ctrl_vp8_frame. - FFmpeg
libavcodec/v4l2_request_vp8.c(Kwiboo's downstream) — kernel-direct VP8 reference. - Phase 3 baseline strace at
iter5_phase3_baseline.tgz/anchors_vp8/— control-payload anchor.
Phase 3 baseline (re-acquire if needed)
iter5 Phase 3 baseline already captured VP8 anchors. iter6 Phase 3 will re-verify on the current β backend + add: byte-level comparison of libva_vp8.yuv vs kdirect_vp8.yuv to identify where the divergence is (which frames? which planes? which spatial regions?).
Predicted iter6 difficulty
Small. The decode is already running through the kernel cleanly (no DQBUF ERROR like HEVC). The divergence is in cap_pool slot rotation or partial fill — most likely a 5-50 LOC fix once root-caused. Phase 2 + Phase 3 may take more time than Phase 6 implementation.
Phase 1 → Phase 2 handoff
iter6 Phase 1 is locked above. iter6 Phase 2 reads VP8 source and produces a situation analysis.