

marfrit 868d854121 iter6 Phase 0 lock: Candidate G — Bug 6 VP8 partial output

User pick. 6 boolean criteria locked: VP8 libva==kdirect; no regression
on VP9/MPEG-2/H.264-keyframe/HEVC; control-payload anchors hold.

Scope: src/vp8.c, src/picture.c VP8 dispatch + buffer cases,
src/surface.c surface_bind_slot, cap_pool slot lifecycle.
No kernel work. Backend-side fix expected (decode runs through
kernel cleanly; output diverges in slot rotation or partial fill).

Predicted small: 5-50 LOC once root-caused. Phase 2 + Phase 3
likely take more wallclock than Phase 6 implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 19:37:13 +00:00

13 KiB


Iteration 6 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-12 immediately after iter5b-β close (phase8_iteration5b_close.md, commit 9a14cc2). Per feedback_dev_process.md Phase 0, this document captures iter6's substrate state, the iter5b-β-surfaced bug inventory, and candidate research questions for Phase 1 lock.

iter5b-β was the first iteration to break the "codec N + 1" pattern. iter6 inherits an even messier menu: 3 named bugs from iter5b, the iter4-B1+B4 carry-overs, and the option-Φ candidates from the original iter5 Phase 0 doc that weren't picked.

Substrate state (verified 2026-05-12 at iter5b-β close)

| Property | Value | Notes |
|---|---|---|
| Kernel | `7.0.0-fresnel-fourier` | `linux-fresnel-fourier 7.0-1` kernel-agent product; unchanged through iter5b. |
| Boot device numbering (today) | rkvdec `/dev/video1+/dev/media0`, hantro-vpu-dec `/dev/video3+/dev/media1` | Different from yesterday; iter4-B1 still open. |
| Fork tip | `70196f8` (β + Commit D) | On noether + fresnel + gitea. |
| Backend installed | `/usr/lib/dri/v4l2_request_drv_video.so` SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8` | β architecture; OUTPUT lifecycle owned by CreateContext. |
| Codec scoreboard | 5/5 with 2 direct (VP9, MPEG-2) + 3 mixed (H.264 keyframe-partial Bug 4, VP8 partial Bug 6, HEVC transitive-only direct-FAIL Bug 5) | iter5b-β closed VP9 directly; others remain mixed. |

Bug inventory after iter5b-β

Active bugs with explicit reproduction signatures

Bug 4 — H.264 inter-frame race-loss (carried from iter4 Phase 7)

Signature: H.264 keyframe decodes correctly through libva; inter frames return all-zero pages.
Reproduce: ffmpeg -hwaccel vaapi -i bbb_1080p30_h264.mp4 -frames:v 3 -vf hwdownload,format=nv12,format=yuv420p -f rawvideo out.yuv. Hash 71ac099b…. 99.99% zero; frame 1 first 16 bytes = 81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81 (real chroma); frames 2, 3 fully zero.
Hypothesis surface: slot rotation, partial DPB, or some inter-specific submission gap.
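A per-frame census of the rawvideo output makes the difference between "frames 2, 3 fully zero" and "mostly zero" explicit, so nobody has to eyeball hexdumps. Below is a minimal Python sketch. It assumes 8-bit yuv420p output with even dimensions (the output `format=yuv420p -f rawvideo` should give). The script and function names are hypothetical and not part of the fork:

```python
import sys
from typing import List, Tuple


def frame_size_yuv420p(width: int, height: int) -> int:
    """Bytes per 8-bit yuv420p frame (even width and height assumed)."""
    return width * height * 3 // 2


def zero_census(data: bytes, width: int, height: int) -> List[Tuple[int, float, str]]:
    """Per frame: (1-based index, fraction of zero bytes, first 16 bytes as hex)."""
    fsize = frame_size_yuv420p(width, height)
    if len(data) % fsize:
        raise ValueError(f"{len(data)} bytes is not a whole number of {fsize}-byte frames")
    rows = []
    for i in range(len(data) // fsize):
        frame = data[i * fsize:(i + 1) * fsize]
        rows.append((i + 1, frame.count(0) / fsize, frame[:16].hex(" ")))
    return rows


# CLI use only when given exactly: path width height
if __name__ == "__main__" and len(sys.argv) == 4:
    path, w, h = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    with open(path, "rb") as f:
        for idx, frac, head in zero_census(f.read(), w, h):
            label = "ALL-ZERO" if frac == 1.0 else f"{frac:.2%} zero"
            print(f"frame {idx}: {label}; first 16 bytes = {head}")
```

For Bug 4 the invocation would be `python3 zero_census.py out.yuv 1920 1080`. Each 1080p frame is 1920 * 1080 * 3 / 2 = 3,110,400 bytes.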

Bug 5 — HEVC libva DQBUF returns FLAG_ERROR

Signature: every HEVC libva DQBUF (both OUTPUT and CAPTURE) sets V4L2_BUF_FLAG_ERROR. Kernel rkvdec rejects the decode. CAPTURE stays at cap_pool init pattern (all-zero).
Reproduce: same shape as Bug 4, with bbb_720p10s_hevc.mp4. Hash 06b2c5a0… = all-zero.
Pre-existing: the Phase 3 baseline anchor trace (pre-iter5b) also showed FLAG_ERROR on every HEVC DQBUF. iter2's "PASS via transitive proof" verified that the backend's control PAYLOAD matched kdirect's payload, but the kernel rejected the libva submission regardless. Some aspect of the V4L2 protocol contract that the transitive proof didn't capture differs between the libva backend and ffmpeg-v4l2request (request_fd binding order, sequence number, ioctl sequencing, an extra required control, etc.).
Difficulty estimate: medium-to-high. Need to diff the actual V4L2 ioctl streams between libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC and find what's different on the wire.

Bug 6 — VP8 libva produces non-zero but non-matching output

Signature: VP8 libva runs decode (no DQBUF ERROR) and produces real-looking content (all 256 byte values present; frame 1 first 16 bytes = 93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80), but the output bytes diverge from kdirect's 136ce5cb…. Pre-iter5b VP8 was all-zero (the format-mismatch issue); β unblocked decode; what's left is a different bug.
Reproduce: VP8 fixture, same harness. Hash bcc57ed5… (libva) vs 136ce5cb… (kdirect == sw). 74.8% zero in libva output suggests partial fill.
Hypothesis surface: cap_pool slot rotation off-by-one, partial buffer fill per frame, or per-frame DPB sync issue.
Difficulty estimate: medium.
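The slot-rotation hypothesis predicts a specific shape of damage. The toy Python model below shows what an off-by-one between the slot the kernel decodes into and the slot the backend reads back would produce. It assumes round-robin slot selection and zero-initialized slots. Both are assumptions, and the model is not cap_pool.c itself:

```python
from typing import List


def simulate_readback(frames: List[bytes], num_slots: int, read_offset: int) -> List[bytes]:
    """Toy model of a CAPTURE slot pool (hypothetical, not the fork's code).

    Frame k is decoded into slot k % num_slots, but readback uses slot
    (k + read_offset) % num_slots. Slots start zero-filled, standing in for
    the all-zero init pattern. read_offset=0 is the correct binding.
    """
    size = len(frames[0])
    slots = [bytes(size) for _ in range(num_slots)]
    out = []
    for k, frame in enumerate(frames):
        slots[k % num_slots] = frame                      # decode writes here
        out.append(slots[(k + read_offset) % num_slots])  # readback reads here
    return out
```

Under this model, every returned frame is either the untouched init pattern or a whole frame from a different position. Zeros never share a frame with real decoded content. If Bug 6's zero bytes turn out to share frames with real content, pure rotation alone would not explain them. That would lean toward partial fill or per-frame DPB sync.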

Backend-class backlog items (carried forward, none touched in iter5b-β)

iter4-B1 — auto-detect picks wrong device on per-boot enumeration shuffle. Cost: 1-2 min per session re-mapping. Backend-only fix; medium scope (proper media-topology decoder/encoder discrimination via MEDIA_ENT_F_PROC_VIDEO_DECODER).
iter4-B2 — mpv-vaapi Could not create device for VP9. Consumer-side.
iter4-Q6 — per-segment quant-scale lossy mapping (VP9).
iter4-COLOR_RANGE — VAAPI exposes no color_range field.
B3 — picture.c BeginPicture profile-aware reset.
B4 — context.c log suppression for unsupported codec controls (the cosmetic EINVAL noise from hantro init probes).
B5 — mpeg2 vbv_buffer_size polish.
B6 — h265 SPS bitstream-parse fidelity gap.
L3 — vaDeriveImage cache-stale.

Substrate-class items (iter5 originally targeted, now contextualized)

vb2_dma_resv RFC v2 patches: tracked at ~/src/linux-rfc/. Verified by iter5 Phase 5 review to NOT be the right fix for the libva backend's MMAP+EXPBUF readback path (different memory model than the patches address). Still useful for DMABUF-import compositor paths (KWin, Mesa). Not in iter6's libva-decode-correctness critical path.
panfrost IOMMU_CACHE: separate sibling work-stream; not in iter6 scope.

Candidate research questions for iter6

Candidate F — Bug 5: HEVC libva decode kernel-rejection

"Identify and fix the V4L2 protocol contract difference that causes kernel rkvdec to reject HEVC decode via the libva backend while accepting it via ffmpeg-v4l2request. After fix: libva HEVC == kdirect HEVC == SW (byte-identical YUV)."

Approach sketch: strace both libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC on the same fixture; diff the ioctl streams; find the divergence (likely in S_EXT_CTRLS sequencing, MEDIA_REQUEST_IOC_QUEUE ordering, or buffer-binding ordering); patch libva backend's HEVC path or shared infrastructure.

Pros: HEVC has the strongest "we claimed PASS but the proof was partial" stigma. Closing it directly upgrades iter2's transitive PASS to direct PASS, matching VP9's iter5b-β upgrade.

Cons: HEVC backend code is substantial (h265.c was rewritten at iter2). Diff debugging via strace + source-read may take multiple sessions. Risk: the difference may be a long-standing assumption requiring substantial refactor.
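The strace step of Candidate F could be made mechanical: reduce each trace to an ordered list of ioctl request names and return codes, then diff the lists. The Python sketch below assumes traces captured with something like `strace -f -e trace=ioctl -o libva.trace <command>`, without `-t` timestamps. It deliberately drops fd numbers and argument payloads. The fds differ between processes, and iter2 already showed the control payloads match, so ordering and return codes are what remain to compare. Function names are hypothetical:

```python
import difflib
import re
import sys
from typing import Iterable, List

# Matches "[pid 4242] ioctl(7, VIDIOC_QBUF, ...)" (strace -f to stderr) or
# "4242  ioctl(7, ...)" (strace -f -o FILE); captures the request name.
IOCTL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?ioctl\(\d+,\s*([A-Za-z0-9_]+)")


def ioctl_stream(lines: Iterable[str]) -> List[str]:
    """Reduce strace output to 'REQUEST -> RET' tokens, dropping fds and payloads."""
    out = []
    for line in lines:
        m = IOCTL_RE.match(line.strip())
        if not m:
            continue  # not an ioctl (or a "<... ioctl resumed>" continuation)
        parts = line.rstrip().rsplit(") = ", 1)
        if len(parts) == 2:
            tail = parts[1].split()
            ret = tail[0]
            if len(tail) > 1 and tail[1].startswith("E"):
                ret += " " + tail[1]  # keep the errno name, e.g. EINVAL
        else:
            ret = "?"  # "<unfinished ...>" line in a multi-threaded trace
        out.append(f"{m.group(1)} -> {ret}")
    return out


def diff_traces(a_lines: Iterable[str], b_lines: Iterable[str],
                a_name: str = "libva", b_name: str = "v4l2request") -> List[str]:
    """Unified diff of the two normalized ioctl streams."""
    return list(difflib.unified_diff(ioctl_stream(a_lines), ioctl_stream(b_lines),
                                     a_name, b_name, lineterm=""))


# CLI use only when given exactly two trace files
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        print("\n".join(diff_traces(fa, fb, sys.argv[1], sys.argv[2])))
```

If strace cannot name a request, it appears as a hex number, which still diffs correctly as long as both traces come from the same strace build.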

Candidate G — Bug 6: VP8 libva partial output

"Identify and fix the cause of VP8 libva producing 74.8%-zero output (rather than the byte-identical kdirect output). After fix: libva VP8 == kdirect VP8 == SW."

Approach sketch: characterize the zero regions in libva_vp8 (which frames? which rows/columns? which planes?); compare with kdirect_vp8 at byte level; trace cap_pool slot binding and per-frame DPB submission. Likely a slot-rotation or partial-fill bug.

Pros: VP8 is simpler than HEVC. The decode is already running (DQBUF success). Probably a small fix once root-caused.

Cons: Diagnostic surface is fuzzier than HEVC's (kernel succeeds, output diverges).
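For the first step of Candidate G's approach sketch (which frames? which planes? where?), a per-frame, per-plane byte comparison is enough to start. The Python sketch below assumes both files are 8-bit yuv420p with identical frame counts. Treating the 720p fixture as 1280x720 is an assumption. Script and function names are hypothetical:

```python
import sys
from typing import Dict, Iterator, Optional, Tuple


def split_planes(frame: bytes, w: int, h: int) -> Dict[str, Tuple[bytes, int]]:
    """Split one 8-bit yuv420p frame into {plane: (bytes, plane_width)}; even w, h."""
    ysz, csz = w * h, (w // 2) * (h // 2)
    return {
        "Y": (frame[:ysz], w),
        "U": (frame[ysz:ysz + csz], w // 2),
        "V": (frame[ysz + csz:ysz + 2 * csz], w // 2),
    }


def compare(a: bytes, b: bytes, w: int, h: int
            ) -> Iterator[Tuple[int, str, float, int, Optional[Tuple[int, int]]]]:
    """Yield (frame, plane, zero fraction in a, differing bytes, first diff (row, col))."""
    fsize = w * h * 3 // 2
    if len(a) != len(b) or len(a) % fsize:
        raise ValueError("inputs must be equal-length, whole yuv420p frames")
    for i in range(len(a) // fsize):
        pa = split_planes(a[i * fsize:(i + 1) * fsize], w, h)
        pb = split_planes(b[i * fsize:(i + 1) * fsize], w, h)
        for name in ("Y", "U", "V"):
            (la, pw), (kd, _) = pa[name], pb[name]
            ndiff = sum(1 for p, q in zip(la, kd) if p != q)
            first = next((j for j, (p, q) in enumerate(zip(la, kd)) if p != q), None)
            where = divmod(first, pw) if first is not None else None
            yield i + 1, name, la.count(0) / len(la), ndiff, where


# CLI use only when given: libva.yuv kdirect.yuv width height
if __name__ == "__main__" and len(sys.argv) == 5:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        for fr, pl, zf, nd, at in compare(fa.read(), fb.read(), int(sys.argv[3]), int(sys.argv[4])):
            print(f"frame {fr} {pl}: {zf:.1%} zero, {nd} bytes differ, first diff at {at}")
```

Usage would look like `python3 yuvcmp.py libva_vp8.yuv kdirect_vp8.yuv 1280 720`.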

Candidate H — Bug 4: H.264 inter-frame race-loss

"Fix H.264 inter-frame decode through libva so all frames (not just keyframes) produce correct pixels."

Approach sketch: similar to Bug 6. H.264 has the strongest keyframe-vs-inter discrimination — decode happens for keyframes (consistent real content in frame 1) but inter frames produce zero. Likely DPB-related: reference frame indices, request_fd lifecycle, or per-frame ordering.

Pros: H.264 is the primary codec for most consumers. Fixing inter unlocks real video playback through libva. iter2-iter5b touched H.264 in passing; deeper investigation has been deferred since iter4.

Cons: H.264 DPB management is intricate (B-slice L1 ref lists, fresh request_fd per frame, etc. — iter4 already touched these). The remaining bug may be subtle.

Candidate I — Re-anchor iter6+ regression hashes on β substrate

"Lock the now-stable per-codec hashes (VP9 4f1565e8…, MPEG-2 19eefbf4…, H.264 keyframe-partial 71ac099b…, VP8 partial bcc57ed5…, HEVC all-zero 06b2c5a0…) as iter6+ regression invariants. Verify each iter6+ patch against these anchors."

Approach sketch: codify the Phase 7 v2 sweep as a regression test. Add to tests/ in the fork. Each iter6+ PR runs the sweep and compares.

Pros: cheap; establishes a reproducible regression baseline.

Cons: doesn't fix any bug. Maintenance work, not delivery work.

Candidate J — iter4-B1 auto-detect device discrimination

"Make backend auto-detect select the right V4L2 decode device on every boot regardless of /dev/media* enumeration order. No more env-override-per-session."

Approach sketch: walk media topology, require MEDIA_ENT_F_PROC_VIDEO_DECODER on the entities, prefer decoder-by-codec mapping. ~100 LOC in request.c.

Pros: removes per-session friction. Mechanical fix.

Cons: doesn't fix any decode bug. Quality-of-life.
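The selection policy in Candidate J's sketch can be stated independently of the ioctl plumbing. Below is a Python model of that policy only. In the backend, the entity functions would come from MEDIA_IOC_G_TOPOLOGY (media_v2_entity.function) and the coded formats from VIDIOC_ENUM_FMT. The MediaNode type and pick_decoder are hypothetical names, and this is not the C change in request.c:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

DECODER = "MEDIA_ENT_F_PROC_VIDEO_DECODER"

# Stateless coded pixel formats (V4L2 UAPI fourccs), keyed by codec.
CODED_FOURCC = {
    "h264": "S264",   # V4L2_PIX_FMT_H264_SLICE
    "hevc": "S265",   # V4L2_PIX_FMT_HEVC_SLICE
    "vp8": "VP8F",    # V4L2_PIX_FMT_VP8_FRAME
    "vp9": "VP9F",    # V4L2_PIX_FMT_VP9_FRAME
    "mpeg2": "MG2S",  # V4L2_PIX_FMT_MPEG2_SLICE
}


@dataclass(frozen=True)
class MediaNode:
    media_path: str                   # e.g. /dev/media0
    video_path: str                   # video node belonging to the same media device
    entity_functions: FrozenSet[str]  # entity function names found in the topology
    coded_fourccs: FrozenSet[str]     # compressed formats the video node advertises


def pick_decoder(nodes: List[MediaNode], codec: str) -> Optional[MediaNode]:
    """Select by capability only, so /dev/media* enumeration order is irrelevant."""
    want = CODED_FOURCC[codec]
    matches = [n for n in nodes
               if DECODER in n.entity_functions and want in n.coded_fourccs]
    if len(matches) > 1:
        raise ValueError(f"ambiguous {codec} decoders: {[m.media_path for m in matches]}")
    return matches[0] if matches else None
```

Because the choice depends only on capabilities, a per-boot shuffle of /dev/media* numbering changes the returned paths but not which hardware block is selected. If two decoders advertise the same codec, the function reports them as ambiguous instead of resolving the tie by position.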

Out-of-scope items (carried unchanged)

Performance metrics (Candidate D from iter5 Phase 0) — still blocked by pixel-correctness gaps in HEVC, VP8, H.264-inter. Defer to a post-correctness iteration.
Front-end libva.
Other hardware (ohm, ampere/boltzmann).
AV1.
cros-codecs Rust replacement.
Bootlin / Collabora upstreaming.

Recommendation

If pressed: Candidate H (Bug 4 H.264 inter) for impact (H.264 is the most consumer-relevant codec), Candidate F (Bug 5 HEVC) for diagnostic-clarity practice (the kernel-direct comparison strace is a clean delta-finding exercise that re-validates the iter5b-β β architecture), or Candidate G (Bug 6 VP8) for fast iteration (simpler codec, smaller suspect surface).

If multiple iterations are planned, the natural sequence is G → H → F: fix the simplest first (VP8), build technique, apply to harder cases.

If iter6 should specifically MATCH the difficulty of iter5b-β (medium): G or H.

If iter6 should specifically EXPAND on iter5b-β's architectural cleanup work: J (auto-detect harden) is the architectural fit; small backend change, removes a long-standing fragility.

Locked research question (iteration 6, 2026-05-12)

User pick: Candidate G — Bug 6 VP8 partial output.

"Identify and fix the cause of VP8 libva producing 74.8%-zero output with traces of real content (hash bcc57ed5…) rather than the byte-identical kdirect output (136ce5cb…). After fix: libva_vp8.yuv == kdirect_vp8.yuv == sw_vp8.yuv for bbb_720p10s_vp8.webm 3-frame test. No regression on VP9, MPEG-2, H.264 keyframe-partial state, or HEVC."

Pass/fail (boolean)

VP8 libva == kdirect: cmp -s libva_vp8.yuv kdirect_vp8.yuv returns 0 on the standard 3-frame sweep. Both equal 136ce5cb… (the iter5b-β kdirect anchor).
VP9 unchanged: libva_vp9.yuv == kdirect_vp9.yuv == 4f1565e8… (iter5b-β's PASS preserved).
MPEG-2 unchanged: libva_mpeg2.yuv == kdirect_mpeg2.yuv == 19eefbf4… (iter5b-β's maintained state preserved).
H.264 keyframe-partial unchanged: libva_h264.yuv == 71ac099b… (Bug 4 still deferred to a future iteration; no new H.264 regression introduced).
HEVC unchanged: libva_hevc.yuv == 06b2c5a0… all-zero (Bug 5 still deferred; no new HEVC regression).
Control-payload anchors hold: VIDIOC_S_EXT_CTRLS payloads on the 5-codec sweep byte-match the iter5 Phase 3 anchors. iter6 changes shouldn't touch control submission.

Clean iter6 close = all six criteria green. Bug 6 is the only NEW behavior; the other four codecs must hold their iter5b-β state.
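The six criteria reduce to a small checker. The Python sketch below makes two assumptions. First, the recorded anchors are treated as SHA-256 digests, since this document keeps only 8-hex-digit prefixes and does not name the algorithm. Second, the control-payload comparison is assumed to run separately, with its result passed in as a boolean. All names are hypothetical:

```python
import hashlib
from typing import Dict, Mapping

# Anchor prefixes as recorded in this document (8 hex digits each).
ANCHORS = {"vp8": "136ce5cb", "vp9": "4f1565e8", "mpeg2": "19eefbf4",
           "h264": "71ac099b", "hevc": "06b2c5a0"}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def evaluate(outputs: Mapping[str, bytes], payload_anchors_hold: bool,
             anchors: Mapping[str, str] = ANCHORS) -> Dict[str, bool]:
    """outputs maps names like 'libva_vp8' / 'kdirect_vp8' to raw yuv bytes."""
    def matches(name: str, codec: str) -> bool:
        return sha256_hex(outputs[name]).startswith(anchors[codec])

    def libva_eq_kdirect(codec: str) -> bool:
        # byte equality plays the role of `cmp -s`; the anchor pins the value
        return (outputs[f"libva_{codec}"] == outputs[f"kdirect_{codec}"]
                and matches(f"libva_{codec}", codec))

    return {
        "vp8_libva_eq_kdirect": libva_eq_kdirect("vp8"),
        "vp9_unchanged": libva_eq_kdirect("vp9"),
        "mpeg2_unchanged": libva_eq_kdirect("mpeg2"),
        "h264_keyframe_partial_unchanged": matches("libva_h264", "h264"),
        "hevc_all_zero_unchanged": matches("libva_hevc", "hevc"),
        "control_payload_anchors_hold": payload_anchors_hold,
    }
```

A clean close corresponds to `all(evaluate(...).values())` being true.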

Scope locks

In scope:

src/vp8.c — VP8 backend control assembly + slice submission.
src/picture.c — VP8 dispatch in codec_set_controls, VP8 buffer-type cases, BeginPicture/EndPicture flow.
src/surface.c — surface_bind_slot (slot-to-surface binding for CAPTURE).
src/cap_pool.c / request_pool.c — slot lifecycle for VP8 path.
surface.h params.vp8 union.
Any shared infrastructure that VP8 touches (request_fd lifecycle, DPB binding).

Out of scope:

VP9 / H.264 / HEVC / MPEG-2 code paths (read-only for regression-verify).
Kernel patches (this is a backend-side bug per the empirical evidence: kernel succeeded the DQBUF, decode happened, output diverges).
Performance metrics.
Bug 4, Bug 5, all other backlog items.

Phase 2 source-read targets

For the upcoming Phase 2 situation analysis:

src/vp8.c — full file. iter3 wrote this; ~300 LOC.
src/picture.c::codec_set_controls — VP8 dispatch site.
src/picture.c::codec_store_buffer — VP8 buffer-type cases (Picture, Slice, IQMatrix, ProbabilityData).
src/surface.c::surface_bind_slot — destination_data fill.
src/cap_pool.c::cap_pool_acquire — slot selection logic.
Kernel UAPI <linux/v4l2-controls.h> — V4L2_CID_STATELESS_VP8_FRAME + struct v4l2_ctrl_vp8_frame.
FFmpeg libavcodec/v4l2_request_vp8.c (Kwiboo's downstream) — kernel-direct VP8 reference.
Phase 3 baseline strace at iter5_phase3_baseline.tgz/anchors_vp8/ — control-payload anchor.

Phase 3 baseline (re-acquire if needed)

iter5 Phase 3 baseline already captured VP8 anchors. iter6 Phase 3 will re-verify on the current β backend + add: byte-level comparison of libva_vp8.yuv vs kdirect_vp8.yuv to identify where the divergence is (which frames? which planes? which spatial regions?).
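For the "which spatial regions?" question, a macroblock-granular map of one frame's Y plane is a compact view. VP8 codes luma in 16x16 macroblocks, so a 1280x720 frame (assumed size) becomes 80 columns by 45 rows. The helpers below are hypothetical Python:

```python
from typing import List


def y_plane(data: bytes, index: int, w: int, h: int) -> bytes:
    """Y plane of frame `index` (0-based) in an 8-bit yuv420p file."""
    start = index * (w * h * 3 // 2)
    return data[start:start + w * h]


def mb_map(libva_y: bytes, kdirect_y: bytes, w: int, h: int, mb: int = 16) -> List[str]:
    """One string per macroblock row: '.' identical, 'Z' differs and the
    libva block is all zero, 'X' differs with non-zero libva content."""
    rows = []
    for by in range(0, h, mb):
        cells = []
        for bx in range(0, w, mb):
            same, all_zero = True, True
            for y in range(by, min(by + mb, h)):
                lo, hi = y * w + bx, y * w + min(bx + mb, w)
                a, b = libva_y[lo:hi], kdirect_y[lo:hi]
                same = same and a == b
                all_zero = all_zero and not any(a)
            cells.append("." if same else ("Z" if all_zero else "X"))
        rows.append("".join(cells))
    return rows
```

One rough way to read the map follows. A solid band of Z rows suggests the output stopped filling partway through the frame. A whole frame of Z suggests the wrong slot was read back. Scattered X blocks point more toward reference or probability state than toward buffer handling. These readings are heuristics, not established results.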

Predicted iter6 difficulty

Small. The decode is already running through the kernel cleanly (no DQBUF ERROR, unlike HEVC). The divergence is most likely in cap_pool slot rotation or partial fill, and is probably a 5-50 LOC fix once root-caused. Phase 2 + Phase 3 may take more time than Phase 6 implementation.

Phase 1 → Phase 2 handoff

iter6 Phase 1 is locked above. iter6 Phase 2 reads VP8 source and produces a situation analysis.
