From 34e1480de5beede6c3074895855ac89c01e9bb83 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 12 May 2026 19:23:58 +0000
Subject: [PATCH] iter6 Phase 0: substrate inventory + 5 candidate research questions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

iter5b-β surfaced 3 explicit bugs (Bug 4 H.264 inter, Bug 5 HEVC DQBUF ERROR, Bug 6 VP8 partial output) plus carried backlog items (iter4-B1 device discrimination, B2-B6, L3, Q6, COLOR_RANGE).

Candidates F-J laid out for user lock:
- F: Bug 5 HEVC kernel-rejection (highest claim-vs-reality stigma)
- G: Bug 6 VP8 partial output (smallest suspect surface)
- H: Bug 4 H.264 inter race (highest consumer impact)
- I: Re-anchor regression hashes on β substrate
- J: iter4-B1 auto-detect harden

Recommendation: G → H → F sequence if multiple iters planned; otherwise H for impact or J for architectural-cleanup fit. Phase 1 lock pending user pick.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase0_findings_iter6.md | 131 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100644 phase0_findings_iter6.md

diff --git a/phase0_findings_iter6.md b/phase0_findings_iter6.md
new file mode 100644
index 0000000..27d1966
--- /dev/null
+++ b/phase0_findings_iter6.md
@@ -0,0 +1,131 @@
+# Iteration 6 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
+
+Opens 2026-05-12 immediately after iter5b-β close ([`phase8_iteration5b_close.md`](phase8_iteration5b_close.md), commit `9a14cc2`). Per `feedback_dev_process.md` Phase 0, this document captures iter6's substrate state, the iter5b-β-surfaced bug inventory, and candidate research questions for Phase 1 lock.
+
+iter5b-β was the first iteration to break the "codec N + 1" pattern. iter6 inherits an even messier menu: 3 named bugs from iter5b, the iter4-B1+B4 carry-overs, and the option-Φ candidates from the original iter5 Phase 0 doc that weren't picked.
+
+## Substrate state (verified 2026-05-12 at iter5b-β close)
+
+| Property | Value | Notes |
+|---|---|---|
+| Kernel | `7.0.0-fresnel-fourier` | `linux-fresnel-fourier 7.0-1` kernel-agent product; unchanged through iter5b. |
+| Boot device numbering (today) | rkvdec `/dev/video1+/dev/media0`, hantro-vpu-dec `/dev/video3+/dev/media1` | Different from yesterday; iter4-B1 still open. |
+| Fork tip | `70196f8` (β + Commit D) | On noether + fresnel + gitea. |
+| Backend installed | `/usr/lib/dri/v4l2_request_drv_video.so` SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8` | β architecture; OUTPUT lifecycle owned by CreateContext. |
+| Codec scoreboard | 5/5 with 2 direct (VP9, MPEG-2) + 3 mixed (H.264 keyframe-partial Bug 4, VP8 partial Bug 6, HEVC transitive-only direct-FAIL Bug 5) | iter5b-β closed VP9 directly; others remain mixed. |
+
+## Bug inventory after iter5b-β
+
+### Active bugs with explicit reproduction signatures
+
+**Bug 4 — H.264 inter-frame race-loss** *(carried from iter4 Phase 7)*
+
+- Signature: H.264 keyframe decodes correctly through libva; inter frames return all-zero pages.
+- Reproduce: `ffmpeg -hwaccel vaapi -i bbb_1080p30_h264.mp4 -frames:v 3 -vf hwdownload,format=nv12,format=yuv420p -f rawvideo out.yuv`. Hash `71ac099b…`. 99.99% zero; frame 1 first 16 bytes = `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81` (real chroma); frames 2, 3 fully zero.
+- Hypothesis surface: slot rotation, partial DPB, or some inter-specific submission gap.
+
+**Bug 5 — HEVC libva DQBUF returns FLAG_ERROR**
+
+- Signature: every HEVC libva DQBUF (both OUTPUT and CAPTURE) sets `V4L2_BUF_FLAG_ERROR`. Kernel rkvdec rejects the decode. CAPTURE stays at cap_pool init pattern (all-zero).
+- Reproduce: same shape as Bug 4, with `bbb_720p10s_hevc.mp4`. Hash `06b2c5a0…` = all-zero.
+- Pre-existing: Phase 3 baseline anchor trace (pre-iter5b) also showed FLAG_ERROR on every HEVC DQBUF. iter2's "PASS via transitive proof" verified backend's control PAYLOAD matched kdirect's payload — but the kernel rejected the libva submission regardless. Some V4L2 protocol contract aspect differs between libva backend and ffmpeg-v4l2request that the transitive proof didn't capture (request_fd binding order, sequence number, ioctl sequencing, extra control needed, etc.).
+- Difficulty estimate: medium-to-high. Need to diff the actual V4L2 ioctl streams between libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC and find what's different on the wire.
+
+**Bug 6 — VP8 libva produces non-zero but non-matching output**
+
+- Signature: VP8 libva runs decode (no DQBUF ERROR), produces real-looking content (256 unique bytes, frame 1 first 16 = `93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80`), but output bytes diverge from kdirect's `136ce5cb…`. Pre-iter5b VP8 was all-zero (the format-mismatch issue); β unblocked decode; what's left is a different bug.
+- Reproduce: VP8 fixture, same harness. Hash `bcc57ed5…` (libva) vs `136ce5cb…` (kdirect == sw). 74.8% zero in libva output suggests partial fill.
+- Hypothesis surface: cap_pool slot rotation off-by-one, partial buffer fill per frame, or per-frame DPB sync issue.
+- Difficulty estimate: medium.
+
+### Backend-class backlog items (carried forward, none touched in iter5b-β)
+
+- **iter4-B1** — auto-detect picks wrong device on per-boot enumeration shuffle. Cost: 1-2 min per session re-mapping. Backend-only fix; medium scope (proper media-topology decoder/encoder discrimination via `MEDIA_ENT_F_PROC_VIDEO_DECODER`).
+- **iter4-B2** — mpv-vaapi `Could not create device` for VP9. Consumer-side.
+- **iter4-Q6** — per-segment quant-scale lossy mapping (VP9).
+- **iter4-COLOR_RANGE** — VAAPI exposes no color_range field.
+- **B3** — picture.c BeginPicture profile-aware reset.
+- **B4** — context.c log suppression for unsupported codec controls (the EINVAL B4 cosmetic noise on hantro init probes).
+- **B5** — mpeg2 vbv_buffer_size polish.
+- **B6** — h265 SPS bitstream-parse fidelity gap.
+- **L3** — vaDeriveImage cache-stale.
+
+### Substrate-class items (iter5 originally targeted, now contextualized)
+
+- **vb2_dma_resv RFC v2 patches**: tracked at `~/src/linux-rfc/`. Verified by iter5 Phase 5 review to NOT be the right fix for the libva backend's MMAP+EXPBUF readback path (different memory model than the patches address). Still useful for DMABUF-import compositor paths (KWin, Mesa). Not in iter6's libva-decode-correctness critical path.
+- **panfrost IOMMU_CACHE**: separate sibling work-stream; not in iter6 scope.
+
+## Candidate research questions for iter6
+
+### Candidate F — Bug 5: HEVC libva decode kernel-rejection
+
+> *"Identify and fix the V4L2 protocol contract difference that causes kernel rkvdec to reject HEVC decode via the libva backend while accepting it via ffmpeg-v4l2request. After fix: libva HEVC == kdirect HEVC == SW (byte-identical YUV)."*
+
+**Approach sketch**: strace both libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC on the same fixture; diff the ioctl streams; find the divergence (likely in S_EXT_CTRLS sequencing, MEDIA_REQUEST_IOC_QUEUE ordering, or buffer-binding ordering); patch libva backend's HEVC path or shared infrastructure.
+
+**Pros**: HEVC has the strongest "we claimed PASS but the proof was partial" stigma. Closing it directly upgrades iter2's transitive PASS to direct PASS, matching VP9's iter5b-β upgrade.
+
+**Cons**: HEVC backend code is substantial (`h265.c` was rewritten at iter2). Diff debugging via strace + source-read may take multiple sessions. Risk: the difference may be a long-standing assumption requiring substantial refactor.
+
+### Candidate G — Bug 6: VP8 libva partial output
+
+> *"Identify and fix the cause of VP8 libva producing 74.8%-zero output (rather than the byte-identical kdirect output). After fix: libva VP8 == kdirect VP8 == SW."*
+
+**Approach sketch**: characterize the zero regions in libva_vp8 (which frames? which rows/columns? which planes?); compare with kdirect_vp8 at byte level; trace cap_pool slot binding and per-frame DPB submission. Likely a slot-rotation or partial-fill bug.
+
+**Pros**: VP8 is simpler than HEVC. The decode is already running (DQBUF success). Probably a small fix once root-caused.
+
+**Cons**: Diagnostic surface is fuzzier than HEVC's (kernel succeeds, output diverges).
+
+### Candidate H — Bug 4: H.264 inter-frame race-loss
+
+> *"Fix H.264 inter-frame decode through libva so all frames (not just keyframes) produce correct pixels."*
+
+**Approach sketch**: similar to Bug 6. H.264 has the strongest keyframe-vs-inter discrimination — decode happens for keyframes (consistent real content in frame 1) but inter frames produce zero. Likely DPB-related: reference frame indices, request_fd lifecycle, or per-frame ordering.
+
+**Pros**: H.264 is the primary codec for most consumers. Fixing inter unlocks real video playback through libva. iter2-iter5b touched H.264 in passing; deeper investigation has been deferred since iter4.
+
+**Cons**: H.264 DPB management is intricate (B-slice L1 ref lists, fresh request_fd per frame, etc. — iter4 already touched these). The remaining bug may be subtle.
+
+### Candidate I — Re-anchor iter6+ regression hashes on β substrate
+
+> *"Lock the now-stable per-codec hashes (VP9 `4f1565e8…`, MPEG-2 `19eefbf4…`, H.264 keyframe-partial `71ac099b…`, VP8 partial `bcc57ed5…`, HEVC all-zero `06b2c5a0…`) as iter6+ regression invariants. Verify each iter6+ patch against these anchors."*
+
+**Approach sketch**: codify the Phase 7 v2 sweep as a regression test. Add to `tests/` in the fork. Each iter6+ PR runs the sweep and compares.
+
+**Pros**: cheap; establishes a reproducible regression baseline.
+
+**Cons**: doesn't fix any bug. Maintenance work, not delivery work.
+
+### Candidate J — iter4-B1 auto-detect device discrimination
+
+> *"Make backend auto-detect select the right V4L2 decode device on every boot regardless of `/dev/media*` enumeration order. No more env-override-per-session."*
+
+**Approach sketch**: walk media topology, require `MEDIA_ENT_F_PROC_VIDEO_DECODER` on the entities, prefer decoder-by-codec mapping. ~100 LOC in `request.c`.
+
+**Pros**: removes per-session friction. Mechanical fix.
+
+**Cons**: doesn't fix any decode bug. Quality-of-life.
+
+## Out-of-scope items (carried unchanged)
+
+- Performance metrics (Candidate D from iter5 Phase 0) — still blocked by pixel-correctness gaps in HEVC, VP8, H.264-inter. Defer to a post-correctness iteration.
+- Front-end libva.
+- Other hardware (ohm, ampere/boltzmann).
+- AV1.
+- `cros-codecs` Rust replacement.
+- Bootlin / Collabora upstreaming.
+
+## Recommendation
+
+If pressed: **Candidate H (Bug 4 H.264 inter)** for impact (H.264 is the most consumer-relevant codec), **Candidate F (Bug 5 HEVC)** for diagnostic-clarity practice (the kernel-direct comparison strace is a clean delta-finding exercise that re-validates the iter5b-β β architecture), or **Candidate G (Bug 6 VP8)** for fast iteration (simpler codec, smaller suspect surface).
+
+If multiple iterations are planned, the natural sequence is **G → H → F**: fix the simplest first (VP8), build technique, apply to harder cases.
+
+If iter6 should specifically MATCH the difficulty of iter5b-β (medium): **G or H**.
+
+If iter6 should specifically EXPAND on iter5b-β's architectural cleanup work: **J** (auto-detect harden) is the architectural fit; small backend change, removes a long-standing fragility.
+
+## Phase 1 lock pending
+
+This document does NOT lock Phase 1. The user picks the iter6 research question from candidates F-J (or proposes K). After that pick, this doc becomes "iter6 Phase 0 final" and feeds Phase 1 lock.
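
Reviewer note, not part of the patch: the characterization step in Candidate G's approach sketch (which frames and which planes hold the zero regions) and the anchor comparison in Candidate I could be prototyped along the following lines. This is a minimal sketch under stated assumptions, not the project's harness. It assumes the raw dump is planar yuv420p (as the `format=yuv420p` in the reproduce command suggests), that the caller supplies the frame geometry, and that the anchors are SHA-256 digests, which the document does not state. The function names `plane_sizes`, `characterize`, and `matches_anchor` are hypothetical.

```python
import hashlib
from typing import Dict, List


def plane_sizes(width: int, height: int) -> Dict[str, int]:
    """Byte sizes of the Y, U and V planes of one yuv420p frame."""
    # Chroma planes are subsampled by 2 in both directions (rounded up).
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return {"Y": width * height, "U": chroma, "V": chroma}


def zero_fraction(buf: bytes) -> float:
    """Fraction of bytes in buf that are exactly 0x00."""
    return buf.count(0) / len(buf) if buf else 0.0


def characterize(raw: bytes, width: int, height: int) -> List[Dict[str, float]]:
    """Per-frame, per-plane zero fraction for a raw yuv420p dump.

    A trailing partial frame, if any, is ignored.
    """
    sizes = plane_sizes(width, height)
    frame_len = sum(sizes.values())
    report = []
    for start in range(0, len(raw) - frame_len + 1, frame_len):
        offset = start
        stats = {}
        for name, size in sizes.items():
            stats[name] = zero_fraction(raw[offset:offset + size])
            offset += size
        report.append(stats)
    return report


def matches_anchor(raw: bytes, anchor_prefix: str) -> bool:
    """Compare a dump against a truncated hex anchor.

    ASSUMPTION: anchors are SHA-256 digests; only the known prefix is compared.
    """
    digest = hashlib.sha256(raw).hexdigest()
    return digest.startswith(anchor_prefix.lower())
```

For a capture like the Bug 4 `out.yuv`, one would call `characterize(data, width, height)` with the fixture's real dimensions and read the list frame by frame. A signature such as "frame 1 real, frames 2 and 3 fully zero" would appear as zero fractions near 0.0 for the first entry and 1.0 for the rest. `matches_anchor(data, "71ac099b")` only checks the prefix quoted in this document, so a regression test would still need to store the full digests.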