# Iteration 6 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-12 immediately after iter5b-β close ([`phase8_iteration5b_close.md`](phase8_iteration5b_close.md), commit `9a14cc2`). Per `feedback_dev_process.md` Phase 0, this document captures iter6's substrate state, the iter5b-β-surfaced bug inventory, and candidate research questions for Phase 1 lock.

iter5b-β was the first iteration to break the "codec N + 1" pattern. iter6 inherits an even messier menu: 3 named bugs from iter5b, the iter4-B1+B4 carry-overs, and the option-Φ candidates from the original iter5 Phase 0 doc that weren't picked.

## Substrate state (verified 2026-05-12 at iter5b-β close)

| Property | Value | Notes |
|---|---|---|
| Kernel | `7.0.0-fresnel-fourier` | `linux-fresnel-fourier 7.0-1` kernel-agent product; unchanged through iter5b. |
| Boot device numbering (today) | rkvdec `/dev/video1+/dev/media0`, hantro-vpu-dec `/dev/video3+/dev/media1` | Different from yesterday; iter4-B1 still open. |
| Fork tip | `70196f8` (β + Commit D) | On noether + fresnel + gitea. |
| Backend installed | `/usr/lib/dri/v4l2_request_drv_video.so` SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8` | β architecture; OUTPUT lifecycle owned by CreateContext. |
| Codec scoreboard | 5/5 with 2 direct (VP9, MPEG-2) + 3 mixed (H.264 keyframe-partial Bug 4, VP8 partial Bug 6, HEVC transitive-only direct-FAIL Bug 5) | iter5b-β closed VP9 directly; others remain mixed. |

## Bug inventory after iter5b-β

### Active bugs with explicit reproduction signatures

**Bug 4 — H.264 inter-frame race-loss** *(carried from iter4 Phase 7)*

- Signature: H.264 keyframe decodes correctly through libva; inter frames return all-zero pages.
- Reproduce: `ffmpeg -hwaccel vaapi -i bbb_1080p30_h264.mp4 -frames:v 3 -vf hwdownload,format=nv12,format=yuv420p -f rawvideo out.yuv`. Hash `71ac099b…`. 99.99% zero; frame 1 first 16 bytes = `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81` (real chroma); frames 2, 3 fully zero.
- Hypothesis surface: slot rotation, partial DPB, or some inter-specific submission gap.

**Bug 5 — HEVC libva DQBUF returns FLAG_ERROR**

- Signature: every HEVC libva DQBUF (both OUTPUT and CAPTURE) sets `V4L2_BUF_FLAG_ERROR`. Kernel rkvdec rejects the decode. CAPTURE stays at cap_pool init pattern (all-zero).
- Reproduce: same shape as Bug 4, with `bbb_720p10s_hevc.mp4`. Hash `06b2c5a0…` = all-zero.
- Pre-existing: Phase 3 baseline anchor trace (pre-iter5b) also showed FLAG_ERROR on every HEVC DQBUF. iter2's "PASS via transitive proof" verified backend's control PAYLOAD matched kdirect's payload — but the kernel rejected the libva submission regardless. Some V4L2 protocol contract aspect differs between libva backend and ffmpeg-v4l2request that the transitive proof didn't capture (request_fd binding order, sequence number, ioctl sequencing, extra control needed, etc.).
- Difficulty estimate: medium-to-high. Need to diff the actual V4L2 ioctl streams between libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC and find what's different on the wire.

**Bug 6 — VP8 libva produces non-zero but non-matching output**

- Signature: VP8 libva runs decode (no DQBUF ERROR), produces real-looking content (256 unique bytes, frame 1 first 16 = `93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80`), but output bytes diverge from kdirect's `136ce5cb…`. Pre-iter5b VP8 was all-zero (the format-mismatch issue); β unblocked decode; what's left is a different bug.
- Reproduce: VP8 fixture, same harness. Hash `bcc57ed5…` (libva) vs `136ce5cb…` (kdirect == sw). 74.8% zero in libva output suggests partial fill.
- Hypothesis surface: cap_pool slot rotation off-by-one, partial buffer fill per frame, or per-frame DPB sync issue.
- Difficulty estimate: medium.

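
The signature numbers quoted for Bugs 4 and 6 (zero fraction, unique byte count, first 16 bytes) can be recomputed from any raw output file. The following is a minimal sketch, assuming the whole `.yuv` file fits in memory; the function names `byte_stats` and `head_hex` are hypothetical helpers, not part of the existing harness.

```python
def byte_stats(data: bytes) -> dict:
    """Summarize a raw YUV dump: share of zero bytes and distinct byte values."""
    total = len(data)
    return {
        # bytes.count() accepts an integer byte value
        "zero_fraction": data.count(0) / total if total else 0.0,
        "unique_bytes": len(set(data)),
    }


def head_hex(data: bytes, n: int = 16) -> str:
    """First n bytes as space-separated hex, the format used in the signatures."""
    return data[:n].hex(" ")


if __name__ == "__main__":
    import sys

    # Usage (path supplied by the caller): python stats.py libva_vp8.yuv
    with open(sys.argv[1], "rb") as fh:
        blob = fh.read()
    print(byte_stats(blob), head_hex(blob))
```

A result near 1.0 for `zero_fraction` with a small `unique_bytes` count points at an untouched buffer; a mid-range fraction with 256 unique values matches the "partial fill" reading of Bug 6.
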
### Backend-class backlog items (carried forward, none touched in iter5b-β)

- **iter4-B1** — auto-detect picks wrong device on per-boot enumeration shuffle. Cost: 1-2 min per session re-mapping. Backend-only fix; medium scope (proper media-topology decoder/encoder discrimination via `MEDIA_ENT_F_PROC_VIDEO_DECODER`).
- **iter4-B2** — mpv-vaapi `Could not create device` for VP9. Consumer-side.
- **iter4-Q6** — per-segment quant-scale lossy mapping (VP9).
- **iter4-COLOR_RANGE** — VAAPI exposes no color_range field.
- **B3** — picture.c BeginPicture profile-aware reset.
- **B4** — context.c log suppression for unsupported codec controls (the EINVAL B4 cosmetic noise on hantro init probes).
- **B5** — mpeg2 vbv_buffer_size polish.
- **B6** — h265 SPS bitstream-parse fidelity gap.
- **L3** — vaDeriveImage cache-stale.

### Substrate-class items (iter5 originally targeted, now contextualized)

- **vb2_dma_resv RFC v2 patches**: tracked at `~/src/linux-rfc/`. Verified by iter5 Phase 5 review to NOT be the right fix for the libva backend's MMAP+EXPBUF readback path (different memory model than the patches address). Still useful for DMABUF-import compositor paths (KWin, Mesa). Not in iter6's libva-decode-correctness critical path.
- **panfrost IOMMU_CACHE**: separate sibling work-stream; not in iter6 scope.

## Candidate research questions for iter6

### Candidate F — Bug 5: HEVC libva decode kernel-rejection

> *"Identify and fix the V4L2 protocol contract difference that causes kernel rkvdec to reject HEVC decode via the libva backend while accepting it via ffmpeg-v4l2request. After fix: libva HEVC == kdirect HEVC == SW (byte-identical YUV)."*

**Approach sketch**: strace both libva-vaapi-HEVC and ffmpeg-v4l2request-HEVC on the same fixture; diff the ioctl streams; find the divergence (likely in S_EXT_CTRLS sequencing, MEDIA_REQUEST_IOC_QUEUE ordering, or buffer-binding ordering); patch libva backend's HEVC path or shared infrastructure.

**Pros**: HEVC has the strongest "we claimed PASS but the proof was partial" stigma. Closing it directly upgrades iter2's transitive PASS to direct PASS, matching VP9's iter5b-β upgrade.

**Cons**: HEVC backend code is substantial (`h265.c` was rewritten at iter2). Diff debugging via strace + source-read may take multiple sessions. Risk: the difference may be a long-standing assumption requiring substantial refactor.

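
The strace-diff step in Candidate F's approach sketch could be prototyped as below. This is a sketch under assumptions: it presumes strace prints ioctl requests symbolically in the form `ioctl(<fd>, NAME, ...)` (requests that strace cannot decode would be skipped by this regex), it compares only request ordering rather than payloads, and the names `ioctl_names` and `first_divergence` are hypothetical.

```python
import re

# Matches the symbolic request name in lines such as:
#   ioctl(7, VIDIOC_S_EXT_CTRLS, {...}) = 0
IOCTL_RE = re.compile(r"ioctl\(\d+, ([A-Z][A-Z0-9_]*)")


def ioctl_names(lines):
    """Extract the ordered list of ioctl request names from strace output lines."""
    names = []
    for line in lines:
        match = IOCTL_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


def first_divergence(reference, candidate):
    """Return (index, reference_name, candidate_name) at the first mismatch.

    A stream that ends early reports None for its side; identical streams
    return None.
    """
    for index in range(max(len(reference), len(candidate))):
        ref = reference[index] if index < len(reference) else None
        cand = candidate[index] if index < len(candidate) else None
        if ref != cand:
            return index, ref, cand
    return None
```

Feeding the kernel-direct trace as `reference` and the libva trace as `candidate` gives a first place to look; payload-level differences inside a matching request name would still need a manual or field-by-field comparison.
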
### Candidate G — Bug 6: VP8 libva partial output

> *"Identify and fix the cause of VP8 libva producing 74.8%-zero output (rather than the byte-identical kdirect output). After fix: libva VP8 == kdirect VP8 == SW."*

**Approach sketch**: characterize the zero regions in libva_vp8 (which frames? which rows/columns? which planes?); compare with kdirect_vp8 at byte level; trace cap_pool slot binding and per-frame DPB submission. Likely a slot-rotation or partial-fill bug.

**Pros**: VP8 is simpler than HEVC. The decode is already running (DQBUF success). Probably a small fix once root-caused.

**Cons**: Diagnostic surface is fuzzier than HEVC's (kernel succeeds, output diverges).

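
The "which frames, which rows, which planes" question in the approach sketch can be answered mechanically. The sketch below assumes the harness output is planar `yuv420p` (as produced by the `format=yuv420p` filter in the reproduce command) and that the caller supplies the frame dimensions; `zero_row_runs` is a hypothetical helper name.

```python
def zero_row_runs(data: bytes, width: int, height: int):
    """List (frame, plane, first_row, last_row) for every run of all-zero rows.

    Assumes planar yuv420p: a full-resolution Y plane followed by
    half-resolution U and V planes for each frame.
    """
    chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
    planes = [("Y", width, height), ("U", chroma_w, chroma_h), ("V", chroma_w, chroma_h)]
    frame_size = sum(w * h for _, w, h in planes)
    runs = []
    for frame in range(len(data) // frame_size):
        offset = frame * frame_size
        for name, w, h in planes:
            run_start = None
            for row in range(h):
                line = data[offset + row * w: offset + (row + 1) * w]
                if not any(line):            # every byte in this row is zero
                    if run_start is None:
                        run_start = row
                elif run_start is not None:  # a zero run just ended
                    runs.append((frame, name, run_start, row - 1))
                    run_start = None
            if run_start is not None:        # run extends to the plane's last row
                runs.append((frame, name, run_start, h - 1))
            offset += w * h
    return runs
```

A single long run at the bottom of each Y plane would suggest a truncated fill, while whole frames reported as zero would point toward slot rotation; both readings are hypotheses to test, not conclusions.
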
### Candidate H — Bug 4: H.264 inter-frame race-loss

> *"Fix H.264 inter-frame decode through libva so all frames (not just keyframes) produce correct pixels."*

**Approach sketch**: similar to Bug 6. H.264 has the strongest keyframe-vs-inter discrimination — decode happens for keyframes (consistent real content in frame 1) but inter frames produce zero. Likely DPB-related: reference frame indices, request_fd lifecycle, or per-frame ordering.

**Pros**: H.264 is the primary codec for most consumers. Fixing inter unlocks real video playback through libva. iter2-iter5b touched H.264 in passing; deeper investigation has been deferred since iter4.

**Cons**: H.264 DPB management is intricate (B-slice L1 ref lists, fresh request_fd per frame, etc. — iter4 already touched these). The remaining bug may be subtle.

### Candidate I — Re-anchor iter6+ regression hashes on β substrate

> *"Lock the now-stable per-codec hashes (VP9 `4f1565e8…`, MPEG-2 `19eefbf4…`, H.264 keyframe-partial `71ac099b…`, VP8 partial `bcc57ed5…`, HEVC all-zero `06b2c5a0…`) as iter6+ regression invariants. Verify each iter6+ patch against these anchors."*

**Approach sketch**: codify the Phase 7 v2 sweep as a regression test. Add to `tests/` in the fork. Each iter6+ PR runs the sweep and compares.

**Pros**: cheap; establishes a reproducible regression baseline.

**Cons**: doesn't fix any bug. Maintenance work, not delivery work.

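
One possible shape for the anchor check is sketched below. Two assumptions are worth flagging: the document only quotes truncated hashes, so the sketch matches on the quoted prefix and does not reconstruct full digests, and the digest algorithm is not stated here, so `sha256` is a placeholder default that must be replaced with whatever the sweep actually uses. `ANCHOR_PREFIXES` and `check_anchors` are hypothetical names.

```python
import hashlib

# Prefixes exactly as quoted in this document; the full digests are not known here.
ANCHOR_PREFIXES = {
    "vp9": "4f1565e8",
    "mpeg2": "19eefbf4",
    "h264": "71ac099b",
    "vp8": "bcc57ed5",
    "hevc": "06b2c5a0",
}


def digest_has_prefix(data: bytes, prefix: str, algo: str = "sha256") -> bool:
    """True if the hex digest of data starts with prefix (algo is an assumption)."""
    return hashlib.new(algo, data).hexdigest().startswith(prefix)


def check_anchors(outputs: dict, anchors: dict, algo: str = "sha256") -> dict:
    """Map codec name to pass/fail for every codec present in both dicts."""
    return {
        codec: digest_has_prefix(outputs[codec], prefix, algo)
        for codec, prefix in anchors.items()
        if codec in outputs
    }
```

In a test under `tests/`, `outputs` would hold the bytes of each `libva_*.yuv` file from the sweep, and any `False` value would fail the run.
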
### Candidate J — iter4-B1 auto-detect device discrimination

> *"Make backend auto-detect select the right V4L2 decode device on every boot regardless of `/dev/media*` enumeration order. No more env-override-per-session."*

**Approach sketch**: walk media topology, require `MEDIA_ENT_F_PROC_VIDEO_DECODER` on the entities, prefer decoder-by-codec mapping. ~100 LOC in `request.c`.

**Pros**: removes per-session friction. Mechanical fix.

**Cons**: doesn't fix any decode bug. Quality-of-life.

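
The selection rule in the approach sketch (filter on entity function, then map codec to driver) is independent of enumeration order, which a small model can illustrate. The sketch below is not the `request.c` implementation: it operates on already-enumerated topology data rather than issuing media ioctls, the dictionary layout and `pick_decoder` are hypothetical, the codec-to-driver mapping is supplied by the caller, and the numeric entity-function values are assumptions to verify against the installed `<linux/media.h>`.

```python
# Assumed values of the entity function codes; verify against <linux/media.h>.
MEDIA_ENT_F_PROC_VIDEO_ENCODER = 0x4007
MEDIA_ENT_F_PROC_VIDEO_DECODER = 0x4008


def pick_decoder(devices, codec, codec_to_driver):
    """Return (media_node, video_node) of the decoder for codec, or None.

    devices: list of dicts with keys 'driver', 'media_node', 'video_node',
    and 'entity_functions' (the function codes found in that media topology).
    """
    wanted_driver = codec_to_driver.get(codec)
    for dev in devices:
        if dev["driver"] != wanted_driver:
            continue
        # Require a decoder entity so an encoder node is never selected.
        if MEDIA_ENT_F_PROC_VIDEO_DECODER in dev["entity_functions"]:
            return dev["media_node"], dev["video_node"]
    return None
```

Because the lookup keys on driver name and entity function instead of node index, shuffling the device list between boots leaves the result unchanged for a given codec.
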
## Out-of-scope items (carried unchanged)

- Performance metrics (Candidate D from iter5 Phase 0) — still blocked by pixel-correctness gaps in HEVC, VP8, H.264-inter. Defer to a post-correctness iteration.
- Front-end libva.
- Other hardware (ohm, ampere/boltzmann).
- AV1.
- `cros-codecs` Rust replacement.
- Bootlin / Collabora upstreaming.

## Recommendation

If pressed: **Candidate H (Bug 4 H.264 inter)** for impact (H.264 is the most consumer-relevant codec), **Candidate F (Bug 5 HEVC)** for diagnostic-clarity practice (the kernel-direct comparison strace is a clean delta-finding exercise that re-validates the iter5b-β architecture), or **Candidate G (Bug 6 VP8)** for fast iteration (simpler codec, smaller suspect surface).

If multiple iterations are planned, the natural sequence is **G → H → F**: fix the simplest first (VP8), build technique, apply to harder cases.

If iter6 should specifically MATCH the difficulty of iter5b-β (medium): **G or H**.

If iter6 should specifically EXPAND on iter5b-β's architectural cleanup work: **J** (auto-detect hardening) is the architectural fit; small backend change, removes a long-standing fragility.

## Locked research question (iteration 6, 2026-05-12)

User pick: **Candidate G — Bug 6 VP8 partial output.**

> *"Identify and fix the cause of VP8 libva producing 74.8%-zero output with traces of real content (hash `bcc57ed5…`) rather than the byte-identical kdirect output (`136ce5cb…`). After fix: `libva_vp8.yuv == kdirect_vp8.yuv == sw_vp8.yuv` for `bbb_720p10s_vp8.webm` 3-frame test. No regression on VP9, MPEG-2, H.264 keyframe-partial state, or HEVC."*

### Pass/fail (boolean)

1. **VP8 libva == kdirect**: `cmp -s libva_vp8.yuv kdirect_vp8.yuv` exits with status 0 on the standard 3-frame sweep. Both files hash to `136ce5cb…` (the iter5b-β kdirect anchor).
2. **VP9 unchanged**: `libva_vp9.yuv == kdirect_vp9.yuv == 4f1565e8…` (iter5b-β's PASS preserved).
3. **MPEG-2 unchanged**: `libva_mpeg2.yuv == kdirect_mpeg2.yuv == 19eefbf4…` (iter5b-β's maintained state preserved).
4. **H.264 keyframe-partial unchanged**: `libva_h264.yuv == 71ac099b…` (Bug 4 still deferred to a future iteration; no new H.264 regression introduced).
5. **HEVC unchanged**: `libva_hevc.yuv == 06b2c5a0…` all-zero (Bug 5 still deferred; no new HEVC regression).
6. **Control-payload anchors hold**: `VIDIOC_S_EXT_CTRLS` payloads on the 5-codec sweep byte-match the iter5 Phase 3 anchors. iter6 changes shouldn't touch control submission.

Clean iter6 close = all six criteria green. Bug 6 is the only NEW behavior; the other four codecs must hold their iter5b-β state.

### Scope locks

**In scope**:
- `src/vp8.c` — VP8 backend control assembly + slice submission.
- `src/picture.c` — VP8 dispatch in `codec_set_controls`, VP8 buffer-type cases, BeginPicture/EndPicture flow.
- `src/surface.c` — `surface_bind_slot` (slot-to-surface binding for CAPTURE).
- `src/cap_pool.c` / `request_pool.c` — slot lifecycle for VP8 path.
- `surface.h` `params.vp8` union.
- Any shared infrastructure that VP8 touches (request_fd lifecycle, DPB binding).

**Out of scope**:
- VP9 / H.264 / HEVC / MPEG-2 code paths (read-only for regression-verify).
- Kernel patches (this is a backend-side bug per the empirical evidence: DQBUF succeeded, decode happened, output diverges).
- Performance metrics.
- Bug 4, Bug 5, all other backlog items.

### Phase 2 source-read targets

For the upcoming Phase 2 situation analysis:

- `src/vp8.c` — full file. iter3 wrote this; ~300 LOC.
- `src/picture.c::codec_set_controls` — VP8 dispatch site.
- `src/picture.c::codec_store_buffer` — VP8 buffer-type cases (Picture, Slice, IQMatrix, ProbabilityData).
- `src/surface.c::surface_bind_slot` — destination_data fill.
- `src/cap_pool.c::cap_pool_acquire` — slot selection logic.
- Kernel UAPI `<linux/v4l2-controls.h>` — `V4L2_CID_STATELESS_VP8_FRAME` + `struct v4l2_ctrl_vp8_frame`.
- FFmpeg `libavcodec/v4l2_request_vp8.c` (Kwiboo's downstream) — kernel-direct VP8 reference.
- Phase 3 baseline strace at `iter5_phase3_baseline.tgz/anchors_vp8/` — control-payload anchor.

### Phase 3 baseline (re-acquire if needed)

iter5 Phase 3 baseline already captured VP8 anchors. iter6 Phase 3 will re-verify on the current β backend + add: byte-level comparison of `libva_vp8.yuv` vs `kdirect_vp8.yuv` to identify where the divergence is (which frames? which planes? which spatial regions?).

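
The byte-level comparison planned for Phase 3 could start from a per-frame, per-plane mismatch count like the one sketched here. It assumes both files are planar `yuv420p` with identical, caller-supplied dimensions, and `diff_by_plane` is a hypothetical helper name; finer spatial localization (rows, macroblocks) would build on the same offsets.

```python
def diff_by_plane(libva: bytes, kdirect: bytes, width: int, height: int):
    """Return (frame, plane, mismatched_bytes, plane_size) for every plane.

    Only frames present in full in both inputs are compared.
    """
    chroma_size = ((width + 1) // 2) * ((height + 1) // 2)
    planes = [("Y", width * height), ("U", chroma_size), ("V", chroma_size)]
    frame_size = sum(size for _, size in planes)
    report = []
    for frame in range(min(len(libva), len(kdirect)) // frame_size):
        offset = frame * frame_size
        for name, size in planes:
            a = libva[offset: offset + size]
            b = kdirect[offset: offset + size]
            mismatched = sum(x != y for x, y in zip(a, b))
            report.append((frame, name, mismatched, size))
            offset += size
    return report
```

If mismatches concentrate in specific frames the slot-rotation hypothesis gains weight; if every frame diverges only in part of each plane, partial fill becomes the more likely reading.
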
### Predicted iter6 difficulty

Small. The decode already runs through the kernel cleanly (no DQBUF ERROR, unlike HEVC). The divergence is expected to lie in cap_pool slot rotation or partial fill, most likely a 5-50 LOC fix once root-caused. Phase 2 + Phase 3 will likely take more wall-clock time than the Phase 6 implementation.

### Phase 1 → Phase 2 handoff

iter6 Phase 1 is locked above. iter6 Phase 2 reads VP8 source and produces a situation analysis.