e47a7ba309
User pick at iter8 open. Carried unchanged through 5 iters (iter4..iter7); keyframe partially decodes (frame-1 first 16 bytes = real chroma) while inter frames return all-zero. Pass criterion: libva_h264 == kdirect_h264 == sw_h264 byte-identical for bbb_1080p30_h264.mp4 3-frame, including inter frames. In scope: src/h264.c, src/h264_slice_header.c, src/picture.c H.264 paths, per-frame request_fd lifecycle. Out of scope: VP9/VP8/HEVC/MPEG-2, kernel patches, performance, all other backlog items. Substrate at iter8 open: fork tip 6df2159 (iter7), backend SHA 520507f6.., kernel linux-fresnel-fourier 7.0-1, auto-detect picks rkvdec on every boot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Iteration 8 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-13 immediately after iter7 clean PASS close ([`phase8_iteration7_close.md`](phase8_iteration7_close.md), commit `b0ebe67`). iter7 closed iter4-B1a (auto-detect decoder/encoder discrimination) with 5/5 Phase 1 criteria green; no codec-correctness axis movement. iter8 pivots to the longest-deferred consumer-impact bug.

User pick at iter8 Phase 0 lock: **Bug 4 — H.264 inter-frame race-loss.** Rationale per user AskUserQuestion answer: "Largest consumer impact (H.264 = most common codec). Backend code familiar from iter4 work. Likely DPB-related."

## Locked research question (iteration 8, 2026-05-13)

> *"Identify and fix the cause of H.264 inter frames returning all-zero pixels through libva while keyframes partially decode. After fix: `libva_h264.yuv == kdirect_h264.yuv == sw_h264.yuv` byte-identical for the standard `bbb_1080p30_h264.mp4` 3-frame fixture, including the two inter frames (frames 2, 3) which today are wholly zero."*

### Pass/fail (boolean)

1. **H.264 libva == kdirect**: `cmp -s libva_h264.yuv kdirect_h264.yuv` returns 0 across the 3-frame sweep. Inter frames carry real pixel content, not all-zero.
2. **VP9 unchanged**: `libva_vp9.yuv == kdirect_vp9.yuv == 4f1565e89cd720c4…` (iter5b-β/iter7 PASS preserved).
3. **MPEG-2 unchanged**: `libva_mpeg2.yuv == kdirect_mpeg2.yuv == 19eefbf486e44496…` (iter5b-β/iter7 maintained state preserved).
4. **HEVC unchanged**: `libva_hevc.yuv == 06b2c5a0c01e515d…` all-zero (Bug 5 still deferred; no new HEVC regression).
5. **VP8 unchanged**: `libva_vp8.yuv == bcc57ed5c9021d02…` partial (Bug 6 still deferred; no new VP8 regression).
6. **Control-payload anchors hold**: `VIDIOC_S_EXT_CTRLS` payloads on the 5-codec sweep byte-match the iter5 Phase 3 anchors for the four non-H.264 codecs. H.264 payload anchor may shift — Phase 4 plan decides whether to lock the new H.264 anchor or accept the shift as part of the fix.

Clean iter8 close = all six criteria green. Bug 4 is the only NEW behavior; the other four codecs must hold their iter7 close state. Phase 7 → Phase 4 loopback per `feedback_dev_process.md` if any fail.

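Criterion 1 is a whole-file `cmp`, which says nothing about which frame diverges. A per-frame view can be sketched as below. This is a minimal sketch, assuming the `.yuv` dumps are raw 8-bit 4:2:0 with no padding, so one frame is `width * height * 3 / 2` bytes; the helper names are hypothetical and not part of the existing harness.

```python
from typing import List, Optional


def frame_size_420(width: int, height: int) -> int:
    """Bytes in one 8-bit 4:2:0 frame (assumption: no stride padding)."""
    return width * height * 3 // 2


def split_frames(data: bytes, frame_size: int) -> List[bytes]:
    """Cut a raw dump into equally sized frames."""
    if frame_size <= 0 or len(data) % frame_size != 0:
        raise ValueError("dump is not a whole number of frames")
    return [data[i:i + frame_size] for i in range(0, len(data), frame_size)]


def zero_frames(data: bytes, frame_size: int) -> List[int]:
    """1-based indices of frames whose bytes are all zero."""
    frames = split_frames(data, frame_size)
    return [n for n, frame in enumerate(frames, start=1) if not any(frame)]


def first_mismatch(a: bytes, b: bytes, frame_size: int) -> Optional[int]:
    """1-based index of the first differing frame, or None if identical."""
    if len(a) != len(b):
        raise ValueError("dumps differ in length")
    pairs = zip(split_frames(a, frame_size), split_frames(b, frame_size))
    for n, (frame_a, frame_b) in enumerate(pairs, start=1):
        if frame_a != frame_b:
            return n
    return None
```

On the current Bug 4 signature, `zero_frames` on the libva dump would be expected to report frames 2 and 3. If the decoder pads 1080 rows to 1088, the frame size has to be adjusted accordingly.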
## Substrate state at iter8 open

| Property | Value | Notes |
|---|---|---|
| Kernel | `7.0.0-fresnel-fourier` (linux-fresnel-fourier 7.0-1) | Unchanged through iter5b/iter6/iter7. |
| Fork tip | `6df2159` (iter7 Phase 7 fix-forward) | On noether + fresnel + gitea (claude-noether identity). |
| Backend installed (fresnel) | SHA `520507f6d0a1a7eb3797bed42c6f74e0f3a4826ac8a22ed2655e01a6f20aa874` | iter7 auto-detect + iter5b-β + Commit D. |
| Auto-detect | Picks `/dev/video1 + /dev/media0` (rkvdec) on every fresh boot | iter7 closed. No env override needed for H.264. |
| Bug 4 signature | H.264 keyframe partially decodes; inter frames fully zero | Unchanged since iter4 Phase 7; carried through iter5/iter5b/iter6/iter7. |
| Hash anchor | `71ac099b8d007836385b6776e6bbf891ddd7b79caad66775ff1fbb85657fb349` | Iter5b-β/iter6/iter7 stable anchor for `bbb_1080p30_h264.mp4` 3-frame. |
| Frame-1 first 16 bytes | `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81` | Real chroma — keyframe partially landed. |
| Frame-2/3 content | All zero pages | The race-loss signature. |

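The anchors in this doc appear both as full 64-hex digests and as truncated prefixes ending in `…`. Checking a dump against either form can be sketched as follows. The digest algorithm is an assumption: 64 hex characters is consistent with SHA-256, but the doc does not name it. `matches_anchor` is a hypothetical helper.

```python
import hashlib

ELLIPSIS = "\u2026"  # the truncation mark used in the anchor strings


def sha256_hex(data: bytes) -> str:
    """Hex digest of the dump (assumption: anchors are SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def matches_anchor(data: bytes, anchor: str) -> bool:
    """True if the digest equals a full anchor or starts with a truncated one."""
    truncated = anchor.endswith(ELLIPSIS) or anchor.endswith(".")
    prefix = anchor.rstrip(ELLIPSIS + ".").lower()
    if not prefix:
        raise ValueError("anchor has no hex digits")
    digest = sha256_hex(data)
    return digest.startswith(prefix) if truncated else digest == prefix
```

A truncated anchor only proves a prefix match, so the full digest is the better thing to record in the Phase 7 verification notes.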
## Scope locks

**In scope**:

- `src/h264.c` — full file (~994 LOC). DPB management, picture-to-V4L2 control conversion, slice header / weight-pred, set_controls dispatch.
- `src/h264_slice_header.c` — slice-header bit-parser (361 LOC).
- `src/picture.c` — H.264 dispatch in `codec_set_controls`, BeginPicture/EndPicture flow, buffer storage for VAProfileH264*.
- `src/surface.c::surface_bind_slot` — destination_data fill (read-only verification; iter5b-β touched this).
- `src/request.c` — per-frame `request_fd` lifecycle (iter4 area). Read-only unless re-binding required.
- Kernel UAPI `<linux/v4l2-controls.h>` for H.264 — `V4L2_CID_STATELESS_H264_*` controls (decode_params, sps, pps, scaling_matrix, pred_weights, slice_params, decode_mode).
- Reference: Kwiboo's FFmpeg downstream `libavcodec/v4l2_request_h264.c` (kernel-direct H.264 reference path; pixel-identical to SW per iter4 transitive proof).

**Out of scope**:

- VP9 / VP8 / HEVC / MPEG-2 code paths (read-only for regression-verify).
- Kernel patches (Bug 4 is a backend race per the empirical evidence: keyframe data DOES arrive, so kernel decode partially executes; inter frames go to zero, indicating per-frame state setup gap rather than kernel decode-engine failure).
- Performance metrics.
- Bug 5, Bug 6, iter4-B1b, all other backlog items.
- Front-end libva-multiplanar core.

## Mechanism the question targets

H.264 backend code was substantially worked at iter4 (DPB refactor, B-slice L1 reflist support, fresh-`request_fd` per frame, slice-header parser fixes). After iter4, the keyframe started landing partially — clearly real chroma in frame 1's first plane bytes. But inter frames return all zero, which indicates that keyframe decode at least partially works while inter-frame setup is missing something the kernel needs.

Hypothesis surface (Phase 2/3 will eliminate or confirm each):

- **H-A — DPB short_ref / long_ref entries not pre-existing for inter frames at submission time.** iter4 fixed `dpb_find_invalid_entry` (h264.c:55) + `dpb_insert` (h264.c:140) flow, but the order of operations across frames may leave inter frames with empty DPB at slice submission.
- **H-B — `pic_num` / `frame_num` / POC encoding mismatch on inter slices.** `h264_strip_ffmpeg_poc_sentinel` (h264.c:219) strips FFmpeg's NULL-pic sentinel; check if it strips too aggressively on inter slices' reference picture lists.
- **H-C — Slice control `flags` field missing `V4L2_H264_SLICE_FLAG_FIELD_PIC` or similar for B/P slices.** `h264_va_slice_to_v4l2` (h264.c:601) assembles the slice control struct.
- **H-D — Per-inter-frame `request_fd` not allocated / not bound to controls correctly.** iter4 fixed fresh `request_fd` per frame; verify each call reaches `MEDIA_REQUEST_IOC_QUEUE` with full control set for inter frames.
- **H-E — Slot rotation / capture-buffer rotation off-by-one across keyframe→inter transitions.** iter5b-β touched the OUTPUT lifecycle but the cap_pool LRU recycling at `cap_pool.c` may bind inter frames to a slot that wasn't validly initialized.
- **H-F — Pred-weight-factors (slice control's pred-weight table) missing or zero-filled.** `h264_copy_pred_table` (h264.c:579) — if VAAPI's slice param doesn't fill weight factors for inter, the v4l2 control may carry stale or all-zero weights, causing the kernel decoder to skip MC.
- **H-G — Scaling matrix not re-uploaded each frame.** iter4 path: `set_controls` covers sps/pps/scaling on each frame, but check whether `params.h264.matrix_set` (picture.c:192) gates re-upload incorrectly.
- **H-H — VAAPI's slice-data buffer pointer for inter frames doesn't include a start_code prefix, or includes one when it shouldn't.** iter5b's `h264_start_code` per-codec profile gating; verify the dispatch in picture.c:70 is firing correctly.
- **H-I — Some new contract introduced by linux-fresnel-fourier 7.0-1 kernel** (v6.16-rc4 base + Bootlin out-of-tree backports). The keyframe partially landing makes a per-codec contract drift plausible — Phase 3 will diff libva-h264 vs ffmpeg-v4l2request-h264 ioctl streams.

The keyframe-partial pattern is the key cue. If the kernel decode engine were completely broken for H.264, the keyframe wouldn't have real bytes either. If the kernel were only misrouting buffers, all frames would show zero. The partial-keyframe / zero-inter split points at a **per-frame setup gap that misses something inter-specific**.

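The elimination argument in the last paragraph can be written down as a small decision function. This is only a sketch of the reasoning, with two assumptions: frame 1 is the keyframe, and "partially decodes" is simplified to "has any non-zero byte". The function name and return labels are hypothetical.

```python
from typing import Sequence


def classify_sweep(has_content: Sequence[bool]) -> str:
    """Map per-frame 'has non-zero content' flags to the failure class
    argued for above. Index 0 is assumed to be the keyframe."""
    if not has_content:
        raise ValueError("need at least one frame")
    keyframe, inter = has_content[0], has_content[1:]
    if all(has_content):
        return "no zero-frame symptom"
    if not any(has_content):
        # Matches both 'engine broken' and 'buffers misrouted'.
        return "engine failure or buffer misrouting"
    if keyframe and not any(inter):
        # The Bug 4 signature: keyframe lands, inter frames are empty.
        return "inter-specific setup gap"
    return "mixed"
```

For the 3-frame fixture the current signature is `[True, False, False]`, which lands in the inter-specific class.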
## Phase 2 source-read targets

For Phase 2 situation analysis:

- `src/h264.c` — full file. Particular focus on:
  - `dpb_find_invalid_entry`, `dpb_find_oldest_unused_entry`, `dpb_lookup`, `dpb_clear_entry`, `dpb_insert`, `dpb_update` (lines 55–217) — DPB management invariants.
  - `h264_fill_dpb` (line 228) — per-frame DPB control assembly.
  - `h264_va_picture_to_v4l2` (line 346) — picture-control fill.
  - `h264_va_slice_to_v4l2` (line 601) — slice-control fill. Suspicious area for H-C/H-F.
  - `h264_default_flat_scaling_matrix`, `h264_copy_pred_table` (lines 570, 579) — control auxiliary fillers.
  - `h264_get_controls`, `h264_set_controls` (lines 689, 797) — main dispatch.
- `src/h264_slice_header.c` — full file. Check whether inter-slice bit-positions parse correctly.
- `src/picture.c::codec_set_controls` (line 250 H.264 case) — dispatch site.
- `src/picture.c::codec_store_buffer` (lines 96, 131, 184) — H.264 buffer storage for picture/slice/matrix.
- `<linux/v4l2-controls.h>` — `V4L2_CID_STATELESS_H264_*` IDs + `struct v4l2_ctrl_h264_decode_params` + `struct v4l2_ctrl_h264_slice_params`.
- Kwiboo's FFmpeg `libavcodec/v4l2_request_h264.c` — kernel-direct reference at `~/src/ffmpeg-v4l2request` (or wherever the iter4 baseline lives).
- iter4 Phase 6 + Phase 7 commits — what was already fixed at iter4 (avoid re-investigating).

## Phase 3 baseline

Capture empirical baseline on current iter7 fork tip:

1. Run the 3-frame H.264 sweep through libva: hash `71ac099b…` (anchor).
2. Run the same fixture through kernel-direct ffmpeg-v4l2request: byte-identical to SW (`bbb_1080p30_h264.mp4` 3-frame, kdirect anchor TBD at Phase 3).
3. Strace both runs; diff the V4L2 ioctl streams. Particular interest:
   - `VIDIOC_S_EXT_CTRLS` payloads per frame (keyframe vs inter): which controls differ, which fields differ.
   - `MEDIA_REQUEST_IOC_QUEUE` ordering relative to `VIDIOC_QBUF` and S_EXT_CTRLS.
   - Per-frame request_fd allocation pattern (one fresh fd per frame, reuse pattern, MEDIA_REQUEST_IOC_REINIT cycles).
   - CAPTURE QBUF/DQBUF interleave: does libva-h264 DQBUF cleanly per inter frame, or does it sometimes get a wrong slot?
4. Byte-level frame-2 / frame-3 examination: is the all-zero output the cap_pool init pattern (constant 0x4c green fill) or true memory-zeroed? This discriminates between "kernel never wrote" and "kernel wrote but to wrong slot then got cleared".

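Step 4's discrimination comes down to asking whether a frame holds a single constant byte, and which one. A minimal sketch, taking the `0x4c` init value from the text above as the default; `classify_fill` and its labels are hypothetical.

```python
def classify_fill(frame: bytes, init_byte: int = 0x4C) -> str:
    """Label a frame buffer by its fill pattern.

    'zeroed'    : every byte is 0x00
    'init-fill' : every byte equals the pool init value
    'mixed'     : anything else, i.e. some real or partial content
    """
    if not frame:
        raise ValueError("empty frame")
    if frame.count(0) == len(frame):
        return "zeroed"
    if frame.count(init_byte) == len(frame):
        return "init-fill"
    return "mixed"
```

Running this per plane rather than per frame may be more informative if luma and chroma are initialized with different values, which the text above does not specify.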
iter5b Phase 3 already captured anchors at `iter5_phase3_baseline.tgz/anchors_h264/` — Phase 3 of iter8 may reuse those and supplement with kdirect comparison.

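A first pass over the two strace logs can ignore payloads and compare only the order of ioctl request names, which is enough to spot a missing or reordered `MEDIA_REQUEST_IOC_QUEUE`. The sketch below assumes strace prints lines of the form `ioctl(<fd>, <NAME>, ...)`. Some strace versions print undecoded requests as `_IOC(...)`, and those would show up here as `_IOC`. The function names are hypothetical.

```python
import difflib
import re
from typing import List

# Matches the request name in e.g. "ioctl(7, VIDIOC_QBUF, {...}) = 0".
IOCTL_RE = re.compile(r"ioctl\(\d+,\s*([A-Z_][A-Z0-9_]*)")


def ioctl_names(strace_text: str) -> List[str]:
    """Ordered list of ioctl request names found in a strace log."""
    return IOCTL_RE.findall(strace_text)


def diff_ioctl_streams(libva_log: str, kdirect_log: str) -> List[str]:
    """Unified diff of the two name sequences; empty if they match."""
    return list(difflib.unified_diff(
        ioctl_names(libva_log),
        ioctl_names(kdirect_log),
        fromfile="libva",
        tofile="kdirect",
        lineterm="",
        n=1,
    ))
```

Lines prefixed with `+` are calls present only in the kernel-direct run. Payload-level comparison of `VIDIOC_S_EXT_CTRLS` still needs the decoded arguments (for example from `strace -v`) and is not covered by this sketch.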
## Phase 4 plan shape (predicted)

iter4 work history says H.264 DPB / fresh-request_fd were already touched. The remaining bug is one of the surface items above (H-A through H-I). Mechanical path once root-caused:

- If H-A/H-B/H-G: ~5-30 LOC fix in `h264.c` dpb_* or h264_va_picture_to_v4l2.
- If H-C/H-F: ~10-40 LOC in `h264_va_slice_to_v4l2`, possibly a flag field or weight-table fill.
- If H-D: ~5-15 LOC in request.c per-frame lifecycle or h264_set_controls control batching.
- If H-E: 10-30 LOC in cap_pool.c slot rotation.
- If H-H: ~5 LOC in picture.c start_code gating.
- If H-I (kernel contract drift): may require finding a UAPI field added between v5.13 and v6.16-rc4 that libva backend doesn't fill. Same-iteration backend-only fix (no kernel work).

LOC estimate: 5-50 LOC in 1-3 files. One commit, possibly two if a control-fill helper is factored.

## Phase 5 review concerns to invite

- Sonnet-architect review of Phase 4 plan with empirical-over-theoretical discipline per `feedback_review_empirical_over_theoretical.md`.
- Re-verify each Phase 4 mechanism end-to-end on the consumer's code path per `feedback_trace_fix_mechanism_to_consumer.md`: producer (backend control fill) → primitive (V4L2 ioctl on the wire) → consumer (kernel rkvdec decode + CAPTURE-buffer fill). The Phase 4 plan must demonstrate the proposed fix reaches the inter-frame consumer site.
- Codec-state profile gating per `feedback_unconditional_codec_state.md`: any change must not silently break VP9 / VP8 / HEVC / MPEG-2.
- Strace-diff evidence requirement: Phase 4 plan must reference Phase 3 strace diff and identify a specific ioctl-stream divergence as the fix target. Theoretical-only hypotheses without strace anchoring are deferred.

## Phase 5 review note (for Phase 5 reviewer)

iter7 close added a deferred memory rule: **media-topology code should be validated against a live `MEDIA_IOC_G_TOPOLOGY` dump from the target hardware**. iter8 does NOT touch media-topology code, so this rule is informational only. Iter8's primary discipline anchor is `feedback_trace_fix_mechanism_to_consumer.md` (Phase 5 v2 iter5b reviewer's amendment).

## Predicted iter8 cadence

Medium. The bug has carried through iter4/iter5/iter5b/iter6/iter7 (5 iterations of deferral). The deferral reflects a lack of focused attention rather than difficulty (Bug 6 was harder). Once Phase 3 strace-diff isolates the divergence, Phase 4 is likely small.

- Phase 0: this doc.
- Phase 2: source-read h264.c + picture.c + h264_slice_header.c + kdirect reference. ~45-60 min.
- Phase 3: strace-diff libva-h264 vs kdirect-h264 + byte-level frame-2/3 examination. ~45-60 min.
- Phase 4: plan with strace-anchored mechanism. ~30 min.
- Phase 5: sonnet-architect review with empirical re-verification. ~30-45 min.
- Phase 6: implement, build, install. ~30-45 min.
- Phase 7: verify 6/6 criteria + 5-codec regression sweep. ~30 min.
- Phase 8: close. ~15 min.

Total: 4-5 hours wallclock, contingent on fresnel uptime.

## What "iteration 8 close" looks like

Per `feedback_dev_process.md` Phase 8:

- All 6 Phase 1 criteria green → clean PASS close.
- Or 5/6 (Bug 4 narrowed, not fixed; e.g. strace divergence located but kernel-side) → PARTIAL close with explicit narrowing documented; Bug 4 status downgraded from "carried unchanged" to "specifically narrowed to mechanism X".
- `phase8_iteration8_close.md` summarizing commit(s) + verification.
- Campaign scoreboard: H.264 site → PASS direct if criterion 1 green; stays PARTIAL if narrowed-but-unfixed.
- Memory entry candidates:
  - If Bug 4 root cause is generic (e.g., per-frame control batching contract), capture as a feedback rule for future codec work.
  - If Bug 4 fix points at a kernel UAPI contract drift, capture as a reference memory.

## Iteration scoreboard at iter8 open

```
Codec  | Site   | Iter         | Status         | Verifier path
=======|========|==============|================|====================================
H.264  | rkvdec | T4           | PARTIAL        | mpv keyframe-seek (Bug 4 inter race) ← iter8 TARGET
MPEG-2 | hantro | iter1        | PASS direct    | ffmpeg-vaapi-hwdownload
HEVC   | rkvdec | iter2        | DEGRADED *     | transitive PASS / direct FAIL (Bug 5)
VP8    | hantro | iter3        | PARTIAL        | transitive PASS / direct partial (Bug 6)
VP9    | rkvdec | iter4→iter5b | PASS direct ** | ffmpeg-vaapi-hwdownload (iter5b-β fix)
```

Auto-detect site infrastructure (iter4-B1a): closed at iter7. Multi-decoder routing (iter4-B1b): still backlog.

## Phase 1 → Phase 2 handoff

iter8 Phase 1 is locked above. iter8 Phase 2 reads H.264 backend source and produces a situation analysis with focus on the per-inter-frame submission path.