fresnel-fourier/phase0_findings_iter8.md
marfrit e47a7ba309 iter8 Phase 0: lock Bug 4 — H.264 inter-frame race-loss
User pick at iter8 open. Carried unchanged through 5 iters (iter4..iter7);
keyframe partially decodes (frame-1 first 16 bytes = real chroma) while
inter frames return all-zero. Pass criterion: libva_h264 == kdirect_h264
== sw_h264 byte-identical for bbb_1080p30_h264.mp4 3-frame, including
inter frames.

In scope: src/h264.c, src/h264_slice_header.c, src/picture.c H.264 paths,
per-frame request_fd lifecycle. Out of scope: VP9/VP8/HEVC/MPEG-2, kernel
patches, performance, all other backlog items.

Substrate at iter8 open: fork tip 6df2159 (iter7), backend SHA 520507f6..,
kernel linux-fresnel-fourier 7.0-1, auto-detect picks rkvdec on every boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:15:45 +00:00


Iteration 8 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-13 immediately after iter7 clean PASS close (phase8_iteration7_close.md, commit b0ebe67). iter7 closed iter4-B1a (auto-detect decoder/encoder discrimination) with 5/5 Phase 1 criteria green; no codec-correctness axis movement. iter8 pivots to the longest-deferred consumer-impact bug.

User pick at iter8 Phase 0 lock: Bug 4 — H.264 inter-frame race-loss. Rationale per user AskUserQuestion answer: "Largest consumer impact (H.264 = most common codec). Backend code familiar from iter4 work. Likely DPB-related."

Locked research question (iteration 8, 2026-05-13)

"Identify and fix the cause of H.264 inter frames returning all-zero pixels through libva while keyframes partially decode. After fix: libva_h264.yuv == kdirect_h264.yuv == sw_h264.yuv byte-identical for the standard bbb_1080p30_h264.mp4 3-frame fixture, including the two inter frames (frames 2, 3) which today are wholly zero."

Pass/fail (boolean)

  1. H.264 libva == kdirect: cmp -s libva_h264.yuv kdirect_h264.yuv returns 0 across the 3-frame sweep. Inter frames carry real pixel content, not all-zero.
  2. VP9 unchanged: libva_vp9.yuv == kdirect_vp9.yuv == 4f1565e89cd720c4… (iter5b-β/iter7 PASS preserved).
  3. MPEG-2 unchanged: libva_mpeg2.yuv == kdirect_mpeg2.yuv == 19eefbf486e44496… (iter5b-β/iter7 maintained state preserved).
  4. HEVC unchanged: libva_hevc.yuv == 06b2c5a0c01e515d… all-zero (Bug 5 still deferred; no new HEVC regression).
  5. VP8 unchanged: libva_vp8.yuv == bcc57ed5c9021d02… partial (Bug 6 still deferred; no new VP8 regression).
  6. Control-payload anchors hold: VIDIOC_S_EXT_CTRLS payloads on the 5-codec sweep byte-match the iter5 Phase 3 anchors for the four non-H.264 codecs. H.264 payload anchor may shift — Phase 4 plan decides whether to lock the new H.264 anchor or accept the shift as part of the fix.

Clean iter8 close = all six criteria green. Bug 4 is the only NEW behavior; the other four codecs must hold their iter7 close state. Phase 7 → Phase 4 loopback per feedback_dev_process.md if any fail.
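
Criterion 1 reduces to a byte-for-byte comparison, which is what `cmp -s` reports through its exit status. A minimal C sketch of that check, assuming the three .yuv dumps sit on local disk; `files_identical` and `h264_outputs_match` are hypothetical helper names and not part of the backend:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true only if both files open and hold exactly the same bytes.
 * Mirrors the exit-status-0 case of `cmp -s a b`. */
bool files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    bool same = (fa != NULL && fb != NULL);

    while (same) {
        unsigned char buf_a[4096], buf_b[4096];
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);

        if (na != nb || memcmp(buf_a, buf_b, na) != 0)
            same = false;   /* length or content mismatch */
        else if (na == 0)
            break;          /* both streams ended together */
    }
    /* A read error is treated as "not identical" rather than as EOF. */
    if (same && (ferror(fa) || ferror(fb)))
        same = false;

    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

/* The locked question as a boolean: libva == kdirect == sw. */
bool h264_outputs_match(const char *libva, const char *kdirect, const char *sw)
{
    return files_identical(libva, kdirect) && files_identical(kdirect, sw);
}
```

Calling `h264_outputs_match("libva_h264.yuv", "kdirect_h264.yuv", "sw_h264.yuv")` returns true only when all three dumps agree; a missing or unreadable file counts as a mismatch, so a failed capture cannot pass by accident.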

Substrate state at iter8 open

Property                    | Value                                                                | Notes
============================|======================================================================|==========================================================================
Kernel                      | 7.0.0-fresnel-fourier (linux-fresnel-fourier 7.0-1)                  | Unchanged through iter5b/iter6/iter7.
Fork tip                    | 6df2159 (iter7 Phase 7 fix-forward)                                  | On noether + fresnel + gitea (claude-noether identity).
Backend installed (fresnel) | SHA 520507f6d0a1a7eb3797bed42c6f74e0f3a4826ac8a22ed2655e01a6f20aa874 | iter7 auto-detect + iter5b-β + Commit D.
Auto-detect                 | Picks /dev/video1 + /dev/media0 (rkvdec) on every fresh boot         | iter7 closed. No env override needed for H.264.
Bug 4 signature             | H.264 keyframe partially decodes; inter frames fully zero            | Unchanged since iter4 Phase 7; carried through iter5/iter5b/iter6/iter7.
Hash anchor                 | 71ac099b8d007836385b6776e6bbf891ddd7b79caad66775ff1fbb85657fb349     | Iter5b-β/iter6/iter7 stable anchor for bbb_1080p30_h264.mp4 3-frame.
Frame-1 first 16 bytes      | 81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81                      | Real chroma; keyframe partially landed.
Frame-2/3 content           | All zero pages                                                       | The race-loss signature.

Scope locks

In scope:

  • src/h264.c — full file (~994 LOC). DPB management, picture-to-V4L2 control conversion, slice header / weight-pred, set_controls dispatch.
  • src/h264_slice_header.c — slice-header bit-parser (361 LOC).
  • src/picture.c — H.264 dispatch in codec_set_controls, BeginPicture/EndPicture flow, buffer storage for VAProfileH264*.
  • src/surface.c::surface_bind_slot — destination_data fill (read-only verification; iter5b-β touched this).
  • src/request.c — per-frame request_fd lifecycle (iter4 area). Read-only unless re-binding required.
  • Kernel UAPI <linux/v4l2-controls.h> for H.264 — V4L2_CID_STATELESS_H264_* controls (decode_params, sps, pps, scaling_matrix, pred_weights, slice_params, decode_mode).
  • Reference: Kwiboo's FFmpeg downstream libavcodec/v4l2_request_h264.c (kernel-direct H.264 reference path; pixel-identical to SW per iter4 transitive proof).

Out of scope:

  • VP9 / VP8 / HEVC / MPEG-2 code paths (read-only for regression-verify).
  • Kernel patches (Bug 4 is a backend race per the empirical evidence: keyframe data DOES arrive, so kernel decode partially executes; inter frames go to zero, indicating a per-frame state-setup gap rather than a kernel decode-engine failure).
  • Performance metrics.
  • Bug 5, Bug 6, iter4-B1b, all other backlog items.
  • Front-end libva-multiplanar core.

Mechanism the question targets

H.264 backend code was substantially reworked at iter4 (DPB refactor, B-slice L1 reflist support, fresh request_fd per frame, slice-header parser fixes). After iter4, the keyframe started landing partially, with clearly real chroma in frame 1's first plane bytes. Inter frames, however, still return all zeros. That split indicates keyframe decode at least partly works while inter-frame setup is missing something the kernel needs.

Empirical hypotheses surface (Phase 2/3 will eliminate or confirm):

  • H-A — DPB short_ref / long_ref entries not pre-existing for inter frames at submission time. iter4 fixed dpb_find_invalid_entry (h264.c:55) + dpb_insert (h264.c:140) flow, but the order of operations across frames may leave inter frames with empty DPB at slice submission.
  • H-B — pic_num / frame_num / POC encoding mismatch on inter slices. h264_strip_ffmpeg_poc_sentinel (h264.c:219) strips FFmpeg's NULL-pic sentinel; check if it strips too aggressively on inter slices' reference picture lists.
  • H-C — Slice control flags field missing V4L2_H264_SLICE_FLAG_FIELD_PIC or similar for B/P slices. h264_va_slice_to_v4l2 (h264.c:601) assembles the slice control struct.
  • H-D — Per-inter-frame request_fd not allocated / not bound to controls correctly. iter4 fixed fresh request_fd per frame; verify each call reaches MEDIA_REQUEST_IOC_QUEUE with full control set for inter frames.
  • H-E — Slot rotation / capture-buffer rotation off-by-one across keyframe→inter transitions. iter5b-β touched the OUTPUT lifecycle but the cap_pool LRU recycling at cap_pool.c may bind inter frames to a slot that wasn't validly initialized.
  • H-F — Pred-weight-factors (slice control's pred-weight table) missing or zero-filled. h264_copy_pred_table (h264.c:579) — if VAAPI's slice param doesn't fill weight factors for inter, the v4l2 control may carry stale or all-zero weights, causing the kernel decoder to skip MC.
  • H-G — Scaling matrix not re-uploaded each frame. iter4 path: set_controls covers sps/pps/scaling on each frame, but check whether params.h264.matrix_set (picture.c:192) gates re-upload incorrectly.
  • H-H — VAAPI's slice-data buffer pointer for inter frames doesn't include a start_code prefix, or includes one when it shouldn't. iter5b's h264_start_code per-codec profile gating; verify the dispatch in picture.c:70 is firing correctly.
  • H-I — Some new contract introduced by linux-fresnel-fourier 7.0-1 kernel (v6.16-rc4 base + Bootlin out-of-tree backports). The keyframe partially landing makes a per-codec contract drift plausible — Phase 3 will diff libva-h264 vs ffmpeg-v4l2request-h264 ioctl streams.
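
H-A can be turned into a cheap tripwire at submission time: an inter frame whose DPB carries no usable reference has nothing to predict from. The sketch below assumes a simplified stand-in for the kernel's DPB entry; every `demo_` / `DEMO_` name and the flag values are illustrative only, and the real layout should be taken from <linux/v4l2-controls.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one DPB slot in the decode-params control.
 * Names and flag values are illustrative, not the kernel definitions. */
#define DEMO_DPB_ENTRIES      16
#define DEMO_DPB_FLAG_VALID   0x1u
#define DEMO_DPB_FLAG_ACTIVE  0x2u

struct demo_dpb_entry {
    uint64_t reference_ts;  /* timestamp of the buffer holding the reference */
    uint32_t flags;
};

/* Count slots that are both valid and active, i.e. usable as references. */
size_t demo_dpb_active_count(const struct demo_dpb_entry *dpb, size_t n)
{
    const uint32_t need = DEMO_DPB_FLAG_VALID | DEMO_DPB_FLAG_ACTIVE;
    size_t active = 0;

    for (size_t i = 0; i < n; i++)
        if ((dpb[i].flags & need) == need)
            active++;
    return active;
}

/* H-A tripwire: an inter frame submitted with zero active references
 * is flagged; intra-only frames pass regardless of DPB content. */
bool demo_dpb_ok_for_frame(const struct demo_dpb_entry *dpb, size_t n,
                           bool is_inter)
{
    return !is_inter || demo_dpb_active_count(dpb, n) > 0;
}
```

Logging the result of such a check for frames 2 and 3 just before controls are submitted would confirm or eliminate H-A without needing a strace diff first.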

The keyframe-partial pattern is the key cue. If the kernel decode engine were completely broken for H.264, the keyframe wouldn't have real bytes either. If only the kernel were misrouting buffers, all frames would show zero. The partial-keyframe / zero-inter split points at a per-frame setup gap that misses something inter-specific.

Phase 2 source-read targets

For Phase 2 situation analysis:

  • src/h264.c — full file. Particular focus on:
    • dpb_find_invalid_entry, dpb_find_oldest_unused_entry, dpb_lookup, dpb_clear_entry, dpb_insert, dpb_update (lines 55-217): DPB management invariants.
    • h264_fill_dpb (line 228) — per-frame DPB control assembly.
    • h264_va_picture_to_v4l2 (line 346) — picture-control fill.
    • h264_va_slice_to_v4l2 (line 601) — slice-control fill. Suspicious area for H-C/H-F.
    • h264_default_flat_scaling_matrix, h264_copy_pred_table (lines 570, 579) — control auxiliary fillers.
    • h264_get_controls, h264_set_controls (lines 689, 797) — main dispatch.
  • src/h264_slice_header.c — full file. Check whether inter-slice bit-positions parse correctly.
  • src/picture.c::codec_set_controls (line 250 H.264 case) — dispatch site.
  • src/picture.c::codec_store_buffer (lines 96, 131, 184) — H.264 buffer storage for picture/slice/matrix.
  • <linux/v4l2-controls.h>: V4L2_CID_STATELESS_H264_* IDs + struct v4l2_ctrl_h264_decode_params + struct v4l2_ctrl_h264_slice_params.
  • Kwiboo's FFmpeg libavcodec/v4l2_request_h264.c — kernel-direct reference at ~/src/ffmpeg-v4l2request (or wherever the iter4 baseline lives).
  • iter4 Phase 6 + Phase 7 commits — what was already fixed at iter4 (avoid re-investigating).

Phase 3 baseline

Capture empirical baseline on current iter7 fork tip:

  1. Run the 3-frame H.264 sweep through libva: hash 71ac099b… (anchor).
  2. Run the same fixture through kernel-direct ffmpeg-v4l2request: byte-identical to SW (bbb_1080p30_h264.mp4 3-frame, kdirect anchor TBD at Phase 3).
  3. Strace both runs; diff the V4L2 ioctl streams. Particular interest:
    • VIDIOC_S_EXT_CTRLS payloads per frame (keyframe vs inter): which controls differ, which fields differ.
    • MEDIA_REQUEST_IOC_QUEUE ordering relative to VIDIOC_QBUF and S_EXT_CTRLS.
    • Per-frame request_fd allocation pattern (one fresh fd per frame, reuse pattern, MEDIA_REQUEST_IOC_REINIT cycles).
    • CAPTURE QBUF/DQBUF interleave: does libva-h264 DQBUF cleanly per inter frame, or does it sometimes get a wrong slot?
  4. Byte-level frame-2 / frame-3 examination: is the blank output the cap_pool init pattern (constant 0x4c green fill) or truly zeroed memory? This discriminates between "kernel never wrote" and "kernel wrote but to wrong slot then got cleared".
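
Item 4 is a classification over the raw frame bytes. A sketch, assuming a frame has already been read into memory; `classify_fill` and the `fill_kind` enum are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

enum fill_kind {
    FILL_EMPTY,     /* zero-length buffer, nothing to classify */
    FILL_ZERO,      /* every byte is 0x00 */
    FILL_CONSTANT,  /* every byte equals one non-zero value (an init pattern) */
    FILL_VARIED     /* at least two distinct byte values: plausible pixel data */
};

/* Classify a buffer by its byte content. When the buffer is uniform and
 * value_out is non-NULL, the repeated byte is stored through value_out. */
enum fill_kind classify_fill(const uint8_t *buf, size_t len, uint8_t *value_out)
{
    if (len == 0)
        return FILL_EMPTY;

    for (size_t i = 1; i < len; i++)
        if (buf[i] != buf[0])
            return FILL_VARIED;

    if (value_out)
        *value_out = buf[0];
    return buf[0] == 0 ? FILL_ZERO : FILL_CONSTANT;
}
```

A frame-2 buffer that classifies as FILL_ZERO, versus FILL_CONSTANT with value 0x4c, separates the two cases named in item 4. The frame-1 head bytes recorded under substrate state classify as FILL_VARIED.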

iter5b Phase 3 already captured anchors at iter5_phase3_baseline.tgz/anchors_h264/ — Phase 3 of iter8 may reuse those and supplement with kdirect comparison.

Phase 4 plan shape (predicted)

iter4 work history says H.264 DPB / fresh-request_fd were already touched. The remaining bug is one of the surface items above (H-A through H-I). Mechanical path once root-caused:

  • If H-A/H-B/H-G: ~5-30 LOC fix in h264.c dpb_* or h264_va_picture_to_v4l2.
  • If H-C/H-F: ~10-40 LOC in h264_va_slice_to_v4l2, possibly a flag field or weight-table fill.
  • If H-D: ~5-15 LOC in request.c per-frame lifecycle or h264_set_controls control batching.
  • If H-E: 10-30 LOC in cap_pool.c slot rotation.
  • If H-H: ~5 LOC in picture.c start_code gating.
  • If H-I (kernel contract drift): may require finding a UAPI field added between v5.13 and v6.16-rc4 that the libva backend doesn't fill. Same-iteration backend-only fix (no kernel work).

LOC estimate: 5-50 LOC in 1-3 files. One commit, possibly two if a control-fill helper is factored.

Phase 5 review concerns to invite

  • Sonnet-architect review of Phase 4 plan with empirical-over-theoretical discipline per feedback_review_empirical_over_theoretical.md.
  • Re-verify each Phase 4 mechanism end-to-end on the consumer's code path per feedback_trace_fix_mechanism_to_consumer.md: producer (backend control fill) → primitive (V4L2 ioctl on the wire) → consumer (kernel rkvdec decode + CAPTURE-buffer fill). The Phase 4 plan must demonstrate the proposed fix reaches the inter-frame consumer site.
  • Codec-state profile gating per feedback_unconditional_codec_state.md: any change must not silently break VP9 / VP8 / HEVC / MPEG-2.
  • Strace-diff evidence requirement: Phase 4 plan must reference Phase 3 strace diff and identify a specific ioctl-stream divergence as the fix target. Theoretical-only hypotheses without strace anchoring are deferred.
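
The strace-anchoring requirement amounts to naming the first position where the two ioctl streams disagree. A minimal sketch, assuming each strace log has already been reduced to an ordered array of ioctl names (that reduction step is not shown); `first_divergence` is a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/* Index of the first position where two ioctl-name sequences differ.
 * Returns the shorter length if one trace is a strict prefix of the other,
 * and (size_t)-1 if the traces are identical. */
size_t first_divergence(const char *const *a, size_t na,
                        const char *const *b, size_t nb)
{
    size_t common = na < nb ? na : nb;

    for (size_t i = 0; i < common; i++)
        if (strcmp(a[i], b[i]) != 0)
            return i;
    return na == nb ? (size_t)-1 : common;
}
```

Comparing names only catches ordering and missing-call divergences. Payload differences inside a matching VIDIOC_S_EXT_CTRLS call would need a second pass over the decoded arguments.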

Phase 5 review note (for Phase 5 reviewer)

iter7 close added a deferred memory rule: media-topology code should be validated against a live MEDIA_IOC_G_TOPOLOGY dump from the target hardware. iter8 does NOT touch media-topology code, so this rule is informational only. Iter8's primary discipline anchor is feedback_trace_fix_mechanism_to_consumer.md (Phase 5 v2 iter5b reviewer's amendment).

Predicted iter8 cadence

Medium. The bug has carried through iter4/iter5/iter5b/iter6/iter7 (5 iterations of deferral). It's not because the bug is hard — Bug 6 was harder — but because it hasn't had focused attention. Once Phase 3 strace-diff isolates the divergence, Phase 4 is likely small.

  • Phase 0: this doc.
  • Phase 2: source-read h264.c + picture.c + h264_slice_header.c + kdirect reference. ~45-60 min.
  • Phase 3: strace-diff libva-h264 vs kdirect-h264 + byte-level frame-2/3 examination. ~45-60 min.
  • Phase 4: plan with strace-anchored mechanism. ~30 min.
  • Phase 5: sonnet-architect review with empirical re-verification. ~30-45 min.
  • Phase 6: implement, build, install. ~30-45 min.
  • Phase 7: verify 6/6 criteria + 5-codec regression sweep. ~30 min.
  • Phase 8: close. ~15 min.

Total: 4-5 hours wallclock, contingent on fresnel uptime.

What "iteration 8 close" looks like

Per feedback_dev_process.md Phase 8:

  • All 6 Phase 1 criteria green → clean PASS close.
  • Or 5/6 (Bug 4 narrowed, not fixed; e.g. strace divergence located but kernel-side) → PARTIAL close with explicit narrowing documented; Bug 4 status downgraded from "carried unchanged" to "specifically narrowed to mechanism X".
  • phase8_iteration8_close.md summarizing commit(s) + verification.
  • Campaign scoreboard: H.264 site → PASS direct if criterion 1 green; stays PARTIAL if narrowed-but-unfixed.
  • Memory entry candidates:
    • If Bug 4 root cause is generic (e.g., per-frame control batching contract), capture as a feedback rule for future codec work.
    • If Bug 4 fix points at a kernel UAPI contract drift, capture as a reference memory.

Iteration scoreboard at iter8 open

Codec   | Site      | Iter         | Status         | Verifier path
========|===========|==============|================|====================================
H.264   | rkvdec    | T4           | PARTIAL        | mpv keyframe-seek (Bug 4 inter race) ← iter8 TARGET
MPEG-2  | hantro    | iter1        | PASS direct    | ffmpeg-vaapi-hwdownload
HEVC    | rkvdec    | iter2        | DEGRADED *     | transitive PASS / direct FAIL (Bug 5)
VP8     | hantro    | iter3        | PARTIAL        | transitive PASS / direct partial (Bug 6)
VP9     | rkvdec    | iter4→iter5b | PASS direct ** | ffmpeg-vaapi-hwdownload (iter5b-β fix)

Auto-detect site infrastructure (iter4-B1a): closed at iter7. Multi-decoder routing (iter4-B1b): still backlog.

Phase 1 → Phase 2 handoff

iter8 Phase 1 is locked above. iter8 Phase 2 reads H.264 backend source and produces a situation analysis with focus on the per-inter-frame submission path.