iter1 Phase 8 close: 2/5 codecs passing, 3 lessons distilled to memory

Iteration 1 closes with all five Phase 1 boolean-correctness criteria
green. Second codec passes — campaign scoreboard 1/5 → 2/5 (H.264
in T4, MPEG-2 in iter1). Loop terminates per
feedback_dev_process.md Phase 8.

What landed:

  Code (libva-v4l2-request-fourier master 65969da..229d6d1):
    e7dad7a iter1 Phase 6 commit A: config.c break for MPEG-2 cases
    5fe873c iter1 Phase 6 commit B: rewrite mpeg2.c against new V4L2 stateless API
    3aab187 iter1 Phase 6 commit C: delete staging-era include/mpeg2-ctrls.h
    229d6d1 iter1 Phase 6 commit D: drop missed mpeg2-ctrls.h include from context.c (fix-forward)

  All four authored as Claude (noether) per feedback_gitea_as_claude_noether.md.

  Campaign docs (fresnel-fourier):
    f720c77 iter1 Phase 0 + Phase 1 lock
    cc55a6e iter1 Phase 2 situation analysis (3 bugs)
    b9625af iter1 Phase 3 baseline measurements (4 baselines)
    3e996d0 iter1 Phase 4 plan (6 contract clauses)
    0e2e1c2 iter1 Phase 5 sonnet review (6 findings, 4 amendments)
    ec9133a iter1 Phase 7 verification (5/5 GREEN)
    [this commit] iter1 Phase 8 close

Three lessons distilled to durable memory:

  L1 — feedback_header_deletion_check.md
    Phase 2 grep audit found 2 of 3 include sites for the
    staging-era mpeg2-ctrls.h header; build broke on Commit C
    delete because of context.c:42. Rule: when removing a header,
    let the compiler enumerate includes authoritatively (clean
    rebuild after include-removal patches, before git rm). Grep
    is a hint; the compiler is the authority.

  L2 — feedback_review_empirical_over_theoretical.md
    Phase 5 reviewer S2 flagged a numerical mismatch in
    vbv_buffer_size between Baseline C (1.31 MB) and predicted
    post-fix (1 MB). I rejected with confident source-read
    reasoning (slot->size = sizeimage = matches). Phase 7
    byte-compare empirically showed the reviewer was right —
    post-fix produces 1 MB, not 1.31 MB. Operational impact nil
    (kernel ignores the field), but my Phase 5 rebuttal had a
    source-read gap. Rule: when a reviewer cites a concrete
    numerical discrepancy, defer to Phase 7 byte-compare; don't
    reject on source-read alone.

  L3 — feedback_rockchip_pixel_verify_path.md
    Iter1 + T4 (H.264) both empirically confirm: libva backend's
    vaDeriveImage / cached-mmap readback returns all-zero NV12
    on RK3399 — same iter1 patch-0011 cache-coherency bug class
    observed on RK3568. Pixel verification must use DMA-BUF GL
    import (mpv --vo=image, ffmpeg-v4l2request DRM_PRIME +
    hwdownload). NOT ffmpeg-vaapi+hwdownload (cache-stale on
    Rockchip). Codec-agnostic; applies to all 5 codecs in
    campaign scope.

Backlog items deferred (campaign-internal, not durable memory):

  B1: V4L2 /dev/videoN numbering shuffles across reboots
      (rkvdec moved video3+media1 → video1+media0 between
       Phase 0/3 and Phase 7). Backend should probe /dev/media*
      for driver match. Iter2+ Phase 4 cross-cutting candidate.

  B2: mpv --hwdec=vaapi-copy silently filters MPEG-2 out before
      libva is loaded. mpv --hwdec=vaapi (DMA-BUF) DOES engage.
      Phase 1 criterion 3 ended up anchored on ffmpeg-direct.
      Mpv-side investigation as separate follow-up.

  B3: Latent surface-reuse bug — picture.c:287 h264.matrix_set
      reset writes byte 240 of params union, lands inside
      mpeg2.iqmatrix.chroma_intra_quantiser_matrix[20] (verified
      via offsetof on fresnel via gcc + libva). Per-frame
      RenderPicture overwrites this byte for ffmpeg-vaapi flows
      that send VAIQMatrixBufferType every frame. Latent for
      VAAPI clients that reuse a surface without re-sending
      IQMatrix. Iter2+ candidate.

  B4: src/context.c:142-155 H.264 device-init runs unconditionally
      on every CreateContext, EINVALs on hantro. Intentional
      best-effort but request_log fires "Unable to set
      control(s)" cosmetically. Suppress-log candidate, low
      priority.

  B5: vbv_buffer_size = SOURCE_SIZE_MAX (1 MB) rather than
      negotiated sizeimage. Kernel ignores. Polish candidate.

Phase 4 cross-cutting work items collected:
  - Add VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC to libva backend
    image-export (fixes L3's vaDeriveImage cache-stale bug for
    all codecs).
  - V4L2 device-discovery probe (fixes B1).
  - Picture.c BeginPicture profile-aware reset (fixes B3).
  - Context.c H.264 device-init log suppression (fixes B4).

Campaign roadmap (codec iterations remaining):
  iter2: HEVC on rkvdec — re-enable h265.c in build, audit against
         rkvdec kernel HEVC contract.
  iter3: VP8 on hantro — implement vp8.c.
  iter4: VP9 on rkvdec — implement vp9.c (largest control surface).

Phase 5 review S2 historical-record correction: Phase 5 reviewer
was numerically right about vbv_buffer_size. My Phase 5 rebuttal
in phase5_iter1_review.md was empirically wrong. Acknowledged in
phase7_iter1_verification.md and phase8_iteration1_close.md;
Phase 5 doc preserved as-is for the historical record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-08 09:37:19 +00:00
parent ec9133a5e4
commit dc6937868a
+133
View File
@@ -0,0 +1,133 @@
# Iteration 1 — Phase 8 (close)
Iteration 1 of the fresnel-fourier campaign closes 2026-05-08 with all five Phase 1 boolean-correctness criteria green. **Second codec passes**: campaign scoreboard advances 1/5 → **2/5** (H.264 from T4, MPEG-2 from iter1).
## What iter1 fixed
MPEG-2 hardware-accelerated decode on Rockchip RK3399 / hantro-vpu-dec / libva-v4l2-request-fourier — bit-exact correct against software reference, when read via the cache-coherency-safe DMA-BUF GL import path.
Three independent bugs (per Phase 2 source-read), all in the libva backend (kernel + driver path was already solid per Phase 0 cross-validator sweep):
1. **`src/config.c:55-69` fall-through to `default:`** — MPEG-2 case labels existed but lacked `break;`, so `vaCreateConfig(VAProfileMPEG2Main)` returned `VA_STATUS_ERROR_UNSUPPORTED_PROFILE` (= 12). Fix: 5-line `break;` addition.
2. **`src/mpeg2.c` used staging-era UAPI** — `V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS` (= `V4L2_CID_MPEG_BASE+250` = 0x9909fa) and the combined slice_params struct. Mainline kernel removed these in favor of the split `V4L2_CID_STATELESS_MPEG2_{SEQUENCE,PICTURE,QUANTISATION}` (= 0xa409dc/dd/de) with three separate control structs. Fix: full rewrite (~120 lines) against the new API; one batched `VIDIOC_S_EXT_CTRLS count=3` per frame matching FFmpeg's `libavcodec/v4l2_request_mpeg2.c:130-155` reference.
3. **`include/mpeg2-ctrls.h` was the staging-era local header** — masking the kernel's modern `<linux/v4l2-controls.h>` definitions. Fix: delete the local header. (Required dropping `#include` from THREE files — `src/config.c:37`, `src/mpeg2.c:38`, and `src/context.c:42`. Phase 2 grep audit only found the first two; Commit D fix-forward caught the third.)
## Contract that was followed
Per `feedback_dev_process.md` Phase 6 contract-before-code, citations from authoritative sources, mapped 1:1 to the patch hunks (full detail in [`phase4_iter1_plan.md`](phase4_iter1_plan.md) Clauses 16):
- Linux mainline UAPI: `include/uapi/linux/v4l2-controls.h:1985-2105` for the new MPEG-2 stateless control IDs, struct layouts, and flag macros.
- FFmpeg Kwiboo branch: `libavcodec/v4l2_request_mpeg2.c:130-155` for the batched submission shape (3 `v4l2_ext_control` entries, single `VIDIOC_S_EXT_CTRLS`).
- Linux mainline driver: `drivers/media/platform/verisilicon/hantro_mpeg2.c::hantro_mpeg2_dec_copy_qtable` — confirms zigzag-to-raster permutation is kernel-side; libva backend's quantisation matrices stay in zigzag order (no permutation in userspace).
- Empirical anchor: Phase 3 Baseline C ([`phase0_evidence/2026-05-07/iter1_phase3/baseline_C_xvalidator/`](phase0_evidence/2026-05-07/iter1_phase3/baseline_C_xvalidator/)) — verbatim 256-byte QUANTISATION payload from ffmpeg-v4l2request decode of the BBB MPEG-2 fixture, used for byte-by-byte transcription of `mpeg2_default_intra_matrix[64]` (per Phase 5 S3 amendment that flagged spec-recall as unreliable).
## What landed where
Code commits on `git.reauktion.de/marfrit/libva-v4l2-request-fourier`:
```
229d6d1 fresnel-fourier iter1 Phase 6 commit D: drop missed mpeg2-ctrls.h include from context.c (fix-forward)
3aab187 fresnel-fourier iter1 Phase 6 commit C: delete staging-era include/mpeg2-ctrls.h
5fe873c fresnel-fourier iter1 Phase 6 commit B: rewrite mpeg2.c against new V4L2 stateless API
e7dad7a fresnel-fourier iter1 Phase 6 commit A: config.c break for MPEG-2 cases
65969da iter8 Phase 4: tests/run_perf_binding_cell.sh — perf binding cell harness (libva-multiplanar tip)
```
All four commits authored as `Claude (noether)` per memory `feedback_gitea_as_claude_noether.md`.
Campaign documents on `git.reauktion.de/marfrit/fresnel-fourier`:
```
ec9133a iter1 Phase 7: verification — all 5 criteria GREEN, second codec PASS
0e2e1c2 iter1 Phase 5: sonnet-architect review — 6 findings, 4 amendments
3e996d0 iter1 Phase 4: plan — contract clauses, diff scope, Phase 7 harness
cc55a6e iter1 Phase 2: situation analysis — three bugs in MPEG-2 path
b9625af iter1 Phase 3: baseline measurements — Phase 2 confirmed empirically
f720c77 iter1 Phase 0 + Phase 1 lock: MPEG-2 boolean correctness on hantro
```
(Plus campaign Phase 0 setup commits + this Phase 8 close.)
## Phase 1 binding-cell verification (Phase 7 verbatim)
| Criterion | Result |
|---|---|
| 1: vainfo MPEG-2 Simple+Main enumeration | ✅ both profiles with `VAEntrypointVLD` |
| 2: `vaCreateConfig(VAProfileMPEG2Main)` | ✅ `VA_STATUS_SUCCESS` (libva trace verbatim) |
| 3: ffmpeg-hwaccel-vaapi engages, exit 0 | ✅ 5 frames, exit 0, no `Failed to create decode configuration` |
| 4: DMA-BUF GL HW=SW byte-identical (+02s) | ✅ HW1=SW1=`6e7873030dbf...`, HW2=SW2=`ccc7ce08810d...`, frames differ |
| 5: T4 H.264 reference hashes match | ✅ HW + SW frames match `f623d5f7...` and `7d7bc6f2...` |
Bonus: post-fix `VIDIOC_S_EXT_CTRLS` payload structurally matches Baseline C cross-validator (count=3, ctrl_class=`0xf010000=V4L2_CTRL_CLASS_CODEC_STATELESS`, sizes 12/32/256, QUANTISATION intra_matrix bytes byte-identical). One numerical divergence in `vbv_buffer_size` (1 MB post-fix vs 1.31 MB cross-validator) — kernel-ignored informational field, non-blocking.
## Lessons (for Phase 8 memory update)
The 8-phase loop caught five lessons worth distilling. Three become durable memory entries (next section); two stay as campaign-internal backlog notes.
### Worth a memory entry
**L1 — Header deletion: completeness check is `git rm` + clean rebuild, not grep alone.**
Phase 2 source-read used `git grep "mpeg2-ctrls.h"` to enumerate include sites. The grep returned `src/config.c:37` and `src/mpeg2.c:38` — Phase 4 plan and Commit B + C scope expanded only those two files. After Commit C (`git rm include/mpeg2-ctrls.h`), the build broke on `src/context.c:42` — a third include the grep had missed (most likely because of grep tooling quirk + my path filtering). Cost: one fix-forward commit (D) and a moment of confused debugging.
**Rule**: when removing a header, schedule `git rm` *during* the patch series and verify with a clean rebuild. The compiler enumerates includes authoritatively; grep is best-effort. **Memory entry**: `feedback_header_deletion_check.md`.
**L2 — When the Phase 5 reviewer flags a numerical discrepancy, lean empirical, not theoretical.**
Phase 5 reviewer's S2 finding said: "post-fix `vbv_buffer_size` will be `SOURCE_SIZE_MAX` (1 MB), Baseline C shows 0x151800 (1.31 MB) — discrepancy unexplained." I responded with confident source-read reasoning that `slot->size = sizeimage = 1382400` ergo equality holds, and rejected the finding. Phase 7 byte-compare empirically showed: post-fix produces 1 MB, NOT 1.31 MB. The reviewer was numerically right; my source-read was wrong (the actual slot->size resolution path doesn't equal the negotiated sizeimage in our backend's S_FMT setup).
Operational impact: nil (kernel ignores `vbv_buffer_size` per the doc). But the historical record of "I rejected this finding with the wrong reason" is itself the lesson.
**Rule**: when a reviewer cites a concrete numerical mismatch between two captured payloads, run the empirical byte-compare in Phase 7 BEFORE rejecting on source-read grounds. Source-read can have gaps; verbatim-vs-verbatim comparison can't lie. **Memory entry**: `feedback_review_empirical_over_theoretical.md`.
**L3 — On RK3399 (and Rockchip in general?): `vaDeriveImage` cache-stale bug class is real and codec-agnostic.**
Both H.264 (T4) and MPEG-2 (iter1) decode paths produce all-zero NV12 when read via libva's `vaDeriveImage` + cached mmap (the path that ffmpeg-vaapi `+ hwdownload` uses). The kernel + driver decode is correct (verified by ffmpeg-v4l2request DRM_PRIME path producing matching pixels to SW reference). The bug is in the libva backend's image-export — same iter1 patch-0011 bug class observed on RK3568 (ohm) but pre-fixed there.
**Rule**: pixel verification on Rockchip platforms must use the DMA-BUF GL import path (`mpv --vo=image` or equivalent), NEVER the cached-mmap readback path. Treat ffmpeg-vaapi+hwdownload's correct-pixel guarantee as suspect on these SoCs until backend gains `VIDIOC_EXPBUF` + `DMA_BUF_IOCTL_SYNC` support. **Memory entry**: `feedback_rockchip_pixel_verify_path.md`.
### Stays as campaign-internal backlog (not durable memory)
**B1 — V4L2 `/dev/videoN` numbering shuffles across reboots on RK3399.** Phase 0/3 had `rkvdec=video3+media1, hantro=video5+media2`; this boot has `rkvdec=video1+media0, hantro=video3+media1`. Hardcoded env-var paths in Phase 1 binding cells are fragile. Backend should probe `/dev/media*` for `driver=hantro-vpu`/`rkvdec` rather than rely on env var stability. **Iter2+ Phase 4 cross-cutting candidate.**
**B2 — mpv `--hwdec=vaapi-copy` silently filters MPEG-2 out** before libva is loaded. mpv's hwdec policy or ffmpeg-mpv glue layer rejects MPEG-2 from the vaapi-copy candidates, so the binding cell never engaged the libva backend. mpv `--hwdec=vaapi` (DMA-BUF, not -copy) DOES engage. Phase 1 criterion 3 anchored on ffmpeg-direct as a result; mpv-vaapi-copy-MPEG-2 is a separate follow-up task (mpv-side investigation, possibly an `--hwdec-codecs` filter override).
### Latent bugs surfaced but deferred
**B3 — `src/picture.c:287` `params.h264.matrix_set = false` writes byte 240 of the surface's `params` union**, which (per offsetof verification on fresnel via gcc + libva) lands inside `mpeg2.iqmatrix.chroma_intra_quantiser_matrix[20]`. For ffmpeg-vaapi-driven flows this byte is overwritten by the per-frame `VAIQMatrixBufferType` (which `codec_store_buffer:140-146` copies wholesale). For VAAPI clients that reuse a surface without re-sending the IQMatrix on every cycle, the corruption persists. Latent bug; not exercised by iter1's binding cells. **Iter2+ candidate** — the fix is making the BeginPicture matrix_set reset profile-aware, or simpler `memset(&params, 0, sizeof params)`.
**B4 — `src/context.c:142-155` H.264 device-init runs unconditionally on every CreateContext** regardless of profile. Returns EINVAL on hantro-vpu-dec (which has no H.264 controls). Intentional best-effort behavior — return cast to `(void)` and discarded — but `src/v4l2.c:484: request_log("Unable to set control(s): %s\n", strerror(errno))` still fires unconditionally, so every MPEG-2 decode session prints `v4l2-request: Unable to set control(s): Invalid argument` once at init. Cosmetic only; iter2+ cleanup candidate (suppress the log when caller intent is best-effort silent fail).
**B5 — `vbv_buffer_size = SOURCE_SIZE_MAX = 1 MB`** rather than the kernel-negotiated `sizeimage`. Informational field, kernel ignores. Low-priority post-iter1 polish.
## What's next (campaign roadmap)
Per `phase0_evidence/2026-05-07/cross_validator_traces.md` suggested ordering, the campaign now has:
| Iter | Codec | Status |
|---|---|---|
| iter1 | MPEG-2 (hantro) | ✅ closed (this iteration) |
| iter2 | HEVC (rkvdec) | open — re-enable `h265.c` in build, audit against rkvdec kernel contract |
| iter3 | VP8 (hantro) | open — implement `vp8.c` |
| iter4 | VP9 (rkvdec) | open — implement `vp9.c` (largest kernel control surface) |
| iterN | Phase 4 cross-cutting | open — vaDeriveImage cache-stale fix; V4L2 device-discovery; B3-B5 polish |
Phase 8 doesn't pre-lock iter2; that decision is the next operator-driven moment. When iter2 opens, its Phase 0 substrate carries forward iter1's findings (especially B3-B5 if those iterations risk exercising them).
## Memory updates (next section)
Three new memory entries land alongside this close:
- `feedback_header_deletion_check.md` (L1)
- `feedback_review_empirical_over_theoretical.md` (L2)
- `feedback_rockchip_pixel_verify_path.md` (L3)
Plus an update to `phase5_iter1_review.md` to acknowledge S2 was numerically right (the historical-record correction).
Plus update to `MEMORY.md` index.
## Iteration close acknowledgement
Per `feedback_dev_process.md`'s Phase 8: "Loop terminates here." Iteration 1 of fresnel-fourier closes here. The 8-phase loop ran once end-to-end, no Phase 7 → Phase 4 loopbacks needed (one Phase 6 → Phase 4 mid-implementation surface for Commit D fix-forward). Phase 5 sonnet review caught two amendments worth landing (S3, Q4) plus one finding (S2) my response was wrong about. The campaign is on track.