iter2 Phase 8 close: 3/5 codecs passing, lesson L1 extended (BOTH directions)

Iteration 2 closes with all 5 Phase 1 boolean-correctness criteria
green. Third codec passes — campaign scoreboard 2/5 → 3/5 (H.264
in T4, MPEG-2 in iter1, HEVC in iter2). Loop terminates per
feedback_dev_process.md Phase 8.

Notable: ZERO Phase 7 → Phase 4 loopbacks needed. Phase 5 review
caught all 3 would-be loopback triggers in advance (data_byte_offset
rename, dpb.rps→index-arrays semantics, pic_order_cnt_val rename).
This is the dev-process ideal: review catches bugs before
implementation lands; verification confirms contract.

What landed:

  Code (libva-v4l2-request-fourier master 229d6d1 → 8d71e20):
    cca539d iter2 Phase 6 commit A: config.c break for HEVCMain case
    8d71e20 iter2 Phase 6 commit B: rewrite h265.c against new V4L2
            stateless HEVC API (6 files, 463 ins, 236 del)

  Both authored as Claude (noether) per feedback_gitea_as_claude_noether.md.

  Campaign docs (fresnel-fourier):
    6e8c970 iter2 Phase 0 + Phase 1 lock
    b3ba157 iter2 Phase 2 situation analysis (6 bugs)
    d35a247 iter2 Phase 3 baselines (substrate post-pacman-Syu + HEVC anchor)
    348736e iter2 Phase 4 plan (10 contract clauses)
    9eae068 iter2 Phase 5 sonnet review (3 Critical UAPI errors caught)
    05b4bd5 iter2 Phase 7 verification (5/5 GREEN)
    [this commit] iter2 Phase 8 close

Lesson L1 distilled to memory (extension of iter1 entry):

  feedback_review_empirical_over_theoretical.md updated with
  Direction 2 corollary. Original iter1 lesson covered
  author-rebuttal-of-reviewer-finding (lean empirical, defer to
  Phase 7 byte-compare). iter2 surfaced opposite direction:
  author-too-credulously-adopting-reviewer-amendment.

  Concrete iter2 instance: Phase 5 S1 suggested
  picture->pic_fields.bits.uniform_spacing_flag exists in VAAPI as
  source for V4L2_HEVC_PPS_FLAG_UNIFORM_SPACING. I adopted into
  amended plan without verifying. Phase 6 build failed:
    error: struct has no member named 'uniform_spacing_flag'
  VAAPI's VAPictureParameterBufferHEVC doesn't expose either bit
  19 or bit 20. Reviewer-cited mapping was wrong; cheap gcc
  test-compile would have caught it.

  Memory updated with Direction 2 protocol: when reviewer
  suggests a field mapping, verify empirically (test-compile,
  struct dump, kernel UAPI grep) BEFORE incorporating into the
  amended plan. Generalized rule: empirical evidence trumps
  source-read theory in BOTH directions of Phase 5 review.

Backlog items deferred (campaign-internal, not durable memory):

  B6 — VAAPI ↔ V4L2 SPS field-fidelity gaps:
    sps_max_num_reorder_pics (post-fix=0, baseline=2),
    sps_max_latency_increase_plus1 (post-fix=0, baseline=4),
    possibly PPS bit 12 ENTROPY_CODING_SYNC_ENABLED.
    VAAPI doesn't expose; FFmpeg parses from bitstream directly.
    Operational impact NIL (Phase 7 Criterion 4 byte-identical
    pixel pass). Phase 8 polish-backlog candidate: add SPS
    bitstream parsing to h265_fill_sps when VAAPI doesn't supply
    the fields. Probably low ROI — kernel HEVC handler tolerates
    0 values for the BBB fixture. Defer until a real-world consumer
    surfaces a fixture that breaks on this.

  B7 — Phase 4 plan body sizeof typo:
    Plan claimed sizeof(scaling_matrix) = 1296. Empirical = 1000
    bytes. Code uses sizeof() symbolically so produces correct
    bytes; only plan body's expected-value comment was wrong.
    Phase 7 byte-compare structural check caught it. Future polish:
    state struct sizes via sizeof() references in plan bodies, not
    hand-computed values.

iter1 carryover backlog (still deferred):

  B3 latent surface-reuse bug — picture.c:287 h264.matrix_set=false
      hits union byte 240. For HEVC: byte 240 lands in h265.picture.
      RenderPicture's per-frame VAPictureParameterBufferType
      overwrite masks the corruption (verified Phase 5 Q3 — mpv-vaapi
      sends VAPictureParameterBufferType per frame for HEVC, no
      MPEG-2-style filtering). Iter2+ Phase 4 cross-cutting candidate.

  B4 context.c H.264 device-init log noise (rkvdec accepts both
      H.264 + HEVC controls cleanly; on hantro both EINVAL with
      cosmetic log). Iter2 added a 2nd batched call for HEVC; same
      (void) swallow pattern. Cosmetic.

  B5 vbv_buffer_size 1MB vs 1.31MB (MPEG-2-only; not exercised by
      HEVC).

Phase 4 cross-cutting work items collected:
  - VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC for vaDeriveImage cache-stale
    fix (still applies to all codecs)
  - V4L2 device-discovery probe (still applies)
  - picture.c BeginPicture profile-aware reset (B3)
  - context.c H.264+HEVC device-init log suppression (B4)
  - mpeg2 vbv_buffer_size negotiation polish (B5)
  - h265 SPS bitstream-parse fidelity polish (B6)

Campaign roadmap (codec iterations remaining):
  iter3: VP8 on hantro — implement vp8.c. Smaller scope than iter2;
         predicting closer to iter1 MPEG-2 in size. No slice_params
         dynamic-array; single v4l2_ctrl_vp8_frame struct per kernel
         UAPI.
  iter4: VP9 on rkvdec — implement vp9.c. Largest control surface
         remaining.

Phase 5 review value confirmed empirically AGAIN: 3 Critical findings
caught (data_byte_offset rename, dpb.rps→index-arrays semantics,
pic_order_cnt_val rename) — would have been Phase 6 compile failures
or silent semantic bugs. Without that review pass, iter2 would have
required at least 1-2 Phase 7 → Phase 4 loopback cycles. Reviews
are never skippable per global ~/.claude/CLAUDE.md rule; iter2
exemplifies why.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-08 15:16:55 +00:00
parent 05b4bd56ec
commit df787a6cc2
+138
View File
@@ -0,0 +1,138 @@
# Iteration 2 — Phase 8 (close)
Iteration 2 of the fresnel-fourier campaign closes 2026-05-08 with all five Phase 1 boolean-correctness criteria green. **Third codec passes**: campaign scoreboard advances 2/5 → **3/5** (H.264 in T4, MPEG-2 in iter1, HEVC in iter2).
## What iter2 fixed
HEVC Main hardware-accelerated decode on Rockchip RK3399 / rkvdec / libva-v4l2-request-fourier — bit-exact correct against software reference, when read via the cache-coherency-safe DMA-BUF GL import path.
Six bugs (per Phase 2 source-read), all in the libva backend (kernel + driver path was already proven by Phase 0 cross-validator sweep). Iter1's three-bug pattern repeated for HEVC, plus three additional bugs unique to HEVC's larger contract surface:
1. **`src/config.c:67` HEVCMain fall-through** — case label existed but lacked `break;`, so `vaCreateConfig(VAProfileHEVCMain)` returned `VA_STATUS_ERROR_UNSUPPORTED_PROFILE` (12). Same iter1-Bug-1 pattern. Fix: 5-line `break;` (Commit A).
2. **`src/picture.c:204-206` HEVCMain explicit reject** — `case VAProfileHEVCMain: return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;` with stale `/* Fourier-local: HEVC stripped, no HW support on RK3566. */` comment. RK3566 was ohm context; fresnel is RK3399 where rkvdec does support HEVC. Fix: replace explicit reject with dispatch to `h265_set_controls()` (Commit B).
3. **`src/h265.c` used staging-era CIDs** — `V4L2_CID_MPEG_VIDEO_HEVC_PPS/SPS/SLICE_PARAMS` removed from kernel UAPI (mainline split into 7 controls: `V4L2_CID_STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,DECODE_MODE,START_CODE}`). Test-compile rejected the old CIDs. Fix: full 588-line rewrite (Commit B).
4. **HEVC slice_params is now a dynamic-array, not single-slot** — kernel rkvdec advertises `elems=1 dims=[600] dynamic-array`. Old h265.c memcpy-overwrote the single slot per VASliceParameterBufferType arrival; multi-slice frames lost data. Fix: per-slice append to `params.h265.slices[64]` array in `codec_store_buffer`, batched dynamic-array submission in `h265_set_controls` (Commit B).
5. **Missing controls in submission** — old code submitted only PPS+SPS+SLICE_PARAMS. New API needs DECODE_PARAMS (per-frame DPB info, NEW), SCALING_MATRIX (conditional), DECODE_MODE+START_CODE (device-wide). Fix: 5-control batched per-frame submission + 2-control device-wide init in `context.c` (Commit B).
6. **`src/meson.build` excluded h265.c + h265.h** — left commented out from the libva-multiplanar iter5 sweep. Fix: uncomment both (Commit B).
## Contract that was followed
Per `feedback_dev_process.md` Phase 6 contract-before-code: 10 contract clauses cited from authoritative sources, mapped 1:1 to the patch hunks (full detail in [`phase4_iter2_plan.md`](phase4_iter2_plan.md) + Phase 5 review amendments in [`phase5_iter2_review.md`](phase5_iter2_review.md)):
- Linux mainline UAPI `<linux/v4l2-controls.h>:2090-2300` for the 7 stateless HEVC control struct definitions, flag bit positions, and DPB entry semantics.
- FFmpeg Kwiboo branch `libavcodec/v4l2_request_hevc.c:505-565` for batched submission shape and `get_ref_pic_index()` DPB-classification pattern.
- Linux mainline driver `drivers/staging/media/rkvdec/`**NOT available** for HEVC (mainline 7.1-rc2 has only h264 + vp9 in staging rkvdec; HEVC is out-of-tree per Christian Hewitt patches). Contract anchor instead leans on UAPI doc + FFmpeg + Phase 3 cross-validator bytes.
- Empirical anchor: Phase 3 Baseline B verbatim ([`phase0_evidence/2026-05-08/iter2_phase3/`](phase0_evidence/2026-05-08/iter2_phase3/)) — verbatim 5-control batched submission per frame, used to validate Phase 7 byte-compare.
## What landed where
**Code commits on `git.reauktion.de/marfrit/libva-v4l2-request-fourier`** (master 229d6d1 → 8d71e20):
```
8d71e20 fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API
(6 files, 463 insertions, 236 deletions: src/h265.c rewrite + src/picture.c
dispatch+accumulation+reset + src/surface.h slices[64] extension + src/h265.h
HEVC_MAX_SLICES_PER_FRAME #define + src/context.c HEVC device-init + src/meson.build
uncomment h265.c+h265.h)
cca539d fresnel-fourier iter2 Phase 6 commit A: config.c break for HEVCMain case
229d6d1 fresnel-fourier iter1 Phase 6 commit D (libva-multiplanar tip + iter1 work)
```
Both iter2 commits authored as `Claude (noether)` per memory `feedback_gitea_as_claude_noether.md`.
**Campaign documents on `git.reauktion.de/marfrit/fresnel-fourier`**:
```
05b4bd5 iter2 Phase 7: verification — all 5 criteria GREEN, third codec PASS
9eae068 iter2 Phase 5: sonnet review — 3 critical UAPI errors caught, 7 amendments
348736e iter2 Phase 4: plan — 10 contract clauses, ~400-line h265.c rewrite
d35a247 iter2 Phase 3: baselines — substrate verified post-upgrade, HEVC anchor captured
b3ba157 iter2 Phase 2: situation analysis — six bugs in HEVC path
6e8c970 iter2 Phase 0 + Phase 1 lock: HEVC Main on rkvdec
```
(Plus this Phase 8 close.)
## Phase 1 binding-cell verification (Phase 7 verbatim)
| Criterion | Result |
|---|---|
| 1: vainfo `VAProfileHEVCMain` enumeration on rkvdec bind | ✅ enumerated, no regression |
| 2: `vaCreateConfig(VAProfileHEVCMain)` | ✅ `VA_STATUS_SUCCESS` (libva trace verbatim) |
| 3: ffmpeg-hwaccel-vaapi exit 0 | ✅ 5 frames, no Failed-to-create, cap_pool engaged |
| 4: DMA-BUF GL HW=SW byte-identical (+02s) | ✅ HW1=SW1=`47a5f3850df5...`, HW2=SW2=`a467b3bc9d7b...`, frames differ |
| 5: iter1 MPEG-2 + T4 H.264 reference hashes | ✅ all 4 hashes match references byte-identical |
Bonus: post-fix `VIDIOC_S_EXT_CTRLS` payload structurally matches Baseline B cross-validator (count=5, ctrl_class=`V4L2_CTRL_CLASS_CODEC_STATELESS`, sizes 40/64/280/1000/328). Two informational SPS field-value divergences (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`) — VAAPI doesn't expose; non-blocking per Criterion 4 byte-identical pixel pass.
## Lessons (for Phase 8 memory update)
The 8-phase loop ran cleanly: zero Phase 7 → Phase 4 loopbacks. **Phase 5 review caught all the would-be loopback triggers in advance** — 3 Critical findings (data_byte_offset rename, dpb.rps→index-arrays semantics, pic_order_cnt_val rename) that would have been Phase 6 compile failures or silent semantic bugs.
### Worth a memory update
**L1 — Empirical-over-theoretical applies in BOTH directions of Phase 5 review**
iter1 lesson `feedback_review_empirical_over_theoretical.md` codified: when a Phase 5 reviewer cites a numerical mismatch, lean empirical (defer to Phase 7 byte-compare) instead of rejecting on source-read theory. iter2 surfaced the **reverse direction**: when a reviewer SUGGESTS a field mapping that the original plan was missing, the author should also verify the suggestion empirically before adopting.
Concretely: Phase 5 review S1 suggested `picture->pic_fields.bits.uniform_spacing_flag` exists in VAAPI as the source for `V4L2_HEVC_PPS_FLAG_UNIFORM_SPACING`. I incorporated the suggestion into the Phase 4 plan amendments without verifying. Phase 6 build failed:
```
../src/h265.c:224:37: error: 'struct <anonymous>' has no member named 'uniform_spacing_flag'
```
Test-compile would have caught it before Phase 6. The lesson generalizes the iter1 rule: **empirical evidence trumps source-read theory in BOTH directions** — author-rebuttals AND reviewer-suggested-amendments. The right discipline:
- Reviewer flags a numerical mismatch → defer to Phase 7 byte-compare; don't rebut on source-read.
- Reviewer suggests a field mapping → empirically verify (gcc test-compile, struct dump, kernel UAPI grep) before incorporating into the plan.
**Memory action**: extend the existing `feedback_review_empirical_over_theoretical.md` entry with this iter2-derived corollary. Keep the existing iter1 example, add the iter2 example.
### Stays as campaign-internal backlog (not durable memory)
**B6 — VAAPI ↔ V4L2 SPS field-fidelity gap (low-priority polish)**: post-iter2 SPS bytes diverge from Baseline B on `sps_max_num_reorder_pics` (post-fix=0, baseline=2) and `sps_max_latency_increase_plus1` (post-fix=0, baseline=4). VAAPI's `VAPictureParameterBufferHEVC` doesn't expose these; FFmpeg parses them from the bitstream's SPS header. Non-blocking (Criterion 4 byte-identical pixel pass). Phase 8 polish-backlog candidate: add SPS bitstream parsing to `h265_fill_sps` to extract these fields from the slice OUTPUT buffer when VAAPI doesn't supply them. Probably low ROI — kernel HEVC handler tolerates 0 values for the BBB fixture. Defer until a real-world consumer surfaces a fixture that breaks on this.
**B7 — Phase 4 plan body had a `sizeof()` typo**: claimed `sizeof(struct v4l2_ctrl_hevc_scaling_matrix) = 1296`; empirical sizeof is 1000 = 96 + 384 + 384 + 128 + 6 + 2. Code uses `sizeof()` symbolically so produces correct bytes; only the plan body's expected-value comment was wrong. Phase 7 byte-compare's structural check caught this (plan-body number wrong, code right). Future polish: state struct sizes via `sizeof()` references in plan bodies, not hand-computed values; or have Phase 7 byte-compare always validate sizes via runtime sizeof.
**B8 — Phase 5 amendment empirically wrong (uniform_spacing_flag)**: same as L1 above; documented inline in `src/h265.c::h265_fill_pps`. Already fixed (PPS flag bits 19+20 left zero — VAAPI doesn't expose either). Lesson is durable (L1); the bug is fixed.
### Latent bugs surfaced but deferred (carryover)
**iter1 B3 — picture.c:287 `params.h264.matrix_set = false` writes union byte 240** — for HEVC surfaces, byte 240 lands inside `h265.picture` (Phase 2 Bug 8 verified). RenderPicture's `VAPictureParameterBufferHEVC` per-frame copy overwrites the corrupted byte. Phase 7 confirmed mpv-vaapi sends VAPictureParameterBufferType per frame for HEVC (no silent filtering like vaapi-copy did for MPEG-2), so the masking holds for iter2 binding cells. Latent for VAAPI clients that reuse a surface without re-sending picture params. iter2+ Phase 4 cross-cutting backlog.
**iter1 B4 — `src/context.c` H.264 device-init log noise on hantro** — extended in iter2 Commit B with HEVC device-init in a separate batched call. New behavior on rkvdec: BOTH H.264 and HEVC device-init batches succeed (rkvdec advertises both control families). On hantro: H.264 batch EINVALs (existing noise), HEVC batch ALSO EINVALs (HEVC controls don't exist on hantro). Net: cosmetic log line might fire twice on hantro instead of once. Same intentional `(void)v4l2_set_controls(...)` swallow pattern; not blocking.
**iter1 B5 — vbv_buffer_size 1 MB vs negotiated 1.31 MB**: not exercised by HEVC (HEVC's SPS doesn't have an analogous field). MPEG-2-only.
## What's next (campaign roadmap)
Per `phase0_evidence/2026-05-07/cross_validator_traces.md` suggested ordering, the campaign now has:
| Iter | Codec | Status |
|---|---|---|
| iter1 | MPEG-2 (hantro) | ✅ closed |
| iter2 | HEVC (rkvdec) | ✅ closed (this iteration) |
| iter3 | VP8 (hantro) | open — implement `vp8.c` |
| iter4 | VP9 (rkvdec) | open — implement `vp9.c` (largest control surface) |
| iterN | Phase 4 cross-cutting | open — vaDeriveImage cache-stale fix; V4L2 device-discovery; B3 BeginPicture profile-aware reset; B4 context.c log suppression; B5 vbv_buffer_size; B6 SPS bitstream-parse fidelity polish |
Phase 8 doesn't pre-lock iter3; that's the next operator-driven moment. When iter3 opens, its Phase 0 substrate carries forward iter2's findings (especially B6, B7, B8 polish items if convenient to bundle with the VP8 work).
Predicted iter3 difficulty (VP8): smaller than iter2 (VP8 has fewer controls than HEVC). Closer to iter1 MPEG-2 in scope. Reuse the per-frame batched submission pattern; no slice_params dynamic-array (VP8 is frame-mode); no DECODE_PARAMS-equivalent (VP8 control state fits in a single `v4l2_ctrl_vp8_frame` struct per kernel UAPI).
## Memory updates
One extension to an existing memory entry (no new files):
- **Extend `feedback_review_empirical_over_theoretical.md`** with iter2's reverse-direction example (reviewer-suggested mapping that proved empirically wrong on test-compile). Same rule, both directions.
## Iteration close acknowledgement
Per `feedback_dev_process.md`'s Phase 8: "Loop terminates here." Iteration 2 of fresnel-fourier closes here. The 8-phase loop ran end-to-end with **zero Phase 7 → Phase 4 loopbacks** — Phase 5 review caught all the would-be loopback triggers in advance (3 Critical UAPI errors). This is the dev-process ideal: review catches the bugs before the implementation lands; verification confirms the contract; close documents both the substrate and the lessons.
Phase 5 review value confirmed empirically. The campaign continues against the corrected memory and the 3-codec-passing scoreboard.