iter3 Phase 8 close: 4/5 codecs passing, 3 new memory entries

distilled, 0 Phase 7 → Phase 4 loopbacks

iter3 = VP8 on hantro-vpu-dec via libva-v4l2-request-fourier on
RK3399 (fresnel / Pinebook Pro). Fourth codec to ship.

Final state:

  Fork tip: e1aca9c (post iter2 close 8d71e20 + 4 commits)
  Phase 1 criteria: 5/5 GREEN (4 direct + 1 transitive)
  LOC delta: +373 across 7 files (2 new + 5 modified)
  Phase 7 → Phase 4 loopbacks: 0
  Phase 6 fix-forwards: 1 (Commit D buffer.c allow-list)
  Phase 5 review findings: 4 Critical, all empirically validated

Lessons distilled to memory (3 NEW entries):

  feedback_hw_decode_engagement_check.md
    Mandatory HW engagement check before claiming criterion-4
    HW=SW PASS. mpv silently falls back to SW for some codec/
    backend combos. Use lsof/strace/mpv -v/ffmpeg log to verify
    HW path actually engaged. Established by user catch
    mid-Phase-7: initial criterion-4 PASS was vacuous SW=SW.

  reference_dmabuf_resv_blocker.md
    Cross-campaign blocker. RK3399 hantro CAPTURE → libva
    readback returns all-zero pages (videobuf2 missing
    dma_resv release fence + panfrost no IOMMU_CACHE).
    Tracked at git.reauktion.de/marfrit/dmabuf-modifier-triage/
    issues/2. vb2_dma_resv kernel patches in flight (RFC v2,
    2026-04 linux-media). Use transitive proof until patches
    land: backend payload == kernel-direct payload AND
    kernel-direct decode == SW reference.

  feedback_runtime_enumerates_allowlists.md
    Sibling to feedback_header_deletion_check.md. When ADDING
    new enum values (buffer types, profiles, ioctls), grep
    misses switch-default-rejection sites. Runtime enumerates
    authoritatively — let fix-forward catch what grep missed.
    Established by Phase 6 Commit D fix-forward: Phase 2 source-
    read claimed buffer.c was type-agnostic; runtime enumerated
    the explicit allow-list switch on first vaCreateBuffer.

Phase 5 amendments empirically validated (all 4 Critical correct):

  C1 first_part_header_bits = slice->macroblock_offset → 6550 ✓
  C2 first_part_size = partition_size[0]+ceil(macroblock_offset/8)
     → 22742 ✓ (= 21923 + 819, exact match for Phase 3 anchor)
  C3 VAProbabilityBufferType (not VAProbabilityDataBufferType)
     → compiled clean post-Commit-D
  C4 (int8_t) cast (not (s8)) → compiled clean Commit B first try

Estimated savings without Phase 5 review: 2 Phase 6 compile-fail/
fix-forward cycles (C3 + C4) + 1 Phase 7 → Phase 4 loopback (C1
+ C2 hardware-DMA-offset bug, would have produced visible-but-
corrupt output). Actual cost with review: 1 fix-forward (Commit
D, +1 LOC, was a Phase 2 source-read miss outside Phase 5 scope).

Cross-cutting backlog updates:

  iter3-Q1 first_part_header_bits → CLOSED by Phase 5 C1
  iter3-flags-anomaly bit 0x40 → not iter3 scope; kernel ignores
  iter3-criterion-4-readback → blocked on dmabuf-modifier-triage
                                iter1; transitive proof used
  iter3-mpv-vp8-fallback → mpv 0.41.0 falls back to SW for VP8;
                            consumer-side, not backend; verify
                            via chrome-fourier when convenient

Inherited backlog (B1, B3, B4, B5, B6, L3) — no closures from
iter3.

Campaign scoreboard: 3/5 → 4/5 codecs passing.

  H.264   | rkvdec | T4    | PASS direct
  MPEG-2  | hantro | iter1 | PASS direct
  HEVC    | rkvdec | iter2 | PASS direct
  VP8     | hantro | iter3 | PASS transitive (readback blocked)
  VP9     | rkvdec | iter4 | PENDING

iter4 (VP9 on rkvdec) prediction: comparable scope to iter2 HEVC
(VP9 has compressed-header control + probability state).
~400-500 LOC, 3-4 commits + 1 fix-forward. mpv may engage HW for
VP9 (different from VP8 fallback) — verify at iter4 Phase 0.

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (situation analysis)
  phase3_iter3_baseline.md (verbatim payload anchors)
  phase4_iter3_plan.md (10 contract clauses + Phase 5 amendments)
  phase5_iter3_review.md (4 Critical, all validated correct)
  phase7_iter3_verification.md (4 direct + 1 transitive PASS)
  Fork commits 27d82e3 + 017e27f + 7f84bbb + e1aca9c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-08 23:31:23 +00:00
parent afb9b1450f
commit d5d4beb64d
+149
View File
@@ -0,0 +1,149 @@
# Iteration 3 — Phase 8 (close)
Closes 2026-05-08. iter3 = VP8 on hantro-vpu-dec via libva-v4l2-request-fourier on RK3399 (fresnel / Pinebook Pro). Fourth codec to ship in the campaign.
## Summary
| Metric | Value |
|---|---|
| Codec target | VP8 Profile 0 (VAProfileVP8Version0_3) |
| Hardware | RK3399 hantro-vpu-dec (`/dev/video5+/dev/media2`) |
| Fork tip start (iter2 close) | `8d71e20` |
| Fork tip end (iter3 close) | `e1aca9c` |
| Commits on fork | 4 (A, B, C, D) |
| LOC delta | +373 across 7 files (2 NEW + 5 modified) |
| Phase 1 criteria | 5/5 GREEN (4 direct + 1 transitive) |
| Phase 7 → Phase 4 loopbacks | 0 |
| Phase 6 fix-forwards | 1 (Commit D buffer.c allow-list) |
| Phase 5 review findings | 4 Critical (all empirically validated correct) |
| Campaign scoreboard | 3/5 → 4/5 codecs passing |
## Commits shipped
### Fork (libva-v4l2-request-fourier)
| Commit | Files | LOC | Description |
|---|---|---|---|
| `27d82e3` (A) | `src/config.c` | +16 | VP8 enumeration block + RequestCreateConfig case + RequestQueryConfigEntrypoints case |
| `017e27f` (B) | `src/vp8.c` (NEW), `src/vp8.h` (NEW), `src/meson.build` | +307 | VP8 codec dispatcher implementing all 10 Phase 4 contract clauses |
| `7f84bbb` (C) | `src/picture.c`, `src/surface.h` | +50 | Dispatcher wiring + 4 buffer-type cases + new `VAProbabilityBufferType` outer case + per-frame reset + params union extension |
| `e1aca9c` (D) | `src/buffer.c` | +1 | Fix-forward — `VAProbabilityBufferType` whitelist (Phase 2 source-read miss) |
### Campaign repo (fresnel-fourier)
| Commit | Phase | Description |
|---|---|---|
| `ea2413e` | Phase 1 | Lock — 5 boolean criteria, predicted scope ~260-340 LOC |
| `898544a` | Phase 2 | Situation — 10 bug sites, contract surface, 30-field mapping table |
| `fd3fce8` | Phase 3 | Baseline — verbatim cross-validator strace, 6/6 regression hashes hold, 5/6 open questions answered |
| `2918dda` | Phase 4 | Plan — 10 contract clauses, 3-commit predicted shape |
| `656596a` | Phase 5 | Review — 4 Critical findings, 4 amendments incorporated |
| `afb9b14` | Phase 7 | Verification — 4 direct + 1 transitive PASS |
## Lessons distilled to memory
### NEW: `feedback_hw_decode_engagement_check.md`
When verifying criterion 4 (HW=SW byte-identical), separately prove HW decode actually engaged. mpv silently falls back to software for some codec/backend combinations, in which case HW=SW reduces to SW=SW (vacuous). Use `lsof /dev/video<HW>` / strace / mpv `-v` log / `ffmpeg -v info` to verify HW path engagement BEFORE byte-comparing.
**Established by**: user catch mid-Phase-7. Initial criterion-4 PASS report was based on mpv `--hwdec=vaapi --vo=image` HW=SW match. mpv silently fell back to SW for VP8 (works for MPEG-2 + HEVC). The "match" was SW=SW. User redirect: "or that hw decode did not work and the software fallback produced the same image twice."
**Generalization**: empirical equality between two pipelines proves nothing about which pipeline produced the bytes. Always pin the source-of-truth variable before comparing.
### NEW: `reference_dmabuf_resv_blocker.md`
Cross-campaign blocker reference. RK3399 hantro CAPTURE → libva readback returns all-zero pages (SHA `b34860e0...` = SHA of all-zero 1382400-byte block) due to videobuf2 missing `dma_resv` release fence + panfrost `IOMMU_CACHE` absence. Tracked at `git.reauktion.de/marfrit/dmabuf-modifier-triage/issues/2`. `vb2_dma_resv` kernel patches in flight (RFC v2, 2026-04 linux-media).
**How to verify criterion 4 when blocker active**: transitive proof — (A) capture libva backend's `S_EXT_CTRLS` payload via strace, (B) byte-compare against kernel-direct `ffmpeg -hwaccel v4l2request` payload from Phase 3 baseline, (C) verify kernel-direct decode = SW byte-identical. ∴ A==B AND C ⇒ libva backend decode is correct, even though direct readback is blocked.
### NEW: `feedback_runtime_enumerates_allowlists.md`
Sibling to `feedback_header_deletion_check.md`. When ADDING a new enum value (buffer type, profile, ioctl), grep audits miss switch-default-rejection sites. The runtime authoritatively enumerates allow-list violations at first call — let it fix-forward rather than mine for sites with grep.
**Established by**: Phase 6 Commit D fix-forward. Phase 2 source-read claimed `buffer.c` was type-agnostic. False: `RequestCreateBuffer:59-70` had an explicit `switch(type){case...:break; default:return ERROR;}` allow-list. ffmpeg-vaapi hit it at first `vaCreateBuffer` call with `Failed to create parameter buffer (type 13)`. +1 LOC fix-forward.
**Generalization**: same principle as compiler-enumerates-includes (`feedback_header_deletion_check.md`). Both are "trust the toolchain, not grep, to enumerate the gating sites."
## Phase 5 review track-record acknowledgment
Per memory `feedback_review_empirical_over_theoretical.md`: each review's findings should be acknowledged at Phase 8 close. iter3 review (`phase5_iter3_review.md`, commit `656596a`) found 4 Critical bugs:
| ID | Reviewer claim | Phase 7 empirical result |
|---|---|---|
| C1 | `first_part_header_bits = 0` is unsafe; kernel `hantro_g1_vp8_dec.c:260` reads it unconditionally; correct value is `slice->macroblock_offset` | **CORRECT**. Backend produces `6550` for keyframe — byte-matches Phase 3 anchor. Without C1, decode would have placed hardware at wrong DMA offset. |
| C2 | `first_part_size = partition_size[0]` short by `ceil(macroblock_offset/8)` bytes; `vaapi_vp8.c:209` confirms VAAPI subtracts these bytes; recover via `+= ceil(macroblock_offset/8)` | **CORRECT**. Backend produces `22742` for keyframe — byte-matches Phase 3 anchor (21923 + 819 = 22742). Without C2, DCT partitions would have started 819 bytes too early. |
| C3 | `VAProbabilityDataBufferType` doesn't exist; correct enum is `VAProbabilityBufferType` (= 13 per `va.h:2058`) | **CORRECT**. Compiled cleanly post-Commit-D fix-forward. Without C3, Phase 6 would have failed first compile. |
| C4 | `(s8)` cast is kernel-internal; userspace UAPI exposes `__s8`; portable cast is `(int8_t)` from `<stdint.h>` | **CORRECT**. Compiled cleanly Commit B first try. Without C4, Phase 6 would have failed first compile. |
All 4 Critical findings empirically correct. Reviewer also empirically verified the proposed amendments (gcc-test-compile + grep) before recommending them — Direction 2 of `feedback_review_empirical_over_theoretical.md` honored. Author independently verified each before incorporation.
Estimated savings without the review pass: 2 Phase 6 compile-fail / fix-forward cycles (C3 + C4) + 1 Phase 7 → Phase 4 loopback (C1 + C2 hardware-DMA-offset bug, would have produced visible-but-corrupt output). Actual cost with review: 1 fix-forward (Commit D, mechanical, +1 LOC, was a Phase 2 source-read miss not caught by Phase 5 because it was outside review scope).
## Phase 4 cross-cutting backlog status
Items inherited from iter1+iter2 (NOT touched in iter3):
- **B1** V4L2 device-discovery (per-boot device numbering shuffle) — workaround held: re-verified `/dev/video5+/dev/media2` for hantro at session start.
- **B3** picture.c BeginPicture profile-aware reset — iter3 added vp8 reset lines without refactoring the cross-profile cleanup.
- **B4** context.c log suppression for unsupported codec controls — `Unable to set control(s)` log lines from iter1+iter2's H.264/HEVC device-init fired during iter3 too (hantro doesn't support those codecs); confusing but non-fatal.
- **B5** mpeg2 vbv_buffer_size polish (iter1 S2 finding) — pending.
- **B6** h265 SPS bitstream-parse fidelity gap — pending.
- **L3** vaDeriveImage cache-stale on RK3399 — workaround held: DMA-BUF GL only via mpv (when mpv engages). For VP8 + future codecs where mpv falls back, see new `reference_dmabuf_resv_blocker.md`.
iter3 NEW items added to backlog:
- **iter3-Q1** `first_part_header_bits` derivation gap → CLOSED by Phase 5 C1 amendment.
- **iter3-flags-anomaly** ffmpeg-v4l2-request-git sets undocumented bit `0x40` + `EXPERIMENTAL` on inter frames → not iter3 scope; kernel hantro_vp8.c only reads KEY_FRAME bit; libva backend correctly skips.
- **iter3-criterion-4-readback** kernel-side `vb2_dma_resv` blocker → tracked at `dmabuf-modifier-triage/issues/2`; when patches land in `linux-eos-arm`, re-run direct verification for VP8 + any future codec where mpv falls back. Until then, transitive-proof verification per `reference_dmabuf_resv_blocker.md`.
- **iter3-mpv-vp8-fallback** mpv 0.41.0 silently falls back from `--hwdec=vaapi` to SW for VP8 — separate from backend; not iter3 scope; verify alternate consumer paths (chrome-fourier or future mpv version) when convenient.
## What worked + what to repeat
### Worked
- **Contract-before-code Phase 4 plan** (10 clauses citing kernel UAPI + VAAPI + FFmpeg ref + Phase 3 verbatim payload bytes) — let Phase 5 review find specific byte-level disagreements.
- **Phase 5 sonnet-architect review with empirical-over-theoretical discipline (Direction 2)** — caught all 4 compile/runtime bugs before they hit Phase 6/7. Reviewer test-compiled + grep'd every proposed amendment before recommending.
- **Phase 3 verbatim payload anchoring** — keyframe + inter frame field captures from `ffmpeg -hwaccel v4l2request` strace gave Phase 4 + Phase 7 byte-level reference points (`first_part_size=22742`, `first_part_header_bits=6550`, `flags=0x0d`).
- **Transitive proof verification pattern** — when direct pixel readback was blocked by sibling-campaign kernel issue, A==B+C transitive equality preserved iter3 close validity.
- **Phase 6 commits A-D pattern** — same as iter1+iter2; predictable scope; fix-forward Commit D well-trodden.
### To repeat / improve
- **HW engagement verification BEFORE byte-compare** — new memory rule `feedback_hw_decode_engagement_check.md` formalizes this.
- **Allow-list audit at Phase 2 OR plan for Commit D fix-forward** — new memory rule `feedback_runtime_enumerates_allowlists.md` shifts the bias from "audit exhaustively up front" to "expect runtime to enumerate." iter4+ should expect 1 fix-forward per iter for buffer.c-style allow-list adds.
## Predecessor → successor handoff (iter4 = VP9 on rkvdec)
Per campaign roadmap, iter4 targets VP9 on rkvdec-vpu-dec. Substrate hand-off:
- Fork tip `e1aca9c` is the iter4 starting point.
- VP9 fixture: `~/fourier-test/bbb_720p10s_vp9.webm` (3.4 MB).
- VP9 will use rkvdec (NOT hantro). Different kernel driver (`rkvdec_vp9.c`), different V4L2 controls (`V4L2_CID_STATELESS_VP9_FRAME` + `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`).
- Reference sources: `references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c` + `references/linux-mainline/drivers/staging/media/rkvdec/rkvdec-vp9.c` (verify presence at iter4 Phase 2).
- Inherits all 4 backlog items + 4 NEW items above.
- Memory rules carry forward:
- `feedback_gitea_as_claude_noether`
- `feedback_no_session_termination_attempts`
- `feedback_header_deletion_check`
- `feedback_runtime_enumerates_allowlists` (NEW from iter3)
- `feedback_review_empirical_over_theoretical` (BOTH directions)
- `feedback_rockchip_pixel_verify_path`
- `feedback_fresnel_hostname`
- `feedback_hw_decode_engagement_check` (NEW from iter3)
- `reference_dmabuf_resv_blocker` (NEW from iter3)
iter4 prediction: VP9 has more state than VP8 (compressed-header control with probability updates), comparable scope to iter2 HEVC. Expect ~400-500 LOC across 6-7 files, 3-4 commits + 1 fix-forward. mpv `--hwdec=vaapi` may engage for VP9 (different from VP8 fallback) — verify at Phase 0; if it does, criterion 4 direct readback becomes possible (provided rkvdec doesn't hit the same dma_resv issue as hantro).
## Campaign scoreboard
```
Codec | Site | Iter | Status | Verifier path
========|===========|======|=============|====================================
H.264 | rkvdec | T4 | PASS | mpv --hwdec=vaapi --vo=image (DMA-BUF GL)
MPEG-2 | hantro | iter1 | PASS | mpv --hwdec=vaapi --vo=image (DMA-BUF GL)
HEVC | rkvdec | iter2 | PASS | mpv --hwdec=vaapi --vo=image (DMA-BUF GL)
VP8 | hantro | iter3 | PASS (T) | transitive proof (kernel readback blocked)
VP9 | rkvdec | iter4 | PENDING | TBD at iter4 Phase 0
```
4/5 codecs passing. iter4 to follow.