diff --git a/iter4_phase7.tgz b/iter4_phase7.tgz new file mode 100644 index 0000000..d8196e3 Binary files /dev/null and b/iter4_phase7.tgz differ diff --git a/phase7_iter4_verification.md b/phase7_iter4_verification.md new file mode 100644 index 0000000..43e6c30 --- /dev/null +++ b/phase7_iter4_verification.md @@ -0,0 +1,120 @@ +# Iteration 4 — Phase 7 (verification) + +Verification of iter4 Commits Z+A+B+C against the 5 Phase 1 success criteria. Captured 2026-05-10 09:00–09:10 CEST on fresnel `linux-fresnel-fourier 7.0-1`. + +**Verdict**: criterion 1, 2, 3 PASS. **Criteria 4 and 5 FAIL** with two distinct root causes that combine into a Phase 7 → Phase 4 (or possibly Phase 0) loopback. + +## Summary table + +| Criterion | Test | Result | +|---|---|---| +| 1 | vainfo enumerates VAProfileVP9Profile0 | **PASS** (auto-detect rkvdec) | +| 2 | vaCreateConfig SUCCESS | **PASS** (implicit via 3) | +| 3 | ffmpeg-vaapi VP9 decode exit 0 | **PASS** (5 frames at 0.307x, no errors) | +| 4 | HW=SW byte-identical | **FAIL** (two issues, see below) | +| 5 | 4-codec regression | **FAIL** (substrate-wide cap_pool-readback regression) | + +Fork tip: `beaa914` (iter4 Commit C). Backend SHA256: `f2ff6598...`. + +## Criterion 4 — VP9 HW=SW + +**Test**: `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload bbb_720p10s_vp9.webm -frames:v 5 hw_%04d.png` → SHA256 vs Phase 3 SW reference PNGs. + +**Result**: All 5 HW PNGs have the SAME hash `93dd9db51385...`. Pixels are constant `RGB(0, 0x4c, 0)` across all frames. Phase 3 SW reference has varied content `RGB(0x70, 0x6d, 0x55)` etc. + +**Conclusion**: HW path is producing constant-fill scratch instead of decoded video. Two contributing root causes identified: + +### Bug 1 — Clause 6 uncompressed-header parser bit-misalignment (VP9-specific) + +Strace of my backend's submission vs Phase 3 anchor (frame-1 keyframe): + +``` + lf struct (16 B) quant struct (8 B) +mine: 01 00 ff ff 00 00 03 00 01 00 ... 41 08 00 08 00 00 00 00 +Phase 3 anchor: 01 00 ff ff 00 00 03 00 03 00 ... 2e 00 00 00 00 00 00 00 + +Match: bytes 0–7 (lf prefix is correct, level=3, sharpness=0). +Diverge byte 8 (lf.flags): 0x01 (DELTA_ENABLED only) vs 0x03 (ENABLED|UPDATE). +Diverge byte 16 (base_q_idx): 0x41 (65) vs 0x2e (46). +Diverge byte 17 (delta_q_y_dc): 8 vs 0. +``` + +The parser is **2 bits ahead** by the time it reaches `lf_delta_enabled`. Bit-trace of the BBB keyframe bitstream (`82 49 83 42 40 4f f0 2c f6 06 38 ...`) shows the correct bit positions for these fields per VP9 spec section 6.2; my parser's output corresponds to reading them at positions +2 of the correct positions. + +The 2-bit drift origin is somewhere in the keyframe color_config / refresh_frame_context section. Likely candidates: +- `if (1) /* color_space != CS_RGB */` shortcut may be reading wrong bits when `color_space == CS_RGB`. +- The conditional refresh_frame_context / parallel_decoding_mode reading is off-spec for some path. +- An off-by-one in the keyframe `else` (inter) branch may carry over. + +The wrong `base_q_idx=65` produces garbage dequantization → kernel decodes incorrectly. This alone would invalidate VP9 decode output even if the readback path were perfect. + +**Fix path**: rewrite Clause 6 as a faithful port of FFmpeg `vp9.c::decode_frame_header` lines 540–700 (the canonical parser). ~150 LOC, replacing my current ~120 LOC partial port. Validate by extracting Phase 3 anchor's verbatim payload bits and confirming byte-by-byte match. + +### Bug 2 — Cap-pool readback returns scratch buffer (substrate-wide) + +The constant `RGB(0, 0x4c, 0)` pattern appears for **all 4 already-shipping codecs** as well, not just VP9 (see Criterion 5 below). This is a `linux-fresnel-fourier 7.0-1`-substrate-level issue, not VP9-specific. Even if Bug 1 is fixed, criterion 4 will still fail without resolving Bug 2. + +**Symptom**: kernel decodes successfully (Phase 3's strace + ffmpeg-v4l2request standalone test confirm decode succeeds), but our libva backend's cap_pool readback returns the buffer init pattern, not the kernel's decoded pixels. The pages may be cached or the slot may be unbound. + +This is sibling to the iter3 dma_resv issue but with a different signature (constant `0x4c` instead of all-zero `0x00`). Documented in memory `reference_dmabuf_resv_blocker.md`; the new substrate may have made it worse, broader, or simply different. + +## Criterion 5 — 4-codec regression block + +`ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload` for each codec, 3 frames each, SHA256 of resulting PNGs. + +| Codec | Driver | Frame 1 hash | Frames 2-3 hash | Verdict | +|---|---|---|---|---| +| H.264 | rkvdec auto | `50ca48b6...` (real `RGB(0, 0xe3, 0)` content) | `7c70fdda...` (stuck `RGB(0, 0x4c, 0)`) | **PARTIAL** — keyframe decodes, inter frames stuck | +| HEVC | rkvdec auto | `85243d2c...` (constant `RGB(0, 0x4c, 0)`) | same | **FAIL** | +| MPEG-2 | hantro env | `93dd9db5...` (same constant as VP9) | same | **FAIL** + kernel `Unable to set control(s)` errors | +| VP8 | hantro env | `f8103ae1...` (constant) | same | **FAIL** + kernel `Unable to set control(s)` errors | + +Hash `93dd9db5...` appears across VP9 and MPEG-2 — confirms shared cap_pool init pattern (`0x4c` = decimal 76). + +**Cap-pool readback regression**: H.264 keyframe DOES read correctly, indicating the readback path partially works. Inter frames return scratch — the cap_pool slot rotation between QBUF cycles isn't being driven by the kernel's actual decoded frame data. This is the sibling of Bug 2 above. + +**Hantro `Unable to set control(s)` errors**: a kernel-side rejection on hantro for MPEG-2/VP8. Substrate change appears to have shifted hantro's expected control structure or fields; iter1 (MPEG-2) and iter3 (VP8) were tested on 6.19.9 — UAPI likely drifted between 6.19.9 and 7.0 the same way VP9 did (Phase 3 baseline showed VP9 struct sizes 168/2040 differ from Phase 2's 6.19.9-based estimates 144/1947). + +## Distinguishing the two bug classes + +**Bug 1 (VP9 parser)** — pure iter4 work; localized to `src/vp9.c::vp9_parse_uncompressed_header_lf_quant`. Doesn't affect H.264/HEVC/MPEG-2/VP8. + +**Bug 2 (cap-pool readback)** — substrate-wide; affects ALL codecs on `linux-fresnel-fourier 7.0-1`. Was masked on `linux-eos-arm 6.19.9-99` because rkvdec's cap_pool worked there; iter3 hit the hantro variant via dma_resv. Now broader. + +**Bug 3 (hantro UAPI drift)** — substrate-related; affects MPEG-2 + VP8 specifically with kernel-rejection ioctl errors. Probably struct-size or new-required-field in the kernel UAPI for hantro stateless controls. Distinct from Bug 2's silent-pass-but-wrong-buffer pattern. + +## Path forward — three options + +### Option A — VP9-only Phase 4 loopback for Bug 1 alone + +Fix Clause 6 parser. Re-run Phase 6 + Phase 7. Accept Bug 2 as "criterion-4 transitive proof only" per iter3 precedent (memory `reference_dmabuf_resv_blocker.md`). Accept Bug 3 as substrate-induced regression of iter1+iter3 (file as new memory entry, defer fix). + +**Pro**: fastest path to a "VP9 controls correct" demonstration; preserves iter4's nominal scope. +**Con**: criterion 4 stays effectively-not-direct; campaign scoreboard slips from "5/5 direct" to "5/5 transitive". Bug 2 + Bug 3 are *campaign-wide* issues that this iter shouldn't paper over. + +### Option B — Full Phase 4 loopback covering Bugs 1 + 2 + 3 + +Fix Clause 6 parser. Investigate Bug 2 (cap-pool readback regression) — likely requires kernel-side patch (vb2_dma_resv RFC v2 per memory `reference_fresnel_kernel_substrate.md`) OR libva-side cap_pool refactor. Investigate Bug 3 (hantro UAPI drift) — re-run iter1 + iter3 strace baselines on 7.0 to see what changed. + +**Pro**: addresses root causes; restores 4-codec block; criterion 4 direct. +**Con**: scope expansion, possibly substantial. Bug 2 may require landing kernel patches that aren't ready yet (RFC v1 rejected, v2 in design). + +### Option C — Phase 0 reset: investigate substrate as the primary issue + +Step back from iter4. Document that 6.19 → 7.0 substrate change has broken the campaign's Phase 1 baseline assumptions. Choose one of: +- Roll fresnel back to a kernel that has the working cap_pool path (NOT 6.19.9 since decommissioned, but possibly some intermediate kernel before whatever regressed). +- Land kernel patches in `kernel-agent` to fix Bug 2 + Bug 3 substrate issues, rebuild `linux-fresnel-fourier`, retest from iter1 baseline. + +**Pro**: addresses the real underlying issue. iter4 + iter1 + iter2 + iter3 all become re-anchorable. +**Con**: large scope. Multi-week. iter4 specifically gets paused. + +## Substrate state at Phase 7 close + +- Fork at iter4 Commit C tip `beaa914` (Phase 6 build clean, Criterion 1+2+3 PASS). +- Phase 7 captures persisted: `fresnel:/tmp/iter4_phase7/` and `noether:~/src/fresnel-fourier/iter4_phase7.tgz`. +- Phase 4 plan ready for amendment after option-decision. +- Memory rules carry forward; no new memory entries added in Phase 7 (the findings are iteration-specific, not cross-cutting yet). + +## Decision point + +Pick A / B / C before Phase 4 loopback or campaign re-orientation.