iter4 Phase 7: criterion 1+2+3 PASS, criterion 4+5 FAIL — three bug classes identified
Verification on linux-fresnel-fourier 7.0-1:
PASS:
- Criterion 1: vainfo enumerates VAProfileVP9Profile0 via auto-detect.
- Criterion 2: vaCreateConfig SUCCESS (implicit).
- Criterion 3: ffmpeg-vaapi VP9 5-frame decode exit 0 at 0.307x, no
ioctl errors.
FAIL — three distinguishable bug classes:
Bug 1 (VP9-specific, my Clause 6 parser):
Strace of frame-1 keyframe FRAME control vs Phase 3 anchor:
- byte 8 (lf.flags): mine=0x01 (DELTA_ENABLED only) vs ref=0x03
(ENABLED|UPDATE).
- byte 16 (base_q_idx): mine=0x41 (65) vs ref=0x2e (46).
- byte 17 (delta_q_y_dc): mine=8 vs ref=0.
Bit-trace shows my parser is 2 bits ahead of correct position by
the time it reaches lf_delta_enabled. Fix path: faithful port of
FFmpeg vp9.c::decode_frame_header.
Bug 2 (substrate-wide, cap_pool readback):
Constant RGB(0, 0x4c, 0) "0x4c gray" pattern across all codecs
(VP9, HEVC, MPEG-2, VP8). H.264 keyframe DOES read correctly with
real RGB(0, 0xe3, 0) content; H.264 inter frames revert to 0x4c.
Kernel decode succeeds (Phase 3 strace + ffmpeg-v4l2request
standalone confirm). libva readback returns cap_pool init scratch.
Sibling of iter3 dma_resv blocker but with different signature
(constant 0x4c instead of all-zero 0x00).
Bug 3 (hantro UAPI drift):
MPEG-2 + VP8 produce kernel "Unable to set control(s): Invalid
argument" errors. UAPI struct sizes/fields likely shifted between
6.19.9 and 7.0 (sibling of Phase 3 VP9 struct-size correction
144/1947 -> 168/2040).
Three loopback options proposed (decision pending user):
- A: VP9-only fix (Clause 6 parser); accept Bug 2/3 as substrate
pre-existing; criterion 4 transitive-only per iter3.
- B: Full loopback covering all 3 bugs; possibly requires kernel
patches (vb2_dma_resv RFC v2).
- C: Phase 0 reset; substrate is the primary issue; pause iter4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,120 @@
|
||||
# Iteration 4 — Phase 7 (verification)
|
||||
|
||||
Verification of iter4 Commits Z+A+B+C against the 5 Phase 1 success criteria. Captured 2026-05-10 09:00–09:10 CEST on fresnel `linux-fresnel-fourier 7.0-1`.
|
||||
|
||||
**Verdict**: criterion 1, 2, 3 PASS. **Criteria 4 and 5 FAIL** with two distinct root causes that combine into a Phase 7 → Phase 4 (or possibly Phase 0) loopback.
|
||||
|
||||
## Summary table
|
||||
|
||||
| Criterion | Test | Result |
|
||||
|---|---|---|
|
||||
| 1 | vainfo enumerates VAProfileVP9Profile0 | **PASS** (auto-detect rkvdec) |
|
||||
| 2 | vaCreateConfig SUCCESS | **PASS** (implicit via 3) |
|
||||
| 3 | ffmpeg-vaapi VP9 decode exit 0 | **PASS** (5 frames at 0.307x, no errors) |
|
||||
| 4 | HW=SW byte-identical | **FAIL** (two issues, see below) |
|
||||
| 5 | 4-codec regression | **FAIL** (substrate-wide cap_pool-readback regression) |
|
||||
|
||||
Fork tip: `beaa914` (iter4 Commit C). Backend SHA256: `f2ff6598...`.
|
||||
|
||||
## Criterion 4 — VP9 HW=SW
|
||||
|
||||
**Test**: `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload bbb_720p10s_vp9.webm -frames:v 5 hw_%04d.png` → SHA256 vs Phase 3 SW reference PNGs.
|
||||
|
||||
**Result**: All 5 HW PNGs have the SAME hash `93dd9db51385...`. Pixels are constant `RGB(0, 0x4c, 0)` across all frames. Phase 3 SW reference has varied content `RGB(0x70, 0x6d, 0x55)` etc.
|
||||
|
||||
**Conclusion**: HW path is producing constant-fill scratch instead of decoded video. Two contributing root causes identified:
|
||||
|
||||
### Bug 1 — Clause 6 uncompressed-header parser bit-misalignment (VP9-specific)
|
||||
|
||||
Strace of my backend's submission vs Phase 3 anchor (frame-1 keyframe):
|
||||
|
||||
```
|
||||
lf struct (16 B) quant struct (8 B)
|
||||
mine: 01 00 ff ff 00 00 03 00 01 00 ... 41 08 00 08 00 00 00 00
|
||||
Phase 3 anchor: 01 00 ff ff 00 00 03 00 03 00 ... 2e 00 00 00 00 00 00 00
|
||||
|
||||
Match: bytes 0–7 (lf prefix is correct, level=3, sharpness=0).
|
||||
Diverge byte 8 (lf.flags): 0x01 (DELTA_ENABLED only) vs 0x03 (ENABLED|UPDATE).
|
||||
Diverge byte 16 (base_q_idx): 0x41 (65) vs 0x2e (46).
|
||||
Diverge byte 17 (delta_q_y_dc): 8 vs 0.
|
||||
```
|
||||
|
||||
The parser is **2 bits ahead** by the time it reaches `lf_delta_enabled`. Bit-trace of the BBB keyframe bitstream (`82 49 83 42 40 4f f0 2c f6 06 38 ...`) shows the correct bit positions for these fields per VP9 spec section 6.2; my parser's output corresponds to reading them at positions +2 of the correct positions.
|
||||
|
||||
The 2-bit drift origin is somewhere in the keyframe color_config / refresh_frame_context section. Likely candidates:
|
||||
- `if (1) /* color_space != CS_RGB */` shortcut may be reading wrong bits when `color_space == CS_RGB`.
|
||||
- The conditional refresh_frame_context / parallel_decoding_mode reading is off-spec for some path.
|
||||
- An off-by-one in the keyframe `else` (inter) branch may carry over.
|
||||
|
||||
The wrong `base_q_idx=65` produces garbage dequantization → kernel decodes incorrectly. This alone would invalidate VP9 decode output even if the readback path were perfect.
|
||||
|
||||
**Fix path**: rewrite Clause 6 as a faithful port of FFmpeg `vp9.c::decode_frame_header` lines 540–700 (the canonical parser). ~150 LOC, replacing my current ~120 LOC partial port. Validate by extracting Phase 3 anchor's verbatim payload bits and confirming byte-by-byte match.
|
||||
|
||||
### Bug 2 — Cap-pool readback returns scratch buffer (substrate-wide)
|
||||
|
||||
The constant `RGB(0, 0x4c, 0)` pattern appears for **all 4 already-shipping codecs** as well, not just VP9 (see Criterion 5 below). This is a `linux-fresnel-fourier 7.0-1`-substrate-level issue, not VP9-specific. Even if Bug 1 is fixed, criterion 4 will still fail without resolving Bug 2.
|
||||
|
||||
**Symptom**: kernel decodes successfully (Phase 3's strace + ffmpeg-v4l2request standalone test confirm decode succeeds), but our libva backend's cap_pool readback returns the buffer init pattern, not the kernel's decoded pixels. The pages may be cached or the slot may be unbound.
|
||||
|
||||
This is sibling to the iter3 dma_resv issue but with a different signature (constant `0x4c` instead of all-zero `0x00`). Documented in memory `reference_dmabuf_resv_blocker.md`; the new substrate may have made it worse, broader, or simply different.
|
||||
|
||||
## Criterion 5 — 4-codec regression block
|
||||
|
||||
`ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload` for each codec, 3 frames each, SHA256 of resulting PNGs.
|
||||
|
||||
| Codec | Driver | Frame 1 hash | Frames 2-3 hash | Verdict |
|
||||
|---|---|---|---|---|
|
||||
| H.264 | rkvdec auto | `50ca48b6...` (real `RGB(0, 0xe3, 0)` content) | `7c70fdda...` (stuck `RGB(0, 0x4c, 0)`) | **PARTIAL** — keyframe decodes, inter frames stuck |
|
||||
| HEVC | rkvdec auto | `85243d2c...` (constant `RGB(0, 0x4c, 0)`) | same | **FAIL** |
|
||||
| MPEG-2 | hantro env | `93dd9db5...` (same constant as VP9) | same | **FAIL** + kernel `Unable to set control(s)` errors |
|
||||
| VP8 | hantro env | `f8103ae1...` (constant) | same | **FAIL** + kernel `Unable to set control(s)` errors |
|
||||
|
||||
Hash `93dd9db5...` appears across VP9 and MPEG-2 — confirms shared cap_pool init pattern (`0x4c` = decimal 76).
|
||||
|
||||
**Cap-pool readback regression**: H.264 keyframe DOES read correctly, indicating the readback path partially works. Inter frames return scratch — the cap_pool slot rotation between QBUF cycles isn't being driven by the kernel's actual decoded frame data. This is the sibling of Bug 2 above.
|
||||
|
||||
**Hantro `Unable to set control(s)` errors**: a kernel-side rejection on hantro for MPEG-2/VP8. Substrate change appears to have shifted hantro's expected control structure or fields; iter1 (MPEG-2) and iter3 (VP8) were tested on 6.19.9 — UAPI likely drifted between 6.19.9 and 7.0 the same way VP9 did (Phase 3 baseline showed VP9 struct sizes 168/2040 differ from Phase 2's 6.19.9-based estimates 144/1947).
|
||||
|
||||
## Distinguishing the two bug classes
|
||||
|
||||
**Bug 1 (VP9 parser)** — pure iter4 work; localized to `src/vp9.c::vp9_parse_uncompressed_header_lf_quant`. Doesn't affect H.264/HEVC/MPEG-2/VP8.
|
||||
|
||||
**Bug 2 (cap-pool readback)** — substrate-wide; affects ALL codecs on `linux-fresnel-fourier 7.0-1`. Was masked on `linux-eos-arm 6.19.9-99` because rkvdec's cap_pool worked there; iter3 hit the hantro variant via dma_resv. Now broader.
|
||||
|
||||
**Bug 3 (hantro UAPI drift)** — substrate-related; affects MPEG-2 + VP8 specifically with kernel-rejection ioctl errors. Probably struct-size or new-required-field in the kernel UAPI for hantro stateless controls. Distinct from Bug 2's silent-pass-but-wrong-buffer pattern.
|
||||
|
||||
## Path forward — three options
|
||||
|
||||
### Option A — VP9-only Phase 4 loopback for Bug 1 alone
|
||||
|
||||
Fix Clause 6 parser. Re-run Phase 6 + Phase 7. Accept Bug 2 as "criterion-4 transitive proof only" per iter3 precedent (memory `reference_dmabuf_resv_blocker.md`). Accept Bug 3 as substrate-induced regression of iter1+iter3 (file as new memory entry, defer fix).
|
||||
|
||||
**Pro**: fastest path to a "VP9 controls correct" demonstration; preserves iter4's nominal scope.
|
||||
**Con**: criterion 4 stays effectively-not-direct; campaign scoreboard slips from "5/5 direct" to "5/5 transitive". Bug 2 + Bug 3 are *campaign-wide* issues that this iter shouldn't paper over.
|
||||
|
||||
### Option B — Full Phase 4 loopback covering Bugs 1 + 2 + 3
|
||||
|
||||
Fix Clause 6 parser. Investigate Bug 2 (cap-pool readback regression) — likely requires kernel-side patch (vb2_dma_resv RFC v2 per memory `reference_fresnel_kernel_substrate.md`) OR libva-side cap_pool refactor. Investigate Bug 3 (hantro UAPI drift) — re-run iter1 + iter3 strace baselines on 7.0 to see what changed.
|
||||
|
||||
**Pro**: addresses root causes; restores 4-codec block; criterion 4 direct.
|
||||
**Con**: scope expansion, possibly substantial. Bug 2 may require landing kernel patches that aren't ready yet (RFC v1 rejected, v2 in design).
|
||||
|
||||
### Option C — Phase 0 reset: investigate substrate as the primary issue
|
||||
|
||||
Step back from iter4. Document that 6.19 → 7.0 substrate change has broken the campaign's Phase 1 baseline assumptions. Choose one of:
|
||||
- Roll fresnel back to a kernel that has the working cap_pool path (NOT 6.19.9 since decommissioned, but possibly some intermediate kernel before whatever regressed).
|
||||
- Land kernel patches in `kernel-agent` to fix Bug 2 + Bug 3 substrate issues, rebuild `linux-fresnel-fourier`, retest from iter1 baseline.
|
||||
|
||||
**Pro**: addresses the real underlying issue. iter4 + iter1 + iter2 + iter3 all become re-anchorable.
|
||||
**Con**: large scope. Multi-week. iter4 specifically gets paused.
|
||||
|
||||
## Substrate state at Phase 7 close
|
||||
|
||||
- Fork at iter4 Commit C tip `beaa914` (Phase 6 build clean, Criterion 1+2+3 PASS).
|
||||
- Phase 7 captures persisted: `fresnel:/tmp/iter4_phase7/` and `noether:~/src/fresnel-fourier/iter4_phase7.tgz`.
|
||||
- Phase 4 plan ready for amendment after option-decision.
|
||||
- Memory rules carry forward; no new memory entries added in Phase 7 (the findings are iteration-specific, not cross-cutting yet).
|
||||
|
||||
## Decision point
|
||||
|
||||
Pick A / B / C before Phase 4 loopback or campaign re-orientation.
|
||||
Reference in New Issue
Block a user