Files
fresnel-fourier/phase7_iter4_verification.md
T
marfrit d87c940788 iter4 Phase 7: criterion 1+2+3 PASS, criterion 4+5 FAIL — three bug classes identified
Verification on linux-fresnel-fourier 7.0-1:

PASS:
- Criterion 1: vainfo enumerates VAProfileVP9Profile0 via auto-detect.
- Criterion 2: vaCreateConfig SUCCESS (implicit).
- Criterion 3: ffmpeg-vaapi VP9 5-frame decode exit 0 at 0.307x, no
  ioctl errors.

FAIL — three distinguishable bug classes:

Bug 1 (VP9-specific, my Clause 6 parser):
  Strace of frame-1 keyframe FRAME control vs Phase 3 anchor:
  - byte 8 (lf.flags): mine=0x01 (DELTA_ENABLED only) vs ref=0x03
    (ENABLED|UPDATE).
  - byte 16 (base_q_idx): mine=0x41 (65) vs ref=0x2e (46).
  - byte 17 (delta_q_y_dc): mine=8 vs ref=0.
  Bit-trace shows my parser is 2 bits ahead of correct position by
  the time it reaches lf_delta_enabled. Fix path: faithful port of
  FFmpeg vp9.c::decode_frame_header.

Bug 2 (substrate-wide, cap_pool readback):
  Constant RGB(0, 0x4c, 0) "0x4c gray" pattern across all codecs
  (VP9, HEVC, MPEG-2, VP8). H.264 keyframe DOES read correctly with
  real RGB(0, 0xe3, 0) content; H.264 inter frames revert to 0x4c.
  Kernel decode succeeds (Phase 3 strace + ffmpeg-v4l2request
  standalone confirm). libva readback returns cap_pool init scratch.
  Sibling of iter3 dma_resv blocker but with different signature
  (constant 0x4c instead of all-zero 0x00).

Bug 3 (hantro UAPI drift):
  MPEG-2 + VP8 produce kernel "Unable to set control(s): Invalid
  argument" errors. UAPI struct sizes/fields likely shifted between
  6.19.9 and 7.0 (sibling of Phase 3 VP9 struct-size correction
  144/1947 -> 168/2040).

Three loopback options proposed (decision pending user):
- A: VP9-only fix (Clause 6 parser); accept Bug 2/3 as substrate
     pre-existing; criterion 4 transitive-only per iter3.
- B: Full loopback covering all 3 bugs; possibly requires kernel
     patches (vb2_dma_resv RFC v2).
- C: Phase 0 reset; substrate is the primary issue; pause iter4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 07:20:51 +00:00

8.6 KiB
Raw Blame History

Iteration 4 — Phase 7 (verification)

Verification of iter4 Commits Z+A+B+C against the 5 Phase 1 success criteria. Captured 2026-05-10 09:0009:10 CEST on fresnel linux-fresnel-fourier 7.0-1.

Verdict: criterion 1, 2, 3 PASS. Criteria 4 and 5 FAIL with two distinct root causes that combine into a Phase 7 → Phase 4 (or possibly Phase 0) loopback.

Summary table

Criterion Test Result
1 vainfo enumerates VAProfileVP9Profile0 PASS (auto-detect rkvdec)
2 vaCreateConfig SUCCESS PASS (implicit via 3)
3 ffmpeg-vaapi VP9 decode exit 0 PASS (5 frames at 0.307x, no errors)
4 HW=SW byte-identical FAIL (two issues, see below)
5 4-codec regression FAIL (substrate-wide cap_pool-readback regression)

Fork tip: beaa914 (iter4 Commit C). Backend SHA256: f2ff6598....

Criterion 4 — VP9 HW=SW

Test: ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload bbb_720p10s_vp9.webm -frames:v 5 hw_%04d.png → SHA256 vs Phase 3 SW reference PNGs.

Result: All 5 HW PNGs have the SAME hash 93dd9db51385.... Pixels are constant RGB(0, 0x4c, 0) across all frames. Phase 3 SW reference has varied content RGB(0x70, 0x6d, 0x55) etc.

Conclusion: HW path is producing constant-fill scratch instead of decoded video. Two contributing root causes identified:

Bug 1 — Clause 6 uncompressed-header parser bit-misalignment (VP9-specific)

Strace of my backend's submission vs Phase 3 anchor (frame-1 keyframe):

                 lf struct (16 B)                    quant struct (8 B)
mine:            01 00 ff ff 00 00 03 00 01 00 ...   41 08 00 08 00 00 00 00
Phase 3 anchor:  01 00 ff ff 00 00 03 00 03 00 ...   2e 00 00 00 00 00 00 00

Match: bytes 07 (lf prefix is correct, level=3, sharpness=0).
Diverge byte 8 (lf.flags):     0x01 (DELTA_ENABLED only) vs 0x03 (ENABLED|UPDATE).
Diverge byte 16 (base_q_idx):  0x41 (65) vs 0x2e (46).
Diverge byte 17 (delta_q_y_dc): 8 vs 0.

The parser is 2 bits ahead by the time it reaches lf_delta_enabled. Bit-trace of the BBB keyframe bitstream (82 49 83 42 40 4f f0 2c f6 06 38 ...) shows the correct bit positions for these fields per VP9 spec section 6.2; my parser's output corresponds to reading them at positions +2 of the correct positions.

The 2-bit drift origin is somewhere in the keyframe color_config / refresh_frame_context section. Likely candidates:

  • if (1) /* color_space != CS_RGB */ shortcut may be reading wrong bits when color_space == CS_RGB.
  • The conditional refresh_frame_context / parallel_decoding_mode reading is off-spec for some path.
  • An off-by-one in the keyframe else (inter) branch may carry over.

The wrong base_q_idx=65 produces garbage dequantization → kernel decodes incorrectly. This alone would invalidate VP9 decode output even if the readback path were perfect.

Fix path: rewrite Clause 6 as a faithful port of FFmpeg vp9.c::decode_frame_header lines 540700 (the canonical parser). ~150 LOC, replacing my current ~120 LOC partial port. Validate by extracting Phase 3 anchor's verbatim payload bits and confirming byte-by-byte match.

Bug 2 — Cap-pool readback returns scratch buffer (substrate-wide)

The constant RGB(0, 0x4c, 0) pattern appears for all 4 already-shipping codecs as well, not just VP9 (see Criterion 5 below). This is a linux-fresnel-fourier 7.0-1-substrate-level issue, not VP9-specific. Even if Bug 1 is fixed, criterion 4 will still fail without resolving Bug 2.

Symptom: kernel decodes successfully (Phase 3's strace + ffmpeg-v4l2request standalone test confirm decode succeeds), but our libva backend's cap_pool readback returns the buffer init pattern, not the kernel's decoded pixels. The pages may be cached or the slot may be unbound.

This is sibling to the iter3 dma_resv issue but with a different signature (constant 0x4c instead of all-zero 0x00). Documented in memory reference_dmabuf_resv_blocker.md; the new substrate may have made it worse, broader, or simply different.

Criterion 5 — 4-codec regression block

ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload for each codec, 3 frames each, SHA256 of resulting PNGs.

Codec Driver Frame 1 hash Frames 2-3 hash Verdict
H.264 rkvdec auto 50ca48b6... (real RGB(0, 0xe3, 0) content) 7c70fdda... (stuck RGB(0, 0x4c, 0)) PARTIAL — keyframe decodes, inter frames stuck
HEVC rkvdec auto 85243d2c... (constant RGB(0, 0x4c, 0)) same FAIL
MPEG-2 hantro env 93dd9db5... (same constant as VP9) same FAIL + kernel Unable to set control(s) errors
VP8 hantro env f8103ae1... (constant) same FAIL + kernel Unable to set control(s) errors

Hash 93dd9db5... appears across VP9 and MPEG-2 — confirms shared cap_pool init pattern (0x4c = decimal 76).

Cap-pool readback regression: H.264 keyframe DOES read correctly, indicating the readback path partially works. Inter frames return scratch — the cap_pool slot rotation between QBUF cycles isn't being driven by the kernel's actual decoded frame data. This is the sibling of Bug 2 above.

Hantro Unable to set control(s) errors: a kernel-side rejection on hantro for MPEG-2/VP8. Substrate change appears to have shifted hantro's expected control structure or fields; iter1 (MPEG-2) and iter3 (VP8) were tested on 6.19.9 — UAPI likely drifted between 6.19.9 and 7.0 the same way VP9 did (Phase 3 baseline showed VP9 struct sizes 168/2040 differ from Phase 2's 6.19.9-based estimates 144/1947).

Distinguishing the two bug classes

Bug 1 (VP9 parser) — pure iter4 work; localized to src/vp9.c::vp9_parse_uncompressed_header_lf_quant. Doesn't affect H.264/HEVC/MPEG-2/VP8.

Bug 2 (cap-pool readback) — substrate-wide; affects ALL codecs on linux-fresnel-fourier 7.0-1. Was masked on linux-eos-arm 6.19.9-99 because rkvdec's cap_pool worked there; iter3 hit the hantro variant via dma_resv. Now broader.

Bug 3 (hantro UAPI drift) — substrate-related; affects MPEG-2 + VP8 specifically with kernel-rejection ioctl errors. Probably struct-size or new-required-field in the kernel UAPI for hantro stateless controls. Distinct from Bug 2's silent-pass-but-wrong-buffer pattern.

Path forward — three options

Option A — VP9-only Phase 4 loopback for Bug 1 alone

Fix Clause 6 parser. Re-run Phase 6 + Phase 7. Accept Bug 2 as "criterion-4 transitive proof only" per iter3 precedent (memory reference_dmabuf_resv_blocker.md). Accept Bug 3 as substrate-induced regression of iter1+iter3 (file as new memory entry, defer fix).

Pro: fastest path to a "VP9 controls correct" demonstration; preserves iter4's nominal scope. Con: criterion 4 stays effectively-not-direct; campaign scoreboard slips from "5/5 direct" to "5/5 transitive". Bug 2 + Bug 3 are campaign-wide issues that this iter shouldn't paper over.

Option B — Full Phase 4 loopback covering Bugs 1 + 2 + 3

Fix Clause 6 parser. Investigate Bug 2 (cap-pool readback regression) — likely requires kernel-side patch (vb2_dma_resv RFC v2 per memory reference_fresnel_kernel_substrate.md) OR libva-side cap_pool refactor. Investigate Bug 3 (hantro UAPI drift) — re-run iter1 + iter3 strace baselines on 7.0 to see what changed.

Pro: addresses root causes; restores 4-codec block; criterion 4 direct. Con: scope expansion, possibly substantial. Bug 2 may require landing kernel patches that aren't ready yet (RFC v1 rejected, v2 in design).

Option C — Phase 0 reset: investigate substrate as the primary issue

Step back from iter4. Document that 6.19 → 7.0 substrate change has broken the campaign's Phase 1 baseline assumptions. Choose one of:

  • Roll fresnel back to a kernel that has the working cap_pool path (NOT 6.19.9 since decommissioned, but possibly some intermediate kernel before whatever regressed).
  • Land kernel patches in kernel-agent to fix Bug 2 + Bug 3 substrate issues, rebuild linux-fresnel-fourier, retest from iter1 baseline.

Pro: addresses the real underlying issue. iter4 + iter1 + iter2 + iter3 all become re-anchorable. Con: large scope. Multi-week. iter4 specifically gets paused.

Substrate state at Phase 7 close

  • Fork at iter4 Commit C tip beaa914 (Phase 6 build clean, Criterion 1+2+3 PASS).
  • Phase 7 captures persisted: fresnel:/tmp/iter4_phase7/ and noether:~/src/fresnel-fourier/iter4_phase7.tgz.
  • Phase 4 plan ready for amendment after option-decision.
  • Memory rules carry forward; no new memory entries added in Phase 7 (the findings are iteration-specific, not cross-cutting yet).

Decision point

Pick A / B / C before Phase 4 loopback or campaign re-orientation.