fresnel-fourier/phase7_iter5b_verification_v2.md
marfrit c773c3d2c1 iter5b-β Phase 7: PARTIAL PASS — VP9 unblocked, MPEG-2 maintained, HEVC+VP8 partial
Two acts:
Act 1 (β alone): all 5 libva codecs returned all-zero. MPEG-2 was a
regression (pre-β it worked); HEVC was unchanged (kernel returns
DQBUF FLAG_ERROR pre AND post β — same Phase 3 baseline showed it).
Root cause: ffmpeg-vaapi-copy passes surfaces_count=0 to vaCreateContext
per iter6 context.c:262 comment; my β walk of surfaces_ids[] was a
no-op → destination_planes_count stayed 0 → surface_bind_slot no-op
→ all-zero readback.

Act 2 (Commit D): cache format-uniform CAPTURE geometry in driver_data;
walk surface_heap in CreateContext; lazy-fill in CreateSurfaces2 when
fmt_valid is set; invalidate in DestroyContext. Restores MPEG-2 to
pre-β state and unlocks VP9.

Per Phase 1 criteria: criterion 1 PARTIAL (VP9 of HEVC+VP9+VP8);
criteria 2-4 PASS.

Bug 5 (NEW): HEVC libva DQBUF FLAG_ERROR — pre-existing kernel
rejection; β's OUTPUT format fix didn't address it. Transitive proof
at iter2 verified control payload shape but kernel still rejects;
some other V4L2 protocol contract aspect differs from kdirect.

Bug 6 (NEW): VP8 libva produces non-zero output with real content
(74.8% zero + 256 unique bytes incl. keyframe pixels at `93 8e 8a 89...`)
but diverges from kdirect. Decode runs; output mismatch likely
slot-rotation or partial-fill bug.

VP9 is iter5b-β's only clean PASS. Architecture-wise β succeeded:
no α'-style failure mode possible (no in-CreateSurfaces2 destructive
teardown), and the CRIT-1+CRIT-2 fixes from Phase 5 v2 review held.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:56:26 +00:00


Iteration 5b — Phase 7 v2 (verification, β + Commit D)

Captured 2026-05-12 evening on fresnel linux-fresnel-fourier 7.0-1, fork tip 70196f8, backend SHA 2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8. β architecture survived empirical contact; Commit D fix-forward addressed the ffmpeg-vaapi-copy-passes-surfaces_count=0 surprise the first run revealed.

Verdict

PARTIAL PASS. VP9 unblocked directly. MPEG-2 maintained. VP8 mechanically unblocked (decode runs, output diverges from kdirect). HEVC unchanged (separate kernel-rejection issue, pre-existing). H.264 unchanged (Bug 4 deferred as planned).

| Codec | Pre-β | Post-β + D | kdirect | sw | Verdict |
|---|---|---|---|---|---|
| H.264 1080p30 | 71ac099b… (keyframe partial) | 71ac099b… (unchanged) | 1e7a0bc9… | 1e7a0bc9… | PARTIAL: Bug 4 (deferred to iter6) |
| HEVC 720p | 06b2c5a0… (all-zero) | 06b2c5a0… (still all-zero) | 9340b832… | 9340b832… | FAIL: separate kernel-rejection issue |
| VP9 720p | 06b2c5a0… (all-zero) | 4f1565e8… | 4f1565e8… | 4f1565e8… | PASS ✓✓✓ (libva == kdirect == sw) |
| MPEG-2 720p | 19eefbf4… (worked) | 19eefbf4… (maintained) | 19eefbf4… | 7be8cad7… | PASS (libva == kdirect; HW≠SW is codec precision drift, not Bug 2) |
| VP8 720p | 06b2c5a0… (all-zero) | bcc57ed5… (real content but diverges) | 136ce5cb… | 136ce5cb… | PARTIAL: decode runs, output mismatch (256 unique bytes, real keyframe content 93 8e 8a 89 …) |

Per Phase 1 lock criteria

  1. HEVC + VP9 + VP8 libva == kdirect == sw: PARTIAL — 1 of 3 (VP9 only).
  2. MPEG-2 unchanged: PASS (after Commit D fix-forward — pre-D was all-zero, post-D matches pre-iter5b).
  3. H.264 keyframe still decodes: PASS (unchanged).
  4. Control-payload anchors hold: PASS (β didn't touch control submission).
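
The hash comparisons behind criteria 1 and 2 can be sketched as a three-way classifier. This is a minimal illustration, not the sweep's actual tooling: the `classify` function and the `match` enum are hypothetical names, and the inputs are the truncated hash prefixes exactly as they appear in the verdict table.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcome labels for one codec row of the verdict table. */
enum match {
    MATCH_ALL,      /* libva == kdirect == sw */
    MATCH_HW_ONLY,  /* libva == kdirect, but hardware differs from sw */
    MATCH_NONE      /* libva differs from kdirect */
};

/* Compare the output hashes of the three decode paths for one fixture. */
static enum match classify(const char *libva, const char *kdirect,
                           const char *sw)
{
    assert(libva != NULL && kdirect != NULL && sw != NULL);
    if (strcmp(libva, kdirect) != 0)
        return MATCH_NONE;
    return strcmp(kdirect, sw) == 0 ? MATCH_ALL : MATCH_HW_ONLY;
}
```

With the table's prefixes, VP9 classifies as `MATCH_ALL`, MPEG-2 as `MATCH_HW_ONLY` (which still satisfies criterion 2, since that criterion only asks for MPEG-2 to be unchanged), and VP8 as `MATCH_NONE`.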

What happened in two acts

Act 1 — β alone: regression on MPEG-2, no progress elsewhere

Initial post-β sweep returned all-zero for every libva codec. Strace of HEVC: kernel decoder returned DQBUF with V4L2_BUF_FLAG_ERROR for every frame. Strace of MPEG-2 (post-β): kernel decoder succeeded (10 DQBUF, 0 ERROR) but the userspace read still got all-zero. The MPEG-2 success/userspace-zero combination pointed at a surface-state issue.

Phase 3 baseline cross-check showed:

  • Pre-iter5b HEVC trace.11115+: every DQBUF had ERROR flag too — pre-iter5b HEVC was already broken at the kernel level, β didn't introduce this.
  • Pre-iter5b MPEG-2 trace.11169: 0 ERROR, decode succeeded, userspace got real pixels — pre-iter5b MPEG-2 worked end-to-end via libva.
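
The triage used above (kernel-level rejection versus userspace readback problem) can be sketched as follows. This is an illustration under stated assumptions: `dqbuf_summary`, `summarize_dqbuf`, and `triage_run` are hypothetical names, and the flag constant is redefined locally with the value it has in `<linux/videodev2.h>` only so the sketch compiles without kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as V4L2_BUF_FLAG_ERROR in <linux/videodev2.h>. */
#ifndef V4L2_BUF_FLAG_ERROR
#define V4L2_BUF_FLAG_ERROR 0x00000040u
#endif

/* Hypothetical summary of the DQBUF results seen in one traced run. */
struct dqbuf_summary {
    size_t total;   /* number of dequeued CAPTURE buffers */
    size_t errors;  /* how many carried V4L2_BUF_FLAG_ERROR */
};

/* flags[] holds the v4l2_buffer.flags value of each dequeued buffer. */
static struct dqbuf_summary summarize_dqbuf(const uint32_t *flags, size_t n)
{
    struct dqbuf_summary s = { 0, 0 };
    assert(flags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        s.total++;
        if (flags[i] & V4L2_BUF_FLAG_ERROR)
            s.errors++;
    }
    return s;
}

enum triage { TRIAGE_KERNEL_REJECT, TRIAGE_USERSPACE_READBACK, TRIAGE_OK };

/* Errors from the kernel dominate; a clean decode with an all-zero
 * readback points at surface state in userspace instead. */
static enum triage triage_run(struct dqbuf_summary s, int readback_all_zero)
{
    if (s.errors > 0)
        return TRIAGE_KERNEL_REJECT;
    return readback_all_zero ? TRIAGE_USERSPACE_READBACK : TRIAGE_OK;
}
```

Applied to the traces described here, the HEVC run (every DQBUF flagged) lands in `TRIAGE_KERNEL_REJECT`, while the post-β MPEG-2 run (10 DQBUF, 0 ERROR, all-zero readback) lands in `TRIAGE_USERSPACE_READBACK`.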

So β fixed the OUTPUT pixel format mismatch (a cosmetic improvement for HEVC; the correct format for VP9/VP8) but introduced an MPEG-2 regression. The root cause traces to the comment at context.c:262: ffmpeg vaapi-copy passes surfaces_count == 0 to vaCreateContext. My β walk of surfaces_ids[] for the destination_* fill was therefore a no-op for vaapi-copy, and the failure chained from there: destination_planes_count stayed 0 → surface_bind_slot's for (j=0; j<0; ...) loop never ran → destination_data[] was never assigned → copy_surface_to_image did a memcpy of 0 bytes from a NULL src, so the readback stayed all-zero.
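
The no-op chain can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not the backend's code: `struct surface`, `fill_from_ids`, and `bind_slot` are simplified, hypothetical versions of the real surface object, the β walk, and surface_bind_slot.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PLANES 4  /* hypothetical bound for this sketch */

/* Stripped-down stand-in for the backend's surface object. */
struct surface {
    unsigned destination_planes_count;
    void *destination_data[MAX_PLANES];
};

/* Act 1 shape: fill only the surfaces named in surfaces_ids[].
 * Returns how many surfaces were filled. */
static unsigned fill_from_ids(struct surface *heap,
                              const unsigned *surfaces_ids,
                              unsigned surfaces_count, unsigned planes)
{
    unsigned filled = 0;
    assert(surfaces_ids != NULL || surfaces_count == 0);
    for (unsigned i = 0; i < surfaces_count; i++) {
        heap[surfaces_ids[i]].destination_planes_count = planes;
        filled++;
    }
    return filled;
}

/* Stand-in for surface_bind_slot: one pointer per known plane. */
static void bind_slot(struct surface *s, void *const *slot_planes)
{
    for (unsigned j = 0; j < s->destination_planes_count; j++)
        s->destination_data[j] = slot_planes[j];
}
```

With `surfaces_count == 0` the first loop never runs, so `destination_planes_count` stays 0 and the bind loop never runs either; `destination_data[0]` remains NULL even though a valid slot pointer was offered.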

Act 2 — Commit D: cache format-uniform state in driver_data, walk surface_heap, lazy-fill on late CreateSurfaces2

Commit 70196f8 adds:

  • request.h driver_data fields: fmt_valid, fmt_planes_count, fmt_buffers_count, fmt_format_height, fmt_sizes[], fmt_bytesperlines[].
  • surface.h/c: new surface_fill_format_uniform(driver_data, surface_object) helper, idempotent on destination_planes_count != 0.
  • context.c::RequestCreateContext: populates the cache after v4l2_get_format(CAPTURE); walks surface_heap (not just surfaces_ids[]) to fill every existing surface.
  • surface.c::CreateSurfaces2: lazy-fill via the helper after surface allocation, when fmt_valid is set.
  • context.c::RequestDestroyContext: invalidates fmt_valid = false.
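
The cache-and-lazy-fill pattern of Commit D can be sketched as below. This is a simplified model under assumptions, not the code in 70196f8: the field names follow the list above, but the types, the `MAX_PLANES` bound, and the `on_*` wrapper functions are hypothetical, and fmt_buffers_count and fmt_format_height are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PLANES 4  /* hypothetical bound for this sketch */

struct surface {
    unsigned destination_planes_count;
    unsigned destination_sizes[MAX_PLANES];
    unsigned destination_bytesperlines[MAX_PLANES];
};

struct driver_data {
    bool fmt_valid;
    unsigned fmt_planes_count;
    unsigned fmt_sizes[MAX_PLANES];
    unsigned fmt_bytesperlines[MAX_PLANES];
};

/* Idempotent: a surface that already has planes is left untouched. */
static void surface_fill_format_uniform(const struct driver_data *d,
                                        struct surface *s)
{
    assert(d != NULL && s != NULL);
    if (s->destination_planes_count != 0)
        return;
    for (unsigned i = 0; i < d->fmt_planes_count; i++) {
        s->destination_sizes[i] = d->fmt_sizes[i];
        s->destination_bytesperlines[i] = d->fmt_bytesperlines[i];
    }
    s->destination_planes_count = d->fmt_planes_count;
}

/* CreateContext side: cache the CAPTURE geometry, then walk the whole
 * heap rather than only the ids passed by the caller. */
static void on_create_context(struct driver_data *d, unsigned planes,
                              const unsigned *sizes,
                              const unsigned *bytesperlines,
                              struct surface *heap, size_t heap_len)
{
    assert(planes > 0 && planes <= MAX_PLANES);
    d->fmt_planes_count = planes;
    for (unsigned i = 0; i < planes; i++) {
        d->fmt_sizes[i] = sizes[i];
        d->fmt_bytesperlines[i] = bytesperlines[i];
    }
    d->fmt_valid = true;
    for (size_t k = 0; k < heap_len; k++)
        surface_fill_format_uniform(d, &heap[k]);
}

/* CreateSurfaces2 side: lazy-fill a late surface if a cache exists. */
static void on_create_surface(const struct driver_data *d, struct surface *s)
{
    if (d->fmt_valid)
        surface_fill_format_uniform(d, s);
}

/* DestroyContext side: the cached geometry no longer applies. */
static void on_destroy_context(struct driver_data *d)
{
    d->fmt_valid = false;
}
```

The two fill sites cover both orderings: surfaces created before the context are filled by the heap walk, and surfaces created after it are filled lazily, which is the vaapi-copy flow that Act 1 missed.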

Post-D sweep: MPEG-2 restored, VP9 PASS, VP8 partial, HEVC unchanged.

The remaining failures

HEVC: pre-existing kernel-rejection issue

Pre-β HEVC trace.11115 (Phase 3 baseline, same kernel) showed 4 of 4 DQBUFs with V4L2_BUF_FLAG_ERROR. Post-β: same. The kernel rkvdec driver rejects HEVC decode for some reason independent of OUTPUT pixel format.

Per the iter2-close docs, HEVC was claimed PASS via transitive proof: backend control payload byte-matched kdirect's payload. The transitive proof verified controls were CORRECTLY SHAPED, not that the kernel ACCEPTED them. Kdirect submits the same shape and the kernel decodes; libva submits the same shape and the kernel returns ERROR. The difference must be in V4L2 protocol order, request_fd binding, sequence numbers, or some other contract aspect not captured in the control-payload bytes.

Bug 5 (new): HEVC libva decode triggers DQBUF V4L2_BUF_FLAG_ERROR from rkvdec. Same kernel + same controls + same fixture works via kdirect. Difference is in the libva backend's V4L2 protocol behavior. Out of iter5b scope.

VP8: partial — decode runs but output diverges

VP8 libva returns bcc57ed5… (4 147 200 bytes, 256 unique bytes, 74.8% zero, real keyframe content). Pre-β was all-zero 06b2c5a0…. So β did unlock VP8 decode at some level. But the output bytes differ from kdirect's 136ce5cb….

Frame 1 first 16 bytes: 93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80 — plausible BBB intro pixels (neutral grey-to-mid-tone). Frames 1, 2, 3 differ from each other (motion content present).

Likely candidates:

  • Slot rotation: backend reads from one cap_pool slot while kernel wrote to another (similar pattern as H.264 inter-frame Bug 4).
  • Partial fill: only part of each frame is written and the remainder (possibly the lower portion) stays zero; 74.8% zero is high for real content.
  • Per-frame DPB binding issue.
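
One way to separate the partial-fill candidate from the others is to measure where the zeros sit, not just how many there are. The sketch below is a hypothetical helper, not part of the sweep tooling: `scan_bytes` reproduces the zero-count and unique-byte statistics quoted above, and `first_trailing_zero_row` reports where a run of all-zero rows at the bottom of a plane begins. As a side note, 4 147 200 equals 3 × 1280 × 720 × 3/2, which would be consistent with three 8-bit 4:2:0 frames at 720p; that layout is an assumption here, not something the sweep notes state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct byte_stats {
    size_t zero_bytes;       /* count of 0x00 bytes */
    unsigned unique_values;  /* distinct byte values seen, 0..256 */
};

/* Count zero bytes and distinct byte values in a raw dump. */
static struct byte_stats scan_bytes(const uint8_t *buf, size_t len)
{
    uint8_t seen[256] = { 0 };
    struct byte_stats st = { 0, 0 };
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0)
            st.zero_bytes++;
        if (!seen[buf[i]]) {
            seen[buf[i]] = 1;
            st.unique_values++;
        }
    }
    return st;
}

/* Index of the first row of the all-zero tail of a plane.
 * Returns rows if the last row contains any non-zero byte. */
static size_t first_trailing_zero_row(const uint8_t *plane, size_t stride,
                                      size_t rows)
{
    size_t first = rows;
    for (size_t r = rows; r-- > 0; ) {
        int all_zero = 1;
        for (size_t x = 0; x < stride; x++) {
            if (plane[r * stride + x] != 0) {
                all_zero = 0;
                break;
            }
        }
        if (!all_zero)
            break;
        first = r;
    }
    return first;
}
```

If the zero tail of each frame's luma plane starts at a consistent row, partial fill becomes the stronger candidate; if the zeros are scattered or whole frames alternate between content and zero, slot rotation or DPB binding looks more likely.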

Bug 6 (new): VP8 libva decode produces non-zero but non-matching output. Out of iter5b scope.

Architecture verdict

The β refactor delivered its core goal: CreateSurfaces2 no longer touches V4L2 device state; CreateContext owns the OUTPUT-side lifecycle. The Phase 7 α' failure mode is structurally eliminated — there's no in-CreateSurfaces2 destructive teardown branch to corrupt state mid-stream.

The two CRIT findings from Phase 5 v2 (CRIT-1 NULL guard, CRIT-2 missing request_pool_destroy) were correct and incorporated; without them, Phase 7 wouldn't have gotten past the first CreateContext.

Commit D was the surprise: the architecturally cleaner β created a new edge case (vaapi-copy late-surface flow) that pre-β didn't have because CreateSurfaces2's "fill destination_* immediately" pattern happened to handle it. The fix-forward restores the equivalent behavior via lazy-fill at both sites.

What iter5b-β actually shipped

  • VP9 directly verifiable (no more transitive-proof dependency). Campaign scoreboard: 5/5 with 1 direct (MPEG-2) + 1 newly direct (VP9) + 3 mixed (keyframe-partial H.264, partial VP8, transitive-claimed HEVC).
  • MPEG-2 not regressed.
  • H.264 not regressed.
  • Bug 5 (HEVC kernel rejection) and Bug 6 (VP8 partial output) exposed for iter6+ work.
  • Architecture cleaned up. surface.c::CreateSurfaces2 is now ~70 LOC (was ~250). context.c::RequestCreateContext owns the OUTPUT lifecycle clearly.

Substrate state at Phase 7 close

  • Fork tip 70196f8 on noether + fresnel + gitea.
  • Backend installed SHA 2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8.
  • Kernel linux-fresnel-fourier 7.0-1 (unchanged).
  • Test fixtures unchanged.
  • Phase 7 sweep artifacts at /tmp/iter5b_p7v2/ on fresnel.

Memory rules — candidates at iter5b-β close

The Phase 5 v2 reviewer's "before claiming no change needed, grep all call sites" discipline saved Phase 6 from CRIT-2 silently breaking the second CreateContext. Worth a memory rule:

Proposed: feedback_grep_all_callsites_before_no_change_claim.md — When a plan says "no change needed" for a code region, grep all symbols/functions/macros the change touches and enumerate the call sites. The author's surface read at Phase 4 v2 missed request_pool_destroy not being in DestroyContext; only the reviewer's enumeration caught it.

The Commit D fix-forward exposes a sibling rule: trust but verify the iter6 comment. The iter6 comment at context.c:262 explicitly named the ffmpeg-vaapi-copy surfaces_count=0 case. Phase 4 v2 plan acknowledged the comment but didn't trace the implication for the destination_* walk. Comment read → implication not derived.

Both rules can be folded into one: plans that touch lifecycle code must enumerate ALL existing comments AND grep ALL call sites for affected symbols; "no change needed" claims must be backed by explicit enumeration. Defer the memory rule write to Phase 8 close.

Phase 7 → Phase 8 handoff

Phase 7 closes here. iter5b reaches partial-PASS:

  • ✓ VP9 directly unblocked (the primary iter5b-β goal achieved for 1 of the 3 target codecs).
  • ✓ MPEG-2 not regressed (Commit D fix-forward).
  • ✓ H.264 not regressed.
  • ✗ HEVC remains broken (Bug 5 — pre-existing, exposed).
  • ✗ VP8 partial (Bug 6 — new, decode-runs-but-output-diverges).

The campaign scoreboard becomes "5/5 with 1 direct (MPEG-2) + 1 newly direct (VP9) + 3 mixed (H.264 keyframe-partial + VP8 partial + HEVC transitive-claimed-but-empirically-failing)."

Phase 8 close documents the iter5b-β shipped state and lists Bug 5 + Bug 6 + Bug 4 as iter6+ work targets. Memory rule update covers the empirical lessons.