Two acts: Act 1 (β alone): all 5 libva codecs returned all-zero. MPEG-2 was a regression (pre-β it worked); HEVC was unchanged (kernel returns DQBUF FLAG_ERROR pre AND post β — same Phase 3 baseline showed it). Root cause: ffmpeg-vaapi-copy passes surfaces_count=0 to vaCreateContext per iter6 context.c:262 comment; my β walk of surfaces_ids[] was a no-op → destination_planes_count stayed 0 → surface_bind_slot no-op → all-zero readback. Act 2 (Commit D): cache format-uniform CAPTURE geometry in driver_data; walk surface_heap in CreateContext; lazy-fill in CreateSurfaces2 when fmt_valid is set; invalidate in DestroyContext. Restores MPEG-2 to pre-β state and unlocks VP9. Per Phase 1 criteria: criterion 1 PARTIAL (VP9 of HEVC+VP9+VP8); criteria 2-4 PASS. Bug 5 (NEW): HEVC libva DQBUF FLAG_ERROR — pre-existing kernel rejection; β's OUTPUT format fix didn't address it. Transitive proof at iter2 verified control payload shape but kernel still rejects; some other V4L2 protocol contract aspect differs from kdirect. Bug 6 (NEW): VP8 libva produces non-zero output with real content (74.8% zero + 256 unique bytes incl. keyframe pixels at `93 8e 8a 89...`) but diverges from kdirect. Decode runs; output mismatch likely slot-rotation or partial-fill bug. VP9 is iter5b-β's only clean PASS. Architecture-wise β succeeded: no α'-style failure mode possible (no in-CreateSurfaces2 destructive teardown), and the CRIT-1+CRIT-2 fixes from Phase 5 v2 review held. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9.3 KiB
Iteration 5b — Phase 7 v2 (verification, β + Commit D)
Captured 2026-05-12 evening on fresnel linux-fresnel-fourier 7.0-1, fork tip 70196f8, backend SHA 2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8. β architecture survived empirical contact; Commit D fix-forward addressed the ffmpeg-vaapi-copy-passes-surfaces_count=0 surprise the first run revealed.
Verdict
PARTIAL PASS. VP9 unblocked directly. MPEG-2 maintained. VP8 mechanically unblocked (decode runs, output diverges from kdirect). HEVC unchanged (separate kernel-rejection issue, pre-existing). H.264 unchanged (Bug 4 deferred as planned).
| Codec | Pre-β | Post-β + D | kdirect | sw | Verdict |
|---|---|---|---|---|---|
| H.264 1080p30 | 71ac099b… keyframe partial |
71ac099b… unchanged |
1e7a0bc9… |
1e7a0bc9… |
PARTIAL — Bug 4 (deferred to iter6) |
| HEVC 720p | 06b2c5a0… all-zero |
06b2c5a0… still all-zero |
9340b832… |
9340b832… |
FAIL — separate kernel-rejection issue |
| VP9 720p | 06b2c5a0… all-zero |
4f1565e8… |
4f1565e8… |
4f1565e8… |
PASS ✓✓✓ (libva == kdirect == sw) |
| MPEG-2 720p | 19eefbf4… worked |
19eefbf4… maintained |
19eefbf4… |
7be8cad7… |
PASS (libva == kdirect; HW≠SW is codec precision drift, not Bug 2) |
| VP8 720p | 06b2c5a0… all-zero |
bcc57ed5… real content but diverges |
136ce5cb… |
136ce5cb… |
PARTIAL — decode runs, output mismatch (256 unique bytes, real keyframe content 93 8e 8a 89 …) |
Per Phase 1 lock criteria
- HEVC + VP9 + VP8 libva == kdirect == sw: PARTIAL — 1 of 3 (VP9 only).
- MPEG-2 unchanged: PASS (after Commit D fix-forward — pre-D was all-zero, post-D matches pre-iter5b).
- H.264 keyframe still decodes: PASS (unchanged).
- Control-payload anchors hold: PASS (β didn't touch control submission).
What happened in two acts
Act 1 — β alone: regression on MPEG-2, no progress elsewhere
Initial post-β sweep returned all-zero for every libva codec. Strace of HEVC: kernel decoder returned DQBUF with V4L2_BUF_FLAG_ERROR for every frame. Strace of MPEG-2 (post-β): kernel decoder succeeded (10 DQBUF, 0 ERROR) but the userspace read still got all-zero. The MPEG-2 success/userspace-zero combination pointed at a surface-state issue.
Phase 3 baseline cross-check showed:
- Pre-iter5b HEVC trace.11115+: every DQBUF had ERROR flag too — pre-iter5b HEVC was already broken at the kernel level, β didn't introduce this.
- Pre-iter5b MPEG-2 trace.11169: 0 ERROR, decode succeeded, userspace got real pixels — pre-iter5b MPEG-2 worked end-to-end via libva.
So β fixed the OUTPUT pixel format mismatch (cosmetic improvement for HEVC; correct format for VP9/VP8) but exposed an MPEG-2 regression. Root cause traced to context.c:262 comment: ffmpeg vaapi-copy passes surfaces_count==0 to vaCreateContext. My β walk of surfaces_ids[] for the destination_* fill was therefore a no-op for vaapi-copy → destination_planes_count stayed 0 → surface_bind_slot's for (j=0; j<0; ...) was a no-op → destination_data[] never assigned → copy_surface_to_image memcpy of 0 bytes from a NULL src.
Act 2 — Commit D: cache format-uniform state in driver_data, walk surface_heap, lazy-fill on late CreateSurfaces2
Commit 70196f8 adds:
request.hdriver_data fields:fmt_valid,fmt_planes_count,fmt_buffers_count,fmt_format_height,fmt_sizes[],fmt_bytesperlines[].surface.h/c: newsurface_fill_format_uniform(driver_data, surface_object)helper, idempotent ondestination_planes_count != 0.context.c::RequestCreateContext: populates the cache afterv4l2_get_format(CAPTURE); walkssurface_heap(not justsurfaces_ids[]) to fill every existing surface.context.c::RequestCreateContext: lazy-fill via the helper after surface allocation.context.c::RequestDestroyContext: invalidatesfmt_valid = false.
Post-D sweep: MPEG-2 restored, VP9 PASS, VP8 partial, HEVC unchanged.
The remaining failures
HEVC: pre-existing kernel-rejection issue
Pre-β HEVC trace.11115 (Phase 3 baseline, same kernel) showed 4 of 4 DQBUFs with V4L2_BUF_FLAG_ERROR. Post-β: same. The kernel rkvdec driver rejects HEVC decode for some reason independent of OUTPUT pixel format.
Per the iter2-close docs, HEVC was claimed PASS via transitive proof: backend control payload byte-matched kdirect's payload. The transitive proof verified controls were CORRECTLY SHAPED, not that the kernel ACCEPTED them. Kdirect submits the same shape and the kernel decodes; libva submits the same shape and the kernel returns ERROR. The difference must be in V4L2 protocol order, request_fd binding, sequence numbers, or some other contract aspect not captured in the control-payload bytes.
Bug 5 (new): HEVC libva decode triggers DQBUF V4L2_BUF_FLAG_ERROR from rkvdec. Same kernel + same controls + same fixture works via kdirect. Difference is in the libva backend's V4L2 protocol behavior. Out of iter5b scope.
VP8: partial — decode runs but output diverges
VP8 libva returns bcc57ed5… (4 147 200 bytes, 256 unique bytes, 74.8% zero, real keyframe content). Pre-β was all-zero 06b2c5a0…. So β did unlock VP8 decode at some level. But the output bytes differ from kdirect's 136ce5cb….
Frame 1 first 16 bytes: 93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80 — plausible BBB intro pixels (neutral grey-to-mid-tone). Frames 1, 2, 3 differ from each other (motion content present).
Likely candidates:
- Slot rotation: backend reads from one cap_pool slot while kernel wrote to another (similar pattern as H.264 inter-frame Bug 4).
- Partial fill: each frame's lower portion is zero (74.8% zero is a lot).
- Per-frame DPB binding issue.
Bug 6 (new): VP8 libva decode produces non-zero but non-matching output. Out of iter5b scope.
Architecture verdict
The β refactor delivered its core goal: CreateSurfaces2 no longer touches V4L2 device state; CreateContext owns the OUTPUT-side lifecycle. The Phase 7 α' failure mode is structurally eliminated — there's no in-CreateSurfaces2 destructive teardown branch to corrupt state mid-stream.
The 2-CRIT findings from Phase 5 v2 (CRIT-1 NULL guard, CRIT-2 missing request_pool_destroy) were correct and incorporated; without them, Phase 7 wouldn't have gotten past the first CreateContext.
Commit D was the surprise: the architecturally cleaner β created a new edge case (vaapi-copy late-surface flow) that pre-β didn't have because CreateSurfaces2's "fill destination_* immediately" pattern happened to handle it. The fix-forward restores the equivalent behavior via lazy-fill at both sites.
What iter5b-β actually shipped
- VP9 directly verifiable (no more transitive-proof dependency). Campaign scoreboard: 5/5 with 1 newly direct + 4 mixed (1 prior direct MPEG-2, 1 keyframe-partial H.264, 2 partial/transitive VP8+HEVC).
- MPEG-2 not regressed.
- H.264 not regressed.
- Bug 5 (HEVC kernel rejection) and Bug 6 (VP8 partial output) exposed for iter6+ work.
- Architecture cleaned up. surface.c::CreateSurfaces2 is now ~70 LOC (was ~250). context.c::RequestCreateContext owns the OUTPUT lifecycle clearly.
Substrate state at Phase 7 close
- Fork tip
70196f8on noether + fresnel + gitea. - Backend installed SHA
2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8. - Kernel
linux-fresnel-fourier 7.0-1(unchanged). - Test fixtures unchanged.
- Phase 7 sweep artifacts at
/tmp/iter5b_p7v2/on fresnel.
Memory rules — candidates at iter5b-β close
The Phase 5 v2 reviewer's "before claiming no change needed, grep all call sites" discipline saved Phase 6 from CRIT-2 silently breaking the second CreateContext. Worth a memory rule:
Proposed: feedback_grep_all_callsites_before_no_change_claim.md — When a plan says "no change needed" for a code region, grep all symbols/functions/macros the change touches and enumerate the call sites. The author's surface read at Phase 4 v2 missed request_pool_destroy not being in DestroyContext; only the reviewer's enumeration caught it.
The Commit D fix-forward exposes a sibling rule: trust but verify the iter6 comment. The iter6 comment at context.c:262 explicitly named the ffmpeg-vaapi-copy surfaces_count=0 case. Phase 4 v2 plan acknowledged the comment but didn't trace the implication for the destination_* walk. Comment read → implication not derived.
Both rules can be folded into one: plans that touch lifecycle code must enumerate ALL existing comments AND grep ALL call sites for affected symbols; "no change needed" claims must be backed by explicit enumeration. Defer the memory rule write to Phase 8 close.
Phase 7 → Phase 8 handoff
Phase 7 closes here. iter5b reaches partial-PASS:
- ✓ VP9 directly unblocked (the primary iter5b-β goal achieved for 1 of the 3 target codecs).
- ✓ MPEG-2 not regressed (Commit D fix-forward).
- ✓ H.264 not regressed.
- ✗ HEVC remains broken (Bug 5 — pre-existing, exposed).
- ✗ VP8 partial (Bug 6 — new, decode-runs-but-output-diverges).
The campaign scoreboard becomes "5/5 with 1 direct (MPEG-2) + 1 newly direct (VP9) + 3 mixed (H.264 keyframe-partial + VP8 partial + HEVC transitive-claimed-but-empirically-failing)."
Phase 8 close documents the iter5b-β shipped state and lists Bug 5 + Bug 6 + Bug 4 as iter6+ work targets. Memory rule update covers the empirical lessons.