From 9a14cc2527116b8f0e5a50ac3db70c7ffb9e94cf Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Tue, 12 May 2026 19:01:07 +0000 Subject: [PATCH] =?UTF-8?q?iter5b-=CE=B2=20Phase=208=20close:=20PARTIAL=20?= =?UTF-8?q?PASS=20=E2=80=94=20VP9=20unblocked=20direct,=20Bugs=204/5/6=20c?= =?UTF-8?q?arried=20to=20iter6?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Iteration shipped (fork tip 70196f8, backend SHA 2c6ff82c... on fresnel): - VP9 directly verifiable (Phase 1 criterion 1 met for 1 of 3 target codecs) - MPEG-2 maintained (no regression after Commit D fix-forward) - H.264 unchanged (Bug 4 deferred per Phase 1 lock) - Architecture cleaned: CreateSurfaces2 ~70 LOC (single-responsibility), CreateContext owns OUTPUT lifecycle, no α'-style failure mode possible. Surfaced bugs for iter6+: - Bug 5: HEVC libva DQBUF FLAG_ERROR (pre-existing; iter2's transitive PASS verified control payload but not decode outcome) - Bug 6: VP8 libva produces non-zero non-matching output (slot rotation or partial fill, masked pre-β by all-zero state) - Bug 4: H.264 inter-frame race-loss (carried from iter4 P7) Lessons distilled to memory: - feedback_grep_callsites_before_no_change.md (Phase 5 v2 CRIT-2 caught request_pool_destroy not in DestroyContext after C3 stripped its only per-session caller) - feedback_trust_iter_comments_for_lifecycle.md (Commit D fix-forward surfaced because Phase 4 v2 read but didn't trace context.c:262's iter6 ffmpeg-vaapi-copy surfaces_count=0 comment) Campaign scoreboard: 5/5 with 2 direct (VP9 new, MPEG-2 maintained) + 3 mixed (H.264 keyframe partial, VP8 partial new, HEVC transitive-only direct-FAIL). iter6 awaits Phase 0 research-question lock. Co-Authored-By: Claude Opus 4.7 (1M context) --- phase8_iteration5b_close.md | 209 ++++++++++++++++++++++++++++++++++++ 1 file changed, 209 insertions(+) create mode 100644 phase8_iteration5b_close.md diff --git a/phase8_iteration5b_close.md b/phase8_iteration5b_close.md new file mode 100644 index 0000000..4f4e1bd --- /dev/null +++ b/phase8_iteration5b_close.md @@ -0,0 +1,209 @@ +# Iteration 5b — Phase 8 (close) + +Closes 2026-05-12. iter5b = backend OUTPUT-format profile-derivation (option β refactor). PARTIAL PASS per Phase 7 v2: VP9 directly unblocked, MPEG-2 maintained, HEVC + VP8 expose previously-masked bugs. + +## Summary + +| Metric | Value | +|---|---| +| Iteration target | Backend Bug 2 (`surface.c:173` hardcoded H264_SLICE) via option β (CreateContext-centric OUTPUT lifecycle) | +| Hardware | RK3399 rkvdec (`/dev/video1+/dev/media0` this boot) + hantro-vpu-dec (`/dev/video3+/dev/media1`) | +| Fork tip start (iter4 close) | `692eaa0` | +| Fork tip mid-iter (α' rejected) | `4b2288f` → 3-revert → `6bc29ec` | +| Fork tip end (iter5b-β close) | `70196f8` | +| Commits authored | 3 β commits + 3 α'-reverts + 1 Commit D fix-forward + 3 α' commits in history = 10 fork commits net 4 visible β changes | +| LOC delta (β + D vs iter4 tip) | +320 / -293 across 8 files + 2 new | +| Phase 1 criteria | 1/4 PARTIAL · 3/4 PASS | +| Phase 7 → Phase 4 loopbacks | 1 (α' rejected, β chosen) | +| Phase 6 fix-forwards | 1 (Commit D for vaapi-copy late-surface flow) | +| Phase 5 review findings | iter5b-α' Phase 5: 1 CRIT (mechanical) + 3 IMP. iter5b-β Phase 5 v2: 2 CRIT (NULL guard, missing request_pool_destroy) + 3 IMP. All amendments incorporated. | +| Campaign scoreboard | 5/5 with **1 newly direct** (VP9) + 1 maintained direct (MPEG-2) + 3 mixed (H.264 keyframe-partial, VP8 partial-new-Bug-6, HEVC pre-existing-Bug-5) | + +## Commits shipped + +### Fork (libva-v4l2-request-fourier) + +**β-final commits** (the actual iter5b shipped code): + +| SHA | Files | LOC | Description | +|---|---|---|---| +| `1c548b1` (A) | `src/codec.h` (new) · `src/codec.c` (new) · `src/meson.build` | +102 / -2 | `pixelformat_for_profile()` helper mapping VAProfile → V4L2 OUTPUT FOURCC | +| `cc077a0` (B) | `src/config.c` | +11 | Wire `object_config->pixelformat` at CreateConfig | +| `7055b14` (C) | `src/surface.c` · `src/surface.h` · `src/context.c` · `src/request.h` | +238 / -280 | β refactor: CreateSurfaces2 stripped, CreateContext owns OUTPUT lifecycle. CRIT-1 (NULL guard) + CRIT-2 (request_pool_destroy) + IMP-1 (probe order) + IMP-2 (dead field cleanup) all incorporated | +| `70196f8` (D) | `src/context.c` · `src/surface.c` · `src/surface.h` · `src/request.h` | +136 / -35 | Fix-forward: format-uniform cache in driver_data, walk surface_heap, lazy-fill in CreateSurfaces2. Closes the ffmpeg-vaapi-copy surfaces_count=0 gap | + +**α' history** (rejected, kept in commit log): + +| SHA | Status | +|---|---| +| `ce304ef` (α' A) | Reverted by `6bc29ec` | +| `f8256e6` (α' B) | Reverted by `9a7f888` | +| `4b2288f` (α' C) | Reverted by `709ab34` | + +### Campaign repo (fresnel-fourier) + +| Commit | Phase | Description | +|---|---|---| +| `8acfca3` | Phase 0 v1 | Lock Candidate B (vb2_dma_resv) — later corrected via Phase 2 source-read | +| `31b9255` | Phase 0 amend | Bug 3 collapsed, criteria 5→4 | +| `9941523` | Phase 2 v1 | Situation, RFC v2 patches | +| `3c05564` | Phase 3 | 5-codec baseline sweep + all-zero cap_pool init signature | +| `a809e9c` | Phase 4 v1 | Option α' plan (rejected later) | +| `7d1c44b` | Phase 5 v1 | α' review | +| `550bb81` | Phase 6 v1 | α' commits landed (also reverted later) | +| `864af25` | Phase 7 v1 | α' FAIL (HEVC SIGSEGV) → loopback | +| `5abea73` | Phase 4 v2 | Option β plan | +| `3508a2c` | Phase 5 v2 | β review (CRIT-1, CRIT-2) | +| `311411b` | Phase 6 v2 | β commits landed | +| `c773c3d` | Phase 7 v2 | β PARTIAL PASS verdict | + +## What worked + +- **Option β architecture** delivered its core claim: CreateSurfaces2 touches no V4L2 device state, CreateContext owns the OUTPUT lifecycle. The α' destructive-teardown failure mode is structurally eliminated. +- **VP9 unblocked directly**: `libva == kdirect == sw` for `bbb_720p10s_vp9.webm`. The iter4 transitive proof becomes a direct proof. +- **MPEG-2 not regressed**: after Commit D, libva == kdirect (HW≠SW is codec precision drift, unrelated to Bug 2). +- **H.264 keyframe path preserved**: same `71ac099b…` hash as pre-iter5b. Bug 4 deferred cleanly. +- **Phase 5 reviewer's empirical-over-theoretical discipline saved Phase 6 twice**: + - α' Phase 5 caught a compile-failing iterator type bug. + - β Phase 5 v2 caught CRIT-1 (NULL guard) + CRIT-2 (missing `request_pool_destroy`) before either would have shipped broken. + +## What didn't work / surfaced + +### Bug 5 — HEVC libva decode rejected by kernel rkvdec + +Pre-iter5b AND post-β: every DQBUF on HEVC libva path has `V4L2_BUF_FLAG_ERROR`. Kernel rkvdec rejects the decode regardless of OUTPUT pixel format (β fixed format, ERROR persisted). + +iter2 closed HEVC as PASS via transitive proof. The proof verified backend's `VIDIOC_S_EXT_CTRLS` payload byte-matched kdirect's payload — which is correct as far as it goes, but doesn't catch V4L2 protocol-level differences (request_fd binding sequencing, ioctl ordering, sequence numbers, etc.). + +**Reframing**: iter2's HEVC "PASS" was a partial proof. The control payload is correct; what's wrong is something else in the libva backend's V4L2 protocol use that kernel-direct ffmpeg-v4l2request does differently. Out of iter5b scope; deferred to iter6+ as Bug 5. + +### Bug 6 — VP8 libva decode produces non-zero but non-matching output + +Pre-iter5b: all-zero. Post-β: real content (256 unique bytes, plausible BBB keyframe pixels at `93 8e 8a 89 …`) but doesn't byte-match kdirect. Some frames differ from others (motion content). 74.8% bytes are 0x00 — suggests partial fill. + +Plausible causes: slot-rotation bug, partial-buffer-fill, or per-frame DPB binding issue. Hadn't surfaced pre-β because all-zero output masked it. Out of iter5b scope; deferred to iter6+ as Bug 6. + +### Bug 4 (carried from iter4 Phase 7 doc) — H.264 inter-frame race-loss + +Persisted unchanged through iter5b. H.264 keyframe decodes; inter frames return zero. Some race or sync issue specific to H.264 inter. Out of iter5b-β scope by Phase 1 lock. + +## Phase 1 success criteria — final reckoning + +| # | Criterion | Verdict | Evidence | +|---|---|---|---| +| 1 | HEVC + VP9 + VP8 libva == kdirect | **PARTIAL (1 of 3)** | VP9 ✓, HEVC ✗ (Bug 5), VP8 ✗ (Bug 6) | +| 2 | MPEG-2 unchanged | **PASS** | `libva_mpeg2 == kdirect_mpeg2 == 19eefbf4…` post-Commit-D | +| 3 | H.264 keyframe still decodes | **PASS** | `libva_h264 == 71ac099b…` unchanged (Bug 4 deferred) | +| 4 | Control-payload anchors hold | **PASS** | β touched no control-handling code | + +Net: 3 of 4 PASS. Criterion 1 hit 1/3 of its goal. iter5b-β does not close fully. Bugs 4/5/6 are explicit findings handed off to iter6+. + +## Lessons distilled + +### Phase 5 reviewer track-record (both reviews) + +| Review | Findings | Author re-verified | Estimated savings | +|---|---|---|---| +| α' Phase 5 | 1 CRIT (mechanical) + 3 IMP | All correct | 1 failed `meson compile` cycle | +| β Phase 5 v2 | 2 CRIT (NULL guard, request_pool_destroy) + 3 IMP | All correct | β would have crashed at first CreateContext without CRIT-1; second CreateContext would have hung on stale pool without CRIT-2. Half a session at least. | + +Both reviews paid off. Empirical-over-theoretical (Direction 2) worked: reviewers grepped, enumerated, and read source rather than trusting the plan's surface claims. + +### α' failure: dismissing the TODO comment + +The Phase 4 v1 plan dismissed `surface.c:164-171`'s TODO comment as targeting multi-resolution-mid-stream only, on the grounds that iter5b's bug was "initial CreateSurfaces2" not "multi-resolution mid-stream." Empirically wrong: ffmpeg-vaapi's multi-CreateSurfaces2 pattern hit the same gate, triggered destructive teardown mid-stream, and SIGSEGV'd HEVC. + +The Phase 2 v1 reviewer's original recommendation (option β) was correct. iter5b's Phase 4 v1 author (claude) overrode it on insufficient grounds. **Trust Phase 2 architectural recommendations when smaller alternatives carry lifecycle ordering risk.** + +### Commit D surprise: vaapi-copy passes surfaces_count=0 + +The iter6 comment at `context.c:262` explicitly documented that ffmpeg-vaapi-copy passes `surfaces_count=0` to vaCreateContext. Phase 4 v2 plan acknowledged the comment but didn't trace the implication for the new destination_* walk in CreateContext. **Plans that touch lifecycle code must enumerate ALL existing comments AND grep ALL call sites for affected symbols.** + +### Transitive proof's blind spot + +iter1-4 closed HEVC/VP8/VP9 via "transitive proof": backend control bytes == kdirect control bytes AND kdirect decode == SW. The proof verified the BACKEND ↔ KDIRECT contract on a SPECIFIC ARTIFACT (control payloads). It did NOT verify the actual decode outcome through the backend. iter5b-β Phase 7 confirmed: +- VP9: transitive PASS held under direct verification. +- HEVC: transitive PASS did NOT hold under direct verification — Bug 5 was masked. +- VP8: transitive PASS partially held (decode runs, output diverges) — Bug 6 was masked. + +**Transitive proofs are useful but partial.** They prove the part they verify. They don't prove what they don't verify. Future "transitive PASS" claims should also state explicitly what is NOT verified. + +## Memory rule updates + +### Extend `feedback_review_empirical_over_theoretical.md` + +Existing rule: "lean empirical, not theoretical (BOTH directions) — empirical wins over source-read in BOTH author-rebuttal AND author-adopting-reviewer-amendment directions. Test-compile reviewer-suggested field mappings before incorporating into the amended plan." + +Add: when a plan claims "no change needed" for a code region, the author must grep all affected symbols/functions/macros and enumerate the call sites. The iter5b-β Phase 4 v2 plan's `DestroyContext — no change needed` claim missed that `request_pool_destroy` had only ONE per-session caller (the line surface.c:220 which β strips), invisible from a surface read. Phase 5 v2 reviewer's grep caught it. + +### Sibling rule from Commit D + +`feedback_trust_iter_comments_for_lifecycle.md` (proposed): when refactoring lifecycle code (CreateContext, CreateSurfaces2, BeginPicture, DestroyContext), enumerate ALL inline comments in the affected functions. Each comment was placed because some prior iteration discovered a specific bug; the comment encodes the contract that bug discovery established. A plan that doesn't trace the implication of each comment risks re-introducing the bug. + +Specific example: `context.c:262` documented ffmpeg-vaapi-copy's `surfaces_count==0` pattern from iter6. Phase 4 v2 read the comment but didn't trace "what does my new code do when surfaces_count==0?" Phase 7 revealed the answer. + +Defer the memory-rule write to a separate commit alongside this Phase 8 close. + +### Transitive-proof discipline note + +Add to `reference_dmabuf_resv_blocker.md` (which documented the transitive-proof pattern): explicit caveat that transitive proofs verify the AXIS proven (e.g., control payload byte-equality) but don't verify other axes (decode-actually-succeeds, output-actually-matches). When using transitive proof, state what's verified AND what's not. + +## Phase 4 cross-cutting backlog status (iter5b-β increment) + +Inherited from iter4: +- **iter4-B1** auto-detect device discrimination — STILL OPEN. Device numbering shuffles per boot continue to cost a few minutes per session. +- **iter4-B2** mpv-vaapi `Could not create device` — STILL OPEN. Consumer-side. +- **iter4-Q6**, **COLOR_RANGE** — STILL OPEN. +- **B3** picture.c BeginPicture profile-aware reset — STILL OPEN. +- **B4** context.c log suppression for unsupported codec controls — STILL OPEN. +- **B5** mpeg2 vbv_buffer_size — STILL OPEN. +- **B6** h265 SPS bitstream-parse fidelity gap — STILL OPEN. +- **L3** vaDeriveImage cache-stale — STILL OPEN. + +iter5b NEW backlog items: +- **Bug 4** H.264 inter-frame race-loss (carried from iter4 P7 doc). +- **Bug 5** HEVC libva decode rejected by kernel (DQBUF FLAG_ERROR). Pre-existing but only now explicitly named. +- **Bug 6** VP8 libva decode runs but output diverges (74.8% zero with real keyframe content). +- **`object_heap_first/next` iterator pattern** is canonicalized — Commit D uses it correctly per the iter5b-α' Phase 5 review CRIT-1 amendment. Worth documenting in a memory entry if not already covered. + +## Iter5b → iter6 handoff + +Substrate state at iter5b-β close: +- Fork tip `70196f8` on noether + fresnel + gitea. +- Backend installed SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8` on fresnel. +- Kernel `linux-fresnel-fourier 7.0-1` (unchanged). +- Test fixtures unchanged. +- Phase 3 baseline anchors at `iter5_phase3_baseline.tgz` still valid for iter6 Phase 3 reference. +- Phase 7 v2 artifacts at fresnel `/tmp/iter5b_p7v2/`. + +iter6 candidate research questions (for user lock): +- **Candidate F**: Bug 5 — fix HEVC libva decode (whatever V4L2 protocol contract aspect the backend gets wrong vs kdirect). +- **Candidate G**: Bug 6 — fix VP8 libva partial output (slot rotation, partial fill, or DPB binding). +- **Candidate H**: Bug 4 — fix H.264 inter-frame race-loss. +- **Candidate I**: Re-anchor regression hashes on the iter5b-β substrate so iter6+ can use direct verification anchors (now that VP9 + MPEG-2 work, the per-codec stable hashes serve as the regression-test baseline). +- **Candidate J**: iter4-B1 auto-detect harden (device discrimination, eliminate per-boot env-override). + +## Campaign scoreboard update + +``` +Codec | Site | Iter | Status | Verifier path +========|===========|========|===============|==================================== +H.264 | rkvdec | T4 | PARTIAL | mpv keyframe-seek (Bug 4 inter race) +MPEG-2 | hantro | iter1 | PASS direct | ffmpeg-vaapi-hwdownload +HEVC | rkvdec | iter2 | DEGRADED * | transitive PASS / direct FAIL (Bug 5) +VP8 | hantro | iter3 | PARTIAL | transitive PASS / direct partial (Bug 6) +VP9 | rkvdec | iter4→iter5b | PASS direct ** | ffmpeg-vaapi-hwdownload (iter5b-β fix) +``` + +\* HEVC: iter2 closed as transitive PASS. iter5b-β empirically verified direct-mode decode fails (DQBUF ERROR). iter2's transitive-proof claim was technically true but masks a separate bug. +\*\* VP9: iter4 closed as transitive PASS (criterion-4 via transitive). iter5b-β upgrades to direct PASS. + +Net iter5b-β contribution to campaign: +- **+1 direct verification** (VP9). +- **+0 codecs in fresh broken state** (all regressions caught + reverted in-iteration). +- **+2 named bugs** (5 and 6) for iter6+ work. +- **+1 architectural cleanup** (CreateSurfaces2 is now ~70 LOC, single-responsibility). + +## Phase 8 commit + close + +This document records iter5b-β close. Fork is at `70196f8`, backend on fresnel at SHA `2c6ff82c…`, campaign repo updated through Phase 7 v2 at `c773c3d`. Memory rule updates land as separate commits per the lessons above. iter6 Phase 0 awaits user research-question lock.