9a14cc2527
Iteration shipped (fork tip 70196f8, backend SHA 2c6ff82c... on fresnel): - VP9 directly verifiable (Phase 1 criterion 1 met for 1 of 3 target codecs) - MPEG-2 maintained (no regression after Commit D fix-forward) - H.264 unchanged (Bug 4 deferred per Phase 1 lock) - Architecture cleaned: CreateSurfaces2 ~70 LOC (single-responsibility), CreateContext owns OUTPUT lifecycle, no α'-style failure mode possible. Surfaced bugs for iter6+: - Bug 5: HEVC libva DQBUF FLAG_ERROR (pre-existing; iter2's transitive PASS verified control payload but not decode outcome) - Bug 6: VP8 libva produces non-zero non-matching output (slot rotation or partial fill, masked pre-β by all-zero state) - Bug 4: H.264 inter-frame race-loss (carried from iter4 P7) Lessons distilled to memory: - feedback_grep_callsites_before_no_change.md (Phase 5 v2 CRIT-2 caught request_pool_destroy not in DestroyContext after C3 stripped its only per-session caller) - feedback_trust_iter_comments_for_lifecycle.md (Commit D fix-forward surfaced because Phase 4 v2 read but didn't trace context.c:262's iter6 ffmpeg-vaapi-copy surfaces_count=0 comment) Campaign scoreboard: 5/5 with 2 direct (VP9 new, MPEG-2 maintained) + 3 mixed (H.264 keyframe partial, VP8 partial new, HEVC transitive-only direct-FAIL). iter6 awaits Phase 0 research-question lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
210 lines
14 KiB
Markdown
210 lines
14 KiB
Markdown
# Iteration 5b — Phase 8 (close)
|
||
|
||
Closes 2026-05-12. iter5b = backend OUTPUT-format profile-derivation (option β refactor). PARTIAL PASS per Phase 7 v2: VP9 directly unblocked, MPEG-2 maintained, HEVC + VP8 expose previously-masked bugs.
|
||
|
||
## Summary
|
||
|
||
| Metric | Value |
|
||
|---|---|
|
||
| Iteration target | Backend Bug 2 (`surface.c:173` hardcoded H264_SLICE) via option β (CreateContext-centric OUTPUT lifecycle) |
|
||
| Hardware | RK3399 rkvdec (`/dev/video1+/dev/media0` this boot) + hantro-vpu-dec (`/dev/video3+/dev/media1`) |
|
||
| Fork tip start (iter4 close) | `692eaa0` |
|
||
| Fork tip mid-iter (α' rejected) | `4b2288f` → 3-revert → `6bc29ec` |
|
||
| Fork tip end (iter5b-β close) | `70196f8` |
|
||
| Commits authored | 3 β commits + 3 α'-reverts + 1 Commit D fix-forward + 3 α' commits in history = 10 fork commits net 4 visible β changes |
|
||
| LOC delta (β + D vs iter4 tip) | +320 / -293 across 8 files + 2 new |
|
||
| Phase 1 criteria | 1/4 PARTIAL · 3/4 PASS |
|
||
| Phase 7 → Phase 4 loopbacks | 1 (α' rejected, β chosen) |
|
||
| Phase 6 fix-forwards | 1 (Commit D for vaapi-copy late-surface flow) |
|
||
| Phase 5 review findings | iter5b-α' Phase 5: 1 CRIT (mechanical) + 3 IMP. iter5b-β Phase 5 v2: 2 CRIT (NULL guard, missing request_pool_destroy) + 3 IMP. All amendments incorporated. |
|
||
| Campaign scoreboard | 5/5 with **1 newly direct** (VP9) + 1 maintained direct (MPEG-2) + 3 mixed (H.264 keyframe-partial, VP8 partial-new-Bug-6, HEVC pre-existing-Bug-5) |
|
||
|
||
## Commits shipped
|
||
|
||
### Fork (libva-v4l2-request-fourier)
|
||
|
||
**β-final commits** (the actual iter5b shipped code):
|
||
|
||
| SHA | Files | LOC | Description |
|
||
|---|---|---|---|
|
||
| `1c548b1` (A) | `src/codec.h` (new) · `src/codec.c` (new) · `src/meson.build` | +102 / -2 | `pixelformat_for_profile()` helper mapping VAProfile → V4L2 OUTPUT FOURCC |
|
||
| `cc077a0` (B) | `src/config.c` | +11 | Wire `object_config->pixelformat` at CreateConfig |
|
||
| `7055b14` (C) | `src/surface.c` · `src/surface.h` · `src/context.c` · `src/request.h` | +238 / -280 | β refactor: CreateSurfaces2 stripped, CreateContext owns OUTPUT lifecycle. CRIT-1 (NULL guard) + CRIT-2 (request_pool_destroy) + IMP-1 (probe order) + IMP-2 (dead field cleanup) all incorporated |
|
||
| `70196f8` (D) | `src/context.c` · `src/surface.c` · `src/surface.h` · `src/request.h` | +136 / -35 | Fix-forward: format-uniform cache in driver_data, walk surface_heap, lazy-fill in CreateSurfaces2. Closes the ffmpeg-vaapi-copy surfaces_count=0 gap |
|
||
|
||
**α' history** (rejected, kept in commit log):
|
||
|
||
| SHA | Status |
|
||
|---|---|
|
||
| `ce304ef` (α' A) | Reverted by `6bc29ec` |
|
||
| `f8256e6` (α' B) | Reverted by `9a7f888` |
|
||
| `4b2288f` (α' C) | Reverted by `709ab34` |
|
||
|
||
### Campaign repo (fresnel-fourier)
|
||
|
||
| Commit | Phase | Description |
|
||
|---|---|---|
|
||
| `8acfca3` | Phase 0 v1 | Lock Candidate B (vb2_dma_resv) — later corrected via Phase 2 source-read |
|
||
| `31b9255` | Phase 0 amend | Bug 3 collapsed, criteria 5→4 |
|
||
| `9941523` | Phase 2 v1 | Situation, RFC v2 patches |
|
||
| `3c05564` | Phase 3 | 5-codec baseline sweep + all-zero cap_pool init signature |
|
||
| `a809e9c` | Phase 4 v1 | Option α' plan (rejected later) |
|
||
| `7d1c44b` | Phase 5 v1 | α' review |
|
||
| `550bb81` | Phase 6 v1 | α' commits landed (also reverted later) |
|
||
| `864af25` | Phase 7 v1 | α' FAIL (HEVC SIGSEGV) → loopback |
|
||
| `5abea73` | Phase 4 v2 | Option β plan |
|
||
| `3508a2c` | Phase 5 v2 | β review (CRIT-1, CRIT-2) |
|
||
| `311411b` | Phase 6 v2 | β commits landed |
|
||
| `c773c3d` | Phase 7 v2 | β PARTIAL PASS verdict |
|
||
|
||
## What worked
|
||
|
||
- **Option β architecture** delivered its core claim: CreateSurfaces2 touches no V4L2 device state, CreateContext owns the OUTPUT lifecycle. The α' destructive-teardown failure mode is structurally eliminated.
|
||
- **VP9 unblocked directly**: `libva == kdirect == sw` for `bbb_720p10s_vp9.webm`. The iter4 transitive proof becomes a direct proof.
|
||
- **MPEG-2 not regressed**: after Commit D, libva == kdirect (HW≠SW is codec precision drift, unrelated to Bug 2).
|
||
- **H.264 keyframe path preserved**: same `71ac099b…` hash as pre-iter5b. Bug 4 deferred cleanly.
|
||
- **Phase 5 reviewer's empirical-over-theoretical discipline saved Phase 6 twice**:
|
||
- α' Phase 5 caught a compile-failing iterator type bug.
|
||
- β Phase 5 v2 caught CRIT-1 (NULL guard) + CRIT-2 (missing `request_pool_destroy`) before either would have shipped broken.
|
||
|
||
## What didn't work / surfaced
|
||
|
||
### Bug 5 — HEVC libva decode rejected by kernel rkvdec
|
||
|
||
Pre-iter5b AND post-β: every DQBUF on HEVC libva path has `V4L2_BUF_FLAG_ERROR`. Kernel rkvdec rejects the decode regardless of OUTPUT pixel format (β fixed format, ERROR persisted).
|
||
|
||
iter2 closed HEVC as PASS via transitive proof. The proof verified backend's `VIDIOC_S_EXT_CTRLS` payload byte-matched kdirect's payload — which is correct as far as it goes, but doesn't catch V4L2 protocol-level differences (request_fd binding sequencing, ioctl ordering, sequence numbers, etc.).
|
||
|
||
**Reframing**: iter2's HEVC "PASS" was a partial proof. The control payload is correct; what's wrong is something else in the libva backend's V4L2 protocol use that kernel-direct ffmpeg-v4l2request does differently. Out of iter5b scope; deferred to iter6+ as Bug 5.
|
||
|
||
### Bug 6 — VP8 libva decode produces non-zero but non-matching output
|
||
|
||
Pre-iter5b: all-zero. Post-β: real content (256 unique bytes, plausible BBB keyframe pixels at `93 8e 8a 89 …`) but doesn't byte-match kdirect. Some frames differ from others (motion content). 74.8% bytes are 0x00 — suggests partial fill.
|
||
|
||
Plausible causes: slot-rotation bug, partial-buffer-fill, or per-frame DPB binding issue. Hadn't surfaced pre-β because all-zero output masked it. Out of iter5b scope; deferred to iter6+ as Bug 6.
|
||
|
||
### Bug 4 (carried from iter4 Phase 7 doc) — H.264 inter-frame race-loss
|
||
|
||
Persisted unchanged through iter5b. H.264 keyframe decodes; inter frames return zero. Some race or sync issue specific to H.264 inter. Out of iter5b-β scope by Phase 1 lock.
|
||
|
||
## Phase 1 success criteria — final reckoning
|
||
|
||
| # | Criterion | Verdict | Evidence |
|
||
|---|---|---|---|
|
||
| 1 | HEVC + VP9 + VP8 libva == kdirect | **PARTIAL (1 of 3)** | VP9 ✓, HEVC ✗ (Bug 5), VP8 ✗ (Bug 6) |
|
||
| 2 | MPEG-2 unchanged | **PASS** | `libva_mpeg2 == kdirect_mpeg2 == 19eefbf4…` post-Commit-D |
|
||
| 3 | H.264 keyframe still decodes | **PASS** | `libva_h264 == 71ac099b…` unchanged (Bug 4 deferred) |
|
||
| 4 | Control-payload anchors hold | **PASS** | β touched no control-handling code |
|
||
|
||
Net: 3 of 4 PASS. Criterion 1 hit 1/3 of its goal. iter5b-β does not close fully. Bugs 4/5/6 are explicit findings handed off to iter6+.
|
||
|
||
## Lessons distilled
|
||
|
||
### Phase 5 reviewer track-record (both reviews)
|
||
|
||
| Review | Findings | Author re-verified | Estimated savings |
|
||
|---|---|---|---|
|
||
| α' Phase 5 | 1 CRIT (mechanical) + 3 IMP | All correct | 1 failed `meson compile` cycle |
|
||
| β Phase 5 v2 | 2 CRIT (NULL guard, request_pool_destroy) + 3 IMP | All correct | β would have crashed at first CreateContext without CRIT-1; second CreateContext would have hung on stale pool without CRIT-2. Half a session at least. |
|
||
|
||
Both reviews paid off. Empirical-over-theoretical (Direction 2) worked: reviewers grepped, enumerated, and read source rather than trusting the plan's surface claims.
|
||
|
||
### α' failure: dismissing the TODO comment
|
||
|
||
The Phase 4 v1 plan dismissed `surface.c:164-171`'s TODO comment as targeting multi-resolution-mid-stream only, on the grounds that iter5b's bug was "initial CreateSurfaces2" not "multi-resolution mid-stream." Empirically wrong: ffmpeg-vaapi's multi-CreateSurfaces2 pattern hit the same gate, triggered destructive teardown mid-stream, and SIGSEGV'd HEVC.
|
||
|
||
The Phase 2 v1 reviewer's original recommendation (option β) was correct. iter5b's Phase 4 v1 author (claude) overrode it on insufficient grounds. **Trust Phase 2 architectural recommendations when smaller alternatives carry lifecycle ordering risk.**
|
||
|
||
### Commit D surprise: vaapi-copy passes surfaces_count=0
|
||
|
||
The iter6 comment at `context.c:262` explicitly documented that ffmpeg-vaapi-copy passes `surfaces_count=0` to vaCreateContext. Phase 4 v2 plan acknowledged the comment but didn't trace the implication for the new destination_* walk in CreateContext. **Plans that touch lifecycle code must enumerate ALL existing comments AND grep ALL call sites for affected symbols.**
|
||
|
||
### Transitive proof's blind spot
|
||
|
||
iter1-4 closed HEVC/VP8/VP9 via "transitive proof": backend control bytes == kdirect control bytes AND kdirect decode == SW. The proof verified the BACKEND ↔ KDIRECT contract on a SPECIFIC ARTIFACT (control payloads). It did NOT verify the actual decode outcome through the backend. iter5b-β Phase 7 confirmed:
|
||
- VP9: transitive PASS held under direct verification.
|
||
- HEVC: transitive PASS did NOT hold under direct verification — Bug 5 was masked.
|
||
- VP8: transitive PASS partially held (decode runs, output diverges) — Bug 6 was masked.
|
||
|
||
**Transitive proofs are useful but partial.** They prove the part they verify. They don't prove what they don't verify. Future "transitive PASS" claims should also state explicitly what is NOT verified.
|
||
|
||
## Memory rule updates
|
||
|
||
### Extend `feedback_review_empirical_over_theoretical.md`
|
||
|
||
Existing rule: "lean empirical, not theoretical (BOTH directions) — empirical wins over source-read in BOTH author-rebuttal AND author-adopting-reviewer-amendment directions. Test-compile reviewer-suggested field mappings before incorporating into the amended plan."
|
||
|
||
Add: when a plan claims "no change needed" for a code region, the author must grep all affected symbols/functions/macros and enumerate the call sites. The iter5b-β Phase 4 v2 plan's `DestroyContext — no change needed` claim missed that `request_pool_destroy` had only ONE per-session caller (the line surface.c:220 which β strips), invisible from a surface read. Phase 5 v2 reviewer's grep caught it.
|
||
|
||
### Sibling rule from Commit D
|
||
|
||
`feedback_trust_iter_comments_for_lifecycle.md` (proposed): when refactoring lifecycle code (CreateContext, CreateSurfaces2, BeginPicture, DestroyContext), enumerate ALL inline comments in the affected functions. Each comment was placed because some prior iteration discovered a specific bug; the comment encodes the contract that bug discovery established. A plan that doesn't trace the implication of each comment risks re-introducing the bug.
|
||
|
||
Specific example: `context.c:262` documented ffmpeg-vaapi-copy's `surfaces_count==0` pattern from iter6. Phase 4 v2 read the comment but didn't trace "what does my new code do when surfaces_count==0?" Phase 7 revealed the answer.
|
||
|
||
Defer the memory-rule write to a separate commit alongside this Phase 8 close.
|
||
|
||
### Transitive-proof discipline note
|
||
|
||
Add to `reference_dmabuf_resv_blocker.md` (which documented the transitive-proof pattern): explicit caveat that transitive proofs verify the AXIS proven (e.g., control payload byte-equality) but don't verify other axes (decode-actually-succeeds, output-actually-matches). When using transitive proof, state what's verified AND what's not.
|
||
|
||
## Phase 4 cross-cutting backlog status (iter5b-β increment)
|
||
|
||
Inherited from iter4:
|
||
- **iter4-B1** auto-detect device discrimination — STILL OPEN. Device numbering shuffles per boot continue to cost a few minutes per session.
|
||
- **iter4-B2** mpv-vaapi `Could not create device` — STILL OPEN. Consumer-side.
|
||
- **iter4-Q6**, **COLOR_RANGE** — STILL OPEN.
|
||
- **B3** picture.c BeginPicture profile-aware reset — STILL OPEN.
|
||
- **B4** context.c log suppression for unsupported codec controls — STILL OPEN.
|
||
- **B5** mpeg2 vbv_buffer_size — STILL OPEN.
|
||
- **B6** h265 SPS bitstream-parse fidelity gap — STILL OPEN.
|
||
- **L3** vaDeriveImage cache-stale — STILL OPEN.
|
||
|
||
iter5b NEW backlog items:
|
||
- **Bug 4** H.264 inter-frame race-loss (carried from iter4 P7 doc).
|
||
- **Bug 5** HEVC libva decode rejected by kernel (DQBUF FLAG_ERROR). Pre-existing but only now explicitly named.
|
||
- **Bug 6** VP8 libva decode runs but output diverges (74.8% zero with real keyframe content).
|
||
- **`object_heap_first/next` iterator pattern** is canonicalized — Commit D uses it correctly per the iter5b-α' Phase 5 review CRIT-1 amendment. Worth documenting in a memory entry if not already covered.
|
||
|
||
## Iter5b → iter6 handoff
|
||
|
||
Substrate state at iter5b-β close:
|
||
- Fork tip `70196f8` on noether + fresnel + gitea.
|
||
- Backend installed SHA `2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8` on fresnel.
|
||
- Kernel `linux-fresnel-fourier 7.0-1` (unchanged).
|
||
- Test fixtures unchanged.
|
||
- Phase 3 baseline anchors at `iter5_phase3_baseline.tgz` still valid for iter6 Phase 3 reference.
|
||
- Phase 7 v2 artifacts at fresnel `/tmp/iter5b_p7v2/`.
|
||
|
||
iter6 candidate research questions (for user lock):
|
||
- **Candidate F**: Bug 5 — fix HEVC libva decode (whatever V4L2 protocol contract aspect the backend gets wrong vs kdirect).
|
||
- **Candidate G**: Bug 6 — fix VP8 libva partial output (slot rotation, partial fill, or DPB binding).
|
||
- **Candidate H**: Bug 4 — fix H.264 inter-frame race-loss.
|
||
- **Candidate I**: Re-anchor regression hashes on the iter5b-β substrate so iter6+ can use direct verification anchors (now that VP9 + MPEG-2 work, the per-codec stable hashes serve as the regression-test baseline).
|
||
- **Candidate J**: iter4-B1 auto-detect harden (device discrimination, eliminate per-boot env-override).
|
||
|
||
## Campaign scoreboard update
|
||
|
||
```
|
||
Codec | Site | Iter | Status | Verifier path
|
||
========|===========|========|===============|====================================
|
||
H.264 | rkvdec | T4 | PARTIAL | mpv keyframe-seek (Bug 4 inter race)
|
||
MPEG-2 | hantro | iter1 | PASS direct | ffmpeg-vaapi-hwdownload
|
||
HEVC | rkvdec | iter2 | DEGRADED * | transitive PASS / direct FAIL (Bug 5)
|
||
VP8 | hantro | iter3 | PARTIAL | transitive PASS / direct partial (Bug 6)
|
||
VP9 | rkvdec | iter4→iter5b | PASS direct ** | ffmpeg-vaapi-hwdownload (iter5b-β fix)
|
||
```
|
||
|
||
\* HEVC: iter2 closed as transitive PASS. iter5b-β empirically verified direct-mode decode fails (DQBUF ERROR). iter2's transitive-proof claim was technically true but masks a separate bug.
|
||
\*\* VP9: iter4 closed as transitive PASS (criterion-4 via transitive). iter5b-β upgrades to direct PASS.
|
||
|
||
Net iter5b-β contribution to campaign:
|
||
- **+1 direct verification** (VP9).
|
||
- **+0 codecs in fresh broken state** (all regressions caught + reverted in-iteration).
|
||
- **+2 named bugs** (5 and 6) for iter6+ work.
|
||
- **+1 architectural cleanup** (CreateSurfaces2 is now ~70 LOC, single-responsibility).
|
||
|
||
## Phase 8 commit + close
|
||
|
||
This document records iter5b-β close. Fork is at `70196f8`, backend on fresnel at SHA `2c6ff82c…`, campaign repo updated through Phase 7 v2 at `c773c3d`. Memory rule updates land as separate commits per the lessons above. iter6 Phase 0 awaits user research-question lock.
|