iter6 PARTIAL close: Bug 6 narrowed to H-E (kernel-side hantro VP8 partial-write)

Phase 3 Candidate K executed: H-D (slot rotation) ELIMINATED via instrumented bind+read site logging. Slot v4l2_index matches at BeginPicture and at vaGetImage for every surface; destination_data[0] matches slot->map[0]. No rotation mismatch.

H-A/B/C/D all eliminated. H-E (kernel-side hantro VP8 partial-write) confirmed by elimination. The libva backend submits correct controls, correct slice bytes, correct slices_size, correct slot indices. Kernel writes erratic partial content (per-frame Y plane transitions at row 536, 24, ...; not a clean buffer-size truncation, not slot rotation).

iter6 close PARTIAL: 5 of 6 Phase 1 criteria PASS; criterion 1 (libva_vp8 == kdirect) PARTIAL: kernel-side fix needed, out of iter6's locked backend-only scope.

No patches landed. Fresnel substrate unchanged: fork tip 70196f8, backend SHA 2c6ff82c... (identical to iter5b-β close).

Net deliverable: Phase 3 narrowing reduces Bug-6 hypothesis space from 5 to 1. Future iter7+ (or kernel-agent campaign) picks up the kernel-side investigation.

Pattern recognized: iter2 HEVC transitive PASS masked Bug 5; iter3 VP8 transitive PASS masked Bug 6. Both surfaced under direct verification post-iter5b-β. Transitive proofs against ONE artifact (control payload) don't catch bugs in OTHER artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -39,11 +39,26 @@ VP8 keyframe control payload byte-identical between libva and kdirect on the cur
## Remaining hypotheses
### H-D (CAPTURE slot rotation mismatch): open
### H-D (CAPTURE slot rotation mismatch): ELIMINATED (user pick Candidate K, executed 2026-05-12)

Not directly tested in this session. Would need: log `slot->v4l2_index` at cap_pool_acquire time and at copy_surface_to_image read time; verify they match. If they diverge, libva reads from a stale slot while kernel wrote to a different one.

Instrumented `surface_bind_slot` in surface.c and `copy_surface_to_image` in image.c to log slot indices and `destination_data[]` pointers. Re-ran VP8 sweep.

### H-E (kernel-side hantro VP8 quirk): open and increasingly likely

Empirical result (excerpt):

```
H-D bind: surface=0xaaab0111d630 slot=v4l2_index=0 dst_index=0 map[0]=0xffffa465e000
H-D bind: surface=0xaaab01122110 slot=v4l2_index=1 dst_index=1 map[0]=0xffffa450c000
H-D bind: surface=0xaaab01126bf0 slot=v4l2_index=2 dst_index=2 map[0]=0xffffa43ba000
H-D read: surface=0xaaab0111d630 dst_index=0 current_slot=… destination_data[0]=0xffffa465e000 destination_data[1]=0xffffa473f000
H-D read: surface=0xaaab01122110 dst_index=1 current_slot=… destination_data[0]=0xffffa450c000 destination_data[1]=0xffffa45ed000
H-D read: surface=0xaaab01126bf0 dst_index=2 current_slot=… destination_data[0]=0xffffa43ba000 destination_data[1]=0xffffa449b000
```

For each surface: the slot v4l2_index at `surface_bind_slot` (BeginPicture time) **matches** the dst_index at `copy_surface_to_image` (vaGetImage time). The `destination_data[0]` pointer matches `slot->map[0]` returned by `cap_pool_acquire`. **No slot rotation mismatch.** H-D eliminated.

(Bonus observation: cap_pool acquires slots in increasing index order, slot 0, 1, 2, 3, … through 12+, over a 3-frame decode. LRU semantics working as designed.)
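
The bind/read comparison described above can be expressed as a small offline check over parsed log records. The following is only a sketch: `struct hd_record`, `hd_record_consistent`, and `hd_count_mismatches` are hypothetical names invented for illustration, not backend code, and it assumes each "H-D bind" line has already been paired with the "H-D read" line for the same surface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One paired "H-D bind" / "H-D read" observation, as parsed from the log.
 * Hypothetical type: the field names mirror the log keys, not backend structs. */
struct hd_record {
    uint64_t surface;     /* surface pointer value printed in both log lines */
    uint32_t bind_index;  /* slot v4l2_index logged at bind time */
    uint64_t bind_map0;   /* map[0] logged at bind time */
    uint32_t read_index;  /* dst_index logged at read time */
    uint64_t read_data0;  /* destination_data[0] logged at read time */
};

/* True when the slot read at vaGetImage time is the slot bound at BeginPicture
 * time: same index and same plane-0 mapping address. */
bool hd_record_consistent(const struct hd_record *r)
{
    return r->bind_index == r->read_index && r->bind_map0 == r->read_data0;
}

/* Number of records whose bind-time and read-time slots disagree. */
size_t hd_count_mismatches(const struct hd_record *recs, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (!hd_record_consistent(&recs[i]))
            bad++;
    return bad;
}
```

Fed the three surfaces from the excerpt, a count of zero corresponds to the "no slot rotation mismatch" verdict; any non-zero count would have kept H-D open.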
### H-E (kernel-side hantro VP8 quirk): CONFIRMED by elimination of H-A/B/C/D

The output bytes show an **erratic partial-write pattern**:
@@ -108,12 +123,9 @@ iter6 hands off Bugs 4/5/6 to iter7+. Memory updates for the iter6 lesson on tra
## Decision point

This is a user-decision point. Phase 3 has done its narrowing job. Bug 6's actual fix is either:

- 1-2 hours more empirical work (Candidate K, likely productive)
- Multi-session kernel-side work (Candidate L)
- Pivot to a different bug (Candidate M)
- Close iter6 partial and document (Candidate N)

User picked Candidate K. Phase 3 executed. **H-D eliminated**, **H-E confirmed**. iter6's Phase 1 locked scope was backend-only ("backend-side fix expected"). Bug 6 is kernel-side. **iter6 closes PARTIAL**: Phase 3 narrowing delivered as the iter6 contribution; Bug 6 fix deferred to iter7+ (would target kernel-side work, similar to original iter5 Candidate B scope).

Remaining user pick: which target for iter7.

## Substrate state at iter6 Phase 3 close
@@ -0,0 +1,105 @@
# Iteration 6: Phase 8 (close)

Closes 2026-05-12. iter6 = Bug 6 (VP8 partial output post-iter5b-β) targeted; root-cause narrowed but **not fixed**. iter6 closes PARTIAL with the narrowing as the iter6 deliverable. Bug 6's actual fix lives in kernel territory, outside iter6's Phase 1 locked scope.
## Summary

| Metric | Value |
|---|---|
| Iteration target | Bug 6: VP8 libva output diverges from kdirect (74.8% zero with real keyframe content) |
| Hardware | RK3399 hantro-vpu-dec for VP8 |
| Fork tip start (iter5b-β close) | `70196f8` |
| Fork tip end (iter6 close) | `70196f8` (unchanged; no patches landed) |
| Phase 1 criteria | 5/6 PASS, 1/6 PARTIAL |
| Bug 6 status | **narrowed to kernel-side (H-E)**; backend-side fix not available |
| Campaign scoreboard | unchanged from iter5b-β close: 5/5 with 2 direct (VP9, MPEG-2) + 3 mixed (H.264 keyframe-partial, VP8 partial-with-Bug-6, HEVC transitive-only with Bug 5) |
## Deliverable: Bug 6 root-cause narrowing

Phase 3 systematically eliminated 4 of 5 hypotheses via empirical investigation:

| Hypothesis | Status | Evidence |
|---|---|---|
| H-A: slice data corruption in libva path | **ELIMINATED** | libva-dumped slice 0 (300614 bytes) byte-identical to raw VP8 frame 0 from .webm post-header (SHA `9e74956c…`). |
| H-B: `slices_size` wrong on OUTPUT QBUF | **ELIMINATED** | slices_size = fp_size + sum(dct_part_sizes) = 22742 + 277872 = 300614 exactly. |
| H-C: CAPTURE-side cache coherency | **ELIMINATED** | VP9 uses same `image.c::copy_surface_to_image` path and works fine. msync attempt empirically no-op (output hash unchanged). |
| H-D: CAPTURE slot rotation mismatch | **ELIMINATED** (Phase 3 Candidate K) | Instrumented bind + read sites; slot v4l2_index matches between BeginPicture and vaGetImage for every surface tested. destination_data[0] pointer matches slot->map[0]. |
| H-E: kernel-side hantro VP8 partial-write quirk | **CONFIRMED by elimination** | The control bytes are right, the slice bytes are right, the buffer pointers are right, the slot indices are right. Yet the kernel writes only a partial subset of the frame to the CAPTURE buffer (per-frame transition rows: frame 0 Y row 536, frame 0 UV row 134, frame 1 Y row 24; erratic). |

The libva backend is doing the right thing on the V4L2 protocol surface. The kernel-side hantro VP8 driver writes partial content to the CAPTURE buffer.

This narrowing is iter6's deliverable. Future investigation (iter7 or kernel-agent work) needs ftrace, hantro source-read, possibly a kernel patch.
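
The H-B arithmetic in the table (first partition plus the sum of the DCT partitions) can be written as a small helper. This is a minimal sketch under the formula quoted in the table; `vp8_expected_slices_size` is a hypothetical name, not a function from the backend or the kernel uAPI, and the per-partition breakdown of the 277872-byte DCT total is not given in this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Expected slices_size for one VP8 frame, following the formula used for H-B:
 * slices_size = fp_size + sum(dct_part_sizes).
 * Hypothetical helper for illustration; a 64-bit accumulator avoids overflow
 * when summing 32-bit partition sizes. */
uint64_t vp8_expected_slices_size(uint32_t first_part_size,
                                  const uint32_t *dct_part_sizes,
                                  size_t num_dct_parts)
{
    uint64_t total = first_part_size;
    for (size_t i = 0; i < num_dct_parts; i++)
        total += dct_part_sizes[i];
    return total;
}
```

With the values from the H-B row (first partition 22742, DCT partitions totalling 277872), the helper yields 300614, the same figure as the libva-dumped slice length in the H-A row.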
## Per-criterion verdict

| # | Criterion | Verdict | Evidence |
|---|---|---|---|
| 1 | VP8 libva == kdirect | **PARTIAL**: kernel-side bug confirmed via systematic elimination, not fixed | post-β VP8 hash `bcc57ed5…`, kdirect `136ce5cb…` |
| 2 | VP9 unchanged | **PASS** | `4f1565e8…` == kdirect (iter5b-β state preserved) |
| 3 | MPEG-2 unchanged | **PASS** | `19eefbf4…` == kdirect (iter5b-β state preserved) |
| 4 | H.264 keyframe-partial unchanged | **PASS** | `71ac099b…` (Bug 4 still deferred) |
| 5 | HEVC unchanged | **PASS** | `06b2c5a0…` all-zero (Bug 5 still deferred) |
| 6 | Control-payload anchors hold | **PASS** | No control-submission code changes in iter6 |

5 of 6 PASS. Criterion 1 PARTIAL: the locked goal of `libva_vp8 == kdirect_vp8` not reached, but Phase 3 narrowed it from "unknown" to "kernel-side H-E confirmed."
## Cost of iter6

- Phase 0+1+2+3 docs.
- 0 fork commits (no patches landed).
- ~3 hours of diagnostic instrumentation (slice-data dump, msync test, slot-index logging) all reverted at end of Phase 3.
- Substrate unchanged: fork tip `70196f8`, backend SHA `2c6ff82c…`. Net delta on fresnel: nil.
- Net deliverable: Phase 3 narrowing doc with crisp H-A/B/C/D elimination + H-E confirmation. The hypothesis space for future Bug-6 work shrinks from 5 → 1.
## Lessons distilled

### Transitive-proof blind spots (re-confirmed)

iter3 closed VP8 via "transitive PASS" verifying specific control fields (`first_part_size`, `first_part_header_bits`) matched kdirect. iter5b-β unblocked VP8 from the OUTPUT-format mismatch. iter6 Phase 3 confirmed: control payload is byte-identical to kdirect, slice data is byte-identical to the raw .webm, slices_size is correct, and the kernel STILL produces wrong output.

iter3's transitive proof, iter5b-β's β refactor, and iter6's Phase 3 narrowing are all consistent with each other. The kernel-side partial-write (H-E) was masked by the OUTPUT format mismatch pre-β (all-zero output regardless of what else was right). Once β fixed the format, H-E became visible.

Same shape as iter2-HEVC + Bug 5: transitive proofs against ONE artifact (control payload) don't catch bugs in OTHER artifacts (the kernel's actual decode behavior).
### Phase 3 instrumentation hygiene

iter6 Phase 3 added/reverted instrumentation three times (slice dump, msync, slot indices). Each rebuild + run + analyze cycle was ~5 minutes. Total instrumentation overhead: maybe 30 minutes across 3 cycles. The discipline of "add diagnostic, run, revert" worked: fresnel ends at clean SHA `2c6ff82c…` identical to iter5b-β close.

Worth a memory note: **instrument-revert workflow is fast and safe for empirical narrowing**. Future Phase 3 work should embrace the same pattern instead of trying to deduce from source alone.
### iter6 didn't need Phase 4-7 because the Phase 3 verdict was "kernel-side, out of scope"

The 8(+1)-phase loop assumes Phase 3 narrows the fix site and Phase 4 plans the fix. When Phase 3 narrows OUT OF SCOPE (e.g., kernel-side when the iteration locked backend-only), Phase 4 onwards is moot. iter6 closes from Phase 3 directly to Phase 8 with the narrowing as the deliverable.

This is a legitimate close pattern, not a process violation. The dev process allows for "Phase 3 → Phase 0 loopback" or "Phase 3 → iteration close PARTIAL" when the empirical evidence demands it. iter6 demonstrates the latter.
## Phase 4 cross-cutting backlog status (iter6 increment)

Unchanged from iter5b-β close:
- iter4-B1 device-discrimination: still open.
- iter4-B2 mpv-vaapi create-device: still open.
- iter4-Q6, COLOR_RANGE, B3-B6, L3: all still open.

iter6 NEW backlog items:
- **Bug 6 narrowed to H-E**: kernel-side hantro VP8 partial-write. Per-frame Y plane transitions at varying rows (536, 24, ...). Not slot rotation, not cache, not slice corruption, not bytesused. Investigation surface: hantro driver source (hantro_g1_vp8_dec.c likely), ftrace at kernel for VP8 decode path, possibly a hantro-VP8 kernel patch.
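
For follow-up work on the per-frame transition rows, one way to locate a row automatically in a dumped plane is sketched below. It assumes, and this is an assumption rather than something the narrowing doc states, that "transition row" means the first row from which every remaining row of the plane is all-zero. `first_trailing_zero_row` is a hypothetical helper, not part of the backend.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first row from which all remaining rows of the
 * plane are entirely zero. Returns `height` when the last row has content and
 * 0 when the whole plane is zero. `stride` is the byte distance between rows;
 * only the first `width` bytes of each row are inspected.
 * Hypothetical helper; the "trailing zeros" definition is an assumption. */
size_t first_trailing_zero_row(const uint8_t *plane, size_t stride,
                               size_t width, size_t height)
{
    size_t row = height;
    while (row > 0) {
        const uint8_t *p = plane + (row - 1) * stride;
        size_t x = 0;
        while (x < width && p[x] == 0)
            x++;
        if (x < width)
            break;      /* row - 1 holds non-zero data: stop scanning upward */
        row--;
    }
    return row;
}
```

Running this per frame over the Y and UV planes of a libva dump and a kdirect dump would give a compact number to track across experiments, in place of eyeballing hex output.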
## iter6 → iter7 handoff

Substrate at close:
- Fork tip `70196f8` (iter5b-β + Commit D). Identical to iter6 open.
- Backend SHA `2c6ff82c…` on fresnel. Identical to iter6 open.
- Kernel unchanged.
- Test fixtures unchanged.

Candidates for iter7 Phase 0:
- **Bug 6 H-E kernel-side fix**: heavy, kernel work. Aligns with original iter5 Candidate B scope (vb2_dma_resv era kernel patches, plus new VP8 partial-write fix).
- **Bug 4 H.264 inter race-loss**: backend or kernel; iter4 partial work history.
- **Bug 5 HEVC DQBUF FLAG_ERROR**: iter2's transitive PASS masked it; need to find the V4L2 protocol gap between libva-HEVC and kdirect-HEVC.
- **iter4-B1 auto-detect harden**: backend, ~100 LOC, quality-of-life.
- **Re-anchor regression hashes on β substrate**: establish iter7+ regression invariants.

User picks at iter7 Phase 0 lock.
## Memory rule note (deferred)

iter6's lesson about Phase-3-empirical-revert workflow is worth a memory entry; defer the write to iter7 close or to a dedicated memory-curation session.