Files
fresnel-fourier/phase8_iteration8_close.md
T
marfrit 3ed1e454fb iter8 Phase 7c + 8: close iter8 PARTIAL — Bug 4 narrowed via 5 eliminations
α-2 (POC strip removal) changed wire bytes (POC now matches kdirect's
sentinel-encoded 0x10000) but H.264 output unchanged. POC not load-bearing.

5-codec regression sweep on α-2 backend: all 4 non-H.264 anchors hold.
Zero regression.

Iter8 close: 5/6 PASS, criterion-1 PARTIAL. Bug 4 narrowed but not fixed.

Eliminations achieved:
  1. libva-readback bug (γ dump)
  2. Slot-binding wrong (γ dump shows correct slot per surface)
  3. Stale residue (IMP-1 memset confirmed deterministic kernel write)
  4. constraint_set_flags (Phase 5b CRIT-1: rkvdec source review)
  5. POC sentinel strip (α-2 wire change, no output change)

Remaining candidates for iter9: PPS diff (α-3), DECODE_PARAMS post-DPB
fields (α-6), DPB entry order (α-4), slice data encoding (α-5).

Fork tip 0226684 carries γ + IMP-1 diagnostic + α-2 hygiene. All
env-gated off by default; α-2 is a wire-payload cleanup with zero
behavior effect.

Lessons distilled:
- Reviews are never skippable — Phase 5b CRIT-1 saved a build cycle.
- Wire-byte equivalence ≠ behavior equivalence.
- Per-driver kludges in shared codec code need explicit gating.
- Bug carryover labels can mislead (Bug 4 != "inter race-loss").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:01:36 +00:00

170 lines
8.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 8 — Phase 8 (close)
Closes 2026-05-13. iter8 = Bug 4 (H.264 partial-fill, originally "inter race-loss") PARTIAL close — narrowed via three eliminations, not fixed.
## Summary
| Metric | Value |
|---|---|
| Iteration target | Bug 4 — H.264 libva produces 16×32 partial-fill instead of full frame |
| Hardware | RK3399 rkvdec |
| Fork tip start (iter7 close) | `6df2159` |
| Fork tip end (iter8 close) | `0226684` (4 commits: γ dump, IMP-1 pre-zero, stdlib include fix-fwd, α-2 POC strip removal) |
| LOC delta | +103 / -3 across `src/surface.c`, `src/picture.c`, `src/h264.c` |
| Phase 1 criteria | 5/6 PASS (C1 PARTIAL — Bug 4 narrowed not fixed) |
| Reviews | 2 (Phase 5 γ-plan review + Phase 5b/5c α candidate reviews); 1 plan killed (α-1) |
| Hypotheses eliminated | 5 (libva-readback, slot-binding, stale-residue, constraint_set_flags, POC sentinel) |
| Campaign scoreboard | Unchanged on pixel-correctness axis; +0 net codec fixes. +diagnostic instrumentation (γ dump, pre-zero gate). |
## Phase-by-phase narrative
### Phase 0 — Bug 4 locked
User pick: H.264 inter race-loss (carryover label from iter4 Phase 7). Lock criterion: `libva_h264 == kdirect_h264` byte-identical.
### Phase 2 — H.264 source-read
Walked h264.c pipeline end-to-end (994 LOC). Mapped per-frame decode flow (BeginPicture → RenderPicture → EndPicture → SyncSurface). Eliminated 6 of 13 surface-level hypotheses by source-read alone.
### Phase 3 — empirical strace + byte-level YUV examination
**Bug redefinition**: NOT inter-race-loss — even keyframe (IDR) fails. Pattern: structured 16×32 byte patch of luma-neutral data at Y top-left, rest zero. Per-codec diff: VP9 works via libva, HEVC/H.264 don't. SPS bytes: only diff is `constraint_set_flags` (libva=0 vs kdirect=2). DECODE_PARAMS shows POC sentinel diff (libva strips, kdirect doesn't).
### Phase 4 — γ-then-α plan
Decided: diagnostic dump γ FIRST to distinguish "kernel didn't write" from "libva mis-reads."
### Phase 5 — γ plan reviewed
Sonnet-architect: 2 CRIT (logging API misuse, missing null guard) + 4 IMP + 3 MIN. All mechanical; strategy stood.
### Phase 6 — γ implemented
Added env-gated diagnostic dump to surface.c::RequestSyncSurface. Built, installed.
### Phase 7 — γ + IMP-1 verification
γ dump showed libva-readback path returns exactly what's in the buffer (no readback bug). IMP-1 memset experiment confirmed the 16×32 patch is deterministic (NOT stale residue). **Bug pinned to kernel-side**: rkvdec accepts request, writes only 512 bytes, stops without error flag.
### Phase 4b/5b — α-1 SPS constraint_set_flags fix (KILLED)
Plan: derive constraint_set_flags per-profile.
Phase 5b reviewer empirically read rkvdec-h264.c end-to-end and found `constraint_set_flags` is NEVER accessed by the driver. **CRIT-1 killed α-1 before any build.** Saved ~15 LOC + a build/install/test cycle.
Reviewer redirected to DPB / POC diff.
### Phase 4c/5c/6c — α-2 POC sentinel strip removal
Plan: remove `h264_strip_ffmpeg_poc_sentinel` (hantro-specific kludge that's not needed for rkvdec).
Phase 5c greenlit with one amendment (preserve VA_PICTURE_H264_INVALID → 0 guard).
Phase 6c implemented: ~12 LOC change in h264.c.
### Phase 7c — α-2 verification
α-2 changed wire bytes (POC now matches kdirect's 0x00010000) but output unchanged. **POC isn't load-bearing for Bug 4.** Three more hypotheses eliminated (libva-readback, slot-binding, stale-residue, constraint_set_flags, POC sentinel — 5 total).
5-codec regression sweep: all 4 non-H.264 hashes unchanged. Zero regression.
## Commits shipped
### Fork (libva-v4l2-request-fourier)
| SHA | Files | LOC | Description |
|---|---|---|---|
| `7eae6ea` | src/surface.c | +78 / -1 | γ env-gated CAPTURE buffer diagnostic dump |
| `66ecbef` | src/picture.c | +23 | IMP-1 env-gated CAPTURE pre-zero diagnostic |
| `6f4e583` | src/picture.c | +1 | stdlib.h include fix-forward (getenv) |
| `0226684` | src/h264.c | +12 / -2 | α-2: pass H.264 POC values through unchanged |
### Campaign repo (fresnel-fourier)
| Commit | Phase | Description |
|---|---|---|
| `e47a7ba` | P0 | Lock Bug 4 |
| `abd97e3` | P2 | H.264 source-read |
| `4320d78` | P3 | Strace findings + bug redefinition |
| `3a63076` | P4 | γ-then-α plan |
| `d4c04b4` | P5 | Architect review |
| (this) | P7 + P8 | Verification + close |
## What worked
- **Phase 5b's empirical source-read killed α-1 BEFORE the build cycle**. Saved a build/install/test pass. Pure win for the review discipline.
- **γ dump immediately ruled out libva-readback hypothesis**. Without γ, iter9 would still be considering libva-side fixes.
- **IMP-1 memset experiment** was a one-line discriminator (env-gated) that definitively ruled out stale-residue.
- **5-codec regression sweep at every step**: no anchors moved across γ, IMP-1, α-2 changes. Zero regression risk to date.
## What didn't work
- **α-1 (constraint_set_flags)**: hypothesis dead per source read.
- **α-2 (POC strip)**: wire bytes changed; output unchanged.
Net: 5 hypotheses eliminated, Bug 4 still present.
## Lessons distilled
### Lesson 1: Wire-byte equivalence ≠ behavior equivalence
α-2 made libva's wire-byte output match kdirect's for POC fields. Output unchanged. **A single-field wire diff is a necessary signal but not sufficient — without knowing whether the kernel actually consumes that field, matching it is theater.** Phase 5b CRIT-1's source-read methodology is the right discipline.
### Lesson 2: Per-driver kludges should be flagged as such
`h264_strip_ffmpeg_poc_sentinel` was a hantro-specific workaround embedded in shared H.264 code without per-driver gating. iter5b-β's `feedback_unconditional_codec_state.md` rule covers per-codec; this case suggests an analogous **per-V4L2-driver** rule for any kludge with implementation-specific motivation. Memory candidate.
### Lesson 3: Bug carryover labels can mislead
"H.264 inter race-loss" was the label since iter4 P7. Phase 3 empirically showed the keyframe also fails — the bug isn't race-related at all. **Re-validate the bug shape with empirical evidence before assuming the prior label is correct.** Memory candidate.
## Cross-cutting backlog status (iter8 increment)
Closed:
- (none — Bug 4 is PARTIAL, not closed)
Narrowed:
- **Bug 4**: 5 hypotheses eliminated. Remaining candidates: PPS field diff, DPB ordering, DECODE_PARAMS post-DPB fields, slice data encoding.
Still open (unchanged):
- iter4-B1b (multi-decoder routing)
- iter4-B2, B3, Q6, COLOR_RANGE, L3
- Bug 5 (HEVC kernel rejection)
- Bug 6 (VP8 partial output, kernel-side per iter6 narrowing)
## Iter8 → iter9 handoff
Substrate at close:
- Fork tip `0226684` on noether + fresnel + gitea.
- Backend SHA `b6a3958a…` on fresnel.
- Kernel `linux-fresnel-fourier 7.0-1` unchanged.
- Test fixtures unchanged.
- γ + IMP-1 diagnostic infrastructure in place (env-gated off by default).
Campaign scoreboard (unchanged from iter7):
```
Codec | Site | Status | Notes
========|===========|===============|====================================
H.264 | rkvdec | PARTIAL | Bug 4 narrowed; PPS diff next (iter9)
HEVC | rkvdec | TRANSITIVE * | Bug 5 deferred
VP9 | rkvdec | PASS direct | iter5b-β fix
MPEG-2 | hantro | PASS (env) | iter1 PASS
VP8 | hantro | PARTIAL (env) | Bug 6 deferred
```
iter9 candidate ranking (per Phase 7c):
1. **α-3 PPS full-byte diff** (lowest LOC, mechanical strace re-run + diff)
2. **α-6 DECODE_PARAMS post-DPB fields** (strace re-run with longer -s, diff bytes 512-559)
3. **α-4 DPB entry ordering** (~10 LOC change to h264_fill_dpb)
4. **α-5 Slice data encoding** (deepest investigation, ~60 min)
## Memory rule candidates (deferred to next memory-curation pass)
- **Per-V4L2-driver kludge gating**: any hantro / rkvdec / cedrus-specific workaround in shared codec code must be gated, similar to per-codec gating per `feedback_unconditional_codec_state.md`.
- **Bug label revalidation**: when picking up a carryover bug, run Phase 3-equivalent empirical reproduction first; the label may be wrong.
- **Wire-byte equivalence ≠ behavior equivalence**: a successful "match kdirect's bytes" change isn't proof of fix. Always run the success criterion verifier post-change.
## Phase 8 commit
This document records iter8 close. Fork at `0226684`. Backend SHA `b6a3958a…`. Bug 4 narrowed via 5 eliminations; 4 of 5 codec criteria unchanged (PASS); criterion-1 H.264 still PARTIAL.
iter8 = 5/6 PASS. iter9 has a clearer search space than iter8 opened with.