diff --git a/phase5c_iter8_review.md b/phase5c_iter8_review.md new file mode 100644 index 0000000..b24e38d --- /dev/null +++ b/phase5c_iter8_review.md @@ -0,0 +1,98 @@ +# Iteration 8 — Phase 5c Review (α-2 POC sentinel strip removal) + +Reviewed 2026-05-13. Reviewer: claude-sonnet-4-6. + +Plan under review: `/home/mfritsche/src/fresnel-fourier/phase4c_iter8_plan.md` + +Sources read: +- `/usr/include/linux/v4l2-controls.h` — `struct v4l2_h264_dpb_entry` (lines 1635–1644) +- `/home/mfritsche/src/linux-rfc/drivers/staging/media/rkvdec/rkvdec-h264.c` — lines 845–864, 975–991 +- `/home/mfritsche/src/linux-rfc/drivers/staging/media/rkvdec/rkvdec-regs.h` — `RKVDEC_CUR_POC` macro (line 150) +- `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/src/h264.c` — lines 191–228 (helper + docblock), 308–313 (DPB call sites), 461–466 (current-frame call sites) + +--- + +## Struct layout verification + +The plan decodes the 32-byte strace blob using this layout: + +| Offset | Field | Width | +|---|---|---| +| 0–7 | reference_ts | u64 | +| 8–11 | pic_num | u32 | +| 12–13 | frame_num | u16 | +| 14 | fields | u8 | +| 15–19 | reserved[5] | u8[5] | +| 20–23 | top_field_order_cnt | s32 | +| 24–27 | bottom_field_order_cnt | s32 | +| 28–31 | flags | u32 | + +Verified against `/usr/include/linux/v4l2-controls.h` lines 1635–1644. `struct calcsize` = 32. The reserved[5] padding gap (bytes 15–19) in the system header is explicit (`__u8 reserved[5]`) — no compiler-inserted padding. The plan's offset table is correct. + +The decoded values from the kdirect blob are: +`top_field_order_cnt = 0x10000 (65536)`, `bottom_field_order_cnt = 0x10000 (65536)`, `flags = 0x03`. + +The Phase 5b CRIT-1 re-read ("kdirect flags = 0x30001") was a misalignment error — the plan's correction is empirically confirmed. + +--- + +## CRIT findings + +**No CRIT findings.** The strace re-decode is structurally correct. The POC diff is real. + +--- + +## IMP findings + +### IMP-1: RKVDEC_CUR_POC masks to 32 bits — no masking of the sentinel specifically + +`rkvdec-regs.h:150`: +```c +#define RKVDEC_CUR_POC(x) ((x) & 0xffffffff) +``` + +This is a 32-bit mask on a 32-bit `s32` argument — it passes the value through unchanged. No bit-16 masking, no sentinel suppression. The full 32-bit POC value including the sentinel `0x10000` is written verbatim to `RKVDEC_REG_CUR_POC0` / `RKVDEC_REG_CUR_POC1`. + +Similarly, lines 975–978 write `dpb[i].top_field_order_cnt` and `dpb[i].bottom_field_order_cnt` directly via `writel_relaxed` with no intervening mask. + +The hardware therefore receives POC = 65536 from kdirect (sentinel preserved) vs POC = 0 from libva (sentinel stripped). These are numerically different inputs to the hardware's POC-based reference picture matching. The fix hypothesis is structurally sound. + +### IMP-2: Strip removal also changes current-frame POC at h264.c:462-465 — flag it, but it's correct + +Call sites at lines 461–466 strip the sentinel from `VAPicture->CurrPic.TopFieldOrderCnt` / `BottomFieldOrderCnt` before writing to `decode->top_field_order_cnt` / `decode->bottom_field_order_cnt`. Removing the strip (Option A) changes these fields too, not just the DPB entries. + +This is the right behaviour: rkvdec line 988 writes `dec_params->top_field_order_cnt` to `RKVDEC_REG_CUR_POC0` via `RKVDEC_CUR_POC`, and the hardware compares current-frame POC against reference POC for temporal distance calculations. Both sides must agree on whether the sentinel is present. kdirect sends the sentinel on both sides; Option A makes libva consistent with kdirect. + +**The plan's LOC estimate covers only 4 call sites (h264.c:309, 312, 462, 465) — confirmed by source read. The estimate is accurate.** + +### IMP-3: Option B not needed now, but add a TODO comment if Option A is chosen + +The plan argues Option A is safe because hantro-decoder doesn't support H.264 on RK3399. This is correct in the campaign context. However, the docblock at h264.c:191-218 documents the hantro-specific reason the strip was added. If Option A removes the function entirely, that rationale disappears. + +**Amendment**: Keep the helper function body but change it to return `poc` (or `poc` when valid, 0 when INVALID) without the sentinel subtraction. Add a one-line comment: `/* rkvdec writes POC verbatim to MMIO; stripping hantro's sentinel is wrong here */`. If a future port to hantro+H.264 needs the strip, the comment shows where to re-gate it. This costs 0 extra LOC net (same function, different body). + +Alternative: delete the function and inline `return poc;` at each call site — also fine, slightly cleaner. + +--- + +## MIN findings + +### MIN-1: R-2 probability rating ("medium") is reasonable + +The plan rates "POC diff isn't root cause" at medium. Given that (a) flags match, (b) fields match, (c) POC is the only confirmed numerical diff, and (d) kdirect with sentinel decodes correctly, medium is slightly conservative but appropriate — there may be slice-header fields that also differ (Phase 5b IMP-2 PPS concern). The rating doesn't need changing. + +--- + +## Amendments + +1. **Option A greenlit with one amendment**: When removing the sentinel subtraction, preserve the `VA_PICTURE_H264_INVALID` guard (`if (flags & VA_PICTURE_H264_INVALID) return 0;`) — empty DPB slots must still return 0. Only remove the `if (poc & (1 << 16)) return poc - (1 << 16);` branch. The function becomes a conditional identity, which can be inlined or kept. + +2. **Document both sides**: The current frame's POC call sites (h264.c:462, 465) are affected by Option A — this is correct and intentional. Phase 6c commit message should call this out explicitly so the reviewer doesn't flag it as an unintended scope change. + +--- + +## Verdict + +**Proceed with Option A.** The strace re-decode is correct. The struct layout is verified. rkvdec writes the full 32-bit POC with no masking. The strip is the confirmed diff between libva and kdirect. No CRIT objections. + +Phase 6c is a 5-LOC change. Phase 7c will confirm whether full-plane writes appear in the γ dump. diff --git a/phase7c_iter8_verification.md b/phase7c_iter8_verification.md new file mode 100644 index 0000000..ad1487c --- /dev/null +++ b/phase7c_iter8_verification.md @@ -0,0 +1,101 @@ +# Iteration 8 — Phase 7c (α-2 verification + 5-codec regression sweep) + +Captured 2026-05-13. Backend SHA `b6a3958a5bca9451…` (iter8 P6 γ + IMP-1 + α-2 POC strip removal). Fork tip `0226684`. + +## Verdict on α-2 + +**α-2 (remove POC sentinel strip) did NOT fix Bug 4.** + +| Metric | Before α-2 | After α-2 | +|---|---|---| +| H.264 libva YUV hash | `71ac099b…` | `71ac099b…` (unchanged) | +| Frame 1 non-zero bytes | 512 (16×32 leak) | 512 (same pattern) | +| Frame 2 non-zero | 32 sparse UV markers | 32 same | +| Frame 3 non-zero | 16 sparse UV markers | 16 same | + +α-2 DID change the wire-level POC bytes (now matches kdirect's sentinel-encoded `0x00010000` for tfoc/bfoc instead of libva's previously-stripped `0`), but the kernel produces identical output. **POC values are not load-bearing for Bug 4.** + +## What Phase 7c eliminated + +This iteration's α-2 attempt eliminated one more hypothesis. Combined with prior phases: + +| Hypothesis | Status | Eliminated by | +|---|---|---| +| libva mis-reads CAPTURE buffer | ❌ Eliminated | γ dump shows libva reads what's in buffer | +| Slot-binding wrong | ❌ Eliminated | γ shows correct slot index per surface | +| Stale residue from prior runs | ❌ Eliminated | IMP-1 memset experiment (same hash) | +| `constraint_set_flags` (SPS byte 1) | ❌ Eliminated | Phase 5b CRIT-1: rkvdec source review | +| H.264 POC sentinel strip | ❌ Eliminated | α-2 changed wire, no output change | + +## What remains as candidate + +DECODE_PARAMS / DPB construction beyond POC and flags: + +1. **DPB entry ORDER**. Phase 4c strace re-decode shows libva fills DPB in entry-array order (entry[0]=frame_1, entry[1]=frame_2), kdirect fills in short_ref[] order (DPB[0]=most-recent-ref = frame_2). May or may not matter; rkvdec docs unclear. +2. **DECODE_PARAMS fields after dpb[16]** (bytes 512–559 of the 560-byte struct). Phase 3 didn't deep-compare these. Includes nal_ref_idc, frame_num, current frame's tfoc/bfoc, pic_order_cnt_bit_size, dec_ref_pic_marking_bit_size, delta_pic_order_cnt*, flags. +3. **PPS field-by-field**. Phase 5b IMP-2 flagged Phase 3 only shallow-compared 12 bytes. Full byte diff needed. +4. **Slice data encoding**. The OUTPUT bitstream bytes the kernel sees per slice. Phase 3 didn't validate this. + +## 5-codec regression sweep on α-2 backend + +| Codec | Anchor (iter7) | iter8 α-2 hash | Verdict | +|---|---|---|---| +| H.264 | `71ac099b…` | `71ac099b…` | unchanged (Bug 4 still present) | +| HEVC | `06b2c5a0…` | `06b2c5a0…` | unchanged (Bug 5 still present) | +| VP9 | `4f1565e8…` | `4f1565e8…` | unchanged (PASS direct) | +| MPEG-2 | `19eefbf4…` | `19eefbf4…` | unchanged (PASS) | +| VP8 | `bcc57ed5…` | `bcc57ed5…` | unchanged (Bug 6 still present) | + +**Zero regression on the 4 non-H.264 codecs.** α-2 is a hygiene cleanup that aligns libva's wire payload more closely with kdirect's, even though it doesn't close Bug 4. + +## Iter8 criteria final reckoning + +| # | Criterion | Verdict | +|---|---|---| +| 1 | libva H.264 == kdirect H.264 | **FAIL** (PARTIAL — Bug 4 narrowed but unfixed) | +| 2 | VP9 unchanged | PASS | +| 3 | MPEG-2 unchanged | PASS | +| 4 | HEVC unchanged | PASS | +| 5 | VP8 unchanged | PASS | +| 6 | Control-payload anchors hold for 4 non-H.264 codecs | PASS (4 codec hashes unchanged ⇒ controls unchanged) | + +**5 of 6 PASS.** Criterion 1 PARTIAL: Bug 4 significantly narrowed (3 hypotheses eliminated) but not fixed. + +## Lasting code changes from iter8 + +The fork now carries 4 commits beyond iter7: + +| SHA | Description | Ship status | +|---|---|---| +| `7eae6ea` | γ env-gated CAPTURE buffer diagnostic dump | Diagnostic, env-gated off | +| `66ecbef` | IMP-1 env-gated CAPTURE pre-zero diagnostic | Diagnostic, env-gated off | +| `6f4e583` | stdlib.h include for picture.c | Mechanical fix | +| `0226684` | α-2: POC sentinel strip removal | Hygiene cleanup, no behavior change | + +All four commits are zero-regression by env-gating or by being mechanically necessary. Net effect on default behavior: nothing changed except internal POC values sent to rkvdec are sentinel-encoded (which doesn't change decoder output for any of 5 codecs). + +## Backend SHA chain + +| Stage | SHA | +|---|---| +| iter7 close | `520507f6d0a1a7eb3797bed42c6f74e0f3a4826ac8a22ed2655e01a6f20aa874` | +| iter8 P6 γ | `94467f0c0aaba804233206d01666302e42a5cbcaf513a8157045623a66fae58e` | +| iter8 P7 IMP-1 | `e4649a4854a24ec3209ea5ce6705cef8ed9b80cb277701ecb34eb822516c6030` | +| iter8 P6c α-2 | `b6a3958a5bca945164262339dea5cc28f17accce13d57bd9f0c5a5dabbdf1b53` | + +## iter8 → iter9 handoff + +iter8 closes PARTIAL with significant Bug 4 narrowing. Three hypotheses eliminated (constraint_set_flags, POC strip, libva-readback). iter9 candidates: + +- **α-3 PPS full-byte diff**: re-strace and dump all 12 bytes of PPS for libva vs kdirect. If diff exists, target it. Cost: 30 min for strace + analysis + small fix. +- **α-4 DPB entry order**: change `h264_fill_dpb` to fill in most-recent-first order matching kdirect. Cost: ~10 LOC + verify. +- **α-5 Slice data encoding**: deep-dump the OUTPUT buffer bytes libva writes vs kdirect's. Look for NAL-header / start-code / EBSP differences. Cost: 60 min + maybe small fix. +- **α-6 DECODE_PARAMS post-DPB fields**: re-strace with `-s 1024` to capture full DECODE_PARAMS; diff bytes 512-559. Cost: 30 min. + +Recommended order: α-3 (cheapest) → α-6 → α-4 → α-5. + +## Lessons distilled + +1. **Reviews caught a dead-end early**. Phase 5b CRIT-1's empirical rkvdec source read saved a ~15 LOC futile change. Memory rule reinforcement: `feedback_review_empirical_over_theoretical.md` direction (reviewer reads source > author's plan-level speculation). +2. **Wire-byte changes aren't always behavior changes**. α-2 visibly changed the V4L2 ioctl bytes (POC values flipped) but the decoder output was identical. Wire-level diffs are necessary but not sufficient. +3. **γ-then-α was the right empirical sequence**. The γ dump confirmed libva-readback wasn't the issue; without it, iter9 would still be considering libva-side fixes. diff --git a/phase8_iteration8_close.md b/phase8_iteration8_close.md new file mode 100644 index 0000000..7daa403 --- /dev/null +++ b/phase8_iteration8_close.md @@ -0,0 +1,169 @@ +# Iteration 8 — Phase 8 (close) + +Closes 2026-05-13. iter8 = Bug 4 (H.264 partial-fill, originally "inter race-loss") PARTIAL close — narrowed via three eliminations, not fixed. + +## Summary + +| Metric | Value | +|---|---| +| Iteration target | Bug 4 — H.264 libva produces 16×32 partial-fill instead of full frame | +| Hardware | RK3399 rkvdec | +| Fork tip start (iter7 close) | `6df2159` | +| Fork tip end (iter8 close) | `0226684` (4 commits: γ dump, IMP-1 pre-zero, stdlib include fix-fwd, α-2 POC strip removal) | +| LOC delta | +103 / -3 across `src/surface.c`, `src/picture.c`, `src/h264.c` | +| Phase 1 criteria | 5/6 PASS (C1 PARTIAL — Bug 4 narrowed not fixed) | +| Reviews | 2 (Phase 5 γ-plan review + Phase 5b/5c α candidate reviews); 1 plan killed (α-1) | +| Hypotheses eliminated | 5 (libva-readback, slot-binding, stale-residue, constraint_set_flags, POC sentinel) | +| Campaign scoreboard | Unchanged on pixel-correctness axis; +0 net codec fixes. +diagnostic instrumentation (γ dump, pre-zero gate). | + +## Phase-by-phase narrative + +### Phase 0 — Bug 4 locked + +User pick: H.264 inter race-loss (carryover label from iter4 Phase 7). Lock criterion: `libva_h264 == kdirect_h264` byte-identical. + +### Phase 2 — H.264 source-read + +Walked h264.c pipeline end-to-end (994 LOC). Mapped per-frame decode flow (BeginPicture → RenderPicture → EndPicture → SyncSurface). Eliminated 6 of 13 surface-level hypotheses by source-read alone. + +### Phase 3 — empirical strace + byte-level YUV examination + +**Bug redefinition**: NOT inter-race-loss — even keyframe (IDR) fails. Pattern: structured 16×32 byte patch of luma-neutral data at Y top-left, rest zero. Per-codec diff: VP9 works via libva, HEVC/H.264 don't. SPS bytes: only diff is `constraint_set_flags` (libva=0 vs kdirect=2). DECODE_PARAMS shows POC sentinel diff (libva strips, kdirect doesn't). + +### Phase 4 — γ-then-α plan + +Decided: diagnostic dump γ FIRST to distinguish "kernel didn't write" from "libva mis-reads." + +### Phase 5 — γ plan reviewed + +Sonnet-architect: 2 CRIT (logging API misuse, missing null guard) + 4 IMP + 3 MIN. All mechanical; strategy stood. + +### Phase 6 — γ implemented + +Added env-gated diagnostic dump to surface.c::RequestSyncSurface. Built, installed. + +### Phase 7 — γ + IMP-1 verification + +γ dump showed libva-readback path returns exactly what's in the buffer (no readback bug). IMP-1 memset experiment confirmed the 16×32 patch is deterministic (NOT stale residue). **Bug pinned to kernel-side**: rkvdec accepts request, writes only 512 bytes, stops without error flag. + +### Phase 4b/5b — α-1 SPS constraint_set_flags fix (KILLED) + +Plan: derive constraint_set_flags per-profile. +Phase 5b reviewer empirically read rkvdec-h264.c end-to-end and found `constraint_set_flags` is NEVER accessed by the driver. **CRIT-1 killed α-1 before any build.** Saved ~15 LOC + a build/install/test cycle. + +Reviewer redirected to DPB / POC diff. + +### Phase 4c/5c/6c — α-2 POC sentinel strip removal + +Plan: remove `h264_strip_ffmpeg_poc_sentinel` (hantro-specific kludge that's not needed for rkvdec). +Phase 5c greenlit with one amendment (preserve VA_PICTURE_H264_INVALID → 0 guard). +Phase 6c implemented: ~12 LOC change in h264.c. + +### Phase 7c — α-2 verification + +α-2 changed wire bytes (POC now matches kdirect's 0x00010000) but output unchanged. **POC isn't load-bearing for Bug 4.** Three more hypotheses eliminated (libva-readback, slot-binding, stale-residue, constraint_set_flags, POC sentinel — 5 total). + +5-codec regression sweep: all 4 non-H.264 hashes unchanged. Zero regression. + +## Commits shipped + +### Fork (libva-v4l2-request-fourier) + +| SHA | Files | LOC | Description | +|---|---|---|---| +| `7eae6ea` | src/surface.c | +78 / -1 | γ env-gated CAPTURE buffer diagnostic dump | +| `66ecbef` | src/picture.c | +23 | IMP-1 env-gated CAPTURE pre-zero diagnostic | +| `6f4e583` | src/picture.c | +1 | stdlib.h include fix-forward (getenv) | +| `0226684` | src/h264.c | +12 / -2 | α-2: pass H.264 POC values through unchanged | + +### Campaign repo (fresnel-fourier) + +| Commit | Phase | Description | +|---|---|---| +| `e47a7ba` | P0 | Lock Bug 4 | +| `abd97e3` | P2 | H.264 source-read | +| `4320d78` | P3 | Strace findings + bug redefinition | +| `3a63076` | P4 | γ-then-α plan | +| `d4c04b4` | P5 | Architect review | +| (this) | P7 + P8 | Verification + close | + +## What worked + +- **Phase 5b's empirical source-read killed α-1 BEFORE the build cycle**. Saved a build/install/test pass. Pure win for the review discipline. +- **γ dump immediately ruled out libva-readback hypothesis**. Without γ, iter9 would still be considering libva-side fixes. +- **IMP-1 memset experiment** was a one-line discriminator (env-gated) that definitively ruled out stale-residue. +- **5-codec regression sweep at every step**: no anchors moved across γ, IMP-1, α-2 changes. Zero regression risk to date. + +## What didn't work + +- **α-1 (constraint_set_flags)**: hypothesis dead per source read. +- **α-2 (POC strip)**: wire bytes changed; output unchanged. + +Net: 5 hypotheses eliminated, Bug 4 still present. + +## Lessons distilled + +### Lesson 1: Wire-byte equivalence ≠ behavior equivalence + +α-2 made libva's wire-byte output match kdirect's for POC fields. Output unchanged. **A single-field wire diff is a necessary signal but not sufficient — without knowing whether the kernel actually consumes that field, matching it is theater.** Phase 5b CRIT-1's source-read methodology is the right discipline. + +### Lesson 2: Per-driver kludges should be flagged as such + +`h264_strip_ffmpeg_poc_sentinel` was a hantro-specific workaround embedded in shared H.264 code without per-driver gating. iter5b-β's `feedback_unconditional_codec_state.md` rule covers per-codec; this case suggests an analogous **per-V4L2-driver** rule for any kludge with implementation-specific motivation. Memory candidate. + +### Lesson 3: Bug carryover labels can mislead + +"H.264 inter race-loss" was the label since iter4 P7. Phase 3 empirically showed the keyframe also fails — the bug isn't race-related at all. **Re-validate the bug shape with empirical evidence before assuming the prior label is correct.** Memory candidate. + +## Cross-cutting backlog status (iter8 increment) + +Closed: +- (none — Bug 4 is PARTIAL, not closed) + +Narrowed: +- **Bug 4**: 5 hypotheses eliminated. Remaining candidates: PPS field diff, DPB ordering, DECODE_PARAMS post-DPB fields, slice data encoding. + +Still open (unchanged): +- iter4-B1b (multi-decoder routing) +- iter4-B2, B3, Q6, COLOR_RANGE, L3 +- Bug 5 (HEVC kernel rejection) +- Bug 6 (VP8 partial output, kernel-side per iter6 narrowing) + +## Iter8 → iter9 handoff + +Substrate at close: +- Fork tip `0226684` on noether + fresnel + gitea. +- Backend SHA `b6a3958a…` on fresnel. +- Kernel `linux-fresnel-fourier 7.0-1` unchanged. +- Test fixtures unchanged. +- γ + IMP-1 diagnostic infrastructure in place (env-gated off by default). + +Campaign scoreboard (unchanged from iter7): + +``` +Codec | Site | Status | Notes +========|===========|===============|==================================== +H.264 | rkvdec | PARTIAL | Bug 4 narrowed; PPS diff next (iter9) +HEVC | rkvdec | TRANSITIVE * | Bug 5 deferred +VP9 | rkvdec | PASS direct | iter5b-β fix +MPEG-2 | hantro | PASS (env) | iter1 PASS +VP8 | hantro | PARTIAL (env) | Bug 6 deferred +``` + +iter9 candidate ranking (per Phase 7c): +1. **α-3 PPS full-byte diff** (lowest LOC, mechanical strace re-run + diff) +2. **α-6 DECODE_PARAMS post-DPB fields** (strace re-run with longer -s, diff bytes 512-559) +3. **α-4 DPB entry ordering** (~10 LOC change to h264_fill_dpb) +4. **α-5 Slice data encoding** (deepest investigation, ~60 min) + +## Memory rule candidates (deferred to next memory-curation pass) + +- **Per-V4L2-driver kludge gating**: any hantro / rkvdec / cedrus-specific workaround in shared codec code must be gated, similar to per-codec gating per `feedback_unconditional_codec_state.md`. +- **Bug label revalidation**: when picking up a carryover bug, run Phase 3-equivalent empirical reproduction first; the label may be wrong. +- **Wire-byte equivalence ≠ behavior equivalence**: a successful "match kdirect's bytes" change isn't proof of fix. Always run the success criterion verifier post-change. + +## Phase 8 commit + +This document records iter8 close. Fork at `0226684`. Backend SHA `b6a3958a…`. Bug 4 narrowed via 5 eliminations; 4 of 5 codec criteria unchanged (PASS); criterion-1 H.264 still PARTIAL. + +iter8 = 5/6 PASS. iter9 has a clearer search space than iter8 opened with.