# Iteration 8 — Phase 4 (plan) Drafted 2026-05-13 after Phase 3 redefined Bug 4 as partial-fill (not inter race). This Phase 4 picks an **empirically-driven plan**: a diagnostic dump (γ) FIRST to definitively distinguish kernel-write failure from libva-read failure, then targeted fixes contingent on what γ reveals. ## Decision: γ-then-α path Phase 3 evidence: - libva H.264 produces a structured 16×32 patch of real-looking image content at Y plane top-left (cluster around 0x7c..0x83), but kdirect H.264 of the same frame shows uniform 0x10 (near-black BBB intro) throughout. - The libva patch content is **not even consistent** with BBB frame 1's true content. It's content from somewhere else. - VP9 via libva reads back correctly. HEVC via libva returns all-zero. H.264 via libva returns a partial-fill garbage. These three distinct signatures across codecs (VP9 correct, HEVC zeroed, H.264 partial-leak) **cannot be explained by a single root cause hypothesis** without empirical narrowing. **Phase 4 commits to a diagnostic dump first** (γ), small (~30 LOC), gated behind `LIBVA_V4L2_DUMP_CAPTURE` env var. The dump fires inside `RequestSyncSurface` after the CAPTURE DQBUF, before `cap_pool_mark_decoded`. Output: a small hex+stats summary of `surface->destination_data[0]` (Y plane) + `destination_data[1]` (UV plane) — written to a Phase-3-style log. Empirical question γ answers: > "Does the kernel write proper NV12 pixel data to the CAPTURE buffer? If yes — the bug is in libva's readback. If no — the bug is in the kernel's response to libva's control submission." If γ shows kernel wrote correct data: pivot to mmap / cache / offset bug in libva. Likely fix in `surface_bind_slot` or cap_pool mmap. If γ shows kernel wrote partial/wrong data: pivot to specific control field fix (α — adjust SPS.constraint_set_flags, DECODE_PARAMS.dpb.bottom_field_order_cnt, etc. to match kdirect). If γ shows the buffer holds stale data (matching a prior test run's content): the kernel didn't write, AND the cap_pool slot was previously used. Initial write pattern needs investigation. ## In scope - `src/surface.c::RequestSyncSurface` — add diagnostic dump after CAPTURE DQBUF. - `src/cap_pool.c` (if needed) — expose slot's underlying buffer pointer. - `src/request.h` — possibly extend driver_data struct for env-var caching. ## Out of scope - Any per-codec H.264 control-fill changes (those follow γ's results). - VP9 / VP8 / HEVC / MPEG-2 paths. - Kernel patches. ## Plan A: γ diagnostic implementation ### Change shape Add to `RequestSyncSurface` after line 388 (`cap_pool_mark_decoded`): ```c static const char *dump_env = NULL; static bool dump_env_checked = false; if (!dump_env_checked) { dump_env = getenv("LIBVA_V4L2_DUMP_CAPTURE"); dump_env_checked = true; } if (dump_env != NULL && dump_env[0] == '1') { request_log("CAPTURE buffer dump for surface %d (slot v4l2_index=%d):\n", surface_id, surface_object->destination_index); for (unsigned int p = 0; p < surface_object->destination_planes_count; p++) { const unsigned char *d = surface_object->destination_data[p]; size_t sz = surface_object->destination_sizes[p]; unsigned int nz = 0; for (size_t i = 0; i < (sz > 1024 ? 1024 : sz); i++) if (d[i] != 0) nz++; request_log(" plane[%u] size=%zu, first-1024-bytes non-zero=%u\n", p, sz, nz); request_log(" plane[%u] hex[0..32]: ", p); for (unsigned int i = 0; i < 32; i++) request_log("%02x ", d[i]); request_log("\n"); request_log(" plane[%u] hex[%zu-32..%zu] tail: ", p, sz - 32, sz); for (unsigned int i = 0; i < 32; i++) request_log("%02x ", d[sz - 32 + i]); request_log("\n"); } } ``` ### LOC ~30 LOC added in `surface.c`. No removals. No other files touched. ### Verification (Phase 7 will exercise) 1. Build + install on fresnel. 2. Run libva H.264 sweep with `LIBVA_V4L2_DUMP_CAPTURE=1`. Capture stderr. 3. Run libva VP9 sweep with same env. Capture stderr. 4. Run libva HEVC sweep with same env. Capture stderr. 5. Compare each codec's dump against the YUV output bytes (the libva→hwdownload path's view). Three predicted outcomes: | Dump shows | Interpretation | Next iter9 work | |---|---|---| | Plane[0] all-zero or only 16×32 region populated, plane[1] all-zero | Kernel didn't write | Control-fill fix (constraint_set_flags + BFOC + flags) | | Plane[0] fully populated with real Y values, plane[1] populated with real UV | libva mis-reads / cache / offset bug | mmap/cache fix in cap_pool or surface_bind_slot | | Plane[0] populated with content NOT matching frame's actual image | Slot binding wrong (frame N's slot reads frame M's data) | Slot management fix in cap_pool | ### Risk - **Logging volume**: 3 frames × 24 cap_pool slots × 2 planes = ~144 log lines per run. Bounded. - **Performance**: a few microseconds per frame for the dump. Negligible vs the H.264 decode latency. - **Correctness**: env-gated; default behavior unchanged when env not set. - **No regression risk**: pure additive code. ## Plan B (post-γ): targeted fix based on outcome Phase 4b will be drafted AFTER γ's verification phase. Outline only here: ### B-1 (if γ shows "kernel didn't write"): - Change `h264_set_controls` to set `constraint_set_flags = 0x02` to match kdirect (1 LOC). - Investigate BFOC field-order for `dpb_entry`: if `entry->pic.BottomFieldOrderCnt` is non-zero in libva's `h264_fill_dpb` but the strip-sentinel zeros it, suppress the strip for inter frames (~3 LOC). - Check the dpb-entry `flags` upper bits in kdirect. ### B-2 (if γ shows "libva mis-reads"): - Audit `surface_bind_slot` offset math (line 98-102). - Verify `destination_offsets[]` computation in `surface_fill_format_uniform` against actual kernel layout for H.264 NV12. - Check whether cache invalidation is required: do `msync(MAP_INVALIDATE)` on the buffer before reading. ### B-3 (if γ shows "slot binding wrong"): - Audit `cap_pool_acquire` LRU rotation. - Check whether `destination_index` and `destination_data` come from the same slot. LOC estimate for Plan B: 5-30 LOC in 1-2 files. One commit. ## Phase 5 review concerns to invite 1. **The redefinition itself**: Phase 3 reframed Bug 4 from "inter race-loss" to "partial-fill on all frames (incl. keyframe)". Phase 5 reviewer should validate the empirical evidence — is the bug really partial-fill for keyframe too, or is the 16×32 patch some other artifact? 2. **γ vs α-first**: this plan defers the fix attempt. Phase 5 reviewer may push for trying α first since the SPS/POC fields are cheap to change. 3. **VP9 success**: any proposed fix must NOT break VP9's working state. Phase 5 reviewer should verify any changes are H.264-gated (per `feedback_unconditional_codec_state.md`). 4. **iter5b-β's CreateContext refactor**: γ's results may finger that refactor as the regression source for H.264. If so, Phase 5 reviewer should consider whether re-introducing a per-CreateSurfaces2 path (option α' that was rejected at iter5b) is necessary. ## Phase 4 → Phase 5 handoff Phase 4 plan locked: implement γ diagnostic dump. Phase 5 reviewer reviews this plan, suggests amendments. Phase 6 implements amended plan. Phase 7 runs the dump and analyzes outcomes. Phase 4b (targeted fix) follows. ## Predicted iter8 cadence - Phase 4: this doc. - Phase 5: sonnet-architect review (30 min). - Phase 6: implement γ (30 min). - Phase 7: run dump, characterize outcome, draft Phase 4b plan (60 min). - Phase 5b: review Phase 4b plan (30 min). - Phase 6b: implement Phase 4b fix (60-90 min). - Phase 7b: verify against 5-codec sweep (30 min). - Phase 8: close (15 min). Total: 4-5 hours wallclock. The two-phase γ-then-α path is longer than a direct α attempt but more empirically defensible. ## Iter8 success criteria recap (from Phase 0) 1. H.264 libva == kdirect, byte-identical 3-frame sweep. 2. VP9 unchanged (PASS direct, `4f1565e8…`). 3. MPEG-2 unchanged (`19eefbf4…`). 4. HEVC unchanged (`06b2c5a0…` all-zero). 5. VP8 unchanged (`bcc57ed5…` partial). 6. Control-payload anchors hold for 4 non-H.264 codecs. Phase 7b (verification) hits criterion 1 + regression on 2-6. If criterion 1 not met → PARTIAL close with γ + B-1/2/3 narrative.