diff --git a/phase4_iter8_plan.md b/phase4_iter8_plan.md new file mode 100644 index 0000000..7eb24ff --- /dev/null +++ b/phase4_iter8_plan.md @@ -0,0 +1,156 @@ +# Iteration 8 — Phase 4 (plan) + +Drafted 2026-05-13 after Phase 3 redefined Bug 4 as partial-fill (not inter race). This Phase 4 picks an **empirically-driven plan**: a diagnostic dump (γ) FIRST to definitively distinguish kernel-write failure from libva-read failure, then targeted fixes contingent on what γ reveals. + +## Decision: γ-then-α path + +Phase 3 evidence: + +- libva H.264 produces a structured 16×32 patch of real-looking image content at Y plane top-left (cluster around 0x7c..0x83), but kdirect H.264 of the same frame shows uniform 0x10 (near-black BBB intro) throughout. +- The libva patch content is **not even consistent** with BBB frame 1's true content. It's content from somewhere else. +- VP9 via libva reads back correctly. HEVC via libva returns all-zero. H.264 via libva returns a partial-fill garbage. + +These three distinct signatures across codecs (VP9 correct, HEVC zeroed, H.264 partial-leak) **cannot be explained by a single root cause hypothesis** without empirical narrowing. + +**Phase 4 commits to a diagnostic dump first** (γ), small (~30 LOC), gated behind `LIBVA_V4L2_DUMP_CAPTURE` env var. The dump fires inside `RequestSyncSurface` after the CAPTURE DQBUF, before `cap_pool_mark_decoded`. Output: a small hex+stats summary of `surface->destination_data[0]` (Y plane) + `destination_data[1]` (UV plane) — written to a Phase-3-style log. + +Empirical question γ answers: + +> "Does the kernel write proper NV12 pixel data to the CAPTURE buffer? If yes — the bug is in libva's readback. If no — the bug is in the kernel's response to libva's control submission." + +If γ shows kernel wrote correct data: pivot to mmap / cache / offset bug in libva. Likely fix in `surface_bind_slot` or cap_pool mmap. + +If γ shows kernel wrote partial/wrong data: pivot to specific control field fix (α — adjust SPS.constraint_set_flags, DECODE_PARAMS.dpb.bottom_field_order_cnt, etc. to match kdirect). + +If γ shows the buffer holds stale data (matching a prior test run's content): the kernel didn't write, AND the cap_pool slot was previously used. Initial write pattern needs investigation. + +## In scope + +- `src/surface.c::RequestSyncSurface` — add diagnostic dump after CAPTURE DQBUF. +- `src/cap_pool.c` (if needed) — expose slot's underlying buffer pointer. +- `src/request.h` — possibly extend driver_data struct for env-var caching. + +## Out of scope + +- Any per-codec H.264 control-fill changes (those follow γ's results). +- VP9 / VP8 / HEVC / MPEG-2 paths. +- Kernel patches. + +## Plan A: γ diagnostic implementation + +### Change shape + +Add to `RequestSyncSurface` after line 388 (`cap_pool_mark_decoded`): + +```c +static const char *dump_env = NULL; +static bool dump_env_checked = false; +if (!dump_env_checked) { + dump_env = getenv("LIBVA_V4L2_DUMP_CAPTURE"); + dump_env_checked = true; +} +if (dump_env != NULL && dump_env[0] == '1') { + request_log("CAPTURE buffer dump for surface %d (slot v4l2_index=%d):\n", + surface_id, surface_object->destination_index); + for (unsigned int p = 0; p < surface_object->destination_planes_count; p++) { + const unsigned char *d = surface_object->destination_data[p]; + size_t sz = surface_object->destination_sizes[p]; + unsigned int nz = 0; + for (size_t i = 0; i < (sz > 1024 ? 1024 : sz); i++) + if (d[i] != 0) nz++; + request_log(" plane[%u] size=%zu, first-1024-bytes non-zero=%u\n", + p, sz, nz); + request_log(" plane[%u] hex[0..32]: ", p); + for (unsigned int i = 0; i < 32; i++) + request_log("%02x ", d[i]); + request_log("\n"); + request_log(" plane[%u] hex[%zu-32..%zu] tail: ", p, + sz - 32, sz); + for (unsigned int i = 0; i < 32; i++) + request_log("%02x ", d[sz - 32 + i]); + request_log("\n"); + } +} +``` + +### LOC + +~30 LOC added in `surface.c`. No removals. No other files touched. + +### Verification (Phase 7 will exercise) + +1. Build + install on fresnel. +2. Run libva H.264 sweep with `LIBVA_V4L2_DUMP_CAPTURE=1`. Capture stderr. +3. Run libva VP9 sweep with same env. Capture stderr. +4. Run libva HEVC sweep with same env. Capture stderr. +5. Compare each codec's dump against the YUV output bytes (the libva→hwdownload path's view). + +Three predicted outcomes: + +| Dump shows | Interpretation | Next iter9 work | +|---|---|---| +| Plane[0] all-zero or only 16×32 region populated, plane[1] all-zero | Kernel didn't write | Control-fill fix (constraint_set_flags + BFOC + flags) | +| Plane[0] fully populated with real Y values, plane[1] populated with real UV | libva mis-reads / cache / offset bug | mmap/cache fix in cap_pool or surface_bind_slot | +| Plane[0] populated with content NOT matching frame's actual image | Slot binding wrong (frame N's slot reads frame M's data) | Slot management fix in cap_pool | + +### Risk + +- **Logging volume**: 3 frames × 24 cap_pool slots × 2 planes = ~144 log lines per run. Bounded. +- **Performance**: a few microseconds per frame for the dump. Negligible vs the H.264 decode latency. +- **Correctness**: env-gated; default behavior unchanged when env not set. +- **No regression risk**: pure additive code. + +## Plan B (post-γ): targeted fix based on outcome + +Phase 4b will be drafted AFTER γ's verification phase. Outline only here: + +### B-1 (if γ shows "kernel didn't write"): +- Change `h264_set_controls` to set `constraint_set_flags = 0x02` to match kdirect (1 LOC). +- Investigate BFOC field-order for `dpb_entry`: if `entry->pic.BottomFieldOrderCnt` is non-zero in libva's `h264_fill_dpb` but the strip-sentinel zeros it, suppress the strip for inter frames (~3 LOC). +- Check the dpb-entry `flags` upper bits in kdirect. + +### B-2 (if γ shows "libva mis-reads"): +- Audit `surface_bind_slot` offset math (line 98-102). +- Verify `destination_offsets[]` computation in `surface_fill_format_uniform` against actual kernel layout for H.264 NV12. +- Check whether cache invalidation is required: do `msync(MAP_INVALIDATE)` on the buffer before reading. + +### B-3 (if γ shows "slot binding wrong"): +- Audit `cap_pool_acquire` LRU rotation. +- Check whether `destination_index` and `destination_data` come from the same slot. + +LOC estimate for Plan B: 5-30 LOC in 1-2 files. One commit. + +## Phase 5 review concerns to invite + +1. **The redefinition itself**: Phase 3 reframed Bug 4 from "inter race-loss" to "partial-fill on all frames (incl. keyframe)". Phase 5 reviewer should validate the empirical evidence — is the bug really partial-fill for keyframe too, or is the 16×32 patch some other artifact? +2. **γ vs α-first**: this plan defers the fix attempt. Phase 5 reviewer may push for trying α first since the SPS/POC fields are cheap to change. +3. **VP9 success**: any proposed fix must NOT break VP9's working state. Phase 5 reviewer should verify any changes are H.264-gated (per `feedback_unconditional_codec_state.md`). +4. **iter5b-β's CreateContext refactor**: γ's results may finger that refactor as the regression source for H.264. If so, Phase 5 reviewer should consider whether re-introducing a per-CreateSurfaces2 path (option α' that was rejected at iter5b) is necessary. + +## Phase 4 → Phase 5 handoff + +Phase 4 plan locked: implement γ diagnostic dump. Phase 5 reviewer reviews this plan, suggests amendments. Phase 6 implements amended plan. Phase 7 runs the dump and analyzes outcomes. Phase 4b (targeted fix) follows. + +## Predicted iter8 cadence + +- Phase 4: this doc. +- Phase 5: sonnet-architect review (30 min). +- Phase 6: implement γ (30 min). +- Phase 7: run dump, characterize outcome, draft Phase 4b plan (60 min). +- Phase 5b: review Phase 4b plan (30 min). +- Phase 6b: implement Phase 4b fix (60-90 min). +- Phase 7b: verify against 5-codec sweep (30 min). +- Phase 8: close (15 min). + +Total: 4-5 hours wallclock. The two-phase γ-then-α path is longer than a direct α attempt but more empirically defensible. + +## Iter8 success criteria recap (from Phase 0) + +1. H.264 libva == kdirect, byte-identical 3-frame sweep. +2. VP9 unchanged (PASS direct, `4f1565e8…`). +3. MPEG-2 unchanged (`19eefbf4…`). +4. HEVC unchanged (`06b2c5a0…` all-zero). +5. VP8 unchanged (`bcc57ed5…` partial). +6. Control-payload anchors hold for 4 non-H.264 codecs. + +Phase 7b (verification) hits criterion 1 + regression on 2-6. If criterion 1 not met → PARTIAL close with γ + B-1/2/3 narrative.