Phase 3 redefined Bug 4 to partial-fill (not inter race). Three distinct per-codec signatures (VP9 correct, HEVC zero, H.264 partial-leak) can't be explained by a single hypothesis. Phase 4 commits to γ first: a ~30 LOC env-gated diagnostic dump in RequestSyncSurface that fires after CAPTURE DQBUF, prints first/last 32 bytes of each destination_data plane and a non-zero-count of the first 1024 bytes. γ definitively distinguishes "kernel didn't write" from "libva mis-reads" from "slot binding wrong". Phase 4b targeted fix follows γ's outcome. Out of scope: per-codec H.264 control-fill changes (gated on γ's findings), VP9/VP8/HEVC/MPEG-2 paths, kernel patches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8.2 KiB
Iteration 8 — Phase 4 (plan)
Drafted 2026-05-13 after Phase 3 redefined Bug 4 as partial-fill (not inter race). This Phase 4 picks an empirically-driven plan: a diagnostic dump (γ) FIRST to definitively distinguish kernel-write failure from libva-read failure, then targeted fixes contingent on what γ reveals.
Decision: γ-then-α path
Phase 3 evidence:
- libva H.264 produces a structured 16×32 patch of real-looking image content at Y plane top-left (cluster around 0x7c..0x83), but kdirect H.264 of the same frame shows uniform 0x10 (near-black BBB intro) throughout.
- The libva patch content is not even consistent with BBB frame 1's true content. It's content from somewhere else.
- VP9 via libva reads back correctly. HEVC via libva returns all-zero. H.264 via libva returns a partial-fill garbage.
These three distinct signatures across codecs (VP9 correct, HEVC zeroed, H.264 partial-leak) cannot be explained by a single root cause hypothesis without empirical narrowing.
Phase 4 commits to a diagnostic dump first (γ), small (~30 LOC), gated behind LIBVA_V4L2_DUMP_CAPTURE env var. The dump fires inside RequestSyncSurface after the CAPTURE DQBUF, before cap_pool_mark_decoded. Output: a small hex+stats summary of surface->destination_data[0] (Y plane) + destination_data[1] (UV plane) — written to a Phase-3-style log.
Empirical question γ answers:
"Does the kernel write proper NV12 pixel data to the CAPTURE buffer? If yes — the bug is in libva's readback. If no — the bug is in the kernel's response to libva's control submission."
If γ shows kernel wrote correct data: pivot to mmap / cache / offset bug in libva. Likely fix in surface_bind_slot or cap_pool mmap.
If γ shows kernel wrote partial/wrong data: pivot to specific control field fix (α — adjust SPS.constraint_set_flags, DECODE_PARAMS.dpb.bottom_field_order_cnt, etc. to match kdirect).
If γ shows the buffer holds stale data (matching a prior test run's content): the kernel didn't write, AND the cap_pool slot was previously used. Initial write pattern needs investigation.
In scope
src/surface.c::RequestSyncSurface— add diagnostic dump after CAPTURE DQBUF.src/cap_pool.c(if needed) — expose slot's underlying buffer pointer.src/request.h— possibly extend driver_data struct for env-var caching.
Out of scope
- Any per-codec H.264 control-fill changes (those follow γ's results).
- VP9 / VP8 / HEVC / MPEG-2 paths.
- Kernel patches.
Plan A: γ diagnostic implementation
Change shape
Add to RequestSyncSurface after line 388 (cap_pool_mark_decoded):
static const char *dump_env = NULL;
static bool dump_env_checked = false;
if (!dump_env_checked) {
dump_env = getenv("LIBVA_V4L2_DUMP_CAPTURE");
dump_env_checked = true;
}
if (dump_env != NULL && dump_env[0] == '1') {
request_log("CAPTURE buffer dump for surface %d (slot v4l2_index=%d):\n",
surface_id, surface_object->destination_index);
for (unsigned int p = 0; p < surface_object->destination_planes_count; p++) {
const unsigned char *d = surface_object->destination_data[p];
size_t sz = surface_object->destination_sizes[p];
unsigned int nz = 0;
for (size_t i = 0; i < (sz > 1024 ? 1024 : sz); i++)
if (d[i] != 0) nz++;
request_log(" plane[%u] size=%zu, first-1024-bytes non-zero=%u\n",
p, sz, nz);
request_log(" plane[%u] hex[0..32]: ", p);
for (unsigned int i = 0; i < 32; i++)
request_log("%02x ", d[i]);
request_log("\n");
request_log(" plane[%u] hex[%zu-32..%zu] tail: ", p,
sz - 32, sz);
for (unsigned int i = 0; i < 32; i++)
request_log("%02x ", d[sz - 32 + i]);
request_log("\n");
}
}
LOC
~30 LOC added in surface.c. No removals. No other files touched.
Verification (Phase 7 will exercise)
- Build + install on fresnel.
- Run libva H.264 sweep with
LIBVA_V4L2_DUMP_CAPTURE=1. Capture stderr. - Run libva VP9 sweep with same env. Capture stderr.
- Run libva HEVC sweep with same env. Capture stderr.
- Compare each codec's dump against the YUV output bytes (the libva→hwdownload path's view).
Three predicted outcomes:
| Dump shows | Interpretation | Next iter9 work |
|---|---|---|
| Plane[0] all-zero or only 16×32 region populated, plane[1] all-zero | Kernel didn't write | Control-fill fix (constraint_set_flags + BFOC + flags) |
| Plane[0] fully populated with real Y values, plane[1] populated with real UV | libva mis-reads / cache / offset bug | mmap/cache fix in cap_pool or surface_bind_slot |
| Plane[0] populated with content NOT matching frame's actual image | Slot binding wrong (frame N's slot reads frame M's data) | Slot management fix in cap_pool |
Risk
- Logging volume: 3 frames × 24 cap_pool slots × 2 planes = ~144 log lines per run. Bounded.
- Performance: a few microseconds per frame for the dump. Negligible vs the H.264 decode latency.
- Correctness: env-gated; default behavior unchanged when env not set.
- No regression risk: pure additive code.
Plan B (post-γ): targeted fix based on outcome
Phase 4b will be drafted AFTER γ's verification phase. Outline only here:
B-1 (if γ shows "kernel didn't write"):
- Change
h264_set_controlsto setconstraint_set_flags = 0x02to match kdirect (1 LOC). - Investigate BFOC field-order for
dpb_entry: ifentry->pic.BottomFieldOrderCntis non-zero in libva'sh264_fill_dpbbut the strip-sentinel zeros it, suppress the strip for inter frames (~3 LOC). - Check the dpb-entry
flagsupper bits in kdirect.
B-2 (if γ shows "libva mis-reads"):
- Audit
surface_bind_slotoffset math (line 98-102). - Verify
destination_offsets[]computation insurface_fill_format_uniformagainst actual kernel layout for H.264 NV12. - Check whether cache invalidation is required: do
msync(MAP_INVALIDATE)on the buffer before reading.
B-3 (if γ shows "slot binding wrong"):
- Audit
cap_pool_acquireLRU rotation. - Check whether
destination_indexanddestination_datacome from the same slot.
LOC estimate for Plan B: 5-30 LOC in 1-2 files. One commit.
Phase 5 review concerns to invite
- The redefinition itself: Phase 3 reframed Bug 4 from "inter race-loss" to "partial-fill on all frames (incl. keyframe)". Phase 5 reviewer should validate the empirical evidence — is the bug really partial-fill for keyframe too, or is the 16×32 patch some other artifact?
- γ vs α-first: this plan defers the fix attempt. Phase 5 reviewer may push for trying α first since the SPS/POC fields are cheap to change.
- VP9 success: any proposed fix must NOT break VP9's working state. Phase 5 reviewer should verify any changes are H.264-gated (per
feedback_unconditional_codec_state.md). - iter5b-β's CreateContext refactor: γ's results may finger that refactor as the regression source for H.264. If so, Phase 5 reviewer should consider whether re-introducing a per-CreateSurfaces2 path (option α' that was rejected at iter5b) is necessary.
Phase 4 → Phase 5 handoff
Phase 4 plan locked: implement γ diagnostic dump. Phase 5 reviewer reviews this plan, suggests amendments. Phase 6 implements amended plan. Phase 7 runs the dump and analyzes outcomes. Phase 4b (targeted fix) follows.
Predicted iter8 cadence
- Phase 4: this doc.
- Phase 5: sonnet-architect review (30 min).
- Phase 6: implement γ (30 min).
- Phase 7: run dump, characterize outcome, draft Phase 4b plan (60 min).
- Phase 5b: review Phase 4b plan (30 min).
- Phase 6b: implement Phase 4b fix (60-90 min).
- Phase 7b: verify against 5-codec sweep (30 min).
- Phase 8: close (15 min).
Total: 4-5 hours wallclock. The two-phase γ-then-α path is longer than a direct α attempt but more empirically defensible.
Iter8 success criteria recap (from Phase 0)
- H.264 libva == kdirect, byte-identical 3-frame sweep.
- VP9 unchanged (PASS direct,
4f1565e8…). - MPEG-2 unchanged (
19eefbf4…). - HEVC unchanged (
06b2c5a0…all-zero). - VP8 unchanged (
bcc57ed5…partial). - Control-payload anchors hold for 4 non-H.264 codecs.
Phase 7b (verification) hits criterion 1 + regression on 2-6. If criterion 1 not met → PARTIAL close with γ + B-1/2/3 narrative.