Files

T

marfrit 3a6307638d iter8 Phase 4: γ-then-α plan — diagnostic dump first, fix after

Phase 3 redefined Bug 4 to partial-fill (not inter race). Three distinct
per-codec signatures (VP9 correct, HEVC zero, H.264 partial-leak) can't
be explained by a single hypothesis. Phase 4 commits to γ first: a
~30 LOC env-gated diagnostic dump in RequestSyncSurface that fires
after CAPTURE DQBUF, prints first/last 32 bytes of each destination_data
plane and a non-zero-count of the first 1024 bytes.

γ definitively distinguishes "kernel didn't write" from "libva mis-reads"
from "slot binding wrong". Phase 4b targeted fix follows γ's outcome.

Out of scope: per-codec H.264 control-fill changes (gated on γ's
findings), VP9/VP8/HEVC/MPEG-2 paths, kernel patches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-13 11:51:47 +00:00

8.2 KiB

Raw Permalink Blame History

Iteration 8 — Phase 4 (plan)

Drafted 2026-05-13 after Phase 3 redefined Bug 4 as partial-fill (not inter race). This Phase 4 picks an empirically-driven plan: a diagnostic dump (γ) FIRST to definitively distinguish kernel-write failure from libva-read failure, then targeted fixes contingent on what γ reveals.

Decision: γ-then-α path

Phase 3 evidence:

libva H.264 produces a structured 16×32 patch of real-looking image content at Y plane top-left (cluster around 0x7c..0x83), but kdirect H.264 of the same frame shows uniform 0x10 (near-black BBB intro) throughout.
The libva patch content is not even consistent with BBB frame 1's true content. It's content from somewhere else.
VP9 via libva reads back correctly. HEVC via libva returns all-zero. H.264 via libva returns a partial-fill garbage.

These three distinct signatures across codecs (VP9 correct, HEVC zeroed, H.264 partial-leak) cannot be explained by a single root cause hypothesis without empirical narrowing.

Phase 4 commits to a diagnostic dump first (γ), small (~30 LOC), gated behind LIBVA_V4L2_DUMP_CAPTURE env var. The dump fires inside RequestSyncSurface after the CAPTURE DQBUF, before cap_pool_mark_decoded. Output: a small hex+stats summary of surface->destination_data[0] (Y plane) + destination_data[1] (UV plane) — written to a Phase-3-style log.

Empirical question γ answers:

"Does the kernel write proper NV12 pixel data to the CAPTURE buffer? If yes — the bug is in libva's readback. If no — the bug is in the kernel's response to libva's control submission."

If γ shows kernel wrote correct data: pivot to mmap / cache / offset bug in libva. Likely fix in surface_bind_slot or cap_pool mmap.

If γ shows kernel wrote partial/wrong data: pivot to specific control field fix (α — adjust SPS.constraint_set_flags, DECODE_PARAMS.dpb.bottom_field_order_cnt, etc. to match kdirect).

If γ shows the buffer holds stale data (matching a prior test run's content): the kernel didn't write, AND the cap_pool slot was previously used. Initial write pattern needs investigation.

In scope

src/surface.c::RequestSyncSurface — add diagnostic dump after CAPTURE DQBUF.
src/cap_pool.c (if needed) — expose slot's underlying buffer pointer.
src/request.h — possibly extend driver_data struct for env-var caching.

Out of scope

Any per-codec H.264 control-fill changes (those follow γ's results).
VP9 / VP8 / HEVC / MPEG-2 paths.
Kernel patches.

Plan A: γ diagnostic implementation

Change shape

Add to RequestSyncSurface after line 388 (cap_pool_mark_decoded):

static const char *dump_env = NULL;
static bool dump_env_checked = false;
if (!dump_env_checked) {
    dump_env = getenv("LIBVA_V4L2_DUMP_CAPTURE");
    dump_env_checked = true;
}
if (dump_env != NULL && dump_env[0] == '1') {
    request_log("CAPTURE buffer dump for surface %d (slot v4l2_index=%d):\n",
                surface_id, surface_object->destination_index);
    for (unsigned int p = 0; p < surface_object->destination_planes_count; p++) {
        const unsigned char *d = surface_object->destination_data[p];
        size_t sz = surface_object->destination_sizes[p];
        unsigned int nz = 0;
        for (size_t i = 0; i < (sz > 1024 ? 1024 : sz); i++)
            if (d[i] != 0) nz++;
        request_log("  plane[%u] size=%zu, first-1024-bytes non-zero=%u\n",
                    p, sz, nz);
        request_log("  plane[%u] hex[0..32]: ", p);
        for (unsigned int i = 0; i < 32; i++)
            request_log("%02x ", d[i]);
        request_log("\n");
        request_log("  plane[%u] hex[%zu-32..%zu] tail: ", p,
                    sz - 32, sz);
        for (unsigned int i = 0; i < 32; i++)
            request_log("%02x ", d[sz - 32 + i]);
        request_log("\n");
    }
}

LOC

~30 LOC added in surface.c. No removals. No other files touched.

Verification (Phase 7 will exercise)

Build + install on fresnel.
Run libva H.264 sweep with LIBVA_V4L2_DUMP_CAPTURE=1. Capture stderr.
Run libva VP9 sweep with same env. Capture stderr.
Run libva HEVC sweep with same env. Capture stderr.
Compare each codec's dump against the YUV output bytes (the libva→hwdownload path's view).

Three predicted outcomes:

Dump shows	Interpretation	Next iter9 work
Plane[0] all-zero or only 16×32 region populated, plane[1] all-zero	Kernel didn't write	Control-fill fix (constraint_set_flags + BFOC + flags)
Plane[0] fully populated with real Y values, plane[1] populated with real UV	libva mis-reads / cache / offset bug	mmap/cache fix in cap_pool or surface_bind_slot
Plane[0] populated with content NOT matching frame's actual image	Slot binding wrong (frame N's slot reads frame M's data)	Slot management fix in cap_pool

Risk

Logging volume: 3 frames × 24 cap_pool slots × 2 planes = ~144 log lines per run. Bounded.
Performance: a few microseconds per frame for the dump. Negligible vs the H.264 decode latency.
Correctness: env-gated; default behavior unchanged when env not set.
No regression risk: pure additive code.

Plan B (post-γ): targeted fix based on outcome

Phase 4b will be drafted AFTER γ's verification phase. Outline only here:

B-1 (if γ shows "kernel didn't write"):

Change h264_set_controls to set constraint_set_flags = 0x02 to match kdirect (1 LOC).
Investigate BFOC field-order for dpb_entry: if entry->pic.BottomFieldOrderCnt is non-zero in libva's h264_fill_dpb but the strip-sentinel zeros it, suppress the strip for inter frames (~3 LOC).
Check the dpb-entry flags upper bits in kdirect.

B-2 (if γ shows "libva mis-reads"):

Audit surface_bind_slot offset math (line 98-102).
Verify destination_offsets[] computation in surface_fill_format_uniform against actual kernel layout for H.264 NV12.
Check whether cache invalidation is required: do msync(MAP_INVALIDATE) on the buffer before reading.

B-3 (if γ shows "slot binding wrong"):

Audit cap_pool_acquire LRU rotation.
Check whether destination_index and destination_data come from the same slot.

LOC estimate for Plan B: 5-30 LOC in 1-2 files. One commit.

Phase 5 review concerns to invite

The redefinition itself: Phase 3 reframed Bug 4 from "inter race-loss" to "partial-fill on all frames (incl. keyframe)". Phase 5 reviewer should validate the empirical evidence — is the bug really partial-fill for keyframe too, or is the 16×32 patch some other artifact?
γ vs α-first: this plan defers the fix attempt. Phase 5 reviewer may push for trying α first since the SPS/POC fields are cheap to change.
VP9 success: any proposed fix must NOT break VP9's working state. Phase 5 reviewer should verify any changes are H.264-gated (per feedback_unconditional_codec_state.md).
iter5b-β's CreateContext refactor: γ's results may finger that refactor as the regression source for H.264. If so, Phase 5 reviewer should consider whether re-introducing a per-CreateSurfaces2 path (option α' that was rejected at iter5b) is necessary.

Phase 4 → Phase 5 handoff

Phase 4 plan locked: implement γ diagnostic dump. Phase 5 reviewer reviews this plan, suggests amendments. Phase 6 implements amended plan. Phase 7 runs the dump and analyzes outcomes. Phase 4b (targeted fix) follows.

Predicted iter8 cadence

Phase 4: this doc.
Phase 5: sonnet-architect review (30 min).
Phase 6: implement γ (30 min).
Phase 7: run dump, characterize outcome, draft Phase 4b plan (60 min).
Phase 5b: review Phase 4b plan (30 min).
Phase 6b: implement Phase 4b fix (60-90 min).
Phase 7b: verify against 5-codec sweep (30 min).
Phase 8: close (15 min).

Total: 4-5 hours wallclock. The two-phase γ-then-α path is longer than a direct α attempt but more empirically defensible.

Iter8 success criteria recap (from Phase 0)

H.264 libva == kdirect, byte-identical 3-frame sweep.
VP9 unchanged (PASS direct, 4f1565e8…).
MPEG-2 unchanged (19eefbf4…).
HEVC unchanged (06b2c5a0… all-zero).
VP8 unchanged (bcc57ed5… partial).
Control-payload anchors hold for 4 non-H.264 codecs.

Phase 7b (verification) hits criterion 1 + regression on 2-6. If criterion 1 not met → PARTIAL close with γ + B-1/2/3 narrative.

8.2 KiB Raw Permalink Blame History Unescape Escape