3a6307638d
Phase 3 redefined Bug 4 to partial-fill (not inter race). Three distinct per-codec signatures (VP9 correct, HEVC zero, H.264 partial-leak) can't be explained by a single hypothesis. Phase 4 commits to γ first: a ~30 LOC env-gated diagnostic dump in RequestSyncSurface that fires after CAPTURE DQBUF, prints first/last 32 bytes of each destination_data plane and a non-zero-count of the first 1024 bytes. γ definitively distinguishes "kernel didn't write" from "libva mis-reads" from "slot binding wrong". Phase 4b targeted fix follows γ's outcome. Out of scope: per-codec H.264 control-fill changes (gated on γ's findings), VP9/VP8/HEVC/MPEG-2 paths, kernel patches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
157 lines
8.2 KiB
Markdown
157 lines
8.2 KiB
Markdown
# Iteration 8 — Phase 4 (plan)
|
||
|
||
Drafted 2026-05-13 after Phase 3 redefined Bug 4 as partial-fill (not inter race). This Phase 4 picks an **empirically-driven plan**: a diagnostic dump (γ) FIRST to definitively distinguish kernel-write failure from libva-read failure, then targeted fixes contingent on what γ reveals.
|
||
|
||
## Decision: γ-then-α path
|
||
|
||
Phase 3 evidence:
|
||
|
||
- libva H.264 produces a structured 16×32 patch of real-looking image content at Y plane top-left (cluster around 0x7c..0x83), but kdirect H.264 of the same frame shows uniform 0x10 (near-black BBB intro) throughout.
|
||
- The libva patch content is **not even consistent** with BBB frame 1's true content. It's content from somewhere else.
|
||
- VP9 via libva reads back correctly. HEVC via libva returns all-zero. H.264 via libva returns a partial-fill garbage.
|
||
|
||
These three distinct signatures across codecs (VP9 correct, HEVC zeroed, H.264 partial-leak) **cannot be explained by a single root cause hypothesis** without empirical narrowing.
|
||
|
||
**Phase 4 commits to a diagnostic dump first** (γ), small (~30 LOC), gated behind `LIBVA_V4L2_DUMP_CAPTURE` env var. The dump fires inside `RequestSyncSurface` after the CAPTURE DQBUF, before `cap_pool_mark_decoded`. Output: a small hex+stats summary of `surface->destination_data[0]` (Y plane) + `destination_data[1]` (UV plane) — written to a Phase-3-style log.
|
||
|
||
Empirical question γ answers:
|
||
|
||
> "Does the kernel write proper NV12 pixel data to the CAPTURE buffer? If yes — the bug is in libva's readback. If no — the bug is in the kernel's response to libva's control submission."
|
||
|
||
If γ shows kernel wrote correct data: pivot to mmap / cache / offset bug in libva. Likely fix in `surface_bind_slot` or cap_pool mmap.
|
||
|
||
If γ shows kernel wrote partial/wrong data: pivot to specific control field fix (α — adjust SPS.constraint_set_flags, DECODE_PARAMS.dpb.bottom_field_order_cnt, etc. to match kdirect).
|
||
|
||
If γ shows the buffer holds stale data (matching a prior test run's content): the kernel didn't write, AND the cap_pool slot was previously used. Initial write pattern needs investigation.
|
||
|
||
## In scope
|
||
|
||
- `src/surface.c::RequestSyncSurface` — add diagnostic dump after CAPTURE DQBUF.
|
||
- `src/cap_pool.c` (if needed) — expose slot's underlying buffer pointer.
|
||
- `src/request.h` — possibly extend driver_data struct for env-var caching.
|
||
|
||
## Out of scope
|
||
|
||
- Any per-codec H.264 control-fill changes (those follow γ's results).
|
||
- VP9 / VP8 / HEVC / MPEG-2 paths.
|
||
- Kernel patches.
|
||
|
||
## Plan A: γ diagnostic implementation
|
||
|
||
### Change shape
|
||
|
||
Add to `RequestSyncSurface` after line 388 (`cap_pool_mark_decoded`):
|
||
|
||
```c
|
||
static const char *dump_env = NULL;
|
||
static bool dump_env_checked = false;
|
||
if (!dump_env_checked) {
|
||
dump_env = getenv("LIBVA_V4L2_DUMP_CAPTURE");
|
||
dump_env_checked = true;
|
||
}
|
||
if (dump_env != NULL && dump_env[0] == '1') {
|
||
request_log("CAPTURE buffer dump for surface %d (slot v4l2_index=%d):\n",
|
||
surface_id, surface_object->destination_index);
|
||
for (unsigned int p = 0; p < surface_object->destination_planes_count; p++) {
|
||
const unsigned char *d = surface_object->destination_data[p];
|
||
size_t sz = surface_object->destination_sizes[p];
|
||
unsigned int nz = 0;
|
||
for (size_t i = 0; i < (sz > 1024 ? 1024 : sz); i++)
|
||
if (d[i] != 0) nz++;
|
||
request_log(" plane[%u] size=%zu, first-1024-bytes non-zero=%u\n",
|
||
p, sz, nz);
|
||
request_log(" plane[%u] hex[0..32]: ", p);
|
||
for (unsigned int i = 0; i < 32; i++)
|
||
request_log("%02x ", d[i]);
|
||
request_log("\n");
|
||
request_log(" plane[%u] hex[%zu-32..%zu] tail: ", p,
|
||
sz - 32, sz);
|
||
for (unsigned int i = 0; i < 32; i++)
|
||
request_log("%02x ", d[sz - 32 + i]);
|
||
request_log("\n");
|
||
}
|
||
}
|
||
```
|
||
|
||
### LOC
|
||
|
||
~30 LOC added in `surface.c`. No removals. No other files touched.
|
||
|
||
### Verification (Phase 7 will exercise)
|
||
|
||
1. Build + install on fresnel.
|
||
2. Run libva H.264 sweep with `LIBVA_V4L2_DUMP_CAPTURE=1`. Capture stderr.
|
||
3. Run libva VP9 sweep with same env. Capture stderr.
|
||
4. Run libva HEVC sweep with same env. Capture stderr.
|
||
5. Compare each codec's dump against the YUV output bytes (the libva→hwdownload path's view).
|
||
|
||
Three predicted outcomes:
|
||
|
||
| Dump shows | Interpretation | Next iter9 work |
|
||
|---|---|---|
|
||
| Plane[0] all-zero or only 16×32 region populated, plane[1] all-zero | Kernel didn't write | Control-fill fix (constraint_set_flags + BFOC + flags) |
|
||
| Plane[0] fully populated with real Y values, plane[1] populated with real UV | libva mis-reads / cache / offset bug | mmap/cache fix in cap_pool or surface_bind_slot |
|
||
| Plane[0] populated with content NOT matching frame's actual image | Slot binding wrong (frame N's slot reads frame M's data) | Slot management fix in cap_pool |
|
||
|
||
### Risk
|
||
|
||
- **Logging volume**: 3 frames × 24 cap_pool slots × 2 planes = ~144 log lines per run. Bounded.
|
||
- **Performance**: a few microseconds per frame for the dump. Negligible vs the H.264 decode latency.
|
||
- **Correctness**: env-gated; default behavior unchanged when env not set.
|
||
- **No regression risk**: pure additive code.
|
||
|
||
## Plan B (post-γ): targeted fix based on outcome
|
||
|
||
Phase 4b will be drafted AFTER γ's verification phase. Outline only here:
|
||
|
||
### B-1 (if γ shows "kernel didn't write"):
|
||
- Change `h264_set_controls` to set `constraint_set_flags = 0x02` to match kdirect (1 LOC).
|
||
- Investigate BFOC field-order for `dpb_entry`: if `entry->pic.BottomFieldOrderCnt` is non-zero in libva's `h264_fill_dpb` but the strip-sentinel zeros it, suppress the strip for inter frames (~3 LOC).
|
||
- Check the dpb-entry `flags` upper bits in kdirect.
|
||
|
||
### B-2 (if γ shows "libva mis-reads"):
|
||
- Audit `surface_bind_slot` offset math (line 98-102).
|
||
- Verify `destination_offsets[]` computation in `surface_fill_format_uniform` against actual kernel layout for H.264 NV12.
|
||
- Check whether cache invalidation is required: do `msync(MAP_INVALIDATE)` on the buffer before reading.
|
||
|
||
### B-3 (if γ shows "slot binding wrong"):
|
||
- Audit `cap_pool_acquire` LRU rotation.
|
||
- Check whether `destination_index` and `destination_data` come from the same slot.
|
||
|
||
LOC estimate for Plan B: 5-30 LOC in 1-2 files. One commit.
|
||
|
||
## Phase 5 review concerns to invite
|
||
|
||
1. **The redefinition itself**: Phase 3 reframed Bug 4 from "inter race-loss" to "partial-fill on all frames (incl. keyframe)". Phase 5 reviewer should validate the empirical evidence — is the bug really partial-fill for keyframe too, or is the 16×32 patch some other artifact?
|
||
2. **γ vs α-first**: this plan defers the fix attempt. Phase 5 reviewer may push for trying α first since the SPS/POC fields are cheap to change.
|
||
3. **VP9 success**: any proposed fix must NOT break VP9's working state. Phase 5 reviewer should verify any changes are H.264-gated (per `feedback_unconditional_codec_state.md`).
|
||
4. **iter5b-β's CreateContext refactor**: γ's results may finger that refactor as the regression source for H.264. If so, Phase 5 reviewer should consider whether re-introducing a per-CreateSurfaces2 path (option α' that was rejected at iter5b) is necessary.
|
||
|
||
## Phase 4 → Phase 5 handoff
|
||
|
||
Phase 4 plan locked: implement γ diagnostic dump. Phase 5 reviewer reviews this plan, suggests amendments. Phase 6 implements amended plan. Phase 7 runs the dump and analyzes outcomes. Phase 4b (targeted fix) follows.
|
||
|
||
## Predicted iter8 cadence
|
||
|
||
- Phase 4: this doc.
|
||
- Phase 5: sonnet-architect review (30 min).
|
||
- Phase 6: implement γ (30 min).
|
||
- Phase 7: run dump, characterize outcome, draft Phase 4b plan (60 min).
|
||
- Phase 5b: review Phase 4b plan (30 min).
|
||
- Phase 6b: implement Phase 4b fix (60-90 min).
|
||
- Phase 7b: verify against 5-codec sweep (30 min).
|
||
- Phase 8: close (15 min).
|
||
|
||
Total: 4-5 hours wallclock. The two-phase γ-then-α path is longer than a direct α attempt but more empirically defensible.
|
||
|
||
## Iter8 success criteria recap (from Phase 0)
|
||
|
||
1. H.264 libva == kdirect, byte-identical 3-frame sweep.
|
||
2. VP9 unchanged (PASS direct, `4f1565e8…`).
|
||
3. MPEG-2 unchanged (`19eefbf4…`).
|
||
4. HEVC unchanged (`06b2c5a0…` all-zero).
|
||
5. VP8 unchanged (`bcc57ed5…` partial).
|
||
6. Control-payload anchors hold for 4 non-H.264 codecs.
|
||
|
||
Phase 7b (verification) hits criterion 1 + regression on 2-6. If criterion 1 not met → PARTIAL close with γ + B-1/2/3 narrative.
|