CRIT-1: request_log prepends prefix on every call; per-byte loop in γ sketch would emit 32 prefix-only lines. Fix: snprintf buffered emit. CRIT-2: γ dump block missing null guard on destination_data[]; the plan's env-var check is outside the current_slot != NULL guard. Fix: nest the dump inside the existing slot-null guard. IMP-1: "stale residue from prior decode" not eliminated as alternative explanation for the 16x32 patch. Add memset-zero-before-QBUF experiment to Phase 7 to discriminate. IMP-2: γ-first defensible but on IMP-1 grounds, not the three-signature argument (which is weaker than stated). IMP-3/4 placement clarifications. MIN-1/2/3 cosmetic. 5 mechanical amendments locked for Phase 6. γ-first strategy stands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 8 — Phase 5 Review
Reviewer: Sonnet-grade architect (independent read). Reviewed 2026-05-13.
Documents read in order: phase0_findings_iter8.md, phase2_iter8_situation.md, phase3_iter8_findings.md, phase4_iter8_plan.md.
Source files examined: src/surface.c, src/surface.h, src/request.h,
src/cap_pool.h, src/context.c (lines 130–225), src/video.h,
src/utils.c.
Critical findings (CRIT)
CRIT-1 — request_log prepends a prefix on EVERY call; the inner-loop hex dump will not produce a readable hex line
Empirically verified.
utils.c:37: fprintf(stderr, "%s: ", V4L2_REQUEST_STR_VENDOR) fires on every
request_log(...) call before the format string. The plan's dump loop
(phase4 lines 62–65):
for (unsigned int i = 0; i < 32; i++)
    request_log("%02x ", d[i]);
request_log("\n");
will emit 32 separate stderr lines, each prefixed with "v4l2-request: ":
v4l2-request: 81
v4l2-request: 81
v4l2-request: 80
...
This is not the intended single hex line. The newline call at the end also
emits a standalone "v4l2-request: \n" prefix line. The dump is still
technically readable if you strip the prefix from each line and join them,
but in practice when the output is captured from a multi-threaded ffmpeg run
(which also emits its own stderr lines), it becomes nearly impossible to
associate individual hex bytes with their plane/frame context.
Severity: CRIT because the primary purpose of γ is empirical disambiguation. If the dump output is unreadable it can produce a false negative — the author scans the log, sees the prefix lines but can't reconstruct the hex, and draws wrong conclusions about plane content.
Fix: Write the full hex line to a local char buf[128] with snprintf,
then emit it with a single request_log("%s\n", buf). Alternatively, call
fprintf(stderr, ...) directly inside the loop (no prefix) and emit one
request_log(...) per logical output line. Either approach is ~5 LOC delta
from the plan sketch.
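The buffered variant of the fix can be sketched in isolation as follows. This is a minimal illustration, not the driver's code: format_hex_line is a hypothetical helper name, and the caller is assumed to pass the finished buffer to a single request_log("%s\n", buf) call so that the prefix appears once per logical line.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the driver): format up to `count` bytes
 * of `data` as space-separated hex into `out`, always NUL-terminating.
 * Returns the number of bytes that were actually formatted.
 */
size_t format_hex_line(char *out, size_t out_size,
                       const unsigned char *data, size_t count)
{
    size_t pos = 0;
    size_t i;

    if (out_size == 0)
        return 0;
    out[0] = '\0'; /* valid empty string even when count == 0 */

    for (i = 0; i < count; i++) {
        /* Each byte needs 3 characters ("xx ") plus the trailing NUL. */
        if (out_size - pos < 4)
            break;
        pos += (size_t)snprintf(out + pos, out_size - pos, "%02x ", data[i]);
    }
    return i;
}
```

With a 128-byte buffer, 32 bytes need 96 characters plus the terminator, so one dump line always fits. The explicit room check means a too-small buffer truncates on a byte boundary instead of relying on snprintf's return value.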
CRIT-2 — γ placement: the plan says "after line 388 (cap_pool_mark_decoded)" but the surrounding guard if (surface_object->current_slot != NULL) matters
Empirically verified against surface.c lines 387–389.
if (surface_object->current_slot != NULL)
    cap_pool_mark_decoded(&driver_data->capture_pool,
                          surface_object->current_slot);
The plan inserts the dump after cap_pool_mark_decoded. At that point:
- destination_data[p] is still valid: it was populated at BeginPicture → surface_bind_slot and the slot has not been released (slot state is now DECODED, not FREE). Correct.
- destination_planes_count is also valid. Correct.
However the plan's sketch uses surface_object->destination_planes_count and
destination_data[p] outside any null-guard. If the dump fires on an error
path where current_slot == NULL (i.e., the if above was not entered),
destination_data[p] may hold a stale or null pointer from a prior decode
cycle or from RequestCreateSurfaces2's zeroed state. The plan's env-var
check is outside the if (surface_object->current_slot != NULL) guard.
Fix: Place the dump inside the same if (surface_object->current_slot != NULL) guard, or move it directly below cap_pool_mark_decoded with an
explicit null-check on destination_data[0]. This is a 2-line change to the
sketch.
Important findings (IMP)
IMP-1 — The bug redefinition (phase 3): "stale data from prior run" as alternative explanation for the 16×32 patch is not fully eliminated
Phase 3 asserts (line 143): "Bug 4 is NOT primarily an inter-frame race. Even the keyframe (frame 1, IDR) fails to decode fully."
The supporting evidence (line 30): the 16×32 patch values cluster around 0x7c..0x83 (luma neutral, smooth gradients). Phase 3 explicitly states these are "real decoded image data."
Phase 3 also notes (line 30): "kdirect frame 1 fills the entire Y plane with proper Y values; the 0x7c-0x83 cluster in libva is NOT in kdirect's frame 1 (kdirect has Y≈0x10 black throughout)."
This means the 16×32 patch content does NOT match frame 1's actual content. Phase 3 acknowledges this in scenario A (line 124): "the kernel decoded one tile then aborted." But there is a second-order alternative that phase 3 does not fully close: the 512 bytes are stale from a previous decode run's CAPTURE buffer content that was never zeroed before the current run reused the slot. Specifically:
- cap_pool_init mmaps the CAPTURE buffers. If the kernel driver (rkvdec) doesn't zero-initialize the physical pages on REQBUFS, the pages may hold data from a previous use of the same physical memory.
- BBB 1080p intro frames have luma values around 0x10 (near-black). But other content, including previous test runs of a different test clip, may have had neutral-gray luma in the 0x7c-0x83 range.
- The 16×32 region is exactly one macroblock row × 2 rows = a stripe that rkvdec's hardware decode engine commonly writes first (tile/MB-order).
If the 16×32 patch is stale residue rather than partial decode output, the interpretation changes: the kernel may have written NOTHING for H.264 frame 1 via libva, not "started then aborted." The bug would then be "kernel rejects silently, CAPTURE buffer never touched" — same as HEVC's all-zero (Bug 5) but with 512 bytes of pre-existing stale bytes.
How to distinguish: γ partially addresses this — if the dump shows the 512
bytes immediately after DQBUF, it would at least confirm the timing (the data
is there at DQBUF time, not injected later). But a stale-residue theory
predicts the dump also shows those 512 bytes. The definitive test would be to
deliberately zero the CAPTURE buffer (with memset(slot->map[0], 0, slot->map_lengths[0])) after cap_pool_acquire and before QBUF, then run
the H.264 sweep. If the 16×32 patch disappears → stale residue. If it
persists → kernel really writes it.
This is an author re-verification candidate. The plan should note this
alternative in Phase 7 execution: add one memset run to discriminate.
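The memset experiment can be sketched as below. This is a self-contained illustration under stated assumptions: struct sketch_slot is a hypothetical stand-in that carries only the map[] and map_lengths[] fields named above, SKETCH_MAX_PLANES is an assumed bound, and sketch_patch_nonzero is a helper invented here to check the 16×32 patch after DQBUF.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PLANES 4 /* assumption: upper bound on planes per slot */

/* Hypothetical stand-in for a capture slot; only the fields the review names. */
struct sketch_slot {
    void *map[SKETCH_MAX_PLANES];
    size_t map_lengths[SKETCH_MAX_PLANES];
};

/* Zero every mapped plane. Meant to run after acquire and before QBUF. */
void sketch_slot_zero(struct sketch_slot *slot, unsigned int planes_count)
{
    unsigned int p;

    for (p = 0; p < planes_count && p < SKETCH_MAX_PLANES; p++) {
        if (slot->map[p] != NULL && slot->map_lengths[p] > 0)
            memset(slot->map[p], 0, slot->map_lengths[p]);
    }
}

/* Count non-zero bytes in the top-left w x h patch of a plane. */
size_t sketch_patch_nonzero(const unsigned char *plane, size_t stride,
                            size_t w, size_t h)
{
    size_t x, y, n = 0;

    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++)
            if (plane[y * stride + x] != 0)
                n++;
    return n;
}
```

If sketch_patch_nonzero(plane, stride, 16, 32) is still non-zero after a run that zeroed the slot first, the kernel wrote those bytes during that run. If it drops to zero, the earlier patch was residue.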
IMP-2 — γ vs α-first: the plan's reasoning ("three distinct signatures
cannot be explained without empirical narrowing") is sound but the actual cost of α is understated
The plan characterizes the γ-vs-α choice as: "three distinct signatures (VP9 correct, HEVC zeroed, H.264 partial-leak) cannot be explained by a single root cause hypothesis without empirical narrowing."
This is partially correct. However, the three-signature argument does NOT apply to the specific α candidates identified in phase3:
- SPS.constraint_set_flags (libva=0, kdirect=2): this is a per-codec change (H.264 only, in h264_set_controls). Changing it risks nothing for VP9/HEVC/MPEG-2/VP8.
- DECODE_PARAMS.dpb[].bottom_field_order_cnt: similarly H.264-specific.
Neither change touches a shared code path. The three-signature argument is
an argument against a shared-path fix — not against an H.264-specific
10-LOC fix. The feedback_unconditional_codec_state.md rule is satisfied by
profile-gating; both α candidates are already gated to H.264.
A counter-argument for γ-first remains: even if α's candidates are correctly identified, if the kernel didn't write anything (stale-residue scenario per IMP-1), fixing control fields changes the wire protocol but still leaves a silent-reject. γ tells us whether the kernel writes before we guess which field to fix. Given the 16×32 ambiguity, γ-first is defensible. But the plan should acknowledge this tradeoff explicitly rather than citing the three-signature argument, which is weaker than it appears.
Recommendation: keep γ-first but update the rationale in Phase 4 (or note it here for Phase 6 drafting).
IMP-3 — Plan says "add to line 388 (cap_pool_mark_decoded)" but does not
say WHERE in RequestSyncSurface to put the dump relative to status = VA_STATUS_SUCCESS and goto complete
Looking at surface.c lines 387–394:
if (surface_object->current_slot != NULL)
    cap_pool_mark_decoded(&driver_data->capture_pool,
                          surface_object->current_slot);
surface_object->status = VASurfaceDisplaying;
status = VA_STATUS_SUCCESS;
goto complete;
The plan should insert the dump between cap_pool_mark_decoded and
surface_object->status = VASurfaceDisplaying — i.e., still on the success
path only, before the goto. This is unambiguous once you read the code but
the plan sketch doesn't reference the surrounding context, leaving the Phase 6
implementer to decide.
IMP-4 — Phase 4 predicts 3 outcomes for γ but the "stale slot" scenario
maps cleanly to a 4th diagnostic state that the dump can't distinguish alone
The plan's three-outcome table (phase4 lines 92-96):
| Dump shows | Interpretation |
|---|---|
| plane[0] only 16×32 populated | Kernel didn't write |
| plane[0] fully populated | libva mis-reads |
| plane[0] populated with other content | Slot binding wrong |
Scenario "plane[0] only 16×32 populated" is assigned to "Kernel didn't write" (that is, the kernel did not write the rest of the plane). But per IMP-1, it could also be "Kernel wrote NOTHING and the 16×32 patch is stale residue from a prior use of the buffer." These call for different follow-up. "Kernel didn't write" → control-fill fix. "Stale residue" → the control fix may still be correct, but the memset experiment should be run first so the outcome is classified correctly.
The plan should add a row:
| plane[0] non-zero 16×32 only, VP9 dump is fully populated | Inconclusive (stale vs partial-write); run memset-zero-before-QBUF experiment |
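The extended outcome table can be read as a small decision function. The sketch below is one possible encoding, not part of the plan: the enum names, the split into patch and rest-of-plane counts, and the zeroed_before_qbuf flag are all assumptions introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical diagnostic states for a plane[0] dump. */
enum sketch_dump_state {
    SKETCH_DUMP_EMPTY,        /* no non-zero bytes anywhere in the sample */
    SKETCH_DUMP_PATCH_ONLY,   /* only the patch is populated, after a zeroed run */
    SKETCH_DUMP_INCONCLUSIVE, /* only the patch is populated, buffer not zeroed first */
    SKETCH_DUMP_POPULATED     /* content exists beyond the patch */
};

/*
 * nz_patch: non-zero bytes inside the 16x32 patch.
 * nz_rest:  non-zero bytes in the rest of the sampled plane.
 * zeroed_before_qbuf: true if this run zeroed the slot before QBUF.
 */
enum sketch_dump_state sketch_classify(size_t nz_patch, size_t nz_rest,
                                       bool zeroed_before_qbuf)
{
    if (nz_rest > 0)
        return SKETCH_DUMP_POPULATED;
    if (nz_patch == 0)
        return SKETCH_DUMP_EMPTY;
    /* Patch only: stale residue cannot be ruled out without a zeroed run. */
    return zeroed_before_qbuf ? SKETCH_DUMP_PATCH_ONLY
                              : SKETCH_DUMP_INCONCLUSIVE;
}
```

SKETCH_DUMP_POPULATED still needs a comparison against reference content to separate "libva mis-reads" from "slot binding wrong". Byte counts alone cannot tell those two rows apart.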
Minor findings (MIN)
MIN-1 — env-var caching uses file-scope statics; these would be shared
across concurrent contexts if libva ever has two active contexts in the same process
surface.c is compiled into a single shared library loaded once per process.
File-scope static bool dump_env_checked is process-global. For the
diagnostic use case this is fine (we want one env check per process). Just
note it so future reviewers don't flag it as a bug.
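For reference, the once-per-process caching pattern looks roughly like this. It is a sketch under assumptions: the function name and the env-var name passed in are hypothetical, and only the "checked" flag idea and the leading '1' convention come from the plan.

```c
#include <stdbool.h>
#include <stdlib.h>

/* File-scope statics: one copy per process, shared by every context. */
static bool sketch_dump_env_checked;
static bool sketch_dump_env_enabled;

/*
 * Hypothetical helper: read the variable once, then serve the cached
 * answer. Later changes to the environment are deliberately ignored.
 */
bool sketch_dump_enabled(const char *var_name)
{
    if (!sketch_dump_env_checked) {
        const char *value = getenv(var_name);

        sketch_dump_env_enabled = (value != NULL && value[0] == '1');
        sketch_dump_env_checked = true;
    }
    return sketch_dump_env_enabled;
}
```

Because the flags are plain statics, two contexts in one process always see the same answer, which is the behavior this note describes.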
MIN-2 — sz > 1024 ? 1024 : sz in the non-zero-scan loop: for 1920×1088
NV12 plane[0], sz = 1920×1088 = 2,088,960. Sampling only the first 1024
bytes is intentional for log volume but may completely miss whether the
kernel wrote content that's not in the first 1024 bytes
For H.264 1080p in FRAME_BASED mode, rkvdec writes output in MB-order or
tile-order. If the kernel writes the first MB row (bytes 0..30719 for 1920-wide)
but not the rest, the 1024-byte sample catches it. If the kernel writes only
the LAST macroblock row (pathological), the 1024-byte scan reports zero
non-zero bytes and misclassifies the dump. Suggest extending the sample to
cover at minimum one full MB row: min(sz, 1920 * 16) = 30720 for H.264
1080p.
This is MIN because the leading hypothesis has content at the TOP-LEFT (the 16×32 patch is at offset 0), so the 1024-byte scan would catch it. But make the note for future runs with different content.
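The wider scan window can be sketched as follows. The helper names are hypothetical, and the 16-line macroblock height is the only codec fact assumed.

```c
#include <stddef.h>

#define SKETCH_MB_HEIGHT 16 /* luma macroblock height in lines */

/* Scan limit covering one full macroblock row, clamped to the plane size. */
size_t sketch_scan_limit(size_t sz, size_t bytesperline)
{
    size_t window = bytesperline * SKETCH_MB_HEIGHT;

    return sz < window ? sz : window;
}

/* Count non-zero bytes in the first `limit` bytes of a plane. */
size_t sketch_count_nonzero(const unsigned char *d, size_t limit)
{
    size_t i, n = 0;

    for (i = 0; i < limit; i++)
        if (d[i] != 0)
            n++;
    return n;
}
```

For a 1920-byte stride the window is 1920 × 16 = 30720 bytes, so content that starts at byte 2000 is counted, whereas a 1024-byte scan would miss it.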
MIN-3 — surface_id in the dump log is the VAAPI surface ID (integer
offset into object_heap), not a human-readable frame number. The log line
"CAPTURE buffer dump for surface %d" will print something like 67108864
(0x4000000 SURFACE_ID_OFFSET). This is not wrong, just potentially confusing.
Adding surface_object->destination_index (already in the plan) is the
cleaner identifier for correlating with strace DQBUF logs.
Author re-verification list
Per feedback_review_empirical_over_theoretical.md: the following are CLAIMS
by this reviewer that the author should empirically verify before incorporating
into an amended plan.
- CRIT-1 (request_log prefix): Verify by running any existing call site where request_log is called in a loop and inspecting stderr. The claim (each call emits its own "v4l2-request: " prefix) follows directly from utils.c:37, but if there is a buffered-output path or a different compile guard in the installed binary, the observation may differ. Cheap test: grep -A3 "request_log" src/utils.c and read utils.c:33-42 (which I did; see the source read above).
- CRIT-2 (null guard on destination_data): Verify by tracing when destination_data[0] can be null at the dump insertion point. surface.c RequestCreateSurfaces2:207 zero-inits params, slices_count, etc. but destination_data[] is not explicitly zero-inited in that function; it relies on the zero-init from object_heap_allocate. Whether object_heap_allocate zero-fills depends on the heap implementation. Author should grep object_heap_allocate to confirm zero-fill behavior before relying on destination_data[0] == NULL as a safe null-check.
- IMP-1 (stale-residue alternative): The claim that the kernel may not zero-initialize rkvdec CAPTURE buffers on REQBUFS is based on typical DMA-coherent allocator behavior on RK3399. On some kernels, dma_alloc_* does zero physical pages at allocation time. The author should verify on fresnel by running a memset-before-QBUF experiment (described in IMP-1) rather than accepting either interpretation theoretically.
- context.c:201–205 (destination_sizes for planes>0): Reviewer reads destination_sizes[1] = destination_sizes[0] / 2 for single-buffer NV12. This is the UV plane size = half the Y plane size (NV12 chroma is half-height half-width interleaved). Verify that this matches what v4l2_get_format returns for CAPTURE sizeimage on rkvdec H.264 1080p: the G_FMT sizeimage may include alignment padding. Without padding, NV12 at 1920 × 1088 totals 1920 × 1088 × 3/2 = 3,133,440 bytes, so a reported value such as 4177920 (= 1920 × 1088 × 2) would indicate padding or extra per-buffer data beyond the pixel planes. If destination_sizes[0] is set from bytesperlines[0] * format_height = 1920 × 1088 = 2,088,960, then destination_sizes[1] = 1,044,480. But if the kernel reports sizeimage = 4177920 as a single plane total, the split may differ. Confirm the specific value in the γ dump output for plane[1].sz.
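The unpadded NV12 arithmetic behind the last item can be checked with a few lines of C. The struct and function names are hypothetical; the only assumption is the standard NV12 layout (a full-resolution Y plane followed by an interleaved UV plane of half that size).

```c
#include <stddef.h>

/* Hypothetical container for unpadded NV12 plane sizes, in bytes. */
struct sketch_nv12_sizes {
    size_t y;     /* luma plane */
    size_t uv;    /* interleaved chroma plane, half the luma size */
    size_t total; /* y + uv, with no alignment padding */
};

struct sketch_nv12_sizes sketch_nv12_compute(size_t bytesperline, size_t height)
{
    struct sketch_nv12_sizes s;

    s.y = bytesperline * height;
    s.uv = s.y / 2;
    s.total = s.y + s.uv;
    return s;
}
```

For bytesperline = 1920 and height = 1088 this gives y = 2,088,960, uv = 1,044,480 and total = 3,133,440. Any larger sizeimage reported by G_FMT is therefore padding or additional data, and the split used for plane[1].sz should be checked against it.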
Amendments to Phase 6
These are the mechanical changes to apply when implementing γ:
Amendment 1 (CRIT-1 fix): Replace the inner hex-dump loops with
buffered snprintf output emitted in a single request_log call:
/* Replace the per-byte request_log calls with: */
char hexbuf[128];
int pos = 0;
hexbuf[0] = '\0'; /* keep the buffer a valid string even when sz == 0 */
for (unsigned int i = 0; i < 32 && i < sz; i++)
    pos += snprintf(hexbuf + pos, sizeof(hexbuf) - pos, "%02x ", d[i]);
request_log(" plane[%u] hex[0..31]: %s\n", p, hexbuf);
pos = 0;
if (sz >= 32) {
    for (unsigned int i = 0; i < 32; i++)
        pos += snprintf(hexbuf + pos, sizeof(hexbuf) - pos,
                        "%02x ", d[sz - 32 + i]);
    request_log(" plane[%u] tail hex[%zu..%zu]: %s\n",
                p, sz - 32, sz - 1, hexbuf);
}
Amendment 2 (CRIT-2 fix): Wrap the entire dump block in the existing null guard:
if (surface_object->current_slot != NULL && dump_env != NULL && dump_env[0] == '1') {
    /* dump block here */
}
Or equivalently, place the dump immediately INSIDE the existing
if (surface_object->current_slot != NULL) block in RequestSyncSurface,
after the cap_pool_mark_decoded call.
Amendment 3 (IMP-1 / experiment): For Phase 7 execution, add a second
test run with an explicit memset-zero of the CAPTURE slot immediately
after cap_pool_acquire in picture.c::RequestBeginPicture (or add a
diagnostic memset in cap_pool_init). Compare H.264 frame 1 output before
and after the memset. If the 16×32 patch disappears → stale residue; adjust
the IMP-4 outcome table accordingly.
Amendment 4 (IMP-3 / placement): Phase 6 implementer should place the
dump block between cap_pool_mark_decoded(...) and
surface_object->status = VASurfaceDisplaying — i.e., on the happy-path
only, not inside an error label. Do not place it after goto complete.
Amendment 5 (MIN-2 / scan window): Change the non-zero scan limit from
1024 to min(sz, (size_t)surface_object->destination_bytesperlines[0] * 16) for
plane 0 (covers one full macroblock row), and keep min(sz, 1024) for
plane 1 (UV is smaller, 1024 is fine). This requires access to
surface_object->destination_bytesperlines[0] inside the dump block, which is
available in scope.
Summary
Two CRIT issues, both in the γ implementation sketch rather than in the strategy choice. The γ-first rationale is defensible but IMP-1 (stale-residue alternative) adds a fourth diagnostic state that the plan's three-outcome table doesn't handle. The fixes are all small (Amendment 1 is the most important — broken hex formatting would make the dump useless). No plan-level strategy change is required; the γ-then-α path stands.