iter8 Phase 5: sonnet-architect review — 2 CRIT + 4 IMP + 3 MIN
CRIT-1: request_log prepends prefix on every call; per-byte loop in γ sketch would emit 32 prefix-only lines. Fix: snprintf buffered emit. CRIT-2: γ dump block missing null guard on destination_data[]; the plan's env-var check is outside the current_slot != NULL guard. Fix: nest the dump inside the existing slot-null guard. IMP-1: "stale residue from prior decode" not eliminated as alternative explanation for the 16x32 patch. Add memset-zero-before-QBUF experiment to Phase 7 to discriminate. IMP-2: γ-first defensible but on IMP-1 grounds, not the three-signature argument (which is weaker than stated). IMP-3/4 placement clarifications. MIN-1/2/3 cosmetic. 5 mechanical amendments locked for Phase 6. γ-first strategy stands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,362 @@
|
|||||||
|
# Iteration 8 — Phase 5 Review
|
||||||
|
|
||||||
|
Reviewer: Sonnet-grade architect (independent read). Reviewed 2026-05-13.
|
||||||
|
|
||||||
|
Documents read in order: phase0_findings_iter8.md, phase2_iter8_situation.md,
|
||||||
|
phase3_iter8_findings.md, phase4_iter8_plan.md.
|
||||||
|
|
||||||
|
Source files examined: `src/surface.c`, `src/surface.h`, `src/request.h`,
|
||||||
|
`src/cap_pool.h`, `src/context.c` (lines 130–225), `src/video.h`,
|
||||||
|
`src/utils.c`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Critical findings (CRIT)
|
||||||
|
|
||||||
|
### CRIT-1 — `request_log` prepends a prefix on EVERY call; the inner-loop hex dump will not produce a readable hex line
|
||||||
|
|
||||||
|
**Empirically verified.**
|
||||||
|
|
||||||
|
`utils.c:37`: `fprintf(stderr, "%s: ", V4L2_REQUEST_STR_VENDOR)` fires on every
|
||||||
|
`request_log(...)` call before the format string. The plan's dump loop
|
||||||
|
(phase4 lines 62–65):
|
||||||
|
|
||||||
|
```c
|
||||||
|
for (unsigned int i = 0; i < 32; i++)
|
||||||
|
request_log("%02x ", d[i]);
|
||||||
|
request_log("\n");
|
||||||
|
```
|
||||||
|
|
||||||
|
will emit 32 separate stderr lines, each prefixed with `"v4l2-request: "`:
|
||||||
|
|
||||||
|
```
|
||||||
|
v4l2-request: 81
|
||||||
|
v4l2-request: 81
|
||||||
|
v4l2-request: 80
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
This is not the intended single hex line. The newline call at the end also
|
||||||
|
emits a standalone `"v4l2-request: \n"` prefix line. The dump is still
|
||||||
|
technically readable if you strip the prefix from each line and join them,
|
||||||
|
but in practice when the output is captured from a multi-threaded ffmpeg run
|
||||||
|
(which also emits its own stderr lines), it becomes nearly impossible to
|
||||||
|
associate individual hex bytes with their plane/frame context.
|
||||||
|
|
||||||
|
**Severity: CRIT** because the primary purpose of γ is empirical
|
||||||
|
disambiguation. If the dump output is unreadable it can produce a false
|
||||||
|
negative — the author scans the log, sees the prefix lines but can't
|
||||||
|
reconstruct the hex, and draws wrong conclusions about plane content.
|
||||||
|
|
||||||
|
**Fix**: Write the full hex line to a local `char buf[128]` with `snprintf`,
|
||||||
|
then emit it with a single `request_log("%s\n", buf)`. Alternatively, call
|
||||||
|
`fprintf(stderr, ...)` directly inside the loop (no prefix) and emit one
|
||||||
|
`request_log(...)` per logical output line. Either approach is ~5 LOC delta
|
||||||
|
from the plan sketch.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### CRIT-2 — γ placement: the plan says "after line 388 (`cap_pool_mark_decoded`)" but the surrounding guard `if (surface_object->current_slot != NULL)` matters
|
||||||
|
|
||||||
|
**Empirically verified against `surface.c` lines 387–389.**
|
||||||
|
|
||||||
|
```c
|
||||||
|
if (surface_object->current_slot != NULL)
|
||||||
|
cap_pool_mark_decoded(&driver_data->capture_pool,
|
||||||
|
surface_object->current_slot);
|
||||||
|
```
|
||||||
|
|
||||||
|
The plan inserts the dump after `cap_pool_mark_decoded`. At that point:
|
||||||
|
|
||||||
|
- `destination_data[p]` is still valid: it was populated at `BeginPicture` →
|
||||||
|
`surface_bind_slot` and the slot has not been released (slot state is now
|
||||||
|
DECODED, not FREE). Correct.
|
||||||
|
- `destination_planes_count` is also valid. Correct.
|
||||||
|
|
||||||
|
However the plan's sketch uses `surface_object->destination_planes_count` and
|
||||||
|
`destination_data[p]` outside any null-guard. If the dump fires on an error
|
||||||
|
path where `current_slot == NULL` (i.e., the `if` above was not entered),
|
||||||
|
`destination_data[p]` may hold a stale or null pointer from a prior decode
|
||||||
|
cycle or from `RequestCreateSurfaces2`'s zeroed state. The plan's env-var
|
||||||
|
check is outside the `if (surface_object->current_slot != NULL)` guard.
|
||||||
|
|
||||||
|
**Fix**: Place the dump inside the same `if (surface_object->current_slot != NULL)` guard, or move it directly below `cap_pool_mark_decoded` with an
|
||||||
|
explicit null-check on `destination_data[0]`. This is a 2-line change to the
|
||||||
|
sketch.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Important findings (IMP)
|
||||||
|
|
||||||
|
### IMP-1 — The bug redefinition (phase 3): "stale data from prior run" as alternative explanation for the 16×32 patch is not fully eliminated
|
||||||
|
|
||||||
|
Phase 3 asserts (line 143): "Bug 4 is NOT primarily an inter-frame race.
|
||||||
|
Even the keyframe (frame 1, IDR) fails to decode fully."
|
||||||
|
|
||||||
|
The supporting evidence (line 30): the 16×32 patch values cluster around
|
||||||
|
0x7c..0x83 (luma neutral, smooth gradients). Phase 3 explicitly states these
|
||||||
|
are "real decoded image data."
|
||||||
|
|
||||||
|
Phase 3 also notes (line 30): "kdirect frame 1 fills the entire Y plane with
|
||||||
|
proper Y values; the 0x7c-0x83 cluster in libva is NOT in kdirect's frame 1
|
||||||
|
(kdirect has Y≈0x10 black throughout)."
|
||||||
|
|
||||||
|
This means the 16×32 patch content does NOT match frame 1's actual content.
|
||||||
|
Phase 3 acknowledges this in scenario A (line 124): "the kernel decoded one
|
||||||
|
tile then aborted." But there is a second-order alternative that phase 3 does
|
||||||
|
not fully close: **the 512 bytes are stale from a previous decode run's
|
||||||
|
CAPTURE buffer content that was never zeroed before the current run reused
|
||||||
|
the slot.** Specifically:
|
||||||
|
|
||||||
|
- `cap_pool_init` mmaps the CAPTURE buffers. If the kernel driver (rkvdec)
|
||||||
|
doesn't zero-initialize the physical pages on REQBUFS, the pages may hold
|
||||||
|
data from a previous use of the same physical memory.
|
||||||
|
- BBB 1080p intro frames have luma values around 0x10 (near-black). But
|
||||||
|
other content — including previous test runs of a different test clip — may
|
||||||
|
have had neutral-gray luma in the 0x7c-0x83 range.
|
||||||
|
- The 16×32 region is exactly one macroblock row × 2 rows = a stripe that
|
||||||
|
rkvdec's hardware decode engine commonly writes first (tile/MB-order).
|
||||||
|
|
||||||
|
If the 16×32 patch is stale residue rather than partial decode output, the
|
||||||
|
interpretation changes: the kernel may have written NOTHING for H.264 frame 1
|
||||||
|
via libva, not "started then aborted." The bug would then be "kernel rejects
|
||||||
|
silently, CAPTURE buffer never touched" — same as HEVC's all-zero (Bug 5) but
|
||||||
|
with 512 bytes of pre-existing stale bytes.
|
||||||
|
|
||||||
|
**How to distinguish**: γ partially addresses this — if the dump shows the 512
|
||||||
|
bytes immediately after DQBUF, it would at least confirm the timing (the data
|
||||||
|
is there at DQBUF time, not injected later). But a stale-residue theory
|
||||||
|
predicts the dump also shows those 512 bytes. The definitive test would be to
|
||||||
|
deliberately zero the CAPTURE buffer (with `memset(slot->map[0], 0,
|
||||||
|
slot->map_lengths[0])`) after `cap_pool_acquire` and before QBUF, then run
|
||||||
|
the H.264 sweep. If the 16×32 patch disappears → stale residue. If it
|
||||||
|
persists → kernel really writes it.
|
||||||
|
|
||||||
|
**This is an author re-verification candidate.** The plan should note this
|
||||||
|
alternative in Phase 7 execution: add one `memset` run to discriminate.
|
||||||
|
|
||||||
|
### IMP-2 — γ vs α-first: the plan's reasoning ("three distinct signatures
|
||||||
|
cannot be explained without empirical narrowing") is sound but the actual
|
||||||
|
cost of α is understated
|
||||||
|
|
||||||
|
The plan characterizes the γ-vs-α choice as: "three distinct signatures
|
||||||
|
(VP9 correct, HEVC zeroed, H.264 partial-leak) cannot be explained by a
|
||||||
|
single root cause hypothesis without empirical narrowing."
|
||||||
|
|
||||||
|
This is partially correct. However, the three-signature argument does NOT
|
||||||
|
apply to the specific α candidates identified in phase3:
|
||||||
|
|
||||||
|
- **SPS.constraint_set_flags** (libva=0, kdirect=2): this is a per-codec
|
||||||
|
change (H.264 only, in `h264_set_controls`). Changing it risks nothing for
|
||||||
|
VP9/HEVC/MPEG-2/VP8.
|
||||||
|
- **DECODE_PARAMS.dpb[].bottom_field_order_cnt**: similarly H.264-specific.
|
||||||
|
|
||||||
|
Neither change touches a shared code path. The three-signature argument is
|
||||||
|
an argument against a shared-path fix — not against an H.264-specific
|
||||||
|
10-LOC fix. The `feedback_unconditional_codec_state.md` rule is satisfied by
|
||||||
|
profile-gating; both α candidates are already gated to H.264.
|
||||||
|
|
||||||
|
A counter-argument for γ-first remains: even if α's candidates are correctly
|
||||||
|
identified, if the kernel didn't write anything (stale-residue scenario per
|
||||||
|
IMP-1), fixing control fields changes the wire protocol but still leaves a
|
||||||
|
silent-reject. γ tells us whether the kernel writes before we guess which
|
||||||
|
field to fix. Given the 16×32 ambiguity, γ-first is defensible. But the plan
|
||||||
|
should acknowledge this tradeoff explicitly rather than citing the three-
|
||||||
|
signature argument, which is weaker than it appears.
|
||||||
|
|
||||||
|
**Recommendation**: keep γ-first but update the rationale in Phase 4 (or
|
||||||
|
note it here for Phase 6 drafting).
|
||||||
|
|
||||||
|
### IMP-3 — Plan says "add to line 388 (cap_pool_mark_decoded)" but does not
|
||||||
|
say WHERE in RequestSyncSurface to put the dump relative to `status = VA_STATUS_SUCCESS` and `goto complete`
|
||||||
|
|
||||||
|
Looking at `surface.c` lines 387–394:
|
||||||
|
|
||||||
|
```c
|
||||||
|
if (surface_object->current_slot != NULL)
|
||||||
|
cap_pool_mark_decoded(&driver_data->capture_pool,
|
||||||
|
surface_object->current_slot);
|
||||||
|
|
||||||
|
surface_object->status = VASurfaceDisplaying;
|
||||||
|
status = VA_STATUS_SUCCESS;
|
||||||
|
goto complete;
|
||||||
|
```
|
||||||
|
|
||||||
|
The plan should insert the dump between `cap_pool_mark_decoded` and
|
||||||
|
`surface_object->status = VASurfaceDisplaying` — i.e., still on the success
|
||||||
|
path only, before the goto. This is unambiguous once you read the code but
|
||||||
|
the plan sketch doesn't reference the surrounding context, leaving Phase 6
|
||||||
|
implementer to decide.
|
||||||
|
|
||||||
|
### IMP-4 — Phase 4 predicts 3 outcomes for γ but the "stale slot" scenario
|
||||||
|
maps cleanly to a 4th diagnostic state that the dump can't distinguish alone
|
||||||
|
|
||||||
|
The plan's three-outcome table (phase4 lines 92-96):
|
||||||
|
|
||||||
|
| Dump shows | Interpretation |
|
||||||
|
|---|---|
|
||||||
|
| plane[0] only 16×32 populated | Kernel didn't write |
|
||||||
|
| plane[0] fully populated | libva mis-reads |
|
||||||
|
| plane[0] populated with other content | Slot binding wrong |
|
||||||
|
|
||||||
|
Scenario "plane[0] only 16×32 populated" is assigned to "Kernel didn't
|
||||||
|
write." But per IMP-1, it could also be "Kernel wrote NOTHING and 16×32 is
|
||||||
|
stale residue from prior kernel use." These require different follow-up:
|
||||||
|
"Kernel didn't write" → control-fill fix. "Stale residue" → the control fix
|
||||||
|
may still be correct, but the first-run memset experiment should be done
|
||||||
|
first to correctly classify.
|
||||||
|
|
||||||
|
The plan should add a row:
|
||||||
|
|
||||||
|
| plane[0] non-zero 16×32 only, VP9 dump is fully populated | Inconclusive (stale vs partial-write); run memset-zero-before-QBUF experiment |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Minor findings (MIN)
|
||||||
|
|
||||||
|
### MIN-1 — env-var caching uses file-scope statics; these would be shared
|
||||||
|
across concurrent contexts if libva ever has two active contexts in the same
|
||||||
|
process
|
||||||
|
|
||||||
|
`surface.c` is compiled into a single shared library loaded once per process.
|
||||||
|
File-scope `static bool dump_env_checked` is process-global. For the
|
||||||
|
diagnostic use case this is fine (we want one env check per process). Just
|
||||||
|
note it so future reviewers don't flag it as a bug.
|
||||||
|
|
||||||
|
### MIN-2 — `sz > 1024 ? 1024 : sz` in the non-zero-scan loop: for 1920×1088
|
||||||
|
NV12 plane[0], `sz = 1920×1088 = 2,088,960`. Sampling only the first 1024
|
||||||
|
bytes is intentional for log volume but may completely miss whether the
|
||||||
|
kernel wrote content that's not in the first 1024 bytes
|
||||||
|
|
||||||
|
For H.264 1080p in FRAME_BASED mode, rkvdec writes output MB-order or tile-
|
||||||
|
order. If the kernel writes the first MB row (bytes 0..30720 for 1920-wide)
|
||||||
|
but not the rest, the 1024-byte sample catches it. If the kernel writes only
|
||||||
|
the LAST macroblock row (pathological), the 1024-byte scan reports zero
|
||||||
|
non-zero bytes and misclassifies the dump. Suggest extending the sample to
|
||||||
|
cover at minimum one full MB row: `min(sz, 1920 * 16) = 30720` for H.264
|
||||||
|
1080p.
|
||||||
|
|
||||||
|
This is MIN because the leading hypothesis has content at the TOP-LEFT (the
|
||||||
|
16×32 patch is at offset 0), so the 1024-byte scan would catch it. But make
|
||||||
|
the note for future runs with different content.
|
||||||
|
|
||||||
|
### MIN-3 — `surface_id` in the dump log is the VAAPI surface ID (integer
|
||||||
|
offset into object_heap), not a human-readable frame number. The log line
|
||||||
|
`"CAPTURE buffer dump for surface %d"` will print something like `67108864`
|
||||||
|
(0x4000000 SURFACE_ID_OFFSET). This is not wrong, just potentially confusing.
|
||||||
|
Adding `surface_object->destination_index` (already in the plan) is the
|
||||||
|
cleaner identifier for correlating with strace DQBUF logs.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Author re-verification list
|
||||||
|
|
||||||
|
Per `feedback_review_empirical_over_theoretical.md`: the following are CLAIMS
|
||||||
|
by this reviewer that the author should empirically verify before incorporating
|
||||||
|
into an amended plan.
|
||||||
|
|
||||||
|
1. **CRIT-1 (request_log prefix)**: Verify by running any existing call site
|
||||||
|
where `request_log` is called in a loop and inspecting stderr. The claim
|
||||||
|
(each call emits its own `"v4l2-request: "` prefix) follows directly from
|
||||||
|
`utils.c:37`, but if there is a buffered-output path or a different compile
|
||||||
|
guard in the installed binary, the observation may differ. Cheap test:
|
||||||
|
`grep -A3 "request_log" src/utils.c` and read `utils.c:33-42` (which I did
|
||||||
|
— see source read above).
|
||||||
|
|
||||||
|
2. **CRIT-2 (null guard on destination_data)**: Verify by tracing when
|
||||||
|
`destination_data[0]` can be null at the dump insertion point. `surface.c`
|
||||||
|
`RequestCreateSurfaces2:207` zero-inits `params`, `slices_count`, etc. but
|
||||||
|
`destination_data[]` is not explicitly zero-inited in that function; it
|
||||||
|
relies on the zero-init from `object_heap_allocate`. Whether
|
||||||
|
`object_heap_allocate` zero-fills depends on the heap implementation.
|
||||||
|
Author should grep `object_heap_allocate` to confirm zero-fill behavior
|
||||||
|
before relying on `destination_data[0] == NULL` as a safe null-check.
|
||||||
|
|
||||||
|
3. **IMP-1 (stale-residue alternative)**: The claim that the kernel may not
|
||||||
|
zero-initialize rkvdec CAPTURE buffers on REQBUFS is based on typical
|
||||||
|
DMA-coherent allocator behavior on RK3399. On some kernels, dma_alloc_*
|
||||||
|
does zero physical pages at allocation time. The author should verify on
|
||||||
|
fresnel by running a memset-before-QBUF experiment (described in IMP-1)
|
||||||
|
rather than accepting either interpretation theoretically.
|
||||||
|
|
||||||
|
4. **context.c:201–205 (destination_sizes for planes>0)**: Reviewer reads
|
||||||
|
`destination_sizes[1] = destination_sizes[0] / 2` for single-buffer NV12.
|
||||||
|
This is the UV plane size = half the Y plane size (NV12 chroma is half-
|
||||||
|
height half-width interleaved). Verify that this matches what
|
||||||
|
`v4l2_get_format` returns for CAPTURE sizeimage on rkvdec H.264 1080p —
|
||||||
|
the G_FMT sizeimage may include alignment padding (e.g. `4177920 = 1920 ×
|
||||||
|
1088 × 3/2` without padding). If `destination_sizes[0]` is set from
|
||||||
|
`bytesperlines[0] * format_height = 1920 × 1088 = 2,088,960`, then
|
||||||
|
`destination_sizes[1] = 1,044,480`. But if the kernel reports
|
||||||
|
`sizeimage = 4177920` as a single plane total, the split may differ.
|
||||||
|
Confirm the specific value in the γ dump output for plane[1].sz.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Amendments to Phase 6
|
||||||
|
|
||||||
|
These are the mechanical changes to apply when implementing γ:
|
||||||
|
|
||||||
|
**Amendment 1 (CRIT-1 fix)**: Replace the inner hex-dump loops with
|
||||||
|
buffered `snprintf` output emitted in a single `request_log` call:
|
||||||
|
|
||||||
|
```c
|
||||||
|
/* Replace the per-byte request_log calls with: */
|
||||||
|
char hexbuf[128];
|
||||||
|
int pos = 0;
|
||||||
|
for (unsigned int i = 0; i < 32 && i < sz; i++)
|
||||||
|
pos += snprintf(hexbuf + pos, sizeof(hexbuf) - pos, "%02x ", d[i]);
|
||||||
|
request_log(" plane[%u] hex[0..32]: %s\n", p, hexbuf);
|
||||||
|
|
||||||
|
pos = 0;
|
||||||
|
if (sz >= 32) {
|
||||||
|
for (unsigned int i = 0; i < 32; i++)
|
||||||
|
pos += snprintf(hexbuf + pos, sizeof(hexbuf) - pos,
|
||||||
|
"%02x ", d[sz - 32 + i]);
|
||||||
|
request_log(" plane[%u] tail hex[%zu..%zu]: %s\n",
|
||||||
|
p, sz - 32, sz - 1, hexbuf);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Amendment 2 (CRIT-2 fix)**: Wrap the entire dump block in the
|
||||||
|
existing null guard:
|
||||||
|
|
||||||
|
```c
|
||||||
|
if (surface_object->current_slot != NULL && dump_env != NULL && dump_env[0] == '1') {
|
||||||
|
/* dump block here */
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Or equivalently, place the dump immediately INSIDE the existing
|
||||||
|
`if (surface_object->current_slot != NULL)` block in `RequestSyncSurface`,
|
||||||
|
after the `cap_pool_mark_decoded` call.
|
||||||
|
|
||||||
|
**Amendment 3 (IMP-1 / experiment)**: For Phase 7 execution, add a second
|
||||||
|
test run with an explicit memset-zero of the CAPTURE slot immediately
|
||||||
|
after `cap_pool_acquire` in `picture.c::RequestBeginPicture` (or add a
|
||||||
|
diagnostic `memset` in `cap_pool_init`). Compare H.264 frame 1 output before
|
||||||
|
and after the memset. If the 16×32 patch disappears → stale-residue; adjust
|
||||||
|
IMP outcome table accordingly.
|
||||||
|
|
||||||
|
**Amendment 4 (IMP-3 / placement)**: Phase 6 implementer should place the
|
||||||
|
dump block between `cap_pool_mark_decoded(...)` and
|
||||||
|
`surface_object->status = VASurfaceDisplaying` — i.e., on the happy-path
|
||||||
|
only, not inside an error label. Do not place it after `goto complete`.
|
||||||
|
|
||||||
|
**Amendment 5 (MIN-2 / scan window)**: Change the non-zero scan limit from
|
||||||
|
1024 to `min(sz, (unsigned int)(destination_bytesperlines[0] * 16))` for
|
||||||
|
plane 0 (covers one full macroblock row), and from 1024 to
|
||||||
|
`min(sz, 1024)` for plane 1 (UV is smaller, 1024 is fine). This requires
|
||||||
|
access to `surface_object->destination_bytesperlines[0]` inside the dump
|
||||||
|
block, which is available in scope.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
Two CRIT issues, both in the γ implementation sketch rather than in the
|
||||||
|
strategy choice. The γ-first rationale is defensible but IMP-1 (stale-residue
|
||||||
|
alternative) adds a fourth diagnostic state that the plan's three-outcome
|
||||||
|
table doesn't handle. The fixes are all small (Amendment 1 is the most
|
||||||
|
important — broken hex formatting would make the dump useless). No plan-level
|
||||||
|
strategy change is required; the γ-then-α path stands.
|
||||||
Reference in New Issue
Block a user