From 3508a2cfeb1363ad7a71cad9dd91631f338f0495 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Tue, 12 May 2026 13:50:08 +0000 Subject: [PATCH] =?UTF-8?q?iter5b=20Phase=205=20v2:=202=20CRIT=20findings?= =?UTF-8?q?=20=E2=80=94=20NULL=20guard=20+=20missing=20request=5Fpool=5Fde?= =?UTF-8?q?stroy?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CRIT-1: context.c:64-66 video_format==NULL guard rejects every first β CreateContext. β moves the probe from CreateSurfaces2 into CreateContext itself, so the guard fires before any new logic runs. Fix: remove guard, move CAPTURE probe to top of CreateContext. CRIT-2: DestroyContext lacks request_pool_destroy. Empirical grep shows only surface.c:220 (which β strips) calls it per-session. Without amendment, second CreateContext gets pool->initialized=true with stale slot pointers → QBUF EINVAL. Fix: add request_pool_destroy to DestroyContext before REQBUFS(0). C3 (surface.c strip) and CRIT-2 fix MUST land together. Plus IMP-1 (mplane assumption wrong for SUNXI_TILED_NV12) + IMP-2 (surface_reset_format_cache becomes dead under C7) + IMP-3 (error recovery comment). Phase 6 BLOCKED pending CRIT-1 + CRIT-2 fixes. Author confirmed both at code level — Phase 5 caught what Phase 4 v2's surface read missed ("DestroyContext teardown — no change needed" — wrong; was incomplete). Co-Authored-By: Claude Opus 4.7 (1M context) --- phase5_iter5b_review_v2.md | 173 +++++++++++++++++++++++++++++++++++++ 1 file changed, 173 insertions(+) create mode 100644 phase5_iter5b_review_v2.md diff --git a/phase5_iter5b_review_v2.md b/phase5_iter5b_review_v2.md new file mode 100644 index 0000000..a9cf114 --- /dev/null +++ b/phase5_iter5b_review_v2.md @@ -0,0 +1,173 @@ +# Iteration 5b — Phase 5 review v2 (of option-β plan) + +Phase 5 architect review of `phase4_iter5b_plan_v2.md` by sonnet-architect, 2026-05-12. Second Phase 5 pass for iter5b after the first Phase 5 cleared α' (which then failed at Phase 7). + +## Executive summary + +Option β's architectural choice is correct — the α' failure mode is genuinely eliminated. CreateContext is the right lifecycle site because `config_id` is unambiguously available there. **But two Critical blocking issues exist** that the plan glossed over: (1) `video_format == NULL` guard at context.c:64-66 rejects every first β CreateContext; (2) DestroyContext lacks `request_pool_destroy`, leaving stale output_pool state across context cycles. Stripping surface.c (C3) without fixing both makes the second CreateContext fail in a worse way than α' did. + +Plus three Important findings and four Non-Amendment confirmations. + +## Critical findings (BLOCKING Phase 6) + +### CRIT-1 — `video_format == NULL` guard at context.c:64-66 will reject every first β CreateContext + +In current code: + +```c +video_format = driver_data->video_format; // context.c:64 +if (video_format == NULL) // context.c:65 + return VA_STATUS_ERROR_OPERATION_FAILED; // context.c:66 +output_type = v4l2_type_video_output(video_format->v4l2_mplane); // :68 +capture_type = v4l2_type_video_capture(video_format->v4l2_mplane); // :69 +``` + +Pre-β: CreateSurfaces2 (surface.c:253) sets `driver_data->video_format` before CreateContext runs. Under β, that probe MOVES into CreateContext. First CreateContext call: `driver_data->video_format == NULL` → early return. + +**Required fix**: Remove the early-return at lines 65-66. Move CAPTURE-format probe (which sets `driver_data->video_format`) to the top of CreateContext. Derive `output_type`/`capture_type` after the probe. + +### CRIT-2 — `request_pool_destroy` missing from DestroyContext: second CreateContext re-uses stale pool + +DestroyContext does: +- v4l2_set_stream(OUTPUT, false) (line 276) +- v4l2_set_stream(CAPTURE, false) (line 280) +- RequestDestroySurfaces (line 286) +- v4l2_request_buffers(OUTPUT, 0) (line 296) — REQBUFS(0) frees kernel-side OUTPUT buffers +- cap_pool_destroy (line 306-307) — frees CAPTURE side +- surface_reset_format_cache (line 319) + +**Missing**: `request_pool_destroy(&driver_data->output_pool)`. Empirical grep: + +``` +src/request_pool.c:93: request_pool_destroy(pool); # internal error path +src/request_pool.c:97:void request_pool_destroy(...) # definition +src/request.c:361: request_pool_destroy(...) # Terminate +src/surface.c:220: request_pool_destroy(...) # resolution-change in CreateSurfaces2 +``` + +Only surface.c:220 calls it from a per-session path — and β strips that line. + +`request_pool_init` at request_pool.c:35-36 has `if (pool->initialized) return 0;`. Under β: +1. First CreateContext: request_pool_init succeeds, pool->initialized=true. +2. DestroyContext: REQBUFS(0) frees kernel buffers BUT pool->initialized stays true, slot mmap pointers stay populated. +3. Second CreateContext: request_pool_init sees initialized=true, returns 0 — pool retains stale slot indices pointing at kernel buffers that no longer exist. +4. Next QBUF on any slot → EINVAL. + +**Required fix**: Add `request_pool_destroy(&driver_data->output_pool)` to DestroyContext BEFORE `v4l2_request_buffers(OUTPUT, 0)`. Order matters: munmap (request_pool_destroy) first, then REQBUFS(0) (kernel buffer free). Mirror surface.c:219-220's existing `if (output_pool.initialized) request_pool_destroy(...)` shape. + +C3 (surface.c strip) and CRIT-2 fix (DestroyContext add) MUST land in the same commit or strictly sequenced. + +## Important findings + +### IMP-1 — `output_type` mplane assumption: hardcoded `v4l2_type_video_output(true)` is wrong for SUNXI_TILED_NV12 + +Plan's C4 opens with `output_type_pre = v4l2_type_video_output(true);` to set OUTPUT format before CAPTURE probe. Works on fresnel (NV12 = mplane=true) but fails on cedrus/Allwinner targets where SUNXI_TILED_NV12 has v4l2_mplane=false. + +**Fix**: Reorder — probe CAPTURE first (the find_format calls don't require OUTPUT to be set), derive output_type from `video_format->v4l2_mplane`, THEN S_FMT(OUTPUT). Eliminates the `output_type_pre` hack. + +Empirical correctness: `v4l2_find_format` enumerates available formats without requiring any S_FMT first — confirmed by surface.c:238-253 doing exactly that probe. + +### IMP-2 — `driver_data->video_format` retained across DestroyContext; `surface_reset_format_cache` becomes dead + +`video_format` is a pointer into static `formats[]` in video.c, never freed. Skipping the probe on second CreateContext via `if (!driver_data->video_format)` is intentional and correct. + +But: DestroyContext's `surface_reset_format_cache` call (line 319) and its comment (lines 309-318) reference `last_output_width/height` which C7 deletes. Under β, `surface_reset_format_cache` is empty/dead. + +**Fix**: C7 should also delete `surface_reset_format_cache` and the call in DestroyContext, plus update the comment to reflect β's CreateContext-centric lifecycle. + +### IMP-3 — Error recovery in CreateContext (β): partial state on S_FMT-succeeds + cap_pool_init-fails + +If S_FMT(OUTPUT) succeeds then cap_pool_init fails, the existing error path (context.c:244-252) frees ids+context_object but leaves OUTPUT format set. cap_pool's own error_cleanup at cap_pool.c:125-137 handles its own REQBUFS(0), so cap_pool failure is self-recovering. OUTPUT S_FMT is not destructive when no buffers exist yet. + +**Safe but messy.** Phase 6 should add a comment in the error path noting what state remains, so future readers don't think the rollback is complete. + +## Non-amendment findings + +### NC-1 — α' failure mode eliminated by β: CONFIRMED + +α' crash: CreateSurfaces2 fires mid-stream → cap_pool_destroy → stale destination_data → SIGSEGV. Under β: CreateSurfaces2 touches no V4L2 device state and allocates no V4L2 buffers. No cap_pool_destroy from CreateSurfaces2. Multi-CreateSurfaces2 mid-stream is harmless. + +### NC-2 — Multi-context without intervening DestroyContext: SAFE + +Second CreateContext sees STREAMON-on-OUTPUT EBUSY → returns OPERATION_FAILED. No silent corruption. Real consumers don't do this anyway. + +### NC-3 — Field partitioning for surface_bind_slot: CORRECT + +Format-uniform fields (`destination_planes_count`, `destination_sizes`, `destination_offsets`, `destination_bytesperlines`, `destination_buffers_count`) move to CreateContext (β). Per-slot fields (`destination_index`, `destination_map[]`, `destination_data[]`) stay at BeginPicture's surface_bind_slot. The split is exactly what surface.h:63-72 documents. + +Minor: `destination_buffers_count` is also set at surface_bind_slot (line 101). Redundant but harmless (slot->buffers_count matches video_format->v4l2_buffers_count). + +### NC-4 — mpv probe-then-real pattern: CORRECT under β with CRIT-2 fix + +Probe-CreateContext → DestroyContext (with CRIT-2 fix: full pool teardown) → Real-CreateContext. Clean rebuild at the new resolution. Without CRIT-2 fix this path breaks; with CRIT-2 it works. + +## Plan-specific concern answers + +| Concern | Verdict | +|---|---| +| 1 (Error-recovery branches) | IMP-3 | +| 2 (Multi-context shared driver_data) | NC-2 + IMP-2 | +| 3 (Surface ID lookup fails) | Handled via standard SURFACE() NULL check + error return | +| 4 (picture_width vs surface->width/height) | Pre-existing quirk, β doesn't introduce | +| 5 (destination_data fill ordering) | NC-3 | +| 6 (ohm backwards compat) | Safe — MPEG-2 gets explicit format that hantro previously substituted; H.264 unchanged | +| 7 (CreateContext without DestroyContext between) | NC-2 | +| 8 (mpv probe-then-real) | NC-4 (depends on CRIT-2 fix) | +| 9 (destination_* transplant correctness) | Mechanical; preserve all 3 branches of `video_format->v4l2_buffers_count` switch | +| 10 (CreateConfig error paths) | object_heap_allocate zeroes the struct; if CreateConfig returns ERROR before reaching the pixelformat assignment, the field is 0; CreateContext's `if (pixelformat == 0)` defensive check catches it | + +## Summary scorecard + +| Clause | Status | Finding | +|---|---|---| +| C1 (codec.h/c helper) | PASS | Identical to reverted commit; no issue | +| C2 (CreateConfig wire-up) | PASS | Correct | +| C3 (strip surface.c) | CONDITIONAL | Must land with CRIT-2 fix | +| C4 (CreateContext new flow) | FAIL: CRIT-1 + IMP-1 | NULL guard removal; probe order before S_FMT | +| C5 (DestroyContext: "no change") | **FAIL: CRIT-2** | Plan said "no change needed" — wrong; must add request_pool_destroy | +| C6 (surface.h unchanged) | PASS | No edit needed | +| C7 (delete last_output_*) | PASS | Recommend extending to delete surface_reset_format_cache too (IMP-2) | +| C8 (Phase 7 verification) | PASS | Same methodology | +| C9 (Phase 8 close) | CONDITIONAL | Depends on CRIT fixes | + +**Phase 6 BLOCKED pending CRIT-1 + CRIT-2 fixes.** + +## Required amendments before Phase 6 + +1. **CRIT-1**: Remove `video_format == NULL` guard at context.c:65-66. Move CAPTURE-format probe to top of CreateContext. Derive output_type/capture_type after probe. + +2. **CRIT-2**: Add `request_pool_destroy(&driver_data->output_pool)` to DestroyContext BEFORE `v4l2_request_buffers(OUTPUT, 0)`. Mirror surface.c:219-220's existing pattern. + +3. **IMP-1**: Reorder C4 — CAPTURE probe FIRST, then derive output_type from `video_format->v4l2_mplane`, then S_FMT(OUTPUT). Eliminates `output_type_pre` hack. + +4. **IMP-2 (folded into C7)**: Delete `surface_reset_format_cache` (function + call in DestroyContext + the stale comment block). Field deletion + function deletion happen together. + +5. **C3 + CRIT-2 commit ordering**: Phase 6 must commit C3 (strip surface.c) and CRIT-2 fix (DestroyContext add) together or strictly sequenced. Cannot commit C3 alone — it would leave NO call site for request_pool_destroy in per-session teardown. + +## Author re-verification (post-review, 2026-05-12) + +Per `feedback_review_empirical_over_theoretical.md` Direction 2: + +### CRIT-1 — CONFIRMED at code level + +context.c:64-66 verified by direct read. The early-return guarantees first-CreateContext failure under β. + +### CRIT-2 — CONFIRMED at code level + +Grep of `request_pool_destroy` callsites: only surface.c:220 and request.c:361 (Terminate) plus internal request_pool.c:93. β strips surface.c:220. Without amendment, ZERO per-session callsites. + +### IMP-1, IMP-2 — CONFIRMED + +mplane=false case exists for SUNXI_TILED_NV12 (video.c entries). `surface_reset_format_cache` reads only `last_output_width/height` — fully dead under C7's field deletion. + +### Lessons + +The first Phase 5 review (of α') cleared a plan that then failed Phase 7. The second Phase 5 review (this one) caught two architectural gaps before Phase 6. Empirical-over-theoretical worked: the reviewer read the source to verify "DestroyContext does the right teardown" — the plan claimed it did, but the source disagreed. + +The author (Phase 4 v2 writer, me) read context.c:276-307 and noted "STREAMOFF + REQBUFS(0) + cap_pool_destroy at context.c:276-307. No change needed." — missing that `request_pool_destroy` is absent. Surface-level read vs full empirical traversal. + +### Memory rule reinforcement + +`feedback_review_empirical_over_theoretical.md` (Direction 2): the reviewer here applied it correctly by enumerating `request_pool_destroy` callsites via grep before claiming the function existed only in surface.c. The author should apply Direction 1 in writing future plans: don't claim "no change needed" without enumerating call/use sites empirically. + +Worth a fresh memory rule? Maybe: **"Before writing 'no change needed' in a plan, grep for the symbol/function the change would touch. List the call sites. Confirm the change wouldn't break any of them."** Pin this lesson when iter5b closes.