From b2e3bf721b19a02aa37f4dc6b316637ec6cd76e9 Mon Sep 17 00:00:00 2001
From: claude-noether <claude@reauktion.de>
Date: Wed, 6 May 2026 07:17:46 +0000
Subject: [PATCH] iter7 Phase 2: situation analysis A+B+C

Per-track design + risk + plan + effort estimates:
- A (msync pixel verify): SW vs HW NV12 byte comparison via FFmpeg
- B (slot-leak fix): request_pool_force_release for error recovery
- C (cap_pool race harness): synthetic libva test program

Phase 4 implementation lands in the fork as commit 988b848
(libva-v4l2-request-fourier). Phase 5 sonnet code review caught
3 actionable issues, all addressed before commit.

Phase 6+7 deploy + verify pending operator ohm-tools availability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 phase2_iter7_situation.md | 140 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 phase2_iter7_situation.md

diff --git a/phase2_iter7_situation.md b/phase2_iter7_situation.md
new file mode 100644
index 0000000..d596664
--- /dev/null
+++ b/phase2_iter7_situation.md
@@ -0,0 +1,140 @@
+# Phase 2 (iter7) — situation analysis: A + B + C
+
+Opened 2026-05-06 immediately after iter7 Phase 1 lock. Three independent sub-tracks; analysis per track.
+
+## Track A — msync pixel-correctness verification
+
+### Context
+
+iter1 had `msync(MS_SYNC | MS_INVALIDATE)` on the CAPTURE buffer mmap after DQBUF, before any userspace read. Was a companion to the iter1 patch-0010 hex-dump diagnostic ("where does the buffer write go?"). iter5 sweep commit `d3a299b` removed both the hex-dump and the msync. Phase 5 sonnet caveat C3 (iter5) flagged: "no pixel-correctness verification post-msync-removal."
+
+### Hypothesis
+
+On hantro G1 / RK3568 / kernel 6.19.10 with CMA-backed DMA contiguous allocator, the V4L2 framework (`videobuf2-dma-contig.c`) does cache sync at DQBUF time for cached mappings. If our CAPTURE mmap is cached, `msync(MS_INVALIDATE)` is structurally redundant. If it's write-combine / uncached, the kernel-side sync is unnecessary. Either way, msync removal should be safe.
+
+But this is testable, not just theoretical. The cleanest test:
+
+### Plan
+
+1. **SW reference**: `ffmpeg -i bbb_1080p30_h264.mp4 -frames:v 100 -f rawvideo -pix_fmt nv12 sw_ref.yuv` — pure libavcodec SW decode, well-defined NV12 packing.
+2. **HW subject**: same input through our driver via `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb -vf hwdownload,format=nv12 -frames:v 100 -f rawvideo -pix_fmt nv12 hw_capture.yuv`. Forces the same NV12 packing on output. The internal `hwdownload` path exercises libva's `vaDeriveImage` / `vaGetImage` readback, which is exactly what mpv-vaapi-copy uses.
+3. **Compare**: `cmp` byte-for-byte. If identical, the msync removal is verified safe — formally closes iter5 sonnet C3. If divergent, capture the divergence pattern (which frame, which plane, what bytes), restore msync, re-run.
+
+### What could go wrong
+
+- FFmpeg HW path may use slightly different vaapi entrypoint than mpv-vaapi-copy → test verifies a different code path than the one we care about. Mitigation: also run mpv-vaapi-copy with `--o=mpv_capture.yuv` and compare both against SW reference. If both match, we're covered for the two main consumer patterns.
+- Frame ordering may differ (B-frames out-of-order at decode vs display). Mitigation: extract reference and subject in *display order* (default for `-f rawvideo`). bbb is largely IPB-pattern; FFmpeg handles the reorder uniformly.
+- Bbb fixture's first 100 frames include I + multiple GOPs → exercises both intra and inter prediction paths.
+
+### Effort
+
+1-2 hours including the harness script. Result is binary (PASS = no msync needed; FAIL = restore msync).
+
+---
+
+## Track B — Slot-leak error recovery
+
+### Context
+
+iter6 fix bound one `media_request_alloc`-allocated fd per OUTPUT pool slot, REINIT'd between uses. But the error path in `RequestSyncSurface` (commit `a09c03c`) doesn't release the slot when REINIT or DQBUF fails — it only clears `surface_object->request_fd = -1`. The slot's `busy = true` stays. With pool size 16, after 16 such errors the pool is starved and acquire returns -1, decode permanently broken.
+
+Phase 5 sonnet code review for iter6 explicitly noted this as a "bounded leak we accept" but flagged the TODO: `request_pool_force_release` for error recovery.
+
+### Hypothesis
+
+Adding `request_pool_force_release(pool, slot_index)` that REINITs the slot's fd and clears `busy = true` will recover the slot. If REINIT fails (kernel object too far gone), fall back to `close + media_request_alloc`. If alloc also fails, mark the slot "dead" and skip it in future acquires — count effectively decremented by 1.
+
+### Plan
+
+1. Add `int media_fd` to `struct request_pool` so `force_release` can re-alloc.
+2. Update `request_pool_init` signature → already takes `media_fd` from iter6; just store it in the struct.
+3. Add `request_pool_force_release(pool, slot_index)`:
+   ```c
+   void request_pool_force_release(struct request_pool *pool, unsigned int index)
+   {
+       struct request_pool_slot *slot = lookup_slot(pool, index);
+       if (slot == NULL) return;
+       if (media_request_reinit(slot->request_fd) < 0) {
+           close(slot->request_fd);
+           slot->request_fd = media_request_alloc(pool->media_fd);
+           /* If realloc fails, slot stays busy=true (effectively dead).
+            * Other slots unaffected. */
+           if (slot->request_fd < 0) return;
+       }
+       slot->busy = false;
+   }
+   ```
+4. Call from `RequestSyncSurface` error path in `surface.c` after REINIT or DQBUF failure:
+   ```c
+   if (rc < 0) {
+       request_pool_force_release(&driver_data->output_pool, surface_object->source_index);
+       status = VA_STATUS_ERROR_OPERATION_FAILED;
+       goto error;
+   }
+   ```
+5. Fault inject test: build with `-DITER7_FAULT_INJECT_REINIT` that returns `-EBUSY` from `media_request_reinit` after N frames; observe pre-fix (pool starves at 16) vs post-fix (recovers).
+
+### What could go wrong
+
+- `media_request_alloc` returns lowest-free fd, which after close might be the SAME fd number — exactly the iter6 issue. But this is in error recovery, not normal flow; happens rarely; the primary safeguard (1:1 slot-fd binding for HAPPY path) still holds. Documented tradeoff.
+- The pool is driver-wide (multi-context safe per iter5 Track E); force_release is called from a single-threaded VA context path, no concurrency hazard.
+
+### Effort
+
+1-2 hours code + test scaffolding.
+
+---
+
+## Track C — cap_pool race synthetic harness
+
+### Context
+
+iter5 sonnet C4 flagged: "cap_pool resolution-change race — mpv's libplacebo Vulkan-fallback path triggers it; mpv recovers via SW fallback (no segfault), but the race exists. Fix: drain CAPTURE properly before issuing REQBUFs(0) on resolution change in CreateSurfaces2."
+
+Phase 5 sonnet for iter6 said: "the test plan does not yet include the probe-pattern test harness for the candidate A (REQBUFS-EBUSY) path. This is a gap. Without it, the 'mpv libplacebo GREEN' criterion may pass trivially." 
+
+iter6's REINIT discipline organically resolved the race (4 cap_pool_init events on YouTube clean), but a deterministic-repro synthetic test would anchor the claim formally.
+
+### Hypothesis
+
+A 50-line C program using libva that:
+1. Opens `/dev/dri/renderD128`, gets VADisplay
+2. Initializes libva, creates VAConfig for H264Main + VLD
+3. `vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, 16, 16, small[4], 4, NULL, 0)` — probe-pattern small
+4. `vaDestroySurfaces(dpy, small, 4)` — dispose
+5. `vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, 1920, 1080, big[4], 4, NULL, 0)` — real resolution
+6. `vaDestroySurfaces(dpy, big, 4)`, `vaTerminate(dpy)`, close DRM
+
+… should succeed without REQBUFS-EBUSY in driver stderr on the iter7 driver. Pre-iter5 / pre-iter6 it would fail.
+
+### Plan
+
+1. Write `tests/cap_pool_probe_pattern.c`. ~50 lines.
+2. Add a Meson build target that compiles it against installed libva (don't bundle it into the .so; build separately with `pkg-config --libs libva libva-drm`).
+3. Run on ohm under `LIBVA_DRIVER_NAME=v4l2_request ...` env vars.
+4. Capture stderr; grep for `REQBUFS|EBUSY|Unable to request buffers|Unable to set format`; non-zero match = FAIL.
+5. Optional extension: add a decode step to validate the produced surface contains a real frame. Defer to iter8 if needed; this iteration's harness is allocation-pattern-only.
+
+### What could go wrong
+
+- libva client API shape on PineTab2's libva 2.23.0 may differ slightly from the headers I'm coding against. Mitigation: read `/usr/include/va/va.h` from ohm to confirm the function signatures I use.
+- 16x16 probe surfaces may not be accepted by every driver (minimum size constraints). bbb's actual sizes are 1920x1080 and 864x486 — could use a more reasonable probe-pattern small (e.g. 128x128).
+- Test may pass trivially if the cap_pool init path doesn't actually run with surfaces_count=4 — need to verify the path is exercised. Diagnostic: temporarily add a `request_log` line in `cap_pool_init` confirming entry.
+
+### Effort
+
+2-3 hours including writing the test, debugging the libva API calls, building it on ohm, running, validating. Plus another ~30 min to stitch into the campaign repo with a Meson build description.
+
+---
+
+## Execution order
+
+B → C → A:
+
+1. **B first** (smallest, additive — no impact on happy path). Land + test, then build iter7-final driver.
+2. **C second** (synthetic test, no driver change required, but needs iter7-final driver to test against).
+3. **A last** (verification — validates the iter7-final driver including B's changes). Result either confirms no msync needed (closes iter5 sonnet C3) or restores msync as a follow-up.
+
+## Stop point
+
+Phase 2 closes here. Proceed autonomously to Phase 3 (baseline anchors) and Phase 4 (implementation).