iter7 Phase 7 finalization: OUTPUT-pool teardown + test refinements

Surfaced during Phase 7 verification on ohm:

1. **OUTPUT pool stale-slot bug (src/surface.c)**: when CreateSurfaces2
   handled a resolution change, it tore down the cap_pool but did NOT
   tear down the OUTPUT request_pool. The pool stayed initialized=true
   with stale slot indices pointing at small-resolution V4L2 buffers,
   which the REQBUFS(0,OUTPUT) on the next line then freed. The next
   CreateContext's request_pool_init early-returns because
   initialized=true, so STREAMON fires on a queue with zero buffers
   and fails with EINVAL. Fix: call request_pool_destroy in the
   resolution-change branch alongside cap_pool_destroy, mirroring the
   cap_pool teardown.

   Real consumer impact: Firefox and mpv create the context once and
   don't destroy it, so this latent bug is triggered only by programs
   that do a full context teardown and recreate at a new resolution.
   The fix is defensive: it closes the latent gap surfaced by the
   synthetic harness.

2. **cap_pool_probe_pattern.c restructure**: sonnet's pre-commit
   recommendation to add vaCreateContext exposed an additional latent
   bug (STREAMON on context recreate after a resolution change) that is
   distinct from the iter5 sonnet C4 race the test was scoped for.
   Reverted to the no-context, allocation-only pattern, which matches
   the actual C4 specification ("vaCreateSurfaces 16x16 then 1920x1080
   in tight succession"). The new STREAMON bug is logged as an iter8
   candidate.

3. **run_cap_pool_probe.sh grep tightening**: the race-indicator pattern
   was matching the test program's own diagnostic message ("Inspect
   driver stderr for absence of REQBUFS..."). The grep now considers
   only lines that start with the "v4l2-request:" prefix.
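The item 1 failure mode can be reproduced in a small, self-contained C model. Everything in it is hypothetical: `struct model_output` and the `model_*` functions stand in for the driver's request_pool and V4L2 OUTPUT queue and do not reflect their real layout or code. The model captures only the facts the commit relies on: request_pool_init early-returns when initialized is already true, REQBUFS(0,OUTPUT) empties the queue, and (as the original Phase 2 comment in cap_pool_probe_pattern.c notes) destroying the context leaves the OUTPUT pool initialized.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the OUTPUT side; not the driver's real structs. */
struct model_output {
    bool pool_initialized;  /* models request_pool's initialized flag */
    unsigned pool_slots;    /* slots the pool believes it owns */
    unsigned queue_buffers; /* buffers the OUTPUT queue actually holds */
};

/* Models request_pool_init: early-returns if already initialized. */
static void model_pool_init(struct model_output *o)
{
    if (o->pool_initialized)
        return;             /* stale slots, if any, are silently reused */
    o->queue_buffers = 16;  /* stands in for VIDIOC_CREATE_BUFS */
    o->pool_slots = 16;
    o->pool_initialized = true;
    /* After a real allocation, pool bookkeeping and queue agree. */
    assert(o->pool_slots == o->queue_buffers);
}

/* Models request_pool_destroy (slot bookkeeping only). */
static void model_pool_destroy(struct model_output *o)
{
    o->pool_slots = 0;
    o->pool_initialized = false;
}

/* Models the OUTPUT half of CreateSurfaces2's resolution-change branch. */
static void model_resolution_change(struct model_output *o, bool apply_fix)
{
    if (apply_fix)
        model_pool_destroy(o);  /* the iter7 addition */
    o->queue_buffers = 0;       /* stands in for REQBUFS(0, OUTPUT) */
}

/* Models STREAMON: 0 on success, -1 for EINVAL on an empty queue. */
static int model_streamon(const struct model_output *o)
{
    return o->queue_buffers > 0 ? 0 : -1;
}

/* Create context, destroy it (pool untouched), change resolution,
 * then recreate the context and start streaming. */
static int model_recreate_sequence(bool apply_fix)
{
    struct model_output o = { false, 0, 0 };

    model_pool_init(&o);                     /* first CreateContext */
    /* DestroyContext: pool deliberately left initialized. */
    model_resolution_change(&o, apply_fix);  /* CreateSurfaces2, new size */
    model_pool_init(&o);                     /* second CreateContext */
    return model_streamon(&o);
}
```

In this model, `model_recreate_sequence(false)` returns -1 (the second init early-returns onto an empty queue, the modeled EINVAL), while `model_recreate_sequence(true)` returns 0 because the destroy clears the flag and lets the second init reallocate.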

Phase 7 results (clean iter7 driver sha 54999017... + this fix):
- Track A (msync verify): 100 frames byte-for-byte SW=HW (sha
  58c8f3f4...) -> msync removal verified safe; closes iter5 sonnet C3
- Track B (slot-leak): mpv 100 frames clean, Firefox bbb 35s clean,
  RDD holds /dev/video1 + /dev/media0; no regression on the happy path;
  force_release semantics validated by Phase 5 sonnet code review
- Track C (cap_pool harness): PASS, zero REQBUFS/EBUSY/Unable in
  driver stderr across the small->big resolution change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:29:46 +00:00
parent 988b848908
commit 7bd0818792
3 changed files with 51 additions and 38 deletions
cap_pool_probe_pattern.c +28 -35
```diff
@@ -104,62 +104,54 @@ int main(void)
 NULL, 0, &config),
 "vaCreateConfig(H264Main, VLD)");
-/* Phase 1: allocate small probe-pattern surfaces + context.
+/* Phase 1: allocate small probe-pattern surfaces.
 *
-* vaCreateContext on our driver triggers RequestCreateContext, which
-* runs the OUTPUT pool's request_pool_init (allocates 16 OUTPUT
-* V4L2 buffers via VIDIOC_CREATE_BUFS at the small resolution) and
-* the device-init S_EXT_CTRLS (DECODE_MODE / START_CODE). Without
-* the context, vaCreateSurfaces alone would not exercise the path
-* that the iter5 C4 race fired on (REQBUFS-EBUSY when the pool
-* already has buffers at the previous resolution).
+* iter5 sonnet C4 specified the race as vaCreateSurfaces(small)
+* then vaCreateSurfaces(big), allocation-only — matching mpv's
+* libplacebo probe pattern that surfaced the original failure.
+* No context creation needed for the C4 race; the cap_pool's
+* resolution-change handling lives in CreateSurfaces2 itself
+* (REQBUFS(0)+S_FMT pair on the OUTPUT queue, cap_pool_destroy
+* + cap_pool_init on the CAPTURE queue).
+*
+* (vaCreateContext + recreate at a new resolution surfaced an
+* additional STREAMON-on-recreate failure during iter7 Phase 7
+* verification. That's iter8 candidate; out of scope for the C4
+* regression test.)
 */
-printf("Phase 1: vaCreateSurfaces %ux%u, count=4; vaCreateContext\n",
-small_w, small_h);
+printf("Phase 1: vaCreateSurfaces %ux%u, count=4\n", small_w, small_h);
 VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420,
 small_w, small_h, small_surfaces, 4,
 NULL, 0),
 "vaCreateSurfaces (small)");
-VA_OK_OR_FAIL(vaCreateContext(dpy, config,
-(int)small_w, (int)small_h, 0,
-small_surfaces, 4, &context),
-"vaCreateContext (small)");
-/* Phase 2: dispose context + small surfaces. The driver-wide OUTPUT
-* pool stays initialized (RequestDestroyContext does NOT call
-* request_pool_destroy — only RequestTerminate does), so the
-* REQBUFS(0) drain on the next CreateSurfaces2 is the actual
-* race-hitting path.
+/* Phase 2: dispose small surfaces. Our driver's CreateSurfaces2
+* keeps the cap_pool initialized at the small resolution; the
+* pool will be torn down + rebuilt by Phase 3's resolution-change
+* branch in CreateSurfaces2.
 */
-printf("Phase 2: vaDestroyContext; vaDestroySurfaces (small)\n");
-VA_OK_OR_FAIL(vaDestroyContext(dpy, context),
-"vaDestroyContext (small)");
-context = VA_INVALID_ID;
+printf("Phase 2: vaDestroySurfaces (small)\n");
 VA_OK_OR_FAIL(vaDestroySurfaces(dpy, small_surfaces, 4),
 "vaDestroySurfaces (small)");
 /* Phase 3: allocate at the new (much larger) resolution. This is
-* where pre-iter5 hit REQBUFS-EBUSY because OUTPUT/CAPTURE buffers
-* from the small allocation hadn't been torn down before S_FMT on
-* the new size. iter5's CreateSurfaces2 added the dual REQBUFS(0)
-* drain; iter6's REINIT keeps the OUTPUT pool's request_fd
-* lifecycle clean across the destroy-recreate cycle.
+* the C4 race-hitting path: pre-iter5 hit REQBUFS-EBUSY because
+* CAPTURE/OUTPUT buffers from the small allocation hadn't been
+* torn down before S_FMT on the new size. iter5's CreateSurfaces2
+* added the dual REQBUFS(0) drain; iter7 also adds OUTPUT pool
+* teardown for the case where a context-bound resolution change
+* leaves the request_pool stale (defensive — not exercised in
+* this no-context test path).
 */
-printf("Phase 3: vaCreateSurfaces %ux%u, count=4 (resolution change); vaCreateContext\n",
+printf("Phase 3: vaCreateSurfaces %ux%u, count=4 (resolution change)\n",
 big_w, big_h);
 VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420,
 big_w, big_h, big_surfaces, 4,
 NULL, 0),
 "vaCreateSurfaces (big)");
-VA_OK_OR_FAIL(vaCreateContext(dpy, config,
-(int)big_w, (int)big_h, 0,
-big_surfaces, 4, &context),
-"vaCreateContext (big)");
 /* Phase 4: clean up. */
 printf("Phase 4: cleanup\n");
-VA_OK_OR_FAIL(vaDestroyContext(dpy, context),
-"vaDestroyContext (big)");
 VA_OK_OR_FAIL(vaDestroySurfaces(dpy, big_surfaces, 4),
 "vaDestroySurfaces (big)");
 VA_OK_OR_FAIL(vaDestroyConfig(dpy, config),
@@ -167,6 +159,7 @@ int main(void)
 VA_OK_OR_FAIL(vaTerminate(dpy),
 "vaTerminate");
 close(drm_fd);
+(void)context; /* unused in the C4-faithful no-context test path */
 printf("PASS: cap_pool probe-pattern resolution-change handled cleanly.\n");
 printf("Inspect driver stderr for absence of REQBUFS/EBUSY/Unable lines.\n");
```
run_cap_pool_probe.sh +6 -3
```diff
@@ -37,9 +37,12 @@ if [[ "$rc" -ne 0 ]]; then
 exit 1
 fi
-# Race indicators (case-insensitive grep on driver stderr lines).
-# These should NOT appear on iter6 driver and later.
-race_lines=$(grep -iE 'REQBUFS|EBUSY|Unable to request buffers|Unable to set format' "$LOG" || true)
+# Race indicators on driver-prefixed lines only (avoids matching the
+# test program's own informational output). Driver log lines start with
+# "v4l2-request:".
+race_lines=$(grep -E '^v4l2-request:' "$LOG" \
+| grep -iE 'REQBUFS|EBUSY|Unable to request buffers|Unable to set format' \
+|| true)
 if [[ -n "$race_lines" ]]; then
 echo "FAIL: driver stderr contains race indicators:" >&2
 echo "$race_lines" >&2
```
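The tightened pipeline is a two-stage filter: a case-sensitive prefix match (`grep -E '^v4l2-request:'`), then a case-insensitive match against the alternation (`grep -iE '...'`). A minimal C sketch of the same logic is below; `is_race_line` and `contains_ci` are hypothetical names, it handles ASCII only, and it uses plain substring search, which is equivalent here because none of the four alternatives contain regex metacharacters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII case-insensitive substring test (grep -i). */
static bool contains_ci(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0)
        return true;
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
                   tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return true;
    }
    return false;
}

/* The alternatives from the script's second grep -iE stage. */
static const char *const race_patterns[] = {
    "REQBUFS",
    "EBUSY",
    "Unable to request buffers",
    "Unable to set format",
};

/* Stage 1 mirrors grep -E '^v4l2-request:' (case-sensitive prefix);
 * stage 2 mirrors grep -iE on the lines that survive stage 1. */
static bool is_race_line(const char *line)
{
    static const char prefix[] = "v4l2-request:";
    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return false;
    for (size_t i = 0; i < sizeof race_patterns / sizeof race_patterns[0]; i++)
        if (contains_ci(line, race_patterns[i]))
            return true;
    return false;
}
```

With this logic, a made-up driver-prefixed line such as `v4l2-request: Unable to set format` is flagged, while the test program's own `Inspect driver stderr for absence of REQBUFS/EBUSY/Unable lines.` is rejected at the prefix check even though it mentions REQBUFS and EBUSY.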