iter7 Phase 7 finalization: OUTPUT-pool teardown + test refinements

Surfaced during Phase 7 verification on ohm:

1. **OUTPUT pool stale-slot bug (src/surface.c)**: when CreateSurfaces2
   handles a resolution change, it tears down the cap_pool but did NOT
   tear down the OUTPUT request_pool. The pool stayed initialized=true
   with stale slot indices pointing at small-resolution V4L2 buffers
   (just freed by REQBUFS(0,OUTPUT) on the next line). Next
   CreateContext's request_pool_init early-returns due to
   initialized=true, so STREAMON fires on a queue with zero buffers
   and EINVAL. Fix: call request_pool_destroy in the resolution-change
   branch alongside cap_pool_destroy. Mirror the cap_pool teardown.

   Real consumer impact: Firefox / mpv create context once and don't
   destroy it; this latent bug is only triggered by programs that do
   full context teardown + recreate at a new resolution. Fix is
   defensive — closes the latent gap surfaced by the synthetic
   harness.

2. **cap_pool_probe_pattern.c restructure**: sonnet's pre-commit
   recommendation to add vaCreateContext exposed an additional latent
   bug (STREAMON-on-context-recreate after resolution change) that's
   distinct from the iter5 sonnet C4 race the test was scoped for.
   Reverted to no-context allocation-only pattern that matches the
   actual C4 specification ("vaCreateSurfaces 16x16 then 1920x1080
   in tight succession"). The new STREAMON bug is logged as iter8
   candidate.

3. **run_cap_pool_probe.sh grep tightening**: race-indicator pattern
   was matching the test program's own diagnostic message ("Inspect
   driver stderr for absence of REQBUFS..."). Now grep restricts to
   lines starting with "v4l2-request:" prefix.

Phase 7 results (clean iter7 driver sha 54999017... + this fix):
- Track A (msync verify): 100 frames byte-for-byte SW=HW (sha
  58c8f3f4...) -> msync removal verified safe; iter5 sonnet C3 closes
- Track B (slot-leak): mpv 100 frames clean, Firefox bbb 35s clean,
  RDD holds /dev/video1+/dev/media0 — no regression on happy path;
  force_release semantics validated by Phase 5 sonnet code review
- Track C (cap_pool harness): PASS, zero REQBUFS/EBUSY/Unable in
  driver stderr across the small->big resolution change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-06 09:29:46 +00:00
parent 988b848908
commit 7bd0818792
3 changed files with 51 additions and 38 deletions
+17
View File
@@ -201,6 +201,23 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
else else
(void)v4l2_request_buffers(driver_data->video_fd, (void)v4l2_request_buffers(driver_data->video_fd,
v4l2_type_video_capture(true), 0); v4l2_type_video_capture(true), 0);
/*
* iter7: tear down the OUTPUT pool too. Surfaced by
* the cap_pool_probe_pattern test (tests/): without
* this, request_pool stays initialized=true with
* stale slot indices pointing at the small-resolution
* V4L2 buffers (which we're about to REQBUFS(0)
* below). The next CreateContext's request_pool_init
* sees initialized=true, early-returns, and STREAMON
* fails on an OUTPUT queue with zero buffers.
*
* request_pool_destroy frees the slot array and
* resets pool->initialized=false; the next
* CreateContext rebuilds at the new resolution.
* Mirrors the cap_pool teardown above.
*/
if (driver_data->output_pool.initialized)
request_pool_destroy(&driver_data->output_pool);
(void)v4l2_request_buffers(driver_data->video_fd, (void)v4l2_request_buffers(driver_data->video_fd,
output_type, 0); output_type, 0);
} }
+28 -35
View File
@@ -104,62 +104,54 @@ int main(void)
NULL, 0, &config), NULL, 0, &config),
"vaCreateConfig(H264Main, VLD)"); "vaCreateConfig(H264Main, VLD)");
/* Phase 1: allocate small probe-pattern surfaces + context. /* Phase 1: allocate small probe-pattern surfaces.
* *
* vaCreateContext on our driver triggers RequestCreateContext, which * iter5 sonnet C4 specified the race as vaCreateSurfaces(small)
* runs the OUTPUT pool's request_pool_init (allocates 16 OUTPUT * then vaCreateSurfaces(big), allocation-only — matching mpv's
* V4L2 buffers via VIDIOC_CREATE_BUFS at the small resolution) and * libplacebo probe pattern that surfaced the original failure.
* the device-init S_EXT_CTRLS (DECODE_MODE / START_CODE). Without * No context creation needed for the C4 race; the cap_pool's
* the context, vaCreateSurfaces alone would not exercise the path * resolution-change handling lives in CreateSurfaces2 itself
* that the iter5 C4 race fired on (REQBUFS-EBUSY when the pool * (REQBUFS(0)+S_FMT pair on the OUTPUT queue, cap_pool_destroy
* already has buffers at the previous resolution). * + cap_pool_init on the CAPTURE queue).
*
* (vaCreateContext + recreate at a new resolution surfaced an
* additional STREAMON-on-recreate failure during iter7 Phase 7
* verification. That's iter8 candidate; out of scope for the C4
* regression test.)
*/ */
printf("Phase 1: vaCreateSurfaces %ux%u, count=4; vaCreateContext\n", printf("Phase 1: vaCreateSurfaces %ux%u, count=4\n", small_w, small_h);
small_w, small_h);
VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420,
small_w, small_h, small_surfaces, 4, small_w, small_h, small_surfaces, 4,
NULL, 0), NULL, 0),
"vaCreateSurfaces (small)"); "vaCreateSurfaces (small)");
VA_OK_OR_FAIL(vaCreateContext(dpy, config,
(int)small_w, (int)small_h, 0,
small_surfaces, 4, &context),
"vaCreateContext (small)");
/* Phase 2: dispose context + small surfaces. The driver-wide OUTPUT /* Phase 2: dispose small surfaces. Our driver's CreateSurfaces2
* pool stays initialized (RequestDestroyContext does NOT call * keeps the cap_pool initialized at the small resolution; the
* request_pool_destroy — only RequestTerminate does), so the * pool will be torn down + rebuilt by Phase 3's resolution-change
* REQBUFS(0) drain on the next CreateSurfaces2 is the actual * branch in CreateSurfaces2.
* race-hitting path.
*/ */
printf("Phase 2: vaDestroyContext; vaDestroySurfaces (small)\n"); printf("Phase 2: vaDestroySurfaces (small)\n");
VA_OK_OR_FAIL(vaDestroyContext(dpy, context),
"vaDestroyContext (small)");
context = VA_INVALID_ID;
VA_OK_OR_FAIL(vaDestroySurfaces(dpy, small_surfaces, 4), VA_OK_OR_FAIL(vaDestroySurfaces(dpy, small_surfaces, 4),
"vaDestroySurfaces (small)"); "vaDestroySurfaces (small)");
/* Phase 3: allocate at the new (much larger) resolution. This is /* Phase 3: allocate at the new (much larger) resolution. This is
* where pre-iter5 hit REQBUFS-EBUSY because OUTPUT/CAPTURE buffers * the C4 race-hitting path: pre-iter5 hit REQBUFS-EBUSY because
* from the small allocation hadn't been torn down before S_FMT on * CAPTURE/OUTPUT buffers from the small allocation hadn't been
* the new size. iter5's CreateSurfaces2 added the dual REQBUFS(0) * torn down before S_FMT on the new size. iter5's CreateSurfaces2
* drain; iter6's REINIT keeps the OUTPUT pool's request_fd * added the dual REQBUFS(0) drain; iter7 also adds OUTPUT pool
* lifecycle clean across the destroy-recreate cycle. * teardown for the case where a context-bound resolution change
* leaves the request_pool stale (defensive — not exercised in
* this no-context test path).
*/ */
printf("Phase 3: vaCreateSurfaces %ux%u, count=4 (resolution change); vaCreateContext\n", printf("Phase 3: vaCreateSurfaces %ux%u, count=4 (resolution change)\n",
big_w, big_h); big_w, big_h);
VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, VA_OK_OR_FAIL(vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420,
big_w, big_h, big_surfaces, 4, big_w, big_h, big_surfaces, 4,
NULL, 0), NULL, 0),
"vaCreateSurfaces (big)"); "vaCreateSurfaces (big)");
VA_OK_OR_FAIL(vaCreateContext(dpy, config,
(int)big_w, (int)big_h, 0,
big_surfaces, 4, &context),
"vaCreateContext (big)");
/* Phase 4: clean up. */ /* Phase 4: clean up. */
printf("Phase 4: cleanup\n"); printf("Phase 4: cleanup\n");
VA_OK_OR_FAIL(vaDestroyContext(dpy, context),
"vaDestroyContext (big)");
VA_OK_OR_FAIL(vaDestroySurfaces(dpy, big_surfaces, 4), VA_OK_OR_FAIL(vaDestroySurfaces(dpy, big_surfaces, 4),
"vaDestroySurfaces (big)"); "vaDestroySurfaces (big)");
VA_OK_OR_FAIL(vaDestroyConfig(dpy, config), VA_OK_OR_FAIL(vaDestroyConfig(dpy, config),
@@ -167,6 +159,7 @@ int main(void)
VA_OK_OR_FAIL(vaTerminate(dpy), VA_OK_OR_FAIL(vaTerminate(dpy),
"vaTerminate"); "vaTerminate");
close(drm_fd); close(drm_fd);
(void)context; /* unused in the C4-faithful no-context test path */
printf("PASS: cap_pool probe-pattern resolution-change handled cleanly.\n"); printf("PASS: cap_pool probe-pattern resolution-change handled cleanly.\n");
printf("Inspect driver stderr for absence of REQBUFS/EBUSY/Unable lines.\n"); printf("Inspect driver stderr for absence of REQBUFS/EBUSY/Unable lines.\n");
+6 -3
View File
@@ -37,9 +37,12 @@ if [[ "$rc" -ne 0 ]]; then
exit 1 exit 1
fi fi
# Race indicators (case-insensitive grep on driver stderr lines). # Race indicators on driver-prefixed lines only (avoids matching the
# These should NOT appear on iter6 driver and later. # test program's own informational output). Driver log lines start with
race_lines=$(grep -iE 'REQBUFS|EBUSY|Unable to request buffers|Unable to set format' "$LOG" || true) # "v4l2-request:".
race_lines=$(grep -E '^v4l2-request:' "$LOG" \
| grep -iE 'REQBUFS|EBUSY|Unable to request buffers|Unable to set format' \
|| true)
if [[ -n "$race_lines" ]]; then if [[ -n "$race_lines" ]]; then
echo "FAIL: driver stderr contains race indicators:" >&2 echo "FAIL: driver stderr contains race indicators:" >&2
echo "$race_lines" >&2 echo "$race_lines" >&2