iter6 fix: per-OUTPUT-slot request_fd binding via REINIT

iter4 (385dee1) replaced the original media_request_reinit pattern
with close+media_request_alloc per frame to escape an EINVAL on
S_EXT_CTRLS that turned out to be a DPB-payload bug (74d8dd1, FFmpeg
V4L2_H264_FRAME_REF semantics). The per-frame close+alloc model
worked for mpv vaapi-copy (single-surface recycle) but raced under
Firefox 150's MediaSource pipeline (multi-surface rotation): fd=30
got reused via lowest-free-fd allocation faster than the kernel-
side per-buffer state-machine could tear down the prior request,
producing intermittent VIDIOC_QBUF EINVAL on OUTPUT after 1..53
successful frames.

Phase 2 telemetry confirmed:
- DQBUF returned the index we passed (no FIFO mismatch)
- SPS/PPS/DECODE_PARAMS/SCALING_MATRIX byte-identical between mpv
  and Firefox first 64 bytes
- Pool size bump 4 -> 16 only delayed the failure (62 frames)
- Different OUTPUT slot indices failed across runs (race signature)

Fix: each OUTPUT pool slot owns a permanent request_fd allocated
once at request_pool_init and REINIT'd between uses in
RequestSyncSurface. 1:1 slot-to-fd binding eliminates cross-slot fd
reuse entirely. Pool stays driver-wide (multi-context safe per
iter5 Track E); slots cycle through 16 distinct fds in round-robin
acquire.

Files:
- request_pool.h: add request_fd field to slot struct; init
  signature takes media_fd
- request_pool.c: alloc per-slot fd at init, close at destroy
- context.c: pass driver_data->media_fd; pool size 4 -> 16
- picture.c: BeginPicture binds slot->request_fd to surface;
  EndPicture's per-frame media_request_alloc removed
- surface.c: RequestSyncSurface uses media_request_reinit instead
  of close+alloc; DestroySurfaces close removed (slot owns fd);
  error path close removed; surface_object NULL-init for the
  -Wmaybe-uninitialized warning fix

Empirical verification (clean build sha ebe396d5..., no diagnostic
instrumentation):
- Firefox 150 + bbb_1080p30_h264.mp4 + LIBVA_DRIVER_NAME=v4l2_request
  + sandbox enabled: 35s+ playback, zero "Unable to queue buffer"
  / "Unable to set control(s)", lsof shows RDD process holds
  /dev/video1 + /dev/media0 throughout. Driver stderr: only the
  single cap_pool_init: 24 slots ready line.
- mpv vaapi-copy 50 frames: zero errors, "Using hardware decoding
  (vaapi-copy)" - no regression vs iter5-end driver.

Pool-size bump diagnostic (Phase 5 sonnet design review feedback):
4 -> 16 alone took 1->62 frames, far short of the 30s success
criterion (~900 frames at 30fps). REINIT discipline is the actual
fix; pool 16 is comfortable headroom over typical H.264 MaxDpbFrames.

Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment
attribution corrected: cleanup runs at RequestTerminate, not
RequestDestroyContext, since the pool is driver-wide).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-05 21:30:39 +00:00
parent c8b6edec3d
commit a09c03c154
5 changed files with 92 additions and 26 deletions
+7 -1
View File
@@ -37,6 +37,12 @@ struct request_pool_slot {
void *data; /* mmap pointer for this slot */
unsigned int size; /* mmap size in bytes */
bool busy; /* true while borrowed for a request */
int request_fd; /* per-slot media-request fd, allocated
* once at pool init, REINIT'd between
* uses. iter6: replaces iter4 close+
* alloc-per-frame to eliminate cross-
* slot fd-reuse race that broke Firefox
* MediaSource's multi-surface decode. */
};
struct request_pool {
@@ -52,7 +58,7 @@ struct request_pool {
* VIDIOC_S_FMT on the OUTPUT queue. Returns 0 on success, -1 on
* failure.
*/
int request_pool_init(struct request_pool *pool, int video_fd,
int request_pool_init(struct request_pool *pool, int video_fd, int media_fd,
unsigned int output_type, unsigned int count);
/*