iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling

Before iter2, each VA surface was permanently bound 1:1 to one V4L2
CAPTURE buffer. When mpv reused a surface for a new decode while the
compositor still held an EXPBUF'd dma_buf fd to the prior frame, the
kernel wrote fresh decode output into the same physical memory the
compositor was reading. This was visible as stutter / back-and-forth
frame swapping on mpv --hwdec=vaapi --vo=gpu playback.

Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
  (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
  {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
  oldest FREE slot (LRU), binds it for the decode cycle, and the slot
  cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- The slot is released on the next BeginPicture for the same surface or
  on vaDestroySurfaces.
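
The slot lifecycle above can be sketched roughly as follows. This is a
minimal illustration, not the contents of cap_pool.{h,c}: the names
struct pool, struct slot, pool_acquire, pool_mark, pool_release, the
freed_seq counter, and POOL_SLOTS are all hypothetical, and the real
pool size is max(surfaces_count, MIN_CAP_POOL) rather than a constant.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 4 /* illustrative only; the commit sizes the pool at runtime */

enum slot_state { SLOT_FREE, SLOT_IN_DECODE, SLOT_DECODED, SLOT_EXPORTED };

struct slot {
	enum slot_state state;
	unsigned int v4l2_index; /* CAPTURE buffer index this slot wraps */
	uint64_t freed_seq;      /* value of pool->seq when last released */
};

struct pool {
	pthread_mutex_t lock;
	uint64_t seq;            /* incremented on every release */
	struct slot slots[POOL_SLOTS];
};

void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->seq = 0;
	for (unsigned int i = 0; i < POOL_SLOTS; i++) {
		p->slots[i].state = SLOT_FREE;
		p->slots[i].v4l2_index = i;
		p->slots[i].freed_seq = 0;
	}
}

/* Return the FREE slot that has been idle longest, or NULL if none is FREE. */
struct slot *pool_acquire(struct pool *p)
{
	struct slot *best = NULL;

	pthread_mutex_lock(&p->lock);
	for (unsigned int i = 0; i < POOL_SLOTS; i++) {
		struct slot *s = &p->slots[i];

		if (s->state != SLOT_FREE)
			continue;
		if (best == NULL || s->freed_seq < best->freed_seq)
			best = s;
	}
	if (best != NULL)
		best->state = SLOT_IN_DECODE;
	pthread_mutex_unlock(&p->lock);
	return best;
}

/* Advance a bound slot: IN_DECODE -> DECODED -> EXPORTED. */
void pool_mark(struct pool *p, struct slot *s, enum slot_state next)
{
	pthread_mutex_lock(&p->lock);
	assert(next > s->state); /* only forward transitions are allowed here */
	s->state = next;
	pthread_mutex_unlock(&p->lock);
}

/* Return a slot to the pool and stamp it so LRU picks it last. */
void pool_release(struct pool *p, struct slot *s)
{
	pthread_mutex_lock(&p->lock);
	s->state = SLOT_FREE;
	s->freed_seq = ++p->seq;
	pthread_mutex_unlock(&p->lock);
}
```

Because a freshly released slot gets the highest freed_seq, it is the
last FREE slot to be handed out again, which is what gives a consumer
still reading the old frame the longest possible grace period.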

Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- This is the Option-A statistical mitigation, not a full fix: the race
  window narrows to the case "pool exhausted, force-recycle of oldest
  EXPORTED slot." For typical mpv 16-surface playback with
  MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
  cap_pools -- iter3 scope).
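
The force-recycle fallback named above might look roughly like this.
It is a sketch under assumptions: acquire_or_recycle, the stamp field,
and export_fd are hypothetical names, locking is omitted for brevity,
and the real policy in cap_pool.c may differ. Closing our own EXPBUF fd
does not revoke a reference another process already holds on the
dma_buf, which is why this path remains a race rather than a fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

enum slot_state { SLOT_FREE, SLOT_IN_DECODE, SLOT_DECODED, SLOT_EXPORTED };

struct slot {
	enum slot_state state;
	uint64_t stamp; /* tick of the last state change; lower means older */
	int export_fd;  /* our retained EXPBUF fd, or -1 if none */
};

/*
 * Prefer the oldest FREE slot. If the pool has no FREE slot, fall back
 * to the oldest EXPORTED slot, close our fd for it, and reuse it.
 * *forced is set to 1 only when the fallback was taken.
 */
struct slot *acquire_or_recycle(struct slot *slots, size_t n, int *forced)
{
	struct slot *free_best = NULL;
	struct slot *exp_best = NULL;
	struct slot *pick;

	for (size_t i = 0; i < n; i++) {
		struct slot *s = &slots[i];

		if (s->state == SLOT_FREE) {
			if (free_best == NULL || s->stamp < free_best->stamp)
				free_best = s;
		} else if (s->state == SLOT_EXPORTED) {
			if (exp_best == NULL || s->stamp < exp_best->stamp)
				exp_best = s;
		}
	}

	*forced = 0;
	pick = free_best;
	if (pick == NULL && exp_best != NULL) {
		pick = exp_best;
		*forced = 1;
		if (pick->export_fd >= 0) {
			close(pick->export_fd);
			pick->export_fd = -1;
		}
	}
	if (pick != NULL)
		pick->state = SLOT_IN_DECODE;
	return pick;
}
```

Slots that are IN_DECODE or DECODED are never candidates in this
sketch, so the function returns NULL when every slot is mid-cycle.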

Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
  EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
  cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
  path before REQBUFS(0) (otherwise slots would hold a stale v4l2_index
  after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
  NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).

Verified: mpv vaapi-copy, 200 frames of bbb_1080p30, 0 drops, LRU
visibly recycling slot indices, real luma gradient. The mpv vaapi
--vo=gpu path is not yet verified; operator inspection follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:03:31 +00:00
parent e64bb0852d
commit 19acc76da4
9 changed files with 695 additions and 66 deletions
+35
@@ -32,6 +32,9 @@
#include <va/va_backend.h>
#include "object_heap.h"
#include "cap_pool.h"
struct request_data;
#define SURFACE(data, id) \
	((struct object_surface *)object_heap_lookup(&(data)->surface_heap, id))
@@ -48,6 +51,26 @@ struct object_surface {
	void *source_data;
	unsigned int source_size;
	/*
	 * Iter2 Fix 3: destination_* fields below are now per-decode-cycle.
	 * They are populated from current_slot in RequestBeginPicture and
	 * remain valid through SyncSurface, ExportSurfaceHandle, and
	 * DeriveImage/copy_surface_to_image (vaapi-copy path). Subsequent
	 * BeginPicture for this surface releases the prior slot and
	 * acquires a new one.
	 *
	 * destination_planes_count, destination_sizes, destination_offsets,
	 * destination_bytesperlines are FORMAT-uniform across all CAPTURE
	 * buffers, so they're set once at CreateSurfaces2 time and stay.
	 *
	 * destination_index, destination_map[], destination_map_lengths,
	 * destination_map_offsets, destination_data[] are SLOT-specific
	 * and re-populated each BeginPicture from current_slot.
	 *
	 * destination_buffers_count is also format-uniform (V4L2 planes
	 * per buffer = 1 for single-plane MPLANE NV12).
	 */
	struct cap_pool_slot *current_slot; /* iter2 Fix 3 */
	unsigned int destination_index;
	void *destination_map[VIDEO_MAX_PLANES];
	unsigned int destination_map_lengths[VIDEO_MAX_PLANES];
@@ -146,4 +169,16 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,
*/
void surface_reset_format_cache(void);
/*
 * Iter2 Fix 3: bind / unbind a CAPTURE-pool slot to an object_surface.
 * Called from picture.c::RequestBeginPicture (acquire+bind) and
 * surface.c::RequestDestroySurfaces (unbind). Mirrors slot's V4L2 index
 * and mmap pointers into surface_object->destination_* so existing
 * QBUF/DQBUF/EXPBUF code paths see no behavioral change.
 */
void surface_bind_slot(struct object_surface *surface_object,
		       struct cap_pool_slot *slot);
void surface_unbind_slot(struct request_data *driver_data,
			 struct object_surface *surface_object);
#endif
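
The "mirroring" that the bind / unbind comment describes could be
sketched as below. The layout of struct cap_pool_slot is not shown in
this diff, so sketch_slot, sketch_surface, their fields, MAX_PLANES
(a stand-in for VIDEO_MAX_PLANES), and both function names are
assumptions for illustration. The real surface_unbind_slot also takes
driver_data, presumably to hand the slot back to the pool; that step
is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PLANES 8 /* hypothetical stand-in for VIDEO_MAX_PLANES */

/* Hypothetical slot layout; the real struct cap_pool_slot may differ. */
struct sketch_slot {
	unsigned int v4l2_index;
	void *map[MAX_PLANES];
	unsigned int map_lengths[MAX_PLANES];
};

/* Reduced stand-in for the slot-specific part of struct object_surface. */
struct sketch_surface {
	struct sketch_slot *current_slot;
	unsigned int destination_index;
	void *destination_map[MAX_PLANES];
	unsigned int destination_map_lengths[MAX_PLANES];
};

/* Copy the slot's V4L2 index and mmap pointers into the surface. */
void sketch_bind_slot(struct sketch_surface *surface, struct sketch_slot *slot)
{
	assert(surface != NULL && slot != NULL);
	surface->current_slot = slot;
	surface->destination_index = slot->v4l2_index;
	memcpy(surface->destination_map, slot->map,
	       sizeof(surface->destination_map));
	memcpy(surface->destination_map_lengths, slot->map_lengths,
	       sizeof(surface->destination_map_lengths));
}

/* Drop the binding so stale pointers cannot be used after release. */
void sketch_unbind_slot(struct sketch_surface *surface)
{
	assert(surface != NULL);
	surface->current_slot = NULL;
	memset(surface->destination_map, 0, sizeof(surface->destination_map));
	memset(surface->destination_map_lengths, 0,
	       sizeof(surface->destination_map_lengths));
}
```

Clearing the mirrored pointers on unbind pairs naturally with a
current_slot == NULL check, such as the one the commit message
describes for image.c::DeriveImage on undecoded surfaces.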