From a202de1646d4c8f8ee2ebc2e4c100b621975754a Mon Sep 17 00:00:00 2001 In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de> References: <20260429195306.239666-1-mfritsche@reauktion.de> From: Markus Fritsche Date: Sat, 9 May 2026 16:16:07 +0200 Subject: [PATCH RFC v2] media: videobuf2: add opt-in dma_resv producer fence helper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit V4L2 producers historically don't propagate buffer-state-done into the dmabuf's dma_resv exclusive fence. Userspace consumers that import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE, EGL_LINUX_DMA_BUF_EXT) currently see either zero fences or a stub fence from dma_fence_get_stub(). This is correct by accident for the common DQBUF-then-import case but represents a contract gap that breaks Wayland compositors importing CAPTURE buffers from a stateless H.264 decoder under continuous playback on implicit-sync GPU stacks (observed on RK3566 + hantro VPU + Mali-G52 panfrost; manifests as green frames -- BT.709 limited-range YUV(0,0,0) -> RGB(0,77,0) -- when the GPU samples the dmabuf before the producer's decode completes). Add an opt-in API gated by both a per-driver runtime flag (vb2_queue::supports_release_fences) and a Kconfig (CONFIG_VIDEOBUF2_RELEASE_FENCES, default n) that lets producers populate a real dma_resv exclusive write fence on the dmabufs they export. Drivers call vb2_buffer_attach_release_fence(vb) at a finite-time-fenced point in their pipeline (typically m2m device_run, just before the HW kick); vb2_buffer_done() signals and puts the fence as part of its state transition. The publish and signal paths are wrapped in dma_fence_begin_signalling() / dma_fence_end_signalling() so PROVE_LOCKING can validate that nothing taken in those critical sections deadlocks against the signal path. dma_resv_lock is sleepable but not taken on the signal path, so taking it inside the publish critical section is safe under lockdep. Skips planes whose vb2_plane.dbuf is NULL -- buffers never exported via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on. Drivers that don't opt in pay nothing: the helper is a no-op stub when CONFIG_VIDEOBUF2_RELEASE_FENCES=n, and an early-return check of supports_release_fences when =y but the flag is unset. Validated on RK3566 PineTab2 with PROVE_LOCKING enabled: 30s of bbb_1080p30 H.264 stateless decode + zero-copy panfrost EGL import via dmabuf-wayland (mpv 0.41 + KWin 6.6.4 + Mesa panfrost 26.0.5) produces 31,816 dma_fence init/signal pairs across 5,724 vb2 buffer cycles with zero lockdep splats from videobuf2 / dma_resv code paths. Subsequent patches in this series opt the hantro and rockchip-rga drivers in. Cc: Daniel Vetter Cc: Christian König Cc: Nicolas Dufresne Cc: Sumit Semwal Cc: Hans Verkuil Cc: Tomasz Figa Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Markus Fritsche --- drivers/media/common/videobuf2/Kconfig | 29 ++++ .../media/common/videobuf2/videobuf2-core.c | 135 ++++++++++++++++++ include/media/videobuf2-core.h | 51 +++++++ 3 files changed, 215 insertions(+) diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig index d2223a12c..bbfa26984 100644 --- a/drivers/media/common/videobuf2/Kconfig +++ b/drivers/media/common/videobuf2/Kconfig @@ -30,3 +30,32 @@ config VIDEOBUF2_DMA_SG config VIDEOBUF2_DVB tristate select VIDEOBUF2_CORE + +config VIDEOBUF2_RELEASE_FENCES + bool "videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports" + depends on VIDEOBUF2_CORE + depends on DMA_SHARED_BUFFER + default n + help + Enables an opt-in API that lets vb2 producers populate a dma_resv + exclusive write fence on the dmabufs they export to userspace. + The fence is signalled when the buffer transitions to + VB2_BUF_STATE_DONE. + + This gives userspace consumers that import V4L2-produced dmabufs + and wait on the dmabuf's implicit-sync fence (poll(POLLIN), + DMA_BUF_IOCTL_EXPORT_SYNC_FILE, EGL_LINUX_DMA_BUF_EXT) a real + producer fence to wait on, instead of a stub fence from + dma_fence_get_stub() that the dma_buf core substitutes when + dma_resv is empty. + + Drivers individually opt in by setting + vb2_queue::supports_release_fences = true and calling + vb2_buffer_attach_release_fence() at the right point in their + pipeline (typically m2m device_run, just before HW kick). + + Distributors leave this off unless targeting Wayland/EGL + consumers of V4L2 stateless decoder output on + implicit-sync-only GPU stacks (e.g. mainline panfrost). + + If unsure, say N. diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c index adf668b21..85d7fddbd 100644 --- a/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c @@ -26,6 +26,12 @@ #include #include +#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES +#include +#include +#include +#endif + #include #include @@ -1173,6 +1179,120 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no) } EXPORT_SYMBOL_GPL(vb2_plane_cookie); +#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES +/* + * dma_resv release-fence integration. + * + * Optional, opt-in path that lets producers publish a real + * dma_fence on their CAPTURE-side dmabufs so userspace consumers + * (compositors, EGL importers) get spec-clean implicit-sync + * semantics instead of the dma_buf core's stub fence. Drivers + * call vb2_buffer_attach_release_fence() at a finite-time-fenced + * point (typically m2m device_run) and the fence is signalled by + * vb2_buffer_done(). Gated at runtime by + * vb2_queue::supports_release_fences and at compile time by + * CONFIG_VIDEOBUF2_RELEASE_FENCES. + */ + +static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence) +{ + return "videobuf2"; +} + +static const char *vb2_dma_resv_get_timeline_name(struct dma_fence *fence) +{ + return "vb2-release-fence"; +} + +static const struct dma_fence_ops vb2_dma_resv_fence_ops = { + .get_driver_name = vb2_dma_resv_get_driver_name, + .get_timeline_name = vb2_dma_resv_get_timeline_name, +}; + +int vb2_buffer_attach_release_fence(struct vb2_buffer *vb) +{ + struct vb2_queue *q = vb->vb2_queue; + struct dma_fence *fence; + unsigned int plane; + bool cookie; + + if (!q->supports_release_fences) + return 0; + + if (WARN_ON(vb->release_fence)) + return -EINVAL; + + fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) + return -ENOMEM; + + dma_fence_init(fence, &vb2_dma_resv_fence_ops, &q->dma_resv_fence_lock, + q->dma_resv_fence_context, + atomic64_inc_return(&q->dma_resv_fence_seqno)); + + /* + * Annotate the publish-side critical section. Per + * Documentation/driver-api/dma-buf.rst, lockdep validates + * that nothing taken in this region can deadlock against + * the signal path in vb2_buffer_signal_release_fence(). + * dma_resv_lock is sleepable but is not taken on the signal + * path, so taking it inside the critical section is safe. + */ + cookie = dma_fence_begin_signalling(); + for (plane = 0; plane < vb->num_planes; plane++) { + struct dma_buf *dbuf = vb->planes[plane].dbuf; + + if (!dbuf) + continue; + + dma_resv_lock(dbuf->resv, NULL); + dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE); + dma_resv_unlock(dbuf->resv); + } + dma_fence_end_signalling(cookie); + + /* One reference for the eventual signal in vb2_buffer_done. */ + vb->release_fence = dma_fence_get(fence); + + /* The dma_resv held its own reference per plane. Drop ours. */ + dma_fence_put(fence); + + return 0; +} +EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence); + +static void vb2_buffer_signal_release_fence(struct vb2_buffer *vb, + enum vb2_buffer_state state) +{ + struct dma_fence *fence = vb->release_fence; + bool cookie; + + if (!fence) + return; + + cookie = dma_fence_begin_signalling(); + if (state == VB2_BUF_STATE_ERROR) + dma_fence_set_error(fence, -EIO); + dma_fence_signal(fence); + dma_fence_end_signalling(cookie); + + dma_fence_put(fence); + vb->release_fence = NULL; +} +#else /* !CONFIG_VIDEOBUF2_RELEASE_FENCES */ + +int vb2_buffer_attach_release_fence(struct vb2_buffer *vb) +{ + return 0; +} +EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence); + +static inline void vb2_buffer_signal_release_fence(struct vb2_buffer *vb, + enum vb2_buffer_state state) +{ +} +#endif /* CONFIG_VIDEOBUF2_RELEASE_FENCES */ + void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) { struct vb2_queue *q = vb->vb2_queue; @@ -1199,6 +1319,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) if (state != VB2_BUF_STATE_QUEUED) __vb2_buf_mem_finish(vb); + if (state != VB2_BUF_STATE_QUEUED) + vb2_buffer_signal_release_fence(vb, state); + spin_lock_irqsave(&q->done_lock, flags); if (state == VB2_BUF_STATE_QUEUED) { vb->state = VB2_BUF_STATE_QUEUED; @@ -2651,6 +2774,18 @@ int vb2_core_queue_init(struct vb2_queue *q) mutex_init(&q->mmap_lock); init_waitqueue_head(&q->done_wq); +#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES + /* + * Per-queue dma_resv release-fence context. Drivers that + * opt in via supports_release_fences and call + * vb2_buffer_attach_release_fence() use these to allocate + * fences on a single per-queue timeline. + */ + q->dma_resv_fence_context = dma_fence_context_alloc(1); + atomic64_set(&q->dma_resv_fence_seqno, 0); + spin_lock_init(&q->dma_resv_fence_lock); +#endif + q->memory = VB2_MEMORY_UNKNOWN; if (q->buf_struct_size == 0) diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index 4424d481d..766ff2194 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -288,6 +288,16 @@ struct vb2_buffer { unsigned int skip_cache_sync_on_finish:1; struct vb2_plane planes[VB2_MAX_PLANES]; +#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES + /* + * Producer release fence published on each plane's + * dmabuf->resv when the driver opts in via + * vb2_buffer_attach_release_fence(). Signalled and put by + * vb2_buffer_done() on transition to DONE/ERROR. NULL when + * the driver did not opt in for this buffer. + */ + struct dma_fence *release_fence; +#endif struct list_head queued_entry; struct list_head done_entry; #ifdef CONFIG_VIDEO_ADV_DEBUG @@ -648,6 +658,19 @@ struct vb2_queue { spinlock_t done_lock; wait_queue_head_t done_wq; +#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES + /* + * dma_resv release-fence context. Drivers that set + * supports_release_fences and call + * vb2_buffer_attach_release_fence() use these to allocate + * fences on a per-queue timeline. + */ + u64 dma_resv_fence_context; + atomic64_t dma_resv_fence_seqno; + spinlock_t dma_resv_fence_lock; +#endif + + unsigned int supports_release_fences:1; unsigned int streaming:1; unsigned int start_streaming_called:1; unsigned int error:1; @@ -735,6 +758,34 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no); */ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); +/** + * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence. + * @vb: the buffer being committed to the producer. + * + * Drivers that have set vb2_queue::supports_release_fences may call + * this from any sleepable context where they have committed to + * running the operation in finite time -- typically m2m + * device_run(), just before the HW kick. The helper allocates a + * dma_fence on the queue's per-queue timeline, attaches it as + * DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes + * it in vb->release_fence. vb2_buffer_done() signals and puts the + * fence as part of the buffer's state transition. + * + * Skips planes whose vb2_plane.dbuf is NULL -- buffers never + * exported via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) + * have no dmabuf for userspace to wait on. + * + * No-op when vb2_queue::supports_release_fences is not set + * (regardless of CONFIG_VIDEOBUF2_RELEASE_FENCES). When + * CONFIG_VIDEOBUF2_RELEASE_FENCES=n, this is a stub that returns 0. + * + * Returns 0 on success or when the no-op stub is in effect, + * negative errno on allocation failure when fence publishing was + * attempted. Best-effort: drivers should ignore the return value + * unless they want diagnostics. + */ +int vb2_buffer_attach_release_fence(struct vb2_buffer *vb); + /** * vb2_discard_done() - discard all buffers marked as DONE. * @q: pointer to &struct vb2_queue with videobuf2 queue. -- 2.53.0