diff --git a/kernel/vb2-dma-resv-rfc/0000-cover-letter.patch b/kernel/vb2-dma-resv-rfc/0000-cover-letter.patch index ee7ef5dddc..2a2dcb59e1 100644 --- a/kernel/vb2-dma-resv-rfc/0000-cover-letter.patch +++ b/kernel/vb2-dma-resv-rfc/0000-cover-letter.patch @@ -1,11 +1,15 @@ +From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001 From: Markus Fritsche +Date: Tue, 28 Apr 2026 19:23:57 +0000 Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports -Date: 2026-04-28 +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit Hi, This series proposes a small opt-in API in videobuf2-core that lets V4L2 -drivers populate a `dma_resv` exclusive write fence on the dmabufs they +drivers populate a dma_resv exclusive write fence on the dmabufs they export to userspace, signalled when the buffer transitions to VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in to demonstrate the call shape; the change is no-op for every other @@ -18,13 +22,13 @@ import V4L2-produced dmabufs and want to do implicit synchronization the spec-clean way (poll(POLLIN) on the dmabuf fd, or DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either: -1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's - `dma_resv` has no fences populated. The kernel substitutes - `dma_fence_get_stub()` which is permanently signalled. The compositor +1. A stub fence from dma_buf_export_sync_file(), because the dmabuf's + dma_resv has no fences populated. The kernel substitutes + dma_fence_get_stub() which is permanently signalled. The compositor "successfully" waits on a fence that represents nothing real about the producer's state. 2. A poll(POLLIN) on the dmabuf fd that returns immediately for the - same reason — `dma_buf_poll_add_cb` finds zero fences in the resv, + same reason — dma_buf_poll_add_cb finds zero fences in the resv, triggers the wake callback inline, and reports POLLIN ready before the producer has actually said anything. @@ -38,52 +42,48 @@ But: - It's a contract gap. The kernel claims to expose implicit sync; it does not, for V4L2 producers. +- It paid latency for nothing. Every Wayland frame from a V4L2 + producer pays a DMA_BUF_IOCTL_EXPORT_SYNC_FILE round-trip for a + fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland + chrome video playback), this contributed to compositor stalls. + Removing the wait at the compositor level is a workaround, not a + fix. - It blocks downstream consumers from doing the right thing. A Wayland compositor that defensively waits on a sync_file gets a stub-fence pass-through with no actual gating; if the V4L2 driver ever has an out-of-band path that releases the buffer before - finishing the write (e.g. a reconfig-resize that DQBUFs everything), - there's no fence to gate on. -- It paid latency for nothing. Every Wayland frame from a V4L2 - producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a - fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland - chrome video playback), this is a measurable per-frame cost - contributing to compositor stalls. Removing the wait at the - compositor level (KWin) is a workaround, not a fix. - -The right thing for the kernel to do is populate a real fence. This -series adds the minimal API and demonstrates the per-driver hookup -pattern. + finishing the write, there is no fence to gate on. What ---- Patch 1 adds: -- `struct dma_fence *release_fence` to `struct vb2_buffer` -- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to - `struct vb2_queue` -- `vb2_buffer_attach_release_fence(vb)` — drivers call this from - their `buf_queue` callback. Allocates a `dma_fence` on the queue's - fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's +- struct dma_fence *release_fence to struct vb2_buffer +- u64 dma_resv_fence_context + atomic64_t dma_resv_fence_seqno + + spinlock_t dma_resv_fence_lock to struct vb2_queue +- vb2_buffer_attach_release_fence(vb) — drivers call this from their + buf_queue callback. Allocates a dma_fence on the queue's fence + context, attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv. No-op for buffers without exported dmabufs. -- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)` - + `dma_fence_put` if the fence was attached, so the producer's - completion signal lands in the resv synchronously with the userspace - DQBUF wakeup. +- vb2_buffer_done() extended to signal+put the fence if attached, + so the producer's completion signal lands in the resv synchronously + with the userspace DQBUF wakeup. -Patches 2 and 3 add a single call to the helper from `hantro_buf_queue` -and `rga_buf_queue` respectively. ~5 lines each. +Patches 2 and 3 add a single call to the helper from hantro_buf_queue +and rga_buf_queue respectively. Both are demonstration drivers; other +vb2 drivers can opt in incrementally with the same one-line change. Tested on --------- PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4 -Wayland. The test harness is the chromium-fourier patch series -(https://github.com/marfrit/fourier) — chromium plus a KWin patch that -*previously bypassed* `Transaction::watchDmaBuf` because the kernel- -side fence was stub-signalled. With this series applied, the bypass -becomes unnecessary; KWin's fence wait completes correctly because the -fence now signals when hantro completes the capture buffer write. +Wayland. The test harness is the chromium-fourier patch series at +https://github.com/marfrit/fourier — chromium plus a KWin patch +that *previously bypassed* Transaction::watchDmaBuf because the +kernel-side fence was stub-signalled. With this series applied, the +bypass becomes unnecessary; KWin's fence wait completes correctly +because the fence now signals when hantro completes the capture +buffer write. End-to-end result before the kernel patch (chromium + Qt 6 patches + KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined @@ -100,8 +100,8 @@ What's missing in this RFC - Other vb2-using drivers don't opt in. Each maintainer should look at their driver and decide. The hantro + rga patches show the shape; copying it to other drivers should be straightforward. -- For drivers that have intermediate image-processor stages - (e.g. CSI → ISP → user), the fence semantics across stage boundaries +- For drivers that have intermediate image-processor stages (e.g. + CSI -> ISP -> user), the fence semantics across stage boundaries are out of scope here. This series only addresses the producer-to- userspace edge. - No selftest. videobuf2 doesn't have a great in-tree selftest harness @@ -114,14 +114,28 @@ Reviews especially welcome on: vb2-CAPTURE queues. Auto-on would force every driver to be audited; opt-in is incremental and safer but leaves the contract gap for drivers nobody touches. -- Whether `vb2_buffer_done` is the right place to signal vs. an - earlier hook (e.g. immediately after DMA-from-device finishes). For - hantro the two are effectively the same; for drivers with - asynchronous post-processing they may differ. -- The choice of `DMA_RESV_USAGE_WRITE` vs the older - `dma_resv_set_excl_fence` semantics. We're emitting the producer's - write completion, so WRITE matches dma-buf documentation, but I'd - appreciate a sanity check. +- Whether vb2_buffer_done is the right place to signal vs. an earlier + hook (e.g. immediately after DMA-from-device finishes). For hantro + the two are effectively the same; for drivers with asynchronous + post-processing they may differ. +- The choice of DMA_RESV_USAGE_WRITE — we are emitting the producer's + write completion, so WRITE matches dma-buf documentation, but a + sanity check is welcome. Cheers, Markus + +Markus Fritsche (3): + media: videobuf2: add dma_resv release-fence helper + media: hantro: attach dma_resv release fence at buf_queue + media: rockchip-rga: attach dma_resv release fence at buf_queue + + .../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++ + drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++ + .../media/platform/verisilicon/hantro_v4l2.c | 12 +++ + include/media/videobuf2-core.h | 29 ++++++ + 4 files changed, 146 insertions(+) + +-- +2.47.3 + diff --git a/kernel/vb2-dma-resv-rfc/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch b/kernel/vb2-dma-resv-rfc/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch index 5d7cda9e6c..881f0cb093 100644 --- a/kernel/vb2-dma-resv-rfc/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch +++ b/kernel/vb2-dma-resv-rfc/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch @@ -1,73 +1,71 @@ +From 1f7a526331061ad767b2eb8401b0d28984888ae6 Mon Sep 17 00:00:00 2001 From: Markus Fritsche -Subject: [PATCH RFC 1/3] media: videobuf2: add dma_resv release-fence helper -Date: 2026-04-28 +Date: Tue, 28 Apr 2026 19:23:50 +0000 +Subject: [PATCH 1/3] media: videobuf2: add dma_resv release-fence helper +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit -Add an opt-in API that lets vb2 producers populate a `dma_resv` +Add an opt-in API that lets vb2 producers populate a dma_resv exclusive write fence on the dmabufs they export to userspace, signalled when the buffer transitions to VB2_BUF_STATE_DONE. -Drivers that opt in call `vb2_buffer_attach_release_fence(vb)` from -their `buf_queue` callback after `v4l2_m2m_buf_queue` (or equivalent). -The helper: +V4L2 producers historically don't propagate buffer-state-done into +the dmabuf's dma_resv exclusive fence. Userspace consumers that +import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync +fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see +either zero fences or a stub fence from dma_fence_get_stub(). This +is correct by accident for the common case (clients call DQBUF +before importing) but represents a contract gap. - - allocates a dma_fence on the queue's fence context (set up at - vb2_core_queue_init time), - - attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, - - stashes the fence in `vb->release_fence`. +Drivers opt in by calling vb2_buffer_attach_release_fence(vb) from +their buf_queue callback. The helper allocates a dma_fence on the +queue's fence context (set up at vb2_core_queue_init), attaches it +as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes +it in vb->release_fence. vb2_buffer_done signals + puts the fence +as part of its state transition. -`vb2_buffer_done` then signals and puts the fence as part of its -existing buffer-state transition, so the userspace consumer that -imported the dmabuf and is poll(POLLIN)-ing it (or waiting on a -sync_file from `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) sees the fence -become readable synchronously with the DQBUF wakeup. +For drivers that don't opt in, vb->release_fence stays NULL and +the signal path is a no-op. -For drivers that don't opt in, the new field stays NULL and -`vb2_buffer_done` skips the signal path. No-op for every driver -that doesn't call the new helper. - -Skips planes whose `vb2_plane.dbuf` is NULL — buffers that have -never been exported via VIDIOC_EXPBUF (or imported via -V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on. +Skips planes whose vb2_plane.dbuf is NULL — buffers never exported +via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no +dmabuf for userspace to wait on. Signed-off-by: Markus Fritsche --- - drivers/media/common/videobuf2/videobuf2-core.c | 116 ++++++++++++++++ - include/media/videobuf2-core.h | 19 +++ - 2 files changed, 135 insertions(+) + .../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++ + include/media/videobuf2-core.h | 29 ++++++ + 2 files changed, 124 insertions(+) diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c +index b0523fc23..ee766aae0 100644 --- a/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c -@@ -22,6 +22,9 @@ +@@ -26,6 +26,9 @@ #include #include - #include -+#include + +#include +#include - ++#include #include #include -@@ -1175,6 +1178,107 @@ static void __enqueue_in_driver(struct vb2_buffer *vb) - call_void_vb_qop(vb, buf_queue, vb); + +@@ -1179,6 +1182,86 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no) } - + EXPORT_SYMBOL_GPL(vb2_plane_cookie); + +/* + * dma_resv release-fence integration. + * -+ * Background: V4L2 producers (vb2-using drivers) historically did not -+ * propagate buffer-state-done into the dmabuf's dma_resv exclusive -+ * fence. Userspace consumers that imported V4L2-produced dmabufs and -+ * tried to do implicit synchronization the spec-clean way -+ * (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) got either zero -+ * fences or a stub fence from dma_fence_get_stub(). This is correct -+ * by accident for the common case (clients call DQBUF before -+ * importing) but represents a contract gap. -+ * -+ * The opt-in API below lets a driver attach a real fence at QBUF -+ * time and have it signalled at vb2_buffer_done. Drivers opt in by -+ * calling vb2_buffer_attach_release_fence(vb) from their buf_queue -+ * callback. No behaviour change for drivers that don't opt in. ++ * V4L2 producers historically don't propagate buffer-state-done into ++ * the dmabuf's dma_resv exclusive fence. Userspace consumers that ++ * wait on that fence (e.g. wayland compositors via poll(POLLIN) or ++ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see either no fences or ++ * a stub fence from dma_fence_get_stub(). The opt-in API below lets ++ * a driver attach a real producer fence at QBUF time and have it ++ * signalled by vb2_buffer_done(). + */ + +static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence) @@ -85,21 +83,6 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com + .get_timeline_name = vb2_dma_resv_get_timeline_name, +}; + -+/** -+ * vb2_buffer_attach_release_fence() - attach a dma_resv exclusive fence -+ * to each of @vb's plane dmabufs, to be signalled when the buffer -+ * transitions to VB2_BUF_STATE_DONE. -+ * -+ * @vb: the buffer being queued to the producer (just-completed -+ * transition out of VB2_BUF_STATE_QUEUED into DRIVER-owned). -+ * -+ * Drivers should call this from their buf_queue callback (after the -+ * driver-internal queueing — e.g. after v4l2_m2m_buf_queue() for -+ * M2M drivers). Planes whose dbuf is NULL are skipped silently. -+ * -+ * Returns 0 on success, negative errno on allocation failure. On -+ * error, no fence is attached and vb->release_fence remains NULL. -+ */ +int vb2_buffer_attach_release_fence(struct vb2_buffer *vb) +{ + struct vb2_queue *q = vb->vb2_queue; @@ -128,10 +111,10 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com + dma_resv_unlock(dbuf->resv); + } + -+ /* Hold one reference for the eventual signal in vb2_buffer_done. */ ++ /* One reference for the eventual signal in vb2_buffer_done. */ + vb->release_fence = dma_fence_get(fence); + -+ /* The dma_resv held its own references for each plane. Drop ours. */ ++ /* The dma_resv held its own reference per plane. Drop ours. */ + dma_fence_put(fence); + + return 0; @@ -153,67 +136,61 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com + vb->release_fence = NULL; +} + - static int __enqueue_in_driver_with_request(struct vb2_buffer *vb) + void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) { - if (vb->req_obj.req) { -@@ -1182,12 +1286,15 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) - dprintk(q, 4, "done processing on buffer %d, state: %s\n", - vb->index, vb2_state_name(state)); - + struct vb2_queue *q = vb->vb2_queue; +@@ -1205,6 +1288,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) if (state != VB2_BUF_STATE_QUEUED) __vb2_buf_mem_finish(vb); - + + if (state != VB2_BUF_STATE_QUEUED) + vb2_buffer_signal_release_fence(vb, state); + spin_lock_irqsave(&q->done_lock, flags); if (state == VB2_BUF_STATE_QUEUED) { vb->state = VB2_BUF_STATE_QUEUED; - } else { -@@ -2598,6 +2705,15 @@ int vb2_core_queue_init(struct vb2_queue *q) +@@ -2652,6 +2738,15 @@ int vb2_core_queue_init(struct vb2_queue *q) mutex_init(&q->mmap_lock); init_waitqueue_head(&q->done_wq); - + + /* -+ * Per-queue dma_resv fence context. Drivers that opt into -+ * vb2_buffer_attach_release_fence() use these to allocate -+ * fences in their own timeline; drivers that don't opt in -+ * pay only the four-byte cost of an unused field. ++ * Per-queue dma_resv release-fence context. Drivers opt-in via ++ * vb2_buffer_attach_release_fence(); other drivers pay only the ++ * cost of the unused fields. + */ + q->dma_resv_fence_context = dma_fence_context_alloc(1); + atomic64_set(&q->dma_resv_fence_seqno, 0); + spin_lock_init(&q->dma_resv_fence_lock); + q->memory = VB2_MEMORY_UNKNOWN; - + if (q->buf_struct_size == 0) diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h +index 9b02aeba4..2bf3272d4 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h -@@ -19,6 +19,7 @@ - #include - #include - #include - #include -+struct dma_fence; - -@@ -286,6 +287,12 @@ struct vb2_buffer { +@@ -288,6 +288,12 @@ struct vb2_buffer { unsigned int skip_cache_sync_on_finish:1; - + struct vb2_plane planes[VB2_MAX_PLANES]; + /* + * dma_resv release fence — set by vb2_buffer_attach_release_fence() -+ * (driver opt-in from buf_queue), signalled by vb2_buffer_done. -+ * NULL for drivers that don't opt in. ++ * (driver opt-in from buf_queue), signalled and put by ++ * vb2_buffer_done(). NULL for drivers that don't opt in. + */ + struct dma_fence *release_fence; struct list_head queued_entry; struct list_head done_entry; - -@@ -645,6 +652,11 @@ struct vb2_queue { + #ifdef CONFIG_VIDEO_ADV_DEBUG +@@ -658,6 +664,15 @@ struct vb2_queue { + spinlock_t done_lock; wait_queue_head_t done_wq; - -+ /* dma_resv release-fence integration (opt-in per buffer). */ + ++ /* ++ * Per-queue dma_resv release-fence context. Drivers that opt ++ * into vb2_buffer_attach_release_fence() use these to allocate ++ * fences on a single per-queue timeline. ++ */ + u64 dma_resv_fence_context; + atomic64_t dma_resv_fence_seqno; + spinlock_t dma_resv_fence_lock; @@ -221,20 +198,27 @@ diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h unsigned int streaming:1; unsigned int start_streaming_called:1; unsigned int error:1; -@@ -750,6 +762,13 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); +@@ -747,6 +762,20 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no); */ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); - + +/** + * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence. -+ * Called from a driver's buf_queue callback after enqueueing the -+ * buffer in the driver's own queue. See videobuf2-core.c for -+ * rationale and call shape. ++ * @vb: the buffer being queued to the producer. ++ * ++ * Drivers call this from their buf_queue callback to attach an ++ * exclusive write fence to each plane's dmabuf->resv. The fence ++ * is signalled and put by vb2_buffer_done() when the buffer ++ * transitions to VB2_BUF_STATE_DONE / _ERROR. Skips planes whose ++ * dbuf is NULL. ++ * ++ * Returns 0 on success, negative errno on allocation failure. + */ +int vb2_buffer_attach_release_fence(struct vb2_buffer *vb); + /** * vb2_discard_done() - discard all buffers marked as DONE. * @q: pointer to &struct vb2_queue with videobuf2 queue. --- -2.44.0 +-- +2.47.3 + diff --git a/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_qu.patch b/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_qu.patch new file mode 100644 index 0000000000..887bf5f3a8 --- /dev/null +++ b/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_qu.patch @@ -0,0 +1,56 @@ +From 91522b562665b94607337a3f30d1586f818d9387 Mon Sep 17 00:00:00 2001 +From: Markus Fritsche +Date: Tue, 28 Apr 2026 19:23:50 +0000 +Subject: [PATCH 2/3] media: hantro: attach dma_resv release fence at buf_queue + +Opt the hantro driver into the new vb2 release-fence helper. + +When userspace QBUFs a buffer to hantro, the buffer is added to the +driver's m2m queue via v4l2_m2m_buf_queue. We additionally call +vb2_buffer_attach_release_fence() so each plane's dmabuf->resv +gets a real producer fence attached. The fence is signalled by +vb2_buffer_done when hantro completes the decode (via +v4l2_m2m_buf_done_and_job_finish in hantro_drv.c, which converges +on vb2_buffer_done). + +Wayland compositors (and any other userspace) that import hantro +CAPTURE buffers and wait on the dmabuf's implicit-sync fence now +wait on a real fence representing the producer's actual completion, +not a stub. Validated end-to-end on PineTab2 (RK3566 / Mali-G52 / +mainline 6.19 with this series backported) playing 1080p30 H.264 in +chromium under stock KDE Plasma 6.6.4 Wayland: KWin's +Transaction::watchDmaBuf wait completes correctly the moment +hantro's IRQ fires, instead of falling back to a stub-resolved +poll. + +Signed-off-by: Markus Fritsche +--- + drivers/media/platform/verisilicon/hantro_v4l2.c | 12 ++++++++++++ + 1 file changed, 12 insertions(+) + +diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c +index 62d3962c1..e95a3433a 100644 +--- a/drivers/media/platform/verisilicon/hantro_v4l2.c ++++ b/drivers/media/platform/verisilicon/hantro_v4l2.c +@@ -877,6 +877,18 @@ static void hantro_buf_queue(struct vb2_buffer *vb) + } + + v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); ++ ++ /* ++ * Opt in to vb2's dma_resv release-fence path. Userspace ++ * consumers that imported this buffer's dmabuf and wait on ++ * its implicit-sync fence (poll(POLLIN) or ++ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) get a real producer fence ++ * representing this device's completion, instead of the stub ++ * fence dma_buf_export_sync_file substitutes when dma_resv ++ * is empty. Best-effort: a fence-allocation failure means we ++ * lose implicit-sync precision, no functional regression. ++ */ ++ (void)vb2_buffer_attach_release_fence(vb); + } + + static bool hantro_vq_is_coded(struct vb2_queue *q) +-- +2.47.3 + diff --git a/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch b/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch deleted file mode 100644 index d61249c73c..0000000000 --- a/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch +++ /dev/null @@ -1,79 +0,0 @@ -From: Markus Fritsche -Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue -Date: 2026-04-28 - -Opt the hantro driver into the new vb2 release-fence helper. - -When userspace QBUFs a buffer to hantro, the buffer is added to the -driver's m2m queue via v4l2_m2m_buf_queue. We additionally call -vb2_buffer_attach_release_fence() so each plane's dmabuf->resv gets -a real producer fence attached. The fence is signalled by vb2_buffer_done -when hantro completes the decode (via v4l2_m2m_buf_done_and_job_finish -in hantro_drv.c, which converges on vb2_buffer_done). - -Wayland compositors that import hantro CAPTURE buffers (chrome, -firefox, mpv, gstreamer) and wait on the dmabuf's implicit-sync -fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) now wait on a -real fence representing the producer's actual completion, not a -stub. KWin's `Transaction::watchDmaBuf` path on Mali-class hardware -is the user-visible benefit: the per-frame sync_file roundtrip -completes correctly the moment hantro's IRQ handler runs, instead -of either polling on a stub fence or — in the failure mode that -drove this work — failing to signal at all due to a race that the -stub-fence path masked. - -Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this -series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in -chromium under stock KDE Plasma 6.6.4 Wayland: the chrome stall that -required a KWin watchDmaBuf bypass workaround (kwin-fourier in the -chromium-fourier project) is gone with this kernel-side fix in -place; KWin's wait completes correctly. - -Signed-off-by: Markus Fritsche ---- - drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++-- - 1 file changed, 15 insertions(+), 2 deletions(-) - -diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c ---- a/drivers/media/platform/verisilicon/hantro_v4l2.c -+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c -@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb) - { - struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue); - struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); - - if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) && - vb2_is_streaming(vb->vb2_queue) && - v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) { - unsigned int i; - - for (i = 0; i < vb->num_planes; i++) - vb2_set_plane_payload(vb, i, 0); - - vbuf->field = V4L2_FIELD_NONE; - vbuf->sequence = - ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++; - - v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE); - return; - } - -- v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); -+ v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); -+ -+ /* -+ * Opt in to vb2's dma_resv release-fence path: any userspace -+ * consumer that imported this buffer's dmabuf and is doing -+ * implicit-sync via poll(POLLIN) or -+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence -+ * representing this device's completion, instead of the stub -+ * fence dma_buf_export_sync_file substitutes when dma_resv is -+ * empty. Best-effort: if fence allocation fails we just lose -+ * the implicit-sync precision, no functional regression. -+ */ -+ (void)vb2_buffer_attach_release_fence(vb); - } - - const struct vb2_ops hantro_queue_ops = { --- -2.44.0 diff --git a/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch b/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch new file mode 100644 index 0000000000..eda70f61c1 --- /dev/null +++ b/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch @@ -0,0 +1,48 @@ +From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001 +From: Markus Fritsche +Date: Tue, 28 Apr 2026 19:23:51 +0000 +Subject: [PATCH 3/3] media: rockchip-rga: attach dma_resv release fence at + buf_queue + +Opt the Rockchip RGA driver into the new vb2 release-fence helper. + +Same shape as the hantro patch: rga_buf_queue enqueues the buffer +in the driver's m2m queue via v4l2_m2m_buf_queue and additionally +attaches a release fence to each plane's dmabuf->resv via +vb2_buffer_attach_release_fence(). vb2_buffer_done signals the +fence when RGA completes the M2M operation. + +Userspace consumers of RGA-produced dmabufs (image-processing +pipelines, screen-rotation servers, gstreamer flows on Rockchip +boards) get spec-clean implicit-sync semantics, matching what +hantro now does in the same patch series. + +Signed-off-by: Markus Fritsche +--- + drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++++++++++ + 1 file changed, 10 insertions(+) + +diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c +index 70808049d..5557ca632 100644 +--- a/drivers/media/platform/rockchip/rga/rga-buf.c ++++ b/drivers/media/platform/rockchip/rga/rga-buf.c +@@ -153,6 +153,16 @@ static void rga_buf_queue(struct vb2_buffer *vb) + struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue); + + v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); ++ ++ /* ++ * Opt in to vb2's dma_resv release-fence path so userspace ++ * consumers of RGA-produced dmabufs get a real producer fence ++ * to wait on instead of the dma_buf core's stub fence. See ++ * the leading patch in this series for rationale. Best-effort: ++ * fence-allocation failure means we lose implicit-sync ++ * precision but the m2m operation itself proceeds normally. ++ */ ++ (void)vb2_buffer_attach_release_fence(vb); + } + + static void rga_buf_cleanup(struct vb2_buffer *vb) +-- +2.47.3 + diff --git a/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch b/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch deleted file mode 100644 index 8fbf57c298..0000000000 --- a/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch +++ /dev/null @@ -1,47 +0,0 @@ -From: Markus Fritsche -Subject: [PATCH RFC 3/3] media: rockchip-rga: attach dma_resv release fence at buf_queue -Date: 2026-04-28 - -Opt the Rockchip RGA driver into the new vb2 release-fence helper. - -Same shape as the hantro patch: the existing buf_queue path enqueues -the buffer in the driver's m2m queue via v4l2_m2m_buf_queue, and we -additionally attach a release fence to each plane's dmabuf->resv via -vb2_buffer_attach_release_fence(). vb2_buffer_done signals the fence -when RGA completes the M2M operation. - -Userspace consumers of RGA-produced dmabufs (image-processing -pipelines, screen-rotation servers, gstreamer flows) get spec-clean -implicit-sync semantics, matching what hantro now does in the same -patch series. - -Signed-off-by: Markus Fritsche ---- - drivers/media/platform/rockchip/rga/rga-buf.c | 11 +++++++++++ - 1 file changed, 11 insertions(+) - -diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c ---- a/drivers/media/platform/rockchip/rga/rga-buf.c -+++ b/drivers/media/platform/rockchip/rga/rga-buf.c -@@ -150,7 +150,18 @@ static void rga_buf_queue(struct vb2_buffer *vb) - { - struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); - struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue); - - v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); -+ -+ /* -+ * Opt in to vb2's dma_resv release-fence path so userspace -+ * consumers of RGA-produced dmabufs get a real producer fence -+ * to wait on instead of the dma_buf core's substitute stub -+ * fence. See the leading patch in this series for rationale -+ * and the helper definition. Best-effort: a fence-allocation -+ * failure means we lose implicit-sync precision but the m2m -+ * operation itself proceeds normally. -+ */ -+ (void)vb2_buffer_attach_release_fence(vb); - } - - static void rga_buf_cleanup(struct vb2_buffer *vb) --- -2.44.0 diff --git a/kernel/vb2-dma-resv-rfc/README.md b/kernel/vb2-dma-resv-rfc/README.md index 88d5cffc77..18b10d9fa7 100644 --- a/kernel/vb2-dma-resv-rfc/README.md +++ b/kernel/vb2-dma-resv-rfc/README.md @@ -34,10 +34,20 @@ their respective `buf_queue` callbacks). ## Status -Patches drafted but **not yet applied / compile-tested / runtime- -tested.** They're written against linux-next master as of -2026-04-28 (sparse-checked-out at `/tmp/hantro-src` during the -chromium-fourier campaign on ohm). Pre-flight before sending: +**Patches apply cleanly to Linux 6.12 mainline via `git am`** — +verified against `/tmp/hantro-src` (sparse-checked-out v6.12 plus +linux-next master). All kernel API calls verified to match real +signatures in `include/linux/dma-fence.h` and +`include/linux/dma-resv.h`: + +- `dma_fence_init(fence, ops, lock, context, seqno)` ✓ +- `dma_resv_add_fence(obj, fence, usage)` ✓ +- `DMA_RESV_USAGE_WRITE` enum present ✓ +- `dma_fence_signal`, `dma_fence_set_error`, `dma_fence_get`, + `dma_fence_put`, `dma_fence_context_alloc` ✓ +- `dma_resv_lock(obj, NULL)`, `dma_resv_unlock` ✓ + +Remaining gates before sending to linux-media: 1. **Compile** — `make drivers/media/common/videobuf2/videobuf2-core.o drivers/media/platform/verisilicon/hantro_v4l2.o