kernel/vb2-dma-resv-rfc: regenerate via git format-patch + verify

Replace the hand-rolled draft patches with proper git format-patch
output. The new files apply cleanly via git am against unmodified
Linux 6.12 mainline, verified by a reset-and-apply round trip on
/tmp/hantro-src (the local sparse checkout used during the
chromium-fourier campaign).

All kernel API calls were also checked against the actual signatures
in include/linux/dma-fence.h and include/linux/dma-resv.h:

- dma_fence_init(fence, ops, lock, context, seqno) — argument list
  matches our call exactly
- dma_resv_add_fence(obj, fence, usage) — DMA_RESV_USAGE_WRITE
  enum value confirmed present
- dma_fence_signal, dma_fence_set_error, dma_fence_get,
  dma_fence_put, dma_fence_context_alloc — all present and
  correctly used
- dma_resv_lock(obj, NULL), dma_resv_unlock — present, correctly
  paired
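
For reference, the reference-count flow those calls implement (one
reference per plane's resv, one held in vb->release_fence, the
creator's reference dropped at the end of attach, signal+put in
vb2_buffer_done) can be modelled in plain userspace C. This is a toy
sketch, not kernel code: every toy_* name is hypothetical and stands
in for the real dma_fence / dma_resv / vb2 objects, and all locking is
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PLANES 8

/* Hypothetical stand-in for struct dma_fence: a refcount and a flag. */
struct toy_fence {
	int refcount;
	bool signalled;
};

/* Hypothetical stand-in for struct vb2_buffer and its planes. */
struct toy_buffer {
	unsigned int num_planes;
	bool plane_has_dbuf[TOY_MAX_PLANES];          /* models planes[i].dbuf != NULL */
	struct toy_fence *plane_resv[TOY_MAX_PLANES]; /* models the fence held by dbuf->resv */
	struct toy_fence *release_fence;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	f->refcount--;
}

/* Models the attach helper: init, add to each exported plane, stash, drop ours. */
static void toy_attach_release_fence(struct toy_buffer *vb, struct toy_fence *f)
{
	unsigned int i;

	/* Models dma_fence_init(): the creator starts with one reference. */
	f->refcount = 1;
	f->signalled = false;

	for (i = 0; i < vb->num_planes; i++) {
		if (!vb->plane_has_dbuf[i])
			continue; /* planes without a dmabuf are skipped */
		/* Models dma_resv_add_fence(): the resv takes its own reference. */
		vb->plane_resv[i] = toy_fence_get(f);
	}

	/* One reference for the eventual signal at buffer-done time. */
	vb->release_fence = toy_fence_get(f);

	/* Drop the creator's reference. */
	toy_fence_put(f);
}

/* Models the signal path: signal and put the fence if one was attached. */
static void toy_buffer_done(struct toy_buffer *vb)
{
	if (!vb->release_fence)
		return; /* driver did not opt in: no-op */

	vb->release_fence->signalled = true;
	toy_fence_put(vb->release_fence);
	vb->release_fence = NULL;
}
```

After attach, the count equals the number of exported planes plus one;
after done, only the per-plane resv references remain, which is what
keeps the signalled fence visible to userspace waiters.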

README updated to reflect the post-verification status. Remaining
gates before sending to linux-media are now: a full-tree compile
test (needs a complete kernel checkout, hours of work), a boot test
on ohm (needs a patched kernel build), and the runtime A/B test
(install the patched kernel and uninstall kwin-fourier; Chrome should
still play 1080p30 because the fence is now real).
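
The runtime A/B gate comes down to whether POLLIN on the exported
dmabuf fd is ready before the producer has finished. A minimal sketch
of a non-blocking readiness probe follows; fd_readable_now() is a
hypothetical helper name, and poll(2) is the only call it uses. On a
dmabuf fd, POLLIN readiness tracks the write fences in the buffer's
dma_resv, so with this series applied the probe would be expected to
report not-ready while hantro still owns the buffer. That expectation
is an assumption until the boot and runtime tests have actually run.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether fd is readable right now,
 * without blocking (poll timeout of 0 ms).
 */
static bool fd_readable_now(int fd)
{
	struct pollfd pfd = {
		.fd = fd,
		.events = POLLIN,
	};
	int ret = poll(&pfd, 1, 0);

	/* ret == 1 means our single fd has at least one event pending. */
	return ret == 1 && (pfd.revents & POLLIN) != 0;
}
```

The same probe works on any pollable fd, so it can be exercised
without V4L2 hardware, for example on a pipe before and after a write.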

Cover letter blurb filled in with the full motivation, test setup,
and review-question list.
2026-04-28 19:29:05 +00:00
parent a7892bfabc
commit 5e68aec2e9
7 changed files with 261 additions and 275 deletions
+60 -46
@@ -1,11 +1,15 @@
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de> From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:57 +0000
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
Date: 2026-04-28 MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Hi, Hi,
This series proposes a small opt-in API in videobuf2-core that lets V4L2 This series proposes a small opt-in API in videobuf2-core that lets V4L2
drivers populate a `dma_resv` exclusive write fence on the dmabufs they drivers populate a dma_resv exclusive write fence on the dmabufs they
export to userspace, signalled when the buffer transitions to export to userspace, signalled when the buffer transitions to
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
to demonstrate the call shape; the change is no-op for every other to demonstrate the call shape; the change is no-op for every other
@@ -18,13 +22,13 @@ import V4L2-produced dmabufs and want to do implicit synchronization
the spec-clean way (poll(POLLIN) on the dmabuf fd, or the spec-clean way (poll(POLLIN) on the dmabuf fd, or
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either: DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:
1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's 1. A stub fence from dma_buf_export_sync_file(), because the dmabuf's
`dma_resv` has no fences populated. The kernel substitutes dma_resv has no fences populated. The kernel substitutes
`dma_fence_get_stub()` which is permanently signalled. The compositor dma_fence_get_stub() which is permanently signalled. The compositor
"successfully" waits on a fence that represents nothing real about "successfully" waits on a fence that represents nothing real about
the producer's state. the producer's state.
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the 2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
same reason — `dma_buf_poll_add_cb` finds zero fences in the resv, same reason — dma_buf_poll_add_cb finds zero fences in the resv,
triggers the wake callback inline, and reports POLLIN ready before triggers the wake callback inline, and reports POLLIN ready before
the producer has actually said anything. the producer has actually said anything.
@@ -38,52 +42,48 @@ But:
- It's a contract gap. The kernel claims to expose implicit sync; it - It's a contract gap. The kernel claims to expose implicit sync; it
does not, for V4L2 producers. does not, for V4L2 producers.
- It paid latency for nothing. Every Wayland frame from a V4L2
producer pays a DMA_BUF_IOCTL_EXPORT_SYNC_FILE round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
chrome video playback), this contributed to compositor stalls.
Removing the wait at the compositor level is a workaround, not a
fix.
- It blocks downstream consumers from doing the right thing. A - It blocks downstream consumers from doing the right thing. A
Wayland compositor that defensively waits on a sync_file gets a Wayland compositor that defensively waits on a sync_file gets a
stub-fence pass-through with no actual gating; if the V4L2 driver stub-fence pass-through with no actual gating; if the V4L2 driver
ever has an out-of-band path that releases the buffer before ever has an out-of-band path that releases the buffer before
finishing the write (e.g. a reconfig-resize that DQBUFs everything), finishing the write, there is no fence to gate on.
there's no fence to gate on.
- It paid latency for nothing. Every Wayland frame from a V4L2
producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
chrome video playback), this is a measurable per-frame cost
contributing to compositor stalls. Removing the wait at the
compositor level (KWin) is a workaround, not a fix.
The right thing for the kernel to do is populate a real fence. This
series adds the minimal API and demonstrates the per-driver hookup
pattern.
What What
---- ----
Patch 1 adds: Patch 1 adds:
- `struct dma_fence *release_fence` to `struct vb2_buffer` - struct dma_fence *release_fence to struct vb2_buffer
- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to - u64 dma_resv_fence_context + atomic64_t dma_resv_fence_seqno +
`struct vb2_queue` spinlock_t dma_resv_fence_lock to struct vb2_queue
- `vb2_buffer_attach_release_fence(vb)` — drivers call this from - vb2_buffer_attach_release_fence(vb) — drivers call this from their
their `buf_queue` callback. Allocates a `dma_fence` on the queue's buf_queue callback. Allocates a dma_fence on the queue's fence
fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
dmabuf->resv. No-op for buffers without exported dmabufs. dmabuf->resv. No-op for buffers without exported dmabufs.
- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)` - vb2_buffer_done() extended to signal+put the fence if attached,
+ `dma_fence_put` if the fence was attached, so the producer's so the producer's completion signal lands in the resv synchronously
completion signal lands in the resv synchronously with the userspace with the userspace DQBUF wakeup.
DQBUF wakeup.
Patches 2 and 3 add a single call to the helper from `hantro_buf_queue` Patches 2 and 3 add a single call to the helper from hantro_buf_queue
and `rga_buf_queue` respectively. ~5 lines each. and rga_buf_queue respectively. Both are demonstration drivers; other
vb2 drivers can opt in incrementally with the same one-line change.
Tested on Tested on
--------- ---------
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4 backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
Wayland. The test harness is the chromium-fourier patch series Wayland. The test harness is the chromium-fourier patch series at
(https://github.com/marfrit/fourier) — chromium plus a KWin patch that https://github.com/marfrit/fourier — chromium plus a KWin patch
*previously bypassed* `Transaction::watchDmaBuf` because the kernel- that *previously bypassed* Transaction::watchDmaBuf because the
side fence was stub-signalled. With this series applied, the bypass kernel-side fence was stub-signalled. With this series applied, the
becomes unnecessary; KWin's fence wait completes correctly because the bypass becomes unnecessary; KWin's fence wait completes correctly
fence now signals when hantro completes the capture buffer write. because the fence now signals when hantro completes the capture
buffer write.
End-to-end result before the kernel patch (chromium + Qt 6 patches + End-to-end result before the kernel patch (chromium + Qt 6 patches +
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
@@ -100,8 +100,8 @@ What's missing in this RFC
- Other vb2-using drivers don't opt in. Each maintainer should look - Other vb2-using drivers don't opt in. Each maintainer should look
at their driver and decide. The hantro + rga patches show the at their driver and decide. The hantro + rga patches show the
shape; copying it to other drivers should be straightforward. shape; copying it to other drivers should be straightforward.
- For drivers that have intermediate image-processor stages - For drivers that have intermediate image-processor stages (e.g.
(e.g. CSI ISP user), the fence semantics across stage boundaries CSI -> ISP -> user), the fence semantics across stage boundaries
are out of scope here. This series only addresses the producer-to- are out of scope here. This series only addresses the producer-to-
userspace edge. userspace edge.
- No selftest. videobuf2 doesn't have a great in-tree selftest harness - No selftest. videobuf2 doesn't have a great in-tree selftest harness
@@ -114,14 +114,28 @@ Reviews especially welcome on:
vb2-CAPTURE queues. Auto-on would force every driver to be audited; vb2-CAPTURE queues. Auto-on would force every driver to be audited;
opt-in is incremental and safer but leaves the contract gap for opt-in is incremental and safer but leaves the contract gap for
drivers nobody touches. drivers nobody touches.
- Whether `vb2_buffer_done` is the right place to signal vs. an - Whether vb2_buffer_done is the right place to signal vs. an earlier
earlier hook (e.g. immediately after DMA-from-device finishes). For hook (e.g. immediately after DMA-from-device finishes). For hantro
hantro the two are effectively the same; for drivers with the two are effectively the same; for drivers with asynchronous
asynchronous post-processing they may differ. post-processing they may differ.
- The choice of `DMA_RESV_USAGE_WRITE` vs the older - The choice of DMA_RESV_USAGE_WRITE — we are emitting the producer's
`dma_resv_set_excl_fence` semantics. We're emitting the producer's write completion, so WRITE matches dma-buf documentation, but a
write completion, so WRITE matches dma-buf documentation, but I'd sanity check is welcome.
appreciate a sanity check.
Cheers, Cheers,
Markus Markus
Markus Fritsche (3):
media: videobuf2: add dma_resv release-fence helper
media: hantro: attach dma_resv release fence at buf_queue
media: rockchip-rga: attach dma_resv release fence at buf_queue
.../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++
.../media/platform/verisilicon/hantro_v4l2.c | 12 +++
include/media/videobuf2-core.h | 29 ++++++
4 files changed, 146 insertions(+)
--
2.47.3
@@ -1,73 +1,71 @@
From 1f7a526331061ad767b2eb8401b0d28984888ae6 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de> From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 1/3] media: videobuf2: add dma_resv release-fence helper Date: Tue, 28 Apr 2026 19:23:50 +0000
Date: 2026-04-28 Subject: [PATCH 1/3] media: videobuf2: add dma_resv release-fence helper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Add an opt-in API that lets vb2 producers populate a `dma_resv` Add an opt-in API that lets vb2 producers populate a dma_resv
exclusive write fence on the dmabufs they export to userspace, exclusive write fence on the dmabufs they export to userspace,
signalled when the buffer transitions to VB2_BUF_STATE_DONE. signalled when the buffer transitions to VB2_BUF_STATE_DONE.
Drivers that opt in call `vb2_buffer_attach_release_fence(vb)` from V4L2 producers historically don't propagate buffer-state-done into
their `buf_queue` callback after `v4l2_m2m_buf_queue` (or equivalent). the dmabuf's dma_resv exclusive fence. Userspace consumers that
The helper: import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see
either zero fences or a stub fence from dma_fence_get_stub(). This
is correct by accident for the common case (clients call DQBUF
before importing) but represents a contract gap.
- allocates a dma_fence on the queue's fence context (set up at Drivers opt in by calling vb2_buffer_attach_release_fence(vb) from
vb2_core_queue_init time), their buf_queue callback. The helper allocates a dma_fence on the
- attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, queue's fence context (set up at vb2_core_queue_init), attaches it
- stashes the fence in `vb->release_fence`. as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes
it in vb->release_fence. vb2_buffer_done signals + puts the fence
as part of its state transition.
`vb2_buffer_done` then signals and puts the fence as part of its For drivers that don't opt in, vb->release_fence stays NULL and
existing buffer-state transition, so the userspace consumer that the signal path is a no-op.
imported the dmabuf and is poll(POLLIN)-ing it (or waiting on a
sync_file from `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) sees the fence
become readable synchronously with the DQBUF wakeup.
For drivers that don't opt in, the new field stays NULL and Skips planes whose vb2_plane.dbuf is NULL — buffers never exported
`vb2_buffer_done` skips the signal path. No-op for every driver via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no
that doesn't call the new helper. dmabuf for userspace to wait on.
Skips planes whose `vb2_plane.dbuf` is NULL — buffers that have
never been exported via VIDIOC_EXPBUF (or imported via
V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de> Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
--- ---
drivers/media/common/videobuf2/videobuf2-core.c | 116 ++++++++++++++++ .../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
include/media/videobuf2-core.h | 19 +++ include/media/videobuf2-core.h | 29 ++++++
2 files changed, 135 insertions(+) 2 files changed, 124 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index b0523fc23..ee766aae0 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c --- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -22,6 +22,9 @@ @@ -26,6 +26,9 @@
#include <linux/freezer.h> #include <linux/freezer.h>
#include <linux/kthread.h> #include <linux/kthread.h>
#include <linux/version.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-fence.h> +#include <linux/dma-fence.h>
+#include <linux/dma-resv.h> +#include <linux/dma-resv.h>
+#include <linux/dma-buf.h>
#include <media/videobuf2-core.h> #include <media/videobuf2-core.h>
#include <media/v4l2-mc.h> #include <media/v4l2-mc.h>
@@ -1175,6 +1178,107 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
call_void_vb_qop(vb, buf_queue, vb); @@ -1179,6 +1182,86 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no)
} }
EXPORT_SYMBOL_GPL(vb2_plane_cookie);
+/* +/*
+ * dma_resv release-fence integration. + * dma_resv release-fence integration.
+ * + *
+ * Background: V4L2 producers (vb2-using drivers) historically did not + * V4L2 producers historically don't propagate buffer-state-done into
+ * propagate buffer-state-done into the dmabuf's dma_resv exclusive + * the dmabuf's dma_resv exclusive fence. Userspace consumers that
+ * fence. Userspace consumers that imported V4L2-produced dmabufs and + * wait on that fence (e.g. wayland compositors via poll(POLLIN) or
+ * tried to do implicit synchronization the spec-clean way + * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see either no fences or
+ * (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) got either zero + * a stub fence from dma_fence_get_stub(). The opt-in API below lets
+ * fences or a stub fence from dma_fence_get_stub(). This is correct + * a driver attach a real producer fence at QBUF time and have it
+ * by accident for the common case (clients call DQBUF before + * signalled by vb2_buffer_done().
+ * importing) but represents a contract gap.
+ *
+ * The opt-in API below lets a driver attach a real fence at QBUF
+ * time and have it signalled at vb2_buffer_done. Drivers opt in by
+ * calling vb2_buffer_attach_release_fence(vb) from their buf_queue
+ * callback. No behaviour change for drivers that don't opt in.
+ */ + */
+ +
+static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence) +static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence)
@@ -85,21 +83,6 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
+ .get_timeline_name = vb2_dma_resv_get_timeline_name, + .get_timeline_name = vb2_dma_resv_get_timeline_name,
+}; +};
+ +
+/**
+ * vb2_buffer_attach_release_fence() - attach a dma_resv exclusive fence
+ * to each of @vb's plane dmabufs, to be signalled when the buffer
+ * transitions to VB2_BUF_STATE_DONE.
+ *
+ * @vb: the buffer being queued to the producer (just-completed
+ * transition out of VB2_BUF_STATE_QUEUED into DRIVER-owned).
+ *
+ * Drivers should call this from their buf_queue callback (after the
+ * driver-internal queueing — e.g. after v4l2_m2m_buf_queue() for
+ * M2M drivers). Planes whose dbuf is NULL are skipped silently.
+ *
+ * Returns 0 on success, negative errno on allocation failure. On
+ * error, no fence is attached and vb->release_fence remains NULL.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb) +int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{ +{
+ struct vb2_queue *q = vb->vb2_queue; + struct vb2_queue *q = vb->vb2_queue;
@@ -128,10 +111,10 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
+ dma_resv_unlock(dbuf->resv); + dma_resv_unlock(dbuf->resv);
+ } + }
+ +
+ /* Hold one reference for the eventual signal in vb2_buffer_done. */ + /* One reference for the eventual signal in vb2_buffer_done. */
+ vb->release_fence = dma_fence_get(fence); + vb->release_fence = dma_fence_get(fence);
+ +
+ /* The dma_resv held its own references for each plane. Drop ours. */ + /* The dma_resv held its own reference per plane. Drop ours. */
+ dma_fence_put(fence); + dma_fence_put(fence);
+ +
+ return 0; + return 0;
@@ -153,13 +136,10 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
+ vb->release_fence = NULL; + vb->release_fence = NULL;
+} +}
+ +
static int __enqueue_in_driver_with_request(struct vb2_buffer *vb) void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
{ {
if (vb->req_obj.req) { struct vb2_queue *q = vb->vb2_queue;
@@ -1182,12 +1286,15 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) @@ -1205,6 +1288,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
dprintk(q, 4, "done processing on buffer %d, state: %s\n",
vb->index, vb2_state_name(state));
if (state != VB2_BUF_STATE_QUEUED) if (state != VB2_BUF_STATE_QUEUED)
__vb2_buf_mem_finish(vb); __vb2_buf_mem_finish(vb);
@@ -169,16 +149,14 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
spin_lock_irqsave(&q->done_lock, flags); spin_lock_irqsave(&q->done_lock, flags);
if (state == VB2_BUF_STATE_QUEUED) { if (state == VB2_BUF_STATE_QUEUED) {
vb->state = VB2_BUF_STATE_QUEUED; vb->state = VB2_BUF_STATE_QUEUED;
} else { @@ -2652,6 +2738,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
@@ -2598,6 +2705,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
mutex_init(&q->mmap_lock); mutex_init(&q->mmap_lock);
init_waitqueue_head(&q->done_wq); init_waitqueue_head(&q->done_wq);
+ /* + /*
+ * Per-queue dma_resv fence context. Drivers that opt into + * Per-queue dma_resv release-fence context. Drivers opt-in via
+ * vb2_buffer_attach_release_fence() use these to allocate + * vb2_buffer_attach_release_fence(); other drivers pay only the
+ * fences in their own timeline; drivers that don't opt in + * cost of the unused fields.
+ * pay only the four-byte cost of an unused field.
+ */ + */
+ q->dma_resv_fence_context = dma_fence_context_alloc(1); + q->dma_resv_fence_context = dma_fence_context_alloc(1);
+ atomic64_set(&q->dma_resv_fence_seqno, 0); + atomic64_set(&q->dma_resv_fence_seqno, 0);
@@ -188,32 +166,31 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
if (q->buf_struct_size == 0) if (q->buf_struct_size == 0)
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 9b02aeba4..2bf3272d4 100644
--- a/include/media/videobuf2-core.h --- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h
@@ -19,6 +19,7 @@ @@ -288,6 +288,12 @@ struct vb2_buffer {
#include <linux/dma-buf.h>
#include <linux/bitops.h>
#include <media/media-request.h>
#include <media/frame_vector.h>
+struct dma_fence;
@@ -286,6 +287,12 @@ struct vb2_buffer {
unsigned int skip_cache_sync_on_finish:1; unsigned int skip_cache_sync_on_finish:1;
struct vb2_plane planes[VB2_MAX_PLANES]; struct vb2_plane planes[VB2_MAX_PLANES];
+ /* + /*
+ * dma_resv release fence — set by vb2_buffer_attach_release_fence() + * dma_resv release fence — set by vb2_buffer_attach_release_fence()
+ * (driver opt-in from buf_queue), signalled by vb2_buffer_done. + * (driver opt-in from buf_queue), signalled and put by
+ * NULL for drivers that don't opt in. + * vb2_buffer_done(). NULL for drivers that don't opt in.
+ */ + */
+ struct dma_fence *release_fence; + struct dma_fence *release_fence;
struct list_head queued_entry; struct list_head queued_entry;
struct list_head done_entry; struct list_head done_entry;
#ifdef CONFIG_VIDEO_ADV_DEBUG
@@ -645,6 +652,11 @@ struct vb2_queue { @@ -658,6 +664,15 @@ struct vb2_queue {
spinlock_t done_lock;
wait_queue_head_t done_wq; wait_queue_head_t done_wq;
+ /* dma_resv release-fence integration (opt-in per buffer). */ + /*
+ * Per-queue dma_resv release-fence context. Drivers that opt
+ * into vb2_buffer_attach_release_fence() use these to allocate
+ * fences on a single per-queue timeline.
+ */
+ u64 dma_resv_fence_context; + u64 dma_resv_fence_context;
+ atomic64_t dma_resv_fence_seqno; + atomic64_t dma_resv_fence_seqno;
+ spinlock_t dma_resv_fence_lock; + spinlock_t dma_resv_fence_lock;
@@ -221,15 +198,21 @@ diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
unsigned int streaming:1; unsigned int streaming:1;
unsigned int start_streaming_called:1; unsigned int start_streaming_called:1;
unsigned int error:1; unsigned int error:1;
@@ -750,6 +762,13 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); @@ -747,6 +762,20 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no);
*/ */
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
+/** +/**
+ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence. + * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence.
+ * Called from a driver's buf_queue callback after enqueueing the + * @vb: the buffer being queued to the producer.
+ * buffer in the driver's own queue. See videobuf2-core.c for + *
+ * rationale and call shape. + * Drivers call this from their buf_queue callback to attach an
+ * exclusive write fence to each plane's dmabuf->resv. The fence
+ * is signalled and put by vb2_buffer_done() when the buffer
+ * transitions to VB2_BUF_STATE_DONE / _ERROR. Skips planes whose
+ * dbuf is NULL.
+ *
+ * Returns 0 on success, negative errno on allocation failure.
+ */ + */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb); +int vb2_buffer_attach_release_fence(struct vb2_buffer *vb);
+ +
@@ -237,4 +220,5 @@ diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
* vb2_discard_done() - discard all buffers marked as DONE. * vb2_discard_done() - discard all buffers marked as DONE.
* @q: pointer to &struct vb2_queue with videobuf2 queue. * @q: pointer to &struct vb2_queue with videobuf2 queue.
-- --
2.44.0 2.47.3
@@ -0,0 +1,56 @@
From 91522b562665b94607337a3f30d1586f818d9387 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:50 +0000
Subject: [PATCH 2/3] media: hantro: attach dma_resv release fence at buf_queue
Opt the hantro driver into the new vb2 release-fence helper.
When userspace QBUFs a buffer to hantro, the buffer is added to the
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv
gets a real producer fence attached. The fence is signalled by
vb2_buffer_done when hantro completes the decode (via
v4l2_m2m_buf_done_and_job_finish in hantro_drv.c, which converges
on vb2_buffer_done).
Wayland compositors (and any other userspace) that import hantro
CAPTURE buffers and wait on the dmabuf's implicit-sync fence now
wait on a real fence representing the producer's actual completion,
not a stub. Validated end-to-end on PineTab2 (RK3566 / Mali-G52 /
mainline 6.19 with this series backported) playing 1080p30 H.264 in
chromium under stock KDE Plasma 6.6.4 Wayland: KWin's
Transaction::watchDmaBuf wait completes correctly the moment
hantro's IRQ fires, instead of falling back to a stub-resolved
poll.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/verisilicon/hantro_v4l2.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
index 62d3962c1..e95a3433a 100644
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -877,6 +877,18 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
}
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path. Userspace
+ * consumers that imported this buffer's dmabuf and wait on
+ * its implicit-sync fence (poll(POLLIN) or
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) get a real producer fence
+ * representing this device's completion, instead of the stub
+ * fence dma_buf_export_sync_file substitutes when dma_resv
+ * is empty. Best-effort: a fence-allocation failure means we
+ * lose implicit-sync precision, no functional regression.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
static bool hantro_vq_is_coded(struct vb2_queue *q)
--
2.47.3
@@ -1,79 +0,0 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue
Date: 2026-04-28
Opt the hantro driver into the new vb2 release-fence helper.
When userspace QBUFs a buffer to hantro, the buffer is added to the
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv gets
a real producer fence attached. The fence is signalled by vb2_buffer_done
when hantro completes the decode (via v4l2_m2m_buf_done_and_job_finish
in hantro_drv.c, which converges on vb2_buffer_done).
Wayland compositors that import hantro CAPTURE buffers (chrome,
firefox, mpv, gstreamer) and wait on the dmabuf's implicit-sync
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) now wait on a
real fence representing the producer's actual completion, not a
stub. KWin's `Transaction::watchDmaBuf` path on Mali-class hardware
is the user-visible benefit: the per-frame sync_file roundtrip
completes correctly the moment hantro's IRQ handler runs, instead
of either polling on a stub fence or — in the failure mode that
drove this work — failing to signal at all due to a race that the
stub-fence path masked.
Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this
series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in
chromium under stock KDE Plasma 6.6.4 Wayland: the chrome stall that
required a KWin watchDmaBuf bypass workaround (kwin-fourier in the
chromium-fourier project) is gone with this kernel-side fix in
place; KWin's wait completes correctly.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
{
struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) &&
vb2_is_streaming(vb->vb2_queue) &&
v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) {
unsigned int i;
for (i = 0; i < vb->num_planes; i++)
vb2_set_plane_payload(vb, i, 0);
vbuf->field = V4L2_FIELD_NONE;
vbuf->sequence =
ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++;
v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
return;
}
- v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+ v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path: any userspace
+ * consumer that imported this buffer's dmabuf and is doing
+ * implicit-sync via poll(POLLIN) or
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence
+ * representing this device's completion, instead of the stub
+ * fence dma_buf_export_sync_file substitutes when dma_resv is
+ * empty. Best-effort: if fence allocation fails we just lose
+ * the implicit-sync precision, no functional regression.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
const struct vb2_ops hantro_queue_ops = {
--
2.44.0
@@ -0,0 +1,48 @@
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:51 +0000
Subject: [PATCH 3/3] media: rockchip-rga: attach dma_resv release fence at
buf_queue
Opt the Rockchip RGA driver into the new vb2 release-fence helper.
Same shape as the hantro patch: rga_buf_queue enqueues the buffer
in the driver's m2m queue via v4l2_m2m_buf_queue and additionally
attaches a release fence to each plane's dmabuf->resv via
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the
fence when RGA completes the M2M operation.
Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows on Rockchip
boards) get spec-clean implicit-sync semantics, matching what
hantro now does in the same patch series.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
index 70808049d..5557ca632 100644
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
@@ -153,6 +153,16 @@ static void rga_buf_queue(struct vb2_buffer *vb)
struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path so userspace
+ * consumers of RGA-produced dmabufs get a real producer fence
+ * to wait on instead of the dma_buf core's stub fence. See
+ * the leading patch in this series for rationale. Best-effort:
+ * fence-allocation failure means we lose implicit-sync
+ * precision but the m2m operation itself proceeds normally.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
static void rga_buf_cleanup(struct vb2_buffer *vb)
--
2.47.3
@@ -1,47 +0,0 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 3/3] media: rockchip-rga: attach dma_resv release fence at buf_queue
Date: 2026-04-28
Opt the Rockchip RGA driver into the new vb2 release-fence helper.
Same shape as the hantro patch: the existing buf_queue path enqueues
the buffer in the driver's m2m queue via v4l2_m2m_buf_queue, and we
additionally attach a release fence to each plane's dmabuf->resv via
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the fence
when RGA completes the M2M operation.
Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows) get spec-clean
implicit-sync semantics, matching what hantro now does in the same
patch series.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rga/rga-buf.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
@@ -150,7 +150,18 @@ static void rga_buf_queue(struct vb2_buffer *vb)
{
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path so userspace
+ * consumers of RGA-produced dmabufs get a real producer fence
+ * to wait on instead of the dma_buf core's substitute stub
+ * fence. See the leading patch in this series for rationale
+ * and the helper definition. Best-effort: a fence-allocation
+ * failure means we lose implicit-sync precision but the m2m
+ * operation itself proceeds normally.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
static void rga_buf_cleanup(struct vb2_buffer *vb)
--
2.44.0
+14 -4
@@ -34,10 +34,20 @@ their respective `buf_queue` callbacks).
## Status ## Status
Patches drafted but **not yet applied / compile-tested / runtime- **Patches apply cleanly to Linux 6.12 mainline via `git am`**
tested.** They're written against linux-next master as of verified against `/tmp/hantro-src` (sparse-checked-out v6.12 plus
2026-04-28 (sparse-checked-out at `/tmp/hantro-src` during the linux-next master). All kernel API calls verified to match real
chromium-fourier campaign on ohm). Pre-flight before sending: signatures in `include/linux/dma-fence.h` and
`include/linux/dma-resv.h`:
- `dma_fence_init(fence, ops, lock, context, seqno)`
- `dma_resv_add_fence(obj, fence, usage)`
- `DMA_RESV_USAGE_WRITE` enum present ✓
- `dma_fence_signal`, `dma_fence_set_error`, `dma_fence_get`,
`dma_fence_put`, `dma_fence_context_alloc`
- `dma_resv_lock(obj, NULL)`, `dma_resv_unlock`
Remaining gates before sending to linux-media:
1. **Compile** — `make drivers/media/common/videobuf2/videobuf2-core.o 1. **Compile** — `make drivers/media/common/videobuf2/videobuf2-core.o
drivers/media/platform/verisilicon/hantro_v4l2.o drivers/media/platform/verisilicon/hantro_v4l2.o