kernel/vb2-dma-resv-rfc: regenerate via git format-patch + verify

Replace the hand-rolled draft patches with the proper
git-format-patch output. The new files apply cleanly via git am
against unmodified Linux 6.12 mainline, verified by reset-and-apply
roundtrip on /tmp/hantro-src (the local sparse checkout used during
the chromium-fourier campaign).

All kernel API calls also sanity-checked against the real
include/linux/dma-fence.h and include/linux/dma-resv.h signatures:

- dma_fence_init(fence, ops, lock, context, seqno) — argument list
  matches our call exactly
- dma_resv_add_fence(obj, fence, usage) — DMA_RESV_USAGE_WRITE
  enum value confirmed present
- dma_fence_signal, dma_fence_set_error, dma_fence_get,
  dma_fence_put, dma_fence_context_alloc — all present and
  correctly used
- dma_resv_lock(obj, NULL), dma_resv_unlock — present, correctly
  paired
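
The reference-count flow those calls implement can be sketched as a
userspace model. This is an illustration only: every model_* name
below is a hypothetical stand-in, not the kernel API, and the model
assumes (as the patch comments state) that each plane's dma_resv
takes its own reference on the fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. model_fence stands in for struct dma_fence and
 * model_buffer for struct vb2_buffer; none of these are kernel types.
 */
#define MODEL_MAX_PLANES 8

struct model_fence {
	int refcount;		/* models the kref inside a dma_fence */
	bool signalled;
};

struct model_buffer {
	unsigned int num_planes;
	bool plane_has_dmabuf[MODEL_MAX_PLANES];
	struct model_fence *plane_resv[MODEL_MAX_PLANES];
	struct model_fence *release_fence;
};

static struct model_fence *model_fence_get(struct model_fence *f)
{
	assert(f->refcount > 0);
	f->refcount++;
	return f;
}

static void model_fence_put(struct model_fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/* Models the attach helper: init, one ref per plane resv, one for the buffer. */
static void model_attach_release_fence(struct model_buffer *b,
				       struct model_fence *f)
{
	unsigned int i;

	f->refcount = 1;	/* initial reference from fence init */
	f->signalled = false;

	for (i = 0; i < b->num_planes; i++) {
		if (!b->plane_has_dmabuf[i])
			continue;	/* planes without a dmabuf are skipped */
		b->plane_resv[i] = model_fence_get(f);
	}

	b->release_fence = model_fence_get(f);	/* held until buffer done */
	model_fence_put(f);			/* drop the initial reference */
}

/* Models the buffer-done path: signal, put, clear. No-op without a fence. */
static void model_buffer_done(struct model_buffer *b)
{
	struct model_fence *f = b->release_fence;

	if (!f)
		return;
	f->signalled = true;
	model_fence_put(f);
	b->release_fence = NULL;
}
```

In this model a buffer with one exported plane ends the attach step
with two references (one held by the plane's resv, one by the buffer),
and the done step leaves only the resv's reference behind.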

README updated to reflect the post-verification status. Remaining
gates before sending to linux-media are now: full-tree compile
test (needs complete kernel checkout, hours of work), boot test on
ohm (needs patched kernel build), and the runtime A/B (install
patched kernel + uninstall kwin-fourier — chrome should still play
1080p30 because the fence is now real).

Cover letter blurb filled in with the full motivation, test setup,
and review-question list.
2026-04-28 19:29:05 +00:00
parent a7892bfabc
commit 5e68aec2e9
7 changed files with 261 additions and 275 deletions
+60 -46
@@ -1,11 +1,15 @@
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:57 +0000
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
Date: 2026-04-28
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Hi,
This series proposes a small opt-in API in videobuf2-core that lets V4L2
drivers populate a `dma_resv` exclusive write fence on the dmabufs they
drivers populate a dma_resv exclusive write fence on the dmabufs they
export to userspace, signalled when the buffer transitions to
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
to demonstrate the call shape; the change is no-op for every other
@@ -18,13 +22,13 @@ import V4L2-produced dmabufs and want to do implicit synchronization
the spec-clean way (poll(POLLIN) on the dmabuf fd, or
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:
1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's
`dma_resv` has no fences populated. The kernel substitutes
`dma_fence_get_stub()` which is permanently signalled. The compositor
1. A stub fence from dma_buf_export_sync_file(), because the dmabuf's
dma_resv has no fences populated. The kernel substitutes
dma_fence_get_stub() which is permanently signalled. The compositor
"successfully" waits on a fence that represents nothing real about
the producer's state.
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
same reason — `dma_buf_poll_add_cb` finds zero fences in the resv,
same reason — dma_buf_poll_add_cb finds zero fences in the resv,
triggers the wake callback inline, and reports POLLIN ready before
the producer has actually said anything.
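
For reference, the consumer-side wait described above is an ordinary
poll(2) call. The sketch below is a minimal illustration, not code
from any compositor: fd_pollin_ready and demo_pipe_stand_in are
hypothetical names, and the demo uses a pipe as a stand-in because a
real dmabuf fd needs a V4L2 or DRM exporter. On a real dmabuf fd,
POLLIN readiness is expected to follow the write fences in the
buffer's dma_resv.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Returns 1 if fd reports POLLIN within timeout_ms, 0 on timeout and
 * -1 on error (including an error/hangup event without POLLIN).
 * The timeout restarts from scratch if a signal interrupts poll().
 */
static int fd_pollin_ready(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;
	return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Stand-in demo: the pipe's read end plays the role of the dmabuf fd
 * and the write plays the role of the producer signalling completion.
 * Returns 0 if the fd was not ready before and ready after the write.
 */
static int demo_pipe_stand_in(void)
{
	int fds[2];
	char byte = 'x';
	int before, after;
	int rc = pipe(fds);

	assert(rc == 0);
	(void)rc;

	before = fd_pollin_ready(fds[0], 0);
	if (write(fds[1], &byte, 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	after = fd_pollin_ready(fds[0], 0);

	close(fds[0]);
	close(fds[1]);
	return (before == 0 && after == 1) ? 0 : -1;
}
```

With an empty resv, the "before" probe on a dmabuf would already
report ready, which is the behaviour item 2 describes.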
@@ -38,52 +42,48 @@ But:
- It's a contract gap. The kernel claims to expose implicit sync; it
does not, for V4L2 producers.
- It pays latency for nothing. Every Wayland frame from a V4L2
producer pays a DMA_BUF_IOCTL_EXPORT_SYNC_FILE round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
chrome video playback), this contributed to compositor stalls.
Removing the wait at the compositor level is a workaround, not a
fix.
- It blocks downstream consumers from doing the right thing. A
Wayland compositor that defensively waits on a sync_file gets a
stub-fence pass-through with no actual gating; if the V4L2 driver
ever has an out-of-band path that releases the buffer before
finishing the write (e.g. a reconfig-resize that DQBUFs everything),
there's no fence to gate on.
- It pays latency for nothing. Every Wayland frame from a V4L2
producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
chrome video playback), this is a measurable per-frame cost
contributing to compositor stalls. Removing the wait at the
compositor level (KWin) is a workaround, not a fix.
The right thing for the kernel to do is populate a real fence. This
series adds the minimal API and demonstrates the per-driver hookup
pattern.
finishing the write, there is no fence to gate on.
What
----
Patch 1 adds:
- `struct dma_fence *release_fence` to `struct vb2_buffer`
- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to
`struct vb2_queue`
- `vb2_buffer_attach_release_fence(vb)` — drivers call this from
their `buf_queue` callback. Allocates a `dma_fence` on the queue's
fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
- struct dma_fence *release_fence to struct vb2_buffer
- u64 dma_resv_fence_context + atomic64_t dma_resv_fence_seqno +
spinlock_t dma_resv_fence_lock to struct vb2_queue
- vb2_buffer_attach_release_fence(vb) — drivers call this from their
buf_queue callback. Allocates a dma_fence on the queue's fence
context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
dmabuf->resv. No-op for buffers without exported dmabufs.
- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)`
+ `dma_fence_put` if the fence was attached, so the producer's
completion signal lands in the resv synchronously with the userspace
DQBUF wakeup.
- vb2_buffer_done() extended to signal+put the fence if attached,
so the producer's completion signal lands in the resv synchronously
with the userspace DQBUF wakeup.
Patches 2 and 3 add a single call to the helper from `hantro_buf_queue`
and `rga_buf_queue` respectively. ~5 lines each.
Patches 2 and 3 add a single call to the helper from hantro_buf_queue
and rga_buf_queue respectively. Both are demonstration drivers; other
vb2 drivers can opt in incrementally with the same one-line change.
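
The per-queue timeline (one fence context per queue, a monotonically
increasing seqno per fence) can be modelled in userspace with C11
atomics. This is a sketch under assumptions: the model_* names are
hypothetical, and how the actual helper draws its seqno from
dma_resv_fence_seqno is not shown in this excerpt, so the
increment-then-return scheme here is a guess at the shape rather than
a copy of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; not the kernel's struct vb2_queue. */
struct model_queue {
	uint64_t fence_context;			/* timeline identifier */
	atomic_uint_fast64_t fence_seqno;	/* last seqno handed out */
};

/* Global context counter, modelling a fence-context allocator. */
static atomic_uint_fast64_t model_next_context = 1;

/* Give the queue its own timeline, starting with no fences issued. */
static void model_queue_init(struct model_queue *q)
{
	assert(q != NULL);
	q->fence_context = atomic_fetch_add(&model_next_context, 1);
	atomic_init(&q->fence_seqno, 0);
}

/* Hand out the next seqno on this queue's timeline (first value is 1). */
static uint64_t model_next_seqno(struct model_queue *q)
{
	assert(q != NULL);
	return atomic_fetch_add(&q->fence_seqno, 1) + 1;
}
```

Two queues initialised this way get distinct contexts, and each
counts its own seqnos independently, so fences from different queues
are never ordered against each other.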
Tested on
---------
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
Wayland. The test harness is the chromium-fourier patch series
(https://github.com/marfrit/fourier) — chromium plus a KWin patch that
*previously bypassed* `Transaction::watchDmaBuf` because the kernel-
side fence was stub-signalled. With this series applied, the bypass
becomes unnecessary; KWin's fence wait completes correctly because the
fence now signals when hantro completes the capture buffer write.
Wayland. The test harness is the chromium-fourier patch series at
https://github.com/marfrit/fourier — chromium plus a KWin patch
that *previously bypassed* Transaction::watchDmaBuf because the
kernel-side fence was stub-signalled. With this series applied, the
bypass becomes unnecessary; KWin's fence wait completes correctly
because the fence now signals when hantro completes the capture
buffer write.
End-to-end result before the kernel patch (chromium + Qt 6 patches +
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
@@ -100,8 +100,8 @@ What's missing in this RFC
- Other vb2-using drivers don't opt in. Each maintainer should look
at their driver and decide. The hantro + rga patches show the
shape; copying it to other drivers should be straightforward.
- For drivers that have intermediate image-processor stages
(e.g. CSI -> ISP -> user), the fence semantics across stage boundaries
- For drivers that have intermediate image-processor stages (e.g.
CSI -> ISP -> user), the fence semantics across stage boundaries
are out of scope here. This series only addresses the producer-to-
userspace edge.
- No selftest. videobuf2 doesn't have a great in-tree selftest harness
@@ -114,14 +114,28 @@ Reviews especially welcome on:
vb2-CAPTURE queues. Auto-on would force every driver to be audited;
opt-in is incremental and safer but leaves the contract gap for
drivers nobody touches.
- Whether `vb2_buffer_done` is the right place to signal vs. an
earlier hook (e.g. immediately after DMA-from-device finishes). For
hantro the two are effectively the same; for drivers with
asynchronous post-processing they may differ.
- The choice of `DMA_RESV_USAGE_WRITE` vs the older
`dma_resv_set_excl_fence` semantics. We're emitting the producer's
write completion, so WRITE matches dma-buf documentation, but I'd
appreciate a sanity check.
- Whether vb2_buffer_done is the right place to signal vs. an earlier
hook (e.g. immediately after DMA-from-device finishes). For hantro
the two are effectively the same; for drivers with asynchronous
post-processing they may differ.
- The choice of DMA_RESV_USAGE_WRITE — we are emitting the producer's
write completion, so WRITE matches dma-buf documentation, but a
sanity check is welcome.
Cheers,
Markus
Markus Fritsche (3):
media: videobuf2: add dma_resv release-fence helper
media: hantro: attach dma_resv release fence at buf_queue
media: rockchip-rga: attach dma_resv release fence at buf_queue
.../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++
.../media/platform/verisilicon/hantro_v4l2.c | 12 +++
include/media/videobuf2-core.h | 29 ++++++
4 files changed, 146 insertions(+)
--
2.47.3
@@ -1,73 +1,71 @@
From 1f7a526331061ad767b2eb8401b0d28984888ae6 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 1/3] media: videobuf2: add dma_resv release-fence helper
Date: 2026-04-28
Date: Tue, 28 Apr 2026 19:23:50 +0000
Subject: [PATCH 1/3] media: videobuf2: add dma_resv release-fence helper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Add an opt-in API that lets vb2 producers populate a `dma_resv`
Add an opt-in API that lets vb2 producers populate a dma_resv
exclusive write fence on the dmabufs they export to userspace,
signalled when the buffer transitions to VB2_BUF_STATE_DONE.
Drivers that opt in call `vb2_buffer_attach_release_fence(vb)` from
their `buf_queue` callback after `v4l2_m2m_buf_queue` (or equivalent).
The helper:
V4L2 producers historically don't propagate buffer-state-done into
the dmabuf's dma_resv exclusive fence. Userspace consumers that
import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see
either zero fences or a stub fence from dma_fence_get_stub(). This
is correct by accident for the common case (clients call DQBUF
before importing) but represents a contract gap.
- allocates a dma_fence on the queue's fence context (set up at
vb2_core_queue_init time),
- attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv,
- stashes the fence in `vb->release_fence`.
Drivers opt in by calling vb2_buffer_attach_release_fence(vb) from
their buf_queue callback. The helper allocates a dma_fence on the
queue's fence context (set up at vb2_core_queue_init), attaches it
as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes
it in vb->release_fence. vb2_buffer_done signals + puts the fence
as part of its state transition.
`vb2_buffer_done` then signals and puts the fence as part of its
existing buffer-state transition, so the userspace consumer that
imported the dmabuf and is poll(POLLIN)-ing it (or waiting on a
sync_file from `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) sees the fence
become readable synchronously with the DQBUF wakeup.
For drivers that don't opt in, vb->release_fence stays NULL and
the signal path is a no-op.
For drivers that don't opt in, the new field stays NULL and
`vb2_buffer_done` skips the signal path. No-op for every driver
that doesn't call the new helper.
Skips planes whose `vb2_plane.dbuf` is NULL — buffers that have
never been exported via VIDIOC_EXPBUF (or imported via
V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on.
Skips planes whose vb2_plane.dbuf is NULL — buffers never exported
via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no
dmabuf for userspace to wait on.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/common/videobuf2/videobuf2-core.c | 116 ++++++++++++++++
include/media/videobuf2-core.h | 19 +++
2 files changed, 135 insertions(+)
.../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
include/media/videobuf2-core.h | 29 ++++++
2 files changed, 124 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index b0523fc23..ee766aae0 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -22,6 +22,9 @@
@@ -26,6 +26,9 @@
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/version.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-resv.h>
+#include <linux/dma-buf.h>
#include <media/videobuf2-core.h>
#include <media/v4l2-mc.h>
@@ -1175,6 +1178,107 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
call_void_vb_qop(vb, buf_queue, vb);
@@ -1179,6 +1182,86 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no)
}
EXPORT_SYMBOL_GPL(vb2_plane_cookie);
+/*
+ * dma_resv release-fence integration.
+ *
+ * Background: V4L2 producers (vb2-using drivers) historically did not
+ * propagate buffer-state-done into the dmabuf's dma_resv exclusive
+ * fence. Userspace consumers that imported V4L2-produced dmabufs and
+ * tried to do implicit synchronization the spec-clean way
+ * (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) got either zero
+ * fences or a stub fence from dma_fence_get_stub(). This is correct
+ * by accident for the common case (clients call DQBUF before
+ * importing) but represents a contract gap.
+ *
+ * The opt-in API below lets a driver attach a real fence at QBUF
+ * time and have it signalled at vb2_buffer_done. Drivers opt in by
+ * calling vb2_buffer_attach_release_fence(vb) from their buf_queue
+ * callback. No behaviour change for drivers that don't opt in.
+ * V4L2 producers historically don't propagate buffer-state-done into
+ * the dmabuf's dma_resv exclusive fence. Userspace consumers that
+ * wait on that fence (e.g. wayland compositors via poll(POLLIN) or
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see either no fences or
+ * a stub fence from dma_fence_get_stub(). The opt-in API below lets
+ * a driver attach a real producer fence at QBUF time and have it
+ * signalled by vb2_buffer_done().
+ */
+
+static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence)
@@ -85,21 +83,6 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
+ .get_timeline_name = vb2_dma_resv_get_timeline_name,
+};
+
+/**
+ * vb2_buffer_attach_release_fence() - attach a dma_resv exclusive fence
+ * to each of @vb's plane dmabufs, to be signalled when the buffer
+ * transitions to VB2_BUF_STATE_DONE.
+ *
+ * @vb: the buffer being queued to the producer (just-completed
+ * transition out of VB2_BUF_STATE_QUEUED into DRIVER-owned).
+ *
+ * Drivers should call this from their buf_queue callback (after the
+ * driver-internal queueing — e.g. after v4l2_m2m_buf_queue() for
+ * M2M drivers). Planes whose dbuf is NULL are skipped silently.
+ *
+ * Returns 0 on success, negative errno on allocation failure. On
+ * error, no fence is attached and vb->release_fence remains NULL.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{
+ struct vb2_queue *q = vb->vb2_queue;
@@ -128,10 +111,10 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
+ dma_resv_unlock(dbuf->resv);
+ }
+
+ /* Hold one reference for the eventual signal in vb2_buffer_done. */
+ /* One reference for the eventual signal in vb2_buffer_done. */
+ vb->release_fence = dma_fence_get(fence);
+
+ /* The dma_resv held its own references for each plane. Drop ours. */
+ /* The dma_resv held its own reference per plane. Drop ours. */
+ dma_fence_put(fence);
+
+ return 0;
@@ -153,67 +136,61 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
+ vb->release_fence = NULL;
+}
+
static int __enqueue_in_driver_with_request(struct vb2_buffer *vb)
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
{
if (vb->req_obj.req) {
@@ -1182,12 +1286,15 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
dprintk(q, 4, "done processing on buffer %d, state: %s\n",
vb->index, vb2_state_name(state));
struct vb2_queue *q = vb->vb2_queue;
@@ -1205,6 +1288,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
if (state != VB2_BUF_STATE_QUEUED)
__vb2_buf_mem_finish(vb);
+ if (state != VB2_BUF_STATE_QUEUED)
+ vb2_buffer_signal_release_fence(vb, state);
+
spin_lock_irqsave(&q->done_lock, flags);
if (state == VB2_BUF_STATE_QUEUED) {
vb->state = VB2_BUF_STATE_QUEUED;
} else {
@@ -2598,6 +2705,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
@@ -2652,6 +2738,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
mutex_init(&q->mmap_lock);
init_waitqueue_head(&q->done_wq);
+ /*
+ * Per-queue dma_resv fence context. Drivers that opt into
+ * vb2_buffer_attach_release_fence() use these to allocate
+ * fences in their own timeline; drivers that don't opt in
+ * pay only the four-byte cost of an unused field.
+ * Per-queue dma_resv release-fence context. Drivers opt-in via
+ * vb2_buffer_attach_release_fence(); other drivers pay only the
+ * cost of the unused fields.
+ */
+ q->dma_resv_fence_context = dma_fence_context_alloc(1);
+ atomic64_set(&q->dma_resv_fence_seqno, 0);
+ spin_lock_init(&q->dma_resv_fence_lock);
+
q->memory = VB2_MEMORY_UNKNOWN;
if (q->buf_struct_size == 0)
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 9b02aeba4..2bf3272d4 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -19,6 +19,7 @@
#include <linux/dma-buf.h>
#include <linux/bitops.h>
#include <media/media-request.h>
#include <media/frame_vector.h>
+struct dma_fence;
@@ -286,6 +287,12 @@ struct vb2_buffer {
@@ -288,6 +288,12 @@ struct vb2_buffer {
unsigned int skip_cache_sync_on_finish:1;
struct vb2_plane planes[VB2_MAX_PLANES];
+ /*
+ * dma_resv release fence — set by vb2_buffer_attach_release_fence()
+ * (driver opt-in from buf_queue), signalled by vb2_buffer_done.
+ * NULL for drivers that don't opt in.
+ * (driver opt-in from buf_queue), signalled and put by
+ * vb2_buffer_done(). NULL for drivers that don't opt in.
+ */
+ struct dma_fence *release_fence;
struct list_head queued_entry;
struct list_head done_entry;
@@ -645,6 +652,11 @@ struct vb2_queue {
#ifdef CONFIG_VIDEO_ADV_DEBUG
@@ -658,6 +664,15 @@ struct vb2_queue {
spinlock_t done_lock;
wait_queue_head_t done_wq;
+ /* dma_resv release-fence integration (opt-in per buffer). */
+ /*
+ * Per-queue dma_resv release-fence context. Drivers that opt
+ * into vb2_buffer_attach_release_fence() use these to allocate
+ * fences on a single per-queue timeline.
+ */
+ u64 dma_resv_fence_context;
+ atomic64_t dma_resv_fence_seqno;
+ spinlock_t dma_resv_fence_lock;
@@ -221,20 +198,27 @@ diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
unsigned int streaming:1;
unsigned int start_streaming_called:1;
unsigned int error:1;
@@ -750,6 +762,13 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
@@ -747,6 +762,20 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no);
*/
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
+/**
+ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence.
+ * Called from a driver's buf_queue callback after enqueueing the
+ * buffer in the driver's own queue. See videobuf2-core.c for
+ * rationale and call shape.
+ * @vb: the buffer being queued to the producer.
+ *
+ * Drivers call this from their buf_queue callback to attach an
+ * exclusive write fence to each plane's dmabuf->resv. The fence
+ * is signalled and put by vb2_buffer_done() when the buffer
+ * transitions to VB2_BUF_STATE_DONE / _ERROR. Skips planes whose
+ * dbuf is NULL.
+ *
+ * Returns 0 on success, negative errno on allocation failure.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb);
+
/**
* vb2_discard_done() - discard all buffers marked as DONE.
* @q: pointer to &struct vb2_queue with videobuf2 queue.
--
2.44.0
--
2.47.3
@@ -0,0 +1,56 @@
From 91522b562665b94607337a3f30d1586f818d9387 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:50 +0000
Subject: [PATCH 2/3] media: hantro: attach dma_resv release fence at buf_queue
Opt the hantro driver into the new vb2 release-fence helper.
When userspace QBUFs a buffer to hantro, the buffer is added to the
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv
gets a real producer fence attached. The fence is signalled by
vb2_buffer_done when hantro completes the decode (via
v4l2_m2m_buf_done_and_job_finish in hantro_drv.c, which converges
on vb2_buffer_done).
Wayland compositors (and any other userspace) that import hantro
CAPTURE buffers and wait on the dmabuf's implicit-sync fence now
wait on a real fence representing the producer's actual completion,
not a stub. Validated end-to-end on PineTab2 (RK3566 / Mali-G52 /
mainline 6.19 with this series backported) playing 1080p30 H.264 in
chromium under stock KDE Plasma 6.6.4 Wayland: KWin's
Transaction::watchDmaBuf wait completes correctly the moment
hantro's IRQ fires, instead of falling back to a stub-resolved
poll.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/verisilicon/hantro_v4l2.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
index 62d3962c1..e95a3433a 100644
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -877,6 +877,18 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
}
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path. Userspace
+ * consumers that imported this buffer's dmabuf and wait on
+ * its implicit-sync fence (poll(POLLIN) or
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) get a real producer fence
+ * representing this device's completion, instead of the stub
+ * fence dma_buf_export_sync_file substitutes when dma_resv
+ * is empty. Best-effort: a fence-allocation failure means we
+ * lose implicit-sync precision, no functional regression.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
static bool hantro_vq_is_coded(struct vb2_queue *q)
--
2.47.3
@@ -1,79 +0,0 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue
Date: 2026-04-28
Opt the hantro driver into the new vb2 release-fence helper.
When userspace QBUFs a buffer to hantro, the buffer is added to the
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv gets
a real producer fence attached. The fence is signalled by vb2_buffer_done
when hantro completes the decode (via v4l2_m2m_buf_done_and_job_finish
in hantro_drv.c, which converges on vb2_buffer_done).
Wayland compositors that import hantro CAPTURE buffers (chrome,
firefox, mpv, gstreamer) and wait on the dmabuf's implicit-sync
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) now wait on a
real fence representing the producer's actual completion, not a
stub. KWin's `Transaction::watchDmaBuf` path on Mali-class hardware
is the user-visible benefit: the per-frame sync_file roundtrip
completes correctly the moment hantro's IRQ handler runs, instead
of either polling on a stub fence or — in the failure mode that
drove this work — failing to signal at all due to a race that the
stub-fence path masked.
Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this
series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in
chromium under stock KDE Plasma 6.6.4 Wayland: the chrome stall that
required a KWin watchDmaBuf bypass workaround (kwin-fourier in the
chromium-fourier project) is gone with this kernel-side fix in
place; KWin's wait completes correctly.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
{
struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) &&
vb2_is_streaming(vb->vb2_queue) &&
v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) {
unsigned int i;
for (i = 0; i < vb->num_planes; i++)
vb2_set_plane_payload(vb, i, 0);
vbuf->field = V4L2_FIELD_NONE;
vbuf->sequence =
ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++;
v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
return;
}
- v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+ v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path: any userspace
+ * consumer that imported this buffer's dmabuf and is doing
+ * implicit-sync via poll(POLLIN) or
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence
+ * representing this device's completion, instead of the stub
+ * fence dma_buf_export_sync_file substitutes when dma_resv is
+ * empty. Best-effort: if fence allocation fails we just lose
+ * the implicit-sync precision, no functional regression.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
const struct vb2_ops hantro_queue_ops = {
--
2.44.0
@@ -0,0 +1,48 @@
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:51 +0000
Subject: [PATCH 3/3] media: rockchip-rga: attach dma_resv release fence at
buf_queue
Opt the Rockchip RGA driver into the new vb2 release-fence helper.
Same shape as the hantro patch: rga_buf_queue enqueues the buffer
in the driver's m2m queue via v4l2_m2m_buf_queue and additionally
attaches a release fence to each plane's dmabuf->resv via
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the
fence when RGA completes the M2M operation.
Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows on Rockchip
boards) get spec-clean implicit-sync semantics, matching what
hantro now does in the same patch series.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
index 70808049d..5557ca632 100644
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
@@ -153,6 +153,16 @@ static void rga_buf_queue(struct vb2_buffer *vb)
struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path so userspace
+ * consumers of RGA-produced dmabufs get a real producer fence
+ * to wait on instead of the dma_buf core's stub fence. See
+ * the leading patch in this series for rationale. Best-effort:
+ * fence-allocation failure means we lose implicit-sync
+ * precision but the m2m operation itself proceeds normally.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
static void rga_buf_cleanup(struct vb2_buffer *vb)
--
2.47.3
@@ -1,47 +0,0 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 3/3] media: rockchip-rga: attach dma_resv release fence at buf_queue
Date: 2026-04-28
Opt the Rockchip RGA driver into the new vb2 release-fence helper.
Same shape as the hantro patch: the existing buf_queue path enqueues
the buffer in the driver's m2m queue via v4l2_m2m_buf_queue, and we
additionally attach a release fence to each plane's dmabuf->resv via
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the fence
when RGA completes the M2M operation.
Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows) get spec-clean
implicit-sync semantics, matching what hantro now does in the same
patch series.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rga/rga-buf.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
@@ -150,7 +150,18 @@ static void rga_buf_queue(struct vb2_buffer *vb)
{
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path so userspace
+ * consumers of RGA-produced dmabufs get a real producer fence
+ * to wait on instead of the dma_buf core's substitute stub
+ * fence. See the leading patch in this series for rationale
+ * and the helper definition. Best-effort: a fence-allocation
+ * failure means we lose implicit-sync precision but the m2m
+ * operation itself proceeds normally.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
static void rga_buf_cleanup(struct vb2_buffer *vb)
--
2.44.0
+14 -4
@@ -34,10 +34,20 @@ their respective `buf_queue` callbacks).
## Status
Patches drafted but **not yet applied / compile-tested / runtime-
tested.** They're written against linux-next master as of
2026-04-28 (sparse-checked-out at `/tmp/hantro-src` during the
chromium-fourier campaign on ohm). Pre-flight before sending:
**Patches apply cleanly to Linux 6.12 mainline via `git am`**,
verified against `/tmp/hantro-src` (sparse-checked-out v6.12 plus
linux-next master). All kernel API calls verified to match real
signatures in `include/linux/dma-fence.h` and
`include/linux/dma-resv.h`:
- `dma_fence_init(fence, ops, lock, context, seqno)`
- `dma_resv_add_fence(obj, fence, usage)`
- `DMA_RESV_USAGE_WRITE` enum present ✓
- `dma_fence_signal`, `dma_fence_set_error`, `dma_fence_get`,
`dma_fence_put`, `dma_fence_context_alloc`
- `dma_resv_lock(obj, NULL)`, `dma_resv_unlock`
Remaining gates before sending to linux-media:
1. **Compile** — `make drivers/media/common/videobuf2/videobuf2-core.o
drivers/media/platform/verisilicon/hantro_v4l2.o