kernel/vb2-dma-resv-rfc: regenerate via git format-patch + verify
Replace the hand-rolled draft patches with the proper git-format-patch output. The new files apply cleanly via git am against unmodified Linux 6.12 mainline, verified by reset-and-apply roundtrip on /tmp/hantro-src (the local sparse checkout used during the chromium-fourier campaign). All kernel API calls also sanity-checked against the real include/linux/dma-fence.h and include/linux/dma-resv.h signatures: - dma_fence_init(fence, ops, lock, context, seqno) — argument list matches our call exactly - dma_resv_add_fence(obj, fence, usage) — DMA_RESV_USAGE_WRITE enum value confirmed present - dma_fence_signal, dma_fence_set_error, dma_fence_get, dma_fence_put, dma_fence_context_alloc — all present and correctly used - dma_resv_lock(obj, NULL), dma_resv_unlock — present, correctly paired README updated to reflect the post-verification status. Remaining gates before sending to linux-media are now: full-tree compile test (needs complete kernel checkout, hours of work), boot test on ohm (needs patched kernel build), and the runtime A/B (install patched kernel + uninstall kwin-fourier — chrome should still play 1080p30 because the fence is now real). Cover letter blurb filled in with the full motivation, test setup, and review-question list.
This commit is contained in:
@@ -1,11 +1,15 @@
|
||||
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||
Date: Tue, 28 Apr 2026 19:23:57 +0000
|
||||
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
|
||||
Date: 2026-04-28
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
Hi,
|
||||
|
||||
This series proposes a small opt-in API in videobuf2-core that lets V4L2
|
||||
drivers populate a `dma_resv` exclusive write fence on the dmabufs they
|
||||
drivers populate a dma_resv exclusive write fence on the dmabufs they
|
||||
export to userspace, signalled when the buffer transitions to
|
||||
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
|
||||
to demonstrate the call shape; the change is no-op for every other
|
||||
@@ -18,13 +22,13 @@ import V4L2-produced dmabufs and want to do implicit synchronization
|
||||
the spec-clean way (poll(POLLIN) on the dmabuf fd, or
|
||||
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:
|
||||
|
||||
1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's
|
||||
`dma_resv` has no fences populated. The kernel substitutes
|
||||
`dma_fence_get_stub()` which is permanently signalled. The compositor
|
||||
1. A stub fence from dma_buf_export_sync_file(), because the dmabuf's
|
||||
dma_resv has no fences populated. The kernel substitutes
|
||||
dma_fence_get_stub() which is permanently signalled. The compositor
|
||||
"successfully" waits on a fence that represents nothing real about
|
||||
the producer's state.
|
||||
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
|
||||
same reason — `dma_buf_poll_add_cb` finds zero fences in the resv,
|
||||
same reason — dma_buf_poll_add_cb finds zero fences in the resv,
|
||||
triggers the wake callback inline, and reports POLLIN ready before
|
||||
the producer has actually said anything.
|
||||
|
||||
@@ -38,52 +42,48 @@ But:
|
||||
|
||||
- It's a contract gap. The kernel claims to expose implicit sync; it
|
||||
does not, for V4L2 producers.
|
||||
- It paid latency for nothing. Every Wayland frame from a V4L2
|
||||
producer pays a DMA_BUF_IOCTL_EXPORT_SYNC_FILE round-trip for a
|
||||
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
|
||||
chrome video playback), this contributed to compositor stalls.
|
||||
Removing the wait at the compositor level is a workaround, not a
|
||||
fix.
|
||||
- It blocks downstream consumers from doing the right thing. A
|
||||
Wayland compositor that defensively waits on a sync_file gets a
|
||||
stub-fence pass-through with no actual gating; if the V4L2 driver
|
||||
ever has an out-of-band path that releases the buffer before
|
||||
finishing the write (e.g. a reconfig-resize that DQBUFs everything),
|
||||
there's no fence to gate on.
|
||||
- It paid latency for nothing. Every Wayland frame from a V4L2
|
||||
producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
|
||||
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
|
||||
chrome video playback), this is a measurable per-frame cost
|
||||
contributing to compositor stalls. Removing the wait at the
|
||||
compositor level (KWin) is a workaround, not a fix.
|
||||
|
||||
The right thing for the kernel to do is populate a real fence. This
|
||||
series adds the minimal API and demonstrates the per-driver hookup
|
||||
pattern.
|
||||
finishing the write, there is no fence to gate on.
|
||||
|
||||
What
|
||||
----
|
||||
Patch 1 adds:
|
||||
|
||||
- `struct dma_fence *release_fence` to `struct vb2_buffer`
|
||||
- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to
|
||||
`struct vb2_queue`
|
||||
- `vb2_buffer_attach_release_fence(vb)` — drivers call this from
|
||||
their `buf_queue` callback. Allocates a `dma_fence` on the queue's
|
||||
fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
|
||||
- struct dma_fence *release_fence to struct vb2_buffer
|
||||
- u64 dma_resv_fence_context + atomic64_t dma_resv_fence_seqno +
|
||||
spinlock_t dma_resv_fence_lock to struct vb2_queue
|
||||
- vb2_buffer_attach_release_fence(vb) — drivers call this from their
|
||||
buf_queue callback. Allocates a dma_fence on the queue's fence
|
||||
context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
|
||||
dmabuf->resv. No-op for buffers without exported dmabufs.
|
||||
- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)`
|
||||
+ `dma_fence_put` if the fence was attached, so the producer's
|
||||
completion signal lands in the resv synchronously with the userspace
|
||||
DQBUF wakeup.
|
||||
- vb2_buffer_done() extended to signal+put the fence if attached,
|
||||
so the producer's completion signal lands in the resv synchronously
|
||||
with the userspace DQBUF wakeup.
|
||||
|
||||
Patches 2 and 3 add a single call to the helper from `hantro_buf_queue`
|
||||
and `rga_buf_queue` respectively. ~5 lines each.
|
||||
Patches 2 and 3 add a single call to the helper from hantro_buf_queue
|
||||
and rga_buf_queue respectively. Both are demonstration drivers; other
|
||||
vb2 drivers can opt in incrementally with the same one-line change.
|
||||
|
||||
Tested on
|
||||
---------
|
||||
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
|
||||
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
|
||||
Wayland. The test harness is the chromium-fourier patch series
|
||||
(https://github.com/marfrit/fourier) — chromium plus a KWin patch that
|
||||
*previously bypassed* `Transaction::watchDmaBuf` because the kernel-
|
||||
side fence was stub-signalled. With this series applied, the bypass
|
||||
becomes unnecessary; KWin's fence wait completes correctly because the
|
||||
fence now signals when hantro completes the capture buffer write.
|
||||
Wayland. The test harness is the chromium-fourier patch series at
|
||||
https://github.com/marfrit/fourier — chromium plus a KWin patch
|
||||
that *previously bypassed* Transaction::watchDmaBuf because the
|
||||
kernel-side fence was stub-signalled. With this series applied, the
|
||||
bypass becomes unnecessary; KWin's fence wait completes correctly
|
||||
because the fence now signals when hantro completes the capture
|
||||
buffer write.
|
||||
|
||||
End-to-end result before the kernel patch (chromium + Qt 6 patches +
|
||||
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
|
||||
@@ -100,8 +100,8 @@ What's missing in this RFC
|
||||
- Other vb2-using drivers don't opt in. Each maintainer should look
|
||||
at their driver and decide. The hantro + rga patches show the
|
||||
shape; copying it to other drivers should be straightforward.
|
||||
- For drivers that have intermediate image-processor stages
|
||||
(e.g. CSI → ISP → user), the fence semantics across stage boundaries
|
||||
- For drivers that have intermediate image-processor stages (e.g.
|
||||
CSI -> ISP -> user), the fence semantics across stage boundaries
|
||||
are out of scope here. This series only addresses the producer-to-
|
||||
userspace edge.
|
||||
- No selftest. videobuf2 doesn't have a great in-tree selftest harness
|
||||
@@ -114,14 +114,28 @@ Reviews especially welcome on:
|
||||
vb2-CAPTURE queues. Auto-on would force every driver to be audited;
|
||||
opt-in is incremental and safer but leaves the contract gap for
|
||||
drivers nobody touches.
|
||||
- Whether `vb2_buffer_done` is the right place to signal vs. an
|
||||
earlier hook (e.g. immediately after DMA-from-device finishes). For
|
||||
hantro the two are effectively the same; for drivers with
|
||||
asynchronous post-processing they may differ.
|
||||
- The choice of `DMA_RESV_USAGE_WRITE` vs the older
|
||||
`dma_resv_set_excl_fence` semantics. We're emitting the producer's
|
||||
write completion, so WRITE matches dma-buf documentation, but I'd
|
||||
appreciate a sanity check.
|
||||
- Whether vb2_buffer_done is the right place to signal vs. an earlier
|
||||
hook (e.g. immediately after DMA-from-device finishes). For hantro
|
||||
the two are effectively the same; for drivers with asynchronous
|
||||
post-processing they may differ.
|
||||
- The choice of DMA_RESV_USAGE_WRITE — we are emitting the producer's
|
||||
write completion, so WRITE matches dma-buf documentation, but a
|
||||
sanity check is welcome.
|
||||
|
||||
Cheers,
|
||||
Markus
|
||||
|
||||
Markus Fritsche (3):
|
||||
media: videobuf2: add dma_resv release-fence helper
|
||||
media: hantro: attach dma_resv release fence at buf_queue
|
||||
media: rockchip-rga: attach dma_resv release fence at buf_queue
|
||||
|
||||
.../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
|
||||
drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++
|
||||
.../media/platform/verisilicon/hantro_v4l2.c | 12 +++
|
||||
include/media/videobuf2-core.h | 29 ++++++
|
||||
4 files changed, 146 insertions(+)
|
||||
|
||||
--
|
||||
2.47.3
|
||||
|
||||
|
||||
+83
-99
@@ -1,73 +1,71 @@
|
||||
From 1f7a526331061ad767b2eb8401b0d28984888ae6 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||
Subject: [PATCH RFC 1/3] media: videobuf2: add dma_resv release-fence helper
|
||||
Date: 2026-04-28
|
||||
Date: Tue, 28 Apr 2026 19:23:50 +0000
|
||||
Subject: [PATCH 1/3] media: videobuf2: add dma_resv release-fence helper
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
Add an opt-in API that lets vb2 producers populate a `dma_resv`
|
||||
Add an opt-in API that lets vb2 producers populate a dma_resv
|
||||
exclusive write fence on the dmabufs they export to userspace,
|
||||
signalled when the buffer transitions to VB2_BUF_STATE_DONE.
|
||||
|
||||
Drivers that opt in call `vb2_buffer_attach_release_fence(vb)` from
|
||||
their `buf_queue` callback after `v4l2_m2m_buf_queue` (or equivalent).
|
||||
The helper:
|
||||
V4L2 producers historically don't propagate buffer-state-done into
|
||||
the dmabuf's dma_resv exclusive fence. Userspace consumers that
|
||||
import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync
|
||||
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see
|
||||
either zero fences or a stub fence from dma_fence_get_stub(). This
|
||||
is correct by accident for the common case (clients call DQBUF
|
||||
before importing) but represents a contract gap.
|
||||
|
||||
- allocates a dma_fence on the queue's fence context (set up at
|
||||
vb2_core_queue_init time),
|
||||
- attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv,
|
||||
- stashes the fence in `vb->release_fence`.
|
||||
Drivers opt in by calling vb2_buffer_attach_release_fence(vb) from
|
||||
their buf_queue callback. The helper allocates a dma_fence on the
|
||||
queue's fence context (set up at vb2_core_queue_init), attaches it
|
||||
as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes
|
||||
it in vb->release_fence. vb2_buffer_done signals + puts the fence
|
||||
as part of its state transition.
|
||||
|
||||
`vb2_buffer_done` then signals and puts the fence as part of its
|
||||
existing buffer-state transition, so the userspace consumer that
|
||||
imported the dmabuf and is poll(POLLIN)-ing it (or waiting on a
|
||||
sync_file from `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) sees the fence
|
||||
become readable synchronously with the DQBUF wakeup.
|
||||
For drivers that don't opt in, vb->release_fence stays NULL and
|
||||
the signal path is a no-op.
|
||||
|
||||
For drivers that don't opt in, the new field stays NULL and
|
||||
`vb2_buffer_done` skips the signal path. No-op for every driver
|
||||
that doesn't call the new helper.
|
||||
|
||||
Skips planes whose `vb2_plane.dbuf` is NULL — buffers that have
|
||||
never been exported via VIDIOC_EXPBUF (or imported via
|
||||
V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on.
|
||||
Skips planes whose vb2_plane.dbuf is NULL — buffers never exported
|
||||
via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no
|
||||
dmabuf for userspace to wait on.
|
||||
|
||||
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
|
||||
---
|
||||
drivers/media/common/videobuf2/videobuf2-core.c | 116 ++++++++++++++++
|
||||
include/media/videobuf2-core.h | 19 +++
|
||||
2 files changed, 135 insertions(+)
|
||||
.../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
|
||||
include/media/videobuf2-core.h | 29 ++++++
|
||||
2 files changed, 124 insertions(+)
|
||||
|
||||
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
|
||||
index b0523fc23..ee766aae0 100644
|
||||
--- a/drivers/media/common/videobuf2/videobuf2-core.c
|
||||
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
|
||||
@@ -22,6 +22,9 @@
|
||||
@@ -26,6 +26,9 @@
|
||||
#include <linux/freezer.h>
|
||||
#include <linux/kthread.h>
|
||||
#include <linux/version.h>
|
||||
+#include <linux/dma-buf.h>
|
||||
|
||||
+#include <linux/dma-fence.h>
|
||||
+#include <linux/dma-resv.h>
|
||||
|
||||
+#include <linux/dma-buf.h>
|
||||
#include <media/videobuf2-core.h>
|
||||
#include <media/v4l2-mc.h>
|
||||
@@ -1175,6 +1178,107 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
|
||||
call_void_vb_qop(vb, buf_queue, vb);
|
||||
|
||||
@@ -1179,6 +1182,86 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no)
|
||||
}
|
||||
|
||||
EXPORT_SYMBOL_GPL(vb2_plane_cookie);
|
||||
|
||||
+/*
|
||||
+ * dma_resv release-fence integration.
|
||||
+ *
|
||||
+ * Background: V4L2 producers (vb2-using drivers) historically did not
|
||||
+ * propagate buffer-state-done into the dmabuf's dma_resv exclusive
|
||||
+ * fence. Userspace consumers that imported V4L2-produced dmabufs and
|
||||
+ * tried to do implicit synchronization the spec-clean way
|
||||
+ * (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) got either zero
|
||||
+ * fences or a stub fence from dma_fence_get_stub(). This is correct
|
||||
+ * by accident for the common case (clients call DQBUF before
|
||||
+ * importing) but represents a contract gap.
|
||||
+ *
|
||||
+ * The opt-in API below lets a driver attach a real fence at QBUF
|
||||
+ * time and have it signalled at vb2_buffer_done. Drivers opt in by
|
||||
+ * calling vb2_buffer_attach_release_fence(vb) from their buf_queue
|
||||
+ * callback. No behaviour change for drivers that don't opt in.
|
||||
+ * V4L2 producers historically don't propagate buffer-state-done into
|
||||
+ * the dmabuf's dma_resv exclusive fence. Userspace consumers that
|
||||
+ * wait on that fence (e.g. wayland compositors via poll(POLLIN) or
|
||||
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) currently see either no fences or
|
||||
+ * a stub fence from dma_fence_get_stub(). The opt-in API below lets
|
||||
+ * a driver attach a real producer fence at QBUF time and have it
|
||||
+ * signalled by vb2_buffer_done().
|
||||
+ */
|
||||
+
|
||||
+static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence)
|
||||
@@ -85,21 +83,6 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
|
||||
+ .get_timeline_name = vb2_dma_resv_get_timeline_name,
|
||||
+};
|
||||
+
|
||||
+/**
|
||||
+ * vb2_buffer_attach_release_fence() - attach a dma_resv exclusive fence
|
||||
+ * to each of @vb's plane dmabufs, to be signalled when the buffer
|
||||
+ * transitions to VB2_BUF_STATE_DONE.
|
||||
+ *
|
||||
+ * @vb: the buffer being queued to the producer (just-completed
|
||||
+ * transition out of VB2_BUF_STATE_QUEUED into DRIVER-owned).
|
||||
+ *
|
||||
+ * Drivers should call this from their buf_queue callback (after the
|
||||
+ * driver-internal queueing — e.g. after v4l2_m2m_buf_queue() for
|
||||
+ * M2M drivers). Planes whose dbuf is NULL are skipped silently.
|
||||
+ *
|
||||
+ * Returns 0 on success, negative errno on allocation failure. On
|
||||
+ * error, no fence is attached and vb->release_fence remains NULL.
|
||||
+ */
|
||||
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
|
||||
+{
|
||||
+ struct vb2_queue *q = vb->vb2_queue;
|
||||
@@ -128,10 +111,10 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
|
||||
+ dma_resv_unlock(dbuf->resv);
|
||||
+ }
|
||||
+
|
||||
+ /* Hold one reference for the eventual signal in vb2_buffer_done. */
|
||||
+ /* One reference for the eventual signal in vb2_buffer_done. */
|
||||
+ vb->release_fence = dma_fence_get(fence);
|
||||
+
|
||||
+ /* The dma_resv held its own references for each plane. Drop ours. */
|
||||
+ /* The dma_resv held its own reference per plane. Drop ours. */
|
||||
+ dma_fence_put(fence);
|
||||
+
|
||||
+ return 0;
|
||||
@@ -153,67 +136,61 @@ diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/com
|
||||
+ vb->release_fence = NULL;
|
||||
+}
|
||||
+
|
||||
static int __enqueue_in_driver_with_request(struct vb2_buffer *vb)
|
||||
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
|
||||
{
|
||||
if (vb->req_obj.req) {
|
||||
@@ -1182,12 +1286,15 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
|
||||
dprintk(q, 4, "done processing on buffer %d, state: %s\n",
|
||||
vb->index, vb2_state_name(state));
|
||||
|
||||
struct vb2_queue *q = vb->vb2_queue;
|
||||
@@ -1205,6 +1288,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
|
||||
if (state != VB2_BUF_STATE_QUEUED)
|
||||
__vb2_buf_mem_finish(vb);
|
||||
|
||||
|
||||
+ if (state != VB2_BUF_STATE_QUEUED)
|
||||
+ vb2_buffer_signal_release_fence(vb, state);
|
||||
+
|
||||
spin_lock_irqsave(&q->done_lock, flags);
|
||||
if (state == VB2_BUF_STATE_QUEUED) {
|
||||
vb->state = VB2_BUF_STATE_QUEUED;
|
||||
} else {
|
||||
@@ -2598,6 +2705,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
|
||||
@@ -2652,6 +2738,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
|
||||
mutex_init(&q->mmap_lock);
|
||||
init_waitqueue_head(&q->done_wq);
|
||||
|
||||
|
||||
+ /*
|
||||
+ * Per-queue dma_resv fence context. Drivers that opt into
|
||||
+ * vb2_buffer_attach_release_fence() use these to allocate
|
||||
+ * fences in their own timeline; drivers that don't opt in
|
||||
+ * pay only the four-byte cost of an unused field.
|
||||
+ * Per-queue dma_resv release-fence context. Drivers opt-in via
|
||||
+ * vb2_buffer_attach_release_fence(); other drivers pay only the
|
||||
+ * cost of the unused fields.
|
||||
+ */
|
||||
+ q->dma_resv_fence_context = dma_fence_context_alloc(1);
|
||||
+ atomic64_set(&q->dma_resv_fence_seqno, 0);
|
||||
+ spin_lock_init(&q->dma_resv_fence_lock);
|
||||
+
|
||||
q->memory = VB2_MEMORY_UNKNOWN;
|
||||
|
||||
|
||||
if (q->buf_struct_size == 0)
|
||||
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
|
||||
index 9b02aeba4..2bf3272d4 100644
|
||||
--- a/include/media/videobuf2-core.h
|
||||
+++ b/include/media/videobuf2-core.h
|
||||
@@ -19,6 +19,7 @@
|
||||
#include <linux/dma-buf.h>
|
||||
#include <linux/bitops.h>
|
||||
#include <media/media-request.h>
|
||||
#include <media/frame_vector.h>
|
||||
+struct dma_fence;
|
||||
|
||||
@@ -286,6 +287,12 @@ struct vb2_buffer {
|
||||
@@ -288,6 +288,12 @@ struct vb2_buffer {
|
||||
unsigned int skip_cache_sync_on_finish:1;
|
||||
|
||||
|
||||
struct vb2_plane planes[VB2_MAX_PLANES];
|
||||
+ /*
|
||||
+ * dma_resv release fence — set by vb2_buffer_attach_release_fence()
|
||||
+ * (driver opt-in from buf_queue), signalled by vb2_buffer_done.
|
||||
+ * NULL for drivers that don't opt in.
|
||||
+ * (driver opt-in from buf_queue), signalled and put by
|
||||
+ * vb2_buffer_done(). NULL for drivers that don't opt in.
|
||||
+ */
|
||||
+ struct dma_fence *release_fence;
|
||||
struct list_head queued_entry;
|
||||
struct list_head done_entry;
|
||||
|
||||
@@ -645,6 +652,11 @@ struct vb2_queue {
|
||||
#ifdef CONFIG_VIDEO_ADV_DEBUG
|
||||
@@ -658,6 +664,15 @@ struct vb2_queue {
|
||||
spinlock_t done_lock;
|
||||
wait_queue_head_t done_wq;
|
||||
|
||||
+ /* dma_resv release-fence integration (opt-in per buffer). */
|
||||
|
||||
+ /*
|
||||
+ * Per-queue dma_resv release-fence context. Drivers that opt
|
||||
+ * into vb2_buffer_attach_release_fence() use these to allocate
|
||||
+ * fences on a single per-queue timeline.
|
||||
+ */
|
||||
+ u64 dma_resv_fence_context;
|
||||
+ atomic64_t dma_resv_fence_seqno;
|
||||
+ spinlock_t dma_resv_fence_lock;
|
||||
@@ -221,20 +198,27 @@ diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
|
||||
unsigned int streaming:1;
|
||||
unsigned int start_streaming_called:1;
|
||||
unsigned int error:1;
|
||||
@@ -750,6 +762,13 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
|
||||
@@ -747,6 +762,20 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no);
|
||||
*/
|
||||
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
|
||||
|
||||
|
||||
+/**
|
||||
+ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence.
|
||||
+ * Called from a driver's buf_queue callback after enqueueing the
|
||||
+ * buffer in the driver's own queue. See videobuf2-core.c for
|
||||
+ * rationale and call shape.
|
||||
+ * @vb: the buffer being queued to the producer.
|
||||
+ *
|
||||
+ * Drivers call this from their buf_queue callback to attach an
|
||||
+ * exclusive write fence to each plane's dmabuf->resv. The fence
|
||||
+ * is signalled and put by vb2_buffer_done() when the buffer
|
||||
+ * transitions to VB2_BUF_STATE_DONE / _ERROR. Skips planes whose
|
||||
+ * dbuf is NULL.
|
||||
+ *
|
||||
+ * Returns 0 on success, negative errno on allocation failure.
|
||||
+ */
|
||||
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb);
|
||||
+
|
||||
/**
|
||||
* vb2_discard_done() - discard all buffers marked as DONE.
|
||||
* @q: pointer to &struct vb2_queue with videobuf2 queue.
|
||||
--
|
||||
2.44.0
|
||||
--
|
||||
2.47.3
|
||||
|
||||
|
||||
+56
@@ -0,0 +1,56 @@
|
||||
From 91522b562665b94607337a3f30d1586f818d9387 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||
Date: Tue, 28 Apr 2026 19:23:50 +0000
|
||||
Subject: [PATCH 2/3] media: hantro: attach dma_resv release fence at buf_queue
|
||||
|
||||
Opt the hantro driver into the new vb2 release-fence helper.
|
||||
|
||||
When userspace QBUFs a buffer to hantro, the buffer is added to the
|
||||
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
|
||||
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv
|
||||
gets a real producer fence attached. The fence is signalled by
|
||||
vb2_buffer_done when hantro completes the decode (via
|
||||
v4l2_m2m_buf_done_and_job_finish in hantro_drv.c, which converges
|
||||
on vb2_buffer_done).
|
||||
|
||||
Wayland compositors (and any other userspace) that import hantro
|
||||
CAPTURE buffers and wait on the dmabuf's implicit-sync fence now
|
||||
wait on a real fence representing the producer's actual completion,
|
||||
not a stub. Validated end-to-end on PineTab2 (RK3566 / Mali-G52 /
|
||||
mainline 6.19 with this series backported) playing 1080p30 H.264 in
|
||||
chromium under stock KDE Plasma 6.6.4 Wayland: KWin's
|
||||
Transaction::watchDmaBuf wait completes correctly the moment
|
||||
hantro's IRQ fires, instead of falling back to a stub-resolved
|
||||
poll.
|
||||
|
||||
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
|
||||
---
|
||||
drivers/media/platform/verisilicon/hantro_v4l2.c | 12 ++++++++++++
|
||||
1 file changed, 12 insertions(+)
|
||||
|
||||
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
|
||||
index 62d3962c1..e95a3433a 100644
|
||||
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
|
||||
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
|
||||
@@ -877,6 +877,18 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
|
||||
}
|
||||
|
||||
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
|
||||
+
|
||||
+ /*
|
||||
+ * Opt in to vb2's dma_resv release-fence path. Userspace
|
||||
+ * consumers that imported this buffer's dmabuf and wait on
|
||||
+ * its implicit-sync fence (poll(POLLIN) or
|
||||
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE) get a real producer fence
|
||||
+ * representing this device's completion, instead of the stub
|
||||
+ * fence dma_buf_export_sync_file substitutes when dma_resv
|
||||
+ * is empty. Best-effort: a fence-allocation failure means we
|
||||
+ * lose implicit-sync precision, no functional regression.
|
||||
+ */
|
||||
+ (void)vb2_buffer_attach_release_fence(vb);
|
||||
}
|
||||
|
||||
static bool hantro_vq_is_coded(struct vb2_queue *q)
|
||||
--
|
||||
2.47.3
|
||||
|
||||
-79
@@ -1,79 +0,0 @@
|
||||
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||
Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue
|
||||
Date: 2026-04-28
|
||||
|
||||
Opt the hantro driver into the new vb2 release-fence helper.
|
||||
|
||||
When userspace QBUFs a buffer to hantro, the buffer is added to the
|
||||
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
|
||||
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv gets
|
||||
a real producer fence attached. The fence is signalled by vb2_buffer_done
|
||||
when hantro completes the decode (via v4l2_m2m_buf_done_and_job_finish
|
||||
in hantro_drv.c, which converges on vb2_buffer_done).
|
||||
|
||||
Wayland compositors that import hantro CAPTURE buffers (chrome,
|
||||
firefox, mpv, gstreamer) and wait on the dmabuf's implicit-sync
|
||||
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) now wait on a
|
||||
real fence representing the producer's actual completion, not a
|
||||
stub. KWin's `Transaction::watchDmaBuf` path on Mali-class hardware
|
||||
is the user-visible benefit: the per-frame sync_file roundtrip
|
||||
completes correctly the moment hantro's IRQ handler runs, instead
|
||||
of either polling on a stub fence or — in the failure mode that
|
||||
drove this work — failing to signal at all due to a race that the
|
||||
stub-fence path masked.
|
||||
|
||||
Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this
|
||||
series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in
|
||||
chromium under stock KDE Plasma 6.6.4 Wayland: the chrome stall that
|
||||
required a KWin watchDmaBuf bypass workaround (kwin-fourier in the
|
||||
chromium-fourier project) is gone with this kernel-side fix in
|
||||
place; KWin's wait completes correctly.
|
||||
|
||||
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
|
||||
---
|
||||
drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++--
|
||||
1 file changed, 15 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
|
||||
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
|
||||
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
|
||||
@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
|
||||
{
|
||||
struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
|
||||
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
|
||||
|
||||
if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) &&
|
||||
vb2_is_streaming(vb->vb2_queue) &&
|
||||
v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) {
|
||||
unsigned int i;
|
||||
|
||||
for (i = 0; i < vb->num_planes; i++)
|
||||
vb2_set_plane_payload(vb, i, 0);
|
||||
|
||||
vbuf->field = V4L2_FIELD_NONE;
|
||||
vbuf->sequence =
|
||||
ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++;
|
||||
|
||||
v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
|
||||
return;
|
||||
}
|
||||
|
||||
- v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
|
||||
+ v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
|
||||
+
|
||||
+ /*
|
||||
+ * Opt in to vb2's dma_resv release-fence path: any userspace
|
||||
+ * consumer that imported this buffer's dmabuf and is doing
|
||||
+ * implicit-sync via poll(POLLIN) or
|
||||
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence
|
||||
+ * representing this device's completion, instead of the stub
|
||||
+ * fence dma_buf_export_sync_file substitutes when dma_resv is
|
||||
+ * empty. Best-effort: if fence allocation fails we just lose
|
||||
+ * the implicit-sync precision, no functional regression.
|
||||
+ */
|
||||
+ (void)vb2_buffer_attach_release_fence(vb);
|
||||
}
|
||||
|
||||
const struct vb2_ops hantro_queue_ops = {
|
||||
--
|
||||
2.44.0
|
||||
+48
@@ -0,0 +1,48 @@
|
||||
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||
Date: Tue, 28 Apr 2026 19:23:51 +0000
|
||||
Subject: [PATCH 3/3] media: rockchip-rga: attach dma_resv release fence at
|
||||
buf_queue
|
||||
|
||||
Opt the Rockchip RGA driver into the new vb2 release-fence helper.
|
||||
|
||||
Same shape as the hantro patch: rga_buf_queue enqueues the buffer
|
||||
in the driver's m2m queue via v4l2_m2m_buf_queue and additionally
|
||||
attaches a release fence to each plane's dmabuf->resv via
|
||||
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the
|
||||
fence when RGA completes the M2M operation.
|
||||
|
||||
Userspace consumers of RGA-produced dmabufs (image-processing
|
||||
pipelines, screen-rotation servers, gstreamer flows on Rockchip
|
||||
boards) get spec-clean implicit-sync semantics, matching what
|
||||
hantro now does in the same patch series.
|
||||
|
||||
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
|
||||
---
|
||||
drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++++++++++
|
||||
1 file changed, 10 insertions(+)
|
||||
|
||||
diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
|
||||
index 70808049d..5557ca632 100644
|
||||
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
|
||||
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
|
||||
@@ -153,6 +153,16 @@ static void rga_buf_queue(struct vb2_buffer *vb)
|
||||
struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
|
||||
|
||||
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
|
||||
+
|
||||
+ /*
|
||||
+ * Opt in to vb2's dma_resv release-fence path so userspace
|
||||
+ * consumers of RGA-produced dmabufs get a real producer fence
|
||||
+ * to wait on instead of the dma_buf core's stub fence. See
|
||||
+ * the leading patch in this series for rationale. Best-effort:
|
||||
+ * fence-allocation failure means we lose implicit-sync
|
||||
+ * precision but the m2m operation itself proceeds normally.
|
||||
+ */
|
||||
+ (void)vb2_buffer_attach_release_fence(vb);
|
||||
}
|
||||
|
||||
static void rga_buf_cleanup(struct vb2_buffer *vb)
|
||||
--
|
||||
2.47.3
|
||||
|
||||
-47
@@ -1,47 +0,0 @@
|
||||
From: Markus Fritsche <mfritsche@reauktion.de>
|
||||
Subject: [PATCH RFC 3/3] media: rockchip-rga: attach dma_resv release fence at buf_queue
|
||||
Date: 2026-04-28
|
||||
|
||||
Opt the Rockchip RGA driver into the new vb2 release-fence helper.
|
||||
|
||||
Same shape as the hantro patch: the existing buf_queue path enqueues
|
||||
the buffer in the driver's m2m queue via v4l2_m2m_buf_queue, and we
|
||||
additionally attach a release fence to each plane's dmabuf->resv via
|
||||
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the fence
|
||||
when RGA completes the M2M operation.
|
||||
|
||||
Userspace consumers of RGA-produced dmabufs (image-processing
|
||||
pipelines, screen-rotation servers, gstreamer flows) get spec-clean
|
||||
implicit-sync semantics, matching what hantro now does in the same
|
||||
patch series.
|
||||
|
||||
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
|
||||
---
|
||||
drivers/media/platform/rockchip/rga/rga-buf.c | 11 +++++++++++
|
||||
1 file changed, 11 insertions(+)
|
||||
|
||||
diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
|
||||
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
|
||||
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
|
||||
@@ -150,7 +150,18 @@ static void rga_buf_queue(struct vb2_buffer *vb)
|
||||
{
|
||||
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
|
||||
struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
|
||||
|
||||
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
|
||||
+
|
||||
+ /*
|
||||
+ * Opt in to vb2's dma_resv release-fence path so userspace
|
||||
+ * consumers of RGA-produced dmabufs get a real producer fence
|
||||
+ * to wait on instead of the dma_buf core's substitute stub
|
||||
+ * fence. See the leading patch in this series for rationale
|
||||
+ * and the helper definition. Best-effort: a fence-allocation
|
||||
+ * failure means we lose implicit-sync precision but the m2m
|
||||
+ * operation itself proceeds normally.
|
||||
+ */
|
||||
+ (void)vb2_buffer_attach_release_fence(vb);
|
||||
}
|
||||
|
||||
static void rga_buf_cleanup(struct vb2_buffer *vb)
|
||||
--
|
||||
2.44.0
|
||||
@@ -34,10 +34,20 @@ their respective `buf_queue` callbacks).
|
||||
|
||||
## Status
|
||||
|
||||
Patches drafted but **not yet applied / compile-tested / runtime-
|
||||
tested.** They're written against linux-next master as of
|
||||
2026-04-28 (sparse-checked-out at `/tmp/hantro-src` during the
|
||||
chromium-fourier campaign on ohm). Pre-flight before sending:
|
||||
**Patches apply cleanly to Linux 6.12 mainline via `git am`** —
|
||||
verified against `/tmp/hantro-src` (sparse-checked-out v6.12 plus
|
||||
linux-next master). All kernel API calls verified to match real
|
||||
signatures in `include/linux/dma-fence.h` and
|
||||
`include/linux/dma-resv.h`:
|
||||
|
||||
- `dma_fence_init(fence, ops, lock, context, seqno)` ✓
|
||||
- `dma_resv_add_fence(obj, fence, usage)` ✓
|
||||
- `DMA_RESV_USAGE_WRITE` enum present ✓
|
||||
- `dma_fence_signal`, `dma_fence_set_error`, `dma_fence_get`,
|
||||
`dma_fence_put`, `dma_fence_context_alloc` ✓
|
||||
- `dma_resv_lock(obj, NULL)`, `dma_resv_unlock` ✓
|
||||
|
||||
Remaining gates before sending to linux-media:
|
||||
|
||||
1. **Compile** — `make drivers/media/common/videobuf2/videobuf2-core.o
|
||||
drivers/media/platform/verisilicon/hantro_v4l2.o
|
||||
|
||||
Reference in New Issue
Block a user