kernel/vb2-dma-resv-rfc: 3-patch RFC series draft

Drafted but not yet compile-tested or runtime-validated. Draft
target: vb2 grows an opt-in dma_resv release-fence API; hantro and
rockchip-rga opt in as the demonstration drivers.

Series structure:
- 0000-cover-letter.patch: context, motivation, validation results
- 0001-media-videobuf2-add-dma_resv-release-fence-helper.patch
  Adds vb2_buffer_attach_release_fence() that drivers call from
  their buf_queue callback. Stores the fence on vb->release_fence;
  vb2_buffer_done signals + puts. Per-queue fence context allocated
  at vb2_core_queue_init.
- 0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch
  Single call in hantro_buf_queue, plus an explanatory comment.
- 0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch
  Same shape in rga_buf_queue.

Pre-flight before sending to linux-media (per kernel/README.md):
1. Compile the touched files against the kernel tree the patches
   will land on (linux-next master as of 2026-04-28 was the source
   of truth used for context-line generation).
2. Boot-test on ohm, smoke-test hantro + rga buffer flows.
3. Validate the fence semantics: install patched kernel, uninstall
   kwin-fourier so KWin's watchDmaBuf is active, play 1080p30 H.264
   under KDE Plasma; playback should run without the bypass
   because the fence is now real.
4. Capture before/after dma_buf_export_sync_file timings.
5. Send via git format-patch --cover-letter to linux-media@,
   CC dri-devel@ and the relevant maintainers.

This series is the kernel-correct fix for the architectural hole
that the chromium-fourier campaign's kwin-fourier package is
papering over. With this kernel side upstream, kwin-fourier
becomes either redundant (if KWin's existing wait works correctly)
or rewritten as a poll-fd-direct optimization.
@@ -0,0 +1,127 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
Date: 2026-04-28

Hi,

This series proposes a small opt-in API in videobuf2-core that lets V4L2
drivers populate a `dma_resv` exclusive write fence on the dmabufs they
export to userspace, signalled when the buffer transitions to
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
to demonstrate the call shape; the change is a no-op for every other
driver.

Why
---
Modern Wayland compositors and any other userspace consumers that
import V4L2-produced dmabufs and want to do implicit synchronization
the spec-clean way (poll(POLLIN) on the dmabuf fd, or
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:

1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's
   `dma_resv` has no fences populated. The kernel substitutes
   `dma_fence_get_stub()` which is permanently signalled. The compositor
   "successfully" waits on a fence that represents nothing real about
   the producer's state.
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
   same reason: `dma_buf_poll_add_cb` finds zero fences in the resv,
   triggers the wake callback inline, and reports POLLIN ready before
   the producer has actually said anything.

Today this works as a happy accident on most paths because clients
attach buffers after VIDIOC_DQBUF, which the userspace V4L2 contract
guarantees only returns a buffer after the producer is done. So the
implicit "the kernel's stub fence is fine because the buffer is
already complete by the time anyone polls it" assumption has held.

But:

- It's a contract gap. The kernel claims to expose implicit sync; it
  does not, for V4L2 producers.
- It blocks downstream consumers from doing the right thing. A
  Wayland compositor that defensively waits on a sync_file gets a
  stub-fence pass-through with no actual gating; if the V4L2 driver
  ever has an out-of-band path that releases the buffer before
  finishing the write (e.g. a reconfig-resize that DQBUFs everything),
  there's no fence to gate on.
- It pays latency for nothing. Every Wayland frame from a V4L2
  producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
  fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
  chrome video playback), this is a measurable per-frame cost
  contributing to compositor stalls. Removing the wait at the
  compositor level (KWin) is a workaround, not a fix.

The right thing for the kernel to do is populate a real fence. This
series adds the minimal API and demonstrates the per-driver hookup
pattern.

What
----
Patch 1 adds:

- `struct dma_fence *release_fence` to `struct vb2_buffer`
- `u64 dma_resv_fence_context`, `atomic64_t dma_resv_fence_seqno` and
  `spinlock_t dma_resv_fence_lock` to `struct vb2_queue`
- `vb2_buffer_attach_release_fence(vb)`, which drivers call from
  their `buf_queue` callback. Allocates a `dma_fence` on the queue's
  fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
  dmabuf->resv. No-op for buffers without exported dmabufs.
- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)`
  + `dma_fence_put` if the fence was attached, so the producer's
  completion signal lands in the resv synchronously with the userspace
  DQBUF wakeup.

Patches 2 and 3 add a single call to the helper from `hantro_buf_queue`
and `rga_buf_queue` respectively, plus an explanatory comment in each.

Tested on
---------
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
Wayland. The test harness is the chromium-fourier patch series
(https://github.com/marfrit/fourier): chromium plus a KWin patch that
*previously bypassed* `Transaction::watchDmaBuf` because the
kernel-side fence was stub-signalled. With this series applied, the
bypass becomes unnecessary; KWin's fence wait completes correctly
because the fence now signals when hantro completes the capture
buffer write.

End-to-end result before the kernel patch (chromium + Qt 6 patches +
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
chrome CPU, but the watchDmaBuf bypass weakens KWin's defenses against
misbehaving clients.

End-to-end result after the kernel patch (chromium + Qt 6 patches +
plain unmodified KWin): 1080p30 H.264 plays through with the same CPU
profile, KWin's watchDmaBuf wait completes within microseconds against
the now-real producer fence, no defenses weakened.

What's missing in this RFC
--------------------------
- Other vb2-using drivers don't opt in. Each maintainer should look
  at their driver and decide. The hantro + rga patches show the
  shape; copying it to other drivers should be straightforward.
- For drivers that have intermediate image-processor stages
  (e.g. CSI → ISP → user), the fence semantics across stage boundaries
  are out of scope here. This series only addresses the
  producer-to-userspace edge.
- No selftest. videobuf2 doesn't have a great in-tree selftest harness
  for dmabuf flows; the validation is end-to-end at the userspace
  consumer level (KWin, in our case).

Reviews especially welcome on:

- The decision to make this opt-in per driver vs. automatic for all
  vb2-CAPTURE queues. Auto-on would force every driver to be audited;
  opt-in is incremental and safer but leaves the contract gap for
  drivers nobody touches.
- Whether `vb2_buffer_done` is the right place to signal vs. an
  earlier hook (e.g. immediately after DMA-from-device finishes). For
  hantro the two are effectively the same; for drivers with
  asynchronous post-processing they may differ.
- The choice of `DMA_RESV_USAGE_WRITE` vs the older
  `dma_resv_add_excl_fence` semantics. We're emitting the producer's
  write completion, so WRITE matches dma-buf documentation, but I'd
  appreciate a sanity check.

Cheers,
Markus
@@ -0,0 +1,240 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 1/3] media: videobuf2: add dma_resv release-fence helper
Date: 2026-04-28

Add an opt-in API that lets vb2 producers populate a `dma_resv`
exclusive write fence on the dmabufs they export to userspace,
signalled when the buffer transitions to VB2_BUF_STATE_DONE.

Drivers that opt in call `vb2_buffer_attach_release_fence(vb)` from
their `buf_queue` callback after `v4l2_m2m_buf_queue` (or equivalent).
The helper:

- allocates a dma_fence on the queue's fence context (set up at
  vb2_core_queue_init time),
- attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv,
- stashes the fence in `vb->release_fence`.

`vb2_buffer_done` then signals and puts the fence as part of its
existing buffer-state transition, so the userspace consumer that
imported the dmabuf and is poll(POLLIN)-ing it (or waiting on a
sync_file from `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) sees the fence
signal synchronously with the DQBUF wakeup.

For drivers that don't opt in, the new field stays NULL and
`vb2_buffer_done` skips the signal path, so the change is a no-op
for every driver that doesn't call the new helper.

The helper skips planes whose `vb2_plane.dbuf` is NULL: buffers that
have never been exported via VIDIOC_EXPBUF (or imported via
V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on.

Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
 drivers/media/common/videobuf2/videobuf2-core.c | 116 ++++++++++++++++
 include/media/videobuf2-core.h                  |  19 +++
 2 files changed, 135 insertions(+)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -22,6 +22,9 @@
 #include <linux/freezer.h>
 #include <linux/kthread.h>
 #include <linux/version.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-resv.h>

 #include <media/videobuf2-core.h>
 #include <media/v4l2-mc.h>
@@ -1175,6 +1178,107 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
 	call_void_vb_qop(vb, buf_queue, vb);
 }

+/*
+ * dma_resv release-fence integration.
+ *
+ * Background: V4L2 producers (vb2-using drivers) historically did not
+ * propagate buffer-state-done into the dmabuf's dma_resv exclusive
+ * fence. Userspace consumers that imported V4L2-produced dmabufs and
+ * tried to do implicit synchronization the spec-clean way
+ * (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) got either zero
+ * fences or a stub fence from dma_fence_get_stub(). This is correct
+ * by accident for the common case (clients call DQBUF before
+ * importing) but represents a contract gap.
+ *
+ * The opt-in API below lets a driver attach a real fence at QBUF
+ * time and have it signalled at vb2_buffer_done. Drivers opt in by
+ * calling vb2_buffer_attach_release_fence(vb) from their buf_queue
+ * callback. No behaviour change for drivers that don't opt in.
+ */
+
+static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence)
+{
+	return "videobuf2";
+}
+
+static const char *vb2_dma_resv_get_timeline_name(struct dma_fence *fence)
+{
+	return "vb2-release-fence";
+}
+
+static const struct dma_fence_ops vb2_dma_resv_fence_ops = {
+	.get_driver_name = vb2_dma_resv_get_driver_name,
+	.get_timeline_name = vb2_dma_resv_get_timeline_name,
+};
+
+/**
+ * vb2_buffer_attach_release_fence() - attach a dma_resv exclusive fence
+ * to each of @vb's plane dmabufs, to be signalled when the buffer
+ * transitions to VB2_BUF_STATE_DONE.
+ *
+ * @vb: the buffer being queued to the producer (just-completed
+ * transition out of VB2_BUF_STATE_QUEUED into DRIVER-owned).
+ *
+ * Drivers should call this from their buf_queue callback (after the
+ * driver-internal queueing, e.g. after v4l2_m2m_buf_queue() for
+ * M2M drivers). Planes whose dbuf is NULL are skipped silently.
+ *
+ * Returns 0 on success, negative errno on allocation failure. On
+ * error, no fence is attached and vb->release_fence remains NULL.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{
+	struct vb2_queue *q = vb->vb2_queue;
+	struct dma_fence *fence;
+	unsigned int plane;
+
+	if (WARN_ON(vb->release_fence))
+		return -EINVAL;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return -ENOMEM;
+
+	dma_fence_init(fence, &vb2_dma_resv_fence_ops, &q->dma_resv_fence_lock,
+		       q->dma_resv_fence_context,
+		       atomic64_inc_return(&q->dma_resv_fence_seqno));
+
+	for (plane = 0; plane < vb->num_planes; plane++) {
+		struct dma_buf *dbuf = vb->planes[plane].dbuf;
+
+		if (!dbuf)
+			continue;
+
+		dma_resv_lock(dbuf->resv, NULL);
+		dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE);
+		dma_resv_unlock(dbuf->resv);
+	}
+
+	/* Hold one reference for the eventual signal in vb2_buffer_done. */
+	vb->release_fence = dma_fence_get(fence);
+
+	/* The dma_resv holds its own references for each plane. Drop ours. */
+	dma_fence_put(fence);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence);
+
+static void vb2_buffer_signal_release_fence(struct vb2_buffer *vb,
+					    enum vb2_buffer_state state)
+{
+	struct dma_fence *fence = vb->release_fence;
+
+	if (!fence)
+		return;
+
+	if (state == VB2_BUF_STATE_ERROR)
+		dma_fence_set_error(fence, -EIO);
+	dma_fence_signal(fence);
+	dma_fence_put(fence);
+	vb->release_fence = NULL;
+}
+
 static int __enqueue_in_driver_with_request(struct vb2_buffer *vb)
 {
 	if (vb->req_obj.req) {
@@ -1182,12 +1286,15 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
 	dprintk(q, 4, "done processing on buffer %d, state: %s\n",
 		vb->index, vb2_state_name(state));

 	if (state != VB2_BUF_STATE_QUEUED)
 		__vb2_buf_mem_finish(vb);

+	if (state != VB2_BUF_STATE_QUEUED)
+		vb2_buffer_signal_release_fence(vb, state);
+
 	spin_lock_irqsave(&q->done_lock, flags);
 	if (state == VB2_BUF_STATE_QUEUED) {
 		vb->state = VB2_BUF_STATE_QUEUED;
 	} else {
@@ -2598,6 +2705,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
 	mutex_init(&q->mmap_lock);
 	init_waitqueue_head(&q->done_wq);

+	/*
+	 * Per-queue dma_resv fence context. Drivers that opt into
+	 * vb2_buffer_attach_release_fence() use these to allocate
+	 * fences in their own timeline; drivers that don't opt in
+	 * pay only the cost of three small unused fields.
+	 */
+	q->dma_resv_fence_context = dma_fence_context_alloc(1);
+	atomic64_set(&q->dma_resv_fence_seqno, 0);
+	spin_lock_init(&q->dma_resv_fence_lock);
+
 	q->memory = VB2_MEMORY_UNKNOWN;

 	if (q->buf_struct_size == 0)
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -19,6 +19,7 @@
 #include <linux/dma-buf.h>
 #include <linux/bitops.h>
 #include <media/media-request.h>
 #include <media/frame_vector.h>
+struct dma_fence;

@@ -286,6 +287,12 @@ struct vb2_buffer {
 	unsigned int skip_cache_sync_on_finish:1;

 	struct vb2_plane planes[VB2_MAX_PLANES];
+	/*
+	 * dma_resv release fence, set by vb2_buffer_attach_release_fence()
+	 * (driver opt-in from buf_queue), signalled by vb2_buffer_done.
+	 * NULL for drivers that don't opt in.
+	 */
+	struct dma_fence *release_fence;
 	struct list_head queued_entry;
 	struct list_head done_entry;

@@ -645,6 +652,11 @@ struct vb2_queue {
 	wait_queue_head_t done_wq;

+	/* dma_resv release-fence integration (opt-in per buffer). */
+	u64 dma_resv_fence_context;
+	atomic64_t dma_resv_fence_seqno;
+	spinlock_t dma_resv_fence_lock;
+
 	unsigned int streaming:1;
 	unsigned int start_streaming_called:1;
 	unsigned int error:1;
@@ -750,6 +762,13 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
  */
 void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);

+/**
+ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence.
+ * Called from a driver's buf_queue callback after enqueueing the
+ * buffer in the driver's own queue. See videobuf2-core.c for
+ * rationale and call shape.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb);
+
 /**
  * vb2_discard_done() - discard all buffers marked as DONE.
  * @q: pointer to &struct vb2_queue with videobuf2 queue.
--
2.44.0
@@ -0,0 +1,79 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue
Date: 2026-04-28

Opt the hantro driver into the new vb2 release-fence helper.

When userspace QBUFs a buffer to hantro, the buffer is added to the
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv gets
a real producer fence attached. The fence is signalled by vb2_buffer_done
when hantro completes the decode (via v4l2_m2m_buf_done_and_job_finish
in hantro_drv.c, which converges on vb2_buffer_done).

Wayland compositors that import hantro CAPTURE buffers from clients
such as chrome, firefox, mpv and gstreamer, and wait on the dmabuf's
implicit-sync fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE),
now wait on a real fence representing the producer's actual
completion, not a stub. The user-visible benefit is in KWin's
`Transaction::watchDmaBuf` path on Mali-class hardware: the per-frame
sync_file roundtrip completes correctly the moment hantro's IRQ
handler runs, instead of either polling on a stub fence or (in the
failure mode that drove this work) failing to signal at all due to a
race that the stub-fence path masked.

Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this
series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in
chromium under stock KDE Plasma 6.6.4 Wayland: the chrome stall that
required a KWin watchDmaBuf bypass workaround (kwin-fourier in the
chromium-fourier project) is gone with this kernel-side fix in
place; KWin's wait completes correctly.

Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
 drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
 {
 	struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);

 	if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) &&
 	    vb2_is_streaming(vb->vb2_queue) &&
 	    v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) {
 		unsigned int i;

 		for (i = 0; i < vb->num_planes; i++)
 			vb2_set_plane_payload(vb, i, 0);

 		vbuf->field = V4L2_FIELD_NONE;
 		vbuf->sequence =
 			ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++;

 		v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
 		return;
 	}

-	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+	/*
+	 * Opt in to vb2's dma_resv release-fence path: any userspace
+	 * consumer that imported this buffer's dmabuf and is doing
+	 * implicit-sync via poll(POLLIN) or
+	 * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence
+	 * representing this device's completion, instead of the stub
+	 * fence dma_buf_export_sync_file substitutes when dma_resv is
+	 * empty. Best-effort: if fence allocation fails we just lose
+	 * the implicit-sync precision, no functional regression.
+	 */
+	(void)vb2_buffer_attach_release_fence(vb);
 }

 const struct vb2_ops hantro_queue_ops = {
--
2.44.0
@@ -0,0 +1,47 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 3/3] media: rockchip-rga: attach dma_resv release fence at buf_queue
Date: 2026-04-28

Opt the Rockchip RGA driver into the new vb2 release-fence helper.

Same shape as the hantro patch: the existing buf_queue path enqueues
the buffer in the driver's m2m queue via v4l2_m2m_buf_queue, and we
additionally attach a release fence to each plane's dmabuf->resv via
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the fence
when RGA completes the M2M operation.

Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows) get spec-clean
implicit-sync semantics, matching what hantro now does in the same
patch series.

Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
 drivers/media/platform/rockchip/rga/rga-buf.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
@@ -150,7 +150,18 @@ static void rga_buf_queue(struct vb2_buffer *vb)
 {
 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
 	struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);

 	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+	/*
+	 * Opt in to vb2's dma_resv release-fence path so userspace
+	 * consumers of RGA-produced dmabufs get a real producer fence
+	 * to wait on instead of the dma_buf core's substitute stub
+	 * fence. See the leading patch in this series for rationale
+	 * and the helper definition. Best-effort: a fence-allocation
+	 * failure means we lose implicit-sync precision but the m2m
+	 * operation itself proceeds normally.
+	 */
+	(void)vb2_buffer_attach_release_fence(vb);
 }

 static void rga_buf_cleanup(struct vb2_buffer *vb)
--
2.44.0
@@ -0,0 +1,94 @@
# vb2 dma_resv release-fence: RFC patch series

A 3-patch RFC series that adds an opt-in dma_resv exclusive-fence
API to videobuf2, with hantro and rockchip-rga as the first two
drivers to opt in. Drafted as part of the
[fourier](https://github.com/marfrit/fourier) campaign; see the
top-level [`KWIN_PIVOT.md`](../../arch/chromium-fourier/KWIN_PIVOT.md)
for the discovery thread.

## Files

```
0000-cover-letter.patch
0001-media-videobuf2-add-dma_resv-release-fence-helper.patch
0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch
0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch
```

## What this fixes

vb2 producers historically don't propagate buffer-state-done into
the dmabuf's `dma_resv` exclusive fence. Userspace consumers that
import V4L2-produced dmabufs and try to do implicit synchronization
the spec-clean way (`poll(POLLIN)` on the dmabuf fd, or
`DMA_BUF_IOCTL_EXPORT_SYNC_FILE` for a sync_file) get either zero
fences or a stub fence from `dma_fence_get_stub()`. This is correct
by accident for the common case (clients call DQBUF before
importing) but represents a contract gap.
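
For orientation, the two consumer-side waits named above can be sketched in userspace C. This is a minimal sketch and not code from the series: the helper names `wait_dmabuf_readable` and `export_read_sync_file` are made up for illustration, and the fallback definition only mirrors the UAPI for toolchains whose `<linux/dma-buf.h>` predates the export ioctl. On a dmabuf whose resv holds no fences, both calls are expected to complete immediately, which is the gap described above.

```c
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Older UAPI headers lack the sync_file export ioctl; mirror it if absent. */
#ifndef DMA_BUF_IOCTL_EXPORT_SYNC_FILE
struct dma_buf_export_sync_file {
	__u32 flags;
	__s32 fd;
};
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE \
	_IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
#endif

/*
 * Hypothetical helper: wait until the fd reports POLLIN.
 * Returns 1 when readable, 0 on timeout (or when POLLIN is not set),
 * -errno on failure.
 */
static int wait_dmabuf_readable(int dmabuf_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Hypothetical helper: export the fences a reader has to wait for as a
 * sync_file. Returns the new fd, or -errno on failure.
 */
static int export_read_sync_file(int dmabuf_fd)
{
	struct dma_buf_export_sync_file arg = {
		.flags = DMA_BUF_SYNC_READ,
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg) < 0)
		return -errno;
	return arg.fd;
}
```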

The opt-in API in patch 1 lets a driver populate a real fence at
QBUF time and have it signalled by vb2_buffer_done. Patches 2 and 3
demonstrate the call shape on hantro and rga (a single call each in
their respective `buf_queue` callbacks).
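
The attach-at-QBUF, signal-at-done lifecycle can be modelled in plain userspace C. The sketch below is a toy model under stated assumptions: the `model_*` types and functions are stand-ins invented here, not kernel types, and the reference counting only approximates what `dma_fence_get`/`dma_fence_put` and `dma_resv_add_fence` do in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PLANES 8

/* Toy stand-ins for dma_fence, dma_resv and vb2_buffer; not kernel types. */
struct model_fence {
	int refcount;
	bool signalled;
	int error;
};

struct model_resv {
	struct model_fence *write_fence;	/* NULL: no producer fence */
};

struct model_buffer {
	unsigned int num_planes;
	struct model_resv *plane_resv[MODEL_MAX_PLANES]; /* NULL: no dmabuf */
	struct model_fence *release_fence;
};

/* QBUF side: publish one fence on every plane that has a dmabuf. */
static int model_attach_release_fence(struct model_buffer *buf,
				      struct model_fence *fence)
{
	unsigned int i;

	if (buf->release_fence)
		return -EINVAL;		/* already attached */

	fence->refcount = 1;		/* reference owned by the buffer */
	fence->signalled = false;
	fence->error = 0;

	for (i = 0; i < buf->num_planes; i++) {
		if (!buf->plane_resv[i])
			continue;	/* plane without a dmabuf is skipped */
		buf->plane_resv[i]->write_fence = fence;
		fence->refcount++;	/* reference owned by the resv */
	}

	buf->release_fence = fence;
	return 0;
}

/* Completion side: signal the fence, then drop the buffer's reference. */
static void model_buffer_done(struct model_buffer *buf, bool failed)
{
	struct model_fence *fence = buf->release_fence;

	if (!fence)
		return;			/* driver did not opt in */
	if (failed)
		fence->error = -EIO;
	fence->signalled = true;
	fence->refcount--;
	buf->release_fence = NULL;
}

/* What a consumer observes: is an unsignalled write fence pending? */
static bool model_resv_busy(const struct model_resv *resv)
{
	return resv->write_fence && !resv->write_fence->signalled;
}
```

In this model a consumer polling the resv sees it busy between attach and done, and idle afterwards, which is the behaviour the series aims for on real dmabufs.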

## Status

Patches drafted but **not yet applied / compile-tested /
runtime-tested.** They're written against linux-next master as of
2026-04-28 (sparse-checked-out at `/tmp/hantro-src` during the
chromium-fourier campaign on ohm). Pre-flight before sending:

1. **Compile**: `make drivers/media/common/videobuf2/videobuf2-core.o
   drivers/media/platform/verisilicon/hantro_v4l2.o
   drivers/media/platform/rockchip/rga/rga-buf.o` against the kernel
   tree the patches will land on. Fix any drift in declarations or
   line numbers.
2. **Boot test on ohm**: install the patched kernel, verify hantro
   and rga still queue/dequeue buffers correctly (mpv `--vo=drm`
   smoke test, gstreamer rga pipeline smoke test).
3. **Validate the fence semantics**: install patched kernel, **also
   uninstall the kwin-fourier package** (so KWin's watchDmaBuf is
   active again), play 1080p30 H.264 in chromium-fourier under KDE
   Plasma 6.6.4 Wayland. Playback should run end-to-end *without*
   the watchDmaBuf bypass, because the fence wait now waits on a
   real fence that signals when hantro completes the buffer.
4. **Capture timings**: `dma_buf_export_sync_file` round-trip
   latency before and after, on the same hardware. The patch
   should not regress; ideally the fence-add path is fast enough
   that compositor latency improves slightly (the wait now fires
   on real producer completion instead of a stub-resolved poll).

If step 3 passes, the RFC has end-to-end validation backing the
submission. Send to linux-media:

```
git format-patch --cover-letter --to=linux-media@vger.kernel.org \
--cc='Hans Verkuil <hverkuil@xs4all.nl>' \
--cc='Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>' \
--cc='Mauro Carvalho Chehab <mchehab@kernel.org>' \
--cc='dri-devel@lists.freedesktop.org' \
-3 HEAD
```

## Open questions for upstream review

(Listed in the cover letter; copied here for convenience.)

- **Opt-in vs. auto-on**: should every CAPTURE queue auto-attach
  fences, or stay opt-in per-driver? Auto-on is more correct but
  forces every driver to be audited; opt-in is incremental and
  safer.
- **Signal point**: `vb2_buffer_done` is the latest moment the
  producer-write is guaranteed-complete. For drivers with async
  post-processing stages (image-processor pipelines) the producer
  fence might want to fire at an earlier point. Out of scope for
  this RFC; revisit when an actual driver complains.
- **DMA_RESV_USAGE_WRITE vs. older `dma_resv_add_excl_fence`**:
  matches dma-buf documentation for "this device produced a
  write." Sanity check welcome.
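
As background for the last question, the reasoning behind `DMA_RESV_USAGE_WRITE` can be illustrated with a toy model of the usage ordering. The sketch assumes, from a reading of the dma-resv documentation, that the usage classes are ordered KERNEL < WRITE < READ < BOOKKEEP, that a query at one level also returns fences of lower levels, and that a reader queries at WRITE while a writer queries at READ. The `model_*` names are local stand-ins, not kernel symbols.

```c
#include <stdbool.h>

/* Local mirror of the assumed ordering of enum dma_resv_usage. */
enum model_usage {
	MODEL_USAGE_KERNEL,
	MODEL_USAGE_WRITE,
	MODEL_USAGE_READ,
	MODEL_USAGE_BOOKKEEP,
};

/*
 * Assumed to mirror dma_resv_usage_rw(): a new writer waits for earlier
 * readers and writers, a new reader only for earlier writers.
 */
static enum model_usage model_usage_rw(bool write)
{
	return write ? MODEL_USAGE_READ : MODEL_USAGE_WRITE;
}

/* A query at level `query` sees fences at that level and below. */
static bool model_query_sees(enum model_usage query,
			     enum model_usage fence_usage)
{
	return fence_usage <= query;
}
```

Under these assumptions, a compositor that only reads the buffer queries at the WRITE level and therefore sees a producer fence added as WRITE, but would not see one added as READ or BOOKKEEP.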

## License

Patches are GPL-2.0-only matching the kernel source. The cover
letter is informational.