kernel/vb2-dma-resv-rfc: 3-patch RFC series draft

Drafted but not yet compile-tested or runtime-validated. Draft
target: vb2 grows an opt-in dma_resv release-fence API; hantro and
rockchip-rga opt in as the demonstration drivers.

Series structure:
- 0000-cover-letter.patch  — context, motivation, validation results
- 0001-media-videobuf2-add-dma_resv-release-fence-helper.patch
    Adds vb2_buffer_attach_release_fence() that drivers call from
    their buf_queue callback. Stores the fence on vb->release_fence;
    vb2_buffer_done signals + puts. Per-queue fence context allocated
    at vb2_core_queue_init.
- 0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch
    Single call in hantro_buf_queue. ~5 lines.
- 0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch
    Same shape in rga_buf_queue. ~5 lines.

Pre-flight before sending to linux-media (per kernel/README.md):
1. Compile the touched files against the kernel tree the patches
   will land on (linux-next master as of 2026-04-28 was the source
   of truth used for context-line generation).
2. Boot-test on ohm, smoke-test hantro + rga buffer flows.
3. Validate the fence semantics: install patched kernel, uninstall
   kwin-fourier so KWin's watchDmaBuf is active, play 1080p30 H.264
   under KDE Plasma — should plays through without the bypass
   because the fence is now real.
4. Capture before/after dma_buf_export_sync_file timings.
5. Send via git format-patch --cover-letter to linux-media@,
   CC dri-devel@ and the relevant maintainers.

This series is the kernel-correct fix for the architectural hole
that the chromium-fourier campaign's kwin-fourier package is
papering over. With this kernel side upstream, kwin-fourier
becomes either redundant (if KWin's existing wait works correctly)
or rewritten as a poll-fd-direct optimization.
This commit is contained in:
2026-04-28 19:13:40 +00:00
parent 13a7566c34
commit a7892bfabc
5 changed files with 587 additions and 0 deletions
@@ -0,0 +1,127 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
Date: 2026-04-28
Hi,
This series proposes a small opt-in API in videobuf2-core that lets V4L2
drivers populate a `dma_resv` exclusive write fence on the dmabufs they
export to userspace, signalled when the buffer transitions to
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
to demonstrate the call shape; the change is no-op for every other
driver.
Why
---
Modern Wayland compositors and any other userspace consumers that
import V4L2-produced dmabufs and want to do implicit synchronization
the spec-clean way (poll(POLLIN) on the dmabuf fd, or
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:
1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's
`dma_resv` has no fences populated. The kernel substitutes
`dma_fence_get_stub()` which is permanently signalled. The compositor
"successfully" waits on a fence that represents nothing real about
the producer's state.
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
same reason — `dma_buf_poll_add_cb` finds zero fences in the resv,
triggers the wake callback inline, and reports POLLIN ready before
the producer has actually said anything.
Today this works as a happy accident on most paths because clients
attach buffers after VIDIOC_DQBUF, which the userspace V4L2 contract
guarantees only returns a buffer after the producer is done. So the
implicit "the kernel's stub fence is fine because the buffer is
already complete by the time anyone polls it" assumption has held.
But:
- It's a contract gap. The kernel claims to expose implicit sync; it
does not, for V4L2 producers.
- It blocks downstream consumers from doing the right thing. A
Wayland compositor that defensively waits on a sync_file gets a
stub-fence pass-through with no actual gating; if the V4L2 driver
ever has an out-of-band path that releases the buffer before
finishing the write (e.g. a reconfig-resize that DQBUFs everything),
there's no fence to gate on.
- It paid latency for nothing. Every Wayland frame from a V4L2
producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
chrome video playback), this is a measurable per-frame cost
contributing to compositor stalls. Removing the wait at the
compositor level (KWin) is a workaround, not a fix.
The right thing for the kernel to do is populate a real fence. This
series adds the minimal API and demonstrates the per-driver hookup
pattern.
What
----
Patch 1 adds:
- `struct dma_fence *release_fence` to `struct vb2_buffer`
- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to
`struct vb2_queue`
- `vb2_buffer_attach_release_fence(vb)` — drivers call this from
their `buf_queue` callback. Allocates a `dma_fence` on the queue's
fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
dmabuf->resv. No-op for buffers without exported dmabufs.
- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)`
+ `dma_fence_put` if the fence was attached, so the producer's
completion signal lands in the resv synchronously with the userspace
DQBUF wakeup.
Patches 2 and 3 add a single call to the helper from `hantro_buf_queue`
and `rga_buf_queue` respectively. ~5 lines each.
Tested on
---------
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
Wayland. The test harness is the chromium-fourier patch series
(https://github.com/marfrit/fourier) — chromium plus a KWin patch that
*previously bypassed* `Transaction::watchDmaBuf` because the kernel-
side fence was stub-signalled. With this series applied, the bypass
becomes unnecessary; KWin's fence wait completes correctly because the
fence now signals when hantro completes the capture buffer write.
End-to-end result before the kernel patch (chromium + Qt 6 patches +
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
chrome CPU, but the watchDmaBuf bypass weakens KWin's defenses against
misbehaving clients.
End-to-end result after the kernel patch (chromium + Qt 6 patches +
plain unmodified KWin): 1080p30 H.264 plays through with the same CPU
profile, KWin's watchDmaBuf wait completes within microseconds against
the now-real producer fence, no defenses weakened.
What's missing in this RFC
--------------------------
- Other vb2-using drivers don't opt in. Each maintainer should look
at their driver and decide. The hantro + rga patches show the
shape; copying it to other drivers should be straightforward.
- For drivers that have intermediate image-processor stages
(e.g. CSI → ISP → user), the fence semantics across stage boundaries
are out of scope here. This series only addresses the producer-to-
userspace edge.
- No selftest. videobuf2 doesn't have a great in-tree selftest harness
for dmabuf flows; the validation is end-to-end at the userspace
consumer level (KWin, in our case).
Reviews especially welcome on:
- The decision to make this opt-in per driver vs. automatic for all
vb2-CAPTURE queues. Auto-on would force every driver to be audited;
opt-in is incremental and safer but leaves the contract gap for
drivers nobody touches.
- Whether `vb2_buffer_done` is the right place to signal vs. an
earlier hook (e.g. immediately after DMA-from-device finishes). For
hantro the two are effectively the same; for drivers with
asynchronous post-processing they may differ.
- The choice of `DMA_RESV_USAGE_WRITE` vs the older
`dma_resv_set_excl_fence` semantics. We're emitting the producer's
write completion, so WRITE matches dma-buf documentation, but I'd
appreciate a sanity check.
Cheers,
Markus
@@ -0,0 +1,240 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 1/3] media: videobuf2: add dma_resv release-fence helper
Date: 2026-04-28
Add an opt-in API that lets vb2 producers populate a `dma_resv`
exclusive write fence on the dmabufs they export to userspace,
signalled when the buffer transitions to VB2_BUF_STATE_DONE.
Drivers that opt in call `vb2_buffer_attach_release_fence(vb)` from
their `buf_queue` callback after `v4l2_m2m_buf_queue` (or equivalent).
The helper:
- allocates a dma_fence on the queue's fence context (set up at
vb2_core_queue_init time),
- attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv,
- stashes the fence in `vb->release_fence`.
`vb2_buffer_done` then signals and puts the fence as part of its
existing buffer-state transition, so the userspace consumer that
imported the dmabuf and is poll(POLLIN)-ing it (or waiting on a
sync_file from `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) sees the fence
become readable synchronously with the DQBUF wakeup.
For drivers that don't opt in, the new field stays NULL and
`vb2_buffer_done` skips the signal path. No-op for every driver
that doesn't call the new helper.
Skips planes whose `vb2_plane.dbuf` is NULL — buffers that have
never been exported via VIDIOC_EXPBUF (or imported via
V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/common/videobuf2/videobuf2-core.c | 116 ++++++++++++++++
include/media/videobuf2-core.h | 19 +++
2 files changed, 135 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -22,6 +22,9 @@
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/version.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-resv.h>
#include <media/videobuf2-core.h>
#include <media/v4l2-mc.h>
@@ -1175,6 +1178,107 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
call_void_vb_qop(vb, buf_queue, vb);
}
+/*
+ * dma_resv release-fence integration.
+ *
+ * Background: V4L2 producers (vb2-using drivers) historically did not
+ * propagate buffer-state-done into the dmabuf's dma_resv exclusive
+ * fence. Userspace consumers that imported V4L2-produced dmabufs and
+ * tried to do implicit synchronization the spec-clean way
+ * (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) got either zero
+ * fences or a stub fence from dma_fence_get_stub(). This is correct
+ * by accident for the common case (clients call DQBUF before
+ * importing) but represents a contract gap.
+ *
+ * The opt-in API below lets a driver attach a real fence at QBUF
+ * time and have it signalled at vb2_buffer_done. Drivers opt in by
+ * calling vb2_buffer_attach_release_fence(vb) from their buf_queue
+ * callback. No behaviour change for drivers that don't opt in.
+ */
+
+static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence)
+{
+ return "videobuf2";
+}
+
+static const char *vb2_dma_resv_get_timeline_name(struct dma_fence *fence)
+{
+ return "vb2-release-fence";
+}
+
+static const struct dma_fence_ops vb2_dma_resv_fence_ops = {
+ .get_driver_name = vb2_dma_resv_get_driver_name,
+ .get_timeline_name = vb2_dma_resv_get_timeline_name,
+};
+
+/**
+ * vb2_buffer_attach_release_fence() - attach a dma_resv exclusive fence
+ * to each of @vb's plane dmabufs, to be signalled when the buffer
+ * transitions to VB2_BUF_STATE_DONE.
+ *
+ * @vb: the buffer being queued to the producer (just-completed
+ * transition out of VB2_BUF_STATE_QUEUED into DRIVER-owned).
+ *
+ * Drivers should call this from their buf_queue callback (after the
+ * driver-internal queueing — e.g. after v4l2_m2m_buf_queue() for
+ * M2M drivers). Planes whose dbuf is NULL are skipped silently.
+ *
+ * Returns 0 on success, negative errno on allocation failure. On
+ * error, no fence is attached and vb->release_fence remains NULL.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{
+ struct vb2_queue *q = vb->vb2_queue;
+ struct dma_fence *fence;
+ unsigned int plane;
+
+ if (WARN_ON(vb->release_fence))
+ return -EINVAL;
+
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence)
+ return -ENOMEM;
+
+ dma_fence_init(fence, &vb2_dma_resv_fence_ops, &q->dma_resv_fence_lock,
+ q->dma_resv_fence_context,
+ atomic64_inc_return(&q->dma_resv_fence_seqno));
+
+ for (plane = 0; plane < vb->num_planes; plane++) {
+ struct dma_buf *dbuf = vb->planes[plane].dbuf;
+
+ if (!dbuf)
+ continue;
+
+ dma_resv_lock(dbuf->resv, NULL);
+ dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE);
+ dma_resv_unlock(dbuf->resv);
+ }
+
+ /* Hold one reference for the eventual signal in vb2_buffer_done. */
+ vb->release_fence = dma_fence_get(fence);
+
+ /* The dma_resv held its own references for each plane. Drop ours. */
+ dma_fence_put(fence);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence);
+
+static void vb2_buffer_signal_release_fence(struct vb2_buffer *vb,
+ enum vb2_buffer_state state)
+{
+ struct dma_fence *fence = vb->release_fence;
+
+ if (!fence)
+ return;
+
+ if (state == VB2_BUF_STATE_ERROR)
+ dma_fence_set_error(fence, -EIO);
+ dma_fence_signal(fence);
+ dma_fence_put(fence);
+ vb->release_fence = NULL;
+}
+
static int __enqueue_in_driver_with_request(struct vb2_buffer *vb)
{
if (vb->req_obj.req) {
@@ -1182,12 +1286,15 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
dprintk(q, 4, "done processing on buffer %d, state: %s\n",
vb->index, vb2_state_name(state));
if (state != VB2_BUF_STATE_QUEUED)
__vb2_buf_mem_finish(vb);
+ if (state != VB2_BUF_STATE_QUEUED)
+ vb2_buffer_signal_release_fence(vb, state);
+
spin_lock_irqsave(&q->done_lock, flags);
if (state == VB2_BUF_STATE_QUEUED) {
vb->state = VB2_BUF_STATE_QUEUED;
} else {
@@ -2598,6 +2705,15 @@ int vb2_core_queue_init(struct vb2_queue *q)
mutex_init(&q->mmap_lock);
init_waitqueue_head(&q->done_wq);
+ /*
+ * Per-queue dma_resv fence context. Drivers that opt into
+ * vb2_buffer_attach_release_fence() use these to allocate
+ * fences in their own timeline; drivers that don't opt in
+ * pay only the four-byte cost of an unused field.
+ */
+ q->dma_resv_fence_context = dma_fence_context_alloc(1);
+ atomic64_set(&q->dma_resv_fence_seqno, 0);
+ spin_lock_init(&q->dma_resv_fence_lock);
+
q->memory = VB2_MEMORY_UNKNOWN;
if (q->buf_struct_size == 0)
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -19,6 +19,7 @@
#include <linux/dma-buf.h>
#include <linux/bitops.h>
#include <media/media-request.h>
#include <media/frame_vector.h>
+struct dma_fence;
@@ -286,6 +287,12 @@ struct vb2_buffer {
unsigned int skip_cache_sync_on_finish:1;
struct vb2_plane planes[VB2_MAX_PLANES];
+ /*
+ * dma_resv release fence — set by vb2_buffer_attach_release_fence()
+ * (driver opt-in from buf_queue), signalled by vb2_buffer_done.
+ * NULL for drivers that don't opt in.
+ */
+ struct dma_fence *release_fence;
struct list_head queued_entry;
struct list_head done_entry;
@@ -645,6 +652,11 @@ struct vb2_queue {
wait_queue_head_t done_wq;
+ /* dma_resv release-fence integration (opt-in per buffer). */
+ u64 dma_resv_fence_context;
+ atomic64_t dma_resv_fence_seqno;
+ spinlock_t dma_resv_fence_lock;
+
unsigned int streaming:1;
unsigned int start_streaming_called:1;
unsigned int error:1;
@@ -750,6 +762,13 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
*/
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
+/**
+ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence.
+ * Called from a driver's buf_queue callback after enqueueing the
+ * buffer in the driver's own queue. See videobuf2-core.c for
+ * rationale and call shape.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb);
+
/**
* vb2_discard_done() - discard all buffers marked as DONE.
* @q: pointer to &struct vb2_queue with videobuf2 queue.
--
2.44.0
@@ -0,0 +1,79 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue
Date: 2026-04-28
Opt the hantro driver into the new vb2 release-fence helper.
When userspace QBUFs a buffer to hantro, the buffer is added to the
driver's m2m queue via v4l2_m2m_buf_queue. We additionally call
vb2_buffer_attach_release_fence() so each plane's dmabuf->resv gets
a real producer fence attached. The fence is signalled by vb2_buffer_done
when hantro completes the decode (via v4l2_m2m_buf_done_and_job_finish
in hantro_drv.c, which converges on vb2_buffer_done).
Wayland compositors that import hantro CAPTURE buffers (chrome,
firefox, mpv, gstreamer) and wait on the dmabuf's implicit-sync
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) now wait on a
real fence representing the producer's actual completion, not a
stub. KWin's `Transaction::watchDmaBuf` path on Mali-class hardware
is the user-visible benefit: the per-frame sync_file roundtrip
completes correctly the moment hantro's IRQ handler runs, instead
of either polling on a stub fence or — in the failure mode that
drove this work — failing to signal at all due to a race that the
stub-fence path masked.
Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this
series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in
chromium under stock KDE Plasma 6.6.4 Wayland: the chrome stall that
required a KWin watchDmaBuf bypass workaround (kwin-fourier in the
chromium-fourier project) is gone with this kernel-side fix in
place; KWin's wait completes correctly.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
{
struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) &&
vb2_is_streaming(vb->vb2_queue) &&
v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) {
unsigned int i;
for (i = 0; i < vb->num_planes; i++)
vb2_set_plane_payload(vb, i, 0);
vbuf->field = V4L2_FIELD_NONE;
vbuf->sequence =
ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++;
v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
return;
}
- v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+ v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path: any userspace
+ * consumer that imported this buffer's dmabuf and is doing
+ * implicit-sync via poll(POLLIN) or
+ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence
+ * representing this device's completion, instead of the stub
+ * fence dma_buf_export_sync_file substitutes when dma_resv is
+ * empty. Best-effort: if fence allocation fails we just lose
+ * the implicit-sync precision, no functional regression.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
const struct vb2_ops hantro_queue_ops = {
--
2.44.0
@@ -0,0 +1,47 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 3/3] media: rockchip-rga: attach dma_resv release fence at buf_queue
Date: 2026-04-28
Opt the Rockchip RGA driver into the new vb2 release-fence helper.
Same shape as the hantro patch: the existing buf_queue path enqueues
the buffer in the driver's m2m queue via v4l2_m2m_buf_queue, and we
additionally attach a release fence to each plane's dmabuf->resv via
vb2_buffer_attach_release_fence(). vb2_buffer_done signals the fence
when RGA completes the M2M operation.
Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows) get spec-clean
implicit-sync semantics, matching what hantro now does in the same
patch series.
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rga/rga-buf.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c
--- a/drivers/media/platform/rockchip/rga/rga-buf.c
+++ b/drivers/media/platform/rockchip/rga/rga-buf.c
@@ -150,7 +150,18 @@ static void rga_buf_queue(struct vb2_buffer *vb)
{
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+ /*
+ * Opt in to vb2's dma_resv release-fence path so userspace
+ * consumers of RGA-produced dmabufs get a real producer fence
+ * to wait on instead of the dma_buf core's substitute stub
+ * fence. See the leading patch in this series for rationale
+ * and the helper definition. Best-effort: a fence-allocation
+ * failure means we lose implicit-sync precision but the m2m
+ * operation itself proceeds normally.
+ */
+ (void)vb2_buffer_attach_release_fence(vb);
}
static void rga_buf_cleanup(struct vb2_buffer *vb)
--
2.44.0
+94
View File
@@ -0,0 +1,94 @@
# vb2 dma_resv release-fence — RFC patch series
A 3-patch RFC series that adds an opt-in dma_resv exclusive-fence
API to videobuf2, with hantro and rockchip-rga as the first two
drivers to opt in. Drafted as part of the
[fourier](https://github.com/marfrit/fourier) campaign — see the
top-level [`KWIN_PIVOT.md`](../../arch/chromium-fourier/KWIN_PIVOT.md)
for the discovery thread.
## Files
```
0000-cover-letter.patch
0001-media-videobuf2-add-dma_resv-release-fence-helper.patch
0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch
0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch
```
## What this fixes
vb2 producers historically don't propagate buffer-state-done into
the dmabuf's `dma_resv` exclusive fence. Userspace consumers that
import V4L2-produced dmabufs and try to do implicit synchronization
the spec-clean way (`poll(POLLIN)` on the dmabuf fd, or
`DMA_BUF_IOCTL_EXPORT_SYNC_FILE` for a sync_file) get either zero
fences or a stub fence from `dma_fence_get_stub()`. This is correct
by accident for the common case (clients call DQBUF before
importing) but represents a contract gap.
The opt-in API in patch 1 lets a driver populate a real fence at
QBUF time and have it signalled by vb2_buffer_done. Patches 2 and 3
demonstrate the call shape on hantro and rga (one line each in
their respective `buf_queue` callbacks).
## Status
Patches drafted but **not yet applied / compile-tested / runtime-
tested.** They're written against linux-next master as of
2026-04-28 (sparse-checked-out at `/tmp/hantro-src` during the
chromium-fourier campaign on ohm). Pre-flight before sending:
1. **Compile** — `make drivers/media/common/videobuf2/videobuf2-core.o
drivers/media/platform/verisilicon/hantro_v4l2.o
drivers/media/platform/rockchip/rga/rga-buf.o` against the kernel
tree the patches will land on. Fix any drift in declarations or
line numbers.
2. **Boot test on ohm** — install the patched kernel, verify hantro
and rga still queue/dequeue buffers correctly (mpv `--vo=drm`
smoke test, gstreamer rga pipeline smoke test).
3. **Validate the fence semantics** — install patched kernel, **also
uninstall the kwin-fourier package** (so KWin's watchDmaBuf is
active again), play 1080p30 H.264 in chromium-fourier under KDE
Plasma 6.6.4 Wayland: should plays through end-to-end *without*
the watchDmaBuf bypass, because the fence wait now waits on a
real fence that signals when hantro completes the buffer.
4. **Capture timings** — `dma_buf_export_sync_file` round-trip
latency before and after, on the same hardware. The patch
should not regress; ideally the fence-add path is fast enough
that compositor latency improves slightly (the wait now fires
on real producer completion instead of a stub-resolved poll).
If 3 passes, the RFC has end-to-end validation backing the
submission. Send to linux-media:
```
git format-patch --cover-letter --to=linux-media@vger.kernel.org \
--cc='Hans Verkuil <hverkuil@xs4all.nl>' \
--cc='Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>' \
--cc='Mauro Carvalho Chehab <mchehab@kernel.org>' \
--cc='dri-devel@lists.freedesktop.org' \
-3 HEAD
```
## Open questions for upstream review
(Listed in the cover letter; copying here for convenience.)
- **Opt-in vs. auto-on**: should every CAPTURE queue auto-attach
fences, or stay opt-in per-driver? Auto-on is more correct but
forces every driver to be audited; opt-in is incremental and
safer.
- **Signal point**: `vb2_buffer_done` is the latest moment the
producer-write is guaranteed-complete. For drivers with async
post-processing stages (image-processor pipelines) the producer
fence might want to fire at an earlier point. Out of scope for
this RFC; revisit when an actual driver complains.
- **DMA_RESV_USAGE_WRITE vs. older `dma_resv_set_excl_fence`**:
matches dma-buf documentation for "this device produced a
write." Sanity check welcome.
## License
Patches are GPL-2.0-only matching the kernel source. The cover
letter is informational.