diff --git a/kernel/vb2-dma-resv-rfc/0000-cover-letter.patch b/kernel/vb2-dma-resv-rfc/0000-cover-letter.patch new file mode 100644 index 0000000000..ee7ef5dddc --- /dev/null +++ b/kernel/vb2-dma-resv-rfc/0000-cover-letter.patch @@ -0,0 +1,127 @@ +From: Markus Fritsche +Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports +Date: 2026-04-28 + +Hi, + +This series proposes a small opt-in API in videobuf2-core that lets V4L2 +drivers populate a `dma_resv` exclusive write fence on the dmabufs they +export to userspace, signalled when the buffer transitions to +VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in +to demonstrate the call shape; the change is no-op for every other +driver. + +Why +--- +Modern Wayland compositors and any other userspace consumers that +import V4L2-produced dmabufs and want to do implicit synchronization +the spec-clean way (poll(POLLIN) on the dmabuf fd, or +DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either: + +1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's + `dma_resv` has no fences populated. The kernel substitutes + `dma_fence_get_stub()` which is permanently signalled. The compositor + "successfully" waits on a fence that represents nothing real about + the producer's state. +2. A poll(POLLIN) on the dmabuf fd that returns immediately for the + same reason — `dma_buf_poll_add_cb` finds zero fences in the resv, + triggers the wake callback inline, and reports POLLIN ready before + the producer has actually said anything. + +Today this works as a happy accident on most paths because clients +attach buffers after VIDIOC_DQBUF, which the userspace V4L2 contract +guarantees only returns a buffer after the producer is done. So the +implicit "the kernel's stub fence is fine because the buffer is +already complete by the time anyone polls it" assumption has held. + +But: + +- It's a contract gap. 
The kernel claims to expose implicit sync; it + does not, for V4L2 producers. +- It blocks downstream consumers from doing the right thing. A + Wayland compositor that defensively waits on a sync_file gets a + stub-fence pass-through with no actual gating; if the V4L2 driver + ever has an out-of-band path that releases the buffer before + finishing the write (e.g. a reconfig-resize that DQBUFs everything), + there's no fence to gate on. +- It pays latency for nothing. Every Wayland frame from a V4L2 + producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a + fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland + chrome video playback), this is a measurable per-frame cost + contributing to compositor stalls. Removing the wait at the + compositor level (KWin) is a workaround, not a fix. + +The right thing for the kernel to do is populate a real fence. This +series adds the minimal API and demonstrates the per-driver hookup +pattern. + +What +---- +Patch 1 adds: + +- `struct dma_fence *release_fence` to `struct vb2_buffer` +- `u64 dma_resv_fence_context`, `atomic64_t dma_resv_fence_seqno` and + `spinlock_t dma_resv_fence_lock` to `struct vb2_queue` +- `vb2_buffer_attach_release_fence(vb)`: drivers call this from + their `buf_queue` callback. Allocates a `dma_fence` on the queue's + fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's + dmabuf->resv. No-op for buffers without exported dmabufs. +- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)` + and `dma_fence_put` if the fence was attached, so the producer's + completion signal lands in the resv synchronously with the userspace + DQBUF wakeup. + +Patches 2 and 3 add a single call to the helper from `hantro_buf_queue` +and `rga_buf_queue` respectively. One call plus a comment each. + +Tested on +--------- +PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series +backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4 +Wayland.
The test harness is the chromium-fourier patch series +(https://github.com/marfrit/fourier) — chromium plus a KWin patch that +*previously bypassed* `Transaction::watchDmaBuf` because the kernel- +side fence was stub-signalled. With this series applied, the bypass +becomes unnecessary; KWin's fence wait completes correctly because the +fence now signals when hantro completes the capture buffer write. + +End-to-end result before the kernel patch (chromium + Qt 6 patches + +KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined +chrome CPU, but the watchDmaBuf bypass weakens KWin's defenses against +misbehaving clients. + +End-to-end result after the kernel patch (chromium + Qt 6 patches + +plain unmodified KWin): 1080p30 H.264 plays through with the same CPU +profile, KWin's watchDmaBuf wait completes within microseconds against +the now-real producer fence, no defenses weakened. + +What's missing in this RFC +-------------------------- +- Other vb2-using drivers don't opt in. Each maintainer should look + at their driver and decide. The hantro + rga patches show the + shape; copying it to other drivers should be straightforward. +- For drivers that have intermediate image-processor stages + (e.g. CSI → ISP → user), the fence semantics across stage boundaries + are out of scope here. This series only addresses the producer-to- + userspace edge. +- No selftest. videobuf2 doesn't have a great in-tree selftest harness + for dmabuf flows; the validation is end-to-end at the userspace + consumer level (KWin, in our case). + +Reviews especially welcome on: + +- The decision to make this opt-in per driver vs. automatic for all + vb2-CAPTURE queues. Auto-on would force every driver to be audited; + opt-in is incremental and safer but leaves the contract gap for + drivers nobody touches. +- Whether `vb2_buffer_done` is the right place to signal vs. an + earlier hook (e.g. immediately after DMA-from-device finishes). 
For + hantro the two are effectively the same; for drivers with + asynchronous post-processing they may differ. +- The choice of `DMA_RESV_USAGE_WRITE` vs the older + `dma_resv_add_excl_fence` semantics. We're emitting the producer's + write completion, so WRITE matches dma-buf documentation, but I'd + appreciate a sanity check. + +Cheers, +Markus diff --git a/kernel/vb2-dma-resv-rfc/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch b/kernel/vb2-dma-resv-rfc/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch new file mode 100644 index 0000000000..5d7cda9e6c --- /dev/null +++ b/kernel/vb2-dma-resv-rfc/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch @@ -0,0 +1,240 @@ +From: Markus Fritsche +Subject: [PATCH RFC 1/3] media: videobuf2: add dma_resv release-fence helper +Date: 2026-04-28 + +Add an opt-in API that lets vb2 producers populate a `dma_resv` +exclusive write fence on the dmabufs they export to userspace, +signalled when the buffer transitions to VB2_BUF_STATE_DONE. + +Drivers that opt in call `vb2_buffer_attach_release_fence(vb)` from +their `buf_queue` callback after `v4l2_m2m_buf_queue` (or equivalent). +The helper: + + - allocates a dma_fence on the queue's fence context (set up at + vb2_core_queue_init time), + - attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, + - stashes the fence in `vb->release_fence`. + +`vb2_buffer_done` then signals and puts the fence as part of its +existing buffer-state transition, so the userspace consumer that +imported the dmabuf and is poll(POLLIN)-ing it (or waiting on a +sync_file from `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`) sees the fence +signal synchronously with the DQBUF wakeup. + +For drivers that don't opt in, the new field stays NULL and +`vb2_buffer_done` skips the signal path, so the change is a no-op +for them.
+ +Skips planes whose `vb2_plane.dbuf` is NULL — buffers that have +never been exported via VIDIOC_EXPBUF (or imported via +V4L2_MEMORY_DMABUF) have no dmabuf for userspace to wait on. + +Signed-off-by: Markus Fritsche +--- + drivers/media/common/videobuf2/videobuf2-core.c | 116 ++++++++++++++++ + include/media/videobuf2-core.h | 19 +++ + 2 files changed, 135 insertions(+) + +diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c +--- a/drivers/media/common/videobuf2/videobuf2-core.c ++++ b/drivers/media/common/videobuf2/videobuf2-core.c +@@ -22,6 +22,9 @@ + #include + #include + #include ++#include ++#include ++#include + + #include + #include +@@ -1175,6 +1178,107 @@ static void __enqueue_in_driver(struct vb2_buffer *vb) + call_void_vb_qop(vb, buf_queue, vb); + } + ++/* ++ * dma_resv release-fence integration. ++ * ++ * Background: V4L2 producers (vb2-using drivers) historically did not ++ * propagate buffer-state-done into the dmabuf's dma_resv exclusive ++ * fence. Userspace consumers that imported V4L2-produced dmabufs and ++ * tried to do implicit synchronization the spec-clean way ++ * (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) got either zero ++ * fences or a stub fence from dma_fence_get_stub(). This is correct ++ * by accident for the common case (clients call DQBUF before ++ * importing) but represents a contract gap. ++ * ++ * The opt-in API below lets a driver attach a real fence at QBUF ++ * time and have it signalled at vb2_buffer_done. Drivers opt in by ++ * calling vb2_buffer_attach_release_fence(vb) from their buf_queue ++ * callback. No behaviour change for drivers that don't opt in. 
++ */ ++ ++static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence) ++{ ++ return "videobuf2"; ++} ++ ++static const char *vb2_dma_resv_get_timeline_name(struct dma_fence *fence) ++{ ++ return "vb2-release-fence"; ++} ++ ++static const struct dma_fence_ops vb2_dma_resv_fence_ops = { ++ .get_driver_name = vb2_dma_resv_get_driver_name, ++ .get_timeline_name = vb2_dma_resv_get_timeline_name, ++}; ++ ++/** ++ * vb2_buffer_attach_release_fence() - attach a dma_resv exclusive fence ++ * to each of @vb's plane dmabufs, to be signalled when the buffer ++ * transitions to VB2_BUF_STATE_DONE. ++ * ++ * @vb: the buffer being queued to the producer (just-completed ++ * transition out of VB2_BUF_STATE_QUEUED into DRIVER-owned). ++ * ++ * Drivers should call this from their buf_queue callback (after the ++ * driver-internal queueing, e.g. v4l2_m2m_buf_queue()). Planes whose ++ * dbuf is NULL or for which no fence slot can be reserved are skipped. ++ * ++ * Returns 0 on success, negative errno on allocation failure. On ++ * error, no fence is attached and vb->release_fence remains NULL. ++ */ ++int vb2_buffer_attach_release_fence(struct vb2_buffer *vb) ++{ ++ struct vb2_queue *q = vb->vb2_queue; ++ struct dma_fence *fence; ++ unsigned int plane; ++ ++ if (WARN_ON(vb->release_fence)) ++ return -EINVAL; ++ ++ fence = kzalloc(sizeof(*fence), GFP_KERNEL); ++ if (!fence) ++ return -ENOMEM; ++ ++ dma_fence_init(fence, &vb2_dma_resv_fence_ops, &q->dma_resv_fence_lock, ++ q->dma_resv_fence_context, ++ atomic64_inc_return(&q->dma_resv_fence_seqno)); ++ ++ for (plane = 0; plane < vb->num_planes; plane++) { ++ struct dma_buf *dbuf = vb->planes[plane].dbuf; ++ ++ if (!dbuf) ++ continue; ++ ++ dma_resv_lock(dbuf->resv, NULL); ++ /* dma_resv_add_fence() needs a slot reserved beforehand. */ ++ if (!dma_resv_reserve_fences(dbuf->resv, 1)) ++ dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE); ++ dma_resv_unlock(dbuf->resv); ++ } ++ ++ /* Hold one reference for the eventual signal in vb2_buffer_done.
*/ ++ vb->release_fence = dma_fence_get(fence); ++ ++ /* The dma_resv holds its own references for each plane. Drop ours. */ ++ dma_fence_put(fence); ++ ++ return 0; ++} ++EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence); ++ ++static void vb2_buffer_signal_release_fence(struct vb2_buffer *vb, ++ enum vb2_buffer_state state) ++{ ++ struct dma_fence *fence = vb->release_fence; ++ ++ if (!fence) ++ return; ++ ++ if (state == VB2_BUF_STATE_ERROR) ++ dma_fence_set_error(fence, -EIO); ++ dma_fence_signal(fence); ++ dma_fence_put(fence); ++ vb->release_fence = NULL; ++} ++ + static int __enqueue_in_driver_with_request(struct vb2_buffer *vb) + { + if (vb->req_obj.req) { +@@ -1182,12 +1286,15 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) + dprintk(q, 4, "done processing on buffer %d, state: %s\n", + vb->index, vb2_state_name(state)); + + if (state != VB2_BUF_STATE_QUEUED) + __vb2_buf_mem_finish(vb); + ++ if (state != VB2_BUF_STATE_QUEUED) ++ vb2_buffer_signal_release_fence(vb, state); ++ + spin_lock_irqsave(&q->done_lock, flags); + if (state == VB2_BUF_STATE_QUEUED) { + vb->state = VB2_BUF_STATE_QUEUED; + } else { +@@ -2598,6 +2705,15 @@ int vb2_core_queue_init(struct vb2_queue *q) + mutex_init(&q->mmap_lock); + init_waitqueue_head(&q->done_wq); + ++ /* ++ * Per-queue dma_resv fence context. Drivers that opt into ++ * vb2_buffer_attach_release_fence() use these to allocate ++ * fences in their own timeline; drivers that don't opt in ++ * pay only the cost of three unused fields in vb2_queue.
++ */ ++ q->dma_resv_fence_context = dma_fence_context_alloc(1); ++ atomic64_set(&q->dma_resv_fence_seqno, 0); ++ spin_lock_init(&q->dma_resv_fence_lock); ++ + q->memory = VB2_MEMORY_UNKNOWN; + + if (q->buf_struct_size == 0) +diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h +--- a/include/media/videobuf2-core.h ++++ b/include/media/videobuf2-core.h +@@ -19,6 +19,7 @@ + #include + #include + #include + #include ++struct dma_fence; + +@@ -286,6 +287,12 @@ struct vb2_buffer { + unsigned int skip_cache_sync_on_finish:1; + + struct vb2_plane planes[VB2_MAX_PLANES]; ++ /* ++ * dma_resv release fence — set by vb2_buffer_attach_release_fence() ++ * (driver opt-in from buf_queue), signalled by vb2_buffer_done. ++ * NULL for drivers that don't opt in. ++ */ ++ struct dma_fence *release_fence; + struct list_head queued_entry; + struct list_head done_entry; + +@@ -645,6 +652,11 @@ struct vb2_queue { + wait_queue_head_t done_wq; + ++ /* dma_resv release-fence integration (opt-in per buffer). */ ++ u64 dma_resv_fence_context; ++ atomic64_t dma_resv_fence_seqno; ++ spinlock_t dma_resv_fence_lock; ++ + unsigned int streaming:1; + unsigned int start_streaming_called:1; + unsigned int error:1; +@@ -750,6 +762,13 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); + */ + void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); + ++/** ++ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence. ++ * Called from a driver's buf_queue callback after enqueueing the ++ * buffer in the driver's own queue. See videobuf2-core.c for ++ * rationale and call shape. ++ */ ++int vb2_buffer_attach_release_fence(struct vb2_buffer *vb); ++ + /** + * vb2_discard_done() - discard all buffers marked as DONE. + * @q: pointer to &struct vb2_queue with videobuf2 queue. 
+-- +2.44.0 diff --git a/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch b/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch new file mode 100644 index 0000000000..d61249c73c --- /dev/null +++ b/kernel/vb2-dma-resv-rfc/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch @@ -0,0 +1,79 @@ +From: Markus Fritsche +Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue +Date: 2026-04-28 + +Opt the hantro driver into the new vb2 release-fence helper. + +When userspace QBUFs a buffer to hantro, the buffer is added to the +driver's m2m queue via v4l2_m2m_buf_queue. We additionally call +vb2_buffer_attach_release_fence() so each plane's dmabuf->resv gets +a real producer fence attached. The fence is signalled by vb2_buffer_done +when hantro completes the decode (via v4l2_m2m_buf_done_and_job_finish +in hantro_drv.c, which converges on vb2_buffer_done). + +Wayland compositors that import hantro CAPTURE buffers from clients +(chrome, firefox, mpv, gstreamer) and wait on the dmabuf's +implicit-sync fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE) +now wait on a real fence representing the producer's actual +completion, not a stub. KWin's `Transaction::watchDmaBuf` path on +Mali-class hardware is the user-visible benefit: the per-frame +sync_file roundtrip completes the moment hantro's IRQ handler runs, +instead of either polling on a stub fence or, in the failure mode +that drove this work, failing to signal at all due to a race that +the stub-fence path masked. + +Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this +series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in +chromium under stock KDE Plasma 6.6.4 Wayland: the chrome stall that +required a KWin watchDmaBuf bypass workaround (kwin-fourier in the +chromium-fourier project) is gone with this kernel-side fix in +place; KWin's wait completes correctly.
+ +Signed-off-by: Markus Fritsche +--- + drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++-- + 1 file changed, 15 insertions(+), 2 deletions(-) + +diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c +--- a/drivers/media/platform/verisilicon/hantro_v4l2.c ++++ b/drivers/media/platform/verisilicon/hantro_v4l2.c +@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb) + { + struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue); + struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); + + if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) && + vb2_is_streaming(vb->vb2_queue) && + v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) { + unsigned int i; + + for (i = 0; i < vb->num_planes; i++) + vb2_set_plane_payload(vb, i, 0); + + vbuf->field = V4L2_FIELD_NONE; + vbuf->sequence = + ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++; + + v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE); + return; + } + +- v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); ++ v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); ++ ++ /* ++ * Opt in to vb2's dma_resv release-fence path: any userspace ++ * consumer that imported this buffer's dmabuf and is doing ++ * implicit-sync via poll(POLLIN) or ++ * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence ++ * representing this device's completion, instead of the stub ++ * fence dma_buf_export_sync_file substitutes when dma_resv is ++ * empty. Best-effort: if fence allocation fails we just lose ++ * the implicit-sync precision, no functional regression. 
++ */ ++ (void)vb2_buffer_attach_release_fence(vb); + } + + const struct vb2_ops hantro_queue_ops = { +-- +2.44.0 diff --git a/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch b/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch new file mode 100644 index 0000000000..8fbf57c298 --- /dev/null +++ b/kernel/vb2-dma-resv-rfc/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch @@ -0,0 +1,47 @@ +From: Markus Fritsche +Subject: [PATCH RFC 3/3] media: rockchip-rga: attach dma_resv release fence at buf_queue +Date: 2026-04-28 + +Opt the Rockchip RGA driver into the new vb2 release-fence helper. + +Same shape as the hantro patch: the existing buf_queue path enqueues +the buffer in the driver's m2m queue via v4l2_m2m_buf_queue, and we +additionally attach a release fence to each plane's dmabuf->resv via +vb2_buffer_attach_release_fence(). vb2_buffer_done signals the fence +when RGA completes the M2M operation. + +Userspace consumers of RGA-produced dmabufs (image-processing +pipelines, screen-rotation servers, gstreamer flows) get spec-clean +implicit-sync semantics, matching what hantro now does in the same +patch series. 
+ +Signed-off-by: Markus Fritsche +--- + drivers/media/platform/rockchip/rga/rga-buf.c | 11 +++++++++++ + 1 file changed, 11 insertions(+) + +diff --git a/drivers/media/platform/rockchip/rga/rga-buf.c b/drivers/media/platform/rockchip/rga/rga-buf.c +--- a/drivers/media/platform/rockchip/rga/rga-buf.c ++++ b/drivers/media/platform/rockchip/rga/rga-buf.c +@@ -150,7 +150,18 @@ static void rga_buf_queue(struct vb2_buffer *vb) + { + struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); + struct rga_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue); + + v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf); ++ ++ /* ++ * Opt in to vb2's dma_resv release-fence path so userspace ++ * consumers of RGA-produced dmabufs get a real producer fence ++ * to wait on instead of the dma_buf core's substitute stub ++ * fence. See the leading patch in this series for rationale ++ * and the helper definition. Best-effort: a fence-allocation ++ * failure means we lose implicit-sync precision but the m2m ++ * operation itself proceeds normally. ++ */ ++ (void)vb2_buffer_attach_release_fence(vb); + } + + static void rga_buf_cleanup(struct vb2_buffer *vb) +-- +2.44.0 diff --git a/kernel/vb2-dma-resv-rfc/README.md b/kernel/vb2-dma-resv-rfc/README.md new file mode 100644 index 0000000000..88d5cffc77 --- /dev/null +++ b/kernel/vb2-dma-resv-rfc/README.md @@ -0,0 +1,94 @@ +# vb2 dma_resv release-fence — RFC patch series + +A 3-patch RFC series that adds an opt-in dma_resv exclusive-fence +API to videobuf2, with hantro and rockchip-rga as the first two +drivers to opt in. Drafted as part of the +[fourier](https://github.com/marfrit/fourier) campaign — see the +top-level [`KWIN_PIVOT.md`](../../arch/chromium-fourier/KWIN_PIVOT.md) +for the discovery thread. 
+ +## Files + +``` +0000-cover-letter.patch +0001-media-videobuf2-add-dma_resv-release-fence-helper.patch +0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch +0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch +``` + +## What this fixes + +vb2 producers historically don't propagate buffer-state-done into +the dmabuf's `dma_resv` exclusive fence. Userspace consumers that +import V4L2-produced dmabufs and try to do implicit synchronization +the spec-clean way (`poll(POLLIN)` on the dmabuf fd, or +`DMA_BUF_IOCTL_EXPORT_SYNC_FILE` for a sync_file) get either zero +fences or a stub fence from `dma_fence_get_stub()`. This is correct +by accident for the common case (clients call DQBUF before +importing) but represents a contract gap. + +The opt-in API in patch 1 lets a driver populate a real fence at +QBUF time and have it signalled by vb2_buffer_done. Patches 2 and 3 +demonstrate the call shape on hantro and rga (one line each in +their respective `buf_queue` callbacks). + +## Status + +Patches drafted but **not yet applied / compile-tested / runtime- +tested.** They're written against linux-next master as of +2026-04-28 (sparse-checked-out at `/tmp/hantro-src` during the +chromium-fourier campaign on ohm). Pre-flight before sending: + +1. **Compile** — `make drivers/media/common/videobuf2/videobuf2-core.o + drivers/media/platform/verisilicon/hantro_v4l2.o + drivers/media/platform/rockchip/rga/rga-buf.o` against the kernel + tree the patches will land on. Fix any drift in declarations or + line numbers. +2. **Boot test on ohm** — install the patched kernel, verify hantro + and rga still queue/dequeue buffers correctly (mpv `--vo=drm` + smoke test, gstreamer rga pipeline smoke test). +3. 
**Validate the fence semantics**: install the patched kernel, **also + uninstall the kwin-fourier package** (so KWin's watchDmaBuf is + active again), play 1080p30 H.264 in chromium-fourier under KDE + Plasma 6.6.4 Wayland: it should play through end-to-end *without* + the watchDmaBuf bypass, because KWin's wait now blocks on a + real fence that signals when hantro completes the buffer. +4. **Capture timings**: `dma_buf_export_sync_file` round-trip + latency before and after, on the same hardware. The series + should not regress it; ideally the fence-add path is fast enough + that compositor latency improves slightly (the wait now fires + on real producer completion instead of a stub-resolved poll). + +If step 3 passes, the RFC has end-to-end validation backing the +submission. Send to linux-media: + +``` +git format-patch --cover-letter --to=linux-media@vger.kernel.org \ + --cc='Hans Verkuil ' \ + --cc='Ezequiel Garcia ' \ + --cc='Mauro Carvalho Chehab ' \ + --cc='dri-devel@lists.freedesktop.org' \ + -3 HEAD +``` + +## Open questions for upstream review + +(Listed in the cover letter; copied here for convenience.) + +- **Opt-in vs. auto-on**: should every CAPTURE queue auto-attach + fences, or stay opt-in per-driver? Auto-on is more correct but + forces every driver to be audited; opt-in is incremental and + safer. +- **Signal point**: `vb2_buffer_done` is the latest moment the + producer-write is guaranteed-complete. For drivers with async + post-processing stages (image-processor pipelines) the producer + fence might want to fire at an earlier point. Out of scope for + this RFC; revisit when an actual driver complains. +- **DMA_RESV_USAGE_WRITE vs. older `dma_resv_add_excl_fence`**: + matches dma-buf documentation for "this device produced a + write." Sanity check welcome. + +## License + +Patches are GPL-2.0-only matching the kernel source. The cover +letter is informational.
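
## Consumer-side sketch

The consumer-side wait that "What this fixes" refers to can be sketched in userspace C. This is a minimal illustration and not part of the patches: `wait_dmabuf_readable` is a hypothetical helper name, and the sketch assumes `fd` is a dmabuf fd obtained from `VIDIOC_EXPBUF`. On a dmabuf, `poll(POLLIN)` completes once the write fences in the buffer's `dma_resv` have signalled, which with this series applied would include the vb2 release fence.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not part of the series): wait until `fd`
 * reports POLLIN or `timeout_ms` elapses. For a dmabuf fd, POLLIN
 * means the write fences in the buffer's dma_resv have signalled.
 *
 * Returns 1 if readable, 0 on timeout (or if POLLIN was not
 * reported), negative errno on failure.
 */
static int wait_dmabuf_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	/* Retry on signal interruption; this restarts the full timeout. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return 0;
	if (pfd.revents & (POLLERR | POLLNVAL))
		return -EIO;

	return (pfd.revents & POLLIN) ? 1 : 0;
}
```

The helper only calls `poll()`, so it can be smoke-tested against any pollable fd (a pipe, for instance) before it is pointed at a real exported buffer.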