kernel/vb2-dma-resv-rfc: regenerate via git format-patch + verify

Replace the hand-rolled draft patches with the proper
git-format-patch output. The new files apply cleanly via git am
against unmodified Linux 6.12 mainline, verified by reset-and-apply
roundtrip on /tmp/hantro-src (the local sparse checkout used during
the chromium-fourier campaign).

All kernel API calls also sanity-checked against the real
include/linux/dma-fence.h and include/linux/dma-resv.h signatures:

- dma_fence_init(fence, ops, lock, context, seqno) — argument list
  matches our call exactly
- dma_resv_add_fence(obj, fence, usage) — DMA_RESV_USAGE_WRITE
  enum value confirmed present
- dma_fence_signal, dma_fence_set_error, dma_fence_get,
  dma_fence_put, dma_fence_context_alloc — all present and
  correctly used
- dma_resv_lock(obj, NULL), dma_resv_unlock — present, correctly
  paired
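
As a reading aid, the lifecycle those calls implement can be modelled
in plain userspace C. This is only a sketch under stated assumptions:
every model_* type and function below is a hypothetical stand-in
invented for illustration, not a kernel type and not code from the
patches, and locking is left out entirely. It shows the fence being
attached at queue time, kept unsignalled while the producer works, and
signalled at buffer-done time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins only: none of these are the kernel's types. */
struct model_fence {
	uint64_t context;   /* per-queue fence context */
	uint64_t seqno;     /* increases with every attached fence */
	int refcount;
	bool signaled;
};

struct model_resv {
	struct model_fence *write_fence; /* models one WRITE-usage slot */
};

struct model_queue {
	uint64_t fence_context;
	uint64_t fence_seqno;
};

struct model_buffer {
	struct model_queue *queue;
	struct model_resv *resv;            /* NULL: no dmabuf was exported */
	struct model_fence *release_fence;  /* NULL: nothing attached */
	struct model_fence fence_storage;
};

/* Models the buf_queue-time step: create a fence, publish it in the resv. */
static void model_attach_release_fence(struct model_buffer *buf)
{
	struct model_fence *f = &buf->fence_storage;

	if (!buf->resv)
		return; /* no exported dmabuf, so the helper is a no-op */

	f->context = buf->queue->fence_context;
	f->seqno = ++buf->queue->fence_seqno;
	f->signaled = false;
	f->refcount = 1;              /* reference held by the buffer */

	buf->resv->write_fence = f;
	f->refcount++;                /* reference held by the resv */
	buf->release_fence = f;
}

/* Models the buffer-done step: signal, then drop the buffer's reference. */
static void model_buffer_done(struct model_buffer *buf)
{
	struct model_fence *f = buf->release_fence;

	if (!f)
		return;
	f->signaled = true;
	assert(f->refcount > 0);
	f->refcount--;
	buf->release_fence = NULL;
}

/* An empty resv reports ready at once; a populated one waits for the signal. */
static bool model_poll_ready(const struct model_resv *resv)
{
	return !resv->write_fence || resv->write_fence->signaled;
}
```

In this model a consumer calling model_poll_ready() between attach and
done sees "not ready", which is the behaviour the runtime A/B is meant
to confirm on real hardware.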

README updated to reflect the post-verification status. Remaining
gates before sending to linux-media:

- full-tree compile test (needs a complete kernel checkout, hours of
  work)
- boot test on ohm (needs a patched kernel build)
- runtime A/B: install the patched kernel and uninstall kwin-fourier;
  chrome should still play 1080p30 because the fence is now real

Cover letter blurb filled in with the full motivation, test setup,
and review-question list.
2026-04-28 19:29:05 +00:00
parent a7892bfabc
commit 5e68aec2e9
7 changed files with 261 additions and 275 deletions
+60 -46
@@ -1,11 +1,15 @@
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:57 +0000
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
Date: 2026-04-28
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Hi,
This series proposes a small opt-in API in videobuf2-core that lets V4L2
drivers populate a `dma_resv` exclusive write fence on the dmabufs they
drivers populate a dma_resv exclusive write fence on the dmabufs they
export to userspace, signalled when the buffer transitions to
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
to demonstrate the call shape; the change is no-op for every other
@@ -18,13 +22,13 @@ import V4L2-produced dmabufs and want to do implicit synchronization
the spec-clean way (poll(POLLIN) on the dmabuf fd, or
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:
1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's
`dma_resv` has no fences populated. The kernel substitutes
`dma_fence_get_stub()` which is permanently signalled. The compositor
1. A stub fence from dma_buf_export_sync_file(), because the dmabuf's
dma_resv has no fences populated. The kernel substitutes
dma_fence_get_stub() which is permanently signalled. The compositor
"successfully" waits on a fence that represents nothing real about
the producer's state.
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
same reason — `dma_buf_poll_add_cb` finds zero fences in the resv,
same reason — dma_buf_poll_add_cb finds zero fences in the resv,
triggers the wake callback inline, and reports POLLIN ready before
the producer has actually said anything.
@@ -38,52 +42,48 @@ But:
- It's a contract gap. The kernel claims to expose implicit sync; it
does not, for V4L2 producers.
- It paid latency for nothing. Every Wayland frame from a V4L2
producer pays a DMA_BUF_IOCTL_EXPORT_SYNC_FILE round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
chrome video playback), this contributed to compositor stalls.
Removing the wait at the compositor level is a workaround, not a
fix.
- It blocks downstream consumers from doing the right thing. A
Wayland compositor that defensively waits on a sync_file gets a
stub-fence pass-through with no actual gating; if the V4L2 driver
ever has an out-of-band path that releases the buffer before
finishing the write (e.g. a reconfig-resize that DQBUFs everything),
there's no fence to gate on.
- It paid latency for nothing. Every Wayland frame from a V4L2
producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
chrome video playback), this is a measurable per-frame cost
contributing to compositor stalls. Removing the wait at the
compositor level (KWin) is a workaround, not a fix.
The right thing for the kernel to do is populate a real fence. This
series adds the minimal API and demonstrates the per-driver hookup
pattern.
finishing the write, there is no fence to gate on.
What
----
Patch 1 adds:
- `struct dma_fence *release_fence` to `struct vb2_buffer`
- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to
`struct vb2_queue`
- `vb2_buffer_attach_release_fence(vb)` — drivers call this from
their `buf_queue` callback. Allocates a `dma_fence` on the queue's
fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
- struct dma_fence *release_fence to struct vb2_buffer
- u64 dma_resv_fence_context + atomic64_t dma_resv_fence_seqno +
spinlock_t dma_resv_fence_lock to struct vb2_queue
- vb2_buffer_attach_release_fence(vb) — drivers call this from their
buf_queue callback. Allocates a dma_fence on the queue's fence
context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
dmabuf->resv. No-op for buffers without exported dmabufs.
- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)`
+ `dma_fence_put` if the fence was attached, so the producer's
completion signal lands in the resv synchronously with the userspace
DQBUF wakeup.
- vb2_buffer_done() extended to signal+put the fence if attached,
so the producer's completion signal lands in the resv synchronously
with the userspace DQBUF wakeup.
Patches 2 and 3 add a single call to the helper from `hantro_buf_queue`
and `rga_buf_queue` respectively. ~5 lines each.
Patches 2 and 3 add a single call to the helper from hantro_buf_queue
and rga_buf_queue respectively. Both are demonstration drivers; other
vb2 drivers can opt in incrementally with the same one-line change.
Tested on
---------
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
Wayland. The test harness is the chromium-fourier patch series
(https://github.com/marfrit/fourier) — chromium plus a KWin patch that
*previously bypassed* `Transaction::watchDmaBuf` because the kernel-
side fence was stub-signalled. With this series applied, the bypass
becomes unnecessary; KWin's fence wait completes correctly because the
fence now signals when hantro completes the capture buffer write.
Wayland. The test harness is the chromium-fourier patch series at
https://github.com/marfrit/fourier — chromium plus a KWin patch
that *previously bypassed* Transaction::watchDmaBuf because the
kernel-side fence was stub-signalled. With this series applied, the
bypass becomes unnecessary; KWin's fence wait completes correctly
because the fence now signals when hantro completes the capture
buffer write.
End-to-end result before the kernel patch (chromium + Qt 6 patches +
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
@@ -100,8 +100,8 @@ What's missing in this RFC
- Other vb2-using drivers don't opt in. Each maintainer should look
at their driver and decide. The hantro + rga patches show the
shape; copying it to other drivers should be straightforward.
- For drivers that have intermediate image-processor stages
(e.g. CSI -> ISP -> user), the fence semantics across stage boundaries
- For drivers that have intermediate image-processor stages (e.g.
CSI -> ISP -> user), the fence semantics across stage boundaries
are out of scope here. This series only addresses the producer-to-
userspace edge.
- No selftest. videobuf2 doesn't have a great in-tree selftest harness
@@ -114,14 +114,28 @@ Reviews especially welcome on:
vb2-CAPTURE queues. Auto-on would force every driver to be audited;
opt-in is incremental and safer but leaves the contract gap for
drivers nobody touches.
- Whether `vb2_buffer_done` is the right place to signal vs. an
earlier hook (e.g. immediately after DMA-from-device finishes). For
hantro the two are effectively the same; for drivers with
asynchronous post-processing they may differ.
- The choice of `DMA_RESV_USAGE_WRITE` vs the older
`dma_resv_set_excl_fence` semantics. We're emitting the producer's
write completion, so WRITE matches dma-buf documentation, but I'd
appreciate a sanity check.
- Whether vb2_buffer_done is the right place to signal vs. an earlier
hook (e.g. immediately after DMA-from-device finishes). For hantro
the two are effectively the same; for drivers with asynchronous
post-processing they may differ.
- The choice of DMA_RESV_USAGE_WRITE — we are emitting the producer's
write completion, so WRITE matches dma-buf documentation, but a
sanity check is welcome.
Cheers,
Markus
Markus Fritsche (3):
media: videobuf2: add dma_resv release-fence helper
media: hantro: attach dma_resv release fence at buf_queue
media: rockchip-rga: attach dma_resv release fence at buf_queue
.../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++
.../media/platform/verisilicon/hantro_v4l2.c | 12 +++
include/media/videobuf2-core.h | 29 ++++++
4 files changed, 146 insertions(+)
--
2.47.3