kernel/vb2-dma-resv-rfc: regenerate via git format-patch + verify
Replace the hand-rolled draft patches with the proper git-format-patch output. The new files apply cleanly via git am against unmodified Linux 6.12 mainline, verified by reset-and-apply roundtrip on /tmp/hantro-src (the local sparse checkout used during the chromium-fourier campaign).

All kernel API calls also sanity-checked against the real include/linux/dma-fence.h and include/linux/dma-resv.h signatures:

- dma_fence_init(fence, ops, lock, context, seqno): argument list matches our call exactly
- dma_resv_add_fence(obj, fence, usage): DMA_RESV_USAGE_WRITE enum value confirmed present
- dma_fence_signal, dma_fence_set_error, dma_fence_get, dma_fence_put, dma_fence_context_alloc: all present and correctly used
- dma_resv_lock(obj, NULL), dma_resv_unlock: present, correctly paired

README updated to reflect the post-verification status. Remaining gates before sending to linux-media are now: full-tree compile test (needs complete kernel checkout, hours of work), boot test on ohm (needs patched kernel build), and the runtime A/B (install patched kernel + uninstall kwin-fourier; chrome should still play 1080p30 because the fence is now real).

Cover letter blurb filled in with the full motivation, test setup, and review-question list.
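The fence lifecycle that the commit message and the cover letter describe (attach at buf_queue, signal and put at buffer completion, no-op without exported dmabufs) can be sketched as a small userspace model. This is only an illustration: every model_* type and function below is hypothetical, nothing here is kernel code or the series' actual implementation, and locking (dma_resv_lock, the fence spinlock) is left out entirely.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_PLANES 4

struct model_fence {
	uint64_t context;	/* per-queue timeline id */
	uint64_t seqno;		/* position on that timeline */
	int error;		/* 0, or a negative errno-style code */
	bool signalled;
	int refcount;
};

struct model_plane {
	bool exported;			 /* userspace holds a dmabuf for it */
	struct model_fence *write_fence; /* stands in for the resv WRITE slot */
};

struct model_queue {
	uint64_t fence_context;
	uint64_t fence_seqno;
};

struct model_buffer {
	struct model_queue *queue;
	struct model_plane planes[MODEL_MAX_PLANES];
	unsigned int num_planes;
	struct model_fence *release_fence;
};

void model_fence_put(struct model_fence *f)
{
	if (f && --f->refcount == 0)
		free(f);
}

/*
 * buf_queue time: create one unsignalled fence on the queue's timeline
 * and publish it on every exported plane. Returns 0 on success or when
 * there is nothing to do, -1 if allocation fails.
 */
int model_attach_release_fence(struct model_buffer *vb)
{
	struct model_fence *f;
	bool any_exported = false;
	unsigned int i;

	assert(vb && vb->queue);
	for (i = 0; i < vb->num_planes && i < MODEL_MAX_PLANES; i++)
		any_exported |= vb->planes[i].exported;
	if (!any_exported || vb->release_fence)
		return 0;	/* no exported dmabuf, or already attached */

	f = calloc(1, sizeof(*f));
	if (!f)
		return -1;
	f->context = vb->queue->fence_context;
	f->seqno = ++vb->queue->fence_seqno;
	f->refcount = 1;	/* reference held by the buffer */

	for (i = 0; i < vb->num_planes && i < MODEL_MAX_PLANES; i++) {
		if (!vb->planes[i].exported)
			continue;
		model_fence_put(vb->planes[i].write_fence);
		vb->planes[i].write_fence = f;
		f->refcount++;	/* reference held by the plane */
	}
	vb->release_fence = f;
	return 0;
}

/* Completion time: record any error first, then signal, then drop the ref. */
void model_buffer_done(struct model_buffer *vb, int error)
{
	struct model_fence *f = vb->release_fence;

	if (!f)
		return;
	if (error)
		f->error = error;
	f->signalled = true;
	vb->release_fence = NULL;
	model_fence_put(f);	/* planes keep their own references */
}

/* What a consumer waiting on the plane would observe. */
bool model_plane_ready(const struct model_plane *p)
{
	return !p->write_fence || p->write_fence->signalled;
}
```

In this model a plane with no fence reports ready immediately, which mirrors the stub-fence behaviour the cover letter complains about; once a fence is attached, the plane only reports ready after model_buffer_done() runs.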
@@ -1,11 +1,15 @@
+From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
 From: Markus Fritsche <mfritsche@reauktion.de>
+Date: Tue, 28 Apr 2026 19:23:57 +0000
 Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
-Date: 2026-04-28
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
 
 Hi,
 
 This series proposes a small opt-in API in videobuf2-core that lets V4L2
-drivers populate a `dma_resv` exclusive write fence on the dmabufs they
+drivers populate a dma_resv exclusive write fence on the dmabufs they
 export to userspace, signalled when the buffer transitions to
 VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
 to demonstrate the call shape; the change is no-op for every other
@@ -18,13 +22,13 @@ import V4L2-produced dmabufs and want to do implicit synchronization
 the spec-clean way (poll(POLLIN) on the dmabuf fd, or
 DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:
 
-1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's
-`dma_resv` has no fences populated. The kernel substitutes
-`dma_fence_get_stub()` which is permanently signalled. The compositor
+1. A stub fence from dma_buf_export_sync_file(), because the dmabuf's
+dma_resv has no fences populated. The kernel substitutes
+dma_fence_get_stub() which is permanently signalled. The compositor
 "successfully" waits on a fence that represents nothing real about
 the producer's state.
 2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
-same reason — `dma_buf_poll_add_cb` finds zero fences in the resv,
+same reason — dma_buf_poll_add_cb finds zero fences in the resv,
 triggers the wake callback inline, and reports POLLIN ready before
 the producer has actually said anything.
 
@@ -38,52 +42,48 @@ But:
 
 - It's a contract gap. The kernel claims to expose implicit sync; it
 does not, for V4L2 producers.
+- It paid latency for nothing. Every Wayland frame from a V4L2
+producer pays a DMA_BUF_IOCTL_EXPORT_SYNC_FILE round-trip for a
+fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
+chrome video playback), this contributed to compositor stalls.
+Removing the wait at the compositor level is a workaround, not a
+fix.
 - It blocks downstream consumers from doing the right thing. A
 Wayland compositor that defensively waits on a sync_file gets a
 stub-fence pass-through with no actual gating; if the V4L2 driver
 ever has an out-of-band path that releases the buffer before
-finishing the write (e.g. a reconfig-resize that DQBUFs everything),
-there's no fence to gate on.
-- It paid latency for nothing. Every Wayland frame from a V4L2
-producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
-fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
-chrome video playback), this is a measurable per-frame cost
-contributing to compositor stalls. Removing the wait at the
-compositor level (KWin) is a workaround, not a fix.
-
-The right thing for the kernel to do is populate a real fence. This
-series adds the minimal API and demonstrates the per-driver hookup
-pattern.
+finishing the write, there is no fence to gate on.
 
 What
 ----
 Patch 1 adds:
 
-- `struct dma_fence *release_fence` to `struct vb2_buffer`
-- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to
-`struct vb2_queue`
-- `vb2_buffer_attach_release_fence(vb)` — drivers call this from
-their `buf_queue` callback. Allocates a `dma_fence` on the queue's
-fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
+- struct dma_fence *release_fence to struct vb2_buffer
+- u64 dma_resv_fence_context + atomic64_t dma_resv_fence_seqno +
+spinlock_t dma_resv_fence_lock to struct vb2_queue
+- vb2_buffer_attach_release_fence(vb) — drivers call this from their
+buf_queue callback. Allocates a dma_fence on the queue's fence
+context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
 dmabuf->resv. No-op for buffers without exported dmabufs.
-- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)`
-+ `dma_fence_put` if the fence was attached, so the producer's
-completion signal lands in the resv synchronously with the userspace
-DQBUF wakeup.
+- vb2_buffer_done() extended to signal+put the fence if attached,
+so the producer's completion signal lands in the resv synchronously
+with the userspace DQBUF wakeup.
 
-Patches 2 and 3 add a single call to the helper from `hantro_buf_queue`
-and `rga_buf_queue` respectively. ~5 lines each.
+Patches 2 and 3 add a single call to the helper from hantro_buf_queue
+and rga_buf_queue respectively. Both are demonstration drivers; other
+vb2 drivers can opt in incrementally with the same one-line change.
 
 Tested on
 ---------
 PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
 backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
-Wayland. The test harness is the chromium-fourier patch series
-(https://github.com/marfrit/fourier) — chromium plus a KWin patch that
-*previously bypassed* `Transaction::watchDmaBuf` because the kernel-
-side fence was stub-signalled. With this series applied, the bypass
-becomes unnecessary; KWin's fence wait completes correctly because the
-fence now signals when hantro completes the capture buffer write.
+Wayland. The test harness is the chromium-fourier patch series at
+https://github.com/marfrit/fourier — chromium plus a KWin patch
+that *previously bypassed* Transaction::watchDmaBuf because the
+kernel-side fence was stub-signalled. With this series applied, the
+bypass becomes unnecessary; KWin's fence wait completes correctly
+because the fence now signals when hantro completes the capture
+buffer write.
 
 End-to-end result before the kernel patch (chromium + Qt 6 patches +
 KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
@@ -100,8 +100,8 @@ What's missing in this RFC
 - Other vb2-using drivers don't opt in. Each maintainer should look
 at their driver and decide. The hantro + rga patches show the
 shape; copying it to other drivers should be straightforward.
-- For drivers that have intermediate image-processor stages
-(e.g. CSI → ISP → user), the fence semantics across stage boundaries
+- For drivers that have intermediate image-processor stages (e.g.
+CSI -> ISP -> user), the fence semantics across stage boundaries
 are out of scope here. This series only addresses the producer-to-
 userspace edge.
 - No selftest. videobuf2 doesn't have a great in-tree selftest harness
@@ -114,14 +114,28 @@ Reviews especially welcome on:
 vb2-CAPTURE queues. Auto-on would force every driver to be audited;
 opt-in is incremental and safer but leaves the contract gap for
 drivers nobody touches.
-- Whether `vb2_buffer_done` is the right place to signal vs. an
-earlier hook (e.g. immediately after DMA-from-device finishes). For
-hantro the two are effectively the same; for drivers with
-asynchronous post-processing they may differ.
-- The choice of `DMA_RESV_USAGE_WRITE` vs the older
-`dma_resv_set_excl_fence` semantics. We're emitting the producer's
-write completion, so WRITE matches dma-buf documentation, but I'd
-appreciate a sanity check.
+- Whether vb2_buffer_done is the right place to signal vs. an earlier
+hook (e.g. immediately after DMA-from-device finishes). For hantro
+the two are effectively the same; for drivers with asynchronous
+post-processing they may differ.
+- The choice of DMA_RESV_USAGE_WRITE — we are emitting the producer's
+write completion, so WRITE matches dma-buf documentation, but a
+sanity check is welcome.
 
 Cheers,
 Markus
+
+Markus Fritsche (3):
+media: videobuf2: add dma_resv release-fence helper
+media: hantro: attach dma_resv release fence at buf_queue
+media: rockchip-rga: attach dma_resv release fence at buf_queue
+
+.../media/common/videobuf2/videobuf2-core.c | 95 +++++++++++++++++++
+drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++
+.../media/platform/verisilicon/hantro_v4l2.c | 12 +++
+include/media/videobuf2-core.h | 29 ++++++
+4 files changed, 146 insertions(+)
+
+--
+2.47.3
+
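For the consumer side that the cover letter mentions (poll(POLLIN) on the dmabuf fd), a minimal userspace sketch might look like the following. The helper name wait_fd_readable is hypothetical and is not part of the series or of any library; it only wraps the standard poll() call. On a dmabuf fd, POLLIN becomes ready once the write fences in the buffer's dma_resv have signalled, which is the edge this series is meant to make meaningful for V4L2 producers.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait until fd reports POLLIN, or until timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set).
 * The helper is generic; it does not check that fd is a dmabuf.
 */
int wait_fd_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	/* Retry if a signal interrupts the wait (the timeout restarts). */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;
	if (pfd.revents & (POLLERR | POLLNVAL)) {
		errno = EIO;
		return -1;
	}
	return (pfd.revents & POLLIN) ? 1 : 0;
}
```

A compositor-like consumer would call this with the imported dmabuf fd before sampling from the buffer. Without a producer fence in the resv the call returns 1 immediately; with one attached, it should block until the producer signals completion.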