5e68aec2e9
Replace the hand-rolled draft patches with the proper git-format-patch output.

The new files apply cleanly via git am against unmodified Linux 6.12 mainline, verified by reset-and-apply roundtrip on /tmp/hantro-src (the local sparse checkout used during the chromium-fourier campaign).

All kernel API calls were also sanity-checked against the real include/linux/dma-fence.h and include/linux/dma-resv.h signatures:

- dma_fence_init(fence, ops, lock, context, seqno): argument list matches our call exactly
- dma_resv_add_fence(obj, fence, usage): DMA_RESV_USAGE_WRITE enum value confirmed present
- dma_fence_signal, dma_fence_set_error, dma_fence_get, dma_fence_put, dma_fence_context_alloc: all present and correctly used
- dma_resv_lock(obj, NULL), dma_resv_unlock: present, correctly paired

README updated to reflect the post-verification status. Remaining gates before sending to linux-media are now: full-tree compile test (needs complete kernel checkout, hours of work), boot test on ohm (needs patched kernel build), and the runtime A/B (install patched kernel + uninstall kwin-fourier; chrome should still play 1080p30 because the fence is now real).

Cover letter blurb filled in with the full motivation, test setup, and review-question list.
142 lines
6.2 KiB
From 1a1619ab9ad3583842fbfffc649d7662619fb73b Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Tue, 28 Apr 2026 19:23:57 +0000
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi,

This series proposes a small opt-in API in videobuf2-core that lets V4L2
drivers populate a dma_resv exclusive write fence on the dmabufs they
export to userspace, signalled when the buffer transitions to
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
to demonstrate the call shape; the change is a no-op for every other
driver.

Why
---
Modern Wayland compositors and any other userspace consumers that
import V4L2-produced dmabufs and want to do implicit synchronization
the spec-clean way (poll(POLLIN) on the dmabuf fd, or
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:

1. A stub fence from dma_buf_export_sync_file(), because the dmabuf's
   dma_resv has no fences populated. The kernel substitutes
   dma_fence_get_stub() which is permanently signalled. The compositor
   "successfully" waits on a fence that represents nothing real about
   the producer's state.
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
   same reason: dma_buf_poll_add_cb finds zero fences in the resv,
   triggers the wake callback inline, and reports POLLIN ready before
   the producer has actually said anything.
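
For reference, the consumer-side wait described above can be sketched in
userspace C roughly as follows. This is only an illustration, not code
from the series or from any compositor: the helper name
wait_for_producer() and the sketch_/SKETCH_ definitions are made up
here, and the struct and ioctl number are a local mirror of
include/uapi/linux/dma-buf.h so that the snippet builds without kernel
headers (real code would include <linux/dma-buf.h>).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local mirror of struct dma_buf_export_sync_file (assumption: kept
 * in sync with the UAPI header by hand for this sketch only). */
struct sketch_export_sync_file {
	uint32_t flags;	/* in: DMA_BUF_SYNC_* */
	int32_t fd;	/* out: sync_file fd */
};
static_assert(sizeof(struct sketch_export_sync_file) == 8,
	      "layout must match the UAPI struct");

/* Mirrors DMA_BUF_SYNC_READ: the returned sync_file waits on writers. */
#define SKETCH_SYNC_READ 1u
/* Mirrors DMA_BUF_IOCTL_EXPORT_SYNC_FILE (DMA_BUF_BASE is 'b'). */
#define SKETCH_IOCTL_EXPORT_SYNC_FILE \
	_IOWR('b', 2, struct sketch_export_sync_file)

/*
 * Hypothetical helper: wait until all writers of the dmabuf are done.
 * Returns 1 if the fence signalled, 0 on timeout, negative errno on
 * failure.
 */
static int wait_for_producer(int dmabuf_fd, int timeout_ms)
{
	struct sketch_export_sync_file req = {
		.flags = SKETCH_SYNC_READ,
		.fd = -1,
	};
	struct pollfd pfd;
	int ret;

	if (ioctl(dmabuf_fd, SKETCH_IOCTL_EXPORT_SYNC_FILE, &req) < 0)
		return -errno;

	pfd.fd = req.fd;
	pfd.events = POLLIN;	/* a sync_file reports POLLIN once signalled */
	pfd.revents = 0;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);
	if (ret < 0)
		ret = -errno;

	close(req.fd);
	return ret;
}
```

With an empty dma_resv the exported sync_file wraps the stub fence, so a
call like this returns 1 immediately regardless of what the producer is
doing.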

Today this works as a happy accident on most paths because clients
attach buffers after VIDIOC_DQBUF, which the V4L2 userspace contract
guarantees returns a buffer only after the producer is done. So the
implicit "the kernel's stub fence is fine because the buffer is
already complete by the time anyone polls it" assumption has held.

But:

- It's a contract gap. The kernel claims to expose implicit sync; it
  does not, for V4L2 producers.
- It pays latency for nothing. Every Wayland frame from a V4L2
  producer pays a DMA_BUF_IOCTL_EXPORT_SYNC_FILE round-trip for a
  fence that's stub-signalled. On Mali-class hardware (RK3566 Wayland
  chrome video playback), this contributed to compositor stalls.
  Removing the wait at the compositor level is a workaround, not a
  fix.
- It blocks downstream consumers from doing the right thing. A
  Wayland compositor that defensively waits on a sync_file gets a
  stub-fence pass-through with no actual gating; if the V4L2 driver
  ever has an out-of-band path that releases the buffer before
  finishing the write, there is no fence to gate on.

What
----
Patch 1 adds:

- struct dma_fence *release_fence to struct vb2_buffer
- u64 dma_resv_fence_context + atomic64_t dma_resv_fence_seqno +
  spinlock_t dma_resv_fence_lock to struct vb2_queue
- vb2_buffer_attach_release_fence(vb): drivers call this from their
  buf_queue callback. Allocates a dma_fence on the queue's fence
  context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
  dmabuf->resv. No-op for buffers without exported dmabufs.
- vb2_buffer_done() extended to signal+put the fence if attached,
  so the producer's completion signal lands in the resv synchronously
  with the userspace DQBUF wakeup.
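
The fence lifecycle in the list above can be modelled in plain userspace
C as below. This is a toy model for reading along, not the kernel code
from patch 1: every model_* name is invented here, the single
resv_write pointer stands in for the dmabuf's dma_resv write slot, and
the error handling on completion (setting an error before signalling) is
an assumption about how the failure case is treated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_EIO 5	/* stand-in for EIO in this model */

struct model_fence {
	uint64_t context, seqno;
	int refcount;
	int error;
	bool signalled;
};

struct model_queue {
	uint64_t fence_context;	/* one context per queue */
	uint64_t seqno;		/* last sequence number handed out */
};

struct model_buffer {
	struct model_queue *q;
	bool has_dmabuf;			/* buffer was exported */
	struct model_fence *release_fence;	/* producer's reference */
	struct model_fence *resv_write;		/* models dmabuf->resv */
};

static void model_fence_put(struct model_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		free(f);
}

/* Models the helper a driver would call from buf_queue. */
static int model_attach_release_fence(struct model_buffer *vb)
{
	struct model_fence *f;

	if (!vb->has_dmabuf)
		return 0;	/* nothing exported: no-op */

	f = calloc(1, sizeof(*f));
	if (!f)
		return -1;
	f->context = vb->q->fence_context;
	f->seqno = ++vb->q->seqno;
	f->refcount = 1;	/* reference held by the buffer */

	if (vb->resv_write)	/* drop the previous cycle's fence */
		model_fence_put(vb->resv_write);
	f->refcount++;		/* reference held by the resv */
	vb->resv_write = f;
	vb->release_fence = f;
	return 0;
}

/* Models the signal+put step added to vb2_buffer_done(). */
static void model_buffer_done(struct model_buffer *vb, bool failed)
{
	struct model_fence *f = vb->release_fence;

	if (!f)
		return;		/* driver did not opt in */
	if (failed)
		f->error = -MODEL_EIO;	/* set before signalling */
	f->signalled = true;
	vb->release_fence = NULL;
	model_fence_put(f);	/* resv keeps its own reference */
}
```

The point of the two references is that a consumer holding the resv can
still observe the signalled fence after the buffer has dropped its own
pointer.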

Patches 2 and 3 add a single call to the helper from hantro_buf_queue
and rga_buf_queue respectively. Both are demonstration drivers; other
vb2 drivers can opt in incrementally with the same one-line change.

Tested on
---------
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
Wayland. The test harness is the chromium-fourier patch series at
https://github.com/marfrit/fourier, i.e. chromium plus a KWin patch
that *previously bypassed* Transaction::watchDmaBuf because the
kernel-side fence was stub-signalled. With this series applied, the
bypass becomes unnecessary; KWin's fence wait completes correctly
because the fence now signals when hantro completes the capture
buffer write.

End-to-end result before the kernel patch (chromium + Qt 6 patches +
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
chrome CPU, but the watchDmaBuf bypass weakens KWin's defenses against
misbehaving clients.

End-to-end result after the kernel patch (chromium + Qt 6 patches +
plain unmodified KWin): 1080p30 H.264 plays through with the same CPU
profile, KWin's watchDmaBuf wait completes within microseconds against
the now-real producer fence, no defenses weakened.

What's missing in this RFC
--------------------------
- Other vb2-using drivers don't opt in. Each maintainer should look
  at their driver and decide. The hantro + rga patches show the
  shape; copying it to other drivers should be straightforward.
- For drivers that have intermediate image-processor stages (e.g.
  CSI -> ISP -> user), the fence semantics across stage boundaries
  are out of scope here. This series only addresses the producer-to-
  userspace edge.
- No selftest. videobuf2 doesn't have a great in-tree selftest harness
  for dmabuf flows; the validation is end-to-end at the userspace
  consumer level (KWin, in our case).

Reviews especially welcome on:

- The decision to make this opt-in per driver vs. automatic for all
  vb2-CAPTURE queues. Auto-on would force every driver to be audited;
  opt-in is incremental and safer but leaves the contract gap for
  drivers nobody touches.
- Whether vb2_buffer_done is the right place to signal vs. an earlier
  hook (e.g. immediately after DMA-from-device finishes). For hantro
  the two are effectively the same; for drivers with asynchronous
  post-processing they may differ.
- The choice of DMA_RESV_USAGE_WRITE: we are emitting the producer's
  write completion, so WRITE matches dma-buf documentation, but a
  sanity check is welcome.

Cheers,
Markus

Markus Fritsche (3):
  media: videobuf2: add dma_resv release-fence helper
  media: hantro: attach dma_resv release fence at buf_queue
  media: rockchip-rga: attach dma_resv release fence at buf_queue

 .../media/common/videobuf2/videobuf2-core.c   | 95 +++++++++++++++++++
 drivers/media/platform/rockchip/rga/rga-buf.c | 10 ++
 .../media/platform/verisilicon/hantro_v4l2.c  | 12 +++
 include/media/videobuf2-core.h                | 29 ++++++
 4 files changed, 146 insertions(+)

-- 
2.47.3