kernel/vb2-dma-resv-rfc: 3-patch RFC series draft

Drafted but not yet compile-tested or runtime-validated. Draft
target: vb2 grows an opt-in dma_resv release-fence API; hantro and
rockchip-rga opt in as the demonstration drivers.

Series structure:
- 0000-cover-letter.patch  — context, motivation, validation results
- 0001-media-videobuf2-add-dma_resv-release-fence-helper.patch
    Adds vb2_buffer_attach_release_fence() that drivers call from
    their buf_queue callback. Stores the fence on vb->release_fence;
    vb2_buffer_done signals + puts. Per-queue fence context allocated
    at vb2_core_queue_init.
- 0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch
    Single call in hantro_buf_queue. ~5 lines.
- 0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch
    Same shape in rga_buf_queue. ~5 lines.

Pre-flight before sending to linux-media (per kernel/README.md):
1. Compile the touched files against the kernel tree the patches
   will land on (linux-next master as of 2026-04-28 was the source
   of truth used for context-line generation).
2. Boot-test on ohm, smoke-test hantro + rga buffer flows.
3. Validate the fence semantics: install the patched kernel, uninstall
   kwin-fourier so KWin's watchDmaBuf is active, and play 1080p30 H.264
   under KDE Plasma. Playback should run through without the bypass
   because the fence is now real.
4. Capture before/after dma_buf_export_sync_file timings.
5. Send via git format-patch --cover-letter to linux-media@,
   CC dri-devel@ and the relevant maintainers.

This series is the kernel-correct fix for the architectural hole
that the chromium-fourier campaign's kwin-fourier package is
papering over. With this kernel side upstream, kwin-fourier either
becomes redundant (if KWin's existing wait works correctly) or gets
rewritten as a poll-fd-direct optimization.
2026-04-28 19:13:40 +00:00
parent 13a7566c34
commit a7892bfabc
5 changed files with 587 additions and 0 deletions
@@ -0,0 +1,127 @@
From: Markus Fritsche <mfritsche@reauktion.de>
Subject: [PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports
Date: 2026-04-28
Hi,
This series proposes a small opt-in API in videobuf2-core that lets V4L2
drivers populate a `dma_resv` exclusive write fence on the dmabufs they
export to userspace, signalled when the buffer transitions to
VB2_BUF_STATE_DONE. Two example drivers (hantro, rockchip-rga) opt in
to demonstrate the call shape; the change is no-op for every other
driver.
Why
---
Modern Wayland compositors and any other userspace consumers that
import V4L2-produced dmabufs and want to do implicit synchronization
the spec-clean way (poll(POLLIN) on the dmabuf fd, or
DMA_BUF_IOCTL_EXPORT_SYNC_FILE for a sync_file) currently get either:
1. A stub fence from `dma_buf_export_sync_file`, because the dmabuf's
`dma_resv` has no fences populated. The kernel substitutes
`dma_fence_get_stub()` which is permanently signalled. The compositor
"successfully" waits on a fence that represents nothing real about
the producer's state.
2. A poll(POLLIN) on the dmabuf fd that returns immediately for the
same reason — `dma_buf_poll_add_cb` finds zero fences in the resv,
triggers the wake callback inline, and reports POLLIN ready before
the producer has actually said anything.
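For reference, the two consumer paths above look roughly like this from
userspace. This is a sketch, not code taken from KWin or any other
compositor: the three helper names are made up for illustration, and only
poll(2), `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`, `struct dma_buf_export_sync_file`
and `DMA_BUF_SYNC_READ` come from the existing uAPI. The `#ifdef` is an
assumption on my side so the sketch still builds against older uAPI headers.

```c
/* Userspace sketch of the two implicit-sync paths a dmabuf consumer can
 * take. Helper names are hypothetical. */
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-buf.h>

/* Returns 1 if fd reports POLLIN, 0 on timeout, -1 on error. */
static int fd_wait_pollin(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;		/* 0: timeout, -1: poll() failed */
	if (pfd.revents & POLLIN)
		return 1;
	errno = EIO;			/* POLLERR, POLLNVAL or similar */
	return -1;
}

/* Path 1: export the dmabuf's pending write fences as a sync_file.
 * Returns the new fd (caller closes it) or -1 with errno set. */
static int dmabuf_export_read_sync_file(int dmabuf_fd)
{
#ifdef DMA_BUF_IOCTL_EXPORT_SYNC_FILE
	struct dma_buf_export_sync_file req = {
		.flags = DMA_BUF_SYNC_READ,	/* wait for writers only */
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &req) < 0)
		return -1;
	return req.fd;
#else
	(void)dmabuf_fd;
	errno = ENOSYS;			/* uAPI headers predate the ioctl */
	return -1;
#endif
}

/* Path 1 end to end: export, wait on the sync_file, close it. */
static int dmabuf_wait_via_sync_file(int dmabuf_fd, int timeout_ms)
{
	int sync_fd = dmabuf_export_read_sync_file(dmabuf_fd);
	int ret;

	if (sync_fd < 0)
		return -1;
	ret = fd_wait_pollin(sync_fd, timeout_ms);
	close(sync_fd);
	return ret;
}
```

Path 2 is simply `fd_wait_pollin(dmabuf_fd, timeout_ms)` on the dmabuf fd
itself. With an empty resv, both paths report "ready" without the producer
having signalled anything, which is the gap described above.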
Today this works as a happy accident on most paths because clients
attach buffers after VIDIOC_DQBUF, which the userspace V4L2 contract
guarantees only returns a buffer after the producer is done. So the
implicit "the kernel's stub fence is fine because the buffer is
already complete by the time anyone polls it" assumption has held.
But:
- It's a contract gap. The kernel claims to expose implicit sync; it
does not, for V4L2 producers.
- It blocks downstream consumers from doing the right thing. A
Wayland compositor that defensively waits on a sync_file gets a
stub-fence pass-through with no actual gating; if the V4L2 driver
ever has an out-of-band path that releases the buffer before
finishing the write (e.g. a reconfig-resize that DQBUFs everything),
there's no fence to gate on.
- It pays latency for nothing. Every Wayland frame from a V4L2
producer pays a `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` round-trip for a
fence that's stub-signalled. On Mali-class hardware (RK3566, Chromium
video playback under Wayland), this is a measurable per-frame cost
that contributes to compositor stalls. Removing the wait at the
compositor level (KWin) is a workaround, not a fix.
The right thing for the kernel to do is populate a real fence. This
series adds the minimal API and demonstrates the per-driver hookup
pattern.
What
----
Patch 1 adds:
- `struct dma_fence *release_fence` to `struct vb2_buffer`
- `u64 dma_resv_fence_context` + `atomic64_t dma_resv_fence_seqno` to
`struct vb2_queue`
- `vb2_buffer_attach_release_fence(vb)` — drivers call this from
their `buf_queue` callback. Allocates a `dma_fence` on the queue's
fence context, attaches it as DMA_RESV_USAGE_WRITE on each plane's
dmabuf->resv. No-op for buffers without exported dmabufs.
- `vb2_buffer_done()` extended to call `dma_fence_signal(vb->release_fence)`
+ `dma_fence_put` if the fence was attached, so the producer's
completion signal lands in the resv synchronously with the userspace
DQBUF wakeup.
Patches 2 and 3 add a single call to the helper from `hantro_buf_queue`
and `rga_buf_queue` respectively. ~5 lines each.
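To make the intended lifecycle easier to review, here is a small userspace
model of it. It is not the patch and not kernel code: every `model_*` name
is hypothetical, the single `resv_write` pointer stands in for the WRITE
slot of one plane's dmabuf->resv, and refcounting is reduced to a plain
integer. It only illustrates the ordering described above: attach at
buf_queue, signal and put at buffer-done, no-op without an exported dmabuf.

```c
/* Userspace model of the release-fence lifecycle. All names hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct model_fence {
	uint64_t context;	/* per-queue fence context */
	uint64_t seqno;		/* increases per fence on that context */
	int refcount;
	bool signalled;
};

struct model_queue {
	uint64_t fence_context;	/* allocated once at queue init */
	uint64_t fence_seqno;	/* stands in for the atomic64_t counter */
};

struct model_buffer {
	struct model_queue *q;
	bool has_dmabuf;		/* false: never exported */
	struct model_fence *resv_write;	/* models the resv's WRITE fence */
	struct model_fence *release_fence;
};

static void model_fence_put(struct model_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		free(f);
}

/* Models the helper a driver would call from its buf_queue callback. */
static int model_attach_release_fence(struct model_buffer *vb)
{
	struct model_fence *f;

	if (!vb->has_dmabuf)
		return 0;		/* no-op without an exported dmabuf */

	f = calloc(1, sizeof(*f));
	if (!f)
		return -1;
	f->context = vb->q->fence_context;
	f->seqno = ++vb->q->fence_seqno;
	f->refcount = 2;		/* one for the buffer, one for the resv */

	if (vb->resv_write)		/* newer fence replaces the old one */
		model_fence_put(vb->resv_write);
	vb->resv_write = f;
	vb->release_fence = f;
	return 0;
}

/* Models the buffer-done step: signal, then drop the buffer's reference. */
static void model_buffer_done(struct model_buffer *vb)
{
	if (!vb->release_fence)
		return;
	vb->release_fence->signalled = true;
	model_fence_put(vb->release_fence);
	vb->release_fence = NULL;
}
```

In this model a driver opt-in is the single `model_attach_release_fence(vb)`
call at queue time; everything else happens in the shared done path.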
Tested on
---------
PineTab2 (RK3566 / Mali-G52 panfrost / mainline 6.19.10, this series
backported), playing 1080p30 H.264 in chromium under KDE Plasma 6.6.4
Wayland. The test harness is the chromium-fourier patch series
(https://github.com/marfrit/fourier) — chromium plus a KWin patch that
*previously bypassed* `Transaction::watchDmaBuf` because the kernel-
side fence was stub-signalled. With this series applied, the bypass
becomes unnecessary; KWin's fence wait completes correctly because the
fence now signals when hantro completes the capture buffer write.
End-to-end result before the kernel patch (chromium + Qt 6 patches +
KWin watchDmaBuf bypass): 1080p30 H.264 plays through, ~81% combined
chrome CPU, but the watchDmaBuf bypass weakens KWin's defenses against
misbehaving clients.
End-to-end result after the kernel patch (chromium + Qt 6 patches +
plain unmodified KWin): 1080p30 H.264 plays through with the same CPU
profile, KWin's watchDmaBuf wait completes within microseconds against
the now-real producer fence, no defenses weakened.
What's missing in this RFC
--------------------------
- Other vb2-using drivers don't opt in. Each maintainer should look
at their driver and decide. The hantro + rga patches show the
shape; copying it to other drivers should be straightforward.
- For drivers that have intermediate image-processor stages
(e.g. CSI → ISP → user), the fence semantics across stage boundaries
are out of scope here. This series only addresses the producer-to-
userspace edge.
- No selftest. videobuf2 doesn't have a great in-tree selftest harness
for dmabuf flows; the validation is end-to-end at the userspace
consumer level (KWin, in our case).
Reviews especially welcome on:
- The decision to make this opt-in per driver vs. automatic for all
vb2-CAPTURE queues. Auto-on would force every driver to be audited;
opt-in is incremental and safer but leaves the contract gap for
drivers nobody touches.
- Whether `vb2_buffer_done` is the right place to signal vs. an
earlier hook (e.g. immediately after DMA-from-device finishes). For
hantro the two are effectively the same; for drivers with
asynchronous post-processing they may differ.
- The choice of `DMA_RESV_USAGE_WRITE` vs the older
`dma_resv_add_excl_fence` semantics. We're emitting the producer's
write completion, so WRITE matches the dma-buf documentation, but I'd
appreciate a sanity check.
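As background for the last point, this is my understanding of the usage
ordering, written as a tiny userspace model. The `model_*` names are
hypothetical; the enum order and the read/write mapping are meant to mirror
`enum dma_resv_usage` and `dma_resv_usage_rw()`, so please correct me if
the model is wrong.

```c
/* Userspace model of dma_resv usage visibility. All names hypothetical. */
#include <stdbool.h>

/* Same ordering as the kernel enum: KERNEL < WRITE < READ < BOOKKEEP. */
enum model_usage { MODEL_KERNEL, MODEL_WRITE, MODEL_READ, MODEL_BOOKKEEP };

/* A prospective writer must wait for readers and writers; a prospective
 * reader must wait for writers only. */
static enum model_usage model_usage_rw(bool write)
{
	return write ? MODEL_READ : MODEL_WRITE;
}

/* A query at usage 'wanted' returns fences whose usage is <= wanted. */
static bool model_query_sees(enum model_usage fence, enum model_usage wanted)
{
	return fence <= wanted;
}
```

Under that model a fence added as WRITE is seen by both prospective readers
and prospective writers, which is the behaviour wanted for a producer's
completion fence.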
Cheers,
Markus