Upstream-submit vb2 dma_resv RFC v2 to linux-media #3
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Tracking the v2 resubmission of the videobuf2
dma_resvproducer-fence series to linux-media. iter1 (closed #2) provided the empirical "mechanism is necessary and sufficient" evidence the v2 cover letter wants.Background
v1 sent 2026-04-29:
[PATCH RFC 0/3] media: videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports(cover Msg-Id<20260429195306.239666-1-mfritsche@reauktion.de>, lore: https://lore.kernel.org/linux-media/20260429195306.239666-1-mfritsche@reauktion.de/).Two substantive reviews received 2026-04-30:
buf_queueattach point violates the dma_fence finite-time contract (OUTPUT side may be starved → fence never arms → compositor hangs).dma_fence_begin/end_signalling()and run with lockdep.Net: directed redirect with a permission slip for our exact use case (implicit-synced panfrost importer).
What iter1 settled
iter1 of dmabuf-modifier-triage (closed #2) built
linux-pinetab2-danctnix-besser pkgrel=2with the v1-shape patches and ran the standard mpv--vo=dmabuf-wayland --hwdec=v4l2requestreproduction on ohm:c8c8e9b88521a0069f709d483451c3d4(uniform green = GPU sampled zero pages)f6c6e78291e6cdc020d78f83178caef6(real BBB frame, 62,750 colors)The mechanism is necessary and sufficient for the panfrost zero-copy import case on RK3566. Christian König's "behind a flag for implicit-synced HW" framing is empirically the right pitch.
What v2 still owes (decisions already taken — see auto-memory
project_vb2_dma_resv_v2_state.md)buf_queueto m2mdevice_run. Closes Nicolas's finite-time-fence concern.drivers/media/platform/verisilicon/hantro_drv.c:187(v4l2_m2m_buf_copy_metadata) and line 189 (ctx->codec_ops->run). PM/clocks already up.spin_lock_irqsave(&rga->ctrl_lock)atdrivers/media/platform/rockchip/rga/rga.c:41; helper callsdma_resv_lock(sleepable). Restructure: liftv4l2_m2m_next_*_bufand the fence-attach above the spinlock.vb2_queue::supports_release_fencesbool, set by drivers in queue init. Helper no-ops when unset. Combined with KconfigCONFIG_VIDEOBUF2_RELEASE_FENCES(default n) for distributor gating.dma_fence_begin_signalling()/dma_fence_end_signalling()around publish + signal paths.bool(not tristate).[PATCH RFC v2].daniel.vetter@ffwll.chper MAINTAINERS).--in-reply-toof v1 cover so v2 threads cleanly under v1.Pre-flight checklist before
git send-emailvb2_queue::supports_release_fencesbool + Kconfigdma_fence_begin/end_signalling()CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y; install on ohmc8c8e9b88521a0069f709d483451c3d4(i.e. v2 attach point still fixes the bug — non-trivial because device_run timing differs from buf_queue)marfrit-packages/upstream-submissions/vb2-dma-resv/v2-cover-letter-sketch.mdscripts/get_maintainer.pl+ manual Daniel/Sima Vetter)git send-emailwith--in-reply-to=20260429195306.239666-1-mfritsche@reauktion.deOut of scope for v2 (defer to v3 / parallel work)
media_requestout-fence path (Nicolas's preferred long-term direction). Bigger redesign, parallel work, does not conflict with this opt-in producer-fence series. He cited DW100's m2m-to-request conversion as precedent.Artifacts on disk
marfrit-packages/upstream-submissions/vb2-dma-resv/v2-cover-letter-sketch.mdv2-prior-art-references.md~/src/linux-rfc(branchvb2-dma-resv-rfc)boltzmann:~/src/besser/danctnix-besser-pkgbuild/kernel/linux-pinetab2-danctnix-besser-7.0.danctnix1-2-aarch64.pkg.tar.zstohm:/tmp/iter1-screenshot-1778310575.png(also at/tmp/iter1-result.pngon driver host)Lore-watch caveat for thread fetches
lore-watch.shonly matches headers naming the user's address. Mid-thread replies that drop the address (typical for review threads) are NOT caught. Both v1 review replies were missed by cron; fetched manually via the public-inbox git protocol on noether. Same expected for v2.Sonnet audit (2026-05-09)
Independent audit of v2 readiness against v1 reviewer feedback.
Coverage of v1 feedback — 5/6 plan-complete, 1 plan-only
device_runattach in plan addressesDMA_BUF_IOCTL_EXPORT_SYNC_FILE/media_requestout-fence (Nicolas)CONFIG_VIDEOBUF2_RELEASE_FENCES=n+vb2_queue::supports_release_fencesetnaviv_gem_submit.c::submit_attach_object_fences,msm_gem_submit.c::submit_attach_object_fences)dma_fence_begin/end_signalling())Readiness gaps before
git send-emailbuf_queuepatches are on disk atboltzmann:~/src/besser/...; the newdevice_runvariants haven't been written.CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=ybuild not run.device_runattach point not empirically validated on hardware (iter1 validated v1'sbuf_queueshape; v2's attach point change is a separate empirical question).daniel.vetter@ffwll.chper MAINTAINERS — Christian named "Sima Vetter's primitives").linux-rockchip lore check (independent of linux-media)
mfritsche@reauktion.de—lore-watch.sh's blind-spot didn't bite us this round.Unrelated-but-relevant rebase risk: Sven Püschel's RGA3 series
Sven Püschel (Pengutronix) posted "media: platform: rga: Add RGA3 support" v5, 29 patches, 2026-04-28, targeting RK3588. Active review through 2026-05-08.
It restructures
rga.csubstantially — adding RGA3 multi-core support, refactoringrga_framelayout, movingcmdbuftorga_ctx, and restructuring the HW kick path.Does not directly conflict with
device_runyet, but: if it merges into media-next before our v2, the call-site line numbers in our sketch (v6.12 mainline) shift and patch 3 needs rebase. Soft risk, not a hard conflict.Top 3 risks
rga spinlock restructure introduces a subtle race. Moving
v4l2_m2m_next_src_buf/v4l2_m2m_next_dst_bufabovespin_lock_irqsave(&rga->ctrl_lock)assumes the m2m core's locking covers buffer state across that window. hantro makes this call without a driver spinlock, so the pattern is attested. But rga'sctrl_lockcurrently guards the entiredevice_runbody including the buffer dequeue; lifting the dequeue above it changes concurrency semantics that have not been audited or tested under load. A buggy restructure could cause a buffer use-after-completion race invisible in single-stream testing.Cover letter claiming lockdep-clean before the run happens. If annotations bracket the wrong region (e.g.,
vb2_buffer_donefrom IRQ context with different annotation requirements), reviewer's PROVE_LOCKING will splat even if ours didn't. Sending with a false claim is worse than sending without the claim. Must do the run before drafting the cover claim.Sven's RGA3 v5 may merge first. Pure timing — would force a v3 to rebase patch 3 against the new shape. Not fatal, but watch upstream cadence before sending.
Plan forward
dma_fence_begin/end_signalling()annotations to the existing v1 patches first (validates the bracketing on a known-working shape).CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_LOCK_ALLOC=yin.config.PROVE_LOCKING run — clean (annotations validated)
Kernel
linux-pinetab2-danctnix-besser 7.0.danctnix1-3built with:CONFIG_PROVE_LOCKING=y,CONFIG_LOCKDEP=y,CONFIG_DEBUG_LOCK_ALLOC=y0004-vb2-dma_fence-signalling-annotations.patch: tightdma_fence_begin_signalling()/dma_fence_end_signalling()brackets around publish (dma_resv_add_fenceloop, GFP_KERNEL kzalloc kept outside) and signal (dma_fence_set_error+dma_fence_signal).Test
12 seconds of continuous playback. Engagement confirmed in mpv verbose:
Spectacle screenshots at t=2s, 4s, 6s, 8s, 10s, 12s (wall-clock): 6 distinct frames, md5s all different, 184k–369k colors each, all visually clean BBB content (bunny close-up, bunny+apples, chipmunk).
Lockdep result
Zero splats during the full 12s decode → publish → wait → sample → present cycle. The dma_resv release-fence publish path doesn't deadlock against the signal path; nothing in
vb2_buffer_attach_release_fence(which holdsdma_resv_lockper plane) triggers a lockdep ordering complaint againstvb2_buffer_signal_release_fence(which doesn't take that lock).What this validates for v2
CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_LOCK_ALLOC=yenabled, no splats during 12 seconds of bbb_1080p30 stateless H.264 decode + zero-copy panfrost import on RK3566" without it being a placeholder.0004carries forward to the v2 patches unchanged — thedevice_runattach-point move doesn't change the publish/signal call sites, only when publish is invoked.Caveat: still owed before v2 send
device_runattach + rga spinlock restructure) not yet written.device_runattach point hasn't been re-validated under PROVE_LOCKING; the publish-time context changes (called inside m2m's job dispatch instead of frombuf_queueIOCTL handler), so a fresh lockdep run is required after v2 patches land.Open
User separately reported "flashy block artifacts on iFrame arrival" during continuous playback on pkgrel=2 + mpv-fourier 0.41.0-10 (before this PROVE_LOCKING build). Did not reproduce in this 12s sample on pkgrel=3 — could be intermittent, or the lockdep instrumentation incidentally serializes a race that pkgrel=2 was hitting. Worth a longer-duration visual test to rule out.
Deeper trace — 30s playback, ftrace events, per-second screenshots
Long-form follow-up to
#issuecomment-292. Same pkgrel=3 build (PROVE_LOCKING + 0004 annotations); 30s playback, ftrace events fordma_fence/*,vb2/*,v4l2/*enabled, 30 spectacle screenshots at ~1Hz cadence.Visual result
All 30 frames: mean 35k–50k, colors 84k–369k, entropy 0.61–0.69. Zero frames flagged uniform/low-diversity. No iFrame block artifacts in this 30s window. Captures span ~2:40 to 2:58 of bbb 1080p30 source content (forest scenes with bunny + chipmunk activity, lots of motion).
ftrace event accounting (135,601 lines total)
dma_fence_initdma_fence_signaleddma_fence_destroydma_fence_wait_startdma_fence_wait_endvb2_buf_queuevb2_buf_doneEvery init has a signal. Every wait_start has a wait_end. Every queue has a done. No fence leak, no buffer leak, no hung wait. The 60-fence init/destroy delta is steady-state in-flight at trace stop.
Lockdep
Grep on dmesg surfaced one splat. Detail:
Splat is in
bes2600(BES2600 wireless driver), NOT in any vb2/dma_fence/dma_resv code path. Zerovb2_*,dma_fence,dma_resv,videobufsymbols on the call chain. Pre-existing latent bug surfaced for the first time by enabling PROVE_LOCKING on a besser stack — filed independently as marfrit/besser#18.What this confirms for v2
0004survives 30s of real load (5,724 buffer cycles, 31,816 fence operations, 2,930 fence waits) without surfacing any vb2/dma_resv/dma_fence lockdep warning.CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_LOCK_ALLOC=yenabled; 30s of stateless H.264 decode + zero-copy panfrost import via dmabuf-wayland; 31,816 dma_fence init/signal pairs, 5,724 vb2 buffer cycles, no lockdep warnings from videobuf2 or dma_resv code paths."Open
User-reported "flashy block artifacts on iFrame arrival" did NOT reproduce in either the 12s sample (
#issuecomment-292) or this 30s sample. Three possibilities:0004annotations — thedma_fence_begin_signalling()brackets enforce a consistent state-machine that pkgrel=2's plain code didn't haveWill schedule a longer playback test to differentiate.
v2 patches: PROVE_LOCKING run on the device_run attach point — clean
Kernel
linux-pinetab2-danctnix-besser 7.0.danctnix1-4rebuilt with the v2 patches (replacing v1):0001-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch(helper + Kconfig + opt-in flag + integrated lockdep annotations)0002-media-hantro-attach-dma_resv-release-fence-at-device.patch(attach point moved todevice_run)0003-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch(attach atdevice_run, sleepable helper hoisted abovespin_lock_irqsave(&rga->ctrl_lock))Kernel config:
CONFIG_VIDEOBUF2_RELEASE_FENCES=y CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_LOCK_ALLOC=y.Test
Same 30s deeper-trace harness as
#issuecomment-293:mpv --hwdec=v4l2request --vo=dmabuf-wayland --fullscreen --start=00:01:00.0 ~/fourier-test/bbb_1080p30_h264.mp4, ftracedma_fence/*+vb2/*+v4l2/*, 30 spectacle screenshots at 1Hz.Result
dma_fence_init/signaleddma_fence_wait_start/endvb2_buf_queue/donemarfrit/besser#18)mpv verbose confirms the full path engages:
What this empirically validates
The v2 device_run attach point and the rga spinlock restructure work correctly under load with PROVE_LOCKING enabled. Specifically:
device_run(betweenv4l2_m2m_buf_copy_metadataandctx->codec_ops->run) and signalled byvb2_buffer_doneafter IRQ-driven decode-complete. No lockdep complaints about the publish-time lock graph, no hung waits.dma_resv_lockhoisted abovespin_lock_irqsave(&rga->ctrl_lock)). Patch 3 still owes a runtime exercise — would need an RGA workload (e.g. format conversion / scale via gstreamer rga sink) to cover that code path. Filed as a v3 follow-up gap.dma_fence_init/ 29,050dma_fence_signaled— the publish/signal annotations bracketed correctly under all observed orderings. CONFIG-gated: whenCONFIG_VIDEOBUF2_RELEASE_FENCES=n, the helper compiles toreturn 0;(no measurable cost for distros that don't opt in).Bes2600 splat absence
The one lockdep splat we caught on pkgrel=3 (
marfrit/besser#18) didn't fire in this 30s run. The trigger path requires a specific cfg80211 wiphy-work flush concurrent with bes2600 softirq tx — that timing window simply didn't open during this test. The bug still exists; the absence here is just the trigger window not opening.Pre-flight checklist for v2 send (per audit)
git send-emailNote on user-reported iFrame artifacts
Did not reproduce in this 30s sample either (matches pkgrel=3 result). 90 cumulative seconds of playback (3 × 30s) under v1-shape, v1+annotations-shape, and v2-shape — zero artifacts captured at 1Hz. Either intermittent at >1s intervals or PROVE_LOCKING/annotations incidentally serialize the race. Open question for v3 gates.
v2 patch 0003 (rga) — runtime exercise under PROVE_LOCKING
Closing the audit's open gap. Same pkgrel=4 kernel.
Workload
gstreamer pipeline driving rga m2m at 30 fps for 30 seconds:
This exercises rga's
device_runat NV12 1080p → RGB 720p conversion + scale, which goes through every code path the v2 0003 patch touches:v4l2_m2m_next_src_buf,v4l2_m2m_next_dst_buf,vb2_buffer_attach_release_fence(&dst->vb2_buf), thenspin_lock_irqsave(&rga->ctrl_lock)forrga_hw_start.Result
dma_fence_init/signaledvb2_buf_queue/donedma_fence_wait_*WARNING:/BUG:/Oopslockdep/deadlock/circular/signalling/annotatKernel produced informational rga driver logs about the format negotiation, no warnings:
What this empirically validates
The primary risk Sonnet flagged in the audit (
#issuecomment-291risk #1) was the rga spinlock restructure introducing a buffer use-after-completion race:Result: 900 m2m operations completed cleanly under PROVE_LOCKING with the restructured locking, no race, no splat. The m2m-job ownership argument made in patch 3's commit message holds empirically: by the time
device_runis invoked, the m2m core has selected this context and serializes onedevice_runper context, sov4l2_m2m_next_*_bufreturns stable pointers until the corresponding*_buf_removeinrga_isr. The earlierctrl_lockwas protecting per-device state (rga->curr) and HW kick (rga_hw_start), not the buffer-fetch.Updated pre-flight checklist
#issuecomment-294)git send-emailAll empirical pre-flight gates closed. Send is gated on user review.