From: Markus Fritsche
Subject: [PATCH RFC 2/3] media: hantro: attach dma_resv release fence at buf_queue
Date: 2026-04-28

Opt the hantro driver into the new vb2 release-fence helper.

When userspace QBUFs a buffer to hantro, the buffer is added to the
driver's m2m queue via v4l2_m2m_buf_queue(). We additionally call
vb2_buffer_attach_release_fence() so that each plane's dmabuf->resv gets
a real producer fence attached. The fence is signalled by
vb2_buffer_done() when hantro completes the decode (via
v4l2_m2m_buf_done_and_job_finish() in hantro_drv.c, which converges on
vb2_buffer_done()).

Wayland compositors that import hantro CAPTURE buffers from clients such
as chrome, firefox, mpv and gstreamer, and that wait on the dmabuf's
implicit-sync fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE), now
wait on a real fence representing the producer's actual completion, not
a stub.

KWin's `Transaction::watchDmaBuf` path on Mali-class hardware is the
user-visible benefit: the per-frame sync_file roundtrip completes
correctly the moment hantro's IRQ handler runs, instead of either
polling on a stub fence or, in the failure mode that drove this work,
failing to signal at all due to a race that the stub-fence path masked.

Validated on PineTab2 (RK3566 / Mali-G52 / mainline 6.19 with this
series backported / panfrost mesa 26.0.5) playing 1080p30 H.264 in
chromium under stock KDE Plasma 6.6.4 Wayland. With this kernel-side fix
in place, the chromium stall that previously required a KWin
watchDmaBuf bypass workaround (kwin-fourier in the chromium-fourier
project) is gone, and KWin's wait completes correctly.

Signed-off-by: Markus Fritsche
---
 drivers/media/platform/verisilicon/hantro_v4l2.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -858,11 +858,24 @@ static void hantro_buf_queue(struct vb2_buffer *vb)
 {
 	struct hantro_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
 
 	if (V4L2_TYPE_IS_CAPTURE(vb->vb2_queue->type) &&
 	    vb2_is_streaming(vb->vb2_queue) &&
 	    v4l2_m2m_dst_buf_is_last(ctx->fh.m2m_ctx)) {
 		unsigned int i;
 
 		for (i = 0; i < vb->num_planes; i++)
 			vb2_set_plane_payload(vb, i, 0);
 
 		vbuf->field = V4L2_FIELD_NONE;
 		vbuf->sequence = ctx->queue[V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type)].sequence++;
 
 		v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_DONE);
 		return;
 	}
 
-	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+	/*
+	 * Opt in to vb2's dma_resv release-fence path: any userspace
+	 * consumer that imported this buffer's dmabuf and is doing
+	 * implicit-sync via poll(POLLIN) or
+	 * DMA_BUF_IOCTL_EXPORT_SYNC_FILE now waits on a real fence
+	 * representing this device's completion, instead of the stub
+	 * fence dma_buf_export_sync_file substitutes when dma_resv is
+	 * empty. Best-effort: if fence allocation fails we just lose
+	 * the implicit-sync precision, no functional regression.
+	 */
+	(void)vb2_buffer_attach_release_fence(vb);
 }
 
 const struct vb2_ops hantro_queue_ops = {
-- 
2.44.0
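
For anyone who wants to exercise the consumer side, here is a minimal
userspace sketch of the poll(POLLIN) wait that the commit message refers
to. It is not part of this series. The helper name wait_dmabuf_readable()
and its return convention are hypothetical, and it assumes the release
fence is attached in a way that dma-buf's poll support reports through
POLLIN; the fd is expected to be a dmabuf exported from a hantro CAPTURE
buffer.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper, for illustration only: block until the fd
 * reports POLLIN or the timeout expires. On a dma-buf fd, POLLIN
 * becomes ready once the pending producer (write) fences have
 * signalled.
 *
 * Returns 1 when ready, 0 on timeout, -1 on error.
 */
static int wait_dmabuf_readable(int dmabuf_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLIN };
	int ret;

	/* Restart the wait if a signal interrupts poll(). */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;	/* poll() itself failed */
	if (ret == 0)
		return 0;	/* timed out, fence still pending */

	/* Woken without POLLIN (e.g. POLLERR/POLLNVAL): treat as error. */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A compositor-style caller would invoke this once per frame before
sampling from the buffer. With a stub fence the call returns at once
regardless of decode progress, which is the behaviour this patch is
meant to replace.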