From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Fri, 8 May 2026 23:30:00 +0000
Subject: [PATCH] vo_dmabuf_wayland: explicit DMA_BUF_IOCTL_SYNC on import fds

V4L2 does not attach implicit fences (dma_resv) to CAPTURE buffers on
VIDIOC_DQBUF. When such a buffer is forwarded to a Wayland compositor
that imports it via wl_dmabuf and samples it on the GPU, the GPU may
read from physical memory before the producer's writes have been
flushed, producing all-zero output. On the consumer side this shows up
as solid green, because BT.601 limited-range YCbCr (0, 0, 0) converts
to RGB (0, 135, 0).

Issue an explicit DMA_BUF_IOCTL_SYNC round-trip (SYNC_START|SYNC_RW,
then SYNC_END|SYNC_RW) on the fd of each dma_buf object in the frame
descriptor before zwp_linux_buffer_params_v1_add(). This invokes the
producer driver's dma_buf_ops->begin_cpu_access and end_cpu_access,
which on most ARM SoCs flush write buffers and perform cache
maintenance on non-coherent memory before the compositor's GPU import.

This is a userspace workaround. The root cause is the missing implicit
fence on V4L2 CAPTURE DQBUF, which is being addressed upstream via the
vb2_dma_resv RFC.

Without this patch, on RK3566 (hantro VPU + Mali-G52 panfrost + KDE
Plasma 6 / KWin 6.6.4), `mpv --hwdec=vaapi --vo=dmabuf-wayland` shows
solid green frames for all hardware-decoded content. With this patch,
decoded frames are presented correctly.

Signed-off-by: Markus Fritsche
---
diff --git a/video/out/vo_dmabuf_wayland.c b/video/out/vo_dmabuf_wayland.c
index 6b7c511..16e3d18 100644
--- a/video/out/vo_dmabuf_wayland.c
+++ b/video/out/vo_dmabuf_wayland.c
@@ -27,6 +27,12 @@
 #include
 #endif

+/* fourier patch: explicit dma_buf cache sync workaround for missing
+ * implicit fences on V4L2 stateless CAPTURE buffers. Applies to both
+ * VAAPI and DRMPrime import paths. */
+#include <sys/ioctl.h>
+#include <linux/dma-buf.h>
+
 #include "gpu/hwdec.h"
 #include "gpu/video.h"
 #include "mpv_talloc.h"
@@ -205,6 +211,14 @@ static void vaapi_dmabuf_importer(struct buffer *buf, struct mp_image *src,
         buf->drm_format = 0;
         goto done;
     }
+    /* fourier patch: explicit cache coherency sync on each dma_buf fd
+     * before submitting to the compositor. See top-of-file comment. */
+    for (int obj_no = 0; obj_no < desc.num_objects; obj_no++) {
+        struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
+        (void)ioctl(desc.objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+        sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
+        (void)ioctl(desc.objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+    }
     for (int plane_no = 0; plane_no < desc.layers[layer_no].num_planes; ++plane_no) {
         int object = desc.layers[layer_no].object_index[plane_no];
         uint64_t modifier = desc.objects[object].drm_format_modifier;
@@ -258,6 +272,16 @@ static void drmprime_dmabuf_importer(struct buffer *buf, struct mp_image *src,
         return;

     buf->id = drmprime_surface_id(src);
+
+    /* fourier patch: explicit cache coherency sync on each dma_buf fd
+     * before submitting to the compositor. See top-of-file comment. */
+    for (int obj_no = 0; obj_no < desc->nb_objects; obj_no++) {
+        struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
+        (void)ioctl(desc->objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+        sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
+        (void)ioctl(desc->objects[obj_no].fd, DMA_BUF_IOCTL_SYNC, &sync);
+    }
+
     for (layer_no = 0; layer_no < desc->nb_layers; layer_no++) {
         AVDRMLayerDescriptor layer = desc->layers[layer_no];

-- 
2.51.0