Merge pull request 'phase 1: promote vb2_dma_resv RFC v2 + add ka-status + ampere as 2nd aarch64 host' (#7) from claude-noether/kernel-agent:noether/phase1-promote-vb2-dma-resv into main

Reviewed-on: #7
This commit was merged in pull request #7.
This commit is contained in:
2026-05-15 15:36:56 +00:00
7 changed files with 790 additions and 13 deletions
+17 -9
View File
@@ -115,7 +115,7 @@ ka-pause-prune / ka-resume-prune
ka-restore-archive <job-id> ka-restore-archive <job-id>
ka-snooze <issue-id> [--for <duration>] ka-snooze <issue-id> [--for <duration>]
ka-debug <job-id> # shells into the same container that ran the build ka-debug <job-id> # shells into the same container that ran the build
ka-status # per-host one-liner with drift/pending state ka-status # per-host one-liner with drift/pending state [bin/ka-status — implemented Phase 1]
ka-migrate-tree --from <p> --to <p> ka-migrate-tree --from <p> --to <p>
ka-wake-data # wraps wake-host data through His ka-wake-data # wraps wake-host data through His
``` ```
@@ -162,11 +162,13 @@ manifest rewrite); paths stable otherwise.
## Build hosts ## Build hosts
``` ```
Host Where Role Wake? Notes Host Where Role Wake? Notes
────────────────────────────────────────────────────────────────────────── ──────────────────────────────────────────────────────────────────────────
boltzmann Rock 5 ITX+ aarch64 primary always container kbuild-aarch64 boltzmann Rock 5 ITX+ aarch64 primary always container kbuild-aarch64
fermi hertz LXD aarch64 fallback always matches kbuild-aarch64 profile ampere CoolPi GenBook aarch64 secondary on-demand RK3588 32GB; same uarch as boltzmann,
kbuild-x86 data CT x86_64 on-demand wakes via His; idle 30 min → release wakes via His; idle 30 min → release
fermi hertz LXD aarch64 fallback always matches kbuild-aarch64 profile
kbuild-x86 data CT x86_64 on-demand wakes via His; idle 30 min → release
``` ```
Native make on the assigned build host. **No distcc** for kernel-agent Native make on the assigned build host. **No distcc** for kernel-agent
@@ -288,10 +290,16 @@ build.
### Out of scope this round (explicit defer) ### Out of scope this round (explicit defer)
- **vb2 dma_resv RFC v2** + panfrost IOMMU_CACHE for RK3399 — would have closed - **vb2 dma_resv RFC v2** *resolved 2026-05-15.* Markus iterated v2 locally
the fresnel-fourier campaign criterion-4 readback transitive-proof gap, but on boltzmann reaching pkgrel=14; the v2 series attaches the fence at
v2 isn't implemented (RFC v1 rejected upstream). Deferred to a follow-up `device_run` (slept-OK context per Dufresne's v1 review). Now carried in
build once v2 lands. See `marfrit/dmabuf-modifier-triage#3`. `patches/subsystem/media/videobuf2/dma-resv-release-fence/` and included
in `fleet/fresnel.yaml`. Still in scope for upstream targeting; default
remains "build-tree only, no PR until explicitly asked"
(`feedback_no_upstream.md`).
- **panfrost IOMMU_CACHE for RK3399** — sibling kernel work that targets the
readback transitive-proof gap that vb2_dma_resv alone doesn't close.
Still deferred until that lands; ship together when ready.
- **Replace** `linux-eos-arm` rather than coexist alongside — preserves easy - **Replace** `linux-eos-arm` rather than coexist alongside — preserves easy
rollback at u-boot. Can flip to `provides=(linux-eos-arm) conflicts=(...)` rollback at u-boot. Can flip to `provides=(linux-eos-arm) conflicts=(...)`
later once burn-in proves the OC kernel reliable. later once burn-in proves the OC kernel reliable.
Executable
+136
View File
@@ -0,0 +1,136 @@
#!/usr/bin/env bash
#
# ka-status — per-host kernel-agent state summary.
#
# Reads fleet/*.yaml manifests + queries Gitea for open [ka:*] issues +
# probes each host (where reachable) for the installed kernel-package
# version. Designed to give a first-look "what's the state" before any
# ka-promote / ka-install action.
#
# Usage:
# ka-status # summary across all manifests
# ka-status <host> # detail for one host
#
# Read-only. Never mutates state. No sudo. No SSH-into-host writes.
#
# Phase 1 deliverable. Future ka-* CLI verbs (ka-promote / ka-close /
# ka-install) build on the same Gitea-API + manifest-parsing skeleton.
set -euo pipefail
GITEA_URL="${GITEA_URL:-https://git.reauktion.de}"
REPO="${KERNEL_AGENT_REPO:-marfrit/kernel-agent}"
TOKEN_FILE="${KERNEL_AGENT_TOKEN_FILE:-/opt/herding/etc/claude-identities/noether.creds}"
# Resolve token from per-host claude-identity creds, or env override.
token=""
if [ -n "${GITEA_TOKEN:-}" ]; then
token="$GITEA_TOKEN"
elif [ -r "$TOKEN_FILE" ]; then
token="$(grep -E '^GITEA_TOKEN=' "$TOKEN_FILE" | head -1 | cut -d= -f2)"
fi
# Locate fleet/ — script lives in bin/ next to fleet/.
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
fleet_dir="${script_dir}/../fleet"
[ -d "$fleet_dir" ] || { echo "fleet/ not found relative to $script_dir" >&2; exit 2; }
api_get() {
local path="$1"
local args=(--silent --show-error --max-time 15)
[ -n "$token" ] && args+=(-H "Authorization: token $token")
curl "${args[@]}" "${GITEA_URL}/api/v1/${path}"
}
# Open ka-prefixed issues, JSON array on stdout.
fetch_ka_issues() {
api_get "repos/${REPO}/issues?state=open&type=issues&limit=50" \
| python3 -c '
import json, sys
issues = json.load(sys.stdin)
def kind(t):
if not t.startswith("["): return None
p = t.find("]")
return t[1:p] if p > 0 else None
out = [{
"number": i["number"],
"title": i["title"],
"kind": kind(i["title"]),
} for i in issues if kind(i["title"]) and kind(i["title"]).startswith("ka:")]
print(json.dumps(out))
' 2>/dev/null || echo "[]"
}
# Parse a fleet/<host>.yaml manifest (very-narrow YAML subset).
manifest_field() {
local file="$1" key="$2"
grep -E "^[[:space:]]*${key}:" "$file" | head -1 | sed -E "s/^[^:]*:[[:space:]]*//; s/[[:space:]]+#.*//; s/^[\"']//; s/[\"']$//"
}
manifest_pkgname() {
local file="$1"
awk '/^package:/{p=1; next} p && /^[a-z]/{p=0} p && /^[[:space:]]+name:/{sub(/^[[:space:]]+name:[[:space:]]*/,""); print; exit}' "$file"
}
# Probe a host for installed kernel-package version (best-effort, non-blocking).
probe_installed() {
local host="$1" pkg="$2"
ssh -o ConnectTimeout=3 -o BatchMode=yes -o StrictHostKeyChecking=accept-new \
"${host}.fritz.box" "pacman -Q '$pkg' 2>/dev/null || dpkg-query -W -f='\${Package} \${Version}\n' '$pkg' 2>/dev/null || echo 'host-up:not-installed'" 2>/dev/null \
|| echo "host-down"
}
issues_json="$(fetch_ka_issues)"
# Per-host issue grouping — match on title containing the host name (cheap heuristic;
# proper kernel-agent will tag issues with a host label).
issues_for_host() {
local host="$1"
echo "$issues_json" | python3 -c "
import json, sys
host = '$host'
issues = json.load(sys.stdin)
hits = [i for i in issues if host in i['title'].lower()]
for h in hits:
print(f\" #{h['number']} [{h['kind']}] {h['title']}\")
" 2>/dev/null
}
print_host() {
local file="$1"
local host pkg
host="$(basename "$file" .yaml)"
pkg="$(manifest_pkgname "$file")"
local arch="$(manifest_field "$file" arch)"
local soc="$(manifest_field "$file" soc)"
local board="$(manifest_field "$file" board)"
printf '\n══ %s ══\n' "$host"
printf ' manifest: arch=%s soc=%s board=%s\n' "$arch" "$soc" "$board"
printf ' package: %s\n' "$pkg"
if [ -n "$pkg" ]; then
printf ' installed: %s\n' "$(probe_installed "$host" "$pkg")"
fi
local n=0
while IFS= read -r line; do
[ -z "$line" ] && continue
if [ $n -eq 0 ]; then printf ' open ka-issues:\n'; fi
printf '%s\n' "$line"
n=$((n+1))
done < <(issues_for_host "$host")
[ $n -eq 0 ] && printf ' open ka-issues: (none for this host)\n'
}
if [ $# -ge 1 ]; then
f="${fleet_dir}/${1}.yaml"
[ -r "$f" ] || { echo "no manifest for host '$1'" >&2; exit 2; }
print_host "$f"
else
printf 'kernel-agent status (repo: %s)\n' "$REPO"
printf 'open [ka:*] issues total: %s\n' "$(echo "$issues_json" | python3 -c 'import json,sys; print(len(json.load(sys.stdin)))')"
for f in "$fleet_dir"/*.yaml; do
[ -e "$f" ] || continue
print_host "$f"
done
fi
+8 -4
View File
@@ -23,13 +23,17 @@ includes:
- board/pinebook-pro/0001-arm64-dts-rk3399-pinebook-pro-add-OC-OPP-tables-1704-2184.patch - board/pinebook-pro/0001-arm64-dts-rk3399-pinebook-pro-add-OC-OPP-tables-1704-2184.patch
- board/pinebook-pro/0002-arm64-dts-rk3399-pinebook-pro-enable-hdmi-sound.patch - board/pinebook-pro/0002-arm64-dts-rk3399-pinebook-pro-enable-hdmi-sound.patch
- board/pinebook-pro/0003-arm64-dts-rk3399-pinebook-pro-spi1-max-freq-10MHz.patch - board/pinebook-pro/0003-arm64-dts-rk3399-pinebook-pro-spi1-max-freq-10MHz.patch
# vb2_dma_resv RFC v2 series — added 2026-05-15 (Markus iterated v2 locally
# on boltzmann reaching pkgrel=14; pre-v2 decision was "defer". v2 attaches
# the fence at device_run in slept-OK context per Dufresne's v1 review).
- subsystem/media/videobuf2/dma-resv-release-fence/0004-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch
- subsystem/media/videobuf2/dma-resv-release-fence/0005-media-hantro-attach-dma_resv-release-fence-at-device.patch
- subsystem/media/videobuf2/dma-resv-release-fence/0006-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch
# Explicitly NOT included (tracked elsewhere, decision logged): # Explicitly NOT included (tracked elsewhere, decision logged):
# - subsystem/media/videobuf2/dma-resv-release-fence/ (RFC v1 rejected;
# v2 in design — see marfrit/dmabuf-modifier-triage#3. Skip until v2 lands
# or we explicitly accept v1-shape parity with ohm.)
# - driver/panfrost/iommu-cache-rk3399/ (sibling kernel work; ship together # - driver/panfrost/iommu-cache-rk3399/ (sibling kernel work; ship together
# with vb2_dma_resv when it lands.) # once it lands. Targets the readback transitive-proof gap that vb2_dma_resv
# alone doesn't close.)
config: config:
source: /proc/config.gz on running fresnel kernel (linux-eos-arm 6.19.9-99) source: /proc/config.gz on running fresnel kernel (linux-eos-arm 6.19.9-99)
@@ -0,0 +1,356 @@
From a202de1646d4c8f8ee2ebc2e4c100b621975754a Mon Sep 17 00:00:00 2001
In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de>
References: <20260429195306.239666-1-mfritsche@reauktion.de>
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 9 May 2026 16:16:07 +0200
Subject: [PATCH RFC v2] media: videobuf2: add opt-in dma_resv producer fence
helper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
V4L2 producers historically don't propagate buffer-state-done into
the dmabuf's dma_resv exclusive fence. Userspace consumers that
import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync
fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE,
EGL_LINUX_DMA_BUF_EXT) currently see either zero fences or a stub
fence from dma_fence_get_stub(). This is correct by accident for the
common DQBUF-then-import case but represents a contract gap that
breaks Wayland compositors importing CAPTURE buffers from a stateless
H.264 decoder under continuous playback on implicit-sync GPU stacks
(observed on RK3566 + hantro VPU + Mali-G52 panfrost; manifests as
green frames -- BT.709 limited-range YUV(0,0,0) -> RGB(0,77,0) -- when
the GPU samples the dmabuf before the producer's decode completes).
Add an opt-in API gated by both a per-driver runtime flag
(vb2_queue::supports_release_fences) and a Kconfig
(CONFIG_VIDEOBUF2_RELEASE_FENCES, default n) that lets producers
populate a real dma_resv exclusive write fence on the dmabufs they
export. Drivers call vb2_buffer_attach_release_fence(vb) at a
finite-time-fenced point in their pipeline (typically m2m
device_run, just before the HW kick); vb2_buffer_done() signals and
puts the fence as part of its state transition.
The publish and signal paths are wrapped in
dma_fence_begin_signalling() / dma_fence_end_signalling() so
PROVE_LOCKING can validate that nothing taken in those critical
sections deadlocks against the signal path. dma_resv_lock is
sleepable but not taken on the signal path, so taking it inside the
publish critical section is safe under lockdep.
Skips planes whose vb2_plane.dbuf is NULL -- buffers never exported
via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no
dmabuf for userspace to wait on.
Drivers that don't opt in pay nothing: the helper is a no-op stub
when CONFIG_VIDEOBUF2_RELEASE_FENCES=n, and an early-return check
of supports_release_fences when =y but the flag is unset.
Validated on RK3566 PineTab2 with PROVE_LOCKING enabled: 30s of
bbb_1080p30 H.264 stateless decode + zero-copy panfrost EGL import
via dmabuf-wayland (mpv 0.41 + KWin 6.6.4 + Mesa panfrost 26.0.5)
produces 31,816 dma_fence init/signal pairs across 5,724 vb2 buffer
cycles with zero lockdep splats from videobuf2 / dma_resv code paths.
Subsequent patches in this series opt the hantro and rockchip-rga
drivers in.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Christian König <christian.koenig@amd.com>
Cc: Nicolas Dufresne <nicolas@ndufresne.ca>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/common/videobuf2/Kconfig | 29 ++++
.../media/common/videobuf2/videobuf2-core.c | 135 ++++++++++++++++++
include/media/videobuf2-core.h | 51 +++++++
3 files changed, 215 insertions(+)
diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index d2223a12c..bbfa26984 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -30,3 +30,32 @@ config VIDEOBUF2_DMA_SG
config VIDEOBUF2_DVB
tristate
select VIDEOBUF2_CORE
+
+config VIDEOBUF2_RELEASE_FENCES
+ bool "videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports"
+ depends on VIDEOBUF2_CORE
+ depends on DMA_SHARED_BUFFER
+ default n
+ help
+ Enables an opt-in API that lets vb2 producers populate a dma_resv
+ exclusive write fence on the dmabufs they export to userspace.
+ The fence is signalled when the buffer transitions to
+ VB2_BUF_STATE_DONE.
+
+ This gives userspace consumers that import V4L2-produced dmabufs
+ and wait on the dmabuf's implicit-sync fence (poll(POLLIN),
+ DMA_BUF_IOCTL_EXPORT_SYNC_FILE, EGL_LINUX_DMA_BUF_EXT) a real
+ producer fence to wait on, instead of a stub fence from
+ dma_fence_get_stub() that the dma_buf core substitutes when
+ dma_resv is empty.
+
+ Drivers individually opt in by setting
+ vb2_queue::supports_release_fences = true and calling
+ vb2_buffer_attach_release_fence() at the right point in their
+ pipeline (typically m2m device_run, just before HW kick).
+
+ Distributors leave this off unless targeting Wayland/EGL
+ consumers of V4L2 stateless decoder output on
+ implicit-sync-only GPU stacks (e.g. mainline panfrost).
+
+ If unsure, say N.
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index adf668b21..85d7fddbd 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -26,6 +26,12 @@
#include <linux/freezer.h>
#include <linux/kthread.h>
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+#include <linux/dma-fence.h>
+#include <linux/dma-resv.h>
+#include <linux/dma-buf.h>
+#endif
+
#include <media/videobuf2-core.h>
#include <media/v4l2-mc.h>
@@ -1173,6 +1179,120 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no)
}
EXPORT_SYMBOL_GPL(vb2_plane_cookie);
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+/*
+ * dma_resv release-fence integration.
+ *
+ * Optional, opt-in path that lets producers publish a real
+ * dma_fence on their CAPTURE-side dmabufs so userspace consumers
+ * (compositors, EGL importers) get spec-clean implicit-sync
+ * semantics instead of the dma_buf core's stub fence. Drivers
+ * call vb2_buffer_attach_release_fence() at a finite-time-fenced
+ * point (typically m2m device_run) and the fence is signalled by
+ * vb2_buffer_done(). Gated at runtime by
+ * vb2_queue::supports_release_fences and at compile time by
+ * CONFIG_VIDEOBUF2_RELEASE_FENCES.
+ */
+
+static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence)
+{
+ return "videobuf2";
+}
+
+static const char *vb2_dma_resv_get_timeline_name(struct dma_fence *fence)
+{
+ return "vb2-release-fence";
+}
+
+static const struct dma_fence_ops vb2_dma_resv_fence_ops = {
+ .get_driver_name = vb2_dma_resv_get_driver_name,
+ .get_timeline_name = vb2_dma_resv_get_timeline_name,
+};
+
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{
+ struct vb2_queue *q = vb->vb2_queue;
+ struct dma_fence *fence;
+ unsigned int plane;
+ bool cookie;
+
+ if (!q->supports_release_fences)
+ return 0;
+
+ if (WARN_ON(vb->release_fence))
+ return -EINVAL;
+
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence)
+ return -ENOMEM;
+
+ dma_fence_init(fence, &vb2_dma_resv_fence_ops, &q->dma_resv_fence_lock,
+ q->dma_resv_fence_context,
+ atomic64_inc_return(&q->dma_resv_fence_seqno));
+
+ /*
+ * Annotate the publish-side critical section. Per
+ * Documentation/driver-api/dma-buf.rst, lockdep validates
+ * that nothing taken in this region can deadlock against
+ * the signal path in vb2_buffer_signal_release_fence().
+ * dma_resv_lock is sleepable but is not taken on the signal
+ * path, so taking it inside the critical section is safe.
+ */
+ cookie = dma_fence_begin_signalling();
+ for (plane = 0; plane < vb->num_planes; plane++) {
+ struct dma_buf *dbuf = vb->planes[plane].dbuf;
+
+ if (!dbuf)
+ continue;
+
+ dma_resv_lock(dbuf->resv, NULL);
+ dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE);
+ dma_resv_unlock(dbuf->resv);
+ }
+ dma_fence_end_signalling(cookie);
+
+ /* One reference for the eventual signal in vb2_buffer_done. */
+ vb->release_fence = dma_fence_get(fence);
+
+ /* The dma_resv held its own reference per plane. Drop ours. */
+ dma_fence_put(fence);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence);
+
+static void vb2_buffer_signal_release_fence(struct vb2_buffer *vb,
+ enum vb2_buffer_state state)
+{
+ struct dma_fence *fence = vb->release_fence;
+ bool cookie;
+
+ if (!fence)
+ return;
+
+ cookie = dma_fence_begin_signalling();
+ if (state == VB2_BUF_STATE_ERROR)
+ dma_fence_set_error(fence, -EIO);
+ dma_fence_signal(fence);
+ dma_fence_end_signalling(cookie);
+
+ dma_fence_put(fence);
+ vb->release_fence = NULL;
+}
+#else /* !CONFIG_VIDEOBUF2_RELEASE_FENCES */
+
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence);
+
+static inline void vb2_buffer_signal_release_fence(struct vb2_buffer *vb,
+ enum vb2_buffer_state state)
+{
+}
+#endif /* CONFIG_VIDEOBUF2_RELEASE_FENCES */
+
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
{
struct vb2_queue *q = vb->vb2_queue;
@@ -1199,6 +1319,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
if (state != VB2_BUF_STATE_QUEUED)
__vb2_buf_mem_finish(vb);
+ if (state != VB2_BUF_STATE_QUEUED)
+ vb2_buffer_signal_release_fence(vb, state);
+
spin_lock_irqsave(&q->done_lock, flags);
if (state == VB2_BUF_STATE_QUEUED) {
vb->state = VB2_BUF_STATE_QUEUED;
@@ -2651,6 +2774,18 @@ int vb2_core_queue_init(struct vb2_queue *q)
mutex_init(&q->mmap_lock);
init_waitqueue_head(&q->done_wq);
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+ /*
+ * Per-queue dma_resv release-fence context. Drivers that
+ * opt in via supports_release_fences and call
+ * vb2_buffer_attach_release_fence() use these to allocate
+ * fences on a single per-queue timeline.
+ */
+ q->dma_resv_fence_context = dma_fence_context_alloc(1);
+ atomic64_set(&q->dma_resv_fence_seqno, 0);
+ spin_lock_init(&q->dma_resv_fence_lock);
+#endif
+
q->memory = VB2_MEMORY_UNKNOWN;
if (q->buf_struct_size == 0)
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 4424d481d..766ff2194 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -288,6 +288,16 @@ struct vb2_buffer {
unsigned int skip_cache_sync_on_finish:1;
struct vb2_plane planes[VB2_MAX_PLANES];
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+ /*
+ * Producer release fence published on each plane's
+ * dmabuf->resv when the driver opts in via
+ * vb2_buffer_attach_release_fence(). Signalled and put by
+ * vb2_buffer_done() on transition to DONE/ERROR. NULL when
+ * the driver did not opt in for this buffer.
+ */
+ struct dma_fence *release_fence;
+#endif
struct list_head queued_entry;
struct list_head done_entry;
#ifdef CONFIG_VIDEO_ADV_DEBUG
@@ -648,6 +658,19 @@ struct vb2_queue {
spinlock_t done_lock;
wait_queue_head_t done_wq;
+#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES
+ /*
+ * dma_resv release-fence context. Drivers that set
+ * supports_release_fences and call
+ * vb2_buffer_attach_release_fence() use these to allocate
+ * fences on a per-queue timeline.
+ */
+ u64 dma_resv_fence_context;
+ atomic64_t dma_resv_fence_seqno;
+ spinlock_t dma_resv_fence_lock;
+#endif
+
+ unsigned int supports_release_fences:1;
unsigned int streaming:1;
unsigned int start_streaming_called:1;
unsigned int error:1;
@@ -735,6 +758,34 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no);
*/
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state);
+/**
+ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence.
+ * @vb: the buffer being committed to the producer.
+ *
+ * Drivers that have set vb2_queue::supports_release_fences may call
+ * this from any sleepable context where they have committed to
+ * running the operation in finite time -- typically m2m
+ * device_run(), just before the HW kick. The helper allocates a
+ * dma_fence on the queue's per-queue timeline, attaches it as
+ * DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes
+ * it in vb->release_fence. vb2_buffer_done() signals and puts the
+ * fence as part of the buffer's state transition.
+ *
+ * Skips planes whose vb2_plane.dbuf is NULL -- buffers never
+ * exported via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF)
+ * have no dmabuf for userspace to wait on.
+ *
+ * No-op when vb2_queue::supports_release_fences is not set
+ * (regardless of CONFIG_VIDEOBUF2_RELEASE_FENCES). When
+ * CONFIG_VIDEOBUF2_RELEASE_FENCES=n, this is a stub that returns 0.
+ *
+ * Returns 0 on success or when the no-op stub is in effect,
+ * negative errno on allocation failure when fence publishing was
+ * attempted. Best-effort: drivers should ignore the return value
+ * unless they want diagnostics.
+ */
+int vb2_buffer_attach_release_fence(struct vb2_buffer *vb);
+
/**
* vb2_discard_done() - discard all buffers marked as DONE.
* @q: pointer to &struct vb2_queue with videobuf2 queue.
--
2.53.0
@@ -0,0 +1,95 @@
From 1844c263bde8dd244d7db46f8c508e7c70da459c Mon Sep 17 00:00:00 2001
In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de>
References: <20260429195306.239666-1-mfritsche@reauktion.de>
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 9 May 2026 16:24:01 +0200
Subject: [PATCH RFC v2] media: hantro: attach dma_resv release fence at
device_run
Opt the hantro driver into the new vb2 release-fence helper so its
CAPTURE-side dmabufs carry a real producer fence that wayland
compositors and other implicit-sync consumers can wait on, instead
of the dma_buf core's stub fence.
Attach point is m2m device_run, immediately after
v4l2_m2m_buf_copy_metadata() and before ctx->codec_ops->run().
Per Nicolas Dufresne's v1 review (lore.kernel.org/linux-media/
3d8deeb15581b754e4c061d4c4a13657aa08bc3c.camel@ndufresne.ca/),
this satisfies the dma_fence finite-time contract: the m2m core
has committed to running the job by this point, codec_ops->run
either kicks the HW (decode-complete signals the fence via
vb2_buffer_done) or fails immediately (job_finish with
VB2_BUF_STATE_ERROR signals with -EIO). PM and clocks are already
up by this point, so no allocation context restrictions.
The CAPTURE queue is opted in with supports_release_fences=true at
queue_init.
Userspace consumers that import hantro CAPTURE dmabufs and wait on
their implicit-sync fence (Wayland zwp_linux_dmabuf_v1 +
panfrost EGL_LINUX_DMA_BUF_EXT) now wait on a real fence
representing the producer's actual completion, fixing green-frame
corruption observed on RK3566 PineTab2 + Mali-G52 panfrost (the
GPU was sampling zero pages because the dmabuf's implicit fence
was the dma_buf core's pre-signalled stub).
Validated end-to-end on PineTab2 (RK3566 / hantro G1 / Mali-G52
mainline panfrost): 30s of bbb_1080p30 H.264 stateless decode +
zero-copy panfrost EGL import via dmabuf-wayland (mpv 0.41 +
KWin 6.6.4 + Mesa panfrost 26.0.5) renders correctly with no
green-frame corruption and no PROVE_LOCKING splats.
Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Nicolas Dufresne <nicolas@ndufresne.ca>
Cc: linux-media@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
.../media/platform/verisilicon/hantro_drv.c | 23 +++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c
index 2e81877f6..6a66c47ed 100644
--- a/drivers/media/platform/verisilicon/hantro_drv.c
+++ b/drivers/media/platform/verisilicon/hantro_drv.c
@@ -186,6 +186,22 @@ static void device_run(void *priv)
v4l2_m2m_buf_copy_metadata(src, dst);
+ /*
+ * Attach a producer fence on the CAPTURE-side dmabuf so userspace
+ * importers (e.g. Wayland compositors) get spec-clean implicit-sync
+ * semantics. Called from device_run rather than buf_queue: the
+ * dma_fence finite-time contract requires that once a fence is
+ * published, the producer must signal it in finite time. By the
+ * time we reach device_run, the m2m core has committed to running
+ * this job, and the next hop (codec_ops->run) either kicks the HW
+ * (decode-complete signals the fence via vb2_buffer_done) or
+ * fails immediately (job_finish with VB2_BUF_STATE_ERROR signals
+ * the fence with -EIO). Either path resolves the fence in finite
+ * time. Best-effort: a NOMEM here means we lose implicit-sync
+ * precision for this frame, no functional regression.
+ */
+ (void)vb2_buffer_attach_release_fence(&dst->vb2_buf);
+
if (ctx->codec_ops->run(ctx))
goto err_cancel_job;
@@ -249,6 +265,13 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq)
dst_vq->lock = &ctx->dev->vpu_mutex;
dst_vq->dev = ctx->dev->v4l2_dev.dev;
+ /*
+ * Opt the CAPTURE queue into vb2 release-fence publishing.
+ * No-op unless CONFIG_VIDEOBUF2_RELEASE_FENCES=y; runtime cost
+ * is one extra fence allocation + dma_resv update per device_run.
+ */
+ dst_vq->supports_release_fences = true;
+
return vb2_queue_init(dst_vq);
}
--
2.53.0
@@ -0,0 +1,117 @@
From 2c63a63bf65739763051dc4ce7ce2ffaf2d514c4 Mon Sep 17 00:00:00 2001
In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de>
References: <20260429195306.239666-1-mfritsche@reauktion.de>
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 9 May 2026 16:50:51 +0200
Subject: [PATCH RFC v2] media: rockchip-rga: attach dma_resv release fence at
device_run
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Opt the rockchip-rga driver into the new vb2 release-fence helper.
Same shape as the hantro patch: attach a producer fence on the
CAPTURE-side dmabuf at m2m device_run, signalled by
vb2_buffer_done() when RGA completes the m2m operation.
Differs from hantro in one mechanical detail: rga's device_run
wraps the entire body in spin_lock_irqsave(&rga->ctrl_lock). Our
helper calls dma_resv_lock(), which is sleepable, so the
buffer-fetch + fence-attach sequence has to run above the spinlock.
Restructure device_run so:
- v4l2_m2m_next_src_buf / next_dst_buf,
- src->sequence increment,
- vb2_buffer_attach_release_fence(&dst->vb2_buf)
run before spin_lock_irqsave; only the rga->curr assignment and
rga_hw_start() (the actual HW kick) remain inside the spinlock.
This is safe under the m2m-job ownership model: by the time
device_run is called, the m2m core has selected this context and
serializes one device_run per context, so v4l2_m2m_next_*_buf
returns stable pointers until the corresponding *_buf_remove in
rga_isr. ctrl_lock was previously protecting per-device state
(rga->curr) and the HW register access, neither of which depends on
the buffer-fetch happening inside the lock.
The CAPTURE queue is opted in with supports_release_fences=true at
queue_init.
Userspace consumers of RGA-produced dmabufs (image-processing
pipelines, screen-rotation servers, gstreamer flows on Rockchip
boards) get spec-clean implicit-sync semantics, matching what
hantro does in the previous patch in this series.
Sven Püschel's ongoing "media: platform: rga: Add RGA3 support"
v5 series (linux-rockchip 2026-04-28) restructures rga.c
substantially. If that lands first, the device_run restructure
here will need a rebase against the new shape; the locking story
itself is invariant.
Cc: Jacob Chen <jacob-chen@iotwrt.com>
Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Cc: Sven Püschel <s.pueschel@pengutronix.de>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: linux-media@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rga/rga.c | 27 +++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c
index fea63b94c..03030c7ea 100644
--- a/drivers/media/platform/rockchip/rga/rga.c
+++ b/drivers/media/platform/rockchip/rga/rga.c
@@ -38,15 +38,28 @@ static void device_run(void *prv)
struct vb2_v4l2_buffer *src, *dst;
unsigned long flags;
- spin_lock_irqsave(&rga->ctrl_lock, flags);
-
- rga->curr = ctx;
-
+ /*
+ * Fetch the next-job buffers and (best-effort) attach a producer
+ * fence on CAPTURE before taking ctrl_lock below.
+ * vb2_buffer_attach_release_fence() takes dma_resv_lock, which is
+ * sleepable; ctrl_lock is taken with spin_lock_irqsave so any
+ * sleepable call must happen above it. Buffer ownership is
+ * already committed at this point: the m2m core has selected
+ * this context for device_run and serializes one device_run per
+ * context, so v4l2_m2m_next_*_buf returns stable pointers until
+ * the corresponding *_buf_remove in rga_isr.
+ */
src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
src->sequence = ctx->osequence++;
dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+ (void)vb2_buffer_attach_release_fence(&dst->vb2_buf);
+
+ spin_lock_irqsave(&rga->ctrl_lock, flags);
+
+ rga->curr = ctx;
+
rga_hw_start(rga, vb_to_rga(src), vb_to_rga(dst));
spin_unlock_irqrestore(&rga->ctrl_lock, flags);
@@ -123,6 +136,12 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq)
dst_vq->lock = &ctx->rga->mutex;
dst_vq->dev = ctx->rga->v4l2_dev.dev;
+ /*
+ * Opt the CAPTURE queue into vb2 release-fence publishing.
+ * Compile-time gated by CONFIG_VIDEOBUF2_RELEASE_FENCES.
+ */
+ dst_vq->supports_release_fences = true;
+
return vb2_queue_init(dst_vq);
}
--
2.53.0
@@ -0,0 +1,61 @@
# vb2_dma_resv release-fence — RFC v2 series
Three-patch series that opts V4L2 m2m drivers into attaching real
producer fences to CAPTURE-side dmabufs, so implicit-sync GPU
consumers (Wayland / panfrost / panthor) wait correctly on the
producer's decode-completion rather than seeing the dma_buf core's
stub fence.
| Patch | Subject |
|---|---|
| `0004` | media: videobuf2: add opt-in dma_resv producer fence helper |
| `0005` | media: hantro: attach dma_resv release fence at device_run |
| `0006` | media: rockchip-rga: attach dma_resv release fence at device_run |
Numbered with the leading 4/5/6 because the fresnel build series carries
these alongside the 3 board-scoped `0001/0002/0003` Pinebook Pro DTS
patches; the numbers reflect apply-order in the PKGBUILD, not the
upstream lore series ordering (which starts at 1/4..4/4 with a cover).
## Status
**RFC v2.** Iterated on `lore.kernel.org/linux-media` after v1 was
rejected over the dma_fence finite-time contract gap and bus-locked
allocation issues. v2 attaches the fence at `device_run` instead of
QBUF, which puts allocation in slept-OK context (PM and clocks up,
job committed) — per Nicolas Dufresne's v1 review feedback.
Cover-letter reference: `marfrit/dmabuf-modifier-triage#3` (campaign
session that owns the upstream-targeting work; this directory ships
the build-tree-ready form for kernel-agent fleet consumption).
## Scope
`subsystem/media/videobuf2/` for the helper (0004), with two
driver opt-ins (0005/0006) shipped together because hantro and
rockchip-rga both need the helper to be useful on the RK35xx fleet.
Splitting 0005/0006 into `driver/hantro/` and `driver/rockchip-rga/`
was considered but rejected: they're a single contract series, and
their apply-order matters (0004 must precede). Series-as-unit beats
per-driver promote eligibility here.
## Fleet eligibility
- **fresnel** (RK3399 + hantro + Mali-G52): eligible. Carried in
`fleet/fresnel.yaml` since 2026-05-15 (decision flipped from "defer
to v2" to "include — v2 in this tree").
- **ohm** (RK3566 + hantro + Mali-G52): eligible. Was the original
reproducer for the green-frames symptom. Will be carried in
`fleet/ohm.yaml` once that manifest lands.
- **ampere** (RK3588 + hantro for some codecs): eligibility deferred —
RK3588 uses rkvdec2 for primary decode, hantro role is narrower.
Re-assess when `fleet/ampere.yaml` lands per issue #6.
- **boltzmann** (RK3588): same as ampere — defer.
## Upstream targeting
Not yet posted to linux-media in v2 form. Per `feedback_no_upstream.md`
the default is "build-tree only, wait for explicit ask". When/if the
upstream submission happens, this directory's `0004/0005/0006` are the
canonical source — they include the v2 commit headers (`PATCH RFC v2`,
`In-Reply-To` chain to the v1 cover-letter Message-Id).