From 18da673cccb224c2277f58374b890a77b48476c0 Mon Sep 17 00:00:00 2001
From: "Claude (noether)"
Date: Fri, 15 May 2026 15:32:00 +0000
Subject: [PATCH] phase 1: promote vb2_dma_resv RFC v2 + add ka-status + ampere as 2nd aarch64 host
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Four changes that together flip kernel-agent from spec'd to operational
in the manual-orchestrated form. Real ka-* CLI verbs come in later
phases; this commit gets a first iteration through the pipeline and
proves the flow at the artifact level.

1. Promote vb2_dma_resv RFC v2 series into the scope-tagged tree

   Markus iterated v2 locally on boltzmann (kernel-agent-bootstrap dir,
   reaching linux-fresnel-fourier pkgrel=14). v2 attaches the producer
   fence at device_run in slept-OK context per Dufresne's v1 review on
   linux-media. The three patches land under
   patches/subsystem/media/videobuf2/dma-resv-release-fence/:

   - 0004 (helper): opt-in vb2 dma_resv producer-fence helper
   - 0005 (driver opt-in): hantro device_run attach
   - 0006 (driver opt-in): rockchip-rga device_run attach

   Numbered 4/5/6 because the fresnel build PKGBUILD applies them after
   the three 0001/0002/0003 PBP DTS patches; this directory's numbering
   follows that apply-order, not the upstream lore series numbering.

   README at the scope dir documents fleet eligibility, decision
   history, and the v1 → v2 design pivot.

2. Update fleet/fresnel.yaml to include the v2 series

   Pre-v2 manifest had a comment block 'Explicitly NOT included … vb2
   dma-resv-release-fence … defer until v2 lands'. v2 has landed. Move
   those three lines from 'excluded' to 'includes', annotate the
   decision inline.

3. README updates

   - Build hosts table: add ampere (CoolPi GenBook, RK3588 32GB) as
     secondary aarch64 host. Same uarch as boltzmann, on-demand wake
     via His. Gives the fleet a second native build target for when
     boltzmann is busy (e.g. carrying a firefox-fourier 4h build).
   - 'Out of scope this round' bootstrap section: mark vb2_dma_resv as
     resolved 2026-05-15, keep panfrost IOMMU_CACHE deferred.

4. First ka-* CLI verb implemented: bin/ka-status

   bash, ~120 lines. Reads fleet/*.yaml manifests, queries Gitea for
   open [ka:*] issues, probes each reachable host for the installed
   kernel-package version. Read-only: no sudo, no host writes. Picks
   GITEA_TOKEN from /opt/herding/etc/claude-identities/.creds or env
   override. Proves the agent's Gitea-API + manifest-parsing skeleton
   works end-to-end without committing to a full
   ka-promote/build/install implementation.

   Smoke-tested locally:

     $ bin/ka-status
     kernel-agent status (repo: marfrit/kernel-agent)
     open [ka:*] issues total: 1

     ══ fresnel ══
       manifest:  arch=arm64 soc=rockchip/rk3399 board=pinebook-pro
       package:   linux-fresnel-fourier
       installed: host-down   # (fresnel is currently powered off)
       open ka-issues: (none for this host)

No PKGBUILD update in this PR; that lives in marfrit-packages and ships
as a sibling PR (the actual linux-fresnel-fourier-7.0-14 publish).
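
For reviewers who want to poke at the manifest-parsing half of item 4 without Gitea or SSH access, here is a minimal sketch. It copies the two helpers from bin/ka-status as they appear in this patch and runs them against a sample manifest. The sample manifest is an assumption made up for illustration (its field values are taken from the smoke-test output above); the real fleet/fresnel.yaml layout may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample manifest, NOT the real fleet/fresnel.yaml.
manifest="$(mktemp)"
trap 'rm -f "$manifest"' EXIT
cat > "$manifest" <<'EOF'
arch: arm64            # trailing comments are stripped
soc: "rockchip/rk3399"
board: pinebook-pro
package:
  name: linux-fresnel-fourier
EOF

# First "key: value" line wins; strips a trailing comment and surrounding quotes.
manifest_field() {
  local file="$1" key="$2"
  grep -E "^[[:space:]]*${key}:" "$file" | head -1 \
    | sed -E "s/^[^:]*:[[:space:]]*//; s/[[:space:]]+#.*//; s/^[\"']//; s/[\"']$//"
}

# Prints the indented "name:" value found under the top-level "package:" key.
manifest_pkgname() {
  local file="$1"
  awk '/^package:/{p=1; next} p && /^[a-z]/{p=0} p && /^[[:space:]]+name:/{sub(/^[[:space:]]+name:[[:space:]]*/,""); print; exit}' "$file"
}

printf 'arch=%s soc=%s board=%s package=%s\n' \
  "$(manifest_field "$manifest" arch)" \
  "$(manifest_field "$manifest" soc)" \
  "$(manifest_field "$manifest" board)" \
  "$(manifest_pkgname "$manifest")"
```

This is deliberately a very narrow YAML subset: manifest_field matches a key at any indentation, so a nested key sharing a name with a top-level one would shadow it if it came first. That is presumably acceptable while manifests stay flat, but worth keeping in mind before later ka-* verbs lean on the same skeleton.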
--- README.md | 26 +- bin/ka-status | 136 +++++++ fleet/fresnel.yaml | 12 +- ...add-opt-in-dma_resv-producer-fence-h.patch | 356 ++++++++++++++++++ ...ach-dma_resv-release-fence-at-device.patch | 95 +++++ ...ga-attach-dma_resv-release-fence-at-.patch | 117 ++++++ .../dma-resv-release-fence/README.md | 61 +++ 7 files changed, 790 insertions(+), 13 deletions(-) create mode 100755 bin/ka-status create mode 100644 patches/subsystem/media/videobuf2/dma-resv-release-fence/0004-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch create mode 100644 patches/subsystem/media/videobuf2/dma-resv-release-fence/0005-media-hantro-attach-dma_resv-release-fence-at-device.patch create mode 100644 patches/subsystem/media/videobuf2/dma-resv-release-fence/0006-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch create mode 100644 patches/subsystem/media/videobuf2/dma-resv-release-fence/README.md diff --git a/README.md b/README.md index 37f25c2..13a981f 100644 --- a/README.md +++ b/README.md @@ -115,7 +115,7 @@ ka-pause-prune / ka-resume-prune ka-restore-archive ka-snooze [--for ] ka-debug # shells into the same container that ran the build -ka-status # per-host one-liner with drift/pending state +ka-status # per-host one-liner with drift/pending state [bin/ka-status — implemented Phase 1] ka-migrate-tree --from

--to

ka-wake-data # wraps wake-host data through His ``` @@ -162,11 +162,13 @@ manifest rewrite); paths stable otherwise. ## Build hosts ``` -Host Where Role Wake? Notes +Host Where Role Wake? Notes ────────────────────────────────────────────────────────────────────────── -boltzmann Rock 5 ITX+ aarch64 primary always container kbuild-aarch64 -fermi hertz LXD aarch64 fallback always matches kbuild-aarch64 profile -kbuild-x86 data CT x86_64 on-demand wakes via His; idle 30 min → release +boltzmann Rock 5 ITX+ aarch64 primary always container kbuild-aarch64 +ampere CoolPi GenBook aarch64 secondary on-demand RK3588 32GB; same uarch as boltzmann, + wakes via His; idle 30 min → release +fermi hertz LXD aarch64 fallback always matches kbuild-aarch64 profile +kbuild-x86 data CT x86_64 on-demand wakes via His; idle 30 min → release ``` Native make on the assigned build host. **No distcc** for kernel-agent @@ -288,10 +290,16 @@ build. ### Out of scope this round (explicit defer) -- **vb2 dma_resv RFC v2** + panfrost IOMMU_CACHE for RK3399 — would have closed - the fresnel-fourier campaign criterion-4 readback transitive-proof gap, but - v2 isn't implemented (RFC v1 rejected upstream). Deferred to a follow-up - build once v2 lands. See `marfrit/dmabuf-modifier-triage#3`. +- **vb2 dma_resv RFC v2** — *resolved 2026-05-15.* Markus iterated v2 locally + on boltzmann reaching pkgrel=14; the v2 series attaches the fence at + `device_run` (slept-OK context per Dufresne's v1 review). Now carried in + `patches/subsystem/media/videobuf2/dma-resv-release-fence/` and included + in `fleet/fresnel.yaml`. Still in scope for upstream targeting; default + remains "build-tree only, no PR until explicitly asked" + (`feedback_no_upstream.md`). +- **panfrost IOMMU_CACHE for RK3399** — sibling kernel work that targets the + readback transitive-proof gap that vb2_dma_resv alone doesn't close. + Still deferred until that lands; ship together when ready. 
- **Replace** `linux-eos-arm` rather than coexist alongside — preserves easy rollback at u-boot. Can flip to `provides=(linux-eos-arm) conflicts=(...)` later once burn-in proves the OC kernel reliable. diff --git a/bin/ka-status b/bin/ka-status new file mode 100755 index 0000000..f903467 --- /dev/null +++ b/bin/ka-status @@ -0,0 +1,136 @@ +#!/usr/bin/env bash +# +# ka-status — per-host kernel-agent state summary. +# +# Reads fleet/*.yaml manifests + queries Gitea for open [ka:*] issues + +# probes each host (where reachable) for the installed kernel-package +# version. Designed to give a first-look "what's the state" before any +# ka-promote / ka-install action. +# +# Usage: +# ka-status # summary across all manifests +# ka-status # detail for one host +# +# Read-only. Never mutates state. No sudo. No SSH-into-host writes. +# +# Phase 1 deliverable. Future ka-* CLI verbs (ka-promote / ka-close / +# ka-install) build on the same Gitea-API + manifest-parsing skeleton. + +set -euo pipefail + +GITEA_URL="${GITEA_URL:-https://git.reauktion.de}" +REPO="${KERNEL_AGENT_REPO:-marfrit/kernel-agent}" +TOKEN_FILE="${KERNEL_AGENT_TOKEN_FILE:-/opt/herding/etc/claude-identities/noether.creds}" + +# Resolve token from per-host claude-identity creds, or env override. +token="" +if [ -n "${GITEA_TOKEN:-}" ]; then + token="$GITEA_TOKEN" +elif [ -r "$TOKEN_FILE" ]; then + token="$(grep -E '^GITEA_TOKEN=' "$TOKEN_FILE" | head -1 | cut -d= -f2)" +fi + +# Locate fleet/ — script lives in bin/ next to fleet/. +script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +fleet_dir="${script_dir}/../fleet" +[ -d "$fleet_dir" ] || { echo "fleet/ not found relative to $script_dir" >&2; exit 2; } + +api_get() { + local path="$1" + local args=(--silent --show-error --max-time 15) + [ -n "$token" ] && args+=(-H "Authorization: token $token") + curl "${args[@]}" "${GITEA_URL}/api/v1/${path}" +} + +# Open ka-prefixed issues, JSON array on stdout. 
+fetch_ka_issues() { + api_get "repos/${REPO}/issues?state=open&type=issues&limit=50" \ + | python3 -c ' +import json, sys +issues = json.load(sys.stdin) +def kind(t): + if not t.startswith("["): return None + p = t.find("]") + return t[1:p] if p > 0 else None +out = [{ + "number": i["number"], + "title": i["title"], + "kind": kind(i["title"]), +} for i in issues if kind(i["title"]) and kind(i["title"]).startswith("ka:")] +print(json.dumps(out)) +' 2>/dev/null || echo "[]" +} + +# Parse a fleet/.yaml manifest (very-narrow YAML subset). +manifest_field() { + local file="$1" key="$2" + grep -E "^[[:space:]]*${key}:" "$file" | head -1 | sed -E "s/^[^:]*:[[:space:]]*//; s/[[:space:]]+#.*//; s/^[\"']//; s/[\"']$//" +} + +manifest_pkgname() { + local file="$1" + awk '/^package:/{p=1; next} p && /^[a-z]/{p=0} p && /^[[:space:]]+name:/{sub(/^[[:space:]]+name:[[:space:]]*/,""); print; exit}' "$file" +} + +# Probe a host for installed kernel-package version (best-effort, non-blocking). +probe_installed() { + local host="$1" pkg="$2" + ssh -o ConnectTimeout=3 -o BatchMode=yes -o StrictHostKeyChecking=accept-new \ + "${host}.fritz.box" "pacman -Q '$pkg' 2>/dev/null || dpkg-query -W -f='\${Package} \${Version}\n' '$pkg' 2>/dev/null || echo 'host-up:not-installed'" 2>/dev/null \ + || echo "host-down" +} + +issues_json="$(fetch_ka_issues)" + +# Per-host issue grouping — match on title containing the host name (cheap heuristic; +# proper kernel-agent will tag issues with a host label). 
+issues_for_host() { + local host="$1" + echo "$issues_json" | python3 -c " +import json, sys +host = '$host' +issues = json.load(sys.stdin) +hits = [i for i in issues if host in i['title'].lower()] +for h in hits: + print(f\" #{h['number']} [{h['kind']}] {h['title']}\") +" 2>/dev/null +} + +print_host() { + local file="$1" + local host pkg + host="$(basename "$file" .yaml)" + pkg="$(manifest_pkgname "$file")" + local arch="$(manifest_field "$file" arch)" + local soc="$(manifest_field "$file" soc)" + local board="$(manifest_field "$file" board)" + + printf '\n══ %s ══\n' "$host" + printf ' manifest: arch=%s soc=%s board=%s\n' "$arch" "$soc" "$board" + printf ' package: %s\n' "$pkg" + if [ -n "$pkg" ]; then + printf ' installed: %s\n' "$(probe_installed "$host" "$pkg")" + fi + + local n=0 + while IFS= read -r line; do + [ -z "$line" ] && continue + if [ $n -eq 0 ]; then printf ' open ka-issues:\n'; fi + printf '%s\n' "$line" + n=$((n+1)) + done < <(issues_for_host "$host") + [ $n -eq 0 ] && printf ' open ka-issues: (none for this host)\n' +} + +if [ $# -ge 1 ]; then + f="${fleet_dir}/${1}.yaml" + [ -r "$f" ] || { echo "no manifest for host '$1'" >&2; exit 2; } + print_host "$f" +else + printf 'kernel-agent status (repo: %s)\n' "$REPO" + printf 'open [ka:*] issues total: %s\n' "$(echo "$issues_json" | python3 -c 'import json,sys; print(len(json.load(sys.stdin)))')" + for f in "$fleet_dir"/*.yaml; do + [ -e "$f" ] || continue + print_host "$f" + done +fi diff --git a/fleet/fresnel.yaml b/fleet/fresnel.yaml index cbc53cb..c59d812 100644 --- a/fleet/fresnel.yaml +++ b/fleet/fresnel.yaml @@ -23,13 +23,17 @@ includes: - board/pinebook-pro/0001-arm64-dts-rk3399-pinebook-pro-add-OC-OPP-tables-1704-2184.patch - board/pinebook-pro/0002-arm64-dts-rk3399-pinebook-pro-enable-hdmi-sound.patch - board/pinebook-pro/0003-arm64-dts-rk3399-pinebook-pro-spi1-max-freq-10MHz.patch + # vb2_dma_resv RFC v2 series — added 2026-05-15 (Markus iterated v2 locally + # on boltzmann reaching 
pkgrel=14; pre-v2 decision was "defer". v2 attaches + # the fence at device_run in slept-OK context per Dufresne's v1 review). + - subsystem/media/videobuf2/dma-resv-release-fence/0004-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch + - subsystem/media/videobuf2/dma-resv-release-fence/0005-media-hantro-attach-dma_resv-release-fence-at-device.patch + - subsystem/media/videobuf2/dma-resv-release-fence/0006-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch # Explicitly NOT included (tracked elsewhere, decision logged): -# - subsystem/media/videobuf2/dma-resv-release-fence/ (RFC v1 rejected; -# v2 in design — see marfrit/dmabuf-modifier-triage#3. Skip until v2 lands -# or we explicitly accept v1-shape parity with ohm.) # - driver/panfrost/iommu-cache-rk3399/ (sibling kernel work; ship together -# with vb2_dma_resv when it lands.) +# once it lands. Targets the readback transitive-proof gap that vb2_dma_resv +# alone doesn't close.) config: source: /proc/config.gz on running fresnel kernel (linux-eos-arm 6.19.9-99) diff --git a/patches/subsystem/media/videobuf2/dma-resv-release-fence/0004-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch b/patches/subsystem/media/videobuf2/dma-resv-release-fence/0004-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch new file mode 100644 index 0000000..9f61650 --- /dev/null +++ b/patches/subsystem/media/videobuf2/dma-resv-release-fence/0004-media-videobuf2-add-opt-in-dma_resv-producer-fence-h.patch @@ -0,0 +1,356 @@ +From a202de1646d4c8f8ee2ebc2e4c100b621975754a Mon Sep 17 00:00:00 2001 +In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de> +References: <20260429195306.239666-1-mfritsche@reauktion.de> +From: Markus Fritsche +Date: Sat, 9 May 2026 16:16:07 +0200 +Subject: [PATCH RFC v2] media: videobuf2: add opt-in dma_resv producer fence + helper +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +V4L2 producers historically don't propagate 
buffer-state-done into +the dmabuf's dma_resv exclusive fence. Userspace consumers that +import V4L2-produced dmabufs and wait on the dmabuf's implicit-sync +fence (poll(POLLIN), DMA_BUF_IOCTL_EXPORT_SYNC_FILE, +EGL_LINUX_DMA_BUF_EXT) currently see either zero fences or a stub +fence from dma_fence_get_stub(). This is correct by accident for the +common DQBUF-then-import case but represents a contract gap that +breaks Wayland compositors importing CAPTURE buffers from a stateless +H.264 decoder under continuous playback on implicit-sync GPU stacks +(observed on RK3566 + hantro VPU + Mali-G52 panfrost; manifests as +green frames -- BT.709 limited-range YUV(0,0,0) -> RGB(0,77,0) -- when +the GPU samples the dmabuf before the producer's decode completes). + +Add an opt-in API gated by both a per-driver runtime flag +(vb2_queue::supports_release_fences) and a Kconfig +(CONFIG_VIDEOBUF2_RELEASE_FENCES, default n) that lets producers +populate a real dma_resv exclusive write fence on the dmabufs they +export. Drivers call vb2_buffer_attach_release_fence(vb) at a +finite-time-fenced point in their pipeline (typically m2m +device_run, just before the HW kick); vb2_buffer_done() signals and +puts the fence as part of its state transition. + +The publish and signal paths are wrapped in +dma_fence_begin_signalling() / dma_fence_end_signalling() so +PROVE_LOCKING can validate that nothing taken in those critical +sections deadlocks against the signal path. dma_resv_lock is +sleepable but not taken on the signal path, so taking it inside the +publish critical section is safe under lockdep. + +Skips planes whose vb2_plane.dbuf is NULL -- buffers never exported +via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) have no +dmabuf for userspace to wait on. + +Drivers that don't opt in pay nothing: the helper is a no-op stub +when CONFIG_VIDEOBUF2_RELEASE_FENCES=n, and an early-return check +of supports_release_fences when =y but the flag is unset. 
+ +Validated on RK3566 PineTab2 with PROVE_LOCKING enabled: 30s of +bbb_1080p30 H.264 stateless decode + zero-copy panfrost EGL import +via dmabuf-wayland (mpv 0.41 + KWin 6.6.4 + Mesa panfrost 26.0.5) +produces 31,816 dma_fence init/signal pairs across 5,724 vb2 buffer +cycles with zero lockdep splats from videobuf2 / dma_resv code paths. + +Subsequent patches in this series opt the hantro and rockchip-rga +drivers in. + +Cc: Daniel Vetter +Cc: Christian König +Cc: Nicolas Dufresne +Cc: Sumit Semwal +Cc: Hans Verkuil +Cc: Tomasz Figa +Cc: linux-media@vger.kernel.org +Cc: dri-devel@lists.freedesktop.org +Cc: linaro-mm-sig@lists.linaro.org +Signed-off-by: Markus Fritsche +--- + drivers/media/common/videobuf2/Kconfig | 29 ++++ + .../media/common/videobuf2/videobuf2-core.c | 135 ++++++++++++++++++ + include/media/videobuf2-core.h | 51 +++++++ + 3 files changed, 215 insertions(+) + +diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig +index d2223a12c..bbfa26984 100644 +--- a/drivers/media/common/videobuf2/Kconfig ++++ b/drivers/media/common/videobuf2/Kconfig +@@ -30,3 +30,32 @@ config VIDEOBUF2_DMA_SG + config VIDEOBUF2_DVB + tristate + select VIDEOBUF2_CORE ++ ++config VIDEOBUF2_RELEASE_FENCES ++ bool "videobuf2: opt-in dma_resv producer fences for V4L2 dmabuf exports" ++ depends on VIDEOBUF2_CORE ++ depends on DMA_SHARED_BUFFER ++ default n ++ help ++ Enables an opt-in API that lets vb2 producers populate a dma_resv ++ exclusive write fence on the dmabufs they export to userspace. ++ The fence is signalled when the buffer transitions to ++ VB2_BUF_STATE_DONE. ++ ++ This gives userspace consumers that import V4L2-produced dmabufs ++ and wait on the dmabuf's implicit-sync fence (poll(POLLIN), ++ DMA_BUF_IOCTL_EXPORT_SYNC_FILE, EGL_LINUX_DMA_BUF_EXT) a real ++ producer fence to wait on, instead of a stub fence from ++ dma_fence_get_stub() that the dma_buf core substitutes when ++ dma_resv is empty. 
++ ++ Drivers individually opt in by setting ++ vb2_queue::supports_release_fences = true and calling ++ vb2_buffer_attach_release_fence() at the right point in their ++ pipeline (typically m2m device_run, just before HW kick). ++ ++ Distributors leave this off unless targeting Wayland/EGL ++ consumers of V4L2 stateless decoder output on ++ implicit-sync-only GPU stacks (e.g. mainline panfrost). ++ ++ If unsure, say N. +diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c +index adf668b21..85d7fddbd 100644 +--- a/drivers/media/common/videobuf2/videobuf2-core.c ++++ b/drivers/media/common/videobuf2/videobuf2-core.c +@@ -26,6 +26,12 @@ + #include + #include + ++#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES ++#include ++#include ++#include ++#endif ++ + #include + #include + +@@ -1173,6 +1179,120 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no) + } + EXPORT_SYMBOL_GPL(vb2_plane_cookie); + ++#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES ++/* ++ * dma_resv release-fence integration. ++ * ++ * Optional, opt-in path that lets producers publish a real ++ * dma_fence on their CAPTURE-side dmabufs so userspace consumers ++ * (compositors, EGL importers) get spec-clean implicit-sync ++ * semantics instead of the dma_buf core's stub fence. Drivers ++ * call vb2_buffer_attach_release_fence() at a finite-time-fenced ++ * point (typically m2m device_run) and the fence is signalled by ++ * vb2_buffer_done(). Gated at runtime by ++ * vb2_queue::supports_release_fences and at compile time by ++ * CONFIG_VIDEOBUF2_RELEASE_FENCES. 
++ */ ++ ++static const char *vb2_dma_resv_get_driver_name(struct dma_fence *fence) ++{ ++ return "videobuf2"; ++} ++ ++static const char *vb2_dma_resv_get_timeline_name(struct dma_fence *fence) ++{ ++ return "vb2-release-fence"; ++} ++ ++static const struct dma_fence_ops vb2_dma_resv_fence_ops = { ++ .get_driver_name = vb2_dma_resv_get_driver_name, ++ .get_timeline_name = vb2_dma_resv_get_timeline_name, ++}; ++ ++int vb2_buffer_attach_release_fence(struct vb2_buffer *vb) ++{ ++ struct vb2_queue *q = vb->vb2_queue; ++ struct dma_fence *fence; ++ unsigned int plane; ++ bool cookie; ++ ++ if (!q->supports_release_fences) ++ return 0; ++ ++ if (WARN_ON(vb->release_fence)) ++ return -EINVAL; ++ ++ fence = kzalloc(sizeof(*fence), GFP_KERNEL); ++ if (!fence) ++ return -ENOMEM; ++ ++ dma_fence_init(fence, &vb2_dma_resv_fence_ops, &q->dma_resv_fence_lock, ++ q->dma_resv_fence_context, ++ atomic64_inc_return(&q->dma_resv_fence_seqno)); ++ ++ /* ++ * Annotate the publish-side critical section. Per ++ * Documentation/driver-api/dma-buf.rst, lockdep validates ++ * that nothing taken in this region can deadlock against ++ * the signal path in vb2_buffer_signal_release_fence(). ++ * dma_resv_lock is sleepable but is not taken on the signal ++ * path, so taking it inside the critical section is safe. ++ */ ++ cookie = dma_fence_begin_signalling(); ++ for (plane = 0; plane < vb->num_planes; plane++) { ++ struct dma_buf *dbuf = vb->planes[plane].dbuf; ++ ++ if (!dbuf) ++ continue; ++ ++ dma_resv_lock(dbuf->resv, NULL); ++ dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE); ++ dma_resv_unlock(dbuf->resv); ++ } ++ dma_fence_end_signalling(cookie); ++ ++ /* One reference for the eventual signal in vb2_buffer_done. */ ++ vb->release_fence = dma_fence_get(fence); ++ ++ /* The dma_resv held its own reference per plane. Drop ours. 
*/ ++ dma_fence_put(fence); ++ ++ return 0; ++} ++EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence); ++ ++static void vb2_buffer_signal_release_fence(struct vb2_buffer *vb, ++ enum vb2_buffer_state state) ++{ ++ struct dma_fence *fence = vb->release_fence; ++ bool cookie; ++ ++ if (!fence) ++ return; ++ ++ cookie = dma_fence_begin_signalling(); ++ if (state == VB2_BUF_STATE_ERROR) ++ dma_fence_set_error(fence, -EIO); ++ dma_fence_signal(fence); ++ dma_fence_end_signalling(cookie); ++ ++ dma_fence_put(fence); ++ vb->release_fence = NULL; ++} ++#else /* !CONFIG_VIDEOBUF2_RELEASE_FENCES */ ++ ++int vb2_buffer_attach_release_fence(struct vb2_buffer *vb) ++{ ++ return 0; ++} ++EXPORT_SYMBOL_GPL(vb2_buffer_attach_release_fence); ++ ++static inline void vb2_buffer_signal_release_fence(struct vb2_buffer *vb, ++ enum vb2_buffer_state state) ++{ ++} ++#endif /* CONFIG_VIDEOBUF2_RELEASE_FENCES */ ++ + void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) + { + struct vb2_queue *q = vb->vb2_queue; +@@ -1199,6 +1319,9 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) + if (state != VB2_BUF_STATE_QUEUED) + __vb2_buf_mem_finish(vb); + ++ if (state != VB2_BUF_STATE_QUEUED) ++ vb2_buffer_signal_release_fence(vb, state); ++ + spin_lock_irqsave(&q->done_lock, flags); + if (state == VB2_BUF_STATE_QUEUED) { + vb->state = VB2_BUF_STATE_QUEUED; +@@ -2651,6 +2774,18 @@ int vb2_core_queue_init(struct vb2_queue *q) + mutex_init(&q->mmap_lock); + init_waitqueue_head(&q->done_wq); + ++#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES ++ /* ++ * Per-queue dma_resv release-fence context. Drivers that ++ * opt in via supports_release_fences and call ++ * vb2_buffer_attach_release_fence() use these to allocate ++ * fences on a single per-queue timeline. 
++ */ ++ q->dma_resv_fence_context = dma_fence_context_alloc(1); ++ atomic64_set(&q->dma_resv_fence_seqno, 0); ++ spin_lock_init(&q->dma_resv_fence_lock); ++#endif ++ + q->memory = VB2_MEMORY_UNKNOWN; + + if (q->buf_struct_size == 0) +diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h +index 4424d481d..766ff2194 100644 +--- a/include/media/videobuf2-core.h ++++ b/include/media/videobuf2-core.h +@@ -288,6 +288,16 @@ struct vb2_buffer { + unsigned int skip_cache_sync_on_finish:1; + + struct vb2_plane planes[VB2_MAX_PLANES]; ++#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES ++ /* ++ * Producer release fence published on each plane's ++ * dmabuf->resv when the driver opts in via ++ * vb2_buffer_attach_release_fence(). Signalled and put by ++ * vb2_buffer_done() on transition to DONE/ERROR. NULL when ++ * the driver did not opt in for this buffer. ++ */ ++ struct dma_fence *release_fence; ++#endif + struct list_head queued_entry; + struct list_head done_entry; + #ifdef CONFIG_VIDEO_ADV_DEBUG +@@ -648,6 +658,19 @@ struct vb2_queue { + spinlock_t done_lock; + wait_queue_head_t done_wq; + ++#ifdef CONFIG_VIDEOBUF2_RELEASE_FENCES ++ /* ++ * dma_resv release-fence context. Drivers that set ++ * supports_release_fences and call ++ * vb2_buffer_attach_release_fence() use these to allocate ++ * fences on a per-queue timeline. ++ */ ++ u64 dma_resv_fence_context; ++ atomic64_t dma_resv_fence_seqno; ++ spinlock_t dma_resv_fence_lock; ++#endif ++ ++ unsigned int supports_release_fences:1; + unsigned int streaming:1; + unsigned int start_streaming_called:1; + unsigned int error:1; +@@ -735,6 +758,34 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no); + */ + void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state); + ++/** ++ * vb2_buffer_attach_release_fence() - opt-in dma_resv release fence. ++ * @vb: the buffer being committed to the producer. 
++ * ++ * Drivers that have set vb2_queue::supports_release_fences may call ++ * this from any sleepable context where they have committed to ++ * running the operation in finite time -- typically m2m ++ * device_run(), just before the HW kick. The helper allocates a ++ * dma_fence on the queue's per-queue timeline, attaches it as ++ * DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv, and stashes ++ * it in vb->release_fence. vb2_buffer_done() signals and puts the ++ * fence as part of the buffer's state transition. ++ * ++ * Skips planes whose vb2_plane.dbuf is NULL -- buffers never ++ * exported via VIDIOC_EXPBUF (or imported via V4L2_MEMORY_DMABUF) ++ * have no dmabuf for userspace to wait on. ++ * ++ * No-op when vb2_queue::supports_release_fences is not set ++ * (regardless of CONFIG_VIDEOBUF2_RELEASE_FENCES). When ++ * CONFIG_VIDEOBUF2_RELEASE_FENCES=n, this is a stub that returns 0. ++ * ++ * Returns 0 on success or when the no-op stub is in effect, ++ * negative errno on allocation failure when fence publishing was ++ * attempted. Best-effort: drivers should ignore the return value ++ * unless they want diagnostics. ++ */ ++int vb2_buffer_attach_release_fence(struct vb2_buffer *vb); ++ + /** + * vb2_discard_done() - discard all buffers marked as DONE. + * @q: pointer to &struct vb2_queue with videobuf2 queue. 
+-- +2.53.0 + diff --git a/patches/subsystem/media/videobuf2/dma-resv-release-fence/0005-media-hantro-attach-dma_resv-release-fence-at-device.patch b/patches/subsystem/media/videobuf2/dma-resv-release-fence/0005-media-hantro-attach-dma_resv-release-fence-at-device.patch new file mode 100644 index 0000000..0d88f78 --- /dev/null +++ b/patches/subsystem/media/videobuf2/dma-resv-release-fence/0005-media-hantro-attach-dma_resv-release-fence-at-device.patch @@ -0,0 +1,95 @@ +From 1844c263bde8dd244d7db46f8c508e7c70da459c Mon Sep 17 00:00:00 2001 +In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de> +References: <20260429195306.239666-1-mfritsche@reauktion.de> +From: Markus Fritsche +Date: Sat, 9 May 2026 16:24:01 +0200 +Subject: [PATCH RFC v2] media: hantro: attach dma_resv release fence at + device_run + +Opt the hantro driver into the new vb2 release-fence helper so its +CAPTURE-side dmabufs carry a real producer fence that wayland +compositors and other implicit-sync consumers can wait on, instead +of the dma_buf core's stub fence. + +Attach point is m2m device_run, immediately after +v4l2_m2m_buf_copy_metadata() and before ctx->codec_ops->run(). +Per Nicolas Dufresne's v1 review (lore.kernel.org/linux-media/ +3d8deeb15581b754e4c061d4c4a13657aa08bc3c.camel@ndufresne.ca/), +this satisfies the dma_fence finite-time contract: the m2m core +has committed to running the job by this point, codec_ops->run +either kicks the HW (decode-complete signals the fence via +vb2_buffer_done) or fails immediately (job_finish with +VB2_BUF_STATE_ERROR signals with -EIO). PM and clocks are already +up by this point, so no allocation context restrictions. + +The CAPTURE queue is opted in with supports_release_fences=true at +queue_init. 
+ +Userspace consumers that import hantro CAPTURE dmabufs and wait on +their implicit-sync fence (Wayland zwp_linux_dmabuf_v1 + +panfrost EGL_LINUX_DMA_BUF_EXT) now wait on a real fence +representing the producer's actual completion, fixing green-frame +corruption observed on RK3566 PineTab2 + Mali-G52 panfrost (the +GPU was sampling zero pages because the dmabuf's implicit fence +was the dma_buf core's pre-signalled stub). + +Validated end-to-end on PineTab2 (RK3566 / hantro G1 / Mali-G52 +mainline panfrost): 30s of bbb_1080p30 H.264 stateless decode + +zero-copy panfrost EGL import via dmabuf-wayland (mpv 0.41 + +KWin 6.6.4 + Mesa panfrost 26.0.5) renders correctly with no +green-frame corruption and no PROVE_LOCKING splats. + +Cc: Ezequiel Garcia +Cc: Philipp Zabel +Cc: Nicolas Dufresne +Cc: linux-media@vger.kernel.org +Cc: linux-rockchip@lists.infradead.org +Signed-off-by: Markus Fritsche +--- + .../media/platform/verisilicon/hantro_drv.c | 23 +++++++++++++++++++ + 1 file changed, 23 insertions(+) + +diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c +index 2e81877f6..6a66c47ed 100644 +--- a/drivers/media/platform/verisilicon/hantro_drv.c ++++ b/drivers/media/platform/verisilicon/hantro_drv.c +@@ -186,6 +186,22 @@ static void device_run(void *priv) + + v4l2_m2m_buf_copy_metadata(src, dst); + ++ /* ++ * Attach a producer fence on the CAPTURE-side dmabuf so userspace ++ * importers (e.g. Wayland compositors) get spec-clean implicit-sync ++ * semantics. Called from device_run rather than buf_queue: the ++ * dma_fence finite-time contract requires that once a fence is ++ * published, the producer must signal it in finite time. 
By the ++ * time we reach device_run, the m2m core has committed to running ++ * this job, and the next hop (codec_ops->run) either kicks the HW ++ * (decode-complete signals the fence via vb2_buffer_done) or ++ * fails immediately (job_finish with VB2_BUF_STATE_ERROR signals ++ * the fence with -EIO). Either path resolves the fence in finite ++ * time. Best-effort: a NOMEM here means we lose implicit-sync ++ * precision for this frame, no functional regression. ++ */ ++ (void)vb2_buffer_attach_release_fence(&dst->vb2_buf); ++ + if (ctx->codec_ops->run(ctx)) + goto err_cancel_job; + +@@ -249,6 +265,13 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq) + dst_vq->lock = &ctx->dev->vpu_mutex; + dst_vq->dev = ctx->dev->v4l2_dev.dev; + ++ /* ++ * Opt the CAPTURE queue into vb2 release-fence publishing. ++ * No-op unless CONFIG_VIDEOBUF2_RELEASE_FENCES=y; runtime cost ++ * is one extra fence allocation + dma_resv update per device_run. ++ */ ++ dst_vq->supports_release_fences = true; ++ + return vb2_queue_init(dst_vq); + } + +-- +2.53.0 + diff --git a/patches/subsystem/media/videobuf2/dma-resv-release-fence/0006-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch b/patches/subsystem/media/videobuf2/dma-resv-release-fence/0006-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch new file mode 100644 index 0000000..324fe07 --- /dev/null +++ b/patches/subsystem/media/videobuf2/dma-resv-release-fence/0006-media-rockchip-rga-attach-dma_resv-release-fence-at-.patch @@ -0,0 +1,117 @@ +From 2c63a63bf65739763051dc4ce7ce2ffaf2d514c4 Mon Sep 17 00:00:00 2001 +In-Reply-To: <20260429195306.239666-1-mfritsche@reauktion.de> +References: <20260429195306.239666-1-mfritsche@reauktion.de> +From: Markus Fritsche +Date: Sat, 9 May 2026 16:50:51 +0200 +Subject: [PATCH RFC v2] media: rockchip-rga: attach dma_resv release fence at + device_run +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Opt the 
rockchip-rga driver into the new vb2 release-fence helper. + +Same shape as the hantro patch: attach a producer fence on the +CAPTURE-side dmabuf at m2m device_run, signalled by +vb2_buffer_done() when RGA completes the m2m operation. + +Differs from hantro in one mechanical detail: rga's device_run +wraps the entire body in spin_lock_irqsave(&rga->ctrl_lock). Our +helper calls dma_resv_lock(), which is sleepable, so the +buffer-fetch + fence-attach sequence has to run above the spinlock. +Restructure device_run so: + + - v4l2_m2m_next_src_buf / next_dst_buf, + - src->sequence increment, + - vb2_buffer_attach_release_fence(&dst->vb2_buf) + +run before spin_lock_irqsave; only the rga->curr assignment and +rga_hw_start() (the actual HW kick) remain inside the spinlock. + +This is safe under the m2m-job ownership model: by the time +device_run is called, the m2m core has selected this context and +serializes one device_run per context, so v4l2_m2m_next_*_buf +returns stable pointers until the corresponding *_buf_remove in +rga_isr. ctrl_lock was previously protecting per-device state +(rga->curr) and the HW register access, neither of which depends on +the buffer-fetch happening inside the lock. + +The CAPTURE queue is opted in with supports_release_fences=true at +queue_init. + +Userspace consumers of RGA-produced dmabufs (image-processing +pipelines, screen-rotation servers, gstreamer flows on Rockchip +boards) get spec-clean implicit-sync semantics, matching what +hantro does in the previous patch in this series. + +Sven Püschel's ongoing "media: platform: rga: Add RGA3 support" +v5 series (linux-rockchip 2026-04-28) restructures rga.c +substantially. If that lands first, the device_run restructure +here will need a rebase against the new shape; the locking story +itself is invariant. 
+
+Cc: Jacob Chen
+Cc: Ezequiel Garcia
+Cc: Sven Püschel
+Cc: Heiko Stuebner
+Cc: Hans Verkuil
+Cc: linux-media@vger.kernel.org
+Cc: linux-rockchip@lists.infradead.org
+Signed-off-by: Markus Fritsche
+---
+ drivers/media/platform/rockchip/rga/rga.c | 27 +++++++++++++++++++----
+ 1 file changed, 23 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c
+index fea63b94c..03030c7ea 100644
+--- a/drivers/media/platform/rockchip/rga/rga.c
++++ b/drivers/media/platform/rockchip/rga/rga.c
+@@ -38,15 +38,28 @@ static void device_run(void *prv)
+ 	struct vb2_v4l2_buffer *src, *dst;
+ 	unsigned long flags;
+ 
+-	spin_lock_irqsave(&rga->ctrl_lock, flags);
+-
+-	rga->curr = ctx;
+-
++	/*
++	 * Fetch the next-job buffers and (best-effort) attach a producer
++	 * fence on CAPTURE before taking ctrl_lock below.
++	 * vb2_buffer_attach_release_fence() takes dma_resv_lock, which is
++	 * sleepable; ctrl_lock is taken with spin_lock_irqsave so any
++	 * sleepable call must happen above it. Buffer ownership is
++	 * already committed at this point: the m2m core has selected
++	 * this context for device_run and serializes one device_run per
++	 * context, so v4l2_m2m_next_*_buf returns stable pointers until
++	 * the corresponding *_buf_remove in rga_isr.
++	 */
+ 	src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
+ 	src->sequence = ctx->osequence++;
+ 
+ 	dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+ 
++	(void)vb2_buffer_attach_release_fence(&dst->vb2_buf);
++
++	spin_lock_irqsave(&rga->ctrl_lock, flags);
++
++	rga->curr = ctx;
++
+ 	rga_hw_start(rga, vb_to_rga(src), vb_to_rga(dst));
+ 
+ 	spin_unlock_irqrestore(&rga->ctrl_lock, flags);
+@@ -123,6 +136,12 @@ queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq)
+ 	dst_vq->lock = &ctx->rga->mutex;
+ 	dst_vq->dev = ctx->rga->v4l2_dev.dev;
+ 
++	/*
++	 * Opt the CAPTURE queue into vb2 release-fence publishing.
++	 * Compile-time gated by CONFIG_VIDEOBUF2_RELEASE_FENCES.
++	 */
++	dst_vq->supports_release_fences = true;
++
+ 	return vb2_queue_init(dst_vq);
+ }
+ 
+-- 
+2.53.0
+
diff --git a/patches/subsystem/media/videobuf2/dma-resv-release-fence/README.md b/patches/subsystem/media/videobuf2/dma-resv-release-fence/README.md
new file mode 100644
index 0000000..16ce5bb
--- /dev/null
+++ b/patches/subsystem/media/videobuf2/dma-resv-release-fence/README.md
@@ -0,0 +1,61 @@
+# vb2_dma_resv release-fence — RFC v2 series
+
+Three-patch series that opts V4L2 m2m drivers into attaching real
+producer fences to CAPTURE-side dmabufs, so implicit-sync GPU
+consumers (Wayland / panfrost / panthor) wait correctly on the
+producer's decode-completion rather than seeing the dma_buf core's
+stub fence.
+
+| Patch | Subject |
+|---|---|
+| `0004` | media: videobuf2: add opt-in dma_resv producer fence helper |
+| `0005` | media: hantro: attach dma_resv release fence at device_run |
+| `0006` | media: rockchip-rga: attach dma_resv release fence at device_run |
+
+Numbered with the leading 4/5/6 because the fresnel build series carries
+these alongside the 3 board-scoped `0001/0002/0003` Pinebook Pro DTS
+patches; the numbers reflect apply-order in the PKGBUILD, not the
+upstream lore series ordering (which starts at 1/4..4/4 with a cover).
+
+## Status
+
+**RFC v2.** Iterated on `lore.kernel.org/linux-media` after v1 was
+rejected over the dma_fence finite-time contract gap and bus-locked
+allocation issues. v2 attaches the fence at `device_run` instead of
+QBUF, which puts allocation in slept-OK context (PM and clocks up,
+job committed) — per Nicolas Dufresne's v1 review feedback.
+
+Cover-letter reference: `marfrit/dmabuf-modifier-triage#3` (campaign
+session that owns the upstream-targeting work; this directory ships
+the build-tree-ready form for kernel-agent fleet consumption).
+
+## Scope
+
+`subsystem/media/videobuf2/` for the helper (0004), with two
+driver opt-ins (0005/0006) shipped together because hantro and
+rockchip-rga both need the helper to be useful on the RK35xx fleet.
+Splitting 0005/0006 into `driver/hantro/` and `driver/rockchip-rga/`
+was considered but rejected: they're a single contract series, and
+their apply-order matters (0004 must precede). Series-as-unit beats
+per-driver promote eligibility here.
+
+## Fleet eligibility
+
+- **fresnel** (RK3399 + hantro + Mali-T860): eligible. Carried in
+  `fleet/fresnel.yaml` since 2026-05-15 (decision flipped from "defer
+  to v2" to "include — v2 in this tree").
+- **ohm** (RK3566 + hantro + Mali-G52): eligible. Was the original
+  reproducer for the green-frames symptom. Will be carried in
+  `fleet/ohm.yaml` once that manifest lands.
+- **ampere** (RK3588 + hantro for some codecs): eligibility deferred —
+  RK3588 uses rkvdec2 for primary decode, hantro role is narrower.
+  Re-assess when `fleet/ampere.yaml` lands per issue #6.
+- **boltzmann** (RK3588): same as ampere — defer.
+
+## Upstream targeting
+
+Not yet posted to linux-media in v2 form. Per `feedback_no_upstream.md`
+the default is "build-tree only, wait for explicit ask". When/if the
+upstream submission happens, this directory's `0004/0005/0006` are the
+canonical source — they include the v2 commit headers (`PATCH RFC v2`,
+`In-Reply-To` chain to the v1 cover-letter Message-Id).
-- 
2.47.3