Source-read complete: 3 RFC v2 patches dissected, v7.0 rkvdec_buf_queue site identified at line 954 of drivers/media/platform/rockchip/rkvdec/rkvdec.c, empirical disproof of Bug 3 UAPI drift via byte-identical v6.12↔v7.0 struct diff, hantro_v4l2.c confirmed unchanged across the same range. Rebase risk concentrated in videobuf2-core.c (medium — vb2 core sees regular activity); deferred to Phase 4 when boltzmann is reachable for the git apply --3way verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 5 — Phase 2 (situation analysis)
Captured on the evening of 2026-05-10 and finished on resume, 2026-05-11. Closes Phase 2 of iter5 per feedback_dev_process.md: source-read of the kernel paths Bug 2 touches, plus contract-before-patch citation of every site iter5 modifies. The major mid-Phase finding (Bug 3 collapses) was already folded back into Phase 0 (phase0_findings_iter5.md amendment, commit 31b9255).
Bug 2 — vb2_dma_resv blocker (real, RFC v2 ready)
Root-cause framing
Per memory reference_dmabuf_resv_blocker.md: V4L2 producers don't propagate VB2_BUF_STATE_DONE into the dmabuf's dma_resv write fence (DMA_RESV_USAGE_WRITE, the successor to the old exclusive-fence slot). Userspace consumers include the libva backend cap_pool readback path, Wayland compositors, and others. When they import a V4L2-produced dmabuf and wait on the implicit-sync fence (poll(POLLIN) / DMA_BUF_IOCTL_EXPORT_SYNC_FILE), they find nothing to wait for. poll() reports readable immediately, and the exported sync_file wraps the already-signalled stub fence from dma_fence_get_stub().
For our cap_pool readback path, the practical result is this: the libva backend reads the dmabuf-backed CAPTURE buffer before the decoder IRQ has signalled completion. It gets back the page contents as they were at QBUF time (the cap_pool init pattern RGB(0, 0x4c, 0)) and ships that to ffmpeg-vaapi-hwdownload. The kernel decoded the frame correctly; the userspace consumer simply read the page before the decoder had written it.
Why it surfaces on linux-fresnel-fourier 7.0-1 and not on linux-eos-arm 6.19.9-99: it is not a regression. The same bug existed on 6.19; iter3 hit it on hantro (memory reference_dmabuf_resv_blocker.md documents the symptom as all-zero pages on RK3399 hantro CAPTURE). The shift from "iter3 saw it on hantro only" to "iter4 sees it across all 4 codecs on the new kernel" is most likely timing. The new kernel's cap_pool allocation / buffer-handoff path runs with different timing than 6.19's (direction unmeasured). The userspace race window that was intermittently open on rkvdec at 6.19 is now consistently open at 7.0. iter3 deferred this for hantro; iter4 surfaced it for rkvdec on the new substrate; iter5 fixes it for both blocks at the kernel layer.
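The sketch below shows the wait a dmabuf importer performs before reading, which is the consumer side of the contract. It is a minimal C example and assumes the consumer holds the CAPTURE plane's dmabuf fd. poll(POLLIN) on a dmabuf fd waits for the write fences in its dma_resv. With no fence attached it reports readable at once, and that is exactly the Bug 2 window. The helper name dmabuf_wait_write_fences is hypothetical and is not taken from the libva backend.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until the producer's write fence(s) on a
 * dmabuf have signalled, or until timeout_ms elapses.
 *
 * Returns 1 when the fd reports readable (write fences signalled, or
 * none attached), 0 on timeout, -1 on error (errno set).
 *
 * Bug 2 caveat: if the producer never attached a fence, poll() reports
 * readable immediately and the caller goes on to read stale pages.
 */
static int dmabuf_wait_write_fences(int dmabuf_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLIN };
	int ret;

	assert(dmabuf_fd >= 0);

	/* Simplification: on EINTR, retry with the full timeout. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	if (pfd.revents & (POLLERR | POLLNVAL)) {
		errno = EIO;
		return -1;
	}
	return (pfd.revents & POLLIN) ? 1 : 0;
}
```

Any pollable fd, such as a pipe, exercises the same code path for a smoke test. Only a real dmabuf fd exercises the fence semantics: before the fix the call returns 1 immediately, and after it the call blocks until the decoder's completion fence signals.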
RFC v2 patch series (source: ~/src/linux-rfc/ branch vb2-dma-resv-rfc)
Three operator-authored patches on top of v6.12:
Patch 1/3 — fbe8bf57a media: videobuf2: add dma_resv release-fence helper
drivers/media/common/videobuf2/videobuf2-core.c | +99
include/media/videobuf2-core.h | +19
Adds the opt-in API. Key surface:
- int vb2_buffer_attach_release_fence(struct vb2_buffer *vb): the driver calls this from buf_queue. It allocates a dma_fence on the queue's per-queue fence context (set up at vb2_core_queue_init) and attaches it as DMA_RESV_USAGE_WRITE on each plane's dmabuf->resv. It then stashes the fence in vb->release_fence. Planes whose vb2_plane.dbuf is NULL are skipped.
- vb2_buffer_signal_release_fence(vb, state): internal helper called from vb2_buffer_done() on the state transition. Signals and puts the fence. No-op when vb->release_fence is NULL (drivers that didn't opt in).
- New struct vb2_queue fields: u64 dma_resv_fence_context, atomic64_t dma_resv_fence_seqno, spinlock_t dma_resv_fence_lock.
- New struct vb2_buffer field: struct dma_fence *release_fence.
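This is the only non-trivial patch in the series. It adds ~120 lines of new code in vb2 core (+99 in videobuf2-core.c, +19 in videobuf2-core.h). Drivers that don't opt in pay nothing beyond a few extra struct fields and the NULL check on vb->release_fence in the done path.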
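The lifecycle described above can be modelled in userspace to check the bookkeeping. The model covers these points:
- one fence context per queue
- a monotonically increasing seqno
- a fence reference per dmabuf-backed plane
- a signal-then-put on completion that is a no-op for buffers that never opted in

This is a sketch under stated assumptions. The model_* names are hypothetical, and dma_resv is reduced to a single write-fence slot. The reservation-object locking a kernel implementation needs is omitted. It is not the RFC v2 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_PLANES 4

struct model_fence {
	uint64_t context;
	uint64_t seqno;
	int refcount;
	bool signaled;
};

/* Stand-in for a dmabuf's dma_resv: keeps only the latest WRITE fence. */
struct model_resv {
	struct model_fence *write_fence;
};

struct model_queue {
	uint64_t fence_context;           /* ~ dma_resv_fence_context */
	atomic_uint_fast64_t fence_seqno; /* ~ dma_resv_fence_seqno */
};

struct model_buffer {
	struct model_queue *q;
	unsigned int num_planes;
	struct model_resv *plane_resv[MODEL_MAX_PLANES]; /* NULL: no dbuf */
	struct model_fence *release_fence;
};

static void model_fence_put(struct model_fence *f)
{
	if (f && --f->refcount == 0)
		free(f);
}

/* ~ per-queue fence context set up at queue init. */
static void model_queue_init(struct model_queue *q, uint64_t context)
{
	q->fence_context = context;
	atomic_init(&q->fence_seqno, 0);
}

/* ~ attach at buf_queue: one fence, referenced by buffer and each plane. */
static int model_attach_release_fence(struct model_buffer *b)
{
	struct model_fence *f = calloc(1, sizeof(*f));

	if (!f)
		return -ENOMEM; /* best-effort: the buffer stays usable */
	f->context = b->q->fence_context;
	f->seqno = atomic_fetch_add(&b->q->fence_seqno, 1) + 1;
	f->refcount = 1; /* reference held by the buffer */

	for (unsigned int i = 0; i < b->num_planes; i++) {
		struct model_resv *r = b->plane_resv[i];

		if (!r)
			continue; /* plane not dmabuf-backed: skip */
		model_fence_put(r->write_fence); /* replace previous fence */
		f->refcount++;
		r->write_fence = f;
	}
	b->release_fence = f;
	return 0;
}

/* ~ signal on buffer done; no-op for buffers that never opted in. */
static void model_signal_release_fence(struct model_buffer *b)
{
	if (!b->release_fence)
		return;
	b->release_fence->signaled = true;
	model_fence_put(b->release_fence);
	b->release_fence = NULL;
}
```

Keeping the plane references separate from the buffer's own reference mirrors why the fence outlives the buffer's completion. An importer can still observe the signalled fence through the reservation object after vb2 has dropped its reference.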
Patch 2/3 — 14a68fcf0 media: hantro: attach dma_resv release fence at buf_queue
drivers/media/platform/verisilicon/hantro_v4l2.c | +12
The driver-side opt-in is one line of code plus a 10-line comment block:
```diff
 static void hantro_buf_queue(struct vb2_buffer *vb)
 {
 	...
 	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+	/*
+	 * Opt in to vb2's dma_resv release-fence path. [...]
+	 */
+	(void)vb2_buffer_attach_release_fence(vb);
 }
```
The operator's commit message reports empirical validation on PineTab2 (RK3566 hantro) running mainline 6.19 with this series backported. There, KWin's Transaction::watchDmaBuf wait completes correctly the moment hantro's IRQ fires.
Patch 3/3 — 89b699508 media: rockchip-rga: attach dma_resv release fence at buf_queue
drivers/media/platform/rockchip/rga/rga-buf.c | +10
Same shape as the hantro patch. It is out of scope for iter5's libva path (we don't use RGA), but it is kept in the kernel-agent local-carry as part of the cohesive series. RGA is referenced by GStreamer flows on Rockchip boards, and the operator's intent (per the RFC commit message) is to land all three V4L2 producers together.
Gap — no rkvdec consumer patch
The series ships hantro + rga but not rkvdec. iter4 Phase 7 verified Bug 2 hits rkvdec too on the new substrate (constant 0x4c for H.264 inter + HEVC + VP9 cap_pool reads). iter5 contributes the missing 4th patch.
Patch 4/4 — media: rkvdec: attach dma_resv release fence at buf_queue (NEW, iter5 work)
Target file: drivers/media/platform/rockchip/rkvdec/rkvdec.c at v7.0 (post-staging-promotion path; was drivers/staging/media/rkvdec/ in earlier kernels).
Target function: rkvdec_buf_queue, at line 954 of rkvdec.c as of commit 028ef9c96e96 (Linux 7.0):
```c
static void rkvdec_buf_queue(struct vb2_buffer *vb)
{
	struct rkvdec_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
}
```
Patch shape (mechanical, same as hantro patch):
```diff
 static void rkvdec_buf_queue(struct vb2_buffer *vb)
 {
 	struct rkvdec_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
 	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+	/*
+	 * Opt in to vb2's dma_resv release-fence path. Userspace
+	 * consumers that imported this buffer's dmabuf and wait on
+	 * its implicit-sync fence get a real producer fence
+	 * representing rkvdec's completion, instead of the stub
+	 * fence dma_buf_export_sync_file substitutes when dma_resv
+	 * is empty. Best-effort: a fence-allocation failure means we
+	 * lose implicit-sync precision, no functional regression.
+	 */
+	(void)vb2_buffer_attach_release_fence(vb);
 }
```
The author trailer must follow the attribution discipline in memory feedback_gitea_as_claude_noether.md. This is Claude-authored work, signed as claude-noether <sentinel-email>, with a Co-Authored-By: trailer for the operator if iter5 is reviewed via PR flow. Local-carry-only is acceptable per the Phase 0 lock.
Rebase risk — v6.12 base → v7.0 base
The 3 existing RFC v2 patches were authored against v6.12. The kernel-agent product baseline is v7.0 (per fleet/fresnel.yaml). Risk surface:
| File | v6.12 → v7.0 delta | Rebase risk |
|---|---|---|
| drivers/media/common/videobuf2/videobuf2-core.c | Not measured (boltzmann offline). Expect a non-zero delta; vb2 core sees regular activity. | MEDIUM: the helper patch adds includes and extends vb2_buffer_done and vb2_core_queue_init, so conflicts are possible. Phase 4 task: run git apply --3way against v7.0 and resolve. |
| include/media/videobuf2-core.h | Not measured. | LOW: header changes are typically less churn-prone. |
| drivers/media/platform/verisilicon/hantro_v4l2.c | Confirmed unchanged v6.12 → v7.0 (boltzmann diff stat showed 0 lines in hantro_v4l2.c). | LOW: patch should apply cleanly. |
| drivers/media/platform/rockchip/rga/rga-buf.c | Not measured. | LOW: rga sees less churn than vb2 core. |
| drivers/media/platform/rockchip/rkvdec/rkvdec.c | Not applicable: iter5 authors this patch fresh against v7.0. | N/A |
Boltzmann reconnection needed for Phase 4 final rebase verification. Not blocking Phase 2 close.
v4l2_m2m / v4l2-mem2mem rebase note
The hantro and rga patches both insert their opt-in call right after v4l2_m2m_buf_queue(), and the rkvdec patch follows the same shape. Each hunk therefore uses that call and the surrounding buf_queue body as context. If v4l2_m2m_buf_queue() or the buf_queue callback signature changed between v6.12 and v7.0, the patches need updating. Not measured; Phase 4 task.
Bug 3 — collapsed (UAPI drift hypothesis was wrong)
Empirical disproof of "UAPI drift" hypothesis
iter4 Phase 7 doc speculated:
> Hantro "Unable to set control(s)" errors: a kernel-side rejection on hantro for MPEG-2/VP8. Substrate change appears to have shifted hantro's expected control structure or fields; iter1 (MPEG-2) and iter3 (VP8) were tested on 6.19.9 — UAPI likely drifted between 6.19.9 and 7.0 the same way VP9 did.
Empirical struct-by-struct check 2026-05-10:
```sh
ssh boltzmann 'cd ~/src/linux-rockchip &&
for ref in v6.12 028ef9c96e96; do
  echo "===$ref==="
  git show $ref:include/uapi/linux/v4l2-controls.h | awk \
    "/^struct v4l2_ctrl_(mpeg2_(sequence|picture|quantisation)|vp8_frame) {/{f=1; print; next} f{print; if(\$0~/^};/) f=0}"
done'
```
Result: byte-identical struct definitions across v6.12 and v7.0 for:
- struct v4l2_ctrl_mpeg2_sequence (8 fields)
- struct v4l2_ctrl_mpeg2_picture (8 fields)
- struct v4l2_ctrl_mpeg2_quantisation (4 fields)
- struct v4l2_ctrl_vp8_frame (30 fields)
In addition, the drivers/media/v4l2-core/v4l2-ctrls-defs.c delta over the same range was 15 lines, all additions for unrelated controls (FLASH duration, HEVC EXT_SPS_*_RPS, AV1).
So the iter4 hypothesis was wrong — there is no UAPI drift on MPEG-2 or VP8.
Actual cause of "Unable to set control(s)"
Re-traced MPEG-2 decode on fresnel 7.0-1 with explicit hantro-decoder env override (/dev/video2 + /dev/media0 on the 2026-05-10 boot):
```sh
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video2 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
ffmpeg -hwaccel vaapi -i bbb_720p10s_mpeg2.ts -frames:v 2 -f null -
```
Captured ioctl trace (strace -ff -v -x -s 999999). Sequence of VIDIOC_S_EXT_CTRLS submissions on hantro:
| # | ctrl_class | controls | result | meaning |
|---|---|---|---|---|
| 1 | 0 | 0xa40900 H264_DECODE_MODE, 0xa40901 H264_START_CODE | EINVAL, error_idx=2 | Backend probe; fails because hantro doesn't expose H.264 |
| 2 | 0 | 0xa40a95 HEVC_DECODE_MODE, 0xa40a96 HEVC_START_CODE | EINVAL, error_idx=2 | Backend probe; fails because hantro doesn't expose HEVC |
| 3 | 0xf010000 | 0xa409dc MPEG2_SEQUENCE (12 B), 0xa409dd MPEG2_PICTURE (32 B), 0xa409de MPEG2_QUANTISATION (256 B) | 0 | Frame 1 controls accepted |
| 4 | 0xf010000 | same shape | 0 | Frame 2 controls accepted |
| 5..7 | 0xf010000 | same shape, varying timestamps | 0 | More frames |
The init-time H.264 and HEVC probes happen on every device the libva backend binds to. On rkvdec they succeed (rkvdec supports both); on hantro they fail with EINVAL because hantro is MPEG-2 + VP8 only. The EINVAL log lines are cosmetic: the actual MPEG-2 (and presumably VP8) frame submissions go through with a return value of 0.
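A cheap cross-check between the trace and the UAPI headers compares the payload sizes in rows 3 to 7 (12, 32 and 256 bytes) and the control IDs against a host's installed linux/v4l2-controls.h. This is a minimal sketch, assuming UAPI headers from 5.14 or later, when the stateless MPEG-2 controls left staging. It complements, and does not replace, the struct-by-struct diff above, because it only sees whatever headers that host ships.

```c
#include <assert.h>
#include <linux/v4l2-controls.h>

/* Payload sizes observed in the S_EXT_CTRLS trace for frame submissions. */
#define TRACE_MPEG2_SEQUENCE_BYTES     12u
#define TRACE_MPEG2_PICTURE_BYTES      32u
#define TRACE_MPEG2_QUANTISATION_BYTES 256u

/* Control IDs exactly as they appear in the trace table (compile-time). */
static_assert(V4L2_CID_STATELESS_MPEG2_SEQUENCE == 0xa409dc, "MPEG2_SEQUENCE id");
static_assert(V4L2_CID_STATELESS_MPEG2_PICTURE == 0xa409dd, "MPEG2_PICTURE id");
static_assert(V4L2_CID_STATELESS_MPEG2_QUANTISATION == 0xa409de, "MPEG2_QUANTISATION id");

/* Returns 1 when this host's UAPI struct sizes match the traced payloads. */
static int mpeg2_uapi_matches_trace(void)
{
	return sizeof(struct v4l2_ctrl_mpeg2_sequence) == TRACE_MPEG2_SEQUENCE_BYTES &&
	       sizeof(struct v4l2_ctrl_mpeg2_picture) == TRACE_MPEG2_PICTURE_BYTES &&
	       sizeof(struct v4l2_ctrl_mpeg2_quantisation) == TRACE_MPEG2_QUANTISATION_BYTES;
}
```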
Bug 3 → B4 backlog item, not iter5 scope
This is iter1+ backlog item B4 ("context.c log suppression for unsupported codec controls"). Cosmetic noise. Doesn't affect functional decode. The actual MPEG-2 + VP8 pixel-output FAIL at iter4 Phase 7 was caused by Bug 2 (cap_pool readback returning init pattern), identical in shape to the rkvdec case. Fixing Bug 2 fixes MPEG-2 + VP8 too.
B4 stays in backlog for a separate iteration; iter5 doesn't touch the backend.
Kernel-agent operational state
Per memory project_kernel_agent.md (2026-05-09):
- Agent spec'd, not operational.
- ka-promote / ka-close / ka-install / ka-status CLI verbs designed but not implemented.
- Fleet manifest exists at git.reauktion.de/marfrit/kernel-agent/fleet/fresnel.yaml and documents the canonical patch set + baseline.
- Build host primary: boltzmann (kbuild-aarch64 surrogate, native).
- Build host fallback: fermi (hertz LXD, ALARM aarch64).
- No distcc for kernel-agent builds (per feedback_kernel_agent_no_distcc.md).
- Package versioning: ${baseline_ref}.kafr${pkgrel}. iter5 produces 7.0.kafr2.
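The versioning rule is simple enough to pin down in a few lines. This is a minimal sketch, assuming baseline_ref is the bare baseline version string (7.0 here) and pkgrel is the integer release counter. ka_pkgver is a hypothetical name, since the ka-* CLI does not exist yet.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper for the ${baseline_ref}.kafr${pkgrel} convention.
 * Writes the version into out (capacity out_len). Returns 0 on success,
 * -1 if formatting fails or the result would be truncated.
 */
static int ka_pkgver(char *out, size_t out_len,
		     const char *baseline_ref, unsigned int pkgrel)
{
	int n;

	assert(out && baseline_ref);
	n = snprintf(out, out_len, "%s.kafr%u", baseline_ref, pkgrel);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With baseline_ref "7.0" and pkgrel 2 this yields 7.0.kafr2, matching the iter5 target.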
The current manifest fleet/fresnel.yaml explicitly excludes vb2_dma_resv per a 2026-04-28 decision:
Explicitly NOT included (tracked elsewhere, decision logged):
- subsystem/media/videobuf2/dma-resv-release-fence/ (RFC v1 rejected; v2 in design — see marfrit/dmabuf-modifier-triage#3. Skip until v2 lands or we explicitly accept v1-shape parity with ohm.)
iter5 work re-classifies vb2_dma_resv from "skip" to "include," updates the manifest, and lands the build. Manual build path (no ka-* CLI yet) is the fallback per Phase 0 lock.
Phase 4 plan preview
Phase 4 will detail the patch sequence + manifest update + build pipeline + verification matrix. Predicted shape:
- 4 patches (3 RFC v2 rebased + 1 new rkvdec consumer).
- 1 manifest update to fleet/fresnel.yaml: remove the "Explicitly NOT included" block for vb2_dma_resv, add the 4 patches under includes:, bump the version comment.
- 1 build cycle on boltzmann producing linux-fresnel-fourier 7.0.kafr2-*.pkg.tar.zst.
- 1 install + reboot on fresnel via pacman.
- 1 Phase 7 verification matrix running ffmpeg-vaapi-hwdownload on all 5 codecs: byte-identical YUV check vs SW reference, no transitive proof.
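The per-frame pass/fail decision in that matrix can be made to distinguish the Bug 2 signature from a genuine decode error. This is a minimal sketch with these assumptions:
- each decoded plane and its SW-reference counterpart are already in memory as equal-length byte buffers
- a plane filled entirely with one value (0x4c, per the iter4 Phase 7 observation) is treated as the stale cap_pool init pattern

classify_frame and the related names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum frame_verdict {
	FRAME_MATCH,      /* byte-identical to the SW reference */
	FRAME_STALE_INIT, /* plane still holds the init fill: Bug 2 signature */
	FRAME_MISMATCH,   /* decoded data that differs from the reference */
};

/* Offset of the first differing byte, or -1 if the buffers are identical. */
static long first_mismatch(const uint8_t *a, const uint8_t *b, size_t n)
{
	assert(a && b);
	for (size_t i = 0; i < n; i++)
		if (a[i] != b[i])
			return (long)i;
	return -1;
}

/* 1 if the buffer is non-empty and every byte equals fill, else 0. */
static int is_constant_fill(const uint8_t *buf, size_t n, uint8_t fill)
{
	assert(buf);
	for (size_t i = 0; i < n; i++)
		if (buf[i] != fill)
			return 0;
	return n > 0;
}

/* Exact match wins first, so a genuinely uniform reference frame still passes. */
static enum frame_verdict classify_frame(const uint8_t *hw, const uint8_t *sw,
					 size_t n, uint8_t init_fill)
{
	if (first_mismatch(hw, sw, n) < 0)
		return FRAME_MATCH;
	if (is_constant_fill(hw, n, init_fill))
		return FRAME_STALE_INIT;
	return FRAME_MISMATCH;
}
```

A FRAME_STALE_INIT verdict after the fix would point at the "race window isn't what we thought" failure mode rather than at the decoder. A FRAME_MISMATCH with a concrete offset points at decode or format handling instead.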
Predicted LOC delta:
- Patch 1 (vb2 helper): ~120 LOC kernel, operator-authored.
- Patch 2 (hantro consumer): +12 LOC, operator-authored.
- Patch 3 (rga consumer): +10 LOC, operator-authored.
- Patch 4 (rkvdec consumer): +12 LOC, claude-noether-authored (iter5 contribution).
- Manifest update: ~10 LOC YAML.
Total iter5 new code authorship: ~12 LOC of kernel C, ~10 LOC of YAML config.
Phase 4 source-read targets
Already complete in Phase 2 (above):
- ✓ ~/src/linux-rfc/ branch vb2-dma-resv-rfc: 3 RFC v2 patches read end-to-end.
- ✓ v6.12 + v7.0 include/uapi/linux/v4l2-controls.h MPEG-2 + VP8 struct diff: byte-identical.
- ✓ v6.12 + v7.0 drivers/media/v4l2-core/v4l2-ctrls-defs.c diff: 15 lines, none MPEG-2/VP8 related.
- ✓ v7.0 drivers/media/platform/rockchip/rkvdec/rkvdec.c::rkvdec_buf_queue: confirmed mechanical opt-in site.
- ✓ Fleet manifest fleet/fresnel.yaml: current state captured, exclusion of vb2_dma_resv noted.
- ✓ Empirical re-trace of MPEG-2 decode on fresnel: confirms Bug 3 is B4 cosmetic noise.

For Phase 4 (deferred until boltzmann reconnects):
- v6.12 → v7.0 delta on drivers/media/common/videobuf2/videobuf2-core.c: rebase risk assessment.
- v6.12 → v7.0 delta on drivers/media/platform/rockchip/rga/rga-buf.c: confirm the rebase is trivial.
- Apply the 3 RFC v2 patches with git apply --3way onto the v7.0 baseline and capture the conflict rate.
What "iteration 5 close" looks like
Per feedback_dev_process.md Phase 8:
- All 4 Phase 1 criteria green (Bug 2 closed for all 5 codecs · substrate ships from kernel-agent · no codec-contract regression · 5/5 direct verification).
- phase8_iteration5_close.md documenting the patches, build details, verification matrix.
- Campaign scoreboard updated from "5/5 (4 direct + 1 transitive)" to "5/5 direct."
- Memory entries distilled, likely 1 new entry on the contract: "vb2_dma_resv pattern: V4L2 producers must opt in per driver, one line at end of buf_queue callback." Predicted name: reference_vb2_dma_resv_opt_in_pattern.md, or fold the update into the existing reference_dmabuf_resv_blocker.md.
- Phase 5 sonnet-architect review pass signed off.
- Commits authored as claude-noether per feedback_gitea_as_claude_noether.md. Operator-authored RFC v2 patches preserve Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>.
- Kernel-agent fleet/fresnel.yaml updated and committed.
Predicted iter5 difficulty vs iter1-4:
- vs iter1-3 (~370 LOC libva backend per codec): iter5 is much smaller in LOC but larger in scope — touches kernel + build pipeline instead of single binary.
- vs iter4 (single new codec, 4 commits + 1 fix-forward): iter5 has 4 patches (3 operator-existing + 1 claude-new) + 1 manifest update. Comparable patch count, simpler per-patch shape.
- Predicted Phase 7 failure modes:
- RFC v2 rebase conflicts on videobuf2-core.c (medium risk — vb2 core is active code).
- Helper patch causes silent regression on a non-opted-in driver (low risk — patch is opt-in by design).
- fence-allocation under memory pressure fails and the fence-attach call returns -ENOMEM (low impact — best-effort by design).
- cap_pool readback still fails after the fix (the userspace race window isn't what we thought it was). This is the highest-impact failure mode — would force Phase 7 → Phase 4 or even Phase 7 → Phase 0 loopback.