diff --git a/phase2_iter5_situation.md b/phase2_iter5_situation.md
new file mode 100644
index 0000000..bd6a6ea
--- /dev/null
+++ b/phase2_iter5_situation.md
@@ -0,0 +1,262 @@
+# Iteration 5 — Phase 2 (situation analysis)
+
+Captured 2026-05-10 evening / 2026-05-11 in resume. Closes Phase 2 of iter5 per `feedback_dev_process.md`: source-read of the kernel paths Bug 2 touches, contract-before-patch citation of every site iter5 modifies. The major mid-Phase finding (Bug 3 collapses) was already folded back into Phase 0 (`phase0_findings_iter5.md` amendment, commit `31b9255`).
+
+## Bug 2 — vb2_dma_resv blocker (real, RFC v2 ready)
+
+### Root-cause framing
+
+Per memory `reference_dmabuf_resv_blocker.md`: V4L2 producers don't propagate `VB2_BUF_STATE_DONE` into the dmabuf's `dma_resv` exclusive fence. When userspace consumers (libva backend cap_pool readback path, Wayland compositors, etc.) import a V4L2-produced dmabuf and wait on the implicit-sync fence (`poll(POLLIN)` / `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`), they see either no fences or the stub fence from `dma_fence_get_stub()`.
+
+For our cap_pool readback path, the practical result is: the libva backend reads the dmabuf-backed CAPTURE buffer before the kernel-side decoder IRQ has signalled completion, gets back the page state at QBUF time (the cap_pool init pattern `RGB(0, 0x4c, 0)`), and ships that to ffmpeg-vaapi-hwdownload. The kernel decoded the frame correctly — but the userspace consumer read the page out of order.
+
+Why it surfaces on `linux-fresnel-fourier 7.0-1` and not on `linux-eos-arm 6.19.9-99`: not a regression. The same bug existed on 6.19; iter3 hit it on hantro (memory `reference_dmabuf_resv_blocker.md` documents the symptom as all-zero pages on RK3399 hantro CAPTURE). The shift from "iter3 saw it on hantro only" to "iter4 sees it across all 4 codecs on the new kernel" is most likely **timing**: the new kernel's cap_pool allocation / buffer-handoff path is slightly faster (or slower) than 6.19's, and the userspace race window that was sometimes-closed-sometimes-open on rkvdec at 6.19 is now consistently open at 7.0. iter3 deferred this for hantro; iter4 surfaced it for rkvdec on the new substrate; iter5 fixes it for both blocks at the kernel layer.
+
+### RFC v2 patch series (source: `~/src/linux-rfc/` branch `vb2-dma-resv-rfc`)
+
+Three operator-authored patches on top of `v6.12`:
+
+#### Patch 1/3 — `fbe8bf57a media: videobuf2: add dma_resv release-fence helper`
+
+```
+drivers/media/common/videobuf2/videobuf2-core.c | +99
+include/media/videobuf2-core.h | +19
+```
+
+Adds the opt-in API. Key surface:
+
+- `int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)` — driver calls from buf_queue. Allocates a `dma_fence` on the queue's per-queue fence context (set up at `vb2_core_queue_init`), attaches it as `DMA_RESV_USAGE_WRITE` on each plane's `dmabuf->resv`, stashes in `vb->release_fence`. Skips planes whose `vb2_plane.dbuf` is NULL.
+- `vb2_buffer_signal_release_fence(vb, state)` — internal helper called from `vb2_buffer_done()` on state transition. Signals + puts the fence. No-op when `vb->release_fence` is NULL (drivers that didn't opt in).
+- New `struct vb2_queue` fields: `u64 dma_resv_fence_context`, `atomic64_t dma_resv_fence_seqno`, `spinlock_t dma_resv_fence_lock`.
+- New `struct vb2_buffer` field: `struct dma_fence *release_fence`.
+
+This is the only non-trivial patch in the series — adds ~120 lines of new code in vb2 core. Drivers that don't opt in pay zero cost beyond a few extra struct fields.
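For context on what the release fence is meant to unblock, here is a minimal userspace sketch of the consumer-side wait named in the root-cause framing (`DMA_BUF_IOCTL_EXPORT_SYNC_FILE` followed by `poll(POLLIN)`). It is not taken from the libva backend or from the RFC series. The function name `wait_for_producer` and its return convention are hypothetical, and the fallback definitions are only an assumption for build hosts whose UAPI headers predate the ioctl.

```c
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-buf.h>

/* Fallback for older UAPI headers that lack the export ioctl (assumption:
 * the running kernel still implements it; otherwise the ioctl fails). */
#ifndef DMA_BUF_IOCTL_EXPORT_SYNC_FILE
struct dma_buf_export_sync_file {
	__u32 flags;
	__s32 fd;
};
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE \
	_IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
#endif

/*
 * Hypothetical helper: wait until pending writers of a dmabuf have signalled.
 * Returns 1 if the fence signalled, 0 on timeout, -1 on error (errno set by
 * the failing call).
 */
int wait_for_producer(int dmabuf_fd, int timeout_ms)
{
	struct dma_buf_export_sync_file req = {
		.flags = DMA_BUF_SYNC_READ, /* reader: wait for write fences */
		.fd = -1,
	};
	struct pollfd pfd;
	int ret;

	/* Snapshot the write fences currently in the buffer's dma_resv. */
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &req) < 0)
		return -1;

	pfd.fd = req.fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* A sync_file becomes readable once all of its fences have signalled. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	close(req.fd);
	if (ret < 0)
		return -1;
	return ret > 0 ? 1 : 0;
}
```

If the producer attached nothing to `dma_resv`, this wait returns immediately even though the decoder may still be writing, which is the race described above. With a write-usage release fence attached at buf_queue, the same call would be expected to block until `vb2_buffer_done()` signals it.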
+
+#### Patch 2/3 — `14a68fcf0 media: hantro: attach dma_resv release fence at buf_queue`
+
+```
+drivers/media/platform/verisilicon/hantro_v4l2.c | +12
+```
+
+The driver-side opt-in is one line of code plus a 10-line comment block:
+
+```c
+static void hantro_buf_queue(struct vb2_buffer *vb)
+{
+	...
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
++	/*
++	 * Opt in to vb2's dma_resv release-fence path. [...]
++	 */
++	(void)vb2_buffer_attach_release_fence(vb);
+}
+```
+
+Per the operator's commit message, this was empirically validated on PineTab2 (RK3566 hantro) running mainline 6.19 with this series backported: KWin's `Transaction::watchDmaBuf` wait completes correctly the moment hantro's IRQ fires.
+
+#### Patch 3/3 — `89b699508 media: rockchip-rga: attach dma_resv release fence at buf_queue`
+
+```
+drivers/media/platform/rockchip/rga/rga-buf.c | +10
+```
+
+Same shape as the hantro patch. Out-of-scope for iter5's libva path (we don't use RGA), but kept in the kernel-agent local-carry as part of the cohesive series — RGA is referenced by GStreamer flows on Rockchip boards and the operator's intent (per RFC commit message) is to land all three v4l2 producers together.
+
+### Gap — no rkvdec consumer patch
+
+The series ships hantro + rga but **not rkvdec**. iter4 Phase 7 verified Bug 2 hits rkvdec too on the new substrate (constant `0x4c` for H.264 inter + HEVC + VP9 cap_pool reads). iter5 contributes the missing 4th patch.
+
+### Patch 4/4 — `media: rkvdec: attach dma_resv release fence at buf_queue` (NEW, iter5 work)
+
+Target file: `drivers/media/platform/rockchip/rkvdec/rkvdec.c` at v7.0 (post-staging-promotion path; was `drivers/staging/media/rkvdec/` in earlier kernels).
+
+Target function: `rkvdec_buf_queue` at line 954 of `028ef9c96e96 Linux 7.0`:
+
+```c
+static void rkvdec_buf_queue(struct vb2_buffer *vb)
+{
+	struct rkvdec_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
+
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+}
+```
+
+Patch shape (mechanical, same as hantro patch):
+
+```diff
+ static void rkvdec_buf_queue(struct vb2_buffer *vb)
+ {
+ 	struct rkvdec_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+ 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
+
+ 	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
++
++	/*
++	 * Opt in to vb2's dma_resv release-fence path. Userspace
++	 * consumers that imported this buffer's dmabuf and wait on
++	 * its implicit-sync fence get a real producer fence
++	 * representing rkvdec's completion, instead of the stub
++	 * fence dma_buf_export_sync_file substitutes when dma_resv
++	 * is empty. Best-effort: a fence-allocation failure means we
++	 * lose implicit-sync precision, no functional regression.
++	 */
++	(void)vb2_buffer_attach_release_fence(vb);
+ }
+```
+
+Author trailer must preserve attribution discipline per memory `feedback_gitea_as_claude_noether.md`: this is Claude-authored work, sign as `claude-noether `, with a `Co-Authored-By:` trailer for the operator if iter5 is reviewed via PR flow. Local-carry-only acceptable per Phase 0 lock.
+
+## Rebase risk — v6.12 base → v7.0 base
+
+The 3 existing RFC v2 patches were authored against v6.12. The kernel-agent product baseline is v7.0 (per `fleet/fresnel.yaml`). Risk surface:
+
+| File | v6.12 → v7.0 delta | Rebase risk |
+|---|---|---|
+| `drivers/media/common/videobuf2/videobuf2-core.c` | Not measured (boltzmann offline). Expect non-zero delta — vb2 core sees regular activity. | **MEDIUM** — the helper patch adds includes + extends `vb2_buffer_done` + extends `vb2_core_queue_init`. Conflicts possible. Phase 4 task: run `git apply --3way` against v7.0 and resolve. |
+| `include/media/videobuf2-core.h` | Not measured. | **LOW** — header changes typically less churn-prone. |
+| `drivers/media/platform/verisilicon/hantro_v4l2.c` | Confirmed unchanged v6.12 → v7.0 (boltzmann diff stat showed 0 lines in hantro_v4l2.c). | **LOW** — patch should apply cleanly. |
+| `drivers/media/platform/rockchip/rga/rga-buf.c` | Not measured. | **LOW** — rga sees less churn than vb2 core. |
+| `drivers/media/platform/rockchip/rkvdec/rkvdec.c` | Not applicable — iter5 is authoring this patch fresh against v7.0. | N/A |
+
+Boltzmann reconnection needed for Phase 4 final rebase verification. Not blocking Phase 2 close.
+
+### v4l2_m2m / v4l2-mem2mem rebase note
+
+The hantro + rga patches both insert their opt-in call *after* `v4l2_m2m_buf_queue()`. The rkvdec consumer follows the same shape. If any of these `v4l2_m2m_*` helpers shifted between v6.12 and v7.0 in a way that affects the buf_queue call signature, the patches need updating. Not measured; Phase 4 task.
+
+## Bug 3 — collapsed (UAPI drift hypothesis was wrong)
+
+### Empirical disproof of "UAPI drift" hypothesis
+
+iter4 Phase 7 doc speculated:
+
+> **Hantro `Unable to set control(s)` errors**: a kernel-side rejection on hantro for MPEG-2/VP8. Substrate change appears to have shifted hantro's expected control structure or fields; iter1 (MPEG-2) and iter3 (VP8) were tested on 6.19.9 — UAPI likely drifted between 6.19.9 and 7.0 the same way VP9 did.
+
+Empirical struct-by-struct check 2026-05-10:
+
+```bash
+# The pattern covers the three mpeg2_* structs plus vp8_frame, which has no mpeg2_ prefix.
+ssh boltzmann 'cd ~/src/linux-rockchip &&
+  for ref in v6.12 028ef9c96e96; do
+    echo "===$ref==="
+    git show $ref:include/uapi/linux/v4l2-controls.h | awk \
+      "/^struct v4l2_ctrl_(mpeg2_(sequence|picture|quantisation)|vp8_frame) {/{f=1; print; next} f{print; if(\$0~/^};/) f=0}"
+  done'
+```
+
+Result: **byte-identical** struct definitions across v6.12 and v7.0 for:
+- `struct v4l2_ctrl_mpeg2_sequence` (8 fields)
+- `struct v4l2_ctrl_mpeg2_picture` (8 fields)
+- `struct v4l2_ctrl_mpeg2_quantisation` (4 fields)
+- `struct v4l2_ctrl_vp8_frame` (30 fields)
+
+Plus the surrounding `drivers/media/v4l2-core/v4l2-ctrls-defs.c` delta was 15 lines, all additions for unrelated controls (FLASH duration, HEVC EXT_SPS_*_RPS, AV1).
+
+So the iter4 hypothesis was wrong — there is no UAPI drift on MPEG-2 or VP8.
+
+### Actual cause of "Unable to set control(s)"
+
+Re-traced MPEG-2 decode on fresnel 7.0-1 with explicit hantro-decoder env override (`/dev/video2 + /dev/media0` on the 2026-05-10 boot):
+
+```bash
+LIBVA_DRIVER_NAME=v4l2_request \
+  LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 \
+  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video2 \
+  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
+  ffmpeg -hwaccel vaapi -i bbb_720p10s_mpeg2.ts -frames:v 2 -f null -
+```
+
+Captured ioctl trace (`strace -ff -v -x -s 999999`). Sequence of `VIDIOC_S_EXT_CTRLS` submissions on hantro:
+
+| # | ctrl_class | controls | result | meaning |
+|---|---|---|---|---|
+| 1 | 0 | `0xa40900 H264_DECODE_MODE`, `0xa40901 H264_START_CODE` | **EINVAL** error_idx=2 | Backend probe — fails because hantro doesn't expose H.264 |
+| 2 | 0 | `0xa40a95 HEVC_DECODE_MODE`, `0xa40a96 HEVC_START_CODE` | **EINVAL** error_idx=2 | Backend probe — fails because hantro doesn't expose HEVC |
+| 3 | `0xf010000` | `0xa409dc MPEG2_SEQUENCE`(12B), `0xa409dd MPEG2_PICTURE`(32B), `0xa409de MPEG2_QUANTISATION`(256B) | **0** | Frame 1 controls accepted |
+| 4 | `0xf010000` | same shape | **0** | Frame 2 controls accepted |
+| 5..7 | `0xf010000` | same shape, varying timestamps | **0** | More frames |
+
+The init-time H.264 + HEVC probes happen on every device the libva backend binds to. On rkvdec they succeed (rkvdec supports both); on hantro they EINVAL because hantro is MPEG-2 + VP8 only. The EINVAL log lines are cosmetic — actual MPEG-2 (and presumably VP8) frame submission goes through `= 0`.
+
+### Bug 3 → B4 backlog item, not iter5 scope
+
+This is iter1+ backlog item **B4** ("context.c log suppression for unsupported codec controls"). Cosmetic noise. Doesn't affect functional decode. The actual MPEG-2 + VP8 pixel-output FAIL at iter4 Phase 7 was caused by **Bug 2** (cap_pool readback returning init pattern), identical in shape to the rkvdec case. Fixing Bug 2 fixes MPEG-2 + VP8 too.
+
+B4 stays in backlog for a separate iteration; iter5 doesn't touch the backend.
+
+## Kernel-agent operational state
+
+Per memory `project_kernel_agent.md` (2026-05-09):
+
+- Agent **spec'd, not operational**. `ka-promote / ka-close / ka-install / ka-status` CLI verbs designed but not implemented.
+- Fleet manifest exists at `git.reauktion.de/marfrit/kernel-agent/fleet/fresnel.yaml` and documents the canonical patch set + baseline.
+- Build host primary: boltzmann (kbuild-aarch64 surrogate, native).
+- Build host fallback: fermi (hertz LXD, ALARM aarch64).
+- No distcc for kernel-agent builds (per `feedback_kernel_agent_no_distcc.md`).
+- Package versioning: `${baseline_ref}.kafr${pkgrel}`. iter5 produces `7.0.kafr2`.
+
+The current manifest `fleet/fresnel.yaml` explicitly excludes vb2_dma_resv per a 2026-04-28 decision:
+
+> Explicitly NOT included (tracked elsewhere, decision logged):
+> - subsystem/media/videobuf2/dma-resv-release-fence/ (RFC v1 rejected;
+>   v2 in design — see marfrit/dmabuf-modifier-triage#3. Skip until v2 lands
+>   or we explicitly accept v1-shape parity with ohm.)
+
+iter5 work re-classifies vb2_dma_resv from "skip" to "include," updates the manifest, and lands the build. Manual build path (no `ka-*` CLI yet) is the fallback per Phase 0 lock.
+
+## Phase 4 plan preview
+
+Phase 4 will detail the patch sequence + manifest update + build pipeline + verification matrix. Predicted shape:
+
+- **4 patches** (3 RFC v2 rebased + 1 new rkvdec consumer).
+- **1 manifest update** to `fleet/fresnel.yaml`: remove `Explicitly NOT included` block for vb2_dma_resv, add 4 includes under `includes:`, bump version comment.
+- **1 build cycle** on boltzmann producing `linux-fresnel-fourier 7.0.kafr2-*.pkg.tar.zst`.
+- **1 install + reboot on fresnel** via pacman.
+- **1 Phase 7 verification matrix** running ffmpeg-vaapi-hwdownload on all 5 codecs, byte-identical YUV check vs SW reference, no transitive proof.
+
+Predicted LOC delta:
+- Patch 1 (vb2 helper): ~120 LOC kernel, **operator-authored**.
+- Patch 2 (hantro consumer): +12 LOC, operator-authored.
+- Patch 3 (rga consumer): +10 LOC, operator-authored.
+- Patch 4 (rkvdec consumer): +12 LOC, **claude-noether-authored (iter5 contribution)**.
+- Manifest update: ~10 LOC YAML.
+
+Total iter5 new code authorship: ~12 LOC of kernel C, ~10 LOC of YAML config.
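The Phase 7 verification matrix rests on two checks: the hardware output must be byte-identical to the software reference, and it must not be the untouched cap_pool init pattern. A minimal sketch of both checks follows. The function names are hypothetical, and which plane and byte value to test for the init pattern (for example the constant `0x4c` seen in iter4) is an assumption the caller has to supply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical check: true when every byte of the buffer equals `value`,
 * i.e. the readback looks like an untouched fill pattern rather than
 * decoded picture data. An empty buffer is treated as not constant.
 */
bool plane_is_constant(const uint8_t *buf, size_t len, uint8_t value)
{
	if (buf == NULL || len == 0)
		return false;
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != value)
			return false;
	}
	return true;
}

/*
 * Hypothetical check: true when the hardware-decoded frame matches the
 * software reference byte for byte over `len` bytes.
 */
bool frames_identical(const uint8_t *hw, const uint8_t *sw, size_t len)
{
	if (hw == NULL || sw == NULL)
		return false;
	return memcmp(hw, sw, len) == 0;
}
```

Running the constant-pattern check first gives a clearer failure message: a frame that is all init pattern points at the readback race, while a frame that differs from the reference in only some bytes points at a decode or format problem.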
+
+## Phase 4 source-read targets
+
+Already complete in Phase 2 (above):
+- ✓ `~/src/linux-rfc/` branch `vb2-dma-resv-rfc` — 3 RFC v2 patches read end-to-end.
+- ✓ v6.12 + v7.0 `include/uapi/linux/v4l2-controls.h` MPEG-2 + VP8 struct diff — byte-identical.
+- ✓ v6.12 + v7.0 `drivers/media/v4l2-core/v4l2-ctrls-defs.c` diff — 15 lines, none MPEG-2/VP8 related.
+- ✓ v7.0 `drivers/media/platform/rockchip/rkvdec/rkvdec.c::rkvdec_buf_queue` — confirmed mechanical opt-in site.
+- ✓ Fleet manifest `fleet/fresnel.yaml` — current state captured, exclusion-of-vb2_dma_resv noted.
+- ✓ Empirical re-trace of MPEG-2 decode on fresnel — confirms Bug 3 is B4 cosmetic noise.
+
+For Phase 4 (deferred until boltzmann reconnects):
+- v6.12 → v7.0 delta on `drivers/media/common/videobuf2/videobuf2-core.c` — rebase risk assessment.
+- v6.12 → v7.0 delta on `drivers/media/platform/rockchip/rga/rga-buf.c` — confirm rebase trivial.
+- Apply the 3 RFC v2 patches with `git apply --3way` onto v7.0 baseline and capture conflict-rate.
+
+## What "iteration 5 close" looks like
+
+Per `feedback_dev_process.md` Phase 8:
+
+- All 4 Phase 1 criteria green (Bug 2 closed for all 5 codecs · substrate ships from kernel-agent · no codec-contract regression · 5/5 direct verification).
+- `phase8_iteration5_close.md` documenting the patches, build details, verification matrix.
+- Campaign scoreboard updated from "5/5 (4 direct + 1 transitive)" to "5/5 direct."
+- Memory entries distilled — likely 1 new entry on the contract: "vb2_dma_resv pattern: V4L2 producers must opt-in per driver, one line at end of buf_queue callback." Predicted name: `reference_vb2_dma_resv_opt_in_pattern.md` or fold update into existing `reference_dmabuf_resv_blocker.md`.
+- Phase 5 sonnet-architect review pass signed off.
+- Commits authored as `claude-noether` per `feedback_gitea_as_claude_noether.md`. Operator-authored RFC v2 patches preserve `Signed-off-by: Markus Fritsche `.
+- Kernel-agent `fleet/fresnel.yaml` updated and committed.
+
+Predicted iter5 difficulty vs iter1-4:
+
+- **vs iter1-3 (~370 LOC libva backend per codec)**: iter5 is **much smaller in LOC** but **larger in scope** — touches kernel + build pipeline instead of single binary.
+- **vs iter4 (single new codec, 4 commits + 1 fix-forward)**: iter5 has 4 patches (3 operator-existing + 1 claude-new) + 1 manifest update. Comparable patch count, simpler per-patch shape.
+- **Predicted Phase 7 failure modes**:
+  1. RFC v2 rebase conflicts on videobuf2-core.c (medium risk — vb2 core is active code).
+  2. Helper patch causes silent regression on a non-opted-in driver (low risk — patch is opt-in by design).
+  3. fence-allocation under memory pressure fails and the fence-attach call returns -ENOMEM (low impact — best-effort by design).
+  4. cap_pool readback still fails after the fix (the userspace race window isn't what we thought it was). **This is the highest-impact failure mode** — would force Phase 7 → Phase 4 or even Phase 7 → Phase 0 loopback.
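One way to narrow down failure mode 4 is to inspect which fence a consumer actually receives from an exported sync_file. The sketch below uses the `SYNC_IOC_FILE_INFO` ioctl from `linux/sync_file.h` to read the driver name of the first fence. The helper name `first_fence_driver_name` is hypothetical. The expectation that the stub fence reports the name `stub` while a vb2 release fence reports something else is an assumption to confirm on the target kernel, not a documented contract of the RFC series.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Hypothetical diagnostic: copy the driver name of the first fence in a
 * sync_file into `out`. Returns 1 if a name was copied, 0 if the sync_file
 * reports no fences, -1 on error.
 */
int first_fence_driver_name(int sync_fd, char *out, size_t out_len)
{
	struct sync_file_info info;
	struct sync_fence_info *fences;
	uint32_t count;
	int ret = -1;

	if (out == NULL || out_len == 0)
		return -1;

	/* First call with num_fences = 0 only reports how many fences exist. */
	memset(&info, 0, sizeof(info));
	if (ioctl(sync_fd, SYNC_IOC_FILE_INFO, &info) < 0)
		return -1;
	if (info.num_fences == 0)
		return 0;

	count = info.num_fences;
	fences = calloc(count, sizeof(*fences));
	if (fences == NULL)
		return -1;

	/* Second call fills the caller-provided array of per-fence records. */
	memset(&info, 0, sizeof(info));
	info.num_fences = count;
	info.sync_fence_info = (uint64_t)(uintptr_t)fences;
	if (ioctl(sync_fd, SYNC_IOC_FILE_INFO, &info) == 0 && info.num_fences > 0) {
		snprintf(out, out_len, "%s", fences[0].driver_name);
		ret = 1;
	}

	free(fences);
	return ret;
}
```

Logging this name next to each readback would show whether a still-failing frame was read behind a real producer fence (pointing away from the race hypothesis) or behind a placeholder (pointing at a missing opt-in or a rebase error).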