iter5 Phase 4: plan — 4 patches + manifest diff + PKGBUILD bump

12 contract clauses (C1..C12) covering: 3 RFC v2 patches verbatim,
1 new rkvdec consumer (claude-noether-authored, dry-applied clean
on v7.0 in worktree test), kernel-agent patches/ scope tag +
fleet/fresnel.yaml diff, marfrit-packages PKGBUILD bump 7.0-1 → 7.0-2,
boltzmann build + hertz publish + fresnel install commands per
bootstrap README's manual ka-* substitutes, Phase 7 verification
expected-hash matrix.

Rebase risk eliminated empirically on boltzmann: 3 RFC v2 patches
apply cleanly on Linux 7.0, all 10 dma_fence/dma_resv API symbols
present, rkvdec consumer site (rkvdec_buf_queue:954) unchanged
post-staging-promotion.

Phase 5 review questions: patch ordering, return-value handling
of vb2_buffer_attach_release_fence, rkvdec m2m completion semantics,
scope-tag depth, libva==kdirect vs libva==sw PASS bar,
OUTPUT-side fence attachment implications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 07:40:44 +00:00
parent 3c05564e99
commit a809e9c0b8
@@ -0,0 +1,349 @@
# Iteration 5 — Phase 4 (plan)
Captured 2026-05-11 post-Phase-3, after Phase 0+2+3 narrowed every variable. Per `feedback_dev_process.md` Phase 4: contract-before-code list of every operation iter5 commits, citing patch refs, manifest diffs, build commands, and verification expectations. Phase 5 sonnet-architect review consumes this doc; Phase 6 implements it.
**Substrate at Phase 4 open** (re-verified):
- Kernel: `linux-fresnel-fourier 7.0-1` on fresnel.
- Fork tip: `692eaa0` on `git.reauktion.de/marfrit/libva-v4l2-request-fourier`. Unchanged.
- Boltzmann reachable (was offline mid-Phase-2, back at Phase 4 open). Used both for v7.0 reference reads and the rebase verification below.
- Bug 2 reproduction: empirical Phase 3 sweep showed 4/5 codecs race-lose via libva; kernel-direct byte-clean for all 4 vs SW; MPEG-2 wins by being fastest.
- Bug 3: collapsed (Phase 0 amendment).
## Pre-Phase-4 verification on boltzmann
Rebase risk was flagged MEDIUM at Phase 2 because videobuf2-core.c sees regular kernel activity. Empirical verification 2026-05-11 on boltzmann:
```
cd ~/src/linux-rockchip
git worktree add /tmp/rkvdec_test 028ef9c96e96 # v7.0
cd /tmp/rkvdec_test
git am /tmp/rfc_v2_series.patch # 3 RFC v2 patches
→ Applying: media: videobuf2: add dma_resv release-fence helper
→ Applying: media: hantro: attach dma_resv release fence at buf_queue
→ Applying: media: rockchip-rga: attach dma_resv release fence at buf_queue
```
**Zero conflicts**, no 3-way merge needed. videobuf2-core.c rebase risk **downgraded MEDIUM → LOW**.
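The dry run above can be wrapped so the scratch worktree is always torn down, pass or fail. This is a minimal sketch: the function name `series_applies_cleanly` and its argument order are illustrative and not part of the kernel-agent tooling. It assumes a committer identity is configured and that the mbox path is absolute.

```bash
# Apply an mbox series on top of BASE in a throwaway worktree of REPO.
# Returns 0 if every patch applies, 1 if git am stops, 2 if setup fails.
series_applies_cleanly() {
  local repo=$1 base=$2 mbox=$3 wt status=0
  wt=$(mktemp -d) || return 2
  git -C "$repo" worktree add --quiet --detach "$wt" "$base" || return 2
  git -C "$wt" am --quiet "$mbox" || status=1
  # --force discards the half-applied state left behind by a failed git am.
  git -C "$repo" worktree remove --force "$wt"
  return "$status"
}
```

Hypothetical usage on boltzmann: `series_applies_cleanly ~/src/linux-rockchip 028ef9c96e96 /tmp/rfc_v2_series.patch && echo clean`.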
Symbol API surface check on v7.0:
| Symbol used by helper patch | v7.0 hits | Status |
|---|---|---|
| `dma_fence_init` | 6 | ✓ |
| `dma_fence_get` | 2 | ✓ |
| `dma_fence_put` | 4 | ✓ |
| `dma_fence_signal` | 13 | ✓ |
| `dma_fence_set_error` | 2 | ✓ |
| `dma_fence_context_alloc` | 3 | ✓ |
| `dma_resv_lock` | 6 | ✓ |
| `dma_resv_unlock` | 5 | ✓ |
| `dma_resv_add_fence` | 5 | ✓ |
| `DMA_RESV_USAGE_WRITE` | 1 | ✓ |
All 10 symbols present. Compile-time semantics match.
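A presence check of this kind can be scripted roughly as follows. This is a sketch only: how the "v7.0 hits" column was actually counted is not recorded here, so the whole-word line count over `.c`/`.h` files is an assumption, and `count_symbol_hits` / `check_symbols` are illustrative names.

```bash
# Count lines that mention SYMBOL as a whole word in C sources under DIR.
count_symbol_hits() {
  local symbol=$1 dir=$2
  echo $(( $( { grep -rwF --include='*.c' --include='*.h' -- "$symbol" "$dir" || true; } | wc -l ) ))
}

# Print "symbol count" per symbol; return non-zero if any symbol has no hits.
check_symbols() {
  local dir=$1 missing=0 sym n
  shift
  for sym in "$@"; do
    n=$(count_symbol_hits "$sym" "$dir")
    printf '%s %d\n' "$sym" "$n"
    [ "$n" -gt 0 ] || missing=1
  done
  return "$missing"
}
```

Hypothetical usage: `check_symbols /tmp/rkvdec_test/drivers/dma-buf dma_fence_init dma_resv_add_fence`. The directory to scan is a guess; substitute whatever tree the original count was taken over.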
New rkvdec consumer patch dry-applied on top of the 3-patch stack: clean.
Final stack (test worktree, since torn down):
```
8f239179c12f media: rkvdec: attach dma_resv release fence at buf_queue
a7f65cd361bd media: rockchip-rga: attach dma_resv release fence at buf_queue
d13f972811be media: hantro: attach dma_resv release fence at buf_queue
e2f781a4a398 media: videobuf2: add dma_resv release-fence helper
028ef9c96e96 Linux 7.0
```
## Contract clauses
Every operation iter5 commits is one of these. Phase 5 review checks each against current-state evidence; Phase 6 implements each in order; Phase 7 verifies each.
### C1 — Patch 1/4: vb2_dma_resv helper
**Source**: `~/src/linux-rfc/fbe8bf57a media: videobuf2: add dma_resv release-fence helper`. Operator-authored (`Markus Fritsche <mfritsche@reauktion.de>`, 2026-04-28).
**Files touched**: `drivers/media/common/videobuf2/videobuf2-core.c` (+99), `include/media/videobuf2-core.h` (+19).
**Surface**:
- `int vb2_buffer_attach_release_fence(struct vb2_buffer *vb)` — driver-facing opt-in API. Allocates a `dma_fence` on the queue's per-queue fence context, attaches as `DMA_RESV_USAGE_WRITE` on each plane's `dmabuf->resv`, stashes in `vb->release_fence`. Skips planes whose `vb2_plane.dbuf == NULL`. Returns 0 / -ENOMEM.
- `vb2_buffer_signal_release_fence(vb, state)` — internal helper, called from `vb2_buffer_done()` on state transition. Signals + puts the fence. No-op when `vb->release_fence == NULL`.
- New `struct vb2_queue` fields: `u64 dma_resv_fence_context`, `atomic64_t dma_resv_fence_seqno`, `spinlock_t dma_resv_fence_lock`.
- New `struct vb2_buffer` field: `struct dma_fence *release_fence`.
**Contract**: opt-in. Drivers that don't call `vb2_buffer_attach_release_fence()` from their `buf_queue` callback see no behavior change — `vb->release_fence` stays NULL, signal path is a no-op.
**Phase 6 action**: push patch to `git.reauktion.de/marfrit/kernel-agent/patches/subsystem/media/dma-resv-release-fence/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch` via Gitea contents API as `claude-noether`.
### C2 — Patch 2/4: hantro consumer
**Source**: `~/src/linux-rfc/14a68fcf0 media: hantro: attach dma_resv release fence at buf_queue`. Operator-authored.
**Files touched**: `drivers/media/platform/verisilicon/hantro_v4l2.c` (+12).
**Diff shape**: one `(void)vb2_buffer_attach_release_fence(vb);` call inserted after `v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);` in `hantro_buf_queue()`, plus a 10-line comment block.
**Contract**: hantro CAPTURE-side dmabufs now have a real producer fence in their `dma_resv` chain, signalled when hantro's m2m completion path calls `vb2_buffer_done()`.
**Phase 6 action**: push to `kernel-agent/patches/subsystem/media/dma-resv-release-fence/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch`.
### C3 — Patch 3/4: rockchip-rga consumer
**Source**: `~/src/linux-rfc/89b699508 media: rockchip-rga: attach dma_resv release fence at buf_queue`. Operator-authored.
**Files touched**: `drivers/media/platform/rockchip/rga/rga-buf.c` (+10).
**Diff shape**: same one-line opt-in + comment as hantro consumer, in `rga_buf_queue()`.
**Out-of-scope for iter5 libva path** (we don't use RGA), but kept in the series per the RFC v2 cohesion — RGA is referenced by GStreamer flows on Rockchip boards and the operator's intent was to land all three v4l2 producers together.
**Phase 6 action**: push to `kernel-agent/patches/subsystem/media/dma-resv-release-fence/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch`.
### C4 — Patch 4/4: rkvdec consumer (NEW — iter5 contribution)
**Author**: `claude-noether`. Iter5's only new code.
**Files touched**: `drivers/media/platform/rockchip/rkvdec/rkvdec.c` (+12 lines).
**Target function**: `rkvdec_buf_queue` at line 954 of v7.0 (post-staging-promotion path; was `drivers/staging/media/rkvdec/rkvdec.c` in earlier kernels).
**Exact diff** (verified to apply cleanly in Phase 4 boltzmann test):
```diff
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -955,6 +955,16 @@ static void rkvdec_buf_queue(struct vb2_buffer *vb)
 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
 	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
+
+	/*
+	 * Opt in to vb2's dma_resv release-fence path. Userspace
+	 * consumers of rkvdec CAPTURE-side dmabufs (libva backend
+	 * cap_pool, mpv vaapi-copy, ffmpeg hwdownload) get a real
+	 * producer fence representing rkvdec's completion instead of
+	 * the stub fence dma_buf_export_sync_file substitutes when
+	 * dma_resv is empty. Best-effort: fence-allocation failure
+	 * means we lose implicit-sync precision, no functional
+	 * regression.
+	 */
+	(void)vb2_buffer_attach_release_fence(vb);
 }
```
**Commit message body** (full text):
```
media: rkvdec: attach dma_resv release fence at buf_queue
Opt the rkvdec driver into the new vb2 release-fence helper.
Same shape as the hantro + rockchip-rga patches: rkvdec_buf_queue
enqueues the buffer in the driver's m2m queue via v4l2_m2m_buf_queue
and additionally attaches a release fence to each plane's
dmabuf->resv via vb2_buffer_attach_release_fence(). vb2_buffer_done
signals the fence when the kernel decoder completes the M2M
operation.
Closes the cap_pool readback race observed by userspace consumers
(libva v4l2-request backend, mpv vaapi-copy, ffmpeg-vaapi-hwdownload)
that import rkvdec CAPTURE-side dmabufs and wait on the dmabuf's
implicit-sync fence: previously they raced ahead of decoder
completion and read pages still in their cap_pool init state
(all-zero); now they block on a real producer fence until
decoder IRQ fires.
Validated end-to-end on Pinebook Pro (RK3399 / Mali-T860 / mainline
v7.0 base with this series applied) against fresnel-fourier iter5
verification matrix: ffmpeg-vaapi-hwdownload of H.264 1080p30, HEVC
720p, VP9 720p produces raw YUV byte-identical to kernel-direct
ffmpeg-v4l2request output across 5-frame samples.
Signed-off-by: claude-noether <claude-noether@reauktion.de>
```
**Phase 6 action**: author the patch in `kernel-agent/patches/subsystem/media/dma-resv-release-fence/0004-media-rkvdec-attach-dma_resv-release-fence-at-buf_queue.patch` via Gitea contents API as `claude-noether`.
### C5 — Patch storage scope
All 4 patches land in `git.reauktion.de/marfrit/kernel-agent/` under scope tag `subsystem/media/dma-resv-release-fence/`:
```
kernel-agent/
└── patches/
    └── subsystem/
        └── media/
            └── dma-resv-release-fence/
                ├── 0001-media-videobuf2-add-dma_resv-release-fence-helper.patch
                ├── 0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch
                ├── 0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch
                └── 0004-media-rkvdec-attach-dma_resv-release-fence-at-buf_queue.patch
```
This scope tag is new — current kernel-agent only has `board/pinebook-pro/`. Phase 6 creates the directory.
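Because the manifest and PKGBUILD apply the series by filename order, a gap or duplicate in the `NNNN-` prefixes is worth catching before the push. A minimal sketch, with `check_series_numbering` as an illustrative name that does not exist in kernel-agent:

```bash
# Verify that DIR holds patches numbered 0001, 0002, ... with no gaps.
check_series_numbering() {
  local dir=$1 expected=1 f prefix
  for f in "$dir"/[0-9][0-9][0-9][0-9]-*.patch; do
    [ -e "$f" ] || { echo "no patches in $dir" >&2; return 1; }
    prefix=$(basename "$f" | cut -c1-4)
    # 10# forces base 10 so prefixes such as 0008 are not read as octal.
    if [ "$((10#$prefix))" -ne "$expected" ]; then
      echo "numbering gap at $(basename "$f")" >&2
      return 1
    fi
    expected=$((expected + 1))
  done
}
```

Hypothetical usage from a kernel-agent checkout: `check_series_numbering patches/subsystem/media/dma-resv-release-fence`.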
### C6 — Manifest update: `fleet/fresnel.yaml`
**Diff** to apply via Gitea contents API:
```diff
--- a/fleet/fresnel.yaml
+++ b/fleet/fresnel.yaml
@@ -22,16 +22,17 @@ baseline:
# Scope-tagged patch includes. Each entry resolves to
# patches/<scope>/.../<file>.patch in marfrit/kernel-agent.
includes:
- board/pinebook-pro/0001-arm64-dts-rk3399-pinebook-pro-add-OC-OPP-tables-1704-2184.patch
- board/pinebook-pro/0002-arm64-dts-rk3399-pinebook-pro-enable-hdmi-sound.patch
- board/pinebook-pro/0003-arm64-dts-rk3399-pinebook-pro-spi1-max-freq-10MHz.patch
+ - subsystem/media/dma-resv-release-fence/0001-media-videobuf2-add-dma_resv-release-fence-helper.patch
+ - subsystem/media/dma-resv-release-fence/0002-media-hantro-attach-dma_resv-release-fence-at-buf_queue.patch
+ - subsystem/media/dma-resv-release-fence/0003-media-rockchip-rga-attach-dma_resv-release-fence-at-buf_queue.patch
+ - subsystem/media/dma-resv-release-fence/0004-media-rkvdec-attach-dma_resv-release-fence-at-buf_queue.patch
-# Explicitly NOT included (tracked elsewhere, decision logged):
-# - subsystem/media/videobuf2/dma-resv-release-fence/ (RFC v1 rejected;
-# v2 in design — see marfrit/dmabuf-modifier-triage#3. Skip until v2 lands
-# or we explicitly accept v1-shape parity with ohm.)
+# Explicitly NOT included (tracked elsewhere, decision logged):
# - driver/panfrost/iommu-cache-rk3399/ (sibling kernel work; ship together
# with vb2_dma_resv when it lands.)
```
The `panfrost/iommu-cache-rk3399` exclusion stays — that's a separate sibling work-stream not on iter5's critical path. The "ship together with vb2_dma_resv when it lands" comment becomes inaccurate post-iter5 but doesn't gate iter5 close.
### C7 — `marfrit-packages/arch/linux-fresnel-fourier/PKGBUILD` bump
Version: `7.0-1` → `7.0-2` (pkgrel bump, pkgver unchanged since baseline stays at v7.0).
Patches array: add 4 entries pulled from `kernel-agent/patches/subsystem/media/dma-resv-release-fence/`.
If the PKGBUILD pulls patches by fetching from `kernel-agent` directly: just add the 4 filenames + sha256 sums.
If the PKGBUILD has patches inlined in `marfrit-packages/arch/linux-fresnel-fourier/`: copy the 4 .patch files into the same directory + update the source array + sha256 sums.
Phase 6 inspects current PKGBUILD shape and picks the right path.
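Whichever path Phase 6 picks, the four checksum entries can be generated rather than typed. The helper below is a sketch under the assumption that the patches sit in one local directory; `print_patch_sums` is an illustrative name, and the output is meant to be pasted into the existing `sha256sums` array (or replaced wholesale by `updpkgsums`).

```bash
# Print one quoted sha256 entry per patch in DIR, in filename order,
# with the filename as a trailing comment for easy review.
print_patch_sums() {
  local dir=$1 f sum
  for f in "$dir"/*.patch; do
    [ -e "$f" ] || continue
    sum=$(sha256sum "$f" | cut -d' ' -f1)
    printf "'%s'  # %s\n" "$sum" "$(basename "$f")"
  done
}
```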
### C8 — Build commands
Per the bootstrap README's manual ka-build substitute, on boltzmann:
```bash
ssh boltzmann
cd ~/src/kernel-agent-bootstrap # already-cloned per bootstrap state
git fetch # pull updated patches + manifest
# Or use marfrit-packages directly:
cd ~/projects/marfrit-packages/arch/linux-fresnel-fourier
makepkg -s --skipchecksums --skippgpcheck -f
```
Output artifact path predicted: `~/projects/marfrit-packages/arch/linux-fresnel-fourier/linux-fresnel-fourier-7.0-2-aarch64.pkg.tar.zst` + matching `-headers-` pkg.
If `--skipchecksums` is undesirable, regenerate via `updpkgsums`.
### C9 — Sign + publish
Per bootstrap README's `ka-sign + push` substitute:
```bash
scp boltzmann:~/projects/marfrit-packages/arch/linux-fresnel-fourier/linux-fresnel-fourier-7.0-2-*.pkg.tar.zst hertz:/tmp/ka-publish/
ssh hertz 'sudo /opt/herding/bin/marfrit-publish-arch aarch64 /tmp/ka-publish/linux-fresnel-fourier-7.0-2-aarch64.pkg.tar.zst'
ssh hertz 'sudo /opt/herding/bin/marfrit-publish-arch aarch64 /tmp/ka-publish/linux-fresnel-fourier-headers-7.0-2-aarch64.pkg.tar.zst'
```
The publish script signs with the known key, runs `repo-add`, rsyncs to nc.
### C10 — Install on fresnel
Per bootstrap README's `ka-install fresnel` substitute. Notable gotcha: HTTPS download from `packages.reauktion.de` stalls on slow wifi → LAN scp from hertz is the workaround.
**Pre-install backup**:
```bash
ssh hertz 'sudo install -d -o mfritsche -g mfritsche /sparfuxdata/kernel-agent-backups/fresnel/7.0-1/'
ssh fresnel 'sudo tar -czf /tmp/fresnel-boot-pre-install.tgz /boot/Image-fresnel-fourier /boot/dtbs-fresnel-fourier /boot/initramfs-fresnel-fourier.img /boot/extlinux/extlinux.conf'
scp fresnel:/tmp/fresnel-boot-pre-install.tgz hertz:/sparfuxdata/kernel-agent-backups/fresnel/7.0-1/
```
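Before the install step overwrites `/boot`, it may be worth confirming the backup archive actually holds every path it was asked to capture. A minimal sketch; `archive_has_members` is an illustrative helper and relies on `tar` storing members without the leading `/`.

```bash
# Check that ARCHIVE (gzip tar) contains each listed path, either as an
# exact member or as a directory prefix of some member.
archive_has_members() {
  local archive=$1 member listing
  shift
  listing=$(tar -tzf "$archive") || return 2
  for member in "$@"; do
    member=${member#/}   # tar strips the leading slash when archiving
    printf '%s\n' "$listing" | awk -v m="$member" '
      $0 == m || index($0, m "/") == 1 { found = 1 }
      END { exit !found }' || { echo "missing: $member" >&2; return 1; }
  done
}
```

Hypothetical usage on hertz: `archive_has_members /sparfuxdata/kernel-agent-backups/fresnel/7.0-1/fresnel-boot-pre-install.tgz /boot/Image-fresnel-fourier /boot/extlinux/extlinux.conf`.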
**Install** (LAN path):
```bash
ssh hertz 'scp /tmp/ka-publish/linux-fresnel-fourier-7.0-2-aarch64.pkg.tar.zst /tmp/ka-publish/linux-fresnel-fourier-headers-7.0-2-aarch64.pkg.tar.zst fresnel:/tmp/'
ssh fresnel 'sudo pacman -U /tmp/linux-fresnel-fourier-*7.0-2*.pkg.tar.zst'
ssh fresnel 'sudo mkinitcpio -p linux-fresnel-fourier' # standard hook watches vmlinuz, not Image; run manually per bootstrap learning
```
**Reboot**:
```bash
ssh fresnel 'sudo systemctl reboot'
# wait for SSH heartbeat
ssh fresnel 'uname -r' # expect: 7.0.0-fresnel-fourier (kernel suffix unchanged; package bump is pkgrel only)
ssh fresnel 'pacman -Q linux-fresnel-fourier' # expect: linux-fresnel-fourier 7.0-2
```
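The "wait for SSH heartbeat" step can be made explicit with a bounded retry loop. This is a sketch, not the bootstrap README's tooling: `wait_for` and its attempt/delay arguments are illustrative.

```bash
# Retry a command up to ATTEMPTS times, sleeping DELAY seconds between
# tries. Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" >/dev/null 2>&1 && return 0
    sleep "$delay"
  done
  return 1
}
```

Hypothetical usage: `wait_for 60 5 ssh -o BatchMode=yes -o ConnectTimeout=5 fresnel true` gives the board roughly five minutes (plus connection timeouts) to come back before the `uname -r` check.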
### C11 — Phase 7 verification matrix
Re-run the Phase 3 sweep script (`/tmp/iter5_p3/sweep.sh` on fresnel) verbatim. Expected hash matrix on post-install kernel:
| Codec | Fixture | libva hash | kdirect hash | sw hash | Expected libva==kdirect |
|---|---|---|---|---|---|
| H.264 1080p30 | `bbb_1080p30_h264.mp4` | `1e7a0bc9…` (post-fix) | `1e7a0bc9…` | `1e7a0bc9…` | **✓** |
| HEVC 720p | `bbb_720p10s_hevc.mp4` | `9340b832…` | `9340b832…` | `9340b832…` | **✓** |
| VP9 720p | `bbb_720p10s_vp9.webm` | `4f1565e8…` | `4f1565e8…` | `4f1565e8…` | **✓** |
| MPEG-2 720p | `bbb_720p10s_mpeg2.ts` | `19eefbf4…` | `19eefbf4…` | `7be8cad7…` | **✓** (libva already worked) |
| VP8 720p | `bbb_720p10s_vp8.webm` | `136ce5cb…` | `136ce5cb…` | `136ce5cb…` | **✓** |
5/5 PASS criterion green when:
- All 5 `libva == kdirect`.
- 4 of 5 `libva == sw` (H.264, HEVC, VP9, VP8). MPEG-2 stays HW≠SW (unrelated codec precision drift).
- No regression in control-payload submissions vs the iter5 Phase 3 anchors. (Strace re-run optional — kernel patches don't touch control-handling code.)
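The two hash criteria above can be evaluated mechanically. The sketch below assumes the sweep output has been reduced to whitespace-separated rows of `codec libva kdirect sw` carrying full hashes; that input format, the function name `evaluate_matrix`, and the codec label `mpeg2` are all assumptions rather than what `sweep.sh` emits.

```bash
# Read "codec libva kdirect sw" rows on stdin. Every row must have
# libva == kdirect; every row except the exempt codec must also have
# libva == sw. Prints PASS/FAIL per row and exits non-zero on any FAIL.
evaluate_matrix() {
  awk -v exempt="${1:-mpeg2}" '
    $2 != $3                 { printf "FAIL %s libva!=kdirect\n", $1; bad = 1; next }
    $1 != exempt && $2 != $4 { printf "FAIL %s libva!=sw\n", $1; bad = 1; next }
                             { printf "PASS %s\n", $1 }
    END { exit bad + 0 }
  '
}
```

The exemption encodes the MPEG-2 HW≠SW expectation explicitly, so an unexpected SW mismatch on any other codec still fails the run.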
### C12 — Phase 8 close criteria
- All 4 Phase 1 criteria (Phase 0 amendment final lock) green.
- `phase8_iteration5_close.md` summarizes 4 patches landed + manifest update + build artifact path.
- Memory updates: either fold into `reference_dmabuf_resv_blocker.md` to update status from "blocker active" to "blocker resolved on fresnel substrate," or write fresh `reference_vb2_dma_resv_opt_in_pattern.md` documenting the contract.
- Campaign scoreboard: "5/5 direct" (was "4 direct + 1 transitive").
- Phase 5 sonnet-architect review sign-off recorded.
## Risk register
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| RFC v2 helper patch v6.12 → v7.0 rebase conflict | **Eliminated** (Phase 4 dry-run = 0 conflicts) | — | — |
| dma_resv API renamed/changed in v7.0 | **Eliminated** (Phase 4 symbol check: 10/10 hits) | — | — |
| rkvdec_buf_queue refactored such that opt-in site is no longer correct | LOW | MEDIUM | Phase 4 verified the call site directly on v7.0; Phase 6 re-checks the line numbers at PKGBUILD apply time. |
| PKGBUILD pulls patches by URL with checksum, breaking when new patches added | LOW | LOW | Phase 6 regenerates with `updpkgsums`. |
| Post-install, ffmpeg-vaapi-hwdownload still returns all-zero (cap_pool race not the real cause) | **LOW-MEDIUM** | **HIGH** | If this fails, the working hypothesis (sync race) is wrong; Phase 7 → Phase 0 loopback. Cache coherency becomes the next hypothesis (per the Q&A 2026-05-11). |
| Boltzmann offline at Phase 6 build time | MEDIUM | LOW | Fallback build host: fermi (hertz LXD, ALARM aarch64). |
| Hertz publish script signs with the wrong key | LOW | LOW | `92D5E96D8F63C75E4116AA1FF5C8C4603D0D250C` is the only key; script is stable. |
| Fresnel wifi stalls on HTTPS pacman download | KNOWN ISSUE | LOW | LAN scp from hertz per C10. |
| mkinitcpio doesn't auto-trigger on `Image` (ARM kernel) | KNOWN ISSUE | LOW | Manual `mkinitcpio -p linux-fresnel-fourier` per C10. |
## What Phase 5 sonnet-architect review should attack
Top concerns to invite the reviewer to scrutinize:
1. **Is the patch ordering correct?** Helper before consumers, or do any of the consumers reference symbols that need to be exported before they exist?
2. **Is `(void)vb2_buffer_attach_release_fence(vb)` the right call shape?** Should it check the return value and bail / warn on -ENOMEM? Per the operator's hantro patch comment, "best-effort: fence-allocation failure means we lose implicit-sync precision, no functional regression" — but does that hold for rkvdec specifically (where decoder completion semantics may differ from hantro's m2m completion semantics)?
3. **Does the rkvdec consumer site (`rkvdec_buf_queue` after `v4l2_m2m_buf_queue`) match the producer-fence semantics?** Specifically: rkvdec's buf_queue runs at QBUF time, but the actual decode happens later when the m2m worker schedules. The fence needs to signal at decode-DONE, not at QBUF time. The helper attaches the fence at buf_queue and `vb2_buffer_done` signals it — does rkvdec's flow eventually call `vb2_buffer_done(VB2_BUF_STATE_DONE)` after IRQ-fire? The hantro patch's commit message asserts hantro does (`v4l2_m2m_buf_done_and_job_finish → vb2_buffer_done`). Phase 5 verifies rkvdec has the same convergence.
4. **Is `subsystem/media/dma-resv-release-fence/` the right scope tag?** kernel-agent README has `subsystem/<area>/` but the existing example was `subsystem/media/videobuf2/dma-resv-release-fence/`. The current plan flattens that to `subsystem/media/dma-resv-release-fence/` — is the deeper nesting needed?
5. **Phase 7 verification matrix completeness.** Is `libva == kdirect` sufficient as the PASS bar, or should we also require `libva == sw` for the 4 codecs we expect to match SW? The latter is stricter; iter4 transitive proof discipline says == kdirect is enough since kdirect == sw is verified separately.
6. **Risk of unintended consequence on OUTPUT-side buffers.** The vb2 helper applies to both OUTPUT and CAPTURE planes. OUTPUT-side fence attachment may interact unexpectedly with userspace producers (libva backend writes the OUTPUT bitstream). Phase 5 verifies the operator's hantro validation covered OUTPUT-side semantics.
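For concern 1, a quick mechanical pre-check is to find the first patch in the series whose added lines mention the helper symbol; if ordering is right, that should be the 0001 helper patch. A sketch only: `first_patch_adding` is an illustrative name, and a plain substring match on added lines cannot tell a definition from a call, so it supplements the review rather than replacing it.

```bash
# Print the basename of the first patch in DIR (filename order) that
# adds a line mentioning SYMBOL. Returns 1 if no patch does.
first_patch_adding() {
  local symbol=$1 dir=$2 f
  for f in "$dir"/*.patch; do
    [ -e "$f" ] || continue
    if awk -v s="$symbol" '
         /^\+\+\+ /            { next }       # skip the file header line
         /^\+/ && index($0, s) { found = 1 }  # added line mentions symbol
         END { exit !found }' "$f"; then
      basename "$f"
      return 0
    fi
  done
  return 1
}
```

Hypothetical usage: `first_patch_adding vb2_buffer_attach_release_fence patches/subsystem/media/dma-resv-release-fence`.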
## Predicted iter5 cadence
- Phase 5 review: 1 session, ~30 min review surface (small patches).
- Phase 6 implementation:
  - Push 4 patches to kernel-agent via Gitea contents API: ~15 min.
  - Update fleet/fresnel.yaml: ~5 min.
  - Update PKGBUILD: ~10 min (depends on patch-pull style).
  - Build on boltzmann: ~30-60 min wallclock for full kernel build.
  - Sign + publish via hertz: ~5 min.
  - Pre-install backup + install on fresnel: ~10 min.
  - Reboot + Phase 7 sweep: ~15 min.
- Phase 7 verification: re-run sweep, diff against expected hashes. ~10 min.
- Phase 8 close: ~30 min for the close doc + memory updates.
Total: half a day to a full day of wall-clock time, contingent on boltzmann availability + no Phase-7-loopback surprises.