iter9 phase 0: lock cap_pool/REQBUFS/REINIT cascade as the question
Campaign reopen: iter8's "campaign-closing" status was contingent on "mpv --hwdec=vaapi smooth", which doesn't hold against fresh-install interactive testing.

iter9 single-track scope:
- Bug #1 (libva-v4l2-request-fourier#1) only
- mpv H.264 fresh-login through ≥30s of decode without any of: cap_pool double-init, REQBUFS EBUSY, REINIT bad-fd, OUTPUT ENOMEM
- Phase 0 will source-read cap_pool + request_pool + iter6 REINIT, build a vo=null reproduction harness, prepare bisect against iter5 baseline, and a libva-direct C probe for minimal repro

Bug #2 (presentation green) is dmabuf-modifier-triage's job; peer campaign opened 2026-05-08 at ~/src/dmabuf-modifier-triage/. README cross-link now points at it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -58,7 +58,7 @@ The iter8 close (`65969da3`) was packaged as `libva-v4l2-request-fourier-1.0.0.r
|
|||||||
|
|
||||||
1. **libva cap_pool / REQBUFS / iter6-REINIT lifecycle cascade** — filed at [marfrit/libva-v4l2-request-fourier#1](https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/issues/1). Under `mpv --hwdec=vaapi` interactive playback, `cap_pool_init` runs twice for slots 0..23 (probe-context + decode-context, no teardown between), `VIDIOC_REQBUFS` returns EBUSY (queue still STREAMON'd), iter6's per-OUTPUT-slot REINIT (commit `a09c03c`) chokes on a Bad fd, OUTPUT queue (`type=9`) hits ENOMEM after a few REQBUFS retries, decode aborts with `Failed to create surface: 2 (resource allocation failed)`. The Phase-5-sonnet-C4 caveat from iter5 (`cap_pool resolution-change race latent under untested consumer probe patterns`) was prescient — this is exactly that race, made hard-failing by iter6/7's additions. iter9 input.
|
|
||||||
|
|
||||||
2. **dmabuf-wayland↔KWin presentation handoff produces solid green** — independent of libva. Filed at [marfrit/libva-multiplanar#1](https://git.reauktion.de/marfrit/libva-multiplanar/issues/1). `mpv --hwdec=v4l2request --vo=dmabuf-wayland` (libva-bypassed, ffmpeg's native V4L2 hwaccel) **also** produces solid green frames on ohm/KWin 6. `mpv --hwdec=v4l2request --vo=gpu` displays the correct picture (slow, GPU shader path on Mali-G52). So the green is dmabuf-wayland-specific, not decoder-side. KWin's `linux-dmabuf-v1` advertised list contains only `NV12 LINEAR (modifier 0x0)` — likely an NV12 modifier/pitch handshake regression somewhere between iter5 close (2026-05-05, "mpv `--hwdec=vaapi` smooth") and now. Could be on the libva side (vaExportSurfaceHandle modifier reporting), the ffmpeg side (drm_prime export modifier), or KWin/Mesa-panfrost upgrade since 2026-05-05.
|
2. **dmabuf-wayland↔KWin presentation handoff produces solid green** — independent of libva. Filed at [marfrit/libva-multiplanar#1](https://git.reauktion.de/marfrit/libva-multiplanar/issues/1); triage moved to dedicated peer campaign [`~/src/dmabuf-modifier-triage/`](../dmabuf-modifier-triage/) (Gitea: [marfrit/dmabuf-modifier-triage](https://git.reauktion.de/marfrit/dmabuf-modifier-triage)) opened 2026-05-08. Smoking gun identified at scaffold time: kwin-fourier currently ships `0001-transaction-bypass-watchDmaBuf-fence-wait.patch` active, which bypasses KWin's implicit-sync fence wait on dmabufs — a runtime-observable race that fits the symptom. Triage Phase 0 item 1 is the stock-kwin A/B that decides it. Doesn't gate libva-multiplanar iter9.
|
||||||
|
|
||||||
**Working ohm HW-decode path right now (workaround):**
|
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,77 @@
|
|||||||
|
# Phase 0 — iteration 9 substrate (libva-multiplanar campaign — REOPEN)
|
||||||
|
|
||||||
|
Opened **2026-05-08** after the iter8 production-tip artifact (`libva-v4l2-request-fourier-1.0.0.r280.65969da-1`, shipped to `[marfrit]` 2026-05-08) was found to fail under fresh-install interactive mpv H.264 playback on ohm. iter8's "campaign-closing" status (per `phase0_findings_iter8.md` line 3) was contingent on the iter5/8 close claim of "mpv `--hwdec=vaapi` smooth" — that claim does not hold against a fresh-install + fresh-Plasma-session test path.
|
||||||
|
|
||||||
|
**This is a campaign reopen, not a continuation.** iter8 still represents the validated state under the test paths it was measured against (the `tests/run_perf_binding_cell.sh` harness). iter9 exists to address what the harness didn't catch.
|
||||||
|
|
||||||
|
## Predecessor close-out summary (iter8 → iter9)
|
||||||
|
|
||||||
|
iter8 landed three fork commits on top of iter7:
|
||||||
|
|
||||||
|
- `dcaa1f1` (2026-05-06) — docs/silicon-ID fix (PineTab2 = RK3566 silicon).
|
||||||
|
- `65969da` (2026-05-06) — `tests/run_perf_binding_cell.sh` harness for measured per-consumer drop/CPU/freq/memory numbers.
|
||||||
|
- (iter8 close commit not in fork log; close artifact `phase8_iteration8_close.md` records GREEN for E on 2026-05-06.)
|
||||||
|
|
||||||
|
iter8 then sat for two days. On 2026-05-08:
|
||||||
|
|
||||||
|
- The fork was packaged as `libva-v4l2-request-fourier` (PKGBUILD at `~/src/marfrit-packages/arch/libva-v4l2-request-fourier/`), pinned to `_commit=65969da`. CI built and published to `[marfrit]` via Gitea Actions run #65 success. ohm pulled the package via `pacman -Syu`. `[marfrit]` repo enabled in `/etc/pacman.conf`. `/etc/profile.d/libva-v4l2-request.sh` exports `LIBVA_DRIVER_NAME=v4l2_request` + `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1` + `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0`.
|
||||||
|
- `vainfo` confirmed that the new driver loads cleanly and enumerates the H.264 + MPEG-2 profile list (same shape as predecessor).
|
||||||
|
- Interactive `mpv --hwdec=vaapi --vo=dmabuf-wayland fourier-test/bbb_1080p30_h264.mp4` immediately hit the **cap_pool / REQBUFS / REINIT cascade** described in [marfrit/libva-v4l2-request-fourier#1](https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/issues/1).
|
||||||
|
- Separately: even the `--hwdec=v4l2request` (libva-bypassed) path produced solid green frames via `--vo=dmabuf-wayland`, isolated to a different bug. That second bug moved to its own peer campaign at [`~/src/dmabuf-modifier-triage/`](../dmabuf-modifier-triage/) and does **not** gate iter9.
|
||||||
|
|
||||||
|
## Locked research question — iter9
|
||||||
|
|
||||||
|
> **Triage and fix the probe-then-decode lifecycle cascade exposed in interactive mpv H.264 playback at the iter8 production tip — fresh login through to ≥30s of decode without any of the following events: `cap_pool_init` firing twice for overlapping slot ranges in a single mpv invocation, `VIDIOC_REQBUFS` returning EBUSY, `Unable to reinit media request: Bad file descriptor`, `Unable to create buffer for type 9: No buffer space available`. The campaign re-closes only when this test path passes from a freshly-logged-in Plasma session.**
|
||||||
|
|
||||||
|
This is the test path iter5/8 closes implicitly claimed worked. iter9 makes the claim explicit and verifiable.
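The four disqualifying events in the locked question can be checked mechanically against a captured log. The following is a minimal sketch, not the harness itself. The two quoted error strings are taken verbatim from the question above; the `cap_pool_init` and `REQBUFS` patterns are assumptions about how the driver logs those events and must be checked against real output. The function name `scan_cascade_signatures` is hypothetical.

```shell
#!/bin/sh
# Sketch only: reads a captured log on stdin, prints one line per cascade
# signature that fired, and returns non-zero if any of them did.

scan_cascade_signatures() (
    log=$(cat)
    hits=0

    # ASSUMPTION: every cap_pool_init call logs exactly one line containing
    # the token "cap_pool_init". If the driver logs per slot, count differently.
    inits=$(printf '%s\n' "$log" | grep -c 'cap_pool_init' || true)
    if [ "$inits" -gt 1 ]; then
        echo "cap_pool double-init ($inits init lines)"
        hits=1
    fi

    # ASSUMPTION: a failing REQBUFS is logged on one line together with
    # either the errno name or its strerror() text.
    if printf '%s\n' "$log" | grep -Eq 'REQBUFS.*(EBUSY|Device or resource busy)'; then
        echo "REQBUFS EBUSY"
        hits=1
    fi

    # Verbatim strings from the locked research question.
    if printf '%s\n' "$log" | grep -Fq 'Unable to reinit media request: Bad file descriptor'; then
        echo "REINIT bad fd"
        hits=1
    fi
    if printf '%s\n' "$log" | grep -Fq 'Unable to create buffer for type 9: No buffer space available'; then
        echo "OUTPUT buffer allocation failure"
        hits=1
    fi

    exit "$hits"
)
```

A caller would pipe the combined stdout/stderr of one mpv invocation into the function and treat a non-zero return as a failed run. Counting `cap_pool_init` lines is deliberately crude: it cannot tell overlapping slot ranges from disjoint ones, so a real harness would need to parse the slot range once the log format is known.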
|
||||||
|
|
||||||
|
## Hypothesis space (Phase 0 must read source to confirm)
|
||||||
|
|
||||||
|
Three layers can produce the observed cascade:
|
||||||
|
|
||||||
|
1. **`cap_pool` lifecycle in `src/cap_pool.c` + callers.** Two `cap_pool_init` events for slot range 0..23 in close succession before the first decoded frame strongly suggests probe-context + decode-context double-init without teardown between. mpv's VA-API call sequence is roughly: `vaInitialize` → `vaQueryConfigProfiles` → `vaCreateConfig` → `vaCreateContext` → `vaCreateSurfaces` → decode loop. If `cap_pool_init` is wired into `vaCreateContext` rather than `vaCreateConfig`, both the probe context and the actual context would init separately and require teardown to be symmetric.
|
||||||
|
|
||||||
|
2. **`request_pool` lifecycle + iter6's REINIT.** "Unable to reinit media request: Bad file descriptor" is a direct iter6 output (commit `a09c03c`, "iter6 fix: per-OUTPUT-slot request_fd binding via REINIT"). EBADF here suggests the request fd was already closed (or was never valid) by the time REINIT ran. Possible causes: request_pool teardown closes the fd unconditionally, or the iter7 slot-leak fix (commit `988b848`, which adds `request_pool_force_release`) mistakenly closes a still-bound request fd.
|
||||||
|
|
||||||
|
3. **VIDIOC_REQBUFS without prior STREAMOFF.** EBUSY on REQBUFS means the queue is in `STREAMING` state. The fork's STREAMOFF call sites need to be audited — every `REQBUFS(count=N)` after a previous successful `REQBUFS(count=M, M>0)` must be preceded by `STREAMOFF` if the queue was started in between.
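Hypothesis 3 can be cross-checked from outside the driver by auditing the ioctl order in a syscall trace. The sketch below assumes an `strace`-style log in which each line looks like `ioctl(FD, VIDIOC_..., ...V4L2_BUF_TYPE_...) = RESULT`; that line shape is an assumption and should be confirmed against the strace version on ohm. The function name `audit_reqbufs_order` is hypothetical. It tracks streaming state per fd and buffer type, and flags any `VIDIOC_REQBUFS` issued while that queue is still streaming.

```shell
#!/bin/sh
# Sketch only: reads an strace-style ioctl log on stdin and reports every
# REQBUFS that arrives while the same (fd, buffer type) queue is streaming.
# ASSUMPTION: one ioctl per line, buffer type printed symbolically,
# successful calls end in "= 0".

audit_reqbufs_order() {
    awk '
    BEGIN { bad = 0 }
    {
        # Buffer type, e.g. V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE.
        if (!match($0, /V4L2_BUF_TYPE_[A-Z_]+/)) next
        type = substr($0, RSTART, RLENGTH)

        # File descriptor: the digits right after "ioctl(" (6 characters).
        if (!match($0, /ioctl\([0-9]+/)) next
        fd = substr($0, RSTART + 6, RLENGTH - 6)

        key = fd ":" type
        if ($0 ~ /VIDIOC_STREAMON/ && $0 ~ /= 0$/) {
            streaming[key] = 1
        } else if ($0 ~ /VIDIOC_STREAMOFF/ && $0 ~ /= 0$/) {
            streaming[key] = 0
        } else if ($0 ~ /VIDIOC_REQBUFS/ && streaming[key]) {
            print "REQBUFS while streaming: fd=" fd " type=" type
            bad = 1
        }
    }
    END { exit bad }
    '
}
```

The check is keyed on the fd number, so it does not notice an fd being closed and reused for a different device; a trace that includes `close` calls would need extra handling. It only confirms or rules out the ordering problem and says nothing about which call site in the fork is responsible.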
|
||||||
|
|
||||||
|
## Phase 0 will deliver
|
||||||
|
|
||||||
|
1. **Source-read of `cap_pool` + `request_pool` + `surface.c` + `context.c`** at commit `65969da`. Output: `phase0_iter9_source_read.md` capturing the actual call graph: which `vaXxx` entry point triggers `cap_pool_init`, when STREAMON happens, when STREAMOFF happens, when REINIT happens, who owns each fd. Read against the iter5 sweep commits (`951233a`, `848fc0c`, `843febc`, `d3a299b`, `c8b6ede`, `b993355`) and the iter6/7 fix commits (`a09c03c`, `988b848`, `7bd0818`) so the diff that introduced the leak is identifiable.
|
||||||
|
|
||||||
|
2. **Reproduction harness — `tests/run_iter9_lifecycle_repro.sh`.** Wraps a single `mpv --hwdec=vaapi --vo=null --frames=300 ...` call (vo=null isolates the bug from any presentation issues) with `LIBVA_TRACE` capture, parses output for the four cascade signatures, exits non-zero if any signature fires. Anchored to `bbb_1080p30_h264.mp4`. **Critical:** must launch from a fresh subshell with no leftover env / VA state so probe-then-decode lifecycle is exercised.
|
||||||
|
|
||||||
|
3. **Bisection plan against the iter5..iter8 commit range.** If the source-read in (1) doesn't unambiguously identify the regression-introducing commit, prepare a `git bisect` script using the harness in (2) so phase 4 can mechanically narrow.
|
||||||
|
|
||||||
|
4. **iter5 close re-validation.** Re-run the harness from (2) against the iter5-state commit (`c8b6ede` = "iter5 sweep follow-up"). Two outcomes — both useful:
|
||||||
|
- If iter5 also fails → the bug pre-dates iter6/7's additions and the iter5 close was over-claimed.
|
||||||
|
- If iter5 passes → iter6 (request_fd REINIT) or iter7 (slot-leak) introduced the regression. Bisect (3) narrows further.
|
||||||
|
|
||||||
|
5. **Sanity check against a minimal libva-direct decode probe.** Build a small C harness that calls `vaInitialize` + `vaQueryConfigProfiles` + `vaCreateConfig` + `vaCreateContext` + `vaCreateSurfaces` + a single decode + teardown, all directly via libva (no mpv, no ffmpeg). If the harness reproduces the cascade without mpv's complexity, the test surface area for phase 4's fix shrinks dramatically.
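Items 2 and 3 above could be wired together roughly as follows. This is a sketch under stated assumptions, not the delivered scripts: `run_in_clean_env`, `bisect_verdict`, `bisect_step`, `BUILD_CMD`, and `HARNESS_CMD` are hypothetical names, the fork's actual build command is not specified here, and treating "no leftover env" as `env -i` plus the three variables from `/etc/profile.d/libva-v4l2-request.sh` is one possible reading. The exit-code mapping follows `git bisect run`: 0 marks the commit good, 125 skips it, and other codes from 1 to 127 mark it bad.

```shell
#!/bin/sh
# Sketch only. BUILD_CMD and HARNESS_CMD are HYPOTHETICAL variables the
# caller sets to the real build and harness commands.

# Run a command with an emptied environment plus the driver variables that
# /etc/profile.d/libva-v4l2-request.sh exports on ohm.
run_in_clean_env() {
    env -i \
        PATH="$PATH" HOME="${HOME:-/}" \
        LIBVA_DRIVER_NAME=v4l2_request \
        LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
        LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
        "$@"
}

# Map build and harness results onto the codes git bisect run understands.
bisect_verdict() {
    build_rc=$1
    harness_rc=$2
    if [ "$build_rc" -ne 0 ]; then
        return 125   # commit cannot be tested: skip it
    fi
    if [ "$harness_rc" -ne 0 ]; then
        return 1     # cascade fired (or harness crashed): bad
    fi
    return 0         # clean run: good
}

# One bisect step: build the checked-out commit, then run the harness in a
# clean environment. The harness is not run if the build fails.
bisect_step() {
    build_rc=0
    sh -c "$BUILD_CMD" || build_rc=$?
    harness_rc=0
    if [ "$build_rc" -eq 0 ]; then
        run_in_clean_env sh -c "$HARNESS_CMD" || harness_rc=$?
    fi
    bisect_verdict "$build_rc" "$harness_rc"
}
```

Saved as a script that ends by calling `bisect_step`, this could be driven with `git bisect start 65969da c8b6ede` followed by `git bisect run <script>`, but only if item 4 shows that `c8b6ede` actually passes; otherwise there is no known-good endpoint. Collapsing every harness failure to exit code 1 is a deliberate choice: a harness killed by a signal would return a code above 127, which would make `git bisect run` abort instead of marking the commit bad.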
|
||||||
|
|
||||||
|
After Phase 0 closes, Phase 1 will replicate the baseline from items 2 + 4 (per `feedback_replicate_baseline_first.md`). Phase 2 will do a source deep-dive on whichever layer item 1 identifies. Phase 3 will write the deterministic regression test. Phase 4 will fix. Phase 5 review will be sonnet.
|
||||||
|
|
||||||
|
## In-scope (LOCKED 2026-05-08 for iteration 9)
|
||||||
|
|
||||||
|
Single-track. Decoder-side cascade only.
|
||||||
|
|
||||||
|
- Bug #1 ([libva-v4l2-request-fourier#1](https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/issues/1)) only.
|
||||||
|
- Test fixture: `~/fourier-test/bbb_1080p30_h264.mp4` (already on ohm).
|
||||||
|
- Target host: ohm. fresnel sits this iteration out: fresnel-fourier is a peer campaign with a separate iteration cadence, and whatever iter9 fixes on ohm will be inherited automatically when fresnel-fourier rebuilds against the same fork master.
|
||||||
|
|
||||||
|
## Out-of-scope (LOCKED 2026-05-08 for iteration 9)
|
||||||
|
|
||||||
|
- Bug #2 ([libva-multiplanar#1](https://git.reauktion.de/marfrit/libva-multiplanar/issues/1)) — owned by `~/src/dmabuf-modifier-triage/`. Not gating iter9.
|
||||||
|
- Performance / measurement. iter8's perf binding cell is already in the fork (`tests/run_perf_binding_cell.sh`) and re-runs as part of any future iteration close. iter9 only needs to demonstrate that the cascade no longer fires; numbers are not a deliverable.
|
||||||
|
- Other consumers (Firefox, chromium-fourier, vainfo). mpv is the consumer that surfaced the bug; mpv is the consumer iter9 closes against. Sweep to other consumers is iter10's call.
|
||||||
|
- Other codecs (HEVC, VP9). H.264 only.
|
||||||
|
- Other hardware. ohm only.
|
||||||
|
- Upstreaming. Per `feedback_no_upstream.md`.
|
||||||
|
|
||||||
|
## Reference history
|
||||||
|
|
||||||
|
- `phase0_findings.md` — original campaign Phase 0 substrate.
|
||||||
|
- `phase0_findings_iter[2-8].md` — per-iter substrate.
|
||||||
|
- `phase8_iteration[1-8]_close.md` — per-iter close artifacts. Re-read iter5 + iter7 + iter8 closes specifically; the iter9 hypothesis space refers back to their explicit fix commits.
|
||||||
|
- [`~/src/libva-multiplanar/libva-v4l2-request-fourier/`](libva-v4l2-request-fourier/): fork at `master = 229d6d1` today (with fresnel-fourier MPEG-2 commits past `65969da`); iter9 work happens on this master, possibly on a feature branch if the fix becomes large enough to warrant one.
|
||||||
|
- [`~/src/marfrit-packages/arch/libva-v4l2-request-fourier/PKGBUILD`](../marfrit-packages/arch/libva-v4l2-request-fourier/PKGBUILD) — bump `_commit` after iter9 close + close validation passes.
|
||||||