libva-multiplanar/phase0_findings_iter7.md
claude-noether 2707725fea iter7+: drop candidate D (upstreaming) — philosophical, not technical
Operator's stance, recorded verbatim in
memory/project_no_upstreaming_philosophical.md: the AI-slop-buster
review climate in 2026 open-source maintainership makes submission
cost > benefit when personal requirements are met. Multiple iterations
of substantive work (sonnet pre/post-commit reviews, formal pixel
verification, regression test harnesses, clean commit history) don't
necessarily survive first contact with reviewers who treat
AI-assisted = automatic slop regardless of substance.

Track D was carried iter3+4+5+6 as a possible culminating iteration.
Dropped 2026-05-06.

Distinct from the prior feedback_no_upstream.md rule (which was
procedural — "no PR without explicit instruction"); the new memory
entry records the underlying philosophical reason. Procedurally,
the new instruction is "don't ever, regardless."

Reopen criterion documented in the memory entry: operator may change
mind if the climate softens, or if a trusted maintainer signals
"send it." Until then, none.

Remaining iter8+ candidate: E (performance binding cell) only.
Plus iter7 carries (low priority): STREAMON-on-context-recreate,
pool-size parameterization, fault-inject build for slot-leak.
Plus separate top-level campaigns: fourier-fresnel, panvk-bifrost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:46:33 +00:00

# Phase 0 — iteration 7 substrate (libva-multiplanar campaign)
Opened 2026-05-06 immediately after iter6 close. iter6 closed GREEN with a single architectural fix (per-OUTPUT-slot request_fd binding via REINIT, fork commit `a09c03c`) that resolved the merged AI scope. Both candidate I (Firefox VIDIOC_QBUF EINVAL) and candidate A (cap_pool resolution-change race) closed in the same fix; the resolution-change race was organically exercised by YouTube's quality-renegotiation `cap_pool_init` events (4 events on a ~95s YT avc1 run, all clean).
Operator's primary goal — Firefox HW decode of YouTube avc1 on PineTab2 — is now MET end-to-end. Remaining campaign work is polish, formal verification, and upstream-prep.
## Predecessor close-out summary (iteration 6 → iteration 7)
iter6 landed a single fork commit:
- `a09c03c` — per-OUTPUT-slot request_fd binding via `MEDIA_REQUEST_IOC_REINIT`. Replaces iter4's `385dee1` close+`media_request_alloc`-per-frame model. Pool size 4 → 16. Slot owns the fd, surface borrows. iter4's case-against-REINIT was confirmed to be a DPB-payload confounder (since fixed in `74d8dd1`).
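The difference between the two models can be sketched with the two media-request ioctls involved. This is a minimal illustration, assuming a Linux build with `<linux/media.h>`; `slot_alloc` and `slot_reinit` are hypothetical helper names, not functions from the fork. Only `MEDIA_IOC_REQUEST_ALLOC` and `MEDIA_REQUEST_IOC_REINIT` are kernel UAPI.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/media.h>

/* Allocate a fresh request fd from an open media device (iter4 did this
 * once per frame; under the per-slot model it happens once per slot).
 * Returns the request fd, or a negative errno. */
static int slot_alloc(int media_fd)
{
    int request_fd = -1;

    if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &request_fd) < 0)
        return -errno;
    assert(request_fd >= 0);
    return request_fd;
}

/* Recycle an existing request fd in place instead of close() + realloc.
 * The kernel rejects this with EBUSY while the request is still queued
 * and not yet completed. Returns 0 on success or a negative errno. */
static int slot_reinit(int request_fd)
{
    if (ioctl(request_fd, MEDIA_REQUEST_IOC_REINIT) < 0)
        return -errno;
    return 0;
}
```

Both helpers return a negative errno rather than -1 so that a caller can tell `EBUSY` (request still in flight) apart from `EBADF` or `EINVAL`.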
iter6 dropped MPEG-2 from the carry list (CPU handles it fine on the PineTab2 A55 cluster).
iter6 carried into iter7:
- **msync pixel-correctness verification** (carry from iter5 Phase 5 sonnet C3)
- **Slot-leak error recovery** (iter6 internal carry — `request_pool_force_release` for the rare case where REINIT or DQBUF fails mid-cycle)
- **Probe-pattern test harness for cap_pool race** (carry from iter5 sonnet C4 — race is organically exercised by YT but a synthetic test would anchor the claim formally)
- **WiFi-IRQ frame drops** (out-of-campaign system concern; flagged but not on iter7 scope)
## Iteration 7 candidate research questions
### A. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)
> Confirm decoded frames are byte-identical (or visually-correct-and-deterministic) post-iter5-sweep, after `msync(MS_SYNC | MS_INVALIDATE)` was removed alongside the iter1 patch-0010 hex-dump diagnostic.
**Plan**: 100-frame `vaapi-copy` run on `bbb_1080p30_h264.mp4`, capture md5/sha of each decoded YUV plane, compare against (a) iter1 baseline (msync-present), (b) FFmpeg software decode reference, (c) iter5-end baseline.
**Risk**: low. If frames diverge, msync goes back in. If frames match, formally close iter5 sonnet C3.
**Effort**: 1-2 hours including writing the frame-hash harness.
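A minimal sketch of the comparison core of such a harness, assuming tightly packed NV12 (stride equal to width, even dimensions) and both streams already loaded into memory. The plan calls for md5/sha256 per frame; this sketch substitutes a direct `memcmp`, which answers the same byte-identity question without a crypto dependency. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes in one tightly packed NV12 frame: a full-resolution Y plane
 * followed by an interleaved UV plane with half the rows.
 * Assumes even width/height and no row padding. */
static size_t nv12_frame_size(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + width * (height / 2);
}

/* Compare two raw NV12 streams frame by frame.
 * Returns the index of the first differing frame, or -1 if the first
 * `frames` frames are byte-identical. Both buffers must hold at least
 * frames * nv12_frame_size(width, height) bytes. */
static long first_mismatched_frame(const unsigned char *hw,
                                   const unsigned char *sw,
                                   size_t width, size_t height,
                                   size_t frames)
{
    const size_t fsz = nv12_frame_size(width, height);

    for (size_t i = 0; i < frames; i++) {
        if (memcmp(hw + i * fsz, sw + i * fsz, fsz) != 0)
            return (long)i;
    }
    return -1;
}
```

Reporting the first mismatching index rather than a bare pass/fail makes it easier to tell a stale-cache problem that appears mid-run from a systematic packing difference that would show up at frame 0.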
### B. Slot-leak error recovery (iter6 internal carry)
> Add `request_pool_force_release(pool, slot_index)` that REINITs the slot's fd and resets the slot's `busy` flag to false, callable from RequestSyncSurface error paths.
Currently, when REINIT or DQBUF fails mid-cycle, the slot stays `busy=true` until `RequestTerminate`. With pool=16 and rare errors this is bounded, but it is a slow leak that would eventually starve acquire under sustained-error scenarios.
**Plan**: add `request_pool_force_release` to request_pool.{c,h}. Call from surface.c error paths. Verify with a fault-injection harness (return `-EBUSY` from REINIT in some test scenario).
**Risk**: low — additive function, doesn't change happy path.
**Effort**: 1 hour code + smoke test.
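The leak and the proposed recovery can be sketched with a toy pool. Everything here is an assumption for illustration: the struct layout, the `pool_*` names, and the `reinit_fn` hook (which stands in for the REINIT ioctl so that a fault can be injected) are hypothetical and do not reflect the fork's actual `request_pool.{c,h}`. In particular, freeing a slot even when REINIT fails is one possible policy, not a confirmed design.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define POOL_SLOTS 16 /* pool size after iter6 */

/* Hypothetical hook standing in for the REINIT ioctl;
 * returns 0 on success or a negative errno. */
typedef int (*reinit_fn)(int request_fd);

struct slot { int request_fd; bool busy; };
struct pool { struct slot slots[POOL_SLOTS]; reinit_fn reinit; };

/* Fault-injection stubs: one healthy, one that always fails. */
static int reinit_ok(int request_fd)    { (void)request_fd; return 0; }
static int reinit_ebusy(int request_fd) { (void)request_fd; return -EBUSY; }

static void pool_init(struct pool *p, reinit_fn reinit)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        p->slots[i].request_fd = -1; /* placeholder; a real pool stores an allocated fd */
        p->slots[i].busy = false;
    }
    p->reinit = reinit;
}

/* Returns the index of the first free slot, or -1 when the pool is starved. */
static int pool_acquire(struct pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->slots[i].busy) {
            p->slots[i].busy = true;
            return i;
        }
    }
    return -1;
}

/* Happy-path release: the slot is freed only if REINIT succeeds.
 * On failure it stays busy, which is the leak described above. */
static int pool_release(struct pool *p, int idx)
{
    assert(idx >= 0 && idx < POOL_SLOTS);
    int ret = p->reinit(p->slots[idx].request_fd);
    if (ret == 0)
        p->slots[idx].busy = false;
    return ret;
}

/* Error-path release: attempt REINIT, then free the slot regardless and
 * hand the REINIT result back so the caller can log or escalate. */
static int pool_force_release(struct pool *p, int idx)
{
    assert(idx >= 0 && idx < POOL_SLOTS);
    int ret = p->reinit(p->slots[idx].request_fd);
    p->slots[idx].busy = false;
    return ret;
}
```

With `reinit_ebusy` installed, sixteen acquire/release cycles leave every slot busy and the next `pool_acquire` returns -1; a single `pool_force_release` makes that slot acquirable again.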
### C. Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A formal-anchor)
> Write a synthetic test program that exercises the `vaCreateSurfaces(small) → vaCreateSurfaces(big)` resolution-change pattern that originally triggered REQBUFS-EBUSY. Confirms iter6's REINIT discipline holds the cap_pool race closed in a deterministic-repro form, not just organically via YT.
**Plan**: 50-line C program using libva: open device, create context, create 4 surfaces at 128×128, vaPutSurface noop, destroy, create 4 surfaces at 1920×1080, decode bbb's first I-frame, sha256 the output. No EBUSY events expected in driver stderr.
**Risk**: low. Test harness only; doesn't change driver.
**Effort**: 2-3 hours including writing test, fixturing bbb's first I-frame as raw H.264 NAL.
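The "no EBUSY events in driver stderr" check can be made mechanical once the harness has captured the driver's stderr into a buffer. A minimal sketch, assuming the driver's log lines contain the literal marker `REQBUFS-EBUSY`; the exact log format is not specified here, and `count_marker` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `marker` in the NUL-terminated
 * text `log`. The harness would pass captured driver stderr as `log`
 * and require a count of zero for the marker under test. */
static size_t count_marker(const char *log, const char *marker)
{
    const size_t mlen = strlen(marker);
    size_t n = 0;

    assert(mlen > 0);
    for (const char *p = strstr(log, marker); p != NULL;
         p = strstr(p + mlen, marker))
        n++;
    return n;
}
```

Counting rather than stopping at the first hit keeps the same helper usable for positive checks, such as confirming that the expected number of `cap_pool_init` events occurred during the small-to-big surface transition.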
### D. Bootlin / Mozilla upstreaming prep (DROPPED 2026-05-06)
Carried iter3+4+5+6 as candidate. **Dropped 2026-05-06** on operator's philosophical-stance grounds: the AI-slop-buster review climate in 2026 open-source maintainership makes the social cost of submission exceed the benefit when personal requirements are already met. See `memory/project_no_upstreaming_philosophical.md` for the operator-verbatim rationale.
This is a **stance change** vs the campaign's prior `feedback_no_upstream.md` rule. Procedurally that rule said "no PR/MR without operator instruction"; the new memory entry records that the instruction is now "don't ever, regardless." The campaign deliverables stay on `git.reauktion.de` for personal use + reference.
### E. Performance binding cell (carried from iter1+2+3+4+5+6)
> Anchor measured numbers for the four primary consumer paths: mpv-vaapi DMA-BUF, mpv-vaapi-copy, Firefox-fourier HW, SW baseline. Drop count, CPU%, frame timing, GPU freq, on bbb_1080p30. Reproducible from a documented script.
**Plan**: shell script that runs each consumer for 30s, captures `pidstat -u -p <PID>`, reads `/sys/class/devfreq/fde60000.gpu/cur_freq`, parses mpv stats / Firefox `about:processes`, and generates a markdown table. Carried six iterations now.
**Risk**: low. Measurement only.
**Effort**: 3-4 hours including script + run + table.
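If part of the measurement ends up in C rather than shell, the devfreq sample reduces to parsing one decimal number. A minimal sketch, assuming `cur_freq` contains a single decimal value (taken here to be in Hz) followed by an optional newline; `parse_cur_freq` is a hypothetical helper, and reading the sysfs file itself is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse the text of a devfreq cur_freq file: one decimal number,
 * optionally followed by a newline. Stores the value in *hz and
 * returns 0, or returns -EINVAL on empty or malformed input. */
static int parse_cur_freq(const char *text, unsigned long long *hz)
{
    char *end = NULL;

    assert(text != NULL && hz != NULL);
    errno = 0;
    unsigned long long v = strtoull(text, &end, 10);
    if (errno != 0 || end == text || (*end != '\0' && *end != '\n'))
        return -EINVAL;
    *hz = v;
    return 0;
}
```

Rejecting trailing garbage instead of silently truncating it keeps a misread sysfs node from producing a plausible-looking number in the results table.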
### F. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5+6)
> Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation for the OUTPUT pool. Architectural fix for the iter2 cap_pool race (statistical / LRU mitigation now superseded by iter6's REINIT discipline, but DMABUF is structurally cleaner).
**Risk**: highest unknown of any candidate. Possibly requires kernel work. Hantro on this kernel may not accept DMABUF on OUTPUT.
**Effort**: 1-3 days, possibly more.
### G. WiFi-IRQ frame drops (NEW from iter6, out-of-campaign-scope flagged)
> ohm shows visible frame drops during YouTube playback whenever the brcm/iwlwifi driver does heavy IRQ work. Decode pipeline is fine; presentation schedule slips.
**Stance**: out of campaign scope (system-level concern, not a libva-multiplanar bug). Listed here because operator surfaced it during iter6 verification. If desired, separate investigation into IRQ affinity, GRO offload, network-stack buffer settings.
### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)
> Open the `fourier-fresnel` campaign — port `libva-v4l2-request-fourier` from ohm PineTab2 (RK3566 via hantro/rk3568-vpu) to fresnel RK3399 (Pinebook Pro). Validates generality of iter1-iter6 fixes on a second hardware target.
**Stance**: separate top-level campaign, not an iter7 candidate. Charter at `~/src/fourier-fresnel/` once opened. iter5 memory entry `project_followon_campaigns.md` records sequencing: fourier-fresnel before panvk-bifrost.
### I. panvk-bifrost campaign (carried from iter5 followon-campaigns memory)
> Charter at `~/src/panvk-bifrost/` already exists as document-only. Sequenced after fourier-fresnel.
**Stance**: separate top-level campaign.
### Recommended pairings
- **A + B** (msync verify + slot-leak fix) — both small, both formally close iter6's carry items. Tightest scope.
- **A + C** (msync verify + cap_pool race harness) — formally closes the two iter5 sonnet caveats. Mid scope.
- **A + B + C** — closes all three internal carry items in one iteration. Reasonable upper bound.
- **D alone** — gated on operator instruction. Big effort but the campaign's culmination.
- **E alone** — anchors campaign-wide claims to numbers. Carried six iterations now.
- **F alone** — high-risk architectural.
**Recommended primary**: **A + B + C** — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants the upstream-filing iteration first or the perf-anchor iteration first.
**Alternative**: **D alone** if operator wants to make iter7 the upstream-prep / external-filing iteration. Carries political weight (Mozilla + bootlin) but minimal new code.
## State that carries (re-verified iter6 close)
- **Hardware**: ohm PineTab2 (Rockchip RK3566 silicon; hantro driver via `rockchip,rk3568-vpu` DT compatible), kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky.
- **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`.
- **Build container**: firefox-fourier LXD on boltzmann, persistent.
- **Diagnostic instrument**: `/home/mfritsche/iter6-fork-dx/` on ohm — full fork tree with ITER6_DX log instrumentation, can be reactivated for iter7 candidate B fault-injection or candidate C harness wiring.
## State that does NOT carry
- iter6 telemetry logs are tmpfs-volatile.
- The iter5-end (pre-iter6) driver binary is backed up at `/home/mfritsche/v4l2_request_drv_video.so.iter5end.bak`; the iter6-DX build with diagnostics is at `/home/mfritsche/iter6-fork-dx/build/src/v4l2_request_drv_video.so` if needed for re-instrumentation.
## Tooling and measurement-instrument inventory
Same as iter6. Plus for new candidates:
- For A (msync verify): `vaapi-copy` already produces NV12 output to mpv's vo=null path. Add an mpv `--frames=N --o=output.yuv` script to capture raw YUV; sha256 each frame. FFmpeg's `ffmpeg -i bbb -vframes N output_%d.yuv` provides the SW reference.
- For B (slot-leak): synthesize a fault-inject by adding a debug toggle in surface.c that returns `-EBUSY` from REINIT after N frames; observe pool starvation, then add force_release and verify recovery.
- For C (cap_pool harness): write a 50-line libva test program. Reuse driver_data inspection from iter5 Track E test pattern.
- For D (upstream): no new tooling; existing patch series live in `firefox-fourier/` and the fork's git log iter1-iter6.
- For E (perf): `pidstat -u`, `/sys/class/devfreq/fde60000.gpu`, mpv stats overlay, Firefox `about:processes`.
- For F (DMABUF): kernel docs `Documentation/userspace-api/media/v4l/buffer.rst`, hantro driver source `drivers/media/platform/verisilicon/` (formerly `drivers/staging/media/hantro/` on older kernels).
## In-scope (LOCKED 2026-05-06 for iteration 7) — A + B + C
Operator locked **A + B + C**: msync pixel-correctness verification, slot-leak error recovery, and cap_pool-race synthetic test harness. Closes all three iter5/iter6 internal carry items in one iteration.
E (perf binding cell) and F (V4L2_MEMORY_DMABUF) are deferred to iter8+. D (upstreaming) was dropped 2026-05-06 (see candidate D above). G (WiFi-IRQ frame drops) remains out-of-campaign-scope. H and I are separate top-level campaigns.
## Out-of-scope (LOCKED 2026-05-06 for iteration 7)
- iter6-completed work (per-slot REINIT discipline) — done.
- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
- iter5 amendment (Utility seccomp / Firefox-fourier sandbox) — done.
- iter4-completed work (Track A frame-11 EINVAL fix) — done.
- iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter7 candidate D.
- New codecs OUTSIDE H.264 (VP8/VP9/AV1/HEVC out per iter1 lock; MPEG-2 dropped at iter6 close).
- New target hardware (fresnel, ampere) — separate campaigns (H, I above).
- WiFi-IRQ frame drops — system-level, not libva-multiplanar.
## Phase 1 success criterion (LOCKED 2026-05-06 for iteration 7)
> All three sub-tracks must independently pass on the iter7-end driver build:
>
> **A — msync pixel-correctness verification**
> - 100-frame `mpv --hwdec=vaapi-copy --o=output_%04d.yuv` against `bbb_1080p30_h264.mp4`.
> - Frame-by-frame sha256 of the captured YUV planes compared against FFmpeg SW decode reference (`ffmpeg -i bbb -frames:v 100 -f rawvideo -pix_fmt nv12 -`).
> - **Pass:** all 100 frames match SW reference byte-for-byte (or visually-identical with documented bit-precision delta if the kernel's NV12 packing differs trivially from FFmpeg's). Formally closes iter5 sonnet C3.
> - **Fail action:** restore `msync(MS_SYNC | MS_INVALIDATE)` in the surface DQBUF path; re-run; verify match. Document either way.
>
> **B — Slot-leak error recovery**
> - `request_pool_force_release(pool, slot_index)` added to request_pool.{c,h}; REINITs the slot's fd and resets the slot's `busy` flag to false.
> - Called from `RequestSyncSurface` error paths after `media_request_reinit` or `DQBUF` failure.
> - Synthetic fault-injection: a debug compile flag returns `-EBUSY` from REINIT after N frames. Pre-fix: pool starves after 16 errors. Post-fix: pool recovers; decode continues across error events.
> - mpv-vaapi-copy 100-frame regression test still GREEN (no regression on the happy path).
>
> **C — Probe-pattern test harness for cap_pool race**
> - C program at `tests/cap_pool_probe_pattern.c` (~50 lines) using libva: open device, `vaCreateContext`, `vaCreateSurfaces(128×128, 4)`, dispose, `vaCreateSurfaces(1920×1080, 4)`, decode bbb's first I-frame, sha256 the output.
> - **Pass:** zero `REQBUFS-EBUSY` events in driver stderr; decoded frame sha matches FFmpeg SW reference for the same I-frame; harness exits 0.
> - Formally anchors iter5 sonnet C4 / iter6 candidate A — the race that was organically exercised by YouTube's resolution renegotiations is now also covered by a deterministic synthetic test.
>
> Phase 5 sonnet review must explicitly confirm: (a) any restored msync (if A required it) is correctly placed, (b) `request_pool_force_release` doesn't introduce new mutable global state or break the pool's invariants, (c) the cap_pool harness is a real test (not just a fixture-hardcoded check that passes trivially).
## Phase 1 LOCKED. Iteration 7 proceeds.
iter7 = A + B + C combined. Phases 2..8:
- Phase 2: situation analysis for each track (A/B/C) — what we expect to find, what tools needed, what could go wrong
- Phase 3: baseline anchor — capture pre-fix state for each (A: current frame hashes vs SW; B: current pool starvation under fault inject; C: current behavior on probe pattern)
- Phase 4: execute. Order: B (smallest, additive) → C (synthetic test, no driver change) → A (verification — runs against the iter7-end driver including any B/C changes)
- Phase 5: sonnet review of combined diff before commit
- Phase 6: deploy iter7 driver to ohm
- Phase 7: verify all three tracks against locked criteria above
- Phase 8: close
After lock, iter7 phases 2..8 proceed autonomously per "Stop only if user is needed."