iter8 Phase 0+1: lock E (perf binding cell) — campaign-closing iteration

iter8 is the final iteration. Locks Track E (carried iter1..iter7) as the empirical-anchor closing artifact: measure CPU%, drops, frame timing, GPU freq, memory across four consumer configurations (mpv DMA-BUF, mpv vaapi-copy, Firefox-fourier, SW baseline) on bbb_1080p30_h264.mp4 against the iter7-end driver.

Why now: iter1-iter7 prioritized binary blockers; measuring a broken decoder is useless. iter7-end driver is the first stable substrate where numbers don't drift between consumer probes.

Why this matters even without upstreaming (D dropped 2026-05-06):
- Personal regression detection for any future fork change
- Realism check on the campaign's own qualitative claims
- Calibration for follow-on campaigns (fourier-fresnel will compare RK3399 numbers against this anchor)

Phase 1 success criterion (5 parts):
1. Reproducible script in tests/
2. Anchored numbers in a campaign artifact
3. Honest qualitative interpretation (no spin)
4. Phase 5 sonnet review confirms script is fixture-agnostic
5. Campaign close doc states "campaign closes"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:50:32 +00:00
parent 2707725fea
commit 94fc8afcd5
1 changed files with 97 additions and 0 deletions
@@ -0,0 +1,97 @@
+# Phase 0 — iteration 8 substrate (libva-multiplanar campaign — final iteration)
+
+Opened 2026-05-06 immediately after iter7 close + post-close research. iter8 is the **campaign-closing iteration**: anchors the deliverables to measured numbers (candidate E — performance binding cell), then formally closes the campaign.
+
+iter6 met the operator's primary goal end-to-end (Firefox HW decode of YouTube avc1 on PineTab2 with sandbox enabled). iter7 closed three internal carry items (msync verify + slot-leak recovery + cap_pool race harness). Post-iter7 research dropped Track F (DMABUF on OUTPUT, on technical grounds) and Track D (upstreaming, on philosophical grounds — see `memory/project_no_upstreaming_philosophical.md`). Track E is the last remaining candidate within the campaign. Follow-on top-level campaigns (`fourier-fresnel`, `panvk-bifrost`) are chartered separately and are not part of this campaign's iteration sequence.
+
+## Predecessor close-out summary (iteration 7 → iteration 8)
+
+iter7 landed these fork commits:
+
+- `988b848` — main A+B+C: slot-leak `request_pool_force_release`, cap_pool race synthetic harness in `tests/`, msync pixel-verify shell harness in `tests/`.
+- `7bd0818` — Phase 7 finalization: OUTPUT-pool teardown on resolution-change in CreateSurfaces2 (latent bug surfaced by the synthetic harness).
+- `dcaa1f1` — silicon-ID nomenclature fix (PineTab2 = RK3566 silicon, hantro driver via the `rockchip,rk3568-vpu` DT compatible).
+
+iter7 carried into iter8:
+- **STREAMON-on-context-recreate after resolution change** — corner case (real consumers don't trigger), low priority
+- **Pool-size parameterization** — iter6 sonnet review carry, low priority
+- **Fault-inject build for Track B** — empirical hard-guarantee for the slot-leak recovery code path; sonnet code-review covered semantic correctness, deferred unless concretely needed
+
+None of those are blockers for iter8 close.
+
+## Iteration 8 candidate research question (single track)
+
+### E. Performance binding cell (carried iter1..iter7 — finally locked iter8)
+
+> Anchor measured numbers for the four primary consumer paths on `bbb_1080p30_h264.mp4`. Drop count, CPU%, frame timing, GPU/VPU freq, memory footprint. Reproducible from a documented script.
+
+**Why now**: iter1-iter7 each prioritized closing a binary blocker over measurement. Measuring a broken decoder is useless; iter7-end driver is the first stable substrate where numbers are meaningful and won't drift between consumer probes. This is the campaign's empirical anchor, the closing artifact.
+
+**Why this matters even without upstreaming** (per `project_no_upstreaming_philosophical.md` Track D drop):
+- Personal regression detection: any future change to the fork has a measured "before" to reference.
+- Realism check on the campaign's own qualitative claims (iter5/iter6/iter7 closes used "GREEN" without numbers — E forces honesty about what HW decode actually saves).
+- Calibrates expectations for the follow-on campaigns (`fourier-fresnel` will compare RK3399 numbers against PineTab2's anchor; `panvk-bifrost` will reference the GLES-vs-future-Vulkan delta).
+
+**Plan**: shell script in `tests/run_perf_binding_cell.sh`. Runs each of four consumer configurations for 30s on the campaign fixture, captures:
+- `pidstat -u -p <PID> 1 30` → per-second CPU% timeseries → median, p90
+- `/sys/class/devfreq/fde60000.gpu/cur_freq` polled at 100ms cadence → freq residency histogram
+- mpv `--term-status-msg='${frame-drop-count} ${time-pos} ${vsync-jitter}'` → drops + actual position + jitter
+- Firefox via `top -p` snapshot during steady-state playback (RDD process) since `about:processes` isn't programmatically scrapeable
+- `/proc/<PID>/status` VmRSS at start + end → memory delta
+- Optional: `/sys/kernel/debug/...hantro...` if exposed
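The two reductions named in the capture list (median/p90 over a per-second CPU% series, and a frequency residency histogram) can be sketched as below. This is a minimal sketch, not the campaign's script: the helper names `percentile` and `freq_residency` are hypothetical, the percentile uses the nearest-rank definition (so the median of an even-length series is the lower middle value), and fractional `sleep` intervals such as `0.1` are assumed to be supported by the target's `sleep`. Extracting the `%CPU` column from `pidstat` output is deliberately left out, since the column position depends on the sysstat version and time format.

```shell
#!/bin/sh
# Force the C locale so sort/awk parse decimal points consistently.
LC_ALL=C
export LC_ALL

# percentile P
# Reads one number per line on stdin and prints the nearest-rank
# P-th percentile (P is an integer from 1 to 100).
percentile() {
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            # Nearest rank: k = ceil(p * NR / 100), clamped to at least 1.
            k = int((p * NR + 99) / 100)
            if (k < 1) k = 1
            print v[k]
        }'
}

# freq_residency FILE SAMPLES INTERVAL
# Reads FILE (e.g. a devfreq cur_freq node) SAMPLES times, sleeping
# INTERVAL seconds between reads, and prints "value count" per distinct
# value, sorted by value. The count divided by SAMPLES is the residency.
freq_residency() {
    file=$1 samples=$2 interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        cat "$file"
        sleep "$interval"
        i=$((i + 1))
    done | sort -n | uniq -c | awk '{ print $2, $1 }'
}
```

Under these assumptions, a 30 s window at the 100 ms cadence would be `freq_residency /sys/class/devfreq/fde60000.gpu/cur_freq 300 0.1`, and a CPU% column saved one value per line to a file could be reduced with `percentile 50 < cpu.txt` and `percentile 90 < cpu.txt` (the file name is illustrative).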
+
+Four consumer configurations:
+1. **mpv `--hwdec=vaapi`** — DMA-BUF zero-copy path (full HW)
+2. **mpv `--hwdec=vaapi-copy`** — HW decode + VAImage readback to userspace
+3. **Firefox 150 (iter5-amend, sandbox enabled)** — production HW path through libva
+4. **mpv `--hwdec=no` (SW baseline)** — control
+
+**Risk**: low. Measurement-only. No driver code changes.
+
+**Effort**: 3-4 hours including script + run + parsing + markdown table generation.
+
+## In-scope (LOCKED 2026-05-06 for iteration 8) — E
+
+Operator locked **E** as the sole iter8 track. iter8 is the campaign-closing iteration.
+
+D (upstreaming) was dropped 2026-05-06 on philosophical grounds (`memory/project_no_upstreaming_philosophical.md`).
+F (DMABUF on OUTPUT) was dropped 2026-05-06 on technical grounds (`track_F_research_2026-05-06.md`).
+A, B, C closed iter7. iter1-iter6 carries all closed.
+
+iter7 carries (STREAMON-on-context-recreate, pool-size parameterization, slot-leak fault-inject) remain as low-priority items in the campaign-close doc, not iter8 scope.
+
+## Out-of-scope (LOCKED 2026-05-06 for iteration 8)
+
+- iter1-iter7 completed work — done.
+- Codecs outside H.264 (MPEG-2 dropped iter6, others out per iter1 lock).
+- New target hardware (fresnel, ampere) — separate top-level campaigns.
+- Upstreaming — dropped on philosophical grounds.
+- DMABUF on OUTPUT — dropped on technical grounds.
+- Driver code changes — measurement only.
+
+## Phase 1 success criterion (LOCKED 2026-05-06 for iteration 8)
+
+> 1. **Reproducible measurement script** committed to `tests/run_perf_binding_cell.sh` (or similar) that runs each of the four consumer configurations for ≥30 seconds against `bbb_1080p30_h264.mp4` on ohm and emits a markdown-formatted table with the following columns per row: consumer, CPU% median, CPU% p90, drops in measurement window, p50 frame interval (ms), GPU freq median (MHz), VmRSS delta (MiB).
+>
+> 2. **Anchored numbers** for all four consumers captured into a campaign artifact (`phase7_iter8_perf_anchor.md` or similar). Numbers must come from a clean ohm run on the iter7-end driver (sha `54999017…` or rebuild from iter7 HEAD `7bd0818`).
+>
+> 3. **Honest qualitative interpretation** in the close doc. If the numbers are uglier than expected (e.g., HW decode only saves 30% browser CPU rather than 80%), document that. The campaign's prior qualitative descriptors get re-anchored to the actual data.
+>
+> 4. **Phase 5 sonnet review** confirms: (a) script is fixture-agnostic (works for any H.264 file the operator passes), (b) measurements aren't fixture-hardcoded, (c) results are presented honestly without spin.
+>
+> 5. **Campaign close doc** (`phase8_iteration8_close.md`) explicitly states "campaign closes" and lists residual carries for any future operator who picks this up.
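The table shape required by criterion 1 can be sketched as follows. This is an illustrative sketch only: `emit_header`, `emit_row`, and `rss_delta_mib` are hypothetical helper names, and the sketch assumes the VmRSS values were read in kB from `/proc/<PID>/status` at the start and end of the window. The "drops" header abbreviates "drops in measurement window".

```shell
#!/bin/sh
# emit_header: print the markdown header for the seven columns
# listed in the Phase 1 success criterion.
emit_header() {
    printf '| %s | %s | %s | %s | %s | %s | %s |\n' \
        'consumer' 'CPU% median' 'CPU% p90' 'drops' \
        'p50 frame interval (ms)' 'GPU freq median (MHz)' 'VmRSS delta (MiB)'
    printf '|---|---|---|---|---|---|---|\n'
}

# emit_row CONSUMER CPU_MED CPU_P90 DROPS FRAME_MS GPU_MHZ RSS_MIB
# Prints one table row; expects exactly seven arguments.
emit_row() {
    [ "$#" -eq 7 ] || return 1
    printf '| %s | %s | %s | %s | %s | %s | %s |\n' "$@"
}

# rss_delta_mib START_KB END_KB
# Converts two VmRSS readings (kB) into a delta in MiB, one decimal place.
rss_delta_mib() {
    awk -v s="$1" -v e="$2" 'BEGIN { printf "%.1f\n", (e - s) / 1024 }'
}
```

Passing every cell as a `%s` argument rather than embedding it in the format string keeps a literal `%` in a value from being misread by `printf`. Because nothing in these helpers refers to a particular file or consumer, the fixture path can remain a script argument, which is what criterion 4 asks the review to confirm.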
+
+## Phase 1 LOCKED. Iteration 8 proceeds.
+
+iter8 = candidate **E** alone. Phases 2..8 + campaign-close:
+- Phase 2: situation analysis — measurement methodology, parsing approach, edge cases (the SW baseline is expected to drop dozens of frames at 1080p30; Firefox numbers are limited to what can be scraped without Firefox-internal hooks)
+- Phase 3: baseline anchor — quick smoke run of `pidstat` + `/sys/class/devfreq` polling on ohm to confirm tooling availability
+- Phase 4: implement `tests/run_perf_binding_cell.sh`
+- Phase 5: sonnet review
+- Phase 6: deploy script (commit + sync to ohm)
+- Phase 7: run, capture, generate table
+- Phase 8: close iteration AND close campaign
+
+## Stop point
+
+After Phase 8 close, the campaign formally closes. Future operator-initiated work would re-open as a new top-level campaign (e.g., `fourier-fresnel` per `project_followon_campaigns.md`).