iter8 Phase 0+1: lock E (perf binding cell) — campaign-closing iteration

iter8 is the final iteration. Locks Track E (carried iter1..iter7)
as the empirical-anchor closing artifact: measure CPU%, drops,
frame timing, GPU freq, memory across four consumer configurations
(mpv DMA-BUF, mpv vaapi-copy, Firefox-fourier, SW baseline) on
bbb_1080p30_h264.mp4 against the iter7-end driver.

Why now: iter1-iter7 prioritized binary blockers; measuring a broken
decoder is useless. iter7-end driver is the first stable substrate
where numbers don't drift between consumer probes.

Why this matters even without upstreaming (D dropped 2026-05-06):
- Personal regression detection for any future fork change
- Realism check on the campaign's own qualitative claims
- Calibration for follow-on campaigns (fourier-fresnel will
  compare RK3399 numbers against this anchor)

Phase 1 success criterion (5 parts):
1. Reproducible script in tests/
2. Anchored numbers in a campaign artifact
3. Honest qualitative interpretation (no spin)
4. Phase 5 sonnet review confirms script is fixture-agnostic
5. Campaign close doc states "campaign closes"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:50:32 +00:00
parent 2707725fea
commit 94fc8afcd5
+97
@@ -0,0 +1,97 @@
# Phase 0 — iteration 8 substrate (libva-multiplanar campaign — final iteration)
Opened 2026-05-06 immediately after iter7 close + post-close research. iter8 is the **campaign-closing iteration**: anchors the deliverables to measured numbers (candidate E — performance binding cell), then formally closes the campaign.
iter6 met the operator's primary goal end-to-end (Firefox HW decode of YouTube avc1 on PineTab2 with sandbox enabled). iter7 closed three internal carry items (msync verify, slot-leak recovery, cap_pool race harness). Post-iter7 research dropped Track F (DMABUF on OUTPUT) on technical grounds and Track D (upstreaming) on philosophical grounds (see `memory/project_no_upstreaming_philosophical.md`). Track E is the last remaining candidate within the campaign. Follow-on top-level campaigns (`fourier-fresnel`, `panvk-bifrost`) are chartered separately and are not part of this campaign's iteration sequence.
## Predecessor close-out summary (iteration 7 → iteration 8)
iter7 landed three fork commits:
- `988b848` — main A+B+C: slot-leak `request_pool_force_release`, cap_pool race synthetic harness in `tests/`, msync pixel-verify shell harness in `tests/`.
- `7bd0818` — Phase 7 finalization: OUTPUT-pool teardown on resolution-change in CreateSurfaces2 (latent bug surfaced by the synthetic harness).
- `dcaa1f1` — silicon-ID nomenclature fix (PineTab2 = RK3566 silicon, hantro driver via the `rockchip,rk3568-vpu` DT compatible).
iter7 carried into iter8:
- **STREAMON-on-context-recreate after resolution change** — corner case that real consumers don't trigger; low priority
- **Pool-size parameterization** — iter6 sonnet review carry, low priority
- **Fault-inject build for Track B** — empirical hard-guarantee for the slot-leak recovery code path; sonnet code-review covered semantic correctness, deferred unless concretely needed
None of those are blockers for iter8 close.
## Iteration 8 candidate research question (single track)
### E. Performance binding cell (carried iter1..iter7 — finally locked iter8)
> Anchor measured numbers for the four primary consumer paths on `bbb_1080p30_h264.mp4`. Drop count, CPU%, frame timing, GPU/VPU freq, memory footprint. Reproducible from a documented script.
**Why now**: iter1-iter7 each prioritized closing a binary blocker over measurement. Measuring a broken decoder is useless; iter7-end driver is the first stable substrate where numbers are meaningful and won't drift between consumer probes. This is the campaign's empirical anchor, the closing artifact.
**Why this matters even without upstreaming** (per `project_no_upstreaming_philosophical.md` Track D drop):
- Personal regression detection: any future change to the fork has a measured "before" to reference.
- Realism check on the campaign's own qualitative claims (iter5/iter6/iter7 closes used "GREEN" without numbers — E forces honesty about what HW decode actually saves).
- Calibrates expectations for the follow-on campaigns (`fourier-fresnel` will compare RK3399 numbers against PineTab2's anchor; `panvk-bifrost` will reference the GLES-vs-future-Vulkan delta).
**Plan**: shell script in `tests/run_perf_binding_cell.sh`. Runs each of four consumer configurations for 30s on the campaign fixture, captures:
- `pidstat -u -p <PID> 1 30` → per-second CPU% timeseries → median, p90
- `/sys/class/devfreq/fde60000.gpu/cur_freq` polled at 100ms cadence → freq residency histogram
- mpv `--term-status-msg='${frame-drop-count} ${time-pos} ${vsync-jitter}'` → drops + actual position + jitter
- Firefox: `top -p` snapshot of the RDD process during steady-state playback, since `about:processes` can't be scraped programmatically
- `/proc/<PID>/status` VmRSS at start + end → memory delta
- Optional: `/sys/kernel/debug/...hantro...` if exposed
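
The reductions in the capture list above (median and p90 from a `pidstat` timeseries, a frequency residency histogram, an RSS reading) could be sketched roughly as follows. This is a minimal POSIX-shell sketch, not the campaign script: every function name here is hypothetical, the percentile uses the nearest-rank definition (an assumption, the plan does not name one), and the `pidstat` parser assumes the usual sysstat layout where a header row carries a `%CPU` column.

```shell
# Hypothetical helpers; none of these names come from the campaign repo.

# percentile P: read one number per line on stdin and print the
# nearest-rank P-th percentile (P is an integer from 1 to 100).
percentile() {
    LC_ALL=C sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            k = int((p * NR + 99) / 100)   # ceil(p * NR / 100)
            if (k < 1) k = 1
            print v[k]
        }'
}

# pidstat_cpu: filter `pidstat -u` output on stdin down to the per-sample
# %CPU values. The column is located from the header row, so the parser
# does not depend on the timestamp format; "Average:" rows are skipped.
pidstat_cpu() {
    awk '
        /^Average:/ { next }
        { for (i = 1; i <= NF; i++) if ($i == "%CPU") { col = i; next } }
        col && NF >= col && $col ~ /^[0-9.]+$/ { print $col }'
}

# poll_freq NODE SECONDS: print the devfreq node about every 100 ms.
# The cadence is approximate (sleep plus cat overhead) and assumes a
# sleep that accepts fractional seconds. Uses globals i and n.
poll_freq() {
    n=$(( $2 * 10 ))
    i=0
    while [ "$i" -lt "$n" ]; do
        cat "$1"
        sleep 0.1
        i=$(( i + 1 ))
    done
}

# freq_histogram: turn cur_freq samples (Hz, one per line) on stdin into
# a residency histogram in MHz.
freq_histogram() {
    LC_ALL=C sort -n | uniq -c |
        awk '{ printf "%d MHz: %d samples\n", $2 / 1000000, $1 }'
}

# vmrss_kib PID: print the process's resident set size in KiB.
vmrss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}
```

A call such as `pidstat -u -p "$pid" 1 30 | pidstat_cpu | percentile 90` would then give the p90 CPU% for one consumer, and `poll_freq /sys/class/devfreq/fde60000.gpu/cur_freq 30 | freq_histogram` the residency histogram. Sampling `vmrss_kib` at start and end gives the two values for the memory delta.
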
Four consumer configurations:
1. **mpv `--hwdec=vaapi`** — DMA-BUF zero-copy path (full HW)
2. **mpv `--hwdec=vaapi-copy`** — HW decode + VAImage readback to userspace
3. **Firefox 150 (iter5-amend, sandbox enabled)** — production HW path through libva
4. **mpv `--hwdec=no` (SW baseline)** — control
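
The three mpv configurations differ only in their `--hwdec` value, so the script could drive them from one small lookup. The sketch below is illustrative only: the consumer labels and function names are hypothetical, and `--no-config` and `timeout` (coreutils) are assumptions added for clean, bounded runs. Firefox is left out because the plan samples it separately through `top`.

```shell
# consumer_hwdec NAME: map a (hypothetical) consumer label to the mpv
# --hwdec value named in the configuration list above.
consumer_hwdec() {
    case "$1" in
        mpv-dmabuf)     echo "vaapi" ;;
        mpv-vaapi-copy) echo "vaapi-copy" ;;
        mpv-sw)         echo "no" ;;
        *)              return 1 ;;
    esac
}

# run_mpv_consumer NAME FIXTURE SECONDS: play FIXTURE for SECONDS with
# the status line from the plan. The status message is single-quoted so
# the shell passes ${...} through to mpv unexpanded.
run_mpv_consumer() {
    hwdec=$(consumer_hwdec "$1") || {
        echo "unknown consumer: $1" >&2
        return 2
    }
    timeout "$3" mpv --no-config --hwdec="$hwdec" \
        --term-status-msg='${frame-drop-count} ${time-pos} ${vsync-jitter}' \
        "$2"
}
```

For example, `run_mpv_consumer mpv-sw "$fixture" 30` would run the SW baseline. Note that `timeout` exits with status 124 when it ends playback, so a caller should not treat that as a failure.
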
**Risk**: low. Measurement-only. No driver code changes.
**Effort**: 3-4 hours including script + run + parsing + markdown table generation.
## In-scope (LOCKED 2026-05-06 for iteration 8) — E
Operator locked **E** as the sole iter8 track. iter8 is the campaign-closing iteration.
D (upstreaming) was dropped 2026-05-06 on philosophical grounds (`memory/project_no_upstreaming_philosophical.md`).
F (DMABUF on OUTPUT) was dropped 2026-05-06 on technical grounds (`track_F_research_2026-05-06.md`).
A, B, C closed iter7. iter1-iter6 carries all closed.
iter7 carries (STREAMON-on-context-recreate, pool-size parameterization, slot-leak fault-inject) remain as low-priority items in the campaign-close doc, not iter8 scope.
## Out-of-scope (LOCKED 2026-05-06 for iteration 8)
- iter1-iter7 completed work — done.
- Codecs outside H.264 (MPEG-2 dropped iter6, others out per iter1 lock).
- New target hardware (fresnel, ampere) — separate top-level campaigns.
- Upstreaming — dropped on philosophical grounds.
- DMABUF on OUTPUT — dropped on technical grounds.
- Driver code changes — measurement only.
## Phase 1 success criterion (LOCKED 2026-05-06 for iteration 8)
> 1. **Reproducible measurement script** committed to `tests/run_perf_binding_cell.sh` (or similar) that runs each of the four consumer configurations for ≥30 seconds against `bbb_1080p30_h264.mp4` on ohm and emits a markdown-formatted table with the following columns per row: consumer, CPU% median, CPU% p90, drops in measurement window, p50 frame interval (ms), GPU freq median (MHz), VmRSS delta (MiB).
>
> 2. **Anchored numbers** for all four consumers captured into a campaign artifact (`phase7_iter8_perf_anchor.md` or similar). Numbers must come from a clean ohm run on the iter7-end driver (sha `54999017…` or rebuild from iter7 HEAD `7bd0818`).
>
> 3. **Honest qualitative interpretation** in the close doc. If the numbers are uglier than expected (e.g., HW decode only saves 30% browser CPU rather than 80%), document that. The campaign's prior qualitative descriptors get re-anchored to the actual data.
>
> 4. **Phase 5 sonnet review** confirms: (a) script is fixture-agnostic (works for any H.264 file the operator passes), (b) measurements aren't fixture-hardcoded, (c) results are presented honestly without spin.
>
> 5. **Campaign close doc** (`phase8_iteration8_close.md`) explicitly states "campaign closes" and lists residual carries for any future operator who picks this up.
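
Criteria 1 and 4 together imply a script that takes the fixture as an argument and prints one table row per consumer. A minimal sketch of that output side, with hypothetical function names and the column set copied from criterion 1, might look like this:

```shell
# require_fixture PATH: fail early if the operator-supplied file is not
# readable. Nothing about the fixture is hardcoded.
require_fixture() {
    [ -r "$1" ] || {
        echo "fixture not readable: $1" >&2
        return 1
    }
}

# print_table_header: emit the markdown header for the criterion-1 columns.
print_table_header() {
    echo "| consumer | CPU% median | CPU% p90 | drops | p50 frame interval (ms) | GPU freq median (MHz) | VmRSS delta (MiB) |"
    echo "|---|---|---|---|---|---|---|"
}

# print_table_row: emit one row; expects exactly seven arguments in the
# same order as the header columns.
print_table_row() {
    printf '| %s | %s | %s | %s | %s | %s | %s |\n' "$@"
}

# vmrss_delta_mib START_KIB END_KIB: convert two VmRSS readings (KiB, as
# reported in /proc/<PID>/status) into a delta in MiB.
vmrss_delta_mib() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / 1024 }'
}
```

The script entry point would call `require_fixture "$1"` before any measurement, then `print_table_header` once and `print_table_row` per consumer. Passing exactly seven values per row is the caller's responsibility in this sketch.
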
## Phase 1 LOCKED. Iteration 8 proceeds.
iter8 = candidate **E** alone. Phases 2..8 + campaign-close:
- Phase 2: situation analysis — measurement methodology, parsing approach, edge cases (the SW baseline is expected to drop dozens of frames at 1080p30; Firefox numbers are limited to what we can scrape without Firefox-internal hooks)
- Phase 3: baseline anchor — quick smoke run of `pidstat` + `/sys/class/devfreq` polling on ohm to confirm tooling availability
- Phase 4: implement `tests/run_perf_binding_cell.sh`
- Phase 5: sonnet review
- Phase 6: deploy script (commit + sync to ohm)
- Phase 7: run, capture, generate table
- Phase 8: close iteration AND close campaign
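
The Phase 3 tooling check could be as small as the sketch below. The function names are hypothetical, and the tool list in the usage note is an assumption drawn from the capture plan rather than a confirmed inventory of what is installed on ohm.

```shell
# check_tools TOOL...: report every tool that is not on PATH; returns
# non-zero if any is missing. Uses the global variable `missing`.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# check_node PATH: confirm a sysfs node (e.g. the devfreq cur_freq file)
# is readable before committing to a 30 s measurement run.
check_node() {
    [ -r "$1" ] || {
        echo "node not readable: $1" >&2
        return 1
    }
}
```

On the target this might be invoked as `check_tools pidstat mpv top awk && check_node /sys/class/devfreq/fde60000.gpu/cur_freq`, so that a missing dependency is caught before Phase 4 work begins.
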
## Stop point
After Phase 8 close, the campaign formally closes. Future operator-initiated work would re-open as a new top-level campaign (e.g., `fourier-fresnel` per `project_followon_campaigns.md`).