libva-multiplanar/phase0_findings_iter8.md
claude-noether 94fc8afcd5 iter8 Phase 0+1: lock E (perf binding cell) — campaign-closing iteration
iter8 is the final iteration. Locks Track E (carried iter1..iter7)
as the empirical-anchor closing artifact: measure CPU%, drops,
frame timing, GPU freq, memory across four consumer configurations
(mpv DMA-BUF, mpv vaapi-copy, Firefox-fourier, SW baseline) on
bbb_1080p30_h264.mp4 against the iter7-end driver.

Why now: iter1-iter7 prioritized binary blockers; measuring a broken
decoder is useless. iter7-end driver is the first stable substrate
where numbers don't drift between consumer probes.

Why this matters even without upstreaming (D dropped 2026-05-06):
- Personal regression detection for any future fork change
- Realism check on the campaign's own qualitative claims
- Calibration for follow-on campaigns (fourier-fresnel will
  compare RK3399 numbers against this anchor)

Phase 1 success criterion (5 parts):
1. Reproducible script in tests/
2. Anchored numbers in a campaign artifact
3. Honest qualitative interpretation (no spin)
4. Phase 5 sonnet review confirms script is fixture-agnostic
5. Campaign close doc states "campaign closes"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:50:32 +00:00


Phase 0 — iteration 8 substrate (libva-multiplanar campaign — final iteration)

Opened 2026-05-06, immediately after the iter7 close and post-close research. iter8 is the campaign-closing iteration: it anchors the deliverables to measured numbers (candidate E, the performance binding cell) and then formally closes the campaign.

iter6 met the operator's primary goal end-to-end (Firefox HW decode of YouTube avc1 on PineTab2 with sandbox enabled). iter7 closed three internal carry items (msync verify, slot-leak recovery, and the cap_pool race harness). Post-iter7 research dropped Track F (DMABUF on OUTPUT, on technical grounds) and Track D (upstreaming, on philosophical grounds; see memory/project_no_upstreaming_philosophical.md). Track E is the last remaining candidate within the campaign. Follow-on top-level campaigns (fourier-fresnel, panvk-bifrost) are chartered separately and are not part of this campaign's iteration sequence.

Predecessor close-out summary (iteration 7 → iteration 8)

iter7 landed three commits:

  • 988b848 — main A+B+C: slot-leak request_pool_force_release, cap_pool race synthetic harness in tests/, msync pixel-verify shell harness in tests/.
  • 7bd0818 — Phase 7 finalization: OUTPUT-pool teardown on resolution-change in CreateSurfaces2 (latent bug surfaced by the synthetic harness).
  • dcaa1f1 — silicon-ID nomenclature fix (PineTab2 = RK3566 silicon, hantro driver via the rockchip,rk3568-vpu DT compatible).

iter7 carried into iter8:

  • STREAMON-on-context-recreate after resolution change — corner case (real consumers don't trigger it), low priority
  • Pool-size parameterization — iter6 sonnet review carry, low priority
  • Fault-inject build for Track B — empirical hard guarantee for the slot-leak recovery code path; the sonnet code review already covered semantic correctness, so this is deferred unless concretely needed

None of those are blockers for iter8 close.

Iteration 8 candidate research question (single track)

E. Performance binding cell (carried iter1..iter7 — finally locked iter8)

Anchor measured numbers for the four primary consumer paths on bbb_1080p30_h264.mp4. Drop count, CPU%, frame timing, GPU/VPU freq, memory footprint. Reproducible from a documented script.

Why now: iter1-iter7 each prioritized closing a binary blocker over measurement. Measuring a broken decoder is useless; the iter7-end driver is the first stable substrate where numbers are meaningful and won't drift between consumer probes. This is the campaign's empirical anchor and its closing artifact.

Why this matters even without upstreaming (per project_no_upstreaming_philosophical.md Track D drop):

  • Personal regression detection: any future change to the fork has a measured "before" to reference.
  • Realism check on the campaign's own qualitative claims (iter5/iter6/iter7 closes used "GREEN" without numbers — E forces honesty about what HW decode actually saves).
  • Calibrates expectations for the follow-on campaigns (fourier-fresnel will compare RK3399 numbers against PineTab2's anchor; panvk-bifrost will reference the GLES-vs-future-Vulkan delta).

Plan: shell script in tests/run_perf_binding_cell.sh. It runs each of the four consumer configurations for 30 s on the campaign fixture and captures:

  • pidstat -u -p <PID> 1 30 → per-second CPU% timeseries → median, p90
  • /sys/class/devfreq/fde60000.gpu/cur_freq polled at 100ms cadence → freq residency histogram
  • mpv --term-status-msg='${frame-drop-count} ${time-pos} ${vsync-jitter}' → drops + actual position + jitter
  • Firefox: top -p snapshot of the RDD process during steady-state playback, since about:processes isn't programmatically scrapeable
  • /proc/<PID>/status VmRSS at start + end → memory delta
  • Optional: /sys/kernel/debug/...hantro... if exposed
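The post-processing of the captures above might look roughly like this. It is a Python sketch of the parsing logic only. The planned script is shell, and all function names here are hypothetical. The sketch assumes pidstat runs in a locale where %CPU uses a decimal point (for example LC_ALL=C). It reads the %CPU column position from the header line rather than hardcoding it, and it uses the nearest-rank convention for p90, which is one of several percentile definitions.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Tuple


def parse_pidstat_cpu(text: str) -> List[float]:
    """Extract per-interval %CPU samples from `pidstat -u -p <PID> 1 30` output.

    The %CPU column index comes from the header line, because the column set
    differs between sysstat versions and 12h/24h time formats. The banner
    line and "Average:" summary lines are skipped.
    """
    samples: List[float] = []
    cpu_col = None
    ncols = 0
    for line in text.splitlines():
        tokens = line.split()
        if "%CPU" in tokens:  # header line (pidstat may repeat it)
            cpu_col = tokens.index("%CPU")
            ncols = len(tokens)
            continue
        if cpu_col is None or not tokens or tokens[0].startswith("Average"):
            continue
        # Command is the last column and may contain spaces ("RDD Process"),
        # so allow extra trailing tokens but never fewer than the header has.
        if len(tokens) < ncols:
            continue
        try:
            samples.append(float(tokens[cpu_col]))
        except ValueError:
            continue
    return samples


def nearest_rank(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_cpu(samples: List[float]) -> Tuple[float, float]:
    """(median, p90) of the per-second CPU% timeseries."""
    return statistics.median(samples), nearest_rank(samples, 90)


def freq_residency(samples_hz: List[int]) -> Dict[int, float]:
    """Fraction of polls spent at each frequency, keyed by MHz.

    devfreq's cur_freq is reported in Hz. With a fixed polling cadence
    (100 ms in the plan), the share of polls approximates the share of time.
    """
    counts = Counter(hz // 1_000_000 for hz in samples_hz)
    total = len(samples_hz)
    return {mhz: n / total for mhz, n in sorted(counts.items())}


def parse_vmrss_kib(status_text: str) -> int:
    """VmRSS from the text of /proc/<PID>/status (its "kB" unit is KiB)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line (kernel thread, or process already exited)")
```

Reading the column index from the header, instead of using a fixed offset, keeps the parser working across sysstat versions that add or drop columns such as %wait.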

Four consumer configurations:

  1. mpv --hwdec=vaapi — DMA-BUF zero-copy path (full HW)
  2. mpv --hwdec=vaapi-copy — HW decode + VAImage readback to userspace
  3. Firefox 150 (iter5-amend, sandbox enabled) — production HW path through libva
  4. mpv --hwdec=no (SW baseline) — control
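The three mpv rows differ only in --hwdec, so one argv builder and one status-line parser can cover them. This is a Python sketch. The config labels are hypothetical, and --end (mpv's stop-at-time option) is an added choice for bounding the run that the plan does not specify. Firefox is left out because the plan drives it separately. The parser tolerates non-numeric tokens, because mpv substitutes an error string when a property cannot be read.

```python
from typing import List, Optional, Tuple

# Hypothetical labels for the three mpv rows; Firefox (row 3) is handled
# separately because the plan does not launch it from a per-run argv.
MPV_HWDEC = {
    "mpv-dmabuf": "vaapi",           # 1. DMA-BUF zero-copy path
    "mpv-vaapi-copy": "vaapi-copy",  # 2. HW decode + readback
    "mpv-sw": "no",                  # 4. SW baseline (control)
}

STATUS_MSG = "${frame-drop-count} ${time-pos} ${vsync-jitter}"


def mpv_argv(config: str, fixture: str, seconds: int = 30) -> List[str]:
    """Command line for one mpv consumer configuration."""
    return [
        "mpv",
        f"--hwdec={MPV_HWDEC[config]}",
        f"--end={seconds}",  # stop playback at the end of the window
        f"--term-status-msg={STATUS_MSG}",
        fixture,
    ]


def _number(token: str) -> Optional[float]:
    try:
        return float(token)
    except ValueError:
        # mpv expands an unreadable property to an error string; vsync-jitter
        # in particular may be unavailable depending on the video-sync mode.
        return None


def parse_status(line: str) -> Tuple[int, Optional[float], Optional[float]]:
    """Split one status line into (frame drops, time-pos in s, vsync jitter)."""
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError(f"unexpected status line: {line!r}")
    return int(tokens[0]), _number(tokens[1]), _number(tokens[2])
```

Keeping the status format in one constant means the argv and the parser cannot drift apart when a property is added.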

Risk: low. Measurement-only. No driver code changes.

Effort: 3-4 hours including script + run + parsing + markdown table generation.

In-scope (LOCKED 2026-05-06 for iteration 8) — E

Operator locked E as the sole iter8 track. iter8 is the campaign-closing iteration.

D (upstreaming) was dropped 2026-05-06 on philosophical grounds (memory/project_no_upstreaming_philosophical.md). F (DMABUF on OUTPUT) was dropped 2026-05-06 on technical grounds (track_F_research_2026-05-06.md). A, B, and C closed in iter7, and all iter1-iter6 carries are closed.

iter7 carries (STREAMON-on-context-recreate, pool-size parameterization, slot-leak fault-inject) remain as low-priority items in the campaign-close doc, not iter8 scope.

Out-of-scope (LOCKED 2026-05-06 for iteration 8)

  • iter1-iter7 completed work — done.
  • Codecs outside H.264 (MPEG-2 dropped iter6, others out per iter1 lock).
  • New target hardware (fresnel, ampere) — separate top-level campaigns.
  • Upstreaming — dropped on philosophical grounds.
  • DMABUF on OUTPUT — dropped on technical grounds.
  • Driver code changes — measurement only.

Phase 1 success criterion (LOCKED 2026-05-06 for iteration 8)

  1. Reproducible measurement script committed to tests/run_perf_binding_cell.sh (or similar) that runs each of the four consumer configurations for ≥30 seconds against bbb_1080p30_h264.mp4 on ohm and emits a markdown-formatted table with the following columns per row: consumer, CPU% median, CPU% p90, drops in measurement window, p50 frame interval (ms), GPU freq median (MHz), VmRSS delta (MiB).
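The required table might be emitted from per-consumer results roughly as follows. This Python sketch makes several assumptions. The Row fields are hypothetical. The GPU column is the median of the raw cur_freq polls converted from Hz. The memory column is end-minus-start VmRSS converted from KiB. Any cell the tooling cannot measure prints as "n/a" instead of a guessed value, which is plausibly the case for Firefox drops and frame interval given the top-based approach.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional

COLUMNS = [
    "consumer",
    "CPU% median",
    "CPU% p90",
    "drops in measurement window",
    "p50 frame interval (ms)",
    "GPU freq median (MHz)",
    "VmRSS delta (MiB)",
]


@dataclass
class Row:
    consumer: str
    cpu_median: float
    cpu_p90: float
    drops: Optional[int]                    # None when not scrapeable
    frame_interval_p50_ms: Optional[float]  # None when not scrapeable
    gpu_freq_samples_hz: List[int]          # raw cur_freq polls
    vmrss_start_kib: int
    vmrss_end_kib: int


def _cell(value, fmt: str) -> str:
    return "n/a" if value is None else format(value, fmt)


def format_row(r: Row) -> str:
    gpu_mhz = statistics.median(r.gpu_freq_samples_hz) / 1_000_000
    rss_delta_mib = (r.vmrss_end_kib - r.vmrss_start_kib) / 1024
    cells = [
        r.consumer,
        f"{r.cpu_median:.1f}",
        f"{r.cpu_p90:.1f}",
        _cell(r.drops, "d"),
        _cell(r.frame_interval_p50_ms, ".2f"),
        f"{gpu_mhz:.0f}",
        f"{rss_delta_mib:+.1f}",  # signed: RSS can also shrink
    ]
    return "| " + " | ".join(cells) + " |"


def markdown_table(rows: List[Row]) -> str:
    header = "| " + " | ".join(COLUMNS) + " |"
    separator = "|" + "|".join("---" for _ in COLUMNS) + "|"
    return "\n".join([header, separator] + [format_row(r) for r in rows])
```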

  2. Anchored numbers for all four consumers captured into a campaign artifact (phase7_iter8_perf_anchor.md or similar). Numbers must come from a clean ohm run on the iter7-end driver (sha 54999017… or rebuild from iter7 HEAD 7bd0818).

  3. Honest qualitative interpretation in the close doc. If the numbers are uglier than expected (e.g., HW decode only saves 30% browser CPU rather than 80%), document that. The campaign's prior qualitative descriptors get re-anchored to the actual data.
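One way to keep that re-anchoring honest is to state any saving as a relative reduction against a named baseline and to give the absolute difference next to it. That way, "saves 30%" cannot be misread as 30 percentage points. Here $C_{\mathrm{base}}$ and $C_{\mathrm{HW}}$ are the CPU% medians of the baseline and of the HW path. Both symbols are introduced here and are not part of the script.

```latex
\text{saving} = \frac{C_{\mathrm{base}} - C_{\mathrm{HW}}}{C_{\mathrm{base}}},
\qquad
\Delta_{\mathrm{pp}} = C_{\mathrm{base}} - C_{\mathrm{HW}}
```

Take hypothetical medians of 80% and 56%: the relative saving is 24/80 = 30%, while the absolute difference is 24 points. The plan's only SW control is mpv --hwdec=no. A ratio of Firefox against that baseline therefore compares two different consumers and should be labeled as such.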

  4. Phase 5 sonnet review confirms: (a) script is fixture-agnostic (works for any H.264 file the operator passes), (b) measurements aren't fixture-hardcoded, (c) results are presented honestly without spin.
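Point (a) amounts to an argument contract: the campaign fixture is only a default, and output is labeled by whatever input was actually used. The script itself is planned in shell. This Python sketch shows the same contract, with hypothetical names. It also enforces criterion 1's minimum window of 30 seconds.

```python
import argparse
import os
from typing import List, Optional

# Campaign fixture as a default only; any H.264 file can be passed instead.
DEFAULT_FIXTURE = "bbb_1080p30_h264.mp4"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="perf binding cell (sketch)")
    parser.add_argument("fixture", nargs="?", default=DEFAULT_FIXTURE,
                        help="H.264 input file")
    parser.add_argument("--seconds", type=int, default=30,
                        help="measurement window per consumer")
    args = parser.parse_args(argv)
    if args.seconds < 30:
        # argparse prints the message and raises SystemExit(2)
        parser.error("Phase 1 criterion 1 requires at least 30 s per consumer")
    return args


def fixture_label(path: str) -> str:
    """Label output by the input actually used, never a hardcoded name."""
    return os.path.basename(path)
```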

  5. Campaign close doc (phase8_iteration8_close.md) explicitly states "campaign closes" and lists residual carries for any future operator who picks this up.

Phase 1 LOCKED. Iteration 8 proceeds.

iter8 = candidate E alone. Phases 2..8 + campaign-close:

  • Phase 2: situation analysis — measurement methodology, parsing approach, edge cases (the SW baseline is expected to drop dozens of frames at 1080p30; Firefox numbers will be limited to what can be scraped without Firefox-internal hooks)
  • Phase 3: baseline anchor — quick smoke run of pidstat + /sys/class/devfreq polling on ohm to confirm tooling availability
  • Phase 4: implement tests/run_perf_binding_cell.sh
  • Phase 5: sonnet review
  • Phase 6: deploy script (commit + sync to ohm)
  • Phase 7: run, capture, generate table
  • Phase 8: close iteration AND close campaign
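The Phase 3 smoke run, which confirms tooling availability on ohm, could start with a check like the one below. This is a Python sketch. The tool list mirrors the plan's capture sources, and the sysfs path is the devfreq node the plan polls. It reports every gap at once instead of stopping at the first one.

```python
import os
import shutil
from typing import Iterable, List

# Tools and sysfs nodes the plan relies on; extend as needed.
REQUIRED_TOOLS = ("pidstat", "mpv", "top")
REQUIRED_SYSFS = ("/sys/class/devfreq/fde60000.gpu/cur_freq",)


def missing_tooling(tools: Iterable[str] = REQUIRED_TOOLS,
                    sysfs_paths: Iterable[str] = REQUIRED_SYSFS) -> List[str]:
    """Return every required tool or sysfs node that is not usable here."""
    missing = [t for t in tools if shutil.which(t) is None]
    missing += [p for p in sysfs_paths if not os.access(p, os.R_OK)]
    return missing


if __name__ == "__main__":
    gaps = missing_tooling()
    print("tooling OK" if not gaps else "missing: " + ", ".join(gaps))
```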

Stop point

After Phase 8 close, the campaign formally closes. Future operator-initiated work would re-open as a new top-level campaign (e.g., fourier-fresnel per project_followon_campaigns.md).