libva-multiplanar/phase0_findings_iter7.md
claude-noether ec769a9687 docs: clarify Rockchip silicon across operative docs (RK3566)
PineTab2 is Rockchip RK3566 silicon, not RK3568. The hantro driver
attaches via the rockchip,rk3568-vpu DT compatible because RK3566/
RK3568 silicon is close enough to share that variant. The proper
RK3566 mainline driver target (rkvdec2 / vdpu346) has no kernel
support yet — Christian Hewitt's patch series LKML 2025/12/26/206
is unmerged.

Updated operative docs to use the consistent form:
"PineTab2 (Rockchip RK3566 silicon; hantro driver via the
rockchip,rk3568-vpu DT compatible)" or shorter variants.

Files updated:
- README.md (campaign top-level): TL;DR, deliverable, KWin link,
  hardware target, hardware listing
- firefox-fourier/README.md: tested-on line
- phase8_iteration7_close.md: hardware carry
- phase8_iteration6_close.md: hardware carry, MPEG-2 drop
  rationale
- phase0_findings_iter7.md: predecessor summary, fourier-fresnel
  description, hardware carry
- phase2_iter7_situation.md: msync hypothesis hardware reference

Historical iter1-iter5 phase docs left as-is — they're snapshots
of what the campaign believed at the time. The canonical source
for the silicon-ID correction is track_F_research_2026-05-06.md
(commit 358801b).

Not a correctness change. The campaign's empirical evidence is
unaffected — the hantro/rk3568-vpu driver path that we exercised
was always the actual decode path on PineTab2 silicon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:39:28 +00:00

# Phase 0 — iteration 7 substrate (libva-multiplanar campaign)
Opened 2026-05-06 immediately after iter6 close. iter6 closed GREEN with a single architectural fix (per-OUTPUT-slot request_fd binding via REINIT, fork commit `a09c03c`) that resolved the merged A+I scope. Both candidate I (Firefox VIDIOC_QBUF EINVAL) and candidate A (cap_pool resolution-change race) closed in the same fix; the resolution-change race was organically exercised by YouTube's quality-renegotiation `cap_pool_init` events (4 events on a ~95s YT avc1 run, all clean).
Operator's primary goal — Firefox HW decode of YouTube avc1 on PineTab2 — is now MET end-to-end. Remaining campaign work is polish, formal verification, and upstream-prep.
## Predecessor close-out summary (iteration 6 → iteration 7)
iter6 landed a single fork commit:
- `a09c03c` — per-OUTPUT-slot request_fd binding via `MEDIA_REQUEST_IOC_REINIT`. Replaces iter4's `385dee1` close+`media_request_alloc`-per-frame model. Pool size 4 → 16. Slot owns the fd, surface borrows. iter4's case-against-REINIT was confirmed to be a DPB-payload confounder (since fixed in `74d8dd1`).
iter6 dropped MPEG-2 from the carry list (CPU handles it fine on the PineTab2 A55 cluster).
iter6 carried into iter7:
- **msync pixel-correctness verification** (carry from iter5 Phase 5 sonnet C3)
- **Slot-leak error recovery** (iter6 internal carry — `request_pool_force_release` for the rare case where REINIT or DQBUF fails mid-cycle)
- **Probe-pattern test harness for cap_pool race** (carry from iter5 sonnet C4 — race is organically exercised by YT but a synthetic test would anchor the claim formally)
- **WiFi-IRQ frame drops** (out-of-campaign system concern; flagged but not on iter7 scope)
## Iteration 7 candidate research questions
### A. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)
> Confirm decoded frames are byte-identical (or visually-correct-and-deterministic) post-iter5-sweep, after `msync(MS_SYNC | MS_INVALIDATE)` was removed alongside the iter1 patch-0010 hex-dump diagnostic.
**Plan**: 100-frame `vaapi-copy` run on `bbb_1080p30_h264.mp4`, capture md5/sha of each decoded YUV plane, compare against (a) iter1 baseline (msync-present), (b) FFmpeg software decode reference, (c) iter5-end baseline.
**Risk**: low. If frames diverge, msync goes back in. If frames match, formally close iter5 sonnet C3.
**Effort**: 1-2 hours including writing the frame-hash harness.
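A minimal sketch of the frame-hash comparison, assuming both captures are tightly packed raw NV12 streams (no stride padding, even dimensions). The function names are hypothetical and not part of any existing harness; the sketch only shows how per-plane hashes could be produced and compared against a reference.

```python
import hashlib


def nv12_frame_size(width: int, height: int) -> int:
    """Bytes per NV12 frame: full-size Y plane plus half-size interleaved UV plane."""
    return width * height * 3 // 2


def plane_hashes(raw: bytes, width: int, height: int) -> list[tuple[str, str]]:
    """Return (sha256(Y), sha256(UV)) for each frame of a tightly packed NV12 stream."""
    y_size = width * height
    frame_size = nv12_frame_size(width, height)
    if len(raw) % frame_size:
        raise ValueError("stream length is not a whole number of frames")
    hashes = []
    for off in range(0, len(raw), frame_size):
        y_plane = raw[off:off + y_size]
        uv_plane = raw[off + y_size:off + frame_size]
        hashes.append((hashlib.sha256(y_plane).hexdigest(),
                       hashlib.sha256(uv_plane).hexdigest()))
    return hashes


def first_mismatch(candidate, reference):
    """Index of the first differing frame, or None if both hash lists are identical.

    A length difference is reported as a mismatch at the shorter list's length.
    """
    for i, (c, r) in enumerate(zip(candidate, reference)):
        if c != r:
            return i
    if len(candidate) != len(reference):
        return min(len(candidate), len(reference))
    return None
```

Hashing the Y and UV planes separately makes it easier to tell a luma divergence from a chroma-only one when a frame fails. If the driver's output carries stride padding, the slicing above would need to be adapted before the hashes are comparable.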
### B. Slot-leak error recovery (iter6 internal carry)
> Add `request_pool_force_release(pool, slot_index)` that REINITs the slot's fd and resets the slot's `busy` flag to false, callable from `RequestSyncSurface` error paths.
Currently when REINIT or DQBUF fails mid-cycle, the slot stays `busy=true` until `RequestTerminate`. With pool=16 and rare errors this is bounded, but it is a slow leak that would eventually starve acquire under sustained-error scenarios.
**Plan**: add `request_pool_force_release` to request_pool.{c,h}. Call from surface.c error paths. Verify with a fault-injection harness (return `-EBUSY` from REINIT in some test scenario).
**Risk**: low — additive function, doesn't change happy path.
**Effort**: 1 hour code + smoke test.
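The leak and its recovery can be illustrated with a toy model of the pool's busy-flag accounting. This is a sketch in Python, not the driver's C code: `RequestPoolModel` and `run_decode` are hypothetical names, and the REINIT step is reduced to clearing a flag.

```python
class RequestPoolModel:
    """Toy model of a fixed-size request pool that tracks only busy flags."""

    def __init__(self, size: int = 16):
        self.busy = [False] * size

    def acquire(self):
        """Return the index of a free slot and mark it busy, or None if starved."""
        for i, in_use in enumerate(self.busy):
            if not in_use:
                self.busy[i] = True
                return i
        return None

    def release(self, slot: int) -> None:
        """Normal end-of-cycle release on the happy path."""
        self.busy[slot] = False

    def force_release(self, slot: int) -> None:
        """Error-path release; stands in for REINIT plus resetting the busy flag."""
        self.busy[slot] = False


def run_decode(pool, frames: int, fail_every: int, recover: bool) -> int:
    """Decode `frames` frames, injecting a failure on every `fail_every`-th frame.

    Returns the number of frames decoded before the pool starved (or before
    the run ended). Without recovery, each failure leaks one slot.
    """
    decoded = 0
    for n in range(1, frames + 1):
        slot = pool.acquire()
        if slot is None:
            return decoded  # pool starved
        if fail_every and n % fail_every == 0:
            if recover:
                pool.force_release(slot)
            continue  # without recovery the slot stays busy
        pool.release(slot)
        decoded += 1
    return decoded
```

With a 16-slot pool and a failure on every 10th frame, the model without recovery stops at the 161st acquire, after 16 leaked slots; with recovery it runs to the end and only the failed frames are lost.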
### C. Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A formal-anchor)
> Write a synthetic test program that exercises the `vaCreateSurfaces(small) → vaCreateSurfaces(big)` resolution-change pattern that originally triggered REQBUFS-EBUSY. Confirms iter6's REINIT discipline holds the cap_pool race closed in a deterministic-repro form, not just organically via YT.
**Plan**: 50-line C program using libva: open device, create context, create 4 surfaces at 128×128, vaPutSurface noop, destroy, create 4 surfaces at 1920×1080, decode bbb's first I-frame, sha256 the output. No EBUSY events expected in driver stderr.
**Risk**: low. Test harness only; doesn't change driver.
**Effort**: 2-3 hours including writing test, fixturing bbb's first I-frame as raw H.264 NAL.
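A sketch of how the harness result could be judged after a run, assuming the driver writes a line containing both `REQBUFS` and `EBUSY` to stderr when the race fires. The exact log wording is an assumption, and `count_ebusy_events` and `harness_verdict` are hypothetical helper names.

```python
import re

# Assumed log wording; adjust to whatever the driver actually prints.
EBUSY_MARKER = re.compile(r"REQBUFS.*EBUSY")


def count_ebusy_events(stderr_text: str) -> int:
    """Count stderr lines that look like a REQBUFS EBUSY report."""
    return sum(1 for line in stderr_text.splitlines() if EBUSY_MARKER.search(line))


def harness_verdict(stderr_text: str, frame_sha256: str,
                    reference_sha256: str, exit_code: int) -> list[str]:
    """Return a list of failure reasons; an empty list means the run passed."""
    failures = []
    events = count_ebusy_events(stderr_text)
    if events:
        failures.append(f"{events} REQBUFS-EBUSY event(s) in driver stderr")
    if frame_sha256.lower() != reference_sha256.lower():
        failures.append("decoded I-frame hash differs from SW reference")
    if exit_code != 0:
        failures.append(f"harness exited with status {exit_code}")
    return failures
```

Returning the reasons rather than a single boolean keeps the three conditions independently visible, which helps show that the check is not passing trivially.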
### D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5 + iter6)
> File the firefox-fourier patch with Mozilla Bugzilla (bug 1833354 / 1965646 reference for V4L2 stateless analogue). File libva-v4l2-request fork's iter1-iter6 patch series with bootlin's libva-v4l2-request maintainer (Paul Kocialkowski) as a coherent series.
**Plan (Mozilla)**: Bugzilla account, write up "V4L2 stateless decoders blocked by RDD+Utility sandbox" with reproducer, attach combined 160-line patch.
**Plan (bootlin)**: structure the iter1-iter6 commits as a clean patch series. Possibly squash some of iter5's instrumentation-removal commits with their original patch landings. Address upstream concerns about per-call diagnostics that survived as `request_log` lines.
**Risk**: socially-mediated. Maintainer may push back on architectural decisions; per `feedback_no_upstream.md` no PR/MR happens without explicit operator instruction.
**Effort**: bug filing 2-4 hours; patch-series prep 4-8 hours.
### E. Performance binding cell (carried from iter1+2+3+4+5+6)
> Anchor measured numbers for the four primary consumer paths: mpv-vaapi DMA-BUF, mpv-vaapi-copy, Firefox-fourier HW, SW baseline. Drop count, CPU%, frame timing, GPU freq, on bbb_1080p30. Reproducible from a documented script.
**Plan**: shell script that runs each consumer for 30s, captures `pidstat -u -p <PID>`, reads `/sys/class/devfreq/fde60000.gpu/cur_freq`, parses mpv stats / Firefox `about:processes`. Generate a markdown table. Carried six iterations now.
**Risk**: low. Measurement only.
**Effort**: 3-4 hours including script + run + table.
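A sketch of the table-generation step, assuming the per-consumer measurements have already been collected into dictionaries. The column names and helper functions are hypothetical, and the devfreq node is assumed to report its frequency in Hz.

```python
from pathlib import Path

# Hypothetical column set; extend with frame-timing fields as needed.
COLUMNS = ("consumer", "dropped_frames", "cpu_percent", "gpu_freq_mhz")


def read_gpu_freq_mhz(path: str = "/sys/class/devfreq/fde60000.gpu/cur_freq"):
    """Read the devfreq node (assumed to be in Hz) and return MHz, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip()) / 1_000_000
    except (OSError, ValueError):
        return None


def render_table(rows) -> str:
    """Render a list of measurement dicts as a markdown table; missing values show as n/a."""
    def cell(row, column):
        value = row.get(column)
        return "n/a" if value is None else str(value)

    header = "| " + " | ".join(COLUMNS) + " |"
    rule = "|" + "|".join(" --- " for _ in COLUMNS) + "|"
    body = ["| " + " | ".join(cell(r, c) for c in COLUMNS) + " |" for r in rows]
    return "\n".join([header, rule, *body])
```

Rendering missing values as `n/a` rather than dropping the row keeps all four consumer paths visible in the table even when one measurement could not be captured.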
### F. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5+6)
> Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation for the OUTPUT pool. Architectural fix for the iter2 cap_pool race (statistical / LRU mitigation now superseded by iter6's REINIT discipline, but DMABUF is structurally cleaner).
**Risk**: highest unknown of any candidate. Possibly requires kernel work. Hantro on this kernel may not accept DMABUF on OUTPUT.
**Effort**: 1-3 days, possibly more.
### G. WiFi-IRQ frame drops (NEW from iter6, out-of-campaign-scope flagged)
> ohm shows visible frame drops during YouTube playback whenever the brcm/iwlwifi driver does heavy IRQ work. Decode pipeline is fine; presentation schedule slips.
**Stance**: out of campaign scope (system-level concern, not a libva-multiplanar bug). Listed here because operator surfaced it during iter6 verification. If desired, open a separate investigation into IRQ affinity, GRO offload, and network-stack buffer settings.
### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)
> Open the `fourier-fresnel` campaign — port `libva-v4l2-request-fourier` from ohm PineTab2 (RK3566 via hantro/rk3568-vpu) to fresnel RK3399 (Pinebook Pro). Validates generality of iter1-iter6 fixes on a second hardware target.
**Stance**: separate top-level campaign, not an iter7 candidate. Charter at `~/src/fourier-fresnel/` once opened. iter5 memory entry `project_followon_campaigns.md` records sequencing: fourier-fresnel before panvk-bifrost.
### I. panvk-bifrost campaign (carried from iter5 followon-campaigns memory)
> Charter at `~/src/panvk-bifrost/` already exists as document-only. Sequenced after fourier-fresnel.
**Stance**: separate top-level campaign.
### Recommended pairings
- **A + B** (msync verify + slot-leak fix) — both small, both formally close iter6's carry items. Tightest scope.
- **A + C** (msync verify + cap_pool race harness) — formally closes the two iter5 sonnet caveats. Mid scope.
- **A + B + C** — closes all three internal carry items in one iteration. Reasonable upper bound.
- **D alone** — gated on operator instruction. Big effort but the campaign's culmination.
- **E alone** — anchors campaign-wide claims to numbers. Carried six iterations now.
- **F alone** — high-risk architectural.
**Recommended primary**: **A + B + C** — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants the upstream-filing iteration first or the perf-anchor iteration first.
**Alternative**: **D alone** if operator wants to make iter7 the upstream-prep / external-filing iteration. Carries political weight (Mozilla + bootlin) but minimal new code.
## State that carries (re-verified iter6 close)
- **Hardware**: ohm PineTab2 (Rockchip RK3566 silicon; hantro driver via `rockchip,rk3568-vpu` DT compatible), kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky.
- **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`.
- **Build container**: firefox-fourier LXD on boltzmann, persistent.
- **Diagnostic instrument**: `/home/mfritsche/iter6-fork-dx/` on ohm — full fork tree with ITER6_DX log instrumentation, can be reactivated for iter7 candidate B fault-injection or candidate C harness wiring.
## State that does NOT carry
- iter6 telemetry logs are tmpfs-volatile.
- The pre-iter6 (iter5-end) driver binary is backed up at `/home/mfritsche/v4l2_request_drv_video.so.iter5end.bak`; the iter6-DX build with diagnostics is at `/home/mfritsche/iter6-fork-dx/build/src/v4l2_request_drv_video.so` if needed for re-instrumentation.
## Tooling and measurement-instrument inventory
Same as iter6. Plus for new candidates:
- For A (msync verify): `vaapi-copy` already produces NV12 output to mpv's vo=null path. Add an mpv `--frames=N --o=output.yuv` script to capture raw YUV; sha256 each frame. FFmpeg's `ffmpeg -i bbb -vframes N output_%d.yuv` provides the SW reference.
- For B (slot-leak): synthesize a fault-inject by adding a debug toggle in surface.c that returns `-EBUSY` from REINIT after N frames; observe pool starvation, then add force_release and verify recovery.
- For C (cap_pool harness): write a 50-line libva test program. Reuse driver_data inspection from iter5 Track E test pattern.
- For D (upstream): no new tooling; existing patch series live in `firefox-fourier/` and the fork's git log iter1-iter6.
- For E (perf): `pidstat -u`, `/sys/class/devfreq/fde60000.gpu`, mpv stats overlay, Firefox `about:processes`.
- For F (DMABUF): kernel docs `Documentation/userspace-api/media/v4l/buffer.rst`, hantro driver source `drivers/media/platform/verisilicon/` (formerly `drivers/staging/media/hantro/`).
## In-scope (LOCKED 2026-05-06 for iteration 7) — A + B + C
Operator locked **A + B + C**: msync pixel-correctness verification, slot-leak error recovery, and cap_pool-race synthetic test harness. Closes all three iter5/iter6 internal carry items in one iteration.
D (upstreaming), E (perf binding cell), F (V4L2_MEMORY_DMABUF) deferred to iter8+. G (WiFi-IRQ frame drops) remains out-of-campaign-scope. H, I are separate top-level campaigns.
## Out-of-scope (LOCKED 2026-05-06 for iteration 7)
- iter6-completed work (per-slot REINIT discipline) — done.
- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
- iter5 amendment (Utility seccomp / Firefox-fourier sandbox) — done.
- iter4-completed work (Track A frame-11 EINVAL fix) — done.
- iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter7 candidate D.
- New codecs OUTSIDE H.264 (VP8/VP9/AV1/HEVC out per iter1 lock; MPEG-2 dropped at iter6 close).
- New target hardware (fresnel, ampere) — separate campaigns (H, I above).
- WiFi-IRQ frame drops — system-level, not libva-multiplanar.
## Phase 1 success criterion (LOCKED 2026-05-06 for iteration 7)
> All three sub-tracks must independently pass on the iter7-end driver build:
>
> **A — msync pixel-correctness verification**
> - 100-frame `mpv --hwdec=vaapi-copy --o=output_%04d.yuv` against `bbb_1080p30_h264.mp4`.
> - Frame-by-frame sha256 of the captured YUV planes compared against FFmpeg SW decode reference (`ffmpeg -i bbb -frames:v 100 -f rawvideo -pix_fmt nv12 -`).
> - **Pass:** all 100 frames match SW reference byte-for-byte (or visually-identical with documented bit-precision delta if the kernel's NV12 packing differs trivially from FFmpeg's). Formally closes iter5 sonnet C3.
> - **Fail action:** restore `msync(MS_SYNC | MS_INVALIDATE)` in the surface DQBUF path; re-run; verify match. Document either way.
>
> **B — Slot-leak error recovery**
> - `request_pool_force_release(pool, slot_index)` added to request_pool.{c,h}; REINITs the slot's fd and resets the slot's `busy` flag to false.
> - Called from `RequestSyncSurface` error paths after `media_request_reinit` or `DQBUF` failure.
> - Synthetic fault-injection: a debug compile flag returns `-EBUSY` from REINIT after N frames. Pre-fix: pool starves after 16 errors. Post-fix: pool recovers; decode continues across error events.
> - mpv-vaapi-copy 100-frame regression test still GREEN (no regression on the happy path).
>
> **C — Probe-pattern test harness for cap_pool race**
> - C program at `tests/cap_pool_probe_pattern.c` (~50 lines) using libva: open device, `vaCreateContext`, `vaCreateSurfaces(128×128, 4)`, dispose, `vaCreateSurfaces(1920×1080, 4)`, decode bbb's first I-frame, sha256 the output.
> - **Pass:** zero `REQBUFS-EBUSY` events in driver stderr; decoded frame sha matches FFmpeg SW reference for the same I-frame; harness exits 0.
> - Formally anchors iter5 sonnet C4 / iter6 candidate A — the race that was organically exercised by YouTube's resolution renegotiations is now also covered by a deterministic synthetic test.
>
> Phase 5 sonnet review must explicitly confirm: (a) any restored msync (if A required it) is correctly placed, (b) `request_pool_force_release` doesn't introduce new mutable global state or break the pool's invariants, (c) the cap_pool harness is a real test (not just a fixture-hardcoded check that passes trivially).
## Phase 1 LOCKED. Iteration 7 proceeds.
iter7 = A + B + C combined. Phases 2..8:
- Phase 2: situation analysis for each track (A/B/C) — what we expect to find, what tools needed, what could go wrong
- Phase 3: baseline anchor — capture pre-fix state for each (A: current frame hashes vs SW; B: current pool starvation under fault inject; C: current behavior on probe pattern)
- Phase 4: execute. Order: B (smallest, additive) → C (synthetic test, no driver change) → A (verification — runs against the iter7-end driver including any B/C changes)
- Phase 5: sonnet review of combined diff before commit
- Phase 6: deploy iter7 driver to ohm
- Phase 7: verify all three tracks against locked criteria above
- Phase 8: close
After lock, iter7 phases 2..8 proceed autonomously per "Stop only if user is needed."