libva-multiplanar/phase0_findings_iter7.md
claude-noether 6f03fb8baa iter7 Phase 0: substrate + 6 candidates
Predecessor (iter6): primary user goal MET — Firefox + YouTube avc1
HW decode works on PineTab2. Remaining campaign work is polish,
formal verification, and upstream-prep.

Candidates:
- A: msync removal pixel-correctness verification (carry from
  iter5 sonnet C3)
- B: Slot-leak error recovery — request_pool_force_release for
  REINIT/DQBUF mid-cycle failures (iter6 internal carry)
- C: Probe-pattern test harness for cap_pool race — formal anchor
  for iter5 sonnet C4 / iter6 candidate A organic exercise
- D: Bootlin / Mozilla upstreaming prep (carried iter3+4+5+6)
- E: Performance binding cell (carried six iterations)
- F: V4L2_MEMORY_DMABUF (high-risk architectural)

G (WiFi-IRQ frame drops) flagged out-of-campaign-scope. H/I
(fourier-fresnel, panvk-bifrost) separate top-level campaigns.

Recommended primary: A+B+C — closes all three internal carry
items in one iteration. Alternative: D alone for the upstream-
prep iteration.

Phase 1 lock requires operator input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:34:41 +00:00

# Phase 0 — iteration 7 substrate (libva-multiplanar campaign)
Opened 2026-05-06 immediately after iter6 close. iter6 closed GREEN with a single architectural fix (per-OUTPUT-slot request_fd binding via REINIT, fork commit `a09c03c`) that resolved the merged A+I scope. Both candidate I (Firefox VIDIOC_QBUF EINVAL) and candidate A (cap_pool resolution-change race) closed in the same fix; the resolution-change race was organically exercised by YouTube's quality-renegotiation `cap_pool_init` events (4 events on a ~95s YT avc1 run, all clean).
Operator's primary goal — Firefox HW decode of YouTube avc1 on PineTab2 — is now MET end-to-end. Remaining campaign work is polish, formal verification, and upstream-prep.
## Predecessor close-out summary (iteration 6 → iteration 7)
iter6 landed a single fork commit:
- `a09c03c` — per-OUTPUT-slot request_fd binding via `MEDIA_REQUEST_IOC_REINIT`. Replaces iter4's `385dee1` close+`media_request_alloc`-per-frame model. Pool size 4 → 16. Slot owns the fd, surface borrows. iter4's case-against-REINIT was confirmed to be a DPB-payload confounder (since fixed in `74d8dd1`).
iter6 dropped MPEG-2 from the carry list (CPU handles it fine on RK3568).
iter6 carried into iter7:
- **msync pixel-correctness verification** (carry from iter5 Phase 5 sonnet C3)
- **Slot-leak error recovery** (iter6 internal carry — `request_pool_force_release` for the rare case where REINIT or DQBUF fails mid-cycle)
- **Probe-pattern test harness for cap_pool race** (carry from iter5 sonnet C4 — race is organically exercised by YT but a synthetic test would anchor the claim formally)
- **WiFi-IRQ frame drops** (out-of-campaign system concern; flagged but not on iter7 scope)
## Iteration 7 candidate research questions
### A. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)
> Confirm decoded frames are byte-identical (or visually-correct-and-deterministic) post-iter5-sweep, after `msync(MS_SYNC | MS_INVALIDATE)` was removed alongside the iter1 patch-0010 hex-dump diagnostic.
**Plan**: 100-frame `vaapi-copy` run on `bbb_1080p30_h264.mp4`, capture md5/sha of each decoded YUV plane, compare against (a) iter1 baseline (msync-present), (b) FFmpeg software decode reference, (c) iter5-end baseline.
**Risk**: low. If frames diverge, msync goes back in. If frames match, formally close iter5 sonnet C3.
**Effort**: 1-2 hours including writing the frame-hash harness.
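A minimal sketch of the frame-hash comparison, assuming both decode paths have been dumped as tightly packed NV12 (no stride padding) and loaded into memory. FNV-1a stands in for md5/sha256 here only to keep the sketch free of a crypto dependency; `struct frame_hash`, `hash_nv12_frame`, and `first_divergent_frame` are hypothetical names, not part of the fork.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit: a stand-in for sha256 so the sketch needs no crypto library. */
uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Hypothetical per-frame record: one hash per NV12 plane. */
struct frame_hash {
    uint64_t luma;    /* Y plane: width * height bytes */
    uint64_t chroma;  /* interleaved UV plane: width * height / 2 bytes */
};

/* Bytes in one tightly packed NV12 frame (assumes no stride padding). */
size_t nv12_frame_size(unsigned width, unsigned height)
{
    return (size_t)width * height * 3 / 2;
}

/* Hash the two planes of one NV12 frame held in memory. */
struct frame_hash hash_nv12_frame(const uint8_t *frame,
                                  unsigned width, unsigned height)
{
    assert(frame != NULL);
    size_t luma_len = (size_t)width * height;
    struct frame_hash fh;
    fh.luma = fnv1a64(frame, luma_len);
    fh.chroma = fnv1a64(frame + luma_len, luma_len / 2);
    return fh;
}

/* Index of the first frame whose plane hashes differ, or -1 if all match. */
long first_divergent_frame(const struct frame_hash *a,
                           const struct frame_hash *b, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (a[i].luma != b[i].luma || a[i].chroma != b[i].chroma)
            return (long)i;
    return -1;
}
```

Hashing the planes separately means a divergence report can say whether luma or chroma differs, which may help tell a cache-coherency artifact from a decoder mismatch.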
### B. Slot-leak error recovery (iter6 internal carry)
> Add `request_pool_force_release(pool, slot_index)` that REINITs the slot's fd and clears the slot's `busy` flag, callable from RequestSyncSurface error paths.
Currently, when REINIT or DQBUF fails mid-cycle, the slot stays `busy=true` until `RequestTerminate`. With pool=16 and rare errors this is bounded, but it is a slow leak that would eventually starve acquire under sustained-error scenarios.
**Plan**: add `request_pool_force_release` to request_pool.{c,h}. Call from surface.c error paths. Verify with a fault-injection harness (return `-EBUSY` from REINIT in some test scenario).
**Risk**: low — additive function, doesn't change happy path.
**Effort**: 1 hour code + smoke test.
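One possible shape for the release path and its fault-injection stub, as a sketch only. The struct layout, the injectable `reinit` hook, the `needs_reinit` flag, and `reinit_fail_after_budget` are all assumptions made for illustration; the fork's `request_pool.{c,h}` may be organised differently. In the driver the hook would wrap `ioctl(fd, MEDIA_REQUEST_IOC_REINIT)`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define REQUEST_POOL_SIZE 16   /* pool size stated for iter6 */

/* Hypothetical slot layout; the real struct may differ. */
struct request_slot {
    int  request_fd;    /* owned by the slot, borrowed by the surface */
    bool busy;
    bool needs_reinit;  /* set when a forced release could not REINIT */
};

/* Injectable so a test can substitute a failing REINIT. */
typedef int (*reinit_fn)(int request_fd);

struct request_pool {
    struct request_slot slots[REQUEST_POOL_SIZE];
    reinit_fn reinit;
};

/* Hand out the first usable free slot, or -1 if the pool is starved. */
int request_pool_acquire(struct request_pool *pool)
{
    assert(pool != NULL && pool->reinit != NULL);
    for (int i = 0; i < REQUEST_POOL_SIZE; i++) {
        struct request_slot *slot = &pool->slots[i];
        if (slot->busy)
            continue;
        /* Retry a REINIT that failed during an earlier forced release. */
        if (slot->needs_reinit && pool->reinit(slot->request_fd) != 0)
            continue;
        slot->needs_reinit = false;
        slot->busy = true;
        return i;
    }
    return -1;
}

/* Error-path release: REINIT the fd and clear busy. If REINIT fails the
 * slot is still freed, but flagged so acquire retries before reuse. */
int request_pool_force_release(struct request_pool *pool, int slot_index)
{
    if (pool == NULL || slot_index < 0 || slot_index >= REQUEST_POOL_SIZE)
        return -EINVAL;
    struct request_slot *slot = &pool->slots[slot_index];
    int rc = pool->reinit(slot->request_fd);
    slot->needs_reinit = (rc != 0);
    slot->busy = false;
    return rc;
}

/* Fault-injection stub: succeeds reinit_budget times, then returns -EBUSY. */
int reinit_budget = 0;
int reinit_fail_after_budget(int request_fd)
{
    (void)request_fd;
    if (reinit_budget > 0) {
        reinit_budget--;
        return 0;
    }
    return -EBUSY;
}
```

Deferring the retry to acquire is one design choice among several. If REINIT keeps failing on a slot, a real implementation might instead fall back to closing the fd and allocating a fresh request, as the iter4 model did per frame.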
### C. Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A formal-anchor)
> Write a synthetic test program that exercises the `vaCreateSurfaces(small) → vaCreateSurfaces(big)` resolution-change pattern that originally triggered REQBUFS-EBUSY. Confirms iter6's REINIT discipline holds the cap_pool race closed in a deterministic-repro form, not just organically via YT.
**Plan**: 50-line C program using libva: open device, create context, create 4 surfaces at 128×128, vaPutSurface noop, destroy, create 4 surfaces at 1920×1080, decode bbb's first I-frame, sha256 the output. No EBUSY events expected in driver stderr.
**Risk**: low. Test harness only; doesn't change driver.
**Effort**: 2-3 hours including writing test, fixturing bbb's first I-frame as raw H.264 NAL.
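For the fixture step, a sketch of locating the first IDR slice in an H.264 Annex B byte stream. It assumes the MP4 has already been converted to Annex B (for example with FFmpeg's `h264_mp4toannexb` bitstream filter); `find_start_code` and `find_first_idr` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAL_TYPE_IDR 5   /* nal_unit_type of an IDR coded slice */

/* Offset of the next 00 00 01 start code at or after pos, or len if none.
 * A 4-byte 00 00 00 01 start code is matched at its second byte. */
size_t find_start_code(const uint8_t *buf, size_t len, size_t pos)
{
    for (size_t i = pos; i + 3 <= len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}

/* Locate the first IDR slice NAL. On success returns 0 and stores the
 * offset of the NAL header byte and the NAL length; returns -1 if the
 * stream contains no IDR slice. */
int find_first_idr(const uint8_t *buf, size_t len,
                   size_t *nal_off, size_t *nal_len)
{
    assert(buf != NULL && nal_off != NULL && nal_len != NULL);
    size_t sc = find_start_code(buf, len, 0);
    while (sc < len) {
        size_t hdr = sc + 3;              /* NAL header follows the start code */
        if (hdr >= len)
            break;
        size_t next = find_start_code(buf, len, hdr);
        if ((buf[hdr] & 0x1F) == NAL_TYPE_IDR) {
            size_t end = next;
            /* Drop zero bytes that belong to the next 4-byte start code. */
            while (end > hdr + 1 && buf[end - 1] == 0)
                end--;
            *nal_off = hdr;
            *nal_len = end - hdr;
            return 0;
        }
        sc = next;
    }
    return -1;
}
```

A usable fixture would most likely also need the SPS (type 7) and PPS (type 8) NAL units that precede the IDR slice, and an I-frame may span more than one slice NAL, so this only covers the first step.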
### D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5 + iter6)
> File the firefox-fourier patch with Mozilla Bugzilla (bug 1833354 / 1965646 reference for V4L2 stateless analogue). File libva-v4l2-request fork's iter1-iter6 patch series with bootlin's libva-v4l2-request maintainer (Paul Kocialkowski) as a coherent series.
**Plan (Mozilla)**: Bugzilla account, write up "V4L2 stateless decoders blocked by RDD+Utility sandbox" with reproducer, attach combined 160-line patch.
**Plan (bootlin)**: structure the iter1-iter6 commits as a clean patch series. Possibly squash some of iter5's instrumentation-removal commits with their original patch landings. Address upstream concerns about per-call diagnostics that survived as `request_log` lines.
**Risk**: socially-mediated. Maintainer may push back on architectural decisions; per `feedback_no_upstream.md` no PR/MR happens without explicit operator instruction.
**Effort**: bug filing 2-4 hours; patch-series prep 4-8 hours.
### E. Performance binding cell (carried from iter1+2+3+4+5+6)
> Anchor measured numbers for the four primary consumer paths: mpv-vaapi DMA-BUF, mpv-vaapi-copy, Firefox-fourier HW, SW baseline. Drop count, CPU%, frame timing, GPU freq, on bbb_1080p30. Reproducible from a documented script.
**Plan**: shell script that runs each consumer for 30s, captures `pidstat -u -p <PID>`, reads `/sys/class/devfreq/fde60000.gpu/cur_freq`, parses mpv stats / Firefox `about:processes`. Generate a markdown table. Carried six iterations now.
**Risk**: low. Measurement only.
**Effort**: 3-4 hours including script + run + table.
### F. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5+6)
> Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation for the OUTPUT pool. Architectural fix for the iter2 cap_pool race (statistical / LRU mitigation now superseded by iter6's REINIT discipline, but DMABUF is structurally cleaner).
**Risk**: highest unknown of any candidate. Possibly requires kernel work. Hantro on this kernel may not accept DMABUF on OUTPUT.
**Effort**: 1-3 days, possibly more.
### G. WiFi-IRQ frame drops (NEW from iter6, out-of-campaign-scope flagged)
> ohm shows visible frame drops during YouTube playback whenever the brcm/iwlwifi driver does heavy IRQ work. Decode pipeline is fine; presentation schedule slips.
**Stance**: out of campaign scope (system-level concern, not a libva-multiplanar bug). Listed here because operator surfaced it during iter6 verification. If desired, separate investigation into IRQ affinity, GRO offload, network-stack buffer settings.
### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)
> Open the `fourier-fresnel` campaign — port `libva-v4l2-request-fourier` from ohm RK3568 to fresnel RK3399 (Pinebook Pro). Validates generality of iter1-iter6 fixes on a second hardware target.
**Stance**: separate top-level campaign, not an iter7 candidate. Charter at `~/src/fourier-fresnel/` once opened. iter5 memory entry `project_followon_campaigns.md` records sequencing: fourier-fresnel before panvk-bifrost.
### I. panvk-bifrost campaign (carried from iter5 followon-campaigns memory)
> Charter at `~/src/panvk-bifrost/` already exists as document-only. Sequenced after fourier-fresnel.
**Stance**: separate top-level campaign.
### Recommended pairings
- **A + B** (msync verify + slot-leak fix) — both small, both formally close iter6's carry items. Tightest scope.
- **A + C** (msync verify + cap_pool race harness) — formally closes the two iter5 sonnet caveats. Mid scope.
- **A + B + C** — closes all three internal carry items in one iteration. Reasonable upper bound.
- **D alone** — gated on operator instruction. Big effort but the campaign's culmination.
- **E alone** — anchors campaign-wide claims to numbers. Carried six iterations now.
- **F alone** — high-risk architectural.
**Recommended primary**: **A + B + C** — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants the upstream-filing iteration first or the perf-anchor iteration first.
**Alternative**: **D alone** if operator wants to make iter7 the upstream-prep / external-filing iteration. Carries political weight (Mozilla + bootlin) but minimal new code.
## State that carries (re-verified iter6 close)
- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky.
- **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`.
- **Build container**: firefox-fourier LXD on boltzmann, persistent.
- **Diagnostic instrument**: `/home/mfritsche/iter6-fork-dx/` on ohm — full fork tree with ITER6_DX log instrumentation, can be reactivated for iter7 candidate B fault-injection or candidate C harness wiring.
## State that does NOT carry
- iter6 telemetry logs are tmpfs-volatile.
- The iter5-end (pre-iter6) driver binary is backed up at `/home/mfritsche/v4l2_request_drv_video.so.iter5end.bak`; the iter6-DX build with diagnostics is at `/home/mfritsche/iter6-fork-dx/build/src/v4l2_request_drv_video.so` if needed for re-instrumentation.
## Tooling and measurement-instrument inventory
Same as iter6. Plus for new candidates:
- For A (msync verify): `vaapi-copy` already produces NV12 output to mpv's vo=null path. Add an mpv `--frames=N --o=output.yuv` script to capture raw YUV; sha256 each frame. FFmpeg's `ffmpeg -i bbb -vframes N output_%d.yuv` provides the SW reference.
- For B (slot-leak): synthesize a fault-inject by adding a debug toggle in surface.c that returns `-EBUSY` from REINIT after N frames; observe pool starvation, then add force_release and verify recovery.
- For C (cap_pool harness): write a 50-line libva test program. Reuse driver_data inspection from iter5 Track E test pattern.
- For D (upstream): no new tooling; existing patch series live in `firefox-fourier/` and the fork's git log iter1-iter6.
- For E (perf): `pidstat -u`, `/sys/class/devfreq/fde60000.gpu`, mpv stats overlay, Firefox `about:processes`.
- For F (DMABUF): kernel docs `Documentation/userspace-api/media/v4l/buffer.rst`, hantro driver source `drivers/media/platform/verisilicon/` (its location on current kernels since the driver left `drivers/staging/media/hantro/`).
## In-scope (LOCKING DEFERRED — Phase 1 user input)
To be locked at Phase 1 from candidates A..F above. G is out-of-campaign-scope. H, I are separate top-level campaign decisions, not iter7 candidates.
## Out-of-scope (LOCKED 2026-05-06 for iteration 7)
- iter6-completed work (per-slot REINIT discipline) — done.
- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
- iter5 amendment (Utility seccomp / Firefox-fourier sandbox) — done.
- iter4-completed work (Track A frame-11 EINVAL fix) — done.
- iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter7 candidate D.
- New codecs OUTSIDE H.264 (VP8/VP9/AV1/HEVC out per iter1 lock; MPEG-2 dropped at iter6 close).
- New target hardware (fresnel, ampere) — separate campaigns (H, I above).
- WiFi-IRQ frame drops — system-level, not libva-multiplanar.
## Phase 1 success criterion (will lock after user picks candidate)
Pre-lock template:
- For candidate A: "100-frame `vaapi-copy` produces frame hashes matching either FFmpeg SW baseline (preferred) or iter1 baseline (if msync-removal causes any divergence). If divergence, msync restored and verified."
- For candidate B: "Synthetic fault-injection (REINIT returns -EBUSY after N frames) demonstrates pool starvation pre-fix; post-fix demonstrates `request_pool_force_release` reclaims the slot and decode resumes."
- For candidate C: "Synthetic test program issues `vaCreateSurfaces(small)` then `vaCreateSurfaces(big)` then decodes bbb's first I-frame; driver stderr has zero REQBUFS-EBUSY events; output frame sha matches FFmpeg SW reference for that I-frame."
- For candidate D: "Mozilla Bugzilla bug filed with combined 160-line patch attached, references bug 1833354/1965646. Bootlin patch series prepared as a clean iter1-iter6 sequence on a separate branch, ready to send (no PR until operator OK)."
- For candidate E: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW, SW baseline} across drop count + CPU% + frame timing + GPU freq on bbb_1080p30. Reproducible from documented script."
- For candidate F: "vaapi-copy + vaapi --vo=null still produce real frames with V4L2_MEMORY_DMABUF-backed OUTPUT buffers; race window architecturally closed."
## Stop point
**Phase 1 lock requires user input** — pick from A..F (and any pairing).
Recommended primary: **A + B + C** — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants upstream-filing or perf-anchor next.
Alternative leans:
- **D alone** if operator wants the upstream-prep iteration now
- **E alone** if perf measurement matters more than carryover-closure
- **F alone** if architectural cleanliness drives the next iteration
After lock, iter7 phases 2..8 proceed autonomously per "Stop only if user is needed."