ec769a9687
PineTab2 is Rockchip RK3566 silicon, not RK3568. The hantro driver
attaches via the rockchip,rk3568-vpu DT compatible because RK3566/
RK3568 silicon is close enough to share that variant. The proper
RK3566 mainline driver target (rkvdec2 / vdpu346) has no kernel
support yet — Christian Hewitt's patch series LKML 2025/12/26/206
is unmerged.

Updated operative docs to use the consistent form:
"PineTab2 (Rockchip RK3566 silicon; hantro driver via the
rockchip,rk3568-vpu DT compatible)" or shorter variants.

Files updated:
- README.md (campaign top-level): TL;DR, deliverable, KWin link,
  hardware target, hardware listing
- firefox-fourier/README.md: tested-on line
- phase8_iteration7_close.md: hardware carry
- phase8_iteration6_close.md: hardware carry, MPEG-2 drop
  rationale
- phase0_findings_iter7.md: predecessor summary, fourier-fresnel
  description, hardware carry
- phase2_iter7_situation.md: msync hypothesis hardware reference

Historical iter1-iter5 phase docs left as-is — they're snapshots
of what the campaign believed at the time. The canonical source
for the silicon-ID correction is track_F_research_2026-05-06.md
(commit 358801b).

Not a correctness change. The campaign's empirical evidence is
unaffected — the hantro/rk3568-vpu driver path that we exercised
was always the actual decode path on PineTab2 silicon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
193 lines
14 KiB
Markdown
# Phase 0 — iteration 7 substrate (libva-multiplanar campaign)

Opened 2026-05-06 immediately after iter6 close. iter6 closed GREEN with a single architectural fix (per-OUTPUT-slot request_fd binding via REINIT, fork commit `a09c03c`) that resolved the merged A∪I scope. Both candidate I (Firefox VIDIOC_QBUF EINVAL) and candidate A (cap_pool resolution-change race) closed in the same fix; the resolution-change race was organically exercised by YouTube's quality-renegotiation `cap_pool_init` events (4 events on a ~95s YT avc1 run, all clean).

Operator's primary goal — Firefox HW decode of YouTube avc1 on PineTab2 — is now MET end-to-end. Remaining campaign work is polish, formal verification, and upstream-prep.
## Predecessor close-out summary (iteration 6 → iteration 7)

iter6 landed a single fork commit:

- `a09c03c` — per-OUTPUT-slot request_fd binding via `MEDIA_REQUEST_IOC_REINIT`. Replaces iter4's `385dee1` close+`media_request_alloc`-per-frame model. Pool size 4 → 16. Slot owns the fd, surface borrows. iter4's case-against-REINIT was confirmed to be a DPB-payload confounder (since fixed in `74d8dd1`).

iter6 dropped MPEG-2 from the carry list (CPU handles it fine on the PineTab2 A55 cluster).

iter6 carried into iter7:
- **msync pixel-correctness verification** (carry from iter5 Phase 5 sonnet C3)
- **Slot-leak error recovery** (iter6 internal carry — `request_pool_force_release` for the rare case where REINIT or DQBUF fails mid-cycle)
- **Probe-pattern test harness for cap_pool race** (carry from iter5 sonnet C4 — race is organically exercised by YT but a synthetic test would anchor the claim formally)
- **WiFi-IRQ frame drops** (out-of-campaign system concern; flagged but not on iter7 scope)
## Iteration 7 candidate research questions

### A. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)

> Confirm decoded frames are byte-identical (or visually-correct-and-deterministic) post-iter5-sweep, after `msync(MS_SYNC | MS_INVALIDATE)` was removed alongside the iter1 patch-0010 hex-dump diagnostic.

**Plan**: 100-frame `vaapi-copy` run on `bbb_1080p30_h264.mp4`, capture md5/sha of each decoded YUV plane, compare against (a) iter1 baseline (msync-present), (b) FFmpeg software decode reference, (c) iter5-end baseline.

**Risk**: low. If frames diverge, msync goes back in. If frames match, formally close iter5 sonnet C3.

**Effort**: 1-2 hours including writing the frame-hash harness.
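
The per-plane hash comparison in the plan above could look roughly like the following. This is a minimal sketch, not the campaign's harness: it assumes both captures are raw NV12 streams with tightly packed planes (no stride or height padding), and the function names `plane_hashes` and `first_mismatch` are hypothetical. On a mismatch it also reports the largest per-byte difference in the first diverging frame, which is one way to document a "bit-precision delta".

```python
import hashlib


def nv12_frame_size(width, height):
    # NV12: full-resolution Y plane followed by an interleaved,
    # half-resolution UV plane (assumes no padding between rows or planes).
    return width * height * 3 // 2


def plane_hashes(raw, width, height):
    """Return [(y_sha256, uv_sha256), ...], one tuple per complete frame."""
    y_size = width * height
    frame_size = nv12_frame_size(width, height)
    hashes = []
    for off in range(0, len(raw) - frame_size + 1, frame_size):
        y_plane = raw[off:off + y_size]
        uv_plane = raw[off + y_size:off + frame_size]
        hashes.append((hashlib.sha256(y_plane).hexdigest(),
                       hashlib.sha256(uv_plane).hexdigest()))
    return hashes


def first_mismatch(hw_raw, sw_raw, width, height):
    """Return None if every frame matches, else (frame_index, max_abs_delta)."""
    hw = plane_hashes(hw_raw, width, height)
    sw = plane_hashes(sw_raw, width, height)
    if len(hw) != len(sw):
        raise ValueError(f"frame count differs: {len(hw)} vs {len(sw)}")
    frame_size = nv12_frame_size(width, height)
    for index, (hw_pair, sw_pair) in enumerate(zip(hw, sw)):
        if hw_pair != sw_pair:
            off = index * frame_size
            delta = max(abs(a - b) for a, b in zip(hw_raw[off:off + frame_size],
                                                   sw_raw[off:off + frame_size]))
            return index, delta
    return None
```

Calling `first_mismatch(open("hw.nv12", "rb").read(), open("sw.nv12", "rb").read(), 1920, 1080)` (file names are placeholders) returns `None` for a byte-identical run. If the hardware path pads rows or planes, the captures would need to be cropped before hashing.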
### B. Slot-leak error recovery (iter6 internal carry)

> Add `request_pool_force_release(pool, slot_index)` that REINITs the slot's fd and clears the slot's `busy` flag, callable from RequestSyncSurface error paths.

Currently, when REINIT or DQBUF fails mid-cycle, the slot stays busy=true until `RequestTerminate`. With pool=16 and rare errors this is bounded, but it's a slow leak that would eventually starve acquire under sustained-error scenarios.

**Plan**: add `request_pool_force_release` to request_pool.{c,h}. Call from surface.c error paths. Verify with a fault-injection harness (return `-EBUSY` from REINIT in some test scenario).

**Risk**: low — additive function, doesn't change happy path.

**Effort**: 1 hour code + smoke test.
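
The leak and the intended recovery can be illustrated with a toy model. This is a sketch only: the real pool lives in C in request_pool.{c,h}, and the class, method names and the `fail_every` knob here are hypothetical stand-ins for the fault-injection toggle. The model tracks just the `busy` flag; the real `request_pool_force_release` would also REINIT the slot's fd.

```python
import errno


class RequestPool:
    """Toy model of the per-OUTPUT-slot pool: one busy flag per slot."""

    def __init__(self, size=16):
        self.busy = [False] * size

    def acquire(self):
        for index, is_busy in enumerate(self.busy):
            if not is_busy:
                self.busy[index] = True
                return index
        return -errno.EBUSY  # every slot is busy: the pool is starved

    def release(self, slot):
        # Happy path: the sync cycle finished, the slot is reusable.
        self.busy[slot] = False

    def force_release(self, slot):
        # Error path: stands in for "REINIT the fd, then clear busy".
        self.busy[slot] = False


def decode_frames(pool, n_frames, fail_every, recover):
    """Run n_frames sync cycles; every fail_every-th cycle fails.

    Returns the number of frames decoded before the pool starves.
    """
    decoded = 0
    for frame in range(1, n_frames + 1):
        slot = pool.acquire()
        if slot < 0:
            break  # starvation: no slot left to submit the frame
        if frame % fail_every == 0:
            if recover:
                pool.force_release(slot)
            continue  # injected failure; without recovery the slot leaks
        pool.release(slot)
        decoded += 1
    return decoded
```

With a 16-slot pool and a failure on every second frame, `decode_frames(RequestPool(16), 100, 2, recover=False)` stops once 16 slots have leaked, while the same run with `recover=True` keeps going through all 100 frames and loses only the frames that failed.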
### C. Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A formal-anchor)

> Write a synthetic test program that exercises the `vaCreateSurfaces(small) → vaCreateSurfaces(big)` resolution-change pattern that originally triggered REQBUFS-EBUSY. Confirms iter6's REINIT discipline holds the cap_pool race closed in a deterministic-repro form, not just organically via YT.

**Plan**: 50-line C program using libva: open device, create context, create 4 surfaces at 128×128, vaPutSurface noop, destroy, create 4 surfaces at 1920×1080, decode bbb's first I-frame, sha256 the output. No EBUSY events expected in driver stderr.

**Risk**: low. Test harness only; doesn't change driver.

**Effort**: 2-3 hours including writing test, fixturing bbb's first I-frame as raw H.264 NAL.
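
The pass condition for the probe run (no EBUSY in driver stderr, matching hash, clean exit) could be evaluated by a small wrapper like the one below. It is a sketch under assumptions: the driver's exact log line format is not specified here, so the check simply looks for the substring `EBUSY`, and `probe_failures` is a hypothetical helper name.

```python
def probe_failures(driver_stderr, frame_sha256, reference_sha256, exit_code):
    """Return a list of failure reasons; an empty list means the probe passed."""
    failures = []
    # Assumption: any stderr line mentioning EBUSY counts as a race event.
    ebusy_lines = [ln for ln in driver_stderr.splitlines() if "EBUSY" in ln]
    if ebusy_lines:
        failures.append(f"{len(ebusy_lines)} EBUSY event(s) in driver stderr")
    if frame_sha256.lower() != reference_sha256.lower():
        failures.append("decoded I-frame sha256 differs from SW reference")
    if exit_code != 0:
        failures.append(f"harness exited {exit_code}")
    return failures
```

Returning the reasons rather than a bare boolean keeps a failed run self-explanatory in the phase log.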
### D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5 + iter6)

> File the firefox-fourier patch with Mozilla Bugzilla (bug 1833354 / 1965646 reference for V4L2 stateless analogue). File libva-v4l2-request fork's iter1-iter6 patch series with bootlin's libva-v4l2-request maintainer (Paul Kocialkowski) as a coherent series.

**Plan (Mozilla)**: Bugzilla account, write up "V4L2 stateless decoders blocked by RDD+Utility sandbox" with reproducer, attach combined 160-line patch.

**Plan (bootlin)**: structure the iter1-iter6 commits as a clean patch series. Possibly squash some of iter5's instrumentation-removal commits with their original patch landings. Address upstream concerns about per-call diagnostics that survived as `request_log` lines.

**Risk**: socially-mediated. Maintainer may push back on architectural decisions; per `feedback_no_upstream.md` no PR/MR happens without explicit operator instruction.

**Effort**: bug filing 2-4 hours; patch-series prep 4-8 hours.
### E. Performance binding cell (carried from iter1+2+3+4+5+6)

> Anchor measured numbers for the four primary consumer paths: mpv-vaapi DMA-BUF, mpv-vaapi-copy, Firefox-fourier HW, SW baseline. Drop count, CPU%, frame timing, GPU freq, on bbb_1080p30. Reproducible from a documented script.

**Plan**: shell script that runs each consumer for 30s, captures `pidstat -u -p <PID>`, reads `/sys/class/devfreq/fde60000.gpu/cur_freq`, parses mpv stats / Firefox `about:processes`. Generate a markdown table. Carried six iterations now.

**Risk**: low. Measurement only.

**Effort**: 3-4 hours including script + run + table.
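
The table-generation end of that plan could be sketched as follows. The devfreq path is the one quoted in the plan; everything else (the row keys, the column set, the helper names) is an assumption made for illustration, and collecting the CPU and drop figures from `pidstat` or mpv is left out.

```python
from pathlib import Path

# Path taken from the plan above; devfreq reports cur_freq in Hz.
GPU_FREQ_PATH = "/sys/class/devfreq/fde60000.gpu/cur_freq"


def read_gpu_freq_hz(path=GPU_FREQ_PATH):
    """Return the current GPU frequency in Hz, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def render_table(rows):
    """Render measurement rows (dicts) as a markdown table string."""
    lines = [
        "| Consumer | CPU % | Dropped frames | GPU freq (MHz) |",
        "|---|---|---|---|",
    ]
    for row in rows:
        gpu_hz = row["gpu_hz"]
        mhz = "n/a" if gpu_hz is None else f"{gpu_hz / 1e6:.0f}"
        lines.append(
            f"| {row['consumer']} | {row['cpu_pct']:.1f} | {row['drops']} | {mhz} |"
        )
    return "\n".join(lines)
```

Each consumer run would append one dict with the keys `consumer`, `cpu_pct`, `drops` and `gpu_hz`; a missing devfreq node shows up as `n/a` instead of aborting the run.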
### F. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5+6)

> Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation for the OUTPUT pool. Architectural fix for the iter2 cap_pool race (statistical / LRU mitigation now superseded by iter6's REINIT discipline, but DMABUF is structurally cleaner).

**Risk**: highest unknown of any candidate. Possibly requires kernel work. Hantro on this kernel may not accept DMABUF on OUTPUT.

**Effort**: 1-3 days, possibly more.
### G. WiFi-IRQ frame drops (NEW from iter6, out-of-campaign-scope flagged)

> ohm shows visible frame drops during YouTube playback whenever the brcm/iwlwifi driver does heavy IRQ work. Decode pipeline is fine; presentation schedule slips.

**Stance**: out of campaign scope (system-level concern, not a libva-multiplanar bug). Listed here because operator surfaced it during iter6 verification. If desired, separate investigation into IRQ affinity, GRO offload, network-stack buffer settings.
### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)

> Open the `fourier-fresnel` campaign — port `libva-v4l2-request-fourier` from ohm PineTab2 (RK3566 via hantro/rk3568-vpu) to fresnel RK3399 (Pinebook Pro). Validates generality of iter1-iter6 fixes on a second hardware target.

**Stance**: separate top-level campaign, not an iter7 candidate. Charter at `~/src/fourier-fresnel/` once opened. iter5 memory entry `project_followon_campaigns.md` records sequencing: fourier-fresnel before panvk-bifrost.
### I. panvk-bifrost campaign (carried from iter5 followon-campaigns memory)

> Charter at `~/src/panvk-bifrost/` already exists as document-only. Sequenced after fourier-fresnel.

**Stance**: separate top-level campaign.
### Recommended pairings

- **A + B** (msync verify + slot-leak fix) — both small, both formally close iter6's carry items. Tightest scope.
- **A + C** (msync verify + cap_pool race harness) — formally closes the two iter5 sonnet caveats. Mid scope.
- **A + B + C** — closes all three internal carry items in one iteration. Reasonable upper bound.
- **D alone** — gated on operator instruction. Big effort but the campaign's culmination.
- **E alone** — anchors campaign-wide claims to numbers. Carried six iterations now.
- **F alone** — high-risk architectural.

**Recommended primary**: **A + B + C** — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants the upstream-filing iteration first or the perf-anchor iteration first.

**Alternative**: **D alone** if operator wants to make iter7 the upstream-prep / external-filing iteration. Carries political weight (Mozilla + bootlin) but minimal new code.
## State that carries (re-verified iter6 close)

- **Hardware**: ohm PineTab2 (Rockchip RK3566 silicon; hantro driver via `rockchip,rk3568-vpu` DT compatible), kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky.
- **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`.
- **Build container**: firefox-fourier LXD on boltzmann, persistent.
- **Diagnostic instrument**: `/home/mfritsche/iter6-fork-dx/` on ohm — full fork tree with ITER6_DX log instrumentation, can be reactivated for iter7 candidate B fault-injection or candidate C harness wiring.
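
Re-verifying the recorded digests at the start of an iteration could be scripted along these lines. This is a sketch with hypothetical helper names; it treats a recorded value ending in `...` (as the fixture digest above is written) as a prefix rather than guessing the missing characters.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(actual_hex, recorded):
    """Compare against a full digest, or a prefix when it ends in '...'."""
    if recorded.endswith("..."):
        return actual_hex.startswith(recorded[:-3])
    return actual_hex == recorded
```

For example, `matches_recorded(sha256_file("/usr/lib/dri/v4l2_request_drv_video.so"), "ebe396d5...")` would confirm the installed driver against a shortened form of the digest listed above; passing the full 64-character digest makes the check exact.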
## State that does NOT carry

- iter6 telemetry logs are tmpfs-volatile.
- The pre-iter6 (iter5-end) driver binary is backed up at `/home/mfritsche/v4l2_request_drv_video.so.iter5end.bak`; the iter6-DX build with diagnostics is at `/home/mfritsche/iter6-fork-dx/build/src/v4l2_request_drv_video.so` if needed for re-instrumentation.
## Tooling and measurement-instrument inventory

Same as iter6. Plus for new candidates:

- For A (msync verify): `vaapi-copy` already produces NV12 output to mpv's vo=null path. Add an mpv `--frames=N --o=output.yuv` script to capture raw YUV; sha256 each frame. FFmpeg's `ffmpeg -i bbb -vframes N output_%d.yuv` provides the SW reference.
- For B (slot-leak): synthesize a fault-inject by adding a debug toggle in surface.c that returns `-EBUSY` from REINIT after N frames; observe pool starvation, then add force_release and verify recovery.
- For C (cap_pool harness): write a 50-line libva test program. Reuse driver_data inspection from iter5 Track E test pattern.
- For D (upstream): no new tooling; existing patch series live in `firefox-fourier/` and the fork's git log iter1-iter6.
- For E (perf): `pidstat -u`, `/sys/class/devfreq/fde60000.gpu`, mpv stats overlay, Firefox `about:processes`.
- For F (DMABUF): kernel docs `Documentation/userspace-api/media/v4l/buffer.rst`, hantro driver source `drivers/media/platform/verisilicon/` (it lived at `drivers/staging/media/hantro/` before the driver was destaged in Linux 6.0).
## In-scope (LOCKED 2026-05-06 for iteration 7) — A + B + C

Operator locked **A + B + C**: msync pixel-correctness verification, slot-leak error recovery, and cap_pool-race synthetic test harness. Closes all three iter5/iter6 internal carry items in one iteration.

D (upstreaming), E (perf binding cell), F (V4L2_MEMORY_DMABUF) deferred to iter8+. G (WiFi-IRQ frame drops) remains out-of-campaign-scope. H, I are separate top-level campaigns.
## Out-of-scope (LOCKED 2026-05-06 for iteration 7)

- iter6-completed work (per-slot REINIT discipline) — done.
- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
- iter5 amendment (Utility seccomp / Firefox-fourier sandbox) — done.
- iter4-completed work (Track A frame-11 EINVAL fix) — done.
- iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter7 candidate D.
- New codecs OUTSIDE H.264 (VP8/VP9/AV1/HEVC out per iter1 lock; MPEG-2 dropped at iter6 close).
- New target hardware (fresnel, ampere) — separate campaigns (H, I above).
- WiFi-IRQ frame drops — system-level, not libva-multiplanar.
## Phase 1 success criterion (LOCKED 2026-05-06 for iteration 7)

> All three sub-tracks must independently pass on the iter7-end driver build:
>
> **A — msync pixel-correctness verification**
> - 100-frame `mpv --hwdec=vaapi-copy --o=output_%04d.yuv` against `bbb_1080p30_h264.mp4`.
> - Frame-by-frame sha256 of the captured YUV planes compared against FFmpeg SW decode reference (`ffmpeg -i bbb -frames:v 100 -f rawvideo -pix_fmt nv12 -`).
> - **Pass:** all 100 frames match SW reference byte-for-byte (or visually-identical with documented bit-precision delta if the kernel's NV12 packing differs trivially from FFmpeg's). Formally closes iter5 sonnet C3.
> - **Fail action:** restore `msync(MS_SYNC | MS_INVALIDATE)` in the surface DQBUF path; re-run; verify match. Document either way.
>
> **B — Slot-leak error recovery**
> - `request_pool_force_release(pool, slot_index)` added to request_pool.{c,h}; REINITs the slot's fd and clears the slot's `busy` flag.
> - Called from `RequestSyncSurface` error paths after `media_request_reinit` or `DQBUF` failure.
> - Synthetic fault-injection: a debug compile flag returns `-EBUSY` from REINIT after N frames. Pre-fix: pool starves after 16 errors. Post-fix: pool recovers; decode continues across error events.
> - mpv-vaapi-copy 100-frame regression test still GREEN (no regression on the happy path).
>
> **C — Probe-pattern test harness for cap_pool race**
> - C program at `tests/cap_pool_probe_pattern.c` (~50 lines) using libva: open device, `vaCreateContext`, `vaCreateSurfaces(128×128, 4)`, dispose, `vaCreateSurfaces(1920×1080, 4)`, decode bbb's first I-frame, sha256 the output.
> - **Pass:** zero `REQBUFS-EBUSY` events in driver stderr; decoded frame sha matches FFmpeg SW reference for the same I-frame; harness exits 0.
> - Formally anchors iter5 sonnet C4 / iter6 candidate A — the race that was organically exercised by YouTube's resolution renegotiations is now also covered by a deterministic synthetic test.
>
> Phase 5 sonnet review must explicitly confirm: (a) any restored msync (if A required it) is correctly placed, (b) `request_pool_force_release` doesn't introduce new mutable global state or break the pool's invariants, (c) the cap_pool harness is a real test (not just a fixture-hardcoded check that passes trivially).
## Phase 1 LOCKED. Iteration 7 proceeds.

iter7 = A + B + C combined. Phases 2..8:
- Phase 2: situation analysis for each track (A/B/C) — what we expect to find, what tools needed, what could go wrong
- Phase 3: baseline anchor — capture pre-fix state for each (A: current frame hashes vs SW; B: current pool starvation under fault inject; C: current behavior on probe pattern)
- Phase 4: execute. Order: B (smallest, additive) → C (synthetic test, no driver change) → A (verification — runs against the iter7-end driver including any B/C changes)
- Phase 5: sonnet review of combined diff before commit
- Phase 6: deploy iter7 driver to ohm
- Phase 7: verify all three tracks against locked criteria above
- Phase 8: close

After lock, iter7 phases 2..8 proceed autonomously per "Stop only if user is needed."