c36c61e197
Operator-driven Firefox 150 video session under iter2 driver: - With MOZ_DISABLE_RDD_SANDBOX=1: our libva backend engages, decodes 10 frames cleanly through hantro (luma gradient 0x10..0x1c matching BBB intro fade), surface ID recycle + cap_pool slot cycling work as designed. EINVAL on frame 11 is iter1-carryover (Sonnet 7.x family), not an iter2 code regression. - Without sandbox bypass: Firefox 150 RDD sandbox blocks open of /dev/media0 with ENETDOWN, libva init fails, Firefox SW-falls-back. iter1 evidence shows libva ran in the utility process (sandboxingKind=0) at that time; Firefox routing has since changed to RDD. NOT iter2-side; flag for iter3 substrate. Cap_pool architecture confirmed working with Firefox: 8 surfaces, slots recycled IN_DECODE -> DECODED -> reacquired across multiple BeginPicture cycles for the same surface ID. Decoded NV12 content is real (matches mpv vaapi-copy luma signature on the same clip). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
83 lines
9.5 KiB
Markdown
83 lines
9.5 KiB
Markdown
# Iteration 2 — Phase 8 close (2026-05-04)
|
|
|
|
## Deliverable
|
|
|
|
`marfrit/libva-v4l2-request-fourier` master at `19acc76` — three independent hardening fixes layered on top of iteration 1's working decode:
|
|
|
|
| Fix | Commit | What it addresses |
|
|
|---|---|---|
|
|
| Fix 1 | `06beef6` | Multi-resolution session-state corruption: REQBUFS(0) on CAPTURE in resolution-change path + invalidate `LAST_OUTPUT_WIDTH/HEIGHT` cache on `RequestDestroyContext` |
|
|
| Fix 2 | `e64bb08` | Mesa WSI rejection of non-64-aligned pitch: conditional `DRM_FORMAT_MOD_INVALID` for pitches that aren't 64-byte aligned |
|
|
| Fix 3 | `19acc76` | DMA-BUF lifecycle race: decoupled CAPTURE buffer pool with 4-state machine (FREE / IN_DECODE / DECODED / EXPORTED), pthread-locked, LRU recycling. Pool size = max(surfaces_count, MIN_CAP_POOL=24). |
|
|
|
|
## Iteration 2 Phase 1 lock — met
|
|
|
|
> mpv `--hwdec=vaapi --vo=gpu` plays bbb_1080p30 for ≥30s without operator-visible stutter or back-and-forth (binding cell: operator inspection on real screen).
|
|
|
|
**Met (operator-confirmed: "smooth"** on the goal-test, 2026-05-04 with driver sha256 `f27e0064...`).
|
|
|
|
## Phase 6 / 7 verification status
|
|
|
|
| Consumer | Result |
|
|
|---|---|
|
|
| vainfo | ✓ enumerates 7 H.264 + 2 MPEG-2 profiles |
|
|
| mpv `--hwdec=vaapi-copy --vo=null --frames=200` bbb_1080p30 | ✓ 0 drops, 0 decoder drops, real luma gradient (0x10→0x81), pool LRU visibly recycling slot indices `0..23 → 0,2,1,4,3...` |
|
|
| mpv `--hwdec=vaapi --vo=gpu` (real-VO, operator inspection) | ✓ smooth — iteration 2 goal met |
|
|
| Firefox 150 (sandbox-disabled) | ✓ engages our libva, decodes 10 frames cleanly through hantro (luma gradient `0x10→0x1c` matching BBB intro fade, real NV12 pixels), then EINVAL on `set_controls` at frame 11. The EINVAL is a **non-iter2 issue** — same Sonnet 7.x family carryover from iter1 (likely 7.5 mid-stream / 7.2 num_ref_idx). cap_pool model is NOT the regression. |
|
|
| Firefox 150 (default sandbox) | ✗ libva init fails inside RDD sandbox on `open(/dev/media0)` returning ENETDOWN — Firefox SW-falls-back. **NOT an iter2 code regression** (iter1 init code is byte-identical), but a Firefox routing change since iter1: iter1's findings.md shows decode happened on the **utility** process (`sandboxingKind=0`), iter2 today shows the libva path goes through RDD which is sandbox-blocked. Workaround: launch Firefox with `MOZ_DISABLE_RDD_SANDBOX=1`. |
|
|
| Brave / chromium-fourier 149 | ✓ unchanged scope (uses chromium-internal V4L2 backend, bypasses libva) |
|
|
|
|
The 10-frame decoded sequence under sandbox-bypass confirms Fix 3's cap_pool architecture works correctly with Firefox: surface IDs 67108864..67108871 each acquired their own slot, and surface IDs were recycled across frames 5,6,9 with the slot state machine cycling through IN_DECODE → DECODED → recycle on next BeginPicture for the same surface. Pool was operating exactly as designed.
|
|
|
|
## Architecture-level changes worth knowing for iter3+
|
|
|
|
- **`cap_pool` abstraction (`src/cap_pool.{h,c}`)** is now the single owner of CAPTURE buffers' V4L2 indices, mmap regions, and EXPBUF fd lifetimes. Surfaces hold a `current_slot` reference, populated at `BeginPicture` and cleared at `DestroySurfaces` or next `BeginPicture` for the same surface. Format-uniform fields (`destination_planes_count`, `destination_sizes`, `destination_offsets`, `destination_bytesperlines`) stay on `object_surface` since they're constant across slot rebinds.
|
|
- **Pool destroy lifecycle** is split between two paths: (a) `DestroyContext` calls `cap_pool_destroy` after the surface unbind cascade; (b) the resolution-change branch in `CreateSurfaces2` also tears the pool down before issuing REQBUFS(0) to avoid leaving slots pointing at dead V4L2 indices. Both code paths are required.
|
|
- **`DeriveImage` semantic change**: skips the surface→image copy when `current_slot == NULL`. ffmpeg's `av_hwframe_ctx_init` calls `vaDeriveImage` on never-decoded surfaces purely as a format probe; pre-Fix-3 this worked because every surface had a permanently-bound zero-filled mmap, post-Fix-3 it would crash. The new behavior gives the consumer a valid `VAImage` structure with a zero-filled buffer; the actual surface data is read on a later post-decode `DeriveImage`.
|
|
|
|
## Documented limitations carried into iteration 3 substrate
|
|
|
|
- **Option-A statistical mitigation, not provably race-free.** When the pool exhausts FREE slots and force-recycles the oldest EXPORTED slot, the consumer may still hold a dup'd dma_buf fd to the recycled slot's underlying physical memory. The kernel will happily DMA new content into it on the next QBUF. Closing OUR EXPBUF fd does not invalidate the consumer's dup. There is no public V4L2 API to query dma_buf refcount. For typical mpv 16-surface playback with `MIN_CAP_POOL=24`, the force-recycle path never fires. For pathological workloads (paused-with-video-still-visible, very long compositor hold windows, multi-stream), races still possible.
|
|
- **Multi-context concurrent libva use.** Each context allocates its own `cap_pool`, but there's only one V4L2 device. Two concurrent libva contexts (Firefox playing two tabs through separate RDD instances) would multiply pool size on a single device. Not addressed in iteration 2.
|
|
- **`LAST_OUTPUT_WIDTH/HEIGHT` cache is process-global.** Sequential session reuse is fixed (Fix 1's invalidation), but two concurrent contexts in the same process would still race on the static.
|
|
- **Iteration 1 carry-overs not addressed in iteration 2** (Sonnet review 7.x): EACCES on `VIDIOC_G_EXT_CTRLS` readback (7.1), `num_ref_idx` for multi-slice streams (7.2), `// HACK` block in `surface.c::CreateSurfaces2` for multi-codec (7.4), Firefox seek-to-non-IDR (7.5).
|
|
|
|
## What landed (commits ahead of iteration 1 close)
|
|
|
|
```
|
|
19acc76 iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling
|
|
e64bb08 iter2 Fix 2: conditional DRM_FORMAT_MOD_INVALID for non-64-aligned pitch
|
|
06beef6 iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path
|
|
```
|
|
|
|
DEBUG instrumentation from iteration 1 stays per the original Sonnet review guidance ("clean sweep at iteration 2 close" was the original plan, but Fix 3 introduced new sites that benefit from the existing CAPTURE-dump and ENTER-trace harness). Iteration 3 sweep still pending.
|
|
|
|
## Lessons distilled to memory
|
|
|
|
- **`feedback_consumer_probe_calls.md`** (NEW) — VAAPI consumers (notably ffmpeg's `av_hwframe_ctx_init`) may call `vaDeriveImage` and similar surface-introspection APIs on never-decoded surfaces purely as format probes. Backend code must tolerate uninitialized per-decode state (NULL data pointers, no slot bound) on these paths and return valid metadata + zero-filled buffers, not crash. Saved for iter3 work plus future codec backends.
|
|
|
|
(Phase 5 architecture lessons + the C-comment-fragility / `VIDEO_MAX_PLANES` redefinition syntax-bugs are not memory-worthy — derivable from current code state and gcc behavior respectively.)
|
|
|
|
## Phase 0 carry-over to iteration 3 (substrate input)
|
|
|
|
State that carries (re-verified in iteration 2):
|
|
- ohm RK3568 hantro G1/G2 on `/dev/video1` + `/dev/media0`, kernel 6.19.10
|
|
- libva 2.23.0, libva-utils 2.22.0, mpv 0.41.0-3, Firefox 150.0.1, Mesa 26.0.5
|
|
- Test fixture: `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4`
|
|
- Build harness: `meson setup --buildtype=release && ninja` on ohm; deploy to `/usr/lib/dri/v4l2_request_drv_video.so`
|
|
- All three iter2 fixes deployed (driver sha256 `f27e0064...`)
|
|
|
|
Open questions for iteration 3:
|
|
1. **Firefox 150 RDD sandbox blocks `/dev/media0`** — workaround `MOZ_DISABLE_RDD_SANDBOX=1` confirmed. Either a Firefox routing change since iter1 (decode used to go through utility/sandboxingKind=0; now goes through RDD/sandboxed) or a Mesa-side change; needs investigation. Either upstream a Firefox prefs option, document the env var requirement permanently, or switch to a libva backend pattern that doesn't need /dev/media0 (likely impossible — the V4L2 stateless API requires it).
|
|
2. **EINVAL after ~10 frames in Firefox.** Same as iter1 carryover Sonnet 7.x family (7.5 mid-stream non-IDR, 7.2 num_ref_idx for multi-slice). Reproducible now; iter3 should narrow which control is rejected.
|
|
2. **DEBUG instrumentation sweep.** Remove ENTER logging, CAPTURE/OUTPUT hex dumps, sentinel write in `EndPicture`, msync workaround comment-only-or-removable. Required before any upstream snapshot.
|
|
3. **Performance comparison binding cell** (deferred from iter1, iter2 substrate notes). Iteration 3 should be the first iteration that does perf measurement: drop counts, effective FPS, browser CPU% on a defined corpus, scanout-plane residency for vaapi vs vaapi-copy vs SW.
|
|
4. **Multi-context libva safety** (Sonnet review 9.6). Two concurrent contexts (different processes or same-process Firefox + mpv overlap) need pool isolation that doesn't currently exist.
|
|
5. **`V4L2_MEMORY_DMABUF` userspace allocation** (the iter2 plan's Option B). True architectural fix for the EXPBUF lifecycle race — userspace allocates the dma_buf and lifetime is unambiguous. Significant kernel-side test surface; not free.
|
|
6. **Sonnet iter1 carry-over questions** still open: 7.1 EACCES, 7.2 num_ref_idx, 7.4 HACK refactor, 7.5 Firefox seek-to-non-IDR.
|
|
7. **Cleanup of `LAST_OUTPUT_WIDTH/HEIGHT` static.** Move to per-context (or per-pool) state to fix concurrent-context safety.
|
|
|
|
## Bootlin upstream outlook
|
|
|
|
iteration 2 widens the gap between this fork and bootlin upstream by ~700 LOC (Fix 3's cap_pool abstraction). Per `feedback_no_upstream.md`, no PRs unless asked. Iteration 3's DEBUG sweep + multi-codec HACK refactor + perf data are the natural prerequisites if upstreamability becomes a goal.
|