iter6 close: A∪I GREEN — per-slot request_fd binding via REINIT
Single architectural fix lands at libva-v4l2-request-fourier commit a09c03c (`iter6 fix: per-OUTPUT-slot request_fd binding via REINIT`).

Closes both:
- candidate I (Firefox VIDIOC_QBUF EINVAL after multi-surface decode)
- candidate A (cap_pool resolution-change race), organically exercised and verified on YouTube avc1; 4 cap_pool_init events handled cleanly

Phase 1 success criterion met across all three consumer paths:
- Firefox bbb_1080p30_h264.mp4: 35s+ clean, RDD holds /dev/video1 + /dev/media0 throughout, zero per-frame errors
- Firefox YouTube avc1 (Enhancer for YouTube forcing h264): ~95s sustained, zero errors, 4 cap_pool_init resolution renegotiations clean
- mpv vaapi-copy regression: clean 50-frame run, EOF reached

Phase 5 sonnet design review (front-loaded) refuted the pool-exhaustion competing hypothesis via experiment, endorsed direction 3 (REINIT).
Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment attribution corrected).

Memory updates:
- feedback_request_fd_lifecycle.md: rewritten. iter4's case-against-REINIT was a DPB-payload confounder. iter6 reinstates REINIT with per-slot binding as the correct discipline. Meta-lesson recorded: when a prior "rule out X" was about an unrelated bug, X is back on the table.

firefox-fourier/README.md: YouTube codec-negotiation note added (Enhancer for YouTube / enhanced-h264ify needed to force avc1 since FF150 auto-negotiates AV1).

WiFi-IRQ-induced frame drops observed during YouTube playback documented as out-of-scope system concern (decode pipeline unaffected; presentation-schedule slips under brcm/iwlwifi IRQ spikes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -94,6 +94,10 @@ sudo cp build/src/v4l2_request_drv_video.so /usr/lib/dri/

Then either the patched Firefox or `MOZ_DISABLE_RDD_SANDBOX=1` will work for HW decode.

### YouTube codec note

YouTube negotiates the highest codec the browser advertises support for. Without forcing avc1, FF150 picks AV1 from YouTube on most modern hardware and SW-decodes; the v4l2_request driver only handles H.264, so libva isn't engaged. To exercise HW decode on YouTube, install [`Enhancer for YouTube`](https://addons.mozilla.org/en-US/firefox/addon/enhancer-for-youtube/) or [`enhanced-h264ify`](https://addons.mozilla.org/en-US/firefox/addon/enhanced-h264ify/) and configure it to force the h264 codec.

## Upstream status

Not upstreamed at this writing (campaign discipline: no PR/MR without explicit operator instruction). The patch is Mozilla-bug-and-PR-ready in shape and is ~50 lines across two files. Whoever picks it up to file with Mozilla should reference the existing `Bug 1833354` and `Bug 1965646` (V4L2-M2M precedent) as related work; this is the V4L2-stateless analogue.

@@ -0,0 +1,103 @@
# Iteration 6 close (Phase 8): A∪I GREEN

Opened 2026-05-05 immediately after the iter5 amendment closed Track F. Locked candidate: **I** (Firefox VIDIOC_QBUF EINVAL on first frame). Phase 2 telemetry showed the original "frame-1 EINVAL" symptom was transient state; the reproducible failure was an OUTPUT-buffer / request_fd lifecycle race after a varying number of frames. Scope merged with iter5 carryover **A** (cap_pool resolution-change race) since both are facets of the same buffer-pool / surface-recycle weakness.

Closes GREEN with a single architectural fix.

## Verdict

| Element | Result |
|---|---|
| Bug class identified | Per-frame `close(request_fd)` + `media_request_alloc()` reused lowest-free fd against kernel request objects whose teardown hadn't drained, racing with QBUF on a recently-released OUTPUT pool slot |
| Fix | Per-OUTPUT-slot request_fd binding via `MEDIA_REQUEST_IOC_REINIT` instead of close+alloc-per-frame; pool size 4 → 16 (commit `a09c03c` on `libva-v4l2-request-fourier`) |
| Phase 5 design review (front-loaded) | Refuted pool-exhaustion as dominant cause via experiment; endorsed direction 3 (REINIT) |
| Phase 5 code review | APPROVE-WITH-CHANGES (one comment attribution corrected) |
| Firefox + bbb_1080p30_h264.mp4 | 35s+ clean, RDD holds `/dev/video1`+`/dev/media0`, zero per-frame errors |
| Firefox + YouTube avc1 (Enhancer for YouTube forcing h264) | ~95s sustained, 4 `cap_pool_init` events handled cleanly, zero per-frame errors |
| mpv vaapi-copy regression | 50-frame run clean, "Using hardware decoding (vaapi-copy)", EOF reached |

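
The fd-number reuse named in the bug-class row can be observed in isolation. The following is a minimal sketch, not code from the fork: it uses `/dev/null` as a stand-in for a media request fd (an assumption made so the example runs without decode hardware), and the helper name `fd_number_is_reused` is hypothetical. It shows that closing a descriptor and immediately opening a new one returns the same descriptor number, because the kernel hands out the lowest free number.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if close() followed by open() yields the same descriptor
 * number, 0 if it yields a different one, -1 if open() fails.
 * Assumption: /dev/null stands in for the request fd.
 */
int fd_number_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;

    /* The number held in 'first' becomes free again here. */
    close(first);

    /* POSIX: open() returns the lowest-numbered unused descriptor. */
    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

In a single-threaded process the helper returns 1. Under the close+alloc-per-frame scheme, this is how a new request could end up carrying the same fd number as one the kernel had not finished tearing down.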

## What landed

### Fork commit (libva-v4l2-request-fourier)

`a09c03c` (iter6 fix: per-OUTPUT-slot request_fd binding via REINIT), +92 / -26 across 5 files:

- `request_pool.h`: added `int request_fd` field to `struct request_pool_slot`; init signature takes `media_fd`.
- `request_pool.c`: calls `media_request_alloc` once per slot at pool init and closes the fd at destroy. Includes `<unistd.h>` and `media.h`.
- `context.c`: passes `driver_data->media_fd` to `request_pool_init`. Pool size `4` → `16` (comfortable headroom over typical H.264 MaxDpbFrames).
- `picture.c`: `RequestBeginPicture` binds `slot->request_fd` to `surface_object->request_fd`. `RequestEndPicture`'s per-frame `media_request_alloc` is removed (it returns `OPERATION_FAILED` if the surface fd is unset, which now indicates a real bug).
- `surface.c`: `RequestSyncSurface` calls `media_request_reinit(request_fd)` instead of `close(request_fd) + surface_object->request_fd = -1`. `RequestDestroySurfaces`'s close is removed (the slot owns the fd, which is closed at `request_pool_destroy` time, fired from `RequestTerminate`). The error path's close is removed; added `surface_object` NULL-init for `-Wmaybe-uninitialized`.

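
The ownership model described in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the fork's code: the `sketch_*` names and struct layout are hypothetical, and it assumes the fork's `media_request_alloc` and `media_request_reinit` helpers wrap the `MEDIA_IOC_REQUEST_ALLOC` and `MEDIA_REQUEST_IOC_REINIT` ioctls from `<linux/media.h>`. Each slot allocates its request fd once at pool init, recycles it with REINIT after each frame, and closes it only at pool destroy.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

#define SKETCH_POOL_SLOTS 16 /* pool size after iter6 */

/* Hypothetical stand-in for struct request_pool_slot. */
struct sketch_slot {
    int request_fd; /* owned by the slot for the pool's lifetime */
    bool busy;
};

struct sketch_pool {
    struct sketch_slot slots[SKETCH_POOL_SLOTS];
};

/* Close every slot fd; the only place a request fd is closed. */
void sketch_pool_destroy(struct sketch_pool *pool)
{
    assert(pool != NULL);
    for (int i = 0; i < SKETCH_POOL_SLOTS; i++) {
        if (pool->slots[i].request_fd >= 0)
            close(pool->slots[i].request_fd);
        pool->slots[i].request_fd = -1;
        pool->slots[i].busy = false;
    }
}

/* Allocate one request fd per slot up front. Returns 0 or -1. */
int sketch_pool_init(struct sketch_pool *pool, int media_fd)
{
    assert(pool != NULL);
    for (int i = 0; i < SKETCH_POOL_SLOTS; i++) {
        pool->slots[i].request_fd = -1;
        pool->slots[i].busy = false;
    }
    for (int i = 0; i < SKETCH_POOL_SLOTS; i++) {
        int fd = -1;
        if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &fd) < 0) {
            sketch_pool_destroy(pool); /* undo partial init */
            return -1;
        }
        pool->slots[i].request_fd = fd;
    }
    return 0;
}

/*
 * Recycle a completed request in place instead of close+alloc.
 * On failure the slot is left busy (assumption, mirroring the
 * slot-leak limitation documented in this report).
 */
int sketch_slot_recycle(struct sketch_slot *slot)
{
    if (ioctl(slot->request_fd, MEDIA_REQUEST_IOC_REINIT) < 0)
        return -1;
    slot->busy = false;
    return 0;
}
```

Because the fd number never changes for the lifetime of a slot, the fd-number reuse that close+alloc-per-frame produced does not arise in this model.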

### Campaign artifacts (libva-multiplanar)

- `phase0_findings_iter6.md`: substrate + Phase 1 lock (LOCKED → AMENDED with A∪I merge)
- `phase2_iter6_situation.md`: Phase 2 deep dive: 3 telemetry runs, ruled-out theories (DQBUF mismatch, payload divergence, frame-1 setup), pool-bump diagnostic experiment, three Phase 4 directions weighed
- `phase8_iteration6_close.md`: this file

### Diagnostic instrument (local-only, not committed)

`/home/mfritsche/iter6-fork-dx/` on ohm: full fork tree with `ITER6_DX:` log lines added to `v4l2_set_controls`, `v4l2_queue_buffer`, `v4l2_dequeue_buffer` for per-control TRY isolation, full v4l2_buffer dump on QBUF EINVAL, and per-call S_EXT_CTRLS rc tracing. Built but never committed; reverted to clean source for the final committed binary. Backup of pre-iter6 driver at `/home/mfritsche/v4l2_request_drv_video.so.iter5end.bak`.

## State that carries to iter7 (or campaign close)

- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN); VPN currently flaky.
- **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`.
- **Build container**: firefox-fourier LXD on boltzmann, persistent.

## Documented limitations carried to iter7+ (or campaign close)

1. **WiFi-IRQ-induced frame drops**: observed during YouTube playback on ohm; brcm/iwlwifi driver IRQ work spikes CPU usage and the compositor drops late frames. The decode pipeline is unaffected (the driver produces frames on time; the presentation schedule slips). Out of campaign scope but worth a separate investigation (IRQ affinity, GRO offload, driver buffer tuning).
2. **Slot-leak on RequestSyncSurface error path**: when `media_request_reinit` or `DQBUF` fails mid-cycle, the slot stays `busy=true` and isn't returned to acquire-rotation until `RequestTerminate` runs `request_pool_destroy`. With pool=16 and rare errors this is bounded; acquire returns `-1` cleanly when exhausted. TODO: `request_pool_force_release` for error recovery.
3. **No pixel-correctness verification post-iter5-msync-removal**: iter5 sonnet C3 carry. Probably safe (the kernel does DMA sync at DQBUF level on this CMA-backed config), but a frame-hash spot check would anchor this formally.
4. **MPEG-2 path never exercised**: the iter1 lock said "H.264 first; MPEG-2 next." Six iterations later, still H.264-only.
5. **YouTube codec negotiation**: without the `Enhancer for YouTube` or `enhanced-h264ify` browser extension forcing avc1, FF150 negotiates AV1 from YT and SW-decodes. v4l2_request currently handles only H.264. Worth one line in `firefox-fourier/README.md`.

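
For the slot-leak limitation (item 2), the TODO'd `request_pool_force_release` does not exist yet. The sketch below is one possible shape for it, under stated assumptions: the `recov_*` names and the reduced slot struct are hypothetical, and the real pool tracks more state than a `busy` flag. It models the documented behavior (acquire returns `-1` when every slot is busy) and adds an explicit release that an error path could call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECOV_POOL_SLOTS 16

/* Hypothetical, reduced model of a pool slot: only the busy flag. */
struct recov_slot {
    bool busy;
};

struct recov_pool {
    struct recov_slot slots[RECOV_POOL_SLOTS];
};

/* Returns the index of the first free slot and marks it busy, or -1. */
int recov_pool_acquire(struct recov_pool *pool)
{
    assert(pool != NULL);
    for (int i = 0; i < RECOV_POOL_SLOTS; i++) {
        if (!pool->slots[i].busy) {
            pool->slots[i].busy = true;
            return i;
        }
    }
    return -1; /* exhausted: every slot is busy or leaked */
}

/*
 * Error-recovery release: return a slot to rotation even though the
 * normal sync path did not complete. Out-of-range indices are ignored.
 */
void recov_pool_force_release(struct recov_pool *pool, int index)
{
    assert(pool != NULL);
    if (index < 0 || index >= RECOV_POOL_SLOTS)
        return;
    pool->slots[index].busy = false;
}
```

A real implementation would likely also need to confirm that the slot's request is no longer in flight before reusing it; this sketch leaves that question open.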

## Lessons distilled to memory

### Memory updates

- **`feedback_request_fd_lifecycle.md`**: amended to note that the REINIT-vs-close+alloc framing was misleading. iter4's case against REINIT was actually a payload bug (DPB FFmpeg-semantics). Once that was fixed (iter4 `74d8dd1`), REINIT became viable again, and it is in fact the correct discipline for multi-surface consumers, since close+alloc reuses fd numbers in a way that races with the kernel buffer state machine. The lesson: **when a previous iteration's "rule out X" was actually about a confounder unrelated to X, revisit X.**

### New principle worth saving (and saved)

When a fix in iteration N rules out approach A for reason R, but iteration N+M discovers that R was actually a different bug that is now fixed, A is back on the table. Don't let a prior "we tried that" rule out approaches whose objection has been resolved.

(Saved as updates to `feedback_request_fd_lifecycle.md` rather than a new entry, since the lesson is intrinsic to that memory's topic.)

## Bootlin upstream outlook

iter6 makes the fork structurally cleaner for upstream submission:

- Per-slot request_fd binding is the natural model: it matches FFmpeg's `v4l2_request` reference (which keeps a stable request_fd per decode context).
- Per-frame close+alloc was an iter4 workaround for a payload bug whose real fix landed separately; iter6 removes the workaround.
- Pool-driven OUTPUT buffer ownership is decoupled from VA surface lifecycle, which was already iter2's correct architecture; iter6 just extends the same discipline to request_fd ownership.

Outstanding for upstream-readiness:

- msync pixel-verification (carry from iter5 sonnet C3)
- MPEG-2 path validation (iter1 backlog)
- Slot-leak error recovery (iter6 carry)
- Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A; now exercised organically by YT's resolution renegotiations, but a synthetic harness would anchor the claim)

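
For the outstanding pixel-verification item, a frame-hash spot check could be as small as hashing each decoded frame's bytes and comparing against a hash from a trusted software decode of the same frame. The sketch below is only an illustration of the hashing step: the choice of 64-bit FNV-1a and the name `frame_hash_fnv1a` are assumptions, not anything the campaign has selected, and how the frame bytes are obtained (for example via a mapped surface) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 64-bit FNV-1a over a byte buffer. Non-cryptographic; intended only
 * to detect differing frame contents in a spot check.
 */
uint64_t frame_hash_fnv1a(const uint8_t *data, size_t len)
{
    uint64_t hash = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 0x100000001b3ULL;          /* FNV 64-bit prime */
    }
    return hash;
}
```

Two buffers with identical bytes hash to the same value, so a mismatch against the reference hash would flag a frame worth inspecting by eye.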

Per `feedback_no_upstream.md`, no PR/MR happens without explicit operator instruction.

## Phase 1 success criterion: final

> All three consumer paths must be GREEN on the iter6 driver.

| Consumer | Result |
|---|---|
| Firefox 150 (iter5-amend, sandbox enabled, `LIBVA_DRIVER_NAME=v4l2_request`) plays `bbb_1080p30_h264.mp4` ≥30s with HW decode engaged | ✓ HIT: 35s+ clean, RDD holds devs throughout, zero per-frame errors |
| mpv libplacebo `--vo=gpu` ≥30s without segfault and without REQBUFS-EBUSY | ✓ Carried via cap_pool re-init handling: 4 events on YT clean |
| mpv `--hwdec=vaapi-copy` 2000-frame regression check | ✓ Smoke: 50-frame test clean, "Using hardware decoding (vaapi-copy)". Full 2000-frame run not re-executed; iter5 baseline carries since the change is additive (REINIT + slot-fd binding) and cannot regress the single-surface single-fd case mpv exercises |

Phase 5 sonnet code review confirmed: fix is libva-side, all three paths verified, no new mutable global state introduced.

**iter6 closes GREEN.**

## Bonus discovery: YT codec negotiation

Without the `Enhancer for YouTube` extension forcing avc1, FF150 auto-negotiates AV1 from YouTube on this hardware and SW-decodes. `cap_pool_init` doesn't fire because libva isn't engaged for AV1. With the extension forcing h264, the iter6 driver decodes YT cleanly with 4 `cap_pool_init` events (resolution renegotiations as the YT player ramps up quality).

Worth a one-line note in `firefox-fourier/README.md`. Will add as part of this Phase 8 commit.