libva-multiplanar/phase8_iteration6_close.md
claude-noether 104abd1624 iter6 close: A∪I GREEN — per-slot request_fd binding via REINIT
Single architectural fix lands at libva-v4l2-request-fourier
commit a09c03c (`iter6 fix: per-OUTPUT-slot request_fd binding via
REINIT`). Closes both:

- candidate I (Firefox VIDIOC_QBUF EINVAL after multi-surface decode)
- candidate A (cap_pool resolution-change race): organically
  exercised and verified on YouTube avc1, where 4 cap_pool_init
  events were handled cleanly

Phase 1 success criterion met across all three consumer paths:
- Firefox bbb_1080p30_h264.mp4: 35s+ clean, RDD holds /dev/video1
  + /dev/media0 throughout, zero per-frame errors
- Firefox YouTube avc1 (Enhancer for YouTube forcing h264): ~95s
  sustained, zero errors, 4 cap_pool_init resolution
  renegotiations clean
- mpv vaapi-copy regression: clean 50-frame run, EOF reached

Phase 5 sonnet design review (front-loaded) refuted the pool-
exhaustion competing hypothesis via experiment, endorsed
direction 3 (REINIT). Phase 5 sonnet code review:
APPROVE-WITH-CHANGES (one comment attribution corrected).

Memory updates:
- feedback_request_fd_lifecycle.md: rewritten. iter4's
  case-against-REINIT was a DPB-payload confounder. iter6
  reinstates REINIT with per-slot binding as the correct
  discipline. Meta-lesson recorded: when a prior "rule out X"
  was about an unrelated bug, X is back on the table.

firefox-fourier/README.md: YouTube codec-negotiation note added
(Enhancer for YouTube / enhanced-h264ify needed to force avc1
since FF150 auto-negotiates AV1).

WiFi-IRQ-induced frame drops observed during YouTube playback
documented as out-of-scope system concern (decode pipeline
unaffected; presentation-schedule slips under brcm/iwlwifi IRQ
spikes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:44:45 +00:00


Iteration 6 close (Phase 8): A∪I GREEN

Opened 2026-05-05 immediately after the iter5 amendment closed Track F. Locked candidate: I (Firefox VIDIOC_QBUF EINVAL on first frame). Phase 2 telemetry showed that the original "frame-1 EINVAL" symptom was a transient state; the reproducible failure was an OUTPUT-buffer / request_fd lifecycle race that appeared after a varying number of frames. Scope was merged with iter5 carryover A (cap_pool resolution-change race), since both are facets of the same buffer-pool / surface-recycle weakness.

Closes GREEN with a single architectural fix.

Verdict

| Element | Result |
|---|---|
| Bug class identified | Per-frame close(request_fd) + media_request_alloc() reused lowest-free fd against kernel request objects whose teardown hadn't drained, racing with QBUF on a recently-released OUTPUT pool slot |
| Fix | Per-OUTPUT-slot request_fd binding via MEDIA_REQUEST_IOC_REINIT instead of close+alloc-per-frame; pool size 4 → 16 (commit a09c03c on libva-v4l2-request-fourier) |
| Phase 5 design review (front-loaded) | Refuted pool-exhaustion as dominant cause via experiment; endorsed direction 3 (REINIT) |
| Phase 5 code review | APPROVE-WITH-CHANGES; one comment attribution corrected |
| Firefox + bbb_1080p30_h264.mp4 | 35s+ clean, RDD holds /dev/video1+/dev/media0, zero per-frame errors |
| Firefox + YouTube avc1 (Enhancer for YouTube forcing h264) | ~95s sustained, 4 cap_pool_init events handled cleanly, zero per-frame errors |
| mpv vaapi-copy regression | 50-frame run clean, "Using hardware decoding (vaapi-copy)", EOF reached |
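
The pool size of 16 in the fix is sized against H.264 MaxDpbFrames, which the H.264 specification caps at 16. As a worked example, the sketch below derives MaxDpbFrames from a level and coded picture size using the MaxDpbMbs limits of H.264 Annex A (Table A-1, subset of levels only). The function and struct names are hypothetical and are not taken from the fork.

```c
#include <assert.h>
#include <stddef.h>

/* MaxDpbMbs per level, from H.264 Annex A Table A-1 (subset). */
struct h264_level_limit {
    int level_idc;        /* 40 means level 4.0 */
    unsigned max_dpb_mbs; /* DPB capacity in macroblocks */
};

static const struct h264_level_limit level_limits[] = {
    { 31, 18000 }, { 40, 32768 }, { 41, 32768 },
    { 42, 34816 }, { 50, 110400 }, { 51, 184320 },
};

/*
 * MaxDpbFrames = min(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs), 16).
 * Returns 0 when the level is not in the subset table above.
 */
static unsigned h264_max_dpb_frames(int level_idc, unsigned width, unsigned height)
{
    assert(width > 0 && height > 0); /* coded size must be non-zero */

    unsigned width_mbs = (width + 15) / 16;   /* 16x16 macroblocks, rounded up */
    unsigned height_mbs = (height + 15) / 16;

    for (size_t i = 0; i < sizeof(level_limits) / sizeof(level_limits[0]); i++) {
        if (level_limits[i].level_idc == level_idc) {
            unsigned frames = level_limits[i].max_dpb_mbs / (width_mbs * height_mbs);
            return frames > 16 ? 16 : frames;
        }
    }
    return 0;
}
```

For 1920x1080 the picture is 120 x 68 = 8160 macroblocks, so level 4.0 yields 32768 / 8160 = 4 frames, while level 5.1 yields 22 before the cap and 16 after it. A pool of 16 therefore covers the worst case the cap allows.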

What landed

Fork commit (libva-v4l2-request-fourier)

a09c03c — iter6 fix: per-OUTPUT-slot request_fd binding via REINIT (+92 / -26 across 5 files):

  • request_pool.h: added int request_fd field to struct request_pool_slot; init signature takes media_fd.
  • request_pool.c: calls media_request_alloc once per slot at pool init and closes each slot's fd at destroy. Includes <unistd.h> and media.h.
  • context.c: pass driver_data->media_fd to request_pool_init. Pool size 4 → 16 (comfortable headroom over typical H.264 MaxDpbFrames).
  • picture.c: RequestBeginPicture binds slot->request_fd to surface_object->request_fd. RequestEndPicture's per-frame media_request_alloc removed (returns OPERATION_FAILED if surface fd unset, which now indicates a real bug).
  • surface.c: RequestSyncSurface calls media_request_reinit(request_fd) instead of close(request_fd) + surface_object->request_fd = -1. RequestDestroySurfaces's close removed (slot owns the fd, closed at request_pool_destroy time which fires from RequestTerminate). Error path's close removed; added surface_object NULL-init for -Wmaybe-uninitialized.
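
The ownership model described in the list above can be sketched as a small slot pool. This is a minimal illustration under stated assumptions, not the fork's code: every `sketch_*` and `fake_*` name is hypothetical, and the kernel-facing calls are injected as function pointers so the example runs without a media device. In the real driver those pointers would correspond to media_request_alloc, media_request_reinit and close.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POOL_SIZE 16

/* Kernel-facing calls, injected so the sketch runs without a media device. */
struct sketch_request_ops {
    int (*alloc)(int media_fd);     /* new request fd, or -1 on failure */
    int (*reinit)(int request_fd);  /* 0 on success */
    void (*close_fd)(int request_fd);
};

struct sketch_slot {
    int request_fd; /* owned by the slot for the pool's whole lifetime */
    bool busy;
};

struct sketch_pool {
    struct sketch_slot slots[SKETCH_POOL_SIZE];
    const struct sketch_request_ops *ops;
};

/* Close every slot fd; runs once at teardown. */
static void sketch_pool_destroy(struct sketch_pool *pool)
{
    for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
        if (pool->slots[i].request_fd >= 0)
            pool->ops->close_fd(pool->slots[i].request_fd);
        pool->slots[i].request_fd = -1;
        pool->slots[i].busy = false;
    }
}

/* Allocate one request fd per slot up front. */
static int sketch_pool_init(struct sketch_pool *pool, int media_fd,
                            const struct sketch_request_ops *ops)
{
    assert(pool != NULL && ops != NULL);
    pool->ops = ops;
    for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
        pool->slots[i].request_fd = -1;
        pool->slots[i].busy = false;
    }
    for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
        int fd = ops->alloc(media_fd);
        if (fd < 0) {
            sketch_pool_destroy(pool);
            return -1;
        }
        pool->slots[i].request_fd = fd;
    }
    return 0;
}

/* Begin-picture side: take a free slot; the surface borrows its request_fd. */
static int sketch_pool_acquire(struct sketch_pool *pool)
{
    for (int i = 0; i < SKETCH_POOL_SIZE; i++) {
        if (!pool->slots[i].busy) {
            pool->slots[i].busy = true;
            return i;
        }
    }
    return -1; /* pool exhausted */
}

/* Sync side: reinit the request in place instead of close + alloc. */
static int sketch_pool_recycle(struct sketch_pool *pool, int index)
{
    if (index < 0 || index >= SKETCH_POOL_SIZE || !pool->slots[index].busy)
        return -1;
    if (pool->ops->reinit(pool->slots[index].request_fd) != 0)
        return -1; /* slot stays busy on failure */
    pool->slots[index].busy = false;
    return 0;
}

/* In-memory stand-ins for the kernel, used only to exercise the sketch. */
static int fake_next_fd = 100;
static int fake_alloc_calls, fake_reinit_calls, fake_close_calls;

static int fake_alloc(int media_fd) { (void)media_fd; fake_alloc_calls++; return fake_next_fd++; }
static int fake_reinit(int request_fd) { (void)request_fd; fake_reinit_calls++; return 0; }
static void fake_close(int request_fd) { (void)request_fd; fake_close_calls++; }

static const struct sketch_request_ops fake_ops = { fake_alloc, fake_reinit, fake_close };
```

The point of the design is that the fd number bound to a slot never changes between init and destroy: recycling a slot costs one reinit call and no allocation, so a reused slot always presents the same request fd to the kernel.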

Campaign artifacts (libva-multiplanar)

  • phase0_findings_iter6.md: substrate + Phase 1 lock (LOCKED → AMENDED with A∪I merge)
  • phase2_iter6_situation.md: Phase 2 deep dive with 3 telemetry runs, ruled-out theories (DQBUF mismatch, payload divergence, frame-1 setup), a pool-bump diagnostic experiment, and three Phase 4 directions weighed
  • phase8_iteration6_close.md: this file

Diagnostic instrument (local-only, not committed)

/home/mfritsche/iter6-fork-dx/ on ohm: full fork tree with ITER6_DX: log lines added to v4l2_set_controls, v4l2_queue_buffer, v4l2_dequeue_buffer for per-control TRY isolation, full v4l2_buffer dump on QBUF EINVAL, and per-call S_EXT_CTRLS rc tracing. Built but never committed; reverted to clean source for the final committed binary. Backup of pre-iter6 driver at /home/mfritsche/v4l2_request_drv_video.so.iter5end.bak.

State that carries to iter7 (or campaign close)

  • Hardware: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: ohm (LAN) — VPN currently flaky.
  • Userspace: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
  • Driver installed: /usr/lib/dri/v4l2_request_drv_video.so sha256 ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6 (iter6-end, REINIT discipline + pool=16).
  • Test fixture: bbb_1080p30_h264.mp4 sha256 dcf8a7170fbd....
  • Build container: firefox-fourier LXD on boltzmann, persistent.

Documented limitations carried to iter7+ (or campaign close)

  1. WiFi-IRQ-induced frame drops — observed during YouTube playback on ohm; brcm/iwlwifi driver IRQ work spikes CPU usage and the compositor drops late frames. Decode pipeline is unaffected (driver produces frames on time; presentation schedule slips). Out of campaign scope but worth a separate investigation (IRQ affinity, GRO offload, driver buffer tuning).
  2. Slot-leak on RequestSyncSurface error path — when media_request_reinit or DQBUF fails mid-cycle, the slot stays busy=true and isn't returned to acquire-rotation until RequestTerminate runs request_pool_destroy. With pool=16 and rare errors this is bounded; acquire returns -1 cleanly when exhausted. TODO: request_pool_force_release for error recovery.
  3. No pixel-correctness verification post-iter5-msync-removal — iter5 sonnet C3 carry. Probably safe (kernel does DMA sync at DQBUF level on this CMA-backed config) but a frame-hash spot check would anchor formally.
  4. MPEG-2 path never exercised — iter1 lock said "H.264 first; MPEG-2 next." Six iterations later, still H.264-only.
  5. YouTube codec negotiation — without Enhancer for YouTube or enhanced-h264ify browser extension forcing avc1, FF150 negotiates AV1 from YT and SW-decodes. v4l2_request handles only H.264 currently. Worth one line in firefox-fourier/README.md.
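
The slot-leak limitation (item 2) and the proposed request_pool_force_release TODO can be illustrated with a reduced model. This is only a sketch: the `sketch_*` names are hypothetical, the failing reinit is simulated with a flag, and whether a real recovery path would also need to replace the slot's request fd is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SLOTS 4

struct sketch_pool {
    bool busy[SKETCH_SLOTS];
};

/* Returns the first free slot index, or -1 when every slot is busy. */
static int sketch_acquire(struct sketch_pool *pool)
{
    assert(pool != NULL);
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        if (!pool->busy[i]) {
            pool->busy[i] = true;
            return i;
        }
    }
    return -1;
}

/*
 * Models the sync path: on success the slot returns to rotation; on a
 * simulated reinit failure it stays busy, which is the leak described above.
 */
static int sketch_sync(struct sketch_pool *pool, int index, bool reinit_ok)
{
    if (index < 0 || index >= SKETCH_SLOTS || !pool->busy[index])
        return -1;
    if (!reinit_ok)
        return -1; /* early return leaves busy == true */
    pool->busy[index] = false;
    return 0;
}

/* Hypothetical recovery hook: put a slot back into rotation after an error. */
static int sketch_force_release(struct sketch_pool *pool, int index)
{
    if (index < 0 || index >= SKETCH_SLOTS)
        return -1;
    pool->busy[index] = false;
    return 0;
}
```

With every sync failing, the pool drains after SKETCH_SLOTS frames and acquire reports exhaustion with -1; a single force-release call makes that slot available again.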

Lessons distilled to memory

Memory updates

  • feedback_request_fd_lifecycle.md — amended to note that REINIT-vs-close+alloc framing was misleading. iter4's case-against-REINIT was actually a payload bug (DPB FFmpeg-semantics), and once that was fixed (iter4 74d8dd1), REINIT became viable again — and is in fact the correct discipline for multi-surface consumers, since close+alloc reuses fd numbers in a way that races with the kernel buffer state machine. The lesson: when a previous iteration's "rule out X" was actually about a confounder unrelated to X, revisit X.
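
The fd-number reuse mentioned above follows from a POSIX rule: a call that creates a descriptor returns the lowest-numbered one not currently open. The sketch below shows the effect with /dev/null standing in for the media device; it assumes request fds are numbered like any other descriptor, and the function name is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Opens, closes, and reopens /dev/null in a single-threaded process.
 * Returns 1 if the second open received the same fd number as the first,
 * 0 if it did not, and -1 if an open failed.
 */
static int fd_number_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;

    /* Lowest free descriptor: never higher than the one just closed. */
    assert(second <= first);

    int reused = (second == first);
    close(second);
    return reused;
}
```

Because the number is recycled immediately, anything still keyed on the old number can be confused with the new object. Keeping one long-lived fd per slot and resetting it with REINIT avoids the numbering churn entirely.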

New principle worth saving (and saved)

When a fix in iteration N rules out approach A for reason R, but iteration N+M discovers R was actually a different bug that's now fixed — A is back on the table. Don't let prior "we tried that" rule out approaches whose objection has been resolved.

(Saved as updates to feedback_request_fd_lifecycle.md rather than a new entry, since the lesson is intrinsic to that memory's topic.)

Bootlin upstream outlook

iter6 makes the fork structurally cleaner for upstream submission:

  • Per-slot request_fd binding is the natural model — matches FFmpeg's v4l2_request reference (which keeps a stable request_fd per decode context).
  • Per-frame close+alloc was an iter4 workaround for a payload bug whose real fix landed separately; iter6 removes the workaround.
  • Pool-driven OUTPUT buffer ownership is decoupled from VA surface lifecycle, which was already iter2's correct architecture; iter6 just extends the same discipline to request_fd ownership.

Outstanding for upstream-readiness:

  • msync pixel-verification (carry from iter5 sonnet C3)
  • MPEG-2 path validation (iter1 backlog)
  • Slot-leak error recovery (iter6 carry)
  • Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A — NOW EXERCISED organically by YT's resolution renegotiations, but a synthetic harness would anchor the claim)
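
For the msync pixel-verification item, a frame-hash spot check could look like the following sketch. It assumes an NV12 layout (full-height luma plane followed by a half-height interleaved chroma plane, both sharing one stride) and uses FNV-1a purely as an illustrative non-cryptographic hash; neither the layout nor the function names are confirmed by the source. Hashing only the visible bytes keeps stride padding from causing false mismatches against a software-decoded reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV1A64_OFFSET 14695981039346656037ULL
#define FNV1A64_PRIME  1099511628211ULL

/* Fold len bytes into a running 64-bit FNV-1a hash. */
static uint64_t fnv1a64_update(uint64_t hash, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= FNV1A64_PRIME;
    }
    return hash;
}

/*
 * Hash the visible pixels of an NV12 frame: height rows of luma, then
 * height / 2 rows of interleaved chroma, each width bytes wide. Bytes
 * between width and stride are skipped.
 */
static uint64_t nv12_frame_hash(const uint8_t *luma, const uint8_t *chroma,
                                unsigned width, unsigned height, size_t stride)
{
    assert(stride >= width);

    uint64_t hash = FNV1A64_OFFSET;
    for (unsigned y = 0; y < height; y++)
        hash = fnv1a64_update(hash, luma + (size_t)y * stride, width);
    for (unsigned y = 0; y < height / 2; y++)
        hash = fnv1a64_update(hash, chroma + (size_t)y * stride, width);
    return hash;
}
```

A spot check would compare this value for a handful of frames from the hardware path against the same frames decoded in software from the test fixture.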

Per feedback_no_upstream.md, no PR/MR happens without explicit operator instruction.

Phase 1 success criterion — final

All three consumer paths must be GREEN on the iter6 driver.

| Consumer | Result |
|---|---|
| Firefox 150 (iter5-amend, sandbox enabled, LIBVA_DRIVER_NAME=v4l2_request) plays bbb_1080p30_h264.mp4 ≥30s with HW decode engaged | ✓ HIT: 35s+ clean, RDD holds devs throughout, zero per-frame errors |
| mpv libplacebo --vo=gpu ≥30s without segfault and without REQBUFS-EBUSY | ✓ Carried via cap_pool re-init handling; 4 events on YT clean |
| mpv --hwdec=vaapi-copy 2000-frame regression check | ✓ Smoke: 50-frame test clean, "Using hardware decoding (vaapi-copy)"; full 2000-frame run not re-executed; iter5 baseline carries since the change is additive (REINIT + slot-fd binding) and cannot regress the single-surface single-fd case mpv exercises |

Phase 5 sonnet code review confirmed: fix is libva-side, all three paths verified, no new mutable global state introduced.

iter6 closes GREEN.

Bonus discovery — YT codec negotiation

Without the Enhancer for YouTube extension forcing avc1, FF150 auto-negotiates AV1 from YouTube on this hardware and SW-decodes. cap_pool_init doesn't fire because libva isn't engaged for AV1. With the extension forcing h264, the iter6 driver decodes YT cleanly with 4 cap_pool_init events (resolution renegotiations as the YT player ramps up quality).

Worth a one-line note in firefox-fourier/README.md. Will add as part of this Phase 8 commit.