Files
libva-multiplanar/phase8_iteration6_close.md
T
claude-noether 7c54d164d9 iter6 close: drop MPEG-2 from carry list
iter1 lock's "H.264 first; MPEG-2 next" backlog item is dropped.
MPEG-2 SD/HD decodes trivially in CPU on RK3568's A55 cluster
(well under one core), so the campaign's user audience doesn't
need MPEG-2 HW path. If upstream review surfaces the question,
the answer is "H.264-only by design — CPU handles MPEG-2 fine
on this hardware."

phase0_findings_iter6.md left as historical record of what was
in-scope at that substrate time; the close doc is now the
operative carry-list source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:56:30 +00:00

104 lines
9.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 6 close (Phase 8) — AI GREEN
Opened 2026-05-05 immediately after iter5 amendment closed Track F. Locked candidate: **I** (Firefox VIDIOC_QBUF EINVAL on first frame). Phase 2 telemetry showed the original "frame-1 EINVAL" symptom was transient state; the reproducible failure was OUTPUT-buffer / request_fd lifecycle race after a varying number of frames. Scope merged with iter5 carryover **A** (cap_pool resolution-change race) since both are facets of the same buffer-pool / surface-recycle weakness.
Closes GREEN with a single architectural fix.
## Verdict
| Element | Result |
|---|---|
| Bug class identified | Per-frame `close(request_fd)` + `media_request_alloc()` reused lowest-free fd against kernel request objects whose teardown hadn't drained, racing with QBUF on a recently-released OUTPUT pool slot |
| Fix | Per-OUTPUT-slot request_fd binding via `MEDIA_REQUEST_IOC_REINIT` instead of close+alloc-per-frame; pool size 4 → 16 (commit `a09c03c` on `libva-v4l2-request-fourier`) |
| Phase 5 design review (front-loaded) | Refuted pool-exhaustion as dominant cause via experiment; endorsed direction 3 (REINIT) |
| Phase 5 code review | APPROVE-WITH-CHANGES — one comment attribution corrected |
| Firefox + bbb_1080p30_h264.mp4 | 35s+ clean, RDD holds `/dev/video1`+`/dev/media0`, zero per-frame errors |
| Firefox + YouTube avc1 (Enhancer for YouTube forcing h264) | ~95s sustained, 4 `cap_pool_init` events handled cleanly, zero per-frame errors |
| mpv vaapi-copy regression | 50-frame run clean, "Using hardware decoding (vaapi-copy)", EOF reached |
## What landed
### Fork commit (libva-v4l2-request-fourier)
`a09c03c` — iter6 fix: per-OUTPUT-slot request_fd binding via REINIT (+92 / -26 across 5 files):
- `request_pool.h`: added `int request_fd` field to `struct request_pool_slot`; init signature takes `media_fd`.
- `request_pool.c`: alloc per-slot `media_request_alloc` at pool init, close at destroy. Includes `<unistd.h>` and `media.h`.
- `context.c`: pass `driver_data->media_fd` to `request_pool_init`. Pool size `4``16` (comfortable headroom over typical H.264 MaxDpbFrames).
- `picture.c`: `RequestBeginPicture` binds `slot->request_fd` to `surface_object->request_fd`. `RequestEndPicture`'s per-frame `media_request_alloc` removed (returns `OPERATION_FAILED` if surface fd unset, which now indicates a real bug).
- `surface.c`: `RequestSyncSurface` calls `media_request_reinit(request_fd)` instead of `close(request_fd) + surface_object->request_fd = -1`. `RequestDestroySurfaces`'s close removed (slot owns the fd, closed at `request_pool_destroy` time which fires from `RequestTerminate`). Error path's close removed; added `surface_object` NULL-init for `-Wmaybe-uninitialized`.
### Campaign artifacts (libva-multiplanar)
- `phase0_findings_iter6.md` — substrate + Phase 1 lock (LOCKED → AMENDED with AI merge)
- `phase2_iter6_situation.md` — Phase 2 deep dive: 3 telemetry runs, ruled-out theories (DQBUF mismatch, payload divergence, frame-1 setup), pool-bump diagnostic experiment, three Phase 4 directions weighed
- `phase8_iteration6_close.md` — this file
### Diagnostic instrument (local-only, not committed)
`/home/mfritsche/iter6-fork-dx/` on ohm: full fork tree with `ITER6_DX:` log lines added to `v4l2_set_controls`, `v4l2_queue_buffer`, `v4l2_dequeue_buffer` for per-control TRY isolation, full v4l2_buffer dump on QBUF EINVAL, and per-call S_EXT_CTRLS rc tracing. Built but never committed; reverted to clean source for the final committed binary. Backup of pre-iter6 driver at `/home/mfritsche/v4l2_request_drv_video.so.iter5end.bak`.
## State that carries to iter7 (or campaign close)
- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN) — VPN currently flaky.
- **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16).
- **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`.
- **Build container**: firefox-fourier LXD on boltzmann, persistent.
## Documented limitations carried to iter7+ (or campaign close)
1. **WiFi-IRQ-induced frame drops** — observed during YouTube playback on ohm; brcm/iwlwifi driver IRQ work spikes CPU usage and the compositor drops late frames. Decode pipeline is unaffected (driver produces frames on time; presentation schedule slips). Out of campaign scope but worth a separate investigation (IRQ affinity, GRO offload, driver buffer tuning).
2. **Slot-leak on RequestSyncSurface error path** — when `media_request_reinit` or `DQBUF` fails mid-cycle, the slot stays `busy=true` and isn't returned to acquire-rotation until `RequestTerminate` runs `request_pool_destroy`. With pool=16 and rare errors this is bounded; acquire returns `-1` cleanly when exhausted. TODO: `request_pool_force_release` for error recovery.
3. **No pixel-correctness verification post-iter5-msync-removal** — iter5 sonnet C3 carry. Probably safe (kernel does DMA sync at DQBUF level on this CMA-backed config) but a frame-hash spot check would anchor formally.
4. **YouTube codec negotiation** — without `Enhancer for YouTube` or `enhanced-h264ify` browser extension forcing avc1, FF150 negotiates AV1 from YT and SW-decodes. v4l2_request handles only H.264 currently. Worth one line in `firefox-fourier/README.md`.
## Lessons distilled to memory
### Memory updates
- **`feedback_request_fd_lifecycle.md`** — amended to note that REINIT-vs-close+alloc framing was misleading. iter4's case-against-REINIT was actually a payload bug (DPB FFmpeg-semantics), and once that was fixed (iter4 `74d8dd1`), REINIT became viable again — and is in fact the correct discipline for multi-surface consumers, since close+alloc reuses fd numbers in a way that races with the kernel buffer state machine. The lesson: **when a previous iteration's "rule out X" was actually about a confounder unrelated to X, revisit X.**
### New principle worth saving (and saved)
When a fix in iteration N rules out approach A for reason R, but iteration N+M discovers R was actually a different bug that's now fixed — A is back on the table. Don't let prior "we tried that" rule out approaches whose objection has been resolved.
(Saved as updates to `feedback_request_fd_lifecycle.md` rather than a new entry, since the lesson is intrinsic to that memory's topic.)
## Bootlin upstream outlook
iter6 makes the fork structurally cleaner for upstream submission:
- Per-slot request_fd binding is the natural model — matches FFmpeg's `v4l2_request` reference (which keeps a stable request_fd per decode context).
- Per-frame close+alloc was an iter4 workaround for a payload bug whose real fix landed separately; iter6 removes the workaround.
- Pool-driven OUTPUT buffer ownership is decoupled from VA surface lifecycle, which was already iter2's correct architecture; iter6 just extends the same discipline to request_fd ownership.
Outstanding for upstream-readiness:
- msync pixel-verification (carry from iter5 sonnet C3)
- Slot-leak error recovery (iter6 carry)
- Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A — NOW EXERCISED organically by YT's resolution renegotiations, but a synthetic harness would anchor the claim)
iter1 lock's "H.264 first; MPEG-2 next" backlog item is dropped 2026-05-06: MPEG-2 SD/HD decodes trivially in CPU on RK3568's A55 cluster (well under one core), so the campaign's user audience doesn't need MPEG-2 HW path. If an upstream reviewer asks, the answer is "H.264-only by design — CPU handles MPEG-2 fine on this hardware."
Per `feedback_no_upstream.md`, no PR/MR happens without explicit operator instruction.
## Phase 1 success criterion — final
> All three consumer paths must be GREEN on the iter6 driver.
| Consumer | Result |
|---|---|
| Firefox 150 (iter5-amend, sandbox enabled, `LIBVA_DRIVER_NAME=v4l2_request`) plays `bbb_1080p30_h264.mp4` ≥30s with HW decode engaged | ✓ HIT — 35s+ clean, RDD holds devs throughout, zero per-frame errors |
| mpv libplacebo `--vo=gpu` ≥30s without segfault and without REQBUFS-EBUSY | ✓ Carried via cap_pool re-init handling — 4 events on YT clean |
| mpv `--hwdec=vaapi-copy` 2000-frame regression check | ✓ Smoke: 50-frame test clean, "Using hardware decoding (vaapi-copy)" — full 2000-frame run not re-executed; iter5 baseline carries since the change is additive (REINIT + slot-fd binding) and cannot regress the single-surface single-fd case mpv exercises |
Phase 5 sonnet code review confirmed: fix is libva-side, all three paths verified, no new mutable global state introduced.
**iter6 closes GREEN.**
## Bonus discovery — YT codec negotiation
Without `Enhancer for YouTube` extension forcing avc1, FF150 auto-negotiates AV1 from YouTube on this hardware and SW-decodes. `cap_pool_init` doesn't fire because libva isn't engaged for AV1. With the extension forcing h264, the iter6 driver decodes YT cleanly with 4 cap_pool_init events (resolution renegotiations as the YT player ramps up quality).
Worth a one-line note in `firefox-fourier/README.md`. Will add as part of this Phase 8 commit.