# Iteration 2 Phase 4 — plan Three independent fixes, ordered cheapest-first. Each tested before stacking the next. ## Fix 1 — Multi-resolution: invalidate `LAST_OUTPUT_WIDTH/HEIGHT` cache on session teardown **File**: `src/surface.c` + `src/context.c` (or wherever DestroyContext lives) **Diff scale**: ~5 lines **Problem**: Iteration 1 commit `37c0e72` added `LAST_OUTPUT_WIDTH/HEIGHT` static globals to skip re-S_FMT when dimensions unchanged. Across multiple playback sessions, the kernel CAPTURE format gets reset (likely on REQBUFS(0) at session end) but our cache still says "format already 1920×1088, no re-set needed." Next session inherits a 48×48 default kernel state and our cache lies. **Fix**: zero out `LAST_OUTPUT_WIDTH/HEIGHT` whenever the OUTPUT queue is REQBUFS(0)'d (in our existing reset path on resolution change) AND in `RequestDestroyContext`. Forces next CreateSurfaces2 to S_FMT. **Risk**: low. Worst case extra S_FMT call during normal operation — harmless. **Validation**: Firefox loading mozilla.org main page (multi-video sequence at varying resolutions) should not trigger `fmt_width=48 fmt_height=48` regression. Operator-confirmable. ## Fix 2 — WSI alignment: try `DRM_FORMAT_MOD_INVALID` in surface descriptor **File**: `src/surface.c::RequestExportSurfaceHandle` line 657 **Diff scale**: 1 line + 1 fallback case **Problem**: Mesa's WSI rejects pitch=864 (only 16-aligned) with "WSI pitch not properly aligned." Our `objects[i].drm_format_modifier = video_format->drm_modifier = DRM_FORMAT_MOD_NONE` (= 0 = LINEAR explicit). Mesa interprets this as "I am promised this is a strict LINEAR DMA-BUF that meets WSI alignment" — but hantro's actual buffer doesn't meet WSI alignment for non-64-aligned widths. **Fix attempt 1** (cheapest): change `drm_modifier` to `DRM_FORMAT_MOD_INVALID` (~0ULL). This tells Mesa "modifier unknown, treat as implicit / texture-import-only" — Mesa will texture-upload rather than direct WSI-import, paying a small perf cost but correct. **Fix attempt 2** (fallback): if `DRM_FORMAT_MOD_INVALID` causes regressions in mpv vaapi-copy or Firefox, fall back to per-modifier branching: - mpv vaapi-copy: use `DRM_FORMAT_MOD_LINEAR` (works as before) - Firefox + WSI-aligned widths (64-aligned): `DRM_FORMAT_MOD_LINEAR` - Non-aligned widths: `DRM_FORMAT_MOD_INVALID` We can't easily detect the consumer at export time, so attempt 1 first universally. **Validation**: Firefox plays mozilla.org's 864-wide intro videos without "MESA: error: WSI pitch not properly aligned"; mpv vaapi-copy and vaapi (DMA-BUF) still render bunny on bbb 1080p. **Risk**: medium. `DRM_FORMAT_MOD_INVALID` is a documented sentinel but its semantics in EGL_EXT_image_dma_buf_import vary by driver. Worst case: regression on consumers we currently work for. Mitigation: revert to `DRM_FORMAT_MOD_NONE` if regression observed. ## Fix 3 — Decoupled CAPTURE buffer pool with LRU recycling (the load-bearing fix) **File**: `src/surface.h`, `src/surface.c`, `src/picture.c` **Diff scale**: ~150-200 lines (new code) **Problem**: 1:1 surface↔buffer binding causes V4L2 to re-QBUF the same physical buffer while consumer still holds an EXPBUF'd fd to it. Per Phase 2 analysis + Phase 3 baseline (re-queue interval ~875ms vs compositor hold ~50ms): race window opens during certain playback patterns and produces operator-visible stutter on mpv vaapi --vo=gpu. **Architecture (Option A from Phase 2)**: - Add `struct cap_pool_slot` per CAPTURE buffer with state: FREE, IN_DECODE, EXPORTED. - Allocate `max(surfaces_count, MIN_CAP_POOL)` CAPTURE buffers (MIN_CAP_POOL = 24 — gives ~3x headroom for typical mpv 16-surface pool). - Decouple: surfaces no longer have permanent `destination_index`. Instead, each surface holds a transient pointer to its currently-bound slot. - On `RequestBeginPicture(surface_X)`: - If surface_X has a previously-bound slot in EXPORTED state: close OUR copy of the EXPBUF'd fd, mark slot as FREE - Find LRU FREE slot (oldest "last_used_at" timestamp) - Bind surface_X → slot, set destination_index, mmap if needed - QBUF that slot - On `RequestSyncSurface(surface_X)`: DQBUF as before, slot transitions FREE→IN_DECODE→EXPORTED→... - On `RequestExportSurfaceHandle(surface_X)`: VIDIOC_EXPBUF on slot's V4L2 index, save our fd, mark slot EXPORTED, record timestamp - On `RequestDestroySurfaces`: free all slots, REQBUFS(0) **Implementation details**: - `cap_pool_slot` struct in `surface.h`: ```c enum cap_slot_state { CAP_SLOT_FREE, CAP_SLOT_IN_DECODE, CAP_SLOT_EXPORTED }; struct cap_pool_slot { unsigned int v4l2_index; void *map[VIDEO_MAX_PLANES]; unsigned int map_lengths[VIDEO_MAX_PLANES]; unsigned int map_offsets[VIDEO_MAX_PLANES]; enum cap_slot_state state; int our_export_fd; /* -1 if not exported */ uint64_t last_used_at_ns; /* for LRU recycling */ }; ``` - Pool stored in `request_data` (driver-level), not in `surface_object`. - LRU helper: scan slots, find FREE with oldest `last_used_at_ns`. If no FREE, force-recycle the oldest EXPORTED slot (close fd, demote to FREE) — race window may still open in pathological cases but rare. **Validation**: - mpv `--hwdec=vaapi --vo=gpu` plays bbb 1080p for ≥60s without operator-visible stutter - Drop count < 100 in 14s (Phase 3 anchor) - Firefox 150 sustained engagement with no MESA WSI errors after Fix 2 - vainfo + mpv vaapi-copy + chromium-fourier 149 — no regression - strace shows recycled buffer indices have re-queue intervals consistent with LRU spread (e.g., wider than current ~875ms) **Risk**: medium-high. Substantive code change. Mitigations: - Keep pool size + recycling logic configurable - Preserve Fix 1 + Fix 2 as separate commits before stacking Fix 3 (revert one without losing the others) - Phase 5 sonnet review before committing Fix 3 ## Order of attack 1. **Fix 1 (multi-resolution cache)** — small, low-risk, fixes a known regression. Test: Firefox multi-video session. 2. **Fix 2 (WSI modifier)** — 1 line, test on 864-wide video. If regression, revert. 3. **Phase 5 review (sonnet)** before Fix 3 — get an outside read on the buffer-pool architecture before committing the substantive code. 4. **Fix 3 (buffer pool)** — implement, test, iterate. 5. **Phase 7 verification** — full 4-consumer matrix + multi-video session. 6. **Phase 8 close** — memory entry, iteration 3 input doc. ## Out-of-scope for iteration 2 (carried to iteration 3) - VIDIOC_G_EXT_CTRLS EACCES probe (Sonnet 7.1) - num_ref_idx for multi-slice streams (Sonnet 7.2) - HACK block in surface.c MPEG-2 case (Sonnet 7.4) - Firefox seek-to-non-IDR (Sonnet 7.5) - DEBUG instrumentation cleanup (until iteration 2 verified, per Sonnet) - V4L2_MEMORY_DMABUF mode rewrite (Option B from Phase 2 — proper but expensive) - Performance metrics — iteration 3