libva-multiplanar/phase4_iter5_plan.md
marfrit f36c6b040d Iteration 5 Track G complete + Phase 7G verified
firefox-fourier 150.0.1-1.1 rebuilt without --enable-profile-generate=cross
on boltzmann firefox-fourier container (single-pass, ~2h27m). Pkg
68.7 MB, libxul.so 169 MB stripped — 21× smaller than iter3
PGO-instrumented 3.6 GB binary. Installed on ohm via pacman -U.

Phase 7G: 35s autonomous run (no MOZ_DISABLE_RDD_SANDBOX=1):
  ENETDOWN: 0  (sandbox patch holds)
  EINVAL: 0    (iter4 fix holds)
  RDD ProcessDecode: 538 events
  Stream mTime reached: 22.3s
  Decode rate: 0.64× realtime (~2.7× speedup vs PGO-instrumented)

All four iter5 tracks (A+G+B+E) GREEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:37:34 +00:00

# Iteration 5 — Phase 4 (plan + execution across 4 tracks)
iter5 locked four tracks at Phase 1: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context libva safety). Phase 4 splits into 4A / 4G / 4E / 4B sub-phases.
## Track A — DEBUG instrumentation sweep ✓ COMPLETE
Sweep landed in 5 commits (in apply order):
1. **`848fc0c`** — remove iter3 Y2 v1 + iter4 Y2 v3 + per-control TRY iso from `v4l2.c::v4l2_ioctl_controls` (-54 lines)
2. **`39498f0`** — remove iter4 DPB census + per-entry dump from `h264.c::h264_set_controls` (-31 lines)
3. **`951233a`** — remove iter1 patch-0014 ENTER traces from buffer.c, image.c, picture.c, surface.c (-17 lines, 13 call sites)
4. **`d3a299b`** — remove iter1 patch-0010 hex-dumps + patch-0011 sentinel write from picture.c + surface.c (-81 lines)
5. **`843febc`** — remove iter1 slice_header parse echo + VAPicture byte-dump in h264.c, RequestSyncSurface RETURN/early-exit traces in surface.c, suppress per-frame "Unable to get control(s)" when errno==EACCES (-49 lines net)
Total: ~232 lines of instrumentation removed. Per-frame v4l2-request log noise dropped from 30+ lines/frame to 0; what remains is logged only at init and once per resolution change. Driver source builds clean; 2000-frame stress test (timeout 120s) shows 0 EINVAL, 0 "Unable to" lines, 9 v4l2-request log lines total (all init).
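
The stress-test criteria above (0 EINVAL, 0 "Unable to" lines) amount to counting tokens in a captured log. A minimal sketch of such a tally, assuming the log has already been read into memory; `count_token` and `stress_log_is_clean` are hypothetical helper names, not part of the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in `log`. */
size_t count_token(const char *log, const char *needle)
{
    assert(log != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0)
        return 0;

    size_t hits = 0;
    for (const char *p = strstr(log, needle); p != NULL;
         p = strstr(p + n, needle))
        hits++;
    return hits;
}

/* Hypothetical pass criterion mirroring the stress-test summary:
 * no EINVAL and no "Unable to" text anywhere in the captured log. */
int stress_log_is_clean(const char *log)
{
    return count_token(log, "EINVAL") == 0 &&
           count_token(log, "Unable to") == 0;
}
```

Counting substrings rather than whole lines slightly over-counts if a token appears twice on one line, which is the conservative direction for a "must be zero" check.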
KEPT (justified):
- POC sentinel strip (`h264_strip_ffmpeg_poc_sentinel`) — load-bearing for ffmpeg-vaapi consumers
- slice_header bit-precise parser — load-bearing for hantro hw decode (DECODE_PARAMS bit_size fields)
- EACCES retry-skip in v4l2_get_controls — load-bearing reflective behavior; one-time announcement message stays
- "slice_header parse FAILED" log — fires only on decode-blocking errors, not per-frame noise
## Track E — Multi-context libva safety ✓ COMPLETE
Commit **`b993355`** moves `LAST_OUTPUT_WIDTH/HEIGHT` from process-global static in `surface.c` to `struct request_data.last_output_width/height`. The V4L2 device fd is per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format).
`surface_reset_format_cache()` signature changed to take a `struct request_data *driver_data` parameter; one callsite in `context.c` updated.
Audit confirmed only LAST_OUTPUT_* was mutable process-global state. Other statics (formats[], formats_count) are constant lookup tables — no race.
**Verification:** two concurrent mpv processes with 2-second stagger both decoded 300 frames cleanly, no cross-context corruption. Sub-second co-launch hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend.
## Track B — mpv libplacebo `--vo=gpu`: doesn't reproduce on this consumer ✓
iter3 substrate documented the segfault: Vulkan init fails → mpv falls through to GPU non-vulkan path → 4 frames decode → REQBUFS EBUSY → bizarre CreateSurfaces2 with `sizes[1]=1050626` (uninitialized memory) → SIGSEGV.
**Empirical re-test on iter5-end driver (post-A + post-E):** `mpv --hwdec=vaapi --vo=gpu` ran for 32 seconds of stream content (all of `--frames=200` + sustained beyond), 98 dropped frames out of ~768, **zero segfaults / SIGSEGV / VK_ERROR_DEVICE_LOST / abort()**. The Vulkan-init-failed warnings still appear ("EnumeratePhysicalDevices ... VK_ERROR_INITIALIZATION_FAILED") and that's steady-state on Mali-G52 / Bifrost (no PanVk for that GPU yet — see `reference_pinetab_no_vulkan.md` memory). mpv falls through to GLES via Panfrost.
Phase 5 sonnet review (C4): "implicit fix" overstated — refined to **"doesn't reproduce on this consumer pattern."** The iter4 + iter5 code changes don't directly close the cap_pool REQBUFS-EBUSY-on-resolution-change path that the iter3 substrate documented. The 32s GLES test path doesn't exercise the probe-with-garbage-dimensions consumer pattern that originally triggered the SIGSEGV. So the failure shape is *latent*, not *closed*: a future libva consumer that probes with `vaCreateSurfaces(16, 16)` between two `vaCreateSurfaces(1920, 1088)` calls while CAPTURE STREAMON is active could still hit the same path.
The cap_pool drain ordering concern survives as iter6+ candidate. iter5's Track B success criterion ("≥30s of bbb_1080p30 without segfault — OR root cause documented as upstream issue with workaround") is satisfied by the 32s clean run; the named caveat (cap_pool race window still latent under untested consumer patterns) is documented here.
No iter5 code change required for Track B beyond what A + E landed. Phase 5 review C4 framing applied.
## Track G — PGO-disabled Firefox rebuild (in progress)
PKGBUILD overlay edit replaced the 3-tier PGO sequence with a single-pass optimized build. The PGO profile-collection step needed `xvfb-run` + display server, which the boltzmann LXC container can't provide.
Single-pass build kicked off at iter5 Phase 4G start; running on boltzmann firefox-fourier container. Currently at ~36 minutes in, mid C++ compile phase. ETA: 30-60 min more, then `mach package` step (5-10 min), then transfer to ohm + extract.
Will deploy to `/opt/firefox-fourier/` replacing the iter3 PGO-instrumented binary. Expected libxul.so size delta: 3.6 GB (PGO instrumented) → ~150-300 MB (release). Phase 7G verifies on-ohm playback.
## Phase 5 sonnet review caveats addressed (in commit `c8b6ede`)
Phase 5 review came back YELLOW with four caveats. Three resolved in code, one in documentation:
- **C1 (Track A incomplete):** sweep missed three surface.c DEBUG sites (CreateSurfaces2 format-dump, ExportSurfaceHandle descriptor-dump, QuerySurfaceStatus status-dump) and a "3F observability" V4L2 readback block in h264.c. Resolved in `c8b6ede` — additional 107-line removal.
- **C2 (`static bool readback_warned`):** new mutable process-global state introduced inside the readback block. Resolved by removing the readback block entirely (point above).
- **C3 (msync removal pixel-correctness):** msync(MS_SYNC|MS_INVALIDATE) was paired with the iter1 hex-dump and removed alongside it. The CAPTURE buffer is read post-DQBUF via `copy_surface_to_image` (image.c) for vaapi-copy; on a CMA-backed non-coherent setup this could in principle need cache invalidation. Empirical: 2000-frame stress with 0 errors, no visible decode failure. Likely the kernel does DMA sync at DQBUF level. Documented as named caveat — frame-hash spot check could anchor it formally if needed; for now accept based on empirical pass.
- **C4 (Track B "implicit fix" overstated):** reframed above as "doesn't reproduce on this consumer pattern." Cap_pool resolution-change race window remains latent under untested consumer probe patterns.
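
The frame-hash spot check mentioned under C3 could look roughly like this: hash the bytes copied out of the CAPTURE buffer and compare against a hash of the same frame from a reference decode. The sketch uses 64-bit FNV-1a purely as a cheap, dependency-free hash; the choice of hash and the names `frame_hash_fnv1a` and `frames_match` are assumptions, not something the driver provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a byte buffer. */
uint64_t frame_hash_fnv1a(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Returns 1 when two equally sized frame buffers hash identically. */
int frames_match(const uint8_t *a, const uint8_t *b, size_t len)
{
    return frame_hash_fnv1a(a, len) == frame_hash_fnv1a(b, len);
}
```

A stale cache line would show up as a mismatch on an otherwise deterministic stream; a handful of sampled frames is enough for a spot check, since the goal is to detect corruption, not to resist deliberate collisions.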
## Phase 4 → Phase 6/7/8 transition
Phase 4 + Phase 5 done for A + E + B. G was still building when this section was first written; its Phase 7G result is recorded in the list below.
Phase 7 verification anchored (driver sha256 `4bed52ec5d44b389...`, post-cleanup):
- **A**: 2000-frame mpv vaapi-copy stress, **0 EINVAL, 1 v4l2-request log line, 3.0 KB log** (down from 9 lines / 4.4 KB pre-Phase-5-cleanup). Phase 5 C1 caveat resolved.
- **E**: 2-process concurrent mpv post-cleanup (300 frames each, 2s stagger), both clean ("Exiting (End of file)").
- **B**: 35s mpv `--vo=gpu` post-cleanup: 31s stream pos, 29 dropped, 0 segfaults. Same shape as pre-cleanup (cap_pool race fires at init, mpv falls back to SW gracefully). Sonnet C4 caveat stands as iter6+ candidate.
- **G**: ✓ COMPLETE. firefox 150.0.1-1.1 built (single-pass non-PGO, ~2h27m on boltzmann), 68.7 MB pkg, libxul.so stripped to 169 MB (21× smaller than iter3 PGO-instrumented 3.6 GB). Installed via pacman -U on ohm replacing stock firefox 150.0.1-1. Phase 7G test (35s autonomous run, no `MOZ_DISABLE_RDD_SANDBOX=1`): ENETDOWN=0, EINVAL=0, 538 RDD ProcessDecode events, 22.3s stream content (0.64× realtime — ~2.7× speedup over PGO-instrumented).
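
The derived figures in the G entry follow from simple ratios. A small sketch of the arithmetic, assuming 3.6 GB is read as 3600 MB; the function names are hypothetical:

```c
#include <assert.h>

/* Stream seconds decoded per wall-clock second. */
double decode_rate(double stream_seconds, double wall_seconds)
{
    assert(wall_seconds > 0.0);
    return stream_seconds / wall_seconds;
}

/* How many times smaller the new artifact is than the old one. */
double size_ratio(double before_mb, double after_mb)
{
    assert(after_mb > 0.0);
    return before_mb / after_mb;
}

/* Baseline rate implied by a current rate and a claimed speedup factor. */
double implied_baseline_rate(double current_rate, double speedup)
{
    assert(speedup > 0.0);
    return current_rate / speedup;
}
```

`decode_rate(22.3, 35.0)` gives about 0.64 and `size_ratio(3600.0, 169.0)` about 21.3, matching the reported 0.64× and 21×. Taken at face value, the ~2.7× speedup would put the PGO-instrumented build at roughly 0.24× realtime; that baseline is inferred here, not measured in this document.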