From c827140986eaf650bdb5e4637411c7567940cbc7 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 5 May 2026 15:22:15 +0000
Subject: [PATCH] Iteration 5 Phase 4: A + E + B all complete; G running on boltzmann

Track A (DEBUG sweep): 6 commits, ~232 lines removed, per-frame v4l2-request log noise from ~30+ lines/frame to 0. 2000-frame stress clean (0 EINVAL, log size 4.4 KB).

Track E (multi-context safety): LAST_OUTPUT_WIDTH/HEIGHT moved from process-global static to per-driver_data. Two concurrent mpv (2s stagger) both decode 300 frames clean.

Track B (mpv libplacebo segfault): RE-TEST on iter5-end driver shows the iter3-era segfault is GONE. 32s of mpv --vo=gpu decode with 0 segfaults / SIGSEGV. Implicit fix from iter4 fresh-request_fd-per-frame + DPB semantics + iter5 per-driver-data move closed the race window.

Track G (PGO-disabled Firefox rebuild): single-pass build kicked on boltzmann; ETA ~60 min. Phase 7G pending deployment.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase4_iter5_plan.md | 66 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 phase4_iter5_plan.md

diff --git a/phase4_iter5_plan.md b/phase4_iter5_plan.md
new file mode 100644
index 0000000..8df2df6
--- /dev/null
+++ b/phase4_iter5_plan.md
@@ -0,0 +1,66 @@
+# Iteration 5 — Phase 4 (plan + execution across 4 tracks)
+
+iter5 locked four tracks at Phase 1: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context libva safety). Phase 4 splits into 4A / 4G / 4E / 4B sub-phases.
+
+## Track A — DEBUG instrumentation sweep ✓ COMPLETE
+
+Sweep landed in 6 commits (in apply order):
+
+1. **`848fc0c`** — remove iter3 Y2 v1 + iter4 Y2 v3 + per-control TRY iso from `v4l2.c::v4l2_ioctl_controls` (-54 lines)
+2. **`39498f0`** — remove iter4 DPB census + per-entry dump from `h264.c::h264_set_controls` (-31 lines)
+3. **`951233a`** — remove iter1 patch-0014 ENTER traces from buffer.c, image.c, picture.c, surface.c (-17 lines, 13 call sites)
+4. **`d3a299b`** — remove iter1 patch-0010 hex-dumps + patch-0011 sentinel write from picture.c + surface.c (-81 lines)
+5. **`843febc`** — remove iter1 slice_header parse echo + VAPicture byte-dump in h264.c, RequestSyncSurface RETURN/early-exit traces in surface.c, suppress per-frame "Unable to get control(s)" when errno==EACCES (-49 lines net)
+
+Total: ~232 lines of instrumentation removed. Per-frame v4l2-request log noise dropped from ~30+ lines/frame to 0 (only init-time + once-per-resolution-change). Driver source builds clean; 2000-frame stress test (timeout 120s) shows 0 EINVAL, 0 "Unable to" lines, 9 v4l2-request log lines total (all init).
+
+KEPT (justified):
+- POC sentinel strip (`h264_strip_ffmpeg_poc_sentinel`) — load-bearing for ffmpeg-vaapi consumers
+- slice_header bit-precise parser — load-bearing for hantro hw decode (DECODE_PARAMS bit_size fields)
+- EACCES retry-skip in v4l2_get_controls — load-bearing reflective behavior; one-time announcement message stays
+- "slice_header parse FAILED" log — fires only on decode-blocking errors, not per-frame noise
+
+## Track E — Multi-context libva safety ✓ COMPLETE
+
+Commit **`b993355`** moves `LAST_OUTPUT_WIDTH/HEIGHT` from process-global static in `surface.c` to `struct request_data.last_output_width/height`. The V4L2 device fd is per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format).
+
+`surface_reset_format_cache()` signature changed to take a `struct request_data *driver_data` parameter; one callsite in `context.c` updated.
+
+Audit confirmed only LAST_OUTPUT_* was mutable process-global state. Other statics (formats[], formats_count) are constant lookup tables — no race.
+
+**Verification:** two concurrent mpv processes with 2-second stagger both decoded 300 frames cleanly, no cross-context corruption. Sub-second co-launch hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend.
+
+## Track B — mpv libplacebo `--vo=gpu` segfault ✓ COMPLETE (implicit fix)
+
+iter3 substrate documented the segfault: Vulkan init fails → mpv falls through to GPU non-vulkan path → 4 frames decode → REQBUFS EBUSY → bizarre CreateSurfaces2 with `sizes[1]=1050626` (uninitialized memory) → SIGSEGV.
+
+**Empirical re-test on iter5-end driver (post-A + post-E):** `mpv --hwdec=vaapi --vo=gpu` ran for 32 seconds of stream content (all of `--frames=200` + sustained beyond), 98 dropped frames out of ~768, **zero segfaults / SIGSEGV / VK_ERROR_DEVICE_LOST / abort()**. The Vulkan-init-failed warnings still appear ("EnumeratePhysicalDevices ... VK_ERROR_INITIALIZATION_FAILED") but mpv successfully fall-through-decodes.
+
+The iter3-era crash was implicitly fixed somewhere between iter3 and iter5, most likely by:
+- iter4's fresh-request_fd-per-frame fix (`385dee1`): timing change closes the cap_pool race window where REQBUFS EBUSY surfaces.
+- iter4's DPB fields/used-only fixes: kernel state stays consistent, no garbage CreateSurfaces2.
+- iter5's per-driver-data move: race elimination on resolution-change.
+
+No iter5 code change required for Track B beyond what A + E already landed. The iter3-era documentation in `phase0_findings_iter3.md` was correct that the bug was real, but the bug is gone now.
+
+## Track G — PGO-disabled Firefox rebuild (in progress)
+
+PKGBUILD overlay edit replaced the 3-tier PGO sequence with a single-pass optimized build. The PGO profile-collection step needed `xvfb-run` + display server, which the boltzmann LXC container can't provide.
+
+Single-pass build kicked at iter5 Phase 4G start; running on boltzmann firefox-fourier container. Currently at ~36 minutes in, mid C++ compile phase. ETA: 30-60 min more, then `mach package` step (5-10 min), then transfer to ohm + extract.
+
+Will deploy to `/opt/firefox-fourier/` replacing the iter3 PGO-instrumented binary. Expected libxul.so size delta: 3.6 GB (PGO instrumented) → ~150-300 MB (release). Phase 7G verifies on-ohm playback.
+
+## Phase 4 → Phase 5 transition
+
+Phase 4 deliverables landed for A + E + B. G in progress. Phase 5 sonnet review will cover:
+- Track A correctness: did any sweep removal break load-bearing code?
+- Track E semantics: is per-driver-data the right binding unit for last_output_*?
+- Track B verification: is "32s clean" sufficient or do we need longer/different content?
+- Track G: post-rebuild deployment + Firefox-side verification once package() finishes.
+
+Phase 7 verification anchored:
+- A: 2000-frame mpv vaapi-copy stress, 0 EINVAL, log size 4.4 KB
+- E: 2-process concurrent mpv (300 frames each, 2s stagger), both clean
+- B: mpv --vo=gpu 32s, 0 segfaults
+- G: pending package + deploy