Track A (DEBUG sweep): 6 commits, ~232 lines removed; per-frame v4l2-request log noise dropped from ~30+ lines/frame to 0. 2000-frame stress run clean (0 EINVAL, log size 4.4 KB).

Track E (multi-context safety): LAST_OUTPUT_WIDTH/HEIGHT moved from process-global static to per-driver_data. Two concurrent mpv processes (2s stagger) both decode 300 frames cleanly.

Track B (mpv libplacebo segfault): a re-test on the iter5-end driver shows the iter3-era segfault is gone: 32s of mpv --vo=gpu decode with 0 segfaults / SIGSEGV. The fix is implicit: iter4's fresh request_fd per frame + DPB semantics + iter5's per-driver-data move closed the race window.

Track G (PGO-disabled Firefox rebuild): single-pass build kicked off on boltzmann; ETA ~60 min. Phase 7G pending deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 5 — Phase 4 (plan + execution across 4 tracks)
iter5 locked four tracks at Phase 1: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context libva safety). Phase 4 splits into 4A / 4G / 4E / 4B sub-phases.
Track A — DEBUG instrumentation sweep ✓ COMPLETE
Sweep landed in 6 commits (in apply order):
- 848fc0c: remove iter3 Y2 v1 + iter4 Y2 v3 + per-control TRY iso from v4l2.c::v4l2_ioctl_controls (-54 lines)
- 39498f0: remove iter4 DPB census + per-entry dump from h264.c::h264_set_controls (-31 lines)
- 951233a: remove iter1 patch-0014 ENTER traces from buffer.c, image.c, picture.c, surface.c (-17 lines, 13 call sites)
- d3a299b: remove iter1 patch-0010 hex-dumps + patch-0011 sentinel write from picture.c + surface.c (-81 lines)
- 843febc: remove iter1 slice_header parse echo + VAPicture byte-dump in h264.c and RequestSyncSurface RETURN/early-exit traces in surface.c; suppress per-frame "Unable to get control(s)" when errno==EACCES (-49 lines net)
Total: ~232 lines of instrumentation removed. Per-frame v4l2-request log noise dropped from ~30+ lines/frame to 0; only init-time and once-per-resolution-change messages remain. The driver source builds clean, and a 2000-frame stress test (120 s timeout) shows 0 EINVAL, 0 "Unable to" lines, and 9 v4l2-request log lines total (all init).
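Commit 843febc suppresses the per-frame "Unable to get control(s)" message when errno==EACCES, while the one-time announcement stays. A minimal sketch of that log-once pattern, assuming the suppression flag lives in a caller-owned struct rather than a process-global static; struct get_ctrls_log_state and get_controls_should_log() are hypothetical names, not the driver's actual code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-context logging state (kept out of process-global
 * storage, in the spirit of the Track E change). */
struct get_ctrls_log_state {
	bool eacces_announced;
};

/*
 * Returns true when the caller should print the per-call
 * "Unable to get control(s)" message for this errno value.
 * EACCES is announced once and then suppressed; any other
 * errno is still reported on every call.
 */
static bool get_controls_should_log(struct get_ctrls_log_state *state, int err)
{
	assert(state != NULL);

	if (err != EACCES)
		return true; /* genuine failures stay visible every time */

	if (!state->eacces_announced) {
		state->eacces_announced = true;
		fprintf(stderr, "v4l2-request: control read returned EACCES; "
				"further EACCES failures will not be logged\n");
	}
	return false; /* expected condition: skip the per-frame message */
}
```

A caller would check the return value right after a failed control read, passing errno, and only then emit its usual error line. Keeping the non-EACCES path unconditional means the sweep cannot hide a real EINVAL regression.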
KEPT (justified):
- POC sentinel strip (h264_strip_ffmpeg_poc_sentinel): load-bearing for ffmpeg-vaapi consumers
- slice_header bit-precise parser: load-bearing for hantro hw decode (DECODE_PARAMS bit_size fields)
- EACCES retry-skip in v4l2_get_controls: load-bearing reflective behavior; the one-time announcement message stays
- "slice_header parse FAILED" log: fires only on decode-blocking errors, not per-frame noise
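The slice_header parser is load-bearing because V4L2 stateless H.264 decoding needs exact bit counts for parts of the slice header (for example dec_ref_pic_marking_bit_size and pic_order_cnt_bit_size in the DECODE_PARAMS control). A minimal sketch of the bit-accurate reading involved, assuming an RBSP buffer with emulation-prevention bytes already removed: an MSB-first reader, Exp-Golomb ue(v) decoding, and bit sizes taken as the difference between reader positions. struct bit_reader and the br_* helpers are hypothetical, not the driver's parser:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB-first bit reader over an RBSP buffer (emulation prevention removed). */
struct bit_reader {
	const uint8_t *data;
	size_t size;  /* buffer size in bytes */
	size_t pos;   /* current position in bits */
	int error;    /* set when a read runs past the end */
};

static uint32_t br_read_bit(struct bit_reader *br)
{
	if (br->pos >= br->size * 8) {
		br->error = 1;
		return 0;
	}
	uint32_t bit = (br->data[br->pos / 8] >> (7 - br->pos % 8)) & 1u;
	br->pos++;
	return bit;
}

static uint32_t br_read_bits(struct bit_reader *br, unsigned n)
{
	uint32_t v = 0;

	assert(n <= 32);
	for (unsigned i = 0; i < n; i++)
		v = (v << 1) | br_read_bit(br);
	return v;
}

/* ue(v): count leading zero bits, skip the 1, then read that many bits. */
static uint32_t br_read_ue(struct bit_reader *br)
{
	unsigned leading_zeros = 0;

	while (br_read_bit(br) == 0 && !br->error) {
		if (++leading_zeros > 31) { /* not a valid 32-bit ue(v) */
			br->error = 1;
			return 0;
		}
	}
	if (br->error)
		return 0;
	return ((1u << leading_zeros) - 1u) + br_read_bits(br, leading_zeros);
}

/* Bit size of a run of syntax elements = reader position delta. */
static size_t br_bits_consumed(const struct bit_reader *br, size_t start_pos)
{
	return br->pos - start_pos;
}
```

The position-delta approach is why the parser must be bit-precise: a single miscounted ue(v) shifts every later offset, and the hardware then misreads the slice.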
Track E — Multi-context libva safety ✓ COMPLETE
Commit b993355 moves LAST_OUTPUT_WIDTH/HEIGHT from process-global static in surface.c to struct request_data.last_output_width/height. The V4L2 device fd is per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format).
surface_reset_format_cache() signature changed to take a struct request_data *driver_data parameter; one callsite in context.c updated.
Audit confirmed only LAST_OUTPUT_* was mutable process-global state. Other statics (formats[], formats_count) are constant lookup tables — no race.
Verification: two concurrent mpv processes with 2-second stagger both decoded 300 frames cleanly, no cross-context corruption. Sub-second co-launch hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend.
Track B — mpv libplacebo --vo=gpu segfault ✓ COMPLETE (implicit fix)
iter3 substrate documented the segfault: Vulkan init fails → mpv falls through to GPU non-vulkan path → 4 frames decode → REQBUFS EBUSY → bizarre CreateSurfaces2 with sizes[1]=1050626 (uninitialized memory) → SIGSEGV.
Empirical re-test on the iter5-end driver (post-A + post-E): mpv --hwdec=vaapi --vo=gpu ran for 32 seconds of stream content (all of --frames=200 + sustained beyond), with 98 dropped frames out of ~768 and zero segfaults / SIGSEGV / VK_ERROR_DEVICE_LOST / abort(). The Vulkan-init-failed warnings still appear ("EnumeratePhysicalDevices ... VK_ERROR_INITIALIZATION_FAILED"), but mpv falls through and decodes successfully.
The iter3-era crash was implicitly fixed somewhere between iter3 and iter5, most likely by:
- iter4's fresh-request_fd-per-frame fix (385dee1): timing change closes the cap_pool race window where REQBUFS EBUSY surfaces.
- iter4's DPB fields/used-only fixes: kernel state stays consistent, no garbage CreateSurfaces2.
- iter5's per-driver-data move: race elimination on resolution-change.
No iter5 code change required for Track B beyond what A + E already landed. The iter3-era documentation in phase0_findings_iter3.md was correct that the bug was real, but the bug is gone now.
Track G — PGO-disabled Firefox rebuild (in progress)
PKGBUILD overlay edit replaced the 3-tier PGO sequence with a single-pass optimized build. The PGO profile-collection step needed xvfb-run + display server, which the boltzmann LXC container can't provide.
Single-pass build kicked off at the start of iter5 Phase 4G; running in the firefox-fourier container on boltzmann. Currently ~36 minutes in, mid C++ compile phase. ETA: 30-60 more minutes, then the mach package step (5-10 min), then transfer to ohm + extract.
Will deploy to /opt/firefox-fourier/ replacing the iter3 PGO-instrumented binary. Expected libxul.so size delta: 3.6 GB (PGO instrumented) → ~150-300 MB (release). Phase 7G verifies on-ohm playback.
Phase 4 → Phase 5 transition
Phase 4 deliverables landed for A + E + B; G is in progress. The Phase 5 Sonnet review will cover:
- Track A correctness: did any sweep removal break load-bearing code?
- Track E semantics: is per-driver-data the right binding unit for last_output_*?
- Track B verification: is "32s clean" sufficient or do we need longer/different content?
- Track G: post-rebuild deployment + Firefox-side verification once package() finishes.
Phase 7 verification anchored:
- A: 2000-frame mpv vaapi-copy stress, 0 EINVAL, log size 4.4 KB
- E: 2-process concurrent mpv (300 frames each, 2s stagger), both clean
- B: mpv --vo=gpu 32s, 0 segfaults
- G: pending package + deploy