After fork commit c8b6ede (Phase 5 follow-up sweep):
- Track A: 2000-frame mpv stress, 0 EINVAL, 1 v4l2-request log line, 3 KB log (down from 9 lines / 4.4 KB pre-cleanup).
- Track E: 2-process concurrent mpv with 2s stagger, both clean.
- Track B: 35s mpv --vo=gpu, 31s stream pos, 0 segfaults (mpv falls back to SW gracefully after init-time cap_pool EBUSY race; the race latent caveat stands per Phase 5 C4).
Track G still rebuilding on boltzmann; ETA <1h.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 5 — Phase 4 (plan + execution across 4 tracks)
iter5 locked four tracks at Phase 1: A (DEBUG sweep) + G (PGO-disabled Firefox rebuild) + B (mpv libplacebo segfault) + E (multi-context libva safety). Phase 4 splits into 4A / 4G / 4E / 4B sub-phases.
Track A — DEBUG instrumentation sweep ✓ COMPLETE
Sweep landed in 6 commits (in apply order):
- 848fc0c: remove iter3 Y2 v1 + iter4 Y2 v3 + per-control TRY iso from v4l2.c::v4l2_ioctl_controls (-54 lines)
- 39498f0: remove iter4 DPB census + per-entry dump from h264.c::h264_set_controls (-31 lines)
- 951233a: remove iter1 patch-0014 ENTER traces from buffer.c, image.c, picture.c, surface.c (-17 lines, 13 call sites)
- d3a299b: remove iter1 patch-0010 hex-dumps + patch-0011 sentinel write from picture.c + surface.c (-81 lines)
- 843febc: remove iter1 slice_header parse echo + VAPicture byte-dump in h264.c, RequestSyncSurface RETURN/early-exit traces in surface.c, suppress per-frame "Unable to get control(s)" when errno==EACCES (-49 lines net)
Total: ~232 lines of instrumentation removed. Per-frame v4l2-request log noise dropped from 30+ lines/frame to 0; the only remaining messages are init-time and once-per-resolution-change. Driver source builds clean; the 2000-frame stress test (timeout 120s) shows 0 EINVAL, 0 "Unable to" lines, and 9 v4l2-request log lines total (all init).
KEPT (justified):
- POC sentinel strip (h264_strip_ffmpeg_poc_sentinel): load-bearing for ffmpeg-vaapi consumers
- slice_header bit-precise parser: load-bearing for hantro hw decode (DECODE_PARAMS bit_size fields)
- EACCES retry-skip in v4l2_get_controls — load-bearing reflective behavior; one-time announcement message stays
- "slice_header parse FAILED" log — fires only on decode-blocking errors, not per-frame noise
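The EACCES handling kept above (announce once, then skip quietly, while other errors still log) can be pictured with a small sketch. This is an illustration under assumptions, not the driver's code: `struct ctrl_log_state`, `eacces_announced`, and `should_log_get_controls_failure` are hypothetical names, and keeping the flag in a per-instance struct rather than a function-level static is a choice made for this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-driver logging state. It lives in a struct rather than a
 * function-level static so that two driver instances do not share it. */
struct ctrl_log_state {
    bool eacces_announced;
};

/* Decide whether a failed get-controls call should emit a log line.
 * err is the errno value observed after the failed ioctl. */
bool should_log_get_controls_failure(struct ctrl_log_state *st, int err)
{
    if (err != EACCES)
        return true;            /* other failures are always reported */

    if (st->eacces_announced)
        return false;           /* already announced: stay quiet per frame */

    st->eacces_announced = true;
    return true;                /* first EACCES: one-time announcement */
}
```

A caller would consult this helper before printing "Unable to get control(s)", so a 2000-frame run produces at most one EACCES line per driver instance.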
Track E — Multi-context libva safety ✓ COMPLETE
Commit b993355 moves LAST_OUTPUT_WIDTH/HEIGHT from process-global static in surface.c to struct request_data.last_output_width/height. The V4L2 device fd is per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format).
surface_reset_format_cache() signature changed to take a struct request_data *driver_data parameter; one callsite in context.c updated.
Audit confirmed only LAST_OUTPUT_* was mutable process-global state. Other statics (formats[], formats_count) are constant lookup tables — no race.
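The shape of the Track E change can be sketched as follows. This is a trimmed-down illustration, not the contents of b993355: the real struct request_data has more fields, the `void` return type and the "0 means unset" convention are assumptions, and `output_format_needs_update` is a hypothetical helper added only to show how a per-instance cache behaves.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the driver's struct request_data.
 * Only the two cache fields named in the notes are modelled. */
struct request_data {
    int video_fd;                    /* assumed: one V4L2 fd per driver instance */
    unsigned int last_output_width;  /* assumed: 0 means "no OUTPUT format set yet" */
    unsigned int last_output_height;
};

/* Forget the cached OUTPUT format for this driver instance only. */
void surface_reset_format_cache(struct request_data *driver_data)
{
    driver_data->last_output_width = 0;
    driver_data->last_output_height = 0;
}

/* Hypothetical helper: returns true when the OUTPUT format must be (re)set,
 * and records the new size in the per-instance cache. */
bool output_format_needs_update(struct request_data *driver_data,
                                unsigned int width, unsigned int height)
{
    if (driver_data->last_output_width == width &&
        driver_data->last_output_height == height)
        return false;

    driver_data->last_output_width = width;
    driver_data->last_output_height = height;
    return true;
}
```

With the cache held per instance, resetting or updating one context leaves another context's cached size untouched, which is the property the process-global statics could not provide.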
Verification: two concurrent mpv processes with 2-second stagger both decoded 300 frames cleanly, no cross-context corruption. Sub-second co-launch hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend.
Track B — mpv libplacebo --vo=gpu: doesn't reproduce on this consumer ✓
iter3 substrate documented the segfault: Vulkan init fails → mpv falls through to GPU non-vulkan path → 4 frames decode → REQBUFS EBUSY → bizarre CreateSurfaces2 with sizes[1]=1050626 (uninitialized memory) → SIGSEGV.
Empirical re-test on iter5-end driver (post-A + post-E): mpv --hwdec=vaapi --vo=gpu ran for 32 seconds of stream content (all of --frames=200 + sustained beyond), 98 dropped frames out of ~768, zero segfaults / SIGSEGV / VK_ERROR_DEVICE_LOST / abort(). The Vulkan-init-failed warnings still appear ("EnumeratePhysicalDevices ... VK_ERROR_INITIALIZATION_FAILED") and that's steady-state on Mali-G52 / Bifrost (no PanVk for that GPU yet — see reference_pinetab_no_vulkan.md memory). mpv falls through to GLES via Panfrost.
Phase 5 sonnet review (C4): "implicit fix" overstated — refined to "doesn't reproduce on this consumer pattern." The iter4 + iter5 code changes don't directly close the cap_pool REQBUFS-EBUSY-on-resolution-change path that the iter3 substrate documented. The 32s GLES test path doesn't exercise the probe-with-garbage-dimensions consumer pattern that originally triggered the SIGSEGV. So the failure shape is latent, not closed: a future libva consumer that probes with vaCreateSurfaces(16, 16) between two vaCreateSurfaces(1920, 1088) calls while CAPTURE STREAMON is active could still hit the same path.
The cap_pool drain ordering concern survives as iter6+ candidate. iter5's Track B success criterion ("≥30s of bbb_1080p30 without segfault — OR root cause documented as upstream issue with workaround") is satisfied by the 32s clean run; the named caveat (cap_pool race window still latent under untested consumer patterns) is documented here.
No iter5 code change required for Track B beyond what A + E landed. Phase 5 review C4 framing applied.
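To make the latent path concrete, here is a toy model of the ordering problem. It is not the driver's cap_pool and not a proposed fix: every name in it is hypothetical, and it encodes a single rule, namely that a REQBUFS-style reallocation is refused with EBUSY while the queue is streaming. The 16x16 probe between two 1920x1088 allocations is the consumer pattern described above.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a CAPTURE queue. It is NOT the driver's cap_pool; it only
 * encodes one rule: buffers cannot be reallocated while streaming. */
struct capture_queue_model {
    bool streaming;
    unsigned int width;
    unsigned int height;
};

/* Models a REQBUFS-style reallocation: refused with -EBUSY while streaming. */
int model_reqbufs(struct capture_queue_model *q,
                  unsigned int width, unsigned int height)
{
    if (q->streaming)
        return -EBUSY;
    q->width = width;
    q->height = height;
    return 0;
}

/* Naive resize: reallocate directly, which fails if the queue is live. */
int model_resize_naive(struct capture_queue_model *q,
                       unsigned int width, unsigned int height)
{
    return model_reqbufs(q, width, height);
}

/* Drained resize: stop streaming first, reallocate, then restart. */
int model_resize_drained(struct capture_queue_model *q,
                         unsigned int width, unsigned int height)
{
    int ret;

    q->streaming = false;               /* stands in for STREAMOFF */
    ret = model_reqbufs(q, width, height);
    if (ret == 0)
        q->streaming = true;            /* stands in for STREAMON */
    return ret;
}
```

In this model, a 16x16 probe issued against a live 1920x1088 queue fails on the naive path and leaves the old dimensions in place, while the drained path succeeds. Whether the real driver should drain in this way is exactly the open iter6+ question.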
Track G — PGO-disabled Firefox rebuild (in progress)
PKGBUILD overlay edit replaced the 3-tier PGO sequence with a single-pass optimized build. The PGO profile-collection step needed xvfb-run + display server, which the boltzmann LXC container can't provide.
Single-pass build kicked off at iter5 Phase 4G start; running on the boltzmann firefox-fourier container. Currently ~36 minutes in, mid C++ compile phase. ETA: 30-60 min more, then the mach package step (5-10 min), then transfer to ohm + extract.
Will deploy to /opt/firefox-fourier/ replacing the iter3 PGO-instrumented binary. Expected libxul.so size delta: 3.6 GB (PGO instrumented) → ~150-300 MB (release). Phase 7G verifies on-ohm playback.
Phase 5 sonnet review caveats addressed (in commit c8b6ede)
Phase 5 review came back YELLOW with four caveats. Two were resolved in code (C1, C2) and two in documentation (C3, C4):
- C1 (Track A incomplete): sweep missed three surface.c DEBUG sites (CreateSurfaces2 format-dump, ExportSurfaceHandle descriptor-dump, QuerySurfaceStatus status-dump) and a "3F observability" V4L2 readback block in h264.c. Resolved in c8b6ede: additional 107-line removal.
- C2 (static bool readback_warned): new mutable process-global state introduced inside the readback block. Resolved by removing the readback block entirely (point above).
- C3 (msync removal pixel-correctness): msync(MS_SYNC|MS_INVALIDATE) was paired with the iter1 hex-dump and removed alongside it. The CAPTURE buffer is read post-DQBUF via copy_surface_to_image (image.c) for vaapi-copy; on a CMA-backed non-coherent setup this could in principle need cache invalidation. Empirical: 2000-frame stress with 0 errors, no visible decode failure. Likely the kernel does DMA sync at DQBUF level. Documented as named caveat: a frame-hash spot check could anchor it formally if needed; for now accept based on empirical pass.
- C4 (Track B "implicit fix" overstated): reframed above as "doesn't reproduce on this consumer pattern." Cap_pool resolution-change race window remains latent under untested consumer probe patterns.
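The frame-hash spot check mentioned under C3 could look roughly like this. It is a minimal sketch under assumptions: the notes do not name a hash, so 64-bit FNV-1a is used here purely because it is short and dependency-free, and `frame_hash_fnv1a` and `frames_match` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a byte buffer. The hash choice is an assumption made
 * for this sketch; the notes only mention a "frame-hash spot check". */
uint64_t frame_hash_fnv1a(const uint8_t *data, size_t len)
{
    uint64_t hash = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 1099511628211ULL;              /* FNV 64-bit prime */
    }
    return hash;
}

/* Returns 1 when two frames of equal length hash identically, else 0. */
int frames_match(const uint8_t *a, const uint8_t *b, size_t len)
{
    return frame_hash_fnv1a(a, len) == frame_hash_fnv1a(b, len);
}
```

One way to use it would be to hash a handful of frames read back through the vaapi-copy path and compare them against a software decode of the same frames; a stale-cache read would be expected to show up as a mismatch, whereas a clean comparison would give the empirical pass a firmer footing.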
Phase 4 → Phase 6/7/8 transition
Phase 4 + Phase 5 done for A + E + B. G in progress.
Phase 7 verification anchored (driver sha256 4bed52ec5d44b389..., post-cleanup):
- A: 2000-frame mpv vaapi-copy stress, 0 EINVAL, 1 v4l2-request log line, 3.0 KB log (down from 9 lines / 4.4 KB pre-Phase-5-cleanup). Phase 5 C1 caveat resolved.
- E: 2-process concurrent mpv post-cleanup (300 frames each, 2s stagger), both clean ("Exiting (End of file)").
- B: 35s mpv --vo=gpu post-cleanup: 31s stream pos, 29 dropped, 0 segfaults. Same shape as pre-cleanup (cap_pool race fires at init, mpv falls back to SW gracefully). Sonnet C4 caveat stands as iter6+ candidate.
- G: pending package + deploy
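The log-based counts in the Phase 7 anchors (EINVAL occurrences, v4l2-request lines) amount to counting lines that contain a marker string. A minimal sketch of such a counter follows; `count_lines_containing` is a hypothetical helper, not part of the driver or of mpv, and it assumes the captured log has already been read into a NUL-terminated buffer.

```c
#include <stddef.h>
#include <string.h>

/* Count the lines in a NUL-terminated log buffer that contain `needle`.
 * A line is counted once even if the needle appears in it several times. */
size_t count_lines_containing(const char *log, const char *needle)
{
    size_t count = 0;
    size_t needle_len = strlen(needle);

    if (needle_len == 0)
        return 0;

    while (*log != '\0') {
        const char *eol = strchr(log, '\n');
        size_t line_len = eol ? (size_t)(eol - log) : strlen(log);

        /* Search only within the current line. */
        for (size_t i = 0; i + needle_len <= line_len; i++) {
            if (memcmp(log + i, needle, needle_len) == 0) {
                count++;
                break;
            }
        }

        if (!eol)
            break;
        log = eol + 1;
    }
    return count;
}
```

Run against a captured stress-test log, calls with "EINVAL" and "v4l2-request" as the needle would produce the two figures quoted for Track A.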