libva-multiplanar/phase3_iter3_baseline.md
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00

Iteration 3 — Phase 3 (baseline anchor: pre-patch Firefox 150 behavior on ohm)

Goal: anchor the pre-patch behavior so Phase 7 has a "before" picture. Two distinct baselines matter for iter3:

  • Baseline-S (sandbox): stock Firefox 150 with default RDD sandbox → libva init fails at open(/dev/media0) with ENETDOWN → Firefox falls back to software decode. This is what Track F's patch is supposed to fix.
  • Baseline-A (frame-11 EINVAL): stock Firefox 150 with MOZ_DISABLE_RDD_SANDBOX=1 → libva engages hantro, decodes 10 frames, then EINVAL on set_controls at frame 11. This is the carryover defect Track A is supposed to fix.

Anchored baseline source

ohm is currently powered off (probe ping -c 1 ohm.fritz.box from rpi at 2026-05-04 ~23:50 returned 100% packet loss; PineTab2 has no WoL, so manual power-on by the operator is required). An in-session re-acquire of /tmp/ff-stdout.log is therefore not possible right now. The risk this poses to Phase 3 is low, for two reasons:

  1. iter2 close (phase8_iteration2_close.md, commit c36c61e) recorded the same baseline observations on 2026-05-04, the same day this Phase 3 anchor is being written. Same kernel (6.19.10), same userspace (Firefox 150.0.1, libva 2.23.0, mesa 26.0.5), same fixture (bbb_1080p30_h264.mp4 sha256 dcf8a7170fbd...), same driver build (sha256 f27e0064...). No state has drifted.

  2. The "before" picture is what we want to PROVE WRONG via the patch. The verifying observation is the "after" picture in Phase 7. Re-acquiring the "before" within hours of an identical observation that's already in git would be ceremonial.
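The "no state has drifted" claim in item 1 rests on the fixture and driver hashes still matching. A minimal sketch of how that could be re-checked once ohm is back, assuming only the truncated digests quoted above are at hand (the helper name `sha256_matches_prefix` is hypothetical, and the full digests are deliberately not reconstructed):

```python
import hashlib


def sha256_matches_prefix(path, expected_prefix, chunk_size=1 << 20):
    """Return True if the file's SHA-256 hex digest starts with expected_prefix.

    Hypothetical helper: the phase note only quotes truncated digests,
    so a prefix comparison is all that can be checked from the note alone.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large fixture is never loaded into memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(expected_prefix.lower())
```

For the fixture this would be called with the path to bbb_1080p30_h264.mp4 and the prefix `dcf8a7170fbd`; for the driver build, with the prefix `f27e0064` and wherever the driver binary is installed on ohm (the note does not record that path). A prefix match is weaker evidence than a full-digest match, so the full digests in the iter2 close doc remain the authoritative reference.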

So this Phase 3 doc anchors the iter2-close evidence by reference, with the explicit understanding that Phase 7 will produce the corresponding "after" rig. If at Phase 7 we discover that the stock Firefox baseline has shifted (e.g. Firefox 151 has landed via a pacman update by then), we re-acquire the baseline at that point.

Baseline-S evidence (anchored from iter2 close)

Quoted verbatim from phase8_iteration2_close.md:

Firefox 150 (default sandbox) | ✗ libva init fails inside RDD sandbox on open(/dev/media0) returning ENETDOWN — Firefox SW-falls-back. NOT an iter2 code regression (iter1 init code is byte-identical), but a Firefox routing change since iter1: iter1's findings.md shows decode happened on the utility process (sandboxingKind=0), iter2 today shows the libva path goes through RDD which is sandbox-blocked. Workaround: launch Firefox with MOZ_DISABLE_RDD_SANDBOX=1.

Signature to match in Phase 7:

  • Command: stock firefox /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 (no MOZ_DISABLE_RDD_SANDBOX)
  • driver stderr: shows open("/dev/media0", O_RDWR) returning -1 ENETDOWN
  • decode behavior: SW fallback, no hantro engagement

Phase 7 verifies: with firefox-fourier patched binary and same launch (no env var), the open succeeds and ≥10 frames decode through hantro.
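The Baseline-S signature can be checked mechanically against a captured driver stderr log. The sketch below assumes that the failing open is logged on a single line containing both the device node and the errno name; the exact log format is not recorded in this note, and the function name is hypothetical.

```python
def shows_sandbox_block(stderr_text, device="/dev/media0", errno_name="ENETDOWN"):
    """Return True if one log line mentions both the device node and the errno.

    Assumption: the driver (or a syscall trace) reports the failed open on a
    single line. Adjust the matching if the real log splits that information.
    """
    return any(
        device in line and errno_name in line
        for line in stderr_text.splitlines()
    )
```

Under that assumption, a stock-Firefox log should return True and a firefox-fourier log from the same launch should return False. A False result alone does not prove hantro engaged, so the frame-count criterion above still needs its own check.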

Baseline-A evidence (anchored from iter2 close)

Firefox 150 (sandbox-disabled) | ✓ engages our libva, decodes 10 frames cleanly through hantro (luma gradient 0x10→0x1c matching BBB intro fade, real NV12 pixels), then EINVAL on set_controls at frame 11. The EINVAL is a non-iter2 issue — same Sonnet 7.x family carryover from iter1 (likely 7.5 mid-stream / 7.2 num_ref_idx). cap_pool model is NOT the regression.

The 10-frame decoded sequence under sandbox-bypass confirms Fix 3's cap_pool architecture works correctly with Firefox: surface IDs 67108864..67108871 each acquired their own slot, and surface IDs were recycled across frames 5,6,9 with the slot state machine cycling through IN_DECODE → DECODED → recycle on next BeginPicture for the same surface. Pool was operating exactly as designed.
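As a reading aid, the slot lifecycle described above can be modelled in a few lines. This is a toy model, not the driver's code: the class name, method names, and the FREE state are hypothetical, and only the IN_DECODE → DECODED → recycle-on-BeginPicture cycle is taken from the observation.

```python
from enum import Enum


class SlotState(Enum):
    FREE = "FREE"            # hypothetical initial state, not named in the note
    IN_DECODE = "IN_DECODE"
    DECODED = "DECODED"


class CapPool:
    """Toy model of a capture-buffer pool keyed by VA surface ID."""

    def __init__(self, size):
        self.states = [SlotState.FREE] * size
        self.slot_of = {}  # surface ID -> slot index

    def begin_picture(self, surface_id):
        slot = self.slot_of.get(surface_id)
        if slot is None:
            # First use of this surface: claim a free slot
            # (raises ValueError if the pool is exhausted).
            slot = self.states.index(SlotState.FREE)
            self.slot_of[surface_id] = slot
        elif self.states[slot] is SlotState.IN_DECODE:
            raise RuntimeError("surface is still being decoded")
        # A DECODED slot is recycled here, on the next BeginPicture.
        self.states[slot] = SlotState.IN_DECODE
        return slot

    def decode_done(self, surface_id):
        slot = self.slot_of[surface_id]
        if self.states[slot] is not SlotState.IN_DECODE:
            raise RuntimeError("decode_done without a matching begin_picture")
        self.states[slot] = SlotState.DECODED
```

In this model, eight distinct surface IDs each claim their own slot, and a second `begin_picture` on an already-decoded surface returns the same slot index, which is the behaviour the iter2 log showed for the recycled surfaces.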

Signature to match in Phase 7 (track A):

  • Command: firefox-fourier /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 with the libva-v4l2-request-fourier driver instrumented to log per-request control values
  • driver stderr: Unable to set control(s): Invalid argument emerging at the 11th frame
  • Where to look: per-request controls submitted from EndPicture — DECODE_PARAMS, SLICE_PARAMS, SCALING_MATRIX, SPS, PPS — for the slice immediately after the cap_pool's first recycle event

Phase 7 verifies: with iter3's libva fix applied (Phase 4 produces this fix as well), the EINVAL no longer fires, i.e. ≥30s of bbb_1080p30 decode with no Unable to set control(s) in driver stderr.
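When the EINVAL does fire, the error_idx value that the kernel writes back into the extended-controls request narrows down where to look. The V4L2 documentation defines error_idx as the index of the failing control, or the control count when the error is not tied to one control. A minimal sketch of that rule follows; the function name and return strings are hypothetical.

```python
def classify_ext_ctrls_failure(error_idx, count):
    """Interpret error_idx from a failed extended-controls ioctl.

    error_idx == count: the failure is not attributed to a single control.
    error_idx <  count: the control at that index was rejected.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    if error_idx == count:
        return "request-level"
    if 0 <= error_idx < count:
        return f"control[{error_idx}]"
    raise ValueError("error_idx out of range")
```

For example, a request carrying five controls that comes back with error_idx of 5 is classified as request-level, so comparing per-control sizes alone will not locate the fault. A value of 1 would instead point at the second control in the submitted array.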

Phase 3 carry-over to Phase 4

Phase 4 (patch + PKGBUILD overlay authorship) does not need Baseline-S or Baseline-A re-acquired live. It needs:

  1. The verbatim Mozilla source from Phase 2 (already captured in phase2_iter3_situation.md)
  2. The cap-set of hantro on ohm to confirm whether V4L2_CAP_VIDEO_M2M_MPLANE is set (cheap to check at Phase 7 boot via v4l2-ctl --device=/dev/video1 --info)
  3. The fixture and driver state (anchored, unchanged since iter2)

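Operator action item for Phase 7 prep: when ohm is next powered on, run v4l2-ctl --device=/dev/video1 --info | grep -E 'Capabilities|Device' and capture the output. The grep drops most of the decoded capability list, so read the hex mask on the Device Caps line and check bit 0x00004000 (V4L2_CAP_VIDEO_M2M_MPLANE, which the unfiltered --info output lists as Video Memory-to-Memory Multiplanar). If the bit is set, the cap-filter extension is unnecessary and the patch shrinks to "just add /dev/media0". If it is clear, both pieces of the patch are needed.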
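The capability check can be scripted against the captured output. The sketch below assumes the usual `Device Caps : 0x...` line that v4l2-ctl --info prints, and that the per-node Device Caps mask (rather than the driver-wide Capabilities mask) is the one the cap-filter inspects. The function names are hypothetical; the bit value comes from <linux/videodev2.h>.

```python
import re

V4L2_CAP_VIDEO_M2M_MPLANE = 0x00004000  # from <linux/videodev2.h>


def device_caps_mask(info_text):
    """Extract the hex mask from the 'Device Caps' line of v4l2-ctl --info."""
    match = re.search(
        r"^\s*Device Caps\s*:\s*(0x[0-9a-fA-F]+)", info_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Device Caps' line found")
    return int(match.group(1), 16)


def needs_cap_filter_extension(info_text):
    """True when the M2M multiplanar bit is clear, i.e. both patch pieces apply."""
    return not (device_caps_mask(info_text) & V4L2_CAP_VIDEO_M2M_MPLANE)
```

Feeding it the captured text from ohm then yields the decision directly: False means the patch shrinks to the /dev/media0 allowance, True means the cap-filter extension is needed too.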

Stop point

Phase 3 anchored. Proceeding to Phase 4: write the firefox-fourier patch and the AUR PKGBUILD overlay. The operator-side action item is flagged above. ohm being offline does NOT block Phase 4 (writing the patch is desk work).