libva-multiplanar/phase4_iter3_plan.md
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00


Iteration 3 — Phase 4 (plan + inputs)

Track F (sandbox patch) and Track A (frame-11 EINVAL) plans, ready for Phase 5 sonnet review.

Track F — firefox-fourier RDD sandbox patch

Deliverable authored at firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch.

What it changes (single source file, two hunks + one new function):

  1. AddV4l2Dependencies() cap-filter widened to also admit nodes that advertise all three of V4L2_CAP_VIDEO_CAPTURE_MPLANE, V4L2_CAP_VIDEO_OUTPUT_MPLANE, and V4L2_CAP_STREAMING (every bit set, not a bitwise AND of the flag constants). This catches stateless decoders that don't advertise M2M.

  2. New static AddV4l2RequestApiDependencies() function that enumerates /dev/media* and adds each rdwr to the RDD broker policy. Mirrors the structure of AddV4l2Dependencies() for symmetry and reviewer-friendliness.

  3. GetRDDPolicy() calls the new function under MOZ_ENABLE_V4L2.

What it does NOT change: the seccomp policy in SandboxFilter.cpp. iter3 Phase 2 deferred this to empirical Phase 7 verification. Rationale: the iter2 failure signature was ENETDOWN at open(/dev/media0), which is broker-policy-denial, not seccomp. If MEDIA_REQUEST_IOC_QUEUE turns out to be seccomp-blocked once the open succeeds (would manifest as SIGSYS abort with seccomp_unotify in stderr), Phase 7 amends the patch with a SandboxFilter.cpp hunk allowing ioctl with magic byte '|' (or specifically the MEDIA_IOC_* range). This is a known-feasible amendment, not architectural; the cost of guess-and-check vs source-fetch-through-WebFetch favored guess-and-check.

Patch-application risk: the hunks use text-context anchors (verbatim Mozilla source from Phase 2), not line numbers. The likely failure mode is minor whitespace drift between firefox-150.0.1.source.tar.xz and the searchfox mozilla-release snapshot. Mitigation: run patch -p1 --dry-run against an unpacked tarball BEFORE the first makepkg. If any hunk fails, re-anchor it.

Track F — AUR PKGBUILD overlay

Deliverable authored at firefox-fourier/PKGBUILD-overlay.md.

Strategy: use upstream Arch firefox PKGBUILD (gitlab.archlinux.org) as basis, layer 5 hunks: rename → add aarch64 → add patch source → updpkgsums → apply in prepare(). NO mach-build or mozilla-central. The boltzmann LXD container has rust 1.95 / clang 22 / cbindgen 0.29 pre-staged and the upstream PKGBUILD's --enable-v4l2 mozconfig option is verified active.

Rebuild contract: makepkg -e (--noextract) skips re-extracting the firefox tarball and re-applying the patch, which makes iterative rebuilds dramatically faster. For a full clean rebuild (e.g. when the patch text has changed), use makepkg -C (--cleanbuild). Acknowledged user guidance: "if an aur package is the basis, remember to skip re-extraction and patching (makepkg -e) on rebuilds".

Fallback if rust-on-aarch64 fails: documented in iter3 Phase 1 lock. Power on data (x86), prevent sleep, set up x86 host with cross-compile target aarch64. Same .patch and same PKGBUILD overlay carry over; only arch= and the build host change. NOT expected to be needed since boltzmann's rust 1.95 toolchain already exists and Mozilla certifies aarch64 builds in CI.

Track A — libva-v4l2-request-fourier frame-11 EINVAL

No code fix in Phase 4. The fix requires knowing WHICH V4L2 control field returns EINVAL on frame 11, which we don't yet know. Phase 4 instead delivers the diagnostic-loaded driver build that surfaces the failing field name when run under the patched Firefox.

Plan:

  1. Diagnostic instrumentation in libva-v4l2-request-fourier/src/:

    • In surface.c::EndPicture (or wherever per-request controls are submitted via VIDIOC_S_EXT_CTRLS), wrap the ioctl with a request_log() call that, on EINVAL, dumps every control struct member: id, size, value (or for compound controls, the compound struct contents). Use V4L2_CID_* symbolic name lookup (a switch on id → string), or fall through to numeric id.
    • Also log the slice index, picture index, surface ID, and POC (Picture Order Count) so we can correlate with the 11th-frame timing.
    • This is purely add-only logging; revert in iter4's DEBUG sweep.
  2. Build + deploy: rebuild driver via meson setup --buildtype=release && ninja on ohm at /tmp/libva-src/..., deploy to /usr/lib/dri/v4l2_request_drv_video.so. Driver sha256 changes.

  3. Phase 7 capture: with patched Firefox + instrumented driver, run bbb_1080p30. Capture stderr; the EINVAL frame-11 line will name the control. Then we know whether it's:

    • DECODE_PARAMS (Sonnet 7.5 mid-stream non-IDR territory)
    • SLICE_PARAMS (num_ref_idx_l0/l1, Sonnet 7.2)
    • SCALING_MATRIX (less likely; usually constant)
    • SPS/PPS (even less likely; usually constant or per-IDR-only)
  4. Fix authoring happens AFTER Phase 7 capture, in what becomes Phase 7.5 / Phase 8 territory rather than Phase 4. This is the natural shape of "Track A informed by Track F's rig".

Reading reference for control validation rules: drivers/staging/media/hantro/hantro_g1_h264_dec.c in the kernel tree on ohm. Identify which control fields cause the driver to return -EINVAL in the validate path. (This can also be done on rpi if a copy of the kernel source is available there; ohm being offline doesn't block this preliminary read.)

Phase 5 review checklist (what sonnet should look at)

  • Patch correctness: does the .patch text apply cleanly to firefox-150.0.1? Are the hunks anchored on stable text? Is nsAutoCString path("/dev/") the right string-builder type for this codebase (vs std::string, nsCString, or others)? Are the cap-filter conditions logically equivalent to the substrate's claim "stateless decoders need CAPTURE_MPLANE+OUTPUT_MPLANE+STREAMING"?

  • Patch security: does adding /dev/media* rdwr to RDD increase the attack surface in a way the existing /dev/video* rdwr policy doesn't already? Is there a media-controller node on common Linux desktops that exposes more than V4L2 (e.g. ISP / camera control nodes)? Should we filter /dev/media* by some capability check analogous to AddV4l2Dependencies's M2M check, or is enumeration sufficient?

  • PKGBUILD safety: is renaming to firefox-fourier with conflicts=(firefox) the right pacman pattern, or should we use a provides=() pin without the conflict? Does the makepkg -e contract documented in the overlay actually hold for this PKGBUILD's prepare() shape?

  • Track A diagnostic plan: is the EndPicture wrapping going to fire on the failing path, or could there be a different ioctl call site (S_EXT_CTRLS in submit_request, in queue.c, etc.) that hits EINVAL first? Should the instrumentation be at a lower layer (libva ioctl wrapper, or strace-derived signature) instead?

  • Deferred-seccomp risk: Phase 2 deferred SandboxFilter.cpp to empirical Phase 7 test. Does sonnet have a fast path to fetch that source we missed? Is the deferral acceptable?

Stop point

Phase 4 deliverables landed: patch text, PKGBUILD overlay strategy, Track A diagnostic plan, Phase 5 review checklist. Proceeding to Phase 5: sonnet review of the above. After Phase 5 passes (or the issues from review are resolved), Phase 6 builds firefox-fourier in the container and Phase 7 verifies on ohm.