f91469abe3
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.
Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
- Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
extend cap-filter to admit stateless decoders that lack M2M caps.
- Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
for <linux/media.h> request-API ioctls.
- Driver (media.c): replace select() with poll() — Mozilla's RDD
seccomp common policy admits poll/ppoll/epoll_* but not
select/pselect6. Driver-side fix preferred; smaller surface,
portable across sandbox policies, and poll() is the modern API.
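A minimal C sketch of the driver-side swap. It assumes the wait looks like upstream libva-v4l2-request's, which calls select() with the request fd in exceptfds; Linux maps exceptfds to POLLPRI. wait_request_completion() is a hypothetical name, not the fork's actual function. The media Request API signals a completed request as POLLPRI on the request fd, so poll() with POLLPRI should be the drop-in equivalent:

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait for a queued media request to complete.
 * Returns 0 on completion, -ETIMEDOUT on timeout, negative errno otherwise. */
static int wait_request_completion(int request_fd, int timeout_ms)
{
    /* select()'s exceptfds corresponds to POLLPRI, which is how the
     * Request API reports "request complete" on the request fd. */
    struct pollfd pfd = { .fd = request_fd, .events = POLLPRI, .revents = 0 };

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry (timeout restarts) */
            return -errno;
        }
        if (rc == 0)
            return -ETIMEDOUT;     /* nothing signalled within timeout_ms */
        if (pfd.revents & POLLPRI)
            return 0;              /* request completed */
        if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
            return -EIO;           /* bad fd, or request not queued/complete */
        return -EIO;               /* unexpected revents */
    }
}
```

Retrying on EINTR restarts the full timeout. A production version might track a deadline instead.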
Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
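error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel, read here as a request-level rejection rather than a single-field
violation. Caveat: per the V4L2 UAPI docs, S_EXT_CTRLS also returns that
sentinel whenever its up-front validation step fails, so a single bad field
isn't ruled out yet; VIDIOC_TRY_EXT_CTRLS on the same array names the index.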
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.
Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.
Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).
Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
69 lines
7.0 KiB
Markdown
# Iteration 3 — Phase 4 (plan + inputs)

Track F (sandbox patch) and Track A (frame-11 EINVAL) plans, ready for Phase 5 sonnet review.
## Track F — firefox-fourier RDD sandbox patch

**Deliverable** authored at `firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch`.

**What it changes** (single source file, two hunks + one new function):

1. The `AddV4l2Dependencies()` cap-filter is widened to also admit nodes that advertise all three of `V4L2_CAP_VIDEO_CAPTURE_MPLANE`, `V4L2_CAP_VIDEO_OUTPUT_MPLANE`, and `V4L2_CAP_STREAMING`. The test is `(caps & mask) == mask`, not a bitwise AND of the flags themselves: distinct single-bit flags AND to zero. This catches stateless decoders that don't advertise M2M.
2. A new static `AddV4l2RequestApiDependencies()` function enumerates `/dev/media*` and adds each node read-write (rdwr) to the RDD broker policy. It mirrors the structure of `AddV4l2Dependencies()` for symmetry and reviewer-friendliness.
3. `GetRDDPolicy()` calls the new function under `MOZ_ENABLE_V4L2`.
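The widened filter is easy to get wrong when written as a chain of `&`. Below is a minimal C sketch of the predicate, under a few assumptions:

- `caps` is the capability word returned by `VIDIOC_QUERYCAP`.
- The bit values are copied from `<linux/videodev2.h>` under hypothetical `CAP_*` names so the block compiles without kernel headers.
- `is_m2m()` is an assumed shape of the existing M2M check, not Mozilla's code.
- `is_media_node_name()` is a hypothetical helper for the `/dev/media*` walk.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit values as in <linux/videodev2.h>, duplicated for a header-free build. */
#define CAP_VIDEO_CAPTURE_MPLANE 0x00001000u
#define CAP_VIDEO_OUTPUT_MPLANE  0x00002000u
#define CAP_VIDEO_M2M_MPLANE     0x00004000u
#define CAP_VIDEO_M2M            0x00008000u
#define CAP_STREAMING            0x04000000u

/* Assumed shape of the pre-existing filter: any memory-to-memory node. */
static bool is_m2m(uint32_t caps)
{
    return (caps & (CAP_VIDEO_M2M | CAP_VIDEO_M2M_MPLANE)) != 0;
}

/* Widened filter: ALL three bits must be set. The masked caps are compared
 * against the full mask, because AND-ing the flag constants yields 0. */
static bool is_stateless_candidate(uint32_t caps)
{
    const uint32_t need =
        CAP_VIDEO_CAPTURE_MPLANE | CAP_VIDEO_OUTPUT_MPLANE | CAP_STREAMING;
    return (caps & need) == need;
}

static bool admit_v4l2_node(uint32_t caps)
{
    return is_m2m(caps) || is_stateless_candidate(caps);
}

/* Hypothetical name filter for a readdir() walk of /dev: "media" + digits. */
static bool is_media_node_name(const char *name)
{
    if (strncmp(name, "media", 5) != 0)
        return false;
    const char *p = name + 5;
    if (*p == '\0')
        return false;
    for (; *p; ++p)
        if (!isdigit((unsigned char)*p))
            return false;
    return true;
}
```

In `AddV4l2RequestApiDependencies()`, each name that passes such a filter during a directory walk of `/dev` would become a `/dev/` + name entry in the broker policy.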

**What it does NOT change:** the seccomp policy in `SandboxFilter.cpp`. iter3 Phase 2 deferred this to empirical Phase 7 verification.

- **Rationale:** the iter2 failure signature was ENETDOWN at `open(/dev/media0)`, which is a broker-policy denial, not a seccomp one.
- **Amendment if needed:** `MEDIA_REQUEST_IOC_QUEUE` may turn out to be seccomp-blocked once the open succeeds. That would manifest as a SIGSYS abort with seccomp_unotify in stderr. In that case Phase 7 amends the patch with a `SandboxFilter.cpp` hunk allowing ioctls with magic byte `'|'`, or specifically the `MEDIA_IOC_*` / `MEDIA_REQUEST_IOC_*` range.
- **Why defer:** this is a known-feasible amendment, not an architectural change. Weighing guess-and-check against fetching the source through WebFetch favored guess-and-check.

**Patch-application risk:** the hunks use text-context anchors (verbatim Mozilla source from Phase 2), not line numbers. The failure mode is minor whitespace drift between firefox-150.0.1.source.tar.xz and the searchfox `mozilla-release` snapshot. Mitigation: run `patch -p1 --dry-run` against an unpacked tarball BEFORE the first `makepkg`. If hunks fail, re-anchor.
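For the deferred seccomp amendment above, the `'|'` magic byte sits in bits 8-15 of the ioctl request number. This follows the generic Linux encoding that both x86_64 and aarch64 use: direction in bits 30-31, argument size in 16-29, number in 0-7. A minimal C sketch, assuming that encoding; the `HYP_*` macros are hypothetical re-derivations of the `<linux/media.h>` definitions. `is_media_ioctl()` only illustrates what a magic-byte rule would admit; it is not Firefox's sandbox DSL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical re-derivation of the asm-generic ioctl encoding. */
#define HYP_IOC_NONE  0u
#define HYP_IOC_WRITE 1u
#define HYP_IOC_READ  2u
#define HYP_IOC(dir, type, nr, size)                          \
    (((uint32_t)(dir) << 30) | ((uint32_t)(size) << 16) |     \
     ((uint32_t)(type) << 8) | (uint32_t)(nr))
#define HYP_IO(type, nr)     HYP_IOC(HYP_IOC_NONE, (type), (nr), 0)
#define HYP_IOR(type, nr, T) HYP_IOC(HYP_IOC_READ, (type), (nr), sizeof(T))

/* Mirrors of <linux/media.h> request-API ioctls. */
#define HYP_MEDIA_IOC_REQUEST_ALLOC  HYP_IOR('|', 0x05, int) /* on /dev/mediaN */
#define HYP_MEDIA_REQUEST_IOC_QUEUE  HYP_IO('|', 0x80)       /* on request fd */
#define HYP_MEDIA_REQUEST_IOC_REINIT HYP_IO('|', 0x81)       /* on request fd */

/* The "type" (magic) byte of an ioctl request number. */
static unsigned ioc_type(unsigned long cmd)
{
    return (unsigned)((cmd >> 8) & 0xffu);
}

/* What a seccomp rule keyed only on the magic byte would admit. */
static bool is_media_ioctl(unsigned long cmd)
{
    return ioc_type(cmd) == '|';
}
```

Matching on the whole `'|'` type admits every media-controller ioctl, including topology queries on `/dev/media*`. The narrower option would list only `MEDIA_IOC_REQUEST_ALLOC` and the `MEDIA_REQUEST_IOC_*` commands.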
## Track F — AUR PKGBUILD overlay

**Deliverable** authored at `firefox-fourier/PKGBUILD-overlay.md`.

**Strategy:** use the upstream Arch `firefox` PKGBUILD (gitlab.archlinux.org) as the basis and layer five hunks on it: rename → add aarch64 → add patch source → updpkgsums → apply in prepare(). NO mach-build or mozilla-central. The boltzmann LXD container has rust 1.95 / clang 22 / cbindgen 0.29 pre-staged, and the upstream PKGBUILD's `--enable-v4l2` mozconfig option is verified active.

**Rebuild contract:** `makepkg -e` (--noextract) skips re-extracting the firefox tarball and skips prepare(), so the patch is not re-applied. This makes iteration dramatically faster. For a full clean rebuild (e.g. the patch text changed), use `makepkg -C` (--cleanbuild). This follows the user guidance: "if an aur package is the basis, remember to skip re-extraction and patching (makepkg -e) on rebuilds".

**Fallback if rust-on-aarch64 fails:** documented in the iter3 Phase 1 lock. Power on `data` (x86), prevent sleep, and set up the x86 host to cross-compile for aarch64. The same .patch and PKGBUILD overlay carry over; only `arch=` and the build host change. This is NOT expected to be needed, since boltzmann's rust 1.95 toolchain already exists and Mozilla certifies aarch64 builds in CI.
## Track A — libva-v4l2-request-fourier frame-11 EINVAL

**No code fix in Phase 4.** The fix requires knowing WHICH V4L2 control field returns EINVAL on frame 11, which we don't yet know. Instead, Phase 4 delivers a **diagnostic-loaded driver build** that surfaces the failing field name when run under the patched Firefox.

**Plan:**

1. **Diagnostic instrumentation** in `libva-v4l2-request-fourier/src/`:
   - Wrap the per-request `VIDIOC_S_EXT_CTRLS` call in `surface.c::EndPicture` (or wherever the controls are actually submitted) with a `request_log()` call. On EINVAL it dumps each control's `id`, `size`, and `value`, or the struct contents for compound controls. Map `V4L2_CID_*` ids to symbolic names with a switch on id → string, falling through to the numeric id.
   - Also log the slice index, picture index, surface ID, and POC (Picture Order Count) so we can correlate with the 11th-frame timing.
   - This is purely additive logging; revert it in iter4's DEBUG sweep.
2. **Build + deploy**: rebuild the driver on ohm at `/tmp/libva-src/...` via `meson setup <builddir> --buildtype=release && ninja -C <builddir>`. Deploy to `/usr/lib/dri/v4l2_request_drv_video.so`. The driver sha256 changes.
3. **Phase 7 capture**: with patched Firefox + instrumented driver, run bbb_1080p30. Capture stderr; the EINVAL frame-11 line will name the control. Then we know whether it's:
   - DECODE_PARAMS (Sonnet 7.5 mid-stream non-IDR territory)
   - SLICE_PARAMS (`num_ref_idx_l0/l1`, Sonnet 7.2)
   - SCALING_MATRIX (less likely; usually constant)
   - SPS/PPS (even less likely; usually constant or per-IDR-only)
4. **Fix authoring** happens AFTER the Phase 7 capture, in what becomes Phase 7.5 / Phase 8 territory rather than Phase 4. This is the natural shape of "Track A informed by Track F's rig".
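The id → name lookup and the EINVAL dump from step 1 can be sketched as a minimal C example, under these assumptions:

- The driver uses the stable stateless H.264 control IDs (`V4L2_CID_STATELESS_H264_*`, Linux 5.11+). Their values are mirrored from `<linux/v4l2-controls.h>` so the block builds without kernel headers. If the fork still uses the older staging IDs, the table needs different values.
- `struct ctrl_summary`, `h264_ctrl_name()`, and `log_ctrls_on_einval()` are hypothetical helpers, not libva-v4l2-request code.

The header line flags `error_idx == count`, which the V4L2 UAPI documents as "not attributed to one control". That also covers S_EXT_CTRLS validation failures.

```c
#include <stdint.h>
#include <stdio.h>

/* V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900 */
#define CID_STATELESS_BASE (0x00a40000u | 0x900u)

/* Symbolic name for a stateless H.264 control id, or NULL if unknown. */
static const char *h264_ctrl_name(uint32_t id)
{
    switch (id) {
    case CID_STATELESS_BASE + 0: return "DECODE_MODE";
    case CID_STATELESS_BASE + 1: return "START_CODE";
    case CID_STATELESS_BASE + 2: return "SPS";
    case CID_STATELESS_BASE + 3: return "PPS";
    case CID_STATELESS_BASE + 4: return "SCALING_MATRIX";
    case CID_STATELESS_BASE + 5: return "PRED_WEIGHTS";
    case CID_STATELESS_BASE + 6: return "SLICE_PARAMS";
    case CID_STATELESS_BASE + 7: return "DECODE_PARAMS";
    default:                     return NULL; /* caller prints numeric id */
    }
}

/* Hypothetical per-control record captured before the ioctl. */
struct ctrl_summary {
    uint32_t id;
    uint32_t size;
};

/* Dump every control after S_EXT_CTRLS fails with EINVAL. */
static void log_ctrls_on_einval(FILE *out, const struct ctrl_summary *c,
                                uint32_t count, uint32_t error_idx)
{
    fprintf(out, "S_EXT_CTRLS EINVAL: count=%u error_idx=%u%s\n",
            (unsigned)count, (unsigned)error_idx,
            error_idx == count
                ? " (no single control named; try TRY_EXT_CTRLS)" : "");
    for (uint32_t i = 0; i < count; i++) {
        const char *name = h264_ctrl_name(c[i].id);
        const char *mark = (i == error_idx) ? " <-- failing" : "";
        if (name)
            fprintf(out, "  [%u] %s size=%u%s\n", (unsigned)i, name,
                    (unsigned)c[i].size, mark);
        else
            fprintf(out, "  [%u] id=0x%08x size=%u%s\n", (unsigned)i,
                    (unsigned)c[i].id, (unsigned)c[i].size, mark);
    }
}
```

In the driver, `out` would be stderr (or whatever `request_log()` writes to). The slice/picture/surface/POC fields from step 1 would be added to the header line.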
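
**Reading reference for control validation rules**: `drivers/staging/media/hantro/hantro_g1_h264_dec.c` in the kernel tree on ohm. Check which control fields make the driver return -EINVAL in its validate path. On kernels since 6.1 the Hantro driver lives under `drivers/media/platform/verisilicon/`. The generic checks on the stateless H.264 compound controls are in `drivers/media/v4l2-core/v4l2-ctrls-core.c`. This preliminary read can also be done on rpi if a copy of the kernel source is nearby, so ohm being offline doesn't block it.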
## Phase 5 review checklist (what sonnet should look at)

- **Patch correctness:** does the .patch text apply cleanly to firefox-150.0.1? Are the hunks anchored on stable text? Is `nsAutoCString path("/dev/")` the right string-builder type for this codebase (vs `std::string`, `nsCString`, or others)? Is the cap-filter condition logically equivalent to the substrate's claim "stateless decoders need CAPTURE_MPLANE+OUTPUT_MPLANE+STREAMING"?
- **Patch security:** does adding `/dev/media*` rdwr to RDD increase the attack surface beyond what the existing `/dev/video*` rdwr policy already allows? Is there a media-controller node on common Linux desktops that exposes more than V4L2 (e.g. ISP / camera control nodes)? Should we filter /dev/media* by a capability check analogous to AddV4l2Dependencies's M2M check, or is enumeration sufficient?
- **PKGBUILD safety:** is renaming to firefox-fourier with conflicts=(firefox) the right pacman pattern, or should we use a `provides=()` pin without the conflict? Does the makepkg -e contract documented in the overlay actually hold for this PKGBUILD's prepare() shape?
- **Track A diagnostic plan:** will the EndPicture wrapping fire on the failing path? Or could a different ioctl call site (S_EXT_CTRLS in submit_request, in queue.c, etc.) hit EINVAL first? Should the instrumentation sit at a lower layer (libva ioctl wrapper, or strace-derived signature) instead?
- **Deferred-seccomp risk:** Phase 2 deferred `SandboxFilter.cpp` to an empirical Phase 7 test. Does sonnet have a fast path to fetch that source that we missed? Is the deferral acceptable?
## Stop point

Phase 4 deliverables landed: patch text, PKGBUILD overlay strategy, Track A diagnostic plan, Phase 5 review checklist. Proceeding to Phase 5: sonnet review of the above. After Phase 5 passes (or the issues from review are resolved), Phase 6 builds firefox-fourier in the container and Phase 7 verifies on ohm.