Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.
Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
- Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
extend cap-filter to admit stateless decoders that lack M2M caps.
- Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
for <linux/media.h> request-API ioctls.
- Driver (media.c): replace select() with poll() — Mozilla's RDD
seccomp common policy admits poll/ppoll/epoll_* but not
select/pselect6. Driver-side fix preferred; smaller surface,
portable across sandbox policies, and poll() is the modern API.
Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "no specific control"
sentinel: the EINVAL is not attributed to any single control index.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.
Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.
Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).
Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 3 close (Phase 8) — F+A locked, F GREEN, A reproduced + diagnosed
Opened 2026-05-04, closing 2026-05-05. Locked candidate: F (Firefox RDD sandbox verify-by-patch) + A (frame-11 EINVAL diagnose) running in parallel on a single firefox-fourier build.
Verdict per track
Track F: GREEN
Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1) launched on ohm without MOZ_DISABLE_RDD_SANDBOX=1 engages our libva-v4l2-request backend, opens /dev/video1 + /dev/media0 from the sandboxed RDD process, and submits decode requests through MEDIA_REQUEST_IOC_* ioctls. ENETDOWN signature from iter2 is gone; libva fully initialized; decode reaches the same frame-10 mark as iter2's sandbox-bypass run — proving the patched-sandbox is functionally equivalent to the bypass for V4L2 stateless decode.
Three distinct gates needed patching to reach this state — Phase 2 had identified one (broker policy) and explicitly deferred the seccomp question to empirical Phase 7. Phase 7 surfaced two MORE gates beyond what Phase 2 anticipated:
- Broker policy (security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp):
  - AddV4l2Dependencies() cap-filter widened: admit (CAPTURE_MPLANE & OUTPUT_MPLANE & STREAMING) for stateless decoders that don't advertise M2M.
  - New AddV4l2RequestApiDependencies() enumerates /dev/media* as rdwr.
- Seccomp policy (security/sandbox/linux/SandboxFilter.cpp):
  - Add ioctl magic byte '|' (<linux/media.h> ioctls) to RDD's allowlist alongside existing 'V' (V4L2). Without this, MEDIA_IOC_REQUEST_ALLOC returned ENOSYS; libva couldn't allocate request fds.
- Driver-side (libva-v4l2-request-fourier/src/media.c):
  - media_request_wait_completion() migrated from select() to poll(). Mozilla's RDD seccomp common policy admits poll/ppoll/epoll_* but not select/pselect6. Without this, select() returned ENOSYS even after the broker + ioctl gates opened. Driver-side fix preferred over expanding Firefox seccomp: smaller surface, more portable across sandbox policies, and poll() is the modern API anyway.
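The seccomp gate in the list above keys on the ioctl "magic" (type) byte rather than on individual request numbers. A minimal sketch of that idea follows, assuming the generic Linux ioctl encoding (type byte in bits 8..15). The helper names ioctl_magic() and rdd_ioctl_allowed() are hypothetical and are not Mozilla's actual SandboxFilter.cpp code.

```c
#include <assert.h>
#include <stdbool.h>

/* The check below compares against character literals, so ASCII is assumed. */
static_assert('|' == 0x7c, "ASCII character set assumed");

/* Generic Linux ioctl encoding (assumption: asm-generic layout):
 * bits 0-7 = command number, bits 8-15 = "magic" type byte. */
static unsigned char ioctl_magic(unsigned long request)
{
    return (unsigned char)((request >> 8) & 0xffu);
}

/* Hypothetical allowlist check mirroring the patched policy described
 * above: 'V' (V4L2) was already admitted, '|' (<linux/media.h>) is new. */
static bool rdd_ioctl_allowed(unsigned long request)
{
    switch (ioctl_magic(request)) {
    case 'V':
    case '|':
        return true;
    default:
        return false;
    }
}
```

For instance, a request whose low 16 bits are ('|' << 8) | 0x80 (the layout of MEDIA_REQUEST_IOC_QUEUE on the generic encoding) passes the check, while a terminal ioctl with magic 'T' does not.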
The Phase 2 deferral ("if patched binary trips SIGSYS, extend SandboxFilter") was correctly defensive but missed that Mozilla's seccomp returns ENOSYS via SECCOMP_RET_ERRNO rather than SIGSYS — silent fall-through that we only caught by reading our driver's own log lines. Lesson distilled below.
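The select() to poll() migration and the ENOSYS lesson can be sketched together. This is an illustration only, not the actual media_request_wait_completion() from media.c: wait_fd_event() and its return codes are hypothetical, and the caller supplies the event mask (POLLPRI for a media request fd, as in the driver change described above).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

enum wait_result {
    WAIT_READY = 0,          /* a requested event arrived */
    WAIT_TIMEOUT = 1,        /* nothing happened within timeout_ms */
    WAIT_ERROR = 2,          /* poll() failed or the fd is in an error state */
    WAIT_SANDBOX_DENIED = 3  /* ENOSYS: syscall probably filtered by seccomp */
};

static enum wait_result wait_fd_event(int fd, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int rc;

    assert(fd >= 0);

    /* Retry if a signal interrupts the wait (the timeout restarts,
     * which is acceptable for a sketch). */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return errno == ENOSYS ? WAIT_SANDBOX_DENIED : WAIT_ERROR;
    if (rc == 0)
        return WAIT_TIMEOUT;

    /* POLLERR/POLLHUP/POLLNVAL may be reported even when not requested. */
    return (pfd.revents & events) ? WAIT_READY : WAIT_ERROR;
}
```

Surfacing ENOSYS as its own result code is the design point: under a filter that answers with an errno instead of a signal, a distinct code in the driver log is easier to spot than a generic failure.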
Track A: REPRODUCED + DIAGNOSED, NOT FIXED
Frame-11 EINVAL fires deterministically on the patched-sandbox rig — exactly matching iter1/iter2's carryover signature, ruling out "rig-specific" alibis. Decode succeeds for 10 BeginPictures (luma var=0..4 confirms real NV12 output), then on the 11th set_controls call the kernel rejects with EINVAL.
Y2 instrumentation (v4l2_ioctl_controls extension, two iterations) now produces full diagnostic output on the failing call:
v4l2-request: S_EXT_CTRLS EINVAL: num_controls=4 error_idx=4
ctrl[0]: id=0x00a40902 size=1048 # V4L2_CID_STATELESS_H264_SPS
ctrl[1]: id=0x00a40903 size=12 # V4L2_CID_STATELESS_H264_PPS
ctrl[2]: id=0x00a40907 size=560 # V4L2_CID_STATELESS_H264_DECODE_PARAMS
ctrl[3]: id=0x00a40904 size=480 # V4L2_CID_STATELESS_H264_SCALING_MATRIX
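error_idx == num_controls is the kernel's "no specific control identified" sentinel: the rejection is not attributed to any single control index. For VIDIOC_S_EXT_CTRLS the kernel also reports this value when its pre-hardware validation step fails, so the sentinel alone does not rule out a bad field inside one control; VIDIOC_TRY_EXT_CTRLS on the same payload may narrow the failing index. Sizes match kernel UAPI (v4l2_ctrl_h264_sps=1048, etc.), so this is NOT a struct-size mismatch.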
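The error_idx reading above can be expressed as a small classifier. This is a sketch only: classify_error_idx() and its enum are hypothetical names, not part of the driver or of the V4L2 UAPI. The rule it encodes (error_idx equal to the control count means no control was singled out) follows the V4L2 extended-controls documentation.

```c
#include <assert.h>
#include <stdint.h>

enum ctrl_failure {
    CTRL_FAIL_SPECIFIC,     /* error_idx names the offending control */
    CTRL_FAIL_UNATTRIBUTED  /* error_idx == count: no control singled out */
};

static enum ctrl_failure classify_error_idx(uint32_t count, uint32_t error_idx)
{
    /* Assumption: the kernel never reports an index beyond count. */
    assert(error_idx <= count);
    return error_idx == count ? CTRL_FAIL_UNATTRIBUTED : CTRL_FAIL_SPECIFIC;
}
```

With the values logged on frame 11 (num_controls=4, error_idx=4), this yields CTRL_FAIL_UNATTRIBUTED.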
The failing frame is a single-slice P-frame post-IDR: slice_type=0 frame_num=5 poc_lsb=20 flags=SHORT_TERM_REFERENCE. Sonnet review 7.5 ("mid-stream non-IDR") fits this signature better than 7.2 (multi-slice num_ref_idx) which doesn't apply to single-slice frames.
Phase 4 plan explicitly framed Track A's fix as Phase 7+ work informed by the rig: "No code fix in Phase 4. The fix requires knowing WHICH V4L2 control field returns EINVAL on frame 11." iter3 delivered the rig that makes that diagnosis reproducible. The next step — read hantro_g1_h264_dec.c::set_params() validation, diff against our DECODE_PARAMS / SLICE_PARAMS / SPS / PPS construction, narrow the failing field — is iter4's locked question.
What landed
libva-v4l2-request-fourier commits
- media.c::media_request_wait_completion: replace select(except_fds) with poll(POLLPRI) for sandbox compatibility.
- v4l2.c::v4l2_ioctl_controls: Y2 instrumentation. On VIDIOC_S_EXT_CTRLS returning -EINVAL, log num_controls, error_idx, and per-control id + size. Pure diagnostic add-on; no behavior change. Should be removed at iter4's DEBUG sweep alongside iter1's instrumentation.
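The Y2 logging described above could look roughly like the following. This is a sketch under assumptions, not the code in v4l2.c: struct ctrl_desc and format_einval_report() are hypothetical stand-ins (the real driver works on the kernel's struct v4l2_ext_controls), and the report is written to a caller-supplied buffer so the format can be checked in isolation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of one control: just what Y2 logs. */
struct ctrl_desc {
    uint32_t id;
    uint32_t size;
};

/* Writes the report into buf; returns 0 on success, -1 if buf is too small. */
static int format_einval_report(char *buf, size_t len,
                                const struct ctrl_desc *ctrls,
                                uint32_t num_controls, uint32_t error_idx)
{
    size_t used;
    int n;

    assert(buf != NULL && (ctrls != NULL || num_controls == 0));

    n = snprintf(buf, len,
                 "v4l2-request: S_EXT_CTRLS EINVAL: num_controls=%u error_idx=%u\n",
                 (unsigned)num_controls, (unsigned)error_idx);
    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    /* One line per control, matching the id/size layout in the log above. */
    for (uint32_t i = 0; i < num_controls; i++) {
        n = snprintf(buf + used, len - used, "ctrl[%u]: id=0x%08x size=%u\n",
                     (unsigned)i, (unsigned)ctrls[i].id,
                     (unsigned)ctrls[i].size);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Because the function only formats, it changes no driver behavior, which matches the "pure diagnostic add-on" intent and keeps removal at a later DEBUG sweep trivial.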
libva-multiplanar campaign artifacts
- firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch: three-hunk Firefox patch (broker policy two hunks, seccomp policy one hunk). Applied via Arch PKGBUILD overlay in the boltzmann LXD container.
- firefox-fourier/PKGBUILD-overlay.md: verified working PKGBUILD overlay strategy: pkgrel=1.1, arch=(x86_64 aarch64), our patch in source=() + prepare(), onnxruntime stripped, --skippgpcheck for Mozilla key rotation. No --enable-v4l2 (Mozilla 150 auto-enables on aarch64+GTK).
- firefox-fourier/bootstrap.sh: reproducible bootstrap inside the LXD container.
- phase2_iter3_situation.md: Mozilla sandbox source verbatim (broker policy + cap filter quoted).
- phase3_iter3_baseline.md: pre-patch baseline anchored from iter2-close evidence (ohm offline at Phase 3 time).
- phase4_iter3_plan.md: Phase 4 plan + Phase 5 review checklist.
- phase5_iter3_review.md: sonnet review (Y1 patch idiom fix, Y2 driver error_idx instrumentation requirement, B-slice copy-paste finding kept for iter4).
- phase6_iter3_findings.md: six build-side surprises (proper unified-diff, no --enable-v4l2, GPG rotation, ALARM-stale wasi cluster, onnxruntime gap, "no tricks" lesson).
- phase8_iteration3_close.md: this file.
Build infrastructure introduced
- firefox-fourier LXD container on boltzmann (RK3588 aarch64, 8 cores, 24 GB RAM, 787 GB free on /build NVMe). Provisioned by the his agent. Persistent (autostart=true). Useful for iter4 if Firefox rebuilds become necessary.
- Upstream Arch x86_64 wasi packages (arch=any) cached at /build/aur/wasi/upstream-any/. ALARM extra is years stale on these; the same fix pattern is likely needed for any future ALARM container needing current wasi tooling.
- Phase 7 evidence collector: /home/mfritsche/iter3_phase7_evidence.sh on ohm.vpn. Honors LOG= env override, prints per-track verdict.
- Autonomous Phase 7 runner: /tmp/run_phase7_v2.sh on ohm.vpn. Discovers Plasma session env from a long-running user process, launches firefox-fourier, captures stderr, kills cleanly. Tmpfs-volatile.
State that carries to iter4
- Hardware: ohm RK3568 hantro G1/G2, kernel 6.19.10. Userspace versions all unchanged (firefox 150.0.1, libva 2.23.0, mesa 26.0.5, libdrm 2.4.131).
- Driver installed: /usr/lib/dri/v4l2_request_drv_video.so sha256 70a2bb1e16012a5d... (iter3 build with poll() fix + Y2 instrumentation).
- Firefox installed: /opt/firefox-fourier/firefox (Mozilla Firefox 150.0.1, libxul.so 3.59 GB: PGO-instrumented stage-1 binary, functionally equivalent to release for our purposes; iter4 may want a clean PGO-disabled rebuild for performance).
- Test fixture: /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 (sha256 dcf8a7170fbd...).
- Access path to ohm: ohm.vpn (changed from ohm.fritz.box mid-iteration). Autonomous test rig works without operator intervention via Plasma session env discovery.
- Build container: firefox-fourier LXD on boltzmann, accessed via ssh -J boltzmann builder@firefox-fourier. Source still extracted at /build/aur/firefox-fourier/src/firefox-150.0.1/ with iter3 patches applied.
State that does NOT carry
- The PGO instrumentation profile attempt always crashes at exit with LLVM Profile Error: Permission denied writes: irrelevant noise, will recur on every run of this binary.
- /tmp/ff-fourier-stderr-v2.log is tmpfs-volatile. Anchor before reboot if needed; iter3's Phase 7 anchored evidence is in this campaign repo's commit history (script outputs were captured in the close).
Documented limitations carried into iteration 4 substrate
- Track A unfixed. The frame-11 EINVAL is the natural iter4 lock. With the rig and Y2 in place, iter4 starts with a richer baseline than iter3 did.
- Mpv libplacebo --vo=gpu regression (carried from iter3 substrate, never iter3-scope). Unable to request buffers: Device or resource busy followed by SEGV during a downscale-probe surface creation. Vulkan init fails on this Plasma session; Mesa/Mozilla update may have shifted the fallback path. iter4 candidate.
- VAAPI consumer probe robustness (existing memory feedback_consumer_probe_calls.md): ffmpeg's av_hwframe_ctx_init calls vaDeriveImage on never-decoded surfaces. Our cap_pool tolerates this post-iter2; iter4 work shouldn't regress.
- PGO profile generation under sandbox. Phase 6 finding: --enable-profile-generate=cross PGO step needs an X11/Wayland display the LXC container can't provide. iter4 may want a clean PGO-disabled rebuild.
Lessons distilled to memory
- feedback_no_tricks_revert_first.md (NEW): when the user redirects on an in-flight workaround, the first action is to revert the workaround on disk, not continue diagnosing with the trick still active. iter3 lost ~1h to a stale background makepkg running against a python-edited PKGBUILD that had --without-wasm-sandboxed-libraries substituted in after the user said "no tricks." The his subagent caught and reverted it; the lesson is: do that proactively.
- feedback_seccomp_returns_enosys.md (NEW): Mozilla's RDD seccomp policy returns SECCOMP_RET_ERRNO with ENOSYS for filtered syscalls, not SIGSYS. Phase 2's deferral defaulted to "we'll see SIGSYS if seccomp blocks something"; that assumption was wrong. ENOSYS surfaces as Function not implemented strerror in driver logs, easy to miss. Pattern: any "not implemented" errno from a sandboxed process under Mozilla's filter, suspect seccomp first.
- reference_alarm_stale_wasi.md (NEW): ALARM (Arch Linux ARM) extra repo's wasi-* packages are 4 years stale (sdk-13 era). Mozilla 150 + clang 22 require sdk-33 wasm32-wasip1 toolchain. Fix: install upstream Arch x86_64 arch=any packages directly from geo.mirror.pkgbuild.com. Cached at /build/aur/wasi/upstream-any/ on boltzmann firefox-fourier container.
- reference_firefox_fourier_container.md (NEW): boltzmann LXD firefox-fourier container: builder@firefox-fourier via ssh -J boltzmann, /build is NVMe-backed bind-mount with 787 GB free, all Firefox build prereqs staged. Persistent across boltzmann reboots.
(Process memory feedback_replicate_baseline_first.md continues to apply; iter3's Phase 3 anchored from iter2-close evidence rather than re-acquiring with ohm offline, which was the right call when ohm was unreachable but the substrate state was unchanged within hours.)
Bootlin upstream outlook
iter3 produces a Firefox patch that's a candidate for upstream Mozilla submission (currently no Mozilla bug exists for /dev/media* + V4L2-stateless RDD sandbox per Phase 0 Sonnet research). The patch is ~50 lines across two files; reviewer concerns would center on /dev/media* rdwr enumeration on x86 desktop where media controllers can be ISP/webcam (not just codec). For ARM-embedded targets the patch is well-scoped. Per feedback_no_upstream.md, no PR/MR happens without explicit operator instruction.
Driver-side select() → poll() change is a portable improvement that benefits any sandbox model, not just Mozilla's. Also a candidate for bootlin upstream — but again, deferred per policy.
Phase 1 success criterion — final
Quoted from phase0_findings_iter3.md:
Track F: Patched firefox-fourier (firefox-150.0.1 + RDD-sandbox patch) launched on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 engages our libva-v4l2-request backend, opens /dev/video1 + /dev/media0 from RDD process, and decodes ≥10 frames of bbb_1080p30 through hantro.
✓ HIT. ENETDOWN=0, cap_pool_init=1, BeginPicture=10, SyncSurface=42 (consumer probe overhead), EINVAL=0 in the first 10 frames.
Track A: Same patched-binary rig decodes ≥30s of bbb_1080p30 without Unable to set control(s): Invalid argument emerging in driver stderr.
✗ NOT HIT. EINVAL fires on the 11th BeginPicture (single-slice P-frame, frame_num=5 poc_lsb=20 slice_type=0), exactly the iter1+iter2 carryover. Track A's fix is iter4 territory; the diagnostic rig and Y2 instrumentation are now in place to make iter4's debug loop short.
Joint success: Both above, on the same patched binary, in the same operator session, with anchored evidence.
PARTIAL — F locked, A surfaced under controlled rig with rich diagnostics. iter3 closes at "F+A in parallel, F achieved, A diagnosed-but-deferred." Honest accounting.