# Iteration 3 close (Phase 8): F+A locked, F GREEN, A reproduced + diagnosed

Opened 2026-05-04, closing 2026-05-05. Locked candidate: **F (Firefox RDD sandbox verify-by-patch) + A (frame-11 EINVAL diagnose)** running in parallel on a single firefox-fourier build.

## Verdict per track

### Track F: GREEN

Patched Firefox 150.0.1 (firefox-fourier, `pkgrel=1.1`) launched on ohm **without `MOZ_DISABLE_RDD_SANDBOX=1`** engages our libva-v4l2-request backend, opens `/dev/video1` + `/dev/media0` from the sandboxed RDD process, and submits decode requests through `MEDIA_REQUEST_IOC_*` ioctls. The ENETDOWN signature from iter2 is gone; libva is fully initialized; decode reaches the same frame-10 mark as iter2's sandbox-bypass run, which shows that the patched sandbox is functionally equivalent to the bypass for V4L2 stateless decode.

Three distinct gates needed patching to reach this state. Phase 2 had identified one (broker policy) and explicitly deferred the seccomp question to empirical Phase 7. Phase 7 surfaced two MORE gates beyond what Phase 2 anticipated:

1. **Broker policy** (`security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp`):
   - `AddV4l2Dependencies()` cap-filter widened: admit `(CAPTURE_MPLANE & OUTPUT_MPLANE & STREAMING)` for stateless decoders that don't advertise `M2M`.
   - New `AddV4l2RequestApiDependencies()` enumerates `/dev/media*` as rdwr.
2. **Seccomp policy** (`security/sandbox/linux/SandboxFilter.cpp`):
   - Add ioctl magic byte `'|'` (media controller ioctls) to RDD's allowlist alongside existing `'V'` (V4L2). Without this, the request-allocation ioctl (`MEDIA_IOC_REQUEST_ALLOC`) returned ENOSYS; libva couldn't allocate request fds.
3. **Driver-side** (`libva-v4l2-request-fourier/src/media.c`):
   - `media_request_wait_completion()` migrated from `select()` to `poll()`. Mozilla's RDD seccomp common policy admits `poll/ppoll/epoll_*` but not `select/pselect6`. Without this, `select()` returned ENOSYS even after the broker + ioctl gates opened.

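
The `select()` to `poll()` migration in gate 3 can be sketched as follows. This is a minimal illustration, not the driver's actual function body: `wait_request_done` and its `timeout_ms` parameter are hypothetical names, and the sketch assumes the request fd reports completion as an exceptional condition (`POLLPRI`), which is what `select(except_fds)` was waiting on.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (illustrative name, not taken from media.c).
 * Waits for POLLPRI on request_fd for up to timeout_ms milliseconds.
 * Returns 0 when POLLPRI is reported, -1 with errno set otherwise.
 */
int wait_request_done(int request_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = request_fd, .events = POLLPRI, .revents = 0 };
    int rc;

    /* Retry if a signal interrupts the wait; note this restarts the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;          /* errno comes from poll(), e.g. ENOSYS if a filter blocks it */

    if (rc == 0) {
        errno = ETIMEDOUT;  /* nothing happened within timeout_ms */
        return -1;
    }

    if (pfd.revents & POLLPRI)
        return 0;           /* exceptional condition reported on the fd */

    errno = EIO;            /* POLLERR, POLLHUP or POLLNVAL without POLLPRI */
    return -1;
}
```

Because the helper passes `errno` from `poll()` through unchanged, an ENOSYS injected by a seccomp filter would still show up in the caller's log line rather than being folded into a generic failure.
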

Driver-side fix preferred over expanding Firefox seccomp: smaller surface, more portable across sandbox policies, and `poll()` is the modern API anyway. The Phase 2 deferral ("if patched binary trips SIGSYS, extend SandboxFilter") was correctly defensive but missed that Mozilla's seccomp returns ENOSYS via `SECCOMP_RET_ERRNO` rather than SIGSYS, a silent fall-through that we only caught by reading our driver's own log lines. Lesson distilled below.

### Track A: REPRODUCED + DIAGNOSED, NOT FIXED

Frame-11 EINVAL fires deterministically on the patched-sandbox rig, exactly matching iter1/iter2's carryover signature and ruling out "rig-specific" alibis. Decode succeeds for 10 BeginPictures (luma `var=0..4` confirms real NV12 output), then on the 11th `set_controls` call the kernel rejects with EINVAL.

Y2 instrumentation (`v4l2_ioctl_controls` extension, two iterations) now produces full diagnostic output on the failing call:

```
v4l2-request: S_EXT_CTRLS EINVAL: num_controls=4 error_idx=4
ctrl[0]: id=0x00a40902 size=1048 # V4L2_CID_STATELESS_H264_SPS
ctrl[1]: id=0x00a40903 size=12 # V4L2_CID_STATELESS_H264_PPS
ctrl[2]: id=0x00a40907 size=560 # V4L2_CID_STATELESS_H264_DECODE_PARAMS
ctrl[3]: id=0x00a40904 size=480 # V4L2_CID_STATELESS_H264_SCALING_MATRIX
```

`error_idx == num_controls` is the kernel's "all bad / no specific control identified" sentinel: the call was rejected as a whole, before any control was applied. For `VIDIOC_S_EXT_CTRLS` the kernel reports this value whenever its up-front validation step fails, so the sentinel does not by itself rule out a single offending field (`VIDIOC_TRY_EXT_CTRLS` is documented to return the failing index instead). Sizes match kernel UAPI (`v4l2_ctrl_h264_sps`=1048, etc.), so this is NOT a struct-size mismatch.

The failing frame is a single-slice P-frame post-IDR: `slice_type=0 frame_num=5 poc_lsb=20 flags=SHORT_TERM_REFERENCE`. Sonnet review 7.5 ("mid-stream non-IDR") fits this signature better than 7.2 (multi-slice num_ref_idx), which doesn't apply to single-slice frames.

Phase 4 plan explicitly framed Track A's fix as Phase 7+ work informed by the rig: *"No code fix in Phase 4.
The fix requires knowing WHICH V4L2 control field returns EINVAL on frame 11."* iter3 delivered the rig that makes that diagnosis reproducible. The next step (read `hantro_g1_h264_dec.c::set_params()` validation, diff against our DECODE_PARAMS / SLICE_PARAMS / SPS / PPS construction, narrow the failing field) is iter4's locked question.

## What landed

### libva-v4l2-request-fourier commits

- `media.c::media_request_wait_completion`: replace `select(except_fds)` with `poll(POLLPRI)` for sandbox compatibility.
- `v4l2.c::v4l2_ioctl_controls`: Y2 instrumentation. On `VIDIOC_S_EXT_CTRLS` returning -EINVAL, log `num_controls`, `error_idx`, and per-control `id`+`size`. Pure diagnostic add-on; no behavior change. Should be removed at iter4's DEBUG sweep alongside iter1's instrumentation.

### libva-multiplanar campaign artifacts

- `firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch`: three-hunk Firefox patch (broker policy two hunks, seccomp policy one hunk). Applied via Arch PKGBUILD overlay in the boltzmann LXD container.
- `firefox-fourier/PKGBUILD-overlay.md`: verified working PKGBUILD overlay strategy: `pkgrel=1.1`, `arch=(x86_64 aarch64)`, our patch in `source=()` + `prepare()`, onnxruntime stripped, `--skippgpcheck` for Mozilla key rotation. No `--enable-v4l2` (Mozilla 150 auto-enables on aarch64+GTK).
- `firefox-fourier/bootstrap.sh`: reproducible bootstrap inside the LXD container.
- `phase2_iter3_situation.md`: Mozilla sandbox source verbatim (broker policy + cap filter quoted).
- `phase3_iter3_baseline.md`: pre-patch baseline anchored from iter2-close evidence (ohm offline at Phase 3 time).
- `phase4_iter3_plan.md`: Phase 4 plan + Phase 5 review checklist.
- `phase5_iter3_review.md`: sonnet review (Y1 patch idiom fix, Y2 driver `error_idx` instrumentation requirement, B-slice copy-paste finding kept for iter4).
- `phase6_iter3_findings.md`: six build-side surprises (proper unified-diff, no `--enable-v4l2`, GPG rotation, ALARM-stale wasi cluster, onnxruntime gap, "no tricks" lesson).
- `phase8_iteration3_close.md`: this file.

### Build infrastructure introduced

- `firefox-fourier` LXD container on **boltzmann** (RK3588 aarch64, 8 cores, 24 GB RAM, 787 GB free on `/build` NVMe). Provisioned by the `his` agent. Persistent (autostart=true). Useful for iter4 if Firefox rebuilds become necessary.
- Upstream Arch x86_64 wasi packages (`arch=any`) cached at `/build/aur/wasi/upstream-any/`. ALARM extra is years stale on these; the same fix pattern is likely needed for any future ALARM container needing current wasi tooling.
- Phase 7 evidence collector: `/home/mfritsche/iter3_phase7_evidence.sh` on ohm.vpn. Honors `LOG=` env override, prints per-track verdict.
- Autonomous Phase 7 runner: `/tmp/run_phase7_v2.sh` on ohm.vpn. Discovers Plasma session env from a long-running user process, launches firefox-fourier, captures stderr, kills cleanly. Tmpfs-volatile.

## State that carries to iter4

- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Userspace versions all unchanged (firefox 150.0.1, libva 2.23.0, mesa 26.0.5, libdrm 2.4.131).
- **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `70a2bb1e16012a5d...` (iter3 build with poll() fix + Y2 instrumentation).
- **Firefox installed**: `/opt/firefox-fourier/firefox` (Mozilla Firefox 150.0.1, libxul.so 3.59 GB; PGO-instrumented stage-1 binary, functionally equivalent to release for our purposes; iter4 may want a clean PGO-disabled rebuild for performance).
- **Test fixture**: `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` (sha256 `dcf8a7170fbd...`).
- **Access path to ohm**: `ohm.vpn` (changed from `ohm.fritz.box` mid-iteration). Autonomous test rig works without operator intervention via Plasma session env discovery.
- **Build container**: `firefox-fourier` LXD on boltzmann, accessed via `ssh -J boltzmann builder@firefox-fourier`. Source still extracted at `/build/aur/firefox-fourier/src/firefox-150.0.1/` with iter3 patches applied.

## State that does NOT carry

- The PGO-instrumented binary always crashes at exit with `LLVM Profile Error: Permission denied` on its profile writes. This is irrelevant noise and will recur on every run of this binary.
- `/tmp/ff-fourier-stderr-v2.log` is tmpfs-volatile. Anchor before reboot if needed; iter3's Phase 7 anchored evidence is in this campaign repo's commit history (script outputs were captured in the close).

## Documented limitations carried into iteration 4 substrate

- **Track A unfixed**. The frame-11 EINVAL is the natural iter4 lock. With the rig and Y2 in place, iter4 starts with a richer baseline than iter3 did.
- **Mpv libplacebo `--vo=gpu` regression** (carried from iter3 substrate, never iter3-scope). `Unable to request buffers: Device or resource busy` followed by SEGV during a downscale-probe surface creation. Vulkan init fails on this Plasma session; Mesa/Mozilla update may have shifted the fallback path. iter4 candidate.
- **VAAPI consumer probe robustness** (existing memory `feedback_consumer_probe_calls.md`): ffmpeg's `av_hwframe_ctx_init` calls vaDeriveImage on never-decoded surfaces. Our cap_pool tolerates this post-iter2; iter4 work shouldn't regress.
- **PGO profile generation under sandbox**. Phase 6 finding: the `--enable-profile-generate=cross` PGO step needs an X11/Wayland display that the LXD container can't provide. iter4 may want a clean PGO-disabled rebuild.

## Lessons distilled to memory

- **`feedback_no_tricks_revert_first.md`** (NEW): when the user redirects on an in-flight workaround, the first action is to revert the workaround on disk, not continue diagnosing with the trick still active.
  iter3 lost ~1h to a stale background makepkg running against a python-edited PKGBUILD that had `--without-wasm-sandboxed-libraries` substituted in after the user said "no tricks." The `his` subagent caught and reverted it; the lesson is: do that proactively.
- **`feedback_seccomp_returns_enosys.md`** (NEW): Mozilla's RDD seccomp policy returns `SECCOMP_RET_ERRNO` with `ENOSYS` for filtered syscalls, not `SIGSYS`. Phase 2's deferral defaulted to "we'll see SIGSYS if seccomp blocks something"; that assumption was wrong. ENOSYS surfaces as a `Function not implemented` strerror in driver logs and is easy to miss. Pattern: for any "not implemented" errno from a sandboxed process under Mozilla's filter, suspect seccomp first.
- **`reference_alarm_stale_wasi.md`** (NEW): the ALARM (Arch Linux ARM) extra repo's wasi-* packages are 4 years stale (sdk-13 era). Mozilla 150 + clang 22 require the sdk-33 wasm32-wasip1 toolchain. Fix: install upstream Arch x86_64 `arch=any` packages directly from `geo.mirror.pkgbuild.com`. Cached at `/build/aur/wasi/upstream-any/` on the boltzmann firefox-fourier container.
- **`reference_firefox_fourier_container.md`** (NEW): boltzmann LXD `firefox-fourier` container: builder@firefox-fourier via `ssh -J boltzmann`, /build is an NVMe-backed bind-mount with 787 GB free, all Firefox build prereqs staged. Persistent across boltzmann reboots.

(Process memory `feedback_replicate_baseline_first.md` continues to apply; iter3's Phase 3 anchored from iter2-close evidence rather than re-acquiring with ohm offline, which was the right call when ohm was unreachable but the substrate state was unchanged within hours.)

## Bootlin upstream outlook

iter3 produces a Firefox patch that's a candidate for upstream Mozilla submission (currently no Mozilla bug exists for /dev/media* + V4L2-stateless RDD sandbox per Phase 0 Sonnet research).
The patch is ~50 lines across two files; reviewer concerns would center on `/dev/media*` rdwr enumeration on x86 desktop, where media controllers can be ISP/webcam (not just codec). For ARM-embedded targets the patch is well-scoped. Per `feedback_no_upstream.md`, no PR/MR happens without explicit operator instruction.

Driver-side `select() → poll()` change is a portable improvement that benefits any sandbox model, not just Mozilla's. It is also a candidate for bootlin upstream but, again, deferred per policy.

## Phase 1 success criterion: final

Quoted from `phase0_findings_iter3.md`:

> **Track F:** Patched `firefox-fourier` (firefox-150.0.1 + RDD-sandbox patch) launched on ohm WITHOUT `MOZ_DISABLE_RDD_SANDBOX=1` engages our libva-v4l2-request backend, opens `/dev/video1` + `/dev/media0` from RDD process, and decodes ≥10 frames of bbb_1080p30 through hantro.

✓ HIT. ENETDOWN=0, cap_pool_init=1, BeginPicture=10, SyncSurface=42 (consumer probe overhead), EINVAL=0 in the first 10 frames.

> **Track A:** Same patched-binary rig decodes ≥30s of bbb_1080p30 without `Unable to set control(s): Invalid argument` emerging in driver stderr.

✗ NOT HIT. EINVAL fires on the 11th BeginPicture (single-slice P-frame, `frame_num=5 poc_lsb=20 slice_type=0`), exactly the iter1+iter2 carryover. Track A's fix is iter4 territory; the diagnostic rig and Y2 instrumentation are now in place to make iter4's debug loop short.

> **Joint success:** Both above, on the same patched binary, in the same operator session, with anchored evidence.

PARTIAL: F locked, A surfaced under controlled rig with rich diagnostics.

iter3 closes at "F+A in parallel, F achieved, A diagnosed-but-deferred." Honest accounting.