Files
libva-multiplanar/phase5_iter3_review.md
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00

6.9 KiB

Iteration 3 — Phase 5 (sonnet review of Phase 4 deliverables)

Reviewer: Claude Sonnet 4.6, in-conversation subagent. Date: 2026-05-04. Inputs reviewed: phase0_findings_iter3.md, phase2_iter3_situation.md, phase3_iter3_baseline.md, phase4_iter3_plan.md, firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch, firefox-fourier/PKGBUILD-overlay.md. Reviewer additionally read the actual Mozilla source via fetch (not relying solely on Phase 2's quoted excerpts) and the libva-v4l2-request-fourier/src/ nested fork.

Verdict

YELLOW — proceed to Phase 6 with two named required fixes.

Findings

Y1 (BLOCKER for Phase 6) — string idiom mismatch in new function

The patch's AddV4l2RequestApiDependencies constructs the path as:

nsAutoCString path("/dev/");
path.Append(dir_entry->d_name);

The existing Mozilla codebase in the same translation unit uses:

nsCString path = "/dev/"_ns;
path += nsDependentCString(dir_entry->d_name);

nsAutoCString is a stack-buffered subclass of nsCString and the (const char*) constructor + .Append() exist, so the patch likely compiles, but it diverges from the file's own idiom and would be flagged by any Mozilla reviewer. Match the existing style.

Fix applied to firefox-fourier/0001-rdd-allow-stateless-v4l2-request-api.patch before build kickoff.

Y2 (BLOCKER for Phase 7 capture, not Phase 6) — driver does not log error_idx

v4l2_set_controls() in libva-v4l2-request-fourier currently logs "Unable to set control(s): %s" on VIDIOC_S_EXT_CTRLS failure, but does not surface controls.error_idx. When errno == EINVAL, that field names exactly which control in the array was rejected. Without it, Phase 7 capture is no more diagnostic than iter2's existing log — Track A's plan to identify "which control fails on frame 11" cannot succeed.

Fix: in v4l2_set_controls() (or whichever wrapper actually calls VIDIOC_S_EXT_CTRLS), after the ioctl returns -1 with EINVAL, log ext_controls.error_idx, the offending control's id (with V4L2_CID_* symbolic name lookup), and its size/value content. One-line change. Apply at Phase 6 alongside the firefox-fourier build (driver build is independent and fast).

Bonus (not Phase 4 induced; potential Track A fix candidate) — B-slice ref-list-1 copy-paste bug

In libva-v4l2-request-fourier/src/h264.c, the h264_va_slice_to_v4l2() function around line 663 has the B-slice ref-list-1 loop writing slice->ref_pic_list0[i].fields = fields instead of slice->ref_pic_list1[i].fields = fields. L1 entries .fields member is being written into L0 slot.

For bbb_1080p30 (mostly I+P frames in the BBB SFX intro segment), this bug may not fire. If frame 11 happens to be a B-frame in this stream, this could be the EINVAL cause — or could contribute to silent reference-list corruption with a downstream EINVAL signature.

Disposition: do NOT speculative-fix at Phase 6. We don't yet know whether frame 11 is a B-frame. Y2's error_idx logging will reveal whether the failing control is a SLICE_PARAMS field touching ref_pic_list1 — if yes, the copy-paste fix becomes the obvious patch. Save the candidate fix for Phase 7's analysis stage.

Minor — --skipinteg vs --skipchecksums in PKGBUILD overlay doc

The overlay doc references makepkg -ef --skipinteg. On modern Arch makepkg (7.1.0 inside the firefox-fourier container) the flag is --skipchecksums. Both work via --skipinteg aliasing in some pacman branches but --skipchecksums is canonical. Cosmetic; fix later.

Phase 6 finding (overrides Sonnet) — --enable-v4l2 is NOT a Mozilla 150 configure flag

Sonnet's review noted an --enable-v4l2 mozconfig "verify present" gate. Empirical Phase 6 ground-truth (2026-05-05): Mozilla 150 has no --enable-v4l2 flag at all. Adding it crashes configure with mozbuild.configure.options.InvalidOptionError: Unknown option: --enable-v4l2. The actual gate is in toolkit/moz.configure:643-651:

@depends(target, toolkit_gtk)
def v4l2(target, toolkit_gtk):
    # V4L2 decode is only used in GTK/Linux and generally only appears on
    # embedded SOCs.
    if target.cpu in ("arm", "aarch64", "riscv64") and toolkit_gtk:
        return True

set_config("MOZ_ENABLE_V4L2", True, when=v4l2)
set_define("MOZ_ENABLE_V4L2", True, when=v4l2)

So MOZ_ENABLE_V4L2 is automatically set whenever the target is arm/aarch64/riscv64 and the toolkit is GTK. boltzmann's container is aarch64+GTK → MOZ_ENABLE_V4L2 is implicitly true; our patch's #ifdef MOZ_ENABLE_V4L2 blocks compile in normally.

This is a tighter binding than --enable-v4l2 would have been: x86_64 desktop builds will NOT compile our patch. Acceptable for ohm; the ARM-only auto-enable in moz.configure also explains why upstream Mozilla doesn't ship --enable-v4l2 as a user-facing option — it's a target-architecture decision, not a per-build choice. Filing-day implication: any upstream submission of this patch should not add a configure-flag toggle, but live inside the existing MOZ_ENABLE_V4L2 ifdef.

Minor — mozconfig linker flag check at Phase 6 start

The upstream Arch PKGBUILD targets arch=(x86_64) and may not include aarch64-specific linker hints (--enable-linker=lld or equivalent). Low probability of build break given boltzmann's rust 1.95 + clang 22, but check grep -E 'lld|linker' mozconfig before kickoff. ALARM-style PKGBUILDs sometimes patch this; upstream Arch may not.

Cap-filter and security review (NOT findings — green-lit)

The reviewer confirms:

  • The (CAPTURE_MPLANE & OUTPUT_MPLANE & STREAMING) triple-AND for stateless decoders is the correct guard — camera-capture-only nodes lack OUTPUT_MPLANE; display-output-only nodes lack CAPTURE_MPLANE; the union with M2M arms is idempotent in AddPath.
  • /dev/media* rdwr enumeration on the embedded ARM target is in the same security domain as /dev/video* already-rdwr — not a campaign-blocking attack-surface increase. For upstream Mozilla submission, a reviewer would prefer filtering by MEDIA_IOC_DEVICE_INFO + MEDIA_ENT_F_PROC_VIDEO_DECODER, but the campaign goal (verify on ohm) is well-served by the blunt enumeration. Note for an eventual Mozilla bug filing.
  • Seccomp deferral is sound: ENETDOWN at open() time is broker-policy evidence; SIGSYS at ioctl time is unmistakable and different. Deferring SandboxFilter.cpp to Phase 7 empirical is correct.
  • PKGBUILD pattern (rename + conflicts + provides) is valid and standard. makepkg -e semantics in the doc match makepkg actual behavior.

Phase 5 → Phase 6 transition gates

  • Y1 patch fix applied (this Phase 5 close).
  • Y2 driver instrumentation applied (this Phase 5 close, in libva-v4l2-request-fourier).
  • Phase 6 build kicked off in firefox-fourier container.
  • Phase 6 first action: grep -E 'lld|linker' mozconfig after PKGBUILD fetch.
  • Phase 7 includes the B-slice bug as a candidate Track A fix; trigger only if Y2's error_idx log names a ref_pic_list1 field.