libva-multiplanar/phase2_iter3_situation.md
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00


Iteration 3 — Phase 2 (situation analysis: Mozilla sandbox source)

Goal of this phase: confirm or refute Sonnet's static-analysis verdict from iter3 substrate candidate F. The verdict says: Firefox's RDD sandbox blocks hantro decode because (a) /dev/media* is missing from GetRDDPolicy(), and (b) AddV4l2Dependencies() filters /dev/video* by an M2M-only capability check that excludes stateless decoders. We need verbatim source confirmation before authoring the patch.

Source: mozilla-release branch on searchfox.org as of 2026-05-04. This branch reflects Firefox 150.x (matches the 150.0.1 binary on ohm).

Finding 1 — GetRDDPolicy() confirms zero /dev/media* references

Verbatim excerpt from security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp::GetRDDPolicy:

/* static */ UniquePtr<SandboxBroker::Policy>
SandboxBrokerPolicyFactory::GetRDDPolicy(int aPid) {
  auto policy = MakeUnique<SandboxBroker::Policy>();
  AddSharedMemoryPaths(policy.get(), aPid);
  policy->AddPath(rdonly, "/dev/urandom");
  policy->AddPath(rdonly, "/proc/cpuinfo");
  policy->AddPath(rdonly,
      "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq");
  policy->AddPath(rdonly, "/sys/devices/system/cpu/cpu0/cache/index2/size");
  policy->AddPath(rdonly, "/sys/devices/system/cpu/cpu0/cache/index3/size");
  policy->AddTree(rdonly, "/sys/devices/cpu");
  policy->AddTree(rdonly, "/sys/devices/system/cpu");
  policy->AddTree(rdonly, "/sys/devices/system/node");
  policy->AddTree(rdonly, "/lib");
  policy->AddTree(rdonly, "/lib64");
  policy->AddTree(rdonly, "/usr/lib");
  policy->AddTree(rdonly, "/usr/lib32");
  policy->AddTree(rdonly, "/usr/lib64");
  policy->AddTree(rdonly, "/run/opengl-driver/lib");
  policy->AddTree(rdonly, "/nix/store");
  AddMemoryReporting(policy.get(), aPid);
  AddGLDependencies(policy.get());
  AddLdconfigPaths(policy.get());
  AddLdLibraryEnvPaths(policy.get());
#ifdef MOZ_ENABLE_V4L2
  AddV4l2Dependencies(policy.get());
#endif
  // ... NVIDIA Tegra ARM64 conditional block ...
#if defined(MOZ_PROFILE_GENERATE)
  AddLLVMProfilePathDirectory(policy.get());
#endif
  if (policy->IsEmpty()) {
    policy = nullptr;
  }
  return policy;
}

Verdict: Confirmed. GetRDDPolicy() contains zero references to /dev/media, /dev/v4l/by-path, or any media-controller path. The only V4L2-relevant entry point is AddV4l2Dependencies() under MOZ_ENABLE_V4L2. Sonnet's static analysis is correct on this point.

AddGLDependencies() handles /dev/dri/renderD* separately — that's why GPU access works even though it's not visible here.

Finding 2 — AddV4l2Dependencies() confirms M2M-only cap filter

Verbatim excerpt from same file:

#ifdef MOZ_ENABLE_V4L2
static void AddV4l2Dependencies(SandboxBroker::Policy* policy) {
  DIR* dir = opendir("/dev");
  if (!dir) {
    SANDBOX_LOG("Couldn't list /dev");
    return;
  }
  struct dirent* dir_entry;
  while ((dir_entry = readdir(dir))) {
    if (strncmp(dir_entry->d_name, "video", 5)) {
      continue;  // Not a /dev/video* device, ignore
    }
    // ... open each /dev/video* device ...
    struct v4l2_capability cap;
    int result = ioctl(fd, VIDIOC_QUERYCAP, &cap);
    if (result < 0) {
      SANDBOX_LOG("Couldn't query capabilities...");
      close(fd);
      continue;
    }
    if ((cap.device_caps & V4L2_CAP_VIDEO_M2M) ||
        (cap.device_caps & V4L2_CAP_VIDEO_M2M_MPLANE)) {
      policy->AddPath(rdwr, path.get());
    }
    close(fd);
  }
  closedir(dir);
  policy->AddPath(rdonly, "/dev");
}
#endif

Verdict: Confirmed. The cap test admits a node only if V4L2_CAP_VIDEO_M2M or V4L2_CAP_VIDEO_M2M_MPLANE is set in device_caps. If hantro's stateless node reports only V4L2_CAP_VIDEO_CAPTURE_MPLANE | V4L2_CAP_VIDEO_OUTPUT_MPLANE | V4L2_CAP_STREAMING, as the substrate assumes, neither M2M bit is set and /dev/video1 is silently skipped. With no node admitted, the RDD policy contains no /dev/video* path at all, only the read-only /dev entry added at the end of the function.
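The gate can be restated as a pure predicate on device_caps, which makes the stateless-decoder case easy to check. The sketch below is illustrative only: the k-prefixed constants and both function names are hypothetical, the bit values are copied from <linux/videodev2.h>, and PassesRelaxedGate is one possible relaxation, not Mozilla code.

```cpp
#include <cassert>
#include <cstdint>

// Bit values as defined in <linux/videodev2.h>, redeclared under
// hypothetical names so the sketch compiles without kernel headers.
constexpr std::uint32_t kCapCaptureMplane = 0x00001000;  // V4L2_CAP_VIDEO_CAPTURE_MPLANE
constexpr std::uint32_t kCapOutputMplane  = 0x00002000;  // V4L2_CAP_VIDEO_OUTPUT_MPLANE
constexpr std::uint32_t kCapM2mMplane     = 0x00004000;  // V4L2_CAP_VIDEO_M2M_MPLANE
constexpr std::uint32_t kCapM2m           = 0x00008000;  // V4L2_CAP_VIDEO_M2M
constexpr std::uint32_t kCapStreaming     = 0x04000000;  // V4L2_CAP_STREAMING

// Mirrors the test in the excerpt: admit the node if either M2M bit is set.
constexpr bool PassesM2mGate(std::uint32_t device_caps) {
  return (device_caps & (kCapM2m | kCapM2mMplane)) != 0;
}

// Assumed relaxation: also admit nodes that have capture-mplane,
// output-mplane and streaming all set, even without an M2M bit.
constexpr bool PassesRelaxedGate(std::uint32_t device_caps) {
  constexpr std::uint32_t kStateless =
      kCapCaptureMplane | kCapOutputMplane | kCapStreaming;
  return PassesM2mGate(device_caps) ||
         (device_caps & kStateless) == kStateless;
}
```

A node reporting only capture-mplane, output-mplane and streaming fails PassesM2mGate but passes PassesRelaxedGate. A node with just one of the two queue types still fails both.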

Cross-check against ohm: the vainfo-time output for hantro G1 H.264 was recorded as Driver Capabilities: 0x00d04000 = V4L2_CAP_VIDEO_CAPTURE_MPLANE|V4L2_CAP_STREAMING|V4L2_CAP_VIDEO_OUTPUT_MPLANE|V4L2_CAP_VIDEO_M2M_MPLANE. That flag list includes M2M_MPLANE, which contradicts the assumption above that hantro sets neither M2M bit. The hex value also does not match the listed flags under the <linux/videodev2.h> bit values: it has 0x00004000 (M2M_MPLANE) set, but not 0x00001000, 0x00002000 or 0x04000000. The record is therefore unreliable as transcribed.

Open question: does hantro on this kernel set M2M_MPLANE? Re-verify at Phase 3 / Phase 7 with v4l2-ctl --device=/dev/video1 --info on the test rig. If hantro does set M2M_MPLANE, /dev/video1 already passes the filter and the only missing piece is /dev/media0. If not, both gates need patching. (Sonnet's substrate claim that hantro is "explicitly excluded by this filter" was the basis for assuming both gates fail; an empirical check on the test rig is the cheap confirmation.)
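When the capability word is re-read on the rig, decoding it bit by bit avoids transcription slips. The helper below is a minimal sketch: DecodeCaps, CapName and kCapNames are hypothetical names, the table covers only the five flags discussed in this note, and the values are taken from <linux/videodev2.h>.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct CapName {
  std::uint32_t bit;
  const char* name;
};

// Only the flags relevant to the RDD filter; values from <linux/videodev2.h>.
constexpr CapName kCapNames[] = {
    {0x00001000, "V4L2_CAP_VIDEO_CAPTURE_MPLANE"},
    {0x00002000, "V4L2_CAP_VIDEO_OUTPUT_MPLANE"},
    {0x00004000, "V4L2_CAP_VIDEO_M2M_MPLANE"},
    {0x00008000, "V4L2_CAP_VIDEO_M2M"},
    {0x04000000, "V4L2_CAP_STREAMING"},
};

// Returns the names of the table flags that are set in device_caps,
// in table order. Bits outside the table are ignored.
std::vector<std::string> DecodeCaps(std::uint32_t device_caps) {
  std::vector<std::string> names;
  for (const CapName& entry : kCapNames) {
    if (device_caps & entry.bit) {
      names.emplace_back(entry.name);
    }
  }
  return names;
}
```

If V4L2_CAP_VIDEO_M2M_MPLANE or V4L2_CAP_VIDEO_M2M appears in the result for /dev/video1, the existing filter already admits the node.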

Finding 3 — seccomp side (SandboxFilter.cpp) UNRESOLVED

Phase-2 attempts to fetch RDDSandboxPolicy (or whatever class implements RDD's seccomp policy) via searchfox returned truncated content; the relevant EvaluateSyscall(int sysno) / ioctl-handling section sits past WebFetch's content window. Searchfox's search API also doesn't render through WebFetch.

Open question: does Firefox's RDD seccomp policy filter ioctl() by the magic byte of the request number? If yes, MEDIA_REQUEST_IOC_QUEUE (defined in <linux/media.h> as _IO('|', 0x80): magic byte '|' = 0x7c, number 0x80) might be blocked even after the broker policy lets open(/dev/media0) through. ioctl is not brokered; it runs locally in RDD under seccomp.
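To make "filter by magic byte" concrete, the sketch below decomposes an ioctl request number using the generic Linux field layout. It assumes the asm-generic encoding used on x86 and arm64, takes MEDIA_REQUEST_IOC_QUEUE as _IO('|', 0x80) from <linux/media.h>, and uses hypothetical helper names (IocNr, IocType, MakeIo, IsMediaIoctl). It says nothing about how Mozilla's policy is actually written.

```cpp
#include <cassert>
#include <cstdint>

// Generic Linux ioctl encoding (asm-generic/ioctl.h): nr in bits 0-7,
// type ("magic") in bits 8-15, size in bits 16-29, direction in bits 30-31.
constexpr std::uint32_t IocNr(std::uint32_t request) {
  return request & 0xff;
}
constexpr std::uint32_t IocType(std::uint32_t request) {
  return (request >> 8) & 0xff;
}

// Equivalent of _IO(type, nr): no payload, so direction and size are zero.
constexpr std::uint32_t MakeIo(char type, std::uint32_t nr) {
  return (static_cast<std::uint32_t>(static_cast<unsigned char>(type)) << 8) |
         nr;
}

// MEDIA_REQUEST_IOC_QUEUE is _IO('|', 0x80) in <linux/media.h>.
constexpr std::uint32_t kMediaRequestIocQueue = MakeIo('|', 0x80);

// A filter keyed on the magic byte would apply a test like this one.
constexpr bool IsMediaIoctl(std::uint32_t request) {
  return IocType(request) == 0x7c;  // '|'
}

static_assert(kMediaRequestIocQueue == 0x7c80, "'|' << 8 | 0x80");
```

Under this layout every <linux/media.h> ioctl shares type 0x7c, while V4L2 ioctls such as VIDIOC_QUERYCAP (0x80685600 with the 104-byte struct v4l2_capability) carry type 'V' (0x56). A single magic-byte rule would therefore cover the whole media-controller family without touching V4L2 handling.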

Plan: defer to empirical Phase 7 test. Specifically:

  • If the patched Firefox runs and decodes ≥10 frames through hantro: seccomp is permissive on V4L2/media ioctls; broker-policy patch alone is sufficient.
  • If the patched Firefox SIGSYS-aborts on first ioctl after /dev/media0 open, with a MOZ_LOG=Sandbox:5 trace pointing at MEDIA_REQUEST_IOC_QUEUE: extend the patch with a SandboxFilter.cpp seccomp allow rule.

This is cheaper than chasing the seccomp source through three searchfox round trips — the patched binary is the source of truth, and the SIGSYS signature is unmistakable.

Empirical evidence supporting "broker is the load-bearing gate"

iter2 close (phase8_iteration2_close.md) recorded the failure mode under stock Firefox 150 (no patch, default sandbox):

"libva init fails inside RDD sandbox on open(/dev/media0) returning ENETDOWN — Firefox SW-falls-back."

ENETDOWN is the synthesized errno that Firefox's broker returns when the path policy denies access to a path the broker is asked to open. A seccomp denial would have produced a different signature: either the process is killed by SIGSYS (si_code SYS_SECCOMP), or the syscall fails with errno=EPERM. The failure surfacing as ENETDOWN at open() time is direct evidence that the broker's path policy is the active gate, which makes adding /dev/media0 to GetRDDPolicy the highest-leverage change.
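The signature-to-gate reasoning above can be written down as a small lookup, which is handy when triaging Phase 7 logs. This is a sketch of the note's own heuristic, not something read out of Mozilla's source; Gate, Observation and ClassifyGate are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>

enum class Gate { kNone, kBrokerPathPolicy, kSeccomp, kUnknown };

struct Observation {
  bool killed_by_sigsys;  // the process died with SIGSYS
  int failing_errno;      // errno of the failing syscall, 0 if none failed
};

// Maps an observed failure signature to the sandbox layer that most
// likely produced it, following the reasoning in this section.
Gate ClassifyGate(const Observation& obs) {
  if (obs.killed_by_sigsys) {
    return Gate::kSeccomp;
  }
  switch (obs.failing_errno) {
    case 0:
      return Gate::kNone;
    case ENETDOWN:  // broker denied the path
      return Gate::kBrokerPathPolicy;
    case EPERM:  // syscall refused locally
      return Gate::kSeccomp;
    default:
      return Gate::kUnknown;
  }
}
```

The iter2 observation (open(/dev/media0) failing with ENETDOWN, no SIGSYS) classifies as kBrokerPathPolicy. Any other errno is left as kUnknown rather than guessed.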

After that change lands, the next syscalls (ioctl-on-media-fd) become the next observable gate. Empirically chase only if they fail.

Implication for Phase 4 (patch design)

Minimum patch (highest probability of being sufficient):

  1. GetRDDPolicy: add /dev/media* enumeration (analogous shape to AddV4l2Dependencies's /dev/video* walker — let it scan /dev for media prefix and add each as rdwr). Or simpler: add policy->AddPath(rdwr, "/dev/media0") if we're willing to hardcode for ohm; safer is to enumerate.
  2. AddV4l2Dependencies: extend the cap check to also admit nodes that have VIDEO_CAPTURE_MPLANE, VIDEO_OUTPUT_MPLANE and STREAMING all set, even without either M2M_* bit. This catches stateless decoders.

Possibly needed (resolved at Phase 7):

  3. SandboxFilter.cpp RDD seccomp: allow ioctl with magic byte '|' (linux/media.h ioctls).
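For item 1, a /dev/media* walker could follow the same opendir/readdir shape as the /dev/video* walker quoted in Finding 2. The sketch below only collects candidate paths. IsMediaNodeName and ListMediaNodes are hypothetical names, and the directory is a parameter purely so the code can be exercised outside /dev.

```cpp
#include <cassert>
#include <cstring>
#include <dirent.h>
#include <string>
#include <vector>

// True for directory entry names that start with "media" (media0, media1, ...).
bool IsMediaNodeName(const char* name) {
  return std::strncmp(name, "media", 5) == 0;
}

// Lists <dev_dir>/media* paths. Returns an empty list if the directory
// cannot be opened, matching the walker's "log and return" behaviour.
std::vector<std::string> ListMediaNodes(const std::string& dev_dir) {
  std::vector<std::string> paths;
  DIR* dir = opendir(dev_dir.c_str());
  if (!dir) {
    return paths;
  }
  while (dirent* entry = readdir(dir)) {
    if (IsMediaNodeName(entry->d_name)) {
      paths.push_back(dev_dir + "/" + entry->d_name);
    }
  }
  closedir(dir);
  return paths;
}
```

In the patch itself, each returned path would presumably be passed to policy->AddPath(rdwr, ...) in the same way the video walker does. Whether media nodes need a capability check analogous to VIDIOC_QUERYCAP is left open here.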

The patch touches two source files (or one if the seccomp policy is left untouched). Both live under security/sandbox/linux/, both have stable upstream paths, and each needs only a few-line surgical edit with no re-architecture. Sonnet's "30-line patch upstream" claim from the substrate stands.

State of Phase 2 close

  • Broker-side analysis: COMPLETE, source verified verbatim.
  • Seccomp-side analysis: DEFERRED to Phase 7 empirical test.
  • Test-rig cap-set verification (hantro M2M_MPLANE bit): DEFERRED to Phase 3 (when ohm is reachable).
  • Patch design sketch: ready for Phase 4 author-time.