libva-multiplanar/phase0_findings_iter3.md
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00


Iteration 3 — Phase 0 (substrate / motivation / inventory)

Opens 2026-05-04 immediately after iteration 2 close (phase8_iteration2_close.md, fork commit 19acc76, campaign close commit c36c61e).

Predecessor close-out summary (iteration 2 → iteration 3)

iter2 was hardening, not new feature work. Three independent fixes landed:

  • Fix 1 (06beef6): multi-resolution session-state corruption
  • Fix 2 (e64bb08): Mesa WSI rejection of non-64-aligned pitch
  • Fix 3 (19acc76): DMA-BUF lifecycle race — decoupled CAPTURE buffer pool with LRU recycling, the load-bearing fix
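The recycling policy behind Fix 3 can be illustrated with a minimal sketch. This is not the driver's code (the driver is C); the class name `CapturePool`, its methods, and the pool size are hypothetical. It models only the LRU rule: the buffer released longest ago is reused first, which gives a consumer the longest possible time to finish with an exported DMA-BUF before its backing buffer is overwritten.

```python
from collections import OrderedDict


class CapturePool:
    """Hypothetical model of a decoupled CAPTURE buffer pool with LRU recycling."""

    def __init__(self, size):
        # Free buffer indices, ordered oldest-released first.
        self._free = OrderedDict((i, None) for i in range(size))
        self._busy = set()

    def acquire(self):
        """Hand out the least recently released buffer index."""
        if not self._free:
            raise RuntimeError("pool exhausted")
        index, _ = self._free.popitem(last=False)
        self._busy.add(index)
        return index

    def release(self, index):
        """Return a buffer; it moves to the back of the reuse queue."""
        self._busy.remove(index)
        self._free[index] = None


pool = CapturePool(4)
a = pool.acquire()          # index 0
b = pool.acquire()          # index 1
pool.release(a)             # 0 becomes the most recently released buffer
order = [pool.acquire() for _ in range(3)]
print(order)                # 0 is reused last
```

A larger pool lengthens the time before any given index comes around again, but it does not guarantee the consumer is finished with it.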

iter2 Phase 1 lock met: mpv --hwdec=vaapi --vo=gpu plays bbb_1080p30 smoothly per operator inspection.

iter2 Phase 7 verification:

  • vaapi-copy 200 frames: 0 drops, real luma gradient, pool LRU visibly recycling ✓
  • vaapi --vo=gpu real-VO: smooth ✓
  • Firefox 150: engages our libva backend (cap_pool architecture confirmed working with surface ID recycling), decodes 10 frames cleanly through hantro, then EINVAL on frame 11 (iter1 carryover Sonnet 7.x family). Requires MOZ_DISABLE_RDD_SANDBOX=1 because Firefox now routes VAAPI through RDD instead of utility (changed since iter1's test).

Iteration 3 candidate research questions (UNLOCKED — user picks)

Multiple substantive items survive iter2 close. Each could anchor an entire iteration; some pair naturally.

A. Frame-11 EINVAL (Sonnet 7.x carryover, decode correctness)

Identify which V4L2 control returns EINVAL on the 11th decoded frame in Firefox (and likely also under specific mpv stream-shape patterns), and fix.

Why first: it's the load-bearing remaining defect. Firefox decodes 10 frames cleanly, then HW decode terminates. Over a 30s+ session the EINVAL likely recurs across surface-recycle cycles. The Unable to set control(s): Invalid argument log from iter2 Phase 7 narrows the suspect set: per-request controls (DECODE_PARAMS, SCALING_MATRIX, SPS, PPS) for slice_type=1 / non-IDR mid-stream frames. Sonnet review 7.5 (mid-stream non-IDR) and 7.2 (num_ref_idx_l0/l1 for multi-slice) are the named hypotheses. Reading hantro_g1_h264_dec.c for which control fields it validates is the first concrete investigation step.

Risk: may surface a need for SPS/PPS state tracking across requests that doesn't exist in our backend today.

B. DEBUG instrumentation sweep

Remove the 0010 / 0011 / 0014 + ENTER logging + sentinel write + msync workaround commit-by-commit, building cleanly between each removal. End state: zero request_log() calls in non-error paths, no patch-0011 sentinel write in EndPicture, and either delete the msync or document why it stays.

Why: required prerequisite for any upstream snapshot. The iter1 Phase 5 review had this on the to-do list and iter2 explicitly deferred it. Smaller scope than A; could pair with A or run standalone.

C. Performance binding cell (deferred from iter1, named-deferred in iter2)

Establish a measurement protocol for HW vs SW decode on this rig: drop counts, effective FPS, browser CPU%, scanout-plane residency for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox HW (sandbox-bypassed), SW baseline}. Anchor results in an in-session evidence dir.

Why: anchors all iter1+iter2 claims to numbers. The iter1 close noted "Performance numbers: drop counts ... not anchors — re-measure in iteration 2 with consistent rig" but iter2 was hardening-only. This is the natural perf iteration.

Risk: lots of fixture work for one binding cell.

D. Multi-context libva safety (Sonnet review 9.6)

Make the backend safe for two concurrent libva contexts in the same process (e.g. Firefox tab playing one video while another tab plays a different resolution). Today's LAST_OUTPUT_WIDTH/HEIGHT is a process-global static and cap_pool is per-driver_data but the V4L2 device is shared.

Why: iter2 documented this as deferred; the architectural surgery is similar to Fix 3 (per-context pools, per-context format cache). One real consumer might surface this — Firefox tab + mpv together, for example.

E. V4L2_MEMORY_DMABUF (the iter2 plan's Option B — true architectural fix for DMA-BUF lifecycle)

Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation: userspace allocates buffers via gbm/dumb-buffer and hands them to V4L2 via VIDIOC_QBUF with memory=V4L2_MEMORY_DMABUF. VIDIOC_EXPBUF is no longer needed; lifetime is unambiguous because userspace owns the buffer.

Why: iter2 Fix 3 is statistical (Option A): adding more buffers + LRU narrows the race window but doesn't close it. Option B closes it, but carries significant kernel-side test surface: does hantro on this kernel actually accept DMABUF-memory buffers? GStreamer's v4l2slh264dec uses MMAP, so DMABUF on hantro may not be tested upstream.

Risk: highest unknown of any candidate; possibly requires kernel work.

F. Firefox RDD sandbox vs /dev/media0 (backlog from iter2 close — UPDATED with web research 2026-05-04)

File a Mozilla Bugzilla report ("RDD sandbox: allow /dev/media* and V4L2-stateless nodes for request-API hardware decoders") referencing Bug 1833354 as the V4L2-M2M precedent, and propose a small patch.

Sonnet web research findings (2026-05-04):

  • NO existing Mozilla bug covers /dev/media* or the V4L2-stateless request-API path. We'd be filing the first.
  • Closest existing work is Bug 1833354 (FF116 — V4L2-M2M only) and Bug 1965646 (FF141 — extends M2M codecs). Neither addresses the request-API.
  • Allowlist is in security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp::GetRDDPolicy(). It calls AddV4l2Dependencies() which enumerates /dev/video* and filters by V4L2_CAP_VIDEO_M2M | V4L2_CAP_VIDEO_M2M_MPLANE. Hantro stateless uses CAPTURE_MPLANE + OUTPUT_MPLANE + request-API — explicitly excluded by this filter. /dev/media* is completely absent (no AddPath for it anywhere).
  • Architecture is broker-on-IPC (same model as renderD128): when sandboxed RDD calls open(), seccomp intercepts, forwards over socket to broker thread in parent process, which checks path policy and opens on RDD's behalf. No broker redesign needed: two functions in one file, plus one possible seccomp check:
    1. GetRDDPolicy(): add policy->AddPath(rdwr, "/dev/media0") (or glob)
    2. AddV4l2Dependencies(): extend cap filter to also admit stateless V4L2 nodes (or add a new AddV4l2RequestDependencies())
    3. Possibly SandboxFilter.cpp: verify MEDIA_REQUEST_IOC_QUEUE ioctl (_IO('|', 0x80) in <linux/media.h>: type byte '|' = 0x7c, number 0x80) is allowed
  • Verdict: real Mozilla patch needed, small in scope. NOT a documentation-only fix.
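For the seccomp side, it helps to see how the request-API ioctl numbers decompose, since an ioctl filter can match on the type byte. The sketch below re-derives the numbers from the asm-generic ioctl layout (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31) and the definitions in <linux/media.h>. The helper names `ioc`, `ioc_type` and `ioc_nr` are made up for this illustration, and the layout assumption holds for aarch64 and x86 but not for every architecture.

```python
# Direction values from the asm-generic ioctl encoding.
IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2


def ioc(direction, type_char, nr, size=0):
    """Encode an ioctl request number (asm-generic layout assumed)."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


def ioc_type(request):
    """Extract the type ("magic") byte, bits 8-15."""
    return (request >> 8) & 0xFF


def ioc_nr(request):
    """Extract the command number, bits 0-7."""
    return request & 0xFF


# <linux/media.h>: _IOR('|', 0x05, int), _IO('|', 0x80), _IO('|', 0x81)
MEDIA_IOC_REQUEST_ALLOC = ioc(IOC_READ, "|", 0x05, size=4)
MEDIA_REQUEST_IOC_QUEUE = ioc(IOC_NONE, "|", 0x80)
MEDIA_REQUEST_IOC_REINIT = ioc(IOC_NONE, "|", 0x81)

for name, req in [
    ("MEDIA_IOC_REQUEST_ALLOC", MEDIA_IOC_REQUEST_ALLOC),
    ("MEDIA_REQUEST_IOC_QUEUE", MEDIA_REQUEST_IOC_QUEUE),
    ("MEDIA_REQUEST_IOC_REINIT", MEDIA_REQUEST_IOC_REINIT),
]:
    print(f"{name}: {req:#010x} type={ioc_type(req):#04x} nr={ioc_nr(req):#04x}")
```

All three share type byte 0x7c, so a filter keyed on the type byte covers request allocation, queueing and reinit together.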

Why: highest-leverage upstream contribution out of any iter3 candidate — a 30-line patch upstream removes the env-var requirement for every V4L2-stateless user, including future Rockchip/Hantro/Cedrus/Allwinner targets. Cross-verification on Intel/NVIDIA test boxes (meitner / clevo) is no longer needed for the diagnosis (cap-filter + missing /dev/media* are mechanically the explanation), but might be useful for "smoke-confirm the same env behaves identically on a working VAAPI box" before filing.

Sources (Sonnet's references):

G. Sonnet 7.x miscellany (carryovers from iter1 review)

7.1 EACCES on VIDIOC_G_EXT_CTRLS readback (move before MEDIA_REQUEST_IOC_QUEUE; if still EACCES, file kernel issue and remove dead code). 7.4 // HACK block in surface.c::CreateSurfaces2 hardcoding V4L2_PIX_FMT_H264_SLICE — wrong for MPEG-2. 7.5 Firefox seek-to-non-IDR test corpus.

Why: cleanup. Fits in slack time of any other iteration's plan.

Natural pairings:

  • A + B (defect fix + clean-up of unused diagnostic before commit)
  • C alone (perf measurement is its own thing)
  • E alone (high risk, focus needed)
  • F + G as a "small-debt cleanup" iteration

State that carries (re-verified 2026-05-04 23:35)

  • Hardware: ohm RK3568, kernel 6.19.10-danctnix1-1-pinetab2, hantro G1/G2 on /dev/video1 + /dev/media0 (perms unchanged: crw-rw----+ root:video with mfritsche ACL)
  • Userspace: libva 2.23.0-1, libva-utils 2.22.0-1, mpv 1:0.41.0-3, firefox 150.0.1-1, mesa 1:26.0.5-1, libdrm 2.4.131-1
  • Test fixture: /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4 sha256 dcf8a7170fbd49bb...
  • Driver installed: /usr/lib/dri/v4l2_request_drv_video.so sha256 f27e006433cd4769... (iter2 build with all three fixes)
  • Fork master: 19acc76 (iter2 Fix 3); commits in iter1 + iter2 still ahead of bootlin upstream
  • Build harness: meson setup --buildtype=release && ninja on ohm at /tmp/libva-src/libva-v4l2-request-fourier (rebuilds on every session — /tmp is tmpfs); deploy to /usr/lib/dri/v4l2_request_drv_video.so
  • Live test rig env (Plasma 6 Wayland): XDG_RUNTIME_DIR=/run/user/1001, WAYLAND_DISPLAY=wayland-0, DISPLAY=:0, XAUTHORITY=/run/user/1001/xauth_ZnmtRw (changes per session), DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1001/bus. SSH-driven launches need MOZ_ENABLE_WAYLAND=1 for Firefox + MOZ_DISABLE_RDD_SANDBOX=1 for VAAPI.
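The live-rig environment above can be assembled programmatically for SSH-driven launches. This is a minimal sketch: the function name `rig_env` and its parameters are hypothetical, and choosing the newest `xauth_*` file under the runtime dir is an assumption about how to follow the per-session XAUTHORITY name, not something verified on ohm.

```python
import glob
import os


def rig_env(uid, sandbox_bypass=True, base=None):
    """Build the environment for an SSH-driven Firefox launch on the test rig."""
    run_dir = f"/run/user/{uid}"
    env = dict(os.environ if base is None else base)
    env.update({
        "XDG_RUNTIME_DIR": run_dir,
        "WAYLAND_DISPLAY": "wayland-0",
        "DISPLAY": ":0",
        "DBUS_SESSION_BUS_ADDRESS": f"unix:path={run_dir}/bus",
        "MOZ_ENABLE_WAYLAND": "1",
    })
    if sandbox_bypass:
        env["MOZ_DISABLE_RDD_SANDBOX"] = "1"
    # XAUTHORITY changes per session; taking the newest xauth_* file is an assumption.
    candidates = glob.glob(os.path.join(run_dir, "xauth_*"))
    if candidates:
        env["XAUTHORITY"] = max(candidates, key=os.path.getmtime)
    return env


env = rig_env(1001, base={})
print(env["DBUS_SESSION_BUS_ADDRESS"])
```

The resulting dict can be passed as `env=` to `subprocess.run`; `sandbox_bypass=False` leaves the sandbox variable unset.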

State that does NOT carry (re-acquire per feedback_replicate_baseline_first.md)

  • Performance numbers: same caution as iter1+iter2 close. iter2 saw 0 drops in 200 frames for vaapi-copy and smooth on operator inspection for vaapi DMA-BUF; these are smoke verdicts, not anchored measurements. iter3 perf candidate (option C) is the natural place to anchor.
  • Firefox 10-frame decode + EINVAL at frame 11: re-acquire under iter3-specific binding cell. Today's evidence is in /tmp/ff-stdout.log on ohm but tmpfs-volatile.

Tooling and measurement-instrument inventory (live verification)

Carried from iter2:

  • strace -f -e trace=openat,close,ioctl for libva-side V4L2 ioctl tracing
  • sudo ftrace events/v4l2/* events/vb2/* events/dma_fence/* for kernel-side V4L2/vb2 lifecycle
  • sudo dmesg -w for kernel-side warnings (typically silent on this rig)
  • sudo lsof /dev/video1 for fd ownership snapshots
  • mpv --frames=N --vo=gpu with stderr capture
  • Firefox MOZ_LOG=PlatformDecoderModule:5,VideoBridge:5 (under MOZ_DISABLE_RDD_SANDBOX=1)
  • coredumpctl info <pid> for crash backtraces
  • Operator visual inspection on real screen (load-bearing for boolean correctness)

Likely needed for specific iter3 candidates:

  • For A (frame-11 EINVAL): dmesg | grep hantro during decode to catch driver-side EINVAL reasons; ftrace events/hantro/* if kernel exposes such; reading hantro_g1_h264_dec.c for control validation rules
  • For C (perf): pidstat -u -p $(pidof mpv) 1 for CPU%; gpustat-equivalent for Mali-G52 (likely needs Panfrost ftrace); compositor scanout query (Wayland ext-output-management?) is harder
  • For E (DMABUF): gbm_bo_create userspace allocation test program; VIDIOC_QBUF with memory=V4L2_MEMORY_DMABUF exploratory path
  • For F (sandbox): meitner / clevo access; Firefox source security/sandbox/linux/SandboxFilter.cpp

In-scope (LOCKED 2026-05-04 for iteration 3) — F + A in parallel

Track F (sandbox hypothesis verify-by-patch). Build firefox-fourier: a Firefox 150.0.1 fork with the RDD-sandbox patch from candidate F (allow /dev/media0, extend AddV4l2Dependencies() cap filter to admit stateless V4L2 nodes, verify MEDIA_REQUEST_IOC_QUEUE ioctl passes seccomp). Run on ohm without MOZ_DISABLE_RDD_SANDBOX=1. Stronger test of the hypothesis than Sonnet's static-analysis verdict — empirically separates "sandbox is the env-var requirement's cause" from any other gating factor.
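The cap-filter change can be modelled as a predicate over the capability bits returned by VIDIOC_QUERYCAP. The constants are the values from <linux/videodev2.h>; the two function names are hypothetical, and the "extended" rule is only one possible shape for the patch, not Mozilla's code. A real filter would probably also check for a stateless OUTPUT pixel format, which is not modelled here.

```python
# Capability bits from <linux/videodev2.h>.
V4L2_CAP_VIDEO_CAPTURE_MPLANE = 0x00001000
V4L2_CAP_VIDEO_OUTPUT_MPLANE = 0x00002000
V4L2_CAP_VIDEO_M2M_MPLANE = 0x00004000
V4L2_CAP_VIDEO_M2M = 0x00008000
V4L2_CAP_STREAMING = 0x04000000

M2M_MASK = V4L2_CAP_VIDEO_M2M | V4L2_CAP_VIDEO_M2M_MPLANE
SPLIT_MPLANE = V4L2_CAP_VIDEO_CAPTURE_MPLANE | V4L2_CAP_VIDEO_OUTPUT_MPLANE


def admitted_by_m2m_filter(caps):
    """Model of the existing rule: admit only nodes advertising an M2M capability."""
    return bool(caps & M2M_MASK)


def admitted_by_extended_filter(caps):
    """Hypothetical extension: also admit nodes exposing both MPLANE queues."""
    return admitted_by_m2m_filter(caps) or (caps & SPLIT_MPLANE) == SPLIT_MPLANE


# Capability shape this document attributes to the hantro stateless node.
stateless_caps = SPLIT_MPLANE | V4L2_CAP_STREAMING
print(admitted_by_m2m_filter(stateless_caps), admitted_by_extended_filter(stateless_caps))
```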

Track A (frame-11 EINVAL). With sandbox now controlled (Track F's patched binary), the frame-11 EINVAL still recurs — clean-rig isolation. Identify which V4L2 control returns EINVAL on the 11th decoded frame in Firefox; suspect surface narrowed by Sonnet review to per-request DECODE_PARAMS / SCALING_MATRIX / SPS / PPS for non-IDR slices (7.5) or num_ref_idx_l0/l1 mismatch in multi-slice frames (7.2). First concrete step: read hantro_g1_h264_dec.c for control validation rules; run patched Firefox under MOZ_LOG=PlatformDecoderModule:5 + driver request_log to capture the failing control set.
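When VIDIOC_S_EXT_CTRLS fails, the `error_idx` field of `struct v4l2_ext_controls` is the first clue about which control is at fault. Per the V4L2 documentation, `error_idx == count` means the error is not tied to one control or the validation step failed, while a smaller value is the index of the offending control. The helper below encodes that rule; its name and the example control list are hypothetical, and whether VIDIOC_TRY_EXT_CTRLS can pinpoint the index for request-targeted controls on this kernel is an open assumption to check.

```python
def failing_control(error_idx, control_names):
    """Map a kernel error_idx to a control name, or None if no single control is blamed."""
    count = len(control_names)
    if error_idx == count:
        # Validation-stage or request-level rejection: no specific control identified.
        return None
    if 0 <= error_idx < count:
        return control_names[error_idx]
    raise ValueError(f"error_idx {error_idx} out of range for {count} controls")


# Illustrative per-request control set; the order is an assumption.
controls = ["SPS", "PPS", "SCALING_MATRIX", "DECODE_PARAMS"]

print(failing_control(4, controls))   # no specific control
print(failing_control(3, controls))   # DECODE_PARAMS
```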

Why parallel rather than sequential: Track F's verification rig (patched Firefox on ohm, running bbb_1080p30 without sandbox bypass) IS the rig that surfaces Track A's signature. Running them in one binding cell is the natural shape; splitting to two iterations would require setting up the same rig twice.

Build host plan (Phase 4 input prereq)

Build venue: boltzmann LXD container (RK3588 aarch64, 8 cores, 30 GB RAM, NVMe, always-on). Native arm64 build avoids cross-compile. AUR/PKGBUILD-based overlay preferred over raw mozilla-central checkout — Arch's firefox PKGBUILD already has a working aarch64 mozconfig and dep set; we layer our sandbox patch as an additional source=() patch in prepare(). On rebuilds use makepkg -e to skip re-extraction and re-patching.

Fallback if rust-on-aarch64 toolchain proves unworkable in the container: power up data (x86_64 box), prevent its sleep timer, set up cross-compile toolchain to aarch64. AUR rebuild semantics (makepkg -e) carry over.

Out-of-scope finding surfaced 2026-05-05 (carry to iter4)

mpv libplacebo segfault on --vo=gpu post-reboot. Operator-side reproduction with LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi --vo=gpu --no-audio bbb_1080p30_h264.mp4 after host reboot hit a NEW failure pattern (not the iter2-close "smooth" verdict, not the Track A frame-11 EINVAL):

  • Vulkan init fails: [vo/gpu/libplacebo] EnumeratePhysicalDevices ... VK_ERROR_INITIALIZATION_FAILED (line 4 of trace)
  • 4 frames decode cleanly (surfaces 67108864–67108867 sync to real luma data, var=4 on the I-frame)
  • After surf 67108868's BeginPicture: two Unable to request buffers: Device or resource busy (EBUSY on REQBUFS)
  • Then a bizarre CreateSurfaces2: surf_width=16 surf_height=16 fmt_width=48 fmt_height=48 sizes[1]=1050626 (=0x100802, looks uninitialized)
  • Segfault

Hypothesis: vulkan-init-failed code path triggers a resolution-probe in libplacebo/mpv that calls vaCreateSurfaces with downscale-probe dimensions while CAPTURE is still queued. The cap_pool resolution-change path drains+REQBUFs but doesn't fully flush queued CAPTURE buffers, kernel returns EBUSY, driver pushes ahead with garbage sizes[1], mmap or pool-init crashes.

iter3 disposition: option 3 selected (verify-via-Firefox first, defer libplacebo segfault to iter4). Firefox doesn't go through the libplacebo probe paths, so F+A's verification can proceed on patched-Firefox even with mpv broken on the vulkan-fallback path. If firefox-fourier works on ohm despite this regression, the lock for iter4 becomes:

  • Track libplacebo: harden cap_pool resolution-change to drain CAPTURE before REQBUFs; reject vaCreateSurfaces with sentinel-shaped sizes[]; investigate the Vulkan init failure (could be Mesa update, kernel reboot reshuffling GPU state, or genuine Mesa/libplacebo regression).
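Rejecting sentinel-shaped sizes[] could start from a plausibility bound derived from the negotiated format. The sketch below assumes NV12 (full-size luma plane, half-size interleaved chroma plane) and an arbitrary slack factor for driver padding; the function name, the slack value, and the sizes[0] value in the example are all hypothetical. Only sizes[1]=1050626 and the 48x48 format come from the trace.

```python
def plausible_plane_sizes(fmt_width, fmt_height, sizes, slack=4):
    """Return True if each plane size is positive and within slack x the NV12 minimum."""
    luma = fmt_width * fmt_height
    minimums = [luma, luma // 2]          # NV12: Y plane, then interleaved CbCr plane
    if len(sizes) < len(minimums):
        return False
    return all(0 < size <= minimum * slack
               for size, minimum in zip(sizes, minimums))


# fmt 48x48 from the trace; sizes[0] is a made-up but sane value.
print(plausible_plane_sizes(48, 48, [2304, 1050626]))   # garbage chroma size
print(plausible_plane_sizes(48, 48, [2304, 1152]))      # sane NV12 sizes
```

A check of this kind would turn the garbage-size case into an early error return instead of a crash further down the path.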

Or, if the mpv segfault ALSO afflicts firefox-fourier (e.g. the same resolution-probe path is shared at a lower libva layer), iter3 expands or yields back at Phase 7. We learn that empirically.

Out-of-scope (LOCKED 2026-05-04 for iteration 3)

  • Candidates B, C, D, E, G — deferred to a later iteration. B (DEBUG sweep) is the most natural candidate for iter4 since it's an upstream prereq.
  • New codecs (MPEG-2, VP8, VP9, AV1, HEVC) — H.264-only scope holds from iter1+iter2.
  • New target hardware on the libva side (fresnel RK3399, ampere RK3588) — separate iteration after ohm path is hardened. Note: boltzmann (RK3588) is recruited only as a Firefox build host this iteration, NOT as a libva target.
  • Bootlin upstreaming PR — feedback_no_upstream.md holds; no PRs unless explicitly tasked.
  • Mozilla Bugzilla bug-file. Substituted by verify-by-patch; if the patched binary works, the bug filing becomes a follow-up upstream contribution, not part of iter3's Phase 1 success criterion.
  • HEVC re-introduction (stripped in fourier port; no hantro G2 HEVC validation in operator's test corpus).

Phase 1 success criterion (LOCKED 2026-05-04)

Track F: Patched firefox-fourier (firefox-150.0.1 + RDD-sandbox patch) launched on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 engages our libva-v4l2-request backend, opens /dev/video1 + /dev/media0 from RDD process, and decodes ≥10 frames of bbb_1080p30 through hantro. (10 frames is the iter2-observed floor before the EINVAL hits — past 10 is Track A's domain.)

Track A: Same patched-binary rig decodes ≥30s of bbb_1080p30 without Unable to set control(s): Invalid argument emerging in driver stderr. Where this requires changes, the change lives in libva-v4l2-request-fourier (per-request control set construction), not in firefox-fourier.

Joint success: Both above, on the same patched binary, in the same operator session, with anchored evidence (driver stderr capture, Firefox MOZ_LOG capture, dmesg capture, operator visual confirmation of decode output on screen).
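The Track A criterion reduces to "the marker string never appears in the captured driver stderr". A minimal checker is sketched below; the function name is hypothetical and the sample log lines are invented for illustration, with only the marker text taken from the driver output quoted in this document.

```python
EINVAL_MARKER = "Unable to set control(s): Invalid argument"


def einval_hits(lines):
    """Return (line_number, text) for every captured line containing the EINVAL marker."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if EINVAL_MARKER in line]


# Invented sample capture; real input would be the driver stderr file.
sample = [
    "v4l2-request: frame 10 queued\n",
    "v4l2-request: Unable to set control(s): Invalid argument\n",
    "v4l2-request: frame 12 queued\n",
]

hits = einval_hits(sample)
print("PASS" if not hits else f"FAIL at lines {[n for n, _ in hits]}")
```

On a real capture, pass an open file object instead of the sample list; an empty result over a 30s+ run is the pass condition.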

Stop point

Phase 1 LOCKED. iter3 proceeds through the remaining phases:

  • Phase 2 (situation analysis): read Mozilla sandbox source on a local mirror for the two target functions.
  • Phase 3 (baseline anchor): re-verify that the frame-11 EINVAL still reproduces on ohm with stock Firefox 150 + sandbox bypass, the same picture as iter2 close.
  • Phase 4: write the sandbox patch + plan PKGBUILD overlay + lock container provisioning with his.
  • Phase 5: Sonnet review of patch.
  • Phase 6: build firefox-fourier in container, deploy to ohm.
  • Phase 7: verify F + A simultaneously.
  • Phase 8: iteration close.

Stop only if user is needed (e.g. the patch produces a multi-way design choice, or the rust-aarch64 fallback to data is required).