libva-multiplanar/phase3_iter3_baseline.md
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Mozilla 150 flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00


# Iteration 3 — Phase 3 (baseline anchor: pre-patch Firefox 150 behavior on ohm)
Goal: anchor the pre-patch behavior so Phase 7 has a "before" picture. Two distinct baselines matter for iter3:
- **Baseline-S (sandbox):** stock Firefox 150 with the default RDD sandbox → libva fails at `open(/dev/media0)` with ENETDOWN → Firefox falls back to software decode. This is the behavior Track F's patch is meant to fix.
- **Baseline-A (frame-11 EINVAL):** stock Firefox 150 with `MOZ_DISABLE_RDD_SANDBOX=1` → libva engages hantro, decodes 10 frames, then hits EINVAL on `set_controls` at frame 11. This is the carryover defect Track A is meant to fix.
## Anchored baseline source
ohm is currently powered off: a probe (`ping -c 1 ohm.fritz.box` from rpi at 2026-05-04 ~23:50) returned `100% packet loss`, and the PineTab2 has no WoL, so the operator must power it on manually. Re-acquiring `/tmp/ff-stdout.log` in this session is therefore not possible. The risk this poses to Phase 3 is **low**, because:
1. iter2 close (`phase8_iteration2_close.md`, commit `c36c61e`) recorded the same baseline observations on **2026-05-04**, the same day this Phase 3 anchor is being written. Same kernel (6.19.10), same userspace (Firefox 150.0.1, libva 2.23.0, mesa 26.0.5), same fixture (bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`), same driver build (sha256 `f27e0064...`). No state has drifted.
2. The "before" picture is what we want to PROVE WRONG via the patch. The verifying observation is the "after" picture in Phase 7. Re-acquiring the "before" within hours of an identical observation that's already in git would be ceremonial.
So this Phase 3 doc anchors the iter2-close evidence by reference, with the explicit understanding that Phase 7 will produce the corresponding "after" rig. If Phase 7 finds that the stock Firefox baseline has shifted (e.g. Firefox 151 has arrived via a pacman update by then), we re-acquire the baseline at that point.
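The "no state has drifted" claim rests on the fixture and driver build still hashing to the anchored values. A minimal sketch of how that could be spot-checked once ohm is back, assuming only the truncated prefixes recorded above are available; the function name and the paths in the usage comment are hypothetical:

```python
import hashlib


def sha256_has_prefix(path, prefix, chunk_size=1 << 20):
    """Return True if the SHA-256 hex digest of the file at `path` starts with `prefix`.

    Only a prefix comparison is done because the anchored hashes in this
    document are truncated.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large fixture does not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(prefix.lower())


# Hypothetical usage on ohm (paths are illustrative):
#   sha256_has_prefix("/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4", "dcf8a7170fbd")
```

A prefix match is weaker than a full-hash match, so a mismatch is conclusive but a match is only strong evidence.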
## Baseline-S evidence (anchored from iter2 close)
Quoted verbatim from `phase8_iteration2_close.md`:
> Firefox 150 (default sandbox) | ✗ libva init fails inside RDD sandbox on `open(/dev/media0)` returning ENETDOWN — Firefox SW-falls-back. **NOT an iter2 code regression** (iter1 init code is byte-identical), but a Firefox routing change since iter1: iter1's findings.md shows decode happened on the **utility** process (`sandboxingKind=0`), iter2 today shows the libva path goes through RDD which is sandbox-blocked. Workaround: launch Firefox with `MOZ_DISABLE_RDD_SANDBOX=1`.
Signature to match in Phase 7:
- Command: stock `firefox /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` (no `MOZ_DISABLE_RDD_SANDBOX`)
- Driver stderr: shows `open("/dev/media0", O_RDWR)` returning -1 ENETDOWN
- Decode behavior: SW fallback, no hantro engagement
Phase 7 verifies: with the patched `firefox-fourier` binary and the same launch (no env var), the open succeeds and ≥10 frames decode through hantro.
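To make the Baseline-S signature mechanically checkable, a captured stderr log can be scanned for the failing open. This is a sketch only: the exact log line shape is an assumption (a line that mentions `/dev/media0` and `ENETDOWN`), the sample line is made up for illustration, and `matches_baseline_s` is a hypothetical helper name.

```python
import re

# Assumed log shape: one line mentioning /dev/media0 together with ENETDOWN.
# The real driver stderr format may differ; adjust the pattern to match it.
OPEN_FAIL = re.compile(r"/dev/media0.*ENETDOWN")


def matches_baseline_s(stderr_text):
    """Return True if any line shows the /dev/media0 open failing with ENETDOWN."""
    return any(OPEN_FAIL.search(line) for line in stderr_text.splitlines())


# Illustrative sample line, not a real capture from ohm.
sample = 'open("/dev/media0", O_RDWR) = -1 ENETDOWN (Network is down)'
print(matches_baseline_s(sample))
```

In Phase 7 the same check is expected to come back negative for the patched build.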
## Baseline-A evidence (anchored from iter2 close)
> Firefox 150 (sandbox-disabled) | ✓ engages our libva, decodes 10 frames cleanly through hantro (luma gradient `0x10→0x1c` matching BBB intro fade, real NV12 pixels), then EINVAL on `set_controls` at frame 11. The EINVAL is a non-iter2 issue — same Sonnet 7.x family carryover from iter1 (likely 7.5 mid-stream / 7.2 num_ref_idx). cap_pool model is NOT the regression.
> The 10-frame decoded sequence under sandbox-bypass confirms Fix 3's cap_pool architecture works correctly with Firefox: surface IDs 67108864..67108871 each acquired their own slot, and surface IDs were recycled across frames 5,6,9 with the slot state machine cycling through IN_DECODE → DECODED → recycle on next BeginPicture for the same surface. Pool was operating exactly as designed.
Signature to match in Phase 7 (Track A):
- Command: `firefox-fourier /home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` with the libva-v4l2-request-fourier driver instrumented to log per-request control values
- Driver stderr: `Unable to set control(s): Invalid argument` emerging at the 11th frame
- Where to look: per-request controls submitted from `EndPicture` — DECODE_PARAMS, SLICE_PARAMS, SCALING_MATRIX, SPS, PPS — for the slice immediately after the cap_pool's first recycle event
Phase 7 verifies: with iter3's libva fix applied (Phase 4 also produces this), the EINVAL no longer fires, i.e. ≥30 s of bbb_1080p30 decode completes without `Unable to set control(s)`.
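The Track A pass/fail criterion reduces to whether the error string appears in the driver stderr for the run. A minimal sketch, assuming the log is available as text and that the message appears verbatim on one line; `first_einval_line` is a hypothetical helper name:

```python
EINVAL_MSG = "Unable to set control(s): Invalid argument"


def first_einval_line(stderr_text):
    """Return the 1-based line number of the first EINVAL message, or None if absent."""
    for lineno, line in enumerate(stderr_text.splitlines(), start=1):
        if EINVAL_MSG in line:
            return lineno
    return None
```

Returning the line number rather than a boolean makes it easy to jump to the surrounding per-request control dump when the defect still reproduces; `None` over a ≥30 s capture corresponds to the pass condition.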
## Phase 3 carry-over to Phase 4
Phase 4 (patch + PKGBUILD overlay authorship) does not need Baseline-S or Baseline-A re-acquired live. It needs:
1. The verbatim Mozilla source from Phase 2 (already captured in `phase2_iter3_situation.md`)
2. The cap-set of hantro on ohm to confirm whether `V4L2_CAP_VIDEO_M2M_MPLANE` is set (cheap to check at Phase 7 boot via `v4l2-ctl --device=/dev/video1 --info`)
3. The fixture and driver state (anchored, unchanged since iter2)
Operator action item for Phase 7 prep: when ohm is next powered on, run `v4l2-ctl --device=/dev/video1 --info | grep -E 'Capabilities|Device'` and capture output. If `Video M2M Multiplanar` is in the cap list, the cap-filter extension is unnecessary and the patch shrinks to "just add /dev/media0". If absent, both pieces of the patch are needed.
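The decision above can also be made from the hexadecimal capability mask rather than the flag name. Depending on the `v4l2-ctl` version, the flag label may be spelled differently and printed on its own indented line, which the `grep` filter may not keep. The sketch below assumes the `Capabilities : 0x…` / `Device Caps : 0x…` line shape; `V4L2_CAP_VIDEO_M2M_MPLANE` is `0x00004000` in `<linux/videodev2.h>`. The function names and the return labels are hypothetical.

```python
import re

V4L2_CAP_VIDEO_M2M_MPLANE = 0x00004000  # value from <linux/videodev2.h>

# Assumed line shape: "<name> : 0x<hex>", as in "Device Caps      : 0x04204000".
MASK_LINE = re.compile(r"^\s*(Device Caps|Capabilities)\s*:\s*(0x[0-9a-fA-F]+)", re.M)


def has_m2m_mplane(v4l2_info):
    """True if the Device Caps mask (else Capabilities) has the M2M_MPLANE bit set."""
    masks = dict(MASK_LINE.findall(v4l2_info))
    raw = masks.get("Device Caps") or masks.get("Capabilities")
    if raw is None:
        raise ValueError("no capability mask found in v4l2-ctl output")
    return bool(int(raw, 16) & V4L2_CAP_VIDEO_M2M_MPLANE)


def patch_scope(v4l2_info):
    """Map the capability check onto the two patch shapes described above."""
    return "media-only" if has_m2m_mplane(v4l2_info) else "media+cap-filter"
```

Preferring `Device Caps` over `Capabilities` follows the V4L2 convention that the former describes the opened node while the latter describes the whole physical device.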
## Stop point
Phase 3 anchored. Proceeding to Phase 4: write the firefox-fourier patch + the AUR PKGBUILD overlay. Operator-side action item flagged above. ohm being offline does NOT block Phase 4 (writing the patch is desk work).