diff --git a/phase0_findings_iter6.md b/phase0_findings_iter6.md
index cdf3d98..90bc2d2 100644
--- a/phase0_findings_iter6.md
+++ b/phase0_findings_iter6.md
@@ -65,6 +65,18 @@ The campaign's original substrate question — "make multi-planar libva work end
 
 **Plan**: find an MPEG-2 test fixture, decode via mpv vaapi-copy + vainfo. Verify hantro G1's MPEG-2 path through libva-v4l2-request-fourier. Surface any codec-specific bugs (the iter4 DPB+request_fd fixes were H.264-specific; MPEG-2 has different control flow).
 
+### I. Firefox VIDIOC_QBUF EINVAL on first frame (NEW from iter5 amendment 2026-05-05)
+
+> The iter5 amendment (Utility seccomp fix) closes Track F: sandbox no longer blocks the V4L2 request API. With sandbox open, Firefox loads the v4l2_request driver, `cap_pool_init: 24 slots ready` succeeds, then a single `Unable to queue buffer: Invalid argument` (VIDIOC_QBUF EINVAL) on what looks like the first frame, after which Firefox falls back to SW.
+
+**Why this is a new candidate, not a Track A regression**: mpv `--hwdec=vaapi-copy` decoded 2000 frames clean on the same iter5-end driver build (sha `4bed52ec5d44b389…`). Only Firefox triggers the EINVAL. So the bug lives in the Firefox-specific consumer path through libva, not in the per-frame request_fd/DPB logic that iter4 closed.
+
+**Diagnostic plan**: strace the Firefox Utility process during initial VIDIOC_QBUF, capture the exact `v4l2_buffer` struct payload, compare against what mpv-vaapi-copy sends at the same call site. Likely culprits: surface handle lifecycle differences (Firefox uses VAImage / VAExportSurfaceHandle paths mpv doesn't), VABufferType ordering (Firefox might submit slice data before SPS/PPS in some frames), or zero-copy DMA-buf attach state.
+
+**Reproducibility**: 100% on `file:///home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` with `LIBVA_DRIVER_NAME=v4l2_request` env vars + iter5-amend Firefox 150 + sandbox enabled.
+
+**Risk**: medium. Could be a 5-line fix in v4l2.c QBUF prep, or could surface a fundamental Firefox-vs-mpv divergence in libva surface management.
+
 ### G. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5)
 
 > Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation (iter2 Fix 3 was statistical / LRU mitigation; this is architectural).
@@ -87,6 +99,7 @@ The campaign's original substrate question — "make multi-planar libva work end
 - **E alone** (perf binding cell) — anchors campaign-wide claims to numbers. Carried five iterations.
 - **F alone** (MPEG-2) — validates beyond H.264 scope.
 - **G alone** (DMABUF) — high-risk architectural.
+- **I alone** (Firefox QBUF EINVAL) — narrow, deterministic repro, gates the only known consumer-with-iter5-amendment hard-failure. Strong candidate for the iter6 lock.
 
 ## State that carries (re-verified iter5 close)
 
diff --git a/phase8_iteration5_close.md b/phase8_iteration5_close.md
index 654f23d..5a1e600 100644
--- a/phase8_iteration5_close.md
+++ b/phase8_iteration5_close.md
@@ -194,7 +194,34 @@ Adding to memory: `feedback_seccomp_silent_enosys.md` already covers the silent-
 |---|---|
 | Combined 160-line patch | Authored, in campaign repo + container 0005-... |
 | Container src/ tree | Broker (from iter5-G prepare) + seccomp (manually patched in iter5 amend) |
-| Built pkg | `675bbc7d…` on boltzmann:/tmp/, awaiting deploy to ohm |
-| ohm deploy | Pending operator-side scp (vpn route closed at amendment author time) |
-| HW-decode lsof verification | Pending |
-| Campaign git commit | Pending |
+| Built pkg | `675bbc7d…` on boltzmann:/tmp/ |
+| ohm deploy | Done via temporary HTTP server on boltzmann:18080 (vpn route was closed; ohm has lan route to boltzmann.fritz.box). `pacman -U` confirmed Build Date 21:01 CEST = 19:01 UTC = amendment build. `libmozsandbox.so` sha `4e6c7d58bc2220dbdf6ad817ee70fa77fc85e618ffd49ebdabb833f416dc3076`, size 187536 (vs prior 185712, +1824 bytes for the new seccomp code). |
+| HW-decode verification | **Track F GREEN.** Played `bbb_1080p30_h264.mp4` via Firefox 150 with `LIBVA_DRIVER_NAME=v4l2_request` and full sandbox. Log: `cap_pool_init: 24 slots ready` then `Unable to queue buffer: Invalid argument` — **no seccomp violation, no ENOSYS on `MEDIA_IOC_REQUEST_ALLOC`**. The seccomp gate is fully passed; the remaining EINVAL is post-sandbox, driver-level. |
+| Campaign git commit | `d2d9107` (patch + docs) pushed to Gitea pre-verification. |
+
+### Track F closes — definitive evidence
+
+Pre-amendment failure pattern (iter5-G with broker-only patch):
+```
+v4l2-request: cap_pool_init: 24 slots ready
+[PID] Sandbox: seccomp sandbox violation: pid PID, syscall 29, args FD 0x80047C05 ...
+v4l2-request: Unable to allocate media request: Function not implemented
+```
+
+Post-amendment pattern (iter5-amend):
+```
+v4l2-request: cap_pool_init: 24 slots ready
+v4l2-request: Unable to queue buffer: Invalid argument
+```
+
+The disappearance of "syscall 29 ... 0x80047C05" and "Function not implemented" in the post-amendment log is the closing proof: `MEDIA_IOC_REQUEST_ALLOC` (`_IOR('|', 0x05, int)`) now reaches the kernel from the Utility process. Track F is **GREEN**.
+
+YouTube (`watch?v=7DAPd5MGodY`) played without engaging the v4l2_request driver at all — no `cap_pool_init`, no `/dev/video1` activity. Likely cause: YouTube negotiated VP9 or AV1 with FF150 (no `h264ify` extension installed); v4l2_request only handles H.264, so libva isn't dispatched. This is unrelated to Track F — a codec-negotiation issue, not a sandbox issue. The H.264-only fixture (`bbb_1080p30_h264.mp4` direct file URL) bypasses YouTube's codec negotiation and triggers v4l2_request, which is how the sandbox close was demonstrated.
+
+### New iter6 candidate — Firefox VIDIOC_QBUF EINVAL on first frame
+
+A driver-level issue surfaced post-sandbox-fix: with the v4l2_request driver loaded (cap_pool_init succeeds), Firefox's libva path issues a single `VIDIOC_QBUF` that returns EINVAL on what appears to be the first frame. `mpv --hwdec=vaapi-copy` decoded 2000 frames clean on the same iter5-end driver build (sha `4bed52ec5d44b389…`); only Firefox triggers this. Likely consumer-specific: Firefox's media stack feeds the driver differently than mpv (different VAEncodedSlice ordering, VAImage usage pattern, surface handle lifecycle, etc.).
+
+Logged as iter6 candidate. Track A's frame-11 EINVAL fix was about per-frame request_fd lifecycle and DPB FFmpeg-semantics matching — this looks earlier (init-phase, not frame 11), so likely a different root cause. Diagnosis would start with strace on the Firefox Utility process during initial QBUF, capture the exact `v4l2_buffer` struct payload, compare against what mpv sends.
+
+This does not retract the iter5 sweep's GREEN verdict — the per-consumer divergence reinforces the Phase 5 sonnet caveat ("sweep-completion verification needs to exercise EVERY consumer code path") that's already documented as a process lesson.
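
Review note: the Track F evidence in this diff rests on reading `0x80047C05` from the seccomp violation line as `MEDIA_IOC_REQUEST_ALLOC`, i.e. `_IOR('|', 0x05, int)`. The sketch below rechecks that arithmetic. It assumes the asm-generic ioctl bit layout (8-bit nr, 8-bit type, 14-bit size, 2-bit direction) and a 4-byte `int`. The helper names `decode_ioctl` and `encode_ior` are hypothetical and are not part of the driver or of Firefox.

```python
# Decode/encode Linux ioctl request numbers.
# Assumption: asm-generic/ioctl.h layout (nr: bits 0-7, type: bits 8-15,
# size: bits 16-29, direction: bits 30-31). Some architectures use a
# different layout; this sketch does not cover them.

IOC_DIR_NAMES = {0: "_IO", 1: "_IOW", 2: "_IOR", 3: "_IOWR"}


def decode_ioctl(request: int) -> dict:
    """Split a 32-bit ioctl request number into its four fields."""
    return {
        "dir": IOC_DIR_NAMES[(request >> 30) & 0x3],
        "type": chr((request >> 8) & 0xFF),
        "nr": request & 0xFF,
        "size": (request >> 16) & 0x3FFF,
    }


def encode_ior(type_char: str, nr: int, size: int) -> int:
    """Build an _IOR(type, nr, <size-byte argument>) request number."""
    read_direction = 2  # _IOC_READ in the asm-generic layout
    return (read_direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


if __name__ == "__main__":
    # Value taken from the pre-amendment seccomp violation log line.
    print(decode_ioctl(0x80047C05))
    # _IOR('|', 0x05, int) with an assumed 4-byte int.
    print(hex(encode_ior("|", 0x05, 4)))
```

Both directions agree: the logged value decodes to a read-direction ioctl of type `'|'`, number 5, with a 4-byte argument, and encoding `_IOR('|', 0x05, int)` reproduces `0x80047C05`. The same decoder can be pointed at the ioctl numbers in an strace capture of the Firefox Utility process when comparing its QBUF traffic against mpv's.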