# Phase 0 deliverable #3: Firefox VAAPI engagement, LIVE Plasma session (2026-05-04)

Re-test of `phase0_evidence/2026-05-04-firefox/findings.md` inside an active Plasma 6 Wayland session on ohm (XDG_SESSION_TYPE=wayland, XDG_RUNTIME_DIR=/run/user/1001, kwin_wayland 144533, plasmashell 144667, Xwayland :0). Same Firefox profile + LIBVA env vars; only the gfx environment changed.

## Verdict

**Two-layer finding that inverts the prior Phase 0 re-verification verdict.**

| Layer | Result |
|---|---|
| Firefox loads libva-v4l2-request driver | **✓** dlopen `v4l2_request_drv_video.so` succeeds; no sandbox/gating issue under real Plasma session |
| Firefox completes the V4L2-stateless contract lifecycle on hantro | **✓** REQUEST_ALLOC → S_FMT → CREATE_BUFS → STREAMON → S_EXT_CTRLS → QBUF + REQUEST_QUEUE → DQBUF + EXPBUF, no EINVAL on the request-API path |
| **Kernel produces decoded pixel output** | **✗** CAPTURE buffer returns with patch-0011 sentinel `0xab` unchanged. **Hantro never wrote the buffer.** |
| Consumer reaction to bad output | Firefox detects the failed first frame and **silently falls back to software decode** in RDD's FFmpeg-OS-library PDM. User-visible playback continues normally; observed t=337s (5+ minutes) of stable bunny playback via SW. |

**This means the prior Phase 0 re-verification verdict (commit `f15ba8b`, "the 2026-04-26 picture holds at boolean-correctness level") was wrong**: the prior test was a clean *contract trace* but never inspected the actual *decoded pixel content*.

## What the live-session strace shows

Decode happened on PID **146420** (Firefox utility process for hwaccel), `/dev/video1` fd 7, `/dev/media0` fd 8.
Single-frame attempt (full ioctl summary, exhaustive, not a sample):

```
22 VIDIOC_ENUM_FMT           (probe, including expected MPLANE-fallback EINVALs)
 5 VIDIOC_QUERYBUF
 4 VIDIOC_G_FMT
 2 VIDIOC_STREAMON           (OUTPUT_MPLANE + CAPTURE_MPLANE)
 2 VIDIOC_STREAMOFF          (the bail-out after frame 0)
 2 VIDIOC_S_EXT_CTRLS        (1 device-wide DECODE_MODE+START_CODE per patch 0002, 1 per-request)
 2 VIDIOC_REQBUFS
 2 VIDIOC_QBUF               (1 OUTPUT, 1 CAPTURE)
 2 VIDIOC_DQBUF              (1 OUTPUT, 1 CAPTURE)
 2 VIDIOC_CREATE_BUFS
 1 VIDIOC_S_FMT              (OUTPUT_MPLANE H264_SLICE 1920x1088)
 1 VIDIOC_EXPBUF             (DMA-BUF export of CAPTURE buffer)
 1 MEDIA_REQUEST_IOC_QUEUE
 1 MEDIA_REQUEST_IOC_REINIT
 1 MEDIA_IOC_REQUEST_ALLOC
```

Single QBUF/DQBUF pair, single MEDIA_REQUEST_IOC_QUEUE = exactly one frame attempted. No EINVAL on any request-API ioctl. Two STREAMOFF = clean shutdown of both queues after the failed frame.

After the libva STREAMOFF, **no further V4L2 ioctls** anywhere in the strace. Bunny continued playing for 5+ minutes via SW decode, confirmed by user inspection (t=337s playback time visible in window title).

## What the .so DEBUG instrumentation shows

`stderr_live` (4 lines, 553 B, the entire output of patches 0010+0011+0014):

```
v4l2-request: VAPictureH264 sizeof=36 CurrPic[0..31]: 00 00 00 04 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
v4l2-request: VAPictureH264 CurrPic field reads: picture_id=0x04000000 frame_idx=0 flags=0x0 TopFOC=65536 BottomFOC=65536 frame_num=0
v4l2-request: OUTPUT[idx=0, len=6272]: 00 00 01 25 b8 20 20 21 44 c5 00 01 57 9b ef be fb ef be fb ef be fb ef be fb ef be fb ef be fb
v4l2-request: CAPTURE[idx=0, plane0]: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
```

- **VAPictureH264 dump (patch 0014)**: `TopFOC=65536` and `BottomFOC=65536` are the ffmpeg-vaapi POC sentinel values that patch 0015 is supposed to strip.
  **The dump shows the values BEFORE patch 0015's strip.** The strip happens after the dump in the code path, so the dump captures the raw VAAPI input. After the strip, the values handed to V4L2 should be 0/0. (Worth verifying via a second dump after the strip.)
- **OUTPUT buffer (patch 0010)**: `00 00 01 25 b8 20 ...` is a correct ANNEX_B start code + IDR slice NAL header (0x25 = forbidden_zero=0, nal_ref_idc=01, nal_unit_type=00101 = IDR slice). The `ef be fb` repeating pattern that initially looked like poison is real H.264 RBSP slice data (CABAC bin sequences are random-looking). 6272 bytes is plausible for a 1080p IDR slice.
- **CAPTURE buffer (patch 0011)**: **All 32 bytes of plane 0 still hold `0xab`**, the sentinel pattern patch 0011 wrote before QBUF. The kernel returned the buffer via DQBUF without writing to it. Hantro did *not* decode this frame.

## Cross-check against the 2026-05-04 mpv vaapi-copy run

Re-examined `phase0_evidence/2026-05-04/mpv_vaapi_copy_2026-05-04.stderr` (which we previously called "68 frames decoded cleanly"):

```
$ grep "CAPTURE\[" mpv_vaapi_copy_2026-05-04.stderr | wc -l
68
$ grep "CAPTURE\[" mpv_vaapi_copy_2026-05-04.stderr | grep -c "ab ab ab ab"
68
```

**68 of 68 mpv CAPTURE buffers show the same sentinel-survives pattern.** mpv `--vo=null` consumed all 68 sentinel buffers as if they were valid NV12 frames; with no real VO to render to, the failure was invisible.

OUTPUT bytes for frame 0 are byte-for-byte identical between mpv and Firefox (same IDR slice from the same source clip, both via libavcodec). Both consumers feed hantro the same data; hantro silently drops both.

## Why the 2026-04-26 STUDY claim survived as long as it did

The claim was "vainfo + mpv probes work end-to-end." This is true *at the libva-engagement layer*: vainfo enumerates profiles, mpv completes the contract lifecycle, no errors on the request-API path.
The check that was missing was *pixel-content verification*:

- vainfo doesn't decode; it only enumerates capabilities. Always green.
- mpv with `--vo=gpu` or `--vo=vaapi` would have shown a uniform flat-colored frame (all 0xab in NV12) instead of video, but the test rig in the predecessor campaign was probably the same as ours: SSH + `--vo=null`.
- `fourier_attribution` cell A (chromium-fourier 149 with Step 1 patches, browser_cpu_median = 54 %) **was visually inspected by the operator on a real screen** during that campaign. Cell A's bunny WAS visible and playing, but on Brave/Chromium, not on mpv or Firefox. Chromium's V4L2 path may differ (it uses its own backend in addition to libva, depending on flags); the cell A success could be a different code path than the one we just traced.
- The patch-0011 sentinel test was apparently authored to detect this exact failure mode, but its output wasn't being grepped in the close-out. The patch series was held to be working based on the cleanliness of the contract trace, which we now know is necessary but not sufficient.

## Implication: Phase 0 substrate result is "kernel decode broken"

The Phase 0 in-session re-verification (campaign repo commit `f15ba8b`) overstated the case. The corrected verdict:

- libva engagement: ✓ on both mpv and Firefox in their respective rigs
- V4L2-stateless contract trace: ✓ no EINVAL on the request-API path
- **Hantro produces decoded pixel output: ✗ on every frame attempted, by either consumer**

The Phase 1 criterion as currently locked defines "boolean correctness" as `libva accepted + providing access to hardware decoder`. We reached "libva accepted" but **not** "providing access to hardware decoder" in the meaningful sense. The criterion should be sharpened to require pixel-content verification, e.g.: "the CAPTURE buffer returned from DQBUF must contain decoded pixel data (i.e. the sentinel has been overwritten); a smoke test of NV12 luma min/max range across at least one frame must reject the all-0xab pattern."
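The sharpened criterion could be smoke-tested along the following lines. This is a minimal sketch, not the campaign's tooling: the function names (`luma_plane`, `luma_range`, `capture_looks_decoded`) are hypothetical, the buffer is assumed to be a raw NV12 dump of one CAPTURE buffer, and the row stride is assumed to equal the width unless passed explicitly.

```python
from typing import Optional, Tuple

SENTINEL = 0xAB  # fill byte written before QBUF, per the patch-0011 findings above


def luma_plane(buf: bytes, width: int, height: int,
               stride: Optional[int] = None) -> bytes:
    """Return the visible Y plane of an NV12 buffer, dropping row padding.

    Assumption: the Y plane starts at offset 0 and rows are `stride` bytes apart.
    """
    stride = width if stride is None else stride
    if stride < width:
        raise ValueError("stride must be at least the visible width")
    if len(buf) < stride * height:
        raise ValueError("buffer is shorter than one luma plane")
    return b"".join(buf[row * stride: row * stride + width]
                    for row in range(height))


def luma_range(buf: bytes, width: int, height: int,
               stride: Optional[int] = None) -> Tuple[int, int]:
    """Return (min, max) over the visible luma samples."""
    y = luma_plane(buf, width, height, stride)
    return min(y), max(y)


def capture_looks_decoded(buf: bytes, width: int, height: int,
                          stride: Optional[int] = None,
                          sentinel: int = SENTINEL) -> bool:
    """Reject a frame whose visible luma is entirely the sentinel byte."""
    lo, hi = luma_range(buf, width, height, stride)
    return not (lo == hi == sentinel)


if __name__ == "__main__":
    w, h = 1920, 1088  # CAPTURE geometry reported by G_FMT in this run
    untouched = bytes([SENTINEL]) * (w * h * 3 // 2)
    print("untouched buffer passes:", capture_looks_decoded(untouched, w, h))
```

The check only rejects the exact failure seen here (min == max == sentinel). A genuinely decoded frame whose luma is uniformly 0xab would be a false reject, which seems an acceptable trade-off for a smoke test; a flat black frame (luma 16) still passes.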
The Phase 1 lock therefore needs amendment.

## What this means for Phase 6 / Step 1

The deployed Step 1 18-patch series engages the libva path correctly but doesn't make hantro decode. The bug surface is in one of these areas (rough priority order, based on patch-comment hints):

1. **`reference_ts` not propagated.** Patch 0017's commit message: "hantro doesn't read pic_num (uses reference_ts)." Implication: hantro depends on reference_ts being populated correctly to find DPB references for inter prediction. For an IDR (frame 0), reference_ts is irrelevant, but if reference_ts is malformed in a way that breaks SPS parsing, hantro might bail before decode.
2. **DECODE_PARAMS missing slice_header bit_size fields.** Patch 0008's open question was explicitly "does hantro tolerate the bit_size fields being zero, or do we need a slice_header() bit-level parser?" The Step 1 series did NOT add a slice_header parser, so those fields are zero. Hantro may not tolerate that and may silently skip the decode.
3. **POC sentinel still leaking.** Patch 0015's strip happens at the right call sites, but the DEBUG dump (patch 0014) runs *before* the strip, so the dump shows the raw 65536 values. Verify that the values handed to V4L2 are actually stripped, either by adding a post-strip dump or by reading `V4L2_CID_STATELESS_H264_DECODE_PARAMS` back via VIDIOC_G_EXT_CTRLS just before QBUF.
4. **Level_idc over-allocation interaction.** Patch 0013 hardcodes `level_idc=51`; patch 0018 derives it from Annex A.3 (so for 1080p we'd get level_idc=41). hantro uses level_idc to size DPB/MV buffers. Wrong sizing might allocate too small and drop the decode silently.
5. **CAPTURE format derivation.** Patch 0009 removed `VIDIOC_S_FMT` on CAPTURE per "Hantro derives CAPTURE format from per-request SPS." The G_FMT shows NV12 1920×1088, which looks right, but if SPS isn't being submitted, the kernel might decode into a different layout that overwrites neither the sentinel nor the real CAPTURE bytes.
6. **Other hantro silent-failure paths**: `V4L2_EVENT_SOURCE_CHANGE` (open Q #5 in `phase0_findings.md`), per-frame timestamp / VIDIOC_S_PARM, missing `frame_num` / IDR-bit setup in DECODE_PARAMS flags.

The correct Phase 6 starting point is to instrument *the kernel side*: ftrace `events/v4l2/` and `events/hantro/` (if exposed), or `dmesg` for any silent-decode-error messages, while the userspace contract trace runs. That's the actual Phase 3 baseline we need.

## Artifacts

- `firefox_live.log.{moz_log,child-1.moz_log}`: MOZ_LOG output from the live-session run
- `firefox_stderr_live`: the .so DEBUG output (only 4 lines because only 1 frame was attempted)
- `firefox_stdout_live`: empty
- `strace_146420`: the decode utility process, full V4L2-stateless lifecycle
- `strace_146198`, `strace_146199`, `strace_146200`, `strace_146201`, `strace_146203`: RDD + content + GPU process traces
- `strace_146147`: Firefox parent
- `strace_146164`: fork-server / GMP-related child

## Phase 0 deliverables status (updated)

- **#1** Re-verify failure-mode finding in-session: ✗ **AMENDED**. The contract trace lands cleanly, but the kernel produces no decoded pixels. Prior commit `f15ba8b` overstated the verdict.
- **#2** Step 1 reconciliation: ✓ done in commit `74b3793` on fork master.
- **#3** Firefox configuration end-to-end: ✓ engagement confirmed in live Plasma session; pixel-content failure mode identical to mpv (libva ✓, hantro ✗).
- **#4** Phase 0 baseline anchor: ✗ **AMENDED**. The captured contract trace describes Step 1's userspace behaviour, not what Phase 6 must reproduce. Phase 6's spec needs to include kernel-side observability (ftrace / dmesg) so we can actually characterize hantro's silent failure.

The Phase 1 lock should be deferred until we have a sharpened boolean-correctness criterion (pixel-content verification) and at least a hypothesis about why hantro is silent. Phase 0 isn't done yet.
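Since the patch-0011 sentinel output was not being grepped at close-out, that step could be scripted so it gates the verdict. The sketch below is an illustration under assumptions: it expects the `CAPTURE[idx=N, planeM]: <hex bytes>` line format shown in the DEBUG output above, and the names `sentinel_survivors` and `close_out_ok` are hypothetical.

```python
import re
from typing import Iterable, Tuple

# Matches the DEBUG dump lines shown above, e.g.
#   v4l2-request: CAPTURE[idx=0, plane0]: ab ab ab ...
CAPTURE_LINE = re.compile(
    r"CAPTURE\[idx=(\d+), plane(\d+)\]:\s*((?:[0-9a-fA-F]{2}\s*)+)$"
)


def sentinel_survivors(lines: Iterable[str],
                       sentinel: int = 0xAB) -> Tuple[int, int]:
    """Return (capture_dumps_seen, dumps_where_every_byte_is_sentinel)."""
    total = 0
    untouched = 0
    for line in lines:
        match = CAPTURE_LINE.search(line.rstrip())
        if match is None:
            continue  # OUTPUT dumps, VAPictureH264 dumps, unrelated stderr
        total += 1
        dumped = bytes.fromhex(match.group(3))
        if all(b == sentinel for b in dumped):
            untouched += 1
    return total, untouched


def close_out_ok(total: int, untouched: int) -> bool:
    """Pass only if at least one frame was dumped and none kept the sentinel."""
    return total > 0 and untouched == 0


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        seen, stuck = sentinel_survivors(fh)
    print(f"{stuck} of {seen} CAPTURE dumps still hold the sentinel")
    sys.exit(0 if close_out_ok(seen, stuck) else 1)
```

Unlike `grep -c "ab ab ab ab"`, which counts any line containing four consecutive `ab` bytes, this requires every dumped byte to equal the sentinel. It still only sees the bytes the DEBUG patch chose to dump, so it complements a full-plane check and does not replace one.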