Phase 0 deliverable #3 (Firefox live session): inverted verdict

Re-tested Firefox 150.0.1 inside the operator's active Plasma 6 Wayland
session (not Xvfb). Two-layer finding:

1. Firefox engages libva in a real Plasma session: the full
   V4L2-stateless contract lifecycle completes with no EINVAL on the
   request-API path, v4l2_request_drv_video.so loads successfully, and
   /dev/video1 + /dev/media0 are opened by the RDD utility process
   (PID 146420).

2. Kernel produces no decoded pixel output: CAPTURE buffer returns
   from DQBUF with the patch-0011 sentinel pattern 0xab unchanged.
   Hantro never wrote the buffer despite the contract trace looking
   clean. Firefox detected the failed first frame and silently fell
   back to SW decode in RDD's FFmpeg-OS-library PDM. User-visible
   playback continues normally for 5+ minutes (operator confirmed
   t=337s playback time in live inspection).

Cross-checked against the prior 2026-05-04 mpv vaapi-copy run: 68 of
68 mpv CAPTURE buffers show the same sentinel-survives pattern.
mpv's --vo=null consumed all 68 sentinel buffers as if they were
valid NV12 frames, so the failure was invisible. OUTPUT bytes are
byte-for-byte identical between mpv and Firefox (same IDR slice via
libavcodec): both consumers feed hantro the same data, and hantro
silently drops both.

Implication: the prior Phase 0 in-session re-verification verdict
(commit f15ba8b: "the 2026-04-26 picture holds at boolean-correctness
level") was wrong at the kernel-decode layer. The patch-0011 sentinel
test in the deployed Step 1 build was authored specifically to detect
this failure mode; the predecessor close-out didn't grep for it, and
contract-trace cleanliness was mistaken for end-to-end success.

Phase 1 lock should be deferred until (a) the boolean-correctness
criterion is sharpened to require pixel-content verification, and
(b) Phase 0 acquires kernel-side observability (ftrace, dmesg) to
characterize WHY hantro is silent. Step 1 engages libva but doesn't
make hantro decode -- Phase 6 has substantive work beyond the
18-patch series.

Likely failure-mode candidates flagged in findings_live.md, in priority
order: reference_ts not propagated; DECODE_PARAMS slice_header
bit_size fields zero; POC sentinel possibly still leaking past the
patch-0015 strip; level_idc over-allocation; SOURCE_CHANGE event
handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:38:57 +00:00
parent f115fa6cbc
commit e892cea858
14 changed files with 15747 additions and 3 deletions
+33 -3
@@ -140,13 +140,43 @@ Phase 0 deliverables status: #1 ✓, #2 ✓ (Step 1 reconciled into fork master
## Firefox engagement test (Phase 0 deliverable #3, 2026-05-04)
### Headless run (Xvfb, SSH-driven)
Stock Firefox 150.0.1 + `media.ffmpeg.vaapi.enabled=true` + `LIBVA_DRIVER_NAME=v4l2_request` env, executed under Xvfb on ohm. Full write-up: [`phase0_evidence/2026-05-04-firefox/findings.md`](phase0_evidence/2026-05-04-firefox/findings.md).

**Verdict**: inconclusive at the boolean-correctness level under the headless rig. Firefox's RDD process **dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for a capability probe**, then immediately closes them. It never reaches `vaInitialize`, never opens `/dev/dri/renderD128`, and never reaches `v4l2_request_drv_video.so`. It falls back to software H.264 in RDD via the FFmpeg-OS-library PDM (`Broadcast support from 'RDD', support=H264 SWDEC`). The gating decision happens **at Firefox's gfx-environment platform-fitness check**, before VAAPI device init. Xvfb provides a software framebuffer with no DRI/DRM render-node integration, so Firefox's PDM enumerator skips VAAPI entirely. Not a libva-side or driver-side fault.

mpv `--hwdec=vaapi-copy` in the same headless rig DID engage end-to-end, so the issue is specifically that Firefox's gfx-env requirements are stricter. A definitive Firefox verdict requires retesting inside a live Plasma session. At the time of the headless run, ohm had only the SDDM greeter on tty1 and no active user session.

**Implication for Phase 1**: Firefox stays as a target consumer in the corpus, but the binding cell for "does Firefox engage HW decode" is locked to Phase 7 verification in a real session, not to a Phase 0 baseline. mpv `--hwdec=vaapi-copy` carries the boolean-correctness substrate for Phase 0; vainfo + chromium-fourier 149 (TBD) provide additional triangulation.

### Live Plasma Wayland session run — INVERTS PRIOR PHASE 0 VERDICT
Same Firefox profile + LIBVA env, executed inside the operator's active Plasma 6 Wayland session (XDG_SESSION_TYPE=wayland, XDG_RUNTIME_DIR=/run/user/1001). Full write-up: [`phase0_evidence/2026-05-04-firefox-live/findings.md`](phase0_evidence/2026-05-04-firefox-live/findings.md).

**Result, two-layer**:

| Layer | Verdict |
|---|---|
| libva engagement (driver dlopen, contract lifecycle) | ✓ — clean. Single-frame attempt, all V4L2-stateless ioctls (REQUEST_ALLOC → S_FMT → CREATE_BUFS → STREAMON → S_EXT_CTRLS → QBUF + REQUEST_QUEUE → DQBUF + EXPBUF) succeed, no EINVAL on the request-API path. |
| **Kernel produces decoded pixel output** | **✗ — hantro returns CAPTURE buffer with patch-0011 sentinel `0xab` unchanged**. |
| Consumer reaction | Firefox detected the failed first frame and silently fell back to SW decode. User-visible: BBB plays normally for 5+ minutes via SW (operator-confirmed at t=337s playback time). |

**Cross-checked against the prior mpv vaapi-copy run**: re-examining `phase0_evidence/2026-05-04/mpv_vaapi_copy_2026-05-04.stderr` shows that **68 of 68 mpv CAPTURE buffers have the same sentinel-survives pattern**. mpv's `--vo=null` consumed all 68 sentinel buffers as if they were valid NV12 frames, so the failure was invisible. OUTPUT bytes are byte-for-byte identical between mpv and Firefox (same IDR slice, both via libavcodec).
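
The sentinel-survives tally can be reproduced from raw CAPTURE dumps with a few lines of Python. This is a minimal sketch under two assumptions. First, each dequeued CAPTURE buffer has been saved to its own file; the `*.bin` layout and `capture_dir` argument are hypothetical, since the evidence above lives in stderr logs. Second, each buffer was pre-filled with the patch-0011 sentinel byte `0xab` before QBUF, which is what "sentinel unchanged" implies.

```python
from pathlib import Path

SENTINEL = 0xAB  # patch-0011 sentinel byte


def sentinel_survived(buf: bytes, sentinel: int = SENTINEL) -> bool:
    """True if every byte of a dequeued CAPTURE buffer still equals the sentinel,
    i.e. the decoder never wrote to it."""
    return len(buf) > 0 and buf.count(sentinel) == len(buf)


def tally(buffers: list[bytes]) -> tuple[int, int]:
    """Return (survived, total) over one run's CAPTURE buffers."""
    survived = sum(1 for b in buffers if sentinel_survived(b))
    return survived, len(buffers)


def tally_dir(capture_dir: str) -> tuple[int, int]:
    # Hypothetical layout: one raw dump file per dequeued CAPTURE buffer.
    files = sorted(Path(capture_dir).glob("*.bin"))
    return tally([f.read_bytes() for f in files])
```

When `survived == total`, the run matches the 68-of-68 signature. A decoder that wrote even one buffer would show up as `survived < total`.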
### Implication: prior Phase 0 verdict (commit `f15ba8b`) was wrong
The 2026-04-26 STUDY claim of "vainfo + mpv probes work end-to-end", repeated in the prior Phase 0 in-session re-verification commit, held only at the **libva-engagement** layer. At the **kernel-decode** layer, hantro produces no decoded output for either consumer. The patch-0011 sentinel test (in the deployed Step 1 build) was authored to detect exactly this. The predecessor close-out apparently didn't grep for it, and the contract-trace cleanliness was mistaken for end-to-end success.

Phase 0 deliverable status corrections:
- **#1** (re-verify failure-mode finding) — ✗ **AMENDED**: contract trace lands cleanly, kernel produces no decoded pixels.
- **#3** (Firefox configuration end-to-end) — ✓ engagement confirmed in live Plasma session; pixel-content failure mode identical to mpv.
- **#4** (Phase 0 baseline anchor) — ✗ **AMENDED**: captured trace describes Step 1's userspace behaviour, not the kernel-side spec Phase 6 must reproduce.

**Phase 1 lock should be deferred** until: (a) the boolean-correctness criterion is sharpened to require pixel-content verification (sentinel-overwrite check, NV12 luma min/max sanity, etc.), and (b) Phase 0 includes a kernel-side observability layer (ftrace `events/v4l2/`, `dmesg` for silent decode errors) so we can characterize *why* hantro is silent. The Step 1 18-patch series engages libva but doesn't make hantro decode. Phase 6 has substantive work.
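
A pixel-content check along the lines of criterion (a) might look like the following. It is a minimal sketch that assumes a tightly packed NV12 CAPTURE buffer: luma stride equal to width, even dimensions, and no padding. Hantro may align strides, which would need an extra `stride` parameter. The function names and the `min_luma_range` threshold are hypothetical.

```python
# Hypothetical pixel-content check for a tightly packed NV12 CAPTURE buffer.
SENTINEL = 0xAB  # patch-0011 sentinel byte


def luma_plane(buf: bytes, width: int, height: int) -> bytes:
    """Slice the Y plane out of an NV12 buffer (assumes stride == width, even dims)."""
    expected = width * height * 3 // 2  # Y plane + half-size interleaved UV plane
    if len(buf) < expected:
        raise ValueError(f"buffer too small: {len(buf)} < {expected} bytes")
    return buf[: width * height]


def pixel_content_ok(buf: bytes, width: int, height: int,
                     min_luma_range: int = 16) -> tuple[bool, str]:
    """Return (ok, reason) for one dequeued CAPTURE buffer."""
    y = luma_plane(buf, width, height)
    # 1. Sentinel-overwrite check: the decoder must have written the luma plane.
    if y.count(SENTINEL) == len(y):
        return False, "sentinel survived in luma plane"
    # 2. Luma min/max sanity: reject near-constant planes (threshold is a heuristic).
    lo, hi = min(y), max(y)
    if hi - lo < min_luma_range:
        return False, f"luma range too narrow: min={lo} max={hi}"
    return True, f"luma min={lo} max={hi}"
```

Rule 1 alone would have caught every buffer in both runs above. Rule 2 also catches a decoder that writes a constant fill instead of real pixels. The cost is that a legitimately flat frame, such as a fade to black, would trip it.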

Likely failure-mode candidates (priority order, from patch comments):
1. `reference_ts` not propagated (per patch-0017 commit body: "hantro doesn't read pic_num, uses reference_ts")
2. DECODE_PARAMS slice_header bit_size fields all zero (patch 0008's open question, never resolved)
3. POC sentinel still leaking past patch-0015's strip (DEBUG dump runs *before* the strip; need post-strip verification via VIDIOC_G_EXT_CTRLS)
4. level_idc over-allocation interaction (patch 0013 → 0018 transition)
5. `V4L2_EVENT_SOURCE_CHANGE` not handled (open Q #5)
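
Candidates 1 and 2 could be screened from the control payloads submitted per request, before adding kernel-side tracing. The sketch mirrors only the fields it needs from the V4L2 stateless H.264 controls:

- `reference_ts` and `flags` of `struct v4l2_h264_dpb_entry`
- `dec_ref_pic_marking_bit_size` and `pic_order_cnt_bit_size` of `struct v4l2_ctrl_h264_decode_params`

The dataclasses, `check_frame`, and the capture mechanism (e.g. the DEBUG dump) are hypothetical. The V4L2 documentation says `reference_ts` must equal the `v4l2_buffer` timestamp of the referenced CAPTURE buffer, converted as `v4l2_timeval_to_ns()` does. The POC check assumes `pic_order_cnt_bit_size` covers at least `pic_order_cnt_lsb`.

```python
from dataclasses import dataclass

# Flag value from the kernel UAPI (linux/v4l2-controls.h).
V4L2_H264_DPB_ENTRY_FLAG_VALID = 0x01


def v4l2_timeval_to_ns(tv_sec: int, tv_usec: int) -> int:
    """Same conversion as the UAPI helper v4l2_timeval_to_ns()."""
    return tv_sec * 1_000_000_000 + tv_usec * 1_000


@dataclass
class DpbEntry:  # subset of struct v4l2_h264_dpb_entry
    reference_ts: int
    flags: int


@dataclass
class FrameControls:  # hypothetical per-request snapshot of submitted controls
    dpb: list[DpbEntry]
    nal_ref_idc: int
    pic_order_cnt_type: int
    log2_max_pic_order_cnt_lsb: int  # log2_max_pic_order_cnt_lsb_minus4 + 4
    dec_ref_pic_marking_bit_size: int
    pic_order_cnt_bit_size: int


def check_frame(ctrls: FrameControls, queued_ts_ns: set[int]) -> list[str]:
    """Return problems found; queued_ts_ns holds timestamps of earlier OUTPUT buffers."""
    problems = []
    for i, e in enumerate(ctrls.dpb):
        if not e.flags & V4L2_H264_DPB_ENTRY_FLAG_VALID:
            continue  # unused DPB slot
        if e.reference_ts not in queued_ts_ns:
            problems.append(f"dpb[{i}]: reference_ts {e.reference_ts} matches no queued buffer")
    # dec_ref_pic_marking() is present in the slice header whenever nal_ref_idc != 0.
    if ctrls.nal_ref_idc != 0 and ctrls.dec_ref_pic_marking_bit_size == 0:
        problems.append("dec_ref_pic_marking_bit_size is 0 for a reference picture")
    # With POC type 0, pic_order_cnt_lsb alone occupies log2_max_pic_order_cnt_lsb bits.
    if (ctrls.pic_order_cnt_type == 0
            and ctrls.pic_order_cnt_bit_size < ctrls.log2_max_pic_order_cnt_lsb):
        problems.append("pic_order_cnt_bit_size shorter than pic_order_cnt_lsb")
    return problems
```

A flagged DPB entry would support candidate 1, and a zero bit size on a reference picture would support candidate 2. For an IDR-only first frame the DPB is empty, so only the bit-size checks apply.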
## Source-read references (carry-over from STUDY.md)