# Phase 6: implementation result

The Tier 1A + Tier 1B + Tier 2C fixes from `diff_against_ffmpeg.md` produced a working hardware-decoded H.264 pipeline on hantro RK3568 / ohm via libva-v4l2-request, end-to-end, in the operator's live Plasma 6 Wayland session.

## Verdict

**WIN.** mpv `--hwdec=vaapi-copy --vo=gpu` on `bbb_1080p30_h264.mp4` shows the bunny playing on screen: real decoded NV12 pixel data, not the all-zero (green) or solid-color output we'd been getting all day. Operator confirmation: 2026-05-04, "A big fat white bunny shows up." First end-to-end success in this campaign.

## What landed in the fork

Two commits on `marfrit/libva-v4l2-request-fourier` master:

- **`d41a4b9`**: `h264: always submit SCALING_MATRIX + populate pps num_ref_idx`
  - `h264_default_flat_scaling_matrix()`: H.264 spec flat default (16/16/...) when the bitstream has no explicit lists
  - SCALING_MATRIX submitted unconditionally (Tier 2C)
  - `V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` set unconditionally (Tier 2C)
  - `pps->num_ref_idx_l0_default_active_minus1` and `_l1_` populated from VASlice (Tier 1B)
- **`9de1be3`**: `h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields` (the load-bearing fix; Tier 1A)
  - New `src/h264_slice_header.{c,h}`: minimal RBSP bit reader + slice_header() parser per ITU-T H.264 §7.3.3
  - Walks slice_header through dec_ref_pic_marking() and measures bit positions
  - Populates `decode->idr_pic_id`, `pic_order_cnt_lsb`, `delta_pic_order_cnt_*`, `pic_order_cnt_bit_size`, `dec_ref_pic_marking_bit_size`
  - Three of these fields (`dec_ref_pic_marking_bit_size`, `idr_pic_id`, `pic_order_cnt_bit_size`) drive the hantro G1 hardware MMIO registers `G1_REG_DEC_CTRL5_REFPIC_MK_LEN`, `G1_REG_DEC_CTRL5_IDR_PIC_ID`, `G1_REG_DEC_CTRL6_POC_LENGTH`. Without them the hardware bitstream parser walks past zero bits, lands on garbage, and decodes zero pixels.
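
The bit-position measurement described for `9de1be3` can be illustrated with a minimal sketch. This is not the code in `src/h264_slice_header.c`; the struct and function names (`bitreader`, `br_read_bits`, `br_read_ue`, `measure_ue_bits`) are hypothetical, and the sketch assumes the input is an RBSP with emulation-prevention bytes already removed. It shows the general technique: read an unsigned Exp-Golomb `ue(v)` value and record how many bits the field occupied by taking the position delta around the read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over an RBSP buffer
 * (assumes emulation-prevention bytes were already stripped). */
struct bitreader {
	const uint8_t *data;
	size_t size_bits;	/* total number of valid bits in data */
	size_t pos;		/* current bit offset from the start of data */
};

/* Read n bits (n <= 32), most significant bit first.
 * Bits past the end of the buffer read as 0. */
static uint32_t br_read_bits(struct bitreader *br, unsigned int n)
{
	uint32_t v = 0;

	assert(n <= 32);
	while (n--) {
		uint32_t bit = 0;

		if (br->pos < br->size_bits)
			bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
		v = (v << 1) | bit;
		br->pos++;
	}
	return v;
}

/* ue(v): count leading zero bits, consume the terminating 1 bit,
 * then read that many suffix bits. */
static uint32_t br_read_ue(struct bitreader *br)
{
	unsigned int zeros = 0;

	while (zeros < 31 && br->pos < br->size_bits &&
	       br_read_bits(br, 1) == 0)
		zeros++;
	return ((uint32_t)1 << zeros) - 1 + br_read_bits(br, zeros);
}

/* Read one ue(v) field and return how many bits it occupied,
 * which is the kind of value a *_bit_size field carries. */
static size_t measure_ue_bits(struct bitreader *br, uint32_t *value)
{
	size_t start = br->pos;

	*value = br_read_ue(br);
	return br->pos - start;
}
```

For example, the byte `0x28` (bits `00101000`) decodes as `ue(v) = 4` and occupies 5 bits. The same position-delta approach extends to a whole syntax structure such as dec_ref_pic_marking(): record the position before parsing it and subtract after.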

## Slice-header parser empirical validation

The parser's own `request_log` output for the first three frames of bbb (with the new build deployed):

```
slice_header parse: idr_pic_id=0 poc_lsb=0 poc_bits=8 refmark_bits=2 frame_num=0 slice_type=2 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=4 poc_bits=8 refmark_bits=1 frame_num=1 slice_type=0 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=2 poc_bits=8 refmark_bits=0 frame_num=2 slice_type=1 pps_id=0
```

- Frame 0 IDR: slice_type=2 (I-slice), idr_pic_id=0, frame_num=0, refmark_bits=2 (no_output_of_prior_pics_flag + long_term_reference_flag, which is exactly what an IDR's dec_ref_pic_marking contains) ✓
- Frame 1: slice_type=0 (P-slice), frame_num=1, refmark_bits=1 (adaptive_ref_pic_marking_mode_flag=0 only) ✓
- Frame 2: slice_type=1 (B-slice), frame_num=2 ✓
- `frame_num` from the bitstream parse matches VAAPI's pre-parsed `frame_num`. POC bit-size = 8 matches `log2_max_pic_order_cnt_lsb_minus4 + 4 = 4 + 4 = 8` ✓

Cross-check: parser output is internally consistent and matches VAAPI's pre-parsed values for all observable fields.

## Build + deployment

```
ssh ohm: git clone gitea/marfrit/libva-v4l2-request-fourier   (HEAD = 9de1be3)
ssh ohm: cd src && meson setup build --buildtype=release && ninja -C build
ssh ohm (sudo): cp build/src/v4l2_request_drv_video.so /usr/lib/dri/
```

New `.so` size: 77 224 B (vs prior 67 480 B for the Step 1 build; net +9.7 kB for the slice-header parser + minor edits). sha256: `33cfe687e459310d8d5571e6380e330f7ed716ac12710d67ef3821532108c7a0`. No build warnings except pre-existing ones in `src/v4l2.h` (struct forward-decl issue; orthogonal).
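
The expected values used in the slice-header validation earlier in this report (`refmark_bits` of 2, 1 and 0, and `poc_bits` of 8) can be written down as a small sanity-check sketch. The helper names are hypothetical and not part of the fork. The sketch assumes no memory-management control operations are present (adaptive_ref_pic_marking_mode_flag = 0) and that dec_ref_pic_marking() is absent when `nal_ref_idc` is 0; that frame 2 is a non-reference picture is inferred from its `refmark_bits=0`, since the log does not print `nal_ref_idc`.

```c
#include <assert.h>

/* Expected bit length of dec_ref_pic_marking() in the simple cases
 * (no MMCO commands). Hypothetical helper for cross-checking logs. */
static unsigned int expected_refmark_bits(unsigned int nal_unit_type,
					  unsigned int nal_ref_idc)
{
	assert(nal_unit_type < 32 && nal_ref_idc < 4);

	if (nal_ref_idc == 0)
		return 0;	/* structure absent on non-reference pictures */
	if (nal_unit_type == 5)
		return 2;	/* IDR: no_output_of_prior_pics_flag +
				 * long_term_reference_flag */
	return 1;		/* adaptive_ref_pic_marking_mode_flag = 0 */
}

/* Bit length of pic_order_cnt_lsb (only coded when
 * pic_order_cnt_type == 0), derived from the SPS field. */
static unsigned int expected_poc_bits(unsigned int log2_max_poc_lsb_minus4)
{
	return log2_max_poc_lsb_minus4 + 4;
}
```

With `log2_max_pic_order_cnt_lsb_minus4 = 4`, `expected_poc_bits` gives 8, and the three frames in the log correspond to `expected_refmark_bits(5, nonzero)`, `expected_refmark_bits(1, nonzero)` and `expected_refmark_bits(1, 0)`.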

## Phase 1 boolean-correctness criterion, re-evaluated

| Criterion | Pre-fix | Post-fix |
|---|---|---|
| Consumer dlopens v4l2_request_drv_video.so | ✓ | ✓ |
| vaInitialize succeeds | ✓ | ✓ |
| V4L2-stateless contract trace lands without EINVAL | ✓ | ✓ |
| Kernel processes the request (no dmesg errors) | ✓ | ✓ |
| Kernel writes to CAPTURE buffer | ✓ (zeros) | ✓ (real NV12 pixels) |
| **Real-VO renders decoded frames** | ✗ (green/blue uniform) | **✓ (bunny visible)** |

The sharpened criterion holds for mpv `--hwdec=vaapi-copy` in a live session. Phase 7 needs to extend this verification across the full test corpus.

## Phase 7 verification: what's left

- mpv `--hwdec=vaapi-copy --vo=gpu` (live session): ✓ confirmed (bunny)
- mpv `--hwdec=vaapi --vo=gpu` (live session): TBD; different code path (DMA-BUF GL-import vs vaapi-copy)
- vainfo: was ✓ pre-fix at the enumeration layer; should remain ✓
- Firefox: pre-fix attempted ONE libva frame, got zeros, fell back to SW. Post-fix: should engage and stay engaged. Re-test in live session.
- chromium-fourier 149: pre-fix worked per `fourier_attribution` cell A (browser_cpu_median = 54%, fps = 24.0). Post-fix should not regress; ideally should also work via this libva path now.
- Brave 1.89 (deferred per `phase0_findings.md`): not blocking.

## What's next

- **Step 4** (observability hardening): fix patch-0011 cache coherency (msync MS_INVALIDATE), add VIDIOC_G_EXT_CTRLS readback to confirm the V4L2 layer accepts our writes. Helps future probes; not urgent.
- **Step 5** (Phase 7): retest mpv vaapi (no -copy), Firefox, chromium-fourier across the live session. Document any regressions or edge-case failures.

## Why the win

The full path:

1. **Phase 0 substrate** correctly characterized hantro's silent zero-output behavior (after three iterations of Phase 0 verdict revisions).
2. **Phase 4 plan** (`diff_against_ffmpeg.md`) anchored against two authoritative sources: FFmpeg's proven-working `v4l2_request_h264.c` and Linux mainline's `hantro_g1_h264_dec.c`. The plan was specific (which fields, why, what register).
3. **Phase 6 implementation** scoped strictly to the plan: three fixes in two commits, no scope creep, contract-explicit commit messages.
4. **Phase 7-style real-VO check** confirmed the fix without ambiguity (operator visual inspection of the bunny).

The lesson captured in `feedback_kernel_source_audit_for_uapi_contract.md` (added 2026-05-04) is the durable takeaway: kernel-side source audit is non-optional for UAPI contract work where userspace fields drive hardware MMIO writes.