First end-to-end hardware decode in this campaign. mpv --hwdec=vaapi-copy --vo=gpu on bbb_1080p30_h264.mp4 in the operator's live Plasma 6 Wayland session shows the bunny playing: real decoded NV12 pixel data, not the all-zero (green) or solid-color output we had all day. Operator confirmation 2026-05-04: "A big fat white bunny shows up."

Two fork commits got us here:

- d41a4b9 (h264: always submit SCALING_MATRIX + populate pps num_ref_idx)
- 9de1be3 (h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields)

The load-bearing fix was 9de1be3 (slice-header bit parser). It populates dec_param->dec_ref_pic_marking_bit_size, idr_pic_id and pic_order_cnt_bit_size, which hantro G1 writes directly into MMIO registers (G1_REG_DEC_CTRL5_REFPIC_MK_LEN, G1_REG_DEC_CTRL5_IDR_PIC_ID, G1_REG_DEC_CTRL6_POC_LENGTH).

The Phase 1 boolean-correctness criterion is now sharpened to require real-VO pixel-content verification. It is met for mpv vaapi-copy in the live session. Phase 7 verification is still owed across the full test corpus (vainfo, mpv vaapi (no -copy), Firefox, chromium-fourier 149).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6 — implementation result
The Tier 1A + Tier 1B + Tier 2C fixes from diff_against_ffmpeg.md produced a working, end-to-end hardware-decoded H.264 pipeline on the RK3568's hantro decoder (host ohm) via libva-v4l2-request, in the operator's live Plasma 6 Wayland session.
Verdict
WIN. mpv --hwdec=vaapi-copy --vo=gpu on bbb_1080p30_h264.mp4 shows the bunny playing on screen — real decoded NV12 pixel data, not the all-zero (green) or solid-color output we'd been getting all day.
Operator confirmation: 2026-05-04, "A big fat white bunny shows up." First end-to-end success in this campaign.
What landed in the fork
Two commits on marfrit/libva-v4l2-request-fourier master:
- d41a4b9 (h264: always submit SCALING_MATRIX + populate pps num_ref_idx)
  - h264_default_flat_scaling_matrix(): H.264 spec flat default (every entry 16) when the bitstream has no explicit lists
  - SCALING_MATRIX submitted unconditionally (Tier 2C)
  - V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT set unconditionally (Tier 2C)
  - pps->num_ref_idx_l0_default_active_minus1 and pps->num_ref_idx_l1_default_active_minus1 populated from VASlice (Tier 1B)
- 9de1be3 (h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields), the load-bearing fix (Tier 1A)
  - New src/h264_slice_header.{c,h}: minimal RBSP bit reader + slice_header() parser per ITU-T H.264 §7.3.3
  - Walks slice_header through dec_ref_pic_marking() and measures bit positions
  - Populates decode->idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*, pic_order_cnt_bit_size, dec_ref_pic_marking_bit_size
  - Three of these fields (dec_ref_pic_marking_bit_size, idr_pic_id, pic_order_cnt_bit_size) drive the hantro G1 MMIO registers G1_REG_DEC_CTRL5_REFPIC_MK_LEN, G1_REG_DEC_CTRL5_IDR_PIC_ID and G1_REG_DEC_CTRL6_POC_LENGTH respectively. Left at zero, they make the hardware bitstream parser skip zero bits for those syntax elements, land on garbage, and decode zero pixels.
Slice-header parser empirical validation
The parser's own request_log output for the first three frames of bbb (with the new build deployed):
slice_header parse: idr_pic_id=0 poc_lsb=0 poc_bits=8 refmark_bits=2 frame_num=0 slice_type=2 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=4 poc_bits=8 refmark_bits=1 frame_num=1 slice_type=0 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=2 poc_bits=8 refmark_bits=0 frame_num=2 slice_type=1 pps_id=0
- Frame 0 IDR: slice_type=2 (I-slice), idr_pic_id=0, frame_num=0, refmark_bits=2 (no_output_of_prior_pics_flag + long_term_reference_flag, exactly the two flags an IDR picture's dec_ref_pic_marking() contains) ✓
- Frame 1: slice_type=0 (P-slice), frame_num=1, refmark_bits=1 (adaptive_ref_pic_marking_mode_flag=0 only) ✓
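- Frame 2: slice_type=1 (B-slice), frame_num=2, refmark_bits=0 (dec_ref_pic_marking() absent, which slice_header() only allows when nal_ref_idc == 0, i.e. a non-reference picture) ✓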
frame_num from the bitstream parse matches VAAPI's pre-parsed frame_num. POC bit size = 8 matches log2_max_pic_order_cnt_lsb_minus4 + 4 = 4 + 4 = 8 ✓
Cross-check: parser output is internally consistent and matches VAAPI's pre-parsed values for all observable fields.
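The refmark_bits values in the log follow directly from the dec_ref_pic_marking() syntax in ITU-T H.264 §7.3.3.3: an IDR picture carries exactly two flags, a reference non-IDR picture carries adaptive_ref_pic_marking_mode_flag plus any MMCO commands, and slice_header() omits the structure entirely when nal_ref_idc == 0. The sketch below measures that size with a minimal MSB-first bit reader. It is a hypothetical illustration, not the fork's src/h264_slice_header.c; it assumes the input is already RBSP (emulation-prevention bytes removed) and that the reader is positioned at the start of dec_ref_pic_marking().

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader over an RBSP buffer (hypothetical; assumes
 * emulation-prevention bytes have already been stripped). */
struct bitreader {
    const uint8_t *buf;
    size_t size; /* buffer length in bytes */
    size_t pos;  /* current position in bits */
};

static unsigned read_bit(struct bitreader *br)
{
    if (br->pos >= br->size * 8)
        return 0; /* past the end: a real parser should report an error */
    unsigned bit = (br->buf[br->pos / 8] >> (7 - br->pos % 8)) & 1u;
    br->pos++;
    return bit;
}

/* u(n): n-bit unsigned value, MSB first (n <= 32) */
static uint32_t read_bits(struct bitreader *br, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 1) | read_bit(br);
    return v;
}

/* ue(v): unsigned Exp-Golomb code (H.264 §9.1) */
static uint32_t read_ue(struct bitreader *br)
{
    int zeros = 0;
    while (zeros < 31 && read_bit(br) == 0) /* cap guards malformed input */
        zeros++;
    return ((1u << zeros) - 1) + read_bits(br, zeros);
}

/* Size in bits of dec_ref_pic_marking() (§7.3.3.3), read from the current
 * position. slice_header() only contains it when nal_ref_idc != 0. */
static size_t dec_ref_pic_marking_bits(struct bitreader *br, int idr_pic,
                                       unsigned nal_ref_idc)
{
    size_t start = br->pos;

    if (nal_ref_idc == 0)
        return 0;
    if (idr_pic) {
        read_bit(br); /* no_output_of_prior_pics_flag */
        read_bit(br); /* long_term_reference_flag */
    } else if (read_bit(br)) { /* adaptive_ref_pic_marking_mode_flag */
        uint32_t mmco;
        do {
            mmco = read_ue(br); /* memory_management_control_operation */
            if (mmco == 1 || mmco == 3)
                read_ue(br); /* difference_of_pic_nums_minus1 */
            if (mmco == 2)
                read_ue(br); /* long_term_pic_num */
            if (mmco == 3 || mmco == 6)
                read_ue(br); /* long_term_frame_idx */
            if (mmco == 4)
                read_ue(br); /* max_long_term_frame_idx_plus1 */
        } while (mmco != 0 && br->pos < br->size * 8);
    }
    return br->pos - start;
}
```

Measuring how far the bit position moves, rather than adding up field widths by hand, keeps the count right when MMCO commands with variable-length ue(v) fields are present. With the bbb SPS (log2_max_pic_order_cnt_lsb_minus4 = 4), pic_order_cnt_lsb would be read the same way, as read_bits(br, 4 + 4).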
Build + deployment
ssh ohm: git clone gitea/marfrit/libva-v4l2-request-fourier (HEAD = 9de1be3)
ssh ohm: cd src && meson setup build --buildtype=release && ninja -C build
ssh ohm (sudo): cp build/src/v4l2_request_drv_video.so /usr/lib/dri/
New .so size: 77 224 B (vs prior 67 480 B for the Step 1 build — net +9.7 kB for the slice-header parser + minor edits). sha256: 33cfe687e459310d8d5571e6380e330f7ed716ac12710d67ef3821532108c7a0.
No build warnings except pre-existing ones in src/v4l2.h (struct forward-decl issue; orthogonal).
Phase 1 boolean-correctness criterion — re-evaluated
| Criterion | Pre-fix | Post-fix |
|---|---|---|
| Consumer dlopens v4l2_request_drv_video.so | ✓ | ✓ |
| vaInitialize succeeds | ✓ | ✓ |
| V4L2-stateless contract trace lands without EINVAL | ✓ | ✓ |
| Kernel processes the request (no dmesg errors) | ✓ | ✓ |
| Kernel writes to CAPTURE buffer | ✓ (zeros) | ✓ (real NV12 pixels) |
| Real-VO renders decoded frames | ✗ (green/blue uniform) | ✓ (bunny visible) |
The sharpened criterion holds for mpv --hwdec=vaapi-copy in a live session. Phase 7 needs to extend this verification across the full test corpus.
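The sharpened criterion is still checked by eye, but the specific pre-fix failure modes (all-zero NV12, which renders green, and other uniform fills) could be screened automatically before anyone looks at the screen. A minimal C sketch, assuming a mapped NV12 CAPTURE buffer with known strides and even dimensions; the helper name is hypothetical, and a non-uniform result only shows the frame is not blank, not that it is decoded correctly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if an NV12 frame is a uniform fill (every
 * luma sample equal, every U equal, every V equal), 0 otherwise.
 * Assumes even width/height; uv points at the interleaved CbCr plane. */
static int nv12_frame_is_uniform(const uint8_t *y, size_t y_stride,
                                 const uint8_t *uv, size_t uv_stride,
                                 size_t width, size_t height)
{
    for (size_t row = 0; row < height; row++)
        for (size_t col = 0; col < width; col++)
            if (y[row * y_stride + col] != y[0])
                return 0;

    /* Chroma plane: height/2 rows of width bytes, alternating U, V. Compare
     * each byte against the first U (uv[0]) or the first V (uv[1]). */
    for (size_t row = 0; row < height / 2; row++)
        for (size_t col = 0; col < width; col++)
            if (uv[row * uv_stride + col] != uv[col & 1])
                return 0;

    return 1;
}
```

A caller would treat uniform frames well past the start of the clip as a decode failure; the first frames of a real video can legitimately be black, so a single uniform frame is not conclusive.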
Phase 7 verification — what's left
- mpv --hwdec=vaapi-copy --vo=gpu (live session): ✓ confirmed (bunny)
- mpv --hwdec=vaapi --vo=gpu (live session): TBD; different code path (DMA-BUF GL import vs vaapi-copy)
- vainfo: was ✓ pre-fix at the enumeration layer; should remain ✓
- Firefox: pre-fix attempted ONE libva frame, got zeros, fell back to SW. Post-fix: should engage and stay engaged. Re-test in live session.
- chromium-fourier 149: pre-fix worked per fourier_attribution cell A (browser_cpu_median = 54%, fps = 24.0). Post-fix should not regress; ideally should also work via this libva path now.
- Brave 1.89 (deferred per phase0_findings.md): not blocking.
What's next
- Step 4 (observability hardening): fix patch-0011 cache coherency (msync MS_INVALIDATE) and add a VIDIOC_G_EXT_CTRLS readback to confirm the V4L2 layer accepted our writes. Helps future probes; not urgent.
- Step 5 (Phase 7): retest mpv vaapi (no -copy), Firefox, chromium-fourier across the live session. Document any regressions or edge-case failures.
Why the win
The full path:
- Phase 0 substrate correctly characterized hantro's silent zero-output behavior (after three iterations of Phase 0 verdict revisions).
- Phase 4 plan (diff_against_ffmpeg.md) anchored against two authoritative sources: FFmpeg's proven-working v4l2_request_h264.c and Linux mainline's hantro_g1_h264_dec.c. The plan was specific (which fields, why, what register).
- Phase 6 implementation scoped strictly to the plan: three fixes in two commits, no scope creep, contract-explicit commit messages.
- Phase 7-style real-VO check confirmed the fix without ambiguity (operator visual inspection of the bunny).
The lesson captured in feedback_kernel_source_audit_for_uapi_contract.md (added 2026-05-04) is the durable takeaway: kernel-side source audit is non-optional for UAPI contract work where userspace fields drive hardware MMIO writes.