Files
libva-multiplanar/phase6_findings.md
T
marfrit b18c6f70f9 Phase 6 success: hantro decodes real H.264 pixels via libva-v4l2-request
First end-to-end hardware decode in this campaign. mpv --hwdec=
vaapi-copy --vo=gpu on bbb_1080p30_h264.mp4 in operator's live
Plasma 6 Wayland session shows the bunny playing — real decoded
NV12 pixel data, not the all-zero (green) or solid-color output
we had all day.

Operator confirmation 2026-05-04: "A big fat white bunny shows up."

Two fork commits got us here:
  d41a4b9 — h264: always submit SCALING_MATRIX + populate pps num_ref_idx
  9de1be3 — h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields

The load-bearing fix was 9de1be3 (slice-header bit-parser) — it
populates dec_param->dec_ref_pic_marking_bit_size, idr_pic_id,
pic_order_cnt_bit_size which hantro G1 writes directly into MMIO
registers (G1_REG_DEC_CTRL5_REFPIC_MK_LEN, G1_REG_DEC_CTRL5_IDR_PIC_ID,
G1_REG_DEC_CTRL6_POC_LENGTH).

Phase 1 boolean-correctness criterion now sharpened to require
real-VO pixel-content verification. Met for mpv vaapi-copy in
live session.

Phase 7 verification still owed across the full test corpus
(vainfo, mpv vaapi (no -copy), Firefox, chromium-fourier 149).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:49:59 +00:00

6.0 KiB

Phase 6 — implementation result

The Tier 1A + Tier 1B + Tier 2C fixes from diff_against_ffmpeg.md produced a working hardware-decoded H.264 pipeline on hantro RK3568 / ohm via libva-v4l2-request, end-to-end, in the operator's live Plasma 6 Wayland session.

Verdict

WIN. mpv --hwdec=vaapi-copy --vo=gpu on bbb_1080p30_h264.mp4 shows the bunny playing on screen — real decoded NV12 pixel data, not the all-zero (green) or solid-color output we'd been getting all day.

Operator confirmation: 2026-05-04, "A big fat white bunny shows up." First end-to-end success in this campaign.

What landed in the fork

Two commits on marfrit/libva-v4l2-request-fourier master:

  • d41a4b9 — h264: always submit SCALING_MATRIX + populate pps num_ref_idx
    • h264_default_flat_scaling_matrix(): H.264 spec flat default (16/16/...) when bitstream has no explicit lists
    • SCALING_MATRIX submitted unconditionally (Tier 2C)
    • V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT set unconditionally (Tier 2C)
    • pps->num_ref_idx_l0_default_active_minus1 and _l1_ populated from VASlice (Tier 1B)
  • 9de1be3 — h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields (the load-bearing fix; Tier 1A)
    • New src/h264_slice_header.{c,h} — minimal RBSP bit reader + slice_header() parser per ITU-T H.264 §7.3.3
    • Walks slice_header through dec_ref_pic_marking(), measures bit positions
    • Populates decode->idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*, pic_order_cnt_bit_size, dec_ref_pic_marking_bit_size
    • These three _bit_size fields drive hantro G1 hardware MMIO registers G1_REG_DEC_CTRL5_REFPIC_MK_LEN, G1_REG_DEC_CTRL5_IDR_PIC_ID, G1_REG_DEC_CTRL6_POC_LENGTH. Without them the hardware bitstream parser walks past zero bits, lands on garbage, decodes zero pixels.

Slice-header parser empirical validation

The parser's own request_log output for the first three frames of bbb (with new build deployed):

slice_header parse: idr_pic_id=0 poc_lsb=0 poc_bits=8 refmark_bits=2 frame_num=0 slice_type=2 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=4 poc_bits=8 refmark_bits=1 frame_num=1 slice_type=0 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=2 poc_bits=8 refmark_bits=0 frame_num=2 slice_type=1 pps_id=0
  • Frame 0 IDR: slice_type=2 (I-slice), idr_pic_id=0, frame_num=0, refmark_bits=2 (no_output_of_prior_pics_flag + long_term_reference_flag — exactly what IDR's dec_ref_pic_marking is) ✓
  • Frame 1: slice_type=0 (P-slice), frame_num=1, refmark_bits=1 (adaptive_ref_pic_marking_mode_flag=0 only) ✓
  • Frame 2: slice_type=1 (B-slice), frame_num=2 ✓
  • frame_num from bitstream parse matches VAAPI's pre-parsed frame_num. POC bit-size = 8 matches log2_max_pic_order_cnt_lsb_minus4 + 4 = 4 + 4 = 8

Cross-check: parser output is internally consistent and matches VAAPI's pre-parsed values for all observable fields.

Build + deployment

ssh ohm: git clone gitea/marfrit/libva-v4l2-request-fourier (HEAD = 9de1be3)
ssh ohm: cd src && meson setup build --buildtype=release && ninja -C build
ssh ohm (sudo): cp build/src/v4l2_request_drv_video.so /usr/lib/dri/

New .so size: 77 224 B (vs prior 67 480 B for the Step 1 build — net +9.7 kB for the slice-header parser + minor edits). sha256: 33cfe687e459310d8d5571e6380e330f7ed716ac12710d67ef3821532108c7a0.

No build warnings except pre-existing ones in src/v4l2.h (struct forward-decl issue; orthogonal).

Phase 1 boolean-correctness criterion — re-evaluated

Criterion Pre-fix Post-fix
Consumer dlopens v4l2_request_drv_video.so
vaInitialize succeeds
V4L2-stateless contract trace lands without EINVAL
Kernel processes the request (no dmesg errors)
Kernel writes to CAPTURE buffer ✓ (zeros) ✓ (real NV12 pixels)
Real-VO renders decoded frames ✗ (green/blue uniform) ✓ (bunny visible)

The sharpened criterion holds for mpv --hwdec=vaapi-copy in a live session. Phase 7 needs to extend this verification across the full test corpus.

Phase 7 verification — what's left

  • mpv --hwdec=vaapi-copy --vo=gpu (live session): ✓ confirmed (bunny)
  • mpv --hwdec=vaapi --vo=gpu (live session): TBD — different code path (DMA-BUF GL-import vs vaapi-copy)
  • vainfo: was ✓ pre-fix at the enumeration layer; should remain ✓
  • Firefox: pre-fix attempted ONE libva frame, got zeros, fell back to SW. Post-fix: should engage and stay engaged. Re-test in live session.
  • chromium-fourier 149: pre-fix worked per fourier_attribution cell A (browser_cpu_median = 54%, fps = 24.0). Post-fix should not regress; ideally should also work via this libva path now.
  • Brave 1.89 (deferred per phase0_findings.md): not blocking.

What's next

  • Step 4 (observability hardening): fix patch-0011 cache coherency (msync MS_INVALIDATE), add VIDIOC_G_EXT_CTRLS readback to confirm V4L2 layer accepts our writes. Helps future probes; not urgent.
  • Step 5 (Phase 7): retest mpv vaapi (no -copy), Firefox, chromium-fourier across the live session. Document any regressions or edge-case failures.

Why the win

The full path:

  1. Phase 0 substrate correctly characterized hantro's silent zero-output behavior (after three iterations of Phase 0 verdict revisions).
  2. Phase 4 plan (diff_against_ffmpeg.md) anchored against two authoritative sources: FFmpeg's proven-working v4l2_request_h264.c and Linux mainline's hantro_g1_h264_dec.c. The plan was specific (which fields, why, what register).
  3. Phase 6 implementation scoped strictly to the plan: three fixes in two commits, no scope creep, contract-explicit commit messages.
  4. Phase 7-style real-VO check confirmed the fix without ambiguity (operator visual inspection of the bunny).

The lesson captured in feedback_kernel_source_audit_for_uapi_contract.md (added 2026-05-04) is the durable takeaway: kernel-side source audit is non-optional for UAPI contract work where userspace fields drive hardware MMIO writes.