Phase 6 success: hantro decodes real H.264 pixels via libva-v4l2-request
First end-to-end hardware decode in this campaign. mpv --hwdec=vaapi-copy --vo=gpu on bbb_1080p30_h264.mp4 in the operator's live Plasma 6 Wayland session shows the bunny playing: real decoded NV12 pixel data, not the all-zero (green) or solid-color output we had all day.

Operator confirmation 2026-05-04: "A big fat white bunny shows up."

Two fork commits got us here:

  d41a4b9 h264: always submit SCALING_MATRIX + populate pps num_ref_idx
  9de1be3 h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields

The load-bearing fix was 9de1be3 (slice-header bit-parser). It populates dec_param->dec_ref_pic_marking_bit_size, idr_pic_id and pic_order_cnt_bit_size, which hantro G1 writes directly into MMIO registers (G1_REG_DEC_CTRL5_REFPIC_MK_LEN, G1_REG_DEC_CTRL5_IDR_PIC_ID, G1_REG_DEC_CTRL6_POC_LENGTH).

Phase 1 boolean-correctness criterion now sharpened to require real-VO pixel-content verification. Met for mpv vaapi-copy in live session. Phase 7 verification still owed across the full test corpus (vainfo, mpv vaapi (no -copy), Firefox, chromium-fourier 149).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,90 @@
# Phase 6 — implementation result

The Tier 1A + Tier 1B + Tier 2C fixes from `diff_against_ffmpeg.md` produced a working hardware-decoded H.264 pipeline on hantro RK3568 / ohm via libva-v4l2-request, end-to-end, in the operator's live Plasma 6 Wayland session.

## Verdict

**WIN.** mpv `--hwdec=vaapi-copy --vo=gpu` on `bbb_1080p30_h264.mp4` shows the bunny playing on screen: real decoded NV12 pixel data, not the all-zero (green) or solid-color output we'd been getting all day.

Operator confirmation: 2026-05-04, "A big fat white bunny shows up." First end-to-end success in this campaign.

## What landed in the fork

Two commits on `marfrit/libva-v4l2-request-fourier` master:

- **`d41a4b9`** — h264: always submit SCALING_MATRIX + populate pps num_ref_idx
  - `h264_default_flat_scaling_matrix()`: H.264 spec flat default (every entry 16) when the bitstream has no explicit lists
  - SCALING_MATRIX submitted unconditionally (Tier 2C)
  - `V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT` set unconditionally (Tier 2C)
  - `pps->num_ref_idx_l0_default_active_minus1` and `_l1_` populated from VASlice (Tier 1B)
- **`9de1be3`** — h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields (the load-bearing fix; Tier 1A)
  - New `src/h264_slice_header.{c,h}`: minimal RBSP bit reader + slice_header() parser per ITU-T H.264 §7.3.3
  - Walks slice_header through dec_ref_pic_marking(), measures bit positions
  - Populates `decode->idr_pic_id`, `pic_order_cnt_lsb`, `delta_pic_order_cnt_*`, `pic_order_cnt_bit_size`, `dec_ref_pic_marking_bit_size`
  - Three of these fields (`dec_ref_pic_marking_bit_size`, `idr_pic_id`, `pic_order_cnt_bit_size`) drive the hantro G1 hardware MMIO registers `G1_REG_DEC_CTRL5_REFPIC_MK_LEN`, `G1_REG_DEC_CTRL5_IDR_PIC_ID` and `G1_REG_DEC_CTRL6_POC_LENGTH` respectively. Without them the hardware bitstream parser skips zero bits for these syntax elements, lands on garbage, and decodes zero pixels.
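
As an illustration of the flat default mentioned under `d41a4b9`, here is a minimal sketch in C. The struct is a hypothetical stand-in that mirrors the layout of the V4L2 UAPI `struct v4l2_ctrl_h264_scaling_matrix`, and the function name is made up for this example; it is not the fork's `h264_default_flat_scaling_matrix()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the layout of the V4L2 UAPI
 * struct v4l2_ctrl_h264_scaling_matrix: six 4x4 lists, six 8x8 lists. */
struct scaling_matrix_sketch {
    uint8_t scaling_list_4x4[6][16];
    uint8_t scaling_list_8x8[6][64];
};

/* Fill every entry with 16, the flat value H.264 uses when neither the
 * SPS nor the PPS carries explicit scaling lists. */
static void flat_scaling_matrix_sketch(struct scaling_matrix_sketch *m)
{
    assert(m != NULL);
    memset(m->scaling_list_4x4, 16, sizeof(m->scaling_list_4x4));
    memset(m->scaling_list_8x8, 16, sizeof(m->scaling_list_8x8));
}
```

Because a flat matrix of 16s is what the spec infers when no lists are signalled, submitting it unconditionally should not change the decoded result for such streams.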

## Slice-header parser empirical validation

The parser's own `request_log` output for the first three frames of bbb (with the new build deployed):

```
slice_header parse: idr_pic_id=0 poc_lsb=0 poc_bits=8 refmark_bits=2 frame_num=0 slice_type=2 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=4 poc_bits=8 refmark_bits=1 frame_num=1 slice_type=0 pps_id=0
slice_header parse: idr_pic_id=0 poc_lsb=2 poc_bits=8 refmark_bits=0 frame_num=2 slice_type=1 pps_id=0
```

- Frame 0 IDR: slice_type=2 (I-slice), idr_pic_id=0, frame_num=0, refmark_bits=2 (no_output_of_prior_pics_flag + long_term_reference_flag, exactly what an IDR's dec_ref_pic_marking() contains) ✓
- Frame 1: slice_type=0 (P-slice), frame_num=1, refmark_bits=1 (adaptive_ref_pic_marking_mode_flag=0 only) ✓
- Frame 2: slice_type=1 (B-slice), frame_num=2, refmark_bits=0 (what a non-reference picture with `nal_ref_idc` = 0 produces, since dec_ref_pic_marking() is then absent) ✓
- `frame_num` from the bitstream parse matches VAAPI's pre-parsed `frame_num`. POC bit-size = 8 matches `log2_max_pic_order_cnt_lsb_minus4 + 4 = 4 + 4 = 8` ✓

Cross-check: parser output is internally consistent and matches VAAPI's pre-parsed values for all observable fields.
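
The `refmark_bits` values above can be reproduced with a small bit reader. The following C sketch walks dec_ref_pic_marking() as laid out in ITU-T H.264 §7.3.3.3 and returns how many bits it consumed. All names are hypothetical and this is not the code in `src/h264_slice_header.c`; it assumes the input is already-unescaped RBSP positioned at the start of dec_ref_pic_marking(), and that the caller skips the call entirely when `nal_ref_idc` is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader over an already-unescaped RBSP buffer
 * (emulation-prevention bytes are assumed to be removed beforehand). */
struct bitreader_sketch {
    const uint8_t *data;
    size_t size_bits;
    size_t pos_bits;
};

static uint32_t br_read_bit(struct bitreader_sketch *br)
{
    assert(br->pos_bits < br->size_bits);
    uint32_t bit = (br->data[br->pos_bits >> 3] >> (7 - (br->pos_bits & 7))) & 1u;
    br->pos_bits++;
    return bit;
}

/* ue(v): count leading zero bits, then read that many suffix bits. */
static uint32_t br_read_ue(struct bitreader_sketch *br)
{
    unsigned zeros = 0;
    while (br_read_bit(br) == 0) {
        zeros++;
        assert(zeros < 32);
    }
    uint32_t suffix = 0;
    for (unsigned i = 0; i < zeros; i++)
        suffix = (suffix << 1) | br_read_bit(br);
    return (1u << zeros) - 1u + suffix;
}

/* Walk dec_ref_pic_marking() and return its size in bits. */
static size_t dec_ref_pic_marking_bits_sketch(struct bitreader_sketch *br, int is_idr)
{
    size_t start = br->pos_bits;
    if (is_idr) {
        br_read_bit(br); /* no_output_of_prior_pics_flag */
        br_read_bit(br); /* long_term_reference_flag */
    } else if (br_read_bit(br)) { /* adaptive_ref_pic_marking_mode_flag */
        uint32_t op;
        do {
            op = br_read_ue(br); /* memory_management_control_operation */
            if (op == 1 || op == 3)
                br_read_ue(br); /* difference_of_pic_nums_minus1 */
            if (op == 2)
                br_read_ue(br); /* long_term_pic_num */
            if (op == 3 || op == 6)
                br_read_ue(br); /* long_term_frame_idx */
            if (op == 4)
                br_read_ue(br); /* max_long_term_frame_idx_plus1 */
        } while (op != 0);
    }
    return br->pos_bits - start;
}
```

For an IDR slice this returns 2, and for a non-IDR reference slice with adaptive_ref_pic_marking_mode_flag=0 it returns 1, matching frames 0 and 1 in the log.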

## Build + deployment

```
ssh ohm: git clone gitea/marfrit/libva-v4l2-request-fourier (HEAD = 9de1be3)
ssh ohm: cd src && meson setup build --buildtype=release && ninja -C build
ssh ohm (sudo): cp build/src/v4l2_request_drv_video.so /usr/lib/dri/
```

New `.so` size: 77 224 B (vs prior 67 480 B for the Step 1 build; net +9.7 kB for the slice-header parser + minor edits). sha256: `33cfe687e459310d8d5571e6380e330f7ed716ac12710d67ef3821532108c7a0`.

No build warnings except pre-existing ones in `src/v4l2.h` (struct forward-decl issue; orthogonal).

## Phase 1 boolean-correctness criterion — re-evaluated

| Criterion | Pre-fix | Post-fix |
|---|---|---|
| Consumer dlopens v4l2_request_drv_video.so | ✓ | ✓ |
| vaInitialize succeeds | ✓ | ✓ |
| V4L2-stateless contract trace lands without EINVAL | ✓ | ✓ |
| Kernel processes the request (no dmesg errors) | ✓ | ✓ |
| Kernel writes to CAPTURE buffer | ✓ (zeros) | ✓ (real NV12 pixels) |
| **Real-VO renders decoded frames** | ✗ (green/blue uniform) | **✓ (bunny visible)** |

The sharpened criterion holds for mpv `--hwdec=vaapi-copy` in a live session. Phase 7 needs to extend this verification across the full test corpus.
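
The visual check could be complemented by a programmatic one. Below is a minimal sketch in C, assuming a CPU-mapped NV12 frame whose CbCr plane directly follows the luma plane at `stride * height` (real hantro buffers may be laid out differently). The function name is hypothetical. It flags frames where every sample is identical, which covers the all-zero and solid-color failure modes in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when every luma sample and every CbCr pair in an NV12 frame is
 * identical (for example all zeros), 0 as soon as any sample differs.
 * Assumes the CbCr plane starts directly at stride * height and that width
 * and height are even; both are assumptions of this sketch. */
static int nv12_is_uniform_sketch(const uint8_t *buf, size_t stride,
                                  size_t width, size_t height)
{
    assert(buf != NULL && width > 0 && height > 0 && stride >= width);
    const uint8_t *luma = buf;
    const uint8_t *chroma = buf + stride * height;

    /* Luma plane: one byte per pixel. */
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            if (luma[y * stride + x] != luma[0])
                return 0;

    /* Chroma plane: interleaved Cb/Cr pairs, half the rows. */
    for (size_t y = 0; y < height / 2; y++)
        for (size_t x = 0; x + 1 < width; x += 2)
            if (chroma[y * stride + x] != chroma[0] ||
                chroma[y * stride + x + 1] != chroma[1])
                return 0;

    return 1;
}
```

A uniform result across several consecutive frames would point to the pre-fix failure; a non-uniform result only shows that some pixel data arrived, not that it is correct.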

## Phase 7 verification — what's left

- mpv `--hwdec=vaapi-copy --vo=gpu` (live session): ✓ confirmed (bunny)
- mpv `--hwdec=vaapi --vo=gpu` (live session): TBD; different code path (DMA-BUF GL-import vs vaapi-copy)
- vainfo: was ✓ pre-fix at the enumeration layer; should remain ✓
- Firefox: pre-fix attempted ONE libva frame, got zeros, fell back to SW. Post-fix: should engage and stay engaged. Re-test in live session.
- chromium-fourier 149: pre-fix worked per `fourier_attribution` cell A (browser_cpu_median = 54%, fps = 24.0). Post-fix should not regress; ideally should also work via this libva path now.
- Brave 1.89 (deferred per `phase0_findings.md`): not blocking.

## What's next

- **Step 4** (observability hardening): fix patch-0011 cache coherency (msync MS_INVALIDATE), add VIDIOC_G_EXT_CTRLS readback to confirm V4L2 layer accepts our writes. Helps future probes; not urgent.
- **Step 5** (Phase 7): retest mpv vaapi (no -copy), Firefox, chromium-fourier across the live session. Document any regressions or edge-case failures.

## Why the win

The full path:

1. **Phase 0 substrate** correctly characterized hantro's silent zero-output behavior (after three iterations of Phase 0 verdict revisions).
2. **Phase 4 plan** (`diff_against_ffmpeg.md`) anchored against two authoritative sources: FFmpeg's proven-working `v4l2_request_h264.c` and Linux mainline's `hantro_g1_h264_dec.c`. The plan was specific (which fields, why, what register).
3. **Phase 6 implementation** scoped strictly to the plan: three fixes in two commits, no scope creep, contract-explicit commit messages.
4. **Phase 7-style real-VO check** confirmed the fix without ambiguity (operator visual inspection of the bunny).

The lesson captured in `feedback_kernel_source_audit_for_uapi_contract.md` (added 2026-05-04) is the durable takeaway: kernel-side source audit is non-optional for UAPI contract work where userspace fields drive hardware MMIO writes.