libva-v4l2-request-fourier/phase0_pi5_hevc.md
claude-noether 25b8a15e09 phase0_pi5_hevc: open Pi 5 / CM5 HEVC chapter (substrate doc only)
Empirical higgs probe (sibling session 2026-05-17) confirmed
rpi-hevc-dec at /dev/video19 is V4L2 STATELESS, not stateful:
- Section header literally "Stateless Codec Controls"
- OUTPUT V4L2_PIX_FMT_HEVC_SLICE (parsed slices), not full-stream HEVC
- V4L2_CID_STATELESS_HEVC_* control set + slice_param_array[4096]
- CAPTURE NC12 / NC30 (V4L2_PIX_FMT_NV12_COL128 / _10_COL128,
  SAND 128-column tiled, Pi-specific)

So the Pi 5 HEVC HW path belongs HERE (request/stateless backend),
not in a separate stateful project. Replaces the now-deleted
libva-v4l2-stateful-fourier scaffold attempt.

phase0_pi5_hevc.md captures:
- Substrate (target host, backend baseline, empirical probe output)
- What carries forward unchanged (most of HEVC plumbing)
- What needs adding (RPI_HEVC_DEC driver_kind, NC12/NC30 video_format
  + detile primitive, image.c branch — small surface area)
- Six open questions Phase 1 must answer first (EXT_SPS presence,
  start_code default, SAND tile spec, drm_prime modifier round-trip,
  rpi-hevc-dec submission ordering quirks, packaging target OS)
- Phase 1 goal sketch (NOT locked) + Phase 3 baseline plan

No code in this commit. Phase 1 opens when higgs is up + first two
open questions are answered live.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:48:09 +00:00


Phase 0 — Pi 5 / CM5 HEVC chapter

Opened 2026-05-17 evening, after the failed libva-v4l2-stateful-fourier scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated the stateful premise: rpi-hevc-dec is V4L2 stateless, so Pi 5 HEVC belongs in this backend, not a separate sibling.

No code in this chapter yet. This doc is the substrate. Phase 1 picks up from the "Open questions" section.

Substrate

Target host

higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at /dev/video19 + /dev/media1.

Backend baseline at chapter open

libva-v4l2-request-fourier master tip cf8cd9d (iter39 + Option B + h265 ref-list cap fix). Multi-device probe (iter38) already opens rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is a natural extension of that architecture.

iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265 parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That plumbing is probe-gated (has_hevc_ext_sps_rps_rkvdec), so it stays dormant on hosts where the controls don't exist.

Empirical higgs probe (brother session)

v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls:

Stateless Codec Controls

  hevc_sequence_parameter_set        (compound, V4L2_CID_STATELESS_HEVC_SPS)
  hevc_picture_parameter_set         (compound, V4L2_CID_STATELESS_HEVC_PPS)
  slice_param_array                  (compound dynamic-array dims=[4096])
  hevc_scaling_matrix                (compound)
  hevc_decode_parameters             (compound)
  hevc_decode_mode                   (menu, "Frame-Based")
  hevc_start_code                    (menu, default "No Start Code")

OUTPUT formats:
  S265  V4L2_PIX_FMT_HEVC_SLICE  (parsed slice payload)

CAPTURE formats:
  NC12  V4L2_PIX_FMT_NV12_COL128       (8-bit  SAND 128-column tiled)
  NC30  V4L2_PIX_FMT_NV12_10_COL128    (10-bit SAND 128-column tiled)

Conclusion: this is the standard V4L2_CID_STATELESS_HEVC_* control set exposed under the V4L2-request uAPI, exactly the same family our backend already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).
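The stateless-vs-stateful call above rests on two textual signals: the "Stateless Codec Controls" section header and the S265 (parsed-slice) OUTPUT format. A minimal sketch of that check as a reusable probe helper follows. The function name `classify_decoder` and the returned dict keys are hypothetical, and it only inspects text that has already been captured (for example from the v4l2-ctl invocation shown above); it does not talk to the device.

```python
import re

def classify_decoder(probe_text: str) -> dict:
    """Classify a captured v4l2-ctl listing from its text alone.

    Hypothetical helper: a decoder counts as stateless only if the listing
    has a "Stateless Codec Controls" section AND offers the S265 fourcc.
    """
    has_section = "Stateless Codec Controls" in probe_text
    # Match the fourcc tokens named in this doc, quoted or unquoted.
    found = set(re.findall(r"\b(S265|NC12|NC30)\b", probe_text))
    return {
        "stateless": has_section and "S265" in found,
        "sand_capture": sorted(found & {"NC12", "NC30"}),
    }
```

Requiring both signals keeps a driver that merely exposes a similarly named control section from being misclassified.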

What carries forward unchanged

  • VAAPI HEVC profile enumeration (config.c)
  • h265_set_controls core path (h265.c) — same compound ctrl set
  • Synthetic SPS pre-seed pattern (iter25/26) — already runs pre-CAPTURE-alloc
  • Multi-device dispatch in RequestCreateConfig (iter38)
  • VAAPI slice / picture / IQ matrix buffer parsing
  • HEVC h264-style start-code policy (we already DON'T prepend for HEVC)

What needs adding

| Item | Location | Sizing |
| --- | --- | --- |
| RPI_HEVC_DEC enum in driver_kind_t | request.h | trivial |
| Multi-device probe extends to /dev/video19 discovery | context.c / request.c init | small; mirror hantro slot |
| V4L2_PIX_FMT_NV12_COL128 (NC12) video_format entry | video.c | small |
| V4L2_PIX_FMT_NV12_10_COL128 (NC30) video_format entry | video.c | small |
| NC12 → NV12 detile primitive | new nv12_col128.c | mid; column tile layout, see kernel docs |
| NC30 → P010 detile primitive | new nv12_col128.c | mid; 10-bit variant of above |
| copy_surface_to_image branch for NC12/NC30 | image.c | small (mirror NV15→P010 gating) |
| Per-driver gating for any rpi-specific quirks discovered | various | per the per-driver-kludge-gating rule |
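To make the detile row concrete, here is a reference-model sketch in Python that a future C primitive could be checked against. It is NOT a verified description of NC12: it assumes columns are 128 bytes wide, stored one after another, each `col_rows` rows tall, with a plane starting at row `first_row` inside every column. The names `detile_plane`, `col_rows` and `first_row` are hypothetical, and the real column height and chroma offset must come from the kernel documentation before any of this is trusted.

```python
COL_W = 128  # bytes per column row, taken from the "128-column" naming

def detile_plane(src: bytes, width: int, rows: int,
                 col_rows: int, first_row: int = 0) -> bytes:
    """Gather `rows` linear rows of `width` bytes from a column-major buffer.

    ASSUMED layout (unverified): columns are COL_W bytes wide and col_rows
    rows tall, laid out consecutively; the plane begins at row first_row
    within each column.
    """
    out = bytearray(width * rows)
    for col in range((width + COL_W - 1) // COL_W):
        x0 = col * COL_W
        n = min(COL_W, width - x0)  # the last column may be partially used
        base = (col * col_rows + first_row) * COL_W
        for y in range(rows):
            s = base + y * COL_W
            out[y * width + x0 : y * width + x0 + n] = src[s : s + n]
    return bytes(out)
```

Keeping the column height and plane offset as parameters means the model can be corrected by changing call sites once the layout is confirmed, without touching the gather loop.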

Open questions for Phase 1

Lock these before Phase 1 commits to a goal.

  1. EXT_SPS controls on rpi-hevc-dec? Brother's --list-ctrls output above shows the standard V4L2_CID_STATELESS_HEVC_* family, NOT the EXT_SPS_ST_RPS / EXT_SPS_LT_RPS extensions that VDPU381 needs. Verify: does slice_param_array[4096] accept st_rps_bits / lt_rps_bits in the per-slice payload, or does rpi-hevc-dec parse RPS itself from the slice header? If the latter, the iter2 EXT_SPS path stays dormant (probe-gated already), and rpi-hevc-dec just needs the picture->st_rps_bits → slice_params->short_term_ref_pic_set_size plumbing that iter31 α-29 already wired. Expectation: works out of the box. Confirm before assuming.

  2. hevc_start_code ctrl: "No Start Code" vs Annex B? Brother saw default "No Start Code" — matches our behavior (we don't prepend on HEVC). But the ctrl is configurable. Verify the menu values exposed and confirm "No Start Code" passes our raw slice-NAL payload as-is. If it doesn't, set the ctrl explicitly per unconditional-codec-state gating.

  3. NC12 / NC30 SAND tile layout — exact spec. Read Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst for the COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes (UV interleaved). Row count = ALIGN(height, 16) or ALIGN(height, 8)? Get the exact alignment and tile-traversal order before writing the detile primitive. Cite from kernel doc, NOT inferred from a hex dump.

  4. drm_prime / SAND modifier round-trip. Does ffmpeg-vaapi (and Firefox) accept the NC12 buffer via DRM_PRIME export carrying the DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing zero-copy to a SAND-aware compositor? Or is libva-side detile to a linear NV12 buffer the only viable Firefox path? If detile is required for the consumer, the rockchip-pixel-verify-path rule (DMA-BUF GL preferred over cached mmap) might NOT apply since SAND is Pi-specific and not in the wider Wayland modifier ecosystem.

  5. rpi-hevc-dec quirks on first SPS submission. rkvdec needs image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE" ordering? Verify with strace early.

  6. higgs OS + libva versioning. Brother probed on Debian. We package for Arch ALARM. What's the install path on higgs — Arch / Debian / Raspberry Pi OS? If Debian, the package needs a debian/ tree, not just PKGBUILD. Decide packaging target before Phase 8.
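For question 2, it helps to be able to assert what the backend actually puts in the OUTPUT buffer. A small sketch of such a check follows: it reports whether a slice payload begins with an Annex B start code. The function name `annexb_prefix_len` is hypothetical and the check is independent of any driver.

```python
def annexb_prefix_len(payload: bytes) -> int:
    """Return 4 or 3 if payload starts with an Annex B start code, else 0.

    A raw HEVC NAL unit cannot begin with 00 00: the second header byte
    carries nuh_temporal_id_plus1, which is never zero, so a hit here
    means a start code really is present.
    """
    if payload[:4] == b"\x00\x00\x00\x01":
        return 4
    if payload[:3] == b"\x00\x00\x01":
        return 3
    return 0
```

With the control left at "No Start Code", every submitted slice would be expected to yield 0 here; a non-zero result would point at the payload, not the driver.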

Phase 1 goal sketch (NOT locked)

Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).

Two measurable subgoals follow naturally:

  • libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request, NV12 image output) byte-exact for the same input.
  • Firefox VA-API path engages (verify via about:support, Firefox's counterpart to chrome://gpu, or via log inspection with MOZ_LOG=PlatformDecoderModule:5).
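When the byte-exact subgoal fails, a bare hash mismatch says nothing about where the outputs diverge. The sketch below locates the first differing byte in two raw NV12 dumps and reports it as a frame index, plane and offset. It assumes tightly packed linear NV12 with even width and height (so one frame is width × height × 3/2 bytes); the function name `first_nv12_mismatch` is hypothetical.

```python
def first_nv12_mismatch(a: bytes, b: bytes, width: int, height: int):
    """Return (frame, plane, offset_in_plane) of the first differing byte.

    Returns None when the two dumps are identical. Assumes tightly packed
    linear NV12: a full-resolution Y plane followed by a half-size
    interleaved UV plane per frame.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)} bytes")
    if a == b:
        return None
    y_size = width * height
    frame_size = y_size * 3 // 2
    i = next(k for k in range(len(a)) if a[k] != b[k])
    frame, off = divmod(i, frame_size)
    if off < y_size:
        return (frame, "Y", off)
    return (frame, "UV", off - y_size)
```

A first mismatch in the UV plane of frame 0, for instance, would point toward chroma handling in the detile or copy step; one that only appears in later frames would point toward reference handling.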

Phase 3 baseline plan

Before any backend code touches rpi-hevc-dec:

  • kdirect floor: ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime -i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ... and sha256 the YUV.
  • SW reference: same ffmpeg without -hwaccel, sha256 the YUV.
  • Both runs N=3 per replicate-baseline-first.
  • Capture strace -f -e ioctl of the kdirect run — gives the canonical ioctl sequence rpi-hevc-dec expects.
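The N=3 replicate step can be checked mechanically: all three YUV dumps from one configuration must hash identically before that hash is accepted as a floor. A minimal sketch follows, using only hashlib from the standard library; the function names and the file names in the usage note are hypothetical.

```python
import hashlib

def sha256_file(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks so large YUV dumps need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def replicate_digest(paths):
    """Return the common sha256 of all replicate dumps, or raise if they differ."""
    digests = {p: sha256_file(p) for p in paths}
    if len(set(digests.values())) != 1:
        raise RuntimeError(f"replicates disagree: {digests}")
    return next(iter(digests.values()))
```

For example, `replicate_digest(["kdirect_1.yuv", "kdirect_2.yuv", "kdirect_3.yuv"])` would give the kdirect floor, and the same call on the software-decode dumps would give the SW reference to compare it against.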

Phase 0 closing

This doc commits the substrate. Phase 1 starts when:

  • higgs is up + reachable
  • Open questions 1+2 (EXT_SPS + start_code) are answered live, in one short probe session
  • Phase 3 baseline floors are captured

No work blocks the close of iter39 / fresnel campaign — those are shipped.