libva-v4l2-request-fourier/phase0_pi5_hevc.md
claude-noether b6a65fc692 phase0_pi5_hevc: close addendum with empirical higgs probe data
Live probe of rpi-hevc-dec on higgs (Pi CM5, kernel 6.12.75-rpt-rpi-2712,
Debian 13 trixie) answers Phase 0 open questions Q1, Q2, Q5, Q6
empirically; Q3 partial; Q4 still open.

Q1 (EXT_SPS): NOT present. Only standard V4L2_CID_STATELESS_HEVC_*.
  Probe ctrl id 0xa97 returns EINVAL — same gate iter2's
  has_hevc_ext_sps_rps_rkvdec uses. iter31 alpha-29 plumbing applies.

Q2 (hevc_start_code): default 0 "No Start Code"; matches our behaviour.

Q3 (NC12 SAND tile layout): partial. CAPTURE S_FMT for 1280x720 NC12
  returns sizeimage=1382400 (linear NV12 byte count) but
  bytesperline=1080 (suspect, encodes SAND col count not linear stride).
  Need kernel-doc / driver-source read before writing detile primitive.

Q4 (DRM modifier round-trip): hwdownload rejects SAND-tiled drm_prime
  (-38 Function not implemented). Backend CPU-detile to NV12 is the
  safe path for Firefox.

Q5 (submission ordering): empirical ioctl trace shows canonical V4L2
  stateless flow. Two notes for the backend: kdirect uses
  V4L2_MEMORY_DMABUF for both queues (we use MMAP for CAPTURE on
  rkvdec); kdirect does NOT need the iter25 SPS pre-seed pattern -
  rpi-hevc-dec takes explicit NC12 + dims directly.

Q6 (packaging): Debian 13 trixie. Phase 8 needs a debian/ tree, not
  just PKGBUILD. Decision in Phase 1.

Other findings: ffmpeg 7.1.3 from stock Debian is built with
--enable-v4l2-request. kdirect engagement line:
  Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19;
  buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8
No libva ICD installed (only armada-drm_dri.so). mpv installable.
Firefox 145 + rpi-firefox-mods present.

Phase 0 closed. Phase 1 opens with goal:
  HEVC bit-exact libva-vs-kdirect on higgs for 1280x720 Main 8-bit
  via the new RPI_HEVC_DEC driver_kind slot + NC12 detile primitive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:54:08 +00:00


Phase 0 — Pi 5 / CM5 HEVC chapter

Opened 2026-05-17 evening, after the failed libva-v4l2-stateful-fourier scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated the stateful premise: rpi-hevc-dec is V4L2 stateless, so Pi 5 HEVC belongs in this backend, not a separate sibling.

No code in this chapter yet. This doc is the substrate. Phase 1 picks up from the "Open questions" section.

Substrate

Target host

higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at /dev/video19 + /dev/media1.

Backend baseline at chapter open

libva-v4l2-request-fourier master tip cf8cd9d (iter39 + Option B + h265 ref-list cap fix). Multi-device probe (iter38) already opens rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is a natural extension of that architecture.

iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265 parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That plumbing is probe-gated (has_hevc_ext_sps_rps_rkvdec), so it stays dormant on hosts where the controls don't exist.

Empirical higgs probe (brother session)

v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls:

Stateless Codec Controls

  hevc_sequence_parameter_set        (compound, V4L2_CID_STATELESS_HEVC_SPS)
  hevc_picture_parameter_set         (compound, V4L2_CID_STATELESS_HEVC_PPS)
  slice_param_array                  (compound dynamic-array dims=[4096])
  hevc_scaling_matrix                (compound)
  hevc_decode_parameters             (compound)
  hevc_decode_mode                   (menu, "Frame-Based")
  hevc_start_code                    (menu, default "No Start Code")

OUTPUT formats:
  S265  V4L2_PIX_FMT_HEVC_SLICE  (parsed slice payload)

CAPTURE formats:
  NC12  V4L2_PIX_FMT_NV12_COL128       (8-bit  SAND 128-column tiled)
  NC30  V4L2_PIX_FMT_NV12_10_COL128    (10-bit SAND 128-column tiled)

Conclusion: this is the standard V4L2_CID_STATELESS_HEVC_* control set exposed under the V4L2-request uAPI, exactly the same family our backend already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).

What carries forward unchanged

  • VAAPI HEVC profile enumeration (config.c)
  • h265_set_controls core path (h265.c) — same compound ctrl set
  • Synthetic SPS pre-seed pattern (iter25/26) — already runs pre-CAPTURE-alloc
  • Multi-device dispatch in RequestCreateConfig (iter38)
  • VAAPI slice / picture / IQ matrix buffer parsing
  • HEVC h264-style start-code policy (we already DON'T prepend for HEVC)

What needs adding

| Item | Location | Sizing |
|---|---|---|
| RPI_HEVC_DEC enum in driver_kind_t | request.h | trivial |
| Multi-device probe extends to /dev/video19 discovery | context.c / request.c init | small (mirror hantro slot) |
| V4L2_PIX_FMT_NV12_COL128 (NC12) video_format entry | video.c | small |
| V4L2_PIX_FMT_NV12_10_COL128 (NC30) video_format entry | video.c | small |
| NC12 → NV12 detile primitive | new nv12_col128.c | mid (column tile layout, see kernel docs) |
| NC30 → P010 detile primitive | new nv12_col128.c | mid (10-bit variant of above) |
| copy_surface_to_image branch for NC12/NC30 | image.c | small (mirror NV15→P010 gating) |
| Per-driver gating for any rpi-specific quirks discovered | various | per the per-driver-kludge-gating rule |
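As a rough illustration of the first four rows, the sketch below shows how the new driver kind and the two COL128 formats might be declared. Only RPI_HEVC_DEC, driver_kind_t, the driver string "rpi-hevc-dec" and the NC12 / NC30 fourccs come from this doc; the other enumerators, the struct, and the helper names are hypothetical placeholders, not the backend's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same byte packing as the kernel's v4l2_fourcc() macro, restated here so
 * the sketch builds without <linux/videodev2.h>. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define PIX_FMT_NV12_COL128    FOURCC('N', 'C', '1', '2') /* NC12, 8-bit  */
#define PIX_FMT_NV12_10_COL128 FOURCC('N', 'C', '3', '0') /* NC30, 10-bit */

/* HYPOTHETICAL enum body: only RPI_HEVC_DEC is named in this doc. */
typedef enum {
    DRIVER_KIND_UNKNOWN = 0,
    DRIVER_KIND_RKVDEC,   /* placeholder name */
    DRIVER_KIND_HANTRO,   /* placeholder name */
    RPI_HEVC_DEC,
} driver_kind_t;

/* Map the VIDIOC_QUERYCAP driver string to a slot. Only the rpi-hevc-dec
 * string is taken from the probe; other drivers are omitted here. */
static driver_kind_t driver_kind_from_name(const char *driver)
{
    if (strcmp(driver, "rpi-hevc-dec") == 0)
        return RPI_HEVC_DEC;
    return DRIVER_KIND_UNKNOWN;
}

/* HYPOTHETICAL format-table entry: tiled source format plus the linear
 * format the detile primitive would produce. */
struct col128_format {
    uint32_t    v4l2_fourcc;
    unsigned    bit_depth;
    const char *linear_target;
};

static const struct col128_format col128_formats[] = {
    { PIX_FMT_NV12_COL128,     8, "NV12" },
    { PIX_FMT_NV12_10_COL128, 10, "P010" },
};

static const struct col128_format *col128_format_find(uint32_t fourcc)
{
    for (size_t i = 0; i < sizeof(col128_formats) / sizeof(col128_formats[0]); i++)
        if (col128_formats[i].v4l2_fourcc == fourcc)
            return &col128_formats[i];
    return NULL;
}
```

Keeping the tiled-to-linear mapping in one table would let the image.c branch ask a single question (is this a COL128 format, and what does it detile to?) instead of scattering fourcc comparisons.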

Open questions for Phase 1

Lock these before Phase 1 commits to a goal.

  1. EXT_SPS controls on rpi-hevc-dec? Brother's --list-ctrls output above shows the standard V4L2_CID_STATELESS_HEVC_* family, NOT the EXT_SPS_ST_RPS / EXT_SPS_LT_RPS extensions that VDPU381 needs. Verify: does slice_param_array[4096] accept st_rps_bits / lt_rps_bits in the per-slice payload, or does rpi-hevc-dec parse RPS itself from the slice header? If the latter, the iter2 EXT_SPS path stays dormant (probe-gated already), and rpi-hevc-dec just needs the picture->st_rps_bits → slice_params->short_term_ref_pic_set_size plumbing that iter31 α-29 already wired. Expectation: works out of the box. Confirm before assuming.

  2. hevc_start_code ctrl: "No Start Code" vs Annex B? Brother saw default "No Start Code" — matches our behavior (we don't prepend on HEVC). But the ctrl is configurable. Verify the menu values exposed and confirm "No Start Code" passes our raw slice-NAL payload as-is. If it doesn't, set the ctrl explicitly per unconditional-codec-state gating.

  3. NC12 / NC30 SAND tile layout — exact spec. Read Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst for the COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes (UV interleaved). Row count = ALIGN(height, 16) or ALIGN(height, 8)? Get the exact alignment and tile-traversal order before writing the detile primitive. Cite from kernel doc, NOT inferred from a hex dump.

  4. drm_prime / SAND modifier round-trip. Does ffmpeg-vaapi (and Firefox) accept the NC12 buffer via DRM_PRIME export carrying the DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing zero-copy to a SAND-aware compositor? Or is libva-side detile to a linear NV12 buffer the only viable Firefox path? If detile is required for the consumer, the rockchip-pixel-verify-path rule (DMA-BUF GL preferred over cached mmap) might NOT apply since SAND is Pi-specific and not in the wider Wayland modifier ecosystem.

  5. rpi-hevc-dec quirks on first SPS submission. rkvdec needs image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE" ordering? Verify with strace early.

  6. higgs OS + libva versioning. Brother probed on Debian. We package for Arch ALARM. What's the install path on higgs — Arch / Debian / Raspberry Pi OS? If Debian, the package needs a debian/ tree, not just PKGBUILD. Decide packaging target before Phase 8.

Phase 1 goal sketch (NOT locked)

Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).

Two measurable subgoals follow naturally:

  • libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request, NV12 image output) byte-exact for the same input.
  • Firefox VA-API path engages (verify via chrome://gpu equivalent / log inspection — MOZ_LOG=PlatformDecoderModule:5).

Phase 3 baseline plan

Before any backend code touches rpi-hevc-dec:

  • kdirect floor: ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime -i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ... and sha256 the YUV.
  • SW reference: same ffmpeg without -hwaccel, sha256 the YUV.
  • Both runs N=3 per replicate-baseline-first.
  • Capture strace -f -e ioctl of the kdirect run — gives the canonical ioctl sequence rpi-hevc-dec expects.

Phase 0 closing

This doc commits the substrate. Phase 1 starts when:

  • higgs is up + reachable
  • Open questions 1+2 (EXT_SPS + start_code) are answered live, in one short probe session
  • Phase 3 baseline floors are captured

No work blocks the close of iter39 / fresnel campaign — those are shipped.

Phase 0 close addendum (2026-05-17 evening, higgs probe session)

Empirical probes on higgs answered Q1, Q2, partial Q3, full Q5, full Q6. Q4 (DRM modifier round-trip) remains open. Phase 0 is closed; Phase 1 opens with what's below.

Q1 — EXT_SPS controls on rpi-hevc-dec: NOT present

v4l2-ctl -d /dev/video19 --list-ctrls confirms ONLY the standard V4L2_CID_STATELESS_HEVC_* set:

  • hevc_sequence_parameter_set (0x00a40a90)
  • hevc_picture_parameter_set (0x00a40a91)
  • slice_param_array (0x00a40a92, dynamic-array dims=[4096])
  • hevc_scaling_matrix (0x00a40a93)
  • hevc_decode_parameters (0x00a40a94)
  • hevc_decode_mode (0x00a40a95, menu min=1 max=1 default=1 = Frame-Based)
  • hevc_start_code (0x00a40a96, menu min=0 max=1 default=0 = No Start Code)
  • 0x00a40a97 returns EINVAL (no EXT_SPS_*_RPS controls)

ioctl trace confirms ffmpeg's VIDIOC_QUERY_EXT_CTRL for 0xa97 returns EINVAL — same probe pattern our backend uses for has_hevc_ext_sps_rps_rkvdec. The iter2 path stays dormant; the iter31 α-29 slice_params->short_term_ref_pic_set_size plumbing is the correct one for rpi-hevc-dec.

Q2 — hevc_start_code: default 0 (No Start Code), values {0, 1}

Default 0 matches our backend's "don't prepend HEVC start code" stance. Confirm in Phase 1: rpi-hevc-dec accepts our raw NAL slice payload as-is.
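One cheap way to back that confirmation is a debug-time guard that checks what the slice payload actually starts with before it is queued. The helper below is a sketch under that assumption; annexb_start_code_len is a hypothetical name and is not part of the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the length of a leading Annex B start code (3 for 00 00 01,
 * 4 for 00 00 00 01), or 0 if the buffer starts with a raw NAL header.
 * With hevc_start_code = 0 ("No Start Code") the expected result is 0. */
static size_t annexb_start_code_len(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1)
        return 3;
    if (len >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1)
        return 4;
    return 0;
}
```

If the control were ever switched to value 1, the same helper could assert the opposite condition, so the check stays tied to the control state rather than to a fixed policy.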

Q3 — NC12 / NC30 SAND tile layout: PARTIAL

CAPTURE S_FMT result for 1280×720 NC12:

  • sizeimage=1382400 = 1280 × 720 × 1.5 (linear NV12 byte count)
  • bytesperline=1080 (NOT 1280)

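The bytesperline=1080 for a 1280-wide CAPTURE buffer is suspect: it cannot be a linear stride. Note that 1080 = 720 × 1.5, and 10 columns × 1080 rows × 128 bytes = 1382400 = sizeimage, so the value is numerically consistent with a per-column row count (720 luma + 360 chroma rows) rather than a count of columns (1280 / 128 = 10). That is still a single data point, not a spec. Read drivers/staging/media/rpivid/ (or wherever NC12_COL128 lives in 6.12) kernel source + drm_fourcc.h / nv12_col128.rst (if it exists) for exact tile layout BEFORE writing the detile primitive. Do NOT infer layout from this single observation.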
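To make the open question concrete, here is a sketch of what the NC12 → NV12 primitive could look like IF the layout turns out to be: 128-byte-wide columns stored one after another, each column holding col_stride_rows rows, with the interleaved UV rows directly after the luma rows inside the same column. Every part of that layout is an unverified assumption taken from the S_FMT numbers above, not from kernel documentation, and all function names are hypothetical. The column stride is a parameter precisely so the code can be corrected once the spec has been read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COL128_WIDTH 128u /* bytes per row of one column (from the format name) */

/* Buffer size implied by the ASSUMED layout: columns * rows * 128 bytes. */
static size_t col128_assumed_size(size_t width, size_t col_stride_rows)
{
    size_t cols = (width + COL128_WIDTH - 1) / COL128_WIDTH;
    return cols * col_stride_rows * COL128_WIDTH;
}

/* Copy `rows` rows of `width_bytes` bytes out of the column layout.
 * first_row points at the plane's first row inside column 0. */
static void col128_copy_rows(const uint8_t *first_row, size_t col_stride_rows,
                             uint8_t *dst, size_t dst_pitch,
                             size_t width_bytes, size_t rows)
{
    for (size_t y = 0; y < rows; y++) {
        for (size_t x = 0; x < width_bytes; x += COL128_WIDTH) {
            size_t col = x / COL128_WIDTH;
            size_t n = width_bytes - x;
            if (n > COL128_WIDTH)
                n = COL128_WIDTH;
            memcpy(dst + y * dst_pitch + x,
                   first_row + (col * col_stride_rows + y) * COL128_WIDTH, n);
        }
    }
}

/* ASSUMPTION: chroma rows start right after `height` luma rows in each
 * column. Real hardware may align that offset differently. */
static void nc12_to_nv12_sketch(const uint8_t *src, size_t col_stride_rows,
                                uint8_t *dst_y, size_t pitch_y,
                                uint8_t *dst_uv, size_t pitch_uv,
                                size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    assert(col_stride_rows >= height + height / 2);

    col128_copy_rows(src, col_stride_rows, dst_y, pitch_y, width, height);
    col128_copy_rows(src + height * COL128_WIDTH, col_stride_rows,
                     dst_uv, pitch_uv, width, height / 2);
}
```

Under these assumptions col128_assumed_size(1280, 1080) reproduces the observed sizeimage of 1382400, which is the only check this sketch can claim; a bit-exact comparison against the kdirect NV12 output is what would actually validate it.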

Q4 — DRM modifier round-trip: BLOCKED on hwdownload

ffmpeg -hwaccel drm -hwaccel_output_format drm_prime -vf hwmap=mode=read,format=nv12 returns Failed to map frame: -38 (Function not implemented). hwdownload cannot consume the SAND modifier directly.

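ffmpeg's path that DOES work: -hwaccel drm -c:v hevc WITHOUT -hwaccel_output_format drm_prime lets ffmpeg's internal pipeline pull back, detile (presumably via a Pi-specific helper or libdrm transform), and present NV12 to the next filter. Output is bit-exact vs SW for the test fixture (1280×720 Main 8-bit), which confirms this path produces correct NV12. Bit-exactness alone cannot rule out a silent SW fallback, so HW engagement is confirmed by the Hwaccel V4L2 HEVC stateless V4 log line, not by the hash match.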

Phase 1 / Phase 4 will need to decide:

  • Detile in the backend (CPU SIMD), exposing NV12 via VAImage; or
  • Pass-through DRM_PRIME with SAND modifier and let the consumer (compositor / Firefox) detile. Firefox almost certainly can't, so CPU detile is the safe bet.

Q5 — rpi-hevc-dec submission ordering: empirically locked

strace -e ioctl of the kdirect run shows:

  1. MEDIA_IOC_DEVICE_INFO + MEDIA_IOC_G_TOPOLOGY (per media node)
  2. VIDIOC_QUERYCAP per video node — driver="rpi-hevc-dec" identifies the right one
  3. VIDIOC_ENUM_FMT OUTPUT → S265 only
  4. VIDIOC_S_FMT OUTPUT (HEVC_SLICE, placeholder dims)
  5. VIDIOC_REQBUFS OUTPUT (DMABUF, count=N) — count=6 in kdirect
  6. VIDIOC_S_FMT CAPTURE (NC12, actual dims from SPS parse)
  7. VIDIOC_CREATE_BUFS CAPTURE (DMABUF, count=16)
  8. VIDIOC_STREAMON both queues
  9. VIDIOC_QUERY_EXT_CTRL enumeration
  10. VIDIOC_S_EXT_CTRLS (decode_mode + start_code) — global ctrls
  11. Per frame: VIDIOC_S_EXT_CTRLS (SPS+PPS+decode_params+slice_array; strace prints class=0xf010000, which is which=V4L2_CTRL_WHICH_REQUEST_VAL, i.e. the controls are attached to the request) + VIDIOC_QBUF CAPTURE + VIDIOC_QBUF OUTPUT (with V4L2_BUF_FLAG_IN_REQUEST | V4L2_BUF_FLAG_REQUEST_FD) + VIDIOC_DQBUF OUTPUT + VIDIOC_DQBUF CAPTURE
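When the backend's own strace is later compared against this sequence, the setup-phase ordering (steps 4 to 8) can be checked mechanically. The sketch below assumes the strace lines have already been normalized into "IOCTL:QUEUE" labels; that label format, the table, and the function name are all hypothetical conventions introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Setup order observed in the kdirect trace (steps 4 to 8 above). */
static const char *const kdirect_setup_order[] = {
    "VIDIOC_S_FMT:OUTPUT",
    "VIDIOC_REQBUFS:OUTPUT",
    "VIDIOC_S_FMT:CAPTURE",
    "VIDIOC_CREATE_BUFS:CAPTURE",
    "VIDIOC_STREAMON",
};

#define KDIRECT_SETUP_STEPS \
    (sizeof(kdirect_setup_order) / sizeof(kdirect_setup_order[0]))

/* Counts how many reference steps appear in `trace` in the reference
 * order, ignoring unrelated entries in between. A result equal to
 * KDIRECT_SETUP_STEPS means the trace contains the whole sequence. */
static size_t setup_steps_matched(const char *const *trace, size_t n)
{
    size_t next = 0;

    for (size_t i = 0; i < n && next < KDIRECT_SETUP_STEPS; i++)
        if (strcmp(trace[i], kdirect_setup_order[next]) == 0)
            next++;
    return next;
}
```

A short count points at the first step that arrived out of order, which is the piece of information needed when deciding whether the iter25 pre-seed flow is compatible with rpi-hevc-dec.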

Two structural notes for the backend:

  • OUTPUT + CAPTURE both use V4L2_MEMORY_DMABUF in kdirect. Our backend currently uses MMAP for CAPTURE on rkvdec/hantro. For Pi 5 we should either follow kdirect (DMABUF, allows zero-copy DRM_PRIME export) or use MMAP and CPU-detile. Phase 4 design decision.
  • The order S_FMT OUTPUT → REQBUFS OUTPUT → S_FMT CAPTURE → CREATE_BUFS CAPTURE → STREAMON differs from our iter25 rkvdec pre-seed pattern (where SPS via S_EXT_CTRLS must come BEFORE CAPTURE alloc to resolve the image_fmt). rpi-hevc-dec apparently DOESN'T need that pre-seed — CAPTURE S_FMT just takes the explicit NC12 + caller's dims. Confirm in Phase 1 by trying our existing iter25 pre-seed flow against it.

Q6 — packaging: Debian 13 trixie, NOT Arch

higgs runs Debian 13 trixie (PRETTY_NAME="Debian GNU/Linux 13 (trixie)"), not Arch ALARM. Phase 8 (per the dev-process Phase 8 packaging rule) for the Pi 5 chapter needs a debian/ packaging tree, not just a PKGBUILD.

Decide in Phase 1 whether to:

  • Add Debian packaging to marfrit-packages as a second target, OR
  • Use distrobox/podman with an Arch ALARM container on higgs for install (test-only, not production), OR
  • Pi 5 chapter ships a Debian source pkg via gitea / a personal Debian repo.

Other new findings from the probe session

  • ffmpeg 7.1.3 from Debian 13 is built with --enable-v4l2-request — the kdirect path exists. Invocation is ffmpeg -hwaccel drm -c:v hevc (not just -hwaccel drm; the explicit codec flag matters for the negotiation). Engagement log line is Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8. Per hw-decode-engagement-check, grep for that line to confirm HW path engaged.
  • No libva ICD installed on higgs — only armada-drm_dri.so ships, which doesn't apply. We'd be the first VA-API HW path for HEVC on Pi 5 once installed.
  • mpv is apt-installable (mpv 0.40.0-3+deb13u1) — useful as a pixel-readback verifier once the backend works (mpv --vo=image or --vo=drm).
  • Firefox 145.0.1 + rpi-firefox-mods 20251016 installed (firefox-esr package status was rc = removed but config remains). The mods package likely contains VA-API plumbing prefs.

What changes for Phase 1

  • Goal is now phrasable: HEVC bit-exact libva-vs-kdirect on higgs for the 1280×720 Main 8-bit test fixture (same generator as /tmp/bbb_main.mp4 here). Kdirect engagement signal is the Hwaccel V4L2 HEVC stateless V4 log line.
  • Most backend code reuses existing rkvdec/hantro HEVC path: ctrls, per-frame submission, request_fd, multi-device probe pattern.
  • New code: NC12 video_format entry + detile primitive (sibling to nv15_unpack_plane_to_p010) + RPI_HEVC_DEC driver_kind.
  • Packaging target = Debian, not Arch.