Phase 0 — Pi 5 / CM5 HEVC chapter
Opened 2026-05-17 evening, after the failed libva-v4l2-stateful-fourier
scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated
the stateful premise: rpi-hevc-dec is V4L2 stateless, so Pi 5 HEVC
belongs in this backend, not a separate sibling.
No code in this chapter yet. This doc is the substrate. Phase 1 picks up from the "Open questions" section.
Substrate
Target host
higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often
offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power
when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at
/dev/video19 + /dev/media1.
Backend baseline at chapter open
libva-v4l2-request-fourier master tip cf8cd9d (iter39 + Option B +
h265 ref-list cap fix). Multi-device probe (iter38) already opens
rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is
a natural extension of that architecture.
iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265
parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That
plumbing is probe-gated (has_hevc_ext_sps_rps_rkvdec), so it stays
dormant on hosts where the controls don't exist.
Empirical higgs probe (brother session)
v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls:
Stateless Codec Controls
hevc_sequence_parameter_set (compound, V4L2_CID_STATELESS_HEVC_SPS)
hevc_picture_parameter_set (compound, V4L2_CID_STATELESS_HEVC_PPS)
slice_param_array (compound dynamic-array dims=[4096])
hevc_scaling_matrix (compound)
hevc_decode_parameters (compound)
hevc_decode_mode (menu, "Frame-Based")
hevc_start_code (menu, default "No Start Code")
OUTPUT formats:
S265 V4L2_PIX_FMT_HEVC_SLICE (parsed slice payload)
CAPTURE formats:
NC12 V4L2_PIX_FMT_NV12_COL128 (8-bit SAND 128-column tiled)
NC30 V4L2_PIX_FMT_NV12_10_COL128 (10-bit SAND 128-column tiled)
Conclusion: this is the standard V4L2_CID_STATELESS_HEVC_* control set
exposed under the V4L2-request uAPI, exactly the same family our backend
already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are
two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).
What carries forward unchanged
- VAAPI HEVC profile enumeration (config.c)
- h265_set_controls core path (h265.c): same compound ctrl set
- Synthetic SPS pre-seed pattern (iter25/26): already runs pre-CAPTURE-alloc
- Multi-device dispatch in RequestCreateConfig (iter38)
- VAAPI slice / picture / IQ matrix buffer parsing
- HEVC h264-style start-code policy (we already DON'T prepend for HEVC)
What needs adding
| Item | Location | Sizing |
|---|---|---|
| RPI_HEVC_DEC enum in driver_kind_t | request.h | trivial |
| Multi-device probe extends to /dev/video19 discovery | context.c / request.c init | small (mirror hantro slot) |
| V4L2_PIX_FMT_NV12_COL128 (NC12) video_format entry | video.c | small |
| V4L2_PIX_FMT_NV12_10_COL128 (NC30) video_format entry | video.c | small |
| NC12 → NV12 detile primitive | new nv12_col128.c | mid (column tile layout, see kernel docs) |
| NC30 → P010 detile primitive | new nv12_col128.c | mid (10-bit variant of above) |
| copy_surface_to_image branch for NC12/NC30 | image.c | small (mirror NV15→P010 gating) |
| Per-driver gating for any rpi-specific quirks discovered | various | per per-driver-kludge-gating |
Open questions for Phase 1
Lock these before Phase 1 commits to a goal.
1. EXT_SPS controls on rpi-hevc-dec? Brother's --list-ctrls output above shows the standard V4L2_CID_STATELESS_HEVC_* family, NOT the EXT_SPS_ST_RPS / EXT_SPS_LT_RPS extensions that VDPU381 needs. Verify: does slice_param_array[4096] accept st_rps_bits / lt_rps_bits in the per-slice payload, or does rpi-hevc-dec parse RPS itself from the slice header? If the latter, the iter2 EXT_SPS path stays dormant (probe-gated already), and rpi-hevc-dec just needs the picture->st_rps_bits → slice_params->short_term_ref_pic_set_size plumbing that iter31 α-29 already wired. Expectation: works out of the box. Confirm before assuming.
2. hevc_start_code ctrl: "No Start Code" vs Annex B? Brother saw default "No Start Code", which matches our behavior (we don't prepend on HEVC). But the ctrl is configurable. Verify the menu values exposed and confirm "No Start Code" passes our raw slice-NAL payload as-is. If it doesn't, set the ctrl explicitly per unconditional-codec-state gating.
3. NC12 / NC30 SAND tile layout: exact spec. Read Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst for the COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes (UV interleaved). Row count = ALIGN(height, 16) or ALIGN(height, 8)? Get the exact alignment and tile-traversal order before writing the detile primitive. Cite from kernel doc, NOT inferred from a hex dump.
4. drm_prime / SAND modifier round-trip. Does ffmpeg-vaapi (and Firefox) accept the NC12 buffer via DRM_PRIME export carrying the DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing zero-copy to a SAND-aware compositor? Or is libva-side detile to a linear NV12 buffer the only viable Firefox path? If detile is required for the consumer, the rockchip-pixel-verify-path rule (DMA-BUF GL preferred over cached mmap) might NOT apply since SAND is Pi-specific and not in the wider Wayland modifier ecosystem.
5. rpi-hevc-dec quirks on first SPS submission. rkvdec needs image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE" ordering? Verify with strace early.
6. higgs OS + libva versioning. Brother probed on Debian. We package for Arch ALARM. What's the install path on higgs: Arch / Debian / Raspberry Pi OS? If Debian, the package needs a debian/ tree, not just PKGBUILD. Decide packaging target before Phase 8.
Phase 1 goal sketch (NOT locked)
Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).
Two measurable subgoals follow naturally:
- libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request, NV12 image output) byte-exact for the same input.
- Firefox VA-API path engages (verify via chrome://gpu equivalent / log inspection, MOZ_LOG=PlatformDecoderModule:5).
Phase 3 baseline plan
Before any backend code touches rpi-hevc-dec:
- kdirect floor: ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime -i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ... and sha256 the YUV.
- SW reference: same ffmpeg without -hwaccel, sha256 the YUV.
- Both runs N=3 per replicate-baseline-first.
- Capture strace -f -e ioctl of the kdirect run, which gives the canonical ioctl sequence rpi-hevc-dec expects.
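The replicate-and-compare step of this plan can be sketched in a few lines. This is a minimal illustration only: `sha256_hex` and `baseline_verdict` are hypothetical helper names, and the caller is assumed to have already read each run's raw YUV output into memory (for example with `pathlib.Path.read_bytes`).

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of one run's raw YUV output."""
    return hashlib.sha256(data).hexdigest()


def baseline_verdict(hw_runs, sw_runs):
    """Compare N replicate HW (kdirect) runs against N SW reference runs.

    hw_runs / sw_runs are lists of bytes objects, one per replicate.
    A floor only counts if every replicate of a kind hashes identically.
    """
    hw = {sha256_hex(r) for r in hw_runs}
    sw = {sha256_hex(r) for r in sw_runs}
    hw_stable = len(hw) == 1
    sw_stable = len(sw) == 1
    return {
        "hw_stable": hw_stable,
        "sw_stable": sw_stable,
        # Bit-exact only when both sides are stable and share one digest.
        "bit_exact": hw_stable and sw_stable and hw == sw,
    }
```

Keeping stability and bit-exactness as separate fields makes a flaky HW run distinguishable from a stable but wrong one.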
Phase 0 closing
This doc commits the substrate. Phase 1 starts when:
- higgs is up + reachable
- Open questions 1+2 (EXT_SPS + start_code) are answered live, in one short probe session
- Phase 3 baseline floors are captured
No work blocks the close of iter39 / fresnel campaign — those are shipped.
Phase 0 close addendum (2026-05-17 evening, higgs probe session)
Empirical probes on higgs answered Q1, Q2, partial Q3, full Q5, full Q6. Q4 (DRM modifier round-trip) remains open. Phase 0 is closed; Phase 1 opens with what's below.
Q1 — EXT_SPS controls on rpi-hevc-dec: NOT present
v4l2-ctl -d /dev/video19 --list-ctrls confirms ONLY the standard
V4L2_CID_STATELESS_HEVC_* set:
- hevc_sequence_parameter_set (0x00a40a90)
- hevc_picture_parameter_set (0x00a40a91)
- slice_param_array (0x00a40a92, dynamic-array dims=[4096])
- hevc_scaling_matrix (0x00a40a93)
- hevc_decode_parameters (0x00a40a94)
- hevc_decode_mode (0x00a40a95, menu min=1 max=1 default=1 = Frame-Based)
- hevc_start_code (0x00a40a96, menu min=0 max=1 default=0 = No Start Code)
- 0x00a40a97 returns EINVAL (no EXT_SPS_*_RPS controls)
ioctl trace confirms ffmpeg's VIDIOC_QUERY_EXT_CTRL for 0xa97 returns
EINVAL — same probe pattern our backend uses for
has_hevc_ext_sps_rps_rkvdec. The iter2 path stays dormant; the
iter31 α-29 slice_params->short_term_ref_pic_set_size plumbing is the
correct one for rpi-hevc-dec.
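As a cross-check on the IDs listed in this section, the arithmetic behind them can be reproduced from the mainline uAPI constants. The sketch below assumes the stateless codec class value and the HEVC base offset of 400 from `linux/v4l2-controls.h`; the function names are hypothetical, and the sketch makes no claim about what control a given kernel assigns to the first unlisted ID.

```python
# Constants as defined in the mainline <linux/v4l2-controls.h> uAPI header.
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00A40000
V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900
HEVC_FIRST_OFFSET = 400  # V4L2_CID_STATELESS_HEVC_SPS = base + 400

# Control names as printed by v4l2-ctl, in the order the probe listed them.
PROBED_NAMES = [
    "hevc_sequence_parameter_set",
    "hevc_picture_parameter_set",
    "slice_param_array",
    "hevc_scaling_matrix",
    "hevc_decode_parameters",
    "hevc_decode_mode",
    "hevc_start_code",
]


def hevc_ctrl_ids():
    """Map each probed control name to its numeric control ID."""
    first = V4L2_CID_CODEC_STATELESS_BASE + HEVC_FIRST_OFFSET
    return {name: first + i for i, name in enumerate(PROBED_NAMES)}


def first_unlisted_id():
    """The ID right after the last probed control (the one that got EINVAL)."""
    return V4L2_CID_CODEC_STATELESS_BASE + HEVC_FIRST_OFFSET + len(PROBED_NAMES)
```

The seven probed controls occupy a contiguous ID range, so the EINVAL at the next ID marks the end of what this driver exposes.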
Q2 — hevc_start_code: default 0 (No Start Code), values {0, 1}
Default 0 matches our backend's "don't prepend HEVC start code" stance. Confirm in Phase 1: rpi-hevc-dec accepts our raw NAL slice payload as-is.
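One cheap guard for the Phase 1 confirmation is to assert that the OUTPUT payload really carries no Annex B prefix before it is queued. The sketch below is an illustration with hypothetical helper names; it relies only on the Annex B start-code patterns (00 00 01 and 00 00 00 01) and on the two-byte HEVC NAL header, whose first byte holds the 6-bit nal_unit_type shifted left by one.

```python
def annexb_prefix_len(payload: bytes) -> int:
    """Length of a leading Annex B start code, or 0 if the payload is raw."""
    if payload.startswith(b"\x00\x00\x00\x01"):
        return 4
    if payload.startswith(b"\x00\x00\x01"):
        return 3
    return 0


def nal_unit_type(payload: bytes) -> int:
    """HEVC nal_unit_type of the first NAL, skipping any start code."""
    offset = annexb_prefix_len(payload)
    if len(payload) < offset + 2:
        raise ValueError("payload too short for an HEVC NAL header")
    # Byte layout: forbidden_zero_bit(1) | nal_unit_type(6) | layer_id MSB(1)
    return (payload[offset] >> 1) & 0x3F
```

With hevc_start_code left at 0, `annexb_prefix_len` is expected to return 0 for every slice buffer the backend submits.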
Q3 — NC12 / NC30 SAND tile layout: PARTIAL
CAPTURE S_FMT result for 1280×720 NC12:
- sizeimage=1382400 = 1280 × 720 × 1.5 (linear NV12 byte count)
- bytesperline=1080 (NOT 1280)
The bytesperline=1080 for a 1280-wide CAPTURE buffer is suspect — likely
encodes SAND column count rather than linear stride. Read
drivers/staging/media/rpivid/ (or wherever NC12_COL128 lives in 6.12)
kernel source + drm_fourcc.h / nv12_col128.rst (if it exists) for
exact tile layout BEFORE writing the detile primitive. Do NOT infer
layout from this single observation.
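The two numbers are at least arithmetically consistent with one hypothesis: that bytesperline carries the number of 128-byte rows per column (720 luma rows plus 360 chroma rows) instead of a linear stride. The sketch below only checks that arithmetic. It is not evidence of the layout, and `col128_hypothesis` is a hypothetical name; the kernel source read above remains the authority.

```python
COL_WIDTH_BYTES = 128  # column width implied by the COL128 format name


def col128_hypothesis(width, height, bytesperline, sizeimage):
    """Check S_FMT results against the 'bytesperline = rows per column' guess.

    ASSUMPTION (unverified): each 128-byte-wide column stores `height` luma
    rows followed by `height / 2` interleaved chroma rows.
    """
    columns = -(-width // COL_WIDTH_BYTES)  # ceiling division
    rows_per_column = height * 3 // 2
    return {
        "columns": columns,
        "rows_per_column": rows_per_column,
        "bytesperline_matches": bytesperline == rows_per_column,
        "sizeimage_matches": columns * COL_WIDTH_BYTES * bytesperline == sizeimage,
    }
```

For the observed 1280×720 case both checks pass (10 columns × 128 bytes × 1080 rows = 1382400). A second resolution whose height is not a multiple of 16 would be a more telling test of any alignment padding.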
Q4 — DRM modifier round-trip: BLOCKED on hwdownload
ffmpeg -hwaccel drm -hwaccel_output_format drm_prime -vf hwmap=mode=read,format=nv12 returns Failed to map frame: -38
(Function not implemented). hwdownload cannot consume the SAND
modifier directly.
ffmpeg's path that DOES work: -hwaccel drm -c:v hevc WITHOUT
-hwaccel_output_format drm_prime lets ffmpeg's internal pipeline pull
back, detile (presumably via a Pi-specific helper or libdrm transform),
and present NV12 to the next filter. Bit-exact vs SW for the test
fixture (1280×720 Main 8-bit) — confirms HW engagement.
Phase 1 / Phase 4 will need to decide:
- Detile in the backend (CPU SIMD), exposing NV12 via VAImage; or
- Pass-through DRM_PRIME with SAND modifier and let the consumer (compositor / Firefox) detile. Firefox almost certainly can't, so CPU detile is the safe bet.
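To make the CPU-detile option concrete, here is a reference-style sketch in Python of what an NC12 → NV12 copy could look like. It is not the planned SIMD primitive, and it rests on an unconfirmed layout assumption: 128-byte-wide columns, each holding `height` luma rows directly followed by `height // 2` interleaved UV rows, with consecutive columns `col_stride_rows` rows apart. The function name and parameters are hypothetical, and the layout must be checked against the driver source before anything like this is trusted.

```python
COL_W = 128  # bytes per row inside one column (ASSUMED from the format name)


def detile_nc12(tiled: bytes, width: int, height: int, col_stride_rows: int) -> bytes:
    """Return linear NV12 (Y plane, then interleaved UV plane), stride == width.

    ASSUMED layout: column c starts at c * col_stride_rows * COL_W; chroma
    rows start right after `height` luma rows, with no padding in between.
    """
    columns = (width + COL_W - 1) // COL_W
    chroma_rows = height // 2
    if col_stride_rows < height + chroma_rows:
        raise ValueError("column stride too small for luma + chroma rows")
    if len(tiled) < columns * col_stride_rows * COL_W:
        raise ValueError("tiled buffer shorter than the assumed layout")

    y_plane = bytearray(width * height)
    uv_plane = bytearray(width * chroma_rows)
    for col in range(columns):
        col_base = col * col_stride_rows * COL_W
        x0 = col * COL_W
        n = min(COL_W, width - x0)  # last column may be partially used
        for row in range(height):
            src = col_base + row * COL_W
            dst = row * width + x0
            y_plane[dst:dst + n] = tiled[src:src + n]
        for row in range(chroma_rows):
            src = col_base + (height + row) * COL_W
            dst = row * width + x0
            uv_plane[dst:dst + n] = tiled[src:src + n]
    return bytes(y_plane + uv_plane)
```

A pure-Python version like this is only useful as a slow oracle for comparing a future C implementation against, once the layout is confirmed.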
Q5 — rpi-hevc-dec submission ordering: empirically locked
strace -e ioctl of the kdirect run shows:
1. MEDIA_IOC_DEVICE_INFO + MEDIA_IOC_G_TOPOLOGY (per media node)
2. VIDIOC_QUERYCAP per video node; driver="rpi-hevc-dec" identifies the right one
3. VIDIOC_ENUM_FMT OUTPUT → S265 only
4. VIDIOC_S_FMT OUTPUT (HEVC_SLICE, placeholder dims)
5. VIDIOC_REQBUFS OUTPUT (DMABUF, count=N); count=6 in kdirect
6. VIDIOC_S_FMT CAPTURE (NC12, actual dims from SPS parse)
7. VIDIOC_CREATE_BUFS CAPTURE (DMABUF, count=16)
8. VIDIOC_STREAMON both queues
9. VIDIOC_QUERY_EXT_CTRL enumeration
10. VIDIOC_S_EXT_CTRLS (decode_mode + start_code), global ctrls
11. Per frame: VIDIOC_S_EXT_CTRLS (SPS+PPS+decode_params+slice_array, class=0xf010000 = per-request) + VIDIOC_QBUF CAPTURE + VIDIOC_QBUF OUTPUT (with V4L2_BUF_FLAG_IN_REQUEST | V4L2_BUF_FLAG_REQUEST_FD) + VIDIOC_DQBUF OUTPUT + VIDIOC_DQBUF CAPTURE
Two structural notes for the backend:
- OUTPUT + CAPTURE both use V4L2_MEMORY_DMABUF in kdirect. Our backend currently uses MMAP for CAPTURE on rkvdec/hantro. For Pi 5 we should either follow kdirect (DMABUF, allows zero-copy DRM_PRIME export) or use MMAP and CPU-detile. Phase 4 design decision.
- The order S_FMT OUTPUT → REQBUFS OUTPUT → S_FMT CAPTURE → CREATE_BUFS CAPTURE → STREAMON differs from our iter25 rkvdec pre-seed pattern (where SPS via S_EXT_CTRLS must come BEFORE CAPTURE alloc to resolve the image_fmt). rpi-hevc-dec apparently DOESN'T need that pre-seed: CAPTURE S_FMT just takes the explicit NC12 + caller's dims. Confirm in Phase 1 by trying our existing iter25 pre-seed flow against it.
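When the backend's own trace is later compared against kdirect, the setup order observed in this section can be checked mechanically. The sketch below is a rough illustration with hypothetical names. It assumes strace prints lines of the form `ioctl(fd, VIDIOC_..., {...type=V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE...})`; the exact decoding depends on the strace version, so the patterns may need adjusting.

```python
import re

# Setup order observed in the kdirect trace; None means "either queue".
EXPECTED_SETUP = [
    ("VIDIOC_S_FMT", "OUTPUT"),
    ("VIDIOC_REQBUFS", "OUTPUT"),
    ("VIDIOC_S_FMT", "CAPTURE"),
    ("VIDIOC_CREATE_BUFS", "CAPTURE"),
    ("VIDIOC_STREAMON", None),
]

_IOCTL = re.compile(r"ioctl\(\d+, (\w+)")


def parse_events(trace_text):
    """Extract (ioctl_name, queue) pairs from strace output (ASSUMED format)."""
    events = []
    for line in trace_text.splitlines():
        m = _IOCTL.search(line)
        if not m:
            continue
        if "VIDEO_OUTPUT" in line:
            queue = "OUTPUT"
        elif "VIDEO_CAPTURE" in line:
            queue = "CAPTURE"
        else:
            queue = None
        events.append((m.group(1), queue))
    return events


def follows_setup_order(events, expected=EXPECTED_SETUP):
    """True if `expected` occurs as an in-order subsequence of `events`."""
    remaining = iter(events)
    for name, queue in expected:
        for ev_name, ev_queue in remaining:
            if ev_name == name and (queue is None or ev_queue == queue):
                break
        else:
            return False  # ran out of events before matching this step
    return True
```

A subsequence match is deliberately loose: unrelated ioctls between the steps are ignored, and only the relative order of the five setup calls is enforced.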
Q6 — packaging: Debian 13 trixie, NOT Arch
higgs runs Debian 13 trixie (PRETTY_NAME="Debian GNU/Linux 13 (trixie)"),
not Arch ALARM. Phase 8 (per the dev-process Phase 8 packaging rule) for
the Pi 5 chapter needs a debian/ packaging tree, not just a PKGBUILD.
Decide in Phase 1 whether to:
- Add Debian packaging to marfrit-packages as a second target, OR
- Use distrobox/podman with an Arch ALARM container on higgs for install (test-only, not production), OR
- Pi 5 chapter ships a Debian source pkg via gitea / a personal Debian repo.
Other new findings from the probe session
- ffmpeg 7.1.3 from Debian 13 is built with --enable-v4l2-request, so the kdirect path exists. Invocation is ffmpeg -hwaccel drm -c:v hevc (not just -hwaccel drm; the explicit codec flag matters for the negotiation). Engagement log line is Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8. Per hw-decode-engagement-check, grep for that line to confirm HW path engaged.
- No libva ICD installed on higgs: only armada-drm_dri.so ships, which doesn't apply. We'd be the first VA-API HW path for HEVC on Pi 5 once installed.
- mpv is apt-installable (mpv 0.40.0-3+deb13u1), useful as a pixel-readback verifier once the backend works (mpv --vo=image or --vo=drm).
- Firefox 145.0.1 + rpi-firefox-mods 20251016 installed (firefox-esr package status was rc = removed but config remains). The mods package likely contains VA-API plumbing prefs.
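The engagement check on the kdirect log line can be scripted instead of eyeballed. The sketch below assumes the log line keeps exactly the shape quoted in this section; `parse_engagement` is a hypothetical helper, not part of ffmpeg or the backend.

```python
import re

# Pattern built from the single engagement line quoted in the probe notes.
ENGAGE_RE = re.compile(
    r"Hwaccel V4L2 HEVC stateless V4; "
    r"devices: (?P<media>[^,\s]+),(?P<video>[^;\s]+); "
    r"buffers: src (?P<src>\w+), dst (?P<dst>\w+); "
    r"swfmt=(?P<swfmt>\S+)"
)


def parse_engagement(log_text: str):
    """Return the engagement fields as a dict, or None if HW never engaged."""
    m = ENGAGE_RE.search(log_text)
    return m.groupdict() if m else None
```

Returning the parsed fields, not just a boolean, lets a test also assert that the expected device nodes and buffer types were negotiated.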
What changes for Phase 1
- Goal is now phrasable: HEVC bit-exact libva-vs-kdirect on higgs for the 1280×720 Main 8-bit test fixture (same generator as /tmp/bbb_main.mp4 here). Kdirect engagement signal is the Hwaccel V4L2 HEVC stateless V4 log line.
- Most backend code reuses existing rkvdec/hantro HEVC path: ctrls, per-frame submission, request_fd, multi-device probe pattern.
- New code: NC12 video_format entry + detile primitive (sibling to nv15_unpack_plane_to_p010) + RPI_HEVC_DEC driver_kind.
- Packaging target = Debian, not Arch.