diff --git a/phase0_pi5_hevc.md b/phase0_pi5_hevc.md
new file mode 100644
index 0000000..db41aa7
--- /dev/null
+++ b/phase0_pi5_hevc.md
@@ -0,0 +1,160 @@
+# Phase 0 — Pi 5 / CM5 HEVC chapter
+
+Opened 2026-05-17 evening, after the failed `libva-v4l2-stateful-fourier`
+scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated
+the stateful premise: rpi-hevc-dec is V4L2 **stateless**, so Pi 5 HEVC
+belongs in this backend, not a separate sibling.
+
+No code in this chapter yet. This doc is the substrate. Phase 1 picks up
+from the "Open questions" section.
+
+## Substrate
+
+### Target host
+
+higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often
+offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power
+when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at
+`/dev/video19` + `/dev/media1`.
+
+### Backend baseline at chapter open
+
+`libva-v4l2-request-fourier` master tip `cf8cd9d` (iter39 + Option B +
+h265 ref-list cap fix). Multi-device probe (iter38) already opens
+rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is
+a natural extension of that architecture.
+
+iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265
+parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That
+plumbing is probe-gated (`has_hevc_ext_sps_rps_rkvdec`), so it stays
+dormant on hosts where the controls don't exist.
+
+### Empirical higgs probe (brother session)
+
+`v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls`:
+
+```
+Stateless Codec Controls
+
+  hevc_sequence_parameter_set (compound, V4L2_CID_STATELESS_HEVC_SPS)
+  hevc_picture_parameter_set (compound, V4L2_CID_STATELESS_HEVC_PPS)
+  slice_param_array (compound dynamic-array dims=[4096])
+  hevc_scaling_matrix (compound)
+  hevc_decode_parameters (compound)
+  hevc_decode_mode (menu, "Frame-Based")
+  hevc_start_code (menu, default "No Start Code")
+
+OUTPUT formats:
+  S265 V4L2_PIX_FMT_HEVC_SLICE (parsed slice payload)
+
+CAPTURE formats:
+  NC12 V4L2_PIX_FMT_NV12_COL128 (8-bit SAND 128-column tiled)
+  NC30 V4L2_PIX_FMT_NV12_10_COL128 (10-bit SAND 128-column tiled)
+```
+
+Conclusion: this is the standard `V4L2_CID_STATELESS_HEVC_*` control set
+exposed under the V4L2-request uAPI, exactly the same family our backend
+already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are
+two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).
+
+## What carries forward unchanged
+
+- VAAPI HEVC profile enumeration (`config.c`)
+- `h265_set_controls` core path (`h265.c`) — same compound ctrl set
+- Synthetic SPS pre-seed pattern (iter25/26) — already runs pre-CAPTURE-alloc
+- Multi-device dispatch in `RequestCreateConfig` (iter38)
+- VAAPI slice / picture / IQ matrix buffer parsing
+- HEVC h264-style start-code policy (we already DON'T prepend for HEVC)
+
+## What needs adding
+
+| Item | Location | Sizing |
+|------|----------|--------|
+| `RPI_HEVC_DEC` enum in `driver_kind_t` | `request.h` | trivial |
+| Multi-device probe extends to `/dev/video19` discovery | `context.c` / `request.c` init | small — mirror hantro slot |
+| `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry | `video.c` | small |
+| `V4L2_PIX_FMT_NV12_10_COL128` (NC30) `video_format` entry | `video.c` | small |
+| NC12 → NV12 detile primitive | new `nv12_col128.c` | mid — column tile layout, see kernel docs |
+| NC30 → P010 detile primitive | new `nv12_col128.c` | mid — 10-bit variant of above |
+| `copy_surface_to_image` branch for NC12/NC30 | `image.c` | small (mirror NV15→P010 gating) |
+| Per-driver gating for any rpi-specific quirks discovered | various | per [[per-driver-kludge-gating]] |
+
+## Open questions for Phase 1
+
+Lock these before Phase 1 commits to a goal.
+
+1. **EXT_SPS controls on rpi-hevc-dec?** Brother's `--list-ctrls` output
+   above shows the standard `V4L2_CID_STATELESS_HEVC_*` family — NOT the
+   `EXT_SPS_ST_RPS` / `EXT_SPS_LT_RPS` extensions that VDPU381 needs.
+   Verify: does `slice_param_array[4096]` accept `st_rps_bits` /
+   `lt_rps_bits` in the per-slice payload, or does rpi-hevc-dec parse RPS
+   itself from the slice header? If the latter, the iter2 EXT_SPS path
+   stays dormant (probe-gated already), and rpi-hevc-dec just needs the
+   `picture->st_rps_bits` → `slice_params->short_term_ref_pic_set_size`
+   plumbing that iter31 α-29 already wired. Expectation: works out of the
+   box. Confirm before assuming.
+
+2. **`hevc_start_code` ctrl: "No Start Code" vs Annex B?** Brother saw
+   default `"No Start Code"` — matches our behavior (we don't prepend on
+   HEVC). But the ctrl is configurable. Verify the menu values exposed
+   and confirm "No Start Code" passes our raw slice-NAL payload as-is.
+   If it doesn't, set the ctrl explicitly per [[unconditional-codec-state]]
+   gating.
+
+3. **NC12 / NC30 SAND tile layout — exact spec.** Read
+   `Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst` for the
+   COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes
+   (UV interleaved). Row count = `ALIGN(height, 16)` or `ALIGN(height, 8)`?
+   Get the exact alignment and tile-traversal order before writing the
+   detile primitive. Cite from kernel doc, NOT inferred from a hex dump.
+
+4. **drm_prime / SAND modifier round-trip.** Does ffmpeg-vaapi (and
+   Firefox) accept the NC12 buffer via DRM_PRIME export carrying the
+   DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing
+   zero-copy to a SAND-aware compositor? Or is libva-side detile to a
+   linear NV12 buffer the only viable Firefox path? If detile is
+   required for the consumer, the [[rockchip-pixel-verify-path]] rule
+   (DMA-BUF GL preferred over cached mmap) might NOT apply since SAND
+   is Pi-specific and not in the wider Wayland modifier ecosystem.
+
+5. **rpi-hevc-dec quirks on first SPS submission.** rkvdec needs
+   image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec
+   have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE"
+   ordering? Verify with strace early.
+
+6. **higgs OS + libva versioning.** Brother probed on Debian. We package
+   for Arch ALARM. What's the install path on higgs — Arch / Debian /
+   Raspberry Pi OS? If Debian, the package needs a `debian/` tree, not
+   just PKGBUILD. Decide packaging target before Phase 8.
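For question 3, the addressing a detile primitive would need can be sketched ahead of the kernel-doc read, with every layout fact left as a parameter. The sketch assumes, without confirmation, that each column is 128 bytes wide, that rows inside a column are stored top to bottom, and that consecutive columns are a fixed number of rows apart. `col128_plane_to_linear`, `col_stride_rows` and `plane_row_offset` are hypothetical names; the real values have to come from the kernel doc and the negotiated format, not from this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COL128_BYTES 128u /* assumed column width in bytes */

/*
 * Copy one 8-bit plane from a column-tiled buffer into a linear buffer.
 *
 * ASSUMED layout (to be verified against the kernel documentation):
 *   - column c starts at byte offset c * col_stride_rows * 128
 *   - each row inside a column is 128 bytes, stored top to bottom
 *   - the plane starts plane_row_offset rows into every column
 */
void col128_plane_to_linear(const uint8_t *src, size_t col_stride_rows,
                            size_t plane_row_offset,
                            uint8_t *dst, size_t dst_pitch,
                            size_t width_bytes, size_t rows)
{
    assert(src != NULL && dst != NULL);
    assert(dst_pitch >= width_bytes);
    assert(plane_row_offset + rows <= col_stride_rows);

    for (size_t x = 0; x < width_bytes; x += COL128_BYTES) {
        /* The last column may be only partially covered by the image. */
        size_t left = width_bytes - x;
        size_t chunk = left < COL128_BYTES ? left : COL128_BYTES;

        const uint8_t *col = src
            + (x / COL128_BYTES) * col_stride_rows * COL128_BYTES
            + plane_row_offset * COL128_BYTES;

        for (size_t y = 0; y < rows; y++)
            memcpy(dst + y * dst_pitch + x, col + y * COL128_BYTES, chunk);
    }
}
```

For NC12 luma this would be called with `plane_row_offset = 0`. The chroma offset and the row alignment are exactly what question 3 still has to settle, and NC30 packing is not covered here.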
+
+## Phase 1 goal sketch (NOT locked)
+
+> Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact
+> libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).
+
+Two measurable subgoals follow naturally:
+- libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request,
+  NV12 image output) byte-exact for the same input.
+- Firefox VA-API path engages (verify via `about:support`, Firefox's
+  `chrome://gpu` equivalent, or `MOZ_LOG=PlatformDecoderModule:5` logs).
+
+## Phase 3 baseline plan
+
+Before any backend code touches rpi-hevc-dec:
+- `kdirect` floor: `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime
+  -i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ...` and
+  sha256 the YUV.
+- `SW reference`: same ffmpeg without `-hwaccel`, sha256 the YUV.
+- Both runs N=3 per [[replicate-baseline-first]].
+- Capture `strace -f -e ioctl` of the kdirect run — gives the canonical
+  ioctl sequence rpi-hevc-dec expects.
+
+## Phase 0 closing
+
+This doc commits the substrate. Phase 1 starts when:
+- higgs is up + reachable
+- Open questions 1+2 (EXT_SPS + start_code) are answered live, in one
+  short probe session
+- Phase 3 baseline floors are captured
+
+No work blocks the close of iter39 / fresnel campaign — those are shipped.