From f1be489c751d21d1f225539a2c4b64efa74b2740 Mon Sep 17 00:00:00 2001 From: claude-noether Date: Sun, 17 May 2026 19:04:28 +0000 Subject: [PATCH] phase5_pi5_hevc_review: 3 critical findings empirically verified, 1 fixture gap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically verified each finding against current source per feedback_review_empirical_over_theoretical BEFORE accepting: F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS masked this because 10-bit path was never exercised. Fix: extend guard to __aarch64__. F2 (CRITICAL): destination_bytesperlines for NC12 source returns column-stride (1080) not linear-NV12 Y stride (1280). VAImage consumers see wrong pitch. Fix: override in RequestCreateImage when src=NC12, dst image=NV12. F3 (CRITICAL): request.c primary-driver detection has else-if branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec primary), neither matches → new fd pair stays -1 → routing no-ops. Fix: add explicit rpi-hevc-dec branch. F4 (accepted): add 1366x768 fixture to exercise column padding. F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify empirically in Phase 7. F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec which doesn't need it AND uses different submission order. Fix: gate on driver_data->video_fd != video_fd_rpi_hevc_dec. F7/F8 (no findings): image.c gate predicate sound; cross-device regression scope clean. Amended Phase 6 step list with 3 new gating actions. Phase 7 verification expanded with empirical START_CODE check + 1366 fixture. Co-Authored-By: Claude Opus 4.7 --- phase5_pi5_hevc_review.md | 194 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 194 insertions(+) create mode 100644 phase5_pi5_hevc_review.md diff --git a/phase5_pi5_hevc_review.md b/phase5_pi5_hevc_review.md new file mode 100644 index 0000000..0e24722 --- /dev/null +++ b/phase5_pi5_hevc_review.md @@ -0,0 +1,194 @@ +# Phase 5 review — iter40 plan (sonnet review + amendments) + +Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers ++ 1 fixture gap + 1 verification-only note. All findings verified empirically +against current source (per [[feedback_review_empirical_over_theoretical]]) +BEFORE accepting into the amended plan. + +## Reviewer findings + verification + amendments + +### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64 + +Empirical verification: `src/image.c` lines 239 + 268 wrap the entire +per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in +`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never +fires → both NC12 detile (proposed) AND existing NV15→P010 unpack +(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel +was bit-exact for 8-bit codecs only; the 10-bit detile path was never +exercised, so the dead-code didn't manifest as a failure. + +**Amendment:** Phase 6 step 5 first sub-action — change guard at lines +239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`. +This re-enables the existing NV15→P010 detile AND lets the new NC12 +detile branch execute. No semantic change on x86 (no detile primitives +compiled there). Add explicit comment crediting Phase 5 review + this +finding. + +### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage + +Empirical verification: `src/image.c` lines 90-115 already recompute +`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010` +(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through +"NV12" branch (line 108) uses V4L2-reported stride directly, which for +NC12 source is the column-stride 1080, not the linear Y stride 1280. +That breaks the VAImage's `pitches[0]` consumers expect. + +`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the +CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`. +For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool +allocates the actual `sizeimage` via REQBUFS, so the underlying buffer +is correctly sized; `fmt_sizes[]` is just a back-cache for later access +patterns that don't go through the kernel-reported value. + +**Amendment:** + +- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c + ~line 107, the "else" / NV12 branch), add detection: if the source + CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image + format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] = + width` (linear NV12 Y stride). `destination_sizes[0]` then computes + to `width × format_height` (correct linear Y plane size). Existing + NV12-source linear path unchanged. +- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single + contiguous buffer holding Y+UV) and document this is the planes-1 + multi-plane case (similar to NV12 MPLANE). +- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`) + stays AS-IS for now. It only affects cap_pool MMAP accounting which + uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future + bug emerges from this mismatch on the rkvdec/hantro side, address + then; not a blocker for iter40 NC12. + +### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop + +Empirical verification: `src/request.c` lines 647-657 only have `else if` +branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro) +the primary device IS `rpi-hevc-dec`, but neither branch matches → no +`primary_driver` set → no fds stored into the new +`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds. + +**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver +detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL, +no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes +remain dead on higgs because the find_decoder_device_by_driver walk +returns absent for them.) + +Also: extend `find_decoder_device_by_driver`'s driver-name table at +request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a +free-form string match (it is, per the code), not a hard table — so the +caller passes `"rpi-hevc-dec"` and the walk just looks for it. + +### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs + +The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A +1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column +padding path. The right-edge 42 pixels (cols 1366-1407) are padding; +the detile primitive must not write past the requested width. + +**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768) +to the Phase 7 verification set. Phase 3 baseline retroactively +captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at +N=1 (no need to redo the deterministic N=3 — one rep proves the +edge-case). If libva differs from kdirect on 1366 but matches on +1280/1920, the detile column-base math is buggy. + +### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting + +**Empirical NEW issue surfaced during verification (not in reviewer's +report):** `src/context.c` lines 516-528 unconditionally sets +`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND +prepends `0x00 0x00 0x01` start codes to each slice payload (per the +H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses +`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start +codes. + +Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open +question: does rpi-hevc-dec correctly parse start-code-prepended +payload when in ANNEX_B mode? Two possibilities: + (a) Yes — driver implements both modes, ANNEX_B works, libva PASSes + bit-exact in our default code path. + (b) No — driver only really implements NONE; ANNEX_B is a degenerate + menu entry; we'd need per-driver gating to send `_NONE` for + rpi-hevc-dec + suppress start-code prepend. + +**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect +diff. If (a), no code change needed. If (b), add per-driver gate around +the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively +gate; let empiricism decide. + +### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec + +Empirical verification: `src/context.c` lines 277-346 — the iter25 +synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless +of active driver_kind. On higgs, `driver_data->video_fd` will be +`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)` +fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need +this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global +ctrls per-frame). + +The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is +silently ignored, BUT the call may also succeed in an unintended way +on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its +internal state stuck on the dummy SPS until the first real per-frame +SPS arrives. + +**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS +injection block at context.c:277 with +`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The +pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec / +hantro paths unchanged. + +### F7 (No findings) — `image.c` gate predicate (focus area 3) + +Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0 +`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate +predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when +both conditions hold, excludes legitimate NV12-linear on RK / Allwinner. + +### F8 (No findings) — cross-device regression scope (focus area 4) + +Verified: new fd fields initialise to -1; probe loop extensions are +additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`; +new video.c entry is additive. fresnel + ampere paths unchanged. + +## Final amended Phase 6 step list + +1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`, + `has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout). +2. `src/request.c` — (a) extend init -1 block; (b) **add `else if + (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver + detection** [F3]; (c) extend `request_device_kind_for_profile` so + HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd. +3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on + `driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6]. +4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`, + `v4l2_mplane=true`, NOT marked linear. +5. `src/image.c`: + - **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1]. + - **Add NC12 detection in RequestCreateImage** (line 107 area): if + source format is NC12 + VAImage format is NV12, override + `destination_bytesperlines[0] = width` [F2]. + - **Add NC12 detile branch in `copy_surface_to_image`** (line 238+): + gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive. +6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive. +7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted + NC12 bytes + known linear output. +8. `src/meson.build` + `src/Makefile.am` — source list updates. +9. Build clean on higgs; if `tests/` doesn't auto-run, run manually. + +## Final amended Phase 7 verification + +- Build cleanly on higgs. +- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`. +- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`. +- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva + output SHA == kdirect SHA [F4]. +- `lsof` during libva decode shows `/dev/video19` open. +- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6 + verification]. +- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect + fails for HEVC, add per-driver `_NONE` gate per F5 amendment. +- Sibling regression: re-run fresnel iter38 5/5 test rig — no change + expected since iter40 path is gated on new fd. + +Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100; +F1 + F2 + F6 add ~30 LoC of gates / overrides).