phase5_pi5_hevc_review: 3 critical findings empirically verified, 1 fixture gap

Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically
verified each finding against current source per
feedback_review_empirical_over_theoretical BEFORE accepting:

F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and
already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS
masked this because 10-bit path was never exercised. Fix: extend
guard to __aarch64__.

F2 (CRITICAL): destination_bytesperlines for NC12 source returns
column-stride (1080) not linear-NV12 Y stride (1280). VAImage
consumers see wrong pitch. Fix: override in RequestCreateImage
when src=NC12, dst image=NV12.

F3 (CRITICAL): request.c primary-driver detection has else-if
branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec
primary), neither matches → new fd pair stays -1 → routing
no-ops. Fix: add explicit rpi-hevc-dec branch.

F4 (accepted): add 1366x768 fixture to exercise column padding.

F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on
rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify
empirically in Phase 7.

F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC
regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec
which doesn't need it AND uses different submission order. Fix:
gate on driver_data->video_fd != video_fd_rpi_hevc_dec.

F7/F8 (no findings): image.c gate predicate sound; cross-device
regression scope clean.

Amended Phase 6 step list with 3 new gating actions. Phase 7
verification expanded with empirical START_CODE check + 1366
fixture.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-17 19:04:28 +00:00
parent bf52725ab3
commit f1be489c75
+194
View File
@@ -0,0 +1,194 @@
# Phase 5 review — iter40 plan (sonnet review + amendments)
Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers
+ 1 fixture gap + 1 verification-only note. All findings verified empirically
against current source (per [[feedback_review_empirical_over_theoretical]])
BEFORE accepting into the amended plan.
## Reviewer findings + verification + amendments
### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64
Empirical verification: `src/image.c` lines 239 + 268 wrap the entire
per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in
`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
exercised, so the dead-code didn't manifest as a failure.
**Amendment:** Phase 6 step 5 first sub-action — change guard at lines
239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`.
This re-enables the existing NV15→P010 detile AND lets the new NC12
detile branch execute. No semantic change on x86 (no detile primitives
compiled there). Add explicit comment crediting Phase 5 review + this
finding.
### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage
Empirical verification: `src/image.c` lines 90-115 already recompute
`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010`
(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
NC12 source is the column-stride 1080, not the linear Y stride 1280.
That breaks the VAImage's `pitches[0]` consumers expect.
`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the
CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`.
For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool
allocates the actual `sizeimage` via REQBUFS, so the underlying buffer
is correctly sized; `fmt_sizes[]` is just a back-cache for later access
patterns that don't go through the kernel-reported value.
**Amendment:**
- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c
~line 107, the "else" / NV12 branch), add detection: if the source
CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image
format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] =
width` (linear NV12 Y stride). `destination_sizes[0]` then computes
to `width × format_height` (correct linear Y plane size). Existing
NV12-source linear path unchanged.
- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single
contiguous buffer holding Y+UV) and document this is the planes-1
multi-plane case (similar to NV12 MPLANE).
- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`)
stays AS-IS for now. It only affects cap_pool MMAP accounting which
uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future
bug emerges from this mismatch on the rkvdec/hantro side, address
then; not a blocker for iter40 NC12.
### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop
Empirical verification: `src/request.c` lines 647-657 only have `else if`
branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro)
the primary device IS `rpi-hevc-dec`, but neither branch matches → no
`primary_driver` set → no fds stored into the new
`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds.
**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver
detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL,
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
remain dead on higgs because the find_decoder_device_by_driver walk
returns absent for them.)
Also: extend `find_decoder_device_by_driver`'s driver-name table at
request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a
free-form string match (it is, per the code), not a hard table — so the
caller passes `"rpi-hevc-dec"` and the walk just looks for it.
### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
the detile primitive must not write past the requested width.
**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768)
to the Phase 7 verification set. Phase 3 baseline retroactively
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
N=1 (no need to redo the deterministic N=3 — one rep proves the
edge-case). If libva differs from kdirect on 1366 but matches on
1280/1920, the detile column-base math is buggy.
### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting
**Empirical NEW issue surfaced during verification (not in reviewer's
report):** `src/context.c` lines 516-528 unconditionally sets
`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND
prepends `0x00 0x00 0x01` start codes to each slice payload (per the
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start
codes.
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
question: does rpi-hevc-dec correctly parse start-code-prepended
payload when in ANNEX_B mode? Two possibilities:
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
bit-exact in our default code path.
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
menu entry; we'd need per-driver gating to send `_NONE` for
rpi-hevc-dec + suppress start-code prepend.
**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect
diff. If (a), no code change needed. If (b), add per-driver gate around
the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively
gate; let empiricism decide.
### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
Empirical verification: `src/context.c` lines 277-346 — the iter25
synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless
of active driver_kind. On higgs, `driver_data->video_fd` will be
`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)`
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
ctrls per-frame).
The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is
silently ignored, BUT the call may also succeed in an unintended way
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
internal state stuck on the dummy SPS until the first real per-frame
SPS arrives.
**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS
injection block at context.c:277 with
`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
hantro paths unchanged.
### F7 (No findings) — `image.c` gate predicate (focus area 3)
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate
predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
### F8 (No findings) — cross-device regression scope (focus area 4)
Verified: new fd fields initialise to -1; probe loop extensions are
additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`;
new video.c entry is additive. fresnel + ampere paths unchanged.
## Final amended Phase 6 step list
1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`,
`has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout).
2. `src/request.c` — (a) extend init -1 block; (b) **add `else if
(strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver
detection** [F3]; (c) extend `request_device_kind_for_profile` so
HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd.
3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on
`driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6].
4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`,
`v4l2_mplane=true`, NOT marked linear.
5. `src/image.c`:
- **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1].
- **Add NC12 detection in RequestCreateImage** (line 107 area): if
source format is NC12 + VAImage format is NV12, override
`destination_bytesperlines[0] = width` [F2].
- **Add NC12 detile branch in `copy_surface_to_image`** (line 238+):
gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive.
6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive.
7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted
NC12 bytes + known linear output.
8. `src/meson.build` + `src/Makefile.am` — source list updates.
9. Build clean on higgs; if `tests/` doesn't auto-run, run manually.
## Final amended Phase 7 verification
- Build cleanly on higgs.
- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`.
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva
output SHA == kdirect SHA [F4].
- `lsof` during libva decode shows `/dev/video19` open.
- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6
verification].
- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
fails for HEVC, add per-driver `_NONE` gate per F5 amendment.
- Sibling regression: re-run fresnel iter38 5/5 test rig — no change
expected since iter40 path is gated on new fd.
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100;
F1 + F2 + F6 add ~30 LoC of gates / overrides).