Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically verified each finding against current source per feedback_review_empirical_over_theoretical BEFORE accepting: F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS masked this because 10-bit path was never exercised. Fix: extend guard to __aarch64__. F2 (CRITICAL): destination_bytesperlines for NC12 source returns column-stride (1080) not linear-NV12 Y stride (1280). VAImage consumers see wrong pitch. Fix: override in RequestCreateImage when src=NC12, dst image=NV12. F3 (CRITICAL): request.c primary-driver detection has else-if branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec primary), neither matches → new fd pair stays -1 → routing no-ops. Fix: add explicit rpi-hevc-dec branch. F4 (accepted): add 1366x768 fixture to exercise column padding. F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify empirically in Phase 7. F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec which doesn't need it AND uses different submission order. Fix: gate on driver_data->video_fd != video_fd_rpi_hevc_dec. F7/F8 (no findings): image.c gate predicate sound; cross-device regression scope clean. Amended Phase 6 step list with 3 new gating actions. Phase 7 verification expanded with empirical START_CODE check + 1366 fixture. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
10 KiB
Phase 5 review — iter40 plan (sonnet review + amendments)
Reviewer verdict: yellow — plan substantively sound, 3 concrete blockers
- 1 fixture gap + 1 verification-only note. All findings verified empirically against current source (per feedback_review_empirical_over_theoretical) BEFORE accepting into the amended plan.
Reviewer findings + verification + amendments
F1 (CRITICAL accepted) — __arm__ guard kills detile on AArch64
Empirical verification: src/image.c lines 239 + 268 wrap the entire
per-format detile dispatch (incl. nv15_unpack_plane_to_p010) in
#ifdef __arm__. Pi 5 / fresnel / ampere are all AArch64 → guard never
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
exercised, so the dead-code didn't manifest as a failure.
Amendment: Phase 6 step 5 first sub-action — change guard at lines
239 + 268 from #ifdef __arm__ to #if defined(__arm__) || defined(__aarch64__).
This re-enables the existing NV15→P010 detile AND lets the new NC12
detile branch execute. No semantic change on x86 (no detile primitives
compiled there). Add explicit comment crediting Phase 5 review + this
finding.
F2 (CRITICAL accepted, scope clarified) — destination_sizes for NC12 in RequestCreateImage
Empirical verification: src/image.c lines 90-115 already recompute
destination_bytesperlines[0] + destination_sizes[0] for P010
(line 90: destination_bytesperlines[0] = width * 2). The fall-through
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
NC12 source is the column-stride 1080, not the linear Y stride 1280.
That breaks the VAImage's pitches[0] consumers expect.
context.c lines 379-383 — destination_sizes[0] = destination_bytesperlines[0] * format_height — IS used at cap_pool init time to size the
CAPTURE buffer's MMAP region accounting in driver_data->fmt_sizes[].
For NC12: 1080 × 720 = 777600 vs actual sizeimage 1382400. cap_pool
allocates the actual sizeimage via REQBUFS, so the underlying buffer
is correctly sized; fmt_sizes[] is just a back-cache for later access
patterns that don't go through the kernel-reported value.
Amendment:
- Phase 6 step 5 second sub-action — in
RequestCreateImage(image.c ~line 107, the "else" / NV12 branch), add detection: if the source CAPTURE format isV4L2_PIX_FMT_NV12_COL128AND the requested image format isVA_FOURCC_NV12, overridedestination_bytesperlines[0] = width(linear NV12 Y stride).destination_sizes[0]then computes towidth × format_height(correct linear Y plane size). Existing NV12-source linear path unchanged. - Phase 6 step 3 video.c — set
v4l2_buffers_count = 1for NC12 (single contiguous buffer holding Y+UV) and document this is the planes-1 multi-plane case (similar to NV12 MPLANE). - context.c lines 380-383 (
destination_sizes[0] = bytesperlines * height) stays AS-IS for now. It only affects cap_pool MMAP accounting which uses the kernel-reportedsizeimagevia REQBUFS anyway. If a future bug emerges from this mismatch on the rkvdec/hantro side, address then; not a blocker for iter40 NC12.
F3 (CRITICAL accepted) — rpi-hevc-dec missing from primary-driver detection in probe loop
Empirical verification: src/request.c lines 647-657 only have else if
branches for rkvdec and hantro-vpu. On higgs (no rkvdec, no hantro)
the primary device IS rpi-hevc-dec, but neither branch matches → no
primary_driver set → no fds stored into the new
video_fd_rpi_hevc_dec slot → routing silently no-ops with -1 fds.
Amendment: Phase 6 step 2 sub-action — add explicit else if (strcmp(info.driver, "rpi-hevc-dec") == 0) branch in the primary-driver
detection block. Sets video_fd_rpi_hevc_dec = video_fd + media_fd_rpi_hevc_dec = media_fd. Pi has no alt — alt_driver stays NULL,
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
remain dead on higgs because the find_decoder_device_by_driver walk
returns absent for them.)
Also: extend find_decoder_device_by_driver's driver-name table at
request.c:94-95 if needed to include rpi-hevc-dec — verify it's a
free-form string match (it is, per the code), not a hard table — so the
caller passes "rpi-hevc-dec" and the walk just looks for it.
F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
1366-wide fixture exercises the ALIGN(width, 128) → 1408 column
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
the detile primitive must not write past the requested width.
Amendment: Phase 7 sub-action — add bbb_1366_main.mp4 (1366×768)
to the Phase 7 verification set. Phase 3 baseline retroactively
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
N=1 (no need to redo the deterministic N=3 — one rep proves the
edge-case). If libva differs from kdirect on 1366 but matches on
1280/1920, the detile column-base math is buggy.
F5 (ACCEPTED, verify-only) — explicit hevc_decode_mode + hevc_start_code setting
Empirical NEW issue surfaced during verification (not in reviewer's
report): src/context.c lines 516-528 unconditionally sets
V4L2_CID_STATELESS_HEVC_START_CODE to _ANNEX_B (value 1) AND
prepends 0x00 0x00 0x01 start codes to each slice payload (per the
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
start_code=0 = _NONE and submits raw NAL slice payload WITHOUT start
codes.
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
question: does rpi-hevc-dec correctly parse start-code-prepended
payload when in ANNEX_B mode? Two possibilities:
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
bit-exact in our default code path.
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
menu entry; we'd need per-driver gating to send _NONE for
rpi-hevc-dec + suppress start-code prepend.
Amendment: Phase 7 — verify empirically via the first libva-vs-kdirect diff. If (a), no code change needed. If (b), add per-driver gate around the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively gate; let empiricism decide.
F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
Empirical verification: src/context.c lines 277-346 — the iter25
synthetic-SPS injection block runs for VAProfileHEVCMain regardless
of active driver_kind. On higgs, driver_data->video_fd will be
video_fd_rpi_hevc_dec at this point → v4l2_set_controls(...SPS...)
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
ctrls per-frame).
The pre-seed is wrapped in (void)v4l2_set_controls(...) — failure is
silently ignored, BUT the call may also succeed in an unintended way
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
internal state stuck on the dummy SPS until the first real per-frame
SPS arrives.
Amendment: Phase 6 step 2 sub-action — gate the synthetic-SPS
injection block at context.c:277 with
if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec). The
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
hantro paths unchanged.
F7 (No findings) — image.c gate predicate (focus area 3)
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
--list-formats-ext. No legitimate NV12-linear case exists on Pi. Gate
predicate image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128 is sound — fires only when
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
F8 (No findings) — cross-device regression scope (focus area 4)
Verified: new fd fields initialise to -1; probe loop extensions are
additive (no-op when string doesn't match); request_device_kind_for_profile's 'p' branch only fires when video_fd_rpi_hevc_dec >= 0;
new video.c entry is additive. fresnel + ampere paths unchanged.
Final amended Phase 6 step list
src/request.h— addvideo_fd_rpi_hevc_dec,media_fd_rpi_hevc_dec,has_hevc_ext_sps_rps_rpi_hevc_dec(mirror iter38 + iter2 layout).src/request.c— (a) extend init -1 block; (b) addelse if (strcmp(info.driver, "rpi-hevc-dec") == 0)branch in primary-driver detection [F3]; (c) extendrequest_device_kind_for_profileso HEVC→'p' when rpi present, else 'r'; (d) extendrequest_switch_device_for_profile'p' branch; (e) probe ext_sps on new fd.src/context.c— gate synthetic-SPS pre-seed (lines 277-346) ondriver_data->video_fd != driver_data->video_fd_rpi_hevc_dec[F6].src/video.c— NC12 entry withv4l2_buffers_count=1,v4l2_mplane=true, NOT marked linear.src/image.c:- Extend
#ifdef __arm__guards (lines 239, 268) to#if defined(__arm__) || defined(__aarch64__)[F1]. - Add NC12 detection in RequestCreateImage (line 107 area): if
source format is NC12 + VAImage format is NV12, override
destination_bytesperlines[0] = width[F2]. - Add NC12 detile branch in
copy_surface_to_image(line 238+): gateimage->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128; call new detile primitive.
- Extend
src/nv12_col128.c+.h(NEW) — detile primitive.tests/test_nv12_col128_detile.c(NEW) — unit test with hand-crafted NC12 bytes + known linear output.src/meson.build+src/Makefile.am— source list updates.- Build clean on higgs; if
tests/doesn't auto-run, run manually.
Final amended Phase 7 verification
- Build cleanly on higgs.
- Local install
.soto/usr/lib/aarch64-linux-gnu/dri/. LIBVA_DRIVER_NAME=v4l2_request vainfolistsVAProfileHEVCMain.- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva output SHA == kdirect SHA [F4].
lsofduring libva decode shows/dev/video19open.strace -e ioctlshows pre-seed pattern ABSENT on rpi-hevc-dec [F6 verification].- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
fails for HEVC, add per-driver
_NONEgate per F5 amendment. - Sibling regression: re-run fresnel iter38 5/5 test rig — no change expected since iter40 path is gated on new fd.
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100; F1 + F2 + F6 add ~30 LoC of gates / overrides).