forked from marfrit/libva-v4l2-request-fourier
f1be489c75
Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically verified each finding against current source per feedback_review_empirical_over_theoretical BEFORE accepting: F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS masked this because 10-bit path was never exercised. Fix: extend guard to __aarch64__. F2 (CRITICAL): destination_bytesperlines for NC12 source returns column-stride (1080) not linear-NV12 Y stride (1280). VAImage consumers see wrong pitch. Fix: override in RequestCreateImage when src=NC12, dst image=NV12. F3 (CRITICAL): request.c primary-driver detection has else-if branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec primary), neither matches → new fd pair stays -1 → routing no-ops. Fix: add explicit rpi-hevc-dec branch. F4 (accepted): add 1366x768 fixture to exercise column padding. F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify empirically in Phase 7. F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec which doesn't need it AND uses different submission order. Fix: gate on driver_data->video_fd != video_fd_rpi_hevc_dec. F7/F8 (no findings): image.c gate predicate sound; cross-device regression scope clean. Amended Phase 6 step list with 3 new gating actions. Phase 7 verification expanded with empirical START_CODE check + 1366 fixture. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
195 lines
10 KiB
Markdown
195 lines
10 KiB
Markdown
# Phase 5 review — iter40 plan (sonnet review + amendments)
|
||
|
||
Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers
|
||
+ 1 fixture gap + 1 verification-only note. All findings verified empirically
|
||
against current source (per [[feedback_review_empirical_over_theoretical]])
|
||
BEFORE accepting into the amended plan.
|
||
|
||
## Reviewer findings + verification + amendments
|
||
|
||
### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64
|
||
|
||
Empirical verification: `src/image.c` lines 239 + 268 wrap the entire
|
||
per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in
|
||
`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never
|
||
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
|
||
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
|
||
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
|
||
exercised, so the dead-code didn't manifest as a failure.
|
||
|
||
**Amendment:** Phase 6 step 5 first sub-action — change guard at lines
|
||
239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`.
|
||
This re-enables the existing NV15→P010 detile AND lets the new NC12
|
||
detile branch execute. No semantic change on x86 (no detile primitives
|
||
compiled there). Add explicit comment crediting Phase 5 review + this
|
||
finding.
|
||
|
||
### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage
|
||
|
||
Empirical verification: `src/image.c` lines 90-115 already recompute
|
||
`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010`
|
||
(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through
|
||
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
|
||
NC12 source is the column-stride 1080, not the linear Y stride 1280.
|
||
That breaks the VAImage's `pitches[0]` consumers expect.
|
||
|
||
`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the
|
||
CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`.
|
||
For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool
|
||
allocates the actual `sizeimage` via REQBUFS, so the underlying buffer
|
||
is correctly sized; `fmt_sizes[]` is just a back-cache for later access
|
||
patterns that don't go through the kernel-reported value.
|
||
|
||
**Amendment:**
|
||
|
||
- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c
|
||
~line 107, the "else" / NV12 branch), add detection: if the source
|
||
CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image
|
||
format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] =
|
||
width` (linear NV12 Y stride). `destination_sizes[0]` then computes
|
||
to `width × format_height` (correct linear Y plane size). Existing
|
||
NV12-source linear path unchanged.
|
||
- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single
|
||
contiguous buffer holding Y+UV) and document this is the planes-1
|
||
multi-plane case (similar to NV12 MPLANE).
|
||
- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`)
|
||
stays AS-IS for now. It only affects cap_pool MMAP accounting which
|
||
uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future
|
||
bug emerges from this mismatch on the rkvdec/hantro side, address
|
||
then; not a blocker for iter40 NC12.
|
||
|
||
### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop
|
||
|
||
Empirical verification: `src/request.c` lines 647-657 only have `else if`
|
||
branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro)
|
||
the primary device IS `rpi-hevc-dec`, but neither branch matches → no
|
||
`primary_driver` set → no fds stored into the new
|
||
`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds.
|
||
|
||
**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver
|
||
detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL,
|
||
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
|
||
remain dead on higgs because the find_decoder_device_by_driver walk
|
||
returns absent for them.)
|
||
|
||
Also: extend `find_decoder_device_by_driver`'s driver-name table at
|
||
request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a
|
||
free-form string match (it is, per the code), not a hard table — so the
|
||
caller passes `"rpi-hevc-dec"` and the walk just looks for it.
|
||
|
||
### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
|
||
|
||
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
|
||
1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column
|
||
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
|
||
the detile primitive must not write past the requested width.
|
||
|
||
**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768)
|
||
to the Phase 7 verification set. Phase 3 baseline retroactively
|
||
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
|
||
N=1 (no need to redo the deterministic N=3 — one rep proves the
|
||
edge-case). If libva differs from kdirect on 1366 but matches on
|
||
1280/1920, the detile column-base math is buggy.
|
||
|
||
### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting
|
||
|
||
**Empirical NEW issue surfaced during verification (not in reviewer's
|
||
report):** `src/context.c` lines 516-528 unconditionally sets
|
||
`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND
|
||
prepends `0x00 0x00 0x01` start codes to each slice payload (per the
|
||
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
|
||
`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start
|
||
codes.
|
||
|
||
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
|
||
question: does rpi-hevc-dec correctly parse start-code-prepended
|
||
payload when in ANNEX_B mode? Two possibilities:
|
||
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
|
||
bit-exact in our default code path.
|
||
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
|
||
menu entry; we'd need per-driver gating to send `_NONE` for
|
||
rpi-hevc-dec + suppress start-code prepend.
|
||
|
||
**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect
|
||
diff. If (a), no code change needed. If (b), add per-driver gate around
|
||
the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively
|
||
gate; let empiricism decide.
|
||
|
||
### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
|
||
|
||
Empirical verification: `src/context.c` lines 277-346 — the iter25
|
||
synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless
|
||
of active driver_kind. On higgs, `driver_data->video_fd` will be
|
||
`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)`
|
||
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
|
||
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
|
||
ctrls per-frame).
|
||
|
||
The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is
|
||
silently ignored, BUT the call may also succeed in an unintended way
|
||
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
|
||
internal state stuck on the dummy SPS until the first real per-frame
|
||
SPS arrives.
|
||
|
||
**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS
|
||
injection block at context.c:277 with
|
||
`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The
|
||
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
|
||
hantro paths unchanged.
|
||
|
||
### F7 (No findings) — `image.c` gate predicate (focus area 3)
|
||
|
||
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
|
||
`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate
|
||
predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when
|
||
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
|
||
|
||
### F8 (No findings) — cross-device regression scope (focus area 4)
|
||
|
||
Verified: new fd fields initialise to -1; probe loop extensions are
|
||
additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`;
|
||
new video.c entry is additive. fresnel + ampere paths unchanged.
|
||
|
||
## Final amended Phase 6 step list
|
||
|
||
1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`,
|
||
`has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout).
|
||
2. `src/request.c` — (a) extend init -1 block; (b) **add `else if
|
||
(strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver
|
||
detection** [F3]; (c) extend `request_device_kind_for_profile` so
|
||
HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd.
|
||
3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on
|
||
`driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6].
|
||
4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`,
|
||
`v4l2_mplane=true`, NOT marked linear.
|
||
5. `src/image.c`:
|
||
- **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1].
|
||
- **Add NC12 detection in RequestCreateImage** (line 107 area): if
|
||
source format is NC12 + VAImage format is NV12, override
|
||
`destination_bytesperlines[0] = width` [F2].
|
||
- **Add NC12 detile branch in `copy_surface_to_image`** (line 238+):
|
||
gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive.
|
||
6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive.
|
||
7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted
|
||
NC12 bytes + known linear output.
|
||
8. `src/meson.build` + `src/Makefile.am` — source list updates.
|
||
9. Build clean on higgs; if `tests/` doesn't auto-run, run manually.
|
||
|
||
## Final amended Phase 7 verification
|
||
|
||
- Build cleanly on higgs.
|
||
- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`.
|
||
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
|
||
- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva
|
||
output SHA == kdirect SHA [F4].
|
||
- `lsof` during libva decode shows `/dev/video19` open.
|
||
- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6
|
||
verification].
|
||
- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
|
||
fails for HEVC, add per-driver `_NONE` gate per F5 amendment.
|
||
- Sibling regression: re-run fresnel iter38 5/5 test rig — no change
|
||
expected since iter40 path is gated on new fd.
|
||
|
||
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100;
|
||
F1 + F2 + F6 add ~30 LoC of gates / overrides).
|