forked from marfrit/libva-v4l2-request-fourier
phase5_pi5_hevc_review: 3 critical findings empirically verified, 1 fixture gap
Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically verified each finding against current source per feedback_review_empirical_over_theoretical BEFORE accepting: F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS masked this because 10-bit path was never exercised. Fix: extend guard to __aarch64__. F2 (CRITICAL): destination_bytesperlines for NC12 source returns column-stride (1080) not linear-NV12 Y stride (1280). VAImage consumers see wrong pitch. Fix: override in RequestCreateImage when src=NC12, dst image=NV12. F3 (CRITICAL): request.c primary-driver detection has else-if branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec primary), neither matches → new fd pair stays -1 → routing no-ops. Fix: add explicit rpi-hevc-dec branch. F4 (accepted): add 1366x768 fixture to exercise column padding. F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify empirically in Phase 7. F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec which doesn't need it AND uses different submission order. Fix: gate on driver_data->video_fd != video_fd_rpi_hevc_dec. F7/F8 (no findings): image.c gate predicate sound; cross-device regression scope clean. Amended Phase 6 step list with 3 new gating actions. Phase 7 verification expanded with empirical START_CODE check + 1366 fixture. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,194 @@
|
|||||||
|
# Phase 5 review — iter40 plan (sonnet review + amendments)
|
||||||
|
|
||||||
|
Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers
|
||||||
|
+ 1 fixture gap + 1 verification-only note. All findings verified empirically
|
||||||
|
against current source (per [[feedback_review_empirical_over_theoretical]])
|
||||||
|
BEFORE accepting into the amended plan.
|
||||||
|
|
||||||
|
## Reviewer findings + verification + amendments
|
||||||
|
|
||||||
|
### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64
|
||||||
|
|
||||||
|
Empirical verification: `src/image.c` lines 239 + 268 wrap the entire
|
||||||
|
per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in
|
||||||
|
`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never
|
||||||
|
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
|
||||||
|
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
|
||||||
|
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
|
||||||
|
exercised, so the dead-code didn't manifest as a failure.
|
||||||
|
|
||||||
|
**Amendment:** Phase 6 step 5 first sub-action — change guard at lines
|
||||||
|
239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`.
|
||||||
|
This re-enables the existing NV15→P010 detile AND lets the new NC12
|
||||||
|
detile branch execute. No semantic change on x86 (no detile primitives
|
||||||
|
compiled there). Add explicit comment crediting Phase 5 review + this
|
||||||
|
finding.
|
||||||
|
|
||||||
|
### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage
|
||||||
|
|
||||||
|
Empirical verification: `src/image.c` lines 90-115 already recompute
|
||||||
|
`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010`
|
||||||
|
(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through
|
||||||
|
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
|
||||||
|
NC12 source is the column-stride 1080, not the linear Y stride 1280.
|
||||||
|
That breaks the VAImage's `pitches[0]` consumers expect.
|
||||||
|
|
||||||
|
`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the
|
||||||
|
CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`.
|
||||||
|
For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool
|
||||||
|
allocates the actual `sizeimage` via REQBUFS, so the underlying buffer
|
||||||
|
is correctly sized; `fmt_sizes[]` is just a back-cache for later access
|
||||||
|
patterns that don't go through the kernel-reported value.
|
||||||
|
|
||||||
|
**Amendment:**
|
||||||
|
|
||||||
|
- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c
|
||||||
|
~line 107, the "else" / NV12 branch), add detection: if the source
|
||||||
|
CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image
|
||||||
|
format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] =
|
||||||
|
width` (linear NV12 Y stride). `destination_sizes[0]` then computes
|
||||||
|
to `width × format_height` (correct linear Y plane size). Existing
|
||||||
|
NV12-source linear path unchanged.
|
||||||
|
- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single
|
||||||
|
contiguous buffer holding Y+UV) and document this is the planes-1
|
||||||
|
multi-plane case (similar to NV12 MPLANE).
|
||||||
|
- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`)
|
||||||
|
stays AS-IS for now. It only affects cap_pool MMAP accounting which
|
||||||
|
uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future
|
||||||
|
bug emerges from this mismatch on the rkvdec/hantro side, address
|
||||||
|
then; not a blocker for iter40 NC12.
|
||||||
|
|
||||||
|
### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop
|
||||||
|
|
||||||
|
Empirical verification: `src/request.c` lines 647-657 only have `else if`
|
||||||
|
branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro)
|
||||||
|
the primary device IS `rpi-hevc-dec`, but neither branch matches → no
|
||||||
|
`primary_driver` set → no fds stored into the new
|
||||||
|
`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds.
|
||||||
|
|
||||||
|
**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver
|
||||||
|
detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL,
|
||||||
|
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
|
||||||
|
remain dead on higgs because the find_decoder_device_by_driver walk
|
||||||
|
returns absent for them.)
|
||||||
|
|
||||||
|
Also: extend `find_decoder_device_by_driver`'s driver-name table at
|
||||||
|
request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a
|
||||||
|
free-form string match (it is, per the code), not a hard table — so the
|
||||||
|
caller passes `"rpi-hevc-dec"` and the walk just looks for it.
|
||||||
|
|
||||||
|
### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
|
||||||
|
|
||||||
|
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
|
||||||
|
1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column
|
||||||
|
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
|
||||||
|
the detile primitive must not write past the requested width.
|
||||||
|
|
||||||
|
**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768)
|
||||||
|
to the Phase 7 verification set. Phase 3 baseline retroactively
|
||||||
|
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
|
||||||
|
N=1 (no need to redo the deterministic N=3 — one rep proves the
|
||||||
|
edge-case). If libva differs from kdirect on 1366 but matches on
|
||||||
|
1280/1920, the detile column-base math is buggy.
|
||||||
|
|
||||||
|
### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting
|
||||||
|
|
||||||
|
**Empirical NEW issue surfaced during verification (not in reviewer's
|
||||||
|
report):** `src/context.c` lines 516-528 unconditionally sets
|
||||||
|
`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND
|
||||||
|
prepends `0x00 0x00 0x01` start codes to each slice payload (per the
|
||||||
|
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
|
||||||
|
`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start
|
||||||
|
codes.
|
||||||
|
|
||||||
|
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
|
||||||
|
question: does rpi-hevc-dec correctly parse start-code-prepended
|
||||||
|
payload when in ANNEX_B mode? Two possibilities:
|
||||||
|
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
|
||||||
|
bit-exact in our default code path.
|
||||||
|
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
|
||||||
|
menu entry; we'd need per-driver gating to send `_NONE` for
|
||||||
|
rpi-hevc-dec + suppress start-code prepend.
|
||||||
|
|
||||||
|
**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect
|
||||||
|
diff. If (a), no code change needed. If (b), add per-driver gate around
|
||||||
|
the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively
|
||||||
|
gate; let empiricism decide.
|
||||||
|
|
||||||
|
### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
|
||||||
|
|
||||||
|
Empirical verification: `src/context.c` lines 277-346 — the iter25
|
||||||
|
synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless
|
||||||
|
of active driver_kind. On higgs, `driver_data->video_fd` will be
|
||||||
|
`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)`
|
||||||
|
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
|
||||||
|
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
|
||||||
|
ctrls per-frame).
|
||||||
|
|
||||||
|
The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is
|
||||||
|
silently ignored, BUT the call may also succeed in an unintended way
|
||||||
|
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
|
||||||
|
internal state stuck on the dummy SPS until the first real per-frame
|
||||||
|
SPS arrives.
|
||||||
|
|
||||||
|
**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS
|
||||||
|
injection block at context.c:277 with
|
||||||
|
`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The
|
||||||
|
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
|
||||||
|
hantro paths unchanged.
|
||||||
|
|
||||||
|
### F7 (No findings) — `image.c` gate predicate (focus area 3)
|
||||||
|
|
||||||
|
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
|
||||||
|
`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate
|
||||||
|
predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when
|
||||||
|
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
|
||||||
|
|
||||||
|
### F8 (No findings) — cross-device regression scope (focus area 4)
|
||||||
|
|
||||||
|
Verified: new fd fields initialise to -1; probe loop extensions are
|
||||||
|
additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`;
|
||||||
|
new video.c entry is additive. fresnel + ampere paths unchanged.
|
||||||
|
|
||||||
|
## Final amended Phase 6 step list
|
||||||
|
|
||||||
|
1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`,
|
||||||
|
`has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout).
|
||||||
|
2. `src/request.c` — (a) extend init -1 block; (b) **add `else if
|
||||||
|
(strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver
|
||||||
|
detection** [F3]; (c) extend `request_device_kind_for_profile` so
|
||||||
|
HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd.
|
||||||
|
3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on
|
||||||
|
`driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6].
|
||||||
|
4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`,
|
||||||
|
`v4l2_mplane=true`, NOT marked linear.
|
||||||
|
5. `src/image.c`:
|
||||||
|
- **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1].
|
||||||
|
- **Add NC12 detection in RequestCreateImage** (line 107 area): if
|
||||||
|
source format is NC12 + VAImage format is NV12, override
|
||||||
|
`destination_bytesperlines[0] = width` [F2].
|
||||||
|
- **Add NC12 detile branch in `copy_surface_to_image`** (line 238+):
|
||||||
|
gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive.
|
||||||
|
6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive.
|
||||||
|
7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted
|
||||||
|
NC12 bytes + known linear output.
|
||||||
|
8. `src/meson.build` + `src/Makefile.am` — source list updates.
|
||||||
|
9. Build clean on higgs; if `tests/` doesn't auto-run, run manually.
|
||||||
|
|
||||||
|
## Final amended Phase 7 verification
|
||||||
|
|
||||||
|
- Build cleanly on higgs.
|
||||||
|
- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`.
|
||||||
|
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
|
||||||
|
- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva
|
||||||
|
output SHA == kdirect SHA [F4].
|
||||||
|
- `lsof` during libva decode shows `/dev/video19` open.
|
||||||
|
- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6
|
||||||
|
verification].
|
||||||
|
- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
|
||||||
|
fails for HEVC, add per-driver `_NONE` gate per F5 amendment.
|
||||||
|
- Sibling regression: re-run fresnel iter38 5/5 test rig — no change
|
||||||
|
expected since iter40 path is gated on new fd.
|
||||||
|
|
||||||
|
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100;
|
||||||
|
F1 + F2 + F6 add ~30 LoC of gates / overrides).
|
||||||
Reference in New Issue
Block a user