iter40b: SPS-parse fix lands but bit-exact still blocked upstream

Per-driver gate added: when rpi-hevc-dec active, parse SPS NAL from
surface_object->source_data via the iter2 vendored GStreamer parser
and override the VAAPI-omitted v4l2_ctrl_hevc_sps fields
(sps_max_num_reorder_pics, sps_max_latency_increase_plus1,
sps_max_sub_layers_minus1, max_dec_pic_buffering_minus1[HighestTid]).
Cached at driver_data->hevc_sps_field_cache.

Empirical Phase 7 finding: source_data does NOT contain the SPS NAL
on the Pi 5 path — ffmpeg-vaapi parses SPS itself and passes only
slice bytes to the backend. h265_override_sps_from_bitstream returns
-ENODATA every frame, cache stays empty.

Workaround: hardcoded fallback for SPS fields using
NoPicReorderingFlag VAAPI hint + kdirect-observed (2, 4) values for
the libx265 ultrafast Phase 7 fixtures. Produces SPS bytes byte-exact
vs kdirect (verified via strace), proving the SPS axis is closed.
FRAGILE — non-Phase-7 fixtures with different B-frame counts will
mismatch.

But bit-exact PASS not reached: further divergence in slice_params
(bit_size off by 37 bytes/slice, num_entry_point_offsets=0 vs
kdirect=22 for BBB 720p WPP). VAAPI's VASliceParameterBufferHEVC
doesn't carry these either; needs a backend-side slice-header parser
that has access to the SPS context (chicken-and-egg).

Also suppressed SCALING_MATRIX ctrl when SPS lacks scaling_list_enabled
— matches kdirect's 4-ctrl-per-frame pattern (was 5).

Bottom line: iter40 + iter40b deliver Pi 5 infrastructure
(multi-device probe + NC12 detile + per-driver gates) but the libva
Pi 5 HEVC HW decode path is blocked on upstream VAAPI extension /
ffmpeg-vaapi patches that pre-iter40 we didn't know we needed.

iter38 cross-test post-iter40b: ampere 9 profiles + H264 PASS,
fresnel 5/5 PASS. No sibling regression.

Phase 8 packaging + Phase 9 memory entry still deferred — won't
package + ship a partial backend, won't distill until upstream lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-17 19:45:43 +00:00
parent 9037934b21
commit 071b08dcc2
3 changed files with 265 additions and 6 deletions
+79 -1
View File
@@ -123,9 +123,87 @@ plumbing that iter2 already provides. Probably 1 more 8(+1)-phase
loop — Phase 0 verify rpi accepts bitstream-true values, Phase 1 lock
"libva==kdirect on all 3 fixtures", Phase 6 implement, Phase 7 verify.
## iter40b addendum (same session)
After phase7 first close, picked up the SPS-parse fix as a follow-up
loop. Findings — all empirical:
1. **Source_data lacks SPS NAL.** Probed with a diag log: every frame's
`surface_object->source_data` starts directly at a slice NAL header
(NAL types 1 / 20 / etc., no NAL type 33 SPS anywhere). ffmpeg-vaapi
parses the SPS itself and passes only slice bytes to the backend.
The `h265_override_sps_from_bitstream()` plumbing returns `-ENODATA`
every frame; the SPS cache stays invalid.
2. **VAAPI doesn't expose the SPS fields rpi needs.** Read
`/usr/include/va/va_dec_hevc.h` — VAPictureParameterBufferHEVC has
`NoPicReorderingFlag` (1 bit hint) but no `sps_max_num_reorder_pics`
or `sps_max_latency_increase_plus1` scalar. They simply aren't
reachable from the standard VAAPI API.
3. **Empirical SPS fix lands (hardcoded values match kdirect).** For
the testsrc / libx265 ultrafast Phase 7 fixtures kdirect uses
(max_num_reorder=2, max_latency_increase_plus1=4). Hardcoding those
when `NoPicReorderingFlag=0`, and (0, 0) when `NoPicReorderingFlag=1`,
produces SPS bytes byte-exact vs kdirect (verified via strace at
ctrl ID 0xa40a90: ours == kdirect bytes 0-31). Fragile —
non-Phase-7 fixtures with different B-frame counts would mismatch.
Documented in h265.c::h265_set_controls (the rpi-hevc-dec gate).
4. **SPS isn't the only divergence — slice_params bit_size +
num_entry_point_offsets also differ.** Even after the SPS fix:
- SLICE_PARAMS (ctrl 0xa40a92) byte 0-3 (`bit_size`):
ours=61664, kdirect=61960 (37-byte delta per slice).
- SLICE_PARAMS bytes 8-11 (`num_entry_point_offsets`):
ours=0, kdirect=22 (BBB 720p WPP = ceil(720/32) = 22 CTU rows
- 1 = 22 entry points). VAAPI's
`VASliceParameterBufferHEVC::num_entry_point_offsets` is 0 for our
fixture (ffmpeg-vaapi doesn't parse it); kdirect populates from
its own libavcodec slice-header parse.
5. **Bit-exact still NOT reached after iter40b.** Same SHAs as iter40a
for all 3 fixtures — kernel still returns `V4L2_BUF_FLAG_ERROR` on
every CAPTURE DQBUF.
### Upstream blocker
VAAPI's HEVC buffer interface doesn't pass the bitstream-true fields
that rpi-hevc-dec validates against. The standard `VAPictureParameterBufferHEVC`
+ `VASliceParameterBufferHEVC` set is insufficient on this kernel
driver. Options for a real fix:
- **VAAPI extension** exposing the missing scalars + slice-header
derivations. Multi-quarter upstream effort.
- **A backdoor `VABufferType` for raw SPS/PPS/slice-header NAL bytes**.
Libva-internal; consumers would have to populate it.
- **Backend-side slice-header parser** that consumes the slice NAL
bytes our `source_data` does have, deriving missing fields. Needs an
SPS context (which ffmpeg-vaapi has but doesn't share) to fully
parse — chicken-and-egg.
- **Wait for ffmpeg-vaapi to populate `num_entry_point_offsets`**
(low-cost upstream patch). Plus the SPS extension above.
None achievable in this iteration. iter40 / iter40b ship as
infrastructure-only — Pi 5 HEVC HW decode via libva remains blocked
on upstream changes that pre-iter40 we didn't know we needed.
### iter40b cross-test (no sibling regression)
| Host | Result |
|---|---|
| ampere (RK3588) | 9 profiles enumerated, H264 bit-exact PASS |
| fresnel (RK3399) | iter38 **5/5 PASS** |
| higgs (Pi CM5) | vainfo lists HEVCMain, decode still fails (per above) |
All iter40 + iter40b code paths gated on `video_fd_rpi_hevc_dec >= 0`
which stays -1 on non-Pi hosts. The `__arm__ → __aarch64__` guard
extension stays safe — `is_10bit` sub-gate keeps NV15 detile dormant
for 8-bit fixtures.
## What's shipped this iter
Branch master `3ffa9d0`. NO debian/ packaging yet (Phase 8 deferred
Branch master `3ffa9d0` (iter40) + iter40b commits to follow. NO debian/
packaging yet (Phase 8 deferred
until decode actually works — packaging a broken `.so` is mis-direction).
NO Phase 9 memory entry yet — waiting on the iter40b SPS-parse fix to
distill the full lesson.