ampere-av1 Phase 3 finding: iter2 Fix 3 release is NOT the divergence cause

Investigated whether the iter2 Fix 3 release-on-rebind in
picture.c::BeginPicture causes the AV1 inter-frame divergence on
av1_larger.ivf (film_grain stress vector). Added a temporary env-gated
LIBVA_SKIP_REBIND=1 experiment (leak the old slot instead of releasing
it); an A/B run showed the same 3/10 PASS count with and without the
release. Hypothesis disproved.
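
For context, an env gate of this kind can be sketched as below. This is
a minimal illustration, not the code that was in the tree: the helper
names flag_value_enabled and should_release_on_rebind are hypothetical,
and treating only the exact value "1" as enabled is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a flag counts as enabled only when its value is
 * exactly "1". An unset variable (NULL) or any other string disables it. */
bool flag_value_enabled(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical helper: decides whether the rebind path should release
 * the old slot. With LIBVA_SKIP_REBIND=1 the release is skipped and the
 * old slot is deliberately leaked for the duration of the experiment. */
bool should_release_on_rebind(void)
{
    return !flag_value_enabled(getenv("LIBVA_SKIP_REBIND"));
}
```

Reading the variable on every call keeps the sketch simple; a real
driver would more likely cache the result once at context creation.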

Where the divergence actually lives:
  - patched ffmpeg-v4l2-request-fourier libavcodec.so with a fwrite
    diag in ff_v4l2_request_append_output → 7 dump files for the
    -frames:v 5 kdirect run, sizes [15133, 3670, 1970, 1323, 812,
    886, 1310] BYTE-IDENTICAL to our LIBVA_V4L2_DUMP_OUTPUT first 7
    submissions for the same input
  - our backend has 2 EXTRA EndPicture calls (t8 size 824, t9 size
    487) on RE-USED surfaces (0x4000008 and 0x4000006)
  - the extras happen because ffmpeg-vaapi's AV1 hwaccel issues
    redecode requests onto surfaces that already hold frames the
    consumer hasn't downloaded yet
  - LIBVA_SKIP_REBIND=1 keeps the earlier slots of those redecoded
    surfaces allocated, but that doesn't help: surface_object->
    current_slot can only point at ONE slot at a time, and bind_slot
    overwrites it
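
The last point can be modelled in a few lines. This is a simplified
sketch under stated assumptions: struct slot, struct surface, and the
skip_release parameter are hypothetical stand-ins for the driver's real
types and for the LIBVA_SKIP_REBIND gate, not the backend's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's slot and surface objects. */
struct slot {
    int v4l2_index;
    bool busy;
};

struct surface {
    struct slot *current_slot; /* a surface references at most one slot */
};

/* Release the surface's slot back to the pool, if it has one. */
void unbind_slot(struct surface *surface)
{
    assert(surface != NULL);
    if (surface->current_slot != NULL) {
        surface->current_slot->busy = false;
        surface->current_slot = NULL;
    }
}

/* Bind a new slot. With skip_release the old slot stays busy (leaked),
 * but the surface's only pointer is overwritten either way. */
void bind_slot(struct surface *surface, struct slot *new_slot,
               bool skip_release)
{
    assert(surface != NULL && new_slot != NULL);
    if (surface->current_slot != NULL && !skip_release)
        unbind_slot(surface);
    new_slot->busy = true;
    surface->current_slot = new_slot;
}
```

In both modes the surface ends up referencing only the newest slot, so
the frame decoded earlier onto that surface is no longer reachable
through it. Skipping the release only changes whether the old slot
returns to the pool.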

True root cause: ffmpeg-vaapi AV1 hwaccel's surface accounting is
incompatible with the iter2 Fix 3 1:1 surface↔slot invariant when
the stream has show_existing_frame frames. A fix would need either
(a) cap_pool tracking N surfaces per slot, or (b) the backend reading
ffmpeg-vaapi's display-order mapping and remapping slots accordingly.
Both are non-trivial Phase 4 work and outside this iteration's scope.
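
One possible reading of option (a) is a per-slot reference count. The
sketch below is only an illustration of that idea: struct shared_slot,
slot_acquire, and slot_release are hypothetical names, and nothing like
this exists in cap_pool today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot that several surfaces may reference at once. */
struct shared_slot {
    int v4l2_index;
    unsigned surface_refs; /* number of surfaces currently bound */
};

/* A surface starts referencing the slot. */
void slot_acquire(struct shared_slot *slot)
{
    assert(slot != NULL);
    slot->surface_refs++;
}

/* A surface stops referencing the slot. Returns true when the last
 * reference is gone and the slot may go back to the pool. */
bool slot_release(struct shared_slot *slot)
{
    assert(slot != NULL && slot->surface_refs > 0);
    slot->surface_refs--;
    return slot->surface_refs == 0;
}
```

Whether reference counting alone is enough depends on how ffmpeg-vaapi
reuses surfaces for show_existing_frame, which is exactly the open
Phase 4 question.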

Reverted the LIBVA_SKIP_REBIND env gate, leaving the code in its clean
shape. Updated the comment with the investigation outcome so the next
session has the context without rediscovering it.

State: 3/10 av1_larger frames bit-exact (frames 0/2/4, the
apply_grain=1 IDR-derived ones). test_av1.ivf 208x208 still bit-exact
PASS (no regression). Diagnostic logs in BeginPicture,
surface_unbind_slot, and v4l2_ioctl_controls are retained for future
investigation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 12:12:23 +00:00
parent d7ef0f6cd9
commit c839b9456e
@@ -378,6 +378,15 @@ VAStatus RequestBeginPicture(VADriverContextP context, VAContextID context_id,
* first. The new slot is bound and its V4L2 index + mmap pointers
* are mirrored into surface_object->destination_* so the existing
* QBUF/DQBUF/EXPBUF code paths see no behavioral change.
*
* AV1 Phase 3 finding: LIBVA_SKIP_REBIND=1 experiment (do NOT
* unbind on rebind) did not improve PASS count for the av1_larger
* film_grain stress vector — proving the iter2 Fix 3 release is
* NOT the source of the inter-frame divergence. The issue is
* deeper in ffmpeg-vaapi's AV1 hwaccel: per byte-equal OUTPUT
* comparison with the patched-ffmpeg-v4l2request reference run
* (LD_LIBRARY_PATH override on a debug libavcodec.so), 7/7 first
* EndPicture submissions are byte-identical, libva has 2 EXTRA.
*/
if (surface_object->current_slot != NULL)
surface_unbind_slot(driver_data, surface_object);