c839b9456e
Investigated whether picture.c::BeginPicture's iter2 Fix 3 release-on-
rebind was causing AV1 inter-frame divergence on av1_larger.ivf
(film_grain stress vector). Added an env-gated LIBVA_SKIP_REBIND=1
experiment (leak the old slot instead of releasing it); the A/B run
showed an identical 3/10 PASS count with and without the release.
Hypothesis disproved.
Where the divergence actually lives:
- Patched the ffmpeg-v4l2-request-fourier libavcodec.so with an fwrite
  diagnostic in ff_v4l2_request_append_output → 7 dump files for the
  -frames:v 5 kdirect run, sizes [15133, 3670, 1970, 1323, 812,
  886, 1310], BYTE-IDENTICAL to the first 7 LIBVA_V4L2_DUMP_OUTPUT
  submissions from our backend for the same input.
- Our backend makes 2 EXTRA EndPicture calls (t8 size 824, t9 size
  487) on RE-USED surfaces (0x4000008 and 0x4000006).
- The extras happen because ffmpeg-vaapi's AV1 hwaccel issues
  redecode requests onto surfaces that already hold frames the
  consumer hasn't downloaded yet.
- LIBVA_SKIP_REBIND=1 was expected to keep the old slots of those
  redecoded surfaces around, but it doesn't help, because
  surface_object->current_slot can only point at ONE slot at a time
  and bind_slot overwrites it.
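Minimal sketch of why leaking the old slot cannot help. The struct
layouts, the release_old flag and first_decode_reachable() are
hypothetical simplifications for illustration; only current_slot and
bind_slot are names from the real backend, and the real bind_slot
signature may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the backend's real layout. */
struct slot {
    int index;
    int in_use;
};

struct surface_object {
    unsigned int id;
    struct slot *current_slot;  /* at most ONE slot per surface */
};

/* Bind a slot to a surface. With release_old != 0 the previous slot is
 * returned to the pool (release-on-rebind); with release_old == 0 it is
 * leaked, mimicking the LIBVA_SKIP_REBIND=1 experiment. Either way the
 * surface's only reference to the old slot is overwritten. */
static void bind_slot(struct surface_object *surf, struct slot *new_slot,
                      int release_old)
{
    assert(surf != NULL && new_slot != NULL);
    if (surf->current_slot != NULL && release_old)
        surf->current_slot->in_use = 0;
    new_slot->in_use = 1;
    surf->current_slot = new_slot;
}

/* Returns 1 if the surface can still reach the first decode's slot
 * after a redecode has been bound onto the same surface. */
static int first_decode_reachable(int release_old)
{
    struct slot first = { 0, 0 };
    struct slot redecode = { 1, 0 };
    struct surface_object surf = { 0x4000008u, NULL };

    bind_slot(&surf, &first, release_old);
    bind_slot(&surf, &redecode, release_old);
    return surf.current_slot == &first;
}
```

In this model first_decode_reachable() returns 0 for both
release_old = 1 and release_old = 0, which matches the identical A/B
result: the leaked slot still exists, but nothing points at it.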
True root cause: ffmpeg-vaapi AV1 hwaccel's surface accounting is
incompatible with the iter2 Fix 3 1:1 surface↔slot invariant when
the stream has show_existing_frame frames. Fix would need either
(a) cap_pool tracking N surfaces per slot, or (b) backend reading
ffmpeg-vaapi's display-order mapping and remapping slots accordingly.
Both are non-trivial Phase 4 work, outside this iteration's scope.
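Rough sketch of what option (a) could look like, assuming a small
fixed-size surface list per slot. struct cap_slot,
MAX_SURFACES_PER_SLOT, cap_slot_attach() and cap_slot_detach() are
hypothetical names for illustration, not existing cap_pool code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SURFACES_PER_SLOT 4  /* hypothetical limit */

/* Hypothetical slot record that tracks every surface referencing it. */
struct cap_slot {
    unsigned int surfaces[MAX_SURFACES_PER_SLOT];
    size_t nsurfaces;
};

/* Attach a surface to the slot. Returns 0 on success (or if the surface
 * is already attached), -1 if the slot's surface list is full. */
static int cap_slot_attach(struct cap_slot *s, unsigned int surface_id)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->nsurfaces; i++)
        if (s->surfaces[i] == surface_id)
            return 0;
    if (s->nsurfaces == MAX_SURFACES_PER_SLOT)
        return -1;
    s->surfaces[s->nsurfaces++] = surface_id;
    return 0;
}

/* Detach a surface from the slot. Returns 1 when no surface references
 * the slot any more, i.e. it may be released back to the pool. */
static int cap_slot_detach(struct cap_slot *s, unsigned int surface_id)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->nsurfaces; i++) {
        if (s->surfaces[i] == surface_id) {
            /* Swap-remove: order of the list does not matter. */
            s->surfaces[i] = s->surfaces[--s->nsurfaces];
            break;
        }
    }
    return s->nsurfaces == 0;
}
```

With this shape a slot would only be released once the last surface
detaches, instead of on every rebind.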
Reverted the LIBVA_SKIP_REBIND env-gate, restoring the clean code
shape. Updated the comment with the investigation outcome so the next
session has the context without rediscovering it.
State: 3/10 av1_larger frames bit-exact (frames 0/2/4, the
apply_grain=1 IDR-derived ones). test_av1.ivf (208x208) still passes
bit-exact (no regression). Diagnostic logs in BeginPicture,
surface_unbind_slot and v4l2_ioctl_controls are retained for future
investigation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>