ampere-av1 Phase 3 progress: film_grain link + UPDATE_GRAIN; frame 0 bit-exact

Three structural fixes for AV1 with film_grain on vpu981 (RK3588). Output
is no longer empty / crashed; frame 0 (IDR with apply_grain=1) is
bit-exact vs kdirect. Inter frames still diverge.

Fix 1 — surface.h + surface.c: linked_decode_surface_id field on
object_surface, initialized to VA_INVALID_SURFACE. When AV1 picture has
apply_grain=1, VAAPI's VADecPictureParameterBufferAV1 carries a
current_display_picture distinct from current_frame. ffmpeg-vaapi calls
vaBeginPicture on current_frame (decode surface, slot gets bound) but
vaGetImage on current_display_picture (display surface, no slot) → NULL
deref in copy_surface_to_image.

Fix 2 — av1.c: in av1_set_controls, when cur_frame != cur_display, set
display_surface->linked_decode_surface_id = current_frame. Establishes
the back-link so display surface can borrow decode surface's data.

Fix 3 — image.c copy_surface_to_image: when slot is NULL and the
surface has linked_decode_surface_id, lookup the decode surface and
mirror its destination_data[] + destination_sizes[] +
destination_planes_count. NULL guard with diagnostic log retained.

Fix 4 — av1.c fill_film_grain: when apply_grain=1, also set
V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN. Confirmed by strace-diff: kdirect
sends flags=0x0B (APPLY|UPDATE|...), libva was sending 0x09 (APPLY but
no UPDATE). Without UPDATE the kernel tries to reuse from
film_grain_params_ref_idx=0, which is never populated. Earlier reverted
because UPDATE seemed to trigger a SEGV — but that SEGV was the
unmasked NULL-slot deref; with fix 1+2+3 in place UPDATE is safe.

Fix 5 — av1.c reference_frame_ts plumbing: when a referenced surface
has timestamp=0 AND linked_decode_surface_id set, follow the link to
find the decode surface that carries the real timestamp. Display
surfaces don't get OUTPUT QBUF'd by us, so their own timestamp stays
zero.

Also: BeginPicture diagnostic log + surface_unbind_slot diagnostic log
+ v4l2.c error_idx diagnostic (kept from earlier — useful for ongoing
investigation).

Verification on ampere:
  test_av1.ivf (208x208, 2 frames, no grain): bit-exact PASS sha
    029ee72c214b37c1 (unchanged, no regression)
  av1_larger.ivf (352x288, 10 frames, film_grain alternates):
    frame 0 (key, apply_grain=1): PASS bit-exact vs kdirect
    frame 4: PASS bit-exact
    frames 1,2,3,5,6,7,8,9: DIFFER

Frame 0 PASS proves: SEQUENCE + FRAME + TILE_GROUP_ENTRY + FILM_GRAIN
mapping is correct for IDR. Frame 4 PASS is unexplained but encouraging.
Inter-frame divergence (frame 1+) points at: reference handling for
inter prediction is still off — either order_hints[] (still zero,
VAAPI doesn't expose per-ref), or grain-applied vs pre-grain DPB
semantics, or ref_frame_idx pointing into the wrong surface space.

Next investigation: per-frame strace diff between libva and kdirect
controls payload to spot remaining field mis-mappings on inter frames.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-17 10:45:31 +00:00
parent ab79ed5e4d
commit 5803cbcf6c
5 changed files with 122 additions and 9 deletions
+48 -9
View File
@@ -460,13 +460,19 @@ static void av1_fill_film_grain(VADecPictureParameterBufferAV1 *picture,
ctrl->cb_offset = fg->cb_offset;
ctrl->cr_offset = fg->cr_offset;
if (fg->film_grain_info_fields.bits.apply_grain)
if (fg->film_grain_info_fields.bits.apply_grain) {
ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_APPLY_GRAIN;
/* VAAPI doesn't expose update_grain; not setting it means
* "reuse from film_grain_params_ref_idx (=0)". A previous test
* with UPDATE_GRAIN=1 unconditional triggered a segfault on the
* 352x288 film_grain-50 fixture — keep update_grain implicit-zero
* here and revisit during Phase 3 deeper investigation. */
/* kdirect strace diff confirmed: V4L2_AV1_FILM_GRAIN_FLAG_
* UPDATE_GRAIN must be set when apply_grain=1 (kdirect's
* flags byte is 0x0B = APPLY|UPDATE|...). VAAPI's
* VAFilmGrainStructAV1 doesn't expose update_grain
* separately. Default to UPDATE=1 (use submitted params,
* not reuse from non-existent prior film_grain ref). The
* earlier segfault we saw with this flag was unmasked by
* the link-NULL deref (now fixed via linked_decode_surface);
* not caused by UPDATE_GRAIN itself. */
ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN;
}
if (fg->film_grain_info_fields.bits.chroma_scaling_from_luma)
ctrl->flags |= V4L2_AV1_FILM_GRAIN_FLAG_CHROMA_SCALING_FROM_LUMA;
if (fg->film_grain_info_fields.bits.overlap_flag)
@@ -517,6 +523,24 @@ int av1_set_controls(struct request_data *driver_data,
(void)context;
/*
* AV1 film_grain link: when apply_grain=1, ffmpeg-vaapi allocates a
* separate display surface (current_display_picture) from the decode
* surface (current_frame). vpu981 HW applies grain inline to the
* decode CAPTURE buffer, so the consumable data is in current_frame's
* slot. ffmpeg then calls vaGetImage on current_display_picture which
* has no slot bound. Link the display surface back to the decode
* surface so copy_surface_to_image can borrow destination_data[].
*/
if (picture->current_display_picture != VA_INVALID_SURFACE &&
picture->current_display_picture != picture->current_frame) {
struct object_surface *display_surface =
SURFACE(driver_data, picture->current_display_picture);
if (display_surface != NULL)
display_surface->linked_decode_surface_id =
picture->current_frame;
}
if (num_tiles > AV1_MAX_TILES)
num_tiles = AV1_MAX_TILES;
@@ -553,12 +577,27 @@ int av1_set_controls(struct request_data *driver_data,
for (i = 0; i < BACKEND_AV1_TOTAL_REFS_PER_FRAME; i++) {
VASurfaceID ref_id = picture->ref_frame_map[i];
struct object_surface *ref_surface;
uint64_t ts;
if (ref_id == VA_INVALID_SURFACE)
continue;
ref_surface = SURFACE(driver_data, ref_id);
if (ref_surface)
frame.reference_frame_ts[i] =
v4l2_timeval_to_ns(&ref_surface->timestamp);
if (ref_surface == NULL)
continue;
ts = v4l2_timeval_to_ns(&ref_surface->timestamp);
/* AV1 film_grain: if the referenced surface is a display
* surface (linked to a decode surface), its own timestamp
* may be zero (never bound for OUTPUT QBUF). Follow the
* link to find the decode surface that carries the real
* timestamp. */
if (ts == 0 &&
ref_surface->linked_decode_surface_id != VA_INVALID_SURFACE) {
struct object_surface *dec =
SURFACE(driver_data,
ref_surface->linked_decode_surface_id);
if (dec != NULL)
ts = v4l2_timeval_to_ns(&dec->timestamp);
}
frame.reference_frame_ts[i] = ts;
}
if (driver_data->has_av1_film_grain)
+43
View File
@@ -216,7 +216,50 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
}
}
/*
* AV1 film_grain: when this surface is the display surface of a
* decode (current_display_picture != current_frame with apply_grain=1),
* its slot is NULL because BeginPicture only fired on the decode
* surface. Follow the back-link set in av1_set_controls and borrow
* the decode surface's destination_data + sizes for the copy.
*/
if (surface_object->current_slot == NULL &&
surface_object->linked_decode_surface_id != VA_INVALID_SURFACE) {
struct object_surface *decode_surface =
SURFACE(driver_data,
surface_object->linked_decode_surface_id);
if (decode_surface != NULL &&
decode_surface->current_slot != NULL) {
/* Mirror the fields we read below. The surface heap
* pointer is stable for the surface's lifetime; we
* only need destination_data + destination_sizes +
* destination_planes_count from it. */
surface_object->destination_planes_count =
decode_surface->destination_planes_count;
for (i = 0; i < decode_surface->destination_planes_count; i++) {
surface_object->destination_data[i] =
decode_surface->destination_data[i];
surface_object->destination_sizes[i] =
decode_surface->destination_sizes[i];
}
}
}
for (i = 0; i < surface_object->destination_planes_count; i++) {
/* AV1 Phase 3 diag: surface NULL-deref hunt. */
if (buffer_object->data == NULL ||
surface_object->destination_data[i] == NULL) {
request_log("copy_surface_to_image NULL i=%u "
"buf_data=%p dest_data=%p dest_size=%u "
"planes=%u slot=%p linked=0x%x\n",
i, (void *)buffer_object->data,
(void *)surface_object->destination_data[i],
surface_object->destination_sizes[i],
surface_object->destination_planes_count,
(void *)surface_object->current_slot,
surface_object->linked_decode_surface_id);
return VA_STATUS_ERROR_OPERATION_FAILED;
}
#ifdef __arm__
if (!video_format_is_linear(driver_data->video_format))
tiled_to_planar(surface_object->destination_data[i],
+6
View File
@@ -361,6 +361,12 @@ VAStatus RequestBeginPicture(VADriverContextP context, VAContextID context_id,
if (surface_object == NULL)
return VA_STATUS_ERROR_INVALID_SURFACE;
/* AV1 Phase 3 diag */
request_log("BeginPicture id=0x%x prev_slot=%p status=%d\n",
surface_object->base.id,
(void *)surface_object->current_slot,
surface_object->status);
if (surface_object->status == VASurfaceRendering)
RequestSyncSurface(context, surface_id);
+8
View File
@@ -111,6 +111,13 @@ void surface_unbind_slot(struct request_data *driver_data,
{
if (surface_object->current_slot == NULL)
return;
/* AV1 Phase 3 diag: log every unbind with surface id + slot idx
* + status — confirms whether BeginPicture rebind is racing the
* consumer's vaGetImage on the previous frame. */
request_log("surface_unbind_slot id=0x%x status=%d slot_idx=%u\n",
surface_object->base.id,
surface_object->status,
surface_object->current_slot->v4l2_index);
cap_pool_release(&driver_data->capture_pool, surface_object->current_slot);
surface_object->current_slot = NULL;
}
@@ -192,6 +199,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
return VA_STATUS_ERROR_ALLOCATION_FAILED;
surface_object->current_slot = NULL; /* iter2 Fix 3 */
surface_object->linked_decode_surface_id = VA_INVALID_SURFACE;
surface_object->destination_index = 0; /* set on bind */
surface_object->destination_planes_count = 0; /* set at CreateContext */
surface_object->destination_buffers_count = 0; /* set at CreateContext */
+17
View File
@@ -89,6 +89,23 @@ struct object_surface {
struct timeval timestamp;
/*
* AV1 Phase 3: for streams with apply_grain=1, VAAPI's
* VADecPictureParameterBufferAV1 carries current_display_picture
* (display-time surface) separate from current_frame (decode
* target). vpu981 HW applies grain inline to the decode CAPTURE
* buffer, so the decoded data lives in current_frame's slot — but
* ffmpeg calls vaGetImage on current_display_picture which has no
* slot bound. linked_decode_surface_id, set in av1_set_controls
* on the display surface, points to the decode surface so
* copy_surface_to_image can borrow its destination_data[].
*
* VA_INVALID_SURFACE = no link (the common case: 8-bit codecs,
* AV1 with apply_grain=0, AV1 frames where cur_frame ==
* cur_display).
*/
VASurfaceID linked_decode_surface_id;
union {
struct {
VAPictureParameterBufferMPEG2 picture;