d7ef0f6cd9
Three more fixes after strace-diff localization vs kdirect.

Fix 6: fill_sequence ENABLE_SUPERRES. Gate on picture->pic_info_fields.bits.use_superres instead of unconditionally setting it true. VAAPI doesn't expose enable_superres at sequence level; per the strace diff, kdirect clears the flag for streams not using superres (byte 1 of flags was the only SEQUENCE diff). After this fix, the SEQUENCE ctrl is byte-equal to kdirect's on every call.

Fix 7: refresh_frame_flags = 0xff (was 0). VAAPI doesn't expose refresh_frame_flags. The default 0xff ("refresh all DPB slots") matches kdirect's submission and the AV1 spec default for KEY/SWITCH frames; for inter frames, simple P-frame chains naturally tolerate this.

Fix 8: surface_object->av1_order_hint per-surface tracking. Set in av1_set_controls from picture->order_hint of the current frame. Also propagated to the linked display surface (when apply_grain=1 → cur_frame != cur_display) so future frames referencing the display surface find the order_hint via the linked_decode_surface_id.

Tried + reverted: ref-name iteration of reference_frame_ts / order_hints via picture->ref_frame_idx[i-1] → DPB slot (Kwiboo's convention via FFmpeg's s->ref[i]). Empirically regressed 3/10 → 1/10. V4L2 uAPI's indexing here looks DPB-slot-direct despite the AV1 spec lexicon; needs kernel-side disambiguation to settle.

Verification on ampere (av1_larger.ivf 352x288, 10 frames):
  Frames 0, 2, 4: PASS bit-exact (apply_grain=1, grain HW path)
  Frames 1, 3, 5-9: DIFF (apply_grain=0)
  3/10 PASS (was 1/10 after iter checkpoint).
test_av1.ivf 208x208: unchanged bit-exact PASS sha 029ee72c214b37c1

Remaining open: frame 1 (apply_grain=0, first inter) submits IDENTICAL FRAME ctrl bytes to kdirect (verified strace-diff post-fix), yet decoded output diverges. That means the divergence is no longer in control submission. It points at OUTPUT-side bitstream differences between ffmpeg-vaapi and ffmpeg-v4l2request, or at DPB CAPTURE buffer state (grain-applied data being used as reference vs pre-grain).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
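The per-surface order_hint tracking described in Fix 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the driver's code: `struct oh_surface`, `oh_lookup`, `oh_record_order_hint`, and `oh_fill_order_hints` are hypothetical names, the linear lookup stands in for the driver's `SURFACE(data, id)` heap lookup, and the fill uses the DPB-slot-direct indexing that the commit message reports as the better-performing variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OH_NUM_REF_SLOTS 8             /* AV1 keeps 8 reference (DPB) slots */
#define OH_INVALID_SURFACE 0xffffffffu /* stand-in for VA_INVALID_SURFACE */

/* Hypothetical, trimmed stand-in for struct object_surface. */
struct oh_surface {
	uint32_t id;
	uint32_t linked_decode_surface_id;
	uint8_t av1_order_hint;
};

/* Linear lookup standing in for the driver's SURFACE(data, id) macro. */
static struct oh_surface *oh_lookup(struct oh_surface *surfaces, size_t count,
				    uint32_t id)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (surfaces[i].id == id)
			return &surfaces[i];
	return NULL;
}

/*
 * Record the current frame's order_hint on its decode surface and, when
 * the display surface differs (the apply_grain=1 case), on that one too.
 */
static void oh_record_order_hint(struct oh_surface *surfaces, size_t count,
				 uint32_t cur_frame, uint32_t cur_display,
				 uint8_t order_hint)
{
	struct oh_surface *decode = oh_lookup(surfaces, count, cur_frame);
	struct oh_surface *display = oh_lookup(surfaces, count, cur_display);

	if (decode != NULL)
		decode->av1_order_hint = order_hint;
	if (display != NULL && display != decode) {
		display->linked_decode_surface_id = cur_frame;
		display->av1_order_hint = order_hint;
	}
}

/*
 * DPB-slot-direct fill: order_hints[i] comes from the surface currently
 * held in ref_frame_map[i]. Empty or unknown slots are left at 0.
 */
static void oh_fill_order_hints(struct oh_surface *surfaces, size_t count,
				const uint32_t ref_frame_map[OH_NUM_REF_SLOTS],
				uint8_t order_hints[OH_NUM_REF_SLOTS])
{
	size_t i;

	assert(ref_frame_map != NULL && order_hints != NULL);

	for (i = 0; i < OH_NUM_REF_SLOTS; i++) {
		struct oh_surface *ref =
			oh_lookup(surfaces, count, ref_frame_map[i]);

		order_hints[i] = ref != NULL ? ref->av1_order_hint : 0;
	}
}
```

Writing the hint to both surfaces means a later frame gets the same value whether its ref_frame_map entry names the decode surface or the display surface.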
242 lines
9.2 KiB
C
/*
 * Copyright (C) 2007 Intel Corporation
 * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
 * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
 * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
 * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

#ifndef _SURFACE_H_
#define _SURFACE_H_

#include <linux/videodev2.h>

#include <va/va_backend.h>

#include "object_heap.h"
#include "cap_pool.h"

#include "h265.h"

struct request_data;

#define SURFACE(data, id) \
	((struct object_surface *)object_heap_lookup(&(data)->surface_heap, id))
#define SURFACE_ID_OFFSET 0x04000000

struct object_surface {
	struct object_base base;

	VASurfaceStatus status;
	int width;
	int height;

	unsigned int source_index;
	void *source_data;
	unsigned int source_size;

	/*
	 * Iter2 Fix 3: destination_* fields below are now per-decode-cycle.
	 * They are populated from current_slot in RequestBeginPicture and
	 * remain valid through SyncSurface, ExportSurfaceHandle, and
	 * DeriveImage/copy_surface_to_image (vaapi-copy path). Subsequent
	 * BeginPicture for this surface releases the prior slot and
	 * acquires a new one.
	 *
	 * destination_planes_count, destination_sizes, destination_offsets,
	 * destination_bytesperlines are FORMAT-uniform across all CAPTURE
	 * buffers, so they're set once at CreateSurfaces2 time and stay.
	 *
	 * destination_index, destination_map[], destination_map_lengths,
	 * destination_map_offsets, destination_data[] are SLOT-specific
	 * and re-populated each BeginPicture from current_slot.
	 *
	 * destination_buffers_count is also format-uniform (V4L2 planes
	 * per buffer = 1 for single-plane MPLANE NV12).
	 */
	struct cap_pool_slot *current_slot; /* iter2 Fix 3 */
	unsigned int destination_index;
	void *destination_map[VIDEO_MAX_PLANES];
	unsigned int destination_map_lengths[VIDEO_MAX_PLANES];
	unsigned int destination_map_offsets[VIDEO_MAX_PLANES];
	void *destination_data[VIDEO_MAX_PLANES];
	unsigned int destination_sizes[VIDEO_MAX_PLANES];
	unsigned int destination_offsets[VIDEO_MAX_PLANES];
	unsigned int destination_bytesperlines[VIDEO_MAX_PLANES];
	unsigned int destination_planes_count;
	unsigned int destination_buffers_count;

	unsigned int slices_size;
	unsigned int slices_count;

	struct timeval timestamp;

	/*
	 * AV1 Phase 3: for streams with apply_grain=1, VAAPI's
	 * VADecPictureParameterBufferAV1 carries current_display_picture
	 * (display-time surface) separate from current_frame (decode
	 * target). vpu981 HW applies grain inline to the decode CAPTURE
	 * buffer, so the decoded data lives in current_frame's slot, but
	 * ffmpeg calls vaGetImage on current_display_picture which has no
	 * slot bound. linked_decode_surface_id, set in av1_set_controls
	 * on the display surface, points to the decode surface so
	 * copy_surface_to_image can borrow its destination_data[].
	 *
	 * VA_INVALID_SURFACE = no link (the common case: 8-bit codecs,
	 * AV1 with apply_grain=0, AV1 frames where cur_frame ==
	 * cur_display).
	 */
	VASurfaceID linked_decode_surface_id;

	/*
	 * AV1 Phase 3: AV1 order_hint of the frame currently decoded into
	 * this surface. VAAPI's VADecPictureParameterBufferAV1.order_hint
	 * is per-frame; kernel's v4l2_ctrl_av1_frame.order_hints[8] is
	 * per-reference. We track each decoded frame's order_hint here so
	 * the next frame's av1_set_controls can populate order_hints[i]
	 * from ref_frame_map[i] → SURFACE → av1_order_hint.
	 */
	uint8_t av1_order_hint;

	union {
		struct {
			VAPictureParameterBufferMPEG2 picture;
			VASliceParameterBufferMPEG2 slice;
			VAIQMatrixBufferMPEG2 iqmatrix;
			bool iqmatrix_set;
		} mpeg2;
		struct {
			VAIQMatrixBufferH264 matrix;
			bool matrix_set;
			VAPictureParameterBufferH264 picture;
			VASliceParameterBufferH264 slice;
		} h264;
		struct {
			VAPictureParameterBufferHEVC picture;
			VASliceParameterBufferHEVC slice;
			VASliceParameterBufferHEVC slices[HEVC_MAX_SLICES_PER_FRAME];
			unsigned int num_slices;
			VAIQMatrixBufferHEVC iqmatrix;
			bool iqmatrix_set;
		} h265;
		struct {
			VAPictureParameterBufferVP8 picture;
			VASliceParameterBufferVP8 slice;
			VAIQMatrixBufferVP8 iqmatrix;
			bool iqmatrix_set;
			VAProbabilityDataBufferVP8 probability;
			bool probability_set;
		} vp8;
		struct {
			VADecPictureParameterBufferVP9 picture;
			VASliceParameterBufferVP9 slice;
		} vp9;
		/*
		 * ampere-av1-enablement: AV1 needs picture-header +
		 * variable number of slice/tile params (one per tile).
		 * tile_group_entries[] holds parsed VASliceParameterBufferAV1
		 * entries up to MAX_TILES; av1.c builds the matching
		 * v4l2_ctrl_av1_tile_group_entry[] at set_controls time.
		 */
		struct {
#define AV1_MAX_TILES 128
			VADecPictureParameterBufferAV1 picture;
			VASliceParameterBufferAV1 tile_group_entries[AV1_MAX_TILES];
			unsigned int num_tile_group_entries;
		} av1;
	} params;

	int request_fd;
};

VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
				unsigned int width, unsigned int height,
				VASurfaceID *surfaces_ids,
				unsigned int surfaces_count,
				VASurfaceAttrib *attributes,
				unsigned int attributes_count);
VAStatus RequestCreateSurfaces(VADriverContextP context, int width, int height,
			       int format, int surfaces_count,
			       VASurfaceID *surfaces_ids);
VAStatus RequestDestroySurfaces(VADriverContextP context,
				VASurfaceID *surfaces_ids, int surfaces_count);
VAStatus RequestSyncSurface(VADriverContextP context, VASurfaceID surface_id);
VAStatus RequestQuerySurfaceAttributes(VADriverContextP context,
				       VAConfigID config,
				       VASurfaceAttrib *attributes,
				       unsigned int *attributes_count);
VAStatus RequestQuerySurfaceStatus(VADriverContextP context,
				   VASurfaceID surface_id,
				   VASurfaceStatus *status);
VAStatus RequestPutSurface(VADriverContextP context, VASurfaceID surface_id,
			   void *draw, short src_x, short src_y,
			   unsigned short src_width, unsigned short src_height,
			   short dst_x, short dst_y, unsigned short dst_width,
			   unsigned short dst_height, VARectangle *cliprects,
			   unsigned int cliprects_count, unsigned int flags);
VAStatus RequestLockSurface(VADriverContextP context, VASurfaceID surface_id,
			    unsigned int *fourcc, unsigned int *luma_stride,
			    unsigned int *chroma_u_stride,
			    unsigned int *chroma_v_stride,
			    unsigned int *luma_offset,
			    unsigned int *chroma_u_offset,
			    unsigned int *chroma_v_offset,
			    unsigned int *buffer_name, void **buffer);
VAStatus RequestUnlockSurface(VADriverContextP context, VASurfaceID surface_id);
VAStatus RequestExportSurfaceHandle(VADriverContextP context,
				    VASurfaceID surface_id, uint32_t mem_type,
				    uint32_t flags, void *descriptor);

/*
 * iter5b-β Commit D: populate a surface's format-uniform destination_*
 * fields (planes_count, buffers_count, offsets, sizes, bytesperlines)
 * from driver_data's cached CAPTURE-side geometry. Idempotent: skip
 * if already filled (destination_planes_count != 0). Caller must
 * ensure driver_data->fmt_valid is true (CreateContext has run).
 *
 * Called by:
 * - context.c::RequestCreateContext after v4l2_get_format(CAPTURE)
 *   populates the cache; walks the surface_heap and fills every
 *   existing surface (covers surfaces created before CreateContext,
 *   including the ffmpeg vaapi-copy case where surfaces_count=0 is
 *   passed but surfaces exist in the heap from earlier
 *   CreateSurfaces2 calls).
 * - surface.c::RequestCreateSurfaces2 after surface allocation,
 *   covering the case where CreateContext fired before this
 *   CreateSurfaces2 call (fmt cache is valid, fill immediately).
 */
void surface_fill_format_uniform(struct request_data *driver_data,
				 struct object_surface *surface_object);

/*
 * Iter2 Fix 3: bind / unbind a CAPTURE-pool slot to an object_surface.
 * Called from picture.c::RequestBeginPicture (acquire+bind) and
 * surface.c::RequestDestroySurfaces (unbind). Mirrors slot's V4L2 index
 * and mmap pointers into surface_object->destination_* so existing
 * QBUF/DQBUF/EXPBUF code paths see no behavioral change.
 */
void surface_bind_slot(struct object_surface *surface_object,
		       struct cap_pool_slot *slot);
void surface_unbind_slot(struct request_data *driver_data,
			 struct object_surface *surface_object);

#endif
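
The header comments split the destination_* fields into format-uniform ones (filled once, idempotently) and slot-specific ones (mirrored from the bound CAPTURE-pool slot on each BeginPicture). A minimal sketch of that split follows, under loud assumptions: the real `struct cap_pool_slot` is defined in cap_pool.h and is not shown here, so `struct demo_slot`, its fields, and every `demo_*` function are hypothetical stand-ins, and the real `surface_unbind_slot` also takes `driver_data`, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PLANES 8 /* same value as VIDEO_MAX_PLANES */

/* Hypothetical slot layout; the real struct cap_pool_slot is in cap_pool.h. */
struct demo_slot {
	unsigned int v4l2_index;
	void *map[DEMO_MAX_PLANES];
	unsigned int map_lengths[DEMO_MAX_PLANES];
	int in_use;
};

/* Trimmed stand-in for the destination_* part of struct object_surface. */
struct demo_surface {
	struct demo_slot *current_slot;
	unsigned int destination_index;
	void *destination_map[DEMO_MAX_PLANES];
	unsigned int destination_map_lengths[DEMO_MAX_PLANES];
	unsigned int destination_bytesperlines[DEMO_MAX_PLANES];
	unsigned int destination_planes_count;
};

/* Format-uniform fill: idempotent, skipped once planes_count is non-zero. */
static void demo_fill_format_uniform(struct demo_surface *surface,
				     unsigned int planes_count,
				     unsigned int bytesperline)
{
	unsigned int i;

	assert(planes_count > 0 && planes_count <= DEMO_MAX_PLANES);
	if (surface->destination_planes_count != 0)
		return;

	for (i = 0; i < planes_count; i++)
		surface->destination_bytesperlines[i] = bytesperline;
	surface->destination_planes_count = planes_count;
}

/* Slot-specific bind: mirror the slot's index and mappings into the surface. */
static void demo_bind_slot(struct demo_surface *surface, struct demo_slot *slot)
{
	slot->in_use = 1;
	surface->current_slot = slot;
	surface->destination_index = slot->v4l2_index;
	memcpy(surface->destination_map, slot->map, sizeof(slot->map));
	memcpy(surface->destination_map_lengths, slot->map_lengths,
	       sizeof(slot->map_lengths));
}

/* Unbind: drop the slot-specific state, keep the format-uniform fields. */
static void demo_unbind_slot(struct demo_surface *surface)
{
	if (surface->current_slot == NULL)
		return;

	surface->current_slot->in_use = 0;
	surface->current_slot = NULL;
	surface->destination_index = 0;
	memset(surface->destination_map, 0, sizeof(surface->destination_map));
	memset(surface->destination_map_lengths, 0,
	       sizeof(surface->destination_map_lengths));
}
```

Keeping the format-uniform fields across an unbind is what lets code that only needs geometry (strides, plane count) work on a surface that has no slot bound at the moment.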