forked from marfrit/libva-v4l2-request-fourier
f61f736380
Fixes the rkvdec_hevc_prepare_hw_st_rps out-of-bounds kernel OOPS that
blocked HEVC decode on ampere (RK3588) per
marfrit/libva-v4l2-request-fourier#3 and ampere-fourier iter1 close.

Mechanism (Phase 5 amendment to issue body):
The new EXT_SPS controls are registered as V4L2_CTRL_FLAG_DYNAMIC_ARRAY
in vdpu38x_hevc_ctrl_descs (rkvdec.c:279/284) with cfg.dims = { 65 }.
The v4l2-ctrl framework init-allocates 1 zeroed element
(ctrls-core.c:2116). When num_short_term_ref_pic_sets > 1,
rkvdec_hevc_prepare_hw_st_rps (rkvdec-hevc-common.c:393-405) iterates
idx 0..N-1 and overruns the 1-element kernel allocation. Submitting an
N-element dynamic-array control via S_EXT_CTRLS extends the framework
allocation.

Userspace fix:
- VIDIOC_QUERY_EXT_CTRL probe at first HEVC CreateContext sets
  driver_data->has_ext_sps_rps (true on VDPU381/383, false on legacy
  RK3399: the control is unregistered there, so fresnel iter38 5/5 +
  iter39 sub-profile paths are byte-identical to pre-iter2).
- When set, h265_set_controls appends EXT_SPS_ST_RPS + _LT_RPS as
  calloc'd zero arrays, sized by VAAPI's count fields and capped at
  H.265 §7.4.3.2 spec maxima (ST 64, LT 32). Min 1 (kernel rejects 0).
- Free post-S_EXT_CTRLS.

Decode correctness scope:
VAAPI does NOT expose per-set st_ref_pic_set syntax elements
(delta_idx_minus1, delta_rps_sign, etc.), as confirmed in
va_dec_hevc.h. All-zero entries give empty inter-pred RPS per set,
which is correct for IDR-only streams and incorrect for streams with
inter-pred RPS dependence. iter2 acceptance: stop the OOPS.
Decode-correctness for inter-RPS content is a known follow-up requiring
either bitstream-snoop or SPS-passthrough via a new VAAPI extension.

Files:
- include/hevc-ctrls.h: #ifndef-guarded fallback definitions for
  V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS + structs (ampere host is
  on linux-api-headers 6.19-1; the new CIDs land in 7.0).
- src/request.h: driver_data->has_ext_sps_rps (persists for driver
  lifetime; gated solely by HEVC code path so cross-codec leakage
  impossible).
- src/context.c: probe at HEVC CreateContext via v4l2_query_ext_ctrl.
- src/h265.c: controls[5] → controls[7]; #include <hevc-ctrls.h>
  (replaces <linux/v4l2-controls.h>) for forward UAPI compatibility.

Compile-tested on boltzmann (aarch64 native, gcc 15.2.1): clean .so,
0 new warnings.

Fresnel cross-device safety: legacy RK3399 rkvdec_ctrl table omits the
CIDs; probe returns false; new code path never executes.

iter39 sub-profile work (commits 662f887+8746690) is preserved in-tree;
iter2 is a forward-compatible additive change.

Refs:
  marfrit/libva-v4l2-request-fourier#3
  ampere-fourier/iter1_close.md HEVC blocker
  ampere-fourier/iter2_phase0_findings.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
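The sizing rule in the "Userspace fix" list (element count taken from VAAPI, capped at the quoted spec maxima of 64 and 32, never below 1, zero-filled via calloc) can be sketched as follows. This is a minimal illustration rather than the code in src/h265.c: the helper names `rps_clamp_count` and `rps_alloc_zeroed`, the two `*_MAX` macros, and the generic `elem_size` parameter are assumptions made for the example, and the real element structs come from include/hevc-ctrls.h, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Spec maxima as quoted in the commit message (ST 64, LT 32). */
#define EXT_SPS_ST_RPS_MAX 64u
#define EXT_SPS_LT_RPS_MAX 32u

/* Hypothetical helper: clamp a VAAPI-provided count into [1, spec_max]. */
static unsigned int rps_clamp_count(unsigned int count, unsigned int spec_max)
{
	assert(spec_max >= 1);

	if (count < 1)
		return 1;		/* commit message: kernel rejects 0 */
	if (count > spec_max)
		return spec_max;	/* never exceed the spec maximum */
	return count;
}

/*
 * Hypothetical helper: allocate a zero-filled array with the clamped
 * element count. The caller submits it with the control and frees it
 * once the S_EXT_CTRLS call has returned. Returns NULL on allocation
 * failure; on success *out_count holds the number of elements.
 */
static void *rps_alloc_zeroed(unsigned int count, unsigned int spec_max,
			      size_t elem_size, unsigned int *out_count)
{
	unsigned int n = rps_clamp_count(count, spec_max);
	void *array = calloc(n, elem_size);

	if (array != NULL && out_count != NULL)
		*out_count = n;

	return array;
}
```

A caller would pass the VAAPI short-term count with `EXT_SPS_ST_RPS_MAX` and the long-term count with `EXT_SPS_LT_RPS_MAX`, attach the returned pointer and `n * elem_size` as the control payload, and `free()` the array after the ioctl. Clamping before the allocation keeps the submitted size and the allocated size from drifting apart.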
175 lines
6.7 KiB
C
/*
 * Copyright (C) 2007 Intel Corporation
 * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
 * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
 * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
 * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

#ifndef _V4L2_REQUEST_H_
#define _V4L2_REQUEST_H_

#include <stdbool.h>

#include "context.h"
#include "object_heap.h"
#include "request_pool.h"
#include "cap_pool.h"
#include "video.h"
#include <va/va.h>

#include <linux/videodev2.h>

#define V4L2_REQUEST_STR_VENDOR "v4l2-request"

#define V4L2_REQUEST_MAX_PROFILES 13
#define V4L2_REQUEST_MAX_ENTRYPOINTS 5
#define V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES 10
#define V4L2_REQUEST_MAX_IMAGE_FORMATS 10
#define V4L2_REQUEST_MAX_SUBPIC_FORMATS 4
#define V4L2_REQUEST_MAX_DISPLAY_ATTRIBUTES 4

struct request_data {
	struct object_heap config_heap;
	struct object_heap context_heap;
	struct object_heap surface_heap;
	struct object_heap buffer_heap;
	struct object_heap image_heap;
	int video_fd;
	int media_fd;

	/*
	 * iter38: multi-device probe. RK3399 has two V4L2 stateless decoders:
	 *   - rkvdec → H264 / HEVC / VP9
	 *   - hantro-vpu (rk3399-vpu-dec) → MPEG-2 / VP8
	 * At VA_DRIVER_INIT we probe both, open their fds, and store them
	 * here. driver_data->video_fd / media_fd above are the "active" fds
	 * (point at one of the pairs below). RequestCreateConfig retargets
	 * them based on the profile's required device. Pools and video_format
	 * are torn down at retarget time so the next CreateContext rebuilds
	 * them against the right device.
	 *
	 * -1 means that device kind isn't present on this kernel boot.
	 * Honours LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH explicit
	 * overrides; when those are set, only the single requested device
	 * is opened and the alt fds stay -1.
	 */
	int video_fd_rkvdec;
	int media_fd_rkvdec;
	int video_fd_hantro;
	int media_fd_hantro;

	struct video_format *video_format;

	/*
	 * OUTPUT (bitstream-input) buffer pool, decoupled from VA
	 * surfaces. Sized by codec pipeline depth, populated on first
	 * RequestCreateContext, torn down at driver Terminate.
	 */
	struct request_pool output_pool;

	/*
	 * CAPTURE (decoded-frame) buffer pool, decoupled from VA
	 * surfaces (iter2 Fix 3). Each surface acquires a slot at
	 * vaBeginPicture time and releases it on the next acquisition
	 * or vaDestroySurfaces. Pool sized to max(surfaces_count,
	 * MIN_CAP_POOL) at first vaCreateSurfaces2; torn down at
	 * vaDestroyContext.
	 *
	 * Background: pre-iter2 each surface was 1:1 bound to one
	 * CAPTURE buffer index; mpv re-using a surface for a new decode
	 * caused V4L2 to re-QBUF the same physical buffer while a
	 * compositor still held an EXPBUF'd dma_buf fd, producing
	 * visible stutter on mpv vaapi --vo=gpu.
	 */
	struct cap_pool capture_pool;

	/*
	 * iter5b-β: the pre-β last_output_{width,height} cache fields
	 * and surface_reset_format_cache() helper are deleted. They
	 * existed because CreateSurfaces2 owned the OUTPUT-side V4L2
	 * device-format lifecycle and needed to gate re-S_FMT on
	 * resolution change. β moves that lifecycle to CreateContext,
	 * which is naturally one-shot per context cycle; no caching is
	 * required. DestroyContext + next CreateContext rebuild from
	 * scratch.
	 *
	 * iter5b-β Commit D: cache the format-uniform CAPTURE-side
	 * geometry from v4l2_get_format so CreateSurfaces2 can populate
	 * a newly-created surface's destination_* fields without
	 * re-querying the device. Set by CreateContext after the
	 * v4l2_get_format(CAPTURE) call; consumed by both:
	 *   1. CreateContext's surface_heap walk (fills surfaces that
	 *      pre-exist when CreateContext fires);
	 *   2. CreateSurfaces2's per-surface init (fills surfaces
	 *      created AFTER CreateContext, e.g. ffmpeg vaapi-copy
	 *      pool dynamics where the consumer passes surfaces_count=0
	 *      to vaCreateContext and creates surfaces lazily).
	 *
	 * fmt_valid is true once CreateContext has populated the cache;
	 * CreateSurfaces2 only lazy-fills when fmt_valid is true.
	 */
	bool fmt_valid;
	unsigned int fmt_format_height;
	unsigned int fmt_planes_count;
	unsigned int fmt_buffers_count;
	unsigned int fmt_sizes[VIDEO_MAX_PLANES];
	unsigned int fmt_bytesperlines[VIDEO_MAX_PLANES];

	/*
	 * iter39: active session is decoding a 10-bit profile (Hi10P / Main10).
	 * Set in RequestCreateContext from config->profile. Drives:
	 *   - CAPTURE pix_fmt selection (NV15 instead of NV12)
	 *   - image.c DeriveImage / QueryImageFormats fourcc reporting (P010
	 *     instead of NV12)
	 *   - copy_surface_to_image NV15→P010 unpack branch
	 * Reset to false at DestroyContext.
	 */
	bool is_10bit;

	/*
	 * iter2 (ampere-fourier): rkvdec on this host exposes the
	 * V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS dynamic-array control
	 * (VDPU381/383 path). When true, h265_set_controls appends
	 * EXT_SPS_ST_RPS + EXT_SPS_LT_RPS (zero-initialized arrays sized
	 * by VAAPI's count fields, capped at H.265 spec maxima 64/32).
	 * Required to prevent rkvdec_hevc_prepare_hw_st_rps out-of-bounds
	 * kernel OOPS when num_short_term_ref_pic_sets > 1.
	 * Probed at first HEVC CreateContext via VIDIOC_QUERY_EXT_CTRL.
	 * Persists for driver lifetime (gated solely by HEVC-only code
	 * path, so cross-codec leakage cannot occur).
	 */
	bool has_ext_sps_rps;
};

VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
VAStatus RequestTerminate(VADriverContextP context);

/*
 * iter38: retarget driver_data->{video,media}_fd to the device required by
 * `profile`. Returns 0 on success, -1 on profile not mappable to any kind.
 * Defined in request.c.
 */
int request_switch_device_for_profile(struct request_data *driver_data,
				      VAProfile profile);

#endif