iter39 α-31: H264 Hi10P + HEVC Main10 sub-profile support (10-bit, rkvdec NV15)

Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request
backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit
values packed in 5 bytes per element); VAAPI consumers receive standard
VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image.
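
A minimal sketch of the unpack described above, assuming the LSB-first
packing from pixfmt-nv15.rst and P010 samples stored in bits [15:6]. The
names nv15_row_to_p010 and nv15_plane_to_p010_sketch and their argument order
are hypothetical; this is not the commit's nv15_unpack_plane_to_p010():

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: unpack one row of NV15 samples into P010.
 * Sample x starts at bit x*10 of the row, least significant bit first,
 * so it always spans exactly two consecutive source bytes.
 */
static inline void nv15_row_to_p010(const uint8_t *src, uint16_t *dst,
				    unsigned int width)
{
	unsigned int x;

	for (x = 0; x < width; x++) {
		size_t bit = (size_t)x * 10;
		size_t byte = bit >> 3;
		unsigned int shift = (unsigned int)(bit & 7);
		uint32_t word = (uint32_t)src[byte] |
				((uint32_t)src[byte + 1] << 8);

		/* P010 keeps the 10-bit value in the top bits of each word. */
		dst[x] = (uint16_t)(((word >> shift) & 0x3ff) << 6);
	}
}

/*
 * Hypothetical plane walker. Both strides are in bytes; width is the
 * number of samples per row (for the interleaved CbCr plane this is the
 * luma width, since each row holds width/2 Cb/Cr pairs).
 */
void nv15_plane_to_p010_sketch(const uint8_t *src, size_t src_stride,
			       uint16_t *dst, size_t dst_stride,
			       unsigned int width, unsigned int height)
{
	unsigned int y;

	for (y = 0; y < height; y++)
		nv15_row_to_p010(src + (size_t)y * src_stride,
				 (uint16_t *)((uint8_t *)dst +
					      (size_t)y * dst_stride),
				 width);
}
```

The per-sample bit arithmetic is chosen for readability; a production unpack
would more likely process four samples per five-byte group.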

VP9 Profile 2 is explicitly NOT added: the RK3399 rkvdec kernel ctrl table
caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs).

Touchpoints (per Phase 5 sonnet-architect review amendments):
  - include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it)
  - src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per
    Documentation/userspace-api/media/v4l/pixfmt-nv15.rst)
  - src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find)
  - src/codec.c: pixelformat_for_profile cases for Hi10P + Main10
  - src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10
    advertisement for 10-bit profiles
  - src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic
    SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth
    transition (sibling to iter38 device-switch invalidation), is_10bit flag
  - src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export
  - src/image.c: P010 reporting in DeriveImage + QueryImageFormats,
    P010-aware sizing in CreateImage, NV15 → P010 unpack call in
    copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010)
  - src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC
    per-codec paths
  - src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag
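
The P010-aware sizing listed for src/image.c can be sketched as below. This
is an illustration under assumptions, not the driver code: struct
p010_layout and both function names are hypothetical, and height stands in
for the V4L2-reported format height that the driver uses.

```c
#include <stddef.h>

/* Hypothetical container for a two-plane P010 image layout. */
struct p010_layout {
	size_t pitch[2];	/* bytes per row, luma then CbCr */
	size_t offset[2];	/* byte offset of each plane */
	size_t size;		/* total buffer size in bytes */
};

/*
 * P010 stores every sample in 16 bits, so the luma pitch is width * 2.
 * The interleaved CbCr plane has the same pitch and half the rows,
 * giving it half the luma plane size.
 */
struct p010_layout p010_layout_for(unsigned int width, unsigned int height)
{
	struct p010_layout l;
	size_t luma_size;

	l.pitch[0] = (size_t)width * 2;
	l.pitch[1] = l.pitch[0];
	luma_size = l.pitch[0] * height;
	l.offset[0] = 0;
	l.offset[1] = luma_size;
	l.size = luma_size + luma_size / 2;
	return l;
}

/*
 * Smallest NV15 source stride for a row: 5 bytes per group of 4 samples,
 * rounded up. The kernel may report a larger, aligned bytesperline.
 */
size_t nv15_min_bytesperline(unsigned int width)
{
	return ((size_t)width + 3) / 4 * 5;
}
```

For a 1920x1080 surface this gives a 3840-byte P010 pitch against a minimum
NV15 source stride of 2400 bytes, which is why the copy path cannot reuse
the V4L2-reported stride for the image buffer.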

Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi
hwdownload, mpv vaapi-copy, and any other consumer using vaGetImage work
end-to-end. PRIME-path consumers that only know NV12/P010 must use the
COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the
correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is
follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a
GPU-accessible buffer).

Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0,
libdrm 2.4.133): clean build, .so produced, 0 new warnings.

Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec.
rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15;
ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not
in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196,
rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only.
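
A sketch of the bit-depth dispatch noted above, as mirrored on the userspace
side when choosing the CAPTURE format. The SKETCH_* macros and
sketch_capture_format() are hypothetical names; the fourcc characters follow
the usual V4L2 little-endian four-character encoding as I understand it and
should be checked against linux/videodev2.h:

```c
#include <stdint.h>

/* Hypothetical local copies so the sketch builds without kernel headers. */
#define SKETCH_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define SKETCH_PIX_FMT_NV12 SKETCH_FOURCC('N', 'V', '1', '2')
#define SKETCH_PIX_FMT_NV15 SKETCH_FOURCC('N', 'V', '1', '5')

/*
 * Pick the decoded-frame format from bit_depth_luma_minus8 alone:
 * 0 (8-bit) selects NV12, 2 (10-bit) selects NV15. Any other depth
 * returns 0, meaning "unsupported" in this sketch.
 */
uint32_t sketch_capture_format(unsigned int bit_depth_luma_minus8)
{
	switch (bit_depth_luma_minus8) {
	case 0:
		return SKETCH_PIX_FMT_NV12;
	case 2:
		return SKETCH_PIX_FMT_NV15;
	default:
		return 0;
	}
}
```

This is why the synthetic SPS sets bit_depth_luma_minus8=2 for the 10-bit
profiles: the profile value does not enter the format decision.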

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:15:16 +00:00
parent 7ac934e0c5
commit 662f8874ba
12 changed files with 363 additions and 67 deletions
+82 -26
@@ -39,6 +39,7 @@
 #include <linux/dma-buf.h>
+#include "nv15.h"
 #include "tiled_yuv.h"
 #include "utils.h"
 #include "v4l2.h"
@@ -86,13 +87,31 @@ VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
 	for (i = 0; i < planes_count; i++)
 		size += destination_sizes[i];
-	/* Here we calculate the sizes assuming NV12. */
+	if (format->fourcc == VA_FOURCC_P010) {
+		/*
+		 * iter39: P010 image overrides V4L2-side NV15 sizing. The
+		 * source is the kernel-reported NV15 packed plane; the image
+		 * buffer holds dense P010 (2 bytes per pixel, 16bpp).
+		 * Recompute sizes/pitches against P010 layout so consumers
+		 * (vaGetImage, vaDeriveImage) see standard P010 geometry.
+		 */
+		destination_bytesperlines[0] = width * 2;
+		destination_sizes[0] = destination_bytesperlines[0] * format_height;
+		for (i = 1; i < destination_planes_count; i++) {
+			destination_bytesperlines[i] = destination_bytesperlines[0];
+			destination_sizes[i] = destination_sizes[0] / 2;
+		}
+		size = 0;
+		for (i = 0; i < destination_planes_count; i++)
+			size += destination_sizes[i];
+	} else {
+		/* NV12: V4L2 stride is correct, sizes derived from height. */
-	destination_sizes[0] = destination_bytesperlines[0] * format_height;
+		destination_sizes[0] = destination_bytesperlines[0] * format_height;
-	for (i = 1; i < destination_planes_count; i++) {
-		destination_bytesperlines[i] = destination_bytesperlines[0];
-		destination_sizes[i] = destination_sizes[0] / 2;
+		for (i = 1; i < destination_planes_count; i++) {
+			destination_bytesperlines[i] = destination_bytesperlines[0];
+			destination_sizes[i] = destination_sizes[0] / 2;
+		}
 	}
 	id = object_heap_allocate(&driver_data->image_heap);
@@ -224,7 +243,24 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
 					image->pitches[i], image->width,
 					i == 0 ? image->height :
 						 image->height / 2);
-		else {
+		else if (driver_data->is_10bit &&
+			 image->format.fourcc == VA_FOURCC_P010) {
+			/*
+			 * iter39: rkvdec emits NV15 (4×10-bit packed in 5
+			 * bytes); the VA image buffer is dense P010 (2B/pixel,
+			 * value in bits[15:6]). Source stride is the V4L2-
+			 * reported NV15 bytesperline (= ceil(width/4)*5,
+			 * possibly aligned higher by the kernel); destination
+			 * stride is image->pitches[i] = width * 2.
+			 */
+			unsigned int plane_h = (i == 0) ? image->height
+						      : image->height / 2;
+			nv15_unpack_plane_to_p010(
+				surface_object->destination_data[i],
+				(uint16_t *)(buffer_object->data + image->offsets[i]),
+				image->width, plane_h,
+				surface_object->destination_bytesperlines[i]);
+		} else {
 #endif
 			memcpy(buffer_object->data + image->offsets[i],
 			       surface_object->destination_data[i],
@@ -268,9 +304,17 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
 	/* Fully populate VAImageFormat to match QueryImageFormats output. */
 	memset(&format, 0, sizeof(format));
-	format.fourcc = VA_FOURCC_NV12;
-	format.byte_order = VA_LSB_FIRST;
-	format.bits_per_pixel = 12;
+	if (driver_data->is_10bit) {
+		/* iter39: 10-bit session derives a P010 image. NV15-source
+		 * unpack happens in copy_surface_to_image. */
+		format.fourcc = VA_FOURCC_P010;
+		format.byte_order = VA_LSB_FIRST;
+		format.bits_per_pixel = 24;
+	} else {
+		format.fourcc = VA_FOURCC_NV12;
+		format.byte_order = VA_LSB_FIRST;
+		format.bits_per_pixel = 12;
+	}
 	status = RequestCreateImage(context, &format, surface_object->width,
 				    surface_object->height, image);
@@ -305,26 +349,38 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
 VAStatus RequestQueryImageFormats(VADriverContextP context,
 				  VAImageFormat *formats, int *formats_count)
 {
+	struct request_data *driver_data = context->pDriverData;
+	int n = 0;
 	/*
-	 * Populate the VAImageFormat fully per VAAPI spec for NV12 —
-	 * not just .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv,
-	 * Firefox) read .byte_order and .bits_per_pixel; leaving them
-	 * uninitialized inherits whatever caller-stack garbage is in
-	 * the buffer and produces non-deterministic behavior. Reference:
-	 * Mesa's gallium/frontends/va/image.c::vlVaQueryImageFormats and
-	 * intel-vaapi-driver's i965_drv_video.c — both publish NV12
-	 * with byte_order=VA_LSB_FIRST and bits_per_pixel=12.
+	 * Populate the VAImageFormat fully per VAAPI spec — not just
+	 * .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv, Firefox)
+	 * read .byte_order and .bits_per_pixel; leaving them
+	 * uninitialized inherits caller-stack garbage and produces
+	 * non-deterministic behavior. Reference: Mesa's
+	 * gallium/frontends/va/image.c::vlVaQueryImageFormats and
+	 * intel-vaapi-driver's i965_drv_video.c.
 	 *
-	 * For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask
-	 * are not meaningful (those describe RGB bit layouts); leave them
-	 * zeroed via memset before populating.
+	 * iter39: advertise P010 when an active session is 10-bit so
+	 * ffmpeg-vaapi sees a valid 10-bit-compatible entry during
+	 * vaQueryImageFormats. NV12 stays advertised unconditionally so
+	 * the 8-bit catalog query response is unchanged.
 	 */
-	memset(&formats[0], 0, sizeof(formats[0]));
-	formats[0].fourcc = VA_FOURCC_NV12;
-	formats[0].byte_order = VA_LSB_FIRST;
-	formats[0].bits_per_pixel = 12;
-	*formats_count = 1;
+	memset(&formats[n], 0, sizeof(formats[n]));
+	formats[n].fourcc = VA_FOURCC_NV12;
+	formats[n].byte_order = VA_LSB_FIRST;
+	formats[n].bits_per_pixel = 12;
+	n++;
+	if (driver_data->is_10bit && n < V4L2_REQUEST_MAX_IMAGE_FORMATS) {
+		memset(&formats[n], 0, sizeof(formats[n]));
+		formats[n].fourcc = VA_FOURCC_P010;
+		formats[n].byte_order = VA_LSB_FIRST;
+		formats[n].bits_per_pixel = 24;
+		n++;
+	}
+	*formats_count = n;
 	return VA_STATUS_SUCCESS;
 }