forked from marfrit/libva-v4l2-request-fourier
iter40: Pi 5 HEVC chapter — backend integration lands, bit-exact pending
Phase 6 implementation. Backend builds clean on higgs (Debian 13 trixie, aarch64), vainfo lists VAProfileHEVCMain via rpi-hevc-dec, multi-device probe finds /dev/video19 + /dev/media1, and CreateContext + S_FMT + REQBUFS + STREAMON all succeed.

Phase 7 partial: infrastructure works, 10 frames flow through the pipeline (correct byte counts produced: 13824000 for 10 NV12 frames at 1280x720). But every DQBUF on CAPTURE returns V4L2_BUF_FLAG_ERROR, so output content is wrong (libva sha != kdirect sha). The decode itself is failing on the rpi-hevc-dec side despite all ctrl submissions returning success.

Code changes:
- request.h: video_fd_rpi_hevc_dec / media_fd_rpi_hevc_dec slots + has_hevc_ext_sps_rps_rpi_hevc_dec flag (mirrors iter38 + iter2 pair-of-flags pattern, naturally false on Pi).
- request.c: known_decoder_drivers gains rpi-hevc-dec; primary-driver probe gets an else-if branch setting the new fds (Phase 5 F3); request_switch_device_for_profile prefers 'p' for HEVC when rpi-hevc-dec present.
- context.c: per-fd want_pixfmt (NC12 on Pi), capture_pixelformat taken from video_format slot (not hardcoded NV12/NV15); synthetic-SPS pre-seed gated off for Pi (Phase 5 F6); destination_sizes uses nv12_col128_uv_plane_offset for NC12 SAND layout (Phase 5 F2); per-driver HEVC_START_CODE (NONE on Pi, ANNEX_B on RK); per-driver context_object->h264_start_code (skip prepend on Pi).
- video.c: NV12_COL128 video_format entry (8-bit SAND, single buffer, 2 planes, NV12 drm_format with MOD_NONE so detile branch fires rather than tiled_to_planar).
- nv12_col128.c/.h: detile primitive (Y + UV per-plane, kernel hevc_d_video.c bytesperline formula + ffmpeg/Kynesim per-pixel offset). UV plane offset = 128 * ALIGN(h, 8), i.e. within-column (SAND interleaves Y+UV per column, NOT plane-concatenated; earlier wrong formula caught by Phase 7 SEGV).
- image.c: #ifdef __arm__ extended to __arm__ || __aarch64__ (Phase 5 F1: guard was killing detile path on all aarch64 hosts including fresnel iter39 NV15 path, masked because 10-bit never exercised); RequestCreateImage NC12 → NV12 stride override (linear width, not column-stride); copy_surface_to_image NC12 detile branch (gates on fourcc + v4l2_format).
- nv15.h: fallback V4L2_PIX_FMT_NV15 define (Debian 13 headers omit it though they have NC12).
- nv12_col128.h: fallback V4L2_PIX_FMT_NV12_COL128 + V4L2_PIX_FMT_NV12_10_COL128 (Arch / mainline pre-Pi headers).
- tests/test_nv12_col128_detile.c: hand-crafted-bytes unit test; passes (8 cases: Y + UV for 4 widths incl. 1366 misaligned; UV-offset helper).
- meson.build / nv12_col128 sources listed.

Phase 7 status: not yet bit-exact. Remaining diagnosis: per-frame S_EXT_CTRLS payload diff vs kdirect (kdirect sends 4 ctrls SPS+PPS+decode_params+slice_array; ours sends 5 incl. scaling_matrix; field ordering differs). Likely the slice_array contents need per-driver handling for rpi-hevc-dec's expected layout. Beyond in-session reach.

iter38 5/5 baseline on fresnel + ampere should be unaffected (new fd stays -1 on non-Pi hosts; all gates either short-circuit on fd-not-present or no-op).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
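The byte-count and offset figures quoted in the message can be checked with a few lines of arithmetic. This is only an illustrative sketch: the helper names (`sketch_nv12_frame_bytes`, `sketch_nc12_col_rows`, `sketch_nc12_uv_offset`) are hypothetical and are not the functions added in nv12_col128.c. The formulas are the ones stated in the message and in the diff comments (ALIGN(h, 8) * 3/2 rows per 128-byte-wide column, UV rows starting 128 * ALIGN(h, 8) bytes into each column).

```c
#include <assert.h>
#include <stddef.h>

/* Round v up to a multiple of a (a must be a power of two). */
#define SKETCH_ALIGN(v, a) (((v) + (a) - 1) & ~((size_t)(a) - 1))

/* Bytes in one linear 8-bit NV12 frame: full-size Y plus half-size UV. */
static size_t sketch_nv12_frame_bytes(size_t width, size_t height)
{
	/* 4:2:0 subsampling assumes even dimensions. */
	assert(width % 2 == 0 && height % 2 == 0);
	return width * height * 3 / 2;
}

/* Rows of 128 bytes in one NC12 column: Y rows followed by UV rows. */
static size_t sketch_nc12_col_rows(size_t height)
{
	return SKETCH_ALIGN(height, 8) * 3 / 2;
}

/* Byte offset of the UV rows, measured from the start of a column. */
static size_t sketch_nc12_uv_offset(size_t height)
{
	return 128 * SKETCH_ALIGN(height, 8);
}
```

For 1280x720, ten frames come to the 13824000 bytes quoted above, and the column height is the 1080 used as the example in the image.c comment.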
+77
-3
@@ -40,6 +40,7 @@
 #include <linux/dma-buf.h>
 
 #include "nv15.h"
+#include "nv12_col128.h"
 #include "tiled_yuv.h"
 #include "utils.h"
 #include "v4l2.h"
@@ -104,6 +105,25 @@ VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
 		size = 0;
 		for (i = 0; i < destination_planes_count; i++)
 			size += destination_sizes[i];
+	} else if (format->fourcc == VA_FOURCC_NV12 &&
+		   video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
+		/*
+		 * iter40 Phase 5 review F2: NC12 source, NV12 image output.
+		 * V4L2-reported destination_bytesperlines[0] is the NC12
+		 * column stride (= ALIGN(height,8) * 3/2, e.g. 1080 for
+		 * 1280×720), NOT the linear NV12 Y stride. Override to the
+		 * linear stride (width) so VAImage pitches reflect the
+		 * detile-output layout the consumer reads.
+		 */
+		destination_bytesperlines[0] = width;
+		destination_sizes[0] = destination_bytesperlines[0] * format_height;
+		for (i = 1; i < destination_planes_count; i++) {
+			destination_bytesperlines[i] = destination_bytesperlines[0];
+			destination_sizes[i] = destination_sizes[0] / 2;
+		}
+		size = 0;
+		for (i = 0; i < destination_planes_count; i++)
+			size += destination_sizes[i];
 	} else {
 		/* NV12: V4L2 stride is correct, sizes derived from height. */
 		destination_sizes[0] = destination_bytesperlines[0] * format_height;
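To make the override in the hunk above concrete, here is a minimal sketch of the linear NV12 geometry it produces. The struct and function names are hypothetical, and the plane offsets assume the UV plane is packed directly after the Y plane, which the hunk itself does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the per-plane geometry of a linear NV12 image. */
struct sketch_nv12_layout {
	size_t pitches[2];
	size_t offsets[2];
	size_t sizes[2];
	size_t total;
};

static struct sketch_nv12_layout sketch_linear_nv12(size_t width, size_t height)
{
	struct sketch_nv12_layout l;

	assert(width % 2 == 0 && height % 2 == 0);

	/* Y plane: one byte per pixel, pitch equal to the image width. */
	l.pitches[0] = width;
	l.sizes[0] = l.pitches[0] * height;

	/* UV plane: interleaved Cb/Cr, same pitch, half as many rows. */
	l.pitches[1] = l.pitches[0];
	l.sizes[1] = l.sizes[0] / 2;

	/* Assumption: planes are stored back to back in one buffer. */
	l.offsets[0] = 0;
	l.offsets[1] = l.sizes[0];
	l.total = l.sizes[0] + l.sizes[1];
	return l;
}
```

With width 1280 and height 720 the Y pitch is 1280 rather than the 1080-row column stride reported by V4L2, which is the mismatch the F2 override corrects.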
@@ -236,14 +256,31 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
 	}
 
 	for (i = 0; i < surface_object->destination_planes_count; i++) {
-#ifdef __arm__
+		/*
+		 * iter40 Phase 5 review F1: guard extended from __arm__ to
+		 * __arm__ || __aarch64__. Without this, the detile primitives
+		 * silently compiled out on aarch64 (fresnel RK3399, ampere
+		 * RK3588, higgs Pi CM5) and the memcpy fall-through delivered
+		 * raw tiled bytes to NV12/P010 image consumers. iter39 5/5
+		 * PASS masked the issue because no 10-bit path was exercised.
+		 */
+#if defined(__arm__) || defined(__aarch64__)
+		/*
+		 * Sunxi tiled_to_planar lives in tiled_yuv.S which is
+		 * #ifdef __arm__, so the symbol is absent on aarch64. Keep
+		 * this branch arm-only; aarch64 Sunxi support would need a C or
+		 * aarch64-ASM port (no Sunxi aarch64 board in current fleet).
+		 */
+#if defined(__arm__)
 		if (!video_format_is_linear(driver_data->video_format))
 			tiled_to_planar(surface_object->destination_data[i],
 					buffer_object->data + image->offsets[i],
 					image->pitches[i], image->width,
 					i == 0 ? image->height :
 						 image->height / 2);
-		else if (driver_data->is_10bit &&
+		else
+#endif
+		if (driver_data->is_10bit &&
 		    image->format.fourcc == VA_FOURCC_P010) {
 			/*
 			 * iter39: rkvdec emits NV15 (4×10-bit packed in 5
@@ -260,12 +297,49 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
 				(uint16_t *)(buffer_object->data + image->offsets[i]),
 				image->width, plane_h,
 				surface_object->destination_bytesperlines[i]);
+		} else if (driver_data->video_format != NULL &&
+			   driver_data->video_format->v4l2_format ==
+				   V4L2_PIX_FMT_NV12_COL128 &&
+			   image->format.fourcc == VA_FOURCC_NV12) {
+			/*
+			 * iter40: Pi 5 rpi-hevc-dec emits NV12_COL128 (SAND
+			 * 128-pixel-wide column tiles). Detile to linear NV12
+			 * via the per-plane primitive. surface_object->
+			 * destination_data[i] is the V4L2 CAPTURE mmap (single
+			 * buffer, planes_count==2): i==0 is the Y plane base,
+			 * i==1 is the UV plane base offset within the SAME
+			 * physical buffer (per cap_pool plane[1] offset = Y
+			 * plane size in COL128 layout).
+			 *
+			 * src_col_stride = destination_bytesperlines[i] = the
+			 * kernel-reported NC12 bytesperline (column stride,
+			 * = ALIGN(image_h, 8) * 3/2). Same for both planes
+			 * since column geometry is plane-agnostic.
+			 *
+			 * dst stride is image->pitches[i] = image->width
+			 * (overridden in RequestCreateImage NC12 branch above).
+			 */
+			if (i == 0) {
+				nv12_col128_detile_y(
+					(uint8_t *)(buffer_object->data + image->offsets[i]),
+					image->pitches[i],
+					surface_object->destination_data[i],
+					surface_object->destination_bytesperlines[i],
+					image->width, image->height);
+			} else {
+				nv12_col128_detile_uv(
+					(uint8_t *)(buffer_object->data + image->offsets[i]),
+					image->pitches[i],
+					surface_object->destination_data[i],
+					surface_object->destination_bytesperlines[i],
+					image->width, image->height / 2);
+			}
 		} else {
 #endif
 			memcpy(buffer_object->data + image->offsets[i],
 			       surface_object->destination_data[i],
 			       surface_object->destination_sizes[i]);
-#ifdef __arm__
+#if defined(__arm__) || defined(__aarch64__)
 		}
 #endif
 	}
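The detile calls in the hunk above take a linear destination with its pitch, a tiled source with its column stride, and the plane's width and row count. As a rough illustration of what such a primitive might do, the sketch below copies one plane row by row out of 128-byte-wide columns. It is not the code in nv12_col128.c: the function name is hypothetical, and it assumes the column stride is expressed in 128-byte rows, as the comment's ALIGN(image_h, 8) * 3/2 figure suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COL_W 128 /* bytes per row within one column */

/*
 * Copy `rows` rows of `width` bytes from a column-tiled plane to a linear one.
 * src points at the first row of this plane inside column 0; col_rows is the
 * number of 128-byte rows in a whole column (Y rows plus UV rows).
 */
static void sketch_col128_detile_plane(uint8_t *dst, size_t dst_stride,
				       const uint8_t *src, size_t col_rows,
				       size_t width, size_t rows)
{
	size_t col_bytes = col_rows * SKETCH_COL_W;

	assert(rows <= col_rows);
	for (size_t y = 0; y < rows; y++) {
		for (size_t x = 0; x < width; x += SKETCH_COL_W) {
			/* The last column may be only partly covered by the image. */
			size_t n = width - x < SKETCH_COL_W ? width - x : SKETCH_COL_W;
			const uint8_t *s = src + (x / SKETCH_COL_W) * col_bytes +
					   y * SKETCH_COL_W;

			memcpy(dst + y * dst_stride + x, s, n);
		}
	}
}
```

Under these assumptions the same routine serves both planes: the Y plane is passed with `rows` equal to the image height, and the interleaved UV plane with `src` advanced by 128 * ALIGN(h, 8) bytes and `rows` equal to half the height, since an NV12 chroma row is also `width` bytes long.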