libva-v4l2-request-fourier/src/nv12_col128.h
claude-noether 3ffa9d0d17 iter40: Pi 5 HEVC chapter — backend integration lands, bit-exact pending
Phase 6 implementation. Backend builds clean on higgs (Debian 13
trixie, aarch64), vainfo lists VAProfileHEVCMain via rpi-hevc-dec,
multi-device probe finds /dev/video19 + /dev/media1, CreateContext
+ S_FMT + REQBUFS + STREAMON all succeed.

Phase 7 partial: infrastructure works, 10 frames flow through the
pipeline (correct byte counts produced — 13824000 for 1280x720 x 10
NV12 frames). But every DQBUF CAPTURE returns V4L2_BUF_FLAG_ERROR
so output content is wrong (libva sha != kdirect sha). The decode
itself is failing on the rpi-hevc-dec side despite all ctrl
submissions returning success.

Code changes:
- request.h: video_fd_rpi_hevc_dec / media_fd_rpi_hevc_dec slots +
  has_hevc_ext_sps_rps_rpi_hevc_dec flag (mirrors iter38 + iter2
  pair-of-flags pattern, naturally false on Pi).
- request.c: known_decoder_drivers gains rpi-hevc-dec; primary-driver
  probe gets an else-if branch setting the new fds (Phase 5 F3);
  request_switch_device_for_profile prefers 'p' for HEVC when
  rpi-hevc-dec present.
- context.c: per-fd want_pixfmt (NC12 on Pi), capture_pixelformat
  taken from video_format slot (not hardcoded NV12/NV15);
  synthetic-SPS pre-seed gated off for Pi (Phase 5 F6);
  destination_sizes uses nv12_col128_uv_plane_offset for NC12 SAND
  layout (Phase 5 F2);
  per-driver HEVC_START_CODE (NONE on Pi, ANNEX_B on RK);
  per-driver context_object->h264_start_code (skip prepend on Pi).
- video.c: NV12_COL128 video_format entry (8-bit SAND, single
  buffer, 2 planes, NV12 drm_format with MOD_NONE so detile branch
  fires rather than tiled_to_planar).
- nv12_col128.c/.h: detile primitive (Y + UV per-plane, kernel
  hevc_d_video.c bytesperline formula + ffmpeg/Kynesim per-pixel
  offset). UV plane offset = 128 * ALIGN(h, 8) — within-column
  (SAND interleaves Y+UV per column, NOT plane-concatenated;
  earlier wrong formula caught by Phase 7 SEGV).
- image.c: #ifdef __arm__ extended to __arm__ || __aarch64__
  (Phase 5 F1 — guard was killing detile path on all aarch64
  hosts including fresnel iter39 NV15 path, masked because 10-bit
  never exercised); RequestCreateImage NC12 → NV12 stride override
  (linear width, not column-stride); copy_surface_to_image NC12
  detile branch (gates on fourcc + v4l2_format).
- nv15.h: fallback V4L2_PIX_FMT_NV15 define (Debian 13 headers
  omit it though they have NC12).
- nv12_col128.h: fallback V4L2_PIX_FMT_NV12_COL128 +
  V4L2_PIX_FMT_NV12_10_COL128 (Arch / mainline pre-Pi headers).
- tests/test_nv12_col128_detile.c: hand-crafted-bytes unit test;
  passes (8 cases: Y + UV for 4 widths incl. 1366 misaligned;
  UV-offset helper).
- meson.build: nv12_col128 sources listed.
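
The NC12 geometry from the nv12_col128.c/.h item above can be sketched as follows. This is a minimal illustration, not the backend's code. It assumes the within-column layout stated there (each 128-byte-wide column holds ALIGN(h, 8) Y rows followed by its UV rows) and a kernel bytesperline of ALIGN(h, 8) * 3 / 2. The sketch_* names and the ALIGN_UP macro are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round v up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(v, a) (((v) + (a) - 1) & ~((a) - 1))

#define SKETCH_COL_WIDTH 128u /* bytes (= 8-bit pixels) per SAND column */

/* Rows per column, Y rows plus UV rows (assumed equal to bytesperline). */
static unsigned int sketch_col_stride(unsigned int image_height)
{
	return ALIGN_UP(image_height, 8u) * 3u / 2u;
}

/* Byte offset of the UV rows from the start of their column. */
static unsigned int sketch_uv_offset_in_column(unsigned int image_height)
{
	return SKETCH_COL_WIDTH * ALIGN_UP(image_height, 8u);
}

/* Total buffer size: col_stride rows of 128 bytes for every column. */
static unsigned int sketch_sizeimage(unsigned int image_width,
				     unsigned int image_height)
{
	assert(image_width > 0 && image_height > 0);
	return sketch_col_stride(image_height) *
	       ALIGN_UP(image_width, SKETCH_COL_WIDTH);
}
```

For 1280x720 both dimensions are already aligned, so the sketch gives 1080 * 1280 = 1382400 bytes per buffer, the same as one linear NV12 frame; ten such frames give the 13824000 bytes quoted above.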

Phase 7 status: not yet bit-exact. Remaining diagnosis: per-frame
S_EXT_CTRLS payload diff vs kdirect (kdirect sends 4 ctrls
SPS+PPS+decode_params+slice_array; ours sends 5 incl. scaling_matrix;
field ordering differs). Likely the slice_array contents need
per-driver handling for rpi-hevc-dec's expected layout. Beyond
in-session reach.

iter38 5/5 baseline on fresnel + ampere should be unaffected (new
fd stays -1 on non-Pi hosts; all gates either short-circuit on
fd-not-present or no-op).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:17:14 +00:00


/*
* V4L2_PIX_FMT_NV12_COL128 (NC12) SAND-tiled → linear NV12 detile.
*
* Pi 5 / CM5 (BCM2712) rpi-hevc-dec CAPTURE format. iter40 (2026-05-17).
*
* Layout (kernel drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
* size-formula + ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h per-pixel
* offset math):
*
*   width      = ALIGN(image_width, 128)  -- columns are 128 px wide
*   height     = ALIGN(image_height, 8)
*   col_stride = height * 3 / 2           -- kernel bytesperline: rows per
*                                            128-byte-wide column (Y + UV)
*   sizeimage  = col_stride * width       -- total bytes
*
* For pixel (x, y) in the Y plane:
* col = x / 128
* in_col_x = x % 128
* offset = col * col_stride * 128 + y * 128 + in_col_x
*
* The UV rows (CbCr interleaved, h/2 rows tall) follow the Y rows inside
* each column, so a column's UV data starts 128 * height bytes after that
* column's base. Y and UV are not stored as two concatenated planes.
*
* The primitive copies the entire image extent at once. width/height are
* the cropped consumer-visible dimensions; src_col_stride is the kernel-
* reported bytesperline (i.e. ALIGN(height,8) * 3/2).
*/
#ifndef _NV12_COL128_H_
#define _NV12_COL128_H_
#include <stdint.h>
#include <linux/videodev2.h>
/*
* Pre-Pi-kernel headers (Arch ALARM linux-api-headers, older mainline
* kernel-headers packages) may not define V4L2_PIX_FMT_NV12_COL128. The
* fourcc is Pi-specific. Provide a private fallback so the backend
* builds on hosts that target NON-Pi codecs too.
*/
#ifndef V4L2_PIX_FMT_NV12_COL128
#define V4L2_PIX_FMT_NV12_COL128 \
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
((unsigned int)('1') << 16) | ((unsigned int)('2') << 24))
#endif
#ifndef V4L2_PIX_FMT_NV12_10_COL128
/* 10-bit SAND variant: 3 pixels packed into 4 bytes in 128-byte / 96-pixel
* wide columns. iter40 references the fourcc for completeness; the 10-bit
* Pi 5 HEVC chapter (Main10) is post-iter40. */
#define V4L2_PIX_FMT_NV12_10_COL128 \
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
((unsigned int)('3') << 16) | ((unsigned int)('0') << 24))
#endif
/* Detile the Y plane of an NC12 source to a linear NV12 Y plane.
* dst : pointer to linear NV12 Y plane (caller-owned, dst_stride * height bytes)
* dst_stride : linear Y plane stride in bytes (= width for plain NV12)
* src_y : pointer to start of NC12 Y plane (= NC12 buffer base)
* src_col_stride: kernel-reported bytesperline (= ALIGN(height,8) * 3/2)
* width, height: cropped image dimensions in pixels
*/
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_y, unsigned int src_col_stride,
unsigned int width, unsigned int height);
/* Detile the UV plane (CbCr interleaved, half-height) of an NC12 source.
* dst : pointer to linear NV12 UV plane
* dst_stride : linear UV plane stride in bytes (= width for NV12)
* src_uv : pointer to start of NC12 UV data (= src_y + nv12_col128_uv_plane_offset())
* src_col_stride: same as Y plane (same column geometry)
* width : Y-plane width in pixels (UV plane has same byte width)
* uv_height : UV plane height = height / 2
*/
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_uv, unsigned int src_col_stride,
unsigned int width, unsigned int uv_height);
/* Compute the offset of the UV plane within an NC12 buffer.
* image_width, image_height: cropped image dimensions in pixels
* Returns: byte offset from buffer start to the first column's UV rows
* (= 128 * ALIGN(image_height, 8); UV sits inside each column, after its Y rows)
*/
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
unsigned int image_height);
#endif /* _NV12_COL128_H_ */
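
A minimal sketch of how the two detile primitives declared above could be implemented, assuming the layout documented in this header: 128-byte-wide columns spaced src_col_stride * 128 bytes apart, with rows 128 bytes apart inside a column. The sketch_* names are hypothetical; the actual nv12_col128.c is not shown here and may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COL_WIDTH 128u /* bytes per SAND column row */

/* Copy `rows` rows of a column-tiled plane into a linear plane.
 * src points at the first row of the plane inside column 0. */
static void sketch_detile_plane(uint8_t *dst, unsigned int dst_stride,
				const uint8_t *src, unsigned int src_col_stride,
				unsigned int width_bytes, unsigned int rows)
{
	assert(dst != NULL && src != NULL);

	for (unsigned int x = 0; x < width_bytes; x += SKETCH_COL_WIDTH) {
		/* The last column may be only partly visible after cropping. */
		unsigned int run = width_bytes - x;
		if (run > SKETCH_COL_WIDTH)
			run = SKETCH_COL_WIDTH;

		/* Columns are src_col_stride rows of 128 bytes apart. */
		const uint8_t *col = src + (size_t)(x / SKETCH_COL_WIDTH) *
					   src_col_stride * SKETCH_COL_WIDTH;

		for (unsigned int y = 0; y < rows; y++)
			memcpy(dst + (size_t)y * dst_stride + x,
			       col + (size_t)y * SKETCH_COL_WIDTH, run);
	}
}

/* Y plane: one byte per pixel, `height` rows. */
static void sketch_detile_y(uint8_t *dst, unsigned int dst_stride,
			    const uint8_t *src_y, unsigned int src_col_stride,
			    unsigned int width, unsigned int height)
{
	sketch_detile_plane(dst, dst_stride, src_y, src_col_stride,
			    width, height);
}

/* UV plane: CbCr interleaved, same byte width as Y, `uv_height` rows. */
static void sketch_detile_uv(uint8_t *dst, unsigned int dst_stride,
			     const uint8_t *src_uv, unsigned int src_col_stride,
			     unsigned int width, unsigned int uv_height)
{
	sketch_detile_plane(dst, dst_stride, src_uv, src_col_stride,
			    width, uv_height);
}
```

A caller would pass the NC12 buffer base as src_y, base + nv12_col128_uv_plane_offset(width, height) as src_uv, and height / 2 as uv_height.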