Commit Graph

18 Commits

Author SHA1 Message Date
claude-noether 3ffa9d0d17 iter40: Pi 5 HEVC chapter — backend integration lands, bit-exact pending
Phase 6 implementation. Backend builds clean on higgs (Debian 13
trixie, aarch64), vainfo lists VAProfileHEVCMain via rpi-hevc-dec,
multi-device probe finds /dev/video19 + /dev/media1, CreateContext
+ S_FMT + REQBUFS + STREAMON all succeed.

Phase 7 partial: infrastructure works, 10 frames flow through the
pipeline (correct byte counts produced — 13824000 for 1280x720 x 10
NV12 frames). But every DQBUF CAPTURE returns V4L2_BUF_FLAG_ERROR
so output content is wrong (libva sha != kdirect sha). The decode
itself is failing on the rpi-hevc-dec side despite all ctrl
submissions returning success.

Code changes:
- request.h: video_fd_rpi_hevc_dec / media_fd_rpi_hevc_dec slots +
  has_hevc_ext_sps_rps_rpi_hevc_dec flag (mirrors iter38 + iter2
  pair-of-flags pattern, naturally false on Pi).
- request.c: known_decoder_drivers gains rpi-hevc-dec; primary-driver
  probe gets an else-if branch setting the new fds (Phase 5 F3);
  request_switch_device_for_profile prefers 'p' for HEVC when
  rpi-hevc-dec present.
- context.c: per-fd want_pixfmt (NC12 on Pi), capture_pixelformat
  taken from video_format slot (not hardcoded NV12/NV15);
  synthetic-SPS pre-seed gated off for Pi (Phase 5 F6);
  destination_sizes uses nv12_col128_uv_plane_offset for NC12 SAND
  layout (Phase 5 F2);
  per-driver HEVC_START_CODE (NONE on Pi, ANNEX_B on RK);
  per-driver context_object->h264_start_code (skip prepend on Pi).
- video.c: NV12_COL128 video_format entry (8-bit SAND, single
  buffer, 2 planes, NV12 drm_format with MOD_NONE so detile branch
  fires rather than tiled_to_planar).
- nv12_col128.c/.h: detile primitive (Y + UV per-plane, kernel
  hevc_d_video.c bytesperline formula + ffmpeg/Kynesim per-pixel
  offset). UV plane offset = 128 * ALIGN(h, 8) — within-column
  (SAND interleaves Y+UV per column, NOT plane-concatenated;
  earlier wrong formula caught by Phase 7 SEGV).
- image.c: #ifdef __arm__ extended to __arm__ || __aarch64__
  (Phase 5 F1 — guard was killing detile path on all aarch64
  hosts including fresnel iter39 NV15 path, masked because 10-bit
  never exercised); RequestCreateImage NC12 → NV12 stride override
  (linear width, not column-stride); copy_surface_to_image NC12
  detile branch (gates on fourcc + v4l2_format).
- nv15.h: fallback V4L2_PIX_FMT_NV15 define (Debian 13 headers
  omit it though they have NC12).
- nv12_col128.h: fallback V4L2_PIX_FMT_NV12_COL128 +
  V4L2_PIX_FMT_NV12_10_COL128 (Arch / mainline pre-Pi headers).
- tests/test_nv12_col128_detile.c: hand-crafted-bytes unit test;
  passes (8 cases: Y + UV for 4 widths incl. 1366 misaligned;
  UV-offset helper).
- meson.build / nv12_col128 sources listed.

Phase 7 status: not yet bit-exact. Remaining diagnosis: per-frame
S_EXT_CTRLS payload diff vs kdirect (kdirect sends 4 ctrls
SPS+PPS+decode_params+slice_array; ours sends 5 incl. scaling_matrix;
field ordering differs). Likely the slice_array contents need
per-driver handling for rpi-hevc-dec's expected layout. Beyond
in-session reach.

iter38 5/5 baseline on fresnel + ampere should be unaffected (new
fd stays -1 on non-Pi hosts; all gates either short-circuit on
fd-not-present or no-op).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:17:14 +00:00
claude-noether 393d02f413 iter2 step3: HEVC EXT_SPS_*_RPS UAPI header + runtime probe
src/hevc-ctrls/v4l2-hevc-ext-controls.h (NEW, MIT, ~95 LOC):
  Verbatim mirror of Linux 7.0 V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
  and _LT_RPS control IDs + struct definitions + flag macros. Each
  symbol is ifndef-guarded so when ampere's linux-api-headers
  eventually bumps to 7.0+, the kernel header takes precedence and
  this shim silently no-ops. Citation block links the upstream
  Casanova v8 series.

  Per LGPL section 3.b, kernel UAPI struct definitions are excepted
  from GPL inheritance, so copying them into MIT userspace is fine.

src/request.h: added has_hevc_ext_sps_rps_rkvdec + _hantro bool
  fields on struct request_data — pair-of-flags layout mirrors
  video_fd_rkvdec / video_fd_hantro (iter38 multi-device-probe
  pattern, per feedback_multi_device_probe_design). Phase 5 review
  identified single-scalar storage as a silent-misbehavior risk
  across device-switch boundaries.

src/request.c:
  - new probe_hevc_ext_sps_rps_controls(fd) helper: queries the two
    new CIDs via VIDIOC_QUERYCTRL; returns true iff both register.
    RK3399 rkvdec (linux 6.x or 7.x without VDPU381/383 bindings)
    returns false; RK3588 rkvdec (VDPU381/383) returns true.
  - probe each driver_data->video_fd_rkvdec / _hantro after the
    iter38 multi-device-probe block at VA_DRIVER_INIT time
  - log-line if rkvdec supports it - diagnostic for Phase 7

src/meson.build: added the new UAPI header to the headers list.

Build verified: ninja -C build clean, .so produced. The new probe
runs at driver init and stores the result, but nothing CONSUMES the
result yet — that's Step 4 (h265_set_controls wiring).

Per ampere-kernel-decoders campaign iter2 Phase 4 step 3 (amended
by Phase 5 review item 'per-fd storage').

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:49:09 +00:00
claude-noether 9f7437e8ee iter2 step2: GLib/GStreamer compat shim, build succeeds
Vendored gsth265parser + nalutils + gstbitreader + gstbytereader (the
Step 1 commit) compile cleanly against libc + libv4l2 only after
adding 1 compat translation unit + 5 stub headers, no edits to the
vendored .c/.h files themselves.

src/h265_parser/gst_compat.{h,c} — new files (MIT, original work):
  - GLib type aliases (gboolean, gchar, gint*, guint*, gsize, gpointer)
  - Memory helpers (g_malloc/g_free as #define free, g_memdup2 inline)
  - Asserts as no-op + parser-return-code-propagation
  - All GST_DEBUG/INFO/WARNING/ERROR/LOG/FIXME as no-ops (the parser
    is heavy on debug logging; we compile it all out)
  - GArray implementation (~100 LOC, just enough for gsth265parser.c's
    24 call sites)
  - GList full struct with .data/.next/.prev so callers compile;
    list-manipulation functions abort() — dead code paths only
  - Byte-order read/write macros (GST_READ_UINT8/16/24/32/64_LE/BE,
    GST_WRITE_UINT8/16/24/32_BE) — aarch64 LE inlines
  - g_once_init_enter/leave as simple gate
  - G_MAXUINT*, G_MAXINT*, G_MINxxx, G_GNUC_* attribute macros, etc.
  - Opaque GstBuffer/GstMemory/GstMapInfo + abort-stub functions for
    the encoder-side SEI-insertion paths the libva backend never invokes
  - gst_util_ceil_log2 real impl (used by slice-header parser; dead
    for our SPS-only call path but cheaper to implement than stub)

src/h265_parser/gst/{gst.h,base/base-prelude.h,base/gstbitwriter.h,
codecparsers/codecparsers-prelude.h,glib-compat-private.h} — 5 new
stub headers (MIT). All include gst_compat.h. gstbitwriter.h adds
abort-stub functions for the bit-writer API (used by nalutils.c's NAL
emulation-prevention encoder path — dead code for the parse-only
libva backend).

src/meson.build — added the 5 new .c source files and 10 new .h
headers; added include_directories('h265_parser') to the include path
so the vendored files' '#include <gst/base/...>' style references
resolve to the stub headers + actual vendored files in the local
tree.

Build verified: ninja -C build produces v4l2_request_drv_video.so
(682 KB, up from 485 KB pre-vendor — the +200 KB is the vendored
parser code). nm shows gst_h265_parse_sps, gst_h265_parse_sps_ext,
gst_h265_parser_identify_nalu, and the other functions we need for
Step 4 are present in the binary.

Two #warning messages from gsth265parser.h about API stability are
upstream-intentional and harmless ('The H.265 parsing library is
unstable API and may change in future').

This commit completes Step 2 of ampere-kernel-decoders iter2 Phase 6.
Backend remains functionally identical to pre-iter2 — the new code
compiles + links but is not yet called from h265_set_controls (that's
Step 4). Existing 5 codecs continue to work as before.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:49:06 +00:00
claude-noether 662f8874ba iter39 α-31: H264 Hi10P + HEVC Main10 sub-profile support (10-bit, rkvdec NV15)
Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request
backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit
values packed in 5 bytes per element); VAAPI consumers receive standard
VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image.

VP9 Profile 2 explicitly NOT added — RK3399 rkvdec kernel ctrl table
caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs).

Touchpoints (per Phase 5 sonnet-architect review amendments):
  - include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it)
  - src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per
    Documentation/userspace-api/media/v4l/pixfmt-nv15.rst)
  - src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find)
  - src/codec.c: pixelformat_for_profile cases for Hi10P + Main10
  - src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10
    advertisement for 10-bit profiles
  - src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic
    SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth
    transition (sibling to iter38 device-switch invalidation), is_10bit flag
  - src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export
  - src/image.c: P010 reporting in DeriveImage + QueryImageFormats,
    P010-aware sizing in CreateImage, NV15 → P010 unpack call in
    copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010)
  - src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC
    per-codec paths
  - src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag

Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi
hwdownload, mpv vaapi-copy, and any consumer using vaGetImage works
end-to-end. PRIME-path consumers that only know NV12/P010 must use the
COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the
correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is
follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a
GPU-accessible buffer).

Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0,
libdrm 2.4.133): clean build, .so produced, 0 new warnings.

Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec.
rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15;
ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not
in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196,
rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:15:16 +00:00
claude-noether 1c548b136a fresnel-fourier iter5b-β Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper
Re-introduce after the iter5b-α' revert. Helper maps VAProfile to V4L2
OUTPUT-side FOURCC, used at CreateConfig in commit B to populate the
previously-dead object_config->pixelformat field. β reads from there
at CreateContext (commit C).

Single source of truth for the profile→pixelformat mapping; mirrors
the per-profile probes in config.c::RequestQueryConfigProfiles
(lines 138-188).

Register codec.c in meson.build sources, codec.h in headers.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 14:10:46 +00:00
claude-noether 6bc29ec582 Revert "fresnel-fourier iter5b Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper"
This reverts commit ce304ef5af.
2026-05-12 12:32:57 +00:00
claude-noether ce304ef5af fresnel-fourier iter5b Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper
Add a small helper that maps a VAProfile to its V4L2 OUTPUT-side
pixel format FOURCC. Single source of truth, mirrors the per-profile
probes in config.c::RequestQueryConfigProfiles (lines 138-188).

Used by commits B + C in this series:
- commit B: populate object_config->pixelformat at CreateConfig
- commit C: surface.c reads the populated field to set OUTPUT format
  per-profile instead of hardcoded H264_SLICE

Register in meson.build sources + headers.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 09:04:02 +00:00
claude-noether 406d08e122 fresnel-fourier iter4 Phase 6 commit B: NEW src/vp9.c + src/vp9.h + meson.build + context.h (vp9_lf) + surface.h (params.vp9)
VP9 codec dispatcher implementing 12 contract clauses against
V4L2_CID_STATELESS_VP9_FRAME (0xa40a2c) +
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (0xa40a2d). 2 batched
controls per frame; rkvdec on RK3399 mandatorily requires both
per drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble:752.

Implementation:
- ~80 LOC VPX range coder (vp9_rac_*) — minimal port of FFmpeg
  vpx_rac.[ch] + vp89_rac.h. Stateless static helpers.
- inv_map_table[255] + read_prob_delta — verbatim copy from
  v4l2_request_vp9.c:44-97.
- vp9_parse_uncompressed_header_lf_quant — partial parse for the
  fields VAAPI doesn't expose: lf_delta_enabled / lf_delta_update /
  lf_ref_delta[4] / lf_mode_delta[2] / base_q_idx /
  delta_q_y_dc / delta_q_uv_dc / delta_q_uv_ac. ~120 LOC.
- vp9_fill_compressed_hdr — port of FFmpeg fill_compressed_hdr
  with Phase 5 C3 out_reference_mode parameter. ~140 LOC.
- vp9_set_controls — orchestrates Clauses 1+2+4+5+7+10+11+12.
  ~120 LOC.

Phase 5 amendments incorporated in code:
- C1: frame.interpolation_filter = direct from VAAPI's
  mcomp_filter_type (NO XOR; vaapi_vp9.c:62 already applied it
  before storing into VAAPI's mcomp_filter_type).
- C2: persistent vp9_lf state added to object_context (in
  context.h). Initialized to VP9 spec defaults
  {1,0,-1,-1,0,0} on keyframe / intra_only / error_resilient.
  Updated only when parser sees lf_delta.update=1. Always
  copied to kernel control.
- C3: vp9_fill_compressed_hdr takes uint8_t *out_reference_mode;
  threaded through call site. allowcompinter derived from VAAPI
  sign-bias bits.

Phase 5 S4: uv_mode memcpy from FFmpeg's fill_compressed_hdr
omitted — rkvdec reads uv_mode from kernel's persistent
probability_tables, NOT from prob_updates ctrl.

Clause 3 compile-time _Static_assert on struct sizes (168/2040)
matches Phase 3 empirical baseline; UAPI shifts will fail loudly.

surface.h: extends params union with vp9 { picture, slice }.
context.h: adds vp9_lf { ref_deltas[4], mode_deltas[2], initialized }.
meson.build: adds vp9.c + vp9.h.

Build: clean on fresnel (linux-fresnel-fourier 7.0-1, libva 1.23).
Runtime: not yet wired in picture.c — next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:46:11 +00:00
claude-noether 017e27f389 fresnel-fourier iter3 Phase 6 commit B: NEW src/vp8.c + src/vp8.h
+ meson.build VP8 entries

Net-new VP8 codec dispatcher implemented against
V4L2_CID_STATELESS_VP8_FRAME (kernel UAPI <linux/v4l2-controls.h>:
1900-1958). Single batched control per frame, no init-time device-
wide menus (VP8 has no DECODE_MODE/START_CODE).

Per-frame submission: ONE VIDIOC_S_EXT_CTRLS, count=1, with full
v4l2_ctrl_vp8_frame struct (1232 bytes — corrected vs Phase 2
implicit ~400 estimate; entropy.coeff_probs[4][8][3][11] alone is
1056 bytes).

vp8_set_controls() implements 10 contract clauses per
phase4_iter3_plan.md:

  Clause 1: single-control batched submission (count=1)
  Clause 2: stack alloc + memset zero (covers all padding)
  Clause 3: width/height/version/per-frame scalars; off-by-one
            num_dct_parts = num_of_partitions - 1
  Clause 4: DPB timestamp resolution (3 refs: last/golden/alt;
            NULL surface → 0-sentinel via memset; mirrors iter1
            mpeg2.c::pic.forward_ref_ts)
  Clause 5: loop filter (6 fields + 3 flag bits; ADJ_ENABLE/
            DELTA_UPDATE/FILTER_TYPE_SIMPLE)
  Clause 6: quant base + delta derivation from VAAPI's per-segment
            absolute index matrix (subtraction recovers signed
            deltas; correct for typical content per Phase 5 S1)
  Clause 7: segment fields (segment_probs direct copy; flags
            assembled with DELTA_VALUE_MODE set unconditionally
            per FFmpeg pattern)
  Clause 8: entropy table — 3 VAAPI sources merged (Picture: y_mode +
            uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3]
            [11] direct memcpy; IQMatrix: quant)
  Clause 9: coder state + first-partition fields + flags assembly
  Clause 10: v4l2_set_controls submission

Phase 5 review amendments incorporated:

  C1 first_part_header_bits = slice->macroblock_offset
     NOT 0 — kernel hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_
     dec.c:372 read this field unconditionally to compute the MB-
     data DMA offset. Verified via source identity: vaapi_vp8.c:204
     and v4l2_request_vp8.c:83 use byte-identical formulas
     (8 * (input - data) - bit_count - 8); VAAPI exposes via
     slice->macroblock_offset, V4L2 names it first_part_header_bits.

  C2 first_part_size = slice->partition_size[0] +
                       ((macroblock_offset + 7) / 8)
     VAAPI's partition_size[0] is the REMAINING bytes after parsing
     (vaapi_vp8.c:209; va_dec_vp8.h:193-196). Kernel needs the
     TOTAL control partition size; recover by adding back ceil
     (macroblock_offset/8) bytes.
     Phase 3 keyframe verbatim cross-check: 21923 + 819 = 22742 ✓

  C4 (int8_t) cast (NOT (s8); s8 is kernel-internal typedef from
     <linux/types.h> not exposed to userspace; userspace UAPI
     exposes __s8 with double-underscore; portable userspace cast
     is int8_t from <stdint.h>).

  S3 assert(probability_set) — kernel hantro_vp8.c::hantro_vp8_
     prob_update reads coeff_probs unconditionally; NO default-
     table fallback. Practical risk low (FFmpeg vaapi_vp8.c always
     sends VAProbabilityBufferType per frame), but assert surfaces
     immediately if a future consumer doesn't.

Flags assembly: 6 mainline-documented bits only (KEY_FRAME, SHOW_
FRAME, MB_NO_SKIP_COEFF, SIGN_BIAS_GOLDEN, SIGN_BIAS_ALT). EXP +
bit 0x40 NOT replicated despite ffmpeg-v4l2-request-git setting
them on inter frames — kernel hantro_vp8.c only inspects KEY_FRAME
bit. SHOW_FRAME forced unconditional per Phase 3 Q4 (BBB has no
alt-ref invisible frames; documented fidelity gap).

VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec.
Backend writes V4L2_VP8_FRAME_FLAG_KEY_FRAME iff
!picture->pic_fields.bits.key_frame.

After this commit alone: vp8.o compiles standalone; meson.build
links it into the shared library. picture.c can't dispatch yet
(commit C wires that).

Refs:
  ../fresnel-fourier/phase4_iter3_plan.md (10 contract clauses,
                                            Phase 5 amendments
                                            section)
  ../fresnel-fourier/phase5_iter3_review.md (C1, C2, C3, C4, S3
                                              all incorporated)
  ../fresnel-fourier/phase3_iter3_baseline.md (verbatim payload
                                                anchors)
  references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp8.c (V4L2 ref)
  references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c (VAAPI source ref)
  references/linux-mainline/drivers/media/platform/verisilicon/
    hantro_g1_vp8_dec.c (RK3399 kernel driver — first_part_header_
    bits + first_part_size usage)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:51:12 +00:00
claude-noether 8d71e20bf7 fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API
Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC
dispatch + per-slice accumulation against the modern split V4L2_CID_
STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,
DECODE_MODE,START_CODE} stateless controls. Replaces the staging-era
V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that were
removed from the kernel UAPI.

Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5,
ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS:

  0xa40a90 SPS            (40  bytes)
  0xa40a91 PPS            (64  bytes)
  0xa40a92 SLICE_PARAMS   (variable; dynamic-array; one entry per slice)
  0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list)
  0xa40a94 DECODE_PARAMS  (328 bytes; per-frame DPB info)

Plus device-wide menus set once at context.c init (separate batched
S_EXT_CTRLS call so a kernel without HEVC controls — e.g. hantro on
RK3568/RK3399 — silently fails its batch without invalidating H.264):

  0xa40a95 DECODE_MODE  (FRAME_BASED on rkvdec)
  0xa40a96 START_CODE   (ANNEX_B on rkvdec)

Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
           (v4l2_request_hevc_queue_decode batched submission shape).

Phase 5 review amendments incorporated:

  C1 (data_byte_offset NOT data_bit_offset):
    Old h265.c at lines 184-209 ran an 8-bit search to compute
    bit-granularity offset. New API renames the field to
    data_byte_offset (u32 byte offset). Bit-search dropped; replaced
    with plain byte offset = source_offset + slice->slice_data_byte_offset.

  C2 (dpb_entry.flags only LONG_TERM_REFERENCE; pic_order_cnt_val
      singular; poc_st_curr_*[] arrays hold DPB INDICES not POC):
    h265_fill_decode_params replaces old slice-params DPB iteration
    with explicit DPB classification + index-array population.
    For each VAAPI ReferenceFrames[i]:
      - Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via
        VA_PICTURE_HEVC_RPS_* flags.
      - Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic.
      - Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR.
      - Append j (DPB index, u8) to poc_st_curr_before[k] /
        poc_st_curr_after[k] / poc_lt_curr[k] based on classification.

  C3 (union-aliasing reasoning corrected):
    BeginPicture's params.h265.num_slices = 0 reset is benign for
    non-HEVC profiles because byte ~17764 of the params union is past
    any field non-HEVC profiles read, NOT because RenderPicture's
    per-buffer copies overwrite that location. Wording amended in
    phase4_iter2_plan.md per phase5_iter2_review.md.

  S1 (PPS flags 19 + 20 — DEBLOCKING_FILTER_CONTROL_PRESENT and
      UNIFORM_SPACING):
    Empirically VAAPI does NOT expose either flag in the
    VAPictureParameterBufferHEVC pic_fields.bits or
    slice_parsing_fields.bits. Both bits left zero. BBB-720p10s_hevc
    fixture uses neither tiles nor explicit deblocking-control
    parameters, so the omission is correct for the iter2 binding cell.

  S2 (3 PPS scalars added):
    pic_parameter_set_id (default 0; VAAPI doesn't expose),
    num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_
    active_minus1 (both populated from VAAPI picture struct).

  Q2 (slice_segment_addr populated):
    Was missing in old h265.c. Now sourced from
    VAAPI's slice->slice_segment_address.

  S3 (SCALING_MATRIX content choice):
    Implementer choice taken: when iqmatrix_set==false (BBB has no
    scaling list per SPS flags = SAO|STRONG_INTRA_SMOOTHING),
    h265_fill_scaling_matrix sends memset-zero. Matches FFmpeg's
    sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserves
    byte-equality vs cross-validator anchor).

  S4 (FFmpeg function name fix): cosmetic; no code impact.

Plus one Phase 6 inline correction: phase 5 review S1 suggested
VAAPI exposes uniform_spacing_flag in pic_fields.bits; empirical
test-compile shows it doesn't. Comment added in h265_fill_pps
documenting the omission.

Picture.c changes (3 edits):

  1. codec_set_controls HEVCMain dispatch (lines 204-206 → call
     h265_set_controls; replaces explicit Fourier-local: HEVC stripped
     reject).
  2. codec_store_buffer HEVC VASliceParameterBufferType case: append
     VAAPI slice param to params.h265.slices[N] array, increment
     num_slices. Single-slice mirror at .slice retained for
     h265_fill_pps (which reads dependent_slice_segment_flag from
     LongSliceFlags).
  3. RequestBeginPicture: add params.h265.num_slices = 0 reset
     alongside existing h264.matrix_set = false reset.

Surface.h: extend params.h265 struct with slices[HEVC_MAX_SLICES_PER_
FRAME=64] array + num_slices counter. ~17 KB extra per surface union;
24 surfaces in iter7 cap_pool = ~400 KB total surface_heap growth.
object_heap allocator picks up new size automatically via
sizeof(struct object_surface).

Context.c: separate 2-control batched call sets HEVC DECODE_MODE +
START_CODE device-wide. Same best-effort (void)v4l2_set_controls
pattern as the existing H.264 device-init block; if kernel doesn't
advertise HEVC controls (hantro on RK3568/RK3399), the batch silently
fails without invalidating the H.264 batch.

Meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73)
in sources + headers lists.

H265.h: added HEVC_MAX_SLICES_PER_FRAME=64 #define before struct
forward declarations.

Phase 6 smoke test on fresnel (post Commit A + Commit B):

  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec env binding
              (/dev/video1 + /dev/media0). PASS.

  Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4
              -frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots
              ready. PASS.

  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture:
    HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    HW=SW byte-identical for both frames; frame1 != frame2 (real motion).
    PASS.

  Criterion 5: regression hashes hold for both prior cells:
    H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
    H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
    MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
    MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)
    PASS.

All five criteria green on first build attempt — Phase 5 review
caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset
rename; dpb.rps field gone + pic_order_cnt_val rename + index-array
semantics) that would have been Phase 6 compile failures or silent
Phase 7 byte-compare divergences. Without that review pass, this
commit would have been the start of a 2+ loopback debugging cycle.

Refs:
  ../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses,
                                            File 4 patch shape)
  ../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2,
                                              S3, S4, Q2 amendments
                                              all incorporated)
  ../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/
    ffmpeg_v4l2req.stdout (cross-validator anchor — Phase 7
    bonus byte-compare verification target)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:58:34 +02:00
test0r 19acc76da4 iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.

Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
  (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
  {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
  oldest FREE slot (LRU), binds it for the decode cycle, and the slot
  cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
  vaDestroySurfaces.

Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
  exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
  16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
  cap_pools -- iter3 scope).

Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
  EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
  cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
  path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
  NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).

Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:03:31 +00:00
test0r 9de1be34ef h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields
The load-bearing fix from diff_against_ffmpeg.md (campaign repo).

Adds src/h264_slice_header.{c,h} — a minimal H.264 slice_header()
bit-parser per ITU-T H.264 (08/2024) §7.3.3. Parses just enough of
the slice header to populate the V4L2 DECODE_PARAMS fields VAAPI
doesn't carry and that hantro G1 hardware reads directly out of
DECODE_PARAMS into MMIO registers:

  dec_param->dec_ref_pic_marking_bit_size  -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
  dec_param->idr_pic_id                    -> G1_REG_DEC_CTRL5_IDR_PIC_ID
  dec_param->pic_order_cnt_bit_size        -> G1_REG_DEC_CTRL6_POC_LENGTH
  dec_param->pic_order_cnt_lsb             -> hantro reflist builder (poc_type=0)
  dec_param->delta_pic_order_cnt_bottom    -> same
  dec_param->delta_pic_order_cnt0/1        -> hantro reflist builder (poc_type=1)

Without these set correctly, hantro's hardware bitstream parser
walks past zero bits in the slice header, lands on garbage, decodes
zero pixels — the all-zero CAPTURE output observed across both mpv
and Firefox during 2026-05-04 Phase 0 (see libva-multiplanar campaign
phase0_evidence/2026-05-04-kernel-trace/findings.md).

Implementation:
- Minimal RBSP bit reader (br_read_u/_ue/_se), MSB-first, fault-flag
  on overrun.
- Emulation-prevention unescape (strips 0x03 after 0x00 0x00) on
  the first 64 bytes of the slice — slice headers fit comfortably.
- Walks slice_header() up to and including dec_ref_pic_marking(),
  measuring bit positions for the *_bit_size fields.
- Skips ref_pic_list_modification() and pred_weight_table() —
  needed only to advance the bit position to dec_ref_pic_marking().
- Returns a struct with the V4L2 fields plus diagnostics
  (first_mb_in_slice, slice_type, pps_id, frame_num).

Wired into h264_va_picture_to_v4l2 (src/h264.c) right after the
nal_ref_idc/nal_unit_type extraction. SPS/PPS context is built from
VAPicture's seq_fields and pic_fields; num_ref_idx_l0/l1_active
defaults come from VASlice (best available substitute for the
parsed PPS values). On parse success, populates decode_params with
the recovered values + emits a request_log with the decoded fields
for cross-validation against VAAPI's pre-parsed values.

src/meson.build: adds h264_slice_header.{c,h} to sources.

Cross-references:
- FFmpeg libavcodec/h264_slice.c (Kwiboo v4l2-request-n8.1) — populates
  H264SliceContext::ref_pic_marking_bit_size / pic_order_cnt_bit_size
  by the same bit-precise parse, then v4l2_request_h264.c forwards
  to V4L2.
- Linux drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
  set_params() — the register-write code that reads these fields.

MVC nal_unit_type 20/21 unhandled (this fork strips MVC alongside
HEVC). Multi-slice non-IDR streams parse the first slice's header
only; for FRAME_BASED mode that's fine — kernel sees the whole
bitstream and parses subsequent slices itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:34:47 +00:00
test0r 565f5c0de4 context: introduce request_pool, decouple OUTPUT buffers from surfaces
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
Replaces the prior per-surface OUTPUT-buffer ownership model with a
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
in flight), allocated unconditionally regardless of caller's
num_render_targets.

Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
"output_count == surfaces_count" model was a category error: OUTPUT
buffers are request-time bitstream slots, CAPTURE buffers are
picture-time DPB slots; their lifecycles and sizing are independent.

Changes:
  * NEW src/request_pool.{c,h} (~200 LoC):
      - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
      - request_pool_destroy(): munmap all, idempotent.
      - request_pool_acquire(): round-robin claim; returns V4L2 buffer
        index of an unused slot or -1.
      - request_pool_release(): mark slot free for reuse.
      - request_pool_slot(): accessor for ptr/size given a buffer index.

  * src/request.h: add struct request_pool output_pool to request_data.

  * src/context.c::RequestCreateContext: replace the per-surface
    OUTPUT loop with a single request_pool_init() call (count=4,
    independent of surfaces_count). Drop the now-unused locals
    (length, offset, source_data, output_buffers_count, index,
    index_base, i, surface_object). DELETES patch 0002's
    "output_buffers_count = ... ? ... : 4" hack inline — the pool's
    own count parameter supersedes it.

  * src/picture.c::RequestBeginPicture: borrow a pool slot at frame
    start, write its mmap pointer/size/index into the surface's
    transient source_* fields. The fields stay (still useful as
    a borrow handle that the existing codec_store_buffer memcpys
    target), but no longer represent surface-permanent ownership.
    Reset slices_size/slices_count here too (was implicit on first
    Render).

  * src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
    the OUTPUT buffer, release the pool slot and clear the surface's
    borrow handle. Fixes the segv on second-frame submission.

  * src/surface.c::RequestDestroySurfaces: remove the munmap of
    source_data — pool owns the mmap.

  * src/request.c::RequestTerminate: call request_pool_destroy()
    before close(video_fd) so munmaps still target a valid fd.

  * src/meson.build: add request_pool.c and request_pool.h to the
    sources/headers lists.

This commit removes 0002's OUTPUT-pool hack inline (the
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
remains until commit 4 lands.

Build-verified clean on aarch64.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r c45fea96e3 fourier-local: stateless control modernization + HEVC strip
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:

- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
  V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
  (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
  passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
  (kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
  build path is excluded from this fork (RK3568 hantro G1/G2 has
  no HEVC, and the kernel-side HEVC controls have a separate
  rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
  source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
  the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
  includes it; the passthrough shim's CID aliases live in the
  kernel header now).

Functionally equivalent to the prior fork master commits:
  c1f5108 V4L2_PIX_FMT_H264_SLICE rename
  4ccbfe9 Strip HEVC build path
  da9f2a5 include/h264-ctrls.h passthrough + CID aliases
  fc4bb10 src/h264.c track upstream UAPI shape
  13e9b64 src/h264.c drop num_slices field
  4d14ffb src/tiled_yuv.S aarch64 stub
  1b02c9b src/h264.c include utils.h

Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:40:14 +00:00
Chen-Yu Tsai a3c2476de1 Respect libdir for install path
Distritubions may install libraries under architecture-specific
sub-directories, to support multiple architectures on the same
system. In addition, the user may not wish to install the library
with the default prefix.

Use the libdir variable when setting the install path. This
allows both specifying different sub-directories, and a different
prefix.
2019-05-17 13:59:26 +08:00
Chen-Yu Tsai 3264c0495c Add option to specify path to up-to-date kernel headers
The system normally has kernel headers shipped with the distribution.
These typically lag behind actual kernel releases. Thus they would not
have the latest API additions, such as the V4L2 request API this driver
uses.

However, it is also bad practice to just install new kernel headers into
the system wide default location, as there may be some differences
between it and what the C library was built against.

Add an option to specify a path to a set of up-to-date kernel headers.
This would allow the user to build this project in a safe but working
environment.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2019-05-17 13:59:23 +08:00
Paul Kocialkowski 3176adf69c Include local copies of DRM and V4L2 codec definitions
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Paul Kocialkowski ca5198b429 Add support for the meson build system
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00