3ffa9d0d175a3831f83188c6078b993e3985fc6e
18 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
3ffa9d0d17 |
iter40: Pi 5 HEVC chapter — backend integration lands, bit-exact pending
Phase 6 implementation. Backend builds clean on higgs (Debian 13 trixie,
aarch64), vainfo lists VAProfileHEVCMain via rpi-hevc-dec, multi-device
probe finds /dev/video19 + /dev/media1, and CreateContext + S_FMT +
REQBUFS + STREAMON all succeed.
Phase 7 partial: infrastructure works, 10 frames flow through the
pipeline (correct byte counts produced: 13824000 for 1280x720 x 10 NV12
frames). But every DQBUF CAPTURE returns V4L2_BUF_FLAG_ERROR, so output
content is wrong (libva sha != kdirect sha). The decode itself is failing
on the rpi-hevc-dec side despite all ctrl submissions returning success.
Code changes:
- request.h: video_fd_rpi_hevc_dec / media_fd_rpi_hevc_dec slots +
has_hevc_ext_sps_rps_rpi_hevc_dec flag (mirrors iter38 + iter2
pair-of-flags pattern, naturally false on Pi).
- request.c: known_decoder_drivers gains rpi-hevc-dec; primary-driver
probe gets an else-if branch setting the new fds (Phase 5 F3);
request_switch_device_for_profile prefers 'p' for HEVC when rpi-hevc-dec
present.
- context.c: per-fd want_pixfmt (NC12 on Pi), capture_pixelformat taken
from video_format slot (not hardcoded NV12/NV15); synthetic-SPS pre-seed
gated off for Pi (Phase 5 F6); destination_sizes uses
nv12_col128_uv_plane_offset for NC12 SAND layout (Phase 5 F2); per-driver
HEVC_START_CODE (NONE on Pi, ANNEX_B on RK); per-driver
context_object->h264_start_code (skip prepend on Pi).
- video.c: NV12_COL128 video_format entry (8-bit SAND, single buffer,
2 planes, NV12 drm_format with MOD_NONE so detile branch fires rather
than tiled_to_planar).
- nv12_col128.c/.h: detile primitive (Y + UV per-plane, kernel
hevc_d_video.c bytesperline formula + ffmpeg/Kynesim per-pixel offset).
UV plane offset = 128 * ALIGN(h, 8), within-column (SAND interleaves
Y+UV per column, NOT plane-concatenated; earlier wrong formula caught by
Phase 7 SEGV).
- image.c: #ifdef __arm__ extended to __arm__ || __aarch64__ (Phase 5 F1:
guard was killing detile path on all aarch64 hosts including fresnel
iter39 NV15 path, masked because 10-bit never exercised);
RequestCreateImage NC12 → NV12 stride override (linear width, not
column-stride); copy_surface_to_image NC12 detile branch (gates on
fourcc + v4l2_format).
- nv15.h: fallback V4L2_PIX_FMT_NV15 define (Debian 13 headers omit it
though they have NC12).
- nv12_col128.h: fallback V4L2_PIX_FMT_NV12_COL128 +
V4L2_PIX_FMT_NV12_10_COL128 (Arch / mainline pre-Pi headers).
- tests/test_nv12_col128_detile.c: hand-crafted-bytes unit test; passes
(8 cases: Y + UV for 4 widths incl. 1366 misaligned; UV-offset helper).
- meson.build / nv12_col128 sources listed.
Phase 7 status: not yet bit-exact. Remaining diagnosis: per-frame
S_EXT_CTRLS payload diff vs kdirect (kdirect sends 4 ctrls
SPS+PPS+decode_params+slice_array; ours sends 5 incl. scaling_matrix;
field ordering differs). Likely the slice_array contents need per-driver
handling for rpi-hevc-dec's expected layout. Beyond in-session reach.
iter38 5/5 baseline on fresnel + ampere should be unaffected (new fd
stays -1 on non-Pi hosts; all gates either short-circuit on
fd-not-present or no-op).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
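The byte count and the within-column UV offset quoted in this message can be checked with a few lines of C. This is a minimal sketch: the helper names are hypothetical (the message names nv12_col128_uv_plane_offset but does not show its signature), and only the two formulas stated above are reproduced.

```c
#include <stddef.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Bytes in one linear 8-bit NV12 frame: a full-size Y plane followed by
 * a half-size interleaved UV plane. */
static size_t nv12_frame_bytes(size_t width, size_t height)
{
    return width * height + (width * height) / 2;
}

/* Offset of the UV data inside one 128-byte-wide SAND column, using the
 * formula quoted in the message: 128 * ALIGN(h, 8). */
static size_t col128_uv_offset_in_column(size_t height)
{
    return 128 * ALIGN_UP(height, 8);
}
```

Ten 1280x720 frames come to 10 * 1382400 bytes, which matches the 13824000 figure reported for the pipeline output.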
||
|
|
393d02f413 |
iter2 step3: HEVC EXT_SPS_*_RPS UAPI header + runtime probe
src/hevc-ctrls/v4l2-hevc-ext-controls.h (NEW, MIT, ~95 LOC):
Verbatim mirror of Linux 7.0 V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
and _LT_RPS control IDs + struct definitions + flag macros. Each
symbol is ifndef-guarded so when ampere's linux-api-headers
eventually bumps to 7.0+, the kernel header takes precedence and
this shim silently no-ops. Citation block links the upstream
Casanova v8 series.
Per LGPL section 3.b, kernel UAPI struct definitions are excepted
from GPL inheritance, so copying them into MIT userspace is fine.
src/request.h: added has_hevc_ext_sps_rps_rkvdec + _hantro bool
fields on struct request_data — pair-of-flags layout mirrors
video_fd_rkvdec / video_fd_hantro (iter38 multi-device-probe
pattern, per feedback_multi_device_probe_design). Phase 5 review
identified single-scalar storage as a silent-misbehavior risk
across device-switch boundaries.
src/request.c:
- new probe_hevc_ext_sps_rps_controls(fd) helper: queries the two
new CIDs via VIDIOC_QUERYCTRL; returns true iff both register.
RK3399 rkvdec (linux 6.x or 7.x without VDPU381/383 bindings)
returns false; RK3588 rkvdec (VDPU381/383) returns true.
- probe each driver_data->video_fd_rkvdec / _hantro after the
iter38 multi-device-probe block at VA_DRIVER_INIT time
- log-line if rkvdec supports it - diagnostic for Phase 7
src/meson.build: added the new UAPI header to the headers list.
Build verified: ninja -C build clean, .so produced. The new probe
runs at driver init and stores the result, but nothing CONSUMES the
result yet — that's Step 4 (h265_set_controls wiring).
Per ampere-kernel-decoders campaign iter2 Phase 4 step 3 (amended
by Phase 5 review item 'per-fd storage').
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
9f7437e8ee |
iter2 step2: GLib/GStreamer compat shim, build succeeds
Vendored gsth265parser + nalutils + gstbitreader + gstbytereader (the
Step 1 commit) compile cleanly against libc + libv4l2 only after
adding 1 compat translation unit + 5 stub headers, no edits to the
vendored .c/.h files themselves.
src/h265_parser/gst_compat.{h,c} — new files (MIT, original work):
- GLib type aliases (gboolean, gchar, gint*, guint*, gsize, gpointer)
- Memory helpers (g_malloc/g_free as #define free, g_memdup2 inline)
- Asserts as no-op + parser-return-code-propagation
- All GST_DEBUG/INFO/WARNING/ERROR/LOG/FIXME as no-ops (the parser
is heavy on debug logging; we compile it all out)
- GArray implementation (~100 LOC, just enough for gsth265parser.c's
24 call sites)
- GList full struct with .data/.next/.prev so callers compile;
list-manipulation functions abort() — dead code paths only
- Byte-order read/write macros (GST_READ_UINT8/16/24/32/64_LE/BE,
GST_WRITE_UINT8/16/24/32_BE) — aarch64 LE inlines
- g_once_init_enter/leave as simple gate
- G_MAXUINT*, G_MAXINT*, G_MINxxx, G_GNUC_* attribute macros, etc.
- Opaque GstBuffer/GstMemory/GstMapInfo + abort-stub functions for
the encoder-side SEI-insertion paths the libva backend never invokes
- gst_util_ceil_log2 real impl (used by slice-header parser; dead
for our SPS-only call path but cheaper to implement than stub)
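For illustration, the byte-order helpers in such a shim can be written as plain inline functions. The names below are hypothetical stand-ins, not the GST_READ_* / GST_WRITE_* macros themselves; building values byte by byte keeps them correct on any host endianness.

```c
#include <stdint.h>

/* Read big-endian values one byte at a time (no alignment assumptions). */
static inline uint16_t read_u16_be(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static inline uint32_t read_u24_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

static inline uint32_t read_u32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Little-endian counterpart. */
static inline uint32_t read_u32_le(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | p[0];
}

/* Store a 32-bit value in big-endian byte order. */
static inline void write_u32_be(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```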
src/h265_parser/gst/{gst.h,base/base-prelude.h,base/gstbitwriter.h,
codecparsers/codecparsers-prelude.h,glib-compat-private.h} — 5 new
stub headers (MIT). All include gst_compat.h. gstbitwriter.h adds
abort-stub functions for the bit-writer API (used by nalutils.c's NAL
emulation-prevention encoder path — dead code for the parse-only
libva backend).
src/meson.build — added the 5 new .c source files and 10 new .h
headers; added include_directories('h265_parser') to the include path
so the vendored files' '#include <gst/base/...>' style references
resolve to the stub headers + actual vendored files in the local
tree.
Build verified: ninja -C build produces v4l2_request_drv_video.so
(682 KB, up from 485 KB pre-vendor — the +200 KB is the vendored
parser code). nm shows gst_h265_parse_sps, gst_h265_parse_sps_ext,
gst_h265_parser_identify_nalu, and the other functions we need for
Step 4 are present in the binary.
Two #warning messages from gsth265parser.h about API stability are
upstream-intentional and harmless ('The H.265 parsing library is
unstable API and may change in future').
This commit completes Step 2 of ampere-kernel-decoders iter2 Phase 6.
Backend remains functionally identical to pre-iter2 — the new code
compiles + links but is not yet called from h265_set_controls (that's
Step 4). Existing 5 codecs continue to work as before.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
662f8874ba |
iter39 α-31: H264 Hi10P + HEVC Main10 sub-profile support (10-bit, rkvdec NV15)
Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request
backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit
values packed in 5 bytes per element); VAAPI consumers receive standard
VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image.
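The packing described here (four 10-bit samples in five bytes, LSB-first) can be unpacked roughly as follows. This is a sketch under that stated layout, not the contents of src/nv15.c; the function names are hypothetical, and the output convention assumes P010's usual form of 10 significant bits in the top of each 16-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack one 5-byte NV15 group into four P010-style 16-bit samples. */
static void nv15_unpack_group(const uint8_t in[5], uint16_t out[4])
{
    uint64_t bits = 0;

    /* Assemble the 40 bits little-endian: byte 0 holds the lowest bits. */
    for (int i = 0; i < 5; i++)
        bits |= (uint64_t)in[i] << (8 * i);

    /* Each sample is 10 bits; shift left by 6 so the value sits in the
     * upper bits of the 16-bit word, with the low 6 bits zero. */
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(((bits >> (10 * i)) & 0x3ff) << 6);
}

/* Unpack a row of `samples` values (must be a multiple of 4). */
static void nv15_unpack_row(const uint8_t *src, uint16_t *dst, size_t samples)
{
    for (size_t s = 0; s < samples; s += 4)
        nv15_unpack_group(src + (s / 4) * 5, dst + s);
}
```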
VP9 Profile 2 explicitly NOT added — RK3399 rkvdec kernel ctrl table
caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs).
Touchpoints (per Phase 5 sonnet-architect review amendments):
- include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it)
- src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per
Documentation/userspace-api/media/v4l/pixfmt-nv15.rst)
- src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find)
- src/codec.c: pixelformat_for_profile cases for Hi10P + Main10
- src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10
advertisement for 10-bit profiles
- src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic
SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth
transition (sibling to iter38 device-switch invalidation), is_10bit flag
- src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export
- src/image.c: P010 reporting in DeriveImage + QueryImageFormats,
P010-aware sizing in CreateImage, NV15 → P010 unpack call in
copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010)
- src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC
per-codec paths
- src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag
Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi
hwdownload, mpv vaapi-copy, and any consumer using vaGetImage works
end-to-end. PRIME-path consumers that only know NV12/P010 must use the
COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the
correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is
follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a
GPU-accessible buffer).
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0,
libdrm 2.4.133): clean build, .so produced, 0 new warnings.
Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec.
rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15;
ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not
in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196,
rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
1c548b136a |
fresnel-fourier iter5b-β Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper
Re-introduce after the iter5b-α' revert.
Helper maps VAProfile to V4L2 OUTPUT-side FOURCC, used at CreateConfig in commit B to populate the previously-dead object_config->pixelformat field. β reads from there at CreateContext (commit C).
Single source of truth for the profile→pixelformat mapping; mirrors the per-profile probes in config.c::RequestQueryConfigProfiles (lines 138-188).
Register codec.c in meson.build sources, codec.h in headers.
Signed-off-by: claude-noether <claude-noether@reauktion.de> |
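A helper of this kind might look like the sketch below. Everything in it is illustrative: the enum stands in for VAProfile so the block compiles without libva, and the FOURCC letters are written out from memory of the mainline V4L2_PIX_FMT_*_SLICE / *_FRAME definitions, so they should be checked against linux/videodev2.h before being relied on.

```c
#include <stdint.h>

/* Same packing as the kernel's v4l2_fourcc(): first letter in the low byte. */
#define SKETCH_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical stand-in for VAProfile. */
enum sketch_profile {
    SKETCH_PROFILE_MPEG2,
    SKETCH_PROFILE_H264,
    SKETCH_PROFILE_HEVC,
    SKETCH_PROFILE_VP8,
    SKETCH_PROFILE_VP9,
    SKETCH_PROFILE_UNKNOWN
};

/* One switch as the single source of truth; 0 means "unsupported". */
static uint32_t sketch_pixelformat_for_profile(enum sketch_profile profile)
{
    switch (profile) {
    case SKETCH_PROFILE_MPEG2: return SKETCH_FOURCC('M', 'G', '2', 'S');
    case SKETCH_PROFILE_H264:  return SKETCH_FOURCC('S', '2', '6', '4');
    case SKETCH_PROFILE_HEVC:  return SKETCH_FOURCC('S', '2', '6', '5');
    case SKETCH_PROFILE_VP8:   return SKETCH_FOURCC('V', 'P', '8', 'F');
    case SKETCH_PROFILE_VP9:   return SKETCH_FOURCC('V', 'P', '9', 'F');
    default:                   return 0;
    }
}
```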
||
|
|
6bc29ec582 |
Revert "fresnel-fourier iter5b Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper"
This reverts commit
|
||
|
|
ce304ef5af |
fresnel-fourier iter5b Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper
Add a small helper that maps a VAProfile to its V4L2 OUTPUT-side pixel format FOURCC. Single source of truth, mirrors the per-profile probes in config.c::RequestQueryConfigProfiles (lines 138-188).
Used by commits B + C in this series:
- commit B: populate object_config->pixelformat at CreateConfig
- commit C: surface.c reads the populated field to set OUTPUT format per-profile instead of hardcoded H264_SLICE
Register in meson.build sources + headers.
Signed-off-by: claude-noether <claude-noether@reauktion.de> |
||
|
|
406d08e122 |
fresnel-fourier iter4 Phase 6 commit B: NEW src/vp9.c + src/vp9.h + meson.build + context.h (vp9_lf) + surface.h (params.vp9)
VP9 codec dispatcher implementing 12 contract clauses against
V4L2_CID_STATELESS_VP9_FRAME (0xa40a2c) +
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (0xa40a2d). 2 batched
controls per frame; rkvdec on RK3399 mandatorily requires both
per drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble:752.
Implementation:
- ~80 LOC VPX range coder (vp9_rac_*) — minimal port of FFmpeg
vpx_rac.[ch] + vp89_rac.h. Stateless static helpers.
- inv_map_table[255] + read_prob_delta — verbatim copy from
v4l2_request_vp9.c:44-97.
- vp9_parse_uncompressed_header_lf_quant — partial parse for the
fields VAAPI doesn't expose: lf_delta_enabled / lf_delta_update /
lf_ref_delta[4] / lf_mode_delta[2] / base_q_idx /
delta_q_y_dc / delta_q_uv_dc / delta_q_uv_ac. ~120 LOC.
- vp9_fill_compressed_hdr — port of FFmpeg fill_compressed_hdr
with Phase 5 C3 out_reference_mode parameter. ~140 LOC.
- vp9_set_controls — orchestrates Clauses 1+2+4+5+7+10+11+12.
~120 LOC.
Phase 5 amendments incorporated in code:
- C1: frame.interpolation_filter = direct from VAAPI's
mcomp_filter_type (NO XOR; vaapi_vp9.c:62 already applied it
before storing into VAAPI's mcomp_filter_type).
- C2: persistent vp9_lf state added to object_context (in
context.h). Initialized to VP9 spec defaults
{1,0,-1,-1,0,0} on keyframe / intra_only / error_resilient.
Updated only when parser sees lf_delta.update=1. Always
copied to kernel control.
- C3: vp9_fill_compressed_hdr takes uint8_t *out_reference_mode;
threaded through call site. allowcompinter derived from VAAPI
sign-bias bits.
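The C2 persistent loop-filter state can be pictured with the sketch below. The struct mirrors the fields named in this message (ref_deltas[4], mode_deltas[2], initialized); the function names and the exact update flow are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct vp9_lf_state {
    int8_t ref_deltas[4];
    int8_t mode_deltas[2];
    bool initialized;
};

/* Reset to the defaults quoted in the message: {1,0,-1,-1} and {0,0}. */
static void vp9_lf_reset(struct vp9_lf_state *lf)
{
    static const int8_t default_ref[4] = { 1, 0, -1, -1 };

    memcpy(lf->ref_deltas, default_ref, sizeof(default_ref));
    memset(lf->mode_deltas, 0, sizeof(lf->mode_deltas));
    lf->initialized = true;
}

/* Per-frame update: reset on keyframe / intra_only / error_resilient,
 * then overwrite only when the parsed header signals a delta update. */
static void vp9_lf_begin_frame(struct vp9_lf_state *lf,
                               bool key_frame, bool intra_only,
                               bool error_resilient, bool delta_update,
                               const int8_t new_ref[4],
                               const int8_t new_mode[2])
{
    if (key_frame || intra_only || error_resilient)
        vp9_lf_reset(lf);
    if (delta_update) {
        memcpy(lf->ref_deltas, new_ref, sizeof(lf->ref_deltas));
        memcpy(lf->mode_deltas, new_mode, sizeof(lf->mode_deltas));
    }
}
```

The state is then copied into the kernel control on every frame, whether or not it changed.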
Phase 5 S4: uv_mode memcpy from FFmpeg's fill_compressed_hdr
omitted — rkvdec reads uv_mode from kernel's persistent
probability_tables, NOT from prob_updates ctrl.
Clause 3 compile-time _Static_assert on struct sizes (168/2040)
matches Phase 3 empirical baseline; UAPI shifts will fail loudly.
surface.h: extends params union with vp9 { picture, slice }.
context.h: adds vp9_lf { ref_deltas[4], mode_deltas[2], initialized }.
meson.build: adds vp9.c + vp9.h.
Build: clean on fresnel (linux-fresnel-fourier 7.0-1, libva 1.23).
Runtime: not yet wired in picture.c — next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
017e27f389 |
fresnel-fourier iter3 Phase 6 commit B: NEW src/vp8.c + src/vp8.h
+ meson.build VP8 entries
Net-new VP8 codec dispatcher implemented against
V4L2_CID_STATELESS_VP8_FRAME (kernel UAPI <linux/v4l2-controls.h>:
1900-1958). Single batched control per frame, no init-time device-wide
menus (VP8 has no DECODE_MODE/START_CODE).
Per-frame submission: ONE VIDIOC_S_EXT_CTRLS, count=1, with full
v4l2_ctrl_vp8_frame struct (1232 bytes — corrected vs Phase 2
implicit ~400 estimate; entropy.coeff_probs[4][8][3][11] alone is
1056 bytes).
vp8_set_controls() implements 10 contract clauses per
phase4_iter3_plan.md:
Clause 1: single-control batched submission (count=1)
Clause 2: stack alloc + memset zero (covers all padding)
Clause 3: width/height/version/per-frame scalars; off-by-one
num_dct_parts = num_of_partitions - 1
Clause 4: DPB timestamp resolution (3 refs: last/golden/alt;
NULL surface → 0-sentinel via memset; mirrors iter1
mpeg2.c::pic.forward_ref_ts)
Clause 5: loop filter (6 fields + 3 flag bits; ADJ_ENABLE/
DELTA_UPDATE/FILTER_TYPE_SIMPLE)
Clause 6: quant base + delta derivation from VAAPI's per-segment
absolute index matrix (subtraction recovers signed
deltas; correct for typical content per Phase 5 S1)
Clause 7: segment fields (segment_probs direct copy; flags
assembled with DELTA_VALUE_MODE set unconditionally
per FFmpeg pattern)
Clause 8: entropy table — 3 VAAPI sources merged (Picture: y_mode +
uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3]
[11] direct memcpy; IQMatrix: quant)
Clause 9: coder state + first-partition fields + flags assembly
Clause 10: v4l2_set_controls submission
Phase 5 review amendments incorporated:
C1 first_part_header_bits = slice->macroblock_offset, NOT 0:
kernel hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
read this field unconditionally to compute the MB-data DMA
offset. Verified via source identity: vaapi_vp8.c:204 and
v4l2_request_vp8.c:83 use byte-identical formulas
(8 * (input - data) - bit_count - 8); VAAPI exposes it via
slice->macroblock_offset, V4L2 names it first_part_header_bits.
C2 first_part_size = slice->partition_size[0] +
((macroblock_offset + 7) / 8)
VAAPI's partition_size[0] is the REMAINING bytes after parsing
(vaapi_vp8.c:209; va_dec_vp8.h:193-196). Kernel needs the
TOTAL control partition size; recover by adding back ceil
(macroblock_offset/8) bytes.
Phase 3 keyframe verbatim cross-check: 21923 + 819 = 22742 ✓
C4 (int8_t) cast (NOT (s8); s8 is kernel-internal typedef from
<linux/types.h> not exposed to userspace; userspace UAPI
exposes __s8 with double-underscore; portable userspace cast
is int8_t from <stdint.h>).
S3 assert(probability_set): kernel
hantro_vp8.c::hantro_vp8_prob_update reads coeff_probs
unconditionally; NO default-table fallback. Practical risk low
(FFmpeg vaapi_vp8.c always sends VAProbabilityBufferType per
frame), but the assert surfaces the problem immediately if a
future consumer doesn't.
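The C2 arithmetic can be written out as a one-line helper. The function name is hypothetical; the keyframe's actual macroblock_offset is not quoted in this message, so the test value below is merely a bit count that rounds up to the 819 bytes in the cross-check.

```c
#include <stdint.h>

/* Total size of the first (control) partition in bytes: the bytes VAAPI
 * reports as remaining after header parsing, plus the already-consumed
 * header bits rounded up to whole bytes. */
static uint32_t vp8_first_part_size(uint32_t remaining_bytes,
                                    uint32_t macroblock_offset_bits)
{
    return remaining_bytes + (macroblock_offset_bits + 7) / 8;
}
```

With 21923 remaining bytes and any offset between 6545 and 6552 bits, the result is 22742, the total quoted in the Phase 3 cross-check.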
Flags assembly: mainline-documented bits only; five are set (KEY_FRAME,
SHOW_FRAME, MB_NO_SKIP_COEFF, SIGN_BIAS_GOLDEN, SIGN_BIAS_ALT). EXP +
bit 0x40 NOT replicated despite ffmpeg-v4l2-request-git setting
them on inter frames; kernel hantro_vp8.c only inspects the KEY_FRAME
bit. SHOW_FRAME forced unconditional per Phase 3 Q4 (BBB has no
alt-ref invisible frames; documented fidelity gap).
VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec.
Backend writes V4L2_VP8_FRAME_FLAG_KEY_FRAME iff
!picture->pic_fields.bits.key_frame.
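The inversion is easy to get backwards, so here is a small sketch of it. The flag constants are local stand-ins; their values are believed to match V4L2_VP8_FRAME_FLAG_KEY_FRAME and V4L2_VP8_FRAME_FLAG_SHOW_FRAME in mainline but should be taken from the kernel header in real code.

```c
#include <stdint.h>

/* Local stand-ins for the V4L2 VP8 frame flag bits (assumed values). */
#define SKETCH_VP8_FLAG_KEY_FRAME  0x01u
#define SKETCH_VP8_FLAG_SHOW_FRAME 0x04u

/* va_key_frame follows the VP8 bitstream convention used by VAAPI:
 * 0 means "this IS a keyframe". */
static uint64_t vp8_frame_flags(unsigned va_key_frame)
{
    /* SHOW_FRAME is set unconditionally, as described in the message. */
    uint64_t flags = SKETCH_VP8_FLAG_SHOW_FRAME;

    if (!va_key_frame)
        flags |= SKETCH_VP8_FLAG_KEY_FRAME;
    return flags;
}
```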
After this commit alone: vp8.o compiles standalone; meson.build
links it into the shared library. picture.c can't dispatch yet
(commit C wires that).
Refs:
../fresnel-fourier/phase4_iter3_plan.md (10 contract clauses,
Phase 5 amendments
section)
../fresnel-fourier/phase5_iter3_review.md (C1, C2, C3, C4, S3
all incorporated)
../fresnel-fourier/phase3_iter3_baseline.md (verbatim payload
anchors)
references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp8.c (V4L2 ref)
references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c (VAAPI source ref)
references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_vp8_dec.c
(RK3399 kernel driver: first_part_header_bits + first_part_size usage)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8d71e20bf7 |
fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API
Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC
dispatch + per-slice accumulation against the modern split
V4L2_CID_STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,
DECODE_PARAMS,DECODE_MODE,START_CODE} stateless controls. Replaces the
staging-era V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that
were removed from the kernel UAPI.
Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5,
ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS:
0xa40a90 SPS (40 bytes)
0xa40a91 PPS (64 bytes)
0xa40a92 SLICE_PARAMS (variable; dynamic-array; one entry per slice)
0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list)
0xa40a94 DECODE_PARAMS (328 bytes; per-frame DPB info)
Plus device-wide menus set once at context.c init (separate batched
S_EXT_CTRLS call so a kernel without HEVC controls — e.g. hantro on
RK3568/RK3399 — silently fails its batch without invalidating H.264):
0xa40a95 DECODE_MODE (FRAME_BASED on rkvdec)
0xa40a96 START_CODE (ANNEX_B on rkvdec)
Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
(v4l2_request_hevc_queue_decode batched submission shape).
Phase 5 review amendments incorporated:
C1 (data_byte_offset NOT data_bit_offset):
Old h265.c at lines 184-209 ran an 8-bit search to compute
bit-granularity offset. New API renames the field to
data_byte_offset (u32 byte offset). Bit-search dropped; replaced
with plain byte offset = source_offset + slice->slice_data_byte_offset.
C2 (dpb_entry.flags only LONG_TERM_REFERENCE; pic_order_cnt_val
singular; poc_st_curr_*[] arrays hold DPB INDICES not POC):
h265_fill_decode_params replaces old slice-params DPB iteration
with explicit DPB classification + index-array population.
For each VAAPI ReferenceFrames[i]:
- Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via
VA_PICTURE_HEVC_RPS_* flags.
- Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic.
- Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR.
- Append j (DPB index, u8) to poc_st_curr_before[k] /
poc_st_curr_after[k] / poc_lt_curr[k] based on classification.
C3 (union-aliasing reasoning corrected):
BeginPicture's params.h265.num_slices = 0 reset is benign for
non-HEVC profiles because byte ~17764 of the params union is past
any field non-HEVC profiles read, NOT because RenderPicture's
per-buffer copies overwrite that location. Wording amended in
phase4_iter2_plan.md per phase5_iter2_review.md.
S1 (PPS flags 19 + 20 — DEBLOCKING_FILTER_CONTROL_PRESENT and
UNIFORM_SPACING):
Empirically VAAPI does NOT expose either flag in the
VAPictureParameterBufferHEVC pic_fields.bits or
slice_parsing_fields.bits. Both bits left zero. BBB-720p10s_hevc
fixture uses neither tiles nor explicit deblocking-control
parameters, so the omission is correct for the iter2 binding cell.
S2 (3 PPS scalars added):
pic_parameter_set_id (default 0; VAAPI doesn't expose it),
num_ref_idx_l0_default_active_minus1 and
num_ref_idx_l1_default_active_minus1 (both populated from the
VAAPI picture struct).
Q2 (slice_segment_addr populated):
Was missing in old h265.c. Now sourced from
VAAPI's slice->slice_segment_address.
S3 (SCALING_MATRIX content choice):
Implementer choice taken: when iqmatrix_set==false (BBB has no
scaling list per SPS flags = SAO|STRONG_INTRA_SMOOTHING),
h265_fill_scaling_matrix sends memset-zero. Matches FFmpeg's
sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserves
byte-equality vs cross-validator anchor).
S4 (FFmpeg function name fix): cosmetic; no code impact.
Plus one Phase 6 inline correction: phase 5 review S1 suggested
VAAPI exposes uniform_spacing_flag in pic_fields.bits; empirical
test-compile shows it doesn't. Comment added in h265_fill_pps
documenting the omission.
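The C2 classification above (DPB entries plus index arrays that hold DPB indices, not POC values) can be sketched with stand-in types. Nothing here is the real VAAPI or V4L2 struct layout: the structs are reduced to the fields the message mentions, and the flag values are local constants assumed to mirror VA_PICTURE_HEVC_RPS_* and the V4L2 long-term DPB flag.

```c
#include <stdint.h>
#include <string.h>

#define SK_RPS_ST_CURR_BEFORE 0x10u
#define SK_RPS_ST_CURR_AFTER  0x20u
#define SK_RPS_LT_CURR        0x40u
#define SK_DPB_LONG_TERM      0x01u
#define SK_MAX_DPB            16

struct sk_va_ref {            /* stand-in for a VAAPI ReferenceFrames[i] */
    int valid;
    uint32_t flags;
    int32_t poc;
    uint64_t timestamp;
};

struct sk_dpb_entry {
    uint64_t timestamp;
    uint8_t flags;
    int32_t pic_order_cnt_val;
};

struct sk_decode_params {
    struct sk_dpb_entry dpb[SK_MAX_DPB];
    uint8_t num_active_dpb_entries;
    uint8_t num_poc_st_curr_before, num_poc_st_curr_after, num_poc_lt_curr;
    uint8_t poc_st_curr_before[SK_MAX_DPB];
    uint8_t poc_st_curr_after[SK_MAX_DPB];
    uint8_t poc_lt_curr[SK_MAX_DPB];
};

static void sk_fill_decode_params(struct sk_decode_params *dp,
                                  const struct sk_va_ref *refs, int count)
{
    memset(dp, 0, sizeof(*dp));
    for (int i = 0; i < count && dp->num_active_dpb_entries < SK_MAX_DPB; i++) {
        if (!refs[i].valid)
            continue;
        uint8_t j = dp->num_active_dpb_entries++;   /* DPB index */
        dp->dpb[j].timestamp = refs[i].timestamp;
        dp->dpb[j].pic_order_cnt_val = refs[i].poc;
        if (refs[i].flags & SK_RPS_LT_CURR) {
            dp->dpb[j].flags = SK_DPB_LONG_TERM;
            dp->poc_lt_curr[dp->num_poc_lt_curr++] = j;
        } else if (refs[i].flags & SK_RPS_ST_CURR_BEFORE) {
            dp->poc_st_curr_before[dp->num_poc_st_curr_before++] = j;
        } else if (refs[i].flags & SK_RPS_ST_CURR_AFTER) {
            dp->poc_st_curr_after[dp->num_poc_st_curr_after++] = j;
        }
    }
}
```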
Picture.c changes (3 edits):
1. codec_set_controls HEVCMain dispatch (lines 204-206 → call
h265_set_controls; replaces explicit Fourier-local: HEVC stripped
reject).
2. codec_store_buffer HEVC VASliceParameterBufferType case: append
VAAPI slice param to params.h265.slices[N] array, increment
num_slices. Single-slice mirror at .slice retained for
h265_fill_pps (which reads dependent_slice_segment_flag from
LongSliceFlags).
3. RequestBeginPicture: add params.h265.num_slices = 0 reset
alongside existing h264.matrix_set = false reset.
Surface.h: extend params.h265 struct with a
slices[HEVC_MAX_SLICES_PER_FRAME=64] array + num_slices counter.
~17 KB extra per surface union; 24 surfaces in iter7 cap_pool =
~400 KB total surface_heap growth. object_heap allocator picks up
the new size automatically via sizeof(struct object_surface).
Context.c: separate 2-control batched call sets HEVC DECODE_MODE +
START_CODE device-wide. Same best-effort (void)v4l2_set_controls
pattern as the existing H.264 device-init block; if kernel doesn't
advertise HEVC controls (hantro on RK3568/RK3399), the batch silently
fails without invalidating the H.264 batch.
Meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73)
in sources + headers lists.
H265.h: added HEVC_MAX_SLICES_PER_FRAME=64 #define before struct
forward declarations.
Phase 6 smoke test on fresnel (post Commit A + Commit B):
Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec env binding
(/dev/video1 + /dev/media0). PASS.
Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4
-frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots
ready. PASS.
Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture:
HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
HW=SW byte-identical for both frames; frame1 != frame2 (real motion).
PASS.
Criterion 5: regression hashes hold for both prior cells:
H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)
PASS.
All five criteria green on first build attempt — Phase 5 review
caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset
rename; dpb.rps field gone + pic_order_cnt_val rename + index-array
semantics) that would have been Phase 6 compile failures or silent
Phase 7 byte-compare divergences. Without that review pass, this
commit would have been the start of a 2+ loopback debugging cycle.
Refs:
../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses,
File 4 patch shape)
../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2,
S3, S4, Q2 amendments
all incorporated)
../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/
ffmpeg_v4l2req.stdout (cross-validator anchor — Phase 7
bonus byte-compare verification target)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
19acc76da4 |
iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.
Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
(N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
{FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
oldest FREE slot (LRU), binds it for the decode cycle, and the slot
cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
vaDestroySurfaces.
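A stripped-down model of the slot lifecycle is sketched below. It is not cap_pool.c: the names are hypothetical, the pthread mutex and the force-recycle of EXPORTED slots are left out for brevity, and "oldest FREE" is modelled with a simple release counter.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_CAP_POOL 24

enum slot_state { SLOT_FREE, SLOT_IN_DECODE, SLOT_DECODED, SLOT_EXPORTED };

struct cap_slot {
    enum slot_state state;
    uint64_t freed_at;      /* release stamp; smaller means freed longer ago */
};

struct cap_pool_sketch {
    struct cap_slot *slots;
    size_t count;
    uint64_t clock;         /* monotonically increasing release counter */
};

/* N = max(surfaces_count, MIN_CAP_POOL), as described in the message. */
static size_t cap_pool_size(size_t surfaces_count)
{
    return surfaces_count > MIN_CAP_POOL ? surfaces_count : MIN_CAP_POOL;
}

/* Claim the FREE slot that has been free the longest; -1 if none. */
static int cap_pool_acquire(struct cap_pool_sketch *p)
{
    int best = -1;

    for (size_t i = 0; i < p->count; i++) {
        if (p->slots[i].state != SLOT_FREE)
            continue;
        if (best < 0 || p->slots[i].freed_at < p->slots[best].freed_at)
            best = (int)i;
    }
    if (best >= 0)
        p->slots[best].state = SLOT_IN_DECODE;
    return best;
}

/* Return a slot to the pool and stamp it as the most recently freed. */
static void cap_pool_release(struct cap_pool_sketch *p, int idx)
{
    p->slots[idx].state = SLOT_FREE;
    p->slots[idx].freed_at = ++p->clock;
}
```

Handing out the longest-idle slot first gives a consumer that still holds an exported buffer the most time before that memory is written again.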
Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
cap_pools -- iter3 scope).
Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).
Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9de1be34ef |
h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields
The load-bearing fix from diff_against_ffmpeg.md (campaign repo).
Adds src/h264_slice_header.{c,h} — a minimal H.264 slice_header()
bit-parser per ITU-T H.264 (08/2024) §7.3.3. Parses just enough of
the slice header to populate the V4L2 DECODE_PARAMS fields VAAPI
doesn't carry and that hantro G1 hardware reads directly out of
DECODE_PARAMS into MMIO registers:
dec_param->dec_ref_pic_marking_bit_size -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
dec_param->idr_pic_id -> G1_REG_DEC_CTRL5_IDR_PIC_ID
dec_param->pic_order_cnt_bit_size -> G1_REG_DEC_CTRL6_POC_LENGTH
dec_param->pic_order_cnt_lsb -> hantro reflist builder (poc_type=0)
dec_param->delta_pic_order_cnt_bottom -> same
dec_param->delta_pic_order_cnt0/1 -> hantro reflist builder (poc_type=1)
Without these set correctly, hantro's hardware bitstream parser
walks past zero bits in the slice header, lands on garbage, decodes
zero pixels — the all-zero CAPTURE output observed across both mpv
and Firefox during 2026-05-04 Phase 0 (see libva-multiplanar campaign
phase0_evidence/2026-05-04-kernel-trace/findings.md).
Implementation:
- Minimal RBSP bit reader (br_read_u/_ue/_se), MSB-first, fault-flag
on overrun.
- Emulation-prevention unescape (strips 0x03 after 0x00 0x00) on
the first 64 bytes of the slice — slice headers fit comfortably.
- Walks slice_header() up to and including dec_ref_pic_marking(),
measuring bit positions for the *_bit_size fields.
- Skips ref_pic_list_modification() and pred_weight_table() —
needed only to advance the bit position to dec_ref_pic_marking().
- Returns a struct with the V4L2 fields plus diagnostics
(first_mb_in_slice, slice_type, pps_id, frame_num).
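The two building blocks named above, emulation-prevention unescaping and an MSB-first reader with Exp-Golomb decoding, can be sketched as follows. The br_read_* names follow the message; the struct layout and rbsp_unescape are assumptions for illustration rather than the actual contents of src/h264_slice_header.c.

```c
#include <stddef.h>
#include <stdint.h>

struct bitreader {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;
    int fault;              /* set once a read runs past the buffer */
};

/* Drop each 0x03 that follows two zero bytes; returns the output length. */
static size_t rbsp_unescape(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t o = 0;
    unsigned zeros = 0;

    for (size_t i = 0; i < in_len; i++) {
        if (zeros >= 2 && in[i] == 0x03) {
            zeros = 0;
            continue;
        }
        out[o++] = in[i];
        zeros = (in[i] == 0x00) ? zeros + 1 : 0;
    }
    return o;
}

/* Read n bits (n <= 32), most significant bit first. */
static uint32_t br_read_u(struct bitreader *br, unsigned n)
{
    uint32_t v = 0;

    while (n--) {
        if (br->pos >= br->size_bits) {
            br->fault = 1;
            return 0;
        }
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
        br->pos++;
    }
    return v;
}

/* Unsigned Exp-Golomb: count leading zeros, then read that many bits. */
static uint32_t br_read_ue(struct bitreader *br)
{
    unsigned zeros = 0;

    while (br_read_u(br, 1) == 0) {
        if (br->fault || ++zeros > 31) {
            br->fault = 1;
            return 0;
        }
    }
    return ((1u << zeros) - 1) + br_read_u(br, zeros);
}

/* Signed Exp-Golomb: odd codes map to positive, even to negative. */
static int32_t br_read_se(struct bitreader *br)
{
    uint32_t k = br_read_ue(br);

    return (k & 1) ? (int32_t)((k + 1) / 2) : -(int32_t)(k / 2);
}
```

A *_bit_size field is then the difference in `pos` before and after walking the corresponding syntax element.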
Wired into h264_va_picture_to_v4l2 (src/h264.c) right after the
nal_ref_idc/nal_unit_type extraction. SPS/PPS context is built from
VAPicture's seq_fields and pic_fields; num_ref_idx_l0/l1_active
defaults come from VASlice (best available substitute for the
parsed PPS values). On parse success, populates decode_params with
the recovered values + emits a request_log with the decoded fields
for cross-validation against VAAPI's pre-parsed values.
src/meson.build: adds h264_slice_header.{c,h} to sources.
Cross-references:
- FFmpeg libavcodec/h264_slice.c (Kwiboo v4l2-request-n8.1) — populates
H264SliceContext::ref_pic_marking_bit_size / pic_order_cnt_bit_size
by the same bit-precise parse, then v4l2_request_h264.c forwards
to V4L2.
- Linux drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
set_params() — the register-write code that reads these fields.
MVC nal_unit_type 20/21 unhandled (this fork strips MVC alongside
HEVC). Multi-slice non-IDR streams parse the first slice's header
only; for FRAME_BASED mode that's fine — kernel sees the whole
bitstream and parses subsequent slices itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
565f5c0de4 |
context: introduce request_pool, decouple OUTPUT buffers from surfaces
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
Replaces the prior per-surface OUTPUT-buffer ownership model with a
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
in flight), allocated unconditionally regardless of caller's
num_render_targets.
Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
"output_count == surfaces_count" model was a category error: OUTPUT
buffers are request-time bitstream slots, CAPTURE buffers are
picture-time DPB slots; their lifecycles and sizing are independent.
Changes:
* NEW src/request_pool.{c,h} (~200 LoC):
- request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
- request_pool_destroy(): munmap all, idempotent.
- request_pool_acquire(): round-robin claim; returns V4L2 buffer
index of an unused slot or -1.
- request_pool_release(): mark slot free for reuse.
- request_pool_slot(): accessor for ptr/size given a buffer index.
* src/request.h: add struct request_pool output_pool to request_data.
* src/context.c::RequestCreateContext: replace the per-surface
OUTPUT loop with a single request_pool_init() call (count=4,
independent of surfaces_count). Drop the now-unused locals
(length, offset, source_data, output_buffers_count, index,
index_base, i, surface_object). DELETES patch 0002's
"output_buffers_count = ... ? ... : 4" hack inline — the pool's
own count parameter supersedes it.
* src/picture.c::RequestBeginPicture: borrow a pool slot at frame
start, write its mmap pointer/size/index into the surface's
transient source_* fields. The fields stay (still useful as
a borrow handle that the existing codec_store_buffer memcpys
target), but no longer represent surface-permanent ownership.
Reset slices_size/slices_count here too (was implicit on first
Render).
* src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
the OUTPUT buffer, release the pool slot and clear the surface's
borrow handle. Fixes the segv on second-frame submission.
* src/surface.c::RequestDestroySurfaces: remove the munmap of
source_data — pool owns the mmap.
* src/request.c::RequestTerminate: call request_pool_destroy()
before close(video_fd) so munmaps still target a valid fd.
* src/meson.build: add request_pool.c and request_pool.h to the
sources/headers lists.
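The acquire/release pair listed above amounts to a small round-robin allocator. The sketch below models only the slot bookkeeping (no CREATE_BUFS, QUERYBUF, or mmap); the sk_ names are hypothetical and the slot count of 4 is the pipeline depth quoted in this message.

```c
#include <stdbool.h>

#define SK_POOL_SLOTS 4

struct sk_request_pool {
    bool in_use[SK_POOL_SLOTS];
    unsigned next;          /* where the next search starts */
};

/* Claim the first unused slot at or after `next`; returns its buffer
 * index, or -1 when every slot is still in flight. */
static int sk_pool_acquire(struct sk_request_pool *p)
{
    for (unsigned n = 0; n < SK_POOL_SLOTS; n++) {
        unsigned idx = (p->next + n) % SK_POOL_SLOTS;

        if (!p->in_use[idx]) {
            p->in_use[idx] = true;
            p->next = (idx + 1) % SK_POOL_SLOTS;
            return (int)idx;
        }
    }
    return -1;
}

/* Mark a slot free again; out-of-range indices are ignored. */
static void sk_pool_release(struct sk_request_pool *p, int idx)
{
    if (idx >= 0 && idx < SK_POOL_SLOTS)
        p->in_use[idx] = false;
}
```

In the flow described above, BeginPicture would call the acquire step and SyncSurface the release step once the OUTPUT buffer has been dequeued.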
This commit removes 0002's OUTPUT-pool hack inline (the
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
remains until commit 4 lands.
Build-verified clean on aarch64.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
c45fea96e3 |
fourier-local: stateless control modernization + HEVC strip
Compound patch carrying the fork's pre-Step-1 substrate, originally authored by Jernej Škrabec / fourier on top of bootlin's |
||
|
|
a3c2476de1 |
Respect libdir for install path
Distributions may install libraries under architecture-specific sub-directories, to support multiple architectures on the same system. In addition, the user may not wish to install the library with the default prefix. Use the libdir variable when setting the install path. This allows both specifying different sub-directories, and a different prefix. |
||
|
|
3264c0495c |
Add option to specify path to up-to-date kernel headers
The system normally has kernel headers shipped with the distribution. These typically lag behind actual kernel releases. Thus they would not have the latest API additions, such as the V4L2 request API this driver uses. However, it is also bad practice to just install new kernel headers into the system-wide default location, as there may be some differences between them and what the C library was built against. Add an option to specify a path to a set of up-to-date kernel headers. This would allow the user to build this project in a safe but working environment. Signed-off-by: Chen-Yu Tsai <wens@csie.org> |
||
|
|
3176adf69c |
Include local copies of DRM and V4L2 codec definitions
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> |
||
|
|
ca5198b429 |
Add support for the meson build system
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> |