libva-v4l2-request-fourier

Author	SHA1	Message	Date
claude-noether	d7ef0f6cd9	ampere-av1 Phase 3: SEQUENCE byte-equal kdirect; 3/10 frames PASS bit-exact Three more fixes after strace-diff localization vs kdirect. Fix 6 — fill_sequence ENABLE_SUPERRES: gate on picture->pic_info_fields.bits.use_superres instead of unconditional set-true. VAAPI doesn't expose enable_superres at sequence level; per strace diff kdirect clears the flag for streams not using superres (byte 1 of flags was the only SEQUENCE diff). After this fix, SEQUENCE ctrl byte-equal kdirect on every call. Fix 7 — refresh_frame_flags = 0xff (was 0): VAAPI doesn't expose refresh_frame_flags. Default 0xff = "refresh all DPB slots" matches kdirect's submission and AV1 spec default for KEY/SWITCH frames; for inter frames simple P-frame chains naturally tolerate this. Fix 8 — surface_object->av1_order_hint per-surface tracking. Set in av1_set_controls from picture->order_hint of the current frame. Also propagated to the linked display surface (when apply_grain=1 → cur_frame != cur_display) so future frames referencing the display surface find the order_hint via the linked_decode_surface_id. Tried + reverted: ref-name iteration of reference_frame_ts / order_hints via picture->ref_frame_idx[i-1] → DPB slot (Kwiboo's convention via FFmpeg's s->ref[i]). Empirically regressed 3/10 → 1/10. V4L2 uAPI's indexing here looks DPB-slot-direct despite the AV1 spec lexicon — needs kernel-side disambiguation to settle. Verification on ampere (av1_larger.ivf 352x288, 10 frames): Frames 0, 2, 4: PASS bit-exact (apply_grain=1, grain HW path) Frames 1, 3, 5-9: DIFF (apply_grain=0) 3/10 PASS (was 1/10 after iter checkpoint). test_av1.ivf 208x208: unchanged bit-exact PASS sha 029ee72c214b37c1 Remaining open: frame 1 (apply_grain=0, first inter) submits IDENTICAL FRAME ctrl bytes to kdirect (verified strace-diff post-fix), yet decoded output diverges. 
That means the divergence is no longer in control submission — points at OUTPUT-side bitstream differences between ffmpeg-vaapi and ffmpeg-v4l2request, or at DPB CAPTURE buffer state (grain-applied data being used as reference vs pre-grain). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:55:07 +00:00
claude-noether	5803cbcf6c	ampere-av1 Phase 3 progress: film_grain link + UPDATE_GRAIN; frame 0 bit-exact Three structural fixes for AV1 with film_grain on vpu981 (RK3588). Output is no longer empty / crashed; frame 0 (IDR with apply_grain=1) is bit-exact vs kdirect. Inter frames still diverge. Fix 1 — surface.h + surface.c: linked_decode_surface_id field on object_surface, initialized to VA_INVALID_SURFACE. When AV1 picture has apply_grain=1, VAAPI's VADecPictureParameterBufferAV1 carries a current_display_picture distinct from current_frame. ffmpeg-vaapi calls vaBeginPicture on current_frame (decode surface, slot gets bound) but vaGetImage on current_display_picture (display surface, no slot) → NULL deref in copy_surface_to_image. Fix 2 — av1.c: in av1_set_controls, when cur_frame != cur_display, set display_surface->linked_decode_surface_id = current_frame. Establishes the back-link so display surface can borrow decode surface's data. Fix 3 — image.c copy_surface_to_image: when slot is NULL and the surface has linked_decode_surface_id, lookup the decode surface and mirror its destination_data[] + destination_sizes[] + destination_planes_count. NULL guard with diagnostic log retained. Fix 4 — av1.c fill_film_grain: when apply_grain=1, also set V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN. Confirmed by strace-diff: kdirect sends flags=0x0B (APPLY|UPDATE|...), libva was sending 0x09 (APPLY but no UPDATE). Without UPDATE the kernel tries to reuse from film_grain_params_ref_idx=0, which is never populated. Earlier reverted because UPDATE seemed to trigger a SEGV — but that SEGV was the unmasked NULL-slot deref; with fix 1+2+3 in place UPDATE is safe. Fix 5 — av1.c reference_frame_ts plumbing: when a referenced surface has timestamp=0 AND linked_decode_surface_id set, follow the link to find the decode surface that carries the real timestamp. Display surfaces don't get OUTPUT QBUF'd by us, so their own timestamp stays zero.
Also: BeginPicture diagnostic log + surface_unbind_slot diagnostic log + v4l2.c error_idx diagnostic (kept from earlier — useful for ongoing investigation). Verification on ampere: test_av1.ivf (208x208, 2 frames, no grain): bit-exact PASS sha 029ee72c214b37c1 (unchanged, no regression) av1_larger.ivf (352x288, 10 frames, film_grain alternates): frame 0 (key, apply_grain=1): PASS bit-exact vs kdirect frame 4: PASS bit-exact frames 1,2,3,5,6,7,8,9: DIFFER Frame 0 PASS proves: SEQUENCE + FRAME + TILE_GROUP_ENTRY + FILM_GRAIN mapping is correct for IDR. Frame 4 PASS is unexplained but encouraging. Inter-frame divergence (frame 1+) points at: reference handling for inter prediction is still off — either order_hints[] (still zero, VAAPI doesn't expose per-ref), or grain-applied vs pre-grain DPB semantics, or ref_frame_idx pointing into the wrong surface space. Next investigation: per-frame strace diff between libva and kdirect controls payload to spot remaining field mis-mappings on inter frames. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:45:31 +00:00
claude-noether	78a9978b02	ampere-av1 Phase 2 step 4: AV1 dispatch scaffolding compiles and wires surface.h: av1 substruct (picture + tile_group_entries[AV1_MAX_TILES=128] + num_tile_group_entries counter) picture.c: dispatch VAPictureParameterBufferAV1 + VASliceParameterBufferAV1 into surface->params.av1.*; call av1_set_controls in EndPicture path av1.h: minimal interface (av1_set_controls signature) av1.c: stub set_controls returning -1 with diagnostic; _Static_assert on v4l2_ctrl_av1_tile_group_entry size = 16 (Janet hygiene) meson.build: av1.c + av1.h in source list Verified on ampere with /tmp/test_av1.ivf via LIBVA_DRIVER_NAME=v4l2_request: v4l2-request: ampere-av1: vpu981 AV1 decoder at /dev/video4 + /dev/media3 v4l2-request: ampere-av1: av1_set_controls stub — Phase 2.1 will implement ... [av1] Failed to end picture decode issue: 1 (operation failed). [av1] HW accel end frame fail. [dec:av1] Error submitting packet to decoder: Input/output error Clean graceful failure — vpu981 probe works, dispatch reaches av1.c, stub returns ERROR, ffmpeg falls back to SW. No crash, no IOMMU fault, no kernel taint. Next: Phase 2.1 implementation of fill_sequence + fill_frame + fill_film_grain + fill_tile_group_entries (~700 LoC mirror of Kwiboo v4l2_request_av1.c, applying F1/F2/F3 implementation-time corrections from Janet review v2). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:55:39 +02:00
claude-noether	70196f8065	fresnel-fourier iter5b-β Phase 7 fix-forward commit D: destination_* for vaapi-copy late-surface flow Phase 7 empirical: all 5 libva codecs returned all-zero because CreateContext's surfaces_ids[] walk was a no-op for ffmpeg-vaapi-copy which passes surfaces_count=0 to vaCreateContext (per the iter6 comment at context.c:262). Surfaces existed in driver_data's surface_heap but weren't in the param array → destination_* stayed at the zero initialization from CreateSurfaces2 β → BeginPicture's surface_bind_slot saw destination_planes_count=0 → no data assignment → copy_surface_to_image read all-zero. Fix: cache the format-uniform CAPTURE geometry in driver_data (fmt_valid, fmt_planes_count, fmt_buffers_count, fmt_format_height, fmt_sizes[], fmt_bytesperlines[]). Populate at CreateContext after v4l2_get_format(CAPTURE). Walk surface_heap (not just surfaces_ids[]) to fill every existing surface. Add lazy-fill in CreateSurfaces2 for surfaces created AFTER CreateContext. Invalidate cache in DestroyContext. New helper: surface_fill_format_uniform(driver_data, surface_object). Idempotent on destination_planes_count != 0. Signed-off-by: claude-noether <claude-noether@reauktion.de>	2026-05-12 18:52:33 +00:00
claude-noether	7055b14f5e	fresnel-fourier iter5b-β Phase 6 commit C: β refactor — OUTPUT lifecycle to CreateContext + CRIT-1 + CRIT-2 Strip OUTPUT-side V4L2 device-format lifecycle out of RequestCreateSurfaces2 entirely. Move S_FMT(OUTPUT), CAPTURE-format probe, cap_pool_init, per-surface destination_* fill into RequestCreateContext where config_id (and therefore the bound VAProfile) is known via config_object->pixelformat (wired by commit B). The α' multi-CreateSurfaces2-mid-stream failure mode disappears because β has no in-CreateSurfaces2 teardown branch; each context cycle does its own setup, DestroyContext handles teardown. Phase 5 v2 review amendments: - CRIT-1: removed video_format==NULL early-return at context.c:64-66 (would have rejected every first β CreateContext). - CRIT-2: added request_pool_destroy() to DestroyContext before REQBUFS(0). Pre-β only surface.c's resolution-change branch called request_pool_destroy; β strips that, so DestroyContext becomes the sole per-session teardown site. - IMP-1: probe CAPTURE format first to derive output_type from video_format->v4l2_mplane (eliminates the hardcoded mplane=true hack from the Phase 4 v2 plan). - IMP-2: surface_reset_format_cache() deleted (function + declaration in surface.h + call in DestroyContext + last_output_{width,height} fields in request.h). All dead under β. CreateSurfaces2 now ~50 LOC (was ~250). Pure surface ID allocation + per-surface lifecycle bookkeeping; no V4L2 device state touched. Signed-off-by: claude-noether <claude-noether@reauktion.de>	2026-05-12 14:41:35 +00:00
claude-noether	406d08e122	fresnel-fourier iter4 Phase 6 commit B: NEW src/vp9.c + src/vp9.h + meson.build + context.h (vp9_lf) + surface.h (params.vp9) VP9 codec dispatcher implementing 12 contract clauses against V4L2_CID_STATELESS_VP9_FRAME (0xa40a2c) + V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (0xa40a2d). 2 batched controls per frame; rkvdec on RK3399 mandatorily requires both per drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble:752. Implementation: - ~80 LOC VPX range coder (vp9_rac_) — minimal port of FFmpeg vpx_rac.[ch] + vp89_rac.h. Stateless static helpers. - inv_map_table[255] + read_prob_delta — verbatim copy from v4l2_request_vp9.c:44-97. - vp9_parse_uncompressed_header_lf_quant — partial parse for the fields VAAPI doesn't expose: lf_delta_enabled / lf_delta_update / lf_ref_delta[4] / lf_mode_delta[2] / base_q_idx / delta_q_y_dc / delta_q_uv_dc / delta_q_uv_ac. ~120 LOC. - vp9_fill_compressed_hdr — port of FFmpeg fill_compressed_hdr with Phase 5 C3 out_reference_mode parameter. ~140 LOC. - vp9_set_controls — orchestrates Clauses 1+2+4+5+7+10+11+12. ~120 LOC. Phase 5 amendments incorporated in code: - C1: frame.interpolation_filter = direct from VAAPI's mcomp_filter_type (NO XOR; vaapi_vp9.c:62 already applied it before storing into VAAPI's mcomp_filter_type). - C2: persistent vp9_lf state added to object_context (in context.h). Initialized to VP9 spec defaults {1,0,-1,-1,0,0} on keyframe / intra_only / error_resilient. Updated only when parser sees lf_delta.update=1. Always copied to kernel control. - C3: vp9_fill_compressed_hdr takes uint8_t out_reference_mode; threaded through call site. allowcompinter derived from VAAPI sign-bias bits. Phase 5 S4: uv_mode memcpy from FFmpeg's fill_compressed_hdr omitted — rkvdec reads uv_mode from kernel's persistent probability_tables, NOT from prob_updates ctrl. Clause 3 compile-time _Static_assert on struct sizes (168/2040) matches Phase 3 empirical baseline; UAPI shifts will fail loudly. 
surface.h: extends params union with vp9 { picture, slice }. context.h: adds vp9_lf { ref_deltas[4], mode_deltas[2], initialized }. meson.build: adds vp9.c + vp9.h. Build: clean on fresnel (linux-fresnel-fourier 7.0-1, libva 1.23). Runtime: not yet wired in picture.c — next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 06:46:11 +00:00
claude-noether	7f84bbb50f	fresnel-fourier iter3 Phase 6 commit C: picture.c VP8 dispatch + 4 buffer-type cases + new VAProbabilityBufferType outer case + per-frame reset + surface.h params.vp8 union extension Five sites in picture.c + one site in surface.h wire up the VP8 codec dispatcher introduced by commit B: 1. Include #include "vp8.h" in the codec headers block. 2. codec_set_controls: NEW case VAProfileVP8Version0_3 calling vp8_set_controls(driver_data, context, surface_object). Same shape as MPEG-2 + HEVC dispatch. 3. codec_store_buffer VAPictureParameterBufferType: NEW VP8 case memcpy'ing into surface_object->params.vp8.picture (sizeof VAPictureParameterBufferVP8). 4. codec_store_buffer VASliceParameterBufferType: NEW VP8 case memcpy'ing into surface_object->params.vp8.slice (single, no slices[] array — VP8 is frame-mode, no multi-slice). 5. codec_store_buffer VAIQMatrixBufferType: NEW VP8 case memcpy'ing into surface_object->params.vp8.iqmatrix + setting iqmatrix_set true. 6. codec_store_buffer NEW outer case VAProbabilityBufferType (Phase 5 C3: NOT VAProbabilityDataBufferType — that's the STRUCT name; the buffer-type enum constant is VAProbabilityBufferType = 13 per va.h:2058). Inner switch dispatches by profile, with VP8 case memcpy'ing into surface_object->params.vp8.probability + setting probability_set true. 7. RequestBeginPicture: NEW per-frame reset for the two VP8 flags — params.vp8.iqmatrix_set = false + params.vp8.probability_set = false. Mirrors the existing iter1 (h264.matrix_set) + iter2 (h265.num_slices) per-frame resets. surface.h extension: 8. params union: NEW vp8 struct after h265 — holds the 4 VAAPI buffer-type structs (VAPictureParameterBufferVP8, VASliceParameterBufferVP8, VAIQMatrixBufferVP8 + iqmatrix_set, VAProbabilityDataBufferVP8 + probability_set). The NEW vp8 union member adds ~5300 bytes (sizeof VAProbabilityDataBufferVP8 dominated by dct_coeff_probs[4][8][3][11] = 1056 + bookkeeping).
The h265 member with slices[64] array remains the largest (~17 KB), so the union size doesn't grow. After this commit: backend builds clean, links cleanly. mpv-vaapi VP8 decode should engage end-to-end on hantro env binding. Phase 1 criteria 1 + 2 + 3 expected satisfied; criterion 4 (HW=SW byte-identical) and criterion 5 (3-codec regression) verified at Phase 6 smoke + Phase 7. Refs: ../fresnel-fourier/phase4_iter3_plan.md (Commit C site list) ../fresnel-fourier/phase2_iter3_situation.md (B6, B7, B8, B9 bug enumeration) ../fresnel-fourier/phase5_iter3_review.md (C3 VAProbabilityBufferType rename empirically verified) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 22:52:24 +00:00
claude-noether	8d71e20bf7	fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC dispatch + per-slice accumulation against the modern split V4L2_CID_STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,DECODE_MODE,START_CODE} stateless controls. Replaces the staging-era V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that were removed from the kernel UAPI. Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS: 0xa40a90 SPS (40 bytes) 0xa40a91 PPS (64 bytes) 0xa40a92 SLICE_PARAMS (variable; dynamic-array; one entry per slice) 0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list) 0xa40a94 DECODE_PARAMS (328 bytes; per-frame DPB info) Plus device-wide menus set once at context.c init (separate batched S_EXT_CTRLS call so a kernel without HEVC controls — e.g. hantro on RK3568/RK3399 — silently fails its batch without invalidating H.264): 0xa40a95 DECODE_MODE (FRAME_BASED on rkvdec) 0xa40a96 START_CODE (ANNEX_B on rkvdec) Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565 (v4l2_request_hevc_queue_decode batched submission shape). Phase 5 review amendments incorporated: C1 (data_byte_offset NOT data_bit_offset): Old h265.c at lines 184-209 ran an 8-bit search to compute bit-granularity offset. New API renames the field to data_byte_offset (u32 byte offset). Bit-search dropped; replaced with plain byte offset = source_offset + slice->slice_data_byte_offset. C2 (dpb_entry.flags only LONG_TERM_REFERENCE; pic_order_cnt_val singular; poc_st_curr_[] arrays hold DPB INDICES not POC): h265_fill_decode_params replaces old slice-params DPB iteration with explicit DPB classification + index-array population. For each VAAPI ReferenceFrames[i]: - Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via VA_PICTURE_HEVC_RPS_ flags.
- Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic. - Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR. - Append j (DPB index, u8) to poc_st_curr_before[k] / poc_st_curr_after[k] / poc_lt_curr[k] based on classification. C3 (union-aliasing reasoning corrected): BeginPicture's params.h265.num_slices = 0 reset is benign for non-HEVC profiles because byte ~17764 of the params union is past any field non-HEVC profiles read, NOT because RenderPicture's per-buffer copies overwrite that location. Wording amended in phase4_iter2_plan.md per phase5_iter2_review.md. S1 (PPS flags 19 + 20 — DEBLOCKING_FILTER_CONTROL_PRESENT and UNIFORM_SPACING): Empirically VAAPI does NOT expose either flag in the VAPictureParameterBufferHEVC pic_fields.bits or slice_parsing_fields.bits. Both bits left zero. BBB-720p10s_hevc fixture uses neither tiles nor explicit deblocking-control parameters, so the omission is correct for the iter2 binding cell. S2 (3 PPS scalars added): pic_parameter_set_id (default 0; VAAPI doesn't expose), num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_active_minus1 (both populated from VAAPI picture struct). Q2 (slice_segment_addr populated): Was missing in old h265.c. Now sourced from VAAPI's slice->slice_segment_address. S3 (SCALING_MATRIX content choice): Implementer choice taken: when iqmatrix_set==false (BBB has no scaling list per SPS flags = SAO|STRONG_INTRA_SMOOTHING), h265_fill_scaling_matrix sends memset-zero. Matches FFmpeg's sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserves byte-equality vs cross-validator anchor). S4 (FFmpeg function name fix): cosmetic; no code impact. Plus one Phase 6 inline correction: phase 5 review S1 suggested VAAPI exposes uniform_spacing_flag in pic_fields.bits; empirical test-compile shows it doesn't. Comment added in h265_fill_pps documenting the omission. Picture.c changes (3 edits): 1.
codec_set_controls HEVCMain dispatch (lines 204-206 → call h265_set_controls; replaces explicit Fourier-local: HEVC stripped reject). 2. codec_store_buffer HEVC VASliceParameterBufferType case: append VAAPI slice param to params.h265.slices[N] array, increment num_slices. Single-slice mirror at .slice retained for h265_fill_pps (which reads dependent_slice_segment_flag from LongSliceFlags). 3. RequestBeginPicture: add params.h265.num_slices = 0 reset alongside existing h264.matrix_set = false reset. Surface.h: extend params.h265 struct with slices[HEVC_MAX_SLICES_PER_FRAME=64] array + num_slices counter. ~17 KB extra per surface union; 24 surfaces in iter7 cap_pool = ~400 KB total surface_heap growth. object_heap allocator picks up new size automatically via sizeof(struct object_surface). Context.c: separate 2-control batched call sets HEVC DECODE_MODE + START_CODE device-wide. Same best-effort (void)v4l2_set_controls pattern as the existing H.264 device-init block; if kernel doesn't advertise HEVC controls (hantro on RK3568/RK3399), the batch silently fails without invalidating the H.264 batch. Meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73) in sources + headers lists. H265.h: added HEVC_MAX_SLICES_PER_FRAME=64 #define before struct forward declarations. Phase 6 smoke test on fresnel (post Commit A + Commit B): Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec env binding (/dev/video1 + /dev/media0). PASS. Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4 -frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots ready. PASS.
Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture: HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5 SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5 HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656 SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656 HW=SW byte-identical for both frames; frame1 != frame2 (real motion). PASS. Criterion 5: regression hashes hold for both prior cells: H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH) H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH) MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH) MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH) PASS. All five criteria green on first build attempt — Phase 5 review caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset rename; dpb.rps field gone + pic_order_cnt_val rename + index-array semantics) that would have been Phase 6 compile failures or silent Phase 7 byte-compare divergences. Without that review pass, this commit would have been the start of a 2+ loopback debugging cycle. Refs: ../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses, File 4 patch shape) ../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2, S3, S4, Q2 amendments all incorporated) ../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/ffmpeg_v4l2req.stdout (cross-validator anchor — Phase 7 bonus byte-compare verification target) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:58:34 +02:00
test0r	b993355507	iter5 Track E: move LAST_OUTPUT_WIDTH/HEIGHT from process-global to per-driver-data Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate. Two libva driver_data instances in the same process (e.g. Firefox playing two tabs at different resolutions, or Firefox + mpv via the same dlopened backend) would race on the static cache. Move to struct request_data.last_output_width/height. The V4L2 device fd is already per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format). Verified: two concurrent mpv processes (2s stagger) both decode 300 frames cleanly with no cross-corruption. Same-instant init still hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend. Resolves the surface_reset_format_cache() callsite: now takes driver_data parameter (was zero-arg). Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls that the iter5 sweep left behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:05:41 +00:00
test0r	19acc76da4	iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE buffer. mpv reusing a surface for a new decode while the compositor still held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to write fresh decode output into the same physical memory the compositor was reading -- visible as stutter / back-and-forth swap on mpv --hwdec=vaapi --vo=gpu playback. Architecture: - New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t. - Surfaces no longer own buffers; each vaBeginPicture acquires the oldest FREE slot (LRU), binds it for the decode cycle, and the slot cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF). - Slot is released on next BeginPicture for the same surface or on vaDestroySurfaces. Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+): - Option-A statistical mitigation; race window narrows to "pool exhausted, force-recycle of oldest EXPORTED slot." For typical mpv 16-surface playback with MIN_CAP_POOL=24 the fallback never fires. - Multi-context concurrent use not addressed (one V4L2 device, multiple cap_pools -- iter3 scope). Other call sites updated: - picture.c::BeginPicture acquires + binds, releasing prior slot if any. - surface.c::SyncSurface marks slot DECODED after DQBUF. - surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR EXPBUF fd for force-recycle close(). - surface.c::DestroySurfaces releases via surface_unbind_slot; cap_pool owns the mmaps now. - surface.c::CreateSurfaces2 destroys the pool in the resolution-change path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS). - context.c::DestroyContext invokes cap_pool_destroy. 
- image.c::DeriveImage skips copy_surface_to_image when current_slot is NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces). Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly recycling slot indices, real luma gradient. mpv vaapi --vo=gpu operator-inspection follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:03:31 +00:00
test0r	06beef6248	iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path Fix 1 of iteration 2 per phase4_iter2_plan.md. Adds surface_reset_format_cache() exposed from src/surface.h. Called from RequestDestroyContext after the dual REQBUFS(0). Without this, multi-video Firefox sessions on mozilla.org corrupted the next session's CAPTURE format query: the kernel reset to defaults but our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,' so the next G_FMT returned 48x48 and the exported descriptor encoded wrong pitch/offset. Also adds REQBUFS(0) on CAPTURE in the resolution-change path of RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting; hantro derives CAPTURE format from OUTPUT format, so leftover CAPTURE buffers from the prior resolution would also block the implicit format change. Pre-existing bug surfaced by Sonnet's audit; Fix 3 pool refactor would have exposed it more often. Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/HEIGHT cache is a static process-global, so concurrent multi-context use still races (Sonnet 7.3 / 9.6). Iteration 2 only addresses sequential sessions. Multi-context safety is iteration 3+. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:11:03 +00:00
test0r	841f616e74	h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence VAAPI signals "explicit scaling lists are present in the bitstream" implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a VAIQMatrixBufferH264 alongside RenderPicture iff sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag. When the bitstream uses default (flat) scaling, no IQMatrixBuffer arrives and the in-tree h264.matrix struct stays zero-initialised. fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set, h265.iqmatrix_set) — the H.264 path was missing the equivalent flag, so set_controls always submitted the scaling matrix, including the zero-initialised case. Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed scaling_list_8x8, producing a zeroed CAPTURE buffer despite a successful decode round-trip (no V4L2_BUF_FLAG_ERROR, bytesused=3655712 reported). Earlier draft of this patch unconditionally omitted SCALING_MATRIX in FRAME_BASED. That's corpus-correct (bbb has no explicit scaling lists) but the wrong predicate: the kernel-side gating is by "matrix-supplied vs. not," not by decode mode. Streams that signal explicit scaling lists must submit SCALING_MATRIX in either mode. Contract verification (audit_0008_decode_params_2026-05-01.md + hantro_h264.c::assemble_scaling_list): the kernel uses the supplied matrix when SCALING_MATRIX is in the control batch and falls back to spec-defined defaults when absent. Mode-independent. This patch: - surface.h: adds bool matrix_set to params.h264, mirroring mpeg2.iqmatrix_set / h265.iqmatrix_set. - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case): sets matrix_set = true when the buffer arrives. - picture.c RequestBeginPicture: resets matrix_set = false at the start of each Begin/Render/End cycle.
- h264.c h264_set_controls: builds the controls[] array incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED. The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved — kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall not be set." Cross-reference: kernel UAPI section ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX (matrix supplied iff explicit scaling lists in bitstream) and hantro_h264.c::assemble_scaling_list (consumes supplied matrix or falls back to defaults). Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
test0r	c45fea96e3	fourier-local: stateless control modernization + HEVC strip Compound patch carrying the fork's pre-Step-1 substrate, originally authored by Jernej Škrabec / fourier on top of bootlin's `a3c2476`: - src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the passthrough shim). - include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h> (kernel-side HEVC controls now live in the canonical UAPI header). - src/meson.build: src/h265.c / src/h265.h commented out — HEVC build path is excluded from this fork (RK3568 hantro G1/G2 has no HEVC, and the kernel-side HEVC controls have a separate rework in flight upstream). - src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep the build linking). - include/h264-ctrls.h: removed (dead post-fourier — no source includes it; the passthrough shim's CID aliases live in the kernel header now). Functionally equivalent to the prior fork master commits: `c1f5108` V4L2_PIX_FMT_H264_SLICE rename `4ccbfe9` Strip HEVC build path `da9f2a5` include/h264-ctrls.h passthrough + CID aliases `fc4bb10` src/h264.c track upstream UAPI shape `13e9b64` src/h264.c drop num_slices field `4d14ffb` src/tiled_yuv.S aarch64 stub `1b02c9b` src/h264.c include utils.h Folded into one commit during 2026-05-04 Step 1 reconciliation (see ../phase0_evidence/2026-05-04/findings.md). Per-patch history of the early fork commits preserved on the pre-step1 branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 09:40:14 +00:00
Paul Kocialkowski	0c611c6b7a	Implement proper timestamping for references Reference frames are now identified using their timestamp: set the timestamp when queuing the output buffer and use it to identify the frame later on. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:41:56 +01:00
Paul Kocialkowski	518d7a0c59	Update and harmonize heading author lists Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:37:12 +01:00
Paul Kocialkowski	13eaae060e	Add support for H265 decoding, including predictive frames Some features are missing, such as scaling lists (quantization) and 10-bit output. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-08-31 10:13:52 +02:00
Paul Kocialkowski	5fd5c9823b	mpeg2: Update to match latest definitions Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-08-09 14:13:19 +02:00
Paul Kocialkowski	7d1ac10517	Add support for MPEG2 quantization matrices This adds support for MPEG2 quantization matrices, which are optional given that fallback default matrices are used (on the kernel side) when no such matrix is provided. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-25 14:36:36 +02:00
Paul Kocialkowski	c764527c17	Add support for QuerySurfaceAttributes Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-18 15:07:42 +02:00
Paul Kocialkowski	7587ef6901	surface: Add ExportSurfaceHandle support for dma-buf export This is the latest version of dma-buf export, that does support specifying DRM modifiers. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-18 15:02:37 +02:00
Paul Kocialkowski	829abae895	surface: Add basic support for CreateSurfaces2 Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-18 14:40:36 +02:00
Maxime Ripard	913e1e642c	tree: Rename the libva hooks As part of our renaming effort, rename the libva hook names to mention request instead of SunxiCedrus. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	2d1bce38c2	h264: Don't set num_slices anymore The num_slices parameter was improperly set to the number of reference frames, which is incorrect. Add a counter for the number of slices per surface, and set num_slices to that value. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 15:28:55 +02:00
Maxime Ripard	5aeb07f8bf	tree: Run clang-format to conform to the kernel coding style The coding style has been a bit erratic. Enforce the linux kernel coding style by reusing their .clang-format file, running clang-format on the source, and ignoring the few shortcomings that clang-format has at the moment (especially on aligning the define values). Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 10:12:15 +02:00
Maxime Ripard	fd263773cc	tree: Change the macros to take the actual arguments they are using Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-13 16:00:08 +02:00
Maxime Ripard	1efa9d877e	Add support for H264 decoding Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 17:07:15 +02:00
Maxime Ripard	d7d8fc744b	Abstract away MPEG2 support Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 17:07:15 +02:00
Paul Kocialkowski	9f2c069f76	Rework buffer management to be more generic and support untiled format Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 15:16:52 +02:00
Paul Kocialkowski	e23807f928	Add dummy vaPutSurface implementation As it turns out vaPutSurface is one of the required core functions. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-06-21 09:55:44 +02:00
Paul Kocialkowski	bb73d363a3	Sync with latest definitions from the Cedrus driver and requests API Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-06-21 09:30:06 +02:00
Paul Kocialkowski	c0a3cd8fcd	Remove X11 support with vaPutSurface Using VAAPI as a video output (through vaPutSurface) is deprecated and definitely not recommended for any use case. Since we're starting to support non-X11 pipelines, remove X11 support altogether. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-06-21 09:30:06 +02:00
Paul Kocialkowski	675b9e965e	surface: Remove unused surface_id object member Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 11:34:22 +02:00
Paul Kocialkowski	f872e345d0	Centralize buffer-related resources in surface object and avoid dynamic indexes Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 10:48:17 +02:00
Paul Kocialkowski	f70d3fd4d2	surface: Resolve various trivial build issues Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 09:12:47 +02:00
Paul Kocialkowski	e25b757b7e	Harmonize defines for headers include protections Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 16:40:07 +02:00
Paul Kocialkowski	a5354efe43	Rework comments by splitting them into README and removing redundant ones Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 16:40:00 +02:00
Paul Kocialkowski	621b26b781	surface: Rename functions arguments for more clarity Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 14:54:40 +02:00
Paul Kocialkowski	2ef39048c2	surface: Reorder object surface structure Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 12:00:04 +02:00
Paul Kocialkowski	d8a51f0cd4	Use libVA naming style for public API functions Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 11:23:10 +02:00
Paul Kocialkowski	97950176ad	surface: Use object surface structure directly instead of abstract type Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 10:56:00 +02:00
Florent Revest	e263c9542c	Adds a sunxi-cedrus-drv-video libVA backend This VA backend uses v4l2's Frame API proposal to interface with the "sunxi-cedrus" video driver on Allwinner SoC. Only a few parts of the code are really dependent on sunxi-cedrus and this VA backend could be reused for other v4l drivers using the Frame API.	2016-08-25 16:19:34 +02:00

41 Commits