Commit Graph

33 Commits

Author SHA1 Message Date
test0r b993355507 iter5 Track E: move LAST_OUTPUT_WIDTH/HEIGHT from process-global to per-driver-data
Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate.
Two libva driver_data instances in the same process (e.g. Firefox
playing two tabs at different resolutions, or Firefox + mpv via the
same dlopened backend) would race on the static cache.

Move to struct request_data.last_output_width/height. The V4L2
device fd is already per-driver_data, so this is the correct binding
unit (one fd, one current OUTPUT format).

Verified: two concurrent mpv processes (2s stagger) both decode
300 frames cleanly with no cross-corruption. Same-instant init still
hits kernel-level fd contention on /dev/video1 (hantro is a
single-instance device); cross-process serialization is out of scope
for a libva backend.

Resolves the surface_reset_format_cache() callsite: now takes
driver_data parameter (was zero-arg).

Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls
that the iter5 sweep left behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:05:41 +00:00
test0r 19acc76da4 iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.

Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
  (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
  {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
  oldest FREE slot (LRU), binds it for the decode cycle, and the slot
  cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
  vaDestroySurfaces.

Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
  exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
  16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
  cap_pools -- iter3 scope).

Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
  EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
  cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
  path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
  NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).

Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:03:31 +00:00
test0r 06beef6248 iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path
Fix 1 of iteration 2 per phase4_iter2_plan.md.

Adds surface_reset_format_cache() exposed from src/surface.h. Called
from RequestDestroyContext after the dual REQBUFS(0). Without this,
multi-video Firefox sessions on mozilla.org corrupted the next
session's CAPTURE format query: the kernel reset to defaults but
our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,'
so the next G_FMT returned 48x48 and the exported descriptor
encoded wrong pitch/offset.

Also adds REQBUFS(0) on CAPTURE in the resolution-change path of
RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The
existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting;
hantro derives CAPTURE format from OUTPUT format, so leftover
CAPTURE buffers from the prior resolution would also block the
implicit format change. Pre-existing bug surfaced by Sonnet's
audit; Fix 3 pool refactor would have exposed it more often.

Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/
HEIGHT cache is a static process-global, so concurrent multi-
context use still races (Sonnet 7.3 / 9.6). Iteration 2 only
addresses sequential sessions. Multi-context safety is iteration 3+.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:11:03 +00:00
test0r 841f616e74 h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives and the in-tree h264.matrix struct stays zero-initialised.

fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
so set_controls always submitted the scaling matrix, including the
zero-initialised case.

Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
bytesused=3655712 reported).

Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.

Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.

This patch:
  - surface.h: adds bool matrix_set to params.h264, mirroring
    mpeg2.iqmatrix_set / h265.iqmatrix_set.
  - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
    sets matrix_set = true when the buffer arrives.
  - picture.c RequestBeginPicture: resets matrix_set = false at the
    start of each Begin/Render/End cycle.
  - h264.c h264_set_controls: builds the controls[] array
    incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
    matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
    when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.

The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
not be set."

Cross-reference: kernel UAPI section
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
(matrix supplied iff explicit scaling lists in bitstream) and
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
falls back to defaults).

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r c45fea96e3 fourier-local: stateless control modernization + HEVC strip
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:

- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
  V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
  (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
  passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
  (kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
  build path is excluded from this fork (RK3568 hantro G1/G2 has
  no HEVC, and the kernel-side HEVC controls have a separate
  rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
  source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
  the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
  includes it; the passthrough shim's CID aliases live in the
  kernel header now).

Functionally equivalent to the prior fork master commits:
  c1f5108 V4L2_PIX_FMT_H264_SLICE rename
  4ccbfe9 Strip HEVC build path
  da9f2a5 include/h264-ctrls.h passthrough + CID aliases
  fc4bb10 src/h264.c track upstream UAPI shape
  13e9b64 src/h264.c drop num_slices field
  4d14ffb src/tiled_yuv.S aarch64 stub
  1b02c9b src/h264.c include utils.h

Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:40:14 +00:00
Paul Kocialkowski 0c611c6b7a Implement proper timestamping for references
Reference frames are now identified using their timestamp:
set the timestamp when queuing the output buffer and use it to identify
the frame later on.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:41:56 +01:00
Paul Kocialkowski 518d7a0c59 Update and harmonize heading author lists
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Paul Kocialkowski 13eaae060e Add support for H265 decoding, including predictive frames
Some features are missing, such as scaling lists (quantization) and
10-bit output.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-08-31 10:13:52 +02:00
Paul Kocialkowski 5fd5c9823b mpeg2: Update to match latest definitions
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-08-09 14:13:19 +02:00
Paul Kocialkowski 7d1ac10517 Add support for MPEG2 quantization matrices
This adds support for MPEG2 quantization matrices, which are optional
given that fallback default matrices are used (on the kernel side) when
no such matrix is provided.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-25 14:36:36 +02:00
Paul Kocialkowski c764527c17 Add support for QuerySurfaceAttributes
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-18 15:07:42 +02:00
Paul Kocialkowski 7587ef6901 surface: Add ExportSurfaceHandle support for dma-buf export
This is the latest version of dma-buf export, that does support
specifying DRM modifiers.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-18 15:02:37 +02:00
Paul Kocialkowski 829abae895 surface: Add basic support for CreateSurfaces2
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-18 14:40:36 +02:00
Maxime Ripard 913e1e642c tree: Rename the libva hooks
As part of our renaming effort, Rename the libva hooks names to mention
request instead of SunxiCedrus

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 2d1bce38c2 h264: Don't set num_slices anymore
The num_slices parameter was improperly set to the number of reference
frames, which is incorrect.

Add a counter for the number of slices per surface, and set num_slices to
that value.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard 5aeb07f8bf tree: Run clang-format to conform to the kernel coding style
The coding style has been a bit erratic. Enforce the linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 10:12:15 +02:00
Maxime Ripard fd263773cc tree: Change the macros to take the actual arguments they are using
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-13 16:00:08 +02:00
Maxime Ripard 1efa9d877e Add support for H264 decoding
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 17:07:15 +02:00
Maxime Ripard d7d8fc744b Abstract away MPEG2 support
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 17:07:15 +02:00
Paul Kocialkowski 9f2c069f76 Rework buffer management to be more generic and support untiled format
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 15:16:52 +02:00
Paul Kocialkowski e23807f928 Add dummy vaPutSurface implementation
As it turns out vaPutSurface is one of the required core functions.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-06-21 09:55:44 +02:00
Paul Kocialkowski bb73d363a3 Sync with latest definitions from the Cedrus driver and requests API
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-06-21 09:30:06 +02:00
Paul Kocialkowski c0a3cd8fcd Remove X11 support with vaPutSurface
Using VAAPI as a video output (through vaPutSurface) is deprecated and
definitely not recommended for any use case. Since we're starting to
support non-X11 pipelines, remove X11 support altogether.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-06-21 09:30:06 +02:00
Paul Kocialkowski 675b9e965e surface: Remove unused surface_id object member
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-25 11:34:22 +02:00
Paul Kocialkowski f872e345d0 Centralize buffer-related ressources in surface object and avoid dynamic indexes
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-25 10:48:17 +02:00
Paul Kocialkowski f70d3fd4d2 surface: Resolve various trivial build issues
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-25 09:12:47 +02:00
Paul Kocialkowski e25b757b7e Harmonize defines for headers include protections
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-23 16:40:07 +02:00
Paul Kocialkowski a5354efe43 Rework comments by splitting them into README and removing redundant ones
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-23 16:40:00 +02:00
Paul Kocialkowski 621b26b781 surface: Rename functions arguments for more clarity
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-23 14:54:40 +02:00
Paul Kocialkowski 2ef39048c2 surface: Reorder object surface structure
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-23 12:00:04 +02:00
Paul Kocialkowski d8a51f0cd4 Use libVA naming style for public API functions
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-23 11:23:10 +02:00
Paul Kocialkowski 97950176ad surface: Use object surface structure directly instead of abstract type
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-23 10:56:00 +02:00
Florent Revest e263c9542c Adds a sunxi-cedrus-drv-video libVA backend
This VA backend uses v4l2's Frame API proposal to interface with the
"sunxi-cedrus" video driver on Allwinner SoC. Only a few parts of the
code are really dependent on sunxi-cedrus and this VA backend could be
reused for other v4l drivers using the Frame API.
2016-08-25 16:19:34 +02:00