Iteration 2 Fix 2: branch on bytesperline alignment when setting
the drm_format_modifier in RequestExportSurfaceHandle. For
non-64-byte-aligned pitches (e.g. 864 for 864-wide videos),
report DRM_FORMAT_MOD_INVALID instead of DRM_FORMAT_MOD_NONE
(LINEAR explicit). Mesa's WSI rejects LINEAR buffers that
aren't scanout-aligned with 'WSI pitch not properly aligned';
MOD_INVALID tells the importer to treat as texture-only, which
is the correct behavior for buffers that don't meet scanout
alignment requirements.
Diagnosis from operator's mozilla.org session in iteration 1
close: 864-wide intro videos triggered the WSI alignment error
and Firefox fell back to SW for those videos.
Sonnet Phase 5 review endorsed the conditional approach over a
universal MOD_INVALID change to preserve LINEAR semantics for
already-aligned content (avoids unnecessary perf cost on the
common 1920-wide case).
Verification path (Phase 7 of iteration 2): Firefox loads
mozilla.org main page; check no MESA WSI errors in stderr;
operator confirms intro videos engage HW decode (or at least
don't fall back).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix 1 of iteration 2 per phase4_iter2_plan.md.
Adds surface_reset_format_cache() exposed from src/surface.h. Called
from RequestDestroyContext after the dual REQBUFS(0). Without this,
multi-video Firefox sessions on mozilla.org corrupted the next
session's CAPTURE format query: the kernel reset to defaults but
our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,'
so the next G_FMT returned 48x48 and the exported descriptor
encoded wrong pitch/offset.
Also adds REQBUFS(0) on CAPTURE in the resolution-change path of
RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The
existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting;
hantro derives CAPTURE format from OUTPUT format, so leftover
CAPTURE buffers from the prior resolution would also block the
implicit format change. Pre-existing bug surfaced by Sonnet's
audit; Fix 3 pool refactor would have exposed it more often.
Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/
HEIGHT cache is a static process-global, so concurrent multi-
context use still races (Sonnet 7.3 / 9.6). Iteration 2 only
addresses sequential sessions. Multi-context safety is iteration 3+.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Firefox 150's RDD calls vaExportSurfaceHandle with the
VA_EXPORT_SURFACE_SEPARATE_LAYERS flag (per FFmpegVideoDecoder.cpp
GetVAAPISurfaceDescriptor at the libva-VAAPI export site). With
that flag, libva consumers expect 2 separate layers — Y as
DRM_FORMAT_R8, UV as DRM_FORMAT_GR88, each with num_planes=1 —
not the COMPOSED single-layer-with-2-planes shape we always
returned regardless of flags.
Our previous code ignored the flag parameter and always built
the COMPOSED descriptor. mpv works with that because mpv passes
the default (COMPOSED) flag and the shape matches. Firefox's
DMABufSurfaceYUV import code parsed our COMPOSED descriptor as
if it were SEPARATE, found bogus layer-1 data, silently fell
back to FFmpeg(FFVPX) software decode after frame 0.
Fix: branch on the flag and build the appropriate descriptor.
flags & VA_EXPORT_SURFACE_SEPARATE_LAYERS:
num_layers=2
layers[0] = Y as DRM_FORMAT_R8, num_planes=1
layers[1] = UV as DRM_FORMAT_GR88, num_planes=1
default (COMPOSED, including unflagged):
num_layers=1, drm_format=DRM_FORMAT_NV12, num_planes=2
(existing behavior, preserved for mpv et al.)
For the single-fd case (hantro NV12 backed by one CMA buffer),
both layers reference object_index=0 with different offsets and
pitches (both stride=1920 for 1920x1088).
Diagnosed via Firefox source dive (mozilla/gecko-dev master,
dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:1638) — the
explicit flag in the export call was the discriminator between
mpv's success and Firefox's silent SW fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds request_log on entry to:
- RequestSyncSurface
- RequestQuerySurfaceAttributes
- RequestQuerySurfaceStatus (including the returned status value)
- RequestDeriveImage
- RequestQueryImageFormats
- RequestGetImage
Goal: identify which API call Firefox 150 makes that returns
differently than it expects, causing the SW fallback after
frame 0. mpv works end-to-end with the surface-export fix in
place; Firefox does not. Per operator's correction: don't assume
mpv's success means the driver is correct — Firefox may detect
a real spec violation that mpv silently tolerates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The static SET_FORMAT_OF_OUTPUT_ONCE flag pinned the OUTPUT format
to the first call's dimensions, which for mpv's probe pattern means
128x128. Subsequent CreateSurfaces2 for the real 1920x1088 resolution
would then read CAPTURE format from the kernel (which derives from
the OUTPUT format) and get 128x128 sizes back — leading to a
VADRMPRIMESurfaceDescriptor with width=1920 height=1088 but pitch=128
offset=16384. Mesa's WSI rejected this as 'pitch too small,' and the
mpv vaapi --vo=gpu render landed on a solid blue frame. Same root
cause for Firefox 150's SW fallback after frame 0.
Replace SET_FORMAT_OF_OUTPUT_ONCE with LAST_OUTPUT_WIDTH/HEIGHT
tracking. When dimensions change, call REQBUFS(0) on OUTPUT to drop
any stale buffers (S_FMT is rejected by V4L2 while buffers exist),
then re-S_FMT at the new resolution. The kernel will derive the
new CAPTURE format from this OUTPUT format on the next CreateBufs
+ G_FMT cycle.
Caveat (TODO for next iteration): for consumers that legitimately
stream multiple resolutions in sequence (mid-stream resolution
change via V4L2_EVENT_SOURCE_CHANGE), the current approach still
requires CreateSurfaces2 to be called, which mpv does on probe.
A proper context-level redesign would handle SOURCE_CHANGE inline
with STREAMOFF + REQBUFS(0) + new S_FMT.
Diagnosis and root cause: surfaced by 2026-05-04 Phase 5 sonnet
review (finding 7.3) as a 'latent bug to document.' Today's
instrumentation captured it as the active bug — the
ExportSurfaceHandle dump showed pitch=128 for 1920x1088 surfaces
right before MESA reported 'WSI pitch too small' and dropped to
software.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Logs format_width/height + bytesperline + sizes from v4l2_get_format
in CreateSurfaces2, and the full VADRMPRIMESurfaceDescriptor in
ExportSurfaceHandle (fd, fourcc, width/height, num_objects/layers,
obj.size + drm_format/modifier, plane offsets/pitches).
Diagnostic for the surface-export bug surfaced by Phase 7 (mpv
--hwdec=vaapi --vo=gpu shows solid blue, Firefox falls back to SW
after frame 0 — both consumers GL-import the DMA-BUF, both fail
to render correctly while vaapi-copy works).
Phase 5 review (sonnet) suggested format_height might be 1080
(stream) vs 1088 (MB-aligned), miscomputing UV offset by 15360
bytes. Earlier ftrace shows kernel returns height=1088 — the
hypothesis is likely false but verifying in-driver to confirm.
Will compare with mpv --msg-level=vd=v --msg-level=vo=v output to
identify the import-side discrepancy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tier 3E + 3F observability hardening from the libva-multiplanar
campaign Phase 6 follow-up. Improves diagnostic reliability for
future probes; no functional decode path change.
Tier 3E (cache-fix): patch-0010's CAPTURE Y-plane dump now calls
msync(p, 32, MS_SYNC|MS_INVALIDATE) before the read so userspace
sees what the kernel actually DMA-wrote, not a stale CPU cache
line. Without this, the previous version of the dump consistently
showed the patch-0011 sentinel (0xab) even when the kernel had
overwritten it — caused half a day of mistaken "kernel never wrote
the buffer" diagnosis. Also computes a luma min/max/variance
signal so a uniform fill (variance=0) is visually obvious vs
real pixel data (variance > 0).
Tier 3F (VIDIOC_G_EXT_CTRLS readback): after v4l2_set_controls
in h264_set_controls, reads back DECODE_PARAMS + PPS via
v4l2_get_controls (added by patch 0003) on the request fd.
Logs key fields:
dec.idr_pic_id, poc_lsb, refmark_bits, poc_bits — confirms
slice-header parser outputs landed in the V4L2 control batch.
dec.top_foc / bot_foc — confirms
patch-0015 POC sentinel strip actually applied (should NOT
show 65536 unless the strip mis-fired).
dec.frame_num — cross-checks
against VAAPI's pre-parsed frame_num (also already logged by
patch 0014).
pps.flags + (SMP=...) — confirms
SCALING_MATRIX_PRESENT bit set this build.
pps.refidx_l0/l1 — confirms
Tier 1B num_ref_idx writes landed.
Discriminates "we wrote X but kernel saw Y" from "we wrote zero
all along" — the failure mode the original patch series didn't
catch when slice-header bit_size fields were left zero.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diagnostic-only patch (NOT for upstream). Hex-dumps:
- First 32 bytes of OUTPUT buffer at QBUF time in
picture.c::RequestEndPicture (i.e. what we feed the kernel)
- First 32 bytes of CAPTURE Y-plane after DQBUF in
surface.c::RequestSyncSurface (i.e. what kernel returned)
Lets us see whether:
- OUTPUT bitstream begins with valid ANNEX_B start code + NAL
header byte (e.g. `00 00 01 65` for IDR slice)
- CAPTURE Y-plane after decode contains varied luma data
(working) vs. all-zeros / repeating pattern (kernel didn't
write anything).
Removed once Step 1 decode is verified working. Output goes via
existing request_log() to stderr.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
The hantro stateless decoder derives the CAPTURE format from the
SPS attached to the per-request OUTPUT controls. Calling
VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can
leave the driver's vb2 state in an inconsistent configuration
where the queue accepts buffers and DQBUF returns successfully but
the kernel never actually writes decoded pixels into them.
Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/
gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side
(via gst_v4l2_decoder_negotiate_src_format and friends). The
same code path produces correctly-decoded NV12 frames on the
same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT
emits flat-green zeroed CAPTURE buffers.
The v4l2_get_format() call immediately after this block already
gives us the bytesperline / sizes the driver chose; nothing else
in this file consumed the explicit S_FMT side-effects.
Empirical hypothesis test for the lingering "kernel decodes
without errors but emits zeroed CAPTURE" bug. If post-patch
output shows actual picture content, this confirms the
diagnosis: explicit CAPTURE format mutation breaks hantro's
internal state. If output remains flat-green, the bug is
elsewhere and we resume hex-dump-grade instrumentation.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
Replaces the prior per-surface OUTPUT-buffer ownership model with a
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
in flight), allocated unconditionally regardless of caller's
num_render_targets.
Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
"output_count == surfaces_count" model was a category error: OUTPUT
buffers are request-time bitstream slots, CAPTURE buffers are
picture-time DPB slots; their lifecycles and sizing are independent.
Changes:
* NEW src/request_pool.{c,h} (~200 LoC):
- request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
- request_pool_destroy(): munmap all, idempotent.
- request_pool_acquire(): round-robin claim; returns V4L2 buffer
index of an unused slot or -1.
- request_pool_release(): mark slot free for reuse.
- request_pool_slot(): accessor for ptr/size given a buffer index.
* src/request.h: add struct request_pool output_pool to request_data.
* src/context.c::RequestCreateContext: replace the per-surface
OUTPUT loop with a single request_pool_init() call (count=4,
independent of surfaces_count). Drop the now-unused locals
(length, offset, source_data, output_buffers_count, index,
index_base, i, surface_object). DELETES patch 0002's
"output_buffers_count = ... ? ... : 4" hack inline — the pool's
own count parameter supersedes it.
* src/picture.c::RequestBeginPicture: borrow a pool slot at frame
start, write its mmap pointer/size/index into the surface's
transient source_* fields. The fields stay (still useful as
a borrow handle that the existing codec_store_buffer memcpys
target), but no longer represent surface-permanent ownership.
Reset slices_size/slices_count here too (was implicit on first
Render).
* src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
the OUTPUT buffer, release the pool slot and clear the surface's
borrow handle. Fixes the segv on second-frame submission.
* src/surface.c::RequestDestroySurfaces: remove the munmap of
source_data — pool owns the mmap.
* src/request.c::RequestTerminate: call request_pool_destroy()
before close(video_fd) so munmaps still target a valid fd.
* src/meson.build: add request_pool.c and request_pool.h to the
sources/headers lists.
This commit removes 0002's OUTPUT-pool hack inline (the
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
remains until commit 4 lands.
Build-verified clean on aarch64.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Fourier's local patch already wired multiplanar plumbing through
src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69,
struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane
EXPBUF loop at line 411) and through src/context.c, src/buffer.c,
src/picture.c via the v4l2_type_video_{output,capture}(video_format
->v4l2_mplane) helper calls.
The remaining gap: the NV12 entry in src/video.c was hardcoded to
v4l2_mplane=false, and the bootstrap path in src/surface.c was
hardcoded to singleplanar literals before video_format is populated.
This patch flips the NV12 entry to v4l2_mplane=true and updates the
two singleplanar literals in src/surface.c to their MPLANE variants:
- src/video.c:42 v4l2_mplane=false -> true (NV12 only;
Sunxi-tiled NV12 left at false for cedrus compatibility)
- src/surface.c:84 output_type = v4l2_type_video_output(true)
- src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12)
Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type
returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer
strace baseline).
Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are
left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false,
__arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the
known userbase here is RK3566/RK3568 hantro.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:
- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
(V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
(kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
build path is excluded from this fork (RK3568 hantro G1/G2 has
no HEVC, and the kernel-side HEVC controls have a separate
rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
includes it; the passthrough shim's CID aliases live in the
kernel header now).
Functionally equivalent to the prior fork master commits:
c1f5108 V4L2_PIX_FMT_H264_SLICE rename
4ccbfe9 Strip HEVC build path
da9f2a5 include/h264-ctrls.h passthrough + CID aliases
fc4bb10 src/h264.c track upstream UAPI shape
13e9b64 src/h264.c drop num_slices field
4d14ffb src/tiled_yuv.S aarch64 stub
1b02c9b src/h264.c include utils.h
Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The vaCreateSurface2 may be called multiple times, setting the format
again would lead to EBUSY being returned as you cannot change the
format if you have buffers allocated.
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Apparently, pixelformats are expected to be settable although
the reason is not exactly clear to me.
However, intel vaapi driver sets all its pixelformats as
settable, and gstreamer-vaapi expects that as well.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Because there might be more than a single call to CreateSurfaces,
we cannot assume that the index relative to the number of surfaces
requested in a single call matches the v4l2 index.
Grab the base index (as returned by the kernel) when allocating
buffers and use it for memory mapping and addressing them in v4l2.
This avoids memory-mapping the first (index 0) buffer multiple times
in that scenario instead of the n-th allocated buffer (in the n-th
call in the sequence).
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Since the V4L2 ioctl is called QUERYBUF, it makes more sense to
call the associated function with the same name.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
This is the latest version of dma-buf export, that does support
specifying DRM modifiers.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
void * can be assigned from and stored to any pointer type without any
warning. Remove the explicit casts.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The cedrus_data structure carries the old name. In order to migrate to the
new name, let's rename it to request_data.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The sunxi_cedrus.h header contains a bunch of defines prefixed with
SUNXI_CEDRUS.
As part as the ongoing migration to a more generic name, change that prefix
for V4L2_REQUEST, and the header file to request.h
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
As part of our renaming effort, Rename the libva hooks names to mention
request instead of SunxiCedrus
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The num_slices parameter was improperly set to the number of reference
frames, which is incorrect.
Add a counter for the number of slices per surface, and set num_slices to
that value.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The coding style has been a bit erratic. Enforce the linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
This long structure name makes it quite difficult to fit within the 80
characters limit. Shorten it.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Using VAAPI as a video output (through vaPutSurface) is deprecated and
definitely not recommended for any use case. Since we're starting to
support non-X11 pipelines, remove X11 support altogether.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>