libva-v4l2-request-fourier

Author	SHA1	Message	Date
Markus Fritsche	a12d29937c	iter4 DEBUG: Y2 v3 — retry with TRY_EXT_CTRLS on S_EXT_CTRLS EINVAL Per kernel comment in v4l2-ctrls-api.c:222-224, S_EXT_CTRLS deliberately obfuscates by setting error_idx = count, while TRY_EXT_CTRLS reports the actual failing index. Adds TRY retry inside the EINVAL diagnostic path. Empirical finding (iter4 Phase 4): TRY also returned error_idx == count on the frame-11 EINVAL on bbb_1080p30. Conclusion: failure is in the post-validate cluster commit (hantro driver's try_ctrl op or similar state-coherence check), NOT in any individual control's std_validate. The kernel comment may be outdated for compound controls, or the H.264 stateless cluster is committed atomically post-validate where error_idx is intentionally not updated for either S or TRY. Path forward (Phase 4 next): switch from "read kernel source" to "diff our DECODE_PARAMS construction vs FFmpeg's libavcodec/v4l2_request_h264.c" to identify field-by-field divergence at frame 11. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:25:40 +00:00
Markus Fritsche	086b7ce8cb	iter3 DEBUG: S_EXT_CTRLS EINVAL diagnostic in v4l2_ioctl_controls When VIDIOC_S_EXT_CTRLS returns -EINVAL, log num_controls, error_idx, and per-control id+size. Lets iter3+ debug "Unable to set control(s): Invalid argument" failures by naming exactly which control set was rejected — previously the request_log line in v4l2_set_controls just printed strerror(errno) with no specificity. Used in iter3 Phase 7 to confirm the frame-11 EINVAL is request-level ("error_idx == num_controls" sentinel = kernel rejected but couldn't pinpoint a single field) rather than a single-control size mismatch. To remove at iter4 DEBUG sweep alongside iter1 ENTER/CAPTURE-dump instrumentation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:57:01 +00:00
Markus Fritsche	4a7a07e0f4	iter3 Fix: select() → poll() in media_request_wait_completion Firefox's RDD seccomp common policy admits poll/ppoll/epoll_* but does NOT admit select/pselect6. Under the iter3 sandbox-patched RDD process, our select(except_fds) call returned ENOSYS (Mozilla's seccomp uses SECCOMP_RET_ERRNO with ENOSYS for filtered syscalls — not SIGSYS), killing libva decode after just one BeginPicture. poll(POLLPRI) is functionally equivalent for waiting on the media request fd's exceptional-condition completion signal, and lives inside a syscall family Mozilla's sandbox already permits. Driver-side fix preferred over expanding Firefox's seccomp surface — smaller blast radius, portable across sandbox policies, and poll() is the modern API. Verified iter3 Phase 7 on ohm: with this change in place plus the firefox-fourier broker + seccomp ioctl '\|' patches, Firefox decodes through libva inside the sandboxed RDD without MOZ_DISABLE_RDD_SANDBOX=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:56:49 +00:00
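A rough sketch of the `select()` to `poll()` swap follows. It assumes the completion signal arrives as `POLLPRI` on the request fd, as the commit states; the function name `sketch_wait_request` and its return convention are hypothetical and not taken from `media_request_wait_completion`.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Hypothetical wrapper: returns 1 when POLLPRI fired, 0 on timeout,
 * -1 on error with errno set. timeout_ms == -1 blocks indefinitely. */
int sketch_wait_request(int request_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = request_fd, .events = POLLPRI, .revents = 0 };
    int ret;

    assert(timeout_ms >= -1);
    if (request_fd < 0) {
        errno = EBADF;
        return -1;
    }

    /* Restart when a signal interrupts the wait. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0)
        return ret;             /* 0 = timeout, -1 = poll() failure */
    if (pfd.revents & POLLPRI)
        return 1;               /* exceptional condition: request completed */

    errno = EIO;                /* POLLERR / POLLHUP / POLLNVAL */
    return -1;
}
```

Because `poll()` takes the fd directly rather than an `fd_set`, the sketch also avoids the `FD_SETSIZE` limit that `select()` carries.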
Markus Fritsche	19acc76da4	iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE buffer. mpv reusing a surface for a new decode while the compositor still held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to write fresh decode output into the same physical memory the compositor was reading -- visible as stutter / back-and-forth swap on mpv --hwdec=vaapi --vo=gpu playback. Architecture: - New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t. - Surfaces no longer own buffers; each vaBeginPicture acquires the oldest FREE slot (LRU), binds it for the decode cycle, and the slot cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF). - Slot is released on next BeginPicture for the same surface or on vaDestroySurfaces. Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+): - Option-A statistical mitigation; race window narrows to "pool exhausted, force-recycle of oldest EXPORTED slot." For typical mpv 16-surface playback with MIN_CAP_POOL=24 the fallback never fires. - Multi-context concurrent use not addressed (one V4L2 device, multiple cap_pools -- iter3 scope). Other call sites updated: - picture.c::BeginPicture acquires + binds, releasing prior slot if any. - surface.c::SyncSurface marks slot DECODED after DQBUF. - surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR EXPBUF fd for force-recycle close(). - surface.c::DestroySurfaces releases via surface_unbind_slot; cap_pool owns the mmaps now. - surface.c::CreateSurfaces2 destroys the pool in the resolution-change path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS). - context.c::DestroyContext invokes cap_pool_destroy. 
- image.c::DeriveImage skips copy_surface_to_image when current_slot is NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces). Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly recycling slot indices, real luma gradient. mpv vaapi --vo=gpu operator-inspection follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:03:31 +00:00
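The slot lifecycle and LRU pick described in this commit can be illustrated with a toy pool. This is a sketch under stated assumptions, not the contents of `cap_pool.{h,c}`: the `sketch_*` names, the 4-slot size, and the release-counter approach to "oldest FREE slot" are all hypothetical, and the force-recycle of EXPORTED slots is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define SKETCH_POOL_SLOTS 4   /* the commit sizes the real pool at max(surfaces_count, 24) */

enum sketch_slot_state { SLOT_FREE, SLOT_IN_DECODE, SLOT_DECODED, SLOT_EXPORTED };

struct sketch_pool {
    pthread_mutex_t lock;
    uint64_t clock;                                /* bumped on every release */
    enum sketch_slot_state state[SKETCH_POOL_SLOTS];
    uint64_t freed_at[SKETCH_POOL_SLOTS];          /* clock value at last release */
};

void sketch_pool_init(struct sketch_pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    p->clock = 0;
    for (int i = 0; i < SKETCH_POOL_SLOTS; i++) {
        p->state[i] = SLOT_FREE;
        p->freed_at[i] = 0;
    }
}

/* Pick the FREE slot that has been free the longest; -1 if none is free. */
int sketch_pool_acquire(struct sketch_pool *p)
{
    int best = -1;

    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < SKETCH_POOL_SLOTS; i++) {
        if (p->state[i] != SLOT_FREE)
            continue;
        if (best < 0 || p->freed_at[i] < p->freed_at[best])
            best = i;
    }
    if (best >= 0)
        p->state[best] = SLOT_IN_DECODE;
    pthread_mutex_unlock(&p->lock);
    return best;
}

/* Return a slot to the pool and stamp it as the most recently freed. */
void sketch_pool_release(struct sketch_pool *p, int slot)
{
    assert(slot >= 0 && slot < SKETCH_POOL_SLOTS);
    pthread_mutex_lock(&p->lock);
    p->state[slot] = SLOT_FREE;
    p->freed_at[slot] = ++p->clock;
    pthread_mutex_unlock(&p->lock);
}
```

The point of the LRU order is that a just-released slot goes to the back of the queue, so a consumer still reading its dma_buf gets the longest possible grace period before the memory is decoded into again.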
Markus Fritsche	e64bb0852d	iter2 Fix 2: conditional DRM_FORMAT_MOD_INVALID for non-64-aligned pitch Iteration 2 Fix 2: branch on bytesperline alignment when setting the drm_format_modifier in RequestExportSurfaceHandle. For non-64-byte-aligned pitches (e.g. 864 for 864-wide videos), report DRM_FORMAT_MOD_INVALID instead of DRM_FORMAT_MOD_NONE (LINEAR explicit). Mesa's WSI rejects LINEAR buffers that aren't scanout-aligned with 'WSI pitch not properly aligned'; MOD_INVALID tells the importer to treat as texture-only, which is the correct behavior for buffers that don't meet scanout alignment requirements. Diagnosis from operator's mozilla.org session in iteration 1 close: 864-wide intro videos triggered the WSI alignment error and Firefox fell back to SW for those videos. Sonnet Phase 5 review endorsed the conditional approach over a universal MOD_INVALID change to preserve LINEAR semantics for already-aligned content (avoids unnecessary perf cost on the common 1920-wide case). Verification path (Phase 7 of iteration 2): Firefox loads mozilla.org main page; check no MESA WSI errors in stderr; operator confirms intro videos engage HW decode (or at least don't fall back). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:18:55 +00:00
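The alignment branch can be sketched in a few lines. The function name `sketch_modifier_for_pitch` and the `SKETCH_SCANOUT_PITCH_ALIGN` constant are hypothetical; the 64-byte threshold is the one the commit message gives, and `DRM_FORMAT_MOD_LINEAR` is used here as the current spelling of the value the message calls `DRM_FORMAT_MOD_NONE`.

```c
#include <assert.h>
#include <stdint.h>
#include <drm/drm_fourcc.h>

#define SKETCH_SCANOUT_PITCH_ALIGN 64u   /* threshold quoted in the commit message */

/* Hypothetical helper: choose the modifier to report for an exported buffer. */
uint64_t sketch_modifier_for_pitch(uint32_t bytesperline)
{
    assert(bytesperline > 0);
    if (bytesperline % SKETCH_SCANOUT_PITCH_ALIGN != 0)
        return DRM_FORMAT_MOD_INVALID;   /* unaligned: let the importer decide */
    return DRM_FORMAT_MOD_LINEAR;        /* aligned: keep explicit LINEAR */
}
```

With the two pitches mentioned in the message, 1920 stays LINEAR (1920 = 30 × 64) while 864 is reported as INVALID (864 = 13.5 × 64).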
Markus Fritsche	06beef6248	iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path Fix 1 of iteration 2 per phase4_iter2_plan.md. Adds surface_reset_format_cache() exposed from src/surface.h. Called from RequestDestroyContext after the dual REQBUFS(0). Without this, multi-video Firefox sessions on mozilla.org corrupted the next session's CAPTURE format query: the kernel reset to defaults but our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,' so the next G_FMT returned 48x48 and the exported descriptor encoded wrong pitch/offset. Also adds REQBUFS(0) on CAPTURE in the resolution-change path of RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting; hantro derives CAPTURE format from OUTPUT format, so leftover CAPTURE buffers from the prior resolution would also block the implicit format change. Pre-existing bug surfaced by Sonnet's audit; Fix 3 pool refactor would have exposed it more often. Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/ HEIGHT cache is a static process-global, so concurrent multi- context use still races (Sonnet 7.3 / 9.6). Iteration 2 only addresses sequential sessions. Multi-context safety is iteration 3+. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:11:03 +00:00
Markus Fritsche	c036a44f98	image: fully populate VAImageFormat per VAAPI spec for NV12 QueryImageFormats and DeriveImage previously set only .fourcc and left byte_order, bits_per_pixel, depth, and color masks zero (uninitialized in the caller's buffer). VAAPI consumers that read these fields (FFmpeg's hwcontext_vaapi.c::vaapi_init_pixfmt, intel-vaapi-driver test paths) inherit caller-stack garbage with non-deterministic behavior. Cross-reference: Mesa's gallium/frontends/va/image.c and intel-vaapi-driver's i965_drv_video.c both publish NV12 with byte_order=VA_LSB_FIRST and bits_per_pixel=12. We now match. For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask are not meaningful (RGB-bitlayout-only fields); leave them zeroed via memset. Audit context: 2026-05-04 cross-reference of all libva entry points Firefox 150 calls vs our backend implementations. The SEPARATE_LAYERS fix (commit `ac891a0`) cleared the load-bearing bug; this fixes a latent uninitialized-field issue that was masked by mpv's tolerance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:34:50 +00:00
Markus Fritsche	ac891a01fa	surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle Firefox 150's RDD calls vaExportSurfaceHandle with the VA_EXPORT_SURFACE_SEPARATE_LAYERS flag (per FFmpegVideoDecoder.cpp GetVAAPISurfaceDescriptor at the libva-VAAPI export site). With that flag, libva consumers expect 2 separate layers — Y as DRM_FORMAT_R8, UV as DRM_FORMAT_GR88, each with num_planes=1 — not the COMPOSED single-layer-with-2-planes shape we always returned regardless of flags. Our previous code ignored the flag parameter and always built the COMPOSED descriptor. mpv works with that because mpv passes the default (COMPOSED) flag and the shape matches. Firefox's DMABufSurfaceYUV import code parsed our COMPOSED descriptor as if it were SEPARATE, found bogus layer-1 data, silently fell back to FFmpeg(FFVPX) software decode after frame 0. Fix: branch on the flag and build the appropriate descriptor. flags & VA_EXPORT_SURFACE_SEPARATE_LAYERS: num_layers=2 layers[0] = Y as DRM_FORMAT_R8, num_planes=1 layers[1] = UV as DRM_FORMAT_GR88, num_planes=1 default (COMPOSED, including unflagged): num_layers=1, drm_format=DRM_FORMAT_NV12, num_planes=2 (existing behavior, preserved for mpv et al.) For the single-fd case (hantro NV12 backed by one CMA buffer), both layers reference object_index=0 with different offsets and pitches (both stride=1920 for 1920x1088). Diagnosed via Firefox source dive (mozilla/gecko-dev master, dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:1638) — the explicit flag in the export call was the discriminator between mpv's success and Firefox's silent SW fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:32:12 +00:00
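The two descriptor shapes can be sketched with a simplified stand-in for `VADRMPRIMESurfaceDescriptor`. Everything named `sketch_*` or `SKETCH_*` is hypothetical: the real struct and the real `VA_EXPORT_SURFACE_SEPARATE_LAYERS` value live in the libva headers, and the UV offset of `pitch * height` assumes the single contiguous NV12 buffer the commit describes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <drm/drm_fourcc.h>

/* Stand-in flag bit for this sketch only; the real flag comes from va/va.h. */
#define SKETCH_EXPORT_SEPARATE_LAYERS 0x1u

struct sketch_layer {
    uint32_t drm_format;
    uint32_t num_planes;
    uint32_t object_index[2];
    uint32_t offset[2];
    uint32_t pitch[2];
};

struct sketch_descriptor {
    uint32_t num_layers;
    struct sketch_layer layers[2];
};

/* Fill the layer table for one NV12 buffer backed by a single object (index 0). */
void sketch_fill_nv12(struct sketch_descriptor *d, uint32_t flags,
                      uint32_t pitch, uint32_t height)
{
    uint32_t uv_offset = pitch * height;   /* UV plane follows the Y plane */

    assert(d != NULL);
    memset(d, 0, sizeof(*d));              /* object_index stays 0 everywhere */

    if (flags & SKETCH_EXPORT_SEPARATE_LAYERS) {
        d->num_layers = 2;
        d->layers[0].drm_format = DRM_FORMAT_R8;     /* Y */
        d->layers[0].num_planes = 1;
        d->layers[0].pitch[0] = pitch;
        d->layers[1].drm_format = DRM_FORMAT_GR88;   /* interleaved UV */
        d->layers[1].num_planes = 1;
        d->layers[1].offset[0] = uv_offset;
        d->layers[1].pitch[0] = pitch;
    } else {
        d->num_layers = 1;                           /* COMPOSED (default) */
        d->layers[0].drm_format = DRM_FORMAT_NV12;
        d->layers[0].num_planes = 2;
        d->layers[0].pitch[0] = pitch;
        d->layers[0].offset[1] = uv_offset;
        d->layers[0].pitch[1] = pitch;
    }
}
```

For a 1920x1088 surface with pitch 1920, both shapes place the UV data at byte offset 2088960; only the way the layers are labelled differs.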
Markus Fritsche	fdfee2d661	DEBUG: log SyncSurface RETURN to confirm clean exit before crash	2026-05-04 15:08:25 +00:00
Markus Fritsche	21ae311077	DEBUG: ENTER on CreateBuffer + BeginPicture for frame-1 crash narrowing	2026-05-04 14:43:29 +00:00
Markus Fritsche	92f5b254e6	DEBUG: ENTER on buffer/image entry points to localize Firefox RDD crash	2026-05-04 14:28:53 +00:00
Markus Fritsche	7da2b27454	DEBUG: ENTER logging at libva entry points to trace Firefox call flow Adds request_log on entry to: - RequestSyncSurface - RequestQuerySurfaceAttributes - RequestQuerySurfaceStatus (including the returned status value) - RequestDeriveImage - RequestQueryImageFormats - RequestGetImage Goal: identify which API call Firefox 150 makes that returns differently than it expects, causing the SW fallback after frame 0. mpv works end-to-end with the surface-export fix in place; Firefox does not. Per operator's correction: don't assume mpv's success means the driver is correct — Firefox may detect a real spec violation that mpv silently tolerates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:17:52 +00:00
Markus Fritsche	37c0e720fc	surface: re-set OUTPUT format on resolution change The static SET_FORMAT_OF_OUTPUT_ONCE flag pinned the OUTPUT format to the first call's dimensions, which for mpv's probe pattern means 128x128. Subsequent CreateSurfaces2 for the real 1920x1088 resolution would then read CAPTURE format from the kernel (which derives from the OUTPUT format) and get 128x128 sizes back — leading to a VADRMPRIMESurfaceDescriptor with width=1920 height=1088 but pitch=128 offset=16384. Mesa's WSI rejected this as 'pitch too small,' and the mpv vaapi --vo=gpu render landed on a solid blue frame. Same root cause for Firefox 150's SW fallback after frame 0. Replace SET_FORMAT_OF_OUTPUT_ONCE with LAST_OUTPUT_WIDTH/HEIGHT tracking. When dimensions change, call REQBUFS(0) on OUTPUT to drop any stale buffers (S_FMT is rejected by V4L2 while buffers exist), then re-S_FMT at the new resolution. The kernel will derive the new CAPTURE format from this OUTPUT format on the next CreateBufs + G_FMT cycle. Caveat (TODO for next iteration): for consumers that legitimately stream multiple resolutions in sequence (mid-stream resolution change via V4L2_EVENT_SOURCE_CHANGE), the current approach still requires CreateSurfaces2 to be called, which mpv does on probe. A proper context-level redesign would handle SOURCE_CHANGE inline with STREAMOFF + REQBUFS(0) + new S_FMT. Diagnosis and root cause: surfaced by 2026-05-04 Phase 5 sonnet review (finding 7.3) as a 'latent bug to document.' Today's instrumentation captured it as the active bug — the ExportSurfaceHandle dump showed pitch=128 for 1920x1088 surfaces right before MESA reported 'WSI pitch too small' and dropped to software. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:05:16 +00:00
Markus Fritsche	2517a1206b	DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis Logs format_width/height + bytesperline + sizes from v4l2_get_format in CreateSurfaces2, and the full VADRMPRIMESurfaceDescriptor in ExportSurfaceHandle (fd, fourcc, width/height, num_objects/layers, obj.size + drm_format/modifier, plane offsets/pitches). Diagnostic for the surface-export bug surfaced by Phase 7 (mpv --hwdec=vaapi --vo=gpu shows solid blue, Firefox falls back to SW after frame 0 — both consumers GL-import the DMA-BUF, both fail to render correctly while vaapi-copy works). Phase 5 review (sonnet) suggested format_height might be 1080 (stream) vs 1088 (MB-aligned), miscomputing UV offset by 15360 bytes. Earlier ftrace shows kernel returns height=1088 — the hypothesis is likely false but verifying in-driver to confirm. Will compare with mpv --msg-level=vd=v --msg-level=vo=v output to identify the import-side discrepancy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:00:13 +00:00
Markus Fritsche	6be3f3b120	h264: rate-limit V4L2 readback EACCES warning to once per process Hantro+v4l2 on Linux 6.19.x returns EACCES from VIDIOC_G_EXT_CTRLS on a request_fd in QUEUEING state for compound H.264 controls. Not actionable from userspace — kernel-side permission check whose semantics aren't yet investigated. Decode is unaffected (SET-side write succeeds; we just can't verify via readback from this rig). Logging the failure once per process instead of per-frame keeps the diagnostic message visible without flooding stderr. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 13:00:49 +00:00
Markus Fritsche	a047926dbc	DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback Tier 3E + 3F observability hardening from the libva-multiplanar campaign Phase 6 follow-up. Improves diagnostic reliability for future probes; no functional decode path change. Tier 3E (cache-fix): patch-0010's CAPTURE Y-plane dump now calls msync(p, 32, MS_SYNC\|MS_INVALIDATE) before the read so userspace sees what the kernel actually DMA-wrote, not a stale CPU cache line. Without this, the previous version of the dump consistently showed the patch-0011 sentinel (0xab) even when the kernel had overwritten it — caused half a day of mistaken "kernel never wrote the buffer" diagnosis. Also computes a luma min/max/variance signal so a uniform fill (variance=0) is visually obvious vs real pixel data (variance > 0). Tier 3F (VIDIOC_G_EXT_CTRLS readback): after v4l2_set_controls in h264_set_controls, reads back DECODE_PARAMS + PPS via v4l2_get_controls (added by patch 0003) on the request fd. Logs key fields: dec.idr_pic_id, poc_lsb, refmark_bits, poc_bits — confirms slice-header parser outputs landed in the V4L2 control batch. dec.top_foc / bot_foc — confirms patch-0015 POC sentinel strip actually applied (should NOT show 65536 unless the strip mis-fired). dec.frame_num — cross-checks against VAAPI's pre-parsed frame_num (also already logged by patch 0014). pps.flags + (SMP=...) — confirms SCALING_MATRIX_PRESENT bit set this build. pps.refidx_l0/l1 — confirms Tier 1B num_ref_idx writes landed. Discriminates "we wrote X but kernel saw Y" from "we wrote zero all along" — the failure mode the original patch series didn't catch when slice-header bit_size fields were left zero. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:58:52 +00:00
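The luma min/max/variance signal from Tier 3E can be sketched as below. The struct and function names are hypothetical, the cache-invalidation step (`msync`) is deliberately not shown, and the variance is the plain population variance computed with integer sums so that a uniform fill yields exactly 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_luma_stats {
    uint8_t min;
    uint8_t max;
    double variance;   /* population variance of the sampled bytes */
};

/* Hypothetical helper: summarise n luma bytes so a uniform sentinel fill
 * (variance == 0) is distinguishable from real pixel data (variance > 0). */
struct sketch_luma_stats sketch_luma_stats(const uint8_t *y, size_t n)
{
    struct sketch_luma_stats s = { 255, 0, 0.0 };
    uint64_t sum = 0, sum_sq = 0;

    assert(y != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (y[i] < s.min) s.min = y[i];
        if (y[i] > s.max) s.max = y[i];
        sum += y[i];
        sum_sq += (uint64_t)y[i] * y[i];
    }
    /* Var = E[x^2] - E[x]^2 = (n*sum_sq - sum^2) / n^2, kept in integers
     * until the final division to avoid rounding noise. */
    s.variance = (double)((uint64_t)n * sum_sq - sum * sum)
                 / ((double)n * (double)n);
    return s;
}
```

On a 32-byte window filled with the `0xab` sentinel this reports min = max = 0xab and variance 0, which is the "kernel never wrote the buffer" signature the commit wants to make obvious.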
Markus Fritsche	9de1be34ef	h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields The load-bearing fix from diff_against_ffmpeg.md (campaign repo). Adds src/h264_slice_header.{c,h} — a minimal H.264 slice_header() bit-parser per ITU-T H.264 (08/2024) §7.3.3. Parses just enough of the slice header to populate the V4L2 DECODE_PARAMS fields VAAPI doesn't carry and that hantro G1 hardware reads directly out of DECODE_PARAMS into MMIO registers: dec_param->dec_ref_pic_marking_bit_size -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN dec_param->idr_pic_id -> G1_REG_DEC_CTRL5_IDR_PIC_ID dec_param->pic_order_cnt_bit_size -> G1_REG_DEC_CTRL6_POC_LENGTH dec_param->pic_order_cnt_lsb -> hantro reflist builder (poc_type=0) dec_param->delta_pic_order_cnt_bottom -> same dec_param->delta_pic_order_cnt0/1 -> hantro reflist builder (poc_type=1) Without these set correctly, hantro's hardware bitstream parser walks past zero bits in the slice header, lands on garbage, decodes zero pixels — the all-zero CAPTURE output observed across both mpv and Firefox during 2026-05-04 Phase 0 (see libva-multiplanar campaign phase0_evidence/2026-05-04-kernel-trace/findings.md). Implementation: - Minimal RBSP bit reader (br_read_u/_ue/_se), MSB-first, fault-flag on overrun. - Emulation-prevention unescape (strips 0x03 after 0x00 0x00) on the first 64 bytes of the slice — slice headers fit comfortably. - Walks slice_header() up to and including dec_ref_pic_marking(), measuring bit positions for the *_bit_size fields. - Skips ref_pic_list_modification() and pred_weight_table() — needed only to advance the bit position to dec_ref_pic_marking(). - Returns a struct with the V4L2 fields plus diagnostics (first_mb_in_slice, slice_type, pps_id, frame_num). Wired into h264_va_picture_to_v4l2 (src/h264.c) right after the nal_ref_idc/nal_unit_type extraction. 
SPS/PPS context is built from VAPicture's seq_fields and pic_fields; num_ref_idx_l0/l1_active defaults come from VASlice (best available substitute for the parsed PPS values). On parse success, populates decode_params with the recovered values + emits a request_log with the decoded fields for cross-validation against VAAPI's pre-parsed values. src/meson.build: adds h264_slice_header.{c,h} to sources. Cross-references: - FFmpeg libavcodec/h264_slice.c (Kwiboo v4l2-request-n8.1) — populates H264SliceContext::ref_pic_marking_bit_size / pic_order_cnt_bit_size by the same bit-precise parse, then v4l2_request_h264.c forwards to V4L2. - Linux drivers/media/platform/verisilicon/hantro_g1_h264_dec.c set_params() — the register-write code that reads these fields. MVC nal_unit_type 20/21 unhandled (this fork strips MVC alongside HEVC). Multi-slice non-IDR streams parse the first slice's header only; for FRAME_BASED mode that's fine — kernel sees the whole bitstream and parses subsequent slices itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:34:47 +00:00
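The building blocks named in this commit (emulation-prevention unescape, an MSB-first bit reader with a fault flag, Exp-Golomb `ue(v)`/`se(v)`) can be sketched as follows. The function names `br_read_u`, `br_read_ue`, and `br_read_se` are taken from the message, but the bodies, the `struct sketch_bitreader` layout, and `sketch_rbsp_unescape` are a reconstruction under assumptions, not the code in `src/h264_slice_header.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_bitreader {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;      /* current bit position; differences give *_bit_size */
    int fault;       /* set once a read runs past the end */
};

/* Copy in[] to out[], dropping each 0x03 that follows two 0x00 bytes. */
size_t sketch_rbsp_unescape(const uint8_t *in, size_t in_len,
                            uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    unsigned zeros = 0;

    for (size_t i = 0; i < in_len && n < out_cap; i++) {
        if (zeros >= 2 && in[i] == 0x03) {
            zeros = 0;                    /* emulation-prevention byte */
            continue;
        }
        zeros = (in[i] == 0x00) ? zeros + 1 : 0;
        out[n++] = in[i];
    }
    return n;
}

/* Read n bits (n <= 32), most significant bit first. */
uint32_t br_read_u(struct sketch_bitreader *br, unsigned n)
{
    uint32_t v = 0;

    assert(n <= 32);
    while (n--) {
        if (br->pos >= br->size_bits) {
            br->fault = 1;
            return 0;
        }
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
        br->pos++;
    }
    return v;
}

/* Unsigned Exp-Golomb: count leading zeros, then read that many suffix bits. */
uint32_t br_read_ue(struct sketch_bitreader *br)
{
    unsigned zeros = 0;

    while (br_read_u(br, 1) == 0) {
        if (br->fault || ++zeros > 31) {
            br->fault = 1;
            return 0;
        }
    }
    return ((1u << zeros) - 1u) + br_read_u(br, zeros);
}

/* Signed Exp-Golomb: odd codes map to positive values, even to negative. */
int32_t br_read_se(struct sketch_bitreader *br)
{
    uint32_t k = br_read_ue(br);

    return (k & 1u) ? (int32_t)((k + 1u) / 2u) : -(int32_t)(k / 2u);
}
```

A `*_bit_size` field is then just `br.pos` after a syntax element minus `br.pos` before it, which is why the reader tracks an absolute bit position rather than a byte pointer.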
Markus Fritsche	d41a4b96b3	h264: always submit SCALING_MATRIX + populate pps num_ref_idx Three Tier-2C/1B fixes from diff_against_ffmpeg.md (campaign repo): 1. Submit V4L2_CID_STATELESS_H264_SCALING_MATRIX every frame, with the H.264 spec flat default (every entry = 16) when the consumer didn't send a VAIQMatrixBufferH264. New helper: h264_default_flat_scaling_matrix(). Mirrors FFmpeg's v4l2_request_h264.c which always provides a scaling matrix. Replaces patch 0012's VAIQMatrixBuffer-conditional submission — that was corpus-correct (bbb has no explicit scaling lists) but inconsistent with what hantro G1 expects. 2. Set pps->flags \|= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT unconditionally. Hantro G1's set_params reads this flag to gate G1_REG_DEC_CTRL2_TYPE1_QUANT_E. 3. Populate pps->num_ref_idx_l0/l1_default_active_minus1 from VASliceParameterBufferH264.num_ref_idx_l*_active_minus1. Hantro G1 writes both into G1_REG_DEC_CTRL6_REFIDX0_ACTIVE / REFIDX1_ACTIVE. VAAPI doesn't expose the parsed-PPS default fields; the per-slice override is the closest available source (matches PPS default except on streams with explicit per-slice override). Why now: 2026-05-04 Phase 0 kernel-side audit (kernel source drivers/media/platform/verisilicon/hantro_g1_h264_dec.c) showed hantro G1 writes these fields directly into hardware MMIO registers. Prior assumption that they're "informational" or that "VAAPI handles defaults" was wrong — the hardware uses them to bit-walk the slice header and to size reference lists. See ~/src/libva-multiplanar/diff_against_ffmpeg.md. This is the easy half of the fix. The load-bearing half — adding a slice-header bit-parser to populate dec_param->dec_ref_pic_ marking_bit_size, idr_pic_id, pic_order_cnt_bit_size — comes in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:21:23 +00:00
Markus Fritsche	74b3793e3c	STUDY.md: pointer to libva-multiplanar campaign Phase 0 The Phase 0 / Phase 2 substrate that previously lived in this fork's STUDY.md has been migrated to the campaign-level phase0_findings.md at ../phase0_findings.md. This file is a pointer only. Note: after the 2026-05-04 Step 1 reconciliation (resetting fork master to bootlin `a3c2476` and replaying the marfrit-packages 18-patch series as commits), the historical commit referenced as `e0acc33` (STUDY.md phase 2 finding) lives only on the pre-step1 branch. To recover the historical content: 'git show pre-step1:STUDY.md'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 09:45:45 +00:00
Markus Fritsche	8594d74275	h264: derive sps.level_idc from H.264 Annex A.3 MaxFS Replaces patch 0013's hardcoded level_idc = 51 with a small lookup that picks the smallest level whose MaxFS contains the encoded frame size. Patch 0013's TODO is resolved by this change. VAAPI does not expose level_idc on the decode side (VAPictureParameterBufferH264 has no such field; only VAEncSequenceParameterBufferH264 carries it). The H.264 SPS NAL is parsed client-side by ffmpeg-vaapi and only slice data forwards in VASliceDataBuffer, so a SPS-NAL byte parser is not viable from the bitstream the libva-v4l2-request layer receives. We therefore derive level_idc from picture dimensions, which VAAPI does provide in VAPictureParameterBufferH264.picture_{width,height}_in_mbs_minus1. Annex A.3 (Table A-1) MaxFS thresholds: Level 1.0: 99 MBs ( 176×144 = 11×9 = 99 ) Level 1.1: 396 ( 352×288 = 22×18 = 396 ) Level 2.0: 396 Level 2.1: 792 ( 352×576 / 720×288 ) Level 2.2: 1620 ( 720×480 ≈ 1350; 720×576 = 1620 ) Level 3.0: 1620 Level 3.1: 3600 (1280×720 ≈ 3600 ) Level 3.2: 5120 Level 4.0: 8192 (1920×1088 = 8160 fits ) Level 4.1: 8192 Level 4.2: 8704 Level 5.0: 22080 Level 5.1: 36864 (3840×2176 = 32640 fits; 4K@8K-edge ) Level 5.2: 36864 Level 6.0: 139264 (8K ) V4L2 control encoding: level_idc = (level major × 10) + (level minor). Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60. Picks for typical content: 1080p (1920×1088 = 8160 MBs) → Level 4.1 (level_idc = 41) 4K (3840×2176 = 32640 MBs) → Level 5.1 (level_idc = 51) 8K (7680×4352 = 130560 MBs) → Level 6.0 (level_idc = 60) The previous hardcode of 51 was over-allocating for 1080p; with this patch hantro can pre-allocate based on the actual frame size. For our ohm corpus (1080p) this drops the requested DPB / MV buffer sizing from level-5.1 generosity to level-4.1 right-sized. Without VAAPI exposing framerate we cannot also check MaxMBPS / MaxBR / MaxCPB. 
The frame-size-based pick is acceptable in practice: temporally-dense streams almost always also push spatially-large frames, so MaxFS captures the dominant resource-sizing signal. Cross-reference: H.264 spec Annex A, Table A-1 ("Level limits"). ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists level_idc as required-userspace-input, no kernel-derives annotation. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
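A literal reading of "smallest level whose MaxFS contains the frame size" can be sketched as a table scan. The names below are hypothetical and the table only lists the levels quoted in the message. Note that this literal scan returns 40 for 1080p, because Levels 4.0 and 4.1 share MaxFS = 8192, whereas the message lists Level 4.1 as the pick; how the actual patch breaks that tie is not visible from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_level_limit {
    uint8_t level_idc;   /* major * 10 + minor */
    uint32_t max_fs;     /* MaxFS in macroblocks, per the commit's table */
};

static const struct sketch_level_limit SKETCH_LEVELS[] = {
    { 10,     99 }, { 11,    396 }, { 20,    396 }, { 21,    792 },
    { 22,   1620 }, { 30,   1620 }, { 31,   3600 }, { 32,   5120 },
    { 40,   8192 }, { 41,   8192 }, { 42,   8704 }, { 50,  22080 },
    { 51,  36864 }, { 52,  36864 }, { 60, 139264 },
};

/* Hypothetical helper: smallest listed level whose MaxFS covers the frame.
 * Returns 0 when the frame exceeds every listed level. */
uint8_t sketch_level_idc_for_mbs(uint32_t width_mbs, uint32_t height_mbs)
{
    uint32_t frame_mbs;

    assert(width_mbs > 0 && height_mbs > 0);
    frame_mbs = width_mbs * height_mbs;
    for (size_t i = 0; i < sizeof(SKETCH_LEVELS) / sizeof(SKETCH_LEVELS[0]); i++) {
        if (frame_mbs <= SKETCH_LEVELS[i].max_fs)
            return SKETCH_LEVELS[i].level_idc;
    }
    return 0;
}
```

The inputs are macroblock counts, so they would come from `picture_width_in_mbs_minus1 + 1` and `picture_height_in_mbs_minus1 + 1` in the VAAPI picture parameters.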
Markus Fritsche	b0a93e4683	h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id` — the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655, v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum (equation 8-28) for short-term references or LongTermPicNum (equation 8-29) for long-term references. The surface id has no relationship to either. Kernel-side consumers of pic_num: - mediatek/decoder/vdec/vdec_h264_req_common.c (line 210): dst_entry->pic_num = src_entry->pic_num. Used for field-coded short-term reference disambiguation. - hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read pic_num. They resolve refs via reference_ts (timestamp) and POC. This is why fourier's wrong value never surfaced on RK3568 hantro. This patch makes pic_num spec-correct so the libva-v4l2-request fork is upstreamable across drivers without depending on each target's tolerance for non-spec fills. Computation, derived from H.264 spec section 8.2.4.1: For frames (not field-coded), PicNum = FrameNumWrap. FrameNumWrap = (frame_num > cur_frame_num) ? frame_num - max_frame_num : frame_num max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4) cur_frame_num = current picture's frame_num For long-term references: LongTermPicNum = long_term_frame_idx (when not field-coded). VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64): VAPictureH264.frame_idx = long_ref ? pic_id : frame_num So long-term refs already carry long_term_frame_idx in frame_idx; we copy it through. Field-coded streams require an extra factor-of-2 plus a parity adjustment per spec equations 8-28/8-29; this patch does not handle field-coded content. ohm corpus is all frame-coded so this is a follow-up for later. Implementation: add VAPicture parameter to h264_fill_dpb so the function has access to seq_fields.log2_max_frame_num_minus4 and the current picture's frame_num. 
Update the single caller in h264_va_picture_to_v4l2. Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c line 210. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
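The frame-coded `PicNum` computation quoted in this commit translates almost directly into C. The function names are hypothetical, and, like the patch, the sketch ignores field-coded content.

```c
#include <assert.h>
#include <stdint.h>

/* FrameNumWrap for a short-term reference in a frame-coded stream,
 * following the formula quoted in the commit message. */
int32_t sketch_frame_num_wrap(uint32_t frame_num, uint32_t cur_frame_num,
                              uint32_t log2_max_frame_num_minus4)
{
    int32_t max_frame_num;

    assert(log2_max_frame_num_minus4 <= 12);   /* spec range is 0..12 */
    max_frame_num = 1 << (log2_max_frame_num_minus4 + 4);
    if (frame_num > cur_frame_num)
        return (int32_t)frame_num - max_frame_num;
    return (int32_t)frame_num;
}

/* Hypothetical helper: pic_num for one DPB entry. frame_idx follows the
 * VAAPI convention from the message: it holds long_term_frame_idx for
 * long-term references and frame_num otherwise. */
int32_t sketch_dpb_pic_num(int is_long_term, uint32_t frame_idx,
                           uint32_t cur_frame_num,
                           uint32_t log2_max_frame_num_minus4)
{
    if (is_long_term)
        return (int32_t)frame_idx;             /* LongTermPicNum */
    return sketch_frame_num_wrap(frame_idx, cur_frame_num,
                                 log2_max_frame_num_minus4);
}
```

With `log2_max_frame_num_minus4 = 0` (MaxFrameNum = 16), a reference with `frame_num = 15` seen from a current picture with `frame_num = 1` wraps to PicNum = -1, which is the case a raw surface id can never represent.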
Markus Fritsche	05ffd02ff2	h264: derive PFRAME / BFRAME flags from VASlice slice_type v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2- request relied on each backing driver tolerating frame-class ambiguity. Kernel survey (linux 6.19.x): - tegra-vde/h264.c (lines 783-799) consumes both flags to select the inter-frame decode kernel. Without them the I-frame kernel runs on P/B content. - visl-trace-h264.h uses them for decode tracing. - hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do not consume the flags. Hantro on ohm decoded bbb cleanly without these flags set (see phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is an upstreamability fix for cross-driver portability rather than a correctness fix for hantro. VAAPI's VASliceParameterBufferH264.slice_type maps directly to the H.264 slice_header() slice_type field. Per spec 7.4.3: 0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have this slice_type." `slice_type % 5` recovers the underlying type in either encoding form. In FRAME_BASED mode we only see surface->params.h264.slice from the most-recent VASliceParameterBuffer — that's fine: a single coded picture has a uniform slice_type for the purposes of the PFRAME / BFRAME flag (multi-slice frames may mix slice types in some streams, but the flag's semantic is "this is an inter-coded frame," which holds if any slice is P or B; using the last-seen slice's type is a reasonable approximation). Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters Flags table. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
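The `slice_type % 5` mapping can be sketched as below. The flag constants are stand-ins for this illustration only (the real PFRAME/BFRAME bits are defined in the kernel's V4L2 control headers), the function name is hypothetical, and leaving SP slices unflagged is an assumption, since the message only discusses P and B.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag bits for this sketch; not the kernel's values. */
#define SKETCH_FLAG_PFRAME (1u << 0)
#define SKETCH_FLAG_BFRAME (1u << 1)

/* Hypothetical helper: map a slice_header() slice_type (0..9) to a
 * frame-class flag. Values 5..9 alias 0..4, hence the modulo. */
uint32_t sketch_frame_class_flags(uint8_t slice_type)
{
    assert(slice_type <= 9);
    switch (slice_type % 5) {
    case 0:
        return SKETCH_FLAG_PFRAME;   /* P */
    case 1:
        return SKETCH_FLAG_BFRAME;   /* B */
    default:
        return 0;                    /* I, SP, SI: unflagged in this sketch */
    }
}
```

The result would be OR-ed into the decode-params flags alongside whatever IDR or field bits are already set.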
Markus Fritsche	fdb0b728d7	h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2 ROOT CAUSE for "kernel decodes successfully but produces zeroed CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR": ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) = 0x10000 as a sentinel for "uninitialised": libavcodec/h264dec.c:301 — global init in ff_h264_decode_init libavcodec/h264dec.c:444 — IDR reset in idr() helper ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes pc->poc_msb = pc->prev_poc_msb whenever the slice header's pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which is the typical case for any normal H.264 content with sane POC ordering). The sentinel leaks into field_poc[] (line 305) and from there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78). Empirical confirmation via meitner 2026-05-02 ground-truth test: ran an LD_PRELOAD shim around vaCreateBuffer against an i965 VAAPI backend decoding a 60-frame H.264 Main clip. Every frame showed TopFieldOrderCnt = (POC \| 0x10000): Frame 1 IDR: raw bytes "00 00 01 00" at offset 12 → TopFOC=65536 Frame 2: raw bytes "06 00 01 00" → TopFOC=65542 Frame 3: "02 00 01 00" → TopFOC=65538 i965 successfully decodes regardless. V4L2 stateless drivers (hantro_h264.c::prepare_table feeds the value direct to tbl->poc[i*2]/[32], the kernel reflist builder uses it directly for cur_pic_order_count comparison) cannot tolerate the high word — the kernel's resource sizing math sees POC=65536 for an IDR and breaks. This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static inline in src/h264.c. It detects bit 16 set rather than blindly subtracting, so a future ffmpeg version that fixes the leak degrades gracefully. The helper is applied at all four POC sites: 1. h264_fill_dpb: dpb->top_field_order_cnt 2. h264_fill_dpb: dpb->bottom_field_order_cnt 3. h264_va_picture_to_v4l2: decode->top_field_order_cnt 4. 
h264_va_picture_to_v4l2: decode->bottom_field_order_cnt VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0 because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already sets POC=0 there; the sentinel never applies. Zeroing them explicitly removes a class of "stale POC value in invalidated slot" foot-guns. Non-trivial follow-ups identified during the meitner experiment that are NOT addressed by this patch: - PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are not yet derived from VASliceParameterBufferH264.slice_type. The bbb corpus is I-only at the start so this hasn't been a blocker, but a clip with B-frames will need the slice-type routing patch. - h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is almost certainly wrong per the kernel doc — pic_num must equal the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface id. Out of scope here; will surface as a defect on streams that have multi-frame DPB lookups. Cross-references: audit_0008_decode_params_2026-05-01.md — kernel-side consumer audit confirming POC fields are userspace-required. api_contract_findings_2026-05-01.md — VAAPI doc gap on POC semantics; H.264 spec section 8.2.1 is the binding contract. meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical capture of the sentinel pattern across 60 frames. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
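The "detect bit 16, then strip" idea can be sketched as follows. The helper name in the patch is `h264_strip_ffmpeg_poc_sentinel()`; the body below is a guess at its shape, named `sketch_strip_poc_sentinel` to make that clear. It only handles non-negative values with bit 16 set; a POC that sits below the sentinel (for example 65534, i.e. sentinel minus 2) would pass through untouched, and whether the real helper covers that case is not stated in the message.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_POC_SENTINEL (1 << 16)   /* 0x10000, ffmpeg's prev_poc_msb init */

/* Guess at the helper's shape: clear bit 16 only when it is actually set,
 * so input that never carried the sentinel is returned unchanged. */
int32_t sketch_strip_poc_sentinel(int32_t poc)
{
    int32_t out = poc;

    if (poc >= 0 && (poc & SKETCH_POC_SENTINEL))
        out = poc - SKETCH_POC_SENTINEL;

    /* Postcondition: no non-negative result still carries the sentinel bit. */
    assert(poc < 0 || (out & SKETCH_POC_SENTINEL) == 0);
    return out;
}
```

Applied to the three values captured on meitner, 65536, 65542, and 65538 become 0, 6, and 2.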
Markus Fritsche	affb4bd12a	DEBUG: dump VAPictureH264 raw bytes + decoded fields Diagnostic-only. Investigating the observed anomaly: - V4L2 strace shows decode_params.top_field_order_cnt = 65536 on the first IDR frame submitted by mpv+ffmpeg+libva-v4l2-request - GStreamer's reference path writes 0 (spec-correct: PicOrderCnt=0 for IDR with pic_order_cnt_type=0 / pic_order_cnt_lsb=0) - Reading FFmpeg source (libavcodec/vaapi_h264.c::fill_vaapi_pic): va_pic->TopFieldOrderCnt = 0; if (pic->field_poc[0] != INT_MAX) va_pic->TopFieldOrderCnt = pic->field_poc[0]; For IDR: ff_h264_init_poc sets field_poc[0] = poc_msb + poc_lsb = 0 + 0 = 0. So FFmpeg should write 0. If FFmpeg writes 0 but fourier reads 65536, the mismatch is in the libva ABI between ffmpeg's writer and our reader. Most likely suspect: VA_PADDING_LOW size in VAPictureH264 differs between the libva headers ffmpeg+libva were built against and the headers fourier was built against, shifting struct field offsets. This patch dumps: 1. sizeof(VAPictureH264) at our reader's view 2. First 32 raw bytes of VAPicture->CurrPic 3. Field-decoded values via the .picture_id, .frame_idx, .flags, .TopFieldOrderCnt, .BottomFieldOrderCnt accessors If the raw bytes show 00 00 01 00 at offset 12 (= 65536 LE), the field offset is correct and FFmpeg actually wrote 65536 — meaning either FFmpeg has a bug, or our test scenario triggers a non-spec code path. If the raw bytes show 00 00 00 00 at offset 12 but TopFieldOrderCnt accessor returns 65536, the struct ABI is mismatched and we need to reconcile libva versions. If sizeof(VAPictureH264) prints as something other than 36 (= 4*5 + 4*VA_PADDING_LOW, assuming VA_PADDING_LOW=4), the struct layout on this build differs from the documented libva-2.x layout. Removed once the source of the 65536 is identified. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
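The layout arithmetic in affb4bd12a (five 4-byte fields plus the padding array, with TopFieldOrderCnt at offset 12) can be checked with a local mirror of the struct. This is a sketch under stated assumptions: `struct picture_h264_mirror`, `PADDING_LOW` and `read_le32` are illustrative names, not libva types, and the padding length of 4 is the value assumed in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define PADDING_LOW 4 /* assumed to equal libva's VA_PADDING_LOW */

/* Local mirror of the documented VAPictureH264 layout (not the libva type). */
struct picture_h264_mirror {
	uint32_t picture_id;
	uint32_t frame_idx;
	uint32_t flags;
	int32_t top_field_order_cnt;
	int32_t bottom_field_order_cnt;
	uint32_t reserved[PADDING_LOW];
};

/* Decode a little-endian 32-bit value from four raw dump bytes. */
static int32_t read_le32(const uint8_t *p)
{
	uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

	return (int32_t)v;
}
```

With this mirror, `sizeof` gives 36 and `offsetof(..., top_field_order_cnt)` gives 12, and the dumped bytes `00 00 01 00` decode to 65536.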
Markus Fritsche	c672f19f44	h264: hardcode SPS level_idc = 51 (intentional over-allocation) fourier's h264_va_picture_to_v4l2 never assigns sps->level_idc; the field stays at zero-init. level_idc=0 is invalid per the H.264 spec (lowest legal value is 10, Level 1.0). Hantro and other stateless H.264 decoders use level_idc to pre-allocate decoder resources (DPB size, motion-vector buffers); when fed an invalid level the hantro kernel driver silently skips the decode-hardware dispatch — the V4L2 request completes with no error, DQBUF returns the CAPTURE buffer reporting bytesused=3655712 and no V4L2_BUF_FLAG_ERROR, but the buffer is never written. VAAPI's decode-side VAPictureParameterBufferH264 structurally does NOT include level_idc — `grep level_idc va/va.h` returns only hits inside VAEncSequenceParameterBufferH264 (the encode path). The H.264 SPS NAL is also not included in VASliceDataBuffer because ffmpeg-vaapi parses it client-side and forwards only slice data (verified empirically via patch 0010's hex-dump of the OUTPUT buffer: it contains "00 00 01 65 ..." — i.e. ANNEX_B start code + IDR slice NAL byte, no SPS NAL). An SPS-NAL byte extractor is therefore not viable from the bitstream libva-v4l2-request receives. Workaround: hardcode level_idc = 51 (= Level 5.1, max for 1080p and 4K@30 mainstream consumer profiles). This INTENTIONALLY OVER-ALLOCATES decoder resources but is sufficient for any stream up to 4K@30. It is corpus-correct, not contract-correct: a 4K@60 stream (Level 5.2) would under-allocate. This patch is a known-incomplete intermediate, not a final fix. The proper upstreamable answer is a level-from-resolution derivation per H.264 Annex A.3 (max MB rate / max frame size thresholds). That requires mapping consumer-side framerate which VAAPI does not expose, so the lookup table is non-trivial. The TODO is captured inline. This patch's goal is unblocking decode-hardware engagement on the ohm_gl_fix corpus while the full level-derivation work proceeds. 
Cross-reference: kernel doc ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists level_idc as a required field with no "kernel-derives" annotation — i.e., userspace-required. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
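The frame-size half of the level derivation mentioned in c672f19f44 could look roughly like the sketch below. It is only a lower bound: it applies the MaxFS column of H.264 Table A-1 and ignores the MB-rate (frame-rate) limit, which the commit notes VAAPI does not expose. The function name, the table subset and the return convention are assumptions made for illustration, and the MaxFS values should be checked against the spec before use.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per distinct MaxFS tier: lowest level_idc that allows it. */
struct level_limit {
	uint8_t level_idc;   /* e.g. 51 for Level 5.1 */
	uint32_t max_fs;     /* max frame size in macroblocks (Table A-1) */
};

static const struct level_limit level_limits[] = {
	{ 10, 99 },    { 11, 396 },   { 21, 792 },   { 22, 1620 },
	{ 31, 3600 },  { 32, 5120 },  { 40, 8192 },  { 42, 8704 },
	{ 50, 22080 }, { 51, 36864 }, { 60, 139264 },
};

/*
 * Hypothetical helper: smallest level_idc whose MaxFS fits the frame.
 * Returns 0 when no tier in the table is large enough.
 */
static uint8_t min_level_idc_for_frame_size(uint32_t width, uint32_t height)
{
	uint32_t mbs = ((width + 15) / 16) * ((height + 15) / 16);
	size_t i;

	for (i = 0; i < sizeof(level_limits) / sizeof(level_limits[0]); i++) {
		if (mbs <= level_limits[i].max_fs)
			return level_limits[i].level_idc;
	}
	return 0;
}
```

For 1920x1080 this gives 40 and for 3840x2160 it gives 51; a real implementation would still need to raise the result when the stream's frame rate exceeds the tier's MB-rate limit.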
Markus Fritsche	841f616e74	h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence VAAPI signals "explicit scaling lists are present in the bitstream" implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a VAIQMatrixBufferH264 alongside RenderPicture iff sps_scaling_matrix_present_flag \|\| pps_scaling_matrix_present_flag. When the bitstream uses default (flat) scaling, no IQMatrixBuffer arrives and the in-tree h264.matrix struct stays zero-initialised. fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set, h265.iqmatrix_set) — the H.264 path was missing the equivalent flag, so set_controls always submitted the scaling matrix, including the zero-initialised case. Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed scaling_list_8x8, producing a zeroed CAPTURE buffer despite a successful decode round-trip (no V4L2_BUF_FLAG_ERROR, bytesused=3655712 reported). Earlier draft of this patch unconditionally omitted SCALING_MATRIX in FRAME_BASED. That's corpus-correct (bbb has no explicit scaling lists) but the wrong predicate: the kernel-side gating is by "matrix-supplied vs. not," not by decode mode. Streams that signal explicit scaling lists must submit SCALING_MATRIX in either mode. Contract verification (audit_0008_decode_params_2026-05-01.md + hantro_h264.c::assemble_scaling_list): the kernel uses the supplied matrix when SCALING_MATRIX is in the control batch and falls back to spec-defined defaults when absent. Mode-independent. This patch: - surface.h: adds bool matrix_set to params.h264, mirroring mpeg2.iqmatrix_set / h265.iqmatrix_set. - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case): sets matrix_set = true when the buffer arrives. - picture.c RequestBeginPicture: resets matrix_set = false at the start of each Begin/Render/End cycle. 
- h264.c h264_set_controls: builds the controls[] array incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED. The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved — kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall not be set." Cross-reference: kernel UAPI section ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX (matrix supplied iff explicit scaling lists in bitstream) and hantro_h264.c::assemble_scaling_list (consumes supplied matrix or falls back to defaults). Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
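The incremental batch construction described in 841f616e74 can be sketched as follows. The enum values are placeholders standing in for the real V4L2_CID_STATELESS_H264_* IDs, and `build_control_batch` is a hypothetical name; only the inclusion rules are taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder IDs for illustration only; real code uses the V4L2 CIDs. */
enum ctrl_id {
	CTRL_SPS,
	CTRL_PPS,
	CTRL_DECODE_PARAMS,
	CTRL_SCALING_MATRIX,
	CTRL_SLICE_PARAMS,
	CTRL_PRED_WEIGHTS,
};

#define MAX_CTRLS 6

/* Fill out[] with the controls to submit and return how many were added. */
static size_t build_control_batch(enum ctrl_id out[MAX_CTRLS],
				  bool matrix_set, bool slice_based,
				  bool pred_weights_required)
{
	size_t n = 0;

	/* Always present in the per-request batch. */
	out[n++] = CTRL_SPS;
	out[n++] = CTRL_PPS;
	out[n++] = CTRL_DECODE_PARAMS;

	/* Only when the client actually sent a VAIQMatrixBufferH264. */
	if (matrix_set)
		out[n++] = CTRL_SCALING_MATRIX;

	/* Per-slice controls are never submitted in FRAME_BASED mode. */
	if (slice_based) {
		out[n++] = CTRL_SLICE_PARAMS;
		if (pred_weights_required)
			out[n++] = CTRL_PRED_WEIGHTS;
	}

	return n;
}
```

In FRAME_BASED mode with no explicit scaling lists the batch holds three controls; the largest batch, SLICE_BASED with a matrix and required prediction weights, holds six.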
Markus Fritsche	1690dfaa79	DEBUG: sentinel-pattern test for CAPTURE buffer write Diagnostic-only. Writes 0xab×32 into the CAPTURE buffer's first 32 bytes immediately before VIDIOC_QBUF. The 0010 hex-dump after DQBUF reveals which case we're in: - All 0xab → kernel never wrote to this buffer (wrong buffer chosen, alias, or no decode actually happened despite bytesused=3655712 reported). - All zeros → kernel did write 0x00s (overwriting our sentinel), and the apparent "no picture" output is the kernel-side decode actually producing zeros (e.g. parser rejected the bitstream). - Mix of zeros and real luma values → kernel wrote real decoded pixels; CPU read sees stale-cached zeros somewhere OR the sentinel area was a header that decoder zeroed but rest is real. Need to check more bytes. - All 0xab still → kernel never touched this region but other parts of buffer may be filled (incomplete decode). Removed once Step 1 decode is verified. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	3609fbb425	DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame Diagnostic-only patch (NOT for upstream). Hex-dumps: - First 32 bytes of OUTPUT buffer at QBUF time in picture.c::RequestEndPicture (i.e. what we feed the kernel) - First 32 bytes of CAPTURE Y-plane after DQBUF in surface.c::RequestSyncSurface (i.e. what kernel returned) Lets us see whether: - OUTPUT bitstream begins with valid ANNEX_B start code + NAL header byte (e.g. `00 00 01 65` for IDR slice) - CAPTURE Y-plane after decode contains varied luma data (working) vs. all-zeros / repeating pattern (kernel didn't write anything). Removed once Step 1 decode is verified working. Output goes via existing request_log() to stderr. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	597e896594	surface: don't VIDIOC_S_FMT the CAPTURE queue The hantro stateless decoder derives the CAPTURE format from the SPS attached to the per-request OUTPUT controls. Calling VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can leave the driver's vb2 state in an inconsistent configuration where the queue accepts buffers and DQBUF returns successfully but the kernel never actually writes decoded pixels into them. Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side (via gst_v4l2_decoder_negotiate_src_format and friends). The same code path produces correctly-decoded NV12 frames on the same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT emits flat-green zeroed CAPTURE buffers. The v4l2_get_format() call immediately after this block already gives us the bytesperline / sizes the driver chose; nothing else in this file consumed the explicit S_FMT side-effects. Empirical hypothesis test for the lingering "kernel decodes without errors but emits zeroed CAPTURE" bug. If post-patch output shows actual picture content, this confirms the diagnosis: explicit CAPTURE format mutation breaks hantro's internal state. If output remains flat-green, the bug is elsewhere and we resume hex-dump-grade instrumentation. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	86a8545146	h264: fill DECODE_PARAMS frame_num + field flags from VAAPI Fourier's h264_va_picture_to_v4l2 only populated four fields of the struct v4l2_ctrl_h264_decode_params: dpb (via h264_fill_dpb), nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt, and the IDR_PIC flag. Many other required-by-spec fields were left at zero-init (frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_, dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size, slice_group_change_cycle, FIELD_PIC and BOTTOM_FIELD flags). For an IDR (first frame) on hantro-vpu RK3568, the kernel parses the bitstream from the OUTPUT buffer and uses these fields to drive its bitstream-element offset tracking. Empirically the kernel returned a successfully-decoded but ZEROED CAPTURE buffer — flat dark-green frames in mpv output, no errors logged. This patch fills every field VAAPI exposes: - frame_num: from VAPicture->frame_num. - FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag. - BOTTOM_FIELD flag: from VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD. Also corrects the IDR_PIC flag to use \|= instead of = so the new field flags don't clobber it. Fields NOT derivable from VAAPI's pre-parsed structures — idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_, dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size, slice_group_change_cycle — require a slice_header() bit-level parse. libva-v4l2-request does not currently do this. They remain at zero-init. Empirical question this patch answers: does hantro tolerate the bit_size fields being zero for IDR frames, or does it strictly require them? If post-patch CAPTURE is still zeroed, a slice-header parser is required. If CAPTURE shows real picture data, hantro fills in the bit-positions itself when no hint is supplied. Cross-reference: gstv4l2codech264dec.c::gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775, lines 632-678). 
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	4078368104	context: enable ANNEX_B start-code emission to match device Patch 0002 sets V4L2_CID_STATELESS_H264_START_CODE to ANNEX_B on the device, telling the kernel that OUTPUT-buffer payloads will contain 0x00 0x00 0x01 NAL start codes. picture.c::codec_store_buffer has the prepend logic guarded by `if (context->h264_start_code)`, but that boolean is set ONLY inside h264_get_controls() — a function that exists but is never called. Result: device expects ANNEX_B, libva-v4l2-request feeds raw NAL payloads with no start codes, kernel cannot find slice boundaries, hantro emits a zeroed CAPTURE buffer. mpv reports successful decode because the V4L2 round-trip succeeds (no EINVAL); the visual output is a flat dark-green frame (NV12 zero through BT.709). Identified via: - Patch 0006 cleared the EINVAL cluster-rejection (128 → 0 on bbb_1080p30) but visual output remained flat green. - GStreamer reference (gstv4l2codech264dec.c:1363-1377) confirms start codes are required when ANNEX_B is selected. - Source-archaeology of fourier's picture.c:67-74 showed the gate on context->h264_start_code. Fix: in context.c::RequestCreateContext, immediately after patch 0002's device-control block, set context_object->h264_start_code = true to match the ANNEX_B mode we just programmed. Hardcoded for now (matches 0002's hardcoded set); replaced with a runtime probe in the planned probe-then-set commit. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
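The start-code prepend that 4078368104 re-enables amounts to writing `00 00 01` ahead of each NAL payload in the OUTPUT buffer. A minimal sketch, assuming a flat destination buffer and a hypothetical helper name `append_nal_annex_b` (the project's actual logic lives in picture.c::codec_store_buffer and may differ):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Three-byte Annex B start code expected by the device in ANNEX_B mode. */
static const uint8_t annex_b_start_code[3] = { 0x00, 0x00, 0x01 };

/*
 * Append a start code followed by one NAL payload at dst + used.
 * Returns the number of bytes written, or 0 if the data does not fit.
 */
static size_t append_nal_annex_b(uint8_t *dst, size_t dst_size, size_t used,
				 const uint8_t *nal, size_t nal_size)
{
	size_t need = sizeof(annex_b_start_code) + nal_size;

	if (used > dst_size || need > dst_size - used)
		return 0;

	memcpy(dst + used, annex_b_start_code, sizeof(annex_b_start_code));
	memcpy(dst + used + sizeof(annex_b_start_code), nal, nal_size);
	return need;
}
```

An IDR slice NAL whose first byte is 0x65 then appears in the buffer as `00 00 01 65 ...`, the pattern the hex-dump patch looks for.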
Markus Fritsche	4246d5d537	h264: omit per-slice controls in FRAME_BASED mode Identified by cross-reference against GStreamer's gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit 9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and PRED_WEIGHTS submission on is_slice_based(self): if (is_slice_based (self)) { control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS; ... control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS; ... } In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the bitstream itself from the OUTPUT-queue payload; per-slice controls in the request trigger cluster-validation EINVAL at error_idx=count (observed on RK3568 hantro-vpu, kernel 6.19.10). This patch: - Reorders controls[] so FRAME_BASED-required entries come first (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at indices 4..5). - Defaults num_controls=4 (FRAME_BASED), expanding to 5 for SLICE_BASED and 6 when V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED. - Hardcodes slice_based=false for now since patch 0002 sets the device to FRAME_BASED unconditionally. A TODO marks the spot for the planned probe-then-set commit, which will populate context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/ G_EXT_CTRLS and replace the hardcoded false with a runtime check. Diagnosis chain: - patch 0005 reduced one EINVAL per frame on PRED_WEIGHTS submission, but cluster-level rejection persisted at error_idx=5 (count) — meaning kernel walked all 5 controls cleanly but rejected the request as a whole. - dmesg silent → rejection in V4L2 core (v4l2-ctrls-request.c / v4l2-h264.c), not in hantro driver where it could log. - GStreamer reference confirmed FRAME_BASED contract: only 4 sequence-and-frame-level controls go in the per-request batch. After this patch the kernel should accept the per-request controls and actually decode the bitstream into the CAPTURE buffer. 
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	e382c63e20	h264: submit PRED_WEIGHTS only when WEIGHTED_PRED applies Per kernel UAPI (include/uapi/linux/v4l2-controls.h), V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control: V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) := ((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) && (slice_type == P \|\| slice_type == SP)) \|\| (pps->weighted_bipred_idc == 1 && slice_type == B) Submitting PRED_WEIGHTS on a frame where the macro evaluates false triggers VIDIOC_S_EXT_CTRLS to return EINVAL at error_idx=5 (the 6th, last control in the per-request batch) on hantro-vpu and any other driver that strictly enforces the spec. Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no weighted prediction): every per-frame batch fails identically, 13 EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy falls back to software decode for every frame. Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5) when the macro returns false; keep at 6 when it returns true. Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not part of Sonnet's six-commit upstreamable plan; slotted in as patch 0005 ahead of the planned probe-then-set / FRAME_BASED commits because it unblocks per-frame submission on every backing driver, not just hantro. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	565f5c0de4	context: introduce request_pool, decouple OUTPUT buffers from surfaces Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5). Replaces the prior per-surface OUTPUT-buffer ownership model with a small driver-wide pool sized by codec pipeline depth (4 H.264 frames in flight), allocated unconditionally regardless of caller's num_render_targets. Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's "output_count == surfaces_count" model was a category error: OUTPUT buffers are request-time bitstream slots, CAPTURE buffers are picture-time DPB slots; their lifecycles and sizing are independent. Changes: * NEW src/request_pool.{c,h} (~200 LoC): - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap. - request_pool_destroy(): munmap all, idempotent. - request_pool_acquire(): round-robin claim; returns V4L2 buffer index of an unused slot or -1. - request_pool_release(): mark slot free for reuse. - request_pool_slot(): accessor for ptr/size given a buffer index. * src/request.h: add struct request_pool output_pool to request_data. * src/context.c::RequestCreateContext: replace the per-surface OUTPUT loop with a single request_pool_init() call (count=4, independent of surfaces_count). Drop the now-unused locals (length, offset, source_data, output_buffers_count, index, index_base, i, surface_object). DELETES patch 0002's "output_buffers_count = ... ? ... : 4" hack inline — the pool's own count parameter supersedes it. * src/picture.c::RequestBeginPicture: borrow a pool slot at frame start, write its mmap pointer/size/index into the surface's transient source_* fields. The fields stay (still useful as a borrow handle that the existing codec_store_buffer memcpys target), but no longer represent surface-permanent ownership. Reset slices_size/slices_count here too (was implicit on first Render). 
* src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns the OUTPUT buffer, release the pool slot and clear the surface's borrow handle. Fixes the segv on second-frame submission. * src/surface.c::RequestDestroySurfaces: remove the munmap of source_data — pool owns the mmap. * src/request.c::RequestTerminate: call request_pool_destroy() before close(video_fd) so munmaps still target a valid fd. * src/meson.build: add request_pool.c and request_pool.h to the sources/headers lists. This commit removes 0002's OUTPUT-pool hack inline (the "floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002 remains until commit 4 lands. Build-verified clean on aarch64. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
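The round-robin acquire/release behaviour described for request_pool in 565f5c0de4 can be illustrated without any V4L2 calls. This sketch tracks only slot occupancy; the names `slot_pool_*` are hypothetical, and the real request_pool additionally performs CREATE_BUFS, QUERYBUF and mmap per slot.

```c
#include <stdbool.h>

#define POOL_SLOTS 4 /* pipeline depth quoted in the commit message */

struct slot_pool {
	bool busy[POOL_SLOTS];
	unsigned int next;   /* where the next round-robin search starts */
};

static void slot_pool_init(struct slot_pool *pool)
{
	unsigned int i;

	for (i = 0; i < POOL_SLOTS; i++)
		pool->busy[i] = false;
	pool->next = 0;
}

/* Claim the first free slot at or after 'next'; -1 if all are busy. */
static int slot_pool_acquire(struct slot_pool *pool)
{
	unsigned int i;

	for (i = 0; i < POOL_SLOTS; i++) {
		unsigned int idx = (pool->next + i) % POOL_SLOTS;

		if (!pool->busy[idx]) {
			pool->busy[idx] = true;
			pool->next = (idx + 1) % POOL_SLOTS;
			return (int)idx;
		}
	}
	return -1;
}

/* Mark a slot free again; out-of-range indices are ignored. */
static void slot_pool_release(struct slot_pool *pool, int idx)
{
	if (idx >= 0 && idx < POOL_SLOTS)
		pool->busy[idx] = false;
}
```

In this model, acquire would be called at frame start and release after the OUTPUT buffer is dequeued, mirroring the BeginPicture and SyncSurface steps listed in the commit.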
Markus Fritsche	58a0e8baf9	v4l2: add QUERYCTRL/QUERYMENU capability-probe helpers Pure utility additions, no behaviour change. Three helpers in src/v4l2.{c,h}: - v4l2_query_ext_ctrl(): wraps VIDIOC_QUERY_EXT_CTRL by CID. Returns 0 if the control exists, -1 if not. Caller passes NULL qec to test existence only. - v4l2_query_menu(): wraps VIDIOC_QUERYMENU at a given index. Returns 0 if a menu item exists at that index, -1 otherwise. - v4l2_ctrl_menu_has_value(): convenience layered on the above. For a menu/intmenu-type control, walks all menu items between minimum and maximum and returns true iff `value` is a valid entry. Used by callers that ask "does this driver accept menu value X for this CID?" without caring about iteration details. These unblock commit 3 (request_pool — needs ext-ctrl probing for codec-ops dispatch) and commit 4 (probe-then-set DECODE_MODE/ START_CODE — replaces 0002's unconditional set with a real probe) of the upstreamable design's six-commit series. Forward-declarations in v4l2.h keep the header lean: existing prototypes already use opaque struct v4l2_ext_control * pointers without including <linux/videodev2.h>; we follow the same convention for struct v4l2_query_ext_ctrl and struct v4l2_querymenu. No call sites added in this commit. Compile-only verification: the .so links cleanly with three new exports. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	50e0c2b996	context: pre-STREAMON device controls and minimum OUTPUT pool Two related fixes that surfaced during the first hantro-vpu (RK3568) smoke test of the multiplanar build: 1. OUTPUT queue must be non-empty at STREAMON. Hantro's vb2_start_streaming rejects an empty queue with EINVAL. Some VA-API callers (notably ffmpeg's vaapi-copy path) call vaCreateContext with num_render_targets=0 and allocate render targets lazily. The OUTPUT (bitstream-input) pool must NOT be sized off surfaces_count alone — it is a request-time resource, not per-surface. Quick fix: floor the pool to 4 buffers when the caller passes 0. (A proper decoupling of OUTPUT pool from surface lifecycle is documented in upstreamable_design.md.) 2. Device-wide stateless H.264 controls before STREAMON. The V4L2 stateless framework requires V4L2_CID_STATELESS_H264_DECODE_MODE and START_CODE be set on the device fd (request_fd=-1) before stream start. Per-request controls (SPS/PPS/SLICE_PARAMS/etc.) attached to a request_fd come later via h264_set_controls(). hantro-vpu accepts only DECODE_MODE_FRAME_BASED; START_CODE_ANNEX_B matches what the existing slice-assembly path emits. This is set unconditionally for now (errors silently ignored) to keep cedrus and other backends compatible — they may default to SLICE_BASED and not expose DECODE_MODE at all. Probe-then-set via VIDIOC_QUERYCTRL is the upstream-correct approach (see upstreamable_design.md §3). After this patch, vainfo still enumerates as before, but the first mpv vaapi-copy attempt advances past STREAMON and into actual decode submission. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	10114f6781	mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu Fourier's local patch already wired multiplanar plumbing through src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69, struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane EXPBUF loop at line 411) and through src/context.c, src/buffer.c, src/picture.c via the v4l2_type_video_{output,capture}(video_format ->v4l2_mplane) helper calls. The remaining gap: the NV12 entry in src/video.c was hardcoded to v4l2_mplane=false, and the bootstrap path in src/surface.c was hardcoded to singleplanar literals before video_format is populated. This patch flips the NV12 entry to v4l2_mplane=true and updates the two singleplanar literals in src/surface.c to their MPLANE variants: - src/video.c:42 v4l2_mplane=false -> true (NV12 only; Sunxi-tiled NV12 left at false for cedrus compatibility) - src/surface.c:84 output_type = v4l2_type_video_output(true) - src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12) Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer strace baseline). Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false, __arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the known userbase here is RK3566/RK3568 hantro. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
Markus Fritsche	c45fea96e3	fourier-local: stateless control modernization + HEVC strip Compound patch carrying the fork's pre-Step-1 substrate, originally authored by Jernej Škrabec / fourier on top of bootlin's `a3c2476`: - src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the passthrough shim). - include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h> (kernel-side HEVC controls now live in the canonical UAPI header). - src/meson.build: src/h265.c / src/h265.h commented out — HEVC build path is excluded from this fork (RK3568 hantro G1/G2 has no HEVC, and the kernel-side HEVC controls have a separate rework in flight upstream). - src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep the build linking). - include/h264-ctrls.h: removed (dead post-fourier — no source includes it; the passthrough shim's CID aliases live in the kernel header now). Functionally equivalent to the prior fork master commits: `c1f5108` V4L2_PIX_FMT_H264_SLICE rename `4ccbfe9` Strip HEVC build path `da9f2a5` include/h264-ctrls.h passthrough + CID aliases `fc4bb10` src/h264.c track upstream UAPI shape `13e9b64` src/h264.c drop num_slices field `4d14ffb` src/tiled_yuv.S aarch64 stub `1b02c9b` src/h264.c include utils.h Folded into one commit during 2026-05-04 Step 1 reconciliation (see ../phase0_evidence/2026-05-04/findings.md). Per-patch history of the early fork commits preserved on the pre-step1 branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 09:40:14 +00:00
Chen-Yu Tsai	a3c2476de1	Respect libdir for install path Distributions may install libraries under architecture-specific sub-directories, to support multiple architectures on the same system. In addition, the user may not wish to install the library with the default prefix. Use the libdir variable when setting the install path. This allows both specifying different sub-directories, and a different prefix.	2019-05-17 13:59:26 +08:00
Chen-Yu Tsai	3264c0495c	Add option to specify path to up-to-date kernel headers The system normally has kernel headers shipped with the distribution. These typically lag behind actual kernel releases. Thus they would not have the latest API additions, such as the V4L2 request API this driver uses. However, it is also bad practice to just install new kernel headers into the system wide default location, as there may be some differences between it and what the C library was built against. Add an option to specify a path to a set of up-to-date kernel headers. This would allow the user to build this project in a safe but working environment. Signed-off-by: Chen-Yu Tsai <wens@csie.org>	2019-05-17 13:59:23 +08:00
Paul Kocialkowski	7f359be748	Include missing needed codec headers for build Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-05-16 16:32:03 +02:00
Paul Kocialkowski	d48ace9757	Update H.264 V4L2 pixel format, which was renamed Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-05-16 16:24:23 +02:00
Nicolas Dufresne	fc9252a4d0	image: Fix pitches and offsets in the save image We were first copying the image structure and then setting the pitches and offsets, so this information was lost. This fixes the vaDeriveImage and vaGetImage implementations. Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>	2019-05-16 16:14:55 +02:00
Nicolas Dufresne	7233c5a2ae	image: Partially implement vaGetImage This enables raw playback within GStreamer. This is useful for testing even if slower than DMABuf. This is a partial implementation since we don't implement partial copy of the surface. Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>	2019-05-16 16:14:55 +02:00
Nicolas Dufresne	b8ac9bb9ea	surface: Only set format if unset vaCreateSurfaces2 may be called multiple times; setting the format again would lead to EBUSY being returned, as you cannot change the format if you have buffers allocated. Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>	2019-05-16 16:14:55 +02:00
Paul Kocialkowski	b5cee9f480	include: Update headers to latest series Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-05-16 16:14:55 +02:00
Paul Kocialkowski	0f4a76e9a6	Lower libva requirement to API version 1.1.0 (lib version 2.1.0) Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 14:11:18 +01:00
Paul Kocialkowski	0c611c6b7a	Implement proper timestamping for references Reference frames are now identified using their timestamp: set the timestamp when queuing the output buffer and use it to identify the frame later on. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:41:56 +01:00
Paul Kocialkowski	3176adf69c	Include local copies of DRM and V4L2 codec definitions Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:37:12 +01:00
Paul Kocialkowski	ca5198b429	Add support for the meson build system Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:37:12 +01:00


363 Commits