libva-v4l2-request-fourier

Author	SHA1	Message	Date
claude-noether	709ab34624	Revert "fresnel-fourier iter5b Phase 6 commit C: surface.c — profile-derived OUTPUT pixel format" This reverts commit `4b2288fa9a`.	2026-05-12 12:32:56 +00:00
claude-noether	4b2288fa9a	fresnel-fourier iter5b Phase 6 commit C: surface.c — profile-derived OUTPUT pixel format Replace the hardcoded `V4L2_PIX_FMT_H264_SLICE` at surface.c:173 with a profile-derived lookup via find_sole_active_pixelformat(). The helper walks the config_heap; with one active config (universal across mpv, ffmpeg, Firefox, Chromium) it returns the cached pixelformat populated at CreateConfig in commit B. Falls back to the pre-iter5b H264_SLICE for the pathological "zero or multiple configs" case (probe surfaces before CreateConfig; multi-config-then-surfaces). Extend the existing resolution-change gate to also fire on pixelformat (codec) change. The teardown branch handles both cases identically — REQBUFS(0) on both queues before re-S_FMT. The kernel behavior pre-iter5b on RK3399: - hantro: hantro_find_format(H264_SLICE) returns NULL on the RK3399 decoder block (no H.264 support); hantro_try_fmt silently substitutes the first format in rk3399_vpu_dec_fmts = MPEG2_SLICE → codec_mode = MPEG2_DECODER. VP8 bitstream dispatched to MPEG2 ops → all-zero CAPTURE. MPEG-2 worked by accident (bitstream matched the substituted codec_mode). - rkvdec: format/control mismatch; decoder silently drops the request → all-zero CAPTURE. Same bug class as iter4 commit `692eaa0` (h264_start_code unconditional set). Both fixes thread the active VAProfile into codec-specific kernel state. Signed-off-by: claude-noether <claude-noether@reauktion.de>	2026-05-12 09:23:31 +00:00
claude-noether	7bd0818792	iter7 Phase 7 finalization: OUTPUT-pool teardown + test refinements Surfaced during Phase 7 verification on ohm: 1. OUTPUT pool stale-slot bug (src/surface.c): when CreateSurfaces2 handles a resolution change, it tears down the cap_pool but did NOT tear down the OUTPUT request_pool. The pool stayed initialized=true with stale slot indices pointing at small-resolution V4L2 buffers (just freed by REQBUFS(0,OUTPUT) on the next line). Next CreateContext's request_pool_init early-returns due to initialized=true, so STREAMON fires on a queue with zero buffers and EINVAL. Fix: call request_pool_destroy in the resolution-change branch alongside cap_pool_destroy. Mirror the cap_pool teardown. Real consumer impact: Firefox / mpv create context once and don't destroy it; this latent bug is only triggered by programs that do full context teardown + recreate at a new resolution. Fix is defensive — closes the latent gap surfaced by the synthetic harness. 2. cap_pool_probe_pattern.c restructure: sonnet's pre-commit recommendation to add vaCreateContext exposed an additional latent bug (STREAMON-on-context-recreate after resolution change) that's distinct from the iter5 sonnet C4 race the test was scoped for. Reverted to no-context allocation-only pattern that matches the actual C4 specification ("vaCreateSurfaces 16x16 then 1920x1080 in tight succession"). The new STREAMON bug is logged as iter8 candidate. 3. run_cap_pool_probe.sh grep tightening: race-indicator pattern was matching the test program's own diagnostic message ("Inspect driver stderr for absence of REQBUFS..."). Now grep restricts to lines starting with "v4l2-request:" prefix. Phase 7 results (clean iter7 driver sha 54999017... + this fix): - Track A (msync verify): 100 frames byte-for-byte SW=HW (sha 58c8f3f4...) -> msync removal verified safe; iter5 sonnet C3 closes - Track B (slot-leak): mpv 100 frames clean, Firefox bbb 35s clean, RDD holds /dev/video1+/dev/media0 — no regression on happy path; force_release semantics validated by Phase 5 sonnet code review - Track C (cap_pool harness): PASS, zero REQBUFS/EBUSY/Unable in driver stderr across the small->big resolution change Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 09:29:46 +00:00
claude-noether	988b848908	iter7: A+B+C — slot-leak fix, cap_pool harness, msync verify harness Closes three internal carry items in one fork commit. iter6 deferred these as TODOs; iter7 lands the implementations + supporting tests. # Track B — slot-leak error recovery (src/) iter6 documented the RequestSyncSurface error paths as a "bounded leak we accept" — slots stayed busy=true after REINIT/DQBUF failures until RequestTerminate ran. With pool=16 and rare errors this was acceptable, but a sustained-error scenario could starve the pool. Adds request_pool_force_release(pool, index) which: 1. Tries media_request_reinit on the slot's fd (cheap path) 2. Falls back to close + media_request_alloc (recovery) 3. Leaves the slot dead-busy if even alloc fails (other slots unaffected, pool capacity reduced by 1 until destroy) Wires it into surface.c RequestSyncSurface error paths only for errors before the OUTPUT-DQBUF attempt. After OUTPUT-DQBUF failure the V4L2 buffer is in indeterminate kernel state, so a separate error label (`error_buffer_indeterminate`) leaves the slot dead-busy — reusing the slot would QBUF on a kernel-still-held buffer and EINVAL. Phase 5 sonnet review caught this discriminator subtlety pre-commit. Files: request_pool.{h,c}, surface.c. # Track C — cap_pool race synthetic harness (tests/) iter5 sonnet C4 / iter6 candidate A: cap_pool resolution-change race was organically exercised by YT's quality renegotiations (iter6 close, 4 cap_pool_init events clean) but had no deterministic regression test. tests/cap_pool_probe_pattern.c — ~170-line C program: opens libva display, vaCreateConfig, vaCreateSurfaces(small) + vaCreateContext (triggers OUTPUT pool init at small resolution), dispose, vaCreateSurfaces(big) + vaCreateContext (forces S_FMT on the new resolution against an in-use OUTPUT pool — the actual race-hitting path). Phase 5 sonnet flagged that without vaCreateContext the test would pass trivially (OUTPUT pool never init'd, REQBUFS(0) on empty queue is a no-op). Fixed before commit. tests/run_cap_pool_probe.sh — runner; greps driver stderr for REQBUFS / EBUSY / "Unable to set format" race indicators. # Track A — msync pixel-correctness verify harness (tests/) iter5 sweep removed msync(MS_SYNC\|MS_INVALIDATE) from CAPTURE DQBUF path. iter5 sonnet C3 flagged: no formal pixel verification. tests/run_msync_pixel_verify.sh — runs FFmpeg SW decode (libavcodec reference) and FFmpeg HW decode (via our v4l2_request driver), compares NV12 byte streams. Probes fixture dimensions via ffprobe and uses crop=$W:$H after hwdownload to normalize MB-padding artifacts (hantro pads height to 16-line align; SW returns crop-aligned). Phase 5 sonnet flagged the stride-mismatch false-failure risk pre-commit. Fixed: explicit crop + diagnostic that distinguishes genuine pixel divergence from MB-padding stride artifacts. # Phase 5 sonnet code review Verdict: APPROVE-WITH-CHANGES. Three actionable findings, all addressed before this commit: 1. surface.c error path: separated OUTPUT-DQBUF-failure into error_buffer_indeterminate label, slot stays dead-busy 2. cap_pool_probe_pattern.c: added vaCreateContext to actually exercise the OUTPUT pool init at the small resolution 3. run_msync_pixel_verify.sh: explicit crop on HW path, stride-mismatch diagnostic distinguished from corruption Empirical verification (Phase 6+7 deploy + run): pending operator ohm-tools availability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 06:49:48 +00:00
claude-noether	a09c03c154	iter6 fix: per-OUTPUT-slot request_fd binding via REINIT iter4 (`385dee1`) replaced the original media_request_reinit pattern with close+media_request_alloc per frame to escape an EINVAL on S_EXT_CTRLS that turned out to be a DPB-payload bug (`74d8dd1`, FFmpeg V4L2_H264_FRAME_REF semantics). The per-frame close+alloc model worked for mpv vaapi-copy (single-surface recycle) but raced under Firefox 150's MediaSource pipeline (multi-surface rotation): fd=30 got reused via lowest-free-fd allocation faster than the kernel- side per-buffer state-machine could tear down the prior request, producing intermittent VIDIOC_QBUF EINVAL on OUTPUT after 1..53 successful frames. Phase 2 telemetry confirmed: - DQBUF returned the index we passed (no FIFO mismatch) - SPS/PPS/DECODE_PARAMS/SCALING_MATRIX byte-identical between mpv and Firefox first 64 bytes - Pool size bump 4 -> 16 only delayed the failure (62 frames) - Different OUTPUT slot indices failed across runs (race signature) Fix: each OUTPUT pool slot owns a permanent request_fd allocated once at request_pool_init and REINIT'd between uses in RequestSyncSurface. 1:1 slot-to-fd binding eliminates cross-slot fd reuse entirely. Pool stays driver-wide (multi-context safe per iter5 Track E); slots cycle through 16 distinct fds in round-robin acquire. Files: - request_pool.h: add request_fd field to slot struct; init signature takes media_fd - request_pool.c: alloc per-slot fd at init, close at destroy - context.c: pass driver_data->media_fd; pool size 4 -> 16 - picture.c: BeginPicture binds slot->request_fd to surface; EndPicture's per-frame media_request_alloc removed - surface.c: RequestSyncSurface uses media_request_reinit instead of close+alloc; DestroySurfaces close removed (slot owns fd); error path close removed; surface_object NULL-init for the -Wmaybe-uninitialized warning fix Empirical verification (clean build sha ebe396d5..., no diagnostic instrumentation): - Firefox 150 + bbb_1080p30_h264.mp4 + LIBVA_DRIVER_NAME=v4l2_request + sandbox enabled: 35s+ playback, zero "Unable to queue buffer" / "Unable to set control(s)", lsof shows RDD process holds /dev/video1 + /dev/media0 throughout. Driver stderr: only the single cap_pool_init: 24 slots ready line. - mpv vaapi-copy 50 frames: zero errors, "Using hardware decoding (vaapi-copy)" - no regression vs iter5-end driver. Pool-size bump diagnostic (Phase 5 sonnet design review feedback): 4 -> 16 alone took 1->62 frames, far short of the 30s success criterion (~900 frames at 30fps). REINIT discipline is the actual fix; pool 16 is comfortable headroom over typical H.264 MaxDpbFrames. Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment attribution corrected: cleanup runs at RequestTerminate, not RequestDestroyContext, since the pool is driver-wide). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 21:30:39 +00:00
test0r	c8b6edec3d	iter5 sweep follow-up: remove additional DEBUG sites flagged by Phase 5 review Phase 5 sonnet review caught four DEBUG sites the first sweep pass missed (the vaapi-copy + --vo=null stress test didn't exercise the ExportSurfaceHandle path, so per-frame ExportSurfaceHandle dumps went undetected). Removed: - surface.c::CreateSurfaces2 format-dump (per-CreateSurfaces2 noise, labeled DEBUG INSTRUMENTATION (surface-export diagnosis 2026-05-04)) - surface.c::ExportSurfaceHandle full-descriptor dump (per-frame for consumers using DMA-BUF, also labeled DEBUG) - surface.c::QuerySurfaceStatus -> status= line (per-call noise) - h264.c V4L2 readback block (~67 lines): static bool readback_warned + the per-frame VIDIOC_G_EXT_CTRLS attempt + the readback success log + the "V4L2 readback unavailable" fallback announcement. With the iter4 fixes landed, the readback EACCES is no longer load-bearing to investigate — drop the block + the per-process global state. Removing the readback block also resolves Phase 5 finding C2: the static bool readback_warned was new mutable process-global state introduced post-Track-E, inconsistent with that track's intent. Net: -107 lines from src/{h264,surface}.c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 16:04:03 +00:00
test0r	b993355507	iter5 Track E: move LAST_OUTPUT_WIDTH/HEIGHT from process-global to per-driver-data Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate. Two libva driver_data instances in the same process (e.g. Firefox playing two tabs at different resolutions, or Firefox + mpv via the same dlopened backend) would race on the static cache. Move to struct request_data.last_output_width/height. The V4L2 device fd is already per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format). Verified: two concurrent mpv processes (2s stagger) both decode 300 frames cleanly with no cross-corruption. Same-instant init still hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend. Resolves the surface_reset_format_cache() callsite: now takes driver_data parameter (was zero-arg). Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls that the iter5 sweep left behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:05:41 +00:00
test0r	843febc174	iter5 sweep: remove iter1 slice_header parse + VAPicture dump + Sync RETURN trace h264.c: - Remove the slice_header parse success log (the parse data is now forwarded into decode_params directly without per-frame echo). Keep the FAILED-rc log since it indicates a real decode-blocking error. - Remove the iter1 patch-0014 VAPictureH264 byte-dump + field-read log block. The TopFieldOrderCnt=65536 anomaly it diagnosed was resolved by the POC sentinel strip (h264_strip_ffmpeg_poc_sentinel) that stays in the codebase. surface.c: - Remove the per-call "RequestSyncSurface RETURN status=" trace. - Remove the per-call "RequestSyncSurface early-exit" trace. v4l2.c: - Suppress the per-frame "Unable to get control(s): Permission denied" log when errno == EACCES (the expected case on this hantro rig per iter1 patch-0014's findings). The one-time announcement in h264.c stays. Real EACCES-on-non-request-fd or other errno values still log normally. Per-frame v4l2-request log noise drops from ~30+ lines/frame to init-time + once-per-resolution-change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:51:10 +00:00
test0r	d3a299b4cc	iter5 sweep: remove iter1 patch-0010 hex-dumps + patch-0011 sentinel picture.c: remove the 0xab sentinel write into CAPTURE buffer first 32 bytes pre-QBUF + the OUTPUT hex-dump pre-QBUF. Both were iter1 diagnostics for "where does the buffer write go?" investigation. surface.c: remove the post-DQBUF CAPTURE Y-plane hex-dump + luma variance signal. The msync(MS_SYNC\|MS_INVALIDATE) was added as a companion fix for the cached-mmap issue surfaced by the dump itself — removing the dump removes the need for the msync. With iter1+iter2+iter3+iter4 fixes landed, these dumps fire on every single frame and produce hundreds of MB of log noise during sustained decode. Now gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:48:31 +00:00
test0r	951233a12e	iter5 sweep: remove iter1 ENTER traces (13 call sites across 4 files) Removes the iter1 patch-0014 ENTER traces from buffer.c, image.c, picture.c, surface.c. These were diagnostic-only entry-point logs added during iter1's "where does Firefox RDD crash?" investigation. With the iter1+iter2+iter3+iter4 fixes landed, the entry-point traces are pure noise. If a future investigation needs entry-point coverage, strace -e trace on the libva consumer process gives equivalent visibility without modifying the driver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:47:25 +00:00
test0r	385dee1bbf	iter4 fix: fresh request_fd per frame (fixes carryover EINVAL) This is the load-bearing fix that resolves the iter1+iter2+iter3 "frame-11 EINVAL" carryover. Replace the per-surface request_fd cache + MEDIA_REQUEST_IOC_REINIT pattern with allocate-fresh-per-frame: in RequestSyncSurface, after queue + wait_completion succeed, close the request_fd and reset surface_object->request_fd = -1 so the next BeginPicture allocates a new one via media_request_alloc. Diagnostic root cause: per-control VIDIOC_TRY_EXT_CTRLS isolation showed all four H.264 controls (SPS/PPS/DECODE_PARAMS/SCALING_MATRIX) fail individually with EINVAL on the same request_fd that had been through queue+wait+reinit. The fd state was bad even though every ioctl in the previous decode cycle returned success. Allocating fresh sidesteps any kernel-side request-state-machine subtlety we don't fully understand. Empirical verification (iter4 Phase 7, 90s autonomous run on ohm via firefox-fourier without MOZ_DISABLE_RDD_SANDBOX=1, bbb_1080p30 H.264): - ENETDOWN count: 0 - S_EXT_CTRLS rejected: 0 (was: fired at frame 11 every iter1-3) - Unable to set control(s): 0 - Generic EINVAL: 0 - Video stream mTime reached: 49.7 seconds - Audio stream mTime reached: 51.5 seconds Cost: ~one extra MEDIA_IOC_REQUEST_ALLOC + close() per decoded frame. Negligible (cycles below the V4L2 set_controls + queue + wait stack). Companion fixes that landed earlier in iter4 to get to this point: `74d8dd1` — DPB fields=V4L2_H264_FRAME_REF + skip !used entries (matches FFmpeg's libavcodec/v4l2_request_h264.c semantics) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:12:41 +00:00
test0r	19acc76da4	iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE buffer. mpv reusing a surface for a new decode while the compositor still held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to write fresh decode output into the same physical memory the compositor was reading -- visible as stutter / back-and-forth swap on mpv --hwdec=vaapi --vo=gpu playback. Architecture: - New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t. - Surfaces no longer own buffers; each vaBeginPicture acquires the oldest FREE slot (LRU), binds it for the decode cycle, and the slot cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF). - Slot is released on next BeginPicture for the same surface or on vaDestroySurfaces. Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+): - Option-A statistical mitigation; race window narrows to "pool exhausted, force-recycle of oldest EXPORTED slot." For typical mpv 16-surface playback with MIN_CAP_POOL=24 the fallback never fires. - Multi-context concurrent use not addressed (one V4L2 device, multiple cap_pools -- iter3 scope). Other call sites updated: - picture.c::BeginPicture acquires + binds, releasing prior slot if any. - surface.c::SyncSurface marks slot DECODED after DQBUF. - surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR EXPBUF fd for force-recycle close(). - surface.c::DestroySurfaces releases via surface_unbind_slot; cap_pool owns the mmaps now. - surface.c::CreateSurfaces2 destroys the pool in the resolution-change path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS). - context.c::DestroyContext invokes cap_pool_destroy. - image.c::DeriveImage skips copy_surface_to_image when current_slot is NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces). Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly recycling slot indices, real luma gradient. mpv vaapi --vo=gpu operator-inspection follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:03:31 +00:00
test0r	e64bb0852d	iter2 Fix 2: conditional DRM_FORMAT_MOD_INVALID for non-64-aligned pitch Iteration 2 Fix 2: branch on bytesperline alignment when setting the drm_format_modifier in RequestExportSurfaceHandle. For non-64-byte-aligned pitches (e.g. 864 for 864-wide videos), report DRM_FORMAT_MOD_INVALID instead of DRM_FORMAT_MOD_NONE (LINEAR explicit). Mesa's WSI rejects LINEAR buffers that aren't scanout-aligned with 'WSI pitch not properly aligned'; MOD_INVALID tells the importer to treat as texture-only, which is the correct behavior for buffers that don't meet scanout alignment requirements. Diagnosis from operator's mozilla.org session in iteration 1 close: 864-wide intro videos triggered the WSI alignment error and Firefox fell back to SW for those videos. Sonnet Phase 5 review endorsed the conditional approach over a universal MOD_INVALID change to preserve LINEAR semantics for already-aligned content (avoids unnecessary perf cost on the common 1920-wide case). Verification path (Phase 7 of iteration 2): Firefox loads mozilla.org main page; check no MESA WSI errors in stderr; operator confirms intro videos engage HW decode (or at least don't fall back). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:18:55 +00:00
test0r	06beef6248	iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path Fix 1 of iteration 2 per phase4_iter2_plan.md. Adds surface_reset_format_cache() exposed from src/surface.h. Called from RequestDestroyContext after the dual REQBUFS(0). Without this, multi-video Firefox sessions on mozilla.org corrupted the next session's CAPTURE format query: the kernel reset to defaults but our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,' so the next G_FMT returned 48x48 and the exported descriptor encoded wrong pitch/offset. Also adds REQBUFS(0) on CAPTURE in the resolution-change path of RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting; hantro derives CAPTURE format from OUTPUT format, so leftover CAPTURE buffers from the prior resolution would also block the implicit format change. Pre-existing bug surfaced by Sonnet's audit; Fix 3 pool refactor would have exposed it more often. Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/ HEIGHT cache is a static process-global, so concurrent multi- context use still races (Sonnet 7.3 / 9.6). Iteration 2 only addresses sequential sessions. Multi-context safety is iteration 3+. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:11:03 +00:00
test0r	ac891a01fa	surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle Firefox 150's RDD calls vaExportSurfaceHandle with the VA_EXPORT_SURFACE_SEPARATE_LAYERS flag (per FFmpegVideoDecoder.cpp GetVAAPISurfaceDescriptor at the libva-VAAPI export site). With that flag, libva consumers expect 2 separate layers — Y as DRM_FORMAT_R8, UV as DRM_FORMAT_GR88, each with num_planes=1 — not the COMPOSED single-layer-with-2-planes shape we always returned regardless of flags. Our previous code ignored the flag parameter and always built the COMPOSED descriptor. mpv works with that because mpv passes the default (COMPOSED) flag and the shape matches. Firefox's DMABufSurfaceYUV import code parsed our COMPOSED descriptor as if it were SEPARATE, found bogus layer-1 data, silently fell back to FFmpeg(FFVPX) software decode after frame 0. Fix: branch on the flag and build the appropriate descriptor. flags & VA_EXPORT_SURFACE_SEPARATE_LAYERS: num_layers=2 layers[0] = Y as DRM_FORMAT_R8, num_planes=1 layers[1] = UV as DRM_FORMAT_GR88, num_planes=1 default (COMPOSED, including unflagged): num_layers=1, drm_format=DRM_FORMAT_NV12, num_planes=2 (existing behavior, preserved for mpv et al.) For the single-fd case (hantro NV12 backed by one CMA buffer), both layers reference object_index=0 with different offsets and pitches (both stride=1920 for 1920x1088). Diagnosed via Firefox source dive (mozilla/gecko-dev master, dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:1638) — the explicit flag in the export call was the discriminator between mpv's success and Firefox's silent SW fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:32:12 +00:00
test0r	fdfee2d661	DEBUG: log SyncSurface RETURN to confirm clean exit before crash	2026-05-04 15:08:25 +00:00
test0r	7da2b27454	DEBUG: ENTER logging at libva entry points to trace Firefox call flow Adds request_log on entry to: - RequestSyncSurface - RequestQuerySurfaceAttributes - RequestQuerySurfaceStatus (including the returned status value) - RequestDeriveImage - RequestQueryImageFormats - RequestGetImage Goal: identify which API call Firefox 150 makes that returns differently than it expects, causing the SW fallback after frame 0. mpv works end-to-end with the surface-export fix in place; Firefox does not. Per operator's correction: don't assume mpv's success means the driver is correct — Firefox may detect a real spec violation that mpv silently tolerates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:17:52 +00:00
test0r	37c0e720fc	surface: re-set OUTPUT format on resolution change The static SET_FORMAT_OF_OUTPUT_ONCE flag pinned the OUTPUT format to the first call's dimensions, which for mpv's probe pattern means 128x128. Subsequent CreateSurfaces2 for the real 1920x1088 resolution would then read CAPTURE format from the kernel (which derives from the OUTPUT format) and get 128x128 sizes back — leading to a VADRMPRIMESurfaceDescriptor with width=1920 height=1088 but pitch=128 offset=16384. Mesa's WSI rejected this as 'pitch too small,' and the mpv vaapi --vo=gpu render landed on a solid blue frame. Same root cause for Firefox 150's SW fallback after frame 0. Replace SET_FORMAT_OF_OUTPUT_ONCE with LAST_OUTPUT_WIDTH/HEIGHT tracking. When dimensions change, call REQBUFS(0) on OUTPUT to drop any stale buffers (S_FMT is rejected by V4L2 while buffers exist), then re-S_FMT at the new resolution. The kernel will derive the new CAPTURE format from this OUTPUT format on the next CreateBufs + G_FMT cycle. Caveat (TODO for next iteration): for consumers that legitimately stream multiple resolutions in sequence (mid-stream resolution change via V4L2_EVENT_SOURCE_CHANGE), the current approach still requires CreateSurfaces2 to be called, which mpv does on probe. A proper context-level redesign would handle SOURCE_CHANGE inline with STREAMOFF + REQBUFS(0) + new S_FMT. Diagnosis and root cause: surfaced by 2026-05-04 Phase 5 sonnet review (finding 7.3) as a 'latent bug to document.' Today's instrumentation captured it as the active bug — the ExportSurfaceHandle dump showed pitch=128 for 1920x1088 surfaces right before MESA reported 'WSI pitch too small' and dropped to software. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:05:16 +00:00
test0r	2517a1206b	DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis Logs format_width/height + bytesperline + sizes from v4l2_get_format in CreateSurfaces2, and the full VADRMPRIMESurfaceDescriptor in ExportSurfaceHandle (fd, fourcc, width/height, num_objects/layers, obj.size + drm_format/modifier, plane offsets/pitches). Diagnostic for the surface-export bug surfaced by Phase 7 (mpv --hwdec=vaapi --vo=gpu shows solid blue, Firefox falls back to SW after frame 0 — both consumers GL-import the DMA-BUF, both fail to render correctly while vaapi-copy works). Phase 5 review (sonnet) suggested format_height might be 1080 (stream) vs 1088 (MB-aligned), miscomputing UV offset by 15360 bytes. Earlier ftrace shows kernel returns height=1088 — the hypothesis is likely false but verifying in-driver to confirm. Will compare with mpv --msg-level=vd=v --msg-level=vo=v output to identify the import-side discrepancy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:00:13 +00:00
test0r	a047926dbc	DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback Tier 3E + 3F observability hardening from the libva-multiplanar campaign Phase 6 follow-up. Improves diagnostic reliability for future probes; no functional decode path change. Tier 3E (cache-fix): patch-0010's CAPTURE Y-plane dump now calls msync(p, 32, MS_SYNC\|MS_INVALIDATE) before the read so userspace sees what the kernel actually DMA-wrote, not a stale CPU cache line. Without this, the previous version of the dump consistently showed the patch-0011 sentinel (0xab) even when the kernel had overwritten it — caused half a day of mistaken "kernel never wrote the buffer" diagnosis. Also computes a luma min/max/variance signal so a uniform fill (variance=0) is visually obvious vs real pixel data (variance > 0). Tier 3F (VIDIOC_G_EXT_CTRLS readback): after v4l2_set_controls in h264_set_controls, reads back DECODE_PARAMS + PPS via v4l2_get_controls (added by patch 0003) on the request fd. Logs key fields: dec.idr_pic_id, poc_lsb, refmark_bits, poc_bits — confirms slice-header parser outputs landed in the V4L2 control batch. dec.top_foc / bot_foc — confirms patch-0015 POC sentinel strip actually applied (should NOT show 65536 unless the strip mis-fired). dec.frame_num — cross-checks against VAAPI's pre-parsed frame_num (also already logged by patch 0014). pps.flags + (SMP=...) — confirms SCALING_MATRIX_PRESENT bit set this build. pps.refidx_l0/l1 — confirms Tier 1B num_ref_idx writes landed. Discriminates "we wrote X but kernel saw Y" from "we wrote zero all along" — the failure mode the original patch series didn't catch when slice-header bit_size fields were left zero. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:58:52 +00:00
test0r	3609fbb425	DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame Diagnostic-only patch (NOT for upstream). Hex-dumps: - First 32 bytes of OUTPUT buffer at QBUF time in picture.c::RequestEndPicture (i.e. what we feed the kernel) - First 32 bytes of CAPTURE Y-plane after DQBUF in surface.c::RequestSyncSurface (i.e. what kernel returned) Lets us see whether: - OUTPUT bitstream begins with valid ANNEX_B start code + NAL header byte (e.g. `00 00 01 65` for IDR slice) - CAPTURE Y-plane after decode contains varied luma data (working) vs. all-zeros / repeating pattern (kernel didn't write anything). Removed once Step 1 decode is verified working. Output goes via existing request_log() to stderr. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
test0r	597e896594	surface: don't VIDIOC_S_FMT the CAPTURE queue The hantro stateless decoder derives the CAPTURE format from the SPS attached to the per-request OUTPUT controls. Calling VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can leave the driver's vb2 state in an inconsistent configuration where the queue accepts buffers and DQBUF returns successfully but the kernel never actually writes decoded pixels into them. Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/ gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side (via gst_v4l2_decoder_negotiate_src_format and friends). The same code path produces correctly-decoded NV12 frames on the same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT emits flat-green zeroed CAPTURE buffers. The v4l2_get_format() call immediately after this block already gives us the bytesperline / sizes the driver chose; nothing else in this file consumed the explicit S_FMT side-effects. Empirical hypothesis test for the lingering "kernel decodes without errors but emits zeroed CAPTURE" bug. If post-patch output shows actual picture content, this confirms the diagnosis: explicit CAPTURE format mutation breaks hantro's internal state. If output remains flat-green, the bug is elsewhere and we resume hex-dump-grade instrumentation. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
test0r	565f5c0de4	context: introduce request_pool, decouple OUTPUT buffers from surfaces Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5). Replaces the prior per-surface OUTPUT-buffer ownership model with a small driver-wide pool sized by codec pipeline depth (4 H.264 frames in flight), allocated unconditionally regardless of caller's num_render_targets. Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's "output_count == surfaces_count" model was a category error: OUTPUT buffers are request-time bitstream slots, CAPTURE buffers are picture-time DPB slots; their lifecycles and sizing are independent. Changes: * NEW src/request_pool.{c,h} (~200 LoC): - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap. - request_pool_destroy(): munmap all, idempotent. - request_pool_acquire(): round-robin claim; returns V4L2 buffer index of an unused slot or -1. - request_pool_release(): mark slot free for reuse. - request_pool_slot(): accessor for ptr/size given a buffer index. * src/request.h: add struct request_pool output_pool to request_data. * src/context.c::RequestCreateContext: replace the per-surface OUTPUT loop with a single request_pool_init() call (count=4, independent of surfaces_count). Drop the now-unused locals (length, offset, source_data, output_buffers_count, index, index_base, i, surface_object). DELETES patch 0002's "output_buffers_count = ... ? ... : 4" hack inline — the pool's own count parameter supersedes it. * src/picture.c::RequestBeginPicture: borrow a pool slot at frame start, write its mmap pointer/size/index into the surface's transient source_* fields. The fields stay (still useful as a borrow handle that the existing codec_store_buffer memcpys target), but no longer represent surface-permanent ownership. Reset slices_size/slices_count here too (was implicit on first Render). * src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns the OUTPUT buffer, release the pool slot and clear the surface's borrow handle. Fixes the segv on second-frame submission. * src/surface.c::RequestDestroySurfaces: remove the munmap of source_data — pool owns the mmap. * src/request.c::RequestTerminate: call request_pool_destroy() before close(video_fd) so munmaps still target a valid fd. * src/meson.build: add request_pool.c and request_pool.h to the sources/headers lists. This commit removes 0002's OUTPUT-pool hack inline (the "floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002 remains until commit 4 lands. Build-verified clean on aarch64. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
test0r	10114f6781	mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu Fourier's local patch already wired multiplanar plumbing through src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69, struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane EXPBUF loop at line 411) and through src/context.c, src/buffer.c, src/picture.c via the v4l2_type_video_{output,capture}(video_format ->v4l2_mplane) helper calls. The remaining gap: the NV12 entry in src/video.c was hardcoded to v4l2_mplane=false, and the bootstrap path in src/surface.c was hardcoded to singleplanar literals before video_format is populated. This patch flips the NV12 entry to v4l2_mplane=true and updates the two singleplanar literals in src/surface.c to their MPLANE variants: - src/video.c:42 v4l2_mplane=false -> true (NV12 only; Sunxi-tiled NV12 left at false for cedrus compatibility) - src/surface.c:84 output_type = v4l2_type_video_output(true) - src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12) Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer strace baseline). Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false, __arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the known userbase here is RK3566/RK3568 hantro. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
test0r	c45fea96e3	fourier-local: stateless control modernization + HEVC strip Compound patch carrying the fork's pre-Step-1 substrate, originally authored by Jernej Škrabec / fourier on top of bootlin's `a3c2476`: - src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the passthrough shim). - include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h> (kernel-side HEVC controls now live in the canonical UAPI header). - src/meson.build: src/h265.c / src/h265.h commented out — HEVC build path is excluded from this fork (RK3568 hantro G1/G2 has no HEVC, and the kernel-side HEVC controls have a separate rework in flight upstream). - src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep the build linking). - include/h264-ctrls.h: removed (dead post-fourier — no source includes it; the passthrough shim's CID aliases live in the kernel header now). Functionally equivalent to the prior fork master commits: `c1f5108` V4L2_PIX_FMT_H264_SLICE rename `4ccbfe9` Strip HEVC build path `da9f2a5` include/h264-ctrls.h passthrough + CID aliases `fc4bb10` src/h264.c track upstream UAPI shape `13e9b64` src/h264.c drop num_slices field `4d14ffb` src/tiled_yuv.S aarch64 stub `1b02c9b` src/h264.c include utils.h Folded into one commit during 2026-05-04 Step 1 reconciliation (see ../phase0_evidence/2026-05-04/findings.md). Per-patch history of the early fork commits preserved on the pre-step1 branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 09:40:14 +00:00
Nicolas Dufresne	b8ac9bb9ea	surface: Only set format if unset The vaCreateSurface2 may be called multiple times, setting the format again would lead to EBUSY being returned as you cannot change the format if you have buffers allocated. Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>	2019-05-16 16:14:55 +02:00
Paul Kocialkowski	518d7a0c59	Update and harmonize heading author lists Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:37:12 +01:00
Ezequiel Garcia	2c27ec3794	Add settable attributes to pixelformats Apparently, pixelformats are expected to be settable although the reason is not exactly clear to me. However, intel vaapi driver sets all its pixelformats as settable, and gstreamer-vaapi expects that as well. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>	2018-10-12 16:13:05 -03:00
Paul Kocialkowski	7ff2543e64	Add support for the single-planar V4L2 API Signed-off-by: Paul Kocialkowski <contact@paulk.fr>	2018-09-07 16:43:13 +02:00
Paul Kocialkowski	25a8ac4d7e	Register video format directly instead of tiled indicator Signed-off-by: Paul Kocialkowski <contact@paulk.fr>	2018-09-07 12:58:44 +02:00
Paul Kocialkowski	c9327dd55a	Grab the base index when allocating buffers and mapping them Because there might be more than a single call to CreateSurfaces, we cannot assume that the index relative to the number of surfaces requested in a single call matches the v4l2 index. Grab the base index (as returned by the kernel) when allocating buffers and use it for memory mapping and addressing them in v4l2. This avoids memory-mapping the first (index 0) buffer multiple times in that scenario instead of the n-th allocated buffer (in the n-th call in the sequence). Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-20 13:48:51 +02:00
Paul Kocialkowski	3b0e7dbf12	surface: Avoid unitialized variable compiler warning Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-20 13:47:48 +02:00
Paul Kocialkowski	d2357862f8	Rename request_buffer helper to query_buffer Since the V4L2 ioctl is called QUERYBUF, it makes more sense to call the associated function with the same name. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-20 13:47:48 +02:00
Paul Kocialkowski	c2fb5683cf	surface: Remove duplicate request fd close This removes a duplicate conditional close of the request fd. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-20 13:47:48 +02:00
Paul Kocialkowski	c764527c17	Add support for QuerySurfaceAttributes Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-18 15:07:42 +02:00
Paul Kocialkowski	7587ef6901	surface: Add ExportSurfaceHandle support for dma-buf export This is the latest version of dma-buf export, that does support specifying DRM modifiers. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-18 15:02:37 +02:00
Paul Kocialkowski	829abae895	surface: Add basic support for CreateSurfaces2 Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-18 14:40:36 +02:00
Maxime Ripard	92f6546596	tree: Remove void * casts void * can be assigned from and stored to any pointer type without any warning. Remove the explicit casts. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	111f5b209a	tree: Rename cedrus_data to request_data The cedrus_data structure carries the old name. In order to migrate to the new name, let's rename it to request_data. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	4ad990e087	tree: Rename the header and defines The sunxi_cedrus.h header contains a bunch of defines prefixed with SUNXI_CEDRUS. As part as the ongoing migration to a more generic name, change that prefix for V4L2_REQUEST, and the header file to request.h Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	913e1e642c	tree: Rename the libva hooks As part of our renaming effort, Rename the libva hooks names to mention request instead of SunxiCedrus Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	2d1bce38c2	h264: Don't set num_slices anymore The num_slices parameter was improperly set to the number of reference frames, which is incorrect. Add a counter for the number of slices per surface, and set num_slices to that value. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 15:28:55 +02:00
Maxime Ripard	5aeb07f8bf	tree: Run clang-format to conform to the kernel coding style The coding style has been a bit erratic. Enforce the linux kernel coding style by reusing their .clang-format file, running clang-format on the source, and ignoring the few shortcomings that clang-format has at the moment (especially on aligning the define values). Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 10:12:15 +02:00
Maxime Ripard	b938824c48	tree: Shorten struct sunxi_cedrus_driver_data name This long structure name makes it quite difficult to fit within the 80 characters limit. Shorten it. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 09:34:15 +02:00
Maxime Ripard	fd263773cc	tree: Change the macros to take the actual arguments they are using Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-13 16:00:08 +02:00
Maxime Ripard	d7d8fc744b	Abstract away MPEG2 support Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 17:07:15 +02:00
Paul Kocialkowski	9f2c069f76	Rework buffer management to be more generic and support untiled format Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 15:16:52 +02:00
Paul Kocialkowski	2ca67372f8	Set surface destination index at context time for consistency Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-10 10:45:35 +02:00
Paul Kocialkowski	e23807f928	Add dummy vaPutSurface implementation As it turns out vaPutSurface is one of the required core functions. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-06-21 09:55:44 +02:00
Paul Kocialkowski	bb73d363a3	Sync with latest definitions from the Cedrus driver and requests API Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-06-21 09:30:06 +02:00

1 2

85 Commits