libva-v4l2-request-fourier

Author	SHA1	Message	Date
claude-noether	8d71e20bf7	fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC dispatch + per-slice accumulation against the modern split V4L2_CID_STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,DECODE_MODE,START_CODE} stateless controls. Replaces the staging-era V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that were removed from the kernel UAPI. Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS: 0xa40a90 SPS (40 bytes) 0xa40a91 PPS (64 bytes) 0xa40a92 SLICE_PARAMS (variable; dynamic array; one entry per slice) 0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list) 0xa40a94 DECODE_PARAMS (328 bytes; per-frame DPB info) Plus device-wide menus set once at context.c init (in a separate batched S_EXT_CTRLS call, so that a kernel without HEVC controls, e.g. hantro on RK3568/RK3399, silently fails only that batch without invalidating H.264): 0xa40a95 DECODE_MODE (FRAME_BASED on rkvdec) 0xa40a96 START_CODE (ANNEX_B on rkvdec) Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565 (v4l2_request_hevc_queue_decode batched submission shape). Phase 5 review amendments incorporated: C1 (data_byte_offset NOT data_bit_offset): Old h265.c at lines 184-209 ran an 8-bit search to compute a bit-granularity offset. The new API renames the field to data_byte_offset (u32 byte offset). Bit search dropped; replaced with plain byte offset = source_offset + slice->slice_data_byte_offset. C2 (dpb_entry.flags only LONG_TERM_REFERENCE; pic_order_cnt_val singular; poc_st_curr_*[] and poc_lt_curr[] arrays hold DPB INDICES, not POCs): h265_fill_decode_params replaces the old slice-params DPB iteration with explicit DPB classification + index-array population. For each VAAPI ReferenceFrames[i]: - Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via VA_PICTURE_HEVC_RPS_* flags.
- Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic. - Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR. - Append j (DPB index, u8) to poc_st_curr_before[k] / poc_st_curr_after[k] / poc_lt_curr[k] based on classification. C3 (union-aliasing reasoning corrected): BeginPicture's params.h265.num_slices = 0 reset is benign for non-HEVC profiles because byte ~17764 of the params union lies past any field non-HEVC profiles read, NOT because RenderPicture's per-buffer copies overwrite that location. Wording amended in phase4_iter2_plan.md per phase5_iter2_review.md. S1 (PPS flags 19 + 20: DEBLOCKING_FILTER_CONTROL_PRESENT and UNIFORM_SPACING): Empirically, VAAPI does NOT expose either flag in VAPictureParameterBufferHEVC pic_fields.bits or slice_parsing_fields.bits, so both bits are left zero. The BBB-720p10s_hevc fixture uses neither tiles nor explicit deblocking-control parameters, so the omission is harmless for the iter2 binding cell. S2 (3 PPS scalars added): pic_parameter_set_id (default 0; VAAPI doesn't expose it), num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_active_minus1 (both populated from the VAAPI picture struct). Q2 (slice_segment_addr populated): Was missing in old h265.c. Now sourced from VAAPI's slice->slice_segment_address. S3 (SCALING_MATRIX content choice): Implementer choice taken: when iqmatrix_set==false (BBB has no scaling list, per SPS flags = SAO|STRONG_INTRA_SMOOTHING), h265_fill_scaling_matrix sends memset-zero. This matches FFmpeg's sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserves byte-equality vs the cross-validator anchor). S4 (FFmpeg function name fix): cosmetic; no code impact. Plus one Phase 6 inline correction: Phase 5 review S1 suggested VAAPI exposes uniform_spacing_flag in pic_fields.bits; an empirical test-compile shows it doesn't. A comment in h265_fill_pps documents the omission. picture.c changes (3 edits):
1. codec_set_controls HEVCMain dispatch (lines 204-206 → call h265_set_controls; replaces the explicit "Fourier-local: HEVC stripped" reject). 2. codec_store_buffer HEVC VASliceParameterBufferType case: append the VAAPI slice param to the params.h265.slices[N] array and increment num_slices. The single-slice mirror at .slice is retained for h265_fill_pps (which reads dependent_slice_segment_flag from LongSliceFlags). 3. RequestBeginPicture: add a params.h265.num_slices = 0 reset alongside the existing h264.matrix_set = false reset. surface.h: extend the params.h265 struct with a slices[HEVC_MAX_SLICES_PER_FRAME=64] array + num_slices counter. ~17 KB extra per surface union; 24 surfaces in the iter7 cap_pool = ~400 KB total surface_heap growth. The object_heap allocator picks up the new size automatically via sizeof(struct object_surface). context.c: a separate 2-control batched call sets HEVC DECODE_MODE + START_CODE device-wide, using the same best-effort (void)v4l2_set_controls pattern as the existing H.264 device-init block; if the kernel doesn't advertise HEVC controls (hantro on RK3568/RK3399), that batch silently fails without invalidating the H.264 batch. meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73) in the sources + headers lists. h265.h: added a HEVC_MAX_SLICES_PER_FRAME=64 #define before the struct forward declarations. Phase 6 smoke test on fresnel (post Commit A + Commit B): Criterion 1: vainfo lists VAProfileHEVCMain on the rkvdec env binding (/dev/video1 + /dev/media0). PASS. Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4 -frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots ready. PASS.
Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture: HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5 SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5 HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656 SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656 HW=SW byte-identical for both frames; frame1 != frame2 (real motion). PASS. Criterion 5: regression hashes hold for both prior cells: H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH) H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH) MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH) MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH) PASS. All five criteria green on first build attempt — Phase 5 review caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset rename; dpb.rps field gone + pic_order_cnt_val rename + index-array semantics) that would have been Phase 6 compile failures or silent Phase 7 byte-compare divergences. Without that review pass, this commit would have been the start of a 2+ loopback debugging cycle. Refs: ../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses, File 4 patch shape) ../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2, S3, S4, Q2 amendments all incorporated) ../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/ ffmpeg_v4l2req.stdout (cross-validator anchor — Phase 7 bonus byte-compare verification target) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:58:34 +02:00
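The C2 DPB classification in h265_fill_decode_params can be illustrated with a minimal plain-C sketch. It is not the h265.c code: the `sk_*` structs are trimmed, hypothetical stand-ins for `VAPictureHEVC` and the kernel's HEVC DPB entry and decode-params control, and the flag values are chosen for the sketch rather than copied from `va.h` or the UAPI header. What it shows is the key semantic change: the `poc_*` arrays receive DPB indices `j`, never POC values, and only long-term references get a DPB-entry flag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for VAAPI's VA_PICTURE_HEVC_* flag bits. */
#define SK_VA_INVALID       0x01u
#define SK_VA_RPS_ST_BEFORE 0x10u
#define SK_VA_RPS_ST_AFTER  0x20u
#define SK_VA_RPS_LT_CURR   0x40u

/* Stand-in for the kernel's LONG_TERM_REFERENCE DPB-entry flag. */
#define SK_DPB_LONG_TERM    0x01u

#define SK_MAX_DPB 16

struct sk_va_ref {              /* trimmed VAPictureHEVC */
	uint32_t flags;
	int32_t poc;
	uint64_t timestamp;     /* timestamp of the surface's CAPTURE buffer */
};

struct sk_dpb_entry {           /* trimmed v4l2_hevc_dpb_entry */
	uint64_t timestamp;
	uint8_t flags;
	int32_t pic_order_cnt_val;
};

struct sk_decode_params {       /* trimmed v4l2_ctrl_hevc_decode_params */
	struct sk_dpb_entry dpb[SK_MAX_DPB];
	uint8_t num_active_dpb_entries;
	uint8_t poc_st_curr_before[SK_MAX_DPB];
	uint8_t num_poc_st_curr_before;
	uint8_t poc_st_curr_after[SK_MAX_DPB];
	uint8_t num_poc_st_curr_after;
	uint8_t poc_lt_curr[SK_MAX_DPB];
	uint8_t num_poc_lt_curr;
};

static void sk_fill_dpb(struct sk_decode_params *dp,
			const struct sk_va_ref *refs, unsigned int count)
{
	assert(count <= SK_MAX_DPB);
	memset(dp, 0, sizeof(*dp));

	for (unsigned int i = 0; i < count; i++) {
		const struct sk_va_ref *r = &refs[i];
		uint8_t j;

		if (r->flags & SK_VA_INVALID)
			continue;  /* unused ReferenceFrames[] entry */

		j = dp->num_active_dpb_entries++;
		dp->dpb[j].timestamp = r->timestamp;
		dp->dpb[j].pic_order_cnt_val = r->poc;

		/* The index arrays store j (a DPB index), never the POC. */
		if (r->flags & SK_VA_RPS_LT_CURR) {
			dp->dpb[j].flags = SK_DPB_LONG_TERM;
			dp->poc_lt_curr[dp->num_poc_lt_curr++] = j;
		} else if (r->flags & SK_VA_RPS_ST_BEFORE) {
			dp->poc_st_curr_before[dp->num_poc_st_curr_before++] = j;
		} else if (r->flags & SK_VA_RPS_ST_AFTER) {
			dp->poc_st_curr_after[dp->num_poc_st_curr_after++] = j;
		}
	}
}
```

Invalid entries are skipped without consuming a DPB index, so the index arrays stay dense even when VAAPI's ReferenceFrames[] has holes.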
claude-noether	229d6d11be	fresnel-fourier iter1 Phase 6 commit D: drop missed mpeg2-ctrls.h include from context.c Fix-forward for commit C (`3aab187`): the Phase 2 source read missed a third occurrence of #include <mpeg2-ctrls.h> in src/context.c:42. The Phase 2 grep audit reported only two callsites (src/config.c:37, src/mpeg2.c:38), both removed in commit B. After commit C deleted include/mpeg2-ctrls.h from disk, the build broke on context.c with: ../src/context.c:42:10: fatal error: mpeg2-ctrls.h: No such file or directory 42 | #include <mpeg2-ctrls.h> | ^~~~~~~~~~~~~~~ The include in context.c was vestigial: context.c references no V4L2_CID_MPEG_VIDEO_MPEG2_* symbols and never needed the header, even before iter1's rewrite. The Phase 2 grep was simply incomplete. This commit drops the orphan include line. Build now passes; install is clean; Phase 1 criterion 4 (DMA-BUF GL HW=SW byte-identical pixel hashes) still PASS: HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 SW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de SW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de Per feedback_dev_process.md Phase 6 discipline: "If a plan revision is needed mid-implementation, surface it explicitly and re-enter Phase 4." This is a 1-line scope expansion of commit B's "drop mpeg2-ctrls.h include from all callsites" intent, surfaced explicitly here rather than by silently amending B (which is already pushed). No re-lock of the plan is needed; the spirit of File 1+2 in phase4_iter1_plan.md was "drop the include from every file that has it." The audit method (Phase 2 grep) was the gap. Lesson for the Phase 8 memory update: use a more authoritative completeness check than a naive grep before deleting a header. A recursive build attempt to drive out hidden includes, or a grep with no path filter, would have caught it.
Refs: ../fresnel-fourier/phase4_iter1_plan.md (File 3 + audit) ../fresnel-fourier/phase2_iter1_situation.md Bug 3 (incomplete audit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 10:24:50 +02:00
claude-noether	dcaa1f12e5	docs: clarify Rockchip silicon — PineTab2 is RK3566, not RK3568 Surfaced during iter7 Track F research: the campaign target hardware is Rockchip RK3566 silicon (PineTab2). The hantro driver attaches via the rockchip,rk3568-vpu DT compatible because the RK3566 silicon is close enough to RK3568 to share that variant. The proper RK3566 mainline driver target (rkvdec2 / vdpu346) has no kernel support yet — Christian Hewitt's patch series LKML 2025/12/26/206 is unmerged. Updates the two src/ comments that called the hardware "RK3568": - context.c: hantro-vpu device-init S_EXT_CTRLS comment now reads "via rockchip,rk3568-vpu DT compatible (covers RK3568 and RK3566 — PineTab2 silicon — since they're close enough)" - h264.c: DPB pic_num discussion ends "...never surfaced on PineTab2 (RK3566 via hantro/rk3568-vpu)" Not a correctness change. Compiles + decodes identically. The update matters for upstream submission accuracy (bootlin/Rockchip maintainers will care which silicon the campaign tested on). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 11:39:11 +00:00
claude-noether	a09c03c154	iter6 fix: per-OUTPUT-slot request_fd binding via REINIT iter4 (`385dee1`) replaced the original media_request_reinit pattern with close+media_request_alloc per frame to escape an EINVAL on S_EXT_CTRLS that turned out to be a DPB-payload bug (`74d8dd1`, FFmpeg V4L2_H264_FRAME_REF semantics). The per-frame close+alloc model worked for mpv vaapi-copy (single-surface recycle) but raced under Firefox 150's MediaSource pipeline (multi-surface rotation): fd=30 got reused via lowest-free-fd allocation faster than the kernel-side per-buffer state machine could tear down the prior request, producing intermittent VIDIOC_QBUF EINVAL on OUTPUT after 1..53 successful frames. Phase 2 telemetry confirmed: - DQBUF returned the index we passed (no FIFO mismatch) - The first 64 bytes of SPS/PPS/DECODE_PARAMS/SCALING_MATRIX were byte-identical between mpv and Firefox - Pool size bump 4 -> 16 only delayed the failure (62 frames) - Different OUTPUT slot indices failed across runs (race signature) Fix: each OUTPUT pool slot owns a permanent request_fd allocated once at request_pool_init and REINIT'd between uses in RequestSyncSurface. 1:1 slot-to-fd binding eliminates cross-slot fd reuse entirely. Pool stays driver-wide (multi-context safe per iter5 Track E); slots cycle through 16 distinct fds in round-robin acquire.
Files: - request_pool.h: add request_fd field to slot struct; init signature takes media_fd - request_pool.c: alloc per-slot fd at init, close at destroy - context.c: pass driver_data->media_fd; pool size 4 -> 16 - picture.c: BeginPicture binds slot->request_fd to surface; EndPicture's per-frame media_request_alloc removed - surface.c: RequestSyncSurface uses media_request_reinit instead of close+alloc; DestroySurfaces close removed (slot owns fd); error path close removed; surface_object NULL-init for the -Wmaybe-uninitialized warning fix Empirical verification (clean build sha ebe396d5..., no diagnostic instrumentation): - Firefox 150 + bbb_1080p30_h264.mp4 + LIBVA_DRIVER_NAME=v4l2_request + sandbox enabled: 35s+ playback, zero "Unable to queue buffer" / "Unable to set control(s)", lsof shows RDD process holds /dev/video1 + /dev/media0 throughout. Driver stderr: only the single cap_pool_init: 24 slots ready line. - mpv vaapi-copy 50 frames: zero errors, "Using hardware decoding (vaapi-copy)" - no regression vs iter5-end driver. Pool-size bump diagnostic (Phase 5 sonnet design review feedback): 4 -> 16 alone took 1->62 frames, far short of the 30s success criterion (~900 frames at 30fps). REINIT discipline is the actual fix; pool 16 is comfortable headroom over typical H.264 MaxDpbFrames. Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment attribution corrected: cleanup runs at RequestTerminate, not RequestDestroyContext, since the pool is driver-wide). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 21:30:39 +00:00
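The iter6 diagnosis can be reproduced with a toy model that assumes only one thing: descriptors are handed out lowest-number-first, which is how Linux allocates fds. The `sk_*` names and the 64-entry table are hypothetical; comments mark where the driver would call media_request_alloc, close() or the REINIT ioctl. Per-frame alloc+close hands back the same number on every frame, while slot-owned fds rotate through distinct numbers.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_FD_TABLE   64
#define SK_POOL_SLOTS 16

/* Toy descriptor table: like Linux, it always returns the lowest free number. */
static bool sk_fd_used[SK_FD_TABLE];

static int sk_fd_alloc(void)
{
	for (int fd = 0; fd < SK_FD_TABLE; fd++) {
		if (!sk_fd_used[fd]) {
			sk_fd_used[fd] = true;
			return fd;
		}
	}
	return -1;
}

static void sk_fd_close(int fd)
{
	assert(fd >= 0 && fd < SK_FD_TABLE && sk_fd_used[fd]);
	sk_fd_used[fd] = false;
}

/* iter4 model: one request per frame, closed after sync. */
static int sk_per_frame_request(void)
{
	int fd = sk_fd_alloc();  /* media_request_alloc() in the driver */
	/* ... attach controls, QBUF, queue the request, wait ... */
	sk_fd_close(fd);         /* close() after sync */
	return fd;
}

/* iter6 model: each OUTPUT slot owns one request fd for its lifetime. */
struct sk_slot_pool {
	int request_fd[SK_POOL_SLOTS];
};

static void sk_slot_pool_init(struct sk_slot_pool *p)
{
	for (int i = 0; i < SK_POOL_SLOTS; i++)
		p->request_fd[i] = sk_fd_alloc();
}

/* fd used for a given frame; frame % slots stands in for round-robin
 * acquire. Between uses the driver REINITs the request in place, so the
 * number bound to a slot never changes. */
static int sk_slot_request(const struct sk_slot_pool *p, unsigned int frame)
{
	return p->request_fd[frame % SK_POOL_SLOTS];
}

static void sk_slot_pool_destroy(struct sk_slot_pool *p)
{
	for (int i = 0; i < SK_POOL_SLOTS; i++)
		sk_fd_close(p->request_fd[i]);
}
```

In the per-frame model, back-to-back frames get the identical fd number, which is the window the commit describes; in the slot model a number only comes back after a full rotation through the pool.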
test0r	b993355507	iter5 Track E: move LAST_OUTPUT_WIDTH/HEIGHT from process-global to per-driver-data Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate. Two libva driver_data instances in the same process (e.g. Firefox playing two tabs at different resolutions, or Firefox + mpv via the same dlopened backend) would race on the static cache. Move to struct request_data.last_output_width/height. The V4L2 device fd is already per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format). Verified: two concurrent mpv processes (2s stagger) both decode 300 frames cleanly with no cross-corruption. Same-instant init still hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend. Resolves the surface_reset_format_cache() callsite: now takes driver_data parameter (was zero-arg). Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls that the iter5 sweep left behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:05:41 +00:00
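The Track E change amounts to moving the cached OUTPUT size from a static into the per-instance struct. A minimal sketch with a hypothetical, trimmed `sk_request_data` (the real `struct request_data` carries much more); the reset helper plays the role of `surface_reset_format_cache(driver_data)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed, hypothetical stand-in for struct request_data. */
struct sk_request_data {
	int video_fd;                    /* one V4L2 fd per instance ...      */
	unsigned int last_output_width;  /* ... so one cached OUTPUT size too */
	unsigned int last_output_height;
};

/* True if OUTPUT must be re-programmed (S_FMT) for w x h; records the size. */
static bool sk_output_format_changed(struct sk_request_data *d,
				     unsigned int w, unsigned int h)
{
	assert(d);
	if (d->last_output_width == w && d->last_output_height == h)
		return false;
	d->last_output_width = w;
	d->last_output_height = h;
	return true;
}

/* Forget the cached size when the context is torn down, so the next
 * session on this instance re-issues S_FMT. */
static void sk_reset_format_cache(struct sk_request_data *d)
{
	d->last_output_width = 0;
	d->last_output_height = 0;
}
```

With a process-global cache, the second instance's 1280x720 would have overwritten the first instance's 1920x1088 entry; per-instance fields keep the two independent.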
test0r	19acc76da4	iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE buffer. mpv reusing a surface for a new decode while the compositor still held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to write fresh decode output into the same physical memory the compositor was reading -- visible as stutter / back-and-forth swap on mpv --hwdec=vaapi --vo=gpu playback. Architecture: - New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t. - Surfaces no longer own buffers; each vaBeginPicture acquires the oldest FREE slot (LRU), binds it for the decode cycle, and the slot cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF). - Slot is released on next BeginPicture for the same surface or on vaDestroySurfaces. Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+): - Option-A statistical mitigation; race window narrows to "pool exhausted, force-recycle of oldest EXPORTED slot." For typical mpv 16-surface playback with MIN_CAP_POOL=24 the fallback never fires. - Multi-context concurrent use not addressed (one V4L2 device, multiple cap_pools -- iter3 scope). Other call sites updated: - picture.c::BeginPicture acquires + binds, releasing prior slot if any. - surface.c::SyncSurface marks slot DECODED after DQBUF. - surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR EXPBUF fd for force-recycle close(). - surface.c::DestroySurfaces releases via surface_unbind_slot; cap_pool owns the mmaps now. - surface.c::CreateSurfaces2 destroys the pool in the resolution-change path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS). - context.c::DestroyContext invokes cap_pool_destroy. 
- image.c::DeriveImage skips copy_surface_to_image when current_slot is NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces). Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly recycling slot indices, real luma gradient. mpv vaapi --vo=gpu operator-inspection follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:03:31 +00:00
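The slot-selection policy described for cap_pool (oldest FREE slot first, force-recycle of the oldest EXPORTED slot only when the pool is exhausted) can be sketched as below. Assumptions: a logical clock stands in for whatever LRU ordering the real pool uses, the `sk_*` names are hypothetical, and the `pthread_mutex_t` locking is omitted to keep the sketch single-threaded.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CAP_SLOTS 24  /* MIN_CAP_POOL */

/* Lifecycle: FREE -> IN_DECODE (acquire) -> DECODED (after DQBUF)
 * -> EXPORTED (after EXPBUF) -> FREE (release). */
enum sk_slot_state { SK_FREE, SK_IN_DECODE, SK_DECODED, SK_EXPORTED };

struct sk_cap_slot {
	enum sk_slot_state state;
	uint64_t last_used;  /* logical clock at acquire; smaller = older */
};

struct sk_cap_pool {
	struct sk_cap_slot slot[SK_CAP_SLOTS];
	uint64_t clock;
};

static int sk_oldest_in_state(const struct sk_cap_pool *p,
			      enum sk_slot_state s)
{
	int best = -1;

	for (int i = 0; i < SK_CAP_SLOTS; i++) {
		if (p->slot[i].state != s)
			continue;
		if (best < 0 || p->slot[i].last_used < p->slot[best].last_used)
			best = i;
	}
	return best;
}

/* Returns the slot bound for this decode, or -1 if every slot is
 * IN_DECODE or DECODED. */
static int sk_cap_acquire(struct sk_cap_pool *p)
{
	int i = sk_oldest_in_state(p, SK_FREE);

	if (i < 0)  /* exhausted: force-recycle (real code closes its EXPBUF fd) */
		i = sk_oldest_in_state(p, SK_EXPORTED);
	if (i < 0)
		return -1;
	p->slot[i].state = SK_IN_DECODE;
	p->slot[i].last_used = ++p->clock;
	return i;
}

static void sk_cap_mark(struct sk_cap_pool *p, int i, enum sk_slot_state s)
{
	assert(i >= 0 && i < SK_CAP_SLOTS);
	p->slot[i].state = s;
}
```

Because a just-released slot is younger than the never-used ones, LRU spreads decodes across the whole pool instead of immediately rewriting the buffer that was handed back most recently.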
test0r	06beef6248	iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path Fix 1 of iteration 2 per phase4_iter2_plan.md. Adds surface_reset_format_cache(), exposed from src/surface.h and called from RequestDestroyContext after the dual REQBUFS(0). Without this, multi-video Firefox sessions on mozilla.org corrupted the next session's CAPTURE format query: the kernel reset to defaults, but our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,' so the next G_FMT returned 48x48 and the exported descriptor encoded the wrong pitch/offset. Also adds REQBUFS(0) on CAPTURE in the resolution-change path of RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting; hantro derives the CAPTURE format from the OUTPUT format, so leftover CAPTURE buffers from the prior resolution would also block the implicit format change. This pre-existing bug was surfaced by Sonnet's audit; the Fix 3 pool refactor would have exposed it more often. Limitation noted in the surface.h docblock: the LAST_OUTPUT_WIDTH/HEIGHT cache is a static process-global, so concurrent multi-context use still races (Sonnet 7.3 / 9.6). Iteration 2 only addresses sequential sessions; multi-context safety is iteration 3+. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:11:03 +00:00
test0r	4078368104	context: enable ANNEX_B start-code emission to match device Patch 0002 sets V4L2_CID_STATELESS_H264_START_CODE to ANNEX_B on the device, telling the kernel that OUTPUT-buffer payloads will contain 0x00 0x00 0x01 NAL start codes. picture.c::codec_store_buffer has the prepend logic guarded by `if (context->h264_start_code)`, but that boolean is set ONLY inside h264_get_controls() — a function that exists but is never called. Result: device expects ANNEX_B, libva-v4l2-request feeds raw NAL payloads with no start codes, kernel cannot find slice boundaries, hantro emits a zeroed CAPTURE buffer. mpv reports successful decode because the V4L2 round-trip succeeds (no EINVAL); the visual output is a flat dark-green frame (NV12 zero through BT.709). Identified via: - Patch 0006 cleared the EINVAL cluster-rejection (128 → 0 on bbb_1080p30) but visual output remained flat green. - GStreamer reference (gstv4l2codech264dec.c:1363-1377) confirms start codes are required when ANNEX_B is selected. - Source-archaeology of fourier's picture.c:67-74 showed the gate on context->h264_start_code. Fix: in context.c::RequestCreateContext, immediately after patch 0002's device-control block, set context_object->h264_start_code = true to match the ANNEX_B mode we just programmed. Hardcoded for now (matches 0002's hardcoded set); replaced with a runtime probe in the planned probe-then-set commit. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
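A sketch of the start-code prepend that codec_store_buffer gates on `context->h264_start_code`, assuming the caller's slice buffers hold bare NAL units. `sk_append_slice` is a hypothetical helper, not the picture.c function; it only shows the byte layout the kernel expects once ANNEX_B is selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const unsigned char sk_annexb_start_code[3] = { 0x00, 0x00, 0x01 };

/* Append one slice NAL to the OUTPUT buffer at *offset, prefixed by the
 * 3-byte Annex B start code when start_code is set. Returns false (and
 * leaves *offset untouched) if the slice would not fit. */
static bool sk_append_slice(unsigned char *out, size_t out_size, size_t *offset,
			    const unsigned char *nal, size_t nal_size,
			    bool start_code)
{
	size_t need = nal_size + (start_code ? sizeof(sk_annexb_start_code) : 0);

	assert(*offset <= out_size);
	if (need > out_size - *offset)
		return false;

	if (start_code) {
		memcpy(out + *offset, sk_annexb_start_code,
		       sizeof(sk_annexb_start_code));
		*offset += sizeof(sk_annexb_start_code);
	}
	memcpy(out + *offset, nal, nal_size);
	*offset += nal_size;
	return true;
}
```

With the flag stuck at false, the buffer starts directly with the NAL header byte, so a decoder scanning for 00 00 01 finds no slice boundary, matching the zeroed CAPTURE output described above.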
test0r	565f5c0de4	context: introduce request_pool, decouple OUTPUT buffers from surfaces Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5). Replaces the prior per-surface OUTPUT-buffer ownership model with a small driver-wide pool sized by codec pipeline depth (4 H.264 frames in flight), allocated unconditionally regardless of caller's num_render_targets. Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's "output_count == surfaces_count" model was a category error: OUTPUT buffers are request-time bitstream slots, CAPTURE buffers are picture-time DPB slots; their lifecycles and sizing are independent. Changes: * NEW src/request_pool.{c,h} (~200 LoC): - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap. - request_pool_destroy(): munmap all, idempotent. - request_pool_acquire(): round-robin claim; returns V4L2 buffer index of an unused slot or -1. - request_pool_release(): mark slot free for reuse. - request_pool_slot(): accessor for ptr/size given a buffer index. * src/request.h: add struct request_pool output_pool to request_data. * src/context.c::RequestCreateContext: replace the per-surface OUTPUT loop with a single request_pool_init() call (count=4, independent of surfaces_count). Drop the now-unused locals (length, offset, source_data, output_buffers_count, index, index_base, i, surface_object). DELETES patch 0002's "output_buffers_count = ... ? ... : 4" hack inline — the pool's own count parameter supersedes it. * src/picture.c::RequestBeginPicture: borrow a pool slot at frame start, write its mmap pointer/size/index into the surface's transient source_* fields. The fields stay (still useful as a borrow handle that the existing codec_store_buffer memcpys target), but no longer represent surface-permanent ownership. Reset slices_size/slices_count here too (was implicit on first Render). 
* src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns the OUTPUT buffer, release the pool slot and clear the surface's borrow handle. Fixes the segv on second-frame submission. * src/surface.c::RequestDestroySurfaces: remove the munmap of source_data — pool owns the mmap. * src/request.c::RequestTerminate: call request_pool_destroy() before close(video_fd) so munmaps still target a valid fd. * src/meson.build: add request_pool.c and request_pool.h to the sources/headers lists. This commit removes 0002's OUTPUT-pool hack inline (the "floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002 remains until commit 4 lands. Build-verified clean on aarch64. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
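The pool's acquire/release contract and the surface-side borrow handle can be sketched with hypothetical `sk_*` names. The sketch assumes buffer indices start at 0 (the real pool takes them from CREATE_BUFS) and leaves the mmap pointers NULL; it only models who owns a slot and when.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_OUTPUT_SLOTS 4  /* "4 H.264 frames in flight" */

struct sk_out_slot {
	void *data;   /* mmap()ed bitstream buffer; NULL in this sketch */
	size_t size;
	bool in_use;
};

struct sk_request_pool {
	struct sk_out_slot slot[SK_OUTPUT_SLOTS];
	unsigned int cursor;  /* where the next round-robin search starts */
};

/* Round-robin claim; returns the buffer index or -1 if all are in flight. */
static int sk_pool_acquire(struct sk_request_pool *p)
{
	for (unsigned int n = 0; n < SK_OUTPUT_SLOTS; n++) {
		unsigned int i = (p->cursor + n) % SK_OUTPUT_SLOTS;

		if (!p->slot[i].in_use) {
			p->slot[i].in_use = true;
			p->cursor = (i + 1) % SK_OUTPUT_SLOTS;
			return (int)i;
		}
	}
	return -1;
}

static void sk_pool_release(struct sk_request_pool *p, int index)
{
	assert(index >= 0 && index < SK_OUTPUT_SLOTS && p->slot[index].in_use);
	p->slot[index].in_use = false;
}

/* Transient borrow handle: valid from BeginPicture until SyncSurface. */
struct sk_surface {
	int source_index;
	void *source_data;
	size_t source_size;
	size_t slices_size;
};

static bool sk_begin_picture(struct sk_request_pool *p, struct sk_surface *s)
{
	int index = sk_pool_acquire(p);

	if (index < 0)
		return false;
	s->source_index = index;
	s->source_data = p->slot[index].data;
	s->source_size = p->slot[index].size;
	s->slices_size = 0;  /* reset per frame */
	return true;
}

static void sk_sync_surface(struct sk_request_pool *p, struct sk_surface *s)
{
	/* ... VIDIOC_DQBUF on OUTPUT would happen here ... */
	sk_pool_release(p, s->source_index);
	s->source_index = -1;
	s->source_data = NULL;
	s->source_size = 0;
}
```

The pool size is fixed by pipeline depth, not by the number of surfaces, so a caller that passes num_render_targets=0 still gets four OUTPUT slots.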
test0r	50e0c2b996	context: pre-STREAMON device controls and minimum OUTPUT pool Two related fixes that surfaced during the first hantro-vpu (RK3568) smoke test of the multiplanar build: 1. The OUTPUT queue must be non-empty at STREAMON. Hantro's vb2_start_streaming rejects an empty queue with EINVAL. Some VA-API callers (notably ffmpeg's vaapi-copy path) call vaCreateContext with num_render_targets=0 and allocate render targets lazily. The OUTPUT (bitstream-input) pool must NOT be sized off surfaces_count alone; it is a request-time resource, not a per-surface one. Quick fix: floor the pool to 4 buffers when the caller passes 0. (A proper decoupling of the OUTPUT pool from the surface lifecycle is documented in upstreamable_design.md.) 2. Device-wide stateless H.264 controls before STREAMON. The V4L2 stateless framework requires that V4L2_CID_STATELESS_H264_DECODE_MODE and START_CODE be set on the device fd (request_fd=-1) before stream start. Per-request controls (SPS/PPS/SLICE_PARAMS/etc.) attached to a request_fd come later via h264_set_controls(). hantro-vpu accepts only DECODE_MODE_FRAME_BASED; START_CODE_ANNEX_B matches what the existing slice-assembly path emits. Both are set unconditionally for now (errors silently ignored) to keep cedrus and other backends compatible, since they may default to SLICE_BASED and not expose DECODE_MODE at all. Probe-then-set via VIDIOC_QUERYCTRL is the upstream-correct approach (see upstreamable_design.md §3). After this patch, vainfo still enumerates as before, but the first mpv vaapi-copy attempt advances past STREAMON and into actual decode submission. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
test0r	c45fea96e3	fourier-local: stateless control modernization + HEVC strip Compound patch carrying the fork's pre-Step-1 substrate, originally authored by Jernej Škrabec / fourier on top of bootlin's `a3c2476`: - src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the passthrough shim). - include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h> (kernel-side HEVC controls now live in the canonical UAPI header). - src/meson.build: src/h265.c / src/h265.h commented out — HEVC build path is excluded from this fork (RK3568 hantro G1/G2 has no HEVC, and the kernel-side HEVC controls have a separate rework in flight upstream). - src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep the build linking). - include/h264-ctrls.h: removed (dead post-fourier — no source includes it; the passthrough shim's CID aliases live in the kernel header now). Functionally equivalent to the prior fork master commits: `c1f5108` V4L2_PIX_FMT_H264_SLICE rename `4ccbfe9` Strip HEVC build path `da9f2a5` include/h264-ctrls.h passthrough + CID aliases `fc4bb10` src/h264.c track upstream UAPI shape `13e9b64` src/h264.c drop num_slices field `4d14ffb` src/tiled_yuv.S aarch64 stub `1b02c9b` src/h264.c include utils.h Folded into one commit during 2026-05-04 Step 1 reconciliation (see ../phase0_evidence/2026-05-04/findings.md). Per-patch history of the early fork commits preserved on the pre-step1 branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 09:40:14 +00:00
Paul Kocialkowski	7f359be748	Include missing needed codec headers for build Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-05-16 16:32:03 +02:00
Paul Kocialkowski	d48ace9757	Update H.264 V4L2 pixel format, which was renamed Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-05-16 16:24:23 +02:00
Paul Kocialkowski	e29b04ccc7	autotools: Rewrite configuration in a minimalistic fashion Drop the per-codec options while at it, since we'll soon include a copy of the associated headers. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:37:12 +01:00
Paul Kocialkowski	518d7a0c59	Update and harmonize heading author lists Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2019-03-07 11:37:12 +01:00
Ezequiel Garcia	b2944629fa	Add support for dynamic detection of supported codecs H.264 and H.265 are still not supported upstream, so it makes sense to autodetect each codec and only enable those that are supported. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>	2018-10-12 16:11:06 -03:00
Paul Kocialkowski	7ff2543e64	Add support for the single-planar V4L2 API Signed-off-by: Paul Kocialkowski <contact@paulk.fr>	2018-09-07 16:43:13 +02:00
Paul Kocialkowski	13eaae060e	Add support for H265 decoding, including predictive frames Some features are missing, such as scaling lists (quantization) and 10-bit output. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-08-31 10:13:52 +02:00
Paul Kocialkowski	c9327dd55a	Grab the base index when allocating buffers and mapping them Because there might be more than a single call to CreateSurfaces, we cannot assume that the index relative to the number of surfaces requested in a single call matches the v4l2 index. Grab the base index (as returned by the kernel) when allocating buffers and use it for memory mapping and addressing them in v4l2. This avoids memory-mapping the first (index 0) buffer multiple times in that scenario instead of the n-th allocated buffer (in the n-th call in the sequence). Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-20 13:48:51 +02:00
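A toy model of the base-index fix, assuming allocation goes through VIDIOC_CREATE_BUFS, which reports the first index of each batch in `struct v4l2_create_buffers.index`. `sk_queue` is a hypothetical stand-in for the kernel's queue; nothing here touches a real device.

```c
#include <assert.h>

/* Toy queue: buffers are numbered consecutively across allocation calls. */
struct sk_queue {
	unsigned int allocated;
};

/* Allocate count buffers; returns the first index of the batch, the way
 * the kernel reports it after VIDIOC_CREATE_BUFS. */
static unsigned int sk_create_bufs(struct sk_queue *q, unsigned int count)
{
	unsigned int base = q->allocated;

	q->allocated += count;
	return base;
}

/* V4L2 index of the i-th buffer from one call: base + i, never plain i. */
static unsigned int sk_buffer_index(unsigned int base, unsigned int count,
				    unsigned int i)
{
	assert(i < count);
	return base + i;
}
```

Using plain `i` on the second CreateSurfaces call would map index 0 again, which is exactly the duplicate mapping the commit removes.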
Paul Kocialkowski	fa7ab6a251	context: Liberate output and capture buffers at ContextDestroy The V4L2 API does not currently provide a way to liberate allocated buffers one by one (which would fit well with DestroySurfaces in VAAPI). Moreover, streaming needs to be off before liberating buffers is allowed. As a result, output and capture buffers can only be liberated when destroying the decoding context, all at once, as implemented in this patch. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-20 13:47:48 +02:00
Paul Kocialkowski	d2357862f8	Rename request_buffer helper to query_buffer Since the V4L2 ioctl is called QUERYBUF, it makes more sense to call the associated function with the same name. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-20 13:47:48 +02:00
Maxime Ripard	92f6546596	tree: Remove void * casts void * can be assigned from and stored to any pointer type without any warning. Remove the explicit casts. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	111f5b209a	tree: Rename cedrus_data to request_data The cedrus_data structure carries the old name. In order to migrate to the new name, let's rename it to request_data. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	4ad990e087	tree: Rename the header and defines The sunxi_cedrus.h header contains a bunch of defines prefixed with SUNXI_CEDRUS. As part of the ongoing migration to a more generic name, change that prefix to V4L2_REQUEST, and rename the header file to request.h. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	913e1e642c	tree: Rename the libva hooks As part of our renaming effort, rename the libva hooks to mention request instead of SunxiCedrus. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 17:02:23 +02:00
Maxime Ripard	e7c09a336f	h264: Implement local cache of the latest decoded pictures libva only provides the reference images needed to decode the current picture, but not the full DPB. However, some codecs need that whole DPB in order to decode a picture. For example, the Allwinner hardware codec has an internal SRAM, with each picture getting a slot in that SRAM, and during each decoding process, some metadata will then be generated from that SRAM content to a separate buffer. Therefore, each frame must be located at the same SRAM position each time so that the metadata can be re-used properly. However, since libva will only pass a few reference images, we can end up in a situation where multiple subsequent frames will have the same reference images set, but might all be used as reference later on and therefore cannot be located at the same position. And from a more theoretical point of view, Linux expects a full-blown DPB in its H264 control. In order to work around this, we can create a shadow of the DPB by simply maintaining a list of 16 decoded images, each associated with its VAPictureH264 and an age. This age is the last time we used that frame as reference. When a new picture is decoded, either we assign it to a free slot, or we reuse the slot from the frame that hasn't been used as a reference for the longest time. This is a much simpler approach than the one documented in the H264 spec, but this shouldn't really be a problem since we don't handle the reference frames ourselves, but just re-use the ones provided by libva, which were taken from the bitstream beforehand. As such, frames that are no longer supposed to be used for reference will not be referenced anymore, their age will not increase, and therefore after a while we will garbage-collect their slot to store a much newer frame. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 15:30:33 +02:00
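The age-based shadow DPB can be sketched in a few dozen lines. Assumptions: a surface ID stands in for the full `VAPictureH264` match, `now` is a per-picture counter, and reinserting a surface that is already tracked reuses its slot (a choice made for the sketch, not taken from h264.c). The `sk_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_DPB_SIZE 16

struct sk_dpb_slot {
	bool used;
	unsigned int surface_id;  /* stands in for the mirrored VAPictureH264 */
	unsigned int age;         /* last picture that used this slot */
};

struct sk_dpb {
	struct sk_dpb_slot slot[SK_DPB_SIZE];
	unsigned int now;         /* incremented once per decoded picture */
};

static int sk_dpb_find(const struct sk_dpb *d, unsigned int surface_id)
{
	for (int i = 0; i < SK_DPB_SIZE; i++)
		if (d->slot[i].used && d->slot[i].surface_id == surface_id)
			return i;
	return -1;
}

/* First free slot if any, otherwise the slot with the oldest age. */
static int sk_dpb_pick_slot(const struct sk_dpb *d)
{
	int victim = -1;

	for (int i = 0; i < SK_DPB_SIZE; i++) {
		if (!d->slot[i].used)
			return i;
		if (victim < 0 || d->slot[i].age < d->slot[victim].age)
			victim = i;
	}
	return victim;
}

/* Decode one picture: refresh the ages of its references, then give the
 * new picture a slot it keeps for as long as it stays in the DPB. */
static int sk_dpb_decode(struct sk_dpb *d, unsigned int cur,
			 const unsigned int *refs, size_t nrefs)
{
	int i;

	d->now++;
	for (size_t r = 0; r < nrefs; r++) {
		i = sk_dpb_find(d, refs[r]);
		if (i >= 0)
			d->slot[i].age = d->now;
	}

	i = sk_dpb_find(d, cur);  /* recycled surface: reuse its slot */
	if (i < 0)
		i = sk_dpb_pick_slot(d);
	assert(i >= 0);
	d->slot[i].used = true;
	d->slot[i].surface_id = cur;
	d->slot[i].age = d->now;
	return i;
}
```

A picture that keeps being referenced keeps refreshing its age and therefore keeps its slot (and, on Allwinner, its SRAM position); pictures that drop out of the reference lists age out and are evicted first.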
Maxime Ripard	5aeb07f8bf	tree: Run clang-format to conform to the kernel coding style The coding style has been a bit erratic. Enforce the Linux kernel coding style by reusing its .clang-format file, running clang-format on the source, and ignoring the few shortcomings that clang-format has at the moment (especially when aligning the define values). Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 10:12:15 +02:00
Maxime Ripard	b938824c48	tree: Shorten struct sunxi_cedrus_driver_data name This long structure name makes it quite difficult to fit within the 80-character limit. Shorten it. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-17 09:34:15 +02:00
Maxime Ripard	fd263773cc	tree: Change the macros to take the actual arguments they are using Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-13 16:00:08 +02:00
Maxime Ripard	1efa9d877e	Add support for H264 decoding Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 17:07:15 +02:00
Paul Kocialkowski	03fd51b3b3	Reduce switch/case indentation Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 15:32:52 +02:00
Paul Kocialkowski	a9f3129298	context: Use proper error path Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 15:21:17 +02:00
Maxime Ripard	53a8c6e1cf	context: Make it clear why we copy the ids Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>	2018-07-11 15:20:05 +02:00
Paul Kocialkowski	9f2c069f76	Rework buffer management to be more generic and support untiled format Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-11 15:16:52 +02:00
Paul Kocialkowski	2ca67372f8	Set surface destination index at context time for consistency Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-07-10 10:45:35 +02:00
Paul Kocialkowski	bb73d363a3	Sync with latest definitions from the Cedrus driver and requests API Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-06-21 09:30:06 +02:00
Paul Kocialkowski	00c190c740	context: Include missing utils header Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-05-02 16:42:35 +02:00
Paul Kocialkowski	26536a0d8b	context: Add warning about index mismatch when allocating source buffers Signed-off-by: Paul Kocialkowski <contact@paulk.fr>	2018-05-02 14:53:42 +02:00
Paul Kocialkowski	ebd5a845b1	context: Register context parameter with object Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 15:05:37 +02:00
Paul Kocialkowski	b01a66dcd8	context: Add missing new line Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 11:12:21 +02:00
Paul Kocialkowski	294a6c958a	Use all-caps macros instead of object_heap_lookup (for now) Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 11:03:58 +02:00
Paul Kocialkowski	f872e345d0	Centralize buffer-related resources in surface object and avoid dynamic indexes Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 10:48:17 +02:00
Paul Kocialkowski	58b15c25c9	context: Resolve various trivial build issues Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-25 09:19:00 +02:00
Paul Kocialkowski	a8c191b544	Rename mem2mem_fd to video_fd to prepare for media introduction Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-24 16:40:42 +02:00
Paul Kocialkowski	c7f0d7684a	Introduce and use dedicated v4l2 helpers to replace inline ioctls Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-24 15:39:31 +02:00
Paul Kocialkowski	2399515b84	context: Harmonize coding style Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-24 10:03:58 +02:00
Paul Kocialkowski	cd31cb568c	Rename va_config to config for consistency Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 17:09:19 +02:00
Paul Kocialkowski	4b7e71668e	Reorder functions, with a straightforward logic Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 16:51:15 +02:00
Paul Kocialkowski	a5354efe43	Rework comments by splitting them into README and removing redundant ones Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 16:40:00 +02:00
Paul Kocialkowski	104eb22462	context: Rename target structure elements to make them explicit Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>	2018-04-23 14:55:18 +02:00

64 Commits
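Commit e7c09a336f describes a shadow DPB of 16 slots with age-based reuse: referenced frames get their age refreshed, and a new picture takes a free slot or else the least recently referenced one. The following is a minimal sketch of that policy, not the driver's actual h264.c code. All names (`struct dpb_cache`, `dpb_decode_picture`, and so on) are hypothetical, and a plain `uint32_t` surface ID stands in for the `VAPictureH264` the commit mentions. Re-decoding into a surface that is already cached reuses its slot, which is an assumption added here to avoid duplicate entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the shadow DPB; names are illustrative only. */
#define DPB_SIZE 16

struct dpb_entry {
	uint32_t surface_id; /* stand-in for the frame's VAPictureH264 */
	unsigned int age;    /* last decode step that referenced this frame */
	bool valid;          /* slot currently holds a frame */
};

struct dpb_cache {
	struct dpb_entry entries[DPB_SIZE];
	unsigned int current_age; /* bumped once per decoded picture */
};

static void dpb_init(struct dpb_cache *dpb)
{
	for (size_t i = 0; i < DPB_SIZE; i++)
		dpb->entries[i] = (struct dpb_entry){ .valid = false };
	dpb->current_age = 0;
}

/* Return the slot holding surface_id, or NULL if it is not cached. */
static struct dpb_entry *dpb_lookup(struct dpb_cache *dpb, uint32_t surface_id)
{
	for (size_t i = 0; i < DPB_SIZE; i++)
		if (dpb->entries[i].valid &&
		    dpb->entries[i].surface_id == surface_id)
			return &dpb->entries[i];
	return NULL;
}

/* Place a newly decoded picture: its existing slot, a free slot, or the
 * slot whose frame has gone unreferenced the longest. */
static size_t dpb_insert(struct dpb_cache *dpb, uint32_t surface_id)
{
	struct dpb_entry *existing = dpb_lookup(dpb, surface_id);
	size_t slot = existing ? (size_t)(existing - dpb->entries) : DPB_SIZE;

	/* Prefer the first free slot. */
	for (size_t i = 0; i < DPB_SIZE && slot == DPB_SIZE; i++)
		if (!dpb->entries[i].valid)
			slot = i;

	/* Otherwise evict the oldest (least recently referenced) frame. */
	if (slot == DPB_SIZE) {
		slot = 0;
		for (size_t i = 1; i < DPB_SIZE; i++)
			if (dpb->entries[i].age < dpb->entries[slot].age)
				slot = i;
	}

	assert(slot < DPB_SIZE);
	dpb->entries[slot] = (struct dpb_entry){
		.surface_id = surface_id,
		.age = dpb->current_age,
		.valid = true,
	};
	return slot;
}

/* One decode step: refresh the ages of the references libva passed in,
 * then store the current picture. Returns the slot it was given. */
static size_t dpb_decode_picture(struct dpb_cache *dpb, uint32_t surface_id,
				 const uint32_t *refs, size_t num_refs)
{
	dpb->current_age++;

	for (size_t i = 0; i < num_refs; i++) {
		struct dpb_entry *ref = dpb_lookup(dpb, refs[i]);
		if (ref)
			ref->age = dpb->current_age;
	}

	return dpb_insert(dpb, surface_id);
}
```

Because frames that libva stops listing as references never have their age refreshed, they drift to the bottom and are the first to be evicted. This gives the garbage-collection behaviour the commit message describes without tracking H.264 reference marking directly. Ties are broken toward the lowest slot index, and counter wraparound is ignored for brevity.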