libva-v4l2-request-fourier

Author	SHA1	Message	Date
marfrit	c454618ae1	Merge pull request 'picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun (#15 )' (#16 ) from claude-noether/libva-v4l2-request-fourier:noether/output-pool-resize-issue-15 into master Reviewed-on: #16	2026-05-21 11:23:08 +00:00
claude-noether	5939ac6ae0	picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun Follow-up to #13 (PR #14, bounds-check floor). When a stream-level resolution upshift mid-session pushes an Annex-B start code / VP8 header pad / slice payload past the OUTPUT pool slot's mmap, the bounds check used to return VA_STATUS_ERROR_ALLOCATION_FAILED and force the libva consumer to recreate the surface (losing the frame). This patch absorbs the resize transparently: 1. codec_store_buffer's three append sites call a new codec_store_buffer_ensure_capacity() before each memcpy/memset. 2. On overflow, ensure_capacity snapshots the in-flight surface's accumulated bytes, temporarily releases its OUTPUT pool slot, and calls request_pool_resize. 3. request_pool_resize STREAMOFFs the OUTPUT queue, munmaps every slot, closes every per-slot media-request fd, REQBUFS(0)s the V4L2 buffers, re-issues S_FMT with a sizeimage hint = 2× the required total (capped at 1 GiB, rounded up to a 4 KiB page), CREATE_BUFSes the original slot count, per-slot queries + mmaps + media_request_allocs, and STREAMONs. 4. ensure_capacity re-acquires a pool slot, re-mirrors source_{index,data,size,request_fd} onto the surface, and restores the saved bytes via memcpy. The cached S_FMT params (pixelformat, picture_width, picture_height) are stashed on the request_pool at init time so the resize is fully self-contained — caller passes only the new sizeimage hint. A new v4l2_set_format_sizeimage() helper accepts an explicit sizeimage override; v4l2_set_format keeps the SOURCE_SIZE_MAX (1 MiB) default for CreateContext-time S_FMT. The pre-condition for the resize is "no pool slot may be borrowed." The inline-Sync-in-EndPicture pattern (RequestEndPicture calls RequestSyncSurface before returning) guarantees that during codec_store_buffer, the only borrowed slot is the current render_surface_id's — which the resize trigger explicitly releases before invoking the pool function. request_pool_resize asserts the invariant via a busy-scan and bails loudly if anyone breaks it rather than corrupting in-flight V4L2 state. On resize failure: re-acquire the just-released slot (it was a clean busy=false flip; the resize aborted before tearing it down in the common case, or zeroed its mmap fields in the late-abort case — either way the re-acquire keeps surface_object's mirror internally consistent) and surface the original VA_STATUS_ERROR_ALLOCATION_FAILED so libva clients fall back to surface recreation as before this patch. CAPTURE side is untouched — the V4L2 stateless API treats per-queue streaming independently, so STREAMOFF/STREAMON on OUTPUT does not disrupt the CAPTURE queue, and a resolution-upshift CAPTURE budget mismatch becomes a clean V4L2_BUF_FLAG_ERROR on the next DQBUF (handled by the existing surface error path). Closes marfrit/libva-v4l2-request-fourier#15.	2026-05-21 13:11:55 +02:00
marfrit	2860d75afe	Merge pull request 'picture: bounds-check codec_store_buffer slice writes against source_size (#13 )' (#14 ) from claude-noether/libva-v4l2-request-fourier:noether/codec-store-buffer-bounds-check-13 into master Reviewed-on: #14	2026-05-21 10:17:15 +00:00
claude-noether	bfcb286031	picture: bounds-check codec_store_buffer slice writes against source_size surface_object->source_data points at an OUTPUT-pool mmap of fixed size source_size, negotiated by v4l2_query_buffer at request_pool_init time (kernel sizeimage at S_FMT). codec_store_buffer's VASliceDataBufferType branch appended to it at three sites (H.264 Annex-B start code, VP8 uncompressed-header pad, slice payload) without consulting that capacity — a stream-level resolution upshift would walk past the mmap and SIGSEGV inside the memcpy (mpv --hwdec=vaapi-copy on the daedalus path, issue #13) or corrupt adjacent heap (Firefox RDD). Add a check at each append site that fails the RenderPicture call with VA_STATUS_ERROR_ALLOCATION_FAILED when slices_size+payload exceeds source_size, and logs the over-budget request for postmortem. libavcodec recreates the surface at the new dimensions on the next BeginPicture, so a refused upshift slice is recoverable. Doesn't address the root cause (surfaces should be re-created on resolution change, or source_data should be grown on demand) but removes the memory-safety hazard while the larger refactor waits. Closes marfrit/libva-v4l2-request-fourier#13.	2026-05-21 12:14:48 +02:00
marfrit	77f9236466	Merge pull request 'av1: populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls (#11 libva side)' (#12 ) from claude-noether/libva-v4l2-request-fourier:noether/av1-set-controls-bug-11 into master Reviewed-on: #12	2026-05-20 19:14:49 +00:00
claude-noether	9fa18f2312	av1: populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls Implements the libva-side portion of issue #11 — replaces PR #10's no-op AV1 dispatch with a real av1_set_controls that maps VAAPI's VADecPictureParameterBufferAV1.seq_info_fields + scalar fields onto struct v4l2_ctrl_av1_sequence (the kernel uAPI control declared at linux/v4l2-controls.h:2891-2919). Daemon-track context (issue #11 daemon side, operator-owned): ffmpeg-vaapi splits the AV1 bitstream client-side and strips the OBU_SEQUENCE_HEADER before delivery; the V4L2 OUTPUT buffer contains only OBU_FRAME_HEADER + OBU_TILE_GROUP. libdav1d in the daedalus daemon cannot parse this — it expects a complete OBU stream. The daemon side has to synthesise OBU_SEQUENCE_HEADER from the SEQUENCE ctrl and prepend it to the slice bitstream. This libva-side change just makes the SEQUENCE ctrl populated and queued via S_EXT_CTRLS; the daemon track is the consumer. Three small touch points beyond the new src/av1.{c,h}: - src/surface.h: add an av1 leaf to surface->params holding VADecPictureParameterBufferAV1. Slice params intentionally absent — the daedalus daemon consumes the slice OBU bytes directly from the OUTPUT buffer; no per-tile-group struct → OBU re-synthesis required from libva today. - src/picture.c: copy the picture-param buffer into the new leaf in RenderPicture, mirror of the per-codec memcpy pattern, plus call av1_set_controls from codec_set_controls (replacing the no-op). - src/meson.build: register src/av1.c. Sequence-field mapping covers everything VAAPI exposes at the sequence level (12 of 18 V4L2_AV1_SEQUENCE_FLAG_* bits + the four scalars). Bits VAAPI doesn't carry at the sequence level (WARPED_MOTION, REF_FRAME_MVS, SUPERRES, RESTORATION, SEPARATE_UV_DELTA_Q) stay clear; per-frame consumers (libdav1d via the daemon, vpu981 via the hardware path) read those from the OBU_FRAME_HEADER that is already in the slice buffer anyway. See feedback memory `feedback_vaapi_blind_to_some_hevc_sps_fields` for the precedent. Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0, linux uAPI v4l2-controls.h sizeof(struct v4l2_ctrl_av1_sequence)==12): clean meson + ninja link of v4l2_request_drv_video.so, vainfo enumerates VAProfileAV1Profile0 via daedalus_v4l2 slot, av1_set_controls symbol present. Out of scope on this PR (operator-track, issue #11 follow-up): - daedalus-v4l2 kernel module wire-protocol extension (daedalus_ collect_av1_meta + AV1 ctrl request_setup). - daedalus daemon OBU synthesiser (~400 LoC AV1 OBU encoder in daemon/src/av1_obu_synth.{c,h}). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 21:13:07 +02:00
marfrit	9a9cfd05db	Merge pull request 'picture: no-op codec_set_controls case for VAProfileAV1Profile0' (#10 ) from noether/picture-av1-noop into master Reviewed-on: #10	2026-05-20 19:07:12 +00:00
marfrit	96d70af674	picture: no-op codec_set_controls case for VAProfileAV1Profile0 picture.c's codec_set_controls() switch was falling through to the default case for VAProfileAV1Profile0, returning VA_STATUS_ERROR_UNSUPPORTED_PROFILE. Result: vaEndPicture failed with status 12 ("requested VAProfile is not supported"), no OUTPUT buffer ever got queued, and the daedalus_v4l2 daemon never saw a REQ_DECODE for AV1. config.c's VAProfileAV1Profile0 case (line 84-93) explicitly notes "Decode-side ctrl dispatch (V4L2_CID_STATELESS_AV1_) is NOT YET WIRED on master — vainfo will list the profile + CreateConfig succeeds, but consumers that submit decode buffers hit a NOP path". The NOP path was never actually wired in picture.c — it hit the default UNSUPPORTED_PROFILE branch instead. Fix: add a VAProfileAV1Profile0 case that just `break;`s through without setting V4L2 controls. For the daedalus_v4l2 daemon path this is exactly the right shape — AV1 frame data is self-describing per OBU stream (no separate SPS/PPS controls needed at the V4L2 boundary), so the OUTPUT buffer alone is sufficient for the kernel to forward to the daemon. Verified on higgs: ffmpeg -hwaccel vaapi -i av1.mkv now actually queues frames to /dev/video2 and the daemon's libdav1d context opens. Decode itself still fails (libdav1d wants the AV1 sequence header OBU, which ffmpeg-vaapi sends via VAPictureParameterBufferAV1 not via the slice buffer) — separate issue, needs an OBU sequence-header synthesiser in the daedalus daemon (analogous to the new H.264 SPS/PPS NAL synth in daedalus-v4l2/daemon/src/h264_nal_synth.c). That sequence-header synth work is a substantial follow-up; this patch unblocks AV1 reaching the daemon at all. For RK3588 vpu981 (the originally-planned AV1 target), this remains a true NO-OP — when V4L2_CID_STATELESS_AV1_ dispatch lands from the av1-iter1 operator branch, replace the no-op with av1_set_controls(...). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 20:58:57 +02:00
marfrit	c1bb444d07	Merge pull request 'h264: max_num_ref_frames fallback + libva-boundary instrumentation (#8 )' (#9 ) from claude-noether/libva-v4l2-request-fourier:noether/h264-3-set-controls-bitstream-bug-8 into master Reviewed-on: #9	2026-05-20 18:19:03 +00:00
claude-noether	0791f8e612	h264: max_num_ref_frames fallback + libva-boundary instrumentation Closes the libva-side portion of marfrit/libva-v4l2-request-fourier#8. Two small additions to h264_set_controls: 1. When VAPicture->num_ref_frames is 0 (older ffmpeg-vaapi paths / some daedalus_v4l2 consumers), count valid (non-INVALID) DPB entries in ReferenceFrames[16]. If even that returns 0, fall back to a per-profile spec minimum (1 for baseline, 4 for main/high). Hardware decoders (rkvdec, hantro, rpi-hevc-dec) tolerated the prior 0; libavcodec-via-daedalus enforces sps.max_num_ref_frames strictly and rejected every frame. 2. One request_log line at function entry dumping the raw VAAPI fields (seq_fields.value, pic_fields.value, num_ref_frames, bit_depth_, picture__in_mbs_minus1). Disambiguates "ffmpeg-vaapi never populated" from "daedalus_v4l2 wire protocol corrupted" for the bit-fields-read-as-zero portion of issue #8. Out of scope here (separate issue if pursued): profile_idc and level_idc remain session-derived. VAAPI's VAPictureParameterBufferH264 omits both (verified higgs libva 2.22.0-3, /usr/include/va/va.h: 3571-3622) — same VAAPI-blindspot family as the HEVC SPS fields. A real fix requires SPS-NAL parsing from surface->source_data OR a daedalus wire-protocol pass-through; both are operator design calls, not a libva-only patch. Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0): clean ninja link of v4l2_request_drv_video.so, vainfo enumerates all 8 codec profiles, no init regression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 20:17:27 +02:00
marfrit	989833114a	Merge pull request 'config: include video_fd_daedalus in profile enumeration probe' (#7 ) from claude-noether/libva-v4l2-request-fourier:noether/libva-2-config-profile-enum-daedalus into master Reviewed-on: #7	2026-05-20 14:52:11 +00:00
marfrit	d1ba4625d2	config: include video_fd_daedalus in profile enumeration probe LIBVA-2 follow-up. RequestQueryConfigProfiles walks each known decoder fd via any_fd_supports_output_format() and adds a VAProfile* for each codec OUTPUT format the V4L2 device advertises. The fd list missed video_fd_daedalus — so on a Pi 5 with rpi-hevc-dec primary + daedalus_v4l2 alt, only S265 (HEVC) was probed and the H.264 / VP9 / AV1 profiles never got enumerated. Effect on higgs: ffmpeg -hwaccel vaapi -i h264_test.mp4 reported "No support for codec h264 profile 578" before the per-codec dispatch in request_switch_device_for_profile could fire — the profile-578 (H264 Constrained Baseline) check happened during hwaccel init, found nothing in the libva profile list, and bailed without ever calling into the daedalus path. Fix: extend the fds[] array in any_fd_supports_output_format from 5 to 6 entries, with the sixth being video_fd_daedalus when HAVE_DAEDALUS_V4L2 is on (and -1 otherwise so it's skipped by the `if (fds[i] < 0) continue;` guard). After the fix, daedalus_v4l2's OUTPUT format menu (VP9F + AV1F + S264) gets seen, and Request- QueryConfigProfiles returns VP9Profile0 + AV1Profile0 + the H264* profiles, all of which then route through the LIBVA-1 'd' kind override in request_switch_device_for_profile. Verified on higgs: Before: vainfo: Supported profile and entrypoints VAProfileHEVCMain : VAEntrypointVLD (only HEVC; H264/VP9/AV1 not enumerated) ffmpeg vaapi -i h264 → "No support for codec h264 profile 578" Build clean on boltzmann (only config.c.o + request.c.o recompile). Backward-compatible on RK3399/3588 — the new slot is gated by HAVE_DAEDALUS_V4L2 and video_fd_daedalus >= 0; both stay false in those deployments. Existing 5-fd probe order unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 16:45:33 +02:00
claude-noether	c332d34643	Merge pull request 'request: route VP9/AV1/H.264 to daedalus_v4l2 on Pi 5 mixed deploy' (#6 ) from claude-noether/libva-v4l2-request-fourier:noether/libva-1-per-codec-dispatch into master	2026-05-20 08:53:04 +00:00
marfrit	6173a8da8e	request: route VP9/AV1/H.264 to daedalus_v4l2 on Pi 5 mixed deploy LIBVA-1 — when both rpi-hevc-dec and daedalus_v4l2 are loaded, finish the per-codec dispatch so HEVC goes to rpi-hevc-dec (existing 'p' override) and VP9 / AV1 / H.264 go to the daedalus daemon ('d'). Before this change the multi-device-probe accepted only ONE driver plus a fixed alt slot (rkvdec↔hantro-vpu); on a Pi 5 with both decoders the find_codec_device() walk preferred rpi-hevc-dec by known_decoder_ drivers[] order and never opened daedalus_v4l2, so VP9/AV1/H.264 frames hit rpi-hevc-dec's S_FMT and failed. Changes: - request.c multi-device-probe: when primary = rpi-hevc-dec, alt = daedalus_v4l2 (when HAVE_DAEDALUS_V4L2 is on); symmetric handling in the daedalus_v4l2 primary branch so alt = rpi-hevc-dec. This preserves the iter40 fallback (no daedalus → alt = NULL) when the build option is off. - request.c alt-driver opening block: generalized from the iter38 rkvdec/hantro pair to also dispatch into video_fd_rpi_hevc_dec and video_fd_daedalus slots. Defensive close on unknown alt-driver name (shouldn't happen — primary_driver branches gate the choices — but keeps the slot tally clean if a future driver name is added above without wiring up the dispatch here). - request_switch_device_for_profile: added 'd' kind handler + profile override block. When daedalus is open, VP9 / AV1 / H.264* route to it. HEVC stays on rpi-hevc-dec via the existing 'p' override. AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was probed, so the override only fires on hosts where vpu981 stayed -1 (i.e. Pi 5). - RequestTerminate: close the daedalus_v4l2 fd pair on teardown (was leaking — caught while reviewing the alt-driver expansion). Build: meson + ninja clean on boltzmann (only pre-existing GStreamer H265 parser noise). Behaviour on RK3399/3588 boxes unchanged — the new branches are gated by HAVE_DAEDALUS_V4L2 and video_fd_daedalus ≥ 0, both of which stay false in those deployments. Companion to daedalus-v4l2 481279c (Phase 8.13 systemd unit) and marfrit-packages noether/daedalus-v4l2-kernel-6.18-compat branch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 10:41:18 +02:00
marfrit	de27e95571	v4l2: log error_idx + failing ctrl id on S_EXT_CTRLS failure Better diagnostic when VIDIOC_S_EXT_CTRLS returns < 0: read back error_idx and print which control id rejected (or "ioctl-level" when error_idx == count, meaning the rejection was generic, not per-control). Made it possible to triage the daedalus_v4l2 phase 8.13 issue by separating "the actual stateless control failed" (would show failing_ctrl_id=0xa40a2c VP9_FRAME) from "libva probing H264/HEVC profile/level we don't expose" (failing_ctrl_id= 0xa40900 H264_PROFILE etc.) — the latter is harmless on a VP9-only context. Before: v4l2-request: Unable to set control(s): Invalid argument After (per-control): v4l2-request: Unable to set control(s): Invalid argument (error_idx=0/2 failing_ctrl_id=0xa40900 size=0) After (ioctl-level): v4l2-request: Unable to set control(s): Invalid argument (error_idx=2/2 ioctl-level) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:14:50 +00:00
marfrit	2146341460	daedalus_v4l2: meson option gate (default true) Adds a build-time switch so platforms that will never see a daedalus_v4l2 kernel module (Allwinner cedrus, RK without the shim, etc.) can opt out of the probe entry + dispatch branch. meson setup build # daedalus support on meson setup build-off -Ddaedalus_v4l2=false # off Implementation: - meson_options.txt: new boolean `daedalus_v4l2`, default true. - src/meson.build: when option is true, autoconfig.h gets `#define HAVE_DAEDALUS_V4L2 1`. - src/request.c: known_decoder_drivers[] entry, primary-driver detection branch, and post-probe log line all gated by #ifdef HAVE_DAEDALUS_V4L2. - src/request.h: struct daedalus fields kept UNCONDITIONAL. Two extra int per session and the struct layout stays stable across translation units regardless of option — avoids the ODR risk of every consumer of request.h needing to include autoconfig.h before request.h. Verified on hertz: both builds compile clean. build/src/autoconfig.h has HAVE_DAEDALUS_V4L2; .so contains "daedalus_v4l2" string + log message. build-off/src/autoconfig.h doesn't; .so contains no daedalus strings at all. Default-on build still passes vainfo end-to-end: vainfo: Driver version: v4l2-request vainfo: Supported profile and entrypoints VAProfileH264Main / High / ConstrainedBaseline / MultiviewHigh / StereoHigh : VAEntrypointVLD VAProfileVP9Profile0 : VAEntrypointVLD VAProfileAV1Profile0 : VAEntrypointVLD Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:41:17 +00:00
marfrit	b5b3acf0f7	daedalus_v4l2: add to known_decoder_drivers + multi-device-probe slot Phase 8.10 of the daedalus-v4l2 sibling campaign — out-of-tree V4L2 stateless decoder shim that forwards bitstream to a userspace daemon (FFmpeg-software decode for VP9 / AV1 / H.264; pixels back via dmabuf into the CAPTURE buffer). Adds the same iter40-shaped wiring as rpi-hevc-dec: - known_decoder_drivers[] entry "daedalus_v4l2" - video_fd_daedalus + media_fd_daedalus slots in driver_data - -1 init alongside the other multi-device slots - primary-driver detection branch in the auto-probe block - post-probe log line for symmetry with iter40 No per-profile dispatch changes needed — daedalus_v4l2 advertises the standard V4L2_PIX_FMT_{VP9_FRAME,AV1_FRAME,H264_SLICE} OUTPUT fourccs the fork's existing per-driver paths already handle. Verified on hertz (Pi 5 / CM5, 6.12.75+rpt-rpi-2712) with the daedalus_v4l2 module loaded: LIBVA_DRIVER_NAME=v4l2_request \ LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \ LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \ vainfo --display drm --device /dev/dri/renderD128 v4l2-request: opened daedalus_v4l2 at video_fd=... media_fd=... (Pi 5 daemon-backed VP9/AV1/H264) vainfo: Driver version: v4l2-request vainfo: Supported profile and entrypoints VAProfileH264Main : VAEntrypointVLD VAProfileH264High : VAEntrypointVLD VAProfileH264ConstrainedBaseline: VAEntrypointVLD VAProfileH264MultiviewHigh : VAEntrypointVLD VAProfileH264StereoHigh : VAEntrypointVLD VAProfileVP9Profile0 : VAEntrypointVLD VAProfileAV1Profile0 : VAEntrypointVLD Without the env override the auto-probe still picks rpi-hevc-dec first (it's earlier in known_decoder_drivers[]); on the standalone daedalus_v4l2 path the daemon-backed decode is what answers S_FMT/QBUF/DQBUF. On a mixed-driver Pi 5 box where both modules are loaded, HEVC continues to route through rpi-hevc-dec via the existing 'p' override; VP9/AV1/H264 would prefer daedalus_v4l2 since rpi-hevc-dec is HEVC-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:37:53 +00:00
marfrit	820557268b	Merge PR #5 : ampere-av1 Phase 2 (master) — fourth-fd probe + AV1 enumeration	2026-05-18 13:47:56 +00:00
claude-noether	c6f81c653f	ampere-av1 Phase 2 (master): fourth-fd probe + AV1 enumeration Imports the minimal "vainfo lists VAProfileAV1Profile0" layer from the operator's in-progress av1-iter1 branch (Phase 2 steps 1, 2 — commits `bed75c0` + `61db76e` on av1-iter1). The Phase 3-5 bit-exact decode-side work stays in av1-iter1; this commit gives master the enumeration + fd-routing layer so consumers (ffmpeg-vaapi, firefox-fourier, chromium- fourier) at least see VAProfileAV1Profile0 today on RK3588. What this commit adds: - video_fd_vpu981 + media_fd_vpu981 slots to struct request_data (named to match av1-iter1's convention so the operator's Phase 3-5 merge resolves cleanly) - 4th-decoder probe loop in VA_DRIVER_INIT that walks hantro-vpu media nodes for an instance advertising V4L2_PIX_FMT_AV1_FRAME (AV1F) as OUTPUT pixfmt. RK3588 has 3 hantro-vpu instances all reporting driver="hantro-vpu" + model="hantro-vpu", so OUTPUT- format probe is the only DTS-independent discriminator. - 'a' kind in request_device_kind_for_profile (VAProfileAV1Profile0) + 'a' branch in request_switch_device_for_profile. - video_fd_vpu981 added to any_fd_supports_output_format helper (existing 3-slot loop missed the new fd; same off-by-one trap that bit ampere's av1-iter1 enumeration for a week). - VAProfileAV1Profile0 → V4L2_PIX_FMT_AV1_FRAME in pixelformat_for _profile. - VAProfileAV1Profile0 push in RequestQueryConfigProfiles + RequestQueryConfigEntrypoints + RequestCreateConfig switch. - vpu981 fd cleanup in RequestTerminate. - rpi_hevc_dec fd cleanup added at the same time (was already missing in master — fixed defensively). - V4L2_REQUEST_MAX_PROFILES bumped 13 → 14. Defensively sized for the post-Option-B-revert future: with iter39 Option B reverted (Hi10P + Main10 back in enumeration) plus AV1, max possible enumeration is 13. The per-group guards use `index < MAX - N` pattern; for a singleton push to succeed at index=13 we need MAX >= 14. Bumping now avoids the same off-by-one bug from silently dropping AV1 when Option B eventually reverts. What this commit does NOT add: - av1.{c,h} decode-side scaffolding (Phase 2 step 4 on av1-iter1 — ~177 LoC including a stub av1_set_controls that returns -1). When the operator's av1-iter1 Phase 3-5 work lands on master, those 500+ LoC + the stub will follow. Without them, consumers calling vaCreateContext(VAProfileAV1Profile0) succeed at the libva layer but ffmpeg-vaapi will fail at the first vaRenderPicture with an AV1-buffer-type rejection — clean error, no crash. Verified 2026-05-18 on ampere: $ env LIBVA_DRIVER_NAME=v4l2_request vainfo \| grep VAProfile ... (10 prior profiles, unchanged) ... VAProfileAV1Profile0 : VAEntrypointVLD ✓ Probe log: "ampere-av1: vpu981 AV1 decoder at /dev/video4 + /dev/media3" Build clean on ampere with GCC 16.1.1; no warnings introduced. ampere's running module restored to the av1-iter1 build after the verification — this commit's .so was NOT permanently installed. Closes the headline acceptance criterion in marfrit/libva-v4l2-request-fourier#2 ("vainfo on ampere lists VAProfileAV1"). End-to-end AV1 decode bit-exactness is iter4 work that the av1-iter1 branch continues to drive. Co-Authored-By: claude-noether <claude-noether@reauktion.de>	2026-05-18 13:45:04 +00:00
claude-noether	9bb5a5a722	README: ffmpeg-v4l2-request-fourier flipped to published Build + publish landed (2:8.1.r123329.b57fbbe-3, Kwiboo's v4l2-request-n8.1 tip + libudev-bypass companion patch). Deploy-host verified on fresnel: installs cleanly, ffmpeg buildconf shows --enable-v4l2-request, hwaccels list includes 'v4l2request', HEVC decode via -hwaccel v4l2request produces correct-size output. Quickstart per-host pacman -S lines now include ffmpeg-v4l2-request-fourier. Status table flipped its row from pending to published. Remaining pending: chromium-fourier (clang 22 -> 23 blocker), qt6-base-fourier (Wayland GL_ALPHA fix). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 21:01:04 +00:00
claude-noether	0182307403	README: add Quickstart section with per-host install + full stack matrix The TL;DR of 'what packages do I install to watch YouTube on my Rockchip board with HW acceleration in Firefox' wasn't reachable from this README without reading three other repos' commit histories. Fixed. Now landed at the top: - Stack matrix: kernel (linux-{fresnel,ampere}-fourier) -> ffmpeg (ffmpeg-v4l2-request-fourier) -> libva (libva-v4l2-request-fourier) -> browser (firefox-fourier or chromium-fourier + kwin-fourier on Wayland). - Honest acknowledgement that the browser HW path is libavcodec hwdevice DRM, not VAAPI-via-libva. This backend matters for mpv / ffmpeg-as-vaapi consumers. - Per-host pacman -S incantations for fresnel (RK3399), ampere (RK3588), ohm (RK3566). - Live marfrit repo URL + signing-key import flow. - Smoke-test commands (vainfo + MOZ_LOG patterns). - Honest status flag: ffmpeg-v4l2-request-fourier, chromium-fourier, qt6-base-fourier exist in marfrit-packages source tree but NOT yet in the live repo. Users building those locally now. - RK3588 mainline (Feb 2026) called out alongside ampere row. What hasn't changed: Pi 5 standoff section, technical notes, existing iter39 / iter40 status tables. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 20:48:53 +00:00
claude-noether	941fbc5b1b	README: candid 'standoff' framing for Pi 5 HEVC + RK matrix Replace the original 2018 Bootlin upstream README with the fourier-fork situation as of May 2026. What works: fresnel 5/5, ampere iter1+2, ohm baseline (all RK family, mainline VDPU381/383 landing Feb 2026 helps). What doesn't: Pi 5 HEVC via this backend. New 'The Pi 5 standoff' section captures the honest situation surfaced by the May 2026 web-research pass: - Kwiboo's ffmpeg-v4l2request hwaccel: 8 years un-merged upstream - libva-v4l2-request: no commits since ~2021 - rpi-hevc-dec mainline: 17 months in review, still not merged; Pi 6.18.x downstream has active HEVC regressions (#7228, #7306) - Mozilla bug 1969297 picks the ffmpeg-hwaccel-context path, not libva — explicit ack that strict drivers need libavcodec's internal SPS context - Frames the issue as ecosystem coordination failure (principal- agent stalemate), not architectural impossibility Notes that iter40 + iter40b lands but parks: backend infra is sound + reusable for any future strict V4L2 stateless target ffmpeg ships before libva does, but the user-facing Pi 5 HEVC story will not come from this backend — it'll come from Mozilla / Kwiboo / upstream coordination unblocking. iter38 5/5 fresnel + 9-profile ampere baselines preserved post-iter40b — documented as no-regression in phase7_pi5_hevc_close. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:58:52 +00:00
claude-noether	071b08dcc2	iter40b: SPS-parse fix lands but bit-exact still blocked upstream Per-driver gate added: when rpi-hevc-dec active, parse SPS NAL from surface_object->source_data via the iter2 vendored GStreamer parser and override the VAAPI-omitted v4l2_ctrl_hevc_sps fields (sps_max_num_reorder_pics, sps_max_latency_increase_plus1, sps_max_sub_layers_minus1, max_dec_pic_buffering_minus1[HighestTid]). Cached at driver_data->hevc_sps_field_cache. Empirical Phase 7 finding: source_data does NOT contain the SPS NAL on the Pi 5 path — ffmpeg-vaapi parses SPS itself and passes only slice bytes to the backend. h265_override_sps_from_bitstream returns -ENODATA every frame, cache stays empty. Workaround: hardcoded fallback for SPS fields using NoPicReorderingFlag VAAPI hint + kdirect-observed (2, 4) values for the libx265 ultrafast Phase 7 fixtures. Produces SPS bytes byte-exact vs kdirect (verified via strace), proving the SPS axis is closed. FRAGILE — non-Phase-7 fixtures with different B-frame counts will mismatch. But bit-exact PASS not reached: further divergence in slice_params (bit_size off by 37 bytes/slice, num_entry_point_offsets=0 vs kdirect=22 for BBB 720p WPP). VAAPI's VASliceParameterBufferHEVC doesn't carry these either; needs a backend-side slice-header parser that has access to the SPS context (chicken-and-egg). Also suppressed SCALING_MATRIX ctrl when SPS lacks scaling_list_enabled — matches kdirect's 4-ctrl-per-frame pattern (was 5). Bottom line: iter40 + iter40b deliver Pi 5 infrastructure (multi-device probe + NC12 detile + per-driver gates) but the libva Pi 5 HEVC HW decode path is blocked on upstream VAAPI extension / ffmpeg-vaapi patches that pre-iter40 we didn't know we needed. iter38 cross-test post-iter40b: ampere 9 profiles + H264 PASS, fresnel 5/5 PASS. No sibling regression. Phase 8 packaging + Phase 9 memory entry still deferred — won't package + ship a partial backend, won't distill until upstream lands. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:45:43 +00:00
claude-noether	9037934b21	phase7_pi5_hevc_close: iter40 partial — backend integration works, decode rejected by rpi-hevc-dec C1 vainfo PASS, C3 HW engagement PASS, C6 decode-correctness FAIL (V4L2_BUF_FLAG_ERROR on every CAPTURE DQBUF). Root cause empirically located: SPS sps_max_num_reorder_pics + sps_max_latency_increase_plus1 fields. Our backend uses a spec-legal fallback (sps_max_dec_pic_buffering_minus1, 0) because VAAPI doesn't forward these fields; rkvdec accepts it, rpi-hevc-dec validates against bitstream-true values and rejects. Real fix needs SPS NAL parse via the iter2 vendored GStreamer parser to populate bitstream-true values for the V4L2 SPS ctrl. Estimated 1 more 8(+1)-phase loop (iter40b). Phase 8 + Phase 9 deferred — won't package + deploy + ship a broken backend; won't distill lessons until the real fix lands. Sibling iter38 baseline NOT yet re-verified on fresnel + ampere post-iter40. Code paths gated on video_fd_rpi_hevc_dec >= 0 stay no-op on non-Pi hosts; only __arm__ → __aarch64__ guard change is globally observable but its is_10bit sub-gate stays dormant on 8-bit fixtures. Verify before declaring no-regression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:18:16 +00:00
claude-noether	3ffa9d0d17	iter40: Pi 5 HEVC chapter — backend integration lands, bit-exact pending Phase 6 implementation. Backend builds clean on higgs (Debian 13 trixie, aarch64), vainfo lists VAProfileHEVCMain via rpi-hevc-dec, multi-device probe finds /dev/video19 + /dev/media1, CreateContext + S_FMT + REQBUFS + STREAMON all succeed. Phase 7 partial: infrastructure works, 10 frames flow through the pipeline (correct byte counts produced — 13824000 for 1280x720 x 10 NV12 frames). But every DQBUF CAPTURE returns V4L2_BUF_FLAG_ERROR so output content is wrong (libva sha != kdirect sha). The decode itself is failing on the rpi-hevc-dec side despite all ctrl submissions returning success. Code changes: - request.h: video_fd_rpi_hevc_dec / media_fd_rpi_hevc_dec slots + has_hevc_ext_sps_rps_rpi_hevc_dec flag (mirrors iter38 + iter2 pair-of-flags pattern, naturally false on Pi). - request.c: known_decoder_drivers gains rpi-hevc-dec; primary-driver probe gets an else-if branch setting the new fds (Phase 5 F3); request_switch_device_for_profile prefers 'p' for HEVC when rpi-hevc-dec present. - context.c: per-fd want_pixfmt (NC12 on Pi), capture_pixelformat taken from video_format slot (not hardcoded NV12/NV15); synthetic-SPS pre-seed gated off for Pi (Phase 5 F6); destination_sizes uses nv12_col128_uv_plane_offset for NC12 SAND layout (Phase 5 F2); per-driver HEVC_START_CODE (NONE on Pi, ANNEX_B on RK); per-driver context_object->h264_start_code (skip prepend on Pi). - video.c: NV12_COL128 video_format entry (8-bit SAND, single buffer, 2 planes, NV12 drm_format with MOD_NONE so detile branch fires rather than tiled_to_planar). - nv12_col128.c/.h: detile primitive (Y + UV per-plane, kernel hevc_d_video.c bytesperline formula + ffmpeg/Kynesim per-pixel offset). UV plane offset = 128 * ALIGN(h, 8) — within-column (SAND interleaves Y+UV per column, NOT plane-concatenated; earlier wrong formula caught by Phase 7 SEGV). - image.c: #ifdef __arm__ extended to __arm__ \|\| __aarch64__ (Phase 5 F1 — guard was killing detile path on all aarch64 hosts including fresnel iter39 NV15 path, masked because 10-bit never exercised); RequestCreateImage NC12 → NV12 stride override (linear width, not column-stride); copy_surface_to_image NC12 detile branch (gates on fourcc + v4l2_format). - nv15.h: fallback V4L2_PIX_FMT_NV15 define (Debian 13 headers omit it though they have NC12). - nv12_col128.h: fallback V4L2_PIX_FMT_NV12_COL128 + V4L2_PIX_FMT_NV12_10_COL128 (Arch / mainline pre-Pi headers). - tests/test_nv12_col128_detile.c: hand-crafted-bytes unit test; passes (8 cases: Y + UV for 4 widths incl. 1366 misaligned; UV-offset helper). - meson.build / nv12_col128 sources listed. Phase 7 status: not yet bit-exact. Remaining diagnosis: per-frame S_EXT_CTRLS payload diff vs kdirect (kdirect sends 4 ctrls SPS+PPS+decode_params+slice_array; ours sends 5 incl. scaling_matrix; field ordering differs). Likely the slice_array contents need per-driver handling for rpi-hevc-dec's expected layout. Beyond in-session reach. iter38 5/5 baseline on fresnel + ampere should be unaffected (new fd stays -1 on non-Pi hosts; all gates either short-circuit on fd-not-present or no-op). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:17:14 +00:00
claude-noether	f1be489c75	phase5_pi5_hevc_review: 3 critical findings empirically verified, 1 fixture gap Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically verified each finding against current source per feedback_review_empirical_over_theoretical BEFORE accepting: F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS masked this because 10-bit path was never exercised. Fix: extend guard to __aarch64__. F2 (CRITICAL): destination_bytesperlines for NC12 source returns column-stride (1080) not linear-NV12 Y stride (1280). VAImage consumers see wrong pitch. Fix: override in RequestCreateImage when src=NC12, dst image=NV12. F3 (CRITICAL): request.c primary-driver detection has else-if branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec primary), neither matches → new fd pair stays -1 → routing no-ops. Fix: add explicit rpi-hevc-dec branch. F4 (accepted): add 1366x768 fixture to exercise column padding. F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify empirically in Phase 7. F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec which doesn't need it AND uses different submission order. Fix: gate on driver_data->video_fd != video_fd_rpi_hevc_dec. F7/F8 (no findings): image.c gate predicate sound; cross-device regression scope clean. Amended Phase 6 step list with 3 new gating actions. Phase 7 verification expanded with empirical START_CODE check + 1366 fixture. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:04:28 +00:00
claude-noether	bf52725ab3	phase1_pi5_hevc: lock goal + situation + N=3 baseline + plan (iter40) Phase 1 measurable goal: HEVC Main 8-bit bit-exact libva-vs-kdirect on higgs for 640x360 / 1280x720 / 1920x1080 fixtures with HW path engagement verified via lsof + ffmpeg-vaapi log signal. Phase 2 surface-area audit: ~250 LoC backend + 100 LoC standalone detile primitive. Reuses iter38 multi-device-probe pattern (now 3 slots: rkvdec + hantro + rpi-hevc-dec) + iter2 per-driver gating shape. h265_set_controls + iter31 a-29 plumbing transfers unchanged. iter25 SPS pre-seed gated off for rpi-hevc-dec. Phase 3 baseline locked: N=3 bit-exact SW==kdirect for all three fixtures on higgs. kdirect engagement signal: Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8 Phase 4 plan: 7 sequenced steps (request.h -> request.c -> video.c -> nv12_col128.c new -> image.c branch -> meson/Makefile -> build on higgs). NC12 tile geometry locked from kernel hevc_d_video.c math + ffmpeg/Kynesim av_rpi_sand_to_planar_y8 byte-offset formula. Risks + mitigations enumerated. Phase 5 sonnet review explicitly requested per CLAUDE.md no-skip-reviews rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:00:35 +00:00
claude-noether	b6a65fc692	phase0_pi5_hevc: close addendum with empirical higgs probe data Live probe of rpi-hevc-dec on higgs (Pi CM5, kernel 6.12.75-rpt-rpi-2712, Debian 13 trixie) answers Phase 0 open questions Q1, Q2, Q5, Q6 empirically; Q3 partial; Q4 still open. Q1 (EXT_SPS): NOT present. Only standard V4L2_CID_STATELESS_HEVC_*. Probe ctrl id 0xa97 returns EINVAL — same gate iter2's has_hevc_ext_sps_rps_rkvdec uses. iter31 alpha-29 plumbing applies. Q2 (hevc_start_code): default 0 "No Start Code"; matches our behaviour. Q3 (NC12 SAND tile layout): partial. CAPTURE S_FMT for 1280x720 NC12 returns sizeimage=1382400 (linear NV12 byte count) but bytesperline=1080 (suspect, encodes SAND col count not linear stride). Need kernel-doc / driver-source read before writing detile primitive. Q4 (DRM modifier round-trip): hwdownload rejects SAND-tiled drm_prime (-38 Function not implemented). Backend CPU-detile to NV12 is the safe path for Firefox. Q5 (submission ordering): empirical ioctl trace shows canonical V4L2 stateless flow. Two notes for the backend: kdirect uses V4L2_MEMORY_DMABUF for both queues (we use MMAP for CAPTURE on rkvdec); kdirect does NOT need the iter25 SPS pre-seed pattern - rpi-hevc-dec takes explicit NC12 + dims directly. Q6 (packaging): Debian 13 trixie. Phase 8 needs a debian/ tree, not just PKGBUILD. Decision in Phase 1. Other findings: ffmpeg 7.1.3 from stock Debian is built with --enable-v4l2-request. kdirect engagement line: Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8 No libva ICD installed (only armada-drm_dri.so). mpv installable. Firefox 145 + rpi-firefox-mods present. Phase 0 closed. Phase 1 opens with goal: HEVC bit-exact libva-vs-kdirect on higgs for 1280x720 Main 8-bit via the new RPI_HEVC_DEC driver_kind slot + NC12 detile primitive. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:54:08 +00:00
claude-noether	25b8a15e09	phase0_pi5_hevc: open Pi 5 / CM5 HEVC chapter (substrate doc only) Empirical higgs probe (sibling session 2026-05-17) confirmed rpi-hevc-dec at /dev/video19 is V4L2 STATELESS, not stateful: - Section header literally "Stateless Codec Controls" - OUTPUT V4L2_PIX_FMT_HEVC_SLICE (parsed slices), not full-stream HEVC - V4L2_CID_STATELESS_HEVC_* control set + slice_param_array[4096] - CAPTURE NC12 / NC30 (V4L2_PIX_FMT_NV12_COL128 / _10_COL128, SAND 128-column tiled, Pi-specific) So the Pi 5 HEVC HW path belongs HERE (request/stateless backend), not in a separate stateful project. Replaces the now-deleted libva-v4l2-stateful-fourier scaffold attempt. phase0_pi5_hevc.md captures: - Substrate (target host, backend baseline, empirical probe output) - What carries forward unchanged (most of HEVC plumbing) - What needs adding (RPI_HEVC_DEC driver_kind, NC12/NC30 video_format + detile primitive, image.c branch — small surface area) - Six open questions Phase 1 must answer first (EXT_SPS presence, start_code default, SAND tile spec, drm_prime modifier round-trip, rpi-hevc-dec submission ordering quirks, packaging target OS) - Phase 1 goal sketch (NOT locked) + Phase 3 baseline plan No code in this commit. Phase 1 opens when higgs is up + first two open questions are answered live. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:48:09 +00:00
claude-noether	cf8cd9d2be	h265: cap pred-weight + ref-list loops at VAAPI source size (15) V4L2_HEVC_DPB_ENTRIES_NUM_MAX is 16, but VASliceParameterBufferHEVC::RefPicList is [2][15] and the eight delta_*_weight_lX / luma_offset_lX / delta_chroma_weight_lX / ChromaOffsetLX arrays are all [15]. Iterating the per-slot copy loops to 16 over-reads the VAAPI source by one element. The bug was always there but hidden under -O3 (meson's default buildtype=release): GCC unrolled the inner loop and dead-folded the out-of-bounds load. Under -O2 (Arch makepkg CFLAGS) the canonical vectorised loop ran and produced a real SEGV at v4l2_request_drv_video.so + 0xb3a4 inside h265_fill_slice_params, breaking HEVC immediately after the package install on fresnel (iter38 5/5 baseline dropped to 4/5). Define a local VA_HEVC_REF_LIST_LEN (15) and use it as the cap for the four offending loops. RefPicList and pred_weight_table copies now respect the source bound; V4L2 destination still has 16 slots, the upper one stays at memset-zero which is correct. Verified locally: -O2 build + package re-install restores HEVC to bit-exact PASS vs kdirect (sha 108f925bb6cbb6c9). iter38 5/5 baseline restored. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 17:00:52 +00:00
claude-noether	c9f32aff49	iter39 Option B revert of `63fed87`: P010 advertisement gated on is_10bit again Phase 7 fix `63fed87` (unconditional P010 in QueryImageFormats) broke HEVC 8-bit on fresnel: ffmpeg-vaapi picked P010 for the HEVC hwframe pool, vaEndPicture SEGV'd when consumer-side P010 expectations met the 8-bit NV12 CAPTURE buffer. Exit 139 (SIGSEGV) on first frame. Original reasoning for `63fed87` (advertise early so ffmpeg's pre- CreateContext query sees P010) doesn't apply with Option B in place — Hi10P + Main10 are dropped from RequestQueryConfigProfiles, so no 10-bit decode pipeline reaches QueryImageFormats. The gate on is_10bit (false for all enumerated profiles post-Option-B) correctly returns NV12-only. Verified on fresnel post-revert: HEVC bit-exact PASS sha 108f925bb6cbb6c9 restored; iter38 5/5 baseline intact. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 16:52:36 +00:00
claude-noether	6bc12fe7e4	iter39 Option B: drop Hi10P + Main10 from RequestQueryConfigProfiles Per Phase 7 close + user-directed Option B trigger (web research / rockchip-mpp showed Hi10P is effectively impossible on the current stack). Cross-test on ampere RK3588 confirmed the SAME failure mode as fresnel RK3399 — both produce all-zero output via libva; kdirect fails with EINVAL on both. The blocker is in ffmpeg-v4l2-request userspace plumbing for the new uAPI controls Karlman's kernel patches introduced, NOT in our backend or the kernel. Sources confirming kernel + HW capable but userspace pending: - lwn.net/Articles/950434: "to fully runtime test... you may need upstream DRM commits, FFmpeg patches" - patchwork.kernel.org Karlman v6 → v10 series on linux-media - Rockchip RK3399 + RK3588 datasheets list 10-bit H.264 support Stop enumerating Hi10P + Main10 so VAAPI consumers don't try the broken path. The backend infrastructure (codec.c profile cases, context.c NV15 CAPTURE + synthetic SPS bit_depth=2 + video_format invalidation, image.c P010 reporting + NV15→P010 unpack, surface.c RT_FORMAT_YUV420_10 guard + NV15 PRIME fourcc, nv15.c + nv15.h unpack primitive, request.h is_10bit flag) is RETAINED — just re-add the two profiles[index++] lines and bump the H264 guard back to (-6) when upstream ffmpeg-vaapi V4L2 hwaccel learns 10-bit. Memory: feedback_rk3399_h264_hi10p_advertised_not_functional.md captures the empirical evidence for future iterations. vainfo after this commit: 10 profiles (was 12), matches the iter38 baseline. iter38 5/5 PASS preserved (no other codec touched). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 16:43:44 +00:00
claude-noether	63fed87bc5	iter39 fresnel fix: advertise P010 unconditionally in QueryImageFormats ffmpeg-vaapi's hwcontext_vaapi calls vaQueryImageFormats during hwframes context setup, BEFORE vaCreateContext fires. Our previous gate on driver_data->is_10bit meant P010 wasn't in the catalog at that early query — ffmpeg's hwdownload then rejected pix_fmt=p010le with "Invalid output format p010le for hwframe download" and decode failed before our backend's CreateContext saw the 10-bit profile. Fix: advertise P010 unconditionally in QueryImageFormats. Safe because consumers ask for P010 only when their decode pipeline needs 10-bit, and our P010 unpack path in copy_surface_to_image is gated on image->format.fourcc == VA_FOURCC_P010 (independent of is_10bit). Verified on fresnel: with this fix, Hi10P decode advances past the hwdownload filter setup. (Run pending bundle to fresnel.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 16:34:52 +00:00
claude-noether	a13215de45	iter39 fresnel fix: skip pre-S_FMT NV15 CAPTURE format probe RK3399 rkvdec advertises NV15 in VIDIOC_ENUM_FMT(CAPTURE) only AFTER S_FMT(OUTPUT) + S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT. Pre-flight v4l2_find_format(NV15) always returns 0 → video_format stays NULL → CreateContext returns OPERATION_FAILED → ffmpeg-vaapi hwaccel init fails with "Failed to create decode context: 1". Verified on fresnel (kernel 7.0-14 / linux-fresnel-fourier): v4l2-ctl -d /dev/video1 --list-formats → only NV12 enumerated Fix: for 10-bit profiles, skip the find_format probe and directly map to our NV15 video_format entry. The later S_FMT(CAPTURE) in the same RequestCreateContext path commits the actual NV15 mode once the synthetic-SPS injection sets bit_depth_luma_minus8=2. Discovered during Phase 7 sub-profile verification — Criterion 1 (vainfo enumeration) PASSed but Criteria 2/3 (Hi10P/Main10 decode) failed with the hwaccel init error. iter38 5/5 baseline still PASSES (no regression — non-10-bit path unchanged). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 16:34:14 +00:00
claude-noether	f0ef69d279	iter2 step4: wire h265_set_controls to populate EXT_SPS__RPS controls Per Phase 4 plan + Phase 5 review amendments (SPS parse-and-cache, per-fd gating). src/h265.c additions: - #include <errno.h>, the v4l2-hevc-ext-controls.h, and the vendored gst/codecparsers/gsth265parser.h - new static helper h265_populate_ext_sps_rps_cache(): walks surface_object->source_data for an SPS NAL (nal_unit_type == 33) using gst_h265_parser_identify_nalu; if found, calls gst_h265_parser_parse_sps_ext (NOT gst_h265_parser_parse_sps — the latter discards the per-RPS-entry EXT data we need); maps GstH265ShortTermRefPicSet (base) + GstH265ShortTermRefPicSetExt (carrying use_delta_flag[16], used_by_curr_pic_flag[16], delta_poc_s0_minus1[16], delta_poc_s1_minus1[16]) into the V4L2 struct arrays; stores on driver_data->hevc_rps_cache_ - non-IDR-frame handling: cache holds across frames, so frames whose source_data lacks an SPS NAL reuse the previously-parsed cached arrays (Phase 5 review item #3) - controls[] grows from [5] to [7]; the 2 new entries are appended after the standard 5 (SPS/PPS/SLICE_PARAMS/SCALING_MATRIX/ DECODE_PARAMS), gated by driver_data->has_hevc_ext_sps_rps_rkvdec (per-fd probe result from Step 3) + the cache being valid - field-by-field mapping mirrors GStreamer's gst_v4l2_codec_h265_dec_fill_ext_sps_rps verbatim (the upstream reference identified in Phase 0 prior-art survey) src/request.h additions: - struct request_data carries hevc_rps_cache_st (array pointer), _st_count, hevc_rps_cache_lt, _lt_count, hevc_rps_cache_valid. Single-slot cache (sps_id 0 only; multi-SPS streams would need expanding). Stores POST-MAPPED V4L2 structs so request.h doesn't need to know GstH265SPS / GstH265SPSEXT types. Critical interpretation correction (Phase 5 review followup): GstH265SPS has short_term_ref_pic_set[65] (base) but NOT short_term_ref_pic_set_ext[]. The EXT array lives on a SEPARATE GstH265SPSEXT struct accessed via gst_h265_parser_parse_sps_ext. The 'plain' gst_h265_parser_parse_sps internally calls _ext with a LOCAL discarded SPSEXT (see gsth265parser.c:2050). Our call must use the _ext variant directly to keep the EXT data. Caught during Step 4 first-build error. Build verified: ninja -C build clean. .so is 759 KB (up from 485 KB original, 682 KB after Step 2 vendor — the +80 KB is the new helper + extension). iter2 Phase 6 Step 5 (install + reboot + smoke-test) is the F1 falsifier moment: if HEVC stops OOPSing, mechanism confirmed; if it still OOPSes, loopback Phase 0 with re-opened kernel-agent#11. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:49:12 +00:00
claude-noether	393d02f413	iter2 step3: HEVC EXT_SPS_*_RPS UAPI header + runtime probe src/hevc-ctrls/v4l2-hevc-ext-controls.h (NEW, MIT, ~95 LOC): Verbatim mirror of Linux 7.0 V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS and _LT_RPS control IDs + struct definitions + flag macros. Each symbol is ifndef-guarded so when ampere's linux-api-headers eventually bumps to 7.0+, the kernel header takes precedence and this shim silently no-ops. Citation block links the upstream Casanova v8 series. Per LGPL section 3.b, kernel UAPI struct definitions are excepted from GPL inheritance, so copying them into MIT userspace is fine. src/request.h: added has_hevc_ext_sps_rps_rkvdec + _hantro bool fields on struct request_data — pair-of-flags layout mirrors video_fd_rkvdec / video_fd_hantro (iter38 multi-device-probe pattern, per feedback_multi_device_probe_design). Phase 5 review identified single-scalar storage as a silent-misbehavior risk across device-switch boundaries. src/request.c: - new probe_hevc_ext_sps_rps_controls(fd) helper: queries the two new CIDs via VIDIOC_QUERYCTRL; returns true iff both register. RK3399 rkvdec (linux 6.x or 7.x without VDPU381/383 bindings) returns false; RK3588 rkvdec (VDPU381/383) returns true. - probe each driver_data->video_fd_rkvdec / _hantro after the iter38 multi-device-probe block at VA_DRIVER_INIT time - log-line if rkvdec supports it - diagnostic for Phase 7 src/meson.build: added the new UAPI header to the headers list. Build verified: ninja -C build clean, .so produced. The new probe runs at driver init and stores the result, but nothing CONSUMES the result yet — that's Step 4 (h265_set_controls wiring). Per ampere-kernel-decoders campaign iter2 Phase 4 step 3 (amended by Phase 5 review item 'per-fd storage'). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:49:09 +00:00
claude-noether	9f7437e8ee	iter2 step2: GLib/GStreamer compat shim, build succeeds Vendored gsth265parser + nalutils + gstbitreader + gstbytereader (the Step 1 commit) compile cleanly against libc + libv4l2 only after adding 1 compat translation unit + 5 stub headers, no edits to the vendored .c/.h files themselves. src/h265_parser/gst_compat.{h,c} — new files (MIT, original work): - GLib type aliases (gboolean, gchar, gint, guint, gsize, gpointer) - Memory helpers (g_malloc/g_free as #define free, g_memdup2 inline) - Asserts as no-op + parser-return-code-propagation - All GST_DEBUG/INFO/WARNING/ERROR/LOG/FIXME as no-ops (the parser is heavy on debug logging; we compile it all out) - GArray implementation (~100 LOC, just enough for gsth265parser.c's 24 call sites) - GList full struct with .data/.next/.prev so callers compile; list-manipulation functions abort() — dead code paths only - Byte-order read/write macros (GST_READ_UINT8/16/24/32/64_LE/BE, GST_WRITE_UINT8/16/24/32_BE) — aarch64 LE inlines - g_once_init_enter/leave as simple gate - G_MAXUINT, G_MAXINT, G_MINxxx, G_GNUC_* attribute macros, etc. - Opaque GstBuffer/GstMemory/GstMapInfo + abort-stub functions for the encoder-side SEI-insertion paths the libva backend never invokes - gst_util_ceil_log2 real impl (used by slice-header parser; dead for our SPS-only call path but cheaper to implement than stub) src/h265_parser/gst/{gst.h,base/base-prelude.h,base/gstbitwriter.h, codecparsers/codecparsers-prelude.h,glib-compat-private.h} — 5 new stub headers (MIT). All include gst_compat.h. gstbitwriter.h adds abort-stub functions for the bit-writer API (used by nalutils.c's NAL emulation-prevention encoder path — dead code for the parse-only libva backend). src/meson.build — added the 5 new .c source files and 10 new .h headers; added include_directories('h265_parser') to the include path so the vendored files' '#include <gst/base/...>' style references resolve to the stub headers + actual vendored files in the local tree. Build verified: ninja -C build produces v4l2_request_drv_video.so (682 KB, up from 485 KB pre-vendor — the +200 KB is the vendored parser code). nm shows gst_h265_parse_sps, gst_h265_parse_sps_ext, gst_h265_parser_identify_nalu, and the other functions we need for Step 4 are present in the binary. Two #warning messages from gsth265parser.h about API stability are upstream-intentional and harmless ('The H.265 parsing library is unstable API and may change in future'). This commit completes Step 2 of ampere-kernel-decoders iter2 Phase 6. Backend remains functionally identical to pre-iter2 — the new code compiles + links but is not yet called from h265_set_controls (that's Step 4). Existing 5 codecs continue to work as before. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:49:06 +00:00
claude-noether	c9b7fcff50	iter2 step1: vendor GStreamer 1.28.2 H.265 parser unchanged Source: gitlab.freedesktop.org/gstreamer/gstreamer @ commit 43421c2a5b8a (refs/tags/1.28.2). All 8 vendored files copied verbatim into src/h265_parser/: gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c (168 KB) gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.h ( 92 KB) gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.c (13 KB) gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.h ( 8 KB) gstreamer/libs/gst/base/gstbitreader.c ( 8 KB) gstreamer/libs/gst/base/gstbitreader.h ( 10 KB) gstreamer/libs/gst/base/gstbytereader.c ( 39 KB) gstreamer/libs/gst/base/gstbytereader.h ( 25 KB) Total ~11 KLOC, LGPL v2.1+ per original headers (Intel + Sreerenj Balachandran + others). LGPL headers preserved verbatim. Backend's existing COPYING.LGPL covers redistribution. Build is INTENTIONALLY BROKEN at this commit. GLib dependencies (GArray, g_malloc, gboolean, GST_DEBUG, etc.) are not yet satisfied; src/Makefile.am is not yet updated to include these files. Step 2 performs the GLib-to-libc mechanical adaptation; Step 3 wires the header + Makefile. This vendor-unchanged commit is the upstream-tracking baseline. When GStreamer ships a parser bug fix, the future-sync workflow is: git diff src/h265_parser/ HEAD..(this commit) to surface our adaptations, then rebase those over the upstream fix. Per ampere-kernel-decoders campaign iter2 Phase 4 §Step 1 (/home/mfritsche/src/ampere-kernel-decoders/phase4_plan_iter2.md). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:48:52 +00:00
claude-noether	a8a91d92d6	Revert "ampere iter2: HEVC EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission (VDPU381)" This reverts commit `f61f736380`.	2026-05-17 09:48:29 +00:00
claude-noether	f61f736380	ampere iter2: HEVC EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission (VDPU381) Fixes the rkvdec_hevc_prepare_hw_st_rps out-of-bounds kernel OOPS that blocked HEVC decode on ampere (RK3588) per marfrit/libva-v4l2-request-fourier#3 and ampere-fourier iter1 close. Mechanism (Phase 5 amendment to issue body): The new EXT_SPS controls are registered as V4L2_CTRL_FLAG_DYNAMIC_ARRAY in vdpu38x_hevc_ctrl_descs (rkvdec.c:279/284) with cfg.dims = { 65 }. The v4l2-ctrl framework init-allocates 1 zeroed element (ctrls-core.c:2116). When num_short_term_ref_pic_sets > 1, rkvdec_hevc_prepare_hw_st_rps (rkvdec-hevc-common.c:393-405) iterates idx 0..N-1 and overruns the 1-element kernel allocation. Submitting an N-element dynamic-array control via S_EXT_CTRLS extends the framework allocation. Userspace fix: - VIDIOC_QUERY_EXT_CTRL probe at first HEVC CreateContext sets driver_data->has_ext_sps_rps (true on VDPU381/383, false on legacy RK3399 — control unregistered there, so fresnel iter38 5/5 + iter39 sub-profile paths are byte-identical to pre-iter2). - When set, h265_set_controls appends EXT_SPS_ST_RPS + _LT_RPS as calloc'd zero arrays, sized by VAAPI's count fields and capped at H.265 §7.4.3.2 spec maxima (ST 64, LT 32). Min 1 (kernel rejects 0). - Free post-S_EXT_CTRLS. Decode correctness scope: VAAPI does NOT expose per-set st_ref_pic_set syntax elements (delta_idx_minus1, delta_rps_sign, etc.) — confirmed in va_dec_hevc.h. All-zero entries give empty inter-pred RPS per set, which is correct for IDR-only streams and incorrect for streams with inter-pred RPS dependence. iter2 acceptance: stop the OOPS. Decode-correctness for inter-RPS content is a known follow-up requiring either bitstream-snoop or SPS-passthrough via a new VAAPI extension. Files: - include/hevc-ctrls.h: #ifndef-guarded fallback definitions for V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS + structs (ampere host is on linux-api-headers 6.19-1; the new CIDs land in 7.0). - src/request.h: driver_data->has_ext_sps_rps (persists for driver lifetime; gated solely by HEVC code path so cross-codec leakage impossible). - src/context.c: probe at HEVC CreateContext via v4l2_query_ext_ctrl. - src/h265.c: controls[5] → controls[7]; #include <hevc-ctrls.h> (replaces <linux/v4l2-controls.h>) for forward UAPI compatibility. Compile-tested on boltzmann (aarch64 native, gcc 15.2.1): clean .so, 0 new warnings. Fresnel cross-device safety: legacy RK3399 rkvdec_ctrl table omits the CIDs; probe returns false; new code path never executes. iter39 sub-profile work (commits `662f887` + `8746690`) is preserved in-tree; iter2 is a forward-compatible additive change. Refs: marfrit/libva-v4l2-request-fourier#3 ampere-fourier/iter1_close.md HEVC blocker ampere-fourier/iter2_phase0_findings.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:34:58 +00:00
claude-noether	8746690739	iter39: add NV15 → P010 unpack self-test (tests/test_nv15_unpack.c) Pure-C unit test for nv15_unpack_plane_to_p010, independent of any V4L2 hardware. Verifies bit layout against the spec at Documentation/userspace-api/media/v4l/pixfmt-nv15.rst by packing known 10-bit pixel values, running the unpack, and asserting P010 output matches pixel<<6. Coverage: - zero, all-max - 8 known position/spread vectors - widths {1, 2, 3, 7, 8} including remainder paths - multi-row with stride padding - chroma-shape (half-height) Build + run: cc -Wall -Werror -O2 -o test_nv15_unpack \ tests/test_nv15_unpack.c src/nv15.c ./test_nv15_unpack Confirmed PASS on noether (x86_64 native). Catches the highest-risk class of regression in iter39 — silent bit-shift errors in the unpack — without requiring fresnel hardware. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:22:14 +00:00
claude-noether	662f8874ba	iter39 α-31: H264 Hi10P + HEVC Main10 sub-profile support (10-bit, rkvdec NV15) Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit values packed in 5 bytes per element); VAAPI consumers receive standard VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image. VP9 Profile 2 explicitly NOT added — RK3399 rkvdec kernel ctrl table caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs). Touchpoints (per Phase 5 sonnet-architect review amendments): - include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it) - src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per Documentation/userspace-api/media/v4l/pixfmt-nv15.rst) - src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find) - src/codec.c: pixelformat_for_profile cases for Hi10P + Main10 - src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10 advertisement for 10-bit profiles - src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth transition (sibling to iter38 device-switch invalidation), is_10bit flag - src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export - src/image.c: P010 reporting in DeriveImage + QueryImageFormats, P010-aware sizing in CreateImage, NV15 → P010 unpack call in copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010) - src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC per-codec paths - src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi hwdownload, mpv vaapi-copy, and any consumer using vaGetImage works end-to-end. PRIME-path consumers that only know NV12/P010 must use the COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a GPU-accessible buffer). Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0, libdrm 2.4.133): clean build, .so produced, 0 new warnings. Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec. rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15; ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196, rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:15:16 +00:00
claude-noether	7ac934e0c5	iter38b: bounds check uses MAX_PROFILES (11), not MAX_CONFIG_ATTRIBUTES (10) Latent bug surfaced by iter38 multi-device probe. profiles[] array in RequestQueryConfigProfiles is sized by V4L2_REQUEST_MAX_PROFILES (set as context->max_profiles=11 in VA_DRIVER_INIT), but the bounds checks used V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES (10). Pre-iter38 only a single device's profiles were enumerated, total ≤9, so the off-by- one never bit. With iter38's rkvdec+hantro union (10 profiles total across MPEG2/H264/HEVC/VP8/VP9), the last enumerator (VP9) hit index=9 with the check 'index < 10-1 = 9' → skipped.	2026-05-14 18:55:27 +00:00
claude-noether	c56a77bd4c	iter38: multi-device probe — single libva session serves all 5 codecs Probe BOTH rkvdec and hantro-vpu at VA_DRIVER_INIT and keep their {video,media}_fd pairs in driver_data. RequestQueryConfigProfiles enumerates the union of supported profiles from all open fds. RequestCreateConfig retargets driver_data->{video,media}_fd to the device that serves the requested profile; if a switch is needed (active fd is wrong), tears down output_pool, capture_pool, video_format cache, and fmt_valid so the next RequestCreateContext rebuilds them on the new device. Profile→device map (RK3399-shaped): H264 / HEVC / VP9 → rkvdec MPEG-2 / VP8 → hantro-vpu Honours LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH explicit overrides (skips alt-probe when those are set). Closes the 'libva multi-device probe' open item from iter36/iter37 campaign-close.	2026-05-14 18:52:12 +00:00
claude-noether	25d3e5f06f	iter37: revert α-26 — decode_params.short_term_ref_pic_set_size back to 0 α-26 (iter26) wrote VAAPI's picture->st_rps_bits to the V4L2 decode_params field of the same name based on field-name match. Per V4L2 spec, this field is the bit-count of st_ref_pic_set() in the SPS — VAAPI doesn't expose that. The slice-header bit-count (which IS what VAAPI's st_rps_bits provides) belongs in slice_params->short_term_ref_pic_set_size (handled correctly in α-29). rkvdec doesn't read decode_params->short_term_ref_pic_set_size, so the misroute was harmless but stale. This revert restores spec-correct semantics (0 when SPS bit-count is unknown). Cosmetic cleanup; no functional change.	2026-05-14 18:38:26 +00:00
claude-noether	7db15a5685	iter36: remove env-gated DIAG probes (iter29/30/33/35) Cleans up the campaign's exploratory env-gated dumps now that all bugs are fixed: - iter29 LIBVA_HEVC_DUMP_SLICE_TAIL (h265.c) — refuted 40-byte inflation theory - iter30 LIBVA_TS_SCALE (picture.c) — refuted timestamp magnitude theory - iter33 LIBVA_VP8_DUMP_FRAME (vp8.c) — led to α-30 fix - iter35 LIBVA_MPEG2_DUMP_FRAME (mpeg2.c) — confirmed MPEG-2 ctrls correct Total: -131 lines / +7 lines (α-7 comment refresh). Preexisting framework env knobs retained: - LIBVA_V4L2_DUMP_OUTPUT (picture.c α-16) - LIBVA_V4L2_DUMP_CAPTURE (surface.c) - LIBVA_V4L2_ZERO_CAPTURE (picture.c) - LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH / NO_AUTODETECT (request.c) The 3 load-bearing fixes remain unchanged: α-25 (rkvdec image_fmt pre-seed, src/context.c) α-29 (slice_params.short_term_ref_pic_set_size, src/h265.c) α-30 (VP8 OUTPUT header prepend, src/picture.c)	2026-05-14 18:12:55 +00:00
claude-noether	48fd0288c3	iter35 DIAG: env-gated dump of v4l2_ctrl_mpeg2_* contents	2026-05-14 17:55:09 +00:00
claude-noether	7e0848d7d2	iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer ROOT CAUSE FIX for VP8 libva decode garbage output. ffmpeg-vaapi's vaapi_vp8.c:191-192 STRIPS the VP8 uncompressed header (3 bytes for interframe, 10 bytes for keyframe) before submitting the slice data via VAAPI. ffmpeg-v4l2request (kdirect) KEEPS the header in its OUTPUT buffer. Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349) hard-codes 'first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3' as the byte offset into OUTPUT where the first compressed partition starts. It uses this offset for: - mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8 - dct_part_offset = first_part_offset + first_part_size Without the header, every offset is wrong, the entropy decoder spins on the wrong bytes, and every frame decodes to garbage. Fix: in codec_store_buffer for VAProfileVP8Version0_3, prepend header_size bytes (10 keyframe / 3 interframe) of zeros to OUTPUT before the slice data memcpy. Hantro skips these bytes for actual parsing (uses ctrl-struct values instead), so zero-fill is fine. Empirical: iter33 kernel printk in vpu2_vp8_dec_run dumped the v4l2_ctrl_vp8_frame struct for libva vs kdirect and confirmed byte-identical control fields. Only the OUTPUT buffer bytes differed, traced to ffmpeg-vaapi's header stripping.	2026-05-14 16:35:41 +00:00
claude-noether	bf3e3d8587	iter33: extend VP8 DIAG to dump VAAPI probability struct directly	2026-05-14 16:15:00 +00:00
claude-noether	4b3c21b105	iter33 DIAG: env-gated dump of v4l2_ctrl_vp8_frame contents LIBVA_VP8_DUMP_FRAME=1 prints the v4l2_ctrl_vp8_frame struct fields to stderr before VIDIOC_S_EXT_CTRLS. Goal: diff libva-side struct against expected kdirect-side values for VP8 frame-2+ divergence (libva produces non-trivial but wrong output; kdirect VP8 byte-equal to SW). Env-gated, no behavior change otherwise.	2026-05-14 16:13:11 +00:00

1 2 3 4 5 ...

390 Commits