c454618ae11addce2e17b560f4deeacbed067d98
390 Commits

c454618ae1
Merge pull request 'picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun (#15)' (#16) from claude-noether/libva-v4l2-request-fourier:noether/output-pool-resize-issue-15 into master
Reviewed-on: #16

5939ac6ae0
picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun
Follow-up to #13 (PR #14, bounds-check floor). When a stream-level resolution upshift mid-session pushes an Annex-B start code / VP8 header pad / slice payload past the OUTPUT pool slot's mmap, the bounds check used to return VA_STATUS_ERROR_ALLOCATION_FAILED and force the libva consumer to recreate the surface (losing the frame). This patch absorbs the resize transparently:
1. codec_store_buffer's three append sites call a new codec_store_buffer_ensure_capacity() before each memcpy/memset.
2. On overflow, ensure_capacity snapshots the in-flight surface's accumulated bytes, temporarily releases its OUTPUT pool slot, and calls request_pool_resize.
3. request_pool_resize STREAMOFFs the OUTPUT queue, munmaps every slot, closes every per-slot media-request fd, REQBUFS(0)s the V4L2 buffers, re-issues S_FMT with a sizeimage hint = 2× the required total (capped at 1 GiB, rounded up to a 4 KiB page), CREATE_BUFSes the original slot count, per-slot queries + mmaps + media_request_allocs, and STREAMONs.
4. ensure_capacity re-acquires a pool slot, re-mirrors source_{index,data,size,request_fd} onto the surface, and restores the saved bytes via memcpy.
The cached S_FMT params (pixelformat, picture_width, picture_height) are stashed on the request_pool at init time so the resize is fully self-contained — the caller passes only the new sizeimage hint. A new v4l2_set_format_sizeimage() helper accepts an explicit sizeimage override; v4l2_set_format keeps the SOURCE_SIZE_MAX (1 MiB) default for CreateContext-time S_FMT.
The pre-condition for the resize is "no pool slot may be borrowed." The inline-Sync-in-EndPicture pattern (RequestEndPicture calls RequestSyncSurface before returning) guarantees that during codec_store_buffer, the only borrowed slot is the current render_surface_id's — which the resize trigger explicitly releases before invoking the pool function. request_pool_resize asserts the invariant via a busy-scan and bails loudly if anyone breaks it rather than corrupting in-flight V4L2 state.
On resize failure: re-acquire the just-released slot (it was a clean busy=false flip; the resize aborted before tearing it down in the common case, or zeroed its mmap fields in the late-abort case — either way the re-acquire keeps surface_object's mirror internally consistent) and surface the original VA_STATUS_ERROR_ALLOCATION_FAILED so libva clients fall back to surface recreation as before this patch.
CAPTURE side is untouched — the V4L2 stateless API treats per-queue streaming independently, so STREAMOFF/STREAMON on OUTPUT does not disrupt the CAPTURE queue, and a resolution-upshift CAPTURE budget mismatch becomes a clean V4L2_BUF_FLAG_ERROR on the next DQBUF (handled by the existing surface error path).
Closes marfrit/libva-v4l2-request-fourier#15.
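The sizeimage hint in step 3 is plain arithmetic. Below is a minimal C sketch. It assumes the hint is computed exactly as the commit text states: double the required total, cap it at 1 GiB, then round up to a 4 KiB page. `resize_sizeimage_hint` and the two constants are hypothetical names, not the fork's code.

```c
#include <assert.h>
#include <stddef.h>

/* Constants from the commit text: 1 GiB cap, 4 KiB page rounding. */
#define RESIZE_PAGE_SIZE ((size_t)4096)
#define RESIZE_CAP ((size_t)1 << 30)

/*
 * Hypothetical helper (not the fork's code): sizeimage hint for the
 * S_FMT re-issued by an OUTPUT-pool resize. Doubles the required total,
 * caps it at 1 GiB and rounds up to the next 4 KiB page.
 */
static size_t resize_sizeimage_hint(size_t required_total)
{
	size_t hint;

	assert(required_total > 0);

	/* Saturate instead of letting 2 * required_total overflow. */
	if (required_total > RESIZE_CAP / 2)
		hint = RESIZE_CAP;
	else
		hint = required_total * 2;

	/* Page-align; RESIZE_CAP is already a multiple of the page size. */
	return (hint + RESIZE_PAGE_SIZE - 1) & ~(RESIZE_PAGE_SIZE - 1);
}
```

Under this rule, a 1.5 MiB accumulated slice total asks S_FMT for a 3 MiB sizeimage. The doubling leaves headroom, so a second, smaller upshift in the same session is less likely to trigger another full STREAMOFF/REQBUFS cycle. The driver may still adjust the sizeimage it actually returns.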

2860d75afe
Merge pull request 'picture: bounds-check codec_store_buffer slice writes against source_size (#13)' (#14) from claude-noether/libva-v4l2-request-fourier:noether/codec-store-buffer-bounds-check-13 into master
Reviewed-on: #14

bfcb286031
picture: bounds-check codec_store_buffer slice writes against source_size
surface_object->source_data points at an OUTPUT-pool mmap of fixed size source_size, negotiated by v4l2_query_buffer at request_pool_init time (kernel sizeimage at S_FMT). codec_store_buffer's VASliceDataBufferType branch appended to it at three sites (H.264 Annex-B start code, VP8 uncompressed-header pad, slice payload) without consulting that capacity — a stream-level resolution upshift would walk past the mmap and SIGSEGV inside the memcpy (mpv --hwdec=vaapi-copy on the daedalus path, issue #13) or corrupt adjacent heap (Firefox RDD).
Add a check at each append site that fails the RenderPicture call with VA_STATUS_ERROR_ALLOCATION_FAILED when slices_size+payload exceeds source_size, and logs the over-budget request for postmortem. libavcodec recreates the surface at the new dimensions on the next BeginPicture, so a refused upshift slice is recoverable.
This doesn't address the root cause (surfaces should be re-created on resolution change, or source_data should be grown on demand) but removes the memory-safety hazard while the larger refactor waits.
Closes marfrit/libva-v4l2-request-fourier#13.
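The capacity check can be written so that `slices_size + payload` is never computed and therefore can never overflow. Below is a minimal sketch. `struct slice_sink` is a hypothetical stand-in that mirrors the three surface fields named above, and the -1 return stands in for VA_STATUS_ERROR_ALLOCATION_FAILED.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the surface fields named in the commit:
 * source_data/source_size describe the fixed OUTPUT-pool mmap,
 * slices_size is how many bytes have been appended so far.
 */
struct slice_sink {
	unsigned char *source_data;
	size_t source_size;
	size_t slices_size;
};

/* Overflow-safe "does slices_size + len fit in source_size?" test. */
static int sink_fits(const struct slice_sink *s, size_t len)
{
	assert(s->slices_size <= s->source_size);
	return len <= s->source_size - s->slices_size;
}

/* Append len bytes. Returns 0 on success, or -1 (which the caller would
 * map to VA_STATUS_ERROR_ALLOCATION_FAILED) if the mmap would be overrun. */
static int sink_append(struct slice_sink *s, const void *data, size_t len)
{
	if (!sink_fits(s, len))
		return -1;
	memcpy(s->source_data + s->slices_size, data, len);
	s->slices_size += len;
	return 0;
}
```

The check compares `len` against the remaining room instead of adding it to `slices_size`. That keeps it correct even for pathological `len` values near SIZE_MAX, and a refused append leaves `slices_size` unchanged.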

77f9236466
Merge pull request 'av1: populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls (#11 libva side)' (#12) from claude-noether/libva-v4l2-request-fourier:noether/av1-set-controls-bug-11 into master
Reviewed-on: #12

9fa18f2312
av1: populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls
Implements the libva-side portion of issue #11 — replaces PR #10's no-op AV1 dispatch with a real av1_set_controls that maps VAAPI's VADecPictureParameterBufferAV1.seq_info_fields + scalar fields onto struct v4l2_ctrl_av1_sequence (the kernel uAPI control declared at linux/v4l2-controls.h:2891-2919).
Daemon-track context (issue #11 daemon side, operator-owned): ffmpeg-vaapi splits the AV1 bitstream client-side and strips the OBU_SEQUENCE_HEADER before delivery; the V4L2 OUTPUT buffer contains only OBU_FRAME_HEADER + OBU_TILE_GROUP. libdav1d in the daedalus daemon cannot parse this — it expects a complete OBU stream. The daemon side has to synthesise OBU_SEQUENCE_HEADER from the SEQUENCE ctrl and prepend it to the slice bitstream. This libva-side change just makes the SEQUENCE ctrl populated and queued via S_EXT_CTRLS; the daemon track is the consumer.
Three small touch points beyond the new src/av1.{c,h}:
- src/surface.h: add an av1 leaf to surface->params holding VADecPictureParameterBufferAV1. Slice params intentionally absent — the daedalus daemon consumes the slice OBU bytes directly from the OUTPUT buffer; no per-tile-group struct → OBU re-synthesis required from libva today.
- src/picture.c: copy the picture-param buffer into the new leaf in RenderPicture, mirror of the per-codec memcpy pattern, plus call av1_set_controls from codec_set_controls (replacing the no-op).
- src/meson.build: register src/av1.c.
Sequence-field mapping covers everything VAAPI exposes at the sequence level (12 of 18 V4L2_AV1_SEQUENCE_FLAG_* bits + the four scalars). Bits VAAPI doesn't carry at the sequence level (WARPED_MOTION, REF_FRAME_MVS, SUPERRES, RESTORATION, SEPARATE_UV_DELTA_Q) stay clear; per-frame consumers (libdav1d via the daemon, vpu981 via the hardware path) read those from the OBU_FRAME_HEADER that is already in the slice buffer anyway. See feedback memory `feedback_vaapi_blind_to_some_hevc_sps_fields` for the precedent.
Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0, linux uAPI v4l2-controls.h sizeof(struct v4l2_ctrl_av1_sequence)==12): clean meson + ninja link of v4l2_request_drv_video.so, vainfo enumerates VAProfileAV1Profile0 via daedalus_v4l2 slot, av1_set_controls symbol present.
Out of scope on this PR (operator-track, issue #11 follow-up):
- daedalus-v4l2 kernel module wire-protocol extension (daedalus_collect_av1_meta + AV1 ctrl request_setup).
- daedalus daemon OBU synthesiser (~400 LoC AV1 OBU encoder in daemon/src/av1_obu_synth.{c,h}).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
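The flag half of the sequence mapping is a table-driven bit translation: copy each VAAPI bit that has a V4L2 counterpart and leave every other bit clear. Below is a minimal sketch. The bit positions are illustrative placeholders, not the real va.h bitfield layout or the V4L2_AV1_SEQUENCE_FLAG_* values, and `map_seq_flags` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only: NOT the va.h seq_info_fields layout
 * and NOT the V4L2_AV1_SEQUENCE_FLAG_* values from linux/v4l2-controls.h. */
enum { SRC_STILL_PICTURE = 1u << 0, SRC_ENABLE_CDEF = 1u << 1, SRC_MONO_CHROME = 1u << 2 };
enum { DST_STILL_PICTURE = 1u << 0, DST_ENABLE_CDEF = 1u << 12, DST_MONO_CHROME = 1u << 14 };

struct flag_map {
	uint32_t src; /* bit in the VAAPI-side word */
	uint32_t dst; /* matching bit in the V4L2-side flags */
};

/* One row per bit that VAAPI carries at the sequence level. */
static const struct flag_map av1_seq_flag_map[] = {
	{ SRC_STILL_PICTURE, DST_STILL_PICTURE },
	{ SRC_ENABLE_CDEF,   DST_ENABLE_CDEF },
	{ SRC_MONO_CHROME,   DST_MONO_CHROME },
};

/* Translate set source bits; bits without a table row stay clear. */
static uint32_t map_seq_flags(uint32_t src_bits)
{
	uint32_t out = 0;
	size_t n = sizeof(av1_seq_flag_map) / sizeof(av1_seq_flag_map[0]);

	for (size_t i = 0; i < n; i++)
		if (src_bits & av1_seq_flag_map[i].src)
			out |= av1_seq_flag_map[i].dst;
	return out;
}
```

With a table, the bits that are deliberately left clear (the per-frame-only features listed above) show up as missing rows rather than as conditionals scattered through the setter.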

9a9cfd05db
Merge pull request 'picture: no-op codec_set_controls case for VAProfileAV1Profile0' (#10) from noether/picture-av1-noop into master
Reviewed-on: #10

96d70af674
picture: no-op codec_set_controls case for VAProfileAV1Profile0
picture.c's codec_set_controls() switch was falling through to the
default case for VAProfileAV1Profile0, returning
VA_STATUS_ERROR_UNSUPPORTED_PROFILE. Result: vaEndPicture failed
with status 12 ("requested VAProfile is not supported"), no OUTPUT
buffer ever got queued, and the daedalus_v4l2 daemon never saw a
REQ_DECODE for AV1.
config.c's VAProfileAV1Profile0 case (lines 84-93) explicitly notes
"Decode-side ctrl dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET
WIRED on master — vainfo will list the profile + CreateConfig
succeeds, but consumers that submit decode buffers hit a NOP path".
The NOP path was never actually wired in picture.c — it hit the
default UNSUPPORTED_PROFILE branch instead.
Fix: add a VAProfileAV1Profile0 case that just `break;`s through
without setting V4L2 controls. For the daedalus_v4l2 daemon path
this is exactly the right shape — AV1 frame data is self-describing
per OBU stream (no separate SPS/PPS controls needed at the V4L2
boundary), so the OUTPUT buffer alone is sufficient for the kernel
to forward to the daemon.
Verified on higgs: ffmpeg -hwaccel vaapi -i av1.mkv now actually
queues frames to /dev/video2 and the daemon's libdav1d context opens.
Decode itself still fails (libdav1d wants the AV1 sequence header
OBU, which ffmpeg-vaapi sends via VADecPictureParameterBufferAV1, not
via the slice buffer) — separate issue, needs an OBU sequence-header
synthesiser in the daedalus daemon (analogous to the new H.264
SPS/PPS NAL synth in daedalus-v4l2/daemon/src/h264_nal_synth.c).
That sequence-header synth work is a substantial follow-up; this
patch unblocks AV1 reaching the daemon at all.
For RK3588 vpu981 (the originally-planned AV1 target), this
remains a true NO-OP — when V4L2_CID_STATELESS_AV1_* dispatch
lands from the av1-iter1 operator branch, replace the no-op with
av1_set_controls(...).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
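The fix adds an explicit `case` that succeeds without touching any V4L2 controls, so AV1 no longer falls through to `default`. Below is a minimal sketch with hypothetical enums standing in for VAProfile and VAStatus. The value 12 matches the status code quoted above.

```c
#include <assert.h>

/* Hypothetical stand-ins for VAProfile and VAStatus values. */
enum profile { PROFILE_H264_MAIN, PROFILE_VP9_0, PROFILE_AV1_0, PROFILE_UNKNOWN };
enum status { STATUS_SUCCESS = 0, STATUS_UNSUPPORTED_PROFILE = 12 };

static int controls_set; /* counts per-codec setter calls in this sketch */

static enum status set_h264(void) { controls_set++; return STATUS_SUCCESS; }
static enum status set_vp9(void) { controls_set++; return STATUS_SUCCESS; }

static enum status codec_set_controls_sketch(enum profile p)
{
	switch (p) {
	case PROFILE_H264_MAIN:
		return set_h264();
	case PROFILE_VP9_0:
		return set_vp9();
	case PROFILE_AV1_0:
		/* No controls: the OUTPUT buffer is forwarded as-is.
		 * Without this case AV1 hit the default branch below. */
		break;
	default:
		return STATUS_UNSUPPORTED_PROFILE;
	}
	return STATUS_SUCCESS;
}
```

The follow-up described in the last paragraph replaces the `break` with a setter call, the same shape as the other two cases.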

c1bb444d07
Merge pull request 'h264: max_num_ref_frames fallback + libva-boundary instrumentation (#8)' (#9) from claude-noether/libva-v4l2-request-fourier:noether/h264-3-set-controls-bitstream-bug-8 into master
Reviewed-on: #9

0791f8e612
h264: max_num_ref_frames fallback + libva-boundary instrumentation
Closes the libva-side portion of marfrit/libva-v4l2-request-fourier#8. Two small additions to h264_set_controls:
1. When VAPicture->num_ref_frames is 0 (older ffmpeg-vaapi paths / some daedalus_v4l2 consumers), count valid (non-INVALID) DPB entries in ReferenceFrames[16]. If even that returns 0, fall back to a per-profile spec minimum (1 for baseline, 4 for main/high). Hardware decoders (rkvdec, hantro, rpi-hevc-dec) tolerated the prior 0; libavcodec-via-daedalus enforces sps.max_num_ref_frames strictly and rejected every frame.
2. One request_log line at function entry dumping the raw VAAPI fields (seq_fields.value, pic_fields.value, num_ref_frames, bit_depth_*, picture_*_in_mbs_minus1). Disambiguates "ffmpeg-vaapi never populated" from "daedalus_v4l2 wire protocol corrupted" for the bit-fields-read-as-zero portion of issue #8.
Out of scope here (separate issue if pursued): profile_idc and level_idc remain session-derived. VAAPI's VAPictureParameterBufferH264 omits both (verified higgs libva 2.22.0-3, /usr/include/va/va.h:3571-3622) — same VAAPI-blindspot family as the HEVC SPS fields. A real fix requires SPS-NAL parsing from surface->source_data OR a daedalus wire-protocol pass-through; both are operator design calls, not a libva-only patch.
Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0): clean ninja link of v4l2_request_drv_video.so, vainfo enumerates all 8 codec profiles, no init regression.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
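Step 1's fallback chain (VAAPI value, then DPB count, then per-profile floor) fits in one small function. This is a minimal sketch. `struct pic_h264` is a local stand-in for VAPictureH264 that carries only `flags`, `PIC_H264_INVALID` mirrors libva's VA_PICTURE_H264_INVALID (0x00000001), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors VA_PICTURE_H264_INVALID from libva's va.h. */
#define PIC_H264_INVALID 0x00000001u

/* Local stand-in for VAPictureH264: only the flags field matters here. */
struct pic_h264 {
	uint32_t flags;
};

enum h264_profile_kind { H264_BASELINE, H264_MAIN, H264_HIGH };

static unsigned int effective_max_num_ref_frames(unsigned int num_ref_frames,
                                                 const struct pic_h264 refs[16],
                                                 enum h264_profile_kind kind)
{
	unsigned int valid = 0;

	/* 1. Trust VAAPI when it populated the field. */
	if (num_ref_frames > 0)
		return num_ref_frames;

	/* 2. Otherwise count DPB entries not marked invalid. */
	for (int i = 0; i < 16; i++)
		if (!(refs[i].flags & PIC_H264_INVALID))
			valid++;
	if (valid > 0)
		return valid;

	/* 3. Last resort: the per-profile floor from the commit text. */
	return kind == H264_BASELINE ? 1 : 4;
}
```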

989833114a
Merge pull request 'config: include video_fd_daedalus in profile enumeration probe' (#7) from claude-noether/libva-v4l2-request-fourier:noether/libva-2-config-profile-enum-daedalus into master
Reviewed-on: #7

d1ba4625d2
config: include video_fd_daedalus in profile enumeration probe
LIBVA-2 follow-up. RequestQueryConfigProfiles walks each known
decoder fd via any_fd_supports_output_format() and adds a VAProfile*
for each codec OUTPUT format the V4L2 device advertises. The fd
list missed video_fd_daedalus — so on a Pi 5 with rpi-hevc-dec
primary + daedalus_v4l2 alt, only S265 (HEVC) was probed and the
H.264 / VP9 / AV1 profiles never got enumerated.
Effect on higgs: ffmpeg -hwaccel vaapi -i h264_test.mp4 reported
"No support for codec h264 profile 578" before the per-codec
dispatch in request_switch_device_for_profile could fire — the
profile-578 (H264 Constrained Baseline) check happened during
hwaccel init, found nothing in the libva profile list, and bailed
without ever calling into the daedalus path.
Fix: extend the fds[] array in any_fd_supports_output_format from
5 to 6 entries, with the sixth being video_fd_daedalus when
HAVE_DAEDALUS_V4L2 is on (and -1 otherwise so it's skipped by the
`if (fds[i] < 0) continue;` guard). After the fix, daedalus_v4l2's
OUTPUT format menu (VP9F + AV1F + S264) gets seen, and
RequestQueryConfigProfiles returns VP9Profile0 + AV1Profile0 + the H264*
profiles, all of which then route through the LIBVA-1 'd' kind
override in request_switch_device_for_profile.
Verified on higgs:
Before:
vainfo: Supported profile and entrypoints
VAProfileHEVCMain : VAEntrypointVLD
(only HEVC; H264/VP9/AV1 not enumerated)
ffmpeg vaapi -i h264 → "No support for codec h264 profile 578"
Build clean on boltzmann (only config.c.o + request.c.o recompile).
Backward-compatible on RK3399/3588 — the new slot is gated by
HAVE_DAEDALUS_V4L2 *and* video_fd_daedalus >= 0; both stay false in
those deployments. Existing 5-fd probe order unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
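The probe loop's `-1` convention lets the array grow without affecting hosts that lack the new device. Below is a minimal sketch. A caller-supplied predicate replaces the real per-fd format enumeration, and `any_fd_supports_sketch`, `demo_supports` and the fd numbers are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* v4l2_fourcc()-style packing, redefined locally to avoid kernel headers. */
#define FOURCC(a, b, c, d) \
	((unsigned)(a) | ((unsigned)(b) << 8) | ((unsigned)(c) << 16) | ((unsigned)(d) << 24))

/* Hypothetical predicate standing in for the per-fd OUTPUT format check. */
typedef bool (*fd_supports_fn)(int fd, unsigned fourcc);

static bool any_fd_supports_sketch(const int *fds, size_t n, unsigned fourcc,
                                   fd_supports_fn supports)
{
	for (size_t i = 0; i < n; i++) {
		if (fds[i] < 0)
			continue; /* slot never opened on this host */
		if (supports(fds[i], fourcc))
			return true;
	}
	return false;
}

/* Demo predicate: fd 5 plays rpi-hevc-dec (S265 only),
 * fd 9 plays daedalus_v4l2 (S264, VP9F, AV1F). Purely illustrative. */
static bool demo_supports(int fd, unsigned fourcc)
{
	if (fd == 5)
		return fourcc == FOURCC('S', '2', '6', '5');
	if (fd == 9)
		return fourcc == FOURCC('S', '2', '6', '4') ||
		       fourcc == FOURCC('V', 'P', '9', 'F') ||
		       fourcc == FOURCC('A', 'V', '1', 'F');
	return false;
}
```

On RK3399/RK3588 the sixth slot is simply -1, so the loop behaves exactly as the 5-entry version did.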

c332d34643
Merge pull request 'request: route VP9/AV1/H.264 to daedalus_v4l2 on Pi 5 mixed deploy' (#6) from claude-noether/libva-v4l2-request-fourier:noether/libva-1-per-codec-dispatch into master
|
|
6173a8da8e |
request: route VP9/AV1/H.264 to daedalus_v4l2 on Pi 5 mixed deploy
LIBVA-1 — when both rpi-hevc-dec and daedalus_v4l2 are loaded, finish
the per-codec dispatch so HEVC goes to rpi-hevc-dec (existing 'p'
override) and VP9 / AV1 / H.264 go to the daedalus daemon ('d').
Before this change the multi-device-probe accepted only ONE driver
plus a fixed alt slot (rkvdec↔hantro-vpu); on a Pi 5 with both decoders
the find_codec_device() walk preferred rpi-hevc-dec by
known_decoder_drivers[] order and never opened daedalus_v4l2, so VP9/AV1/H.264 frames
hit rpi-hevc-dec's S_FMT and failed.
Changes:
- request.c multi-device-probe: when primary = rpi-hevc-dec, alt =
daedalus_v4l2 (when HAVE_DAEDALUS_V4L2 is on); symmetric handling
in the daedalus_v4l2 primary branch so alt = rpi-hevc-dec. This
preserves the iter40 fallback (no daedalus → alt = NULL) when the
build option is off.
- request.c alt-driver opening block: generalized from the iter38
rkvdec/hantro pair to also dispatch into video_fd_rpi_hevc_dec and
video_fd_daedalus slots. Defensive close on unknown alt-driver
name (shouldn't happen — primary_driver branches gate the choices —
but keeps the slot tally clean if a future driver name is added
above without wiring up the dispatch here).
- request_switch_device_for_profile: added 'd' kind handler +
profile override block. When daedalus is open, VP9 / AV1 / H.264*
route to it. HEVC stays on rpi-hevc-dec via the existing 'p'
override. AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was
probed, so the override only fires on hosts where vpu981 stayed
-1 (i.e. Pi 5).
- RequestTerminate: close the daedalus_v4l2 fd pair on teardown
(was leaking — caught while reviewing the alt-driver expansion).
Build: meson + ninja clean on boltzmann (only pre-existing GStreamer
H265 parser noise). Behaviour on RK3399/3588 boxes unchanged — the
new branches are gated by HAVE_DAEDALUS_V4L2 *and* video_fd_daedalus
≥ 0, both of which stay false in those deployments.
Companion to daedalus-v4l2 481279c (Phase 8.13 systemd unit) and
marfrit-packages noether/daedalus-v4l2-kernel-6.18-compat branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
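The routing rules above reduce to a small per-codec decision. Below is a minimal sketch. `struct slots`, `route_codec` and the fd values are hypothetical, while the 'p', 'd' and 'a' kind letters come from the commit text. A return of 0 means "keep the primary device".

```c
#include <assert.h>

enum codec { CODEC_H264, CODEC_HEVC, CODEC_VP9, CODEC_AV1 };

/* Hypothetical snapshot of the multi-device slots; -1 means "not opened". */
struct slots {
	int vpu981;       /* RK3588 AV1 block */
	int rpi_hevc_dec; /* Pi 5 HEVC block */
	int daedalus;     /* daedalus_v4l2 daemon shim */
};

/* Returns the driver-kind letter to switch to, or 0 to keep the primary. */
static char route_codec(const struct slots *s, enum codec c)
{
	switch (c) {
	case CODEC_HEVC:
		if (s->rpi_hevc_dec >= 0)
			return 'p';
		break;
	case CODEC_AV1:
		if (s->vpu981 >= 0)
			return 'a'; /* vpu981 wins only where it was probed */
		if (s->daedalus >= 0)
			return 'd';
		break;
	case CODEC_VP9:
	case CODEC_H264:
		if (s->daedalus >= 0)
			return 'd';
		break;
	}
	return 0;
}
```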

de27e95571
v4l2: log error_idx + failing ctrl id on S_EXT_CTRLS failure
Better diagnostic when VIDIOC_S_EXT_CTRLS returns < 0: read
back error_idx and print which control id rejected (or
"ioctl-level" when error_idx == count, meaning the rejection
was generic, not per-control).
This made it possible to triage the daedalus_v4l2 phase 8.13 issue
by separating "the actual stateless control failed" (would
show failing_ctrl_id=0xa40a2c VP9_FRAME) from "libva probing
H264/HEVC controls we don't expose" (failing_ctrl_id=
0xa40900 H264_DECODE_MODE etc.); the latter is harmless on a
VP9-only context.
Before:
v4l2-request: Unable to set control(s): Invalid argument
After (per-control):
v4l2-request: Unable to set control(s): Invalid argument
(error_idx=0/2 failing_ctrl_id=0xa40900 size=0)
After (ioctl-level):
v4l2-request: Unable to set control(s): Invalid argument
(error_idx=2/2 ioctl-level)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
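The formatting rule depends only on `error_idx`, `count` and the per-control `id`/`size`. In real code these come from struct v4l2_ext_controls and its `controls[]` array. The minimal sketch below takes them as plain parameters so it builds without <linux/videodev2.h>. `format_ctrl_error` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render the diagnostic suffix. error_idx and count
 * mirror v4l2_ext_controls.error_idx/.count; ids and sizes mirror
 * controls[i].id and controls[i].size.
 */
static int format_ctrl_error(char *buf, size_t len, uint32_t error_idx,
                             uint32_t count, const uint32_t *ids,
                             const uint32_t *sizes)
{
	/* error_idx == count: the kernel rejected the call as a whole. */
	if (error_idx >= count)
		return snprintf(buf, len, "(error_idx=%u/%u ioctl-level)",
		                (unsigned)error_idx, (unsigned)count);

	return snprintf(buf, len, "(error_idx=%u/%u failing_ctrl_id=0x%x size=%u)",
	                (unsigned)error_idx, (unsigned)count,
	                (unsigned)ids[error_idx], (unsigned)sizes[error_idx]);
}
```

The sketch treats any `error_idx >= count` as ioctl-level. That is slightly more defensive than the strict `== count` test described above, and it never indexes past the control array.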

2146341460
daedalus_v4l2: meson option gate (default true)
Adds a build-time switch so platforms that will never see a
daedalus_v4l2 kernel module (Allwinner cedrus, RK without the
shim, etc.) can opt out of the probe entry + dispatch branch.
meson setup build # daedalus support on
meson setup build-off -Ddaedalus_v4l2=false # off
Implementation:
- meson_options.txt: new boolean `daedalus_v4l2`, default true.
- src/meson.build: when option is true, autoconfig.h gets
`#define HAVE_DAEDALUS_V4L2 1`.
- src/request.c: known_decoder_drivers[] entry, primary-driver
detection branch, and post-probe log line all gated by
#ifdef HAVE_DAEDALUS_V4L2.
- src/request.h: struct daedalus fields kept UNCONDITIONAL.
Two extra ints per session, and the struct layout stays stable
across translation units regardless of option — avoids the
ODR risk of every consumer of request.h needing to include
autoconfig.h before request.h.
Verified on hertz: both builds compile clean.
build/src/autoconfig.h has HAVE_DAEDALUS_V4L2; .so contains
"daedalus_v4l2" string + log message.
build-off/src/autoconfig.h doesn't; .so contains no daedalus
strings at all.
Default-on build still passes vainfo end-to-end:
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
VAProfileH264Main / High / ConstrainedBaseline / MultiviewHigh
/ StereoHigh : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileAV1Profile0 : VAEntrypointVLD
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
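In C terms, the gating pattern removes only the table entry when the option is off; the struct fields stay. Below is a minimal sketch. Apart from the daedalus entry, the table contents and order are illustrative, and both `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the autoconfig.h line meson writes when
 * -Ddaedalus_v4l2=true (the default); remove it to mimic build-off. */
#define HAVE_DAEDALUS_V4L2 1

/* Hypothetical probe table; only the #ifdef gating is the point here. */
static const char *const known_decoder_drivers_sketch[] = {
	"rkvdec",
	"hantro-vpu",
	"rpi-hevc-dec",
#ifdef HAVE_DAEDALUS_V4L2
	"daedalus_v4l2",
#endif
};

#define N_KNOWN_DRIVERS \
	(sizeof(known_decoder_drivers_sketch) / sizeof(known_decoder_drivers_sketch[0]))

/* Fields stay unconditional so every translation unit sees one layout. */
struct driver_data_sketch {
	int video_fd_daedalus;
	int media_fd_daedalus;
};
```

Dropping the `#define`, as a `-Ddaedalus_v4l2=false` build would, shrinks the table to three entries. `sizeof(struct driver_data_sketch)` stays the same, which is the ODR-safety point made above.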

b5b3acf0f7
daedalus_v4l2: add to known_decoder_drivers + multi-device-probe slot
Phase 8.10 of the daedalus-v4l2 sibling campaign — out-of-tree
V4L2 stateless decoder shim that forwards bitstream to a
userspace daemon (FFmpeg-software decode for VP9 / AV1 / H.264;
pixels back via dmabuf into the CAPTURE buffer).
Adds the same iter40-shaped wiring as rpi-hevc-dec:
- known_decoder_drivers[] entry "daedalus_v4l2"
- video_fd_daedalus + media_fd_daedalus slots in driver_data
- -1 init alongside the other multi-device slots
- primary-driver detection branch in the auto-probe block
- post-probe log line for symmetry with iter40
No per-profile dispatch changes needed — daedalus_v4l2 advertises
the standard V4L2_PIX_FMT_{VP9_FRAME,AV1_FRAME,H264_SLICE}
OUTPUT fourccs the fork's existing per-driver paths already
handle.
Verified on hertz (Pi 5 / CM5, 6.12.75+rpt-rpi-2712) with the
daedalus_v4l2 module loaded:
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
vainfo --display drm --device /dev/dri/renderD128
v4l2-request: opened daedalus_v4l2 at video_fd=... media_fd=... (Pi 5 daemon-backed VP9/AV1/H264)
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileAV1Profile0 : VAEntrypointVLD
Without the env override the auto-probe still picks rpi-hevc-dec
first (it's earlier in known_decoder_drivers[]); on the standalone
daedalus_v4l2 path the daemon-backed decode is what answers
S_FMT/QBUF/DQBUF. On a mixed-driver Pi 5 box where both modules
are loaded, HEVC continues to route through rpi-hevc-dec via the
existing 'p' override; VP9/AV1/H264 would prefer daedalus_v4l2
since rpi-hevc-dec is HEVC-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
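Without the env override, rpi-hevc-dec wins because the auto-probe takes the first table entry that is present. Below is a minimal sketch of such a first-match walk. `probe_order_sketch` and `pick_primary` are hypothetical, and the only ordering assumed from the text is that rpi-hevc-dec precedes daedalus_v4l2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe order; only "rpi-hevc-dec before daedalus_v4l2"
 * is taken from the commit text. */
static const char *const probe_order_sketch[] = {
	"rkvdec", "hantro-vpu", "rpi-hevc-dec", "daedalus_v4l2",
};

/* Return the first probe-order entry whose driver name is present. */
static const char *pick_primary(const char *const *present, size_t n_present)
{
	size_t n_known = sizeof(probe_order_sketch) / sizeof(probe_order_sketch[0]);

	for (size_t i = 0; i < n_known; i++)
		for (size_t j = 0; j < n_present; j++)
			if (strcmp(probe_order_sketch[i], present[j]) == 0)
				return probe_order_sketch[i];
	return NULL;
}
```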

820557268b
Merge PR #5: ampere-av1 Phase 2 (master) — fourth-fd probe + AV1 enumeration
|
|
c6f81c653f |
ampere-av1 Phase 2 (master): fourth-fd probe + AV1 enumeration
Imports the minimal "vainfo lists VAProfileAV1Profile0" layer from the operator's in-progress av1-iter1 branch (Phase 2 steps 1, 2 — commits

9bb5a5a722
README: ffmpeg-v4l2-request-fourier flipped to published
Build + publish landed (2:8.1.r123329.b57fbbe-3, Kwiboo's v4l2-request-n8.1 tip + libudev-bypass companion patch). Deploy-host verified on fresnel: installs cleanly, ffmpeg buildconf shows --enable-v4l2-request, hwaccels list includes 'v4l2request', HEVC decode via -hwaccel v4l2request produces correct-size output. Quickstart per-host pacman -S lines now include ffmpeg-v4l2-request-fourier. Status table flipped its row from pending to published. Remaining pending: chromium-fourier (clang 22 -> 23 blocker), qt6-base-fourier (Wayland GL_ALPHA fix).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

0182307403
README: add Quickstart section with per-host install + full stack matrix
The TL;DR of 'what packages do I install to watch YouTube on my
Rockchip board with HW acceleration in Firefox' wasn't reachable
from this README without reading three other repos' commit
histories. Fixed.
Now landed at the top:
- Stack matrix: kernel (linux-{fresnel,ampere}-fourier) -> ffmpeg
(ffmpeg-v4l2-request-fourier) -> libva (libva-v4l2-request-fourier)
-> browser (firefox-fourier or chromium-fourier + kwin-fourier on
Wayland).
- Honest acknowledgement that the browser HW path is libavcodec
hwdevice DRM, not VAAPI-via-libva. This backend matters for mpv /
ffmpeg-as-vaapi consumers.
- Per-host pacman -S incantations for fresnel (RK3399), ampere
(RK3588), ohm (RK3566).
- Live marfrit repo URL + signing-key import flow.
- Smoke-test commands (vainfo + MOZ_LOG patterns).
- Honest status flag: ffmpeg-v4l2-request-fourier, chromium-fourier,
qt6-base-fourier exist in marfrit-packages source tree but NOT
yet in the live repo; until then, users build them locally.
- RK3588 mainline (Feb 2026) called out alongside ampere row.
What hasn't changed: Pi 5 standoff section, technical notes,
existing iter39 / iter40 status tables.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

941fbc5b1b
README: candid 'standoff' framing for Pi 5 HEVC + RK matrix
Replaces the original 2018 Bootlin upstream README with the fourier-fork situation as of May 2026. What works: fresnel 5/5, ampere iter1+2, ohm baseline (all RK family, mainline VDPU381/383 landing Feb 2026 helps). What doesn't: Pi 5 HEVC via this backend. New 'The Pi 5 standoff' section captures the honest situation surfaced by the May 2026 web-research pass:
- Kwiboo's ffmpeg-v4l2request hwaccel: 8 years un-merged upstream
- libva-v4l2-request: no commits since ~2021
- rpi-hevc-dec mainline: 17 months in review, still not merged; Pi 6.18.x downstream has active HEVC regressions (#7228, #7306)
- Mozilla bug 1969297 picks the ffmpeg-hwaccel-context path, not libva — explicit ack that strict drivers need libavcodec's internal SPS context
- Frames the issue as ecosystem coordination failure (principal-agent stalemate), not architectural impossibility
Notes that iter40 + iter40b land but park: backend infra is sound + reusable for any future strict V4L2 stateless target ffmpeg ships before libva does, but the user-facing Pi 5 HEVC story will not come from this backend — it'll come from Mozilla / Kwiboo / upstream coordination unblocking.
iter38 5/5 fresnel + 9-profile ampere baselines preserved post-iter40b — documented as no-regression in phase7_pi5_hevc_close.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

071b08dcc2
iter40b: SPS-parse fix lands but bit-exact still blocked upstream
Per-driver gate added: when rpi-hevc-dec active, parse SPS NAL from surface_object->source_data via the iter2 vendored GStreamer parser and override the VAAPI-omitted v4l2_ctrl_hevc_sps fields (sps_max_num_reorder_pics, sps_max_latency_increase_plus1, sps_max_sub_layers_minus1, max_dec_pic_buffering_minus1[HighestTid]). Cached at driver_data->hevc_sps_field_cache.
Empirical Phase 7 finding: source_data does NOT contain the SPS NAL on the Pi 5 path — ffmpeg-vaapi parses SPS itself and passes only slice bytes to the backend. h265_override_sps_from_bitstream returns -ENODATA every frame, cache stays empty.
Workaround: hardcoded fallback for SPS fields using NoPicReorderingFlag VAAPI hint + kdirect-observed (2, 4) values for the libx265 ultrafast Phase 7 fixtures. Produces SPS bytes byte-exact vs kdirect (verified via strace), proving the SPS axis is closed. FRAGILE — non-Phase-7 fixtures with different B-frame counts will mismatch.
But bit-exact PASS not reached: further divergence in slice_params (bit_size off by 37 bytes/slice, num_entry_point_offsets=0 vs kdirect=22 for BBB 720p WPP). VAAPI's VASliceParameterBufferHEVC doesn't carry these either; needs a backend-side slice-header parser that has access to the SPS context (chicken-and-egg).
Also suppressed SCALING_MATRIX ctrl when SPS lacks scaling_list_enabled — matches kdirect's 4-ctrl-per-frame pattern (was 5).
Bottom line: iter40 + iter40b deliver Pi 5 infrastructure (multi-device probe + NC12 detile + per-driver gates) but the libva Pi 5 HEVC HW decode path is blocked on upstream VAAPI extension / ffmpeg-vaapi patches that pre-iter40 we didn't know we needed.
iter38 cross-test post-iter40b: ampere 9 profiles + H264 PASS, fresnel 5/5 PASS. No sibling regression.
Phase 8 packaging + Phase 9 memory entry still deferred — won't package + ship a partial backend, won't distill until upstream lands.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

9037934b21
phase7_pi5_hevc_close: iter40 partial — backend integration works, decode rejected by rpi-hevc-dec
C1 vainfo PASS, C3 HW engagement PASS, C6 decode-correctness FAIL (V4L2_BUF_FLAG_ERROR on every CAPTURE DQBUF).
Root cause empirically located: SPS sps_max_num_reorder_pics + sps_max_latency_increase_plus1 fields. Our backend uses a spec-legal fallback (sps_max_dec_pic_buffering_minus1, 0) because VAAPI doesn't forward these fields; rkvdec accepts it, rpi-hevc-dec validates against bitstream-true values and rejects.
Real fix needs SPS NAL parse via the iter2 vendored GStreamer parser to populate bitstream-true values for the V4L2 SPS ctrl. Estimated 1 more 8(+1)-phase loop (iter40b).
Phase 8 + Phase 9 deferred — won't package + deploy + ship a broken backend; won't distill lessons until the real fix lands.
Sibling iter38 baseline NOT yet re-verified on fresnel + ampere post-iter40. Code paths gated on video_fd_rpi_hevc_dec >= 0 stay no-op on non-Pi hosts; only the __arm__ → __aarch64__ guard change is globally observable, but its is_10bit sub-gate stays dormant on 8-bit fixtures. Verify before declaring no-regression.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
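The fallback pair is spec-legal for two reasons. H.265 bounds sps_max_num_reorder_pics only by sps_max_dec_pic_buffering_minus1, and sps_max_latency_increase_plus1 = 0 means no latency limit is expressed. It is simply not the bitstream-true pair that rpi-hevc-dec checks against. Below is a minimal sketch with a hypothetical `struct sps_order_fields` and hypothetical function names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the HEVC SPS ordering fields named above. */
struct sps_order_fields {
	unsigned max_dec_pic_buffering_minus1;
	unsigned max_num_reorder_pics;
	unsigned max_latency_increase_plus1;
};

/* Spec-legal fallback when VAAPI does not forward the bitstream values:
 * allow reordering as deep as the DPB, and 0 = "no latency limit". */
static struct sps_order_fields sps_order_fallback(unsigned dpb_minus1)
{
	struct sps_order_fields f = { dpb_minus1, dpb_minus1, 0 };
	return f;
}

/* H.265 constraint: max_num_reorder_pics <= max_dec_pic_buffering_minus1. */
static bool sps_order_is_legal(const struct sps_order_fields *f)
{
	return f->max_num_reorder_pics <= f->max_dec_pic_buffering_minus1;
}
```

A lenient decoder only needs the constraint to hold. A validating one compares the values with what the real SPS NAL says, which is why only a bitstream parse fixes rpi-hevc-dec.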

3ffa9d0d17
iter40: Pi 5 HEVC chapter — backend integration lands, bit-exact pending
Phase 6 implementation. Backend builds clean on higgs (Debian 13 trixie, aarch64), vainfo lists VAProfileHEVCMain via rpi-hevc-dec, multi-device probe finds /dev/video19 + /dev/media1, CreateContext + S_FMT + REQBUFS + STREAMON all succeed.
Phase 7 partial: infrastructure works, 10 frames flow through the pipeline (correct byte counts produced — 13824000 for 1280x720 x 10 NV12 frames). But every DQBUF CAPTURE returns V4L2_BUF_FLAG_ERROR so output content is wrong (libva sha != kdirect sha). The decode itself is failing on the rpi-hevc-dec side despite all ctrl submissions returning success.
Code changes:
- request.h: video_fd_rpi_hevc_dec / media_fd_rpi_hevc_dec slots + has_hevc_ext_sps_rps_rpi_hevc_dec flag (mirrors iter38 + iter2 pair-of-flags pattern, naturally false on Pi).
- request.c: known_decoder_drivers gains rpi-hevc-dec; primary-driver probe gets an else-if branch setting the new fds (Phase 5 F3); request_switch_device_for_profile prefers 'p' for HEVC when rpi-hevc-dec present.
- context.c: per-fd want_pixfmt (NC12 on Pi), capture_pixelformat taken from video_format slot (not hardcoded NV12/NV15); synthetic-SPS pre-seed gated off for Pi (Phase 5 F6); destination_sizes uses nv12_col128_uv_plane_offset for NC12 SAND layout (Phase 5 F2); per-driver HEVC_START_CODE (NONE on Pi, ANNEX_B on RK); per-driver context_object->h264_start_code (skip prepend on Pi).
- video.c: NV12_COL128 video_format entry (8-bit SAND, single buffer, 2 planes, NV12 drm_format with MOD_NONE so detile branch fires rather than tiled_to_planar).
- nv12_col128.c/.h: detile primitive (Y + UV per-plane, kernel hevc_d_video.c bytesperline formula + ffmpeg/Kynesim per-pixel offset). UV plane offset = 128 * ALIGN(h, 8) — within-column (SAND interleaves Y+UV per column, NOT plane-concatenated; earlier wrong formula caught by Phase 7 SEGV).
- image.c: #ifdef __arm__ extended to __arm__ || __aarch64__ (Phase 5 F1 — guard was killing detile path on all aarch64 hosts including fresnel iter39 NV15 path, masked because 10-bit never exercised); RequestCreateImage NC12 → NV12 stride override (linear width, not column-stride); copy_surface_to_image NC12 detile branch (gates on fourcc + v4l2_format).
- nv15.h: fallback V4L2_PIX_FMT_NV15 define (Debian 13 headers omit it though they have NC12).
- nv12_col128.h: fallback V4L2_PIX_FMT_NV12_COL128 + V4L2_PIX_FMT_NV12_10_COL128 (Arch / mainline pre-Pi headers).
- tests/test_nv12_col128_detile.c: hand-crafted-bytes unit test; passes (8 cases: Y + UV for 4 widths incl. 1366 misaligned; UV-offset helper).
- meson.build / nv12_col128 sources listed.
Phase 7 status: not yet bit-exact. Remaining diagnosis: per-frame S_EXT_CTRLS payload diff vs kdirect (kdirect sends 4 ctrls SPS+PPS+decode_params+slice_array; ours sends 5 incl. scaling_matrix; field ordering differs). Likely the slice_array contents need per-driver handling for rpi-hevc-dec's expected layout. Beyond in-session reach.
iter38 5/5 baseline on fresnel + ampere should be unaffected (new fd stays -1 on non-Pi hosts; all gates either short-circuit on fd-not-present or no-op).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
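The SAND addressing can be expressed as two offset functions plus a per-row copy. This minimal sketch rests on two assumptions suggested by the commit and the Phase 0 probe. First, the CAPTURE bytesperline of 1080 for 1280x720 is the column height in rows (720 Y rows + 360 CbCr rows). Second, the CbCr rows start 128 * ALIGN(h, 8) bytes into each 128-byte-wide column. All names are hypothetical and are not nv12_col128.c's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAND_COL_WIDTH 128u
#define ALIGN_UP(v, a) (((v) + (a) - 1) / (a) * (a))

/* Byte offset of luma sample (x, y). Each column holds col_height rows of
 * 128 bytes; columns are stored back to back. */
static size_t sand_y_offset(unsigned x, unsigned y, unsigned col_height)
{
	return (size_t)(x / SAND_COL_WIDTH) * SAND_COL_WIDTH * col_height
	       + (size_t)y * SAND_COL_WIDTH + x % SAND_COL_WIDTH;
}

/* Byte offset of interleaved CbCr byte x in chroma row uv_row: same
 * column, but past the ALIGN(h, 8) luma rows inside that column. */
static size_t sand_uv_offset(unsigned x, unsigned uv_row, unsigned height,
                             unsigned col_height)
{
	return (size_t)(x / SAND_COL_WIDTH) * SAND_COL_WIDTH * col_height
	       + (size_t)SAND_COL_WIDTH * ALIGN_UP(height, 8u)
	       + (size_t)uv_row * SAND_COL_WIDTH + x % SAND_COL_WIDTH;
}

/* Detile the Y plane into a linear plane with pitch dst_pitch, copying one
 * contiguous run of up to 128 bytes per column per row. */
static void sand_detile_y(uint8_t *dst, size_t dst_pitch, const uint8_t *src,
                          unsigned width, unsigned height, unsigned col_height)
{
	assert(dst_pitch >= width);
	for (unsigned y = 0; y < height; y++)
		for (unsigned x = 0; x < width; x += SAND_COL_WIDTH) {
			unsigned run = width - x < SAND_COL_WIDTH ? width - x : SAND_COL_WIDTH;
			memcpy(dst + (size_t)y * dst_pitch + x,
			       src + sand_y_offset(x, y, col_height), run);
		}
}
```

For 1280x720 this places the CbCr block 92160 bytes into each column. Ten columns of 128 x 1080 bytes add up to 1382400 bytes, the same total as the linear NV12 sizeimage the driver reports. The UV plane detiles the same way using `sand_uv_offset`.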

f1be489c75
phase5_pi5_hevc_review: 3 critical findings empirically verified, 1 fixture gap
Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically verified each finding against current source per feedback_review_empirical_over_theoretical BEFORE accepting:
- F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS masked this because 10-bit path was never exercised. Fix: extend guard to __aarch64__.
- F2 (CRITICAL): destination_bytesperlines for NC12 source returns column-stride (1080) not linear-NV12 Y stride (1280). VAImage consumers see wrong pitch. Fix: override in RequestCreateImage when src=NC12, dst image=NV12.
- F3 (CRITICAL): request.c primary-driver detection has else-if branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec primary), neither matches → new fd pair stays -1 → routing no-ops. Fix: add explicit rpi-hevc-dec branch.
- F4 (accepted): add 1366x768 fixture to exercise column padding.
- F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify empirically in Phase 7.
- F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec which doesn't need it AND uses different submission order. Fix: gate on driver_data->video_fd != video_fd_rpi_hevc_dec.
- F7/F8 (no findings): image.c gate predicate sound; cross-device regression scope clean.
Amended Phase 6 step list with 3 new gating actions. Phase 7 verification expanded with empirical START_CODE check + 1366 fixture.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
||
|
|
bf52725ab3 |
phase1_pi5_hevc: lock goal + situation + N=3 baseline + plan (iter40)
Phase 1 measurable goal: HEVC Main 8-bit bit-exact libva-vs-kdirect on higgs for 640x360 / 1280x720 / 1920x1080 fixtures, with HW path engagement verified via lsof + ffmpeg-vaapi log signal.
Phase 2 surface-area audit: ~250 LoC backend + 100 LoC standalone detile primitive. Reuses iter38 multi-device-probe pattern (now 3 slots: rkvdec + hantro + rpi-hevc-dec) + iter2 per-driver gating shape. h265_set_controls + iter31 α-29 plumbing transfers unchanged. iter25 SPS pre-seed gated off for rpi-hevc-dec.
Phase 3 baseline locked: N=3 bit-exact SW==kdirect for all three fixtures on higgs. kdirect engagement signal: Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8.
Phase 4 plan: 7 sequenced steps (request.h -> request.c -> video.c -> nv12_col128.c new -> image.c branch -> meson/Makefile -> build on higgs). NC12 tile geometry locked from kernel hevc_d_video.c math + ffmpeg/Kynesim av_rpi_sand_to_planar_y8 byte-offset formula. Risks + mitigations enumerated.
Phase 5 sonnet review explicitly requested per CLAUDE.md no-skip-reviews rule.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
b6a65fc692 |
phase0_pi5_hevc: close addendum with empirical higgs probe data
Live probe of rpi-hevc-dec on higgs (Pi CM5, kernel 6.12.75-rpt-rpi-2712, Debian 13 trixie) answers Phase 0 open questions Q1, Q2, Q5, Q6 empirically; Q3 partial; Q4 still open.
Q1 (EXT_SPS): NOT present. Only standard V4L2_CID_STATELESS_HEVC_*. Probe ctrl id 0xa97 returns EINVAL, the same gate iter2's has_hevc_ext_sps_rps_rkvdec uses. iter31 alpha-29 plumbing applies.
Q2 (hevc_start_code): default 0 "No Start Code"; matches our behaviour.
Q3 (NC12 SAND tile layout): partial. CAPTURE S_FMT for 1280x720 NC12 returns sizeimage=1382400 (linear NV12 byte count) but bytesperline=1080 (suspect, encodes SAND col count not linear stride). Need kernel-doc / driver-source read before writing detile primitive.
Q4 (DRM modifier round-trip): hwdownload rejects SAND-tiled drm_prime (-38 Function not implemented). Backend CPU-detile to NV12 is the safe path for Firefox.
Q5 (submission ordering): empirical ioctl trace shows canonical V4L2 stateless flow. Two notes for the backend: kdirect uses V4L2_MEMORY_DMABUF for both queues (we use MMAP for CAPTURE on rkvdec); kdirect does NOT need the iter25 SPS pre-seed pattern: rpi-hevc-dec takes explicit NC12 + dims directly.
Q6 (packaging): Debian 13 trixie. Phase 8 needs a debian/ tree, not just PKGBUILD. Decision in Phase 1.
Other findings: ffmpeg 7.1.3 from stock Debian is built with --enable-v4l2-request. kdirect engagement line: Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8. No libva ICD installed (only armada-drm_dri.so). mpv installable. Firefox 145 + rpi-firefox-mods present.
Phase 0 closed. Phase 1 opens with goal: HEVC bit-exact libva-vs-kdirect on higgs for 1280x720 Main 8-bit via the new RPI_HEVC_DEC driver_kind slot + NC12 detile primitive.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
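Below is a sketch of SAND128 byte-offset arithmetic and a row-wise detile into a linear plane. It assumes that bytesperline is the column height in rows (luma rows followed by chroma rows), not a column count. That reading fits the Q3 probe: for 1280x720, 720 luma + 360 chroma rows = 1080, and 10 columns x 1080 rows x 128 bytes = 1382400, the reported sizeimage. Two further points are assumptions: chroma is assumed to start at row `height` inside each column with no padding, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SAND_COL_W 128u /* bytes per SAND128 column */

/* Byte offset of (x, y) in an NC12 buffer: x is a byte column, y a row
 * counted from the top of each column, col_height the rows per column
 * (assumed to be what bytesperline reports). */
static size_t sketch_sand128_offset(unsigned x, unsigned y, unsigned col_height)
{
    return (size_t)(x / SKETCH_SAND_COL_W) * col_height * SKETCH_SAND_COL_W +
           (size_t)y * SKETCH_SAND_COL_W + (x % SKETCH_SAND_COL_W);
}

/* Copy `rows` rows of `width` bytes, starting at column row `row0`, into a
 * linear plane with pitch `dst_pitch`. The last column may be partial. */
static void sketch_sand128_to_linear(uint8_t *dst, size_t dst_pitch,
                                     const uint8_t *src, unsigned width,
                                     unsigned rows, unsigned row0,
                                     unsigned col_height)
{
    for (unsigned y = 0; y < rows; y++) {
        for (unsigned x = 0; x < width; x += SKETCH_SAND_COL_W) {
            unsigned n = width - x < SKETCH_SAND_COL_W ? width - x
                                                       : SKETCH_SAND_COL_W;
            memcpy(dst + (size_t)y * dst_pitch + x,
                   src + sketch_sand128_offset(x, row0 + y, col_height), n);
        }
    }
}
```

For the Y plane, call it with `rows = height` and `row0 = 0`. For the interleaved UV plane, use `rows = height / 2` and `row0 = height`, under the same no-padding assumption. A 1366-wide frame exercises the partial last column (1366 - 10 x 128 = 86 bytes).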
||
|
|
25b8a15e09 |
phase0_pi5_hevc: open Pi 5 / CM5 HEVC chapter (substrate doc only)
Empirical higgs probe (sibling session 2026-05-17) confirmed rpi-hevc-dec at /dev/video19 is V4L2 STATELESS, not stateful:
- Section header literally "Stateless Codec Controls"
- OUTPUT V4L2_PIX_FMT_HEVC_SLICE (parsed slices), not full-stream HEVC
- V4L2_CID_STATELESS_HEVC_* control set + slice_param_array[4096]
- CAPTURE NC12 / NC30 (V4L2_PIX_FMT_NV12_COL128 / _10_COL128, SAND 128-column tiled, Pi-specific)
So the Pi 5 HEVC HW path belongs HERE (request/stateless backend), not in a separate stateful project. Replaces the now-deleted libva-v4l2-stateful-fourier scaffold attempt.
phase0_pi5_hevc.md captures:
- Substrate (target host, backend baseline, empirical probe output)
- What carries forward unchanged (most of HEVC plumbing)
- What needs adding (RPI_HEVC_DEC driver_kind, NC12/NC30 video_format + detile primitive, image.c branch; small surface area)
- Six open questions Phase 1 must answer first (EXT_SPS presence, start_code default, SAND tile spec, drm_prime modifier round-trip, rpi-hevc-dec submission ordering quirks, packaging target OS)
- Phase 1 goal sketch (NOT locked) + Phase 3 baseline plan
No code in this commit. Phase 1 opens when higgs is up + the first two open questions are answered live.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
cf8cd9d2be |
h265: cap pred-weight + ref-list loops at VAAPI source size (15)
V4L2_HEVC_DPB_ENTRIES_NUM_MAX is 16, but VASliceParameterBufferHEVC::RefPicList is [2][15] and the eight delta_*_weight_lX / luma_offset_lX / delta_chroma_weight_lX / ChromaOffsetLX arrays are all [15]. Iterating the per-slot copy loops to 16 over-reads the VAAPI source by one element.
The bug was always there but hidden under -O3 (meson's default buildtype=release): GCC unrolled the inner loop and dead-folded the out-of-bounds load. Under -O2 (Arch makepkg CFLAGS) the canonical vectorised loop ran and produced a real SEGV at v4l2_request_drv_video.so + 0xb3a4 inside h265_fill_slice_params, breaking HEVC immediately after the package install on fresnel (iter38 5/5 baseline dropped to 4/5).
Define a local VA_HEVC_REF_LIST_LEN (15) and use it as the cap for the four offending loops. RefPicList and pred_weight_table copies now respect the source bound; the V4L2 destination still has 16 slots, and the upper one stays at memset-zero, which is correct.
Verified locally: -O2 build + package re-install restores HEVC to bit-exact PASS vs kdirect (sha 108f925bb6cbb6c9). iter38 5/5 baseline restored.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
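Here is the fix pattern reduced to one of the affected arrays: clear the whole destination, then bound the copy by the source length rather than the destination length. This is a minimal sketch. The two structs are hypothetical stand-ins for VASliceParameterBufferHEVC and the V4L2 pred-weight table, and `SKETCH_V4L2_DPB_MAX` stands in for V4L2_HEVC_DPB_ENTRIES_NUM_MAX so the snippet builds without libva or kernel headers.

```c
#include <stdint.h>
#include <string.h>

#define VA_HEVC_REF_LIST_LEN 15 /* VAAPI source arrays are [15] */
#define SKETCH_V4L2_DPB_MAX 16  /* stands in for V4L2_HEVC_DPB_ENTRIES_NUM_MAX */

/* Hypothetical stand-ins: one pred-weight array on each side. */
struct sketch_va_slice {
    int8_t luma_offset_l0[VA_HEVC_REF_LIST_LEN];
};
struct sketch_v4l2_pwt {
    int8_t luma_offset_l0[SKETCH_V4L2_DPB_MAX];
};

static void sketch_copy_luma_offsets(struct sketch_v4l2_pwt *dst,
                                     const struct sketch_va_slice *src)
{
    memset(dst, 0, sizeof(*dst)); /* slot 15 has no source: stays zero */
    /* Cap at the SOURCE length; looping to 16 would read src[15]. */
    for (int i = 0; i < VA_HEVC_REF_LIST_LEN; i++)
        dst->luma_offset_l0[i] = src->luma_offset_l0[i];
}
```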
||
|
|
c9f32aff49 |
iter39 Option B revert of 63fed87: P010 advertisement gated on is_10bit again
Phase 7 fix |
||
|
|
6bc12fe7e4 |
iter39 Option B: drop Hi10P + Main10 from RequestQueryConfigProfiles
Per Phase 7 close + user-directed Option B trigger (web research /
rockchip-mpp showed Hi10P is effectively impossible on the current
stack). Cross-test on ampere RK3588 confirmed the SAME failure mode
as fresnel RK3399 — both produce all-zero output via libva; kdirect
fails with EINVAL on both. The blocker is in ffmpeg-v4l2-request
userspace plumbing for the new uAPI controls Karlman's kernel patches
introduced, NOT in our backend or the kernel.
Sources confirming kernel + HW capable but userspace pending:
- lwn.net/Articles/950434: "to fully runtime test... you may need
upstream DRM commits, FFmpeg patches"
- patchwork.kernel.org Karlman v6 → v10 series on linux-media
- Rockchip RK3399 + RK3588 datasheets list 10-bit H.264 support
Stop enumerating Hi10P + Main10 so VAAPI consumers don't try the
broken path. The backend infrastructure (codec.c profile cases,
context.c NV15 CAPTURE + synthetic SPS bit_depth=2 + video_format
invalidation, image.c P010 reporting + NV15→P010 unpack, surface.c
RT_FORMAT_YUV420_10 guard + NV15 PRIME fourcc, nv15.c + nv15.h
unpack primitive, request.h is_10bit flag) is RETAINED — just
re-add the two profiles[index++] lines and bump the H264 guard
back to (-6) when upstream ffmpeg-vaapi V4L2 hwaccel learns 10-bit.
Memory: feedback_rk3399_h264_hi10p_advertised_not_functional.md
captures the empirical evidence for future iterations.
vainfo after this commit: 10 profiles (was 12), matches the iter38
baseline. iter38 5/5 PASS preserved (no other codec touched).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
63fed87bc5 |
iter39 fresnel fix: advertise P010 unconditionally in QueryImageFormats
ffmpeg-vaapi's hwcontext_vaapi calls vaQueryImageFormats during hwframes context setup, BEFORE vaCreateContext fires. Our previous gate on driver_data->is_10bit meant P010 wasn't in the catalog at that early query — ffmpeg's hwdownload then rejected pix_fmt=p010le with "Invalid output format p010le for hwframe download" and decode failed before our backend's CreateContext saw the 10-bit profile. Fix: advertise P010 unconditionally in QueryImageFormats. Safe because consumers ask for P010 only when their decode pipeline needs 10-bit, and our P010 unpack path in copy_surface_to_image is gated on image->format.fourcc == VA_FOURCC_P010 (independent of is_10bit). Verified on fresnel: with this fix, Hi10P decode advances past the hwdownload filter setup. (Run pending bundle to fresnel.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
a13215de45 |
iter39 fresnel fix: skip pre-S_FMT NV15 CAPTURE format probe
RK3399 rkvdec advertises NV15 in VIDIOC_ENUM_FMT(CAPTURE) only AFTER S_FMT(OUTPUT) + S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT. Pre-flight v4l2_find_format(NV15) always returns 0 → video_format stays NULL → CreateContext returns OPERATION_FAILED → ffmpeg-vaapi hwaccel init fails with "Failed to create decode context: 1".
Verified on fresnel (kernel 7.0-14 / linux-fresnel-fourier): v4l2-ctl -d /dev/video1 --list-formats → only NV12 enumerated.
Fix: for 10-bit profiles, skip the find_format probe and directly map to our NV15 video_format entry. The later S_FMT(CAPTURE) in the same RequestCreateContext path commits the actual NV15 mode once the synthetic-SPS injection sets bit_depth_luma_minus8=2.
Discovered during Phase 7 sub-profile verification: Criterion 1 (vainfo enumeration) PASSed but Criteria 2/3 (Hi10P/Main10 decode) failed with the hwaccel init error. iter38 5/5 baseline still PASSES (no regression; non-10-bit path unchanged).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
f0ef69d279 |
iter2 step4: wire h265_set_controls to populate EXT_SPS_*_RPS controls
Per Phase 4 plan + Phase 5 review amendments (SPS parse-and-cache,
per-fd gating).
src/h265.c additions:
- #include <errno.h>, the v4l2-hevc-ext-controls.h, and the
vendored gst/codecparsers/gsth265parser.h
- new static helper h265_populate_ext_sps_rps_cache(): walks
surface_object->source_data for an SPS NAL (nal_unit_type == 33)
using gst_h265_parser_identify_nalu; if found, calls
gst_h265_parser_parse_sps_ext (NOT gst_h265_parser_parse_sps —
the latter discards the per-RPS-entry EXT data we need); maps
GstH265ShortTermRefPicSet (base) + GstH265ShortTermRefPicSetExt
(carrying use_delta_flag[16], used_by_curr_pic_flag[16],
delta_poc_s0_minus1[16], delta_poc_s1_minus1[16]) into the V4L2
struct arrays; stores on driver_data->hevc_rps_cache_*
- non-IDR-frame handling: cache holds across frames, so frames
whose source_data lacks an SPS NAL reuse the previously-parsed
cached arrays (Phase 5 review item #3)
- controls[] grows from [5] to [7]; the 2 new entries are appended
after the standard 5 (SPS/PPS/SLICE_PARAMS/SCALING_MATRIX/
DECODE_PARAMS), gated by driver_data->has_hevc_ext_sps_rps_rkvdec
(per-fd probe result from Step 3) + the cache being valid
- field-by-field mapping mirrors GStreamer's
gst_v4l2_codec_h265_dec_fill_ext_sps_rps verbatim (the upstream
reference identified in Phase 0 prior-art survey)
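At the byte level, "walks surface_object->source_data for an SPS NAL" can be sketched as a plain Annex-B scan. This is an illustration only: the commit uses the vendored gst_h265_parser_identify_nalu, not a hand-rolled loop, and `sketch_find_hevc_sps` is hypothetical. The scan assumes Annex-B framing and the HEVC NAL header layout (1 forbidden bit, then the 6-bit nal_unit_type), so the type is `(byte >> 1) & 0x3f`. SPS is type 33.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEVC_NAL_SPS 33

/* Return the index of the NAL header byte of the first SPS in an Annex-B
 * buffer, or -1 if there is none. 4-byte start codes (00 00 00 01) match
 * via their trailing 00 00 01. Emulation prevention guarantees 00 00 01
 * never occurs inside a NAL payload, so no unescaping is needed here. */
static long sketch_find_hevc_sps(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            /* NAL header byte 0: forbidden_zero_bit(1) | nal_unit_type(6) | ... */
            unsigned type = (buf[i + 3] >> 1) & 0x3f;
            if (type == SKETCH_HEVC_NAL_SPS)
                return (long)(i + 3);
            i += 2; /* with the loop's i++, resume at the NAL header byte */
        }
    }
    return -1;
}
```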
src/request.h additions:
- struct request_data carries hevc_rps_cache_st (array pointer),
_st_count, hevc_rps_cache_lt, _lt_count, hevc_rps_cache_valid.
Single-slot cache (sps_id 0 only; multi-SPS streams would need
expanding). Stores POST-MAPPED V4L2 structs so request.h doesn't
need to know GstH265SPS / GstH265SPSEXT types.
Critical interpretation correction (Phase 5 review followup):
GstH265SPS has short_term_ref_pic_set[65] (base) but NOT
short_term_ref_pic_set_ext[]. The EXT array lives on a SEPARATE
GstH265SPSEXT struct accessed via gst_h265_parser_parse_sps_ext.
The 'plain' gst_h265_parser_parse_sps internally calls _ext with a
LOCAL discarded SPSEXT (see gsth265parser.c:2050). Our call must
use the _ext variant directly to keep the EXT data. Caught during
Step 4 first-build error.
Build verified: ninja -C build clean. .so is 759 KB (up from 485 KB
original, 682 KB after Step 2 vendor — the +80 KB is the new helper
+ extension).
iter2 Phase 6 Step 5 (install + reboot + smoke-test) is the F1
falsifier moment: if HEVC stops OOPSing, mechanism confirmed; if it
still OOPSes, loopback Phase 0 with re-opened kernel-agent#11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
393d02f413 |
iter2 step3: HEVC EXT_SPS_*_RPS UAPI header + runtime probe
src/hevc-ctrls/v4l2-hevc-ext-controls.h (NEW, MIT, ~95 LOC):
Verbatim mirror of Linux 7.0 V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
and _LT_RPS control IDs + struct definitions + flag macros. Each
symbol is ifndef-guarded so when ampere's linux-api-headers
eventually bumps to 7.0+, the kernel header takes precedence and
this shim silently no-ops. Citation block links the upstream
Casanova v8 series.
Kernel UAPI headers carry the Linux-syscall-note exception, which keeps
userspace use of their struct definitions out of GPL inheritance, so
copying them into MIT userspace is fine.
src/request.h: added has_hevc_ext_sps_rps_rkvdec + _hantro bool
fields on struct request_data — pair-of-flags layout mirrors
video_fd_rkvdec / video_fd_hantro (iter38 multi-device-probe
pattern, per feedback_multi_device_probe_design). Phase 5 review
identified single-scalar storage as a silent-misbehavior risk
across device-switch boundaries.
src/request.c:
- new probe_hevc_ext_sps_rps_controls(fd) helper: queries the two
new CIDs via VIDIOC_QUERYCTRL; returns true iff both register.
RK3399 rkvdec (linux 6.x or 7.x without VDPU381/383 bindings)
returns false; RK3588 rkvdec (VDPU381/383) returns true.
- probe each driver_data->video_fd_rkvdec / _hantro after the
iter38 multi-device-probe block at VA_DRIVER_INIT time
- logs a line if rkvdec supports it (diagnostic for Phase 7)
src/meson.build: added the new UAPI header to the headers list.
Build verified: ninja -C build clean, .so produced. The new probe
runs at driver init and stores the result, but nothing CONSUMES the
result yet — that's Step 4 (h265_set_controls wiring).
Per ampere-kernel-decoders campaign iter2 Phase 4 step 3 (amended
by Phase 5 review item 'per-fd storage').
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
9f7437e8ee |
iter2 step2: GLib/GStreamer compat shim, build succeeds
Vendored gsth265parser + nalutils + gstbitreader + gstbytereader (the
Step 1 commit) compile cleanly against libc + libv4l2 only after
adding 1 compat translation unit + 5 stub headers, no edits to the
vendored .c/.h files themselves.
src/h265_parser/gst_compat.{h,c} — new files (MIT, original work):
- GLib type aliases (gboolean, gchar, gint*, guint*, gsize, gpointer)
- Memory helpers (g_malloc/g_free as #define free, g_memdup2 inline)
- Asserts as no-op + parser-return-code-propagation
- All GST_DEBUG/INFO/WARNING/ERROR/LOG/FIXME as no-ops (the parser
is heavy on debug logging; we compile it all out)
- GArray implementation (~100 LOC, just enough for gsth265parser.c's
24 call sites)
- GList full struct with .data/.next/.prev so callers compile;
list-manipulation functions abort() — dead code paths only
- Byte-order read/write macros (GST_READ_UINT8/16/24/32/64_LE/BE,
GST_WRITE_UINT8/16/24/32_BE) — aarch64 LE inlines
- g_once_init_enter/leave as simple gate
- G_MAXUINT*, G_MAXINT*, G_MINxxx, G_GNUC_* attribute macros, etc.
- Opaque GstBuffer/GstMemory/GstMapInfo + abort-stub functions for
the encoder-side SEI-insertion paths the libva backend never invokes
- gst_util_ceil_log2 real impl (used by slice-header parser; dead
for our SPS-only call path but cheaper to implement than stub)
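Byte-order helpers in the style of GST_READ_UINT16_BE / GST_READ_UINT32_BE can also be written with no endianness assumption, by assembling bytes explicitly. The sketch below shows that portable alternative. It is not a copy of gst_compat.h, which the commit describes as aarch64 little-endian inlines, and the function names are hypothetical.

```c
#include <stdint.h>

/* Byte-by-byte assembly: correct on any host endianness and alignment. */
static inline uint16_t sketch_read_u16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static inline uint16_t sketch_read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static inline uint32_t sketch_read_u24_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

static inline uint32_t sketch_read_u32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

static inline void sketch_write_u16_be(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}
```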
src/h265_parser/gst/{gst.h,base/base-prelude.h,base/gstbitwriter.h,
codecparsers/codecparsers-prelude.h,glib-compat-private.h} — 5 new
stub headers (MIT). All include gst_compat.h. gstbitwriter.h adds
abort-stub functions for the bit-writer API (used by nalutils.c's NAL
emulation-prevention encoder path — dead code for the parse-only
libva backend).
src/meson.build — added the 5 new .c source files and 10 new .h
headers; added include_directories('h265_parser') to the include path
so the vendored files' '#include <gst/base/...>' style references
resolve to the stub headers + actual vendored files in the local
tree.
Build verified: ninja -C build produces v4l2_request_drv_video.so
(682 KB, up from 485 KB pre-vendor — the +200 KB is the vendored
parser code). nm shows gst_h265_parse_sps, gst_h265_parse_sps_ext,
gst_h265_parser_identify_nalu, and the other functions we need for
Step 4 are present in the binary.
Two #warning messages from gsth265parser.h about API stability are
upstream-intentional and harmless ('The H.265 parsing library is
unstable API and may change in future').
This commit completes Step 2 of ampere-kernel-decoders iter2 Phase 6.
Backend remains functionally identical to pre-iter2 — the new code
compiles + links but is not yet called from h265_set_controls (that's
Step 4). Existing 5 codecs continue to work as before.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
c9b7fcff50 |
iter2 step1: vendor GStreamer 1.28.2 H.265 parser unchanged
Source: gitlab.freedesktop.org/gstreamer/gstreamer @ commit 43421c2a5b8a (refs/tags/1.28.2). All 8 vendored files copied verbatim into src/h265_parser/:
  gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c (168 KB)
  gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.h ( 92 KB)
  gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.c (13 KB)
  gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.h ( 8 KB)
  gstreamer/libs/gst/base/gstbitreader.c ( 8 KB)
  gstreamer/libs/gst/base/gstbitreader.h ( 10 KB)
  gstreamer/libs/gst/base/gstbytereader.c ( 39 KB)
  gstreamer/libs/gst/base/gstbytereader.h ( 25 KB)
Total ~11 KLOC, LGPL v2.1+ per original headers (Intel + Sreerenj Balachandran + others). LGPL headers preserved verbatim. Backend's existing COPYING.LGPL covers redistribution.
** Build is INTENTIONALLY BROKEN at this commit. ** GLib dependencies (GArray, g_malloc, gboolean, GST_DEBUG, etc.) are not yet satisfied; src/Makefile.am is not yet updated to include these files. Step 2 performs the GLib-to-libc mechanical adaptation; Step 3 wires the header + Makefile.
This vendor-unchanged commit is the upstream-tracking baseline. When GStreamer ships a parser bug fix, the future-sync workflow is: git diff (this commit)..HEAD -- src/h265_parser/ to surface our adaptations, then rebase those over the upstream fix.
Per ampere-kernel-decoders campaign iter2 Phase 4 §Step 1 (/home/mfritsche/src/ampere-kernel-decoders/phase4_plan_iter2.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
a8a91d92d6 |
Revert "ampere iter2: HEVC EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission (VDPU381)"
This reverts commit
|
||
|
|
f61f736380 |
ampere iter2: HEVC EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission (VDPU381)
Fixes the rkvdec_hevc_prepare_hw_st_rps out-of-bounds kernel OOPS that blocked HEVC decode on ampere (RK3588) per marfrit/libva-v4l2-request-fourier#3 and ampere-fourier iter1 close.
Mechanism (Phase 5 amendment to issue body): The new EXT_SPS controls are registered as V4L2_CTRL_FLAG_DYNAMIC_ARRAY in vdpu38x_hevc_ctrl_descs (rkvdec.c:279/284) with cfg.dims = { 65 }. The v4l2-ctrl framework init-allocates 1 zeroed element (ctrls-core.c:2116). When num_short_term_ref_pic_sets > 1, rkvdec_hevc_prepare_hw_st_rps (rkvdec-hevc-common.c:393-405) iterates idx 0..N-1 and overruns the 1-element kernel allocation. Submitting an N-element dynamic-array control via S_EXT_CTRLS extends the framework allocation.
Userspace fix:
- VIDIOC_QUERY_EXT_CTRL probe at first HEVC CreateContext sets driver_data->has_ext_sps_rps (true on VDPU381/383, false on legacy RK3399, where the control is unregistered, so fresnel iter38 5/5 + iter39 sub-profile paths are byte-identical to pre-iter2).
- When set, h265_set_controls appends EXT_SPS_ST_RPS + _LT_RPS as calloc'd zero arrays, sized by VAAPI's count fields and capped at H.265 §7.4.3.2 spec maxima (ST 64, LT 32). Min 1 (kernel rejects 0).
- Free post-S_EXT_CTRLS.
Decode correctness scope: VAAPI does NOT expose per-set st_ref_pic_set syntax elements (delta_idx_minus1, delta_rps_sign, etc.), as confirmed in va_dec_hevc.h. All-zero entries give an empty inter-pred RPS per set, which is correct for IDR-only streams and incorrect for streams with inter-pred RPS dependence. iter2 acceptance: stop the OOPS. Decode-correctness for inter-RPS content is a known follow-up requiring either bitstream-snoop or SPS-passthrough via a new VAAPI extension.
Files:
- include/hevc-ctrls.h: #ifndef-guarded fallback definitions for V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS + structs (ampere host is on linux-api-headers 6.19-1; the new CIDs land in 7.0).
- src/request.h: driver_data->has_ext_sps_rps (persists for driver lifetime; gated solely by HEVC code path so cross-codec leakage impossible).
- src/context.c: probe at HEVC CreateContext via v4l2_query_ext_ctrl.
- src/h265.c: controls[5] → controls[7]; #include <hevc-ctrls.h> (replaces <linux/v4l2-controls.h>) for forward UAPI compatibility.
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1): clean .so, 0 new warnings. Fresnel cross-device safety: legacy RK3399 rkvdec_ctrl table omits the CIDs; probe returns false; new code path never executes. iter39 sub-profile work (commits |
||
|
|
8746690739 |
iter39: add NV15 → P010 unpack self-test (tests/test_nv15_unpack.c)
Pure-C unit test for nv15_unpack_plane_to_p010, independent of any V4L2
hardware. Verifies bit layout against the spec at
Documentation/userspace-api/media/v4l/pixfmt-nv15.rst by packing known
10-bit pixel values, running the unpack, and asserting P010 output
matches pixel<<6.
Coverage:
- zero, all-max
- 8 known position/spread vectors
- widths {1, 2, 3, 7, 8} including remainder paths
- multi-row with stride padding
- chroma-shape (half-height)
Build + run:
cc -Wall -Werror -O2 -o test_nv15_unpack \
tests/test_nv15_unpack.c src/nv15.c
./test_nv15_unpack
Confirmed PASS on noether (x86_64 native). Catches the highest-risk
class of regression in iter39 — silent bit-shift errors in the unpack —
without requiring fresnel hardware.
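The bit layout this test checks can be sketched as a scalar row unpack. This is a minimal NV15 → P010 sketch, assuming the LSB-first packing the commit cites: sample i occupies bits 10*i .. 10*i+9 of the little-endian byte stream, four samples per 5 bytes. `sketch_nv15_row_to_p010` is hypothetical and does not share the signature of nv15_unpack_plane_to_p010. Each 10-bit sample starts at a bit offset of 0, 2, 4 or 6, so it always fits in two consecutive bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack one row of `width` NV15 samples (4 x 10-bit in 5 bytes, LSB-first)
 * into P010 words, which keep the sample in the top 10 bits (value << 6).
 * A partial final group (width % 4 != 0) needs no special case. */
static void sketch_nv15_row_to_p010(uint16_t *dst, const uint8_t *src,
                                    size_t width)
{
    for (size_t i = 0; i < width; i++) {
        size_t bit = i * 10;      /* first bit of sample i */
        size_t byte = bit / 8;
        unsigned shift = bit % 8; /* always 0, 2, 4 or 6 */
        unsigned pair = src[byte] | ((unsigned)src[byte + 1] << 8);

        dst[i] = (uint16_t)(((pair >> shift) & 0x3ffu) << 6);
    }
}
```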
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
662f8874ba |
iter39 α-31: H264 Hi10P + HEVC Main10 sub-profile support (10-bit, rkvdec NV15)
Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request
backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit
values packed in 5 bytes per element); VAAPI consumers receive standard
VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image.
VP9 Profile 2 explicitly NOT added — RK3399 rkvdec kernel ctrl table
caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs).
Touchpoints (per Phase 5 sonnet-architect review amendments):
- include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it)
- src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per
Documentation/userspace-api/media/v4l/pixfmt-nv15.rst)
- src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find)
- src/codec.c: pixelformat_for_profile cases for Hi10P + Main10
- src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10
advertisement for 10-bit profiles
- src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic
SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth
transition (sibling to iter38 device-switch invalidation), is_10bit flag
- src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export
- src/image.c: P010 reporting in DeriveImage + QueryImageFormats,
P010-aware sizing in CreateImage, NV15 → P010 unpack call in
copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010)
- src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC
per-codec paths
- src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag
Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi
hwdownload, mpv vaapi-copy, and any consumer using vaGetImage works
end-to-end. PRIME-path consumers that only know NV12/P010 must use the
COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the
correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is
follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a
GPU-accessible buffer).
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0,
libdrm 2.4.133): clean build, .so produced, 0 new warnings.
Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec.
rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15;
ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not
in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196,
rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
7ac934e0c5 |
iter38b: bounds check uses MAX_PROFILES (11), not MAX_CONFIG_ATTRIBUTES (10)
Latent bug surfaced by iter38 multi-device probe. profiles[] array in RequestQueryConfigProfiles is sized by V4L2_REQUEST_MAX_PROFILES (set as context->max_profiles=11 in VA_DRIVER_INIT), but the bounds checks used V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES (10). Pre-iter38 only a single device's profiles were enumerated, total ≤9, so the off-by-one never bit. With iter38's rkvdec+hantro union (10 profiles total across MPEG2/H264/HEVC/VP8/VP9), the last enumerator (VP9) hit index=9 with the check 'index < 10-1 = 9' → skipped. |
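The shape of the fix is that the guard must use the capacity of the array actually being filled. The hypothetical sketch below also reproduces the failure: with a cap of 10 - 1 = 9, the tenth profile is dropped. The names and the plain-int profile IDs are ours.

```c
#define SKETCH_MAX_PROFILES 11 /* like V4L2_REQUEST_MAX_PROFILES */
#define SKETCH_MAX_CONFIG_ATTRIBUTES 10

/* Append up to `n` candidate profiles into `out`, which holds `cap`
 * entries. Returns how many were written. */
static int sketch_enumerate_profiles(int *out, int cap,
                                     const int *candidates, int n)
{
    int index = 0;

    for (int i = 0; i < n && index < cap; i++)
        out[index++] = candidates[i];
    return index;
}
```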
||
|
|
c56a77bd4c |
iter38: multi-device probe — single libva session serves all 5 codecs
Probe BOTH rkvdec and hantro-vpu at VA_DRIVER_INIT and keep their
{video,media}_fd pairs in driver_data. RequestQueryConfigProfiles
enumerates the union of supported profiles from all open fds.
RequestCreateConfig retargets driver_data->{video,media}_fd to the
device that serves the requested profile; if a switch is needed
(active fd is wrong), tears down output_pool, capture_pool, video_format
cache, and fmt_valid so the next RequestCreateContext rebuilds them
on the new device.
Profile→device map (RK3399-shaped):
H264 / HEVC / VP9 → rkvdec
MPEG-2 / VP8 → hantro-vpu
Honours LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH explicit overrides
(skips alt-probe when those are set).
Closes the 'libva multi-device probe' open item from iter36/iter37
campaign-close.
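The retarget-and-invalidate step can be sketched as follows, using the RK3399-shaped map above. All names are hypothetical. The two booleans stand in for the output_pool / capture_pool / video_format / fmt_valid state that the commit tears down.

```c
#include <stdbool.h>

enum sketch_device { SKETCH_DEV_RKVDEC = 0, SKETCH_DEV_HANTRO = 1 };
enum sketch_codec { SKETCH_H264, SKETCH_HEVC, SKETCH_VP9, SKETCH_MPEG2, SKETCH_VP8 };

/* RK3399-shaped profile -> device map. */
static enum sketch_device sketch_device_for(enum sketch_codec c)
{
    switch (c) {
    case SKETCH_MPEG2:
    case SKETCH_VP8:
        return SKETCH_DEV_HANTRO;
    default: /* H264, HEVC, VP9 */
        return SKETCH_DEV_RKVDEC;
    }
}

struct sketch_driver {
    int video_fd, media_fd; /* active pair */
    int fds[2][2];          /* [device][0 = video, 1 = media] */
    bool pools_valid;       /* stands in for output_pool + capture_pool */
    bool fmt_valid;         /* stands in for video_format cache + fmt_valid */
};

/* Point the active pair at the device serving `c`. On a real switch, drop
 * per-device state so the next CreateContext rebuilds it. */
static bool sketch_retarget(struct sketch_driver *d, enum sketch_codec c)
{
    enum sketch_device dev = sketch_device_for(c);

    if (d->video_fd == d->fds[dev][0])
        return false; /* already on the right device */
    d->video_fd = d->fds[dev][0];
    d->media_fd = d->fds[dev][1];
    d->pools_valid = false;
    d->fmt_valid = false;
    return true;
}
```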
|
||
|
|
25d3e5f06f |
iter37: revert α-26 — decode_params.short_term_ref_pic_set_size back to 0
α-26 (iter26) wrote VAAPI's picture->st_rps_bits to the V4L2 decode_params field of the same name based on field-name match. Per V4L2 spec, this field is the bit-count of st_ref_pic_set() *in the SPS* — VAAPI doesn't expose that. The slice-header bit-count (which IS what VAAPI's st_rps_bits provides) belongs in slice_params->short_term_ref_pic_set_size (handled correctly in α-29). rkvdec doesn't read decode_params->short_term_ref_pic_set_size, so the misroute was harmless but stale. This revert restores spec-correct semantics (0 when SPS bit-count is unknown). Cosmetic cleanup; no functional change. |
||
|
|
7db15a5685 |
iter36: remove env-gated DIAG probes (iter29/30/33/35)
Cleans up the campaign's exploratory env-gated dumps now that all bugs are fixed:
- iter29 LIBVA_HEVC_DUMP_SLICE_TAIL (h265.c): refuted 40-byte inflation theory
- iter30 LIBVA_TS_SCALE (picture.c): refuted timestamp magnitude theory
- iter33 LIBVA_VP8_DUMP_FRAME (vp8.c): led to α-30 fix
- iter35 LIBVA_MPEG2_DUMP_FRAME (mpeg2.c): confirmed MPEG-2 ctrls correct
Total: -131 lines / +7 lines (α-7 comment refresh).
Preexisting framework env knobs retained:
- LIBVA_V4L2_DUMP_OUTPUT (picture.c α-16)
- LIBVA_V4L2_DUMP_CAPTURE (surface.c)
- LIBVA_V4L2_ZERO_CAPTURE (picture.c)
- LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH / NO_AUTODETECT (request.c)
The 3 load-bearing fixes remain unchanged:
- α-25 (rkvdec image_fmt pre-seed, src/context.c)
- α-29 (slice_params.short_term_ref_pic_set_size, src/h265.c)
- α-30 (VP8 OUTPUT header prepend, src/picture.c) |
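Below is a sketch of an env-gated knob check of the kind removed here. Treating only the exact value "1" as enabled is an assumption, based on the VP8 knob being documented as `LIBVA_VP8_DUMP_FRAME=1`. The helper names are hypothetical. Keeping the value check separate from getenv() makes it testable without touching the environment.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Enabled only for the exact value "1"; unset, empty or "0" leave the
 * default path untouched. */
static bool sketch_knob_value_enabled(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

static bool sketch_env_enabled(const char *name)
{
    return sketch_knob_value_enabled(getenv(name));
}
```

A call site would then read, for example, `if (sketch_env_enabled("LIBVA_V4L2_DUMP_OUTPUT")) { ... }`, which costs a single getenv() when the knob is unset.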
||
|
|
48fd0288c3 | iter35 DIAG: env-gated dump of v4l2_ctrl_mpeg2_* contents | ||
|
|
7e0848d7d2 |
iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer
ROOT CAUSE FIX for VP8 libva decode garbage output.
ffmpeg-vaapi's vaapi_vp8.c:191-192 STRIPS the VP8 uncompressed header (3 bytes for interframe, 10 bytes for keyframe) before submitting the slice data via VAAPI. ffmpeg-v4l2request (kdirect) KEEPS the header in its OUTPUT buffer.
Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349) hard-codes 'first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3' as the byte offset into OUTPUT where the first compressed partition starts. It uses this offset for:
- mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8
- dct_part_offset = first_part_offset + first_part_size
Without the header, every offset is wrong, the entropy decoder spins on the wrong bytes, and every frame decodes to garbage.
Fix: in codec_store_buffer for VAProfileVP8Version0_3, prepend header_size bytes (10 keyframe / 3 interframe) of zeros to OUTPUT before the slice data memcpy. Hantro skips these bytes for actual parsing (uses ctrl-struct values instead), so zero-fill is fine.
Empirical: iter33 kernel printk in vpu2_vp8_dec_run dumped the v4l2_ctrl_vp8_frame struct for libva vs kdirect and confirmed byte-identical control fields. Only the OUTPUT buffer bytes differed, traced to ffmpeg-vaapi's header stripping. |
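The sketch below shows the OUTPUT layout this fix produces and the two hantro offsets that depend on it. Names are hypothetical. The `cap` bounds check is our addition, in the spirit of an OUTPUT-slot capacity check, and is not taken from this commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* VP8 uncompressed frame header size: 10 bytes keyframe, 3 interframe. */
static size_t sketch_vp8_header_size(bool keyframe)
{
    return keyframe ? 10 : 3;
}

/* Zero-filled header bytes, then the slice data, into `out` (capacity
 * `cap`). Returns bytes written, or 0 if the result would not fit. */
static size_t sketch_vp8_fill_output(uint8_t *out, size_t cap, bool keyframe,
                                     const uint8_t *slice, size_t slice_len)
{
    size_t hdr = sketch_vp8_header_size(keyframe);

    if (slice_len > cap || hdr > cap - slice_len)
        return 0;
    memset(out, 0, hdr);
    memcpy(out + hdr, slice, slice_len);
    return hdr + slice_len;
}

/* The hantro-side offsets: both assume the header bytes are present. */
static size_t sketch_vp8_mb_offset_bits(bool keyframe, size_t first_part_header_bits)
{
    return sketch_vp8_header_size(keyframe) * 8 + first_part_header_bits + 8;
}

static size_t sketch_vp8_dct_part_offset(bool keyframe, size_t first_part_size)
{
    return sketch_vp8_header_size(keyframe) + first_part_size;
}
```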
||
|
|
bf3e3d8587 | iter33: extend VP8 DIAG to dump VAAPI probability struct directly | ||
|
|
4b3c21b105 |
iter33 DIAG: env-gated dump of v4l2_ctrl_vp8_frame contents
LIBVA_VP8_DUMP_FRAME=1 prints the v4l2_ctrl_vp8_frame struct fields to stderr before VIDIOC_S_EXT_CTRLS. Goal: diff libva-side struct against expected kdirect-side values for VP8 frame-2+ divergence (libva produces non-trivial but wrong output; kdirect VP8 byte-equal to SW). Env-gated, no behavior change otherwise. |