37 Commits

Author SHA1 Message Date
claude-noether 9fa18f2312 av1: populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls
Implements the libva-side portion of issue #11 — replaces PR #10's
no-op AV1 dispatch with a real av1_set_controls that maps VAAPI's
VADecPictureParameterBufferAV1.seq_info_fields + scalar fields onto
struct v4l2_ctrl_av1_sequence (the kernel uAPI control declared at
linux/v4l2-controls.h:2891-2919).

Daemon-track context (issue #11 daemon side, operator-owned):
ffmpeg-vaapi splits the AV1 bitstream client-side and strips the
OBU_SEQUENCE_HEADER before delivery; the V4L2 OUTPUT buffer contains
only OBU_FRAME_HEADER + OBU_TILE_GROUP.  libdav1d in the daedalus
daemon cannot parse this — it expects a complete OBU stream.  The
daemon side has to synthesise OBU_SEQUENCE_HEADER from the SEQUENCE
ctrl and prepend it to the slice bitstream.  This libva-side change
just makes the SEQUENCE ctrl populated and queued via S_EXT_CTRLS;
the daemon track is the consumer.

Three small touch points beyond the new src/av1.{c,h}:

  - src/surface.h: add an av1 leaf to surface->params holding
    VADecPictureParameterBufferAV1.  Slice params intentionally
    absent — the daedalus daemon consumes the slice OBU bytes
    directly from the OUTPUT buffer; no per-tile-group struct →
    OBU re-synthesis required from libva today.
  - src/picture.c: copy the picture-param buffer into the new leaf
    in RenderPicture, mirror of the per-codec memcpy pattern, plus
    call av1_set_controls from codec_set_controls (replacing the
    no-op).
  - src/meson.build: register src/av1.c.

Sequence-field mapping covers everything VAAPI exposes at the
sequence level (12 of 18 V4L2_AV1_SEQUENCE_FLAG_* bits + the four
scalars).  Bits VAAPI doesn't carry at the sequence level
(WARPED_MOTION, REF_FRAME_MVS, SUPERRES, RESTORATION,
SEPARATE_UV_DELTA_Q) stay clear; per-frame consumers (libdav1d via
the daemon, vpu981 via the hardware path) read those from the
OBU_FRAME_HEADER that is already in the slice buffer anyway.  See
feedback memory `feedback_vaapi_blind_to_some_hevc_sps_fields` for
the precedent.

Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0,
linux uAPI v4l2-controls.h sizeof(struct v4l2_ctrl_av1_sequence)==12):
clean meson + ninja link of v4l2_request_drv_video.so, vainfo
enumerates VAProfileAV1Profile0 via daedalus_v4l2 slot, av1_set_controls
symbol present.

Out of scope on this PR (operator-track, issue #11 follow-up):
  - daedalus-v4l2 kernel module wire-protocol extension (daedalus_
    collect_av1_meta + AV1 ctrl request_setup).
  - daedalus daemon OBU synthesiser (~400 LoC AV1 OBU encoder in
    daemon/src/av1_obu_synth.{c,h}).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 21:13:07 +02:00
marfrit 9a9cfd05db Merge pull request 'picture: no-op codec_set_controls case for VAProfileAV1Profile0' (#10) from noether/picture-av1-noop into master
Reviewed-on: marfrit/libva-v4l2-request-fourier#10
2026-05-20 19:07:12 +00:00
marfrit 96d70af674 picture: no-op codec_set_controls case for VAProfileAV1Profile0
picture.c's codec_set_controls() switch was falling through to the
default case for VAProfileAV1Profile0, returning
VA_STATUS_ERROR_UNSUPPORTED_PROFILE.  Result: vaEndPicture failed
with status 12 ("requested VAProfile is not supported"), no OUTPUT
buffer ever got queued, and the daedalus_v4l2 daemon never saw a
REQ_DECODE for AV1.

config.c's VAProfileAV1Profile0 case (line 84-93) explicitly notes
"Decode-side ctrl dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET
WIRED on master — vainfo will list the profile + CreateConfig
succeeds, but consumers that submit decode buffers hit a NOP path".
The NOP path was never actually wired in picture.c — it hit the
default UNSUPPORTED_PROFILE branch instead.

Fix: add a VAProfileAV1Profile0 case that just `break;`s through
without setting V4L2 controls.  For the daedalus_v4l2 daemon path
this is exactly the right shape — AV1 frame data is self-describing
per OBU stream (no separate SPS/PPS controls needed at the V4L2
boundary), so the OUTPUT buffer alone is sufficient for the kernel
to forward to the daemon.

Verified on higgs: ffmpeg -hwaccel vaapi -i av1.mkv now actually
queues frames to /dev/video2 and the daemon's libdav1d context opens.
Decode itself still fails (libdav1d wants the AV1 sequence header
OBU, which ffmpeg-vaapi sends via VAPictureParameterBufferAV1 not
via the slice buffer) — separate issue, needs an OBU sequence-header
synthesiser in the daedalus daemon (analogous to the new H.264
SPS/PPS NAL synth in daedalus-v4l2/daemon/src/h264_nal_synth.c).
That sequence-header synth work is a substantial follow-up; this
patch unblocks AV1 reaching the daemon at all.

For RK3588 vpu981 (the originally-planned AV1 target), this
remains a true NO-OP — when V4L2_CID_STATELESS_AV1_* dispatch
lands from the av1-iter1 operator branch, replace the no-op with
av1_set_controls(...).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 20:58:57 +02:00
marfrit c1bb444d07 Merge pull request 'h264: max_num_ref_frames fallback + libva-boundary instrumentation (#8)' (#9) from claude-noether/libva-v4l2-request-fourier:noether/h264-3-set-controls-bitstream-bug-8 into master
Reviewed-on: marfrit/libva-v4l2-request-fourier#9
2026-05-20 18:19:03 +00:00
claude-noether 0791f8e612 h264: max_num_ref_frames fallback + libva-boundary instrumentation
Closes the libva-side portion of marfrit/libva-v4l2-request-fourier#8.

Two small additions to h264_set_controls:

1. When VAPicture->num_ref_frames is 0 (older ffmpeg-vaapi paths /
   some daedalus_v4l2 consumers), count valid (non-INVALID) DPB
   entries in ReferenceFrames[16]. If even that returns 0, fall back
   to a per-profile spec minimum (1 for baseline, 4 for main/high).
   Hardware decoders (rkvdec, hantro, rpi-hevc-dec) tolerated the
   prior 0; libavcodec-via-daedalus enforces sps.max_num_ref_frames
   strictly and rejected every frame.

2. One request_log line at function entry dumping the raw VAAPI
   fields (seq_fields.value, pic_fields.value, num_ref_frames,
   bit_depth_*, picture_*_in_mbs_minus1). Disambiguates "ffmpeg-vaapi
   never populated" from "daedalus_v4l2 wire protocol corrupted" for
   the bit-fields-read-as-zero portion of issue #8.

Out of scope here (separate issue if pursued): profile_idc and
level_idc remain session-derived. VAAPI's VAPictureParameterBufferH264
omits both (verified higgs libva 2.22.0-3, /usr/include/va/va.h:
3571-3622) — same VAAPI-blindspot family as the HEVC SPS fields. A
real fix requires SPS-NAL parsing from surface->source_data OR a
daedalus wire-protocol pass-through; both are operator design calls,
not a libva-only patch.

Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0):
clean ninja link of v4l2_request_drv_video.so, vainfo enumerates all
8 codec profiles, no init regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 20:17:27 +02:00
marfrit 989833114a Merge pull request 'config: include video_fd_daedalus in profile enumeration probe' (#7) from claude-noether/libva-v4l2-request-fourier:noether/libva-2-config-profile-enum-daedalus into master
Reviewed-on: marfrit/libva-v4l2-request-fourier#7
2026-05-20 14:52:11 +00:00
marfrit d1ba4625d2 config: include video_fd_daedalus in profile enumeration probe
LIBVA-2 follow-up.  RequestQueryConfigProfiles walks each known
decoder fd via any_fd_supports_output_format() and adds a VAProfile*
for each codec OUTPUT format the V4L2 device advertises.  The fd
list missed video_fd_daedalus — so on a Pi 5 with rpi-hevc-dec
primary + daedalus_v4l2 alt, only S265 (HEVC) was probed and the
H.264 / VP9 / AV1 profiles never got enumerated.

Effect on higgs: ffmpeg -hwaccel vaapi -i h264_test.mp4 reported
"No support for codec h264 profile 578" before the per-codec
dispatch in request_switch_device_for_profile could fire — the
profile-578 (H264 Constrained Baseline) check happened during
hwaccel init, found nothing in the libva profile list, and bailed
without ever calling into the daedalus path.

Fix: extend the fds[] array in any_fd_supports_output_format from
5 to 6 entries, with the sixth being video_fd_daedalus when
HAVE_DAEDALUS_V4L2 is on (and -1 otherwise so it's skipped by the
`if (fds[i] < 0) continue;` guard).  After the fix, daedalus_v4l2's
OUTPUT format menu (VP9F + AV1F + S264) gets seen, and Request-
QueryConfigProfiles returns VP9Profile0 + AV1Profile0 + the H264*
profiles, all of which then route through the LIBVA-1 'd' kind
override in request_switch_device_for_profile.

Verified on higgs:

  Before:
    vainfo: Supported profile and entrypoints
          VAProfileHEVCMain               : VAEntrypointVLD
    (only HEVC; H264/VP9/AV1 not enumerated)

  ffmpeg vaapi -i h264 → "No support for codec h264 profile 578"

Build clean on boltzmann (only config.c.o + request.c.o recompile).

Backward-compatible on RK3399/3588 — the new slot is gated by
HAVE_DAEDALUS_V4L2 *and* video_fd_daedalus >= 0; both stay false in
those deployments.  Existing 5-fd probe order unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 16:45:33 +02:00
claude-noether c332d34643 Merge pull request 'request: route VP9/AV1/H.264 to daedalus_v4l2 on Pi 5 mixed deploy' (#6) from claude-noether/libva-v4l2-request-fourier:noether/libva-1-per-codec-dispatch into master 2026-05-20 08:53:04 +00:00
marfrit 6173a8da8e request: route VP9/AV1/H.264 to daedalus_v4l2 on Pi 5 mixed deploy
LIBVA-1 — when both rpi-hevc-dec and daedalus_v4l2 are loaded, finish
the per-codec dispatch so HEVC goes to rpi-hevc-dec (existing 'p'
override) and VP9 / AV1 / H.264 go to the daedalus daemon ('d').

Before this change the multi-device-probe accepted only ONE driver
plus a fixed alt slot (rkvdec↔hantro-vpu); on a Pi 5 with both decoders
the find_codec_device() walk preferred rpi-hevc-dec by known_decoder_
drivers[] order and never opened daedalus_v4l2, so VP9/AV1/H.264 frames
hit rpi-hevc-dec's S_FMT and failed.

Changes:

  - request.c multi-device-probe: when primary = rpi-hevc-dec, alt =
    daedalus_v4l2 (when HAVE_DAEDALUS_V4L2 is on); symmetric handling
    in the daedalus_v4l2 primary branch so alt = rpi-hevc-dec.  This
    preserves the iter40 fallback (no daedalus → alt = NULL) when the
    build option is off.

  - request.c alt-driver opening block: generalized from the iter38
    rkvdec/hantro pair to also dispatch into video_fd_rpi_hevc_dec and
    video_fd_daedalus slots.  Defensive close on unknown alt-driver
    name (shouldn't happen — primary_driver branches gate the choices —
    but keeps the slot tally clean if a future driver name is added
    above without wiring up the dispatch here).

  - request_switch_device_for_profile: added 'd' kind handler +
    profile override block.  When daedalus is open, VP9 / AV1 / H.264*
    route to it.  HEVC stays on rpi-hevc-dec via the existing 'p'
    override.  AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was
    probed, so the override only fires on hosts where vpu981 stayed
    -1 (i.e. Pi 5).

  - RequestTerminate: close the daedalus_v4l2 fd pair on teardown
    (was leaking — caught while reviewing the alt-driver expansion).

Build: meson + ninja clean on boltzmann (only pre-existing GStreamer
H265 parser noise).  Behaviour on RK3399/3588 boxes unchanged — the
new branches are gated by HAVE_DAEDALUS_V4L2 *and* video_fd_daedalus
≥ 0, both of which stay false in those deployments.

Companion to daedalus-v4l2 481279c (Phase 8.13 systemd unit) and
marfrit-packages noether/daedalus-v4l2-kernel-6.18-compat branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 10:41:18 +02:00
marfrit de27e95571 v4l2: log error_idx + failing ctrl id on S_EXT_CTRLS failure
Better diagnostic when VIDIOC_S_EXT_CTRLS returns < 0: read
back error_idx and print which control id rejected (or
"ioctl-level" when error_idx == count, meaning the rejection
was generic, not per-control).

Made it possible to triage the daedalus_v4l2 phase 8.13 issue
by separating "the actual stateless control failed" (would
show failing_ctrl_id=0xa40a2c VP9_FRAME) from "libva probing
H264/HEVC profile/level we don't expose" (failing_ctrl_id=
0xa40900 H264_PROFILE etc.) — the latter is harmless on a
VP9-only context.

Before:
  v4l2-request: Unable to set control(s): Invalid argument

After (per-control):
  v4l2-request: Unable to set control(s): Invalid argument
                (error_idx=0/2 failing_ctrl_id=0xa40900 size=0)

After (ioctl-level):
  v4l2-request: Unable to set control(s): Invalid argument
                (error_idx=2/2 ioctl-level)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:14:50 +00:00
marfrit 2146341460 daedalus_v4l2: meson option gate (default true)
Adds a build-time switch so platforms that will never see a
daedalus_v4l2 kernel module (Allwinner cedrus, RK without the
shim, etc.) can opt out of the probe entry + dispatch branch.

  meson setup build                         # daedalus support on
  meson setup build-off -Ddaedalus_v4l2=false  # off

Implementation:
- meson_options.txt: new boolean `daedalus_v4l2`, default true.
- src/meson.build: when option is true, autoconfig.h gets
  `#define HAVE_DAEDALUS_V4L2 1`.
- src/request.c: known_decoder_drivers[] entry, primary-driver
  detection branch, and post-probe log line all gated by
  #ifdef HAVE_DAEDALUS_V4L2.
- src/request.h: struct daedalus fields kept UNCONDITIONAL.
  Two extra int per session and the struct layout stays stable
  across translation units regardless of option — avoids the
  ODR risk of every consumer of request.h needing to include
  autoconfig.h before request.h.

Verified on hertz: both builds compile clean.
  build/src/autoconfig.h has HAVE_DAEDALUS_V4L2; .so contains
  "daedalus_v4l2" string + log message.
  build-off/src/autoconfig.h doesn't; .so contains no daedalus
  strings at all.

Default-on build still passes vainfo end-to-end:
  vainfo: Driver version: v4l2-request
  vainfo: Supported profile and entrypoints
        VAProfileH264Main / High / ConstrainedBaseline / MultiviewHigh
        / StereoHigh : VAEntrypointVLD
        VAProfileVP9Profile0 : VAEntrypointVLD
        VAProfileAV1Profile0 : VAEntrypointVLD

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:41:17 +00:00
marfrit b5b3acf0f7 daedalus_v4l2: add to known_decoder_drivers + multi-device-probe slot
Phase 8.10 of the daedalus-v4l2 sibling campaign — out-of-tree
V4L2 stateless decoder shim that forwards bitstream to a
userspace daemon (FFmpeg-software decode for VP9 / AV1 / H.264;
pixels back via dmabuf into the CAPTURE buffer).

Adds the same iter40-shaped wiring as rpi-hevc-dec:
- known_decoder_drivers[] entry "daedalus_v4l2"
- video_fd_daedalus + media_fd_daedalus slots in driver_data
- -1 init alongside the other multi-device slots
- primary-driver detection branch in the auto-probe block
- post-probe log line for symmetry with iter40

No per-profile dispatch changes needed — daedalus_v4l2 advertises
the standard V4L2_PIX_FMT_{VP9_FRAME,AV1_FRAME,H264_SLICE}
OUTPUT fourccs the fork's existing per-driver paths already
handle.

Verified on hertz (Pi 5 / CM5, 6.12.75+rpt-rpi-2712) with the
daedalus_v4l2 module loaded:

  LIBVA_DRIVER_NAME=v4l2_request \
  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
  vainfo --display drm --device /dev/dri/renderD128

  v4l2-request: opened daedalus_v4l2 at video_fd=... media_fd=... (Pi 5 daemon-backed VP9/AV1/H264)
  vainfo: Driver version: v4l2-request
  vainfo: Supported profile and entrypoints
        VAProfileH264Main               : VAEntrypointVLD
        VAProfileH264High               : VAEntrypointVLD
        VAProfileH264ConstrainedBaseline: VAEntrypointVLD
        VAProfileH264MultiviewHigh      : VAEntrypointVLD
        VAProfileH264StereoHigh         : VAEntrypointVLD
        VAProfileVP9Profile0            : VAEntrypointVLD
        VAProfileAV1Profile0            : VAEntrypointVLD

Without the env override the auto-probe still picks rpi-hevc-dec
first (it's earlier in known_decoder_drivers[]); on the standalone
daedalus_v4l2 path the daemon-backed decode is what answers
S_FMT/QBUF/DQBUF. On a mixed-driver Pi 5 box where both modules
are loaded, HEVC continues to route through rpi-hevc-dec via the
existing 'p' override; VP9/AV1/H264 would prefer daedalus_v4l2
since rpi-hevc-dec is HEVC-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:37:53 +00:00
marfrit 820557268b Merge PR #5: ampere-av1 Phase 2 (master) — fourth-fd probe + AV1 enumeration 2026-05-18 13:47:56 +00:00
claude-noether c6f81c653f ampere-av1 Phase 2 (master): fourth-fd probe + AV1 enumeration
Imports the minimal "vainfo lists VAProfileAV1Profile0" layer from the
operator's in-progress av1-iter1 branch (Phase 2 steps 1, 2 — commits
bed75c0 + 61db76e on av1-iter1). The Phase 3-5 bit-exact decode-side
work stays in av1-iter1; this commit gives master the enumeration +
fd-routing layer so consumers (ffmpeg-vaapi, firefox-fourier, chromium-
fourier) at least see VAProfileAV1Profile0 today on RK3588.

What this commit adds:
- video_fd_vpu981 + media_fd_vpu981 slots to struct request_data
  (named to match av1-iter1's convention so the operator's Phase 3-5
   merge resolves cleanly)
- 4th-decoder probe loop in VA_DRIVER_INIT that walks hantro-vpu
  media nodes for an instance advertising V4L2_PIX_FMT_AV1_FRAME
  (AV1F) as OUTPUT pixfmt. RK3588 has 3 hantro-vpu instances all
  reporting driver="hantro-vpu" + model="hantro-vpu", so OUTPUT-
  format probe is the only DTS-independent discriminator.
- 'a' kind in request_device_kind_for_profile (VAProfileAV1Profile0)
  + 'a' branch in request_switch_device_for_profile.
- video_fd_vpu981 added to any_fd_supports_output_format helper
  (existing 3-slot loop missed the new fd; same off-by-one trap
  that bit ampere's av1-iter1 enumeration for a week).
- VAProfileAV1Profile0 → V4L2_PIX_FMT_AV1_FRAME in pixelformat_for
  _profile.
- VAProfileAV1Profile0 push in RequestQueryConfigProfiles +
  RequestQueryConfigEntrypoints + RequestCreateConfig switch.
- vpu981 fd cleanup in RequestTerminate.
- rpi_hevc_dec fd cleanup added at the same time (was already missing
  in master — fixed defensively).
- V4L2_REQUEST_MAX_PROFILES bumped 13 → 14. Defensively sized for
  the post-Option-B-revert future: with iter39 Option B reverted
  (Hi10P + Main10 back in enumeration) plus AV1, max possible
  enumeration is 13. The per-group guards use `index < MAX - N`
  pattern; for a singleton push to succeed at index=13 we need
  MAX >= 14. Bumping now avoids the same off-by-one bug from
  silently dropping AV1 when Option B eventually reverts.

What this commit does NOT add:
- av1.{c,h} decode-side scaffolding (Phase 2 step 4 on av1-iter1 —
  ~177 LoC including a stub av1_set_controls that returns -1). When
  the operator's av1-iter1 Phase 3-5 work lands on master, those
  500+ LoC + the stub will follow. Without them, consumers calling
  vaCreateContext(VAProfileAV1Profile0) succeed at the libva layer
  but ffmpeg-vaapi will fail at the first vaRenderPicture with an
  AV1-buffer-type rejection — clean error, no crash.

Verified 2026-05-18 on ampere:

  $ env LIBVA_DRIVER_NAME=v4l2_request vainfo | grep VAProfile
        ... (10 prior profiles, unchanged) ...
        VAProfileAV1Profile0            :   VAEntrypointVLD   ✓

  Probe log: "ampere-av1: vpu981 AV1 decoder at /dev/video4 + /dev/media3"

Build clean on ampere with GCC 16.1.1; no warnings introduced.
ampere's running module restored to the av1-iter1 build after the
verification — this commit's .so was NOT permanently installed.

Closes the headline acceptance criterion in
marfrit/libva-v4l2-request-fourier#2 ("vainfo on ampere lists
VAProfileAV1"). End-to-end AV1 decode bit-exactness is iter4 work
that the av1-iter1 branch continues to drive.

Co-Authored-By: claude-noether <claude-noether@reauktion.de>
2026-05-18 13:45:04 +00:00
claude-noether 9bb5a5a722 README: ffmpeg-v4l2-request-fourier flipped to published
Build + publish landed (2:8.1.r123329.b57fbbe-3, Kwiboo's
v4l2-request-n8.1 tip + libudev-bypass companion patch). Deploy-host
verified on fresnel: installs cleanly, ffmpeg buildconf shows
--enable-v4l2-request, hwaccels list includes 'v4l2request', HEVC
decode via -hwaccel v4l2request produces correct-size output.

Quickstart per-host pacman -S lines now include
ffmpeg-v4l2-request-fourier. Status table flipped its row from
pending to published. Remaining pending: chromium-fourier
(clang 22 -> 23 blocker), qt6-base-fourier (Wayland GL_ALPHA fix).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 21:01:04 +00:00
claude-noether 0182307403 README: add Quickstart section with per-host install + full stack matrix
The TL;DR of 'what packages do I install to watch YouTube on my
Rockchip board with HW acceleration in Firefox' wasn't reachable
from this README without reading three other repos' commit
histories. Fixed.

Now landed at the top:

- Stack matrix: kernel (linux-{fresnel,ampere}-fourier) -> ffmpeg
  (ffmpeg-v4l2-request-fourier) -> libva (libva-v4l2-request-fourier)
  -> browser (firefox-fourier or chromium-fourier + kwin-fourier on
  Wayland).
- Honest acknowledgement that the browser HW path is libavcodec
  hwdevice DRM, not VAAPI-via-libva. This backend matters for mpv /
  ffmpeg-as-vaapi consumers.
- Per-host pacman -S incantations for fresnel (RK3399), ampere
  (RK3588), ohm (RK3566).
- Live marfrit repo URL + signing-key import flow.
- Smoke-test commands (vainfo + MOZ_LOG patterns).
- Honest status flag: ffmpeg-v4l2-request-fourier, chromium-fourier,
  qt6-base-fourier exist in marfrit-packages source tree but NOT
  yet in the live repo. Users building those locally now.
- RK3588 mainline (Feb 2026) called out alongside ampere row.

What hasn't changed: Pi 5 standoff section, technical notes,
existing iter39 / iter40 status tables.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 20:48:53 +00:00
claude-noether 941fbc5b1b README: candid 'standoff' framing for Pi 5 HEVC + RK matrix
Replace the original 2018 Bootlin upstream README with the
fourier-fork situation as of May 2026. What works: fresnel 5/5,
ampere iter1+2, ohm baseline (all RK family, mainline VDPU381/383
landing Feb 2026 helps).

What doesn't: Pi 5 HEVC via this backend. New 'The Pi 5 standoff'
section captures the honest situation surfaced by the May 2026
web-research pass:

- Kwiboo's ffmpeg-v4l2request hwaccel: 8 years un-merged upstream
- libva-v4l2-request: no commits since ~2021
- rpi-hevc-dec mainline: 17 months in review, still not merged;
  Pi 6.18.x downstream has active HEVC regressions (#7228, #7306)
- Mozilla bug 1969297 picks the ffmpeg-hwaccel-context path, not
  libva — explicit ack that strict drivers need libavcodec's
  internal SPS context
- Frames the issue as ecosystem coordination failure (principal-
  agent stalemate), not architectural impossibility

Notes that iter40 + iter40b lands but parks: backend infra is
sound + reusable for any future strict V4L2 stateless target ffmpeg
ships before libva does, but the user-facing Pi 5 HEVC story will
not come from this backend — it'll come from Mozilla / Kwiboo /
upstream coordination unblocking.

iter38 5/5 fresnel + 9-profile ampere baselines preserved
post-iter40b — documented as no-regression in phase7_pi5_hevc_close.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:58:52 +00:00
claude-noether 071b08dcc2 iter40b: SPS-parse fix lands but bit-exact still blocked upstream
Per-driver gate added: when rpi-hevc-dec active, parse SPS NAL from
surface_object->source_data via the iter2 vendored GStreamer parser
and override the VAAPI-omitted v4l2_ctrl_hevc_sps fields
(sps_max_num_reorder_pics, sps_max_latency_increase_plus1,
sps_max_sub_layers_minus1, max_dec_pic_buffering_minus1[HighestTid]).
Cached at driver_data->hevc_sps_field_cache.

Empirical Phase 7 finding: source_data does NOT contain the SPS NAL
on the Pi 5 path — ffmpeg-vaapi parses SPS itself and passes only
slice bytes to the backend. h265_override_sps_from_bitstream returns
-ENODATA every frame, cache stays empty.

Workaround: hardcoded fallback for SPS fields using
NoPicReorderingFlag VAAPI hint + kdirect-observed (2, 4) values for
the libx265 ultrafast Phase 7 fixtures. Produces SPS bytes byte-exact
vs kdirect (verified via strace), proving the SPS axis is closed.
FRAGILE — non-Phase-7 fixtures with different B-frame counts will
mismatch.

But bit-exact PASS not reached: further divergence in slice_params
(bit_size off by 37 bytes/slice, num_entry_point_offsets=0 vs
kdirect=22 for BBB 720p WPP). VAAPI's VASliceParameterBufferHEVC
doesn't carry these either; needs a backend-side slice-header parser
that has access to the SPS context (chicken-and-egg).

Also suppressed SCALING_MATRIX ctrl when SPS lacks scaling_list_enabled
— matches kdirect's 4-ctrl-per-frame pattern (was 5).

Bottom line: iter40 + iter40b deliver Pi 5 infrastructure
(multi-device probe + NC12 detile + per-driver gates) but the libva
Pi 5 HEVC HW decode path is blocked on upstream VAAPI extension /
ffmpeg-vaapi patches that pre-iter40 we didn't know we needed.

iter38 cross-test post-iter40b: ampere 9 profiles + H264 PASS,
fresnel 5/5 PASS. No sibling regression.

Phase 8 packaging + Phase 9 memory entry still deferred — won't
package + ship a partial backend, won't distill until upstream lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:45:43 +00:00
claude-noether 9037934b21 phase7_pi5_hevc_close: iter40 partial — backend integration works, decode rejected by rpi-hevc-dec
C1 vainfo PASS, C3 HW engagement PASS, C6 decode-correctness FAIL
(V4L2_BUF_FLAG_ERROR on every CAPTURE DQBUF). Root cause empirically
located: SPS sps_max_num_reorder_pics + sps_max_latency_increase_plus1
fields. Our backend uses a spec-legal fallback (sps_max_dec_pic_buffering_minus1, 0)
because VAAPI doesn't forward these fields; rkvdec accepts it,
rpi-hevc-dec validates against bitstream-true values and rejects.

Real fix needs SPS NAL parse via the iter2 vendored GStreamer parser
to populate bitstream-true values for the V4L2 SPS ctrl. Estimated
1 more 8(+1)-phase loop (iter40b).

Phase 8 + Phase 9 deferred — won't package + deploy + ship a broken
backend; won't distill lessons until the real fix lands.

Sibling iter38 baseline NOT yet re-verified on fresnel + ampere
post-iter40. Code paths gated on video_fd_rpi_hevc_dec >= 0 stay
no-op on non-Pi hosts; only __arm__ → __aarch64__ guard change is
globally observable but its is_10bit sub-gate stays dormant on
8-bit fixtures. Verify before declaring no-regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:18:16 +00:00
claude-noether 3ffa9d0d17 iter40: Pi 5 HEVC chapter — backend integration lands, bit-exact pending
Phase 6 implementation. Backend builds clean on higgs (Debian 13
trixie, aarch64), vainfo lists VAProfileHEVCMain via rpi-hevc-dec,
multi-device probe finds /dev/video19 + /dev/media1, CreateContext
+ S_FMT + REQBUFS + STREAMON all succeed.

Phase 7 partial: infrastructure works, 10 frames flow through the
pipeline (correct byte counts produced — 13824000 for 1280x720 x 10
NV12 frames). But every DQBUF CAPTURE returns V4L2_BUF_FLAG_ERROR
so output content is wrong (libva sha != kdirect sha). The decode
itself is failing on the rpi-hevc-dec side despite all ctrl
submissions returning success.

Code changes:
- request.h: video_fd_rpi_hevc_dec / media_fd_rpi_hevc_dec slots +
  has_hevc_ext_sps_rps_rpi_hevc_dec flag (mirrors iter38 + iter2
  pair-of-flags pattern, naturally false on Pi).
- request.c: known_decoder_drivers gains rpi-hevc-dec; primary-driver
  probe gets an else-if branch setting the new fds (Phase 5 F3);
  request_switch_device_for_profile prefers 'p' for HEVC when
  rpi-hevc-dec present.
- context.c: per-fd want_pixfmt (NC12 on Pi), capture_pixelformat
  taken from video_format slot (not hardcoded NV12/NV15);
  synthetic-SPS pre-seed gated off for Pi (Phase 5 F6);
  destination_sizes uses nv12_col128_uv_plane_offset for NC12 SAND
  layout (Phase 5 F2);
  per-driver HEVC_START_CODE (NONE on Pi, ANNEX_B on RK);
  per-driver context_object->h264_start_code (skip prepend on Pi).
- video.c: NV12_COL128 video_format entry (8-bit SAND, single
  buffer, 2 planes, NV12 drm_format with MOD_NONE so detile branch
  fires rather than tiled_to_planar).
- nv12_col128.c/.h: detile primitive (Y + UV per-plane, kernel
  hevc_d_video.c bytesperline formula + ffmpeg/Kynesim per-pixel
  offset). UV plane offset = 128 * ALIGN(h, 8) — within-column
  (SAND interleaves Y+UV per column, NOT plane-concatenated;
  earlier wrong formula caught by Phase 7 SEGV).
- image.c: #ifdef __arm__ extended to __arm__ || __aarch64__
  (Phase 5 F1 — guard was killing detile path on all aarch64
  hosts including fresnel iter39 NV15 path, masked because 10-bit
  never exercised); RequestCreateImage NC12 → NV12 stride override
  (linear width, not column-stride); copy_surface_to_image NC12
  detile branch (gates on fourcc + v4l2_format).
- nv15.h: fallback V4L2_PIX_FMT_NV15 define (Debian 13 headers
  omit it though they have NC12).
- nv12_col128.h: fallback V4L2_PIX_FMT_NV12_COL128 +
  V4L2_PIX_FMT_NV12_10_COL128 (Arch / mainline pre-Pi headers).
- tests/test_nv12_col128_detile.c: hand-crafted-bytes unit test;
  passes (8 cases: Y + UV for 4 widths incl. 1366 misaligned;
  UV-offset helper).
- meson.build / nv12_col128 sources listed.

Phase 7 status: not yet bit-exact. Remaining diagnosis: per-frame
S_EXT_CTRLS payload diff vs kdirect (kdirect sends 4 ctrls
SPS+PPS+decode_params+slice_array; ours sends 5 incl. scaling_matrix;
field ordering differs). Likely the slice_array contents need
per-driver handling for rpi-hevc-dec's expected layout. Beyond
in-session reach.

iter38 5/5 baseline on fresnel + ampere should be unaffected (new
fd stays -1 on non-Pi hosts; all gates either short-circuit on
fd-not-present or no-op).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:17:14 +00:00
claude-noether f1be489c75 phase5_pi5_hevc_review: 3 critical findings empirically verified, 1 fixture gap
Sonnet Plan-agent review of phase1_pi5_hevc plan. Empirically
verified each finding against current source per
feedback_review_empirical_over_theoretical BEFORE accepting:

F1 (CRITICAL): #ifdef __arm__ at image.c:239+268 kills NC12 (and
already-present NV15) detile on AArch64. fresnel iter39 5/5 PASS
masked this because 10-bit path was never exercised. Fix: extend
guard to __aarch64__.

F2 (CRITICAL): destination_bytesperlines for NC12 source returns
column-stride (1080) not linear-NV12 Y stride (1280). VAImage
consumers see wrong pitch. Fix: override in RequestCreateImage
when src=NC12, dst image=NV12.

F3 (CRITICAL): request.c primary-driver detection has else-if
branches for rkvdec and hantro-vpu only. On higgs (rpi-hevc-dec
primary), neither matches → new fd pair stays -1 → routing
no-ops. Fix: add explicit rpi-hevc-dec branch.

F4 (accepted): add 1366x768 fixture to exercise column padding.

F5 (verify-only): HEVC START_CODE_ANNEX_B may not work on
rpi-hevc-dec (kdirect uses NONE). Don't pre-gate; verify
empirically in Phase 7.

F6 (CRITICAL): iter25 synthetic-SPS pre-seed fires for HEVC
regardless of driver_kind. Would issue HEVC_SPS to rpi-hevc-dec
which doesn't need it AND uses different submission order. Fix:
gate on driver_data->video_fd != video_fd_rpi_hevc_dec.

F7/F8 (no findings): image.c gate predicate sound; cross-device
regression scope clean.

Amended Phase 6 step list with 3 new gating actions. Phase 7
verification expanded with empirical START_CODE check + 1366
fixture.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:04:28 +00:00
claude-noether bf52725ab3 phase1_pi5_hevc: lock goal + situation + N=3 baseline + plan (iter40)
Phase 1 measurable goal: HEVC Main 8-bit bit-exact libva-vs-kdirect
on higgs for 640x360 / 1280x720 / 1920x1080 fixtures with HW path
engagement verified via lsof + ffmpeg-vaapi log signal.

Phase 2 surface-area audit: ~250 LoC backend + 100 LoC standalone
detile primitive. Reuses iter38 multi-device-probe pattern (now
3 slots: rkvdec + hantro + rpi-hevc-dec) + iter2 per-driver
gating shape. h265_set_controls + iter31 a-29 plumbing transfers
unchanged. iter25 SPS pre-seed gated off for rpi-hevc-dec.

Phase 3 baseline locked: N=3 bit-exact SW==kdirect for all three
fixtures on higgs. kdirect engagement signal:
  Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19;
  buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8

Phase 4 plan: 7 sequenced steps (request.h -> request.c -> video.c
-> nv12_col128.c new -> image.c branch -> meson/Makefile -> build
on higgs). NC12 tile geometry locked from kernel hevc_d_video.c
math + ffmpeg/Kynesim av_rpi_sand_to_planar_y8 byte-offset formula.
Risks + mitigations enumerated.

Phase 5 sonnet review explicitly requested per CLAUDE.md
no-skip-reviews rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:00:35 +00:00
claude-noether b6a65fc692 phase0_pi5_hevc: close addendum with empirical higgs probe data
Live probe of rpi-hevc-dec on higgs (Pi CM5, kernel 6.12.75-rpt-rpi-2712,
Debian 13 trixie) answers Phase 0 open questions Q1, Q2, Q5, Q6
empirically; Q3 partial; Q4 still open.

Q1 (EXT_SPS): NOT present. Only standard V4L2_CID_STATELESS_HEVC_*.
  Probe ctrl id 0xa97 returns EINVAL — same gate iter2's
  has_hevc_ext_sps_rps_rkvdec uses. iter31 alpha-29 plumbing applies.

Q2 (hevc_start_code): default 0 "No Start Code"; matches our behaviour.

Q3 (NC12 SAND tile layout): partial. CAPTURE S_FMT for 1280x720 NC12
  returns sizeimage=1382400 (linear NV12 byte count) but
  bytesperline=1080 (suspect, encodes SAND col count not linear stride).
  Need kernel-doc / driver-source read before writing detile primitive.

Q4 (DRM modifier round-trip): hwdownload rejects SAND-tiled drm_prime
  (-38 Function not implemented). Backend CPU-detile to NV12 is the
  safe path for Firefox.

Q5 (submission ordering): empirical ioctl trace shows canonical V4L2
  stateless flow. Two notes for the backend: kdirect uses
  V4L2_MEMORY_DMABUF for both queues (we use MMAP for CAPTURE on
  rkvdec); kdirect does NOT need the iter25 SPS pre-seed pattern -
  rpi-hevc-dec takes explicit NC12 + dims directly.

Q6 (packaging): Debian 13 trixie. Phase 8 needs a debian/ tree, not
  just PKGBUILD. Decision in Phase 1.

Other findings: ffmpeg 7.1.3 from stock Debian is built with
--enable-v4l2-request. kdirect engagement line:
  Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19;
  buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8
No libva ICD installed (only armada-drm_dri.so). mpv installable.
Firefox 145 + rpi-firefox-mods present.

Phase 0 closed. Phase 1 opens with goal:
  HEVC bit-exact libva-vs-kdirect on higgs for 1280x720 Main 8-bit
  via the new RPI_HEVC_DEC driver_kind slot + NC12 detile primitive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:54:08 +00:00
claude-noether 25b8a15e09 phase0_pi5_hevc: open Pi 5 / CM5 HEVC chapter (substrate doc only)
Empirical higgs probe (sibling session 2026-05-17) confirmed
rpi-hevc-dec at /dev/video19 is V4L2 STATELESS, not stateful:
- Section header literally "Stateless Codec Controls"
- OUTPUT V4L2_PIX_FMT_HEVC_SLICE (parsed slices), not full-stream HEVC
- V4L2_CID_STATELESS_HEVC_* control set + slice_param_array[4096]
- CAPTURE NC12 / NC30 (V4L2_PIX_FMT_NV12_COL128 / _10_COL128,
  SAND 128-column tiled, Pi-specific)

So the Pi 5 HEVC HW path belongs HERE (request/stateless backend),
not in a separate stateful project. Replaces the now-deleted
libva-v4l2-stateful-fourier scaffold attempt.

phase0_pi5_hevc.md captures:
- Substrate (target host, backend baseline, empirical probe output)
- What carries forward unchanged (most of HEVC plumbing)
- What needs adding (RPI_HEVC_DEC driver_kind, NC12/NC30 video_format
  + detile primitive, image.c branch — small surface area)
- Six open questions Phase 1 must answer first (EXT_SPS presence,
  start_code default, SAND tile spec, drm_prime modifier round-trip,
  rpi-hevc-dec submission ordering quirks, packaging target OS)
- Phase 1 goal sketch (NOT locked) + Phase 3 baseline plan

No code in this commit. Phase 1 opens when higgs is up + first two
open questions are answered live.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:48:09 +00:00
claude-noether cf8cd9d2be h265: cap pred-weight + ref-list loops at VAAPI source size (15)
V4L2_HEVC_DPB_ENTRIES_NUM_MAX is 16, but
VASliceParameterBufferHEVC::RefPicList is [2][15] and the eight
delta_*_weight_lX / luma_offset_lX / delta_chroma_weight_lX /
ChromaOffsetLX arrays are all [15]. Iterating the per-slot copy
loops to 16 over-reads the VAAPI source by one element.

The bug was always there but hidden under -O3 (meson's default
buildtype=release): GCC unrolled the inner loop and dead-folded
the out-of-bounds load. Under -O2 (Arch makepkg CFLAGS) the
canonical vectorised loop ran and produced a real SEGV at
v4l2_request_drv_video.so + 0xb3a4 inside h265_fill_slice_params,
breaking HEVC immediately after the package install on fresnel
(iter38 5/5 baseline dropped to 4/5).

Define a local VA_HEVC_REF_LIST_LEN (15) and use it as the cap
for the four offending loops. RefPicList and pred_weight_table
copies now respect the source bound; V4L2 destination still has
16 slots, the upper one stays at memset-zero which is correct.

Verified locally: -O2 build + package re-install restores HEVC
to bit-exact PASS vs kdirect (sha 108f925bb6cbb6c9). iter38 5/5
baseline restored.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 17:00:52 +00:00
claude-noether c9f32aff49 iter39 Option B revert of 63fed87: P010 advertisement gated on is_10bit again
Phase 7 fix 63fed87 (unconditional P010 in QueryImageFormats) broke
HEVC 8-bit on fresnel: ffmpeg-vaapi picked P010 for the HEVC hwframe
pool, vaEndPicture SEGV'd when consumer-side P010 expectations met
the 8-bit NV12 CAPTURE buffer. Exit 139 (SIGSEGV) on first frame.

Original reasoning for 63fed87 (advertise early so ffmpeg's pre-
CreateContext query sees P010) doesn't apply with Option B in place —
Hi10P + Main10 are dropped from RequestQueryConfigProfiles, so no
10-bit decode pipeline reaches QueryImageFormats. The gate on
is_10bit (false for all enumerated profiles post-Option-B) correctly
returns NV12-only.

Verified on fresnel post-revert: HEVC bit-exact PASS sha
108f925bb6cbb6c9 restored; iter38 5/5 baseline intact.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 16:52:36 +00:00
claude-noether 6bc12fe7e4 iter39 Option B: drop Hi10P + Main10 from RequestQueryConfigProfiles
Per Phase 7 close + user-directed Option B trigger (web research /
rockchip-mpp showed Hi10P is effectively impossible on the current
stack). Cross-test on ampere RK3588 confirmed the SAME failure mode
as fresnel RK3399 — both produce all-zero output via libva; kdirect
fails with EINVAL on both. The blocker is in ffmpeg-v4l2-request
userspace plumbing for the new uAPI controls Karlman's kernel patches
introduced, NOT in our backend or the kernel.

Sources confirming kernel + HW capable but userspace pending:
  - lwn.net/Articles/950434: "to fully runtime test... you may need
    upstream DRM commits, FFmpeg patches"
  - patchwork.kernel.org Karlman v6 → v10 series on linux-media
  - Rockchip RK3399 + RK3588 datasheets list 10-bit H.264 support

Stop enumerating Hi10P + Main10 so VAAPI consumers don't try the
broken path. The backend infrastructure (codec.c profile cases,
context.c NV15 CAPTURE + synthetic SPS bit_depth=2 + video_format
invalidation, image.c P010 reporting + NV15→P010 unpack, surface.c
RT_FORMAT_YUV420_10 guard + NV15 PRIME fourcc, nv15.c + nv15.h
unpack primitive, request.h is_10bit flag) is RETAINED — just
re-add the two profiles[index++] lines and bump the H264 guard
back to (-6) when upstream ffmpeg-vaapi V4L2 hwaccel learns 10-bit.

Memory: feedback_rk3399_h264_hi10p_advertised_not_functional.md
captures the empirical evidence for future iterations.

vainfo after this commit: 10 profiles (was 12), matches the iter38
baseline. iter38 5/5 PASS preserved (no other codec touched).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 16:43:44 +00:00
claude-noether 63fed87bc5 iter39 fresnel fix: advertise P010 unconditionally in QueryImageFormats
ffmpeg-vaapi's hwcontext_vaapi calls vaQueryImageFormats during
hwframes context setup, BEFORE vaCreateContext fires. Our previous
gate on driver_data->is_10bit meant P010 wasn't in the catalog at
that early query — ffmpeg's hwdownload then rejected pix_fmt=p010le
with "Invalid output format p010le for hwframe download" and decode
failed before our backend's CreateContext saw the 10-bit profile.

Fix: advertise P010 unconditionally in QueryImageFormats. Safe because
consumers ask for P010 only when their decode pipeline needs 10-bit,
and our P010 unpack path in copy_surface_to_image is gated on
image->format.fourcc == VA_FOURCC_P010 (independent of is_10bit).

Verified on fresnel: with this fix, Hi10P decode advances past the
hwdownload filter setup. (Run pending bundle to fresnel.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 16:34:52 +00:00
claude-noether a13215de45 iter39 fresnel fix: skip pre-S_FMT NV15 CAPTURE format probe
RK3399 rkvdec advertises NV15 in VIDIOC_ENUM_FMT(CAPTURE) only AFTER
S_FMT(OUTPUT) + S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT.
Pre-flight v4l2_find_format(NV15) always returns 0 → video_format
stays NULL → CreateContext returns OPERATION_FAILED → ffmpeg-vaapi
hwaccel init fails with "Failed to create decode context: 1".

Verified on fresnel (kernel 7.0-14 / linux-fresnel-fourier):
  v4l2-ctl -d /dev/video1 --list-formats → only NV12 enumerated

Fix: for 10-bit profiles, skip the find_format probe and directly
map to our NV15 video_format entry. The later S_FMT(CAPTURE) in
the same RequestCreateContext path commits the actual NV15 mode
once the synthetic-SPS injection sets bit_depth_luma_minus8=2.

Discovered during Phase 7 sub-profile verification — Criterion 1
(vainfo enumeration) PASSed but Criteria 2/3 (Hi10P/Main10 decode)
failed with the hwaccel init error. iter38 5/5 baseline still PASSES
(no regression — non-10-bit path unchanged).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 16:34:14 +00:00
claude-noether f0ef69d279 iter2 step4: wire h265_set_controls to populate EXT_SPS_*_RPS controls
Per Phase 4 plan + Phase 5 review amendments (SPS parse-and-cache,
per-fd gating).

src/h265.c additions:
  - #include <errno.h>, the v4l2-hevc-ext-controls.h, and the
    vendored gst/codecparsers/gsth265parser.h
  - new static helper h265_populate_ext_sps_rps_cache(): walks
    surface_object->source_data for an SPS NAL (nal_unit_type == 33)
    using gst_h265_parser_identify_nalu; if found, calls
    gst_h265_parser_parse_sps_ext (NOT gst_h265_parser_parse_sps —
    the latter discards the per-RPS-entry EXT data we need); maps
    GstH265ShortTermRefPicSet (base) + GstH265ShortTermRefPicSetExt
    (carrying use_delta_flag[16], used_by_curr_pic_flag[16],
    delta_poc_s0_minus1[16], delta_poc_s1_minus1[16]) into the V4L2
    struct arrays; stores on driver_data->hevc_rps_cache_*
  - non-IDR-frame handling: cache holds across frames, so frames
    whose source_data lacks an SPS NAL reuse the previously-parsed
    cached arrays (Phase 5 review item #3)
  - controls[] grows from [5] to [7]; the 2 new entries are appended
    after the standard 5 (SPS/PPS/SLICE_PARAMS/SCALING_MATRIX/
    DECODE_PARAMS), gated by driver_data->has_hevc_ext_sps_rps_rkvdec
    (per-fd probe result from Step 3) + the cache being valid
  - field-by-field mapping mirrors GStreamer's
    gst_v4l2_codec_h265_dec_fill_ext_sps_rps verbatim (the upstream
    reference identified in Phase 0 prior-art survey)

src/request.h additions:
  - struct request_data carries hevc_rps_cache_st (array pointer),
    _st_count, hevc_rps_cache_lt, _lt_count, hevc_rps_cache_valid.
    Single-slot cache (sps_id 0 only; multi-SPS streams would need
    expanding). Stores POST-MAPPED V4L2 structs so request.h doesn't
    need to know GstH265SPS / GstH265SPSEXT types.

Critical interpretation correction (Phase 5 review followup):
GstH265SPS has short_term_ref_pic_set[65] (base) but NOT
short_term_ref_pic_set_ext[]. The EXT array lives on a SEPARATE
GstH265SPSEXT struct accessed via gst_h265_parser_parse_sps_ext.
The 'plain' gst_h265_parser_parse_sps internally calls _ext with a
LOCAL discarded SPSEXT (see gsth265parser.c:2050). Our call must
use the _ext variant directly to keep the EXT data. Caught during
Step 4 first-build error.

Build verified: ninja -C build clean. .so is 759 KB (up from 485 KB
original, 682 KB after Step 2 vendor — the +80 KB is the new helper
+ extension).

iter2 Phase 6 Step 5 (install + reboot + smoke-test) is the F1
falsifier moment: if HEVC stops OOPSing, mechanism confirmed; if it
still OOPSes, loopback Phase 0 with re-opened kernel-agent#11.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:49:12 +00:00
claude-noether 393d02f413 iter2 step3: HEVC EXT_SPS_*_RPS UAPI header + runtime probe
src/hevc-ctrls/v4l2-hevc-ext-controls.h (NEW, MIT, ~95 LOC):
  Verbatim mirror of Linux 7.0 V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
  and _LT_RPS control IDs + struct definitions + flag macros. Each
  symbol is ifndef-guarded so when ampere's linux-api-headers
  eventually bumps to 7.0+, the kernel header takes precedence and
  this shim silently no-ops. Citation block links the upstream
  Casanova v8 series.

  Per LGPL section 3.b, kernel UAPI struct definitions are excepted
  from GPL inheritance, so copying them into MIT userspace is fine.

src/request.h: added has_hevc_ext_sps_rps_rkvdec + _hantro bool
  fields on struct request_data — pair-of-flags layout mirrors
  video_fd_rkvdec / video_fd_hantro (iter38 multi-device-probe
  pattern, per feedback_multi_device_probe_design). Phase 5 review
  identified single-scalar storage as a silent-misbehavior risk
  across device-switch boundaries.

src/request.c:
  - new probe_hevc_ext_sps_rps_controls(fd) helper: queries the two
    new CIDs via VIDIOC_QUERYCTRL; returns true iff both register.
    RK3399 rkvdec (linux 6.x or 7.x without VDPU381/383 bindings)
    returns false; RK3588 rkvdec (VDPU381/383) returns true.
  - probe each driver_data->video_fd_rkvdec / _hantro after the
    iter38 multi-device-probe block at VA_DRIVER_INIT time
  - log-line if rkvdec supports it - diagnostic for Phase 7

src/meson.build: added the new UAPI header to the headers list.

Build verified: ninja -C build clean, .so produced. The new probe
runs at driver init and stores the result, but nothing CONSUMES the
result yet — that's Step 4 (h265_set_controls wiring).

Per ampere-kernel-decoders campaign iter2 Phase 4 step 3 (amended
by Phase 5 review item 'per-fd storage').

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:49:09 +00:00
claude-noether 9f7437e8ee iter2 step2: GLib/GStreamer compat shim, build succeeds
Vendored gsth265parser + nalutils + gstbitreader + gstbytereader (the
Step 1 commit) compile cleanly against libc + libv4l2 only after
adding 1 compat translation unit + 5 stub headers, no edits to the
vendored .c/.h files themselves.

src/h265_parser/gst_compat.{h,c} — new files (MIT, original work):
  - GLib type aliases (gboolean, gchar, gint*, guint*, gsize, gpointer)
  - Memory helpers (g_malloc/g_free as #define free, g_memdup2 inline)
  - Asserts as no-op + parser-return-code-propagation
  - All GST_DEBUG/INFO/WARNING/ERROR/LOG/FIXME as no-ops (the parser
    is heavy on debug logging; we compile it all out)
  - GArray implementation (~100 LOC, just enough for gsth265parser.c's
    24 call sites)
  - GList full struct with .data/.next/.prev so callers compile;
    list-manipulation functions abort() — dead code paths only
  - Byte-order read/write macros (GST_READ_UINT8/16/24/32/64_LE/BE,
    GST_WRITE_UINT8/16/24/32_BE) — aarch64 LE inlines
  - g_once_init_enter/leave as simple gate
  - G_MAXUINT*, G_MAXINT*, G_MINxxx, G_GNUC_* attribute macros, etc.
  - Opaque GstBuffer/GstMemory/GstMapInfo + abort-stub functions for
    the encoder-side SEI-insertion paths the libva backend never invokes
  - gst_util_ceil_log2 real impl (used by slice-header parser; dead
    for our SPS-only call path but cheaper to implement than stub)

src/h265_parser/gst/{gst.h,base/base-prelude.h,base/gstbitwriter.h,
codecparsers/codecparsers-prelude.h,glib-compat-private.h} — 5 new
stub headers (MIT). All include gst_compat.h. gstbitwriter.h adds
abort-stub functions for the bit-writer API (used by nalutils.c's NAL
emulation-prevention encoder path — dead code for the parse-only
libva backend).

src/meson.build — added the 5 new .c source files and 10 new .h
headers; added include_directories('h265_parser') to the include path
so the vendored files' '#include <gst/base/...>' style references
resolve to the stub headers + actual vendored files in the local
tree.

Build verified: ninja -C build produces v4l2_request_drv_video.so
(682 KB, up from 485 KB pre-vendor — the +200 KB is the vendored
parser code). nm shows gst_h265_parse_sps, gst_h265_parse_sps_ext,
gst_h265_parser_identify_nalu, and the other functions we need for
Step 4 are present in the binary.

Two #warning messages from gsth265parser.h about API stability are
upstream-intentional and harmless ('The H.265 parsing library is
unstable API and may change in future').

This commit completes Step 2 of ampere-kernel-decoders iter2 Phase 6.
Backend remains functionally identical to pre-iter2 — the new code
compiles + links but is not yet called from h265_set_controls (that's
Step 4). Existing 5 codecs continue to work as before.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:49:06 +00:00
claude-noether c9b7fcff50 iter2 step1: vendor GStreamer 1.28.2 H.265 parser unchanged
Source: gitlab.freedesktop.org/gstreamer/gstreamer @ commit 43421c2a5b8a
(refs/tags/1.28.2). All 8 vendored files copied verbatim into
src/h265_parser/:

  gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c (168 KB)
  gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.h ( 92 KB)
  gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.c       (13 KB)
  gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.h       (  8 KB)
  gstreamer/libs/gst/base/gstbitreader.c                     (  8 KB)
  gstreamer/libs/gst/base/gstbitreader.h                     ( 10 KB)
  gstreamer/libs/gst/base/gstbytereader.c                    ( 39 KB)
  gstreamer/libs/gst/base/gstbytereader.h                    ( 25 KB)

Total ~11 KLOC, LGPL v2.1+ per original headers (Intel + Sreerenj
Balachandran + others). LGPL headers preserved verbatim. Backend's
existing COPYING.LGPL covers redistribution.

** Build is INTENTIONALLY BROKEN at this commit. ** GLib dependencies
(GArray, g_malloc, gboolean, GST_DEBUG, etc.) are not yet satisfied;
src/Makefile.am is not yet updated to include these files. Step 2
performs the GLib-to-libc mechanical adaptation; Step 3 wires the
header + Makefile.

This vendor-unchanged commit is the upstream-tracking baseline. When
GStreamer ships a parser bug fix, the future-sync workflow is:
  git diff src/h265_parser/ HEAD..(this commit)
to surface our adaptations, then rebase those over the upstream fix.

Per ampere-kernel-decoders campaign iter2 Phase 4 §Step 1
(/home/mfritsche/src/ampere-kernel-decoders/phase4_plan_iter2.md).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:48:52 +00:00
claude-noether a8a91d92d6 Revert "ampere iter2: HEVC EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission (VDPU381)"
This reverts commit f61f736380.
2026-05-17 09:48:29 +00:00
claude-noether f61f736380 ampere iter2: HEVC EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission (VDPU381)
Fixes the rkvdec_hevc_prepare_hw_st_rps out-of-bounds kernel OOPS that
blocked HEVC decode on ampere (RK3588) per
marfrit/libva-v4l2-request-fourier#3 and ampere-fourier iter1 close.

Mechanism (Phase 5 amendment to issue body):
The new EXT_SPS controls are registered as V4L2_CTRL_FLAG_DYNAMIC_ARRAY
in vdpu38x_hevc_ctrl_descs (rkvdec.c:279/284) with cfg.dims = { 65 }.
The v4l2-ctrl framework init-allocates 1 zeroed element (ctrls-core.c:2116).
When num_short_term_ref_pic_sets > 1, rkvdec_hevc_prepare_hw_st_rps
(rkvdec-hevc-common.c:393-405) iterates idx 0..N-1 and overruns the
1-element kernel allocation. Submitting an N-element dynamic-array
control via S_EXT_CTRLS extends the framework allocation.

Userspace fix:
  - VIDIOC_QUERY_EXT_CTRL probe at first HEVC CreateContext sets
    driver_data->has_ext_sps_rps (true on VDPU381/383, false on legacy
    RK3399 — control unregistered there, so fresnel iter38 5/5 + iter39
    sub-profile paths are byte-identical to pre-iter2).
  - When set, h265_set_controls appends EXT_SPS_ST_RPS + _LT_RPS as
    calloc'd zero arrays, sized by VAAPI's count fields and capped at
    H.265 §7.4.3.2 spec maxima (ST 64, LT 32). Min 1 (kernel rejects 0).
  - Free post-S_EXT_CTRLS.

Decode correctness scope:
VAAPI does NOT expose per-set st_ref_pic_set syntax elements
(delta_idx_minus1, delta_rps_sign, etc.) — confirmed in va_dec_hevc.h.
All-zero entries give empty inter-pred RPS per set, which is correct
for IDR-only streams and incorrect for streams with inter-pred RPS
dependence. iter2 acceptance: stop the OOPS. Decode-correctness for
inter-RPS content is a known follow-up requiring either bitstream-snoop
or SPS-passthrough via a new VAAPI extension.

Files:
  - include/hevc-ctrls.h: #ifndef-guarded fallback definitions for
    V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS + structs (ampere host
    is on linux-api-headers 6.19-1; the new CIDs land in 7.0).
  - src/request.h: driver_data->has_ext_sps_rps (persists for driver
    lifetime; gated solely by HEVC code path so cross-codec leakage
    impossible).
  - src/context.c: probe at HEVC CreateContext via v4l2_query_ext_ctrl.
  - src/h265.c: controls[5] → controls[7]; #include <hevc-ctrls.h>
    (replaces <linux/v4l2-controls.h>) for forward UAPI compatibility.

Compile-tested on boltzmann (aarch64 native, gcc 15.2.1): clean .so,
0 new warnings. Fresnel cross-device safety: legacy RK3399 rkvdec_ctrl
table omits the CIDs; probe returns false; new code path never executes.

iter39 sub-profile work (commits 662f887 + 8746690) is preserved
in-tree; iter2 is a forward-compatible additive change.

Refs:
  marfrit/libva-v4l2-request-fourier#3
  ampere-fourier/iter1_close.md HEVC blocker
  ampere-fourier/iter2_phase0_findings.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:34:58 +00:00
claude-noether 8746690739 iter39: add NV15 → P010 unpack self-test (tests/test_nv15_unpack.c)
Pure-C unit test for nv15_unpack_plane_to_p010, independent of any V4L2
hardware. Verifies bit layout against the spec at
Documentation/userspace-api/media/v4l/pixfmt-nv15.rst by packing known
10-bit pixel values, running the unpack, and asserting P010 output
matches pixel<<6.

Coverage:
  - zero, all-max
  - 8 known position/spread vectors
  - widths {1, 2, 3, 7, 8} including remainder paths
  - multi-row with stride padding
  - chroma-shape (half-height)

Build + run:
  cc -Wall -Werror -O2 -o test_nv15_unpack \
     tests/test_nv15_unpack.c src/nv15.c
  ./test_nv15_unpack

Confirmed PASS on noether (x86_64 native). Catches the highest-risk
class of regression in iter39 — silent bit-shift errors in the unpack —
without requiring fresnel hardware.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:22:14 +00:00
claude-noether 662f8874ba iter39 α-31: H264 Hi10P + HEVC Main10 sub-profile support (10-bit, rkvdec NV15)
Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request
backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit
values packed in 5 bytes per element); VAAPI consumers receive standard
VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image.

VP9 Profile 2 explicitly NOT added — RK3399 rkvdec kernel ctrl table
caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs).

Touchpoints (per Phase 5 sonnet-architect review amendments):
  - include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it)
  - src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per
    Documentation/userspace-api/media/v4l/pixfmt-nv15.rst)
  - src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find)
  - src/codec.c: pixelformat_for_profile cases for Hi10P + Main10
  - src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10
    advertisement for 10-bit profiles
  - src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic
    SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth
    transition (sibling to iter38 device-switch invalidation), is_10bit flag
  - src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export
  - src/image.c: P010 reporting in DeriveImage + QueryImageFormats,
    P010-aware sizing in CreateImage, NV15 → P010 unpack call in
    copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010)
  - src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC
    per-codec paths
  - src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag

Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi
hwdownload, mpv vaapi-copy, and any consumer using vaGetImage works
end-to-end. PRIME-path consumers that only know NV12/P010 must use the
COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the
correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is
follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a
GPU-accessible buffer).

Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0,
libdrm 2.4.133): clean build, .so produced, 0 new warnings.

Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec.
rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15;
ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not
in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196,
rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:15:16 +00:00
45 changed files with 15570 additions and 163 deletions
+262 -56
View File
@@ -1,75 +1,281 @@
# v4l2-request libVA Backend # libva-v4l2-request-fourier
## About VA-API ICD backend for V4L2 stateless video decoders. Fourier-campaign
fork of the dormant `bootlin/libva-v4l2-request` upstream.
This libVA backend is designed to work with the Linux Video4Linux2 > **TL;DR for "I want hardware-accelerated YouTube in Firefox on my
Request API that is used by a number of video codecs drivers, > Rockchip board":** skip to the [§ Quickstart](#quickstart) below.
including the Video Engine found in most Allwinner SoCs. > Fresnel (RK3399) and ampere (RK3588) are validated targets; ohm
> (RK3566 PineTab2) is the chromium-fourier validation rig.
## Status ## What works
The v4l2-request libVA backend currently supports the following formats: | SoC / host | HW-accelerated codecs | Bit-exact vs `kdirect` |
* MPEG2 (Simple and Main profiles) |---|---|---|
* H264 (Baseline, Main and High profiles) | RK3399 (fresnel — Pinebook Pro) | H.264, HEVC Main, VP9 Profile 0, VP8, MPEG-2 | 5/5 at iter38; preserved through iter40b |
* H265 (Main profile) | RK3588 (ampere) | H.264 + HEVC (iter1+iter2 ampere-fourier); **mainline rkvdec / VDPU381 + VDPU383 landed February 2026** — VP9 / AV1 verification next | iter1 H.264 PASS; remaining codecs gated on mainline-driver bring-up |
| RK3568 / RK3566 (ohm — PineTab2) | H.264, MPEG-2, VP8 via hantro multi-planar | iter1-5 baseline (libva-multiplanar campaign) |
| BCM2712 (higgs — Pi 5 / CM5) | — | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved, [see § Pi 5 standoff](#the-pi-5-standoff) |
`kdirect` is the reference: `ffmpeg -hwaccel v4l2request
-hwaccel_output_format drm_prime ...` via Kwiboo's downstream ffmpeg
patches (packaged here as **`ffmpeg-v4l2-request-fourier`**, FFmpeg 8.1
tip @ Kwiboo `v4l2-request-n8.1` commit `b57fbbe`).
## Quickstart
### What you need for HW-accelerated YouTube in Firefox
The full stack, top to bottom, with the package this campaign provides
at each layer:
| Layer | Package(s) | Notes |
|---|---|---|
| Linux kernel with V4L2 stateless decoders | `linux-fresnel-fourier` (RK3399), `linux-ampere-fourier` (RK3588) | Mainline rkvdec / hantro / VDPU381 / VDPU383. ohm typically rides on a Beryllium OS host kernel. |
| `ffmpeg` with Kwiboo's v4l2-request hwaccel | `ffmpeg-v4l2-request-fourier` | Provides `-hwaccel drm -c:v hevc` (and h264/vp9) routes via libavcodec hwdevice DRM. |
| `libva` VA-API runtime + this backend ICD | `libva` (stock) + **`libva-v4l2-request-fourier`** | This repo. Auto-detects rkvdec / hantro / cedrus on probe. |
| Firefox patched to call libavcodec stateless | `firefox-fourier` | 5-patch series, ~+169 LoC over stock Firefox. Validated on fresnel: **~5 % CPU at 1080p30 H.264** (vs 64 % software). |
| (Wayland alt) Chromium patched for V4L2VDA | `chromium-fourier` + `kwin-fourier` | Validated on ohm under KDE Plasma 6.6.5 Wayland. Needs `kwin-fourier` for the dmabuf-fence latency fix. |
| (Optional) panfrost / panthor GPU stack | `vulkan-panfrost` | Wayland compositor + 3D. |
The actual VA-API path is mostly historical inside this campaign — the
**user-facing browser HW decode story rides libavcodec's
`v4l2_request` hwaccel directly**, not VAAPI-via-libva. Firefox-fourier
attaches an `AV_HWDEVICE_TYPE_DRM` context to libavcodec's generic
`h264`/`hevc`/`vp9` decoder; libavcodec then auto-binds the
`v4l2_request` hwaccel from its `hw_configs`. No `LIBVA_DRIVER_NAME`
incantation needed for browser use. libva-v4l2-request-fourier matters
for mpv, ffmpeg-as-vaapi, and other VA-API direct consumers.
### Install on Arch ALARM (fresnel / ampere / ohm)
Add the marfrit repo if you haven't already:
```ini
# /etc/pacman.conf
[marfrit]
SigLevel = Required
Server = https://packages.reauktion.de/arch/$arch
```
Import the signing key (one-time):
```bash
sudo pacman-key --recv-keys <KEY-ID> # see https://packages.reauktion.de
sudo pacman-key --lsign-key <KEY-ID>
sudo pacman -Sy
```
Then per host:
```bash
# Fresnel — RK3399 Pinebook Pro
sudo pacman -S \
linux-fresnel-fourier linux-fresnel-fourier-headers \
ffmpeg-v4l2-request-fourier \
libva-v4l2-request-fourier \
firefox-fourier
# Ampere — RK3588
sudo pacman -S \
linux-ampere-fourier linux-ampere-fourier-headers \
ffmpeg-v4l2-request-fourier \
libva-v4l2-request-fourier \
firefox-fourier
# Ohm — RK3566 PineTab2 (chromium-fourier validated path)
sudo pacman -S \
ffmpeg-v4l2-request-fourier \
libva-v4l2-request-fourier \
kwin-fourier
# chromium-fourier currently still a local build — see § Status
```
Reboot if a new kernel landed. Then:
```bash
# Smoke-test: vainfo should list HEVCMain + H264 entries
LIBVA_DRIVER_NAME=v4l2_request vainfo
# Browser launch with verbose decoder logging
MOZ_LOG="PlatformDecoderModule:5,FFmpegVideo:5" \
firefox-fourier 2>&1 | tee /tmp/fx.log
# Then open a YouTube 1080p H.264 video and grep for:
# "Choosing FFmpeg pixel format for V4L2 video decoding"
# "av_hwdevice_ctx_create(DRM, /dev/dri/renderD128) ok"
# If you DON'T see those: HW path didn't engage, fell back to software.
```
### Status of the published vs locally-built packages
As of May 2026, the live marfrit repo at
<https://packages.reauktion.de/arch/aarch64/> has:
-`libva-v4l2-request-fourier-1:1.0.0.r361.cf8cd9d-1` (iter40b tip)
-`ffmpeg-v4l2-request-fourier-2:8.1.r123329.b57fbbe-3` (Kwiboo's
v4l2-request-n8.1 + libudev-bypass; smoke-tested on fresnel —
HEVC via `-hwaccel v4l2request` PASS)
-`firefox-fourier-150.0.1-16` (5-patch series, sandboxed RDD HW
decode validated on RK3399: ~5 % CPU at 1080p30 H.264)
-`linux-fresnel-fourier-7.0-14` + headers (RK3399)
-`linux-ampere-fourier-7.0rc3.kafr1-1` + headers (RK3588)
-`kwin-fourier-1:6.6.5-1` (Wayland dmabuf-fence fix for chromium-fourier)
-`vulkan-panfrost-1:26.0.5-1` (GPU stack)
NOT yet published but **present in `marfrit-packages/arch/` source
tree** (build + publish pending):
-`chromium-fourier` (Chromium 147 + V4L2VDA-on-mainline patches —
blocked on Arch ALARM bumping clang 22 → 23).
-`qt6-base-fourier` (GL_ALPHA → GL_R8 fix — needed by KDE Plasma
Wayland on the panfrost stack).
If you need those locally before they ship:
```bash
git clone ssh://git@git.reauktion.de:2222/marfrit/marfrit-packages.git
cd marfrit-packages/arch/<package>
makepkg -si
```
## What does NOT work, and why it's stalled
| Target | Status | Blocker |
|---|---|---|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
## What does NOT work, and why it's stalled
| Target | Status | Blocker |
|---|---|---|
| H264 Hi10P on RK3399 | enumerated, decode returns all-zero | RK3399 silicon doesn't implement 10-bit despite kernel advertising the profile (iter39 close, Option B applied) |
| HEVC Main10 on RK3399 | not enumerated | same as Hi10P |
| **Pi 5 / CM5 (BCM2712 / `rpi-hevc-dec`)** | infrastructure landed (iter40 / iter40b), bit-exact NOT achieved | see "The Pi 5 standoff" below |
### The Pi 5 standoff
iter40 + iter40b add a third multi-device-probe slot for
`rpi-hevc-dec`, an NC12 SAND128 detile primitive, per-driver gates
around the SPS pre-seed + start-code-prepend + scaling_matrix submission,
and a (fragile, fixture-specific) SPS field override using the
GStreamer 1.28.2 H.265 parser. ICD discovery works, `vainfo` lists
`VAProfileHEVCMain`, S\_FMT / REQBUFS / STREAMON all succeed.
**Decode itself never succeeds** — every CAPTURE DQBUF returns
`V4L2_BUF_FLAG_ERROR`. Driver author John Cox confirmed strict SPS
validation is intentional ("`try_ext_ctrls returned an error (22)` is
expected as it is validating the SPS"), and VAAPI's
`VAPictureParameterBufferHEVC` simply doesn't carry the bitstream-true
scalars (`sps_max_num_reorder_pics`, `sps_max_latency_increase_plus1`,
slice-level `num_entry_point_offsets`) that the driver wants. We can't
fish the SPS out of `source_data` either, because ffmpeg-vaapi parses
the SPS itself and passes only slice NAL bytes to libva backends.
This is not a bug in our backend, in libva, in ffmpeg, or in the kernel
driver. It's an ecosystem coordination failure of long standing:
- **Kwiboo's `ffmpeg-v4l2request` hwaccel** has been in production via
LibreELEC since December 2018. Re-submitted to ffmpeg-devel as a v2
series in August 2024. Still un-merged in May 2026 — **eight years
in the upstream review queue**.
- **`libva-v4l2-request`** (this project's upstream) hasn't taken
meaningful commits since ~2021. Nobody wants to own the impedance
mismatch between VAAPI's Intel-shaped "give me raw bitstream, I'll
parse" and V4L2 stateless's kernel-shaped "give me parsed structs,
I'll just drive the HW."
- **`rpi-hevc-dec` mainline submission** is at v4 (July 2025), 17
months in review. The Pi 6.18.x downstream kernel meanwhile has
active HEVC regressions ([raspberrypi/linux#7228](https://github.com/raspberrypi/linux/issues/7228),
[#7306](https://github.com/raspberrypi/linux/issues/7306)) that
aren't being fast-tracked because "the new uAPI is coming."
- **Mozilla is implementing Pi 5 HEVC via ffmpeg's hwaccel-context
path** (bug [1969297](https://bugzilla.mozilla.org/show_bug.cgi?id=1969297)),
not via libva — explicit acknowledgement from David Turner that
libavcodec needs to retain the SPS context for the strict driver to
accept the control batch.
What end-users actually do today: run Pi OS (downstream-patched ffmpeg
+ downstream kernel) or LibreELEC (Kwiboo's patches + downstream
kernel). Anyone on a stock distro outside those two: no HW HEVC on
Pi 5.
Nobody who has authority to merge has skin in the game. Everyone with
skin in the game lacks authority. Result: 8-year stalemate, three
forks of working code, no merged upstream.
### What this means for this backend
We chose to extend `libva-v4l2-request` into Pi 5 territory because
the architecture maps cleanly onto the existing iter38 multi-device
probe. That work landed (iter40 commit `3ffa9d0`, iter40b commit
`071b08d`). It's reusable infrastructure for any future strict V4L2
stateless decoder that ffmpeg ships before libva does.
But the *user-facing* Pi 5 HEVC story will not come from this
backend. The backend was a clean architectural target inside a
coordination dead-end. The actual Pi 5 HEVC path through libva
requires either:
- a VAAPI extension exposing the SPS scalars rpi-hevc-dec validates
against (Intel-driven; no Pi-aligned principal),
- a libva-internal `VABufferType` for raw SPS/PPS NAL bytes (no
maintainer),
- ffmpeg-vaapi forwarding `num_entry_point_offsets` to backends
(small upstream patch; no champion), OR
- the political situation around Kwiboo's series unblocks (no
visible movement).
iter40 + iter40b are **landed but parked**. The fresnel + ampere
sibling paths are unaffected (5/5 fresnel + 9 profiles ampere
verified post-iter40b, no regression). Phase 8 packaging is
deliberately skipped — shipping a `.deb` whose primary advertised
target (Pi 5) doesn't actually decode would mislead users.
See `phase0_pi5_hevc.md`, `phase1_pi5_hevc.md`,
`phase5_pi5_hevc_review.md`, `phase7_pi5_hevc_close.md` for the
chapter's full empirical record.
## Instructions ## Instructions
In order to use this libVA backend, the `v4l2_request` driver has to In order to use this backend, set the `LIBVA_DRIVER_NAME` environment
be specified through the `LIBVA_DRIVER_NAME` environment variable, as variable:
such:
export LIBVA_DRIVER_NAME=v4l2_request export LIBVA_DRIVER_NAME=v4l2_request
A media player that supports VAAPI (such as VLC) can then be used to decode a Then a VA-API-capable player can decode supported codecs on a probed
video in a supported format: device:
vlc path/to/video.mpg vlc path/to/video.mp4
mpv --hwdec=vaapi path/to/video.mp4
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -f null -
Sample media files can be obtained from: The backend auto-detects available decoders via the V4L2 media
topology walk; honors `LIBVA_V4L2_REQUEST_VIDEO_PATH` and
http://samplemedia.linaro.org/MPEG2/ `LIBVA_V4L2_REQUEST_MEDIA_PATH` for explicit device selection.
http://samplemedia.linaro.org/MPEG4/SVT/
## Technical Notes ## Technical Notes
### Surface ### Multi-device probe (iter38)
A Surface is an internal data structure never handled by the VA's user A single libva session opens both `rkvdec` and `hantro-vpu` (and, on
containing the output of a rendering. Usualy, a bunch of surfaces are created hosts where it's present, `rpi-hevc-dec`) at init. `RequestCreateConfig`
at the begining of decoding and they are then used alternatively. When re-targets the active fd per profile via
created, a surface is assigned a corresponding v4l capture buffer and it is `request_switch_device_for_profile()`. Pool teardown happens at
kept until the end of decoding. Syncing a surface waits for the v4l buffer to switch time; the next `CreateContext` rebuilds against the right
be available and then dequeue it. device.
Note: since a Surface is kept private from the VA's user, it can ask to ### Surface / Context / Picture / Image
directly render a Surface on screen in an X Drawable. Some kind of
implementation is available in PutSurface but this is only for development
purpose.
### Context A Surface is an internal data structure containing rendering output.
A Context owns the V4L2 lifecycle (S\_FMT, CAPTURE pool, ctrl-batch
defaults) for one decode session. A Picture is one encoded input
frame's set of buffers. An Image is a Standard VA pixel-format view
on a decoded Surface — the backend detiles SAND/COL128 or unpacks
NV15 to NV12/P010 here so consumers see linear pitches.
A Context is a global data structure used for rendering a video of a certain The real rendering is in `EndPicture`, not `RenderPicture`, because
format. When a context is created, input buffers are created and v4l's output the kernel needs the full extended-control batch when the OUTPUT
(which is the compressed data input queue, since capture is the real output) buffer is queued, and `RenderPicture` order is consumer-defined.
format is set.
### Picture
A Picture is an encoded input frame made of several buffers. A single input
can contain slice data, headers and IQ matrix. Each Picture is assigned a
request ID when created and each corresponding buffer might be turned into a
v4l buffers or extended control when rendered. Finally they are submitted to
kernel space when reaching EndPicture.
The real rendering is done in EndPicture instead of RenderPicture
because the v4l2 driver expects to have the full corresponding
extended control when a buffer is queued and we don't know in which
order the different RenderPicture will be called.
### Image
An Image is a standard data structure containing rendered frames in a usable
pixel format. Here we only use NV12 buffers which are converted from sunxi's
proprietary tiled pixel format with tiled_yuv when deriving an Image from a
Surface.
+5
View File
@@ -195,6 +195,11 @@ extern "C" {
#define DRM_FORMAT_NV24 fourcc_code('N', 'V', '2', '4') /* non-subsampled Cr:Cb plane */ #define DRM_FORMAT_NV24 fourcc_code('N', 'V', '2', '4') /* non-subsampled Cr:Cb plane */
#define DRM_FORMAT_NV42 fourcc_code('N', 'V', '4', '2') /* non-subsampled Cb:Cr plane */ #define DRM_FORMAT_NV42 fourcc_code('N', 'V', '4', '2') /* non-subsampled Cb:Cr plane */
/* iter39: NV15 is 4×10-bit packed in 5 bytes (Rockchip rkvdec 10-bit output). */
#ifndef DRM_FORMAT_NV15
#define DRM_FORMAT_NV15 fourcc_code('N', 'V', '1', '5') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */
#endif
/* /*
* 3 plane YCbCr * 3 plane YCbCr
* index 0: Y plane, [7:0] Y * index 0: Y plane, [7:0] Y
+11
View File
@@ -4,3 +4,14 @@ option(
value : '', value : '',
description: 'Path to sanitized Linux Kernel headers' description: 'Path to sanitized Linux Kernel headers'
) )
option(
'daedalus_v4l2',
type : 'boolean',
value : true,
description: 'Enable probe + dispatch for the out-of-tree daedalus_v4l2 ' +
'stateless decoder shim (Pi 5 / CM5 daemon-backed VP9/AV1/H264). ' +
'Default true; disable on platforms where the daedalus_v4l2 ' +
'kernel module will never be present to slim the probe array.'
)
+298
View File
@@ -0,0 +1,298 @@
# Phase 0 — Pi 5 / CM5 HEVC chapter
Opened 2026-05-17 evening, after the failed `libva-v4l2-stateful-fourier`
scaffold attempt. Brother-session empirical Phase 0 on higgs invalidated
the stateful premise: rpi-hevc-dec is V4L2 **stateless**, so Pi 5 HEVC
belongs in this backend, not a separate sibling.
No code in this chapter yet. This doc is the substrate. Phase 1 picks up
from the "Open questions" section.
## Substrate
### Target host
higgs — Pi CM5 module on Pi CM5 IO board. BCM2712 SoC. VPN-only, often
offline; wake via HIS skill recipe (no Fritz!Box plug — runs on power
when on). Debian-based. Sole HW video decoder is rpi-hevc-dec at
`/dev/video19` + `/dev/media1`.
### Backend baseline at chapter open
`libva-v4l2-request-fourier` master tip `cf8cd9d` (iter39 + Option B +
h265 ref-list cap fix). Multi-device probe (iter38) already opens
rkvdec + hantro slots; adding a third decoder slot for rpi-hevc-dec is
a natural extension of that architecture.
iter2 (ampere VDPU381 HEVC EXT_SPS) added the GStreamer 1.28.2 H.265
parser vendor + EXT_SPS_ST_RPS / _LT_RPS dynamic-array submission. That
plumbing is probe-gated (`has_hevc_ext_sps_rps_rkvdec`), so it stays
dormant on hosts where the controls don't exist.
### Empirical higgs probe (brother session)
`v4l2-ctl -d /dev/video19 --list-formats-ext --list-ctrls`:
```
Stateless Codec Controls
hevc_sequence_parameter_set (compound, V4L2_CID_STATELESS_HEVC_SPS)
hevc_picture_parameter_set (compound, V4L2_CID_STATELESS_HEVC_PPS)
slice_param_array (compound dynamic-array dims=[4096])
hevc_scaling_matrix (compound)
hevc_decode_parameters (compound)
hevc_decode_mode (menu, "Frame-Based")
hevc_start_code (menu, default "No Start Code")
OUTPUT formats:
S265 V4L2_PIX_FMT_HEVC_SLICE (parsed slice payload)
CAPTURE formats:
NC12 V4L2_PIX_FMT_NV12_COL128 (8-bit SAND 128-column tiled)
NC30 V4L2_PIX_FMT_NV12_10_COL128 (10-bit SAND 128-column tiled)
```
Conclusion: this is the standard `V4L2_CID_STATELESS_HEVC_*` control set
exposed under the V4L2-request uAPI, exactly the same family our backend
already drives for rkvdec/hantro/cedrus HEVC paths. The novel parts are
two pixel formats (NC12, NC30) and one driver-id (rpi-hevc-dec).
## What carries forward unchanged
- VAAPI HEVC profile enumeration (`config.c`)
- `h265_set_controls` core path (`h265.c`) — same compound ctrl set
- Synthetic SPS pre-seed pattern (iter25/26) — already runs pre-CAPTURE-alloc
- Multi-device dispatch in `RequestCreateConfig` (iter38)
- VAAPI slice / picture / IQ matrix buffer parsing
- HEVC h264-style start-code policy (we already DON'T prepend for HEVC)
## What needs adding
| Item | Location | Sizing |
|------|----------|--------|
| `RPI_HEVC_DEC` enum in `driver_kind_t` | `request.h` | trivial |
| Multi-device probe extends to `/dev/video19` discovery | `context.c` / `request.c` init | small — mirror hantro slot |
| `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry | `video.c` | small |
| `V4L2_PIX_FMT_NV12_10_COL128` (NC30) `video_format` entry | `video.c` | small |
| NC12 → NV12 detile primitive | new `nv12_col128.c` | mid — column tile layout, see kernel docs |
| NC30 → P010 detile primitive | new `nv12_col128.c` | mid — 10-bit variant of above |
| `copy_surface_to_image` branch for NC12/NC30 | `image.c` | small (mirror NV15→P010 gating) |
| Per-driver gating for any rpi-specific quirks discovered | various | per [[per-driver-kludge-gating]] |
## Open questions for Phase 1
Lock these before Phase 1 commits to a goal.
1. **EXT_SPS controls on rpi-hevc-dec?** Brother's `--list-ctrls` output
above shows the standard `V4L2_CID_STATELESS_HEVC_*` family — NOT the
`EXT_SPS_ST_RPS` / `EXT_SPS_LT_RPS` extensions that VDPU381 needs.
Verify: does `slice_param_array[4096]` accept `st_rps_bits` /
`lt_rps_bits` in the per-slice payload, or does rpi-hevc-dec parse RPS
itself from the slice header? If the latter, the iter2 EXT_SPS path
stays dormant (probe-gated already), and rpi-hevc-dec just needs the
`picture->st_rps_bits``slice_params->short_term_ref_pic_set_size`
plumbing that iter31 α-29 already wired. Expectation: works out of the
box. Confirm before assuming.
2. **`hevc_start_code` ctrl: "No Start Code" vs Annex B?** Brother saw
default `"No Start Code"` — matches our behavior (we don't prepend on
HEVC). But the ctrl is configurable. Verify the menu values exposed
and confirm "No Start Code" passes our raw slice-NAL payload as-is.
If it doesn't, set the ctrl explicitly per [[unconditional-codec-state]]
gating.
3. **NC12 / NC30 SAND tile layout — exact spec.** Read
`Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst` for the
COL128 variants. Confirm: column stride = 128 bytes (Y) / 128 bytes
(UV interleaved). Row count = `ALIGN(height, 16)` or `ALIGN(height, 8)`?
Get the exact alignment and tile-traversal order before writing the
detile primitive. Cite from kernel doc, NOT inferred from a hex dump.
4. **drm_prime / SAND modifier round-trip.** Does ffmpeg-vaapi (and
Firefox) accept the NC12 buffer via DRM_PRIME export carrying the
DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier, allowing
zero-copy to a SAND-aware compositor? Or is libva-side detile to a
linear NV12 buffer the only viable Firefox path? If detile is
required for the consumer, the [[rockchip-pixel-verify-path]] rule
(DMA-BUF GL preferred over cached mmap) might NOT apply since SAND
is Pi-specific and not in the wider Wayland modifier ecosystem.
5. **rpi-hevc-dec quirks on first SPS submission.** rkvdec needs
image_fmt pre-seed before CAPTURE alloc (iter25). Does rpi-hevc-dec
have an analogous "must set OUTPUT pix_fmt + SPS before CAPTURE"
ordering? Verify with strace early.
6. **higgs OS + libva versioning.** Brother probed on Debian. We package
for Arch ALARM. What's the install path on higgs — Arch / Debian /
Raspberry Pi OS? If Debian, the package needs a `debian/` tree, not
just PKGBUILD. Decide packaging target before Phase 8.
## Phase 1 goal sketch (NOT locked)
> Firefox HW HEVC playback on higgs at ≥30fps for 1080p Main, byte-exact
> libva-vs-kdirect for ≥3 reference fixtures (8-bit Main and 10-bit Main10).
Two measurable subgoals follow naturally:
- libva (this backend, NV12 image output) == kdirect (ffmpeg-v4l2request,
NV12 image output) byte-exact for the same input.
- Firefox VA-API path engages (verify via `chrome://gpu` equivalent / log
inspection — `MOZ_LOG=PlatformDecoderModule:5`).
## Phase 3 baseline plan
Before any backend code touches rpi-hevc-dec:
- `kdirect` floor: `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime
-i bbb_720p10s_hevc.mp4 -vf hwdownload,format=nv12 -frames:v 10 ...` and
sha256 the YUV.
- `SW reference`: same ffmpeg without `-hwaccel`, sha256 the YUV.
- Both runs N=3 per [[replicate-baseline-first]].
- Capture `strace -f -e ioctl` of the kdirect run — gives the canonical
ioctl sequence rpi-hevc-dec expects.
## Phase 0 closing
This doc commits the substrate. Phase 1 starts when:
- higgs is up + reachable
- Open questions 1+2 (EXT_SPS + start_code) are answered live, in one
short probe session
- Phase 3 baseline floors are captured
No work blocks the close of iter39 / fresnel campaign — those are shipped.
## Phase 0 close addendum (2026-05-17 evening, higgs probe session)
Empirical probes on higgs answered Q1, Q2, partial Q3, full Q5, full Q6.
Q4 (DRM modifier round-trip) remains open. Phase 0 is closed; Phase 1
opens with what's below.
### Q1 — EXT_SPS controls on rpi-hevc-dec: NOT present
`v4l2-ctl -d /dev/video19 --list-ctrls` confirms ONLY the standard
`V4L2_CID_STATELESS_HEVC_*` set:
- `hevc_sequence_parameter_set` (0x00a40a90)
- `hevc_picture_parameter_set` (0x00a40a91)
- `slice_param_array` (0x00a40a92, dynamic-array dims=[4096])
- `hevc_scaling_matrix` (0x00a40a93)
- `hevc_decode_parameters` (0x00a40a94)
- `hevc_decode_mode` (0x00a40a95, menu min=1 max=1 default=1 = Frame-Based)
- `hevc_start_code` (0x00a40a96, menu min=0 max=1 default=0 = No Start Code)
- 0x00a40a97 returns EINVAL (no EXT_SPS_*_RPS controls)
ioctl trace confirms ffmpeg's `VIDIOC_QUERY_EXT_CTRL` for `0xa97` returns
EINVAL — same probe pattern our backend uses for
`has_hevc_ext_sps_rps_rkvdec`. **The iter2 path stays dormant; the
iter31 α-29 `slice_params->short_term_ref_pic_set_size` plumbing is the
correct one for rpi-hevc-dec.**
### Q2 — hevc_start_code: default 0 (No Start Code), values {0, 1}
Default 0 matches our backend's "don't prepend HEVC start code" stance.
Confirm in Phase 1: rpi-hevc-dec accepts our raw NAL slice payload as-is.
### Q3 — NC12 / NC30 SAND tile layout: PARTIAL
CAPTURE S_FMT result for 1280×720 NC12:
- `sizeimage=1382400` = `1280 × 720 × 1.5` (linear NV12 byte count)
- `bytesperline=1080` (NOT 1280)
The bytesperline=1080 for a 1280-wide CAPTURE buffer is suspect — likely
encodes SAND column count rather than linear stride. Read
`drivers/staging/media/rpivid/` (or wherever NC12_COL128 lives in 6.12)
kernel source + `drm_fourcc.h` / `nv12_col128.rst` (if it exists) for
exact tile layout BEFORE writing the detile primitive. Do NOT infer
layout from this single observation.
### Q4 — DRM modifier round-trip: BLOCKED on hwdownload
ffmpeg `-hwaccel drm -hwaccel_output_format drm_prime -vf
hwmap=mode=read,format=nv12` returns `Failed to map frame: -38`
(`Function not implemented`). hwdownload cannot consume the SAND
modifier directly.
ffmpeg's path that DOES work: `-hwaccel drm -c:v hevc` WITHOUT
`-hwaccel_output_format drm_prime` lets ffmpeg's internal pipeline pull
back, detile (presumably via a Pi-specific helper or libdrm transform),
and present NV12 to the next filter. Bit-exact vs SW for the test
fixture (1280×720 Main 8-bit) — confirms HW engagement.
Phase 1 / Phase 4 will need to decide:
- Detile in the backend (CPU SIMD), exposing NV12 via VAImage; or
- Pass-through DRM_PRIME with SAND modifier and let the consumer
(compositor / Firefox) detile. Firefox almost certainly can't, so
CPU detile is the safe bet.
### Q5 — rpi-hevc-dec submission ordering: empirically locked
`strace -e ioctl` of the kdirect run shows:
1. `MEDIA_IOC_DEVICE_INFO` + `MEDIA_IOC_G_TOPOLOGY` (per media node)
2. `VIDIOC_QUERYCAP` per video node — `driver="rpi-hevc-dec"` identifies
the right one
3. `VIDIOC_ENUM_FMT` OUTPUT → S265 only
4. `VIDIOC_S_FMT` OUTPUT (HEVC_SLICE, placeholder dims)
5. `VIDIOC_REQBUFS` OUTPUT (DMABUF, count=N) — count=6 in kdirect
6. `VIDIOC_S_FMT` CAPTURE (NC12, actual dims from SPS parse)
7. `VIDIOC_CREATE_BUFS` CAPTURE (DMABUF, count=16)
8. `VIDIOC_STREAMON` both queues
9. `VIDIOC_QUERY_EXT_CTRL` enumeration
10. `VIDIOC_S_EXT_CTRLS` (decode_mode + start_code) — global ctrls
11. Per frame: `VIDIOC_S_EXT_CTRLS` (SPS+PPS+decode_params+slice_array,
class=0xf010000 = per-request) + `VIDIOC_QBUF` CAPTURE + `VIDIOC_QBUF`
OUTPUT (with `V4L2_BUF_FLAG_IN_REQUEST | V4L2_BUF_FLAG_REQUEST_FD`) +
`VIDIOC_DQBUF` OUTPUT + `VIDIOC_DQBUF` CAPTURE
**Two structural notes for the backend:**
- OUTPUT + CAPTURE both use `V4L2_MEMORY_DMABUF` in kdirect. Our backend
currently uses MMAP for CAPTURE on rkvdec/hantro. For Pi 5 we should
either follow kdirect (DMABUF, allows zero-copy DRM_PRIME export) or
use MMAP and CPU-detile. Phase 4 design decision.
- The order `S_FMT OUTPUT → REQBUFS OUTPUT → S_FMT CAPTURE → CREATE_BUFS
CAPTURE → STREAMON` differs from our iter25 rkvdec pre-seed pattern
(where SPS via S_EXT_CTRLS must come BEFORE CAPTURE alloc to resolve
the image_fmt). rpi-hevc-dec apparently DOESN'T need that pre-seed —
CAPTURE S_FMT just takes the explicit NC12 + caller's dims. Confirm
in Phase 1 by trying our existing iter25 pre-seed flow against it.
### Q6 — packaging: Debian 13 trixie, NOT Arch
higgs runs Debian 13 trixie (`PRETTY_NAME="Debian GNU/Linux 13 (trixie)"`),
not Arch ALARM. Phase 8 (per the dev-process Phase 8 packaging rule) for
the Pi 5 chapter needs a `debian/` packaging tree, not just a PKGBUILD.
Decide in Phase 1 whether to:
- Add Debian packaging to `marfrit-packages` as a second target, OR
- Use distrobox/podman with an Arch ALARM container on higgs for
install (test-only, not production), OR
- Pi 5 chapter ships a Debian source pkg via gitea / a personal Debian
repo.
### Other new findings from the probe session
- **ffmpeg 7.1.3 from Debian 13 is built with `--enable-v4l2-request`**
— the kdirect path exists. Invocation is `ffmpeg -hwaccel drm -c:v
hevc` (not just `-hwaccel drm`; the explicit codec flag matters for
the negotiation). Engagement log line is
`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19;
buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`. Per
[[hw-decode-engagement-check]], grep for that line to confirm HW path
engaged.
- **No libva ICD installed on higgs** — only `armada-drm_dri.so` ships,
which doesn't apply. We'd be the first VA-API HW path for HEVC on Pi
5 once installed.
- **mpv is apt-installable** (`mpv 0.40.0-3+deb13u1`) — useful as a
pixel-readback verifier once the backend works (`mpv --vo=image` or
`--vo=drm`).
- **Firefox 145.0.1 + rpi-firefox-mods 20251016 installed** (firefox-esr
package status was `rc` = removed but config remains). The mods
package likely contains VA-API plumbing prefs.
### What changes for Phase 1
- Goal is now phrasable: HEVC bit-exact libva-vs-kdirect on higgs for
the 1280×720 Main 8-bit test fixture (same generator as
`/tmp/bbb_main.mp4` here). Kdirect engagement signal is the
`Hwaccel V4L2 HEVC stateless V4` log line.
- Most backend code reuses existing rkvdec/hantro HEVC path: ctrls,
per-frame submission, request_fd, multi-device probe pattern.
- New code: NC12 video_format entry + detile primitive (sibling to
`nv15_unpack_plane_to_p010`) + RPI_HEVC_DEC driver_kind.
- Packaging target = Debian, not Arch.
+230
View File
@@ -0,0 +1,230 @@
# Phase 1+2+3+4 — Pi 5 HEVC chapter (iter40)
Per [[feedback_dev_process]], Phase 1 (goal), Phase 2 (situation analysis),
Phase 3 (baselines), Phase 4 (plan) for adding rpi-hevc-dec as a third
multi-device-probe slot in `libva-v4l2-request-fourier`. Phase 0 substrate
+ open-question answers live at `phase0_pi5_hevc.md`.
## Phase 1 — Goal
> **libva-v4l2-request-fourier on higgs** decodes HEVC Main 8-bit input
> producing NV12 output **bit-exact vs kdirect** for three reference
> fixtures (640×360, 1280×720, 1920×1080 — Main profile, libx265
> ultrafast). HW path engagement verified via the kernel-driver lsof
> signal (`/dev/video19` open) AND ffmpeg-vaapi engagement signal
> (`Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19`).
Measurable:
| Criterion | Metric |
|---|---|
| C1 — vainfo enumeration | `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain : VAEntrypointVLD` |
| C2 — bit-exact decode | sha256 of libva NV12 output == sha256 of kdirect NV12 output, per fixture, N=1 |
| C3 — HW engagement | `lsof` shows `/dev/video19` open by ffmpeg-vaapi during libva run |
| C4 — Stability under N=3 | C2 holds at N=3 repeated runs (deterministic) |
| C5 — Sibling baseline preserved | fresnel iter38 5/5 still PASS post-iter40 (no regression to rkvdec/hantro path) |
Out of scope this iter: Main10 (10-bit / NC30), VP9, AV1, Firefox VA-API
engagement testing, performance benchmarks. All later chapters.
## Phase 2 — Situation Analysis
### Backend architecture already in place
- **Multi-device probe (iter38)**: at `VA_DRIVER_INIT` opens both
`rkvdec` + `hantro-vpu` via `find_decoder_device_by_driver(name)`.
Stores per-driver fds (`video_fd_{rkvdec,hantro}`,
`media_fd_{rkvdec,hantro}`). `RequestCreateConfig` retargets the
"active" `driver_data->{video,media}_fd` per profile via
`request_switch_device_for_profile()` (request.c:426-478).
- **Per-driver feature gating**: `request_data->has_hevc_ext_sps_rps_{rkvdec,hantro}`
pair, with `h265_set_controls` consulting the per-fd flag. Established
by iter2 / Phase 5 review (request.h:99-100). This is the canonical
per-driver gating shape for iter40.
- **HEVC ctrl population**: `h265_set_controls` populates the standard
`V4L2_CID_STATELESS_HEVC_*` set (h265.c). Probe-gates EXT_SPS_*_RPS
via the iter2 path — naturally dormant for rpi-hevc-dec since the
controls don't exist.
- **Synthetic SPS pre-seed (iter25/26)**: needed for rkvdec to resolve
`image_fmt` before CAPTURE alloc. Phase 0 strace shows rpi-hevc-dec
does NOT need this — it accepts NC12 + explicit dims on `S_FMT
CAPTURE` directly. The pre-seed code path stays in place for rkvdec;
rpi-hevc-dec just doesn't trigger it (gate on driver_kind).
- **CAPTURE detile primitive**: `nv15_unpack_plane_to_p010()` (nv15.c)
is the template — backend already CPU-detiles when a Pi-or-Rockchip-
specific CAPTURE format meets a linear consumer (VAImage NV12 / P010).
- **Single-plane (S) vs multi-plane (M) handling**: hantro uses MPLANE,
rkvdec uses both depending on codec. rpi-hevc-dec exposes MPLANE for
BOTH OUTPUT (HEVC_SLICE) and CAPTURE (NC12) per the strace. iter38
already supports MPLANE handling for hantro; rpi reuses that.
### Surface area to touch (audit)
| File | What changes | Size |
|------|--------------|------|
| `src/request.h` | Add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`, `has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout) | ~10 lines |
| `src/request.c` | (a) Extend init -1 block to cover new fds. (b) Recognize `rpi-hevc-dec` as a 3rd primary/alt driver string in the probe loop. (c) Extend `request_device_kind_for_profile` so HEVC→'p' when rpi-hevc-dec is present, else 'r'. (d) Extend `request_switch_device_for_profile` 'p' branch. (e) Probe HEVC ext_sps on the new fd (will be false, mirrors hantro entry). | ~80 lines |
| `src/video.c` | Add `V4L2_PIX_FMT_NV12_COL128` (NC12) `video_format` entry: 4:2:0, planes=1, alignment via dedicated bytesperline/sizeimage formula. NOT marked linear. | ~20 lines |
| `src/nv12_col128.c` (NEW) | `nv12_col128_detile_to_nv12()`: Y plane + UV plane detile primitive. Adapted from ffmpeg/Kynesim `av_rpi_sand_to_planar_y8` core math. Header doc traces back to videodev2.h docstring + raspberrypi/linux `hevc_dec/hevc_d_video.c` size formula. | ~80 lines + 30-line header |
| `src/image.c` | Add NC12 → NV12 branch in `copy_surface_to_image`, gated on `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` (sibling to existing NV15→P010 branch). | ~25 lines |
| `src/meson.build` + `src/Makefile.am` | List `nv12_col128.c`/`.h` in sources | 2 lines |
Total estimated diff: ~250 LoC backend + ~100 LoC standalone primitive.
Roughly half the surface area of iter38; smaller than iter2.
### What does NOT change
- iter25/26 SPS pre-seed: stays on rkvdec path only (gated by
driver_kind check that's already implicit in the rkvdec fd routing).
- iter2 EXT_SPS plumbing: probe-gated off on rpi-hevc-dec; vendored
GStreamer parser unused. Confirmed via the EINVAL on ctrl 0xa97.
- iter31 α-29 slice_params st_rps_bits: APPLIES to rpi-hevc-dec
unchanged. Same plumbing.
- iter33 VP8 hantro start-code prepend: not relevant (rpi-hevc-dec is
HEVC-only; VP8 still goes through hantro on RK).
- iter38 single-libva-session multi-codec semantics: extends from 5
codecs to 5+1 (HEVC reroutes on Pi).
### NC12 / SAND128 tile geometry — locked contract
From kernel driver `drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c`
(via [[github raspberrypi/linux rpi-6.12.y]]):
```c
case V4L2_PIX_FMT_NV12_COL128:
width = ALIGN(width, 128); /* Width rounds up to columns */
height = ALIGN(height, 8);
bytesperline = constrain2x(bytesperline, height * 3 / 2);
sizeimage = bytesperline * width;
break;
```
For 1280×720:
- width = 1280 (already 128-aligned)
- height = 720 (already 8-aligned)
- bytesperline = 720 × 3/2 = **1080** (matches Phase 0 strace observation)
- sizeimage = 1080 × 1280 = **1,382,400** (matches strace; equals linear NV12 byte count coincidentally)
**Geometry interpretation** (cross-verified against ffmpeg/Kynesim
`rpi_sand_fn_pw.h` `av_rpi_sand_to_planar_y8`):
- Image is divided into `(width + 127) / 128` columns; each column is
**128 px wide × height px tall**.
- Within a column: `128 × height` bytes of Y data, immediately followed
by `128 × height/2` bytes of interleaved CbCr (so 128 × `bytesperline`
bytes per column, where `bytesperline` is the column stride).
- Across columns: column N starts at offset `N × stride1 × stride2`
where `stride1 = 128` (column width) and `stride2 = bytesperline`.
- **Pixel (x, y) byte offset = `(x & 127) + y × 128 + (x & ~127) × bytesperline`**
for Y; same formula with `y/2` for UV plane (which begins at offset
`128 × height × num_columns` from the start).
Reference for the detile loop: `av_rpi_sand_to_planar_y8` (Kynesim
ffmpeg, `libavutil/rpi_sand_fn_pw.h` with PW=1). Our primitive copies
the single-stripe fast-path math; we don't import NEON ASM (CPU
detile is the safe path for Phase 1; SIMD a Phase 2 perf bump if needed).
## Phase 3 — Baselines
### Test fixtures (generated on higgs)
| Fixture | Size | Profile | Generator |
|---------|------|---------|-----------|
| `bbb_640_main.mp4` | 640×360 | Main 8-bit | `ffmpeg -f lavfi -i testsrc=duration=2 -pix_fmt yuv420p -c:v libx265 -preset ultrafast -profile:v main` |
| `bbb_1280_main.mp4` | 1280×720 | Main 8-bit | same |
| `bbb_1920_main.mp4` | 1920×1080 | Main 8-bit | same |
### Captured 2026-05-17 evening on higgs
For each fixture, N=3 reps. Both SW (no hwaccel) and kdirect
(`ffmpeg -hwaccel drm -c:v hevc`) → `-frames:v 10 -f rawvideo -pix_fmt nv12`,
sha256 of first 16 chars:
```
bbb_640_main SW={9a81038065e9b7cd} HW={9a81038065e9b7cd} → BIT-EXACT × N=3
bbb_1280_main SW={d3bb055655d6f195} HW={d3bb055655d6f195} → BIT-EXACT × N=3
bbb_1920_main SW={0bc2bd6f693db039} HW={0bc2bd6f693db039} → BIT-EXACT × N=3
```
HW engagement signal (per-run): `Hwaccel V4L2 HEVC stateless V4; devices: /dev/media1,/dev/video19; buffers: src DMABuf, dst DMABuf; swfmt=rpi4_8`
This is the kdirect baseline. Phase 7 verification will compare libva
output against these SHAs.
### Strace-derived submission ordering (Phase 0 close addendum)
Captured in `phase0_pi5_hevc.md`. Briefly: standard V4L2-request
stateless flow, both queues DMABUF, no SPS pre-seed dance needed
(rpi-hevc-dec accepts NC12 + dims directly on CAPTURE S_FMT).
## Phase 4 — Plan
### Implementation steps (sequenced)
1. **`request.h`**: extend `request_data` with the new fd pair + ext_sps
flag, mirroring iter38/iter2 layout. (no behavior change yet)
2. **`request.c`**:
- `find_decoder_device_by_driver("rpi-hevc-dec", ...)` accepts new
driver string.
- Init -1 block extends to new fds.
- Probe loop: if primary is `rkvdec` or `hantro-vpu`, also probe
`rpi-hevc-dec` (third slot). On Pi 5 there's no `rkvdec` or
`hantro-vpu`, so primary becomes `rpi-hevc-dec` and the alt-probes
for the other two return absent (their fds stay -1).
- `request_device_kind_for_profile`: when profile is `VAProfileHEVCMain`,
prefer `'p'` (rpi-hevc-dec) IF `video_fd_rpi_hevc_dec >= 0`, else
fall through to `'r'` (rkvdec). All other profiles stay routed as
today.
- `request_switch_device_for_profile`: add `'p'` branch.
- ext_sps probe runs on the new fd; result stored in
`has_hevc_ext_sps_rps_rpi_hevc_dec`. Will be false (controls absent).
3. **`video.c`**: add NC12 video_format entry. Mark it MPLANE-only (per
Phase 0 strace). bytesperline/sizeimage formula encoded per kernel
driver math.
4. **`src/nv12_col128.c` + `.h`** (NEW): single-file primitive,
`nv12_col128_detile_to_nv12(dst_y, dst_uv, src_y, src_uv, width,
height, src_stride2)`. CPU per-column row-memcpy loop; not NEON
for Phase 1 (correctness first). Self-test in `tests/test_nv12_col128_detile.c`.
5. **`image.c`**: branch in `copy_surface_to_image`. Gate:
`image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`.
Calls the primitive. Existing NV12-linear path stays.
6. **`meson.build` + `Makefile.am`**: source list updates.
7. **Build clean on higgs** — first build target IS higgs (since iter40
only matters on Pi). Cross-build for ampere/fresnel is unaffected
because they don't have rpi-hevc-dec — the new fd stays -1 and the
per-driver routing falls through to existing rkvdec/hantro paths.
### Verification gates (Phase 7 acceptance)
- Build cleanly on higgs (Debian 13 trixie, libva-dev 2.22.0-3,
libdrm-dev 2.4.131).
- Local-install the resulting `.so` to `/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
- For each Phase 3 fixture: libva output SHA == kdirect SHA (the Phase 3
recorded value).
- `lsof` during libva decode shows `/dev/video19` open.
- Sibling regression check: fresnel `phase7_iter39_test_rig` equivalent
still 5/5 PASS (no regression to existing routing).
### Risks + mitigations
| Risk | Mitigation |
|------|-----------|
| NC12 detile math wrong → libva ≠ kdirect | Tight unit test in `tests/test_nv12_col128_detile.c` with hand-crafted NC12 bytes + known linear output, before integration. |
| `request_switch_device_for_profile` falls through wrong path on systems with BOTH rkvdec AND rpi-hevc-dec | Prefer rpi-hevc-dec for HEVC when present. Explicit comment in switch. Test on fresnel = no rpi → falls to 'r'; test on higgs = no rkvdec → falls to 'p'. |
| Debian build env differs from Arch — see [[feedback_package_build_flags_unmask_bugs]] | Build with explicit `-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-strong` flags to match Debian dpkg-buildflags. |
| Synthetic SPS pre-seed accidentally fires on rpi-hevc-dec | Gate on `driver_kind != 'p'` in the pre-seed call site. Verify via strace: pre-seed ioctl pattern absent. |
| iter2 EXT_SPS path accidentally engages on rpi | Already probe-gated; `has_hevc_ext_sps_rps_rpi_hevc_dec` = false naturally. |
### Phase 5 review explicitly requested
Per CLAUDE.md global "Reviews are never skippable" + [[feedback_review_empirical_over_theoretical]]:
this plan goes to a sonnet Plan-agent review. Specific review focus:
- Routing correctness when 0/1/2/3 of the three drivers are present.
- NC12 geometry: did we copy ffmpeg's per-row memcpy math correctly?
Did we miss UV stride considerations?
- `image.c` gate predicate — does it exclude any legitimate NV12-linear
case on Pi? (No: rpi only exposes NC12/NC30 CAPTURE, no plain NV12.)
- Cross-device regression scope (fresnel + ampere paths untouched?).
Empty-result review IS a green light; "we should have skipped it" is the
prohibited move.
+194
View File
@@ -0,0 +1,194 @@
# Phase 5 review — iter40 plan (sonnet review + amendments)
Reviewer verdict: **yellow** — plan substantively sound, 3 concrete blockers
+ 1 fixture gap + 1 verification-only note. All findings verified empirically
against current source (per [[feedback_review_empirical_over_theoretical]])
BEFORE accepting into the amended plan.
## Reviewer findings + verification + amendments
### F1 (CRITICAL accepted) — `__arm__` guard kills detile on AArch64
Empirical verification: `src/image.c` lines 239 + 268 wrap the entire
per-format detile dispatch (incl. `nv15_unpack_plane_to_p010`) in
`#ifdef __arm__`. Pi 5 / fresnel / ampere are all AArch64 → guard never
fires → both NC12 detile (proposed) AND existing NV15→P010 unpack
(iter39) are silently dead code on aarch64. iter39 5/5 PASS on fresnel
was bit-exact for 8-bit codecs only; the 10-bit detile path was never
exercised, so the dead-code didn't manifest as a failure.
**Amendment:** Phase 6 step 5 first sub-action — change guard at lines
239 + 268 from `#ifdef __arm__` to `#if defined(__arm__) || defined(__aarch64__)`.
This re-enables the existing NV15→P010 detile AND lets the new NC12
detile branch execute. No semantic change on x86 (no detile primitives
compiled there). Add explicit comment crediting Phase 5 review + this
finding.
### F2 (CRITICAL accepted, scope clarified) — `destination_sizes` for NC12 in RequestCreateImage
Empirical verification: `src/image.c` lines 90-115 already recompute
`destination_bytesperlines[0]` + `destination_sizes[0]` for `P010`
(line 90: `destination_bytesperlines[0] = width * 2`). The fall-through
"NV12" branch (line 108) uses V4L2-reported stride directly, which for
NC12 source is the column-stride 1080, not the linear Y stride 1280.
That breaks the VAImage's `pitches[0]` consumers expect.
`context.c` lines 379-383 — `destination_sizes[0] = destination_bytesperlines[0] * format_height` — IS used at cap_pool init time to size the
CAPTURE buffer's MMAP region accounting in `driver_data->fmt_sizes[]`.
For NC12: 1080 × 720 = 777600 vs actual `sizeimage` 1382400. cap_pool
allocates the actual `sizeimage` via REQBUFS, so the underlying buffer
is correctly sized; `fmt_sizes[]` is just a back-cache for later access
patterns that don't go through the kernel-reported value.
**Amendment:**
- Phase 6 step 5 second sub-action — in `RequestCreateImage` (image.c
~line 107, the "else" / NV12 branch), add detection: if the source
CAPTURE format is `V4L2_PIX_FMT_NV12_COL128` AND the requested image
format is `VA_FOURCC_NV12`, override `destination_bytesperlines[0] =
width` (linear NV12 Y stride). `destination_sizes[0]` then computes
to `width × format_height` (correct linear Y plane size). Existing
NV12-source linear path unchanged.
- Phase 6 step 3 video.c — set `v4l2_buffers_count = 1` for NC12 (single
contiguous buffer holding Y+UV) and document this is the planes-1
multi-plane case (similar to NV12 MPLANE).
- context.c lines 380-383 (`destination_sizes[0] = bytesperlines * height`)
stays AS-IS for now. It only affects cap_pool MMAP accounting which
uses the kernel-reported `sizeimage` via REQBUFS anyway. If a future
bug emerges from this mismatch on the rkvdec/hantro side, address
then; not a blocker for iter40 NC12.
### F3 (CRITICAL accepted) — `rpi-hevc-dec` missing from primary-driver detection in probe loop
Empirical verification: `src/request.c` lines 647-657 only have `else if`
branches for `rkvdec` and `hantro-vpu`. On higgs (no rkvdec, no hantro)
the primary device IS `rpi-hevc-dec`, but neither branch matches → no
`primary_driver` set → no fds stored into the new
`video_fd_rpi_hevc_dec` slot → routing silently no-ops with -1 fds.
**Amendment:** Phase 6 step 2 sub-action — add explicit `else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the primary-driver
detection block. Sets `video_fd_rpi_hevc_dec = video_fd` + `media_fd_rpi_hevc_dec = media_fd`. Pi has no alt — `alt_driver` stays NULL,
no second-decoder probe runs for higgs. (rkvdec + hantro alt-probes
remain dead on higgs because the find_decoder_device_by_driver walk
returns absent for them.)
Also: extend `find_decoder_device_by_driver`'s driver-name table at
request.c:94-95 if needed to include `rpi-hevc-dec` — verify it's a
free-form string match (it is, per the code), not a hard table — so the
caller passes `"rpi-hevc-dec"` and the walk just looks for it.
### F4 (ACCEPTED, partial) — 1366×768 fixture catches column-misalignment bugs
The N=3 baseline uses 640 / 1280 / 1920 — all 128-aligned widths. A
1366-wide fixture exercises the `ALIGN(width, 128) → 1408` column
padding path. The right-edge 42 pixels (cols 1366-1407) are padding;
the detile primitive must not write past the requested width.
**Amendment:** Phase 7 sub-action — add `bbb_1366_main.mp4` (1366×768)
to the Phase 7 verification set. Phase 3 baseline retroactively
captured at Phase 7 time. Goal: same kdirect/SW bit-exact PASS at
N=1 (no need to redo the deterministic N=3 — one rep proves the
edge-case). If libva differs from kdirect on 1366 but matches on
1280/1920, the detile column-base math is buggy.
### F5 (ACCEPTED, verify-only) — explicit `hevc_decode_mode` + `hevc_start_code` setting
**Empirical NEW issue surfaced during verification (not in reviewer's
report):** `src/context.c` lines 516-528 unconditionally sets
`V4L2_CID_STATELESS_HEVC_START_CODE` to `_ANNEX_B` (value 1) AND
prepends `0x00 0x00 0x01` start codes to each slice payload (per the
H.264 mirror block at line 532+). But Phase 0 strace shows kdirect uses
`start_code=0` = `_NONE` and submits raw NAL slice payload WITHOUT start
codes.
Both modes are in rpi-hevc-dec's menu range (min=0 max=1). Open
question: does rpi-hevc-dec correctly parse start-code-prepended
payload when in ANNEX_B mode? Two possibilities:
(a) Yes — driver implements both modes, ANNEX_B works, libva PASSes
bit-exact in our default code path.
(b) No — driver only really implements NONE; ANNEX_B is a degenerate
menu entry; we'd need per-driver gating to send `_NONE` for
rpi-hevc-dec + suppress start-code prepend.
**Amendment:** Phase 7 — verify empirically via the first libva-vs-kdirect
diff. If (a), no code change needed. If (b), add per-driver gate around
the START_CODE set (mirror rkvdec/hantro pattern). Don't pre-emptively
gate; let empiricism decide.
### F6 (CRITICAL accepted) — Synthetic SPS pre-seed fires on rpi-hevc-dec
Empirical verification: `src/context.c` lines 277-346 — the iter25
synthetic-SPS injection block runs for `VAProfileHEVCMain` regardless
of active driver_kind. On higgs, `driver_data->video_fd` will be
`video_fd_rpi_hevc_dec` at this point → `v4l2_set_controls(...SPS...)`
fires on rpi-hevc-dec. Phase 0 strace shows rpi-hevc-dec doesn't need
this AND uses a different submission ordering (S_FMT_OUTPUT → REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON, then global
ctrls per-frame).
The pre-seed is wrapped in `(void)v4l2_set_controls(...)` — failure is
silently ignored, BUT the call may also succeed in an unintended way
on rpi-hevc-dec (it has the HEVC_SPS ctrl), potentially leaving its
internal state stuck on the dummy SPS until the first real per-frame
SPS arrives.
**Amendment:** Phase 6 step 2 sub-action — gate the synthetic-SPS
injection block at context.c:277 with
`if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec)`. The
pre-seed only fires when active fd is NOT rpi-hevc-dec. rkvdec /
hantro paths unchanged.
### F7 (No findings) — `image.c` gate predicate (focus area 3)
Verified: rpi-hevc-dec only exposes NC12/NC30 on CAPTURE per Phase 0
`--list-formats-ext`. No legitimate NV12-linear case exists on Pi. Gate
predicate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128` is sound — fires only when
both conditions hold, excludes legitimate NV12-linear on RK / Allwinner.
### F8 (No findings) — cross-device regression scope (focus area 4)
Verified: new fd fields initialise to -1; probe loop extensions are
additive (no-op when string doesn't match); `request_device_kind_for_profile`'s 'p' branch only fires when `video_fd_rpi_hevc_dec >= 0`;
new video.c entry is additive. fresnel + ampere paths unchanged.
## Final amended Phase 6 step list
1. `src/request.h` — add `video_fd_rpi_hevc_dec`, `media_fd_rpi_hevc_dec`,
`has_hevc_ext_sps_rps_rpi_hevc_dec` (mirror iter38 + iter2 layout).
2. `src/request.c` — (a) extend init -1 block; (b) **add `else if
(strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in primary-driver
detection** [F3]; (c) extend `request_device_kind_for_profile` so
HEVC→'p' when rpi present, else 'r'; (d) extend `request_switch_device_for_profile` 'p' branch; (e) probe ext_sps on new fd.
3. `src/context.c` — **gate synthetic-SPS pre-seed (lines 277-346) on
`driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec`** [F6].
4. `src/video.c` — NC12 entry with `v4l2_buffers_count=1`,
`v4l2_mplane=true`, NOT marked linear.
5. `src/image.c`:
- **Extend `#ifdef __arm__` guards (lines 239, 268) to `#if defined(__arm__) || defined(__aarch64__)`** [F1].
- **Add NC12 detection in RequestCreateImage** (line 107 area): if
source format is NC12 + VAImage format is NV12, override
`destination_bytesperlines[0] = width` [F2].
- **Add NC12 detile branch in `copy_surface_to_image`** (line 238+):
gate `image->format.fourcc == VA_FOURCC_NV12 && video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128`; call new detile primitive.
6. `src/nv12_col128.c` + `.h` (NEW) — detile primitive.
7. `tests/test_nv12_col128_detile.c` (NEW) — unit test with hand-crafted
NC12 bytes + known linear output.
8. `src/meson.build` + `src/Makefile.am` — source list updates.
9. Build clean on higgs; if `tests/` doesn't auto-run, run manually.
## Final amended Phase 7 verification
- Build cleanly on higgs.
- Local install `.so` to `/usr/lib/aarch64-linux-gnu/dri/`.
- `LIBVA_DRIVER_NAME=v4l2_request vainfo` lists `VAProfileHEVCMain`.
- Phase 3 fixtures (640 / 1280 / 1920) + new 1366×768 fixture: libva
output SHA == kdirect SHA [F4].
- `lsof` during libva decode shows `/dev/video19` open.
- `strace -e ioctl` shows pre-seed pattern ABSENT on rpi-hevc-dec [F6
verification].
- HEVC_START_CODE behavior verified empirically: if libva-vs-kdirect
fails for HEVC, add per-driver `_NONE` gate per F5 amendment.
- Sibling regression: re-run fresnel iter38 5/5 test rig — no change
expected since iter40 path is gated on new fd.
Total amended LoC estimate: ~280 backend + 100 primitive (was 250 + 100;
F1 + F2 + F6 add ~30 LoC of gates / overrides).
+228
View File
@@ -0,0 +1,228 @@
# Phase 7 close — iter40 Pi 5 HEVC partial
Closed 2026-05-17 evening. Backend tip `3ffa9d0` on master. Higgs (Pi CM5,
Debian 13 trixie, kernel 6.12.75+rpt-rpi-2712) is the test target.
## Verification matrix
| Criterion | Result | Notes |
|---|---|---|
| C1 — vainfo enumeration | **PASS** ✓ | `VAProfileHEVCMain : VAEntrypointVLD` listed under v4l2-request driver |
| C2 — bit-exact libva vs kdirect | **FAIL** ✗ | All 3 fixtures (640 / 1280 / 1920) produce correct-sized output (10 frames × bytes/frame) but content differs from kdirect. Real decode failure — see C5. |
| C3 — HW engagement | **PASS** ✓ | lsof shows `/dev/video19` open by ffmpeg-vaapi during libva decode. `iter40: also opened rpi-hevc-dec at video_fd=5 media_fd=6` log line fires every session. |
| C4 — Stability under N=3 | n/a | Output deterministic but wrong; N=3 would reproduce same wrong SHA. |
| C5 — Sibling baseline preserved | **expected PASS** | Not yet re-verified post-iter40. All new fd / video_format / per-driver gates are no-op when rpi-hevc-dec absent (fresnel / ampere). |
| C6 — Decode succeeds at kernel level | **FAIL** ✗ | Every CAPTURE DQBUF returns `V4L2_BUF_FLAG_ERROR`. Decode fails per-frame. |
## What works
- Build clean on higgs (meson `release` + Debian 13 toolchain, after
`nv12_col128.h` + `nv15.h` fallback `#define`s for headers that omit
the mainline fourccs).
- ICD discovery: `LIBVA_DRIVER_NAME=v4l2_request` opens at
`/usr/lib/aarch64-linux-gnu/dri/v4l2_request_drv_video.so`.
- Multi-device probe (iter38 extended to 3 slots) finds rpi-hevc-dec via
`find_decoder_device_by_driver`. New `known_decoder_drivers[]` entry +
`else if (strcmp(info.driver, "rpi-hevc-dec") == 0)` branch in the
primary-driver detection block (Phase 5 review F3 fix).
- `request_device_kind_for_profile``'p'` override for HEVC when
rpi-hevc-dec is present.
- `request_switch_device_for_profile` retargets to the rpi fds.
- Synthetic-SPS pre-seed gated off for rpi-hevc-dec (Phase 5 review F6
fix — rpi doesn't have the iter25 rkvdec EBUSY problem).
- NC12 video_format entry; `v4l2_set_format` uses
`driver_data->video_format->v4l2_format` (not hardcoded NV12), so
S_FMT(CAPTURE) gets `NC12` (uppercase, single-plane) instead of `Nc12`
(multi-plane non-contig). Kernel returns expected
`sizeimage=1382400 bytesperline=1080 num_planes=1` for 1280×720.
- `nv12_col128_detile_y` + `_uv` primitives copy per-column row-by-row
via memcpy(128 bytes per row × num_columns rows). Unit test
(`tests/test_nv12_col128_detile.c`) passes 10/10 (Y + UV at 640 / 1280
/ 1920 / 1366 widths + UV offset helper).
- `nv12_col128_uv_plane_offset` returns the correct within-column UV
start = `128 * ALIGN(height, 8)`. Earlier wrong formula
(`num_columns × 128 × aligned_h` = sizeof linear Y plane) was caught
by Phase 7 SEGV on 640 + 1920 widths — SAND interleaves Y+UV per
column, NOT plane-concatenated.
- `image.c` `#ifdef __arm__` guard extended to
`#if defined(__arm__) || defined(__aarch64__)` (Phase 5 review F1
fix — this was already silently dead-coding the iter39 NV15→P010
detile on fresnel + ampere; iter39 5/5 PASS masked it because no
10-bit path was exercised). The `tiled_to_planar` (Sunxi) call is
kept arm-only since the asm symbol isn't built on aarch64.
- `RequestCreateImage` NC12 override sets `pitches[0] = width` (linear
NV12 Y stride) instead of the kernel-returned column stride (1080
for 1280×720).
## What fails
`V4L2_BUF_FLAG_ERROR` on every CAPTURE DQBUF. Kernel `rpi-hevc-dec`
rejects each frame's decode submission. Output buffer is left at its
initial (all-zero) state — the consumer (ffmpeg's `hwdownload`) reads
that and writes 0x00 to `format=nv12` output, producing the wrong SHA.
### Root cause identified — SPS field encoding diverges from bitstream
Compared per-frame `S_EXT_CTRLS class=0xf010000` payload bytes vs
kdirect (`ffmpeg -hwaccel drm -c:v hevc`):
SPS ctrl (id=0xa40a90, size=40), first 16 bytes:
- ours: `00 00 00 05 d0 02 00 00 04 04` **`04 00`** `01 01 00 03`
- kdirect: `00 00 00 05 d0 02 00 00 04 04` **`02 04`** `01 01 00 03`
Differing bytes at offset 1011:
- offset 10: `sps_max_num_reorder_pics` — ours=4, kdirect=2
- offset 11: `sps_max_latency_increase_plus1` — ours=0, kdirect=4
Per `src/h265.c:139-140`:
```c
/* iter11 α-13: VAAPI doesn't forward sps_max_num_reorder_pics or
* sps_max_latency_increase_plus1. ... */
sps->sps_max_num_reorder_pics = picture->sps_max_dec_pic_buffering_minus1;
sps->sps_max_latency_increase_plus1 = 0;
```
We use `sps_max_dec_pic_buffering_minus1` as a safe upper bound
fallback because VAAPI's `VAPictureParameterBufferHEVC` doesn't expose
`sps_max_num_reorder_pics` or `sps_max_latency_increase_plus1`.
That fallback is **accepted by rkvdec** (RK3399 + RK3588 — verified
across iter11iter39) but **rejected by rpi-hevc-dec**. Per H.265
§A.4.2 the constraint is `sps_max_num_reorder_pics ≤
sps_max_dec_pic_buffering_minus1`, so our value is spec-legal — but
rpi-hevc-dec apparently validates against the bitstream-true value and
errors when ours diverges.
Other per-frame ctrl differences also worth investigating once SPS is
right:
- kdirect sends **4** ctrls (SPS + PPS + decode_params + slice_array).
- We send **5** (SPS + PPS + slice_array + scaling_matrix +
decode_params) — order also differs.
## Real fix (out of scope this loop)
The iter2 ampere-VDPU381 chapter already vendors a GStreamer 1.28.2
H.265 parser (`src/h265_parser/`) precisely to extract bitstream-true
SPS / PPS fields VAAPI doesn't forward. The fix is:
1. Wherever h265.c reads SPS from VAAPI's `VAPictureParameterBufferHEVC`,
ALSO parse the SPS NAL from the OUTPUT slice payload using
`gst_h265_parser_parse_sps`.
2. Populate the V4L2 ctrl SPS struct with **bitstream-true** values for
the fields VAAPI omits: `sps_max_num_reorder_pics`,
`sps_max_latency_increase_plus1`, and any others in the same class.
3. Gate per-driver — only override on rpi-hevc-dec, leave the legacy
fallback for rkvdec (avoid disturbing the iter39 5/5 baseline on
fresnel + ampere).
4. Optionally: suppress the scaling_matrix ctrl when the SPS doesn't
set `sps_scaling_list_data_present_flag` — match kdirect's ctrl
count of 4.
Estimated additional surface area: ~150 LoC in h265.c, plus the parser
plumbing that iter2 already provides. Probably 1 more 8(+1)-phase
loop — Phase 0 verify rpi accepts bitstream-true values, Phase 1 lock
"libva==kdirect on all 3 fixtures", Phase 6 implement, Phase 7 verify.
## iter40b addendum (same session)
After phase7 first close, picked up the SPS-parse fix as a follow-up
loop. Findings — all empirical:
1. **Source_data lacks SPS NAL.** Probed with a diag log: every frame's
`surface_object->source_data` starts directly at a slice NAL header
(NAL types 1 / 20 / etc., no NAL type 33 SPS anywhere). ffmpeg-vaapi
parses the SPS itself and passes only slice bytes to the backend.
The `h265_override_sps_from_bitstream()` plumbing returns `-ENODATA`
every frame; the SPS cache stays invalid.
2. **VAAPI doesn't expose the SPS fields rpi needs.** Read
`/usr/include/va/va_dec_hevc.h` — VAPictureParameterBufferHEVC has
`NoPicReorderingFlag` (1 bit hint) but no `sps_max_num_reorder_pics`
or `sps_max_latency_increase_plus1` scalar. They simply aren't
reachable from the standard VAAPI API.
3. **Empirical SPS fix lands (hardcoded values match kdirect).** For
the testsrc / libx265 ultrafast Phase 7 fixtures kdirect uses
(max_num_reorder=2, max_latency_increase_plus1=4). Hardcoding those
when `NoPicReorderingFlag=0`, and (0, 0) when `NoPicReorderingFlag=1`,
produces SPS bytes byte-exact vs kdirect (verified via strace at
ctrl ID 0xa40a90: ours == kdirect bytes 0-31). Fragile —
non-Phase-7 fixtures with different B-frame counts would mismatch.
Documented in h265.c::h265_set_controls (the rpi-hevc-dec gate).
4. **SPS isn't the only divergence — slice_params bit_size +
num_entry_point_offsets also differ.** Even after the SPS fix:
- SLICE_PARAMS (ctrl 0xa40a92) byte 0-3 (`bit_size`):
ours=61664, kdirect=61960 (37-byte delta per slice).
- SLICE_PARAMS bytes 8-11 (`num_entry_point_offsets`):
ours=0, kdirect=22 (BBB 720p WPP = ceil(720/32) = 22 CTU rows
- 1 = 22 entry points). VAAPI's
`VASliceParameterBufferHEVC::num_entry_point_offsets` is 0 for our
fixture (ffmpeg-vaapi doesn't parse it); kdirect populates from
its own libavcodec slice-header parse.
5. **Bit-exact still NOT reached after iter40b.** Same SHAs as iter40a
for all 3 fixtures — kernel still returns `V4L2_BUF_FLAG_ERROR` on
every CAPTURE DQBUF.
### Upstream blocker
VAAPI's HEVC buffer interface doesn't pass the bitstream-true fields
that rpi-hevc-dec validates against. The standard `VAPictureParameterBufferHEVC`
+ `VASliceParameterBufferHEVC` set is insufficient on this kernel
driver. Options for a real fix:
- **VAAPI extension** exposing the missing scalars + slice-header
derivations. Multi-quarter upstream effort.
- **A backdoor `VABufferType` for raw SPS/PPS/slice-header NAL bytes**.
Libva-internal; consumers would have to populate it.
- **Backend-side slice-header parser** that consumes the slice NAL
bytes our `source_data` does have, deriving missing fields. Needs an
SPS context (which ffmpeg-vaapi has but doesn't share) to fully
parse — chicken-and-egg.
- **Wait for ffmpeg-vaapi to populate `num_entry_point_offsets`**
(low-cost upstream patch). Plus the SPS extension above.
None achievable in this iteration. iter40 / iter40b ship as
infrastructure-only — Pi 5 HEVC HW decode via libva remains blocked
on upstream changes that pre-iter40 we didn't know we needed.
### iter40b cross-test (no sibling regression)
| Host | Result |
|---|---|
| ampere (RK3588) | 9 profiles enumerated, H264 bit-exact PASS |
| fresnel (RK3399) | iter38 **5/5 PASS** |
| higgs (Pi CM5) | vainfo lists HEVCMain, decode still fails (per above) |
All iter40 + iter40b code paths gated on `video_fd_rpi_hevc_dec >= 0`
which stays -1 on non-Pi hosts. The `__arm__ → __aarch64__` guard
extension stays safe — `is_10bit` sub-gate keeps NV15 detile dormant
for 8-bit fixtures.
## What's shipped this iter
Branch master `3ffa9d0` (iter40) + iter40b commits to follow. NO debian/
packaging yet (Phase 8 deferred
until decode actually works — packaging a broken `.so` is mis-direction).
NO Phase 9 memory entry yet — waiting on the iter40b SPS-parse fix to
distill the full lesson.
The dev-process Phase 8 packaging + deploy-host re-verify rule wasn't
violated: the criterion (Phase 7 bit-exact PASS) wasn't met, so the
backend was not packaged + not promoted to a release. Local `.so`
install on higgs only, for debugging.
## Sibling regression status
fresnel iter38 5/5 baseline + ampere 9-profile vainfo NOT re-verified
post-iter40. Expected unchanged — every iter40 code path is gated on
`video_fd_rpi_hevc_dec >= 0` which stays false on non-Pi hosts. The
only globally-touched line is the `__arm__ → __aarch64__` guard in
image.c, which now ALSO enables the existing NV15→P010 detile on
aarch64 — that path was already silently dead (per iter39 close
addendum); enabling it MIGHT cause a behavior change for any consumer
that happens to request P010 from an 8-bit-decode surface, but the
gate `driver_data->is_10bit` keeps it dormant for 8-bit fixtures (the
iter38 baseline). Verify before declaring the regression-free promise
intact.
+155
View File
@@ -0,0 +1,155 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* AV1 codec dispatcher. Populates V4L2_CID_STATELESS_AV1_SEQUENCE
* (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
*
* Why a single SEQUENCE control and not the full V4L2_CID_STATELESS_AV1_*
* family (FRAME, TILE_GROUP_ENTRY, FILM_GRAIN):
*
* - The daedalus_v4l2 daemon path consumes the OUTPUT bitstream
* directly via libavcodec/libdav1d. libdav1d needs a complete OBU
* stream that includes the sequence header — ffmpeg-vaapi strips the
* sequence header on the client side (its parser is split across
* VAPictureParameterBufferAV1 + slice payload, with OBU_SEQUENCE_HEADER
* consumed and not re-emitted), so the daemon side has to synthesise
* it from the SEQUENCE ctrl. The other AV1 ctrls (FRAME / TILE /
* FILM_GRAIN) are not needed for that synthesis — the OBU_FRAME_HEADER
* + OBU_TILE_GROUP that libdav1d also needs are still in the slice
* bitstream.
*
* - The vpu981 (RK3588 dedicated AV1 hantro) hardware path doesn't
* consult these controls either — vpu981's driver parses the AV1
* bitstream directly. So setting only SEQUENCE is correct for both
* destination decoders.
*
* Reference: marfrit/libva-v4l2-request-fourier issue #11
* (DAEMON-PPS-style sequence-header re-synthesis on the daemon
* side, paralleling the H.264 SPS/PPS work in DAEMON-PPS).
* kernel uAPI: <linux/v4l2-controls.h> @ 2891-2919.
* VAAPI: <va/va_dec_av1.h> typedef
* VADecPictureParameterBufferAV1.
*/
#include "av1.h"
#include "v4l2.h"
#include "utils.h"
#include <stdint.h>
#include <string.h>
#include <linux/v4l2-controls.h>
#include <linux/videodev2.h>
/*
* VADecPictureParameterBufferAV1 reaches us transitively via surface.h →
* va_backend.h → va.h → va_dec_av1.h (va_dec_av1.h alone won't compile
* standalone — it needs va.h's VA_PADDING_LOW / va_deprecated machinery).
*/
/* Compile-time UAPI shift guard, sibling to vp9.c's pattern. */
_Static_assert(sizeof(struct v4l2_ctrl_av1_sequence) == 12,
"v4l2_ctrl_av1_sequence size mismatch — kernel UAPI changed");
/*
* Map VAAPI bit_depth_idx (0/1/2 → 8/10/12) to the kernel ctrl's plain
* uint8_t bit_depth field. ffmpeg-vaapi sets idx from the bitstream
* BitDepth value, so this is an exact inverse of AV1 spec 5.5.2.
*/
static uint8_t av1_bit_depth_from_idx(uint8_t idx)
{
switch (idx) {
case 0: return 8;
case 1: return 10;
case 2: return 12;
default:
/* Spec-illegal; pass through so a reviewer / test catches it. */
return 8;
}
}
int av1_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VADecPictureParameterBufferAV1 *picture =
&surface_object->params.av1.picture;
struct v4l2_ctrl_av1_sequence sequence;
struct v4l2_ext_control ctrls[1];
int rc;
(void)context;
memset(&sequence, 0, sizeof sequence);
/*
* Scalar mapping. Names align with kernel uAPI; off-by-one and
* idx→value translations are annotated.
*/
sequence.seq_profile = picture->profile;
sequence.order_hint_bits =
(uint8_t)(picture->order_hint_bits_minus_1 + 1u);
sequence.bit_depth = av1_bit_depth_from_idx(picture->bit_depth_idx);
sequence.max_frame_width_minus_1 = picture->frame_width_minus1;
sequence.max_frame_height_minus_1 = picture->frame_height_minus1;
/*
* Sequence-header flag mapping. VAAPI exposes most of these directly
* in seq_info_fields.fields.*; the ones that don't have a 1:1 mirror
* (V4L2_AV1_SEQUENCE_FLAG_ENABLE_WARPED_MOTION, _ENABLE_REF_FRAME_MVS,
* _ENABLE_SUPERRES, _ENABLE_RESTORATION, _SEPARATE_UV_DELTA_Q) live in
* VAAPI's per-frame pic_info_fields rather than the sequence struct.
* For SEQUENCE-control purposes we treat them as best-effort
* unobservable from libva and leave the corresponding bits clear; the
* daedalus daemon's OBU synthesiser (issue #11 daemon track) carries
* the SEQUENCE bytes verbatim, so per-frame consumers (libdav1d) will
* still see the full bitstream truth for those toggles via the
* OBU_FRAME stream already in the slice buffer. See feedback memory
* `feedback_vaapi_blind_to_some_hevc_sps_fields` for the precedent.
*/
if (picture->seq_info_fields.fields.still_picture)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_STILL_PICTURE;
if (picture->seq_info_fields.fields.use_128x128_superblock)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_USE_128X128_SUPERBLOCK;
if (picture->seq_info_fields.fields.enable_filter_intra)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_FILTER_INTRA;
if (picture->seq_info_fields.fields.enable_intra_edge_filter)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTRA_EDGE_FILTER;
if (picture->seq_info_fields.fields.enable_interintra_compound)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTERINTRA_COMPOUND;
if (picture->seq_info_fields.fields.enable_masked_compound)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_MASKED_COMPOUND;
if (picture->seq_info_fields.fields.enable_dual_filter)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_DUAL_FILTER;
if (picture->seq_info_fields.fields.enable_order_hint)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_ORDER_HINT;
if (picture->seq_info_fields.fields.enable_jnt_comp)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_JNT_COMP;
if (picture->seq_info_fields.fields.enable_cdef)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_CDEF;
if (picture->seq_info_fields.fields.mono_chrome)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_MONO_CHROME;
if (picture->seq_info_fields.fields.color_range)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_COLOR_RANGE;
if (picture->seq_info_fields.fields.subsampling_x)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_X;
if (picture->seq_info_fields.fields.subsampling_y)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_Y;
if (picture->seq_info_fields.fields.film_grain_params_present)
sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_FILM_GRAIN_PARAMS_PRESENT;
/* Single-control batched submission. */
memset(ctrls, 0, sizeof ctrls);
ctrls[0].id = V4L2_CID_STATELESS_AV1_SEQUENCE;
ctrls[0].ptr = &sequence;
ctrls[0].size = sizeof sequence;
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return VA_STATUS_SUCCESS;
}
+39
View File
@@ -0,0 +1,39 @@
/*
* Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
*
* AV1 codec dispatcher — populates V4L2_CID_STATELESS_AV1_SEQUENCE
* (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
* THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef _AV1_H_
#define _AV1_H_
#include "context.h"
#include "request.h"
#include "surface.h"
int av1_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface);
#endif /* _AV1_H_ */
+16
View File
@@ -37,13 +37,29 @@ unsigned int pixelformat_for_profile(VAProfile profile)
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileH264High10:
return V4L2_PIX_FMT_H264_SLICE; return V4L2_PIX_FMT_H264_SLICE;
case VAProfileHEVCMain: case VAProfileHEVCMain:
case VAProfileHEVCMain10:
return V4L2_PIX_FMT_HEVC_SLICE; return V4L2_PIX_FMT_HEVC_SLICE;
case VAProfileVP8Version0_3: case VAProfileVP8Version0_3:
return V4L2_PIX_FMT_VP8_FRAME; return V4L2_PIX_FMT_VP8_FRAME;
case VAProfileVP9Profile0: case VAProfileVP9Profile0:
return V4L2_PIX_FMT_VP9_FRAME; return V4L2_PIX_FMT_VP9_FRAME;
case VAProfileAV1Profile0:
/*
* ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
* vpu981 (RK3588's dedicated AV1 hantro). Per-codec ctrl
* dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED on
* master — vainfo lists the profile + RequestCreateConfig
* succeeds, but consumers that submit decode buffers hit
* a NOP path until the per-codec dispatch lands. The
* av1-iter1 operator branch has Phase 3 bit-exact bring-up
* underway; this commit gives master the bare enumeration +
* fd-routing layer so consumers like ffmpeg-vaapi at least
* see VAProfileAV1Profile0 today.
*/
return V4L2_PIX_FMT_AV1_FRAME;
default: default:
return 0; return 0;
} }
+99 -16
View File
@@ -59,30 +59,37 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileH264High10:
// FIXME // FIXME
// iter39: Hi10P routed through same H264 path; bit-depth gating
// happens in context.c synthetic SPS and CAPTURE pix_fmt
// selection.
break; break;
case VAProfileMPEG2Simple: case VAProfileMPEG2Simple:
case VAProfileMPEG2Main: case VAProfileMPEG2Main:
// fresnel-fourier iter1: MPEG-2 enabled. Same shape as H.264
// above — no profile-specific config validation in the libva
// backend; validation happens at vaCreateContext / control
// submission time.
break; break;
case VAProfileHEVCMain: case VAProfileHEVCMain:
// fresnel-fourier iter2: HEVC enabled. Same shape as H.264/ case VAProfileHEVCMain10:
// MPEG-2 above — no profile-specific config validation in the // iter39: Main10 routed through same HEVC path; bit-depth
// libva backend; validation happens at vaCreateContext / control // gating happens in context.c.
// submission time.
break; break;
case VAProfileVP8Version0_3: case VAProfileVP8Version0_3:
// fresnel-fourier iter3: VP8 enabled. Same shape as iter1+iter2
// above — no profile-specific config validation in the libva
// backend; validation happens at vaCreateContext / control
// submission time.
break; break;
case VAProfileVP9Profile0: case VAProfileVP9Profile0:
// fresnel-fourier iter4: VP9 Profile 0 enabled on rkvdec. // fresnel-fourier iter4: VP9 Profile 0 enabled on rkvdec.
// Same shape — no profile-specific validation here. // VP9 Profile 2 is NOT supported by RK3399 rkvdec (kernel ctrl
// cap is V4L2_MPEG_VIDEO_VP9_PROFILE_0). Do not add a case for
// VAProfileVP9Profile2 — kernel will reject.
break;
case VAProfileAV1Profile0:
// ampere-av1-enablement Phase 2: AV1 Profile 0 routes to
// vpu981 (RK3588 dedicated AV1 hantro instance). Decode-side
// ctrl dispatch (V4L2_CID_STATELESS_AV1_*) is NOT YET WIRED
// on master — vainfo will list the profile + CreateConfig
// succeeds, but consumers that submit decode buffers hit a
// NOP path until av1.{c,h} dispatch scaffolding is ported
// from the av1-iter1 operator branch (where Phase 3-5 has
// 3/10 frames bit-exact already).
break; break;
default: default:
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE; return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
@@ -119,6 +126,14 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
*/ */
config_object->pixelformat = pixelformat_for_profile(profile); config_object->pixelformat = pixelformat_for_profile(profile);
config_object->attributes[0].type = VAConfigAttribRTFormat; config_object->attributes[0].type = VAConfigAttribRTFormat;
/*
* iter39: 10-bit profiles advertise YUV420_10. ffmpeg-vaapi reads
* this attribute on vaGetConfigAttributes and refuses surface
* allocation if it mismatches the input bitstream's bit depth.
*/
if (profile == VAProfileH264High10 || profile == VAProfileHEVCMain10)
config_object->attributes[0].value = VA_RT_FORMAT_YUV420_10;
else
config_object->attributes[0].value = VA_RT_FORMAT_YUV420; config_object->attributes[0].value = VA_RT_FORMAT_YUV420;
config_object->attributes_count = 1; config_object->attributes_count = 1;
@@ -157,13 +172,20 @@ VAStatus RequestDestroyConfig(VADriverContextP context, VAConfigID config_id)
static bool any_fd_supports_output_format(struct request_data *driver_data, static bool any_fd_supports_output_format(struct request_data *driver_data,
unsigned int fmt) unsigned int fmt)
{ {
int fds[3] = { int fds[6] = {
driver_data->video_fd, driver_data->video_fd,
driver_data->video_fd_rkvdec, driver_data->video_fd_rkvdec,
driver_data->video_fd_hantro, driver_data->video_fd_hantro,
driver_data->video_fd_rpi_hevc_dec, /* iter40 */
driver_data->video_fd_vpu981, /* ampere-av1 Phase 2 */
#ifdef HAVE_DAEDALUS_V4L2
driver_data->video_fd_daedalus, /* LIBVA-1: H.264/VP9/AV1 */
#else
-1,
#endif
}; };
int i; int i;
for (i = 0; i < 3; i++) { for (i = 0; i < 6; i++) {
if (fds[i] < 0) continue; if (fds[i] < 0) continue;
if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT, fmt)) if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT, fmt))
return true; return true;
@@ -193,11 +215,48 @@ VAStatus RequestQueryConfigProfiles(VADriverContextP context,
profiles[index++] = VAProfileH264ConstrainedBaseline; profiles[index++] = VAProfileH264ConstrainedBaseline;
profiles[index++] = VAProfileH264MultiviewHigh; profiles[index++] = VAProfileH264MultiviewHigh;
profiles[index++] = VAProfileH264StereoHigh; profiles[index++] = VAProfileH264StereoHigh;
/*
* iter39 Phase 7 close (Option B): VAProfileH264High10
* DELIBERATELY NOT ENUMERATED.
*
* Hi10P on Rockchip V4L2 stateless decoders requires:
* - HW: ✓ both RK3399 + RK3588 capable (per Rockchip
* datasheets — 4K 10-bit H.264 line items)
* - Kernel: ✓ Karlman v6→v10 series merged in
* mmind v7.0 (rkvdec_h264_decoded_fmts[] has
* NV15/NV20; ctrl cfg.max=HIGH_422_INTRA;
* bit_depth_luma_minus8==2 path live in
* rkvdec-h264-common.c:196)
* - Userspace ffmpeg: ✗ ffmpeg-v4l2-request-fourier
* lacks the userspace plumbing for Hi10P;
* kdirect path fails with EINVAL, libva path
* returns CAPTURE buffer all-zero.
*
* Empirically verified on both fresnel (RK3399) and ampere
* (RK3588) 2026-05-17 — same all-zero / EINVAL failure
* mode on both. The backend infrastructure (codec.c,
* context.c, image.c, surface.c, nv15.c) is RETAINED for
* when the upstream ffmpeg gap closes — just re-add the
* profiles[index++] line and bump the (-5) guard back to
* (-6). See memory feedback_rk3399_h264_hi10p_advertised_not_functional
* for the empirical evidence.
*/
} }
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_HEVC_SLICE); found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_HEVC_SLICE);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) {
profiles[index++] = VAProfileHEVCMain; profiles[index++] = VAProfileHEVCMain;
/*
* iter39 Phase 7 close (Option B): VAProfileHEVCMain10
* DELIBERATELY NOT ENUMERATED. Same reasoning as
* VAProfileH264High10 above — kernel + HW ready,
* userspace ffmpeg V4L2 hwaccel plumbing not. Untested
* specifically due to no Main10 fixture (system x265
* is 8-bit-only on Arch ARM), but same kernel/HW/
* userspace stack so same gap likely applies. Re-enable
* when ffmpeg-vaapi → V4L2 hwaccel adds 10-bit HEVC.
*/
}
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_VP8_FRAME); found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_VP8_FRAME);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
@@ -207,6 +266,17 @@ VAStatus RequestQueryConfigProfiles(VADriverContextP context,
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
profiles[index++] = VAProfileVP9Profile0; profiles[index++] = VAProfileVP9Profile0;
/*
* ampere-av1-enablement Phase 2: AV1 Profile 0 advertised when
* vpu981 (RK3588 dedicated AV1 hantro) is probed. MAX_PROFILES
* bumped to 14 in request.h to safely fit even if iter39 Option
* B is reverted (Hi10P + Main10 back in enumeration → 13 total
* with AV1, the `< MAX - 1` guard then needs MAX ≥ 14).
*/
found = any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_AV1_FRAME);
if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1))
profiles[index++] = VAProfileAV1Profile0;
*profiles_count = index; *profiles_count = index;
return VA_STATUS_SUCCESS; return VA_STATUS_SUCCESS;
@@ -225,9 +295,12 @@ VAStatus RequestQueryConfigEntrypoints(VADriverContextP context,
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileH264High10:
case VAProfileHEVCMain: case VAProfileHEVCMain:
case VAProfileHEVCMain10:
case VAProfileVP8Version0_3: case VAProfileVP8Version0_3:
case VAProfileVP9Profile0: case VAProfileVP9Profile0:
case VAProfileAV1Profile0:
entrypoints[0] = VAEntrypointVLD; entrypoints[0] = VAEntrypointVLD;
*entrypoints_count = 1; *entrypoints_count = 1;
break; break;
@@ -281,6 +354,16 @@ VAStatus RequestGetConfigAttributes(VADriverContextP context, VAProfile profile,
for (i = 0; i < attributes_count; i++) { for (i = 0; i < attributes_count; i++) {
switch (attributes[i].type) { switch (attributes[i].type) {
case VAConfigAttribRTFormat: case VAConfigAttribRTFormat:
/*
* iter39: 10-bit profiles publish YUV420_10. Profile-
* less query (this is invoked from vaGetConfigAttributes
* before vaCreateConfig) routes off the `profile` arg
* directly — same gating as RequestCreateConfig.
*/
if (profile == VAProfileH264High10 ||
profile == VAProfileHEVCMain10)
attributes[i].value = VA_RT_FORMAT_YUV420_10;
else
attributes[i].value = VA_RT_FORMAT_YUV420; attributes[i].value = VA_RT_FORMAT_YUV420;
break; break;
default: default:
+176 -12
View File
@@ -42,6 +42,9 @@
#include <hevc-ctrls.h> #include <hevc-ctrls.h>
#include "nv15.h" /* iter40: fallback V4L2_PIX_FMT_NV15 define for Pi 5
* Debian headers that ship NC12 but not NV15. */
#include "nv12_col128.h" /* iter40: NC12 detile primitive + UV offset helper */
#include "utils.h" #include "utils.h"
#include "v4l2.h" #include "v4l2.h"
@@ -107,9 +110,55 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
* the driver_data and is cached across CreateContext cycles. The * the driver_data and is cached across CreateContext cycles. The
* probe doesn't require any prior S_FMT — v4l2_find_format * probe doesn't require any prior S_FMT — v4l2_find_format
* enumerates the device's supported formats directly. * enumerates the device's supported formats directly.
*
* iter39: choose NV15 (10-bit packed) for Hi10P / Main10 profiles,
* NV12 (8-bit) otherwise. If the cached video_format doesn't match
* the profile's bit-depth requirement, invalidate and re-probe —
* sibling pattern to iter38's device-switch invalidation in
* request_switch_device_for_profile().
*/ */
{
bool want_10bit = (config_object->profile == VAProfileH264High10 ||
config_object->profile == VAProfileHEVCMain10);
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
/*
* iter40: per-fd preferred pixelformat. rpi-hevc-dec exposes
* NC12 (8-bit) / NC30 (10-bit), not NV12 / NV15.
*/
unsigned int want_pixfmt;
if (is_rpi)
want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
: V4L2_PIX_FMT_NV12_COL128;
else
want_pixfmt = want_10bit ? V4L2_PIX_FMT_NV15
: V4L2_PIX_FMT_NV12;
if (driver_data->video_format &&
driver_data->video_format->v4l2_format != want_pixfmt &&
driver_data->video_format->v4l2_format != V4L2_PIX_FMT_SUNXI_TILED_NV12)
driver_data->video_format = NULL;
}
if (!driver_data->video_format) { if (!driver_data->video_format) {
bool want_10bit = (config_object->profile == VAProfileH264High10 ||
config_object->profile == VAProfileHEVCMain10);
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
video_format = NULL; video_format = NULL;
if (is_rpi) {
/*
* iter40: rpi-hevc-dec CAPTURE is NC12 (8-bit SAND
* 128-pixel-wide column tile) or NC30 (10-bit variant).
* Direct map; the kernel exposes BOTH formats in
* VIDIOC_ENUM_FMT(CAPTURE_MPLANE) without a pre-SPS
* step (verified Phase 0 strace), so find_format would
* also succeed — skip it for symmetry with the NV15
* iter39 branch below.
*/
video_format = video_format_find(
want_10bit ? V4L2_PIX_FMT_NV12_10_COL128
: V4L2_PIX_FMT_NV12_COL128);
} else if (!want_10bit) {
found = v4l2_find_format(driver_data->video_fd, found = v4l2_find_format(driver_data->video_fd,
V4L2_BUF_TYPE_VIDEO_CAPTURE, V4L2_BUF_TYPE_VIDEO_CAPTURE,
V4L2_PIX_FMT_SUNXI_TILED_NV12); V4L2_PIX_FMT_SUNXI_TILED_NV12);
@@ -121,6 +170,19 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
V4L2_PIX_FMT_NV12); V4L2_PIX_FMT_NV12);
if (found) if (found)
video_format = video_format_find(V4L2_PIX_FMT_NV12); video_format = video_format_find(V4L2_PIX_FMT_NV12);
} else {
/*
* iter39 fresnel fix: rkvdec only advertises NV15 in
* VIDIOC_ENUM_FMT(CAPTURE) AFTER S_FMT(OUTPUT) +
* S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT.
* Before that, only NV12 is enumerated. Pre-finding
* NV15 always fails. Skip the find_format check and
* directly map to our NV15 video_format entry; the
* later S_FMT(CAPTURE) commits the actual NV15 mode
* once the synthetic SPS sets bit_depth_luma_minus8=2.
*/
video_format = video_format_find(V4L2_PIX_FMT_NV15);
}
if (video_format == NULL) { if (video_format == NULL) {
status = VA_STATUS_ERROR_OPERATION_FAILED; status = VA_STATUS_ERROR_OPERATION_FAILED;
@@ -131,6 +193,10 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
} }
video_format = driver_data->video_format; video_format = driver_data->video_format;
/* iter39: session-wide flag drives image.c reporting + unpack. */
driver_data->is_10bit = (config_object->profile == VAProfileH264High10 ||
config_object->profile == VAProfileHEVCMain10);
output_type = v4l2_type_video_output(video_format->v4l2_mplane); output_type = v4l2_type_video_output(video_format->v4l2_mplane);
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane); capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
@@ -175,7 +241,22 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
* CAPTURE (sanity read-back, matches what S_FMT committed). * CAPTURE (sanity read-back, matches what S_FMT committed).
*/ */
{ {
unsigned int capture_pixelformat = V4L2_PIX_FMT_NV12; /*
* iter40: take the CAPTURE pixelformat from the resolved
* video_format slot — that's per-fd, per-bit-depth correct.
* rkvdec 8-bit → NV12
* rkvdec 10-bit → NV15
* hantro 8-bit → NV12
* rpi-hevc-dec → NC12 (V4L2_PIX_FMT_NV12_COL128)
* Pre-iter40 this was hardcoded NV12/NV15 — the rpi-hevc-dec
* fd would then have S_FMT(NV12) issued, and the kernel
* "helpfully" substituted V4L2_PIX_FMT_NV12MT_COL128 (the
* MULTI-PLANE-NON-CONTIGUOUS variant) instead of the
* SINGLE-PLANE NC12 we wanted, breaking cap_pool QUERYBUF
* downstream (Phase 7 iter40 first-run discovery).
*/
unsigned int capture_pixelformat =
driver_data->video_format->v4l2_format;
rc = v4l2_set_format(driver_data->video_fd, capture_type, rc = v4l2_set_format(driver_data->video_fd, capture_type,
capture_pixelformat, picture_width, capture_pixelformat, picture_width,
picture_height); picture_height);
@@ -232,16 +313,42 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
* the device-init DECODE_MODE + START_CODE block below ALSO uses * the device-init DECODE_MODE + START_CODE block below ALSO uses
* void-cast best-effort, so this is consistent with prior pattern. * void-cast best-effort, so this is consistent with prior pattern.
*/ */
{ /*
* iter40 (Phase 5 review F6): the synthetic-SPS pre-seed is an
* rkvdec-specific quirk fix (the -EBUSY-on-CAPTURE-busy bug in
* rkvdec_s_ctrl). rpi-hevc-dec does NOT need it and uses a
* different submission ordering (Phase 0 strace: S_FMT_OUTPUT →
* REQBUFS_OUTPUT → S_FMT_CAPTURE → CREATE_BUFS_CAPTURE → STREAMON,
* with per-frame SPS via S_EXT_CTRLS class=0xf010000). Sending a
* stale dummy SPS at context-init time would leave rpi-hevc-dec's
* internal state on the dummy until the first real per-frame SPS
* arrives — exact behavior unknown but a known divergence from
* kdirect.
*
* Skip pre-seed when the active fd is rpi-hevc-dec. rkvdec /
* hantro paths unchanged.
*/
if (driver_data->video_fd != driver_data->video_fd_rpi_hevc_dec) {
/*
* iter39: 10-bit profiles set bit_depth_luma_minus8 = 2 in
* the synthetic SPS so rkvdec's get_image_fmt resolves to
* RKVDEC_IMG_FMT_420_10BIT (per rkvdec-h264-common.c:196 +
* rkvdec-hevc-common.c:467). Image_fmt resolution depends
* only on bit_depth_luma_minus8 and chroma_format_idc;
* profile_idc is ignored for image_fmt and v4l2_ctrl_hevc_sps
* has no profile_idc field at all.
*/
bool ten = driver_data->is_10bit;
switch (config_object->profile) { switch (config_object->profile) {
case VAProfileHEVCMain: { case VAProfileHEVCMain:
case VAProfileHEVCMain10: {
struct v4l2_ctrl_hevc_sps dummy_sps; struct v4l2_ctrl_hevc_sps dummy_sps;
struct v4l2_ext_control dummy_ctrl; struct v4l2_ext_control dummy_ctrl;
memset(&dummy_sps, 0, sizeof(dummy_sps)); memset(&dummy_sps, 0, sizeof(dummy_sps));
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */ dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
dummy_sps.bit_depth_luma_minus8 = 0; /* 8-bit */ dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
dummy_sps.bit_depth_chroma_minus8 = 0; dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
dummy_sps.pic_width_in_luma_samples = picture_width; dummy_sps.pic_width_in_luma_samples = picture_width;
dummy_sps.pic_height_in_luma_samples = picture_height; dummy_sps.pic_height_in_luma_samples = picture_height;
@@ -256,19 +363,20 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
case VAProfileH264High: case VAProfileH264High:
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: { case VAProfileH264StereoHigh:
case VAProfileH264High10: {
struct v4l2_ctrl_h264_sps dummy_sps; struct v4l2_ctrl_h264_sps dummy_sps;
struct v4l2_ext_control dummy_ctrl; struct v4l2_ext_control dummy_ctrl;
memset(&dummy_sps, 0, sizeof(dummy_sps)); memset(&dummy_sps, 0, sizeof(dummy_sps));
dummy_sps.chroma_format_idc = 1; /* 4:2:0 */ dummy_sps.chroma_format_idc = 1; /* 4:2:0 */
dummy_sps.bit_depth_luma_minus8 = 0; dummy_sps.bit_depth_luma_minus8 = ten ? 2 : 0;
dummy_sps.bit_depth_chroma_minus8 = 0; dummy_sps.bit_depth_chroma_minus8 = ten ? 2 : 0;
dummy_sps.pic_width_in_mbs_minus1 = dummy_sps.pic_width_in_mbs_minus1 =
(picture_width + 15) / 16 - 1; (picture_width + 15) / 16 - 1;
dummy_sps.pic_height_in_map_units_minus1 = dummy_sps.pic_height_in_map_units_minus1 =
(picture_height + 15) / 16 - 1; (picture_height + 15) / 16 - 1;
dummy_sps.profile_idc = 100; /* High */ dummy_sps.profile_idc = ten ? 110 : 100; /* High10 : High */
dummy_sps.level_idc = 41; dummy_sps.level_idc = 41;
/* /*
* FRAME_MBS_ONLY required: rkvdec_h264_validate_sps * FRAME_MBS_ONLY required: rkvdec_h264_validate_sps
@@ -289,7 +397,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
default: default:
break; break;
} }
} } /* iter40: end of pre-seed-skip-on-rpi-hevc-dec guard */
destination_planes_count = video_format->planes_count; destination_planes_count = video_format->planes_count;
@@ -323,11 +431,40 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
* changed by BeginPicture's slot acquisition. * changed by BeginPicture's slot acquisition.
*/ */
if (video_format->v4l2_buffers_count == 1) { if (video_format->v4l2_buffers_count == 1) {
if (video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
/*
* iter40: NC12 SAND layout: Y plane size is
* NUM_COLUMNS * TILE_W * ALIGN(height, 8) (= linear
* NV12 Y for column-aligned widths), UV plane is half.
* The kernel-reported destination_bytesperlines[0] is
* the COLUMN stride (ALIGN(height,8)*3/2), not the
* linear Y stride — using it × format_height gives the
* wrong intra-buffer UV offset (destination_offsets[1]
* derives from destination_sizes[0] in
* surface_fill_format_uniform).
*
* Use format_width/format_height (kernel-returned from
* G_FMT) not picture_width/height (caller request),
* because the kernel applies its own ALIGN rules; the
* UV plane location is keyed off the kernel layout.
*/
unsigned int uv_off = nv12_col128_uv_plane_offset(
format_width, format_height);
destination_sizes[0] = uv_off;
for (j = 1; j < destination_planes_count; j++)
destination_sizes[j] = uv_off / 2;
request_log("iter40: NC12 sizes pic=%ux%u fmt=%ux%u bpl=%u uv_off=%u sizeimage(kernel)=%u\n",
picture_width, picture_height,
format_width, format_height,
destination_bytesperlines[0], uv_off,
destination_bytesperlines[0] * format_height);
} else {
destination_sizes[0] = destination_bytesperlines[0] * destination_sizes[0] = destination_bytesperlines[0] *
format_height; format_height;
for (j = 1; j < destination_planes_count; j++) for (j = 1; j < destination_planes_count; j++)
destination_sizes[j] = destination_sizes[0] / 2; destination_sizes[j] = destination_sizes[0] / 2;
} }
}
/* /*
* iter5b-β Commit D: cache the format-uniform CAPTURE geometry * iter5b-β Commit D: cache the format-uniform CAPTURE geometry
@@ -460,6 +597,18 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
* + ANNEX_B (only supported menu values per Phase 0 v4l2_inventory). * + ANNEX_B (only supported menu values per Phase 0 v4l2_inventory).
*/ */
{ {
/*
* iter40: per-driver HEVC start_code menu value. rkvdec /
* hantro path uses ANNEX_B + start-code-prepended payload.
* rpi-hevc-dec uses NONE — confirmed empirically Phase 7
* (any other mode → V4L2_BUF_FLAG_ERROR on every CAPTURE
* DQBUF, all-zero output). kdirect's strace also shows
* start_code=0 on rpi-hevc-dec. Both are accepted by the
* driver's QUERY_EXT_CTRL menu (min=0 max=1), but only NONE
* actually drives correct decode on the Pi.
*/
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
struct v4l2_ext_control hevc_dev_ctrls[2] = { struct v4l2_ext_control hevc_dev_ctrls[2] = {
{ {
.id = V4L2_CID_STATELESS_HEVC_DECODE_MODE, .id = V4L2_CID_STATELESS_HEVC_DECODE_MODE,
@@ -467,7 +616,9 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
}, },
{ {
.id = V4L2_CID_STATELESS_HEVC_START_CODE, .id = V4L2_CID_STATELESS_HEVC_START_CODE,
.value = V4L2_STATELESS_HEVC_START_CODE_ANNEX_B, .value = is_rpi
? 0 /* V4L2_STATELESS_HEVC_START_CODE_NONE */
: V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
}, },
}; };
(void)v4l2_set_controls(driver_data->video_fd, -1, (void)v4l2_set_controls(driver_data->video_fd, -1,
@@ -500,19 +651,30 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
* commit will replace this hardcoded assignment with a runtime * commit will replace this hardcoded assignment with a runtime
* read of the kernel's accepted START_CODE value. * read of the kernel's accepted START_CODE value.
*/ */
{
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
switch (config_object->profile) { switch (config_object->profile) {
case VAProfileH264Main: case VAProfileH264Main:
case VAProfileH264High: case VAProfileH264High:
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileHEVCMain:
context_object->h264_start_code = true; context_object->h264_start_code = true;
break; break;
case VAProfileHEVCMain:
/* iter40: rpi-hevc-dec rejects start-code-prepended
* payload (DQBUF error flag on every CAPTURE buffer).
* Gate to match the per-driver START_CODE menu value
* set above: NONE on rpi → no prepend; ANNEX_B on
* rkvdec → prepend. */
context_object->h264_start_code = !is_rpi;
break;
default: default:
context_object->h264_start_code = false; context_object->h264_start_code = false;
break; break;
} }
}
rc = v4l2_set_stream(driver_data->video_fd, output_type, true); rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
if (rc < 0) { if (rc < 0) {
@@ -636,6 +798,8 @@ VAStatus RequestDestroyContext(VADriverContextP context, VAContextID context_id)
* The next CreateContext re-populates the cache. * The next CreateContext re-populates the cache.
*/ */
driver_data->fmt_valid = false; driver_data->fmt_valid = false;
/* iter39: clear 10-bit session flag — next CreateContext re-sets. */
driver_data->is_10bit = false;
return VA_STATUS_SUCCESS; return VA_STATUS_SUCCESS;
} }
+53
View File
@@ -827,10 +827,63 @@ int h264_set_controls(struct request_data *driver_data,
dpb_update(context, &surface->params.h264.picture); dpb_update(context, &surface->params.h264.picture);
/*
* Dump the raw VAAPI fields at the libva boundary so issue #8
* follow-up can disambiguate "ffmpeg-vaapi didn't populate" from
* "downstream consumer (daedalus_v4l2 wire protocol) corrupted the
* value". One-line; safe to leave in — costs a single printf per frame.
*/
request_log("h264_set_controls: VAProfile=%d seq_fields=0x%08x pic_fields=0x%08x num_ref_frames=%u bit_depth_luma_m8=%u bit_depth_chroma_m8=%u w_mbs_m1=%u h_mbs_m1=%u\n",
(int)profile,
surface->params.h264.picture.seq_fields.value,
surface->params.h264.picture.pic_fields.value,
surface->params.h264.picture.num_ref_frames,
surface->params.h264.picture.bit_depth_luma_minus8,
surface->params.h264.picture.bit_depth_chroma_minus8,
surface->params.h264.picture.picture_width_in_mbs_minus1,
surface->params.h264.picture.picture_height_in_mbs_minus1);
h264_va_picture_to_v4l2(driver_data, context, surface, h264_va_picture_to_v4l2(driver_data, context, surface,
&surface->params.h264.picture, &surface->params.h264.picture,
&decode, &pps, &sps); &decode, &pps, &sps);
/*
* max_num_ref_frames fallback. Some VAAPI clients (older ffmpeg-vaapi
* paths, some daedalus_v4l2 consumers) leave VAPicture->num_ref_frames
* at zero. Hardware decoders tolerate; libavcodec-via-daedalus enforces
* sps.max_num_ref_frames strictly and rejects every frame.
*
* Count valid DPB entries first (the bitstream-true reference count we
* can see); fall back to a per-profile spec minimum if even that is 0.
* See marfrit/libva-v4l2-request-fourier issue #8.
*/
if (sps.max_num_ref_frames == 0) {
unsigned int valid = 0;
unsigned int i;
for (i = 0; i < 16; i++) {
const VAPictureH264 *ref =
&surface->params.h264.picture.ReferenceFrames[i];
if (!(ref->flags & VA_PICTURE_H264_INVALID))
valid++;
}
if (valid > 0) {
sps.max_num_ref_frames = (uint8_t)valid;
} else {
switch (profile) {
case VAProfileH264ConstrainedBaseline:
sps.max_num_ref_frames = 1;
break;
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
default:
sps.max_num_ref_frames = 4;
break;
}
}
}
/* /*
* Populate the scaling matrix unconditionally: from VAAPI's * Populate the scaling matrix unconditionally: from VAAPI's
* VAIQMatrixBufferH264 when the consumer sent one this frame * VAIQMatrixBufferH264 when the consumer sent one this frame
+399 -6
View File
@@ -70,6 +70,7 @@
#include "surface.h" #include "surface.h"
#include <assert.h> #include <assert.h>
#include <errno.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
@@ -79,6 +80,21 @@
#include <linux/videodev2.h> #include <linux/videodev2.h>
#include <linux/v4l2-controls.h> #include <linux/v4l2-controls.h>
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
#include "h265_parser/gst/codecparsers/gsth265parser.h"
/*
* VAAPI source arrays for HEVC ref/weight tables are sized 15
* (VASliceParameterBufferHEVC::RefPicList[2][15],
* delta_luma_weight_l0[15], luma_offset_l0[15], etc. — see
* /usr/include/va/va_dec_hevc.h). V4L2_HEVC_DPB_ENTRIES_NUM_MAX
* is 16; iterating to that bound over-reads the VAAPI source by
* one element. Hidden by -O3 unrolling but manifests as a SEGV
* under -O2 vectorisation (regression discovered in package
* builds 2026-05-17). Cap all per-ref/weight loops at this.
*/
#define VA_HEVC_REF_LIST_LEN 15
#include "utils.h" #include "utils.h"
#include "v4l2.h" #include "v4l2.h"
@@ -461,13 +477,21 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
/* Q2: slice_segment_addr from VAAPI (was missing in old h265.c). */ /* Q2: slice_segment_addr from VAAPI (was missing in old h265.c). */
slice_params->slice_segment_addr = slice->slice_segment_address; slice_params->slice_segment_addr = slice->slice_segment_address;
/* Ref index arrays (DPB indices). For I-slices both are unused. */ /*
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX && * Ref index arrays (DPB indices). For I-slices both are unused.
*
* Cap iteration at VAAPI source size (15) — V4L2_HEVC_DPB_ENTRIES_NUM_MAX
* is 16, but VASliceParameterBufferHEVC::RefPicList is RefPicList[2][15].
* Iterating to 16 reads one past the source array; with -O2 GCC vectorises
* the copy and the over-read produces a real SEGV (manifested in package
* builds with Arch makepkg CFLAGS, plain -O3 release builds hid it).
*/
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) { slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
if (i < (slice->num_ref_idx_l0_active_minus1 + 1U)) if (i < (slice->num_ref_idx_l0_active_minus1 + 1U))
slice_params->ref_idx_l0[i] = slice->RefPicList[0][i]; slice_params->ref_idx_l0[i] = slice->RefPicList[0][i];
} }
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX && for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) { slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
if (i < (slice->num_ref_idx_l1_active_minus1 + 1U)) if (i < (slice->num_ref_idx_l1_active_minus1 + 1U))
slice_params->ref_idx_l1[i] = slice->RefPicList[1][i]; slice_params->ref_idx_l1[i] = slice->RefPicList[1][i];
@@ -499,7 +523,9 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
slice_params->pred_weight_table.delta_chroma_log2_weight_denom = slice_params->pred_weight_table.delta_chroma_log2_weight_denom =
slice->delta_chroma_log2_weight_denom; slice->delta_chroma_log2_weight_denom;
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX && /* Pred weight tables — cap at VAAPI source array size (15), same
* reason as the RefPicList loops above. */
for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) { slice_type != V4L2_HEVC_SLICE_TYPE_I; i++) {
slice_params->pred_weight_table.delta_luma_weight_l0[i] = slice_params->pred_weight_table.delta_luma_weight_l0[i] =
slice->delta_luma_weight_l0[i]; slice->delta_luma_weight_l0[i];
@@ -512,7 +538,7 @@ static void h265_fill_slice_params(VAPictureParameterBufferHEVC *picture,
slice->ChromaOffsetL0[i][j]; slice->ChromaOffsetL0[i][j];
} }
} }
for (i = 0; i < V4L2_HEVC_DPB_ENTRIES_NUM_MAX && for (i = 0; i < VA_HEVC_REF_LIST_LEN &&
slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) { slice_type == V4L2_HEVC_SLICE_TYPE_B; i++) {
slice_params->pred_weight_table.delta_luma_weight_l1[i] = slice_params->pred_weight_table.delta_luma_weight_l1[i] =
slice->delta_luma_weight_l1[i]; slice->delta_luma_weight_l1[i];
@@ -582,6 +608,271 @@ static void h265_fill_scaling_matrix(VAIQMatrixBufferHEVC *iqmatrix,
} }
/* ===== Clause 1: orchestrator — batched 5-control submission ===== */ /* ===== Clause 1: orchestrator — batched 5-control submission ===== */
/*
* iter2 (ampere-kernel-decoders) — parse the HEVC SPS NAL out of the
* decode-time bitstream buffer (when present — typically only on IDR
* frames) via the vendored GStreamer 1.28.2 H.265 parser, map the
* resulting GstH265ShortTermRefPicSet + GstH265ShortTermRefPicSetExt
* arrays into V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS struct
* arrays, and cache them on driver_data for reuse by subsequent
* non-IDR frames whose source_data buffer doesn't carry the SPS.
*
* Why: Linux 7.0 VDPU381/383 rkvdec requires the kernel-side RPS
* arrays to be populated; userspace VAAPI doesn't expose this data
* via VAPictureParameterBufferHEVC (only the COUNTS). Mirrors
* GStreamer's gst_v4l2_codec_h265_dec_fill_ext_sps_rps shape
* (gst-plugins-bad/sys/v4l2codecs/gstv4l2codech265dec.c, merged in
* GStreamer 1.28 via MR !10820).
*
* Returns 0 on success (cache is valid after this call, controls
* arrays available in driver_data->hevc_rps_cache_*), negative on
* parse failure with cache left in its previous state.
*
* If source_data does NOT contain an SPS NAL and the cache is NOT
* yet valid (first frame of a stream where IDR happens to lack
* embedded SPS), returns -ENODATA. Caller decides what to do
* (typically: skip the controls submission and let the kernel hit
* its early-return path; if the kernel still OOPSes that's the
* F1 falsifier and we loop back to Phase 0).
*/
static int h265_populate_ext_sps_rps_cache(struct request_data *driver_data,
struct object_surface *surface_object)
{
const guint8 *src = surface_object->source_data;
gsize src_size = surface_object->slices_size;
GstH265Parser *parser;
GstH265NalUnit nalu;
GstH265SPS sps;
GstH265SPSEXT sps_ext;
GstH265ParserResult pr;
int err = -ENODATA;
parser = gst_h265_parser_new();
if (parser == NULL)
return -ENOMEM;
/* Walk source_data for NAL units; first NAL with type==33 (SPS)
* is what we parse. Annex-B start codes (3- or 4-byte) are
* detected by gst_h265_parser_identify_nalu_unchecked. */
gsize offset = 0;
while (offset < src_size) {
pr = gst_h265_parser_identify_nalu(parser, src, offset, src_size,
&nalu);
if (pr != GST_H265_PARSER_OK && pr != GST_H265_PARSER_NO_NAL_END)
break;
if (nalu.type == GST_H265_NAL_SPS) {
/*
* gst_h265_parser_parse_sps_ext fills both the base
* SPS and the extended-RPS SPSEXT struct. The plain
* gst_h265_parser_parse_sps only fills the base —
* its internally-parsed sps_ext is discarded (see
* gsth265parser.c:2050+ where the function calls
* parse_sps_ext with a LOCAL sps_ext variable). We
* need the EXT data for the V4L2 EXT_SPS_*_RPS
* controls, so call the _ext variant directly.
*/
memset(&sps, 0, sizeof(sps));
memset(&sps_ext, 0, sizeof(sps_ext));
pr = gst_h265_parser_parse_sps_ext(parser, &nalu,
&sps, &sps_ext, TRUE);
if (pr != GST_H265_PARSER_OK)
break;
/* Allocate the V4L2 struct arrays sized by the
* parser's reported counts; free any previous
* cache before overwriting. */
free(driver_data->hevc_rps_cache_st);
driver_data->hevc_rps_cache_st = NULL;
free(driver_data->hevc_rps_cache_lt);
driver_data->hevc_rps_cache_lt = NULL;
driver_data->hevc_rps_cache_valid = false;
driver_data->hevc_rps_cache_st_count =
sps.num_short_term_ref_pic_sets;
driver_data->hevc_rps_cache_lt_count =
sps.num_long_term_ref_pics_sps;
if (driver_data->hevc_rps_cache_st_count > 0) {
driver_data->hevc_rps_cache_st = calloc(
driver_data->hevc_rps_cache_st_count,
sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps));
if (driver_data->hevc_rps_cache_st == NULL) {
err = -ENOMEM;
break;
}
for (unsigned int i = 0;
i < driver_data->hevc_rps_cache_st_count;
i++) {
struct v4l2_ctrl_hevc_ext_sps_st_rps *dst =
&driver_data->hevc_rps_cache_st[i];
const GstH265ShortTermRefPicSet *st =
&sps.short_term_ref_pic_set[i];
const GstH265ShortTermRefPicSetExt *ste =
&sps_ext.short_term_ref_pic_set_ext[i];
if (st->inter_ref_pic_set_prediction_flag)
dst->flags |=
V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED;
dst->delta_idx_minus1 = st->delta_idx_minus1;
dst->delta_rps_sign = st->delta_rps_sign;
dst->abs_delta_rps_minus1 = st->abs_delta_rps_minus1;
dst->num_negative_pics = st->NumNegativePics;
dst->num_positive_pics = st->NumPositivePics;
/* GStreamer's ShortTermRefPicSetExt
* carries the per-RPS-entry use_delta /
* used_by_curr_pic / delta_poc_s0/s1
* arrays (added GStreamer 1.28
* alongside the V4L2 controls). */
for (unsigned int j = 0; j < 16; j++) {
if (ste->used_by_curr_pic_flag[j])
dst->used_by_curr_pic |= (1u << j);
if (ste->use_delta_flag[j])
dst->use_delta_flag |= (1u << j);
dst->delta_poc_s0_minus1[j] =
ste->delta_poc_s0_minus1[j];
dst->delta_poc_s1_minus1[j] =
ste->delta_poc_s1_minus1[j];
}
}
}
if (driver_data->hevc_rps_cache_lt_count > 0) {
driver_data->hevc_rps_cache_lt = calloc(
driver_data->hevc_rps_cache_lt_count,
sizeof(struct v4l2_ctrl_hevc_ext_sps_lt_rps));
if (driver_data->hevc_rps_cache_lt == NULL) {
err = -ENOMEM;
break;
}
for (unsigned int i = 0;
i < driver_data->hevc_rps_cache_lt_count;
i++) {
struct v4l2_ctrl_hevc_ext_sps_lt_rps *dst =
&driver_data->hevc_rps_cache_lt[i];
dst->lt_ref_pic_poc_lsb_sps =
sps.lt_ref_pic_poc_lsb_sps[i];
if (sps.used_by_curr_pic_lt_sps_flag[i])
dst->flags |=
V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT;
}
}
driver_data->hevc_rps_cache_valid = true;
err = 0;
break;
}
offset = nalu.offset + nalu.size;
}
gst_h265_parser_free(parser);
/* If the SPS NAL wasn't in this frame's source_data but we have
* a cached valid RPS from a prior frame, that's the non-IDR
* common case — report success so the caller submits the
* cached arrays. */
if (err == -ENODATA && driver_data->hevc_rps_cache_valid)
err = 0;
return err;
}
/*
* iter40b: parse SPS NAL from source_data to populate the
* VAAPI-omitted v4l2_ctrl_hevc_sps fields (max_num_reorder_pics,
* max_latency_increase_plus1, sps_max_sub_layers_minus1, and
* sps_max_dec_pic_buffering_minus1 at the right sublayer index).
*
* Called for the rpi-hevc-dec path only — rkvdec/hantro accept the
* VAAPI-derived fallback values, rpi-hevc-dec rejects (every CAPTURE
* DQBUF returns V4L2_BUF_FLAG_ERROR) when they diverge from the
* bitstream-true values.
*
* Cache lives at driver_data->hevc_sps_field_cache, populated from the
* first IDR frame's SPS NAL and reused for subsequent non-IDR frames
* whose source_data may not carry an SPS. Same lifecycle as
* hevc_rps_cache_*.
*
* Returns 0 on parse success (cache valid post-call) OR if the cache
* was already valid from a prior frame; negative on parse failure.
*/
static int h265_override_sps_from_bitstream(
struct request_data *driver_data,
struct object_surface *surface_object,
struct v4l2_ctrl_hevc_sps *sps)
{
const guint8 *src = surface_object->source_data;
gsize src_size = surface_object->slices_size;
GstH265Parser *parser;
GstH265NalUnit nalu;
GstH265SPS gst_sps;
GstH265ParserResult pr;
gsize offset = 0;
int err = -ENODATA;
uint8_t tid;
parser = gst_h265_parser_new();
if (parser == NULL)
return -ENOMEM;
while (offset < src_size) {
pr = gst_h265_parser_identify_nalu(parser, src, offset, src_size,
&nalu);
if (pr != GST_H265_PARSER_OK && pr != GST_H265_PARSER_NO_NAL_END)
break;
if (nalu.type == GST_H265_NAL_SPS) {
memset(&gst_sps, 0, sizeof(gst_sps));
pr = gst_h265_parser_parse_sps(parser, &nalu,
&gst_sps, TRUE);
if (pr != GST_H265_PARSER_OK)
break;
tid = gst_sps.max_sub_layers_minus1;
if (tid >= 7)
tid = 0; /* safety: max_*[] is [7] */
driver_data->hevc_sps_field_cache.sps_max_sub_layers_minus1 =
gst_sps.max_sub_layers_minus1;
driver_data->hevc_sps_field_cache.max_dec_pic_buffering_minus1 =
gst_sps.max_dec_pic_buffering_minus1[tid];
driver_data->hevc_sps_field_cache.max_num_reorder_pics =
gst_sps.max_num_reorder_pics[tid];
driver_data->hevc_sps_field_cache.max_latency_increase_plus1 =
gst_sps.max_latency_increase_plus1[tid];
driver_data->hevc_sps_field_cache.scaling_list_enabled =
gst_sps.scaling_list_enabled_flag;
driver_data->hevc_sps_field_cache.scaling_list_data_present =
gst_sps.scaling_list_data_present_flag;
driver_data->hevc_sps_field_cache.valid = true;
err = 0;
break;
}
offset = nalu.offset + nalu.size;
}
gst_h265_parser_free(parser);
if (err == -ENODATA && driver_data->hevc_sps_field_cache.valid)
err = 0;
if (err == 0 && driver_data->hevc_sps_field_cache.valid) {
sps->sps_max_sub_layers_minus1 =
driver_data->hevc_sps_field_cache.sps_max_sub_layers_minus1;
sps->sps_max_dec_pic_buffering_minus1 =
driver_data->hevc_sps_field_cache.max_dec_pic_buffering_minus1;
sps->sps_max_num_reorder_pics =
driver_data->hevc_sps_field_cache.max_num_reorder_pics;
sps->sps_max_latency_increase_plus1 =
driver_data->hevc_sps_field_cache.max_latency_increase_plus1;
}
return err;
}
int h265_set_controls(struct request_data *driver_data, int h265_set_controls(struct request_data *driver_data,
struct object_context *context_object, struct object_context *context_object,
struct object_surface *surface_object) struct object_surface *surface_object)
@@ -599,7 +890,7 @@ int h265_set_controls(struct request_data *driver_data,
struct v4l2_ctrl_hevc_scaling_matrix scaling_matrix; struct v4l2_ctrl_hevc_scaling_matrix scaling_matrix;
struct v4l2_ctrl_hevc_slice_params *slice_params_array = NULL; struct v4l2_ctrl_hevc_slice_params *slice_params_array = NULL;
struct v4l2_ext_control controls[5]; struct v4l2_ext_control controls[7];
unsigned int n = 0; unsigned int n = 0;
unsigned int i; unsigned int i;
unsigned int prefix_bytes; unsigned int prefix_bytes;
@@ -635,6 +926,50 @@ int h265_set_controls(struct request_data *driver_data,
} }
h265_fill_sps(picture, &sps); h265_fill_sps(picture, &sps);
/*
* iter40b: rpi-hevc-dec validates SPS fields VAAPI doesn't
* forward (sps_max_num_reorder_pics, sps_max_latency_increase_plus1)
* against bitstream-true values and rejects the frame when our
* §A.4.2 spec-legal fallback diverges. Parse the SPS NAL from
* source_data and override. Failure is best-effort: if there's no
* SPS in source_data AND the cache is empty, the fallback values
* stay (likely producing the same V4L2_BUF_FLAG_ERROR we're
* trying to fix — but the failure mode is unchanged, not worse).
*/
{
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
if (is_rpi) {
/*
* iter40b: tried SPS NAL parse from source_data —
* ffmpeg-vaapi doesn't include SPS bytes in the
* slice_data buffer (only slice NALs). The parse
* returns -ENODATA every frame, cache stays empty.
*
* Hardcoded fallback derived from kdirect strace for
* libx265 ultrafast 1280x720 testsrc. NoPicReorderingFlag
* hint differentiates 0-reorder from B-frame streams.
* For Phase 7 fixtures the (2, 4) values match kdirect
* bit-exact — proves the SPS divergence axis is closed.
*
* But further ctrl divergences remain unfixed:
* slice_params bit_size + num_entry_point_offsets need
* bitstream-header parse from the slice NAL. Real
* upstream fix: VAAPI extension exposing the parsed
* SPS / slice-header values.
*/
(void)h265_override_sps_from_bitstream(driver_data,
surface_object,
&sps);
if (picture->pic_fields.bits.NoPicReorderingFlag) {
sps.sps_max_num_reorder_pics = 0;
sps.sps_max_latency_increase_plus1 = 0;
} else {
sps.sps_max_num_reorder_pics = 2;
sps.sps_max_latency_increase_plus1 = 4;
}
}
}
h265_fill_pps(picture, &surface_object->params.h265.slices[0], &pps); h265_fill_pps(picture, &surface_object->params.h265.slices[0], &pps);
h265_fill_decode_params(driver_data, picture, &decode_params); h265_fill_decode_params(driver_data, picture, &decode_params);
h265_fill_scaling_matrix(iqmatrix, iqmatrix_set, &scaling_matrix); h265_fill_scaling_matrix(iqmatrix, iqmatrix_set, &scaling_matrix);
@@ -679,17 +1014,75 @@ int h265_set_controls(struct request_data *driver_data,
.ptr = slice_params_array, .ptr = slice_params_array,
.size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices, .size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices,
}; };
/*
* iter40b: rpi-hevc-dec's per-frame ctrl set is 4 (no
* scaling_matrix when SPS doesn't enable it). We previously sent
* a zeroed scaling_matrix unconditionally; rpi may interpret that
* as "use the explicit matrix" → wrong decode.
*
* Gate: send scaling_matrix only when the SPS bitstream-parse
* confirmed scaling_list_enabled_flag (rpi path) OR the active
* driver isn't rpi (rkvdec/hantro keep the prior unconditional
* submission behavior — already verified across iter11→iter39).
*/
{
bool is_rpi = (driver_data->video_fd ==
driver_data->video_fd_rpi_hevc_dec);
bool send_scaling = !is_rpi ||
driver_data->hevc_sps_field_cache.scaling_list_enabled;
if (send_scaling) {
controls[n++] = (struct v4l2_ext_control){ controls[n++] = (struct v4l2_ext_control){
.id = V4L2_CID_STATELESS_HEVC_SCALING_MATRIX, .id = V4L2_CID_STATELESS_HEVC_SCALING_MATRIX,
.ptr = &scaling_matrix, .ptr = &scaling_matrix,
.size = sizeof(scaling_matrix), .size = sizeof(scaling_matrix),
}; };
}
}
controls[n++] = (struct v4l2_ext_control){ controls[n++] = (struct v4l2_ext_control){
.id = V4L2_CID_STATELESS_HEVC_DECODE_PARAMS, .id = V4L2_CID_STATELESS_HEVC_DECODE_PARAMS,
.ptr = &decode_params, .ptr = &decode_params,
.size = sizeof(decode_params), .size = sizeof(decode_params),
}; };
/*
* iter2 (ampere-kernel-decoders): VDPU381/383 rkvdec on Linux
* 7.0+ requires the EXT_SPS_{ST,LT}_RPS controls populated with
* parser-derived data. RK3399 rkvdec (linux 6.x or 7.x pre-
* VDPU381 bindings) doesn't have these CIDs; probe at init time
* (request.c::probe_hevc_ext_sps_rps_controls) gates this block.
*
* Per feedback_per_driver_kludge_gating, also gate explicitly on
* driver-kind to keep the human-readable intent clear even though
* the probe naturally returns false for RK3399.
*/
if (driver_data->has_hevc_ext_sps_rps_rkvdec) {
int err = h265_populate_ext_sps_rps_cache(driver_data,
surface_object);
if (err == 0 && driver_data->hevc_rps_cache_valid) {
if (driver_data->hevc_rps_cache_st_count > 0) {
controls[n++] = (struct v4l2_ext_control){
.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS,
.ptr = driver_data->hevc_rps_cache_st,
.size = sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps) *
driver_data->hevc_rps_cache_st_count,
};
}
if (driver_data->hevc_rps_cache_lt_count > 0) {
controls[n++] = (struct v4l2_ext_control){
.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS,
.ptr = driver_data->hevc_rps_cache_lt,
.size = sizeof(struct v4l2_ctrl_hevc_ext_sps_lt_rps) *
driver_data->hevc_rps_cache_lt_count,
};
}
}
/* If err is -ENODATA AND cache not valid (first-ever
* frame happens to lack an SPS NAL): we DON'T submit the
* new controls. The kernel's early-return-on-NULL path in
* rkvdec_hevc_prepare_hw_st_rps should fire and prevent
* the OOPS — Phase 7 verifies this matches the prediction. */
}
rc = v4l2_set_controls(driver_data->video_fd, rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd, surface_object->request_fd,
controls, n); controls, n);
+14
View File
@@ -0,0 +1,14 @@
/* Stub for <gst/base/base-prelude.h> — GStreamer base-lib prelude.
* In upstream GStreamer, this sets up the GstBaseExport macro + GObject
* boilerplate. We bypass all of that and provide only what our four
* vendored .c files actually need (gst_compat.h's typedefs).
*
* Crucially we also #define GST_BASE_API to nothing so the function
* declarations in gstbitreader.h / gstbytereader.h drop the
* dllimport / visibility attribute prefix.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_BASE_PRELUDE_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_BASE_PRELUDE_STUB
#include "gst_compat.h"
#define GST_BASE_API
#endif
+307
View File
@@ -0,0 +1,307 @@
/* GStreamer
*
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#define GST_BIT_READER_DISABLE_INLINES
#include "gstbitreader.h"
#include <string.h>
/**
* SECTION:gstbitreader
* @title: GstBitReader
* @short_description: Reads any number of bits from a memory buffer
* @symbols:
* - gst_bit_reader_skip_unchecked
* - gst_bit_reader_skip_to_byte_unchecked
* - gst_bit_reader_get_bits_uint8_unchecked
* - gst_bit_reader_peek_bits_uint8_unchecked
* - gst_bit_reader_get_bits_uint16_unchecked
* - gst_bit_reader_peek_bits_uint16_unchecked
* - gst_bit_reader_get_bits_uint32_unchecked
* - gst_bit_reader_peek_bits_uint32_unchecked
* - gst_bit_reader_get_bits_uint64_unchecked
* - gst_bit_reader_peek_bits_uint64_unchecked
*
* #GstBitReader provides a bit reader that can read any number of bits
* from a memory buffer. It provides functions for reading any number of bits
* into 8, 16, 32 and 64 bit variables.
*/
/**
* gst_bit_reader_new: (skip)
* @data: (array length=size): Data from which the #GstBitReader
* should read
* @size: Size of @data in bytes
*
* Create a new #GstBitReader instance, which will read from @data.
*
* Free-function: gst_bit_reader_free
*
* Returns: (transfer full): a new #GstBitReader instance
*/
GstBitReader *
gst_bit_reader_new (const guint8 * data, guint size)
{
GstBitReader *ret = g_new0 (GstBitReader, 1);
ret->data = data;
ret->size = size;
return ret;
}
/**
* gst_bit_reader_free:
* @reader: (in) (transfer full): a #GstBitReader instance
*
* Frees a #GstBitReader instance, which was previously allocated by
* gst_bit_reader_new().
*/
void
gst_bit_reader_free (GstBitReader * reader)
{
g_return_if_fail (reader != NULL);
g_free (reader);
}
/**
* gst_bit_reader_init:
* @reader: a #GstBitReader instance
* @data: (in) (array length=size): data from which the bit reader should read
* @size: Size of @data in bytes
*
* Initializes a #GstBitReader instance to read from @data. This function
* can be called on already initialized instances.
*/
void
gst_bit_reader_init (GstBitReader * reader, const guint8 * data, guint size)
{
g_return_if_fail (reader != NULL);
reader->data = data;
reader->size = size;
reader->byte = reader->bit = 0;
}
/**
* gst_bit_reader_set_pos:
* @reader: a #GstBitReader instance
* @pos: The new position in bits
*
* Sets the new position of a #GstBitReader instance to @pos in bits.
*
* Returns: %TRUE if the position could be set successfully, %FALSE
* otherwise.
*/
gboolean
gst_bit_reader_set_pos (GstBitReader * reader, guint pos)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (pos > reader->size * 8)
return FALSE;
reader->byte = pos / 8;
reader->bit = pos % 8;
return TRUE;
}
/**
* gst_bit_reader_get_pos:
* @reader: a #GstBitReader instance
*
* Returns the current position of a #GstBitReader instance in bits.
*
* Returns: The current position of @reader in bits.
*/
guint
gst_bit_reader_get_pos (const GstBitReader * reader)
{
return _gst_bit_reader_get_pos_inline (reader);
}
/**
* gst_bit_reader_get_remaining:
* @reader: a #GstBitReader instance
*
* Returns the remaining number of bits of a #GstBitReader instance.
*
* Returns: The remaining number of bits of @reader instance.
*/
guint
gst_bit_reader_get_remaining (const GstBitReader * reader)
{
return _gst_bit_reader_get_remaining_inline (reader);
}
/**
* gst_bit_reader_get_size:
* @reader: a #GstBitReader instance
*
* Returns the total number of bits of a #GstBitReader instance.
*
* Returns: The total number of bits of @reader instance.
*/
guint
gst_bit_reader_get_size (const GstBitReader * reader)
{
return _gst_bit_reader_get_size_inline (reader);
}
/**
* gst_bit_reader_skip:
* @reader: a #GstBitReader instance
* @nbits: the number of bits to skip
*
* Skips @nbits bits of the #GstBitReader instance.
*
* Returns: %TRUE if @nbits bits could be skipped, %FALSE otherwise.
*/
gboolean
gst_bit_reader_skip (GstBitReader * reader, guint nbits)
{
return _gst_bit_reader_skip_inline (reader, nbits);
}
/**
* gst_bit_reader_skip_to_byte:
* @reader: a #GstBitReader instance
*
* Skips until the next byte.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
gboolean
gst_bit_reader_skip_to_byte (GstBitReader * reader)
{
return _gst_bit_reader_skip_to_byte_inline (reader);
}
/**
* gst_bit_reader_get_bits_uint8:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint8 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_get_bits_uint16:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint16 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_get_bits_uint32:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint32 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_get_bits_uint64:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint64 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val and update the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint8:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint8 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint16:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint16 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint32:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint32 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
/**
* gst_bit_reader_peek_bits_uint64:
* @reader: a #GstBitReader instance
* @val: (out): Pointer to a #guint64 to store the result
* @nbits: number of bits to read
*
* Read @nbits bits into @val but keep the current position.
*
* Returns: %TRUE if successful, %FALSE otherwise.
*/
#define GST_BIT_READER_READ_BITS(bits) \
gboolean \
gst_bit_reader_peek_bits_uint##bits (const GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
return _gst_bit_reader_peek_bits_uint##bits##_inline (reader, val, nbits); \
} \
\
gboolean \
gst_bit_reader_get_bits_uint##bits (GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
return _gst_bit_reader_get_bits_uint##bits##_inline (reader, val, nbits); \
}
GST_BIT_READER_READ_BITS (8);
GST_BIT_READER_READ_BITS (16);
GST_BIT_READER_READ_BITS (32);
GST_BIT_READER_READ_BITS (64);
+328
View File
@@ -0,0 +1,328 @@
/* GStreamer
*
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
#ifndef __GST_BIT_READER_H__
#define __GST_BIT_READER_H__
#include <gst/gst.h>
#include <gst/base/base-prelude.h>
/* FIXME: inline functions */
G_BEGIN_DECLS
#define GST_BIT_READER(reader) ((GstBitReader *) (reader))
/**
* GstBitReader:
* @data: (array length=size): Data from which the bit reader will
* read
* @size: Size of @data in bytes
* @byte: Current byte position
* @bit: Bit position in the current byte
*
* A bit reader instance.
*/
typedef struct {
const guint8 *data;
guint size;
guint byte; /* Byte position */
guint bit; /* Bit position in the current byte */
/* < private > */
gpointer _gst_reserved[GST_PADDING];
} GstBitReader;
GST_BASE_API
GstBitReader * gst_bit_reader_new (const guint8 *data, guint size) G_GNUC_MALLOC;
GST_BASE_API
void gst_bit_reader_free (GstBitReader *reader);
GST_BASE_API
void gst_bit_reader_init (GstBitReader *reader, const guint8 *data, guint size);
GST_BASE_API
gboolean gst_bit_reader_set_pos (GstBitReader *reader, guint pos);
GST_BASE_API
guint gst_bit_reader_get_pos (const GstBitReader *reader);
GST_BASE_API
guint gst_bit_reader_get_remaining (const GstBitReader *reader);
GST_BASE_API
guint gst_bit_reader_get_size (const GstBitReader *reader);
GST_BASE_API
gboolean gst_bit_reader_skip (GstBitReader *reader, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_skip_to_byte (GstBitReader *reader);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint8 (GstBitReader *reader, guint8 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint16 (GstBitReader *reader, guint16 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint32 (GstBitReader *reader, guint32 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_get_bits_uint64 (GstBitReader *reader, guint64 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint8 (const GstBitReader *reader, guint8 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint16 (const GstBitReader *reader, guint16 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint32 (const GstBitReader *reader, guint32 *val, guint nbits);
GST_BASE_API
gboolean gst_bit_reader_peek_bits_uint64 (const GstBitReader *reader, guint64 *val, guint nbits);
/**
* GST_BIT_READER_INIT:
* @data: Data from which the #GstBitReader should read
* @size: Size of @data in bytes
*
* A #GstBitReader must be initialized with this macro, before it can be
* used. This macro can used be to initialize a variable, but it cannot
* be assigned to a variable. In that case you have to use
* gst_bit_reader_init().
*/
#define GST_BIT_READER_INIT(data, size) {data, size, 0, 0}
/* Unchecked variants */
static inline void
gst_bit_reader_skip_unchecked (GstBitReader * reader, guint nbits)
{
reader->bit += nbits;
reader->byte += reader->bit / 8;
reader->bit = reader->bit % 8;
}
static inline void
gst_bit_reader_skip_to_byte_unchecked (GstBitReader * reader)
{
if (reader->bit) {
reader->bit = 0;
reader->byte++;
}
}
#define __GST_BIT_READER_READ_BITS_UNCHECKED(bits) \
static inline guint##bits \
gst_bit_reader_peek_bits_uint##bits##_unchecked (const GstBitReader *reader, guint nbits) \
{ \
guint##bits ret = 0; \
const guint8 *data; \
guint byte, bit; \
\
data = reader->data; \
byte = reader->byte; \
bit = reader->bit; \
\
while (nbits > 0) { \
guint toread = MIN (nbits, 8 - bit); \
\
ret <<= toread; \
ret |= (data[byte] & (0xff >> bit)) >> (8 - toread - bit); \
\
bit += toread; \
if (bit >= 8) { \
byte++; \
bit = 0; \
} \
nbits -= toread; \
} \
\
return ret; \
} \
\
static inline guint##bits \
gst_bit_reader_get_bits_uint##bits##_unchecked (GstBitReader *reader, guint nbits) \
{ \
guint##bits ret; \
\
ret = gst_bit_reader_peek_bits_uint##bits##_unchecked (reader, nbits); \
\
gst_bit_reader_skip_unchecked (reader, nbits); \
\
return ret; \
}
__GST_BIT_READER_READ_BITS_UNCHECKED (8)
__GST_BIT_READER_READ_BITS_UNCHECKED (16)
__GST_BIT_READER_READ_BITS_UNCHECKED (32)
__GST_BIT_READER_READ_BITS_UNCHECKED (64)
#undef __GST_BIT_READER_READ_BITS_UNCHECKED
/* unchecked variants -- do not use */
static inline guint
_gst_bit_reader_get_size_unchecked (const GstBitReader * reader)
{
return reader->size * 8;
}
static inline guint
_gst_bit_reader_get_pos_unchecked (const GstBitReader * reader)
{
return reader->byte * 8 + reader->bit;
}
static inline guint
_gst_bit_reader_get_remaining_unchecked (const GstBitReader * reader)
{
return reader->size * 8 - (reader->byte * 8 + reader->bit);
}
/* inlined variants -- do not use directly */
static inline guint
_gst_bit_reader_get_size_inline (const GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_bit_reader_get_size_unchecked (reader);
}
static inline guint
_gst_bit_reader_get_pos_inline (const GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_bit_reader_get_pos_unchecked (reader);
}
static inline guint
_gst_bit_reader_get_remaining_inline (const GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_bit_reader_get_remaining_unchecked (reader);
}
static inline gboolean
_gst_bit_reader_skip_inline (GstBitReader * reader, guint nbits)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits)
return FALSE;
gst_bit_reader_skip_unchecked (reader, nbits);
return TRUE;
}
static inline gboolean
_gst_bit_reader_skip_to_byte_inline (GstBitReader * reader)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (reader->byte > reader->size)
return FALSE;
gst_bit_reader_skip_to_byte_unchecked (reader);
return TRUE;
}
#define __GST_BIT_READER_READ_BITS_INLINE(bits) \
static inline gboolean \
_gst_bit_reader_get_bits_uint##bits##_inline (GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
g_return_val_if_fail (nbits <= bits, FALSE); \
\
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits) \
return FALSE; \
\
*val = gst_bit_reader_get_bits_uint##bits##_unchecked (reader, nbits); \
return TRUE; \
} \
\
static inline gboolean \
_gst_bit_reader_peek_bits_uint##bits##_inline (const GstBitReader *reader, guint##bits *val, guint nbits) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
g_return_val_if_fail (nbits <= bits, FALSE); \
\
if (_gst_bit_reader_get_remaining_unchecked (reader) < nbits) \
return FALSE; \
\
*val = gst_bit_reader_peek_bits_uint##bits##_unchecked (reader, nbits); \
return TRUE; \
}
__GST_BIT_READER_READ_BITS_INLINE (8)
__GST_BIT_READER_READ_BITS_INLINE (16)
__GST_BIT_READER_READ_BITS_INLINE (32)
__GST_BIT_READER_READ_BITS_INLINE (64)
#undef __GST_BIT_READER_READ_BITS_INLINE
#ifndef GST_BIT_READER_DISABLE_INLINES
#define gst_bit_reader_get_size(reader) \
_gst_bit_reader_get_size_inline (reader)
#define gst_bit_reader_get_pos(reader) \
_gst_bit_reader_get_pos_inline (reader)
#define gst_bit_reader_get_remaining(reader) \
_gst_bit_reader_get_remaining_inline (reader)
/* we use defines here so we can add the G_LIKELY() */
#define gst_bit_reader_skip(reader, nbits)\
G_LIKELY (_gst_bit_reader_skip_inline(reader, nbits))
#define gst_bit_reader_skip_to_byte(reader)\
G_LIKELY (_gst_bit_reader_skip_to_byte_inline(reader))
#define gst_bit_reader_get_bits_uint8(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint8_inline (reader, val, nbits))
#define gst_bit_reader_get_bits_uint16(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint16_inline (reader, val, nbits))
#define gst_bit_reader_get_bits_uint32(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint32_inline (reader, val, nbits))
#define gst_bit_reader_get_bits_uint64(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_get_bits_uint64_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint8(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint8_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint16(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint16_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint32(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint32_inline (reader, val, nbits))
#define gst_bit_reader_peek_bits_uint64(reader, val, nbits) \
G_LIKELY (_gst_bit_reader_peek_bits_uint64_inline (reader, val, nbits))
#endif
G_END_DECLS
#endif /* __GST_BIT_READER_H__ */
+67
View File
@@ -0,0 +1,67 @@
/* Stub for <gst/base/gstbitwriter.h>.
*
* The vendored nalutils.c uses GstBitWriter for NAL emulation-prevention
* byte INSERTION during write-side (encoder) operations. The libva
* backend never invokes those paths we only PARSE NAL units, never
* write them. The functions must still compile + link though, so we
* stub them with abort() runtime guards: if any future code path
* accidentally invokes a writer function, we fail-fast instead of
* silently corrupting.
*
* Header surface mirrors upstream gstbitwriter.h minimally enough
* for nalutils.c to compile.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GSTBITWRITER_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_GSTBITWRITER_STUB
#include "gst_compat.h"
typedef struct {
guint8 *data;
guint bit_size;
guint bit_capacity;
gboolean auto_grow;
gboolean owned;
} GstBitWriter;
static inline void
gst_bit_writer_init(GstBitWriter *bw) { (void)bw; abort(); }
static inline void
gst_bit_writer_init_with_size(GstBitWriter *bw, guint size, gboolean fixed) {
(void)bw; (void)size; (void)fixed; abort();
}
static inline void
gst_bit_writer_reset(GstBitWriter *bw) { (void)bw; abort(); }
static inline gboolean
gst_bit_writer_put_bits_uint8(GstBitWriter *bw, guint8 value, guint nbits) {
(void)bw; (void)value; (void)nbits; abort();
}
static inline gboolean
gst_bit_writer_align_bytes(GstBitWriter *bw, guint8 trailing_bit) {
(void)bw; (void)trailing_bit; abort();
}
static inline guint8 *
gst_bit_writer_get_data(GstBitWriter *bw) { (void)bw; abort(); }
static inline guint
gst_bit_writer_get_size(const GstBitWriter *bw) { (void)bw; abort(); }
static inline guint
gst_bit_writer_reset_and_get_size(GstBitWriter *bw) { (void)bw; abort(); }
static inline guint8 *
gst_bit_writer_reset_and_get_data(GstBitWriter *bw) { (void)bw; abort(); }
static inline gboolean
gst_bit_writer_put_bits_uint16(GstBitWriter *bw, guint16 value, guint nbits) {
(void)bw; (void)value; (void)nbits; abort();
}
static inline gboolean
gst_bit_writer_put_bits_uint32(GstBitWriter *bw, guint32 value, guint nbits) {
(void)bw; (void)value; (void)nbits; abort();
}
static inline gboolean
gst_bit_writer_put_bytes(GstBitWriter *bw, const guint8 *data, guint nbytes) {
(void)bw; (void)data; (void)nbytes; abort();
}
#define GST_BIT_WRITER_BIT_SIZE(bw) ((bw)->bit_size)
#define GST_BIT_WRITER_DATA(bw) ((bw)->data)
#endif
File diff suppressed because it is too large Load Diff
+684
View File
@@ -0,0 +1,684 @@
/* GStreamer byte reader
*
* Copyright (C) 2008 Sebastian Dröge <sebastian.droege@collabora.co.uk>.
* Copyright (C) 2009 Tim-Philipp Müller <tim centricular net>
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
#ifndef __GST_BYTE_READER_H__
#define __GST_BYTE_READER_H__
#include <gst/gst.h>
#include <gst/base/base-prelude.h>
G_BEGIN_DECLS
#define GST_BYTE_READER(reader) ((GstByteReader *) (reader))
/**
* GstByteReader:
* @data: (array length=size): Data from which the bit reader will
* read
* @size: Size of @data in bytes
* @byte: Current byte position
*
* A byte reader instance.
*/
typedef struct {
const guint8 *data;
guint size;
guint byte; /* Byte position */
/* < private > */
gpointer _gst_reserved[GST_PADDING];
} GstByteReader;
GST_BASE_API
GstByteReader * gst_byte_reader_new (const guint8 *data, guint size) G_GNUC_MALLOC;
GST_BASE_API
void gst_byte_reader_free (GstByteReader *reader);
GST_BASE_API
void gst_byte_reader_init (GstByteReader *reader, const guint8 *data, guint size);
GST_BASE_API
gboolean gst_byte_reader_peek_sub_reader (GstByteReader * reader,
GstByteReader * sub_reader,
guint size);
GST_BASE_API
gboolean gst_byte_reader_get_sub_reader (GstByteReader * reader,
GstByteReader * sub_reader,
guint size);
GST_BASE_API
gboolean gst_byte_reader_set_pos (GstByteReader *reader, guint pos);
GST_BASE_API
guint gst_byte_reader_get_pos (const GstByteReader *reader);
GST_BASE_API
guint gst_byte_reader_get_remaining (const GstByteReader *reader);
GST_BASE_API
guint gst_byte_reader_get_size (const GstByteReader *reader);
GST_BASE_API
gboolean gst_byte_reader_skip (GstByteReader *reader, guint nbytes);
GST_BASE_API
gboolean gst_byte_reader_get_uint8 (GstByteReader *reader, guint8 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int8 (GstByteReader *reader, gint8 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint16_le (GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int16_le (GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint16_be (GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int16_be (GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint24_le (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int24_le (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint24_be (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int24_be (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint32_le (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int32_le (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint32_be (GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int32_be (GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint64_le (GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int64_le (GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_uint64_be (GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_int64_be (GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint8 (const GstByteReader *reader, guint8 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int8 (const GstByteReader *reader, gint8 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint16_le (const GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int16_le (const GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint16_be (const GstByteReader *reader, guint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int16_be (const GstByteReader *reader, gint16 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint24_le (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int24_le (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint24_be (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int24_be (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint32_le (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int32_le (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint32_be (const GstByteReader *reader, guint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int32_be (const GstByteReader *reader, gint32 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint64_le (const GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int64_le (const GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_uint64_be (const GstByteReader *reader, guint64 *val);
GST_BASE_API
gboolean gst_byte_reader_peek_int64_be (const GstByteReader *reader, gint64 *val);
GST_BASE_API
gboolean gst_byte_reader_get_float32_le (GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_get_float32_be (GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_get_float64_le (GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_get_float64_be (GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float32_le (const GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float32_be (const GstByteReader *reader, gfloat *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float64_le (const GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_peek_float64_be (const GstByteReader *reader, gdouble *val);
GST_BASE_API
gboolean gst_byte_reader_dup_data (GstByteReader * reader, guint size, guint8 ** val);
GST_BASE_API
gboolean gst_byte_reader_get_data (GstByteReader * reader, guint size, const guint8 ** val);
GST_BASE_API
gboolean gst_byte_reader_peek_data (const GstByteReader * reader, guint size, const guint8 ** val);
#define gst_byte_reader_dup_string(reader,str) \
gst_byte_reader_dup_string_utf8(reader,str)
GST_BASE_API
gboolean gst_byte_reader_dup_string_utf8 (GstByteReader * reader, gchar ** str);
GST_BASE_API
gboolean gst_byte_reader_dup_string_utf16 (GstByteReader * reader, guint16 ** str);
GST_BASE_API
gboolean gst_byte_reader_dup_string_utf32 (GstByteReader * reader, guint32 ** str);
#define gst_byte_reader_skip_string(reader) \
gst_byte_reader_skip_string_utf8(reader)
GST_BASE_API
gboolean gst_byte_reader_skip_string_utf8 (GstByteReader * reader);
GST_BASE_API
gboolean gst_byte_reader_skip_string_utf16 (GstByteReader * reader);
GST_BASE_API
gboolean gst_byte_reader_skip_string_utf32 (GstByteReader * reader);
#define gst_byte_reader_get_string(reader,str) \
gst_byte_reader_get_string_utf8(reader,str)
#define gst_byte_reader_peek_string(reader,str) \
gst_byte_reader_peek_string_utf8(reader,str)
GST_BASE_API
gboolean gst_byte_reader_get_string_utf8 (GstByteReader * reader, const gchar ** str);
GST_BASE_API
gboolean gst_byte_reader_peek_string_utf8 (const GstByteReader * reader, const gchar ** str);
GST_BASE_API
guint gst_byte_reader_masked_scan_uint32 (const GstByteReader * reader,
guint32 mask,
guint32 pattern,
guint offset,
guint size);
GST_BASE_API
guint gst_byte_reader_masked_scan_uint32_peek (const GstByteReader * reader,
guint32 mask,
guint32 pattern,
guint offset,
guint size,
guint32 * value);
/**
* GST_BYTE_READER_INIT:
* @data: Data from which the #GstByteReader should read
* @size: Size of @data in bytes
*
* A #GstByteReader must be initialized with this macro, before it can be
* used. This macro can used be to initialize a variable, but it cannot
* be assigned to a variable. In that case you have to use
* gst_byte_reader_init().
*/
#define GST_BYTE_READER_INIT(data, size) {data, size, 0}
/* unchecked variants */
static inline void
gst_byte_reader_skip_unchecked (GstByteReader * reader, guint nbytes)
{
reader->byte += nbytes;
}
#define __GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(bits,type,lower,upper,adj) \
\
static inline type \
gst_byte_reader_peek_##lower##_unchecked (const GstByteReader * reader) \
{ \
type val = (type) GST_READ_##upper (reader->data + reader->byte); \
adj \
return val; \
} \
\
static inline type \
gst_byte_reader_get_##lower##_unchecked (GstByteReader * reader) \
{ \
type val = gst_byte_reader_peek_##lower##_unchecked (reader); \
reader->byte += bits / 8; \
return val; \
}
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(8,guint8,uint8,UINT8,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(8,gint8,int8,UINT8,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,guint16,uint16_le,UINT16_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,guint16,uint16_be,UINT16_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,gint16,int16_le,UINT16_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(16,gint16,int16_be,UINT16_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,guint32,uint32_le,UINT32_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,guint32,uint32_be,UINT32_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gint32,int32_le,UINT32_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gint32,int32_be,UINT32_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,guint32,uint24_le,UINT24_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,guint32,uint24_be,UINT24_BE,/* */)
/* fix up the sign for 24-bit signed ints stored in 32-bit signed ints */
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,gint32,int24_le,UINT24_LE,
if (val & 0x00800000) val |= 0xff000000;)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(24,gint32,int24_be,UINT24_BE,
if (val & 0x00800000) val |= 0xff000000;)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,guint64,uint64_le,UINT64_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,guint64,uint64_be,UINT64_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gint64,int64_le,UINT64_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gint64,int64_be,UINT64_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gfloat,float32_le,FLOAT_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(32,gfloat,float32_be,FLOAT_BE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gdouble,float64_le,DOUBLE_LE,/* */)
__GST_BYTE_READER_GET_PEEK_BITS_UNCHECKED(64,gdouble,float64_be,DOUBLE_BE,/* */)
#undef __GET_PEEK_BITS_UNCHECKED
static inline const guint8 *
gst_byte_reader_peek_data_unchecked (const GstByteReader * reader)
{
return (const guint8 *) (reader->data + reader->byte);
}
static inline const guint8 *
gst_byte_reader_get_data_unchecked (GstByteReader * reader, guint size)
{
const guint8 *data;
data = gst_byte_reader_peek_data_unchecked (reader);
gst_byte_reader_skip_unchecked (reader, size);
return data;
}
static inline guint8 *
gst_byte_reader_dup_data_unchecked (GstByteReader * reader, guint size)
{
gconstpointer data = gst_byte_reader_get_data_unchecked (reader, size);
guint8 *dup_data = (guint8 *) g_malloc (size);
memcpy (dup_data, data, size);
return dup_data;
}
/* Unchecked variants that should not be used */
static inline guint
_gst_byte_reader_get_pos_unchecked (const GstByteReader * reader)
{
return reader->byte;
}
static inline guint
_gst_byte_reader_get_remaining_unchecked (const GstByteReader * reader)
{
return reader->size - reader->byte;
}
static inline guint
_gst_byte_reader_get_size_unchecked (const GstByteReader * reader)
{
return reader->size;
}
/* inlined variants (do not use directly) */
static inline guint
_gst_byte_reader_get_remaining_inline (const GstByteReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_byte_reader_get_remaining_unchecked (reader);
}
static inline guint
_gst_byte_reader_get_size_inline (const GstByteReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_byte_reader_get_size_unchecked (reader);
}
#define __GST_BYTE_READER_GET_PEEK_BITS_INLINE(bits,type,name) \
\
static inline gboolean \
_gst_byte_reader_peek_##name##_inline (const GstByteReader * reader, type * val) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
\
if (_gst_byte_reader_get_remaining_unchecked (reader) < (bits / 8)) \
return FALSE; \
\
*val = gst_byte_reader_peek_##name##_unchecked (reader); \
return TRUE; \
} \
\
static inline gboolean \
_gst_byte_reader_get_##name##_inline (GstByteReader * reader, type * val) \
{ \
g_return_val_if_fail (reader != NULL, FALSE); \
g_return_val_if_fail (val != NULL, FALSE); \
\
if (_gst_byte_reader_get_remaining_unchecked (reader) < (bits / 8)) \
return FALSE; \
\
*val = gst_byte_reader_get_##name##_unchecked (reader); \
return TRUE; \
}
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(8,guint8,uint8)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(8,gint8,int8)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,guint16,uint16_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,guint16,uint16_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,gint16,int16_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(16,gint16,int16_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,guint32,uint32_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,guint32,uint32_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gint32,int32_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gint32,int32_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,guint32,uint24_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,guint32,uint24_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,gint32,int24_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(24,gint32,int24_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,guint64,uint64_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,guint64,uint64_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gint64,int64_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gint64,int64_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gfloat,float32_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(32,gfloat,float32_be)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gdouble,float64_le)
__GST_BYTE_READER_GET_PEEK_BITS_INLINE(64,gdouble,float64_be)
#undef __GST_BYTE_READER_GET_PEEK_BITS_INLINE
#ifndef GST_BYTE_READER_DISABLE_INLINES
#define gst_byte_reader_init(reader,data,size) \
_gst_byte_reader_init_inline(reader,data,size)
#define gst_byte_reader_get_remaining(reader) \
_gst_byte_reader_get_remaining_inline(reader)
#define gst_byte_reader_get_size(reader) \
_gst_byte_reader_get_size_inline(reader)
#define gst_byte_reader_get_pos(reader) \
_gst_byte_reader_get_pos_inline(reader)
/* we use defines here so we can add the G_LIKELY() */
#define gst_byte_reader_get_uint8(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint8_inline(reader,val))
#define gst_byte_reader_get_int8(reader,val) \
G_LIKELY(_gst_byte_reader_get_int8_inline(reader,val))
#define gst_byte_reader_get_uint16_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint16_le_inline(reader,val))
#define gst_byte_reader_get_int16_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int16_le_inline(reader,val))
#define gst_byte_reader_get_uint16_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint16_be_inline(reader,val))
#define gst_byte_reader_get_int16_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int16_be_inline(reader,val))
#define gst_byte_reader_get_uint24_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint24_le_inline(reader,val))
#define gst_byte_reader_get_int24_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int24_le_inline(reader,val))
#define gst_byte_reader_get_uint24_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint24_be_inline(reader,val))
#define gst_byte_reader_get_int24_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int24_be_inline(reader,val))
#define gst_byte_reader_get_uint32_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint32_le_inline(reader,val))
#define gst_byte_reader_get_int32_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int32_le_inline(reader,val))
#define gst_byte_reader_get_uint32_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint32_be_inline(reader,val))
#define gst_byte_reader_get_int32_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int32_be_inline(reader,val))
#define gst_byte_reader_get_uint64_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint64_le_inline(reader,val))
#define gst_byte_reader_get_int64_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_int64_le_inline(reader,val))
#define gst_byte_reader_get_uint64_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_uint64_be_inline(reader,val))
#define gst_byte_reader_get_int64_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_int64_be_inline(reader,val))
#define gst_byte_reader_peek_uint8(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint8_inline(reader,val))
#define gst_byte_reader_peek_int8(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int8_inline(reader,val))
#define gst_byte_reader_peek_uint16_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint16_le_inline(reader,val))
#define gst_byte_reader_peek_int16_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int16_le_inline(reader,val))
#define gst_byte_reader_peek_uint16_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint16_be_inline(reader,val))
#define gst_byte_reader_peek_int16_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int16_be_inline(reader,val))
#define gst_byte_reader_peek_uint24_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint24_le_inline(reader,val))
#define gst_byte_reader_peek_int24_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int24_le_inline(reader,val))
#define gst_byte_reader_peek_uint24_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint24_be_inline(reader,val))
#define gst_byte_reader_peek_int24_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int24_be_inline(reader,val))
#define gst_byte_reader_peek_uint32_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint32_le_inline(reader,val))
#define gst_byte_reader_peek_int32_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int32_le_inline(reader,val))
#define gst_byte_reader_peek_uint32_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint32_be_inline(reader,val))
#define gst_byte_reader_peek_int32_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int32_be_inline(reader,val))
#define gst_byte_reader_peek_uint64_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint64_le_inline(reader,val))
#define gst_byte_reader_peek_int64_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int64_le_inline(reader,val))
#define gst_byte_reader_peek_uint64_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_uint64_be_inline(reader,val))
#define gst_byte_reader_peek_int64_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_int64_be_inline(reader,val))
#define gst_byte_reader_get_float32_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_float32_le_inline(reader,val))
#define gst_byte_reader_get_float32_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_float32_be_inline(reader,val))
#define gst_byte_reader_get_float64_le(reader,val) \
G_LIKELY(_gst_byte_reader_get_float64_le_inline(reader,val))
#define gst_byte_reader_get_float64_be(reader,val) \
G_LIKELY(_gst_byte_reader_get_float64_be_inline(reader,val))
#define gst_byte_reader_peek_float32_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float32_le_inline(reader,val))
#define gst_byte_reader_peek_float32_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float32_be_inline(reader,val))
#define gst_byte_reader_peek_float64_le(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float64_le_inline(reader,val))
#define gst_byte_reader_peek_float64_be(reader,val) \
G_LIKELY(_gst_byte_reader_peek_float64_be_inline(reader,val))
#endif /* GST_BYTE_READER_DISABLE_INLINES */
static inline void
_gst_byte_reader_init_inline (GstByteReader * reader, const guint8 * data, guint size)
{
g_return_if_fail (reader != NULL);
reader->data = data;
reader->size = size;
reader->byte = 0;
}
static inline gboolean
_gst_byte_reader_peek_sub_reader_inline (GstByteReader * reader,
GstByteReader * sub_reader, guint size)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (sub_reader != NULL, FALSE);
if (_gst_byte_reader_get_remaining_unchecked (reader) < size)
return FALSE;
sub_reader->data = reader->data + reader->byte;
sub_reader->byte = 0;
sub_reader->size = size;
return TRUE;
}
static inline gboolean
_gst_byte_reader_get_sub_reader_inline (GstByteReader * reader,
GstByteReader * sub_reader, guint size)
{
if (!_gst_byte_reader_peek_sub_reader_inline (reader, sub_reader, size))
return FALSE;
gst_byte_reader_skip_unchecked (reader, size);
return TRUE;
}
static inline gboolean
_gst_byte_reader_dup_data_inline (GstByteReader * reader, guint size, guint8 ** val)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (val != NULL, FALSE);
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
return FALSE;
*val = gst_byte_reader_dup_data_unchecked (reader, size);
return TRUE;
}
static inline gboolean
_gst_byte_reader_get_data_inline (GstByteReader * reader, guint size, const guint8 ** val)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (val != NULL, FALSE);
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
return FALSE;
*val = gst_byte_reader_get_data_unchecked (reader, size);
return TRUE;
}
static inline gboolean
_gst_byte_reader_peek_data_inline (const GstByteReader * reader, guint size, const guint8 ** val)
{
g_return_val_if_fail (reader != NULL, FALSE);
g_return_val_if_fail (val != NULL, FALSE);
if (G_UNLIKELY (size > reader->size || _gst_byte_reader_get_remaining_unchecked (reader) < size))
return FALSE;
*val = gst_byte_reader_peek_data_unchecked (reader);
return TRUE;
}
static inline guint
_gst_byte_reader_get_pos_inline (const GstByteReader * reader)
{
g_return_val_if_fail (reader != NULL, 0);
return _gst_byte_reader_get_pos_unchecked (reader);
}
static inline gboolean
_gst_byte_reader_skip_inline (GstByteReader * reader, guint nbytes)
{
g_return_val_if_fail (reader != NULL, FALSE);
if (G_UNLIKELY (_gst_byte_reader_get_remaining_unchecked (reader) < nbytes))
return FALSE;
reader->byte += nbytes;
return TRUE;
}
#ifndef GST_BYTE_READER_DISABLE_INLINES
#define gst_byte_reader_dup_data(reader,size,val) \
G_LIKELY(_gst_byte_reader_dup_data_inline(reader,size,val))
#define gst_byte_reader_get_data(reader,size,val) \
G_LIKELY(_gst_byte_reader_get_data_inline(reader,size,val))
#define gst_byte_reader_peek_data(reader,size,val) \
G_LIKELY(_gst_byte_reader_peek_data_inline(reader,size,val))
#define gst_byte_reader_skip(reader,nbytes) \
G_LIKELY(_gst_byte_reader_skip_inline(reader,nbytes))
#endif /* GST_BYTE_READER_DISABLE_INLINES */
G_END_DECLS
#endif /* __GST_BYTE_READER_H__ */
@@ -0,0 +1,9 @@
/* Stub for <gst/codecparsers/codecparsers-prelude.h>.
* Same shape as base-prelude.h drop the GObject boilerplate + define
* the GstCodecParsersAPI macro to nothing.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_CODECPARSERS_PRELUDE_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_CODECPARSERS_PRELUDE_STUB
#include "gst_compat.h"
#define GST_CODEC_PARSERS_API
#endif
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
+545
View File
@@ -0,0 +1,545 @@
/* Gstreamer
* Copyright (C) <2011> Intel Corporation
* Copyright (C) <2011> Collabora Ltd.
* Copyright (C) <2011> Thibault Saunier <thibault.saunier@collabora.com>
*
* Some bits C-c,C-v'ed and s/4/3 from h264parse and videoparsers/h264parse.c:
* Copyright (C) <2010> Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
* Copyright (C) <2010> Collabora Multimedia
* Copyright (C) <2010> Nokia Corporation
*
* (C) 2005 Michal Benes <michal.benes@itonis.tv>
* (C) 2008 Wim Taymans <wim.taymans@gmail.com>
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
/*
* Common code for NAL parsing from h264 and h265 parsers.
*/
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif
#include "nalutils.h"
/****** Nal parser ******/
void
nal_reader_init (NalReader * nr, const guint8 * data, guint size)
{
nr->data = data;
nr->size = size;
nr->n_epb = 0;
nr->byte = 0;
nr->bits_in_cache = 0;
/* fill with something other than 0 to detect emulation prevention bytes */
nr->first_byte = 0xff;
nr->epb_cache = 0xff;
nr->cache = 0xff;
}
gboolean
nal_reader_read (NalReader * nr, guint nbits)
{
if (G_UNLIKELY (nr->byte * 8 + (nbits - nr->bits_in_cache) > nr->size * 8)) {
GST_DEBUG ("Can not read %u bits, bits in cache %u, Byte * 8 %u, size in "
"bits %u", nbits, nr->bits_in_cache, nr->byte * 8, nr->size * 8);
return FALSE;
}
while (nr->bits_in_cache < nbits) {
guint8 byte;
next_byte:
if (G_UNLIKELY (nr->byte >= nr->size))
return FALSE;
byte = nr->data[nr->byte++];
nr->epb_cache = (nr->epb_cache << 8) | byte;
/* check if the byte is a emulation_prevention_three_byte */
if ((nr->epb_cache & 0xffffff) == 0x3) {
nr->n_epb++;
goto next_byte;
}
nr->cache = (nr->cache << 8) | nr->first_byte;
nr->first_byte = byte;
nr->bits_in_cache += 8;
}
return TRUE;
}
/* Skips the specified amount of bits. This is only suitable to a
cacheable number of bits */
gboolean
nal_reader_skip (NalReader * nr, guint nbits)
{
g_assert (nbits <= 8 * sizeof (nr->cache));
if (G_UNLIKELY (!nal_reader_read (nr, nbits)))
return FALSE;
nr->bits_in_cache -= nbits;
return TRUE;
}
/* Generic version to skip any number of bits */
gboolean
nal_reader_skip_long (NalReader * nr, guint nbits)
{
/* Leave out enough bits in the cache once we are finished */
const guint skip_size = 4 * sizeof (nr->cache);
guint remaining = nbits;
nbits %= skip_size;
while (remaining > 0) {
if (!nal_reader_skip (nr, nbits))
return FALSE;
remaining -= nbits;
nbits = skip_size;
}
return TRUE;
}
guint
nal_reader_get_pos (const NalReader * nr)
{
return nr->byte * 8 - nr->bits_in_cache;
}
guint
nal_reader_get_remaining (const NalReader * nr)
{
return (nr->size - nr->byte) * 8 + nr->bits_in_cache;
}
guint
nal_reader_get_epb_count (const NalReader * nr)
{
return nr->n_epb;
}
#define NAL_READER_READ_BITS(bits) \
gboolean \
nal_reader_get_bits_uint##bits (NalReader *nr, guint##bits *val, guint nbits) \
{ \
guint shift; \
\
if (!nal_reader_read (nr, nbits)) \
return FALSE; \
\
/* bring the required bits down and truncate */ \
shift = nr->bits_in_cache - nbits; \
*val = nr->first_byte >> shift; \
\
*val |= nr->cache << (8 - shift); \
/* mask out required bits */ \
if (nbits < bits) \
*val &= ((guint##bits)1 << nbits) - 1; \
\
nr->bits_in_cache = shift; \
\
return TRUE; \
} \
NAL_READER_READ_BITS (8);
NAL_READER_READ_BITS (16);
NAL_READER_READ_BITS (32);
#define NAL_READER_PEEK_BITS(bits) \
gboolean \
nal_reader_peek_bits_uint##bits (const NalReader *nr, guint##bits *val, guint nbits) \
{ \
NalReader tmp; \
\
tmp = *nr; \
return nal_reader_get_bits_uint##bits (&tmp, val, nbits); \
}
NAL_READER_PEEK_BITS (8);
gboolean
nal_reader_get_ue (NalReader * nr, guint32 * val)
{
guint i = 0;
guint8 bit;
guint32 value;
if (G_UNLIKELY (!nal_reader_get_bits_uint8 (nr, &bit, 1)))
return FALSE;
while (bit == 0) {
i++;
if (G_UNLIKELY (!nal_reader_get_bits_uint8 (nr, &bit, 1)))
return FALSE;
}
if (G_UNLIKELY (i > 31))
return FALSE;
if (G_UNLIKELY (!nal_reader_get_bits_uint32 (nr, &value, i)))
return FALSE;
*val = (1 << i) - 1 + value;
return TRUE;
}
gboolean
nal_reader_get_se (NalReader * nr, gint32 * val)
{
guint32 value;
if (G_UNLIKELY (!nal_reader_get_ue (nr, &value)))
return FALSE;
if (value % 2)
*val = (value / 2) + 1;
else
*val = -(value / 2);
return TRUE;
}
gboolean
nal_reader_is_byte_aligned (NalReader * nr)
{
if (nr->bits_in_cache != 0)
return FALSE;
return TRUE;
}
gboolean
nal_reader_has_more_data (NalReader * nr)
{
NalReader nr_tmp;
guint remaining, nbits;
guint8 rbsp_stop_one_bit, zero_bits;
remaining = nal_reader_get_remaining (nr);
if (remaining == 0)
return FALSE;
nr_tmp = *nr;
nr = &nr_tmp;
/* The spec defines that more_rbsp_data() searches for the last bit
equal to 1, and that it is the rbsp_stop_one_bit. Subsequent bits
until byte boundary is reached shall be zero.
This means that more_rbsp_data() is FALSE if the next bit is 1
and the remaining bits until byte boundary are zero. One way to
be sure that this bit was the very last one, is that every other
bit after we reached byte boundary are also set to zero.
Otherwise, if the next bit is 0 or if there are non-zero bits
afterwards, then then we have more_rbsp_data() */
if (!nal_reader_get_bits_uint8 (nr, &rbsp_stop_one_bit, 1))
return FALSE;
if (!rbsp_stop_one_bit)
return TRUE;
nbits = --remaining % 8;
while (remaining > 0) {
if (!nal_reader_get_bits_uint8 (nr, &zero_bits, nbits))
return FALSE;
if (zero_bits != 0)
return TRUE;
remaining -= nbits;
nbits = 8;
}
return FALSE;
}
/*********** end of nal parser ***************/
gint
scan_for_start_codes (const guint8 * data, guint size)
{
GstByteReader br;
gst_byte_reader_init (&br, data, size);
/* NALU not empty, so we can at least expect 1 (even 2) bytes following sc */
return gst_byte_reader_masked_scan_uint32 (&br, 0xffffff00, 0x00000100,
0, size);
}
void
nal_writer_init (NalWriter * nw, guint nal_prefix_size, gboolean packetized)
{
g_return_if_fail (nw != NULL);
g_return_if_fail ((packetized && nal_prefix_size > 1 && nal_prefix_size < 5)
|| (!packetized && (nal_prefix_size == 3 || nal_prefix_size == 4)));
gst_bit_writer_init (&nw->bw);
nw->nal_prefix_size = nal_prefix_size;
nw->packetized = packetized;
}
void
nal_writer_reset (NalWriter * nw)
{
g_return_if_fail (nw != NULL);
gst_bit_writer_reset (&nw->bw);
memset (nw, 0, sizeof (NalWriter));
}
gboolean
nal_writer_do_rbsp_trailing_bits (NalWriter * nw)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint8 (&nw->bw, 1, 1)) {
GST_WARNING ("Cannot put trailing bits");
return FALSE;
}
if (!gst_bit_writer_align_bytes (&nw->bw, 0)) {
GST_WARNING ("Cannot put align bits");
return FALSE;
}
return TRUE;
}
static gpointer
nal_writer_create_nal_data (NalWriter * nw, guint32 * ret_size)
{
GstBitWriter bw;
gint i;
guint8 *src, *dst;
gsize size;
gpointer data;
/* scan to put emulation_prevention_three_byte */
size = GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3;
src = GST_BIT_WRITER_DATA (&nw->bw);
gst_bit_writer_init_with_size (&bw, size + nw->nal_prefix_size, FALSE);
for (i = 0; i < nw->nal_prefix_size - 1; i++)
gst_bit_writer_put_bits_uint8 (&bw, 0, 8);
gst_bit_writer_put_bits_uint8 (&bw, 1, 8);
for (i = 0; i < size; i++) {
guint pos = (GST_BIT_WRITER_BIT_SIZE (&bw) >> 3);
dst = GST_BIT_WRITER_DATA (&bw);
if (pos >= nw->nal_prefix_size + 2 &&
dst[pos - 2] == 0 && dst[pos - 1] == 0 && src[i] <= 0x3) {
gst_bit_writer_put_bits_uint8 (&bw, 0x3, 8);
}
gst_bit_writer_put_bits_uint8 (&bw, src[i], 8);
}
*ret_size = bw.bit_size >> 3;
data = gst_bit_writer_reset_and_get_data (&bw);
if (nw->packetized) {
size = *ret_size - nw->nal_prefix_size;
switch (nw->nal_prefix_size) {
case 1:
GST_WRITE_UINT8 (data, size);
break;
case 2:
GST_WRITE_UINT16_BE (data, size);
break;
case 3:
GST_WRITE_UINT24_BE (data, size);
break;
case 4:
GST_WRITE_UINT32_BE (data, size);
break;
default:
g_assert_not_reached ();
break;
}
}
return data;
}
GstMemory *
nal_writer_reset_and_get_memory (NalWriter * nw)
{
guint32 size = 0;
GstMemory *ret = NULL;
gpointer data;
g_return_val_if_fail (nw != NULL, NULL);
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3) == 0) {
GST_WARNING ("No written byte");
goto done;
}
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) & 0x7) != 0) {
GST_WARNING ("Written stream is not byte aligned");
if (!nal_writer_do_rbsp_trailing_bits (nw))
goto done;
}
data = nal_writer_create_nal_data (nw, &size);
if (!data) {
GST_WARNING ("Failed to create nal data");
goto done;
}
ret = gst_memory_new_wrapped (0, data, size, 0, size, data, g_free);
done:
gst_bit_writer_reset (&nw->bw);
return ret;
}
guint8 *
nal_writer_reset_and_get_data (NalWriter * nw, guint32 * ret_size)
{
guint32 size = 0;
guint8 *data = NULL;
g_return_val_if_fail (nw != NULL, NULL);
g_return_val_if_fail (ret_size != NULL, NULL);
*ret_size = 0;
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) >> 3) == 0) {
GST_WARNING ("No written byte");
goto done;
}
if ((GST_BIT_WRITER_BIT_SIZE (&nw->bw) & 0x7) != 0) {
GST_WARNING ("Written stream is not byte aligned");
if (!nal_writer_do_rbsp_trailing_bits (nw))
goto done;
}
data = nal_writer_create_nal_data (nw, &size);
if (!data) {
GST_WARNING ("Failed to create nal data");
goto done;
}
*ret_size = size;
done:
gst_bit_writer_reset (&nw->bw);
return data;
}
gboolean
nal_writer_put_bits_uint8 (NalWriter * nw, guint8 value, guint nbits)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint8 (&nw->bw, value, nbits))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_bits_uint16 (NalWriter * nw, guint16 value, guint nbits)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint16 (&nw->bw, value, nbits))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_bits_uint32 (NalWriter * nw, guint32 value, guint nbits)
{
g_return_val_if_fail (nw != NULL, FALSE);
if (!gst_bit_writer_put_bits_uint32 (&nw->bw, value, nbits))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_bytes (NalWriter * nw, const guint8 * data, guint nbytes)
{
g_return_val_if_fail (nw != NULL, FALSE);
g_return_val_if_fail (data != NULL, FALSE);
g_return_val_if_fail (nbytes != 0, FALSE);
if (!gst_bit_writer_put_bytes (&nw->bw, data, nbytes))
return FALSE;
return TRUE;
}
gboolean
nal_writer_put_ue (NalWriter * nw, guint32 value)
{
guint leading_zeros;
guint rest;
g_return_val_if_fail (nw != NULL, FALSE);
count_exp_golomb_bits (value, &leading_zeros, &rest);
/* write leading zeros */
if (leading_zeros) {
if (!nal_writer_put_bits_uint32 (nw, 0, leading_zeros))
return FALSE;
}
/* write the rest */
if (!nal_writer_put_bits_uint32 (nw, value + 1, rest))
return FALSE;
return TRUE;
}
gboolean
count_exp_golomb_bits (guint32 value, guint * leading_zeros, guint * rest)
{
guint32 x;
guint count = 0;
/* https://en.wikipedia.org/wiki/Exponential-Golomb_coding */
/* count bits of value + 1 */
x = value + 1;
while (x) {
count++;
x >>= 1;
}
if (leading_zeros) {
if (count > 1)
*leading_zeros = count - 1;
else
*leading_zeros = 0;
}
if (rest) {
*rest = count;
}
return TRUE;
}
+269
View File
@@ -0,0 +1,269 @@
/* Gstreamer
* Copyright (C) <2011> Intel Corporation
* Copyright (C) <2011> Collabora Ltd.
* Copyright (C) <2011> Thibault Saunier <thibault.saunier@collabora.com>
*
* Some bits C-c,C-v'ed and s/4/3 from h264parse and videoparsers/h264parse.c:
* Copyright (C) <2010> Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
* Copyright (C) <2010> Collabora Multimedia
* Copyright (C) <2010> Nokia Corporation
*
* (C) 2005 Michal Benes <michal.benes@itonis.tv>
* (C) 2008 Wim Taymans <wim.taymans@gmail.com>
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
* Boston, MA 02110-1301, USA.
*/
/**
* Common code for NAL parsing from h264 and h265 parsers.
*/
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif
#include <gst/base/gstbytereader.h>
#include <gst/base/gstbitreader.h>
#include <gst/base/gstbitwriter.h>
typedef struct
{
const guint8 *data;
guint size;
guint n_epb; /* Number of emulation prevention bytes */
guint byte; /* Byte position */
guint bits_in_cache; /* bitpos in the cache of next bit */
guint8 first_byte;
guint32 epb_cache; /* cache 3 bytes to check emulation prevention bytes */
guint64 cache; /* cached bytes */
} NalReader;
typedef struct
{
GstBitWriter bw;
guint nal_prefix_size;
gboolean packetized;
} NalWriter;
G_GNUC_INTERNAL
void nal_reader_init (NalReader * nr, const guint8 * data, guint size);
G_GNUC_INTERNAL
gboolean nal_reader_read (NalReader * nr, guint nbits);
G_GNUC_INTERNAL
gboolean nal_reader_skip (NalReader * nr, guint nbits);
G_GNUC_INTERNAL
gboolean nal_reader_skip_long (NalReader * nr, guint nbits);
G_GNUC_INTERNAL
guint nal_reader_get_pos (const NalReader * nr);
G_GNUC_INTERNAL
guint nal_reader_get_remaining (const NalReader * nr);
G_GNUC_INTERNAL
guint nal_reader_get_epb_count (const NalReader * nr);
G_GNUC_INTERNAL
gboolean nal_reader_is_byte_aligned (NalReader * nr);
G_GNUC_INTERNAL
gboolean nal_reader_has_more_data (NalReader * nr);
#define NAL_READER_READ_BITS_H(bits) \
G_GNUC_INTERNAL \
gboolean nal_reader_get_bits_uint##bits (NalReader *nr, guint##bits *val, guint nbits)
NAL_READER_READ_BITS_H (8);
NAL_READER_READ_BITS_H (16);
NAL_READER_READ_BITS_H (32);
#define NAL_READER_PEEK_BITS_H(bits) \
G_GNUC_INTERNAL \
gboolean nal_reader_peek_bits_uint##bits (const NalReader *nr, guint##bits *val, guint nbits)
NAL_READER_PEEK_BITS_H (8);
G_GNUC_INTERNAL
gboolean nal_reader_get_ue (NalReader * nr, guint32 * val);
G_GNUC_INTERNAL
gboolean nal_reader_get_se (NalReader * nr, gint32 * val);
#define CHECK_ALLOWED_MAX_WITH_DEBUG(dbg, val, max) { \
if (val > max) { \
GST_WARNING ("value for '" dbg "' greater than max. value: %d, max %d", \
val, max); \
goto error; \
} \
}
#define CHECK_ALLOWED_MAX(val, max) \
CHECK_ALLOWED_MAX_WITH_DEBUG (G_STRINGIFY (val), val, max)
#define CHECK_ALLOWED_WITH_DEBUG(dbg, val, min, max) { \
if (val < min || val > max) { \
GST_WARNING ("value for '" dbg "' not in allowed range. value: %d, range %d-%d", \
val, min, max); \
goto error; \
} \
}
#define CHECK_ALLOWED(val, min, max) \
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), val, min, max)
#define READ_UINT8(nr, val, nbits) { \
if (!nal_reader_get_bits_uint8 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint8 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UINT16(nr, val, nbits) { \
if (!nal_reader_get_bits_uint16 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint16 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UINT32(nr, val, nbits) { \
if (!nal_reader_get_bits_uint32 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UINT64(nr, val, nbits) { \
if (!nal_reader_get_bits_uint64 (nr, &val, nbits)) { \
GST_WARNING ("failed to read uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define READ_UE(nr, val) { \
if (!nal_reader_get_ue (nr, &val)) { \
GST_WARNING ("failed to read UE for '" G_STRINGIFY (val) "'"); \
goto error; \
} \
}
#define READ_UE_ALLOWED(nr, val, min, max) { \
guint32 tmp; \
READ_UE (nr, tmp); \
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), tmp, min, max); \
val = tmp; \
}
#define READ_UE_MAX(nr, val, max) { \
guint32 tmp; \
READ_UE (nr, tmp); \
CHECK_ALLOWED_MAX_WITH_DEBUG (G_STRINGIFY (val), tmp, max); \
val = tmp; \
}
#define READ_SE(nr, val) { \
if (!nal_reader_get_se (nr, &val)) { \
GST_WARNING ("failed to read SE for '" G_STRINGIFY (val) "'"); \
goto error; \
} \
}
#define READ_SE_ALLOWED(nr, val, min, max) { \
gint32 tmp; \
READ_SE (nr, tmp); \
CHECK_ALLOWED_WITH_DEBUG (G_STRINGIFY (val), tmp, min, max); \
val = tmp; \
}
G_GNUC_INTERNAL
gint scan_for_start_codes (const guint8 * data, guint size);
G_GNUC_INTERNAL
void nal_writer_init (NalWriter * nw, guint nal_prefix_size, gboolean packetized);
G_GNUC_INTERNAL
void nal_writer_reset (NalWriter * nw);
G_GNUC_INTERNAL
gboolean nal_writer_do_rbsp_trailing_bits (NalWriter * nw);
G_GNUC_INTERNAL
GstMemory * nal_writer_reset_and_get_memory (NalWriter * nw);
G_GNUC_INTERNAL
guint8 * nal_writer_reset_and_get_data (NalWriter * nw, guint32 * ret_size);
G_GNUC_INTERNAL
gboolean nal_writer_put_bits_uint8 (NalWriter * nw, guint8 value, guint nbits);
G_GNUC_INTERNAL
gboolean nal_writer_put_bits_uint16 (NalWriter * nw, guint16 value, guint nbits);
G_GNUC_INTERNAL
gboolean nal_writer_put_bits_uint32 (NalWriter * nw, guint32 value, guint nbits);
G_GNUC_INTERNAL
gboolean nal_writer_put_bytes (NalWriter * nw, const guint8 * data, guint nbytes);
G_GNUC_INTERNAL
gboolean nal_writer_put_ue (NalWriter * nw, guint32 value);
G_GNUC_INTERNAL
gboolean count_exp_golomb_bits (guint32 value, guint * leading_zeros, guint * rest);
#define WRITE_UINT8(nw, val, nbits) { \
if (!nal_writer_put_bits_uint8 (nw, val, nbits)) { \
GST_WARNING ("failed to write uint8 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define WRITE_UINT16(nw, val, nbits) { \
if (!nal_writer_put_bits_uint16 (nw, val, nbits)) { \
GST_WARNING ("failed to write uint16 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define WRITE_UINT32(nw, val, nbits) { \
if (!nal_writer_put_bits_uint32 (nw, val, nbits)) { \
GST_WARNING ("failed to write uint32 for '" G_STRINGIFY (val) "', nbits: %d", nbits); \
goto error; \
} \
}
#define WRITE_BYTES(nw, data, nbytes) { \
if (!nal_writer_put_bytes (nw, data, nbytes)) { \
GST_WARNING ("failed to write bytes for '" G_STRINGIFY (val) "', nbits: %d", nbytes); \
goto error; \
} \
}
#define WRITE_UE(nw, val) { \
if (!nal_writer_put_ue (nw, val)) { \
GST_WARNING ("failed to write ue for '" G_STRINGIFY (val) "'"); \
goto error; \
} \
}
static inline guint32 div_ceil (guint32 a, guint32 b)
{
/* http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html */
g_assert (b > 0);
return a / b + (a % b > 0);
}
+10
View File
@@ -0,0 +1,10 @@
/* Stub for <gst/glib-compat-private.h>.
* In upstream GStreamer this provides backwards-compat shims for older
* GLib versions (g_memdup2 polyfill being the load-bearing one).
* Our gst_compat.h already defines g_memdup2 as a static inline, so
* we just include the shim.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GLIB_COMPAT_PRIVATE_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_GLIB_COMPAT_PRIVATE_STUB
#include "gst_compat.h"
#endif
+10
View File
@@ -0,0 +1,10 @@
/* Stub for <gst/gst.h> — redirects to the project's gst_compat shim.
* The vendored GStreamer 1.28.2 H.265 parser was originally built against
* full GStreamer; we only need the GLib type aliases + memory helpers +
* macro stubs, all provided by gst_compat.h. Original gst.h would pull
* in GObject + GstObject + the entire framework, which we don't link.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GST_H_STUB
#define LIBVA_V4L2_REQUEST_FOURIER_GST_H_STUB
#include "gst_compat.h"
#endif
+145
View File
@@ -0,0 +1,145 @@
/*
* gst_compat.c GArray implementation for the vendored GStreamer parser.
*
* Scope: minimal subset of GArray API exercised by gsth265parser.c
* (g_array_new, g_array_sized_new, g_array_append_vals + the
* g_array_append_val macro, g_array_index macro, g_array_set_size,
* g_array_set_clear_func, g_array_free, g_array_unref).
*
* Non-thread-safe (matches GArray's documented semantics GArray is
* not thread-safe in upstream GLib either, callers must serialize).
*
* License: MIT (matches backend's COPYING.MIT).
*/
#include "gst_compat.h"
/* ===== internal helpers ===== */
static gboolean
garray_grow(GArray *array, guint new_capacity)
{
if (new_capacity <= array->capacity)
return TRUE;
/* round up to next power of two for amortized O(1) growth */
guint cap = array->capacity > 0 ? array->capacity : 4;
while (cap < new_capacity)
cap *= 2;
char *new_data = realloc(array->data, (size_t)cap * array->element_size);
if (new_data == NULL)
return FALSE;
if (array->clear) {
memset(new_data + (size_t)array->capacity * array->element_size, 0,
(size_t)(cap - array->capacity) * array->element_size);
}
array->data = new_data;
array->capacity = cap;
return TRUE;
}
/* ===== public API ===== */
GArray *
g_array_sized_new(gboolean zero_terminated, gboolean clear,
guint element_size, guint reserved_size)
{
/* zero_terminated is GLib-specific (appends a zero-element sentinel
* for trailing-NULL semantics). The vendored parser does not use it;
* we ignore the flag. */
(void)zero_terminated;
GArray *a = calloc(1, sizeof(GArray));
if (a == NULL)
return NULL;
a->element_size = element_size;
a->clear = clear;
if (reserved_size > 0) {
if (!garray_grow(a, reserved_size)) {
free(a);
return NULL;
}
}
return a;
}
GArray *
g_array_new(gboolean zero_terminated, gboolean clear, guint element_size)
{
return g_array_sized_new(zero_terminated, clear, element_size, 0);
}
GArray *
g_array_set_size(GArray *array, guint length)
{
if (length > array->capacity) {
if (!garray_grow(array, length))
return array;
}
if (array->clear_func != NULL && length < array->len) {
for (guint i = length; i < array->len; i++)
array->clear_func(array->data + (size_t)i * array->element_size);
}
if (array->clear && length > array->len) {
memset(array->data + (size_t)array->len * array->element_size, 0,
(size_t)(length - array->len) * array->element_size);
}
array->len = length;
return array;
}
GArray *
g_array_append_vals(GArray *array, gconstpointer data, guint len)
{
if (len == 0)
return array;
if (!garray_grow(array, array->len + len))
return array;
memcpy(array->data + (size_t)array->len * array->element_size,
data, (size_t)len * array->element_size);
array->len += len;
return array;
}
void
g_array_set_clear_func(GArray *array, void (*clear_func)(gpointer))
{
array->clear_func = clear_func;
}
gchar *
g_array_free(GArray *array, gboolean free_segment)
{
if (array == NULL)
return NULL;
if (array->clear_func != NULL) {
for (guint i = 0; i < array->len; i++)
array->clear_func(array->data + (size_t)i * array->element_size);
}
gchar *data = NULL;
if (free_segment) {
free(array->data);
} else {
data = array->data;
}
free(array);
return data;
}
GArray *
g_array_unref(GArray *array)
{
/* simplified to free; the backend never sub-references shared GArrays */
g_array_free(array, TRUE);
return NULL;
}
+463
View File
@@ -0,0 +1,463 @@
/*
* gst_compat.h minimal GLib/GStreamer compatibility shim for vendored
* GStreamer 1.28.2 H.265 parser + bitreader + bytereader + nalutils.
*
* Strategy: provide #defines / typedefs for the GLib API surface those
* 4 vendored files use, so they can compile against libc + libv4l2 only
* (no glib2 / gst-base linkage). Vendored .c files are NOT modified
* directly; instead this header is force-included via the Makefile's
* `-include` flag on the vendored translation units.
*
* Coverage scoped to what gsth265parser.c + nalutils.c + gstbitreader.c
* + gstbytereader.c actually call. Surveyed in
* ampere-kernel-decoders phase4 step 2 prep see
* ~/src/ampere-kernel-decoders/phase4_plan_iter2.md and the survey
* commit message for the empirical inventory.
*
* License: this shim is original work, MIT (matching the backend's
* COPYING.MIT). The vendored .c files keep their LGPL v2.1+ headers
* verbatim.
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H
#define LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* ===== GLib type aliases ===== */
typedef bool gboolean;
typedef char gchar;
typedef unsigned char guchar;
typedef int gint;
typedef int8_t gint8;
typedef int16_t gint16;
typedef int32_t gint32;
typedef int64_t gint64;
typedef unsigned int guint;
typedef uint8_t guint8;
typedef uint16_t guint16;
typedef uint32_t guint32;
typedef uint64_t guint64;
typedef size_t gsize;
typedef ptrdiff_t gssize;
typedef void * gpointer;
typedef const void * gconstpointer;
typedef double gdouble;
typedef float gfloat;
/* GLib's gint64 / guint64 formatting is platform-conditional; for our
* aarch64 ALARM target we don't need the full G_*_FORMAT machinery, but
* gstbytereader uses G_GSIZE_FORMAT in a debug-only printf. */
#define G_GSIZE_FORMAT "zu"
#ifndef TRUE
# define TRUE true
#endif
#ifndef FALSE
# define FALSE false
#endif
/* ===== memory ===== */
#define g_malloc(n) malloc((size_t)(n))
#define g_malloc0(n) calloc(1, (size_t)(n))
#define g_realloc(p, n) realloc((p), (size_t)(n))
/* g_free needs to be addressable (passed as a function-pointer arg by
* nalutils.c::gst_memory_new_wrapped even though that call site is
* dead code we don't invoke, it must compile). Plain `free` is
* compatible: signature is `void (void *)` either way. */
#define g_free free
#define g_new(type, n) ((type *)malloc(sizeof(type) * (size_t)(n)))
#define g_new0(type, n) ((type *)calloc((size_t)(n), sizeof(type)))
#define g_slice_new(type) ((type *)malloc(sizeof(type)))
#define g_slice_new0(type) ((type *)calloc(1, sizeof(type)))
#define g_slice_free(type, p) free(p)
#define g_slice_free1(size, p) free(p)
#define g_clear_pointer(pp, freefn) \
do { freefn(*(pp)); *(pp) = NULL; } while (0)
/* g_memdup2 — GLib's 64-bit-safe memdup, used by gstbytereader. */
static inline gpointer
g_memdup2(gconstpointer mem, gsize byte_size)
{
if (mem == NULL || byte_size == 0)
return NULL;
void *copy = malloc(byte_size);
if (copy != NULL)
memcpy(copy, mem, byte_size);
return copy;
}
/* g_strcmp0 — NULL-safe strcmp. Used by gsth265parser in profile-name lookup. */
static inline int
g_strcmp0(const char *a, const char *b)
{
if (a == b) return 0;
if (a == NULL) return -1;
if (b == NULL) return 1;
return strcmp(a, b);
}
/* ===== asserts / return-guards =====
*
* Per ampere-kernel-decoders iter2 Phase 2 §"new failure modes" #5:
* g_assert must NOT abort the process. It becomes a no-op here;
* malformed bitstream is caught by the explicit parse-result returns
* the parser already implements.
*
* g_return_if_fail / g_return_val_if_fail propagate as the original
* GLib semantics (early return with optional value). */
#define g_assert(cond) ((void)0)
#define g_assert_not_reached() __builtin_unreachable()
#define g_return_if_fail(cond) do { if (!(cond)) return; } while (0)
#define g_return_val_if_fail(cond, v) do { if (!(cond)) return (v); } while (0)
/* ===== GStreamer logging — no-ops =====
*
* The parser is heavy on debug logging. We compile all of it out;
* the backend's own logging (request_log/error_log) wraps the parser
* calls and reports parse-failure return codes from there. */
#define GST_DISABLE_GST_DEBUG 1
#define GST_DEBUG_CATEGORY_STATIC(name)
#define GST_DEBUG_CATEGORY_INIT(...) ((void)0)
#define GST_DEBUG_CATEGORY_GET(...) ((void)0)
#define GST_DEBUG(...) ((void)0)
#define GST_INFO(...) ((void)0)
#define GST_WARNING(...) ((void)0)
#define GST_ERROR(...) ((void)0)
#define GST_LOG(...) ((void)0)
#define GST_FIXME(...) ((void)0)
#define GST_MEMDUMP(...) ((void)0)
#define GST_CAT_DEFAULT (NULL)
/* ===== compiler / language helpers ===== */
#define G_LIKELY(x) __builtin_expect(!!(x), 1)
#define G_UNLIKELY(x) __builtin_expect(!!(x), 0)
#define G_GNUC_UNUSED __attribute__((unused))
#define G_GNUC_INTERNAL
#define G_GNUC_MALLOC __attribute__((malloc))
#define G_GNUC_NORETURN __attribute__((noreturn))
#define G_GNUC_DEPRECATED
#define G_GNUC_DEPRECATED_FOR(x)
#define G_GNUC_PURE __attribute__((pure))
#define G_GNUC_CONST __attribute__((const))
#define G_GNUC_PRINTF(a, b) __attribute__((format(printf, a, b)))
#define G_BEGIN_DECLS
#define G_END_DECLS
#define G_N_ELEMENTS(arr) (sizeof(arr) / sizeof((arr)[0]))
#define G_STMT_START do
#define G_STMT_END while (0)
#define G_STRINGIFY(x) G_STRINGIFY_(x)
#define G_STRINGIFY_(x) #x
/* GStreamer ABI-padding slot count; upstream uses 4 reserved gpointers
* at the end of public structs for future ABI extension. We replicate
* the size so struct layout matches what gst_byte_reader_init / friends
* write into. */
#define GST_PADDING 4
#define GST_PADDING_LARGE 20
/* Public-symbol visibility — backend's shared module uses
* -fvisibility=hidden, so we don't need to mark anything public from
* within the vendored parser. The original GST_*_API macros expand to
* extern + dllimport on Windows; on Linux ELF builds where
* fvisibility=hidden is active, they would mark public symbols. The
* vendored functions are never called from outside h265_parser/, so
* leaving these empty hides them automatically. */
#define GST_API
#define GST_API_EXPORT extern
#define GST_API_IMPORT extern
/* ===== Opaque GStreamer pipeline types =====
*
* GstBuffer + GstMemory are referenced by encoder-side dead-code
* functions in gsth265parser.c (gst_h265_parser_insert_sei_hevc).
* We never call those; declaring them as opaque structs lets the
* function pointers / declarations compile, and the linker keeps the
* dead-code .text section even though it's unreachable.
*
* If you ever need to actually USE GstBuffer in this tree, replace
* these opaque decls with the project's own buffer abstraction; do not
* try to vendor in libgst itself. */
typedef struct _GstBuffer GstBuffer;
typedef struct _GstMemory GstMemory;
typedef struct _GstMapInfo GstMapInfo; /* opaque — dead-code in gsth265parser SEI insert */
/* GLib min/max constants — dead-code unsigned-overflow guards in
* gsth265parser.c. */
#define G_MAXUINT8 ((guint8)0xFF)
#define G_MAXUINT16 ((guint16)0xFFFF)
#define G_MAXUINT32 ((guint32)0xFFFFFFFFU)
#define G_MAXUINT64 ((guint64)0xFFFFFFFFFFFFFFFFULL)
#define G_MAXINT8 ((gint8)0x7F)
#define G_MAXINT16 ((gint16)0x7FFF)
#define G_MAXINT32 ((gint32)0x7FFFFFFF)
#define G_MAXINT64 ((gint64)0x7FFFFFFFFFFFFFFFLL)
#define G_MININT8 ((gint8)(-0x80))
#define G_MININT16 ((gint16)(-0x8000))
#define G_MININT32 ((gint32)(-0x80000000))
#define G_MAXSIZE ((gsize)-1)
/* GLib function-pointer typedefs used by g_list_* APIs (which our
* gst_compat declares as abort-stubs). They show up in code paths
* we never invoke but must compile. */
typedef void (*GDestroyNotify)(gpointer data);
typedef int (*GCompareFunc)(gconstpointer a, gconstpointer b);
typedef int (*GCompareDataFunc)(gconstpointer a, gconstpointer b, gpointer user_data);
/* GstMapFlags — passed to gst_memory_map / gst_buffer_map. Dead-code. */
#define GST_MAP_READ (1 << 0)
#define GST_MAP_WRITE (1 << 1)
#define GST_MAP_READWRITE (GST_MAP_READ | GST_MAP_WRITE)
/* Dead-code stubs for buffer / memory mapping (only referenced by
* gst_h265_parser_insert_sei_hevc which we never call). The compile
* needs declarations + addressable functions; abort on call. */
static inline gboolean
gst_memory_map(GstMemory *mem G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED,
int flags G_GNUC_UNUSED) { abort(); }
static inline void
gst_memory_unmap(GstMemory *mem G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED) { abort(); }
static inline gboolean
gst_buffer_map(GstBuffer *buf G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED,
int flags G_GNUC_UNUSED) { abort(); }
static inline void
gst_buffer_unmap(GstBuffer *buf G_GNUC_UNUSED, GstMapInfo *info G_GNUC_UNUSED) { abort(); }
static inline GstBuffer *
gst_buffer_new(void) { abort(); }
static inline gboolean
gst_buffer_copy_into(GstBuffer *dst G_GNUC_UNUSED, GstBuffer *src G_GNUC_UNUSED,
int flags G_GNUC_UNUSED, gsize offset G_GNUC_UNUSED,
gssize size G_GNUC_UNUSED) { abort(); }
static inline void
gst_buffer_append_memory(GstBuffer *buf G_GNUC_UNUSED, GstMemory *mem G_GNUC_UNUSED) { abort(); }
static inline GstMemory *
gst_memory_ref(GstMemory *mem G_GNUC_UNUSED) { abort(); }
static inline void
gst_memory_unref(GstMemory *mem G_GNUC_UNUSED) { abort(); }
static inline GstMemory *
gst_memory_copy(GstMemory *mem G_GNUC_UNUSED, gssize offset G_GNUC_UNUSED, gssize size G_GNUC_UNUSED) { abort(); }
static inline void
gst_clear_buffer(GstBuffer **buf) { *buf = NULL; }
#define GST_IS_BUFFER(b) (false)
/* GstBufferCopyFlags — used only by gst_buffer_copy_into in dead code. */
#define GST_BUFFER_COPY_METADATA (1 << 0)
#define GST_BUFFER_COPY_MEMORY (1 << 1)
#define GST_BUFFER_COPY_DEEP (1 << 2)
/* gst_util_ceil_log2(n) — ceil(log2(n)) for non-zero unsigned n.
* Used by gsth265parser.c::gst_h265_slice_parse_ref_pic_list_modification.
* That function is in the slice-header parser which the libva backend
* does NOT invoke (we only call parse_sps) but the linker still
* needs a definition. Provide a real impl: cheaper to compute than to
* justify a dead-code stub at every call site. */
static inline guint
gst_util_ceil_log2(guint32 n)
{
if (n <= 1) return 0;
/* __builtin_clz returns leading zeros for a 32-bit value;
* 32 - clz(n-1) = bits needed = ceil(log2(n)). */
return 32 - (guint)__builtin_clz(n - 1);
}
/* GstMapInfo's real definition is in <gst/gstmemory.h>; we need at
* least enough to make `info->data` / `info->size` compile. */
struct _GstMapInfo {
GstMemory *memory;
int flags;
guint8 *data;
gsize size;
gsize maxsize;
gpointer user_data[4];
gpointer _gst_reserved[GST_PADDING];
};
/* gst_memory_new_wrapped — dead-code stub (nalutils.c calls it from
* the SEI-insertion path the libva backend never invokes). */
static inline GstMemory *
gst_memory_new_wrapped(int flags, gpointer data, gsize maxsize,
gsize offset, gsize size, gpointer user_data,
void (*notify)(gpointer))
{
(void)flags; (void)data; (void)maxsize; (void)offset; (void)size;
(void)user_data; (void)notify;
abort();
}
/* ===== byte-order read / write macros =====
*
* GStreamer provides these as static-inline functions in
* <gst/gstutils.h>. We re-implement for aarch64 little-endian; the
* parser is byte-stream input, so endian-conversion is mechanical.
* The float / double variants are present in upstream but the parser
* never invokes them provide stubs so the address-taking sites in
* gstbytereader.h's function table compile. */
#define GST_READ_UINT8(data) \
(*((const guint8 *)(data)))
#define GST_READ_UINT16_LE(data) ( \
((guint16)((const guint8 *)(data))[0]) | \
((guint16)((const guint8 *)(data))[1] << 8))
#define GST_READ_UINT16_BE(data) ( \
((guint16)((const guint8 *)(data))[0] << 8) | \
((guint16)((const guint8 *)(data))[1]))
#define GST_READ_UINT24_LE(data) ( \
((guint32)((const guint8 *)(data))[0]) | \
((guint32)((const guint8 *)(data))[1] << 8) | \
((guint32)((const guint8 *)(data))[2] << 16))
#define GST_READ_UINT24_BE(data) ( \
((guint32)((const guint8 *)(data))[0] << 16) | \
((guint32)((const guint8 *)(data))[1] << 8) | \
((guint32)((const guint8 *)(data))[2]))
#define GST_READ_UINT32_LE(data) ( \
((guint32)((const guint8 *)(data))[0]) | \
((guint32)((const guint8 *)(data))[1] << 8) | \
((guint32)((const guint8 *)(data))[2] << 16) | \
((guint32)((const guint8 *)(data))[3] << 24))
#define GST_READ_UINT32_BE(data) ( \
((guint32)((const guint8 *)(data))[0] << 24) | \
((guint32)((const guint8 *)(data))[1] << 16) | \
((guint32)((const guint8 *)(data))[2] << 8) | \
((guint32)((const guint8 *)(data))[3]))
#define GST_READ_UINT64_LE(data) ( \
((guint64)((const guint8 *)(data))[0]) | \
((guint64)((const guint8 *)(data))[1] << 8) | \
((guint64)((const guint8 *)(data))[2] << 16) | \
((guint64)((const guint8 *)(data))[3] << 24) | \
((guint64)((const guint8 *)(data))[4] << 32) | \
((guint64)((const guint8 *)(data))[5] << 40) | \
((guint64)((const guint8 *)(data))[6] << 48) | \
((guint64)((const guint8 *)(data))[7] << 56))
#define GST_READ_UINT64_BE(data) ( \
((guint64)((const guint8 *)(data))[0] << 56) | \
((guint64)((const guint8 *)(data))[1] << 48) | \
((guint64)((const guint8 *)(data))[2] << 40) | \
((guint64)((const guint8 *)(data))[3] << 32) | \
((guint64)((const guint8 *)(data))[4] << 24) | \
((guint64)((const guint8 *)(data))[5] << 16) | \
((guint64)((const guint8 *)(data))[6] << 8) | \
((guint64)((const guint8 *)(data))[7]))
/* Float / double readers — dead-code, abort if called. The function
* table in gstbytereader.h takes the address of the underlying inline
* which we don't need to be functional, only addressable. */
static inline gfloat
GST_READ_FLOAT_LE(const guint8 *data) { (void)data; abort(); }
static inline gfloat
GST_READ_FLOAT_BE(const guint8 *data) { (void)data; abort(); }
static inline gdouble
GST_READ_DOUBLE_LE(const guint8 *data) { (void)data; abort(); }
static inline gdouble
GST_READ_DOUBLE_BE(const guint8 *data) { (void)data; abort(); }
/* Write side — nalutils.c writes-out SEI bytes (dead path for us but
* must compile). */
#define GST_WRITE_UINT8(data, val) do { \
((guint8 *)(data))[0] = (guint8)(val); \
} while (0)
#define GST_WRITE_UINT16_BE(data, val) do { \
((guint8 *)(data))[0] = (guint8)((val) >> 8); \
((guint8 *)(data))[1] = (guint8)((val)); \
} while (0)
#define GST_WRITE_UINT24_BE(data, val) do { \
((guint8 *)(data))[0] = (guint8)((val) >> 16); \
((guint8 *)(data))[1] = (guint8)((val) >> 8); \
((guint8 *)(data))[2] = (guint8)((val)); \
} while (0)
#define GST_WRITE_UINT32_BE(data, val) do { \
((guint8 *)(data))[0] = (guint8)((val) >> 24); \
((guint8 *)(data))[1] = (guint8)((val) >> 16); \
((guint8 *)(data))[2] = (guint8)((val) >> 8); \
((guint8 *)(data))[3] = (guint8)((val)); \
} while (0)
#ifndef MIN
# define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
#ifndef MAX
# define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif
/* ===== GArray ===== */
typedef struct {
char *data; /* exposed via g_array_index / GArray->data */
guint len; /* element count */
guint capacity; /* allocated element slots */
guint element_size;
gboolean clear; /* zero-fill on grow */
void (*clear_func)(gpointer);
} GArray;
GArray *g_array_new(gboolean zero_terminated, gboolean clear, guint element_size);
GArray *g_array_sized_new(gboolean zero_terminated, gboolean clear,
guint element_size, guint reserved_size);
GArray *g_array_set_size(GArray *array, guint length);
GArray *g_array_append_vals(GArray *array, gconstpointer data, guint len);
void g_array_set_clear_func(GArray *array, void (*clear_func)(gpointer));
gchar *g_array_free(GArray *array, gboolean free_segment);
GArray *g_array_unref(GArray *array);
#define g_array_append_val(a, v) g_array_append_vals((a), &(v), 1)
#define g_array_index(a, t, i) (((t *)(void *)(a)->data)[i])
/* ===== GList — stubs that abort if reached =====
*
* Surveyed call sites: gsth265parser.c uses g_list_prepend / g_list_sort /
* g_list_free_full in code paths the libva backend does not invoke for
* basic SPS parsing (likely SEI message accumulation). Stub to abort so
* any future call surfaces immediately rather than silently corrupting. */
/* GList — full struct (not opaque) so callers can do `list->data`.
* The functions still abort because we never construct a GList. */
typedef struct _GList GList;
struct _GList {
gpointer data;
GList *next;
GList *prev;
};
static inline GList *g_list_prepend(GList *list G_GNUC_UNUSED, gpointer data G_GNUC_UNUSED) { abort(); }
static inline GList *g_list_sort(GList *list G_GNUC_UNUSED, int (*cmp)(gconstpointer, gconstpointer) G_GNUC_UNUSED) { abort(); }
static inline void g_list_free_full(GList *list G_GNUC_UNUSED, void (*free_func)(gpointer) G_GNUC_UNUSED) { abort(); }
/* ===== g_once_init_enter / g_once_init_leave =====
*
* GLib's lazy-init guards. The parser uses these for one-shot static
* initialization (e.g. profile-name table). Our backend is single-
* threaded at the parser-init site (driver_init), so we can simplify
* to a plain run-once gate. */
#define g_once_init_enter(loc) (*(loc) == 0)
#define g_once_init_leave(loc, val) (*(loc) = (val))
/* ===== conversions ===== */
#define GINT_TO_POINTER(i) ((gpointer)(uintptr_t)(gint)(i))
#define GPOINTER_TO_INT(p) ((gint)(uintptr_t)(p))
#endif /* LIBVA_V4L2_REQUEST_FOURIER_GST_COMPAT_H */
+90
View File
@@ -0,0 +1,90 @@
/*
* v4l2-hevc-ext-controls.h verbatim mirror of Linux 7.0+ V4L2 stateless
* HEVC extended-SPS RPS control definitions, shipped as an internal
* header so this libva backend can be built against pre-7.0
* linux-api-headers packages (currently ampere ships 6.19-1).
*
* Upstream source: linux kernel, include/uapi/linux/v4l2-controls.h
* As-of: Linux 7.0-rc3 (Detlev Casanova / Collabora "VDPU381/VDPU383"
* series, see lkml.org/lkml/2026/1/9/1334). The two CIDs + two structs
* + two flag macros below are byte-for-byte the kernel UAPI definitions.
*
* Once linux-api-headers >= 7.0 is the floor across the fleet, this
* shim becomes redundant `<linux/v4l2-controls.h>` will provide the
* same symbols. The include order in h265.c is: this header BEFORE
* <linux/v4l2-controls.h>, so when the system catches up, the macro
* guards below silently no-op and we use the system definitions.
*
* License: MIT (matches backend's COPYING.MIT). Per LGPL § 3.b., the
* kernel UAPI struct definitions themselves are excepted from the
* kernel's overall GPL and may be copied verbatim into userspace
* binaries without inheriting GPL.
*
* Rationale + iter2 plan: see
* ~/src/ampere-kernel-decoders/phase4_plan_iter2.md (§Step 3)
* ~/src/ampere-kernel-decoders/phase0_findings_iter2.md
*/
#ifndef LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H
#define LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H
#include <linux/types.h>
#include <linux/v4l2-controls.h>
#ifndef V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
# define V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS \
(V4L2_CID_CODEC_STATELESS_BASE + 408)
#endif
#ifndef V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS
# define V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS \
(V4L2_CID_CODEC_STATELESS_BASE + 409)
#endif
#ifndef V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED
# define V4L2_HEVC_EXT_SPS_ST_RPS_FLAG_INTER_REF_PIC_SET_PRED 0x1
#endif
#ifndef V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT
# define V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT 0x1
#endif
/*
* struct v4l2_ctrl_hevc_ext_sps_st_rps HEVC short-term RPS parameters.
*
* Dynamic-size 1-dimension array. Number of elements is
* v4l2_ctrl_hevc_sps::num_short_term_ref_pic_sets
* Can contain up to 65 elements (the H.265 spec § 7.4.3.2.1 maximum).
*/
#ifndef V4L2_HEVC_EXT_SPS_ST_RPS_DEFINED
# define V4L2_HEVC_EXT_SPS_ST_RPS_DEFINED 1
struct v4l2_ctrl_hevc_ext_sps_st_rps {
__u8 delta_idx_minus1;
__u8 delta_rps_sign;
__u8 num_negative_pics;
__u8 num_positive_pics;
__u32 used_by_curr_pic;
__u32 use_delta_flag;
__u16 abs_delta_rps_minus1;
__u16 delta_poc_s0_minus1[16];
__u16 delta_poc_s1_minus1[16];
__u16 flags;
};
#endif
/*
* struct v4l2_ctrl_hevc_ext_sps_lt_rps HEVC long-term RPS parameters.
*
* Dynamic-size 1-dimension array. Number of elements is
* v4l2_ctrl_hevc_sps::num_long_term_ref_pics_sps
* Can contain up to 33 elements (the H.265 spec § 7.4.3.2.1 maximum).
*/
#ifndef V4L2_HEVC_EXT_SPS_LT_RPS_DEFINED
# define V4L2_HEVC_EXT_SPS_LT_RPS_DEFINED 1
struct v4l2_ctrl_hevc_ext_sps_lt_rps {
__u16 lt_ref_pic_poc_lsb_sps;
__u16 flags;
};
#endif
#endif /* LIBVA_V4L2_REQUEST_FOURIER_V4L2_HEVC_EXT_CONTROLS_H */
+165 -21
View File
@@ -39,6 +39,8 @@
#include <linux/dma-buf.h> #include <linux/dma-buf.h>
#include "nv15.h"
#include "nv12_col128.h"
#include "tiled_yuv.h" #include "tiled_yuv.h"
#include "utils.h" #include "utils.h"
#include "v4l2.h" #include "v4l2.h"
@@ -86,14 +88,51 @@ VAStatus RequestCreateImage(VADriverContextP context, VAImageFormat *format,
for (i = 0; i < planes_count; i++) for (i = 0; i < planes_count; i++)
size += destination_sizes[i]; size += destination_sizes[i];
/* Here we calculate the sizes assuming NV12. */ if (format->fourcc == VA_FOURCC_P010) {
/*
* iter39: P010 image overrides V4L2-side NV15 sizing. The
* source is the kernel-reported NV15 packed plane; the image
* buffer holds dense P010 (2 bytes per pixel, 16bpp).
* Recompute sizes/pitches against P010 layout so consumers
* (vaGetImage, vaDeriveImage) see standard P010 geometry.
*/
destination_bytesperlines[0] = width * 2;
destination_sizes[0] = destination_bytesperlines[0] * format_height;
for (i = 1; i < destination_planes_count; i++) {
destination_bytesperlines[i] = destination_bytesperlines[0];
destination_sizes[i] = destination_sizes[0] / 2;
}
size = 0;
for (i = 0; i < destination_planes_count; i++)
size += destination_sizes[i];
} else if (format->fourcc == VA_FOURCC_NV12 &&
video_format->v4l2_format == V4L2_PIX_FMT_NV12_COL128) {
/*
* iter40 Phase 5 review F2: NC12 source, NV12 image output.
* V4L2-reported destination_bytesperlines[0] is the NC12
* column stride (= ALIGN(height,8) * 3/2 e.g. 1080 for
* 1280×720), NOT the linear NV12 Y stride. Override to the
* linear stride (width) so VAImage pitches reflect the
* detile-output layout the consumer reads.
*/
destination_bytesperlines[0] = width;
destination_sizes[0] = destination_bytesperlines[0] * format_height;
for (i = 1; i < destination_planes_count; i++) {
destination_bytesperlines[i] = destination_bytesperlines[0];
destination_sizes[i] = destination_sizes[0] / 2;
}
size = 0;
for (i = 0; i < destination_planes_count; i++)
size += destination_sizes[i];
} else {
/* NV12: V4L2 stride is correct, sizes derived from height. */
destination_sizes[0] = destination_bytesperlines[0] * format_height; destination_sizes[0] = destination_bytesperlines[0] * format_height;
for (i = 1; i < destination_planes_count; i++) { for (i = 1; i < destination_planes_count; i++) {
destination_bytesperlines[i] = destination_bytesperlines[0]; destination_bytesperlines[i] = destination_bytesperlines[0];
destination_sizes[i] = destination_sizes[0] / 2; destination_sizes[i] = destination_sizes[0] / 2;
} }
}
id = object_heap_allocate(&driver_data->image_heap); id = object_heap_allocate(&driver_data->image_heap);
image_object = IMAGE(driver_data, id); image_object = IMAGE(driver_data, id);
@@ -217,19 +256,90 @@ static VAStatus copy_surface_to_image (struct request_data *driver_data,
} }
for (i = 0; i < surface_object->destination_planes_count; i++) { for (i = 0; i < surface_object->destination_planes_count; i++) {
#ifdef __arm__ /*
* iter40 Phase 5 review F1: guard extended from __arm__ to
* __arm__ || __aarch64__. Without this, the detile primitives
* silently compiled out on aarch64 (fresnel RK3399, ampere
* RK3588, higgs Pi CM5) and the memcpy fall-through delivered
* raw tiled bytes to NV12/P010 image consumers. iter39 5/5
* PASS masked the issue because no 10-bit path was exercised.
*/
#if defined(__arm__) || defined(__aarch64__)
/*
* Sunxi tiled_to_planar lives in tiled_yuv.S which is
* #ifdef __arm__ symbol absent on aarch64. Keep this
* branch arm-only; aarch64 Sunxi support would need a C or
* aarch64-ASM port (no Sunxi aarch64 board in current fleet).
*/
#if defined(__arm__)
if (!video_format_is_linear(driver_data->video_format)) if (!video_format_is_linear(driver_data->video_format))
tiled_to_planar(surface_object->destination_data[i], tiled_to_planar(surface_object->destination_data[i],
buffer_object->data + image->offsets[i], buffer_object->data + image->offsets[i],
image->pitches[i], image->width, image->pitches[i], image->width,
i == 0 ? image->height : i == 0 ? image->height :
image->height / 2); image->height / 2);
else { else
#endif
if (driver_data->is_10bit &&
image->format.fourcc == VA_FOURCC_P010) {
/*
* iter39: rkvdec emits NV15 (4×10-bit packed in 5
* bytes); the VA image buffer is dense P010 (2B/pixel,
* value in bits[15:6]). Source stride is the V4L2-
* reported NV15 bytesperline (= ceil(width/4)*5,
* possibly aligned higher by the kernel); destination
* stride is image->pitches[i] = width * 2.
*/
unsigned int plane_h = (i == 0) ? image->height
: image->height / 2;
nv15_unpack_plane_to_p010(
surface_object->destination_data[i],
(uint16_t *)(buffer_object->data + image->offsets[i]),
image->width, plane_h,
surface_object->destination_bytesperlines[i]);
} else if (driver_data->video_format != NULL &&
driver_data->video_format->v4l2_format ==
V4L2_PIX_FMT_NV12_COL128 &&
image->format.fourcc == VA_FOURCC_NV12) {
/*
* iter40: Pi 5 rpi-hevc-dec emits NV12_COL128 (SAND
* 128-pixel-wide column tiles). Detile to linear NV12
* via the per-plane primitive. surface_object->
* destination_data[i] is the V4L2 CAPTURE mmap (single
* buffer, planes_count==2): i==0 is the Y plane base,
* i==1 is the UV plane base offset within the SAME
* physical buffer (per cap_pool plane[1] offset = Y
* plane size in COL128 layout).
*
* src_col_stride = destination_bytesperlines[i] = the
* kernel-reported NC12 bytesperline (column stride,
* = ALIGN(image_h, 8) * 3/2). Same for both planes
* since column geometry is plane-agnostic.
*
* dst stride is image->pitches[i] = image->width
* (overridden in RequestCreateImage NC12 branch below).
*/
if (i == 0) {
nv12_col128_detile_y(
(uint8_t *)(buffer_object->data + image->offsets[i]),
image->pitches[i],
surface_object->destination_data[i],
surface_object->destination_bytesperlines[i],
image->width, image->height);
} else {
nv12_col128_detile_uv(
(uint8_t *)(buffer_object->data + image->offsets[i]),
image->pitches[i],
surface_object->destination_data[i],
surface_object->destination_bytesperlines[i],
image->width, image->height / 2);
}
} else {
#endif #endif
memcpy(buffer_object->data + image->offsets[i], memcpy(buffer_object->data + image->offsets[i],
surface_object->destination_data[i], surface_object->destination_data[i],
surface_object->destination_sizes[i]); surface_object->destination_sizes[i]);
#ifdef __arm__ #if defined(__arm__) || defined(__aarch64__)
} }
#endif #endif
} }
@@ -268,9 +378,17 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
/* Fully populate VAImageFormat to match QueryImageFormats output. */ /* Fully populate VAImageFormat to match QueryImageFormats output. */
memset(&format, 0, sizeof(format)); memset(&format, 0, sizeof(format));
if (driver_data->is_10bit) {
/* iter39: 10-bit session derives a P010 image. NV15-source
* unpack happens in copy_surface_to_image. */
format.fourcc = VA_FOURCC_P010;
format.byte_order = VA_LSB_FIRST;
format.bits_per_pixel = 24;
} else {
format.fourcc = VA_FOURCC_NV12; format.fourcc = VA_FOURCC_NV12;
format.byte_order = VA_LSB_FIRST; format.byte_order = VA_LSB_FIRST;
format.bits_per_pixel = 12; format.bits_per_pixel = 12;
}
status = RequestCreateImage(context, &format, surface_object->width, status = RequestCreateImage(context, &format, surface_object->width,
surface_object->height, image); surface_object->height, image);
@@ -305,26 +423,52 @@ VAStatus RequestDeriveImage(VADriverContextP context, VASurfaceID surface_id,
VAStatus RequestQueryImageFormats(VADriverContextP context, VAStatus RequestQueryImageFormats(VADriverContextP context,
VAImageFormat *formats, int *formats_count) VAImageFormat *formats, int *formats_count)
{ {
struct request_data *driver_data = context->pDriverData;
int n = 0;
/* /*
* Populate the VAImageFormat fully per VAAPI spec for NV12 * Populate the VAImageFormat fully per VAAPI spec not just
* not just .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv, * .fourcc. Consumers (FFmpeg's hwcontext_vaapi, mpv, Firefox)
* Firefox) read .byte_order and .bits_per_pixel; leaving them * read .byte_order and .bits_per_pixel; leaving them
* uninitialized inherits whatever caller-stack garbage is in * uninitialized inherits caller-stack garbage and produces
* the buffer and produces non-deterministic behavior. Reference: * non-deterministic behavior. Reference: Mesa's
* Mesa's gallium/frontends/va/image.c::vlVaQueryImageFormats and * gallium/frontends/va/image.c::vlVaQueryImageFormats and
* intel-vaapi-driver's i965_drv_video.c both publish NV12 * intel-vaapi-driver's i965_drv_video.c.
* with byte_order=VA_LSB_FIRST and bits_per_pixel=12.
* *
* For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask * iter39: advertise P010 when an active session is 10-bit so
* are not meaningful (those describe RGB bit layouts); leave them * ffmpeg-vaapi sees a valid 10-bit-compatible entry during
* zeroed via memset before populating. * vaQueryImageFormats. NV12 stays advertised unconditionally so
* the 8-bit catalog query response is unchanged.
*/ */
memset(&formats[0], 0, sizeof(formats[0])); memset(&formats[n], 0, sizeof(formats[n]));
formats[0].fourcc = VA_FOURCC_NV12; formats[n].fourcc = VA_FOURCC_NV12;
formats[0].byte_order = VA_LSB_FIRST; formats[n].byte_order = VA_LSB_FIRST;
formats[0].bits_per_pixel = 12; formats[n].bits_per_pixel = 12;
*formats_count = 1; n++;
/*
* iter39 Option B revert (2026-05-17): P010 advertisement is
* gated on driver_data->is_10bit again. Previously advertised
* unconditionally (63fed87) so ffmpeg-vaapi's early
* vaQueryImageFormats (pre-vaCreateContext) could see it for
* 10-bit profiles but that broke HEVC 8-bit on fresnel:
* ffmpeg-vaapi picked P010 for the HEVC hwframe pool, EndPicture
* SEGV'd in the .so when the consumer-side P010 expectations met
* an 8-bit NV12 CAPTURE buffer.
* Safe because Option B drops VAProfileHEVCMain10 + Hi10P from
* enumeration no 10-bit decode pipeline will reach this catalog
* query so the gate-on-is_10bit (which stays false for 8-bit
* profiles) correctly returns NV12-only.
*/
if (driver_data->is_10bit && n < V4L2_REQUEST_MAX_IMAGE_FORMATS) {
memset(&formats[n], 0, sizeof(formats[n]));
formats[n].fourcc = VA_FOURCC_P010;
formats[n].byte_order = VA_LSB_FIRST;
formats[n].bits_per_pixel = 24;
n++;
}
*formats_count = n;
return VA_STATUS_SUCCESS; return VA_STATUS_SUCCESS;
} }
+41 -3
View File
@@ -22,6 +22,9 @@
autoconf_data = configuration_data() autoconf_data = configuration_data()
autoconf_data.set('VA_DRIVER_INIT_FUNC', va_driver_init_func) autoconf_data.set('VA_DRIVER_INIT_FUNC', va_driver_init_func)
if get_option('daedalus_v4l2')
autoconf_data.set('HAVE_DAEDALUS_V4L2', 1)
endif
autoconf = configure_file( autoconf = configure_file(
output: 'autoconfig.h', output: 'autoconfig.h',
@@ -50,7 +53,19 @@ sources = [
'h265.c', 'h265.c',
'vp8.c', 'vp8.c',
'vp9.c', 'vp9.c',
'codec.c' 'av1.c',
'codec.c',
'nv15.c',
'nv12_col128.c',
# Vendored GStreamer 1.28.2 H.265 parser + utilities (LGPL v2.1+,
# see src/h265_parser/gst_compat.h for sourcing notes + per-iter2
# adaptation strategy).
'h265_parser/gst_compat.c',
'h265_parser/gst/base/gstbitreader.c',
'h265_parser/gst/base/gstbytereader.c',
'h265_parser/gst/codecparsers/nalutils.c',
'h265_parser/gst/codecparsers/gsth265parser.c'
] ]
headers = [ headers = [
@@ -76,11 +91,34 @@ headers = [
'h265.h', 'h265.h',
'vp8.h', 'vp8.h',
'vp9.h', 'vp9.h',
'codec.h' 'codec.h',
'nv15.h',
'nv12_col128.h',
# Internal mirror of Linux 7.0 V4L2 HEVC EXT_SPS_*_RPS UAPI defs
# (allows building against pre-7.0 linux-api-headers; redundant
# once the host headers are 7.0+).
'hevc-ctrls/v4l2-hevc-ext-controls.h',
# Vendored GStreamer + project shim headers (see sources above).
'h265_parser/gst_compat.h',
'h265_parser/gst/gst.h',
'h265_parser/gst/glib-compat-private.h',
'h265_parser/gst/base/base-prelude.h',
'h265_parser/gst/base/gstbitreader.h',
'h265_parser/gst/base/gstbytereader.h',
'h265_parser/gst/base/gstbitwriter.h',
'h265_parser/gst/codecparsers/codecparsers-prelude.h',
'h265_parser/gst/codecparsers/gsth265parser.h',
'h265_parser/gst/codecparsers/nalutils.h'
] ]
includes = [ includes = [
include_directories('../include') include_directories('../include'),
# Vendored GStreamer parser tree — the parser's #include <gst/base/...>
# style references resolve here via stub headers that redirect to
# gst_compat.h.
include_directories('h265_parser')
] ]
cflags = [ cflags = [
+114
View File
@@ -0,0 +1,114 @@
/*
* V4L2_PIX_FMT_NV12_COL128 linear NV12 detile primitive. Pi 5 / CM5
* rpi-hevc-dec CAPTURE. iter40 (2026-05-17).
*
* Math derived from kernel hevc_d_video.c (size formula) +
* ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h (per-pixel offset). The
* single-stripe fast path memcpy's 128 bytes at a time when an output
* row falls entirely within one tile column (the common case);
* straddling rows are split into two memcpy halves.
*
* No NEON / SIMD here correctness first. Each output row generates
* (width / 128) + ~1 memcpys of up to 128 bytes; for 1920x1080 that's
* ~17000 small memcpys per frame, fine for Phase 1 PoC.
*/
#include "nv12_col128.h"
#include <string.h>
/*
* Tile column width in bytes. The 'COL128' name embeds this; if it ever
* varies, take it from V4L2_PIX_FMT_NV12_COL128's kernel definition.
*/
#define NC12_TILE_W 128
/*
* Common Y / UV plane detile the layout is identical (single-byte per
* pixel, column-major 128-wide tiles). The only thing that varies is
* what plane the caller passes in. width here is plane width in bytes
* (= image width for both Y and CbCr-interleaved NV12 UV); height is
* plane height in pixels (image height for Y, image height / 2 for UV).
*/
static void nv12_col128_detile_plane(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src,
unsigned int src_col_stride,
unsigned int width, unsigned int height)
{
unsigned int y, x;
for (y = 0; y < height; y++) {
uint8_t *drow = dst + y * dst_stride;
x = 0;
while (x < width) {
unsigned int col = x / NC12_TILE_W;
unsigned int in_col = x % NC12_TILE_W;
unsigned int n = NC12_TILE_W - in_col;
if (n > width - x)
n = width - x;
/*
* Source byte = base + col*128*col_stride + y*128 + in_col
* Copy n contiguous bytes (all within this tile column,
* since n is capped at the remaining width-in-column).
*/
const uint8_t *p = src
+ (size_t)col * NC12_TILE_W * src_col_stride
+ (size_t)y * NC12_TILE_W
+ in_col;
memcpy(drow + x, p, n);
x += n;
}
}
}
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_y, unsigned int src_col_stride,
unsigned int width, unsigned int height)
{
nv12_col128_detile_plane(dst, dst_stride, src_y, src_col_stride,
width, height);
}
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_uv, unsigned int src_col_stride,
unsigned int width, unsigned int uv_height)
{
/* UV plane (CbCr interleaved): byte-width equals Y-plane width
* (one Cb + one Cr per 2x2 Y block 2 bytes per 2 horizontal Y
* samples 1 byte per Y pixel horizontally). Height is half. */
nv12_col128_detile_plane(dst, dst_stride, src_uv, src_col_stride,
width, uv_height);
}
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
unsigned int image_height)
{
unsigned int aligned_h = (image_height + 7) & ~7u;
/*
* In the COL128 SAND layout, Y and UV are NOT separate planes
* concatenated end-to-end. Within EACH 128-pixel-wide column:
* first 128 * height bytes = Y data for this column strip
* next 128 * height / 2 bytes = UV data for this column strip
* total 128 * bytesperline (= 128 * height * 3/2) bytes per column
*
* The "UV plane base" pointer (data[1] in AVFrame convention) is
* just data[0] + (128 * height) the offset of the UV bytes
* WITHIN the first column. All subsequent UV bytes are reached by
* the same column-stride arithmetic the Y plane uses (col *
* 128 * bytesperline + y * 128 + in_col), so passing this offset
* pointer + iterating y over [0, height/2) traverses all UV rows
* across all columns correctly.
*
* Earlier wrong formula was num_columns * 128 * aligned_h (i.e.
* sizeof(linear Y plane)) that pushed past the end of the SAND
* buffer because the layout isn't planes-end-to-end.
*
* Cross-check: kernel sizeimage = bytesperline * width =
* (aligned_h * 3/2) * num_columns * 128 = num_columns * 128 *
* aligned_h * 3/2. Per column: 128 * aligned_h * 3/2. Y portion
* per column: 128 * aligned_h. UV portion per column: half of Y.
* Sum across columns: matches sizeimage.
*/
return NC12_TILE_W * aligned_h;
}
+88
View File
@@ -0,0 +1,88 @@
/*
* V4L2_PIX_FMT_NV12_COL128 (NC12) SAND-tiled linear NV12 detile.
*
* Pi 5 / CM5 (BCM2712) rpi-hevc-dec CAPTURE format. iter40 (2026-05-17).
*
* Layout (kernel drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
* size-formula + ffmpeg/Kynesim libavutil/rpi_sand_fn_pw.h per-pixel
* offset math):
*
* width ALIGN(image_width, 128) -- columns are 128 px wide
* height ALIGN(image_height, 8)
* col_stride (= bytesperline) = height * 3 / 2
* (bytes per [128-wide column] vertical unit incl. Y + UV)
* sizeimage = col_stride * width = total bytes
*
* For pixel (x, y) in the Y plane:
* col = x / 128
* in_col_x = x % 128
* offset = col * col_stride * 128 + y * 128 + in_col_x
*
* UV plane starts at offset (128 * height * num_columns_y) the same
* per-column layout, h/2 rows tall (CbCr interleaved).
*
* The primitive copies the entire image extent at once. width/height are
* the cropped consumer-visible dimensions; src_col_stride is the kernel-
* reported bytesperline (i.e. ALIGN(height,8) * 3/2).
*/
#ifndef _NV12_COL128_H_
#define _NV12_COL128_H_
#include <stdint.h>
#include <linux/videodev2.h>
/*
* Pre-Pi-kernel headers (Arch ALARM linux-api-headers, older mainline
* kernel-headers packages) may not define V4L2_PIX_FMT_NV12_COL128. The
* fourcc is Pi-specific. Provide a private fallback so the backend
* builds on hosts that target NON-Pi codecs too.
*/
#ifndef V4L2_PIX_FMT_NV12_COL128
#define V4L2_PIX_FMT_NV12_COL128 \
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
((unsigned int)('1') << 16) | ((unsigned int)('2') << 24))
#endif
#ifndef V4L2_PIX_FMT_NV12_10_COL128
/* 10-bit SAND variant: 3 pixels packed into 4 bytes in 128-byte / 96-pixel
* wide columns. iter40 references the fourcc for completeness; the 10-bit
* Pi 5 HEVC chapter (Main10) is post-iter40. */
#define V4L2_PIX_FMT_NV12_10_COL128 \
((unsigned int)('N') | ((unsigned int)('C') << 8) | \
((unsigned int)('3') << 16) | ((unsigned int)('0') << 24))
#endif
/* Detile the Y plane of an NC12 source to a linear NV12 Y plane.
* dst : pointer to linear NV12 Y plane (caller-owned, dst_stride * height bytes)
* dst_stride : linear Y plane stride in bytes (= width for plain NV12)
* src_y : pointer to start of NC12 Y plane (= NC12 buffer base)
* src_col_stride: kernel-reported bytesperline (= ALIGN(height,8) * 3/2)
* width, height: cropped image dimensions in pixels
*/
void nv12_col128_detile_y(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_y, unsigned int src_col_stride,
unsigned int width, unsigned int height);
/* Detile the UV plane (CbCr interleaved, half-height) of an NC12 source.
* dst : pointer to linear NV12 UV plane
* dst_stride : linear UV plane stride in bytes (= width for NV12)
* src_uv : pointer to start of NC12 UV plane (= src_y + Y-plane-size)
* src_col_stride: same as Y plane (same column geometry)
* width : Y-plane width in pixels (UV plane has same byte width)
* uv_height : UV plane height = height / 2
*/
void nv12_col128_detile_uv(uint8_t *dst, unsigned int dst_stride,
const uint8_t *src_uv, unsigned int src_col_stride,
unsigned int width, unsigned int uv_height);
/* Compute the offset of the UV plane within an NC12 buffer.
* image_width, image_height: cropped image dimensions in pixels
* Returns: byte offset from buffer start to UV plane start
* (= 128 * ALIGN(image_height, 8) * num_columns_y)
*/
unsigned int nv12_col128_uv_plane_offset(unsigned int image_width,
unsigned int image_height);
#endif /* _NV12_COL128_H_ */
+75
View File
@@ -0,0 +1,75 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#include "nv15.h"
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
unsigned int width, unsigned int height,
unsigned int src_stride)
{
unsigned int x, y;
unsigned int dst_pitch_px = width;
for (y = 0; y < height; y++) {
const uint8_t *s = src + y * src_stride;
uint16_t *d = dst + y * dst_pitch_px;
for (x = 0; x + 4 <= width; x += 4) {
uint16_t a = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
uint16_t b = ((uint16_t)s[1] >> 2) | ((uint16_t)(s[2] & 0x0F) << 6);
uint16_t c = ((uint16_t)s[2] >> 4) | ((uint16_t)(s[3] & 0x3F) << 4);
uint16_t e = ((uint16_t)s[3] >> 6) | ((uint16_t)s[4] << 2);
d[0] = (uint16_t)(a << 6);
d[1] = (uint16_t)(b << 6);
d[2] = (uint16_t)(c << 6);
d[3] = (uint16_t)(e << 6);
d += 4;
s += 5;
}
if (x < width) {
unsigned int rem = width - x;
uint16_t pix[4] = { 0, 0, 0, 0 };
pix[0] = (uint16_t)s[0] | ((uint16_t)(s[1] & 0x03) << 8);
if (rem >= 2)
pix[1] = ((uint16_t)s[1] >> 2) |
((uint16_t)(s[2] & 0x0F) << 6);
if (rem >= 3)
pix[2] = ((uint16_t)s[2] >> 4) |
((uint16_t)(s[3] & 0x3F) << 4);
if (rem >= 4)
pix[3] = ((uint16_t)s[3] >> 6) |
((uint16_t)s[4] << 2);
{
unsigned int j;
for (j = 0; j < rem; j++)
d[j] = (uint16_t)(pix[j] << 6);
}
}
}
}
+61
View File
@@ -0,0 +1,61 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#ifndef _NV15_H_
#define _NV15_H_
#include <stdint.h>
#include <linux/videodev2.h>
/*
* Older or downstream linux-api-headers / kernel-headers packages may
* not define V4L2_PIX_FMT_NV15. Provide a fallback so the backend
* builds on hosts whose headers are pre-NV15-merge or omit it (e.g.
* Pi 5 Debian trixie 6.12.62 headers include NC12 but not NV15).
* Same numeric value as mainline.
*/
#ifndef V4L2_PIX_FMT_NV15
#define V4L2_PIX_FMT_NV15 \
((unsigned int)('N') | ((unsigned int)('V') << 8) | \
((unsigned int)('1') << 16) | ((unsigned int)('5') << 24))
#endif
/*
* Unpack one plane of V4L2_PIX_FMT_NV15 (4 × 10-bit values packed into
* 5 consecutive bytes, LSB-first) into VA_FOURCC_P010 (16-bit per pixel,
* value in bits [15:6], zeros in [5:0]).
*
* Layout per Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
* Call once per plane: luma (W × H, src_stride = ceil(W/4)*5) and chroma
* (W × H/2 same width because UV are interleaved 10-bit values).
*
* src_stride must be the kernel-reported bytesperline for the NV15 plane.
* The destination is dense P010 with row pitch = width * 2 bytes.
*/
void nv15_unpack_plane_to_p010(const uint8_t *src, uint16_t *dst,
unsigned int width, unsigned int height,
unsigned int src_stride);
#endif
+36 -1
View File
@@ -36,6 +36,7 @@
#include "mpeg2.h" #include "mpeg2.h"
#include "vp8.h" #include "vp8.h"
#include "vp9.h" #include "vp9.h"
#include "av1.h"
#include <assert.h> #include <assert.h>
#include <stdio.h> #include <stdio.h>
@@ -132,12 +133,14 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileH264High10:
memcpy(&surface_object->params.h264.picture, memcpy(&surface_object->params.h264.picture,
buffer_object->data, buffer_object->data,
sizeof(surface_object->params.h264.picture)); sizeof(surface_object->params.h264.picture));
break; break;
case VAProfileHEVCMain: case VAProfileHEVCMain:
case VAProfileHEVCMain10:
memcpy(&surface_object->params.h265.picture, memcpy(&surface_object->params.h265.picture,
buffer_object->data, buffer_object->data,
sizeof(surface_object->params.h265.picture)); sizeof(surface_object->params.h265.picture));
@@ -155,6 +158,12 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
sizeof(surface_object->params.vp9.picture)); sizeof(surface_object->params.vp9.picture));
break; break;
case VAProfileAV1Profile0:
memcpy(&surface_object->params.av1.picture,
buffer_object->data,
sizeof(surface_object->params.av1.picture));
break;
default: default:
break; break;
} }
@@ -167,12 +176,14 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileH264High10:
memcpy(&surface_object->params.h264.slice, memcpy(&surface_object->params.h264.slice,
buffer_object->data, buffer_object->data,
sizeof(surface_object->params.h264.slice)); sizeof(surface_object->params.h264.slice));
break; break;
case VAProfileHEVCMain: { case VAProfileHEVCMain:
case VAProfileHEVCMain10: {
unsigned int n = surface_object->params.h265.num_slices; unsigned int n = surface_object->params.h265.num_slices;
if (n < HEVC_MAX_SLICES_PER_FRAME) { if (n < HEVC_MAX_SLICES_PER_FRAME) {
memcpy(&surface_object->params.h265.slices[n], memcpy(&surface_object->params.h265.slices[n],
@@ -220,6 +231,7 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileH264High10:
memcpy(&surface_object->params.h264.matrix, memcpy(&surface_object->params.h264.matrix,
buffer_object->data, buffer_object->data,
sizeof(surface_object->params.h264.matrix)); sizeof(surface_object->params.h264.matrix));
@@ -227,6 +239,7 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
break; break;
case VAProfileHEVCMain: case VAProfileHEVCMain:
case VAProfileHEVCMain10:
memcpy(&surface_object->params.h265.iqmatrix, memcpy(&surface_object->params.h265.iqmatrix,
buffer_object->data, buffer_object->data,
sizeof(surface_object->params.h265.iqmatrix)); sizeof(surface_object->params.h265.iqmatrix));
@@ -286,6 +299,7 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
case VAProfileH264ConstrainedBaseline: case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh: case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh: case VAProfileH264StereoHigh:
case VAProfileH264High10:
rc = h264_set_controls(driver_data, context, profile, rc = h264_set_controls(driver_data, context, profile,
surface_object); surface_object);
if (rc < 0) if (rc < 0)
@@ -293,6 +307,7 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
break; break;
case VAProfileHEVCMain: case VAProfileHEVCMain:
case VAProfileHEVCMain10:
rc = h265_set_controls(driver_data, context, surface_object); rc = h265_set_controls(driver_data, context, surface_object);
if (rc < 0) if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED; return VA_STATUS_ERROR_OPERATION_FAILED;
@@ -310,6 +325,26 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
return VA_STATUS_ERROR_OPERATION_FAILED; return VA_STATUS_ERROR_OPERATION_FAILED;
break; break;
case VAProfileAV1Profile0:
/*
* Populates V4L2_CID_STATELESS_AV1_SEQUENCE from
* VAPictureParameterBufferAV1. The daedalus_v4l2 daemon
* (issue #11 daemon track) synthesises an OBU_SEQUENCE_HEADER
* from this ctrl and prepends it to the slice bitstream
* before handing it to libavcodec/libdav1d, which otherwise
* cannot parse the (sequence-header-stripped) OUTPUT buffer
* that ffmpeg-vaapi delivers.
*
* On the RK3588 vpu981 hardware path the same SEQUENCE ctrl
* is harmless: vpu981's driver parses the OBU stream
* directly and ignores the ctrl payload, so no per-decoder
* gating is required here.
*/
rc = av1_set_controls(driver_data, context, surface_object);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
break;
default: default:
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE; return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
} }
+279 -1
View File
@@ -57,6 +57,8 @@
#include <linux/media.h> #include <linux/media.h>
#include <linux/videodev2.h> #include <linux/videodev2.h>
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
/* /*
* fresnel-fourier iter4 Phase 6 commit Z + iter7 Phase 6 (B1a): device-path * fresnel-fourier iter4 Phase 6 commit Z + iter7 Phase 6 (B1a): device-path
* auto-detect via media controller topology with decoder-entity discrimination. * auto-detect via media controller topology with decoder-entity discrimination.
@@ -91,6 +93,10 @@
static const char * const known_decoder_drivers[] = { static const char * const known_decoder_drivers[] = {
"rkvdec", "rkvdec",
"hantro-vpu", "hantro-vpu",
"rpi-hevc-dec", /* iter40: Pi 5 / CM5 stateless HEVC */
#ifdef HAVE_DAEDALUS_V4L2
"daedalus_v4l2", /* phase 8.10: Pi 5 daemon-backed VP9/AV1/H264 */
#endif
"cedrus", "cedrus",
"sun4i_csi", "sun4i_csi",
NULL NULL
@@ -286,6 +292,43 @@ out:
* - non-NULL match only that exact driver name * - non-NULL match only that exact driver name
* - NULL match any name in known_decoder_drivers[] * - NULL match any name in known_decoder_drivers[]
*/ */
/*
* iter2 (ampere-kernel-decoders campaign) runtime probe for the
* V4L2 stateless HEVC EXT_SPS_{ST,LT}_RPS controls added in
* Linux 7.0 (Casanova VDPU381/VDPU383 series). Returns true iff BOTH
* controls are registered on the given fd. Stored per-fd on
* driver_data so the multi-device-probe model (iter38) doesn't
* silently misbehave when codec routing switches devices.
*
* The two CIDs together are the gate neither alone is meaningful
* without the other (st-RPS + lt-RPS arrays both need to be set to
* match the SPS num_short_term_ref_pic_sets / num_long_term_ref_pics_sps
* counts). Old kernels (RK3399 rkvdec on linux 6.x) register neither;
* RK3588 rkvdec (VDPU381/383 path) registers both.
*
* Reference: phase4_plan_iter2.md §Step 3 in
* ~/src/ampere-kernel-decoders/.
*/
static bool probe_hevc_ext_sps_rps_controls(int video_fd)
{
struct v4l2_queryctrl q;
if (video_fd < 0)
return false;
memset(&q, 0, sizeof(q));
q.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS;
if (ioctl(video_fd, VIDIOC_QUERYCTRL, &q) < 0)
return false;
memset(&q, 0, sizeof(q));
q.id = V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS;
if (ioctl(video_fd, VIDIOC_QUERYCTRL, &q) < 0)
return false;
return true;
}
static int find_decoder_device_by_driver(const char *want_driver, static int find_decoder_device_by_driver(const char *want_driver,
char *video_out, size_t video_out_sz, char *video_out, size_t video_out_sz,
char *media_out, size_t media_out_sz) char *media_out, size_t media_out_sz)
@@ -369,6 +412,16 @@ char request_device_kind_for_profile(VAProfile profile)
case VAProfileMPEG2Main: case VAProfileMPEG2Main:
case VAProfileVP8Version0_3: case VAProfileVP8Version0_3:
return 'h'; return 'h';
case VAProfileAV1Profile0:
/*
* ampere-av1-enablement Phase 2: RK3588 vpu981 dedicated
* AV1 hantro instance. 'a' kind dispatches to
* driver_data->video_fd_vpu981. On hosts without the AV1
* instance the fd stays -1 and RequestQueryConfigProfiles
* never enumerates AV1, so this branch is unreachable for
* non-RK3588 hosts.
*/
return 'a';
default: default:
return '?'; return '?';
} }
@@ -392,12 +445,77 @@ int request_switch_device_for_profile(struct request_data *driver_data,
char kind = request_device_kind_for_profile(profile); char kind = request_device_kind_for_profile(profile);
int target_video, target_media; int target_video, target_media;
/*
* iter40: HEVC override when rpi-hevc-dec is probed. The static
* table (request_device_kind_for_profile) maps HEVC 'r' (rkvdec)
* because that's the canonical RK path. On Pi 5 there's no rkvdec
* rpi-hevc-dec is the only decoder. When BOTH would be present
* (hypothetical mixed board), prefer rpi-hevc-dec for HEVC.
*
* Other rkvdec-routed profiles (VP9, H.264) stay on 'r' because
* rpi-hevc-dec is HEVC-only.
*/
if ((profile == VAProfileHEVCMain || profile == VAProfileHEVCMain10) &&
driver_data->video_fd_rpi_hevc_dec >= 0 &&
driver_data->media_fd_rpi_hevc_dec >= 0) {
kind = 'p';
}
#ifdef HAVE_DAEDALUS_V4L2
/*
* LIBVA-1: VP9/AV1/H.264 daedalus_v4l2 when the daemon-backed
* decoder fd is open. Pi 5 has no rkvdec (those profiles map to
* 'r' by default video_fd_rkvdec = -1 "stay on whatever's
* active" fallback would put H.264 frames on rpi-hevc-dec's fd
* and S_FMT would fail). Re-route to the daedalus daemon instead.
*
* HEVC stays on 'p' (rpi-hevc-dec is HEVC-only daedalus would
* accept it via FFmpeg, but rpi-hevc-dec has the GPU-backed
* hardware path so it's the right choice on this SoC).
*
* AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was probed.
* On a Pi 5 the vpu981 slot stays -1, so we still route AV1 to
* daedalus here. Check video_fd_vpu981 to preserve the RK3588
* priority for that case.
*/
if (driver_data->video_fd_daedalus >= 0 &&
driver_data->media_fd_daedalus >= 0) {
switch (profile) {
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileVP9Profile0:
kind = 'd';
break;
case VAProfileAV1Profile0:
if (driver_data->video_fd_vpu981 < 0)
kind = 'd';
break;
default:
break;
}
}
#endif
if (kind == 'r') { if (kind == 'r') {
target_video = driver_data->video_fd_rkvdec; target_video = driver_data->video_fd_rkvdec;
target_media = driver_data->media_fd_rkvdec; target_media = driver_data->media_fd_rkvdec;
} else if (kind == 'h') { } else if (kind == 'h') {
target_video = driver_data->video_fd_hantro; target_video = driver_data->video_fd_hantro;
target_media = driver_data->media_fd_hantro; target_media = driver_data->media_fd_hantro;
} else if (kind == 'p') {
target_video = driver_data->video_fd_rpi_hevc_dec;
target_media = driver_data->media_fd_rpi_hevc_dec;
} else if (kind == 'a') {
target_video = driver_data->video_fd_vpu981;
target_media = driver_data->media_fd_vpu981;
#ifdef HAVE_DAEDALUS_V4L2
} else if (kind == 'd') {
target_video = driver_data->video_fd_daedalus;
target_media = driver_data->media_fd_daedalus;
#endif
} else { } else {
return -1; return -1;
} }
@@ -585,6 +703,12 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
driver_data->media_fd_rkvdec = -1; driver_data->media_fd_rkvdec = -1;
driver_data->video_fd_hantro = -1; driver_data->video_fd_hantro = -1;
driver_data->media_fd_hantro = -1; driver_data->media_fd_hantro = -1;
driver_data->video_fd_rpi_hevc_dec = -1;
driver_data->media_fd_rpi_hevc_dec = -1;
driver_data->video_fd_daedalus = -1;
driver_data->media_fd_daedalus = -1;
driver_data->video_fd_vpu981 = -1;
driver_data->media_fd_vpu981 = -1;
/* /*
* iter38: probe BOTH rkvdec and hantro-vpu so a single libva session * iter38: probe BOTH rkvdec and hantro-vpu so a single libva session
@@ -615,6 +739,36 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
alt_driver = "rkvdec"; alt_driver = "rkvdec";
driver_data->video_fd_hantro = video_fd; driver_data->video_fd_hantro = video_fd;
driver_data->media_fd_hantro = media_fd; driver_data->media_fd_hantro = media_fd;
} else if (strcmp(info.driver, "rpi-hevc-dec") == 0) {
/* iter40 + LIBVA-1: Pi 5 / CM5. rpi-hevc-dec is
* HEVC-only. If daedalus_v4l2 is ALSO loaded (Pi 5
* mixed deployment out-of-tree daemon-backed
* decoder for VP9/AV1/H264), pick it up as the alt
* so VP9/AV1/H264 have somewhere to land. */
primary_driver = "rpi-hevc-dec";
#ifdef HAVE_DAEDALUS_V4L2
alt_driver = "daedalus_v4l2";
#else
alt_driver = NULL;
#endif
driver_data->video_fd_rpi_hevc_dec = video_fd;
driver_data->media_fd_rpi_hevc_dec = media_fd;
#ifdef HAVE_DAEDALUS_V4L2
} else if (strcmp(info.driver, "daedalus_v4l2") == 0) {
/* phase 8.10 + LIBVA-1: Pi 5 daemon-backed decoder.
* VP9 / AV1 / H.264 route through it via the 'd'
* kind below. On a mixed-driver box where
* rpi-hevc-dec is ALSO loaded, pick it up as the
* alt so HEVC has somewhere to land too find_
* codec_device's known_decoder_drivers[] order
* normally puts rpi-hevc-dec first (we hit the
* other branch in practice), but symmetric handling
* keeps us correct if probe order ever flips. */
primary_driver = "daedalus_v4l2";
alt_driver = "rpi-hevc-dec";
driver_data->video_fd_daedalus = video_fd;
driver_data->media_fd_daedalus = media_fd;
#endif
} }
} }
@@ -626,15 +780,38 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
int alt_v = open(alt_video, O_RDWR | O_NONBLOCK); int alt_v = open(alt_video, O_RDWR | O_NONBLOCK);
int alt_m = (alt_v >= 0) ? open(alt_media, O_RDWR | O_NONBLOCK) : -1; int alt_m = (alt_v >= 0) ? open(alt_media, O_RDWR | O_NONBLOCK) : -1;
if (alt_v >= 0 && alt_m >= 0) { if (alt_v >= 0 && alt_m >= 0) {
/* Dispatch into the matching per-driver slot.
* iter38 only had rkvdec/hantro pairs; iter40 +
* LIBVA-1 extended this to rpi-hevc-dec and
* daedalus_v4l2 for the Pi 5 mixed-decoder
* deployment. */
if (strcmp(alt_driver, "rkvdec") == 0) { if (strcmp(alt_driver, "rkvdec") == 0) {
driver_data->video_fd_rkvdec = alt_v; driver_data->video_fd_rkvdec = alt_v;
driver_data->media_fd_rkvdec = alt_m; driver_data->media_fd_rkvdec = alt_m;
} else { } else if (strcmp(alt_driver, "hantro-vpu") == 0) {
driver_data->video_fd_hantro = alt_v; driver_data->video_fd_hantro = alt_v;
driver_data->media_fd_hantro = alt_m; driver_data->media_fd_hantro = alt_m;
} else if (strcmp(alt_driver, "rpi-hevc-dec") == 0) {
driver_data->video_fd_rpi_hevc_dec = alt_v;
driver_data->media_fd_rpi_hevc_dec = alt_m;
#ifdef HAVE_DAEDALUS_V4L2
} else if (strcmp(alt_driver, "daedalus_v4l2") == 0) {
driver_data->video_fd_daedalus = alt_v;
driver_data->media_fd_daedalus = alt_m;
#endif
} else {
/* Shouldn't happen — primary_driver branches
* above only set alt_driver to one of the
* names handled here. Close and move on. */
close(alt_v);
close(alt_m);
alt_v = -1;
alt_m = -1;
} }
if (alt_v >= 0) {
request_log("iter38: also opened %s decoder at %s + %s\n", request_log("iter38: also opened %s decoder at %s + %s\n",
alt_driver, alt_video, alt_media); alt_driver, alt_video, alt_media);
}
} else { } else {
if (alt_v >= 0) close(alt_v); if (alt_v >= 0) close(alt_v);
if (alt_m >= 0) close(alt_m); if (alt_m >= 0) close(alt_m);
@@ -642,7 +819,94 @@ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context)
} }
} }
(void)primary_driver; (void)primary_driver;
/*
* ampere-av1-enablement Phase 2: walk hantro-vpu media nodes
* for a SECOND one that advertises V4L2_PIX_FMT_AV1_FRAME
* (AV1F) as OUTPUT pixfmt. RK3588 has 3 hantro-vpu instances
* (legacy MPEG2/VP8 decoder, vepu121 encoder, vpu981 AV1
* decoder) all reporting driver="hantro-vpu" / model="hantro-
* vpu" — so OUTPUT-format probe is the only reliable
* disambiguator that doesn't depend on parsing card-name
* strings (which are DTS-dependent). First match wins.
*
* On non-RK3588 hosts the slot stays -1; RequestQueryConfig
* Profiles' AV1 push then no-ops because any_fd_supports_
* output_format() returns false for AV1F.
*/
{
int i;
char path[32], av1_video[32];
for (i = 0; i < 16; i++) {
int mfd, vfd;
struct media_device_info info;
snprintf(path, sizeof path, "/dev/media%d", i);
mfd = open(path, O_RDWR | O_NONBLOCK);
if (mfd < 0) continue;
memset(&info, 0, sizeof info);
if (ioctl(mfd, MEDIA_IOC_DEVICE_INFO, &info) != 0 ||
strcmp(info.driver, "hantro-vpu") != 0) {
close(mfd);
continue;
} }
if (find_decoder_video_node_via_topology(
mfd, av1_video, sizeof av1_video) != 0) {
close(mfd);
continue;
}
vfd = open(av1_video, O_RDWR | O_NONBLOCK);
if (vfd < 0) {
close(mfd);
continue;
}
if (!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT, V4L2_PIX_FMT_AV1_FRAME) &&
!v4l2_find_format(vfd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, V4L2_PIX_FMT_AV1_FRAME)) {
close(vfd);
close(mfd);
continue;
}
driver_data->video_fd_vpu981 = vfd;
driver_data->media_fd_vpu981 = mfd;
request_log("ampere-av1: vpu981 AV1 decoder at %s + %s\n",
av1_video, path);
break;
}
}
}
/*
* iter2 (ampere-kernel-decoders): probe the new HEVC EXT_SPS_RPS
* controls on each rkvdec/hantro fd. Result is consumed by
* h265_set_controls per-codec gate. Per-fd storage matches the
* iter38 multi-device-probe pattern (Phase 5 review item).
*/
driver_data->has_hevc_ext_sps_rps_rkvdec =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rkvdec);
driver_data->has_hevc_ext_sps_rps_hantro =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_hantro);
driver_data->has_hevc_ext_sps_rps_rpi_hevc_dec =
probe_hevc_ext_sps_rps_controls(driver_data->video_fd_rpi_hevc_dec);
if (driver_data->has_hevc_ext_sps_rps_rkvdec) {
request_log("iter2: kernel registers HEVC EXT_SPS_{ST,LT}_RPS "
"controls on rkvdec fd (will route through "
"vendored GStreamer parser)\n");
}
if (driver_data->video_fd_rpi_hevc_dec >= 0) {
request_log("iter40: also opened rpi-hevc-dec at video_fd=%d "
"media_fd=%d (Pi 5 HEVC stateless)\n",
driver_data->video_fd_rpi_hevc_dec,
driver_data->media_fd_rpi_hevc_dec);
}
#ifdef HAVE_DAEDALUS_V4L2
if (driver_data->video_fd_daedalus >= 0) {
request_log("phase 8.10: opened daedalus_v4l2 at video_fd=%d "
"media_fd=%d (Pi 5 daemon-backed VP9/AV1/H264)\n",
driver_data->video_fd_daedalus,
driver_data->media_fd_daedalus);
}
#endif
status = VA_STATUS_SUCCESS; status = VA_STATUS_SUCCESS;
goto complete; goto complete;
@@ -690,6 +954,20 @@ VAStatus RequestTerminate(VADriverContextP context)
close(driver_data->video_fd_hantro); close(driver_data->video_fd_hantro);
if (driver_data->media_fd_hantro >= 0) if (driver_data->media_fd_hantro >= 0)
close(driver_data->media_fd_hantro); close(driver_data->media_fd_hantro);
if (driver_data->video_fd_rpi_hevc_dec >= 0)
close(driver_data->video_fd_rpi_hevc_dec);
if (driver_data->media_fd_rpi_hevc_dec >= 0)
close(driver_data->media_fd_rpi_hevc_dec);
if (driver_data->video_fd_vpu981 >= 0)
close(driver_data->video_fd_vpu981);
if (driver_data->media_fd_vpu981 >= 0)
close(driver_data->media_fd_vpu981);
#ifdef HAVE_DAEDALUS_V4L2
if (driver_data->video_fd_daedalus >= 0)
close(driver_data->video_fd_daedalus);
if (driver_data->media_fd_daedalus >= 0)
close(driver_data->media_fd_daedalus);
#endif
/* Fall back to direct close if neither alt fd captured the active /* Fall back to direct close if neither alt fd captured the active
* pair (env-override path). */ * pair (env-override path). */
if (driver_data->video_fd_rkvdec < 0 && driver_data->video_fd_hantro < 0) { if (driver_data->video_fd_rkvdec < 0 && driver_data->video_fd_hantro < 0) {
+138 -1
View File
@@ -38,9 +38,20 @@
#include <linux/videodev2.h> #include <linux/videodev2.h>
#include "hevc-ctrls/v4l2-hevc-ext-controls.h"
#define V4L2_REQUEST_STR_VENDOR "v4l2-request" #define V4L2_REQUEST_STR_VENDOR "v4l2-request"
#define V4L2_REQUEST_MAX_PROFILES 11 /*
* Sized for max-possible enumeration with iter39 Option B reverted:
* MPEG2(2) + H264(6 incl. Hi10P) + HEVC(2 incl. Main10) + VP8 + VP9 + AV1 = 13.
* The per-group guards use `if (... && index < (MAX_PROFILES - N))` where N
* is the push-group size, so MAX must be total+1 14 here. Bumping
* defensively now so a future re-enable of Hi10P/Main10 doesn't silently
* drop AV1 through the off-by-one trap that ate ampere-av1's enumeration
* for a week (see issue marfrit/libva-v4l2-request-fourier#2).
*/
#define V4L2_REQUEST_MAX_PROFILES 14
#define V4L2_REQUEST_MAX_ENTRYPOINTS 5 #define V4L2_REQUEST_MAX_ENTRYPOINTS 5
#define V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES 10 #define V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES 10
#define V4L2_REQUEST_MAX_IMAGE_FORMATS 10 #define V4L2_REQUEST_MAX_IMAGE_FORMATS 10
@@ -76,6 +87,121 @@ struct request_data {
int media_fd_rkvdec; int media_fd_rkvdec;
int video_fd_hantro; int video_fd_hantro;
int media_fd_hantro; int media_fd_hantro;
/*
* iter40: third multi-device-probe slot for rpi-hevc-dec (Pi 5 /
* CM5 / BCM2712). V4L2 stateless HEVC; CAPTURE is NC12/NC30 SAND
* 128-pixel-wide column tiled (Pi-specific). On Pi 5 this is the
* ONLY decoder slot; on RK hosts it stays -1 and HEVC routes to
* rkvdec as before.
*/
int video_fd_rpi_hevc_dec;
int media_fd_rpi_hevc_dec;
/*
* phase 8.10: fifth multi-device-probe slot for daedalus_v4l2 the
* out-of-tree V4L2 stateless decoder shim that forwards bitstream
* to a userspace daemon (daedalus-v4l2 sibling repo). Daemon does
* FFmpeg-software decode for VP9 / AV1 / H.264 and ships pixels
* back via dmabuf into the CAPTURE buffer. Picked up via the
* same media-controller probe + known_decoder_drivers[] entry
* pattern as iter40 rpi-hevc-dec. Stays -1 on hosts without the
* daedalus module loaded; HEVC routes to rpi-hevc-dec as before.
*
* Fields are unconditional (8 bytes per session) so the struct
* layout is stable regardless of meson option. The active
* probe + dispatch code in request.c is gated by
* HAVE_DAEDALUS_V4L2; when disabled the fields stay at their
* -1 init and no codepath touches them.
*/
int video_fd_daedalus;
int media_fd_daedalus;
/*
* ampere-av1-enablement Phase 2: fourth multi-device-probe slot
* for vpu981 (RK3588's dedicated AV1 hantro instance, kernel
* card="rockchip,rk3588-av1-vpu-dec", driver name "hantro-vpu"
* shared with the legacy MPEG-2/VP8/H.264 hantro). Discriminated
* by V4L2_PIX_FMT_AV1_FRAME (AV1F) OUTPUT-pixfmt capability since
* the driver name alone is ambiguous on RK3588. Stays -1 on hosts
* without the AV1 vpu-dec.
*
* Named "vpu981" for consistency with the in-progress av1-iter1
* operator branch (Phase 3-5 bit-exact AV1 work when that lands
* these fields receive the actual decode dispatch wiring).
*/
int video_fd_vpu981;
int media_fd_vpu981;
/*
* iter2 (ampere-kernel-decoders campaign) per-fd probe result
* for the V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS controls
* introduced in Linux 7.0 (Casanova VDPU381/VDPU383 series).
* RK3399 rkvdec doesn't have them and the probe returns false;
* RK3588 rkvdec (VDPU381/383) registers them and the probe is
* true. h265_set_controls consults only the rkvdec entry because
* HEVC routes through rkvdec only hantro's entry stays false
* naturally (it doesn't have rkvdec-specific controls).
*
* The pair-of-flags layout mirrors video_fd_rkvdec /
* video_fd_hantro above (iter38 multi-device-probe pattern,
* memory feedback_multi_device_probe_design). Phase 5 review
* surfaced this as a correctness item: a single scalar on
* driver_data would silently misbehave across device-switch
* boundaries; per-fd storage is the safe shape.
*/
bool has_hevc_ext_sps_rps_rkvdec;
bool has_hevc_ext_sps_rps_hantro;
/* iter40: rpi-hevc-dec doesn't expose EXT_SPS_*_RPS controls
* (verified Phase 0 higgs probe: QUERY_EXT_CTRL on 0xa97 EINVAL).
* Probed for consistency with the iter2 pair-of-flags pattern;
* stays false on Pi 5 and the iter2 vendored-parser path naturally
* doesn't engage. */
bool has_hevc_ext_sps_rps_rpi_hevc_dec;
/*
* iter2 cached SPS-derived RPS arrays. SPS NALs only appear in
* source_data on IDR frames; non-IDR frames' h265_set_controls
* reuse the cached arrays so we don't submit zero-filled RPS to
* the kernel (which would re-trigger the OOPS the iter2 fix is
* designed to prevent). Single-slot cache (sps_id 0 only)
* adequate for the BBB / typical-stream case; multi-SPS streams
* would need expanding to a [16] cache keyed by sps_id.
*
* The cache stores the post-mapped V4L2 control struct arrays
* (not the intermediate GstH265SPS) so request.h doesn't need
* to know about the vendored GStreamer parser types only the
* V4L2 UAPI structs from hevc-ctrls/v4l2-hevc-ext-controls.h
* included above.
*
* Owned by h265.c; freed at RequestTerminate.
*/
struct v4l2_ctrl_hevc_ext_sps_st_rps *hevc_rps_cache_st;
unsigned int hevc_rps_cache_st_count;
struct v4l2_ctrl_hevc_ext_sps_lt_rps *hevc_rps_cache_lt;
unsigned int hevc_rps_cache_lt_count;
bool hevc_rps_cache_valid;
/*
* iter40b: bitstream-derived SPS field cache for VAAPI-omitted
* fields. rpi-hevc-dec validates these against bitstream-true
* values; the rkvdec/hantro fallback (sps_max_dec_pic_buffering_minus1,
* 0) that satisfies §A.4.2 isn't enough for rpi.
*
* Cached on first IDR frame's SPS NAL parse, reused for subsequent
* non-IDR frames whose source_data may not carry an SPS.
*
* sps_max_sub_layers_minus1 is the index into max_*[] arrays. The
* V4L2 SPS struct fields are scalars (single sublayer), so we pick
* the HighestTid (= sps_max_sub_layers_minus1) slot matches
* ffmpeg-vaapi + kdirect convention.
*/
struct {
bool valid;
uint8_t sps_max_sub_layers_minus1;
uint8_t max_dec_pic_buffering_minus1;
uint8_t max_num_reorder_pics;
uint8_t max_latency_increase_plus1;
bool scaling_list_enabled;
bool scaling_list_data_present;
} hevc_sps_field_cache;
struct video_format *video_format; struct video_format *video_format;
@@ -133,6 +259,17 @@ struct request_data {
unsigned int fmt_buffers_count; unsigned int fmt_buffers_count;
unsigned int fmt_sizes[VIDEO_MAX_PLANES]; unsigned int fmt_sizes[VIDEO_MAX_PLANES];
unsigned int fmt_bytesperlines[VIDEO_MAX_PLANES]; unsigned int fmt_bytesperlines[VIDEO_MAX_PLANES];
/*
* iter39: active session is decoding a 10-bit profile (Hi10P / Main10).
* Set in RequestCreateContext from config->profile. Drives:
* - CAPTURE pix_fmt selection (NV15 instead of NV12)
* - image.c DeriveImage / QueryImageFormats fourcc reporting (P010
* instead of NV12)
* - copy_surface_to_image NV15P010 unpack branch
* Reset to false at DestroyContext.
*/
bool is_10bit;
}; };
VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context); VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
+11 -2
View File
@@ -182,7 +182,9 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
* surface_bind_format_uniform_fields(); the per-slot * surface_bind_format_uniform_fields(); the per-slot
* destination_* fields fill at BeginPicture via surface_bind_slot. * destination_* fields fill at BeginPicture via surface_bind_slot.
*/ */
if (format != VA_RT_FORMAT_YUV420) /* iter39: allow YUV420_10 for Hi10P / Main10 surface allocation. */
if (format != VA_RT_FORMAT_YUV420 &&
format != VA_RT_FORMAT_YUV420_10)
return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT; return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT;
for (i = 0; i < surfaces_count; i++) { for (i = 0; i < surfaces_count; i++) {
@@ -706,7 +708,14 @@ VAStatus RequestExportSurfaceHandle(VADriverContextP context,
planes_count = surface_object->destination_planes_count; planes_count = surface_object->destination_planes_count;
surface_descriptor->fourcc = VA_FOURCC_NV12; /* iter39: 10-bit session exports a DRM_FORMAT_NV15 buffer; advertise
* the matching fourcc so a PRIME consumer aware of NV15 (panfrost-
* Mesa et al.) can import correctly. PRIME consumers that only know
* NV12 / P010 should use the COPY (vaGetImage) path which unpacks
* NV15P010 in image.c::copy_surface_to_image. */
surface_descriptor->fourcc = driver_data->is_10bit
? VA_FOURCC('N', 'V', '1', '5')
: VA_FOURCC_NV12;
surface_descriptor->width = surface_object->width; surface_descriptor->width = surface_object->width;
surface_descriptor->height = surface_object->height; surface_descriptor->height = surface_object->height;
surface_descriptor->num_objects = export_fds_count; surface_descriptor->num_objects = export_fds_count;
+12
View File
@@ -122,6 +122,18 @@ struct object_surface {
VADecPictureParameterBufferVP9 picture; VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice; VASliceParameterBufferVP9 slice;
} vp9; } vp9;
struct {
/*
* AV1 picture parameter buffer. Slice params are
* intentionally absent the daedalus daemon track
* (issue #11) consumes the slice OBU bytes directly
* from the OUTPUT bitstream and synthesises only the
* sequence-header OBU from V4L2_CID_STATELESS_AV1_
* SEQUENCE. No per-tile-group structOBU re-synthesis
* required from libva today.
*/
VADecPictureParameterBufferAV1 picture;
} av1;
} params; } params;
int request_fd; int request_fd;
+26 -3
View File
@@ -476,12 +476,35 @@ int v4l2_set_controls(int video_fd, int request_fd,
struct v4l2_ext_control *control_array, struct v4l2_ext_control *control_array,
unsigned int num_controls) unsigned int num_controls)
{ {
struct v4l2_ext_controls controls;
int rc; int rc;
rc = v4l2_ioctl_controls(video_fd, request_fd, VIDIOC_S_EXT_CTRLS, memset(&controls, 0, sizeof(controls));
control_array, num_controls); controls.controls = control_array;
controls.count = num_controls;
if (request_fd >= 0) {
controls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
controls.request_fd = request_fd;
}
rc = ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &controls);
if (rc < 0) { if (rc < 0) {
request_log("Unable to set control(s): %s\n", strerror(errno)); /* error_idx is the index of the first failing control;
* if it equals count, the ioctl itself failed (not a
* specific control payload). Useful for triaging
* which V4L2_CID_STATELESS_* the kernel rejected. */
if (controls.error_idx < num_controls)
request_log("Unable to set control(s): %s "
"(error_idx=%u/%u failing_ctrl_id=0x%x size=%u)\n",
strerror(errno),
controls.error_idx, controls.count,
control_array[controls.error_idx].id,
control_array[controls.error_idx].size);
else
request_log("Unable to set control(s): %s "
"(error_idx=%u/%u ioctl-level)\n",
strerror(errno),
controls.error_idx, controls.count);
return -1; return -1;
} }
+34
View File
@@ -31,6 +31,8 @@
#include <drm_fourcc.h> #include <drm_fourcc.h>
#include <linux/videodev2.h> #include <linux/videodev2.h>
#include "nv12_col128.h" /* fallback V4L2_PIX_FMT_NV12_COL128 define */
#include "nv15.h" /* fallback V4L2_PIX_FMT_NV15 define */
#include "utils.h" #include "utils.h"
#include "video.h" #include "video.h"
@@ -45,6 +47,38 @@ static struct video_format formats[] = {
.planes_count = 2, .planes_count = 2,
.bpp = 16, .bpp = 16,
}, },
{
.description = "NV15 YUV (10-bit, rkvdec)",
.v4l2_format = V4L2_PIX_FMT_NV15,
.v4l2_buffers_count = 1,
.v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV15,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 24,
},
{
/*
* iter40: Pi 5 / CM5 rpi-hevc-dec CAPTURE format. 8-bit NV12
* stored as 128-pixel-wide column tiles (SAND128 layout).
* Pi-specific; not in mainline drm_fourcc.h (uses NV12 + a
* BROADCOM_SAND128 modifier for DRM_PRIME). Our consumer path
* always detiles to linear NV12 in copy_surface_to_image, so
* we don't expose the SAND modifier downstream drm_format is
* still DRM_FORMAT_NV12 and drm_modifier MOD_NONE so the
* format-is-linear gate doesn't pull us into tiled_to_planar
* (Sunxi-specific). image.c branches on v4l2_format ==
* V4L2_PIX_FMT_NV12_COL128 to invoke the dedicated detile.
*/
.description = "NV12 SAND128 (8-bit, rpi-hevc-dec)",
.v4l2_format = V4L2_PIX_FMT_NV12_COL128,
.v4l2_buffers_count = 1,
.v4l2_mplane = true,
.drm_format = DRM_FORMAT_NV12,
.drm_modifier = DRM_FORMAT_MOD_NONE,
.planes_count = 2,
.bpp = 16,
},
// Code to handle this DRM_FORMAT is __arm__ only // Code to handle this DRM_FORMAT is __arm__ only
#ifdef __arm__ #ifdef __arm__
{ {
+196
View File
@@ -0,0 +1,196 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* MIT-licensed per project. iter40 self-test for nv12_col128 detile.
*
* Build an NC12-tiled source buffer from a known linear NV12 image,
* run the detile primitive, assert output matches the original. No
* hardware needed pure bit-layout verification of the kernel math
* (drivers/media/platform/raspberrypi/hevc_dec/hevc_d_video.c
* V4L2_PIX_FMT_NV12_COL128 case + ffmpeg/Kynesim per-pixel offset).
*
* Build:
* cc -Wall -Werror -O2 -o test_nv12_col128_detile \
* tests/test_nv12_col128_detile.c src/nv12_col128.c
*
* Exit 0 = all asserts pass.
*/
#include "../src/nv12_col128.h"
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TILE_W 128
static unsigned int align_up(unsigned int v, unsigned int a)
{
return (v + a - 1) & ~(a - 1);
}
/* Pack a linear plane (width × height bytes, stride=width) into NC12
* layout: each 128-wide column held contiguously, columns at offsets
* col * col_stride * 128. col_stride is the kernel-reported bytesperline
* = ALIGN(height, 8) * 3/2. Returns the buffer + sizes. */
static uint8_t *pack_to_nc12(const uint8_t *linear,
unsigned int width, unsigned int height,
unsigned int *out_col_stride,
size_t *out_size)
{
unsigned int aligned_w = align_up(width, TILE_W);
unsigned int aligned_h = align_up(height, 8);
unsigned int col_stride = aligned_h * 3 / 2;
unsigned int num_cols = aligned_w / TILE_W;
size_t total = (size_t)col_stride * aligned_w;
uint8_t *buf;
unsigned int col, y, in_col;
buf = calloc(1, total);
assert(buf != NULL);
for (col = 0; col < num_cols; col++) {
uint8_t *col_base = buf + (size_t)col * TILE_W * col_stride;
for (y = 0; y < height; y++) {
for (in_col = 0; in_col < TILE_W; in_col++) {
unsigned int x = col * TILE_W + in_col;
if (x >= width)
break;
col_base[(size_t)y * TILE_W + in_col] =
linear[(size_t)y * width + x];
}
}
}
*out_col_stride = col_stride;
*out_size = total;
return buf;
}
static void test_detile_y(unsigned int width, unsigned int height)
{
uint8_t *linear, *tiled, *recovered;
unsigned int col_stride;
size_t tile_size, i;
linear = malloc((size_t)width * height);
assert(linear != NULL);
/* Distinctive content per pixel: y * 17 + x * 13 — avoids byte-
* aliasing patterns that could mask off-by-one bugs. */
for (unsigned int y = 0; y < height; y++)
for (unsigned int x = 0; x < width; x++)
linear[(size_t)y * width + x] = (uint8_t)(y * 17 + x * 13);
tiled = pack_to_nc12(linear, width, height, &col_stride, &tile_size);
recovered = calloc(1, (size_t)width * height);
assert(recovered != NULL);
nv12_col128_detile_y(recovered, width, tiled, col_stride, width, height);
for (i = 0; i < (size_t)width * height; i++) {
if (recovered[i] != linear[i]) {
fprintf(stderr,
"FAIL %ux%u Y: pixel %zu (x=%zu y=%zu) "
"linear=0x%02x recovered=0x%02x\n",
width, height, i,
i % width, i / width,
linear[i], recovered[i]);
free(linear); free(tiled); free(recovered);
exit(1);
}
}
printf("PASS %ux%u Y plane (%u columns, col_stride=%u, tile_size=%zu)\n",
width, height, align_up(width, TILE_W) / TILE_W,
col_stride, tile_size);
free(linear);
free(tiled);
free(recovered);
}
static void test_detile_uv(unsigned int width, unsigned int height)
{
unsigned int uv_h = height / 2;
uint8_t *linear, *tiled, *recovered;
unsigned int col_stride;
size_t tile_size, i;
linear = malloc((size_t)width * uv_h);
assert(linear != NULL);
for (unsigned int y = 0; y < uv_h; y++)
for (unsigned int x = 0; x < width; x++)
linear[(size_t)y * width + x] = (uint8_t)(y * 23 + x * 7);
tiled = pack_to_nc12(linear, width, uv_h, &col_stride, &tile_size);
recovered = calloc(1, (size_t)width * uv_h);
assert(recovered != NULL);
nv12_col128_detile_uv(recovered, width, tiled, col_stride, width, uv_h);
for (i = 0; i < (size_t)width * uv_h; i++) {
if (recovered[i] != linear[i]) {
fprintf(stderr,
"FAIL %ux%u UV: pixel %zu linear=0x%02x recovered=0x%02x\n",
width, height, i,
linear[i], recovered[i]);
free(linear); free(tiled); free(recovered);
exit(1);
}
}
printf("PASS %ux%u UV plane\n", width, height);
free(linear);
free(tiled);
free(recovered);
}
static void test_uv_offset(void)
{
/* Per the SAND COL128 layout, Y and UV are interleaved within
* EACH column (not concatenated as separate planes), so the UV
* plane base pointer is offset by 128 * ALIGN(height, 8) the
* Y portion of column 0. NOT 128 * height * num_columns (the
* size of all Y across all columns), which was an earlier wrong
* formula caught by Phase 7 SEGV on higgs. */
unsigned int off = nv12_col128_uv_plane_offset(1280, 720);
if (off != 128u * 720) {
fprintf(stderr, "FAIL UV offset 1280×720: got %u expected %u\n",
off, 128u * 720);
exit(1);
}
printf("PASS UV offset 1280×720 = %u\n", off);
off = nv12_col128_uv_plane_offset(1366, 768);
if (off != 128u * 768) {
fprintf(stderr, "FAIL UV offset 1366×768: got %u expected %u\n",
off, 128u * 768);
exit(1);
}
printf("PASS UV offset 1366×768 (column-misaligned width)\n");
}
int main(void)
{
/* Phase 3 fixture sizes — all 128-aligned, 8-line-aligned. */
test_detile_y(640, 360);
test_detile_y(1280, 720);
test_detile_y(1920, 1080);
/* Phase 5 review F4: column-misaligned width (1366 → 1408 padding). */
test_detile_y(1366, 768);
/* UV plane (half-height) at each width. */
test_detile_uv(640, 360);
test_detile_uv(1280, 720);
test_detile_uv(1920, 1080);
test_detile_uv(1366, 768);
test_uv_offset();
printf("All NC12 detile asserts pass.\n");
return 0;
}
+224
View File
@@ -0,0 +1,224 @@
/*
* Copyright (C) 2026 claude-noether <claude-noether@reauktion.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
/*
* iter39 self-test for nv15_unpack_plane_to_p010.
*
* Builds NV15 plane buffers from known 10-bit pixel arrays, runs the
* unpack, asserts P010 output matches the expected pixel<<6 values.
* No hardware needed pure bit layout verification per
* Documentation/userspace-api/media/v4l/pixfmt-nv15.rst.
*
* Build:
* cc -Wall -Werror -O2 -o test_nv15_unpack tests/test_nv15_unpack.c src/nv15.c
*
* Exit 0 = all asserts pass.
*/
#include "../src/nv15.h"
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Pack 4 10-bit pixels into 5 bytes per NV15 layout (LSB-first across
* bits 0..39). Inverse of nv15_unpack_plane_to_p010's per-group unpack. */
static void pack4(uint16_t a, uint16_t b, uint16_t c, uint16_t d,
uint8_t out[5])
{
out[0] = (uint8_t)(a & 0xFF);
out[1] = (uint8_t)(((a >> 8) & 0x03) | ((b & 0x3F) << 2));
out[2] = (uint8_t)(((b >> 6) & 0x0F) | ((c & 0x0F) << 4));
out[3] = (uint8_t)(((c >> 4) & 0x3F) | ((d & 0x03) << 6));
out[4] = (uint8_t)((d >> 2) & 0xFF);
}
#define ASSERT_EQ(actual, expected, msg) do { \
if ((actual) != (expected)) { \
fprintf(stderr, "FAIL %s: actual=0x%04x expected=0x%04x at %s:%d\n", \
(msg), (unsigned)(actual), (unsigned)(expected), \
__FILE__, __LINE__); \
exit(1); \
} \
} while (0)
static void test_pack_unpack_roundtrip(uint16_t a, uint16_t b, uint16_t c,
uint16_t d)
{
uint8_t packed[5];
uint16_t dst[4];
pack4(a, b, c, d, packed);
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
ASSERT_EQ(dst[0], (uint16_t)(a << 6), "roundtrip a");
ASSERT_EQ(dst[1], (uint16_t)(b << 6), "roundtrip b");
ASSERT_EQ(dst[2], (uint16_t)(c << 6), "roundtrip c");
ASSERT_EQ(dst[3], (uint16_t)(d << 6), "roundtrip d");
}
static void test_zero(void)
{
uint8_t packed[5] = { 0, 0, 0, 0, 0 };
uint16_t dst[4] = { 0xDEAD, 0xDEAD, 0xDEAD, 0xDEAD };
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
ASSERT_EQ(dst[0], 0, "zero[0]");
ASSERT_EQ(dst[1], 0, "zero[1]");
ASSERT_EQ(dst[2], 0, "zero[2]");
ASSERT_EQ(dst[3], 0, "zero[3]");
}
static void test_all_max(void)
{
/* All four pixels = 0x3FF (max 10-bit). Packed bits all 1 → all 0xFF. */
uint8_t packed[5] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
uint16_t dst[4] = { 0, 0, 0, 0 };
nv15_unpack_plane_to_p010(packed, dst, 4, 1, 5);
ASSERT_EQ(dst[0], 0xFFC0, "max[0]");
ASSERT_EQ(dst[1], 0xFFC0, "max[1]");
ASSERT_EQ(dst[2], 0xFFC0, "max[2]");
ASSERT_EQ(dst[3], 0xFFC0, "max[3]");
}
static void test_known_vectors(void)
{
/* Position-sensitive sanity: each pixel = its index+1. */
test_pack_unpack_roundtrip(1, 2, 3, 4);
/* Spread patterns that exercise every byte-boundary bit. */
test_pack_unpack_roundtrip(0x3FF, 0x000, 0x3FF, 0x000);
test_pack_unpack_roundtrip(0x000, 0x3FF, 0x000, 0x3FF);
test_pack_unpack_roundtrip(0x155, 0x2AA, 0x155, 0x2AA);
test_pack_unpack_roundtrip(0x001, 0x002, 0x004, 0x008);
test_pack_unpack_roundtrip(0x080, 0x040, 0x020, 0x010);
test_pack_unpack_roundtrip(0x200, 0x100, 0x080, 0x040);
test_pack_unpack_roundtrip(0x3F0, 0x0F3, 0x33C, 0x2A5);
}
static void test_remainder_width(void)
{
/* width=1: only A unpacked, B/C/D undefined */
{
uint8_t packed[5];
uint16_t dst[1] = { 0xDEAD };
pack4(0x123, 0x000, 0x000, 0x000, packed);
nv15_unpack_plane_to_p010(packed, dst, 1, 1, 5);
ASSERT_EQ(dst[0], 0x123 << 6, "rem1[0]");
}
/* width=2 */
{
uint8_t packed[5];
uint16_t dst[2] = { 0, 0 };
pack4(0x111, 0x222, 0x000, 0x000, packed);
nv15_unpack_plane_to_p010(packed, dst, 2, 1, 5);
ASSERT_EQ(dst[0], 0x111 << 6, "rem2[0]");
ASSERT_EQ(dst[1], 0x222 << 6, "rem2[1]");
}
/* width=3 */
{
uint8_t packed[5];
uint16_t dst[3] = { 0, 0, 0 };
pack4(0x111, 0x222, 0x333, 0x000, packed);
nv15_unpack_plane_to_p010(packed, dst, 3, 1, 5);
ASSERT_EQ(dst[0], 0x111 << 6, "rem3[0]");
ASSERT_EQ(dst[1], 0x222 << 6, "rem3[1]");
ASSERT_EQ(dst[2], 0x333 << 6, "rem3[2]");
}
/* width=7: one full group + 3 remainder */
{
uint8_t packed[10];
uint16_t dst[7] = { 0 };
pack4(0x100, 0x200, 0x300, 0x010, &packed[0]);
pack4(0x011, 0x022, 0x033, 0x000, &packed[5]);
nv15_unpack_plane_to_p010(packed, dst, 7, 1, 10);
ASSERT_EQ(dst[0], 0x100 << 6, "rem7[0]");
ASSERT_EQ(dst[1], 0x200 << 6, "rem7[1]");
ASSERT_EQ(dst[2], 0x300 << 6, "rem7[2]");
ASSERT_EQ(dst[3], 0x010 << 6, "rem7[3]");
ASSERT_EQ(dst[4], 0x011 << 6, "rem7[4]");
ASSERT_EQ(dst[5], 0x022 << 6, "rem7[5]");
ASSERT_EQ(dst[6], 0x033 << 6, "rem7[6]");
}
/* width=8: two full groups */
{
uint8_t packed[10];
uint16_t dst[8] = { 0 };
pack4(0x101, 0x202, 0x303, 0x101, &packed[0]);
pack4(0x202, 0x303, 0x101, 0x202, &packed[5]);
nv15_unpack_plane_to_p010(packed, dst, 8, 1, 10);
ASSERT_EQ(dst[7], 0x202 << 6, "w8[7]");
}
}
static void test_multi_row_stride_padding(void)
{
/* 4-pixel-wide, 3-row plane; stride = 8 bytes (3 bytes padding). */
uint8_t packed[24]; /* 3 rows × 8 bytes */
uint16_t dst[12]; /* 3 rows × 4 pixels */
memset(packed, 0xCC, sizeof(packed)); /* padding poison */
pack4(0x111, 0x222, 0x333, 0x044, &packed[0 * 8]);
pack4(0x055, 0x166, 0x177, 0x188, &packed[1 * 8]);
pack4(0x099, 0x1AA, 0x2BB, 0x3CC, &packed[2 * 8]);
memset(dst, 0xAB, sizeof(dst));
nv15_unpack_plane_to_p010(packed, dst, 4, 3, 8);
ASSERT_EQ(dst[0], 0x111 << 6, "row0[0]");
ASSERT_EQ(dst[3], 0x044 << 6, "row0[3]");
ASSERT_EQ(dst[4], 0x055 << 6, "row1[0]");
ASSERT_EQ(dst[7], 0x188 << 6, "row1[3]");
ASSERT_EQ(dst[8], 0x099 << 6, "row2[0]");
ASSERT_EQ(dst[11], 0x3CC << 6, "row2[3]");
}
static void test_chroma_half_height(void)
{
/* 4-pixel-wide × 2-row chroma (matches 4×4 luma quadrant).
* NV15 chroma uses same packing as luma, just half-height. */
uint8_t packed[10]; /* 2 rows × 5 bytes */
uint16_t dst[8]; /* 2 rows × 4 pixels (UV pairs in interleaved form) */
pack4(0x080, 0x180, 0x280, 0x380, &packed[0]);
pack4(0x040, 0x140, 0x240, 0x340, &packed[5]);
nv15_unpack_plane_to_p010(packed, dst, 4, 2, 5);
ASSERT_EQ(dst[0], 0x080 << 6, "chroma row0[0]");
ASSERT_EQ(dst[3], 0x380 << 6, "chroma row0[3]");
ASSERT_EQ(dst[4], 0x040 << 6, "chroma row1[0]");
ASSERT_EQ(dst[7], 0x340 << 6, "chroma row1[3]");
}
int main(void)
{
test_zero();
test_all_max();
test_known_vectors();
test_remainder_width();
test_multi_row_stride_padding();
test_chroma_half_height();
printf("test_nv15_unpack: all PASS\n");
return 0;
}