claude-noether 794bac72df Phase 1 plan: libva-v4l2-request-fourier AV1 dispatch
Goal: VAAPI consumers (mpv, VLC, GStreamer-VAAPI, browsers) can decode
AV1 via the libva backend, on the same HW path that ffmpeg-v4l2request
kdirect already decodes bit-perfectly (Phase 0).

Plan ~800 LoC across 7 files (new av1.c ~700 LoC, av1.h, plus edits to
codec.c, config.c, picture.c, surface.h, Makefile.am).

Canonical reference: Kwiboo/FFmpeg v4l2-request-n8.1
libavcodec/v4l2_request_av1.c (636 LoC) — exact field mappings for
v4l2_ctrl_av1_sequence/_frame/_film_grain/_tile_group_entry.

Architectural pattern: existing vp9.c (700+ LoC) in the backend.

6 open architectural questions for Janet's review before Phase 2 code:
Q1 4-control batching (vs vp9's 2)
Q2 film_grain conditional vs unconditional submit
Q3 SEQUENCE caching strategy
Q4 VAOpaqueAV1 opaque payload semantics
Q5 vpu981 vs rkvdec device selection in cap_pool
Q6 multi-device probe extension (iter38b pattern + vpu981 for AV1)

Phase 2 starts after Janet sign-off.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:47:28 +00:00

Phase 1 plan — libva-v4l2-request-fourier AV1 dispatch

Date: 2026-05-17 09:15. Phase 0 verified that mainline AV1 decodes bit-perfectly via ffmpeg's kdirect (v4l2request) path. Phase 1 plans the libva backend's AV1 dispatch so that VAAPI consumers can use the same HW path.

Goal

Add ~/src/libva-v4l2-request-fourier/src/av1.{c,h}, which dispatch VAAPI AV1 picture/slice buffers onto the V4L2 V4L2_CID_STATELESS_AV1_* controls, matching the working ffmpeg-v4l2request behaviour. End-state criterion: AV1 decoded through VAAPI/libva is bit-identical to the verified kdirect output (/tmp/hw-av1*.nv12 from Phase 0).

Reference implementations

| Source | Where | Use |
|---|---|---|
| Kwiboo FFmpeg v4l2_request_av1.c | Kwiboo/FFmpeg:v4l2-request-n8.1:libavcodec/v4l2_request_av1.c (636 LoC) | Canonical filler: exact field semantics for v4l2_ctrl_av1_*. It reads from FFmpeg's AV1RawSequenceHeader/AV1DecContext, but the v4l2 output structs it fills are exactly the ones we need. |
| libva backend vp9.c | ampere:~/src/libva-v4l2-request-fourier/src/vp9.c (700+ LoC) | Architectural pattern: set_controls, store_buffer dispatch, context wiring |
| V4L2 AV1 uAPI | /usr/include/linux/v4l2-controls.h:2895..3517 | Source of truth for struct layouts |
| VAAPI AV1 structures | <va/va_dec_av1.h> (libva-dev): VADecPictureParameterBufferAV1, VASliceParameterBufferAV1, VAOpaque payload (see Q4) | App→backend interface |

V4L2 AV1 controls (up to 4 controls per frame)

| Control | Struct | Size | Notes |
|---|---|---|---|
| V4L2_CID_STATELESS_AV1_SEQUENCE | v4l2_ctrl_av1_sequence | tiny | Sequence header; set once per stream (see Q3) |
| V4L2_CID_STATELESS_AV1_FRAME | v4l2_ctrl_av1_frame | 7+ sub-structs | Per-frame header; bulk of the work |
| V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY | v4l2_ctrl_av1_tile_group_entry[] | dynamic array | One entry per tile |
| V4L2_CID_STATELESS_AV1_FILM_GRAIN | v4l2_ctrl_av1_film_grain | medium | Per-frame film grain (optional) |
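A minimal sketch of the TILE_GROUP_ENTRY fill. It assumes one VASliceParameterBufferAV1 per tile, and that the backend copies each frame's slice data into the V4L2 OUTPUT buffer starting at a known bitstream_base offset, with tile_offset taken relative to the start of that buffer. Verify that convention against Kwiboo's filler. The sketch_* structs are simplified stand-ins that mirror only the fields used (VA: slice_data_size, slice_data_offset, tile_row, tile_column; V4L2: tile_offset, tile_size, tile_row, tile_col). Real code would use the types from <va/va_dec_av1.h> and <linux/v4l2-controls.h>. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for VASliceParameterBufferAV1: only the fields used here. */
struct sketch_va_slice_av1 {
    uint32_t slice_data_size;   /* bytes of this tile's data */
    uint32_t slice_data_offset; /* offset inside the VA slice data buffer */
    uint16_t tile_row;
    uint16_t tile_column;
};

/* Stand-in mirroring struct v4l2_ctrl_av1_tile_group_entry. */
struct sketch_tile_group_entry {
    uint32_t tile_offset;
    uint32_t tile_size;
    uint32_t tile_row;
    uint32_t tile_col;
};

/*
 * Hypothetical helper: builds one entry per tile. bitstream_base is the
 * (assumed) offset at which this frame's slice data was copied into the
 * V4L2 OUTPUT buffer. Returns a calloc'd array (caller frees it once the
 * request has completed) or NULL on allocation failure.
 */
static struct sketch_tile_group_entry *
sketch_fill_tile_group_entries(const struct sketch_va_slice_av1 *slices,
                               size_t n_tiles, uint32_t bitstream_base)
{
    struct sketch_tile_group_entry *entries;

    assert(n_tiles > 0);
    entries = calloc(n_tiles, sizeof(*entries));
    if (!entries)
        return NULL;

    for (size_t i = 0; i < n_tiles; i++) {
        entries[i].tile_offset = bitstream_base + slices[i].slice_data_offset;
        entries[i].tile_size = slices[i].slice_data_size;
        entries[i].tile_row = slices[i].tile_row;
        entries[i].tile_col = slices[i].tile_column;
    }
    return entries;
}
```

Because TILE_GROUP_ENTRY is a dynamic-array control, the element count travels in the control's byte size: n_tiles * sizeof(struct v4l2_ctrl_av1_tile_group_entry). There is no separate length field.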

v4l2_ctrl_av1_frame (the heavy one) aggregates: v4l2_av1_tile_info, v4l2_av1_quantization, v4l2_av1_segmentation, v4l2_av1_loop_filter, v4l2_av1_cdef, v4l2_av1_loop_restoration, v4l2_av1_global_motion. Each maps to AV1-spec fields directly.

Backend integration plan

| Step | File | Action |
|---|---|---|
| 1 | codec.c | Add case VAProfileAV1Profile0: return V4L2_PIX_FMT_AV1_FRAME; |
| 2 | config.c | Add an any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_AV1_FRAME) check; advertise VAProfileAV1Profile0 only if it passes |
| 3 | surface.h | Add av1 substruct holding the parsed payload: picture (VADecPictureParameterBufferAV1) + slice parameters (VASliceParameterBufferAV1, one per tile, so an array) |
| 4 | picture.c::codec_store_buffer | Add VAAPI buffer-type cases: VAPictureParameterBufferType + VASliceParameterBufferType for VAProfileAV1Profile0 |
| 5 | picture.c::render_picture | Call av1_set_controls() for VAProfileAV1Profile0 |
| 6 | NEW av1.h | Public av1_set_controls() signature (matches vp9.h) |
| 7 | NEW av1.c | ~700 lines: fill_sequence, fill_frame, fill_film_grain, fill_tile_group_entry, set_controls dispatch |
| 8 | Makefile.am | Add av1.c av1.h to source list |
| 9 | Build | autoreconf -fi && ./configure && make on ampere |
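Steps 1 and 2 amount to a profile-to-fourcc mapping plus a per-device capability check. This is a minimal sketch. It assumes the backend already holds a list of OUTPUT formats for each probed device, for example gathered with VIDIOC_ENUM_FMT. The sketch_* enum and functions are hypothetical stand-ins for the real codec.c/config.c code and the VAProfile values. The fourcc packing copies the kernel's v4l2_fourcc() macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same byte packing as the kernel's v4l2_fourcc() macro. */
#define SKETCH_FOURCC(a, b, c, d)                                   \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | \
     ((uint32_t)(d) << 24))

/* <linux/videodev2.h> defines these as v4l2_fourcc('A','V','1','F')
 * and v4l2_fourcc('V','P','9','F'). */
#define SKETCH_PIX_FMT_AV1_FRAME SKETCH_FOURCC('A', 'V', '1', 'F')
#define SKETCH_PIX_FMT_VP9_FRAME SKETCH_FOURCC('V', 'P', '9', 'F')

/* Stand-in for the VAProfile values involved. */
enum sketch_profile {
    SKETCH_PROFILE_VP9_0,
    SKETCH_PROFILE_AV1_0,
    SKETCH_PROFILE_OTHER,
};

/* Step 1 (codec.c): VA profile -> V4L2 OUTPUT pixel format, 0 if unknown. */
static uint32_t sketch_profile_to_pixfmt(enum sketch_profile profile)
{
    switch (profile) {
    case SKETCH_PROFILE_VP9_0:
        return SKETCH_PIX_FMT_VP9_FRAME;
    case SKETCH_PROFILE_AV1_0:
        return SKETCH_PIX_FMT_AV1_FRAME;
    default:
        return 0;
    }
}

/* Step 2 (config.c): advertise the profile only if some probed device lists
 * its pixel format among its OUTPUT formats. formats[i] and counts[i]
 * describe device i. */
static bool sketch_profile_supported(enum sketch_profile profile,
                                     const uint32_t *const *formats,
                                     const size_t *counts, size_t n_devices)
{
    uint32_t wanted = sketch_profile_to_pixfmt(profile);

    assert(n_devices == 0 || (formats != NULL && counts != NULL));
    if (wanted == 0)
        return false;
    for (size_t i = 0; i < n_devices; i++)
        for (size_t j = 0; j < counts[i]; j++)
            if (formats[i][j] == wanted)
                return true;
    return false;
}
```

Checking per device rather than per backend is what lets AV1 be advertised when only one node (vpu981 here) exposes V4L2_PIX_FMT_AV1_FRAME. That ties into Q5 and Q6.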

Phase 2 file breakdown (estimated)

| File | Lines added |
|---|---|
| av1.c | ~700 (mirrors vp9.c structure, plus AV1's larger control surface) |
| av1.h | ~30 |
| codec.c | +5 |
| config.c | +10 |
| picture.c | +30 |
| surface.h | +10 |
| Makefile.am | +2 |

Total: ~800 LoC. Realistic effort: 1-2 focused days.

Field-mapping notes (key concerns from Kwiboo reference)

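  1. v4l2_ctrl_av1_sequence.flags: many V4L2_AV1_SEQUENCE_FLAG_* bits, mapped from VAAPI's seq_info_fields.fields.*. Mostly a direct correspondence: Kwiboo's fill_sequence is 75 lines, all of the form if (seq->X) flags |= V4L2_AV1_SEQUENCE_FLAG_X;. Check every bit, though. Some V4L2 sequence flags may have no seq_info_fields counterpart, and those would have to be derived from frame-level fields.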

  2. v4l2_ctrl_av1_frame — 257 LoC in Kwiboo. Contains per-frame: tile_info, quantization, segmentation, loop_filter, cdef, loop_restoration, global_motion. Each is a separate v4l2_av1_* sub-struct that needs filling.

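  3. Reference frame mapping: VAAPI provides ref_frame_map[8] (the VASurfaceIDs in the 8 reference slots) and ref_frame_idx[7]. V4L2 expects three arrays:
     - frame.reference_frame_ts[8]: V4L2 timestamps, one per slot.
     - frame.ref_frame_idx[7].
     - frame.order_hints[8]: indexed by reference type, with order_hints[0] (the intra slot) ignored.

     Translation needed: surface ID → timestamp for each slot, plus per-reference order hints. The VA picture parameters do not appear to carry those order hints, so the backend would have to track them per surface.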

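  4. Film grain: VAAPI passes VAFilmGrainStructAV1 (film_grain_info in the picture parameters); V4L2 has v4l2_ctrl_av1_film_grain. This is mostly a direct field copy, with some value translations. For example, VAAPI stores the AR coefficients as signed ar_coeffs_*, while V4L2 uses the spec's unsigned ar_coeffs_*_plus_128 form.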

  5. Tile group entries: VAAPI passes VASliceParameterBufferAV1 (one per slice = one per tile). Each becomes one v4l2_ctrl_av1_tile_group_entry. Array allocated per-frame (variable size).

  6. VAOpaqueAV1 payload: the plan assumes VAAPI also has an opaque container (VAOpaqueAV1) carrying things VAAPI's spec doesn't enumerate. First confirm that <va/va_dec_av1.h> actually defines such a type. Then check whether FFmpeg-vaapi uses it for AV1 (probably not, for kdirect equivalence).
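A minimal sketch of the reference-frame translation from item 3. It assumes the backend records, for each VASurfaceID it decodes, the V4L2 buffer timestamp used for that frame and the frame's order_hint. As far as <va/va_dec_av1.h> goes, the picture parameters carry only the current frame's order_hint, so reference order hints likely have to be tracked on the backend side. Here surfaces are looked up by indexing a plain array; the real backend would go through its object heap. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_REF_FRAMES 8 /* reference slots (AV1 NUM_REF_FRAMES) */
#define SKETCH_REFS_PER_FRAME 7 /* LAST_FRAME .. ALTREF_FRAME */

/* Hypothetical per-surface bookkeeping recorded when a frame is decoded. */
struct sketch_surface {
    uint64_t v4l2_ts;   /* V4L2 buffer timestamp (ns) used for this frame */
    uint8_t order_hint; /* that frame's order_hint */
};

/* Stand-in for the reference-related members of v4l2_ctrl_av1_frame. */
struct sketch_av1_frame_refs {
    uint64_t reference_frame_ts[SKETCH_NUM_REF_FRAMES]; /* per slot */
    int8_t ref_frame_idx[SKETCH_REFS_PER_FRAME];        /* slot per ref type */
    uint8_t order_hints[SKETCH_NUM_REF_FRAMES];         /* [0] = INTRA, ignored */
};

/* Surface lookup by array index; out-of-range ids (including
 * VA_INVALID_SURFACE, 0xffffffff) yield NULL. */
static const struct sketch_surface *
sketch_lookup(const struct sketch_surface *surfaces, uint32_t n, uint32_t id)
{
    return id < n ? &surfaces[id] : NULL;
}

static void sketch_fill_refs(struct sketch_av1_frame_refs *out,
                             const uint32_t ref_frame_map[SKETCH_NUM_REF_FRAMES],
                             const uint8_t ref_frame_idx[SKETCH_REFS_PER_FRAME],
                             const struct sketch_surface *surfaces,
                             uint32_t n_surfaces)
{
    /* Slot-indexed: timestamp of whichever surface occupies each slot. */
    for (int slot = 0; slot < SKETCH_NUM_REF_FRAMES; slot++) {
        const struct sketch_surface *s =
            sketch_lookup(surfaces, n_surfaces, ref_frame_map[slot]);
        out->reference_frame_ts[slot] = s ? s->v4l2_ts : 0;
    }

    /* Reference-type-indexed: entry i describes LAST_FRAME + i. */
    out->order_hints[0] = 0;
    for (int i = 0; i < SKETCH_REFS_PER_FRAME; i++) {
        uint8_t slot = ref_frame_idx[i];
        const struct sketch_surface *s;

        assert(slot < SKETCH_NUM_REF_FRAMES);
        s = sketch_lookup(surfaces, n_surfaces, ref_frame_map[slot]);
        out->ref_frame_idx[i] = (int8_t)slot;
        out->order_hints[1 + i] = s ? s->order_hint : 0;
    }
}
```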

Open architectural questions for Janet review (before Phase 2 code)

| # | Question |
|---|---|
| Q1 | Should the libva backend follow the v4l2-request model of one request fd per frame and attach all 4 controls to that request atomically in a single VIDIOC_S_EXT_CTRLS call? vp9.c uses 2 controls per frame (FRAME + COMPRESSED_HDR); AV1 has up to 4 (SEQUENCE + FRAME + TILE_GROUP_ENTRY[] + FILM_GRAIN). TILE_GROUP_ENTRY is a DYNAMIC_ARRAY control whose size changes with the per-frame tile count. Does that need special handling? |
| Q2 | Film grain is optional: V4L2_CID_STATELESS_AV1_FILM_GRAIN may not need to be set when film_grain_params_present == 0. Should we always send it, or send it conditionally? Kwiboo (line 593) tracks has_film_grain per context. |
| Q3 | SEQUENCE caching: should we set V4L2_CID_STATELESS_AV1_SEQUENCE once and reuse it, or set it every frame? Per V4L2 docs, once per stream is sufficient; per frame is wasteful but harmless. |
| Q4 | Does ffmpeg-vaapi populate VAAPI's VAOpaqueAV1 opaque payload for AV1? If yes, we may need to parse OBU-level data; if no, ignore it. A strace of ffmpeg -hwaccel vaapi -c:v av1 will not show this yet: while the backend lacks AV1, ffmpeg cannot create an AV1 decode context, so no AV1 buffers are ever submitted. Reading FFmpeg's libavcodec/vaapi_av1.c, which shows exactly which buffer types it submits, is the more direct check. |
| Q5 | AV1 is decoded by vpu981 (Hantro AV1), which on ampere is /dev/video4, not by rkvdec. Verify that the backend's cap_pool and request_data target the vpu981 device rather than rkvdec when AV1 is selected. Don't rely on the /dev/videoN number, which can change with probe order. |
| Q6 | The sibling campaign's iter38b multi-device probe (5/5 codecs in one libva session) is in scope. Must AV1 work alongside HEVC/H264/VP9/VP8/MPEG2 in the same session? Yes: extend the multi-device probe to include vpu981 for AV1. |
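This shows one possible shape for Q1 to Q3, not a decision. Every control for a frame goes into one array, so the whole set can be attached to the frame's request in a single VIDIOC_S_EXT_CTRLS call. The caller would point a struct v4l2_ext_controls at the array with count = n, which = V4L2_CTRL_WHICH_REQUEST_VAL, and request_fd set to the frame's request. SEQUENCE and FILM_GRAIN are included conditionally. sketch_ext_control mirrors only the id/size/ptr members of struct v4l2_ext_control. The SKETCH_CID_* values are placeholders for the V4L2_CID_STATELESS_AV1_* ids, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the id/size/ptr members of struct v4l2_ext_control. */
struct sketch_ext_control {
    uint32_t id;
    uint32_t size;
    void *ptr;
};

/* Placeholder ids standing in for V4L2_CID_STATELESS_AV1_*. */
enum {
    SKETCH_CID_SEQUENCE = 1,
    SKETCH_CID_FRAME,
    SKETCH_CID_TILE_GROUP_ENTRY,
    SKETCH_CID_FILM_GRAIN,
};

/* Per-frame payloads already filled by the fill_* helpers. */
struct sketch_av1_payloads {
    void *sequence;
    uint32_t sequence_size;
    void *frame;
    uint32_t frame_size;
    void *tiles;              /* tile-group-entry array */
    uint32_t tile_entry_size; /* size of one entry */
    uint32_t n_tiles;
    void *film_grain;
    uint32_t film_grain_size;
};

/*
 * Fills out[] with the controls for one frame and returns how many were
 * used (2..4). send_sequence and send_film_grain encode whichever Q3/Q2
 * policy the caller chose.
 */
static size_t sketch_build_av1_ctrls(struct sketch_ext_control out[4],
                                     const struct sketch_av1_payloads *p,
                                     bool send_sequence, bool send_film_grain)
{
    size_t n = 0;

    if (send_sequence)
        out[n++] = (struct sketch_ext_control){
            SKETCH_CID_SEQUENCE, p->sequence_size, p->sequence };

    out[n++] = (struct sketch_ext_control){
        SKETCH_CID_FRAME, p->frame_size, p->frame };

    /* Dynamic-array control: the byte size carries the element count. */
    assert(p->n_tiles > 0);
    out[n++] = (struct sketch_ext_control){
        SKETCH_CID_TILE_GROUP_ENTRY, p->n_tiles * p->tile_entry_size, p->tiles };

    if (send_film_grain)
        out[n++] = (struct sketch_ext_control){
            SKETCH_CID_FILM_GRAIN, p->film_grain_size, p->film_grain };

    return n;
}
```

One caveat for the conditional variants, worth verifying against the Request API documentation: a control left out of a request keeps the value from the previously applied request. Skipping FILM_GRAIN on a grain-free frame that follows a grain frame could therefore leave stale parameters in effect, unless the driver keys off a per-frame flag. Sending a zeroed film-grain struct (apply_grain cleared) sidesteps that.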

Sequence

Phase 1 (this doc) → Janet's architectural review → Phase 2 implementation → Phase 3 test on ampere (byte-compare libva output against kdirect; the two must be bit-identical).
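For the Phase 3 byte-compare, cmp(1) already answers "identical or not". The sketch below additionally maps the first mismatch to a frame index and plane, which shows where decoding first diverges. It assumes tightly packed 8-bit NV12 dumps (no row padding, even width and height), like the Phase 0 /tmp/hw-av1*.nv12 files. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes per tightly packed 8-bit NV12 frame: Y plane + half-size CbCr plane. */
static size_t sketch_nv12_frame_size(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height * 3 / 2;
}

/* Returns -1 if both streams are identical, otherwise the offset of the
 * first differing byte (a length mismatch counts as a difference at the
 * end of the shorter stream). */
static long long sketch_first_diff(FILE *a, FILE *b)
{
    long long off = 0;

    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb)
            return off;
        if (ca == EOF)
            return -1;
        off++;
    }
}

/* Prints which frame and plane the first mismatch falls in. */
static void sketch_report(long long off, size_t width, size_t height)
{
    if (off < 0) {
        puts("identical");
        return;
    }
    size_t frame_size = sketch_nv12_frame_size(width, height);
    size_t in_frame = (size_t)off % frame_size;
    printf("first diff at byte %lld: frame %zu, %s plane\n", off,
           (size_t)off / frame_size, in_frame < width * height ? "Y" : "CbCr");
}
```

To use it, fopen() the libva dump and the matching kdirect reference in "rb" mode. Then pass the sketch_first_diff() result to sketch_report() together with the stream's resolution.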