diff --git a/phase1_plan.md b/phase1_plan.md
new file mode 100644
index 0000000..0dbf514
--- /dev/null
+++ b/phase1_plan.md
@@ -0,0 +1,84 @@
+# Phase 1 plan — libva-v4l2-request-fourier AV1 dispatch
+
+Date: 2026-05-17 09:15. Phase 0 verified that mainline AV1 decode is bit-perfect via ffmpeg's kdirect (v4l2request) path. Phase 1 plans the libva backend's AV1 dispatch so that VAAPI consumers can use the HW path as well.
+
+## Goal
+
+Add `~/src/libva-v4l2-request-fourier/src/av1.{c,h}`, which dispatches VAAPI AV1 picture/slice buffers onto the V4L2 `V4L2_CID_STATELESS_AV1_*` controls, matching the working ffmpeg-v4l2request behaviour. End-state criterion: VAAPI-via-libva AV1 decode is bit-identical to the verified kdirect output (`/tmp/hw-av1*.nv12` from Phase 0).
+
+## Reference implementations
+
+| Source | Where | Use |
+|---|---|---|
+| Kwiboo FFmpeg `v4l2_request_av1.c` | `Kwiboo/FFmpeg:v4l2-request-n8.1:libavcodec/v4l2_request_av1.c` (636 LoC) | Canonical filler — exact field semantics for `v4l2_ctrl_av1_*`. Reads from FFmpeg's `AV1RawSequenceHeader`/`AV1DecContext`, but the V4L2 output structs are identical to what we need. |
+| libva backend `vp9.c` | `ampere:~/src/libva-v4l2-request-fourier/src/vp9.c` (700+ LoC) | Architectural pattern — set_controls / store_buffer dispatch / context wiring |
+| V4L2 AV1 uAPI | `/usr/include/linux/v4l2-controls.h:2895..3517` | Source of truth for struct layouts |
+| VAAPI AV1 structures | `` (libva-dev) — `VADecPictureParameterBufferAV1`, `VASliceParameterBufferAV1`, `VAOpaque` payload | App→backend interface |
+
+## V4L2 AV1 controls (4 controls per frame)
+
+| Control | Struct | Size | Notes |
+|---|---|---|---|
+| `V4L2_CID_STATELESS_AV1_SEQUENCE` | `v4l2_ctrl_av1_sequence` | tiny | Sequence header — set once per stream |
+| `V4L2_CID_STATELESS_AV1_FRAME` | `v4l2_ctrl_av1_frame` | 7+ sub-structs | Per-frame header — bulk of the work |
+| `V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY` | `v4l2_ctrl_av1_tile_group_entry[]` | array | One entry per tile |
+| `V4L2_CID_STATELESS_AV1_FILM_GRAIN` | `v4l2_ctrl_av1_film_grain` | medium | Per-frame film grain (optional) |
+
+`v4l2_ctrl_av1_frame` (the heavy one) aggregates `v4l2_av1_tile_info`, `v4l2_av1_quantization`, `v4l2_av1_segmentation`, `v4l2_av1_loop_filter`, `v4l2_av1_cdef`, `v4l2_av1_loop_restoration`, and `v4l2_av1_global_motion`. Each maps directly to AV1-spec fields.
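
The control table above can be read as a per-frame list with one variable-size entry and one optional entry. The following is a minimal sketch of assembling that list, not the backend's actual code. It deliberately uses stand-in types and IDs (`ctrl_slot`, `av1_frame_payload`, `CTRL_AV1_*`, all hypothetical) so it compiles without kernel headers; real code would presumably use `struct v4l2_ext_control` and the `V4L2_CID_STATELESS_AV1_*` IDs. It also assumes the sequence control is sent with every frame and film grain only when present, which the plan has not decided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in IDs so the sketch builds without kernel headers. These are NOT
 * the kernel values; real code would use V4L2_CID_STATELESS_AV1_*. */
enum av1_ctrl_id {
    CTRL_AV1_SEQUENCE = 1,
    CTRL_AV1_FRAME,
    CTRL_AV1_TILE_GROUP_ENTRY,
    CTRL_AV1_FILM_GRAIN,
};

/* Stand-in for the id/size/ptr triple carried by struct v4l2_ext_control. */
struct ctrl_slot {
    uint32_t id;
    size_t size;
    const void *ptr;
};

/* Hypothetical per-frame payload: pointers to already-filled control structs. */
struct av1_frame_payload {
    const void *sequence;     size_t sequence_size;
    const void *frame;        size_t frame_size;
    const void *tile_entries; size_t tile_entry_size; size_t num_tiles;
    const void *film_grain;   size_t film_grain_size; /* NULL if no grain */
};

#define AV1_MAX_CTRLS 4

/* Fills slots[] for one frame and returns the number of controls used. */
size_t av1_collect_controls(const struct av1_frame_payload *p,
                            struct ctrl_slot slots[AV1_MAX_CTRLS])
{
    size_t n = 0;

    assert(p != NULL && slots != NULL);
    assert(p->num_tiles > 0);

    slots[n++] = (struct ctrl_slot){ CTRL_AV1_SEQUENCE, p->sequence_size, p->sequence };
    slots[n++] = (struct ctrl_slot){ CTRL_AV1_FRAME, p->frame_size, p->frame };

    /* Array control: the payload size grows with the tile count. */
    slots[n++] = (struct ctrl_slot){ CTRL_AV1_TILE_GROUP_ENTRY,
                                     p->tile_entry_size * p->num_tiles,
                                     p->tile_entries };

    /* Optional control: only appended when film grain data is present. */
    if (p->film_grain != NULL)
        slots[n++] = (struct ctrl_slot){ CTRL_AV1_FILM_GRAIN,
                                         p->film_grain_size, p->film_grain };

    return n;
}
```

Each returned slot would then become one entry in the control array attached to the frame's request. Keeping the count as a return value lets the caller pass the same fixed-size array for frames with and without film grain.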
+
+## Backend integration plan
+
+| Step | File | Action |
+|---|---|---|
+| 1 | `codec.c` | Add `case VAProfileAV1Profile0: return V4L2_PIX_FMT_AV1_FRAME;` |
+| 2 | `config.c` | Add `any_fd_supports_output_format(driver_data, V4L2_PIX_FMT_AV1_FRAME)` check; advertise `VAProfileAV1Profile0` |
+| 3 | `surface.h` | Add `av1` substruct: picture (`VADecPictureParameterBufferAV1`) + slice (`VASliceParameterBufferAV1`) parsed payload |
+| 4 | `picture.c::codec_store_buffer` | Add VAAPI buffer-type cases: `VAPictureParameterBufferType` + `VASliceParameterBufferType` for `VAProfileAV1Profile0` |
+| 5 | `picture.c::render_picture` | Call `av1_set_controls()` for `VAProfileAV1Profile0` |
+| 6 | NEW `av1.h` | Public `av1_set_controls()` signature (matches `vp9.h`) |
+| 7 | NEW `av1.c` | ~700 lines: fill_sequence, fill_frame, fill_film_grain, fill_tile_group_entry, set_controls dispatch |
+| 8 | `Makefile.am` | Add `av1.c av1.h` to source list |
+| 9 | Build | `autoreconf -fi && ./configure && make` on ampere |
+
+## Phase 2 file breakdown (estimated)
+
+| File | Lines added |
+|---|---|
+| `av1.c` | ~700 (mirror vp9.c structure + AV1's larger control surface) |
+| `av1.h` | ~30 |
+| `codec.c` | +5 |
+| `config.c` | +10 |
+| `picture.c` | +30 |
+| `surface.h` | +10 |
+| `Makefile.am` | +2 |
+
+Total: ~800 LoC. Realistic effort: 1-2 focused days.
+
+## Field-mapping notes (key concerns from Kwiboo reference)
+
+1. **`v4l2_ctrl_av1_sequence.flags`** — many `V4L2_AV1_SEQUENCE_FLAG_*` bits map directly from VAAPI's `seq_info_fields.fields.*`. Kwiboo's `fill_sequence` is 75 lines, all of the form `if (seq->X) flags |= V4L2_AV1_SEQUENCE_FLAG_X;`.
+
+2. **`v4l2_ctrl_av1_frame`** — 257 LoC in Kwiboo. Contains, per frame: tile_info, quantization, segmentation, loop_filter, cdef, loop_restoration, global_motion. Each is a separate `v4l2_av1_*` sub-struct that needs filling.
+
+3. **Reference frame mapping**: VAAPI provides `ref_frame_map[8]` (surface IDs) and `ref_frame_idx[7]` in `VADecPictureParameterBufferAV1`; V4L2 expects `frame.reference_frame_ts[8]` and `frame.order_hints[8]` (both sized `V4L2_AV1_TOTAL_REFS_PER_FRAME`) plus `frame.ref_frame_idx[7]`. Surface IDs must be translated to buffer timestamps, and the slot indexing should be checked against Kwiboo's filler.
+
+4. **Film grain**: VAAPI passes `VAFilmGrainStructAV1`; V4L2 has `v4l2_ctrl_av1_film_grain`. Mostly a direct field copy, with some byte-level translation.
+
+5. **Tile group entries**: VAAPI passes `VASliceParameterBufferAV1` (one per slice = one per tile). Each becomes one `v4l2_ctrl_av1_tile_group_entry`. The array is allocated per frame (variable size).
+
+6. **VAOpaqueAV1 payload**: VAAPI also has an opaque container (`VAOpaqueAV1`) carrying things VAAPI's spec doesn't enumerate. Need to verify whether FFmpeg-vaapi uses it for AV1 (probably not needed for kdirect equivalence).
+
+## Open architectural questions for Janet review (before Phase 2 code)
+
+| # | Question |
+|---|---|
+| Q1 | Should the libva backend use a v4l2-request-style request-fd-per-frame model, or batch the 4 controls atomically? vp9.c uses 2 controls per frame (FRAME + COMPRESSED_HDR); AV1 has 4 (SEQUENCE + FRAME + TILE_GROUP_ENTRY[] + FILM_GRAIN). Tile group entry is a DYNAMIC_ARRAY ctrl — does that need special handling? |
+| Q2 | Film grain is optional: `V4L2_CID_STATELESS_AV1_FILM_GRAIN` may not need to be set when `film_grain_params_present == 0`. Should we always send it, or send it conditionally? Kwiboo (line 593) tracks `has_film_grain` per-context. |
+| Q3 | SEQUENCE_HEADER caching: should we set `V4L2_CID_STATELESS_AV1_SEQUENCE` once and reuse it, or set it every frame? Per V4L2 docs, setting it once per stream is sufficient; per-frame is wasteful but harmless. |
+| Q4 | VAAPI's `VAOpaqueAV1` opaque payload — does ffmpeg-vaapi populate it for AV1? If yes, we may need to parse OBU-level data. If no, ignore it. Verify with `strace ffmpeg -hwaccel vaapi -c:v av1` (which currently fails because the backend lacks AV1, but we can see what VAAPI tries to pass). |
+| Q5 | `V4L2_PIX_FMT_AV1_FRAME` is the OUTPUT format on rkvdec, but for vpu981 (hantro AV1) the V4L2 device is `/dev/video4`. Verify that the backend's `cap_pool` and `request_data` correctly target the vpu981 device, not rkvdec, when AV1 is selected. |
+| Q6 | The sibling-campaign iter38b multi-device probe (5/5 codecs in one libva session) is in scope — must AV1 work alongside HEVC/H264/VP9/VP8/MPEG2 in the same session? Yes — extend the multi-device probe to include vpu981 for AV1. |
+
+## Sequence
+
+Phase 1 (this doc) → Janet architectural review → Phase 2 implementation → Phase 3 test on ampere (byte-compare libva vs kdirect; the two outputs should be byte-identical).
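
The Phase 3 byte-compare can be done with `cmp`, but a small helper that reports the first differing offset may be handier inside a test harness. The following is a minimal sketch under that assumption. The function names `first_mismatch` and `compare_files` are hypothetical, and it assumes both inputs are regular files (such as NV12 dumps), so a short read only happens at end of file.

```c
#include <assert.h>
#include <stdio.h>

/* Returns -1 if both streams hold identical bytes, otherwise the offset of
 * the first differing byte. A length mismatch counts as a difference at the
 * end of the shorter stream. Read errors are treated like end of file. */
long long first_mismatch(FILE *a, FILE *b)
{
    unsigned char buf_a[4096], buf_b[4096];
    long long offset = 0;

    assert(a != NULL && b != NULL);

    for (;;) {
        size_t na = fread(buf_a, 1, sizeof(buf_a), a);
        size_t nb = fread(buf_b, 1, sizeof(buf_b), b);
        size_t n = na < nb ? na : nb;

        for (size_t i = 0; i < n; i++)
            if (buf_a[i] != buf_b[i])
                return offset + (long long)i;

        if (na != nb)
            return offset + (long long)n;   /* one stream ended early */
        if (na == 0)
            return -1;                      /* both ended, no difference */
        offset += (long long)n;
    }
}

/* Convenience wrapper: returns -2 if either file cannot be opened. */
long long compare_files(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long long result = (a && b) ? first_mismatch(a, b) : -2;

    if (a)
        fclose(a);
    if (b)
        fclose(b);
    return result;
}
```

Calling `compare_files()` on a Phase 0 kdirect dump and the matching libva dump would return -1 on a bit-perfect result. Any other non-negative value is the byte offset at which to start looking, which for NV12 also indicates whether the luma or chroma plane diverged first.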