# Phase 8.10 + 8.11 closure — libva consumer integration scaffold

**Status:** closed 2026-05-18.

Two interleaved phases:

- **Phase 8.10** — wire daedalus_v4l2 into the existing
  `libva-v4l2-request-fourier` campaign fork (the sibling
  repo at `marfrit/libva-v4l2-request-fourier` already had
  VP9/AV1/H264/HEVC working on Rockchip/Allwinner).
- **Phase 8.11** — extend daedalus_v4l2 with the V4L2
  surface format and media-controller plumbing the libva
  fork needs.

Together they bring the daedalus stack from "standalone
test client" to "VAAPI-discoverable decoder with all the
ICD-side framework integration in place."  Actual decode
through the libva path stops at the V4L2 stateless control
payload acceptance step — a deeper framework integration
that lands in Phase 8.12.

## What lands

### libva-v4l2-request-fourier (sibling fork, gitea `marfrit/`)

Two commits pushed to `master`:

- `b5b3acf` — daedalus_v4l2: add to known_decoder_drivers
  + multi-device-probe slot.  Same shape as iter40's
  rpi-hevc-dec wiring: array entry, driver_data fd slot,
  primary-driver detection branch, post-probe log line.
  34-line diff.
- `2146341` — daedalus_v4l2: meson option gate (default
  true).  `meson setup -Ddaedalus_v4l2=false` builds a .so
  with no daedalus strings at all (verified via strings
  on both build dirs).  Struct fields stay unconditional
  to avoid ODR risk across translation units.
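
The gating shape can be sketched as below. This is a minimal illustration, not the fork's code: the macro name `HAVE_DAEDALUS_V4L2`, the `_sketch` identifiers, the struct fields, and the placeholder driver names are all assumptions. The array entry is compiled out when the option is off, while the struct keeps one layout in every translation unit.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the define the build system would pass; the real macro
 * name is an assumption. */
#ifndef HAVE_DAEDALUS_V4L2
#define HAVE_DAEDALUS_V4L2 1
#endif

/* Fields stay unconditional so every translation unit sees one layout,
 * whichever way the option is set. Field names are hypothetical. */
struct driver_data_sketch {
    int video_fd;
    int media_fd;
    int daedalus_video_fd; /* simply unused when the option is off */
};

/* Only the table entry is gated. "rkvdec" and "cedrus" are placeholders,
 * not the fork's actual list. */
static const char *const known_decoder_drivers_sketch[] = {
    "rkvdec",
    "cedrus",
#if HAVE_DAEDALUS_V4L2
    "daedalus_v4l2",
#endif
};

/* Returns 1 if name is in the table, 0 otherwise. */
static int is_known_decoder_driver(const char *name)
{
    size_t n = sizeof(known_decoder_drivers_sketch) /
               sizeof(known_decoder_drivers_sketch[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(known_decoder_drivers_sketch[i], name) == 0)
            return 1;
    return 0;
}
```

Building with the macro set to 0 drops the string from the object file, which is what the `strings` check on the two build dirs would observe.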

### daedalus-v4l2 (this repo)

Three production changes in this commit:

**1. `V4L2_PIX_FMT_NV12` (single-plane) on CAPTURE**

The libva fork's `video_format` table only knows NV12
single-plane (W*H Y bytes followed by W*H/2 interleaved
CbCr bytes in one buffer plane), not NV12M (two-plane).
Added NV12 alongside our existing NV12M + P010 in the
CAPTURE format list:

- `daedalus_capture_formats[]` grew a `V4L2_PIX_FMT_NV12`
  entry; `enum_fmt` now lists 3 CAPTURE formats.
- `daedalus_fill_capture_fmt` handles the new num_planes=1
  layout (sizeimage = W*H*3/2, bytesperline = W).
- daemon `pack_nv12_single_to_plane`: Y line-by-line into
  base+0, interleaved CbCr at base+(stride*H).  Mirrors
  the P010 pack structure minus the depth shift.
- `daedalus_decoder_run_request` dispatches on
  `req->capture_pix_fmt` to the right pack function.
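
The single-plane layout described above can be sketched in userspace C. This is an illustration under assumptions, not the daemon's code: the function names and signatures are hypothetical, and the input is assumed to be planar 4:2:0 with separate Y, Cb, and Cr planes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes for one single-plane NV12 frame: W*H luma plus W*H/2 chroma. */
static size_t nv12_single_sizeimage(uint32_t width, uint32_t height)
{
    return (size_t)width * height * 3 / 2;
}

/*
 * Pack planar 4:2:0 (an assumed input layout) into one NV12 buffer.
 * dst_stride is the destination bytesperline; c_stride is the stride
 * of the Cb and Cr source planes.
 */
static void pack_nv12_single(uint8_t *dst, size_t dst_stride,
                             const uint8_t *y, size_t y_stride,
                             const uint8_t *cb, const uint8_t *cr,
                             size_t c_stride,
                             uint32_t width, uint32_t height)
{
    /* Luma: copy row by row, starting at offset 0. */
    for (uint32_t row = 0; row < height; row++)
        memcpy(dst + row * dst_stride, y + row * y_stride, width);

    /* Chroma: starts right after the luma rows, Cb and Cr interleaved. */
    uint8_t *uv = dst + dst_stride * height;
    for (uint32_t row = 0; row < height / 2; row++) {
        uint8_t *out = uv + row * dst_stride;
        for (uint32_t col = 0; col < width / 2; col++) {
            out[2 * col]     = cb[row * c_stride + col];
            out[2 * col + 1] = cr[row * c_stride + col];
        }
    }
}
```

For 1920x1080 the size helper gives 3110400 bytes, which matches the `sizeimage` the test tool reports for the NV12 CAPTURE format.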

**Verified**: VP9 1080p decoded into NV12 single-plane via
`tools/test_m2m_stream`, byte-for-byte match against an
`ffmpeg -pix_fmt nv12` reference (10 frames, about 31 MB of
decoded output).

**2. V4L2 Request API media ops**

`daedalus_media_ops = { .req_validate = vb2_request_validate,
.req_queue = v4l2_m2m_request_queue }` assigned to
`mdev.ops` before `media_device_init`.  Before this,
`MEDIA_IOC_REQUEST_ALLOC` returned `-ENOTTY` and any
VAAPI consumer couldn't even allocate a media_request.
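
A quick userspace check for this path, as a minimal sketch: `request_alloc` is a hypothetical helper, not code from the libva fork. It issues `MEDIA_IOC_REQUEST_ALLOC` on an already-open media fd and returns the request fd or a negative errno, so a driver without request ops would show up as `-ENOTTY`.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/media.h>

/*
 * Allocate one media request on an open media device fd.
 * Returns the new request fd (>= 0) on success or a negative errno.
 * The caller owns the returned fd and must close() it.
 */
static int request_alloc(int media_fd)
{
    int request_fd = -1;

    if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &request_fd) < 0)
        return -errno;

    return request_fd;
}
```

Running it against the media node used in the verification below (`/dev/media3` in that run) should return a non-negative fd once the ops are assigned.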

**3. Stateless control registration via `v4l2_ctrl_new_custom`**

Switched from `v4l2_ctrl_new_std_compound` with a NULL
`p_def` to `v4l2_ctrl_new_custom(&cfg, NULL)`, the pattern
rkvdec / cedrus / hantro use.  Also adds a no-op `s_ctrl`
callback so v4l2-core has somewhere to dispatch SET
operations.

## Verification

### Probe + enumeration

```
$ LIBVA_DRIVER_NAME=v4l2_request \
  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
  vainfo --display drm --device /dev/dri/renderD128

v4l2-request: phase 8.10: opened daedalus_v4l2 at video_fd=N media_fd=M
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileAV1Profile0            : VAEntrypointVLD
```

Seven VAAPI profiles enumerated through the libva path.

### libva trace through `ffmpeg -hwaccel vaapi`

| Step | Status |
|------|--------|
| `vaInitialize` | ✓ |
| `vaQueryConfigProfiles` | ✓ |
| `vaQueryConfigEntrypoints` (VLD) | ✓ |
| `vaCreateConfig` (VP9 + VLD + NV12) | ✓ |
| `vaQuerySurfaceAttributes` (NV12 fourcc reported) | ✓ |
| `vaCreateSurfaces` | ✓ |
| `vaCreateContext` (cap_pool: 24 slots, 1 plane each) | ✓ |
| `vaCreateBuffer` (slice + picture params) | ✓ |
| `MEDIA_IOC_REQUEST_ALLOC` | ✓ |
| `VIDIOC_S_EXT_CTRLS` (stateless ctrls) | ✗ EINVAL |
| `VIDIOC_QBUF` with request fd | ✗ EBADR ("Invalid request descriptor") |
| `vaEndPicture` | ✗ OPERATION_FAILED |

Everything from discovery through probe, context, surface,
buffer, and request allocation works.  The blocker is
`VIDIOC_S_EXT_CTRLS` returning EINVAL when libva tries to
set `V4L2_CID_STATELESS_VP9_FRAME` on the request:
v4l2-core validates the payload against the
compound-control struct layout it expects and rejects it.

This isn't a "missing line" fix.  It needs proper
stateless control plumbing: the get_dims, validate, and
default-value paths for the per-codec compound controls
(SPS, PPS, slice params, and so on) that the in-tree
rkvdec / cedrus / hantro decoders implement to satisfy
v4l2-core's `std_validate` machinery.  That's Phase 8.12
scope.

### Standalone NV12 verification

```
$ sudo ./tools/test_m2m_stream /tmp/vp9_1080_stream.ivf \
     /tmp/nv12_out.nv12 1920 1080 vp9 nv12
parsed 10 frames, 1920x1080
CAPTURE fmt=NV12 planes=1 sizeimage=[3110400,0]
decoded 10 / 10 frames
$ cmp /tmp/nv12_out.nv12 /tmp/vp9_1080_stream_ref.nv12
$ echo $?
0
```

Byte-exact through the daedalus-internal path with the
NV12 single-plane format.  Confirms `pack_nv12_single_to_plane`
produces the same pixels as `pack_nv12_to_planes` (just
re-laid-out into one buffer).

## Design decisions

### Why ship even though decode-via-libva is blocked

The framework integration up to `MEDIA_IOC_REQUEST_ALLOC` is
itself a significant deliverable:

- Other VAAPI consumers (testing tools, future patched
  ffmpeg paths, custom VA clients) get the same scaffolding
  for free.
- The remaining gap is well-characterised: it's the
  stateless control payload acceptance, a single named
  V4L2 framework integration.  Phase 8.12's surface is
  clearly defined.
- All the new code is correct on its own merits (NV12
  single-plane decode is byte-exact via our own path;
  request API ops are the canonical helpers).

### Why register stateless controls if we don't act on them

libva-v4l2-request-fourier's per-codec dispatch requires
the standard V4L2 stateless controls to be visible — that's
how it validates that the kernel supports the right
profile.  Without the registered controls vainfo would
not enumerate the profile.

Our daemon ignores the control values because FFmpeg
re-parses the bitstream on its own.  The plumbing exists
to satisfy the V4L2 stateless contract; the actual
decode logic doesn't depend on it.

### Why `v4l2_ctrl_new_custom` over `_std_compound`

`v4l2_ctrl_new_std_compound` with `NULL` default rejected
SET requests (same EINVAL libva is hitting today —
removing the `NULL` default didn't fix it either).
`v4l2_ctrl_new_custom` is the pattern in-tree decoders
use; v4l2-core auto-fills the type/dims/size from the std
control table when given just a `.id`.

The remaining issue isn't the registration pattern but
the payload validation path.  v4l2-core expects more
than a registered control: it expects the driver to
have set up `min/max/step/def` for each compound field.
`v4l2_ctrl_new_std_compound` does that internally for
known CIDs, but our handler doesn't yet get it right.

## What's NOT here (Phase 8.12 scope)

- **Stateless control payload acceptance**: the S_EXT_CTRLS
  EINVAL.  Needs proper v4l2-core validation hooks —
  likely meaning the daemon needs to actually consume the
  per-frame controls (not just ignore them), so the
  validation path has something to hand off to.
- **Per-codec control wiring**: even if S_EXT_CTRLS
  succeeds, the actual decode submission to the daemon
  needs to bundle the per-buffer controls (or document
  why they're ignored — and convince v4l2-core to allow
  the request to validate).
- **First end-to-end decoded frame via libva**: the
  payoff for Phase 8.12.

## Phase 8.12 plan

1. Study cedrus or rkvdec's stateless control validation
   to understand what `std_validate` expects beyond just
   registration.
2. Either:
   - (a) Add proper compound-control validation hooks so
     S_EXT_CTRLS succeeds without us doing real work
     (control values become "advisory" since daemon
     re-parses bitstream), OR
   - (b) Wire the daemon to actually use the per-frame
     control payload (skip FFmpeg's parse step, trust
     libva's parsed values).  Bigger change but more
     correct.
3. Verify first frame decoded through the libva path.
4. Run the full `vainfo --display drm --decode` test if
   that exists, or a small VA decode snippet.