daedalus-v4l2/docs/phase_8_13_closure.md
marfrit f04d7000f8 Phase 8.13: byte-exact end-to-end via libva (consumer target hit)
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12
  cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12  → match

18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.

Two related kernel changes:

1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
   matches rkvdec/cedrus/hantro. Brings each registered
   compound control out of "uninitialised" state via
   std_init_compound defaults.

2. Per-request control completion in the decode path —
   the real fix for "Timeout when waiting for media request".
   vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
   on normal decode completion, but the per-request CONTROL
   object stays bound. buf_request_complete fires only from
   queue-cancel paths (vb2-core line 2284), NOT from normal
   buf_done. The driver must call
   v4l2_ctrl_request_complete(req, hdl) explicitly from the
   completion path.

   struct daedalus_inflight gained a `struct media_request
   *req` field, captured from src_buf->vb2_buf.req_obj.req
   in device_run. daedalus_complete_resp_frame then calls
   v4l2_ctrl_request_complete before
   v4l2_m2m_buf_done_and_job_finish — triggers
   MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
   poll.

   For non-request flows (test_m2m_stream direct QBUF)
   inf->req is NULL; the conditional skips the call.
   Both consumer styles work concurrently.

Diagnostic clarification (was Phase 8.13a):

strace traced three S_EXT_CTRLS calls per frame:
  1. H264_PROFILE + H264_LEVEL → EINVAL  (we don't register)
  2. HEVC_PROFILE + HEVC_LEVEL → EINVAL  (we don't register)
  3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS

The first two are harmless: libva probes whether we support
H264/HEVC integer profile/level controls during config
negotiation; we don't (we expose them as stateless), so EINVAL
just falls through. The actual VP9 stateless controls (#3)
succeeded all along — the libva-side "Unable to set control(s)"
log was misleading us into thinking the control path was the
bug.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  daemon log:
    REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
    decoder: opened vp9 context
    decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...

  ffmpeg side:
    no Timeout, no Decoding error
    /tmp/vp9_via_libva.nv12: 18432 bytes

  cmp vs reference: byte-for-byte identical.

Roadmap update:
- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
  higher-level consumers.

Per correctness-before-speed:
- strace + kernel-source-reading found the actual root cause
  rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
  test_m2m_stream non-request path — both consumer styles work
  concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."

Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:14:34 +00:00


Phase 8.13 closure — byte-exact end-to-end via libva

Status: closed 2026-05-18.

The project's consumer-side goal landed: a real VAAPI consumer (ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2 driver → daemon → byte-exact NV12 output back to ffmpeg.

ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
       -hwaccel_output_format nv12 -i vp9_small.ivf \
       -f rawvideo -y /tmp/vp9_via_libva.nv12

cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12   # ← match

18432 bytes, byte-for-byte identical to plain ffmpeg -pix_fmt nv12 -f rawvideo software decode of the same VP9 keyframe. The project_consumer_target memory's deliverable shape — "V4L2 stateless node consumed by a real VAAPI client" — is achieved.
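The 18432-byte figure follows directly from the NV12 layout at 128x96. A minimal sketch of the arithmetic, assuming even dimensions and no row padding (the function names are illustrative and not part of the project):

```c
#include <stddef.h>

/* NV12: a full-resolution Y plane followed by one interleaved UV plane
 * subsampled 2x2. Assumes even width/height and no stride padding. */
size_t nv12_luma_size(size_t width, size_t height)
{
    return width * height;
}

size_t nv12_chroma_size(size_t width, size_t height)
{
    /* (width/2) * (height/2) samples each for U and V, interleaved. */
    return (width / 2) * (height / 2) * 2;
}

size_t nv12_frame_size(size_t width, size_t height)
{
    return nv12_luma_size(width, height) + nv12_chroma_size(width, height);
}
```

For 128x96 this gives 12288 luma bytes plus 6144 chroma bytes, which matches the luma/chroma values in the daemon log and the 18432-byte output file.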

What lands

Two related kernel changes that unstick the libva request completion handshake, plus one diagnostic improvement on the libva side:

1. Stateless control handler initialisation

v4l2_ctrl_handler_setup(&ctx->hdl) after registration — matches rkvdec/cedrus/hantro. Brings each registered compound control out of "uninitialised" state via the std_init_compound defaults (e.g. VP9_FRAME gets profile=0, bit_depth=8).

2. Per-request control completion in the decode path

The actual root cause of "Timeout when waiting for media request":

  • vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj on normal decode completion.
  • But the per-request CONTROL object stays bound until v4l2_ctrl_request_complete runs.
  • The vb2 buf_request_complete op fires only from queue-cancel paths (vb2-core line 2284), NOT from normal buf_done.
  • The driver must call v4l2_ctrl_request_complete(req, hdl) explicitly from its decode-completion path.

Fix (in kernel/daedalus_v4l2_main.c):

struct daedalus_inflight {
    ...
    struct media_request *req;  /* captured from src_buf */
};

static void daedalus_device_run(void *priv) {
    ...
    inf->req = src_buf->vb2_buf.req_obj.req;
    ...
}

void daedalus_complete_resp_frame(...) {
    ...
    if (inf->req)
        v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
    v4l2_m2m_buf_done_and_job_finish(...);
}

For non-request flows (test_m2m_stream's direct QBUF) inf->req is NULL; the conditional skips the v4l2_ctrl_request_complete call. Both consumer styles work concurrently.
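The completion rule can be illustrated with a toy userspace model. This is only a sketch under simplifying assumptions: a request carries exactly one buffer object and one control object, and every `toy_*` name is hypothetical. It is not kernel code and does not reproduce the real media request implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a request completes once no objects remain bound. */
enum toy_req_state { TOY_REQ_QUEUED, TOY_REQ_COMPLETE };

struct toy_request {
    int bound_objects;              /* objects still bound to the request */
    enum toy_req_state state;
};

/* Queue a request carrying one buffer object and one control object. */
void toy_request_queue(struct toy_request *req)
{
    req->bound_objects = 2;
    req->state = TOY_REQ_QUEUED;
}

/* Release one object; the request completes when the last one goes. */
static void toy_object_complete(struct toy_request *req)
{
    if (req->bound_objects > 0 && --req->bound_objects == 0)
        req->state = TOY_REQ_COMPLETE;
}

/* Stands in for the buffer side of vb2_buffer_done. */
void toy_buffer_done(struct toy_request *req)
{
    toy_object_complete(req);
}

/* Stands in for v4l2_ctrl_request_complete. */
void toy_ctrl_complete(struct toy_request *req)
{
    toy_object_complete(req);
}

/* Decode-completion path. with_ctrl_complete == false models the
 * Phase 8.12 behaviour; req == NULL models the direct-QBUF flow.
 * Returns true if the request reached the COMPLETE state. */
bool toy_finish_decode(struct toy_request *req, bool with_ctrl_complete)
{
    if (req == NULL)
        return false;               /* no request, nothing to complete */
    if (with_ctrl_complete)
        toy_ctrl_complete(req);
    toy_buffer_done(req);
    return req->state == TOY_REQ_COMPLETE;
}
```

In this model, finishing a decode without the control-completion step leaves one object bound and the request stuck in the queued state, which corresponds to the poll timeout seen from libva.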

3. Diagnostic improvements

  • libva-v4l2-request-fourier src/v4l2.c: better error logging in v4l2_set_controls (logs error_idx, failing control id, size). Made the diagnosis above tractable.

Verification

End-to-end via ffmpeg + libva, byte-exact

$ pkill -f daedalus_v4l2_daemon; sudo rmmod daedalus_v4l2
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &

$ LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
  LIBVA_DRIVER_NAME=v4l2_request \
  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i /tmp/vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12

  v4l2-request: cap_pool_init: 24 slots ready
  v4l2-request: Unable to set control(s): EINVAL (H264 probe — harmless)
  v4l2-request: Unable to set control(s): EINVAL (HEVC probe — harmless)
  (no timeout, no decode error)

daemon log:
  REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
  decoder: opened vp9 context
  decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144

$ ls -la /tmp/vp9_via_libva.nv12
-rw-r--r-- 1 root root 18432 May 18 20:13 /tmp/vp9_via_libva.nv12

$ ffmpeg -i /tmp/vp9_small.ivf -pix_fmt nv12 -f rawvideo \
       -y /tmp/vp9_ref_for_libva.nv12
$ cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12
$ echo $?
0

Byte-for-byte match. The two Unable to set control(s): EINVAL messages are libva probing H264 + HEVC PROFILE/LEVEL integer controls during config negotiation — we don't register those (since we expose VP9/AV1/H264 stateless), libva gets EINVAL, logs it, and moves on. Functional flow is unaffected.
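The daemon log also prints an fnv1a value as a quick content fingerprint. The sketch below shows the standard 32-bit FNV-1a hash; it is an assumption that the daemon uses this exact variant, and the log does not say which bytes it hashes, so treat this as illustrative rather than as the daemon's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime.
 * Whether the daemon uses this exact variant is an assumption. */
uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t hash = 2166136261u;    /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u;          /* FNV prime */
    }
    return hash;
}
```

A hash like this is useful for spotting divergence in logs, but the cmp run above remains the actual byte-exact check.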

Design analysis

Why Phase 8.12 was close but not complete

8.12 wired all the request API hooks (supports_requests, buf_out_validate, buf_request_complete) and the daemon-side decode worked byte-exact, but the per-request control object stayed bound forever because buf_request_complete only fires from queue-cancel paths in vb2-core, not from normal buf_done. Result: the request never transitioned to COMPLETE, and libva's poll on the request fd timed out.

Phase 8.13 closes that loop by capturing the media_request from the OUTPUT vb2_buffer's req_obj at device_run time and calling v4l2_ctrl_request_complete explicitly when the decode finishes (chardev RESP_FRAME path). Mirrors what rkvdec does from its IRQ handler and cedrus from its device_run completion.

Why the EINVAL noise was misleading

Earlier phases (8.10-8.12) kept seeing "Unable to set control(s): Invalid argument" and assumed it pointed at our stateless control registration. strace revealed three separate S_EXT_CTRLS calls per frame:

| # | controls | result | meaning |
|---|----------|--------|---------|
| 1 | H264_PROFILE + H264_LEVEL | EINVAL | libva probes H264; we don't register it |
| 2 | HEVC_PROFILE + HEVC_LEVEL | EINVAL | libva probes HEVC; we don't register it |
| 3 | VP9_FRAME + VP9_COMPRESSED_HDR | OK | actual decode controls |

Calls 1 and 2 are harmless: libva detects we don't support H264/HEVC integer probes and falls back to the stateless controls it does have. Call 3 (the actual VP9 stateless controls) succeeded all along. Only the completion handshake was broken.

Phase 8.13's added error_idx logging in v4l2.c (failing_ctrl_id=0xa40900 size=0 etc.) is what made the distinction visible.
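The idea behind that logging can be sketched as follows. In the V4L2 extended-controls API, an error_idx equal to count means the failure is not tied to one specific control, while a smaller value indexes the failing entry. The helper names and the message layout here are hypothetical and are not copied from libva-v4l2-request-fourier; ids[] and sizes[] stand in for the id and size fields of the controls array.

```c
#include <stddef.h>
#include <stdio.h>

/* Index of the failing control, or -1 when error_idx == count, which
 * V4L2 uses for errors not associated with a particular control. */
int failing_ctrl_index(unsigned int count, unsigned int error_idx)
{
    return error_idx < count ? (int)error_idx : -1;
}

/* Hypothetical formatter: writes a one-line diagnostic into buf and
 * returns snprintf's result. */
int format_ctrl_failure(char *buf, size_t len, unsigned int count,
                        unsigned int error_idx,
                        const unsigned int *ids, const unsigned int *sizes)
{
    int idx = failing_ctrl_index(count, error_idx);

    if (idx < 0)
        return snprintf(buf, len, "error_idx=%u (no specific control)",
                        error_idx);
    return snprintf(buf, len, "error_idx=%u failing_ctrl_id=0x%x size=%u",
                    error_idx, ids[idx], sizes[idx]);
}
```

Logging the control id alongside the errno is what lets probe failures on unregistered controls be told apart from a failure in the VP9 controls themselves.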

Why one fix unblocked both 8.13 and 8.14

The original plan split Phase 8.13 ("trace the EINVAL") from Phase 8.14 ("call request_complete from the right place"). Once strace clarified that the EINVAL was probe noise, the real fix was just the request_complete call from the decode path — a 10-line change. Doing both in one shot avoided a phase boundary that wouldn't have shipped anything additional.

What's NOT here (Phase 8.14+ scope)

  • Multi-frame stream via libva. Verified single keyframe; P-frame reference handling across requests untested. Likely works (the daemon's AVCodecContext is persistent across REQ_DECODE calls — already proven via test_m2m_stream).
  • AV1 + H.264 via libva. Different stateless control sets; needs the same control-payload validation path. May need similar v4l2_ctrl_request_complete adjustments per codec.
  • mpv + Firefox end-to-end. The lower-level harness (ffmpeg vaapi) works; higher-level consumers should follow but each has its own VAAPI quirks.
  • The two harmless EINVALs from H264/HEVC profile probes. Could be suppressed by registering those integer controls too (always rejecting writes) but that's a polish item.

Phase 8.14 plan

  1. Multi-frame VP9 stream via libva (re-use vp9_60s.ivf from Phase 8.9 stress test).
  2. AV1 + H.264 single-frame via libva (likely needs codec-specific tweaks).
  3. Document any remaining libva-side quirks for higher-level consumers (mpv, Firefox).