OUTPUT_MPLANE sizeimage hardcap of 65484 too small for real H.264 slices; libva resize gets clamped back #19

Closed
opened 2026-05-22 18:45:50 +00:00 by marfrit · 0 comments
Owner

Symptom

On higgs (Pi CM5, libva-v4l2-request-fourier driver tracing enabled), Firefox YouTube avc1 emits a stream of:

v4l2-request: h264_set_controls: VAProfile=6 ... w_mbs_m1=79 h_mbs_m1=44
v4l2-request: codec_store_buffer: OUTPUT-pool resize (need 72921 > cap 65484 → new_sizeimage 147456)
v4l2-request: codec_store_buffer_ensure_capacity: kernel returned sizeimage 65484 < required 72921

The libva driver detects a slice larger than the current OUTPUT_MPLANE buffer (72921 B > 65484 B) and asks the kernel to grow the buffer to 147456 B via VIDIOC_S_FMT. The kernel returns the same fixed 65484: the format setter daedalus_fill_output_fmt ignores userspace's requested sizeimage and unconditionally pins it to DAEDALUS_MAX_BITSTREAM.

Same cap is what ffmpeg-fourier -hwaccel v4l2request ran into on the morning of 2026-05-22 (Failed to append 197516 bytes data to OUTPUT buffer N (3 of 65484 used)).
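The failing round trip can be modelled in a few lines. This is only a sketch: `fake_s_fmt_sizeimage` and `ensure_capacity` are hypothetical stand-ins for the kernel format setter and the libva-side check, not the real functions, and the only values taken from the traces are the sizes.

```c
#include <assert.h>

/* Cap reported by the kernel in the traces above. */
#define OBSERVED_CAP 65484u

/* Hypothetical stand-in for VIDIOC_S_FMT on the OUTPUT queue: the requested
 * size is ignored and the fixed cap is always reported back. */
static unsigned int fake_s_fmt_sizeimage(unsigned int requested)
{
	(void)requested;
	return OBSERVED_CAP;
}

/* Hypothetical model of the userspace check: returns 0 when the kernel
 * granted at least `required` bytes, -1 otherwise. */
static int ensure_capacity(unsigned int required, unsigned int requested)
{
	unsigned int granted = fake_s_fmt_sizeimage(requested);

	return granted >= required ? 0 : -1;
}
```

With the trace values, `ensure_capacity(72921, 147456)` fails however large the request is, because the granted size never moves.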

Root cause

From include/daedalus_v4l2_proto.h:74:

#define DAEDALUS_PROTO_MAX_PAYLOAD	(64u * 1024u)	/* 64 KiB */

From kernel/daedalus_v4l2_main.c:120:

#define DAEDALUS_MAX_BITSTREAM	(DAEDALUS_PROTO_MAX_PAYLOAD -	\
				 sizeof(struct daedalus_req_decode))
/* … */
static void daedalus_fill_output_fmt(struct v4l2_pix_format_mplane *f, …)
{
	/* … */
	f->plane_fmt[0].sizeimage    = DAEDALUS_MAX_BITSTREAM;  /* hardcoded, ignores userspace */
}

The wire-protocol cap is the binding constraint: a payload > 64 KiB cannot fit in a chardev REQ_DECODE message, so the kernel cannot accept a larger OUTPUT bitstream even if vb2 allocated room for it.
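The arithmetic behind the 65484 figure can be checked directly. A minimal sketch, assuming a 52-byte request header: that size is inferred from 65536 - 65484 in the traces, since the layout of `struct daedalus_req_decode` is not shown here. The helper name `max_bitstream` is hypothetical.

```c
#include <assert.h>

/* ASSUMPTION: header size inferred from the traces (65536 - 65484 = 52);
 * the real struct daedalus_req_decode is not reproduced in this issue. */
#define ASSUMED_REQ_DECODE_HDR 52u

/* Largest bitstream that fits in one message once the header is deducted. */
static unsigned int max_bitstream(unsigned int payload_cap, unsigned int hdr)
{
	return payload_cap - hdr;
}
```

Under that assumption, `max_bitstream(64u * 1024u, ASSUMED_REQ_DECODE_HDR)` gives 65484, matching the kernel's reported sizeimage, and a 1 MiB payload cap would give 1048524.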

Real-world H.264 slice sizes

| Resolution | Typical I-frame | Worst-case I-frame |
|---|---|---|
| 720p | 40-100 KB | 200 KB |
| 1080p | 80-300 KB | 500 KB |
| 4K | 300 KB-1 MB | 2 MB |

64 KiB sits inside the typical 720p I-frame range (40-100 KB) and far below the worst case, so the failure is structural, not a corner case.
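As a rough cross-check, the worst-case column can be compared against a given cap. This sketch assumes the table's KB and MB are decimal units (1 KB = 1000 bytes), which the table does not state, and reuses the 65484 and 1048524 byte limits derived from an assumed 52-byte header. `count_worst_cases_fitting` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case I-frame sizes from the table: 720p, 1080p, 4K.
 * ASSUMPTION: decimal units, 1 KB = 1000 bytes. */
static const unsigned int worst_case_bytes[] = { 200000u, 500000u, 2000000u };

/* Count how many worst-case rows fit within `cap` bytes. */
static size_t count_worst_cases_fitting(unsigned int cap)
{
	size_t n = sizeof(worst_case_bytes) / sizeof(worst_case_bytes[0]);
	size_t fitting = 0;

	for (size_t i = 0; i < n; i++) {
		if (worst_case_bytes[i] <= cap)
			fitting++;
	}
	return fitting;
}
```

With the current cap none of the three worst cases fit. With a 1 MiB payload cap the 720p and 1080p rows fit, while the 4K worst case would still exceed it.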

Fix (proposed)

Bump DAEDALUS_PROTO_MAX_PAYLOAD to 1 MiB in include/daedalus_v4l2_proto.h. All existing allocations (kmemdup, kmalloc, daemon malloc of the read buffer) are sized to the actual payload at runtime; the only growth is the daemon's startup read buffer (one buffer per daemon process) and the V4L2 OUTPUT_MPLANE per-buffer size (kmalloc-able on aarch64 kernels; KMALLOC_MAX_SIZE is ~4 MiB on SLUB).

Other V4L2 stateless decoders (cedrus, rkvdec, hantro) report 1-4 MiB OUTPUT_MPLANE sizeimage — 1 MiB is the conservative end of normal.

Wire protocol

Value change of a #define only; no struct layout change. But the practical effect is bidirectional: a kernel running stale 64 KiB cap with a new 1 MiB-aware daemon (or vice versa) will reject the larger payloads. Lock-step install of daedalus-v4l2 + daedalus-v4l2-dkms is required, same shape as the PR-#7/#8 era.
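The lock-step requirement follows from each side checking against its own compiled-in constant. A minimal sketch, assuming both the kernel module and the daemon validate message length against their own copy of the cap; `effective_cap` and `receiver_accepts` are hypothetical names, not functions from the codebase.

```c
#include <assert.h>

#define CAP_64K (64u * 1024u)
#define CAP_1M  (1024u * 1024u)

/* The usable payload limit is the smaller of the two compiled-in caps. */
static unsigned int effective_cap(unsigned int kernel_cap, unsigned int daemon_cap)
{
	return kernel_cap < daemon_cap ? kernel_cap : daemon_cap;
}

/* ASSUMPTION: a side rejects any message longer than its own cap. */
static int receiver_accepts(unsigned int receiver_cap, unsigned int msg_len)
{
	return msg_len <= receiver_cap;
}
```

In either mismatched pairing, `effective_cap` stays at 64 KiB, so a message carrying the 72921-byte slice from the trace is still rejected until both packages are upgraded.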

Reproduce

  1. Pi CM5 / trixie / daedalus-v4l2 0.1.0+r43+g1d8f5af (with PR #18 tiny-bitstream filter)
  2. Firefox + LIBVA_TRACE_BUFDATA=1 (or any verbose libva-v4l2-request-fourier trace path)
  3. Open a YouTube avc1 720p video; observe codec_store_buffer_ensure_capacity: kernel returned sizeimage 65484 < required N log lines once a slice exceeds ~64 KiB.
  4. Effect on Firefox: VAAPI decode fails for that slice; Firefox falls back to libmozavcodec SW until the next session.

Refs

  • include/daedalus_v4l2_proto.h:74 — wire cap
  • kernel/daedalus_v4l2_main.c:120 + :398 — kernel cap + format-setter
  • Issue #17 + PR #18 fixed the tiny-bitstream end (pause sentinel). This is the large-bitstream end of the same OUTPUT-pool sizing story.
Reference: reauktion/daedalus-v4l2#19