Merge pull request 'picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun (#15)' (#16) from claude-noether/libva-v4l2-request-fourier:noether/output-pool-resize-issue-15 into master

Reviewed-on: marfrit/libva-v4l2-request-fourier#16
picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun
2026-05-21 11:23:08 +00:00 · 2026-05-21 13:11:55 +02:00 · 2026-05-21 10:17:15 +00:00 · 2026-05-21 12:14:48 +02:00 · 2026-05-20 19:14:49 +00:00 · 2026-05-20 21:13:07 +02:00
12 changed files with 732 additions and 14 deletions
@@ -0,0 +1,155 @@
 /*
 * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
 *
 * AV1 codec dispatcher.  Populates V4L2_CID_STATELESS_AV1_SEQUENCE
 * (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
 *
 * Why a single SEQUENCE control and not the full V4L2_CID_STATELESS_AV1_*
 * family (FRAME, TILE_GROUP_ENTRY, FILM_GRAIN):
 *
 *   - The daedalus_v4l2 daemon path consumes the OUTPUT bitstream
 *     directly via libavcodec/libdav1d.  libdav1d needs a complete OBU
 *     stream that includes the sequence header — ffmpeg-vaapi strips the
 *     sequence header on the client side (its parser is split across
 *     VADecPictureParameterBufferAV1 + slice payload, with OBU_SEQUENCE_HEADER
 *     consumed and not re-emitted), so the daemon side has to synthesise
 *     it from the SEQUENCE ctrl.  The other AV1 ctrls (FRAME / TILE /
 *     FILM_GRAIN) are not needed for that synthesis — the OBU_FRAME_HEADER
 *     + OBU_TILE_GROUP that libdav1d also needs are still in the slice
 *     bitstream.
 *
 *   - The vpu981 (RK3588 dedicated AV1 hantro) hardware path doesn't
 *     consult these controls either — vpu981's driver parses the AV1
 *     bitstream directly.  So setting only SEQUENCE is correct for both
 *     destination decoders.
 *
 * Reference: marfrit/libva-v4l2-request-fourier issue #11
 *            (DAEMON-PPS-style sequence-header re-synthesis on the daemon
 *            side, paralleling the H.264 SPS/PPS work in DAEMON-PPS).
 *            kernel uAPI: <linux/v4l2-controls.h> @ 2891-2919.
 *            VAAPI:       <va/va_dec_av1.h> typedef
 *                         VADecPictureParameterBufferAV1.
 */
 #include "av1.h"
 #include "v4l2.h"
 #include "utils.h"
 #include <stdint.h>
 #include <string.h>
 #include <linux/v4l2-controls.h>
 #include <linux/videodev2.h>
 /*
 * VADecPictureParameterBufferAV1 reaches us transitively via surface.h →
 * va_backend.h → va.h → va_dec_av1.h (va_dec_av1.h doesn't compile
 * standalone; it needs va.h's VA_PADDING_LOW / va_deprecated machinery).
 */
 /* Compile-time UAPI shift guard, sibling to vp9.c's pattern. */
 _Static_assert(sizeof(struct v4l2_ctrl_av1_sequence) == 12,
 	       "v4l2_ctrl_av1_sequence size mismatch — kernel UAPI changed");
 /*
 * Map VAAPI bit_depth_idx (0/1/2 → 8/10/12) to the kernel ctrl's plain
 * uint8_t bit_depth field.  ffmpeg-vaapi sets idx from the bitstream
 * BitDepth value, so this is an exact inverse of AV1 spec 5.5.2.
 */
 static uint8_t av1_bit_depth_from_idx(uint8_t idx)
 {
 	switch (idx) {
 	case 0:  return 8;
 	case 1:  return 10;
 	case 2:  return 12;
 	default:
 		/* Spec-illegal; fall back to 8-bit rather than forward a bogus depth. */
 		return 8;
 	}
 }
 int av1_set_controls(struct request_data *driver_data,
 		     struct object_context *context,
 		     struct object_surface *surface_object)
 {
 	VADecPictureParameterBufferAV1 *picture =
 		&surface_object->params.av1.picture;
 	struct v4l2_ctrl_av1_sequence sequence;
 	struct v4l2_ext_control ctrls[1];
 	int rc;
 	(void)context;
 	memset(&sequence, 0, sizeof sequence);
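 	/*
 	 * Scalar mapping.  Names align with kernel uAPI; off-by-one and
 	 * idx→value translations are annotated.  VAAPI carries only the
 	 * current frame's dimensions, not the sequence header's
 	 * max_frame_{width,height}_minus_1, so those two fields are
 	 * approximated from frame_{width,height}_minus1.
 	 */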
 	sequence.seq_profile = picture->profile;
 	sequence.order_hint_bits =
 		(uint8_t)(picture->order_hint_bits_minus_1 + 1u);
 	sequence.bit_depth = av1_bit_depth_from_idx(picture->bit_depth_idx);
 	sequence.max_frame_width_minus_1 = picture->frame_width_minus1;
 	sequence.max_frame_height_minus_1 = picture->frame_height_minus1;
 	/*
 	 * Sequence-header flag mapping.  VAAPI exposes most of these directly
 	 * in seq_info_fields.fields.*; the ones without a 1:1 mirror
 	 * (V4L2_AV1_SEQUENCE_FLAG_ENABLE_WARPED_MOTION, _ENABLE_REF_FRAME_MVS,
 	 * _ENABLE_SUPERRES, _ENABLE_RESTORATION, _SEPARATE_UV_DELTA_Q) have
 	 * at best per-frame counterparts in VAAPI (e.g. pic_info_fields)
 	 * rather than a sequence-level field, so they are unobservable from
 	 * libva at SEQUENCE granularity and the corresponding bits are left
 	 * clear.  Caveat for the daedalus daemon's OBU synthesiser (issue #11
 	 * daemon track): AV1 frame-header syntax is conditional on these
 	 * flags (e.g. allow_warped_motion, use_ref_frame_mvs and use_superres
 	 * are only coded when the matching enable_* bit is set), so a
 	 * synthesised OBU_SEQUENCE_HEADER with them clear can make libdav1d
 	 * misparse the OBU_FRAME stream of a sequence that set them.  See
 	 * feedback memory `feedback_vaapi_blind_to_some_hevc_sps_fields` for
 	 * the precedent.
 	 */
 	if (picture->seq_info_fields.fields.still_picture)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_STILL_PICTURE;
 	if (picture->seq_info_fields.fields.use_128x128_superblock)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_USE_128X128_SUPERBLOCK;
 	if (picture->seq_info_fields.fields.enable_filter_intra)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_FILTER_INTRA;
 	if (picture->seq_info_fields.fields.enable_intra_edge_filter)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTRA_EDGE_FILTER;
 	if (picture->seq_info_fields.fields.enable_interintra_compound)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_INTERINTRA_COMPOUND;
 	if (picture->seq_info_fields.fields.enable_masked_compound)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_MASKED_COMPOUND;
 	if (picture->seq_info_fields.fields.enable_dual_filter)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_DUAL_FILTER;
 	if (picture->seq_info_fields.fields.enable_order_hint)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_ORDER_HINT;
 	if (picture->seq_info_fields.fields.enable_jnt_comp)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_JNT_COMP;
 	if (picture->seq_info_fields.fields.enable_cdef)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_ENABLE_CDEF;
 	if (picture->seq_info_fields.fields.mono_chrome)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_MONO_CHROME;
 	if (picture->seq_info_fields.fields.color_range)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_COLOR_RANGE;
 	if (picture->seq_info_fields.fields.subsampling_x)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_X;
 	if (picture->seq_info_fields.fields.subsampling_y)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_SUBSAMPLING_Y;
 	if (picture->seq_info_fields.fields.film_grain_params_present)
 		sequence.flags |= V4L2_AV1_SEQUENCE_FLAG_FILM_GRAIN_PARAMS_PRESENT;
 	/* Single-control batched submission. */
 	memset(ctrls, 0, sizeof ctrls);
 	ctrls[0].id   = V4L2_CID_STATELESS_AV1_SEQUENCE;
 	ctrls[0].ptr  = &sequence;
 	ctrls[0].size = sizeof sequence;
 	rc = v4l2_set_controls(driver_data->video_fd,
 			       surface_object->request_fd,
 			       ctrls, 1);
 	if (rc < 0)
 		return VA_STATUS_ERROR_OPERATION_FAILED;
 	return VA_STATUS_SUCCESS;
 }
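A minimal, self-contained sketch of the scalar translations in av1_set_controls, useful for unit-testing the off-by-one and idx→value rules without libva or AV1-capable kernel headers. The structs `va_av1_pic_subset` and `av1_seq_subset` and the helpers `bit_depth_from_idx` / `map_sequence` are hypothetical stand-ins; only the field names and the fall-back-to-8 policy come from the code above.

```c
#include <stdint.h>

/* Hypothetical subset of VADecPictureParameterBufferAV1. */
struct va_av1_pic_subset {
	uint8_t profile;
	uint8_t order_hint_bits_minus_1;
	uint8_t bit_depth_idx;		/* 0/1/2 -> 8/10/12 */
	uint16_t frame_width_minus1;
	uint16_t frame_height_minus1;
};

/* Hypothetical subset of struct v4l2_ctrl_av1_sequence. */
struct av1_seq_subset {
	uint8_t seq_profile;
	uint8_t order_hint_bits;
	uint8_t bit_depth;
	uint16_t max_frame_width_minus_1;
	uint16_t max_frame_height_minus_1;
};

/* Same policy as av1_bit_depth_from_idx: unknown indices fall back to 8. */
static uint8_t bit_depth_from_idx(uint8_t idx)
{
	static const uint8_t depths[] = { 8, 10, 12 };

	return idx < sizeof depths ? depths[idx] : 8;
}

static struct av1_seq_subset map_sequence(const struct va_av1_pic_subset *pic)
{
	struct av1_seq_subset seq = { 0 };

	seq.seq_profile = pic->profile;
	/* VAAPI stores "minus 1"; the ctrl wants the plain bit count. */
	seq.order_hint_bits = (uint8_t)(pic->order_hint_bits_minus_1 + 1u);
	seq.bit_depth = bit_depth_from_idx(pic->bit_depth_idx);
	/* Both sides use minus-1 encoding, so dimensions copy straight across. */
	seq.max_frame_width_minus_1 = pic->frame_width_minus1;
	seq.max_frame_height_minus_1 = pic->frame_height_minus1;
	return seq;
}
```

With `order_hint_bits_minus_1 = 6` and `bit_depth_idx = 1`, map_sequence yields order_hint_bits 7 and bit_depth 10; an out-of-range index such as 7 yields 8, matching the default branch of av1_bit_depth_from_idx.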
@@ -0,0 +1,39 @@
 /*
 * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
 *
 * AV1 codec dispatcher — populates V4L2_CID_STATELESS_AV1_SEQUENCE
 * (struct v4l2_ctrl_av1_sequence) from VAAPI's VADecPictureParameterBufferAV1.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM,
 * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
 * THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef _AV1_H_
 #define _AV1_H_
 #include "context.h"
 #include "request.h"
 #include "surface.h"
 int av1_set_controls(struct request_data *driver_data,
 		     struct object_context *context,
 		     struct object_surface *surface);
 #endif /* _AV1_H_ */
@@ -172,15 +172,20 @@ VAStatus RequestDestroyConfig(VADriverContextP context, VAConfigID config_id)
 static bool any_fd_supports_output_format(struct request_data *driver_data,
 					  unsigned int fmt)
 {
-	int fds[5] = {
+	int fds[6] = {
 		driver_data->video_fd,
 		driver_data->video_fd_rkvdec,
 		driver_data->video_fd_hantro,
 		driver_data->video_fd_rpi_hevc_dec,  /* iter40 */
 		driver_data->video_fd_vpu981,        /* ampere-av1 Phase 2 */
 #ifdef HAVE_DAEDALUS_V4L2
 		driver_data->video_fd_daedalus,      /* LIBVA-1: H.264/VP9/AV1 */
 #else
 		-1,
 #endif
 	};
 	int i;
-	for (i = 0; i < 5; i++) {
+	for (i = 0; i < 6; i++) {
 		if (fds[i] < 0) continue;
 		if (v4l2_find_format(fds[i], V4L2_BUF_TYPE_VIDEO_OUTPUT, fmt))
 			return true;
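The fds[5] → fds[6] change above fixes a drift between the array's declared length and the hard-coded loop bound: the #ifdef'd daedalus entry was a sixth initializer in a five-element array, which C compilers typically accept with only an excess-initializer warning while silently dropping the extra entry. A hedged alternative, not what this PR does, is to leave the array unsized and derive the bound from it. `ARRAY_SIZE`, `fake_find_format` and `any_fd_supports` are hypothetical names; the stand-in finder treats fd 7 as the only fd that supports the format.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local helper macro (hypothetical; the project may define its own). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Stand-in for v4l2_find_format(): only fd 7 "supports" the format. */
static bool fake_find_format(int fd)
{
	return fd == 7;
}

static bool any_fd_supports(int fd_a, int fd_b, int fd_c, int fd_opt)
{
	/* Unsized array: the initializer alone fixes its length, so adding
	 * or #ifdef-ing an entry can never desynchronise the loop bound. */
	const int fds[] = { fd_a, fd_b, fd_c, fd_opt };
	size_t i;

	for (i = 0; i < ARRAY_SIZE(fds); i++) {
		if (fds[i] < 0)
			continue;	/* device not opened */
		if (fake_find_format(fds[i]))
			return true;
	}
	return false;
}
```

With an unsized array the `#else -1` placeholder also becomes optional, since omitting the entry simply shortens the array.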
@@ -537,7 +537,9 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
 	 */
 	rc = request_pool_init(&driver_data->output_pool,
 			       driver_data->video_fd, driver_data->media_fd,
-			       output_type, 16);
+			       output_type, 16, pixelformat,
 			       (unsigned int)picture_width,
 			       (unsigned int)picture_height);
 	if (rc < 0) {
 		status = VA_STATUS_ERROR_ALLOCATION_FAILED;
 		goto error;
@@ -827,10 +827,63 @@ int h264_set_controls(struct request_data *driver_data,
 	dpb_update(context, &surface->params.h264.picture);
 	/*
 	 * Dump the raw VAAPI fields at the libva boundary so issue #8
 	 * follow-up can disambiguate "ffmpeg-vaapi didn't populate" from
 	 * "downstream consumer (daedalus_v4l2 wire protocol) corrupted the
 	 * value". One-line; safe to leave in — costs a single printf per frame.
 	 */
 	request_log("h264_set_controls: VAProfile=%d seq_fields=0x%08x pic_fields=0x%08x num_ref_frames=%u bit_depth_luma_m8=%u bit_depth_chroma_m8=%u w_mbs_m1=%u h_mbs_m1=%u\n",
 		    (int)profile,
 		    surface->params.h264.picture.seq_fields.value,
 		    surface->params.h264.picture.pic_fields.value,
 		    surface->params.h264.picture.num_ref_frames,
 		    surface->params.h264.picture.bit_depth_luma_minus8,
 		    surface->params.h264.picture.bit_depth_chroma_minus8,
 		    surface->params.h264.picture.picture_width_in_mbs_minus1,
 		    surface->params.h264.picture.picture_height_in_mbs_minus1);
 	h264_va_picture_to_v4l2(driver_data, context, surface,
 				&surface->params.h264.picture,
 				&decode, &pps, &sps);
 	/*
 	 * max_num_ref_frames fallback. Some VAAPI clients (older ffmpeg-vaapi
 	 * paths, some daedalus_v4l2 consumers) leave VAPicture->num_ref_frames
 	 * at zero. Hardware decoders tolerate this; libavcodec-via-daedalus
 	 * enforces sps.max_num_ref_frames strictly and rejects every frame.
 	 *
 	 * Count valid DPB entries first (a lower bound on the bitstream's
 	 * reference count, and the best estimate visible here); fall back to
 	 * a per-profile default (1 for Constrained Baseline, 4 otherwise) if
 	 * even that is 0. See marfrit/libva-v4l2-request-fourier issue #8.
 	 */
 	if (sps.max_num_ref_frames == 0) {
 		unsigned int valid = 0;
 		unsigned int i;
 		for (i = 0; i < 16; i++) {
 			const VAPictureH264 *ref =
 				&surface->params.h264.picture.ReferenceFrames[i];
 			if (!(ref->flags & VA_PICTURE_H264_INVALID))
 				valid++;
 		}
 		if (valid > 0) {
 			sps.max_num_ref_frames = (uint8_t)valid;
 		} else {
 			switch (profile) {
 			case VAProfileH264ConstrainedBaseline:
 				sps.max_num_ref_frames = 1;
 				break;
 			case VAProfileH264Main:
 			case VAProfileH264High:
 			case VAProfileH264MultiviewHigh:
 			case VAProfileH264StereoHigh:
 			default:
 				sps.max_num_ref_frames = 4;
 				break;
 			}
 		}
 	}
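The fallback above reduces to a small pure function, sketched here so it can be tested without libva. `struct ref_pic`, `PIC_INVALID`, `DPB_SIZE` and `enum profile_kind` are hypothetical stand-ins for VAPictureH264, VA_PICTURE_H264_INVALID, the 16-entry ReferenceFrames array and the VAProfile switch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for VAPictureH264 / VA_PICTURE_H264_INVALID. */
#define PIC_INVALID 0x1u
#define DPB_SIZE    16

struct ref_pic {
	unsigned int flags;
};

enum profile_kind { PROFILE_CONSTRAINED_BASELINE, PROFILE_OTHER };

/*
 * Value for sps.max_num_ref_frames when the client left it at zero:
 * the number of valid DPB entries if any, otherwise a per-profile
 * default (1 for Constrained Baseline, 4 for everything else).
 */
static uint8_t fallback_max_num_ref_frames(const struct ref_pic dpb[DPB_SIZE],
					   enum profile_kind profile)
{
	unsigned int valid = 0;
	unsigned int i;

	for (i = 0; i < DPB_SIZE; i++) {
		if (!(dpb[i].flags & PIC_INVALID))
			valid++;
	}
	if (valid > 0)
		return (uint8_t)valid;
	return profile == PROFILE_CONSTRAINED_BASELINE ? 1 : 4;
}
```

Preferring the DPB count keeps the value tied to what the stream actually references; the profile defaults only apply to the first frames, before any reference has been stored.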
 	/*
 	 * Populate the scaling matrix unconditionally: from VAAPI's
 	 * VAIQMatrixBufferH264 when the consumer sent one this frame
@@ -53,6 +53,7 @@ sources = [
 	'h265.c',
 	'vp8.c',
 	'vp9.c',
 	'av1.c',
 	'codec.c',
 	'nv15.c',
 	'nv12_col128.c',
@@ -36,6 +36,8 @@
 #include "mpeg2.h"
 #include "vp8.h"
 #include "vp9.h"
 #include "av1.h"
 #include "request_pool.h"
 #include <assert.h>
 #include <stdio.h>
@@ -54,6 +56,159 @@
 #include "autoconfig.h"
 /*
 * iter#15 — issue #15: ensure the in-flight surface's OUTPUT mmap has
 * room for `delta` more bytes appended to slices_size; if not, grow the
 * pool transparently via request_pool_resize.
 *
 * Sequence on overflow:
 *   1. Snapshot the surface's accumulated bytes to a temp heap buffer.
 *   2. Release the surface's OUTPUT pool slot back to FREE (resize
 *      requires no slot be borrowed).
 *   3. Compute new sizeimage = roundup(needed * 2, 4 KiB), and at least
 *      double the current source_size so geometric growth amortises
 *      repeated overruns at the same resolution.
 *   4. Call request_pool_resize.
 *   5. Re-acquire a pool slot (the new pool has fresh indices and fds).
 *   6. Re-mirror surface_object->source_{index,data,size,request_fd}
 *      from the new slot.
 *   7. Restore the saved bytes via memcpy into the new mmap.
 *
 * Returns VA_STATUS_SUCCESS on clean resize (or no resize needed) and
 * VA_STATUS_ERROR_ALLOCATION_FAILED on heap-alloc / V4L2 / kernel
 * failure — the libva client falls back to surface re-creation as
 * before the resize hook landed.
 *
 * NOTE on inline-Sync invariant: RequestEndPicture calls
 * RequestSyncSurface inline, so when codec_store_buffer runs no other
 * pool slot is borrowed across libva-driver-API entry points. The
 * temporary release-then-reacquire of the in-flight slot here keeps
 * that invariant intact across the resize.
 */
 static VAStatus
 codec_store_buffer_ensure_capacity(struct request_data *driver_data,
 				   struct object_surface *surface_object,
 				   size_t need)
 {
 	struct request_pool_slot *slot;
 	uint8_t *save_buf;
 	size_t save_size;
 	unsigned int saved_index;
 	size_t want_sizeimage;
 	unsigned int new_sizeimage;
 	int new_index;
 	int rc;
 	if (need <= surface_object->source_size)
 		return VA_STATUS_SUCCESS;
 	save_size = surface_object->slices_size;
 	save_buf = NULL;
 	if (save_size > 0) {
 		save_buf = malloc(save_size);
 		if (save_buf == NULL) {
 			request_log("codec_store_buffer_ensure_capacity: malloc(%zu) for resize-save failed\n",
 				    save_size);
 			return VA_STATUS_ERROR_ALLOCATION_FAILED;
 		}
 		memcpy(save_buf, surface_object->source_data, save_size);
 	}
 	/*
 	 * Temporarily release the in-flight slot. The slot's V4L2 buffer
 	 * has NOT been QBUF'd yet (QBUF lives in RequestEndPicture, after
 	 * this codec_store_buffer call), so the release is a clean
 	 * busy=false flip; no kernel state is in question.  The slot's
 	 * stale request_fd does not need to be saved — the resize closes
 	 * every slot's fd and the post-resize acquire below re-mirrors a
 	 * fresh slot's request_fd into surface_object->request_fd.
 	 */
 	saved_index = surface_object->source_index;
 	request_pool_release(&driver_data->output_pool, saved_index);
 	/*
 	 * Geometric growth: take the larger of 2× the required total and
 	 * 2× the current source_size, so a single resize covers the
 	 * triggering append plus comfortable headroom for the rest of this
 	 * frame.  Round up to a 4 KiB page boundary so the kernel's own
 	 * alignment doesn't waste pages.  Compute in size_t so that, on
 	 * 64-bit targets, the 2× doubling can't silently wrap at 2 GiB as a
 	 * 32-bit unsigned int would (sizeimage stays bounded by V4L2's u32,
 	 * but the doubling target could otherwise overflow before the clamp).
 	 */
 	want_sizeimage = need * 2;
 	if (want_sizeimage < (size_t)surface_object->source_size * 2)
 		want_sizeimage = (size_t)surface_object->source_size * 2;
 	if (want_sizeimage > 0x40000000u) /* 1 GiB hard cap — V4L2 sizeimage is u32 */
 		want_sizeimage = 0x40000000u;
 	want_sizeimage = (want_sizeimage + 0xFFFu) & ~(size_t)0xFFFu;
 	new_sizeimage = (unsigned int)want_sizeimage;
 	request_log("codec_store_buffer: OUTPUT-pool resize (need %zu > cap %u → new_sizeimage %u)\n",
 		    need, surface_object->source_size, new_sizeimage);
 	rc = request_pool_resize(&driver_data->output_pool, new_sizeimage);
 	if (rc < 0) {
 		/*
 		 * Resize failed. The original slot was already released
 		 * above, so surface_object->source_data is now pointing
 		 * at a FREE-but-still-borrowable mmap. Restore the
 		 * surface's slot mirror so EndPicture / DestroyContext
 		 * unwind paths see a consistent (if partial) state.
 		 *
 		 * If the resize aborted early (pre-STREAMOFF), the slot
 		 * is intact: re-acquiring the same index is the inverse
 		 * of the temporary release above. If it aborted later
 		 * (post-teardown), the slot's data/size were zeroed in
 		 * place by request_pool_resize and the re-acquire flips
 		 * busy=true on a dead slot — still safe, because the
 		 * caller will return ERROR_ALLOCATION_FAILED and the
 		 * libva consumer destroys the surface/context.
 		 */
 		(void)request_pool_acquire(&driver_data->output_pool);
 		free(save_buf);
 		return VA_STATUS_ERROR_ALLOCATION_FAILED;
 	}
 	new_index = request_pool_acquire(&driver_data->output_pool);
 	if (new_index < 0) {
 		free(save_buf);
 		return VA_STATUS_ERROR_ALLOCATION_FAILED;
 	}
 	slot = request_pool_slot(&driver_data->output_pool,
 				 (unsigned int)new_index);
 	if (slot == NULL) {
 		request_pool_release(&driver_data->output_pool,
 				     (unsigned int)new_index);
 		free(save_buf);
 		return VA_STATUS_ERROR_ALLOCATION_FAILED;
 	}
 	surface_object->source_index = slot->index;
 	surface_object->source_data = slot->data;
 	surface_object->source_size = slot->size;
 	surface_object->request_fd = slot->request_fd;
 	if (need > surface_object->source_size) {
 		/*
 		 * Kernel rounded the new sizeimage down below what we
 		 * needed — drivers may clamp at their per-codec ceiling.
 		 * Don't corrupt memory; surface the error to libva.
 		 */
 		request_log("codec_store_buffer_ensure_capacity: kernel returned sizeimage %u < required %zu\n",
 			    surface_object->source_size, need);
 		free(save_buf);
 		return VA_STATUS_ERROR_ALLOCATION_FAILED;
 	}
 	if (save_buf != NULL) {
 		memcpy(surface_object->source_data, save_buf, save_size);
 		free(save_buf);
 	}
 	return VA_STATUS_SUCCESS;
 }
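The sizeimage growth policy in codec_store_buffer_ensure_capacity can be isolated as a pure function for testing. This is a sketch mirroring the arithmetic above; `grown_sizeimage`, `GROW_CAP` and `ALIGN_4K_MASK` are hypothetical names.

```c
#include <stddef.h>

#define GROW_CAP      ((size_t)0x40000000u)	/* 1 GiB clamp, as above */
#define ALIGN_4K_MASK ((size_t)0xFFFu)

/*
 * Larger of 2x the required total and 2x the current capacity,
 * clamped to 1 GiB, then rounded up to a 4 KiB boundary.
 */
static size_t grown_sizeimage(size_t need, size_t current)
{
	size_t want = need * 2;

	if (want < current * 2)
		want = current * 2;
	if (want > GROW_CAP)
		want = GROW_CAP;
	return (want + ALIGN_4K_MASK) & ~ALIGN_4K_MASK;
}
```

Because each resize at least doubles the capacity, repeated overruns at one resolution are amortised. Clamping before rounding is safe here because the 1 GiB cap is itself 4 KiB-aligned.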
 static VAStatus codec_store_buffer(struct request_data *driver_data,
 				   struct object_context *context,
 				   VAProfile profile,
@@ -61,16 +216,36 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
 				   struct object_buffer *buffer_object)
 {
 	switch (buffer_object->type) {
-	case VASliceDataBufferType:
+	case VASliceDataBufferType: {
 		/*
 		 * Since there is no guarantee that the allocation
 		 * order is the same as the submission order (via
 		 * RenderPicture), we can't use a V4L2 buffer directly
 		 * and have to copy from a regular buffer.
 		 *
 		 * Capacity guard (issue #13 + #15): surface_object->source_data
 		 * points at an OUTPUT-pool mmap of size source_size, negotiated
 		 * at S_FMT time. A stream-level resolution upshift can produce
 		 * a slice larger than this allocation. Each append site below
 		 * computes the post-append running total and calls
 		 * codec_store_buffer_ensure_capacity, which transparently grows
 		 * the OUTPUT pool (request_pool_resize) so the existing memcpy
 		 * has room. The hard error path (VA_STATUS_ERROR_ALLOCATION_FAILED)
 		 * fires if the heap save buffer or the kernel-side grow fails, or
 		 * if the driver clamps sizeimage below what is needed; libavcodec
 		 * then recreates the surface.
 		 */
 		size_t need;
 		VAStatus ensure_rc;
 		if (context->h264_start_code) {
 			static const char start_code[3] = { 0x00, 0x00, 0x01 };
 			need = (size_t)surface_object->slices_size +
 			       sizeof(start_code);
 			ensure_rc = codec_store_buffer_ensure_capacity(
 				driver_data, surface_object, need);
 			if (ensure_rc != VA_STATUS_SUCCESS)
 				return ensure_rc;
 			memcpy(surface_object->source_data +
 			       surface_object->slices_size,
 			       start_code, sizeof(start_code));
@@ -104,19 +279,32 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
 			unsigned int header_size =
 				surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
 					10 : 3;
 			need = (size_t)surface_object->slices_size + header_size;
 			ensure_rc = codec_store_buffer_ensure_capacity(
 				driver_data, surface_object, need);
 			if (ensure_rc != VA_STATUS_SUCCESS)
 				return ensure_rc;
 			memset(surface_object->source_data +
 			       surface_object->slices_size,
 			       0, header_size);
 			surface_object->slices_size += header_size;
 		}
 		{
 			size_t payload = (size_t)buffer_object->size *
 					 buffer_object->count;
 			need = (size_t)surface_object->slices_size + payload;
 			ensure_rc = codec_store_buffer_ensure_capacity(
 				driver_data, surface_object, need);
 			if (ensure_rc != VA_STATUS_SUCCESS)
 				return ensure_rc;
 			memcpy(surface_object->source_data +
 				       surface_object->slices_size,
-		       buffer_object->data,
-		       buffer_object->size * buffer_object->count);
-		surface_object->slices_size +=
-			buffer_object->size * buffer_object->count;
+			       buffer_object->data, payload);
+			surface_object->slices_size += payload;
+		}
 		surface_object->slices_count++;
 		break;
 	}
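Each append site follows a check-then-copy pattern. The following userspace-only model is a hedged sketch of that pattern: a fixed-size region that, like an OUTPUT-pool mmap, can only be replaced rather than extended, so growing it means saving the bytes already written and restoring them into the new region. `struct out_buf`, `ensure_capacity` and `append` are hypothetical names; malloc/free stand in for the munmap / REQBUFS / CREATE_BUFS / mmap cycle.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the surface's OUTPUT mmap. */
struct out_buf {
	uint8_t *data;
	size_t size;	/* capacity, like source_size */
	size_t used;	/* bytes appended so far, like slices_size */
};

/* Save, drop, reallocate bigger, restore: same shape as the resize hook. */
static int ensure_capacity(struct out_buf *b, size_t need)
{
	uint8_t *save = NULL;
	size_t new_size;

	if (need <= b->size)
		return 0;
	if (b->used > 0) {
		save = malloc(b->used);
		if (save == NULL)
			return -1;
		memcpy(save, b->data, b->used);
	}
	free(b->data);		/* the old "mapping" is gone from here on */
	new_size = need * 2 > b->size * 2 ? need * 2 : b->size * 2;
	b->data = malloc(new_size);
	if (b->data == NULL) {
		b->size = 0;
		b->used = 0;
		free(save);
		return -1;
	}
	b->size = new_size;
	if (save != NULL) {
		memcpy(b->data, save, b->used);
		free(save);
	}
	return 0;
}

/* Every append computes the post-append total first, then copies. */
static int append(struct out_buf *b, const void *src, size_t len)
{
	if (ensure_capacity(b, b->used + len) != 0)
		return -1;
	memcpy(b->data + b->used, src, len);
	b->used += len;
	return 0;
}
```

Appending a 3-byte Annex-B start code and then a 100-byte payload triggers two grows (to 6 and then 206 bytes), and the start code survives the second one intact.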
 	case VAPictureParameterBufferType:
 		switch (profile) {
@@ -157,6 +345,12 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
 			       sizeof(surface_object->params.vp9.picture));
 			break;
 		case VAProfileAV1Profile0:
 			memcpy(&surface_object->params.av1.picture,
 			       buffer_object->data,
 			       sizeof(surface_object->params.av1.picture));
 			break;
 		default:
 			break;
 		}
@@ -318,6 +512,26 @@ static VAStatus codec_set_controls(struct request_data *driver_data,
 			return VA_STATUS_ERROR_OPERATION_FAILED;
 		break;
 	case VAProfileAV1Profile0:
 		/*
 		 * Populates V4L2_CID_STATELESS_AV1_SEQUENCE from
 		 * VADecPictureParameterBufferAV1.  The daedalus_v4l2 daemon
 		 * (issue #11 daemon track) synthesises an OBU_SEQUENCE_HEADER
 		 * from this ctrl and prepends it to the slice bitstream
 		 * before handing it to libavcodec/libdav1d, which otherwise
 		 * cannot parse the (sequence-header-stripped) OUTPUT buffer
 		 * that ffmpeg-vaapi delivers.
 		 *
 		 * On the RK3588 vpu981 hardware path the same SEQUENCE ctrl
 		 * is harmless: vpu981's driver parses the OBU stream
 		 * directly and ignores the ctrl payload, so no per-decoder
 		 * gating is required here.
 		 */
 		rc = av1_set_controls(driver_data, context, surface_object);
 		if (rc < 0)
 			return VA_STATUS_ERROR_OPERATION_FAILED;
 		break;
 	default:
 		return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
 	}
@@ -21,7 +21,10 @@
 #include "v4l2.h"
 int request_pool_init(struct request_pool *pool, int video_fd, int media_fd,
-		      unsigned int output_type, unsigned int count)
+		      unsigned int output_type, unsigned int count,
 		      unsigned int pixelformat,
 		      unsigned int picture_width,
 		      unsigned int picture_height)
 {
 	unsigned int index_base;
 	unsigned int length;
@@ -43,6 +46,16 @@ int request_pool_init(struct request_pool *pool, int video_fd, int media_fd,
 	pool->next = 0;
 	pool->media_fd = media_fd;	/* iter7: kept for force_release re-alloc */
 	/*
 	 * iter#15: cache the S_FMT params so request_pool_resize can
 	 * re-issue S_FMT with a sizeimage hint override on overrun.
 	 */
 	pool->video_fd = video_fd;
 	pool->output_type = output_type;
 	pool->pixelformat = pixelformat;
 	pool->picture_width = picture_width;
 	pool->picture_height = picture_height;
 	for (i = 0; i < count; i++)
 		pool->slots[i].request_fd = -1;
@@ -94,6 +107,118 @@ error:
 	return -1;
 }
 int request_pool_resize(struct request_pool *pool,
 			unsigned int new_sizeimage_min)
 {
 	unsigned int index_base;
 	unsigned int length;
 	unsigned int offset;
 	unsigned int saved_count;
 	unsigned int i;
 	int rc;
 	if (pool == NULL || !pool->initialized || pool->count == 0)
 		return -1;
 	/*
 	 * Pre-condition guard: no slot may be borrowed when we tear the
 	 * pool down. The caller in codec_store_buffer temporarily releases
 	 * the current in-flight surface's slot before invoking us; the
 	 * inline-Sync-in-EndPicture pattern guarantees no other slot is
 	 * borrowed elsewhere in the driver. Bail loudly if anyone breaks
 	 * that invariant rather than corrupting in-flight V4L2 state.
 	 */
 	for (i = 0; i < pool->count; i++) {
 		if (pool->slots[i].busy) {
 			request_log("request_pool_resize: slot %u still busy — "
 				    "caller must release before resize\n", i);
 			return -1;
 		}
 	}
 	saved_count = pool->count;
 	/* STREAMOFF the OUTPUT queue so REQBUFS(0) is accepted. */
 	rc = v4l2_set_stream(pool->video_fd, pool->output_type, false);
 	if (rc < 0)
 		return -1;
 	/*
 	 * Tear down every slot: munmap, close per-slot request_fd. Slot
 	 * fields are zeroed in place so failure halfway is recoverable.
 	 */
 	for (i = 0; i < pool->count; i++) {
 		if (pool->slots[i].data != NULL && pool->slots[i].size > 0) {
 			munmap(pool->slots[i].data, pool->slots[i].size);
 			pool->slots[i].data = NULL;
 			pool->slots[i].size = 0;
 		}
 		if (pool->slots[i].request_fd >= 0) {
 			close(pool->slots[i].request_fd);
 			pool->slots[i].request_fd = -1;
 		}
 	}
 	/*
 	 * Release the V4L2 OUTPUT buffer indices. REQBUFS(0) is the only
 	 * way to ask the kernel to free buffers so CREATE_BUFS can re-
 	 * allocate with a new per-buffer sizeimage.
 	 */
 	rc = v4l2_request_buffers(pool->video_fd, pool->output_type, 0);
 	if (rc < 0)
 		return -1;
 	/*
 	 * Re-issue S_FMT with the cached dimensions but a larger
 	 * sizeimage. The kernel may round up further (driver-specific
 	 * page / alignment rules); we accept whatever it returns and
 	 * pick that up from per-slot v4l2_query_buffer below.
 	 */
 	rc = v4l2_set_format_sizeimage(pool->video_fd, pool->output_type,
 				       pool->pixelformat,
 				       pool->picture_width,
 				       pool->picture_height,
 				       new_sizeimage_min);
 	if (rc < 0)
 		return -1;
 	rc = v4l2_create_buffers(pool->video_fd, pool->output_type,
 				 saved_count, &index_base);
 	if (rc < 0)
 		return -1;
 	for (i = 0; i < saved_count; i++) {
 		pool->slots[i].index = index_base + i;
 		pool->slots[i].busy = false;
 		rc = v4l2_query_buffer(pool->video_fd, pool->output_type,
 				       pool->slots[i].index,
 				       &length, &offset, 1);
 		if (rc < 0)
 			return -1;
 		pool->slots[i].data = mmap(NULL, length,
 					   PROT_READ | PROT_WRITE,
 					   MAP_SHARED, pool->video_fd, offset);
 		if (pool->slots[i].data == MAP_FAILED) {
 			pool->slots[i].data = NULL;
 			return -1;
 		}
 		pool->slots[i].size = length;
 		pool->slots[i].request_fd = media_request_alloc(pool->media_fd);
 		if (pool->slots[i].request_fd < 0)
 			return -1;
 	}
 	rc = v4l2_set_stream(pool->video_fd, pool->output_type, true);
 	if (rc < 0)
 		return -1;
 	pool->next = 0;
 	return 0;
 }
 void request_pool_destroy(struct request_pool *pool)
 {
 	unsigned int i;
@@ -52,16 +52,71 @@ struct request_pool {
 	int				 media_fd;	/* iter7: kept for
 							 * force_release re-alloc */
 	bool				 initialized;
 	/*
 	 * iter#15: cached S_FMT params from request_pool_init, so
 	 * request_pool_resize can re-S_FMT the OUTPUT queue with a new
 	 * sizeimage override on a mid-session resolution upshift overrun
 	 * without the caller having to re-thread these through six call
 	 * sites. video_fd is also cached so the resize is fully
 	 * self-contained — request_pool_resize takes only the pool and
 	 * the new sizeimage hint.
 	 */
 	int				 video_fd;
 	unsigned int			 output_type;
 	unsigned int			 pixelformat;
 	unsigned int			 picture_width;
 	unsigned int			 picture_height;
 };
 /*
 * Allocate count OUTPUT buffers via VIDIOC_CREATE_BUFS, query and mmap
 * each, populate pool->slots[]. Caller must have already done
- * VIDIOC_S_FMT on the OUTPUT queue. Returns 0 on success, -1 on
- * failure.
+ * VIDIOC_S_FMT on the OUTPUT queue. The S_FMT params (pixelformat,
+ * picture_width, picture_height) are stashed on the pool so that
 * request_pool_resize can re-issue S_FMT with the same dimensions but
 * a larger sizeimage hint. Returns 0 on success, -1 on failure.
 */
 int request_pool_init(struct request_pool *pool, int video_fd, int media_fd,
-		      unsigned int output_type, unsigned int count);
+		      unsigned int output_type, unsigned int count,
 		      unsigned int pixelformat,
 		      unsigned int picture_width,
 		      unsigned int picture_height);
 /*
 * iter#15: grow the OUTPUT pool's per-slot sizeimage in place.
 *
 * Issued from codec_store_buffer when an Annex-B start code / VP8
 * header pad / slice payload won't fit in the current
 * surface->source_size — i.e. the stream's per-frame bitstream budget
 * has outgrown the OUTPUT pool slot's mmap (typical cause: SPS-driven
 * resolution upshift mid-session).
 *
 * Steps:
 *   1. STREAMOFF the OUTPUT queue.
 *   2. munmap every slot, close every per-slot media-request fd.
 *   3. VIDIOC_REQBUFS(count=0) to release the V4L2 buffer indices.
 *   4. S_FMT with the cached pixelformat / picture_width /
 *      picture_height but a sizeimage hint of new_sizeimage_min.
 *   5. CREATE_BUFS with the original slot count.
 *   6. Per-slot: query buffer length, mmap, alloc fresh request_fd.
 *   7. STREAMON.
 *
 * Returns 0 on success, -1 on failure (caller falls back to
 * VA_STATUS_ERROR_ALLOCATION_FAILED — the libva consumer recreates
 * the surface at the new resolution).
 *
 * Pre-condition: NO pool slot is currently borrowed (busy=false on
 * every slot) AND no buffer is in-flight on the OUTPUT queue. The
 * inline-Sync-in-EndPicture pattern (RequestEndPicture calls
 * RequestSyncSurface before returning) makes this trivially true at
 * codec_store_buffer time for the only-supported single-context
 * single-render-surface flow: the in-flight surface's slot is the
 * sole borrowed slot, and the resize caller temporarily releases it
 * before calling here.
 */
 int request_pool_resize(struct request_pool *pool,
 			unsigned int new_sizeimage_min);
 /*
 * Munmap all slots and free the slots array. Idempotent.
@@ -122,6 +122,18 @@ struct object_surface {
 			VADecPictureParameterBufferVP9 picture;
 			VASliceParameterBufferVP9 slice;
 		} vp9;
 		struct {
 			/*
 			 * AV1 picture parameter buffer.  Slice params are
 			 * intentionally absent — the daedalus daemon track
 			 * (issue #11) consumes the slice OBU bytes directly
 			 * from the OUTPUT bitstream and synthesises only the
 			 * sequence-header OBU from
 			 * V4L2_CID_STATELESS_AV1_SEQUENCE.  No per-tile-group
 			 * struct→OBU re-synthesis is required from libva today.
 			 */
 			VADecPictureParameterBufferAV1 picture;
 		} av1;
 	} params;
 	int request_fd;
@@ -113,6 +113,28 @@ static void v4l2_setup_format(struct v4l2_format *format, unsigned int type,
 	}
 }
 static void v4l2_setup_format_sizeimage(struct v4l2_format *format,
 					unsigned int type,
 					unsigned int width, unsigned int height,
 					unsigned int pixelformat,
 					unsigned int sizeimage)
 {
 	memset(format, 0, sizeof(*format));
 	format->type = type;
 	if (v4l2_type_is_mplane(type)) {
 		format->fmt.pix_mp.width = width;
 		format->fmt.pix_mp.height = height;
 		format->fmt.pix_mp.plane_fmt[0].sizeimage = sizeimage;
 		format->fmt.pix_mp.pixelformat = pixelformat;
 	} else {
 		format->fmt.pix.width = width;
 		format->fmt.pix.height = height;
 		format->fmt.pix.sizeimage = sizeimage;
 		format->fmt.pix.pixelformat = pixelformat;
 	}
 }
 bool v4l2_find_format(int video_fd, unsigned int type, unsigned int pixelformat)
 {
 	struct v4l2_fmtdesc fmtdesc;
@@ -172,6 +194,30 @@ int v4l2_set_format(int video_fd, unsigned int type, unsigned int pixelformat,
 	return 0;
 }
 int v4l2_set_format_sizeimage(int video_fd, unsigned int type,
 			      unsigned int pixelformat,
 			      unsigned int width, unsigned int height,
 			      unsigned int sizeimage)
 {
 	struct v4l2_format format;
 	int rc;
 	if (sizeimage == 0)
 		return v4l2_set_format(video_fd, type, pixelformat, width, height);
 	v4l2_setup_format_sizeimage(&format, type, width, height, pixelformat,
 				    sizeimage);
 	rc = ioctl(video_fd, VIDIOC_S_FMT, &format);
 	if (rc < 0) {
 		request_log("Unable to set format (sizeimage=%u) for type %d: %s\n",
 			    sizeimage, type, strerror(errno));
 		return -1;
 	}
 	return 0;
 }
 int v4l2_get_format(int video_fd, unsigned int type, unsigned int *width,
 		    unsigned int *height, unsigned int *bytesperline,
 		    unsigned int *sizes, unsigned int *planes_count)
@@ -36,6 +36,17 @@ bool v4l2_find_format(int video_fd, unsigned int type,
 		      unsigned int pixelformat);
 int v4l2_set_format(int video_fd, unsigned int type, unsigned int pixelformat,
 		    unsigned int width, unsigned int height);
 /*
 * Same as v4l2_set_format but explicitly overrides the OUTPUT
 * sizeimage hint. Pass sizeimage=0 to get the v4l2_set_format default
 * (SOURCE_SIZE_MAX for OUTPUT, 0 for CAPTURE). Used by
 * request_pool_resize on a mid-session bitstream-budget overrun to
 * grow the OUTPUT pool slots past the SOURCE_SIZE_MAX floor.
 */
 int v4l2_set_format_sizeimage(int video_fd, unsigned int type,
 			      unsigned int pixelformat,
 			      unsigned int width, unsigned int height,
 			      unsigned int sizeimage);
 int v4l2_get_format(int video_fd, unsigned int type, unsigned int *width,
 		    unsigned int *height, unsigned int *bytesperline,
 		    unsigned int *sizes, unsigned int *planes_count);
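The 5939ac6ae0 commit message gives the sizeimage policy for a resize: 2× the required total, capped at 1 GiB, rounded up to a 4 KiB page. The arithmetic is easy to get wrong around overflow and the cap. This is a minimal sketch of it. `resize_sizeimage_hint` is a hypothetical name. Rejecting a requirement above the cap, instead of clamping to a hint that could not hold it, is an assumption; the source does not confirm it.

```c
#include <assert.h>
#include <stdint.h>

#define HINT_CAP  (UINT64_C(1) << 30) /* 1 GiB ceiling */
#define HINT_PAGE UINT64_C(4096)      /* 4 KiB page; must be a power of two */

/*
 * Hypothetical helper: sizeimage hint for an OUTPUT-pool resize, given the
 * total bytes the in-flight picture needs. Returns 0 when no hint within
 * the cap can hold `required`.
 */
static uint32_t resize_sizeimage_hint(uint64_t required)
{
	uint64_t hint;

	if (required == 0 || required > HINT_CAP)
		return 0;
	hint = 2 * required;     /* headroom for further growth */
	if (hint > HINT_CAP)
		hint = HINT_CAP; /* the cap is already page-aligned */
	hint = (hint + HINT_PAGE - 1) & ~(HINT_PAGE - 1);
	assert(hint >= required && hint <= HINT_CAP);
	return (uint32_t)hint;
}
```

Returning 0 as the failure value means callers must not forward it unchecked, because v4l2_set_format_sizeimage() treats sizeimage=0 as "use the SOURCE_SIZE_MAX default".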
Author	SHA1	Message	Date
marfrit	c454618ae1	Merge pull request 'picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun (#15)' (#16) from claude-noether/libva-v4l2-request-fourier:noether/output-pool-resize-issue-15 into master Reviewed-on: marfrit/libva-v4l2-request-fourier#16	2026-05-21 11:23:08 +00:00
claude-noether	5939ac6ae0	picture, request_pool: transparent OUTPUT-pool resize on bitstream overrun

Follow-up to #13 (PR #14, bounds-check floor). When a stream-level resolution upshift mid-session pushes an Annex-B start code, VP8 header pad, or slice payload past the OUTPUT pool slot's mmap, the bounds check used to return VA_STATUS_ERROR_ALLOCATION_FAILED and force the libva consumer to recreate the surface (losing the frame). This patch absorbs the resize transparently:

1. codec_store_buffer's three append sites call a new codec_store_buffer_ensure_capacity() before each memcpy/memset.
2. On overflow, ensure_capacity snapshots the in-flight surface's accumulated bytes, temporarily releases its OUTPUT pool slot, and calls request_pool_resize.
3. request_pool_resize STREAMOFFs the OUTPUT queue, munmaps every slot, closes every per-slot media-request fd, REQBUFS(0)s the V4L2 buffers, re-issues S_FMT with a sizeimage hint = 2× the required total (capped at 1 GiB, rounded up to a 4 KiB page), CREATE_BUFSes the original slot count, per-slot queries + mmaps + media_request_allocs, and STREAMONs.
4. ensure_capacity re-acquires a pool slot, re-mirrors source_{index,data,size,request_fd} onto the surface, and restores the saved bytes via memcpy.

The cached S_FMT params (pixelformat, picture_width, picture_height) are stashed on the request_pool at init time so the resize is fully self-contained: the caller passes only the new sizeimage hint. A new v4l2_set_format_sizeimage() helper accepts an explicit sizeimage override; v4l2_set_format keeps the SOURCE_SIZE_MAX (1 MiB) default for CreateContext-time S_FMT.

The pre-condition for the resize is "no pool slot may be borrowed." The inline-Sync-in-EndPicture pattern (RequestEndPicture calls RequestSyncSurface before returning) guarantees that during codec_store_buffer, the only borrowed slot is the current render_surface_id's, which the resize trigger explicitly releases before invoking the pool function. request_pool_resize asserts the invariant via a busy-scan and bails loudly if anyone breaks it rather than corrupting in-flight V4L2 state.

On resize failure: re-acquire the just-released slot (it was a clean busy=false flip; the resize aborted before tearing it down in the common case, or zeroed its mmap fields in the late-abort case; either way the re-acquire keeps surface_object's mirror internally consistent) and surface the original VA_STATUS_ERROR_ALLOCATION_FAILED so libva clients fall back to surface recreation as before this patch.

CAPTURE side is untouched: the V4L2 stateless API treats per-queue streaming independently, so STREAMOFF/STREAMON on OUTPUT does not disrupt the CAPTURE queue, and a resolution-upshift CAPTURE budget mismatch becomes a clean V4L2_BUF_FLAG_ERROR on the next DQBUF (handled by the existing surface error path).

Closes marfrit/libva-v4l2-request-fourier#15.	2026-05-21 13:11:55 +02:00
marfrit	2860d75afe	Merge pull request 'picture: bounds-check codec_store_buffer slice writes against source_size (#13)' (#14) from claude-noether/libva-v4l2-request-fourier:noether/codec-store-buffer-bounds-check-13 into master Reviewed-on: marfrit/libva-v4l2-request-fourier#14	2026-05-21 10:17:15 +00:00
claude-noether	bfcb286031	picture: bounds-check codec_store_buffer slice writes against source_size

surface_object->source_data points at an OUTPUT-pool mmap of fixed size source_size, negotiated by v4l2_query_buffer at request_pool_init time (kernel sizeimage at S_FMT). codec_store_buffer's VASliceDataBufferType branch appended to it at three sites (H.264 Annex-B start code, VP8 uncompressed-header pad, slice payload) without consulting that capacity. A stream-level resolution upshift would walk past the mmap and SIGSEGV inside the memcpy (mpv --hwdec=vaapi-copy on the daedalus path, issue #13) or corrupt adjacent heap (Firefox RDD).

Add a check at each append site that fails the RenderPicture call with VA_STATUS_ERROR_ALLOCATION_FAILED when slices_size+payload exceeds source_size, and logs the over-budget request for postmortem. libavcodec recreates the surface at the new dimensions on the next BeginPicture, so a refused upshift slice is recoverable.

Doesn't address the root cause (surfaces should be re-created on resolution change, or source_data should be grown on demand) but removes the memory-safety hazard while the larger refactor waits.

Closes marfrit/libva-v4l2-request-fourier#13.	2026-05-21 12:14:48 +02:00
marfrit	77f9236466	Merge pull request 'av1: populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls (#11 libva side)' (#12) from claude-noether/libva-v4l2-request-fourier:noether/av1-set-controls-bug-11 into master Reviewed-on: marfrit/libva-v4l2-request-fourier#12	2026-05-20 19:14:49 +00:00
claude-noether	9fa18f2312	av1: populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls

Implements the libva-side portion of issue #11: replaces PR #10's no-op AV1 dispatch with a real av1_set_controls that maps VAAPI's VADecPictureParameterBufferAV1.seq_info_fields + scalar fields onto struct v4l2_ctrl_av1_sequence (the kernel uAPI control declared at linux/v4l2-controls.h:2891-2919).

Daemon-track context (issue #11 daemon side, operator-owned): ffmpeg-vaapi splits the AV1 bitstream client-side and strips the OBU_SEQUENCE_HEADER before delivery; the V4L2 OUTPUT buffer contains only OBU_FRAME_HEADER + OBU_TILE_GROUP. libdav1d in the daedalus daemon cannot parse this; it expects a complete OBU stream. The daemon side has to synthesise OBU_SEQUENCE_HEADER from the SEQUENCE ctrl and prepend it to the slice bitstream. This libva-side change just makes the SEQUENCE ctrl populated and queued via S_EXT_CTRLS; the daemon track is the consumer.

Three small touch points beyond the new src/av1.{c,h}:
- src/surface.h: add an av1 leaf to surface->params holding VADecPictureParameterBufferAV1. Slice params intentionally absent: the daedalus daemon consumes the slice OBU bytes directly from the OUTPUT buffer; no per-tile-group struct → OBU re-synthesis required from libva today.
- src/picture.c: copy the picture-param buffer into the new leaf in RenderPicture, mirror of the per-codec memcpy pattern, plus call av1_set_controls from codec_set_controls (replacing the no-op).
- src/meson.build: register src/av1.c.

Sequence-field mapping covers everything VAAPI exposes at the sequence level (12 of 18 V4L2_AV1_SEQUENCE_FLAG_* bits + the four scalars). Bits VAAPI doesn't carry at the sequence level (WARPED_MOTION, REF_FRAME_MVS, SUPERRES, RESTORATION, SEPARATE_UV_DELTA_Q) stay clear; per-frame consumers (libdav1d via the daemon, vpu981 via the hardware path) read those from the OBU_FRAME_HEADER that is already in the slice buffer anyway. See feedback memory `feedback_vaapi_blind_to_some_hevc_sps_fields` for the precedent.

Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0, linux uAPI v4l2-controls.h sizeof(struct v4l2_ctrl_av1_sequence)==12): clean meson + ninja link of v4l2_request_drv_video.so, vainfo enumerates VAProfileAV1Profile0 via daedalus_v4l2 slot, av1_set_controls symbol present.

Out of scope on this PR (operator-track, issue #11 follow-up):
- daedalus-v4l2 kernel module wire-protocol extension (daedalus_collect_av1_meta + AV1 ctrl request_setup).
- daedalus daemon OBU synthesiser (~400 LoC AV1 OBU encoder in daemon/src/av1_obu_synth.{c,h}).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 21:13:07 +02:00
marfrit	9a9cfd05db	Merge pull request 'picture: no-op codec_set_controls case for VAProfileAV1Profile0' (#10) from noether/picture-av1-noop into master Reviewed-on: marfrit/libva-v4l2-request-fourier#10	2026-05-20 19:07:12 +00:00
marfrit	96d70af674	picture: no-op codec_set_controls case for VAProfileAV1Profile0

picture.c's codec_set_controls() switch was falling through to the default case for VAProfileAV1Profile0, returning VA_STATUS_ERROR_UNSUPPORTED_PROFILE. Result: vaEndPicture failed with status 12 ("requested VAProfile is not supported"), no OUTPUT buffer ever got queued, and the daedalus_v4l2 daemon never saw a REQ_DECODE for AV1.

config.c's VAProfileAV1Profile0 case (line 84-93) explicitly notes "Decode-side ctrl dispatch (V4L2_CID_STATELESS_AV1_) is NOT YET WIRED on master — vainfo will list the profile + CreateConfig succeeds, but consumers that submit decode buffers hit a NOP path". The NOP path was never actually wired in picture.c; it hit the default UNSUPPORTED_PROFILE branch instead.

Fix: add a VAProfileAV1Profile0 case that just `break;`s through without setting V4L2 controls. For the daedalus_v4l2 daemon path this is exactly the right shape: AV1 frame data is self-describing per OBU stream (no separate SPS/PPS controls needed at the V4L2 boundary), so the OUTPUT buffer alone is sufficient for the kernel to forward to the daemon.

Verified on higgs: ffmpeg -hwaccel vaapi -i av1.mkv now actually queues frames to /dev/video2 and the daemon's libdav1d context opens. Decode itself still fails (libdav1d wants the AV1 sequence header OBU, which ffmpeg-vaapi sends via VAPictureParameterBufferAV1 not via the slice buffer). That is a separate issue, needing an OBU sequence-header synthesiser in the daedalus daemon (analogous to the new H.264 SPS/PPS NAL synth in daedalus-v4l2/daemon/src/h264_nal_synth.c). That sequence-header synth work is a substantial follow-up; this patch unblocks AV1 reaching the daemon at all.

For RK3588 vpu981 (the originally-planned AV1 target), this remains a true NO-OP: when V4L2_CID_STATELESS_AV1_ dispatch lands from the av1-iter1 operator branch, replace the no-op with av1_set_controls(...).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 20:58:57 +02:00
marfrit	c1bb444d07	Merge pull request 'h264: max_num_ref_frames fallback + libva-boundary instrumentation (#8)' (#9) from claude-noether/libva-v4l2-request-fourier:noether/h264-3-set-controls-bitstream-bug-8 into master Reviewed-on: marfrit/libva-v4l2-request-fourier#9	2026-05-20 18:19:03 +00:00
claude-noether	0791f8e612	h264: max_num_ref_frames fallback + libva-boundary instrumentation

Closes the libva-side portion of marfrit/libva-v4l2-request-fourier#8. Two small additions to h264_set_controls:

1. When VAPicture->num_ref_frames is 0 (older ffmpeg-vaapi paths / some daedalus_v4l2 consumers), count valid (non-INVALID) DPB entries in ReferenceFrames[16]. If even that returns 0, fall back to a per-profile spec minimum (1 for baseline, 4 for main/high). Hardware decoders (rkvdec, hantro, rpi-hevc-dec) tolerated the prior 0; libavcodec-via-daedalus enforces sps.max_num_ref_frames strictly and rejected every frame.
2. One request_log line at function entry dumping the raw VAAPI fields (seq_fields.value, pic_fields.value, num_ref_frames, bit_depth_, picture__in_mbs_minus1). Disambiguates "ffmpeg-vaapi never populated" from "daedalus_v4l2 wire protocol corrupted" for the bit-fields-read-as-zero portion of issue #8.

Out of scope here (separate issue if pursued): profile_idc and level_idc remain session-derived. VAAPI's VAPictureParameterBufferH264 omits both (verified higgs libva 2.22.0-3, /usr/include/va/va.h: 3571-3622), the same VAAPI-blindspot family as the HEVC SPS fields. A real fix requires SPS-NAL parsing from surface->source_data OR a daedalus wire-protocol pass-through; both are operator design calls, not a libva-only patch.

Build verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0): clean ninja link of v4l2_request_drv_video.so, vainfo enumerates all 8 codec profiles, no init regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 20:17:27 +02:00
marfrit	989833114a	Merge pull request 'config: include video_fd_daedalus in profile enumeration probe' (#7) from claude-noether/libva-v4l2-request-fourier:noether/libva-2-config-profile-enum-daedalus into master Reviewed-on: marfrit/libva-v4l2-request-fourier#7	2026-05-20 14:52:11 +00:00
marfrit	d1ba4625d2	config: include video_fd_daedalus in profile enumeration probe

LIBVA-2 follow-up. RequestQueryConfigProfiles walks each known decoder fd via any_fd_supports_output_format() and adds a VAProfile* for each codec OUTPUT format the V4L2 device advertises. The fd list missed video_fd_daedalus, so on a Pi 5 with rpi-hevc-dec primary + daedalus_v4l2 alt, only S265 (HEVC) was probed and the H.264 / VP9 / AV1 profiles never got enumerated.

Effect on higgs: ffmpeg -hwaccel vaapi -i h264_test.mp4 reported "No support for codec h264 profile 578" before the per-codec dispatch in request_switch_device_for_profile could fire. The profile-578 (H264 Constrained Baseline) check happened during hwaccel init, found nothing in the libva profile list, and bailed without ever calling into the daedalus path.

Fix: extend the fds[] array in any_fd_supports_output_format from 5 to 6 entries, with the sixth being video_fd_daedalus when HAVE_DAEDALUS_V4L2 is on (and -1 otherwise so it's skipped by the `if (fds[i] < 0) continue;` guard). After the fix, daedalus_v4l2's OUTPUT format menu (VP9F + AV1F + S264) gets seen, and RequestQueryConfigProfiles returns VP9Profile0 + AV1Profile0 + the H264* profiles, all of which then route through the LIBVA-1 'd' kind override in request_switch_device_for_profile.

Verified on higgs:
Before:
  vainfo: Supported profile and entrypoints
    VAProfileHEVCMain : VAEntrypointVLD
  (only HEVC; H264/VP9/AV1 not enumerated)
  ffmpeg vaapi -i h264 → "No support for codec h264 profile 578"

Build clean on boltzmann (only config.c.o + request.c.o recompile). Backward-compatible on RK3399/3588: the new slot is gated by HAVE_DAEDALUS_V4L2 and video_fd_daedalus >= 0; both stay false in those deployments. Existing 5-fd probe order unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 16:45:33 +02:00
claude-noether	c332d34643	Merge pull request 'request: route VP9/AV1/H.264 to daedalus_v4l2 on Pi 5 mixed deploy' (#6) from claude-noether/libva-v4l2-request-fourier:noether/libva-1-per-codec-dispatch into master	2026-05-20 08:53:04 +00:00