Add libva-v4l2-request-ohm-gl-fix package

Mirrors phase6/step1/ from the ohm_gl_fix campaign. Contract-correct hantro multi-planar / chromium-149-era stateless H.264 port of bootlin's libva-v4l2-request, patches 0001..0018 + fourier-local. Honest characterisation in README: - Builds cleanly on chromium-builder LXC (boltzmann) - vainfo enumerates H.264 profiles cleanly with LIBVA_DRIVER_NAME=v4l2_request - NOT on Brave's decode path on ohm_gl_fix stack — Brave uses Chromium's own V4L2VideoDecoder in media/gpu/v4l2/. - Most likely useful for a future Firefox-via-libavcodec-vaapi campaign, modulo a separate Mesa-panfrost WSI pitch issue. - DEBUG patches (0010, 0011, 0014) intentionally kept in series for development; remove for cleaner production runs. Audit trail in the source repo at ohm_gl_fix: phase6/step1/audit_0008_decode_params_2026-05-01.md phase6/step1/api_contract_findings_2026-05-01.md phase3_remeasure_2026-05-02/B3_decoder_discovery.md (why this isn't on Brave's path)
2026-05-02 15:17:10 +00:00
parent 7a5ec202ff
commit b47938e0bc
21 changed files with 3692 additions and 0 deletions
@@ -0,0 +1,70 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu
 Fourier's local patch already wired multiplanar plumbing through
 src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69,
 struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane
 EXPBUF loop at line 411) and through src/context.c, src/buffer.c,
 src/picture.c via the v4l2_type_video_{output,capture}(video_format
 ->v4l2_mplane) helper calls.
 The remaining gap: the NV12 entry in src/video.c was hardcoded to
 v4l2_mplane=false, and the bootstrap path in src/surface.c was
 hardcoded to singleplanar literals before video_format is populated.
 This patch flips the NV12 entry to v4l2_mplane=true and updates the
 two singleplanar literals in src/surface.c to their MPLANE variants:
  - src/video.c:42  v4l2_mplane=false -> true (NV12 only;
    Sunxi-tiled NV12 left at false for cedrus compatibility)
  - src/surface.c:84  output_type = v4l2_type_video_output(true)
  - src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12)
 Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under
 V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type
 returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer
 strace baseline).
 Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are
 left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false,
 __arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the
 known userbase here is RK3566/RK3568 hantro.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 src/surface.c | 4 ++--
 src/video.c   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
 --- a/src/video.c
 +++ b/src/video.c
@@ -39,7 +39,7 @@ static struct video_format formats[] = {
 		.description		= "NV12 YUV",
 		.v4l2_format		= V4L2_PIX_FMT_NV12,
 		.v4l2_buffers_count	= 1,
 -		.v4l2_mplane		= false,
 +		.v4l2_mplane		= true,
 		.drm_format		= DRM_FORMAT_NV12,
 		.drm_modifier		= DRM_FORMAT_MOD_NONE,
 		.planes_count		= 2,
 --- a/src/surface.c
 +++ b/src/surface.c
@@ -81,7 +81,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
 	// we declare SET_FORMAT_OF_OUTPUT_ONCE to ensure v4l2_set_format only gets called once
 	// (in the first RequestCreateSurfaces2 call BEFORE any buffers are created later on)
 	unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;
 -	unsigned int output_type = v4l2_type_video_output(false);
 +	unsigned int output_type = v4l2_type_video_output(true);
 	if (!SET_FORMAT_OF_OUTPUT_ONCE) {
 		rc = v4l2_set_format(driver_data->video_fd, output_type, pixelformat,
@@ -106,7 +106,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
 			video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);
 		found = v4l2_find_format(driver_data->video_fd,
 -					 V4L2_BUF_TYPE_VIDEO_CAPTURE,
 +					 V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
 					 V4L2_PIX_FMT_NV12);
 		if (found)
 			video_format = video_format_find(V4L2_PIX_FMT_NV12);
@@ -0,0 +1,103 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] context: pre-STREAMON device controls and minimum OUTPUT pool
 Two related fixes that surfaced during the first hantro-vpu (RK3568)
 smoke test of the multiplanar build:
 1. **OUTPUT queue must be non-empty at STREAMON.** Hantro's
   vb2_start_streaming rejects an empty queue with EINVAL. Some
   VA-API callers (notably ffmpeg's vaapi-copy path) call
   vaCreateContext with num_render_targets=0 and allocate render
   targets lazily. The OUTPUT (bitstream-input) pool must NOT be
   sized off surfaces_count alone — it is a request-time resource,
   not per-surface. Quick fix: floor the pool to 4 buffers when
   the caller passes 0. (A proper decoupling of OUTPUT pool from
   surface lifecycle is documented in upstreamable_design.md.)
 2. **Device-wide stateless H.264 controls before STREAMON.** The
   V4L2 stateless framework requires V4L2_CID_STATELESS_H264_
   DECODE_MODE and START_CODE be set on the device fd
   (request_fd=-1) before stream start. Per-request controls
   (SPS/PPS/SLICE_PARAMS/etc.) attached to a request_fd come
   later via h264_set_controls(). hantro-vpu accepts only
   DECODE_MODE_FRAME_BASED; START_CODE_ANNEX_B matches what the
   existing slice-assembly path emits.
   This is set unconditionally for now (errors silently ignored)
   to keep cedrus and other backends compatible — they may
   default to SLICE_BASED and not expose DECODE_MODE at all.
   Probe-then-set via VIDIOC_QUERYCTRL is the upstream-correct
   approach (see upstreamable_design.md §3).
 After this patch, vainfo still enumerates as before, but the first
 mpv vaapi-copy attempt advances past STREAMON and into actual
 decode submission.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 src/context.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)
 --- a/src/context.c
 +++ b/src/context.c
@@ -64,6 +64,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
 	VAContextID id;
 	VAStatus status;
 	unsigned int output_type, capture_type;
 +	unsigned int output_buffers_count;
 	unsigned int index_base;
 	unsigned int index;
 	unsigned int i;
@@ -90,8 +91,16 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
 	}
 	memset(&context_object->dpb, 0, sizeof(context_object->dpb));
 +	/*
 +	 * The OUTPUT (bitstream-input) queue must be non-empty before
 +	 * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
 +	 * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
 +	 * vaCreateContext with num_render_targets=0; allocate a small
 +	 * minimum pool independent of the caller's surface count.
 +	 */
 +	output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
 	rc = v4l2_create_buffers(driver_data->video_fd, output_type,
 -				 surfaces_count, &index_base);
 +				 output_buffers_count, &index_base);
 	if (rc < 0) {
 		status = VA_STATUS_ERROR_ALLOCATION_FAILED;
 		goto error;
@@ -138,6 +147,33 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
 		surface_object->source_size = length;
 	}
 +	/*
 +	 * Stateless H.264 device-wide controls. The kernel V4L2 stateless
 +	 * framework requires DECODE_MODE and START_CODE be set on the
 +	 * device fd (request_fd=-1) before VIDIOC_STREAMON; per-request
 +	 * controls (SPS/PPS/etc.) attached to a request_fd come later.
 +	 *
 +	 * hantro-vpu (RK3568) accepts only DECODE_MODE_FRAME_BASED.
 +	 * START_CODE_ANNEX_B preserves leading 0x00000001 in the slice
 +	 * payload that h264.c assembles. Errors here are not fatal: not
 +	 * every backing driver supports both controls (e.g. cedrus may
 +	 * default to SLICE_BASED without exposing DECODE_MODE).
 +	 */
 +	{
 +		struct v4l2_ext_control dev_ctrls[2] = {
 +			{
 +				.id = V4L2_CID_STATELESS_H264_DECODE_MODE,
 +				.value = V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
 +			},
 +			{
 +				.id = V4L2_CID_STATELESS_H264_START_CODE,
 +				.value = V4L2_STATELESS_H264_START_CODE_ANNEX_B,
 +			},
 +		};
 +		(void)v4l2_set_controls(driver_data->video_fd, -1,
 +					dev_ctrls, 2);
 +	}
 +
 	rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
 	if (rc < 0) {
 		status = VA_STATUS_ERROR_OPERATION_FAILED;
@@ -0,0 +1,145 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] v4l2: add QUERYCTRL/QUERYMENU capability-probe helpers
 Pure utility additions, no behaviour change. Three helpers in
 src/v4l2.{c,h}:
  - v4l2_query_ext_ctrl(): wraps VIDIOC_QUERY_EXT_CTRL by CID.
    Returns 0 if the control exists, -1 if not. Caller passes NULL
    qec to test existence only.
  - v4l2_query_menu(): wraps VIDIOC_QUERYMENU at a given index.
    Returns 0 if a menu item exists at that index, -1 otherwise.
  - v4l2_ctrl_menu_has_value(): convenience layered on the above.
    For a menu/intmenu-type control, walks all menu items between
    minimum and maximum and returns true iff `value` is a valid
    entry. Used by callers that ask "does this driver accept menu
    value X for this CID?" without caring about iteration details.
 These unblock commit 3 (request_pool — needs ext-ctrl probing for
 codec-ops dispatch) and commit 4 (probe-then-set DECODE_MODE/
 START_CODE — replaces 0002's unconditional set with a real probe)
 of the upstreamable design's six-commit series.
 Forward-declarations in v4l2.h keep the header lean: existing
 prototypes already use opaque struct v4l2_ext_control * pointers
 without including <linux/videodev2.h>; we follow the same
 convention for struct v4l2_query_ext_ctrl and struct v4l2_querymenu.
 No call sites added in this commit. Compile-only verification:
 the .so links cleanly with three new exports.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 src/v4l2.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/v4l2.h | 33 +++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)
 --- a/src/v4l2.h
 +++ b/src/v4l2.h
@@ -64,4 +64,37 @@ int v4l2_set_control(int video_fd, int request_fd, unsigned int id, void *data,
 		     unsigned int size);
 int v4l2_set_stream(int video_fd, unsigned int type, bool enable);
 +/*
 + * Capability-probe helpers. These let calling code discover what the
 + * backing kernel driver supports rather than hardcoding assumptions
 + * about specific decoder hardware.
 + */
 +
 +/*
 + * Query the metadata of an extended control by CID. Fills *qec on
 + * success. Returns 0 if the control exists, -1 (errno=EINVAL) if the
 + * driver does not expose this CID. Pass qec=NULL to test existence
 + * only.
 + */
 +struct v4l2_query_ext_ctrl;
 +int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
 +			struct v4l2_query_ext_ctrl *qec);
 +
 +/*
 + * Query a single menu item of a menu/intmenu control at the given
 + * index. Fills *qm on success. Returns 0 if the menu item exists at
 + * this index, -1 otherwise.
 + */
 +struct v4l2_querymenu;
 +int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
 +		    struct v4l2_querymenu *qm);
 +
 +/*
 + * Convenience: for a menu-type control, return true iff `value` is a
 + * valid menu entry (i.e. the driver accepts it). Walks all menu items
 + * up to the control's maximum to check.
 + */
 +bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
 +			      unsigned int value);
 +
 #endif
 --- a/src/v4l2.c
 +++ b/src/v4l2.c
@@ -508,3 +508,63 @@ int v4l2_set_stream(int video_fd, unsigned int type, bool enable)
 	return 0;
 }
 +
 +int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
 +			struct v4l2_query_ext_ctrl *qec)
 +{
 +	struct v4l2_query_ext_ctrl local;
 +	struct v4l2_query_ext_ctrl *target = qec ? qec : &local;
 +	int rc;
 +
 +	memset(target, 0, sizeof(*target));
 +	target->id = id;
 +
 +	rc = ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, target);
 +	if (rc < 0)
 +		return -1;
 +
 +	return 0;
 +}
 +
 +int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
 +		    struct v4l2_querymenu *qm)
 +{
 +	int rc;
 +
 +	if (qm == NULL)
 +		return -1;
 +
 +	memset(qm, 0, sizeof(*qm));
 +	qm->id = id;
 +	qm->index = index;
 +
 +	rc = ioctl(video_fd, VIDIOC_QUERYMENU, qm);
 +	if (rc < 0)
 +		return -1;
 +
 +	return 0;
 +}
 +
 +bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
 +			      unsigned int value)
 +{
 +	struct v4l2_query_ext_ctrl qec;
 +	struct v4l2_querymenu qm;
 +	long long i;
 +
 +	if (v4l2_query_ext_ctrl(video_fd, id, &qec) < 0)
 +		return false;
 +
 +	if (qec.type != V4L2_CTRL_TYPE_MENU &&
 +	    qec.type != V4L2_CTRL_TYPE_INTEGER_MENU)
 +		return false;
 +
 +	for (i = qec.minimum; i <= qec.maximum; i += qec.step ? qec.step : 1) {
 +		if (v4l2_query_menu(video_fd, id, (unsigned int)i, &qm) < 0)
 +			continue;
 +		if ((unsigned int)i == value)
 +			return true;
 +	}
 +
 +	return false;
 +}
@@ -0,0 +1,545 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] context: introduce request_pool, decouple OUTPUT buffers from surfaces
 Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
 Replaces the prior per-surface OUTPUT-buffer ownership model with a
 small driver-wide pool sized by codec pipeline depth (4 H.264 frames
 in flight), allocated unconditionally regardless of caller's
 num_render_targets.
 Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
 v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
 v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
 "output_count == surfaces_count" model was a category error: OUTPUT
 buffers are request-time bitstream slots, CAPTURE buffers are
 picture-time DPB slots; their lifecycles and sizing are independent.
 Changes:
  * NEW src/request_pool.{c,h} (~200 LoC):
      - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
      - request_pool_destroy(): munmap all, idempotent.
      - request_pool_acquire(): round-robin claim; returns V4L2 buffer
        index of an unused slot or -1.
      - request_pool_release(): mark slot free for reuse.
      - request_pool_slot(): accessor for ptr/size given a buffer index.
  * src/request.h: add struct request_pool output_pool to request_data.
  * src/context.c::RequestCreateContext: replace the per-surface
    OUTPUT loop with a single request_pool_init() call (count=4,
    independent of surfaces_count). Drop the now-unused locals
    (length, offset, source_data, output_buffers_count, index,
    index_base, i, surface_object). DELETES patch 0002's
    "output_buffers_count = ... ? ... : 4" hack inline — the pool's
    own count parameter supersedes it.
  * src/picture.c::RequestBeginPicture: borrow a pool slot at frame
    start, write its mmap pointer/size/index into the surface's
    transient source_* fields. The fields stay (still useful as
    a borrow handle that the existing codec_store_buffer memcpys
    target), but no longer represent surface-permanent ownership.
    Reset slices_size/slices_count here too (was implicit on first
    Render).
  * src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
    the OUTPUT buffer, release the pool slot and clear the surface's
    borrow handle. Fixes the segv on second-frame submission.
  * src/surface.c::RequestDestroySurfaces: remove the munmap of
    source_data — pool owns the mmap.
  * src/request.c::RequestTerminate: call request_pool_destroy()
    before close(video_fd) so munmaps still target a valid fd.
  * src/meson.build: add request_pool.c and request_pool.h to the
    sources/headers lists.
 This commit removes 0002's OUTPUT-pool hack inline (the
 "floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
 remains until commit 4 lands.
 Build-verified clean on aarch64.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/request.h	2026-05-01 20:09:57.346428828 +0000
 +++ b/src/request.h	2026-05-01 20:17:57.497514185 +0000
@@ -31,6 +31,7 @@
 #include "context.h"
 #include "object_heap.h"
 +#include "request_pool.h"
 #include "video.h"
 #include <va/va.h>
@@ -55,6 +56,13 @@
 	int media_fd;
 	struct video_format *video_format;
 +
 +	/*
 +	 * OUTPUT (bitstream-input) buffer pool, decoupled from VA
 +	 * surfaces. Sized by codec pipeline depth, populated on first
 +	 * RequestCreateContext, torn down at driver Terminate.
 +	 */
 +	struct request_pool output_pool;
 };
 VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
 --- a/src/request.c	2026-05-01 20:09:57.346428828 +0000
 +++ b/src/request.c	2026-05-01 20:19:48.091143681 +0000
@@ -205,6 +205,13 @@
 	struct object_config *config_object;
 	int iterator;
 +	/*
 +	 * Tear down the OUTPUT buffer pool before closing video_fd so
 +	 * the munmap calls in request_pool_destroy() can still touch the
 +	 * mmap regions (which are tied to that fd's lifetime).
 +	 */
 +	request_pool_destroy(&driver_data->output_pool);
 +
 	close(driver_data->video_fd);
 	close(driver_data->media_fd);
 --- a/src/context.c	2026-05-01 20:09:57.346428828 +0000
 +++ b/src/context.c	2026-05-01 20:18:33.738048227 +0000
@@ -54,20 +54,12 @@
 {
 	struct request_data *driver_data = context->pDriverData;
 	struct object_config *config_object;
 -	struct object_surface *surface_object;
 	struct object_context *context_object = NULL;
 	struct video_format *video_format;
 -	unsigned int length;
 -	unsigned int offset;
 -	void *source_data = MAP_FAILED;
 	VASurfaceID *ids = NULL;
 	VAContextID id;
 	VAStatus status;
 	unsigned int output_type, capture_type;
 -	unsigned int output_buffers_count;
 -	unsigned int index_base;
 -	unsigned int index;
 -	unsigned int i;
 	int rc;
 	video_format = driver_data->video_format;
@@ -92,15 +84,20 @@
 	memset(&context_object->dpb, 0, sizeof(context_object->dpb));
 	/*
 -	 * The OUTPUT (bitstream-input) queue must be non-empty before
 -	 * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
 -	 * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
 -	 * vaCreateContext with num_render_targets=0; allocate a small
 -	 * minimum pool independent of the caller's surface count.
 +	 * Initialize the OUTPUT (bitstream-input) buffer pool. Sized by
 +	 * codec pipeline depth (4 H.264 frames in flight is sufficient
 +	 * for current hantro/rkvdec scheduling); independent of caller-
 +	 * supplied surfaces_count. Pool is owned by driver_data so it
 +	 * outlives any single context destroy/recreate cycle.
 +	 *
 +	 * This replaces the prior per-surface OUTPUT loop, which (a)
 +	 * created an empty queue when surfaces_count==0 (ffmpeg vaapi-
 +	 * copy path) and (b) only populated surface->source_* for
 +	 * surfaces present at vaCreateContext time, NULL-derefing on
 +	 * surfaces created later.
 	 */
 -	output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
 -	rc = v4l2_create_buffers(driver_data->video_fd, output_type,
 -				 output_buffers_count, &index_base);
 +	rc = request_pool_init(&driver_data->output_pool,
 +			       driver_data->video_fd, output_type, 4);
 	if (rc < 0) {
 		status = VA_STATUS_ERROR_ALLOCATION_FAILED;
 		goto error;
@@ -111,40 +108,15 @@
 	 * we don't have any indication wrt its life time. Let's make sure
 	 * its life span is under our control.
 	 */
 -	ids = malloc(surfaces_count * sizeof(VASurfaceID));
 -	if (ids == NULL) {
 -		status = VA_STATUS_ERROR_ALLOCATION_FAILED;
 -		goto error;
 -	}
 -
 -	memcpy(ids, surfaces_ids, surfaces_count * sizeof(VASurfaceID));
 -
 -	for (i = 0; i < surfaces_count; i++) {
 -		index = index_base + i;
 -
 -		surface_object = SURFACE(driver_data, surfaces_ids[i]);
 -		if (surface_object == NULL) {
 -			status = VA_STATUS_ERROR_INVALID_SURFACE;
 -			goto error;
 -		}
 -
 -		rc = v4l2_query_buffer(driver_data->video_fd, output_type,
 -				       index, &length, &offset, 1);
 -		if (rc < 0) {
 +	if (surfaces_count > 0) {
 +		ids = malloc(surfaces_count * sizeof(VASurfaceID));
 +		if (ids == NULL) {
 			status = VA_STATUS_ERROR_ALLOCATION_FAILED;
 			goto error;
 		}
 -		source_data = mmap(NULL, length, PROT_READ | PROT_WRITE,
 -				   MAP_SHARED, driver_data->video_fd, offset);
 -		if (source_data == MAP_FAILED) {
 -			status = VA_STATUS_ERROR_ALLOCATION_FAILED;
 -			goto error;
 -		}
 -
 -		surface_object->source_index = index;
 -		surface_object->source_data = source_data;
 -		surface_object->source_size = length;
 +		memcpy(ids, surfaces_ids,
 +		       surfaces_count * sizeof(VASurfaceID));
 	}
 	/*
@@ -200,9 +172,6 @@
 	goto complete;
 error:
 -	if (source_data != MAP_FAILED)
 -		munmap(source_data, length);
 -
 	if (ids != NULL)
 		free(ids);
 --- a/src/picture.c	2026-05-01 20:09:57.346428828 +0000
 +++ b/src/picture.c	2026-05-01 20:19:10.742593454 +0000
@@ -216,6 +216,8 @@
 	struct request_data *driver_data = context->pDriverData;
 	struct object_context *context_object;
 	struct object_surface *surface_object;
 +	struct request_pool_slot *slot;
 +	int slot_index;
 	context_object = CONTEXT(driver_data, context_id);
 	if (context_object == NULL)
@@ -228,6 +230,31 @@
 	if (surface_object->status == VASurfaceRendering)
 		RequestSyncSurface(context, surface_id);
 +	/*
 +	 * Borrow an OUTPUT (bitstream-input) slot from the driver-wide
 +	 * pool for the duration of this Begin/Render/End cycle. The
 +	 * surface's source_* fields hold the borrow's mmap pointer/size/
 +	 * V4L2 buffer index until RequestSyncSurface releases it after
 +	 * VIDIOC_DQBUF.
 +	 */
 +	slot_index = request_pool_acquire(&driver_data->output_pool);
 +	if (slot_index < 0)
 +		return VA_STATUS_ERROR_ALLOCATION_FAILED;
 +
 +	slot = request_pool_slot(&driver_data->output_pool,
 +				 (unsigned int)slot_index);
 +	if (slot == NULL) {
 +		request_pool_release(&driver_data->output_pool,
 +				     (unsigned int)slot_index);
 +		return VA_STATUS_ERROR_ALLOCATION_FAILED;
 +	}
 +
 +	surface_object->source_index = slot->index;
 +	surface_object->source_data = slot->data;
 +	surface_object->source_size = slot->size;
 +	surface_object->slices_size = 0;
 +	surface_object->slices_count = 0;
 +
 	surface_object->status = VASurfaceRendering;
 	context_object->render_surface_id = surface_id;
 --- a/src/surface.c	2026-05-01 20:09:57.346428828 +0000
 +++ b/src/surface.c	2026-05-01 20:19:35.490958060 +0000
@@ -254,10 +254,11 @@
 		if (surface_object == NULL)
 			return VA_STATUS_ERROR_INVALID_SURFACE;
 -		if (surface_object->source_data != NULL &&
 -		    surface_object->source_size > 0)
 -			munmap(surface_object->source_data,
 -			       surface_object->source_size);
 +		/*
 +		 * source_* are now transient borrows from request_pool, not
 +		 * surface-owned mappings; the pool owns the underlying mmap.
 +		 * Nothing to free here.
 +		 */
 		for (j = 0; j < surface_object->destination_buffers_count; j++)
 			if (surface_object->destination_map[j] != NULL &&
@@ -336,6 +337,15 @@
 		goto error;
 	}
 +	/*
 +	 * OUTPUT buffer is back from the kernel: return its pool slot
 +	 * for reuse and clear the surface's transient borrow handle.
 +	 */
 +	request_pool_release(&driver_data->output_pool,
 +			     surface_object->source_index);
 +	surface_object->source_data = NULL;
 +	surface_object->source_size = 0;
 +
 	rc = v4l2_dequeue_buffer(driver_data->video_fd, -1, capture_type,
 				 surface_object->destination_index,
 				 surface_object->destination_buffers_count);
 --- a/src/meson.build	2026-05-01 20:09:57.346428828 +0000
 +++ b/src/meson.build	2026-05-01 20:20:04.775389455 +0000
@@ -44,6 +44,7 @@
 	'v4l2.c',
 	'mpeg2.c',
 	'h264.c',
 +	'request_pool.c',
 #	'h265.c'
 ]
@@ -64,6 +65,7 @@
 	'v4l2.h',
 	'mpeg2.h',
 	'h264.h',
 +	'request_pool.h',
 #	'h265.h'
 ]
 --- a/src/request_pool.h	2025-09-03 18:38:22.431999998 +0000
 +++ b/src/request_pool.h	2026-05-01 20:17:37.517219722 +0000
@@ -0,0 +1,84 @@
 +/*
 + * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the
 + * "Software"), to deal in the Software without restriction.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
 + */
 +
 +#ifndef _REQUEST_POOL_H_
 +#define _REQUEST_POOL_H_
 +
 +#include <stdbool.h>
 +
 +/*
 + * OUTPUT (bitstream-input) buffer pool, decoupled from caller-allocated
 + * VA surfaces. Sizing is driven by codec pipeline depth (typically 4
 + * for H.264), not by the consumer's surface count.
 + *
 + * The pool owns the V4L2 buffer indices and their mmap pointers. A
 + * decode request "borrows" a slot at vaBeginPicture, fills it across
 + * vaRenderPicture calls, queues it at vaEndPicture, and releases it
 + * after VIDIOC_DQBUF returns.
 + *
 + * This replaces the per-surface OUTPUT-buffer ownership model in the
 + * pre-refactor code, where object_surface.source_* fields permanently
 + * held a single OUTPUT buffer per surface — incorrect because OUTPUT
 + * buffers are request-time resources, not picture-time resources, and
 + * because the per-surface loop in RequestCreateContext only ran when
 + * surfaces_count > 0 (breaking ffmpeg's vaapi-copy num_render_targets=0
 + * convention).
 + */
 +
 +struct request_pool_slot {
 +	unsigned int	index;		/* V4L2 buffer index in OUTPUT queue */
 +	void		*data;		/* mmap pointer for this slot */
 +	unsigned int	size;		/* mmap size in bytes */
 +	bool		busy;		/* true while borrowed for a request */
 +};
 +
 +struct request_pool {
 +	struct request_pool_slot	*slots;
 +	unsigned int			 count;
 +	unsigned int			 next;	/* round-robin acquire cursor */
 +	bool				 initialized;
 +};
 +
 +/*
 + * Allocate count OUTPUT buffers via VIDIOC_CREATE_BUFS, query and mmap
 + * each, populate pool->slots[]. Caller must have already done
 + * VIDIOC_S_FMT on the OUTPUT queue. Returns 0 on success, -1 on
 + * failure.
 + */
 +int request_pool_init(struct request_pool *pool, int video_fd,
 +		      unsigned int output_type, unsigned int count);
 +
 +/*
 + * Munmap all slots and free the slots array. Idempotent.
 + */
 +void request_pool_destroy(struct request_pool *pool);
 +
 +/*
 + * Claim the next free slot (round-robin). Returns the slot's V4L2
 + * buffer index on success (slot in pool->slots[] is determined by
 + * the returned index), or -1 if all slots are busy.
 + */
 +int request_pool_acquire(struct request_pool *pool);
 +
 +/*
 + * Mark the slot at pool->slots[i] free for reuse. Caller must pass the
 + * V4L2 buffer index returned earlier from request_pool_acquire().
 + */
 +void request_pool_release(struct request_pool *pool, unsigned int index);
 +
 +/*
 + * Look up the pool slot owning a given V4L2 buffer index. Returns
 + * pointer to the slot on success, NULL if the index is out of range.
 + * The returned pointer is valid until pool destruction; do not free.
 + */
 +struct request_pool_slot *request_pool_slot(struct request_pool *pool,
 +					    unsigned int index);
 +
 +#endif
 --- a/src/request_pool.c	2025-09-03 18:38:22.431999998 +0000
 +++ b/src/request_pool.c	2026-05-01 20:17:37.537220017 +0000
@@ -0,0 +1,147 @@
 +/*
 + * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the
 + * "Software"), to deal in the Software without restriction.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
 + */
 +
 +#include "request_pool.h"
 +
 +#include <errno.h>
 +#include <stdlib.h>
 +#include <string.h>
 +#include <sys/mman.h>
 +
 +#include "utils.h"
 +#include "v4l2.h"
 +
 +int request_pool_init(struct request_pool *pool, int video_fd,
 +		      unsigned int output_type, unsigned int count)
 +{
 +	unsigned int index_base;
 +	unsigned int length;
 +	unsigned int offset;
 +	unsigned int i;
 +	int rc;
 +
 +	if (pool == NULL || count == 0)
 +		return -1;
 +
 +	if (pool->initialized)
 +		return 0;
 +
 +	pool->slots = calloc(count, sizeof(*pool->slots));
 +	if (pool->slots == NULL)
 +		return -1;
 +
 +	pool->count = count;
 +	pool->next = 0;
 +
 +	rc = v4l2_create_buffers(video_fd, output_type, count, &index_base);
 +	if (rc < 0)
 +		goto error;
 +
 +	for (i = 0; i < count; i++) {
 +		pool->slots[i].index = index_base + i;
 +		pool->slots[i].busy = false;
 +
 +		rc = v4l2_query_buffer(video_fd, output_type,
 +				       pool->slots[i].index,
 +				       &length, &offset, 1);
 +		if (rc < 0)
 +			goto error;
 +
 +		pool->slots[i].data = mmap(NULL, length,
 +					   PROT_READ | PROT_WRITE,
 +					   MAP_SHARED, video_fd, offset);
 +		if (pool->slots[i].data == MAP_FAILED) {
 +			pool->slots[i].data = NULL;
 +			goto error;
 +		}
 +
 +		pool->slots[i].size = length;
 +	}
 +
 +	pool->initialized = true;
 +	return 0;
 +
 +error:
 +	request_pool_destroy(pool);
 +	return -1;
 +}
 +
 +void request_pool_destroy(struct request_pool *pool)
 +{
 +	unsigned int i;
 +
 +	if (pool == NULL || pool->slots == NULL)
 +		return;
 +
 +	for (i = 0; i < pool->count; i++) {
 +		if (pool->slots[i].data != NULL && pool->slots[i].size > 0)
 +			munmap(pool->slots[i].data, pool->slots[i].size);
 +	}
 +
 +	free(pool->slots);
 +	pool->slots = NULL;
 +	pool->count = 0;
 +	pool->next = 0;
 +	pool->initialized = false;
 +}
 +
 +int request_pool_acquire(struct request_pool *pool)
 +{
 +	unsigned int start;
 +	unsigned int i;
 +
 +	if (pool == NULL || !pool->initialized || pool->count == 0)
 +		return -1;
 +
 +	start = pool->next;
 +	for (i = 0; i < pool->count; i++) {
 +		unsigned int slot = (start + i) % pool->count;
 +
 +		if (!pool->slots[slot].busy) {
 +			pool->slots[slot].busy = true;
 +			pool->next = (slot + 1) % pool->count;
 +			return (int)pool->slots[slot].index;
 +		}
 +	}
 +
 +	/* All slots busy; caller must wait for an in-flight DQBUF. */
 +	return -1;
 +}
 +
 +void request_pool_release(struct request_pool *pool, unsigned int index)
 +{
 +	unsigned int i;
 +
 +	if (pool == NULL || pool->slots == NULL)
 +		return;
 +
 +	for (i = 0; i < pool->count; i++) {
 +		if (pool->slots[i].index == index) {
 +			pool->slots[i].busy = false;
 +			return;
 +		}
 +	}
 +}
 +
 +struct request_pool_slot *request_pool_slot(struct request_pool *pool,
 +					    unsigned int index)
 +{
 +	unsigned int i;
 +
 +	if (pool == NULL || pool->slots == NULL)
 +		return NULL;
 +
 +	for (i = 0; i < pool->count; i++) {
 +		if (pool->slots[i].index == index)
 +			return &pool->slots[i];
 +	}
 +
 +	return NULL;
 +}
@@ -0,0 +1,61 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] h264: submit PRED_WEIGHTS only when WEIGHTED_PRED applies
 Per kernel UAPI (include/uapi/linux/v4l2-controls.h),
 V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control:
    V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) :=
        ((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
         (slice_type == P || slice_type == SP)) ||
        (pps->weighted_bipred_idc == 1 && slice_type == B)
 Submitting PRED_WEIGHTS on a frame where the macro evaluates false
 triggers VIDIOC_S_EXT_CTRLS to return EINVAL at error_idx=5 (the
 6th, last control in the per-request batch) on hantro-vpu and any
 other driver that strictly enforces the spec.
 Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no
 weighted prediction): every per-frame batch fails identically, 13
 EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy
 falls back to software decode for every frame.
 Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5)
 when the macro returns false; keep at 6 when it returns true.
 Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not
 part of Sonnet's six-commit upstreamable plan; slotted in as patch
 0005 ahead of the planned probe-then-set / FRAME_BASED commits
 because it unblocks per-frame submission on every backing driver,
 not just hantro.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c	2026-05-01 20:17:02.108697824 +0000
 +++ b/src/h264.c	2026-05-01 20:30:02.632190563 +0000
@@ -559,8 +559,24 @@
 		}
 	};
 +	/*
 +	 * PRED_WEIGHTS is conditionally required per kernel UAPI:
 +	 * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
 +	 * true when explicit weighted prediction applies (P/SP slice
 +	 * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
 +	 * == 1). Submitting it unconditionally on a frame that does
 +	 * not need it triggers EINVAL at error_idx=5 on hantro and
 +	 * other drivers that strictly enforce the spec.
 +	 *
 +	 * controls[5] is PRED_WEIGHTS (last in array); narrow the
 +	 * submission count to exclude it when not required.
 +	 */
 +	unsigned int num_controls = 6;
 +	if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
 +		num_controls = 5;
 +
 	rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
 -			       controls, 6);
 +			       controls, num_controls);
 	if (rc < 0)
 		return VA_STATUS_ERROR_OPERATION_FAILED;
@@ -0,0 +1,128 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] h264: omit per-slice controls in FRAME_BASED mode
 Identified by cross-reference against GStreamer's
 gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit
 9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and
 PRED_WEIGHTS submission on is_slice_based(self):
    if (is_slice_based (self)) {
        control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
        ...
        control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
        ...
    }
 In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the
 bitstream itself from the OUTPUT-queue payload; per-slice controls in
 the request trigger cluster-validation EINVAL at error_idx=count
 (observed on RK3568 hantro-vpu, kernel 6.19.10).
 This patch:
  - Reorders controls[] so FRAME_BASED-required entries come first
    (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the
    SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at
    indices 4..5).
  - Defaults num_controls=4 (FRAME_BASED), expanding to 5 for
    SLICE_BASED and 6 when V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
  - Hardcodes slice_based=false for now since patch 0002 sets the
    device to FRAME_BASED unconditionally. A TODO marks the spot for
    the planned probe-then-set commit, which will populate
    context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/
    G_EXT_CTRLS and replace the hardcoded false with a runtime check.
 Diagnosis chain:
  - patch 0005 reduced one EINVAL per frame on PRED_WEIGHTS
    submission, but cluster-level rejection persisted at error_idx=5
    (count) — meaning kernel walked all 5 controls cleanly but
    rejected the request as a whole.
  - dmesg silent → rejection in V4L2 core (v4l2-ctrls-request.c /
    v4l2-h264.c), not in hantro driver where it could log.
  - GStreamer reference confirmed FRAME_BASED contract: only 4
    sequence-and-frame-level controls go in the per-request batch.
 After this patch the kernel should accept the per-request controls
 and actually decode the bitstream into the CAPTURE buffer.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c	2026-05-01 20:30:02.632190563 +0000
 +++ b/src/h264.c	2026-05-01 20:49:46.937497317 +0000
@@ -531,6 +531,21 @@
 	sps.profile_idc = h264_profile_to_idc(profile);
 +	/*
 +	 * Per-request control batch, ordered so the controls REQUIRED in
 +	 * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
 +	 * (indices 0..3) and the SLICE_BASED-only controls come last
 +	 * (indices 4..5).
 +	 *
 +	 * Cross-reference: GStreamer gst-plugins-bad
 +	 * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
 +	 * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
 +	 * is_slice_based(self); under FRAME_BASED only SPS/PPS/
 +	 * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
 +	 * parses the bitstream itself in FRAME_BASED mode; submitting
 +	 * per-slice controls in that mode triggers cluster-validation
 +	 * EINVAL at error_idx=count.
 +	 */
 	struct v4l2_ext_control controls[6] = {
 		{
 			.id = V4L2_CID_STATELESS_H264_SPS,
@@ -545,14 +560,14 @@
 			.p_h264_scaling_matrix = &matrix,
 			.size = sizeof(matrix),
 		}, {
 -			.id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
 -			.p_h264_slice_params = &slice,
 -			.size = sizeof(slice),
 -		}, {
 			.id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
 			.p_h264_decode_params = &decode,
 			.size = sizeof(decode),
 		}, {
 +			.id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
 +			.p_h264_slice_params = &slice,
 +			.size = sizeof(slice),
 +		}, {
 			.id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
 			.ptr = &weights,
 			.size = sizeof(weights),
@@ -560,20 +575,24 @@
 	};
 	/*
 -	 * PRED_WEIGHTS is conditionally required per kernel UAPI:
 -	 * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
 -	 * true when explicit weighted prediction applies (P/SP slice
 -	 * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
 -	 * == 1). Submitting it unconditionally on a frame that does
 -	 * not need it triggers EINVAL at error_idx=5 on hantro and
 -	 * other drivers that strictly enforce the spec.
 +	 * Decode-mode dispatch. Patch 0002 unconditionally sets the
 +	 * device to FRAME_BASED, so we hardcode that here. When the
 +	 * planned probe-then-set commit lands, slice_based becomes
 +	 *     context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
 +	 * with context->decode_mode populated at CreateContext via
 +	 * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
 	 *
 -	 * controls[5] is PRED_WEIGHTS (last in array); narrow the
 -	 * submission count to exclude it when not required.
 +	 * FRAME_BASED:    4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
 +	 * SLICE_BASED:   +SLICE_PARAMS (always), +PRED_WEIGHTS (when
 +	 *                V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
 	 */
 -	unsigned int num_controls = 6;
 -	if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
 +	const bool slice_based = false; /* TODO: probe via context->decode_mode */
 +	unsigned int num_controls = 4;
 +	if (slice_based) {
 		num_controls = 5;
 +		if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
 +			num_controls = 6;
 +	}
 	rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
 			       controls, num_controls);
@@ -0,0 +1,59 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] context: enable ANNEX_B start-code emission to match device
 Patch 0002 sets V4L2_CID_STATELESS_H264_START_CODE to ANNEX_B on the
 device, telling the kernel that OUTPUT-buffer payloads will contain
 0x00 0x00 0x01 NAL start codes. picture.c::codec_store_buffer has
 the prepend logic guarded by `if (context->h264_start_code)`, but
 that boolean is set ONLY inside h264_get_controls() — a function
 that exists but is never called.
 Result: device expects ANNEX_B, libva-v4l2-request feeds raw NAL
 payloads with no start codes, kernel cannot find slice boundaries,
 hantro emits a zeroed CAPTURE buffer. mpv reports successful decode
 because the V4L2 round-trip succeeds (no EINVAL); the visual output
 is a flat dark-green frame (NV12 zero through BT.709).
 Identified via:
  - Patch 0006 cleared the EINVAL cluster-rejection (128 → 0 on
    bbb_1080p30) but visual output remained flat green.
  - GStreamer reference (gstv4l2codech264dec.c:1363-1377) confirms
    start codes are required when ANNEX_B is selected.
  - Source-archaeology of fourier's picture.c:67-74 showed the gate
    on context->h264_start_code.
 Fix: in context.c::RequestCreateContext, immediately after patch
 0002's device-control block, set context_object->h264_start_code =
 true to match the ANNEX_B mode we just programmed. Hardcoded for
 now (matches 0002's hardcoded set); replaced with a runtime probe
 in the planned probe-then-set commit.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/context.c	2026-05-01 20:48:59.884816330 +0000
 +++ b/src/context.c	2026-05-01 20:59:54.446340219 +0000
@@ -146,6 +146,23 @@
 					dev_ctrls, 2);
 	}
 +	/*
 +	 * Mirror the ANNEX_B start-code mode set on the device above
 +	 * into context_object->h264_start_code so picture.c::
 +	 * codec_store_buffer prepends 0x00 0x00 0x01 to each slice
 +	 * payload it copies into the OUTPUT buffer. Without this, the
 +	 * kernel — which we just told to expect ANNEX_B — sees a raw
 +	 * NAL stream with no start codes, fails to find slice
 +	 * boundaries, and emits a zeroed CAPTURE buffer (visually a
 +	 * flat dark-green frame).
 +	 *
 +	 * h264_get_controls() exists for this purpose but is never
 +	 * called in the current code path; the planned probe-then-set
 +	 * commit will replace this hardcoded assignment with a runtime
 +	 * read of the kernel's accepted START_CODE value.
 +	 */
 +	context_object->h264_start_code = true;
 +
 	rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
 	if (rc < 0) {
 		status = VA_STATUS_ERROR_OPERATION_FAILED;
@@ -0,0 +1,87 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] h264: fill DECODE_PARAMS frame_num + field flags from VAAPI
 Fourier's h264_va_picture_to_v4l2 only populated four fields of the
 struct v4l2_ctrl_h264_decode_params: dpb (via h264_fill_dpb),
 nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt, and the
 IDR_PIC flag. Many other required-by-spec fields were left at zero-
 init (frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
 dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
 slice_group_change_cycle, FIELD_PIC and BOTTOM_FIELD flags).
 For an IDR (first frame) on hantro-vpu RK3568, the kernel parses
 the bitstream from the OUTPUT buffer and uses these fields to drive
 its bitstream-element offset tracking. Empirically the kernel
 returned a successfully-decoded but ZEROED CAPTURE buffer — flat
 dark-green frames in mpv output, no errors logged.
 This patch fills every field VAAPI exposes:
  - frame_num: from VAPicture->frame_num.
  - FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag.
  - BOTTOM_FIELD flag: from
    VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD.
 Also corrects the IDR_PIC flag to use |= instead of = so the new
 field flags don't clobber it.
 Fields NOT derivable from VAAPI's pre-parsed structures —
 idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
 dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
 slice_group_change_cycle — require a slice_header() bit-level
 parse. libva-v4l2-request does not currently do this. They remain
 at zero-init.
 Empirical question this patch answers: does hantro tolerate the
 bit_size fields being zero for IDR frames, or does it strictly
 require them? If post-patch CAPTURE is still zeroed, a slice-header
 parser is required. If CAPTURE shows real picture data, hantro
 fills in the bit-positions itself when no hint is supplied.
 Cross-reference: gstv4l2codech264dec.c::
 gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775,
 lines 632-678).
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c	2026-05-01 20:59:41.710154198 +0000
 +++ b/src/h264.c	2026-05-01 21:16:35.712995986 +0000
@@ -243,13 +243,34 @@
 	h264_fill_dpb(driver_data, context, decode);
 -	//decode->num_slices = surface->slices_count;
 +	/*
 +	 * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
 +	 * we can derive from VAAPI's pre-parsed VAPictureParameterBuffer
 +	 * + bitstream byte. Cross-reference: GStreamer
 +	 * gstv4l2codech264dec.c::gst_v4l2_codec_h264_dec_fill_decoder_params
 +	 * (lines 632-678).
 +	 *
 +	 * Fields not derivable from VAAPI (idr_pic_id, pic_order_cnt_lsb,
 +	 * delta_pic_order_cnt_*, dec_ref_pic_marking_bit_size,
 +	 * pic_order_cnt_bit_size, slice_group_change_cycle) require a
 +	 * full slice_header() bit-level parse, which libva-v4l2-request
 +	 * does not currently do. They are left at zero-init and the
 +	 * kernel-side hantro-vpu may compute them itself when scanning
 +	 * the OUTPUT bitstream — a hypothesis verified empirically by
 +	 * running this patch and inspecting the CAPTURE buffer.
 +	 */
 	decode->nal_ref_idc = nal_ref_idc;
 -	if (nal_unit_type == 5)
 -		decode->flags = V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
 +	decode->frame_num = VAPicture->frame_num;
 	decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
 	decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
 +	if (nal_unit_type == 5)
 +		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
 +	if (VAPicture->pic_fields.bits.field_pic_flag)
 +		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC;
 +	if (VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD)
 +		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD;
 +
 	pps->weighted_bipred_idc =
 		VAPicture->pic_fields.bits.weighted_bipred_idc;
 	pps->pic_init_qs_minus26 = VAPicture->pic_init_qs_minus26;
@@ -0,0 +1,58 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] surface: don't VIDIOC_S_FMT the CAPTURE queue
 The hantro stateless decoder derives the CAPTURE format from the
 SPS attached to the per-request OUTPUT controls. Calling
 VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can
 leave the driver's vb2 state in an inconsistent configuration
 where the queue accepts buffers and DQBUF returns successfully but
 the kernel never actually writes decoded pixels into them.
 Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/
 gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side
 (via gst_v4l2_decoder_negotiate_src_format and friends). The
 same code path produces correctly-decoded NV12 frames on the
 same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT
 emits flat-green zeroed CAPTURE buffers.
 The v4l2_get_format() call immediately after this block already
 gives us the bytesperline / sizes the driver chose; nothing else
 in this file consumed the explicit S_FMT side-effects.
 Empirical hypothesis test for the lingering "kernel decodes
 without errors but emits zeroed CAPTURE" bug. If post-patch
 output shows actual picture content, this confirms the
 diagnosis: explicit CAPTURE format mutation breaks hantro's
 internal state. If output remains flat-green, the bug is
 elsewhere and we resume hex-dump-grade instrumentation.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/surface.c	2026-05-01 21:16:19.588759711 +0000
 +++ b/src/surface.c	2026-05-01 21:41:12.095146549 +0000
@@ -118,10 +118,20 @@
 		capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
 -		rc = v4l2_set_format(driver_data->video_fd, capture_type,
 -				     video_format->v4l2_format, width, height);
 -		if (rc < 0)
 -			return VA_STATUS_ERROR_OPERATION_FAILED;
 +		/*
 +		 * Do not VIDIOC_S_FMT on the CAPTURE queue. The hantro
 +		 * stateless decoder derives the CAPTURE format from the
 +		 * SPS attached to the OUTPUT request; explicitly setting
 +		 * it here can put the driver into an inconsistent state.
 +		 * GStreamer's v4l2slh264dec only G_FMTs CAPTURE (see
 +		 * gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c::
 +		 * gst_v4l2_decoder_negotiate_src_format), and that
 +		 * variant produces correct decoded NV12 on the same
 +		 * hardware where this driver currently emits zeros.
 +		 *
 +		 * v4l2_get_format() below queries the driver's current
 +		 * state and gives us the bytesperline/sizes we need.
 +		 */
         } else {
 		video_format = driver_data->video_format;
 		capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
@@ -0,0 +1,101 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame
 Diagnostic-only patch (NOT for upstream). Hex-dumps:
  - First 32 bytes of OUTPUT buffer at QBUF time in
    picture.c::RequestEndPicture (i.e. what we feed the kernel)
  - First 32 bytes of CAPTURE Y-plane after DQBUF in
    surface.c::RequestSyncSurface (i.e. what kernel returned)
 Lets us see whether:
  - OUTPUT bitstream begins with valid ANNEX_B start code + NAL
    header byte (e.g. `00 00 01 65` for IDR slice)
  - CAPTURE Y-plane after decode contains varied luma data
    (working) vs. all-zeros / repeating pattern (kernel didn't
    write anything).
 Removed once Step 1 decode is verified working. Output goes via
 existing request_log() to stderr.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/picture.c	2026-05-01 21:41:00.114969150 +0000
 +++ b/src/picture.c	2026-05-01 21:50:11.123117853 +0000
@@ -36,6 +36,7 @@
 #include "mpeg2.h"
 #include <assert.h>
 +#include <stdio.h>
 #include <string.h>
 #include <errno.h>
@@ -354,6 +355,27 @@
 	if (rc < 0)
 		return VA_STATUS_ERROR_OPERATION_FAILED;
 +	/*
 +	 * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
 +	 * OUTPUT buffer at the moment we hand it to the kernel. Helps
 +	 * pin down whether our bitstream prepend logic is correct.
 +	 * For a valid ANNEX_B IDR slice the dump should start
 +	 * 00 00 01 65 ... (00 00 01 = start code; 0x65 = nal_ref_idc=3,
 +	 * nal_unit_type=5 = IDR slice). Removed once Step 1 decode is
 +	 * verified working.
 +	 */
 +	{
 +		const unsigned char *p = surface_object->source_data;
 +		char hex[32 * 3 + 1] = { 0 };
 +		unsigned int i, n = surface_object->slices_size < 32 ?
 +				    surface_object->slices_size : 32;
 +		for (i = 0; i < n; i++)
 +			snprintf(hex + i * 3, 4, " %02x", p[i]);
 +		request_log("OUTPUT[idx=%u, len=%u]:%s\n",
 +			    surface_object->source_index,
 +			    surface_object->slices_size, hex);
 +	}
 +
 	rc = v4l2_queue_buffer(driver_data->video_fd, request_fd, output_type,
 			       &surface_object->timestamp,
 			       surface_object->source_index,
 --- a/src/surface.c	2026-05-01 21:41:12.095146549 +0000
 +++ b/src/surface.c	2026-05-01 21:50:15.895188360 +0000
@@ -29,6 +29,7 @@
 #include <assert.h>
 #include <errno.h>
 +#include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
@@ -364,6 +365,30 @@
 		goto error;
 	}
 +	/*
 +	 * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
 +	 * decoded CAPTURE Y-plane after DQBUF. If the kernel actually
 +	 * decoded the frame, these should reflect a real Y-luma pattern
 +	 * (varied bytes). All-zero or all-identical means no decode
 +	 * landed pixels in the buffer. Removed once Step 1 is verified.
 +	 */
 +	{
 +		const unsigned char *p =
 +			(unsigned char *)surface_object->destination_map[0];
 +		char hex[32 * 3 + 1] = { 0 };
 +		unsigned int i;
 +		if (p == NULL) {
 +			request_log("CAPTURE[idx=%u, plane0]: (NULL)\n",
 +				    surface_object->destination_index);
 +		} else {
 +			for (i = 0; i < 32; i++)
 +				snprintf(hex + i * 3, 4, " %02x", p[i]);
 +			request_log("CAPTURE[idx=%u, plane0]:%s\n",
 +				    surface_object->destination_index,
 +				    hex);
 +		}
 +	}
 +
 	surface_object->status = VASurfaceDisplaying;
 	status = VA_STATUS_SUCCESS;
@@ -0,0 +1,53 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] DEBUG: sentinel-pattern test for CAPTURE buffer write
 Diagnostic-only. Writes 0xab×32 into the CAPTURE buffer's first 32
 bytes immediately before VIDIOC_QBUF. The 0010 hex-dump after
 DQBUF reveals which case we're in:
  - All 0xab → kernel never wrote to this buffer (wrong buffer
    chosen, alias, or no decode actually happened despite
    bytesused=3655712 reported).
  - All zeros → kernel did write 0x00s (overwriting our
    sentinel), and the apparent "no picture" output is the
    kernel-side decode actually producing zeros (e.g. parser
    rejected the bitstream).
  - Mix of zeros and real luma values → kernel wrote real
    decoded pixels; CPU read sees stale-cached zeros somewhere
    OR the sentinel area was a header that decoder zeroed but
    rest is real. Need to check more bytes.
  - All 0xab still → kernel never touched this region but other
    parts of buffer may be filled (incomplete decode).
 Removed once Step 1 decode is verified.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/picture.c	2026-05-01 21:50:11.123117853 +0000
 +++ b/src/picture.c	2026-05-01 22:20:20.589037667 +0000
@@ -349,6 +349,24 @@
 	if (rc != VA_STATUS_SUCCESS)
 		return rc;
 +	/*
 +	 * DEBUG INSTRUMENTATION (0011): write a sentinel pattern into
 +	 * the CAPTURE buffer's first 32 bytes BEFORE QBUF. If after
 +	 * DQBUF the sentinel survives (per surface.c hex dump), the
 +	 * kernel never wrote to this buffer. If the sentinel is gone
 +	 * (replaced by zeros), the kernel did write but our CPU read
 +	 * sees stale-cached data — cache-coherency issue.
 +	 */
 +	{
 +		unsigned char *p = (unsigned char *)
 +			surface_object->destination_map[0];
 +		if (p != NULL) {
 +			unsigned int i;
 +			for (i = 0; i < 32; i++)
 +				p[i] = 0xab;
 +		}
 +	}
 +
 	rc = v4l2_queue_buffer(driver_data->video_fd, -1, capture_type, NULL,
 			       surface_object->destination_index, 0,
 			       surface_object->destination_buffers_count);
@@ -0,0 +1,214 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-02
 Subject: [PATCH] h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
 VAAPI signals "explicit scaling lists are present in the bitstream"
 implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
 VAIQMatrixBufferH264 alongside RenderPicture iff
 sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
 When the bitstream uses default (flat) scaling, no IQMatrixBuffer
 arrives and the in-tree h264.matrix struct stays zero-initialised.
 fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
 via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
 h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
 so set_controls always submitted the scaling matrix, including the
 zero-initialised case.
 Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
 PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
 scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
 successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
 bytesused=3655712 reported).
 Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
 FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
 lists) but the wrong predicate: the kernel-side gating is by
 "matrix-supplied vs. not," not by decode mode. Streams that signal
 explicit scaling lists must submit SCALING_MATRIX in either mode.
 Contract verification (audit_0008_decode_params_2026-05-01.md +
 hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
 matrix when SCALING_MATRIX is in the control batch and falls back
 to spec-defined defaults when absent. Mode-independent.
 This patch:
  - surface.h: adds bool matrix_set to params.h264, mirroring
    mpeg2.iqmatrix_set / h265.iqmatrix_set.
  - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
    sets matrix_set = true when the buffer arrives.
  - picture.c RequestBeginPicture: resets matrix_set = false at the
    start of each Begin/Render/End cycle.
  - h264.c h264_set_controls: builds the controls[] array
    incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
    matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
    when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
 The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
 kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
 selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
 not be set."
 Cross-reference: kernel UAPI section
 ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
 (matrix supplied iff explicit scaling lists in bitstream) and
 hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
 falls back to defaults).
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/surface.h
 +++ b/src/surface.h
@@ -73,6 +73,7 @@
 		} mpeg2;
 		struct {
 			VAIQMatrixBufferH264 matrix;
 +			bool matrix_set;
 			VAPictureParameterBufferH264 picture;
 			VASliceParameterBufferH264 slice;
 		} h264;
 --- a/src/picture.c
 +++ b/src/picture.c
@@ -153,6 +153,7 @@
 			memcpy(&surface_object->params.h264.matrix,
 			       buffer_object->data,
 			       sizeof(surface_object->params.h264.matrix));
 +			surface_object->params.h264.matrix_set = true;
 			break;
 		case VAProfileHEVCMain:
@@ -255,6 +256,7 @@
 	surface_object->source_size = slot->size;
 	surface_object->slices_size = 0;
 	surface_object->slices_count = 0;
 +	surface_object->params.h264.matrix_set = false;
 	surface_object->status = VASurfaceRendering;
 	context_object->render_surface_id = surface_id;
 --- a/src/h264.c
 +++ b/src/h264.c
@@ -553,66 +553,68 @@
 	sps.profile_idc = h264_profile_to_idc(profile);
 	/*
 -	 * Per-request control batch, ordered so the controls REQUIRED in
 -	 * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
 -	 * (indices 0..3) and the SLICE_BASED-only controls come last
 -	 * (indices 4..5).
 +	 * Build the per-request control list incrementally:
 +	 *   - SPS, PPS, DECODE_PARAMS: always required (in either decode
 +	 *     mode).
 +	 *   - SCALING_MATRIX: gated on surface->params.h264.matrix_set,
 +	 *     i.e. the consumer sent a VAIQMatrixBufferH264 this frame.
 +	 *     This matches the H.264 spec: explicit scaling lists are
 +	 *     present iff sps_scaling_matrix_present_flag ||
 +	 *     pps_scaling_matrix_present_flag, in which case VAAPI
 +	 *     consumers send the matrix; otherwise the kernel uses
 +	 *     spec-defined defaults. Independent of FRAME_BASED /
 +	 *     SLICE_BASED.
 +	 *   - SLICE_PARAMS: SLICE_BASED only. Kernel doc
 +	 *     ext-ctrls-codec-stateless.rst (FRAME_BASED entry):
 +	 *     "When this mode is selected, the
 +	 *     V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall not be
 +	 *     set." Submitting it under FRAME_BASED triggers cluster-
 +	 *     validation EINVAL at error_idx=count.
 +	 *   - PRED_WEIGHTS: SLICE_BASED + V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
 	 *
 -	 * Cross-reference: GStreamer gst-plugins-bad
 -	 * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
 -	 * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
 -	 * is_slice_based(self); under FRAME_BASED only SPS/PPS/
 -	 * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
 -	 * parses the bitstream itself in FRAME_BASED mode; submitting
 -	 * per-slice controls in that mode triggers cluster-validation
 -	 * EINVAL at error_idx=count.
 -	 */
 -	struct v4l2_ext_control controls[6] = {
 -		{
 -			.id = V4L2_CID_STATELESS_H264_SPS,
 -			.p_h264_sps = &sps,
 -			.size = sizeof(sps),
 -		}, {
 -			.id = V4L2_CID_STATELESS_H264_PPS,
 -			.p_h264_pps = &pps,
 -			.size = sizeof(pps),
 -		}, {
 -			.id = V4L2_CID_STATELESS_H264_SCALING_MATRIX,
 -			.p_h264_scaling_matrix = &matrix,
 -			.size = sizeof(matrix),
 -		}, {
 -			.id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
 -			.p_h264_decode_params = &decode,
 -			.size = sizeof(decode),
 -		}, {
 -			.id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
 -			.p_h264_slice_params = &slice,
 -			.size = sizeof(slice),
 -		}, {
 -			.id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
 -			.ptr = &weights,
 -			.size = sizeof(weights),
 -		}
 -	};
 -
 -	/*
 -	 * Decode-mode dispatch. Patch 0002 unconditionally sets the
 -	 * device to FRAME_BASED, so we hardcode that here. When the
 -	 * planned probe-then-set commit lands, slice_based becomes
 -	 *     context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
 -	 * with context->decode_mode populated at CreateContext via
 -	 * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
 -	 *
 -	 * FRAME_BASED:    4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
 -	 * SLICE_BASED:   +SLICE_PARAMS (always), +PRED_WEIGHTS (when
 -	 *                V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
 +	 * Patch 0002 unconditionally sets the device to FRAME_BASED,
 +	 * so slice_based is hardcoded false here. When the planned
 +	 * probe-then-set commit lands, this becomes
 +	 *     context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED.
 	 */
 +	struct v4l2_ext_control controls[6] = { 0 };
 +	unsigned int num_controls = 0;
 	const bool slice_based = false; /* TODO: probe via context->decode_mode */
 -	unsigned int num_controls = 4;
 +
 +	controls[num_controls].id = V4L2_CID_STATELESS_H264_SPS;
 +	controls[num_controls].p_h264_sps = &sps;
 +	controls[num_controls].size = sizeof(sps);
 +	num_controls++;
 +
 +	controls[num_controls].id = V4L2_CID_STATELESS_H264_PPS;
 +	controls[num_controls].p_h264_pps = &pps;
 +	controls[num_controls].size = sizeof(pps);
 +	num_controls++;
 +
 +	controls[num_controls].id = V4L2_CID_STATELESS_H264_DECODE_PARAMS;
 +	controls[num_controls].p_h264_decode_params = &decode;
 +	controls[num_controls].size = sizeof(decode);
 +	num_controls++;
 +
 +	if (surface->params.h264.matrix_set) {
 +		controls[num_controls].id = V4L2_CID_STATELESS_H264_SCALING_MATRIX;
 +		controls[num_controls].p_h264_scaling_matrix = &matrix;
 +		controls[num_controls].size = sizeof(matrix);
 +		num_controls++;
 +	}
 +
 	if (slice_based) {
 -		num_controls = 5;
 -		if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
 -			num_controls = 6;
 +		controls[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
 +		controls[num_controls].p_h264_slice_params = &slice;
 +		controls[num_controls].size = sizeof(slice);
 +		num_controls++;
 +
 +		if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice)) {
 +			controls[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
 +			controls[num_controls].ptr = &weights;
 +			controls[num_controls].size = sizeof(weights);
 +			num_controls++;
 +		}
 	}
 	rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
@@ -0,0 +1,86 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-02
 Subject: [PATCH] h264: hardcode SPS level_idc = 51 (intentional over-allocation)
 fourier's h264_va_picture_to_v4l2 never assigns sps->level_idc; the
 field stays at zero-init. level_idc=0 is invalid per the H.264 spec
 (lowest legal value is 10, Level 1.0). Hantro and other stateless
 H.264 decoders use level_idc to pre-allocate decoder resources (DPB
 size, motion-vector buffers); when fed an invalid level the hantro
 kernel driver silently skips the decode-hardware dispatch — the V4L2
 request completes with no error, DQBUF returns the CAPTURE buffer
 reporting bytesused=3655712 and no V4L2_BUF_FLAG_ERROR, but the
 buffer is never written.
 VAAPI's decode-side VAPictureParameterBufferH264 structurally does
 NOT include level_idc — `grep level_idc va/va.h` returns only hits
 inside VAEncSequenceParameterBufferH264 (the encode path). The
 H.264 SPS NAL is also not included in VASliceDataBuffer because
 ffmpeg-vaapi parses it client-side and forwards only slice data
 (verified empirically via patch 0010's hex-dump of the OUTPUT
 buffer: it contains "00 00 01 65 ..." — i.e. ANNEX_B start code +
 IDR slice NAL byte, no SPS NAL). A SPS-NAL byte extractor is
 therefore not viable from the bitstream libva-v4l2-request
 receives.
 Workaround: hardcode level_idc = 51 (= Level 5.1, max for 1080p
 and 4K@30 mainstream consumer profiles). This INTENTIONALLY
 OVER-ALLOCATES decoder resources but is sufficient for any stream
 up to 4K@30. It is corpus-correct, not contract-correct: a 4K@60
 stream (Level 6.x) would under-allocate.
 This patch is a known-incomplete intermediate, not a final fix.
 The proper upstreamable answer is a level-from-resolution
 derivation per H.264 Annex A.3 (max MB rate / max frame size
 thresholds). That requires mapping consumer-side framerate which
 VAAPI does not expose, so the lookup table is non-trivial. The
 TODO is captured inline.
 This patch's goal is unblocking decode-hardware engagement on the
 ohm_gl_fix corpus while the full level-derivation work proceeds.
 Cross-reference: kernel doc
 ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
 level_idc as a required field with no "kernel-derives" annotation —
 i.e., userspace-required.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c
 +++ b/src/h264.c
@@ -553,6 +553,35 @@
 	sps.profile_idc = h264_profile_to_idc(profile);
 	/*
 +	 * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
 +	 * level_idc — see va.h, the field exists only in
 +	 * VAEncSequenceParameterBufferH264 on the encode path. The H.264
 +	 * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
 +	 * parses it client-side and forwards only slice data), so a
 +	 * SPS-NAL byte extractor is not viable from the bitstream we
 +	 * receive.
 +	 *
 +	 * Hantro and other stateless H.264 decoders use level_idc to
 +	 * pre-allocate decoder resources (DPB, motion-vector buffers); a
 +	 * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
 +	 * 1.0) and causes hantro to silently skip the decode hardware
 +	 * dispatch.
 +	 *
 +	 * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
 +	 * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
 +	 * decoder resources and is sufficient for any stream up to 4K@30.
 +	 * It is corpus-correct, not contract-correct.
 +	 *
 +	 * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
 +	 * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
 +	 * thresholds. That is a small lookup table but requires also
 +	 * mapping the consumer's framerate, which VAAPI doesn't provide
 +	 * directly. For now the over-allocation is the upstreamable
 +	 * compromise.
 +	 */
 +	sps.level_idc = 51;
 +
 +	/*
 	 * Build the per-request control list incrementally:
 	 *   - SPS, PPS, DECODE_PARAMS: always required (in either decode
 	 *     mode).
@@ -0,0 +1,97 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-01
 Subject: [PATCH] DEBUG: dump VAPictureH264 raw bytes + decoded fields
 Diagnostic-only. Investigating the observed anomaly:
  - V4L2 strace shows decode_params.top_field_order_cnt = 65536
    on the first IDR frame submitted by mpv+ffmpeg+libva-v4l2-request
  - GStreamer's reference path writes 0 (spec-correct: PicOrderCnt=0
    for IDR with pic_order_cnt_type=0 / pic_order_cnt_lsb=0)
  - Reading FFmpeg source (libavcodec/vaapi_h264.c::fill_vaapi_pic):
      va_pic->TopFieldOrderCnt = 0;
      if (pic->field_poc[0] != INT_MAX)
          va_pic->TopFieldOrderCnt = pic->field_poc[0];
    For IDR: ff_h264_init_poc sets field_poc[0] = poc_msb + poc_lsb
    = 0 + 0 = 0. So FFmpeg should write 0.
 If FFmpeg writes 0 but fourier reads 65536, the mismatch is in the
 libva ABI between ffmpeg's writer and our reader. Most likely
 suspect: VA_PADDING_LOW size in VAPictureH264 differs between the
 libva headers ffmpeg+libva were built against and the headers
 fourier was built against, shifting struct field offsets.
 This patch dumps:
  1. sizeof(VAPictureH264) at our reader's view
  2. First 32 raw bytes of VAPicture->CurrPic
  3. Field-decoded values via the .picture_id, .frame_idx, .flags,
     .TopFieldOrderCnt, .BottomFieldOrderCnt accessors
 If the raw bytes show 00 00 01 00 at offset 12 (= 65536 LE), the
 field offset is correct and FFmpeg actually wrote 65536 — meaning
 either FFmpeg has a bug, or our test scenario triggers a non-spec
 code path. If the raw bytes show 00 00 00 00 at offset 12 but
 TopFieldOrderCnt accessor returns 65536, the struct ABI is
 mismatched and we need to reconcile libva versions.
 If sizeof(VAPictureH264) prints as something other than 36 (= 4*5
 + 4*VA_PADDING_LOW assuming VA_PADDING_LOW=4), the struct layout
 on this build differs from the documented libva-2.x layout.
 Removed once the source of the 65536 is identified.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c	2026-05-01 22:56:42.656744048 +0000
 +++ b/src/h264.c	2026-05-02 00:00:00.000000000 +0000
@@ -28,6 +28,7 @@
 #include <assert.h>
 #include <limits.h>
 #include <string.h>
 +#include <stdio.h>
 #include <sys/ioctl.h>
 #include <sys/mman.h>
@@ -259,6 +259,42 @@
 	 * the OUTPUT bitstream — a hypothesis verified empirically by
 	 * running this patch and inspecting the CAPTURE buffer.
 	 */
 +	/*
 +	 * DEBUG INSTRUMENTATION (0014): dump the raw bytes of
 +	 * VAPicture->CurrPic plus sizeof(VAPictureH264) so we can
 +	 * tell whether the observed TopFieldOrderCnt=65536 anomaly is
 +	 * (a) at the documented byte-offset 12 (ffmpeg-side bug or
 +	 * intentional non-spec encoding) or
 +	 * (b) at a different offset (libva ABI / VA_PADDING_LOW
 +	 * mismatch between ffmpeg's writer and our reader).
 +	 *
 +	 * Documented VAPictureH264 layout (libva-2.x):
 +	 *   offset 0:  VASurfaceID picture_id  (uint32)
 +	 *   offset 4:  uint32 frame_idx
 +	 *   offset 8:  uint32 flags
 +	 *   offset 12: int32 TopFieldOrderCnt
 +	 *   offset 16: int32 BottomFieldOrderCnt
 +	 *   offset 20+: uint32 va_reserved[VA_PADDING_LOW]
 +	 */
 +	{
 +		const unsigned char *cp = (const unsigned char *)&VAPicture->CurrPic;
 +		char hex[32 * 3 + 1] = { 0 };
 +		unsigned int i;
 +		for (i = 0; i < 32; i++)
 +			snprintf(hex + i * 3, 4, " %02x", cp[i]);
 +		request_log("VAPictureH264 sizeof=%zu CurrPic[0..31]:%s\n",
 +			    sizeof(VAPictureH264), hex);
 +		request_log("VAPictureH264 CurrPic field reads: "
 +			    "picture_id=0x%08x frame_idx=%u flags=0x%x "
 +			    "TopFOC=%d BottomFOC=%d frame_num=%u\n",
 +			    (unsigned)VAPicture->CurrPic.picture_id,
 +			    (unsigned)VAPicture->CurrPic.frame_idx,
 +			    (unsigned)VAPicture->CurrPic.flags,
 +			    (int)VAPicture->CurrPic.TopFieldOrderCnt,
 +			    (int)VAPicture->CurrPic.BottomFieldOrderCnt,
 +			    (unsigned)VAPicture->frame_num);
 +	}
 +
 	decode->nal_ref_idc = nal_ref_idc;
 	decode->frame_num = VAPicture->frame_num;
 	decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
@@ -0,0 +1,150 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-02
 Subject: [PATCH] h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2
 ROOT CAUSE for "kernel decodes successfully but produces zeroed
 CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR":
 ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
 0x10000 as a sentinel for "uninitialised":
  libavcodec/h264dec.c:301 — global init in ff_h264_decode_init
  libavcodec/h264dec.c:444 — IDR reset in idr() helper
 ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes
 pc->poc_msb = pc->prev_poc_msb whenever the slice header's
 pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which
 is the typical case for any normal H.264 content with sane POC
 ordering). The sentinel leaks into field_poc[] (line 305) and from
 there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
 libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78).
 Empirical confirmation via meitner 2026-05-02 ground-truth test:
 ran an LD_PRELOAD shim around vaCreateBuffer against an i965
 VAAPI backend decoding a 60-frame H.264 Main clip. Every frame
 showed TopFieldOrderCnt = (POC | 0x10000):
  Frame 1 IDR:  raw bytes "00 00 01 00" at offset 12 → TopFOC=65536
  Frame 2:      raw bytes "06 00 01 00"             → TopFOC=65542
  Frame 3:      "02 00 01 00"                       → TopFOC=65538
 i965 successfully decodes regardless. V4L2 stateless drivers
 (hantro_h264.c::prepare_table feeds the value direct to
 tbl->poc[i*2]/[32], the kernel reflist builder uses it directly
 for cur_pic_order_count comparison) cannot tolerate the high word —
 the kernel's resource sizing math sees POC=65536 for an IDR and
 breaks.
 This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static
 inline in src/h264.c. It detects bit 16 set rather than blindly
 subtracting, so a future ffmpeg version that fixes the leak
 degrades gracefully. The helper is applied at all four POC sites:
  1. h264_fill_dpb:           dpb->top_field_order_cnt
  2. h264_fill_dpb:           dpb->bottom_field_order_cnt
  3. h264_va_picture_to_v4l2: decode->top_field_order_cnt
  4. h264_va_picture_to_v4l2: decode->bottom_field_order_cnt
 VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0
 because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already
 sets POC=0 there; the sentinel never applies. Zeroing them
 explicitly removes a class of "stale POC value in invalidated
 slot" foot-guns.
 Non-trivial follow-ups identified during the meitner experiment
 that are NOT addressed by this patch:
  - PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are
    not yet derived from VASliceParameterBufferH264.slice_type. The
    bbb corpus is I-only at the start so this hasn't been a
    blocker, but a clip with B-frames will need the slice-type
    routing patch.
  - h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is
    almost certainly wrong per the kernel doc — pic_num must equal
    the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface
    id. Out of scope here; will surface as a defect on streams
    that have multi-frame DPB lookups.
 Cross-references:
  audit_0008_decode_params_2026-05-01.md — kernel-side consumer
    audit confirming POC fields are userspace-required.
  api_contract_findings_2026-05-01.md — VAAPI doc gap on POC
    semantics; H.264 spec section 8.2.1 is the binding contract.
  meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical
    capture of the sentinel pattern across 60 frames.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c
 +++ b/src/h264.c
@@ -187,6 +187,43 @@
 	}
 }
 +/*
 + * Strip ffmpeg-vaapi's POC sentinel.
 + *
 + * ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
 + * 0x10000 in libavcodec/h264dec.c (lines 301 and 444 of v8.0). After
 + * an IDR the idr() helper resets prev_poc_msb to that same sentinel.
 + * ff_h264_init_poc (libavcodec/h264_parse.c lines 296-305) then
 + * computes pc->poc_msb as prev_poc_msb when the slice header's
 + * poc_lsb hasn't wrapped — which is the typical case for normal
 + * content. The sentinel leaks into field_poc[] and from there into
 + * VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
 + * libavcodec/vaapi_h264.c::fill_vaapi_pic.
 + *
 + * Working VAAPI backends (intel-iHD, i965 verified empirically on
 + * meitner 2026-05-02) tolerate the high word — they either mask it
 + * or treat POCs as relative comparisons. V4L2 stateless H.264
 + * driver-side consumers (hantro_h264.c::prepare_table feeds the
 + * value direct to tbl->poc[]) need the spec value, so we strip the
 + * sentinel here at the libva-v4l2-request boundary.
 + *
 + * Detection by bit-16-set rather than blind subtraction so that a
 + * future ffmpeg version that fixes the sentinel leak degrades
 + * gracefully. POC values for non-degenerate H.264 content rarely
 + * exceed 16 bits; bit 16 set is a strong signal of the sentinel.
 + *
 + * Empty DPB slots (VA_PICTURE_H264_INVALID) carry POC=0 by
 + * libavcodec/vaapi_h264.c::init_vaapi_pic and need no fix-up.
 + */
 +static inline int32_t h264_strip_ffmpeg_poc_sentinel(int32_t poc, uint32_t flags)
 +{
 +	if (flags & VA_PICTURE_H264_INVALID)
 +		return 0;
 +	if (poc & (1 << 16))
 +		return poc - (1 << 16);
 +	return poc;
 +}
 +
 static void h264_fill_dpb(struct request_data *data,
 			  struct object_context *context,
 			  struct v4l2_ctrl_h264_decode_params *decode)
@@ -210,8 +247,12 @@
 		dpb->frame_num = entry->pic.frame_idx;
 		dpb->pic_num = entry->pic.picture_id;
 -		dpb->top_field_order_cnt = entry->pic.TopFieldOrderCnt;
 -		dpb->bottom_field_order_cnt = entry->pic.BottomFieldOrderCnt;
 +		dpb->top_field_order_cnt =
 +			h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
 +						       entry->pic.flags);
 +		dpb->bottom_field_order_cnt =
 +			h264_strip_ffmpeg_poc_sentinel(entry->pic.BottomFieldOrderCnt,
 +						       entry->pic.flags);
 		dpb->flags = V4L2_H264_DPB_ENTRY_FLAG_VALID;
@@ -298,8 +339,12 @@
 	decode->nal_ref_idc = nal_ref_idc;
 	decode->frame_num = VAPicture->frame_num;
 -	decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
 -	decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
 +	decode->top_field_order_cnt =
 +		h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.TopFieldOrderCnt,
 +					       VAPicture->CurrPic.flags);
 +	decode->bottom_field_order_cnt =
 +		h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.BottomFieldOrderCnt,
 +					       VAPicture->CurrPic.flags);
 	if (nal_unit_type == 5)
 		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
@@ -0,0 +1,82 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-02
 Subject: [PATCH] h264: derive PFRAME / BFRAME flags from VASlice slice_type
 v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per
 ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2-
 request relied on each backing driver tolerating frame-class
 ambiguity.
 Kernel survey (linux 6.19.x):
  - tegra-vde/h264.c (lines 783-799) consumes both flags to select
    the inter-frame decode kernel. Without them the I-frame kernel
    runs on P/B content.
  - visl-trace-h264.h uses them for decode tracing.
  - hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do
    not consume the flags.
 Hantro on ohm decoded bbb cleanly without these flags set (see
 phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is
 an upstreamability fix for cross-driver portability rather than a
 correctness fix for hantro.
 VAAPI's VASliceParameterBufferH264.slice_type maps directly to the
 H.264 slice_header() slice_type field. Per spec 7.4.3:
  0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have
  this slice_type." `slice_type % 5` recovers the underlying type
  in either encoding form.
 In FRAME_BASED mode we only see surface->params.h264.slice from the
 most-recent VASliceParameterBuffer — that's fine: a single coded
 picture has a uniform slice_type for the purposes of the PFRAME /
 BFRAME flag (multi-slice frames may mix slice types in some streams,
 but the flag's semantic is "this is an inter-coded frame," which
 holds if any slice is P or B; using the last-seen slice's type is
 a reasonable approximation).
 Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
 Flags table.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c
 +++ b/src/h264.c
@@ -587,6 +587,38 @@
 			      &surface->params.h264.slice,
 			      &surface->params.h264.picture, &slice, &weights);
 +	/*
 +	 * Derive PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags
 +	 * from VASliceParameterBufferH264.slice_type. VAAPI's slice_type
 +	 * matches the H.264 spec slice_type semantic: 0=P, 1=B, 2=I, 3=SP,
 +	 * 4=SI; values 5..9 mean "all slices in the picture have this
 +	 * slice_type" (mod 5 yields the underlying type). VAAPI consumers
 +	 * (ffmpeg, mpv) populate this for every slice; in FRAME_BASED mode
 +	 * we only see the most-recent slice's params, but slice_type is
 +	 * uniform across a single coded picture for our purposes.
 +	 *
 +	 * Kernel consumers that read these flags: tegra-vde
 +	 * (drivers/media/platform/nvidia/tegra-vde/h264.c lines 783-799 of
 +	 * 6.19.x) selects the inter-frame decode kernel. Hantro / rkvdec /
 +	 * cedrus / mediatek / qcom-iris-stateless do not consume them.
 +	 * Setting them keeps the libva-v4l2-request fork upstreamable
 +	 * across drivers without affecting hantro behaviour.
 +	 *
 +	 * Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
 +	 * Flags — V4L2_H264_DECODE_PARAM_FLAG_PFRAME / _BFRAME.
 +	 */
 +	switch (surface->params.h264.slice.slice_type % 5) {
 +	case H264_SLICE_P:
 +		decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_PFRAME;
 +		break;
 +	case H264_SLICE_B:
 +		decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_BFRAME;
 +		break;
 +	default:
 +		/* I / SP / SI: no extra flag. */
 +		break;
 +	}
 +
 	sps.profile_idc = h264_profile_to_idc(profile);
 	/*
@@ -0,0 +1,124 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-02
 Subject: [PATCH] h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id
 fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id`
 — the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655,
 v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum
 (equation 8-28) for short-term references or LongTermPicNum
 (equation 8-29) for long-term references. The surface id has no
 relationship to either.
 Kernel-side consumers of pic_num:
  - mediatek/decoder/vdec/vdec_h264_req_common.c (line 210):
    dst_entry->pic_num = src_entry->pic_num. Used for
    field-coded short-term reference disambiguation.
  - hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read
    pic_num. They resolve refs via reference_ts (timestamp)
    and POC. This is why fourier's wrong value never surfaced
    on RK3568 hantro.
 This patch makes pic_num spec-correct so the libva-v4l2-request
 fork is upstreamable across drivers without depending on each
 target's tolerance for non-spec fills.
 Computation, derived from H.264 spec section 8.2.4.1:
  For frames (not field-coded), PicNum = FrameNumWrap.
  FrameNumWrap = (frame_num > cur_frame_num)
                 ? frame_num - max_frame_num
                 : frame_num
  max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4)
  cur_frame_num = current picture's frame_num
 For long-term references:
  LongTermPicNum = long_term_frame_idx (when not field-coded).
  VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64):
    VAPictureH264.frame_idx = long_ref ? pic_id : frame_num
  So long-term refs already carry long_term_frame_idx in frame_idx;
  we copy it through.
 Field-coded streams require an extra factor-of-2 plus a parity
 adjustment per spec equations 8-28/8-29; this patch does not handle
 field-coded content. ohm corpus is all frame-coded so this is a
 follow-up for later.
 Implementation: add VAPicture parameter to h264_fill_dpb so the
 function has access to seq_fields.log2_max_frame_num_minus4 and
 the current picture's frame_num. Update the single caller in
 h264_va_picture_to_v4l2.
 Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry
 table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c
 line 210.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c
 +++ b/src/h264.c
@@ -226,8 +226,12 @@
 static void h264_fill_dpb(struct request_data *data,
 			  struct object_context *context,
 +			  VAPictureParameterBufferH264 *VAPicture,
 			  struct v4l2_ctrl_h264_decode_params *decode)
 {
 +	const int max_frame_num =
 +		1 << (VAPicture->seq_fields.bits.log2_max_frame_num_minus4 + 4);
 +	const int cur_frame_num = (int)VAPicture->frame_num;
 	int i;
 	for (i = 0; i < H264_DPB_SIZE; i++) {
@@ -246,7 +250,41 @@
 		}
 		dpb->frame_num = entry->pic.frame_idx;
 -		dpb->pic_num = entry->pic.picture_id;
 +
 +		/*
 +		 * Per ext-ctrls-codec-stateless.rst, dpb[].pic_num must
 +		 * equal the H.264 spec's PicNum (8-28) for short-term refs
 +		 * or LongTermPicNum (8-29) for long-term refs.
 +		 *
 +		 * For frames (not field-coded), PicNum = FrameNumWrap.
 +		 * FrameNumWrap = (frame_num > cur_frame_num)
 +		 *                ? frame_num - max_frame_num
 +		 *                : frame_num
 +		 * (per spec section 8.2.4.1, frame_num wraparound).
 +		 *
 +		 * VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic
 +		 * line 64): VAPictureH264.frame_idx holds long_term_frame_idx
 +		 * for long-term refs and frame_num for short-term refs. So
 +		 * for long-term entries we copy frame_idx straight through
 +		 * as LongTermPicNum.
 +		 *
 +		 * fourier's previous code set pic_num to picture_id (the
 +		 * VAAPI surface id) which is unrelated to H.264 PicNum;
 +		 * mediatek's vdec_h264_req_common.c::dst_entry->pic_num is
 +		 * one consumer that fails on that. Hantro doesn't read
 +		 * pic_num at all (uses reference_ts for ref resolution),
 +		 * which is why fourier's wrong value never surfaced on
 +		 * RK3568.
 +		 */
 +		if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE) {
 +			dpb->pic_num = entry->pic.frame_idx;
 +		} else {
 +			int frame_num = (int)entry->pic.frame_idx;
 +			dpb->pic_num = (frame_num > cur_frame_num)
 +				? frame_num - max_frame_num
 +				: frame_num;
 +		}
 +
 		dpb->top_field_order_cnt =
 			h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
 						       entry->pic.flags);
@@ -283,7 +321,7 @@
 	nal_ref_idc = (b[0] >> 5) & 0x3;
 	nal_unit_type = b[0] & 0x1f;
 -	h264_fill_dpb(driver_data, context, decode);
 +	h264_fill_dpb(driver_data, context, VAPicture, decode);
 	/*
 	 * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
@@ -0,0 +1,159 @@
 From: Markus Fritsche <fritsche.markus@gmail.com>
 Date: 2026-05-02
 Subject: [PATCH] h264: derive sps.level_idc from H.264 Annex A.3 MaxFS
 Replaces patch 0013's hardcoded level_idc = 51 with a small lookup
 that picks the smallest level whose MaxFS contains the encoded
 frame size. Patch 0013's TODO is resolved by this change.
 VAAPI does not expose level_idc on the decode side
 (VAPictureParameterBufferH264 has no such field; only
 VAEncSequenceParameterBufferH264 carries it). The H.264 SPS NAL is
 parsed client-side by ffmpeg-vaapi and only slice data forwards in
 VASliceDataBuffer, so a SPS-NAL byte parser is not viable from the
 bitstream the libva-v4l2-request layer receives. We therefore
 derive level_idc from picture dimensions, which VAAPI does provide
 in VAPictureParameterBufferH264.picture_{width,height}_in_mbs_minus1.
 Annex A.3 (Table A-1) MaxFS thresholds:
  Level 1.0:    99 MBs   ( 176×144  =  11×9   = 99 )
  Level 1.1:   396       ( 352×288  =  22×18  = 396 )
  Level 2.0:   396
  Level 2.1:   792       ( 352×576  /  720×288 )
  Level 2.2:  1620       ( 720×480 ≈ 1350; 720×576 = 1620 )
  Level 3.0:  1620
  Level 3.1:  3600       (1280×720 ≈ 3600 )
  Level 3.2:  5120
  Level 4.0:  8192       (1920×1088 = 8160 fits )
  Level 4.1:  8192
  Level 4.2:  8704
  Level 5.0: 22080
  Level 5.1: 36864       (3840×2176 = 32640 fits; 4K@8K-edge )
  Level 5.2: 36864
  Level 6.0: 139264      (8K )
 V4L2 control encoding: level_idc = (level major × 10) + (level minor).
 Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
 Picks for typical content:
  1080p (1920×1088 = 8160 MBs) → Level 4.1 (level_idc = 41)
  4K    (3840×2176 = 32640 MBs) → Level 5.1 (level_idc = 51)
  8K   (7680×4352 = 130560 MBs) → Level 6.0 (level_idc = 60)
 The previous hardcode of 51 was over-allocating for 1080p; with
 this patch hantro can pre-allocate based on the actual frame size.
 For our ohm corpus (1080p) this drops the requested DPB / MV
 buffer sizing from level-5.1 generosity to level-4.1 right-sized.
 Without VAAPI exposing framerate we cannot also check MaxMBPS /
 MaxBR / MaxCPB. The frame-size-based pick is acceptable in
 practice: temporally-dense streams almost always also push
 spatially-large frames, so MaxFS captures the dominant
 resource-sizing signal.
 Cross-reference: H.264 spec Annex A, Table A-1 ("Level limits").
 ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
 level_idc as required-userspace-input, no kernel-derives annotation.
 Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
 ---
 --- a/src/h264.c
 +++ b/src/h264.c
@@ -638,6 +638,55 @@
 	}
 }
 +/*
 + * Derive sps.level_idc from the encoded frame size in macroblocks per
 + * H.264 Annex A.3 (Table A-1) MaxFS thresholds. Each level's MaxFS is
 + * the maximum encoded frame size in MBs the level supports; we pick
 + * the smallest level whose MaxFS contains the actual frame size.
 + *
 + * Level decoding for the V4L2 control: level_idc = level * 10
 + *   Level 1.0 → 10, Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
 + *
 + * VAAPI does not expose the bitstream's actual level_idc on the
 + * decode side (VAPictureParameterBufferH264 has no such field) — see
 + * va.h. The H.264 SPS NAL is parsed client-side by ffmpeg-vaapi /
 + * mpv and only slice data is forwarded in VASliceDataBuffer, so a
 + * SPS-NAL byte parser is not viable at this layer.
 + *
 + * Without framerate we cannot also check MaxMBPS / MaxBR / MaxCPB.
 + * That gap is acceptable in practice: consumers that push
 + * temporally-dense streams (high MBPS) almost always also push
 + * spatially-large frames (high MaxFS), so frame-size-based level
 + * selection over-allocates on the temporal axis but never
 + * under-allocates a level the consumer relies on for correct
 + * decode-resource sizing.
 + *
 + * Picks for typical content:
 + *   1080p (8160 MBs) → Level 4.1 (level_idc = 41)
 + *   4K   (32400 MBs) → Level 5.1 (level_idc = 51)
 + *   8K  (138240 MBs) → Level 6.0 (level_idc = 60)
 + *
 + * Replaces the hardcoded level_idc=51 from patch 0013.
 + */
 +static inline __u8 h264_derive_level_idc(unsigned int width_in_mbs,
 +					 unsigned int height_in_mbs)
 +{
 +	const unsigned int frame_size_mbs = width_in_mbs * height_in_mbs;
 +
 +	if (frame_size_mbs <= 99)     return 10;  /* Level 1.0 */
 +	if (frame_size_mbs <= 396)    return 11;  /* Level 1.1 - 2.0 */
 +	if (frame_size_mbs <= 792)    return 21;  /* Level 2.1 */
 +	if (frame_size_mbs <= 1620)   return 22;  /* Level 2.2 - 3.0 */
 +	if (frame_size_mbs <= 3600)   return 31;  /* Level 3.1 */
 +	if (frame_size_mbs <= 5120)   return 32;  /* Level 3.2 */
 +	if (frame_size_mbs <= 8192)   return 41;  /* Level 4.0 - 4.1 */
 +	if (frame_size_mbs <= 8704)   return 42;  /* Level 4.2 */
 +	if (frame_size_mbs <= 22080)  return 50;  /* Level 5.0 */
 +	if (frame_size_mbs <= 36864)  return 51;  /* Level 5.1 - 5.2 */
 +	if (frame_size_mbs <= 139264) return 60;  /* Level 6.0 - 6.2 */
 +	return 62;                                /* > Level 6 ceiling */
 +}
 +
 int h264_set_controls(struct request_data *driver_data,
 		      struct object_context *context,
 		      VAProfile profile,
@@ -705,33 +754,15 @@
 	sps.profile_idc = h264_profile_to_idc(profile);
 	/*
 -	 * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
 -	 * level_idc — see va.h, the field exists only in
 -	 * VAEncSequenceParameterBufferH264 on the encode path. The H.264
 -	 * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
 -	 * parses it client-side and forwards only slice data), so a
 -	 * SPS-NAL byte extractor is not viable from the bitstream we
 -	 * receive.
 -	 *
 -	 * Hantro and other stateless H.264 decoders use level_idc to
 -	 * pre-allocate decoder resources (DPB, motion-vector buffers); a
 -	 * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
 -	 * 1.0) and causes hantro to silently skip the decode hardware
 -	 * dispatch.
 -	 *
 -	 * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
 -	 * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
 -	 * decoder resources and is sufficient for any stream up to 4K@30.
 -	 * It is corpus-correct, not contract-correct.
 -	 *
 -	 * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
 -	 * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
 -	 * thresholds. That is a small lookup table but requires also
 -	 * mapping the consumer's framerate, which VAAPI doesn't provide
 -	 * directly. For now the over-allocation is the upstreamable
 -	 * compromise.
 +	 * Derive level_idc from encoded frame size per H.264 Annex A.3.
 +	 * VAAPI doesn't expose level_idc on the decode side (see
 +	 * h264_derive_level_idc()'s docblock for the rationale); we pick
 +	 * the smallest level whose MaxFS contains the picture dimensions.
 +	 * Replaces patch 0013's intermediate hardcode of 51.
 	 */
 -	sps.level_idc = 51;
 +	sps.level_idc = h264_derive_level_idc(
 +		(unsigned int)surface->params.h264.picture.picture_width_in_mbs_minus1 + 1u,
 +		(unsigned int)surface->params.h264.picture.picture_height_in_mbs_minus1 + 1u);
 	/*
 	 * Build the per-request control list incrementally:
@@ -0,0 +1,255 @@
 # Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
 # Campaign: ohm_gl_fix Phase 6 Step 1
 #
 # Forks libva-v4l2-request to add hantro-vpu multiplanar + modern
 # stateless UAPI support. Conflicts/replaces stock libva-v4l2-request.
 #
 # Build target: fermi LXD on hertz (Arch ARM aarch64) via marfrit-packages
 # Gitea Actions; alternative: boltzmann via his subagent.
 pkgname=libva-v4l2-request-ohm-gl-fix
 _upstreampkg=libva-v4l2-request
 pkgver=1.0.0.r0.ga3c2476
 pkgrel=2
 pkgdesc="VA-API backend for V4L2 stateless decoders, hantro-vpu multiplanar fork"
 arch=('aarch64')
 url="https://github.com/bootlin/libva-v4l2-request"
 license=('LGPL2.1' 'MIT')
 depends=('libva' 'libdrm' 'systemd-libs')
 makedepends=('meson' 'ninja' 'pkgconf' 'git')
 provides=("${_upstreampkg}=${pkgver}" 'libva-driver')
 conflicts=("${_upstreampkg}")
 replaces=("${_upstreampkg}")
 # Bootlin upstream tarball — pinned to last meaningful commit 2019-05-17.
 # Use full SHA: github extracts the archive to <repo>-<full-sha>/, so the
 # short form would mismatch ${srcdir}/${_upstreampkg}-${_commit}.
 _commit=a3c2476de19e6635458273ceeaeceff124fabd63
 source=(
    "${_upstreampkg}-${_commit}.tar.gz::https://github.com/bootlin/libva-v4l2-request/archive/${_commit}.tar.gz"
    "fourier-local.patch"
    "0001-mplane-multiplanar-port.patch"
    "0002-pre-streamon-controls-and-output-pool.patch"
    "0003-v4l2-query-helpers.patch"
    "0004-context-request-pool.patch"
    "0005-h264-conditional-pred-weights.patch"
    "0006-h264-frame-based-omit-slice-controls.patch"
    "0007-context-h264-start-code-annex-b.patch"
    "0008-h264-fill-decode-params-from-vaapi.patch"
    "0009-surface-no-capture-sfmt.patch"
    "0010-DEBUG-hex-dump-output-capture.patch"
    "0011-DEBUG-sentinel-capture-buffer.patch"
    "0012-h264-omit-scaling-matrix-frame-based.patch"
    "0013-h264-sps-level-idc.patch"
    "0014-DEBUG-vapic-bytes-dump.patch"
    "0015-h264-strip-ffmpeg-poc-sentinel.patch"
    "0016-h264-derive-pframe-bframe-flags.patch"
    "0017-h264-dpb-picnum-correctness.patch"
    "0018-h264-level-idc-from-annex-a3.patch"
 )
 sha256sums=(
    '92b523050561d64f7b6016edb53ca00524805f9f31a8b566baf457bbb15716fa'
    '1577ff1e2fd7944d2af85bba07658c26b2c54787175c2cc9024174ad2425d3ac'
    '7637c8c76f86a4b745516cdfd1ee89484b7fe7ce88425ff38460bd4494e1451e'
    '5309582759f260456b15635be610c2fe6fe25cbdf427cf8ad851f74991dc8c6e'
    '4e38eacc2b2dc26094cbad38964e8dc8bec19d2ad408a37a3ee21952003e6c38'
    'e5b61965921093292912136ae21727c9c792d0417201d86dc90b2e622f1edbb0'
    'c7a8e02f2e84c6248586d1ceacf25f4c26578f2f365044c3b4a011080ec016e8'
    '5ea1f8b193a3cba21631b00ac3d9cb8c6016754f0af47b33fcccce9e0114b32a'
    '092abc79f639ecb7ac698a48fb544edc3b6eff3cb9b711efa2cc452365c17ed0'
    'ac81992783f562128f55620dc54507b70026342519bb7c7d3efadf6275387861'
    '6b2c26feeeaf253f87e0ef0517191b636c6945e374660a13574f3114331aaed6'
    'ef8706062302fd7f13535501b5b2e8aed5325dbb0d4d56a88674bef53ca96eee'
    '242a42e10ff09e4e82bcadb8824e036dddab94cf6dc9c5f6c80eb4c2cc5dda50'
    '417c39397dfbc86db2cabc6217f54d9072de26dedcacf9a965b909fc998de052'
    '472deb316ff3ad282c6be028cfaf033d69ddfee845dcd519c28a0692f298bb6a'
    '835378dd0b7c126a6101b8df0c015951d88f5139f9586a618af6b3ee503d67b6'
    '380d334a88213185183a05e7e55380de503e388fc29d8b11d96909dafcbbeb65'
    'eaf1e363de111ee43d7ca3e4b161d9a3a3f6b1c9ca3d8642871abe70f18fbf95'
    '7b6f0f63fdde32a411cf3230cabeb610ef8a6bd09777976a06dbd274daa540c7'
    '15a0a40b918988e77e5f36eebf15e9f45b2c13a6628b5640efdac528c57aab80'
 )
 prepare() {
    cd "${srcdir}/${_upstreampkg}-${_commit}"
    # Patch 0: fourier's stateless-control modernization.
    # - src/h264.c + src/picture.c → V4L2_CID_STATELESS_H264_*
    # - include/hevc-ctrls.h → redirect shim to <linux/v4l2-controls.h>
    # - src/meson.build: h265.c/h265.h commented out (HEVC excluded)
    patch -p1 < "${srcdir}/fourier-local.patch"
    # Hygiene: include/h264-ctrls.h is dead post-fourier (no source
    # includes it, no install_headers directive). Drop it so the
    # built source tree has no stale UAPI carry-over.
    rm -f include/h264-ctrls.h
    # Patch 1: ohm-gl-fix multiplanar port.
    # - V4L2_BUF_TYPE_VIDEO_{OUTPUT,CAPTURE} -> *_MPLANE in src/v4l2.c
    # - per-plane VIDIOC_EXPBUF in src/surface.c
    # - struct v4l2_plane planes[] threading throughout
    # - image.c plane-stride adjustments
    patch -p1 < "${srcdir}/0001-mplane-multiplanar-port.patch"
    # Patch 2: pre-STREAMON device controls + minimum OUTPUT pool.
    # - context.c: floor OUTPUT pool to 4 buffers (not surfaces_count)
    # - context.c: set V4L2_CID_STATELESS_H264_{DECODE_MODE,START_CODE}
    #              device-wide before VIDIOC_STREAMON
    # THROWAWAY: superseded inline by patches 4 (request_pool) and 5
    # (probe-then-set DECODE_MODE) per upstreamable_design.md §5.
    patch -p1 < "${srcdir}/0002-pre-streamon-controls-and-output-pool.patch"
    # Patch 3 (commit 2 in revised plan): QUERYCTRL/QUERYMENU helpers.
    # Pure utility additions to src/v4l2.{c,h}, no behaviour change.
    # Unblocks the request_pool and probe-then-set commits.
    patch -p1 < "${srcdir}/0003-v4l2-query-helpers.patch"
    # Patch 4 (commit 3 in revised plan): request_pool decoupling.
    # NEW src/request_pool.{c,h}; context.c uses pool instead of
    # per-surface OUTPUT loop; picture.c borrows on Begin, releases
    # on Sync after DQBUF. Deletes 0002's "floor to 4" hunk inline
    # — the pool's count parameter supersedes it. 0002's
    # set_controls block remains until probe-then-set commit lands.
    patch -p1 < "${srcdir}/0004-context-request-pool.patch"
    # Patch 5: conditional PRED_WEIGHTS submission. Defect-fix found
    # via ohm smoke testing (kernel rejects PRED_WEIGHTS at
    # error_idx=5 on Main-profile clips without weighted prediction).
    # Not part of Sonnet's planned series, but unblocks per-frame
    # decode on every backing driver.
    patch -p1 < "${srcdir}/0005-h264-conditional-pred-weights.patch"
    # Patch 6: omit per-slice controls in FRAME_BASED mode. Identified
    # via cross-reference against GStreamer's gstv4l2codech264dec.c
    # (commit 9e3e775). FRAME_BASED requests must contain only
    # SPS/PPS/SCALING_MATRIX/DECODE_PARAMS — submitting SLICE_PARAMS
    # triggers V4L2 cluster validation EINVAL at error_idx=count.
    # Hardcodes slice_based=false for now since 0002 sets FRAME_BASED;
    # promotes to runtime probe via context->decode_mode in a
    # follow-up commit.
    patch -p1 < "${srcdir}/0006-h264-frame-based-omit-slice-controls.patch"
    # Patch 7: enable ANNEX_B start-code emission in
    # codec_store_buffer to match the device-side START_CODE_ANNEX_B
    # set by 0002. Without this, kernel sees a raw NAL stream with
    # no 0x00 0x00 0x01 markers, fails to parse slice boundaries,
    # and emits a zeroed CAPTURE buffer (flat green frames in mpv).
    patch -p1 < "${srcdir}/0007-context-h264-start-code-annex-b.patch"
    # Patch 8: fill DECODE_PARAMS frame_num + FIELD_PIC/BOTTOM_FIELD
    # flag bits from VAAPI. fourier left these zero-init; under
    # FRAME_BASED on hantro the kernel uses them to drive bitstream
    # parsing. Empirical question: does hantro tolerate the bit_size
    # fields (idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
    # dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
    # slice_group_change_cycle) being zero, or do we need a
    # slice_header() bit-level parser?
    patch -p1 < "${srcdir}/0008-h264-fill-decode-params-from-vaapi.patch"
    # Patch 9: drop VIDIOC_S_FMT on CAPTURE queue. Hantro derives
    # CAPTURE format from per-request SPS; explicit S_FMT here can
    # leave the driver in a state where DQBUF returns zeroed
    # buffers despite no errors. GStreamer's reference path only
    # G_FMTs the CAPTURE side.
    patch -p1 < "${srcdir}/0009-surface-no-capture-sfmt.patch"
    # Patch 10: DEBUG-only instrumentation. Hex-dumps OUTPUT and
    # CAPTURE buffer first 32 bytes per frame via request_log().
    # Removed before upstream submission.
    patch -p1 < "${srcdir}/0010-DEBUG-hex-dump-output-capture.patch"
    # Patch 11: DEBUG-only sentinel write before CAPTURE QBUF.
    # Tells us whether kernel wrote to the buffer (sentinel gone)
    # or didn't (sentinel survives).
    patch -p1 < "${srcdir}/0011-DEBUG-sentinel-capture-buffer.patch"
    # Patch 12 (REVISED 2026-05-02): gate SCALING_MATRIX submission
    # on a per-surface matrix_set flag mirroring fourier's existing
    # mpeg2.iqmatrix_set / h265.iqmatrix_set pattern. The earlier
    # draft of this patch unconditionally omitted SCALING_MATRIX in
    # FRAME_BASED, which was corpus-correct (bbb has no explicit
    # scaling lists) but the wrong predicate — kernel-side gating
    # is by "matrix-supplied vs. not," not by decode mode. Streams
    # with explicit scaling lists must still submit in either mode.
    # Three coordinated changes: surface.h adds bool matrix_set;
    # picture.c sets it on VAIQMatrixBuffer arrival and resets it
    # in RequestBeginPicture; h264.c builds controls[] incrementally.
    # The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is
    # preserved (kernel doc is explicit on it).
    patch -p1 < "${srcdir}/0012-h264-omit-scaling-matrix-frame-based.patch"
    # Patch 13 (REVISED 2026-05-02): hardcode SPS level_idc = 51 as
    # an INTENTIONALLY OVER-ALLOCATING known-incomplete intermediate.
    # VAAPI's decode-side picture-parameter buffer structurally lacks
    # level_idc (only present in encode path). The H.264 SPS NAL is
    # not in VASliceDataBuffer either (ffmpeg-vaapi parses it
    # client-side and forwards only slice data — verified via the
    # 0010 hex dump showing OUTPUT first bytes are "00 00 01 65 ...",
    # i.e. start code + IDR slice NAL, no SPS). So a SPS-NAL byte
    # extractor is not viable. TODO captured inline for level-from-
    # resolution derivation per H.264 Annex A.3.
    patch -p1 < "${srcdir}/0013-h264-sps-level-idc.patch"
    # Patch 14: DEBUG-only — dump VAPictureH264 raw bytes + decoded
    # fields. Used to disambiguate the TopFieldOrderCnt=65536 anomaly
    # on ohm in 2026-04-30..2026-05-02 investigation. The dump output
    # cross-referenced against meitner ground-truth (i965 backend)
    # confirmed +0x10000 is ffmpeg-vaapi convention, not an ohm bug.
    # Resolution: patch 0015. This patch stays in the series until
    # the 65536 sentinel handling has been validated on ohm; remove
    # before upstream submission.
    patch -p1 < "${srcdir}/0014-DEBUG-vapic-bytes-dump.patch"
    # Patch 15: strip ffmpeg-vaapi's POC sentinel before passing to
    # V4L2. Root-cause fix for the "kernel decodes successfully but
    # produces zeroed CAPTURE buffers" symptom. ffmpeg's
    # H264POCContext initialises prev_poc_msb to (1 << 16) and the
    # value leaks through field_poc[] to VAPictureH264. Working
    # backends (i965, intel-iHD) tolerate the high word; V4L2
    # stateless drivers cannot. Adds h264_strip_ffmpeg_poc_sentinel()
    # static inline and applies it at all 4 POC sites (DPB top/bot,
    # CurrPic top/bot). Detection by bit-16-set so a future ffmpeg
    # version that fixes the leak degrades gracefully.
    patch -p1 < "${srcdir}/0015-h264-strip-ffmpeg-poc-sentinel.patch"
    # Patch 16: derive PFRAME / BFRAME flags from VAAPI slice_type.
    # Upstreamability fix — tegra-vde consumes these flags to choose
    # the inter-frame decode kernel; hantro/rkvdec/cedrus/mediatek/
    # qcom don't read them but should still see spec-correct values.
    patch -p1 < "${srcdir}/0016-h264-derive-pframe-bframe-flags.patch"
    # Patch 17: fill dpb[].pic_num as PicNum/LongTermPicNum per H.264
    # spec equations 8-28/8-29 instead of fourier's wrong VAAPI
    # surface-id assignment. Adds VAPicture parameter to h264_fill_dpb
    # so it can compute FrameNumWrap from log2_max_frame_num_minus4 +
    # current frame_num. Mediatek consumes pic_num for short-term
    # field-coded ref disambiguation; hantro doesn't read it (uses
    # reference_ts), which is why fourier's wrong value never
    # surfaced on RK3568.
    patch -p1 < "${srcdir}/0017-h264-dpb-picnum-correctness.patch"
    # Patch 18: derive sps.level_idc from encoded frame size per
    # H.264 Annex A.3 (Table A-1) MaxFS thresholds. Replaces patch
    # 0013's intermediate hardcode of 51. For typical content:
    # 1080p → Level 4.1 (level_idc=41), 4K → 5.1, 8K → 6.0. Hantro
    # uses level_idc to size DPB / MV buffers; correct sizing means
    # less wasted memory than 0013's blanket over-allocation.
    patch -p1 < "${srcdir}/0018-h264-level-idc-from-annex-a3.patch"
 }
 build() {
    cd "${srcdir}/${_upstreampkg}-${_commit}"
    # meson_options.txt only exposes 'kernel_headers' — leave it empty to
    # use system /usr/include kernel UAPI headers. No per-codec toggles.
    arch-meson build --buildtype=release
    meson compile -C build
 }
 package() {
    cd "${srcdir}/${_upstreampkg}-${_commit}"
    meson install -C build --destdir "${pkgdir}"
    install -Dm644 COPYING       "${pkgdir}/usr/share/licenses/${pkgname}/COPYING"
    install -Dm644 COPYING.LGPL  "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.LGPL"
    install -Dm644 COPYING.MIT   "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.MIT"
 }
@@ -0,0 +1,58 @@
 # libva-v4l2-request-ohm-gl-fix
 Bootlin's libva-v4l2-request VA-API backend, with hantro-vpu
 multi-planar + chromium-149-era stateless H.264 patches developed
 in the [ohm_gl_fix campaign](../../../ohm_gl_fix/) Phase 6 Step 1
 (2026-05-01..2026-05-02).
 Patches 0001-0018 are contract-correct against the kernel V4L2
 stateless H.264 UAPI, validated by inspection against
 hantro_h264.c and v4l2_h264_init_reflist_builder() in linux 6.19.x.
 See ohm_gl_fix's `phase6/step1/audit_0008_decode_params_2026-05-01.md`
 and `phase6/step1/api_contract_findings_2026-05-01.md` for the
 audit trail.
 ## Honest characterisation
 This package compiles cleanly, installs cleanly, and `vainfo` with
 `LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1`
 enumerates all H.264 profiles. It is, however, **not on Brave's
 critical decode path** on the ohm_gl_fix Step-1+Step-2 stack —
 Brave/Chromium uses its own `V4L2VideoDecoder` in
 `media/gpu/v4l2/`, opens `/dev/video1` directly, and never loads
 libva. See `phase3_remeasure_2026-05-02/B3_decoder_discovery.md`
 in the ohm_gl_fix repo for the strace/maps evidence.
 For libva consumers that DO route through libva (mpv with
 `--hwdec=vaapi`, ffmpeg-vaapi, Firefox with
 `media.ffmpeg.vaapi.enabled=true`), this backend would in
 principle engage hantro hardware decode — but each consumer hits
 its own downstream issue:
 - mpv `--vo=gpu-next`: blocked by a Mesa-panfrost WSI pitch bug
  during EGL dmabuf import (out of scope for this package; see
  `phase3_remeasure_2026-05-02/A3_mesa_wsi_pitch.md`).
 - mpv `--vo=image`: silently falls back to libavcodec SW decode
  rather than engaging the libva session (see
  `phase3_remeasure_2026-05-02/A1_morning_pass_disambiguation.md`).
  Reason not yet diagnosed.
 The most likely use case where this backend cleanly delivers
 hardware decode end-to-end is **Firefox via libavcodec-vaapi**, on
 a stack that also has the Mesa pitch issue resolved (or that
 doesn't hit the EGL import path because Firefox's video element
 composites differently). Untested at time of writing.
 ## DEBUG patches in the series
 `0010-DEBUG-hex-dump-output-capture.patch`,
 `0011-DEBUG-sentinel-capture-buffer.patch`, and
 `0014-DEBUG-vapic-bytes-dump.patch` produce verbose stderr output
 useful for diagnosing decode-path issues but excessive for
 production. They are intentionally kept in the applied series
 during development; remove for cleaner runs.
 ## Status
 Tagged stable as of 2026-05-02 against chromium-149 / linux-6.19.10
 / libva-2.22.0. Contract-correct; ecosystem-validation-pending.