Add libva-v4l2-request-ohm-gl-fix package

Mirrors phase6/step1/ from the ohm_gl_fix campaign: a contract-correct hantro multi-planar / chromium-149-era stateless H.264 port of bootlin's libva-v4l2-request, patches 0001..0018 plus fourier-local.

Honest characterisation in README:
- Builds cleanly on the chromium-builder LXC (boltzmann).
- vainfo enumerates H.264 profiles cleanly with LIBVA_DRIVER_NAME=v4l2_request.
- NOT on Brave's decode path on the ohm_gl_fix stack: Brave uses Chromium's own V4L2VideoDecoder in media/gpu/v4l2/.
- Most likely useful for a future Firefox-via-libavcodec-vaapi campaign, modulo a separate Mesa-panfrost WSI pitch issue.
- DEBUG patches (0010, 0011, 0014) are intentionally kept in the series for development; remove them for cleaner production runs.

Audit trail in the source repo at ohm_gl_fix:
- phase6/step1/audit_0008_decode_params_2026-05-01.md
- phase6/step1/api_contract_findings_2026-05-01.md
- phase3_remeasure_2026-05-02/B3_decoder_discovery.md (why this isn't on Brave's path)
2026-05-02 15:17:10 +00:00
parent 7a5ec202ff
commit b47938e0bc
21 changed files with 3692 additions and 0 deletions
@@ -0,0 +1,70 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu
+
+Fourier's local patch already wired multiplanar plumbing through
+src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69,
+struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane
+EXPBUF loop at line 411) and through src/context.c, src/buffer.c and
+src/picture.c via the
+v4l2_type_video_{output,capture}(video_format->v4l2_mplane) helper calls.
+
+The remaining gap: the NV12 entry in src/video.c was hardcoded to
+v4l2_mplane=false, and the bootstrap path in src/surface.c used
+hardcoded singleplanar literals because it runs before video_format
+is populated.
+
+This patch flips the NV12 entry to v4l2_mplane=true and updates the
+two singleplanar literals in src/surface.c to their MPLANE variants:
+
+  - src/video.c:42  v4l2_mplane=false -> true (NV12 only;
+    Sunxi-tiled NV12 left at false for cedrus compatibility)
+  - src/surface.c:84  output_type = v4l2_type_video_output(true)
+  - src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12)
+
+Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under
+V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type
+returns no match (verified via VIDIOC_ENUM_FMT in the Phase 3
+GStreamer strace baseline).
+
+Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are
+preserved through the SUNXI_TILED_NV12 entry (still mplane=false,
+__arm__ only). Plain-NV12 cedrus on aarch64 would regress, since
+surface.c now probes NV12 only under CAPTURE_MPLANE, but the known
+userbase here is RK3566/RK3568 hantro.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+ src/surface.c | 4 ++--
+ src/video.c   | 2 +-
+ 2 files changed, 3 insertions(+), 3 deletions(-)
+
+--- a/src/video.c
+++ b/src/video.c
+@@ -39,7 +39,7 @@ static struct video_format formats[] = {
+ 		.description		= "NV12 YUV",
+ 		.v4l2_format		= V4L2_PIX_FMT_NV12,
+ 		.v4l2_buffers_count	= 1,
+-		.v4l2_mplane		= false,
++		.v4l2_mplane		= true,
+ 		.drm_format		= DRM_FORMAT_NV12,
+ 		.drm_modifier		= DRM_FORMAT_MOD_NONE,
+ 		.planes_count		= 2,
+--- a/src/surface.c
+++ b/src/surface.c
+@@ -81,7 +81,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
+ 	// we declare SET_FORMAT_OF_OUTPUT_ONCE to ensure v4l2_set_format only gets called once
+ 	// (in the first RequestCreateSurfaces2 call BEFORE any buffers are created later on)
+ 	unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;
+-	unsigned int output_type = v4l2_type_video_output(false);
++	unsigned int output_type = v4l2_type_video_output(true);
+
+ 	if (!SET_FORMAT_OF_OUTPUT_ONCE) {
+ 		rc = v4l2_set_format(driver_data->video_fd, output_type, pixelformat,
+@@ -106,7 +106,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
+ 			video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);
+
+ 		found = v4l2_find_format(driver_data->video_fd,
+-					 V4L2_BUF_TYPE_VIDEO_CAPTURE,
++					 V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
+ 					 V4L2_PIX_FMT_NV12);
+ 		if (found)
+ 			video_format = video_format_find(V4L2_PIX_FMT_NV12);
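The reasoning behind this patch can be checked without hardware. The type helpers pick the multiplanar buffer type when `mplane` is true. On hantro, an NV12 lookup only succeeds when it is made under the multiplanar CAPTURE type.

This is a minimal simulation, not code from libva-v4l2-request:

- Everything prefixed `sim_` is hypothetical.
- The numeric buffer types are local stand-ins for `enum v4l2_buf_type` in <linux/videodev2.h>.
- The one-entry format table models the hantro advertisement described in the commit message. It is not read from a device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for enum v4l2_buf_type values (see <linux/videodev2.h>). */
#define SIM_BUF_TYPE_VIDEO_CAPTURE        1u
#define SIM_BUF_TYPE_VIDEO_OUTPUT         2u
#define SIM_BUF_TYPE_VIDEO_CAPTURE_MPLANE 9u
#define SIM_BUF_TYPE_VIDEO_OUTPUT_MPLANE  10u

/* Same little-endian fourcc packing as the kernel's v4l2_fourcc() macro. */
#define SIM_FOURCC(a, b, c, d)                                   \
	((uint32_t)(a) | ((uint32_t)(b) << 8) |                  \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define SIM_PIX_FMT_NV12 SIM_FOURCC('N', 'V', '1', '2')

/* Same shape as v4l2_type_video_{output,capture}(bool mplane). */
static unsigned int sim_type_video_output(bool mplane)
{
	return mplane ? SIM_BUF_TYPE_VIDEO_OUTPUT_MPLANE
		      : SIM_BUF_TYPE_VIDEO_OUTPUT;
}

static unsigned int sim_type_video_capture(bool mplane)
{
	return mplane ? SIM_BUF_TYPE_VIDEO_CAPTURE_MPLANE
		      : SIM_BUF_TYPE_VIDEO_CAPTURE;
}

/* Hypothetical hantro-like advertisement: NV12 only under CAPTURE_MPLANE. */
struct sim_fmt {
	unsigned int type;
	uint32_t pixelformat;
};

static const struct sim_fmt sim_hantro_formats[] = {
	{ SIM_BUF_TYPE_VIDEO_CAPTURE_MPLANE, SIM_PIX_FMT_NV12 },
};

/* Stand-in for v4l2_find_format(): walks the table like an ENUM_FMT loop. */
static bool sim_find_format(unsigned int type, uint32_t pixelformat)
{
	size_t i;

	for (i = 0; i < sizeof(sim_hantro_formats) / sizeof(sim_hantro_formats[0]); i++)
		if (sim_hantro_formats[i].type == type &&
		    sim_hantro_formats[i].pixelformat == pixelformat)
			return true;
	return false;
}
```

In this model, `sim_find_format(sim_type_video_capture(false), SIM_PIX_FMT_NV12)` finds nothing. That mirrors the old `V4L2_BUF_TYPE_VIDEO_CAPTURE` literal in surface.c. The multiplanar query succeeds.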
@@ -0,0 +1,103 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] context: pre-STREAMON device controls and minimum OUTPUT pool
+
+Two related fixes that surfaced during the first hantro-vpu (RK3568)
+smoke test of the multiplanar build:
+
+1. **OUTPUT queue must be non-empty at STREAMON.** Hantro's
+   vb2_start_streaming rejects an empty queue with EINVAL. Some
+   VA-API callers (notably ffmpeg's vaapi-copy path) call
+   vaCreateContext with num_render_targets=0 and allocate render
+   targets lazily. The OUTPUT (bitstream-input) pool must NOT be
+   sized off surfaces_count alone — it is a request-time resource,
+   not per-surface. Quick fix: floor the pool to 4 buffers when
+   the caller passes 0. (A proper decoupling of OUTPUT pool from
+   surface lifecycle is documented in upstreamable_design.md.)
+
+2. **Device-wide stateless H.264 controls before STREAMON.** The
+   V4L2 stateless framework requires that
+   V4L2_CID_STATELESS_H264_DECODE_MODE and START_CODE be set on the
+   device fd (request_fd=-1) before stream start. Per-request controls
+   (SPS/PPS/SLICE_PARAMS/etc.) attached to a request_fd come
+   later via h264_set_controls(). hantro-vpu accepts only
+   DECODE_MODE_FRAME_BASED; START_CODE_ANNEX_B matches what the
+   existing slice-assembly path emits.
+
+   This is set unconditionally for now (errors silently ignored)
+   to keep cedrus and other backends compatible — they may
+   default to SLICE_BASED and not expose DECODE_MODE at all.
+   Probe-then-set via VIDIOC_QUERYCTRL is the upstream-correct
+   approach (see upstreamable_design.md §3).
+
+After this patch, vainfo still enumerates as before, but the first
+mpv vaapi-copy attempt advances past STREAMON and into actual
+decode submission.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+ src/context.c | 38 +++++++++++++++++++++++++++++++++++++-
+ 1 file changed, 37 insertions(+), 1 deletion(-)
+
+--- a/src/context.c
+++ b/src/context.c
+@@ -64,6 +64,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
+ 	VAContextID id;
+ 	VAStatus status;
+ 	unsigned int output_type, capture_type;
++	unsigned int output_buffers_count;
+ 	unsigned int index_base;
+ 	unsigned int index;
+ 	unsigned int i;
+@@ -90,8 +91,16 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
+ 	}
+ 	memset(&context_object->dpb, 0, sizeof(context_object->dpb));
+
++	/*
++	 * The OUTPUT (bitstream-input) queue must be non-empty before
++	 * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
++	 * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
++	 * vaCreateContext with num_render_targets=0; allocate a small
++	 * minimum pool independent of the caller's surface count.
++	 */
++	output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
+ 	rc = v4l2_create_buffers(driver_data->video_fd, output_type,
+-				 surfaces_count, &index_base);
++				 output_buffers_count, &index_base);
+ 	if (rc < 0) {
+ 		status = VA_STATUS_ERROR_ALLOCATION_FAILED;
+ 		goto error;
+@@ -138,6 +147,33 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
+ 		surface_object->source_size = length;
+ 	}
+
++	/*
++	 * Stateless H.264 device-wide controls. The kernel V4L2 stateless
++	 * framework requires DECODE_MODE and START_CODE to be set on the
++	 * device fd (request_fd=-1) before VIDIOC_STREAMON; per-request
++	 * controls (SPS/PPS/etc.) attached to a request_fd come later.
++	 *
++	 * hantro-vpu (RK3568) accepts only DECODE_MODE_FRAME_BASED.
++	 * START_CODE_ANNEX_B preserves leading 0x00000001 in the slice
++	 * payload that h264.c assembles. Errors here are not fatal: not
++	 * every backing driver supports both controls (e.g. cedrus may
++	 * default to SLICE_BASED without exposing DECODE_MODE).
++	 */
++	{
++		struct v4l2_ext_control dev_ctrls[2] = {
++			{
++				.id = V4L2_CID_STATELESS_H264_DECODE_MODE,
++				.value = V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
++			},
++			{
++				.id = V4L2_CID_STATELESS_H264_START_CODE,
++				.value = V4L2_STATELESS_H264_START_CODE_ANNEX_B,
++			},
++		};
++		(void)v4l2_set_controls(driver_data->video_fd, -1,
++					dev_ctrls, 2);
++	}
++
+ 	rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
+ 	if (rc < 0) {
+ 		status = VA_STATUS_ERROR_OPERATION_FAILED;
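The two ordering constraints this patch handles can be modelled in a few lines. This is a toy sketch under stated assumptions:

- `sim_queue` is a hypothetical stand-in for one OUTPUT queue plus its device-wide DECODE_MODE state.
- `sim_streamon()` reproduces the failure described in the commit message (EINVAL when no buffers have been allocated).
- `output_pool_size()` is the patch's `surfaces_count > 0 ? surfaces_count : 4` floor.
- START_CODE is left out for brevity.
- The mode values are local stand-ins for `enum v4l2_stateless_h264_decode_mode`; verify them against your headers.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for enum v4l2_stateless_h264_decode_mode (verify values). */
enum { SIM_DECODE_MODE_SLICE_BASED = 0, SIM_DECODE_MODE_FRAME_BASED = 1 };

/* Hypothetical model of one OUTPUT queue plus its device-wide state. */
struct sim_queue {
	unsigned int num_buffers; /* allocated via CREATE_BUFS/REQBUFS */
	int supported_mode;       /* the single DECODE_MODE the driver accepts */
	int decode_mode;          /* current device-wide value */
	bool streaming;
};

/* Patch 0002's floor: never size the OUTPUT pool at zero. */
static unsigned int output_pool_size(unsigned int surfaces_count)
{
	return surfaces_count > 0 ? surfaces_count : 4;
}

static void sim_create_buffers(struct sim_queue *q, unsigned int count)
{
	q->num_buffers += count;
}

/* Device-wide set (request_fd = -1); unsupported modes fail with -EINVAL. */
static int sim_set_decode_mode(struct sim_queue *q, int mode)
{
	if (mode != q->supported_mode)
		return -EINVAL;
	q->decode_mode = mode;
	return 0;
}

/* Failure mode from the commit message: no buffers allocated -> -EINVAL. */
static int sim_streamon(struct sim_queue *q)
{
	if (q->num_buffers == 0)
		return -EINVAL;
	q->streaming = true;
	return 0;
}
```

A cedrus-like queue (supported_mode = SLICE_BASED) rejects the FRAME_BASED set but still streams. That is the case the patch covers by casting the `v4l2_set_controls()` result to void. The driver then keeps its own default mode.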
@@ -0,0 +1,145 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] v4l2: add QUERYCTRL/QUERYMENU capability-probe helpers
+
+Pure utility additions, no behaviour change. Three helpers in
+src/v4l2.{c,h}:
+
+  - v4l2_query_ext_ctrl(): wraps VIDIOC_QUERY_EXT_CTRL by CID.
+    Returns 0 if the control exists, -1 if not. Caller passes NULL
+    qec to test existence only.
+
+  - v4l2_query_menu(): wraps VIDIOC_QUERYMENU at a given index.
+    Returns 0 if a menu item exists at that index, -1 otherwise.
+
+  - v4l2_ctrl_menu_has_value(): convenience layered on the above.
+    For a menu/intmenu-type control, walks all menu items between
+    minimum and maximum and returns true iff `value` is a valid
+    entry. Used by callers that ask "does this driver accept menu
+    value X for this CID?" without caring about iteration details.
+
+These unblock commit 3 (request_pool — needs ext-ctrl probing for
+codec-ops dispatch) and commit 4 (probe-then-set DECODE_MODE/
+START_CODE — replaces 0002's unconditional set with a real probe)
+of the upstreamable design's six-commit series.
+
+Forward declarations in v4l2.h keep the header lean: existing
+prototypes already use opaque struct v4l2_ext_control * pointers
+without including <linux/videodev2.h>; we follow the same
+convention for struct v4l2_query_ext_ctrl and struct v4l2_querymenu.
+
+No call sites added in this commit. Compile-only verification:
+the .so links cleanly with three new exports.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+ src/v4l2.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ src/v4l2.h | 33 +++++++++++++++++++++++++++++
+ 2 files changed, 93 insertions(+)
+
+--- a/src/v4l2.h
+++ b/src/v4l2.h
+@@ -64,4 +64,37 @@ int v4l2_set_control(int video_fd, int request_fd, unsigned int id, void *data,
+ 		     unsigned int size);
+ int v4l2_set_stream(int video_fd, unsigned int type, bool enable);
+
++/*
++ * Capability-probe helpers. These let calling code discover what the
++ * backing kernel driver supports rather than hardcoding assumptions
++ * about specific decoder hardware.
++ */
++
++/*
++ * Query the metadata of an extended control by CID. Fills *qec on
++ * success. Returns 0 if the control exists, -1 (errno=EINVAL) if the
++ * driver does not expose this CID. Pass qec=NULL to test existence
++ * only.
++ */
++struct v4l2_query_ext_ctrl;
++int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
++			struct v4l2_query_ext_ctrl *qec);
++
++/*
++ * Query a single menu item of a menu/intmenu control at the given
++ * index. Fills *qm on success. Returns 0 if the menu item exists at
++ * this index, -1 otherwise.
++ */
++struct v4l2_querymenu;
++int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
++		    struct v4l2_querymenu *qm);
++
++/*
++ * Convenience: for a menu-type control, return true iff `value` is a
++ * valid menu entry (i.e. the driver accepts it). Walks the menu items
++ * from the control's minimum to its maximum to check.
++ */
++bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
++			      unsigned int value);
++
+ #endif
+--- a/src/v4l2.c
+++ b/src/v4l2.c
+@@ -508,3 +508,63 @@ int v4l2_set_stream(int video_fd, unsigned int type, bool enable)
+
+ 	return 0;
+ }
++
++int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
++			struct v4l2_query_ext_ctrl *qec)
++{
++	struct v4l2_query_ext_ctrl local;
++	struct v4l2_query_ext_ctrl *target = qec ? qec : &local;
++	int rc;
++
++	memset(target, 0, sizeof(*target));
++	target->id = id;
++
++	rc = ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, target);
++	if (rc < 0)
++		return -1;
++
++	return 0;
++}
++
++int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
++		    struct v4l2_querymenu *qm)
++{
++	int rc;
++
++	if (qm == NULL)
++		return -1;
++
++	memset(qm, 0, sizeof(*qm));
++	qm->id = id;
++	qm->index = index;
++
++	rc = ioctl(video_fd, VIDIOC_QUERYMENU, qm);
++	if (rc < 0)
++		return -1;
++
++	return 0;
++}
++
++bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
++			      unsigned int value)
++{
++	struct v4l2_query_ext_ctrl qec;
++	struct v4l2_querymenu qm;
++	long long i;
++
++	if (v4l2_query_ext_ctrl(video_fd, id, &qec) < 0)
++		return false;
++
++	if (qec.type != V4L2_CTRL_TYPE_MENU &&
++	    qec.type != V4L2_CTRL_TYPE_INTEGER_MENU)
++		return false;
++
++	for (i = qec.minimum; i <= qec.maximum; i += qec.step ? qec.step : 1) {
++		if (v4l2_query_menu(video_fd, id, (unsigned int)i, &qm) < 0)
++			continue;
++		if ((unsigned int)i == value)
++			return true;
++	}
++
++	return false;
++}
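The menu walk in v4l2_ctrl_menu_has_value() can be exercised without a device by replacing the two ioctls with a table lookup. The same applies to the probe-then-set decision planned for commit 4. This is a minimal sketch under stated assumptions:

- Everything prefixed `sim_` is hypothetical.
- The hantro-like entry (DECODE_MODE with minimum = maximum = FRAME_BASED) is an illustrative assumption, not a value read from that driver.
- The same is true of the cedrus-like entry (SLICE_BASED only).
- The mode values stand in for `enum v4l2_stateless_h264_decode_mode`.

```c
#include <stdbool.h>

/* Stand-ins for enum v4l2_stateless_h264_decode_mode (verify values). */
#define SIM_DECODE_MODE_SLICE_BASED 0
#define SIM_DECODE_MODE_FRAME_BASED 1

/* What VIDIOC_QUERY_EXT_CTRL + VIDIOC_QUERYMENU would report, as data. */
struct sim_menu_ctrl {
	bool exists;                  /* QUERY_EXT_CTRL succeeds */
	long long minimum;
	long long maximum;
	unsigned long long skip_mask; /* bit i set: QUERYMENU(i) fails */
};

/* Plays the role of v4l2_query_menu(): 0 if menu item `index` exists. */
static int sim_query_menu(const struct sim_menu_ctrl *c, unsigned int index)
{
	if (!c->exists || (long long)index < c->minimum ||
	    (long long)index > c->maximum)
		return -1;
	if (index < 64 && (c->skip_mask & (1ULL << index)))
		return -1;
	return 0;
}

/* Same walk as v4l2_ctrl_menu_has_value(): minimum..maximum, step 1. */
static bool sim_menu_has_value(const struct sim_menu_ctrl *c, unsigned int value)
{
	long long i;

	if (!c->exists)
		return false;
	for (i = c->minimum; i <= c->maximum; i++) {
		if (sim_query_menu(c, (unsigned int)i) < 0)
			continue; /* hole in the menu: keep walking */
		if ((unsigned int)i == value)
			return true;
	}
	return false;
}

/* Probe-then-set: only request FRAME_BASED when the driver lists it. */
static bool sim_should_set_frame_based(const struct sim_menu_ctrl *decode_mode)
{
	return sim_menu_has_value(decode_mode, SIM_DECODE_MODE_FRAME_BASED);
}
```

The `skip_mask` models drivers that leave holes in a menu: QUERYMENU returns EINVAL for entries they do not support. That is why the walk continues past a failed query instead of stopping.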
@@ -0,0 +1,545 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] context: introduce request_pool, decouple OUTPUT buffers from surfaces
+
+Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
+Replaces the prior per-surface OUTPUT-buffer ownership model with a
+small driver-wide pool sized by codec pipeline depth (4 H.264 frames
+in flight) and allocated regardless of the caller's
+num_render_targets.
+
+Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
+v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
+v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
+"output_count == surfaces_count" model was a category error: OUTPUT
+buffers are request-time bitstream slots, CAPTURE buffers are
+picture-time DPB slots; their lifecycles and sizing are independent.
+
+Changes:
+  * NEW src/request_pool.{c,h} (~200 LoC):
+      - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
+      - request_pool_destroy(): munmap all, idempotent.
+      - request_pool_acquire(): round-robin claim; returns V4L2 buffer
+        index of an unused slot or -1.
+      - request_pool_release(): mark slot free for reuse.
+      - request_pool_slot(): accessor for ptr/size given a buffer index.
+
+  * src/request.h: add struct request_pool output_pool to request_data.
+
+  * src/context.c::RequestCreateContext: replace the per-surface
+    OUTPUT loop with a single request_pool_init() call (count=4,
+    independent of surfaces_count). Drop the now-unused locals
+    (length, offset, source_data, output_buffers_count, index,
+    index_base, i, surface_object). DELETES patch 0002's
+    "output_buffers_count = ... ? ... : 4" hack inline — the pool's
+    own count parameter supersedes it.
+
+  * src/picture.c::RequestBeginPicture: borrow a pool slot at frame
+    start, write its mmap pointer/size/index into the surface's
+    transient source_* fields. The fields stay (still useful as
+    a borrow handle that the existing codec_store_buffer memcpys
+    target), but no longer represent surface-permanent ownership.
+    Reset slices_size/slices_count here too (was implicit on first
+    Render).
+
+  * src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
+    the OUTPUT buffer, release the pool slot and clear the surface's
+    borrow handle. Fixes the segv on second-frame submission.
+
+  * src/surface.c::RequestDestroySurfaces: remove the munmap of
+    source_data — pool owns the mmap.
+
+  * src/request.c::RequestTerminate: call request_pool_destroy()
+    before close(video_fd) so the OUTPUT mappings are released before
+    the device fd goes away (munmap itself does not take the fd).
+
+  * src/meson.build: add request_pool.c and request_pool.h to the
+    sources/headers lists.
+
+This commit removes 0002's OUTPUT-pool hack inline (the
+"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
+remains until commit 4 lands.
+
+Build-verified clean on aarch64.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/request.h	2026-05-01 20:09:57.346428828 +0000
+++ b/src/request.h	2026-05-01 20:17:57.497514185 +0000
+@@ -31,6 +31,7 @@
+ 
+ #include "context.h"
+ #include "object_heap.h"
++#include "request_pool.h"
+ #include "video.h"
+ #include <va/va.h>
+ 
+@@ -55,6 +56,13 @@
+ 	int media_fd;
+ 
+ 	struct video_format *video_format;
++
++	/*
++	 * OUTPUT (bitstream-input) buffer pool, decoupled from VA
++	 * surfaces. Sized by codec pipeline depth, populated on first
++	 * RequestCreateContext, torn down at driver Terminate.
++	 */
++	struct request_pool output_pool;
+ };
+ 
+ VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
+--- a/src/request.c	2026-05-01 20:09:57.346428828 +0000
+++ b/src/request.c	2026-05-01 20:19:48.091143681 +0000
+@@ -205,6 +205,13 @@
+ 	struct object_config *config_object;
+ 	int iterator;
+ 
++	/*
++	 * Tear down the OUTPUT buffer pool before closing video_fd. The
++	 * mappings would outlive close() (mmap holds its own reference),
++	 * but unmapping first lets the buffers go when the fd is released.
++	 */
++	request_pool_destroy(&driver_data->output_pool);
++
+ 	close(driver_data->video_fd);
+ 	close(driver_data->media_fd);
+ 
+--- a/src/context.c	2026-05-01 20:09:57.346428828 +0000
+++ b/src/context.c	2026-05-01 20:18:33.738048227 +0000
+@@ -54,20 +54,12 @@
+ {
+ 	struct request_data *driver_data = context->pDriverData;
+ 	struct object_config *config_object;
+-	struct object_surface *surface_object;
+ 	struct object_context *context_object = NULL;
+ 	struct video_format *video_format;
+-	unsigned int length;
+-	unsigned int offset;
+-	void *source_data = MAP_FAILED;
+ 	VASurfaceID *ids = NULL;
+ 	VAContextID id;
+ 	VAStatus status;
+ 	unsigned int output_type, capture_type;
+-	unsigned int output_buffers_count;
+-	unsigned int index_base;
+-	unsigned int index;
+-	unsigned int i;
+ 	int rc;
+ 
+ 	video_format = driver_data->video_format;
+@@ -92,15 +84,20 @@
+ 	memset(&context_object->dpb, 0, sizeof(context_object->dpb));
+ 
+ 	/*
+-	 * The OUTPUT (bitstream-input) queue must be non-empty before
+-	 * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
+-	 * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
+-	 * vaCreateContext with num_render_targets=0; allocate a small
+-	 * minimum pool independent of the caller's surface count.
++	 * Initialize the OUTPUT (bitstream-input) buffer pool. Sized by
++	 * codec pipeline depth (4 H.264 frames in flight is sufficient
++	 * for current hantro/rkvdec scheduling); independent of caller-
++	 * supplied surfaces_count. Pool is owned by driver_data so it
++	 * outlives any single context destroy/recreate cycle.
++	 *
++	 * This replaces the prior per-surface OUTPUT loop, which (a)
++	 * created an empty queue when surfaces_count==0 (ffmpeg vaapi-
++	 * copy path) and (b) only populated surface->source_* for
++	 * surfaces present at vaCreateContext time, NULL-derefing on
++	 * surfaces created later.
+ 	 */
+-	output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
+-	rc = v4l2_create_buffers(driver_data->video_fd, output_type,
+-				 output_buffers_count, &index_base);
++	rc = request_pool_init(&driver_data->output_pool,
++			       driver_data->video_fd, output_type, 4);
+ 	if (rc < 0) {
+ 		status = VA_STATUS_ERROR_ALLOCATION_FAILED;
+ 		goto error;
+@@ -111,40 +108,15 @@
+ 	 * we don't have any indication wrt its life time. Let's make sure
+ 	 * its life span is under our control.
+ 	 */
+-	ids = malloc(surfaces_count * sizeof(VASurfaceID));
+-	if (ids == NULL) {
+-		status = VA_STATUS_ERROR_ALLOCATION_FAILED;
+-		goto error;
+-	}
+-
+-	memcpy(ids, surfaces_ids, surfaces_count * sizeof(VASurfaceID));
+-
+-	for (i = 0; i < surfaces_count; i++) {
+-		index = index_base + i;
+-
+-		surface_object = SURFACE(driver_data, surfaces_ids[i]);
+-		if (surface_object == NULL) {
+-			status = VA_STATUS_ERROR_INVALID_SURFACE;
+-			goto error;
+-		}
+-
+-		rc = v4l2_query_buffer(driver_data->video_fd, output_type,
+-				       index, &length, &offset, 1);
+-		if (rc < 0) {
++	if (surfaces_count > 0) {
++		ids = malloc(surfaces_count * sizeof(VASurfaceID));
++		if (ids == NULL) {
+ 			status = VA_STATUS_ERROR_ALLOCATION_FAILED;
+ 			goto error;
+ 		}
+ 
+-		source_data = mmap(NULL, length, PROT_READ | PROT_WRITE,
+-				   MAP_SHARED, driver_data->video_fd, offset);
+-		if (source_data == MAP_FAILED) {
+-			status = VA_STATUS_ERROR_ALLOCATION_FAILED;
+-			goto error;
+-		}
+-
+-		surface_object->source_index = index;
+-		surface_object->source_data = source_data;
+-		surface_object->source_size = length;
++		memcpy(ids, surfaces_ids,
++		       surfaces_count * sizeof(VASurfaceID));
+ 	}
+ 
+ 	/*
+@@ -200,9 +172,6 @@
+ 	goto complete;
+ 
+ error:
+-	if (source_data != MAP_FAILED)
+-		munmap(source_data, length);
+-
+ 	if (ids != NULL)
+ 		free(ids);
+ 
+--- a/src/picture.c	2026-05-01 20:09:57.346428828 +0000
+++ b/src/picture.c	2026-05-01 20:19:10.742593454 +0000
+@@ -216,6 +216,8 @@
+ 	struct request_data *driver_data = context->pDriverData;
+ 	struct object_context *context_object;
+ 	struct object_surface *surface_object;
++	struct request_pool_slot *slot;
++	int slot_index;
+ 
+ 	context_object = CONTEXT(driver_data, context_id);
+ 	if (context_object == NULL)
+@@ -228,6 +230,31 @@
+ 	if (surface_object->status == VASurfaceRendering)
+ 		RequestSyncSurface(context, surface_id);
+ 
++	/*
++	 * Borrow an OUTPUT (bitstream-input) slot from the driver-wide
++	 * pool for the duration of this Begin/Render/End cycle. The
++	 * surface's source_* fields hold the borrow's mmap pointer/size/
++	 * V4L2 buffer index until RequestSyncSurface releases it after
++	 * VIDIOC_DQBUF.
++	 */
++	slot_index = request_pool_acquire(&driver_data->output_pool);
++	if (slot_index < 0)
++		return VA_STATUS_ERROR_ALLOCATION_FAILED;
++
++	slot = request_pool_slot(&driver_data->output_pool,
++				 (unsigned int)slot_index);
++	if (slot == NULL) {
++		request_pool_release(&driver_data->output_pool,
++				     (unsigned int)slot_index);
++		return VA_STATUS_ERROR_ALLOCATION_FAILED;
++	}
++
++	surface_object->source_index = slot->index;
++	surface_object->source_data = slot->data;
++	surface_object->source_size = slot->size;
++	surface_object->slices_size = 0;
++	surface_object->slices_count = 0;
++
+ 	surface_object->status = VASurfaceRendering;
+ 	context_object->render_surface_id = surface_id;
+ 
+--- a/src/surface.c	2026-05-01 20:09:57.346428828 +0000
+++ b/src/surface.c	2026-05-01 20:19:35.490958060 +0000
+@@ -254,10 +254,11 @@
+ 		if (surface_object == NULL)
+ 			return VA_STATUS_ERROR_INVALID_SURFACE;
+ 
+-		if (surface_object->source_data != NULL &&
+-		    surface_object->source_size > 0)
+-			munmap(surface_object->source_data,
+-			       surface_object->source_size);
++		/*
++		 * source_* are now transient borrows from request_pool, not
++		 * surface-owned mappings; the pool owns the underlying mmap.
++		 * Nothing to free here.
++		 */
+ 
+ 		for (j = 0; j < surface_object->destination_buffers_count; j++)
+ 			if (surface_object->destination_map[j] != NULL &&
+@@ -336,6 +337,15 @@
+ 		goto error;
+ 	}
+ 
++	/*
++	 * OUTPUT buffer is back from the kernel: return its pool slot
++	 * for reuse and clear the surface's transient borrow handle.
++	 */
++	request_pool_release(&driver_data->output_pool,
++			     surface_object->source_index);
++	surface_object->source_data = NULL;
++	surface_object->source_size = 0;
++
+ 	rc = v4l2_dequeue_buffer(driver_data->video_fd, -1, capture_type,
+ 				 surface_object->destination_index,
+ 				 surface_object->destination_buffers_count);
+--- a/src/meson.build	2026-05-01 20:09:57.346428828 +0000
+++ b/src/meson.build	2026-05-01 20:20:04.775389455 +0000
+@@ -44,6 +44,7 @@
+ 	'v4l2.c',
+ 	'mpeg2.c',
+ 	'h264.c',
++	'request_pool.c',
+ #	'h265.c'
+ ]
+ 
+@@ -64,6 +65,7 @@
+ 	'v4l2.h',
+ 	'mpeg2.h',
+ 	'h264.h',
++	'request_pool.h',
+ #	'h265.h'
+ ]
+ 
+--- a/src/request_pool.h	2025-09-03 18:38:22.431999998 +0000
+++ b/src/request_pool.h	2026-05-01 20:17:37.517219722 +0000
+@@ -0,0 +1,84 @@
++/*
++ * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
++ *
++ * Permission is hereby granted, free of charge, to any person obtaining a
++ * copy of this software and associated documentation files (the
++ * "Software"), to deal in the Software without restriction.
++ *
++ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
++ */
++
++#ifndef _REQUEST_POOL_H_
++#define _REQUEST_POOL_H_
++
++#include <stdbool.h>
++
++/*
++ * OUTPUT (bitstream-input) buffer pool, decoupled from caller-allocated
++ * VA surfaces. Sizing is driven by codec pipeline depth (typically 4
++ * for H.264), not by the consumer's surface count.
++ *
++ * The pool owns the V4L2 buffer indices and their mmap pointers. A
++ * decode request "borrows" a slot at vaBeginPicture, fills it across
++ * vaRenderPicture calls, queues it at vaEndPicture, and releases it
++ * after VIDIOC_DQBUF returns.
++ *
++ * This replaces the per-surface OUTPUT-buffer ownership model in the
++ * pre-refactor code, where object_surface.source_* fields permanently
++ * held a single OUTPUT buffer per surface — incorrect because OUTPUT
++ * buffers are request-time resources, not picture-time resources, and
++ * because the per-surface loop in RequestCreateContext only ran when
++ * surfaces_count > 0 (breaking ffmpeg's vaapi-copy num_render_targets=0
++ * convention).
++ */
++
++struct request_pool_slot {
++	unsigned int	index;		/* V4L2 buffer index in OUTPUT queue */
++	void		*data;		/* mmap pointer for this slot */
++	unsigned int	size;		/* mmap size in bytes */
++	bool		busy;		/* true while borrowed for a request */
++};
++
++struct request_pool {
++	struct request_pool_slot	*slots;
++	unsigned int			 count;
++	unsigned int			 next;	/* round-robin acquire cursor */
++	bool				 initialized;
++};
++
++/*
++ * Allocate count OUTPUT buffers via VIDIOC_CREATE_BUFS, query and mmap
++ * each, populate pool->slots[]. Caller must have already done
++ * VIDIOC_S_FMT on the OUTPUT queue. Returns 0 on success, -1 on
++ * failure.
++ */
++int request_pool_init(struct request_pool *pool, int video_fd,
++		      unsigned int output_type, unsigned int count);
++
++/*
++ * Munmap all slots and free the slots array. Idempotent.
++ */
++void request_pool_destroy(struct request_pool *pool);
++
++/*
++ * Claim the next free slot (round-robin). Returns the slot's V4L2
++ * buffer index on success (map it back to its slot with
++ * request_pool_slot()), or -1 if all slots are busy.
++ */
++int request_pool_acquire(struct request_pool *pool);
++
++/*
++ * Mark the slot owning V4L2 buffer index `index` free for reuse. The
++ * caller must pass an index returned earlier by request_pool_acquire().
++ */
++void request_pool_release(struct request_pool *pool, unsigned int index);
++
++/*
++ * Look up the pool slot owning a given V4L2 buffer index. Returns
++ * a pointer to the slot on success, NULL if no slot owns that index.
++ * The returned pointer is valid until pool destruction; do not free.
++ */
++struct request_pool_slot *request_pool_slot(struct request_pool *pool,
++					    unsigned int index);
++
++#endif
+--- a/src/request_pool.c	2025-09-03 18:38:22.431999998 +0000
+++ b/src/request_pool.c	2026-05-01 20:17:37.537220017 +0000
+@@ -0,0 +1,147 @@
++/*
++ * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
++ *
++ * Permission is hereby granted, free of charge, to any person obtaining a
++ * copy of this software and associated documentation files (the
++ * "Software"), to deal in the Software without restriction.
++ *
++ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
++ */
++
++#include "request_pool.h"
++
++#include <errno.h>
++#include <stdlib.h>
++#include <string.h>
++#include <sys/mman.h>
++
++#include "utils.h"
++#include "v4l2.h"
++
++int request_pool_init(struct request_pool *pool, int video_fd,
++		      unsigned int output_type, unsigned int count)
++{
++	unsigned int index_base;
++	unsigned int length;
++	unsigned int offset;
++	unsigned int i;
++	int rc;
++
++	if (pool == NULL || count == 0)
++		return -1;
++
++	if (pool->initialized)
++		return 0;
++
++	pool->slots = calloc(count, sizeof(*pool->slots));
++	if (pool->slots == NULL)
++		return -1;
++
++	pool->count = count;
++	pool->next = 0;
++
++	rc = v4l2_create_buffers(video_fd, output_type, count, &index_base);
++	if (rc < 0)
++		goto error;
++
++	for (i = 0; i < count; i++) {
++		pool->slots[i].index = index_base + i;
++		pool->slots[i].busy = false;
++
++		rc = v4l2_query_buffer(video_fd, output_type,
++				       pool->slots[i].index,
++				       &length, &offset, 1);
++		if (rc < 0)
++			goto error;
++
++		pool->slots[i].data = mmap(NULL, length,
++					   PROT_READ | PROT_WRITE,
++					   MAP_SHARED, video_fd, offset);
++		if (pool->slots[i].data == MAP_FAILED) {
++			pool->slots[i].data = NULL;
++			goto error;
++		}
++
++		pool->slots[i].size = length;
++	}
++
++	pool->initialized = true;
++	return 0;
++
++error:
++	request_pool_destroy(pool);
++	return -1;
++}
++
++void request_pool_destroy(struct request_pool *pool)
++{
++	unsigned int i;
++
++	if (pool == NULL || pool->slots == NULL)
++		return;
++
++	for (i = 0; i < pool->count; i++) {
++		if (pool->slots[i].data != NULL && pool->slots[i].size > 0)
++			munmap(pool->slots[i].data, pool->slots[i].size);
++	}
++
++	free(pool->slots);
++	pool->slots = NULL;
++	pool->count = 0;
++	pool->next = 0;
++	pool->initialized = false;
++}
++
++int request_pool_acquire(struct request_pool *pool)
++{
++	unsigned int start;
++	unsigned int i;
++
++	if (pool == NULL || !pool->initialized || pool->count == 0)
++		return -1;
++
++	start = pool->next;
++	for (i = 0; i < pool->count; i++) {
++		unsigned int slot = (start + i) % pool->count;
++
++		if (!pool->slots[slot].busy) {
++			pool->slots[slot].busy = true;
++			pool->next = (slot + 1) % pool->count;
++			return (int)pool->slots[slot].index;
++		}
++	}
++
++	/* All slots busy; caller must wait for an in-flight DQBUF. */
++	return -1;
++}
++
++void request_pool_release(struct request_pool *pool, unsigned int index)
++{
++	unsigned int i;
++
++	if (pool == NULL || pool->slots == NULL)
++		return;
++
++	for (i = 0; i < pool->count; i++) {
++		if (pool->slots[i].index == index) {
++			pool->slots[i].busy = false;
++			return;
++		}
++	}
++}
++
++struct request_pool_slot *request_pool_slot(struct request_pool *pool,
++					    unsigned int index)
++{
++	unsigned int i;
++
++	if (pool == NULL || pool->slots == NULL)
++		return NULL;
++
++	for (i = 0; i < pool->count; i++) {
++		if (pool->slots[i].index == index)
++			return &pool->slots[i];
++	}
++
++	return NULL;
++}
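The borrow/return protocol can be checked in isolation by dropping the V4L2 and mmap parts. This is a hypothetical, hardware-free reduction of request_pool: the `sim_pool_*` names are illustrative and are not functions in the patch. Two details model the real driver:

- `index_base = 3` stands for VIDIOC_CREATE_BUFS handing back a non-zero starting index when the queue already holds buffers. This is why release matches on the V4L2 index rather than on the array position.
- The depth of 4 matches the count the patch passes to request_pool_init().

```c
#include <stdbool.h>

#define SIM_POOL_DEPTH 4 /* pipeline depth used by this patch */

struct sim_slot {
	unsigned int index; /* V4L2 OUTPUT buffer index */
	bool busy;          /* borrowed between BeginPicture and SyncSurface */
};

struct sim_pool {
	struct sim_slot slots[SIM_POOL_DEPTH];
	unsigned int next;  /* round-robin cursor */
};

static void sim_pool_init(struct sim_pool *p, unsigned int index_base)
{
	unsigned int i;

	for (i = 0; i < SIM_POOL_DEPTH; i++) {
		p->slots[i].index = index_base + i;
		p->slots[i].busy = false;
	}
	p->next = 0;
}

/* BeginPicture: claim the next free slot, or -1 if all are in flight. */
static int sim_pool_acquire(struct sim_pool *p)
{
	unsigned int i;

	for (i = 0; i < SIM_POOL_DEPTH; i++) {
		unsigned int s = (p->next + i) % SIM_POOL_DEPTH;

		if (!p->slots[s].busy) {
			p->slots[s].busy = true;
			p->next = (s + 1) % SIM_POOL_DEPTH;
			return (int)p->slots[s].index;
		}
	}
	return -1;
}

/* SyncSurface: after DQBUF, hand the slot back by its V4L2 index. */
static void sim_pool_release(struct sim_pool *p, unsigned int index)
{
	unsigned int i;

	for (i = 0; i < SIM_POOL_DEPTH; i++) {
		if (p->slots[i].index == index) {
			p->slots[i].busy = false;
			return;
		}
	}
}
```

Suppose four frames are still un-synced. A fifth BeginPicture then gets -1, which this patch maps to VA_STATUS_ERROR_ALLOCATION_FAILED. The caller has to sync an in-flight surface before the slot is reused.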
@@ -0,0 +1,61 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] h264: submit PRED_WEIGHTS only when WEIGHTED_PRED applies
+
+Per kernel UAPI (include/uapi/linux/v4l2-controls.h),
+V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control:
+
+    V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) :=
+        ((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
+         (slice_type == P || slice_type == SP)) ||
+        (pps->weighted_bipred_idc == 1 && slice_type == B)
+
+Submitting PRED_WEIGHTS on a frame where the macro evaluates false
+makes VIDIOC_S_EXT_CTRLS fail with EINVAL at error_idx=5 (the 6th
+and last control in the per-request batch) on hantro-vpu and any
+other driver that strictly enforces the spec.
+
+Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no
+weighted prediction): every per-frame batch fails identically, 13
+EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy
+falls back to software decode for every frame.
+
+Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5)
+when the macro returns false; keep at 6 when it returns true.
+
+Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not
+part of Sonnet's six-commit upstreamable plan; slotted in as patch
+0005 ahead of the planned probe-then-set / FRAME_BASED commits
+because it unblocks per-frame submission on every backing driver,
+not just hantro.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c	2026-05-01 20:17:02.108697824 +0000
+++ b/src/h264.c	2026-05-01 20:30:02.632190563 +0000
+@@ -559,8 +559,24 @@
+ 		}
+ 	};
+ 
+	/*
+	 * PRED_WEIGHTS is conditionally required per kernel UAPI:
+	 * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
+	 * true when explicit weighted prediction applies (P/SP slice
+	 * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
+	 * == 1). Submitting it unconditionally on a frame that does
+	 * not need it triggers EINVAL at error_idx=5 on hantro and
+	 * other drivers that strictly enforce the spec.
+	 *
+	 * controls[5] is PRED_WEIGHTS (last in array); narrow the
+	 * submission count to exclude it when not required.
+	 */
+	unsigned int num_controls = 6;
+	if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
+		num_controls = 5;
+
+ 	rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
+-			       controls, 6);
+			       controls, num_controls);
+ 	if (rc < 0)
+ 		return VA_STATUS_ERROR_OPERATION_FAILED;
+ 
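The PRED_WEIGHTS rule in the patch above can be sketched in plain C. This is a minimal, self-contained illustration, not the driver code. `pps_lite`, `slice_lite`, `PPS_FLAG_WEIGHTED_PRED` and `batch_size` are hypothetical stand-ins that carry only the fields `V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED` reads. The slice-type values are assumed to follow H.264 `slice_type % 5` (P=0, B=1, I=2, SP=3, SI=4), and the flag value is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the fields the kernel macro inspects. */
enum slice_type { SLICE_P = 0, SLICE_B = 1, SLICE_I = 2, SLICE_SP = 3, SLICE_SI = 4 };
#define PPS_FLAG_WEIGHTED_PRED 0x0004 /* illustrative; use the UAPI define in real code */

struct pps_lite { uint16_t flags; uint8_t weighted_bipred_idc; };
struct slice_lite { uint8_t slice_type; };

/* Same condition as V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice). */
bool pred_weights_required(const struct pps_lite *pps, const struct slice_lite *slice)
{
	bool explicit_p = (pps->flags & PPS_FLAG_WEIGHTED_PRED) &&
			  (slice->slice_type == SLICE_P || slice->slice_type == SLICE_SP);
	bool explicit_b = pps->weighted_bipred_idc == 1 &&
			  slice->slice_type == SLICE_B;

	return explicit_p || explicit_b;
}

/* PRED_WEIGHTS is the last of 6 entries, so it is dropped by shrinking the count. */
unsigned int batch_size(const struct pps_lite *pps, const struct slice_lite *slice)
{
	return pred_weights_required(pps, slice) ? 6 : 5;
}
```

Because PRED_WEIGHTS sits at the tail of the array, omitting it needs no reordering; only the count passed to `v4l2_set_controls()` changes.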
@@ -0,0 +1,128 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] h264: omit per-slice controls in FRAME_BASED mode
+
+Identified by cross-reference against GStreamer's
+gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit
+9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and
+PRED_WEIGHTS submission on is_slice_based(self):
+
+    if (is_slice_based (self)) {
+        control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
+        ...
+        control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
+        ...
+    }
+
+In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the
+bitstream itself from the OUTPUT-queue payload; per-slice controls in
+the request trigger cluster-validation EINVAL at error_idx=count
+(observed on RK3568 hantro-vpu, kernel 6.19.10).
+
+This patch:
+  - Reorders controls[] so FRAME_BASED-required entries come first
+    (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the
+    SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at
+    indices 4..5).
+  - Defaults num_controls=4 (FRAME_BASED), expanding to 5 for
+    SLICE_BASED, or 6 for SLICE_BASED when
+    V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED also holds.
+  - Hardcodes slice_based=false for now since patch 0002 sets the
+    device to FRAME_BASED unconditionally. A TODO marks the spot for
+    the planned probe-then-set commit, which will populate
+    context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/
+    G_EXT_CTRLS and replace the hardcoded false with a runtime check.
+
+Diagnosis chain:
+  - patch 0005 removed the per-frame EINVAL on PRED_WEIGHTS
+    submission, but cluster-level rejection persisted at error_idx=5
+    (= count), meaning the kernel walked all 5 controls cleanly but
+    rejected the request as a whole.
+  - dmesg was silent, placing the rejection in the V4L2 core
+    (v4l2-ctrls-request.c / v4l2-h264.c) rather than in the hantro
+    driver, where it would have been logged.
+  - GStreamer reference confirmed FRAME_BASED contract: only 4
+    sequence-and-frame-level controls go in the per-request batch.
+
+After this patch the kernel should accept the per-request controls
+and actually decode the bitstream into the CAPTURE buffer.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c	2026-05-01 20:30:02.632190563 +0000
+++ b/src/h264.c	2026-05-01 20:49:46.937497317 +0000
+@@ -531,6 +531,21 @@
+ 
+ 	sps.profile_idc = h264_profile_to_idc(profile);
+ 
+	/*
+	 * Per-request control batch, ordered so the controls REQUIRED in
+	 * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
+	 * (indices 0..3) and the SLICE_BASED-only controls come last
+	 * (indices 4..5).
+	 *
+	 * Cross-reference: GStreamer gst-plugins-bad
+	 * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
+	 * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
+	 * is_slice_based(self); under FRAME_BASED only SPS/PPS/
+	 * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
+	 * parses the bitstream itself in FRAME_BASED mode; submitting
+	 * per-slice controls in that mode triggers cluster-validation
+	 * EINVAL at error_idx=count.
+	 */
+ 	struct v4l2_ext_control controls[6] = {
+ 		{
+ 			.id = V4L2_CID_STATELESS_H264_SPS,
+@@ -545,14 +560,14 @@
+ 			.p_h264_scaling_matrix = &matrix,
+ 			.size = sizeof(matrix),
+ 		}, {
+-			.id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
+-			.p_h264_slice_params = &slice,
+-			.size = sizeof(slice),
+-		}, {
+ 			.id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
+ 			.p_h264_decode_params = &decode,
+ 			.size = sizeof(decode),
+ 		}, {
+			.id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
+			.p_h264_slice_params = &slice,
+			.size = sizeof(slice),
+		}, {
+ 			.id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
+ 			.ptr = &weights,
+ 			.size = sizeof(weights),
+@@ -560,20 +575,24 @@
+ 	};
+ 
+ 	/*
+-	 * PRED_WEIGHTS is conditionally required per kernel UAPI:
+-	 * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
+-	 * true when explicit weighted prediction applies (P/SP slice
+-	 * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
+-	 * == 1). Submitting it unconditionally on a frame that does
+-	 * not need it triggers EINVAL at error_idx=5 on hantro and
+-	 * other drivers that strictly enforce the spec.
+	 * Decode-mode dispatch. Patch 0002 unconditionally sets the
+	 * device to FRAME_BASED, so we hardcode that here. When the
+	 * planned probe-then-set commit lands, slice_based becomes
+	 *     context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
+	 * with context->decode_mode populated at CreateContext via
+	 * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
+ 	 *
+-	 * controls[5] is PRED_WEIGHTS (last in array); narrow the
+-	 * submission count to exclude it when not required.
+	 * FRAME_BASED:    4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
+	 * SLICE_BASED:   +SLICE_PARAMS (always), +PRED_WEIGHTS (when
+	 *                V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
+ 	 */
+-	unsigned int num_controls = 6;
+-	if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
+	const bool slice_based = false; /* TODO: probe via context->decode_mode */
+	unsigned int num_controls = 4;
+	if (slice_based) {
+ 		num_controls = 5;
+		if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
+			num_controls = 6;
+	}
+ 
+ 	rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
+ 			       controls, num_controls);
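The decode-mode dispatch in the FRAME_BASED patch reduces to a small count function. A sketch under the index layout that patch establishes (0 SPS, 1 PPS, 2 SCALING_MATRIX, 3 DECODE_PARAMS, 4 SLICE_PARAMS, 5 PRED_WEIGHTS). `h264_num_controls` is a hypothetical name, and `slice_based` stands in for the future `context->decode_mode` check.

```c
#include <stdbool.h>

/*
 * Number of leading controls[] entries to submit, assuming the
 * FRAME_BASED-first ordering described in the patch.
 */
unsigned int h264_num_controls(bool slice_based, bool pred_weights_required)
{
	unsigned int n = 4; /* FRAME_BASED: SPS, PPS, SCALING_MATRIX, DECODE_PARAMS */

	if (slice_based) {
		n = 5; /* + SLICE_PARAMS */
		if (pred_weights_required)
			n = 6; /* + PRED_WEIGHTS */
	}
	return n;
}
```

Keeping the SLICE_BASED-only controls at the tail is what lets a single prefix length express every case.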
@@ -0,0 +1,59 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] context: enable ANNEX_B start-code emission to match device
+
+Patch 0002 sets V4L2_CID_STATELESS_H264_START_CODE to ANNEX_B on the
+device, telling the kernel that OUTPUT-buffer payloads will contain
+0x00 0x00 0x01 NAL start codes. picture.c::codec_store_buffer has
+the prepend logic guarded by `if (context->h264_start_code)`, but
+that boolean is set ONLY inside h264_get_controls() — a function
+that exists but is never called.
+
+Result: device expects ANNEX_B, libva-v4l2-request feeds raw NAL
+payloads with no start codes, kernel cannot find slice boundaries,
+hantro emits a zeroed CAPTURE buffer. mpv reports successful decode
+because the V4L2 round-trip succeeds (no EINVAL); the visual output
+is a flat dark-green frame (NV12 zero through BT.709).
+
+Identified via:
+  - Patch 0006 cleared the EINVAL cluster-rejection (128 → 0 on
+    bbb_1080p30) but visual output remained flat green.
+  - GStreamer reference (gstv4l2codech264dec.c:1363-1377) confirms
+    start codes are required when ANNEX_B is selected.
+  - Source-archaeology of fourier's picture.c:67-74 showed the gate
+    on context->h264_start_code.
+
+Fix: in context.c::RequestCreateContext, immediately after patch
+0002's device-control block, set context_object->h264_start_code =
+true to match the ANNEX_B mode we just programmed. Hardcoded for
+now (matches 0002's hardcoded set); replaced with a runtime probe
+in the planned probe-then-set commit.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/context.c	2026-05-01 20:48:59.884816330 +0000
+++ b/src/context.c	2026-05-01 20:59:54.446340219 +0000
+@@ -146,6 +146,23 @@
+ 					dev_ctrls, 2);
+ 	}
+ 
+	/*
+	 * Mirror the ANNEX_B start-code mode set on the device above
+	 * into context_object->h264_start_code so picture.c::
+	 * codec_store_buffer prepends 0x00 0x00 0x01 to each slice
+	 * payload it copies into the OUTPUT buffer. Without this, the
+	 * kernel — which we just told to expect ANNEX_B — sees a raw
+	 * NAL stream with no start codes, fails to find slice
+	 * boundaries, and emits a zeroed CAPTURE buffer (visually a
+	 * flat dark-green frame).
+	 *
+	 * h264_get_controls() exists for this purpose but is never
+	 * called in the current code path; the planned probe-then-set
+	 * commit will replace this hardcoded assignment with a runtime
+	 * read of the kernel's accepted START_CODE value.
+	 */
+	context_object->h264_start_code = true;
+
+ 	rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
+ 	if (rc < 0) {
+ 		status = VA_STATUS_ERROR_OPERATION_FAILED;
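The start-code prepend that `context->h264_start_code` gates can be illustrated with a standalone helper. This is an assumed shape, not `codec_store_buffer` itself: `append_nal` is hypothetical, and it emits the 3-byte `00 00 01` start code described above (Annex B also permits a 4-byte `00 00 00 01` form).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Append one NAL unit to an OUTPUT-style buffer, optionally prefixed with
 * a 3-byte Annex B start code. Returns false (and changes nothing) if the
 * data would not fit in the remaining capacity.
 */
bool append_nal(unsigned char *dst, size_t cap, size_t *used,
		const unsigned char *nal, size_t nal_len, bool start_code)
{
	static const unsigned char sc[3] = { 0x00, 0x00, 0x01 };
	size_t need = nal_len + (start_code ? sizeof(sc) : 0);

	if (*used > cap || cap - *used < need)
		return false;
	if (start_code) {
		memcpy(dst + *used, sc, sizeof(sc));
		*used += sizeof(sc);
	}
	memcpy(dst + *used, nal, nal_len);
	*used += nal_len;
	return true;
}
```

With `start_code` false the buffer holds bare NAL payloads, which is exactly the mismatch the patch describes once the device has been told to expect ANNEX_B.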
@@ -0,0 +1,87 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] h264: fill DECODE_PARAMS frame_num + field flags from VAAPI
+
+Fourier's h264_va_picture_to_v4l2 only populated four fields of the
+struct v4l2_ctrl_h264_decode_params: dpb (via h264_fill_dpb),
+nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt, and the
+IDR_PIC flag. Many other required-by-spec fields were left at zero-
+init (frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
+dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
+slice_group_change_cycle, FIELD_PIC and BOTTOM_FIELD flags).
+
+For an IDR (first frame) on hantro-vpu RK3568, the kernel parses
+the bitstream from the OUTPUT buffer and uses these fields to drive
+its bitstream-element offset tracking. Empirically the kernel
+returned a successfully-decoded but ZEROED CAPTURE buffer — flat
+dark-green frames in mpv output, no errors logged.
+
+This patch fills every field VAAPI exposes:
+
+  - frame_num: from VAPicture->frame_num.
+  - FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag.
+  - BOTTOM_FIELD flag: from
+    VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD.
+
+Also corrects the IDR_PIC flag to use |= instead of = so the new
+field flags don't clobber it.
+
+Fields NOT derivable from VAAPI's pre-parsed structures —
+idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
+dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
+slice_group_change_cycle — require a slice_header() bit-level
+parse. libva-v4l2-request does not currently do this. They remain
+at zero-init.
+
+Empirical question this patch answers: does hantro tolerate the
+bit_size fields being zero for IDR frames, or does it strictly
+require them? If post-patch CAPTURE is still zeroed, a slice-header
+parser is required. If CAPTURE shows real picture data, hantro
+fills in the bit-positions itself when no hint is supplied.
+
+Cross-reference: gstv4l2codech264dec.c::
+gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775,
+lines 632-678).
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c	2026-05-01 20:59:41.710154198 +0000
+++ b/src/h264.c	2026-05-01 21:16:35.712995986 +0000
+@@ -243,13 +243,34 @@
+ 
+ 	h264_fill_dpb(driver_data, context, decode);
+ 
+-	//decode->num_slices = surface->slices_count;
+	/*
+	 * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
+	 * we can derive from VAAPI's pre-parsed VAPictureParameterBuffer
+	 * + bitstream byte. Cross-reference: GStreamer
+	 * gstv4l2codech264dec.c::gst_v4l2_codec_h264_dec_fill_decoder_params
+	 * (lines 632-678).
+	 *
+	 * Fields not derivable from VAAPI (idr_pic_id, pic_order_cnt_lsb,
+	 * delta_pic_order_cnt_*, dec_ref_pic_marking_bit_size,
+	 * pic_order_cnt_bit_size, slice_group_change_cycle) require a
+	 * full slice_header() bit-level parse, which libva-v4l2-request
+	 * does not currently do. They are left at zero-init and the
+	 * kernel-side hantro-vpu may compute them itself when scanning
+	 * the OUTPUT bitstream, a hypothesis to be checked empirically by
+	 * running this patch and inspecting the CAPTURE buffer.
+	 */
+ 	decode->nal_ref_idc = nal_ref_idc;
+-	if (nal_unit_type == 5)
+-		decode->flags = V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
+	decode->frame_num = VAPicture->frame_num;
+ 	decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
+ 	decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
+ 
+	if (nal_unit_type == 5)
+		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
+	if (VAPicture->pic_fields.bits.field_pic_flag)
+		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC;
+	if (VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD)
+		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD;
+
+ 	pps->weighted_bipred_idc =
+ 		VAPicture->pic_fields.bits.weighted_bipred_idc;
+ 	pps->pic_init_qs_minus26 = VAPicture->pic_init_qs_minus26;
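The flag assembly from the patch above, isolated so the `=` versus `|=` point is visible. `decode_flags` and the `DEC_FLAG_*` / `VA_BOTTOM_FIELD` constants are local stand-ins whose values are assumed to match `V4L2_H264_DECODE_PARAM_FLAG_*` in the kernel UAPI and `VA_PICTURE_H264_BOTTOM_FIELD` in va.h; driver code should include the real headers instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; values assumed to match the kernel UAPI and va.h. */
#define DEC_FLAG_IDR_PIC      0x01 /* V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC */
#define DEC_FLAG_FIELD_PIC    0x02 /* V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC */
#define DEC_FLAG_BOTTOM_FIELD 0x04 /* V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD */
#define VA_BOTTOM_FIELD       0x04 /* VA_PICTURE_H264_BOTTOM_FIELD */

/* Start from zero (like the zero-initialised struct) and OR in each bit. */
uint32_t decode_flags(unsigned int nal_unit_type, bool field_pic_flag,
		      uint32_t curr_pic_va_flags)
{
	uint32_t flags = 0;

	if (nal_unit_type == 5)
		flags |= DEC_FLAG_IDR_PIC;
	if (field_pic_flag)
		flags |= DEC_FLAG_FIELD_PIC;
	if (curr_pic_va_flags & VA_BOTTOM_FIELD)
		flags |= DEC_FLAG_BOTTOM_FIELD;
	return flags;
}
```

Each condition contributes its own bit, so the order of the checks does not matter; a plain `=` on any of them would discard the bits set before it.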
@@ -0,0 +1,58 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] surface: don't VIDIOC_S_FMT the CAPTURE queue
+
+The hantro stateless decoder derives the CAPTURE format from the
+SPS attached to the per-request OUTPUT controls. Calling
+VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can
+leave the driver's vb2 state in an inconsistent configuration
+where the queue accepts buffers and DQBUF returns successfully but
+the kernel never actually writes decoded pixels into them.
+
+Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/
+gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side
+(via gst_v4l2_decoder_negotiate_src_format and friends). The
+same code path produces correctly-decoded NV12 frames on the
+same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT
+emits flat-green zeroed CAPTURE buffers.
+
+The v4l2_get_format() call immediately after this block already
+gives us the bytesperline / sizes the driver chose; nothing else
+in this file consumed the explicit S_FMT side-effects.
+
+Empirical hypothesis test for the lingering "kernel decodes
+without errors but emits zeroed CAPTURE" bug. If post-patch
+output shows actual picture content, this confirms the
+diagnosis: explicit CAPTURE format mutation breaks hantro's
+internal state. If output remains flat-green, the bug is
+elsewhere and we resume hex-dump-grade instrumentation.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/surface.c	2026-05-01 21:16:19.588759711 +0000
+++ b/src/surface.c	2026-05-01 21:41:12.095146549 +0000
+@@ -118,10 +118,20 @@
+ 
+ 		capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
+ 
+-		rc = v4l2_set_format(driver_data->video_fd, capture_type,
+-				     video_format->v4l2_format, width, height);
+-		if (rc < 0)
+-			return VA_STATUS_ERROR_OPERATION_FAILED;
+		/*
+		 * Do not VIDIOC_S_FMT on the CAPTURE queue. The hantro
+		 * stateless decoder derives the CAPTURE format from the
+		 * SPS attached to the OUTPUT request; explicitly setting
+		 * it here can put the driver into an inconsistent state.
+		 * GStreamer's v4l2slh264dec only G_FMTs CAPTURE (see
+		 * gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c::
+		 * gst_v4l2_decoder_negotiate_src_format), and that
+		 * variant produces correct decoded NV12 on the same
+		 * hardware where this driver currently emits zeros.
+		 *
+		 * v4l2_get_format() below queries the driver's current
+		 * state and gives us the bytesperline/sizes we need.
+		 */
+ 	} else {
+ 		video_format = driver_data->video_format;
+ 		capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
@@ -0,0 +1,101 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame
+
+Diagnostic-only patch (NOT for upstream). Hex-dumps:
+  - First 32 bytes of OUTPUT buffer at QBUF time in
+    picture.c::RequestEndPicture (i.e. what we feed the kernel)
+  - First 32 bytes of CAPTURE Y-plane after DQBUF in
+    surface.c::RequestSyncSurface (i.e. what kernel returned)
+
+Lets us see whether:
+  - OUTPUT bitstream begins with valid ANNEX_B start code + NAL
+    header byte (e.g. `00 00 01 65` for IDR slice)
+  - CAPTURE Y-plane after decode contains varied luma data
+    (working) vs. all-zeros / repeating pattern (kernel didn't
+    write anything).
+
+Removed once Step 1 decode is verified working. Output goes via
+existing request_log() to stderr.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/picture.c	2026-05-01 21:41:00.114969150 +0000
+++ b/src/picture.c	2026-05-01 21:50:11.123117853 +0000
+@@ -36,6 +36,7 @@
+ #include "mpeg2.h"
+ 
+ #include <assert.h>
+#include <stdio.h>
+ #include <string.h>
+ 
+ #include <errno.h>
+@@ -354,6 +355,27 @@
+ 	if (rc < 0)
+ 		return VA_STATUS_ERROR_OPERATION_FAILED;
+ 
+	/*
+	 * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
+	 * OUTPUT buffer at the moment we hand it to the kernel. Helps
+	 * pin down whether our bitstream prepend logic is correct.
+	 * For a valid ANNEX_B IDR slice the dump should start
+	 * 00 00 01 65 ... (00 00 01 = start code; 0x65 = nal_ref_idc=3,
+	 * nal_unit_type=5 = IDR slice). Removed once Step 1 decode is
+	 * verified working.
+	 */
+	{
+		const unsigned char *p = surface_object->source_data;
+		char hex[32 * 3 + 1] = { 0 };
+		unsigned int i, n = surface_object->slices_size < 32 ?
+				    surface_object->slices_size : 32;
+		for (i = 0; i < n; i++)
+			snprintf(hex + i * 3, 4, " %02x", p[i]);
+		request_log("OUTPUT[idx=%u, len=%u]:%s\n",
+			    surface_object->source_index,
+			    surface_object->slices_size, hex);
+	}
+
+ 	rc = v4l2_queue_buffer(driver_data->video_fd, request_fd, output_type,
+ 			       &surface_object->timestamp,
+ 			       surface_object->source_index,
+--- a/src/surface.c	2026-05-01 21:41:12.095146549 +0000
+++ b/src/surface.c	2026-05-01 21:50:15.895188360 +0000
+@@ -29,6 +29,7 @@
+ 
+ #include <assert.h>
+ #include <errno.h>
+#include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+ #include <unistd.h>
+@@ -364,6 +365,30 @@
+ 		goto error;
+ 	}
+ 
+	/*
+	 * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
+	 * decoded CAPTURE Y-plane after DQBUF. If the kernel actually
+	 * decoded the frame, these should reflect a real Y-luma pattern
+	 * (varied bytes). All-zero or all-identical means no decode
+	 * landed pixels in the buffer. Removed once Step 1 is verified.
+	 */
+	{
+		const unsigned char *p =
+			(unsigned char *)surface_object->destination_map[0];
+		char hex[32 * 3 + 1] = { 0 };
+		unsigned int i;
+		if (p == NULL) {
+			request_log("CAPTURE[idx=%u, plane0]: (NULL)\n",
+				    surface_object->destination_index);
+		} else {
+			for (i = 0; i < 32; i++)
+				snprintf(hex + i * 3, 4, " %02x", p[i]);
+			request_log("CAPTURE[idx=%u, plane0]:%s\n",
+				    surface_object->destination_index,
+				    hex);
+		}
+	}
+
+ 	surface_object->status = VASurfaceDisplaying;
+ 
+ 	status = VA_STATUS_SUCCESS;
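The expected `00 00 01 65` prefix in the OUTPUT dump can be checked mechanically. A sketch, assuming a 3-byte start code at the head of the buffer: `hex32` reproduces the dump format used by the instrumentation, and `parse_annexb_nal_header` (hypothetical) splits the byte after the start code into forbidden_zero_bit, nal_ref_idc and nal_unit_type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to 32 bytes as " xx xx ..."; out must hold 32 * 3 + 1 bytes. */
size_t hex32(const unsigned char *p, size_t len, char *out)
{
	size_t i, n = len < 32 ? len : 32;

	out[0] = '\0';
	for (i = 0; i < n; i++)
		snprintf(out + i * 3, 4, " %02x", p[i]);
	return n * 3;
}

/*
 * Expect 00 00 01, then decode the NAL header byte:
 * forbidden_zero_bit (1) | nal_ref_idc (2) | nal_unit_type (5).
 */
bool parse_annexb_nal_header(const unsigned char *p, size_t len,
			     unsigned int *nal_ref_idc, unsigned int *nal_unit_type)
{
	if (len < 4 || p[0] != 0x00 || p[1] != 0x00 || p[2] != 0x01)
		return false;
	if (p[3] & 0x80) /* forbidden_zero_bit must be 0 */
		return false;
	*nal_ref_idc = (p[3] >> 5) & 0x03;
	*nal_unit_type = p[3] & 0x1f;
	return true;
}
```

For `0x65` this yields nal_ref_idc 3 and nal_unit_type 5 (IDR slice), matching the annotation in the instrumentation comment; a dump that starts directly with `65` fails the check, which is the missing-start-code symptom.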
@@ -0,0 +1,53 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] DEBUG: sentinel-pattern test for CAPTURE buffer write
+
+Diagnostic-only. Writes 0xab×32 into the CAPTURE buffer's first 32
+bytes immediately before VIDIOC_QBUF. The 0010 hex-dump after
+DQBUF reveals which case we're in:
+
+  - All 0xab → kernel never wrote this region: wrong buffer chosen,
+    aliasing, no decode at all despite bytesused=3655712 being
+    reported, or an incomplete decode that filled other parts of
+    the buffer but not the first 32 bytes.
+  - All zeros → kernel did write 0x00s (overwriting our
+    sentinel), and the apparent "no picture" output is the
+    kernel-side decode actually producing zeros (e.g. parser
+    rejected the bitstream).
+  - Mix of zeros and real luma values → kernel wrote real
+    decoded pixels; either the CPU read sees stale-cached zeros
+    somewhere, or the sentinel area was a header that the decoder
+    zeroed while the rest is real. Need to check more bytes.
+
+Removed once Step 1 decode is verified.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/picture.c	2026-05-01 21:50:11.123117853 +0000
+++ b/src/picture.c	2026-05-01 22:20:20.589037667 +0000
+@@ -349,6 +349,24 @@
+ 	if (rc != VA_STATUS_SUCCESS)
+ 		return rc;
+ 
+	/*
+	 * DEBUG INSTRUMENTATION (0011): write a sentinel pattern into
+	 * the CAPTURE buffer's first 32 bytes BEFORE QBUF. If after
+	 * DQBUF the sentinel survives (per surface.c hex dump), the
+	 * kernel never wrote this region. If it was replaced by zeros,
+	 * the kernel did write and the decode itself produced zeros; a
+	 * mix of zeros and luma points at stale cache or a partial write.
+	 */
+	{
+		unsigned char *p = (unsigned char *)
+			surface_object->destination_map[0];
+		if (p != NULL) {
+			unsigned int i;
+			for (i = 0; i < 32; i++)
+				p[i] = 0xab;
+		}
+	}
+
+ 	rc = v4l2_queue_buffer(driver_data->video_fd, -1, capture_type, NULL,
+ 			       surface_object->destination_index, 0,
+ 			       surface_object->destination_buffers_count);
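The case analysis in the sentinel patch can be folded into a small classifier over the dumped bytes. A sketch only: `classify_sentinel` and the enum names are hypothetical, and it distinguishes just the outcomes the commit message lists (sentinel intact, all zeros, anything else).

```c
#include <stddef.h>

enum sentinel_result {
	SENTINEL_INTACT, /* every byte still equals the sentinel: region not written */
	SENTINEL_ZEROED, /* every byte is 0x00: kernel wrote zeros */
	SENTINEL_MIXED,  /* anything else: partial write or real luma */
};

enum sentinel_result classify_sentinel(const unsigned char *p, size_t len,
				       unsigned char sentinel)
{
	size_t i, n_sentinel = 0, n_zero = 0;

	for (i = 0; i < len; i++) {
		if (p[i] == sentinel)
			n_sentinel++;
		else if (p[i] == 0x00)
			n_zero++;
	}
	if (len > 0 && n_sentinel == len)
		return SENTINEL_INTACT;
	if (len > 0 && n_zero == len)
		return SENTINEL_ZEROED;
	return SENTINEL_MIXED;
}
```

A MIXED result is the case where the message says to check more bytes, so widening `len` beyond 32 is the natural next step.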
@@ -0,0 +1,214 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-02
+Subject: [PATCH] h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
+
+VAAPI signals "explicit scaling lists are present in the bitstream"
+implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
+VAIQMatrixBufferH264 alongside RenderPicture iff
+sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
+When the bitstream uses default (flat) scaling, no IQMatrixBuffer
+arrives and the in-tree h264.matrix struct stays zero-initialised.
+
+fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
+via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
+h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
+so set_controls always submitted the scaling matrix, including the
+zero-initialised case.
+
+Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
+PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
+scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
+successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
+bytesused=3655712 reported).
+
+An earlier draft of this patch unconditionally omitted SCALING_MATRIX in
+FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
+lists) but the wrong predicate: the kernel-side gating is by
+"matrix-supplied vs. not," not by decode mode. Streams that signal
+explicit scaling lists must submit SCALING_MATRIX in either mode.
+
+Contract verification (audit_0008_decode_params_2026-05-01.md +
+hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
+matrix when SCALING_MATRIX is in the control batch and falls back
+to spec-defined defaults when absent. Mode-independent.
+
+This patch:
+  - surface.h: adds bool matrix_set to params.h264, mirroring
+    mpeg2.iqmatrix_set / h265.iqmatrix_set.
+  - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
+    sets matrix_set = true when the buffer arrives.
+  - picture.c RequestBeginPicture: resets matrix_set = false at the
+    start of each Begin/Render/End cycle.
+  - h264.c h264_set_controls: builds the controls[] array
+    incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
+    matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
+    when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
+
+The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
+kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
+selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
+not be set."
+
+Cross-reference: kernel UAPI section
+ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
+(matrix supplied iff explicit scaling lists in bitstream) and
+hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
+falls back to defaults).
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/surface.h
+++ b/src/surface.h
+@@ -73,6 +73,7 @@
+ 		} mpeg2;
+ 		struct {
+ 			VAIQMatrixBufferH264 matrix;
+			bool matrix_set;
+ 			VAPictureParameterBufferH264 picture;
+ 			VASliceParameterBufferH264 slice;
+ 		} h264;
+--- a/src/picture.c
+++ b/src/picture.c
+@@ -153,6 +153,7 @@
+ 			memcpy(&surface_object->params.h264.matrix,
+ 			       buffer_object->data,
+ 			       sizeof(surface_object->params.h264.matrix));
+			surface_object->params.h264.matrix_set = true;
+ 			break;
+
+ 		case VAProfileHEVCMain:
+@@ -255,6 +256,7 @@
+ 	surface_object->source_size = slot->size;
+ 	surface_object->slices_size = 0;
+ 	surface_object->slices_count = 0;
+	surface_object->params.h264.matrix_set = false;
+
+ 	surface_object->status = VASurfaceRendering;
+ 	context_object->render_surface_id = surface_id;
+--- a/src/h264.c
+++ b/src/h264.c
+@@ -553,66 +553,68 @@
+ 	sps.profile_idc = h264_profile_to_idc(profile);
+
+ 	/*
+-	 * Per-request control batch, ordered so the controls REQUIRED in
+-	 * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
+-	 * (indices 0..3) and the SLICE_BASED-only controls come last
+-	 * (indices 4..5).
+	 * Build the per-request control list incrementally:
+	 *   - SPS, PPS, DECODE_PARAMS: always required (in either decode
+	 *     mode).
+	 *   - SCALING_MATRIX: gated on surface->params.h264.matrix_set,
+	 *     i.e. the consumer sent a VAIQMatrixBufferH264 this frame.
+	 *     This matches the H.264 spec: explicit scaling lists are
+	 *     present iff sps_scaling_matrix_present_flag ||
+	 *     pps_scaling_matrix_present_flag, in which case VAAPI
+	 *     consumers send the matrix; otherwise the kernel uses
+	 *     spec-defined defaults. Independent of FRAME_BASED /
+	 *     SLICE_BASED.
+	 *   - SLICE_PARAMS: SLICE_BASED only. Kernel doc
+	 *     ext-ctrls-codec-stateless.rst (FRAME_BASED entry):
+	 *     "When this mode is selected, the
+	 *     V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall not be
+	 *     set." Submitting it under FRAME_BASED triggers cluster-
+	 *     validation EINVAL at error_idx=count.
+	 *   - PRED_WEIGHTS: SLICE_BASED + V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
+ 	 *
+-	 * Cross-reference: GStreamer gst-plugins-bad
+-	 * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
+-	 * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
+-	 * is_slice_based(self); under FRAME_BASED only SPS/PPS/
+-	 * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
+-	 * parses the bitstream itself in FRAME_BASED mode; submitting
+-	 * per-slice controls in that mode triggers cluster-validation
+-	 * EINVAL at error_idx=count.
+-	 */
+-	struct v4l2_ext_control controls[6] = {
+-		{
+-			.id = V4L2_CID_STATELESS_H264_SPS,
+-			.p_h264_sps = &sps,
+-			.size = sizeof(sps),
+-		}, {
+-			.id = V4L2_CID_STATELESS_H264_PPS,
+-			.p_h264_pps = &pps,
+-			.size = sizeof(pps),
+-		}, {
+-			.id = V4L2_CID_STATELESS_H264_SCALING_MATRIX,
+-			.p_h264_scaling_matrix = &matrix,
+-			.size = sizeof(matrix),
+-		}, {
+-			.id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
+-			.p_h264_decode_params = &decode,
+-			.size = sizeof(decode),
+-		}, {
+-			.id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
+-			.p_h264_slice_params = &slice,
+-			.size = sizeof(slice),
+-		}, {
+-			.id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
+-			.ptr = &weights,
+-			.size = sizeof(weights),
+-		}
+-	};
+-
+-	/*
+-	 * Decode-mode dispatch. Patch 0002 unconditionally sets the
+-	 * device to FRAME_BASED, so we hardcode that here. When the
+-	 * planned probe-then-set commit lands, slice_based becomes
+-	 *     context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
+-	 * with context->decode_mode populated at CreateContext via
+-	 * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
+-	 *
+-	 * FRAME_BASED:    4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
+-	 * SLICE_BASED:   +SLICE_PARAMS (always), +PRED_WEIGHTS (when
+-	 *                V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
+	 * Patch 0002 unconditionally sets the device to FRAME_BASED,
+	 * so slice_based is hardcoded false here. When the planned
+	 * probe-then-set commit lands, this becomes
+	 *     context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED.
+ 	 */
+	struct v4l2_ext_control controls[6] = { 0 };
+	unsigned int num_controls = 0;
+ 	const bool slice_based = false; /* TODO: probe via context->decode_mode */
+-	unsigned int num_controls = 4;
+
+	controls[num_controls].id = V4L2_CID_STATELESS_H264_SPS;
+	controls[num_controls].p_h264_sps = &sps;
+	controls[num_controls].size = sizeof(sps);
+	num_controls++;
+
+	controls[num_controls].id = V4L2_CID_STATELESS_H264_PPS;
+	controls[num_controls].p_h264_pps = &pps;
+	controls[num_controls].size = sizeof(pps);
+	num_controls++;
+
+	controls[num_controls].id = V4L2_CID_STATELESS_H264_DECODE_PARAMS;
+	controls[num_controls].p_h264_decode_params = &decode;
+	controls[num_controls].size = sizeof(decode);
+	num_controls++;
+
+	if (surface->params.h264.matrix_set) {
+		controls[num_controls].id = V4L2_CID_STATELESS_H264_SCALING_MATRIX;
+		controls[num_controls].p_h264_scaling_matrix = &matrix;
+		controls[num_controls].size = sizeof(matrix);
+		num_controls++;
+	}
+
+ 	if (slice_based) {
+-		num_controls = 5;
+-		if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
+-			num_controls = 6;
+		controls[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
+		controls[num_controls].p_h264_slice_params = &slice;
+		controls[num_controls].size = sizeof(slice);
+		num_controls++;
+
+		if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice)) {
+			controls[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
+			controls[num_controls].ptr = &weights;
+			controls[num_controls].size = sizeof(weights);
+			num_controls++;
+		}
+ 	}
+
+ 	rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
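The incremental control list from the scaling-matrix patch, reduced to its selection logic. `enum h264_ctrl` and `build_h264_batch` are hypothetical; the real code fills `struct v4l2_ext_control` entries and passes the final count to `v4l2_set_controls()`. `matrix_set` models the per-surface flag that RequestBeginPicture clears and that is set when a `VAIQMatrixBufferH264` arrives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical control tags standing in for the V4L2_CID_STATELESS_H264_* ids. */
enum h264_ctrl {
	CTRL_SPS, CTRL_PPS, CTRL_DECODE_PARAMS,
	CTRL_SCALING_MATRIX, CTRL_SLICE_PARAMS, CTRL_PRED_WEIGHTS,
};

/* Append controls in the same order as h264_set_controls; return the count. */
size_t build_h264_batch(enum h264_ctrl out[6], bool matrix_set,
			bool slice_based, bool pred_weights_required)
{
	size_t n = 0;

	out[n++] = CTRL_SPS;
	out[n++] = CTRL_PPS;
	out[n++] = CTRL_DECODE_PARAMS;
	if (matrix_set)
		out[n++] = CTRL_SCALING_MATRIX;
	if (slice_based) {
		out[n++] = CTRL_SLICE_PARAMS;
		if (pred_weights_required)
			out[n++] = CTRL_PRED_WEIGHTS;
	}
	return n;
}
```

V4L2 matches extended controls by `id`, so the order within the batch mainly affects which index `error_idx` reports on failure.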
@@ -0,0 +1,86 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-02
+Subject: [PATCH] h264: hardcode SPS level_idc = 51 (intentional over-allocation)
+
+fourier's h264_va_picture_to_v4l2 never assigns sps->level_idc; the
+field stays at zero-init. level_idc=0 is invalid per the H.264 spec
+(lowest legal value is 10, Level 1.0). Hantro and other stateless
+H.264 decoders use level_idc to pre-allocate decoder resources (DPB
+size, motion-vector buffers); when fed an invalid level the hantro
+kernel driver silently skips the decode-hardware dispatch — the V4L2
+request completes with no error, DQBUF returns the CAPTURE buffer
+reporting bytesused=3655712 and no V4L2_BUF_FLAG_ERROR, but the
+buffer is never written.
+
+VAAPI's decode-side VAPictureParameterBufferH264 structurally does
+NOT include level_idc — `grep level_idc va/va.h` returns only hits
+inside VAEncSequenceParameterBufferH264 (the encode path). The
+H.264 SPS NAL is also not included in VASliceDataBuffer because
+ffmpeg-vaapi parses it client-side and forwards only slice data
+(verified empirically via patch 0010's hex-dump of the OUTPUT
+buffer: it contains "00 00 01 65 ..." — i.e. ANNEX_B start code +
+IDR slice NAL byte, no SPS NAL). An SPS-NAL byte extractor is
+therefore not viable from the bitstream libva-v4l2-request
+receives.
+
+Workaround: hardcode level_idc = 51 (= Level 5.1, enough for 1080p
+and 4K@30 mainstream consumer content). This INTENTIONALLY
+OVER-ALLOCATES decoder resources but is sufficient for any stream
+up to 4K@30. It is corpus-correct, not contract-correct: streams
+above Level 5.1 (4K@60 is Level 5.2, 8K is Level 6.x) would be
+under-signalled, and 8K frame sizes would under-allocate.
+
+This patch is a known-incomplete intermediate, not a final fix.
+The proper upstreamable answer is a level-from-resolution
+derivation per H.264 Annex A.3 (max MB rate / max frame size
+thresholds). That requires mapping consumer-side framerate which
+VAAPI does not expose, so the lookup table is non-trivial. The
+TODO is captured inline.
+
+This patch's goal is unblocking decode-hardware engagement on the
+ohm_gl_fix corpus while the full level-derivation work proceeds.
+
+Cross-reference: kernel doc
+ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
+level_idc as a required field with no "kernel-derives" annotation —
+i.e., userspace-required.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c
+++ b/src/h264.c
+@@ -553,6 +553,35 @@
+ 	sps.profile_idc = h264_profile_to_idc(profile);
+
+ 	/*
+	 * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
+	 * level_idc — see va.h, the field exists only in
+	 * VAEncSequenceParameterBufferH264 on the encode path. The H.264
+	 * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
+	 * parses it client-side and forwards only slice data), so a
+	 * SPS-NAL byte extractor is not viable from the bitstream we
+	 * receive.
+	 *
+	 * Hantro and other stateless H.264 decoders use level_idc to
+	 * pre-allocate decoder resources (DPB, motion-vector buffers); a
+	 * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
+	 * 1.0) and causes hantro to silently skip the decode hardware
+	 * dispatch.
+	 *
+	 * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
+	 * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
+	 * decoder resources and is sufficient for any stream up to 4K@30.
+	 * It is corpus-correct, not contract-correct.
+	 *
+	 * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
+	 * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
+	 * thresholds. That is a small lookup table but requires also
+	 * mapping the consumer's framerate, which VAAPI doesn't provide
+	 * directly. For now the over-allocation is the upstreamable
+	 * compromise.
+	 */
+	sps.level_idc = 51;
+
+	/*
+ 	 * Build the per-request control list incrementally:
+ 	 *   - SPS, PPS, DECODE_PARAMS: always required (in either decode
+ 	 *     mode).
@@ -0,0 +1,97 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-01
+Subject: [PATCH] DEBUG: dump VAPictureH264 raw bytes + decoded fields
+
+Diagnostic-only. Investigating the observed anomaly:
+
+  - V4L2 strace shows decode_params.top_field_order_cnt = 65536
+    on the first IDR frame submitted by mpv+ffmpeg+libva-v4l2-request
+  - GStreamer's reference path writes 0 (spec-correct: PicOrderCnt=0
+    for IDR with pic_order_cnt_type=0 / pic_order_cnt_lsb=0)
+  - Reading FFmpeg source (libavcodec/vaapi_h264.c::fill_vaapi_pic):
+      va_pic->TopFieldOrderCnt = 0;
+      if (pic->field_poc[0] != INT_MAX)
+          va_pic->TopFieldOrderCnt = pic->field_poc[0];
+    For IDR: ff_h264_init_poc sets field_poc[0] = poc_msb + poc_lsb
+    = 0 + 0 = 0. So FFmpeg should write 0.
+
+If FFmpeg writes 0 but fourier reads 65536, the mismatch is in the
+libva ABI between ffmpeg's writer and our reader. Most likely
+suspect: VA_PADDING_LOW size in VAPictureH264 differs between the
+libva headers ffmpeg+libva were built against and the headers
+fourier was built against, shifting struct field offsets.
+
+This patch dumps:
+  1. sizeof(VAPictureH264) at our reader's view
+  2. First 32 raw bytes of VAPicture->CurrPic
+  3. Field-decoded values via the .picture_id, .frame_idx, .flags,
+     .TopFieldOrderCnt, .BottomFieldOrderCnt accessors
+
+If the raw bytes show 00 00 01 00 at offset 12 (= 65536 LE), the
+field offset is correct and FFmpeg actually wrote 65536 — meaning
+either FFmpeg has a bug, or our test scenario triggers a non-spec
+code path. If the raw bytes show 00 00 00 00 at offset 12 but
+TopFieldOrderCnt accessor returns 65536, the struct ABI is
+mismatched and we need to reconcile libva versions.
+
+If sizeof(VAPictureH264) prints as something other than 36
+(= 4*5 + 4*VA_PADDING_LOW, assuming VA_PADDING_LOW=4), the struct
+layout on this build differs from the documented libva-2.x layout.
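The offset and size expectations above can be checked at compile time against a mirror of the documented layout. This is a sketch with the following assumptions:

- struct va_picture_h264_mirror and SKETCH_VA_PADDING_LOW are hypothetical stand-ins. A real build would apply offsetof/sizeof to VAPictureH264 from <va/va.h>.
- dump_le_i32() decodes a raw dump byte by byte, so its result does not depend on host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, as in the text above; libva defines the real one. */
#define SKETCH_VA_PADDING_LOW 4

/* Hypothetical mirror of the documented libva-2.x VAPictureH264 layout. */
struct va_picture_h264_mirror {
	uint32_t picture_id;          /* VASurfaceID */
	uint32_t frame_idx;
	uint32_t flags;
	int32_t  TopFieldOrderCnt;
	int32_t  BottomFieldOrderCnt;
	uint32_t va_reserved[SKETCH_VA_PADDING_LOW];
};

_Static_assert(offsetof(struct va_picture_h264_mirror, TopFieldOrderCnt) == 12,
	       "TopFieldOrderCnt expected at byte offset 12");
_Static_assert(offsetof(struct va_picture_h264_mirror, BottomFieldOrderCnt) == 16,
	       "BottomFieldOrderCnt expected at byte offset 16");
_Static_assert(sizeof(struct va_picture_h264_mirror) ==
	       4 * 5 + 4 * SKETCH_VA_PADDING_LOW,
	       "expected 36 bytes with VA_PADDING_LOW == 4");

/* Decode a little-endian int32 at byte offset off of a raw dump. */
static int32_t dump_le_i32(const unsigned char *dump, size_t len, size_t off)
{
	uint32_t v;

	assert(off + 4 <= len);
	v = (uint32_t)dump[off] |
	    ((uint32_t)dump[off + 1] << 8) |
	    ((uint32_t)dump[off + 2] << 16) |
	    ((uint32_t)dump[off + 3] << 24);
	return (int32_t)v;
}
```

Bytes 00 00 01 00 at offset 12 decode to 65536, which is the case (a) signature described above.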
+
+To be removed once the source of the 65536 is identified.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c	2026-05-01 22:56:42.656744048 +0000
+++ b/src/h264.c	2026-05-02 00:00:00.000000000 +0000
+@@ -28,6 +28,7 @@
+ #include <assert.h>
+ #include <limits.h>
+ #include <string.h>
+#include <stdio.h>
+
+ #include <sys/ioctl.h>
+ #include <sys/mman.h>
+@@ -259,6 +259,42 @@
+ 	 * the OUTPUT bitstream — a hypothesis verified empirically by
+ 	 * running this patch and inspecting the CAPTURE buffer.
+ 	 */
+	/*
+	 * DEBUG INSTRUMENTATION (0014): dump the raw bytes of
+	 * VAPicture->CurrPic plus sizeof(VAPictureH264) so we can
+	 * tell whether the observed TopFieldOrderCnt=65536 anomaly is
+	 * (a) at the documented byte-offset 12 (ffmpeg-side bug or
+	 * intentional non-spec encoding) or
+	 * (b) at a different offset (libva ABI / VA_PADDING_LOW
+	 * mismatch between ffmpeg's writer and our reader).
+	 *
+	 * Documented VAPictureH264 layout (libva-2.x):
+	 *   offset 0:  VASurfaceID picture_id  (uint32)
+	 *   offset 4:  uint32 frame_idx
+	 *   offset 8:  uint32 flags
+	 *   offset 12: int32 TopFieldOrderCnt
+	 *   offset 16: int32 BottomFieldOrderCnt
+	 *   offset 20+: uint32 va_reserved[VA_PADDING_LOW]
+	 */
+	{
+		const unsigned char *cp = (const unsigned char *)&VAPicture->CurrPic;
+		char hex[32 * 3 + 1] = { 0 };
+		unsigned int i;
+		for (i = 0; i < 32; i++)
+			snprintf(hex + i * 3, 4, " %02x", cp[i]);
+		request_log("VAPictureH264 sizeof=%zu CurrPic[0..31]:%s\n",
+			    sizeof(VAPictureH264), hex);
+		request_log("VAPictureH264 CurrPic field reads: "
+			    "picture_id=0x%08x frame_idx=%u flags=0x%x "
+			    "TopFOC=%d BottomFOC=%d frame_num=%u\n",
+			    (unsigned)VAPicture->CurrPic.picture_id,
+			    (unsigned)VAPicture->CurrPic.frame_idx,
+			    (unsigned)VAPicture->CurrPic.flags,
+			    (int)VAPicture->CurrPic.TopFieldOrderCnt,
+			    (int)VAPicture->CurrPic.BottomFieldOrderCnt,
+			    (unsigned)VAPicture->frame_num);
+	}
+
+ 	decode->nal_ref_idc = nal_ref_idc;
+ 	decode->frame_num = VAPicture->frame_num;
+ 	decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
@@ -0,0 +1,150 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-02
+Subject: [PATCH] h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2
+
+ROOT CAUSE for "kernel decodes successfully but produces zeroed
+CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR":
+
+ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
+0x10000 as a sentinel for "uninitialised":
+  libavcodec/h264dec.c:301 — global init in ff_h264_decode_init
+  libavcodec/h264dec.c:444 — IDR reset in idr() helper
+ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes
+pc->poc_msb = pc->prev_poc_msb whenever the slice header's
+pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which
+is the typical case for any normal H.264 content with sane POC
+ordering). The sentinel leaks into field_poc[] (line 305) and from
+there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
+libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78).
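A simplified model of the pic_order_cnt_type == 0 derivation (H.264 8.2.1.1) shows the effect of the sentinel. Seeding prevPicOrderCntMsb with 1 << 16 instead of 0 shifts every POC by 0x10000 while leaving POC differences intact.

This is a sketch, not ffmpeg's code:

- The poc0_* names are hypothetical.
- Only TopFieldOrderCnt is modelled.
- Every picture is treated as a reference picture when the prev* state is updated.

```c
#include <assert.h>
#include <stdint.h>

struct poc0_state {
	int32_t prev_msb;   /* prevPicOrderCntMsb */
	int32_t prev_lsb;   /* prevPicOrderCntLsb */
	int32_t max_lsb;    /* MaxPicOrderCntLsb */
};

/* IDR reset: the spec uses msb_at_idr = 0; 1 << 16 models the sentinel. */
static void poc0_idr_reset(struct poc0_state *s, int32_t msb_at_idr)
{
	s->prev_msb = msb_at_idr;
	s->prev_lsb = 0;
}

static int32_t poc0_top_field_order_cnt(struct poc0_state *s, int32_t lsb)
{
	int32_t msb;

	assert(lsb >= 0 && lsb < s->max_lsb);
	if (lsb < s->prev_lsb && s->prev_lsb - lsb >= s->max_lsb / 2)
		msb = s->prev_msb + s->max_lsb;   /* lsb wrapped forward */
	else if (lsb > s->prev_lsb && lsb - s->prev_lsb > s->max_lsb / 2)
		msb = s->prev_msb - s->max_lsb;   /* lsb wrapped backward */
	else
		msb = s->prev_msb;                /* typical case: msb carried over */

	/* Simplification: every picture updates the prev* state. */
	s->prev_msb = msb;
	s->prev_lsb = lsb;
	return msb + lsb;
}
```

Seeded with 1 << 16 and fed lsb values 0, 6 and 2, the model returns 65536, 65542 and 65538. That is the pattern in the meitner capture quoted below. Seeded with 0, it returns 0, 6 and 2.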
+
+Empirical confirmation via meitner 2026-05-02 ground-truth test:
+ran an LD_PRELOAD shim around vaCreateBuffer against an i965
+VAAPI backend decoding a 60-frame H.264 Main clip. Every frame
+showed TopFieldOrderCnt = (POC | 0x10000):
+
+  Frame 1 IDR:  raw bytes "00 00 01 00" at offset 12 → TopFOC=65536
+  Frame 2:      raw bytes "06 00 01 00"             → TopFOC=65542
+  Frame 3:      "02 00 01 00"                       → TopFOC=65538
+
+i965 decodes successfully regardless. V4L2 stateless drivers cannot
+tolerate the high word: hantro_h264.c::prepare_table feeds the value
+directly into tbl->poc[i*2] and tbl->poc[32], and the kernel reflist
+builder uses it directly for cur_pic_order_count comparisons, so the
+kernel's resource-sizing math sees POC=65536 for an IDR and breaks.
+
+This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static
+inline in src/h264.c. It detects bit 16 set rather than blindly
+subtracting, so a future ffmpeg version that fixes the leak
+degrades gracefully. The helper is applied at all four POC sites:
+
+  1. h264_fill_dpb:           dpb->top_field_order_cnt
+  2. h264_fill_dpb:           dpb->bottom_field_order_cnt
+  3. h264_va_picture_to_v4l2: decode->top_field_order_cnt
+  4. h264_va_picture_to_v4l2: decode->bottom_field_order_cnt
+
+VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0.
+libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already sets POC=0
+there, so the sentinel never applies; zeroing them explicitly anyway
+removes a class of "stale POC value in invalidated slot" foot-guns.
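The helper's behaviour can be exercised outside the library with a standalone copy. SKETCH_VA_PICTURE_H264_INVALID is a local stand-in for libva's VA_PICTURE_H264_INVALID, so the sketch compiles without <va/va.h>.

The last test case shows the limit of bit-16 detection. If the true POC itself reaches 1 << 16, the sentinel-carrying value has bit 16 clear and passes through unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for libva's VA_PICTURE_H264_INVALID. */
#define SKETCH_VA_PICTURE_H264_INVALID 0x00000001u

/* Same logic as h264_strip_ffmpeg_poc_sentinel() in this patch. */
static int32_t strip_poc_sentinel(int32_t poc, uint32_t flags)
{
	if (flags & SKETCH_VA_PICTURE_H264_INVALID)
		return 0;                   /* empty DPB slot: report POC 0 */
	if (poc & (1 << 16))
		return poc - (1 << 16);     /* bit 16 set: assume the sentinel */
	return poc;                         /* no sentinel detected */
}
```

Applied to the meitner values, 65536, 65542 and 65538 map to 0, 6 and 2.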
+
+Non-trivial follow-ups identified during the meitner experiment
+that are NOT addressed by this patch:
+  - PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are
+    not yet derived from VASliceParameterBufferH264.slice_type. The
+    bbb corpus is I-only at the start so this hasn't been a
+    blocker, but a clip with B-frames will need the slice-type
+    routing patch.
+  - h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is
+    almost certainly wrong per the kernel doc — pic_num must equal
+    the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface
+    id. Out of scope here; will surface as a defect on streams
+    that have multi-frame DPB lookups.
+
+Cross-references:
+  audit_0008_decode_params_2026-05-01.md — kernel-side consumer
+    audit confirming POC fields are userspace-required.
+  api_contract_findings_2026-05-01.md — VAAPI doc gap on POC
+    semantics; H.264 spec section 8.2.1 is the binding contract.
+  meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical
+    capture of the sentinel pattern across 60 frames.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c
+++ b/src/h264.c
+@@ -187,6 +187,43 @@
+ 	}
+ }
+
+/*
+ * Strip ffmpeg-vaapi's POC sentinel.
+ *
+ * ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
+ * 0x10000 in libavcodec/h264dec.c (lines 301 and 444 of v8.0). After
+ * an IDR the idr() helper resets prev_poc_msb to that same sentinel.
+ * ff_h264_init_poc (libavcodec/h264_parse.c lines 296-305) then
+ * computes pc->poc_msb as prev_poc_msb when the slice header's
+ * poc_lsb hasn't wrapped — which is the typical case for normal
+ * content. The sentinel leaks into field_poc[] and from there into
+ * VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
+ * libavcodec/vaapi_h264.c::fill_vaapi_pic.
+ *
+ * Working VAAPI backends (intel-iHD, i965 verified empirically on
+ * meitner 2026-05-02) tolerate the high word — they either mask it
+ * or treat POCs as relative comparisons. V4L2 stateless H.264
+ * driver-side consumers (hantro_h264.c::prepare_table feeds the
+ * value direct to tbl->poc[]) need the spec value, so we strip the
+ * sentinel here at the libva-v4l2-request boundary.
+ *
+ * Detection by bit-16-set rather than blind subtraction so that a
+ * future ffmpeg version that fixes the sentinel leak degrades
+ * gracefully. POC values for non-degenerate H.264 content rarely
+ * exceed 16 bits; bit 16 set is a strong signal of the sentinel.
+ *
+ * Empty DPB slots (VA_PICTURE_H264_INVALID) carry POC=0 by
+ * libavcodec/vaapi_h264.c::init_vaapi_pic and need no fix-up.
+ */
+static inline int32_t h264_strip_ffmpeg_poc_sentinel(int32_t poc, uint32_t flags)
+{
+	if (flags & VA_PICTURE_H264_INVALID)
+		return 0;
+	if (poc & (1 << 16))
+		return poc - (1 << 16);
+	return poc;
+}
+
+ static void h264_fill_dpb(struct request_data *data,
+ 			  struct object_context *context,
+ 			  struct v4l2_ctrl_h264_decode_params *decode)
+@@ -210,8 +247,12 @@
+
+ 		dpb->frame_num = entry->pic.frame_idx;
+ 		dpb->pic_num = entry->pic.picture_id;
+-		dpb->top_field_order_cnt = entry->pic.TopFieldOrderCnt;
+-		dpb->bottom_field_order_cnt = entry->pic.BottomFieldOrderCnt;
+		dpb->top_field_order_cnt =
+			h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
+						       entry->pic.flags);
+		dpb->bottom_field_order_cnt =
+			h264_strip_ffmpeg_poc_sentinel(entry->pic.BottomFieldOrderCnt,
+						       entry->pic.flags);
+
+ 		dpb->flags = V4L2_H264_DPB_ENTRY_FLAG_VALID;
+
+@@ -298,8 +339,12 @@
+
+ 	decode->nal_ref_idc = nal_ref_idc;
+ 	decode->frame_num = VAPicture->frame_num;
+-	decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
+-	decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
+	decode->top_field_order_cnt =
+		h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.TopFieldOrderCnt,
+					       VAPicture->CurrPic.flags);
+	decode->bottom_field_order_cnt =
+		h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.BottomFieldOrderCnt,
+					       VAPicture->CurrPic.flags);
+
+ 	if (nal_unit_type == 5)
+ 		decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
@@ -0,0 +1,82 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-02
+Subject: [PATCH] h264: derive PFRAME / BFRAME flags from VASlice slice_type
+
+v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per
+ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2-
+request relied on each backing driver tolerating frame-class
+ambiguity.
+
+Kernel survey (linux 6.19.x):
+  - tegra-vde/h264.c (lines 783-799) consumes both flags to select
+    the inter-frame decode kernel. Without them the I-frame kernel
+    runs on P/B content.
+  - visl-trace-h264.h uses them for decode tracing.
+  - hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do
+    not consume the flags.
+
+Hantro on ohm decoded bbb cleanly without these flags set (see
+phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is
+an upstreamability fix for cross-driver portability rather than a
+correctness fix for hantro.
+
+VAAPI's VASliceParameterBufferH264.slice_type maps directly to the
+H.264 slice_header() slice_type field. Per spec 7.4.3:
+  0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have
+  this slice_type." `slice_type % 5` recovers the underlying type
+  in either encoding form.
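The mapping reduces to a small switch. In this sketch the SKETCH_FLAG_* values are hypothetical stand-ins for V4L2_H264_DECODE_PARAM_FLAG_PFRAME / _BFRAME. SP and SI are left without a flag to match the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the V4L2 PFRAME / BFRAME flag bits. */
#define SKETCH_FLAG_PFRAME 0x1u
#define SKETCH_FLAG_BFRAME 0x2u

/* H.264 7.4.3 slice_type values after the % 5 reduction. */
enum sketch_slice_type {
	SKETCH_SLICE_P = 0, SKETCH_SLICE_B = 1, SKETCH_SLICE_I = 2,
	SKETCH_SLICE_SP = 3, SKETCH_SLICE_SI = 4,
};

static unsigned int frame_class_flags(unsigned int slice_type)
{
	assert(slice_type <= 9);  /* 5..9 are the "all slices" forms of 0..4 */
	switch (slice_type % 5) {
	case SKETCH_SLICE_P:
		return SKETCH_FLAG_PFRAME;
	case SKETCH_SLICE_B:
		return SKETCH_FLAG_BFRAME;
	default:
		return 0;  /* I / SP / SI: no extra flag, as in the patch */
	}
}
```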
+
+In FRAME_BASED mode we only see surface->params.h264.slice from the
+most-recent VASliceParameterBuffer. That is adequate because most
+coded pictures use a single slice_type. Multi-slice pictures can
+mix slice types; the flag's semantic is "this is an inter-coded
+frame," which holds if any slice is P or B, so the last-seen
+slice's type is an approximation that misclassifies a mixed picture
+whose final slice is I.
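The difference between the last-seen-slice approximation and an any-slice rule can be shown on a list of slice types. Both functions are hypothetical. An any-slice rule would need per-slice accumulation that the FRAME_BASED path described above does not keep.

```c
#include <assert.h>
#include <stddef.h>

/* 1 if slice_type (0..9) is P or B after the % 5 reduction. */
static int slice_is_p_or_b(unsigned int slice_type)
{
	const unsigned int base = slice_type % 5;

	return base == 0 || base == 1;
}

/* What FRAME_BASED mode effectively does: look only at the last slice. */
static int inter_by_last_slice(const unsigned int *types, size_t n)
{
	assert(n > 0);
	return slice_is_p_or_b(types[n - 1]);
}

/* Hypothetical alternative: inter-coded if any slice is P or B. */
static int inter_by_any_slice(const unsigned int *types, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (slice_is_p_or_b(types[i]))
			return 1;
	return 0;
}
```

For a picture with a P slice followed by an I slice, the last-slice rule reports intra and the any-slice rule reports inter.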
+
+Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
+Flags table.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c
+++ b/src/h264.c
+@@ -587,6 +587,38 @@
+ 			      &surface->params.h264.slice,
+ 			      &surface->params.h264.picture, &slice, &weights);
+
+	/*
+	 * Derive PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags
+	 * from VASliceParameterBufferH264.slice_type. VAAPI's slice_type
+	 * matches the H.264 spec slice_type semantic: 0=P, 1=B, 2=I, 3=SP,
+	 * 4=SI; values 5..9 mean "all slices in the picture have this
+	 * slice_type" (mod 5 yields the underlying type). VAAPI consumers
+	 * (ffmpeg, mpv) populate this for every slice; in FRAME_BASED mode
+	 * we only see the most-recent slice's params, but slice_type is
+	 * uniform across a single coded picture for our purposes.
+	 *
+	 * Kernel consumers that read these flags: tegra-vde
+	 * (drivers/media/platform/nvidia/tegra-vde/h264.c lines 783-799 of
+	 * 6.19.x) selects the inter-frame decode kernel. Hantro / rkvdec /
+	 * cedrus / mediatek / qcom-iris-stateless do not consume them.
+	 * Setting them keeps the libva-v4l2-request fork upstreamable
+	 * across drivers without affecting hantro behaviour.
+	 *
+	 * Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
+	 * Flags — V4L2_H264_DECODE_PARAM_FLAG_PFRAME / _BFRAME.
+	 */
+	switch (surface->params.h264.slice.slice_type % 5) {
+	case H264_SLICE_P:
+		decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_PFRAME;
+		break;
+	case H264_SLICE_B:
+		decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_BFRAME;
+		break;
+	default:
+		/* I / SP / SI: no extra flag. */
+		break;
+	}
+
+ 	sps.profile_idc = h264_profile_to_idc(profile);
+
+ 	/*
@@ -0,0 +1,124 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-02
+Subject: [PATCH] h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id
+
+fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id`
+— the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655,
+v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum
+(equation 8-28) for short-term references or LongTermPicNum
+(equation 8-29) for long-term references. The surface id has no
+relationship to either.
+
+Kernel-side consumers of pic_num:
+  - mediatek/decoder/vdec/vdec_h264_req_common.c (line 210):
+    dst_entry->pic_num = src_entry->pic_num. Used for
+    field-coded short-term reference disambiguation.
+  - hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read
+    pic_num. They resolve refs via reference_ts (timestamp)
+    and POC. This is why fourier's wrong value never surfaced
+    on RK3568 hantro.
+
+This patch makes pic_num spec-correct so the libva-v4l2-request
+fork is upstreamable across drivers without depending on each
+target's tolerance for non-spec fills.
+
+Computation, derived from H.264 spec section 8.2.4.1:
+
+  For frames (not field-coded), PicNum = FrameNumWrap.
+  FrameNumWrap = (frame_num > cur_frame_num)
+                 ? frame_num - max_frame_num
+                 : frame_num
+
+  max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4)
+  cur_frame_num = current picture's frame_num
+
+For long-term references:
+  LongTermPicNum = long_term_frame_idx (when not field-coded).
+  VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64):
+    VAPictureH264.frame_idx = long_ref ? pic_id : frame_num
+  So long-term refs already carry long_term_frame_idx in frame_idx;
+  we copy it through.
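The two cases reduce to a few lines. This sketch covers frame (non-field) decoding only, and h264_frame_pic_num() is a hypothetical standalone name. Its arguments stand in for:

- entry->pic.frame_idx
- the LONG_TERM_REFERENCE flag
- VAPicture->frame_num
- seq_fields.bits.log2_max_frame_num_minus4

```c
#include <assert.h>

/*
 * Hypothetical standalone form of the pic_num computation, frame
 * (non-field) decoding only, per H.264 8.2.4.1.
 */
static int h264_frame_pic_num(int frame_idx, int is_long_term,
			      int cur_frame_num, int log2_max_frame_num_minus4)
{
	const int max_frame_num = 1 << (log2_max_frame_num_minus4 + 4);

	assert(frame_idx >= 0 && cur_frame_num >= 0);
	if (is_long_term)
		return frame_idx;  /* LongTermPicNum = LongTermFrameIdx */
	/* Short-term: PicNum = FrameNumWrap */
	return frame_idx > cur_frame_num ? frame_idx - max_frame_num : frame_idx;
}
```

Take log2_max_frame_num_minus4 = 0 (MaxFrameNum = 16) and a current frame_num of 2. A short-term reference with frame_num 15 was coded before the wrap, so it gets PicNum -1.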
+
+Field-coded streams require an extra factor of 2 plus a parity
+adjustment per the field-decoding equations of spec section 8.2.4.1;
+this patch does not handle field-coded content. The ohm corpus is
+all frame-coded, so this is a follow-up for later.
+
+Implementation: add VAPicture parameter to h264_fill_dpb so the
+function has access to seq_fields.log2_max_frame_num_minus4 and
+the current picture's frame_num. Update the single caller in
+h264_va_picture_to_v4l2.
+
+Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry
+table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c
+line 210.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c
+++ b/src/h264.c
+@@ -226,8 +226,12 @@
+
+ static void h264_fill_dpb(struct request_data *data,
+ 			  struct object_context *context,
+			  VAPictureParameterBufferH264 *VAPicture,
+ 			  struct v4l2_ctrl_h264_decode_params *decode)
+ {
+	const int max_frame_num =
+		1 << (VAPicture->seq_fields.bits.log2_max_frame_num_minus4 + 4);
+	const int cur_frame_num = (int)VAPicture->frame_num;
+ 	int i;
+
+ 	for (i = 0; i < H264_DPB_SIZE; i++) {
+@@ -246,7 +250,41 @@
+ 		}
+
+ 		dpb->frame_num = entry->pic.frame_idx;
+-		dpb->pic_num = entry->pic.picture_id;
+
+		/*
+		 * Per ext-ctrls-codec-stateless.rst, dpb[].pic_num must
+		 * equal the H.264 spec's PicNum (8-28) for short-term refs
+		 * or LongTermPicNum (8-29) for long-term refs.
+		 *
+		 * For frames (not field-coded), PicNum = FrameNumWrap.
+		 * FrameNumWrap = (frame_num > cur_frame_num)
+		 *                ? frame_num - max_frame_num
+		 *                : frame_num
+		 * (per spec section 8.2.4.1, frame_num wraparound).
+		 *
+		 * VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic
+		 * line 64): VAPictureH264.frame_idx holds long_term_frame_idx
+		 * for long-term refs and frame_num for short-term refs. So
+		 * for long-term entries we copy frame_idx straight through
+		 * as LongTermPicNum.
+		 *
+		 * fourier's previous code set pic_num to picture_id (the
+		 * VAAPI surface id) which is unrelated to H.264 PicNum;
+		 * mediatek's vdec_h264_req_common.c::dst_entry->pic_num is
+		 * one consumer that fails on that. Hantro doesn't read
+		 * pic_num at all (uses reference_ts for ref resolution),
+		 * which is why fourier's wrong value never surfaced on
+		 * RK3568.
+		 */
+		if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE) {
+			dpb->pic_num = entry->pic.frame_idx;
+		} else {
+			int frame_num = (int)entry->pic.frame_idx;
+			dpb->pic_num = (frame_num > cur_frame_num)
+				? frame_num - max_frame_num
+				: frame_num;
+		}
+
+ 		dpb->top_field_order_cnt =
+ 			h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
+ 						       entry->pic.flags);
+@@ -283,7 +321,7 @@
+ 	nal_ref_idc = (b[0] >> 5) & 0x3;
+ 	nal_unit_type = b[0] & 0x1f;
+
+-	h264_fill_dpb(driver_data, context, decode);
+	h264_fill_dpb(driver_data, context, VAPicture, decode);
+
+ 	/*
+ 	 * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
@@ -0,0 +1,159 @@
+From: Markus Fritsche <fritsche.markus@gmail.com>
+Date: 2026-05-02
+Subject: [PATCH] h264: derive sps.level_idc from H.264 Annex A.3 MaxFS
+
+Replaces patch 0013's hardcoded level_idc = 51 with a small lookup
+that picks the smallest level whose MaxFS contains the encoded
+frame size. Patch 0013's TODO is resolved by this change.
+
+VAAPI does not expose level_idc on the decode side
+(VAPictureParameterBufferH264 has no such field; only
+VAEncSequenceParameterBufferH264 carries it). The H.264 SPS NAL is
+parsed client-side by ffmpeg-vaapi and only slice data forwards in
+VASliceDataBuffer, so a SPS-NAL byte parser is not viable from the
+bitstream the libva-v4l2-request layer receives. We therefore
+derive level_idc from picture dimensions, which VAAPI does provide
+in VAPictureParameterBufferH264.picture_{width,height}_in_mbs_minus1.
+
+Annex A.3 (Table A-1) MaxFS thresholds:
+  Level 1.0:     99 MBs  ( 176×144  =  11×9   =     99 )
+  Level 1.1:    396      ( 352×288  =  22×18  =    396 )
+  Level 2.0:    396
+  Level 2.1:    792      ( 352×576  =  22×36  =    792 )
+  Level 2.2:   1620      ( 720×480  =  45×30  =   1350; 720×576 = 1620 )
+  Level 3.0:   1620
+  Level 3.1:   3600      (1280×720  =  80×45  =   3600 )
+  Level 3.2:   5120
+  Level 4.0:   8192      (1920×1088 = 120×68  =   8160 fits )
+  Level 4.1:   8192
+  Level 4.2:   8704
+  Level 5.0:  22080
+  Level 5.1:  36864      (3840×2160 = 240×135 =  32400 fits; 4096×2304 = 36864 )
+  Level 5.2:  36864
+  Level 6.0: 139264      (7680×4320 = 480×270 = 129600 fits )
+
+V4L2 control encoding: level_idc = (level major × 10) + (level minor).
+Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
+
+Picks for typical content:
+  1080p (1920×1088 =   8160 MBs) → Level 4.1 (level_idc = 41)
+  4K    (3840×2160 =  32400 MBs) → Level 5.1 (level_idc = 51)
+  8K    (7680×4320 = 129600 MBs) → Level 6.0 (level_idc = 60)
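The picks above can be checked against a standalone copy of the lookup. The copy uses uint8_t in place of the kernel's __u8 so it builds without <linux/types.h>. Otherwise the thresholds match h264_derive_level_idc() in the hunk below.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone copy of h264_derive_level_idc(): smallest level whose
 * Table A-1 MaxFS holds width_in_mbs * height_in_mbs. */
static uint8_t derive_level_idc(unsigned int width_in_mbs,
				unsigned int height_in_mbs)
{
	const unsigned int fs = width_in_mbs * height_in_mbs;

	if (fs <= 99)     return 10;  /* Level 1.0 */
	if (fs <= 396)    return 11;  /* Level 1.1 - 2.0 */
	if (fs <= 792)    return 21;  /* Level 2.1 */
	if (fs <= 1620)   return 22;  /* Level 2.2 - 3.0 */
	if (fs <= 3600)   return 31;  /* Level 3.1 */
	if (fs <= 5120)   return 32;  /* Level 3.2 */
	if (fs <= 8192)   return 41;  /* Level 4.0 - 4.1 */
	if (fs <= 8704)   return 42;  /* Level 4.2 */
	if (fs <= 22080)  return 50;  /* Level 5.0 */
	if (fs <= 36864)  return 51;  /* Level 5.1 - 5.2 */
	if (fs <= 139264) return 60;  /* Level 6.0 - 6.2 */
	return 62;                    /* beyond the Level 6 ceiling */
}
```

4096×2304 (256×144 = 36864 MBs) is the largest frame that still maps to 51. One more MB column moves it to 60.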
+
+The previous hardcode of 51 was over-allocating for 1080p; with
+this patch hantro can pre-allocate based on the actual frame size.
+For our ohm corpus (1080p) this drops the requested DPB / MV
+buffer sizing from level-5.1 generosity to level-4.1 right-sized.
+
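+Without VAAPI exposing framerate we cannot also check MaxMBPS /
+MaxBR / MaxCPB, so the pick is frame-size-only and can understate
+the level of high-frame-rate content: 1080p60 (8160 × 60 = 489600
+MB/s) exceeds Level 4.1's MaxMBPS of 245760 and needs Level 4.2,
+but derives as 41. This is accepted as a practical approximation
+on the assumption that MaxFS captures the dominant resource-sizing
+signal.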
+
+Cross-reference: H.264 spec Annex A, Table A-1 ("Level limits").
+ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
+level_idc as required-userspace-input, no kernel-derives annotation.
+
+Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
+---
+--- a/src/h264.c
+++ b/src/h264.c
+@@ -638,6 +638,55 @@
+ 	}
+ }
+
+/*
+ * Derive sps.level_idc from the encoded frame size in macroblocks per
+ * H.264 Annex A.3 (Table A-1) MaxFS thresholds. Each level's MaxFS is
+ * the maximum encoded frame size in MBs the level supports; we pick
+ * the smallest level whose MaxFS contains the actual frame size.
+ *
+ * Level encoding for the V4L2 control: level_idc = level * 10
+ *   Level 1.0 → 10, Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
+ *
+ * VAAPI does not expose the bitstream's actual level_idc on the
+ * decode side (VAPictureParameterBufferH264 has no such field) — see
+ * va.h. The H.264 SPS NAL is parsed client-side by ffmpeg-vaapi /
+ * mpv and only slice data is forwarded in VASliceDataBuffer, so a
+ * SPS-NAL byte parser is not viable at this layer.
+ *
+ * Without framerate we cannot also check MaxMBPS / MaxBR / MaxCPB.
+ * The frame-size-only pick can therefore understate the level of
+ * high-frame-rate content (1080p60 needs Level 4.2 by MaxMBPS but
+ * derives as 41), and a stream signalled at a higher level than
+ * its frame size requires may use a deeper DPB than the derived
+ * level implies. MaxFS is treated as the dominant decode-resource
+ * sizing signal on that understanding.
+ *
+ * Picks for typical content:
+ *   1080p (8160 MBs) → Level 4.1 (level_idc = 41)
+ *   4K   (32400 MBs) → Level 5.1 (level_idc = 51)
+ *   8K  (129600 MBs) → Level 6.0 (level_idc = 60)
+ *
+ * Replaces the hardcoded level_idc=51 from patch 0013.
+ */
+static inline __u8 h264_derive_level_idc(unsigned int width_in_mbs,
+					 unsigned int height_in_mbs)
+{
+	const unsigned int frame_size_mbs = width_in_mbs * height_in_mbs;
+
+	if (frame_size_mbs <= 99)     return 10;  /* Level 1.0 */
+	if (frame_size_mbs <= 396)    return 11;  /* Level 1.1 - 2.0 */
+	if (frame_size_mbs <= 792)    return 21;  /* Level 2.1 */
+	if (frame_size_mbs <= 1620)   return 22;  /* Level 2.2 - 3.0 */
+	if (frame_size_mbs <= 3600)   return 31;  /* Level 3.1 */
+	if (frame_size_mbs <= 5120)   return 32;  /* Level 3.2 */
+	if (frame_size_mbs <= 8192)   return 41;  /* Level 4.0 - 4.1 */
+	if (frame_size_mbs <= 8704)   return 42;  /* Level 4.2 */
+	if (frame_size_mbs <= 22080)  return 50;  /* Level 5.0 */
+	if (frame_size_mbs <= 36864)  return 51;  /* Level 5.1 - 5.2 */
+	if (frame_size_mbs <= 139264) return 60;  /* Level 6.0 - 6.2 */
+	return 62;                                /* > Level 6 ceiling */
+}
+
+ int h264_set_controls(struct request_data *driver_data,
+ 		      struct object_context *context,
+ 		      VAProfile profile,
+@@ -705,33 +754,15 @@
+ 	sps.profile_idc = h264_profile_to_idc(profile);
+
+ 	/*
+-	 * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
+-	 * level_idc — see va.h, the field exists only in
+-	 * VAEncSequenceParameterBufferH264 on the encode path. The H.264
+-	 * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
+-	 * parses it client-side and forwards only slice data), so a
+-	 * SPS-NAL byte extractor is not viable from the bitstream we
+-	 * receive.
+-	 *
+-	 * Hantro and other stateless H.264 decoders use level_idc to
+-	 * pre-allocate decoder resources (DPB, motion-vector buffers); a
+-	 * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
+-	 * 1.0) and causes hantro to silently skip the decode hardware
+-	 * dispatch.
+-	 *
+-	 * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
+-	 * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
+-	 * decoder resources and is sufficient for any stream up to 4K@30.
+-	 * It is corpus-correct, not contract-correct.
+-	 *
+-	 * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
+-	 * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
+-	 * thresholds. That is a small lookup table but requires also
+-	 * mapping the consumer's framerate, which VAAPI doesn't provide
+-	 * directly. For now the over-allocation is the upstreamable
+-	 * compromise.
+	 * Derive level_idc from encoded frame size per H.264 Annex A.3.
+	 * VAAPI doesn't expose level_idc on the decode side (see
+	 * h264_derive_level_idc()'s docblock for the rationale); we pick
+	 * the smallest level whose MaxFS contains the picture dimensions.
+	 * Replaces patch 0013's intermediate hardcode of 51.
+ 	 */
+-	sps.level_idc = 51;
+	sps.level_idc = h264_derive_level_idc(
+		(unsigned int)surface->params.h264.picture.picture_width_in_mbs_minus1 + 1u,
+		(unsigned int)surface->params.h264.picture.picture_height_in_mbs_minus1 + 1u);
+
+ 	/*
+ 	 * Build the per-request control list incrementally:
@@ -0,0 +1,255 @@
+# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
+# Campaign: ohm_gl_fix Phase 6 Step 1
+#
+# Forks libva-v4l2-request to add hantro-vpu multiplanar + modern
+# stateless UAPI support. Conflicts/replaces stock libva-v4l2-request.
+#
+# Build target: fermi LXD on hertz (Arch ARM aarch64) via marfrit-packages
+# Gitea Actions; alternative: boltzmann via his subagent.
+
+pkgname=libva-v4l2-request-ohm-gl-fix
+_upstreampkg=libva-v4l2-request
+pkgver=1.0.0.r0.ga3c2476
+pkgrel=2
+pkgdesc="VA-API backend for V4L2 stateless decoders, hantro-vpu multiplanar fork"
+arch=('aarch64')
+url="https://github.com/bootlin/libva-v4l2-request"
+license=('LGPL2.1' 'MIT')
+depends=('libva' 'libdrm' 'systemd-libs')
+makedepends=('meson' 'ninja' 'pkgconf' 'git')
+provides=("${_upstreampkg}=${pkgver}" 'libva-driver')
+conflicts=("${_upstreampkg}")
+replaces=("${_upstreampkg}")
+
+# Bootlin upstream tarball — pinned to last meaningful commit 2019-05-17.
+# Use full SHA: github extracts the archive to <repo>-<full-sha>/, so the
+# short form would mismatch ${srcdir}/${_upstreampkg}-${_commit}.
+_commit=a3c2476de19e6635458273ceeaeceff124fabd63
+source=(
+    "${_upstreampkg}-${_commit}.tar.gz::https://github.com/bootlin/libva-v4l2-request/archive/${_commit}.tar.gz"
+    "fourier-local.patch"
+    "0001-mplane-multiplanar-port.patch"
+    "0002-pre-streamon-controls-and-output-pool.patch"
+    "0003-v4l2-query-helpers.patch"
+    "0004-context-request-pool.patch"
+    "0005-h264-conditional-pred-weights.patch"
+    "0006-h264-frame-based-omit-slice-controls.patch"
+    "0007-context-h264-start-code-annex-b.patch"
+    "0008-h264-fill-decode-params-from-vaapi.patch"
+    "0009-surface-no-capture-sfmt.patch"
+    "0010-DEBUG-hex-dump-output-capture.patch"
+    "0011-DEBUG-sentinel-capture-buffer.patch"
+    "0012-h264-omit-scaling-matrix-frame-based.patch"
+    "0013-h264-sps-level-idc.patch"
+    "0014-DEBUG-vapic-bytes-dump.patch"
+    "0015-h264-strip-ffmpeg-poc-sentinel.patch"
+    "0016-h264-derive-pframe-bframe-flags.patch"
+    "0017-h264-dpb-picnum-correctness.patch"
+    "0018-h264-level-idc-from-annex-a3.patch"
+)
+sha256sums=(
+    '92b523050561d64f7b6016edb53ca00524805f9f31a8b566baf457bbb15716fa'
+    '1577ff1e2fd7944d2af85bba07658c26b2c54787175c2cc9024174ad2425d3ac'
+    '7637c8c76f86a4b745516cdfd1ee89484b7fe7ce88425ff38460bd4494e1451e'
+    '5309582759f260456b15635be610c2fe6fe25cbdf427cf8ad851f74991dc8c6e'
+    '4e38eacc2b2dc26094cbad38964e8dc8bec19d2ad408a37a3ee21952003e6c38'
+    'e5b61965921093292912136ae21727c9c792d0417201d86dc90b2e622f1edbb0'
+    'c7a8e02f2e84c6248586d1ceacf25f4c26578f2f365044c3b4a011080ec016e8'
+    '5ea1f8b193a3cba21631b00ac3d9cb8c6016754f0af47b33fcccce9e0114b32a'
+    '092abc79f639ecb7ac698a48fb544edc3b6eff3cb9b711efa2cc452365c17ed0'
+    'ac81992783f562128f55620dc54507b70026342519bb7c7d3efadf6275387861'
+    '6b2c26feeeaf253f87e0ef0517191b636c6945e374660a13574f3114331aaed6'
+    'ef8706062302fd7f13535501b5b2e8aed5325dbb0d4d56a88674bef53ca96eee'
+    '242a42e10ff09e4e82bcadb8824e036dddab94cf6dc9c5f6c80eb4c2cc5dda50'
+    '417c39397dfbc86db2cabc6217f54d9072de26dedcacf9a965b909fc998de052'
+    '472deb316ff3ad282c6be028cfaf033d69ddfee845dcd519c28a0692f298bb6a'
+    '835378dd0b7c126a6101b8df0c015951d88f5139f9586a618af6b3ee503d67b6'
+    '380d334a88213185183a05e7e55380de503e388fc29d8b11d96909dafcbbeb65'
+    'eaf1e363de111ee43d7ca3e4b161d9a3a3f6b1c9ca3d8642871abe70f18fbf95'
+    '7b6f0f63fdde32a411cf3230cabeb610ef8a6bd09777976a06dbd274daa540c7'
+    '15a0a40b918988e77e5f36eebf15e9f45b2c13a6628b5640efdac528c57aab80'
+)
+
+prepare() {
+    cd "${srcdir}/${_upstreampkg}-${_commit}"
+
+    # Patch 0: fourier's stateless-control modernization.
+    # - src/h264.c + src/picture.c → V4L2_CID_STATELESS_H264_*
+    # - include/hevc-ctrls.h → redirect shim to <linux/v4l2-controls.h>
+    # - src/meson.build: h265.c/h265.h commented out (HEVC excluded)
+    patch -p1 < "${srcdir}/fourier-local.patch"
+
+    # Hygiene: include/h264-ctrls.h is dead post-fourier (no source
+    # includes it, no install_headers directive). Drop it so the
+    # built source tree has no stale UAPI carry-over.
+    rm -f include/h264-ctrls.h
+
+    # Patch 1: ohm-gl-fix multiplanar port.
+    # - V4L2_BUF_TYPE_VIDEO_{OUTPUT,CAPTURE} -> *_MPLANE in src/v4l2.c
+    # - per-plane VIDIOC_EXPBUF in src/surface.c
+    # - struct v4l2_plane planes[] threading throughout
+    # - image.c plane-stride adjustments
+    patch -p1 < "${srcdir}/0001-mplane-multiplanar-port.patch"
+
+    # Patch 2: pre-STREAMON device controls + minimum OUTPUT pool.
+    # - context.c: floor OUTPUT pool to 4 buffers (not surfaces_count)
+    # - context.c: set V4L2_CID_STATELESS_H264_{DECODE_MODE,START_CODE}
+    #              device-wide before VIDIOC_STREAMON
+    # THROWAWAY per upstreamable_design.md §5: the pool floor is
+    # superseded by patch 0004 (request_pool); the set_controls block
+    # stays until the planned probe-then-set DECODE_MODE commit lands.
+    patch -p1 < "${srcdir}/0002-pre-streamon-controls-and-output-pool.patch"
+
+    # Patch 3 (commit 2 in revised plan): QUERYCTRL/QUERYMENU helpers.
+    # Pure utility additions to src/v4l2.{c,h}, no behaviour change.
+    # Unblocks the request_pool and probe-then-set commits.
+    patch -p1 < "${srcdir}/0003-v4l2-query-helpers.patch"
+
+    # Patch 4 (commit 3 in revised plan): request_pool decoupling.
+    # NEW src/request_pool.{c,h}; context.c uses pool instead of
+    # per-surface OUTPUT loop; picture.c borrows on Begin, releases
+    # on Sync after DQBUF. Deletes 0002's "floor to 4" hunk inline
+    # — the pool's count parameter supersedes it. 0002's
+    # set_controls block remains until probe-then-set commit lands.
+    patch -p1 < "${srcdir}/0004-context-request-pool.patch"
+
+    # Patch 5: conditional PRED_WEIGHTS submission. Defect-fix found
+    # via ohm smoke testing (kernel rejects PRED_WEIGHTS at
+    # error_idx=5 on Main-profile clips without weighted prediction).
+    # Not part of Sonnet's planned series, but unblocks per-frame
+    # decode on every backing driver.
+    patch -p1 < "${srcdir}/0005-h264-conditional-pred-weights.patch"
+
+    # Patch 6: omit per-slice controls in FRAME_BASED mode. Identified
+    # via cross-reference against GStreamer's gstv4l2codech264dec.c
+    # (commit 9e3e775). FRAME_BASED requests must contain only
+    # SPS/PPS/SCALING_MATRIX/DECODE_PARAMS — submitting SLICE_PARAMS
+    # triggers V4L2 cluster validation EINVAL at error_idx=count.
+    # Hardcodes slice_based=false for now since 0002 sets FRAME_BASED;
+    # promotes to runtime probe via context->decode_mode in a
+    # follow-up commit.
+    patch -p1 < "${srcdir}/0006-h264-frame-based-omit-slice-controls.patch"
+
+    # Patch 7: enable ANNEX_B start-code emission in
+    # codec_store_buffer to match the device-side START_CODE_ANNEX_B
+    # set by 0002. Without this, kernel sees a raw NAL stream with
+    # no 0x00 0x00 0x01 markers, fails to parse slice boundaries,
+    # and emits a zeroed CAPTURE buffer (flat green frames in mpv).
+    patch -p1 < "${srcdir}/0007-context-h264-start-code-annex-b.patch"
+
+    # Patch 8: fill DECODE_PARAMS frame_num + FIELD_PIC/BOTTOM_FIELD
+    # flag bits from VAAPI. fourier left these zero-init; under
+    # FRAME_BASED on hantro the kernel uses them to drive bitstream
+    # parsing. Empirical question: does hantro tolerate the bit_size
+    # fields (idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
+    # dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
+    # slice_group_change_cycle) being zero, or do we need a
+    # slice_header() bit-level parser?
+    patch -p1 < "${srcdir}/0008-h264-fill-decode-params-from-vaapi.patch"
+
+    # Patch 9: drop VIDIOC_S_FMT on CAPTURE queue. Hantro derives
+    # CAPTURE format from per-request SPS; explicit S_FMT here can
+    # leave the driver in a state where DQBUF returns zeroed
+    # buffers despite no errors. GStreamer's reference path only
+    # G_FMTs the CAPTURE side.
+    patch -p1 < "${srcdir}/0009-surface-no-capture-sfmt.patch"
+
+    # Patch 10: DEBUG-only instrumentation. Hex-dumps OUTPUT and
+    # CAPTURE buffer first 32 bytes per frame via request_log().
+    # To be removed before upstream submission.
+    patch -p1 < "${srcdir}/0010-DEBUG-hex-dump-output-capture.patch"
+
+    # Patch 11: DEBUG-only sentinel write before CAPTURE QBUF.
+    # Tells us whether kernel wrote to the buffer (sentinel gone)
+    # or didn't (sentinel survives).
+    patch -p1 < "${srcdir}/0011-DEBUG-sentinel-capture-buffer.patch"
+
+    # Patch 12 (REVISED 2026-05-02): gate SCALING_MATRIX submission
+    # on a per-surface matrix_set flag mirroring fourier's existing
+    # mpeg2.iqmatrix_set / h265.iqmatrix_set pattern. The earlier
+    # draft of this patch unconditionally omitted SCALING_MATRIX in
+    # FRAME_BASED, which was corpus-correct (bbb has no explicit
+    # scaling lists) but the wrong predicate — kernel-side gating
+    # is by "matrix-supplied vs. not," not by decode mode. Streams
+    # with explicit scaling lists must still submit in either mode.
+    # Three coordinated changes: surface.h adds bool matrix_set;
+    # picture.c sets it on VAIQMatrixBuffer arrival and resets it
+    # in RequestBeginPicture; h264.c builds controls[] incrementally.
+    # The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is
+    # preserved (kernel doc is explicit on it).
+    patch -p1 < "${srcdir}/0012-h264-omit-scaling-matrix-frame-based.patch"
+
+    # Patch 13 (REVISED 2026-05-02): hardcode SPS level_idc = 51 as
+    # an INTENTIONALLY OVER-ALLOCATING known-incomplete intermediate.
+    # VAAPI's decode-side picture-parameter buffer structurally lacks
+    # level_idc (only present in encode path). The H.264 SPS NAL is
+    # not in VASliceDataBuffer either (ffmpeg-vaapi parses it
+    # client-side and forwards only slice data — verified via the
+    # 0010 hex dump showing OUTPUT first bytes are "00 00 01 65 ...",
+    # i.e. start code + IDR slice NAL, no SPS). So a SPS-NAL byte
+    # extractor is not viable. TODO captured inline for level-from-
+    # resolution derivation per H.264 Annex A.3.
+    patch -p1 < "${srcdir}/0013-h264-sps-level-idc.patch"
+
+    # Patch 14: DEBUG-only — dump VAPictureH264 raw bytes + decoded
+    # fields. Used to disambiguate the TopFieldOrderCnt=65536 anomaly
+    # on ohm during the 2026-04-30..2026-05-02 investigation. The dump
+    # output, cross-referenced against meitner ground truth (i965
+    # backend), confirmed that +0x10000 is an ffmpeg-vaapi convention,
+    # not an ohm bug.
+    # Resolution: patch 0015. This patch stays in the series until
+    # the 65536 sentinel handling has been validated on ohm; remove
+    # before upstream submission.
+    patch -p1 < "${srcdir}/0014-DEBUG-vapic-bytes-dump.patch"
+
+    # Patch 15: strip ffmpeg-vaapi's POC sentinel before passing to
+    # V4L2. Root-cause fix for the "kernel decodes successfully but
+    # produces zeroed CAPTURE buffers" symptom. ffmpeg's
+    # H264POCContext initialises prev_poc_msb to (1 << 16) and the
+    # value leaks through field_poc[] to VAPictureH264. Working
+    # backends (i965, intel-iHD) tolerate the high word; V4L2
+    # stateless drivers cannot. Adds h264_strip_ffmpeg_poc_sentinel()
+    # static inline and applies it at all 4 POC sites (DPB top/bot,
+    # CurrPic top/bot). Detection by bit-16-set so a future ffmpeg
+    # version that fixes the leak degrades gracefully.
+    patch -p1 < "${srcdir}/0015-h264-strip-ffmpeg-poc-sentinel.patch"
+
+    # Patch 16: derive PFRAME / BFRAME flags from VAAPI slice_type.
+    # Upstreamability fix — tegra-vde consumes these flags to choose
+    # the inter-frame decode kernel; hantro/rkvdec/cedrus/mediatek/
+    # qcom don't read them but should still see spec-correct values.
+    patch -p1 < "${srcdir}/0016-h264-derive-pframe-bframe-flags.patch"
+
+    # Patch 17: fill dpb[].pic_num as PicNum/LongTermPicNum per H.264
+    # spec equations 8-28/8-29 instead of fourier's wrong VAAPI
+    # surface-id assignment. Adds VAPicture parameter to h264_fill_dpb
+    # so it can compute FrameNumWrap from log2_max_frame_num_minus4 +
+    # current frame_num. Mediatek consumes pic_num for short-term
+    # field-coded ref disambiguation; hantro doesn't read it (uses
+    # reference_ts), which is why fourier's wrong value never
+    # surfaced on RK3568.
+    patch -p1 < "${srcdir}/0017-h264-dpb-picnum-correctness.patch"
+
+    # Patch 18: derive sps.level_idc from encoded frame size per
+    # H.264 Annex A.3 (Table A-1) MaxFS thresholds. Replaces patch
+    # 0013's intermediate hardcode of 51. For typical content:
+    # 1080p → Level 4.1 (level_idc=41), 4K → 5.1, 8K → 6.0. Hantro
+    # uses level_idc to size DPB / MV buffers; correct sizing means
+    # less wasted memory than 0013's blanket over-allocation.
+    patch -p1 < "${srcdir}/0018-h264-level-idc-from-annex-a3.patch"
+}
+
+build() {
+    cd "${srcdir}/${_upstreampkg}-${_commit}"
+    # meson_options.txt only exposes 'kernel_headers' — leave it empty to
+    # use system /usr/include kernel UAPI headers. No per-codec toggles.
+    arch-meson build --buildtype=release
+    meson compile -C build
+}
+
+package() {
+    cd "${srcdir}/${_upstreampkg}-${_commit}"
+    meson install -C build --destdir "${pkgdir}"
+
+    install -Dm644 COPYING       "${pkgdir}/usr/share/licenses/${pkgname}/COPYING"
+    install -Dm644 COPYING.LGPL  "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.LGPL"
+    install -Dm644 COPYING.MIT   "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.MIT"
+}
@@ -0,0 +1,58 @@
+# libva-v4l2-request-ohm-gl-fix
+
+Bootlin's libva-v4l2-request VA-API backend, with hantro-vpu
+multi-planar + chromium-149-era stateless H.264 patches developed
+in the [ohm_gl_fix campaign](../../../ohm_gl_fix/) Phase 6 Step 1
+(2026-05-01..2026-05-02).
+
+Patches 0001-0018 are contract-correct against the kernel V4L2
+stateless H.264 UAPI, validated by inspection against
+hantro_h264.c and v4l2_h264_init_reflist_builder() in linux 6.19.x.
+See ohm_gl_fix's `phase6/step1/audit_0008_decode_params_2026-05-01.md`
+and `phase6/step1/api_contract_findings_2026-05-01.md` for the
+audit trail.
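+
+One contract point the series handles is covered by patch 0015
+(described in the PKGBUILD's `prepare()`). That patch strips the
+`1 << 16` offset that ffmpeg-vaapi leaves in the VAPictureH264 POC
+fields before they reach V4L2. The C below is a minimal sketch of
+that bit-16 detection, not the patch's code. The helper name is
+hypothetical. The `poc >= 0` guard is an assumption added here,
+because a genuinely negative POC also has bit 16 set in two's
+complement.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+
+/* ffmpeg's H264POCContext seeds prev_poc_msb with 1 << 16; the offset
+ * then shows up in the POC values forwarded to VAPictureH264. */
+#define FFMPEG_POC_SENTINEL (1 << 16)
+
+/* Hypothetical helper: subtract the sentinel only when bit 16 is set,
+ * so POCs without the leak pass through unchanged. The poc >= 0 guard
+ * is an assumption of this sketch, not taken from patch 0015. */
+static inline int32_t strip_ffmpeg_poc_sentinel(int32_t poc)
+{
+    if (poc >= 0 && (poc & FFMPEG_POC_SENTINEL))
+        return poc - FFMPEG_POC_SENTINEL;
+    return poc;
+}
+
+int main(void)
+{
+    assert(strip_ffmpeg_poc_sentinel(65536) == 0);     /* the 65536 anomaly value */
+    assert(strip_ffmpeg_poc_sentinel(65536 + 4) == 4); /* sentinel plus real POC */
+    assert(strip_ffmpeg_poc_sentinel(4) == 4);         /* no sentinel: unchanged */
+    assert(strip_ffmpeg_poc_sentinel(-2) == -2);       /* negative POC: unchanged */
+    printf("ok\n");
+    return 0;
+}
+```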
+
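+Patch 0018 (also described in `prepare()`) derives `level_idc`
+from the frame size, using the MaxFS column of H.264 Table A-1.
+The sketch below shows that lookup. It assumes frame dimensions in
+pixels and checks MaxFS only; Table A-1 also limits macroblock rate
+and DPB size, and this sketch ignores both. The table keeps one
+level per distinct MaxFS value. Where several levels share a value,
+the level chosen here is an assumption, picked so that the
+PKGBUILD's examples (1080p → 41, 4K → 51, 8K → 60) come out as
+stated. The function name is hypothetical.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+
+/* One row per distinct MaxFS step of H.264 Table A-1 (macroblocks).
+ * The level picked for shared MaxFS values is an assumption. */
+struct level_step {
+    uint32_t max_fs;   /* MaxFS in 16x16 macroblocks */
+    uint8_t level_idc; /* level number x 10 */
+};
+
+static const struct level_step level_steps[] = {
+    {    99, 10 }, {   396, 20 }, {   792, 21 }, {  1620, 30 },
+    {  3600, 31 }, {  5120, 32 }, {  8192, 41 }, {  8704, 42 },
+    { 22080, 50 }, { 36864, 51 }, { 139264, 60 },
+};
+
+/* Hypothetical helper: smallest table step whose MaxFS fits the frame. */
+static uint8_t level_idc_from_frame_size(uint32_t width, uint32_t height)
+{
+    /* Round pixel dimensions up to whole macroblocks. */
+    uint32_t frame_size_mbs = ((width + 15) / 16) * ((height + 15) / 16);
+    size_t n = sizeof(level_steps) / sizeof(level_steps[0]);
+
+    for (size_t i = 0; i < n; i++)
+        if (frame_size_mbs <= level_steps[i].max_fs)
+            return level_steps[i].level_idc;
+
+    /* Larger than any Table A-1 MaxFS: fall back to the top entry. */
+    return level_steps[n - 1].level_idc;
+}
+
+int main(void)
+{
+    assert(level_idc_from_frame_size(1920, 1080) == 41); /* 120 x 68 = 8160 MBs */
+    assert(level_idc_from_frame_size(3840, 2160) == 51); /* 240 x 135 = 32400 MBs */
+    assert(level_idc_from_frame_size(7680, 4320) == 60); /* 480 x 270 = 129600 MBs */
+    printf("ok\n");
+    return 0;
+}
+```
+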
+## Honest characterisation
+
+This package compiles cleanly, installs cleanly, and `vainfo` with
+`LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1`
+enumerates all H.264 profiles. It is, however, **not on Brave's
+critical decode path** on the ohm_gl_fix Step-1+Step-2 stack —
+Brave/Chromium uses its own `V4L2VideoDecoder` in
+`media/gpu/v4l2/`, opens `/dev/video1` directly, and never loads
+libva. See `phase3_remeasure_2026-05-02/B3_decoder_discovery.md`
+in the ohm_gl_fix repo for the strace/maps evidence.
+
+For consumers that do route through libva (mpv with
+`--hwdec=vaapi`, ffmpeg-vaapi, Firefox with
+`media.ffmpeg.vaapi.enabled=true`), this backend should in
+principle engage hantro hardware decode. However, each mpv path
+tested so far hits its own downstream issue:
+
+- mpv `--vo=gpu-next`: blocked by a Mesa-panfrost WSI pitch bug
+  during EGL dmabuf import (out of scope for this package; see
+  `phase3_remeasure_2026-05-02/A3_mesa_wsi_pitch.md`).
+- mpv `--vo=image`: silently falls back to libavcodec SW decode
+  rather than engaging the libva session (see
+  `phase3_remeasure_2026-05-02/A1_morning_pass_disambiguation.md`).
+  Reason not yet diagnosed.
+
+The most likely use case where this backend cleanly delivers
+hardware decode end-to-end is **Firefox via libavcodec-vaapi**, on
+a stack that also has the Mesa pitch issue resolved (or that
+doesn't hit the EGL import path because Firefox's video element
+composites differently). Untested at time of writing.
+
+## DEBUG patches in the series
+
+`0010-DEBUG-hex-dump-output-capture.patch`,
+`0011-DEBUG-sentinel-capture-buffer.patch`, and
+`0014-DEBUG-vapic-bytes-dump.patch` produce verbose stderr output
+useful for diagnosing decode-path issues but excessive for
+production. They are intentionally kept in the applied series
+during development. For cleaner runs, drop them from `source`,
+`sha256sums`, and `prepare()` in the PKGBUILD.
+
+## Status
+
+Tagged stable as of 2026-05-02 against chromium-149 / linux-6.19.10
+/ libva-2.22.0. Contract-correct; ecosystem-validation-pending.