Add libva-v4l2-request-ohm-gl-fix package
build and publish packages / distcc-avahi-aarch64 (push) Successful in 1m3s
build and publish packages / lmcp-any (push) Successful in 9s
build and publish packages / lmcp-debian (push) Successful in 4s
build and publish packages / claude-his-any (push) Failing after 4s
build and publish packages / ffmpeg-v4l2-request-aarch64 (push) Has been skipped
build and publish packages / claude-his-debian (push) Has been skipped
build and publish packages / distcc-avahi-aarch64 (push) Successful in 1m3s
build and publish packages / lmcp-any (push) Successful in 9s
build and publish packages / lmcp-debian (push) Successful in 4s
build and publish packages / claude-his-any (push) Failing after 4s
build and publish packages / ffmpeg-v4l2-request-aarch64 (push) Has been skipped
build and publish packages / claude-his-debian (push) Has been skipped
Mirrors phase6/step1/ from the ohm_gl_fix campaign. Contract-correct
hantro multi-planar / chromium-149-era stateless H.264 port of
bootlin's libva-v4l2-request, patches 0001..0018 + fourier-local.
Honest characterisation in README:
- Builds cleanly on chromium-builder LXC (boltzmann)
- vainfo enumerates H.264 profiles cleanly with LIBVA_DRIVER_NAME=v4l2_request
- NOT on Brave's decode path on ohm_gl_fix stack — Brave uses
Chromium's own V4L2VideoDecoder in media/gpu/v4l2/.
- Most likely useful for a future Firefox-via-libavcodec-vaapi
campaign, modulo a separate Mesa-panfrost WSI pitch issue.
- DEBUG patches (0010, 0011, 0014) intentionally kept in series
for development; remove for cleaner production runs.
Audit trail in the source repo at ohm_gl_fix:
phase6/step1/audit_0008_decode_params_2026-05-01.md
phase6/step1/api_contract_findings_2026-05-01.md
phase3_remeasure_2026-05-02/B3_decoder_discovery.md (why this
isn't on Brave's path)
This commit is contained in:
@@ -0,0 +1,70 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu
|
||||
|
||||
Fourier's local patch already wired multiplanar plumbing through
|
||||
src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69,
|
||||
struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane
|
||||
EXPBUF loop at line 411) and through src/context.c, src/buffer.c,
|
||||
src/picture.c via the v4l2_type_video_{output,capture}(video_format
|
||||
->v4l2_mplane) helper calls.
|
||||
|
||||
The remaining gap: the NV12 entry in src/video.c was hardcoded to
|
||||
v4l2_mplane=false, and the bootstrap path in src/surface.c was
|
||||
hardcoded to singleplanar literals before video_format is populated.
|
||||
|
||||
This patch flips the NV12 entry to v4l2_mplane=true and updates the
|
||||
two singleplanar literals in src/surface.c to their MPLANE variants:
|
||||
|
||||
- src/video.c:42 v4l2_mplane=false -> true (NV12 only;
|
||||
Sunxi-tiled NV12 left at false for cedrus compatibility)
|
||||
- src/surface.c:84 output_type = v4l2_type_video_output(true)
|
||||
- src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12)
|
||||
|
||||
Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under
|
||||
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type
|
||||
returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer
|
||||
strace baseline).
|
||||
|
||||
Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are
|
||||
left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false,
|
||||
__arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the
|
||||
known userbase here is RK3566/RK3568 hantro.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
src/surface.c | 4 ++--
|
||||
src/video.c | 2 +-
|
||||
2 files changed, 3 insertions(+), 3 deletions(-)
|
||||
|
||||
--- a/src/video.c
|
||||
+++ b/src/video.c
|
||||
@@ -39,7 +39,7 @@ static struct video_format formats[] = {
|
||||
.description = "NV12 YUV",
|
||||
.v4l2_format = V4L2_PIX_FMT_NV12,
|
||||
.v4l2_buffers_count = 1,
|
||||
- .v4l2_mplane = false,
|
||||
+ .v4l2_mplane = true,
|
||||
.drm_format = DRM_FORMAT_NV12,
|
||||
.drm_modifier = DRM_FORMAT_MOD_NONE,
|
||||
.planes_count = 2,
|
||||
--- a/src/surface.c
|
||||
+++ b/src/surface.c
|
||||
@@ -81,7 +81,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
|
||||
// we declare SET_FORMAT_OF_OUTPUT_ONCE to ensure v4l2_set_format only gets called once
|
||||
// (in the first RequestCreateSurfaces2 call BEFORE any buffers are created later on)
|
||||
unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;
|
||||
- unsigned int output_type = v4l2_type_video_output(false);
|
||||
+ unsigned int output_type = v4l2_type_video_output(true);
|
||||
|
||||
if (!SET_FORMAT_OF_OUTPUT_ONCE) {
|
||||
rc = v4l2_set_format(driver_data->video_fd, output_type, pixelformat,
|
||||
@@ -106,7 +106,7 @@ VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
|
||||
video_format = video_format_find(V4L2_PIX_FMT_SUNXI_TILED_NV12);
|
||||
|
||||
found = v4l2_find_format(driver_data->video_fd,
|
||||
- V4L2_BUF_TYPE_VIDEO_CAPTURE,
|
||||
+ V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
|
||||
V4L2_PIX_FMT_NV12);
|
||||
if (found)
|
||||
video_format = video_format_find(V4L2_PIX_FMT_NV12);
|
||||
@@ -0,0 +1,103 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] context: pre-STREAMON device controls and minimum OUTPUT pool
|
||||
|
||||
Two related fixes that surfaced during the first hantro-vpu (RK3568)
|
||||
smoke test of the multiplanar build:
|
||||
|
||||
1. **OUTPUT queue must be non-empty at STREAMON.** Hantro's
|
||||
vb2_start_streaming rejects an empty queue with EINVAL. Some
|
||||
VA-API callers (notably ffmpeg's vaapi-copy path) call
|
||||
vaCreateContext with num_render_targets=0 and allocate render
|
||||
targets lazily. The OUTPUT (bitstream-input) pool must NOT be
|
||||
sized off surfaces_count alone — it is a request-time resource,
|
||||
not per-surface. Quick fix: floor the pool to 4 buffers when
|
||||
the caller passes 0. (A proper decoupling of OUTPUT pool from
|
||||
surface lifecycle is documented in upstreamable_design.md.)
|
||||
|
||||
2. **Device-wide stateless H.264 controls before STREAMON.** The
|
||||
V4L2 stateless framework requires V4L2_CID_STATELESS_H264_
|
||||
DECODE_MODE and START_CODE be set on the device fd
|
||||
(request_fd=-1) before stream start. Per-request controls
|
||||
(SPS/PPS/SLICE_PARAMS/etc.) attached to a request_fd come
|
||||
later via h264_set_controls(). hantro-vpu accepts only
|
||||
DECODE_MODE_FRAME_BASED; START_CODE_ANNEX_B matches what the
|
||||
existing slice-assembly path emits.
|
||||
|
||||
This is set unconditionally for now (errors silently ignored)
|
||||
to keep cedrus and other backends compatible — they may
|
||||
default to SLICE_BASED and not expose DECODE_MODE at all.
|
||||
Probe-then-set via VIDIOC_QUERYCTRL is the upstream-correct
|
||||
approach (see upstreamable_design.md §3).
|
||||
|
||||
After this patch, vainfo still enumerates as before, but the first
|
||||
mpv vaapi-copy attempt advances past STREAMON and into actual
|
||||
decode submission.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
src/context.c | 38 +++++++++++++++++++++++++++++++++++++-
|
||||
1 file changed, 37 insertions(+), 1 deletion(-)
|
||||
|
||||
--- a/src/context.c
|
||||
+++ b/src/context.c
|
||||
@@ -64,6 +64,7 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
||||
VAContextID id;
|
||||
VAStatus status;
|
||||
unsigned int output_type, capture_type;
|
||||
+ unsigned int output_buffers_count;
|
||||
unsigned int index_base;
|
||||
unsigned int index;
|
||||
unsigned int i;
|
||||
@@ -90,8 +91,16 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
||||
}
|
||||
memset(&context_object->dpb, 0, sizeof(context_object->dpb));
|
||||
|
||||
+ /*
|
||||
+ * The OUTPUT (bitstream-input) queue must be non-empty before
|
||||
+ * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
|
||||
+ * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
|
||||
+ * vaCreateContext with num_render_targets=0; allocate a small
|
||||
+ * minimum pool independent of the caller's surface count.
|
||||
+ */
|
||||
+ output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
|
||||
rc = v4l2_create_buffers(driver_data->video_fd, output_type,
|
||||
- surfaces_count, &index_base);
|
||||
+ output_buffers_count, &index_base);
|
||||
if (rc < 0) {
|
||||
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
|
||||
goto error;
|
||||
@@ -138,6 +147,33 @@ VAStatus RequestCreateContext(VADriverContextP context, VAConfigID config_id,
|
||||
surface_object->source_size = length;
|
||||
}
|
||||
|
||||
+ /*
|
||||
+ * Stateless H.264 device-wide controls. The kernel V4L2 stateless
|
||||
+ * framework requires DECODE_MODE and START_CODE be set on the
|
||||
+ * device fd (request_fd=-1) before VIDIOC_STREAMON; per-request
|
||||
+ * controls (SPS/PPS/etc.) attached to a request_fd come later.
|
||||
+ *
|
||||
+ * hantro-vpu (RK3568) accepts only DECODE_MODE_FRAME_BASED.
|
||||
+ * START_CODE_ANNEX_B preserves leading 0x00000001 in the slice
|
||||
+ * payload that h264.c assembles. Errors here are not fatal: not
|
||||
+ * every backing driver supports both controls (e.g. cedrus may
|
||||
+ * default to SLICE_BASED without exposing DECODE_MODE).
|
||||
+ */
|
||||
+ {
|
||||
+ struct v4l2_ext_control dev_ctrls[2] = {
|
||||
+ {
|
||||
+ .id = V4L2_CID_STATELESS_H264_DECODE_MODE,
|
||||
+ .value = V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
|
||||
+ },
|
||||
+ {
|
||||
+ .id = V4L2_CID_STATELESS_H264_START_CODE,
|
||||
+ .value = V4L2_STATELESS_H264_START_CODE_ANNEX_B,
|
||||
+ },
|
||||
+ };
|
||||
+ (void)v4l2_set_controls(driver_data->video_fd, -1,
|
||||
+ dev_ctrls, 2);
|
||||
+ }
|
||||
+
|
||||
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
|
||||
if (rc < 0) {
|
||||
status = VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
@@ -0,0 +1,145 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] v4l2: add QUERYCTRL/QUERYMENU capability-probe helpers
|
||||
|
||||
Pure utility additions, no behaviour change. Three helpers in
|
||||
src/v4l2.{c,h}:
|
||||
|
||||
- v4l2_query_ext_ctrl(): wraps VIDIOC_QUERY_EXT_CTRL by CID.
|
||||
Returns 0 if the control exists, -1 if not. Caller passes NULL
|
||||
qec to test existence only.
|
||||
|
||||
- v4l2_query_menu(): wraps VIDIOC_QUERYMENU at a given index.
|
||||
Returns 0 if a menu item exists at that index, -1 otherwise.
|
||||
|
||||
- v4l2_ctrl_menu_has_value(): convenience layered on the above.
|
||||
For a menu/intmenu-type control, walks all menu items between
|
||||
minimum and maximum and returns true iff `value` is a valid
|
||||
entry. Used by callers that ask "does this driver accept menu
|
||||
value X for this CID?" without caring about iteration details.
|
||||
|
||||
These unblock commit 3 (request_pool — needs ext-ctrl probing for
|
||||
codec-ops dispatch) and commit 4 (probe-then-set DECODE_MODE/
|
||||
START_CODE — replaces 0002's unconditional set with a real probe)
|
||||
of the upstreamable design's six-commit series.
|
||||
|
||||
Forward-declarations in v4l2.h keep the header lean: existing
|
||||
prototypes already use opaque struct v4l2_ext_control * pointers
|
||||
without including <linux/videodev2.h>; we follow the same
|
||||
convention for struct v4l2_query_ext_ctrl and struct v4l2_querymenu.
|
||||
|
||||
No call sites added in this commit. Compile-only verification:
|
||||
the .so links cleanly with three new exports.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
src/v4l2.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
src/v4l2.h | 33 +++++++++++++++++++++++++++++
|
||||
2 files changed, 93 insertions(+)
|
||||
|
||||
--- a/src/v4l2.h
|
||||
+++ b/src/v4l2.h
|
||||
@@ -64,4 +64,37 @@ int v4l2_set_control(int video_fd, int request_fd, unsigned int id, void *data,
|
||||
unsigned int size);
|
||||
int v4l2_set_stream(int video_fd, unsigned int type, bool enable);
|
||||
|
||||
+/*
|
||||
+ * Capability-probe helpers. These let calling code discover what the
|
||||
+ * backing kernel driver supports rather than hardcoding assumptions
|
||||
+ * about specific decoder hardware.
|
||||
+ */
|
||||
+
|
||||
+/*
|
||||
+ * Query the metadata of an extended control by CID. Fills *qec on
|
||||
+ * success. Returns 0 if the control exists, -1 (errno=EINVAL) if the
|
||||
+ * driver does not expose this CID. Pass qec=NULL to test existence
|
||||
+ * only.
|
||||
+ */
|
||||
+struct v4l2_query_ext_ctrl;
|
||||
+int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
|
||||
+ struct v4l2_query_ext_ctrl *qec);
|
||||
+
|
||||
+/*
|
||||
+ * Query a single menu item of a menu/intmenu control at the given
|
||||
+ * index. Fills *qm on success. Returns 0 if the menu item exists at
|
||||
+ * this index, -1 otherwise.
|
||||
+ */
|
||||
+struct v4l2_querymenu;
|
||||
+int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
|
||||
+ struct v4l2_querymenu *qm);
|
||||
+
|
||||
+/*
|
||||
+ * Convenience: for a menu-type control, return true iff `value` is a
|
||||
+ * valid menu entry (i.e. the driver accepts it). Walks all menu items
|
||||
+ * up to the control's maximum to check.
|
||||
+ */
|
||||
+bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
|
||||
+ unsigned int value);
|
||||
+
|
||||
#endif
|
||||
--- a/src/v4l2.c
|
||||
+++ b/src/v4l2.c
|
||||
@@ -508,3 +508,63 @@ int v4l2_set_stream(int video_fd, unsigned int type, bool enable)
|
||||
|
||||
return 0;
|
||||
}
|
||||
+
|
||||
+int v4l2_query_ext_ctrl(int video_fd, unsigned int id,
|
||||
+ struct v4l2_query_ext_ctrl *qec)
|
||||
+{
|
||||
+ struct v4l2_query_ext_ctrl local;
|
||||
+ struct v4l2_query_ext_ctrl *target = qec ? qec : &local;
|
||||
+ int rc;
|
||||
+
|
||||
+ memset(target, 0, sizeof(*target));
|
||||
+ target->id = id;
|
||||
+
|
||||
+ rc = ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, target);
|
||||
+ if (rc < 0)
|
||||
+ return -1;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+int v4l2_query_menu(int video_fd, unsigned int id, unsigned int index,
|
||||
+ struct v4l2_querymenu *qm)
|
||||
+{
|
||||
+ int rc;
|
||||
+
|
||||
+ if (qm == NULL)
|
||||
+ return -1;
|
||||
+
|
||||
+ memset(qm, 0, sizeof(*qm));
|
||||
+ qm->id = id;
|
||||
+ qm->index = index;
|
||||
+
|
||||
+ rc = ioctl(video_fd, VIDIOC_QUERYMENU, qm);
|
||||
+ if (rc < 0)
|
||||
+ return -1;
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+bool v4l2_ctrl_menu_has_value(int video_fd, unsigned int id,
|
||||
+ unsigned int value)
|
||||
+{
|
||||
+ struct v4l2_query_ext_ctrl qec;
|
||||
+ struct v4l2_querymenu qm;
|
||||
+ long long i;
|
||||
+
|
||||
+ if (v4l2_query_ext_ctrl(video_fd, id, &qec) < 0)
|
||||
+ return false;
|
||||
+
|
||||
+ if (qec.type != V4L2_CTRL_TYPE_MENU &&
|
||||
+ qec.type != V4L2_CTRL_TYPE_INTEGER_MENU)
|
||||
+ return false;
|
||||
+
|
||||
+ for (i = qec.minimum; i <= qec.maximum; i += qec.step ? qec.step : 1) {
|
||||
+ if (v4l2_query_menu(video_fd, id, (unsigned int)i, &qm) < 0)
|
||||
+ continue;
|
||||
+ if ((unsigned int)i == value)
|
||||
+ return true;
|
||||
+ }
|
||||
+
|
||||
+ return false;
|
||||
+}
|
||||
@@ -0,0 +1,545 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] context: introduce request_pool, decouple OUTPUT buffers from surfaces
|
||||
|
||||
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
|
||||
Replaces the prior per-surface OUTPUT-buffer ownership model with a
|
||||
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
|
||||
in flight), allocated unconditionally regardless of caller's
|
||||
num_render_targets.
|
||||
|
||||
Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
|
||||
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
|
||||
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
|
||||
"output_count == surfaces_count" model was a category error: OUTPUT
|
||||
buffers are request-time bitstream slots, CAPTURE buffers are
|
||||
picture-time DPB slots; their lifecycles and sizing are independent.
|
||||
|
||||
Changes:
|
||||
* NEW src/request_pool.{c,h} (~200 LoC):
|
||||
- request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
|
||||
- request_pool_destroy(): munmap all, idempotent.
|
||||
- request_pool_acquire(): round-robin claim; returns V4L2 buffer
|
||||
index of an unused slot or -1.
|
||||
- request_pool_release(): mark slot free for reuse.
|
||||
- request_pool_slot(): accessor for ptr/size given a buffer index.
|
||||
|
||||
* src/request.h: add struct request_pool output_pool to request_data.
|
||||
|
||||
* src/context.c::RequestCreateContext: replace the per-surface
|
||||
OUTPUT loop with a single request_pool_init() call (count=4,
|
||||
independent of surfaces_count). Drop the now-unused locals
|
||||
(length, offset, source_data, output_buffers_count, index,
|
||||
index_base, i, surface_object). DELETES patch 0002's
|
||||
"output_buffers_count = ... ? ... : 4" hack inline — the pool's
|
||||
own count parameter supersedes it.
|
||||
|
||||
* src/picture.c::RequestBeginPicture: borrow a pool slot at frame
|
||||
start, write its mmap pointer/size/index into the surface's
|
||||
transient source_* fields. The fields stay (still useful as
|
||||
a borrow handle that the existing codec_store_buffer memcpys
|
||||
target), but no longer represent surface-permanent ownership.
|
||||
Reset slices_size/slices_count here too (was implicit on first
|
||||
Render).
|
||||
|
||||
* src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
|
||||
the OUTPUT buffer, release the pool slot and clear the surface's
|
||||
borrow handle. Fixes the segv on second-frame submission.
|
||||
|
||||
* src/surface.c::RequestDestroySurfaces: remove the munmap of
|
||||
source_data — pool owns the mmap.
|
||||
|
||||
* src/request.c::RequestTerminate: call request_pool_destroy()
|
||||
before close(video_fd) so munmaps still target a valid fd.
|
||||
|
||||
* src/meson.build: add request_pool.c and request_pool.h to the
|
||||
sources/headers lists.
|
||||
|
||||
This commit removes 0002's OUTPUT-pool hack inline (the
|
||||
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
|
||||
remains until commit 4 lands.
|
||||
|
||||
Build-verified clean on aarch64.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/request.h 2026-05-01 20:09:57.346428828 +0000
|
||||
+++ b/src/request.h 2026-05-01 20:17:57.497514185 +0000
|
||||
@@ -31,6 +31,7 @@
|
||||
|
||||
#include "context.h"
|
||||
#include "object_heap.h"
|
||||
+#include "request_pool.h"
|
||||
#include "video.h"
|
||||
#include <va/va.h>
|
||||
|
||||
@@ -55,6 +56,13 @@
|
||||
int media_fd;
|
||||
|
||||
struct video_format *video_format;
|
||||
+
|
||||
+ /*
|
||||
+ * OUTPUT (bitstream-input) buffer pool, decoupled from VA
|
||||
+ * surfaces. Sized by codec pipeline depth, populated on first
|
||||
+ * RequestCreateContext, torn down at driver Terminate.
|
||||
+ */
|
||||
+ struct request_pool output_pool;
|
||||
};
|
||||
|
||||
VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP context);
|
||||
--- a/src/request.c 2026-05-01 20:09:57.346428828 +0000
|
||||
+++ b/src/request.c 2026-05-01 20:19:48.091143681 +0000
|
||||
@@ -205,6 +205,13 @@
|
||||
struct object_config *config_object;
|
||||
int iterator;
|
||||
|
||||
+ /*
|
||||
+ * Tear down the OUTPUT buffer pool before closing video_fd so
|
||||
+ * the munmap calls in request_pool_destroy() can still touch the
|
||||
+ * mmap regions (which are tied to that fd's lifetime).
|
||||
+ */
|
||||
+ request_pool_destroy(&driver_data->output_pool);
|
||||
+
|
||||
close(driver_data->video_fd);
|
||||
close(driver_data->media_fd);
|
||||
|
||||
--- a/src/context.c 2026-05-01 20:09:57.346428828 +0000
|
||||
+++ b/src/context.c 2026-05-01 20:18:33.738048227 +0000
|
||||
@@ -54,20 +54,12 @@
|
||||
{
|
||||
struct request_data *driver_data = context->pDriverData;
|
||||
struct object_config *config_object;
|
||||
- struct object_surface *surface_object;
|
||||
struct object_context *context_object = NULL;
|
||||
struct video_format *video_format;
|
||||
- unsigned int length;
|
||||
- unsigned int offset;
|
||||
- void *source_data = MAP_FAILED;
|
||||
VASurfaceID *ids = NULL;
|
||||
VAContextID id;
|
||||
VAStatus status;
|
||||
unsigned int output_type, capture_type;
|
||||
- unsigned int output_buffers_count;
|
||||
- unsigned int index_base;
|
||||
- unsigned int index;
|
||||
- unsigned int i;
|
||||
int rc;
|
||||
|
||||
video_format = driver_data->video_format;
|
||||
@@ -92,15 +84,20 @@
|
||||
memset(&context_object->dpb, 0, sizeof(context_object->dpb));
|
||||
|
||||
/*
|
||||
- * The OUTPUT (bitstream-input) queue must be non-empty before
|
||||
- * VIDIOC_STREAMON or hantro-class drivers reject with EINVAL.
|
||||
- * VA-API callers (e.g. ffmpeg's vaapi-copy path) may invoke
|
||||
- * vaCreateContext with num_render_targets=0; allocate a small
|
||||
- * minimum pool independent of the caller's surface count.
|
||||
+ * Initialize the OUTPUT (bitstream-input) buffer pool. Sized by
|
||||
+ * codec pipeline depth (4 H.264 frames in flight is sufficient
|
||||
+ * for current hantro/rkvdec scheduling); independent of caller-
|
||||
+ * supplied surfaces_count. Pool is owned by driver_data so it
|
||||
+ * outlives any single context destroy/recreate cycle.
|
||||
+ *
|
||||
+ * This replaces the prior per-surface OUTPUT loop, which (a)
|
||||
+ * created an empty queue when surfaces_count==0 (ffmpeg vaapi-
|
||||
+ * copy path) and (b) only populated surface->source_* for
|
||||
+ * surfaces present at vaCreateContext time, NULL-derefing on
|
||||
+ * surfaces created later.
|
||||
*/
|
||||
- output_buffers_count = surfaces_count > 0 ? surfaces_count : 4;
|
||||
- rc = v4l2_create_buffers(driver_data->video_fd, output_type,
|
||||
- output_buffers_count, &index_base);
|
||||
+ rc = request_pool_init(&driver_data->output_pool,
|
||||
+ driver_data->video_fd, output_type, 4);
|
||||
if (rc < 0) {
|
||||
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
|
||||
goto error;
|
||||
@@ -111,40 +108,15 @@
|
||||
* we don't have any indication wrt its life time. Let's make sure
|
||||
* its life span is under our control.
|
||||
*/
|
||||
- ids = malloc(surfaces_count * sizeof(VASurfaceID));
|
||||
- if (ids == NULL) {
|
||||
- status = VA_STATUS_ERROR_ALLOCATION_FAILED;
|
||||
- goto error;
|
||||
- }
|
||||
-
|
||||
- memcpy(ids, surfaces_ids, surfaces_count * sizeof(VASurfaceID));
|
||||
-
|
||||
- for (i = 0; i < surfaces_count; i++) {
|
||||
- index = index_base + i;
|
||||
-
|
||||
- surface_object = SURFACE(driver_data, surfaces_ids[i]);
|
||||
- if (surface_object == NULL) {
|
||||
- status = VA_STATUS_ERROR_INVALID_SURFACE;
|
||||
- goto error;
|
||||
- }
|
||||
-
|
||||
- rc = v4l2_query_buffer(driver_data->video_fd, output_type,
|
||||
- index, &length, &offset, 1);
|
||||
- if (rc < 0) {
|
||||
+ if (surfaces_count > 0) {
|
||||
+ ids = malloc(surfaces_count * sizeof(VASurfaceID));
|
||||
+ if (ids == NULL) {
|
||||
status = VA_STATUS_ERROR_ALLOCATION_FAILED;
|
||||
goto error;
|
||||
}
|
||||
|
||||
- source_data = mmap(NULL, length, PROT_READ | PROT_WRITE,
|
||||
- MAP_SHARED, driver_data->video_fd, offset);
|
||||
- if (source_data == MAP_FAILED) {
|
||||
- status = VA_STATUS_ERROR_ALLOCATION_FAILED;
|
||||
- goto error;
|
||||
- }
|
||||
-
|
||||
- surface_object->source_index = index;
|
||||
- surface_object->source_data = source_data;
|
||||
- surface_object->source_size = length;
|
||||
+ memcpy(ids, surfaces_ids,
|
||||
+ surfaces_count * sizeof(VASurfaceID));
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -200,9 +172,6 @@
|
||||
goto complete;
|
||||
|
||||
error:
|
||||
- if (source_data != MAP_FAILED)
|
||||
- munmap(source_data, length);
|
||||
-
|
||||
if (ids != NULL)
|
||||
free(ids);
|
||||
|
||||
--- a/src/picture.c 2026-05-01 20:09:57.346428828 +0000
|
||||
+++ b/src/picture.c 2026-05-01 20:19:10.742593454 +0000
|
||||
@@ -216,6 +216,8 @@
|
||||
struct request_data *driver_data = context->pDriverData;
|
||||
struct object_context *context_object;
|
||||
struct object_surface *surface_object;
|
||||
+ struct request_pool_slot *slot;
|
||||
+ int slot_index;
|
||||
|
||||
context_object = CONTEXT(driver_data, context_id);
|
||||
if (context_object == NULL)
|
||||
@@ -228,6 +230,31 @@
|
||||
if (surface_object->status == VASurfaceRendering)
|
||||
RequestSyncSurface(context, surface_id);
|
||||
|
||||
+ /*
|
||||
+ * Borrow an OUTPUT (bitstream-input) slot from the driver-wide
|
||||
+ * pool for the duration of this Begin/Render/End cycle. The
|
||||
+ * surface's source_* fields hold the borrow's mmap pointer/size/
|
||||
+ * V4L2 buffer index until RequestSyncSurface releases it after
|
||||
+ * VIDIOC_DQBUF.
|
||||
+ */
|
||||
+ slot_index = request_pool_acquire(&driver_data->output_pool);
|
||||
+ if (slot_index < 0)
|
||||
+ return VA_STATUS_ERROR_ALLOCATION_FAILED;
|
||||
+
|
||||
+ slot = request_pool_slot(&driver_data->output_pool,
|
||||
+ (unsigned int)slot_index);
|
||||
+ if (slot == NULL) {
|
||||
+ request_pool_release(&driver_data->output_pool,
|
||||
+ (unsigned int)slot_index);
|
||||
+ return VA_STATUS_ERROR_ALLOCATION_FAILED;
|
||||
+ }
|
||||
+
|
||||
+ surface_object->source_index = slot->index;
|
||||
+ surface_object->source_data = slot->data;
|
||||
+ surface_object->source_size = slot->size;
|
||||
+ surface_object->slices_size = 0;
|
||||
+ surface_object->slices_count = 0;
|
||||
+
|
||||
surface_object->status = VASurfaceRendering;
|
||||
context_object->render_surface_id = surface_id;
|
||||
|
||||
--- a/src/surface.c 2026-05-01 20:09:57.346428828 +0000
|
||||
+++ b/src/surface.c 2026-05-01 20:19:35.490958060 +0000
|
||||
@@ -254,10 +254,11 @@
|
||||
if (surface_object == NULL)
|
||||
return VA_STATUS_ERROR_INVALID_SURFACE;
|
||||
|
||||
- if (surface_object->source_data != NULL &&
|
||||
- surface_object->source_size > 0)
|
||||
- munmap(surface_object->source_data,
|
||||
- surface_object->source_size);
|
||||
+ /*
|
||||
+ * source_* are now transient borrows from request_pool, not
|
||||
+ * surface-owned mappings; the pool owns the underlying mmap.
|
||||
+ * Nothing to free here.
|
||||
+ */
|
||||
|
||||
for (j = 0; j < surface_object->destination_buffers_count; j++)
|
||||
if (surface_object->destination_map[j] != NULL &&
|
||||
@@ -336,6 +337,15 @@
|
||||
goto error;
|
||||
}
|
||||
|
||||
+ /*
|
||||
+ * OUTPUT buffer is back from the kernel: return its pool slot
|
||||
+ * for reuse and clear the surface's transient borrow handle.
|
||||
+ */
|
||||
+ request_pool_release(&driver_data->output_pool,
|
||||
+ surface_object->source_index);
|
||||
+ surface_object->source_data = NULL;
|
||||
+ surface_object->source_size = 0;
|
||||
+
|
||||
rc = v4l2_dequeue_buffer(driver_data->video_fd, -1, capture_type,
|
||||
surface_object->destination_index,
|
||||
surface_object->destination_buffers_count);
|
||||
--- a/src/meson.build 2026-05-01 20:09:57.346428828 +0000
|
||||
+++ b/src/meson.build 2026-05-01 20:20:04.775389455 +0000
|
||||
@@ -44,6 +44,7 @@
|
||||
'v4l2.c',
|
||||
'mpeg2.c',
|
||||
'h264.c',
|
||||
+ 'request_pool.c',
|
||||
# 'h265.c'
|
||||
]
|
||||
|
||||
@@ -64,6 +65,7 @@
|
||||
'v4l2.h',
|
||||
'mpeg2.h',
|
||||
'h264.h',
|
||||
+ 'request_pool.h',
|
||||
# 'h265.h'
|
||||
]
|
||||
|
||||
--- a/src/request_pool.h 2025-09-03 18:38:22.431999998 +0000
|
||||
+++ b/src/request_pool.h 2026-05-01 20:17:37.517219722 +0000
|
||||
@@ -0,0 +1,84 @@
|
||||
+/*
|
||||
+ * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
|
||||
+ *
|
||||
+ * Permission is hereby granted, free of charge, to any person obtaining a
|
||||
+ * copy of this software and associated documentation files (the
|
||||
+ * "Software"), to deal in the Software without restriction.
|
||||
+ *
|
||||
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
|
||||
+ */
|
||||
+
|
||||
+#ifndef _REQUEST_POOL_H_
|
||||
+#define _REQUEST_POOL_H_
|
||||
+
|
||||
+#include <stdbool.h>
|
||||
+
|
||||
+/*
|
||||
+ * OUTPUT (bitstream-input) buffer pool, decoupled from caller-allocated
|
||||
+ * VA surfaces. Sizing is driven by codec pipeline depth (typically 4
|
||||
+ * for H.264), not by the consumer's surface count.
|
||||
+ *
|
||||
+ * The pool owns the V4L2 buffer indices and their mmap pointers. A
|
||||
+ * decode request "borrows" a slot at vaBeginPicture, fills it across
|
||||
+ * vaRenderPicture calls, queues it at vaEndPicture, and releases it
|
||||
+ * after VIDIOC_DQBUF returns.
|
||||
+ *
|
||||
+ * This replaces the per-surface OUTPUT-buffer ownership model in the
|
||||
+ * pre-refactor code, where object_surface.source_* fields permanently
|
||||
+ * held a single OUTPUT buffer per surface — incorrect because OUTPUT
|
||||
+ * buffers are request-time resources, not picture-time resources, and
|
||||
+ * because the per-surface loop in RequestCreateContext only ran when
|
||||
+ * surfaces_count > 0 (breaking ffmpeg's vaapi-copy num_render_targets=0
|
||||
+ * convention).
|
||||
+ */
|
||||
+
|
||||
+struct request_pool_slot {
|
||||
+ unsigned int index; /* V4L2 buffer index in OUTPUT queue */
|
||||
+ void *data; /* mmap pointer for this slot */
|
||||
+ unsigned int size; /* mmap size in bytes */
|
||||
+ bool busy; /* true while borrowed for a request */
|
||||
+};
|
||||
+
|
||||
+struct request_pool {
|
||||
+ struct request_pool_slot *slots;
|
||||
+ unsigned int count;
|
||||
+ unsigned int next; /* round-robin acquire cursor */
|
||||
+ bool initialized;
|
||||
+};
|
||||
+
|
||||
+/*
|
||||
+ * Allocate count OUTPUT buffers via VIDIOC_CREATE_BUFS, query and mmap
|
||||
+ * each, populate pool->slots[]. Caller must have already done
|
||||
+ * VIDIOC_S_FMT on the OUTPUT queue. Returns 0 on success, -1 on
|
||||
+ * failure.
|
||||
+ */
|
||||
+int request_pool_init(struct request_pool *pool, int video_fd,
|
||||
+ unsigned int output_type, unsigned int count);
|
||||
+
|
||||
+/*
|
||||
+ * Munmap all slots and free the slots array. Idempotent.
|
||||
+ */
|
||||
+void request_pool_destroy(struct request_pool *pool);
|
||||
+
|
||||
+/*
|
||||
+ * Claim the next free slot (round-robin). Returns the slot's V4L2
|
||||
+ * buffer index on success (slot in pool->slots[] is determined by
|
||||
+ * the returned index), or -1 if all slots are busy.
|
||||
+ */
|
||||
+int request_pool_acquire(struct request_pool *pool);
|
||||
+
|
||||
+/*
|
||||
+ * Mark the slot at pool->slots[i] free for reuse. Caller must pass the
|
||||
+ * V4L2 buffer index returned earlier from request_pool_acquire().
|
||||
+ */
|
||||
+void request_pool_release(struct request_pool *pool, unsigned int index);
|
||||
+
|
||||
+/*
|
||||
+ * Look up the pool slot owning a given V4L2 buffer index. Returns
|
||||
+ * pointer to the slot on success, NULL if the index is out of range.
|
||||
+ * The returned pointer is valid until pool destruction; do not free.
|
||||
+ */
|
||||
+struct request_pool_slot *request_pool_slot(struct request_pool *pool,
|
||||
+ unsigned int index);
|
||||
+
|
||||
+#endif
|
||||
--- a/src/request_pool.c 2025-09-03 18:38:22.431999998 +0000
|
||||
+++ b/src/request_pool.c 2026-05-01 20:17:37.537220017 +0000
|
||||
@@ -0,0 +1,147 @@
|
||||
+/*
|
||||
+ * Copyright (C) 2026 Markus Fritsche <fritsche.markus@gmail.com>
|
||||
+ *
|
||||
+ * Permission is hereby granted, free of charge, to any person obtaining a
|
||||
+ * copy of this software and associated documentation files (the
|
||||
+ * "Software"), to deal in the Software without restriction.
|
||||
+ *
|
||||
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
|
||||
+ */
|
||||
+
|
||||
+#include "request_pool.h"
|
||||
+
|
||||
+#include <errno.h>
|
||||
+#include <stdlib.h>
|
||||
+#include <string.h>
|
||||
+#include <sys/mman.h>
|
||||
+
|
||||
+#include "utils.h"
|
||||
+#include "v4l2.h"
|
||||
+
|
||||
+int request_pool_init(struct request_pool *pool, int video_fd,
|
||||
+ unsigned int output_type, unsigned int count)
|
||||
+{
|
||||
+ unsigned int index_base;
|
||||
+ unsigned int length;
|
||||
+ unsigned int offset;
|
||||
+ unsigned int i;
|
||||
+ int rc;
|
||||
+
|
||||
+ if (pool == NULL || count == 0)
|
||||
+ return -1;
|
||||
+
|
||||
+ if (pool->initialized)
|
||||
+ return 0;
|
||||
+
|
||||
+ pool->slots = calloc(count, sizeof(*pool->slots));
|
||||
+ if (pool->slots == NULL)
|
||||
+ return -1;
|
||||
+
|
||||
+ pool->count = count;
|
||||
+ pool->next = 0;
|
||||
+
|
||||
+ rc = v4l2_create_buffers(video_fd, output_type, count, &index_base);
|
||||
+ if (rc < 0)
|
||||
+ goto error;
|
||||
+
|
||||
+ for (i = 0; i < count; i++) {
|
||||
+ pool->slots[i].index = index_base + i;
|
||||
+ pool->slots[i].busy = false;
|
||||
+
|
||||
+ rc = v4l2_query_buffer(video_fd, output_type,
|
||||
+ pool->slots[i].index,
|
||||
+ &length, &offset, 1);
|
||||
+ if (rc < 0)
|
||||
+ goto error;
|
||||
+
|
||||
+ pool->slots[i].data = mmap(NULL, length,
|
||||
+ PROT_READ | PROT_WRITE,
|
||||
+ MAP_SHARED, video_fd, offset);
|
||||
+ if (pool->slots[i].data == MAP_FAILED) {
|
||||
+ pool->slots[i].data = NULL;
|
||||
+ goto error;
|
||||
+ }
|
||||
+
|
||||
+ pool->slots[i].size = length;
|
||||
+ }
|
||||
+
|
||||
+ pool->initialized = true;
|
||||
+ return 0;
|
||||
+
|
||||
+error:
|
||||
+ request_pool_destroy(pool);
|
||||
+ return -1;
|
||||
+}
|
||||
+
|
||||
+void request_pool_destroy(struct request_pool *pool)
|
||||
+{
|
||||
+ unsigned int i;
|
||||
+
|
||||
+ if (pool == NULL || pool->slots == NULL)
|
||||
+ return;
|
||||
+
|
||||
+ for (i = 0; i < pool->count; i++) {
|
||||
+ if (pool->slots[i].data != NULL && pool->slots[i].size > 0)
|
||||
+ munmap(pool->slots[i].data, pool->slots[i].size);
|
||||
+ }
|
||||
+
|
||||
+ free(pool->slots);
|
||||
+ pool->slots = NULL;
|
||||
+ pool->count = 0;
|
||||
+ pool->next = 0;
|
||||
+ pool->initialized = false;
|
||||
+}
|
||||
+
|
||||
+int request_pool_acquire(struct request_pool *pool)
|
||||
+{
|
||||
+ unsigned int start;
|
||||
+ unsigned int i;
|
||||
+
|
||||
+ if (pool == NULL || !pool->initialized || pool->count == 0)
|
||||
+ return -1;
|
||||
+
|
||||
+ start = pool->next;
|
||||
+ for (i = 0; i < pool->count; i++) {
|
||||
+ unsigned int slot = (start + i) % pool->count;
|
||||
+
|
||||
+ if (!pool->slots[slot].busy) {
|
||||
+ pool->slots[slot].busy = true;
|
||||
+ pool->next = (slot + 1) % pool->count;
|
||||
+ return (int)pool->slots[slot].index;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ /* All slots busy; caller must wait for an in-flight DQBUF. */
|
||||
+ return -1;
|
||||
+}
|
||||
+
|
||||
+void request_pool_release(struct request_pool *pool, unsigned int index)
|
||||
+{
|
||||
+ unsigned int i;
|
||||
+
|
||||
+ if (pool == NULL || pool->slots == NULL)
|
||||
+ return;
|
||||
+
|
||||
+ for (i = 0; i < pool->count; i++) {
|
||||
+ if (pool->slots[i].index == index) {
|
||||
+ pool->slots[i].busy = false;
|
||||
+ return;
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+struct request_pool_slot *request_pool_slot(struct request_pool *pool,
|
||||
+ unsigned int index)
|
||||
+{
|
||||
+ unsigned int i;
|
||||
+
|
||||
+ if (pool == NULL || pool->slots == NULL)
|
||||
+ return NULL;
|
||||
+
|
||||
+ for (i = 0; i < pool->count; i++) {
|
||||
+ if (pool->slots[i].index == index)
|
||||
+ return &pool->slots[i];
|
||||
+ }
|
||||
+
|
||||
+ return NULL;
|
||||
+}
|
||||
@@ -0,0 +1,61 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] h264: submit PRED_WEIGHTS only when WEIGHTED_PRED applies
|
||||
|
||||
Per kernel UAPI (include/uapi/linux/v4l2-controls.h),
|
||||
V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control:
|
||||
|
||||
V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) :=
|
||||
((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
|
||||
(slice_type == P || slice_type == SP)) ||
|
||||
(pps->weighted_bipred_idc == 1 && slice_type == B)
|
||||
|
||||
Submitting PRED_WEIGHTS on a frame where the macro evaluates false
|
||||
triggers VIDIOC_S_EXT_CTRLS to return EINVAL at error_idx=5 (the
|
||||
6th, last control in the per-request batch) on hantro-vpu and any
|
||||
other driver that strictly enforces the spec.
|
||||
|
||||
Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no
|
||||
weighted prediction): every per-frame batch fails identically, 13
|
||||
EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy
|
||||
falls back to software decode for every frame.
|
||||
|
||||
Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5)
|
||||
when the macro returns false; keep at 6 when it returns true.
|
||||
|
||||
Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not
|
||||
part of Sonnet's six-commit upstreamable plan; slotted in as patch
|
||||
0005 ahead of the planned probe-then-set / FRAME_BASED commits
|
||||
because it unblocks per-frame submission on every backing driver,
|
||||
not just hantro.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c 2026-05-01 20:17:02.108697824 +0000
|
||||
+++ b/src/h264.c 2026-05-01 20:30:02.632190563 +0000
|
||||
@@ -559,8 +559,24 @@
|
||||
}
|
||||
};
|
||||
|
||||
+ /*
|
||||
+ * PRED_WEIGHTS is conditionally required per kernel UAPI:
|
||||
+ * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
|
||||
+ * true when explicit weighted prediction applies (P/SP slice
|
||||
+ * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
|
||||
+ * == 1). Submitting it unconditionally on a frame that does
|
||||
+ * not need it triggers EINVAL at error_idx=5 on hantro and
|
||||
+ * other drivers that strictly enforce the spec.
|
||||
+ *
|
||||
+ * controls[5] is PRED_WEIGHTS (last in array); narrow the
|
||||
+ * submission count to exclude it when not required.
|
||||
+ */
|
||||
+ unsigned int num_controls = 6;
|
||||
+ if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
|
||||
+ num_controls = 5;
|
||||
+
|
||||
rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
|
||||
- controls, 6);
|
||||
+ controls, num_controls);
|
||||
if (rc < 0)
|
||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
|
||||
@@ -0,0 +1,128 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] h264: omit per-slice controls in FRAME_BASED mode
|
||||
|
||||
Identified by cross-reference against GStreamer's
|
||||
gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit
|
||||
9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and
|
||||
PRED_WEIGHTS submission on is_slice_based(self):
|
||||
|
||||
if (is_slice_based (self)) {
|
||||
control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
|
||||
...
|
||||
control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
|
||||
...
|
||||
}
|
||||
|
||||
In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the
|
||||
bitstream itself from the OUTPUT-queue payload; per-slice controls in
|
||||
the request trigger cluster-validation EINVAL at error_idx=count
|
||||
(observed on RK3568 hantro-vpu, kernel 6.19.10).
|
||||
|
||||
This patch:
|
||||
- Reorders controls[] so FRAME_BASED-required entries come first
|
||||
(SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the
|
||||
SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at
|
||||
indices 4..5).
|
||||
- Defaults num_controls=4 (FRAME_BASED), expanding to 5 for
|
||||
SLICE_BASED and 6 when V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
|
||||
- Hardcodes slice_based=false for now since patch 0002 sets the
|
||||
device to FRAME_BASED unconditionally. A TODO marks the spot for
|
||||
the planned probe-then-set commit, which will populate
|
||||
context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/
|
||||
G_EXT_CTRLS and replace the hardcoded false with a runtime check.
|
||||
|
||||
Diagnosis chain:
|
||||
- patch 0005 reduced one EINVAL per frame on PRED_WEIGHTS
|
||||
submission, but cluster-level rejection persisted at error_idx=5
|
||||
(count) — meaning kernel walked all 5 controls cleanly but
|
||||
rejected the request as a whole.
|
||||
- dmesg silent → rejection in V4L2 core (v4l2-ctrls-request.c /
|
||||
v4l2-h264.c), not in hantro driver where it could log.
|
||||
- GStreamer reference confirmed FRAME_BASED contract: only 4
|
||||
sequence-and-frame-level controls go in the per-request batch.
|
||||
|
||||
After this patch the kernel should accept the per-request controls
|
||||
and actually decode the bitstream into the CAPTURE buffer.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c 2026-05-01 20:30:02.632190563 +0000
|
||||
+++ b/src/h264.c 2026-05-01 20:49:46.937497317 +0000
|
||||
@@ -531,6 +531,21 @@
|
||||
|
||||
sps.profile_idc = h264_profile_to_idc(profile);
|
||||
|
||||
+ /*
|
||||
+ * Per-request control batch, ordered so the controls REQUIRED in
|
||||
+ * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
|
||||
+ * (indices 0..3) and the SLICE_BASED-only controls come last
|
||||
+ * (indices 4..5).
|
||||
+ *
|
||||
+ * Cross-reference: GStreamer gst-plugins-bad
|
||||
+ * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
|
||||
+ * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
|
||||
+ * is_slice_based(self); under FRAME_BASED only SPS/PPS/
|
||||
+ * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
|
||||
+ * parses the bitstream itself in FRAME_BASED mode; submitting
|
||||
+ * per-slice controls in that mode triggers cluster-validation
|
||||
+ * EINVAL at error_idx=count.
|
||||
+ */
|
||||
struct v4l2_ext_control controls[6] = {
|
||||
{
|
||||
.id = V4L2_CID_STATELESS_H264_SPS,
|
||||
@@ -545,14 +560,14 @@
|
||||
.p_h264_scaling_matrix = &matrix,
|
||||
.size = sizeof(matrix),
|
||||
}, {
|
||||
- .id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
|
||||
- .p_h264_slice_params = &slice,
|
||||
- .size = sizeof(slice),
|
||||
- }, {
|
||||
.id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
|
||||
.p_h264_decode_params = &decode,
|
||||
.size = sizeof(decode),
|
||||
}, {
|
||||
+ .id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
|
||||
+ .p_h264_slice_params = &slice,
|
||||
+ .size = sizeof(slice),
|
||||
+ }, {
|
||||
.id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
|
||||
.ptr = &weights,
|
||||
.size = sizeof(weights),
|
||||
@@ -560,20 +575,24 @@
|
||||
};
|
||||
|
||||
/*
|
||||
- * PRED_WEIGHTS is conditionally required per kernel UAPI:
|
||||
- * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) is only
|
||||
- * true when explicit weighted prediction applies (P/SP slice
|
||||
- * with WEIGHTED_PRED flag, or B slice with weighted_bipred_idc
|
||||
- * == 1). Submitting it unconditionally on a frame that does
|
||||
- * not need it triggers EINVAL at error_idx=5 on hantro and
|
||||
- * other drivers that strictly enforce the spec.
|
||||
+ * Decode-mode dispatch. Patch 0002 unconditionally sets the
|
||||
+ * device to FRAME_BASED, so we hardcode that here. When the
|
||||
+ * planned probe-then-set commit lands, slice_based becomes
|
||||
+ * context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
|
||||
+ * with context->decode_mode populated at CreateContext via
|
||||
+ * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
|
||||
*
|
||||
- * controls[5] is PRED_WEIGHTS (last in array); narrow the
|
||||
- * submission count to exclude it when not required.
|
||||
+ * FRAME_BASED: 4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
|
||||
+ * SLICE_BASED: +SLICE_PARAMS (always), +PRED_WEIGHTS (when
|
||||
+ * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
|
||||
*/
|
||||
- unsigned int num_controls = 6;
|
||||
- if (!V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
|
||||
+ const bool slice_based = false; /* TODO: probe via context->decode_mode */
|
||||
+ unsigned int num_controls = 4;
|
||||
+ if (slice_based) {
|
||||
num_controls = 5;
|
||||
+ if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
|
||||
+ num_controls = 6;
|
||||
+ }
|
||||
|
||||
rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
|
||||
controls, num_controls);
|
||||
@@ -0,0 +1,59 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] context: enable ANNEX_B start-code emission to match device
|
||||
|
||||
Patch 0002 sets V4L2_CID_STATELESS_H264_START_CODE to ANNEX_B on the
|
||||
device, telling the kernel that OUTPUT-buffer payloads will contain
|
||||
0x00 0x00 0x01 NAL start codes. picture.c::codec_store_buffer has
|
||||
the prepend logic guarded by `if (context->h264_start_code)`, but
|
||||
that boolean is set ONLY inside h264_get_controls() — a function
|
||||
that exists but is never called.
|
||||
|
||||
Result: device expects ANNEX_B, libva-v4l2-request feeds raw NAL
|
||||
payloads with no start codes, kernel cannot find slice boundaries,
|
||||
hantro emits a zeroed CAPTURE buffer. mpv reports successful decode
|
||||
because the V4L2 round-trip succeeds (no EINVAL); the visual output
|
||||
is a flat dark-green frame (NV12 zero through BT.709).
|
||||
|
||||
Identified via:
|
||||
- Patch 0006 cleared the EINVAL cluster-rejection (128 → 0 on
|
||||
bbb_1080p30) but visual output remained flat green.
|
||||
- GStreamer reference (gstv4l2codech264dec.c:1363-1377) confirms
|
||||
start codes are required when ANNEX_B is selected.
|
||||
- Source-archaeology of fourier's picture.c:67-74 showed the gate
|
||||
on context->h264_start_code.
|
||||
|
||||
Fix: in context.c::RequestCreateContext, immediately after patch
|
||||
0002's device-control block, set context_object->h264_start_code =
|
||||
true to match the ANNEX_B mode we just programmed. Hardcoded for
|
||||
now (matches 0002's hardcoded set); replaced with a runtime probe
|
||||
in the planned probe-then-set commit.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/context.c 2026-05-01 20:48:59.884816330 +0000
|
||||
+++ b/src/context.c 2026-05-01 20:59:54.446340219 +0000
|
||||
@@ -146,6 +146,23 @@
|
||||
dev_ctrls, 2);
|
||||
}
|
||||
|
||||
+ /*
|
||||
+ * Mirror the ANNEX_B start-code mode set on the device above
|
||||
+ * into context_object->h264_start_code so picture.c::
|
||||
+ * codec_store_buffer prepends 0x00 0x00 0x01 to each slice
|
||||
+ * payload it copies into the OUTPUT buffer. Without this, the
|
||||
+ * kernel — which we just told to expect ANNEX_B — sees a raw
|
||||
+ * NAL stream with no start codes, fails to find slice
|
||||
+ * boundaries, and emits a zeroed CAPTURE buffer (visually a
|
||||
+ * flat dark-green frame).
|
||||
+ *
|
||||
+ * h264_get_controls() exists for this purpose but is never
|
||||
+ * called in the current code path; the planned probe-then-set
|
||||
+ * commit will replace this hardcoded assignment with a runtime
|
||||
+ * read of the kernel's accepted START_CODE value.
|
||||
+ */
|
||||
+ context_object->h264_start_code = true;
|
||||
+
|
||||
rc = v4l2_set_stream(driver_data->video_fd, output_type, true);
|
||||
if (rc < 0) {
|
||||
status = VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
@@ -0,0 +1,87 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] h264: fill DECODE_PARAMS frame_num + field flags from VAAPI
|
||||
|
||||
Fourier's h264_va_picture_to_v4l2 only populated four fields of the
|
||||
struct v4l2_ctrl_h264_decode_params: dpb (via h264_fill_dpb),
|
||||
nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt, and the
|
||||
IDR_PIC flag. Many other required-by-spec fields were left at zero-
|
||||
init (frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
|
||||
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
|
||||
slice_group_change_cycle, FIELD_PIC and BOTTOM_FIELD flags).
|
||||
|
||||
For an IDR (first frame) on hantro-vpu RK3568, the kernel parses
|
||||
the bitstream from the OUTPUT buffer and uses these fields to drive
|
||||
its bitstream-element offset tracking. Empirically the kernel
|
||||
returned a successfully-decoded but ZEROED CAPTURE buffer — flat
|
||||
dark-green frames in mpv output, no errors logged.
|
||||
|
||||
This patch fills every field VAAPI exposes:
|
||||
|
||||
- frame_num: from VAPicture->frame_num.
|
||||
- FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag.
|
||||
- BOTTOM_FIELD flag: from
|
||||
VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD.
|
||||
|
||||
Also corrects the IDR_PIC flag to use |= instead of = so the new
|
||||
field flags don't clobber it.
|
||||
|
||||
Fields NOT derivable from VAAPI's pre-parsed structures —
|
||||
idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
|
||||
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
|
||||
slice_group_change_cycle — require a slice_header() bit-level
|
||||
parse. libva-v4l2-request does not currently do this. They remain
|
||||
at zero-init.
|
||||
|
||||
Empirical question this patch answers: does hantro tolerate the
|
||||
bit_size fields being zero for IDR frames, or does it strictly
|
||||
require them? If post-patch CAPTURE is still zeroed, a slice-header
|
||||
parser is required. If CAPTURE shows real picture data, hantro
|
||||
fills in the bit-positions itself when no hint is supplied.
|
||||
|
||||
Cross-reference: gstv4l2codech264dec.c::
|
||||
gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775,
|
||||
lines 632-678).
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c 2026-05-01 20:59:41.710154198 +0000
|
||||
+++ b/src/h264.c 2026-05-01 21:16:35.712995986 +0000
|
||||
@@ -243,13 +243,34 @@
|
||||
|
||||
h264_fill_dpb(driver_data, context, decode);
|
||||
|
||||
- //decode->num_slices = surface->slices_count;
|
||||
+ /*
|
||||
+ * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
|
||||
+ * we can derive from VAAPI's pre-parsed VAPictureParameterBuffer
|
||||
+ * + bitstream byte. Cross-reference: GStreamer
|
||||
+ * gstv4l2codech264dec.c::gst_v4l2_codec_h264_dec_fill_decoder_params
|
||||
+ * (lines 632-678).
|
||||
+ *
|
||||
+ * Fields not derivable from VAAPI (idr_pic_id, pic_order_cnt_lsb,
|
||||
+ * delta_pic_order_cnt_*, dec_ref_pic_marking_bit_size,
|
||||
+ * pic_order_cnt_bit_size, slice_group_change_cycle) require a
|
||||
+ * full slice_header() bit-level parse, which libva-v4l2-request
|
||||
+ * does not currently do. They are left at zero-init and the
|
||||
+ * kernel-side hantro-vpu may compute them itself when scanning
|
||||
+ * the OUTPUT bitstream — a hypothesis verified empirically by
|
||||
+ * running this patch and inspecting the CAPTURE buffer.
|
||||
+ */
|
||||
decode->nal_ref_idc = nal_ref_idc;
|
||||
- if (nal_unit_type == 5)
|
||||
- decode->flags = V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
|
||||
+ decode->frame_num = VAPicture->frame_num;
|
||||
decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
|
||||
decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
|
||||
|
||||
+ if (nal_unit_type == 5)
|
||||
+ decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
|
||||
+ if (VAPicture->pic_fields.bits.field_pic_flag)
|
||||
+ decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC;
|
||||
+ if (VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD)
|
||||
+ decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD;
|
||||
+
|
||||
pps->weighted_bipred_idc =
|
||||
VAPicture->pic_fields.bits.weighted_bipred_idc;
|
||||
pps->pic_init_qs_minus26 = VAPicture->pic_init_qs_minus26;
|
||||
@@ -0,0 +1,58 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] surface: don't VIDIOC_S_FMT the CAPTURE queue
|
||||
|
||||
The hantro stateless decoder derives the CAPTURE format from the
|
||||
SPS attached to the per-request OUTPUT controls. Calling
|
||||
VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can
|
||||
leave the driver's vb2 state in an inconsistent configuration
|
||||
where the queue accepts buffers and DQBUF returns successfully but
|
||||
the kernel never actually writes decoded pixels into them.
|
||||
|
||||
Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/
|
||||
gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side
|
||||
(via gst_v4l2_decoder_negotiate_src_format and friends). The
|
||||
same code path produces correctly-decoded NV12 frames on the
|
||||
same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT
|
||||
emits flat-green zeroed CAPTURE buffers.
|
||||
|
||||
The v4l2_get_format() call immediately after this block already
|
||||
gives us the bytesperline / sizes the driver chose; nothing else
|
||||
in this file consumed the explicit S_FMT side-effects.
|
||||
|
||||
Empirical hypothesis test for the lingering "kernel decodes
|
||||
without errors but emits zeroed CAPTURE" bug. If post-patch
|
||||
output shows actual picture content, this confirms the
|
||||
diagnosis: explicit CAPTURE format mutation breaks hantro's
|
||||
internal state. If output remains flat-green, the bug is
|
||||
elsewhere and we resume hex-dump-grade instrumentation.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/surface.c 2026-05-01 21:16:19.588759711 +0000
|
||||
+++ b/src/surface.c 2026-05-01 21:41:12.095146549 +0000
|
||||
@@ -118,10 +118,20 @@
|
||||
|
||||
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
|
||||
|
||||
- rc = v4l2_set_format(driver_data->video_fd, capture_type,
|
||||
- video_format->v4l2_format, width, height);
|
||||
- if (rc < 0)
|
||||
- return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
+ /*
|
||||
+ * Do not VIDIOC_S_FMT on the CAPTURE queue. The hantro
|
||||
+ * stateless decoder derives the CAPTURE format from the
|
||||
+ * SPS attached to the OUTPUT request; explicitly setting
|
||||
+ * it here can put the driver into an inconsistent state.
|
||||
+ * GStreamer's v4l2slh264dec only G_FMTs CAPTURE (see
|
||||
+ * gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c::
|
||||
+ * gst_v4l2_decoder_negotiate_src_format), and that
|
||||
+ * variant produces correct decoded NV12 on the same
|
||||
+ * hardware where this driver currently emits zeros.
|
||||
+ *
|
||||
+ * v4l2_get_format() below queries the driver's current
|
||||
+ * state and gives us the bytesperline/sizes we need.
|
||||
+ */
|
||||
} else {
|
||||
video_format = driver_data->video_format;
|
||||
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);
|
||||
@@ -0,0 +1,101 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame
|
||||
|
||||
Diagnostic-only patch (NOT for upstream). Hex-dumps:
|
||||
- First 32 bytes of OUTPUT buffer at QBUF time in
|
||||
picture.c::RequestEndPicture (i.e. what we feed the kernel)
|
||||
- First 32 bytes of CAPTURE Y-plane after DQBUF in
|
||||
surface.c::RequestSyncSurface (i.e. what kernel returned)
|
||||
|
||||
Lets us see whether:
|
||||
- OUTPUT bitstream begins with valid ANNEX_B start code + NAL
|
||||
header byte (e.g. `00 00 01 65` for IDR slice)
|
||||
- CAPTURE Y-plane after decode contains varied luma data
|
||||
(working) vs. all-zeros / repeating pattern (kernel didn't
|
||||
write anything).
|
||||
|
||||
Removed once Step 1 decode is verified working. Output goes via
|
||||
existing request_log() to stderr.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/picture.c 2026-05-01 21:41:00.114969150 +0000
|
||||
+++ b/src/picture.c 2026-05-01 21:50:11.123117853 +0000
|
||||
@@ -36,6 +36,7 @@
|
||||
#include "mpeg2.h"
|
||||
|
||||
#include <assert.h>
|
||||
+#include <stdio.h>
|
||||
#include <string.h>
|
||||
|
||||
#include <errno.h>
|
||||
@@ -354,6 +355,27 @@
|
||||
if (rc < 0)
|
||||
return VA_STATUS_ERROR_OPERATION_FAILED;
|
||||
|
||||
+ /*
|
||||
+ * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
|
||||
+ * OUTPUT buffer at the moment we hand it to the kernel. Helps
|
||||
+ * pin down whether our bitstream prepend logic is correct.
|
||||
+ * For a valid ANNEX_B IDR slice the dump should start
|
||||
+ * 00 00 01 65 ... (00 00 01 = start code; 0x65 = nal_ref_idc=3,
|
||||
+ * nal_unit_type=5 = IDR slice). Removed once Step 1 decode is
|
||||
+ * verified working.
|
||||
+ */
|
||||
+ {
|
||||
+ const unsigned char *p = surface_object->source_data;
|
||||
+ char hex[32 * 3 + 1] = { 0 };
|
||||
+ unsigned int i, n = surface_object->slices_size < 32 ?
|
||||
+ surface_object->slices_size : 32;
|
||||
+ for (i = 0; i < n; i++)
|
||||
+ snprintf(hex + i * 3, 4, " %02x", p[i]);
|
||||
+ request_log("OUTPUT[idx=%u, len=%u]:%s\n",
|
||||
+ surface_object->source_index,
|
||||
+ surface_object->slices_size, hex);
|
||||
+ }
|
||||
+
|
||||
rc = v4l2_queue_buffer(driver_data->video_fd, request_fd, output_type,
|
||||
&surface_object->timestamp,
|
||||
surface_object->source_index,
|
||||
--- a/src/surface.c 2026-05-01 21:41:12.095146549 +0000
|
||||
+++ b/src/surface.c 2026-05-01 21:50:15.895188360 +0000
|
||||
@@ -29,6 +29,7 @@
|
||||
|
||||
#include <assert.h>
|
||||
#include <errno.h>
|
||||
+#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <unistd.h>
|
||||
@@ -364,6 +365,30 @@
|
||||
goto error;
|
||||
}
|
||||
|
||||
+ /*
|
||||
+ * DEBUG INSTRUMENTATION (0010): hex-dump first 32 bytes of the
|
||||
+ * decoded CAPTURE Y-plane after DQBUF. If the kernel actually
|
||||
+ * decoded the frame, these should reflect a real Y-luma pattern
|
||||
+ * (varied bytes). All-zero or all-identical means no decode
|
||||
+ * landed pixels in the buffer. Removed once Step 1 is verified.
|
||||
+ */
|
||||
+ {
|
||||
+ const unsigned char *p =
|
||||
+ (unsigned char *)surface_object->destination_map[0];
|
||||
+ char hex[32 * 3 + 1] = { 0 };
|
||||
+ unsigned int i;
|
||||
+ if (p == NULL) {
|
||||
+ request_log("CAPTURE[idx=%u, plane0]: (NULL)\n",
|
||||
+ surface_object->destination_index);
|
||||
+ } else {
|
||||
+ for (i = 0; i < 32; i++)
|
||||
+ snprintf(hex + i * 3, 4, " %02x", p[i]);
|
||||
+ request_log("CAPTURE[idx=%u, plane0]:%s\n",
|
||||
+ surface_object->destination_index,
|
||||
+ hex);
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
surface_object->status = VASurfaceDisplaying;
|
||||
|
||||
status = VA_STATUS_SUCCESS;
|
||||
@@ -0,0 +1,53 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] DEBUG: sentinel-pattern test for CAPTURE buffer write
|
||||
|
||||
Diagnostic-only. Writes 0xab×32 into the CAPTURE buffer's first 32
|
||||
bytes immediately before VIDIOC_QBUF. The 0010 hex-dump after
|
||||
DQBUF reveals which case we're in:
|
||||
|
||||
- All 0xab → kernel never wrote to this buffer (wrong buffer
|
||||
chosen, alias, or no decode actually happened despite
|
||||
bytesused=3655712 reported).
|
||||
- All zeros → kernel did write 0x00s (overwriting our
|
||||
sentinel), and the apparent "no picture" output is the
|
||||
kernel-side decode actually producing zeros (e.g. parser
|
||||
rejected the bitstream).
|
||||
- Mix of zeros and real luma values → kernel wrote real
|
||||
decoded pixels; CPU read sees stale-cached zeros somewhere
|
||||
OR the sentinel area was a header that decoder zeroed but
|
||||
rest is real. Need to check more bytes.
|
||||
- All 0xab still → kernel never touched this region but other
|
||||
parts of buffer may be filled (incomplete decode).
|
||||
|
||||
Removed once Step 1 decode is verified.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/picture.c 2026-05-01 21:50:11.123117853 +0000
|
||||
+++ b/src/picture.c 2026-05-01 22:20:20.589037667 +0000
|
||||
@@ -349,6 +349,24 @@
|
||||
if (rc != VA_STATUS_SUCCESS)
|
||||
return rc;
|
||||
|
||||
+ /*
|
||||
+ * DEBUG INSTRUMENTATION (0011): write a sentinel pattern into
|
||||
+ * the CAPTURE buffer's first 32 bytes BEFORE QBUF. If after
|
||||
+ * DQBUF the sentinel survives (per surface.c hex dump), the
|
||||
+ * kernel never wrote to this buffer. If the sentinel is gone
|
||||
+ * (replaced by zeros), the kernel did write but our CPU read
|
||||
+ * sees stale-cached data — cache-coherency issue.
|
||||
+ */
|
||||
+ {
|
||||
+ unsigned char *p = (unsigned char *)
|
||||
+ surface_object->destination_map[0];
|
||||
+ if (p != NULL) {
|
||||
+ unsigned int i;
|
||||
+ for (i = 0; i < 32; i++)
|
||||
+ p[i] = 0xab;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
rc = v4l2_queue_buffer(driver_data->video_fd, -1, capture_type, NULL,
|
||||
surface_object->destination_index, 0,
|
||||
surface_object->destination_buffers_count);
|
||||
@@ -0,0 +1,214 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-02
|
||||
Subject: [PATCH] h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
|
||||
|
||||
VAAPI signals "explicit scaling lists are present in the bitstream"
|
||||
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
|
||||
VAIQMatrixBufferH264 alongside RenderPicture iff
|
||||
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
|
||||
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
|
||||
arrives and the in-tree h264.matrix struct stays zero-initialised.
|
||||
|
||||
fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
|
||||
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
|
||||
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
|
||||
so set_controls always submitted the scaling matrix, including the
|
||||
zero-initialised case.
|
||||
|
||||
Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
|
||||
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
|
||||
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
|
||||
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
|
||||
bytesused=3655712 reported).
|
||||
|
||||
Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
|
||||
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
|
||||
lists) but the wrong predicate: the kernel-side gating is by
|
||||
"matrix-supplied vs. not," not by decode mode. Streams that signal
|
||||
explicit scaling lists must submit SCALING_MATRIX in either mode.
|
||||
|
||||
Contract verification (audit_0008_decode_params_2026-05-01.md +
|
||||
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
|
||||
matrix when SCALING_MATRIX is in the control batch and falls back
|
||||
to spec-defined defaults when absent. Mode-independent.
|
||||
|
||||
This patch:
|
||||
- surface.h: adds bool matrix_set to params.h264, mirroring
|
||||
mpeg2.iqmatrix_set / h265.iqmatrix_set.
|
||||
- picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
|
||||
sets matrix_set = true when the buffer arrives.
|
||||
- picture.c RequestBeginPicture: resets matrix_set = false at the
|
||||
start of each Begin/Render/End cycle.
|
||||
- h264.c h264_set_controls: builds the controls[] array
|
||||
incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
|
||||
matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
|
||||
when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
|
||||
|
||||
The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
|
||||
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
|
||||
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
|
||||
not be set."
|
||||
|
||||
Cross-reference: kernel UAPI section
|
||||
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
|
||||
(matrix supplied iff explicit scaling lists in bitstream) and
|
||||
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
|
||||
falls back to defaults).
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/surface.h
|
||||
+++ b/src/surface.h
|
||||
@@ -73,6 +73,7 @@
|
||||
} mpeg2;
|
||||
struct {
|
||||
VAIQMatrixBufferH264 matrix;
|
||||
+ bool matrix_set;
|
||||
VAPictureParameterBufferH264 picture;
|
||||
VASliceParameterBufferH264 slice;
|
||||
} h264;
|
||||
--- a/src/picture.c
|
||||
+++ b/src/picture.c
|
||||
@@ -153,6 +153,7 @@
|
||||
memcpy(&surface_object->params.h264.matrix,
|
||||
buffer_object->data,
|
||||
sizeof(surface_object->params.h264.matrix));
|
||||
+ surface_object->params.h264.matrix_set = true;
|
||||
break;
|
||||
|
||||
case VAProfileHEVCMain:
|
||||
@@ -255,6 +256,7 @@
|
||||
surface_object->source_size = slot->size;
|
||||
surface_object->slices_size = 0;
|
||||
surface_object->slices_count = 0;
|
||||
+ surface_object->params.h264.matrix_set = false;
|
||||
|
||||
surface_object->status = VASurfaceRendering;
|
||||
context_object->render_surface_id = surface_id;
|
||||
--- a/src/h264.c
|
||||
+++ b/src/h264.c
|
||||
@@ -553,66 +553,68 @@
|
||||
sps.profile_idc = h264_profile_to_idc(profile);
|
||||
|
||||
/*
|
||||
- * Per-request control batch, ordered so the controls REQUIRED in
|
||||
- * V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED come first
|
||||
- * (indices 0..3) and the SLICE_BASED-only controls come last
|
||||
- * (indices 4..5).
|
||||
+ * Build the per-request control list incrementally:
|
||||
+ * - SPS, PPS, DECODE_PARAMS: always required (in either decode
|
||||
+ * mode).
|
||||
+ * - SCALING_MATRIX: gated on surface->params.h264.matrix_set,
|
||||
+ * i.e. the consumer sent a VAIQMatrixBufferH264 this frame.
|
||||
+ * This matches the H.264 spec: explicit scaling lists are
|
||||
+ * present iff sps_scaling_matrix_present_flag ||
|
||||
+ * pps_scaling_matrix_present_flag, in which case VAAPI
|
||||
+ * consumers send the matrix; otherwise the kernel uses
|
||||
+ * spec-defined defaults. Independent of FRAME_BASED /
|
||||
+ * SLICE_BASED.
|
||||
+ * - SLICE_PARAMS: SLICE_BASED only. Kernel doc
|
||||
+ * ext-ctrls-codec-stateless.rst (FRAME_BASED entry):
|
||||
+ * "When this mode is selected, the
|
||||
+ * V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall not be
|
||||
+ * set." Submitting it under FRAME_BASED triggers cluster-
|
||||
+ * validation EINVAL at error_idx=count.
|
||||
+ * - PRED_WEIGHTS: SLICE_BASED + V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
|
||||
*
|
||||
- * Cross-reference: GStreamer gst-plugins-bad
|
||||
- * sys/v4l2codecs/gstv4l2codech264dec.c (commit 9e3e775,
|
||||
- * lines 1263-1304) gates SLICE_PARAMS and PRED_WEIGHTS on
|
||||
- * is_slice_based(self); under FRAME_BASED only SPS/PPS/
|
||||
- * SCALING_MATRIX/DECODE_PARAMS are submitted. The kernel
|
||||
- * parses the bitstream itself in FRAME_BASED mode; submitting
|
||||
- * per-slice controls in that mode triggers cluster-validation
|
||||
- * EINVAL at error_idx=count.
|
||||
- */
|
||||
- struct v4l2_ext_control controls[6] = {
|
||||
- {
|
||||
- .id = V4L2_CID_STATELESS_H264_SPS,
|
||||
- .p_h264_sps = &sps,
|
||||
- .size = sizeof(sps),
|
||||
- }, {
|
||||
- .id = V4L2_CID_STATELESS_H264_PPS,
|
||||
- .p_h264_pps = &pps,
|
||||
- .size = sizeof(pps),
|
||||
- }, {
|
||||
- .id = V4L2_CID_STATELESS_H264_SCALING_MATRIX,
|
||||
- .p_h264_scaling_matrix = &matrix,
|
||||
- .size = sizeof(matrix),
|
||||
- }, {
|
||||
- .id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
|
||||
- .p_h264_decode_params = &decode,
|
||||
- .size = sizeof(decode),
|
||||
- }, {
|
||||
- .id = V4L2_CID_STATELESS_H264_SLICE_PARAMS,
|
||||
- .p_h264_slice_params = &slice,
|
||||
- .size = sizeof(slice),
|
||||
- }, {
|
||||
- .id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS,
|
||||
- .ptr = &weights,
|
||||
- .size = sizeof(weights),
|
||||
- }
|
||||
- };
|
||||
-
|
||||
- /*
|
||||
- * Decode-mode dispatch. Patch 0002 unconditionally sets the
|
||||
- * device to FRAME_BASED, so we hardcode that here. When the
|
||||
- * planned probe-then-set commit lands, slice_based becomes
|
||||
- * context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED
|
||||
- * with context->decode_mode populated at CreateContext via
|
||||
- * VIDIOC_QUERYCTRL/G_EXT_CTRLS.
|
||||
- *
|
||||
- * FRAME_BASED: 4 controls (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS).
|
||||
- * SLICE_BASED: +SLICE_PARAMS (always), +PRED_WEIGHTS (when
|
||||
- * V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED).
|
||||
+ * Patch 0002 unconditionally sets the device to FRAME_BASED,
|
||||
+ * so slice_based is hardcoded false here. When the planned
|
||||
+ * probe-then-set commit lands, this becomes
|
||||
+ * context->decode_mode == V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED.
|
||||
*/
|
||||
+ struct v4l2_ext_control controls[6] = { 0 };
|
||||
+ unsigned int num_controls = 0;
|
||||
const bool slice_based = false; /* TODO: probe via context->decode_mode */
|
||||
- unsigned int num_controls = 4;
|
||||
+
|
||||
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_SPS;
|
||||
+ controls[num_controls].p_h264_sps = &sps;
|
||||
+ controls[num_controls].size = sizeof(sps);
|
||||
+ num_controls++;
|
||||
+
|
||||
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_PPS;
|
||||
+ controls[num_controls].p_h264_pps = &pps;
|
||||
+ controls[num_controls].size = sizeof(pps);
|
||||
+ num_controls++;
|
||||
+
|
||||
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_DECODE_PARAMS;
|
||||
+ controls[num_controls].p_h264_decode_params = &decode;
|
||||
+ controls[num_controls].size = sizeof(decode);
|
||||
+ num_controls++;
|
||||
+
|
||||
+ if (surface->params.h264.matrix_set) {
|
||||
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_SCALING_MATRIX;
|
||||
+ controls[num_controls].p_h264_scaling_matrix = &matrix;
|
||||
+ controls[num_controls].size = sizeof(matrix);
|
||||
+ num_controls++;
|
||||
+ }
|
||||
+
|
||||
if (slice_based) {
|
||||
- num_controls = 5;
|
||||
- if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice))
|
||||
- num_controls = 6;
|
||||
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
|
||||
+ controls[num_controls].p_h264_slice_params = &slice;
|
||||
+ controls[num_controls].size = sizeof(slice);
|
||||
+ num_controls++;
|
||||
+
|
||||
+ if (V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(&pps, &slice)) {
|
||||
+ controls[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
|
||||
+ controls[num_controls].ptr = &weights;
|
||||
+ controls[num_controls].size = sizeof(weights);
|
||||
+ num_controls++;
|
||||
+ }
|
||||
}
|
||||
|
||||
rc = v4l2_set_controls(driver_data->video_fd, surface->request_fd,
|
||||
@@ -0,0 +1,86 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-02
|
||||
Subject: [PATCH] h264: hardcode SPS level_idc = 51 (intentional over-allocation)
|
||||
|
||||
fourier's h264_va_picture_to_v4l2 never assigns sps->level_idc; the
|
||||
field stays at zero-init. level_idc=0 is invalid per the H.264 spec
|
||||
(lowest legal value is 10, Level 1.0). Hantro and other stateless
|
||||
H.264 decoders use level_idc to pre-allocate decoder resources (DPB
|
||||
size, motion-vector buffers); when fed an invalid level the hantro
|
||||
kernel driver silently skips the decode-hardware dispatch — the V4L2
|
||||
request completes with no error, DQBUF returns the CAPTURE buffer
|
||||
reporting bytesused=3655712 and no V4L2_BUF_FLAG_ERROR, but the
|
||||
buffer is never written.
|
||||
|
||||
VAAPI's decode-side VAPictureParameterBufferH264 structurally does
|
||||
NOT include level_idc — `grep level_idc va/va.h` returns only hits
|
||||
inside VAEncSequenceParameterBufferH264 (the encode path). The
|
||||
H.264 SPS NAL is also not included in VASliceDataBuffer because
|
||||
ffmpeg-vaapi parses it client-side and forwards only slice data
|
||||
(verified empirically via patch 0010's hex-dump of the OUTPUT
|
||||
buffer: it contains "00 00 01 65 ..." — i.e. ANNEX_B start code +
|
||||
IDR slice NAL byte, no SPS NAL). A SPS-NAL byte extractor is
|
||||
therefore not viable from the bitstream libva-v4l2-request
|
||||
receives.
|
||||
|
||||
Workaround: hardcode level_idc = 51 (= Level 5.1, max for 1080p
|
||||
and 4K@30 mainstream consumer profiles). This INTENTIONALLY
|
||||
OVER-ALLOCATES decoder resources but is sufficient for any stream
|
||||
up to 4K@30. It is corpus-correct, not contract-correct: a 4K@60
|
||||
stream (Level 6.x) would under-allocate.
|
||||
|
||||
This patch is a known-incomplete intermediate, not a final fix.
|
||||
The proper upstreamable answer is a level-from-resolution
|
||||
derivation per H.264 Annex A.3 (max MB rate / max frame size
|
||||
thresholds). That requires mapping consumer-side framerate which
|
||||
VAAPI does not expose, so the lookup table is non-trivial. The
|
||||
TODO is captured inline.
|
||||
|
||||
This patch's goal is unblocking decode-hardware engagement on the
|
||||
ohm_gl_fix corpus while the full level-derivation work proceeds.
|
||||
|
||||
Cross-reference: kernel doc
|
||||
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
|
||||
level_idc as a required field with no "kernel-derives" annotation —
|
||||
i.e., userspace-required.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c
|
||||
+++ b/src/h264.c
|
||||
@@ -553,6 +553,35 @@
|
||||
sps.profile_idc = h264_profile_to_idc(profile);
|
||||
|
||||
/*
|
||||
+ * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
|
||||
+ * level_idc — see va.h, the field exists only in
|
||||
+ * VAEncSequenceParameterBufferH264 on the encode path. The H.264
|
||||
+ * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
|
||||
+ * parses it client-side and forwards only slice data), so a
|
||||
+ * SPS-NAL byte extractor is not viable from the bitstream we
|
||||
+ * receive.
|
||||
+ *
|
||||
+ * Hantro and other stateless H.264 decoders use level_idc to
|
||||
+ * pre-allocate decoder resources (DPB, motion-vector buffers); a
|
||||
+ * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
|
||||
+ * 1.0) and causes hantro to silently skip the decode hardware
|
||||
+ * dispatch.
|
||||
+ *
|
||||
+ * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
|
||||
+ * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
|
||||
+ * decoder resources and is sufficient for any stream up to 4K@30.
|
||||
+ * It is corpus-correct, not contract-correct.
|
||||
+ *
|
||||
+ * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
|
||||
+ * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
|
||||
+ * thresholds. That is a small lookup table but requires also
|
||||
+ * mapping the consumer's framerate, which VAAPI doesn't provide
|
||||
+ * directly. For now the over-allocation is the upstreamable
|
||||
+ * compromise.
|
||||
+ */
|
||||
+ sps.level_idc = 51;
|
||||
+
|
||||
+ /*
|
||||
* Build the per-request control list incrementally:
|
||||
* - SPS, PPS, DECODE_PARAMS: always required (in either decode
|
||||
* mode).
|
||||
@@ -0,0 +1,97 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-01
|
||||
Subject: [PATCH] DEBUG: dump VAPictureH264 raw bytes + decoded fields
|
||||
|
||||
Diagnostic-only. Investigating the observed anomaly:
|
||||
|
||||
- V4L2 strace shows decode_params.top_field_order_cnt = 65536
|
||||
on the first IDR frame submitted by mpv+ffmpeg+libva-v4l2-request
|
||||
- GStreamer's reference path writes 0 (spec-correct: PicOrderCnt=0
|
||||
for IDR with pic_order_cnt_type=0 / pic_order_cnt_lsb=0)
|
||||
- Reading FFmpeg source (libavcodec/vaapi_h264.c::fill_vaapi_pic):
|
||||
va_pic->TopFieldOrderCnt = 0;
|
||||
if (pic->field_poc[0] != INT_MAX)
|
||||
va_pic->TopFieldOrderCnt = pic->field_poc[0];
|
||||
For IDR: ff_h264_init_poc sets field_poc[0] = poc_msb + poc_lsb
|
||||
= 0 + 0 = 0. So FFmpeg should write 0.
|
||||
|
||||
If FFmpeg writes 0 but fourier reads 65536, the mismatch is in the
|
||||
libva ABI between ffmpeg's writer and our reader. Most likely
|
||||
suspect: VA_PADDING_LOW size in VAPictureH264 differs between the
|
||||
libva headers ffmpeg+libva were built against and the headers
|
||||
fourier was built against, shifting struct field offsets.
|
||||
|
||||
This patch dumps:
|
||||
1. sizeof(VAPictureH264) at our reader's view
|
||||
2. First 32 raw bytes of VAPicture->CurrPic
|
||||
3. Field-decoded values via the .picture_id, .frame_idx, .flags,
|
||||
.TopFieldOrderCnt, .BottomFieldOrderCnt accessors
|
||||
|
||||
If the raw bytes show 00 00 01 00 at offset 12 (= 65536 LE), the
|
||||
field offset is correct and FFmpeg actually wrote 65536 — meaning
|
||||
either FFmpeg has a bug, or our test scenario triggers a non-spec
|
||||
code path. If the raw bytes show 00 00 00 00 at offset 12 but
|
||||
TopFieldOrderCnt accessor returns 65536, the struct ABI is
|
||||
mismatched and we need to reconcile libva versions.
|
||||
|
||||
If sizeof(VAPictureH264) prints as something other than 36 (= 4*5
|
||||
+ 4*VA_PADDING_LOW assuming VA_PADDING_LOW=4), the struct layout
|
||||
on this build differs from the documented libva-2.x layout.
|
||||
|
||||
Removed once the source of the 65536 is identified.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c 2026-05-01 22:56:42.656744048 +0000
|
||||
+++ b/src/h264.c 2026-05-02 00:00:00.000000000 +0000
|
||||
@@ -28,6 +28,7 @@
|
||||
#include <assert.h>
|
||||
#include <limits.h>
|
||||
#include <string.h>
|
||||
+#include <stdio.h>
|
||||
|
||||
#include <sys/ioctl.h>
|
||||
#include <sys/mman.h>
|
||||
@@ -259,6 +259,42 @@
|
||||
* the OUTPUT bitstream — a hypothesis verified empirically by
|
||||
* running this patch and inspecting the CAPTURE buffer.
|
||||
*/
|
||||
+ /*
|
||||
+ * DEBUG INSTRUMENTATION (0014): dump the raw bytes of
|
||||
+ * VAPicture->CurrPic plus sizeof(VAPictureH264) so we can
|
||||
+ * tell whether the observed TopFieldOrderCnt=65536 anomaly is
|
||||
+ * (a) at the documented byte-offset 12 (ffmpeg-side bug or
|
||||
+ * intentional non-spec encoding) or
|
||||
+ * (b) at a different offset (libva ABI / VA_PADDING_LOW
|
||||
+ * mismatch between ffmpeg's writer and our reader).
|
||||
+ *
|
||||
+ * Documented VAPictureH264 layout (libva-2.x):
|
||||
+ * offset 0: VASurfaceID picture_id (uint32)
|
||||
+ * offset 4: uint32 frame_idx
|
||||
+ * offset 8: uint32 flags
|
||||
+ * offset 12: int32 TopFieldOrderCnt
|
||||
+ * offset 16: int32 BottomFieldOrderCnt
|
||||
+ * offset 20+: uint32 va_reserved[VA_PADDING_LOW]
|
||||
+ */
|
||||
+ {
|
||||
+ const unsigned char *cp = (const unsigned char *)&VAPicture->CurrPic;
|
||||
+ char hex[32 * 3 + 1] = { 0 };
|
||||
+ unsigned int i;
|
||||
+ for (i = 0; i < 32; i++)
|
||||
+ snprintf(hex + i * 3, 4, " %02x", cp[i]);
|
||||
+ request_log("VAPictureH264 sizeof=%zu CurrPic[0..31]:%s\n",
|
||||
+ sizeof(VAPictureH264), hex);
|
||||
+ request_log("VAPictureH264 CurrPic field reads: "
|
||||
+ "picture_id=0x%08x frame_idx=%u flags=0x%x "
|
||||
+ "TopFOC=%d BottomFOC=%d frame_num=%u\n",
|
||||
+ (unsigned)VAPicture->CurrPic.picture_id,
|
||||
+ (unsigned)VAPicture->CurrPic.frame_idx,
|
||||
+ (unsigned)VAPicture->CurrPic.flags,
|
||||
+ (int)VAPicture->CurrPic.TopFieldOrderCnt,
|
||||
+ (int)VAPicture->CurrPic.BottomFieldOrderCnt,
|
||||
+ (unsigned)VAPicture->frame_num);
|
||||
+ }
|
||||
+
|
||||
decode->nal_ref_idc = nal_ref_idc;
|
||||
decode->frame_num = VAPicture->frame_num;
|
||||
decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
|
||||
@@ -0,0 +1,150 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-02
|
||||
Subject: [PATCH] h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2
|
||||
|
||||
ROOT CAUSE for "kernel decodes successfully but produces zeroed
|
||||
CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR":
|
||||
|
||||
ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
|
||||
0x10000 as a sentinel for "uninitialised":
|
||||
libavcodec/h264dec.c:301 — global init in ff_h264_decode_init
|
||||
libavcodec/h264dec.c:444 — IDR reset in idr() helper
|
||||
ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes
|
||||
pc->poc_msb = pc->prev_poc_msb whenever the slice header's
|
||||
pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which
|
||||
is the typical case for any normal H.264 content with sane POC
|
||||
ordering). The sentinel leaks into field_poc[] (line 305) and from
|
||||
there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
|
||||
libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78).
|
||||
|
||||
Empirical confirmation via meitner 2026-05-02 ground-truth test:
|
||||
ran an LD_PRELOAD shim around vaCreateBuffer against an i965
|
||||
VAAPI backend decoding a 60-frame H.264 Main clip. Every frame
|
||||
showed TopFieldOrderCnt = (POC | 0x10000):
|
||||
|
||||
Frame 1 IDR: raw bytes "00 00 01 00" at offset 12 → TopFOC=65536
|
||||
Frame 2: raw bytes "06 00 01 00" → TopFOC=65542
|
||||
Frame 3: "02 00 01 00" → TopFOC=65538
|
||||
|
||||
i965 successfully decodes regardless. V4L2 stateless drivers
|
||||
(hantro_h264.c::prepare_table feeds the value direct to
|
||||
tbl->poc[i*2]/[32], the kernel reflist builder uses it directly
|
||||
for cur_pic_order_count comparison) cannot tolerate the high word —
|
||||
the kernel's resource sizing math sees POC=65536 for an IDR and
|
||||
breaks.
|
||||
|
||||
This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static
|
||||
inline in src/h264.c. It detects bit 16 set rather than blindly
|
||||
subtracting, so a future ffmpeg version that fixes the leak
|
||||
degrades gracefully. The helper is applied at all four POC sites:
|
||||
|
||||
1. h264_fill_dpb: dpb->top_field_order_cnt
|
||||
2. h264_fill_dpb: dpb->bottom_field_order_cnt
|
||||
3. h264_va_picture_to_v4l2: decode->top_field_order_cnt
|
||||
4. h264_va_picture_to_v4l2: decode->bottom_field_order_cnt
|
||||
|
||||
VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0
|
||||
because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already
|
||||
sets POC=0 there; the sentinel never applies. Zeroing them
|
||||
explicitly removes a class of "stale POC value in invalidated
|
||||
slot" foot-guns.
|
||||
|
||||
Non-trivial follow-ups identified during the meitner experiment
|
||||
that are NOT addressed by this patch:
|
||||
- PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are
|
||||
not yet derived from VASliceParameterBufferH264.slice_type. The
|
||||
bbb corpus is I-only at the start so this hasn't been a
|
||||
blocker, but a clip with B-frames will need the slice-type
|
||||
routing patch.
|
||||
- h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is
|
||||
almost certainly wrong per the kernel doc — pic_num must equal
|
||||
the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface
|
||||
id. Out of scope here; will surface as a defect on streams
|
||||
that have multi-frame DPB lookups.
|
||||
|
||||
Cross-references:
|
||||
audit_0008_decode_params_2026-05-01.md — kernel-side consumer
|
||||
audit confirming POC fields are userspace-required.
|
||||
api_contract_findings_2026-05-01.md — VAAPI doc gap on POC
|
||||
semantics; H.264 spec section 8.2.1 is the binding contract.
|
||||
meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical
|
||||
capture of the sentinel pattern across 60 frames.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c
|
||||
+++ b/src/h264.c
|
||||
@@ -187,6 +187,43 @@
|
||||
}
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Strip ffmpeg-vaapi's POC sentinel.
|
||||
+ *
|
||||
+ * ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
|
||||
+ * 0x10000 in libavcodec/h264dec.c (lines 301 and 444 of v8.0). After
|
||||
+ * an IDR the idr() helper resets prev_poc_msb to that same sentinel.
|
||||
+ * ff_h264_init_poc (libavcodec/h264_parse.c lines 296-305) then
|
||||
+ * computes pc->poc_msb as prev_poc_msb when the slice header's
|
||||
+ * poc_lsb hasn't wrapped — which is the typical case for normal
|
||||
+ * content. The sentinel leaks into field_poc[] and from there into
|
||||
+ * VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
|
||||
+ * libavcodec/vaapi_h264.c::fill_vaapi_pic.
|
||||
+ *
|
||||
+ * Working VAAPI backends (intel-iHD, i965 verified empirically on
|
||||
+ * meitner 2026-05-02) tolerate the high word — they either mask it
|
||||
+ * or treat POCs as relative comparisons. V4L2 stateless H.264
|
||||
+ * driver-side consumers (hantro_h264.c::prepare_table feeds the
|
||||
+ * value direct to tbl->poc[]) need the spec value, so we strip the
|
||||
+ * sentinel here at the libva-v4l2-request boundary.
|
||||
+ *
|
||||
+ * Detection by bit-16-set rather than blind subtraction so that a
|
||||
+ * future ffmpeg version that fixes the sentinel leak degrades
|
||||
+ * gracefully. POC values for non-degenerate H.264 content rarely
|
||||
+ * exceed 16 bits; bit 16 set is a strong signal of the sentinel.
|
||||
+ *
|
||||
+ * Empty DPB slots (VA_PICTURE_H264_INVALID) carry POC=0 by
|
||||
+ * libavcodec/vaapi_h264.c::init_vaapi_pic and need no fix-up.
|
||||
+ */
|
||||
+static inline int32_t h264_strip_ffmpeg_poc_sentinel(int32_t poc, uint32_t flags)
|
||||
+{
|
||||
+ if (flags & VA_PICTURE_H264_INVALID)
|
||||
+ return 0;
|
||||
+ if (poc & (1 << 16))
|
||||
+ return poc - (1 << 16);
|
||||
+ return poc;
|
||||
+}
|
||||
+
|
||||
static void h264_fill_dpb(struct request_data *data,
|
||||
struct object_context *context,
|
||||
struct v4l2_ctrl_h264_decode_params *decode)
|
||||
@@ -210,8 +247,12 @@
|
||||
|
||||
dpb->frame_num = entry->pic.frame_idx;
|
||||
dpb->pic_num = entry->pic.picture_id;
|
||||
- dpb->top_field_order_cnt = entry->pic.TopFieldOrderCnt;
|
||||
- dpb->bottom_field_order_cnt = entry->pic.BottomFieldOrderCnt;
|
||||
+ dpb->top_field_order_cnt =
|
||||
+ h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
|
||||
+ entry->pic.flags);
|
||||
+ dpb->bottom_field_order_cnt =
|
||||
+ h264_strip_ffmpeg_poc_sentinel(entry->pic.BottomFieldOrderCnt,
|
||||
+ entry->pic.flags);
|
||||
|
||||
dpb->flags = V4L2_H264_DPB_ENTRY_FLAG_VALID;
|
||||
|
||||
@@ -298,8 +339,12 @@
|
||||
|
||||
decode->nal_ref_idc = nal_ref_idc;
|
||||
decode->frame_num = VAPicture->frame_num;
|
||||
- decode->top_field_order_cnt = VAPicture->CurrPic.TopFieldOrderCnt;
|
||||
- decode->bottom_field_order_cnt = VAPicture->CurrPic.BottomFieldOrderCnt;
|
||||
+ decode->top_field_order_cnt =
|
||||
+ h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.TopFieldOrderCnt,
|
||||
+ VAPicture->CurrPic.flags);
|
||||
+ decode->bottom_field_order_cnt =
|
||||
+ h264_strip_ffmpeg_poc_sentinel(VAPicture->CurrPic.BottomFieldOrderCnt,
|
||||
+ VAPicture->CurrPic.flags);
|
||||
|
||||
if (nal_unit_type == 5)
|
||||
decode->flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC;
|
||||
@@ -0,0 +1,82 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-02
|
||||
Subject: [PATCH] h264: derive PFRAME / BFRAME flags from VASlice slice_type
|
||||
|
||||
v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per
|
||||
ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2-
|
||||
request relied on each backing driver tolerating frame-class
|
||||
ambiguity.
|
||||
|
||||
Kernel survey (linux 6.19.x):
|
||||
- tegra-vde/h264.c (lines 783-799) consumes both flags to select
|
||||
the inter-frame decode kernel. Without them the I-frame kernel
|
||||
runs on P/B content.
|
||||
- visl-trace-h264.h uses them for decode tracing.
|
||||
- hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do
|
||||
not consume the flags.
|
||||
|
||||
Hantro on ohm decoded bbb cleanly without these flags set (see
|
||||
phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is
|
||||
an upstreamability fix for cross-driver portability rather than a
|
||||
correctness fix for hantro.
|
||||
|
||||
VAAPI's VASliceParameterBufferH264.slice_type maps directly to the
|
||||
H.264 slice_header() slice_type field. Per spec 7.4.3:
|
||||
0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have
|
||||
this slice_type." `slice_type % 5` recovers the underlying type
|
||||
in either encoding form.
|
||||
|
||||
In FRAME_BASED mode we only see surface->params.h264.slice from the
|
||||
most-recent VASliceParameterBuffer — that's fine: a single coded
|
||||
picture has a uniform slice_type for the purposes of the PFRAME /
|
||||
BFRAME flag (multi-slice frames may mix slice types in some streams,
|
||||
but the flag's semantic is "this is an inter-coded frame," which
|
||||
holds if any slice is P or B; using the last-seen slice's type is
|
||||
a reasonable approximation).
|
||||
|
||||
Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
|
||||
Flags table.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c
|
||||
+++ b/src/h264.c
|
||||
@@ -587,6 +587,38 @@
|
||||
&surface->params.h264.slice,
|
||||
&surface->params.h264.picture, &slice, &weights);
|
||||
|
||||
+ /*
|
||||
+ * Derive PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags
|
||||
+ * from VASliceParameterBufferH264.slice_type. VAAPI's slice_type
|
||||
+ * matches the H.264 spec slice_type semantic: 0=P, 1=B, 2=I, 3=SP,
|
||||
+ * 4=SI; values 5..9 mean "all slices in the picture have this
|
||||
+ * slice_type" (mod 5 yields the underlying type). VAAPI consumers
|
||||
+ * (ffmpeg, mpv) populate this for every slice; in FRAME_BASED mode
|
||||
+ * we only see the most-recent slice's params, but slice_type is
|
||||
+ * uniform across a single coded picture for our purposes.
|
||||
+ *
|
||||
+ * Kernel consumers that read these flags: tegra-vde
|
||||
+ * (drivers/media/platform/nvidia/tegra-vde/h264.c lines 783-799 of
|
||||
+ * 6.19.x) selects the inter-frame decode kernel. Hantro / rkvdec /
|
||||
+ * cedrus / mediatek / qcom-iris-stateless do not consume them.
|
||||
+ * Setting them keeps the libva-v4l2-request fork upstreamable
|
||||
+ * across drivers without affecting hantro behaviour.
|
||||
+ *
|
||||
+ * Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
|
||||
+ * Flags — V4L2_H264_DECODE_PARAM_FLAG_PFRAME / _BFRAME.
|
||||
+ */
|
||||
+ switch (surface->params.h264.slice.slice_type % 5) {
|
||||
+ case H264_SLICE_P:
|
||||
+ decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_PFRAME;
|
||||
+ break;
|
||||
+ case H264_SLICE_B:
|
||||
+ decode.flags |= V4L2_H264_DECODE_PARAM_FLAG_BFRAME;
|
||||
+ break;
|
||||
+ default:
|
||||
+ /* I / SP / SI: no extra flag. */
|
||||
+ break;
|
||||
+ }
|
||||
+
|
||||
sps.profile_idc = h264_profile_to_idc(profile);
|
||||
|
||||
/*
|
||||
@@ -0,0 +1,124 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-02
|
||||
Subject: [PATCH] h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id
|
||||
|
||||
fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id`
|
||||
— the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655,
|
||||
v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum
|
||||
(equation 8-28) for short-term references or LongTermPicNum
|
||||
(equation 8-29) for long-term references. The surface id has no
|
||||
relationship to either.
|
||||
|
||||
Kernel-side consumers of pic_num:
|
||||
- mediatek/decoder/vdec/vdec_h264_req_common.c (line 210):
|
||||
dst_entry->pic_num = src_entry->pic_num. Used for
|
||||
field-coded short-term reference disambiguation.
|
||||
- hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read
|
||||
pic_num. They resolve refs via reference_ts (timestamp)
|
||||
and POC. This is why fourier's wrong value never surfaced
|
||||
on RK3568 hantro.
|
||||
|
||||
This patch makes pic_num spec-correct so the libva-v4l2-request
|
||||
fork is upstreamable across drivers without depending on each
|
||||
target's tolerance for non-spec fills.
|
||||
|
||||
Computation, derived from H.264 spec section 8.2.4.1:
|
||||
|
||||
For frames (not field-coded), PicNum = FrameNumWrap.
|
||||
FrameNumWrap = (frame_num > cur_frame_num)
|
||||
? frame_num - max_frame_num
|
||||
: frame_num
|
||||
|
||||
max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4)
|
||||
cur_frame_num = current picture's frame_num
|
||||
|
||||
For long-term references:
|
||||
LongTermPicNum = long_term_frame_idx (when not field-coded).
|
||||
VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64):
|
||||
VAPictureH264.frame_idx = long_ref ? pic_id : frame_num
|
||||
So long-term refs already carry long_term_frame_idx in frame_idx;
|
||||
we copy it through.
|
||||
|
||||
Field-coded streams require an extra factor-of-2 plus a parity
|
||||
adjustment per spec equations 8-28/8-29; this patch does not handle
|
||||
field-coded content. ohm corpus is all frame-coded so this is a
|
||||
follow-up for later.
|
||||
|
||||
Implementation: add VAPicture parameter to h264_fill_dpb so the
|
||||
function has access to seq_fields.log2_max_frame_num_minus4 and
|
||||
the current picture's frame_num. Update the single caller in
|
||||
h264_va_picture_to_v4l2.
|
||||
|
||||
Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry
|
||||
table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c
|
||||
line 210.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c
|
||||
+++ b/src/h264.c
|
||||
@@ -226,8 +226,12 @@
|
||||
|
||||
static void h264_fill_dpb(struct request_data *data,
|
||||
struct object_context *context,
|
||||
+ VAPictureParameterBufferH264 *VAPicture,
|
||||
struct v4l2_ctrl_h264_decode_params *decode)
|
||||
{
|
||||
+ const int max_frame_num =
|
||||
+ 1 << (VAPicture->seq_fields.bits.log2_max_frame_num_minus4 + 4);
|
||||
+ const int cur_frame_num = (int)VAPicture->frame_num;
|
||||
int i;
|
||||
|
||||
for (i = 0; i < H264_DPB_SIZE; i++) {
|
||||
@@ -246,7 +250,41 @@
|
||||
}
|
||||
|
||||
dpb->frame_num = entry->pic.frame_idx;
|
||||
- dpb->pic_num = entry->pic.picture_id;
|
||||
+
|
||||
+ /*
|
||||
+ * Per ext-ctrls-codec-stateless.rst, dpb[].pic_num must
|
||||
+ * equal the H.264 spec's PicNum (8-28) for short-term refs
|
||||
+ * or LongTermPicNum (8-29) for long-term refs.
|
||||
+ *
|
||||
+ * For frames (not field-coded), PicNum = FrameNumWrap.
|
||||
+ * FrameNumWrap = (frame_num > cur_frame_num)
|
||||
+ * ? frame_num - max_frame_num
|
||||
+ * : frame_num
|
||||
+ * (per spec section 8.2.4.1, frame_num wraparound).
|
||||
+ *
|
||||
+ * VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic
|
||||
+ * line 64): VAPictureH264.frame_idx holds long_term_frame_idx
|
||||
+ * for long-term refs and frame_num for short-term refs. So
|
||||
+ * for long-term entries we copy frame_idx straight through
|
||||
+ * as LongTermPicNum.
|
||||
+ *
|
||||
+ * fourier's previous code set pic_num to picture_id (the
|
||||
+ * VAAPI surface id) which is unrelated to H.264 PicNum;
|
||||
+ * mediatek's vdec_h264_req_common.c::dst_entry->pic_num is
|
||||
+ * one consumer that fails on that. Hantro doesn't read
|
||||
+ * pic_num at all (uses reference_ts for ref resolution),
|
||||
+ * which is why fourier's wrong value never surfaced on
|
||||
+ * RK3568.
|
||||
+ */
|
||||
+ if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE) {
|
||||
+ dpb->pic_num = entry->pic.frame_idx;
|
||||
+ } else {
|
||||
+ int frame_num = (int)entry->pic.frame_idx;
|
||||
+ dpb->pic_num = (frame_num > cur_frame_num)
|
||||
+ ? frame_num - max_frame_num
|
||||
+ : frame_num;
|
||||
+ }
|
||||
+
|
||||
dpb->top_field_order_cnt =
|
||||
h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
|
||||
entry->pic.flags);
|
||||
@@ -283,7 +321,7 @@
|
||||
nal_ref_idc = (b[0] >> 5) & 0x3;
|
||||
nal_unit_type = b[0] & 0x1f;
|
||||
|
||||
- h264_fill_dpb(driver_data, context, decode);
|
||||
+ h264_fill_dpb(driver_data, context, VAPicture, decode);
|
||||
|
||||
/*
|
||||
* Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field
|
||||
@@ -0,0 +1,159 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: 2026-05-02
|
||||
Subject: [PATCH] h264: derive sps.level_idc from H.264 Annex A.3 MaxFS
|
||||
|
||||
Replaces patch 0013's hardcoded level_idc = 51 with a small lookup
|
||||
that picks the smallest level whose MaxFS contains the encoded
|
||||
frame size. Patch 0013's TODO is resolved by this change.
|
||||
|
||||
VAAPI does not expose level_idc on the decode side
|
||||
(VAPictureParameterBufferH264 has no such field; only
|
||||
VAEncSequenceParameterBufferH264 carries it). The H.264 SPS NAL is
|
||||
parsed client-side by ffmpeg-vaapi and only slice data forwards in
|
||||
VASliceDataBuffer, so a SPS-NAL byte parser is not viable from the
|
||||
bitstream the libva-v4l2-request layer receives. We therefore
|
||||
derive level_idc from picture dimensions, which VAAPI does provide
|
||||
in VAPictureParameterBufferH264.picture_{width,height}_in_mbs_minus1.
|
||||
|
||||
Annex A.3 (Table A-1) MaxFS thresholds:
|
||||
Level 1.0: 99 MBs ( 176×144 = 11×9 = 99 )
|
||||
Level 1.1: 396 ( 352×288 = 22×18 = 396 )
|
||||
Level 2.0: 396
|
||||
Level 2.1: 792 ( 352×576 / 720×288 )
|
||||
Level 2.2: 1620 ( 720×480 ≈ 1350; 720×576 = 1620 )
|
||||
Level 3.0: 1620
|
||||
Level 3.1: 3600 (1280×720 ≈ 3600 )
|
||||
Level 3.2: 5120
|
||||
Level 4.0: 8192 (1920×1088 = 8160 fits )
|
||||
Level 4.1: 8192
|
||||
Level 4.2: 8704
|
||||
Level 5.0: 22080
|
||||
Level 5.1: 36864 (3840×2176 = 32640 fits; 4K@8K-edge )
|
||||
Level 5.2: 36864
|
||||
Level 6.0: 139264 (8K )
|
||||
|
||||
V4L2 control encoding: level_idc = (level major × 10) + (level minor).
|
||||
Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
|
||||
|
||||
Picks for typical content:
|
||||
1080p (1920×1088 = 8160 MBs) → Level 4.1 (level_idc = 41)
|
||||
4K (3840×2176 = 32640 MBs) → Level 5.1 (level_idc = 51)
|
||||
8K (7680×4352 = 130560 MBs) → Level 6.0 (level_idc = 60)
|
||||
|
||||
The previous hardcode of 51 was over-allocating for 1080p; with
|
||||
this patch hantro can pre-allocate based on the actual frame size.
|
||||
For our ohm corpus (1080p) this drops the requested DPB / MV
|
||||
buffer sizing from level-5.1 generosity to level-4.1 right-sized.
|
||||
|
||||
Without VAAPI exposing framerate we cannot also check MaxMBPS /
|
||||
MaxBR / MaxCPB. The frame-size-based pick is acceptable in
|
||||
practice: temporally-dense streams almost always also push
|
||||
spatially-large frames, so MaxFS captures the dominant
|
||||
resource-sizing signal.
|
||||
|
||||
Cross-reference: H.264 spec Annex A, Table A-1 ("Level limits").
|
||||
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
|
||||
level_idc as required-userspace-input, no kernel-derives annotation.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
--- a/src/h264.c
|
||||
+++ b/src/h264.c
|
||||
@@ -638,6 +638,55 @@
|
||||
}
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Derive sps.level_idc from the encoded frame size in macroblocks per
|
||||
+ * H.264 Annex A.3 (Table A-1) MaxFS thresholds. Each level's MaxFS is
|
||||
+ * the maximum encoded frame size in MBs the level supports; we pick
|
||||
+ * the smallest level whose MaxFS contains the actual frame size.
|
||||
+ *
|
||||
+ * Level decoding for the V4L2 control: level_idc = level * 10
|
||||
+ * Level 1.0 → 10, Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
|
||||
+ *
|
||||
+ * VAAPI does not expose the bitstream's actual level_idc on the
|
||||
+ * decode side (VAPictureParameterBufferH264 has no such field) — see
|
||||
+ * va.h. The H.264 SPS NAL is parsed client-side by ffmpeg-vaapi /
|
||||
+ * mpv and only slice data is forwarded in VASliceDataBuffer, so a
|
||||
+ * SPS-NAL byte parser is not viable at this layer.
|
||||
+ *
|
||||
+ * Without framerate we cannot also check MaxMBPS / MaxBR / MaxCPB.
|
||||
+ * That gap is acceptable in practice: consumers that push
|
||||
+ * temporally-dense streams (high MBPS) almost always also push
|
||||
+ * spatially-large frames (high MaxFS), so frame-size-based level
|
||||
+ * selection over-allocates on the temporal axis but never
|
||||
+ * under-allocates a level the consumer relies on for correct
|
||||
+ * decode-resource sizing.
|
||||
+ *
|
||||
+ * Picks for typical content:
|
||||
+ * 1080p (8160 MBs) → Level 4.1 (level_idc = 41)
|
||||
+ * 4K (32400 MBs) → Level 5.1 (level_idc = 51)
|
||||
+ * 8K (138240 MBs) → Level 6.0 (level_idc = 60)
|
||||
+ *
|
||||
+ * Replaces the hardcoded level_idc=51 from patch 0013.
|
||||
+ */
|
||||
+static inline __u8 h264_derive_level_idc(unsigned int width_in_mbs,
|
||||
+ unsigned int height_in_mbs)
|
||||
+{
|
||||
+ const unsigned int frame_size_mbs = width_in_mbs * height_in_mbs;
|
||||
+
|
||||
+ if (frame_size_mbs <= 99) return 10; /* Level 1.0 */
|
||||
+ if (frame_size_mbs <= 396) return 11; /* Level 1.1 - 2.0 */
|
||||
+ if (frame_size_mbs <= 792) return 21; /* Level 2.1 */
|
||||
+ if (frame_size_mbs <= 1620) return 22; /* Level 2.2 - 3.0 */
|
||||
+ if (frame_size_mbs <= 3600) return 31; /* Level 3.1 */
|
||||
+ if (frame_size_mbs <= 5120) return 32; /* Level 3.2 */
|
||||
+ if (frame_size_mbs <= 8192) return 41; /* Level 4.0 - 4.1 */
|
||||
+ if (frame_size_mbs <= 8704) return 42; /* Level 4.2 */
|
||||
+ if (frame_size_mbs <= 22080) return 50; /* Level 5.0 */
|
||||
+ if (frame_size_mbs <= 36864) return 51; /* Level 5.1 - 5.2 */
|
||||
+ if (frame_size_mbs <= 139264) return 60; /* Level 6.0 - 6.2 */
|
||||
+ return 62; /* > Level 6 ceiling */
|
||||
+}
|
||||
+
|
||||
int h264_set_controls(struct request_data *driver_data,
|
||||
struct object_context *context,
|
||||
VAProfile profile,
|
||||
@@ -705,33 +754,15 @@
|
||||
sps.profile_idc = h264_profile_to_idc(profile);
|
||||
|
||||
/*
|
||||
- * VAAPI's decode-side VAPictureParameterBufferH264 does not carry
|
||||
- * level_idc — see va.h, the field exists only in
|
||||
- * VAEncSequenceParameterBufferH264 on the encode path. The H.264
|
||||
- * SPS NAL is also not included in VASliceDataBuffer (ffmpeg-vaapi
|
||||
- * parses it client-side and forwards only slice data), so a
|
||||
- * SPS-NAL byte extractor is not viable from the bitstream we
|
||||
- * receive.
|
||||
- *
|
||||
- * Hantro and other stateless H.264 decoders use level_idc to
|
||||
- * pre-allocate decoder resources (DPB, motion-vector buffers); a
|
||||
- * zero-init level_idc=0 is invalid (lowest legal is 10 = Level
|
||||
- * 1.0) and causes hantro to silently skip the decode hardware
|
||||
- * dispatch.
|
||||
- *
|
||||
- * Hardcode level_idc = 51 (Level 5.1, max for 1080p/4K@30) as a
|
||||
- * known-incomplete intermediate. This INTENTIONALLY OVER-ALLOCATES
|
||||
- * decoder resources and is sufficient for any stream up to 4K@30.
|
||||
- * It is corpus-correct, not contract-correct.
|
||||
- *
|
||||
- * TODO: derive level_idc from (VAProfile, picture_width_in_mbs,
|
||||
- * picture_height_in_mbs) per H.264 Annex A.3 max-MB-per-second
|
||||
- * thresholds. That is a small lookup table but requires also
|
||||
- * mapping the consumer's framerate, which VAAPI doesn't provide
|
||||
- * directly. For now the over-allocation is the upstreamable
|
||||
- * compromise.
|
||||
+ * Derive level_idc from encoded frame size per H.264 Annex A.3.
|
||||
+ * VAAPI doesn't expose level_idc on the decode side (see
|
||||
+ * h264_derive_level_idc()'s docblock for the rationale); we pick
|
||||
+ * the smallest level whose MaxFS contains the picture dimensions.
|
||||
+ * Replaces patch 0013's intermediate hardcode of 51.
|
||||
*/
|
||||
- sps.level_idc = 51;
|
||||
+ sps.level_idc = h264_derive_level_idc(
|
||||
+ (unsigned int)surface->params.h264.picture.picture_width_in_mbs_minus1 + 1u,
|
||||
+ (unsigned int)surface->params.h264.picture.picture_height_in_mbs_minus1 + 1u);
|
||||
|
||||
/*
|
||||
* Build the per-request control list incrementally:
|
||||
@@ -0,0 +1,255 @@
|
||||
# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
# Campaign: ohm_gl_fix Phase 6 Step 1
|
||||
#
|
||||
# Forks libva-v4l2-request to add hantro-vpu multiplanar + modern
|
||||
# stateless UAPI support. Conflicts/replaces stock libva-v4l2-request.
|
||||
#
|
||||
# Build target: fermi LXD on hertz (Arch ARM aarch64) via marfrit-packages
|
||||
# Gitea Actions; alternative: boltzmann via his subagent.
|
||||
|
||||
pkgname=libva-v4l2-request-ohm-gl-fix
|
||||
_upstreampkg=libva-v4l2-request
|
||||
pkgver=1.0.0.r0.ga3c2476
|
||||
pkgrel=2
|
||||
pkgdesc="VA-API backend for V4L2 stateless decoders, hantro-vpu multiplanar fork"
|
||||
arch=('aarch64')
|
||||
url="https://github.com/bootlin/libva-v4l2-request"
|
||||
license=('LGPL2.1' 'MIT')
|
||||
depends=('libva' 'libdrm' 'systemd-libs')
|
||||
makedepends=('meson' 'ninja' 'pkgconf' 'git')
|
||||
provides=("${_upstreampkg}=${pkgver}" 'libva-driver')
|
||||
conflicts=("${_upstreampkg}")
|
||||
replaces=("${_upstreampkg}")
|
||||
|
||||
# Bootlin upstream tarball — pinned to last meaningful commit 2019-05-17.
|
||||
# Use full SHA: github extracts the archive to <repo>-<full-sha>/, so the
|
||||
# short form would mismatch ${srcdir}/${_upstreampkg}-${_commit}.
|
||||
_commit=a3c2476de19e6635458273ceeaeceff124fabd63
|
||||
source=(
|
||||
"${_upstreampkg}-${_commit}.tar.gz::https://github.com/bootlin/libva-v4l2-request/archive/${_commit}.tar.gz"
|
||||
"fourier-local.patch"
|
||||
"0001-mplane-multiplanar-port.patch"
|
||||
"0002-pre-streamon-controls-and-output-pool.patch"
|
||||
"0003-v4l2-query-helpers.patch"
|
||||
"0004-context-request-pool.patch"
|
||||
"0005-h264-conditional-pred-weights.patch"
|
||||
"0006-h264-frame-based-omit-slice-controls.patch"
|
||||
"0007-context-h264-start-code-annex-b.patch"
|
||||
"0008-h264-fill-decode-params-from-vaapi.patch"
|
||||
"0009-surface-no-capture-sfmt.patch"
|
||||
"0010-DEBUG-hex-dump-output-capture.patch"
|
||||
"0011-DEBUG-sentinel-capture-buffer.patch"
|
||||
"0012-h264-omit-scaling-matrix-frame-based.patch"
|
||||
"0013-h264-sps-level-idc.patch"
|
||||
"0014-DEBUG-vapic-bytes-dump.patch"
|
||||
"0015-h264-strip-ffmpeg-poc-sentinel.patch"
|
||||
"0016-h264-derive-pframe-bframe-flags.patch"
|
||||
"0017-h264-dpb-picnum-correctness.patch"
|
||||
"0018-h264-level-idc-from-annex-a3.patch"
|
||||
)
|
||||
sha256sums=(
|
||||
'92b523050561d64f7b6016edb53ca00524805f9f31a8b566baf457bbb15716fa'
|
||||
'1577ff1e2fd7944d2af85bba07658c26b2c54787175c2cc9024174ad2425d3ac'
|
||||
'7637c8c76f86a4b745516cdfd1ee89484b7fe7ce88425ff38460bd4494e1451e'
|
||||
'5309582759f260456b15635be610c2fe6fe25cbdf427cf8ad851f74991dc8c6e'
|
||||
'4e38eacc2b2dc26094cbad38964e8dc8bec19d2ad408a37a3ee21952003e6c38'
|
||||
'e5b61965921093292912136ae21727c9c792d0417201d86dc90b2e622f1edbb0'
|
||||
'c7a8e02f2e84c6248586d1ceacf25f4c26578f2f365044c3b4a011080ec016e8'
|
||||
'5ea1f8b193a3cba21631b00ac3d9cb8c6016754f0af47b33fcccce9e0114b32a'
|
||||
'092abc79f639ecb7ac698a48fb544edc3b6eff3cb9b711efa2cc452365c17ed0'
|
||||
'ac81992783f562128f55620dc54507b70026342519bb7c7d3efadf6275387861'
|
||||
'6b2c26feeeaf253f87e0ef0517191b636c6945e374660a13574f3114331aaed6'
|
||||
'ef8706062302fd7f13535501b5b2e8aed5325dbb0d4d56a88674bef53ca96eee'
|
||||
'242a42e10ff09e4e82bcadb8824e036dddab94cf6dc9c5f6c80eb4c2cc5dda50'
|
||||
'417c39397dfbc86db2cabc6217f54d9072de26dedcacf9a965b909fc998de052'
|
||||
'472deb316ff3ad282c6be028cfaf033d69ddfee845dcd519c28a0692f298bb6a'
|
||||
'835378dd0b7c126a6101b8df0c015951d88f5139f9586a618af6b3ee503d67b6'
|
||||
'380d334a88213185183a05e7e55380de503e388fc29d8b11d96909dafcbbeb65'
|
||||
'eaf1e363de111ee43d7ca3e4b161d9a3a3f6b1c9ca3d8642871abe70f18fbf95'
|
||||
'7b6f0f63fdde32a411cf3230cabeb610ef8a6bd09777976a06dbd274daa540c7'
|
||||
'15a0a40b918988e77e5f36eebf15e9f45b2c13a6628b5640efdac528c57aab80'
|
||||
)
|
||||
|
||||
prepare() {
|
||||
cd "${srcdir}/${_upstreampkg}-${_commit}"
|
||||
|
||||
# Patch 0: fourier's stateless-control modernization.
|
||||
# - src/h264.c + src/picture.c → V4L2_CID_STATELESS_H264_*
|
||||
# - include/hevc-ctrls.h → redirect shim to <linux/v4l2-controls.h>
|
||||
# - src/meson.build: h265.c/h265.h commented out (HEVC excluded)
|
||||
patch -p1 < "${srcdir}/fourier-local.patch"
|
||||
|
||||
# Hygiene: include/h264-ctrls.h is dead post-fourier (no source
|
||||
# includes it, no install_headers directive). Drop it so the
|
||||
# built source tree has no stale UAPI carry-over.
|
||||
rm -f include/h264-ctrls.h
|
||||
|
||||
# Patch 1: ohm-gl-fix multiplanar port.
|
||||
# - V4L2_BUF_TYPE_VIDEO_{OUTPUT,CAPTURE} -> *_MPLANE in src/v4l2.c
|
||||
# - per-plane VIDIOC_EXPBUF in src/surface.c
|
||||
# - struct v4l2_plane planes[] threading throughout
|
||||
# - image.c plane-stride adjustments
|
||||
patch -p1 < "${srcdir}/0001-mplane-multiplanar-port.patch"
|
||||
|
||||
# Patch 2: pre-STREAMON device controls + minimum OUTPUT pool.
|
||||
# - context.c: floor OUTPUT pool to 4 buffers (not surfaces_count)
|
||||
# - context.c: set V4L2_CID_STATELESS_H264_{DECODE_MODE,START_CODE}
|
||||
# device-wide before VIDIOC_STREAMON
|
||||
# THROWAWAY: superseded inline by patches 4 (request_pool) and 5
|
||||
# (probe-then-set DECODE_MODE) per upstreamable_design.md §5.
|
||||
patch -p1 < "${srcdir}/0002-pre-streamon-controls-and-output-pool.patch"
|
||||
|
||||
# Patch 3 (commit 2 in revised plan): QUERYCTRL/QUERYMENU helpers.
|
||||
# Pure utility additions to src/v4l2.{c,h}, no behaviour change.
|
||||
# Unblocks the request_pool and probe-then-set commits.
|
||||
patch -p1 < "${srcdir}/0003-v4l2-query-helpers.patch"
|
||||
|
||||
# Patch 4 (commit 3 in revised plan): request_pool decoupling.
|
||||
# NEW src/request_pool.{c,h}; context.c uses pool instead of
|
||||
# per-surface OUTPUT loop; picture.c borrows on Begin, releases
|
||||
# on Sync after DQBUF. Deletes 0002's "floor to 4" hunk inline
|
||||
# — the pool's count parameter supersedes it. 0002's
|
||||
# set_controls block remains until probe-then-set commit lands.
|
||||
patch -p1 < "${srcdir}/0004-context-request-pool.patch"
|
||||
|
||||
# Patch 5: conditional PRED_WEIGHTS submission. Defect-fix found
|
||||
# via ohm smoke testing (kernel rejects PRED_WEIGHTS at
|
||||
# error_idx=5 on Main-profile clips without weighted prediction).
|
||||
# Not part of Sonnet's planned series, but unblocks per-frame
|
||||
# decode on every backing driver.
|
||||
patch -p1 < "${srcdir}/0005-h264-conditional-pred-weights.patch"
|
||||
|
||||
# Patch 6: omit per-slice controls in FRAME_BASED mode. Identified
|
||||
# via cross-reference against GStreamer's gstv4l2codech264dec.c
|
||||
# (commit 9e3e775). FRAME_BASED requests must contain only
|
||||
# SPS/PPS/SCALING_MATRIX/DECODE_PARAMS — submitting SLICE_PARAMS
|
||||
# triggers V4L2 cluster validation EINVAL at error_idx=count.
|
||||
# Hardcodes slice_based=false for now since 0002 sets FRAME_BASED;
|
||||
# promotes to runtime probe via context->decode_mode in a
|
||||
# follow-up commit.
|
||||
patch -p1 < "${srcdir}/0006-h264-frame-based-omit-slice-controls.patch"
|
||||
|
||||
# Patch 7: enable ANNEX_B start-code emission in
|
||||
# codec_store_buffer to match the device-side START_CODE_ANNEX_B
|
||||
# set by 0002. Without this, kernel sees a raw NAL stream with
|
||||
# no 0x00 0x00 0x01 markers, fails to parse slice boundaries,
|
||||
# and emits a zeroed CAPTURE buffer (flat green frames in mpv).
|
||||
patch -p1 < "${srcdir}/0007-context-h264-start-code-annex-b.patch"
|
||||
|
||||
# Patch 8: fill DECODE_PARAMS frame_num + FIELD_PIC/BOTTOM_FIELD
|
||||
# flag bits from VAAPI. fourier left these zero-init; under
|
||||
# FRAME_BASED on hantro the kernel uses them to drive bitstream
|
||||
# parsing. Empirical question: does hantro tolerate the bit_size
|
||||
# fields (idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
|
||||
# dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
|
||||
# slice_group_change_cycle) being zero, or do we need a
|
||||
# slice_header() bit-level parser?
|
||||
patch -p1 < "${srcdir}/0008-h264-fill-decode-params-from-vaapi.patch"
|
||||
|
||||
# Patch 9: drop VIDIOC_S_FMT on CAPTURE queue. Hantro derives
|
||||
# CAPTURE format from per-request SPS; explicit S_FMT here can
|
||||
# leave the driver in a state where DQBUF returns zeroed
|
||||
# buffers despite no errors. GStreamer's reference path only
|
||||
# G_FMTs the CAPTURE side.
|
||||
patch -p1 < "${srcdir}/0009-surface-no-capture-sfmt.patch"
|
||||
|
||||
# Patch 10: DEBUG-only instrumentation. Hex-dumps OUTPUT and
|
||||
# CAPTURE buffer first 32 bytes per frame via request_log().
|
||||
# Removed before upstream submission.
|
||||
patch -p1 < "${srcdir}/0010-DEBUG-hex-dump-output-capture.patch"
|
||||
|
||||
# Patch 11: DEBUG-only sentinel write before CAPTURE QBUF.
|
||||
# Tells us whether kernel wrote to the buffer (sentinel gone)
|
||||
# or didn't (sentinel survives).
|
||||
patch -p1 < "${srcdir}/0011-DEBUG-sentinel-capture-buffer.patch"
|
||||
|
||||
# Patch 12 (REVISED 2026-05-02): gate SCALING_MATRIX submission
|
||||
# on a per-surface matrix_set flag mirroring fourier's existing
|
||||
# mpeg2.iqmatrix_set / h265.iqmatrix_set pattern. The earlier
|
||||
# draft of this patch unconditionally omitted SCALING_MATRIX in
|
||||
# FRAME_BASED, which was corpus-correct (bbb has no explicit
|
||||
# scaling lists) but the wrong predicate — kernel-side gating
|
||||
# is by "matrix-supplied vs. not," not by decode mode. Streams
|
||||
# with explicit scaling lists must still submit in either mode.
|
||||
# Three coordinated changes: surface.h adds bool matrix_set;
|
||||
# picture.c sets it on VAIQMatrixBuffer arrival and resets it
|
||||
# in RequestBeginPicture; h264.c builds controls[] incrementally.
|
||||
# The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is
|
||||
# preserved (kernel doc is explicit on it).
|
||||
patch -p1 < "${srcdir}/0012-h264-omit-scaling-matrix-frame-based.patch"
|
||||
|
||||
# Patch 13 (REVISED 2026-05-02): hardcode SPS level_idc = 51 as
|
||||
# an INTENTIONALLY OVER-ALLOCATING known-incomplete intermediate.
|
||||
# VAAPI's decode-side picture-parameter buffer structurally lacks
|
||||
# level_idc (only present in encode path). The H.264 SPS NAL is
|
||||
# not in VASliceDataBuffer either (ffmpeg-vaapi parses it
|
||||
# client-side and forwards only slice data — verified via the
|
||||
# 0010 hex dump showing OUTPUT first bytes are "00 00 01 65 ...",
|
||||
# i.e. start code + IDR slice NAL, no SPS). So a SPS-NAL byte
|
||||
# extractor is not viable. TODO captured inline for level-from-
|
||||
# resolution derivation per H.264 Annex A.3.
|
||||
patch -p1 < "${srcdir}/0013-h264-sps-level-idc.patch"
|
||||
|
||||
# Patch 14: DEBUG-only — dump VAPictureH264 raw bytes + decoded
|
||||
# fields. Used to disambiguate the TopFieldOrderCnt=65536 anomaly
|
||||
# on ohm in 2026-04-30..2026-05-02 investigation. The dump output
|
||||
# cross-referenced against meitner ground-truth (i965 backend)
|
||||
# confirmed +0x10000 is ffmpeg-vaapi convention, not an ohm bug.
|
||||
# Resolution: patch 0015. This patch stays in the series until
|
||||
# the 65536 sentinel handling has been validated on ohm; remove
|
||||
# before upstream submission.
|
||||
patch -p1 < "${srcdir}/0014-DEBUG-vapic-bytes-dump.patch"
|
||||
|
||||
# Patch 15: strip ffmpeg-vaapi's POC sentinel before passing to
|
||||
# V4L2. Root-cause fix for the "kernel decodes successfully but
|
||||
# produces zeroed CAPTURE buffers" symptom. ffmpeg's
|
||||
# H264POCContext initialises prev_poc_msb to (1 << 16) and the
|
||||
# value leaks through field_poc[] to VAPictureH264. Working
|
||||
# backends (i965, intel-iHD) tolerate the high word; V4L2
|
||||
# stateless drivers cannot. Adds h264_strip_ffmpeg_poc_sentinel()
|
||||
# static inline and applies it at all 4 POC sites (DPB top/bot,
|
||||
# CurrPic top/bot). Detection by bit-16-set so a future ffmpeg
|
||||
# version that fixes the leak degrades gracefully.
|
||||
patch -p1 < "${srcdir}/0015-h264-strip-ffmpeg-poc-sentinel.patch"
|
||||
|
||||
# Patch 16: derive PFRAME / BFRAME flags from VAAPI slice_type.
|
||||
# Upstreamability fix — tegra-vde consumes these flags to choose
|
||||
# the inter-frame decode kernel; hantro/rkvdec/cedrus/mediatek/
|
||||
# qcom don't read them but should still see spec-correct values.
|
||||
patch -p1 < "${srcdir}/0016-h264-derive-pframe-bframe-flags.patch"
|
||||
|
||||
# Patch 17: fill dpb[].pic_num as PicNum/LongTermPicNum per H.264
|
||||
# spec equations 8-28/8-29 instead of fourier's wrong VAAPI
|
||||
# surface-id assignment. Adds VAPicture parameter to h264_fill_dpb
|
||||
# so it can compute FrameNumWrap from log2_max_frame_num_minus4 +
|
||||
# current frame_num. Mediatek consumes pic_num for short-term
|
||||
# field-coded ref disambiguation; hantro doesn't read it (uses
|
||||
# reference_ts), which is why fourier's wrong value never
|
||||
# surfaced on RK3568.
|
||||
patch -p1 < "${srcdir}/0017-h264-dpb-picnum-correctness.patch"
|
||||
|
||||
# Patch 18: derive sps.level_idc from encoded frame size per
|
||||
# H.264 Annex A.3 (Table A-1) MaxFS thresholds. Replaces patch
|
||||
# 0013's intermediate hardcode of 51. For typical content:
|
||||
# 1080p → Level 4.1 (level_idc=41), 4K → 5.1, 8K → 6.0. Hantro
|
||||
# uses level_idc to size DPB / MV buffers; correct sizing means
|
||||
# less wasted memory than 0013's blanket over-allocation.
|
||||
patch -p1 < "${srcdir}/0018-h264-level-idc-from-annex-a3.patch"
|
||||
}
|
||||
|
||||
build() {
|
||||
cd "${srcdir}/${_upstreampkg}-${_commit}"
|
||||
# meson_options.txt only exposes 'kernel_headers' — leave it empty to
|
||||
# use system /usr/include kernel UAPI headers. No per-codec toggles.
|
||||
arch-meson build --buildtype=release
|
||||
meson compile -C build
|
||||
}
|
||||
|
||||
package() {
|
||||
cd "${srcdir}/${_upstreampkg}-${_commit}"
|
||||
meson install -C build --destdir "${pkgdir}"
|
||||
|
||||
install -Dm644 COPYING "${pkgdir}/usr/share/licenses/${pkgname}/COPYING"
|
||||
install -Dm644 COPYING.LGPL "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.LGPL"
|
||||
install -Dm644 COPYING.MIT "${pkgdir}/usr/share/licenses/${pkgname}/COPYING.MIT"
|
||||
}
|
||||
@@ -0,0 +1,58 @@
|
||||
# libva-v4l2-request-ohm-gl-fix
|
||||
|
||||
Bootlin's libva-v4l2-request VA-API backend, with hantro-vpu
|
||||
multi-planar + chromium-149-era stateless H.264 patches developed
|
||||
in the [ohm_gl_fix campaign](../../../ohm_gl_fix/) Phase 6 Step 1
|
||||
(2026-05-01..2026-05-02).
|
||||
|
||||
Patches 0001-0018 are contract-correct against the kernel V4L2
|
||||
stateless H.264 UAPI, validated by inspection against
|
||||
hantro_h264.c and v4l2_h264_init_reflist_builder() in linux 6.19.x.
|
||||
See ohm_gl_fix's `phase6/step1/audit_0008_decode_params_2026-05-01.md`
|
||||
and `phase6/step1/api_contract_findings_2026-05-01.md` for the
|
||||
audit trail.
|
||||
|
||||
## Honest characterisation
|
||||
|
||||
This package compiles cleanly, installs cleanly, and `vainfo` with
|
||||
`LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1`
|
||||
enumerates all H.264 profiles. It is, however, **not on Brave's
|
||||
critical decode path** on the ohm_gl_fix Step-1+Step-2 stack —
|
||||
Brave/Chromium uses its own `V4L2VideoDecoder` in
|
||||
`media/gpu/v4l2/`, opens `/dev/video1` directly, and never loads
|
||||
libva. See `phase3_remeasure_2026-05-02/B3_decoder_discovery.md`
|
||||
in the ohm_gl_fix repo for the strace/maps evidence.
|
||||
|
||||
For libva consumers that DO route through libva (mpv with
|
||||
`--hwdec=vaapi`, ffmpeg-vaapi, Firefox with
|
||||
`media.ffmpeg.vaapi.enabled=true`), this backend would in
|
||||
principle engage hantro hardware decode — but each consumer hits
|
||||
its own downstream issue:
|
||||
|
||||
- mpv `--vo=gpu-next`: blocked by a Mesa-panfrost WSI pitch bug
|
||||
during EGL dmabuf import (out of scope for this package; see
|
||||
`phase3_remeasure_2026-05-02/A3_mesa_wsi_pitch.md`).
|
||||
- mpv `--vo=image`: silently falls back to libavcodec SW decode
|
||||
rather than engaging the libva session (see
|
||||
`phase3_remeasure_2026-05-02/A1_morning_pass_disambiguation.md`).
|
||||
Reason not yet diagnosed.
|
||||
|
||||
The most likely use case where this backend cleanly delivers
|
||||
hardware decode end-to-end is **Firefox via libavcodec-vaapi**, on
|
||||
a stack that also has the Mesa pitch issue resolved (or that
|
||||
doesn't hit the EGL import path because Firefox's video element
|
||||
composites differently). Untested at time of writing.
|
||||
|
||||
## DEBUG patches in the series
|
||||
|
||||
`0010-DEBUG-hex-dump-output-capture.patch`,
|
||||
`0011-DEBUG-sentinel-capture-buffer.patch`, and
|
||||
`0014-DEBUG-vapic-bytes-dump.patch` produce verbose stderr output
|
||||
useful for diagnosing decode-path issues but excessive for
|
||||
production. They are intentionally kept in the applied series
|
||||
during development; remove for cleaner runs.
|
||||
|
||||
## Status
|
||||
|
||||
Tagged stable as of 2026-05-02 against chromium-149 / linux-6.19.10
|
||||
/ libva-2.22.0. Contract-correct; ecosystem-validation-pending.
|
||||
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user