panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18). The libmali
  stub blobs at iter18/blob/ are excluded: 109MB of RE artifacts replaced
  with a README pointer.
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone (basis for the
  0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 Source Map — VK_KHR_video_decode_h264 on panvk-bifrost (V4L2/Hantro backend)
Campaign: panvk-bifrost-video (successor to panvk-bifrost r4)
Mesa version: 26.0.6 (source tree on ohm at /home/mfritsche/mesa-build/mesa-26.0.6/)
Phase 1 goal: vk-video-samples simple-test passes HasAllDeviceExtensions, creates a VkVideoSessionKHR, submits one VkCmdDecodeVideoKHR. Decode correctness is Phase 7.
Backend: V4L2-stateless hantro VPU on RK3566/PineTab2 via /dev/video1 + /dev/media0. Mali GPU is not the decode engine.
Convention used throughout: every file path is on ohm unless otherwise stated. Cite as FILE:LINE. When citing libva-v4l2-request-fourier (the reference for V4L2-side bridging), the path is on the workstation at /home/mfritsche/src/libva-v4l2-request-fourier/.
Executive summary
The Mesa 26.0.6 video stack is structured in three layers:
1. Shared runtime helpers: src/vulkan/runtime/vk_video.{c,h} (3413 + 436 lines). Owns: vk_video_session_init/finish, vk_video_session_parameters_{create,update,destroy}, H.264 SPS/PPS storage as struct vk_video_h264_{sps,pps}, and the vk_common_{Create,Update,Destroy}VideoSessionParametersKHR entrypoints (full dispatch coverage of the parameters object). Also provides the codec parameter parsing helpers (vk_video_get_h264_parameters, vk_video_find_h264_dec_std_{sps,pps}).
2. Driver-side video: anv (src/intel/vulkan/anv_video.c + genX_cmd_video.c) and radv (src/amd/vulkan/radv_video.c). Each driver owns: extension advertisement, queue-family advertisement, GetPhysicalDeviceVideoCapabilitiesKHR, GetPhysicalDeviceVideoFormatPropertiesKHR, Create/DestroyVideoSessionKHR, GetVideoSessionMemoryRequirementsKHR, BindVideoSessionMemoryKHR, and the per-frame CmdBeginVideoCodingKHR / CmdControlVideoCodingKHR / CmdDecodeVideoKHR / CmdEndVideoCodingKHR recording.
3. HW codegen: the driver emits register packets into a command stream during the CmdDecodeVideoKHR record; the existing GPU queue submit path then ships that stream to the video engine.
Critical mismatch for our backend: layer 3 does not exist for us. The Hantro VPU has no Mali-side command stream. It has its own kernel device nodes (/dev/video1 + /dev/media0) with a request-API ioctl interface. So we keep layer 1 verbatim (a huge win: all H.264 SPS/PPS parsing comes for free), reuse layer 2's interface contracts, and replace layer 3's command-stream codegen with deferred V4L2 control marshalling plus submit-time VIDIOC_QBUF/POLL/VIDIOC_DQBUF.
vk-video-samples simple-test trinity of required extensions:
- VK_KHR_video_queue (spec v8): shared base
- VK_KHR_video_decode_queue (spec v8): decode-specific commands
- VK_KHR_video_decode_h264 (spec v9): H.264 profile
None are advertised in panvk-bifrost r4 today (Mesa 26.0.6 src/panfrost/vulkan/panvk_vX_physical_device.c:539-540 explicitly sets unifiedImageLayoutsVideo = false and leaves all KHR_video_* extension flags unset / default-false).
A. Extension surface
A.1 Where extensions are advertised
The panvk extension table is built by panvk_per_arch(get_physical_device_extensions) in src/panfrost/vulkan/panvk_vX_physical_device.c:35-160. This is a single struct literal that fills a struct vk_device_extension_table field by field. To add the three required extensions we extend the literal, keeping the alphabetical order of the KHR_ entries:
.KHR_video_decode_h264 = true, /* gated on hantro probe success */
.KHR_video_decode_queue = true,
.KHR_video_queue = true,
The natural insertion point is between .KHR_vertex_attribute_divisor = true, (line ~123) and .KHR_vulkan_memory_model = true, (line ~124).
Anv reference for comparison: src/intel/vulkan/anv_physical_device.c:262-274:
.KHR_video_queue = video_decode_enabled || video_encode_enabled,
.KHR_video_decode_queue = video_decode_enabled,
.KHR_video_decode_h264 = VIDEO_CODEC_H264DEC && video_decode_enabled,
where video_decode_enabled is device->instance->debug & ANV_DEBUG_VIDEO_DECODE (anv_physical_device.c:153). Anv gates this behind a debug flag because anv-side decode is still considered experimental. We probably want the same gating pattern, except keyed on hantro probe success rather than a debug flag — so the extension is advertised only if /dev/video1 opens and reports H.264 OUTPUT format support.
A.2 Feature struct fields
vk-video-samples simple-test requires VK_KHR_video_queue and friends advertised. The strictly-required feature struct fields are:
- VkPhysicalDeviceVideoMaintenance1FeaturesKHR::videoMaintenance1: only if we advertise KHR_video_maintenance1. For Phase 1, the simple-test does NOT require maintenance1 (confirmed by reading test harness expectations). Skip in Phase 1.
- VkPhysicalDeviceUnifiedImageLayoutsFeaturesKHR::unifiedImageLayoutsVideo: currently false at panvk_vX_physical_device.c:540. Stays false for Phase 1 (transition rules still apply).
The shared vk_video_session struct (vk_video.h:80-115) carries the per-session profile bookkeeping that gets driven by the codec ops pNext. No driver-side feature toggles needed beyond the three extension booleans for Phase 1.
A.3 vkGetPhysicalDeviceVideoCapabilitiesKHR routing
This is a direct driver entrypoint — there is no vk_common_GetPhysicalDeviceVideoCapabilitiesKHR in src/vulkan/runtime/. Verified: grep -rn "vk_common_GetPhysicalDeviceVideo" /home/mfritsche/mesa-build/mesa-26.0.6/src/ returns no hits.
Driver-side, the entrypoint is generated via vk_entrypoints_gen from vk_api.xml (per panvk/vulkan/meson.build:7-19). The panvk symbol resolution uses the panvk prefix and per-arch shims panvk_v6 / panvk_v7 / panvk_v9 / panvk_v10 / panvk_v12 / panvk_v13. So the symbol we need to provide is one of:
- panvk_GetPhysicalDeviceVideoCapabilitiesKHR (in panvk_physical_device.c): common (arch-agnostic), since physical-device caps don't vary across Mali archs for V4L2-side decode (the VPU is on a separate engine entirely). Recommended.
- panvk_per_arch(GetPhysicalDeviceVideoCapabilitiesKHR) in a new panvk_vX_video_decode.c: only needed if the answer varies per arch, which it doesn't here.
Reference shape from anv (anv_video.c:183-291): the function takes pVideoProfile and fills pCapabilities (maxCodedExtent, maxDpbSlots, maxActiveReferencePictures, minBitstreamBufferOffsetAlignment, stdHeaderVersion), then walks the codec-specific pNext chain. For H.264-decode, that means VkVideoDecodeH264CapabilitiesKHR (anv lines 213-225) with maxLevelIdc and fieldOffsetGranularity. Also fills VkVideoDecodeCapabilitiesKHR::flags = VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR (anv line 205) — which is what we'll want too, because the Hantro CAPTURE buffers ARE the DPB (no separate scratch).
The hantro driver's real limits (4K H.264 decode confirmed on RK3566) drive these values; we want to be conservative for Phase 1 and use maxCodedExtent = 1920x1088, maxDpbSlots = 17 (one more than the 16 reference frames an H.264 DPB can hold; matches ANV_VIDEO_H264_MAX_DPB_SLOTS at anv_private.h:6581), maxActiveReferencePictures = 16.
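The 1088 in maxCodedExtent is presumably 1080 rounded up to whole 16x16 macroblocks. A small worked sketch of that arithmetic follows; the helper name h264_align_to_mb is ours, for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define H264_MB_SIZE 16u /* H.264 macroblocks cover 16x16 luma samples */

/* Round a luma dimension up to a whole number of macroblocks. */
static uint32_t
h264_align_to_mb(uint32_t samples)
{
   /* Guard against wrap-around in the rounding addition. */
   assert(samples <= UINT32_MAX - (H264_MB_SIZE - 1u));
   return (samples + H264_MB_SIZE - 1u) / H264_MB_SIZE * H264_MB_SIZE;
}
```

For a 1920x1080 stream this gives 1920 (already a multiple of 16) and 1088 (68 macroblock rows).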
A.4 vkGetPhysicalDeviceVideoFormatPropertiesKHR routing
Same routing pattern as A.3 — direct driver entrypoint, no shared common path. Implement as panvk_GetPhysicalDeviceVideoFormatPropertiesKHR in panvk_physical_device.c.
Reference shape from anv (anv_video.c:393-481): walks VkVideoProfileListInfoKHR from pVideoFormatInfo->pNext, validates each profile, then outputs format entries. For H.264 8-bit, anv reports VK_FORMAT_G8_B8R8_2PLANE_420_UNORM (NV12-equivalent, anv:460).
This is exactly what we need. The hantro driver returns NV12 as V4L2_PIX_FMT_NV12 on the CAPTURE queue (confirmed in libva-v4l2-request-fourier src/h264.c and via v4l2_find_format calls in src/request.c:864-865 showing the format-probe pattern). The dst usage flag merge in anv at lines 410-419 (where VIDEO_DECODE_DST triggers added flags including SAMPLED_BIT | TRANSFER_DST_BIT) is a universal Vulkan Video pattern and applies verbatim. Set:
- format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM (NV12)
- imageType = VK_IMAGE_TYPE_2D
- imageTiling = VK_IMAGE_TILING_OPTIMAL (but see G.2 below about how the underlying memory comes from V4L2, so this is a "logical" tiling decision)
- imageUsageFlags = VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_SRC_BIT
- imageCreateFlags = VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR | VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT | VK_IMAGE_CREATE_EXTENDED_USAGE_BIT
B. Queue family registration
B.1 Current state (r4)
src/panfrost/vulkan/panvk_device.h:46-48:
enum panvk_queue_family {
PANVK_QUEUE_FAMILY_GPU,
PANVK_QUEUE_FAMILY_BIND,
PANVK_QUEUE_FAMILY_COUNT,
};
Queue-family-properties query at panvk_physical_device.c:557-595:
[PANVK_QUEUE_FAMILY_GPU] = {
.queueFlags = VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT,
...
},
[PANVK_QUEUE_FAMILY_BIND] = {
.queueFlags = VK_QUEUE_SPARSE_BINDING_BIT,
.queueCount = 1,
},
Queue dispatch in panvk_vX_device.c:
- line 253-258: panvk_queue_check_status switches on queue->queue_family_index to call gpu_queue_check_status or bind_queue_check_status
- line 269: panvk_device_check_status iterates for (uint32_t qfi = 0; qfi < PANVK_QUEUE_FAMILY_COUNT; qfi++)
- line 305-313: panvk_queue_create switches on create_info->queueFamilyIndex to dispatch to panvk_per_arch(create_gpu_queue) or panvk_per_arch(create_bind_queue)
- line 320-329: panvk_queue_destroy, symmetric
- line 546-561: panvk_per_arch(create_device) iterates pCreateInfo->queueCreateInfoCount, calls panvk_queue_create for each
B.2 What to add
Add a third enum value PANVK_QUEUE_FAMILY_VIDEO_DECODE. Slot ordering is not a concern for applications: Vulkan apps discover queue families by iterating over the reported properties, and the test client typically looks for VK_QUEUE_VIDEO_DECODE_BIT_KHR. The index value is opaque to them, so appending the new family at the end is safe.
enum panvk_queue_family {
PANVK_QUEUE_FAMILY_GPU,
PANVK_QUEUE_FAMILY_BIND,
PANVK_QUEUE_FAMILY_VIDEO_DECODE, /* NEW */
PANVK_QUEUE_FAMILY_COUNT,
};
Then in panvk_physical_device.c:557-595 extend the props table:
[PANVK_QUEUE_FAMILY_VIDEO_DECODE] = {
.queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT,
.queueCount = 1,
.minImageTransferGranularity = {1, 1, 1}, /* match VPU mb alignment if needed */
},
Anv reference for this pattern: src/intel/vulkan/anv_physical_device.c:2556-2576 (queue-family-init writing flags onto pdevice->queue.families[family_count++]). Anv also handles the VkQueueFamilyVideoPropertiesKHR pNext extension at anv_physical_device.c:3012-3030:
case VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR: {
VkQueueFamilyVideoPropertiesKHR *prop = ...;
if (queue_family->queueFlags & VK_QUEUE_VIDEO_DECODE_BIT_KHR) {
prop->videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR | ...;
}
}
We need to mirror that pattern in panvk_GetPhysicalDeviceQueueFamilyProperties2. Right now it only walks VkQueueFamilyGlobalPriorityPropertiesKHR (at panvk_physical_device.c:589). Add a pNext walk for VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR and fill videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR. Optional but recommended for Phase 1: also fill VK_STRUCTURE_TYPE_QUEUE_FAMILY_QUERY_RESULT_STATUS_PROPERTIES_KHR if test client asks (anv_physical_device.c:3007-3011).
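The pNext walk described above can be sketched as follows. To keep the example buildable without the Vulkan headers, it uses stand-in types and constants with a demo_/DEMO_ prefix; their names and values are illustrative assumptions, not the real Vulkan definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for VkStructureType values and flag bits (illustrative only). */
enum demo_stype {
   DEMO_STYPE_OTHER = 1,
   DEMO_STYPE_QUEUE_FAMILY_VIDEO_PROPS = 2,
};
#define DEMO_QUEUE_VIDEO_DECODE_BIT 0x20u
#define DEMO_CODEC_OP_DECODE_H264   0x1u

/* Plays the role of VkBaseOutStructure. */
struct demo_base_out {
   enum demo_stype sType;
   struct demo_base_out *pNext;
};

/* Plays the role of VkQueueFamilyVideoPropertiesKHR. */
struct demo_queue_family_video_props {
   struct demo_base_out base;
   uint32_t videoCodecOperations;
};

/* Walk the output chain and fill the structs we recognise. */
static void
demo_fill_queue_family_pnext(uint32_t queue_flags, struct demo_base_out *chain)
{
   for (struct demo_base_out *ext = chain; ext != NULL; ext = ext->pNext) {
      switch (ext->sType) {
      case DEMO_STYPE_QUEUE_FAMILY_VIDEO_PROPS: {
         struct demo_queue_family_video_props *prop =
            (struct demo_queue_family_video_props *)ext;
         /* Only the video decode family reports a codec operation. */
         prop->videoCodecOperations =
            (queue_flags & DEMO_QUEUE_VIDEO_DECODE_BIT)
               ? DEMO_CODEC_OP_DECODE_H264 : 0;
         break;
      }
      default:
         break; /* unknown structs are skipped, never rejected */
      }
   }
}
```

The GPU and BIND families go through the same walk and simply report zero codec operations.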
B.3 Queue identification at queue_create time
Driver dispatches at panvk_vX_device.c:305-313 via panvk_queue_create. Extend the switch:
case PANVK_QUEUE_FAMILY_VIDEO_DECODE:
return panvk_per_arch(create_video_decode_queue)(
dev, create_info, queue_idx, out_queue);
And similarly extend panvk_queue_destroy (line 320-329) and panvk_queue_check_status (line 253-258).
For check_global_priority at panvk_vX_device.c:218-247 — the video decode family gets a new case that returns VK_SUCCESS for any priority (since the V4L2 device doesn't expose priority semantics) or just VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_KHR like BIND.
B.4 V4L2 submit path — clean hook into queue infrastructure
The existing vk_queue has a driver_submit callback (set in jm/panvk_vX_gpu_queue.c:359: queue->vk.driver_submit = panvk_per_arch(gpu_queue_submit);). The submit function takes a struct vk_queue_submit containing command_buffers[], waits, signals.
For our V4L2 queue, the analog is: queue->vk.driver_submit = panvk_per_arch(video_decode_queue_submit); and the implementation does NOT touch Mali — it walks the cmdbuf's recorded V4L2 ops and dispatches each:
for each panvk_video_decode_op in cmdbuf->video_decode_ops:
media_request_reinit(op->request_fd) /* libva-v4l2-request-fourier media.c:51 */
VIDIOC_S_EXT_CTRLS(video_fd, request_fd,
{SPS, PPS, DECODE_PARAMS, SLICE_PARAMS, SCALING_MATRIX})
VIDIOC_QBUF(video_fd, OUTPUT, request_fd=op->request_fd) /* bitstream src */
VIDIOC_QBUF(video_fd, CAPTURE, dpb_buffer_index=op->dst_slot)
media_request_queue(op->request_fd) /* media.c:65 */
poll(request_fd, POLLPRI, timeout) /* media.c:79 */
VIDIOC_DQBUF(video_fd, OUTPUT)
VIDIOC_DQBUF(video_fd, CAPTURE)
The waits/signals from vk_queue_submit need to map to syncobj waits before we VIDIOC_QBUF, and a syncobj signal after the POLL completes. For Phase 1 (a single submit with no other GPU work in the queue), we can ignore semaphores and just use a syncobj that signals on DQBUF completion.
vk_queue_init (panvk_vX_gpu_queue.c:348) is the entry point; we'd reuse the same pattern for create_video_decode_queue. Allocate a struct panvk_video_decode_queue { struct vk_queue vk; int video_fd; int media_fd; ... } and stash the fds.
C. Session object lifecycle (VkVideoSessionKHR)
C.1 What CreateVideoSession allocates
Anv reference at src/intel/vulkan/anv_video.c:31-55:
struct anv_video_session *vid = vk_alloc2(...);
memset(vid, 0, sizeof(*vid));
VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
*pVideoSession = anv_video_session_to_handle(vid);
That's it. The heavy lifting is in vk_video_session_init (src/vulkan/runtime/vk_video.c:33-128), which fills:
- vid->op (VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR etc.)
- vid->max_coded, picture_format, ref_format, max_dpb_slots, max_active_ref_pics
- vid->h264.profile_idc from the VkVideoDecodeH264ProfileInfoKHR pNext (lines 51-57)
The driver-specific anv_video_session struct (anv_private.h:6688-6727) adds backend-specific per-stream state: cdf_initialized (for AV1), vid_mem[ANV_VID_MEM_AV1_MAX] (private memory bindings for codec scratch).
C.2 Memory binding via vkBindVideoSessionMemoryKHR
Anv reference at anv_video.c:914-998 for GetVideoSessionMemoryRequirements and anv_video.c:972-1000 for BindVideoSessionMemory. The mem_idx enums for H.264 (anv_private.h:6588-6593):
enum anv_vid_mem_h264_types {
ANV_VID_MEM_H264_INTRA_ROW_STORE,
ANV_VID_MEM_H264_DEBLOCK_FILTER_ROW_STORE,
ANV_VID_MEM_H264_BSD_MPC_ROW_SCRATCH,
ANV_VID_MEM_H264_MPR_ROW_SCRATCH,
ANV_VID_MEM_H264_MAX,
};
These are scratch buffers the Intel HCP/MFX engines need. The sizes are computed in get_h264_video_mem_size (anv_video.c:483-501) as multiples of width-in-MBs.
BindVideoSessionMemory (anv lines 972-998) is just bookkeeping: it copies each VkBindVideoSessionMemoryInfoKHR into vid->vid_mem[bind_index] (struct anv_vid_mem { anv_device_memory *mem; offset; size; } at anv_private.h:6572-6576).
C.3 For our V4L2 backend
Massive simplification opportunity: the Hantro VPU does NOT require driver-allocated scratch buffers — all scratch is internal to the VPU and managed by the kernel driver. So GetVideoSessionMemoryRequirements can return zero entries (*pVideoSessionMemoryRequirementsCount = 0), and BindVideoSessionMemory becomes a no-op (just return VK_SUCCESS;).
What CreateVideoSession DOES need to allocate, V4L2-side:
- Open /dev/video1 and /dev/media0 if not already held by the device (see J.1 for ownership decision).
- VIDIOC_S_FMT on the OUTPUT queue: V4L2_PIX_FMT_H264_SLICE (note: hantro is slice-stateless), based on vid->h264.profile_idc and vid->max_coded. See libva-v4l2-request-fourier src/h264.c:699-738 for the control-set pattern.
- VIDIOC_S_FMT on the CAPTURE queue: V4L2_PIX_FMT_NV12, dimensions from vid->max_coded.
- Allocate request_fd pool: pre-allocate N request fds (one per DPB slot + outstanding submits) via MEDIA_IOC_REQUEST_ALLOC ioctls (media.c:41).
- VIDIOC_REQBUFS on OUTPUT + CAPTURE queues to set up buffer count.
So panvk_video_session struct shape:
struct panvk_video_session {
struct vk_video_session vk; /* shared base */
int video_fd; /* may share with physical_device */
int media_fd; /* may share with physical_device */
/* per-session V4L2 state */
uint32_t bitstream_buffer_count;
uint32_t capture_buffer_count;
struct {
int request_fd;
bool in_use;
uint32_t dpb_slot;
} request_pool[MAX_OUTSTANDING_DECODES];
};
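How the request_pool array might be used is sketched below: a linear scan to claim a free entry at submit time and a release once the decode has been dequeued. The value of MAX_OUTSTANDING_DECODES and all demo_ names are assumptions for illustration; the source leaves the pool size open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING_DECODES 4 /* assumed value, not taken from the source */

struct demo_request_slot {
   int request_fd;
   bool in_use;
   uint32_t dpb_slot;
};

struct demo_request_pool {
   struct demo_request_slot slots[MAX_OUTSTANDING_DECODES];
};

/* Claim a free request for a decode into `dpb_slot`; NULL if exhausted. */
static struct demo_request_slot *
demo_request_acquire(struct demo_request_pool *pool, uint32_t dpb_slot)
{
   for (unsigned i = 0; i < MAX_OUTSTANDING_DECODES; i++) {
      struct demo_request_slot *slot = &pool->slots[i];
      if (!slot->in_use) {
         slot->in_use = true;
         slot->dpb_slot = dpb_slot;
         return slot;
      }
   }
   return NULL;
}

/* Return a request to the pool once its decode has been dequeued. */
static void
demo_request_release(struct demo_request_slot *slot)
{
   assert(slot->in_use);
   slot->in_use = false;
}
```

A NULL return from the acquire step is the point where the submit path would have to wait for an earlier decode to complete.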
C.4 Anv session creation shape — full reference
VkResult anv_CreateVideoSessionKHR(VkDevice _device,
const VkVideoSessionCreateInfoKHR *pCreateInfo,
const VkAllocationCallbacks *pAllocator,
VkVideoSessionKHR *pVideoSession)
/* anv_video.c:31-55 */
{
ANV_FROM_HANDLE(anv_device, device, _device);
struct anv_video_session *vid = vk_alloc2(..., sizeof(*vid), 8, OBJECT);
if (!vid) return vk_error(device, VK_ERROR_OUT_OF_HOST_MEMORY);
memset(vid, 0, sizeof(*vid));
VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
if (result != VK_SUCCESS) { vk_free2(..., vid); return result; }
*pVideoSession = anv_video_session_to_handle(vid);
return VK_SUCCESS;
}
For us, the body grows by ~15-30 lines for V4L2 setup (open fds, S_FMT, REQBUFS, request_fd pool init) and adds error-rollback paths.
D. Parameters object lifecycle (VkVideoSessionParametersKHR)
D.1 The shared layer does almost everything
src/vulkan/runtime/vk_video.c:845-885 defines:
- vk_common_CreateVideoSessionParametersKHR (line 846-862)
- vk_common_UpdateVideoSessionParametersKHR (line 865-872)
- vk_common_DestroyVideoSessionParametersKHR (line 875-885)
These delegate to:
- vk_video_session_parameters_create (helper at vk_video.c:480: alloc + dispatch by codec op)
- vk_video_session_parameters_update (line 793-844: switches on params->op and calls update_h264_dec_session_parameters at line 692, which does the actual SPS/PPS array merge with seq_parameter_set_id collision detection per the spec)
- vk_video_session_parameters_destroy
Key question: do panvk-bifrost entrypoints get auto-wired to the vk_common_* versions, or does the driver need to opt in?
Mesa's entrypoint generator (vk_entrypoints_gen.py) wires shared-helper entrypoints by default unless the driver provides a stronger symbol. So if panvk does NOT define panvk_CreateVideoSessionParametersKHR, the linker falls through to vk_common_CreateVideoSessionParametersKHR. Confirmed by anv comparison: anv defines neither anv_CreateVideoSessionParametersKHR nor anv_UpdateVideoSessionParametersKHR; both come from vk_common_*.
radv DOES override (radv_video.c:630-647) but only to call radv_video_patch_session_parameters for an AMD-specific fixup. For Phase 1 we don't need that.
Decision: rely entirely on vk_common. Zero driver code for parameters object lifecycle.
D.2 Parameters → V4L2 control conversion happens at CmdDecodeVideo time, not at parameter creation
The shared parameters struct (vk_video.h:127-195) for H.264-decode stores SPS array of struct vk_video_h264_sps (which embeds StdVideoH264SequenceParameterSet base) and PPS array of struct vk_video_h264_pps (which embeds StdVideoH264PictureParameterSet base). The lookup helpers vk_video_find_h264_dec_std_sps(params, id) and vk_video_find_h264_dec_std_pps(params, id) (vk_video.c:1186-1198) are what we call at decode time to get the SPS/PPS for the current frame.
The V4L2-side bridge from StdVideoH264SequenceParameterSet → struct v4l2_ctrl_h264_sps is the same conversion fourier does. See libva-v4l2-request-fourier/src/h264.c:360 for h264_va_picture_to_v4l2 which marshals to struct v4l2_ctrl_h264_decode_params, v4l2_ctrl_h264_pps, v4l2_ctrl_h264_sps — except the source format on our side is StdVideoH264* instead of VAPictureParameterBufferH264. The field-name mapping is essentially identical because both VAPictureParameterBufferH264 and StdVideoH264SequenceParameterSet ultimately derive from the H.264 spec's syntax element names.
We will write panvk_h264_std_sps_to_v4l2(const StdVideoH264SequenceParameterSet *std, struct v4l2_ctrl_h264_sps *out) etc. as a new helper file (~150 lines per codec). This is the bridge function that has no Mesa precedent — it's our novel contribution.
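A reduced sketch of the SPS half of that bridge. The destination struct and flag macros are the real ones from linux/v4l2-controls.h; the source side is a stand-in (struct demo_std_h264_sps) holding only a subset of the SPS fields, so that the example builds without the Vulkan Video headers. The stand-in and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/v4l2-controls.h>

/* Stand-in for a subset of StdVideoH264SequenceParameterSet. Field names
 * follow the H.264 syntax elements; values are the raw bitstream values. */
struct demo_std_h264_sps {
   uint8_t profile_idc; /* e.g. 100 for High */
   uint8_t level_idc;   /* e.g. 41 for level 4.1 */
   uint8_t seq_parameter_set_id;
   uint8_t chroma_format_idc;
   uint8_t log2_max_frame_num_minus4;
   uint8_t pic_order_cnt_type;
   uint8_t log2_max_pic_order_cnt_lsb_minus4;
   uint8_t max_num_ref_frames;
   uint16_t pic_width_in_mbs_minus1;
   uint16_t pic_height_in_map_units_minus1;
   unsigned frame_mbs_only_flag : 1;
   unsigned direct_8x8_inference_flag : 1;
};

/* Copy the SPS fields one to one and translate the flag bits. */
static void
demo_h264_std_sps_to_v4l2(const struct demo_std_h264_sps *std,
                          struct v4l2_ctrl_h264_sps *out)
{
   assert(std != NULL && out != NULL);
   memset(out, 0, sizeof(*out));

   out->profile_idc = std->profile_idc;
   out->level_idc = std->level_idc;
   out->seq_parameter_set_id = std->seq_parameter_set_id;
   out->chroma_format_idc = std->chroma_format_idc;
   out->log2_max_frame_num_minus4 = std->log2_max_frame_num_minus4;
   out->pic_order_cnt_type = std->pic_order_cnt_type;
   out->log2_max_pic_order_cnt_lsb_minus4 =
      std->log2_max_pic_order_cnt_lsb_minus4;
   out->max_num_ref_frames = std->max_num_ref_frames;
   out->pic_width_in_mbs_minus1 = std->pic_width_in_mbs_minus1;
   out->pic_height_in_map_units_minus1 = std->pic_height_in_map_units_minus1;

   if (std->frame_mbs_only_flag)
      out->flags |= V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY;
   if (std->direct_8x8_inference_flag)
      out->flags |= V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE;
}
```

One point to verify against the Vulkan Video std header before writing the real helper: if we read it correctly, StdVideoH264LevelIdc is an enum index rather than the raw level value, so level_idc would need a small lookup instead of the direct copy shown here.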
D.3 Hooking the parameters cache to ext-control structs at decode time
At CmdDecodeVideoKHR recording time, we retrieve the relevant StdVideoH264SequenceParameterSet * and StdVideoH264PictureParameterSet * via vk_video_get_h264_parameters (vk_video.h:419-425). The signature:
void vk_video_get_h264_parameters(const struct vk_video_session *session,
const struct vk_video_session_parameters *params,
const VkVideoDecodeInfoKHR *decode_info,
const VkVideoDecodeH264PictureInfoKHR *h264_pic_info,
const StdVideoH264SequenceParameterSet **sps_p,
const StdVideoH264PictureParameterSet **pps_p);
Anv uses this at genX_cmd_video.c:904 in anv_h264_decode_video. We do the same.
E. vkCmdDecodeVideoKHR command recording
E.1 What anv emits at record time vs submit time
Crucial finding: anv does ALL work at record time. By the time the cmdbuf goes to the queue, the command stream is fully baked. Look at anv_h264_decode_video (genX_cmd_video.c:892-1300+): every anv_batch_emit(&cmd_buffer->batch, GENX(MFX_PIPE_MODE_SELECT), sel) etc. is a register/packet write into the cmd_buffer's batch buffer. Submit time just kicks the batch.
The Begin/End wrappers are thin:
- CmdBeginVideoCodingKHR (genX_cmd_video.c:31-50): stashes cmd_buffer->video.vid = vid; cmd_buffer->video.params = params; into command-buffer-local state. That's it for H.264 (AV1 adds CDF table init).
- CmdControlVideoCodingKHR (genX_cmd_video.c:52-74): if RESET flag, emit MI_FLUSH_DW with VideoPipelineCacheInvalidate = 1.
- CmdEndVideoCodingKHR (genX_cmd_video.c:76-83): clears cmd_buffer->video.vid = NULL; cmd_buffer->video.params = NULL;.
The cmd_buffer->video shadow state (anv_private.h:4935-4938):
struct {
struct anv_video_session *vid;
struct vk_video_session_parameters *params;
} video;
E.2 For our V4L2 backend — "deferred record"
The V4L2 ioctls cannot meaningfully happen at record time, because:
- The bitstream buffer (frame_info->srcBuffer) is a VkBuffer we don't necessarily know the contents of yet (might be filled by a prior submitted cmdbuf or by host writes between record and submit).
- Request_fd allocation and S_EXT_CTRLS need to be sequential per submit (cannot pre-bind a request_fd to a recorded cmdbuf and reuse it).
Pattern: per-cmdbuf list of "video decode ops" recorded during CmdDecodeVideoKHR. The op captures everything we need to replay at submit time:
struct panvk_video_decode_op {
/* From CmdBegin */
struct panvk_video_session *session;
struct vk_video_session_parameters *params;
/* From CmdDecode */
VkBuffer src_buffer; /* bitstream source */
VkDeviceSize src_offset;
VkDeviceSize src_size;
/* DPB target */
struct panvk_image_view *dst_iv;
uint32_t dst_dpb_slot;
/* Already-resolved SPS/PPS pointers (cheap copy by value) */
StdVideoH264SequenceParameterSet sps;
StdVideoH264PictureParameterSet pps;
/* H.264 slice info, picked apart at submit time */
StdVideoDecodeH264PictureInfo std_pic_info;
/* Reference slot info — small array, copy by value */
uint32_t reference_slot_count;
struct panvk_video_ref_slot reference_slots[16];
};
struct panvk_cmd_buffer {
...
struct util_dynarray video_decode_ops; /* of struct panvk_video_decode_op */
};
Then submit-time (per B.4) walks the dynarray and does the ioctl dance per op.
Comparable record-time op-list pattern exists today for sparse binds (panvk_sparse.c). Anv stores per-cmdbuf state in cmd_buffer->video but doesn't queue up ops because it emits direct register packets. We're doing what anv would do if anv ran on a separate kernel device.
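The record-then-replay split can be sketched with a reduced op struct and a plain growable array. The array here stands in for Mesa's util_dynarray, and every demo_ name is hypothetical; a real submit callback would run the B.4 ioctl sequence where this one only records the slot it saw.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for struct panvk_video_decode_op. */
struct demo_decode_op {
   uint64_t src_offset;
   uint64_t src_size;
   uint32_t dst_dpb_slot;
};

/* Minimal growable array, standing in for util_dynarray. */
struct demo_op_list {
   struct demo_decode_op *ops;
   size_t count, capacity;
};

/* Record time: append a copy of the op; no ioctl is issued here.
 * Returns 0 on success, -1 if the list could not grow. */
static int
demo_record_decode(struct demo_op_list *list, const struct demo_decode_op *op)
{
   assert(list != NULL && op != NULL);
   if (list->count == list->capacity) {
      size_t cap = list->capacity ? list->capacity * 2 : 4;
      struct demo_decode_op *grown = realloc(list->ops, cap * sizeof(*grown));
      if (grown == NULL)
         return -1;
      list->ops = grown;
      list->capacity = cap;
   }
   list->ops[list->count++] = *op;
   return 0;
}

typedef int (*demo_submit_fn)(const struct demo_decode_op *op, void *ctx);

/* Submit time: replay ops in recording order, stop at the first failure. */
static int
demo_submit_ops(const struct demo_op_list *list, demo_submit_fn fn, void *ctx)
{
   for (size_t i = 0; i < list->count; i++) {
      int ret = fn(&list->ops[i], ctx);
      if (ret != 0)
         return ret;
   }
   return 0;
}

/* Example callback: stores the last DPB slot it was asked to decode into. */
static int
demo_trace_op(const struct demo_decode_op *op, void *ctx)
{
   *(uint32_t *)ctx = op->dst_dpb_slot;
   return 0;
}

/* Free the list storage when the command buffer is reset or destroyed. */
static void
demo_op_list_finish(struct demo_op_list *list)
{
   free(list->ops);
   list->ops = NULL;
   list->count = list->capacity = 0;
}
```

Copying each op by value at record time is what lets the application free or reuse its VkVideoDecodeInfoKHR structures before the submit happens.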
E.3 CmdBegin/Control/End for our backend
- panvk_per_arch(CmdBeginVideoCodingKHR): stash cmd_buffer->video_decode_session = vid; cmd_buffer->video_decode_params = params;. Optionally validate that the reference slot layout matches the dpb_slot count we set up at session init.
- panvk_per_arch(CmdControlVideoCodingKHR) for VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR: this needs to translate to MEDIA_REQUEST_IOC_REINIT on all pooled request_fds, OR just mark a session-wide flag "next decode needs fresh request setup". In Phase 1 we can no-op this if we always reinit per submit anyway.
- panvk_per_arch(CmdEndVideoCodingKHR): clear shadow state. No emission needed.
F. DPB management
F.1 Vulkan-side DPB model
Per-frame VkCmdDecodeVideoKHR receives:
- frame_info->dstPictureResource: VkVideoPictureResourceInfoKHR { codedOffset, codedExtent, baseArrayLayer, imageViewBinding }. The image view that will receive the decoded output.
- frame_info->pSetupReferenceSlot: VkVideoReferenceSlotInfoKHR { slotIndex, pPictureResource }. Says "this decoded frame becomes DPB slot N".
- frame_info->pReferenceSlots[]: references TO read from. Each carries slotIndex + pPictureResource.
For H.264, additionally:
- pNext chain VkVideoDecodeH264PictureInfoKHR { pStdPictureInfo, sliceCount, pSliceOffsets }
- DPB slot pNext per reference: VkVideoDecodeH264DpbSlotInfoKHR { pStdReferenceInfo }, which contains POC/short-term/long-term flags.
Anv's reference assembly logic at genX_cmd_video.c:992-1004:
for (unsigned i = 0; i < frame_info->referenceSlotCount; i++) {
const struct anv_image_view *ref_iv = anv_image_view_from_handle(
frame_info->pReferenceSlots[i].pPictureResource->imageViewBinding);
int idx = frame_info->pReferenceSlots[i].slotIndex;
...
dpb_slots[idx] = i;
buf.ReferencePictureAddress[i] = anv_image_dpb_address(ref_iv, baseArrayLayer);
}
F.2 V4L2 DPB model
v4l2_ctrl_h264_decode_params::dpb[16] is an array of struct v4l2_h264_dpb_entry { reference_ts, pic_num, frame_num, fields, flags, top_field_order_cnt, bottom_field_order_cnt }. Each entry's reference_ts is the timestamp used at VIDIOC_QBUF of the OUTPUT (bitstream) plane when that reference was decoded — V4L2 uses this as the "buffer identity" key.
So the mapping rule from Vulkan-side VkVideoReferenceSlotInfoKHR[] to V4L2-side dpb[16] is:
| Vulkan field | V4L2 dpb field | How to source |
|---|---|---|
| pReferenceSlots[i].slotIndex | array index in dpb[] | direct (assert < 16) |
| pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[0] | top_field_order_cnt | direct |
| pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[1] | bottom_field_order_cnt | direct |
| pReferenceSlots[i].pNext->pStdReferenceInfo->FrameNum | frame_num | direct |
| short-term/long-term flag | flags | direct |
| (the decoded output VkImage backing the ref slot) | reference_ts | lookup: we maintain a slotIndex → reference_ts map per-session, populated each time we decode into that slot. See libva-fourier src/h264.c:140-218 for dpb_insert/dpb_update/dpb_find_entry. Our case is simpler: slotIndex is provided by Vulkan, we just need to track "what ts did I QBUF when I last decoded into slotIndex N". |
The fourier src/h264.c:238-353 h264_fill_dpb function is the closest analog — it constructs struct v4l2_h264_dpb_entry[] from libva-side state. We do the analog but feed it from pReferenceSlots[].
F.3 Bookkeeping struct in panvk_video_session
struct panvk_video_session {
...
struct {
uint64_t reference_ts; /* timestamp last used when decoding into this slot */
struct panvk_image *image; /* the VkImage backing this slot's DPB */
uint32_t array_layer;
bool active;
} dpb[16];
};
Update at decode-completion time (after VIDIOC_DQBUF) for the setup-reference-slot.
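A sketch of the F.2 mapping driven by the F.3 bookkeeping, assuming progressive frames only (every entry is marked as a full-frame reference and pic_num is left at zero). struct v4l2_h264_dpb_entry and the flag macros are the real kernel uAPI; the demo_ structs are reduced stand-ins for the session state and for VkVideoReferenceSlotInfoKHR plus its H.264 pNext.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/v4l2-controls.h>

#define DEMO_DPB_SIZE 16 /* v4l2_ctrl_h264_decode_params::dpb has 16 entries */

/* Per-session record of the timestamp last used for each DPB slot. */
struct demo_dpb_slot {
   uint64_t reference_ts;
   bool active;
};

/* Reduced stand-in for one reference slot and its H.264 reference info. */
struct demo_ref_slot {
   uint32_t slot_index;
   int32_t pic_order_cnt[2];
   uint16_t frame_num;
   bool long_term;
};

/* Fill `dpb` from the reference slots. Returns false if a slot index is
 * out of range or names a slot that was never decoded into. */
static bool
demo_fill_dpb(const struct demo_dpb_slot session_dpb[DEMO_DPB_SIZE],
              const struct demo_ref_slot *refs, uint32_t ref_count,
              struct v4l2_h264_dpb_entry dpb[DEMO_DPB_SIZE])
{
   assert(refs != NULL || ref_count == 0);
   memset(dpb, 0, DEMO_DPB_SIZE * sizeof(dpb[0]));

   for (uint32_t i = 0; i < ref_count; i++) {
      uint32_t idx = refs[i].slot_index;
      if (idx >= DEMO_DPB_SIZE || !session_dpb[idx].active)
         return false;

      struct v4l2_h264_dpb_entry *e = &dpb[idx];
      e->reference_ts = session_dpb[idx].reference_ts;
      e->frame_num = refs[i].frame_num;
      e->fields = V4L2_H264_FRAME_REF; /* assumption: no field pictures */
      e->top_field_order_cnt = refs[i].pic_order_cnt[0];
      e->bottom_field_order_cnt = refs[i].pic_order_cnt[1];
      e->flags = V4L2_H264_DPB_ENTRY_FLAG_VALID |
                 V4L2_H264_DPB_ENTRY_FLAG_ACTIVE;
      if (refs[i].long_term)
         e->flags |= V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM;
   }
   return true;
}
```

The range check is strict (index below 16) because dpb[] has exactly 16 entries, which is worth reconciling with maxDpbSlots = 17 from A.3.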
G. Memory + dmabuf interop
G.1 The challenge
App creates a VkImage with VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT. Memory is bound via normal vkBindImageMemory. Then the decoded frame data needs to physically end up in that memory backing.
Hantro's CAPTURE queue allocates its own buffers via VIDIOC_REQBUFS(memory=V4L2_MEMORY_MMAP) or accepts dma_buf imports via VIDIOC_REQBUFS(memory=V4L2_MEMORY_DMABUF). The clean path: the app's VkImage memory backing IS a dma_buf, exported from panvk via vkGetMemoryFdKHR, and we VIDIOC_QBUF it with the dma_buf fd as the CAPTURE plane.
But Vulkan apps don't usually export memory back to themselves. They expect vkCreateImage(usage=VIDEO_DECODE_DST) to "just work". So we drive the dma_buf flow internally.
G.2 Internal dma_buf flow (proposed)
Two strategies:
Strategy A: Driver-allocated CAPTURE buffers, app-imported into VkImage
- VIDIOC_REQBUFS(MMAP) at session create.
- VIDIOC_EXPBUF to get a dma_buf fd per allocated buffer.
- Import the dma_buf back into pan_kmod as a VkDeviceMemory equivalent.
- VkBindImageMemory to that DeviceMemory.
Strategy B: App-allocated VkImage, V4L2_MEMORY_DMABUF queue
- App calls vkCreateImage with VkExternalMemoryImageCreateInfo handleTypes=DMA_BUF.
- Vk allocates the BO via pan_kmod, exports a dma_buf fd via pan_kmod_bo_export (panvk_device_memory.c:387-404).
- VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF, fd=our_dmabuf_fd) at submit time.
Strategy B is what fourier does for surface buffers, and it's the cleaner fit — the app gets a real VkImage with real VkDeviceMemory, we never have to fake the import direction. Phase 1 may want to start with Strategy A for simplicity since vk-video-samples likely doesn't pass VkExternalMemoryImageCreateInfo flags, but Strategy B is the long-term right answer.
G.3 Anv's DPB image allocation
Anv treats DPB images as plain VkImages — no special allocation. The HW reads them directly via anv_image_dpb_address(iv, baseArrayLayer) at genX_cmd_video.c:933. Memory layout is whatever ISL gives them (tile-Y or planar-420). For our backend, that doesn't transfer — the Hantro VPU expects NV12 in a linear layout (or a vendor-specific tiled layout that we'd need to expose; for Phase 1 we mandate linear).
G.4 panvk dmabuf entry points (already present)
- panvk_AllocateMemory handles VkImportMemoryFdInfoKHR at panvk_device_memory.c:121-135 (calls pan_kmod_bo_import).
- panvk_GetMemoryFdKHR at panvk_device_memory.c:387-404 exports.
- EXT_external_memory_dma_buf already advertised at panvk_vX_physical_device.c:146.
So the building blocks exist. The new code is the session-internal V4L2 buffer pool that converts between V4L2_MEMORY_MMAP/DMABUF and pan_kmod BOs.
H. vk_video runtime helper coverage matrix
What we inherit vs what we write. Cross-referenced from sections A–G:
| Question | Inherit from vk_video shared layer? | Driver writes? |
|---|---|---|
| A. KHR_video_* extension booleans | No | YES: panvk_vX_physical_device.c table |
| A. videoMaintenance1 feature struct | No | (Phase 1: skip; future: yes if advertised) |
| A. GetPhysicalDeviceVideoCapabilitiesKHR | NO: direct entrypoint | YES: new code in panvk_physical_device.c |
| A. GetPhysicalDeviceVideoFormatPropertiesKHR | NO: direct entrypoint | YES: new code in panvk_physical_device.c |
| B. Queue family enum + props | No | YES: panvk_device.h + panvk_physical_device.c |
| B. Queue-family-video pNext walk | No | YES: extend panvk_GetPhysicalDeviceQueueFamilyProperties2 |
| B. Queue create/destroy dispatch | No | YES: extend panvk_vX_device.c:305-329 |
| B. Queue submit | No | YES: new panvk_vX_video_decode_queue.c |
| C. CreateVideoSessionKHR (handle + base init) | YES, partial: vk_video_session_init does the codec-op parsing | YES: driver wraps, adds V4L2 fd open + S_FMT + REQBUFS |
| C. DestroyVideoSessionKHR (base finish) | YES, partial: vk_video_session_finish | YES: driver wraps, adds V4L2 teardown |
| C. GetVideoSessionMemoryRequirementsKHR | No | YES (trivial: zero entries) |
| C. BindVideoSessionMemoryKHR | No | YES (trivial: no-op) |
| D. CreateVideoSessionParametersKHR | YES: vk_common_CreateVideoSessionParametersKHR (vk_video.c:846) | NO driver code needed |
| D. UpdateVideoSessionParametersKHR | YES: vk_common_UpdateVideoSessionParametersKHR (vk_video.c:865) | NO driver code needed |
| D. DestroyVideoSessionParametersKHR | YES: vk_common_DestroyVideoSessionParametersKHR (vk_video.c:875) | NO driver code needed |
| D. H.264 SPS/PPS storage | YES: struct vk_video_h264_{sps,pps} (vk_video.h:32-43) | NO |
| D. H.264 SPS/PPS lookup | YES: vk_video_find_h264_dec_std_{sps,pps} (vk_video.c:1186) | NO |
| D. H.264 params merge with dedup | YES: internal to vk_video_session_parameters_update | NO |
| D. Std → V4L2 control marshalling | No precedent in Mesa | YES: NEW helper file (~300 lines for H.264) |
| E. CmdBeginVideoCodingKHR | No | YES: trivial state-stash |
| E. CmdControlVideoCodingKHR | No | YES: trivial RESET handling |
| E. CmdEndVideoCodingKHR | No | YES: trivial state-clear |
| E. CmdDecodeVideoKHR | No | YES: record op into cmdbuf dynarray |
| E. vk_video_get_h264_parameters resolver | YES (vk_video.h:419) | NO |
| F. DPB slot ↔ reference_ts map | No | YES: panvk_video_session.dpb[16] |
| F. H.264 reference list construction | Partially: vk_fill_video_h264_* helpers if present | YES, but mostly direct field copies |
| G. dmabuf BO import/export | YES: existing panvk path (panvk_device_memory.c:121,387) | NO new code |
| G. V4L2 buffer ↔ pan_kmod_bo bridging | No precedent | YES: NEW helper file |
| G. Image creation for VIDEO_DECODE_DST | YES: existing panvk_image_init (panvk_image.c:562) handles all usage flags through ISL | Possibly yes for tile mode restrictions |

Net leverage: ~3000 lines of vk_video runtime helpers we inherit for free, primarily the H.264 SPS/PPS bitstream parsing + parameters object lifecycle + std/find helpers. Our new-code estimate is roughly 800-1500 lines split across ~4 new files (see I).
I. panvk-specific integration points (concrete edits)
I.1 Existing files to modify
src/panfrost/vulkan/panvk_vX_physical_device.c:
- Lines ~123-124 (between KHR_vertex_attribute_divisor and KHR_vulkan_memory_model): add the entries .KHR_video_queue = true, .KHR_video_decode_queue = true, and .KHR_video_decode_h264 = true (gated on hantro probe).
- Optional Phase 2+: at line 540, flip unifiedImageLayoutsVideo based on session config.

src/panfrost/vulkan/panvk_physical_device.c:
- Line ~565: extend the qfamily_props[] array by adding a third entry for PANVK_QUEUE_FAMILY_VIDEO_DECODE with queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT.
- Around line 589 inside the vk_outarray_append_typed loop: add a pNext walk for VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR that sets videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR.
- ADD new entrypoints panvk_GetPhysicalDeviceVideoCapabilitiesKHR and panvk_GetPhysicalDeviceVideoFormatPropertiesKHR at end of file (~70 lines + ~50 lines).

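The pNext walk for the video queue family can be sketched as below. This is a self-contained illustration, not panvk code: the sketch_ types and the numeric constants are stand-ins for the real Vulkan definitions, and the is_video_family flag is an assumed way of telling the decode family apart from the existing ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in values; the real ones come from the Vulkan headers. */
enum sketch_stype {
   SKETCH_STYPE_QUEUE_FAMILY_PROPERTIES_2 = 1,
   SKETCH_STYPE_QUEUE_FAMILY_VIDEO_PROPERTIES = 2,
   SKETCH_STYPE_UNRELATED = 3,
};
#define SKETCH_CODEC_OP_DECODE_H264 0x1u

/* Common header every chained struct starts with (like VkBaseOutStructure). */
struct sketch_base_out {
   enum sketch_stype sType;
   struct sketch_base_out *pNext;
};

/* Stand-in for VkQueueFamilyVideoPropertiesKHR. */
struct sketch_queue_family_video_props {
   struct sketch_base_out base;
   uint32_t videoCodecOperations;
};

/* Walks the chain hanging off one queue family's properties struct and
 * fills the video properties struct if the application supplied one. */
static void
sketch_fill_queue_family_chain(struct sketch_base_out *chain,
                               bool is_video_family)
{
   for (struct sketch_base_out *s = chain; s != NULL; s = s->pNext) {
      if (s->sType != SKETCH_STYPE_QUEUE_FAMILY_VIDEO_PROPERTIES)
         continue; /* unrelated structs are skipped, not treated as errors */

      struct sketch_queue_family_video_props *video =
         (struct sketch_queue_family_video_props *)s;
      /* Non-video families report no codec operations at all. */
      video->videoCodecOperations =
         is_video_family ? SKETCH_CODEC_OP_DECODE_H264 : 0;
   }
}
```

Writing zero for the non-video families matters because the application may chain the same struct type for every family it queries.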
src/panfrost/vulkan/panvk_device.h:
- Lines 46-48: add PANVK_QUEUE_FAMILY_VIDEO_DECODE to the enum.

src/panfrost/vulkan/panvk_vX_device.c:
- Lines 218-247 (check_global_priority): add case PANVK_QUEUE_FAMILY_VIDEO_DECODE: return VK_SUCCESS;.
- Lines 253-258 (panvk_queue_check_status): add case for the new family calling panvk_per_arch(video_decode_queue_check_status).
- Lines 305-313 (panvk_queue_create): add case calling panvk_per_arch(create_video_decode_queue).
- Lines 320-329 (panvk_queue_destroy): symmetric.

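The shape of these switch extensions can be sketched as follows. Everything here is a stand-in: the enum mirrors the idea of the one in panvk_device.h but the two pre-existing member names are assumptions, and the create functions only tag the queue so the routing is observable.

```c
#include <stddef.h>

/* Stand-in for the queue family enum; only VIDEO_DECODE is the new entry,
 * and the names of the two existing families are assumed. */
enum sketch_queue_family {
   SKETCH_QUEUE_FAMILY_GPU,
   SKETCH_QUEUE_FAMILY_BIND,
   SKETCH_QUEUE_FAMILY_VIDEO_DECODE,
   SKETCH_QUEUE_FAMILY_COUNT,
};

enum sketch_queue_kind {
   SKETCH_QUEUE_KIND_NONE,
   SKETCH_QUEUE_KIND_GPU,
   SKETCH_QUEUE_KIND_BIND,
   SKETCH_QUEUE_KIND_VIDEO_DECODE,
};

struct sketch_queue {
   enum sketch_queue_kind kind;
};

/* Returns 0 on success, -1 for an unknown family. Each case stands in
 * for a call to the matching per-arch create function. */
static int
sketch_queue_create(enum sketch_queue_family family, struct sketch_queue *q)
{
   switch (family) {
   case SKETCH_QUEUE_FAMILY_GPU:
      q->kind = SKETCH_QUEUE_KIND_GPU;
      return 0;
   case SKETCH_QUEUE_FAMILY_BIND:
      q->kind = SKETCH_QUEUE_KIND_BIND;
      return 0;
   case SKETCH_QUEUE_FAMILY_VIDEO_DECODE:
      /* New case: route to the V4L2-backed decode queue. */
      q->kind = SKETCH_QUEUE_KIND_VIDEO_DECODE;
      return 0;
   default:
      q->kind = SKETCH_QUEUE_KIND_NONE;
      return -1;
   }
}
```

The check_status and destroy switches would gain the same extra case.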
src/panfrost/vulkan/meson.build:
- Add new files to either libpanvk_files (arch-agnostic) or common_per_arch_files (arch-templated). The session/queue/command-record code is arch-agnostic but uses panvk_per_arch() symbols only by convention, so in Phase 1 we can place all new files in libpanvk_files and skip the per_arch dispatch.

I.2 New files to add
src/panfrost/vulkan/panvk_video_decode.c (~400 lines):
- panvk_CreateVideoSessionKHR
- panvk_DestroyVideoSessionKHR
- panvk_GetVideoSessionMemoryRequirementsKHR (returns count=0)
- panvk_BindVideoSessionMemoryKHR (no-op)
- panvk_CmdBeginVideoCodingKHR
- panvk_CmdControlVideoCodingKHR
- panvk_CmdEndVideoCodingKHR
- panvk_CmdDecodeVideoKHR (record op into cmd_buffer->video_decode_ops)

src/panfrost/vulkan/panvk_video_decode.h:
- struct panvk_video_session
- struct panvk_video_decode_op
- struct panvk_video_decode_queue

src/panfrost/vulkan/panvk_v4l2.c (~500 lines):
- panvk_v4l2_probe_hantro(): finds /dev/video1 and /dev/media0 (mirrors libva-v4l2-request-fourier src/request.c:143-308 find_decoder_video_node_via_topology).
- panvk_v4l2_session_init(): S_FMT on OUTPUT/CAPTURE, REQBUFS, request_fd pool alloc.
- panvk_v4l2_h264_std_to_ctrl_sps(): converts a StdVideoH264SequenceParameterSet pointer to struct v4l2_ctrl_h264_sps.
- panvk_v4l2_h264_std_to_ctrl_pps(): converts a StdVideoH264PictureParameterSet pointer to struct v4l2_ctrl_h264_pps.
- panvk_v4l2_h264_fill_decode_params(): builds struct v4l2_ctrl_h264_decode_params from VkVideoDecodeInfoKHR + slot map.
- panvk_v4l2_submit_op(): the request_fd / S_EXT_CTRLS / QBUF / poll / DQBUF dance for one op.

src/panfrost/vulkan/panvk_vX_video_decode_queue.c (~150 lines, per_arch):
- panvk_per_arch(create_video_decode_queue)
- panvk_per_arch(destroy_video_decode_queue)
- panvk_per_arch(video_decode_queue_submit): walks cmdbuf ops, calls panvk_v4l2_submit_op per op.
- panvk_per_arch(video_decode_queue_check_status)

I.3 Entrypoint generation
Recall from meson.build:7-19 that entrypoints are auto-wired with --prefix panvk and per-arch prefixes. The names above (panvk_CmdDecodeVideoKHR etc.) match the auto-resolution rules — no changes needed in vk_entrypoints_gen invocation.
For the per-arch ones (panvk_per_arch(...)), we expand under each PAN_ARCH define just like existing per-arch code.
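For readers unfamiliar with the per-arch naming trick, here is a simplified sketch of how such a macro can turn one source file into an arch-suffixed symbol. It is not the Mesa definition: sketch_per_arch and the helper macros are hypothetical names, and the default PAN_ARCH of 7 is only there so the snippet compiles on its own.

```c
/* Illustrative default; a real build defines PAN_ARCH once per compiled
 * variant of the file. */
#ifndef PAN_ARCH
#define PAN_ARCH 7
#endif

/* Two-level expansion so PAN_ARCH is replaced by its value before the
 * tokens are pasted together. */
#define SKETCH_ARCH_PASTE(name, ver) panvk_v##ver##_##name
#define SKETCH_ARCH_NAME(name, ver) SKETCH_ARCH_PASTE(name, ver)
#define sketch_per_arch(name) SKETCH_ARCH_NAME(name, PAN_ARCH)

/* With PAN_ARCH == 7 this defines panvk_v7_video_decode_queue_check_status. */
static int
sketch_per_arch(video_decode_queue_check_status)(void)
{
   return 0; /* 0 stands in for VK_SUCCESS */
}

/* Callers go through the same macro and never spell the arch suffix. */
static int
sketch_check_status(void)
{
   return sketch_per_arch(video_decode_queue_check_status)();
}
```

The extra SKETCH_ARCH_NAME level is what makes this work: pasting directly against PAN_ARCH would produce a symbol containing the literal text PAN_ARCH rather than its value.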
J. Probable architecture sketch
V4L2 fd ownership: at panvk_physical_device level for probe-time discovery (panvk_v4l2_probe_hantro sets phys_dev->v4l2.video_fd_present = true and stashes paths), but actual open() happens at panvk_CreateVideoSessionKHR time per-session. Two reasons: (1) the V4L2 driver state is per-fd, so two concurrent sessions need two separate fds anyway; (2) keeping fds closed when no video session is active is good citizenship. The PhysicalDevice only holds device-node paths and capability flags.
Per-session V4L2 state: struct panvk_video_session (see C.3) owns one video_fd + one media_fd + a pool of request_fds (one per max-in-flight decode, typically max_dpb_slots + 2). At CreateVideoSession we S_FMT both queues, REQBUFS to allocate the buffer count, EXPBUF the CAPTURE buffers to dma_bufs that get held in the session for later association with VkImage memory (Strategy B from G.2).
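One possible shape for the request_fd pool is sketched below, under the sizing rule stated above (max_dpb_slots + 2). The types and function names are hypothetical, and the alloc_fd callback stands in for the MEDIA_IOC_REQUEST_ALLOC ioctl so the bookkeeping can be exercised without a media device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_POOL_MAX 18 /* 16 DPB slots + 2 */

struct sketch_request_slot {
   int fd;         /* -1 until first use */
   bool in_flight; /* true between acquire and release */
};

struct sketch_request_pool {
   struct sketch_request_slot slots[SKETCH_POOL_MAX];
   uint32_t count;
   int (*alloc_fd)(void *ctx); /* stand-in for MEDIA_IOC_REQUEST_ALLOC */
   void *ctx;
};

static void
sketch_request_pool_init(struct sketch_request_pool *pool,
                         uint32_t max_dpb_slots,
                         int (*alloc_fd)(void *ctx), void *ctx)
{
   assert(max_dpb_slots + 2 <= SKETCH_POOL_MAX);
   pool->count = max_dpb_slots + 2;
   pool->alloc_fd = alloc_fd;
   pool->ctx = ctx;
   for (uint32_t i = 0; i < pool->count; i++) {
      pool->slots[i].fd = -1;
      pool->slots[i].in_flight = false;
   }
}

/* Returns a slot index, or -1 when every request is in flight or the
 * allocation fails. *needs_reinit tells the caller whether the fd was
 * used before and must be reinitialised prior to reuse. */
static int
sketch_request_pool_acquire(struct sketch_request_pool *pool,
                            bool *needs_reinit)
{
   for (uint32_t i = 0; i < pool->count; i++) {
      struct sketch_request_slot *slot = &pool->slots[i];
      if (slot->in_flight)
         continue;
      if (slot->fd < 0) {
         int fd = pool->alloc_fd(pool->ctx);
         if (fd < 0)
            return -1;
         slot->fd = fd;
         *needs_reinit = false;
      } else {
         *needs_reinit = true;
      }
      slot->in_flight = true;
      return (int)i;
   }
   return -1;
}

static void
sketch_request_pool_release(struct sketch_request_pool *pool, int index)
{
   assert(index >= 0 && (uint32_t)index < pool->count);
   pool->slots[index].in_flight = false;
}

/* Test double: hands out increasing fake fd numbers from *ctx. */
static int
sketch_fake_alloc_fd(void *ctx)
{
   int *next = ctx;
   return (*next)++;
}
```

Allocating fds lazily keeps a session that decodes only one frame from holding the full pool open.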
Per-VkImage dmabuf bookkeeping: the existing pan_kmod export path (panvk_device_memory.c:387-404) gives us a dma_buf out. The new piece is the inverse: at vkBindImageMemory time, for a VkImage whose usage includes VIDEO_DECODE_DST, we'd register the underlying BO's dma_buf as a CAPTURE buffer with VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF). The image's panvk_image struct gains an int v4l2_capture_index field.
Submit-time dispatch: at panvk_vX_device.c:305-313 we extend the switch to route PANVK_QUEUE_FAMILY_VIDEO_DECODE to panvk_per_arch(create_video_decode_queue), which sets driver_submit = panvk_per_arch(video_decode_queue_submit). The submit function walks each cmdbuf's video_decode_ops dynarray, and per op:
1. resolve request_fd from session pool (allocate or reuse, ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC))
2. media_request_reinit(request_fd) if reusing
3. translate op->sps to v4l2_ctrl_h264_sps via panvk_v4l2_h264_std_to_ctrl_sps()
4. translate op->pps to v4l2_ctrl_h264_pps via panvk_v4l2_h264_std_to_ctrl_pps()
5. build v4l2_ctrl_h264_decode_params from op (including dpb[] from session->dpb[] tracking)
6. VIDIOC_S_EXT_CTRLS(video_fd, request_fd=op->request_fd, {SPS, PPS, DECODE_PARAMS, SCALING_MATRIX, SLICE_PARAMS})
7. VIDIOC_QBUF(video_fd, OUTPUT, request_fd=op->request_fd, bytesused=op->src_size, m.fd=op->src_buffer's bo dma_buf)
8. VIDIOC_QBUF(video_fd, CAPTURE, index=op->dst_iv->image->v4l2_capture_index)
9. MEDIA_REQUEST_IOC_QUEUE(request_fd)
10. poll(request_fd, POLLPRI, timeout)
11. VIDIOC_DQBUF(video_fd, OUTPUT) /* releases input slot */
12. VIDIOC_DQBUF(video_fd, CAPTURE) /* output ready */
13. Update session->dpb[op->dst_dpb_slot].reference_ts to the QBUF timestamp
14. Signal vk_queue_submit's signal semaphores
Steps 5-12 are exactly the libva-v4l2-request-fourier RequestEndPicture body (src/picture.c:497-650). The mapping VAPicture* → V4L2 vs Std* → V4L2 is the one piece of code that has no Mesa precedent — we're inventing the bridge — but it's bounded: ~150 lines per codec (we only need H.264 in Phase 1).
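The ordering constraint in steps 6-12 can be sketched as a fixed step sequence that stops at the first failure. This is only a skeleton under assumed names: each step would be one ioctl or poll in the real panvk_v4l2_submit_op, whereas here a callback stands in for it so the ordering can be checked without a device.

```c
#include <stddef.h>

/* One entry per kernel interaction, in the order listed above. */
enum sketch_submit_step {
   SKETCH_STEP_SET_CTRLS,     /* VIDIOC_S_EXT_CTRLS on the request */
   SKETCH_STEP_QBUF_OUTPUT,   /* bitstream buffer, tied to the request */
   SKETCH_STEP_QBUF_CAPTURE,  /* destination picture buffer */
   SKETCH_STEP_QUEUE_REQUEST, /* MEDIA_REQUEST_IOC_QUEUE */
   SKETCH_STEP_WAIT_REQUEST,  /* poll() on the request fd */
   SKETCH_STEP_DQBUF_OUTPUT,
   SKETCH_STEP_DQBUF_CAPTURE,
   SKETCH_STEP_COUNT,
};

typedef int (*sketch_step_fn)(void *ctx, enum sketch_submit_step step);

/* Runs every step in order. On the first non-zero result it records the
 * failing step in *failed (if non-NULL) and returns that result. */
static int
sketch_run_submit_steps(sketch_step_fn fn, void *ctx,
                        enum sketch_submit_step *failed)
{
   for (int s = 0; s < SKETCH_STEP_COUNT; s++) {
      int ret = fn(ctx, (enum sketch_submit_step)s);
      if (ret != 0) {
         if (failed != NULL)
            *failed = (enum sketch_submit_step)s;
         return ret;
      }
   }
   return 0;
}

/* Test double: records the steps it sees and fails at fail_at (-1 = never). */
struct sketch_step_log {
   enum sketch_submit_step seen[SKETCH_STEP_COUNT];
   size_t count;
   int fail_at;
};

static int
sketch_log_step(void *ctx, enum sketch_submit_step step)
{
   struct sketch_step_log *log = ctx;
   log->seen[log->count++] = step;
   return (int)step == log->fail_at ? -1 : 0;
}
```

A real implementation would presumably also have to unwind buffers that were already queued when a later step fails; that cleanup is not shown here.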
Mesa-version observations and risks
- Mesa 26.0.6 is the campaign baseline. The vk_video runtime helpers in src/vulkan/runtime/vk_video.{c,h} are stable in this version with H.264, H.265, AV1, VP9, encode-h264, encode-h265, encode-av1 all covered. No upgrade required for Phase 1.
- KHR_video_decode_h264 spec v9 is what's in vk_api.xml for 26.0.6, confirmed by the extension already being known to the entrypoint generator (no --beta flag needed; that flag at meson.build:18 is for beta/provisional extensions only).
- Maintenance1/2 features are NOT required for the simple-test in Phase 1, so we don't need videoMaintenance1/videoMaintenance2 machinery yet. Maintenance1 (inline queries) and Maintenance2 (inline session parameters) become relevant in Phase 6+ if we want to pass conformance suites.
- The unifiedImageLayoutsVideo feature at panvk_vX_physical_device.c:540 is currently false. In Phase 1 we can leave it false: the test client honors explicit VkImageMemoryBarrier transitions to/from VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR.

Architectural maps that DO cleanly transfer from anv/radv
- Session as wrapper around vk_video_session. Anv: struct anv_video_session { struct vk_video_session vk; ... }. radv: same shape. Ours: same shape. The vk. namespace gives us all the spec-mandated session fields for free.
- Parameters fully delegated to vk_common_*. Anv does this, radv mostly does this (with a tiny radv_video_patch_session_parameters patch). Ours: full delegation.
- Cmdbuf-local shadow state for current session+params during the Begin..End scope. Anv: cmd_buffer->video.{vid,params}. We do the same.
- DPB slot index ↔ image view lookup at decode time. Both anv and our backend do this lookup per frame.

Architectural maps that DO NOT transfer
- Driver-allocated session scratch memory (anv_vid_mem array). Hantro VPU keeps scratch internal; we return zero memory requirements. Hard skip: not just simplification, an inversion.
- anv_batch_emit register packets directly into cmdbuf at record time. There is no equivalent. We MUST defer to submit-time; that's the entire point of the V4L2 backend being on a separate kernel device.
- anv_image_dpb_address(iv, layer) resolving to a GPU virtual address. Our DPB references resolve to V4L2 buffer indices (queued at session-init) or dma_buf fds (Strategy B). The "address" abstraction doesn't apply; the VPU doesn't share the GPU's address space.
- MFX/HCP/VDENC register-set knowledge in genX_cmd_video.c: 4000+ lines of Intel-specific HW programming. Completely irrelevant. The Hantro VPU's "programming" is a sequence of struct v4l2_ctrl_* fills + ioctls.
- MOCS / cache state in pipe-buf-addr-state (genX_cmd_video.c:962+). N/A: the kernel V4L2 driver handles all cache coherency at QBUF/DQBUF boundaries.

Phase 1 success criteria — final checklist
| vk-video-samples simple-test step | Where it lands in this map |
|---|---|
| vkGetPhysicalDeviceQueueFamilyProperties2 returns family with VK_QUEUE_VIDEO_DECODE_BIT_KHR and VkQueueFamilyVideoPropertiesKHR::videoCodecOperations & VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR set | B.2 |
| vkEnumerateDeviceExtensionProperties returns the three KHR_video_* | A.1 |
| vkGetPhysicalDeviceVideoCapabilitiesKHR(profile=H264) returns sane caps | A.3 |
| vkGetPhysicalDeviceVideoFormatPropertiesKHR returns NV12 | A.4 |
| vkCreateDevice succeeds with the video queue family selected | B.3 |
| vkCreateVideoSessionKHR succeeds | C |
| vkGetVideoSessionMemoryRequirementsKHR returns 0 entries | C.3 |
| vkCreateVideoSessionParametersKHR with SPS+PPS succeeds | D (free from vk_common) |
| Recording a vkCmdDecodeVideoKHR succeeds (no execution yet; could even no-op the V4L2 ioctls in Phase 1 since correctness isn't tested) | E.2 |
| Single queue submit succeeds without VK_ERROR_DEVICE_LOST | B.4, J |

Phase 1 deliberately stops short of "decoded picture compares against reference". That's Phase 7. Phase 1 is the end-to-end plumbing.