marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00


Phase 1 Source Map — VK_KHR_video_decode_h264 on panvk-bifrost (V4L2/Hantro backend)

Campaign: panvk-bifrost-video (successor to panvk-bifrost r4)
Mesa version: 26.0.6 (source tree on ohm at /home/mfritsche/mesa-build/mesa-26.0.6/)
Phase 1 goal: vk-video-samples simple-test passes HasAllDeviceExtensions, creates a VkVideoSessionKHR, and submits one vkCmdDecodeVideoKHR. Decode correctness is Phase 7.
Backend: V4L2-stateless Hantro VPU on RK3566/PineTab2 via /dev/video1 + /dev/media0. The Mali GPU is not the decode engine.

Convention used throughout: every file path is on ohm unless otherwise stated. Cite as FILE:LINE. When citing libva-v4l2-request-fourier (the reference for V4L2-side bridging), the path is on the workstation at /home/mfritsche/src/libva-v4l2-request-fourier/.


Executive summary

The Mesa 26.0.6 video stack is structured in three layers:

  1. Shared runtime helpers: src/vulkan/runtime/vk_video.{c,h} (3413 + 436 lines). Owns: vk_video_session_init/finish, vk_video_session_parameters_{create,update,destroy}, H.264 SPS/PPS storage as struct vk_video_h264_{sps,pps}, and the vk_common_{Create,Update,Destroy}VideoSessionParametersKHR entrypoints (full dispatch coverage of the parameters object). Also provides the codec parameter helpers (vk_video_get_h264_parameters, vk_video_find_h264_dec_std_{sps,pps}).
  2. Driver-side video — anv (src/intel/vulkan/anv_video.c + genX_cmd_video.c) and radv (src/amd/vulkan/radv_video.c). Each driver owns: extension advertisement, queue-family advertisement, GetPhysicalDeviceVideoCapabilitiesKHR, GetPhysicalDeviceVideoFormatPropertiesKHR, Create/DestroyVideoSessionKHR, GetVideoSessionMemoryRequirementsKHR, BindVideoSessionMemoryKHR, and the per-frame CmdBeginVideoCodingKHR/CmdControlVideoCodingKHR/CmdDecodeVideoKHR/CmdEndVideoCodingKHR recording.
  3. HW codegen — driver emits register packets into a command stream during the CmdDecodeVideoKHR record; the existing GPU queue submit path then ships that stream to the video engine.

Critical mismatch for our backend: layer 3 does not exist for us. The Hantro VPU has no Mali-side command stream. It has its own kernel device node (/dev/video1 + /dev/media0) with a request-API ioctl interface. So we keep layer 1 verbatim (a huge win: all H.264 SPS/PPS parsing comes free), reuse layer 2's interface contracts, and replace layer 3's command-stream codegen with deferred V4L2 control marshalling plus submit-time VIDIOC_QBUF / poll / VIDIOC_DQBUF.

vk-video-samples simple-test trinity of required extensions:

  • VK_KHR_video_queue (spec v8) — shared base
  • VK_KHR_video_decode_queue (spec v8) — decode-specific commands
  • VK_KHR_video_decode_h264 (spec v9) — H.264 profile

None are advertised in panvk-bifrost r4 today (Mesa 26.0.6 src/panfrost/vulkan/panvk_vX_physical_device.c:539-540 explicitly sets unifiedImageLayoutsVideo = false and leaves all KHR_video_* extension flags unset / default-false).


A. Extension surface

A.1 Where extensions are advertised

The panvk extension table is built by panvk_per_arch(get_physical_device_extensions) in src/panfrost/vulkan/panvk_vX_physical_device.c:35-160. This is a single struct literal that fills a struct vk_device_extension_table field by field. To add the three required extensions we extend the literal, keeping its alphabetical sort within the KHR_ entries:

.KHR_video_decode_h264   = true,   /* gated on hantro probe success */
.KHR_video_decode_queue  = true,
.KHR_video_queue         = true,

The natural insertion point is between .KHR_vertex_attribute_divisor = true, (line ~123) and .KHR_vulkan_memory_model = true, (line ~124).

Anv reference for comparison: src/intel/vulkan/anv_physical_device.c:262-274:

.KHR_video_queue                       = video_decode_enabled || video_encode_enabled,
.KHR_video_decode_queue                = video_decode_enabled,
.KHR_video_decode_h264                 = VIDEO_CODEC_H264DEC && video_decode_enabled,

where video_decode_enabled is device->instance->debug & ANV_DEBUG_VIDEO_DECODE (anv_physical_device.c:153). Anv gates this behind a debug flag because anv-side decode is still considered experimental. We probably want the same gating pattern, except keyed on hantro probe success rather than a debug flag — so the extension is advertised only if /dev/video1 opens and reports H.264 OUTPUT format support.

A.2 Feature struct fields

vk-video-samples simple-test requires VK_KHR_video_queue and friends advertised. The strictly-required feature struct fields are:

  • VkPhysicalDeviceVideoMaintenance1FeaturesKHR::videoMaintenance1: only if we advertise KHR_video_maintenance1. For Phase 1, the simple-test does NOT require maintenance1 (confirmed by reading the test harness expectations). Skip in Phase 1.
  • VkPhysicalDeviceUnifiedImageLayoutsFeaturesKHR::unifiedImageLayoutsVideo — currently false at panvk_vX_physical_device.c:540. Stays false for Phase 1 (transition rules still apply).

The shared vk_video_session struct (vk_video.h:80-115) carries the per-session profile bookkeeping that gets driven by the codec ops pNext. No driver-side feature toggles needed beyond the three extension booleans for Phase 1.

A.3 vkGetPhysicalDeviceVideoCapabilitiesKHR routing

This is a direct driver entrypoint — there is no vk_common_GetPhysicalDeviceVideoCapabilitiesKHR in src/vulkan/runtime/. Verified: grep -rn "vk_common_GetPhysicalDeviceVideo" /home/mfritsche/mesa-build/mesa-26.0.6/src/ returns no hits.

Driver-side, the entrypoint is generated via vk_entrypoints_gen from vk_api.xml (per panvk/vulkan/meson.build:7-19). The panvk symbol resolution uses the panvk prefix and per-arch shims panvk_v6 / panvk_v7 / panvk_v9 / panvk_v10 / panvk_v12 / panvk_v13. So the symbol we need to provide is one of:

  • panvk_GetPhysicalDeviceVideoCapabilitiesKHR (in panvk_physical_device.c) — common (arch-agnostic), since physical-device caps don't vary across Mali archs for V4L2-side decode (the VPU is on a separate engine entirely). Recommended.
  • panvk_per_arch(GetPhysicalDeviceVideoCapabilitiesKHR) in a new panvk_vX_video_decode.c — only needed if the answer varies per arch, which it doesn't here.

Reference shape from anv (anv_video.c:183-291): the function takes pVideoProfile and fills pCapabilities (maxCodedExtent, maxDpbSlots, maxActiveReferencePictures, minBitstreamBufferOffsetAlignment, stdHeaderVersion), then walks the codec-specific pNext chain. For H.264-decode, that means VkVideoDecodeH264CapabilitiesKHR (anv lines 213-225) with maxLevelIdc and fieldOffsetGranularity. Also fills VkVideoDecodeCapabilitiesKHR::flags = VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR (anv line 205) — which is what we'll want too, because the Hantro CAPTURE buffers ARE the DPB (no separate scratch).

The hantro driver's real limits (4K H.264 decode confirmed on RK3566) drive these values; we want to be conservative for Phase 1 and use maxCodedExtent = 1920x1088, maxDpbSlots = 17 (one more than the 16 reference frames H.264 allows; matches ANV_VIDEO_H264_MAX_DPB_SLOTS at anv_private.h:6581), maxActiveReferencePictures = 16.
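The 1088 in maxCodedExtent is 1080 rounded up to the 16-pixel H.264 macroblock size. A minimal sketch of that arithmetic follows; struct demo_video_caps and both function names are hypothetical stand-ins for illustration, not the Vulkan capability struct.

```c
#include <stdint.h>

#define DEMO_H264_MB_SIZE 16u   /* H.264 macroblocks are 16x16 luma samples */

/* Round a pixel dimension up to the next macroblock boundary. */
static inline uint32_t
demo_align_to_mb(uint32_t px)
{
   return (px + DEMO_H264_MB_SIZE - 1) & ~(DEMO_H264_MB_SIZE - 1);
}

/* Hypothetical stand-in for the handful of capability values discussed. */
struct demo_video_caps {
   uint32_t max_width;
   uint32_t max_height;
   uint32_t max_dpb_slots;
   uint32_t max_active_refs;
};

static struct demo_video_caps
demo_phase1_caps(void)
{
   struct demo_video_caps caps = {
      .max_width = demo_align_to_mb(1920),
      .max_height = demo_align_to_mb(1080),
      .max_dpb_slots = 17,     /* 16 references + the picture being decoded */
      .max_active_refs = 16,
   };
   return caps;
}
```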

A.4 vkGetPhysicalDeviceVideoFormatPropertiesKHR routing

Same routing pattern as A.3 — direct driver entrypoint, no shared common path. Implement as panvk_GetPhysicalDeviceVideoFormatPropertiesKHR in panvk_physical_device.c.

Reference shape from anv (anv_video.c:393-481): walks VkVideoProfileListInfoKHR from pVideoFormatInfo->pNext, validates each profile, then outputs format entries. For H.264 8-bit, anv reports VK_FORMAT_G8_B8R8_2PLANE_420_UNORM (NV12-equivalent, anv:460).

This is exactly what we need. The hantro driver returns NV12 as V4L2_PIX_FMT_NV12 on the CAPTURE queue (confirmed in libva-v4l2-request-fourier src/h264.c and via v4l2_find_format calls in src/request.c:864-865 showing format-probe pattern). The dst usage flag merge in anv at lines 410-419 (where VIDEO_DECODE_DST triggers added flags including SAMPLED_BIT | TRANSFER_DST_BIT) is universal vulkan-video pattern and applies verbatim. Set:

  • format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM (NV12)
  • imageType = VK_IMAGE_TYPE_2D
  • imageTiling = VK_IMAGE_TILING_OPTIMAL — but see G.2 below about how the underlying memory comes from V4L2, so this is a "logical" tiling decision
  • imageUsageFlags = VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_SRC_BIT
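  • imageCreateFlags = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT | VK_IMAGE_CREATE_EXTENDED_USAGE_BIT, plus VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR only once KHR_video_maintenance1 is advertised (that bit is defined by the maintenance1 extension, which A.2 skips for Phase 1)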

B. Queue family registration

B.1 Current state (r4)

src/panfrost/vulkan/panvk_device.h:46-48:

enum panvk_queue_family {
   PANVK_QUEUE_FAMILY_GPU,
   PANVK_QUEUE_FAMILY_BIND,
   PANVK_QUEUE_FAMILY_COUNT,
};

Queue-family-properties query at panvk_physical_device.c:557-595:

[PANVK_QUEUE_FAMILY_GPU] = {
   .queueFlags = VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT,
   ...
},
[PANVK_QUEUE_FAMILY_BIND] = {
   .queueFlags = VK_QUEUE_SPARSE_BINDING_BIT,
   .queueCount = 1,
},

Queue dispatch in panvk_vX_device.c:

  • line 253-258 — panvk_queue_check_status switches on queue->queue_family_index to call gpu_queue_check_status or bind_queue_check_status
  • line 269 — panvk_device_check_status iterates for (uint32_t qfi = 0; qfi < PANVK_QUEUE_FAMILY_COUNT; qfi++)
  • line 305-313 — panvk_queue_create switches on create_info->queueFamilyIndex to dispatch to panvk_per_arch(create_gpu_queue) or panvk_per_arch(create_bind_queue)
  • line 320-329 — panvk_queue_destroy symmetric
  • line 546-561 — panvk_per_arch(create_device) iterates pCreateInfo->queueCreateInfoCount, calls panvk_queue_create for each

B.2 What to add

Add a third enum value PANVK_QUEUE_FAMILY_VIDEO_DECODE. Slot ordering does not matter to applications: Vulkan apps discover queue families by iterating the reported properties, and the test client typically looks for VK_QUEUE_VIDEO_DECODE_BIT_KHR rather than a fixed index. The index value is opaque, so appending at the end is safe.

enum panvk_queue_family {
   PANVK_QUEUE_FAMILY_GPU,
   PANVK_QUEUE_FAMILY_BIND,
   PANVK_QUEUE_FAMILY_VIDEO_DECODE,   /* NEW */
   PANVK_QUEUE_FAMILY_COUNT,
};

Then in panvk_physical_device.c:557-595 extend the props table:

[PANVK_QUEUE_FAMILY_VIDEO_DECODE] = {
   .queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT,
   .queueCount = 1,
   .minImageTransferGranularity = {1, 1, 1},   /* match VPU mb alignment if needed */
},
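To illustrate why the family index is opaque to clients, here is a small sketch of the lookup a test client typically performs. The flag values mirror VkQueueFlagBits, but the DEMO_ names and demo_find_queue_family are hypothetical and exist only so the example compiles without the Vulkan headers.

```c
#include <stdint.h>

/* Same numeric values as the corresponding VkQueueFlagBits entries. */
#define DEMO_QUEUE_GRAPHICS_BIT       0x00000001u
#define DEMO_QUEUE_COMPUTE_BIT        0x00000002u
#define DEMO_QUEUE_TRANSFER_BIT       0x00000004u
#define DEMO_QUEUE_SPARSE_BINDING_BIT 0x00000008u
#define DEMO_QUEUE_VIDEO_DECODE_BIT   0x00000020u

/* Return the index of the first family advertising every bit in `wanted`,
 * or -1 if no family matches. The caller never hard-codes an index. */
static int
demo_find_queue_family(const uint32_t *family_flags, uint32_t count,
                       uint32_t wanted)
{
   for (uint32_t i = 0; i < count; i++) {
      if ((family_flags[i] & wanted) == wanted)
         return (int)i;
   }
   return -1;
}
```

Because the client selects by flag, the decode family can sit at index 2 today and move later without breaking it.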

Anv reference for this pattern: src/intel/vulkan/anv_physical_device.c:2556-2576 (queue-family-init writing flags onto pdevice->queue.families[family_count++]). Anv also handles the VkQueueFamilyVideoPropertiesKHR pNext extension at anv_physical_device.c:3012-3030:

case VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR: {
   VkQueueFamilyVideoPropertiesKHR *prop = ...;
   if (queue_family->queueFlags & VK_QUEUE_VIDEO_DECODE_BIT_KHR) {
      prop->videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR | ...;
   }
}

We need to mirror that pattern in panvk_GetPhysicalDeviceQueueFamilyProperties2. Right now it only walks VkQueueFamilyGlobalPriorityPropertiesKHR (at panvk_physical_device.c:589). Add a pNext walk for VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR and fill videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR. Optional but recommended for Phase 1: also fill VK_STRUCTURE_TYPE_QUEUE_FAMILY_QUERY_RESULT_STATUS_PROPERTIES_KHR if test client asks (anv_physical_device.c:3007-3011).

B.3 Queue identification at queue_create time

Driver dispatches at panvk_vX_device.c:305-313 via panvk_queue_create. Extend the switch:

case PANVK_QUEUE_FAMILY_VIDEO_DECODE:
   return panvk_per_arch(create_video_decode_queue)(
      dev, create_info, queue_idx, out_queue);

And similarly extend panvk_queue_destroy (line 320-329) and panvk_queue_check_status (line 253-258).

For check_global_priority at panvk_vX_device.c:218-247, the video decode family gets a new case that either returns VK_SUCCESS for any priority (since the V4L2 device doesn't expose priority semantics) or accepts only VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_KHR, as BIND does.

B.4 V4L2 submit path — clean hook into queue infrastructure

The existing vk_queue has a driver_submit callback (set in jm/panvk_vX_gpu_queue.c:359: queue->vk.driver_submit = panvk_per_arch(gpu_queue_submit);). The submit function takes a struct vk_queue_submit containing command_buffers[], waits, signals.

For our V4L2 queue, the analog is: queue->vk.driver_submit = panvk_per_arch(video_decode_queue_submit); and the implementation does NOT touch Mali — it walks the cmdbuf's recorded V4L2 ops and dispatches each:

for each panvk_video_decode_op in cmdbuf->video_decode_ops:
    media_request_reinit(op->request_fd)         /* libva-v4l2-request-fourier media.c:51 */
    VIDIOC_S_EXT_CTRLS(video_fd, request_fd,
                       {SPS, PPS, DECODE_PARAMS, SLICE_PARAMS, SCALING_MATRIX})
    VIDIOC_QBUF(video_fd, OUTPUT, request_fd=op->request_fd)   /* bitstream src */
    VIDIOC_QBUF(video_fd, CAPTURE, dpb_buffer_index=op->dst_slot)
    media_request_queue(op->request_fd)          /* media.c:65 */
    poll(request_fd, POLLPRI, timeout)           /* media.c:79 */
    VIDIOC_DQBUF(video_fd, OUTPUT)
    VIDIOC_DQBUF(video_fd, CAPTURE)

The waits/signals from vk_queue_submit need to map to syncobj waits before we VIDIOC_QBUF, and a syncobj signal after the POLL completes. For Phase 1 (a single submit with no other GPU work in the queue), we can ignore semaphores and just use a syncobj that signals on DQBUF completion.

vk_queue_init (panvk_vX_gpu_queue.c:348) is the entry point; we'd reuse the same pattern for create_video_decode_queue. Allocate a struct panvk_video_decode_queue { struct vk_queue vk; int video_fd; int media_fd; ... } and stash the fds.


C. Session object lifecycle (VkVideoSessionKHR)

C.1 What CreateVideoSession allocates

Anv reference at src/intel/vulkan/anv_video.c:31-55:

struct anv_video_session *vid = vk_alloc2(...);
memset(vid, 0, sizeof(*vid));
VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
*pVideoSession = anv_video_session_to_handle(vid);

That's it. The heavy lifting is in vk_video_session_init (src/vulkan/runtime/vk_video.c:33-128), which fills:

  • vid->op (VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR etc.)
  • vid->max_coded, picture_format, ref_format, max_dpb_slots, max_active_ref_pics
  • vid->h264.profile_idc from the VkVideoDecodeH264ProfileInfoKHR pNext (lines 51-57)

The driver-specific anv_video_session struct (anv_private.h:6688-6727) adds backend-specific per-stream state: cdf_initialized (for AV1), vid_mem[ANV_VID_MEM_AV1_MAX] (private memory bindings for codec scratch).

C.2 Memory binding via vkBindVideoSessionMemoryKHR

Anv reference at anv_video.c:914-998 for GetVideoSessionMemoryRequirements and anv_video.c:972-1000 for BindVideoSessionMemory. The mem_idx enums for H.264 (anv_private.h:6588-6593):

enum anv_vid_mem_h264_types {
   ANV_VID_MEM_H264_INTRA_ROW_STORE,
   ANV_VID_MEM_H264_DEBLOCK_FILTER_ROW_STORE,
   ANV_VID_MEM_H264_BSD_MPC_ROW_SCRATCH,
   ANV_VID_MEM_H264_MPR_ROW_SCRATCH,
   ANV_VID_MEM_H264_MAX,
};

These are scratch buffers the Intel HCP/MFX engines need. The sizes are computed in get_h264_video_mem_size (anv_video.c:483-501) as multiples of width-in-MBs.

BindVideoSessionMemory (anv lines 972-998) is just bookkeeping: it copies each VkBindVideoSessionMemoryInfoKHR into vid->vid_mem[bind_index] (struct anv_vid_mem { anv_device_memory *mem; offset; size; } at anv_private.h:6572-6576).

C.3 For our V4L2 backend

Massive simplification opportunity: the Hantro VPU does NOT require driver-allocated scratch buffers — all scratch is internal to the VPU and managed by the kernel driver. So GetVideoSessionMemoryRequirements can return zero entries (*pVideoSessionMemoryRequirementsCount = 0), and BindVideoSessionMemory becomes a no-op (just return VK_SUCCESS;).

What CreateVideoSession DOES need to allocate, V4L2-side:

  1. Open /dev/video1 and /dev/media0 if not already held by the device (see J.1 for ownership decision).
  2. VIDIOC_S_FMT on the OUTPUT queue: V4L2_PIX_FMT_H264_SLICE (note: hantro is slice-stateless), based on vid->h264.profile_idc and vid->max_coded. See libva-v4l2-request-fourier src/h264.c:699-738 for the control-set pattern.
  3. VIDIOC_S_FMT on the CAPTURE queue: V4L2_PIX_FMT_NV12, dimensions from vid->max_coded.
  4. Allocate request_fd pool: pre-allocate N request fds (one per DPB slot + outstanding submits) via MEDIA_IOC_REQUEST_ALLOC ioctls (media.c:41).
  5. VIDIOC_REQBUFS on OUTPUT + CAPTURE queues to set up buffer count.

So panvk_video_session struct shape:

struct panvk_video_session {
   struct vk_video_session vk;       /* shared base */
   int video_fd;                     /* may share with physical_device */
   int media_fd;                     /* may share with physical_device */
   /* per-session V4L2 state */
   uint32_t bitstream_buffer_count;
   uint32_t capture_buffer_count;
   struct {
      int request_fd;
      bool in_use;
      uint32_t dpb_slot;
   } request_pool[MAX_OUTSTANDING_DECODES];
};
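The request_pool array above implies some acquire/release bookkeeping. The following is a minimal sketch of one way to do it, assuming a pool depth of 4 and a first-free policy; every demo_ name is hypothetical, and real request fds would come from MEDIA_IOC_REQUEST_ALLOC rather than the plain integers used here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_OUTSTANDING_DECODES 4   /* assumed pool depth */

struct demo_request_slot {
   int request_fd;
   bool in_use;
   uint32_t dpb_slot;
};

struct demo_request_pool {
   struct demo_request_slot slots[DEMO_MAX_OUTSTANDING_DECODES];
};

/* Adopt pre-allocated request fds and mark every slot free. */
static void
demo_pool_init(struct demo_request_pool *pool, const int *request_fds)
{
   for (size_t i = 0; i < DEMO_MAX_OUTSTANDING_DECODES; i++) {
      pool->slots[i].request_fd = request_fds[i];
      pool->slots[i].in_use = false;
      pool->slots[i].dpb_slot = UINT32_MAX;
   }
}

/* Hand out the first free slot, or NULL when all requests are in flight. */
static struct demo_request_slot *
demo_pool_acquire(struct demo_request_pool *pool, uint32_t dpb_slot)
{
   for (size_t i = 0; i < DEMO_MAX_OUTSTANDING_DECODES; i++) {
      struct demo_request_slot *slot = &pool->slots[i];
      if (!slot->in_use) {
         slot->in_use = true;
         slot->dpb_slot = dpb_slot;
         return slot;
      }
   }
   return NULL;
}

/* Return a slot to the pool once its decode has been dequeued. */
static void
demo_pool_release(struct demo_request_slot *slot)
{
   slot->in_use = false;
   slot->dpb_slot = UINT32_MAX;
}
```

A NULL return from the acquire step is the natural point for the submit path to wait on an outstanding decode before retrying.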

C.4 Anv session creation shape — full reference

VkResult anv_CreateVideoSessionKHR(VkDevice _device,
                                   const VkVideoSessionCreateInfoKHR *pCreateInfo,
                                   const VkAllocationCallbacks *pAllocator,
                                   VkVideoSessionKHR *pVideoSession)
/* anv_video.c:31-55 */
{
   ANV_FROM_HANDLE(anv_device, device, _device);
   struct anv_video_session *vid = vk_alloc2(..., sizeof(*vid), 8, OBJECT);
   if (!vid) return vk_error(device, VK_ERROR_OUT_OF_HOST_MEMORY);
   memset(vid, 0, sizeof(*vid));
   VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
   if (result != VK_SUCCESS) { vk_free2(..., vid); return result; }
   *pVideoSession = anv_video_session_to_handle(vid);
   return VK_SUCCESS;
}

For us, the body grows by ~15-30 lines for V4L2 setup (open fds, S_FMT, REQBUFS, request_fd pool init) and adds error-rollback paths.


D. Parameters object lifecycle (VkVideoSessionParametersKHR)

D.1 The shared layer does almost everything

src/vulkan/runtime/vk_video.c:845-885 defines:

  • vk_common_CreateVideoSessionParametersKHR (line 846-862)
  • vk_common_UpdateVideoSessionParametersKHR (line 865-872)
  • vk_common_DestroyVideoSessionParametersKHR (line 875-885)

These delegate to:

  • vk_video_session_parameters_create (helper at vk_video.c:480 — alloc + dispatch by codec op)
  • vk_video_session_parameters_update (line 793-844 — switches on params->op and calls update_h264_dec_session_parameters at line 692 which does the actual SPS/PPS array merge with seq_parameter_set_id collision detection per the spec)
  • vk_video_session_parameters_destroy

Key question: do panvk-bifrost entrypoints get auto-wired to the vk_common_* versions, or does the driver need to opt in?

Mesa's entrypoint generator (vk_entrypoints_gen.py) wires shared-helper entrypoints by default unless the driver provides a stronger symbol. So if panvk does NOT define panvk_CreateVideoSessionParametersKHR, the entrypoint falls through to vk_common_CreateVideoSessionParametersKHR. Confirmed by anv comparison: anv defines neither anv_CreateVideoSessionParametersKHR nor anv_UpdateVideoSessionParametersKHR; both come from vk_common_*.

radv DOES override (radv_video.c:630-647) but only to call radv_video_patch_session_parameters for an AMD-specific fixup. For Phase 1 we don't need that.

Decision: rely entirely on vk_common. Zero driver code for parameters object lifecycle.
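The fall-through behaviour can be pictured as two layered entrypoint tables: driver entries are installed first, and the common entries only fill slots that are still empty. The sketch below is a toy model of that idea, not Mesa's actual generated code; all demo_ names and the two-entry table are assumptions made for illustration.

```c
#include <stddef.h>

typedef int (*demo_entrypoint)(void);

enum demo_ep {
   DEMO_EP_CREATE_VIDEO_SESSION,
   DEMO_EP_CREATE_VIDEO_SESSION_PARAMETERS,
   DEMO_EP_COUNT,
};

struct demo_dispatch_table {
   demo_entrypoint ep[DEMO_EP_COUNT];
};

/* Copy non-NULL entries from `src`; without `overwrite`, only empty slots
 * are filled, so earlier (driver) entries win. */
static void
demo_table_from_entrypoints(struct demo_dispatch_table *table,
                            const demo_entrypoint *src, int overwrite)
{
   for (size_t i = 0; i < DEMO_EP_COUNT; i++) {
      if (src[i] == NULL)
         continue;
      if (overwrite || table->ep[i] == NULL)
         table->ep[i] = src[i];
   }
}

/* The driver implements session creation; the common layer implements the
 * parameters object. Return values just identify who was called. */
static int demo_driver_create_video_session(void) { return 1; }
static int demo_common_create_video_session_parameters(void) { return 2; }

static struct demo_dispatch_table
demo_build_table(void)
{
   static const demo_entrypoint driver[DEMO_EP_COUNT] = {
      [DEMO_EP_CREATE_VIDEO_SESSION] = demo_driver_create_video_session,
   };
   static const demo_entrypoint common[DEMO_EP_COUNT] = {
      [DEMO_EP_CREATE_VIDEO_SESSION_PARAMETERS] =
         demo_common_create_video_session_parameters,
   };
   struct demo_dispatch_table table = {{NULL}};

   demo_table_from_entrypoints(&table, driver, 1);
   demo_table_from_entrypoints(&table, common, 0);
   return table;
}
```

In this model, adding a driver-side parameters entrypoint later (as radv does) would simply occupy the slot before the common layer is consulted.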

D.2 Parameters → V4L2 control conversion happens at CmdDecodeVideo time, not at parameter creation

The shared parameters struct (vk_video.h:127-195) for H.264-decode stores SPS array of struct vk_video_h264_sps (which embeds StdVideoH264SequenceParameterSet base) and PPS array of struct vk_video_h264_pps (which embeds StdVideoH264PictureParameterSet base). The lookup helpers vk_video_find_h264_dec_std_sps(params, id) and vk_video_find_h264_dec_std_pps(params, id) (vk_video.c:1186-1198) are what we call at decode time to get the SPS/PPS for the current frame.

The V4L2-side bridge from StdVideoH264SequenceParameterSet to struct v4l2_ctrl_h264_sps is the same conversion fourier does. See libva-v4l2-request-fourier/src/h264.c:360 for h264_va_picture_to_v4l2, which marshals to struct v4l2_ctrl_h264_decode_params, v4l2_ctrl_h264_pps, and v4l2_ctrl_h264_sps, except that the source format on our side is StdVideoH264* instead of VAPictureParameterBufferH264. The field-name mapping is essentially identical because both VAPictureParameterBufferH264 and StdVideoH264SequenceParameterSet ultimately derive from the H.264 spec's syntax element names.

We will write panvk_h264_std_sps_to_v4l2(const StdVideoH264SequenceParameterSet *std, struct v4l2_ctrl_h264_sps *out) etc. as a new helper file (~150 lines per codec). This is the bridge function that has no Mesa precedent — it's our novel contribution.
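As a rough picture of what such a bridge looks like, the sketch below copies a handful of SPS syntax elements between two reduced stand-in structs. Both structs, the flag value, and all demo_ names are hypothetical; they are not the real StdVideoH264SequenceParameterSet or struct v4l2_ctrl_h264_sps, which carry many more fields. One known wrinkle the stand-in sidesteps: the real Std struct stores level_idc as an enum rather than the raw H.264 number, so a real bridge needs an extra translation step there.

```c
#include <stdint.h>
#include <string.h>

/* Reduced, hypothetical stand-in for the Vulkan-side SPS. */
struct demo_std_h264_sps {
   uint8_t  seq_parameter_set_id;
   uint8_t  profile_idc;                 /* raw value, e.g. 100 for High */
   uint8_t  level_idc;                   /* raw value, e.g. 40 for 4.0 */
   uint8_t  chroma_format_idc;
   uint8_t  log2_max_frame_num_minus4;
   uint8_t  pic_order_cnt_type;
   uint8_t  max_num_ref_frames;
   uint8_t  frame_mbs_only_flag;
   uint32_t pic_width_in_mbs_minus1;
   uint32_t pic_height_in_map_units_minus1;
};

#define DEMO_SPS_FLAG_FRAME_MBS_ONLY 0x10u   /* demo bit, see lead-in */

/* Reduced, hypothetical stand-in for the V4L2-side control payload. */
struct demo_v4l2_h264_sps {
   uint8_t  profile_idc;
   uint8_t  level_idc;
   uint8_t  seq_parameter_set_id;
   uint8_t  chroma_format_idc;
   uint8_t  log2_max_frame_num_minus4;
   uint8_t  pic_order_cnt_type;
   uint8_t  max_num_ref_frames;
   uint16_t pic_width_in_mbs_minus1;
   uint16_t pic_height_in_map_units_minus1;
   uint32_t flags;
};

/* Field-by-field copy; bitfield-style booleans collapse into `flags`. */
static void
demo_h264_std_sps_to_v4l2(const struct demo_std_h264_sps *std,
                          struct demo_v4l2_h264_sps *out)
{
   memset(out, 0, sizeof(*out));
   out->profile_idc = std->profile_idc;
   out->level_idc = std->level_idc;
   out->seq_parameter_set_id = std->seq_parameter_set_id;
   out->chroma_format_idc = std->chroma_format_idc;
   out->log2_max_frame_num_minus4 = std->log2_max_frame_num_minus4;
   out->pic_order_cnt_type = std->pic_order_cnt_type;
   out->max_num_ref_frames = std->max_num_ref_frames;
   out->pic_width_in_mbs_minus1 = (uint16_t)std->pic_width_in_mbs_minus1;
   out->pic_height_in_map_units_minus1 =
      (uint16_t)std->pic_height_in_map_units_minus1;
   if (std->frame_mbs_only_flag)
      out->flags |= DEMO_SPS_FLAG_FRAME_MBS_ONLY;
}

/* Coded luma width in pixels implied by the SPS. */
static uint32_t
demo_sps_coded_width(const struct demo_v4l2_h264_sps *sps)
{
   return ((uint32_t)sps->pic_width_in_mbs_minus1 + 1) * 16;
}
```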

D.3 Hooking the parameters cache to ext-control structs at decode time

At CmdDecodeVideoKHR recording time, we retrieve the relevant StdVideoH264SequenceParameterSet * and StdVideoH264PictureParameterSet * via vk_video_get_h264_parameters (vk_video.h:419-425). The signature:

void vk_video_get_h264_parameters(const struct vk_video_session *session,
                                  const struct vk_video_session_parameters *params,
                                  const VkVideoDecodeInfoKHR *decode_info,
                                  const VkVideoDecodeH264PictureInfoKHR *h264_pic_info,
                                  const StdVideoH264SequenceParameterSet **sps_p,
                                  const StdVideoH264PictureParameterSet **pps_p);

Anv uses this at genX_cmd_video.c:904 in anv_h264_decode_video. We do the same.


E. vkCmdDecodeVideoKHR command recording

E.1 What anv emits at record time vs submit time

Crucial finding: anv does ALL work at record time. By the time the cmdbuf goes to the queue, the command stream is fully baked. Look at anv_h264_decode_video (genX_cmd_video.c:892-1300+): every anv_batch_emit(&cmd_buffer->batch, GENX(MFX_PIPE_MODE_SELECT), sel) etc. is a register/packet write into the cmd_buffer's batch buffer. Submit time just kicks the batch.

The Begin/End wrappers are thin:

  • CmdBeginVideoCodingKHR (genX_cmd_video.c:31-50): stashes cmd_buffer->video.vid = vid; cmd_buffer->video.params = params; into command-buffer-local state. That's it for H.264 (AV1 adds CDF table init).
  • CmdControlVideoCodingKHR (genX_cmd_video.c:52-74): if RESET flag, emit MI_FLUSH_DW with VideoPipelineCacheInvalidate = 1.
  • CmdEndVideoCodingKHR (genX_cmd_video.c:76-83): clears cmd_buffer->video.vid = NULL; cmd_buffer->video.params = NULL;.

The cmd_buffer->video shadow state (anv_private.h:4935-4938):

struct {
   struct anv_video_session *vid;
   struct vk_video_session_parameters *params;
} video;

E.2 For our V4L2 backend — "deferred record"

The V4L2 ioctls cannot meaningfully happen at record time, because:

  1. The bitstream buffer (frame_info->srcBuffer) is a VkBuffer we don't necessarily know the contents of yet (might be filled by a prior submitted cmdbuf or by host writes between record and submit).
  2. Request_fd allocation and S_EXT_CTRLS need to be sequential per submit (cannot pre-bind a request_fd to a recorded cmdbuf and reuse it).

Pattern: per-cmdbuf list of "video decode ops" recorded during CmdDecodeVideoKHR. The op captures everything we need to replay at submit time:

struct panvk_video_decode_op {
   /* From CmdBegin */
   struct panvk_video_session *session;
   struct vk_video_session_parameters *params;
   /* From CmdDecode */
   VkBuffer src_buffer;        /* bitstream source */
   VkDeviceSize src_offset;
   VkDeviceSize src_size;
   /* DPB target */
   struct panvk_image_view *dst_iv;
   uint32_t dst_dpb_slot;
   /* Already-resolved SPS/PPS pointers (cheap copy by value) */
   StdVideoH264SequenceParameterSet sps;
   StdVideoH264PictureParameterSet pps;
   /* H.264 slice info, picked apart at submit time */
   StdVideoDecodeH264PictureInfo std_pic_info;
   /* Reference slot info — small array, copy by value */
   uint32_t reference_slot_count;
   struct panvk_video_ref_slot reference_slots[16];
};

struct panvk_cmd_buffer {
   ...
   struct util_dynarray video_decode_ops;   /* of struct panvk_video_decode_op */
};

Then submit-time (per B.4) walks the dynarray and does the ioctl dance per op.
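A minimal sketch of the record-then-replay shape, assuming a plain realloc-grown array in place of util_dynarray and a reduced op struct. All demo_ names are hypothetical, and the replay callback stands in for the per-op ioctl sequence from B.4.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for struct panvk_video_decode_op. */
struct demo_decode_op {
   uint64_t src_offset;
   uint64_t src_size;
   uint32_t dst_dpb_slot;
};

struct demo_cmdbuf {
   struct demo_decode_op *ops;
   size_t count;
   size_t capacity;
};

/* Record time: copy the op by value; no device access happens here. */
static int
demo_record_decode(struct demo_cmdbuf *cb, struct demo_decode_op op)
{
   if (cb->count == cb->capacity) {
      size_t cap = cb->capacity ? cb->capacity * 2 : 4;
      void *grown = realloc(cb->ops, cap * sizeof(*cb->ops));
      if (grown == NULL)
         return -1;
      cb->ops = grown;
      cb->capacity = cap;
   }
   cb->ops[cb->count++] = op;
   return 0;
}

typedef int (*demo_submit_fn)(const struct demo_decode_op *op, void *data);

/* Submit time: walk the ops in recording order, stopping at the first
 * failure so the caller can report it. */
static int
demo_replay(const struct demo_cmdbuf *cb, demo_submit_fn submit, void *data)
{
   for (size_t i = 0; i < cb->count; i++) {
      int ret = submit(&cb->ops[i], data);
      if (ret != 0)
         return ret;
   }
   return 0;
}

static void
demo_cmdbuf_reset(struct demo_cmdbuf *cb)
{
   free(cb->ops);
   cb->ops = NULL;
   cb->count = 0;
   cb->capacity = 0;
}

/* Example callback: total up the bitstream bytes that would be queued. */
static int
demo_sum_bytes(const struct demo_decode_op *op, void *data)
{
   *(uint64_t *)data += op->src_size;
   return 0;
}
```

Copying ops by value keeps the recorded command buffer valid even if the application frees its VkVideoDecodeInfoKHR structures right after recording.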

Comparable record-time op-list pattern exists today for sparse binds (panvk_sparse.c). Anv stores per-cmdbuf state in cmd_buffer->video but doesn't queue up ops because it emits direct register packets. We're doing what anv would do if anv ran on a separate kernel device.

E.3 CmdBegin/Control/End for our backend

  • panvk_per_arch(CmdBeginVideoCodingKHR): stash cmd_buffer->video_decode_session = vid; cmd_buffer->video_decode_params = params;. Optionally validate that the reference slot layout matches the dpb_slot count we set up at session init.
  • panvk_per_arch(CmdControlVideoCodingKHR) for VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR: this needs to translate to MEDIA_REQUEST_IOC_REINIT on all pooled request_fds — OR just mark a session-wide flag "next decode needs fresh request setup". Phase 1 we can no-op this if we always reinit per submit anyway.
  • panvk_per_arch(CmdEndVideoCodingKHR): clear shadow state. No emission needed.

F. DPB management

F.1 Vulkan-side DPB model

Per-frame vkCmdDecodeVideoKHR receives:

  • frame_info->dstPictureResource: VkVideoPictureResourceInfoKHR { codedOffset, codedExtent, baseArrayLayer, imageViewBinding }. The image view that will receive the decoded output.
  • frame_info->pSetupReferenceSlot: VkVideoReferenceSlotInfoKHR { slotIndex, pPictureResource }. Says "this decoded frame becomes DPB slot N".
  • frame_info->pReferenceSlots[] — references TO read from. Each carries slotIndex + pPictureResource.

For H.264, additionally:

  • pNext chain VkVideoDecodeH264PictureInfoKHR { pStdPictureInfo, sliceCount, pSliceOffsets }
  • DPB slot pNext per reference: VkVideoDecodeH264DpbSlotInfoKHR { pStdReferenceInfo } — contains POC/short-term/long-term flags.

Anv's reference assembly logic at genX_cmd_video.c:992-1004:

for (unsigned i = 0; i < frame_info->referenceSlotCount; i++) {
   const struct anv_image_view *ref_iv = anv_image_view_from_handle(
      frame_info->pReferenceSlots[i].pPictureResource->imageViewBinding);
   int idx = frame_info->pReferenceSlots[i].slotIndex;
   ...
   dpb_slots[idx] = i;
   buf.ReferencePictureAddress[i] = anv_image_dpb_address(ref_iv, baseArrayLayer);
}

F.2 V4L2 DPB model

v4l2_ctrl_h264_decode_params::dpb[16] is an array of struct v4l2_h264_dpb_entry { reference_ts, pic_num, frame_num, fields, flags, top_field_order_cnt, bottom_field_order_cnt }. Each entry's reference_ts is the timestamp used at VIDIOC_QBUF of the OUTPUT (bitstream) plane when that reference was decoded — V4L2 uses this as the "buffer identity" key.

So the mapping rule from Vulkan-side VkVideoReferenceSlotInfoKHR[] to V4L2-side dpb[16] is:

| Vulkan field | V4L2 dpb field | How to source |
|---|---|---|
| pReferenceSlots[i].slotIndex | array index in dpb[] | direct (assert < 16, since dpb[] has 16 entries) |
| pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[0] | top_field_order_cnt | direct |
| pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[1] | bottom_field_order_cnt | direct |
| pReferenceSlots[i].pNext->pStdReferenceInfo->FrameNum | frame_num | direct |
| short-term/long-term flag | flags | direct |
| (the decoded output VkImage backing the ref slot) | reference_ts | lookup: we maintain a slotIndex → reference_ts map per-session, populated each time we decode into that slot. See libva-v4l2-request-fourier src/h264.c:140-218 for dpb_insert/dpb_update/dpb_find_entry. Our case is simpler: slotIndex is provided by Vulkan, we just need to track "what ts did I QBUF when I last decoded into slotIndex N". |

The fourier src/h264.c:238-353 h264_fill_dpb function is the closest analog — it constructs struct v4l2_h264_dpb_entry[] from libva-side state. We do the analog but feed it from pReferenceSlots[].
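The slotIndex → reference_ts bookkeeping can be sketched as below. The structs are reduced, hypothetical stand-ins (not struct v4l2_h264_dpb_entry or the Vulkan reference-slot structs), and the timestamp helper assumes the usual seconds/microseconds to nanoseconds conversion applied to the timeval passed at QBUF.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_DPB_SIZE 16

/* Reduced stand-in for one V4L2 DPB entry. */
struct demo_dpb_entry {
   uint64_t reference_ts;
   uint32_t frame_num;
   int32_t  top_field_order_cnt;
   int32_t  bottom_field_order_cnt;
   bool     active;
};

/* Reduced stand-in for one Vulkan reference slot plus its H.264 info. */
struct demo_ref_slot {
   uint32_t slot_index;
   uint32_t frame_num;
   int32_t  pic_order_cnt[2];
};

/* Per-session map: which timestamp was queued when each slot was written. */
struct demo_session_dpb {
   uint64_t slot_ts[DEMO_DPB_SIZE];
   bool     slot_valid[DEMO_DPB_SIZE];
};

/* Convert a QBUF timeval (seconds, microseconds) into nanoseconds. */
static uint64_t
demo_timeval_to_ns(uint64_t sec, uint64_t usec)
{
   return sec * 1000000000ull + usec * 1000ull;
}

/* Call after a decode into `slot` completes. */
static void
demo_note_decode(struct demo_session_dpb *s, uint32_t slot, uint64_t ts_ns)
{
   if (slot < DEMO_DPB_SIZE) {
      s->slot_ts[slot] = ts_ns;
      s->slot_valid[slot] = true;
   }
}

/* Build the dpb[] array for one decode. Returns the number of entries
 * filled, or -1 if a reference names a slot that was never decoded into. */
static int
demo_fill_dpb(const struct demo_session_dpb *s,
              const struct demo_ref_slot *refs, uint32_t ref_count,
              struct demo_dpb_entry out[DEMO_DPB_SIZE])
{
   memset(out, 0, sizeof(out[0]) * DEMO_DPB_SIZE);
   for (uint32_t i = 0; i < ref_count; i++) {
      uint32_t slot = refs[i].slot_index;
      if (slot >= DEMO_DPB_SIZE || !s->slot_valid[slot])
         return -1;
      out[slot].reference_ts = s->slot_ts[slot];
      out[slot].frame_num = refs[i].frame_num;
      out[slot].top_field_order_cnt = refs[i].pic_order_cnt[0];
      out[slot].bottom_field_order_cnt = refs[i].pic_order_cnt[1];
      out[slot].active = true;
   }
   return (int)ref_count;
}
```

Rejecting references to never-decoded slots keeps a stale or zero timestamp from silently pointing the VPU at the wrong CAPTURE buffer.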

F.3 Bookkeeping struct in panvk_video_session

struct panvk_video_session {
   ...
   struct {
      uint64_t reference_ts;         /* timestamp last used when decoding into this slot */
      struct panvk_image *image;     /* the VkImage backing this slot's DPB */
      uint32_t array_layer;
      bool active;
   } dpb[16];
};

Update at decode-completion time (after VIDIOC_DQBUF) for the setup-reference-slot.


G. Memory + dmabuf interop

G.1 The challenge

App creates a VkImage with VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT. Memory is bound via normal vkBindImageMemory. Then the decoded frame data needs to physically end up in that memory backing.

Hantro's CAPTURE queue either allocates its own buffers via VIDIOC_REQBUFS(memory=V4L2_MEMORY_MMAP) or accepts dma_buf imports via VIDIOC_REQBUFS(memory=V4L2_MEMORY_DMABUF). The clean path: the app's VkImage memory backing IS a dma_buf, exported from panvk via vkGetMemoryFdKHR, and we pass that dma_buf fd to VIDIOC_QBUF as the CAPTURE plane.

But Vulkan apps don't usually export memory back to themselves. They expect vkCreateImage(usage=VIDEO_DECODE_DST) to "just work". So we drive the dma_buf flow internally.

G.2 Internal dma_buf flow (proposed)

Two strategies:

Strategy A: Driver-allocated CAPTURE buffers, app-imported into VkImage

  • VIDIOC_REQBUFS(MMAP) at session create.
  • VIDIOC_EXPBUF to get a dma_buf fd per allocated buffer.
  • Import the dma_buf back into pan_kmod as a VkDeviceMemory equivalent.
  • vkBindImageMemory to that VkDeviceMemory.

Strategy B: App-allocated VkImage, V4L2_MEMORY_DMABUF queue

  • App calls vkCreateImage with VkExternalMemoryImageCreateInfo handleTypes=DMA_BUF.
  • panvk allocates the BO via pan_kmod, exports a dma_buf fd via pan_kmod_bo_export (panvk_device_memory.c:387-404).
  • VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF, fd=our_dmabuf_fd) at submit time.

Strategy B is what fourier does for surface buffers, and it's the cleaner fit — the app gets a real VkImage with real VkDeviceMemory, we never have to fake the import direction. Phase 1 may want to start with Strategy A for simplicity since vk-video-samples likely doesn't pass VkExternalMemoryImageCreateInfo flags, but Strategy B is the long-term right answer.

G.3 Anv's DPB image allocation

Anv treats DPB images as plain VkImages — no special allocation. The HW reads them directly via anv_image_dpb_address(iv, baseArrayLayer) at genX_cmd_video.c:933. Memory layout is whatever ISL gives them (tile-Y or planar-420). For our backend, that doesn't transfer — the Hantro VPU expects NV12 in a linear layout (or a vendor-specific tiled layout that we'd need to expose; for Phase 1 we mandate linear).

G.4 panvk dmabuf entry points (already present)

  • panvk_AllocateMemory handles VkImportMemoryFdInfoKHR at panvk_device_memory.c:121-135 — calls pan_kmod_bo_import.
  • panvk_GetMemoryFdKHR at panvk_device_memory.c:387-404 exports.
  • EXT_external_memory_dma_buf already advertised at panvk_vX_physical_device.c:146.

So the building blocks exist. The new code is the session-internal V4L2 buffer pool that converts between V4L2_MEMORY_MMAP/DMABUF and pan_kmod BOs.


H. vk_video runtime helper coverage matrix

What we inherit vs what we write. Cross-referenced from sections A-G:

| Question | Inherit from vk_video shared layer? | Driver writes? |
|---|---|---|
| A. KHR_video_* extension booleans | No | YES: panvk_vX_physical_device.c table |
| A. videoMaintenance1 feature struct | No | (Phase 1: skip; future: yes if advertised) |
| A. GetPhysicalDeviceVideoCapabilitiesKHR | NO: direct entrypoint | YES: new code in panvk_physical_device.c |
| A. GetPhysicalDeviceVideoFormatPropertiesKHR | NO: direct entrypoint | YES: new code in panvk_physical_device.c |
| B. Queue family enum + props | No | YES: panvk_device.h + panvk_physical_device.c |
| B. Queue-family-video pNext walk | No | YES: extend panvk_GetPhysicalDeviceQueueFamilyProperties2 |
| B. Queue create/destroy dispatch | No | YES: extend panvk_vX_device.c:305-329 |
| B. Queue submit | No | YES: new panvk_vX_video_decode_queue.c |
| C. CreateVideoSessionKHR (handle + base init) | YES partial: vk_video_session_init does the codec-op parsing | YES: driver wraps, adds V4L2 fd open + S_FMT + REQBUFS |
| C. DestroyVideoSessionKHR (base finish) | YES partial: vk_video_session_finish | YES: driver wraps, adds V4L2 teardown |
| C. GetVideoSessionMemoryRequirementsKHR | No | YES (trivial: zero entries) |
| C. BindVideoSessionMemoryKHR | No | YES (trivial: no-op) |
| D. CreateVideoSessionParametersKHR | YES: vk_common_CreateVideoSessionParametersKHR (vk_video.c:846) | NO driver code needed |
| D. UpdateVideoSessionParametersKHR | YES: vk_common_UpdateVideoSessionParametersKHR (vk_video.c:865) | NO driver code needed |
| D. DestroyVideoSessionParametersKHR | YES: vk_common_DestroyVideoSessionParametersKHR (vk_video.c:875) | NO driver code needed |
| D. H.264 SPS/PPS storage | YES: struct vk_video_h264_{sps,pps} (vk_video.h:32-43) | NO |
| D. H.264 SPS/PPS lookup | YES: vk_video_find_h264_dec_std_{sps,pps} (vk_video.c:1186) | NO |
| D. H.264 params merge with dedup | YES: internal to vk_video_session_parameters_update | NO |
| D. Std → V4L2 control marshalling | No precedent in Mesa | YES: NEW helper file (~300 lines for H.264) |
| E. CmdBeginVideoCodingKHR | No | YES: trivial state-stash |
| E. CmdControlVideoCodingKHR | No | YES: trivial RESET handling |
| E. CmdEndVideoCodingKHR | No | YES: trivial state-clear |
| E. CmdDecodeVideoKHR | No | YES: record op into cmdbuf dynarray |
| E. vk_video_get_h264_parameters resolver | YES (vk_video.h:419) | NO |
| F. DPB slot ↔ reference_ts map | No | YES: panvk_video_session.dpb[16] |
| F. H.264 reference list construction | Partially: vk_fill_video_h264_* helpers if present | YES: but mostly direct field copies |
| G. dmabuf BO import/export | YES: existing panvk path (panvk_device_memory.c:121,387) | NO new code |
| G. V4L2 buffer ↔ pan_kmod_bo bridging | No precedent | YES: NEW helper file |
| G. Image creation for VIDEO_DECODE_DST | YES: existing panvk_image_init (panvk_image.c:562) handles all usage flags through ISL | Possibly yes for tile mode restrictions |

Net leverage: ~3000 lines of vk_video runtime helpers we inherit for free, primarily the H.264 SPS/PPS bitstream parsing + parameters object lifecycle + std/find helpers. Our new-code estimate is roughly 800-1500 lines split across ~4 new files (see I).


I. panvk-specific integration points (concrete edits)

I.1 Existing files to modify

src/panfrost/vulkan/panvk_vX_physical_device.c:

  • Lines ~123-124 (between KHR_vertex_attribute_divisor and KHR_vulkan_memory_model): add the three table entries .KHR_video_queue = true, .KHR_video_decode_queue = true and .KHR_video_decode_h264 = true (gated on hantro probe).
  • Optional Phase 2+: at line 540, flip unifiedImageLayoutsVideo based on session config.

src/panfrost/vulkan/panvk_physical_device.c:

  • Line ~565: extend the qfamily_props[] array — add a third entry for PANVK_QUEUE_FAMILY_VIDEO_DECODE with queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT.
  • Around line 589 inside the vk_outarray_append_typed loop: add a pNext walk for VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR that sets videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR.
  • ADD new entrypoints panvk_GetPhysicalDeviceVideoCapabilitiesKHR and panvk_GetPhysicalDeviceVideoFormatPropertiesKHR at end of file (~70 lines + ~50 lines).
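The pNext walk for the queue-family video properties could look roughly like the sketch below. To stay self-contained it uses stand-in types and placeholder constants (every sketch_ and SKETCH_ name is hypothetical, and the numeric values are not the Vulkan ones). The real code would match on VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR and write VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, NOT the real Vulkan enum values. */
enum sketch_stype {
   SKETCH_STYPE_OTHER = 1,
   SKETCH_STYPE_QUEUE_FAMILY_VIDEO_PROPERTIES = 2,
};
#define SKETCH_CODEC_OP_DECODE_H264 0x1u

/* Stand-in for the common sType/pNext header every chained struct starts with. */
struct sketch_base_out {
   enum sketch_stype sType;
   struct sketch_base_out *pNext;
};

/* Stand-in for VkQueueFamilyVideoPropertiesKHR. */
struct sketch_qf_video_props {
   struct sketch_base_out base;
   uint32_t videoCodecOperations;
};

/* Walk the chain and fill only the struct types we recognise;
 * unknown structs are skipped untouched. */
static void
fill_video_queue_family_pnext(struct sketch_base_out *head)
{
   for (struct sketch_base_out *ext = head; ext != NULL; ext = ext->pNext) {
      if (ext->sType == SKETCH_STYPE_QUEUE_FAMILY_VIDEO_PROPERTIES) {
         struct sketch_qf_video_props *vp = (struct sketch_qf_video_props *)ext;
         vp->videoCodecOperations = SKETCH_CODEC_OP_DECODE_H264;
      }
   }
}
```

In the driver this walk would run only for the PANVK_QUEUE_FAMILY_VIDEO_DECODE entry, so the other families keep reporting no codec operations.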

src/panfrost/vulkan/panvk_device.h:

  • Lines 46-48: add PANVK_QUEUE_FAMILY_VIDEO_DECODE to the enum.

src/panfrost/vulkan/panvk_vX_device.c:

  • Lines 218-247 (check_global_priority): add case PANVK_QUEUE_FAMILY_VIDEO_DECODE: return VK_SUCCESS;.
  • Lines 253-258 (panvk_queue_check_status): add case for the new family calling panvk_per_arch(video_decode_queue_check_status).
  • Lines 305-313 (panvk_queue_create): add case calling panvk_per_arch(create_video_decode_queue).
  • Lines 320-329 (panvk_queue_destroy): symmetric.

src/panfrost/vulkan/meson.build:

  • Add new files to either libpanvk_files (arch-agnostic) or common_per_arch_files (arch-templated). The session/queue/command-record code is arch-agnostic and would use panvk_per_arch() symbols only by convention, so in Phase 1 we can place all new files in libpanvk_files and skip the per_arch dispatch.

I.2 New files to add

src/panfrost/vulkan/panvk_video_decode.c (~400 lines):

  • panvk_CreateVideoSessionKHR
  • panvk_DestroyVideoSessionKHR
  • panvk_GetVideoSessionMemoryRequirementsKHR (returns count=0)
  • panvk_BindVideoSessionMemoryKHR (no-op)
  • panvk_CmdBeginVideoCodingKHR
  • panvk_CmdControlVideoCodingKHR
  • panvk_CmdEndVideoCodingKHR
  • panvk_CmdDecodeVideoKHR (record op into cmd_buffer->video_decode_ops)

src/panfrost/vulkan/panvk_video_decode.h:

  • struct panvk_video_session
  • struct panvk_video_decode_op
  • struct panvk_video_decode_queue

src/panfrost/vulkan/panvk_v4l2.c (~500 lines):

  • panvk_v4l2_probe_hantro() — finds /dev/video1 and /dev/media0 (mirrors libva-v4l2-request-fourier src/request.c:143-308 find_decoder_video_node_via_topology).
  • panvk_v4l2_session_init() — S_FMT on OUTPUT/CAPTURE, REQBUFS, request_fd pool alloc.
  • panvk_v4l2_h264_std_to_ctrl_sps(): StdVideoH264SequenceParameterSet * → struct v4l2_ctrl_h264_sps.
  • panvk_v4l2_h264_std_to_ctrl_pps(): StdVideoH264PictureParameterSet * → struct v4l2_ctrl_h264_pps.
  • panvk_v4l2_h264_fill_decode_params() — build struct v4l2_ctrl_h264_decode_params from VkVideoDecodeInfoKHR + slot map.
  • panvk_v4l2_submit_op() — the request_fd / S_EXT_CTRLS / QBUF / poll / DQBUF dance for one op.

src/panfrost/vulkan/panvk_vX_video_decode_queue.c (~150 lines, per_arch):

  • panvk_per_arch(create_video_decode_queue)
  • panvk_per_arch(destroy_video_decode_queue)
  • panvk_per_arch(video_decode_queue_submit) — walks cmdbuf ops, calls panvk_v4l2_submit_op per op.
  • panvk_per_arch(video_decode_queue_check_status)
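The record-then-walk split between panvk_CmdDecodeVideoKHR and the queue submit can be sketched as follows. This is an illustration only: the struct layouts, the realloc-based list (standing in for whatever dynarray the cmdbuf actually uses) and the callback type are assumptions, not the planned panvk definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-decode record; the real struct panvk_video_decode_op
 * would carry SPS/PPS pointers, buffers and image views. */
struct decode_op {
   unsigned dst_dpb_slot;
   unsigned src_size;
};

struct op_list {
   struct decode_op *ops;
   size_t count, cap;
};

/* Record time: append only, no kernel interaction. */
static int
op_list_append(struct op_list *l, struct decode_op op)
{
   if (l->count == l->cap) {
      size_t cap = l->cap ? l->cap * 2 : 4;
      struct decode_op *p = realloc(l->ops, cap * sizeof(*p));
      if (p == NULL)
         return -1;
      l->ops = p;
      l->cap = cap;
   }
   l->ops[l->count++] = op;
   return 0;
}

typedef int (*submit_op_fn)(const struct decode_op *op, void *ctx);

/* Submit time: walk ops in record order, stop at the first failure. */
static int
op_list_submit(const struct op_list *l, submit_op_fn fn, void *ctx)
{
   for (size_t i = 0; i < l->count; i++) {
      int ret = fn(&l->ops[i], ctx);
      if (ret != 0)
         return ret;
   }
   return 0;
}

/* Example callback standing in for panvk_v4l2_submit_op: sums bitstream sizes. */
static int
count_bytes(const struct decode_op *op, void *ctx)
{
   *(size_t *)ctx += op->src_size;
   return 0;
}
```

Keeping the record path free of ioctls is what lets command buffers be recorded without touching the V4L2 fds at all.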

I.3 Entrypoint generation

Recall from meson.build:7-19 that entrypoints are auto-wired with --prefix panvk and per-arch prefixes. The names above (panvk_CmdDecodeVideoKHR etc.) match the auto-resolution rules — no changes needed in vk_entrypoints_gen invocation.

For the per-arch ones (panvk_per_arch(...)), we expand under each PAN_ARCH define just like existing per-arch code.


J. Probable architecture sketch

V4L2 fd ownership: at panvk_physical_device level for probe-time discovery (panvk_v4l2_probe_hantro sets phys_dev->v4l2.video_fd_present = true and stashes paths), but actual open() happens at panvk_CreateVideoSessionKHR time per-session. Two reasons: (1) the V4L2 driver state is per-fd, so two concurrent sessions need two separate fds anyway; (2) keeping fds closed when no video session is active is good citizenship. The PhysicalDevice only holds device-node paths and capability flags.

Per-session V4L2 state: struct panvk_video_session (see C.3) owns one video_fd + one media_fd + a pool of request_fds (one per max-in-flight decode, typically max_dpb_slots + 2). At CreateVideoSession we S_FMT both queues, REQBUFS to allocate the buffer count, and EXPBUF the CAPTURE buffers to dma_bufs that get held in the session for later association with VkImage memory (Strategy A from G.2).

Per-VkImage dmabuf bookkeeping: the existing pan_kmod export path (panvk_device_memory.c:387-404) gives us a dma_buf out. The new piece is the inverse (Strategy B from G.2): at vkBindImageMemory time, for a VkImage whose usage includes VIDEO_DECODE_DST, we'd register the underlying BO's dma_buf as a CAPTURE buffer with VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF). The image's panvk_image struct gains an int v4l2_capture_index; field.

Submit-time dispatch: at panvk_vX_device.c:305-313 we extend the switch to route PANVK_QUEUE_FAMILY_VIDEO_DECODE to panvk_per_arch(create_video_decode_queue), which sets driver_submit = panvk_per_arch(video_decode_queue_submit). The submit function walks each cmdbuf's video_decode_ops dynarray, and per op:

1. resolve request_fd from session pool (allocate or reuse, ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC))
2. media_request_reinit(request_fd) if reusing
3. translate op->sps to v4l2_ctrl_h264_sps via panvk_v4l2_h264_std_to_ctrl_sps()
4. translate op->pps to v4l2_ctrl_h264_pps via panvk_v4l2_h264_std_to_ctrl_pps()  
5. build v4l2_ctrl_h264_decode_params from op (including dpb[] from session->dpb[] tracking)
6. VIDIOC_S_EXT_CTRLS(video_fd, request_fd=op->request_fd, {SPS, PPS, DECODE_PARAMS, SCALING_MATRIX, SLICE_PARAMS})
7. VIDIOC_QBUF(video_fd, OUTPUT, request_fd=op->request_fd, bytesused=op->src_size, m.fd=op->src_buffer's bo dma_buf)
8. VIDIOC_QBUF(video_fd, CAPTURE, index=op->dst_iv->image->v4l2_capture_index)
9. MEDIA_REQUEST_IOC_QUEUE(request_fd)
10. poll(request_fd, POLLPRI, timeout)
11. VIDIOC_DQBUF(video_fd, OUTPUT)  /* releases input slot */
12. VIDIOC_DQBUF(video_fd, CAPTURE) /* output ready */
13. Update session->dpb[op->dst_dpb_slot].reference_ts to the QBUF timestamp
14. Signal vk_queue_submit's signal semaphores

Steps 5-12 are exactly the libva-v4l2-request-fourier RequestEndPicture body (src/picture.c:497-650). The mapping VAPicture* → V4L2 vs Std* → V4L2 is the one piece of code that has no Mesa precedent — we're inventing the bridge — but it's bounded: ~150 lines per codec (we only need H.264 in Phase 1).
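Step 13 relies on the DPB slot ↔ reference_ts map from section F. A minimal sketch of that bookkeeping follows, assuming the reference timestamp is the OUTPUT buffer's timeval converted to nanoseconds (the convention the V4L2 stateless codec API uses for reference_ts) and a fixed 16-slot table as in panvk_video_session.dpb[16]. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DPB_SLOTS 16

struct dpb_slot {
   bool valid;
   uint64_t reference_ts;   /* nanoseconds, as placed in the V4L2 dpb[] entries */
};

struct dpb_map {
   struct dpb_slot slots[MAX_DPB_SLOTS];
};

/* Convert the timeval stamped on the OUTPUT buffer at QBUF into nanoseconds. */
static uint64_t
timeval_to_ns(int64_t sec, int64_t usec)
{
   return (uint64_t)sec * 1000000000ull + (uint64_t)usec * 1000ull;
}

/* Called after a successful decode into the given Vulkan DPB slot. */
static void
dpb_store(struct dpb_map *map, unsigned slot, uint64_t ts)
{
   assert(slot < MAX_DPB_SLOTS);
   map->slots[slot].valid = true;
   map->slots[slot].reference_ts = ts;
}

/* Called while building decode params; negative or empty slots yield false. */
static bool
dpb_lookup(const struct dpb_map *map, int slot, uint64_t *ts_out)
{
   if (slot < 0 || slot >= MAX_DPB_SLOTS || !map->slots[slot].valid)
      return false;
   *ts_out = map->slots[slot].reference_ts;
   return true;
}
```

The lookup side is what step 5 would call for every reference slot named in VkVideoDecodeInfoKHR.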


Mesa-version observations and risks

  • Mesa 26.0.6 is the campaign baseline. The vk_video runtime helpers in src/vulkan/runtime/vk_video.{c,h} are stable in this version with H.264, H.265, AV1, VP9, encode-h264, encode-h265, encode-av1 all covered. No upgrade required for Phase 1.
  • KHR_video_decode_h264 spec v9 is what's in vk.xml for 26.0.6. This is confirmed by the extension already being known to the entrypoint generator (no --beta flag needed; that flag at meson.build:18 is for beta/provisional extensions only).
  • Maintenance1/2 features are NOT required for the simple-test in Phase 1, so we don't need videoMaintenance1 / videoMaintenance2 machinery yet. Maintenance1 (inline parameters, inline queries) becomes relevant in Phase 6+ if we want to pass conformance suites.
  • The unifiedImageLayoutsVideo feature at panvk_vX_physical_device.c:540 is currently false. In Phase 1 we can leave it false, since the test client honors explicit VkImageMemoryBarrier transitions to/from VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR.

Architectural maps that DO cleanly transfer from anv/radv

  1. Session as wrapper around vk_video_session. Anv: struct anv_video_session { struct vk_video_session vk; ... }. radv: same shape. Ours: same shape. The vk. namespace gives us all the spec-mandated session fields for free.
  2. Parameters fully delegated to vk_common_*. Anv does this, radv mostly does this (with a tiny radv_video_patch_session_parameters patch). Ours: full delegation.
  3. Cmdbuf-local shadow state for current session+params during the Begin..End scope. Anv: cmd_buffer->video.{vid,params}. We do the same.
  4. DPB slot index ↔ image view lookup at decode time. Both anv and our backend do this lookup per frame.
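The "session as wrapper" shape from item 1 can be illustrated with stand-in types. In this sketch, vk_video_session_stub replaces the real struct vk_video_session so that the example compiles on its own, and the _sketch names and the downcast helper are hypothetical. The video_fd and media_fd fields follow the per-session state described in section J.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct vk_video_session from the shared runtime. */
struct vk_video_session_stub {
   unsigned max_dpb_slots;
};

/* Driver session: runtime base object plus backend-specific state. */
struct panvk_video_session_sketch {
   struct vk_video_session_stub vk;
   int video_fd;   /* per-session V4L2 device fd */
   int media_fd;   /* per-session media controller fd */
};

/* Recover the driver object from a pointer to its embedded base. */
static struct panvk_video_session_sketch *
session_from_base(struct vk_video_session_stub *base)
{
   return (struct panvk_video_session_sketch *)
      ((char *)base - offsetof(struct panvk_video_session_sketch, vk));
}
```

Shared-layer code only ever sees the vk member; the driver recovers its own fields through the downcast.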

Architectural maps that DO NOT transfer

  1. Driver-allocated session scratch memory (anv_vid_mem array). Hantro VPU keeps scratch internal; we return zero memory requirements. Hard skip — not just simplification, an inversion.
  2. anv_batch_emit register packets directly into cmdbuf at record time. There is no equivalent. We MUST defer to submit-time — that's the entire point of the V4L2 backend being on a separate kernel device.
  3. anv_image_dpb_address(iv, layer) resolving to a GPU virtual address. Our DPB references resolve to V4L2 buffer indices (queued at session-init) or dma_buf fds (Strategy B). The "address" abstraction doesn't apply; the VPU doesn't share the GPU's address space.
  4. MFX/HCP/VDENC register-set knowledge in genX_cmd_video.c — 4000+ lines of Intel-specific HW programming. Completely irrelevant. The Hantro VPU's "programming" is a sequence of struct v4l2_ctrl_* fills + ioctls.
  5. MOCS / cache state in pipe-buf-addr-state (genX_cmd_video.c:962+). N/A — the kernel V4L2 driver handles all cache coherency at QBUF/DQBUF boundaries.

Phase 1 success criteria — final checklist

| vk-video-samples simple-test step | Where it lands in this map |
|---|---|
| vkGetPhysicalDeviceQueueFamilyProperties2 returns family with VK_QUEUE_VIDEO_DECODE_BIT_KHR and VkQueueFamilyVideoPropertiesKHR::videoCodecOperations & VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR set | B.2 |
| vkEnumerateDeviceExtensionProperties returns the three KHR_video_* | A.1 |
| vkGetPhysicalDeviceVideoCapabilitiesKHR(profile=H264) returns sane caps | A.3 |
| vkGetPhysicalDeviceVideoFormatPropertiesKHR returns NV12 | A.4 |
| vkCreateDevice succeeds with the video queue family selected | B.3 |
| vkCreateVideoSessionKHR succeeds | C |
| vkGetVideoSessionMemoryRequirementsKHR returns 0 entries | C.3 |
| vkCreateVideoSessionParametersKHR with SPS+PPS succeeds | D (free from vk_common) |
| Recording a vkCmdDecodeVideoKHR succeeds (no execution yet; could even no-op the V4L2 ioctls in Phase 1 since correctness isn't tested) | E.2 |
| Single queue submit succeeds without VK_ERROR_DEVICE_LOST | B.4, J |

Phase 1 deliberately stops short of "decoded picture compares against reference". That's Phase 7. Phase 1 is the end-to-end plumbing.