initial seed: retrofit campaign lineage from local working trees

panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan video decode) shipped before this repo existed; the deliverable patches live in marfrit-packages, but the reasoning chain, phase docs, and source-state evidence lived only in local working trees on the development host.

This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18); libmali stub blobs at iter18/blob/ excluded (109MB of RE artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is a commit rather than a snapshot. See [[feedback-session-local-process-pins]] for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00
parent 430d0da278
commit a4e7d8ab90
124 changed files with 22551 additions and 1 deletions
@@ -0,0 +1,669 @@
+# Phase 1 Source Map — VK_KHR_video_decode_h264 on panvk-bifrost (V4L2/Hantro backend)
+
+**Campaign**: panvk-bifrost-video (successor to panvk-bifrost r4)
+**Mesa version**: 26.0.6 (source tree on ohm at `/home/mfritsche/mesa-build/mesa-26.0.6/`)
+**Phase 1 goal**: vk-video-samples simple-test passes `HasAllDeviceExtensions`, creates a `VkVideoSessionKHR`, submits one `vkCmdDecodeVideoKHR`. Decode correctness is Phase 7.
+**Backend**: V4L2-stateless `hantro` VPU on RK3566/PineTab2 via `/dev/video1` + `/dev/media0`. Mali GPU is not the decode engine.
+
+> Convention used throughout: every file path is **on ohm** unless otherwise stated. Cite as `FILE:LINE`. When citing libva-v4l2-request-fourier (the reference for V4L2-side bridging), the path is on the **workstation** at `/home/mfritsche/src/libva-v4l2-request-fourier/`.
+
+---
+
+## Executive summary
+
+The Mesa 26.0.6 video stack is structured in three layers:
+
+1. **Shared runtime helpers** — `src/vulkan/runtime/vk_video.{c,h}` (3413 + 436 lines). Owns: `vk_video_session_init`/`finish`, `vk_video_session_parameters_{create,update,destroy}`, H.264 SPS/PPS storage as `struct vk_video_h264_{sps,pps}`, and the `vk_common_{Create,Update,Destroy}VideoSessionParametersKHR` entrypoints (full dispatch coverage of the parameters object). Codec parameter parsing helpers (`vk_video_get_h264_parameters`, `vk_video_find_h264_dec_std_{sps,pps}`).
+2. **Driver-side video** — anv (`src/intel/vulkan/anv_video.c` + `genX_cmd_video.c`) and radv (`src/amd/vulkan/radv_video.c`). Each driver owns: extension advertisement, queue-family advertisement, `GetPhysicalDeviceVideoCapabilitiesKHR`, `GetPhysicalDeviceVideoFormatPropertiesKHR`, `Create/DestroyVideoSessionKHR`, `GetVideoSessionMemoryRequirementsKHR`, `BindVideoSessionMemoryKHR`, and the per-frame `CmdBeginVideoCodingKHR`/`CmdControlVideoCodingKHR`/`CmdDecodeVideoKHR`/`CmdEndVideoCodingKHR` recording.
+3. **HW codegen** — driver emits register packets into a command stream during the `CmdDecodeVideoKHR` record; the existing GPU queue submit path then ships that stream to the video engine.
+
+**Critical mismatch for our backend**: layer 3 does not exist for us. The Hantro VPU has no Mali-side command stream. It has its own kernel device node (`/dev/video1` + `/dev/media0`) with a request-API ioctl interface. So we keep layer 1 verbatim (huge win — all H.264 SPS/PPS parsing comes free), reuse layer 2's *interface contracts*, and replace layer 2's command-stream codegen with deferred V4L2 control marshalling + submit-time `VIDIOC_QBUF`/`POLL`/`VIDIOC_DQBUF`.
+
+**vk-video-samples simple-test trinity** of required extensions:
+- `VK_KHR_video_queue` (spec v8) — shared base
+- `VK_KHR_video_decode_queue` (spec v8) — decode-specific commands
+- `VK_KHR_video_decode_h264` (spec v9) — H.264 profile
+
+None are advertised in panvk-bifrost r4 today (Mesa 26.0.6 `src/panfrost/vulkan/panvk_vX_physical_device.c:539-540` explicitly sets `unifiedImageLayoutsVideo = false` and leaves all `KHR_video_*` extension flags unset / default-false).
+
+---
+
+## A. Extension surface
+
+### A.1 Where extensions are advertised
+
+The panvk extension table is built by `panvk_per_arch(get_physical_device_extensions)` in `src/panfrost/vulkan/panvk_vX_physical_device.c:35-160`. This is a single struct-literal that fills a `struct vk_device_extension_table` field-by-field. To add the three required extensions we extend the literal, keeping its alphabetical sort within the `KHR_` group:
+
+```
+.KHR_video_decode_h264   = true,   /* gated on hantro probe success */
+.KHR_video_decode_queue  = true,
+.KHR_video_queue         = true,
+```
+
+The natural insertion point is between `.KHR_vertex_attribute_divisor = true,` (line ~123) and `.KHR_vulkan_memory_model = true,` (line ~124).
+
+Anv reference for comparison: `src/intel/vulkan/anv_physical_device.c:262-274`:
+```c
+.KHR_video_queue                       = video_decode_enabled || video_encode_enabled,
+.KHR_video_decode_queue                = video_decode_enabled,
+.KHR_video_decode_h264                 = VIDEO_CODEC_H264DEC && video_decode_enabled,
+```
+where `video_decode_enabled` is `device->instance->debug & ANV_DEBUG_VIDEO_DECODE` (`anv_physical_device.c:153`). Anv gates this behind a debug flag because anv-side decode is still considered experimental. We probably want the same gating pattern, except keyed on hantro probe success rather than a debug flag — so the extension is advertised only if `/dev/video1` opens and reports H.264 OUTPUT format support.
+
+### A.2 Feature struct fields
+
+vk-video-samples simple-test requires `VK_KHR_video_queue` and friends to be advertised. The feature struct fields that could come into play are:
+
+- `VkPhysicalDeviceVideoMaintenance1FeaturesKHR::videoMaintenance1` — **only if** we advertise `KHR_video_maintenance1`. For Phase 1, the simple-test does NOT require maintenance1 — confirmed by reading test harness expectations. Skip in Phase 1.
+- `VkPhysicalDeviceUnifiedImageLayoutsFeaturesKHR::unifiedImageLayoutsVideo` — currently `false` at `panvk_vX_physical_device.c:540`. Stays `false` for Phase 1 (transition rules still apply).
+
+The shared `vk_video_session` struct (`vk_video.h:80-115`) carries the per-session profile bookkeeping that gets driven by the codec ops `pNext`. No driver-side feature toggles needed beyond the three extension booleans for Phase 1.
+
+### A.3 vkGetPhysicalDeviceVideoCapabilitiesKHR routing
+
+This is a **direct driver entrypoint** — there is no `vk_common_GetPhysicalDeviceVideoCapabilitiesKHR` in `src/vulkan/runtime/`. Verified: `grep -rn "vk_common_GetPhysicalDeviceVideo" /home/mfritsche/mesa-build/mesa-26.0.6/src/` returns no hits.
+
+Driver-side, the entrypoint is generated via `vk_entrypoints_gen` from `vk_api.xml` (per `panvk/vulkan/meson.build:7-19`). The panvk symbol resolution uses the `panvk` prefix and per-arch shims `panvk_v6` / `panvk_v7` / `panvk_v9` / `panvk_v10` / `panvk_v12` / `panvk_v13`. So the symbol we need to provide is one of:
+
+- `panvk_GetPhysicalDeviceVideoCapabilitiesKHR` (in `panvk_physical_device.c`) — common (arch-agnostic), since physical-device caps don't vary across Mali archs for V4L2-side decode (the VPU is on a separate engine entirely). **Recommended.**
+- `panvk_per_arch(GetPhysicalDeviceVideoCapabilitiesKHR)` in a new `panvk_vX_video_decode.c` — only needed if the answer varies per arch, which it doesn't here.
+
+Reference shape from anv (`anv_video.c:183-291`): the function takes `pVideoProfile` and fills `pCapabilities` (`maxCodedExtent`, `maxDpbSlots`, `maxActiveReferencePictures`, `minBitstreamBufferOffsetAlignment`, `stdHeaderVersion`), then walks the codec-specific `pNext` chain. For H.264-decode, that means `VkVideoDecodeH264CapabilitiesKHR` (anv lines 213-225) with `maxLevelIdc` and `fieldOffsetGranularity`. Also fills `VkVideoDecodeCapabilitiesKHR::flags = VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR` (anv line 205) — which is what we'll want too, because the Hantro CAPTURE buffers ARE the DPB (no separate scratch).
+
+The hantro driver's real limits (4K H.264 decode confirmed on RK3566) drive these values; we want to be conservative for Phase 1 and use `maxCodedExtent = 1920x1088`, `maxDpbSlots = 17` (one more than the 16 reference frames H.264 allows, matching `ANV_VIDEO_H264_MAX_DPB_SLOTS` at `anv_private.h:6581`; note that `STD_VIDEO_H264_MAX_NUM_LIST_REF` is 32, not 16, so it is not the right constant to cite here), `maxActiveReferencePictures = 16`.
+
+### A.4 vkGetPhysicalDeviceVideoFormatPropertiesKHR routing
+
+Same routing pattern as A.3 — direct driver entrypoint, no shared common path. Implement as `panvk_GetPhysicalDeviceVideoFormatPropertiesKHR` in `panvk_physical_device.c`.
+
+Reference shape from anv (`anv_video.c:393-481`): walks `VkVideoProfileListInfoKHR` from `pVideoFormatInfo->pNext`, validates each profile, then outputs format entries. For H.264 8-bit, anv reports `VK_FORMAT_G8_B8R8_2PLANE_420_UNORM` (NV12-equivalent, anv:460). 
+
+This is exactly what we need. The hantro driver returns NV12 as `V4L2_PIX_FMT_NV12` on the CAPTURE queue (confirmed in libva-v4l2-request-fourier `src/h264.c` and via `v4l2_find_format` calls in `src/request.c:864-865` showing the format-probe pattern). The dst usage flag merge in anv at lines 410-419 (where `VIDEO_DECODE_DST` triggers added flags including `SAMPLED_BIT | TRANSFER_DST_BIT`) is a universal Vulkan video pattern and applies verbatim. Set:
+- `format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM` (NV12)
+- `imageType = VK_IMAGE_TYPE_2D`
+- `imageTiling = VK_IMAGE_TILING_OPTIMAL` — but see G.2 below about how the underlying memory comes from V4L2, so this is a "logical" tiling decision
+- `imageUsageFlags = VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_SRC_BIT`
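+- `imageCreateFlags = VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR | VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT | VK_IMAGE_CREATE_EXTENDED_USAGE_BIT` (caveat: `VIDEO_PROFILE_INDEPENDENT_BIT_KHR` is defined by `VK_KHR_video_maintenance1`, which A.2 defers, so drop it for Phase 1 unless maintenance1 is advertised)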
+
+---
+
+## B. Queue family registration
+
+### B.1 Current state (r4)
+
+`src/panfrost/vulkan/panvk_device.h:46-48`:
+```
+enum panvk_queue_family {
+   PANVK_QUEUE_FAMILY_GPU,
+   PANVK_QUEUE_FAMILY_BIND,
+   PANVK_QUEUE_FAMILY_COUNT,
+};
+```
+
+Queue-family-properties query at `panvk_physical_device.c:557-595`:
+```
+[PANVK_QUEUE_FAMILY_GPU] = {
+   .queueFlags = VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT,
+   ...
+},
+[PANVK_QUEUE_FAMILY_BIND] = {
+   .queueFlags = VK_QUEUE_SPARSE_BINDING_BIT,
+   .queueCount = 1,
+},
+```
+
+Queue dispatch in `panvk_vX_device.c`:
+- line 253-258 — `panvk_queue_check_status` switches on `queue->queue_family_index` to call `gpu_queue_check_status` or `bind_queue_check_status`
+- line 269 — `panvk_device_check_status` iterates `for (uint32_t qfi = 0; qfi < PANVK_QUEUE_FAMILY_COUNT; qfi++)`
+- line 305-313 — `panvk_queue_create` switches on `create_info->queueFamilyIndex` to dispatch to `panvk_per_arch(create_gpu_queue)` or `panvk_per_arch(create_bind_queue)`
+- line 320-329 — `panvk_queue_destroy` symmetric
+- line 546-561 — `panvk_per_arch(create_device)` iterates `pCreateInfo->queueCreateInfoCount`, calls `panvk_queue_create` for each
+
+### B.2 What to add
+
+Add a third enum value `PANVK_QUEUE_FAMILY_VIDEO_DECODE`. Slot ordering matters only in that the existing indices must not shift: Vulkan apps discover queue families by iterating over them and testing flags (the test client *typically* looks for `VK_QUEUE_VIDEO_DECODE_BIT_KHR`), so the index value itself is opaque and appending at the end is safe.
+
+```
+enum panvk_queue_family {
+   PANVK_QUEUE_FAMILY_GPU,
+   PANVK_QUEUE_FAMILY_BIND,
+   PANVK_QUEUE_FAMILY_VIDEO_DECODE,   /* NEW */
+   PANVK_QUEUE_FAMILY_COUNT,
+};
+```
+
+Then in `panvk_physical_device.c:557-595` extend the props table:
+```
+[PANVK_QUEUE_FAMILY_VIDEO_DECODE] = {
+   .queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT,
+   .queueCount = 1,
+   .minImageTransferGranularity = {1, 1, 1},   /* match VPU mb alignment if needed */
+},
+```
+
+Anv reference for this pattern: `src/intel/vulkan/anv_physical_device.c:2556-2576` (queue-family-init writing flags onto `pdevice->queue.families[family_count++]`). Anv also handles the `VkQueueFamilyVideoPropertiesKHR` pNext extension at `anv_physical_device.c:3012-3030`:
+```c
+case VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR: {
+   VkQueueFamilyVideoPropertiesKHR *prop = ...;
+   if (queue_family->queueFlags & VK_QUEUE_VIDEO_DECODE_BIT_KHR) {
+      prop->videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR | ...;
+   }
+}
+```
+
+We need to mirror that pattern in `panvk_GetPhysicalDeviceQueueFamilyProperties2`. Right now it only walks `VkQueueFamilyGlobalPriorityPropertiesKHR` (at `panvk_physical_device.c:589`). Add a pNext walk for `VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR` and fill `videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR`. Optional but recommended for Phase 1: also fill `VK_STRUCTURE_TYPE_QUEUE_FAMILY_QUERY_RESULT_STATUS_PROPERTIES_KHR` if the test client asks (`anv_physical_device.c:3007-3011`).
+
+### B.3 Queue identification at queue_create time
+
+Driver dispatches at `panvk_vX_device.c:305-313` via `panvk_queue_create`. Extend the switch:
+```
+case PANVK_QUEUE_FAMILY_VIDEO_DECODE:
+   return panvk_per_arch(create_video_decode_queue)(
+      dev, create_info, queue_idx, out_queue);
+```
+And similarly extend `panvk_queue_destroy` (line 320-329) and `panvk_queue_check_status` (line 253-258).
+
+For `check_global_priority` at `panvk_vX_device.c:218-247`, the video decode family gets a new case that either returns `VK_SUCCESS` for any priority (the V4L2 device doesn't expose priority semantics) or accepts only `VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_KHR`, as BIND does.
+
+### B.4 V4L2 submit path — clean hook into queue infrastructure
+
+The existing `vk_queue` has a `driver_submit` callback (set in `jm/panvk_vX_gpu_queue.c:359`: `queue->vk.driver_submit = panvk_per_arch(gpu_queue_submit);`). The submit function takes a `struct vk_queue_submit` containing `command_buffers[]`, waits, signals.
+
+For our V4L2 queue, the analog is: `queue->vk.driver_submit = panvk_per_arch(video_decode_queue_submit);` and the implementation does NOT touch Mali — it walks the cmdbuf's recorded V4L2 ops and dispatches each:
+
+```
+for each panvk_video_decode_op in cmdbuf->video_decode_ops:
+    media_request_reinit(op->request_fd)         /* libva-v4l2-request-fourier media.c:51 */
+    VIDIOC_S_EXT_CTRLS(video_fd, request_fd,
+                       {SPS, PPS, DECODE_PARAMS, SLICE_PARAMS, SCALING_MATRIX})
+    VIDIOC_QBUF(video_fd, OUTPUT, request_fd=op->request_fd)   /* bitstream src */
+    VIDIOC_QBUF(video_fd, CAPTURE, dpb_buffer_index=op->dst_slot)
+    media_request_queue(op->request_fd)          /* media.c:65 */
+    poll(request_fd, POLLPRI, timeout)           /* media.c:79 */
+    VIDIOC_DQBUF(video_fd, OUTPUT)
+    VIDIOC_DQBUF(video_fd, CAPTURE)
+```
+
+The waits/signals from `vk_queue_submit` need to map to syncobj waits before we VIDIOC_QBUF, and a syncobj signal after the POLL completes. For Phase 1 (a single submit with no other GPU work in the queue), we can ignore semaphores and just use a syncobj that signals on DQBUF completion.
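The completion wait in the loop above (the `poll(request_fd, POLLPRI, timeout)` step) can be sketched in isolation. This is only an illustration under assumptions: the enum and function names are hypothetical, and the result codes stand in for whatever would eventually drive the syncobj signal.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical result codes for the submit path. */
enum panvk_req_wait {
   PANVK_REQ_DONE,
   PANVK_REQ_TIMEOUT,
   PANVK_REQ_ERROR,
};

/* Wait for a media request fd to complete. The media request API reports
 * completion as POLLPRI on the request fd. */
static enum panvk_req_wait
panvk_wait_request(int request_fd, int timeout_ms)
{
   struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
   int ret;

   /* Retry on signal interruption. Note this restarts the full timeout;
    * a production version would track the remaining time. */
   do {
      ret = poll(&pfd, 1, timeout_ms);
   } while (ret < 0 && errno == EINTR);

   if (ret < 0)
      return PANVK_REQ_ERROR;
   if (ret == 0)
      return PANVK_REQ_TIMEOUT;
   if (pfd.revents & POLLPRI)
      return PANVK_REQ_DONE;
   /* POLLERR, POLLHUP or POLLNVAL without POLLPRI. */
   return PANVK_REQ_ERROR;
}
```

A `PANVK_REQ_TIMEOUT` result is the natural place to decide between reporting device loss and retrying, which the Phase 1 single-submit case can treat as a hard failure.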
+
+`vk_queue_init` (`panvk_vX_gpu_queue.c:348`) is the entry point; we'd reuse the same pattern for `create_video_decode_queue`. Allocate a `struct panvk_video_decode_queue { struct vk_queue vk; int video_fd; int media_fd; ... }` and stash the fds.
+
+---
+
+## C. Session object lifecycle (`VkVideoSessionKHR`)
+
+### C.1 What CreateVideoSession allocates
+
+Anv reference at `src/intel/vulkan/anv_video.c:31-55`:
+```c
+struct anv_video_session *vid = vk_alloc2(...);
+memset(vid, 0, sizeof(*vid));
+VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
+*pVideoSession = anv_video_session_to_handle(vid);
+```
+
+That's it. The heavy lifting is in `vk_video_session_init` (`src/vulkan/runtime/vk_video.c:33-128`), which fills:
+- `vid->op` (`VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR` etc.)
+- `vid->max_coded`, `picture_format`, `ref_format`, `max_dpb_slots`, `max_active_ref_pics`
+- `vid->h264.profile_idc` from the `VkVideoDecodeH264ProfileInfoKHR` pNext (lines 51-57)
+
+The driver-specific anv_video_session struct (`anv_private.h:6688-6727`) adds backend-specific per-stream state: `cdf_initialized` (for AV1), `vid_mem[ANV_VID_MEM_AV1_MAX]` (private memory bindings for codec scratch).
+
+### C.2 Memory binding via vkBindVideoSessionMemoryKHR
+
+Anv reference at `anv_video.c:914` onward for `GetVideoSessionMemoryRequirements` and `anv_video.c:972-998` for `BindVideoSessionMemory`. The mem_idx enums for H.264 (`anv_private.h:6588-6593`):
+```c
+enum anv_vid_mem_h264_types {
+   ANV_VID_MEM_H264_INTRA_ROW_STORE,
+   ANV_VID_MEM_H264_DEBLOCK_FILTER_ROW_STORE,
+   ANV_VID_MEM_H264_BSD_MPC_ROW_SCRATCH,
+   ANV_VID_MEM_H264_MPR_ROW_SCRATCH,
+   ANV_VID_MEM_H264_MAX,
+};
+```
+These are scratch buffers the Intel HCP/MFX engines need. The sizes are computed in `get_h264_video_mem_size` (`anv_video.c:483-501`) as multiples of width-in-MBs.
+
+`BindVideoSessionMemory` (anv lines 972-998) is just bookkeeping: it copies each `VkBindVideoSessionMemoryInfoKHR` into `vid->vid_mem[bind_index]` (struct `anv_vid_mem { anv_device_memory *mem; offset; size; }` at `anv_private.h:6572-6576`).
+
+### C.3 For our V4L2 backend
+
+**Massive simplification opportunity**: the Hantro VPU does NOT require driver-allocated scratch buffers — all scratch is internal to the VPU and managed by the kernel driver. So `GetVideoSessionMemoryRequirements` can return **zero entries** (`*pVideoSessionMemoryRequirementsCount = 0`), and `BindVideoSessionMemory` becomes a no-op (just `return VK_SUCCESS;`).
+
+What CreateVideoSession DOES need to allocate, V4L2-side:
+1. **Open `/dev/video1` and `/dev/media0`** if not already held by the device (see J.1 for ownership decision). 
+2. **VIDIOC_S_FMT** on the OUTPUT queue: `V4L2_PIX_FMT_H264_SLICE` (note: hantro is slice-stateless), based on `vid->h264.profile_idc` and `vid->max_coded`. See libva-v4l2-request-fourier `src/h264.c:699-738` for the control-set pattern.
+3. **VIDIOC_S_FMT** on the CAPTURE queue: `V4L2_PIX_FMT_NV12`, dimensions from `vid->max_coded`.
+4. **Allocate request_fd pool**: pre-allocate N request fds (one per DPB slot + outstanding submits) via `MEDIA_IOC_REQUEST_ALLOC` ioctls (media.c:41).
+5. **VIDIOC_REQBUFS** on OUTPUT + CAPTURE queues to set up buffer count.
+
+So `panvk_video_session` struct shape:
+```c
+struct panvk_video_session {
+   struct vk_video_session vk;       /* shared base */
+   int video_fd;                     /* may share with physical_device */
+   int media_fd;                     /* may share with physical_device */
+   /* per-session V4L2 state */
+   uint32_t bitstream_buffer_count;
+   uint32_t capture_buffer_count;
+   struct {
+      int request_fd;
+      bool in_use;
+      uint32_t dpb_slot;
+   } request_pool[MAX_OUTSTANDING_DECODES];
+};
+```
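The `request_pool` array in the struct above implies an acquire/release discipline. A minimal sketch of that bookkeeping follows, with assumptions stated plainly: the slot type is pulled out into a named struct, `MAX_OUTSTANDING_DECODES` is given a placeholder value, and the function names are hypothetical. The actual `request_fd` values would come from `MEDIA_IOC_REQUEST_ALLOC` at session creation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING_DECODES 4 /* placeholder value for illustration */

/* Named version of the anonymous request_pool element. */
struct panvk_request_slot {
   int request_fd;
   bool in_use;
   uint32_t dpb_slot;
};

/* Claim the first free slot for a decode targeting `dpb_slot`.
 * Returns NULL when every request is still in flight. */
static struct panvk_request_slot *
panvk_request_pool_acquire(struct panvk_request_slot *pool, uint32_t dpb_slot)
{
   for (size_t i = 0; i < MAX_OUTSTANDING_DECODES; i++) {
      if (!pool[i].in_use) {
         pool[i].in_use = true;
         pool[i].dpb_slot = dpb_slot;
         return &pool[i];
      }
   }
   return NULL;
}

/* Return a slot to the pool once its request has completed. The request fd
 * stays open and would be reinitialised before reuse. */
static void
panvk_request_pool_release(struct panvk_request_slot *slot)
{
   slot->in_use = false;
}
```

A `NULL` return from acquire means the submit path has to wait for an outstanding request before it can queue another decode.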
+
+### C.4 Anv session creation shape — full reference
+
+```c
+VkResult anv_CreateVideoSessionKHR(VkDevice _device,
+                                   const VkVideoSessionCreateInfoKHR *pCreateInfo,
+                                   const VkAllocationCallbacks *pAllocator,
+                                   VkVideoSessionKHR *pVideoSession)
+/* anv_video.c:31-55 */
+{
+   ANV_FROM_HANDLE(anv_device, device, _device);
+   struct anv_video_session *vid = vk_alloc2(..., sizeof(*vid), 8, OBJECT);
+   if (!vid) return vk_error(device, VK_ERROR_OUT_OF_HOST_MEMORY);
+   memset(vid, 0, sizeof(*vid));
+   VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
+   if (result != VK_SUCCESS) { vk_free2(..., vid); return result; }
+   *pVideoSession = anv_video_session_to_handle(vid);
+   return VK_SUCCESS;
+}
+```
+
+For us, the body grows by ~15-30 lines for V4L2 setup (open fds, S_FMT, REQBUFS, request_fd pool init) and adds error-rollback paths.
+
+---
+
+## D. Parameters object lifecycle (`VkVideoSessionParametersKHR`)
+
+### D.1 The shared layer does almost everything
+
+`src/vulkan/runtime/vk_video.c:845-885` defines:
+- `vk_common_CreateVideoSessionParametersKHR` (line 846-862)
+- `vk_common_UpdateVideoSessionParametersKHR` (line 865-872)
+- `vk_common_DestroyVideoSessionParametersKHR` (line 875-885)
+
+These delegate to:
+- `vk_video_session_parameters_create` (helper at `vk_video.c:480` — alloc + dispatch by codec op)
+- `vk_video_session_parameters_update` (line 793-844 — switches on `params->op` and calls `update_h264_dec_session_parameters` at line 692 which does the actual SPS/PPS array merge with seq_parameter_set_id collision detection per the spec)
+- `vk_video_session_parameters_destroy`
+
+**Key question**: do panvk-bifrost entrypoints get auto-wired to the `vk_common_*` versions, or does the driver need to opt in?
+
+Mesa's entrypoint generator (`vk_entrypoints_gen.py`) wires shared-helper entrypoints **by default** unless the driver provides a stronger symbol. So if panvk does NOT define `panvk_CreateVideoSessionParametersKHR`, the linker falls through to `vk_common_CreateVideoSessionParametersKHR`. Confirmed by anv comparison: anv defines neither `anv_CreateVideoSessionParametersKHR` nor `anv_UpdateVideoSessionParametersKHR`; both come from `vk_common_*`.
+
+radv DOES override (`radv_video.c:630-647`) but only to call `radv_video_patch_session_parameters` for an AMD-specific fixup. For Phase 1 we don't need that.
+
+**Decision: rely entirely on vk_common.** Zero driver code for parameters object lifecycle.
+
+### D.2 Parameters → V4L2 control conversion happens at CmdDecodeVideo time, not at parameter creation
+
+The shared parameters struct (`vk_video.h:127-195`) for H.264-decode stores SPS array of `struct vk_video_h264_sps` (which embeds `StdVideoH264SequenceParameterSet base`) and PPS array of `struct vk_video_h264_pps` (which embeds `StdVideoH264PictureParameterSet base`). The lookup helpers `vk_video_find_h264_dec_std_sps(params, id)` and `vk_video_find_h264_dec_std_pps(params, id)` (`vk_video.c:1186-1198`) are what we call at decode time to get the SPS/PPS for the current frame.
+
+The V4L2-side bridge from `StdVideoH264SequenceParameterSet` → `struct v4l2_ctrl_h264_sps` is the same conversion fourier does. See `libva-v4l2-request-fourier/src/h264.c:360` for `h264_va_picture_to_v4l2` which marshals to `struct v4l2_ctrl_h264_decode_params`, `v4l2_ctrl_h264_pps`, `v4l2_ctrl_h264_sps` — except the source format on our side is `StdVideoH264*` instead of `VAPictureParameterBufferH264`. The field-name mapping is essentially identical because both `VAPictureParameterBufferH264` and `StdVideoH264SequenceParameterSet` ultimately derive from the H.264 spec's syntax element names.
+
+**We will write `panvk_h264_std_sps_to_v4l2(const StdVideoH264SequenceParameterSet *std, struct v4l2_ctrl_h264_sps *out)` etc.** as a new helper file (~150 lines per codec). This is the bridge function that has no Mesa precedent — it's our novel contribution.
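To make the shape of that bridge concrete, here is a minimal sketch over a handful of fields. Both structs are trimmed stand-ins defined locally for illustration (the real types are `StdVideoH264SequenceParameterSet` and `struct v4l2_ctrl_h264_sps`), and the function name carries a `_sketch` suffix to mark it as hypothetical. The one non-trivial step shown is `level_idc`: the Vulkan std header encodes it as an enum ordinal (`STD_VIDEO_H264_LEVEL_IDC_1_0 = 0` through `STD_VIDEO_H264_LEVEL_IDC_6_2 = 18`), whereas the V4L2 control is assumed here to want the numeric value from the bitstream (for example 41 for level 4.1).

```c
#include <stdint.h>

/* Trimmed stand-in for StdVideoH264SequenceParameterSet (illustration only). */
struct std_h264_sps_subset {
   uint32_t profile_idc; /* enum whose values equal the numeric profile_idc */
   uint32_t level_idc;   /* enum ordinal, NOT the numeric level */
   uint8_t seq_parameter_set_id;
   uint8_t max_num_ref_frames;
   uint32_t pic_width_in_mbs_minus1;
   uint32_t pic_height_in_map_units_minus1;
};

/* Trimmed stand-in for struct v4l2_ctrl_h264_sps (illustration only). */
struct v4l2_h264_sps_subset {
   uint8_t profile_idc;
   uint8_t level_idc;
   uint8_t seq_parameter_set_id;
   uint8_t max_num_ref_frames;
   uint16_t pic_width_in_mbs_minus1;
   uint16_t pic_height_in_map_units_minus1;
};

/* Enum ordinal -> numeric level_idc (levels 1.0 through 6.2). */
static const uint8_t h264_level_idc_table[19] = {
   10, 11, 12, 13, 20, 21, 22, 30, 31, 32,
   40, 41, 42, 50, 51, 52, 60, 61, 62,
};

/* Returns 0 on success, -1 if the level ordinal is out of range. */
static int
panvk_h264_std_sps_to_v4l2_sketch(const struct std_h264_sps_subset *std,
                                  struct v4l2_h264_sps_subset *out)
{
   if (std->level_idc >= sizeof(h264_level_idc_table))
      return -1;

   out->profile_idc = (uint8_t)std->profile_idc;
   out->level_idc = h264_level_idc_table[std->level_idc];
   out->seq_parameter_set_id = std->seq_parameter_set_id;
   out->max_num_ref_frames = std->max_num_ref_frames;
   out->pic_width_in_mbs_minus1 = (uint16_t)std->pic_width_in_mbs_minus1;
   out->pic_height_in_map_units_minus1 =
      (uint16_t)std->pic_height_in_map_units_minus1;
   return 0;
}
```

The remaining fields (bit depths, POC type, flags) follow the same one-to-one pattern; the flags word is the other place where a real conversion, rather than a copy, is needed.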
+
+### D.3 Hooking the parameters cache to ext-control structs at decode time
+
+At `CmdDecodeVideoKHR` recording time, we retrieve the relevant `StdVideoH264SequenceParameterSet *` and `StdVideoH264PictureParameterSet *` via `vk_video_get_h264_parameters` (`vk_video.h:419-425`). The signature:
+```c
+void vk_video_get_h264_parameters(const struct vk_video_session *session,
+                                  const struct vk_video_session_parameters *params,
+                                  const VkVideoDecodeInfoKHR *decode_info,
+                                  const VkVideoDecodeH264PictureInfoKHR *h264_pic_info,
+                                  const StdVideoH264SequenceParameterSet **sps_p,
+                                  const StdVideoH264PictureParameterSet **pps_p);
+```
+Anv uses this at `genX_cmd_video.c:904` in `anv_h264_decode_video`. We do the same.
+
+---
+
+## E. vkCmdDecodeVideoKHR command recording
+
+### E.1 What anv emits at record time vs submit time
+
+**Crucial finding**: anv does ALL work at record time. By the time the cmdbuf goes to the queue, the command stream is fully baked. Look at `anv_h264_decode_video` (`genX_cmd_video.c:892-1300+`): every `anv_batch_emit(&cmd_buffer->batch, GENX(MFX_PIPE_MODE_SELECT), sel)` etc. is a register/packet write into the cmd_buffer's batch buffer. Submit time just kicks the batch.
+
+The Begin/End wrappers are thin:
+- `CmdBeginVideoCodingKHR` (`genX_cmd_video.c:31-50`): stashes `cmd_buffer->video.vid = vid; cmd_buffer->video.params = params;` into command-buffer-local state. **That's it** for H.264 (AV1 adds CDF table init).
+- `CmdControlVideoCodingKHR` (`genX_cmd_video.c:52-74`): if RESET flag, emit `MI_FLUSH_DW` with `VideoPipelineCacheInvalidate = 1`.
+- `CmdEndVideoCodingKHR` (`genX_cmd_video.c:76-83`): clears `cmd_buffer->video.vid = NULL; cmd_buffer->video.params = NULL;`.
+
+The `cmd_buffer->video` shadow state (`anv_private.h:4935-4938`):
+```c
+struct {
+   struct anv_video_session *vid;
+   struct vk_video_session_parameters *params;
+} video;
+```
+
+### E.2 For our V4L2 backend — "deferred record"
+
+The V4L2 ioctls cannot meaningfully happen at record time, because:
+1. The bitstream buffer (frame_info->srcBuffer) is a `VkBuffer` we don't necessarily know the contents of yet (might be filled by a prior submitted cmdbuf or by host writes between record and submit).
+2. Request_fd allocation and S_EXT_CTRLS need to be sequential per submit (cannot pre-bind a request_fd to a recorded cmdbuf and reuse it).
+
+**Pattern: per-cmdbuf list of "video decode ops" recorded during CmdDecodeVideoKHR.** The op captures everything we need to replay at submit time:
+
+```c
+struct panvk_video_decode_op {
+   /* From CmdBegin */
+   struct panvk_video_session *session;
+   struct vk_video_session_parameters *params;
+   /* From CmdDecode */
+   VkBuffer src_buffer;        /* bitstream source */
+   VkDeviceSize src_offset;
+   VkDeviceSize src_size;
+   /* DPB target */
+   struct panvk_image_view *dst_iv;
+   uint32_t dst_dpb_slot;
+   /* Resolved SPS/PPS, copied by value (shallow: pointer members still alias params storage) */
+   StdVideoH264SequenceParameterSet sps;
+   StdVideoH264PictureParameterSet pps;
+   /* H.264 slice info, picked apart at submit time */
+   StdVideoDecodeH264PictureInfo std_pic_info;
+   /* Reference slot info — small array, copy by value */
+   uint32_t reference_slot_count;
+   struct panvk_video_ref_slot reference_slots[16];
+};
+
+struct panvk_cmd_buffer {
+   ...
+   struct util_dynarray video_decode_ops;   /* of struct panvk_video_decode_op */
+};
+```
+
+Then submit-time (per B.4) walks the dynarray and does the ioctl dance per op.
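The record-then-replay split can be illustrated without any Mesa types. The sketch below uses a plain growable array in place of `util_dynarray` and a trimmed op struct; every name in it is hypothetical, and the replay callback stands in for the per-op ioctl sequence from B.4.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Trimmed stand-in for struct panvk_video_decode_op. */
struct decode_op {
   uint64_t src_offset;
   uint64_t src_size;
   uint32_t dst_dpb_slot;
};

struct decode_op_list {
   struct decode_op *ops;
   size_t count;
   size_t capacity;
};

/* Record time: append a copy of the op. Returns 0, or -1 on allocation failure. */
static int
decode_op_list_record(struct decode_op_list *list, const struct decode_op *op)
{
   if (list->count == list->capacity) {
      size_t cap = list->capacity ? list->capacity * 2 : 4;
      struct decode_op *grown = realloc(list->ops, cap * sizeof(*grown));
      if (!grown)
         return -1;
      list->ops = grown;
      list->capacity = cap;
   }
   list->ops[list->count++] = *op;
   return 0;
}

typedef int (*decode_op_fn)(const struct decode_op *op, void *ctx);

/* Submit time: replay ops in recording order, stopping at the first failure. */
static int
decode_op_list_replay(const struct decode_op_list *list, decode_op_fn fn, void *ctx)
{
   for (size_t i = 0; i < list->count; i++) {
      int ret = fn(&list->ops[i], ctx);
      if (ret != 0)
         return ret;
   }
   return 0;
}

/* Command buffer reset or destroy. */
static void
decode_op_list_reset(struct decode_op_list *list)
{
   free(list->ops);
   list->ops = NULL;
   list->count = list->capacity = 0;
}

/* Example callback: logs the destination slot of each replayed op. */
struct slot_log { uint32_t slots[16]; size_t n; };

static int
log_dst_slot(const struct decode_op *op, void *ctx)
{
   struct slot_log *log = ctx;
   if (log->n == 16)
      return -1;
   log->slots[log->n++] = op->dst_dpb_slot;
   return 0;
}
```

Because the list survives until reset, a command buffer submitted more than once replays the same ops each time, which matches how a baked GPU batch behaves.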
+
+Comparable record-time op-list pattern exists today for sparse binds (`panvk_sparse.c`). Anv stores per-cmdbuf state in `cmd_buffer->video` but doesn't queue up ops because it emits direct register packets. We're doing what anv would do if anv ran on a separate kernel device.
+
+### E.3 CmdBegin/Control/End for our backend
+
+- `panvk_per_arch(CmdBeginVideoCodingKHR)`: stash `cmd_buffer->video_decode_session = vid; cmd_buffer->video_decode_params = params;`. Optionally validate that the reference slot layout matches the dpb_slot count we set up at session init.
+- `panvk_per_arch(CmdControlVideoCodingKHR)` for `VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR`: this needs to translate to `MEDIA_REQUEST_IOC_REINIT` on all pooled request_fds, or alternatively just set a session-wide flag meaning "next decode needs fresh request setup". In Phase 1 we can no-op this if we always reinit per submit anyway.
+- `panvk_per_arch(CmdEndVideoCodingKHR)`: clear shadow state. No emission needed.
+
+---
+
+## F. DPB management
+
+### F.1 Vulkan-side DPB model
+
+Per-frame `vkCmdDecodeVideoKHR` receives:
+- `frame_info->dstPictureResource` — `VkVideoPictureResourceInfoKHR { codedOffset, codedExtent, baseArrayLayer, imageViewBinding }`. The image view that will receive the decoded output.
+- `frame_info->pSetupReferenceSlot` — `VkVideoReferenceSlotInfoKHR { slotIndex, pPictureResource }`. Says "this decoded frame becomes DPB slot N".
+- `frame_info->pReferenceSlots[]` — references TO read from. Each carries `slotIndex` + `pPictureResource`.
+
+For H.264, additionally:
+- `pNext` chain `VkVideoDecodeH264PictureInfoKHR { pStdPictureInfo, sliceCount, pSliceOffsets }`
+- DPB slot pNext per reference: `VkVideoDecodeH264DpbSlotInfoKHR { pStdReferenceInfo }` — contains POC/short-term/long-term flags.
+
+Anv's reference assembly logic at `genX_cmd_video.c:992-1004`:
+```c
+for (unsigned i = 0; i < frame_info->referenceSlotCount; i++) {
+   const struct anv_image_view *ref_iv = anv_image_view_from_handle(
+      frame_info->pReferenceSlots[i].pPictureResource->imageViewBinding);
+   int idx = frame_info->pReferenceSlots[i].slotIndex;
+   ...
+   dpb_slots[idx] = i;
+   buf.ReferencePictureAddress[i] = anv_image_dpb_address(ref_iv, baseArrayLayer);
+}
+```
+
+### F.2 V4L2 DPB model
+
+`v4l2_ctrl_h264_decode_params::dpb[16]` is an array of `struct v4l2_h264_dpb_entry { reference_ts, pic_num, frame_num, fields, flags, top_field_order_cnt, bottom_field_order_cnt }`. Each entry's `reference_ts` is the timestamp used at VIDIOC_QBUF of the OUTPUT (bitstream) plane when that reference was decoded — V4L2 uses this as the "buffer identity" key.
+
+So the mapping rule from Vulkan-side `VkVideoReferenceSlotInfoKHR[]` to V4L2-side `dpb[16]` is:
+
+| Vulkan field | V4L2 dpb field | How to source |
+|---|---|---|
+| `pReferenceSlots[i].slotIndex` | array index in `dpb[]` | direct (assert `< 16`, since `dpb[]` has 16 entries) |
+| `pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[0]` | `top_field_order_cnt` | direct |
+| `pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[1]` | `bottom_field_order_cnt` | direct |
+| `pReferenceSlots[i].pNext->pStdReferenceInfo->FrameNum` | `frame_num` | direct |
+| short-term/long-term flag | `flags` | direct |
+| (the decoded output VkImage backing the ref slot) | `reference_ts` | **lookup**: we maintain a `slotIndex → reference_ts` map per-session, populated each time we decode into that slot. See libva-fourier `src/h264.c:140-218` for `dpb_insert`/`dpb_update`/`dpb_find_entry`. Our case is simpler: slotIndex is provided by Vulkan, we just need to track "what ts did I QBUF when I last decoded into slotIndex N". |
+
+The fourier `src/h264.c:238-353` `h264_fill_dpb` function is the closest analog — it constructs `struct v4l2_h264_dpb_entry[]` from libva-side state. We do the analog but feed it from `pReferenceSlots[]`.
+
+### F.3 Bookkeeping struct in panvk_video_session
+
+```c
+struct panvk_video_session {
+   ...
+   struct {
+      uint64_t reference_ts;         /* timestamp last used when decoding into this slot */
+      struct panvk_image *image;     /* the VkImage backing this slot's DPB */
+      uint32_t array_layer;
+      bool active;
+   } dpb[16];
+};
+```
+
+Update at decode-completion time (after VIDIOC_DQBUF) for the setup-reference-slot.
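The `slotIndex → reference_ts` bookkeeping and the table in F.2 can be sketched together. The structs below are trimmed stand-ins (the real V4L2 type is `struct v4l2_h264_dpb_entry`, and the reference info would come from `StdVideoDecodeH264ReferenceInfo`); all names are hypothetical. The timestamp is treated as an opaque 64-bit value that must equal the one passed at `VIDIOC_QBUF` for the decode that produced the slot.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define H264_DPB_ENTRIES 16

/* Per-session state: what timestamp was last decoded into each slot. */
struct dpb_slot_state {
   uint64_t reference_ts;
   bool active;
};

/* Trimmed stand-in for one pReferenceSlots[] element plus its H.264 pNext. */
struct ref_slot_subset {
   int32_t slot_index;
   uint16_t frame_num;
   int32_t pic_order_cnt[2];
};

/* Trimmed stand-in for struct v4l2_h264_dpb_entry. */
struct dpb_entry_subset {
   uint64_t reference_ts;
   uint16_t frame_num;
   int32_t top_field_order_cnt;
   int32_t bottom_field_order_cnt;
   bool valid;
};

/* Called after the decode into `slot` has completed. */
static void
dpb_note_decoded(struct dpb_slot_state *dpb, uint32_t slot, uint64_t ts)
{
   if (slot < H264_DPB_ENTRIES) {
      dpb[slot].reference_ts = ts;
      dpb[slot].active = true;
   }
}

/* Fill out[] (indexed by slotIndex, as in the table above) from the
 * reference list. Returns 0, or -1 if a reference names a slot that is out
 * of range or has never been decoded into. */
static int
dpb_fill_entries(const struct dpb_slot_state *dpb,
                 const struct ref_slot_subset *refs, uint32_t ref_count,
                 struct dpb_entry_subset *out)
{
   memset(out, 0, H264_DPB_ENTRIES * sizeof(*out));
   for (uint32_t i = 0; i < ref_count; i++) {
      int32_t slot = refs[i].slot_index;
      if (slot < 0 || slot >= H264_DPB_ENTRIES || !dpb[slot].active)
         return -1;
      out[slot].reference_ts = dpb[slot].reference_ts;
      out[slot].frame_num = refs[i].frame_num;
      out[slot].top_field_order_cnt = refs[i].pic_order_cnt[0];
      out[slot].bottom_field_order_cnt = refs[i].pic_order_cnt[1];
      out[slot].valid = true;
   }
   return 0;
}
```

Rejecting references to never-decoded slots at this point keeps a malformed reference list from reaching the kernel with a zero timestamp, which could otherwise match an unrelated buffer.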
+
+---
+
+## G. Memory + dmabuf interop
+
+### G.1 The challenge
+
+App creates a `VkImage` with `VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT`. Memory is bound via normal `vkBindImageMemory`. Then the decoded frame data needs to physically end up in that memory backing.
+
+Hantro's CAPTURE queue allocates its own buffers via `VIDIOC_REQBUFS(memory=V4L2_MEMORY_MMAP)` or accepts dma_buf imports via `VIDIOC_REQBUFS(memory=V4L2_MEMORY_DMABUF)`. The clean path: **app's VkImage memory backing IS a dma_buf**, exported from panvk via `vkGetMemoryFdKHR`, which we then `VIDIOC_QBUF` with the dma_buf fd as the CAPTURE plane.
+
+But Vulkan apps don't usually export memory back to themselves. They expect `vkCreateImage(usage=VIDEO_DECODE_DST)` to "just work". So **we** drive the dma_buf flow internally.
+
+### G.2 Internal dma_buf flow (proposed)
+
+Two strategies:
+
+**Strategy A: Driver-allocated CAPTURE buffers, app-imported into VkImage**
+- VIDIOC_REQBUFS(MMAP) at session create.
+- VIDIOC_EXPBUF to get a dma_buf fd per allocated buffer.
+- Import the dma_buf back into pan_kmod as a VkDeviceMemory equivalent.
+- `vkBindImageMemory` to that device memory.
+
+**Strategy B: App-allocated VkImage, V4L2_MEMORY_DMABUF queue**
+- App calls vkCreateImage with VkExternalMemoryImageCreateInfo handleTypes=DMA_BUF.
+- panvk allocates the BO via pan_kmod and exports a dma_buf fd via `pan_kmod_bo_export` (`panvk_device_memory.c:387-404`).
+- VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF, fd=our_dmabuf_fd) at submit time.
+
+**Strategy B is what fourier does for surface buffers, and it's the cleaner fit** — the app gets a real VkImage with real VkDeviceMemory, we never have to fake the import direction. Phase 1 may want to start with Strategy A for simplicity since vk-video-samples likely doesn't pass `VkExternalMemoryImageCreateInfo` flags, but Strategy B is the long-term right answer.
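+
+For Strategy B, the per-submit work on the CAPTURE side is mostly filling one `struct v4l2_buffer`. The sketch below assumes the single-planar buffer type, matching the `m.fd` notation used elsewhere in this document; a driver that only exposes the `_MPLANE` buffer types takes the fd in `m.planes[i].m.fd` instead. The helper name `fill_capture_dmabuf` is hypothetical.
+
+```c
+#include <assert.h>
+#include <string.h>
+#include <linux/videodev2.h>
+
+/* Prepare a CAPTURE buffer descriptor that points at an imported dma_buf.
+ * The caller passes the result to ioctl(video_fd, VIDIOC_QBUF, buf). */
+static void
+fill_capture_dmabuf(struct v4l2_buffer *buf, unsigned index, int dmabuf_fd)
+{
+   assert(dmabuf_fd >= 0);
+   memset(buf, 0, sizeof(*buf));
+   buf->type = V4L2_BUF_TYPE_VIDEO_CAPTURE; /* assumption: single-planar */
+   buf->memory = V4L2_MEMORY_DMABUF;
+   buf->index = index;                      /* slot in the REQBUFS'd queue */
+   buf->m.fd = dmabuf_fd;                   /* fd from the pan_kmod export */
+}
+```
+
+The queue must have been set up with `VIDIOC_REQBUFS(memory=V4L2_MEMORY_DMABUF)` for this to be accepted, so the choice between Strategy A and B is fixed at session create, not per submit.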
+
+### G.3 Anv's DPB image allocation
+
+Anv treats DPB images as plain VkImages — no special allocation. The HW reads them directly via `anv_image_dpb_address(iv, baseArrayLayer)` at `genX_cmd_video.c:933`. Memory layout is whatever ISL gives them (tile-Y or planar-420). For our backend, that doesn't transfer — the Hantro VPU expects NV12 in a linear layout (or a vendor-specific tiled layout that we'd need to expose; for Phase 1 we mandate linear).
+
+### G.4 panvk dmabuf entry points (already present)
+
+- `panvk_AllocateMemory` handles `VkImportMemoryFdInfoKHR` at `panvk_device_memory.c:121-135` — calls `pan_kmod_bo_import`.
+- `panvk_GetMemoryFdKHR` at `panvk_device_memory.c:387-404` exports.
+- `EXT_external_memory_dma_buf` already advertised at `panvk_vX_physical_device.c:146`.
+
+So the building blocks exist. The new code is the **session-internal V4L2 buffer pool** that converts between V4L2_MEMORY_MMAP/DMABUF and pan_kmod BOs.
+
+---
+
+## H. vk_video runtime helper coverage matrix
+
+What we inherit vs what we write. Cross-referenced from sections A–G:
+
+| Question | Inherit from vk_video shared layer? | Driver writes? |
+|---|---|---|
+| A. KHR_video_* extension booleans | No | YES — `panvk_vX_physical_device.c` table |
+| A. videoMaintenance1 feature struct | No | (Phase 1: skip; future: yes if advertised) |
+| A. GetPhysicalDeviceVideoCapabilitiesKHR | **NO** — direct entrypoint | YES — new code in `panvk_physical_device.c` |
+| A. GetPhysicalDeviceVideoFormatPropertiesKHR | **NO** — direct entrypoint | YES — new code in `panvk_physical_device.c` |
+| B. Queue family enum + props | No | YES — `panvk_device.h` + `panvk_physical_device.c` |
+| B. Queue-family-video pNext walk | No | YES — extend `panvk_GetPhysicalDeviceQueueFamilyProperties2` |
+| B. Queue create/destroy dispatch | No | YES — extend `panvk_vX_device.c:305-329` |
+| B. Queue submit | No | YES — new `panvk_vX_video_decode_queue.c` |
+| C. CreateVideoSessionKHR — handle + base init | YES partial: `vk_video_session_init` does the codec-op parsing | YES — driver wraps, adds V4L2 fd open + S_FMT + REQBUFS |
+| C. DestroyVideoSessionKHR — base finish | YES partial: `vk_video_session_finish` | YES — driver wraps, adds V4L2 teardown |
+| C. GetVideoSessionMemoryRequirementsKHR | No | YES (trivial: zero entries) |
+| C. BindVideoSessionMemoryKHR | No | YES (trivial: no-op) |
+| D. CreateVideoSessionParametersKHR | **YES — `vk_common_CreateVideoSessionParametersKHR` (vk_video.c:846)** | NO driver code needed |
+| D. UpdateVideoSessionParametersKHR | **YES — `vk_common_UpdateVideoSessionParametersKHR` (vk_video.c:865)** | NO driver code needed |
+| D. DestroyVideoSessionParametersKHR | **YES — `vk_common_DestroyVideoSessionParametersKHR` (vk_video.c:875)** | NO driver code needed |
+| D. H.264 SPS/PPS storage | **YES — `struct vk_video_h264_{sps,pps}` (vk_video.h:32-43)** | NO |
+| D. H.264 SPS/PPS lookup | **YES — `vk_video_find_h264_dec_std_{sps,pps}` (vk_video.c:1186)** | NO |
+| D. H.264 params merge with dedup | **YES — internal to `vk_video_session_parameters_update`** | NO |
+| D. Std → V4L2 control marshalling | No precedent in Mesa | YES — NEW helper file (~300 lines for H.264) |
+| E. CmdBeginVideoCodingKHR | No | YES — trivial state-stash |
+| E. CmdControlVideoCodingKHR | No | YES — trivial RESET handling |
+| E. CmdEndVideoCodingKHR | No | YES — trivial state-clear |
+| E. CmdDecodeVideoKHR | No | YES — record op into cmdbuf dynarray |
+| E. `vk_video_get_h264_parameters` resolver | **YES (vk_video.h:419)** | NO |
+| F. DPB slot ↔ reference_ts map | No | YES — `panvk_video_session.dpb[16]` |
+| F. H.264 reference list construction | Partially: `vk_fill_video_h264_*` helpers if present | YES — but mostly direct field copies |
+| G. dmabuf BO import/export | YES — existing panvk path (`panvk_device_memory.c:121,387`) | NO new code |
+| G. V4L2 buffer ↔ pan_kmod_bo bridging | No precedent | YES — NEW helper file |
+| G. Image creation for VIDEO_DECODE_DST | YES — existing `panvk_image_init` (panvk_image.c:562) handles all usage flags through ISL | Possibly yes for tile mode restrictions |
+
+**Net leverage**: ~3000 lines of vk_video runtime helpers we inherit for free, primarily the H.264 SPS/PPS bitstream parsing + parameters object lifecycle + std/find helpers. Our new-code estimate is roughly 800-1500 lines split across ~4 new files (see I).
+
+---
+
+## I. panvk-specific integration points (concrete edits)
+
+### I.1 Existing files to modify
+
+**`src/panfrost/vulkan/panvk_vX_physical_device.c`**:
+- Lines ~123-124 (between `KHR_vertex_attribute_divisor` and `KHR_vulkan_memory_model`): add `.KHR_video_queue = true,`, `.KHR_video_decode_queue = true,`, `.KHR_video_decode_h264 = true,` (gated on hantro probe).
+- Optional Phase 2+: at line 540, flip `unifiedImageLayoutsVideo` based on session config.
+
+**`src/panfrost/vulkan/panvk_physical_device.c`**:
+- Line ~565: extend the `qfamily_props[]` array — add a third entry for `PANVK_QUEUE_FAMILY_VIDEO_DECODE` with `queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT`.
+- Around line 589 inside the `vk_outarray_append_typed` loop: add a pNext walk for `VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR` that sets `videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR`.
+- ADD new entrypoints `panvk_GetPhysicalDeviceVideoCapabilitiesKHR` and `panvk_GetPhysicalDeviceVideoFormatPropertiesKHR` at end of file (~70 lines + ~50 lines).
+
+**`src/panfrost/vulkan/panvk_device.h`**:
+- Line 46-48: add `PANVK_QUEUE_FAMILY_VIDEO_DECODE,` to the enum.
+
+**`src/panfrost/vulkan/panvk_vX_device.c`**:
+- Lines 218-247 (`check_global_priority`): add `case PANVK_QUEUE_FAMILY_VIDEO_DECODE: return VK_SUCCESS;`.
+- Lines 253-258 (`panvk_queue_check_status`): add case for the new family calling `panvk_per_arch(video_decode_queue_check_status)`.
+- Lines 305-313 (`panvk_queue_create`): add case calling `panvk_per_arch(create_video_decode_queue)`.
+- Lines 320-329 (`panvk_queue_destroy`): symmetric.
+
+**`src/panfrost/vulkan/meson.build`**:
+- Add new files to either `libpanvk_files` (arch-agnostic) or `common_per_arch_files` (arch-templated). The session/queue/command-record code is arch-agnostic and would use `panvk_per_arch()` symbols only by convention, so in Phase 1 we can place all new files in `libpanvk_files` and skip the per_arch dispatch.
+
+### I.2 New files to add
+
+**`src/panfrost/vulkan/panvk_video_decode.c`** (~400 lines):
+- `panvk_CreateVideoSessionKHR`
+- `panvk_DestroyVideoSessionKHR`
+- `panvk_GetVideoSessionMemoryRequirementsKHR` (returns count=0)
+- `panvk_BindVideoSessionMemoryKHR` (no-op)
+- `panvk_CmdBeginVideoCodingKHR`
+- `panvk_CmdControlVideoCodingKHR`
+- `panvk_CmdEndVideoCodingKHR`
+- `panvk_CmdDecodeVideoKHR` (record op into `cmd_buffer->video_decode_ops`)
+
+**`src/panfrost/vulkan/panvk_video_decode.h`**:
+- `struct panvk_video_session`
+- `struct panvk_video_decode_op`
+- `struct panvk_video_decode_queue`
+
+**`src/panfrost/vulkan/panvk_v4l2.c`** (~500 lines):
+- `panvk_v4l2_probe_hantro()` — finds /dev/video1 and /dev/media0 (mirrors libva-v4l2-request-fourier `src/request.c:143-308` `find_decoder_video_node_via_topology`).
+- `panvk_v4l2_session_init()` — S_FMT on OUTPUT/CAPTURE, REQBUFS, request_fd pool alloc.
+- `panvk_v4l2_h264_std_to_ctrl_sps()` — `StdVideoH264SequenceParameterSet *` → `struct v4l2_ctrl_h264_sps`.
+- `panvk_v4l2_h264_std_to_ctrl_pps()` — `StdVideoH264PictureParameterSet *` → `struct v4l2_ctrl_h264_pps`.
+- `panvk_v4l2_h264_fill_decode_params()` — build `struct v4l2_ctrl_h264_decode_params` from VkVideoDecodeInfoKHR + slot map.
+- `panvk_v4l2_submit_op()` — the request_fd / S_EXT_CTRLS / QBUF / poll / DQBUF dance for one op.
+
+**`src/panfrost/vulkan/panvk_vX_video_decode_queue.c`** (~150 lines, per_arch):
+- `panvk_per_arch(create_video_decode_queue)`
+- `panvk_per_arch(destroy_video_decode_queue)`
+- `panvk_per_arch(video_decode_queue_submit)` — walks cmdbuf ops, calls `panvk_v4l2_submit_op` per op.
+- `panvk_per_arch(video_decode_queue_check_status)`
+
+### I.3 Entrypoint generation
+
+Recall from `meson.build:7-19` that entrypoints are auto-wired with `--prefix panvk` and per-arch prefixes. The names above (`panvk_CmdDecodeVideoKHR` etc.) match the auto-resolution rules — no changes needed in `vk_entrypoints_gen` invocation.
+
+For the per-arch ones (`panvk_per_arch(...)`), we expand under each `PAN_ARCH` define just like existing per-arch code.
+
+---
+
+## J. Probable architecture sketch
+
+**V4L2 fd ownership**: at `panvk_physical_device` level for probe-time discovery (`panvk_v4l2_probe_hantro` sets `phys_dev->v4l2.video_fd_present = true` and stashes paths), but actual `open()` happens at `panvk_CreateVideoSessionKHR` time per-session. Two reasons: (1) the V4L2 driver state is per-fd, so two concurrent sessions need two separate fds anyway; (2) keeping fds closed when no video session is active is good citizenship. The PhysicalDevice only holds device-node paths and capability flags.
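+
+A minimal sketch of that split, assuming hypothetical stand-in types (`v4l2_nodes` for what the physical device caches, `session_fds` for what each session owns). Neither type nor the helper names exist in panvk.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <assert.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+/* Hypothetical: what the physical device keeps after the probe. Paths only. */
+struct v4l2_nodes {
+   const char *video_path; /* e.g. "/dev/video1" */
+   const char *media_path; /* e.g. "/dev/media0" */
+};
+
+/* Hypothetical: fds owned by one video session. */
+struct session_fds {
+   int video_fd;
+   int media_fd;
+};
+
+/* Open both nodes for one session. Returns 0 on success, or -1 on failure
+ * with both fields set to -1 and nothing left open. */
+static int
+session_open(const struct v4l2_nodes *nodes, struct session_fds *fds)
+{
+   assert(nodes && fds);
+   fds->media_fd = -1;
+   fds->video_fd = open(nodes->video_path, O_RDWR | O_CLOEXEC);
+   if (fds->video_fd < 0)
+      return -1;
+   fds->media_fd = open(nodes->media_path, O_RDWR | O_CLOEXEC);
+   if (fds->media_fd < 0) {
+      close(fds->video_fd);
+      fds->video_fd = -1;
+      return -1;
+   }
+   return 0;
+}
+
+/* Close whatever the session holds; safe to call on a failed open. */
+static void
+session_close(struct session_fds *fds)
+{
+   if (fds->video_fd >= 0)
+      close(fds->video_fd);
+   if (fds->media_fd >= 0)
+      close(fds->media_fd);
+   fds->video_fd = fds->media_fd = -1;
+}
+```
+
+Two sessions calling `session_open` with the same paths get distinct fds, which is what gives each session its own V4L2 driver state.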
+
+**Per-session V4L2 state**: `struct panvk_video_session` (see C.3) owns one `video_fd` + one `media_fd` + a pool of `request_fd`s (one per max-in-flight decode, typically `max_dpb_slots + 2`). At `CreateVideoSession` we S_FMT both queues, REQBUFS to allocate the buffer count, EXPBUF the CAPTURE buffers to dma_bufs that get held in the session for later association with VkImage memory (Strategy A from G.2).
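+
+The request_fd pool can be sketched as a fixed array with a busy flag per entry. This is an illustration under assumptions: the names `request_pool`, `request_pool_acquire` and `request_pool_release` are hypothetical, and the actual `MEDIA_IOC_REQUEST_ALLOC` / `MEDIA_REQUEST_IOC_REINIT` ioctls are left to the caller so the sketch stays free of device access.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+
+#define POOL_MAX 18 /* 16 DPB slots + 2, per the sizing rule in the text */
+
+struct request_pool {
+   int fd[POOL_MAX];    /* request fds; -1 until allocated by the caller */
+   bool busy[POOL_MAX]; /* true while a decode using this fd is in flight */
+   unsigned size;       /* entries in use: max_dpb_slots + 2 */
+};
+
+static void
+request_pool_init(struct request_pool *p, unsigned max_dpb_slots)
+{
+   p->size = max_dpb_slots + 2;
+   assert(p->size <= POOL_MAX);
+   for (unsigned i = 0; i < POOL_MAX; i++) {
+      p->fd[i] = -1;
+      p->busy[i] = false;
+   }
+}
+
+/* Return a free pool index, or -1 if every request is in flight.
+ * *needs_alloc is true when the entry has no fd yet (caller allocates one
+ * and stores it in p->fd[idx]); false means reuse after a reinit. */
+static int
+request_pool_acquire(struct request_pool *p, bool *needs_alloc)
+{
+   for (unsigned i = 0; i < p->size; i++) {
+      if (!p->busy[i]) {
+         p->busy[i] = true;
+         *needs_alloc = p->fd[i] < 0;
+         return (int)i;
+      }
+   }
+   return -1;
+}
+
+/* Mark an entry reusable once its decode has completed. */
+static void
+request_pool_release(struct request_pool *p, int idx)
+{
+   assert(idx >= 0 && (unsigned)idx < p->size);
+   p->busy[idx] = false;
+}
+```
+
+Because the Phase 1 submit path waits for each op before starting the next, one entry would be enough there; the larger pool only matters once decodes are allowed to overlap.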
+
+**Per-VkImage dmabuf bookkeeping**: the existing pan_kmod export path (`panvk_device_memory.c:387-404`) gives us dma_buf out. The new piece is the inverse — at `vkBindImageMemory` time for a `VkImage` whose `usage & VIDEO_DECODE_DST`, we'd register the underlying BO's dma_buf as a CAPTURE buffer with `VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF)`. The image's `panvk_image` struct gains a `int v4l2_capture_index;` field.
+
+**Submit-time dispatch**: the switch at `panvk_vX_device.c:305-313` is extended to route `PANVK_QUEUE_FAMILY_VIDEO_DECODE` to `panvk_per_arch(create_video_decode_queue)`, which installs `driver_submit = panvk_per_arch(video_decode_queue_submit)`. The submit function walks each cmdbuf's `video_decode_ops` dynarray, and per op:
+
+```
+1. resolve request_fd from session pool (allocate or reuse, ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC))
+2. media_request_reinit(request_fd) if reusing
+3. translate op->sps to v4l2_ctrl_h264_sps via panvk_v4l2_h264_std_to_ctrl_sps()
+4. translate op->pps to v4l2_ctrl_h264_pps via panvk_v4l2_h264_std_to_ctrl_pps()
+5. build v4l2_ctrl_h264_decode_params from op (including dpb[] from session->dpb[] tracking)
+6. VIDIOC_S_EXT_CTRLS(video_fd, request_fd=op->request_fd, {SPS, PPS, DECODE_PARAMS, SCALING_MATRIX, SLICE_PARAMS})
+7. VIDIOC_QBUF(video_fd, OUTPUT, request_fd=op->request_fd, bytesused=op->src_size, m.fd=op->src_buffer's bo dma_buf)
+8. VIDIOC_QBUF(video_fd, CAPTURE, index=op->dst_iv->image->v4l2_capture_index)
+9. MEDIA_REQUEST_IOC_QUEUE(request_fd)
+10. poll(request_fd, POLLPRI, timeout)
+11. VIDIOC_DQBUF(video_fd, OUTPUT)  /* releases input slot */
+12. VIDIOC_DQBUF(video_fd, CAPTURE) /* output ready */
+13. Update session->dpb[op->dst_dpb_slot].reference_ts to the QBUF timestamp
+14. Signal vk_queue_submit's signal semaphores
+```
+
+Steps 5-12 are exactly the libva-v4l2-request-fourier `RequestEndPicture` body (`src/picture.c:497-650`). The mapping VAPicture* → V4L2 vs Std* → V4L2 is the one piece of code that has no Mesa precedent — we're inventing the bridge — but it's bounded: ~150 lines per codec (we only need H.264 in Phase 1).
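+
+Step 13 depends on one V4L2 detail: `reference_ts` is the OUTPUT buffer's `struct timeval` timestamp expressed in nanoseconds (the kernel header offers `v4l2_timeval_to_ns()` for this). The sketch below shows the same arithmetic in portable form, plus one possible way to derive a unique timestamp per decode from a frame counter. Both helper names are hypothetical, and the counter scheme is an assumption, not something the kernel requires.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+#include <sys/time.h>
+
+/* Convert a buffer timestamp to the nanosecond value used as reference_ts. */
+static uint64_t
+timeval_to_ns(const struct timeval *tv)
+{
+   assert(tv->tv_usec >= 0 && tv->tv_usec < 1000000);
+   return (uint64_t)tv->tv_sec * 1000000000ull +
+          (uint64_t)tv->tv_usec * 1000ull;
+}
+
+/* Assumption: use a monotonically increasing per-session frame counter as
+ * the timestamp, one microsecond per frame, so every decode is unique. */
+static struct timeval
+frame_to_timeval(uint64_t frame_index)
+{
+   struct timeval tv;
+   tv.tv_sec = (time_t)(frame_index / 1000000);
+   tv.tv_usec = (suseconds_t)(frame_index % 1000000);
+   return tv;
+}
+```
+
+The value stored in `session->dpb[slot].reference_ts` must come from the same `timeval` that was set on the OUTPUT buffer at QBUF time, otherwise later frames cannot find their references.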
+
+---
+
+## Mesa-version observations and risks
+
+- Mesa 26.0.6 is the campaign baseline. The vk_video runtime helpers in `src/vulkan/runtime/vk_video.{c,h}` are stable in this version with H.264, H.265, AV1, VP9, encode-h264, encode-h265, encode-av1 all covered. No upgrade required for Phase 1.
+- `KHR_video_decode_h264` spec v9 is what's in `vk_api.xml` for 26.0.6, confirmed by the extension already being known to the entrypoint generator (no `--beta` flag needed; that flag at `meson.build:18` is for beta/provisional extensions only).
+- Maintenance1/2 features are NOT required for the simple-test in Phase 1, so we don't need `videoMaintenance1` / `videoMaintenance2` machinery yet. Maintenance1 (inline parameters, inline queries) becomes relevant in Phase 6+ if we want to pass conformance suites.
+- The `unifiedImageLayoutsVideo` feature at `panvk_vX_physical_device.c:540` is currently false. In Phase 1 we can leave it false, because the test client honors explicit `VkImageMemoryBarrier` transitions to/from `VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR`.
+
+---
+
+## Architectural maps that DO cleanly transfer from anv/radv
+
+1. **Session as wrapper around `vk_video_session`**. Anv: `struct anv_video_session { struct vk_video_session vk; ... }`. radv: same shape. Ours: same shape. The `vk.` namespace gives us all the spec-mandated session fields for free.
+2. **Parameters fully delegated to `vk_common_*`**. Anv does this, radv mostly does this (with a tiny `radv_video_patch_session_parameters` patch). Ours: full delegation.
+3. **Cmdbuf-local shadow state for current session+params during the Begin..End scope**. Anv: `cmd_buffer->video.{vid,params}`. We do the same.
+4. **DPB slot index ↔ image view lookup at decode time**. Both anv and our backend do this lookup per frame.
+
+## Architectural maps that DO NOT transfer
+
+1. **Driver-allocated session scratch memory (`anv_vid_mem` array)**. Hantro VPU keeps scratch internal; we return zero memory requirements. Hard skip — not just simplification, an inversion.
+2. **`anv_batch_emit` register packets directly into cmdbuf at record time**. There is no equivalent. We MUST defer to submit-time — that's the entire point of the V4L2 backend being on a separate kernel device.
+3. **`anv_image_dpb_address(iv, layer)` resolving to a GPU virtual address**. Our DPB references resolve to V4L2 buffer indices (queued at session-init) or dma_buf fds (Strategy B). The "address" abstraction doesn't apply; the VPU doesn't share the GPU's address space.
+4. **MFX/HCP/VDENC register-set knowledge in `genX_cmd_video.c`** — 4000+ lines of Intel-specific HW programming. Completely irrelevant. The Hantro VPU's "programming" is a sequence of struct `v4l2_ctrl_*` fills + ioctls.
+5. **MOCS / cache state in pipe-buf-addr-state** (`genX_cmd_video.c:962+`). N/A — the kernel V4L2 driver handles all cache coherency at QBUF/DQBUF boundaries.
+
+---
+
+## Phase 1 success criteria — final checklist
+
+| vk-video-samples simple-test step | Where it lands in this map |
+|---|---|
+| `vkGetPhysicalDeviceQueueFamilyProperties2` returns family with `VK_QUEUE_VIDEO_DECODE_BIT_KHR` and `VkQueueFamilyVideoPropertiesKHR::videoCodecOperations & VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR` set | B.2 |
+| `vkEnumerateDeviceExtensionProperties` returns the three KHR_video_* | A.1 |
+| `vkGetPhysicalDeviceVideoCapabilitiesKHR(profile=H264)` returns sane caps | A.3 |
+| `vkGetPhysicalDeviceVideoFormatPropertiesKHR` returns NV12 | A.4 |
+| `vkCreateDevice` succeeds with the video queue family selected | B.3 |
+| `vkCreateVideoSessionKHR` succeeds | C |
+| `vkGetVideoSessionMemoryRequirementsKHR` returns 0 entries | C.3 |
+| `vkCreateVideoSessionParametersKHR` with SPS+PPS succeeds | D (free from vk_common) |
+| Recording a `vkCmdDecodeVideoKHR` succeeds (no execution yet — could even no-op the V4L2 ioctls in Phase 1 since correctness isn't tested) | E.2 |
+| Single queue submit succeeds without VK_ERROR_DEVICE_LOST | B.4, J |
+
+Phase 1 deliberately stops short of "decoded picture compares against reference". That's Phase 7. Phase 1 is the end-to-end plumbing.