initial seed: retrofit campaign lineage from local working trees

panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/: r1..r4-era phase docs (iter1..iter18);
  the libmali stub blobs under iter18/blob/ (109 MB of RE
  artifacts) are excluded and replaced with a README pointer
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,669 @@
# Phase 1 Source Map: VK_KHR_video_decode_h264 on panvk-bifrost (V4L2/Hantro backend)

**Campaign**: panvk-bifrost-video (successor to panvk-bifrost r4)

**Mesa version**: 26.0.6 (source tree on ohm at `/home/mfritsche/mesa-build/mesa-26.0.6/`)

**Phase 1 goal**: vk-video-samples simple-test passes `HasAllDeviceExtensions`, creates a `VkVideoSessionKHR`, and submits one `vkCmdDecodeVideoKHR`. Decode correctness is Phase 7.

**Backend**: V4L2-stateless `hantro` VPU on RK3566/PineTab2 via `/dev/video1` + `/dev/media0`. The Mali GPU is not the decode engine.

> Convention used throughout: every file path is **on ohm** unless otherwise stated. Cite as `FILE:LINE`. When citing libva-v4l2-request-fourier (the reference for V4L2-side bridging), the path is on the **workstation** at `/home/mfritsche/src/libva-v4l2-request-fourier/`.

---

## Executive summary

The Mesa 26.0.6 video stack is structured in three layers:

1. **Shared runtime helpers**: `src/vulkan/runtime/vk_video.{c,h}` (3413 + 436 lines). Owns `vk_video_session_init`/`finish`, `vk_video_session_parameters_{create,update,destroy}`, H.264 SPS/PPS storage as `struct vk_video_h264_{sps,pps}`, and the `vk_common_{Create,Update,Destroy}VideoSessionParametersKHR` entrypoints (full dispatch coverage of the parameters object). Also provides the codec parameter lookup helpers (`vk_video_get_h264_parameters`, `vk_video_find_h264_dec_std_{sps,pps}`).
2. **Driver-side video**: anv (`src/intel/vulkan/anv_video.c` + `genX_cmd_video.c`) and radv (`src/amd/vulkan/radv_video.c`). Each driver owns extension advertisement, queue-family advertisement, `GetPhysicalDeviceVideoCapabilitiesKHR`, `GetPhysicalDeviceVideoFormatPropertiesKHR`, `Create/DestroyVideoSessionKHR`, `GetVideoSessionMemoryRequirementsKHR`, `BindVideoSessionMemoryKHR`, and the per-frame `CmdBeginVideoCodingKHR`/`CmdControlVideoCodingKHR`/`CmdDecodeVideoKHR`/`CmdEndVideoCodingKHR` recording.
3. **HW codegen**: the driver emits register packets into a command stream while recording `CmdDecodeVideoKHR`; the existing GPU queue submit path then ships that stream to the video engine.

**Critical mismatch for our backend**: layer 3 does not exist for us. The Hantro VPU has no Mali-side command stream; it has its own kernel device nodes (`/dev/video1` + `/dev/media0`) with a request-API ioctl interface. So we keep layer 1 verbatim (a big win: all H.264 SPS/PPS parsing and storage comes for free), reuse layer 2's *interface contracts*, and replace the layer-3 command-stream codegen with deferred V4L2 control marshalling plus submit-time `VIDIOC_QBUF`/`poll()`/`VIDIOC_DQBUF`.

**vk-video-samples simple-test trinity** of required extensions:

- `VK_KHR_video_queue` (spec v8): shared base
- `VK_KHR_video_decode_queue` (spec v8): decode-specific commands
- `VK_KHR_video_decode_h264` (spec v9): H.264 profile

None of these are advertised in panvk-bifrost r4 today: Mesa 26.0.6 `src/panfrost/vulkan/panvk_vX_physical_device.c:539-540` explicitly sets `unifiedImageLayoutsVideo = false` and leaves all `KHR_video_*` extension flags unset (default false).

---

## A. Extension surface

### A.1 Where extensions are advertised

The panvk extension table is built by `panvk_per_arch(get_physical_device_extensions)` in `src/panfrost/vulkan/panvk_vX_physical_device.c:35-160`. This is a single struct literal that fills a `struct vk_device_extension_table` field by field. To add the three required extensions, we add three fields to the literal, keeping its alphabetical `KHR_` ordering:

```c
   .KHR_video_decode_h264 = true, /* gated on hantro probe success */
   .KHR_video_decode_queue = true,
   .KHR_video_queue = true,
```

The natural insertion point is between `.KHR_vertex_attribute_divisor = true,` (line ~123) and `.KHR_vulkan_memory_model = true,` (line ~124).

Anv reference for comparison, `src/intel/vulkan/anv_physical_device.c:262-274`:

```c
   .KHR_video_queue = video_decode_enabled || video_encode_enabled,
   .KHR_video_decode_queue = video_decode_enabled,
   .KHR_video_decode_h264 = VIDEO_CODEC_H264DEC && video_decode_enabled,
```

Here `video_decode_enabled` is `device->instance->debug & ANV_DEBUG_VIDEO_DECODE` (`anv_physical_device.c:153`). Anv gates these extensions behind a debug flag because its decode path is still considered experimental. We probably want the same gating pattern, keyed on hantro probe success instead of a debug flag. That way the extensions are advertised only if `/dev/video1` opens and reports H.264 OUTPUT format support.

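Below is a minimal sketch of that gating decision. It assumes the probe has already opened `/dev/video1` and collected the fourccs that `VIDIOC_ENUM_FMT` reports for the OUTPUT and CAPTURE queues. `struct hantro_probe`, `struct video_exts`, and `video_exts_from_probe()` are hypothetical names, not panvk code. The fourcc packing matches the kernel's `v4l2_fourcc()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same byte packing as the kernel's v4l2_fourcc() macro. */
#define FOURCC(a, b, c, d)                 \
   ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define FMT_H264_SLICE FOURCC('S', '2', '6', '4') /* V4L2_PIX_FMT_H264_SLICE */
#define FMT_NV12 FOURCC('N', 'V', '1', '2')       /* V4L2_PIX_FMT_NV12 */

/* Hypothetical probe result: fourccs reported by VIDIOC_ENUM_FMT per queue. */
struct hantro_probe {
   bool opened;                  /* open("/dev/video1") succeeded */
   const uint32_t *output_fmts;  /* bitstream (OUTPUT) queue */
   size_t output_count;
   const uint32_t *capture_fmts; /* decoded picture (CAPTURE) queue */
   size_t capture_count;
};

/* Hypothetical subset of struct vk_device_extension_table. */
struct video_exts {
   bool KHR_video_queue;
   bool KHR_video_decode_queue;
   bool KHR_video_decode_h264;
};

static bool has_fmt(const uint32_t *fmts, size_t n, uint32_t want)
{
   for (size_t i = 0; i < n; i++)
      if (fmts[i] == want)
         return true;
   return false;
}

/* All three extensions rise and fall together: H.264 is the only codec. */
static struct video_exts video_exts_from_probe(const struct hantro_probe *p)
{
   bool ok = p->opened &&
             has_fmt(p->output_fmts, p->output_count, FMT_H264_SLICE) &&
             has_fmt(p->capture_fmts, p->capture_count, FMT_NV12);
   return (struct video_exts){
      .KHR_video_queue = ok,
      .KHR_video_decode_queue = ok,
      .KHR_video_decode_h264 = ok,
   };
}
```

Keeping the probe result as plain data lets the advertisement logic be tested without a VPU present. The ioctls themselves would run once during physical-device enumeration.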
### A.2 Feature struct fields

vk-video-samples simple-test requires `VK_KHR_video_queue` and friends to be advertised. The strictly required feature struct fields are:

- `VkPhysicalDeviceVideoMaintenance1FeaturesKHR::videoMaintenance1`: **only if** we advertise `KHR_video_maintenance1`. The Phase 1 simple-test does NOT require maintenance1 (confirmed by reading the test harness expectations), so we skip it in Phase 1.
- `VkPhysicalDeviceUnifiedImageLayoutsFeaturesKHR::unifiedImageLayoutsVideo`: currently `false` at `panvk_vX_physical_device.c:540`. It stays `false` for Phase 1, so the usual layout transition rules still apply.

The shared `vk_video_session` struct (`vk_video.h:80-115`) carries the per-session profile bookkeeping, populated from the codec-operation `pNext` chain at session creation. Phase 1 needs no driver-side feature toggles beyond the three extension booleans.

### A.3 vkGetPhysicalDeviceVideoCapabilitiesKHR routing

This is a **direct driver entrypoint**: there is no `vk_common_GetPhysicalDeviceVideoCapabilitiesKHR` in `src/vulkan/runtime/`. Verified: `grep -rn "vk_common_GetPhysicalDeviceVideo" /home/mfritsche/mesa-build/mesa-26.0.6/src/` returns no hits.

Driver-side, the entrypoint tables are generated by `vk_entrypoints_gen` from `vk_api.xml` (per `panvk/vulkan/meson.build:7-19`). panvk symbol resolution uses the `panvk` prefix plus the per-arch prefixes `panvk_v6` / `panvk_v7` / `panvk_v9` / `panvk_v10` / `panvk_v12` / `panvk_v13`. So the symbol we need to provide is one of:

- `panvk_GetPhysicalDeviceVideoCapabilitiesKHR` (in `panvk_physical_device.c`): common (arch-agnostic). Physical-device caps don't vary across Mali archs for V4L2-side decode, because the VPU is a separate engine entirely. **Recommended.**
- `panvk_per_arch(GetPhysicalDeviceVideoCapabilitiesKHR)` in a new `panvk_vX_video_decode.c`: only needed if the answer varies per arch, which it doesn't here.

Reference shape from anv (`anv_video.c:183-291`): the function takes `pVideoProfile` and fills `pCapabilities` (`maxCodedExtent`, `maxDpbSlots`, `maxActiveReferencePictures`, `minBitstreamBufferOffsetAlignment`, `stdHeaderVersion`), then walks the codec-specific `pNext` chain. For H.264 decode, that means `VkVideoDecodeH264CapabilitiesKHR` (anv lines 213-225) with `maxLevelIdc` and `fieldOffsetGranularity`. It also sets `VkVideoDecodeCapabilitiesKHR::flags = VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR` (anv line 205). We want that too, because the Hantro CAPTURE buffers ARE the DPB (no separate scratch).

The hantro driver's real limits (4K H.264 decode confirmed on RK3566) bound these values, but for Phase 1 we stay conservative:

- `maxCodedExtent = 1920x1088`
- `maxDpbSlots = 17`: one more than `STD_VIDEO_H264_MAX_NUM_LIST_REF = 16`, matching `ANV_VIDEO_H264_MAX_DPB_SLOTS` at `anv_private.h:6581`
- `maxActiveReferencePictures = 16`

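A sketch of how those Phase 1 numbers fall out of the H.264 limits is below. It uses local stand-in types in place of the real `VkVideoCapabilitiesKHR`/`VkVideoDecodeCapabilitiesKHR` structs; `struct video_caps` and `fill_h264_decode_caps()` are hypothetical. In the real entrypoint, the rejection path would return `VK_ERROR_VIDEO_PROFILE_OPERATION_NOT_SUPPORTED_KHR`.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the Vulkan structs involved (hypothetical names). */
struct extent2d { uint32_t width, height; };

struct video_caps {
   struct extent2d max_coded_extent;
   uint32_t max_dpb_slots;
   uint32_t max_active_reference_pictures;
   int dpb_and_output_coincide; /* ~ VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR */
};

enum codec_op { CODEC_OP_DECODE_H264, CODEC_OP_OTHER };
enum result { RESULT_SUCCESS, RESULT_PROFILE_OP_NOT_SUPPORTED };

#define H264_MAX_NUM_LIST_REF 16 /* STD_VIDEO_H264_MAX_NUM_LIST_REF */
#define MB_SIZE 16               /* H.264 macroblock edge, luma samples */

static enum result fill_h264_decode_caps(enum codec_op op, struct video_caps *caps)
{
   if (op != CODEC_OP_DECODE_H264)
      return RESULT_PROFILE_OP_NOT_SUPPORTED;

   /* 1920x1080 rounded up to whole macroblocks: 120 x 68 MBs = 1920x1088. */
   caps->max_coded_extent = (struct extent2d){120 * MB_SIZE, 68 * MB_SIZE};
   /* Every possible reference plus one slot for the picture being decoded. */
   caps->max_dpb_slots = H264_MAX_NUM_LIST_REF + 1;
   caps->max_active_reference_pictures = H264_MAX_NUM_LIST_REF;
   /* Hantro CAPTURE buffers double as the DPB. */
   caps->dpb_and_output_coincide = 1;
   return RESULT_SUCCESS;
}
```

Deriving 1088 from 68 macroblock rows, and 17 from the 16-reference limit, keeps the reasoning behind each value visible where it is set.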
### A.4 vkGetPhysicalDeviceVideoFormatPropertiesKHR routing

Same routing pattern as A.3: a direct driver entrypoint with no shared common path. Implement it as `panvk_GetPhysicalDeviceVideoFormatPropertiesKHR` in `panvk_physical_device.c`.

Reference shape from anv (`anv_video.c:393-481`): it walks `VkVideoProfileListInfoKHR` from `pVideoFormatInfo->pNext`, validates each profile, then outputs format entries. For H.264 8-bit, anv reports `VK_FORMAT_G8_B8R8_2PLANE_420_UNORM` (NV12-equivalent, anv:460).

This is exactly what we need. The hantro driver exposes NV12 as `V4L2_PIX_FMT_NV12` on the CAPTURE queue. This is confirmed in libva-v4l2-request-fourier `src/h264.c`, and the `v4l2_find_format` calls in `src/request.c:864-865` show the format-probe pattern.

Anv's destination usage-flag merge at lines 410-419 (where `VIDEO_DECODE_DST` adds flags including `SAMPLED_BIT | TRANSFER_DST_BIT`) is the standard Vulkan Video pattern and applies verbatim. Set:

- `format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM` (NV12)
- `imageType = VK_IMAGE_TYPE_2D`
- `imageTiling = VK_IMAGE_TILING_OPTIMAL`: this is a "logical" tiling decision, because the underlying memory comes from V4L2 (see G.2 below)
- `imageUsageFlags = VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_SRC_BIT`
- `imageCreateFlags = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT | VK_IMAGE_CREATE_EXTENDED_USAGE_BIT`: leave out `VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR` for now. That bit comes from `VK_KHR_video_maintenance1`, which Phase 1 does not advertise (see A.2).

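The format query follows Vulkan's usual two-call idiom. The minimal sketch below shows the count/fill/`VK_INCOMPLETE`-style behaviour for the single NV12 entry. It uses stand-in types: `struct video_format_props`, `get_video_format_props()`, and the flag values are hypothetical, not the real Vulkan enums. In-tree code would more likely use Mesa's `VK_OUTARRAY_MAKE_TYPED` / `vk_outarray_append_typed` helpers, which implement the same contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; values are illustrative, not the Vulkan enum values. */
enum result { RESULT_SUCCESS, RESULT_INCOMPLETE };

struct video_format_props {
   uint32_t format;      /* e.g. VK_FORMAT_G8_B8R8_2PLANE_420_UNORM */
   uint32_t usage_flags; /* VIDEO_DECODE_DST | DPB | SAMPLED | TRANSFER_SRC */
};

#define FMT_NV12_2PLANE 1u
#define USAGE_DECODE_DST (1u << 0)
#define USAGE_DECODE_DPB (1u << 1)
#define USAGE_SAMPLED (1u << 2)
#define USAGE_TRANSFER_SRC (1u << 3)

/* The single entry Phase 1 reports for 8-bit H.264 decode. */
static const struct video_format_props h264_formats[] = {
   {FMT_NV12_2PLANE,
    USAGE_DECODE_DST | USAGE_DECODE_DPB | USAGE_SAMPLED | USAGE_TRANSFER_SRC},
};

/* A NULL array asks for the count; otherwise copy up to *count entries and
 * report INCOMPLETE if the caller's array was too short. */
static enum result get_video_format_props(uint32_t *count,
                                          struct video_format_props *out)
{
   const uint32_t avail = sizeof(h264_formats) / sizeof(h264_formats[0]);
   if (out == NULL) {
      *count = avail;
      return RESULT_SUCCESS;
   }
   uint32_t n = *count < avail ? *count : avail;
   for (uint32_t i = 0; i < n; i++)
      out[i] = h264_formats[i];
   *count = n;
   return n < avail ? RESULT_INCOMPLETE : RESULT_SUCCESS;
}
```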

---

## B. Queue family registration

### B.1 Current state (r4)

`src/panfrost/vulkan/panvk_device.h:46-48`:

```c
enum panvk_queue_family {
   PANVK_QUEUE_FAMILY_GPU,
   PANVK_QUEUE_FAMILY_BIND,
   PANVK_QUEUE_FAMILY_COUNT,
};
```

Queue-family-properties query at `panvk_physical_device.c:557-595`:

```c
   [PANVK_QUEUE_FAMILY_GPU] = {
      .queueFlags = VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT | VK_QUEUE_TRANSFER_BIT,
      ...
   },
   [PANVK_QUEUE_FAMILY_BIND] = {
      .queueFlags = VK_QUEUE_SPARSE_BINDING_BIT,
      .queueCount = 1,
   },
```

Queue dispatch in `panvk_vX_device.c`:
- lines 253-258: `panvk_queue_check_status` switches on `queue->queue_family_index` to call `gpu_queue_check_status` or `bind_queue_check_status`
- line 269: `panvk_device_check_status` iterates `for (uint32_t qfi = 0; qfi < PANVK_QUEUE_FAMILY_COUNT; qfi++)`
- lines 305-313: `panvk_queue_create` switches on `create_info->queueFamilyIndex` to dispatch to `panvk_per_arch(create_gpu_queue)` or `panvk_per_arch(create_bind_queue)`
- lines 320-329: `panvk_queue_destroy`, symmetric
- lines 546-561: `panvk_per_arch(create_device)` iterates over `pCreateInfo->queueCreateInfoCount` and calls `panvk_queue_create` for each

### B.2 What to add

Add a third enum value, `PANVK_QUEUE_FAMILY_VIDEO_DECODE`. Slot ordering matters only for the existing families, whose indices must stay stable. Vulkan apps query queue families by index, and the test client *typically* iterates looking for `VK_QUEUE_VIDEO_DECODE_BIT_KHR`. The new family's index is therefore opaque to the app, and adding it at the end (before `COUNT`) is safe.

```c
enum panvk_queue_family {
   PANVK_QUEUE_FAMILY_GPU,
   PANVK_QUEUE_FAMILY_BIND,
   PANVK_QUEUE_FAMILY_VIDEO_DECODE, /* NEW */
   PANVK_QUEUE_FAMILY_COUNT,
};
```

Then extend the props table in `panvk_physical_device.c:557-595`:

```c
   [PANVK_QUEUE_FAMILY_VIDEO_DECODE] = {
      .queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT,
      .queueCount = 1,
      .minImageTransferGranularity = {1, 1, 1}, /* match VPU mb alignment if needed */
   },
```

Note that `VK_QUEUE_TRANSFER_BIT` commits this family to executing transfer commands (`vkCmdCopy*`). The V4L2-only submit path in B.4 does not do that, so either handle them there or drop the bit for Phase 1.

Anv reference for this pattern: `src/intel/vulkan/anv_physical_device.c:2556-2576` (queue-family init writing flags into `pdevice->queue.families[family_count++]`). Anv also handles the `VkQueueFamilyVideoPropertiesKHR` pNext extension at `anv_physical_device.c:3012-3030`:

```c
case VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR: {
   VkQueueFamilyVideoPropertiesKHR *prop = ...;
   if (queue_family->queueFlags & VK_QUEUE_VIDEO_DECODE_BIT_KHR) {
      prop->videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR | ...;
   }
   break;
}
```

We need to mirror that pattern in `panvk_GetPhysicalDeviceQueueFamilyProperties2`, which right now only walks `VkQueueFamilyGlobalPriorityPropertiesKHR` (at `panvk_physical_device.c:589`). Add a pNext case for `VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR` that fills `videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR`. Optional but recommended for Phase 1: also fill `VK_STRUCTURE_TYPE_QUEUE_FAMILY_QUERY_RESULT_STATUS_PROPERTIES_KHR` if the test client asks for it (`anv_physical_device.c:3007-3011`).

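Here is a sketch of the extra `pNext` case, using local mirrors of the Vulkan chaining layout. The `sType` values and bit positions are illustrative, and `fill_family_ext_props()` is a hypothetical helper; in-tree code would iterate with Mesa's `vk_foreach_struct`. One detail worth keeping: non-video families still get the struct filled, with `videoCodecOperations = 0`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the Vulkan pNext chaining layout. The sType values and
 * bit positions below are illustrative, not the real Vulkan enums. */
enum stype { STYPE_GLOBAL_PRIORITY_PROPS = 1, STYPE_VIDEO_PROPS = 2 };

struct base_out {
   enum stype sType;
   void *pNext;
};

struct video_props { /* ~ VkQueueFamilyVideoPropertiesKHR */
   enum stype sType;
   void *pNext;
   uint32_t videoCodecOperations;
};

#define QUEUE_VIDEO_DECODE_BIT (1u << 5)
#define CODEC_OP_DECODE_H264_BIT (1u << 0)

/* Walks one family's output chain; structs it does not know are skipped. */
static void fill_family_ext_props(uint32_t queue_flags, void *chain)
{
   for (struct base_out *s = chain; s != NULL; s = s->pNext) {
      switch (s->sType) {
      case STYPE_VIDEO_PROPS: {
         struct video_props *p = (struct video_props *)s;
         /* Non-video families must still report "no codec operations". */
         p->videoCodecOperations =
            (queue_flags & QUEUE_VIDEO_DECODE_BIT) ? CODEC_OP_DECODE_H264_BIT : 0;
         break;
      }
      default:
         break; /* e.g. the global-priority struct, handled by existing code */
      }
   }
}
```

Unknown structures fall through to `default`, so the existing global-priority handling can stay where it is.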
### B.3 Queue identification at queue_create time

The driver dispatches at `panvk_vX_device.c:305-313` via `panvk_queue_create`. Extend the switch:

```c
   case PANVK_QUEUE_FAMILY_VIDEO_DECODE:
      return panvk_per_arch(create_video_decode_queue)(
         dev, create_info, queue_idx, out_queue);
```

Extend `panvk_queue_destroy` (lines 320-329) and `panvk_queue_check_status` (lines 253-258) the same way.

`check_global_priority` at `panvk_vX_device.c:218-247` also needs a new case for the video decode family. It can either accept any priority with `VK_SUCCESS`, since the V4L2 device exposes no priority semantics, or accept only `VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_KHR`, like BIND.

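The dispatch, together with the stricter of the two priority options above, can be sketched as follows. Everything here is a hypothetical stand-in. `queue_create` and `check_global_priority` are illustrative functions, and the result codes stand in for `VK_SUCCESS`, `VK_ERROR_NOT_PERMITTED_KHR`, and `VK_ERROR_INITIALIZATION_FAILED`. The sketch only shows where the new case slots in.

```c
#include <assert.h>

/* Mirrors enum panvk_queue_family from B.2 (hypothetical short names). */
enum queue_family { QF_GPU, QF_BIND, QF_VIDEO_DECODE, QF_COUNT };
enum priority { PRIO_LOW, PRIO_MEDIUM, PRIO_HIGH, PRIO_REALTIME };
/* Stand-ins for VK_SUCCESS, VK_ERROR_NOT_PERMITTED_KHR, VK_ERROR_INITIALIZATION_FAILED. */
enum result { RES_OK, RES_NOT_PERMITTED, RES_INIT_FAILED };

/* Stubs for panvk_per_arch(create_{gpu,bind,video_decode}_queue). */
static enum result create_gpu_queue(void) { return RES_OK; }
static enum result create_bind_queue(void) { return RES_OK; }
static enum result create_video_decode_queue(void) { return RES_OK; }

/* The VPU has no priority levels, so the video family, like BIND,
 * only accepts the default (medium) priority. */
static enum result check_global_priority(enum queue_family qf, enum priority p)
{
   switch (qf) {
   case QF_BIND:
   case QF_VIDEO_DECODE:
      return p == PRIO_MEDIUM ? RES_OK : RES_NOT_PERMITTED;
   default:
      return RES_OK; /* GPU family keeps its existing policy (not modeled) */
   }
}

static enum result queue_create(enum queue_family qf, enum priority p)
{
   enum result r = check_global_priority(qf, p);
   if (r != RES_OK)
      return r;

   switch (qf) {
   case QF_GPU:
      return create_gpu_queue();
   case QF_BIND:
      return create_bind_queue();
   case QF_VIDEO_DECODE: /* NEW */
      return create_video_decode_queue();
   default:
      return RES_INIT_FAILED;
   }
}
```

Accepting only medium priority is the conservative choice, because it never promises scheduling behaviour the VPU cannot provide.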
### B.4 V4L2 submit path: clean hook into queue infrastructure

The existing `vk_queue` has a `driver_submit` callback, set in `jm/panvk_vX_gpu_queue.c:359`:

`queue->vk.driver_submit = panvk_per_arch(gpu_queue_submit);`

The submit function takes a `struct vk_queue_submit` containing `command_buffers[]`, waits, and signals.

For our V4L2 queue, the analog is `queue->vk.driver_submit = panvk_per_arch(video_decode_queue_submit);`. The implementation does NOT touch Mali. Instead, it walks each command buffer's recorded V4L2 ops and dispatches them:

```
for each op in cmdbuf->video_decode_ops:          /* struct panvk_video_decode_op, see E.2 */
    request_fd = take a free entry from op->session->request_pool   /* see C.3 */
    media_request_reinit(request_fd)               /* libva-v4l2-request-fourier media.c:51 */
    VIDIOC_S_EXT_CTRLS(video_fd, request_fd,
                       {SPS, PPS, DECODE_PARAMS, SLICE_PARAMS, SCALING_MATRIX})
    VIDIOC_QBUF(video_fd, OUTPUT, request_fd)       /* bitstream src, tied to the request */
    VIDIOC_QBUF(video_fd, CAPTURE, index = op->dst_dpb_slot)   /* not part of the request */
    media_request_queue(request_fd)                /* media.c:65 */
    poll(request_fd, POLLPRI, timeout)             /* media.c:79 */
    VIDIOC_DQBUF(video_fd, OUTPUT)
    VIDIOC_DQBUF(video_fd, CAPTURE)
    return request_fd to the pool
```

The waits/signals from `vk_queue_submit` need to map to syncobj waits before we `VIDIOC_QBUF`, and to a syncobj signal after the poll completes. Phase 1 is a single submit with no other GPU work in the queue. There we can ignore semaphores and just use a syncobj that signals on `VIDIOC_DQBUF` completion.

The entry point is `vk_queue_init` (`panvk_vX_gpu_queue.c:348`), and `create_video_decode_queue` would reuse the same pattern. It allocates a `struct panvk_video_decode_queue { struct vk_queue vk; int video_fd; int media_fd; ... }` and stashes the fds.

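The ordering is the part that is easy to get wrong. The minimal sketch below keeps each ioctl behind a hypothetical `decode_step_fn` callback, so the sequence from the pseudocode above can be exercised with a recording mock instead of a VPU. `decode_step`, `run_decode_op`, and `record_step` are illustrative names, not panvk code. The comments name the real kernel ioctls and flags each step stands for.

```c
#include <assert.h>

/* One entry per ioctl/poll in the submit pseudocode, in the required order. */
enum decode_step {
   STEP_REQUEST_REINIT, /* MEDIA_REQUEST_IOC_REINIT on the request fd */
   STEP_SET_CTRLS,      /* VIDIOC_S_EXT_CTRLS, which = V4L2_CTRL_WHICH_REQUEST_VAL */
   STEP_QBUF_OUTPUT,    /* bitstream buffer, flagged V4L2_BUF_FLAG_REQUEST_FD */
   STEP_QBUF_CAPTURE,   /* destination picture, queued outside the request */
   STEP_REQUEST_QUEUE,  /* MEDIA_REQUEST_IOC_QUEUE */
   STEP_WAIT,           /* poll(request_fd, POLLPRI, timeout) */
   STEP_DQBUF_OUTPUT,
   STEP_DQBUF_CAPTURE,
   STEP_COUNT,
};

/* Hypothetical backend hook: a real one would issue the ioctl for `s`.
 * `arg` is the request fd, or the CAPTURE buffer index for STEP_QBUF_CAPTURE. */
typedef int (*decode_step_fn)(void *ctx, enum decode_step s, int arg);

/* Runs the steps strictly in order; the first nonzero return aborts the op. */
static int run_decode_op(decode_step_fn fn, void *ctx, int request_fd, int dst_slot)
{
   for (int s = 0; s < STEP_COUNT; s++) {
      int arg = (s == STEP_QBUF_CAPTURE) ? dst_slot : request_fd;
      int r = fn(ctx, (enum decode_step)s, arg);
      if (r != 0)
         return r;
   }
   return 0;
}

/* Recording mock: logs each step and its argument, optionally failing one. */
struct step_log {
   enum decode_step seen[STEP_COUNT];
   int args[STEP_COUNT];
   int count;
   int fail_at; /* step that returns -1, or -1 for none */
};

static int record_step(void *ctx, enum decode_step s, int arg)
{
   struct step_log *trace = ctx;
   trace->seen[trace->count] = s;
   trace->args[trace->count] = arg;
   trace->count++;
   return (int)s == trace->fail_at ? -1 : 0;
}
```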

---

## C. Session object lifecycle (`VkVideoSessionKHR`)

### C.1 What CreateVideoSession allocates

Anv reference at `src/intel/vulkan/anv_video.c:31-55`:

```c
struct anv_video_session *vid = vk_alloc2(...);
memset(vid, 0, sizeof(*vid));
VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
*pVideoSession = anv_video_session_to_handle(vid);
```

That's it. The heavy lifting is in `vk_video_session_init` (`src/vulkan/runtime/vk_video.c:33-128`), which fills:
- `vid->op` (`VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR` etc.)
- `vid->max_coded`, `picture_format`, `ref_format`, `max_dpb_slots`, `max_active_ref_pics`
- `vid->h264.profile_idc` from the `VkVideoDecodeH264ProfileInfoKHR` pNext (lines 51-57)

The driver-specific `anv_video_session` struct (`anv_private.h:6688-6727`) adds backend-specific per-stream state: `cdf_initialized` (for AV1) and `vid_mem[ANV_VID_MEM_AV1_MAX]` (private memory bindings for codec scratch).

### C.2 Memory binding via vkBindVideoSessionMemoryKHR

Anv reference at `anv_video.c:914-998` for `GetVideoSessionMemoryRequirements` and `anv_video.c:972-1000` for `BindVideoSessionMemory`. The mem_idx enums for H.264 (`anv_private.h:6588-6593`):

```c
enum anv_vid_mem_h264_types {
   ANV_VID_MEM_H264_INTRA_ROW_STORE,
   ANV_VID_MEM_H264_DEBLOCK_FILTER_ROW_STORE,
   ANV_VID_MEM_H264_BSD_MPC_ROW_SCRATCH,
   ANV_VID_MEM_H264_MPR_ROW_SCRATCH,
   ANV_VID_MEM_H264_MAX,
};
```

These are scratch buffers the Intel HCP/MFX engines need. Their sizes are computed in `get_h264_video_mem_size` (`anv_video.c:483-501`) as multiples of the width in macroblocks.

`BindVideoSessionMemory` (anv lines 972-998) is pure bookkeeping. It copies each `VkBindVideoSessionMemoryInfoKHR` into `vid->vid_mem[bind_index]`, where the element type is `struct anv_vid_mem { anv_device_memory *mem; offset; size; }` (`anv_private.h:6572-6576`).

### C.3 For our V4L2 backend

**Massive simplification opportunity**: the Hantro VPU does NOT require driver-allocated scratch buffers; all scratch is internal to the VPU and managed by the kernel driver. So `GetVideoSessionMemoryRequirements` can return **zero entries** (`*pMemoryRequirementsCount = 0`), and `BindVideoSessionMemory` becomes a no-op (just `return VK_SUCCESS;`).

What CreateVideoSession DOES need to allocate, V4L2-side:
1. **Open `/dev/video1` and `/dev/media0`** if not already held by the device (see J.1 for the ownership decision).
2. **`VIDIOC_S_FMT`** on the OUTPUT queue: `V4L2_PIX_FMT_H264_SLICE`, the stateless H.264 bitstream format (hantro is a stateless decoder), based on `vid->h264.profile_idc` and `vid->max_coded`. See libva-v4l2-request-fourier `src/h264.c:699-738` for the control-set pattern.
3. **`VIDIOC_S_FMT`** on the CAPTURE queue: `V4L2_PIX_FMT_NV12`, with dimensions from `vid->max_coded`.
4. **Allocate the request_fd pool**: pre-allocate N request fds (one per DPB slot plus outstanding submits) via `MEDIA_IOC_REQUEST_ALLOC` ioctls (`media.c:41`).
5. **`VIDIOC_REQBUFS`** on the OUTPUT and CAPTURE queues to set up the buffer counts.

So the `panvk_video_session` struct shape is:

```c
struct panvk_video_session {
   struct vk_video_session vk; /* shared base */
   int video_fd;               /* may share with physical_device */
   int media_fd;               /* may share with physical_device */

   /* per-session V4L2 state */
   uint32_t bitstream_buffer_count;
   uint32_t capture_buffer_count;
   struct {
      int request_fd;
      bool in_use;
      uint32_t dpb_slot;
   } request_pool[MAX_OUTSTANDING_DECODES];
};
```

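Below is a minimal sketch of the request-pool bookkeeping, with the same shape as the `request_pool` member above. It assumes the fds were already obtained with `MEDIA_IOC_REQUEST_ALLOC` at session creation. `request_pool_acquire()`, `request_pool_release()`, and the pool size of 4 are hypothetical, since C.3 leaves N open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_OUTSTANDING_DECODES 4 /* hypothetical size; C.3 leaves N open */

struct request_slot {
   int request_fd; /* from MEDIA_IOC_REQUEST_ALLOC at session create */
   bool in_use;
   uint32_t dpb_slot;
};

struct request_pool {
   struct request_slot slots[MAX_OUTSTANDING_DECODES];
};

/* Takes a free pre-allocated request for a decode into `dpb_slot`.
 * Returns the pool index, or -1 when every request is still in flight. */
static int request_pool_acquire(struct request_pool *p, uint32_t dpb_slot)
{
   for (int i = 0; i < MAX_OUTSTANDING_DECODES; i++) {
      if (!p->slots[i].in_use) {
         p->slots[i].in_use = true;
         p->slots[i].dpb_slot = dpb_slot;
         return i;
      }
   }
   return -1;
}

/* Called after VIDIOC_DQBUF; the request is reinitialised before its next use. */
static void request_pool_release(struct request_pool *p, int idx)
{
   assert(idx >= 0 && idx < MAX_OUTSTANDING_DECODES && p->slots[idx].in_use);
   p->slots[idx].in_use = false;
}
```

Returning -1 instead of blocking leaves the back-pressure policy to the caller: it can wait on the oldest request's `POLLPRI`, or fail the submit.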
### C.4 Anv session creation shape: full reference

```c
/* anv_video.c:31-55 */
VkResult anv_CreateVideoSessionKHR(VkDevice _device,
                                   const VkVideoSessionCreateInfoKHR *pCreateInfo,
                                   const VkAllocationCallbacks *pAllocator,
                                   VkVideoSessionKHR *pVideoSession)
{
   ANV_FROM_HANDLE(anv_device, device, _device);
   struct anv_video_session *vid = vk_alloc2(..., sizeof(*vid), 8, OBJECT);
   if (!vid)
      return vk_error(device, VK_ERROR_OUT_OF_HOST_MEMORY);
   memset(vid, 0, sizeof(*vid));

   VkResult result = vk_video_session_init(&device->vk, &vid->vk, pCreateInfo);
   if (result != VK_SUCCESS) {
      vk_free2(..., vid);
      return result;
   }

   *pVideoSession = anv_video_session_to_handle(vid);
   return VK_SUCCESS;
}
```

For us, the body grows by ~15-30 lines for V4L2 setup (open fds, `S_FMT`, `REQBUFS`, request_fd pool init) and gains error-rollback paths.

---

## D. Parameters object lifecycle (`VkVideoSessionParametersKHR`)

### D.1 The shared layer does almost everything

`src/vulkan/runtime/vk_video.c:845-885` defines:
- `vk_common_CreateVideoSessionParametersKHR` (lines 846-862)
- `vk_common_UpdateVideoSessionParametersKHR` (lines 865-872)
- `vk_common_DestroyVideoSessionParametersKHR` (lines 875-885)

These delegate to:
- `vk_video_session_parameters_create` (helper at `vk_video.c:480`): alloc + dispatch by codec op
- `vk_video_session_parameters_update` (lines 793-844): switches on `params->op` and calls `update_h264_dec_session_parameters` at line 692, which does the actual SPS/PPS array merge with the spec-mandated `seq_parameter_set_id` collision detection
- `vk_video_session_parameters_destroy`

**Key question**: do panvk-bifrost entrypoints get auto-wired to the `vk_common_*` versions, or does the driver need to opt in?

The wiring is automatic. Mesa's entrypoint generator (`vk_entrypoints_gen.py`) declares driver entrypoints as weak symbols, so any entrypoint the driver does not define resolves to NULL. `vk_device_init` then fills the remaining dispatch-table slots from `vk_common_device_entrypoints` without overwriting driver-provided ones. So if panvk does NOT define `panvk_CreateVideoSessionParametersKHR`, dispatch falls back to `vk_common_CreateVideoSessionParametersKHR`. Confirmed by anv comparison: anv defines neither `anv_CreateVideoSessionParametersKHR` nor `anv_UpdateVideoSessionParametersKHR`, and both come from `vk_common_*`.

radv DOES override (`radv_video.c:630-647`), but only to call `radv_video_patch_session_parameters` for an AMD-specific fixup. For Phase 1 we don't need that.

**Decision: rely entirely on vk_common.** Zero driver code for the parameters object lifecycle.

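The fallback can be modeled in a few lines. This is a simplified stand-in, not Mesa's generated code: `struct dispatch_table` has only two entries, and `table_from_entrypoints()` mimics the overwrite/fill-gaps split of `vk_device_dispatch_table_from_entrypoints()`. The NULL driver entry plays the role of an undefined weak symbol.

```c
#include <assert.h>
#include <stddef.h>

/* A two-entry stand-in for a device dispatch table (hypothetical). */
typedef int (*entry_fn)(void);
struct dispatch_table {
   entry_fn CreateVideoSessionKHR;
   entry_fn CreateVideoSessionParametersKHR;
};

/* overwrite != 0: copy every entry; overwrite == 0: fill only NULL slots. */
static void table_from_entrypoints(struct dispatch_table *dst,
                                   const struct dispatch_table *src, int overwrite)
{
   if (overwrite || dst->CreateVideoSessionKHR == NULL)
      dst->CreateVideoSessionKHR = src->CreateVideoSessionKHR;
   if (overwrite || dst->CreateVideoSessionParametersKHR == NULL)
      dst->CreateVideoSessionParametersKHR = src->CreateVideoSessionParametersKHR;
}

static int panvk_create_session(void) { return 1; }  /* driver-defined */
static int common_create_session(void) { return 2; } /* shadowed by the driver */
static int common_create_params(void) { return 3; }  /* fills the gap */

/* The NULL entry plays the role of an undefined weak driver symbol. */
static const struct dispatch_table driver_entrypoints = {
   .CreateVideoSessionKHR = panvk_create_session,
   .CreateVideoSessionParametersKHR = NULL,
};
static const struct dispatch_table common_entrypoints = {
   .CreateVideoSessionKHR = common_create_session,
   .CreateVideoSessionParametersKHR = common_create_params,
};
```

The common table is applied without overwriting. So if we later add a `panvk_CreateVideoSessionParametersKHR` (as radv does), it would take over automatically.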
### D.2 Parameters → V4L2 control conversion happens at CmdDecodeVideo time, not at parameter creation

For H.264 decode, the shared parameters struct (`vk_video.h:127-195`) stores two arrays:
- an SPS array of `struct vk_video_h264_sps`, which embeds `StdVideoH264SequenceParameterSet base`
- a PPS array of `struct vk_video_h264_pps`, which embeds `StdVideoH264PictureParameterSet base`

At decode time we call the lookup helpers `vk_video_find_h264_dec_std_sps(params, id)` and `vk_video_find_h264_dec_std_pps(params, id)` (`vk_video.c:1186-1198`) to get the SPS/PPS for the current frame.

The V4L2-side bridge from `StdVideoH264SequenceParameterSet` → `struct v4l2_ctrl_h264_sps` is the same conversion fourier does. See `libva-v4l2-request-fourier/src/h264.c:360` for `h264_va_picture_to_v4l2`, which marshals into `struct v4l2_ctrl_h264_decode_params`, `v4l2_ctrl_h264_pps`, and `v4l2_ctrl_h264_sps`. The only difference is the source format: ours is `StdVideoH264*` instead of `VAPictureParameterBufferH264`. The field-name mapping is essentially identical, because both `VAPictureParameterBufferH264` and `StdVideoH264SequenceParameterSet` ultimately derive from the H.264 spec's syntax element names.

**We will write `panvk_h264_std_sps_to_v4l2(const StdVideoH264SequenceParameterSet *std, struct v4l2_ctrl_h264_sps *out)` etc.** as a new helper file (~150 lines per codec). This bridge function has no Mesa precedent; it's our novel contribution.

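In two places the mapping is not a straight copy:

- `StdVideoH264SequenceParameterSet::level_idc` is a `StdVideoH264LevelIdc` enum index (`STD_VIDEO_H264_LEVEL_IDC_1_0` = 0 through `_6_2` = 18). `v4l2_ctrl_h264_sps::level_idc` instead carries the bitstream value, which is the level times 10.
- The Vulkan bitfield flags have to be packed into V4L2 flag bits.

Below is a trimmed sketch using local mirror structs. `struct std_sps`, `struct v4l2_sps`, and `std_sps_to_v4l2()` are hypothetical, and only a handful of fields are shown.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed local mirrors of StdVideoH264SequenceParameterSet and
 * struct v4l2_ctrl_h264_sps: only the fields this sketch converts. */
struct std_sps_flags {
   unsigned frame_mbs_only_flag : 1;
   unsigned direct_8x8_inference_flag : 1;
   unsigned delta_pic_order_always_zero_flag : 1;
};
struct std_sps {
   struct std_sps_flags flags;
   uint8_t profile_idc; /* StdVideoH264ProfileIdc is numeric, e.g. 100 = High */
   int level_idc;       /* StdVideoH264LevelIdc: an enum index, NOT level*10 */
   uint8_t seq_parameter_set_id;
   uint8_t log2_max_frame_num_minus4;
   uint8_t max_num_ref_frames;
   uint32_t pic_width_in_mbs_minus1;
   uint32_t pic_height_in_map_units_minus1;
};
struct v4l2_sps {
   uint8_t profile_idc, level_idc, seq_parameter_set_id;
   uint8_t log2_max_frame_num_minus4, max_num_ref_frames;
   uint16_t pic_width_in_mbs_minus1, pic_height_in_map_units_minus1;
   uint32_t flags;
};

/* Bit values as in linux/v4l2-controls.h (V4L2_H264_SPS_FLAG_*). */
#define SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO 0x04
#define SPS_FLAG_FRAME_MBS_ONLY 0x10
#define SPS_FLAG_DIRECT_8X8_INFERENCE 0x40

/* STD_VIDEO_H264_LEVEL_IDC_1_0 .. _6_2 are 0..18; the bitstream wants level*10. */
static const uint8_t level_idc_to_syntax[] = {
   10, 11, 12, 13, 20, 21, 22, 30, 31, 32, 40, 41, 42, 50, 51, 52, 60, 61, 62,
};

static int std_sps_to_v4l2(const struct std_sps *in, struct v4l2_sps *out)
{
   const int n_levels = (int)(sizeof(level_idc_to_syntax) / sizeof(level_idc_to_syntax[0]));
   if (in->level_idc < 0 || in->level_idc >= n_levels)
      return -1;

   out->profile_idc = in->profile_idc;
   out->level_idc = level_idc_to_syntax[in->level_idc];
   out->seq_parameter_set_id = in->seq_parameter_set_id;
   out->log2_max_frame_num_minus4 = in->log2_max_frame_num_minus4;
   out->max_num_ref_frames = in->max_num_ref_frames;
   out->pic_width_in_mbs_minus1 = (uint16_t)in->pic_width_in_mbs_minus1;
   out->pic_height_in_map_units_minus1 = (uint16_t)in->pic_height_in_map_units_minus1;

   out->flags = 0;
   if (in->flags.delta_pic_order_always_zero_flag)
      out->flags |= SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO;
   if (in->flags.frame_mbs_only_flag)
      out->flags |= SPS_FLAG_FRAME_MBS_ONLY;
   if (in->flags.direct_8x8_inference_flag)
      out->flags |= SPS_FLAG_DIRECT_8X8_INFERENCE;
   return 0;
}
```

The range check matters because the table is indexed directly by the enum. Anything past `_6_2` is rejected instead of being read out of bounds.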
### D.3 Hooking the parameters cache to ext-control structs at decode time

At `CmdDecodeVideoKHR` recording time, we retrieve the relevant `StdVideoH264SequenceParameterSet *` and `StdVideoH264PictureParameterSet *` via `vk_video_get_h264_parameters` (`vk_video.h:419-425`). The signature:

```c
void vk_video_get_h264_parameters(const struct vk_video_session *session,
                                  const struct vk_video_session_parameters *params,
                                  const VkVideoDecodeInfoKHR *decode_info,
                                  const VkVideoDecodeH264PictureInfoKHR *h264_pic_info,
                                  const StdVideoH264SequenceParameterSet **sps_p,
                                  const StdVideoH264PictureParameterSet **pps_p);
```

Anv uses this at `genX_cmd_video.c:904` in `anv_h264_decode_video`. We do the same.

---

## E. vkCmdDecodeVideoKHR command recording

### E.1 What anv emits at record time vs submit time

**Crucial finding**: anv does ALL its work at record time, so by the time the cmdbuf reaches the queue, the command stream is fully baked. Look at `anv_h264_decode_video` (`genX_cmd_video.c:892-1300+`). Every `anv_batch_emit(&cmd_buffer->batch, GENX(MFX_PIPE_MODE_SELECT), sel)` (and each similar call) is a register/packet write into the cmd_buffer's batch buffer. Submit time just kicks the batch.

The Begin/End wrappers are thin:
- `CmdBeginVideoCodingKHR` (`genX_cmd_video.c:31-50`): stashes `cmd_buffer->video.vid = vid; cmd_buffer->video.params = params;` in command-buffer-local state. **That's it** for H.264 (AV1 adds CDF table init).
- `CmdControlVideoCodingKHR` (`genX_cmd_video.c:52-74`): if the RESET flag is set, emits `MI_FLUSH_DW` with `VideoPipelineCacheInvalidate = 1`.
- `CmdEndVideoCodingKHR` (`genX_cmd_video.c:76-83`): clears `cmd_buffer->video.vid = NULL; cmd_buffer->video.params = NULL;`.

The `cmd_buffer->video` shadow state (`anv_private.h:4935-4938`):

```c
struct {
   struct anv_video_session *vid;
   struct vk_video_session_parameters *params;
} video;
```

### E.2 For our V4L2 backend: "deferred record"

The V4L2 ioctls cannot meaningfully happen at record time, because:
1. The bitstream buffer (`frame_info->srcBuffer`) is a `VkBuffer` whose contents we don't necessarily know yet. It might be filled by a previously submitted cmdbuf, or by host writes between record and submit.
2. Request_fd allocation and `S_EXT_CTRLS` need to be sequential per submit, so we cannot pre-bind a request_fd to a recorded cmdbuf and reuse it.

**Pattern: a per-cmdbuf list of "video decode ops" recorded during `CmdDecodeVideoKHR`.** Each op captures everything we need to replay at submit time:

```c
struct panvk_video_decode_op {
   /* From CmdBegin */
   struct panvk_video_session *session;
   struct vk_video_session_parameters *params;

   /* From CmdDecode */
   VkBuffer src_buffer; /* bitstream source */
   VkDeviceSize src_offset;
   VkDeviceSize src_size;

   /* DPB target */
   struct panvk_image_view *dst_iv;
   uint32_t dst_dpb_slot;

   /* SPS/PPS resolved at record time, copied by value (cheap) */
   StdVideoH264SequenceParameterSet sps;
   StdVideoH264PictureParameterSet pps;

   /* H.264 picture info (frame_num, POC, flags), picked apart at submit time */
   StdVideoDecodeH264PictureInfo std_pic_info;

   /* Reference slot info: small array, copied by value */
   uint32_t reference_slot_count;
   struct panvk_video_ref_slot reference_slots[16];
};

struct panvk_cmd_buffer {
   ...
   struct util_dynarray video_decode_ops; /* of struct panvk_video_decode_op */
};
```

Then, at submit time (per B.4), we walk the dynarray and do the ioctl dance for each op.

A comparable record-time op-list pattern already exists for sparse binds (`panvk_sparse.c`). Anv stores per-cmdbuf state in `cmd_buffer->video` but doesn't queue up ops, because it emits register packets directly. We're doing what anv would do if it ran on a separate kernel device.

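The record/replay split can be sketched with a small growable array standing in for Mesa's `util_dynarray`. `struct decode_op`, `op_list_record()`, `op_list_replay()`, and `op_list_reset()` are hypothetical, trimmed versions of the structs above. Recording only appends copies. Replay hands each op, in record order, to whatever performs the B.4 ioctl sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed stand-in for struct panvk_video_decode_op: fields are copied by
 * value at record time so nothing depends on app memory at submit time. */
struct decode_op {
   uint64_t src_offset, src_size;
   uint32_t dst_dpb_slot;
   uint32_t reference_slot_count;
};

/* Minimal stand-in for util_dynarray (hypothetical names). */
struct op_list {
   struct decode_op *ops;
   size_t count, cap;
};

/* Called from CmdDecodeVideoKHR: append only, no ioctls. */
static int op_list_record(struct op_list *l, const struct decode_op *op)
{
   if (l->count == l->cap) {
      size_t cap = l->cap ? l->cap * 2 : 4;
      struct decode_op *n = realloc(l->ops, cap * sizeof(*n));
      if (!n)
         return -1; /* would become VK_ERROR_OUT_OF_HOST_MEMORY */
      l->ops = n;
      l->cap = cap;
   }
   l->ops[l->count++] = *op;
   return 0;
}

/* Called from the queue's driver_submit: replay in record order. */
typedef int (*replay_fn)(void *ctx, const struct decode_op *op);
static int op_list_replay(const struct op_list *l, replay_fn fn, void *ctx)
{
   for (size_t i = 0; i < l->count; i++) {
      int r = fn(ctx, &l->ops[i]);
      if (r)
         return r;
   }
   return 0;
}

/* Command-buffer reset: drop every recorded op. */
static void op_list_reset(struct op_list *l)
{
   free(l->ops);
   memset(l, 0, sizeof(*l));
}

/* Test helper: records the order in which DPB slots were replayed. */
struct slot_trace { uint32_t slots[8]; size_t n; };
static int trace_slot(void *ctx, const struct decode_op *op)
{
   struct slot_trace *t = ctx;
   t->slots[t->n++] = op->dst_dpb_slot;
   return 0;
}
```

Resetting the list on `vkResetCommandBuffer` (or on pool reset) keeps a re-recorded command buffer from replaying stale ops.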
### E.3 CmdBegin/Control/End for our backend

- `panvk_per_arch(CmdBeginVideoCodingKHR)`: stash `cmd_buffer->video_decode_session = vid; cmd_buffer->video_decode_params = params;`. Optionally, validate that the reference slot layout matches the DPB slot count we set up at session init.
- `panvk_per_arch(CmdControlVideoCodingKHR)` for `VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR`: this needs to translate to `MEDIA_REQUEST_IOC_REINIT` on all pooled request_fds. Alternatively, it can just set a session-wide "next decode needs fresh request setup" flag. In Phase 1 we can make this a no-op if we always reinit per submit anyway. Note that a reset also deactivates every DPB slot, so once the F.3 slot bookkeeping exists, it must be cleared at submit time.
- `panvk_per_arch(CmdEndVideoCodingKHR)`: clear the shadow state. No emission needed.

---

## F. DPB management

### F.1 Vulkan-side DPB model

Each `vkCmdDecodeVideoKHR` call receives:
- `frame_info->dstPictureResource`: `VkVideoPictureResourceInfoKHR { codedOffset, codedExtent, baseArrayLayer, imageViewBinding }`, the image view that will receive the decoded output.
- `frame_info->pSetupReferenceSlot`: `VkVideoReferenceSlotInfoKHR { slotIndex, pPictureResource }`, which says "this decoded frame becomes DPB slot N".
- `frame_info->pReferenceSlots[]`: the references to read from. Each carries `slotIndex` + `pPictureResource`.

For H.264, additionally:
- the `pNext` chain carries `VkVideoDecodeH264PictureInfoKHR { pStdPictureInfo, sliceCount, pSliceOffsets }`
- each reference slot's pNext carries `VkVideoDecodeH264DpbSlotInfoKHR { pStdReferenceInfo }`, which contains the POC and short-term/long-term flags.

Anv's reference assembly logic at `genX_cmd_video.c:992-1004`:

```c
for (unsigned i = 0; i < frame_info->referenceSlotCount; i++) {
   const struct anv_image_view *ref_iv = anv_image_view_from_handle(
      frame_info->pReferenceSlots[i].pPictureResource->imageViewBinding);
   int idx = frame_info->pReferenceSlots[i].slotIndex;
   ...
   dpb_slots[idx] = i;
   buf.ReferencePictureAddress[i] = anv_image_dpb_address(ref_iv, baseArrayLayer);
}
```

### F.2 V4L2 DPB model
|
||||
|
||||
`v4l2_ctrl_h264_decode_params::dpb[16]` is an array of `struct v4l2_h264_dpb_entry { reference_ts, pic_num, frame_num, fields, flags, top_field_order_cnt, bottom_field_order_cnt }`. Each entry's `reference_ts` is the timestamp used at VIDIOC_QBUF of the OUTPUT (bitstream) plane when that reference was decoded — V4L2 uses this as the "buffer identity" key.
|
||||
|
||||
So the mapping rule from Vulkan-side `VkVideoReferenceSlotInfoKHR[]` to V4L2-side `dpb[16]` is:
|
||||
|
||||
| Vulkan field | V4L2 dpb field | How to source |
|
||||
|---|---|---|
|
||||
| `pReferenceSlots[i].slotIndex` | array index in `dpb[]` | direct (assert `<= 16`) |
|
||||
| `pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[0]` | `top_field_order_cnt` | direct |
|
||||
| `pReferenceSlots[i].pNext->pStdReferenceInfo->PicOrderCnt[1]` | `bottom_field_order_cnt` | direct |
|
||||
| `pReferenceSlots[i].pNext->pStdReferenceInfo->FrameNum` | `frame_num` | direct |
|
||||
| short-term/long-term flag | `flags` | direct |
|
||||
| (the decoded output VkImage backing the ref slot) | `reference_ts` | **lookup**: we maintain a `slotIndex → reference_ts` map per-session, populated each time we decode into that slot. See libva-fourier `src/h264.c:140-218` for `dpb_insert`/`dpb_update`/`dpb_find_entry`. Our case is simpler: slotIndex is provided by Vulkan, we just need to track "what ts did I QBUF when I last decoded into slotIndex N". |
|
||||
|
||||
The fourier `src/h264.c:238-353` `h264_fill_dpb` function is the closest analog — it constructs `struct v4l2_h264_dpb_entry[]` from libva-side state. We do the analog but feed it from `pReferenceSlots[]`.
|
||||
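
A minimal sketch of the mapping table in C. The structs are simplified stand-ins with hypothetical names (not the real `StdVideoDecodeH264ReferenceInfo` or `struct v4l2_h264_dpb_entry` definitions) so the example compiles on its own; the flag values mirror the kernel's `V4L2_H264_DPB_ENTRY_FLAG_VALID`/`ACTIVE`/`LONG_TERM` bits. `fill_dpb` and the `slot_ts[]` map are assumptions about how the slot bookkeeping could be wired in, and field-pair handling (`fields`, `pic_num`) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins (hypothetical), NOT the real Vulkan/kernel structs. */
struct std_ref_info {            /* ~ StdVideoDecodeH264ReferenceInfo */
   bool used_for_long_term_reference;
   uint16_t FrameNum;
   int32_t PicOrderCnt[2];       /* [0] = top field, [1] = bottom field */
};

struct ref_slot {                /* ~ VkVideoReferenceSlotInfoKHR + DPB-slot pNext */
   int32_t slotIndex;
   const struct std_ref_info *std;
};

struct dpb_entry {               /* ~ struct v4l2_h264_dpb_entry (subset) */
   uint64_t reference_ts;
   uint16_t frame_num;
   int32_t top_field_order_cnt;
   int32_t bottom_field_order_cnt;
   uint32_t flags;
};

/* Values mirror V4L2_H264_DPB_ENTRY_FLAG_{VALID,ACTIVE,LONG_TERM}. */
#define DPB_FLAG_VALID     0x01u
#define DPB_FLAG_ACTIVE    0x02u
#define DPB_FLAG_LONG_TERM 0x04u

#define DPB_SLOTS 16

/* Fill dpb[] from the Vulkan reference list; slot_ts[] is the per-session
 * slotIndex -> reference_ts map. Unused entries stay zeroed (not VALID). */
static void fill_dpb(struct dpb_entry dpb[DPB_SLOTS],
                     const struct ref_slot *refs, unsigned ref_count,
                     const uint64_t slot_ts[DPB_SLOTS])
{
   memset(dpb, 0, DPB_SLOTS * sizeof(*dpb));
   for (unsigned i = 0; i < ref_count; i++) {
      int32_t idx = refs[i].slotIndex;
      assert(idx >= 0 && idx < DPB_SLOTS);
      const struct std_ref_info *std = refs[i].std;
      struct dpb_entry *e = &dpb[idx];

      e->reference_ts = slot_ts[idx];
      e->frame_num = std->FrameNum;
      e->top_field_order_cnt = std->PicOrderCnt[0];
      e->bottom_field_order_cnt = std->PicOrderCnt[1];
      e->flags = DPB_FLAG_VALID | DPB_FLAG_ACTIVE;
      if (std->used_for_long_term_reference)
         e->flags |= DPB_FLAG_LONG_TERM;
   }
}
```

Placing each entry at `dpb[slotIndex]` keeps the Vulkan slot number and the V4L2 array position identical, which keeps debugging simple; entries that are not referenced stay zeroed and therefore not VALID.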

### F.3 Bookkeeping struct in panvk_video_session

```c
struct panvk_video_session {
   /* ... */
   struct {
      uint64_t reference_ts;     /* timestamp last used when decoding into this slot */
      struct panvk_image *image; /* the VkImage backing this slot's DPB */
      uint32_t array_layer;
      bool active;
   } dpb[16];
};
```

Update the setup reference slot's entry at decode-completion time (after VIDIOC_DQBUF).
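
The update/lookup cycle can be sketched as follows. This assumes a trimmed copy of the `dpb[16]` bookkeeping (the `image`/`array_layer` members are omitted), hypothetical helper names, and `-1` as the marker for "no setup reference slot". Clearing every slot on `VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR` is an assumption about where the reset handling would live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DPB_SLOTS 16

/* Trimmed stand-in for the proposed panvk_video_session::dpb[] entries. */
struct slot_state {
   uint64_t reference_ts;
   bool active;
};

struct video_session_dpb {
   struct slot_state dpb[DPB_SLOTS];
};

/* Called after VIDIOC_DQBUF(CAPTURE) for the op's setup reference slot. */
static void dpb_record_decode(struct video_session_dpb *s, int32_t setup_slot,
                              uint64_t qbuf_ts)
{
   if (setup_slot < 0 || setup_slot >= DPB_SLOTS)
      return; /* no setup slot: the picture is not kept as a reference */
   s->dpb[setup_slot].reference_ts = qbuf_ts;
   s->dpb[setup_slot].active = true;
}

/* Used while building dpb[] for a later decode; false if the slot is unknown. */
static bool dpb_lookup_ts(const struct video_session_dpb *s, int32_t slot,
                          uint64_t *ts_out)
{
   if (slot < 0 || slot >= DPB_SLOTS || !s->dpb[slot].active)
      return false;
   *ts_out = s->dpb[slot].reference_ts;
   return true;
}

/* Reset: every slot becomes inactive. */
static void dpb_reset(struct video_session_dpb *s)
{
   memset(s->dpb, 0, sizeof(s->dpb));
}
```

A failed lookup means the app referenced a slot it never decoded into, which the driver can treat as a validation error rather than sending a stale timestamp to the kernel.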

---

## G. Memory + dmabuf interop

### G.1 The challenge

The app creates a `VkImage` with `VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR | VK_IMAGE_USAGE_SAMPLED_BIT` and binds memory via the normal `vkBindImageMemory`. The decoded frame data then has to physically end up in that memory.

Hantro's CAPTURE queue either allocates its own buffers via `VIDIOC_REQBUFS(memory=V4L2_MEMORY_MMAP)` or accepts dma_buf imports via `VIDIOC_REQBUFS(memory=V4L2_MEMORY_DMABUF)`. The clean path: **the app's VkImage memory IS a dma_buf**, exported from panvk via `vkGetMemoryFdKHR` and passed as the CAPTURE buffer fd in VIDIOC_QBUF.

But Vulkan apps don't usually export memory back to themselves; they expect `vkCreateImage(usage=VIDEO_DECODE_DST)` to "just work". So **we** drive the dma_buf flow internally.

### G.2 Internal dma_buf flow (proposed)

Two strategies:

**Strategy A: Driver-allocated CAPTURE buffers, imported into the VkImage's memory**
- VIDIOC_REQBUFS(MMAP) at session create.
- VIDIOC_EXPBUF to get a dma_buf fd per allocated buffer.
- Import each dma_buf into pan_kmod as a VkDeviceMemory equivalent.
- vkBindImageMemory to that DeviceMemory.

**Strategy B: App-allocated VkImage, V4L2_MEMORY_DMABUF queue**
- The app calls vkCreateImage with `VkExternalMemoryImageCreateInfo` (handleTypes=DMA_BUF).
- panvk allocates the BO via pan_kmod and exports a dma_buf fd via `pan_kmod_bo_export` (`panvk_device_memory.c:387-404`).
- VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF, fd=our_dmabuf_fd) at submit time.

**Strategy B is what fourier does for surface buffers, and it's the cleaner fit**: the app gets a real VkImage with real VkDeviceMemory, and we never have to fake the import direction. Phase 1 may want to start with Strategy A for simplicity, since vk-video-samples likely doesn't pass `VkExternalMemoryImageCreateInfo`, but Strategy B is the long-term right answer.

### G.3 Anv's DPB image allocation

Anv treats DPB images as plain VkImages with no special allocation. The HW reads them directly via `anv_image_dpb_address(iv, baseArrayLayer)` at `genX_cmd_video.c:933`, and the memory layout is whatever ISL assigns (tile-Y or planar 4:2:0). That doesn't transfer to our backend: the Hantro VPU expects NV12 in a linear layout (or a vendor-specific tiled layout that we would need to expose). For Phase 1 we mandate linear.
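
The linear NV12 mandate comes down to simple size arithmetic, sketched here with hypothetical names. It assumes width and height are padded to 16-pixel H.264 macroblocks with no extra stride padding; the Hantro driver's actual requirements should be read back from `VIDIOC_G_FMT` (`bytesperline`, `sizeimage`) rather than hard-coded.

```c
#include <assert.h>
#include <stdint.h>

/* Linear NV12: a full-resolution Y plane followed by an interleaved CbCr
 * plane with half as many rows (each row holds width/2 Cb,Cr pairs). */
struct nv12_layout {
   uint32_t stride;     /* bytes per row, same for both planes */
   uint32_t rows;       /* luma rows after alignment */
   uint64_t uv_offset;  /* byte offset of the CbCr plane */
   uint64_t size;       /* total bytes */
};

static uint32_t align_up(uint32_t v, uint32_t a)
{
   return (v + a - 1) / a * a;
}

/* Assumption: 16-pixel (macroblock) alignment in both dimensions. */
static struct nv12_layout nv12_linear_layout(uint32_t width, uint32_t height)
{
   struct nv12_layout l;
   l.stride = align_up(width, 16);
   l.rows = align_up(height, 16);
   l.uv_offset = (uint64_t)l.stride * l.rows;
   l.size = l.uv_offset + l.uv_offset / 2;
   return l;
}
```

For 1920x1080 this gives a 1920-byte stride, 1088 luma rows, the CbCr plane at offset 2088960, and 3133440 bytes in total.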

### G.4 panvk dmabuf entry points (already present)

- `panvk_AllocateMemory` handles `VkImportMemoryFdInfoKHR` at `panvk_device_memory.c:121-135` by calling `pan_kmod_bo_import`.
- `panvk_GetMemoryFdKHR` at `panvk_device_memory.c:387-404` exports.
- `EXT_external_memory_dma_buf` is already advertised at `panvk_vX_physical_device.c:146`.

So the building blocks exist. The new code is the **session-internal V4L2 buffer pool** that converts between V4L2_MEMORY_MMAP/DMABUF buffers and pan_kmod BOs.

---

## H. vk_video runtime helper coverage matrix

What we inherit vs. what we write, cross-referenced from sections A through G:

| Question | Inherit from vk_video shared layer? | Driver writes? |
|---|---|---|
| A. KHR_video_* extension booleans | No | YES: `panvk_vX_physical_device.c` table |
| A. videoMaintenance1 feature struct | No | Phase 1: skip; later: yes, if advertised |
| A. GetPhysicalDeviceVideoCapabilitiesKHR | **NO**: direct entrypoint | YES: new code in `panvk_physical_device.c` |
| A. GetPhysicalDeviceVideoFormatPropertiesKHR | **NO**: direct entrypoint | YES: new code in `panvk_physical_device.c` |
| B. Queue family enum + props | No | YES: `panvk_device.h` + `panvk_physical_device.c` |
| B. Queue-family-video pNext walk | No | YES: extend `panvk_GetPhysicalDeviceQueueFamilyProperties2` |
| B. Queue create/destroy dispatch | No | YES: extend `panvk_vX_device.c:305-329` |
| B. Queue submit | No | YES: new `panvk_vX_video_decode_queue.c` |
| C. CreateVideoSessionKHR: handle + base init | YES, partially: `vk_video_session_init` does the codec-op parsing | YES: driver wraps it and adds V4L2 fd open + S_FMT + REQBUFS |
| C. DestroyVideoSessionKHR: base finish | YES, partially: `vk_video_session_finish` | YES: driver wraps it and adds V4L2 teardown |
| C. GetVideoSessionMemoryRequirementsKHR | No | YES (trivial: zero entries) |
| C. BindVideoSessionMemoryKHR | No | YES (trivial: no-op) |
| D. CreateVideoSessionParametersKHR | **YES: `vk_common_CreateVideoSessionParametersKHR` (vk_video.c:846)** | NO driver code needed |
| D. UpdateVideoSessionParametersKHR | **YES: `vk_common_UpdateVideoSessionParametersKHR` (vk_video.c:865)** | NO driver code needed |
| D. DestroyVideoSessionParametersKHR | **YES: `vk_common_DestroyVideoSessionParametersKHR` (vk_video.c:875)** | NO driver code needed |
| D. H.264 SPS/PPS storage | **YES: `struct vk_video_h264_{sps,pps}` (vk_video.h:32-43)** | NO |
| D. H.264 SPS/PPS lookup | **YES: `vk_video_find_h264_dec_std_{sps,pps}` (vk_video.c:1186)** | NO |
| D. H.264 params merge with dedup | **YES: internal to `vk_video_session_parameters_update`** | NO |
| D. Std → V4L2 control marshalling | No precedent in Mesa | YES: NEW helper file (~300 lines for H.264) |
| E. CmdBeginVideoCodingKHR | No | YES: trivial state stash |
| E. CmdControlVideoCodingKHR | No | YES: trivial RESET handling |
| E. CmdEndVideoCodingKHR | No | YES: trivial state clear |
| E. CmdDecodeVideoKHR | No | YES: record op into cmdbuf dynarray |
| E. `vk_video_get_h264_parameters` resolver | **YES (vk_video.h:419)** | NO |
| F. DPB slot ↔ reference_ts map | No | YES: `panvk_video_session.dpb[16]` |
| F. H.264 reference list construction | Partially: `vk_fill_video_h264_*` helpers, if present | YES, but mostly direct field copies |
| G. dmabuf BO import/export | YES: existing panvk path (`panvk_device_memory.c:121,387`) | NO new code |
| G. V4L2 buffer ↔ pan_kmod_bo bridging | No precedent | YES: NEW helper file |
| G. Image creation for VIDEO_DECODE_DST | YES: existing `panvk_image_init` (panvk_image.c:562) handles all usage flags through panfrost's own image-layout code (ISL is anv-only) | Possibly, for tile-mode restrictions |

**Net leverage**: roughly 3000 lines of vk_video runtime helpers we inherit for free, primarily H.264 SPS/PPS storage and lookup, the parameters-object lifecycle, and the std/find helpers. Our new-code estimate is roughly 800-1500 lines split across ~4 new files (see I).

---

## I. panvk-specific integration points (concrete edits)

### I.1 Existing files to modify

**`src/panfrost/vulkan/panvk_vX_physical_device.c`**:
- Lines ~123-124 (between `KHR_vertex_attribute_divisor` and `KHR_vulkan_memory_model`): add `.KHR_video_queue = true,`, `.KHR_video_decode_queue = true,` and `.KHR_video_decode_h264 = true,`, all gated on the hantro probe.
- Optional, Phase 2+: at line 540, flip `unifiedImageLayoutsVideo` based on session config.

**`src/panfrost/vulkan/panvk_physical_device.c`**:
- Line ~565: extend the `qfamily_props[]` array with a third entry for `PANVK_QUEUE_FAMILY_VIDEO_DECODE` with `queueFlags = VK_QUEUE_VIDEO_DECODE_BIT_KHR | VK_QUEUE_TRANSFER_BIT`.
- Around line 589, inside the `vk_outarray_append_typed` loop: add a pNext walk for `VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR` that sets `videoCodecOperations = VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR`.
- ADD new entrypoints `panvk_GetPhysicalDeviceVideoCapabilitiesKHR` and `panvk_GetPhysicalDeviceVideoFormatPropertiesKHR` at the end of the file (~70 + ~50 lines).
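
The pNext walk from the `qfamily_props[]` bullets above reduces to the loop below. The types and enum values are placeholders standing in for `VkBaseOutStructure`, `VkQueueFamilyVideoPropertiesKHR` and the real `VK_STRUCTURE_TYPE_*` / codec-operation constants, so the sketch compiles without Vulkan headers; `fill_qfamily_pnext` and the `qfamily` enum are hypothetical. Inside panvk the loop would more naturally use Mesa's `vk_foreach_struct` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, NOT the real Vulkan constants. */
enum s_type {
   STYPE_QUEUE_FAMILY_VIDEO_PROPERTIES = 1,
   STYPE_OTHER = 2,
};
#define CODEC_OP_DECODE_H264 0x1u

struct base_out {              /* ~ VkBaseOutStructure */
   enum s_type sType;
   struct base_out *pNext;
};

struct qf_video_props {        /* ~ VkQueueFamilyVideoPropertiesKHR */
   enum s_type sType;
   struct base_out *pNext;
   uint32_t videoCodecOperations;
};

enum qfamily { QF_OTHER, QF_VIDEO_DECODE };

/* Walk the chain hanging off one VkQueueFamilyProperties2 entry. */
static void fill_qfamily_pnext(struct base_out *chain, enum qfamily family)
{
   for (struct base_out *s = chain; s != NULL; s = s->pNext) {
      if (s->sType == STYPE_QUEUE_FAMILY_VIDEO_PROPERTIES) {
         struct qf_video_props *v = (struct qf_video_props *)s;
         /* Only the video family reports codec operations. */
         v->videoCodecOperations =
            family == QF_VIDEO_DECODE ? CODEC_OP_DECODE_H264 : 0;
      }
   }
}
```

Writing 0 for the non-video families matters because the app may chain the same struct type into every family's query and expects a defined value back.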

**`src/panfrost/vulkan/panvk_device.h`**:
- Lines 46-48: add `PANVK_QUEUE_FAMILY_VIDEO_DECODE,` to the enum.

**`src/panfrost/vulkan/panvk_vX_device.c`**:
- Lines 218-247 (`check_global_priority`): add `case PANVK_QUEUE_FAMILY_VIDEO_DECODE: return VK_SUCCESS;`.
- Lines 253-258 (`panvk_queue_check_status`): add a case for the new family that calls `panvk_per_arch(video_decode_queue_check_status)`.
- Lines 305-313 (`panvk_queue_create`): add a case that calls `panvk_per_arch(create_video_decode_queue)`.
- Lines 320-329 (`panvk_queue_destroy`): symmetric.

**`src/panfrost/vulkan/meson.build`**:
- Add the new files to either `libpanvk_files` (arch-agnostic) or `common_per_arch_files` (arch-templated). The session/queue/command-record code is arch-agnostic and uses `panvk_per_arch()` symbols only by convention, so for Phase 1 we can place all new files in `libpanvk_files` and skip the per-arch dispatch.

### I.2 New files to add

**`src/panfrost/vulkan/panvk_video_decode.c`** (~400 lines):
- `panvk_CreateVideoSessionKHR`
- `panvk_DestroyVideoSessionKHR`
- `panvk_GetVideoSessionMemoryRequirementsKHR` (returns count=0)
- `panvk_BindVideoSessionMemoryKHR` (no-op)
- `panvk_CmdBeginVideoCodingKHR`
- `panvk_CmdControlVideoCodingKHR`
- `panvk_CmdEndVideoCodingKHR`
- `panvk_CmdDecodeVideoKHR` (records the op into `cmd_buffer->video_decode_ops`)

**`src/panfrost/vulkan/panvk_video_decode.h`**:
- `struct panvk_video_session`
- `struct panvk_video_decode_op`
- `struct panvk_video_decode_queue`

**`src/panfrost/vulkan/panvk_v4l2.c`** (~500 lines):
- `panvk_v4l2_probe_hantro()`: locates the decoder's video and media nodes (/dev/video1 and /dev/media0 on the PineTab2), mirroring `find_decoder_video_node_via_topology` in libva-v4l2-request-fourier `src/request.c:143-308`.
- `panvk_v4l2_session_init()`: S_FMT on OUTPUT/CAPTURE, REQBUFS, request_fd pool allocation.
- `panvk_v4l2_h264_std_to_ctrl_sps()`: `StdVideoH264SequenceParameterSet *` → `struct v4l2_ctrl_h264_sps`.
- `panvk_v4l2_h264_std_to_ctrl_pps()`: `StdVideoH264PictureParameterSet *` → `struct v4l2_ctrl_h264_pps`.
- `panvk_v4l2_h264_fill_decode_params()`: builds `struct v4l2_ctrl_h264_decode_params` from VkVideoDecodeInfoKHR + the slot map.
- `panvk_v4l2_submit_op()`: the request_fd / S_EXT_CTRLS / QBUF / poll / DQBUF sequence for one op.

**`src/panfrost/vulkan/panvk_vX_video_decode_queue.c`** (~150 lines, per-arch):
- `panvk_per_arch(create_video_decode_queue)`
- `panvk_per_arch(destroy_video_decode_queue)`
- `panvk_per_arch(video_decode_queue_submit)`: walks the cmdbuf ops and calls `panvk_v4l2_submit_op` per op.
- `panvk_per_arch(video_decode_queue_check_status)`

### I.3 Entrypoint generation

Recall from `meson.build:7-19` that entrypoints are auto-wired with `--prefix panvk` and per-arch prefixes. The names above (`panvk_CmdDecodeVideoKHR` etc.) match the auto-resolution rules, so no changes are needed in the `vk_entrypoints_gen` invocation.

For the per-arch functions (`panvk_per_arch(...)`), we expand under each `PAN_ARCH` define, just like existing per-arch code.

---

## J. Probable architecture sketch

**V4L2 fd ownership**: probe-time discovery happens at the `panvk_physical_device` level (`panvk_v4l2_probe_hantro` sets `phys_dev->v4l2.video_fd_present = true` and stashes the paths), but the actual `open()` happens per session in `panvk_CreateVideoSessionKHR`. Two reasons: (1) V4L2 driver state is per-fd, so two concurrent sessions need two separate fds anyway; (2) keeping fds closed while no video session is active is good citizenship. The PhysicalDevice only holds device-node paths and capability flags.

**Per-session V4L2 state**: `struct panvk_video_session` (see C.3) owns one `video_fd`, one `media_fd`, and a pool of `request_fd`s (one per in-flight decode, typically `max_dpb_slots + 2`). At `CreateVideoSession` we S_FMT both queues, REQBUFS to allocate the buffer count, and EXPBUF the CAPTURE buffers to dma_bufs that the session holds for later association with VkImage memory. Note that EXPBUF of driver-allocated buffers is Strategy A in G.2's terms; the per-VkImage bookkeeping below follows Strategy B instead.

**Per-VkImage dmabuf bookkeeping**: the existing pan_kmod export path (`panvk_device_memory.c:387-404`) gives us a dma_buf. The new piece is the inverse direction: at `vkBindImageMemory` time, for a `VkImage` whose `usage & VIDEO_DECODE_DST` is set, we register the underlying BO's dma_buf as a CAPTURE buffer with `VIDIOC_QBUF(memory=V4L2_MEMORY_DMABUF)`. The image's `panvk_image` struct gains an `int v4l2_capture_index;` field.

**Submit-time dispatch**: at `panvk_vX_device.c:305-313` we extend the switch to route `PANVK_QUEUE_FAMILY_VIDEO_DECODE` to `panvk_per_arch(create_video_decode_queue)`, whose `driver_submit = panvk_per_arch(video_decode_queue_submit)`. The submit function walks each cmdbuf's `video_decode_ops` dynarray and, per op:

```
1. resolve request_fd from the session pool (reuse, or allocate via ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC))
2. ioctl(request_fd, MEDIA_REQUEST_IOC_REINIT) if reusing
3. translate op->sps to v4l2_ctrl_h264_sps via panvk_v4l2_h264_std_to_ctrl_sps()
4. translate op->pps to v4l2_ctrl_h264_pps via panvk_v4l2_h264_std_to_ctrl_pps()
5. build v4l2_ctrl_h264_decode_params from op (including dpb[] from the session->dpb[] tracking)
6. VIDIOC_S_EXT_CTRLS(video_fd, request_fd=op->request_fd, {SPS, PPS, DECODE_PARAMS, SCALING_MATRIX, SLICE_PARAMS})
7. VIDIOC_QBUF(video_fd, OUTPUT, request_fd=op->request_fd, bytesused=op->src_size, m.fd=dma_buf of op->src_buffer's BO)
8. VIDIOC_QBUF(video_fd, CAPTURE, index=op->dst_iv->image->v4l2_capture_index)
9. ioctl(request_fd, MEDIA_REQUEST_IOC_QUEUE)
10. poll(request_fd, POLLPRI, timeout)
11. VIDIOC_DQBUF(video_fd, OUTPUT)    /* releases the input slot */
12. VIDIOC_DQBUF(video_fd, CAPTURE)   /* output ready */
13. update session->dpb[op->dst_dpb_slot].reference_ts to the QBUF timestamp
14. signal vk_queue_submit's signal semaphores
```

Steps 5-12 are exactly the libva-v4l2-request-fourier `RequestEndPicture` body (`src/picture.c:497-650`). The VAPicture* → V4L2 vs. Std* → V4L2 mapping is the one piece of code with no Mesa precedent (we are inventing the bridge), but it is bounded: ~150 lines per codec, and Phase 1 only needs H.264.
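
Steps 7 and 13 tie each decode to a timestamp, and a later `dpb[].reference_ts` must match the CAPTURE buffer's timestamp in nanoseconds. V4L2 documents `v4l2_timeval_to_ns()` for that conversion; the sketch below reproduces its formula so it stays self-contained. `struct ts_source` and both helpers are hypothetical, and using a per-session counter (rather than wall-clock time) is an assumption: V4L2 only needs the value to be unique among live buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical per-session source of unique buffer timestamps. */
struct ts_source {
   uint64_t next;
};

/* Step 7: value for v4l2_buffer.timestamp on the OUTPUT (bitstream) buffer. */
static struct timeval ts_next_qbuf(struct ts_source *src)
{
   uint64_t n = ++src->next;   /* starts at 1, so 0 never identifies a frame */
   struct timeval tv;
   tv.tv_sec = (time_t)(n / 1000000);
   tv.tv_usec = (suseconds_t)(n % 1000000);
   return tv;
}

/* Step 13 / dpb[] fill: same arithmetic as the kernel's v4l2_timeval_to_ns(). */
static uint64_t ts_to_reference_ts(const struct timeval *tv)
{
   return (uint64_t)tv->tv_sec * 1000000000ull + (uint64_t)tv->tv_usec * 1000ull;
}
```

Encoding the counter in microseconds keeps the round trip exact: the first three decodes get `reference_ts` values of 1000, 2000 and 3000.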

---

## Mesa-version observations and risks

- Mesa 26.0.6 is the campaign baseline. The vk_video runtime helpers in `src/vulkan/runtime/vk_video.{c,h}` are stable in this version, with H.264, H.265, AV1, VP9, encode-h264, encode-h265 and encode-av1 all covered. No upgrade is required for Phase 1.
- `KHR_video_decode_h264` spec v9 is what's in `vk_api.xml` for 26.0.6, consistent with the extension already being known to the entrypoint generator (no `--beta` flag needed; the flag at `meson.build:18` is for beta/provisional extensions only).
- Maintenance1/2 features are NOT required for the simple-test in Phase 1, so we don't need `videoMaintenance1` / `videoMaintenance2` machinery yet. Maintenance1 (inline queries) and Maintenance2 (inline session parameters) become relevant in Phase 6+ if we want to pass conformance suites.
- The `unifiedImageLayoutsVideo` feature at `panvk_vX_physical_device.c:540` is currently false. For Phase 1 we can leave it false, since the test client issues explicit `VkImageMemoryBarrier` transitions to/from `VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR`.

---

## Architectural maps that DO cleanly transfer from anv/radv

1. **Session as a wrapper around `vk_video_session`**. Anv: `struct anv_video_session { struct vk_video_session vk; ... }`. radv: same shape. Ours: same shape. The `vk.` member gives us all the spec-mandated session fields for free.
2. **Parameters fully delegated to `vk_common_*`**. Anv does this; radv mostly does (with a small `radv_video_patch_session_parameters` patch). Ours: full delegation.
3. **Cmdbuf-local shadow state for the current session + params during the Begin..End scope**. Anv: `cmd_buffer->video.{vid,params}`. We do the same.
4. **DPB slot index ↔ image view lookup at decode time**. Both anv and our backend do this lookup per frame.

## Architectural maps that DO NOT transfer

1. **Driver-allocated session scratch memory (`anv_vid_mem` array)**. The Hantro VPU keeps its scratch internal, so we return zero memory requirements. This is a hard skip: not just a simplification, an inversion.
2. **`anv_batch_emit` register packets written directly into the cmdbuf at record time**. There is no equivalent. We MUST defer to submit time; that is the entire point of the V4L2 backend living on a separate kernel device.
3. **`anv_image_dpb_address(iv, layer)` resolving to a GPU virtual address**. Our DPB images resolve to V4L2 buffer indices (queued at session init) or dma_buf fds (Strategy B), and inside `v4l2_ctrl_h264_decode_params::dpb[]` references are identified by `reference_ts` (F.2). The "address" abstraction doesn't apply; the VPU doesn't share the GPU's address space.
4. **MFX/HCP/VDENC register-set knowledge in `genX_cmd_video.c`**: 4000+ lines of Intel-specific HW programming. Completely irrelevant. The Hantro VPU's "programming" is a sequence of `struct v4l2_ctrl_*` fills + ioctls.
5. **MOCS / cache state in pipe-buf-addr-state** (`genX_cmd_video.c:962+`). N/A: the kernel V4L2 driver handles all cache coherency at QBUF/DQBUF boundaries.

---

## Phase 1 success criteria: final checklist

| vk-video-samples simple-test step | Where it lands in this map |
|---|---|
| `vkGetPhysicalDeviceQueueFamilyProperties2` returns a family with `VK_QUEUE_VIDEO_DECODE_BIT_KHR` and `VkQueueFamilyVideoPropertiesKHR::videoCodecOperations & VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR` set | B.2 |
| `vkEnumerateDeviceExtensionProperties` returns the three KHR_video_* extensions | A.1 |
| `vkGetPhysicalDeviceVideoCapabilitiesKHR(profile=H264)` returns sane caps | A.3 |
| `vkGetPhysicalDeviceVideoFormatPropertiesKHR` returns NV12 | A.4 |
| `vkCreateDevice` succeeds with the video queue family selected | B.3 |
| `vkCreateVideoSessionKHR` succeeds | C |
| `vkGetVideoSessionMemoryRequirementsKHR` returns 0 entries | C.3 |
| `vkCreateVideoSessionParametersKHR` with SPS+PPS succeeds | D (free from vk_common) |
| Recording a `vkCmdDecodeVideoKHR` succeeds (no execution yet; the V4L2 ioctls could even be no-ops in Phase 1, since correctness isn't tested) | E.2 |
| A single queue submit succeeds without VK_ERROR_DEVICE_LOST | B.4, J |

Phase 1 deliberately stops short of "decoded picture matches the reference"; that's Phase 7. Phase 1 is the end-to-end plumbing.