# Phase 2 - design lock for panvk-bifrost-video

Phase 1 source-map (`phase1_source_map.md`) acquired the architecture. This document locks the implementation-level decisions that bind Phase 4. Where Phase 1 listed options, this picks one.

## Re-anchored constraints (re-verified 2026-05-21)

- ohm reachable, kernel `linux-fresnel-fourier` with `dma_resv` patches
- `/dev/video1` (hantro decoder) + `/dev/media0` (media controller) present
- libva-v4l2-request-fourier installed and exercising the same V4L2 path, which proves the protocol works (1.56× / 1.73× realtime). **Coexistence policy: env-mutex (Phase 0 Q1 lock A).** Only one client holds `/dev/video1` at a time; user picks via `LIBVA_DRIVER_NAME=null` or service-level coordination.
- `mesa-panvk-bifrost` r4 source on ohm at `/home/mfritsche/mesa-build/mesa-26.0.6/`. Reuses the same r1–r4 patch lineage in PKGBUILD; new package `mesa-panvk-bifrost-video` is a sibling (see Phase 0 [[campaign-close-via-pkgbuild]]).
- Vulkan headers: 26.0.6's bundled `vk.xml` has H.264 decode v9 stable. No `--beta` flag needed.
- Test bitstream: `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` (725 MB H.264 Main, 1080p30), proven decoding via libva path 2026-05-21.
- vk-video-samples builds on aarch64 (Phase 0). simple-test binary at `~/panvk-bifrost-video-evidence/Vulkan-Video-Samples/build/vk_video_decoder/test/vulkan-video-dec-simple-test`.

## Locked decisions

### D1 - V4L2 device ownership: per-`VkVideoSessionKHR`, not per-`VkDevice`

Each call to `vkCreateVideoSessionKHR` opens its own `video_fd` to `/dev/video1` and `media_fd` to `/dev/media0`. The PhysicalDevice only holds discovery state (paths + caps flags). Per Phase 1 §J reasoning: kernel V4L2 state is per-fd, multiple sessions need separate fds anyway, idle-when-no-session is good citizenship.

Trade-off rejected: per-device shared fd. Would force a session-arbitration daemon inside panvk. Not worth it for Phase 1; not needed for the simple-test workload (single session).

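
The per-session ownership rule in D1 can be sketched as a pair of open/close helpers. This is a minimal sketch under stated assumptions, not the panvk implementation: `session_fds`, `session_fds_open` and `session_fds_close` are hypothetical names introduced for illustration, and the device paths are passed in by the caller on the assumption that the real driver would take them from the PhysicalDevice's discovery state.

```c
#define _POSIX_C_SOURCE 200809L /* for O_CLOEXEC */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical per-session fd pair; not a Mesa type. */
struct session_fds {
    int video_fd; /* decoder node, e.g. /dev/video1 */
    int media_fd; /* media controller, e.g. /dev/media0 */
};

/* Open both nodes for exactly one session.
 * Returns 0 on success, -1 on failure. On failure nothing stays open
 * and both fds are left at -1, so the session holds no device state. */
int session_fds_open(struct session_fds *s,
                     const char *video_path,
                     const char *media_path)
{
    assert(s && video_path && media_path);

    s->video_fd = -1;
    s->media_fd = -1;

    s->video_fd = open(video_path, O_RDWR | O_CLOEXEC);
    if (s->video_fd < 0) {
        s->video_fd = -1;
        return -1;
    }

    s->media_fd = open(media_path, O_RDWR | O_CLOEXEC);
    if (s->media_fd < 0) {
        close(s->video_fd);
        s->video_fd = -1;
        s->media_fd = -1;
        return -1;
    }
    return 0;
}

/* Release both nodes; safe to call on a session that failed to open. */
void session_fds_close(struct session_fds *s)
{
    assert(s);
    if (s->media_fd >= 0)
        close(s->media_fd);
    if (s->video_fd >= 0)
        close(s->video_fd);
    s->media_fd = -1;
    s->video_fd = -1;
}
```

Passing `/dev/null` for both paths exercises the open and close paths without hardware; on ohm the arguments would be `/dev/video1` and `/dev/media0`. Because the fds live in the session rather than the device, destroying the last session leaves the decoder node free for other clients.
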

### D2 - File layout (committed)

New files in `src/panfrost/vulkan/`:

| File | Purpose | Est. LoC |
|---|---|---|
| `panvk_video_decode.c` | `VkVideoSession*` + `VkCmd*VideoCoding` entrypoints; record video_decode_ops dynarray | ~400 |
| `panvk_video_decode.h` | structs: `panvk_video_session`, `panvk_video_decode_op`, `panvk_video_decode_queue` | ~80 |
| `panvk_v4l2.c` | V4L2 probe + per-session init + `Std*` → `v4l2_ctrl_h264_*` bridge + `submit_op()` | ~500 |
| `panvk_vX_video_decode_queue.c` | per-arch queue create/destroy/submit (walks ops, calls `panvk_v4l2_submit_op`) | ~150 |

Modified files (locations from Phase 1 §I.1):

- `panvk_vX_physical_device.c` (extension list + capability/format entrypoints)
- `panvk_physical_device.c` (queue family list + video properties pNext walk)
- `panvk_device.h` (queue family enum)
- `panvk_vX_device.c` (queue create/destroy/submit dispatch, 4 cases)
- `meson.build` (register new sources)

### D3 - Per-session state struct (locked layout)

```c
struct panvk_video_session {
    struct vk_video_session vk;      /* spec-mandated fields */

    /* V4L2 fds: opened in CreateVideoSession, closed in Destroy */
    int video_fd;                    /* /dev/video1 */
    int media_fd;                    /* /dev/media0 */

    /* Negotiated formats per OUTPUT / CAPTURE queue */
    struct v4l2_format fmt_output;
    struct v4l2_format fmt_capture;

    /* Request fd pool. Max-in-flight = max_dpb_slots + 2 */
    int *request_fds;
    unsigned num_request_fds;
    uint32_t request_fd_next;        /* round-robin index */

    /* DPB slotIndex → V4L2 reference_ts mapping */
    struct {
        bool valid;
        uint64_t reference_ts;       /* V4L2 timestamp at QBUF time */
        /* No image-view pointer here: image references via slotIndex
         * only; resolution at record time via vk.params lookup. */
    } dpb[16];

    /* DECODE_PARAMS/SLICE_PARAMS submit mode (locked FRAME_BASED for Phase 1) */
    bool slice_based;                /* Phase 1: false */
};
```

DPB mirroring is identical to `libva-v4l2-request-fourier/src/h264.c:140-218` `dpb_insert` / `dpb_update`.
Reuse the algorithm; don't link the lib. Copy ~80 LoC verbatim into `panvk_v4l2.c`.

### D4 - Per-cmdbuf decode-op entry (locked layout)

```c
struct panvk_video_decode_op {
    /* Captured at vkCmdDecodeVideoKHR record time */
    uint32_t dst_dpb_slot;               /* output slot */
    struct panvk_image_view *dst_iv;     /* output VkImageView */

    uint32_t num_ref_slots;
    struct {
        uint32_t slot_index;
        struct panvk_image_view *iv;     /* reference VkImageView */
    } ref_slots[16];

    /* Bitstream buffer */
    struct panvk_buffer *src_buffer;
    uint64_t src_offset;
    uint64_t src_size;

    /* Cached params at record time (so submit can run after Parameters object updates) */
    const StdVideoH264SequenceParameterSet *sps;   /* from vk.params */
    const StdVideoH264PictureParameterSet *pps;
    VkVideoDecodeH264PictureInfoKHR pic_info;      /* the per-frame info */

    /* Filled at submit time */
    int request_fd;                      /* allocated from session pool */
    uint64_t qbuf_ts;                    /* timestamp used for dpb tracking */
};
```

Recorded as a `util_dynarray` on the command buffer. `vkResetCommandBuffer` clears it.

### D5 - Bitstream input: VkBuffer dmabuf import (one-shot)

At record time, the `VkBuffer` (with `VIDEO_DECODE_SRC_BIT_KHR` usage) carries a `panvk_priv_bo` with an exportable dmabuf. At submit time, op-submit does:

```
fd = pan_kmod_bo_export_dma_buf(src_buffer->bo)
VIDIOC_QBUF(video_fd, V4L2_BUF_TYPE_VIDEO_OUTPUT,
            memory=V4L2_MEMORY_DMABUF, m.fd=fd,
            bytesused=op->src_size,
            request_fd=op->request_fd)
```

Source-side buffers are not pinned to V4L2 OUTPUT slots: each decode gets a fresh QBUF using the dmabuf fd. After DQBUF the slot is implicitly released.

### D6 - Output frames: VkImage permanent CAPTURE slot binding (Strategy B from §G.2)

At `vkBindImageMemory` time, if the VkImage's usage includes `VIDEO_DECODE_DST_BIT_KHR`, the image's underlying BO is exported as a dmabuf and registered as a permanent CAPTURE buffer slot via `VIDIOC_QBUF(memory=DMABUF)` at session init. The slot index is then stashed in:

```c
struct panvk_image {
    ...
    int v4l2_capture_index;   /* -1 if not a video output image */
};
```

Rejected alternative: per-decode-call dmabuf import. Higher per-frame ioctl overhead. Strategy B amortizes the registration cost across the session lifetime.

### D7 - Submit-time ioctl dance (the 14 steps, locked)

```
panvk_per_arch(video_decode_queue_submit)(queue, submit):
  for each cmdbuf in submit:
    for each op in cmdbuf->video_decode_ops:
      panvk_v4l2_submit_op(session, op):
        1.  resolve request_fd: pool[round_robin++ % num] or MEDIA_IOC_REQUEST_ALLOC
        2.  ioctl(request_fd, MEDIA_REQUEST_IOC_REINIT)
        3.  fill v4l2_ctrl_h264_sps from op->sps via panvk_v4l2_h264_std_to_ctrl_sps()
        4.  fill v4l2_ctrl_h264_pps from op->pps via panvk_v4l2_h264_std_to_ctrl_pps()
        5.  fill v4l2_ctrl_h264_decode_params from op->pic_info + session->dpb[]
        6.  ext_controls = { SPS, PPS, DECODE_PARAMS, SCALING_MATRIX }
            (Phase 1: SLICE_PARAMS optional, FRAME_BASED → omit)
        7.  VIDIOC_S_EXT_CTRLS(video_fd, which=REQUEST_VAL, request_fd, ext_controls)
        8.  VIDIOC_QBUF(video_fd, OUTPUT, memory=DMABUF, request_fd,
                        m.fd=src_dmabuf, bytesused=op->src_size,
                        timestamp=op->qbuf_ts)
        9.  VIDIOC_QBUF(video_fd, CAPTURE, memory=DMABUF,
                        index=dst_iv->image->v4l2_capture_index)
        10. MEDIA_REQUEST_IOC_QUEUE(request_fd)
        11. poll(request_fd, POLLPRI, timeout_ms=200)
        12. VIDIOC_DQBUF(video_fd, OUTPUT)    /* release input slot */
        13. VIDIOC_DQBUF(video_fd, CAPTURE)   /* output ready */
        14. session->dpb[op->dst_dpb_slot] = { valid:true, reference_ts:op->qbuf_ts }
  vk_queue_signal_semaphores(submit->signal_semaphores)
```

Per Phase 1 §J. Step 11's 200 ms timeout is chosen against libva-v4l2-request-fourier behavior: that driver polls indefinitely, whereas we cap the wait so that a driver-side hang on a bad bitstream surfaces as a clean `VK_ERROR_DEVICE_LOST` instead of blocking the submit forever.

### D8 - Synchronization: standard vk_queue infrastructure

`panvk_per_arch(create_video_decode_queue)` initializes a `struct vk_queue` with `driver_submit = panvk_per_arch(video_decode_queue_submit)`.
Wait/signal semaphores are handled by the standard `vk_queue_submit` infrastructure. Inside `submit`, the `poll(request_fd)` in step 11 is the synchronous gate: when it returns, the decode is done in V4L2 land, and the signal semaphores are signaled before returning.

For Phase 1, **all video decodes are synchronous to submit**. Async / pipelined decode is deferred to a much later phase.

### D9 - Hantro probe: by DT compatible name + topology

`panvk_v4l2_probe_hantro()` enumerates `/dev/video*` via `udev`, queries each with `VIDIOC_QUERYCAP`, and accepts devices whose `card` field starts with `"hantro-vpu"` or matches the RK3568/RK3566/RK3588 hantro DT compatibles. Falls back to a hard-coded `/dev/video1` if udev is unavailable. Mirrors `libva-v4l2-request-fourier/src/request.c:143-308` `find_decoder_video_node_via_topology`.

Negative probe outcome (no hantro device) → physical_device's video extension advertisement returns false, the queue family entry is suppressed, and `vkEnumerateDeviceExtensionProperties` does not list the three `KHR_video_*` extensions. Driver gracefully degrades to graphics-only.

### D10 - Errors: broad first, refine Phase 6

- V4L2 EINVAL / EAGAIN / EBUSY at submit → `VK_ERROR_DEVICE_LOST` (broad)
- Probe failure during CreateVideoSession → `VK_ERROR_INITIALIZATION_FAILED`
- DPB slot conflict → `VK_ERROR_OUT_OF_DEVICE_MEMORY` (closest spec match)
- Refine per-error-class mapping in Phase 6 (conformance hardening).

## Out of scope for this iteration (explicit non-goals)

1. **H.265 / HEVC**: Phase 0 lock, H.264 only.
2. **Encode**: out of scope until a separate campaign.
3. **Async decode** / pipelined submit: synchronous-to-submit only in Phase 1.
4. **Multi-session concurrent decode**: single session only in Phase 1 (per Phase 0 Q5).
5. **`VK_KHR_video_maintenance1`** (inline parameters, inline queries): not in the simple-test requirements.
6. **Multiplane 444 formats** (`VK_EXT_ycbcr_2plane_444_formats`): optional, not in Phase 1.
7. **`VK_EXT_descriptor_buffer`** integration: optional, not in Phase 1.
8. **Decode correctness verification** (frame-PSNR vs reference): Phase 7 territory.
9. **Brave consumer**: structurally unfixable, see brave-vaapi-fourier close + DokuWiki.

## Failure modes to watch for during Phase 4 (instrumentation plan)

| Failure | Detection |
|---|---|
| hantro device not present on a build target | `panvk_v4l2_probe_hantro` returns false → extension list silently shrinks. Test: `vulkaninfo \| grep VK_KHR_video` empty on a non-hantro box |
| `/dev/video1` held by libva → CreateVideoSession EBUSY | `mesa_loge()` at probe + return `VK_ERROR_INITIALIZATION_FAILED`. Test: run mpv-fourier in parallel, verify clean error message |
| S_EXT_CTRLS EINVAL on a per-control basis | per-control `failing_ctrl_id` is in libva-v4l2-request-fourier `src/v4l2.c:497-502` (the format we don't have on the iter14 path). Reproduce that diagnostic in our `panvk_v4l2_submit_op` |
| H.264 spec field mismatch between `Std*` and `v4l2_ctrl_*` | Add a per-field assertion in the std→v4l2 bridge for the fields whose representation differs (`bit_depth_luma_minus8` is u8 on both sides, but some flags pack differently). Test: assert at translation time |
| DPB slot reuse with stale reference_ts | `session->dpb[].valid` cleared at DestroyVideoSession + at `vkCmdControlVideoCodingKHR` with `VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR`. Test: send a `RESET` flag mid-stream and check dpb[] is cleared |
| Driver-side decode hang (bad bitstream) | poll(timeout=200ms) is the gate. Test: feed a truncated bitstream, verify clean `VK_ERROR_DEVICE_LOST` rather than session hang |

## Phase 4 implementation slice: first three commits

Bite-sized, validated incrementally:

1. **Commit 1**: extension advertisement + queue family registration (no functionality, just enumeration). Validation: `vulkan-video-dec-simple-test` gets past the `HasAllDeviceExtensions` check and into device creation. Failure mode: extension list still missing.
2. **Commit 2**: `CreateVideoSessionKHR` + `DestroyVideoSessionKHR` + capability/format entrypoints (returns sane caps, no V4L2 yet; fds opened as `/dev/null` placeholders if necessary). Validation: simple-test creates a session, gets memory requirements (0 entries), destroys it cleanly. Failure mode: session create returns ERROR.
3. **Commit 3**: `panvk_v4l2_probe_hantro` + real video_fd open + per-session V4L2 init (S_FMT, REQBUFS, request fd pool). Validation: simple-test creates a session against real `/dev/video1`. Failure mode: probe fails or EBUSY.

After commit 3, all the session-level plumbing is wired. Commits 4–6 add the per-frame decode plumbing (vkCmdDecodeVideoKHR record + submit dispatch + the ioctl dance). Commit 7 is the Std→v4l2 control bridge.

## Phase 2 close criteria

- [x] All D1–D10 decisions locked
- [x] Non-goals explicit
- [x] Failure-modes table with detection methods
- [x] Phase 4 first-three-commits slice defined
- [x] Constraints re-verified on ohm (substrate side)

Phase 3 next: build a probe test client (smaller than vk-video-samples) that exercises just the extension-advertisement + queue-family-enumeration path. This is the regression test Phase 4 commits 1-2 are validated against, before bringing in the heavier vk-video-samples machinery.

-- claude-noether, 2026-05-21