panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18)
  (libmali stub blobs at iter18/blob/ excluded; 109MB of RE
  artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 2 — design lock for panvk-bifrost-video
Phase 1 source-map (phase1_source_map.md) acquired the architecture. This document locks the implementation-level decisions that bind Phase 4. Where Phase 1 listed options, this picks one.
Re-anchored constraints (re-verified 2026-05-21)
- ohm reachable, kernel `linux-fresnel-fourier` with `dma_resv` patches
- `/dev/video1` (hantro decoder) + `/dev/media0` (media controller) present
- libva-v4l2-request-fourier installed and exercising the same V4L2 path, which proves the protocol works (1.56× / 1.73× realtime). Coexistence policy: env-mutex (Phase 0 Q1 lock A). Only one client holds `/dev/video1` at a time; user picks via `LIBVA_DRIVER_NAME=null` or service-level coordination.
- `mesa-panvk-bifrost` r4 source on ohm at `/home/mfritsche/mesa-build/mesa-26.0.6/`. Reuses the same r1–r4 patch lineage in PKGBUILD; new package `mesa-panvk-bifrost-video` is a sibling (see Phase 0 campaign-close-via-pkgbuild).
- Vulkan headers: 26.0.6's bundled `vk.xml` has H.264 decode v9 stable. No `--beta` flag needed.
- Test bitstream: `/home/mfritsche/fourier-test/bbb_1080p30_h264.mp4` (725 MB H.264 Main, 1080p30), proven decoding via libva path 2026-05-21.
- vk-video-samples builds on aarch64 (Phase 0). simple-test binary at `~/panvk-bifrost-video-evidence/Vulkan-Video-Samples/build/vk_video_decoder/test/vulkan-video-dec-simple-test`.
Locked decisions
D1 — V4L2 device ownership: per-VkVideoSessionKHR, not per-VkDevice
Each call to vkCreateVideoSessionKHR opens its own video_fd to /dev/video1 and media_fd to /dev/media0. The PhysicalDevice only holds discovery state (paths + caps flags). Per Phase 1 §J reasoning: kernel V4L2 state is per-fd, multiple sessions need separate fds anyway, idle-when-no-session is good citizenship.
Trade-off rejected: per-device shared fd. Would force a session-arbitration daemon inside panvk. Not worth it for Phase 1; not needed for the simple-test workload (single session).
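A minimal sketch of the per-session open/close pairing D1 describes. The struct and function names (`session_fds`, `session_fds_open`, `session_fds_close`) are hypothetical stand-ins, not the panvk code; the node paths are parameters so the discovery state can supply them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the fd part of panvk_video_session. */
struct session_fds {
   int video_fd; /* decoder node, e.g. /dev/video1 */
   int media_fd; /* media controller, e.g. /dev/media0 */
};

/* Open both nodes for one session. Returns 0 or a negative errno.
 * On failure no fd is left open, so the caller has nothing to unwind. */
static int
session_fds_open(struct session_fds *s, const char *video_path,
                 const char *media_path)
{
   assert(s && video_path && media_path);
   s->video_fd = s->media_fd = -1;

   s->video_fd = open(video_path, O_RDWR | O_CLOEXEC);
   if (s->video_fd < 0)
      return -errno;

   s->media_fd = open(media_path, O_RDWR | O_CLOEXEC);
   if (s->media_fd < 0) {
      int err = -errno; /* save before close() can clobber errno */
      close(s->video_fd);
      s->video_fd = -1;
      return err;
   }
   return 0;
}

/* Close both fds; safe to call on a partially initialised struct. */
static void
session_fds_close(struct session_fds *s)
{
   if (s->video_fd >= 0)
      close(s->video_fd);
   if (s->media_fd >= 0)
      close(s->media_fd);
   s->video_fd = s->media_fd = -1;
}
```

Two sessions calling `session_fds_open()` get independent fds, which is the property D1 relies on; nothing is held while no session exists.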
D2 — File layout (committed)
New files in src/panfrost/vulkan/:
| File | Purpose | Est. LoC |
|---|---|---|
| `panvk_video_decode.c` | VkVideoSession* + VkCmd*VideoCoding entrypoints; record video_decode_ops dynarray | ~400 |
| `panvk_video_decode.h` | structs: panvk_video_session, panvk_video_decode_op, panvk_video_decode_queue | ~80 |
| `panvk_v4l2.c` | V4L2 probe + per-session init + Std*→v4l2_ctrl_h264_* bridge + submit_op() | ~500 |
| `panvk_vX_video_decode_queue.c` | per-arch queue create/destroy/submit (walks ops, calls panvk_v4l2_submit_op) | ~150 |
Modified files (locations from Phase 1 §I.1):
- `panvk_vX_physical_device.c` (extension list + capability/format entrypoints)
- `panvk_physical_device.c` (queue family list + video properties pNext walk)
- `panvk_device.h` (queue family enum)
- `panvk_vX_device.c` (queue create/destroy/submit dispatch, 4 cases)
- `meson.build` (register new sources)
D3 — Per-session state struct (locked layout)

    struct panvk_video_session {
       struct vk_video_session vk;        /* spec-mandated fields */

       /* V4L2 fds: opened in CreateVideoSession, closed in Destroy */
       int video_fd;                      /* /dev/video1 */
       int media_fd;                      /* /dev/media0 */

       /* Negotiated formats per OUTPUT / CAPTURE queue */
       struct v4l2_format fmt_output;
       struct v4l2_format fmt_capture;

       /* Request fd pool. Max-in-flight = max_dpb_slots + 2 */
       int *request_fds;
       unsigned num_request_fds;
       uint32_t request_fd_next;          /* round-robin index */

       /* DPB slotIndex → V4L2 reference_ts mapping */
       struct {
          bool valid;
          uint64_t reference_ts;          /* V4L2 timestamp at QBUF time */
          /* No image-view pointer here: image references via slotIndex
           * only; resolution at record time via vk.params lookup. */
       } dpb[16];

       /* DECODE_PARAMS/SLICE_PARAMS submit mode (locked FRAME_BASED for Phase 1) */
       bool slice_based;                  /* Phase 1: false */
    };

DPB mirroring is identical to libva-v4l2-request-fourier/src/h264.c:140-218 dpb_insert / dpb_update. Reuse the algorithm; don't link the lib — copy ~80 LoC verbatim into panvk_v4l2.c.
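For orientation, a minimal sketch of the slot-to-timestamp bookkeeping that the `dpb[16]` array implies. This is not the libva-v4l2-request-fourier code (which is not reproduced here); the names `sketch_dpb`, `sketch_dpb_set`, `sketch_dpb_lookup` and `sketch_dpb_reset` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DPB_SLOTS 16

/* Mirrors the anonymous dpb[] entry in panvk_video_session. */
struct sketch_dpb_entry {
   bool valid;
   uint64_t reference_ts; /* V4L2 timestamp used at QBUF time */
};

struct sketch_dpb {
   struct sketch_dpb_entry slot[SKETCH_DPB_SLOTS];
};

/* Invalidate every slot (session destroy, coding-control reset). */
static void
sketch_dpb_reset(struct sketch_dpb *dpb)
{
   memset(dpb, 0, sizeof(*dpb));
}

/* Record that `slot` now holds the picture queued with timestamp `ts`. */
static void
sketch_dpb_set(struct sketch_dpb *dpb, uint32_t slot, uint64_t ts)
{
   assert(slot < SKETCH_DPB_SLOTS);
   dpb->slot[slot].valid = true;
   dpb->slot[slot].reference_ts = ts;
}

/* Resolve a reference slot to its timestamp. Returns false for an
 * out-of-range or never-written slot so stale references are caught. */
static bool
sketch_dpb_lookup(const struct sketch_dpb *dpb, uint32_t slot,
                  uint64_t *ts_out)
{
   if (slot >= SKETCH_DPB_SLOTS || !dpb->slot[slot].valid)
      return false;
   *ts_out = dpb->slot[slot].reference_ts;
   return true;
}
```

The lookup is what a DECODE_PARAMS fill would call per reference slot; a `false` return is the stale-slot case the failure-modes table asks to instrument.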
D4 — Per-cmdbuf decode-op entry (locked layout)

    struct panvk_video_decode_op {
       /* Captured at vkCmdDecodeVideoKHR record time */
       uint32_t dst_dpb_slot;                 /* output slot */
       struct panvk_image_view *dst_iv;       /* output VkImageView */

       uint32_t num_ref_slots;
       struct {
          uint32_t slot_index;
          struct panvk_image_view *iv;        /* reference VkImageView */
       } ref_slots[16];

       /* Bitstream buffer */
       struct panvk_buffer *src_buffer;
       uint64_t src_offset;
       uint64_t src_size;

       /* Cached params at record time (so submit can run after Parameters object updates) */
       const StdVideoH264SequenceParameterSet *sps;   /* from vk.params */
       const StdVideoH264PictureParameterSet *pps;
       VkVideoDecodeH264PictureInfoKHR pic_info;      /* the per-frame info */

       /* Filled at submit time */
       int request_fd;                        /* allocated from session pool */
       uint64_t qbuf_ts;                      /* timestamp used for dpb tracking */
    };

Recorded as a util_dynarray on the command buffer. vkResetCommandBuffer clears it.
D5 — Bitstream input: VkBuffer dmabuf import (one-shot)
At record time, the VkBuffer (with VIDEO_DECODE_SRC_BIT_KHR usage) carries a panvk_priv_bo with an exportable dmabuf. At submit time, op-submit does:

    fd = pan_kmod_bo_export_dma_buf(src_buffer->bo)
    VIDIOC_QBUF(video_fd, V4L2_BUF_TYPE_VIDEO_OUTPUT,
                memory=V4L2_MEMORY_DMABUF, m.fd=fd, bytesused=op->src_size,
                request_fd=op->request_fd)

Source-side buffers are not pinned to V4L2 OUTPUT slots — each decode gets a fresh QBUF using the dmabuf fd. After DQBUF the slot is implicitly released.
D6 — Output frames: VkImage permanent CAPTURE slot binding (Strategy B from §G.2)
At vkBindImageMemory time, if the VkImage's usage includes VIDEO_DECODE_DST_BIT_KHR, the image's underlying BO is exported as a dmabuf and registered as a permanent CAPTURE buffer slot via VIDIOC_QBUF(memory=DMABUF) at session init, then the slot index is stashed in:

    struct panvk_image {
       ...
       int v4l2_capture_index;   /* -1 if not a video output image */
    };

Rejected alternative: per-decode-call dmabuf import. Higher per-frame ioctl overhead. Strategy B amortizes the registration cost across the session lifetime.
D7 — Submit-time ioctl dance (the 14 steps, locked)

    panvk_per_arch(video_decode_queue_submit)(queue, submit):
      for each cmdbuf in submit:
        for each op in cmdbuf->video_decode_ops:
          panvk_v4l2_submit_op(session, op):
            1.  resolve request_fd: pool[round_robin++ % num] or MEDIA_IOC_REQUEST_ALLOC
            2.  ioctl(request_fd, MEDIA_REQUEST_IOC_REINIT)
            3.  fill v4l2_ctrl_h264_sps from op->sps via panvk_v4l2_h264_std_to_ctrl_sps()
            4.  fill v4l2_ctrl_h264_pps from op->pps via panvk_v4l2_h264_std_to_ctrl_pps()
            5.  fill v4l2_ctrl_h264_decode_params from op->pic_info + session->dpb[]
            6.  ext_controls = { SPS, PPS, DECODE_PARAMS, SCALING_MATRIX }
                (Phase 1: SLICE_PARAMS optional, FRAME_BASED → omit)
            7.  VIDIOC_S_EXT_CTRLS(video_fd, which=REQUEST_VAL, request_fd, ext_controls)
            8.  VIDIOC_QBUF(video_fd, OUTPUT, memory=DMABUF, request_fd, m.fd=src_dmabuf,
                            bytesused=op->src_size, timestamp=op->qbuf_ts)
            9.  VIDIOC_QBUF(video_fd, CAPTURE, memory=DMABUF, index=dst_iv->image->v4l2_capture_index)
            10. MEDIA_REQUEST_IOC_QUEUE(request_fd)
            11. poll(request_fd, POLLPRI, timeout_ms=200)
            12. VIDIOC_DQBUF(video_fd, OUTPUT)   /* release input slot */
            13. VIDIOC_DQBUF(video_fd, CAPTURE)  /* output ready */
            14. session->dpb[op->dst_dpb_slot] = { valid:true, reference_ts:op->qbuf_ts }
      vk_queue_signal_semaphores(submit->signal_semaphores)

Per Phase 1 §J. Step 11's 200ms timeout is an empirical choice based on observed libva-v4l2-request-fourier behavior. That library polls indefinitely; we cap the wait so that a driver-side hang on a bad bitstream surfaces as a clean Vulkan device-lost error instead of blocking the submit forever.
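The bounded wait in step 11 can be sketched as follows. The helper name `wait_request_done` and the specific return codes are assumptions; `poll()` and `POLLPRI` are standard, and request completion is reported on the request fd as an exceptional condition.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Wait for a media request to complete.
 * Returns 0 when POLLPRI is reported, -ETIMEDOUT when the timeout
 * expires, -EIO when poll() reports only an error condition on the fd,
 * or another negative errno if poll() itself fails. */
static int
wait_request_done(int request_fd, int timeout_ms)
{
   struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
   int ret;

   assert(timeout_ms >= 0);

   /* Retry if a signal interrupts the wait. Simplification: the full
    * timeout restarts on each retry rather than being decremented. */
   do {
      ret = poll(&pfd, 1, timeout_ms);
   } while (ret < 0 && errno == EINTR);

   if (ret < 0)
      return -errno;
   if (ret == 0)
      return -ETIMEDOUT;
   if (pfd.revents & POLLPRI)
      return 0;
   return -EIO; /* POLLERR / POLLHUP / POLLNVAL without completion */
}
```

A caller in the submit path would treat any nonzero return as the broad device-lost case from D10 and skip steps 12 to 14 for that op.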
D8 — Synchronization: standard vk_queue infrastructure
panvk_per_arch(create_video_decode_queue) initializes a struct vk_queue with driver_submit = panvk_per_arch(video_decode_queue_submit). Wait/signal semaphores are handled by the standard vk_queue_submit infrastructure. Inside submit, the poll(request_fd) in step 11 is the synchronous gate — when it returns, the decode is done in V4L2 land, and the signal semaphores are signaled before returning.
For Phase 1, all video decodes are synchronous to submit. Async / pipelined decode is Phase >>1.
D9 — Hantro probe: by DT compatible name + topology
panvk_v4l2_probe_hantro() enumerates /dev/video* via udev, queries each node with VIDIOC_QUERYCAP, and accepts devices whose card field starts with "hantro-vpu" OR matches the RK3568/RK3566/RK3588 hantro DT compatibles. Falls back to a hard-coded /dev/video1 if udev is unavailable. Mirrors libva-v4l2-request-fourier/src/request.c:143-308 find_decoder_video_node_via_topology.
Negative probe outcome (no hantro device) → physical_device's video extension advertisement returns false, queue family entry is suppressed, vkEnumerateDeviceExtensionProperties does not list the three KHR_video_*. Driver gracefully degrades to graphics-only.
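The card-name half of that probe is a bounded prefix match. A minimal sketch, assuming the 32-byte `card` field layout of `struct v4l2_capability`; the helper name `card_is_hantro` is hypothetical, and the DT-compatible and topology checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if a VIDIOC_QUERYCAP card string starts with "hantro-vpu".
 * The array is 32 bytes like v4l2_capability.card; strncmp never reads
 * past the prefix length, so a missing NUL terminator is harmless. */
static bool
card_is_hantro(const unsigned char card[32])
{
   static const char prefix[] = "hantro-vpu";

   assert(card);
   return strncmp((const char *)card, prefix, sizeof(prefix) - 1) == 0;
}
```

When no enumerated node passes this check (and the compatible match also fails), the probe reports false and the extension list shrinks as described above.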
D10 — Errors: broad first, refine Phase 6
- V4L2 EINVAL / EAGAIN / EBUSY at submit → `VK_ERROR_DEVICE_LOST` (broad)
- Probe failure during CreateVideoSession → `VK_ERROR_INITIALIZATION_FAILED`
- DPB slot conflict → `VK_ERROR_OUT_OF_DEVICE_MEMORY` (closest spec match)
- Refine per-error-class mapping in Phase 6 (conformance hardening).
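The broad mapping fits in one switch. A sketch, using local stand-in enums so it compiles without Vulkan headers; the `SKETCH_*` names and the stage enum are hypothetical, while the numeric values are those of the corresponding `VkResult` codes.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for VkResult; values match vulkan_core.h. */
enum sketch_result {
   SKETCH_SUCCESS = 0,
   SKETCH_ERROR_OUT_OF_DEVICE_MEMORY = -2,
   SKETCH_ERROR_INITIALIZATION_FAILED = -3,
   SKETCH_ERROR_DEVICE_LOST = -4,
};

/* Where the failure was observed (hypothetical classification). */
enum sketch_stage {
   SKETCH_STAGE_CREATE_SESSION, /* probe / open / S_FMT / REQBUFS */
   SKETCH_STAGE_DPB_UPDATE,     /* DPB slot conflict */
   SKETCH_STAGE_SUBMIT,         /* any ioctl in the submit sequence */
};

/* Phase 1 mapping: the stage decides the result, the errno value is
 * only checked for zero. Per-errno refinement is deferred to Phase 6. */
static enum sketch_result
sketch_map_error(enum sketch_stage stage, int err)
{
   assert(err >= 0); /* expects a positive errno value or 0 */
   if (err == 0)
      return SKETCH_SUCCESS;

   switch (stage) {
   case SKETCH_STAGE_CREATE_SESSION:
      return SKETCH_ERROR_INITIALIZATION_FAILED;
   case SKETCH_STAGE_DPB_UPDATE:
      return SKETCH_ERROR_OUT_OF_DEVICE_MEMORY;
   case SKETCH_STAGE_SUBMIT:
   default:
      return SKETCH_ERROR_DEVICE_LOST;
   }
}
```

Centralising the mapping in one function gives Phase 6 a single place to refine once per-errno behavior is known.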
Out of scope for this iteration (explicit non-goals)
- H.265 / HEVC: Phase 0 lock — H.264 only.
- Encode: out of scope, ever (until a separate campaign).
- Async decode / pipelined submit: synchronous-to-submit only in Phase 1.
- Multi-session concurrent decode: single session only in Phase 1 (per Phase 0 Q5).
- `VkVideoMaintenance1` (inline parameters, inline queries): not in the simple-test requirements.
- Multiplane 444 formats (`VK_EXT_ycbcr_2plane_444_formats`): optional, not in Phase 1.
- `VK_EXT_descriptor_buffer` integration: optional, not in Phase 1.
- Decode correctness verification (frame-PSNR vs reference): Phase 7 territory.
- Brave consumer: structurally unfixable, see brave-vaapi-fourier close + DokuWiki.
Failure modes to watch for during Phase 4 (instrumentation plan)
| Failure | Detection |
|---|---|
| hantro device not present on a build target | panvk_v4l2_probe_hantro returns false → extension list silently shrinks. Test: `vulkaninfo \| grep VK_KHR_video` empty on a non-hantro box |
| `/dev/video1` held by libva → CreateVideoSession EBUSY | mesa_loge() at probe + return VK_ERROR_INITIALIZATION_FAILED. Test: run mpv-fourier in parallel, verify clean error message |
| S_EXT_CTRLS EINVAL on a per-control basis | per-control failing_ctrl_id is in libva-v4l2-request-fourier src/v4l2.c:497-502 (the format we don't have on the iter14 path). Reproduce that diagnostic in our panvk_v4l2_submit_op |
| H.264 spec field mismatch between Std* and v4l2_ctrl_* | Add a per-field assertion in the std→v4l2 bridge for fields whose bit width or packing differs. Plain scalars such as bit_depth_luma_minus8 are u8 on both sides and copy directly, but some flags pack differently. Test: assert at translation time |
| DPB slot reuse with stale reference_ts | session->dpb[].valid cleared at DestroyVideoSession + at ResetVideoCodingControl. Test: send a RESET flag mid-stream and check dpb[] is cleared |
| Driver-side decode hang (bad bitstream) | poll(timeout=200ms) is the gate. Test: feed a truncated bitstream, verify clean VK_ERROR_DEVICE_LOST rather than session hang |
Phase 4 implementation slice — first three commits
Bite-sized, validated incrementally:
- Commit 1: extension advertisement + queue family registration (no functionality, just enumeration). Validation: `vulkan-video-dec-simple-test` gets past the `HasAllDeviceExtensions` check and into device creation. Failure mode: extension list still missing.
- Commit 2: `CreateVideoSessionKHR` + `DestroyVideoSessionKHR` + capability/format entrypoints (returns sane caps, no V4L2 yet; fds opened as `/dev/null` placeholders if necessary). Validation: simple-test creates a session, gets memory requirements (0 entries), destroys it cleanly. Failure mode: session create returns ERROR.
- Commit 3: `panvk_v4l2_probe_hantro` + real video_fd open + per-session V4L2 init (S_FMT, REQBUFS, request fd pool). Validation: simple-test creates a session against real `/dev/video1`. Failure mode: probe fails or EBUSY.
After commit 3, all the plumbing is wired. Commits 4-6 add the per-frame decode plumbing (vkCmdDecodeVideoKHR record + submit dispatch + the ioctl dance). Commit 7 is the Std→v4l2 control bridge.
Phase 2 close criteria
- All D1–D10 decisions locked
- Non-goals explicit
- Failure-modes table with detection methods
- Phase 4 first-three-commits slice defined
- Constraints re-verified on ohm (substrate side)
Phase 3 next: build a probe test client (smaller than vk-video-samples) that exercises just the extension-advertisement + queue-family-enumeration path. This is the regression test Phase 4 commits 1-2 are validated against, before bringing in the heavier vk-video-samples machinery.
— claude-noether, 2026-05-21