
Phase 0 — substrate / motivation / inventory for panvk-bifrost-video

Research question (one sentence)

Can mesa-panvk-bifrost-video expose VK_KHR_video_decode_h264 (plus its supporting extensions VK_KHR_video_queue, VK_KHR_video_decode_queue, VK_KHR_video_maintenance1) backed by the RK3566 hantro V4L2 stateless VPU, such that Khronos vk-video-samples decodes a 1080p H.264 BBB clip on ohm end-to-end with hantro engagement provable via fuser /dev/video1?

Operator-supplied mechanism (load-bearing claim — verbatim from session)

"brave is closed source and walled off from v4l2-request (checks for CHROME_OS at build time) and walled off from vaapi (expects a Vulkan output device I think). This is the exact reason I want the Vulkan driver - so brave does not just use vulkan to draw buttons, but to actively use the features to offload, create buffers that kwin can understand, yadda yadda younameit."

The structural insight: the unmodifiable consumer (Brave) speaks Vulkan natively as its compositor + GPU process buffer broker. If Vulkan grows a decode capability, Brave's existing dispatch hits it without changes. The bridge to actual decoder hardware (V4L2-stateless hantro) lives on the driver side of the boundary.

The structural claim has three parts that the campaign relies on:

  1. H.264 spec parameters map across protocols. The Vulkan side carries them as SPS/PPS in VkVideoSessionParametersKHR plus per-picture state in VkVideoDecodeH264PictureInfoKHR / VkVideoDecodeH264DpbSlotInfoKHR. The V4L2 side carries them in the stateless H.264 controls (V4L2_CID_STATELESS_H264_SPS|PPS|DECODE_PARAMS|SLICE_PARAMS|PRED_WEIGHTS|SCALING_MATRIX). Both hold the same underlying H.264 spec fields. The mapping is a tedious but mechanical translation, not a semantic gap.

  2. Buffers can move across protocols zero-copy. Both Vulkan (VkBuffer / VkImage with VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT) and V4L2 (V4L2_MEMORY_DMABUF) speak dmabuf. The compressed bitstream buffer (Vulkan side) → V4L2 OUTPUT queue, and the V4L2 CAPTURE queue → decoded NV12 VkImage, can both route through dmabuf fd handoffs.

  3. No GPU-side computation is required for the actual decode. The hantro is autonomous; once parameters and buffers are queued via V4L2 ioctls, the VPU executes asynchronously. panvk's role is protocol translation, not GPU shader execution.

Predecessor carry-over (panvk-bifrost campaign close)

State carried forward:

  • mesa-panvk-bifrost r4 installed on ohm: /usr/lib/panvk-bifrost/libvulkan_panfrost.so (md5 7810235db2a8379323acf8d2d521be9a)
  • ICD JSON at /usr/lib/panvk-bifrost/icd.json
  • VK_ICD_FILENAMES opt-in pattern (via brave-vulkan launcher or direct env)
  • PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1 requirement (still in force — panvk's self-non-conformance gate)
  • libva-v4l2-request-fourier installed on ohm (proves V4L2 H.264 decode path on hantro)
  • Source pointers: NIR pass + sysval pattern in ~/src/panvk-bifrost/iter17/applied_state/panvk_vX_xfb_lower.c; PKGBUILD shape in ~/src/marfrit-packages/arch/mesa-panvk-bifrost/

Data NOT carried forward (reference history only):

  • iter15's 75.7% CTS pass — wrong metric for this campaign.
  • iter17's 91.7% post-XFB-decomp — wrong metric.
  • libva-v4l2-request-fourier 1.16× realtime — wrong protocol layer.

This campaign measures VK_KHR_video_decode engagement, decode throughput, and frame correctness in its own metric space. Phase 1 states the hypothesis; Phase 3 measures it fresh.

Tooling and measurement-instrument inventory

What's installed on ohm right now (live verification, not paper)

  • mesa-panvk-bifrost r4-1 — Vulkan ICD substrate
  • vulkan-headers (presumably — to be live-checked)
  • libva-v4l2-request-fourier — currently holding /dev/video1 while running. Coexistence policy needed.
  • ffmpeg-v4l2-request-fourier — uses libva path, same device contention
  • mpv-fourier, kwin-fourier, qt6-base-fourier — display stack
  • Kernel: linux-fresnel-fourier — provides hantro v4l2 stateless driver and the dma_resv patches

What needs verification (Phase 0 open items)

  • Does vulkaninfo on ohm enumerate ANY video queue family today? Likely no, but baseline the no.
  • Are the Vulkan loader and headers on ohm new enough for the VK_KHR_video_* extension surface? The final (non-provisional) KHR video decode extensions shipped with Vulkan headers 1.3.238; VK_KHR_video_maintenance1 needs a later header.
  • Are vk-video-samples buildable on aarch64 today? Repos: KhronosGroup/Vulkan-Samples and nvpro-samples/vk_video_samples. Check build deps and cmake config.
  • Does Mesa ship src/vulkan/runtime/vk_video.c helpers in 26.0.6, and are they usable from a video-queue-bearing driver?
  • What's the device-ownership policy between libva-v4l2-request-fourier (currently using /dev/video1) and panvk-bifrost-video if both want decode access? Working assumption: V4L2 m2m allows only one process at a time (to be verified).

Reference implementations to read (not copy)

  • Mesa NVK: src/nouveau/vulkan/nvk_video.c and surrounding. Most recent Mesa VK_KHR_video implementation. Uses NVIDIA's NVDEC via class methods. Read for: extension advertisement shape, queue family registration, session/command lifecycle, DPB management state machine.
  • Mesa Anv: src/intel/vulkan/anv_video.c. Intel's video engine (VDBox). Mature. Read for: parameter object handling, multi-decoder DPB tracking.
  • Mesa RADV: src/amd/vulkan/radv_video.c. AMD UVD/VCN. Read for: a third reference point on the abstractions Mesa's vk_video.c runtime helper expects from a driver.

Crucial: do NOT copy these. Each driver dispatches into the GPU's video engine via a tightly bound submit path. Our submit path is ioctl(VIDIOC_QBUF) to /dev/video1, a fundamentally different shape. Read the high-level structure (extension surface, queue family bring-up, session object lifecycle), then implement against the V4L2 backend ourselves.

Reference for the V4L2 side (proven-working)

  • libva-v4l2-request-fourier on github → marfrit fork on packages.reauktion.de. Specifically:
    • H.264 frame-based path (single CTRL_REQ, full frame in one slice)
    • DECODE_PARAMS / SPS / PPS / SLICE_PARAMS / PRED_WEIGHTS / SCALING_MATRIX control marshalling
    • dmabuf import/export for CAPTURE queue
  • Kernel v4l2-request docs: Documentation/userspace-api/media/v4l/ext-ctrls-codec-stateless.rst, the authoritative H.264 control reference.
  • hantro_h264.c in the kernel: read assemble_scaling_list and the reference_picture_list builder. They show the actual per-decode hardware operations and give a sense of what V4L2 will accept.
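On the CAPTURE side, panvk and V4L2 have to agree on the decoded frame's layout, whichever of them allocates it. That means per-plane offset and rowPitch, as in VkSubresourceLayout. Here is a minimal sketch, assuming:

- raster NV12;
- pitch equal to the macroblock-aligned width;
- chroma placed directly after luma.

The helper is hypothetical. The authoritative values are bytesperline and sizeimage from VIDIOC_G_FMT on the CAPTURE queue. Hantro may report a larger sizeimage because, as we understand the driver, it reserves space after the picture for H.264 motion-vector data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raster NV12 layout of one decoded frame inside a
 * single CAPTURE dmabuf, expressed as per-plane offset / pitch / size. */
struct nv12_layout {
    uint32_t coded_width, coded_height; /* rounded up to 16x16 macroblocks */
    uint32_t y_offset, y_pitch, y_size;
    uint32_t uv_offset, uv_pitch, uv_size;
    uint32_t min_bytes;                 /* lower bound on the dmabuf size */
};

static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) / a * a;
}

struct nv12_layout nv12_layout_for(uint32_t width, uint32_t height)
{
    struct nv12_layout l = {0};
    assert(width > 0 && height > 0);
    l.coded_width = align_up(width, 16);
    l.coded_height = align_up(height, 16);

    l.y_offset = 0;
    l.y_pitch = l.coded_width;               /* 1 byte per luma sample */
    l.y_size = l.y_pitch * l.coded_height;

    l.uv_offset = l.y_offset + l.y_size;     /* chroma follows luma */
    l.uv_pitch = l.coded_width;              /* interleaved Cb/Cr, half width */
    l.uv_size = l.uv_pitch * (l.coded_height / 2);

    l.min_bytes = l.uv_offset + l.uv_size;
    return l;
}
```

For 1920x1080, this gives:

- a 1920x1088 coded frame;
- chroma at byte offset 2088960;
- at least 3133440 bytes in total.

A mismatch between these numbers and what VIDIOC_G_FMT reports is the first thing to check when an imported frame looks sheared or has a green band.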

In-session baseline anchor (per Phase 0 dev_process rule)

Predecessor's reference floors that must replicate at N=3 before binding cells anchor to them:

  1. mesa-panvk-bifrost r4 enumerates a Vulkan device and probe_winding passes 3/3 topologies. → Verified earlier this session at 14:30 UTC with packaged r4-1; sufficient as session anchor.
  2. libva-v4l2-request-fourier decodes BBB H.264 via hantro. → Verified 2026-05-21 on the same BBB file used in this session's brave instrumentation run:
     • ffmpeg -hwaccel vaapi + libva = 1.56× realtime.
     • ffmpeg -hwaccel v4l2request (direct, bypassing libva) = 1.73× realtime.
     Both paths are green at N=1 each. The N=3 anchor is still pending, but the single-rep result reproduces the iter14 measurement at the same order of magnitude, so it is likely stable.
  3. vulkaninfo reports advertised extensions and queue families. → Measured 2026-05-21 with VK_ICD_FILENAMES=/usr/lib/panvk-bifrost/icd.json PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1: Vulkan 1.4.350 loader; 19 instance extensions; zero VK_KHR_video_* extensions; single queue family, queueCount=1; no VK_QUEUE_VIDEO_DECODE_BIT anywhere. Clean baseline — campaign deliverable is 0→1 queue family + extensions on panvk-bifrost.

If (2) or (3) fail to anchor, loop back: investigate the rig before moving to Phase 1.

Open questions for Phase 1 to resolve

These are known unknowns — they don't block Phase 0 close, but Phase 1's metric choice depends on the answer.

Q1 — Device ownership: how do libva and panvk-bifrost-video coexist?

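/dev/video1 (hantro m2m) is assumed to accept one process at a time. That assumption needs a live check. The kernel's v4l2-mem2mem framework normally gives each open() its own context and serializes jobs across contexts, so contention may show up as lost throughput rather than as a refused open. Options: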

  • A. Mutually exclusive use: only one runtime holds the device at a time; user picks via env var (LIBVA_DRIVER_NAME=null → Vulkan path, etc.).
  • B. Shared-device daemon: a small userspace daemon owns /dev/video1 and arbitrates V4L2 requests from multiple clients via a custom IPC protocol. Complex. Not for Phase 1.
  • C. Drop libva entirely for the consumers we care about: brave uses Vulkan; firefox-fourier already uses V4L2-direct, not libva; mpv-fourier uses ffmpeg-v4l2-request-fourier. If libva-v4l2-request isn't the path for any consumer in scope, drop it from the running set for video tasks.

Recommendation for Phase 1: lock A. Document the env-toggle. Defer B to later iteration if real workloads need it.
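On the driver side, option A could look like the following sketch. The video queue family is only advertised when the user opts in, which mirrors the existing PAN_I_WANT_A_BROKEN_VULKAN_DRIVER gate. The variable name PANVK_BIFROST_VIDEO and both functions are hypothetical, not existing Mesa options.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opt-in variable; illustrative name only. Default is off, so
 * nothing changes for existing users unless they ask for the Vulkan path. */
#define PANVK_VIDEO_ENV "PANVK_BIFROST_VIDEO"

bool panvk_video_opted_in(void)
{
    const char *v = getenv(PANVK_VIDEO_ENV);
    return v != NULL && (strcmp(v, "1") == 0 || strcmp(v, "true") == 0);
}

/* Queue families the driver would report: the existing universal family,
 * plus one decode-only family (VK_QUEUE_VIDEO_DECODE_BIT_KHR) when opted in. */
unsigned panvk_queue_family_count(void)
{
    return panvk_video_opted_in() ? 2u : 1u;
}
```

Keeping the gate at queue-family level means vulkaninfo shows the exact same baseline as today when the variable is unset. That keeps the Phase 0 "no video queue" measurement reproducible.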

Q2 — Does Brave even probe VK_KHR_video_decode_h264 today? — ANSWERED 2026-05-21

No, and won't engage even if we offer it. strings /opt/brave-bin/* returns zero hits for VK_KHR_video / VulkanVideoDecoder. Chromium's VulkanVideoDecoder is a Khronos design draft (Dec 2025, 13-week implementation plan, not merged) — see Vulkan Video Integration into Chromium. Beyond probing, brave-bin on PineTab2 is structurally unable to engage HW video decode at all due to the chromeos-pipeline ImageProcessor wall — see fourier:brave_arm64_vaapi_wall on DokuWiki or ~/src/brave-vaapi-fourier/DEFINITIVE_FINDING.md (measured 2026-05-21).

Implication for this campaign: Brave is NOT a Phase 1 consumer. The immediate consumer story:

  • mpv with --hwdec=vulkan: enumerated today on ohm for h264 / hevc / vp9 / av1 (mpv --hwdec=help confirms). Uses FFmpeg's Vulkan hwaccel, meaning the libavcodec Vulkan decode path on top of libavutil's hwcontext_vulkan.c. Once panvk-bifrost-video exposes VK_KHR_video_decode_h264, mpv-fourier becomes an immediate consumer.
  • ffmpeg with -hwaccel vulkan — first-class hwaccel method, confirmed in ffmpeg -hwaccels on ohm.
  • gstreamer 1.28.3 vulkan plugin — gst-plugins-bad ships vulkan{h264,h265,av1}dec (per-codec presence on this build TBD).
  • Future Brave: gets it free when chromium upstream lands VulkanVideoDecoder (months/year-ish).

Phase 1 milestone stays vk-video-samples as the test client (isolates driver work from consumer-side bugs). Phase 8 close-criteria will add "mpv-fourier --hwdec=vulkan decodes BBB H.264 on ohm with fuser showing /dev/video1 engagement" — the real-world consumer proof.

Q3 — Vulkan ↔ V4L2 DPB management mismatch

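The Vulkan API models the DPB (Decoded Picture Buffer) as a set of DPB slots tracked by the video session. Each slot is backed by a picture resource (a VkImage view) that the application allocates. The application tells the driver:

  • which picture is the decode output;
  • which slot the result is set up in as a new reference;
  • which slots are references for the current decode;
  • when a slot can be reused.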

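V4L2 stateless H.264 models the DPB as per-request data in V4L2_CID_STATELESS_H264_DECODE_PARAMS: the dpb[16] array of struct v4l2_h264_dpb_entry. Each entry identifies its reference frame by reference_ts, not by a buffer index. reference_ts is the timestamp of the CAPTURE buffer holding that frame; the driver copies it there from the OUTPUT buffer the frame was decoded from.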

The mapping is doable but not trivial. The Mesa NVK/Anv/RADV implementations have abstractions around this in src/vulkan/runtime/vk_video.c. Phase 0 close: read that file end to end, decide whether it's a usable harness for our V4L2 backend or whether we need a parallel set of helpers.

Q4 — Vulkan video queue family expectations vs panvk's job manager

panvk on Bifrost is JM-class (Job Manager). Job Manager has graphics + compute + fragment ringbuffers; it has no concept of a separate video ring. The Vulkan API expects a queue family with VK_QUEUE_VIDEO_DECODE_BIT_KHR and an associated VkQueue instance you submit decode commands to.

Our submit path won't actually go to the JM at all; it will go to V4L2. So the panvk video queue is "fake" from the GPU's perspective: it's a userspace queue that translates command-buffer-recorded video ops into V4L2 ioctls. This is fine architecturally, but the queue infrastructure still has to be wired up correctly (synchronization, timeline semaphores between graphics and video families). NVK probably has the cleanest reference for this, since NVDEC is also architecturally separate from the graphics scheduler on NVIDIA hardware.
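Here is a minimal sketch of that "fake" queue:

- A plain counter stands in for a timeline semaphore payload.
- A function pointer stands in for the V4L2 request round trip.

All names are hypothetical. A real implementation would block or defer on an unsatisfied wait (for example on a worker thread), and would likely go through Mesa's vk_queue / vk_sync runtime machinery rather than this struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical recorded video command: one vkCmdDecodeVideoKHR worth of state. */
struct decode_cmd {
    uint32_t frame_id;
};

/* Backend hook. In the driver this would be the V4L2 request round trip
 * (S_EXT_CTRLS, QBUF, MEDIA_REQUEST_IOC_QUEUE, wait, DQBUF); no JM job. */
typedef int (*decode_backend_fn)(void *ctx, const struct decode_cmd *cmd);

struct video_queue {
    decode_backend_fn decode;
    void *ctx;
};

/* Stand-in for a timeline semaphore's monotonically increasing payload. */
struct timeline {
    uint64_t value;
};

/* One VkQueueSubmit-equivalent on the userspace video queue: check the wait
 * (e.g. graphics is done with a DPB image), run the recorded decodes in
 * order, then signal. Returns 0; -1 if the wait is not yet satisfied (a real
 * queue would block or defer instead); -2 on backend failure, in which case
 * nothing is signalled. */
int video_queue_submit(struct video_queue *q,
                       const struct timeline *wait, uint64_t wait_value,
                       const struct decode_cmd *cmds, size_t ncmds,
                       struct timeline *signal, uint64_t signal_value)
{
    if (wait != NULL && wait->value < wait_value)
        return -1;
    for (size_t i = 0; i < ncmds; i++)
        if (q->decode(q->ctx, &cmds[i]) != 0)
            return -2;
    assert(signal_value > signal->value); /* timeline payloads only increase */
    signal->value = signal_value;
    return 0;
}

/* Test backend: counts calls instead of touching /dev/video1. */
int counting_backend(void *ctx, const struct decode_cmd *cmd)
{
    (void)cmd;
    ++*(unsigned *)ctx;
    return 0;
}
```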

Q5 — Hantro per-stream device contention vs concurrent decodes

VK_KHR_video allows multiple VkVideoSessionKHR instances per device. If two of them concurrently want to decode different streams, the hantro m2m driver serializes them via the V4L2 queueing model, but performance contention is a real issue. Phase 1's target is a single decode session; multi-session concurrency is a Phase >>1 problem.

Predecessor inheritance summary

| Inherited | Source | How used |
|---|---|---|
| Vulkan ICD substrate (libvulkan_panfrost.so r4) | panvk-bifrost campaign | The library we extend with video |
| PKGBUILD pattern for mesa-panvk-bifrost-* packages | marfrit-packages/arch/mesa-panvk-bifrost | Template for new mesa-panvk-bifrost-video |
| V4L2 stateless H.264 control marshalling | libva-v4l2-request-fourier | Reference; not linked into panvk |
| Kernel dma_resv patches | linux-fresnel-fourier | Buffer fence correctness on V4L2 producers |
| Build/CI on Gitea Actions aarch64 runner | marfrit-packages | Same pipeline, new package |
| Dev process 9(+1)-phase loop | feedback_dev_process.md | This campaign follows |

Phase 0 close criteria (when this loop step is done)

  • Research question + mechanism locked
  • Predecessor state vs data categorized
  • Live verification on ohm — vulkaninfo baselined, libva-v4l2-request re-anchored via ffmpeg side-by-side (1.56×/1.73× realtime confirmed)
  • Open questions tabled — Q1 (device ownership, lock A: mutex with env), Q2 (Brave probe — ANSWERED: no, won't engage, see DokuWiki finding), Q3 (DPB mapping — Phase 1 reads Mesa NVK reference), Q4 (video queue family on JM — Phase 1 design item), Q5 (multi-session concurrency — Phase >>1 scope, lock single-session for now)
  • vk-video-samples build attempt on aarch64 — PENDING, last gating item for Phase 0 close
  • Phase 0 evidence dir populated with anchored measurements (phase0_evidence/) — PENDING packaging the raw measurements

What Phase 1 will lock against

After Phase 0 closes, Phase 1 will state the success metric in measurable terms. Tentative: "vk-video-samples (or an equivalent Khronos Vulkan video test client, version locked) decodes a 1080p H.264 sample to NV12 frames on ohm using mesa-panvk-bifrost-video, with fuser /dev/video1 confirming hantro engagement, with no software fallback reported by the client's diagnostics (the equivalent of chrome://media-internals), at no worse than 1.0× realtime."

The 1.0× threshold is conservative: libva-v4l2-request-fourier already reaches 1.16× (predecessor measurement) and 1.56× (re-anchored 2026-05-21) via the same V4L2 path. The driver-bridge cost is expected to be a few percent at worst. Anything below 0.7× indicates a buffer-copy regression to investigate.
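The thresholds reduce to simple arithmetic. This is a minimal sketch, assuming "× realtime" means decoded media duration divided by wall-clock time, the same sense as ffmpeg's speed= readout. The bands are:

- 1.0× or above passes;
- below 0.7× points at a buffer-copy regression;
- the band in between is below target without implying a copy.

Function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Media seconds decoded per wall-clock second; 1.0 keeps pace with playback. */
double realtime_factor(uint64_t frames_decoded, double source_fps,
                       double wall_seconds)
{
    assert(source_fps > 0.0 && wall_seconds > 0.0);
    return ((double)frames_decoded / source_fps) / wall_seconds;
}

enum rt_verdict {
    RT_PASS,             /* >= 1.0x: meets the tentative Phase 1 metric */
    RT_BELOW_TARGET,     /* 0.7x .. 1.0x: misses the metric, no copy implied */
    RT_INVESTIGATE_COPY, /* < 0.7x: suspect a buffer-copy regression */
};

enum rt_verdict classify_realtime(double rt)
{
    if (rt >= 1.0)
        return RT_PASS;
    if (rt < 0.7)
        return RT_INVESTIGATE_COPY;
    return RT_BELOW_TARGET;
}
```

Keeping the frame count, source fps and wall time as separate inputs lets each N=3 repetition be logged raw and reclassified later if the thresholds move.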

— claude-noether, 2026-05-21