a4e7d8ab90
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.
This retrofit imports:
- mesa-panvk-bifrost/: r1..r4 era phase docs (iter1..iter18)
  (libmali stub blobs at iter18/blob/ excluded;
  109MB of RE artifacts replaced with a README pointer)
- mesa-panvk-bifrost-video/: sibling campaign phase docs + probe
- evidence/: frozen .tgz source snapshots at each milestone
  (basis for the 0005 patch diff generation)
Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.
Total: 1.9 MB across 124 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
308 lines
15 KiB
Markdown
# Phase 0: substrate / motivation / inventory for panvk-bifrost-video

## Research question (one sentence)

**Can `mesa-panvk-bifrost-video` expose `VK_KHR_video_decode_h264`
(plus its supporting extensions `VK_KHR_video_queue`,
`VK_KHR_video_decode_queue`, `VK_KHR_video_maintenance1`) backed by
the RK3566 hantro V4L2 stateless VPU, such that Khronos
`vk-video-samples` decodes a 1080p H.264 BBB clip on ohm end-to-end
with hantro engagement provable via `fuser /dev/video1`?**

## Operator-supplied mechanism (load-bearing claim, verbatim from session)

> "brave is closed source and walled off from v4l2-request (checks for
> CHROME_OS at build time) and walled off from vaapi (expects a Vulkan
> output device I think). This is the exact reason I want the Vulkan
> driver - so brave does not just use vulkan to draw buttons, but to
> actively use the features to offload, create buffers that kwin can
> understand, yadda yadda younameit."

The structural insight: the unmodifiable consumer (Brave) speaks
Vulkan natively as its compositor + GPU process buffer broker. If
Vulkan grows a decode capability, Brave's existing dispatch hits it
without changes. The bridge to actual decoder hardware (V4L2-stateless
hantro) lives on the *driver* side of the boundary.

The structural claim has three parts that the campaign relies on:

1. **H.264 spec parameters map across protocols.** Both
   `VkVideoDecodeH264PictureInfoKHR` / `VkVideoDecodeH264DpbSlotInfoKHR`
   (Vulkan side) and the V4L2 stateless H264 controls
   (`V4L2_CID_STATELESS_H264_{SPS,PPS,DECODE_PARAMS,SLICE_PARAMS,PRED_WEIGHTS,SCALING_MATRIX}`)
   carry the same underlying H.264 spec fields. The mapping is a
   tedious but mechanical translation, not a semantic gap.

2. **Buffers can move across protocols zero-copy.** Both Vulkan
   (`VkBuffer` / `VkImage` with `VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT`)
   and V4L2 (`V4L2_MEMORY_DMABUF`) speak dmabuf. The compressed
   bitstream buffer (Vulkan side) → V4L2 OUTPUT queue, and the V4L2
   CAPTURE queue → decoded NV12 VkImage, can both route through
   dmabuf fd handoffs.

3. **No GPU-side computation is required for the actual decode.**
   The hantro is autonomous; once parameters and buffers are queued
   via V4L2 ioctls, the VPU executes asynchronously. panvk's role
   is *protocol translation*, not GPU shader execution.
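
To make part 1 concrete, here is a minimal Python sketch of the kind of field-for-field translation the claim describes. It models only a small subset of `StdVideoDecodeH264PictureInfo` and of `v4l2_ctrl_h264_decode_params`; the class `VkH264PictureInfo` and the function `to_v4l2_decode_params` are hypothetical names for illustration, and the flag value should be verified against the installed `linux/v4l2-controls.h` before relying on it.

```python
from dataclasses import dataclass

# Assumed value of V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC; check the kernel
# headers on the target before using it in real code.
V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC = 0x01


@dataclass
class VkH264PictureInfo:
    """Hypothetical subset of StdVideoDecodeH264PictureInfo (Vulkan side)."""
    frame_num: int
    idr_pic_id: int
    pic_order_cnt: tuple  # (top, bottom), mirrors PicOrderCnt[2]
    is_idr: bool          # mirrors flags.IdrPicFlag


def to_v4l2_decode_params(pic):
    """Translate the Vulkan-side fields into V4L2 decode-params field names."""
    flags = V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC if pic.is_idr else 0
    return {
        "frame_num": pic.frame_num,
        "idr_pic_id": pic.idr_pic_id,
        "top_field_order_cnt": pic.pic_order_cnt[0],
        "bottom_field_order_cnt": pic.pic_order_cnt[1],
        "flags": flags,
    }
```

The real driver would do this in C against the actual structs; the point of the sketch is only that each output field is a copy or a flag re-encoding of an input field, with no decode-time computation in between.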

## Predecessor carry-over (panvk-bifrost campaign close)

**State carried forward**:
- `mesa-panvk-bifrost` r4 installed on ohm:
  `/usr/lib/panvk-bifrost/libvulkan_panfrost.so` (md5
  7810235db2a8379323acf8d2d521be9a)
- ICD JSON at `/usr/lib/panvk-bifrost/icd.json`
- `VK_ICD_FILENAMES` opt-in pattern (via `brave-vulkan` launcher or
  direct env)
- `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1` requirement (still in force;
  panvk's self-non-conformance gate)
- `libva-v4l2-request-fourier` installed on ohm (proves V4L2 H.264
  decode path on hantro)
- Source pointers: NIR pass + sysval pattern in
  `~/src/panvk-bifrost/iter17/applied_state/panvk_vX_xfb_lower.c`;
  PKGBUILD shape in
  `~/src/marfrit-packages/arch/mesa-panvk-bifrost/`

**Data NOT carried forward** (reference history only):
- iter15's 75.7% CTS pass: wrong metric for this campaign.
- iter17's 91.7% post-XFB-decomp: wrong metric.
- libva-v4l2-request-fourier 1.16× realtime: wrong protocol layer.

This campaign measures **VK_KHR_video_decode engagement + decode
throughput + frame correctness** in its own metric space. The Phase 1
hypothesis is stated in that space; Phase 3 measures fresh.

## Tooling and measurement-instrument inventory

### What's installed on ohm right now (live verification, not paper)

- `mesa-panvk-bifrost` r4-1: Vulkan ICD substrate
- `vulkan-headers` (presumably; to be live-checked)
- `libva-v4l2-request-fourier`: currently holding `/dev/video1`
  while running. **Coexistence policy needed.**
- `ffmpeg-v4l2-request-fourier`: uses libva path, same device
  contention
- `mpv-fourier`, `kwin-fourier`, `qt6-base-fourier`: display stack
- Kernel: `linux-fresnel-fourier`, which provides the hantro v4l2
  stateless driver and the `dma_resv` patches

### What needs verification (Phase 0 open items)

- Does `vulkaninfo` on ohm enumerate ANY video queue family today?
  Likely no, but baseline the no.
- Is the Vulkan loader on ohm new enough to support the `VK_KHR_video_*`
  extension surface negotiation? (Vulkan headers 1.3.221+ minimum.)
- Is vk-video-samples buildable on aarch64 today?
  Candidate sources: Khronos repo `KhronosGroup/Vulkan-Samples` and
  `nvpro-samples/vk_video_samples`. Check build deps + cmake config.
- Does Mesa ship `src/vulkan/runtime/vk_video.c` helpers in
  26.0.6, and are they usable from a video-queue-bearing driver?
- What's the device-ownership policy between `libva-v4l2-request-fourier`
  (currently using `/dev/video1`) and `panvk-bifrost-video` if both
  want decode access? V4L2 m2m allows only one process at a time.

### Reference implementations to read (not copy)

- **Mesa NVK**: `src/nouveau/vulkan/nvk_video.c` and surrounding.
  Most recent Mesa VK_KHR_video implementation. Uses NVIDIA's NVDEC
  via class methods. Read for: extension advertisement shape,
  queue family registration, session/command lifecycle, DPB
  management state machine.
- **Mesa Anv**: `src/intel/vulkan/anv_video.c`. Intel's fixed-function
  media engine. Mature.
  Read for: parameter object handling, multi-decoder DPB tracking.
- **Mesa RADV**: `src/amd/vulkan/radv_video.c`. AMD UVD/VCN.
  Read for: a third reference point on the abstractions Mesa's
  `vk_video.c` runtime helper expects from a driver.

**Crucial**: do NOT copy these. Each driver dispatches into the
GPU's video engine via a tightly bound submit path. Our submit path
is `ioctl(VIDIOC_QBUF)` to `/dev/video1`, a fundamentally different
shape. Read the high-level structure (extension surface, queue
family bring-up, session object lifecycle), then implement against
the V4L2 backend ourselves.

### Reference for the V4L2 side (proven-working)

- `libva-v4l2-request-fourier` on github → marfrit fork on
  packages.reauktion.de. Specifically:
  - H.264 frame-based path (single CTRL_REQ, full frame in one slice)
  - DECODE_PARAMS / SPS / PPS / SLICE_PARAMS / PRED_WEIGHTS /
    SCALING_MATRIX control marshalling
  - dmabuf import/export for CAPTURE queue
- Kernel v4l2-request docs:
  `Documentation/userspace-api/media/v4l/ext-ctrls-codec-stateless.rst`,
  the authoritative H.264 control reference.
- `hantro_h264.c` in the kernel: read `assemble_scaling_list` and the
  reference picture list builder for the actual per-decode hardware
  ops; together they give a sense of what V4L2 will accept.

## In-session baseline anchor (per Phase 0 dev_process rule)

Predecessor's reference floors that must replicate at N=3 before
binding cells anchor to them:

1. `mesa-panvk-bifrost` r4 enumerates a Vulkan device and
   `probe_winding` passes 3/3 topologies. → **Verified earlier this
   session at 14:30 UTC** with packaged r4-1; sufficient as session
   anchor.
2. `libva-v4l2-request-fourier` decodes BBB H.264 via hantro. → **Verified
   2026-05-21**: ffmpeg `-hwaccel vaapi` + libva = 1.56× realtime on the
   same BBB file used in this session's brave instrumentation run.
   ffmpeg `-hwaccel v4l2request` (direct, bypassing libva) = 1.73× realtime.
   Both paths green at N=1 each; the N=3 anchor is still pending, but the
   single-rep result reproduces the iter14 measurement at the same
   magnitude, so it is likely stable.
3. `vulkaninfo` reports advertised extensions and queue families. →
   **Measured 2026-05-21** with `VK_ICD_FILENAMES=/usr/lib/panvk-bifrost/icd.json`
   and `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`: Vulkan 1.4.350 loader; 19
   instance extensions; **zero `VK_KHR_video_*` extensions**; single queue
   family, queueCount=1; no `VK_QUEUE_VIDEO_DECODE_BIT` anywhere. Clean
   baseline: the campaign deliverable is 0→1 video queue family +
   extensions on panvk-bifrost.

If (2) or (3) fail to anchor, loop back: investigate the rig before
moving to Phase 1.
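
The `vulkaninfo` baseline in (3) can be re-checked mechanically. The following is a minimal sketch, assuming `vulkaninfo` prints extension names verbatim and prints each family's flags on a line containing `queueFlags`; the function name `video_baseline` is hypothetical and the output-format assumption should be confirmed against the `vulkaninfo` build on ohm.

```python
import re


def video_baseline(vulkaninfo_text):
    """Summarize video-decode exposure found in captured vulkaninfo output."""
    # Collect every distinct VK_KHR_video_* extension name that appears.
    extensions = sorted(set(re.findall(r"VK_KHR_video_\w+", vulkaninfo_text)))
    # Assumption: queue capabilities are printed on lines containing
    # "queueFlags", with the decode bit spelled with "VIDEO_DECODE" in it.
    has_decode_queue = any(
        "queueFlags" in line and "VIDEO_DECODE" in line
        for line in vulkaninfo_text.splitlines()
    )
    return {"video_extensions": extensions, "has_decode_queue": has_decode_queue}
```

Feeding it the captured output from the run above should yield an empty extension list and `has_decode_queue` false today; the campaign succeeds at this layer when both change.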

## Open questions for Phase 1 to resolve

These are *known unknowns*: they don't block Phase 0 close, but
Phase 1's metric choice depends on the answers.

### Q1: Device ownership: how do libva and panvk-bifrost-video coexist?

`/dev/video1` (hantro m2m) accepts one process at a time. Options:

- **A. Mutually exclusive use**: only one runtime holds the device at
  a time; user picks via env var (`LIBVA_DRIVER_NAME=null` → Vulkan
  path, etc.).
- **B. Shared-device daemon**: a small userspace daemon owns
  `/dev/video1` and arbitrates V4L2 requests from multiple clients
  via a custom IPC protocol. Complex. Not for Phase 1.
- **C. Drop libva entirely for the consumers we care about**: brave
  uses Vulkan; firefox-fourier already uses V4L2-direct, not libva;
  mpv-fourier uses ffmpeg-v4l2-request-fourier. If libva-v4l2-request
  isn't the path for any consumer in scope, drop it from the running
  set for video tasks.

**Recommendation for Phase 1**: lock A. Document the env-toggle.
Defer B to later iteration if real workloads need it.
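
As one possible shape for the option A env-toggle, here is a minimal sketch that derives the child environment for each decode path. The function name `decode_env` and the path labels `"vulkan"` / `"libva"` are hypothetical; the variable values come from this document, and clearing `LIBVA_DRIVER_NAME` for the libva path assumes libva then auto-selects its driver, which is not verified here.

```python
def decode_env(path, base_env):
    """Return a copy of base_env configured for exactly one decode path."""
    env = dict(base_env)
    if path == "vulkan":
        # Opt in to the panvk-bifrost ICD and park libva on the null driver
        # so it does not compete for /dev/video1.
        env["VK_ICD_FILENAMES"] = "/usr/lib/panvk-bifrost/icd.json"
        env["PAN_I_WANT_A_BROKEN_VULKAN_DRIVER"] = "1"
        env["LIBVA_DRIVER_NAME"] = "null"
    elif path == "libva":
        # Drop the Vulkan opt-in so only libva reaches the decoder.
        for name in ("VK_ICD_FILENAMES", "PAN_I_WANT_A_BROKEN_VULKAN_DRIVER",
                     "LIBVA_DRIVER_NAME"):
            env.pop(name, None)
    else:
        raise ValueError("unknown decode path: %r" % (path,))
    return env
```

A launcher would pass the result as the `env` argument when spawning the consumer, for example via `subprocess.run(cmd, env=decode_env("vulkan", os.environ))`.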

### Q2: Does Brave even probe VK_KHR_video_decode_h264 today? (ANSWERED 2026-05-21)

**No, and won't engage even if we offer it.** `strings /opt/brave-bin/*`
returns **zero hits** for `VK_KHR_video` / `VulkanVideoDecoder`.
Chromium's VulkanVideoDecoder is a Khronos design draft (Dec 2025,
13-week implementation plan, not merged); see
[Vulkan Video Integration into Chromium](https://www.khronos.org/vulkan/chrome-video/vulkan_video_integration.html).
Beyond probing, brave-bin on PineTab2 is structurally unable to engage
HW video decode at all due to the chromeos-pipeline ImageProcessor wall;
see [[fourier:brave_arm64_vaapi_wall]] on DokuWiki or
`~/src/brave-vaapi-fourier/DEFINITIVE_FINDING.md` (measured 2026-05-21).

**Implication for this campaign**: Brave is NOT a Phase 1 consumer.
The immediate consumer story:

- **mpv with `--hwdec=vulkan`**: enumerated today on ohm for h264 /
  hevc / vp9 / av1 (`mpv --hwdec=help` confirms). Uses libavcodec's
  `hwcontext_vulkan.c` path. Once panvk-bifrost-video exposes
  `VK_KHR_video_decode_h264`, mpv-fourier becomes an immediate consumer.
- **ffmpeg with `-hwaccel vulkan`**: first-class hwaccel method,
  confirmed in `ffmpeg -hwaccels` on ohm.
- **gstreamer 1.28.3 `vulkan` plugin**: gst-plugins-bad ships
  `vulkan{h264,h265,av1}dec` (per-codec presence on this build TBD).
- **Future Brave**: gets it free when chromium upstream lands
  VulkanVideoDecoder (months/year-ish).

Phase 1 milestone stays **vk-video-samples** as the test client (isolates
driver work from consumer-side bugs). Phase 8 close-criteria will add
"mpv-fourier `--hwdec=vulkan` decodes BBB H.264 on ohm with fuser
showing /dev/video1 engagement" as the real-world consumer proof.
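
The `fuser /dev/video1` engagement check can also be scripted for the evidence directory. Below is a minimal sketch that approximates what `fuser` reports by scanning `/proc/<pid>/fd` symlinks; the function name `pids_holding` is hypothetical, the scan only sees processes the caller has permission to inspect, and it matches on the literal device path rather than on device numbers.

```python
import os


def pids_holding(device, proc_root="/proc"):
    """Return sorted PIDs that have `device` open, judged by fd symlink targets."""
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed between listdir and readlink
            if target == device:
                holders.append(int(entry))
                break
    return sorted(holders)
```

Calling `pids_holding("/dev/video1")` while the decode client runs and checking that the client's PID is in the result gives the same pass/fail signal as the manual `fuser` check.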

### Q3: Vulkan ↔ V4L2 DPB management mismatch

Vulkan API thinks of DPB (Decoded Picture Buffer) as an array of
`VkImage` slots owned by the driver, with the application telling the
driver which slot is the output frame, which slots are references
for the current decode, and when a slot can be reused.

V4L2 stateless H.264 thinks of DPB as a runtime data structure
encoded in `V4L2_CID_STATELESS_H264_DECODE_PARAMS` (the `dpb[16]`
array of `v4l2_h264_dpb_entry`), where each entry identifies a decoded
frame in the CAPTURE queue by its `reference_ts` buffer timestamp
rather than by a slot index.

The mapping is doable but not trivial. The Mesa NVK/Anv/RADV
implementations have abstractions around this in
`src/vulkan/runtime/vk_video.c`. Phase 0 close: read that file end
to end, decide whether it's a usable harness for our V4L2 backend
or whether we need a parallel set of helpers.
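
One way to picture the bookkeeping this mapping needs is a table from Vulkan DPB slot index to the V4L2 buffer timestamp of the frame decoded into that slot. The sketch below is an illustration only, not Mesa's helper API: the class `DpbSlotTable` and its methods are hypothetical, only three fields of `v4l2_h264_dpb_entry` are modeled, and the flag values are assumed from `linux/v4l2-controls.h` and should be verified.

```python
V4L2_H264_NUM_DPB_ENTRIES = 16
# Assumed flag values; verify against the kernel headers in use.
V4L2_H264_DPB_ENTRY_FLAG_VALID = 0x01
V4L2_H264_DPB_ENTRY_FLAG_ACTIVE = 0x02


class DpbSlotTable:
    """Tracks which V4L2 frame (by timestamp) backs each Vulkan DPB slot."""

    def __init__(self):
        self._slots = {}  # Vulkan slot index -> (reference_ts, frame_num)

    def bind(self, slot, reference_ts, frame_num):
        """Record that `slot` now holds the frame queued with `reference_ts`."""
        self._slots[slot] = (reference_ts, frame_num)

    def release(self, slot):
        """Forget a slot once the application marks it reusable."""
        self._slots.pop(slot, None)

    def build_dpb(self, reference_slots):
        """Build a fixed-size dpb[] list for one decode from its reference slots."""
        if len(reference_slots) > V4L2_H264_NUM_DPB_ENTRIES:
            raise ValueError("too many reference slots for dpb[16]")
        dpb = []
        for slot in reference_slots:
            reference_ts, frame_num = self._slots[slot]
            dpb.append({
                "reference_ts": reference_ts,
                "frame_num": frame_num,
                "flags": V4L2_H264_DPB_ENTRY_FLAG_VALID
                         | V4L2_H264_DPB_ENTRY_FLAG_ACTIVE,
            })
        # Unused entries stay zeroed, i.e. not flagged VALID.
        while len(dpb) < V4L2_H264_NUM_DPB_ENTRIES:
            dpb.append({"reference_ts": 0, "frame_num": 0, "flags": 0})
        return dpb
```

Whether state like this belongs in a panvk-local helper or can hang off the existing `vk_video.c` session objects is exactly the decision the file read-through is meant to settle.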

### Q4: Vulkan video queue family expectations vs panvk's job manager

panvk on Bifrost is JM-class (Job Manager). Job Manager has graphics
+ compute + fragment ringbuffers; it has no concept of a separate
video ring. The Vulkan API expects a queue family with
`VK_QUEUE_VIDEO_DECODE_BIT_KHR` and an associated `VkQueue` instance
you submit decode commands to.

Our submit path won't actually go to the JM at all; it'll go to
V4L2. So the panvk video queue is "fake" from the GPU's perspective:
it's a userspace queue that translates command-buffer-recorded video
ops into V4L2 ioctls. This is fine architecturally but needs the
queue infrastructure (synchronization, timeline semaphores between
graphics and video families) to be wired up correctly. NVK probably
has the cleanest reference for this since NVDEC is also
architecturally separate from the graphics scheduler on NVIDIA.
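
The "fake queue" idea can be sketched as a replay loop: decode ops recorded into a command buffer are turned into an ordered series of V4L2 request-API steps at submit time. This is a toy model under stated assumptions; the class `V4l2VideoQueue`, the op dictionary layout, and the `backend` callable are hypothetical, and the backend here only receives step names instead of issuing real ioctls. Synchronization with the graphics queue is deliberately left out.

```python
class V4l2VideoQueue:
    """Userspace queue that replays recorded decode ops as V4L2 steps."""

    def __init__(self, backend):
        # backend(step_name, payload) stands in for the real ioctl calls.
        self._backend = backend

    def submit(self, command_buffer):
        """Translate each recorded decode op into request-API steps, in order."""
        for op in command_buffer:
            if op["kind"] != "decode":
                raise ValueError("video queue only accepts decode ops")
            # 1. Attach the H.264 controls to the request.
            self._backend("VIDIOC_S_EXT_CTRLS", op["controls"])
            # 2. Queue the compressed bitstream on the OUTPUT queue.
            self._backend("VIDIOC_QBUF(OUTPUT)", op["bitstream_fd"])
            # 3. Queue the destination picture on the CAPTURE queue.
            self._backend("VIDIOC_QBUF(CAPTURE)", op["dst_fd"])
            # 4. Hand the request to the kernel.
            self._backend("MEDIA_REQUEST_IOC_QUEUE", None)
```

Passing a list's `append`-style recorder as the backend makes the translation order easy to inspect before any real device is involved.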

### Q5: Hantro per-stream device contention vs concurrent decodes

VK_KHR_video allows multiple `VkVideoSessionKHR` instances per device.
If two of them concurrently want to decode different streams, the
hantro m2m driver serializes them via the V4L2 queueing model, but
performance contention is a real issue. Phase 1's target is a single
decode session; multi-session concurrency is a Phase >>1 problem.

## Predecessor inheritance summary

| Inherited | Source | How used |
|---|---|---|
| Vulkan ICD substrate (`libvulkan_panfrost.so` r4) | panvk-bifrost campaign | The library we extend with video |
| PKGBUILD pattern for `mesa-panvk-bifrost-*` packages | marfrit-packages/arch/mesa-panvk-bifrost | Template for new `mesa-panvk-bifrost-video` |
| V4L2 stateless H.264 control marshalling | libva-v4l2-request-fourier | Reference; not linked into panvk |
| Kernel `dma_resv` patches | linux-fresnel-fourier | Buffer fence correctness on V4L2 producers |
| Build/CI on Gitea Actions aarch64 runner | marfrit-packages | Same pipeline, new package |
| Dev process 9(+1)-phase loop | `feedback_dev_process.md` | This campaign follows |

## Phase 0 close criteria (when this loop step is done)

- [x] Research question + mechanism locked
- [x] Predecessor state vs data categorized
- [x] Live verification on ohm: vulkaninfo baselined, libva-v4l2-request
  re-anchored via ffmpeg side-by-side (1.56×/1.73× realtime confirmed)
- [x] Open questions tabled: Q1 (device ownership, lock A: mutex with env),
  Q2 (Brave probe, ANSWERED: no, won't engage, see DokuWiki finding),
  Q3 (DPB mapping, Phase 1 reads Mesa NVK reference),
  Q4 (video queue family on JM, Phase 1 design item),
  Q5 (multi-session concurrency, Phase >>1 scope, lock single-session for now)
- [ ] vk-video-samples build attempt on aarch64: **PENDING**, last
  gating item for Phase 0 close
- [ ] Phase 0 evidence dir populated with anchored measurements
  (`phase0_evidence/`): **PENDING** packaging the raw measurements

## What Phase 1 will lock against

After Phase 0 closes, Phase 1 will state the success metric in
measurable terms. Tentative: *"vk-video-samples (or equivalent
Khronos Vulkan video test client, version locked) decodes a
1080p H.264 sample to NV12 frames on ohm using mesa-panvk-bifrost-video,
with `fuser /dev/video1` confirming hantro engagement, with no
software fallback in `chrome://media-internals`-equivalent
diagnostics, at no worse than 1.0× realtime."*

The 1.0× threshold is conservative; libva-v4l2-request-fourier
already measured 1.16× in the predecessor campaign (1.56× in this
session's re-anchor) via the same V4L2 path. The driver-bridge cost
should be a few percent at worst. Anything below 0.7× indicates a
buffer-copy regression to investigate.
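
The thresholds above reduce to simple arithmetic: the realtime factor is clip duration divided by wall-clock decode time. A minimal sketch of how a measurement script might score a run follows; the function names and the three result labels are hypothetical, while the 1.0× and 0.7× cut-offs are the ones stated in this section.

```python
def realtime_factor(clip_seconds, wall_seconds):
    """Return how many seconds of video were decoded per wall-clock second."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return clip_seconds / wall_seconds


def classify(factor, pass_floor=1.0, regression_floor=0.7):
    """Bucket a realtime factor against the tentative Phase 1 thresholds."""
    if factor >= pass_floor:
        return "pass"
    if factor >= regression_floor:
        return "below-target"
    return "suspect-buffer-copy"
```

For example, a 60-second clip decoded in 40 seconds of wall time gives a factor of 1.5 and classifies as a pass.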

-- claude-noether, 2026-05-21