marfrit a4e7d8ab90 initial seed: retrofit campaign lineage from local working trees
panvk-bifrost campaigns (r1..r4 Vulkan compositor + r5.video1 Vulkan
video decode) shipped before this repo existed; the deliverable
patches live in marfrit-packages, but the reasoning chain, phase docs,
and source-state evidence lived only in local working trees on the
development host.

This retrofit imports:
- mesa-panvk-bifrost/   — r1..r4 era phase docs (iter1..iter18)
                          (libmali stub blobs at iter18/blob/ excluded
                          — 109MB of RE artifacts replaced with a README
                          pointer)
- mesa-panvk-bifrost-video/ — sibling campaign phase docs + probe
- evidence/             — frozen .tgz source snapshots at each milestone
                          (basis for the 0005 patch diff generation)

Future iterations should branch off here from day one, so each iter is
a commit rather than a snapshot. See [[feedback-session-local-process-pins]]
for the process drift this retrofit closes.

Total: 1.9 MB across 124 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 05:25:37 +02:00

# Phase 0 — substrate / motivation / inventory for panvk-bifrost-video
## Research question (one sentence)
**Can `mesa-panvk-bifrost-video` expose `VK_KHR_video_decode_h264`
(plus its supporting extensions `VK_KHR_video_queue`,
`VK_KHR_video_decode_queue`, `VK_KHR_video_maintenance1`) backed by
the RK3566 hantro V4L2 stateless VPU, such that Khronos
`vk-video-samples` decodes a 1080p H.264 BBB clip on ohm end-to-end
with hantro engagement provable via `fuser /dev/video1`?**
## Operator-supplied mechanism (load-bearing claim — verbatim from session)
> "brave is closed source and walled off from v4l2-request (checks for
> CHROME_OS at build time) and walled off from vaapi (expects a Vulkan
> output device I think). This is the exact reason I want the Vulkan
> driver - so brave does not just use vulkan to draw buttons, but to
> actively use the features to offload, create buffers that kwin can
> understand, yadda yadda younameit."
The structural insight: the unmodifiable consumer (Brave) speaks
Vulkan natively as its compositor + GPU process buffer broker. If
Vulkan grows a decode capability, Brave's existing dispatch hits it
without changes. The bridge to actual decoder hardware (V4L2-stateless
hantro) lives on the *driver* side of the boundary.
The structural claim has three parts that the campaign relies on:
1. **H.264 spec parameters map across protocols.** Both
`VkVideoDecodeH264PictureInfoKHR` / `VkVideoDecodeH264DpbSlotInfoKHR`
(Vulkan side) and the V4L2 stateless H264 controls
(`V4L2_CID_STATELESS_H264_SPS|PPS|DECODE_PARAMS|SLICE_PARAMS|
PRED_WEIGHTS|SCALING_MATRIX`) carry the same underlying H.264 spec
fields. The mapping is a tedious but mechanical translation, not a
semantic gap.
2. **Buffers can move across protocols zero-copy.** Both Vulkan
(`VkBuffer` / `VkImage` with `VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT`)
and V4L2 (`V4L2_MEMORY_DMABUF`) speak dmabuf. The compressed
bitstream buffer (Vulkan side) → V4L2 OUTPUT queue, and the V4L2
CAPTURE queue → decoded NV12 VkImage, can both route through
dmabuf fd handoffs.
3. **No GPU-side computation is required for the actual decode.**
The hantro is autonomous; once parameters and buffers are queued
via V4L2 ioctls, the VPU executes asynchronously. panvk's role
is *protocol translation*, not GPU shader execution.
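The "mechanical translation" in part 1 can be pictured with a minimal sketch. It assumes an illustrative subset of the fields in `StdVideoDecodeH264PictureInfo` and `v4l2_ctrl_h264_decode_params`; the Python class names and `translate()` are hypothetical, and the flag values are believed to match the kernel UAPI header but should be checked against it.

```python
from dataclasses import dataclass

# Flag bits for v4l2_ctrl_h264_decode_params.flags. Values are believed to
# match the kernel UAPI header (v4l2-controls.h); verify before relying on them.
V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC = 0x01
V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC = 0x02
V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD = 0x04


@dataclass
class VkH264PictureInfo:
    """Hypothetical stand-in for a subset of StdVideoDecodeH264PictureInfo."""
    frame_num: int
    idr_pic_id: int
    pic_order_cnt: tuple  # (top, bottom), as in PicOrderCnt[2]
    is_idr: bool = False
    is_field: bool = False
    is_bottom_field: bool = False


@dataclass
class V4l2H264DecodeParams:
    """Hypothetical stand-in for a subset of v4l2_ctrl_h264_decode_params."""
    frame_num: int
    idr_pic_id: int
    top_field_order_cnt: int
    bottom_field_order_cnt: int
    flags: int


def translate(pic: VkH264PictureInfo) -> V4l2H264DecodeParams:
    """Copy the shared H.264 spec fields from the Vulkan shape to the V4L2 shape."""
    flags = 0
    if pic.is_idr:
        flags |= V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC
    if pic.is_field:
        flags |= V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC
    if pic.is_bottom_field:
        flags |= V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD
    top, bottom = pic.pic_order_cnt
    return V4l2H264DecodeParams(
        frame_num=pic.frame_num,
        idr_pic_id=pic.idr_pic_id,
        top_field_order_cnt=top,
        bottom_field_order_cnt=bottom,
        flags=flags,
    )
```

The real driver would do this in C inside Mesa; the point of the sketch is only that every output field is a renamed copy of an input field, with no decode-time computation.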
## Predecessor carry-over (panvk-bifrost campaign close)
**State carried forward**:
- `mesa-panvk-bifrost` r4 installed on ohm:
`/usr/lib/panvk-bifrost/libvulkan_panfrost.so` (md5
7810235db2a8379323acf8d2d521be9a)
- ICD JSON at `/usr/lib/panvk-bifrost/icd.json`
- `VK_ICD_FILENAMES` opt-in pattern (via `brave-vulkan` launcher or
direct env)
- `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1` requirement (still in force —
panvk's self-non-conformance gate)
- `libva-v4l2-request-fourier` installed on ohm (proves V4L2 H.264
decode path on hantro)
- Source pointers: NIR pass + sysval pattern in
`~/src/panvk-bifrost/iter17/applied_state/panvk_vX_xfb_lower.c`;
PKGBUILD shape in
`~/src/marfrit-packages/arch/mesa-panvk-bifrost/`
**Data NOT carried forward** (reference history only):
- iter15's 75.7% CTS pass — wrong metric for this campaign.
- iter17's 91.7% post-XFB-decomp — wrong metric.
- libva-v4l2-request-fourier 1.16× realtime — wrong protocol layer.
This campaign measures **VK_KHR_video_decode engagement + decode
throughput + frame correctness** in its own metric space. Phase 1
hypothesis goes here, Phase 3 measures fresh.
## Tooling and measurement-instrument inventory
### What's installed on ohm right now (live verification, not paper)
- `mesa-panvk-bifrost` r4-1 — Vulkan ICD substrate
- `vulkan-headers` (presumably — to be live-checked)
- `libva-v4l2-request-fourier` — currently holding `/dev/video1`
while running. **Coexistence policy needed.**
- `ffmpeg-v4l2-request-fourier` — uses libva path, same device
contention
- `mpv-fourier`, `kwin-fourier`, `qt6-base-fourier` — display stack
- Kernel: `linux-fresnel-fourier` — provides hantro v4l2 stateless
driver and the `dma_resv` patches
### What needs verification (Phase 0 open items)
- Does `vulkaninfo` on ohm enumerate ANY video queue family today?
Likely no, but baseline the no.
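- Is the Vulkan loader on ohm new enough to support the `VK_KHR_video_*`
extension surface negotiation? (The finalized `VK_KHR_video_queue` and
`VK_KHR_video_decode_h264` definitions first shipped in Vulkan headers
1.3.238, and `VK_KHR_video_maintenance1` in 1.3.274, so 1.3.274+ is the
practical minimum.)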
- Are vk-video-samples buildable on aarch64 today?
Candidate repos: `KhronosGroup/Vulkan-Video-Samples` and
`nvpro-samples/vk_video_samples`. Build deps + cmake config.
- Does Mesa ship `src/vulkan/runtime/vk_video.c` helpers in
26.0.6, and are they usable from a video-queue-bearing driver?
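- What's the device-ownership policy between `libva-v4l2-request-fourier`
(currently using `/dev/video1`) and `panvk-bifrost-video` if both
want decode access? Working assumption: one process at a time. The
V4L2 mem2mem framework normally gives each `open()` its own context
and schedules jobs across them, so verify on ohm whether hantro
actually rejects a second opener.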
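The first open item (does `vulkaninfo` enumerate any video queue family?) can be baselined with a small text scan. This is a sketch under assumptions: `video_support()` is a hypothetical helper, and it assumes `vulkaninfo` prints extension names verbatim and queue flags containing the substring `VIDEO_DECODE_BIT`, which should be confirmed against the real output on ohm.

```python
import re


def video_support(vulkaninfo_text: str):
    """Return (sorted VK_KHR_video_* extension names, decode-queue flag seen?)."""
    extensions = sorted(set(re.findall(r"\bVK_KHR_video_\w+", vulkaninfo_text)))
    has_decode_queue = "VIDEO_DECODE_BIT" in vulkaninfo_text
    return extensions, has_decode_queue
```

Typical use would be to capture the output of `vulkaninfo` run with `VK_ICD_FILENAMES` and `PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1` set, pass the text to `video_support()`, and store the result as the "before" record. An empty list plus `False` is the expected baseline.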
### Reference implementations to read (not copy)
- **Mesa NVK** — `src/nouveau/vulkan/nvk_video.c` and surrounding.
Most recent Mesa VK_KHR_video implementation. Uses NVIDIA's NVDEC
via class methods. Read for: extension advertisement shape,
queue family registration, session/command lifecycle, DPB
management state machine.
- **Mesa Anv** — `src/intel/vulkan/anv_video.c`. Intel's fixed-function video engine. Mature.
Read for: parameter object handling, multi-decoder DPB tracking.
- **Mesa RADV** — `src/amd/vulkan/radv_video.c`. AMD UVD/VCN.
Read for: a third reference point on the abstractions Mesa's
`vk_video.c` runtime helper expects from a driver.
**Crucial**: do NOT copy these. Each driver dispatches into the
GPU's video engine via a tightly bound submit path. Our submit path
is `ioctl(VIDIOC_QBUF)` to /dev/video1, a fundamentally different
shape. Read the high-level structure (extension surface, queue
family bring-up, session object lifecycle), then implement against
the V4L2 backend ourselves.
### Reference for the V4L2 side (proven-working)
- `libva-v4l2-request-fourier` on github → marfrit fork on
packages.reauktion.de. Specifically:
- H.264 frame-based path (single CTRL_REQ, full frame in one slice)
- DECODE_PARAMS / SPS / PPS / SLICE_PARAMS / PRED_WEIGHTS /
SCALING_MATRIX control marshalling
- dmabuf import/export for CAPTURE queue
- Kernel v4l2-request docs: `Documentation/userspace-api/media/v4l/
ext-ctrls-codec-stateless.rst` — authoritative H.264 control
reference.
- `hantro_h264.c` in the kernel — read `assemble_scaling_list` and the
reference-picture-list builder to see the actual per-decode hardware
operations; this gives a sense of what V4L2 will accept.
## In-session baseline anchor (per Phase 0 dev_process rule)
Predecessor's reference floors that must replicate at N=3 before
binding cells anchor to them:
1. `mesa-panvk-bifrost` r4 enumerates a Vulkan device and
`probe_winding` passes 3/3 topologies. → **Verified earlier this
session at 14:30 UTC** with packaged r4-1; sufficient as session
anchor.
2. `libva-v4l2-request-fourier` decodes BBB H.264 via hantro. → **Verified
2026-05-21**: ffmpeg `-hwaccel vaapi` + libva = 1.56× realtime on the
same BBB file used in this session's brave instrumentation run.
ffmpeg `-hwaccel v4l2request` (direct, bypassing libva) = 1.73× realtime.
Both paths green at N=1 each; N=3 anchor still pending but the
single-rep result reproduces the iter14 measurement at same magnitude
so likely-stable.
3. `vulkaninfo` reports advertised extensions and queue families. →
**Measured 2026-05-21** with `VK_ICD_FILENAMES=/usr/lib/panvk-bifrost/icd.json`
`PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1`: Vulkan 1.4.350 loader; 19
instance extensions; **zero `VK_KHR_video_*` extensions**; single queue
family, queueCount=1; no `VK_QUEUE_VIDEO_DECODE_BIT` anywhere. Clean
baseline — campaign deliverable is 0→1 queue family + extensions on
panvk-bifrost.
If (2) or (3) fail to anchor, loop back: investigate the rig before
moving to Phase 1.
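The N=3 anchoring rule can be written down as a small check. This is a sketch, not the campaign's actual tooling: `realtime_factor()` and `anchors()` are hypothetical names, and the 10% relative tolerance is an assumption chosen for illustration.

```python
def realtime_factor(media_seconds: float, wall_seconds: float) -> float:
    """Seconds of media decoded per second of wall time (1.0 = exactly realtime)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return media_seconds / wall_seconds


def anchors(factors, reference: float, n: int = 3, rel_tol: float = 0.10) -> bool:
    """True when at least n reps exist and every rep is within rel_tol of reference."""
    if len(factors) < n:
        return False
    return all(abs(f - reference) <= rel_tol * reference for f in factors)
```

With only the single rep recorded in item 2, `anchors([1.56], 1.56)` is `False`, which matches the "N=3 anchor still pending" status.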
## Open questions for Phase 1 to resolve
These are *known unknowns* — they don't block Phase 0 close, but
Phase 1's metric choice depends on the answer.
### Q1 — Device ownership: how do libva and panvk-bifrost-video coexist?
`/dev/video1` (hantro m2m) accepts one process at a time. Options:
- **A. Mutually exclusive use**: only one runtime holds the device at
a time; user picks via env var (`LIBVA_DRIVER_NAME=null` → Vulkan
path, etc.).
- **B. Shared-device daemon**: a small userspace daemon owns
`/dev/video1` and arbitrates V4L2 requests from multiple clients
via a custom IPC protocol. Complex. Not for Phase 1.
- **C. Drop libva entirely for the consumers we care about**: brave
uses Vulkan; firefox-fourier already uses V4L2-direct, not libva;
mpv-fourier uses ffmpeg-v4l2-request-fourier. If libva-v4l2-request
isn't the path for any consumer in scope, drop it from the running
set for video tasks.
**Recommendation for Phase 1**: lock A. Document the env-toggle.
Defer B to later iteration if real workloads need it.
### Q2 — Does Brave even probe VK_KHR_video_decode_h264 today? — ANSWERED 2026-05-21
**No, and won't engage even if we offer it.** `strings /opt/brave-bin/*`
returns **zero hits** for `VK_KHR_video` / `VulkanVideoDecoder`.
Chromium's VulkanVideoDecoder is a Khronos design draft (Dec 2025,
13-week implementation plan, not merged) — see
[Vulkan Video Integration into Chromium](https://www.khronos.org/vulkan/chrome-video/vulkan_video_integration.html).
Beyond probing, brave-bin on PineTab2 is structurally unable to engage
HW video decode at all due to the chromeos-pipeline ImageProcessor wall —
see [[fourier:brave_arm64_vaapi_wall]] on DokuWiki or
`~/src/brave-vaapi-fourier/DEFINITIVE_FINDING.md` (measured 2026-05-21).
**Implication for this campaign**: Brave is NOT a Phase 1 consumer.
The immediate consumer story:
- **mpv with `--hwdec=vulkan`** — enumerated today on ohm for h264 /
hevc / vp9 / av1 (mpv hwdec=help confirms). Uses libavcodec's
`hwcontext_vulkan.c` path. Once panvk-bifrost-video exposes
`VK_KHR_video_decode_h264`, mpv-fourier becomes an immediate consumer.
- **ffmpeg with `-hwaccel vulkan`** — first-class hwaccel method,
confirmed in `ffmpeg -hwaccels` on ohm.
- **gstreamer 1.28.3 `vulkan` plugin** — gst-plugins-bad ships
`vulkan{h264,h265,av1}dec` (per-codec presence on this build TBD).
- **Future Brave**: gets it free when chromium upstream lands
VulkanVideoDecoder (months/year-ish).
Phase 1 milestone stays **vk-video-samples** as the test client (isolates
driver work from consumer-side bugs). Phase 8 close-criteria will add
"mpv-fourier `--hwdec=vulkan` decodes BBB H.264 on ohm with fuser
showing /dev/video1 engagement" — the real-world consumer proof.
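The `fuser /dev/video1` engagement proof can also be scripted. The sketch below approximates what `fuser` reports by scanning `/proc/<pid>/fd` symlinks; `holders()` is a hypothetical helper, the `proc_root` parameter exists only so the function can be exercised against a fake tree, and processes whose fd directories are unreadable (other users, without root) are silently skipped.

```python
import os


def holders(device_path: str, proc_root: str = "/proc"):
    """Return sorted PIDs that have device_path open, judged by /proc/<pid>/fd links."""
    target = os.path.realpath(device_path)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # fd closed between listdir and readlink
    return sorted(pids)
```

Sampling `holders("/dev/video1")` while the test client is decoding, and checking that the client's PID appears, gives a record that can be stored alongside the throughput numbers.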
### Q3 — Vulkan ↔ V4L2 DPB management mismatch
The Vulkan API models the DPB (Decoded Picture Buffer) as a set of
numbered slots in the video session, backed by `VkImage` resources that
the application allocates, with the application telling the driver
which slot is the output frame, which slots are references for the
current decode, and when a slot can be reused.
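V4L2 stateless H.264 thinks of DPB as a runtime data structure
encoded in `V4L2_CID_STATELESS_H264_DECODE_PARAMS` (the `dpb[16]`
array of `v4l2_h264_dpb_entry`), where each entry identifies its
reference frame by the timestamp (`reference_ts`) of the buffer that
produced it on the CAPTURE queue rather than by a slot number.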
The mapping is doable but not trivial. The Mesa NVK/Anv/RADV
implementations have abstractions around this in
`src/vulkan/runtime/vk_video.c`. Phase 0 close: read that file end
to end, decide whether it's a usable harness for our V4L2 backend
or whether we need a parallel set of helpers.
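One possible shape for the slot mapping is a table from Vulkan DPB slot index to the V4L2 timestamp of the frame decoded into that slot. This is a sketch under assumptions: `DpbSlotMap` is hypothetical, it assumes V4L2 references are keyed by `reference_ts` as in the kernel UAPI, and the flag values are believed to match `v4l2-controls.h` but should be verified.

```python
# Believed to match the kernel UAPI (v4l2-controls.h); verify before use.
V4L2_H264_DPB_ENTRY_FLAG_VALID = 0x01
V4L2_H264_DPB_ENTRY_FLAG_ACTIVE = 0x02


class DpbSlotMap:
    """Hypothetical bookkeeping: Vulkan slot index -> V4L2 buffer timestamp."""

    def __init__(self, max_slots: int = 16):
        self.max_slots = max_slots
        self._ts_by_slot = {}

    def bind(self, slot: int, timestamp_ns: int) -> None:
        """Record that the frame decoded into `slot` carries `timestamp_ns`."""
        if not 0 <= slot < self.max_slots:
            raise ValueError(f"slot {slot} out of range")
        self._ts_by_slot[slot] = timestamp_ns

    def release(self, slot: int) -> None:
        """Forget a slot once the application marks it reusable."""
        self._ts_by_slot.pop(slot, None)

    def v4l2_dpb(self, reference_slots):
        """Build dpb[] entries (as dicts) for the slots named as references."""
        entries = []
        for slot in reference_slots:
            if slot not in self._ts_by_slot:
                raise KeyError(f"slot {slot} referenced before it was decoded")
            entries.append({
                "reference_ts": self._ts_by_slot[slot],
                "flags": V4L2_H264_DPB_ENTRY_FLAG_VALID | V4L2_H264_DPB_ENTRY_FLAG_ACTIVE,
            })
        return entries
```

Whether this bookkeeping belongs in the driver or can lean on the helpers in `vk_video.c` is exactly the question the read-through is meant to settle.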
### Q4 — Vulkan video queue family expectations vs panvk's job manager
panvk on Bifrost is JM-class (Job Manager). Job Manager has graphics
+ compute + fragment ringbuffers; it has no concept of a separate
video ring. The Vulkan API expects a queue family with
`VK_QUEUE_VIDEO_DECODE_BIT_KHR` and an associated `VkQueue` instance
you submit decode commands to.
Our submit path won't actually go to the JM at all — it'll go to
V4L2. So the panvk video queue is "fake" from the GPU's perspective:
it's a userspace queue that translates command-buffer-recorded video
ops into V4L2 ioctls. This is fine architecturally but needs the
queue infrastructure (synchronization, timeline semaphores between
graphics and video families) to be wired up correctly. NVK probably
has the cleanest reference for this since NVDEC is also
architecturally separate from the graphics scheduler on Nvidia.
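To make the "fake" queue concrete, the sketch below replays recorded decode operations as an ordered trace of the V4L2 request-API calls a submit would issue. It performs no real ioctls. `DecodeOp` and `FakeVideoQueue` are hypothetical names, and the call ordering is an assumption based on the usual stateless request flow, to be checked against the kernel documentation and `libva-v4l2-request-fourier`.

```python
from dataclasses import dataclass, field


@dataclass
class DecodeOp:
    """One recorded decode command (hypothetical shape)."""
    bitstream_fd: int   # dmabuf holding the compressed frame (OUTPUT queue)
    output_fd: int      # dmabuf receiving the decoded NV12 frame (CAPTURE queue)
    controls: dict      # control name -> payload, e.g. {"SPS": ..., "PPS": ...}


@dataclass
class FakeVideoQueue:
    """Userspace queue that turns recorded ops into a V4L2 call trace."""
    trace: list = field(default_factory=list)

    def submit(self, ops) -> int:
        """Append the per-op call sequence to the trace; return the op count."""
        for op in ops:
            self.trace.append(("MEDIA_IOC_REQUEST_ALLOC", None))
            self.trace.append(("VIDIOC_S_EXT_CTRLS", sorted(op.controls)))
            self.trace.append(("VIDIOC_QBUF", ("OUTPUT", op.bitstream_fd)))
            self.trace.append(("VIDIOC_QBUF", ("CAPTURE", op.output_fd)))
            self.trace.append(("MEDIA_REQUEST_IOC_QUEUE", None))
        return len(ops)
```

Completion handling (polling the request fd, `VIDIOC_DQBUF`, and signalling the Vulkan semaphore or fence) is left out; that is where the synchronization wiring mentioned above would attach.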
### Q5 — Hantro per-stream device contention vs concurrent decodes
VK_KHR_video allows multiple `VkVideoSessionKHR` instances per device.
If two of them concurrently want to decode different streams, the
hantro m2m driver serializes them via the V4L2 queueing model, but
performance contention is a real issue. Phase 1's target is a single
decode session; multi-session concurrency is a Phase >>1 problem.
## Predecessor inheritance summary
| Inherited | Source | How used |
|---|---|---|
| Vulkan ICD substrate (`libvulkan_panfrost.so` r4) | panvk-bifrost campaign | The library we extend with video |
| PKGBUILD pattern for `mesa-panvk-bifrost-*` packages | marfrit-packages/arch/mesa-panvk-bifrost | Template for new `mesa-panvk-bifrost-video` |
| V4L2 stateless H.264 control marshalling | libva-v4l2-request-fourier | Reference; not linked into panvk |
| Kernel `dma_resv` patches | linux-fresnel-fourier | Buffer fence correctness on V4L2 producers |
| Build/CI on Gitea Actions aarch64 runner | marfrit-packages | Same pipeline, new package |
| Dev process 9(+1)-phase loop | `feedback_dev_process.md` | This campaign follows |
## Phase 0 close criteria (when this loop step is done)
- [x] Research question + mechanism locked
- [x] Predecessor state vs data categorized
- [x] Live verification on ohm — vulkaninfo baselined, libva-v4l2-request
re-anchored via ffmpeg side-by-side (1.56×/1.73× realtime confirmed)
- [x] Open questions tabled — Q1 (device ownership, lock A: mutex with env),
Q2 (Brave probe — ANSWERED: no, won't engage, see DokuWiki finding),
Q3 (DPB mapping — Phase 1 reads Mesa NVK reference),
Q4 (video queue family on JM — Phase 1 design item),
Q5 (multi-session concurrency — Phase >>1 scope, lock single-session for now)
- [ ] vk-video-samples build attempt on aarch64 — **PENDING**, last
gating item for Phase 0 close
- [ ] Phase 0 evidence dir populated with anchored measurements
(`phase0_evidence/`) — **PENDING** packaging the raw measurements
## What Phase 1 will lock against
After Phase 0 closes, Phase 1 will state the success metric in
measurable terms. Tentative: *"vk-video-samples (or equivalent
Khronos Vulkan video test client, version locked) decodes a
1080p H.264 sample to NV12 frames on ohm using mesa-panvk-bifrost-video,
with `fuser /dev/video1` confirming hantro engagement, with no
software fallback in `chrome://media-internals`-equivalent
diagnostics, at no worse than 1.0× realtime."*
The 1.0× threshold is conservative; libva-v4l2-request-fourier
already exceeds it via the same V4L2 path (1.16× in the predecessor
data, 1.56× in this session's re-anchor). The driver-bridge cost
should be a few percent at worst. Anything below 0.7× indicates a
buffer-copy regression to investigate.
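The two thresholds translate into a three-way verdict. A minimal sketch, assuming the tentative numbers above survive into Phase 1; `classify()` and its label strings are hypothetical.

```python
def classify(factor: float, target: float = 1.0, regression_floor: float = 0.7) -> str:
    """Map a measured realtime factor onto the tentative Phase 1 verdicts."""
    if factor >= target:
        return "pass"
    if factor >= regression_floor:
        return "below target"
    return "investigate buffer-copy regression"
```

For example, a run at 0.85× would land in "below target": not a pass, but not yet evidence of an accidental copy on the buffer path.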
— claude-noether, 2026-05-21