diff --git a/README.md b/README.md index 32d92a7..fc327fa 100644 --- a/README.md +++ b/README.md @@ -2,19 +2,19 @@ ## TL;DR -**libva** (= "Linux Video Acceleration API") is the standard userspace shim that lets video apps — Firefox, Chromium, mpv, GStreamer, vlc — talk to GPU/VPU hardware decoders without per-vendor code. Apps call libva; libva loads a backend driver (`.so` in `/usr/lib/dri/`) for the local hardware. Common backends: `intel-iHD` (Intel iGPUs), `mesa-va-gallium` (AMD), `nvidia-vaapi-driver`, and **`v4l2_request`** for V4L2-stateless ARM/embedded decoders (Rockchip hantro, Allwinner cedrus, Sun*, etc.). This campaign hardens a fork of the v4l2-request backend to actually work on Rockchip hantro (PineTab2 RK3568 first), end-to-end, for real consumers. +**libva** (= "Linux Video Acceleration API") is the standard userspace shim that lets video apps — Firefox, Chromium, mpv, GStreamer, vlc — talk to GPU/VPU hardware decoders without per-vendor code. Apps call libva; libva loads a backend driver (`.so` in `/usr/lib/dri/`) for the local hardware. Common backends: `intel-iHD` (Intel iGPUs), `mesa-va-gallium` (AMD), `nvidia-vaapi-driver`, and **`v4l2_request`** for V4L2-stateless ARM/embedded decoders (Rockchip hantro, Allwinner cedrus, Sun*, etc.). This campaign hardens a fork of the v4l2-request backend to actually work on Rockchip hantro (PineTab2 — Rockchip **RK3566** silicon, hantro driver via the `rockchip,rk3568-vpu` DT compatible since the silicon is close enough; mainline `rkvdec2`/`vdpu346` for RK3566 not yet merged), end-to-end, for real consumers. **`firefox-fourier`** ([build instructions](firefox-fourier/README.md)) exists because Firefox 150's RDD-process sandbox blocks two things V4L2-stateless decoders need: `/dev/media*` request-API nodes (broker policy never lists them) and `linux/media.h` ioctls (seccomp policy doesn't admit magic byte `'|'`). Stock Firefox on hantro/cedrus/sun* hardware therefore SW-decodes everything unless you set `MOZ_DISABLE_RDD_SANDBOX=1` (which turns the whole sandbox off). The `firefox-fourier` patch — ~50 lines across two files — opens *just* what V4L2-stateless decoders need, sandbox stays on. Tested working with the iter5-end libva backend. **Sister campaigns** in the operator's "fourier" series — same hardware, different layers of the pipeline: -- [`kwin-fourier`](https://git.reauktion.de/marfrit/marfrit-packages/src/branch/master/arch/kwin-fourier) — KWin compositor patches addressing scanout-plane / dmabuf-fence behavior on RK3568. Decode-side fix in this campaign + render-side fix in kwin-fourier are complementary; together they make HW video work end-to-end on PineTab2. +- [`kwin-fourier`](https://git.reauktion.de/marfrit/marfrit-packages/src/branch/master/arch/kwin-fourier) — KWin compositor patches addressing scanout-plane / dmabuf-fence behavior on rockchip-drm (the kernel DRM driver that handles RK3566/RK3568/RK3588 display). Decode-side fix in this campaign + render-side fix in kwin-fourier are complementary; together they make HW video work end-to-end on PineTab2. - `chromium-fourier` — Chromium fork enabling VAAPI on aarch64 + GL-stack workarounds for Mali-G52/Panfrost. Stock Brave doesn't even reach VAAPI on this rig (GPU process dies at GL bindings); chromium-fourier unblocks it. Verifying compatibility with iter5 driver is iter6 candidate C. --- Single-question campaign on the libva V4L2-stateless backend: **make multi-planar libva work, end-to-end, on Rockchip hantro hardware, for production VA-API consumers (Brave/Chromium, Firefox via libavcodec, mpv `--hwdec=vaapi`, vainfo as smoke test)**. -The deliverable is a libva-v4l2-request fork that any VA-API consumer can dlopen and get H.264 (initially) and MPEG-2 hardware-decoded NV12 dmabufs out of, on PineTab2 RK3568 first, with the same plumbing intended to extend to RK3399 (fresnel) and RK3588 (boltzmann/ampere) once the RK3568 path is solid. +The deliverable is a libva-v4l2-request fork that any VA-API consumer can dlopen and get H.264 hardware-decoded NV12 dmabufs out of, on PineTab2 (RK3566 via hantro/rk3568-vpu) first, with the same plumbing intended to extend to RK3399 (fresnel) and RK3588 (boltzmann/ampere) once the PineTab2 path is solid. (MPEG-2 was iter1 backlog, dropped at iter6 close — CPU handles MPEG-2 fine on the A55 cluster.) The fork lives as a subdirectory of this campaign: @@ -32,7 +32,7 @@ This README is the Claude-facing entry point for resumption after compaction. Re Phase 5 review: -The chromium-fourier verdict's load-bearing claim is "multi-planar libva is the binding decode-side enabler on hantro." Whether that claim survives a clean control depends on this campaign's deliverable shipping. **The reverse is also true**: until a working multi-planar libva-v4l2-request lands, no consumer other than chromium-fourier-with-Step-1-patches has hardware decode on RK3568. Firefox VAAPI, mpv `--hwdec=vaapi`, gst-vaapi, vainfo all degrade to software or fall over. +The chromium-fourier verdict's load-bearing claim is "multi-planar libva is the binding decode-side enabler on hantro." Whether that claim survives a clean control depends on this campaign's deliverable shipping. **The reverse is also true**: until a working multi-planar libva-v4l2-request lands, no consumer other than chromium-fourier-with-Step-1-patches has hardware decode on PineTab2 (RK3566 via hantro/rk3568-vpu). Firefox VAAPI, mpv `--hwdec=vaapi`, gst-vaapi, vainfo all degrade to software or fall over. ## Process @@ -67,7 +67,7 @@ State (carry-over) — fork content, file:line pointers, contract analyses: Reference history (context, NOT data this campaign anchors to) — orthogonal scanout-plane constraint: -- [`~/src/kwin_overlay_subsurface/phase2_source_findings.md`](../kwin_overlay_subsurface/phase2_source_findings.md) — rockchip-drm RK3568 plane format/modifier table. **Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; KWin owns it. Plane 45 (Overlay) advertises zero NV12.** Therefore: even when libva-multi-planar produces a clean NV12 dmabuf, no scanout plane is reachable while KWin runs, and some component must GL-composite NV12 → RGB before display. **This is orthogonal to libva**: libva is on the decode side, the scanout-plane gap is on the display side. They're separate problems with separate fixes. +- [`~/src/kwin_overlay_subsurface/phase2_source_findings.md`](../kwin_overlay_subsurface/phase2_source_findings.md) — rockchip-drm plane format/modifier table on PineTab2 (RK3566). **Plane 39 (Primary, NV12 LINEAR) is the only NV12-capable scanout; KWin owns it. Plane 45 (Overlay) advertises zero NV12.** Therefore: even when libva-multi-planar produces a clean NV12 dmabuf, no scanout plane is reachable while KWin runs, and some component must GL-composite NV12 → RGB before display. **This is orthogonal to libva**: libva is on the decode side, the scanout-plane gap is on the display side. They're separate problems with separate fixes. - [`~/src/x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md`](../x11-session-research/phase0_evidence/x11_baseline_2026-05-03/x1_summary.md) — confirms the scanout-plane gap isn't fixable by switching session servers either. mpv-xv {SW,HW} and mpv-gpu {SW,HW} all leave Plane 39 in XRGB8888 throughout. It's a kernel/Mesa/Xorg-DDX gap, not a hardware-decoding gap. **Don't expect this campaign to "fix the video pipeline end-to-end"** — fixing libva-multi-planar fixes the decode side; the scanout-plane question stays open after. - [`~/src/kwin_overlay_subsurface/`](../kwin_overlay_subsurface/) — closed without patch (`phase8_handover.md`); its `feedback_replicate_baseline_first.md` lesson is the discipline that this campaign inherits. @@ -84,7 +84,7 @@ External reference: - libva-v4l2-request fork (`libva-v4l2-request-fourier/`), **backend only** — multi-planar correctness across the V4L2-stateless lifecycle: format probing (single-plane fallback to multi-plane), control protocol sequencing, surface-handle export, dmabuf modifier negotiation. - H.264 first; MPEG-2 next. HEVC explicitly out. -- Hardware target: **ohm RK3568 hantro G1/G2 first iteration only.** fresnel RK3399 + ampere/boltzmann RK3588 explicit future iterations after ohm path is solid. +- Hardware target: **ohm PineTab2 (RK3566 silicon, hantro driver via rk3568-vpu DT compatible) first iteration only.** fresnel RK3399 + ampere/boltzmann RK3588 explicit future iterations after ohm path is solid. - Test consumers: vainfo, mpv `--hwdec=vaapi`, Firefox `media.ffmpeg.vaapi.enabled`, chromium-fourier 149 (regression check). Brave 1.89 deferred (chromeos-pipeline gating, not a libva-side problem). - Phase 1 success criterion: **boolean correctness** — "libva accepted + providing access to hardware decoder". Performance metrics deferred to follow-up iteration. @@ -100,7 +100,7 @@ External reference: ## Hardware target -- ohm — PineTab2, Rockchip RK3568 (4× Cortex-A55, Mali-G52 MP2, hantro G1/G2 VPU). Kernel `6.19.10-danctnix1-1-pinetab2`. **Primary measurement target.** +- ohm — PineTab2, Rockchip **RK3566** silicon (4× Cortex-A55, Mali-G52 MP2, hantro G1/G2 VPU). Hantro driver attaches via the `rockchip,rk3568-vpu` DT compatible since RK3566/RK3568 silicon is close enough; the proper RK3566 driver target (`rkvdec2`/`vdpu346`) has no mainline support yet — Christian Hewitt's patch series ([LKML 2025/12/26/206](https://lkml.org/lkml/2025/12/26/206)) is unmerged. Kernel `6.19.10-danctnix1-1-pinetab2`. **Primary measurement target.** - (later) fresnel — Pinebook Pro, Rockchip RK3399 (hantro G1, no G2). EndeavourOS-ARM custom OC kernel — see [`reference_fresnel_kernel_constraints.md`](../../.claude/projects/-home-mfritsche-src-fourier/memory/reference_fresnel_kernel_constraints.md). - (much later) ampere/boltzmann — RK3588 (hantro VDPU381). Adding VDPU381 is a code addition this fork doesn't have today. diff --git a/firefox-fourier/README.md b/firefox-fourier/README.md index b10812f..fe17abb 100644 --- a/firefox-fourier/README.md +++ b/firefox-fourier/README.md @@ -42,7 +42,7 @@ The patch (six hunks across two files): 3. Adds ioctl magic byte `'|'` (linux/media.h) to `RDDSandboxPolicy`'s seccomp allowlist alongside the existing `'V'`. 4. Adds an explicit `case __NR_ioctl:` to `UtilitySandboxPolicy::EvaluateSyscall` mirroring RDD's allowlist (`'d'` DRM, `'b'` DMA-Buf, `'V'` V4L2, `'|'` linux/media.h) — required because FF150 routes VAAPI work to the Utility process and the common policy blocks all four magic bytes. -Tested on hantro G1 (Rockchip RK3568 / PineTab2) running bbb_1080p30 H.264 with full sandbox enabled. ENETDOWN gone, libva initializes in the Utility process, `MEDIA_IOC_REQUEST_ALLOC` succeeds, decode reaches end-of-stream. +Tested on hantro G1 (PineTab2 — Rockchip RK3566 silicon, hantro driver via the `rockchip,rk3568-vpu` DT compatible) running bbb_1080p30 H.264 with full sandbox enabled. ENETDOWN gone, libva initializes in the Utility process, `MEDIA_IOC_REQUEST_ALLOC` succeeds, decode reaches end-of-stream. ## Build instructions (Arch / ALARM) diff --git a/phase0_findings_iter7.md b/phase0_findings_iter7.md index 8af072a..f7d790c 100644 --- a/phase0_findings_iter7.md +++ b/phase0_findings_iter7.md @@ -10,7 +10,7 @@ iter6 landed a single fork commit: - `a09c03c` — per-OUTPUT-slot request_fd binding via `MEDIA_REQUEST_IOC_REINIT`. Replaces iter4's `385dee1` close+`media_request_alloc`-per-frame model. Pool size 4 → 16. Slot owns the fd, surface borrows. iter4's case-against-REINIT was confirmed to be a DPB-payload confounder (since fixed in `74d8dd1`). -iter6 dropped MPEG-2 from the carry list (CPU handles it fine on RK3568). +iter6 dropped MPEG-2 from the carry list (CPU handles it fine on the PineTab2 A55 cluster). iter6 carried into iter7: - **msync pixel-correctness verification** (carry from iter5 Phase 5 sonnet C3) @@ -90,7 +90,7 @@ Currently when REINIT or DQBUF fails mid-cycle, the slot stays busy=true until ` ### H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory) -> Open the `fourier-fresnel` campaign — port `libva-v4l2-request-fourier` from ohm RK3568 to fresnel RK3399 (Pinebook Pro). Validates generality of iter1-iter6 fixes on a second hardware target. +> Open the `fourier-fresnel` campaign — port `libva-v4l2-request-fourier` from ohm PineTab2 (RK3566 via hantro/rk3568-vpu) to fresnel RK3399 (Pinebook Pro). Validates generality of iter1-iter6 fixes on a second hardware target. **Stance**: separate top-level campaign, not an iter7 candidate. Charter at `~/src/fourier-fresnel/` once opened. iter5 memory entry `project_followon_campaigns.md` records sequencing: fourier-fresnel before panvk-bifrost. @@ -115,7 +115,7 @@ Currently when REINIT or DQBUF fails mid-cycle, the slot stays busy=true until ` ## State that carries (re-verified iter6 close) -- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky. +- **Hardware**: ohm PineTab2 (Rockchip RK3566 silicon; hantro driver via `rockchip,rk3568-vpu` DT compatible), kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky. - **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3. - **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16). - **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`. diff --git a/phase2_iter7_situation.md b/phase2_iter7_situation.md index d596664..8a10a52 100644 --- a/phase2_iter7_situation.md +++ b/phase2_iter7_situation.md @@ -10,7 +10,7 @@ iter1 had `msync(MS_SYNC | MS_INVALIDATE)` on the CAPTURE buffer mmap after DQBU ### Hypothesis -On hantro G1 / RK3568 / kernel 6.19.10 with CMA-backed DMA contiguous allocator, the V4L2 framework (`videobuf2-dma-contig.c`) does cache sync at DQBUF time for cached mappings. If our CAPTURE mmap is cached, `msync(MS_INVALIDATE)` is structurally redundant. If it's write-combine / uncached, the kernel-side sync is unnecessary. Either way, msync removal should be safe. +On hantro G1 / PineTab2 (RK3566 via rk3568-vpu) / kernel 6.19.10 with CMA-backed DMA contiguous allocator, the V4L2 framework (`videobuf2-dma-contig.c`) does cache sync at DQBUF time for cached mappings. If our CAPTURE mmap is cached, `msync(MS_INVALIDATE)` is structurally redundant. If it's write-combine / uncached, the kernel-side sync is unnecessary. Either way, msync removal should be safe. But this is testable, not just theoretical. The cleanest test: diff --git a/phase8_iteration6_close.md b/phase8_iteration6_close.md index a1ed1b5..5c3c4ba 100644 --- a/phase8_iteration6_close.md +++ b/phase8_iteration6_close.md @@ -40,7 +40,7 @@ Closes GREEN with a single architectural fix. ## State that carries to iter7 (or campaign close) -- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN) — VPN currently flaky. +- **Hardware**: ohm PineTab2 (Rockchip RK3566 silicon; hantro driver via `rockchip,rk3568-vpu` DT compatible), kernel 6.19.10. Access: `ohm` (LAN) — VPN currently flaky. - **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3. - **Driver installed**: `/usr/lib/dri/v4l2_request_drv_video.so` sha256 `ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6` (iter6-end, REINIT discipline + pool=16). - **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`. @@ -78,7 +78,7 @@ Outstanding for upstream-readiness: - Slot-leak error recovery (iter6 carry) - Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A — NOW EXERCISED organically by YT's resolution renegotiations, but a synthetic harness would anchor the claim) -iter1 lock's "H.264 first; MPEG-2 next" backlog item is dropped 2026-05-06: MPEG-2 SD/HD decodes trivially in CPU on RK3568's A55 cluster (well under one core), so the campaign's user audience doesn't need MPEG-2 HW path. If an upstream reviewer asks, the answer is "H.264-only by design — CPU handles MPEG-2 fine on this hardware." +iter1 lock's "H.264 first; MPEG-2 next" backlog item is dropped 2026-05-06: MPEG-2 SD/HD decodes trivially in CPU on PineTab2's A55 cluster (well under one core), so the campaign's user audience doesn't need MPEG-2 HW path. If an upstream reviewer asks, the answer is "H.264-only by design — CPU handles MPEG-2 fine on this hardware." Per `feedback_no_upstream.md`, no PR/MR happens without explicit operator instruction. diff --git a/phase8_iteration7_close.md b/phase8_iteration7_close.md index 0a20146..cb0de15 100644 --- a/phase8_iteration7_close.md +++ b/phase8_iteration7_close.md @@ -46,7 +46,7 @@ Net: ~545 lines added, ~13 lines modified across 6 files. Three test artifacts i ## State that carries to iter8 (or campaign close) -- **Hardware**: ohm RK3568 hantro G1/G2, kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky. +- **Hardware**: ohm PineTab2 (Rockchip RK3566 silicon; hantro driver via `rockchip,rk3568-vpu` DT compatible), kernel 6.19.10. Access: `ohm` (LAN). VPN currently flaky. - **Userspace**: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3. - **Test fixture**: bbb_1080p30_h264.mp4 sha256 `dcf8a7170fbd...`. - **Build container**: firefox-fourier LXD on boltzmann, persistent.