# daedalus-fourier

Community-built VP9 / AV1 software-decode back-end running on the VideoCore VII (V3D 7.1) QPUs on Broadcom BCM2712 (Raspberry Pi 5 / Compute Module 5), via the existing Mesa `v3d` userspace driver. ARM keeps the serial entropy front-end; the QPU takes the parallel back-end (inverse transforms, deblocking, CDEF, loop restoration, MC residual add).

> Daedalus built the Labyrinth for King Minos, then escaped from it
> by hand-forging flight firmware out of feathers and wax when no
> sanctioned exit existed.

That's the project shape. The Broadcom-locked VideoCore VII is the Labyrinth; the Pi Foundation's "use the HEVC block and live with software decode for everything else" is the official non-exit; the QPU sits unused inside the labyrinth's walls.

**Status: Phase 0 closed (substrate audit). Phase 1 in progress (first-kernel proof on hertz).** This is research-track work that may take months or may yield a single proof-of-concept kernel that loses to ARM NEON, in which case the negative result ships and the project closes.

## Why this exists

higgs is a Raspberry Pi Compute Module 5 in a small portable chassis with a battery. Watching nerds review *Star Wars* on YouTube while putting Mac Studios into virtual shopping baskets is a core workload for the higgs class of device.

YouTube serves H.264 (legacy), VP9 (typical 4K), and AV1 (newer high-bitrate / high-resolution content). It does not serve HEVC. Pi 5's BCM2712 has one HW decoder block: HEVC. The intersection of {what YouTube serves} ∩ {what BCM2712 decodes in HW} = ∅. Every YouTube frame on higgs today is software-decoded on Cortex-A76 cores at ~50–90% CPU per video stream.

Offloading the parallel back-end of that decode to the otherwise-idle QPU complex *might* recover meaningful CPU time and battery on higgs.
The honest prior, measured in Phase 0, is that the QPU has roughly equal raw compute to the A76 cluster but a smaller slice of the shared LPDDR4x bandwidth, so the win, if any, comes from offloading *concurrent* work the CPU would have done anyway.

The Pi Foundation isn't going to do this work (per their own statement: chromium-patch sustainment was too much; codec sustainment would be more so). The kernel `rpi-hevc-dec` series has been 17 months in review for one decoder block they DID write themselves. Whatever ships here ships through the community.

## Architecture (Path B)

Phase 0 evaluated two paths:

- **Path A: custom VPU firmware on the VC7 scalar cores.** Blocked. BCM2712 has a silicon root of trust: the mask ROM hardcodes RPi's public key and unconditionally verifies the second-stage bootloader. The `EXECUTE_CODE` mailbox is removed on Pi 5. No software-only bypass exists. See `docs/phase0.md §3`.
- **Path B: QPU compute kernels via the existing Mesa `v3d` / DRM / Vulkan-compute path.** This is the path. The QPU is reachable from userspace today on a stock signed Pi 5 / CM5 via `/dev/dri/card0`. No firmware loading. No signing fight. `Idein/py-videocore7` (SGEMM 21 GFLOPS sustained) is the existence proof.

The build:

```
┌───────────────────────────────┐
│  userspace VP9 / AV1 decoder  │
│  (fork of dav1d / libvpx)     │
├───────────────────────────────┤
│  ARM: entropy decode          │  ← Cortex-A76 + NEON
│  (bool coder / range coder)   │    structurally serial
├───────────────────────────────┤
│  QPU: parallel back-end       │  ← V3D 7.1 via Mesa v3dv
│  (IDCT, CDEF,                 │    Vulkan compute shaders
│   deblock, LR, MC)            │    or direct DRM submit
├───────────────────────────────┤
│  V4L2 stateless wrapper       │  ← out-of-tree kernel module
│  (eventual, kernel-agent)     │    exposing /dev/videoN
└───────────────────────────────┘
```

The first deliverable is *not* the V4L2 wrapper. The first deliverable is one back-end kernel running on the QPU, bit-exact against a libavcodec reference, with measured throughput.
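
The acceptance test for that first kernel can be sketched in a few lines. This is a minimal illustration, not the project's harness: `KernelResult`, `first_mismatch`, `phase1_gate` and `min_ratio` are hypothetical names, and it assumes "within 50% of NEON" means QPU throughput of at least half the NEON throughput. The figures in the usage lines are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelResult:
    """Outcome of one kernel run (hypothetical record layout)."""
    name: str
    bit_exact: bool
    qpu_mpix_per_s: float
    neon_mpix_per_s: float


def first_mismatch(candidate: bytes, reference: bytes) -> int:
    """Return the index of the first differing byte, or -1 if identical."""
    for i, (a, b) in enumerate(zip(candidate, reference)):
        if a != b:
            return i
    if len(candidate) != len(reference):
        # Common prefix matches but one buffer is shorter.
        return min(len(candidate), len(reference))
    return -1


def phase1_gate(result: KernelResult, min_ratio: float = 0.5) -> str:
    """Apply the go/no-go rule: bit-exact first, then throughput ratio.

    Assumption: min_ratio=0.5 encodes "within 50% of NEON".
    """
    if result.neon_mpix_per_s <= 0:
        raise ValueError("NEON baseline throughput must be positive")
    if not result.bit_exact:
        return "fail: not bit-exact"
    ratio = result.qpu_mpix_per_s / result.neon_mpix_per_s
    return "continue" if ratio >= min_ratio else "close: negative result"


# Placeholder numbers, for illustration only.
same = first_mismatch(b"\x10\x20\x30", b"\x10\x20\x30") == -1
print(phase1_gate(KernelResult("idct8x8", same, 60.0, 100.0)))
```

Checking bit-exactness before throughput keeps a fast but wrong kernel from ever counting as a pass.
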
If that single kernel can't beat NEON or get within 50% of it, the project closes here with a documented negative result.

## In scope

- A small set of codec back-end kernels (IDCT 8×8, CDEF, deblocking, loop restoration filter, MC interpolation) compiled as SPIR-V compute shaders for Mesa `v3dv`, dispatched via Vulkan compute from userspace.
- A test harness on hertz that runs each kernel against libavcodec reference outputs and measures throughput (megapixels/sec or blocks/sec) against the equivalent NEON path.
- Phase 1 = one kernel, bit-exact, with numbers. Phase 2+ = more kernels only if Phase 1 numbers justify it.

## Out of scope (for now)

- HEVC (Pi 5 has dedicated silicon; `rpi-hevc-dec` covers it).
- Pi 4 / BCM2711 / VideoCore VI. Different ISA, smaller compute budget. Path B *could* extend but isn't the priority.
- Encode. Pi Foundation removed all HW encode in Pi 5; encode on VC7 is a separate, larger project.
- Custom VPU firmware (Path A, blocked by silicon RoT; see `docs/phase0.md`).
- V4L2 stateless driver wrapping the userspace decoder. Eventual consumption point, but Phase 1 lives entirely in userspace.
- Beating ARM NEON unconditionally. The honest target is *concurrent* work: QPU runs while CPU does something else.

## Dev substrate

- **hertz** (Pi 5, 8 GB, Debian Trixie, kernel 6.12.75-rpt-rpi-2712, Mesa 25.0.7 with v3dv, V3D 7.1.7): the dev / test / measurement host. Watchdog-protected for crash recovery. See `docs/vulkaninfo_v3d_7_1_7_hertz.txt` for the inside-view device profile.
- **higgs** (CM5 in portable battery chassis): the eventual user target. Not a dev unit; sealed chassis.

## Conventions

This project follows the 9(+1)-phase dev process. See `docs/dev_process.md`. Phase 0 is closed (`docs/phase0.md`); Phase 1 is `docs/phase1.md`.

Gitea identity: `claude-noether` (per `feedback_gitea_as_claude_noether.md`). No `marfrit` pushes from Claude sessions.

## Layout

```
daedalus-fourier/
├── README.md                 ← this file
├── docs/
│   ├── dev_process.md        ← reference copy of the 9(+1)-phase loop
│   ├── phase0.md             ← substrate audit (evaluates Paths A and B)
│   ├── phase1.md             ← first-kernel goal + measurement plan
│   └── vulkaninfo_v3d_7_1_7_hertz.txt
│                             ← inside-view device profile from hertz
├── src/                      ← kernels + Vulkan dispatch harness
└── tests/                    ← bit-exact vs libavcodec, throughput
```

No build system yet. Adding CMake when the first kernel lands.

## Sibling projects in the same orbit

- `libva-v4l2-request-fourier`: VA-API consumer-side backend. Eventual consumer if daedalus produces a V4L2 stateless node.
- `firefox-fourier`: Firefox fork that routes stateless V4L2 through libavcodec's `v4l2_request` hwaccel. Same pickup point.
- `chromium-fourier`: sibling for Chromium.
- `kernel-agent`: would house the V4L2 driver wrapping the userspace decoder, once one exists.
- `ampere-av1-enablement`: software-side AV1 bring-up on RK3588 (rkvdec / vpu981). Provides the userspace conformance harness daedalus reuses for VC7-AV1 verification.

## Source attribution

Daedalus-the-myth is public domain. The wax-and-feathers metaphor is older than software engineering.

Anyone wanting to fail at this project: please file your failures under `branches/icarus/`. Built-in self-deprecation slot, with honor.