From 7510b569bf650068c121dfa9b8985feea7a4f794 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sun, 17 May 2026 20:22:26 +0000
Subject: [PATCH] Scaffold: community VC7 codec firmware (empty, Phase 0 pending)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Empty project skeleton + README laying out the wax-and-feathers
rationale (VC7 is programmable silicon sitting unused; YouTube's
VP9 / AV1 codecs map to Pi 5's biggest energy cost; higgs is
battery-powered; saving electrons saves the planet — at least
that seems to be consensus).

Includes:
- README.md with rationale, scope (in/out), conventions,
  5 Phase-0 open questions, layout, sibling-project orbit.
- docs/dev_process.md (mirror of feedback_dev_process.md).
- src/, tests/ empty placeholders.

No build system yet. No PKGBUILD. Phase 0 not started.

Co-Authored-By: Claude Opus 4.7
---
 .gitignore          |   9 +++
 README.md           | 161 ++++++++++++++++++++++++++++++++++++++++++++
 docs/dev_process.md |  96 ++++++++++++++++++++++++++
 src/.gitkeep        |   0
 tests/.gitkeep      |   0
 5 files changed, 266 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 docs/dev_process.md
 create mode 100644 src/.gitkeep
 create mode 100644 tests/.gitkeep

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..4acc5ee
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,9 @@
+# Build artifacts
+build/
+*.o
+*.elf
+*.bin
+*.lst
+# Editor
+*.swp
+.DS_Store
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f172d95
--- /dev/null
+++ b/README.md
@@ -0,0 +1,161 @@
+# daedalus-fourier
+
+Community-built video codec firmware for the VideoCore VII processor
+on Broadcom BCM2712 (Raspberry Pi 5, Compute Module 5). Targets the
+codecs Pi 5's purpose-built hardware doesn't cover — primarily VP9
+and AV1 — by running software decoders on the programmable VPU cores
+already sitting on the die.
+
+> Daedalus built the Labyrinth for King Minos, then escaped from it
+> by hand-forging flight firmware out of feathers and wax when no
+> sanctioned exit existed.
+
+That's the project shape. The Broadcom-locked VideoCore VII is the
+Labyrinth; the Pi Foundation's "use the HEVC block and live with
+software decode for everything else" is the official non-exit; and
+this project is the wax-and-feathers attempt to make the silicon do
+what its instruction set says it can.
+
+**Status: empty scaffold.** Phase 0 not yet started. Don't try to
+build this. Don't expect a milestone schedule. This is research-track
+work that may take years or may turn out structurally impossible.
+
+## Why this exists (rationale)
+
+higgs is a Raspberry Pi Compute Module 5 in a small portable
+chassis with a battery. Watching nerds review *Star Wars* on YouTube
+while putting Mac Studios into virtual shopping baskets is a
+core workload for the higgs class of device.
+
+YouTube serves H.264 (legacy), VP9 (typical 4K), and AV1 (newer
+high-bitrate / high-resolution content). It does not serve HEVC.
+Pi 5's BCM2712 has one HW decoder block: HEVC. The intersection
+of {what YouTube serves} ∩ {what BCM2712 decodes in HW} = ∅.
+
+Every YouTube frame on higgs today is software-decoded on Cortex-A76
+cores at ~50–90% CPU per video stream. Software decode at 1080p VP9
+draws roughly 4–7 W under sustained load on Pi 5. HW decode of the
+same content, if achievable, would draw under 1 W and free the CPU
+for everything else.
+
+For a battery-powered higgs that's a 3–5× session-time improvement.
+For the global ~10M-unit Pi 5 install base watching mostly-VP9
+video, it's measurable embodied-electron savings. Saving electrons
+saves the planet — at least that seems to be consensus.
+
+The Pi Foundation isn't going to do this work (per their own
+statement: chromium-patch sustainment was too much; firmware-codec
+sustainment would be more so). The kernel `rpi-hevc-dec` series has
+been 17 months in review for one decoder block they DID write
+themselves. Whatever ships here ships through the community.
+
+## Scope
+
+### In scope (research-grade, no schedule)
+- **VideoCore VII ISA + toolchain reverse-engineering.** Extending
+  the existing VC4/VC6 community work (Mesa `vc4`/`v3d`, `vc4-asm`,
+  Eric Anholt's docs) to VC7. VC4's ISA is open-sourced; VC6/VC7
+  is mostly extrapolation but the boot path / firmware-load
+  interface is documented enough to load custom code.
+- **AV1 software decoder on VC7.** AV1 is the larger long-term win
+  (newer codec, growing share, no licensing landmines comparable
+  to HEVC). dav1d's reference C implementation is the porting
+  target — significant work but a clean reference exists.
+- **VP9 software decoder on VC7.** Older but currently the dominant
+  4K YouTube codec. libvpx as porting target.
+- **V4L2 stateless / stateful uAPI exposure.** Once firmware
+  decodes, surface it as `/dev/videoNN` so existing userspace
+  (ffmpeg, mpv, firefox-fourier via libavcodec) consumes it
+  identically to rpi-hevc-dec. Reuse libva-v4l2-request-fourier +
+  firefox-fourier plumbing unchanged.
+
+### Out of scope
+- HEVC (Pi 5 has a dedicated HW block; rpi-hevc-dec covers it).
+- Pi 4 / BCM2711 / VideoCore VI. Different ISA, smaller compute
+  budget, likely insufficient for VP9/AV1 even in optimal firmware.
+- Non-Pi targets. VideoCore is Broadcom-Pi-specific silicon.
+- Encode. Pi Foundation explicitly removed all HW encode in Pi 5;
+  firmware encode on VC7 is a separate (larger) project.
+- Replacing the existing GPU firmware. Custom codec firmware
+  coexists with the graphics path; doesn't displace it.
+
+## Conventions
+
+This project follows the same 9(+1)-phase dev process as the rest
+of the fresnel/ampere/firefox-fourier campaigns. See
+`docs/dev_process.md` (mirrored from
+`~/.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md`).
+
+- Phase 0 (substrate / motivation / inventory) opens the chapter.
+- Phase 5 (second-model review) is mandatory before implementation.
+- Phase 8 (package & ship) goes through `marfrit-packages` ALARM.
+- Phase 9 distills lessons into memory entries.
+
+Gitea identity: `claude-noether` (per
+`feedback_gitea_as_claude_noether.md`). No `marfrit` pushes from
+Claude sessions.
+
+## Open questions for Phase 0
+
+Lock these empirically before Phase 1.
+
+1. **Is VideoCore VII's ISA reachable from userspace today?** The
+   Mesa `v3d` driver exposes compute shaders. Can arbitrary
+   bytecode load via that path, or do we need a custom firmware
+   loader through the mailbox interface?
+2. **What's the VC7 instruction throughput envelope?** VC4 hit ~8
+   GFLOPS at full clock. VC7 has 12 cores at higher clock. Need
+   actual benchmark numbers for the kind of integer / SIMD load
+   AV1 / VP9 entropy decode demands. Decode is fundamentally
+   serial in places — does VC7's parallel structure even help?
+3. **Is the boot / firmware-load path documented enough to load
+   community firmware safely?** Pi Foundation publishes some VC
+   binaries (`start.elf`, `fixup.dat`). What's the contract for
+   side-loaded code that wants to coexist with the GPU path?
+4. **What's the state of the prior art?** Search Mesa community,
+   Phoronix archives, Eric Anholt's old posts, Herman Hermitage's
+   VideoCore reverse-engineering. The "v3d firmware decode"
+   concept may have been tried and abandoned for documented
+   reasons we should learn before re-doing.
+5. **Is dav1d / libvpx structurally amenable to GPU-shader-style
+   parallelism, or does it assume conventional CPU execution?**
+   This may be a hard architectural blocker that no amount of
+   firmware brilliance fixes.
+
+## Layout
+
+```
+daedalus-fourier/
+├── README.md              ← this file
+├── docs/
+│   └── dev_process.md     ← reference copy of the 9(+1)-phase loop
+├── src/                   ← empty; VC7 codec firmware lives here
+└── tests/                 ← empty; unit-test rig for individual
+                             decode primitives
+```
+
+No build system yet. No PKGBUILD. Adding those before the first
+real compilation unit is scaffold-creep.
+
+## Sibling projects in the same orbit
+
+- `libva-v4l2-request-fourier` — VA-API consumer-side backend.
+  Would receive daedalus-fourier's decoded surfaces transparently
+  once firmware exposes a V4L2 stateless / stateful node.
+- `firefox-fourier` — Firefox fork that routes stateless V4L2
+  through libavcodec's `v4l2_request` hwaccel. Same pickup point.
+- `chromium-fourier` — sibling for Chromium.
+- `kernel-agent` — manages the kernel-side bring-up. The V4L2
+  driver wrapping daedalus's firmware would land here.
+- `ampere-av1-enablement` — software-side AV1 bring-up on RK3588
+  (rkvdec / vpu981). Provides the userspace conformance harness
+  daedalus would reuse for VC7-AV1 verification.
+
+## Source attribution
+
+Daedalus-the-myth is public domain. The wax-and-feathers
+metaphor is older than software engineering.
+
+Anyone wanting to fail at this project: please file your failures
+under `branches/icarus/`. Built-in self-deprecation slot, with
+honor.
diff --git a/docs/dev_process.md b/docs/dev_process.md
new file mode 100644
index 0000000..ae9715c
--- /dev/null
+++ b/docs/dev_process.md
@@ -0,0 +1,96 @@
+---
+name: Claude-Assisted Development Process (9(+1)-phase loop)
+description: Default workflow for any non-trivial implementation — substrate/motivation/inventory, formulate, analyze, baseline, plan, second-model review, implement, verify, closing (package+ship), memory-update; with explicit loopback edges
+type: feedback
+originSessionId: 83898ac9-e61f-4c44-8429-0154cb12d124
+---
+Markus's standardized loop for our implementation work. Apply by default whenever a task is bigger than a one-liner. Skipping phases is a deliberate choice that should be flagged, not a default.
+
+## Phase 0 — Substrate / Motivation / Inventory
+
+Pre-formulation. Lock the research question and assemble the substrate *before* Phase 1 commits to a measurable goal. Output: a `phase0_findings.md` artifact that future phases can refer back to without re-deriving.
+
+- **Research question + mechanism captured.** State the question in one sentence. Capture any operator-supplied mechanism (the "why this question, how does it work" insight) verbatim — it's the load-bearing claim Phase 1 binds against.
+- **Predecessor carry-over: state vs data.** When a campaign succeeds another, categorize what transfers. *State* (installed packages, governor settings, system tweaks, source-read file:line pointers, protocol designs, parser scripts) carries forward. *Data* (drop counts, perf percentages, threshold values, baseline floors) does not — it is reference history only. Binding cells in this campaign anchor to in-session-acquired numbers, even if the predecessor measured an identical condition.
+- **Tooling and measurement-instrument inventory.** What's installed, what would need installing, what extensions/protocols the live system actually supports. Live verification, not paper compatibility.
+- **In-session baseline anchor.** Re-run the reference rep — N=3 minimum if the baseline is load-bearing for the campaign's premise — *before* any instrument changes. **If the predecessor's reference floor doesn't replicate at N=3 in the same session, that is the campaign result.** Don't build multi-phase infrastructure on an N=1 historical floor. See `feedback_replicate_baseline_first.md`.
+- **Open questions tabled.** What's not known going into Phase 1. Phase 1 locks against the knowns; Phase 0 surfaces the unknowns explicitly so they don't slip into binding cells unverified.
+
+## Phase 1 — Goal Formulation
+Define the objective in measurable terms. State what success looks like *before* touching anything. The chosen metric is a **hypothesis** about what to measure, not an axiom — Phase 3 may invalidate it.
+
+## Phase 2 — Situation Analysis
+Document current state. Identify constraints, dependencies, known failure modes. **Reset context here** — do not carry assumptions from prior sessions; re-read CLAUDE.md, relevant memory files, run `git status`, re-verify reachability.
+
+## Phase 3 — Baseline Measurements
+Take concrete measurements *before* any changes. Paste raw output into DokuWiki at capture time — verbatim, not paraphrased. The Phase 5 artifact is the raw data, not Claude's summary.
+
+**Real data, not theatre.** Phase 3 exists to use AI capacity for absorbing wide, low-level instrumentation a human reader would skim past. Attaching strace / perf / ftrace / eBPF / custom tripwires to the process under test is real Phase 3; scraping mpv's stdout dropped-frame counter is not. Discriminator: if a human with bash and grep could produce the same baseline, it isn't Phase 3 yet — go down to the syscall / call-path / MMIO / register layer. See `feedback_phase3_no_theatre.md`.
+
+**Anti-fabrication:**
+- Every cited value traces to a visible tool invocation or verbatim paste-in. If a measurement wasn't taken, write "not measured" — never an estimate, inference, or recall from training / prior sessions / sibling-host memory.
+- Raw before derived. A derived number (FPS, p99, error rate) appears alongside the raw stream it came from, never alone.
+- Rig failure is the finding. Empty strace, dead UART, perf counter that didn't increment → that *is* the Phase 3 result. Loop back to Phase 2 to fix the rig; do not synthesize plausible-looking baseline data to keep momentum.
+
+- **If baseline reveals the Phase 1 metric was tracking the wrong thing → loop back to Phase 1** with the corrected target. (Example: "max H.264 FPS" Phase 1 metric, but baseline shows DMA-setup + sync overhead dwarfs decode → real metric is bytes-copied-per-second / EGL surface-import time, not FPS.)
+
+**Measurements describe what the system *does*, not what it *should do*.** Baseline data is evidence, not a specification. Do NOT derive API call sequences, struct layouts, or parameter values from observed behaviour (strace, perf, example output). Observable behaviour may reflect bugs, workarounds, or implementation accidents — anything you copy from it inherits those.
+
+## Phase 4 — Plan
+Formulate the approach. Identify what will and will not be touched. State expected outcome of implementation in the *same* measurable terms used in Phase 1/3.
+
+## Phase 5 — Second Model Review
+Goal, situation, measurements, plan get pasted into **DokuWiki**. Markus reviews and redacts, then initiates the handover to a fresh model instance. **Claude does not curate the artifact going to the reviewer** — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.
+
+## Phase 6 — Implementation
+Execute the plan. Scope strictly to what was planned — resist feature creep, refactor-creep, "while I'm here" cleanups, and over-eager scope expansion. If a plan revision is needed mid-implementation, surface it explicitly and re-enter Phase 4.
+
+**Contract before code.** Before writing or modifying any call site:
+- Read the API contract — kernel docs, header comments, and upstream source for every call touched.
+- State the contract explicitly before implementing against it (in the plan, the commit message, or a comment — somewhere reviewable).
+- If the contract cannot be found: stop and surface the gap. Don't infer it from baseline behaviour or sibling code.
+
+**Copying from baseline measurements is not implementation. It is transcription of potentially broken behaviour.** A deliverable that matches baseline bytes but violates the API contract is not a deliverable — it is a deferred bug.
+
+### What "state the contract explicitly" looks like
+
+Worked example: `0012-h264-omit-scaling-matrix-frame-based.patch` in `~/src/ohm_gl_fix/phase6/step1/`. The commit message opens with the contract before any code:
+
+> VAAPI signals "explicit scaling lists are present in the bitstream" implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a `VAIQMatrixBufferH264` alongside `RenderPicture` iff `sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag`. When the bitstream uses default (flat) scaling, no IQMatrixBuffer arrives […]
+>
+> Earlier draft of this patch unconditionally omitted SCALING_MATRIX in FRAME_BASED. That's **corpus-correct** (bbb has no explicit scaling lists) but the **wrong predicate**: the kernel-side gating is by "matrix-supplied vs. not," not by decode mode. […]
+>
+> Contract verification (audit_0008_decode_params_2026-05-01.md + hantro_h264.c::assemble_scaling_list): the kernel uses the supplied matrix when SCALING_MATRIX is in the control batch and falls back to spec-defined defaults when absent. Mode-independent.
+
+What this gets right:
+- **Contract first**: per-control rules cited from kernel doc (`ext-ctrls-codec-stateless.rst:752`), kernel driver (`hantro_h264.c::assemble_scaling_list`), and sibling implementation (gst-plugins-bad commit 9e3e775) — *before* any patch hunks.
+- **Corpus-correct ≠ spec-correct, called out by name**: the rejected predicate ("omit SCALING_MATRIX in FRAME_BASED") *did* match the BBB baseline. It still got rejected, because the contract said the gate is "matrix-supplied vs. not," not "decode mode." This is exactly the Phase 3-derived-implementation trap.
+- **Then** the diff implements one branch per contract clause: SPS/PPS/DECODE_PARAMS always, SCALING_MATRIX iff `matrix_set`, SLICE_PARAMS iff SLICE_BASED, PRED_WEIGHTS iff SLICE_BASED + `V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED`.
+
+Mirror format anywhere reviewable: PR description, commit message body, plan section, or a header comment block. The shape is "contract clauses with citations → code that maps 1:1 to those clauses."
+
+## Phase 7 — Verification Measurements
+Repeat measurements from Phase 3. Compare explicitly against baseline.
+- **If the delta does not match Phase 4's prediction → loop back to Phase 4** (re-plan). Do not declare success when the numbers say otherwise; an unexplained delta is a finding, not a footnote.
+
+## Phase 8 — Closing (Package & Ship)
+Ship the deliverable to its consumption point. Working code that lives only in a checkout is half a deliverable — the next session has to re-discover it, the fleet doesn't get the fix, and the loop's value evaporates.
+
+- **Kernel patch → kernel-agent package.** Route through the kernel-agent flow (`fleet/.yaml` + scope-tagged patches) so the kernel package gets properly built, signed, and published. Don't leave loose `.patch` files in a working tree. See `project_kernel_agent.md` for the manifest shape; `linux-ampere-fourier` and `linux-fresnel-fourier` are the canonical examples.
+- **Program / library change → marfrit-packages.** Add or update a PKGBUILD (Arch/ALARM) or debian/ tree (deb), push to `git.reauktion.de/marfrit/marfrit-packages`, and let `.gitea/workflows/build.yml` produce + sign + publish to `packages.reauktion.de`. See `project_marfrit_packages.md`. Local-only fixes go upstream as PR-quality diffs into the same overlay.
+- **Skipping is a deliberate choice.** If the change is one-shot scratch work (debugging tripwire, throw-away script), say so explicitly in the closing note. The default is: it gets packaged.
+- **Re-verify on the deploy host with the packaged artifact.** A clean Phase 7 result from a hand-rolled dev build (e.g. `meson -Dbuildtype=release && ninja`) is **not** the same as the `.pkg.tar.zst` / `.deb` that the deploy host installs. Distro packaging flags (Arch makepkg's `-O2 + FORTIFY + stack-protector-strong + stack-clash-protection` vs meson's `-O3 -DNDEBUG`, debhelper's hardening defaults, lto toggles) vectorise / unroll loops differently and routinely unmask latent UB the dev build folded away. Pull the published package down via the package manager and re-run the Phase 7 success criterion against it before closing — until that PASSes, the loop is not done. See `feedback_package_build_flags_unmask_bugs.md` for the iter39 incident that codified this.
+
+## Phase 9 — Memory Update
+Loop terminates here. Distill the lesson into a memory entry — what was the mistake the loop caught, what's the rule that would shorten the next cycle. Do not let the lesson rot in chat history.
+
+---
+
+## Loopback edges (summary)
+- Phase 3 → Phase 1 (metric was wrong)
+- Phase 7 → Phase 4 (plan didn't deliver predicted delta)
+- Any phase → Phase 0 (substrate was wrong: predecessor baseline didn't replicate, mechanism doesn't engage on this stack, or the data inverts the premise → re-anchor or honest close)
+- Phase 9 closes the loop
+
+## Why this exists
+Several recurring failures in prior work codify into individual rules — observer-first, simulate-before-flash, three-strikes-then-verify, "trust eyes not vibes," scope-strictly-to-plan, no-fake-dry-run. Those are all symptoms; this loop is the structural fix. Use it as the spine and let those rules show up as rejection patterns inside the appropriate phases.
diff --git a/src/.gitkeep b/src/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/tests/.gitkeep b/tests/.gitkeep
new file mode 100644
index 0000000..e69de29