claude-noether 4182b32adf design: optional Stage 5 NV12 → RGBA conversion
User question 2026-05-23: 'Wayland does need a conversion of NV12 to
its output format. Could we cram that in?'

Yes — trivially.  Added Stage 5 to the pipeline doc with:

  - 5-line per-pixel compute shader (BT.709 limited-range example
    given; matrix selected from H.264 VUI at runtime)
  - explicit OPT-IN flag, off by default
  - rationale for default-off: most consumers (V4L2 stateless,
    Wayland zwp_linux_dmabuf NV12 passthrough, Firefox/mpv VAAPI
    paths) want NV12 because compositors convert during composition
    essentially for free.  RGBA8 is about 2.7x the bandwidth of NV12
    (4 bytes/pixel versus 1.5), so it is not worth burning DMA +
    electrons when no downstream needs it
  - colourspace metadata plumbing requirement: SPS vui_parameters
    (colour_primaries, transfer_characteristics, matrix_coefficients,
    video_full_range_flag) MUST flow through to the shader; default
    BT.709 limited-range with warning if VUI absent
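
A minimal CPU-side sketch of the per-pixel math and the VUI fallback described above, written in Python for readability. It is not the shader itself. The function names, the dict-based VUI representation, and the tightly packed NV12 layout (stride equal to width, even dimensions) are assumptions made for illustration; only the VUI field names and the BT.709 limited-range default come from the notes above.

```python
import warnings

# (Kr, Kb) luma coefficients keyed by the H.264 VUI matrix_coefficients code.
LUMA_COEFFS = {
    1: (0.2126, 0.0722),  # BT.709
    5: (0.299, 0.114),    # BT.470BG (BT.601, 625-line)
    6: (0.299, 0.114),    # SMPTE 170M (BT.601, 525-line)
    9: (0.2627, 0.0593),  # BT.2020 non-constant luminance
}
DEFAULT_MATRIX = 1  # BT.709, the default named in the design notes


def select_colour_params(vui):
    """Return (kr, kb, full_range) from a VUI dict, or the default if absent.

    `vui` is a hypothetical dict holding parsed SPS vui_parameters fields.
    """
    if vui is None:
        warnings.warn("VUI absent; assuming BT.709 limited range")
        return (*LUMA_COEFFS[DEFAULT_MATRIX], False)
    mc = vui.get("matrix_coefficients", 2)  # 2 means "unspecified"
    if mc not in LUMA_COEFFS:
        warnings.warn("unhandled matrix_coefficients; assuming BT.709")
        mc = DEFAULT_MATRIX
    return (*LUMA_COEFFS[mc], bool(vui.get("video_full_range_flag", 0)))


def ycbcr_to_rgba(y, cb, cr, kr, kb, full_range):
    """Convert one 8-bit Y'CbCr sample to an 8-bit (R, G, B, A) tuple."""
    kg = 1.0 - kr - kb
    if full_range:
        yn, cbn, crn = y / 255.0, (cb - 128) / 255.0, (cr - 128) / 255.0
    else:
        # Limited range: luma spans 16..235, chroma spans 16..240.
        yn, cbn, crn = (y - 16) / 219.0, (cb - 128) / 224.0, (cr - 128) / 224.0
    r = yn + 2.0 * (1.0 - kr) * crn
    b = yn + 2.0 * (1.0 - kb) * cbn
    g = (yn - kr * r - kb * b) / kg

    def to8(v):
        # Scale to 0..255 and clamp out-of-gamut values.
        return max(0, min(255, int(round(v * 255.0))))

    return (to8(r), to8(g), to8(b), 255)


def nv12_to_rgba(nv12, width, height, vui=None):
    """Convert a tightly packed NV12 buffer to RGBA8 bytes."""
    kr, kb, full = select_colour_params(vui)
    uv_base = width * height  # interleaved CbCr plane follows the Y plane
    out = bytearray(width * height * 4)
    for py in range(height):
        for px in range(width):
            y = nv12[py * width + px]
            # Each 2x2 block of luma samples shares one (Cb, Cr) pair.
            uv = uv_base + (py // 2) * width + (px // 2) * 2
            rgba = ycbcr_to_rgba(y, nv12[uv], nv12[uv + 1], kr, kb, full)
            o = (py * width + px) * 4
            out[o:o + 4] = bytes(rgba)
    return bytes(out)
```

In a compute shader the coefficient lookup would presumably happen once on the host and be passed in as a small uniform or push constant, leaving only the arithmetic in `ycbcr_to_rgba` per invocation.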

Updated the new-shader inventory to include v3d_h264_yuv_to_rgba.
Total dispatches/frame remains ~190-200; Stage 5, when enabled, adds one.
2026-05-23 22:46:45 +02:00

daedalus-decoder

Frame-level GPU H.264 decoder for Raspberry Pi 5 / V3D7. Design phase — not implemented yet.

The objective: build the NVDEC-equivalent shape on Pi 5. One Vulkan submit per frame, one fence wait per frame, encoded H.264 bitstream in, NV12 frame out. Reuses daedalus-fourier's V3D compute primitives at the right granularity — not the per-block-call granularity that the kernel-substitution prototype exposed as architecturally wrong.

Sibling projects:

  • daedalus-fourier — V3D + NEON kernel pack (IDCT, MC, deblock primitives). Stays as a research/microbench artifact.
  • daedalus-v4l2 — V4L2 stateless decoder shim + userspace daemon for Pi 5. The eventual consumer of this decoder.
  • libva-v4l2-request-fourier — VAAPI ↔ V4L2 stateless bridge. End consumer.

See DESIGN.md for the architecture sketch.

Description
Frame-level GPU H.264 decoder for Raspberry Pi 5 V3D7. NVDEC-shaped pipeline (encoded bitstream in, NV12 out, one Vulkan submit per frame) built on daedalus-fourier's V3D compute primitives. Phase 1 design exploration.