diff --git a/DESIGN.md b/DESIGN.md index d59a5b3..1e2caff 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -215,25 +215,31 @@ That's a substantial shader inventory. Each requires bit-exact M1 gate against **Phase 4 โ€” Production-ready deblock + perf optimization + libva integration** (+4 weeks). Real-world stream conformance. Plug into daedalus-v4l2 daemon as the actual decode backend. -**Total budget:** 4-6 months. +**Total H.264 budget:** 4-6 months. + +**Phase 5+ (future codec scope, not committed):** VP9 and AV1 reuse the same frame-level dispatch architecture, daedalus-fourier kernel pack, and DPB plumbing. Per ยง9.7, they are deferred but *not firmly out-of-scope*. HEVC stays firmly out (Pi 5 has `rpi-hevc-dec` for that). --- -## 9. Open questions +## 9. Phase 1 decisions -1. **Intra prediction strategy:** GPU wavefront (~187 dispatches, more complex) vs CPU speculative (simpler, slower). Plan: wavefront in Phase 1; revisit if it's the perf bottleneck. +User-confirmed 2026-05-24. All seven questions from the initial +draft are now decided; this section preserves the original wording +of each item for traceability. -2. **libavcodec intercept granularity:** macroblock-level (substitution-arc evolution) vs slice-level (cleaner rewrite). Plan: macroblock-level for Phase 1; consider slice-level later if buffer accumulation overhead is non-trivial. +1. **Intra prediction strategy:** GPU wavefront (~187 dispatches, more complex) vs CPU speculative (simpler, slower). **Decision: wavefront in Phase 1; revisit if it's the perf bottleneck.** -3. **Shader parameterization:** 16 qpel variants as 16 shaders, or one parameterized shader with switch on mc_position? V3D's compiler might inline-optimize either; needs measurement. +2. **libavcodec intercept granularity:** macroblock-level (substitution-arc evolution) vs slice-level (cleaner rewrite). **Decision: macroblock-level for Phase 1; consider slice-level later if buffer accumulation overhead is non-trivial.** -4. **DPB allocation:** Vulkan-native VkImage with dmabuf export, vs CPU-allocated dma_buf imported into Vulkan. Affects V4L2 integration story. Plan: Vulkan-native with `VK_KHR_external_memory_dma_buf` export; daedalus-v4l2 daemon imports. +3. **Shader parameterization:** 16 qpel variants as 16 shaders, or one parameterized shader with switch on mc_position? **Decision: measure both during Phase 2 (the MC phase) and pick the winner. No commit ahead of measurement.** -5. **Daemon integration shape:** does daedalus-decoder ship as a static library the daemon links, or as a separate process the daemon talks to? Library, almost certainly โ€” process boundary would multiply IPC cost. +4. **DPB allocation:** Vulkan-native VkImage with dmabuf export, vs CPU-allocated dma_buf imported into Vulkan. **Decision: Vulkan-native with `VK_KHR_external_memory_dma_buf` export; daedalus-v4l2 daemon imports.** -6. **Build dependency on daedalus-fourier:** as a CMake `find_package`, or vendored? `find_package`, pinned to a tagged release. daedalus-fourier becomes the "kernel pack" upstream library. +5. **Daemon integration shape:** static library the daemon links, or separate process. **Decision: library link.** -7. **Out-of-scope for daedalus-decoder (firmly):** VP9, AV1, HEVC (Pi 5 has rpi-hevc-dec for that), 10-bit, interlaced, FMO/ASO. +6. **Build dependency on daedalus-fourier:** CMake `find_package`, or vendored? **Decision: `find_package`, pinned to a tagged release. daedalus-fourier becomes the "kernel pack" upstream library.** + +7. **Codec scope.** **Decision: firmly out-of-scope for daedalus-decoder are HEVC (Pi 5 has `rpi-hevc-dec` for that), 10-bit, interlaced, and FMO/ASO.** VP9 and AV1 are *not* firmly out โ€” they're future codec scope for the same framework after H.264 lands. This is a scope expansion from the initial draft, which had grouped them with HEVC under "firmly out". ---