daedalus-fourier/docs at 373f63a910e5e08db7988af18a49235d9bc48c40 - daedalus-fourier - marfrit's space

marfrit/daedalus-fourier

marfrit 373f63a910 Cycle 8 closed: H.264 deblock R8=0.061 RED, opportunistic helper

Phase 6 deliverable: v3d_h264deblock.comp (132 inst, 4 threads,
no spills). Phase 5 REDs applied:
  RED-1: explicit clamp p1'/q1' to [0,255] before uint8 write
  RED-2: bench-enforced m.x >= 4*stride contract
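A minimal C sketch of what the two Phase 5 fixes guard against. It assumes the standard H.264 normal-filter (bS < 4) update for p1' (q1' is symmetric). It also assumes m.x is the byte offset of the edge's first q0 sample, with the p samples above it at multiples of stride. The helper names are hypothetical and are not taken from v3d_h264deblock.comp or the bench.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Clip3 as defined in the H.264 spec; assumes lo <= hi. */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* RED-1 sketch: p1' for the normal (bS < 4) luma filter, applied when
   |p2 - p0| < beta. ((p2 + avg) >> 1) - p1 equals the spec's
   (p2 + avg - (p1 << 1)) >> 1 but never shifts a negative value.
   tc0 is assumed >= 0. The final clamp keeps the uint8 store in range. */
static uint8_t h264_p1_prime(int p2, int p1, int p0, int q0, int tc0)
{
    const int avg = (p0 + q0 + 1) >> 1;
    const int delta = clip3(-tc0, tc0, ((p2 + avg) >> 1) - p1);
    return (uint8_t)clip3(0, 255, p1 + delta);
}

/* RED-2 sketch (hypothetical): across a horizontal edge the p side spans
   p0..p3 at x - stride .. x - 4*stride (p3 is read by the bS == 4 strong
   filter), so x must be >= 4*stride for that read to stay in the buffer. */
static bool edge_offset_ok(size_t x, size_t stride)
{
    return x >= 4 * stride;
}
```

For example, with p2=100, p1=90, p0=80, q0=120 and tc0=2 the unclipped correction is +10, so tc0 limits p1' to 92.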

M1: 3-way 4096/4096 bit-exact (QPU vs C ref AND vs NEON).
M2: 5.629 Medge/s isolation → R8 = 0.061 RED (predicted 0.09-0.14).
    Lower than predicted: H.264 deblock has 4 early-return paths +
    2 conditional writes, and that branchiness costs more on V3D
    than expected.
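The two measurements above can be sketched in C as follows. M1 is a three-way byte comparison and R is a throughput ratio. The log never spells out how R is computed. This sketch assumes R = M2 (QPU isolation throughput) divided by the NEON baseline M3, taken as the 92 Medge/s listed for k8_h264deblock_phase3.md; the reported numbers fit that reading. All function names are hypothetical, not those of tests/bench_v3d_h264deblock.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* M1 sketch (hypothetical helper): count test cases whose QPU output matches
   BOTH the C reference and the NEON output byte-for-byte. "4096/4096" means
   every case passed both comparisons. */
static size_t m1_three_way_exact(const uint8_t *qpu, const uint8_t *cref,
                                 const uint8_t *neon, size_t n_cases,
                                 size_t case_bytes)
{
    size_t pass = 0;
    for (size_t i = 0; i < n_cases; i++) {
        const size_t off = i * case_bytes;
        if (memcmp(qpu + off, cref + off, case_bytes) == 0 &&
            memcmp(qpu + off, neon + off, case_bytes) == 0)
            pass++;
    }
    return pass;
}

/* ASSUMPTION: R = M2 (QPU isolation, Medge/s) / M3 (NEON baseline, Medge/s).
   Consistent with the numbers in this log, but not stated outright. */
static double r_ratio(double m2_qpu, double m3_neon)
{
    return m2_qpu / m3_neon;
}

/* Inverse: QPU throughput needed to reach a target R on the same baseline. */
static double m2_needed(double r_target, double m3_neon)
{
    return r_target * m3_neon;
}
```

Under that assumption, r_ratio(5.629, 92.0) is about 0.0612, matching the reported R8 = 0.061. The predicted 0.09-0.14 band would have needed roughly 8.3-12.9 Medge/s from the QPU (m2_needed(0.09, 92.0) and m2_needed(0.14, 92.0)).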

M4 same-kernel: NEON-3+QPU 12.81 Medge/s vs pure-NEON-4 ~12-15
  Medge/s (neutral).

M4 MIXED (real H.264 deployment shape): CPU=MC + QPU=h264deblock
  gives CPU MC 25.11 Mblock/s + QPU h264deblock 6.23 Medge/s.
  The QPU contribution does not drop below its 5.629 Medge/s
  isolation figure, so cross-substrate contention is gentle
  (consistent with Issue 003's V4 finding).
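For scale, the QPU's mixed-run throughput relative to its isolation M2 works out as follows, using only the figures in this message:

$$
\frac{6.23 - 5.629}{5.629} = \frac{0.601}{5.629} \approx +10.7\%
$$

In this run, the QPU lost nothing to the concurrent CPU MC work.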

Verdict: H.264 deblock = opportunistic QPU helper. Same recipe
slot as cycle 5 CDEF. 6 Medge/s helper = 85% of single-NEON-core
deblock capacity, available when CPU is busy with other work.

Cycles 1-8 deployment recipe complete:
  Primary QPU: cycles 1+2+4 (VP9 IDCT/LPF, all bandwidth-bound)
  Primary CPU: cycles 3+6+7 (compute-heavy or trivially fast on NEON)
  Opportunistic helper: cycles 5+8 (CDEF, H.264 deblock)
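The recipe above could be encoded as a per-kernel default table like the one below. This is a hypothetical illustration only. The enum, struct and function names are invented here. Kernel labels come from the commit subjects in this listing, and cycle 1 is labelled from the "VP9 IDCT/LPF" note. It is not the project's actual public API.

```c
#include <stddef.h>

/* Hypothetical substrate classes mirroring the three recipe slots. */
typedef enum { SUB_QPU_PRIMARY, SUB_CPU_PRIMARY, SUB_QPU_HELPER } substrate_t;

typedef struct {
    int cycle;
    const char *kernel;
    substrate_t substrate;
} recipe_entry_t;

/* Cycles 1-8 as summarised in this commit message. */
static const recipe_entry_t k_recipe[] = {
    {1, "VP9 IDCT",          SUB_QPU_PRIMARY},
    {2, "LPF (deblocking)",  SUB_QPU_PRIMARY},
    {3, "MC interpolation",  SUB_CPU_PRIMARY},
    {4, "LPF wd=8",          SUB_QPU_PRIMARY},
    {5, "CDEF",              SUB_QPU_HELPER},
    {6, "H.264 IDCT 4x4",    SUB_CPU_PRIMARY},
    {7, "H.264 IDCT 8x8",    SUB_CPU_PRIMARY},
    {8, "H.264 deblock",     SUB_QPU_HELPER},
};

/* Look up the default substrate for a cycle; returns 0 on success,
   -1 if the cycle is not in the table. */
static int recipe_substrate(int cycle, substrate_t *out)
{
    for (size_t i = 0; i < sizeof k_recipe / sizeof k_recipe[0]; i++) {
        if (k_recipe[i].cycle == cycle) {
            *out = k_recipe[i].substrate;
            return 0;
        }
    }
    return -1;
}
```

A table keeps the recipe as data. A later cycle can then change a kernel's default slot without touching dispatch code.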

Phase 9 lessons added:
  - Branchy kernels underperform straight-line ones on V3D
  - Mixed-kernel helper value scales with isolation M2, not
    same-kernel M4
  - R prediction needs a branchiness weight, not just compute density

- src/v3d_h264deblock.comp (132 inst QPU shader)
- tests/bench_v3d_h264deblock.c (3-way M1 + M2 + R classification)
- tests/bench_concurrent_mixed.c extended with K_H264DEBLOCK
- CMakeLists.txt: v3d_h264deblock.spv + bench_v3d_h264deblock
  + h264dsp linked into bench_concurrent_mixed
- docs/k8_h264deblock_phase7.md (full closure with cycles 1-8 recipe)

Next: Phase 8, V4L2 wrapper / deployment infra. The public API
already exposes the recipe-default substrate per kernel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 14:44:21 +00:00

..

Issue 003 closed: mixed-kernel M4 validates V4 deployment shape

2026-05-18 13:44:08 +00:00

dev_process.md

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

k2_deblock_phase1.md

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

k2_deblock_phase2.md

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

k2_deblock_phase3.md

Cycle 2 (deblocking) Phase 1-3: M3'' = 48.285 Medge/s baseline

2026-05-18 12:28:57 +00:00

k2_deblock_phase4.md

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

k2_deblock_phase5.md

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

k2_deblock_phase7.md

Cycle 2 (LPF) closure: M1''=100%, R''=0.41, M4''=+6.9%, PASS

2026-05-18 12:39:26 +00:00

k3_mc_phase1.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase2.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase3.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase4.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase5.md

Cycle 3 (MC interpolation) closure: M1'''=100%, R'''=0.067 RED, M4=-19.5%

2026-05-18 12:51:43 +00:00

k3_mc_phase7.md

Issue 003 closed: mixed-kernel M4 validates V4 deployment shape

2026-05-18 13:44:08 +00:00

k4_lpf8_phase1_3.md

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

k4_lpf8_phase4_7.md

Cycle 4 (LPF wd=8) closure: M1=100%, R=0.34, M4=+4.1%, PASS

2026-05-18 12:56:25 +00:00

k5_cdef_phase1_2.md

Cycle 5 setup (Phase 1+2): vendor dav1d 1.4.3 CDEF sources

2026-05-18 13:12:25 +00:00

k5_cdef_phase3_partial.md

Issue 003 closed: mixed-kernel M4 validates V4 deployment shape

2026-05-18 13:44:08 +00:00

k5_cdef_phase3.md

Cycle 5 Phase 3 closed: M1 PASS via bench pointer-convention fix

2026-05-18 13:46:50 +00:00

k5_cdef_phase4.md

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

k5_cdef_phase7.md

Cycle 5 closed: CDEF QPU R5=0.116 ORANGE, opportunistic helper

2026-05-18 13:52:46 +00:00

k6_h264idct4_phase1.md

Cycle 6 (H.264) opened — IDCT 4x4 Phase 1+3, M3 = 175 Mblock/s

2026-05-18 14:14:43 +00:00

k6_h264idct4_phase3.md

Cycle 6 (H.264) opened — IDCT 4x4 Phase 1+3, M3 = 175 Mblock/s

2026-05-18 14:14:43 +00:00

k6_h264idct4_phase4.md

Cycle 6 closed (deferred Phase 4): IDCT 4x4 too small for QPU

2026-05-18 14:15:25 +00:00

k7_h264idct8_phase1.md

Cycle 7 (H.264 IDCT 8x8) opened — Phase 1 goal doc

2026-05-18 14:15:37 +00:00

k7_h264idct8_phase3_and_4.md

Cycle 7 closed: H.264 IDCT 8x8 = 151 Mblock/s NEON, Phase 4 deferred

2026-05-18 14:16:42 +00:00

k8_h264deblock_phase1.md

Cycle 8 (H.264 deblock) opened — Phase 1 + NEON vendored

2026-05-18 14:18:19 +00:00

k8_h264deblock_phase3.md

Cycle 8 Phase 3 closed: H.264 deblock NEON = 92 Medge/s

2026-05-18 14:39:36 +00:00

k8_h264deblock_phase4.md

Cycle 8 Phase 4: H.264 deblock QPU shader plan

2026-05-18 14:40:07 +00:00

k8_h264deblock_phase7.md

Cycle 8 closed: H.264 deblock R8=0.061 RED, opportunistic helper

2026-05-18 14:44:21 +00:00

phase0.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase1.md

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

phase2.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase3.md

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00

phase4.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase5.md

Phase 4 plan + Phase 5 second-model review (PASS-WITH-REVISIONS)

2026-05-18 11:47:03 +00:00

phase7_M4.md

Phase 7 M4: mixed CPU+QPU beats pure 4-core NEON; project continues

2026-05-18 12:18:36 +00:00

phase7.md

Phase 6 (v1+v4 production) + Phase 7 closure: R = 0.92 ± 0.03 on hertz

2026-05-18 12:09:00 +00:00

phase8_scoping.md

Phase 8 skeleton: public C API + first end-to-end smoke test

2026-05-18 13:54:43 +00:00

vulkaninfo_v3d_7_1_7_hertz.txt

Path B pivot + Phase 0-3 closed with first baseline numbers

2026-05-18 11:30:12 +00:00