Issue 003 closed: mixed-kernel M4 validates V4 deployment shape
bench_concurrent_mixed runs NEON-N on kernel A + QPU on kernel B concurrently.

Matrix on hertz:
  V3 (CPU MC + QPU MC same-kernel):  CPU 22.64 + QPU 0.39 Mblock/s
  V4 (CPU MC + QPU LPF4):            CPU 27.87 + QPU 12.74 Medge/s
  V1 (CPU MC + NEON-fb CDEF):        CPU 24.49 + 1.75 Mblock/s CDEF
  V2 (CPU LPF4 + NEON-fb CDEF):      CPU 27.28 Medge + 1.70 Mblock/s CDEF

V4 is the daedalus-fourier deployment shape (CPU runs MC; QPU runs LPF4 via cycle 2 GREEN offload). Both substrates productive; CPU MC +23% per-core vs same-kernel V3 control.

Same-kernel M4 in cycles 1-5 was a worst-case contention bound, not a deployment number; user's "5%/50%" framing was correct.

Cycle 3 MC verdict unchanged (QPU MC contributes ~0.4 under any contention); cycle 5 CDEF deferred verdict softened to opportunistic helper (NEON-fallback proxy used since cycle 5 Phase 6 not yet built).

- tests/bench_concurrent_mixed.c (configurable cpu-kernel / qpu-kernel matrix; supports MC, LPF4, LPF8, IDCT real QPU dispatch; CDEF uses NEON-on-core-3 fallback)
- CMakeLists.txt: build target wired with all FFmpeg + dav1d sources
- docs/issues/003-mixed-kernel-m4-bench.md: closure + matrix
- docs/k3_mc_phase7.md: M4 methodology caveat extended with V3/V4
- docs/k5_cdef_phase3_partial.md: deployment recommendation updated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -122,6 +122,27 @@ NEON-3 on kernel-A + QPU on kernel-B concurrently would close the
question. ~½ day of additional bench work; would update the
deployment recipe for cycles 3 + 5 if the result is positive.

### Issue 003 results (2026-05-18, closed)

`bench_concurrent_mixed` matrix in `docs/issues/003-mixed-kernel-m4-bench.md`
confirms the methodology critique:

| QPU side | CPU MC agg | per-core MC | QPU contribution |
|---|---|---|---|
| MC (V3 control, same kernel) | 22.64 Mblock/s | 7.5 avg | 0.39 Mblock/s MC |
| LPF4 real QPU (V4) | **27.87 Mblock/s** | **9.3 avg** | **12.74 Medge/s LPF4** |

Switching QPU off MC (same kernel) onto LPF4 (a different
bandwidth-bound kernel) gave CPU MC **+23 % per-core uplift**.
V4 = the actual daedalus-fourier deployment shape (CPU MC + QPU
LPF4), and both substrates were productive concurrently.
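
The per-core averages and the +23 % figure can be re-derived from the two aggregate numbers in the table. The sketch below is only a cross-check, not part of the bench: the constant and function names are illustrative, and the core count of 3 is an assumption inferred from the "NEON-3" wording and from the table's own averages (22.64 / 3 ≈ 7.5).

```python
# Aggregate CPU MC throughput copied from the V3/V4 rows of the table.
V3_CPU_MC_AGG = 22.64  # Mblock/s, QPU running MC (same kernel)
V4_CPU_MC_AGG = 27.87  # Mblock/s, QPU running LPF4 (different kernel)

# Assumption: three NEON cores run MC while the QPU is driven separately.
NEON_MC_CORES = 3


def per_core(aggregate, cores=NEON_MC_CORES):
    """Average throughput per CPU core for a given aggregate."""
    return aggregate / cores


def uplift_pct(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


v3_core = per_core(V3_CPU_MC_AGG)
v4_core = per_core(V4_CPU_MC_AGG)
uplift = uplift_pct(v3_core, v4_core)

print(f"V3 per-core MC: {v3_core:.1f} Mblock/s")
print(f"V4 per-core MC: {v4_core:.1f} Mblock/s")
print(f"per-core uplift: {uplift:+.0f} %")
```

Because both rows are divided by the same core count, the uplift is the same whether it is computed per core or on the aggregates.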

**Cycle 3 MC verdict unchanged**: QPU MC contributes ~0.4
Mblock/s under any contention scenario (V3, V5). The 4 NEON cores
do MC dramatically better. **MC stays on CPU.** But the
*deployment recipe overall* (cycle 1+2+4 on QPU, 3 on CPU) is
validated by V4 as a positive-sum arrangement.
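
To put the "MC stays on CPU" verdict in numbers, the V3 control row can be reduced to the QPU's share of total MC throughput. This is a rough sketch using only the V3 figures reported for Issue 003; the variable names are illustrative, and the per-core comparison assumes three NEON cores were running MC, which is inferred from the reported 7.5 average rather than stated outright.

```python
# V3 control row: CPU and QPU both running MC concurrently.
CPU_MC_AGG = 22.64  # Mblock/s, aggregate over the NEON cores
QPU_MC = 0.39       # Mblock/s, QPU contribution

# Assumption: three NEON cores produced the CPU aggregate.
NEON_MC_CORES = 3

# Fraction of all MC blocks that the QPU delivered in V3.
qpu_share_pct = QPU_MC / (CPU_MC_AGG + QPU_MC) * 100.0

# How many QPUs' worth of MC a single NEON core delivers.
core_vs_qpu = (CPU_MC_AGG / NEON_MC_CORES) / QPU_MC

print(f"QPU share of MC throughput: {qpu_share_pct:.1f} %")
print(f"one NEON core vs QPU on MC: {core_vs_qpu:.1f}x")
```

A share this small is why the same QPU time is better spent on LPF4, where V4 measured 12.74 Medge/s.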

## Decision per Phase 1 rules + 30fps-floor calibration

| Rule | Result | Status |