bench_concurrent_mixed runs NEON-N on kernel A + QPU on kernel B
concurrently. Matrix on hertz:
V3 (CPU MC + QPU MC same-kernel): CPU 22.64 + QPU 0.39 Mblock/s
V4 (CPU MC + QPU LPF4): CPU 27.87 + QPU 12.74 Medge/s
V1 (CPU MC + NEON-fb CDEF): CPU 24.49 + 1.75 Mblock/s CDEF
V2 (CPU LPF4 + NEON-fb CDEF): CPU 27.28 Medge + 1.70 Mblock/s
V4 is the daedalus-fourier deployment shape (CPU runs MC; QPU runs
LPF4 via cycle 2 GREEN offload). Both substrates productive; CPU
MC +23% per-core vs same-kernel V3 control. Same-kernel M4 in
cycles 1-5 was a worst-case contention bound, not a deployment
number — user's "5%/50%" framing was correct.
Cycle 3 MC verdict unchanged (QPU MC contributes ~0.4 under any
contention); cycle 5 CDEF deferred verdict softened to
opportunistic helper (NEON-fallback proxy used since cycle 5
Phase 6 not yet built).
- tests/bench_concurrent_mixed.c (configurable cpu-kernel /
qpu-kernel matrix; supports MC, LPF4, LPF8, IDCT real QPU
dispatch; CDEF uses NEON-on-core-3 fallback)
- CMakeLists.txt: build target wired with all FFmpeg + dav1d sources
- docs/issues/003-mixed-kernel-m4-bench.md: closure + matrix
- docs/k3_mc_phase7.md: M4 methodology caveat extended with V3/V4
- docs/k5_cdef_phase3_partial.md: deployment recommendation updated
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>