Cycle 8 Phase 3 closed: H.264 deblock NEON = 92 Medge/s
M1: 10000/10000 bit-exact, after an orientation fix: ff_h264_v_loop_filter is "vertical filtering of horizontal edges", not "vertical edge". The 16 columns span the edge horizontally, with 8 rows of vertical context.
M3: 91.947 Medge/s per core, i.e. 10.9 ns per edge. That is 11x the worst-case 1080p30 floor and 30x the realistic floor. The filter triggers on 25% of edges (random alpha/beta/tc0 covers both gating paths).

Cycle 8 Phase 9 lesson: H.264/FFmpeg "v_loop_filter" naming uses the filter DIRECTION (vertical), not the edge orientation. The edge is horizontal; the filter operates vertically across it. Distinct from cycle 6's column-major-block lesson, but a related discovery pattern. Encoded for future cycles.

R8 prediction revised: 0.09-0.14 ORANGE (down from Phase 1's 0.3-0.8 estimate). H.264 deblock is 2x faster on NEON than VP9 LPF wd=4 (cycle 2), but it has more per-edge branches, which hurt the QPU more.

Worth building anyway:
- ORANGE falls in cycle 1's "M4 may rescue" band
- Mixed-kernel deployment helper value (Issue 003) matters more than R in isolation
- The 25% trigger rate gives a 4x effective contribution multiplier on the QPU side

- tests/h264_deblock_ref.c (column-walking C ref per row segment)
- tests/bench_neon_h264deblock.c (M1 + M3 bench)
- CMakeLists.txt: cycle 8 NEON bench wiring + h264dsp_neon.S
- docs/k8_h264deblock_phase3.md (closure)

Next: Phase 4 (QPU shader plan), Phase 5 (Sonnet review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
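The orientation point is easiest to see in scalar form. Below is a minimal C sketch of the normal-strength (bS < 4) luma filter applied to a horizontal 16-pixel edge, following the standard H.264 filter equations (the same shape as FFmpeg's C template). The loop walks the 16 columns along the edge, while every tap moves by `stride`, i.e. vertically across it. The function name `h264_v_deblock_luma_sketch` is hypothetical and this is not the contents of tests/h264_deblock_ref.c; `tc0` is assumed to hold one value per 4-column segment, with a negative value meaning "skip".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Clamp v into [lo, hi]. */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hypothetical sketch: normal (bS < 4) H.264 luma deblock of a HORIZONTAL
 * 16-pixel edge. pix points at row q0 (first row below the edge);
 * p0, p1, p2 sit at pix - stride, pix - 2*stride, pix - 3*stride.
 * tc0[i] covers columns 4*i .. 4*i+3; tc0[i] < 0 skips that segment.
 * The loop walks columns (along the edge) while every tap moves by
 * `stride` (across the edge), so the filtering direction is vertical.
 */
static void h264_v_deblock_luma_sketch(uint8_t *pix, ptrdiff_t stride,
                                       int alpha, int beta,
                                       const int8_t tc0[4])
{
    for (int x = 0; x < 16; x++) {
        const int tc_orig = tc0[x >> 2];
        if (tc_orig < 0)
            continue;

        uint8_t *c = pix + x;                     /* column x, row q0 */
        const int p0 = c[-stride], p1 = c[-2 * stride], p2 = c[-3 * stride];
        const int q0 = c[0],       q1 = c[stride],      q2 = c[2 * stride];

        /* Gate: filter only steps small enough to be coding artifacts. */
        if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
            continue;

        int tc = tc_orig;
        if (abs(p2 - p0) < beta) {                /* smooth p side: nudge p1 */
            if (tc_orig)
                c[-2 * stride] = (uint8_t)(p1 + clip3(-tc_orig, tc_orig,
                                 ((p2 + ((p0 + q0 + 1) >> 1)) >> 1) - p1));
            tc++;
        }
        if (abs(q2 - q0) < beta) {                /* smooth q side: nudge q1 */
            if (tc_orig)
                c[stride] = (uint8_t)(q1 + clip3(-tc_orig, tc_orig,
                            ((q2 + ((p0 + q0 + 1) >> 1)) >> 1) - q1));
            tc++;
        }

        /* Main edge correction on p0/q0, clamped to the 8-bit range. */
        const int delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3);
        c[-stride] = (uint8_t)clip3(0, 255, p0 + delta);
        c[0]       = (uint8_t)clip3(0, 255, q0 - delta);
    }
}
```

Swapping the two steps (taps moving by 1 along a row, loop stepping down 16 rows by `stride`) gives the vertical-edge variant, which is exactly the mix-up the orientation fix corrected.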
@@ -120,6 +120,21 @@ add_executable(bench_neon_h264idct8
 )
 target_compile_options(bench_neon_h264idct8 PRIVATE -O3 -march=armv8-a+simd)
 
+# Cycle 8 — H.264 luma vertical deblock NEON M3 baseline bench.
+set(FFASM_H264DSP_SOURCES
+${FFSNAP}/libavcodec/aarch64/h264dsp_neon.S
+)
+set_source_files_properties(${FFASM_H264DSP_SOURCES} PROPERTIES
+COMPILE_OPTIONS "${FFASM_FLAGS}"
+LANGUAGE ASM)
+
+add_executable(bench_neon_h264deblock
+tests/bench_neon_h264deblock.c
+tests/h264_deblock_ref.c
+${FFASM_H264DSP_SOURCES}
+)
+target_compile_options(bench_neon_h264deblock PRIVATE -O3 -march=armv8-a+simd)
+
 add_executable(bench_neon_idct
 tests/bench_neon_idct.c
 tests/vp9_idct8_ref.c
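The bench target above links the NEON assembly against the C reference, and its M1 and M3 numbers reduce to a compare-and-count loop plus a unit conversion. Here is a minimal C sketch of both, assuming the `(pix, stride, alpha, beta, tc0)` hook shape FFmpeg uses for its H.264 luma loop filters. `deblock_fn`, `m1_bit_exact`, `ns_per_edge`, the xorshift32 generator, and the parameter ranges are all hypothetical and not taken from tests/bench_neon_h264deblock.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed hook shape, matching FFmpeg's H.264 luma loop-filter pointers:
 * (pix, stride, alpha, beta, tc0). */
typedef void (*deblock_fn)(uint8_t *pix, ptrdiff_t stride,
                           int alpha, int beta, int8_t *tc0);

/* xorshift32: tiny deterministic PRNG so a failing trial can be replayed
 * from its seed. The seed must be nonzero. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/*
 * M1-style check (hypothetical helper): run `ref` and `dut` on identical
 * random 16x8 blocks (4 rows above the edge, 4 below) with random
 * alpha/beta/tc0, and return how many trials match byte for byte.
 * *triggered counts trials where `ref` changed at least one pixel.
 */
static int m1_bit_exact(deblock_fn ref, deblock_fn dut, int trials,
                        uint32_t seed, int *triggered)
{
    uint8_t orig[8 * 16], a[8 * 16], b[8 * 16];
    int pass = 0;
    *triggered = 0;
    for (int t = 0; t < trials; t++) {
        for (size_t i = 0; i < sizeof orig; i++)
            orig[i] = (uint8_t)xorshift32(&seed);
        memcpy(a, orig, sizeof a);
        memcpy(b, orig, sizeof b);

        /* Parameter ranges assumed from the 8-bit H.264 table maxima. */
        const int alpha = (int)(xorshift32(&seed) % 256);   /* 0..255 */
        const int beta  = (int)(xorshift32(&seed) % 19);    /* 0..18  */
        int8_t tc_a[4], tc_b[4];
        for (int i = 0; i < 4; i++)                          /* -1..25 */
            tc_a[i] = tc_b[i] = (int8_t)((int)(xorshift32(&seed) % 27) - 1);

        ref(a + 4 * 16, 16, alpha, beta, tc_a);  /* pix points at row q0 */
        dut(b + 4 * 16, 16, alpha, beta, tc_b);

        pass += memcmp(a, b, sizeof a) == 0;
        *triggered += memcmp(a, orig, sizeof a) != 0;
    }
    return pass;
}

/* M3 unit conversion: 1e9 ns/s / (medge * 1e6 edges/s) = 1000 / medge. */
static double ns_per_edge(double medge_per_s)
{
    return 1000.0 / medge_per_s;
}
```

For example, `ns_per_edge(91.947)` comes out at about 10.876, which rounds to the 10.9 ns quoted above. Calling `m1_bit_exact(ref, neon, 10000, seed, &trig)` with the NEON routine declared by hand (for instance FFmpeg's `ff_h264_v_loop_filter_luma_neon`) would give an M1-style pass count. Uniformly random pixels rarely pass the beta gates, so a real bench likely shapes its input to reach a given trigger rate.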