# Issue 001 — VP9 LPF wd=16 cycle (prediction validation)

**Status**: open, not blocking
**Type**: kernel-cycle (cycle 5 candidate)
**Predicted verdict**: RED (M4 likely negative, per cycle 4 lesson 4)
**Priority**: low (incremental; trend prediction)
**Filed**: 2026-05-18

## Background

Cycle 4 (LPF wd=8) closed PASS with an M4 delta of +4.1 %, down from the +6.9 % of cycle 2 (wd=4). This downward trend prompted Phase 9 lesson 4: "wd=16 would probably show further R degradation; M4 may flip negative based on the trend line." See `docs/k4_lpf8_phase4_7.md §"Phase 9 lessons"`.

This issue tracks the experiment that validates (or invalidates) that prediction.

## What to do

Run cycle 5 as LPF wd=16, mirroring cycle 4's compact structure:

1. **Phase 3**: build `tests/bench_neon_lpf16.c`, modelled on `bench_neon_lpf8.c`. The NEON symbol is `ff_vp9_loop_filter_h_16_16_neon` (already in the vendored `vp9lpf_neon.S`). Capture M3.
2. **Phases 4-7**: write `src/v3d_lpf_h_16_16.comp`, extending the wd=8 kernel with the wd=16 outer-flat path (`flat8out` test; 14 writes per row when both `flat8out` and `flat8in` pass). New contract: `dst_stride_u8 ≥ 14` (vs cycle 4's ≥ 6), because the flat8out path writes at `base-7..base+6` (14 contiguous bytes).
3. **Phase 5 review**: mandatory. wd=16 is far less incremental than wd=8 (much larger conditional logic, plus a new contract bound).
4. **Phase 7**: measure M2 and R. If M4 comes out negative as predicted, document the trend confirmation and close the kernel as "CPU-only" in the deployment recipe.
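To make the wd=16 delta over wd=8 concrete, here is a minimal scalar C sketch of one row of the outer-flat path, e.g. as a host-side reference when checking M1 bit-exactness. It assumes 8-bit pixels (flatness threshold 1), a row pointer at q0 (the kernel's `base`), and a filter mask already computed by the wd=8 logic, which it does not reproduce; the hev/filter8/filter4 fallbacks are omitted. The taps are intended to mirror the VP9 15-tap smoothing filter as in libvpx's `filter16` (centre weight 2, p7/q7 replicated past the ends, rounded `>> 4`). All function names here (`flat8in`, `flat8out`, `filter16_flat`, `lpf16_row`) are hypothetical and not taken from `src/v3d_lpf_h_16_16.comp`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for ONE row of the wd=16 outer-flat path.
 * `q0` points at the first pixel right of the vertical edge (the kernel's
 * `base`): p_i lives at q0[-1 - i], q_i at q0[i]. 8-bit pixels assumed. */

/* flat8in: p1..p3 within 1 of p0 and q1..q3 within 1 of q0. */
static int flat8in(const uint8_t *q0)
{
    for (int i = 1; i <= 3; i++) {
        if (abs(q0[-1 - i] - q0[-1]) > 1) return 0;
        if (abs(q0[i] - q0[0]) > 1) return 0;
    }
    return 1;
}

/* flat8out: p4..p7 within 1 of p0 and q4..q7 within 1 of q0. */
static int flat8out(const uint8_t *q0)
{
    for (int i = 4; i <= 7; i++) {
        if (abs(q0[-1 - i] - q0[-1]) > 1) return 0;
        if (abs(q0[i] - q0[0]) > 1) return 0;
    }
    return 1;
}

/* 15-tap smoothing: each of p6..q6 becomes (window sum + centre + 8) >> 4,
 * with p7/q7 replicated past the ends. Reads q0[-8..7], writes q0[-7..6]. */
static void filter16_flat(uint8_t *q0)
{
    uint8_t in[16]; /* in[0] = p7 ... in[7] = p0, in[8] = q0 ... in[15] = q7 */
    for (int j = 0; j < 16; j++)
        in[j] = q0[j - 8];

    for (int k = 1; k <= 14; k++) { /* k = 1 -> p6, k = 14 -> q6 */
        int sum = in[k] + 8;        /* extra centre tap (weight 2) + rounding */
        for (int j = k - 7; j <= k + 7; j++) {
            int c = j < 0 ? 0 : (j > 15 ? 15 : j); /* clamp to p7 / q7 */
            sum += in[c];
        }
        q0[k - 8] = (uint8_t)(sum >> 4); /* max (16*255 + 8) >> 4 = 255 */
    }
}

/* Returns 1 if the 14-byte outer-flat write happened, 0 otherwise.
 * `mask` stands in for the wd=8 filter-mask result (not computed here). */
static int lpf16_row(uint8_t *q0, int mask)
{
    if (mask && flat8in(q0) && flat8out(q0)) {
        filter16_flat(q0);
        return 1;
    }
    return 0; /* caller would fall back to the wd=8 / wd=4 paths (omitted) */
}
```

Note that the sketch reads 16 bytes (`q0[-8..7]`) but writes only the 14 contiguous bytes `q0[-7..6]`, which is the span behind the `dst_stride_u8 ≥ 14` contract.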
## Expected outcome (per prediction)

| Quantity | Predicted |
|---|---|
| M1 bit-exact | 100 % (same pattern as cycles 2/4) |
| M3 NEON | ~55 Medge/s (slightly faster than wd=8) |
| M2 QPU isolation | ~12-15 Medge/s |
| R isolation | 0.22-0.27 (ORANGE, downward) |
| M4 mixed vs NEON-4 | -2 % to +1 % (borderline; likely negative) |
| 30fps margin | still 5×+ (user-facing PASS regardless) |

## Acceptance criteria (issue closed when)

- Cycle 5 phases 1-7 complete and committed
- `docs/k5_lpf16_phase*.md` produced
- Phase 7 verdict documented and deployment recipe updated, either way
- Phase 9 lesson 4 trend prediction validated or refuted

## Why deferred (not done in current session)

The session goal was "continue until user intervention necessary." The user directed that this be filed as an issue and that work proceed to cycle 5 CDEF instead. The trend prediction is interesting, but the project's deployment recipe is already locked through cycle 4; a wd=16 result would update at most one row of the recipe table.

## Related

- `docs/k4_lpf8_phase4_7.md §"Phase 9 lessons"` lesson 4 (the prediction this issue validates or refutes)
- `external/ffmpeg-snapshot/libavcodec/aarch64/vp9lpf_neon.S` (NEON reference, already vendored; symbol `ff_vp9_loop_filter_h_16_16_neon`)
- `docs/k2_deblock_phase4.md` (cycle 2 template)
- `docs/k4_lpf8_phase4_7.md` (cycle 4 template, the most direct reference)