daedalus-fourier/docs/issues/001-lpf-wd-16-prediction-validation.md
marfrit 20e3d004ae Issues 001+002: defer LPF wd=16 + LPF vertical variants
Per user direction at cycle-4 close: file wd=16 (trend prediction
validation) and vertical variants (column-stride TMU behaviour
unknown) as local issues for future cycles. Progress instead to
CDEF (AV1) for codec breadth.

docs/issues/001 — wd=16 prediction validation. Per cycle 4 lesson 4,
trend says wd=16 likely flips M4 negative. Quick incremental cycle
when revisited.

docs/issues/002 — vertical variants. Different memory access pattern
(column-strided vs row-strided). The load-bearing unknown is
whether the cycle 2 +6.9% mixed gain survives the TMU coalescing
shift. If positive, deployment recipe gains symmetry; if negative,
must split by orientation.

Both issues have acceptance criteria + expected outcomes documented.

Cycle 5 next: CDEF (AV1) — codec-breadth expansion.

No Gitea repo exists for daedalus-fourier yet (project is local-
only). If a tracker is wanted, create the repo and migrate these
.md files. For now they live in-tree as part of the project history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 13:09:51 +00:00


Issue 001 — VP9 LPF wd=16 cycle (prediction validation)

Status: open, not blocking
Type: kernel-cycle (cycle 5 candidate)
Predicted verdict: RED (M4 likely negative, per cycle 4 lesson 4)
Priority: low (incremental; trend prediction)
Filed: 2026-05-18

Background

Cycle 4 (LPF wd=8) closed PASS with an M4 delta of +4.1 %, down from +6.9 % for cycle 2 (wd=4). That downward trend prompted Phase 9 lesson 4: "wd=16 would probably show further R degradation; M4 may flip negative based on the trend line." See docs/k4_lpf8_phase4_7.md §"Phase 9 lessons".

This issue tracks the experiment to validate (or invalidate) that prediction.
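As a rough illustration of what "the trend line" implies, the two measured M4 deltas can be extrapolated to wd=16. This is only a sketch: two points do not determine the shape of the trend, the choice of x-axis (wd itself, or log2(wd), i.e. a constant drop per doubling) is an assumption, and the function names are hypothetical.

```c
/* Linear extrapolation through (x0, y0) and (x1, y1), evaluated at x. */
static double extrapolate(double x0, double y0,
                          double x1, double y1, double x)
{
    double slope = (y1 - y0) / (x1 - x0);
    return y1 + slope * (x - x1);
}

/* Predicted M4 delta (percent) at wd=16, assuming M4 is linear in wd.
 * Inputs are the measured deltas: wd=4 -> +6.9 %, wd=8 -> +4.1 %. */
static double predict_m4_linear_wd(void)
{
    return extrapolate(4.0, 6.9, 8.0, 4.1, 16.0);
}

/* Same prediction, assuming M4 is linear in log2(wd):
 * log2(4) = 2, log2(8) = 3, log2(16) = 4. */
static double predict_m4_linear_log2(void)
{
    return extrapolate(2.0, 6.9, 3.0, 4.1, 4.0);
}
```

Under the linear-in-wd assumption the extrapolated delta is about -1.5 %; under the per-doubling assumption it is about +1.3 %. The sign therefore depends on which model is closer to reality, which is what the experiment is meant to settle.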

What to do

Cycle 5 LPF wd=16, mirroring cycle 4's compact structure:

  1. Phase 3: build tests/bench_neon_lpf16.c modelled on bench_neon_lpf8.c. NEON symbol: ff_vp9_loop_filter_h_16_16_neon (already in vendored vp9lpf_neon.S). Capture M3.
  2. Phase 4-7: write src/v3d_lpf_h_16_16.comp extending the wd=8 kernel with the wd=16 outer-flat path (flat8out test, 14 writes per row when both flat8out and flat8in pass). New contract: dst_stride_u8 ≥ 14 (vs cycle 4's ≥ 6) because the flat8out path writes at base-7..base+6 (14 contiguous bytes).
  3. Phase 5 review: mandatory — wd=16 is not as incremental as wd=8 (much larger conditional logic, new contract bound).
  4. Phase 7: measure M2, R; if M4 negative as predicted, document trend confirmation and close kernel as "CPU-only" in deployment recipe.
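The new contract bound and the wide-path condition from step 2 can be sketched in scalar C. This is not the kernel or the NEON reference: the helper names are hypothetical, the row layout (p7..p0 at row[-8..-1], q0..q7 at row[0..7]) is an assumption matching the base-7..base+6 write span above, and the flatness threshold of 1 is assumed from the usual 8-bit VP9 scalar filter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Widest write span of the wd=16 path: base-7 .. base+6 (p6..q6). */
enum { LPF16_WRITE_LO = -7, LPF16_WRITE_HI = 6 };
enum { LPF16_WRITE_BYTES = LPF16_WRITE_HI - LPF16_WRITE_LO + 1 }; /* 14 */

/* Contract check: dst_stride_u8 must cover the 14-byte write span. */
static bool lpf16_stride_ok(size_t dst_stride_u8)
{
    return dst_stride_u8 >= (size_t)LPF16_WRITE_BYTES;
}

/* True if taps first..last on both sides stay within 1 of p0 / q0.
 * row points at q0, so p_i is row[-1 - i] and q_i is row[i]. */
static bool flat_within(const uint8_t *row, int first, int last)
{
    for (int i = first; i <= last; i++) {
        if (abs(row[-1 - i] - row[-1]) > 1) return false;
        if (abs(row[i] - row[0]) > 1) return false;
    }
    return true;
}

/* Inner flatness test (taps 1..3), shared with the wd=8 kernel. */
static bool flat8in(const uint8_t *row)  { return flat_within(row, 1, 3); }

/* Outer flatness test (taps 4..7), new in wd=16. */
static bool flat8out(const uint8_t *row) { return flat_within(row, 4, 7); }

/* The 14-write path is taken only when both tests pass. */
static bool lpf16_takes_wide_path(const uint8_t *row)
{
    return flat8in(row) && flat8out(row);
}
```

A bit-exactness harness could use `lpf16_takes_wide_path` to confirm that its test vectors actually exercise the 14-write path rather than only the narrower ones inherited from wd=8.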

Expected outcome (per prediction)

| Quantity | Predicted |
|---|---|
| M1 bit-exact | 100 % (same pattern as cycles 2/4) |
| M3 NEON | ~55 Medge/s (slightly faster than wd=8) |
| M2 QPU isolation | ~12-15 Medge/s |
| R isolation | 0.22-0.27 (ORANGE, downward) |
| M4 mixed vs NEON-4 | -2 % to +1 % (borderline; likely negative) |
| 30fps margin | still 5×+ (user-facing PASS regardless) |
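The predicted rows can be cross-checked against each other. The sketch below assumes R is the ratio of QPU to NEON throughput in isolation (M2 / M3); the document does not define R here, but that reading reproduces the predicted 0.22-0.27 range from the M2 and M3 rows. The function name is hypothetical.

```c
/* Isolation ratio, assuming R = M2 (QPU Medge/s) / M3 (NEON Medge/s). */
static double isolation_ratio(double m2_qpu_medge_s, double m3_neon_medge_s)
{
    return m2_qpu_medge_s / m3_neon_medge_s;
}
```

With the predicted values, `isolation_ratio(12.0, 55.0)` is about 0.218 and `isolation_ratio(15.0, 55.0)` is about 0.273, matching the table's R row after rounding.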

Acceptance criteria (issue closed when)

  • Cycle 5 phases 1-7 complete, committed
  • docs/k5_lpf16_phase*.md produced
  • Phase 7 verdict documented, deployment recipe updated either way
  • Phase 9 lesson 4 trend prediction validated or refuted

Why deferred (not done in current session)

The session goal was "continue until user intervention necessary." The user directed that wd=16 be filed as an issue and that the session progress to CDEF (AV1) as cycle 5 instead. The trend prediction is interesting, but the project's deployment recipe is already locked through cycle 4; a wd=16 result would update at most one row of the recipe table.

References

  • docs/k4_lpf8_phase4_7.md §"Phase 9 lessons" lesson 4 (the prediction this validates)
  • external/ffmpeg-snapshot/libavcodec/aarch64/vp9lpf_neon.S (NEON ref already vendored — symbol ff_vp9_loop_filter_h_16_16_neon)
  • docs/k2_deblock_phase4.md (cycle 2 template)
  • docs/k4_lpf8_phase4_7.md (cycle 4 template, the most direct reference)