daedalus-fourier/docs/issues/001-lpf-wd-16-prediction-validation.md
marfrit 20e3d004ae Issues 001+002: defer LPF wd=16 + LPF vertical variants
Per user direction at cycle-4 close: file wd=16 (trend prediction
validation) and vertical variants (column-stride TMU behaviour
unknown) as local issues for future cycles. Progress instead to
CDEF (AV1) for codec breadth.

docs/issues/001 — wd=16 prediction validation. Per cycle 4 lesson 4,
trend says wd=16 likely flips M4 negative. Quick incremental cycle
when revisited.

docs/issues/002 — vertical variants. Different memory access pattern
(column-strided vs row-strided). The load-bearing unknown is
whether the cycle 2 +6.9% mixed gain survives the TMU coalescing
shift. If positive, deployment recipe gains symmetry; if negative,
must split by orientation.

Both issues have acceptance criteria + expected outcomes documented.

Cycle 5 next: CDEF (AV1) — codec-breadth expansion.

No Gitea repo exists for daedalus-fourier yet (project is local-
only). If a tracker is wanted, create the repo and migrate these
.md files. For now they live in-tree as part of the project history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 13:09:51 +00:00

# Issue 001 — VP9 LPF wd=16 cycle (prediction validation)
**Status**: open, not blocking
**Type**: kernel-cycle (cycle 5 candidate)
**Predicted verdict**: RED (M4 likely negative, per cycle 4 lesson 4)
**Priority**: low (incremental; trend prediction)
**Filed**: 2026-05-18
## Background
Cycle 4 (LPF wd=8) closed PASS with an M4 delta of +4.1 %, down from
+6.9 % for cycle 2 (wd=4). That downward trend prompted a Phase 9
lesson: "wd=16 would probably show further R degradation; M4 may flip
negative based on the trend line." See
`docs/k4_lpf8_phase4_7.md §"Phase 9 lessons"`.
This issue tracks the experiment that validates (or refutes) that
prediction.
## What to do
Cycle 5 LPF wd=16, mirroring cycle 4's compact structure:
1. **Phase 3**: build `tests/bench_neon_lpf16.c` modelled on
`bench_neon_lpf8.c`. NEON symbol: `ff_vp9_loop_filter_h_16_16_neon`
(already in vendored `vp9lpf_neon.S`). Capture M3.
2. **Phase 4-7**: write `src/v3d_lpf_h_16_16.comp` extending the
wd=8 kernel with the wd=16 outer-flat path (`flat8out` test, 14
writes per row when both flat8out and flat8in pass). New
contract: `dst_stride_u8 ≥ 14` (vs cycle 4's ≥ 6) because the
flat8out path writes at `base-7..base+6` (14 contiguous bytes).
3. **Phase 5 review**: mandatory — wd=16 is not as incremental as
wd=8 (much larger conditional logic, new contract bound).
4. **Phase 7**: measure M2, R; if M4 negative as predicted, document
trend confirmation and close kernel as "CPU-only" in deployment
recipe.
## Expected outcome (per prediction)
| Quantity | Predicted |
|---|---|
| M1 bit-exact | 100 % (same pattern as cycles 2/4) |
| M3 NEON | ~55 Medge/s (slightly faster than wd=8) |
| M2 QPU isolation | ~12-15 Medge/s |
| R isolation | 0.22-0.27 (ORANGE, downward) |
| M4 mixed vs NEON-4 | -2 % to +1 % (borderline; likely negative) |
| 30fps margin | still 5×+ (user-facing PASS regardless) |
## Acceptance criteria (issue closed when)
- Cycle 5 phases 1-7 complete, committed
- `docs/k5_lpf16_phase*.md` produced
- Phase 7 verdict documented, deployment recipe updated either way
- Phase 9 lesson 4 trend prediction validated or refuted
## Why deferred (not done in current session)
The session goal was "continue until user intervention necessary."
User directed: file as issue, progress to cycle 5 CDEF instead.
The trend prediction is interesting, but the project's deployment
recipe is already locked through cycle 4; a wd=16 result would
update at most one row of the recipe table.
## Related
- `docs/k4_lpf8_phase4_7.md §"Phase 9 lessons"` lesson 4 (the
prediction this validates)
- `external/ffmpeg-snapshot/libavcodec/aarch64/vp9lpf_neon.S`
(NEON ref already vendored — symbol `ff_vp9_loop_filter_h_16_16_neon`)
- `docs/k2_deblock_phase4.md` (cycle 2 template)
- `docs/k4_lpf8_phase4_7.md` (cycle 4 template, the most direct
reference)