diff --git a/iter1_close.md b/iter1_close.md
new file mode 100644
index 0000000..b77c2f3
--- /dev/null
+++ b/iter1_close.md
@@ -0,0 +1,91 @@
+# iter1 close — ampere-fourier baseline
+
+Closed 2026-05-16. Iteration 1 of the ampere-fourier campaign establishes the validated 3-codec libva HW decode baseline on RK3588 and spawns the three deferred-codec iterations.
+
+## What iter1 delivers
+
+| Question | Answer |
+|----------|--------|
+| Does `libva-v4l2-request-fourier @ 7ac934e` work on RK3588 unchanged? | **Yes**, for the 3 codecs that the v7.0-rc3 mainline kernel exposes via V4L2 stateless decoding on this board. |
+| Which codecs? | H.264 (rkvdec), VP8 (hantro generic), MPEG-2 (hantro generic). |
+| With what guarantees? | C1-C6 per [`phase1_goal.md`](phase1_goal.md): decode completes, HW path engaged (ioctl-trace evidence), frame 0 byte-identical to the SW reference, frame 720 checked against per-codec SSIM Y floors (see below), N=3 FPS reported as mean ± σ, dmesg clean. |
+| What about HEVC, VP9, AV1? | Each is blocked behind one specific external dependency. iter1 spawned a follow-up issue per blocker; each becomes its own iteration. |
+| What about the firefox-fourier consumer-side test (C7)? | Out of iter1 scope: ampere ran headless over SSH by design. C7 anchors to whichever future iteration first runs a compositor on ampere (likely a UX sub-iteration after AV1 lands, or the firefox-consumer iteration that follows). |
+
+## Per-codec result table
+
+| Codec | C1 frames | C2 ioctls | C3 frame 0 | C4 frame 720 (per refined floor) | C5 FPS, mean ± σ (N=3) | C6 dmesg |
+|-------|-----------|-----------|------------|----------------------------------|------------------------|----------|
+| **H.264** | ✓ 30/30 | ✓ V4L2 ioctls 224 + MEDIA_REQUEST 88 | ✓ byte-identical, sha `3214803d8be74416` | observed SSIM Y `0.667575` (no PASS threshold; see drift note) | **461.49 ± 0.61** | ✓ empty diff |
+| **VP8** | ✓ 30/30 | ✓ V4L2 ioctls 208 + MEDIA_REQUEST 82 | ✓ byte-identical | ✓ **SSIM Y 1.000000** (byte-identical at f720; meets floor ≥ 1.000) | **217.24 ± 0.57** | ✓ empty diff |
+| **MPEG-2** | ✓ 30/30 | ✓ V4L2 ioctls 168 + MEDIA_REQUEST 66 | ✓ byte-identical | ✓ **SSIM Y 0.999720** (meets floor ≥ 0.9997; IEEE 1180 tolerance) | **199.84 ± 0.70** | ✓ empty diff |
+
+Refined C4 per-codec floors (from Phase 4 plan §1, amended per Phase 5 review):
+
+| Codec | Floor | Basis |
+|-------|-------|-------|
+| H.264 | no PASS threshold (documented at 0.6676) | cumulative GOP drift between libavcodec SW and rkvdec HW; see drift note below |
+| VP8 | ≥ 1.000 (byte-identical) | Phase 3 N=3 observed perfect bit-exactness; raises the bar instead of lowering it |
+| MPEG-2 | ≥ 0.9997 | memory [`feedback_mpeg2_hw_sw_idct_precision`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_mpeg2_hw_sw_idct_precision.md): hantro IDCT is IEEE 1180 conformant but up to 3 LSB off SW |
+
+## H.264 SSIM 0.6676 drift note (Phase 5 framing amendment applied)
+
+The H.264 SSIM Y at frame 720 measured **0.667575** on ampere RK3588. Fresnel iter1 measured **0.6431** on RK3399.
+
+These are **two independent empirical data points on two different SoC generations**. They are not cross-verified against each other, and neither serves as the other's reference anchor. RK3399's rkvdec is the v1-era driver; RK3588's rkvdec uses the VDPU381/383 mainline path, i.e. different decoder hardware with different deblocking and entropy-decode implementations. The two drift values are nonetheless close (~0.65 ± 0.02). Note that, unlike MPEG-2 (whose IDCT is only bounded by the IEEE 1180 tolerance), H.264 specifies an exact integer inverse transform, so conformant decoders are expected to match libavcodec SW bit-for-bit; spec-permitted precision therefore does not by itself explain this drift, and iter1 did not isolate its cause.
+
+The drift is NOT:
+- a libva backend bug (frame 0 is byte-identical, decode completes, HW path engaged)
+- a kernel ABI deviation (Phase 3 ruled out frame-0 byte deviation, which is the diagnostic for that)
+- specific to either chip generation (both show drift of similar magnitude)
+
+The drift IS:
+- an accumulated divergence between libavcodec SW and the HW decoder across a 720-frame inter-prediction chain, root cause not yet isolated
+- accepted as-is for iter1; future iterations may investigate whether an x264 encode setting (`-tune psnr`, `-no-cabac`) reduces drift, but that is out of iter1 scope
+
+## Codecs blocked, spawned follow-up issues
+
+| Codec | Blocker | Follow-up issue | Iteration that picks it up |
+|-------|---------|------------------|----------------------------|
+| HEVC | kernel oops in `rkvdec_hevc_prepare_hw_st_rps` (`__pi_memcmp` fault), which cascades into wedging `v4l2_mem2mem` for all decoders until reboot | [`marfrit/kernel-agent#11 [ka:experiment]`](https://git.reauktion.de/marfrit/kernel-agent/issues/11) | iter2 |
+| VP9 | kernel doesn't register `V4L2_PIX_FMT_VP9_FRAME` on RK3588 rkvdec; needs VDPU381/383 variant_ops enablement | [`marfrit/kernel-agent#12 [ka:experiment]`](https://git.reauktion.de/marfrit/kernel-agent/issues/12) | iter3 |
+| AV1 | kernel exposes `/dev/video4` cleanly, but the libva backend's iter38 auto-probe is hard-capped at 2 fds; needs an iter39 third-fd model | [`marfrit/libva-v4l2-request-fourier#2`](https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/issues/2) | iter4 |
+
+[`marfrit/kernel-agent#6`](https://git.reauktion.de/marfrit/kernel-agent/issues/6) (the iter0 umbrella) was closed with cross-references to all three.
+
+## C7 — out of iter1 scope (headless by design)
+
+The firefox-fourier vendor-defaults engagement test (Phase 1 C7) is out of iter1 scope. iter1 ran headless over SSH by design: there was no UX requirement, and C1-C6, which fully cover the libva backend correctness verdict, need no compositor.
+
+C7 needs a Wayland compositor on ampere (the `widget.dmabuf.force-enabled` pref unlock path requires a real DMA-BUF surface to negotiate against). C7 lands in whichever future iteration first stands up a graphical session on ampere: likely either (a) the firefox-consumer iteration after AV1 lands, or (b) a dedicated UX sub-iteration. It is NOT a missing rig prerequisite that iter1 should have fixed; it is a scope boundary.
+
+The vendor-default pref file is shipped correctly: `/usr/lib/firefox-fourier/defaults/preferences/rockchip-fourier-defaults.js` contains the three required prefs (`widget.dmabuf.force-enabled`, `media.hardware-video-decoding.force-enabled`, `media.ffvpx-hw.enabled`), all set to `true`, per `marfrit-packages#8`. So C7 is "ready to run, just needs a compositor."
+
+## State to carry forward to iter2+
+
+**Carries (state):**
+- Backend source pin `7ac934e` (iter38b)
+- A hand-build installed over the broken CI package binary remains the install method until `marfrit-packages#17` is fixed
+- Source clips in `~/measurements/encoded/` on ampere (5 codecs encoded, immutable)
+- Per-codec C4 floors as refined above (VP8 ≥ 1.000, MPEG-2 ≥ 0.9997, H.264 documented only)
+- Reboot authorization for ampere when the v4l2 stack wedges (per Phase 0 operator decision)
+
+**Does NOT carry (data):**
+- The Phase 3 N=3 FPS numbers anchor iter1 only. iter2/3/4 measure their own anchors per their Phase 1 success criteria, keeping the iter1 N=3 means as **reference history**.
+
+**New work items unblocked by iter1 close:**
+- iter2: HEVC fix experiment via kernel-agent#11
+- iter3: VP9 enablement experiment via kernel-agent#12
+- iter4: AV1 backend iter39 via libva-v4l2-request-fourier#2
+- (potential) UX sub-iteration to validate C7 once a compositor is up on ampere
+
+## What did NOT happen in iter1 (for clarity)
+
+- No code changes to the libva backend, mpv-fourier, ffmpeg-v4l2-request-fourier, firefox-fourier, or the kernel.
+- No `pacman -Syu` on ampere (deliberate, per Phase 2 constraint 1: a full upgrade would have silently reverted the libva hand-build override).
+- No SDDM auto-login configuration (that would have been scope creep; C7 was deferred instead).
+- No re-encoding of test clips (they are immutable for the campaign).
+
+## Phase 8 hand-off
+
+The memory entry will distill the iter1 lesson: campaigns on a new host with a known-good backend benefit from a "characterize-and-baseline" iteration *before* any change-the-system iteration. The baseline establishes the floor that change-iterations are checked against for regressions, and it surfaces blockers as separate issue threads instead of letting them bleed into the change-iteration's debug loop.
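The C4 and C5 columns in `iter1_close.md` reduce to two small checks. Below is a minimal Python sketch. It assumes the floors exactly as listed in the refined C4 floors table. It also assumes a *sample* standard deviation for σ, because the document does not say which estimator Phase 3 used. `C4_FLOORS`, `c4_verdict` and `c5_summary` are hypothetical names, not part of any iter1 tooling.

```python
"""Sketch of the iter1 C4 (frame-720 SSIM Y floor) and C5 (FPS mean +/- sigma)
verdicts. Floors and observed SSIM values are copied from iter1_close.md;
the names and the sigma estimator are illustrative assumptions."""
from __future__ import annotations

import statistics
from typing import Optional

# Refined C4 floors per codec. None means "no PASS threshold, document only"
# (the H.264 case in iter1).
C4_FLOORS: dict[str, Optional[float]] = {
    "H.264": None,
    "VP8": 1.000,
    "MPEG-2": 0.9997,
}


def c4_verdict(codec: str, ssim_y_f720: float) -> str:
    """Return PASS/FAIL against the codec's floor, or DOCUMENTED if it has none."""
    floor = C4_FLOORS[codec]
    if floor is None:
        return "DOCUMENTED"
    return "PASS" if ssim_y_f720 >= floor else "FAIL"


def c5_summary(fps_runs: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of N FPS runs (assumed estimator)."""
    if len(fps_runs) < 2:
        raise ValueError("need at least two runs to estimate sigma")
    return statistics.mean(fps_runs), statistics.stdev(fps_runs)


if __name__ == "__main__":
    # Observed frame-720 SSIM Y values from the per-codec result table.
    observed = {"H.264": 0.667575, "VP8": 1.000000, "MPEG-2": 0.999720}
    for codec, ssim in observed.items():
        print(f"{codec:7s} SSIM Y {ssim:.6f} -> {c4_verdict(codec, ssim)}")
```

Running the script prints one C4 verdict per codec from the observed frame-720 values. The raw per-run FPS samples behind the C5 column are not recorded in this document, so `c5_summary` is left for callers to feed with their own runs.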