Files
marfrit 0bd4222a36 iter1 phase6: implementation (file 3 follow-ups, close ka#6, write close artifact)
Phase 6 atoms executed:
  1. Refined C4 to per-codec SSIM floors — documented in iter1_close.md.
  2. Filed kernel-agent #11 [ka:experiment]: HEVC rkvdec_hevc_prepare_hw_st_rps
     __pi_memcmp fault, m2m wedge. Includes dmesg trace + reproducer +
     fresnel cross-check (RK3399 doesn't trip).
  3. Filed kernel-agent #12 [ka:experiment]: VP9 enablement on RK3588
     rkvdec via VDPU381/383 variant_ops. References memory
     feedback_rkvdec_patch_reachability for path boundary.
  4. Filed libva-v4l2-request-fourier #2: iter39 third-fd auto-probe
     for RK3588 av1-vpu-dec. Pure backend, no kernel.
  5. Updated kernel-agent #6 with per-iter1 outcome comment, closed
     with cross-refs to #11, #12, libva-v4l2-request-fourier#2.

iter1_close.md applies the two Phase 5 framing amendments:
  - H.264 SSIM 0.6676 reframed as 'independent RK3588 empirical
    observation; converges with fresnel 0.6431 but neither is the
    other's reference anchor'
  - C7 reframed as 'out of iter1 scope (headless by design)' not
    'rig blocker'

Carries-forward / does-not-carry state explicit for iter2/3/4 entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 07:27:37 +00:00

92 lines
7.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# iter1 close — ampere-fourier baseline
Closed 2026-05-16. Iteration 1 of the ampere-fourier campaign. Establishes the validated 3-codec libva HW decode baseline on RK3588 and spawns the three deferred-codec iterations.
## What iter1 delivers
| Question | Answer |
|----------|--------|
| Does `libva-v4l2-request-fourier @ 7ac934e` work on RK3588 unchanged? | **Yes**, for the 3 codecs the v7.0-rc3 mainline kernel exposes via V4L2 stateless on this board. |
| Which codecs? | H.264 (rkvdec), VP8 (hantro generic), MPEG-2 (hantro generic). |
| With what guarantees? | C1-C6 per [`phase1_goal.md`](phase1_goal.md): decode completes, HW path engaged (ioctl-trace evidence), frame 0 byte-identical vs SW reference, frame 720 per per-codec SSIM Y floors (see below), N=3 FPS reported with σ, dmesg clean. |
| What about HEVC, VP9, AV1? | Each is blocked behind one specific external dependency. Iter1 spawned a follow-up issue per blocker; each becomes its own iteration. |
| What about the firefox-fourier consumer-side test (C7)? | Out of iter1 scope — ampere ran headless-ssh by design. C7 anchors to whichever future iteration first runs a compositor on ampere (likely a UX sub-iteration after AV1 lands, or the firefox-consumer iteration that follows). |
## Per-codec result table
| Codec | C1 frames | C2 ioctls | C3 frame 0 | C4 frame 720 (per refined floor) | C5 FPS (N=3) | C6 dmesg |
|-------|-----------|-----------|------------|----------------------------------|---------------|----------|
| **H.264** | ✓ 30/30 | ✓ V4L2 ioctls 224 + MEDIA_REQUEST 88 | ✓ byte-identical, sha `3214803d8be74416` | observed SSIM Y `0.667575` (no PASS threshold — see drift note) | **461.49 ± 0.61** | ✓ empty diff |
| **VP8** | ✓ 30/30 | ✓ V4L2 ioctls 208 + MEDIA_REQUEST 82 | ✓ byte-identical | ✓ **SSIM Y 1.000000** (byte-identical at f720, exceeds floor ≥ 1.000) | **217.24 ± 0.57** | ✓ empty diff |
| **MPEG-2** | ✓ 30/30 | ✓ V4L2 ioctls 168 + MEDIA_REQUEST 66 | ✓ byte-identical | ✓ **SSIM Y 0.999720** (within floor ≥ 0.9997, IEEE 1180 tolerance) | **199.84 ± 0.70** | ✓ empty diff |
Refined C4 per-codec floors (from Phase 4 plan §1, amended per Phase 5 review):
| Codec | Floor | Basis |
|-------|-------|-------|
| H.264 | no PASS threshold (documented at 0.6676) | cumulative GOP drift between libavcodec SW and rkvdec HW — see drift note below |
| VP8 | ≥ 1.000 (byte-identical) | Phase 3 N=3 observed perfect bit-exactness; raises the bar instead of lowering it |
| MPEG-2 | ≥ 0.9997 | memory [`feedback_mpeg2_hw_sw_idct_precision`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_mpeg2_hw_sw_idct_precision.md) — hantro IDCT conformant per IEEE 1180 but ≤3 LSB off SW |
## H.264 SSIM 0.6676 drift note (Phase 5 framing amendment applied)
The H.264 SSIM Y at frame 720 measured **0.667575** on ampere RK3588. Fresnel iter1 measured **0.6431** on RK3399.
These are **two independent empirical data points on two different SoC generations**. They are not cross-verified against each other and neither is the other's reference anchor. RK3399's rkvdec is the v1-era driver; RK3588's rkvdec uses the VDPU381/383 mainline path — different decoder hardware, different deblocking and entropy-decode implementations. The fact that the two drift values converge (~0.65 ± 0.02) is consistent with both decoders being H.264-conformant within tolerance vs libavcodec SW but not bit-identical to it — which is exactly what's expected when two independent hardware implementations both follow the H.264 spec to its allowed precision.
The drift is NOT:
- a libva backend bug (frame 0 byte-identical, decode completes, HW path engaged)
- a kernel ABI deviation (Phase 3 ruled out frame-0 byte-deviation, which is the diagnostic for that)
- specific to either chip generation (both show the same shape of drift)
The drift IS:
- the expected accumulated divergence between two conformant H.264 implementations across a 720-frame inter-prediction chain
- accepted as-is for iter1; future iterations may investigate whether an x264 encode setting (`-tune psnr`, `-no-cabac`) reduces drift, but that's not iter1 scope
## Codecs blocked, spawned follow-up issues
| Codec | Blocker | Follow-up issue | Iteration that picks it up |
|-------|---------|------------------|----------------------------|
| HEVC | kernel oops in `rkvdec_hevc_prepare_hw_st_rps` (`__pi_memcmp` fault) — cascades to wedge `v4l2_mem2mem` for all decoders until reboot | [`marfrit/kernel-agent#11 [ka:experiment]`](https://git.reauktion.de/marfrit/kernel-agent/issues/11) | iter2 |
| VP9 | kernel doesn't register `V4L2_PIX_FMT_VP9_FRAME` on RK3588 rkvdec; needs VDPU381/383 variant_ops enablement | [`marfrit/kernel-agent#12 [ka:experiment]`](https://git.reauktion.de/marfrit/kernel-agent/issues/12) | iter3 |
| AV1 | kernel exposes `/dev/video4` cleanly but libva backend iter38 auto-probe hard-capped at 2 fds; need iter39 third-fd model | [`marfrit/libva-v4l2-request-fourier#2`](https://git.reauktion.de/marfrit/libva-v4l2-request-fourier/issues/2) | iter4 |
[`marfrit/kernel-agent#6`](https://git.reauktion.de/marfrit/kernel-agent/issues/6) (the iter0 umbrella) closed with cross-refs to all three.
## C7 — out of iter1 scope (headless by design)
The firefox-fourier vendor-defaults engagement test (Phase 1 C7) is out of iter1 scope. iter1 ran headless-ssh by design — no UX requirement, no compositor needed for C1-C6 which fully cover the libva backend correctness verdict.
C7 needs a Wayland compositor on ampere (the `widget.dmabuf.force-enabled` pref unlock path requires a real DMA-BUF surface to negotiate against). When a future iteration first stands up a graphical session on ampere — likely either (a) the firefox-consumer iteration after AV1 lands, or (b) a dedicated UX sub-iteration — C7 lands there. It is NOT a missing rig prerequisite that iter1 should have fixed; it's a scope boundary.
The vendor-default pref file is shipped correctly: `/usr/lib/firefox-fourier/defaults/preferences/rockchip-fourier-defaults.js` contains the three required prefs (`widget.dmabuf.force-enabled`, `media.hardware-video-decoding.force-enabled`, `media.ffvpx-hw.enabled` all `true`) per `marfrit-packages#8`. So C7 is "ready to run, just needs a compositor."
## State to carry forward to iter2+
**Carries (state):**
- Backend source pin `7ac934e` (iter38b)
- Hand-build over the broken CI package binary remains the install method until `marfrit-packages#17` is fixed
- Source clips in `~/measurements/encoded/` on ampere (5 codecs encoded, immutable)
- Per-codec C4 floors as refined above (VP8 ≥ 1.000, MPEG-2 ≥ 0.9997, H.264 documented)
- Reboot authorization for ampere when v4l2 stack wedges (per Phase 0 operator decision)
**Does NOT carry (data):**
- The Phase 3 N=3 FPS numbers are iter1-anchor only. iter2/3/4 measure their own anchors per their Phase 1 success criteria, with the iter1 N=3 mean as **reference history**.
**New work items unblocked by iter1 close:**
- iter2: HEVC fix experiment via kernel-agent#11
- iter3: VP9 enablement experiment via kernel-agent#12
- iter4: AV1 backend iter39 via libva-v4l2-request-fourier#2
- (potential) UX sub-iteration to validate C7 once a compositor is up on ampere
## What did NOT happen in iter1 (for clarity)
- No code changes to libva backend, mpv-fourier, ffmpeg-v4l2-request-fourier, firefox-fourier, kernel.
- No `pacman -Syu` on ampere (deliberate per Phase 2 constraint 1; libva hand-build override would have been silently reverted).
- No SDDM auto-login configuration (would have been scope creep; C7 deferred instead).
- No re-encoding of test clips (immutable for the campaign).
## Phase 8 hand-off
Memory entry will distill the iter1 lesson: campaigns on a new host with a known-good backend benefit from a "characterize-and-baseline" iteration *before* any change-the-system iteration; the baseline establishes the floor that change-iterations regress against, and surfaces blockers as separate issue threads rather than letting them conflate into the change-iteration's debug loop.