diff --git a/phase1_goal.md b/phase1_goal.md new file mode 100644 index 0000000..73dc43c --- /dev/null +++ b/phase1_goal.md @@ -0,0 +1,69 @@ +# Phase 1 — Goal formulation (iter1 of ampere-kernel-decoders, meta-campaign shape) + +Locked 2026-05-16, post-Phase-0-prior-art-survey. Operator picked option 2 from Phase 0's re-scope question: **meta-campaign coordinating an HEVC backend iteration and a VP9 kernel iteration as parallel children under one umbrella**. + +## Meta-campaign goal (one sentence) + +**Close the two decoder gaps `ampere-fourier` iter1 surfaced (HEVC kernel OOPS, VP9 not exposed) on ampere RK3588 by running them as parallel sub-iterations whose mid-iteration progress + Phase 7 verification both regress against the `ampere-fourier` iter1 baseline, and whose patches land in their respective repos (libva-v4l2-request-fourier for HEVC, kernel-agent experiment branch for VP9) per the operator policy that ampere baseline kernel stays clean.** + +Despite this campaign being initially named `ampere-kernel-decoders`, Phase 0 reclassified HEVC to userspace work. The name stays for now (it captures *substrate* — ampere as test host — and *purpose* — decoder enablement). If a rename is wanted (`ampere-decoders` or `rk3588-decoders` would be more accurate post-reclassification), it's an operator decision; the meta-campaign artifacts work under any name. + +## Sub-iteration ledger + +Each is its own 8-phase loop with its own `phase0_findings_iter.md` etc. anchored under this campaign's umbrella. Numbering follows iteration entry order: + +| Sub-iter | Title | Repo(s) the patch lands in | Tracking issue | Phase 0 entry trigger | +|----------|-------|----------------------------|----------------|-----------------------| +| **iter2 (HEVC)** | Backend populates `V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS` / `_LT_RPS` for VDPU381 HEVC | `marfrit/libva-v4l2-request-fourier` | `libva-v4l2-request-fourier#3` | when iter1 closes | +| **iter3 (VP9)** | Kernel enablement of `V4L2_PIX_FMT_VP9_FRAME` on RK3588 rkvdec VDPU381 variant_ops | kernel-agent experiment branch (preferred) OR direct patch series to mainline (TBD) | `marfrit/kernel-agent#12` | once iter2 closes OR can start in parallel if operator wants concurrency | +| **(potential iter4)** | AV1 backend iter39 third-fd probe | `marfrit/libva-v4l2-request-fourier` | `libva-v4l2-request-fourier#2` | out of this meta-campaign per `ampere-fourier` iter1 close (AV1 was tagged for a different campaign-iteration assignment) — kept here as a placeholder pointer in case operator wants to fold it in | + +## Iteration ordering hypothesis + +**HEVC (iter2) before VP9 (iter3)**, because: +- HEVC fix-path is a smaller-blast-radius userspace patch with a fast feedback loop (no kernel rebuild needed; backend rebuild on ampere is ~30s). +- The HEVC mechanism reconstruction in Phase 0 is a *hypothesis*, not yet test-verified. A minimal backend patch (even with dummy values for the new CIDs) is the cheapest way to confirm whether the survey's diagnosis is correct. If wrong, the campaign re-anchors before committing to VP9 work. +- VP9 has unresolved upstream-tree-selection (Sarma android vs dongioia overlay vs wait-for-Casanova-v1 mainline). Phase 0 of iter3 will need to choose; deferring that lets the operator absorb iter2's findings first. + +The operator may overrule and start them in parallel — sub-iterations are independent at the artifact level. If parallel, each just has its own Phase 0/1/... + +## Meta-campaign success criteria + +For the umbrella to close as **success**: + +1. **iter2 closes with HEVC end-to-end decode validated** on ampere — `ampere-fourier` Phase 3 instrumentation (C1-C6) re-run with HEVC added to the codec list, all six criteria pass per the per-codec floors `ampere-fourier` iter1 established. HEVC SSIM Y at f720 expected to fall in the same drift territory as H.264 (~0.65 ± 0.05), accepted as documented-drift not regression. +2. **iter3 closes with VP9 end-to-end decode validated** on ampere — same C1-C6 shape, applied to a VP9-encoded clip. Per-codec floor for VP9 from fresnel iter1 reference history is SSIM Y = 1.000 (byte-identical) — that's the target to verify holds on RK3588 too, but iter3's Phase 1 will lock the actual floor. +3. **No regression** to `ampere-fourier` iter1's 3-codec baseline (H.264, VP8, MPEG-2 still pass C1-C6 per the iter1 floors). +4. **All work lands in the right repos**: HEVC patch in `libva-v4l2-request-fourier` (commits + new pkgrel), VP9 patch in kernel-agent experiment branch (NOT in `linux-ampere-fourier` baseline per operator policy). + +If iter2's hypothesis is *wrong* (backend patch doesn't unblock HEVC), re-open `kernel-agent#11` with the new evidence and the meta-campaign re-enters Phase 0 for a corrected substrate read. + +## Out of meta-campaign + +- **AV1** — listed in the ledger as "potential iter4" pointer but properly belongs to whatever campaign owns `libva-v4l2-request-fourier#2`; not committed to here. +- **IOMMU restore patch verification** — surfaced as Phase 0 open Q4 ("is `media: rkvdec: Restore iommu addresses on errors` present in `ampere-minimal-devices @ 7c241f2e2835`?"). This is a Phase 2 source-read item for whichever iteration first touches the rkvdec code path on ampere — likely iter2 (HEVC backend extension exercises the same kernel path on first decode). Don't make it a separate iteration; it's a passive substrate check. +- **lore.kernel.org Anubis re-check** — survey couldn't enumerate due to bot-protection; Phase 2 of iter2 should manually re-check `lore.kernel.org/linux-media/?q=rkvdec+VDPU381+RPS` for any HEVC stability fixes between 7.0 and 7.0-rcN. If a kernel-side fix DID land in -rc4..-rc7 (post-our-rc3 baseline), iter2's plan must include "bump ampere kernel to 7.0 final or later" before the backend patch is exercised. +- **Multi-core glue** — out per Phase 0; would be a far-future iteration. + +## Resolved Phase 0 open questions + +- **Q1 (meta scope)**: option 2 picked — meta-campaign with parallel sub-iterations. +- **Q2 (VP9 starting tree)**: deferred to iter3's Phase 0. +- **Q3 (test-verification of HEVC mechanism)**: explicitly captured as iter2's Phase 6 first-action — write minimal backend patch that registers the new CIDs (even with dummy data), confirm whether OOPS disappears / changes shape. Then iterate to correct CID-population. +- **Q4 (IOMMU restore patch present?)**: passive substrate check during iter2 Phase 2. +- **Q5 (lore.kernel.org Anubis re-check)**: iter2 Phase 2 task. + +## Hypothesis (meta level) + +With backend `libva-v4l2-request-fourier` extended to populate the new HEVC EXT_SPS_*_RPS controls (iter2) and the RK3588 rkvdec kernel side extended to register VP9 (iter3), ampere achieves 5/6 codec HW decode coverage (H.264, HEVC, VP9, VP8, MPEG-2 — AV1 stays separate). The 6th, AV1, is unblocked by `libva-v4l2-request-fourier#2` backend iter39 outside this meta-campaign. + +**The hypothesis is wrong if**: +- iter2 backend patch doesn't unblock HEVC — re-open kernel-agent#11, re-enter Phase 0 with new evidence (mechanism is something other than UAPI-gap, possibly a real kernel bug). +- iter3 VP9 starting tree doesn't rebase cleanly onto `ampere-minimal-devices` — re-enter Phase 0 of iter3 to pick a different upstream source. +- Either iteration's Phase 7 fails to regress against ampere-fourier iter1's 3-codec baseline — i.e. enabling HEVC or VP9 broke H.264/VP8/MPEG-2 — meaning the patch is touching shared code in a way Phase 5 review didn't catch. + +## Phase 1 close + +Goal locked. Ledger with 2-3 sub-iterations established. Order hypothesis (HEVC first) stated. 5 Phase-0 open questions all routed to specific iter2/3 phases. Ready to start iter2 Phase 0. + +(Per dev-process discipline: Phase 2 of *this* iter1 — the meta-campaign's situation analysis — will be brief since the meta-campaign's "situation" is mostly handled by the per-sub-iteration Phase 2s. iter1 of the meta-campaign itself is mostly Phase 0 + Phase 1 + Phase 8: scoping + ledger setup + close. The real engineering loops happen inside iter2 and iter3.)