iter1 phase2: situation analysis + reset-context
Re-verifies Phase 0 substrate against live system (post-reboot ampere 2 min uptime, backend md5 still matches hand-build, source clips intact, memory list unchanged). Two missing tools flagged for Phase 4 to install (strace, firefox-fourier). Catalogues 7 constraints (backend file pacman-owned but content-unmanaged; kernel hand-managed; HEVC OOPS cascade; mpv --hwdec=vaapi needs GL surface; mpv MPEG-2 not on default hwdec allow-list; reboot authorization in scope; fresnel offline-able and not depended-on) and 7 known failure modes (HEVC oops, broken CI backend silent fail, hwdec SW fallback, MPEG-2 hwdec gate, firefox prefs version sensitivity, RK3399 vaDeriveImage zero-page issue as open Q for RK3588, pacman-Qo lying about file content). Explicit non-deps section (HEVC/VP9/AV1, kdirect-on-RK3588, cross-host byte-compare, DokuWiki) preempts scope creep doubts in Phase 3-7.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,72 @@
# Phase 2 — Situation analysis (iter1)
Context reset 2026-05-16 09:20. ampere uptime 2 min (post-reboot to clear the HEVC-OOPS-induced v4l2_mem2mem wedge captured in Phase 0). This phase re-verifies all of Phase 0's substrate against the live system and identifies the constraints, dependencies, and known failure modes that the Phase 4 plan must respect.
## Re-verified substrate (against live system)

- **ampere reachable** (`ssh ampere` returns within 1s; uptime 2 min, kernel `7.0.0-rc3-devices+`).
- **Backend binary intact** as the hand-built copy: `/usr/lib/dri/v4l2_request_drv_video.so` md5 `0c9a7efaab6a31b74536ac1abff18dfa`, 484 800 B. Same as Phase 0: a reboot does not touch the file on disk, so the broken CI build did not come back. `pacman -Qo` still reports the package as the owner.
- **All 5 source clips intact** in `~/measurements/encoded/`. Iter1 uses 3 of them (h264, vp8, mpeg2). The hevc clip stays untouched, because opening it through the libva backend on this kernel triggers the OOPS cascade.
- **Memory list unchanged** since fresnel iter38 close. No new entries that would invalidate iter1 assumptions.
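
The backend md5 re-check above could be scripted roughly as follows. This is a minimal sketch, not the campaign's actual tooling: `verify_md5` is a hypothetical helper name, and it assumes `md5sum` from coreutils is available on ampere.

```shell
# Sketch: compare a file's md5 against an expected digest.
# verify_md5 is a hypothetical helper, not part of the campaign tooling.
verify_md5() {
    # $1 = file path, $2 = expected md5 (lowercase hex)
    [ -r "$1" ] || { echo "UNREADABLE $1" >&2; return 2; }
    actual=$(md5sum "$1" | cut -d' ' -f1)
    if [ "$actual" = "$2" ]; then
        echo "OK $1"
    else
        echo "MISMATCH $1: got $actual, expected $2" >&2
        return 1
    fi
}

# Usage on ampere, with the digest recorded in this document:
# verify_md5 /usr/lib/dri/v4l2_request_drv_video.so 0c9a7efaab6a31b74536ac1abff18dfa
```

A non-zero exit status would be the signal to redeploy the hand-build before going further.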
## Instrument prerequisites

| Tool | Present | Purpose |
|------|---------|---------|
| `ffmpeg` (from `ffmpeg-v4l2-request-fourier`) | yes | C1, C3-C5 driver |
| `mpv` (from `mpv-fourier 1:0.41.0-10`) | yes | secondary HW path check (`--hwdec=v4l2request-copy`) |
| `vainfo` | yes | C2 sanity (profile list) |
| `lsof` | yes | C2: show open `/dev/video*` fds during decode |
| **`strace`** | **MISSING** | C2: confirm `VIDIOC_S_EXT_CTRLS` / `MEDIA_REQUEST_IOC_QUEUE` ioctls on the right fds |
| `coredumpctl` | yes | residual-crash audit |
| `dmesg` | yes | C6: clean-dmesg check |
| **`firefox-fourier`** | **MISSING** | C7: vendor-default pref engagement test |

Two installs are needed before Phase 3 can fully run; both are pacman-installable: `firefox-fourier` from the marfrit aarch64 repo and `strace` from Arch core.
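
The presence column in the table can be re-derived with a small loop over `command -v`. The sketch below is illustrative only: `missing_tools` is a hypothetical helper name, and the binary name `firefox` for the `firefox-fourier` package is an assumption that may not hold.

```shell
# Sketch: print every command from the argument list that is not on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage on ampere. Assumption: firefox-fourier installs a binary named "firefox".
# missing_tools ffmpeg mpv vainfo lsof strace coredumpctl dmesg firefox
```

Empty output would mean every listed instrument is present. This checks binaries on `PATH`, not which package provides them.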
## Constraints

1. **Backend file is pacman-owned but content-unmanaged.** `pacman -Qo /usr/lib/dri/v4l2_request_drv_video.so` reports it owned by `libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1`, but the file is the hand-built `0c9a7efa…` (485 KB), not the broken CI build `ae611d80…` (133 KB) the package shipped. Any `pacman -Syu` that upgrades the package, or a `pacman -S libva-v4l2-request-fourier` reinstall, would silently overwrite it with the broken binary. Treat this as fragile state: avoid full-system upgrades for the duration of iter1, and re-`md5sum` the file in every phase that's about to use it. Same constraint as fresnel (per the `marfrit-packages#17` thread).
2. **Booted kernel hand-managed in `/boot/firmware/` extlinux entries.** The `linux-ampere-fourier 7.0rc3.kafr1-1` package shipped `Image-7.0.0-rc3-ARCH+` + the matching DTB + initramfs into `/boot/firmware/`, but the boot path itself is the existing extlinux config with `default arch_mainline`. A pacman update of the kernel would re-deposit files, but the boot record stays as-is until something flips the default. Same operational shape as fresnel.
3. **HEVC OOPS cascades through `v4l2_mem2mem`.** Once `rkvdec_hevc_prepare_hw_st_rps` faults on `__pi_memcmp`, *every* subsequent V4L2 m2m queue request from any device (rkvdec H.264, hantro VP8, hantro MPEG-2) blocks in futex wait inside libva. The blocking is recoverable only by reboot. Iter1 must not include HEVC in any decode batch; even one HEVC call wedges the whole rig and invalidates the rest of the sweep.
4. **`--hwdec=vaapi` (zero-copy) needs a GL/Vulkan surface.** Headless ssh sessions have no surface, so plain `--hwdec=vaapi` falls back to SW silently. Iter1 mpv invocations use `--hwdec=vaapi-copy` or `--hwdec=v4l2request-copy`; both copy decoded frames to CPU memory and don't need a display. (Established on fresnel; recorded as a config rule for the campaign.)
5. **mpv default `--hwdec-codecs` excludes MPEG-2.** Must pass `--hwdec-codecs=all` for the MPEG-2 path to even attempt HW. (Same fresnel finding.)
6. **Reboot authorization in scope.** Per Phase 0 operator decision, reboot is authorized for ampere when the v4l2 stack wedges. No fresh prompt needed inside iter1.
7. **fresnel offline-able.** The Pinebook Pro suspends on lid-close, so iter1 can't depend on fresnel being reachable mid-iteration. Iter1 doesn't need fresnel at all (everything happens on ampere), so this is just a guard against accidentally depending on cross-host fetches.
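
Constraints 3 to 5 combine into one rule for building headless mpv invocations: never HEVC, always a `-copy` hwdec, always `--hwdec-codecs=all`. A minimal sketch of that rule follows. The helper name `mpv_headless_cmd` is hypothetical, the clip filenames are assumed to contain the codec name (the actual names in `~/measurements/encoded/` are not recorded here), and `--vo=null --no-audio` is an illustrative choice for a decode-only run rather than a campaign setting.

```shell
# Sketch: print a headless mpv decode command for one clip, refusing HEVC.
# Assumptions: clip filenames contain the codec name (e.g. "hevc");
# --vo=null and --no-audio are illustrative, not taken from the campaign.
mpv_headless_cmd() {
    # $1 = clip path, $2 = hwdec mode (defaults to v4l2request-copy)
    case "$1" in
        *[Hh][Ee][Vv][Cc]*|*[Hh]265*)
            echo "refusing HEVC clip: $1" >&2
            return 1 ;;
    esac
    echo "mpv --hwdec=${2:-v4l2request-copy} --hwdec-codecs=all --vo=null --no-audio $1"
}
```

The function only prints the command, so it can be reviewed before anything touches the decoder. It does not quote paths, so it is only suitable for clip names without spaces.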
## Dependencies (external-to-this-iteration)

| Dependency | Status | Risk if it changes |
|------------|--------|--------------------|
| `linux-ampere-fourier 7.0rc3.kafr1-1` (kernel) | installed, booted | A pacman update could replace `/boot/firmware/Image-7.0.0-rc3-ARCH+` while the OOPS-prone HEVC code path is still present, which would invalidate Phase 3 measurements if they are rerun on a different kernel. Don't `pacman -Syu` during iter1. |
| `libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1` (backend) | hand-built `0c9a7efa…` installed over the package | Same risk; see Constraint 1. |
| `ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-4` | installed | If updated mid-iter1, hwaccel behavior could shift. Pin for the duration. |
| `mpv-fourier 1:0.41.0-10` | installed | Pin. |
| `libva 2.23.0-1` | installed | Pin. |
| `linux-api-headers 6.19-1` | installed | Pin (V4L2 control struct layouts come from here). |
| Source clips in `~/measurements/encoded/` | present | Re-encoding would change file contents; treat as immutable for iter1. |
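
One way to notice that a pinned package has drifted is to compare `pacman -Q` output (one `name version` pair per line) against the versions in the table. The sketch below assumes that output format; `check_pins` is a hypothetical helper, and it only reports version drift, not a pinned package that is missing altogether.

```shell
# Sketch: read `pacman -Q` style lines ("name version") on stdin and report
# any pinned package whose version differs from the table above.
# check_pins is a hypothetical helper name.
check_pins() {
    pins="linux-ampere-fourier=7.0rc3.kafr1-1
libva-v4l2-request-fourier=1.0.0.r348.7ac934e-1
ffmpeg-v4l2-request-fourier=2:8.1.r123329.b57fbbe-4
mpv-fourier=1:0.41.0-10
libva=2.23.0-1
linux-api-headers=6.19-1"
    status=0
    while read -r name version; do
        # Look up the pinned version for this package name, if any.
        want=$(printf '%s\n' "$pins" | awk -F= -v n="$name" '$1 == n { print $2 }')
        [ -z "$want" ] && continue
        if [ "$version" != "$want" ]; then
            echo "DRIFT $name: $version (pinned $want)"
            status=1
        fi
    done
    return "$status"
}

# Usage on ampere:
# pacman -Q | check_pins
```

This only detects drift after the fact; it does not stop pacman from upgrading anything.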
## Known failure modes (from prior fleet experience + Phase 0)

1. **HEVC kernel OOPS + v4l2_mem2mem wedge** (Phase 0 finding). Avoidance: skip HEVC.
2. **libva backend HEVC silent failure when the broken CI package binary is in use** (`marfrit-packages#17`). Avoidance: re-`md5sum` before Phase 3; redeploy hand-build if md5 doesn't match.
3. **mpv `--hwdec=vaapi` fallback to SW with no GL surface** (fresnel finding). Avoidance: use `-copy` variants headless.
4. **mpv MPEG-2 not on default hwdec allow-list** (fresnel finding). Avoidance: pass `--hwdec-codecs=all`.
5. **Firefox empty-profile vendor-default activation depends on `widget.dmabuf.force-enabled` being shipped in `rockchip-fourier-defaults.js`** (fresnel finding, marfrit-packages#8). Confirmed shipped in `firefox-fourier 150.0.1-5`. Install that exact version on ampere; do not regress to an earlier rev.
6. **vaDeriveImage / cached-mmap returns all-zero on RK3399 hantro CAPTURE** (memory `reference_dmabuf_resv_blocker`). It is an open question whether the same applies to RK3588 hantro CAPTURE. If Phase 3 sees all-zero HW output on VP8/MPEG-2, the same workaround chain (DMA-BUF GL via `mpv --vo=image`, or ffmpeg-v4l2request DRM_PRIME) applies, and the byte-compare anchor C3 must be done via DMA-BUF, not cached-mmap. Surface this if it shows up; don't pre-compensate.
7. **`pacman -Qo` lies about content** (Constraint 1 generalized). Trust `md5sum`, not pacman ownership records.
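
For failure mode 6, a cheap way to flag the symptom is to test whether a dumped raw output file consists solely of `0x00` bytes. This is a sketch under assumptions: `is_all_zero` is a hypothetical helper, and it reads "zero output" as "non-empty file with every byte zero", which may not match every way the issue shows up.

```shell
# Sketch: succeed only if the file is non-empty and every byte is 0x00.
# is_all_zero is a hypothetical helper for spotting the all-zero capture symptom.
is_all_zero() {
    # An empty or missing file is a different failure; report it as "not all-zero".
    [ -s "$1" ] || return 1
    # Delete every NUL byte and count what is left.
    nonzero=$(tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$nonzero" -eq 0 ]
}

# Usage (the dump path is illustrative):
# is_all_zero /tmp/vp8_hw_frame.raw && echo "all-zero HW output, see failure mode 6"
```

A legitimately black frame in some pixel formats is not all-zero, but a fully zeroed buffer can also come from other causes, so a hit calls for investigation rather than confirming the diagnosis.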
## Things this iteration does NOT depend on

To preempt "should we do X first" doubts during Phases 3-7:

- HEVC working: explicitly out of scope per Phase 1.
- VP9 working: out of scope.
- AV1 working: out of scope.
- `kdirect` rig on RK3588: Phase 1 picked SW-compare as the bit-exact anchor instead.
- Cross-host byte-compare against fresnel: per the Phase 0 carryover rule, numbers don't carry across hosts. Codec-byte-identity on RK3399 doesn't predict identity on RK3588 because the HW IDCT / deblock implementations differ between chip generations.
- DokuWiki for Phase 5 review: for iter1, Phase 5 uses an in-session `Plan` subagent with `model: sonnet` instead, per the campaign README. DokuWiki is the operator's archival channel, not a hard dependency.
## Phase 2 close

Substrate re-verified against the live system; two missing tools (`strace`, `firefox-fourier`) identified as Phase 4 plan items; seven constraints and seven known failure modes catalogued. Ready for Phase 3 baseline instrumentation.