4353db0cd0
Re-verifies Phase 0 substrate against live system (post-reboot ampere 2 min uptime, backend md5 still matches hand-build, source clips intact, memory list unchanged). Two missing tools flagged for Phase 4 to install (strace, firefox-fourier). Catalogues 7 constraints (backend file pacman-owned but content-unmanaged; kernel hand-managed; HEVC OOPS cascade; mpv --hwdec=vaapi needs GL surface; mpv MPEG-2 not on default hwdec allow-list; reboot authorization in scope; fresnel offline-able and not depended-on) and 7 known failure modes (HEVC oops, broken CI backend silent fail, hwdec SW fallback, MPEG-2 hwdec gate, firefox prefs version sensitivity, RK3399 vaDeriveImage zero-page issue as open Q for RK3588, pacman-Qo lying about file content). Explicit non-deps section (HEVC/VP9/AV1, kdirect-on-RK3588, cross-host byte-compare, DokuWiki) preempts scope creep doubts in Phase 3-7. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
7.7 KiB
Phase 2 — Situation analysis (iter1)
Context reset 2026-05-16 09:20. ampere uptime 2 min (post-reboot to clear the HEVC-OOPS-induced v4l2_mem2mem wedge captured in Phase 0). This phase re-verifies all of Phase 0's substrate against the live system and identifies the constraints, dependencies, and known failure modes that the Phase 4 plan must respect.
Re-verified substrate (against live system)
- ampere reachable (`ssh ampere` returns within 1s; uptime 2 min, kernel `7.0.0-rc3-devices+`).
- Backend binary intact as the hand-built copy: `/usr/lib/dri/v4l2_request_drv_video.so` md5 `0c9a7efaab6a31b74536ac1abff18dfa`, 484 800 B. Same as Phase 0: the reboot didn't bring the broken CI build back, because the file on disk is what's installed; `pacman -Qo` still claims the package owns it.
- All 5 source clips intact in `~/measurements/encoded/`. Iter1 uses 3 of them (h264, vp8, mpeg2). The hevc clip stays untouched, because opening it through the libva backend on this kernel triggers the OOPS cascade.
- Memory list unchanged since fresnel iter38 close. No new entries that would invalidate iter1 assumptions.
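The backend check above can be repeated at the start of each phase with a small helper. This is a minimal sketch, not the campaign's actual tooling: the function name `check_md5` is hypothetical, while the path and digest are the values recorded in this document.

```shell
# Values recorded in this document for the hand-built backend.
BACKEND=/usr/lib/dri/v4l2_request_drv_video.so
EXPECTED_MD5=0c9a7efaab6a31b74536ac1abff18dfa

# check_md5 FILE EXPECTED (hypothetical helper)
# Returns 0 when FILE's md5 equals EXPECTED, 1 when it differs or is unreadable.
check_md5() {
    file=$1
    expected=$2
    if [ ! -r "$file" ]; then
        echo "MISSING $file" >&2
        return 1
    fi
    # md5sum prints "<digest>  <path>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK $file"
        return 0
    fi
    echo "MISMATCH $file: got $actual, want $expected" >&2
    return 1
}

# On ampere, before any decode batch:
#   check_md5 "$BACKEND" "$EXPECTED_MD5" || echo "redeploy the hand-build first"
```

The helper only defines names; the real invocation is left commented so the snippet can be sourced on any host without touching `/usr/lib/dri`.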
Instrument prerequisites
| Tool | Present | Purpose |
|---|---|---|
| `ffmpeg` (from `ffmpeg-v4l2-request-fourier`) | yes | C1, C3-C5 driver |
| `mpv` (from `mpv-fourier 1:0.41.0-10`) | yes | secondary HW path check (`--hwdec=v4l2request-copy`) |
| `vainfo` | yes | C2 sanity (profile list) |
| `lsof` | yes | C2: show open `/dev/video*` fds during decode |
| `strace` | MISSING | C2: confirm `VIDIOC_S_EXT_CTRLS` / `MEDIA_REQUEST_IOC_QUEUE` ioctls on the right fds |
| `coredumpctl` | yes | residual-crash audit |
| `dmesg` | yes | C6: clean-dmesg check |
| `firefox-fourier` | MISSING | C7: vendor-default pref engagement test |
Two installs needed before Phase 3 can fully run; both are pacman-installable: `firefox-fourier` from the marfrit aarch64 repo, `strace` from Arch core.
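The presence column can be regenerated with a short probe. This is a sketch under assumptions: `missing_tools` is a hypothetical helper, and the binary name `firefox` for the `firefox-fourier` package is a guess that should be checked against the package contents.

```shell
# missing_tools NAME... (hypothetical helper)
# Prints each command name that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# On ampere (binary name "firefox" is an assumption):
#   missing_tools ffmpeg mpv vainfo lsof strace coredumpctl dmesg firefox
```

An empty result means every instrument in the table is callable; any printed name is a Phase 4 install item.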
Constraints
- Backend file is pacman-owned but content-unmanaged. `pacman -Qo /usr/lib/dri/v4l2_request_drv_video.so` reports it owned by `libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1`, but the file is the hand-built `0c9a7efa…` (485 KB), not the broken CI build `ae611d80…` (133 KB) the package shipped. Any `pacman -Syu` or `pacman -S libva-v4l2-request-fourier` would silently overwrite back to the broken binary. Treat this as fragile state: avoid full-system upgrades for the duration of iter1, and re-`md5sum` the file in every phase that's about to use it. Same constraint as fresnel (per the `marfrit-packages#17` thread).
- Booted kernel hand-managed in `/boot/firmware/` extlinux entries. The `linux-ampere-fourier 7.0rc3.kafr1-1` package shipped `Image-7.0.0-rc3-ARCH+` + the matching DTB + initramfs into `/boot/firmware/`, but the boot path itself is the existing extlinux config with `default arch_mainline`. A pacman update of the kernel would re-deposit files but the boot record stays as-is until something flips the default. Same operational shape as fresnel.
- HEVC OOPS cascades through `v4l2_mem2mem`. Once `rkvdec_hevc_prepare_hw_st_rps` faults on `__pi_memcmp`, every subsequent V4L2 m2m queue request from any device (rkvdec H.264, hantro VP8, hantro MPEG-2) blocks in futex wait inside libva. The blocking is recoverable only by reboot. Iter1 must not include HEVC in any decode batch; even one HEVC call wedges the whole rig and invalidates the rest of the sweep.
- `--hwdec=vaapi` (zero-copy) needs a GL/Vulkan surface. Headless ssh sessions have no surface, so plain `--hwdec=vaapi` falls back to SW silently. Iter1 mpv invocations use `--hwdec=vaapi-copy` or `--hwdec=v4l2request-copy`; both copy decoded frames to CPU memory and don't need a display. (Established on fresnel; recorded as a config rule for the campaign.)
- mpv default `--hwdec-codecs` excludes MPEG-2. Must pass `--hwdec-codecs=all` for the MPEG-2 path to even attempt HW. (Same fresnel finding.)
- Reboot authorization in scope. Per Phase 0 operator decision, reboot is authorized for ampere when v4l2 stack wedges. No fresh prompt needed inside iter1.
- fresnel offline-able. Pinebook Pro lid-close suspends; can't depend on fresnel being reachable mid-iteration. Iter1 doesn't need fresnel at all (everything happens on ampere), so this is just a guard against accidentally depending on cross-host fetches.
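The mpv-related constraints (copy-mode hwdec, `--hwdec-codecs=all`, no HEVC) can be folded into one wrapper so no invocation forgets them. This is a sketch, not the campaign's actual runner: `run_mpv_headless` and the `MPV` and `HWDEC` override variables are hypothetical, and detecting HEVC by file name assumes the clips carry the codec in their names.

```shell
# run_mpv_headless CLIP (hypothetical wrapper)
# Refuses HEVC clips, then decodes CLIP with a copy-mode hwdec and no display.
# MPV and HWDEC are illustrative overrides; defaults are mpv and vaapi-copy.
run_mpv_headless() {
    clip=$1
    case "$clip" in
        *hevc*|*HEVC*|*h265*|*H265*)
            # Assumption: the codec is visible in the file name.
            echo "refusing $clip: HEVC is excluded from iter1" >&2
            return 2
            ;;
    esac
    "${MPV:-mpv}" --no-config \
        --hwdec="${HWDEC:-vaapi-copy}" \
        --hwdec-codecs=all \
        --vo=null --no-audio \
        "$clip"
}

# On ampere, for the second copy path named above:
#   HWDEC=v4l2request-copy run_mpv_headless ~/measurements/encoded/<clip>
```

The name-based guard is a convenience, not a safety boundary: a mislabelled HEVC file would still reach the decoder, so the clip list itself should exclude it.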
Dependencies (external-to-this-iteration)
| Dependency | Status | Risk if it changes |
|---|---|---|
| `linux-ampere-fourier 7.0rc3.kafr1-1` (kernel) | installed, booted | A pacman update could replace `/boot/firmware/Image-7.0.0-rc3-ARCH+` while the OOPS-prone HEVC code path is still present; that would invalidate Phase 3 measurements if rerun on a different kernel. Don't `pacman -Syu` during iter1. |
| `libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1` (backend) | hand-built `0c9a7efa…` installed over the package | Same; see Constraint 1. |
| `ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-4` | installed | If updated mid-iter1, hwaccel behavior could shift. Pin for the duration. |
| `mpv-fourier 1:0.41.0-10` | installed | Pin. |
| `libva 2.23.0-1` | installed | Pin. |
| `linux-api-headers 6.19-1` | installed | Pin (V4L2 control struct layouts come from here). |
| Source clips in `~/measurements/encoded/` | present | Re-encoding would change file contents; treat as immutable for iter1. |
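Version drift in the pinned packages can be detected by comparing `pacman -Q` output against the versions listed in this table. A minimal sketch, assuming `pacman -Q <pkg>` prints `<name> <version>`; `check_pins` and the `PKG_QUERY` override are hypothetical names introduced for illustration.

```shell
# Pinned versions as recorded in the dependency table.
PINS='linux-ampere-fourier 7.0rc3.kafr1-1
libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1
ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-4
mpv-fourier 1:0.41.0-10
libva 2.23.0-1
linux-api-headers 6.19-1'

# check_pins (hypothetical helper)
# Returns 0 when every package reports its pinned version, 1 on any drift.
# PKG_QUERY is an illustrative override; the default is "pacman -Q".
check_pins() {
    rc=0
    while read -r pkg want; do
        [ -n "$pkg" ] || continue
        # Output is "<name> <version>"; keep the version field.
        have=$(${PKG_QUERY:-pacman -Q} "$pkg" </dev/null 2>/dev/null | cut -d' ' -f2)
        if [ "$have" = "$want" ]; then
            echo "ok    $pkg $have"
        else
            echo "DRIFT $pkg: have '${have:-none}', want '$want'"
            rc=1
        fi
    done <<EOF
$PINS
EOF
    return "$rc"
}
```

A matching version string says nothing about file contents (see Constraint 1), so this check complements the md5 check and does not replace it. Listing the same packages under `IgnorePkg` in `/etc/pacman.conf` is one way to enforce the pin, if the operator wants it.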
Known failure modes (from prior fleet experience + Phase 0)
- HEVC kernel OOPS + v4l2_mem2mem wedge (Phase 0 finding). Avoidance: skip HEVC.
- libva backend HEVC silent failure when the broken CI package binary is in use (`marfrit-packages#17`). Avoidance: re-`md5sum` before Phase 3; redeploy hand-build if md5 doesn't match.
- mpv `--hwdec=vaapi` fallback to SW with no GL surface (fresnel finding). Avoidance: use `-copy` variants headless.
- mpv MPEG-2 not on default hwdec allow-list (fresnel finding). Avoidance: pass `--hwdec-codecs=all`.
- Firefox empty-profile vendor-default activation depends on `widget.dmabuf.force-enabled` being shipped in `rockchip-fourier-defaults.js` (fresnel finding, marfrit-packages#8). Confirmed shipped in `firefox-fourier 150.0.1-5`. Install that exact version on ampere; do not regress to an earlier rev.
- vaDeriveImage / cached-mmap returns all-zero on RK3399 hantro CAPTURE (memory `reference_dmabuf_resv_blocker`). Open question whether same applies to RK3588 hantro CAPTURE. If Phase 3 sees zero-byte HW output on VP8/MPEG-2, the same workaround chain (DMA-BUF GL via `mpv --vo=image`, or ffmpeg-v4l2request DRM_PRIME) applies, and the byte-compare anchor C3 must be done via DMA-BUF not cached-mmap. Surface this if it shows up; don't pre-compensate.
- `pacman -Qo` lies about content (Constraint 1 generalized). Trust `md5sum`, not pacman ownership records.
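The first failure mode is the one that invalidates a whole sweep, so a quick kernel-log scan between decode batches is cheap insurance. This is a sketch only: `scan_kernel_log` is a hypothetical helper, and the match patterns are generic arm64 oops markers chosen for illustration, not strings captured from the Phase 0 log.

```shell
# scan_kernel_log (hypothetical helper)
# Reads kernel log text on stdin, prints suspicious lines,
# and returns 1 if any were found, 0 if the log looks clean.
scan_kernel_log() {
    # Patterns are assumptions; extend them with strings from the real OOPS.
    if grep -E -i 'oops|unable to handle kernel|call trace:'; then
        return 1
    fi
    return 0
}

# On ampere (reading the log may need root):
#   dmesg | scan_kernel_log || echo "kernel log not clean: stop the sweep"
```

A clean scan does not prove the m2m stack is healthy; a decode that hangs in futex wait is still the primary symptom of the wedge.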
Things this iteration does NOT depend on
To preempt "should we do X first" doubts during Phase 3-7:
- HEVC working — explicitly out of scope per Phase 1.
- VP9 working — out of scope.
- AV1 working — out of scope.
- `kdirect` rig on RK3588: Phase 1 picked SW-compare as the bit-exact anchor instead.
- Cross-host byte-compare against fresnel. Phase 0 carryover rule: numbers don't carry across hosts. Codec-byte-identity on RK3399 doesn't predict identity on RK3588 because the HW IDCT / deblock implementations differ between chip generations.
- DokuWiki for Phase 5 review: for iter1, Phase 5 uses an in-session `Plan` subagent with `model: sonnet` instead, per the campaign README. DokuWiki is the operator's archival channel, not a hard dependency.
Phase 2 close
Substrate re-verified against live system; two missing tools (strace, firefox-fourier) identified as Phase 4 plan items; seven constraints and seven known failure modes catalogued. Ready for Phase 3 baseline instrumentation.