ampere-fourier/phase2_situation.md
marfrit 4353db0cd0 iter1 phase2: situation analysis + reset-context
Re-verifies Phase 0 substrate against live system (post-reboot ampere
2 min uptime, backend md5 still matches hand-build, source clips
intact, memory list unchanged). Two missing tools flagged for Phase 4
to install (strace, firefox-fourier).

Catalogues 7 constraints (backend file pacman-owned but content-
unmanaged; kernel hand-managed; HEVC OOPS cascade; mpv --hwdec=vaapi
needs GL surface; mpv MPEG-2 not on default hwdec allow-list; reboot
authorization in scope; fresnel offline-able and not depended-on)
and 7 known failure modes (HEVC oops, broken CI backend silent fail,
hwdec SW fallback, MPEG-2 hwdec gate, firefox prefs version
sensitivity, RK3399 vaDeriveImage zero-page issue as open Q for
RK3588, pacman-Qo lying about file content).

Explicit non-deps section (HEVC/VP9/AV1, kdirect-on-RK3588, cross-host
byte-compare, DokuWiki) preempts scope creep doubts in Phase 3-7.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 07:21:07 +00:00


Phase 2 — Situation analysis (iter1)

Context reset 2026-05-16 09:20. ampere uptime 2 min (post-reboot to clear the HEVC-OOPS-induced v4l2_mem2mem wedge captured in Phase 0). This phase re-verifies all of Phase 0's substrate against the live system and identifies the constraints, dependencies, and known failure modes that the Phase 4 plan must respect.

Re-verified substrate (against live system)

  • ampere reachable (ssh ampere returns within 1s; uptime 2 min, kernel 7.0.0-rc3-devices+).
  • Backend binary intact as the hand-built copy: /usr/lib/dri/v4l2_request_drv_video.so md5 0c9a7efaab6a31b74536ac1abff18dfa, 484 800 B. Same as Phase 0: a reboot does not reinstall package files, so the hand-built file on disk survived and the broken CI build did not come back. pacman -Qo still claims the package owns it.
  • All 5 source clips intact in ~/measurements/encoded/. Iter1 uses 3 of them (h264, vp8, mpeg2). The hevc clip stays untouched — opening it through the libva backend on this kernel triggers the OOPS cascade.
  • Memory list unchanged since fresnel iter38 close. No new entries that would invalidate iter1 assumptions.
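The backend check in the list above can be repeated before each phase with a short script. This is a minimal sketch, assuming Python 3 is available on ampere; the path, MD5, and size are copied from the bullet above, while the function names are illustrative and not part of the campaign tooling.

```python
import hashlib
import os

# Values copied from the re-verified substrate list.
BACKEND_PATH = "/usr/lib/dri/v4l2_request_drv_video.so"
EXPECTED_MD5 = "0c9a7efaab6a31b74536ac1abff18dfa"
EXPECTED_SIZE = 484_800  # bytes


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backend_is_hand_build(path=BACKEND_PATH, md5=EXPECTED_MD5, size=EXPECTED_SIZE):
    """True only if the file exists and both size and MD5 match the hand-build."""
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) != size:
        return False
    return file_md5(path) == md5
```

Calling `backend_is_hand_build()` with no arguments checks the installed backend. The size comparison is only a cheap early exit; the MD5 is what decides.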

Instrument prerequisites

| Tool | Present | Purpose |
|---|---|---|
| ffmpeg (from ffmpeg-v4l2-request-fourier) | yes | C1, C3-C5 driver |
| mpv (from mpv-fourier 1:0.41.0-10) | yes | secondary HW path check (--hwdec=v4l2request-copy) |
| vainfo | yes | C2 sanity (profile list) |
| lsof | yes | C2: show open /dev/video* fds during decode |
| strace | MISSING | C2: confirm VIDIOC_S_EXT_CTRLS / MEDIA_REQUEST_IOC_QUEUE ioctls on the right fds |
| coredumpctl | yes | residual-crash audit |
| dmesg | yes | C6: clean-dmesg check |
| firefox-fourier | MISSING | C7: vendor-default pref engagement test |

Two installs needed before Phase 3 can fully run; both are pacman-installable (firefox-fourier from the marfrit aarch64 repo, strace from arch core).
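The Present column can be regenerated with a PATH lookup. The following is a sketch under stated assumptions: the executable names are guesses derived from the tool names in the table, and in particular the binary shipped by firefox-fourier is assumed to be called `firefox`, which this document does not confirm.

```python
import shutil

# Executable names are assumptions; "firefox" stands in for whatever binary
# the firefox-fourier package actually installs.
REQUIRED_TOOLS = ["ffmpeg", "mpv", "vainfo", "lsof", "strace",
                  "coredumpctl", "dmesg", "firefox"]


def missing_tools(names=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]
```

On the state described here, `missing_tools()` would be expected to list `strace` and the Firefox binary. The `which` parameter exists only so the lookup can be replaced in a dry run.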

Constraints

  1. Backend file is pacman-owned but content-unmanaged. pacman -Qo /usr/lib/dri/v4l2_request_drv_video.so reports it owned by libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1, but the file is the hand-built 0c9a7efa… (485 KB), not the broken CI build ae611d80… (133 KB) the package shipped. Any pacman -Syu or pacman -S libva-v4l2-request-fourier would silently overwrite back to the broken binary. Treat this as fragile state: avoid full-system upgrades for the duration of iter1, and re-md5sum the file in every phase that's about to use it. Same constraint as fresnel (per the marfrit-packages#17 thread).
  2. Booted kernel hand-managed in /boot/firmware/ extlinux entries. The linux-ampere-fourier 7.0rc3.kafr1-1 package shipped Image-7.0.0-rc3-ARCH+ + the matching DTB + initramfs into /boot/firmware/, but the boot path itself is the existing extlinux config with default arch_mainline. A pacman update of the kernel would re-deposit files but the boot record stays as-is until something flips the default. Same operational shape as fresnel.
  3. HEVC OOPS cascades through v4l2_mem2mem. Once rkvdec_hevc_prepare_hw_st_rps faults on __pi_memcmp, every subsequent V4L2 m2m queue request from any device (rkvdec H.264, hantro VP8, hantro MPEG-2) blocks in futex wait inside libva. The blocking is recoverable only by reboot. Iter1 must not include HEVC in any decode batch; even one HEVC call wedges the whole rig and invalidates the rest of the sweep.
  4. --hwdec=vaapi (zero-copy) needs a GL/Vulkan surface. Headless ssh sessions have no surface, so plain --hwdec=vaapi falls back to SW silently. Iter1 mpv invocations use --hwdec=vaapi-copy or --hwdec=v4l2request-copy — both copy decoded frames to CPU memory and don't need a display. (Established on fresnel; recorded as a config rule for the campaign.)
  5. mpv default --hwdec-codecs excludes MPEG-2. Must pass --hwdec-codecs=all for the MPEG-2 path to even attempt HW. (Same fresnel finding.)
  6. Reboot authorization in scope. Per Phase 0 operator decision, reboot is authorized for ampere when v4l2 stack wedges. No fresh prompt needed inside iter1.
  7. fresnel offline-able. Pinebook Pro lid-close suspends; can't depend on fresnel being reachable mid-iteration. Iter1 doesn't need fresnel at all (everything happens on ampere), so this is just a guard against accidentally depending on cross-host fetches.
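Constraints 3 to 5 can be folded into one guard that builds the mpv command line. This is a minimal sketch, not the campaign's actual harness: it uses only the mpv options named above, the codec labels (`h264`, `vp8`, `mpeg2`, `hevc`) follow the clip list in this document, and the clip filename in the usage note is hypothetical.

```python
# Codecs that must never enter a decode batch on this kernel (Constraint 3).
FORBIDDEN_CODECS = {"hevc"}
# Copy-back hwdec modes that need no GL/Vulkan surface (Constraint 4).
HEADLESS_HWDEC = {"vaapi-copy", "v4l2request-copy"}


def mpv_headless_args(clip, codec, hwdec="vaapi-copy"):
    """Build an mpv argument list that respects Constraints 3, 4 and 5."""
    codec = codec.lower()
    if codec in FORBIDDEN_CODECS:
        raise ValueError(f"{codec} would wedge the V4L2 m2m stack; refusing")
    if hwdec not in HEADLESS_HWDEC:
        raise ValueError(f"--hwdec={hwdec} falls back to SW without a surface")
    args = ["mpv", f"--hwdec={hwdec}"]
    if codec == "mpeg2":
        # MPEG-2 is not on mpv's default hwdec codec list (Constraint 5).
        args.append("--hwdec-codecs=all")
    args.append(clip)
    return args
```

For example, `mpv_headless_args("clip_mpeg2.ts", "mpeg2")` returns `["mpv", "--hwdec=vaapi-copy", "--hwdec-codecs=all", "clip_mpeg2.ts"]`, while any HEVC request raises before a process is started.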

Dependencies (external-to-this-iteration)

| Dependency | Status | Risk if it changes |
|---|---|---|
| linux-ampere-fourier 7.0rc3.kafr1-1 (kernel) | installed, booted | A pacman update could replace /boot/firmware/Image-7.0.0-rc3-ARCH+ while the OOPS-prone HEVC code path is still present; Phase 3 measurements would be invalidated if rerun on a different kernel. Don't pacman -Syu during iter1. |
| libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1 (backend) | hand-built 0c9a7efa… installed over the package | Same; see Constraint 1. |
| ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-4 | installed | If updated mid-iter1, hwaccel behavior could shift. Pin for the duration. |
| mpv-fourier 1:0.41.0-10 | installed | Pin. |
| libva 2.23.0-1 | installed | Pin. |
| linux-api-headers 6.19-1 | installed | Pin (V4L2 control struct layouts come from here). |
| Source clips in ~/measurements/encoded/ | present | Re-encoding would change file contents; treat as immutable for iter1. |
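Version drift against the pins above can be detected by comparing them with `pacman -Q` output, which prints one `name version` pair per line. The sketch below assumes that output has been captured into a string; the pinned versions are copied from the table, and the helper names are illustrative.

```python
# Pinned versions copied from the dependency table.
PINS = {
    "linux-ampere-fourier": "7.0rc3.kafr1-1",
    "libva-v4l2-request-fourier": "1.0.0.r348.7ac934e-1",
    "ffmpeg-v4l2-request-fourier": "2:8.1.r123329.b57fbbe-4",
    "mpv-fourier": "1:0.41.0-10",
    "libva": "2.23.0-1",
    "linux-api-headers": "6.19-1",
}


def parse_pacman_q(text):
    """Parse `pacman -Q` style output ("name version" per line) into a dict."""
    installed = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            installed[parts[0]] = parts[1]
    return installed


def drifted(pins, installed):
    """Map each drifted package to (pinned, installed); installed is None if absent."""
    return {
        name: (want, installed.get(name))
        for name, want in pins.items()
        if installed.get(name) != want
    }
```

An empty result from `drifted(PINS, parse_pacman_q(output))` means no pinned package changed version. It says nothing about the backend's file content, which still needs the MD5 check from Constraint 1.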

Known failure modes (from prior fleet experience + Phase 0)

  1. HEVC kernel OOPS + v4l2_mem2mem wedge (Phase 0 finding). Avoidance: skip HEVC.
  2. libva backend HEVC silent failure when the broken CI package binary is in use (marfrit-packages#17). Avoidance: re-md5sum before Phase 3; redeploy hand-build if md5 doesn't match.
  3. mpv --hwdec=vaapi fallback to SW with no GL surface (fresnel finding). Avoidance: use -copy variants headless.
  4. mpv MPEG-2 not on default hwdec allow-list (fresnel finding). Avoidance: pass --hwdec-codecs=all.
  5. Firefox empty-profile vendor-default activation depends on widget.dmabuf.force-enabled being shipped in rockchip-fourier-defaults.js (fresnel finding, marfrit-packages#8). Confirmed shipped in firefox-fourier 150.0.1-5. Install that exact version on ampere; do not regress to an earlier rev.
  6. vaDeriveImage / cached-mmap returns all-zero on RK3399 hantro CAPTURE (memory reference_dmabuf_resv_blocker). Open question whether same applies to RK3588 hantro CAPTURE — if Phase 3 sees zero-byte HW output on VP8/MPEG-2, the same workaround chain (DMA-BUF GL via mpv --vo=image, or ffmpeg-v4l2request DRM_PRIME) applies, and the byte-compare anchor C3 must be done via DMA-BUF not cached-mmap. Surface this if it shows up; don't pre-compensate.
  7. pacman -Qo lies about content (Constraint 1 generalized). Trust md5sum, not pacman ownership records.
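Failure mode 6 is easy to miss because an all-zero frame dump still has a plausible size. A minimal detector, assuming the HW-decoded frames were written to a raw file first (how that dump is produced is outside this sketch, and the function name is illustrative):

```python
def is_all_zero(path, chunk_size=1 << 20):
    """True if the file is non-empty and every byte in it is 0x00."""
    seen_data = False
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # bytes.count(0) counts zero-valued bytes in the chunk.
            if chunk.count(0) != len(chunk):
                return False
    return seen_data
```

An empty file returns False here on purpose, so that a decode that produced no output at all is reported separately from one that produced zero-filled frames.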

Things this iteration does NOT depend on

To preempt "should we do X first" doubts during Phase 3-7:

  • HEVC working — explicitly out of scope per Phase 1.
  • VP9 working — out of scope.
  • AV1 working — out of scope.
  • kdirect rig on RK3588 — Phase 1 picked SW-compare as the bit-exact anchor instead.
  • Cross-host byte-compare against fresnel — Phase 0 carryover rule: numbers don't carry across hosts. Codec-byte-identity on RK3399 doesn't predict identity on RK3588 because the HW IDCT / deblock implementations differ between chip generations.
  • DokuWiki for Phase 5 review — for iter1, Phase 5 uses an in-session Plan subagent with model: sonnet instead, per the campaign README. DokuWiki is the operator's archival channel, not a hard dependency.

Phase 2 close

Substrate re-verified against live system; two missing tools (strace, firefox-fourier) identified as Phase 4 plan items; seven constraints and seven known failure modes catalogued. Ready for Phase 3 baseline instrumentation.