libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1: CI binary segfaults on HEVC vaEndPicture; LTO/ICF suspected in build pipeline #17
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
libva-v4l2-request-fourier: CI binary 4× smaller than hand-build + HEVC SEGFAULT — LTO suspected
libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1(published inmarfritaarch64 repo, built today onfermiLXC) segfaults on HEVC decode invaEndPicturepath. Same source7ac934ebuilt locally on the consumer host (fresnel, Pinebook Pro) produces a working binary. Likely aggressive optimization in the CI build pipeline that the PKGBUILD doesn't pin.Affects the iter38 close: HEVC was certified byte-exact on May 14 with the hand-built backend; today's first attempt to consume the packaged backend reverted that result. The other 4 codecs (H.264, VP8, VP9, MPEG-2) work fine on both the CI and the hand-built binary — only HEVC trips.
Reproducer
On fresnel after
pacman -S libva-v4l2-request-fourier:Same input + same args run against
mpv --hwdec=v4l2request-copyworks (60 s clip in 6 s = 236 FPS) — kernel + libavcodec HEVC path is fine; only the libva backend's HEVC code path crashes.Core dump trace
Addresses unresolvable on the stripped CI binary (
addr2linereturns the only exported symbol__vaDriverInit_1_23for every offset). My local debug build has different layout so symbol mapping by offset isn't reliable across builds.Build-flag bisection
Same source
7ac934e, four binary variants tested on fresnel:meson setup build(defaultdebugoptimized, no strip)0c9a7efa…arch-meson build --buildtype=release(mirrors PKGBUILD)06650dc6…arch-meson build --buildtype=release+CFLAGS=-flto7b4387f1…marfrit-packagesPKGBUILD via makepkg on fermi)ae611d80…Pattern:
arch-meson --buildtype=releaseshould not be doing alone. Something extra is being set at makepkg time.makepkg.conf on both fermi (CI builder) and fresnel (consumer / local builder) reads
OPTIONS=(strip docs !libtool !staticlibs emptydirs zipman purge !debug !lto)— i.e. LTO is explicitly opted out in both. So the source of LTO (or LTO-shaped behaviour) producing the CI binary is somewhere outside the visible config.Candidates to chase:
arch-meson's wrapper unconditionally sets-Db_pie=trueand--buildtype=plain(then PKGBUILD reapplies--buildtype=release). The interaction withb_pieon aarch64 may pull in-fipa-icf/-flto-partition=one/ cross-function ICF that has the same dead-code-eliminating effect as LTO under arch-meson defaults. Confirm what flags the actualgccinvocation got.fermi's makepkg may be picking up a user-level override file (
~/.makepkg.conf,pacman-key's package signing wrapper, or a buildah/podman environment file from the CI workflowbin/marfrit-publishinvocation). The repository-level workflow file (marfrit-packages/.gitea/workflows/build.yml) may inject env vars that aren't in/etc/makepkg.conf.ABI mismatch:
linux-api-headers 6.19-1,libva 2.23.0-1,libdrm 2.4.133-1are byte-identical on fermi vs fresnel (I sha256'd/usr/include/linux/v4l2-controls.hon both — same), so the V4L2 stateless HEVC control structs are layout-identical. Header skew is not the cause.Compiler version skew: not yet measured. Worth recording
gcc --versionon both hosts in the CI log.Why HEVC specifically (and not the other 4)
HEVC is the only one of the five campaign codecs that submits a chain of multiple control structs per frame via vaRenderPicture (SPS + PPS + DECODE_PARAMS + SLICE_PARAMS), and the iter25/iter31 fixes hinge on which struct a given field lives in (the
st_rps_bitsslice-vs-decode-params routing). If a function that copies one of these structs's payload (e.g. the per-codechevc_set_controlshelper, ormedia_request_queuecalling into a helper that's been ICF'd against an MPEG-2 lookalike) is wrongly merged or removed by aggressive optimisation, HEVC blows up while the simpler-control-chain codecs survive.The crash offset
+0x9d74(in the source-and-build-flag-resolution my local symbol map suggests) lands within ~40 bytes ofv4l2_timeval_to_ns(84-byte helper). Ifv4l2_timeval_to_nsgot ICF-merged with a similarly-shaped helper from the HEVC path, the wrong instance's local stack layout would be invoked and a structured-fill loop in HEVC would overshoot — that's consistent with a crash aftervaEndPicturefinishes the queue submission, in a per-codec helper.Workaround on consumer hosts (until the build is fixed)
pacman will keep reporting the file as owned by the package, but
pacman -Syuwill silently overwrite it back to the broken CI build on next package upgrade.Suggested fix path
options=('!strip' '!lto')and verify that what gets shipped is the 145 KB binary (matches my local arch-meson reproduction).meson_optionsoverrides:-Db_lto=false -Db_ndebug=falseand skip the strip step inpackage().makepkgproduces the .pkg.tar.zst, runpacman -U <pkg>in a chroot, exercise an HEVC decode viaffmpeg -hwaccel vaapi, fail the workflow if the exit code is non-zero. Cheap regression check; catches this class of bug pre-publish.Severity
The campaign-close criterion (libva == kdirect byte-exact on all 5 codecs) holds only on the hand-built backend. Anyone installing from the marfrit repo today gets a regression. Two consumers definitely affected:
ffmpeg -hwaccel vaapiHEVC path,firefox-fourierHEVC autoplay (HW path silently fails after the codec is selected — content stays SW-decoded).mpv --hwdec=v4l2request-copyis unaffected because it bypasses libva entirely.Refs
marfrit/libva-v4l2-request-fourier @ 7ac934e(iter38b: bounds check uses MAX_PROFILES)marfrit/fresnel-fourier @ e66c5c0feedback_va_st_rps_bits_is_slice_field(the iter31 fix)/var/lib/systemd/coredump/core.ffmpeg.1000.ce5a0dca7def454eaafa3e2383970a76.12611.1778870067000000.zst/var/lib/systemd/coredump/core.ffmpeg.1000.ce5a0dca7def454eaafa3e2383970a76.12635.1778870069000000.zst