33 Commits

Author SHA1 Message Date
marfrit 9bf97fdb49 ffmpeg-v4l2-request-fourier: PKGREL 3 → 5 (force rebuild past orphan -4 .deb)
PR #76 (H.264 IDCT 4×4 daedalus-fourier substitution) was merged but
the resulting .deb was not actually built: an orphan
ffmpeg-v4l2-request-fourier_8.1+rfourier+gb57fbbe-4_arm64.deb (dated
2026-05-19, no matching source commit in main) sat in the apt pool.
.gitea/scripts/check-already-published.sh's debian branch compares
`dpkg --compare-versions $pool_ver ge $source_full` — pool -4
≥ source -3, so CI's skip-check emitted skip=1 and short-circuited
the build.  The ffmpeg-v4l2-request-debian Action reported success
without actually publishing.

Bump source PKGREL past -4 so the next CI run sees source > pool
and proceeds to build + publish.

No source code change beyond PKGREL + changelog.  Arch side
unaffected (its skip-check is exact-URL-match, not pool-head-ge).
2026-05-21 22:17:00 +02:00
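The skip decision described in this commit reduces to a single comparison. The following is a minimal sketch, assuming the upstream version part is identical on both sides so that only the integer package revision differs. `should_skip_build` is a hypothetical name; the real check is the `dpkg --compare-versions` call in check-already-published.sh, which compares full Debian version strings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplification of the skip-check. The real script compares
 * full Debian version strings with `dpkg --compare-versions`; only the
 * integer package revision (the "-N" suffix) is modelled here. */
bool should_skip_build(int pool_rev, int source_rev)
{
    assert(pool_rev >= 0 && source_rev >= 0);
    /* "pool ge source": the pool already holds this revision or a newer one. */
    return pool_rev >= source_rev;
}
```

With the orphan -4 in the pool, `should_skip_build(4, 3)` is true (the build is skipped) and `should_skip_build(4, 5)` is false (the build runs). A source revision of exactly -4 would still be skipped, which is why the bump has to go strictly past the pool revision.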
marfrit a536e20218 Merge pull request 'ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 4×4 → daedalus-fourier' (#76) from claude-noether/marfrit-packages:noether/ffmpeg-fourier-idct4-daedalus into main
Reviewed-on: marfrit/marfrit-packages#76
2026-05-21 19:57:31 +00:00
marfrit a1dba5f630 Merge remote-tracking branch 'origin/main' into noether/ffmpeg-fourier-idct4-daedalus 2026-05-21 21:56:41 +02:00
marfrit 88a65cb6d0 CI: add cmake / ninja-build / libvulkan-dev / glslang-tools to ffmpeg-debian deps
The ffmpeg-v4l2-request-debian job now needs to build daedalus-fourier
into a temp prefix before configuring FFmpeg (substitution patch
0003-h264-idct4-daedalus-fourier.patch links libdaedalus_core.a into
libavcodec.so).  Mirror the build-deps the daedalus-v4l2-debian job
already declared for the same reason.

No-op on Arch — makepkg --syncdeps auto-installs cmake/ninja/
vulkan-headers from the PKGBUILD makedepends.
2026-05-21 21:47:48 +02:00
marfrit e641d679d3 ffmpeg-v4l2-request-fourier: substitute H.264 IDCT 4×4 → daedalus-fourier
First cycle of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2).  H264DSPContext.idct_add — called per 4×4 block from the
intra-4×4 decode path in libavcodec/h264_mb.c — now dispatches through
daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.

## What

- Add 0003-h264-idct4-daedalus-fourier.patch (in both arch/ and
  debian/ ffmpeg-v4l2-request-fourier/).  Creates
  libavcodec/aarch64/h264_idct_daedalus.c (ff_h264_idct_add_daedalus
  shim + lazy pthread_once context init via
  daedalus_ctx_create_no_qpu), patches
  libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct_add to
  the shim, adds the new .o to libavcodec/aarch64/Makefile.
- arch/PKGBUILD + debian/build-deb.sh: fetch + build
  daedalus-fourier (pinned at d87239d — lockstep with the
  daedalus-v4l2 daemon's inline build) with
  -DCMAKE_POSITION_INDEPENDENT_CODE=ON into a per-build temp prefix,
  then pass --extra-cflags=-I.../include --extra-ldflags=-L.../lib
  --extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg
  configure.  daedalus_core.a is static-linked into libavcodec.so.62.
- debian/control Depends gains libvulkan1 (daedalus_core PUBLIC-links
  Vulkan::Vulkan for the queryable QPU substrate; the no-QPU
  constructor still works at runtime but the loader needs
  libvulkan.so.1 present to dlopen libavcodec.so.62).
- arch/PKGBUILD depends gains vulkan-icd-loader, makedepends gains
  cmake / ninja / vulkan-headers.

## Why

The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4×4)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution with one extra dispatch call and recipe-table lookup.
The point of this first cycle isn't perf wins — it's plumbing.  Once
the path is wired and stable, follow-up patches batch through the
bulk paths (idct_add16 / idct_add16intra / idct_add8) and stack
cycles 7/8/9 (IDCT 8×8, luma-v deblock, qpel mc20).

Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6
green; FFmpeg's 4×4 block storage matches daedalus's column-major
convention).

## Scope NOT covered

- Bulk paths (idct_add16 / idct_add16intra / idct_add8) — most IDCT
  4×4 calls in real H.264 streams go through these, not the per-
  block c->idct_add path; intra-4×4-only macroblocks are a minority.
  Batched substitution lands in a follow-up.
- High-bit-depth (10-bit) path — not touched; 8-bit only.
- Cycles 7/8/9 — separate PRs.

## SONAME

Unchanged.  libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.
No daedalus-v4l2-dkms or daedalus-v4l2 bump required.

## Refs

- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit/daedalus-fourier cycle 6 close (H.264 IDCT 4×4 NEON green)
2026-05-21 21:44:35 +02:00
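The bit-exactness note in this commit relies on both sides agreeing on the 4×4 block storage order. As an illustration only, the sketch below spells out the column-major indexing that the patch comment states (`block[r + 4*c]`). The helper names `idct4_coeff_index` and `idct4_coeff_at` are hypothetical and belong to neither FFmpeg nor daedalus-fourier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the coefficient at (row, col) in a 4x4 block stored
 * column-major, i.e. block[row + 4 * col]. Hypothetical helper. */
size_t idct4_coeff_index(unsigned row, unsigned col)
{
    assert(row < 4 && col < 4);
    return (size_t)row + 4u * (size_t)col;
}

/* Read one coefficient from a 16-entry block using that layout. */
int16_t idct4_coeff_at(const int16_t block[16], unsigned row, unsigned col)
{
    return block[idct4_coeff_index(row, col)];
}
```

For example, the coefficient at row 1, column 2 lives at index 9, and walking down a column touches consecutive entries.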
marfrit 877238bd1b Merge pull request 'daedalus-v4l2: 3bc0da1 -> 6e6dfa1 — dlopen Kwiboo soname 62 + CI build-deps swap' (#75) from claude-noether/marfrit-packages:noether/daedalus-bump-6e6dfa1-soname62 into main
Reviewed-on: marfrit/marfrit-packages#75
2026-05-21 19:26:39 +00:00
claude-noether 27617e4cb0 daedalus-v4l2: 3bc0da1 -> 6e6dfa1 — dlopen Kwiboo soname 62 (#16) 2026-05-21 21:24:03 +02:00
marfrit a2daab1b28 Merge pull request 'daedalus-v4l2: 77e14e5 -> 3bc0da1 — decode_us + periodic stats' (#74) from claude-noether/marfrit-packages:noether/daedalus-bump-3bc0da1 into main
Reviewed-on: marfrit/marfrit-packages#74
2026-05-21 18:50:46 +00:00
claude-noether 9146e83710 daedalus-v4l2: 77e14e5 -> 3bc0da1 — decode_us + periodic stats (#15) 2026-05-21 20:29:07 +02:00
marfrit abf8fb3077 Merge pull request 'ci: add libvulkan-dev + glslang-tools for daedalus-fourier build dep' (#73) from claude-noether/marfrit-packages:noether/ci-fourier-build-deps into main
Reviewed-on: marfrit/marfrit-packages#73
2026-05-21 18:05:59 +00:00
claude-noether 1414dfeac2 .gitea/workflows: add libvulkan-dev + glslang-tools to daedalus-v4l2 Debian build deps
The daedalus-v4l2 build-deb.sh (post marfrit-packages#72) now fetches
+ cmake-builds daedalus-fourier into a per-build temp prefix before
building the daemon, so the static-archive can be linked in.
daedalus-fourier's CMakeLists requires Vulkan headers and glslangValidator
(for SPIR-V compilation of the .comp compute shaders).  Without them
the configure step on the debian-aarch64 runner fails with:

  CMake Error at FindPackageHandleStandardArgs.cmake:233 (message):
    Could NOT find Vulkan (missing: Vulkan_LIBRARY Vulkan_INCLUDE_DIR)

(Observed on Gitea Actions run 1056.)

Add `libvulkan-dev` and `glslang-tools` to the apt-get install line so
the in-build daedalus-fourier compile succeeds and the daemon can link.
2026-05-21 19:58:19 +02:00
marfrit 41c1e0b6b9 Merge pull request 'daedalus-v4l2: 5d8b436 -> 77e14e5 — #12 (LOW_DELAY) + #13 (daedalus-fourier linkage)' (#72) from claude-noether/marfrit-packages:noether/daedalus-bump-77e14e5-with-fourier into main
Reviewed-on: marfrit/marfrit-packages#72
2026-05-21 17:15:12 +00:00
claude-noether c9a4b82f2c daedalus-v4l2: 5d8b436 -> 77e14e5 — picks up #12 (LOW_DELAY) + #13 (daedalus-fourier linkage)
Daemon-only bump (no daedalus-v4l2-dkms change needed; PROTO_VERSION
stays at 0).

#12 (LOW_DELAY half-measure): daemon sets AV_CODEC_FLAG_LOW_DELAY on
the H.264 AVCodecContext so libavcodec emits frames in decode order
~99% of the time (a few stragglers at GOP boundaries when the
stream's SPS num_reorder_frames overrides the flag).  Visible
improvement vs the 2-1-4-3 pair-swap on Firefox + mpv playback;
not the permanent fix — see daedalus-v4l2#11 for the architectural
plan to substitute daedalus-fourier kernels for libavcodec's
pixel math one cycle at a time.

#13 (daedalus-fourier linkage): daemon now pkg-config-links against
the daedalus-fourier kernel library (marfrit/daedalus-fourier) and
logs substrate availability at startup.  No kernels dispatched yet
— this is the build-time foundation for the substitution work.

build-deb.sh updated to fetch + build + install daedalus-fourier
(pinned at d87239d, marfrit/daedalus-fourier PR #1) into a per-
build temp prefix before invoking the daemon's cmake, exposing it
via PKG_CONFIG_PATH.  Static-linked, so the resulting .deb has no
new runtime deps.  Requires libvulkan-dev + glslang-tools on the
CI runner.

Arch PKGBUILD bumped to the same upstream commit but Arch packaging
for daedalus-fourier itself is a follow-up; until that lands the
Arch build expects daedalus-fourier installed by the user (AUR-style).
Debian-side is end-to-end self-contained via build-deb.sh.

Refs:
  * reauktion/daedalus-v4l2#12
  * reauktion/daedalus-v4l2#13
  * reauktion/daedalus-v4l2#11
  * marfrit/daedalus-fourier#1
2026-05-21 18:39:22 +02:00
marfrit 736b6da176 Merge pull request 'daedalus-v4l2{,-dkms}: 79256dc/6ffe92b -> 5d8b436 — revert parking design' (#71) from claude-noether/marfrit-packages:noether/daedalus-revert-bump-5d8b436 into main
Reviewed-on: marfrit/marfrit-packages#71
2026-05-21 14:54:18 +00:00
claude-noether 34972ae9c1 daedalus-v4l2{,-dkms}: 79256dc/6ffe92b -> 5d8b436 — revert parking design
Lock-step downgrade of both packages to the revert tip of
daedalus-v4l2 (PR #10 closed PRs #7 + #8).  After
0.1.0+r28+g79256dc-1 / 0.1.0+r30+g6ffe92b-1 landed in production,
mpv (--hwdec=vaapi-copy) failed pre-playing with "Unable to dequeue
buffer: Resource temporarily unavailable" because the daemon
parked CAPTURE buffers waiting for libavcodec's display-order
reorder, violating libva's V4L2 stateless 1:1 contract.  See
daedalus-v4l2#9 for the diagnostic, #10 for the revert PR.

DAEDALUS_PROTO_VERSION drops 1 → 0; install both .debs in the same
apt transaction.  Userspace ABI returns to the f0d4186-equivalent
behaviour, plus PR #4 (cosmetic H.264 menu controls).  The
daedalus-v4l2-dkms #64 multi-kernel postinst behaviour stays in
build-deb.sh.

Visible regression: H.264 B-frame streams in Firefox return to the
"2 1 4 3 6 5" pair-swap visual.  Proper fix (concurrent in-flight
requests in daemon + display-order reorder moved into libva-v4l2-
request-fourier) tracked at daedalus-v4l2#11.

Refs:
  * reauktion/daedalus-v4l2#9
  * reauktion/daedalus-v4l2#10  (merged)
  * reauktion/daedalus-v4l2#11
2026-05-21 15:42:03 +02:00
marfrit a9f1b833b9 Merge pull request 'mesa-panvk-bifrost: r3 -> r4 — iter17 XFB primitive decomposition' (#70) from claude-noether/marfrit-packages:noether/mesa-panvk-bifrost-r4-iter17-xfb-decomp into main
Reviewed-on: marfrit/marfrit-packages#70
2026-05-21 12:18:23 +00:00
marfrit 83e8eca56d mesa-panvk-bifrost: r3 -> r4 — iter17 XFB primitive decomposition
iter17 closes the 162 winding_* CTS failures from iter15's baseline by
replacing the upstream pan_nir_lower_xfb call with a panvk-specific NIR
pass (panvk_per_arch(nir_lower_xfb)) that handles per-primitive
decomposition for non-LIST topologies (LINE_STRIP, TRIANGLE_STRIP,
TRIANGLE_FAN, and the four _WITH_ADJACENCY variants).

Topology + per-instance output vertex count are threaded as new sysvals
(vs.xfb_topology + vs.xfb_output_count) so the NIR pass can dispatch
per-topology at runtime without compiling 7+ shader variants.

dEQP-VK.transform_feedback.simple.* result (133596 cases total):
                  iter15 baseline  ->  iter17
  Pass:             796               958   (+162)
  Fail:             243               81    (-162; resume_* by-design only)
  NotSupported:     132551            132551
  Fatal-skip:       6                 6
  Pass rate of runnable: 76.2% -> 91.7% (+15.5pp)

100% of the iter15 winding-fail cluster closed. The remaining 81 fails
are all resume_* (pause/resume XFB, by design — we advertise
transformFeedbackDraw=false).

Second-model review (janet) produced 3 findings; Findings 1+2 were
already fixed in the in-tree applied state (stale applied_state/ snapshot
read by reviewer), Finding 3 (degenerate N underflow on N<2) addressed
by gating non-LIST emission on `output_count > 0` predicate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 14:07:00 +02:00
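The decomposition arithmetic behind the underflow finding can be sketched in plain C. This is not the panvk NIR pass: the enum and function names are hypothetical, only the three non-adjacency topologies are covered, and the primitive counts follow the usual strip/fan rules (a line strip of n vertices gives n-1 lines, a triangle strip or fan gives n-2 triangles).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical topology tags; adjacency variants are left out. */
enum xfb_topology { XFB_LINE_STRIP, XFB_TRIANGLE_STRIP, XFB_TRIANGLE_FAN };

/* Number of vertices written once a strip or fan of n input vertices is
 * decomposed into independent primitives. */
uint32_t xfb_decomposed_vertex_count(enum xfb_topology topo, uint32_t n)
{
    assert(topo == XFB_LINE_STRIP || topo == XFB_TRIANGLE_STRIP ||
           topo == XFB_TRIANGLE_FAN);
    uint32_t verts_per_prim = (topo == XFB_LINE_STRIP) ? 2u : 3u;
    /* Each primitive after the first reuses verts_per_prim - 1 vertices. */
    uint32_t shared = verts_per_prim - 1u;
    if (n <= shared)   /* too few vertices: avoid unsigned underflow */
        return 0u;
    return (n - shared) * verts_per_prim;
}
```

Without the `n <= shared` guard, `n - shared` would wrap around for degenerate inputs, which is the class of problem Finding 3 describes. The actual fix in the commit gates emission on an `output_count > 0` predicate instead.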
marfrit 1c8c186681 Merge pull request 'daedalus-v4l2-dkms: 79256dc -> 6ffe92b — fix kernel panic regression from #67' (#69) from claude-noether/marfrit-packages:noether/daedalus-dkms-bump-6ffe92b into main
Reviewed-on: marfrit/marfrit-packages#69
2026-05-21 12:00:15 +00:00
claude-noether a0be2dcc9f daedalus-v4l2-dkms: 79256dc -> 6ffe92b — fix kernel panic from #7
Kernel-only bump.  Fixes the hard-reboot regression introduced by
the daedalus-v4l2#7 split-completion design and observed on higgs
(Pi CM5) during the first mpv vaapi-copy playback of 720p H.264:
device_run now removes src + dst from m2m_ctx's rdy_queue at the
moment it picks them up, not at buf_done time.  Without this, a
parked dst_buf (waiting for libavcodec's display-order release)
stayed in the rdy_queue and got re-picked by the next device_run
after SRC_CONSUMED's job_finish released the scheduler — two
inflight entries on the same vb2_buffer, later HAS_PIXELS calls
list_del on an already-detached list_head, panic.

DAEDALUS_PROTO_VERSION stays at 1 — daemon (userspace
daedalus-v4l2) need NOT bump in lockstep with this DKMS update.
The existing daedalus-v4l2 0.1.0+r28+g79256dc is wire-compatible
with daedalus-v4l2-dkms 0.1.0+r30+g6ffe92b.

Refs:
  * reauktion/daedalus-v4l2#8
2026-05-21 13:56:42 +02:00
marfrit eb89f12c3e Merge pull request 'libva-v4l2-request-fourier: bump pin to c454618 (#15 transparent resize)' (#68) from claude-noether/marfrit-packages:bump-libva-fourier-c454618-issue-15 into main
Reviewed-on: marfrit/marfrit-packages#68
2026-05-21 11:25:39 +00:00
marfrit ce2fff1a4f libva-v4l2-request-fourier: bump pin to c454618 (#15 transparent resize)
Bumps both Arch PKGBUILD and Debian build-deb.sh pins to PR #16 —
codec_store_buffer + request_pool_resize transparent OUTPUT-pool grow
on a mid-session resolution upshift overrun.  Picks up the frame-
survival path that supersedes #13's drop-and-recreate fallback.

Dual-pin per feedback_marfrit_packages_dual_pin so both Arch and
Debian repos see check-already-published.sh report a new version.
2026-05-21 13:24:21 +02:00
marfrit 9301894997 Merge pull request 'daedalus-v4l2{,-dkms}: f0d4186 -> 79256dc — H.264 B-frame reorder fix + menu ctrls' (#67) from claude-noether/marfrit-packages:noether/daedalus-bump-79256dc into main
Reviewed-on: marfrit/marfrit-packages#67
2026-05-21 10:51:14 +00:00
claude-noether f21c1ff80a daedalus-v4l2{,-dkms}: f0d4186 -> 79256dc — H.264 B-frame reorder + menu ctrls
Lock-step bump of both packages to daedalus-v4l2#7 + #4.  PROTO_VERSION
bumps 0 → 1 at the daemon ↔ kernel chardev wire: REQ_DECODE adds
__u64 src_pts (the OUTPUT vb2 timestamp); RESP_FRAME adds __u32 flags
(HAS_PIXELS / SRC_CONSUMED) + __u64 output_src_pts (= frame->pts on
drain).  Both .debs must be installed atomically or the chardev
handshake rejects the version mismatch.

  * daedalus-v4l2: daemon's send_packet → receive_frame loop now
    stamps pkt->pts = req->src_pts and looks up the cookie for each
    drained frame via frame->pts.  chardev_client emits multiple
    RESP_FRAME messages per REQ_DECODE when libavcodec's display-
    order DPB releases an earlier frame on receipt of a later
    bitstream — fixes the "2 1 4 3 6 5" pair-swap on H.264 streams
    with B-frames.

  * daedalus-v4l2-dkms: kernel device_run mirrors src_buf timestamp
    into REQ_DECODE.src_pts.  Completion path splits HAS_PIXELS /
    SRC_CONSUMED: src is released as soon as send_packet succeeds
    (so the m2m scheduler moves on), dst stays parked until the
    matching frame is drained later.  TIMESTAMP_COPY's auto src→dst
    pairing no longer applies once lifecycles decouple — dst is
    stamped explicitly from inflight->src_pts at HAS_PIXELS time.

  * daedalus-v4l2-dkms also carries forward the -2 multi-kernel
    postinst fix (#64) from the prior PKGREL.  PKGREL resets to 1 on
    the new upstream pin.

The daedalus-v4l2#4 H.264 DECODE_MODE + START_CODE menu controls (a
cosmetic warning fix that landed alongside #7) are also picked up, so
"Unable to set control(s) error_idx=2/2" no longer fires.

Refs:
  * reauktion/daedalus-v4l2#7
  * reauktion/daedalus-v4l2#4
  * reauktion/daedalus-v4l2#6
2026-05-21 12:41:12 +02:00
marfrit e15b887d8d Merge pull request 'libva-v4l2-request-fourier: bump pin to 2860d75 (#13 bounds-check fix)' (#66) from claude-noether/marfrit-packages:bump-libva-fourier-2860d75-issue-13 into main
Reviewed-on: marfrit/marfrit-packages#66
2026-05-21 10:38:03 +00:00
marfrit b69db65037 libva-v4l2-request-fourier: bump pin to 2860d75 (#13 bounds-check fix)
Bumps both the Arch PKGBUILD and the Debian build-deb.sh pins to PR
#14 merge — codec_store_buffer bounds-checks for VASliceDataBufferType.
Picks up the SIGSEGV fix for mpv --hwdec=vaapi-copy on resolution
upshift mid-stream (issue #13).

Dual-pin so check-already-published.sh detects both pool ABIs as
needing a fresh build.
2026-05-21 12:19:04 +02:00
marfrit adcc824bf7 Merge pull request 'daedalus-v4l2-dkms: postinst — autoinstall for all installed kernels (#64)' (#65) from claude-noether/marfrit-packages:fix/daedalus-dkms-multi-kernel-64 into main
Reviewed-on: marfrit/marfrit-packages#65
2026-05-21 09:28:47 +00:00
claude-noether 7213b23861 daedalus-v4l2-dkms: postinst — autoinstall for all installed kernels (#64)
Previously dkms autoinstall ran only against $(uname -r), so installing
the package on kernel A and rebooting into separately-installed kernel B
left /lib/modules/B/updates/dkms/ empty.  As a result /dev/daedalus-v4l2
was absent, the daedalus daemon had nothing to talk to, and browser/VAAPI
silently fell back to software with no obvious diagnostic for the user.

Now we enumerate every /lib/modules/*/build that resolves to a real
directory (i.e. headers are actually installed for that kernel) and run
'dkms autoinstall -k <kver>' for each.  Per-kernel verify; aggregated
warning only for the kernels that didn't build.

Tested locally: enumeration filters dangling /build symlinks correctly
(2 kernels installed, 1 has headers → only that one is built against).

Bumps PKGREL 1 → 2.  Closes #64.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 11:07:35 +02:00
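The postinst itself is shell, so the following C sketch only illustrates the filtering rule it applies: a kernel counts as buildable when its `build` entry resolves to a real directory. The function name `kernel_has_headers` and the 4096-byte path buffer are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* True if <modules_root>/<kver>/build resolves to a real directory.
 * stat() follows symlinks, so a dangling build symlink is rejected. */
bool kernel_has_headers(const char *modules_root, const char *kver)
{
    char path[4096];
    struct stat st;

    assert(modules_root != NULL && kver != NULL);
    int n = snprintf(path, sizeof path, "%s/%s/build", modules_root, kver);
    if (n < 0 || (size_t)n >= sizeof path)
        return false;   /* formatting error or path too long */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would loop over the entries of `/lib/modules` and run the per-kernel build only where this returns true, which matches the "2 kernels installed, 1 has headers" case in the commit message.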
marfrit 2cd3acd680 Merge pull request 'firefox-fourier 0003: proper V4L2REQUEST type acceptance patch (closes #60)' (#63) from firefox-0003-v4l2request-proper-2026-05-21 into main
Reviewed-on: marfrit/marfrit-packages#63
2026-05-21 05:10:27 +00:00
marfrit 22ac3c9845 firefox-fourier 0003: V4L2REQUEST type acceptance (proper patch, regenerated from real source)
Closes #60.

Resolves the malformed-patch issue from #61 (since reverted in #62)
by regenerating the 0003 patch via actual application against firefox
150.0.3 Pi-OS source.

Functional change vs prior 0003: the hw_configs walk accepts
AV_HWDEVICE_TYPE_DRM (legacy) OR integer device_type values 13/14
(AV_HWDEVICE_TYPE_V4L2REQUEST in Kwiboo's no-AMF / upstream-AMF trees).
CreateV4L2RequestDeviceContext passes integer 13 (Kwiboo's value) cast
to enum AVHWDeviceType for the av_hwdevice_ctx_create call.

Tested: applied cleanly via patch -p1 against firefox-150.0.3 source
post-Pi-OS-quilt-patches. Test build follow-up in firefox-rpios EC2
script (drops the in-source sed hack from v7-v8).
2026-05-21 06:59:20 +02:00
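The acceptance rule for the hw_configs walk can be written as a small predicate. This is a sketch, not the Firefox patch: the function and constant names are hypothetical, the values 13 and 14 are taken from the commit message and not from a header, and the DRM value is passed in by the caller (in real code it would be `AV_HWDEVICE_TYPE_DRM` from libavutil).

```c
#include <assert.h>
#include <stdbool.h>

/* Integer values as stated in the commit message:
 * 13 = V4L2REQUEST in the Kwiboo no-AMF tree,
 * 14 = V4L2REQUEST in the upstream-AMF tree. */
enum { V4L2REQUEST_TYPE_NO_AMF = 13, V4L2REQUEST_TYPE_WITH_AMF = 14 };

/* Hypothetical predicate: accept the legacy DRM type or either
 * V4L2REQUEST integer value. */
bool hw_config_type_accepted(int device_type, int drm_type)
{
    assert(drm_type > 0);   /* 0 is the "none" device type */
    return device_type == drm_type
        || device_type == V4L2REQUEST_TYPE_NO_AMF
        || device_type == V4L2REQUEST_TYPE_WITH_AMF;
}
```

Comparing against plain integers keeps the check independent of whether the bundled headers declare the V4L2REQUEST enumerator at all.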
marfrit 3275d06728 Merge pull request 'Revert #61: malformed firefox-fourier 0003 patch' (#62) from revert-pr-61-malformed-patch into main
Reviewed-on: marfrit/marfrit-packages#62
2026-05-21 04:33:35 +00:00
marfrit 33b91cf7dc Revert "Merge pull request 'firefox-fourier patch #3: accept AV_HWDEVICE_TYPE_V4L2REQUEST too' (#61) from fix/firefox-v4l2request-type-accept-2026-05-21 into main"
This reverts commit a640633ea7, reversing
changes made to de3c2c6744.
2026-05-21 06:32:39 +02:00
marfrit a640633ea7 Merge pull request 'firefox-fourier patch #3: accept AV_HWDEVICE_TYPE_V4L2REQUEST too' (#61) from fix/firefox-v4l2request-type-accept-2026-05-21 into main
Reviewed-on: marfrit/marfrit-packages#61
2026-05-21 04:18:28 +00:00
marfrit 5f21a71770 firefox-fourier patch #3: accept AV_HWDEVICE_TYPE_V4L2REQUEST too
Closes part of #60 (firefox-side patch update for fourier2 ffmpeg).

Background: libavcodec61-fourier2 (Kwiboo v4l2-request-n7.1.3 backed)
registers its hwaccels with AV_HWDEVICE_TYPE_V4L2REQUEST (the dedicated
enum added in FFmpeg 7.1+), not AV_HWDEVICE_TYPE_DRM as fourier1 did.
The firefox-fourier patch #3 walked hw_configs looking only for DRM
and fell through to software for every codec.

Patch updates:
- CreateV4L2RequestDeviceContext now takes an int aDeviceType (Mozilla's
  bundled libavutil headers may lack the V4L2REQUEST enumerator), passed
  through to av_hwdevice_ctx_create.
- hw_configs walk accepts DRM (legacy) OR V4L2REQUEST integer value
  (13 on Kwiboo's no-AMF tree, 14 on upstream-AMF tree).
- Renamed mDRMDeviceContext to mV4L2RequestDeviceContext for accuracy.

Build pkgrel will be bumped at debian-package level to +fourier2.
2026-05-21 00:09:54 +02:00
17 changed files with 1420 additions and 112 deletions
+24 -9
@@ -930,12 +930,13 @@ jobs:
 # map 1:1 to the previous Arch list; libav*-dev intentionally
 # absent (we are FFmpeg itself, providing those libs).
 retry apt-get install -y --no-install-recommends \
-build-essential git pkg-config nasm yasm \
+build-essential cmake ninja-build git pkg-config nasm yasm \
 linux-libc-dev libgl1-mesa-dev libasound2-dev libbz2-dev \
 libfontconfig-dev libfribidi-dev libgmp-dev libgnutls28-dev \
 libmp3lame-dev libass-dev libdav1d-dev libdrm-dev \
 libfreetype-dev libpulse-dev libva-dev libvorbis-dev libvpx-dev \
 libwebp-dev libx264-dev libx265-dev libxml2-dev libopus-dev \
+libvulkan-dev glslang-tools \
 v4l-utils liblzma-dev zlib1g-dev \
 curl ca-certificates openssh-client rsync dpkg-dev
@@ -1159,16 +1160,30 @@ jobs:
 retry() { for i in 1 2 3; do "$@" && return 0; rc=$?; echo "retry $i (exit=$rc)" >&2; sleep $((i*5)); done; return 1; }
 export DEBIAN_FRONTEND=noninteractive
 retry apt-get update -qq
-# libav*-dev provide the headers daedalus daemon dlopens at
-# runtime — Debian's stock packages match the trixie ABI the
-# daemon will encounter on Pi 5 hosts (both ship libavcodec
-# 61.x). The fourier ffmpeg fork isn't needed here; the
-# daemon never link-binds against libav (Option γ — dlopen
-# at runtime), so any header set with the right struct
-# definitions works.
+# FFmpeg headers + sonames the daemon dlopens. As of
+# daedalus-v4l2 PR #16 (commit 514da29), the daemon targets
+# the Kwiboo fork's libavcodec.so.62 / libavformat.so.62 /
+# libavutil.so.60 at /opt/fourier — so the build needs
+# /opt/fourier/include and /opt/fourier/lib/pkgconfig.
+# ffmpeg-v4l2-request-fourier provides both (plus the
+# runtime libs the .deb will dlopen on the target host;
+# we install it as a build-dep here and the dpkg-shlibdeps
+# step pulls it into the daemon .deb's Depends automatically).
+# Debian-stock libav*-dev removed — would conflict on
+# /usr/include/libavcodec/avcodec.h vs /opt/fourier's copy.
+#
+# libvulkan-dev + glslang-tools: needed by the in-build
+# daedalus-fourier fetch (build-deb.sh fetches the sibling
+# library, cmake-builds it into a temp prefix, then the
+# daedalus daemon static-links against it via pkg-config).
+# Without these, daedalus-fourier's find_package(Vulkan)
+# and glslangValidator find_program both fail at configure
+# time. See marfrit/daedalus-fourier PR #1 +
+# reauktion/daedalus-v4l2 PR #13.
 retry apt-get install -y --no-install-recommends \
 build-essential cmake ninja-build pkg-config git \
-libavcodec-dev libavformat-dev libavutil-dev libdrm-dev \
+ffmpeg-v4l2-request-fourier libdrm-dev \
+libvulkan-dev glslang-tools \
 linux-libc-dev \
 curl ca-certificates openssh-client rsync dpkg-dev
+8 -3
@@ -18,10 +18,15 @@ _module=daedalus_v4l2
 # Same pin as arch/daedalus-v4l2 — keep kernel module + daemon
 # bit-versioned together so the chardev wire protocol stays in sync.
-_commit=f0d41867f60f5bf8dbfcc6cc16404d7d7eb90014
+# 5d8b436 reverts PRs #7 + #8 (parking design that broke libva's
+# 1:1 contract — see daedalus-v4l2#9 + #10). Tree is
+# content-equivalent to f0d4186 plus PR #4 (cosmetic menu ctrls).
+# PROTO_VERSION drops 1 → 0; lock-step install with
+# daedalus-v4l2 0.1.0.r33.5d8b436 REQUIRED.
+_commit=5d8b4369e58ab947d1c56b1f718293c57c6065b5
-pkgver=0.1.0.r24.f0d4186
-pkgrel=1 # reset for new upstream pin (3dd0eb0 — DAEMON-PPS H.264 SPS/PPS NAL synth)
+pkgver=0.1.0.r33.5d8b436
+pkgrel=1 # reset for new upstream pin (5d8b436 — revert parking design)
 pkgdesc="V4L2 stateless decoder shim kernel module (DKMS) — Pi 5 / CM5"
 arch=('any')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
+11 -9
@@ -16,17 +16,19 @@
 pkgname=daedalus-v4l2
 _upstreampkg=daedalus-v4l2
-# Pin the daedalus-v4l2 tip. 481279c = "Phase 8.13: byte-exact end-to-
-# end via libva (consumer target hit)" — first commit where the full
-# ffmpeg -hwaccel vaapi → libva → /dev/video0 → daemon path lands a
-# pixel-correct decoded frame back in ffmpeg. Promote to a later pin
-# only after a future phase closes cleanly.
-_commit=f0d41867f60f5bf8dbfcc6cc16404d7d7eb90014
+# 6e6dfa1 = picks up daedalus-v4l2 PR #16 — daemon now dlopens
+# the Kwiboo fourier fork's libavcodec.so.62 / libavformat.so.62 /
+# libavutil.so.60 at /opt/fourier instead of Debian-stock soname
+# 61/61/59. First step on the daedalus-fourier substitution arc
+# (daedalus-v4l2#11). Daemon still needs daedalus-fourier at
+# build time (Arch packaging for that is a follow-up; Debian side
+# fetches inline via build-deb.sh).
+_commit=6e6dfa144da7bc7fa8be50c8da91d7d1c6132a2c
 # 0.1.0 (pre-1.0) + commit count + short sha. Bump the .Y on each
 # Phase 8.x close. pkgver() recomputes at build time.
-pkgver=0.1.0.r24.f0d4186
-pkgrel=1 # reset for new upstream pin (3dd0eb0 — DAEMON-PPS H.264 SPS/PPS NAL synth)
+pkgver=0.1.0.r41.6e6dfa1
+pkgrel=1 # reset for new upstream pin (6e6dfa1 — soname 62 via /opt/fourier)
 pkgdesc="Userspace daemon for the daedalus-v4l2 V4L2 stateless decoder shim (VP9/AV1/H.264 on Pi 5 / CM5)"
 arch=('aarch64')
 url="https://git.reauktion.de/reauktion/daedalus-v4l2"
@@ -34,7 +36,7 @@ license=('BSD-2-Clause' 'GPL-2.0-or-later')
 # Daemon dlopens libavformat.so.61 / libavcodec.so.61 / libavutil.so.59
 # at runtime (Option γ — see daemon/src/ffmpeg_loader.h). ffmpeg
 # provides those; we don't link them.
-depends=('ffmpeg' 'libdrm')
+depends=('ffmpeg-v4l2-request-fourier' 'libdrm')
 # Headers from libav*-dev needed at compile time for type-safe function
 # pointer signatures; pkg-config locates them.
 makedepends=('cmake' 'ninja' 'pkgconf' 'git' 'ffmpeg')
@@ -0,0 +1,137 @@
From f760c0541586f43334c02611fcb4c212c08ad576 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Thu, 21 May 2026 21:40:22 +0200
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 4x4 IDCT through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264DSPContext.idct_add (called per 4x4 block from the intra-4x4
decode path in h264_mb.c) now dispatches through
daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4x4)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution with one extra dispatch call and recipe-table lookup.
Provides the first end-to-end exercise of the daedalus-fourier
kernel pack inside the libavcodec.so decode hot path; follow-up
patches wire IDCT 8x8, luma-v deblock, and qpel mc20.
The library context is process-global, lazily initialised under
pthread_once on first call. We pick the no-QPU constructor because
libavcodec.so is loaded into arbitrary host processes
(firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
cannot assume the host has a usable Vulkan instance. Higher cycles
(deblock luma-v, MC) that benefit from the QPU will provision their
own recipe-selected context once that path is wired.
Bulk paths (idct_add16, idct_add16intra, idct_add8 — used for
non-intra4x4 macroblocks) remain on the stock NEON .S implementations
and will be batched through daedalus_recipe_dispatch_h264_idct4 with
n_blocks>1 in a follow-up.
Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6
green; see marfrit/daedalus-fourier/CYCLE_LOGS.md).
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2.
---
libavcodec/aarch64/Makefile | 3 +-
libavcodec/aarch64/h264_idct_daedalus.c | 49 +++++++++++++++++++++++
libavcodec/aarch64/h264dsp_init_aarch64.c | 3 +-
3 files changed, 53 insertions(+), 2 deletions(-)
create mode 100644 libavcodec/aarch64/h264_idct_daedalus.c
diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 41ab025..7b95fb1 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -3,7 +3,8 @@ OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_init_aarch64.o
OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_init_aarch64.o
OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o
OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o
-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o
+OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o \
+ aarch64/h264_idct_daedalus.o
OBJS-$(CONFIG_HUFFYUVDSP) += aarch64/huffyuvdsp_init_aarch64.o
OBJS-$(CONFIG_H264PRED) += aarch64/h264pred_init.o
OBJS-$(CONFIG_H264QPEL) += aarch64/h264qpel_init_aarch64.o
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
new file mode 100644
index 0000000..538d223
--- /dev/null
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -0,0 +1,49 @@
+/*
+ * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
+ *
+ * Routes H264DSPContext.idct_add through
+ * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
+ * The recipe layer picks the substrate (CPU NEON by default for
+ * cycle 6; future cycles may dispatch to V3D opportunistically).
+ *
+ * FFmpeg's 4x4 block memory layout matches daedalus's column-major
+ * convention: block[r + 4*c] = coefficient at (row r, col c). Both
+ * sides destructively zero the block after the transform.
+ *
+ * The library context is process-global and lazily initialised under
+ * pthread_once. We pick the no-QPU constructor here because
+ * libavcodec.so is loaded into arbitrary host processes
+ * (firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
+ * cannot assume the host has a usable Vulkan instance. Higher cycles
+ * (deblock, MC) that benefit from the QPU initialise their own
+ * recipe-selected context once that path is wired.
+ */
+
+#include <pthread.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include <daedalus.h>
+
+#include "libavutil/attributes.h"
+#include "libavcodec/h264dsp.h"
+
+static daedalus_ctx *g_dctx;
+static pthread_once_t g_dctx_once = PTHREAD_ONCE_INIT;
+
+static void daedalus_ctx_init_once(void)
+{
+ g_dctx = daedalus_ctx_create_no_qpu();
+}
+
+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+
+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+{
+ static const daedalus_h264_block_meta meta = { .dst_off = 0 };
+
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+
+ daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
+ block, 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index c684574..b993df2 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -66,6 +66,7 @@ void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride
int weights, int offset);
void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset,
int16_t *block, int stride,
@@ -139,7 +140,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
c->biweight_pixels_tab[1] = ff_biweight_h264_pixels_8_neon;
c->biweight_pixels_tab[2] = ff_biweight_h264_pixels_4_neon;
- c->idct_add = ff_h264_idct_add_neon;
+ c->idct_add = ff_h264_idct_add_daedalus;
c->idct_dc_add = ff_h264_idct_dc_add_neon;
c->idct_add16 = ff_h264_idct_add16_neon;
c->idct_add16intra = ff_h264_idct_add16intra_neon;
--
2.47.3
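The lazy, process-wide context setup used by `ff_h264_idct_add_daedalus` can be illustrated in isolation. The following is a minimal sketch, assuming a hypothetical `demo_ctx` type in place of `daedalus_ctx` (whose real definition lives in `<daedalus.h>` and is not reproduced here); only `pthread_once` and the other POSIX thread calls are real APIs, and every `demo_*` name is invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for daedalus_ctx. */
typedef struct { int id; } demo_ctx;

static demo_ctx g_storage;
static demo_ctx *g_ctx;
static int g_init_calls;                      /* counts how often init ran */
static pthread_once_t g_once = PTHREAD_ONCE_INIT;

/* Runs exactly once per process, no matter how many threads race here. */
static void demo_ctx_init_once(void)
{
    g_init_calls++;
    g_storage.id = 42;
    g_ctx = &g_storage;
}

/* Returns the shared context, creating it on first use. */
static demo_ctx *demo_ctx_get(void)
{
    pthread_once(&g_once, demo_ctx_init_once);
    return g_ctx;
}

static void *demo_worker(void *arg)
{
    (void)arg;
    return demo_ctx_get();
}

/* Starts four threads that all request the context, then reports how
 * many times the init routine actually executed. */
static int demo_run(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return g_init_calls;
}
```

Because `pthread_once` serialises the first call, the hot path (every later IDCT call) is expected to pay only for the once-flag check, which is probably why the patch can afford to call it per block.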
+33 -3
@@ -24,8 +24,13 @@ _srcname=FFmpeg
_version='8.1'
_commit='b57fbbe50c9b2656fad86a1a7eeabfd2b2a50935' # v4l2-request-n8.1 tip 2026-04-24
pkgver=8.1.r123329.b57fbbe
pkgrel=5
pkgrel=6 # H.264 IDCT 4x4 daedalus-fourier substitution (2026-05-21)
epoch=2
# daedalus-fourier pin — first kernel substitution in libavcodec
# (cycle 6 H.264 IDCT 4x4). Same SHA as the daedalus-v4l2 daemon's
# inline build; lockstep with that until the public API rolls.
_daedalus_fourier_commit='d87239d8172307d9a1b93c95cbed116d175b85cc'
pkgdesc='FFmpeg with V4L2 Request API hwaccel (Rockchip / Allwinner stateless decode)'
arch=('aarch64')
url='https://github.com/Kwiboo/FFmpeg'
@@ -34,6 +39,7 @@ depends=(
alsa-lib
bzip2
fontconfig
vulkan-icd-loader
fribidi
gmp
gnutls
@@ -59,10 +65,13 @@ depends=(
zlib
)
makedepends=(
cmake
git
linux-api-headers
mesa
nasm
ninja
vulkan-headers
)
provides=(
libavcodec.so
@@ -78,9 +87,11 @@ provides=(
conflicts=(ffmpeg)
replaces=(ffmpeg ffmpeg-v4l2-request-git)
source=("git+https://github.com/Kwiboo/FFmpeg.git#commit=${_commit}"
"daedalus-fourier-${_daedalus_fourier_commit}.tar.gz::https://git.reauktion.de/marfrit/daedalus-fourier/archive/${_daedalus_fourier_commit}.tar.gz"
'0001-libudev-bypass-fallback.patch'
'0002-nv15-to-p010-unpack.patch')
sha256sums=('SKIP' 'SKIP' 'SKIP')
'0002-nv15-to-p010-unpack.patch'
'0003-h264-idct4-daedalus-fourier.patch')
sha256sums=('SKIP' 'SKIP' 'SKIP' 'SKIP' 'SKIP')
pkgver() {
cd "${_srcname}"
@@ -93,9 +104,25 @@ prepare() {
cd "${_srcname}"
patch -Np1 -i "${srcdir}/0001-libudev-bypass-fallback.patch"
patch -Np1 -i "${srcdir}/0002-nv15-to-p010-unpack.patch"
patch -Np1 -i "${srcdir}/0003-h264-idct4-daedalus-fourier.patch"
}
build() {
# --- daedalus-fourier: build static .a with PIC, install to a
# per-build prefix; libavcodec.so links it into the shared object so
# H264DSPContext.idct_add (and follow-up kernels) dispatch through
# the daedalus recipe layer instead of the in-tree NEON .S code. ---
local _fourier_prefix="${srcdir}/fourier-prefix"
mkdir -p "${_fourier_prefix}"
pushd "${srcdir}"/daedalus-fourier >/dev/null
cmake -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_INSTALL_PREFIX="${_fourier_prefix}"
cmake --build build --target daedalus_core
cmake --install build
popd >/dev/null
cd "${_srcname}"
# FFmpeg's configure resolves the compiler via `which` and bakes the
@@ -147,6 +174,9 @@ build() {
--enable-libx265 \
--enable-libwebp \
\
--extra-cflags="-I${_fourier_prefix}/include" \
--extra-ldflags="-L${_fourier_prefix}/lib" \
--extra-libs="-ldaedalus_core -lvulkan -lpthread" \
--host-cflags='-fPIC'
make
@@ -18,27 +18,30 @@ This patch adds a sibling init path, `InitV4L2RequestDecoder`, that:
* looks up the codec via two complementary mechanisms libavcodec
uses for v4l2_request:
- **named codec** (`h264_v4l2request`, `vp8_v4l2request`, etc.):
the legacy AVCodec-per-hwaccel registration. ALARM, Debian,
and most distros building with --enable-v4l2-request expose
this (avcodec_find_decoder_by_name lookup).
- **generic codec + AV_HWDEVICE_TYPE_DRM** in `hw_configs`:
the modern hwaccel registration on some upstream-only ffmpeg
builds.
the legacy AVCodec-per-hwaccel registration.
- **generic codec + hw_configs walk**: the modern hwaccel
registration. Accepts EITHER AV_HWDEVICE_TYPE_DRM (legacy
ffmpeg-v4l2-request-fork output prior to FFmpeg 7.1) OR
AV_HWDEVICE_TYPE_V4L2REQUEST (FFmpeg 7.1+ dedicated enum,
value 13 on Kwiboo's no-AMF tree, 14 on upstream-AMF tree).
Mozilla's bundled libavutil headers may not have the V4L2REQUEST
enumerator, so the test compares the integer value via an `(int)` cast.
Probes named-codec first (explicit, portable) and falls back to
walking the generic codec's `hw_configs` for the DRM device type;
* creates an `AV_HWDEVICE_TYPE_DRM` hwdevice context bound to
`/dev/dri/renderD128` via the new `av_hwdevice_ctx_create` wrapper
(patch 2/4) and attaches it to the codec context;
walking the generic codec's `hw_configs` for either device type;
* creates an hwdevice context bound to `/dev/dri/renderD128`. Uses
integer 13 (V4L2REQUEST as defined by Kwiboo's v4l2-request-n7.1.3
tree, what our libavcodec61-fourier emits) cast to enum
AVHWDeviceType for the av_hwdevice_ctx_create call;
* reuses the existing `ChooseV4L2PixelFormat` get-format callback
(already returns `AV_PIX_FMT_DRM_PRIME`) and the existing
`apply_cropping = 0` constraint.
`InitV4L2RequestDecoder` is invoked **before** `InitV4L2Decoder` in
`InitHWDecoderIfAllowed`. On Rockchip mainline it succeeds via either
mechanism (ALARM uses the named codec). On Pi4 / Mediatek /
vendor-MPP-stateful boards neither mechanism is registered for the
codec, the function bails out, and the existing stateful
`InitV4L2Decoder` runs as before. No regression of stateful boards.
mechanism. On Pi4 / Mediatek / vendor-MPP-stateful boards neither
mechanism is registered for the codec, the function bails out, and the
existing stateful `InitV4L2Decoder` runs as before. No regression of
stateful boards.
`mDRMDeviceContext` is unconditionally `av_buffer_unref`'d in
`ProcessShutdown` (no-op when null). Gated behind
@@ -46,9 +49,8 @@ codec, the function bails out, and the existing stateful
Bug 1969297.
diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h
--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h 2026-03-18 19:22:14.000000000 +0000
+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h 2026-04-27 20:43:39.347992674 +0000
--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h 2026-05-21 04:57:59.570946601 +0000
+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h 2026-05-21 04:57:59.876488776 +0000
@@ -225,7 +225,12 @@
bool IsLinuxHDR() const;
MediaResult InitVAAPIDecoder();
@@ -73,9 +75,8 @@ diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.h b/dom/media/platfor
// If video overlay is used we want to upload SW decoded frames to
// DMABuf and present it as a external texture to rendering pipeline.
bool mUploadSWDecodeToDMABuf = false;
diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp
--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp 2026-04-27 16:09:10.000000000 +0200
+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp 2026-04-29 00:10:00.098884335 +0200
--- a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp 2026-05-21 04:57:59.566685221 +0000
+++ b/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp 2026-05-21 04:58:00.136004159 +0000
@@ -403,6 +403,129 @@
return NS_OK;
}
@@ -90,7 +91,7 @@ diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp b/dom/media/platf
+ }
+ const char* drmDevice = "/dev/dri/renderD128";
+ if (mLib->av_hwdevice_ctx_create(&mDRMDeviceContext,
+ AV_HWDEVICE_TYPE_DRM, drmDevice,
+ (enum AVHWDeviceType)13, drmDevice,
+ nullptr, 0) < 0) {
+ FFMPEG_LOG(" av_hwdevice_ctx_create(DRM, %s) failed", drmDevice);
+ return false;
@@ -143,7 +144,7 @@ diff --git a/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp b/dom/media/platf
+ for (int i = 0;; i++) {
+ const AVCodecHWConfig* cfg = mLib->avcodec_get_hw_config(generic, i);
+ if (!cfg) break;
+ if (cfg->device_type == AV_HWDEVICE_TYPE_DRM) {
+ if (cfg->device_type == AV_HWDEVICE_TYPE_DRM || (int)cfg->device_type == 13 || (int)cfg->device_type == 14) {
+ codec = generic;
+ FFMPEG_LOG(" using generic codec %s with DRM hwaccel", codec->name);
+ break;
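The `hw_configs` walk above accepts a config when its device type is DRM or one of the two integer values the patch description gives for V4L2REQUEST (13 and 14). A minimal, self-contained sketch of that test follows. It deliberately avoids libavcodec: `demo_hw_config`, `DEMO_TYPE_DRM` and the `demo_*` functions are hypothetical stand-ins for `AVCodecHWConfig`, `AV_HWDEVICE_TYPE_DRM` and the loop over `avcodec_get_hw_config`, and the numeric value chosen for `DEMO_TYPE_DRM` is an assumption made only so the example compiles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for AV_HWDEVICE_TYPE_DRM; the value is an assumption. */
#define DEMO_TYPE_DRM            8
/* Integer values quoted in the patch description for V4L2REQUEST. */
#define DEMO_V4L2REQUEST_NO_AMF 13
#define DEMO_V4L2REQUEST_AMF    14

/* Hypothetical stand-in for AVCodecHWConfig. */
struct demo_hw_config {
    int device_type;
};

/* Compare by integer so headers lacking the newer enumerator still work. */
static bool demo_type_is_usable(int device_type)
{
    return device_type == DEMO_TYPE_DRM ||
           device_type == DEMO_V4L2REQUEST_NO_AMF ||
           device_type == DEMO_V4L2REQUEST_AMF;
}

/* Returns the index of the first usable config, or -1 if none matches. */
static int demo_find_hw_config(const struct demo_hw_config *cfgs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (demo_type_is_usable(cfgs[i].device_type))
            return (int)i;
    }
    return -1;
}
```

Comparing plain integers keeps the check independent of which libavutil headers the caller was compiled against, at the cost of hard-coding values that differ between trees.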
+17 -17
@@ -24,29 +24,29 @@ pkgname=libva-v4l2-request-fourier
epoch=1
_upstreampkg=libva-v4l2-request
# Pin the fork tip. 77f9236 = PR #12 merge "av1: populate
# V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls (#11 libva side)"
# — addresses the libva-side portion of marfrit/libva-v4l2-request-fourier#11.
# Adds src/av1.{c,h} with av1_set_controls that maps VAAPI's
# VAPictureParameterBufferAV1.seq_info_fields onto struct
# v4l2_ctrl_av1_sequence and queues V4L2_CID_STATELESS_AV1_SEQUENCE via
# S_EXT_CTRLS. The daedalus_v4l2 daemon track is the consumer that turns
# the ctrl into an OBU_SEQUENCE_HEADER and prepends it to the slice
# bitstream so libdav1d can parse the (sequence-header-stripped) OUTPUT
# buffer that ffmpeg-vaapi delivers. Until the daemon side lands the ctrl
# is just sitting in the request unused — no-cost no-op for HW decoders
# (vpu981 parses OBU bytes directly).
# Pin the fork tip. c454618 = PR #16 merge "picture, request_pool:
# transparent OUTPUT-pool resize on bitstream overrun (#15)" —
# follow-up root-cause fix to #13/#14. On a mid-stream bitstream-
# budget overrun (typical cause: SPS-driven resolution upshift in an
# adaptive-bitrate stream), codec_store_buffer now snapshots the in-
# flight surface's accumulated bytes, releases its OUTPUT pool slot,
# calls request_pool_resize (STREAMOFF → REQBUFS(0) → S_FMT with
# 2×sizeimage hint, capped at 1 GiB, page-aligned → CREATE_BUFS →
# mmap → media_request_alloc → STREAMON), re-acquires a slot, re-
# mirrors the surface's source_{data,size,request_fd}, restores the
# bytes, and continues. The frame survives instead of being dropped
# back to libavcodec for surface recreation. CAPTURE side untouched
# (per-queue V4L2 streaming independence).
#
# Prior pin (c1bb444) = PR #9 merge — h264_set_controls
# max_num_ref_frames fallback + libva-boundary instrumentation for the
# daedalus consumer-strict path (issue #8 libva side).
_commit=77f92364661419f6e5a7bd827c1b845b4e426569
# Prior pin (2860d75) = PR #14 merge — codec_store_buffer bounds-
# check floor (#13).
_commit=c454618ae11addce2e17b560f4deeacbed067d98
# Project version from meson.build (1.0.0) + commit count + short sha,
# matching the ffmpeg-v4l2-request-fourier convention. Recomputed at
# build time by pkgver() below; the static value here is a placeholder
# so AUR-style consumers see something coherent before src/ exists.
pkgver=1.0.0.r386.77f9236
pkgver=1.0.0.r390.c454618
pkgrel=1
pkgdesc="VA-API backend for V4L2 stateless decoders (multiplanar fork — fourier umbrella)"
arch=('aarch64')
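The sizing rule described in the pin comment (2×sizeimage hint, capped at 1 GiB, page-aligned) can be sketched as plain arithmetic. This is an illustration only: `demo_resize_hint` and `demo_align_up` are hypothetical names, the page size is passed in as a parameter, and the order of operations (double, then cap, then align) is an assumption, since the comment does not spell it out.

```c
#include <assert.h>
#include <stdint.h>

/* 1 GiB ceiling, as stated in the PKGBUILD comment. */
#define DEMO_SIZE_CAP ((uint64_t)1 << 30)

/* Round n up to a multiple of page; page must be a power of two. */
static uint64_t demo_align_up(uint64_t n, uint64_t page)
{
    return (n + page - 1) & ~(page - 1);
}

/* Next sizeimage hint after a bitstream overrun: double the current
 * size, clamp to the cap, then page-align. The cap itself is
 * page-aligned for any power-of-two page up to 1 GiB, so the result
 * never exceeds it. */
static uint64_t demo_resize_hint(uint64_t cur_sizeimage, uint64_t page)
{
    uint64_t want = cur_sizeimage * 2;
    if (want > DEMO_SIZE_CAP)
        want = DEMO_SIZE_CAP;
    return demo_align_up(want, page);
}
```

With a 4096-byte page, a 1,000,000-byte buffer would grow to 2,002,944 bytes, and a buffer already at 1 GiB would stay there.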
@@ -0,0 +1,629 @@
diff -urN a/src/panfrost/vulkan/meson.build b/src/panfrost/vulkan/meson.build
--- a/src/panfrost/vulkan/meson.build 2026-05-21 14:04:02.529474145 +0200
+++ b/src/panfrost/vulkan/meson.build 2026-05-21 14:04:04.106755486 +0200
@@ -123,6 +123,7 @@
'panvk_vX_nir_lower_input_attachment_loads.c',
'panvk_vX_sampler.c',
'panvk_vX_shader.c',
+ 'panvk_vX_xfb_lower.c',
sha1_h,
]
diff -urN a/src/panfrost/vulkan/panvk_shader.h b/src/panfrost/vulkan/panvk_shader.h
--- a/src/panfrost/vulkan/panvk_shader.h 2026-05-21 14:04:02.525251986 +0200
+++ b/src/panfrost/vulkan/panvk_shader.h 2026-05-21 14:04:04.084251800 +0200
@@ -154,6 +154,8 @@
/* aligned_u64 attribute below inserts the 4-byte alignment gap
* after num_vertices automatically — no explicit pad needed. */
aligned_u64 xfb_address[4]; /* iter13: 4 transform feedback buffer base addresses */
+ uint32_t xfb_topology; /* iter17: panvk_xfb_topology enum value */
+ uint32_t xfb_output_count; /* iter17: per-instance output verts after decomp */
#endif
int32_t first_vertex;
int32_t base_instance;
@@ -569,4 +571,76 @@
struct pan_compute_dim local_size, const void *bin_ptr, size_t bin_size,
struct panvk_shader **shader_out);
+
+#if PAN_ARCH < 9
+/* iter17: encoding for vs.xfb_topology sysval. Maps VkPrimitiveTopology values
+ * we need to distinguish at shader runtime for XFB capture. LIST topologies
+ * use the iter13 single-store fast path; non-LIST need per-vertex decomposition. */
+enum panvk_xfb_topology {
+ PANVK_XFB_TOPO_LIST = 0,
+ PANVK_XFB_TOPO_LINE_STRIP = 1,
+ PANVK_XFB_TOPO_TRI_STRIP = 2,
+ PANVK_XFB_TOPO_TRI_FAN = 3,
+ PANVK_XFB_TOPO_LINE_LIST_ADJ = 4,
+ PANVK_XFB_TOPO_LINE_STRIP_ADJ = 5,
+ PANVK_XFB_TOPO_TRI_LIST_ADJ = 6,
+ PANVK_XFB_TOPO_TRI_STRIP_ADJ = 7,
+};
+
+#include "panvk_macros.h"
+struct nir_shader;
+bool panvk_per_arch(nir_lower_xfb)(struct nir_shader *nir);
+
+/* Map VkPrimitiveTopology to panvk_xfb_topology enum (driver-side helper). */
+static inline uint32_t
+panvk_vk_topology_to_xfb_enum(VkPrimitiveTopology topo)
+{
+ switch (topo) {
+ case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
+ return PANVK_XFB_TOPO_LINE_STRIP;
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
+ return PANVK_XFB_TOPO_TRI_STRIP;
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
+ return PANVK_XFB_TOPO_TRI_FAN;
+ case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
+ return PANVK_XFB_TOPO_LINE_LIST_ADJ;
+ case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
+ return PANVK_XFB_TOPO_LINE_STRIP_ADJ;
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
+ return PANVK_XFB_TOPO_TRI_LIST_ADJ;
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
+ return PANVK_XFB_TOPO_TRI_STRIP_ADJ;
+ case VK_PRIMITIVE_TOPOLOGY_POINT_LIST:
+ case VK_PRIMITIVE_TOPOLOGY_LINE_LIST:
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST:
+ default:
+ return PANVK_XFB_TOPO_LIST;
+ }
+}
+
+/* Compute the per-instance output vertex count for a given (topology, input count). */
+static inline uint32_t
+panvk_xfb_output_count(VkPrimitiveTopology topo, uint32_t input_count)
+{
+ switch (topo) {
+ case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP:
+ return input_count >= 1 ? 2u * (input_count - 1u) : 0u;
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP:
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN:
+ return input_count >= 2 ? 3u * (input_count - 2u) : 0u;
+ case VK_PRIMITIVE_TOPOLOGY_LINE_LIST_WITH_ADJACENCY:
+ return (input_count / 4u) * 2u;
+ case VK_PRIMITIVE_TOPOLOGY_LINE_STRIP_WITH_ADJACENCY:
+ return input_count >= 3 ? 2u * (input_count - 3u) : 0u;
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST_WITH_ADJACENCY:
+ return (input_count / 6u) * 3u;
+ case VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP_WITH_ADJACENCY:
+ return input_count >= 6 ? 3u * (input_count / 2u - 2u) : 0u;
+ default:
+ return input_count; /* LIST topologies: 1:1 mapping */
+ }
+}
+#endif
+
+
#endif
diff -urN a/src/panfrost/vulkan/panvk_vX_cmd_draw.c b/src/panfrost/vulkan/panvk_vX_cmd_draw.c
--- a/src/panfrost/vulkan/panvk_vX_cmd_draw.c 2026-05-21 14:04:02.528576354 +0200
+++ b/src/panfrost/vulkan/panvk_vX_cmd_draw.c 2026-05-21 14:04:04.091357598 +0200
@@ -727,6 +727,20 @@
/* iter13: VK_EXT_transform_feedback sysvals — always set (per draw),
* reflect bound XFB state. set_gfx_sysval is a no-op if value unchanged. */
set_gfx_sysval(cmdbuf, dirty_sysvals, vs.num_vertices, info->vertex.count);
+
+ /* iter17: XFB primitive-decomposition sysvals.
+ * xfb_topology = enum value for the current bound topology.
+ * xfb_output_count = per-instance output vertex count after decomposition.
+ * For LIST topologies, output_count == input vertex count and the shader
+ * takes the iter13 single-store fast path. */
+ {
+ VkPrimitiveTopology vk_topo =
+ cmdbuf->vk.dynamic_graphics_state.ia.primitive_topology;
+ uint32_t topo_enum = panvk_vk_topology_to_xfb_enum(vk_topo);
+ uint32_t out_count = panvk_xfb_output_count(vk_topo, info->vertex.count);
+ set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_topology, topo_enum);
+ set_gfx_sysval(cmdbuf, dirty_sysvals, vs.xfb_output_count, out_count);
+ }
{
const struct panvk_cmd_graphics_state *_gfx = &cmdbuf->state.gfx;
/* iter13: default each XFB buffer address to PAN_SHADER_OOB_ADDRESS
diff -urN a/src/panfrost/vulkan/panvk_vX_shader.c b/src/panfrost/vulkan/panvk_vX_shader.c
--- a/src/panfrost/vulkan/panvk_vX_shader.c 2026-05-21 14:04:02.527576494 +0200
+++ b/src/panfrost/vulkan/panvk_vX_shader.c 2026-05-21 14:04:04.098356619 +0200
@@ -895,7 +895,10 @@
nir->info.has_transform_feedback_varyings) {
NIR_PASS(_, nir, nir_opt_constant_folding);
NIR_PASS(_, nir, nir_io_add_intrinsic_xfb_info);
- NIR_PASS(_, nir, pan_nir_lower_xfb);
+ /* iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
+ * primitive decomposition for non-LIST topologies. Single-store LIST
+ * fast path matches iter13 behavior. */
+ NIR_PASS(_, nir, panvk_per_arch(nir_lower_xfb));
}
#endif
}
diff -urN a/src/panfrost/vulkan/panvk_vX_xfb_lower.c b/src/panfrost/vulkan/panvk_vX_xfb_lower.c
--- a/src/panfrost/vulkan/panvk_vX_xfb_lower.c 1970-01-01 01:00:00.000000000 +0100
+++ b/src/panfrost/vulkan/panvk_vX_xfb_lower.c 2026-05-21 14:04:04.115354242 +0200
@@ -0,0 +1,486 @@
+/*
+ * Copyright © 2026 mfritsche / claude-noether
+ * SPDX-License-Identifier: MIT
+ *
+ * iter17: panvk-specific replacement for pan_nir_lower_xfb that handles
+ * primitive decomposition for transform_feedback on non-LIST topologies
+ * (TRIANGLE_STRIP/FAN, LINE_STRIP, *_WITH_ADJACENCY).
+ *
+ * Approach: emit a topology dispatch at the start of each store_output
+ * lowering. The shader reads vs.xfb_topology sysval at runtime and branches
+ * into per-topology emission logic. For each affected topology, the lowered
+ * code emits guarded conditional stores — one per primitive this vertex
+ * contributes to, computing the output buffer position via primitive index
+ * and slot within the decomposed primitive.
+ *
+ * For LIST topologies (POINT/LINE/TRIANGLE LIST), takes a fast path that
+ * matches iter13's single-store behavior.
+ *
+ * For TRIANGLE_FAN, the central vertex (v=0) contributes to ALL primitives
+ * as slot 2 — handled via a NIR loop bounded by num_vertices.
+ *
+ * See ~/src/panvk-bifrost/iter17/phase{0,1,2}_*.md for full design context.
+ */
+
+#include "panvk_macros.h"
+
+#if PAN_ARCH < 9
+
+#include "panvk_shader.h"
+
+#include "compiler/nir/nir_builder.h"
+#include "pan_nir.h"
+
+#include <vulkan/vulkan_core.h>
+
+/* ----- Address arithmetic ----- */
+
+static nir_def *
+xfb_store_addr(nir_builder *b, nir_def *buf, nir_def *out_idx,
+ uint16_t stride, uint16_t offset_bytes)
+{
+ nir_def *byte_off = nir_iadd_imm(b,
+ nir_imul_imm(b, out_idx, stride), offset_bytes);
+ return nir_iadd(b, buf, nir_u2u64(b, byte_off));
+}
+
+static void
+emit_list_store(nir_builder *b, nir_def *buf, nir_def *output_count,
+ nir_def *instance_id, nir_def *raw_vid, nir_def *value,
+ uint16_t stride, uint16_t offset_bytes)
+{
+ nir_def *out_idx = nir_iadd(b,
+ nir_imul(b, instance_id, output_count), raw_vid);
+ nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
+ nir_store_global(b, value, addr);
+}
+
+static void
+emit_prim_store(nir_builder *b, nir_def *buf, nir_def *output_count,
+ nir_def *instance_id, nir_def *eligible,
+ nir_def *prim_idx, nir_def *slot,
+ uint32_t verts_per_prim,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ nir_push_if(b, eligible);
+ {
+ nir_def *out_idx = nir_iadd(b,
+ nir_imul(b, instance_id, output_count),
+ nir_iadd(b, nir_imul_imm(b, prim_idx, verts_per_prim), slot));
+ nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
+ nir_store_global(b, value, addr);
+ }
+ nir_pop_if(b, NULL);
+}
+
+/* ----- Per-topology emission ----- */
+
+/* TRIANGLE_STRIP: vertex v contributes to prims v, v-1, v-2 (per eligibility). */
+static void
+emit_tri_strip(nir_builder *b, nir_def *v, nir_def *N,
+ nir_def *buf, nir_def *output_count, nir_def *instance_id,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ nir_def *Nm2 = nir_iadd_imm(b, N, -2);
+ nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+
+ /* Prim v, slot 0: v < N-2 */
+ emit_prim_store(b, buf, output_count, instance_id,
+ nir_ult(b, v, Nm2),
+ v, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
+
+ /* Prim v-1, slot = 1 if prim even else 2: 1 <= v < N-1 */
+ {
+ nir_def *prim = nir_iadd_imm(b, v, -1);
+ nir_def *parity = nir_iand_imm(b, prim, 1u);
+ nir_def *slot = nir_iadd_imm(b, parity, 1);
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 1)),
+ nir_ult(b, v, Nm1));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, slot, 3, value, stride, offset_bytes);
+ }
+
+ /* Prim v-2, slot = 2 if prim even else 1: 2 <= v < N */
+ {
+ nir_def *prim = nir_iadd_imm(b, v, -2);
+ nir_def *parity = nir_iand_imm(b, prim, 1u);
+ nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity);
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 2)),
+ nir_ult(b, v, N));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, slot, 3, value, stride, offset_bytes);
+ }
+}
+
+/* LINE_STRIP: vertex v contributes to prim v slot 0 + prim v-1 slot 1. */
+static void
+emit_line_strip(nir_builder *b, nir_def *v, nir_def *N,
+ nir_def *buf, nir_def *output_count, nir_def *instance_id,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+
+ /* Prim v, slot 0: v < N-1 */
+ emit_prim_store(b, buf, output_count, instance_id,
+ nir_ult(b, v, Nm1),
+ v, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
+
+ /* Prim v-1, slot 1: 1 <= v < N */
+ {
+ nir_def *prim = nir_iadd_imm(b, v, -1);
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 1)),
+ nir_ult(b, v, N));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
+ }
+}
+
+/* TRIANGLE_FAN: prim p emits {p+1, p+2, 0}.
+ * vertex v=0: contributes to ALL prims as slot 2 (loop required)
+ * vertex v>=1: contributes to prim v-1 as slot 0 (if 1 <= v <= N-2)
+ * vertex v>=2: contributes to prim v-2 as slot 1 (if 2 <= v <= N-1)
+ */
+static void
+emit_tri_fan(nir_builder *b, nir_def *v, nir_def *N,
+ nir_def *buf, nir_def *output_count, nir_def *instance_id,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+ nir_def *Nm2 = nir_iadd_imm(b, N, -2);
+
+ /* Prim v-1, slot 0: 1 <= v < N-1 */
+ {
+ nir_def *prim = nir_iadd_imm(b, v, -1);
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 1)),
+ nir_ult(b, v, Nm1));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
+ }
+
+ /* Prim v-2, slot 1: 2 <= v < N */
+ {
+ nir_def *prim = nir_iadd_imm(b, v, -2);
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 2)),
+ nir_ult(b, v, N));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, nir_imm_int(b, 1), 3, value, stride, offset_bytes);
+ }
+
+ /* Central vertex (v == 0): loop over all prims, write to slot 2. */
+ nir_push_if(b, nir_ieq_imm(b, v, 0));
+ {
+ nir_variable *p_var = nir_local_variable_create(b->impl,
+ glsl_uint_type(), "fan_p");
+ nir_store_var(b, p_var, nir_imm_int(b, 0), 0x1);
+ nir_push_loop(b);
+ {
+ nir_def *p = nir_load_var(b, p_var);
+ nir_push_if(b, nir_uge(b, p, Nm2));
+ {
+ nir_jump(b, nir_jump_break);
+ }
+ nir_pop_if(b, NULL);
+
+ nir_def *out_idx = nir_iadd(b,
+ nir_imul(b, instance_id, output_count),
+ nir_iadd_imm(b, nir_imul_imm(b, p, 3), 2));
+ nir_def *addr = xfb_store_addr(b, buf, out_idx, stride, offset_bytes);
+ nir_store_global(b, value, addr);
+
+ nir_store_var(b, p_var, nir_iadd_imm(b, p, 1), 0x1);
+ }
+ nir_pop_loop(b, NULL);
+ }
+ nir_pop_if(b, NULL);
+}
+
+/* LINE_LIST_WITH_ADJACENCY: 4-vertex groups [4i..4i+3]; output {4i+1, 4i+2}.
+ * v contributes if v%4 == 1: prim v/4 slot 0
+ * v contributes if v%4 == 2: prim v/4 slot 1
+ */
+static void
+emit_line_list_adj(nir_builder *b, nir_def *v, nir_def *N,
+ nir_def *buf, nir_def *output_count, nir_def *instance_id,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ (void)N; /* eligibility is mod-based, not range-based */
+ nir_def *vmod4 = nir_iand_imm(b, v, 3u);
+ nir_def *prim = nir_ushr_imm(b, v, 2); /* v / 4 */
+
+ emit_prim_store(b, buf, output_count, instance_id,
+ nir_ieq_imm(b, vmod4, 1),
+ prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
+
+ emit_prim_store(b, buf, output_count, instance_id,
+ nir_ieq_imm(b, vmod4, 2),
+ prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
+}
+
+/* LINE_STRIP_WITH_ADJACENCY: prim p emits {p+1, p+2}.
+ * v contributes to prim v-1 slot 0 (1 <= v <= N-2)
+ * v contributes to prim v-2 slot 1 (2 <= v <= N-1)
+ */
+static void
+emit_line_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
+ nir_def *buf, nir_def *output_count, nir_def *instance_id,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ nir_def *Nm1 = nir_iadd_imm(b, N, -1);
+ nir_def *Nm2 = nir_iadd_imm(b, N, -2);
+
+ /* Prim v-1, slot 0: 1 <= v <= N-2 ⇔ v >= 1 AND v < N-1 */
+ {
+ nir_def *prim = nir_iadd_imm(b, v, -1);
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 1)),
+ nir_ult(b, v, Nm1));
+ (void)Nm2; /* unused: the N-2 bound is expressed above as v < N-1 */
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, nir_imm_int(b, 0), 2, value, stride, offset_bytes);
+ }
+
+ /* Prim v-2, slot 1: 2 <= v <= N-1 ⇔ v >= 2 AND v < N */
+ {
+ nir_def *prim = nir_iadd_imm(b, v, -2);
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 2)),
+ nir_ult(b, v, N));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, nir_imm_int(b, 1), 2, value, stride, offset_bytes);
+ }
+}
+
+/* TRIANGLE_LIST_WITH_ADJACENCY: 6-vertex groups; output {6i, 6i+2, 6i+4}.
+ * v contributes if v%6 == 0: prim v/6 slot 0
+ * v contributes if v%6 == 2: prim v/6 slot 1
+ * v contributes if v%6 == 4: prim v/6 slot 2
+ */
+static void
+emit_tri_list_adj(nir_builder *b, nir_def *v, nir_def *N,
+ nir_def *buf, nir_def *output_count, nir_def *instance_id,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ (void)N;
+ nir_def *vmod6 = nir_umod_imm(b, v, 6);
+ nir_def *prim = nir_udiv_imm(b, v, 6);
+
+ for (uint32_t slot = 0; slot < 3; slot++) {
+ emit_prim_store(b, buf, output_count, instance_id,
+ nir_ieq_imm(b, vmod6, slot * 2),
+ prim, nir_imm_int(b, slot), 3, value, stride, offset_bytes);
+ }
+}
+
+/* TRIANGLE_STRIP_WITH_ADJACENCY: prim i emits:
+ * even i: {2i, 2i+2, 2i+4} (slots 0, 1, 2 ← input indices 2i, 2i+2, 2i+4)
+ * odd i: {2i, 2i+4, 2i+2} (slots 0, 1, 2 ← input indices 2i, 2i+4, 2i+2)
+ *
+ * Only EVEN input vertices contribute (since all output indices are 2*something).
+ * For even input v:
+ * prim v/2 slot 0 (always, if v/2 < N/2-2)
+ * prim (v-2)/2 slot 1 if (v-2)/2 even, slot 2 if odd (when v >= 2)
+ * prim (v-4)/2 slot 2 if (v-4)/2 even, slot 1 if odd (when v >= 4)
+ */
+static void
+emit_tri_strip_adj(nir_builder *b, nir_def *v, nir_def *N,
+ nir_def *buf, nir_def *output_count, nir_def *instance_id,
+ nir_def *value, uint16_t stride, uint16_t offset_bytes)
+{
+ /* Bail for odd input vertices — they never contribute. */
+ nir_def *v_is_even = nir_ieq_imm(b, nir_iand_imm(b, v, 1u), 0);
+ nir_push_if(b, v_is_even);
+ {
+ nir_def *N_half = nir_ushr_imm(b, N, 1);
+ nir_def *max_prim = nir_iadd_imm(b, N_half, -2); /* N/2 - 2 */
+ nir_def *v_half = nir_ushr_imm(b, v, 1);
+
+ /* Prim v/2 slot 0: v/2 < N/2 - 2 */
+ emit_prim_store(b, buf, output_count, instance_id,
+ nir_ult(b, v_half, max_prim),
+ v_half, nir_imm_int(b, 0), 3, value, stride, offset_bytes);
+
+ /* Prim (v-2)/2 = v/2 - 1: v >= 2 AND prim < N/2-2 */
+ {
+ nir_def *prim = nir_iadd_imm(b, v_half, -1);
+ nir_def *parity = nir_iand_imm(b, prim, 1u);
+ nir_def *slot = nir_iadd_imm(b, parity, 1); /* even→1, odd→2 */
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 2)),
+ nir_ult(b, prim, max_prim));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, slot, 3, value, stride, offset_bytes);
+ }
+
+ /* Prim (v-4)/2 = v/2 - 2: v >= 4 AND prim < N/2-2 */
+ {
+ nir_def *prim = nir_iadd_imm(b, v_half, -2);
+ nir_def *parity = nir_iand_imm(b, prim, 1u);
+ nir_def *slot = nir_isub(b, nir_imm_int(b, 2), parity); /* even→2, odd→1 */
+ nir_def *eligible = nir_iand(b,
+ nir_uge(b, v, nir_imm_int(b, 4)),
+ nir_ult(b, prim, max_prim));
+ emit_prim_store(b, buf, output_count, instance_id, eligible,
+ prim, slot, 3, value, stride, offset_bytes);
+ }
+ }
+ nir_pop_if(b, NULL);
+}
+
+/* ----- Main lowering: per store_output XFB channel ----- */
+
+static void
+lower_xfb_output_iter17(nir_builder *b, nir_intrinsic_instr *intr,
+ unsigned channel_idx, unsigned num_components,
+ unsigned buffer, unsigned offset_words)
+{
+ assert(buffer < MAX_XFB_BUFFERS);
+ assert(nir_intrinsic_component(intr) == 0);
+
+ uint16_t stride = b->shader->info.xfb_stride[buffer] * 4;
+ assert(stride != 0);
+ uint16_t offset_bytes = offset_words * 4;
+
+ BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_VERTEX_ID_ZERO_BASE);
+ BITSET_SET(b->shader->info.system_values_read, SYSTEM_VALUE_INSTANCE_ID);
+
+ nir_def *topology = load_sysval(b, graphics, 32, vs.xfb_topology);
+ nir_def *out_count = load_sysval(b, graphics, 32, vs.xfb_output_count);
+ nir_def *N = nir_load_num_vertices(b);
+ nir_def *v = nir_load_raw_vertex_id_pan(b);
+ nir_def *instance = nir_load_instance_id(b);
+ nir_def *buf = nir_load_xfb_address(b, 64, .base = buffer);
+
+ nir_def *src = intr->src[0].ssa;
+ nir_component_mask_t mask = nir_component_mask(num_components);
+ nir_def *value = nir_channels(b, src, mask << channel_idx);
+
+ /* Topology dispatch ladder. LIST first (fast path). */
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LIST));
+ {
+ emit_list_store(b, buf, out_count, instance, v, value,
+ stride, offset_bytes);
+ }
+ nir_push_else(b, NULL);
+ {
+ /* iter17 Janet Finding 3: gate all non-LIST emission on
+ * output_count > 0. For degenerate input counts (N < min required
+ * for the topology), output_count is 0 and we must emit NO stores
+ * — otherwise N-2 / N-3 / etc. arithmetic underflows in the
+ * eligibility predicates and we falsely fire stores. */
+ nir_push_if(b, nir_ult(b, nir_imm_int(b, 0), out_count));
+ {
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_STRIP));
+ {
+ emit_tri_strip(b, v, N, buf, out_count, instance, value,
+ stride, offset_bytes);
+ }
+ nir_push_else(b, NULL);
+ {
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP));
+ {
+ emit_line_strip(b, v, N, buf, out_count, instance, value,
+ stride, offset_bytes);
+ }
+ nir_push_else(b, NULL);
+ {
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_FAN));
+ {
+ emit_tri_fan(b, v, N, buf, out_count, instance, value,
+ stride, offset_bytes);
+ }
+ nir_push_else(b, NULL);
+ {
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_LIST_ADJ));
+ {
+ emit_line_list_adj(b, v, N, buf, out_count, instance, value,
+ stride, offset_bytes);
+ }
+ nir_push_else(b, NULL);
+ {
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_LINE_STRIP_ADJ));
+ {
+ emit_line_strip_adj(b, v, N, buf, out_count, instance, value,
+ stride, offset_bytes);
+ }
+ nir_push_else(b, NULL);
+ {
+ nir_push_if(b, nir_ieq_imm(b, topology, PANVK_XFB_TOPO_TRI_LIST_ADJ));
+ {
+ emit_tri_list_adj(b, v, N, buf, out_count, instance, value,
+ stride, offset_bytes);
+ }
+ nir_push_else(b, NULL);
+ {
+ /* TRI_STRIP_ADJ — last case */
+ emit_tri_strip_adj(b, v, N, buf, out_count, instance, value,
+ stride, offset_bytes);
+ }
+ nir_pop_if(b, NULL);
+ }
+ nir_pop_if(b, NULL);
+ }
+ nir_pop_if(b, NULL);
+ }
+ nir_pop_if(b, NULL);
+ }
+ nir_pop_if(b, NULL);
+ }
+ nir_pop_if(b, NULL);
+ }
+ nir_pop_if(b, NULL); /* Janet Finding 3: close output_count > 0 guard */
+ }
+ nir_pop_if(b, NULL);
+}
+
+/* Mirror of pan_nir_lower_xfb's lower_xfb: load_vertex_id rewrite +
+ * dispatch store_output through our topology-aware emission. */
+static bool
+lower_xfb_iter17(nir_builder *b, nir_intrinsic_instr *intr,
+ UNUSED void *data)
+{
+ if (intr->intrinsic == nir_intrinsic_load_vertex_id) {
+ b->cursor = nir_instr_remove(&intr->instr);
+ nir_def *repl = nir_iadd(b, nir_load_raw_vertex_id_pan(b),
+ nir_load_raw_vertex_offset_pan(b));
+ nir_def_rewrite_uses(&intr->def, repl);
+ return true;
+ }
+
+ if (intr->intrinsic != nir_intrinsic_store_output)
+ return false;
+
+ bool progress = false;
+ b->cursor = nir_before_instr(&intr->instr);
+
+ /* io_xfb has only out[0,1]; the other 2 channels are in io_xfb2.
+ * Outer loop selects which annotation; inner picks which channel. */
+ for (unsigned i = 0; i < 2; ++i) {
+ nir_io_xfb xfb = i ? nir_intrinsic_io_xfb2(intr)
+ : nir_intrinsic_io_xfb(intr);
+ for (unsigned j = 0; j < 2; ++j) {
+ if (!xfb.out[j].num_components)
+ continue;
+ lower_xfb_output_iter17(b, intr, i * 2 + j, xfb.out[j].num_components,
+ xfb.out[j].buffer, xfb.out[j].offset);
+ progress = true;
+ }
+ }
+
+ if (progress)
+ nir_instr_remove(&intr->instr);
+ return progress;
+}
+
+bool
+panvk_per_arch(nir_lower_xfb)(nir_shader *nir)
+{
+ return nir_shader_intrinsics_pass(
+ nir, lower_xfb_iter17, nir_metadata_control_flow, NULL);
+}
+
+#endif /* PAN_ARCH < 9 */
+17 -2
@@ -30,7 +30,7 @@
pkgname=mesa-panvk-bifrost
_mesaver=26.0.6
pkgver=26.0.6.r3
pkgver=26.0.6.r4
pkgrel=1
pkgdesc="Patched Mesa libvulkan_panfrost.so exposing Bifrost-gen Mali to Vulkan apps (panvk-bifrost campaign)"
arch=('aarch64')
@@ -80,6 +80,7 @@ source=(
"0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch"
"0002-panvk-expose-vulkan-1.1-1.2-on-bifrost.patch"
"0003-panvk-bifrost-vk-ext-transform-feedback.patch"
"0004-panvk-bifrost-xfb-primitive-decomposition.patch"
"brave-vulkan"
"icd.json"
)
@@ -90,6 +91,7 @@ sha256sums=(
'SKIP'
'SKIP'
'SKIP'
'SKIP'
)
prepare() {
@@ -116,6 +118,15 @@ prepare() {
# reports "Hardware accelerated" across the board for the affected paths).
patch -p1 < "${srcdir}/0003-panvk-bifrost-vk-ext-transform-feedback.patch"
# iter17: XFB primitive decomposition for non-LIST topologies (TRI_STRIP,
# TRI_FAN, LINE_STRIP, *_WITH_ADJACENCY). Replacement panvk-specific
# NIR pass (panvk_per_arch(nir_lower_xfb)) substituted for upstream
# pan_nir_lower_xfb. Closes the 162 dEQP-VK winding_* failures from
# iter15 (958 P / 81 F / 0 Crash on full XFB CTS — remaining 81 fails
# are by-design resume_* tests, transformFeedbackDraw=false).
# Phase-doc context: ~/src/panvk-bifrost/iter17/phase{0,1,2,4,5,6,8}_*.md.
patch -p1 < "${srcdir}/0004-panvk-bifrost-xfb-primitive-decomposition.patch"
# Sanity-check the patches landed.
grep -q "KHR_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
grep -q "EXT_robustness2 = true," src/panfrost/vulkan/panvk_vX_physical_device.c
@@ -124,8 +135,12 @@ prepare() {
grep -q "has_vk1_2 = true;" src/panfrost/vulkan/panvk_vX_physical_device.c
# iter13 sanity:
grep -q "EXT_transform_feedback = PAN_ARCH < 9," src/panfrost/vulkan/panvk_vX_physical_device.c
grep -q "pan_nir_lower_xfb" src/panfrost/vulkan/panvk_vX_shader.c
test -f src/panfrost/vulkan/jm/panvk_vX_cmd_xfb.c
# iter17 sanity: pan_nir_lower_xfb call site has been replaced; new file present.
grep -q "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c
grep -q "xfb_topology" src/panfrost/vulkan/panvk_shader.h
grep -q "panvk_xfb_topology" src/panfrost/vulkan/panvk_shader.h
test -f src/panfrost/vulkan/panvk_vX_xfb_lower.c
}
build() {
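The sanity checks in prepare() use bare `grep -q`, which prints nothing when a marker is missing, so the only signal is a non-zero exit status. A minimal sketch of a more talkative check follows. `require_pattern` is a hypothetical helper and is not part of this PKGBUILD; it matches fixed strings, so the parentheses in `panvk_per_arch(nir_lower_xfb)` need no escaping.

```shell
#!/bin/sh
# Hypothetical helper (not in the PKGBUILD): behaves like `grep -q`, but treats
# the pattern as a fixed string and names the missing marker on failure.
require_pattern() {
    pattern=$1
    file=$2
    if ! grep -qF -- "$pattern" "$file"; then
        echo "sanity check failed: '$pattern' not found in $file" >&2
        return 1
    fi
}
```

Usage would mirror the existing lines, e.g. `require_pattern "panvk_per_arch(nir_lower_xfb)" src/panfrost/vulkan/panvk_vX_shader.c`.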
+49 -23
@@ -14,9 +14,9 @@
# Sibling userspace package: ../daedalus-v4l2/build-deb.sh
set -euo pipefail
UPSTREAM_COMMIT=f0d41867f60f5bf8dbfcc6cc16404d7d7eb90014
PKGVER=0.1.0+r24+gf0d4186
PKGREL=1 # reset for new upstream pin (3dd0eb0 — DAEMON-PPS H.264 SPS/PPS NAL synth)
UPSTREAM_COMMIT=5d8b4369e58ab947d1c56b1f718293c57c6065b5
PKGVER=0.1.0+r33+g5d8b436
PKGREL=1 # reset for new upstream pin (5d8b436 — revert parking design); still carries the #64 multi-kernel postinst fix
MODULE_NAME=daedalus_v4l2
HERE=$(dirname "$(readlink -f "$0")")
@@ -78,7 +78,6 @@ set -e
NAME=${MODULE_NAME}
VERSION=${PKGVER}
KERNELVER=\$(uname -r)
# Yellow + bold ANSI for the warning so it stands out in apt's
# stream of "Setting up" lines. Disable colour on non-TTY.
@@ -101,29 +100,56 @@ if [ "\$1" = "configure" ]; then
dkms add "\$NAME/\$VERSION" 2>/dev/null || true
# Don't let autoinstall failure mask the actual problem behind '|| true'.
# Run it, capture the result, then verify post-condition.
autoinstall_rc=0
dkms autoinstall "\$NAME/\$VERSION" || autoinstall_rc=\$?
# Enumerate every kernel whose headers are actually present
# (/lib/modules/<kver>/build resolves to a directory). We iterate
# all of them — not just \$(uname -r) — so that installing this
# package after a kernel update covers the newly-installed kernel
# too, and so that a later kernel-headers install for a previously
# uncovered version gets picked up on dpkg-reconfigure. Without
# this, autoinstall (which targets only the running kernel) leaves
# /dev/daedalus-v4l2 absent after a kernel switch + reboot
# (marfrit/marfrit-packages#64).
kvers=''
for d in /lib/modules/*/build; do
[ -d "\$d" ] || continue
k=\$(basename "\$(dirname "\$d")")
kvers="\$kvers \$k"
done
# Verify the module actually built + installed for the running kernel.
status=\$(dkms status -m "\$NAME" -v "\$VERSION" -k "\$KERNELVER" 2>/dev/null || true)
if ! printf '%s\\n' "\$status" | grep -q -E 'installed|loaded'; then
if [ -z "\$kvers" ]; then
warn ""
warn "DKMS build did NOT land for kernel \$KERNELVER."
warn " dkms status -m \$NAME -v \$VERSION -k \$KERNELVER:"
warn " \$(printf '%s' "\$status" | head -1)"
warn ""
warn "Most likely cause: kernel headers package is missing."
warn " Raspberry Pi OS / Pi 5: apt install linux-headers-rpi-2712"
warn " Debian generic: apt install linux-headers-\$KERNELVER"
warn ""
warn "After installing headers, finish the install with:"
warn "No kernels with headers found under /lib/modules/*/build."
warn "Install kernel headers (e.g. linux-headers-rpi-2712 on Pi OS)"
warn "then finish with:"
warn " sudo dkms autoinstall \$NAME/\$VERSION"
warn " sudo modprobe daedalus_v4l2"
exit 0
fi
failed=''
for k in \$kvers; do
dkms autoinstall -k "\$k" "\$NAME/\$VERSION" >/dev/null 2>&1 || true
s=\$(dkms status -m "\$NAME" -v "\$VERSION" -k "\$k" 2>/dev/null || true)
if ! printf '%s\\n' "\$s" | grep -q -E 'installed|loaded'; then
failed="\$failed \$k"
fi
done
if [ -n "\$failed" ]; then
warn ""
warn "Until then daedalus_v4l2 will NOT be loadable and the"
warn "userspace daedalus-v4l2 daemon will have nothing to talk to."
warn "DKMS build did NOT land for kernel(s):\$failed"
warn ""
warn "Most likely cause: kernel headers missing for those versions."
warn " Raspberry Pi OS / Pi 5: apt install linux-headers-rpi-2712"
warn " Debian generic: apt install linux-headers-<version>"
warn ""
warn "After installing headers, finish with:"
for k in \$failed; do
warn " sudo dkms autoinstall -k \$k \$NAME/\$VERSION"
done
warn " sudo modprobe daedalus_v4l2 (after booting that kernel)"
warn ""
warn "Until then daedalus_v4l2 will NOT be loadable on those kernels"
warn "and the userspace daedalus-v4l2 daemon will have nothing to talk to."
fi
fi
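The header enumeration in the postinst above can be sketched as a standalone function for trying it outside the heredoc. The `modules_root` parameter is an assumption added here so the loop can run against a scratch directory; the postinst itself hard-codes /lib/modules.

```shell
#!/bin/sh
# Standalone sketch of the postinst enumeration loop. The modules_root
# argument is hypothetical (added for testing); it defaults to /lib/modules.
list_kernels_with_headers() {
    modules_root=${1:-/lib/modules}
    kvers=''
    for d in "$modules_root"/*/build; do
        # -d follows symlinks, so a dangling build link is skipped,
        # as is the unexpanded glob when no kernel directory exists.
        [ -d "$d" ] || continue
        k=$(basename "$(dirname "$d")")
        kvers="$kvers $k"
    done
    # Print one space-separated line without the leading blank.
    echo "${kvers# }"
}
```

An empty result corresponds to the "No kernels with headers found" branch; each printed version is a candidate for `dkms autoinstall -k`.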
+73
@@ -1,3 +1,76 @@
daedalus-v4l2-dkms (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
* Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8. Kernel
module returns to the pre-#7 buf_done_and_job_finish completion
model: no src/dst lifecycle decoupling, no parked dst_bufs, no
1:1-contract violation against libva-v4l2-request-fourier
(closes daedalus-v4l2#9 + #10 as won't-fix at this layer; proper
fix tracked at daedalus-v4l2#11).
* Wire-protocol drops 1 → 0; lock-step install with daedalus-v4l2
0.1.0+r33+g5d8b436 REQUIRED.
* Carries forward the #64 multi-kernel postinst fix.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 14:50:00 +0000
daedalus-v4l2-dkms (0.1.0+r30+g6ffe92b-1) bookworm trixie; urgency=medium
* Bump to 6ffe92b — fixes the kernel panic regression introduced
by 79256dc's split-completion design (closes daedalus-v4l2#8).
`device_run` now removes both src + dst from `m2m_ctx`'s
rdy_queue at pickup time, not at `buf_done` time. Without
this, after `SRC_CONSUMED`'s `job_finish` released the m2m
scheduler, the NEXT `device_run` saw the still-queued parked
dst_buf and paired it with a fresh src — two inflight entries
referencing the same vb2_buffer, the later `HAS_PIXELS`
triggered list_del on an already-detached list_head, smashing
the rdy_queue → hard reboot on Pi CM5 during `mpv vaapi-copy`
playback of 720p H.264 (2026-05-21).
* Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 1.
Daemon (userspace daedalus-v4l2 package) need NOT bump in
lockstep with this DKMS update; the existing
daedalus-v4l2 0.1.0+r28+g79256dc is wire-compatible with
daedalus-v4l2-dkms 0.1.0+r30+g6ffe92b.
* Carries forward the #64 multi-kernel postinst fix.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 14:00:00 +0000
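The lock-step rules in these entries reduce to a small table of pin to DAEDALUS_PROTO_VERSION. The sketch below transcribes it from the changelog text; the value for f0d4186 is inferred from the later "0 → 1" bump rather than stated outright. Both function names are hypothetical and nothing like this ships in either package.

```shell
#!/bin/sh
# Pin -> DAEDALUS_PROTO_VERSION, transcribed from the changelog entries.
# f0d4186 = 0 is inferred from the "0 -> 1" bump at 79256dc.
proto_version() {
    case $1 in
        f0d4186|5d8b436) echo 0 ;;
        79256dc|6ffe92b) echo 1 ;;
        *) echo unknown; return 1 ;;
    esac
}

# Succeeds when a daemon pin and a DKMS pin speak the same wire protocol.
# Returns 2 when either pin is not in the table.
wire_compatible() {
    daemon_proto=$(proto_version "$1") || return 2
    dkms_proto=$(proto_version "$2") || return 2
    [ "$daemon_proto" = "$dkms_proto" ]
}
```

For example, `wire_compatible 79256dc 6ffe92b` succeeds (the r28 daemon with the r30 DKMS module), while `wire_compatible 79256dc 5d8b436` fails, which is the case the r33 entry marks as lock-step REQUIRED.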
daedalus-v4l2-dkms (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
* Bump to 79256dc — H.264 B-frame display reorder fix (closes
daedalus-v4l2#6). libavcodec's H.264 decoder reorders output to
display order before returning from avcodec_receive_frame; the
daemon was binding each REQ_DECODE's pixels to the cookie of the
bitstream that triggered the receive_frame call, not the cookie
of the bitstream that actually produced the picture. For B-frame
sequences this paired cookie N's CAPTURE buffer with cookie N-2's
pixels and silently lost intermediate frames — visible as
"2 1 4 3 6 5" frame pairing in mpv / Firefox on Pi CM5.
* Wire-protocol bump (DAEDALUS_PROTO_VERSION 0 → 1): REQ_DECODE
gains __u64 src_pts; RESP_FRAME gains __u32 flags +
__u64 output_src_pts. Kernel + daemon must install atomically
(this package + daedalus-v4l2 0.1.0+r28+g79256dc).
* Carries forward the #64 multi-kernel postinst fix from -2:
autoinstall for every /lib/modules/*/build that resolves to real
headers, not just $(uname -r).
* Closes #64.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 12:00:00 +0000
daedalus-v4l2-dkms (0.1.0+r24+gf0d4186-2) bookworm trixie; urgency=medium
* postinst: autoinstall for every installed kernel with headers, not
just the running one. Previously `dkms autoinstall $NAME/$VERSION`
built only against `$(uname -r)`, so installing the package on
kernel A and then rebooting into a separately-installed kernel B
left /lib/modules/B/updates/dkms/ empty — /dev/daedalus-v4l2 absent,
daedalus daemon nothing to talk to, browser/VAAPI silently falling
back to software with no obvious diagnostic. Now we enumerate every
/lib/modules/*/build that resolves to a real directory and run
`dkms autoinstall -k <kver>` for each, reporting per-kernel failure
only when headers are missing. Closes #64.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 09:30:00 +0000
daedalus-v4l2-dkms (0.1.0+r24+gf0d4186-1) bookworm trixie; urgency=medium
* Bump to f0d4186 — per-ctx vb2 lock fix. daedalus_queue_init now
+41 -8
@@ -11,13 +11,23 @@
# Upstream repo: https://git.reauktion.de/reauktion/daedalus-v4l2
set -euo pipefail
# Same pin as the Arch PKGBUILD. 481279c = "Phase 8.13: byte-exact
# end-to-end via libva (consumer target hit)" — first commit where the
# full ffmpeg -hwaccel vaapi → libva → /dev/video0 → daemon path lands
# a pixel-correct decoded frame back in ffmpeg.
UPSTREAM_COMMIT=f0d41867f60f5bf8dbfcc6cc16404d7d7eb90014
PKGVER=0.1.0+r24+gf0d4186
PKGREL=1 # reset for new upstream pin (3dd0eb0 — DAEMON-PPS H.264 SPS/PPS NAL synth)
# 6e6dfa1 = picks up daedalus-v4l2 PR #16 — daemon now dlopens
# the Kwiboo fourier fork's libavcodec.so.62 / libavformat.so.62 /
# libavutil.so.60 at /opt/fourier instead of Debian-stock soname
# 61/61/59. First step on the daedalus-fourier substitution arc
# (daedalus-v4l2#11): routes the daemon through the libavcodec
# source tree we own in marfrit-packages. Headers + .pc files
# come from ffmpeg-v4l2-request-fourier (installed by the CI
# workflow before this script runs; see PKG_CONFIG_PATH below).
UPSTREAM_COMMIT=6e6dfa144da7bc7fa8be50c8da91d7d1c6132a2c
PKGVER=0.1.0+r41+g6e6dfa1
PKGREL=1 # reset for new upstream pin (6e6dfa1 — soname 62 via /opt/fourier)
# daedalus-fourier pin. d87239d = marfrit/daedalus-fourier PR #1 merge
# (install rules + pkg-config, enables this consumer to find_package
# + link). Bump in lockstep with the upstream daemon when daedalus-
# fourier's API or installed shaders are changed by a new consumer.
DAEDALUS_FOURIER_COMMIT=d87239d8172307d9a1b93c95cbed116d175b85cc
HERE=$(dirname "$(readlink -f "$0")")
@@ -27,14 +37,37 @@ export SOURCE_DATE_EPOCH=1779231600
work=$(mktemp -d)
trap "rm -rf $work" EXIT
# --- daedalus-fourier: fetch + build + install to per-build prefix ---
#
# Static-linked into the daemon, so the temp prefix is only for the
# duration of this build script. Requires libvulkan-dev + glslang-tools
# on the runner (already needed for the daedalus-fourier benches).
FOURIER_PREFIX=$work/fourier-prefix
mkdir -p "$FOURIER_PREFIX"
cd "$work"
curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-fourier.tar.gz \
"https://git.reauktion.de/marfrit/daedalus-fourier/archive/${DAEDALUS_FOURIER_COMMIT}.tar.gz"
tar xzf daedalus-fourier.tar.gz
cd daedalus-fourier
cmake -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX="$FOURIER_PREFIX"
cmake --build build --target daedalus_core
cmake --install build
# --- daedalus-v4l2: fetch + build daemon against installed daedalus-fourier ---
cd "$work"
curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-v4l2.tar.gz \
"https://git.reauktion.de/reauktion/daedalus-v4l2/archive/${UPSTREAM_COMMIT}.tar.gz"
tar xzf daedalus-v4l2.tar.gz
SRCDIR=daedalus-v4l2
# Build daemon (CMake)
# Build daemon (CMake) — point pkg-config at the daedalus-fourier
# temp prefix so pkg_check_modules(DAEDALUS_FOURIER …) resolves to it.
cd "$SRCDIR/daemon"
PKG_CONFIG_PATH="$FOURIER_PREFIX/lib/pkgconfig:/opt/fourier/lib/pkgconfig" \
cmake -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr
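PKGVER in this script follows the shape `<base>+r<revcount>+g<7-char sha>` and has to be kept in sync with UPSTREAM_COMMIT by hand. A minimal sketch of deriving one from the other is shown here; `make_pkgver` is a hypothetical helper, and the revision count is passed in because the script does not show how it is obtained.

```shell
#!/bin/sh
# Hypothetical helper: compose PKGVER as <base>+r<revcount>+g<7-char sha>.
# The revision count is an input; its source is not shown in build-deb.sh.
make_pkgver() {
    base=$1
    revcount=$2
    commit=$3
    # %.7s truncates the full SHA to its first seven characters.
    short=$(printf '%.7s' "$commit")
    echo "${base}+r${revcount}+g${short}"
}
```

With the pin above, `make_pkgver 0.1.0 41 6e6dfa144da7bc7fa8be50c8da91d7d1c6132a2c` yields `0.1.0+r41+g6e6dfa1`.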
+117
@@ -1,3 +1,120 @@
daedalus-v4l2 (0.1.0+r41+g6e6dfa1-1) bookworm trixie; urgency=medium
* Bump to 6e6dfa1 — daedalus-v4l2 PR #16. Daemon dlopens Kwiboo
fourier fork's libavcodec.so.62 / libavformat.so.62 /
libavutil.so.60 at /opt/fourier instead of Debian-stock
soname 61/61/59. First step on the daedalus-fourier
substitution arc (daedalus-v4l2#11): the next PR series
layers daedalus_recipe_dispatch_h264_* substitution patches
into ffmpeg-v4l2-request-fourier's H264DSPContext NEON init,
reaching the daemon's production decode path.
* Build: PKG_CONFIG_PATH now includes /opt/fourier/lib/pkgconfig
so daemon's pkg_check_modules picks up the Kwiboo .pc files.
* CI workflow build-deps: libavcodec-dev / libavformat-dev /
libavutil-dev (Debian stock 7.1.3) → ffmpeg-v4l2-request-fourier
(provides /opt/fourier/include + .pc files).
* Wire protocol unchanged. No daedalus-v4l2-dkms bump.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 21:30:00 +0000
daedalus-v4l2 (0.1.0+r39+g3bc0da1-1) bookworm trixie; urgency=medium
* Bump to 3bc0da1 — picks up daedalus-v4l2 PR #15. Per-frame
`decoder: OK ...` log line gains `decode_us=N` (libavcodec
send_packet + receive_frame wall-clock cost in microseconds).
New `decoder stats` summary line every 60 decoded frames with
codec, fps, avg decode_us, MBs/s throughput, B/MB bitrate.
* Pure observability — no decode-path behaviour change.
Establishes baseline metrics for the substitution work in
daedalus-v4l2#11 step 2 (replacing libavcodec primitives with
daedalus-fourier kernels one cycle at a time).
* On Pi CM5 / bbb 720p H.264 baseline: ~4 ms decode_us / 24 fps
/ 90 K MBs/s — workload is well under 1 % of any single
daedalus-fourier kernel's NEON ceiling.
* Wire protocol unchanged. No daedalus-v4l2-dkms bump needed.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 18:30:00 +0000
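The "90 K MBs/s" figure in the r39 entry can be cross-checked with a little arithmetic. The sketch assumes that "720p" means a 1280x720 frame and uses the 16x16 macroblock size of H.264; `mbs_per_second` is a hypothetical name.

```shell
#!/bin/sh
# Macroblocks per second for a given frame size and rate. H.264 macroblocks
# cover 16x16 luma samples; dimensions are rounded up to whole macroblocks.
mbs_per_second() {
    width=$1
    height=$2
    fps=$3
    echo $(( ((width + 15) / 16) * ((height + 15) / 16) * fps ))
}
```

`mbs_per_second 1280 720 24` gives 80 x 45 x 24 = 86400, in line with the roughly 90 K MBs/s the stats line reports.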
daedalus-v4l2 (0.1.0+r37+g77e14e5-1) bookworm trixie; urgency=medium
* Bump to 77e14e5 — picks up daedalus-v4l2 PRs #12 + #13.
* #12 (LOW_DELAY half-measure): the daemon now sets
AV_CODEC_FLAG_LOW_DELAY on the H.264 AVCodecContext so libavcodec
emits frames in decode order ~99% of the time (a few stragglers
at GOP boundaries when the stream's SPS num_reorder_frames
overrides the flag). Visible improvement vs the 2-1-4-3
pair-swap on Firefox YouTube + mpv playback; not a permanent
fix (see #11 for the architectural plan).
* #13 (daedalus-fourier linkage): the daemon now pkg-config-links
against the daedalus-fourier kernel library (marfrit/
daedalus-fourier) and logs substrate availability at startup.
No kernels dispatched yet — this is the build-time / link-time
foundation for the H.264 daemon-rewrite plan in #11
(substituting daedalus-fourier IDCT 4×4 / IDCT 8×8 / luma
deblock primitives for libavcodec's per-MB pixel math, one
cycle at a time, measuring CPU saved per substitution).
* build-deb.sh now fetches + builds + installs daedalus-fourier
(pinned at d87239d, marfrit/daedalus-fourier PR #1) into a
per-build temp prefix, then builds the daemon with
PKG_CONFIG_PATH pointing at it. daedalus-fourier is
statically linked into the daemon binary, so the resulting
.deb has no new runtime deps. Requires libvulkan-dev +
glslang-tools on the CI runner (the daedalus-fourier benches
already needed those).
* Wire protocol unchanged — DAEDALUS_PROTO_VERSION stays at 0.
No daedalus-v4l2-dkms bump needed.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 16:30:00 +0000
daedalus-v4l2 (0.1.0+r33+g5d8b436-1) bookworm trixie; urgency=medium
* Bump to 5d8b436 — reverts daedalus-v4l2 PRs #7 + #8 (the parking
design that broke libva-v4l2-request-fourier's 1:1 CAPTURE
contract; see daedalus-v4l2#9 + #10). After daemon-r28+g79256dc
landed, mpv (--hwdec=vaapi-copy) failed before playback with
"Unable to dequeue buffer: Resource temporarily unavailable" /
"Failed to end picture decode" because the daemon parked CAPTURE
buffers waiting for libavcodec to release H.264 B-frames in
display order — violating the V4L2 stateless 1:1 contract.
Firefox tolerated the mess (visible "2 1 4 3" pair-swap); mpv
bailed.
* This bump restores f0d4186-equivalent behaviour, plus PR #4
(cosmetic H.264 DECODE_MODE / START_CODE menu controls). PR #7
+ PR #8 wire-protocol additions (src_pts / output_src_pts /
RESP_FRAME flags) are reverted — DAEDALUS_PROTO_VERSION drops
back from 1 → 0. Lock-step install with daedalus-v4l2-dkms
0.1.0+r33+g5d8b436 REQUIRED.
* Visible regression: H.264 B-frame streams in Firefox revert to
the original "2 1 4 3 6 5" pair-swap visual. The proper fix
(concurrent in-flight requests in daemon + display-order reorder
in libva-v4l2-request-fourier) is tracked at daedalus-v4l2#11.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 14:50:00 +0000
daedalus-v4l2 (0.1.0+r28+g79256dc-1) bookworm trixie; urgency=medium
* Bump to 79256dc — H.264 B-frame display reorder fix (closes
daedalus-v4l2#6 + #4 menu controls). Daemon side: the
avcodec_send_packet → receive_frame loop now stamps pkt->pts =
req->src_pts so libavcodec's display-ordered frame->pts identifies
which OUTPUT bitstream's pixels each drained frame belongs to.
chardev_client maintains a (src_pts → cookie) lookup table so the
daemon can ship pixels to the cookie of the *originating*
bitstream, not the cookie of whatever REQ triggered the
receive_frame call. Multiple RESP_FRAME messages per REQ_DECODE
are now possible (one for the just-consumed src, one or more for
drained pixels).
* Wire-protocol bump (DAEDALUS_PROTO_VERSION 0 → 1): REQ_DECODE
gains __u64 src_pts; RESP_FRAME gains __u32 flags +
__u64 output_src_pts. Daemon + kernel must install atomically
(this package + daedalus-v4l2-dkms 0.1.0+r28+g79256dc).
* Also subsumes 79256dc's predecessor 7ff2d89 — H.264 DECODE_MODE +
START_CODE menu-control registration that retires the
"Unable to set control(s) error_idx=2/2" warning libva-v4l2-
request emitted on every context init.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 12:00:00 +0000
daedalus-v4l2 (0.1.0+r24+gf0d4186-1) bookworm trixie; urgency=medium
* Bump to f0d4186 — kernel per-ctx vb2 lock fix. daedalus_queue_init
@@ -0,0 +1,137 @@
From f760c0541586f43334c02611fcb4c212c08ad576 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Thu, 21 May 2026 21:40:22 +0200
Subject: [PATCH] avcodec/aarch64/h264dsp: route H.264 4x4 IDCT through
daedalus-fourier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
H264DSPContext.idct_add (called per 4x4 block from the intra-4x4
decode path in h264_mb.c) now dispatches through
daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4x4)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution with one extra dispatch call and recipe-table lookup.
Provides the first end-to-end exercise of the daedalus-fourier
kernel pack inside the libavcodec.so decode hot path; follow-up
patches wire IDCT 8x8, luma-v deblock, and qpel mc20.
The library context is process-global, lazily initialised under
pthread_once on first call. We pick the no-QPU constructor because
libavcodec.so is loaded into arbitrary host processes
(firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
cannot assume the host has a usable Vulkan instance. Higher cycles
(deblock luma-v, MC) that benefit from the QPU will provision their
own recipe-selected context once that path is wired.
Bulk paths (idct_add16, idct_add16intra, idct_add8 — used for
non-intra4x4 macroblocks) remain on the stock NEON .S implementations
and will be batched through daedalus_recipe_dispatch_h264_idct4 with
n_blocks>1 in a follow-up.
Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6
green; see marfrit/daedalus-fourier/CYCLE_LOGS.md).
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2.
---
libavcodec/aarch64/Makefile | 3 +-
libavcodec/aarch64/h264_idct_daedalus.c | 49 +++++++++++++++++++++++
libavcodec/aarch64/h264dsp_init_aarch64.c | 3 +-
3 files changed, 53 insertions(+), 2 deletions(-)
create mode 100644 libavcodec/aarch64/h264_idct_daedalus.c
diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 41ab025..7b95fb1 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -3,7 +3,8 @@ OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_init_aarch64.o
OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_init_aarch64.o
OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o
OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o
-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o
+OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o \
+ aarch64/h264_idct_daedalus.o
OBJS-$(CONFIG_HUFFYUVDSP) += aarch64/huffyuvdsp_init_aarch64.o
OBJS-$(CONFIG_H264PRED) += aarch64/h264pred_init.o
OBJS-$(CONFIG_H264QPEL) += aarch64/h264qpel_init_aarch64.o
diff --git a/libavcodec/aarch64/h264_idct_daedalus.c b/libavcodec/aarch64/h264_idct_daedalus.c
new file mode 100644
index 0000000..538d223
--- /dev/null
+++ b/libavcodec/aarch64/h264_idct_daedalus.c
@@ -0,0 +1,49 @@
+/*
+ * H.264 4x4 IDCT + add — daedalus-fourier substitution shim.
+ *
+ * Routes H264DSPContext.idct_add through
+ * daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
+ * The recipe layer picks the substrate (CPU NEON by default for
+ * cycle 6; future cycles may dispatch to V3D opportunistically).
+ *
+ * FFmpeg's 4x4 block memory layout matches daedalus's column-major
+ * convention: block[r + 4*c] = coefficient at (row r, col c). Both
+ * sides destructively zero the block after the transform.
+ *
+ * The library context is process-global and lazily initialised under
+ * pthread_once. We pick the no-QPU constructor here because
+ * libavcodec.so is loaded into arbitrary host processes
+ * (firefox-fourier, mpv-fourier, daedalus_v4l2_daemon, ...) and we
+ * cannot assume the host has a usable Vulkan instance. Higher cycles
+ * (deblock, MC) that benefit from the QPU initialise their own
+ * recipe-selected context once that path is wired.
+ */
+
+#include <pthread.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include <daedalus.h>
+
+#include "libavutil/attributes.h"
+#include "libavcodec/h264dsp.h"
+
+static daedalus_ctx *g_dctx;
+static pthread_once_t g_dctx_once = PTHREAD_ONCE_INIT;
+
+static void daedalus_ctx_init_once(void)
+{
+ g_dctx = daedalus_ctx_create_no_qpu();
+}
+
+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
+
+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride)
+{
+ static const daedalus_h264_block_meta meta = { .dst_off = 0 };
+
+ pthread_once(&g_dctx_once, daedalus_ctx_init_once);
+
+ daedalus_recipe_dispatch_h264_idct4(g_dctx, dst, (size_t)stride,
+ block, 1, &meta);
+}
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index c684574..b993df2 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -66,6 +66,7 @@ void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride
int weights, int offset);
void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct_add_daedalus(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset,
int16_t *block, int stride,
@@ -139,7 +140,7 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
c->biweight_pixels_tab[1] = ff_biweight_h264_pixels_8_neon;
c->biweight_pixels_tab[2] = ff_biweight_h264_pixels_4_neon;
- c->idct_add = ff_h264_idct_add_neon;
+ c->idct_add = ff_h264_idct_add_daedalus;
c->idct_dc_add = ff_h264_idct_dc_add_neon;
c->idct_add16 = ff_h264_idct_add16_neon;
c->idct_add16intra = ff_h264_idct_add16intra_neon;
--
2.47.3
+41 -1
@@ -33,7 +33,15 @@ FFMPEG_VERSION=8.1
# epoch 2 matches Debian's stock ffmpeg (currently 7:7.1.x in trixie);
# +rfourier suffix to avoid colliding with upstream/Debian rebuilds.
PKGVER=2:${FFMPEG_VERSION}+rfourier+gb57fbbe
PKGREL=2 # pkgrel=2: Path A move to /opt/fourier prefix (2026-05-19)
PKGREL=5 # pkgrel=5: H.264 IDCT 4x4 daedalus-fourier substitution; skip past
# an orphan -4 .deb sitting in the apt pool that made
# check-already-published.sh's `pool_ver ge source_full` short-
# circuit the previous -3 build (PR #76). (2026-05-21)
# daedalus-fourier pin — first kernel substitution in libavcodec (cycle 6
# H.264 IDCT 4x4). Same SHA as the daedalus-v4l2 daemon already ships
# inline; rev in lockstep with the daemon when the public API rolls.
DAEDALUS_FOURIER_COMMIT=d87239d8172307d9a1b93c95cbed116d175b85cc
HERE=$(dirname "$(readlink -f "$0")")
@@ -57,6 +65,34 @@ fi
# Apply patches (same as Arch).
patch -Np1 -i "$HERE/0001-libudev-bypass-fallback.patch"
patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
patch -Np1 -i "$HERE/0003-h264-idct4-daedalus-fourier.patch"
# --- daedalus-fourier: fetch + build static .a with PIC, install to a
# per-build prefix; libavcodec.so links it into the shared object so
# H264DSPContext.idct_add (and follow-up kernels) dispatch through the
# daedalus recipe layer instead of the in-tree NEON .S code. ---
#
# PIC is mandatory — the static .a is linked into a .so, so all object
# code must be relocatable. Vulkan is PUBLIC-linked by daedalus_core
# (queryable QPU substrate); we add libvulkan1 to Debian Depends below
# so dlopen of libavcodec.so.62 succeeds on stock trixie.
FOURIER_PREFIX=$work/fourier-prefix
mkdir -p "$FOURIER_PREFIX"
pushd "$work" >/dev/null
curl --connect-timeout 10 --max-time 600 --retry 3 --retry-delay 5 -sSLfo daedalus-fourier.tar.gz \
"https://git.reauktion.de/marfrit/daedalus-fourier/archive/${DAEDALUS_FOURIER_COMMIT}.tar.gz"
tar xzf daedalus-fourier.tar.gz
pushd daedalus-fourier >/dev/null
cmake -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_INSTALL_PREFIX="$FOURIER_PREFIX"
cmake --build build --target daedalus_core
cmake --install build
popd >/dev/null
popd >/dev/null
cd "$work/FFmpeg"
# Configure with Arch-parity flags. Drops the same set of features
# (X11, AMF, CUDA, FireWire, AviSynth, Bluray, OpenMPT, JPEG-XL,
@@ -73,6 +109,9 @@ patch -Np1 -i "$HERE/0002-nv15-to-p010-unpack.patch"
--mandir=/opt/fourier/share/man \
--extra-ldexeflags='-Wl,-rpath,/opt/fourier/lib' \
--extra-ldsoflags='-Wl,-rpath,/opt/fourier/lib' \
--extra-cflags="-I${FOURIER_PREFIX}/include" \
--extra-ldflags="-L${FOURIER_PREFIX}/lib" \
--extra-libs="-ldaedalus_core -lvulkan -lpthread" \
--disable-debug \
--disable-static \
--disable-doc \
@@ -147,6 +186,7 @@ Priority: optional
Architecture: arm64
Depends: libc6,
libdrm2,
libvulkan1,
libfontconfig1,
libfreetype6,
libfribidi0,
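Both build scripts fetch daedalus-fourier from an archive URL built around DAEDALUS_FOURIER_COMMIT. A mistyped or abbreviated pin would only surface as a curl failure, so a guard in front of the download may be worth having. The sketch below is an assumption about how that could look; `archive_url` is a hypothetical helper, and the URL shape is copied from the curl lines in this diff.

```shell
#!/bin/sh
# Hypothetical helper: build the archive URL for a pinned commit, refusing
# anything that is not a full 40-character lowercase hex SHA.
archive_url() {
    repo=$1      # e.g. marfrit/daedalus-fourier
    commit=$2
    case $commit in
        ''|*[!0-9a-f]*)
            echo "bad commit pin: $commit" >&2
            return 1
            ;;
    esac
    if [ "${#commit}" -ne 40 ]; then
        echo "commit pin must be 40 hex characters: $commit" >&2
        return 1
    fi
    echo "https://git.reauktion.de/${repo}/archive/${commit}.tar.gz"
}
```

The result could be passed to the existing curl invocation in place of the inline URL.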
+45
@@ -1,3 +1,48 @@
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-5) bookworm trixie; urgency=medium
* pkgrel-only bump (3 → 5) to force a rebuild of the H.264 IDCT 4x4
daedalus-fourier substitution that landed in marfrit-packages PR
#76. An orphan -4 .deb already sat in the apt pool (dated
2026-05-19, no matching source commit in main); CI's
check-already-published.sh compares with `dpkg --compare-versions
pool_ver ge source_full`, which short-circuited PR #76's -3
build. Skipping past -4 lets the CI workflow actually publish the
substitution.
* No source code change beyond PKGREL and this changelog entry.
Substitution + control + build-deb.sh wiring stay as PR #76 left
them.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 21:30:00 +0000
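The short-circuit described in the -5 entry can be reproduced with a simplified stand-in for the skip check. The real script compares full version strings with `dpkg --compare-versions`; this sketch swaps that for a plain integer comparison of the Debian revision (the text after the last `-`), which is only valid when epoch and upstream part are identical, as they are here. `skip_check` is a hypothetical name.

```shell
#!/bin/sh
# Simplified stand-in for the pool-head "ge" comparison. Compares only the
# Debian revision and assumes it is a plain integer; the real script uses
# dpkg --compare-versions on the full version strings.
skip_check() {
    pool_ver=$1
    source_full=$2
    pool_rel=${pool_ver##*-}
    source_rel=${source_full##*-}
    if [ "$pool_rel" -ge "$source_rel" ]; then
        echo "skip=1"
    else
        echo "skip=0"
    fi
}
```

With the orphan in the pool, `skip_check 2:8.1+rfourier+gb57fbbe-4 2:8.1+rfourier+gb57fbbe-3` prints `skip=1`, matching the skipped -3 build; bumping the source to -5 flips the result to `skip=0`.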
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-3) bookworm trixie; urgency=medium
* Add 0003-h264-idct4-daedalus-fourier.patch — H264DSPContext.idct_add
(per-block 4x4 IDCT, called from the intra-4x4 decode path in
libavcodec/h264_mb.c) now dispatches through
daedalus_recipe_dispatch_h264_idct4 instead of
ff_h264_idct_add_neon. First end-to-end exercise of the
daedalus-fourier kernel pack inside libavcodec.so on the
production decode hot path (daedalus-v4l2#11 step 2 — cycle 6
H.264 IDCT 4x4, NEON-by-recipe).
* build-deb.sh: fetches + builds daedalus-fourier (pinned at
d87239d, lockstep with the daemon's static link) with
-fPIC into a per-build temp prefix, then passes
--extra-cflags=-I.../include --extra-ldflags=-L.../lib
--extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg
configure. Static-linked into libavcodec.so.62.
* Bulk paths (idct_add16 / idct_add16intra / idct_add8) remain on
the stock NEON .S code and will be batched through
daedalus_recipe_dispatch_h264_idct4 with n_blocks>1 in a
follow-up. Cycles 7/8/9 (IDCT 8x8 / luma-v deblock / qpel mc20)
land in subsequent patches.
* Depends gains libvulkan1 — daedalus_core PUBLIC-links Vulkan
(queryable QPU substrate); the no-QPU constructor still works,
but the loader refuses libavcodec.so.62 at dlopen time without
libvulkan.so.1 present.
* No ABI change; SONAMEs stay 62/62/60.
-- Markus Fritsche <mfritsche@reauktion.de> Thu, 21 May 2026 20:00:00 +0000
ffmpeg-v4l2-request-fourier (2:8.1+rfourier+gb57fbbe-1) bookworm trixie; urgency=medium
* Initial Debian packaging for the Kwiboo FFmpeg fork with V4L2
+18 -15
@@ -10,22 +10,25 @@
# Upstream fork: https://git.reauktion.de/marfrit/libva-v4l2-request-fourier
set -euo pipefail
# Same pin as the Arch PKGBUILD. 77f9236 = PR #12 merge "av1:
# populate V4L2_CID_STATELESS_AV1_SEQUENCE in codec_set_controls
# (#11 libva side)" — adds src/av1.{c,h} with av1_set_controls that
# maps VAAPI's VAPictureParameterBufferAV1.seq_info_fields onto
# struct v4l2_ctrl_av1_sequence and queues V4L2_CID_STATELESS_AV1_
# SEQUENCE via S_EXT_CTRLS. The daedalus_v4l2 daemon track is the
# consumer that turns the ctrl into an OBU_SEQUENCE_HEADER and
# prepends it to the slice bitstream so libdav1d can parse the
# (sequence-header-stripped) OUTPUT buffer that ffmpeg-vaapi
# delivers.
# Same pin as the Arch PKGBUILD. c454618 = PR #16 merge "picture,
# request_pool: transparent OUTPUT-pool resize on bitstream overrun
# (#15)" — follow-up root-cause fix to #13/#14. On a mid-stream
# bitstream-budget overrun (typical cause: SPS-driven resolution
# upshift in an adaptive-bitrate stream), codec_store_buffer now
# snapshots the in-flight surface's accumulated bytes, releases its
# OUTPUT pool slot, calls request_pool_resize (STREAMOFF →
# REQBUFS(0) → S_FMT with 2×sizeimage hint, capped at 1 GiB, page-
# aligned → CREATE_BUFS → mmap → media_request_alloc → STREAMON),
# re-acquires a slot, re-mirrors the surface's source_{data,size,
# request_fd}, restores the bytes, and continues. The frame
# survives instead of being dropped back to libavcodec for surface
# recreation. CAPTURE side untouched (per-queue V4L2 streaming
# independence).
#
# Prior pin (c1bb444) = PR #9 merge — h264_set_controls
# max_num_ref_frames fallback + libva-boundary instrumentation for
# the daedalus consumer-strict path (issue #8 libva side).
UPSTREAM_COMMIT=77f92364661419f6e5a7bd827c1b845b4e426569
PKGVER=1.0.0+r386+g77f9236
# Prior pin (2860d75) = PR #14 merge — codec_store_buffer bounds-
# check floor (#13).
UPSTREAM_COMMIT=c454618ae11addce2e17b560f4deeacbed067d98
PKGVER=1.0.0+r390+gc454618
PKGREL=1
HERE=$(dirname "$(readlink -f "$0")")