The 0016 patch declared ff_h264_set_mb_inspect_cb in h264dec.h and
defined it in h264_mb.c, but didn't touch libavcodec/libavcodec.v.
FFmpeg's default version script exports only `av_*`, `avcodec_*`,
`avpriv_*`, and `avsubtitle_free`; everything else is hidden as LOCAL
behind a `*` glob. Result: `nm -D libavcodec.so.62 | grep
ff_h264_set_mb_inspect_cb` returned nothing → dlsym() returned NULL.
The statically linked CLI consumer (daedalus_decode_h264) was unaffected
because the version script only filters the shared object's dynamic
symbol table, which static linking never consults. The
daedalus-v4l2 daemon shadow_decoder path (PR-Q3a.1) dlopens
libavcodec.so.62 and resolves the callback via dlsym — that needs
the symbol exported.
Fix: add ff_h264_set_mb_inspect_cb to the global list in
libavcodec/libavcodec.v. Single-line addition to the 0016 patch.
Mirrored across the arch/ + debian/ patch trees.
PKGREL bump 14 → 15, changelog entry added (debian side). PKGBUILD
pkgrel bumped on arch side too. No behaviour change to the decode
path: the callback is still opt-in via the H264Context function
pointer; only consumers that have explicitly installed a callback
pay the one-load-one-branch cost per MB.
dejavu-check: this is fixing the existing 0016 observation-hook to
actually work as a dlsym intercept (the architectural shape the
patch was designed for). NOT adding new per-kernel substitution.
Same shape, same patch number, same intent. Just hiding/exporting
plumbing.
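
A minimal sketch of the consumer-side lookup that this fix unblocks, assuming a Linux host with glibc. `resolve_optional_symbol` is a hypothetical helper written for illustration and is not taken from the daemon; only `dlopen` and `dlsym` are real APIs here. A symbol that the version script leaves local resolves to NULL through this path even though a static link can still see it.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open `lib` (NULL means the running program plus the
 * shared objects it already loaded) and look `sym` up in the dynamic symbol
 * table. Returns NULL when the library cannot be opened or the symbol is not
 * exported. The handle is intentionally left open so that the returned
 * address stays valid for the lifetime of the process. */
static void *resolve_optional_symbol(const char *lib, const char *sym)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (!handle)
        return NULL;
    return dlsym(handle, sym);
}
```

Under these assumptions a caller would pass "libavcodec.so.62" and "ff_h264_set_mb_inspect_cb", and treat a NULL result as "hook unavailable" rather than calling through it.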
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to 0016 (PR #106). Adds a coefficient side buffer in
H264Context, populated at the start of ff_h264_hl_decode_mb with a
single memcpy from sl->mb BEFORE IDCT-add zeros it. The existing
post-pixel-work callback (still in 0016) can now read:
- h->mb_inspect_coeffs = pre-IDCT coefficients (this patch)
- h->cur_pic.f->data = post-pixel-work pre-deblock reconstruction
and derive P = pixels − IDCT(C) for daedalus-decoder's frame-major
dispatch in PR-A3+.
Memcpy gated on (h->mb_inspect_cb != NULL). Zero cost when no
consumer is registered. Side buffer = 16 * 48 int16 = 1536 bytes
(matches the 8-bit half of sl->mb's int16_t[16 * 48 * 2] declared
size; high-bit-depth uses the upper half — not preserved here since
the daedalus-decoder consumer is 8-bit-only).
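
The gating and the buffer size can be sketched as follows. `MockH264`, `MockSlice`, `mock_noop_cb`, and `snapshot_mb_coeffs` are hypothetical stand-ins for illustration; the real fields live in H264Context and H264SliceContext, and the patch itself may be structured differently.

```c
#include <stdint.h>
#include <string.h>

#define MB_COEFFS_8BIT (16 * 48)   /* 768 int16_t values = 1536 bytes */

/* Mock of the two H264Context members this commit message describes. */
typedef struct MockH264 {
    void (*mb_inspect_cb)(void *opaque);        /* NULL: no consumer registered */
    int16_t mb_inspect_coeffs[MB_COEFFS_8BIT];  /* side buffer */
} MockH264;

/* Mock of sl->mb: the lower half holds the 8-bit coefficients. */
typedef struct MockSlice {
    int16_t mb[16 * 48 * 2];
} MockSlice;

/* Trivial consumer, used only to flip the gate on. */
static void mock_noop_cb(void *opaque) { (void)opaque; }

/* Copy the 8-bit half of the coefficient buffer before IDCT-add clears it.
 * Returns the number of bytes copied, which is 0 when no hook is installed. */
static size_t snapshot_mb_coeffs(MockH264 *h, const MockSlice *sl)
{
    if (!h->mb_inspect_cb)
        return 0;
    memcpy(h->mb_inspect_coeffs, sl->mb, sizeof(h->mb_inspect_coeffs));
    return sizeof(h->mb_inspect_coeffs);
}
```

The upper half of the mock `mb` array is never read, which mirrors the stated 8-bit-only scope.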
Single-threaded decode assumed at the consumer side
(avctx->thread_count = 1). Multi-slice / multi-threaded streams
would race on the single side buffer. This is an explicit limitation
of the inspection mechanism; a future extension would put per-slice
buffers in H264SliceContext.
Verified: patches 0016 + 0017 apply cleanly and build in sequence
against the Kwiboo v4l2-request-n8.1 fork at the pinned commit
b57fbbe5. ff_h264_set_mb_inspect_cb symbol exported as before.
Wired into arch PKGBUILD + debian build-deb.sh patch sequence.
pkgrel bumped 13 → 14.
Refs reauktion/daedalus-decoder!14 (PR-A2 callback wiring complete,
PR-A3 coefficient extraction is the next consumer).
Adds 0016-h264-mb-inspect-callback.patch to the FFmpeg fork. Adds an
opt-in callback fired by ff_h264_hl_decode_mb after the existing
pixel work, for tools that need per-MB visibility into H.264 decode.
API:
    typedef void (*ff_h264_mb_inspect_cb)(void *opaque,
                                          const struct H264Context *h,
                                          int mb_x, int mb_y);
    void ff_h264_set_mb_inspect_cb(AVCodecContext *avctx,
                                   ff_h264_mb_inspect_cb cb, void *opaque);
Two new fields appended to H264Context (internal struct, declared in
h264dec.h not h264.h, no ABI surface to non-libavcodec callers).
Callback fires post-pixel-work for every MB in coded order; receives
const H264Context* so it can inspect any state (slice ctx via
h->slice_ctx, reconstructed pixels via h->cur_pic.f->data[plane],
etc.).
Default (cb==NULL): zero behaviour change, one load + one branch per
MB in the decoder hot path.
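
A self-contained model of the hook, assuming a mock context in place of FFmpeg's internal H264Context. Every `mock_*` name, `MbStats`, and `count_mb` is hypothetical; the raster loop is a simplification of coded order that holds only for streams without FMO or MBAFF.

```c
#include <stddef.h>

struct MockH264Context;   /* stand-in for the internal H264Context */

typedef void (*mock_mb_inspect_cb)(void *opaque,
                                   const struct MockH264Context *h,
                                   int mb_x, int mb_y);

struct MockH264Context {
    int mb_width, mb_height;
    mock_mb_inspect_cb mb_inspect_cb;   /* NULL by default: hook disabled */
    void *mb_inspect_opaque;
};

static void mock_set_mb_inspect_cb(struct MockH264Context *h,
                                   mock_mb_inspect_cb cb, void *opaque)
{
    h->mb_inspect_cb = cb;
    h->mb_inspect_opaque = opaque;
}

/* Tail of the per-MB decode: one load and one branch when no hook is set. */
static void mock_decode_mb(struct MockH264Context *h, int mb_x, int mb_y)
{
    /* ... pixel work would run here ... */
    if (h->mb_inspect_cb)
        h->mb_inspect_cb(h->mb_inspect_opaque, h, mb_x, mb_y);
}

static void mock_decode_frame(struct MockH264Context *h)
{
    for (int y = 0; y < h->mb_height; y++)
        for (int x = 0; x < h->mb_width; x++)
            mock_decode_mb(h, x, y);
}

/* Example consumer: counts macroblocks and remembers the last position. */
typedef struct MbStats { int count, last_x, last_y; } MbStats;

static void count_mb(void *opaque, const struct MockH264Context *h,
                     int mb_x, int mb_y)
{
    MbStats *s = opaque;
    (void)h;
    s->count++;
    s->last_x = mb_x;
    s->last_y = mb_y;
}
```

With a 120 x 68 macroblock grid (1920x1088 luma) the consumer sees 8160 calls per frame, and none at all until a callback is installed.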
Shape distinction: per-MB observation, NOT per-kernel function-pointer
hijack (the 0003-0014 substitution-arc pattern, which PR #105 reverted
and which the measurement correction in daedalus-fourier PR #37
retired architecturally). Per-block synchronous Vulkan dispatch from
libavcodec is non-competitive; per-MB CPU-side observation feeding a
per-frame daedalus-decoder batch submit is the right shape (frame-major
UMA dispatch verdict, memory: dejavu).
Used by:
- daedalus-decoder/tools/daedalus_decode_h264 (PR-A1b, follow-up)
- future daedalus-v4l2 daemon refactor
Wired into arch PKGBUILD source[] + prepare() and debian build-deb.sh
patch sequence. pkgrel bumped 12 → 13.
Refs reauktion/daedalus-decoder!12.
Reverts the no_qpu → qpu-capable ctx flip that landed via patch 0014
(marfrit-packages PR #104).
PR #104 was justified by daedalus-fourier PR #36's "QPU 4.30x faster
than CPU NEON" bench result. That number was a measurement artifact:
v3d_runner.read_spv() did a bare cwd-relative fopen() with no path
search, so when the bench was run from the source dir (as in PR #36),
the SPVs at $builddir/v3d_*.spv were not found, every QPU dispatch
returned -1 fast, and the bench loop timed the failure path.
daedalus-fourier PR #37 fixes the SPV search + bench preflight.
Corrected numbers on hertz (Pi 5 V3D 7.1):
  kernel            CPU ns/op   QPU ns/op   winner
  IDCT 4x4 luma         10.75      217.63   CPU 20.24x
  IDCT 8x8 luma         29.69      785.94   CPU 26.47x
  Deblock luma_v        17.63      467.42   CPU 26.51x
  Deblock luma_h        38.30      498.53   CPU 13.02x
  qpel mc20 (8x8)       30.17     1300.44   CPU 43.10x
  qpel mc02 (8x8)       17.69     1363.40   CPU 77.08x
  qpel mc22 (8x8)       71.60     1948.37   CPU 27.21x
1080p sum: CPU 5.57 ms vs QPU 123.54 ms — QPU 22x SLOWER.
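
The class of artifact described above (timing a dispatch that fails fast) can be guarded against with a preflight. The following is only a sketch of that idea, not the code from daedalus-fourier PR #37; `dispatch_fn`, `bench_ns_per_op`, `mock_ok`, and `mock_fail` are hypothetical names, and `clock()` measures CPU time, which is adequate for illustration only.

```c
#include <time.h>

/* Hypothetical dispatch signature: 0 on success, negative on failure. */
typedef int (*dispatch_fn)(void *ctx);

/* Run one untimed dispatch first and refuse to benchmark a failing path.
 * Returns mean ns/op, or a negative value if the preflight or any timed
 * iteration reports failure. */
static double bench_ns_per_op(dispatch_fn fn, void *ctx, int iters)
{
    if (iters <= 0 || fn(ctx) != 0)
        return -1.0;

    clock_t start = clock();
    for (int i = 0; i < iters; i++)
        if (fn(ctx) != 0)
            return -1.0;
    clock_t stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC * 1e9 / iters;
}

/* Mock dispatches: one succeeds, one fails the way a missing SPV would. */
static int mock_ok(void *ctx)   { (void)ctx; return 0; }
static int mock_fail(void *ctx) { (void)ctx; return -1; }
```

A failing dispatch then yields an error instead of a very small ns/op figure.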
Until daedalus QPU dispatch overhead is actually competitive (separate
multi-task effort tracked on the daedalus-fourier side), libavcodec.so
substitution must stay on daedalus_ctx_create_no_qpu() so the host
processes (firefox-fourier RDD, mpv-fourier, daedalus_v4l2_daemon)
don't pessimize their H.264 decode path.
Adds 0015-h264-ctx-revert-to-no-qpu.patch (2-line revert of patch 0014)
to both arch PKGBUILD and debian build-deb.sh. Both pkgrel bumped
11 → 12. Refs reauktion/daedalus-fourier!37.
(Renumbered from 0013 — PR #102 landed 0013-h264-deblock-chroma-intra
while this PR was open, so the next free slot is 0014.)
Patches 0003 (IDCT 4x4) and 0007 (qpel mc20) created the libavcodec.so
process-global daedalus_ctx via daedalus_ctx_create_no_qpu(). Rationale
at the time: cycle 6/9 had only CPU NEON paths, so a QPU-capable ctx
would have meant pointless Vulkan init in every host process.
Two things changed since:
1. Every H.264 hot-path primitive now has a V3D7 compute shader.
IDCT 4x4/8x8 + 8 deblock variants (luma+chroma × V+H × inter+intra)
+ 30 qpel positions. See daedalus-fourier PRs #28-#35.
2. Dispatch overhead has been hammered down — buffer pool in
v3d_runner + persistent command buffer. daedalus-fourier PR #36
bench on hertz (Pi 5 V3D 7.1, 30 iters x 5 warmup):
   1080p worst-case sum (IDCT4 + deblock luma + qpel mc22):
     CPU NEON only: 5.57 ms
     QPU only:      1.30 ms  (CPU/QPU sum ratio = 4.30x)
PR #10's CPU-4x-faster-than-QPU verdict (which justified the original
no_qpu ctx choice) is reversed by ~17x.
This commit adds 0014-h264-ctx-qpu-capable.patch which flips both H.264
TUs (h264_idct_daedalus.c, h264_qpel_daedalus.c) from
daedalus_ctx_create_no_qpu() to daedalus_ctx_create().
daedalus_ctx_create() probes for a usable Vulkan device and falls back
to no_qpu mode if unavailable, so this is safe on hosts without V3D
(x86 build runners, Debian aarch64 builders without renderD, etc.).
Hosts WITH V3D (Pi 5 deployment targets) now route the H.264 hot-path
through V3D compute instead of CPU NEON.
Wired into both arch PKGBUILD (source[] + prepare()) and debian
build-deb.sh; both pkgrel bumped 10 → 11.
Refs reauktion/daedalus-fourier!36.
Adds 0008-panvk-bifrost-bump-max-image-dim-3d-for-dawn.patch. Two-hunk patch:
Hunk 1: Bumps maxImageDimension3D from 512 to 2048 on Bifrost (PAN_ARCH 7..10).
Surfaced by panvk-bifrost-perf-measurement iter1 spike: Brave's WebGPU/Dawn
detects panvk-bifrost as a Vulkan adapter on Mali-G52 r1 MC1 but rejects it
because the advertised limit is below WebGPU's 2048 minimum (per
third_party/dawn/src/dawn/native/vulkan/PhysicalDeviceVk.cpp:746). This is
the actual unblock for the campaign's stated motivator — Chromium GPU
process Vulkan boot on PineTab2 / Bifrost SBCs.
Per Vulkan 1.3 spec §43.1, maxImageDimensionXD is the upper bound on any
creatable image; per-format limits MAY be smaller. On PAN_ARCH<=10 the
per-format limit caps at ~1023 per axis for RGBA8 within the 4 GB
max_img_size_B address constraint. Apps trying 2048^3 with thick formats
hit the per-format limit at image-create — per-spec behavior.
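
The ~1023 figure can be reproduced with a short calculation, assuming a cubic image, 4 bytes per RGBA8 texel, and a total size that must stay strictly below 4 GiB. The strict comparison and the helper name `max_cube_extent` are assumptions made for this sketch, not values read from panvk.

```c
#include <stdint.h>

/* Largest edge n for which an n*n*n image with `bytes_per_texel` bytes per
 * texel stays strictly below `max_bytes`. Intended for budgets in the
 * range discussed here; very large budgets are not overflow-checked. */
static uint32_t max_cube_extent(uint64_t max_bytes, uint32_t bytes_per_texel)
{
    uint32_t n = 0;

    if (bytes_per_texel == 0)
        return 0;
    while ((uint64_t)(n + 1) * (n + 1) * (n + 1) * bytes_per_texel < max_bytes)
        n++;
    return n;
}
```

For a 4 GiB budget this gives 1023 for a 4-byte format and 645 for a 16-byte format, both below the advertised 2048, which is the situation the per-format query is meant to report.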
Hunk 2: Removes three asserts in get_max_3d_image_size() that encoded the
wrong invariant (per-format >= basic), opposite of what the Vulkan spec
mandates. The asserts were release-mode-masked via NDEBUG, but debug
builds would abort the first time Dawn (or any client) called
vkGetPhysicalDeviceImageFormatProperties on a 3D image format. Surfaced
by Phase 5 2nd-model review.
Verified on PineTab2 (Mali-G52 r1 MC1, PAN_ARCH 7):
- vulkaninfo: maxImageDimension3D = 2048
- Brave/Dawn: "Insufficient Vulkan limits" warning eliminated; adapter
accepted for WebGPU.
- CTS regression: dEQP-VK.api.copy_and_blit.core.image_to_image.3d_images.*
6/6 Pass (unchanged from r7 baseline).
Phase 5 (2nd-model) review: APPROVE WITH CHANGES — both changes applied
(release-mode + debug-mode assert exposure addressed by removing the
wrong-invariant asserts).
Note on numbering: r8 was attempted (KHR_depth_clamp_zero_one trim) but
abandoned mid-Phase-3 when it surfaced that 5 more post-1.3.10 KHR
extensions are advertised — surgically false-gating all of them would
risk undoing r1's KHR_robustness2 work for Chromium Dawn. Documented at
~/src/panvk-bifrost/iter21/phase0to3_close_no_ship.md.
Cross-refs:
- ~/src/panvk-bifrost/iter22/phase0to2_max3d_close.md (Phase 0-2 close)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Substitutes c->v_loop_filter_chroma_intra and c->h_loop_filter_chroma_intra
with daedalus wrappers in the bit_depth=8 / chroma_format_idc<=1 (4:2:0)
branch. 4:2:2 stays on the in-tree NEON path (the daedalus chroma intra
dispatch is 4:2:0-only).
The fourier dispatches were exposed in PR #11 (DEFINE_INTRA_DISPATCH
macro generates the public daedalus_dispatch_h264_deblock_chroma_*_intra
symbols + recipe wrappers).
Re-architects the chroma init: v_loop_filter_chroma_intra was previously
assigned unconditionally to the NEON variant (which works for both 4:2:0
and 4:2:2). We now assign it INSIDE both branches of the chroma_format_idc
conditional — 4:2:0 picks daedalus, 4:2:2 keeps NEON. No regression for
4:2:2 streams.
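
A sketch of the re-architected init, using a mock DSP struct. The `mock_*` functions are empty stand-ins for the daedalus wrappers and the in-tree NEON routines; the names and the exact branch layout are assumptions, and only the filter signature follows FFmpeg's intra loop-filter prototype.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*chroma_intra_fn)(uint8_t *pix, ptrdiff_t stride,
                                int alpha, int beta);

typedef struct MockH264DSP {
    chroma_intra_fn v_loop_filter_chroma_intra;
    chroma_intra_fn h_loop_filter_chroma_intra;
} MockH264DSP;

/* Empty stand-ins so the selection logic can be exercised in isolation. */
#define MOCK_FILTER(name)                                          \
    static void name(uint8_t *pix, ptrdiff_t stride,               \
                     int alpha, int beta)                          \
    { (void)pix; (void)stride; (void)alpha; (void)beta; }

MOCK_FILTER(mock_v_chroma_intra_daedalus)
MOCK_FILTER(mock_h_chroma_intra_daedalus)
MOCK_FILTER(mock_v_chroma_intra_neon)
MOCK_FILTER(mock_h_chroma422_intra_neon)

/* Both pointers are assigned inside each branch, so 4:2:2 never picks up
 * the 4:2:0-only daedalus dispatch. */
static void mock_init_chroma_intra(MockH264DSP *c, int bit_depth,
                                   int chroma_format_idc)
{
    if (bit_depth != 8)
        return;
    if (chroma_format_idc <= 1) {
        c->v_loop_filter_chroma_intra = mock_v_chroma_intra_daedalus;
        c->h_loop_filter_chroma_intra = mock_h_chroma_intra_daedalus;
    } else {
        c->v_loop_filter_chroma_intra = mock_v_chroma_intra_neon;
        c->h_loop_filter_chroma_intra = mock_h_chroma422_intra_neon;
    }
}
```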
Same NEON-to-NEON via recipe shape as 0010 luma intra.
Closes the deblock substitution layer for the 4:2:0 / 8-bit hot path:
- 0005 luma_v non-intra ✓
- 0008 luma_h non-intra ✓
- 0009 chroma_v / chroma_h non-intra ✓
- 0010 luma_v / luma_h intra ✓
- 0013 chroma_v / chroma_h intra ✓
All 8 deblock variants for the common 4:2:0 path now route through
daedalus. 4:2:2 chroma + the chroma422 mbaff variants stay on in-tree
NEON.
Verified the patch applies cleanly on top of 0001-0012 against the
pinned upstream commit b57fbbe5 on hertz.
Closes the H.264 qpel substitution. Extends 0007 (which routed only
mc20 put_) to ALL 15 useful positions in BOTH the put_ and avg_
tables, skipping mc00 (integer copy / pointer-only fast path).
29 substitutions total: 14 new put_ + 15 avg_. Each wraps a single
daedalus_recipe_dispatch_h264_qpel_{avg_,}mcXY call (the dispatches
landed in daedalus-fourier PRs #15-#20). Collapsed via a single
DEFINE_QPEL_WRAPPER macro on the libavcodec shim side so the diff
is uniform.
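
One way such a macro can collapse the wrappers is sketched below. This is an illustration under assumptions, not the patch's macro: `mock_dispatch` replaces the per-position daedalus recipe dispatch, the wrapper names are invented, and the table index follows FFmpeg's convention that mcXY lands at index X + 4*Y (so mc20 is entry 2 and mc00 is entry 0). Unlike the patch, the sketch generates all 15 put_ wrappers, including mc20.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*qpel_mc_func)(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);

/* Hypothetical stand-in for a recipe dispatch: only counts calls. */
static int g_dispatch_calls;
static void mock_dispatch(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    (void)dst; (void)src; (void)stride;
    g_dispatch_calls++;
}

/* One wrapper per (op, position); op is put or avg. */
#define DEFINE_QPEL_WRAPPER(op, xy)                                        \
    static void mock_##op##_qpel8_mc##xy(uint8_t *dst, const uint8_t *src, \
                                         ptrdiff_t stride)                 \
    { mock_dispatch(dst, src, stride); }

/* The 15 fractional positions; mc00 is deliberately absent. */
#define QPEL_POSITIONS(X, op)                                      \
    X(op, 10) X(op, 20) X(op, 30)                                  \
    X(op, 01) X(op, 11) X(op, 21) X(op, 31)                        \
    X(op, 02) X(op, 12) X(op, 22) X(op, 32)                        \
    X(op, 03) X(op, 13) X(op, 23) X(op, 33)

QPEL_POSITIONS(DEFINE_QPEL_WRAPPER, put)
QPEL_POSITIONS(DEFINE_QPEL_WRAPPER, avg)

/* Table slot for mcXY is X + 4*Y, derived from the stringified digits. */
#define INSTALL_QPEL(op, xy) \
    tab[((#xy)[0] - '0') + 4 * ((#xy)[1] - '0')] = mock_##op##_qpel8_mc##xy;

static void install_put(qpel_mc_func tab[16]) { QPEL_POSITIONS(INSTALL_QPEL, put) }
static void install_avg(qpel_mc_func tab[16]) { QPEL_POSITIONS(INSTALL_QPEL, avg) }
```

Entry 0 of each table is left untouched, which matches the decision to skip mc00.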
All recipe-table entries route AUTO to CPU NEON — no QPU shaders
for any qpel position other than mc20 yet. Plumbing-only
NEON-to-NEON via the daedalus recipe layer; bit-exact against
the in-tree ff_*_h264_qpel8_*_neon path (each daedalus dispatch is
already bit-exact-gated by the corresponding fourier PR's test).
16x16 qpel tables ([0][...]) stay on the in-tree NEON. daedalus
only exposes 8x8 today; 16x16 substitution can land once fourier
provides those variants.
Verified the patch applies cleanly on top of 0001-0011 against the
pinned upstream commit b57fbbe5 on hertz.
Substitutes H264DSPContext.chroma_dc_dequant_idct in the 4:2:0 /
bit_depth=8 init path with a wrapper that composes the daedalus
chroma DC Hadamard primitive (daedalus-fourier PR #25) with the
qmul scaling FFmpeg's reference does in one fused function
(h264idct_template.c::ff_h264_chroma_dc_dequant_idct).
Algorithm per H.264 §8.5.11.1 / §8.5.11.2:
1. Extract 4 DCs from the scattered positions in the per-MB
coefficient buffer (stride=32, xStride=16)
2. 2x2 Hadamard transform (daedalus primitive)
3. qmul scale + >> 7, write back to original positions
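
The three steps can be sketched in plain C as follows. This is a reading of the description above, not the shipped wrapper: `hadamard_2x2` is a local stand-in for the daedalus primitive, `chroma_dc_dequant_idct_sketch` is a hypothetical name, and the right shift of negative values is assumed to be arithmetic, as it is on the compilers FFmpeg targets.

```c
#include <stdint.h>

enum { DC_STRIDE = 32, DC_XSTRIDE = 16 };

/* Unnormalised 2x2 Hadamard on [a b; c d], stored as dc[0..3]. */
static void hadamard_2x2(int dc[4])
{
    const int a = dc[0], b = dc[1], c = dc[2], d = dc[3];

    dc[0] = a + b + c + d;
    dc[1] = a - b + c - d;
    dc[2] = a + b - c - d;
    dc[3] = a - b - c + d;
}

static void chroma_dc_dequant_idct_sketch(int16_t *block, int qmul)
{
    /* Step 1: the four DCs sit at scattered offsets in the per-MB buffer. */
    static const int pos[4] = {
        0, DC_XSTRIDE, DC_STRIDE, DC_STRIDE + DC_XSTRIDE
    };
    int dc[4];

    for (int i = 0; i < 4; i++)
        dc[i] = block[pos[i]];

    /* Step 2: 2x2 Hadamard transform. */
    hadamard_2x2(dc);

    /* Step 3: scale by qmul, shift by 7, write back in place. */
    for (int i = 0; i < 4; i++)
        block[pos[i]] = (int16_t)((dc[i] * qmul) >> 7);
}
```

With qmul = 128 the scaling step is the identity, which makes the transform easy to check by hand; applying `hadamard_2x2` twice multiplies every input by 4.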
Bit-exact against ff_h264_chroma_dc_dequant_idct_8_c. The Hadamard
itself is gated by the fourier PR #23 7-case test suite (including
the H·H = 4·I algebraic invariant), and the public-API parity
test added in PR #25 confirms the src/ symbol matches the test ref.
4:2:2 chroma stays on the in-tree ff_h264_chroma422_dc_dequant_idct_c
path — same chroma_format_idc<=1 gating shape as 0009 chroma deblock.
Pin bump: _daedalus_fourier_commit / DAEDALUS_FOURIER_COMMIT bumped
to b9f9ff2a (post-PR #25) so the build picks up the public
daedalus_h264_chroma_dc_hadamard_2x2 symbol.
Verified the patch applies cleanly on top of 0001-0010 against the
pinned upstream commit b57fbbe5 on hertz.
Adds the bS=4 intra-strength variants of the already-substituted
luma_v / luma_h deblock (0005, 0008). Macroblock edges with an
intra-coded macroblock on either side force boundary strength to 4
per H.264 §8.7.2.1; this includes edges of an inter MB that borders
an intra neighbour.
H264DSPContext.v_loop_filter_luma_intra →
daedalus_recipe_dispatch_h264_deblock_luma_v_intra
H264DSPContext.h_loop_filter_luma_intra →
daedalus_recipe_dispatch_h264_deblock_luma_h_intra
Both kernels landed in daedalus-fourier PR #11. Recipe → CPU NEON
(no intra QPU shaders yet); plumbing-only NEON-to-NEON via daedalus.
Signature differs from bS<4: no tc0 argument. Wrapper passes
daedalus_h264_deblock_meta with alpha/beta set; tc0[] is ignored by
the intra dispatch (bS=4 hardcodes the strength).
Chroma intra variants are deferred to a follow-up because the chroma
init has a 4:2:0 / 4:2:2 split (chroma_format_idc gating) — the
daedalus dispatch is 4:2:0-only and needs explicit conditional
substitution to avoid running on 4:2:2 chroma.
Verified the patch applies cleanly on top of 0001-0009 against the
pinned upstream commit b57fbbe5 on hertz.
Chroma siblings of 0005 (luma_v) and 0008 (luma_h). Same
NEON-to-NEON pattern via the daedalus recipe layer:
H264DSPContext.v_loop_filter_chroma →
daedalus_recipe_dispatch_h264_deblock_chroma_v
H264DSPContext.h_loop_filter_chroma →
daedalus_recipe_dispatch_h264_deblock_chroma_h
Both kernels landed in daedalus-fourier PR #10. Recipe table routes
AUTO to CPU NEON (no chroma QPU shaders yet), so this is plumbing-
only and stays bit-exact against the in-tree NEON.
Intra chroma (bS=4) loop filters remain on in-tree NEON;
daedalus_h264_deblock_meta covers the non-intra (bS<4) path.
Verified the patch applies cleanly on top of 0001-0008 against the
pinned upstream commit b57fbbe5 on hertz. Wires the new patch into
both arch/PKGBUILD and debian/build-deb.sh.
Adds patch 0008 to the substitution arc, mirroring 0005's V variant
for H.264 non-intra bS<4 horizontal luma deblock.
H264DSPContext.h_loop_filter_luma →
daedalus_recipe_dispatch_h264_deblock_luma_h
The H kernel was added to daedalus-fourier in PR #9 (vendored
ff_h264_h_loop_filter_luma_neon, wired through the same CPU-dispatch
pattern as V). Recipe table routes AUTO to CPU NEON (no QPU shader
for H yet), so this is a NEON-to-NEON substitution via the daedalus
recipe layer — same shape as 0005.
The libavcodec.so ctx remains no-QPU (daedalus_ctx_create_no_qpu),
matching the existing 0003/0004/0005/0007 patches. Higher-cycle
QPU init waits for a feature-flag gating change in a separate PR.
Intra (bS=4) h_loop_filter_luma_intra stays on the in-tree NEON .S
code; daedalus_h264_deblock_meta covers the non-intra path only.
A follow-up can route intra once daedalus-fourier exposes the
intra-h dispatch (the kernel already exists internally per fourier
PR #11).
Wires the new patch into both arch/PKGBUILD and debian/build-deb.sh
sequences. Verified the patch applies cleanly on top of 0001-0007
against the pinned upstream commit b57fbbe5 on hertz.
Adds 0007-panvk-bifrost-xfb-component-base-fix.patch — eliminates a
reliable SIGSEGV in vkCreateGraphicsPipeline whenever an XFB-bound
vertex output is declared with non-zero `layout (component=N)`.
Surfaced by dEQP-VK.transform_feedback.simple.holes_vert (Mali-G52 r1
MC1, PAN_ARCH 7). Backtrace lands 11 frames into libvulkan_panfrost.so
called from vkt::TransformFeedback::TransformFeedbackHolesInstance::
iterate.
Root cause: iter17's lower_xfb_output_iter17 (and upstream
pan_nir_lower_xfb, which has the identical `// TODO`) computes the
source-channel mask as `mask << channel_idx`, where channel_idx is
the varying-location component (0..3) but src only contains channels
starting at nir_intrinsic_component(intr). For a scalar declared
component=2, the lowering computed `mask << 2` against a 1-component
src — out-of-range; the malformed nir_def segfaulted in downstream
NIR constant-folding (nir_constant_expressions.c::evaluate_*).
Fix translates channel_idx to source-channel space by subtracting
nir_intrinsic_component(intr) before shifting the mask, and replaces
the elided release-mode asserts with explicit release-mode guards
(closes the same elision class as the original bug).
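
The index translation and the release-mode guards can be illustrated outside NIR with plain integers. Both functions below are hypothetical models written for this note, not the patch code; the parameter names follow the description above.

```c
#include <stdint.h>

/* Pre-fix behaviour: shifts by the absolute varying component. */
static uint32_t xfb_src_mask_buggy(uint32_t mask, unsigned channel_idx)
{
    return mask << channel_idx;
}

/* Post-fix model: translate channel_idx into source-channel space first.
 * Returns 0 (nothing to store) instead of relying on asserts when a
 * precondition does not hold. */
static uint32_t xfb_src_mask_fixed(uint32_t mask, unsigned channel_idx,
                                   unsigned component_base,
                                   unsigned src_components)
{
    if (channel_idx > 3 || channel_idx < component_base)
        return 0;
    if (src_components == 0 || src_components > 4)
        return 0;

    uint32_t shifted = mask << (channel_idx - component_base);
    uint32_t valid   = (1u << src_components) - 1u;

    return (shifted & ~valid) ? 0 : shifted;
}
```

For a scalar declared with component=2, the buggy form yields 0x4 against a one-channel source, while the fixed form yields 0x1.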
Verified on PineTab2 (Mali-G52 r1 MC1, PAN_ARCH 7) against vulkan-cts
1.3.10.0:
- holes_vert / holes_extra_draw_vert no longer SIGSEGV (now Fail
on color-check; that is a separate iter20 finding).
- basic_*: 36/36 Pass. depth_clip_*: 1 Pass + 4 NotSupported.
lines_or_triangles*: 16 NotSupported. 0 Fail across the set.
Caveat (not regressions): max_output_components_64/_128/_256 were
never reached on the r5 sweep — watchdog killed transform_feedback
after the holes_vert crash. With this fix in place, they now run
and surface their own pre-existing coredumps, confirmed on shipped
r6 baseline too. iter20+ territory.
Phase 5 (2nd-model) review: APPROVE WITH CHANGES (non-blocking).
Changes applied: release-mode defensive guards on both preconditions
plus a dispatcher-side comment clarifying the i*2+j semantics.
Cross-refs:
- ~/src/panvk-bifrost/iter19/phase{0,1,2,3}_holes_vert*.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
aish is an AI-augmented conversational shell in LuaJIT 2.x with FFI
bindings to libcurl, GNU readline, and libc — no C extensions, no
build step. Source-of-truth: git.reauktion.de/marfrit/aish, tag v0.1.0
(tarball sha256 9ebc3939e028832e39391ae33efacb5ec9bcd99d123cbc8ca1cd6ca9a640b5b5).
The arch and debian recipes mirror the lmcp pattern (pure-Lua any-arch
package, no makefile, install copies modules directly):
arch/aish/PKGBUILD — depends=(luajit readline curl)
debian/aish/build-deb.sh — pure dpkg-deb, SOURCE_DATE_EPOCH pinned
debian/aish/debian/{control,changelog,copyright}
Install layout, matching what main.lua's script-dir-relative package.path
expects after the wrapper execs `luajit /usr/share/lua/5.1/aish/main.lua`:
    /usr/bin/aish                  ← bin/aish wrapper
    /usr/share/lua/5.1/aish/{main,broker,context,executor,history,
                             mcp,renderer,repl,router,safety,secrets}.lua
    /usr/share/lua/5.1/aish/ffi/{curl,libc,pty,readline}.lua
    /usr/share/lua/5.1/aish/vendor/dkjson.lua
    /usr/share/doc/aish/{README.md,LICENSE,examples/config.lua}
CI: two new jobs in .gitea/workflows/build.yml at the end of file.
aish-any chains needs:lmcp-debian (parallel-DAG with claude-his-any,
serialized via the shared arch-aarch64 runner — avoids needless wait
through the unrelated fourier stack). aish-debian chains needs:aish-any.
Both invoke the standard check-already-published.sh fast-skip on no-
change pushes.
Sonnet review (per feedback_reviews_use_sonnet.md + bugfix-process
step 4): no blockers. Folded in two findings before commit: switched
needs: from mpv-fourier-aarch64 to lmcp-debian (cleaner DAG, faster
cold-build wall clock), removed the dead Build-Depends:
debhelper-compat line from debian/aish/debian/control (build-deb.sh
doesn't use debhelper).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backports Mesa main's unconditional flip of VK_EXT_legacy_dithering.
Pure-software composition; no new HW path. vk_render_pass already gates
on enabled_features.legacyDithering and panvk_vX_blend + pan_format
already plumb the dithered BLEND descriptor (BFMT2 table has MALI_BLEND_AU
encodings for RGB565/RGB5A1/RGBA4/RGB10A2 on PAN_ARCH 7). Our r5 base
just hadn't picked up the cherry-pick.
Phase 7 verify on ohm (PineTab2 / RK3566 / Mali-G52 r1 MC1) with a
locally-built r6 lib at /tmp/r6_test_lib/:
Feature advertisement:
r5: VK_EXT_legacy_dithering not in extension list, legacyDithering=false
r6: VK_EXT_legacy_dithering rev 2 advertised, legacyDithering=true
dEQP subsets (delta):
dEQP-VK.api.*.legacy_dithering* r5/r6 both: 2 P / 0 F (identical)
dEQP-VK.renderpass.dithering.* r5/r6 both: 0 P / 0 F / 94 NS (identical)
dEQP-VK.renderpass2.dithering.* r5/r6 both: 0 P / 0 F / 94 NS (identical)
Net: zero regressions, advertisement-only delta as expected.
Second-model review (per bugfix-process step 4) traced the full code
path through vk_render_pass + panvk_vX_cmd_draw + panvk_vX_blend +
pan_format BFMT2. No interaction with our r1 nullDescriptor (disjoint
paths). Mesa upstream marks ext DONE for panvk in docs/features.txt.
ARM's own libmali r51p0 driver (BXODROIDN2PL, 2024-08) lists
VK_EXT_legacy_dithering in its Vulkan extension string table,
confirming the feature is shipped by ARM for Mali-G52-class hardware.
Follow-up out of scope: the 94 renderpass-dithering tests show as
NotSupported on both r5 and r6 — there's a separate panvk-side
prereq the dEQP harness checks (likely a specific format-feature
combination). Worth investigating in a future iteration.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Arch: version 147→148, drop enable_nacl (removed upstream), fix
nv12-external-oes patch context for 148 (base/numerics/safe_conversions.h
include removed upstream). Header comment updated to describe the actual
cross-compile build instead of the native build it used to claim.
Debian: new build-deb.sh that assembles .deb from pre-built artifacts
on CT 220 (data). Same binary artifacts as the Arch package, launcher at
/usr/bin/chromium-fourier (no Conflicts with stock chromium on Debian).
Both packages published to packages.reauktion.de:
- Arch: marfrit/aarch64/chromium-fourier 1:148.0.7778.178-1
- Debian: trixie+bookworm/main/arm64 chromium-fourier 1:148.0.7778.178-1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backports Mesa main's unconditional flip of .fragmentStoresAndAtomics
to true in src/panfrost/vulkan/panvk_vX_physical_device.c. Closes
the Dawn-WebGPU adapter rejection at PhysicalDeviceVk.cpp:250 that
caused brave-vulkan to fall back to the SwiftShader CPU adapter on
PineTab2/Mali-G52, per marfrit/panvk-bifrost#2.
Phase 7 verify on ohm (PineTab2, RK3566, Mali-G52 r1 MC1) with a
locally-built r5 lib installed to /tmp/r5_test_lib/:
dEQP-VK.glsl.atomic_operations.*:
r4: 48 pass / 0 fail / 992 NotSupported (1040 total)
r5: 80 pass / 0 fail / 960 NotSupported (1040 total)
delta: +32 newly-passing, zero new failures
dEQP-VK.image.store.*:
r4: 2772 pass / 0 fail / 238 NotSupported (3010 total)
r5: 2772 pass / 0 fail / 238 NotSupported (3010 total)
delta: identical (image.store is independent of the flag)
The disjunction with instance->force_enable_shader_atomics is kept as
a documented kill-switch even though the compiler folds it away —
it leaves the DRI option pan_force_enable_shader_atomics semantically
wired for future rebases or downstream debugging.
Patch reviewed via 2nd-model pass (per bugfix-process step 4):
recommended keeping the disjunction (applied), Bifrost-only-vs-unconditional
left unconditional to match upstream (applied), pre-ship CTS subset
(applied with results above).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both PKGBUILDs referenced url=https://github.com/marfrit/panvk-bifrost,
which was a hallucinated URL — no such repo existed. The campaign's
real source-of-truth home was just created at
https://git.reauktion.de/marfrit/panvk-bifrost (mfritsche, 2026-05-23).
Point both PKGBUILDs at the real URL so `pacman -Si` and any consumer
reading package metadata follows a working link.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
H264QpelContext.put_h264_qpel_pixels_tab[1][2] (8x8 luma horizontal
half-pel, 6-tap "put" — the canonical representative of the H.264
luma motion-compensation family) now dispatches through
daedalus_recipe_dispatch_h264_qpel_mc20 instead of
ff_put_h264_qpel8_mc20_neon.
Cycle 9 of the daedalus-v4l2#11 step 2 substitution arc; closes the
4-cycle libavcodec.so substitution sequence:
    cycle 6 (PR #76)  H.264 IDCT 4x4        done
    cycle 7 (PR #85)  H.264 IDCT 8x8        done
    cycle 8 (PR #86)  H.264 luma-v deblock  done
    cycle 9 (this)    H.264 qpel mc20
Bumps daedalus-fourier pin d87239d → 209a421 (PR #2 — public API
gains daedalus_recipe_dispatch_h264_qpel_mc20 +
DAEDALUS_KERNEL_H264_QPEL_MC20).
Verdict per docs/k9_h264qpel_mc20.md: CPU NEON. Per-block 7.6 ns at
131 Mblock/s gives 135× margin over 30 fps 1080p; QPU dispatch floor
at ~250 ns makes any V3D shader strictly worse. Substitution is
plumbing-only — same daedalus_ctx_create_no_qpu pthread_once shape
the cycles 6/7/8 shims already own (kept SEPARATE from the H264DSP
shim's ctx because H264QPEL is its own libavcodec Makefile module
and link order does not guarantee a single .o owns the ctx symbol;
one extra ~µs init per process, paid lazily on first MC call).
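
The 135× margin can be checked with a few lines of arithmetic, assuming the figure counts one mc20 call for every 8x8 luma block of a 1920x1080 frame at 30 fps. That workload definition is an assumption of this sketch; the source states only the result.

```c
#include <stdint.h>

/* 8x8 luma blocks that must be processed per second. */
static uint64_t blocks_per_second(unsigned width, unsigned height, unsigned fps)
{
    return (uint64_t)(width / 8) * (height / 8) * fps;
}

/* Ratio of achievable throughput to the required block rate. */
static double throughput_margin(double ns_per_block, uint64_t required_per_s)
{
    return (1e9 / ns_per_block) / (double)required_per_s;
}
```

At 7.6 ns per block the margin comes out just above 135, and at the ~250 ns QPU dispatch floor it drops to roughly 4.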
Other H.264 luma MC variants (mc02, mc11, mc22 etc.) and the 16x16
size tier stay on the in-tree NEON .S code per the cycle-9 phase-1
rationale (mc20 8x8 is representative; remaining variants would
multiply recipe-lookup overhead without changing the substrate
verdict).
Bit-exact against ff_put_h264_qpel8_mc20_neon (daedalus-fourier
cycle 9 green; 10000/10000 random blocks bit-exact, M3 = 131 Mblock/s).
No SONAME change, no Depends change. PKGREL 9 → 10.
Refs reauktion/daedalus-v4l2#11 — substitution arc step 2 cycle 9.
Picks up reauktion/daedalus-v4l2 PR #20 (closes #19): wire-protocol
cap DAEDALUS_PROTO_MAX_PAYLOAD raised from 64 KiB to 1 MiB.
DAEDALUS_MAX_BITSTREAM follows; daedalus_fill_output_fmt now reports
OUTPUT_MPLANE sizeimage = ~1 MiB.
Fixes the Firefox YouTube avc1 SW-fallback observed on higgs when
any H.264 slice exceeded 64 KiB (routine on 720p+ streams).
libva-v4l2-request-fourier's S_FMT-driven OUTPUT-pool resize was
clamping back to 65484 and Firefox lost the slice; now the kernel
honours the larger sizeimage.
Both packages bumped to 0.1.0+r45+g872eec5-1:
- daedalus-v4l2 (daemon): r43 -> r45. Daemon-side allocations
are dynamic, so the only growth is one ~1 MiB read buffer per
daemon process at startup.
- daedalus-v4l2-dkms (kernel module): r33 -> r45. Skips the
daemon-only bumps r37/r39/r41/r43 (no kernel/include change in
that range) and lands the PROTO_MAX_PAYLOAD bump.
LOCK-STEP INSTALL REQUIRED: effective cap is min(kernel, daemon).
A stale kernel with a new daemon (or vice versa) still rejects
>64 KiB payloads. apt/pacman should pick both up in one
transaction since they share the same upstream pin.
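
The lock-step requirement follows directly from the min() relation. A minimal sketch, assuming the cap is compared against the raw payload size and ignoring any protocol header overhead; the macro and function names are hypothetical and do not come from daedalus_v4l2_proto.h.

```c
#include <stddef.h>

#define OLD_PROTO_MAX_PAYLOAD (64u * 1024u)     /* 64 KiB */
#define NEW_PROTO_MAX_PAYLOAD (1024u * 1024u)   /* 1 MiB  */

/* Both sides enforce their own compiled-in cap, so the smaller one wins. */
static size_t effective_payload_cap(size_t kernel_cap, size_t daemon_cap)
{
    return kernel_cap < daemon_cap ? kernel_cap : daemon_cap;
}

static int payload_accepted(size_t payload, size_t kernel_cap, size_t daemon_cap)
{
    return payload <= effective_payload_cap(kernel_cap, daemon_cap);
}
```

A 100 KiB slice passes only when both the kernel module and the daemon carry the new value.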
Wire-protocol value-only change in include/daedalus_v4l2_proto.h;
struct layout unchanged. DAEDALUS_PROTO_VERSION stays at 0.
Picks up reauktion/daedalus-v4l2 PR #18 (closes #17): daemon drops
degenerate (<4 byte) bitstreams at REQ_DECODE entry instead of
letting avcodec_send_packet emit AVERROR_INVALIDDATA, replies
RESP_FRAME NO_FRAME so libva's V4L2 surface pool stays alive.
Fixes the Firefox YouTube avc1 pause→resume regression observed on
higgs: libva-v4l2-request-fourier flushes a 3-byte stub into
OUTPUT_MPLANE at the pause boundary; the old daemon path turned
that into a decode failure, Firefox marked H.264-via-VAAPI as
broken for the session, and routed every subsequent frame to
libmozavcodec SW. After this bump the daemon logs 'tiny bitstream
3 bytes — dropping as no-op' and the next real REQ_DECODE
proceeds normally.
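
The entry check can be modelled as a small classifier. This is a sketch of the described behaviour with hypothetical names (`decode_action`, `classify_decode_request`); the daemon's actual REQ_DECODE handler and reply types are not reproduced here.

```c
#include <stddef.h>

enum { MIN_BITSTREAM_BYTES = 4 };

typedef enum {
    DECODE_SUBMIT,          /* hand the buffer to the decoder */
    DECODE_REPLY_NO_FRAME   /* drop as a no-op and reply without a frame */
} decode_action;

/* Anything shorter than 4 bytes is treated as a degenerate stub and is
 * never passed on to the decoder. */
static decode_action classify_decode_request(size_t bitstream_len)
{
    return bitstream_len < MIN_BITSTREAM_BYTES ? DECODE_REPLY_NO_FRAME
                                               : DECODE_SUBMIT;
}
```

The 3-byte stub flushed at the pause boundary falls into the no-frame branch, so the decoder never reports an error for it.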
Wire protocol unchanged. daedalus-v4l2-dkms bump not needed.
FFmpeg 8.x dropped the H.264 decoder's low_delay code path —
AV_CODEC_FLAG_LOW_DELAY no longer prevents h264_select_output_frame
from running the display-order DPB output queue. The daedalus-v4l2
daemon's `ctx->flags |= AV_CODEC_FLAG_LOW_DELAY` at
daemon/src/decoder.c:202 has been a silent no-op since the SONAME
61→62 jump landed in reauktion/daedalus-v4l2 PR #16; on Firefox
YouTube this re-introduced the 2-1-4-3 B-frame pair-swap that PR
#12's daemon flag was supposed to prevent.
Fix lives in libavcodec, not the daemon: restore the documented
LOW_DELAY semantics so the daemon (and any other V4L2-stateless-
style consumer) keeps the one-frame-per-send_packet decode-order
output contract it already declares.
## Patch
0006-h264-restore-low-delay.patch touches libavcodec/h264_slice.c:
- h264_select_output_frame: early-exit when LOW_DELAY is set.
Emit the just-decoded picture as next_output_pic, mirror the
corruption / recovery-point tracking the main path performs,
skip delayed_pic[] / POC reorder machinery entirely.
- h264_field_start: suppress the SPS-driven
`has_b_frames = sps->num_reorder_frames` clobber when LOW_DELAY
is set. Without this the per-slice bitstream_restriction_flag
re-pickup would reintroduce a nonzero reorder buffer mid-stream
even after the daemon set has_b_frames=0 at avcodec_open2.
## Why not daemon-side
A daemon SPS-rewrite (`num_reorder_frames=0`) was considered but
rejected: it works only for the daemon's reconstructed SPS NAL,
not for in-band SPS NALs that other code paths parse through the
dlopened libavformat. Restoring documented FFmpeg flag semantics
is the smaller, more durable change and keeps the daemon
interface stable.
## Packaging
- PKGREL/pkgrel bump to 9.
- No new build-deps, no Depends change.
- Substitution arc cycles 6/7/8 unchanged.
## Refs
- reauktion/daedalus-v4l2#11 / #12 (LOW_DELAY half-measure on
daemon side, originally landed against FFmpeg 7.x).
- daemon/src/decoder.c:202 (`ctx->flags |= AV_CODEC_FLAG_LOW_DELAY`
for H.264 only — unchanged, but now actually has effect again).
Cycle 8 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2). H264DSPContext.v_loop_filter_luma — non-intra bS<4 vertical
luma deblock, called per macroblock-row edge from the slice deblock
loop in libavcodec/h264_loopfilter.c — now dispatches through
daedalus_recipe_dispatch_h264_deblock_luma_v instead of
ff_h264_v_loop_filter_luma_neon.
## What
- Add 0005-h264-deblock-luma-v-daedalus-fourier.patch (in both arch/
and debian/ ffmpeg-v4l2-request-fourier/). Extends
libavcodec/aarch64/h264_idct_daedalus.c with
ff_h264_v_loop_filter_luma_daedalus (constructs a
daedalus_h264_deblock_meta from FFmpeg's (alpha, beta, tc0[4]) and
calls daedalus_recipe_dispatch_h264_deblock_luma_v with n_edges=1).
Patches libavcodec/aarch64/h264dsp_init_aarch64.c to wire
c->v_loop_filter_luma to the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append patch + bump pkgrel/PKGREL
to 8.
- No new build-deps, no Depends change, no daedalus-fourier rev — the
d87239d pin already exposes daedalus_recipe_dispatch_h264_deblock_luma_v.
## Why
Cycle 8 is marked "CPU primary; QPU opportunistic" in the
daedalus-fourier API docstring. Per the hybrid substrate philosophy
("if there's a coprocessor, use it") we eventually want the QPU
opportunism active here. But the libavcodec.so context is
process-global and shared with cycles 6/7 via pthread_once, and it
uses daedalus_ctx_create_no_qpu deliberately to avoid implicit
Vulkan init in arbitrary host processes (Firefox content, mpv-fourier,
ffmpeg-fourier CLI, ...). Switching to daedalus_ctx_create here
without a feature flag would be a footgun.
So cycle 8 lands as plumbing-only NEON-by-recipe substitution for
now; opportunistic QPU enablement is a separate follow-up that adds
a DAEDALUS_FOURIER_ENABLE_QPU env var or equivalent.
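
A rough sketch of how the follow-up gate could look, assuming the env
var name above and an opt-in value of exactly "1" (both the accepted
value and the `sketch_` names are assumptions, not the planned
implementation). The context type is a placeholder; real code would
call daedalus_ctx_create or daedalus_ctx_create_no_qpu inside the init
function.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder for the process-global daedalus context. */
typedef struct {
    int qpu_enabled;
} sketch_ctx;

static sketch_ctx sketch_storage;
static sketch_ctx *sketch_shared;
static pthread_once_t sketch_once = PTHREAD_ONCE_INIT;

/* Opt-in only: unset, empty, or any value other than "1" keeps the
 * no-QPU behaviour, so arbitrary host processes never init Vulkan. */
static int sketch_qpu_requested(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Runs exactly once per process, no matter how many decoder threads
 * race into sketch_get_ctx(). */
static void sketch_init(void)
{
    sketch_storage.qpu_enabled =
        sketch_qpu_requested(getenv("DAEDALUS_FOURIER_ENABLE_QPU"));
    sketch_shared = &sketch_storage;
}

static sketch_ctx *sketch_get_ctx(void)
{
    pthread_once(&sketch_once, sketch_init);
    return sketch_shared;
}
```

Because the context is shared by cycles 6/7/8, the flag is read once at
first use; changing the variable later in the process has no effect in
this sketch.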
## Scope NOT covered
- Intra (bS=4) loop filter c->v_loop_filter_luma_intra — daedalus's
daedalus_h264_deblock_meta only covers the non-intra path.
- Horizontal-edge variant c->h_loop_filter_luma — separate kernel
(not yet in daedalus-fourier API).
- Chroma loop filters — separate kernels.
- Bulk batching — single-edge dispatch wastes the kernel's n_edges>1
amortization. Same caveat as cycles 6/7; follow-up.
- QPU opportunism — see "Why" above.
## SONAME
Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.
## Refs
- reauktion/daedalus-v4l2 issue #11: reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #85 (cycle 7 IDCT 8×8)
- marfrit/daedalus-fourier cycle 8 close (deblock luma-v NEON green)
Cycle 7 of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2). H264DSPContext.idct8_add — called per 8×8 block from the
High-profile intra-8×8-DCT decode path in libavcodec/h264_mb.c — now
dispatches through daedalus_recipe_dispatch_h264_idct8 instead of
ff_h264_idct8_add_neon.
## What
- Add 0004-h264-idct8-daedalus-fourier.patch (in both arch/ and debian/
ffmpeg-v4l2-request-fourier/). Extends libavcodec/aarch64/
h264_idct_daedalus.c (introduced by 0003) with ff_h264_idct8_add_daedalus
and a daedalus_recipe_dispatch_h264_idct8 call; patches
libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct8_add to
the new shim.
- arch/PKGBUILD + debian/build-deb.sh: append the new patch to the
apply list; bump pkgrel/PKGREL to 7.
- No new build-deps, no Depends change, no daedalus-fourier rev — the
d87239d pin already exposes daedalus_recipe_dispatch_h264_idct8.
## Why
The recipe layer picks the substrate; for cycle 7 (H.264 IDCT 8×8)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution layered on top of cycle 6. Production validation of
cycle 6 on higgs Firefox YouTube: 3040 frames decoded cleanly,
avg_decode_us=3388 (no regression vs the pre-substitution ~4 ms
baseline). Cycle 7 inherits the same shim's pthread_once context.
Bit-exact against ff_h264_idct8_add_neon (daedalus-fourier cycle 7
green; FFmpeg 8×8 block storage block[r + 8*c] matches daedalus
column-major convention).
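
For reference, the block[r + 8*c] layout mentioned above can be written
as a tiny index helper. The `sketch_` names are illustrative only and
appear in neither FFmpeg nor daedalus-fourier.

```c
#include <stdint.h>

/* Offset of the coefficient at row r, column c in an 8x8 block stored
 * as block[r + 8*c]: rows are contiguous within a column. */
static int sketch_idx8(int r, int c)
{
    return r + 8 * c;
}

/* Read one coefficient from a 64-entry block using that layout. */
static int16_t sketch_coeff8(const int16_t block[64], int r, int c)
{
    return block[sketch_idx8(r, c)];
}
```

Stepping r by one moves one element; stepping c by one moves eight.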
## Scope NOT covered (deferred)
- Bulk c->idct8_add4 (inter 8×8-DCT macroblocks) stays on the
in-tree NEON .S code; batched substitution with n_blocks>1 lands
later alongside the cycle-6 bulk-paths work.
- High-bit-depth (10-bit) path untouched.
- Cycles 8/9 — separate PRs.
## SONAME
Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.
## Refs
- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit-packages PR #76 (cycle 6 IDCT 4×4)
- marfrit-packages PR #78 (libxml2 ABI-skew workaround)
- marfrit/daedalus-fourier cycle 7 close (H.264 IDCT 8×8 NEON green)
The original 0005 patch was generated from the pre-Phase-5-review source
snapshot (phase5_review_input_2026-05-21.tgz), missing the four
load-bearing review fixes that landed in the post-review snapshot:
- probe_hantro gate on KHR_video_* extension advertisement
- per-session ts_counter (was process-global static)
- panvk_v4l2_session_finish full unwind (munmap + STREAMOFF + REQBUFS=0)
- MIN2(rb.count, 18) clamp on num_*_buffers
Run #162 (job 17032) failed in prepare() because the PKGBUILD sanity
check 'grep -q "KHR_video_queue = PAN_ARCH < 9 && panvk_v4l2_probe_hantro()"'
didn't match the actual patched output (which still had the pre-review
'KHR_video_queue = PAN_ARCH < 9,').
This patch (regenerated from phase5_post_review_2026-05-21.tgz) carries
all four review fixes. Validated locally: vanilla mesa-26.0.6 + r1..r4 +
this patch reproduces prepare()-OK byte-for-byte.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original PR #79 used symlinks for 0001..0004 patches (pointing into
../mesa-panvk-bifrost/) to avoid drift between siblings. CI's
"cp -r arch/mesa-panvk-bifrost-video /tmp/build-..." preserves the
symlinks, but the destination /tmp/build-... has no sibling dir to
resolve them against, so makepkg errors with:
==> ERROR: 0001-panvk-expose-robustness2-nullDescriptor-bifrost.patch
was not found in the build directory and is not a URL.
Each Arch PKGBUILD owns its source files per convention; the
duplication risk is low because r1..r4 are closed-release patches.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The merge commit for PR #79 (e7cc22e42) did not auto-fire the
Gitea Actions workflow despite touching paths matched by the
build.yml filter (arch/** + .gitea/workflows/**). No run row
exists between #160 (PR #78 merge) and now. This README touch
is a no-op content change to force a fresh workflow run
through the standard push trigger.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
panvk-bifrost-video campaign close. Phase 4 byte-exact validated
2026-05-21 on RK3566/PineTab2 (Mali-G52 r1 MC1 + hantro VPU): 48/48
unique BBB display frames decoded by this driver are byte-identical
to ffmpeg+libva-v4l2-request-fourier on the same hantro hardware
(frame 42 Y md5 = 54b9b396e6cd377256eb4bce0efc0bed both ways).
Phase 5 second-model review passed; load-bearing findings applied.
Co-installs at /usr/lib/panvk-bifrost-video/ parallel to the r4
sibling at /usr/lib/panvk-bifrost/; opt-in via VK_ICD_FILENAMES.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First cycle of the libavcodec.so substitution arc (reauktion/daedalus-v4l2#11
step 2). H264DSPContext.idct_add — called per 4×4 block from the
intra-4×4 decode path in libavcodec/h264_mb.c — now dispatches through
daedalus_recipe_dispatch_h264_idct4 instead of ff_h264_idct_add_neon.
## What
- Add 0003-h264-idct4-daedalus-fourier.patch (in both arch/ and
debian/ ffmpeg-v4l2-request-fourier/). Creates
libavcodec/aarch64/h264_idct_daedalus.c (ff_h264_idct_add_daedalus
shim + lazy pthread_once context init via
daedalus_ctx_create_no_qpu), patches
libavcodec/aarch64/h264dsp_init_aarch64.c to wire c->idct_add to
the shim, adds the new .o to libavcodec/aarch64/Makefile.
- arch/PKGBUILD + debian/build-deb.sh: fetch + build
daedalus-fourier (pinned at d87239d — lockstep with the
daedalus-v4l2 daemon's inline build) with
-DCMAKE_POSITION_INDEPENDENT_CODE=ON into a per-build temp prefix,
then pass --extra-cflags=-I.../include --extra-ldflags=-L.../lib
--extra-libs="-ldaedalus_core -lvulkan -lpthread" to FFmpeg
configure. daedalus_core.a is static-linked into libavcodec.so.62.
- debian/control Depends gains libvulkan1 (daedalus_core PUBLIC-links
Vulkan::Vulkan for the queryable QPU substrate; the no-QPU
constructor still works at runtime but the loader needs
libvulkan.so.1 present to dlopen libavcodec.so.62).
- arch/PKGBUILD depends gains vulkan-icd-loader, makedepends gains
cmake / ninja / vulkan-headers.
## Why
The recipe layer picks the substrate; for cycle 6 (H.264 IDCT 4×4)
the recipe is CPU NEON, so this is effectively a NEON-to-NEON
substitution with one extra dispatch call and recipe-table lookup.
The point of this first cycle isn't perf wins — it's plumbing. Once
the path is wired and stable, follow-up patches batch through the
bulk paths (idct_add16 / idct_add16intra / idct_add8) and stack
cycles 7/8/9 (IDCT 8×8, luma-v deblock, qpel mc20).
Bit-exact against ff_h264_idct_add_neon (daedalus-fourier cycle 6
green; FFmpeg's 4×4 block storage matches daedalus's column-major
convention).
## Scope NOT covered
- Bulk paths (idct_add16 / idct_add16intra / idct_add8) — most IDCT
4×4 calls in real H.264 streams go through these, not the per-
block c->idct_add path; intra-4×4-only macroblocks are a minority.
Batched substitution lands in a follow-up.
- High-bit-depth (10-bit) path — not touched; 8-bit only.
- Cycles 7/8/9 — separate PRs.
## SONAME
Unchanged. libavcodec.so.62 / libavformat.so.62 / libavutil.so.60.
No daedalus-v4l2-dkms or daedalus-v4l2 bump required.
## Refs
- reauktion/daedalus-v4l2 issue #11 (substitution arc): reauktion/daedalus-v4l2#11
- marfrit/daedalus-fourier cycle 6 close (H.264 IDCT 4×4 NEON green)
Daemon-only bump (no daedalus-v4l2-dkms change needed; PROTO_VERSION
stays at 0).
#12 (LOW_DELAY half-measure): daemon sets AV_CODEC_FLAG_LOW_DELAY on
the H.264 AVCodecContext so libavcodec emits frames in decode order
~99% of the time (a few stragglers at GOP boundaries when the
stream's SPS num_reorder_frames overrides the flag). Visible
improvement vs the 2-1-4-3 pair-swap on Firefox + mpv playback;
not the permanent fix — see daedalus-v4l2#11 for the architectural
plan to substitute daedalus-fourier kernels for libavcodec's
pixel math one cycle at a time.
#13 (daedalus-fourier linkage): daemon now pkg-config-links against
the daedalus-fourier kernel library (marfrit/daedalus-fourier) and
logs substrate availability at startup. No kernels dispatched yet
— this is the build-time foundation for the substitution work.
build-deb.sh updated to fetch + build + install daedalus-fourier
(pinned at d87239d, marfrit/daedalus-fourier PR #1) into a per-
build temp prefix before invoking the daemon's cmake, exposing it
via PKG_CONFIG_PATH. Static-linked, so the resulting .deb has no
new runtime deps. Requires libvulkan-dev + glslang-tools on the
CI runner.
Arch PKGBUILD bumped to the same upstream commit but Arch packaging
for daedalus-fourier itself is a follow-up; until that lands the
Arch build expects daedalus-fourier installed by the user (AUR-style).
Debian-side is end-to-end self-contained via build-deb.sh.
Refs:
* reauktion/daedalus-v4l2#12
* reauktion/daedalus-v4l2#13
* reauktion/daedalus-v4l2#11
* marfrit/daedalus-fourier#1
Lock-step downgrade of both packages to the revert tip of
daedalus-v4l2 (PR #10 closed PRs #7 + #8). After
0.1.0+r28+g79256dc-1 / 0.1.0+r30+g6ffe92b-1 landed in production,
mpv (--hwdec=vaapi-copy) failed before playback with "Unable to dequeue
buffer: Resource temporarily unavailable" because the daemon
parked CAPTURE buffers waiting for libavcodec's display-order
reorder, violating libva's V4L2 stateless 1:1 contract. See
daedalus-v4l2#9 for the diagnostic, #10 for the revert PR.
DAEDALUS_PROTO_VERSION drops 1 → 0; install both .debs in the same
apt transaction. Userspace ABI returns to the f0d4186-equivalent
behaviour, plus PR #4 (cosmetic H.264 menu controls). The
daedalus-v4l2-dkms #64 multi-kernel postinst behaviour stays in
build-deb.sh.
Visible regression: H.264 B-frame streams in Firefox return to the
"2 1 4 3 6 5" pair-swap visual. Proper fix (concurrent in-flight
requests in daemon + display-order reorder moved into
libva-v4l2-request-fourier) tracked at daedalus-v4l2#11.
Refs:
* reauktion/daedalus-v4l2#9
* reauktion/daedalus-v4l2#10 (merged)
* reauktion/daedalus-v4l2#11
iter17 closes the 162 winding_* CTS failures from iter15's baseline by
replacing the upstream pan_nir_lower_xfb call with a panvk-specific NIR
pass (panvk_per_arch(nir_lower_xfb)) that handles per-primitive
decomposition for non-LIST topologies (LINE_STRIP, TRIANGLE_STRIP,
TRIANGLE_FAN, and the four _WITH_ADJACENCY variants).
Topology + per-instance output vertex count are threaded as new sysvals
(vs.xfb_topology + vs.xfb_output_count) so the NIR pass can dispatch
per-topology at runtime without compiling 7+ shader variants.
dEQP-VK.transform_feedback.simple.* result (133596 cases total):
                          iter15 baseline    iter17
  Pass:                         796             958   (+162)
  Fail:                         243              81   (-162; resume_* by-design only)
  NotSupported:              132551          132551
  Fatal-skip:                     6               6
  Pass rate of runnable:       76.2%           91.7%  (+15.5pp)
100% of the iter15 winding-fail cluster closed. The remaining 81 fails
are all resume_* (pause/resume XFB, by design — we advertise
transformFeedbackDraw=false).
Second-model review (janet) produced 3 findings; Findings 1+2 were
already fixed in the in-tree applied state (stale applied_state/ snapshot
read by reviewer); Finding 3 (degenerate N underflow on N<2) was addressed
by gating non-LIST emission on `output_count > 0` predicate.
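
To illustrate the Finding 3 underflow, here is a sketch of a
per-topology output vertex count with the guard applied. It uses the
standard primitive counts for strips and fans (N-1 lines, N-2
triangles) and covers only three of the non-LIST topologies; the enum
and function names are hypothetical and this is not the NIR pass
itself.

```c
#include <stdint.h>

enum sketch_topology {
    SKETCH_LINE_STRIP,
    SKETCH_TRIANGLE_STRIP,
    SKETCH_TRIANGLE_FAN
};

/* Number of vertices written to the XFB buffer once a strip/fan of n
 * input vertices is decomposed into independent primitives. With
 * unsigned arithmetic, n - 1 or n - 2 would wrap for tiny n, so
 * degenerate draws are clamped to zero instead. */
static uint32_t sketch_xfb_output_count(enum sketch_topology t, uint32_t n)
{
    switch (t) {
    case SKETCH_LINE_STRIP:
        return n >= 2 ? 2u * (n - 1u) : 0u;
    case SKETCH_TRIANGLE_STRIP:
    case SKETCH_TRIANGLE_FAN:
        return n >= 3 ? 3u * (n - 2u) : 0u;
    }
    return 0u;
}

/* The predicate that gates non-LIST emission. */
static int sketch_should_emit(uint32_t output_count)
{
    return output_count > 0;
}
```

With the clamp in place, a one-vertex LINE_STRIP yields an output count
of 0 and the emission path is skipped entirely.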
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Kernel-only bump. Fixes the hard-reboot regression introduced by
the daedalus-v4l2#7 split-completion design and observed on higgs
(Pi CM5) during the first mpv vaapi-copy playback of 720p H.264:
device_run now removes src + dst from m2m_ctx's rdy_queue at the
moment it picks them up, not at buf_done time. Without this, a
parked dst_buf (waiting for libavcodec's display-order release)
stayed in the rdy_queue and got re-picked by the next device_run
after SRC_CONSUMED's job_finish released the scheduler — two
inflight entries on the same vb2_buffer, later HAS_PIXELS calls
list_del on an already-detached list_head, panic.
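
A userspace model of the invariant, assuming nothing about the actual
driver code: if picking a buffer also removes it from the ready queue,
a later pick cannot return the same buffer while it is still parked.
All names are hypothetical; in the kernel the peek/remove distinction
corresponds to helpers such as v4l2_m2m_next_src_buf versus
v4l2_m2m_src_buf_remove, though this commit message does not say which
helpers the fix uses.

```c
#include <stddef.h>

#define SKETCH_QUEUE_CAP 8

struct sketch_buf {
    int id;
};

struct sketch_rdy_queue {
    struct sketch_buf *slots[SKETCH_QUEUE_CAP];
    size_t count;
};

/* Append a buffer; returns 0 on success, -1 if the queue is full. */
static int sketch_queue_push(struct sketch_rdy_queue *q, struct sketch_buf *b)
{
    if (q->count == SKETCH_QUEUE_CAP)
        return -1;
    q->slots[q->count++] = b;
    return 0;
}

/* Pick-up removes the head immediately, so the buffer is owned by the
 * in-flight job from this point on and cannot be picked twice. */
static struct sketch_buf *sketch_queue_pick(struct sketch_rdy_queue *q)
{
    struct sketch_buf *b;

    if (q->count == 0)
        return NULL;
    b = q->slots[0];
    for (size_t i = 1; i < q->count; i++)
        q->slots[i - 1] = q->slots[i];
    q->count--;
    return b;
}
```

A peek-only pick followed by removal at buf_done time would leave the
parked buffer visible to the next run, which is the double-inflight
case described above.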
DAEDALUS_PROTO_VERSION stays at 1 — daemon (userspace
daedalus-v4l2) need NOT bump in lockstep with this DKMS update.
The existing daedalus-v4l2 0.1.0+r28+g79256dc is wire-compatible
with daedalus-v4l2-dkms 0.1.0+r30+g6ffe92b.
Refs:
* reauktion/daedalus-v4l2#8
Bumps both Arch PKGBUILD and Debian build-deb.sh pins to PR #16 —
codec_store_buffer + request_pool_resize transparent OUTPUT-pool grow
on a mid-session resolution upshift overrun. Picks up the frame-
survival path that supersedes #13's drop-and-recreate fallback.
Dual-pin per feedback_marfrit_packages_dual_pin so both Arch and
Debian repos see check-already-published.sh report a new version.
Lock-step bump of both packages to daedalus-v4l2#7 + #4. PROTO_VERSION
bumps 0 → 1 at the daemon ↔ kernel chardev wire: REQ_DECODE adds
__u64 src_pts (the OUTPUT vb2 timestamp); RESP_FRAME adds __u32 flags
(HAS_PIXELS / SRC_CONSUMED) + __u64 output_src_pts (= frame->pts on
drain). Both .debs must be installed atomically or the chardev
handshake rejects the version mismatch.
* daedalus-v4l2: daemon's send_packet → receive_frame loop now
stamps pkt->pts = req->src_pts and looks up the cookie for each
drained frame via frame->pts. chardev_client emits multiple
RESP_FRAME messages per REQ_DECODE when libavcodec's display-
order DPB releases an earlier frame on receipt of a later
bitstream — fixes the "2 1 4 3 6 5" pair-swap on H.264 streams
with B-frames.
* daedalus-v4l2-dkms: kernel device_run mirrors src_buf timestamp
into REQ_DECODE.src_pts. Completion path splits HAS_PIXELS /
SRC_CONSUMED: src is released as soon as send_packet succeeds
(so the m2m scheduler moves on), dst stays parked until the
matching frame is drained later. TIMESTAMP_COPY's auto src→dst
pairing no longer applies once lifecycles decouple — dst is
stamped explicitly from inflight->src_pts at HAS_PIXELS time.
* daedalus-v4l2-dkms also carries forward the -2 multi-kernel
postinst fix (#64) from the prior PKGREL. PKGREL resets to 1 on
the new upstream pin.
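
A sketch of the new wire fields and how a receiver might act on them.
Only the field names (src_pts, flags, output_src_pts) and flag names
come from the description above; the struct layout, the omitted
pre-existing fields, and the flag bit values are assumptions.

```c
#include <stdint.h>

/* Hypothetical bit assignments for the RESP_FRAME flags. */
#define SKETCH_RESP_HAS_PIXELS   (1u << 0)
#define SKETCH_RESP_SRC_CONSUMED (1u << 1)

/* Only the fields added at PROTO_VERSION 1 are shown. */
struct sketch_req_decode {
    uint64_t src_pts;        /* OUTPUT vb2 timestamp */
};

struct sketch_resp_frame {
    uint32_t flags;          /* HAS_PIXELS and/or SRC_CONSUMED */
    uint64_t output_src_pts; /* frame->pts of the drained frame */
};

/* The source buffer may be released as soon as SRC_CONSUMED arrives. */
static int sketch_can_release_src(const struct sketch_resp_frame *r)
{
    return (r->flags & SKETCH_RESP_SRC_CONSUMED) != 0;
}

/* The destination buffer completes only on HAS_PIXELS, and is stamped
 * explicitly because src and dst lifecycles are decoupled.
 * Returns 1 if *dst_timestamp was written, 0 otherwise. */
static int sketch_complete_dst(const struct sketch_resp_frame *r,
                               uint64_t *dst_timestamp)
{
    if (!(r->flags & SKETCH_RESP_HAS_PIXELS))
        return 0;
    *dst_timestamp = r->output_src_pts;
    return 1;
}
```

A response carrying only SRC_CONSUMED leaves the destination parked; a
later response carrying HAS_PIXELS supplies the timestamp for it.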
The daedalus-v4l2#4 H.264 DECODE_MODE + START_CODE menu controls (a
cosmetic warning fix that landed alongside #7) are also included, so
"Unable to set control(s) error_idx=2/2" no longer fires.
Refs:
* reauktion/daedalus-v4l2#7
* reauktion/daedalus-v4l2#4
* reauktion/daedalus-v4l2#6
Bumps both the Arch PKGBUILD and the Debian build-deb.sh pins to PR
#14 merge — codec_store_buffer bounds-checks for VASliceDataBufferType.
Picks up the SIGSEGV fix for mpv --hwdec=vaapi-copy on resolution
upshift mid-stream (issue #13).
Dual-pin so check-already-published.sh detects both pool ABIs as
needing a fresh build.
Closes #60.
Resolves the malformed-patch issue from #61 (since reverted in #62)
by regenerating the 0003 patch via actual application against firefox
150.0.3 Pi-OS source.
Functional change vs prior 0003: walking hw_configs accepts
AV_HWDEVICE_TYPE_DRM (legacy) OR integer device_type values 13/14
(AV_HWDEVICE_TYPE_V4L2REQUEST in Kwiboo's no-AMF / upstream-AMF trees).
CreateV4L2RequestDeviceContext passes integer 13 (Kwiboo's value) cast
to enum AVHWDeviceType for the av_hwdevice_ctx_create call.
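
The acceptance rule can be sketched as a small predicate. This is an
illustration in C rather than the Firefox patch itself; the `sketch_`
names are hypothetical, and the DRM value is passed in by the caller
(who has AV_HWDEVICE_TYPE_DRM from libavutil) instead of being
hard-coded here.

```c
/* Integer values of AV_HWDEVICE_TYPE_V4L2REQUEST as stated above:
 * 13 on the no-AMF tree, 14 on the upstream-AMF tree. */
#define SKETCH_V4L2REQUEST_NO_AMF       13
#define SKETCH_V4L2REQUEST_UPSTREAM_AMF 14

/* Returns 1 if a hw_config with this device_type should be used.
 * drm_type is whatever AV_HWDEVICE_TYPE_DRM is in the caller's
 * headers; integers are compared because the bundled headers may
 * lack the V4L2REQUEST enumerator. */
static int sketch_accepts_device_type(int device_type, int drm_type)
{
    return device_type == drm_type ||
           device_type == SKETCH_V4L2REQUEST_NO_AMF ||
           device_type == SKETCH_V4L2REQUEST_UPSTREAM_AMF;
}
```

Comparing raw integers is brittle if either tree renumbers the enum,
which is why the two accepted values are spelled out by tree.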
Tested: applied cleanly via patch -p1 against firefox-150.0.3 source
post-Pi-OS-quilt-patches. Test build follow-up in firefox-rpios EC2
script (drops the in-source sed hack from v7-v8).
Closes part of #60 (firefox-side patch update for fourier2 ffmpeg).
Background: libavcodec61-fourier2 (Kwiboo v4l2-request-n7.1.3 backed)
registers its hwaccels with AV_HWDEVICE_TYPE_V4L2REQUEST (the dedicated
enum added in FFmpeg 7.1+), not AV_HWDEVICE_TYPE_DRM as fourier1 did.
The firefox-fourier patch #3 walked hw_configs looking only for DRM
and fell through to software for every codec.
Patch updates:
- CreateV4L2RequestDeviceContext now takes an int aDeviceType (Mozilla's
bundled libavutil headers may lack the V4L2REQUEST enumerator), passed
through to av_hwdevice_ctx_create.
- hw_configs walk accepts DRM (legacy) OR V4L2REQUEST integer value
(13 on Kwiboo's no-AMF tree, 14 on upstream-AMF tree).
- Renamed mDRMDeviceContext to mV4L2RequestDeviceContext for accuracy.
Build pkgrel will be bumped at debian-package level to +fourier2.
Upstream PR #3 — kernel per-context vb2_queue lock so concurrent
clients of /dev/video0 don't serialise on a device-wide mutex.
Pi 5 Firefox VAAPI playback (RDD + content + GPU processes each
opening the device) now works without S_FMT EBUSY collisions.
Verified on higgs: YouTube playback engages daedalus at sustained
~230 fps decode through the libavcodec dlopen path, ~7× headroom
over the 30fps@1080p Pi 5 Fourier target.
Both packages: pkgver 0.1.0.r24.f0d4186, pkgrel reset to 1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumps both Arch (PKGBUILD) and Debian (build-deb.sh) sides in one commit
this time — following the dual-pin lesson from PR #53.
77f9236 = libva PR #12 merge: src/av1.{c,h} implements av1_set_controls
mapping VAPictureParameterBufferAV1 onto struct v4l2_ctrl_av1_sequence,
queued via S_EXT_CTRLS as V4L2_CID_STATELESS_AV1_SEQUENCE. The
daedalus_v4l2 daemon track will consume the ctrl to synthesise an
OBU_SEQUENCE_HEADER and prepend it to the slice bitstream, so libdav1d
can parse the OUTPUT buffer that ffmpeg-vaapi delivers without the
sequence header.
Until the daemon-side OBU synth lands (issue #11 operator track), the
SEQUENCE ctrl is just sitting in the request unused. Harmless on the
RK3588 vpu981 hardware path (vpu981 parses OBU bytes directly, ignores
the ctrl payload).
pkgver: r382.c1bb444 -> r386.77f9236 (commit count 382 -> 386, two new
upstream commits: 9fa18f2 av1 + 77f9236 merge).
pkgrel: 1 (fresh pkgver, no rebuild-only iteration).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Upstream PR #2 landed the one-line kernel fix that was the missing
half of issue libva-v4l2-request-fourier#8: device_run now calls
v4l2_ctrl_request_setup() before reading ctrl->p_cur, so the
daedalus_h264_meta the daemon receives reflects the in-flight
media_request's bound H.264 stateless control values instead of
stale/default ones.
Pairs with libva-v4l2-request-fourier 1.0.0+r382+gc1bb444
(max_num_ref_frames fallback + Fix 4 instrumentation that exposed the
control-binding gap in the first place).
Effect on Pi 5 / CM5 hosts (higgs): ffmpeg -hwaccel vaapi against
H.264 sources now produces actual decoded content (per-frame
fnv1a hashes differ, zero MB-decode errors) instead of the
constant 0x6a6a05c5 "best-effort give-up" hash and cascading
decode warnings.
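
For anyone reproducing the per-frame check, a minimal 32-bit FNV-1a
over a byte buffer looks like this. It assumes the hashes quoted above
are the standard 32-bit variant (the 0x6a6a05c5 value is 32 bits wide);
how the frame planes are fed into the hash is not specified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1a: XOR each byte in, then multiply by the
 * FNV prime, starting from the FNV offset basis. */
static uint32_t sketch_fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;       /* offset basis 0x811c9dc5 */

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;             /* FNV prime 0x01000193 */
    }
    return h;
}
```

Identical hashes across many frames, as seen before the fix, indicate
the decoder kept emitting the same buffer contents.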
Both packages: pkgver 0.1.0.r22.462aa4b, pkgrel reset to 1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumps the libva backend pin to include marfrit/libva-v4l2-request-fourier
PR #9 — h264_set_controls fix for the bitstream-vs-session value drift
that breaks the daedalus_v4l2 strict-consumer path (issue #8):
* max_num_ref_frames fallback when VAAPI client left it 0 (count
valid DPB entries, then per-profile spec minimum)
* one-line request_log at h264_set_controls entry dumping raw
VAAPI bitfields for disambiguating remaining PPS-flag-zero
portion of #8
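
One reading of the fallback order, sketched with hypothetical names:
trust the client value when non-zero, otherwise count valid DPB
entries, and only if that is also zero use a per-profile minimum
supplied by the caller. The actual PR may combine these differently
(for example by taking a maximum), and the per-profile minimum values
are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_dpb_entry {
    int valid;
};

/* Returns the max_num_ref_frames value to program into the control. */
static uint32_t sketch_max_num_ref_frames(uint32_t from_client,
                                          const struct sketch_dpb_entry *dpb,
                                          size_t dpb_len,
                                          uint32_t profile_min)
{
    uint32_t count = 0;

    if (from_client != 0)
        return from_client;         /* client filled it in: keep it */

    for (size_t i = 0; i < dpb_len; i++)
        if (dpb[i].valid)
            count++;

    return count != 0 ? count : profile_min;
}
```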
The PR explicitly defers the deeper "profile_idc / level_idc from
bitstream" portion of #8 — VAAPI's VAPictureParameterBufferH264 omits
both fields, so a real fix needs SPS-NAL parsing or daedalus
wire-protocol pass-through. Not in this bump.
pkgver: 1.0.0.r380.9898331 -> 1.0.0.r382.c1bb444 (commit count 380->382)
pkgrel: 1 (fresh pkgver, no rebuild-only iteration)
Verified on higgs (Debian 13 trixie, gcc 14.2.0, libva 2.22.0):
clean meson build, vainfo enumerates all 8 codec profiles, multi-device
probe still wires rkvdec / rpi-hevc-dec / daedalus_v4l2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
iter12 hit a wall: Brave's ANGLE-Vulkan path requires GLES3, which
requires VK_EXT_transform_feedback, which PanVk-Bifrost did not
implement. This iter implements that extension, unlocking the full
ANGLE-Vulkan-on-Bifrost stack.
The implementation follows Panfrost-Gallium's well-validated XFB lowering
(nir_io_add_intrinsic_xfb_info + pan_nir_lower_xfb) wired into the PanVk
shader pipeline after nir_lower_io. Adds 4 XFB buffer address sysvals
plus per-draw num_vertices to the graphics sysval struct. Buffer state
is tracked on the cmd buffer; per-draw sysval upload populates either
the bound buffer's GPU address or PAN_SHADER_OOB_ADDRESS (memory-sink)
so XFB-capable pipelines used outside Begin/End survive without GPU
fault — the Panfrost-Gallium idiom from gallium/drivers/panfrost/
pan_cmdstream.c:1350.
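
The per-draw address selection can be sketched as follows. The
`sketch_` names and state layout are hypothetical, and the sink
constant is a placeholder, not the real value of
PAN_SHADER_OOB_ADDRESS.

```c
#include <stdint.h>

#define SKETCH_MAX_XFB_BUFFERS 4
/* Placeholder only; NOT Mesa's PAN_SHADER_OOB_ADDRESS value. */
#define SKETCH_OOB_ADDRESS UINT64_C(0xFFFF000000000000)

struct sketch_xfb_state {
    int active;                               /* inside Begin/End? */
    uint64_t addr[SKETCH_MAX_XFB_BUFFERS];    /* 0 = nothing bound */
};

/* Fill the four XFB address sysvals for one draw: the bound buffer's
 * GPU address while XFB is active, otherwise the memory sink, so a
 * shader with XFB stores never writes through a stale or null
 * pointer. */
static void sketch_upload_xfb_sysvals(const struct sketch_xfb_state *s,
                                      uint64_t out[SKETCH_MAX_XFB_BUFFERS])
{
    for (int i = 0; i < SKETCH_MAX_XFB_BUFFERS; i++) {
        if (s->active && s->addr[i] != 0)
            out[i] = s->addr[i];
        else
            out[i] = SKETCH_OOB_ADDRESS;
    }
}
```

This is the case probe_xfb_nodraw exercises: an XFB-capable pipeline
drawn with no Bind/Begin/End sees only sink addresses.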
Verified on PineTab2 (Mali-G52 r1 MC1, RK3566):
- /tmp/panvk-iter13/probe_xfb: 3 vertices captured byte-exact
- /tmp/panvk-iter13/probe_xfb_nodraw: XFB pipeline used without Bind/
Begin/End survives — DEVICE_LOST regression closed
- Brave 148 with --use-angle=vulkan: WebGL 2.0 (OpenGL ES 3.0) creates
cleanly, renderer reports
"ANGLE (ARM, Vulkan 1.2.335 (Mali-G52 r1 MC1), panvk)"
- chrome://gpu graphics feature status: Canvas/Compositing/OpenGL/
Rasterization/WebGL/WebGL2/WebGPU/Video Decode all hardware accelerated
Phase docs:
- ~/src/panvk-bifrost/phase4_iter13_close.md (build green)
- ~/src/panvk-bifrost/phase5_iter13_close.md (review fixes applied)
- ~/src/panvk-bifrost/phase6_iter13_close.md (Brave integration green)
pkgver bumped 26.0.6.r2 -> 26.0.6.r3; iter13 patch applied via
unified-diff (the 328-line change scope is past sed-of-individual-
lines territory). Sanity checks in prepare() verify the patch landed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>