Compare commits


18 Commits

Author SHA1 Message Date
marfrit b958ef8166 Merge pull request 'kernel: drain in-flight m2m jobs on daemon disconnect (fixes #146 D-state)' (#23) from noether/kernel-drain-inflight-on-chardev-release into main
Reviewed-on: #23
2026-05-23 15:11:40 +00:00
claude-noether 94be8c3d03 kernel: drain in-flight m2m jobs on daemon disconnect
Fixes issue #146: a daemon crash (SIGKILL, SEGV, or anything else
that triggers chardev release) leaves V4L2 consumers stuck in
unkillable TASK_UNINTERRUPTIBLE sleep when they close /dev/video0.

## Root cause

device_run() adds an entry to dev->inflight when it sends a
REQ_DECODE to the daemon, marking the m2m job as "running".
The job is only cleared via v4l2_m2m_buf_done_and_job_finish()
in daedalus_complete_resp_frame(), which only fires on RESP_FRAME.

If the daemon dies (SIGKILL, SEGV, exit) BEFORE writing the
matching RESP_FRAME:
  - the inflight entry is never popped
  - v4l2_m2m_buf_done_and_job_finish is never called
  - the m2m scheduler still thinks a job is running

Later, when the V4L2 consumer closes /dev/video0 (explicitly, or
implicitly while exiting on a signal), v4l2_m2m_ctx_release() →
v4l2_m2m_cancel_job() waits indefinitely for job_running to clear.
The consumer enters D-state and survives SIGKILL until reboot.

Reproduced on hertz 2026-05-23, kernel 6.12.75+rpt-rpi-2712:

  $ sudo kill -STOP $DAEMON_PID            # block daemon I/O
  $ ./test_m2m_decode keyframe.bin out.nv12 1920 1080 vp9 &
  $ sudo kill -9 $DAEMON_PID               # chardev_release fires
  $ kill -9 $CLIENT_PID                    # ignored — D-state
  # client stack:
  v4l2_m2m_cancel_job+0x14c [v4l2_mem2mem]
  v4l2_m2m_ctx_release+0x20 [v4l2_mem2mem]
  daedalus_release+0x2c [daedalus_v4l2]
  v4l2_release+0x7c [videodev]
  __fput → do_exit → SIGKILL never delivered

## Fix

New API daedalus_drain_inflight_on_disconnect() in main.{c,h}:
walks the in-flight list, marks both src+dst buffers
VB2_BUF_STATE_ERROR via v4l2_m2m_buf_done_and_job_finish(), and
releases the bound media_request, if any.  This is the same
completion sequence daedalus_complete_resp_frame() runs on the
success path, except that every in-flight entry completes with
state = ERROR.

chardev_release calls the drain after flushing dev->req_queue
(messages still in req_queue had not yet been handed to the daemon,
so they don't need the m2m-job-finish dance; freeing them is
sufficient).  The order matters: queue first (cheap), then m2m
drain (heavier, takes the inflight list).

Locking: list_splice_init under inflight_lock to take the entire
list atomically; lock dropped before iterating because
v4l2_m2m_buf_done_and_job_finish can sleep via vb2's buffer-done
dispatch and can re-enter device_run via the scheduler (which
would need inflight_lock again on the next REQ_DECODE).
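The splice-then-drain sequence can be modelled in userspace.  This
is a minimal sketch, not the kernel code: a pthread mutex stands in
for inflight_lock, a singly linked list for the list_head, and the
hypothetical finish_job_error() for
v4l2_m2m_buf_done_and_job_finish(..., VB2_BUF_STATE_ERROR).  All type
and function names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model types; not the daedalus_v4l2 structures. */
enum buf_state { BUF_ACTIVE, BUF_ERROR };

struct inflight {
	unsigned int cookie;
	enum buf_state src_state;
	enum buf_state dst_state;
	struct inflight *next;
};

struct dev {
	pthread_mutex_t inflight_lock;	/* models dev->inflight_lock */
	struct inflight *inflight;	/* models dev->inflight */
};

/*
 * Stand-in for v4l2_m2m_buf_done_and_job_finish(..., ERROR): completes
 * src and dst of one job.  The real call can reach the m2m scheduler,
 * so it must run without inflight_lock held.
 */
static void finish_job_error(struct inflight *e)
{
	assert(e->src_state == BUF_ACTIVE);	/* each job finishes once */
	e->src_state = BUF_ERROR;
	e->dst_state = BUF_ERROR;
}

/* Returns the number of in-flight jobs completed with an error. */
static size_t drain_inflight_model(struct dev *dev)
{
	struct inflight *list;
	size_t n = 0;

	/* 1. Take the whole list atomically (list_splice_init analogue). */
	pthread_mutex_lock(&dev->inflight_lock);
	list = dev->inflight;
	dev->inflight = NULL;
	pthread_mutex_unlock(&dev->inflight_lock);

	/* 2. Complete every entry with the lock dropped. */
	while (list) {
		struct inflight *e = list;

		list = e->next;
		e->next = NULL;
		finish_job_error(e);
		n++;
	}
	return n;
}
```

Anything device_run() adds after step 1 lands on the fresh, empty
list rather than on the one being drained.  That is the window
described under "Followup not in scope".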

## Verification path

Cannot rmmod the running module on hertz right now — the D-state
corpse from the repro session pins the refcount.  Verification
of the fixed module needs a reboot or fresh test host:

  $ sudo reboot                            # clears hung client
  $ sudo make modules_install              # install new .ko
  $ sudo modprobe daedalus_v4l2
  $ # rerun the repro script — client should die cleanly with
  $ # an -EIO / similar return from poll/DQBUF instead of hanging.

Build: clean on Linux 6.12.75 + rpt-rpi-2712, no new warnings.
The pre-existing "frame size 2128 > 2048" warning on
daedalus_device_run is unchanged by this commit.

## Followup not in scope

If a new V4L2 consumer races a REQ_DECODE through device_run
AFTER the drain has spliced the list (but before the daemon
chardev is reopened), the new entry sits in a freshly-empty
inflight list and the same hang can recur for that consumer
when the systemd auto-restart of the daemon either fails or
takes longer than the consumer's patience.  A secondary
safeguard would be to fail-fast in device_run when dev->chardev
is unopened — proposing as a separate ticket if this race
materialises in practice.
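A minimal userspace sketch of the proposed fail-fast check follows.
It is not part of this commit, and every name in it (struct
dev_model, run_job, chardev_release_model) is hypothetical.  The
design point is that the "daemon present?" check and the inflight
insertion happen under the same lock that release uses to clear the
flag and take the list.  A disconnect therefore cannot land between
the two steps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum job_outcome {
	JOB_SENT_TO_DAEMON,	/* entry added to inflight, REQ_DECODE queued */
	JOB_FAILED_FAST		/* caller finishes the job with an error now */
};

struct dev_model {
	pthread_mutex_t inflight_lock;
	bool chardev_open;		/* cleared by chardev release */
	unsigned int inflight_count;	/* stands in for the inflight list */
};

static enum job_outcome run_job(struct dev_model *dev)
{
	enum job_outcome out;

	pthread_mutex_lock(&dev->inflight_lock);
	if (dev->chardev_open) {
		dev->inflight_count++;
		out = JOB_SENT_TO_DAEMON;
	} else {
		out = JOB_FAILED_FAST;
	}
	pthread_mutex_unlock(&dev->inflight_lock);

	/* On JOB_FAILED_FAST the m2m job would be completed with an error
	 * outside the lock, so cancel_job never waits on it. */
	return out;
}

/* Clear the flag and take the inflight entries in one critical section. */
static unsigned int chardev_release_model(struct dev_model *dev)
{
	unsigned int taken;

	pthread_mutex_lock(&dev->inflight_lock);
	dev->chardev_open = false;
	taken = dev->inflight_count;
	dev->inflight_count = 0;
	pthread_mutex_unlock(&dev->inflight_lock);
	return taken;	/* jobs the drain must now finish with an error */
}
```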

Closes #146.
2026-05-23 17:06:06 +02:00
marfrit 872eec505e Merge pull request 'proto: bump PROTO_MAX_PAYLOAD 64 KiB → 1 MiB (closes #19)' (#20) from noether/issue-19-bump-proto-payload-1mib into main
Reviewed-on: #20
2026-05-22 18:47:46 +00:00
marfrit ee42419479 proto: bump PROTO_MAX_PAYLOAD 64 KiB -> 1 MiB (closes #19)
Real H.264 access units routinely exceed the previous 64 KiB cap
on the chardev wire protocol:

  720p worst-case I-frame  ~200 KiB
  1080p worst-case I-frame ~500 KiB

libva-v4l2-request-fourier detects the under-sized OUTPUT_MPLANE
buffer and tries to grow it via VIDIOC_S_FMT to 147456 B, but
daedalus_fill_output_fmt unconditionally pins sizeimage to
DAEDALUS_MAX_BITSTREAM (= 65484) regardless of userspace's
request.  Firefox loses the slice and falls back to libmozavcodec
SW for the rest of the session.

Bumping the wire-protocol cap to 1 MiB lifts the kernel
OUTPUT_MPLANE sizeimage with it (DAEDALUS_MAX_BITSTREAM is derived
from the same #define).  All allocations (kernel kmalloc /
kmemdup, daemon read buffer, vb2 plane backing) are dynamic and
sized per-payload at runtime, so the only growth is the daemon's
startup read buffer (one ~1 MiB allocation per daemon process)
and the V4L2 OUTPUT_MPLANE per-buffer size.  KMALLOC_MAX_SIZE on
aarch64 SLUB is several MiB; 1 MiB is well within bounds.  Other
V4L2 stateless decoders (cedrus, rkvdec, hantro) report 1-4 MiB
OUTPUT_MPLANE sizeimage — this puts daedalus at the conservative
end of normal.

## Compatibility

#define-only change; struct layout unchanged.  But the
effective cap is the smaller of (kernel cap, daemon cap), so:
- new daemon + stale kernel: still capped at 64 KiB until the
  kernel module rebuilds.
- new kernel + stale daemon: same.
Lock-step install of daedalus-v4l2 + daedalus-v4l2-dkms is
therefore required for the fix to take effect; mirrors the
PR-#7/#8 cadence.
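The figures above are consistent with a fixed 52-byte overhead
between the wire cap and DAEDALUS_MAX_BITSTREAM (65536 - 65484 =
1048576 - 1048524 = 52).  The sketch below assumes exactly that
overhead.  Its macro names are hypothetical mirrors, not the real
header.  It also encodes the compatibility rule that the effective
cap is whichever side was built with the smaller limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors; the 52-byte overhead is inferred from the
 * quoted figures, not read from the daedalus headers. */
#define PAYLOAD_CAP_OLD		(64u * 1024u)	/* 64 KiB */
#define PAYLOAD_CAP_NEW		(1024u * 1024u)	/* 1 MiB */
#define BITSTREAM_OVERHEAD	52u

/* Largest bitstream that fits in one payload of the given cap. */
static uint32_t max_bitstream(uint32_t payload_cap)
{
	assert(payload_cap > BITSTREAM_OVERHEAD);
	return payload_cap - BITSTREAM_OVERHEAD;
}

/* Kernel and daemon are built separately; the smaller cap wins. */
static uint32_t effective_max_bitstream(uint32_t kernel_cap,
				       uint32_t daemon_cap)
{
	uint32_t k = max_bitstream(kernel_cap);
	uint32_t d = max_bitstream(daemon_cap);

	return k < d ? k : d;
}
```

With either side stale, libva's 147456 B request still exceeds the
effective cap.  That is why both packages have to move together.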

## NOT changed in this commit

- daedalus_fill_output_fmt still hardcodes sizeimage =
  DAEDALUS_MAX_BITSTREAM regardless of userspace request.
  Acceptable: vb2 will allocate up to that, and libva's resize-
  test now sees the kernel report a sizeimage at-least-as-large
  as what it asked for (147456 < 1048524).  A future cleanup
  could respect userspace's S_FMT.sizeimage clamped to the cap,
  to save memory on tiny streams.
- chardev kmalloc → kvmalloc swap (only matters above
  KMALLOC_MAX_SIZE, not here).
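The deferred S_FMT cleanup could look roughly like this.  The helper
is hypothetical, not existing daedalus code.  It assumes that a
requested sizeimage of 0 should keep today's behaviour of reporting
the full cap.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BITSTREAM 1048524u	/* DAEDALUS_MAX_BITSTREAM after this commit */

/* Honour userspace's S_FMT sizeimage, clamped to the protocol cap. */
static uint32_t output_sizeimage(uint32_t requested)
{
	if (requested == 0 || requested > MAX_BITSTREAM)
		return MAX_BITSTREAM;
	return requested;
}
```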

Refs #19.
2026-05-22 20:46:27 +02:00
marfrit 1d8f5af164 Merge pull request 'daemon: filter tiny pause-time bitstreams (closes #17)' (#18) from noether/issue-17-tiny-bitstream-filter into main
Reviewed-on: #18
2026-05-22 16:14:56 +00:00
marfrit 3e4e6e8eae daemon: filter tiny pause-time bitstreams (closes #17)
libva-v4l2-request-fourier flushes a stub packet into the V4L2
OUTPUT_MPLANE queue at playback-pause boundaries.  The payload is
shorter than any parseable H.264 NAL (3-byte start code + 1-byte
NAL header = 4 bytes minimum); avcodec_send_packet returns
AVERROR_INVALIDDATA (-1094995529), which propagated to the kernel
as a decode failure.  Firefox then marked H.264-via-VAAPI as
broken for the session and routed every subsequent frame to
libmozavcodec SW — pause never recovered to HW.
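The number -1094995529 is AVERROR_INVALIDDATA.  FFmpeg builds most
AVERROR values as the negated little-endian FourCC of a tag (here
FFERRTAG('I','N','D','A')), which is why the value looks arbitrary in
logs.  The sketch below re-implements that arithmetic locally, so it
builds without FFmpeg headers.  The helper names are ours, not
libavutil's.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as libavutil's MKTAG, negated as FFERRTAG does.
 * Valid for tags whose last byte is below 0x80 (FFmpeg's ASCII tags). */
static int32_t fferrtag(char a, char b, char c, char d)
{
	uint32_t tag = (uint32_t)(unsigned char)a |
		       ((uint32_t)(unsigned char)b << 8) |
		       ((uint32_t)(unsigned char)c << 16) |
		       ((uint32_t)(unsigned char)d << 24);

	return -(int32_t)tag;
}

/* Recover the four tag characters from a negative FFmpeg error code. */
static void fferrtag_chars(int32_t err, char out[5])
{
	uint32_t tag = (uint32_t)(-err);

	for (int i = 0; i < 4; i++)
		out[i] = (char)((tag >> (8 * i)) & 0xff);
	out[4] = '\0';
}
```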

At the REQ_DECODE entry in chardev_client.c::handle_req_decode,
short-circuit any bitstream below the minimum-parseable threshold:
log INFO, skip daedalus_decoder_run_request, and reply RESP_FRAME
with status=DAEDALUS_DECODE_NO_FRAME so libva's V4L2 surface pool
stays healthy and Firefox doesn't see a failure.
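The guard reduces to a length check made before any libavcodec
call.  Below is a minimal sketch with hypothetical names
(classify_req, enum req_action).  The real handler replies with
DAEDALUS_DECODE_NO_FRAME as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest payload that can hold a 3-byte Annex B start code plus a
 * 1-byte NAL header. */
#define MIN_PARSEABLE_BITSTREAM 4u

enum req_action { REQ_DECODE_NORMALLY, REQ_REPLY_NO_FRAME };

static enum req_action classify_req(uint32_t bitstream_len)
{
	/* Anything shorter can only make avcodec_send_packet() fail with
	 * AVERROR_INVALIDDATA, so answer NO_FRAME and leave the codec
	 * context untouched. */
	if (bitstream_len < MIN_PARSEABLE_BITSTREAM)
		return REQ_REPLY_NO_FRAME;
	return REQ_DECODE_NORMALLY;
}
```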

Repro: Pi CM5 trixie, daedalus-v4l2 0.1.0+r41 + ffmpeg-v4l2-
request-fourier 2:8.1+rfourier+gb57fbbe-9, Firefox YouTube avc1.
Play → daemon decodes at ~46 fps.  Pause ≥ 1s.  Resume → daemon
silent; sudo journalctl -u daedalus-v4l2 --since '-10s' | grep -c
'decoder: OK' prints 0.  Last entry before silence:

    REQ_DECODE cookie=N codec=3 bitstream=3 bytes ...
    [h264 @ ...] no frame!
    [ERR] decoder: avcodec_send_packet failed: -1094995529

After this fix the 3-byte sentinel logs as 'tiny bitstream 3
bytes — dropping as no-op' and the libavcodec context is
untouched; the next real REQ_DECODE proceeds normally.

Scope NOT covered (intentionally deferred):
- A more general "tolerate AVERROR_INVALIDDATA mid-stream" path.
  Worth doing later but masks unrelated bugs.
- Investigating WHY libva sends the 3-byte sentinel on pause.
  Likely an upstream libva-v4l2-request-fourier issue; tracked
  separately if this filter is not enough.

Wire protocol unchanged.  No DAEDALUS_PROTO_VERSION bump.
2026-05-22 17:26:25 +02:00
marfrit 6e6dfa144d Merge pull request 'daemon: dlopen Kwiboo fork's soname 62 (FFmpeg 8.1 at /opt/fourier)' (#16) from noether/daemon-dlopen-kwiboo-soname62 into main
Reviewed-on: #16
2026-05-21 19:20:22 +00:00
claude-noether 514da29a73 daemon: dlopen Kwiboo fork's libavcodec.so.62 / libavformat.so.62 / libavutil.so.60
Switch the daemon's runtime dlopen targets from Debian-stock soname
61/61/59 (FFmpeg 7.1.3) to the Kwiboo fourier fork's soname
62/62/60 (FFmpeg 8.1) installed at the /opt/fourier prefix.

Why
---
The substitution arc tracked at daedalus-v4l2#11 needs daedalus-
fourier kernel calls woven into libavcodec's H264DSPContext NEON
init (replacing ff_h264_idct_add_neon etc. with thunks calling
daedalus_recipe_dispatch_h264_*).  We do that via patches in the
ffmpeg-v4l2-request-fourier package source — which we own, in
marfrit-packages, alongside the existing libudev-bypass and
nv15-to-p010 patches.  But that package builds the Kwiboo fork at
soname 62 / /opt/fourier.  The daemon currently dlopens soname 61
(Debian-stock + a separately-built +fourier2 patch that isn't in
marfrit-packages' source tree), so substitution patches there
wouldn't reach the daemon.

Switching to soname 62 routes the daemon through the package we
control — first step toward landing daedalus-fourier kernel
substitution into the production decode path.

Compat
------
- /opt/fourier libs are already on every host running the daemon
  (hard build-dep of ffmpeg-v4l2-request-fourier).  Firefox-fourier
  and mpv-fourier already dlopen them via the same path.
- /etc/ld.so.conf.d/fourier.conf entry resolves the new sonames
  from /opt/fourier/lib via the ld cache; dlopen-by-soname works
  without LD_LIBRARY_PATH wrappers.
- Build-side: daemon's pkg_check_modules picks up libav*.pc from
  /opt/fourier/lib/pkgconfig when PKG_CONFIG_PATH includes that
  directory (build-deb.sh follow-up will set it).
- API surface unchanged: avcodec_send_packet / receive_frame /
  AVCodecContext flags / AVFrame fields are all stable between
  FFmpeg 7.1 and 8.1.  Verified clean cross-compile on hertz.
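The dynamic loader resolves a bare soname through the ld.so cache,
so the dlopen() call needs no /opt/fourier path.  Below is a minimal
sketch of opening the three fourier sonames, assuming the daemon
opens them directly by soname.  The helper open_sonames is
hypothetical.  It opens libavutil first, although dlopen() would load
it as a dependency either way, and it unwinds on the first failure.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

static const char *const fourier_sonames[] = {
	"libavutil.so.60",	/* FFmpeg 8.1 under /opt/fourier/lib */
	"libavcodec.so.62",
	"libavformat.so.62",
};

/* Open every soname in order; on failure close what was opened.
 * RTLD_LOCAL keeps these symbols out of the global namespace. */
static int open_sonames(const char *const *names, size_t n, void **handles)
{
	for (size_t i = 0; i < n; i++) {
		handles[i] = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);
		if (!handles[i]) {
			fprintf(stderr, "dlopen(%s): %s\n", names[i], dlerror());
			while (i-- > 0) {
				dlclose(handles[i]);
				handles[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

On a host with the fourier package installed,
open_sonames(fourier_sonames, 3, handles) would be expected to
return 0.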

Wire protocol unchanged.  No kmod bump.

Next step (follow-up PRs)
-------------------------
1. ffmpeg-v4l2-request-fourier patch: add 0003-daedalus-fourier-
   substitute-h264-idct4.patch that replaces ff_h264_idct_add_neon
   in libavcodec/aarch64/h264dsp_init_aarch64.c with a thunk
   calling daedalus_recipe_dispatch_h264_idct4.
2. Repeat for IDCT 8×8, deblock luma-v, qpel mc20 (one kernel per
   PR for reviewability; bench delta + decode_us delta documented
   per substitution).
3. marfrit-packages bump to pick up the new daemon + the substituted
   fourier package.
2026-05-21 21:19:24 +02:00
marfrit 3bc0da168c Merge pull request 'daemon: per-frame decode_us + periodic stats (#11 step 1)' (#15) from noether/daemon-decode-stats into main
Reviewed-on: #15
2026-05-21 18:26:50 +00:00
claude-noether 814b74d0bb daemon: per-frame decode_us + periodic stats summary (#11 step 1)
Establishes observable baseline metrics before any daedalus-fourier
kernel substitution lands.  Step 1 of the daemon-rewrite arc tracked
at daedalus-v4l2#11.

Changes
-------
- Per-frame `decoder: OK ...` log line now carries decode_us=N (the
  send_packet + receive_frame wall-clock cost in microseconds —
  exclusively the libavcodec round-trip, not the bitstream pack /
  SPS-PPS synth / pack-to-planes work).
- New "decoder stats" summary line every DAEDALUS_STATS_EVERY (60)
  decoded frames, reporting: codec, frame count, window seconds,
  fps, avg decode_us, MBs/s throughput, bytes/MB bitrate.

Sample
------
  decoder stats: codec=h264 frames=300 window=12.32s fps=24.35
                 avg_decode_us=4216.4 mbs_per_s=87643 bs_b_per_mb=1.56

What this tells us
------------------
Steady-state on higgs (Pi CM5) decoding bbb_720p_h264.mp4:
~4 ms decode_us, ~90 K MBs/s, well under the daedalus-fourier
NEON kernel ceilings (IDCT 4×4 @ 175 Mblocks/s, deblock @ 92 Medges/s,
qpel mc20 @ 131 Mblocks/s — all 100-1000× over our actual workload).

This means the 4 ms/frame is mostly libavcodec's CABAC + MV prediction +
intra prediction overhead, NOT the pixel-math primitives.
Substituting a single primitive would shave only a small slice of
the 4 ms.  Useful as guidance for the upcoming substitution work —
we'll pick the primitive with the largest cycle cost relative to
the alternative, and measure CPU saved per substitution.

No behaviour change: counters are static + unsynchronised (the
chardev event loop is single-threaded); reset when codec_id changes.
clock_gettime(CLOCK_MONOTONIC) for timing.
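The accumulator described above can be sketched as follows.  Field
and function names are hypothetical, and a struct replaces the
daemon's static counters so the logic can be tested.  Macroblock
counts assume 16×16 macroblocks, with partial blocks rounded up.
Assuming the sample came from the 720p stream, that is 80 × 45 =
3600 MBs per frame.  300 frames in 12.32 s then come to roughly
87.7 K MBs/s, in line with the reported mbs_per_s=87643 given the
rounded window.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define STATS_EVERY 60u	/* mirrors DAEDALUS_STATS_EVERY */

struct decode_stats {
	int codec_id;
	uint32_t frames;
	double window_start_s, sum_decode_us;
	uint64_t sum_bitstream_bytes, sum_mbs;
};

struct stats_summary {
	uint32_t frames;
	double window_s, fps, avg_decode_us, mbs_per_s, bs_b_per_mb;
};

/* Monotonic clock in seconds, for decode_us and the window. */
static double monotonic_s(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void stats_reset(struct decode_stats *s, int codec_id, double now_s)
{
	s->codec_id = codec_id;
	s->frames = 0;
	s->window_start_s = now_s;
	s->sum_decode_us = 0.0;
	s->sum_bitstream_bytes = 0;
	s->sum_mbs = 0;
}

/* Record one OK frame; returns 1 and fills *out every STATS_EVERY frames. */
static int stats_add(struct decode_stats *s, int codec_id, double now_s,
		     double decode_us, uint32_t bitstream_bytes,
		     uint32_t width, uint32_t height, struct stats_summary *out)
{
	double window;

	if (codec_id != s->codec_id)	/* different stream: start over */
		stats_reset(s, codec_id, now_s);
	s->frames++;
	s->sum_decode_us += decode_us;
	s->sum_bitstream_bytes += bitstream_bytes;
	s->sum_mbs += (uint64_t)((width + 15) / 16) * ((height + 15) / 16);
	if (s->frames < STATS_EVERY)
		return 0;

	window = now_s - s->window_start_s;
	if (window <= 0.0)
		window = 1e-9;		/* guard the divisions below */
	out->frames = s->frames;
	out->window_s = window;
	out->fps = s->frames / window;
	out->avg_decode_us = s->sum_decode_us / s->frames;
	out->mbs_per_s = (double)s->sum_mbs / window;
	out->bs_b_per_mb = (double)s->sum_bitstream_bytes / (double)s->sum_mbs;
	stats_reset(s, codec_id, now_s);
	return 1;
}

static void stats_print(const struct stats_summary *m)
{
	printf("decoder stats: frames=%u window=%.2fs fps=%.2f "
	       "avg_decode_us=%.1f mbs_per_s=%.0f bs_b_per_mb=%.2f\n",
	       m->frames, m->window_s, m->fps, m->avg_decode_us,
	       m->mbs_per_s, m->bs_b_per_mb);
}
```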
2026-05-21 20:17:09 +02:00
marfrit 77e14e5a19 Merge pull request 'daemon: link daedalus-fourier + log substrate availability at startup' (#13) from noether/daemon-link-daedalus-fourier into main
Reviewed-on: #13
2026-05-21 16:35:38 +00:00
claude-noether 88b2ebfaa9 daemon: link daedalus-fourier + log substrate availability at startup
First incremental step toward H.264 daemon-rewrite (daedalus-v4l2#11):
make the daedalus-fourier kernel library available to the daemon
process so subsequent patches can substitute its primitives
(IDCT 4×4, IDCT 8×8, luma vertical deblock, etc.) for libavcodec's
per-MB pixel math.

This patch does NOT yet dispatch any kernels.  It only:

  - Adds `pkg_check_modules(DAEDALUS_FOURIER REQUIRED daedalus-fourier)`
    to the daemon's CMakeLists, with explicit link ordering
    (libdaedalus_core.a must precede -lvulkan because the static
    archive references vulkan symbols and the linker resolves
    left-to-right).  We bypass IMPORTED_TARGET because pkg-config's
    Requires.private chain leaves CMake's dependency graph reordering
    the archive after -lvulkan, breaking the static link.

  - Calls daedalus_ctx_create_no_qpu() at daemon startup, logs the
    substrate-availability line, destroys the context at exit.
    no_qpu mode skips V3D Vulkan probe — proves linkage works
    without depending on shader-path resolution (which is a
    separate piece of work, since v3d_runner currently loads
.spv files from cwd-relative paths and consumers would need
    a search path override).

Sample journal line:

  [2026-05-21 17:59:35.271 INFO] daedalus-fourier: linked, ctx alive
  (no_qpu mode; has_qpu=0)

Build-test verified on hertz (Pi 5 dev host) against an installed
copy of daedalus-fourier r35+gd87239d (from marfrit/daedalus-fourier
PR #1).  Binary links cleanly, --help prints, daemon mode opens
chardev (fails predictably on hertz which has no daedalus_v4l2
kmod; on higgs this is the existing working path).

Follow-up patches per daedalus-v4l2#11:

  1. Instrument the existing libavcodec decode path to count
     per-frame IDCT blocks / deblock edges / MC tiles so we have
     a baseline of what work the daemon dispatches for a typical
     YouTube H.264 stream.
  2. Substitute daedalus-fourier kernels one at a time, measuring
     CPU saved per substitution.
  3. Wire shader path resolution into daedalus_ctx_create() for
     the QPU substrate (V3D opportunistic helper paths).

Wire protocol unchanged.  DAEDALUS_PROTO_VERSION stays at 0.
2026-05-21 18:00:46 +02:00
marfrit 64b9599e47 Merge pull request 'daemon: AV_CODEC_FLAG_LOW_DELAY for H.264 — implements #11 part (2)' (#12) from noether/daemon-low-delay-h264 into main
Reviewed-on: #12
2026-05-21 15:17:57 +00:00
claude-noether 234a103084 daemon: AV_CODEC_FLAG_LOW_DELAY for H.264 — fix display-reorder breaking V4L2 1:1
Force libavcodec's H.264 decoder to emit frames in DECODE order
(one frame per send_packet, no internal display-order reorder
queue).  Single-line addition: ctx->flags |= AV_CODEC_FLAG_LOW_DELAY
before avcodec_open2, gated on codec_id == DAEDALUS_CODEC_H264.

Closes daedalus-v4l2#11 part (2).

Background
----------
PR #7's "parking design" approach to the H.264 display-reorder
problem broke libva-v4l2-request-fourier's 1:1 CAPTURE-completion
contract (see #9 + #10).  After the revert, the visible "2 1 4 3"
pair-swap regressed and the only path forward was to align the
daemon's output ordering with what V4L2 stateless clients expect:
**decode order, one CAPTURE buffer per OUTPUT slice, with display
reorder pushed upstream to ffmpeg-vaapi's per-VAAPI-surface POC
logic** (which it already does correctly for every real H.264
hardware decoder via VAPictureParameterBufferH264).
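A toy reorder buffer illustrates this split: the daemon emits
pictures in decode order, and the client recovers display order from
each picture's POC.  This is a simplified bumping process with a
fixed depth, not FFmpeg's actual H.264 output logic.  Names and POC
values are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REORDER_DEPTH 16

/* Remove and return the smallest POC held in buf[0..*count). */
static int pop_min_poc(int *buf, size_t *count)
{
	size_t min_i = 0;
	int poc;

	assert(*count > 0);
	for (size_t i = 1; i < *count; i++)
		if (buf[i] < buf[min_i])
			min_i = i;
	poc = buf[min_i];
	buf[min_i] = buf[*count - 1];	/* order inside buf is irrelevant */
	(*count)--;
	return poc;
}

/* Emit decode-order POCs in display order, holding at most `depth`
 * pictures; returns the number of pictures written to display_poc. */
static size_t reorder_by_poc(const int *decode_poc, size_t n, size_t depth,
			     int *display_poc)
{
	int buf[MAX_REORDER_DEPTH + 1];
	size_t count = 0, out = 0;

	assert(depth <= MAX_REORDER_DEPTH);
	for (size_t i = 0; i < n; i++) {
		buf[count++] = decode_poc[i];
		if (count > depth)
			display_poc[out++] = pop_min_poc(buf, &count);
	}
	while (count > 0)	/* flush at end of stream */
		display_poc[out++] = pop_min_poc(buf, &count);
	return out;
}
```

With depth 0 the buffer passes decode order straight through.  A
client would display that order if nothing reordered the output.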

How LOW_DELAY does this
-----------------------
Inside libavcodec/h264dec.c, the flag sets h->low_delay = 1.
h264_select_output_frame (h264_picture.c) emits the just-decoded
picture immediately instead of routing through the display-order
DPB output queue.  DPB management for reference frames
(short_ref / long_ref) is unaffected — B-frame decoding
correctness is preserved; only the output buffering is bypassed.

Skipped for VP9 / AV1: those codecs don't reorder internally,
so the flag would be a no-op there.

Verified
--------
On higgs (Pi CM5, 6.18.29+rpt-rpi-2712), test daemon hot-swapped
into /usr/bin/daedalus_v4l2_daemon, mpv --hwdec=vaapi-copy
--frames=300 against bbb_720p_h264.mp4: 311 REQ_DECODEs received,
308 successful "decoder: OK" responses (99.04% steady-state
delivery — 3 lost at GOP boundaries, no compounding drift).
mpv plays to its --frames cap and exits cleanly with "End of
file".  No "Unable to dequeue buffer", no "Failed to end picture
decode", no "AVHWFramesContext: Failed to sync surface" — all
the failures from #9 are gone.

Builds clean against ffmpeg-v4l2-request-fourier libavcodec.
2026-05-21 17:14:33 +02:00
marfrit 5d8b4369e5 Merge pull request 'kernel + daemon: revert PRs #7 + #8 (parking design incompatible with V4L2 stateless 1:1 expectation)' (#10) from noether/revert-parking-pr7-pr8 into main
Reviewed-on: #10
2026-05-21 13:39:09 +00:00
marfrit 714d781d22 Revert "Merge pull request 'kernel + daemon: H.264 B-frame display reorder fix (closes #6)' (#7) from noether/kernel-daemon-h264-reorder-fix into main"
This reverts commit 79256dc7ef, reversing
changes made to 7ff2d897ea.
2026-05-21 14:40:59 +02:00
marfrit 49e60c9bba Revert "Merge pull request 'kernel: claim src/dst at device_run, not at buf_done (fixes panic from #7)' (#8) from noether/kernel-claim-bufs-at-device-run into main"
This reverts commit 6ffe92bcac, reversing
changes made to 79256dc7ef.
2026-05-21 14:40:52 +02:00
marfrit 6ffe92bcac Merge pull request 'kernel: claim src/dst at device_run, not at buf_done (fixes panic from #7)' (#8) from noether/kernel-claim-bufs-at-device-run into main
Reviewed-on: #8
2026-05-21 11:54:52 +00:00
11 changed files with 616 additions and 749 deletions
+24
@@ -28,6 +28,20 @@ find_package(PkgConfig REQUIRED)
pkg_check_modules(FFMPEG REQUIRED IMPORTED_TARGET
libavformat libavcodec libavutil)
# daedalus-fourier — VC VII (V3D) + ARM NEON back-end kernel library.
# Linked statically. Today only the no-QPU smoke-test path is wired
# (a ctx_create_no_qpu at daemon startup, log-and-destroy at exit);
# follow-up patches (per daedalus-v4l2#11) substitute the
# `daedalus_recipe_dispatch_h264_*` family for libavcodec's per-MB
# pixel primitives, one cycle at a time.
#
# We bypass IMPORTED_TARGET and consume pkg-config's static variables
# (--static --libs path) directly so we control the link order:
# libdaedalus_core.a must precede -lvulkan because the static archive
# references vulkan symbols and the linker resolves left-to-right.
pkg_check_modules(DAEDALUS_FOURIER REQUIRED daedalus-fourier)
find_package(Vulkan REQUIRED)
add_executable(daedalus_v4l2_daemon
src/main.c
src/ffmpeg_loader.c
@@ -45,13 +59,23 @@ target_include_directories(daedalus_v4l2_daemon
src
${CMAKE_CURRENT_SOURCE_DIR}/../include
${FFMPEG_INCLUDE_DIRS}
${DAEDALUS_FOURIER_INCLUDE_DIRS}
)
# dl for dlopen, pthread for future threading work.
target_link_directories(daedalus_v4l2_daemon
PRIVATE
${DAEDALUS_FOURIER_LIBRARY_DIRS}
)
target_link_libraries(daedalus_v4l2_daemon
PRIVATE
dl
pthread
# Order matters: libdaedalus_core.a first (so its undefined
# vulkan symbols register), then -lvulkan to satisfy them.
${DAEDALUS_FOURIER_LIBRARIES}
Vulkan::Vulkan
)
install(TARGETS daedalus_v4l2_daemon
+63 -241
@@ -133,288 +133,110 @@ static int send_response(struct chardev_client *cli, uint32_t type,
return rc;
}
/*
* Register a new (src_pts → cookie) mapping in the pending table.
* Reuses an existing slot for src_pts if one exists (defensive — the
* kernel should never re-use the same src_pts for two live cookies,
* but libva running against a test client without timestamps might
* send all-zero src_pts; collapse them onto the latest cookie so the
* 1:1-per-stream case keeps working). Returns 0 on success, -ENOSPC
* if the table is full.
*/
static int pending_register(struct chardev_client *cli, uint64_t src_pts,
uint32_t cookie,
const struct daedalus_req_decode *req)
{
int free_slot = -1;
int i;
for (i = 0; i < DAEDALUS_MAX_PENDING_COOKIES; i++) {
if (cli->pending[i].used && cli->pending[i].src_pts == src_pts) {
cli->pending[i].cookie = cookie;
cli->pending[i].cached_req = *req;
return 0;
}
if (!cli->pending[i].used && free_slot < 0)
free_slot = i;
}
if (free_slot < 0) {
log_err("pending: table full registering cookie=%u src_pts=%llu",
cookie, (unsigned long long) src_pts);
return -ENOSPC;
}
cli->pending[free_slot].used = 1;
cli->pending[free_slot].src_pts = src_pts;
cli->pending[free_slot].cookie = cookie;
cli->pending[free_slot].cached_req = *req;
return 0;
}
/*
* Look up the cookie + cached REQ_DECODE that originally introduced
* @src_pts. Returns 0 + populates @cookie_out / @req_out, or -ENOENT
* if no match (likely a daemon bug or codec output we can't route).
*/
static int pending_lookup(const struct chardev_client *cli,
uint64_t src_pts,
uint32_t *cookie_out,
struct daedalus_req_decode *req_out)
{
int i;
for (i = 0; i < DAEDALUS_MAX_PENDING_COOKIES; i++) {
if (cli->pending[i].used &&
cli->pending[i].src_pts == src_pts) {
*cookie_out = cli->pending[i].cookie;
*req_out = cli->pending[i].cached_req;
return 0;
}
}
return -ENOENT;
}
static void pending_release(struct chardev_client *cli, uint64_t src_pts)
{
int i;
for (i = 0; i < DAEDALUS_MAX_PENDING_COOKIES; i++) {
if (cli->pending[i].used &&
cli->pending[i].src_pts == src_pts) {
cli->pending[i].used = 0;
cli->pending[i].src_pts = 0;
cli->pending[i].cookie = 0;
return;
}
}
}
/*
* Pack the daemon's current AVFrame into the CAPTURE buffer owned by
* @owner_cookie, then ship RESP_FRAME with the flags caller asked for.
* Returns 0 on success; -errno on GET_DMABUF / mmap failure (RESP is
* still emitted so the kernel doesn't park the dst buffer forever).
*/
static int deliver_frame_to_cookie(struct chardev_client *cli,
uint32_t owner_cookie,
const struct daedalus_req_decode *owner_req,
struct daedalus_resp_frame *resp,
uint32_t resp_flags)
{
struct daedalus_capture_planes planes;
int orc;
orc = daedalus_capture_planes_open(cli->fd, owner_cookie, owner_req,
&planes);
if (orc < 0) {
log_warn("drain: GET_DMABUF cookie=%u failed (%d); RESP metadata-only",
owner_cookie, orc);
} else {
(void) daedalus_decoder_pack_current(cli->decoder, &planes,
owner_req->capture_pix_fmt);
daedalus_capture_planes_close(&planes);
}
resp->flags |= resp_flags;
return send_response(cli, DAEDALUS_MSG_RESP_FRAME, owner_cookie,
resp, sizeof(*resp));
}
static int handle_req_decode(struct chardev_client *cli,
const struct daedalus_msg_hdr *hdr,
const uint8_t *payload)
{
struct daedalus_req_decode req;
struct daedalus_resp_frame resp;
struct daedalus_capture_planes planes;
const struct daedalus_h264_meta *h264_meta = NULL;
size_t meta_off, meta_len = 0;
int submit_status;
int src_consumed_emitted = 0;
int rc;
int decoded = 0;
if (hdr->payload_len < sizeof(req)) {
struct daedalus_resp_frame err = { 0 };
log_err("REQ_DECODE cookie=%u: payload too short %u < %zu",
hdr->cookie, hdr->payload_len, sizeof(req));
err.status = DAEDALUS_DECODE_ERR_RECV;
err.flags = DAEDALUS_RESP_FLAG_HAS_PIXELS |
DAEDALUS_RESP_FLAG_SRC_CONSUMED;
memset(&resp, 0, sizeof(resp));
resp.status = DAEDALUS_DECODE_ERR_RECV;
return send_response(cli, DAEDALUS_MSG_RESP_FRAME,
hdr->cookie, &err, sizeof(err));
hdr->cookie, &resp, sizeof(resp));
}
memcpy(&req, payload, sizeof(req));
/* Optional H.264 meta block follows req when the flag is set;
* bitstream comes after meta. */
if (req.flags & DAEDALUS_REQ_FLAG_H264_META)
meta_len = sizeof(struct daedalus_h264_meta);
meta_off = sizeof(req);
if ((size_t) req.bitstream_len + sizeof(req) + meta_len !=
hdr->payload_len) {
struct daedalus_resp_frame err = { 0 };
log_err("REQ_DECODE cookie=%u: bitstream_len %u + meta %zu inconsistent with payload_len %u",
hdr->cookie, req.bitstream_len, meta_len,
hdr->payload_len);
err.status = DAEDALUS_DECODE_ERR_RECV;
err.flags = DAEDALUS_RESP_FLAG_HAS_PIXELS |
DAEDALUS_RESP_FLAG_SRC_CONSUMED;
memset(&resp, 0, sizeof(resp));
resp.status = DAEDALUS_DECODE_ERR_RECV;
return send_response(cli, DAEDALUS_MSG_RESP_FRAME,
hdr->cookie, &err, sizeof(err));
hdr->cookie, &resp, sizeof(resp));
}
if (meta_len)
h264_meta = (const struct daedalus_h264_meta *)
(payload + meta_off);
log_info("REQ_DECODE cookie=%u codec=%u bitstream=%u bytes meta=%s capture=%ux%u %u planes src_pts=%llu",
log_info("REQ_DECODE cookie=%u codec=%u bitstream=%u bytes meta=%s capture=%ux%u %u planes",
hdr->cookie, req.codec_id, req.bitstream_len,
h264_meta ? "h264" : "none",
req.capture_width, req.capture_height,
req.capture_num_planes,
(unsigned long long) req.src_pts);
req.capture_num_planes);
/*
* Register (src_pts → cookie) mapping BEFORE submit, so any drained
* frame whose pts matches this REQ's src_pts (the steady-state
* 1:1 path) can find its owner via pending_lookup below. Out of
* space here is fatal — we'd lose the routing identity for this
* cookie's eventual frame. Send an error RESP that releases both
* src and dst so the V4L2 client moves on.
* Degenerate-bitstream filter (issue #17): libva-v4l2-request-
* fourier flushes a stub packet into the OUTPUT_MPLANE queue at
* playback-pause boundaries. The payload is shorter than any
* parseable H.264 NAL (3-byte start code + 1-byte NAL header =
* 4 bytes minimum); avcodec_send_packet returns
* AVERROR_INVALIDDATA, which we used to propagate to the kernel
* as a decode failure. Firefox then marks H.264-via-VAAPI as
* broken for the session and routes every subsequent frame to
* libmozavcodec SW — pause never recovers to HW.
*
* Drop the request as a no-op decode and reply RESP_FRAME OK so
* libva's V4L2 state machine keeps its surface pool alive.
*/
rc = pending_register(cli, req.src_pts, hdr->cookie, &req);
if (rc < 0) {
struct daedalus_resp_frame err = { 0 };
err.status = DAEDALUS_DECODE_ERR_SEND;
err.flags = DAEDALUS_RESP_FLAG_HAS_PIXELS |
DAEDALUS_RESP_FLAG_SRC_CONSUMED;
if (req.bitstream_len < 4) {
log_info("REQ_DECODE cookie=%u: tiny bitstream %u bytes — dropping as no-op (pause-time sentinel)",
hdr->cookie, req.bitstream_len);
memset(&resp, 0, sizeof(resp));
resp.status = DAEDALUS_DECODE_NO_FRAME;
return send_response(cli, DAEDALUS_MSG_RESP_FRAME,
hdr->cookie, &err, sizeof(err));
}
submit_status = daedalus_decoder_submit(cli->decoder, &req,
payload + meta_off + meta_len,
h264_meta);
if (submit_status != 0) {
/*
* avcodec_send_packet failed before any frame could have
* been queued for this src_pts. Drop the pending entry
* (no future drain will find a matching pts), and emit a
* combined HAS_PIXELS|SRC_CONSUMED error RESP for this
* cookie so the V4L2 client unblocks.
*/
struct daedalus_resp_frame err = { 0 };
pending_release(cli, req.src_pts);
err.status = (uint32_t) submit_status;
err.codec_id = req.codec_id;
err.flags = DAEDALUS_RESP_FLAG_HAS_PIXELS |
DAEDALUS_RESP_FLAG_SRC_CONSUMED;
err.output_src_pts = req.src_pts;
return send_response(cli, DAEDALUS_MSG_RESP_FRAME,
hdr->cookie, &err, sizeof(err));
}
/*
* Drain libavcodec for as many display-ordered frames as it can
* emit right now. Each frame's pts identifies which cookie's
* CAPTURE buffer the pixels go in (see [[daedalus-v4l2#6]]). In
* steady state for VP9/AV1 (no reorder) the loop runs exactly
* once, draining the just-submitted packet's own frame. For
* H.264 with B-frames the first drained frame may belong to an
* EARLIER cookie's bitstream — that's the entire point.
*/
for (;;) {
struct daedalus_resp_frame resp;
uint32_t owner_cookie = 0;
struct daedalus_req_decode owner_req;
uint32_t flags;
rc = daedalus_decoder_drain_one(cli->decoder, req.codec_id,
&resp);
if (rc == -EAGAIN)
break;
if (rc != 0) {
/*
* Hard codec error during drain. resp->status is set.
* Pin it to THIS REQ's cookie (we can't know whose
* pts the failed frame would have had); set both
* flags so the V4L2 client moves on.
*/
pending_release(cli, req.src_pts);
resp.flags = DAEDALUS_RESP_FLAG_HAS_PIXELS |
DAEDALUS_RESP_FLAG_SRC_CONSUMED;
resp.output_src_pts = req.src_pts;
(void) send_response(cli, DAEDALUS_MSG_RESP_FRAME,
hdr->cookie, &resp, sizeof(resp));
src_consumed_emitted = 1;
break;
}
if (pending_lookup(cli, resp.output_src_pts,
&owner_cookie, &owner_req) != 0) {
/*
* Frame's pts has no registered owner — implies a
* daemon-side tracking bug or a codec output for a
* packet we never registered (e.g. a B-frame that
* was queued before the daemon caught up). Drop the
* frame; can't safely route it.
*/
log_warn("drain: no pending entry for output_src_pts=%llu (codec dropped a frame?)",
(unsigned long long) resp.output_src_pts);
continue;
}
flags = DAEDALUS_RESP_FLAG_HAS_PIXELS;
if (owner_cookie == hdr->cookie) {
flags |= DAEDALUS_RESP_FLAG_SRC_CONSUMED;
src_consumed_emitted = 1;
}
(void) deliver_frame_to_cookie(cli, owner_cookie, &owner_req,
&resp, flags);
pending_release(cli, resp.output_src_pts);
}
/*
* If the drain loop didn't already SRC_CONSUMED this REQ's cookie
* (libavcodec held the frame for display-order reorder — the
* pixels will arrive in a future drain), emit a standalone
* SRC_CONSUMED RESP now. Kernel releases src_buf + runs
* job_finish; dst_buf parked until the matching HAS_PIXELS
* shows up later.
*/
if (!src_consumed_emitted) {
struct daedalus_resp_frame resp = { 0 };
resp.status = DAEDALUS_DECODE_OK;
resp.codec_id = req.codec_id;
resp.flags = DAEDALUS_RESP_FLAG_SRC_CONSUMED;
(void) send_response(cli, DAEDALUS_MSG_RESP_FRAME,
hdr->cookie, &resp, sizeof(resp));
}
return 0;
/*
* Open dmabuf-fds for every CAPTURE plane and mmap them.
* If this fails we still attempt the decode (so the kernel
* gets a structured error response) — but we pass NULL
* planes so pixels aren't written anywhere.
*/
rc = daedalus_capture_planes_open(cli->fd, hdr->cookie, &req,
&planes);
if (rc < 0) {
log_warn("REQ_DECODE cookie=%u: GET_DMABUF/mmap failed (%d); decode metadata-only",
hdr->cookie, rc);
/* planes is already zeroed by capture_planes_open */
}
rc = daedalus_decoder_run_request(cli->decoder, &req,
payload + meta_off + meta_len,
h264_meta,
&resp,
planes.nr ? &planes : NULL);
decoded = (rc >= 0);
daedalus_capture_planes_close(&planes);
if (!decoded)
return rc;
/*
* RESP_FRAME is metadata-only in Phase 8.6 — pixels already
* live in the V4L2 client's CAPTURE buffer via the dmabuf
* the daemon wrote to in pack_nv12_to_planes.
*/
return send_response(cli, DAEDALUS_MSG_RESP_FRAME, hdr->cookie,
&resp, sizeof(resp));
}
static int handle_ping(struct chardev_client *cli,
-26
@@ -18,44 +18,18 @@
struct ffmpeg_loader;
struct daedalus_decoder;
/*
* Per-inflight (cookie, src_pts) tracking for the H.264 B-frame
* display-reorder fix (daedalus-v4l2#6). When the daemon drains a
* frame from libavcodec, frame->pts (= src_pts of the OUTPUT bitstream
* that contained the frame's slices) identifies which cookie's CAPTURE
* buffer the pixels belong in — distinct from the cookie of the REQ
* that triggered the receive_frame call. The mapping is small (bounded
* by the V4L2 client's buffer pool depth, typically ≤24), so a linear
* array beats a hashtable for cache locality.
*
* cached_req carries the capture geometry (num_planes, plane sizes,
* strides, pix_fmt) so a later drain — which may target this cookie
* from a DIFFERENT REQ's drain loop — can call GET_DMABUF + open
* planes with the original REQ's parameters.
*/
#define DAEDALUS_MAX_PENDING_COOKIES 64
struct chardev_pending_cookie {
int used;
uint64_t src_pts;
uint32_t cookie;
struct daedalus_req_decode cached_req;
};
/**
* struct chardev_client - daemon-side chardev state
* @fd: open /dev/daedalus-v4l2 descriptor (-1 if not open)
* @loader: dlopen'd FFmpeg loader (borrowed; not owned)
* @decoder: per-codec AVCodecContext cache (owned)
* @stop_flag: set non-zero from a signal handler to break the loop
* @pending: pts → cookie lookup table for split SRC/DST RESPs
*/
struct chardev_client {
int fd;
struct ffmpeg_loader *loader;
struct daedalus_decoder *decoder;
volatile sig_atomic_t *stop_flag;
struct chardev_pending_cookie pending[DAEDALUS_MAX_PENDING_COOKIES];
};
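The pending[] table above is a fixed array that is searched linearly by src_pts. Below is a minimal sketch of the add / find / release cycle. It assumes a simplified entry without cached_req. The `demo_pending_*` helpers are hypothetical and are not the daemon's `pending_release()`, whose signature and body fall outside this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PENDING 64 /* mirrors DAEDALUS_MAX_PENDING_COOKIES */

/* Simplified stand-in for struct chardev_pending_cookie (no cached_req). */
struct demo_pending {
	int used;
	uint64_t src_pts;
	uint32_t cookie;
};

/* Store (src_pts -> cookie) in the first free slot; -1 if the table is full. */
static int demo_pending_add(struct demo_pending *tab, uint64_t src_pts,
			    uint32_t cookie)
{
	assert(tab != NULL);
	for (size_t i = 0; i < DEMO_MAX_PENDING; i++) {
		if (!tab[i].used) {
			tab[i].used = 1;
			tab[i].src_pts = src_pts;
			tab[i].cookie = cookie;
			return 0;
		}
	}
	return -1;
}

/* Linear scan over live slots; NULL if no REQ with this src_pts is pending. */
static struct demo_pending *demo_pending_find(struct demo_pending *tab,
					      uint64_t src_pts)
{
	for (size_t i = 0; i < DEMO_MAX_PENDING; i++)
		if (tab[i].used && tab[i].src_pts == src_pts)
			return &tab[i];
	return NULL;
}

/* Free the slot once HAS_PIXELS has been delivered for this pts. */
static void demo_pending_release(struct demo_pending *tab, uint64_t src_pts)
{
	struct demo_pending *e = demo_pending_find(tab, src_pts);

	if (e)
		e->used = 0;
}
```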
/**
@@ -10,12 +10,55 @@
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <linux/videodev2.h>
#include <libavcodec/avcodec.h>
#include <libavutil/pixfmt.h>
/*
* Per-codec running stats — daedalus-v4l2#11 step 1. Establishes
* baseline observability before any daedalus-fourier kernel
* substitution lands, so we can see what each substitution actually
* shifted. Per-frame `decoder: OK` line now carries decode_us; a
* "decoder stats" summary line lands every DAEDALUS_STATS_EVERY OK
* frames with throughput + per-frame budget aggregates.
*
* Counters are static (process-local) and unsynchronised — the
* daemon's chardev event loop is single-threaded, so no atomics or
* locking needed. Reset when codec_id changes (different stream).
*/
#define DAEDALUS_STATS_EVERY 60u
struct daedalus_decode_stats {
uint32_t codec_id;
uint64_t frames;
uint64_t total_decode_ns;
uint64_t total_bitstream_bytes;
uint64_t total_mbs; /* derived from frame WxH; H.264-style 16x16 */
struct timespec window_start;
};
static struct daedalus_decode_stats g_stats;
static inline uint64_t timespec_delta_ns(const struct timespec *a,
const struct timespec *b)
{
return (uint64_t)(b->tv_sec - a->tv_sec) * 1000000000ull +
(uint64_t)(b->tv_nsec - a->tv_nsec);
}
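timespec_delta_ns() stays correct even when b->tv_nsec < a->tv_nsec. The negative nanosecond difference wraps on the cast to uint64_t, and unsigned addition modulo 2^64 cancels that wrap against the extra second in the tv_sec term. Below is a small self-contained check of that arithmetic. It uses a hypothetical `demo_delta_ns` copy of the same expression and assumes b is not earlier than a.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Same expression as timespec_delta_ns(); requires b >= a. */
static uint64_t demo_delta_ns(const struct timespec *a,
			      const struct timespec *b)
{
	assert(b->tv_sec > a->tv_sec ||
	       (b->tv_sec == a->tv_sec && b->tv_nsec >= a->tv_nsec));
	/* A negative tv_nsec difference wraps on the uint64_t cast; the
	 * modulo-2^64 sum cancels it against the extra tv_sec second. */
	return (uint64_t)(b->tv_sec - a->tv_sec) * 1000000000ull +
	       (uint64_t)(b->tv_nsec - a->tv_nsec);
}
```

For example, a = 1.9 s and b = 2.1 s yield 200000000 ns despite the nanosecond borrow.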
static const char *codec_id_name(uint32_t cid)
{
switch (cid) {
case DAEDALUS_CODEC_VP9: return "vp9";
case DAEDALUS_CODEC_AV1: return "av1";
case DAEDALUS_CODEC_H264: return "h264";
default: return "?";
}
}
/*
* FNV-1a 32-bit hash. Used as a compact digest of the decoded
* frame's YUV planes so the kernel can verify "the daemon produced
@@ -132,6 +175,32 @@ static int decoder_open_codec(struct daedalus_decoder *dec, uint32_t codec_id,
ctx = fm->avcodec_alloc_context3(codec);
if (!ctx)
return -ENOMEM;
/*
* H.264-only: force libavcodec to emit frames in DECODE order
* (one frame per send_packet, no internal display-order reorder
* queue). V4L2 stateless decoder protocol expects each OUTPUT
* bitstream packet to produce one CAPTURE buffer with that
* packet's slice-decoded pixels — regardless of display order.
* ffmpeg-vaapi's H.264 decoder (which is what consumes our
* CAPTURE buffers via libva-v4l2-request-fourier) does its own
* POC-based display reorder upstream, so producing decode-order
* output is correct.
*
* AV_CODEC_FLAG_LOW_DELAY forces `low_delay = 1` inside
* libavcodec's H.264 decoder — `h264_select_output_frame` emits
* the just-decoded picture immediately instead of holding it
* for the display-order DPB output queue. DPB management for
* reference frames (short_ref / long_ref) is unaffected; B-frame
* decoding correctness is preserved.
*
* Closes daedalus-v4l2#11 part (2). Skipped for VP9 / AV1:
* those formats don't reorder internally, so the flag would be
* a no-op and add no value.
*/
if (codec_id == DAEDALUS_CODEC_H264)
ctx->flags |= AV_CODEC_FLAG_LOW_DELAY;
rc = fm->avcodec_open2(ctx, codec, NULL);
if (rc < 0) {
log_err("decoder: avcodec_open2 failed: %d", rc);
@@ -348,30 +417,31 @@ static int pack_nv12_to_planes(struct AVFrame *fr,
return 0;
}
/*
* Per-codec assemble + send_packet. Returns 0 on success, or one
* of DAEDALUS_DECODE_ERR_* on failure (errors here propagate via
* the caller's RESP_FRAME status field — they are NOT logged as a
* silent skip). pkt->pts is stamped from req->src_pts so the
* resulting frame->pts comes back identifiable on the drain side.
*/
int daedalus_decoder_submit(struct daedalus_decoder *dec,
const struct daedalus_req_decode *req,
const uint8_t *bitstream,
const struct daedalus_h264_meta *h264_meta)
int daedalus_decoder_run_request(struct daedalus_decoder *dec,
const struct daedalus_req_decode *req,
const uint8_t *bitstream,
const struct daedalus_h264_meta *h264_meta,
struct daedalus_resp_frame *resp,
const struct daedalus_capture_planes *planes)
{
struct ffmpeg_loader *fm = dec->loader;
struct AVCodecContext *ctx = NULL;
uint8_t *assembled = NULL;
size_t assembled_len = 0;
int rc;
int status = 0;
memset(resp, 0, sizeof(*resp));
resp->codec_id = req->codec_id;
rc = decoder_open_codec(dec, req->codec_id, &ctx);
if (rc == -ENOSYS)
return DAEDALUS_DECODE_ERR_CODEC;
if (rc < 0)
return DAEDALUS_DECODE_ERR_OPEN;
if (rc == -ENOSYS) {
resp->status = DAEDALUS_DECODE_ERR_CODEC;
goto out;
}
if (rc < 0) {
resp->status = DAEDALUS_DECODE_ERR_OPEN;
goto out;
}
fm->av_packet_unref(dec->pkt);
@@ -396,14 +466,14 @@ int daedalus_decoder_submit(struct daedalus_decoder *dec,
if (sps_len == 0 || pps_len == 0) {
log_err("decoder: SPS/PPS NAL synth failed (sps=%zu pps=%zu)",
sps_len, pps_len);
status = DAEDALUS_DECODE_ERR_SEND;
resp->status = DAEDALUS_DECODE_ERR_SEND;
goto out;
}
assembled_len = sps_len + pps_len + req->bitstream_len;
assembled = malloc(assembled_len + AV_INPUT_BUFFER_PADDING_SIZE);
if (!assembled) {
status = DAEDALUS_DECODE_ERR_SEND;
resp->status = DAEDALUS_DECODE_ERR_SEND;
goto out;
}
memcpy(assembled, sps_nal, sps_len);
@@ -441,161 +511,197 @@ int daedalus_decoder_submit(struct daedalus_decoder *dec,
}
/*
* Stamp pkt->pts from REQ_DECODE's src_pts (the V4L2 OUTPUT
* buffer's vb2 timestamp captured by the kernel at device_run
* time). libavcodec carries pkt->pts forward to frame->pts on
* the receive_frame side — even after display-order reordering
* inside the H.264 DPB — which lets the chardev_client identify
* which cookie's CAPTURE buffer a drained frame's pixels belong
* in. Without this stamp, every drained frame would look like
* it came from the current REQ; pairs of B/P would swap places
* in the visible output (daedalus-v4l2#6).
* Time send_packet+receive_frame for the per-frame `decoder: OK`
* line + the periodic stats summary. Includes only the
* libavcodec round-trip — not the bitstream packing, SPS/PPS
* synth, or pack-to-planes work (those are accounted for
* separately in the request's overall handle time).
*/
dec->pkt->pts = (int64_t) req->src_pts;
struct timespec t_decode_start, t_decode_end;
uint64_t decode_ns = 0;
clock_gettime(CLOCK_MONOTONIC, &t_decode_start);
rc = fm->avcodec_send_packet(ctx, dec->pkt);
if (rc < 0) {
log_err("decoder: avcodec_send_packet failed: %d", rc);
status = DAEDALUS_DECODE_ERR_SEND;
resp->status = DAEDALUS_DECODE_ERR_SEND;
goto out;
}
out:
free(assembled);
(void) assembled_len;
return status;
}
/*
* Pull the next display-ordered frame out of libavcodec's DPB.
* Returns 0 if a frame was returned (dec->frame holds it and resp
* is populated with metadata + output_src_pts == frame->pts),
* -EAGAIN if libavcodec needs more input, or DAEDALUS_DECODE_ERR_*
* on a hard codec error. Caller may immediately invoke
* daedalus_decoder_pack_current() to copy this frame's pixels into
* a CAPTURE buffer's mapped planes, then call drain_one again for
* any further frames in the DPB.
*/
int daedalus_decoder_drain_one(struct daedalus_decoder *dec,
uint32_t codec_id,
struct daedalus_resp_frame *resp)
{
struct ffmpeg_loader *fm = dec->loader;
struct AVCodecContext *ctx = NULL;
struct AVFrame *fr;
const AVPixFmtDescriptor *desc;
uint32_t h, luma_len = 0, chroma_len = 0;
int rc;
memset(resp, 0, sizeof(*resp));
resp->codec_id = codec_id;
rc = decoder_open_codec(dec, codec_id, &ctx);
if (rc == -ENOSYS) {
resp->status = DAEDALUS_DECODE_ERR_CODEC;
return DAEDALUS_DECODE_ERR_CODEC;
}
if (rc < 0) {
resp->status = DAEDALUS_DECODE_ERR_OPEN;
return DAEDALUS_DECODE_ERR_OPEN;
}
fm->av_frame_unref(dec->frame);
rc = fm->avcodec_receive_frame(ctx, dec->frame);
if (rc == AVERROR(EAGAIN) || rc == AVERROR_EOF)
return -EAGAIN;
clock_gettime(CLOCK_MONOTONIC, &t_decode_end);
decode_ns = timespec_delta_ns(&t_decode_start, &t_decode_end);
if (rc == AVERROR(EAGAIN) || rc == AVERROR_EOF) {
log_debug("decoder: no frame ready yet (rc=%d, %lu us)",
rc, (unsigned long)(decode_ns / 1000));
resp->status = DAEDALUS_DECODE_NO_FRAME;
goto out;
}
if (rc < 0) {
log_err("decoder: avcodec_receive_frame failed: %d", rc);
resp->status = DAEDALUS_DECODE_ERR_RECV;
return DAEDALUS_DECODE_ERR_RECV;
goto out;
}
fr = dec->frame;
desc = fm->av_pix_fmt_desc_get(fr->format);
h = fnv1a32_init();
{
struct AVFrame *fr = dec->frame;
const AVPixFmtDescriptor *desc =
fm->av_pix_fmt_desc_get(fr->format);
uint32_t h = fnv1a32_init();
uint32_t luma_len = 0, chroma_len = 0;
resp->status = DAEDALUS_DECODE_OK;
resp->width = (uint32_t) fr->width;
resp->height = (uint32_t) fr->height;
resp->pix_fmt = fr->format;
resp->output_src_pts = (uint64_t) fr->pts;
resp->status = DAEDALUS_DECODE_OK;
resp->width = (uint32_t) fr->width;
resp->height = (uint32_t) fr->height;
resp->pix_fmt = fr->format;
if (!desc) {
log_warn("decoder: no descriptor for pix_fmt %d", fr->format);
} else {
int p, max_plane = 0;
int i;
/*
* Walk every plane reported by the AVPixFmtDescriptor.
* For each component, byte width = ((plane_w *
* step_minus1) >> 0) — but the descriptor only tells
* us which plane each component sits in, not the
* plane's byte stride per pixel. In practice for the
* formats we care about (YUV420P, YUV422P, YUV444P,
* GBRP, NV12), each plane has exactly one component
* at 1 byte/sample. Hash each plane at
* (width >> log2_chroma_w) × (height >> log2_chroma_h)
* for chroma planes, full-size for plane 0.
*
* This generalises cleanly to anything 8-bit-per-
* sample-per-plane; 10/12-bit (P010, YUV420P10LE) will
* need depth handling when Phase 8.6 brings HDR
* content into play.
*/
if (!desc) {
log_warn("decoder: no descriptor for pix_fmt %d",
fr->format);
} else {
int p, max_plane = 0;
int i;
for (i = 0; i < desc->nb_components; i++) {
if (desc->comp[i].plane > max_plane)
max_plane = desc->comp[i].plane;
}
for (p = 0; p <= max_plane; p++) {
int pw, ph;
if (!fr->data[p] || !fr->linesize[p])
continue;
if (p == 0) {
pw = fr->width;
ph = fr->height;
luma_len += (uint32_t) pw * (uint32_t) ph;
} else {
pw = AV_CEIL_RSHIFT(fr->width,
desc->log2_chroma_w);
ph = AV_CEIL_RSHIFT(fr->height,
desc->log2_chroma_h);
chroma_len += (uint32_t) pw * (uint32_t) ph;
for (i = 0; i < desc->nb_components; i++) {
if (desc->comp[i].plane > max_plane)
max_plane = desc->comp[i].plane;
}
h = fnv1a32_plane(h, fr->data[p], pw, ph,
fr->linesize[p]);
for (p = 0; p <= max_plane; p++) {
int pw, ph;
if (!fr->data[p] || !fr->linesize[p])
continue;
if (p == 0) {
pw = fr->width;
ph = fr->height;
luma_len += (uint32_t) pw *
(uint32_t) ph;
} else {
pw = AV_CEIL_RSHIFT(fr->width,
desc->log2_chroma_w);
ph = AV_CEIL_RSHIFT(fr->height,
desc->log2_chroma_h);
chroma_len += (uint32_t) pw *
(uint32_t) ph;
}
h = fnv1a32_plane(h, fr->data[p], pw, ph,
fr->linesize[p]);
}
}
resp->luma_len = luma_len;
resp->chroma_len = chroma_len;
resp->fnv1a_yuv = h;
/*
* Pack pixels directly into the mapped CAPTURE dmabuf
* planes. Dispatch on the V4L2 fourcc the kernel
* negotiated:
* V4L2_PIX_FMT_NV12M (default, 8-bit, 2 planes)
* V4L2_PIX_FMT_P010 (10-bit HDR, 1 plane)
*/
if (planes && planes->nr >= 1) {
int prc = 0;
switch (req->capture_pix_fmt) {
case V4L2_PIX_FMT_NV12M:
prc = pack_nv12_to_planes(fr, desc, planes);
break;
case V4L2_PIX_FMT_NV12:
prc = pack_nv12_single_to_plane(fr, desc, planes);
break;
case V4L2_PIX_FMT_P010:
prc = pack_p010_to_plane(fr, desc, planes);
break;
default:
log_warn("decoder: unsupported capture fourcc 0x%08x",
req->capture_pix_fmt);
prc = -EINVAL;
break;
}
if (prc < 0)
log_warn("decoder: pack failed (pix_fmt=%d cap_fourcc=0x%08x) — kernel will see metadata only",
fr->format, req->capture_pix_fmt);
}
log_info("decoder: OK %dx%d fmt=%d (%s) fnv1a=0x%08x luma=%u chroma=%u decode_us=%lu",
fr->width, fr->height, fr->format,
desc ? desc->name : "?",
h, luma_len, chroma_len,
(unsigned long)(decode_ns / 1000));
/*
* Periodic stats summary (every DAEDALUS_STATS_EVERY frames).
* The window resets on codec change. This gives an observable
* baseline for the daedalus-v4l2#11 substitution arc: fps, average
* decode_us, MB/s throughput, bitstream B/MB. Compare
* against daedalus-fourier README's per-kernel NEON
* baselines (e.g. H.264 IDCT 4x4 = 175 Mblock/s) to gauge
* which substitutions are worth pursuing.
*/
if (g_stats.codec_id != req->codec_id) {
g_stats.codec_id = req->codec_id;
g_stats.frames = 0;
g_stats.total_decode_ns = 0;
g_stats.total_bitstream_bytes = 0;
g_stats.total_mbs = 0;
clock_gettime(CLOCK_MONOTONIC, &g_stats.window_start);
}
g_stats.frames++;
g_stats.total_decode_ns += decode_ns;
g_stats.total_bitstream_bytes += req->bitstream_len;
g_stats.total_mbs += (uint64_t)((fr->width + 15) / 16) *
(uint64_t)((fr->height + 15) / 16);
if (g_stats.frames % DAEDALUS_STATS_EVERY == 0) {
struct timespec t_now;
clock_gettime(CLOCK_MONOTONIC, &t_now);
uint64_t window_ns =
timespec_delta_ns(&g_stats.window_start, &t_now);
double window_s = (double)window_ns / 1e9;
double fps = window_s > 0 ?
(double)g_stats.frames / window_s : 0.0;
double avg_decode_us = g_stats.frames > 0 ?
(double)g_stats.total_decode_ns /
(double)g_stats.frames / 1000.0 : 0.0;
double mb_per_s = window_s > 0 ?
(double)g_stats.total_mbs / window_s : 0.0;
double bs_b_per_mb = g_stats.total_mbs > 0 ?
(double)g_stats.total_bitstream_bytes /
(double)g_stats.total_mbs : 0.0;
log_info("decoder stats: codec=%s "
"frames=%llu window=%.2fs fps=%.2f "
"avg_decode_us=%.1f mbs_per_s=%.0f "
"bs_b_per_mb=%.2f",
codec_id_name(g_stats.codec_id),
(unsigned long long)g_stats.frames,
window_s, fps, avg_decode_us,
mb_per_s, bs_b_per_mb);
}
}
resp->luma_len = luma_len;
resp->chroma_len = chroma_len;
resp->fnv1a_yuv = h;
log_info("decoder: OK %dx%d fmt=%d (%s) fnv1a=0x%08x luma=%u chroma=%u src_pts=%llu",
fr->width, fr->height, fr->format,
desc ? desc->name : "?",
h, luma_len, chroma_len,
(unsigned long long) fr->pts);
fm->av_frame_unref(dec->frame);
out:
free(assembled);
(void) assembled_len;
return 0;
}
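The periodic summary in drain_one reduces to a macroblock count per frame plus four guarded divisions. Below is a sketch of that arithmetic. The `demo_*` names are hypothetical, and the sketch assumes the same 16x16 rounding and zero guards as the `decoder stats` log line.

```c
#include <assert.h>
#include <stdint.h>

/* 16x16 macroblocks covering a frame, partial MBs rounded up
 * (same formula as the total_mbs accumulation). */
static uint64_t demo_frame_mbs(int width, int height)
{
	assert(width > 0 && height > 0);
	return (uint64_t)((width + 15) / 16) * (uint64_t)((height + 15) / 16);
}

struct demo_stats_summary {
	double fps, avg_decode_us, mbs_per_s, bs_b_per_mb;
};

/* Same guarded divisions as the "decoder stats" log line. */
static struct demo_stats_summary
demo_stats_summarise(uint64_t frames, uint64_t window_ns,
		     uint64_t total_decode_ns, uint64_t total_mbs,
		     uint64_t total_bitstream_bytes)
{
	struct demo_stats_summary s = { 0.0, 0.0, 0.0, 0.0 };
	double window_s = (double)window_ns / 1e9;

	if (window_s > 0) {
		s.fps = (double)frames / window_s;
		s.mbs_per_s = (double)total_mbs / window_s;
	}
	if (frames > 0)
		s.avg_decode_us = (double)total_decode_ns / (double)frames / 1000.0;
	if (total_mbs > 0)
		s.bs_b_per_mb = (double)total_bitstream_bytes / (double)total_mbs;
	return s;
}
```

For example, consider 60 frames of 1280x720 (3600 MBs each) over a 2 s window with a 5 ms average decode. That gives fps=30, avg_decode_us=5000 and mbs_per_s=108000. Spreading 2,160,000 bitstream bytes over those 216,000 MBs gives bs_b_per_mb=10.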
int daedalus_decoder_pack_current(struct daedalus_decoder *dec,
const struct daedalus_capture_planes *planes,
uint32_t capture_pix_fmt)
{
struct ffmpeg_loader *fm = dec->loader;
struct AVFrame *fr = dec->frame;
const AVPixFmtDescriptor *desc;
int prc;
if (!planes || planes->nr < 1 || !fr || !fr->width || !fr->height)
return -EINVAL;
desc = fm->av_pix_fmt_desc_get(fr->format);
switch (capture_pix_fmt) {
case V4L2_PIX_FMT_NV12M:
prc = pack_nv12_to_planes(fr, desc, planes);
break;
case V4L2_PIX_FMT_NV12:
prc = pack_nv12_single_to_plane(fr, desc, planes);
break;
case V4L2_PIX_FMT_P010:
prc = pack_p010_to_plane(fr, desc, planes);
break;
default:
log_warn("decoder: unsupported capture fourcc 0x%08x",
capture_pix_fmt);
prc = -EINVAL;
break;
}
if (prc < 0)
log_warn("decoder: pack failed (pix_fmt=%d cap_fourcc=0x%08x)",
fr->format, capture_pix_fmt);
return prc;
}
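fnv1a32_init() and fnv1a32_plane() are called in the plane walk above, but their bodies fall outside this diff. Below is a minimal sketch. It assumes they implement standard 32-bit FNV-1a (offset basis 0x811c9dc5, prime 0x01000193) over each plane's visible bytes and skip linesize padding. The `demo_*` names are hypothetical. `demo_ceil_rshift` reproduces the rounding AV_CEIL_RSHIFT applies to chroma dimensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FNV1A32_OFFSET 0x811c9dc5u /* FNV-1a 32-bit offset basis */
#define DEMO_FNV1A32_PRIME  0x01000193u /* FNV-1a 32-bit prime (16777619) */

static uint32_t demo_fnv1a32_init(void)
{
	return DEMO_FNV1A32_OFFSET;
}

/* Fold a width x height plane into h row by row. Only the first `width`
 * bytes of each row are hashed, so linesize padding never affects it. */
static uint32_t demo_fnv1a32_plane(uint32_t h, const uint8_t *data,
				   int width, int height, int linesize)
{
	assert(width >= 0 && height >= 0 && linesize >= width);
	for (int y = 0; y < height; y++) {
		const uint8_t *row = data + (size_t)y * (size_t)linesize;

		for (int x = 0; x < width; x++) {
			h ^= row[x];
			h *= DEMO_FNV1A32_PRIME;
		}
	}
	return h;
}

/* Chroma dimension with AV_CEIL_RSHIFT-style rounding: odd sizes round up. */
static int demo_ceil_rshift(int dim, int log2)
{
	return (dim + (1 << log2) - 1) >> log2;
}
```

For 1920x1080 YUV420P this walk gives luma_len = 2,073,600. Each chroma plane is 960x540 = 518,400, so chroma_len sums to 1,036,800.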
@@ -56,68 +56,33 @@ int daedalus_decoder_init(struct daedalus_decoder *dec,
void daedalus_decoder_cleanup(struct daedalus_decoder *dec);
/**
* daedalus_decoder_submit - send one REQ_DECODE's bitstream into libavcodec
* daedalus_decoder_run_request - decode one REQ_DECODE payload
* @dec: initialised decoder
* @req: REQ_DECODE prefix (from the wire); src_pts is stamped on
* the AVPacket so libavcodec returns frame->pts == src_pts
* when it eventually outputs the matching frame in display
* order (daedalus-v4l2#6).
* @req: REQ_DECODE prefix (from the wire)
* @bitstream: bitstream blob (req->bitstream_len bytes)
* @h264_meta: optional H.264 SPS/PPS metadata; non-NULL only when
* codec_id == H264 and the kernel set DAEDALUS_REQ_FLAG_
* H264_META. See decoder.c for the AnnexB synthesis.
* H264_META. Used to synthesise the AnnexB SPS+PPS NALs
* libavcodec needs before any slice (libva-v4l2-request
* passes only the slice in @bitstream per the V4L2
* stateless API contract). NULL for VP9/AV1 paths.
* @resp: caller-allocated RESP_FRAME output (zeroed by callee)
* @planes: mapped CAPTURE planes (Phase 8.6 dmabuf path). If
* NULL or planes->nr == 0, the decoder runs but
* writes no pixels — caller still gets dims + digest.
*
* Calls avcodec_send_packet on the codec's per-codec AVCodecContext.
* Returns 0 on success; one of DAEDALUS_DECODE_ERR_* on failure
* (which the caller should propagate as the RESP_FRAME status for
* the cookie of this REQ). Does NOT call avcodec_receive_frame —
* use daedalus_decoder_drain_one for that.
* Populates @resp with the decode outcome and writes decoded
* pixels (NV12 layout: Y to plane 0, interleaved CbCr to plane
* 1) directly into the mapped dmabuf planes. Always returns
* 0; decode-level failures are reported via @resp->status so
* the kernel sees a structured response rather than a dropped
* request.
*/
int daedalus_decoder_submit(struct daedalus_decoder *dec,
const struct daedalus_req_decode *req,
const uint8_t *bitstream,
const struct daedalus_h264_meta *h264_meta);
/**
* daedalus_decoder_drain_one - pop the next display-ordered frame, if any
* @dec: initialised decoder
* @codec_id: which codec context to drain (matches the REQ that just
* called submit). VP9/AV1/H264 use independent contexts.
* @resp: caller-allocated RESP_FRAME output (zeroed by callee).
* On a successful drain (return 0), resp's status / width /
* height / pix_fmt / luma_len / chroma_len / fnv1a_yuv /
* output_src_pts are populated; flags is left at 0 (caller
* adds HAS_PIXELS / SRC_CONSUMED). On EAGAIN, resp is
* zeroed.
*
* Return: 0 on a frame returned, -EAGAIN if libavcodec needs more
* input (display-order frame held inside DPB), <0 on a hard codec
* error (resp->status set).
*
* After a successful drain, the dec's internal AVFrame holds the
* decoded picture. Caller may immediately call
* daedalus_decoder_pack_current(planes) to write that picture into
* a CAPTURE buffer's dmabuf-mapped planes. Subsequent calls to
* drain_one (without another submit) try to pull additional frames
* from libavcodec's DPB.
*/
int daedalus_decoder_drain_one(struct daedalus_decoder *dec,
uint32_t codec_id,
struct daedalus_resp_frame *resp);
/**
* daedalus_decoder_pack_current - pack the last drained frame into planes
* @dec: initialised decoder; must have a frame from drain_one
* @planes: mapped CAPTURE planes (open via GET_DMABUF using the
* cookie that owns the frame's output_src_pts).
* @capture_pix_fmt: V4L2 fourcc on the CAPTURE side (NV12M, NV12,
* P010).
*
* Return: 0 on success, <0 on a pack failure (kernel sees only the
* metadata, not pixels — typical when a format isn't wired yet).
*/
int daedalus_decoder_pack_current(struct daedalus_decoder *dec,
const struct daedalus_capture_planes *planes,
uint32_t capture_pix_fmt);
int daedalus_decoder_run_request(struct daedalus_decoder *dec,
const struct daedalus_req_decode *req,
const uint8_t *bitstream,
const struct daedalus_h264_meta *h264_meta,
struct daedalus_resp_frame *resp,
const struct daedalus_capture_planes *planes);
#endif /* DAEDALUS_V4L2_DECODER_H */
@@ -11,14 +11,31 @@
#include <dlfcn.h>
/*
* SONAME versions match Debian Trixie / FFmpeg 7.1.3 today. If
* the system FFmpeg changes major, the daemon needs a rebuild;
* we could add fallback paths (.so.60, .so.59, ...) but for
* Phase 8.3 the pinned version is fine.
* SONAME versions match the Kwiboo ffmpeg-v4l2-request-fourier
* fork (FFmpeg 8.1) installed at the /opt/fourier prefix. The
* fourier campaign's ld.so.conf.d/fourier.conf entry resolves
* these sonames from /opt/fourier/lib via the ld cache, so
* dlopen-by-soname works without LD_LIBRARY_PATH wrappers.
*
* Switched from Debian-stock soname 61/61/59 (FFmpeg 7.1.3) on
* 2026-05-21 to land daedalus-fourier kernel substitution into
* the production decode path via patches in the Kwiboo fork
* (see daedalus-v4l2#11 substitution arc): we own the fork
* source in marfrit-packages, so we can layer NEON-DSP
* substitution patches there for libavcodec/aarch64/h264dsp_init
* → daedalus_recipe_dispatch_* thunks. The Debian-stock 7.1.3
* is built outside the marfrit-packages source tree, which
* would have made layering substitution patches awkward.
*
* Note: libavutil bumps soname 59 → 60 between FFmpeg 7.1 and
* 8.1; libavformat + libavcodec each bump 61 → 62. The public
* API surface the daemon uses (avcodec_send_packet /
* receive_frame / AVCodecContext flags / AVFrame fields) is
* stable across the bump.
*/
#define LIBAVFORMAT_SONAME "libavformat.so.61"
#define LIBAVCODEC_SONAME "libavcodec.so.61"
#define LIBAVUTIL_SONAME "libavutil.so.59"
#define LIBAVFORMAT_SONAME "libavformat.so.62"
#define LIBAVCODEC_SONAME "libavcodec.so.62"
#define LIBAVUTIL_SONAME "libavutil.so.60"
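The older comment mentions possibly adding fallback sonames (.so.60, .so.59, ...). If that were wanted, one approach is to try the candidates in order and keep the first handle dlopen() returns. This is a hypothetical sketch: `demo_dlopen_first` is not part of the loader. On glibc older than 2.34 it needs `-ldl`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Try each candidate soname in order and return the first handle that
 * loads; *which (if non-NULL) receives its index. NULL if none load. */
static void *demo_dlopen_first(const char *const *names, size_t n,
			       size_t *which)
{
	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		void *h = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);

		if (h) {
			if (which)
				*which = i;
			return h;
		}
	}
	return NULL;
}
```

A caller would pass, say, `{ "libavcodec.so.62", "libavcodec.so.61" }` and must `dlclose()` the returned handle during cleanup.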
/*
* Resolve a symbol from a dlopen'd handle. Logs the failure
@@ -22,6 +22,8 @@
#include <libavutil/log.h>
#include <daedalus.h>
static volatile sig_atomic_t g_terminate = 0;
static void on_signal(int sig)
@@ -120,6 +122,26 @@ int main(int argc, char **argv)
/* Mute FFmpeg's own chattiness unless the user asked. */
fm.av_log_set_level(verbose ? AV_LOG_INFO : AV_LOG_WARNING);
/*
* Initialise daedalus-fourier early so we can log substrate
* availability up front. daedalus_ctx_create_no_qpu() skips
* the V3D Vulkan probe: we're not dispatching any kernels
* yet; this is just the linkage sanity check + a marker in the
* journal that the binary is wired against the right
* daedalus-fourier version. Future work (per daedalus-v4l2#11)
* promotes to daedalus_ctx_create() once shader-path resolution
* is wired through the public API.
*/
daedalus_ctx *df_ctx = daedalus_ctx_create_no_qpu();
if (df_ctx) {
log_info("daedalus-fourier: linked, ctx alive (no_qpu mode; "
"has_qpu=%d)",
daedalus_ctx_has_qpu(df_ctx));
} else {
log_warn("daedalus-fourier: ctx_create_no_qpu returned NULL "
"(out of memory?) — continuing without backend kernels");
}
int rc;
const char *cmd = argv[i++];
if (strcmp(cmd, "parse") == 0) {
@@ -132,6 +154,8 @@ int main(int argc, char **argv)
rc = 2;
}
if (df_ctx)
daedalus_ctx_destroy(df_ctx);
ffmpeg_loader_cleanup(&fm);
log_cleanup();
return rc;
@@ -28,12 +28,7 @@
#include <linux/v4l2-controls.h>
#define DAEDALUS_PROTO_MAGIC 0x44303456u /* 'D04V' */
#define DAEDALUS_PROTO_VERSION 1u /* pre-1.0; bumped for
* REQ_DECODE.src_pts +
* RESP_FRAME.flags +
* RESP_FRAME.output_src_pts
* (H.264 B-frame reorder fix,
* daedalus-v4l2#6). */
#define DAEDALUS_PROTO_VERSION 0u /* pre-1.0 */
/*
* Wire-protocol message types.
@@ -76,7 +71,18 @@ struct daedalus_msg_hdr {
__u32 reserved;
};
#define DAEDALUS_PROTO_MAX_PAYLOAD (64u * 1024u) /* 64 KiB */
/*
* Wire-protocol payload cap. Sized to comfortably hold real-world
* H.264 / VP9 / AV1 access-unit bitstreams:
* - 720p H.264 worst-case I-frame: ~200 KiB
* - 1080p H.264 worst-case I-frame: ~500 KiB
* - 4K H.264 worst-case I-frame: ~2 MiB (would need a bump)
* 1 MiB is the conservative end of what cedrus / rkvdec / hantro
* report as OUTPUT_MPLANE sizeimage. Allocations (chardev kmalloc
* / kmemdup, daemon read buffer, vb2 plane backing) are sized per-
* payload at runtime; this only sets the ceiling. Issue #19.
*/
#define DAEDALUS_PROTO_MAX_PAYLOAD (1024u * 1024u) /* 1 MiB */
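The cap is only a ceiling, and allocations are sized per payload at runtime. A receiver can therefore reject an oversize length before allocating anything. Below is a minimal sketch. It assumes the announced length is available as a plain `uint32_t`, because the field name in struct daedalus_msg_hdr isn't visible here. `demo_alloc_payload` and the choice of -EMSGSIZE are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PROTO_MAX_PAYLOAD (1024u * 1024u) /* mirrors the 1 MiB cap */

/* Reject lengths above the cap; otherwise allocate exactly what the
 * header announced. Returns 0 (with *out set, NULL for an empty
 * payload), -EMSGSIZE, or -ENOMEM. */
static int demo_alloc_payload(uint32_t payload_len, uint8_t **out)
{
	assert(out != NULL);
	*out = NULL;
	if (payload_len > DEMO_PROTO_MAX_PAYLOAD)
		return -EMSGSIZE;
	if (payload_len == 0)
		return 0;
	*out = malloc(payload_len);
	return *out ? 0 : -ENOMEM;
}
```

With the 1 MiB cap, a ~500 KiB 1080p I-frame fits, while a ~2 MiB 4K I-frame is rejected, which matches the sizing notes above.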
/* -- REQ_DECODE / RESP_FRAME payload structures ---------------------- */
@@ -147,17 +153,6 @@ struct daedalus_req_decode {
__u32 capture_plane_size[3];
__u32 capture_plane_stride[3];
__u32 flags;
__u32 reserved0; /* explicit pad to 8-byte align src_pts */
/*
* The V4L2 OUTPUT (bitstream) buffer's vb2 timestamp at submission
* time. The daemon sets pkt->pts = src_pts before
* avcodec_send_packet so libavcodec's display-ordered
* receive_frame can return frame->pts == src_pts of the bitstream
* the frame's slices belong to. Decouples kernel cookie (decode
* order, in-kernel identity) from display order — required for
* H.264 B-frame correctness (daedalus-v4l2#6).
*/
__u64 src_pts;
};
/**
@@ -224,31 +219,6 @@ enum daedalus_decode_status {
* Fixed size — keeps wire parsing simple. No variable-length
* pixel data in Phase 8.4; dmabuf in Phase 8.5 carries that.
*/
/**
* DAEDALUS_RESP_FLAG_HAS_PIXELS - this RESP delivers a decoded frame's
* pixels. The owning CAPTURE buffer is identified by output_src_pts
* (matched against an in-flight item's src_pts on the kernel side),
* NOT by the chardev message header's cookie. Required since
* libavcodec's H.264 decoder reorders to display order — the cookie
* the daemon just received the REQ on may not be the cookie whose
* bitstream produced the frame just popped from receive_frame.
*
* DAEDALUS_RESP_FLAG_SRC_CONSUMED - the chardev header's cookie's
* OUTPUT bitstream buffer is done from the daemon's perspective
* (libavcodec has accepted the slice data via avcodec_send_packet).
* Kernel releases src_buf for the cookie and runs job_finish so the
* m2m scheduler can dispatch the next REQ. Independent of any
* pixel delivery — the dst_buf paired with this cookie may still
* be parked, awaiting a future RESP with HAS_PIXELS + matching
* output_src_pts.
*
* Both flags may be set in a single message (steady-state path with
* no codec reorder lag — the just-sent packet immediately yielded a
* frame whose pts == this REQ's src_pts).
*/
#define DAEDALUS_RESP_FLAG_HAS_PIXELS 0x00000001u
#define DAEDALUS_RESP_FLAG_SRC_CONSUMED 0x00000002u
struct daedalus_resp_frame {
__u32 status;
__u32 codec_id;
@@ -258,16 +228,7 @@ struct daedalus_resp_frame {
__u32 luma_len;
__u32 chroma_len;
__u32 fnv1a_yuv;
__u32 flags; /* bitmask of DAEDALUS_RESP_FLAG_* */
__u32 reserved0; /* explicit pad to 8-byte align output_src_pts */
/*
* Set when DAEDALUS_RESP_FLAG_HAS_PIXELS is in flags. Identifies
* which OUTPUT bitstream's slices produced the pixels in this
* RESP — kernel completes the CAPTURE buffer whose inflight item
* has src_pts == output_src_pts. Ignored when HAS_PIXELS is
* clear.
*/
__u64 output_src_pts;
__u32 reserved;
};
/* -- chardev ioctl ABI ----------------------------------------------- */
@@ -167,6 +167,26 @@ static int daedalus_chardev_release(struct inode *inode, struct file *file)
}
mutex_unlock(&dev->req_lock);
/*
* Drain the V4L2-side in-flight list before the daemon goes
* away. Any REQ_DECODE we already sent to the daemon won't
* get a matching RESP_FRAME — without this drain,
* v4l2_m2m_cancel_job() in the V4L2 consumer's close() path
* (or in vb2's STREAMOFF path) blocks forever waiting for a
* job_finish that will never arrive, and the consumer becomes
* unkillable D-state. Issue #146.
*
* Done AFTER draining the request queue: any REQ_DECODE still
* sitting in dev->req_queue is by definition not yet "in
* flight" (the kernel never released it to the daemon), so it
* doesn't need the m2m-job-finish dance — freeing the message
* is sufficient. The inflight list holds entries the kernel
* already committed to (added in device_run after the message
* was queued or written), which is exactly what needs to be
* failed back to vb2 here.
*/
daedalus_drain_inflight_on_disconnect();
mutex_lock(&dev->open_lock);
dev->opened = 0;
mutex_unlock(&dev->open_lock);
@@ -611,28 +611,8 @@ struct daedalus_inflight {
struct list_head list;
u32 cookie;
struct daedalus_ctx *ctx;
/*
* src_buf / dst_buf decouple in the daedalus-v4l2#6 reorder fix.
* src_buf is cleared (NULL'd) when DAEDALUS_RESP_FLAG_SRC_CONSUMED
* arrives — that signals libavcodec has accepted the bitstream
* even if no display-order frame is ready yet. dst_buf is cleared
* when DAEDALUS_RESP_FLAG_HAS_PIXELS arrives — the daemon has
* written pixels into this CAPTURE buffer. When both are NULL
* the inflight entry is removed and freed.
*/
struct vb2_v4l2_buffer *src_buf;
struct vb2_v4l2_buffer *dst_buf;
/*
* src_buf->vb2_buf.timestamp captured at device_run time.
* Mirrored into REQ_DECODE.src_pts so the daemon can set
* pkt->pts = src_pts on avcodec_send_packet, and read back
* frame->pts to identify which OUTPUT bitstream produced the
* current display-order frame. Kept here so the kernel can
* stamp dst_buf.timestamp explicitly at HAS_PIXELS time even
* though V4L2_BUF_FLAG_TIMESTAMP_COPY's automatic src->dst
* pairing no longer applies (src/dst lifecycles decoupled).
*/
u64 src_pts;
/*
* Captured media_request the src_buf was bound to (if any).
* Set by device_run from src_buf->vb2_buf.req_obj.req;
@@ -643,22 +623,16 @@ struct daedalus_inflight {
struct media_request *req;
};
/*
* Peek (don't remove). The split-completion path may receive
* multiple RESP_FRAME messages on a single inflight item (one for
* SRC_CONSUMED, one for HAS_PIXELS — possibly separated in time if
* libavcodec held the picture for display reorder). Caller removes
* the entry only when both src_buf and dst_buf have been cleared
* from inside the inflight lock.
*/
static struct daedalus_inflight *
daedalus_inflight_peek_locked(struct daedalus_dev *dev, u32 cookie)
daedalus_inflight_pop_locked(struct daedalus_dev *dev, u32 cookie)
{
struct daedalus_inflight *e;
list_for_each_entry(e, &dev->inflight, list) {
if (e->cookie == cookie)
if (e->cookie == cookie) {
list_del(&e->list);
return e;
}
}
return NULL;
}
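daedalus_drain_inflight_on_disconnect() is called from the chardev release path, but its body is outside this diff. Per the commit message, it does three things:

- walks the in-flight list;
- fails src+dst with VB2_BUF_STATE_ERROR via v4l2_m2m_buf_done_and_job_finish();
- releases any bound media_request.

The userspace sketch below models only the list handling. It uses a singly linked list instead of list_head and has no locking. An optional callback stands in for the vb2 completion. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for struct daedalus_inflight on dev->inflight. */
struct demo_inflight {
	struct demo_inflight *next;
	uint32_t cookie;
};

/* Unlink and return the entry for cookie, or NULL (cf. *_pop_locked). */
static struct demo_inflight *demo_inflight_pop(struct demo_inflight **head,
					       uint32_t cookie)
{
	assert(head != NULL);
	for (struct demo_inflight **pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->cookie == cookie) {
			struct demo_inflight *e = *pp;

			*pp = e->next;
			e->next = NULL;
			return e;
		}
	}
	return NULL;
}

/* Disconnect drain: hand every remaining cookie to fail_cb (where the
 * kernel would error out src+dst and finish the m2m job), free the
 * entry, and return how many were drained. Leaves *head empty. */
static unsigned demo_inflight_drain(struct demo_inflight **head,
				    void (*fail_cb)(uint32_t cookie))
{
	unsigned n = 0;

	assert(head != NULL);
	while (*head) {
		struct demo_inflight *e = *head;

		*head = e->next;
		if (fail_cb)
			fail_cb(e->cookie);
		free(e);
		n++;
	}
	return n;
}
```

Because every entry is finished rather than left pending, a later v4l2_m2m_cancel_job() has no running job to wait on. That is the D-state the release-path comment describes.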
@@ -731,7 +705,6 @@ static void daedalus_device_run(void *priv)
size_t blen, payload_len;
u32 cookie;
int ret;
bool claimed = false; /* src/dst removed from m2m rdy_queue */
src_buf = v4l2_m2m_next_src_buf(ctx->m2m_ctx);
dst_buf = v4l2_m2m_next_dst_buf(ctx->m2m_ctx);
@@ -822,17 +795,6 @@ static void daedalus_device_run(void *priv)
req->codec_id = cid;
req->bitstream_len = (u32) blen;
/*
* Ferry the OUTPUT buffer's vb2 timestamp through to the
* daemon for the H.264 B-frame display-reorder fix
* (daedalus-v4l2#6). Daemon sets pkt->pts = src_pts before
* avcodec_send_packet; libavcodec stamps frame->pts with
* the same value when it eventually outputs the frame in
* display order, letting the daemon route HAS_PIXELS RESPs
* to the correct cookie even when libavcodec's display
* order disagrees with V4L2's decode submission order.
*/
req->src_pts = (u64) src_buf->vb2_buf.timestamp;
req->capture_width = ctx->dst_fmt.width;
req->capture_height = ctx->dst_fmt.height;
req->capture_pix_fmt = ctx->dst_fmt.pixelformat;
@@ -857,34 +819,11 @@ static void daedalus_device_run(void *priv)
 	inf = kzalloc(sizeof(*inf), GFP_KERNEL);
 	if (!inf)
 		goto fail_buf_error;
-	/*
-	 * Take both buffers off the m2m ready-queue HERE — before the
-	 * inflight list grows. Once src_consumed releases the src side
-	 * and the m2m scheduler can dispatch the next device_run, the
-	 * NEW device_run mustn't see this dst_buf (which we're still
-	 * holding for a future HAS_PIXELS). Without this claim,
-	 * v4l2_m2m_next_dst_buf at the next device_run returns the same
-	 * parked dst_buf, two inflight entries reference it, and the
-	 * later HAS_PIXELS triggers a list_del on an already-removed
-	 * vb2_buffer → kernel panic (observed on Pi CM5 hard reboot
-	 * during mpv vaapi-copy playback of 720p H.264, 2026-05-21).
-	 *
-	 * Both helpers are inline list_del+counter-decrement under the
-	 * q_ctx rdy_spinlock — safe to call from device_run on the
-	 * buffer we just peeked via next_*_buf above. Mirrors the
-	 * amphion vdec/venc pattern.
-	 */
-	v4l2_m2m_src_buf_remove_by_buf(ctx->m2m_ctx, src_buf);
-	v4l2_m2m_dst_buf_remove_by_buf(ctx->m2m_ctx, dst_buf);
-	claimed = true;
 	cookie = daedalus_next_cookie();
 	inf->cookie = cookie;
 	inf->ctx = ctx;
 	inf->src_buf = src_buf;
 	inf->dst_buf = dst_buf;
-	inf->src_pts = req->src_pts;
 	/*
 	 * Capture the bound media_request (if any) so the
 	 * completion path can call v4l2_ctrl_request_complete +
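The claim comment turns on the difference between two operations on the ready queue:

- peeking at its head, as `v4l2_m2m_next_*_buf` does;
- unlinking one specific buffer, as `v4l2_m2m_*_buf_remove_by_buf` does.

The toy doubly linked FIFO below makes that failure mode concrete. It is a hypothetical userland model, not the v4l2-mem2mem implementation. The names `rq_peek`, `rq_remove_by_buf`, `rq_pop` and `rq_error_unlink` are invented, and the `queued` assertion stands in for the list corruption a second unlink would cause in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vb2 buffer sitting on an m2m ready queue. */
struct buf {
	int id;
	bool queued;             /* currently linked on the ready queue? */
	struct buf *prev, *next;
};

struct rdy_queue {
	struct buf *head, *tail;
};

static void rq_push(struct rdy_queue *q, struct buf *b)
{
	b->prev = q->tail;
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	b->queued = true;
}

/* Like v4l2_m2m_next_*_buf(): look at the head, leave it queued. */
static struct buf *rq_peek(const struct rdy_queue *q)
{
	return q->head;
}

/* Like v4l2_m2m_*_buf_remove_by_buf(): unlink one specific buffer. */
static void rq_remove_by_buf(struct rdy_queue *q, struct buf *b)
{
	assert(b->queued); /* a second unlink would corrupt the list */
	if (b->prev)
		b->prev->next = b->next;
	else
		q->head = b->next;
	if (b->next)
		b->next->prev = b->prev;
	else
		q->tail = b->prev;
	b->prev = b->next = NULL;
	b->queued = false;
}

/* Like v4l2_m2m_*_buf_remove(): unlink whatever is at the head. */
static struct buf *rq_pop(struct rdy_queue *q)
{
	struct buf *b = q->head;

	if (b)
		rq_remove_by_buf(q, b);
	return b;
}

/* Error path with a `claimed` guard: a claimed buffer is already off the
 * queue, so popping the head again would unlink a different buffer. */
static struct buf *rq_error_unlink(struct rdy_queue *q, bool claimed)
{
	return claimed ? NULL : rq_pop(q);
}
```

Without a claim, two consecutive peeks return the same buffer. That is the "two inflight entries reference it" case. After `rq_remove_by_buf`, the next peek moves on. `rq_error_unlink` mirrors the `if (!claimed)` guard in `fail_buf_error`.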
@@ -932,13 +871,11 @@ static void daedalus_device_run(void *priv)
 fail_buf_error:
 	if (src_buf) {
-		if (!claimed)
-			v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
+		v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
 		v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
 	}
 	if (dst_buf) {
-		if (!claimed)
-			v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
+		v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
 		v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);
 	}
 	kfree(req);
@@ -952,185 +889,179 @@ static const struct v4l2_m2m_ops daedalus_m2m_ops = {
 /* -- chardev RESP_FRAME → buf_done bridge ---------------------------- */
-/*
- * Pack the daemon's pixel delivery into the inflight item's CAPTURE
- * buffer. Called from daedalus_complete_resp_frame on the
- * HAS_PIXELS branch, after the lock has been dropped (vb2 ops may
- * sleep / take their own locks). The dst_buf reference was
- * snapshotted under the inflight lock and cleared from the entry,
- * so no other RESP can race for this buffer.
- *
- * pixels_len == 0 → dmabuf path (Phase 8.6+); the daemon mmap'd the
- * CAPTURE plane via GET_DMABUF and wrote pixels in place; we just
- * set the plane payloads. pixels_len > 0 → legacy Phase 8.5 inline
- * NV12 path; we memcpy from the chardev payload.
- */
-static void daedalus_pack_pixels_into_dst(struct vb2_v4l2_buffer *dst_buf,
-					  const struct daedalus_resp_frame *fr,
-					  const u8 *pixels, size_t pixels_len)
-{
-	struct vb2_buffer *vb = &dst_buf->vb2_buf;
-	void *dst_y, *dst_uv;
-	u32 y_size, uv_size;
-	unsigned int p;
-	if (pixels_len) {
-		y_size = min_t(u32, fr->luma_len,
-			       (u32) vb2_plane_size(vb, 0));
-		uv_size = vb->num_planes > 1 ?
-			min_t(u32, fr->chroma_len,
-			      (u32) vb2_plane_size(vb, 1)) : 0;
-		dst_y = vb2_plane_vaddr(vb, 0);
-		dst_uv = vb->num_planes > 1 ?
-			vb2_plane_vaddr(vb, 1) : NULL;
-		if (dst_y && y_size && pixels_len >= y_size)
-			memcpy(dst_y, pixels, y_size);
-		else
-			y_size = 0;
-		if (dst_uv && uv_size &&
-		    pixels_len >= y_size + uv_size)
-			memcpy(dst_uv, pixels + y_size, uv_size);
-		else
-			uv_size = 0;
-		vb2_set_plane_payload(vb, 0, y_size);
-		if (vb->num_planes > 1)
-			vb2_set_plane_payload(vb, 1, uv_size);
-	} else {
-		for (p = 0; p < vb->num_planes; p++)
-			vb2_set_plane_payload(vb, p,
-					      vb2_plane_size(vb, p));
-	}
-}
 void daedalus_complete_resp_frame(u32 cookie,
 				  const struct daedalus_resp_frame *fr,
 				  const u8 *pixels, size_t pixels_len)
 {
 	struct daedalus_dev *dev = g_daedalus_dev;
 	struct daedalus_inflight *inf;
-	struct daedalus_ctx *ctx = NULL;
-	struct vb2_v4l2_buffer *src_to_complete = NULL;
-	struct vb2_v4l2_buffer *dst_to_complete = NULL;
-	struct media_request *req_to_complete = NULL;
 	enum vb2_buffer_state state;
-	u64 dst_timestamp = 0;
-	bool entry_freed = false;
-	bool has_pixels, src_consumed;
+	void *dst_y, *dst_uv;
+	u32 y_size, uv_size;
 	if (!dev)
 		return;
-	state = (fr->status == DAEDALUS_DECODE_OK)
-		? VB2_BUF_STATE_DONE : VB2_BUF_STATE_ERROR;
-	has_pixels = !!(fr->flags & DAEDALUS_RESP_FLAG_HAS_PIXELS);
-	src_consumed = !!(fr->flags & DAEDALUS_RESP_FLAG_SRC_CONSUMED);
-	if (!has_pixels && !src_consumed) {
-		pr_warn_ratelimited(
-			"daedalus_v4l2: RESP_FRAME cookie=%u with neither HAS_PIXELS nor SRC_CONSUMED — ignoring\n",
-			cookie);
-		return;
-	}
 	mutex_lock(&dev->inflight_lock);
-	inf = daedalus_inflight_peek_locked(dev, cookie);
+	inf = daedalus_inflight_pop_locked(dev, cookie);
+	mutex_unlock(&dev->inflight_lock);
 	if (!inf) {
-		mutex_unlock(&dev->inflight_lock);
 		pr_warn_ratelimited(
 			"daedalus_v4l2: RESP_FRAME for unknown cookie=%u\n",
 			cookie);
 		return;
 	}
-	ctx = inf->ctx;
+	state = (fr->status == DAEDALUS_DECODE_OK)
+		? VB2_BUF_STATE_DONE : VB2_BUF_STATE_ERROR;
 	/*
-	 * Snapshot what this RESP completes and clear the matching
-	 * fields on the inflight item, so concurrent RESPs (e.g. a
-	 * later HAS_PIXELS arriving on the same cookie after this
-	 * SRC_CONSUMED clears src_buf) see the correct residual
-	 * state. Actual vb2 buf_done calls happen below the lock.
+	 * Two routes the daemon can take, both supported:
 	 *
-	 * Sanity check on output_src_pts only when HAS_PIXELS is
-	 * set — the daemon's output_src_pts should equal this
-	 * inflight's stored src_pts, since the daemon routes pixels
-	 * to the cookie of the OUTPUT bitstream that contained the
-	 * frame's slices (which is what we stored at device_run time).
-	 * Surface a mismatch loudly — indicates daemon-side pts→cookie
-	 * mapping bug, not silent data corruption.
+	 * (a) dmabuf path (Phase 8.6+) — daemon called
+	 *     DAEDALUS_IOC_GET_DMABUF, mmap'd the CAPTURE buffer,
+	 *     wrote pixels in place. RESP_FRAME carries metadata
+	 *     only (pixels_len == 0). Each plane's payload is
+	 *     the full plane size (the daemon wrote everything
+	 *     the format requires).
+	 *
+	 * (b) Phase 8.5 inline path — daemon shipped raw NV12 in
+	 *     the chardev payload (≤ 64 KiB cap). We memcpy
+	 *     into the vb2 buffer. Plane payloads come from
+	 *     the daemon's NV12 luma/chroma counts.
 	 */
-	if (has_pixels) {
-		if (fr->output_src_pts != inf->src_pts)
-			pr_warn_ratelimited(
-				"daedalus_v4l2: RESP HAS_PIXELS cookie=%u output_src_pts=%llu but inflight.src_pts=%llu — daemon dispatch bug?\n",
-				cookie,
-				(unsigned long long) fr->output_src_pts,
-				(unsigned long long) inf->src_pts);
+	if (state == VB2_BUF_STATE_DONE) {
+		struct vb2_buffer *vb = &inf->dst_buf->vb2_buf;
+		unsigned int p;
-		dst_to_complete = inf->dst_buf;
-		dst_timestamp = inf->src_pts;
-		inf->dst_buf = NULL;
+		if (pixels_len) {
+			/* (b) inline NV12 copy — legacy 2-plane only */
+			y_size = min_t(u32, fr->luma_len,
+				       (u32) vb2_plane_size(vb, 0));
+			uv_size = vb->num_planes > 1 ?
+				min_t(u32, fr->chroma_len,
+				      (u32) vb2_plane_size(vb, 1)) : 0;
+			dst_y = vb2_plane_vaddr(vb, 0);
+			dst_uv = vb->num_planes > 1 ?
+				vb2_plane_vaddr(vb, 1) : NULL;
+			if (dst_y && y_size && pixels_len >= y_size)
+				memcpy(dst_y, pixels, y_size);
+			else
+				y_size = 0;
+			if (dst_uv && uv_size &&
+			    pixels_len >= y_size + uv_size)
+				memcpy(dst_uv, pixels + y_size, uv_size);
+			else
+				uv_size = 0;
+			vb2_set_plane_payload(vb, 0, y_size);
+			if (vb->num_planes > 1)
+				vb2_set_plane_payload(vb, 1, uv_size);
+		} else {
+			/* (a) dmabuf path: plane is fully populated by
+			 * the daemon, so payload == sizeimage. */
+			for (p = 0; p < vb->num_planes; p++)
+				vb2_set_plane_payload(vb, p,
+						      vb2_plane_size(vb, p));
+		}
 	}
-	if (src_consumed) {
-		src_to_complete = inf->src_buf;
-		req_to_complete = inf->req;
-		inf->src_buf = NULL;
-		inf->req = NULL;
-	}
+	/*
+	 * Phase 8.14: if the src_buf was bound to a media_request
+	 * (libva-driven decode path), complete the per-request
+	 * control state BEFORE buf_done_and_job_finish. vb2-core's
+	 * buf_done unbinds the buffer's req_obj on its own, but the
+	 * control object stays bound until v4l2_ctrl_request_complete
+	 * runs — only after BOTH objects unbind does the request
+	 * transition to MEDIA_REQUEST_STATE_COMPLETE and wake any
+	 * userspace poll on the request fd.
+	 *
+	 * For non-request flows (test_m2m_stream direct QBUF) inf->req
+	 * is NULL and v4l2_ctrl_request_complete just no-ops.
+	 */
+	if (inf->req)
+		v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
-	if (!inf->src_buf && !inf->dst_buf) {
-		list_del(&inf->list);
-		entry_freed = true;
-	}
+	/*
+	 * Use the buf_done_and_job_finish helper rather than plain
+	 * buf_done + job_finish: the helper pops the buffers off
+	 * the m2m queue before marking them done, otherwise the
+	 * scheduler immediately re-runs device_run on the same
+	 * still-queued src buffer. Caught during Phase 8.5 first
+	 * run — second REQ_DECODE with identical bitstream + oops
+	 * in stop_streaming when the test client tore down.
+	 */
+	v4l2_m2m_buf_done_and_job_finish(dev->m2m_dev, inf->ctx->m2m_ctx,
+					 state);
+	/*
+	 * Release our reference taken in device_run; safe to do
+	 * AFTER buf_done_and_job_finish (which dropped the vb2
+	 * reference) because we still hold this one. If the
+	 * refcount hits zero here, media-core releases the request.
+	 */
+	if (inf->req)
+		media_request_put(inf->req);
+	kfree(inf);
+}
+/* -- daemon disconnect drain ----------------------------------------- */
+void daedalus_drain_inflight_on_disconnect(void)
+{
+	struct daedalus_dev *dev = g_daedalus_dev;
+	struct daedalus_inflight *inf, *tmp;
+	LIST_HEAD(local);
+	if (!dev)
+		return;
+	/*
+	 * Splice the in-flight list onto a local list under the lock,
+	 * then process each entry with the lock dropped — every
+	 * v4l2_m2m_buf_done_and_job_finish call may itself try to
+	 * re-enter device_run via the scheduler (which would need to
+	 * walk dev->inflight again on a future REQ_DECODE), and
+	 * v4l2_m2m_buf_done can sleep via vb2's buffer-done dispatch.
+	 * Holding inflight_lock across either is a deadlock invitation.
+	 */
+	mutex_lock(&dev->inflight_lock);
+	list_splice_init(&dev->inflight, &local);
 	mutex_unlock(&dev->inflight_lock);
-	/*
-	 * Complete the CAPTURE side first (when applicable). vb2-core's
-	 * V4L2_BUF_FLAG_TIMESTAMP_COPY semantics no longer auto-copy
-	 * src→dst timestamps because src and dst are no longer paired
-	 * 1:1 in m2m's view — stamp dst explicitly from the inflight's
-	 * stored src_pts (= the OUTPUT vb2_buf.timestamp captured at
-	 * device_run). The V4L2 client gets the same display-PTS it
-	 * originally set on the OUTPUT side.
-	 */
-	if (dst_to_complete) {
-		if (state == VB2_BUF_STATE_DONE)
-			daedalus_pack_pixels_into_dst(dst_to_complete, fr,
-						      pixels, pixels_len);
-		dst_to_complete->vb2_buf.timestamp = dst_timestamp;
+	list_for_each_entry_safe(inf, tmp, &local, list) {
+		list_del(&inf->list);
+		v4l2_warn(&dev->v4l2_dev,
+			  "draining inflight cookie=%u (daemon disconnect)\n",
+			  inf->cookie);
 		/*
-		 * The buffer was already removed from m2m's rdy_queue at
-		 * device_run time (see the "Take both buffers off ..."
-		 * block). Just call buf_done here — calling
-		 * v4l2_m2m_dst_buf_remove_by_buf again would list_del a
-		 * list_head that's no longer linked, smashing the list.
+		 * Complete the per-request control state before
+		 * buf_done_and_job_finish, same ordering as the success
+		 * path in daedalus_complete_resp_frame(). For non-request
+		 * flows inf->req is NULL and v4l2_ctrl_request_complete
+		 * no-ops.
 		 */
-		v4l2_m2m_buf_done(dst_to_complete, state);
-	}
+		if (inf->req)
+			v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
-	/*
-	 * Complete the OUTPUT side: release the bound media_request's
-	 * controls (libva-driven path), drop our request reference taken
-	 * in device_run, mark src done, then job_finish so the m2m
-	 * scheduler can dispatch the next pending REQ on this ctx. The
-	 * dst_buf for this cookie may still be parked (HAS_PIXELS hasn't
-	 * arrived yet — libavcodec is holding the frame for display-
-	 * order release). That's fine: the next device_run picks a
-	 * different next_dst_buf out of the CAPTURE queue and proceeds.
-	 */
-	if (src_to_complete) {
-		if (req_to_complete)
-			v4l2_ctrl_request_complete(req_to_complete, &ctx->hdl);
-		/* Already off the rdy_queue (see device_run claim) — buf_done only. */
-		v4l2_m2m_buf_done(src_to_complete, state);
-		if (req_to_complete)
-			media_request_put(req_to_complete);
-		v4l2_m2m_job_finish(dev->m2m_dev, ctx->m2m_ctx);
-	}
+		/*
+		 * Mark both buffers ERROR and clear the m2m scheduler's
+		 * job_running flag. This is what unsticks
+		 * v4l2_m2m_cancel_job() inside the consumer's close()
+		 * path; without it, the consumer hangs in TASK_UNINTERRUPTIBLE
+		 * forever (issue #146).
+		 */
+		v4l2_m2m_buf_done_and_job_finish(dev->m2m_dev,
+						 inf->ctx->m2m_ctx,
+						 VB2_BUF_STATE_ERROR);
+		if (inf->req)
+			media_request_put(inf->req);
-	if (entry_freed)
 		kfree(inf);
+	}
 }
 /* -- v4l2_ioctl_ops -------------------------------------------------- */
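`daedalus_drain_inflight_on_disconnect()` detaches the whole in-flight list under `inflight_lock` and only then completes each entry. The reason is that the completion calls may sleep or re-enter code that takes the same lock.

The pthread sketch below shows that splice-then-process pattern. It assumes a singly linked list in place of the kernel's `list_head`. `struct inflight_list`, `drain` and `complete_with_error` are hypothetical names. `complete_with_error` only marks the node at the point where the driver calls `v4l2_m2m_buf_done_and_job_finish()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical in-flight entry; a singly linked list replaces list_head. */
struct node {
	unsigned int cookie;
	int state;           /* 0 = pending, -1 = failed by the drain */
	struct node *next;
};

struct inflight_list {
	pthread_mutex_t lock;
	struct node *head;
};

/* Where the driver would call v4l2_m2m_buf_done_and_job_finish(..., ERROR).
 * Assumed to be allowed to sleep or take l->lock, so it runs unlocked. */
static void complete_with_error(struct node *n)
{
	n->state = -1;
}

/* Detach the whole list under the lock, then complete entries lock-free. */
static unsigned int drain(struct inflight_list *l)
{
	struct node *local, *n, *next;
	unsigned int count = 0;

	assert(l != NULL);
	pthread_mutex_lock(&l->lock);
	local = l->head;     /* analogous to list_splice_init(&dev->inflight, &local) */
	l->head = NULL;
	pthread_mutex_unlock(&l->lock);

	for (n = local; n; n = next) {
		next = n->next;  /* save before completion, as list_for_each_entry_safe does */
		n->next = NULL;
		complete_with_error(n);
		count++;
	}
	return count;
}
```

The lock is released before the loop. In this model, a completion callback could therefore take `l->lock` again without deadlocking, and a concurrent submitter sees an empty list rather than a partially drained one.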
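The snapshot-based variant of `daedalus_complete_resp_frame()` in this hunk (the one with `has_pixels`, `src_consumed` and `entry_freed`) completes a cookie in two halves:

- a RESP with `SRC_CONSUMED` hands back the OUTPUT buffer;
- a RESP with `HAS_PIXELS` hands back the CAPTURE buffer;
- the entry is unlinked only once both pointers are cleared.

The userland sketch below models just that bookkeeping. The flag values and the names `struct entry`, `struct resp_result` and `apply_resp` are hypothetical, since the real `DAEDALUS_RESP_FLAG_*` values are not shown in this diff. Locking and vb2 calls are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real DAEDALUS_RESP_FLAG_* values are not in this diff. */
#define RESP_HAS_PIXELS   (1u << 0)
#define RESP_SRC_CONSUMED (1u << 1)

struct entry {
	void *src_buf; /* OUTPUT half, cleared by SRC_CONSUMED */
	void *dst_buf; /* CAPTURE half, cleared by HAS_PIXELS */
};

struct resp_result {
	void *src_done;   /* buffer to hand back on the OUTPUT queue, or NULL */
	void *dst_done;   /* buffer to hand back on the CAPTURE queue, or NULL */
	bool entry_freed; /* both halves done: unlink and free the entry */
};

/* Snapshot what one RESP completes and clear those fields on the entry.
 * Returns false, leaving the entry untouched, if neither flag is set. */
static bool apply_resp(struct entry *e, unsigned int flags,
		       struct resp_result *out)
{
	assert(e != NULL && out != NULL);
	*out = (struct resp_result){ NULL, NULL, false };
	if (!(flags & (RESP_HAS_PIXELS | RESP_SRC_CONSUMED)))
		return false;
	if (flags & RESP_HAS_PIXELS) {
		out->dst_done = e->dst_buf;
		e->dst_buf = NULL;
	}
	if (flags & RESP_SRC_CONSUMED) {
		out->src_done = e->src_buf;
		e->src_buf = NULL;
	}
	out->entry_freed = !e->src_buf && !e->dst_buf;
	return true;
}
```

A RESP carrying neither flag is rejected, matching the early `pr_warn_ratelimited` return. When `SRC_CONSUMED` arrives first, the CAPTURE half stays parked until `HAS_PIXELS` follows. This is the case the comment describes for a frame libavcodec holds back for display-order release.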
+23
@@ -103,4 +103,27 @@ void daedalus_complete_resp_frame(u32 cookie,
 int daedalus_export_capture_dmabuf(u32 cookie, u32 plane, u32 flags,
 				   int *out_fd);
+/**
+ * daedalus_drain_inflight_on_disconnect() - fail all in-flight m2m jobs
+ *
+ * Called from daedalus_chardev_release() when the daemon disconnects
+ * (graceful close, SIGKILL, daemon crash — anything that triggers
+ * chardev release). Walks the in-flight list and, for every entry,
+ * marks both src+dst buffers VB2_BUF_STATE_ERROR and calls
+ * v4l2_m2m_buf_done_and_job_finish() to clear the m2m scheduler's
+ * "job_running" flag.
+ *
+ * Without this, v4l2_m2m_cancel_job() (called from
+ * v4l2_m2m_ctx_release() during the consumer's close() / task exit)
+ * blocks forever waiting for a job_finish that the dead daemon will
+ * never send — the consumer enters TASK_UNINTERRUPTIBLE and survives
+ * SIGKILL until reboot. See issue #146 for the full trace.
+ *
+ * Safe to call with an empty in-flight list; no-op in that case.
+ * Must NOT be called from atomic context — uses inflight_lock
+ * (sleeping mutex) and v4l2_m2m_buf_done_and_job_finish (which can
+ * sleep via vb2 buffer-done dispatch).
+ */
+void daedalus_drain_inflight_on_disconnect(void);
 #endif /* DAEDALUS_V4L2_MAIN_H */
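The root cause in this header comment is a wait with no remaining waker. `v4l2_m2m_cancel_job()` blocks until the scheduler's job-running state clears, and only a job finish clears it.

The userland model below reproduces that handshake with a mutex and a condition variable. It is an analogy under assumptions, not the mem2mem code. `struct sched`, `cancel_job` and `job_finish` are hypothetical names. Also, the consumer in issue #146 ends up in TASK_UNINTERRUPTIBLE, whereas this model only blocks a thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the m2m scheduler's "job running" state. */
struct sched {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	bool job_running;
};

/* Consumer close(): wait until no job is running. If nobody is left to
 * finish the job, this wait never ends (the issue #146 hang). */
static void *cancel_job(void *arg)
{
	struct sched *s = arg;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	while (s->job_running)
		pthread_cond_wait(&s->idle, &s->lock);
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Disconnect drain: fail the job and clear the flag, waking every waiter. */
static void job_finish(struct sched *s)
{
	pthread_mutex_lock(&s->lock);
	s->job_running = false;
	pthread_cond_broadcast(&s->idle);
	pthread_mutex_unlock(&s->lock);
}
```

Without the `job_finish()` call, `pthread_join()` on the cancelling thread never returns, which is the userland analogue of the D-state hang. The `while` loop makes the outcome independent of whether the finish happens before or after the waiter starts waiting.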