daemon: per-frame decode_us + periodic stats (#11 step 1) #15

Merged
marfrit merged 1 commit from noether/daemon-decode-stats into main 2026-05-21 18:26:50 +00:00

Step 1 of the daemon-rewrite arc in #11. Pure observability — no behaviour change to decode.

## Changes

- Per-frame `decoder: OK` log line gains `decode_us=N` (libavcodec round-trip wall time only; excludes bitstream pack / SPS-PPS synth / plane pack).
- New `decoder stats` summary line every 60 decoded frames: codec, frame count, window seconds, fps, avg decode_us, MBs/s throughput, B/MB bitrate.
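
A minimal sketch of how a `decode_us` value like this could be measured, assuming the timed region is just the libavcodec round trip (presumably `avcodec_send_packet` followed by `avcodec_receive_frame`). The names `elapsed_us`, `timed_decode`, `decode_fn` and `stub_decode` are hypothetical and do not come from the daemon source; the callback stands in for the real decode call so the example stays self-contained.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Microseconds elapsed between two CLOCK_MONOTONIC readings. */
static int64_t elapsed_us(const struct timespec *t0, const struct timespec *t1)
{
    int64_t ns = (int64_t)(t1->tv_sec - t0->tv_sec) * 1000000000LL
               + (int64_t)(t1->tv_nsec - t0->tv_nsec);
    return ns / 1000;
}

/* Hypothetical stand-in for the send_packet + receive_frame round trip. */
typedef int (*decode_fn)(void *ctx);

/* Time only the decode call; packing work before or after it is excluded. */
static int timed_decode(decode_fn fn, void *ctx, int64_t *decode_us)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int ret = fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *decode_us = elapsed_us(&t0, &t1);
    return ret;
}

/* Illustrative decoder stub: counts how often it was invoked. */
static int stub_decode(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

Calling `timed_decode(stub_decode, &calls, &us)` returns the stub's result and leaves the wall-clock cost of that single call in `us`. Keeping the two clock reads immediately around the call is what keeps bitstream packing and plane packing out of the measurement.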

## Baseline on higgs (Pi CM5, 720p H.264 / bbb)

| frames | window | fps | avg decode_us | MBs/s | B/MB |
|---|---|---|---|---|---|
| 60 | 2.20 s | 27.33 | 2690 | 98 K | 1.03 |
| 120 | 4.70 s | 25.54 | 3122 | 92 K | 1.18 |
| 240 | 9.70 s | 24.75 | 4104 | 89 K | 1.55 |
| 360 | 14.82 s | 24.29 | 4422 | 88 K | 1.66 |
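
The MBs/s column can be cross-checked from the frame geometry. The sketch below assumes the bbb clip is 1280×720 (the usual meaning of "720p") and that throughput is counted in 16×16 macroblocks; the helper names `mbs_per_frame` and `mbs_per_second` are hypothetical.

```c
/* Number of 16x16 macroblocks in a frame, rounding partial MBs up. */
static unsigned mbs_per_frame(unsigned width, unsigned height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Macroblock throughput over a measurement window, in MBs per second. */
static double mbs_per_second(unsigned width, unsigned height,
                             unsigned frames, double window_s)
{
    if (window_s <= 0.0)
        return 0.0; /* avoid dividing by an empty window */
    return (double)mbs_per_frame(width, height) * frames / window_s;
}
```

Under that assumption a frame holds 80 × 45 = 3600 macroblocks, so 60 frames in 2.20 s works out to roughly 98 K MBs/s, in line with the first row. The 360-frame row comes out near 87.4 K rather than the listed 88 K, which may simply reflect rounding in the table.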

Steady-state ~4 ms decode_us at ~24 fps. Workload (90 K MBs/s) is 0.05-1 % of daedalus-fourier NEON kernel ceilings for the H.264 primitives shipped today (IDCT 4×4 @ 175 Mblocks/s; deblock luma-v @ 92 Medges/s; qpel mc20 @ 131 Mblocks/s). Substituting any single primitive will only shave a small slice of the 4 ms — most of that time is libavcodec's CABAC + MV prediction + intra prediction. Useful to know before picking the first substitution target in step 2.

Wire protocol unchanged. No kmod change.

marfrit added 1 commit 2026-05-21 18:17:54 +00:00
Establishes observable baseline metrics before any daedalus-fourier
kernel substitution lands.  Step 1 of the daemon-rewrite arc tracked
at daedalus-v4l2#11.

Changes
-------
- Per-frame `decoder: OK ...` log line now carries decode_us=N (the
  send_packet + receive_frame wall-clock cost in microseconds —
  exclusively the libavcodec round-trip, not the bitstream pack /
  SPS-PPS synth / pack-to-planes work).
- New "decoder stats" summary line every DAEDALUS_STATS_EVERY (60)
  decoded frames, reporting: codec, frame count, window seconds,
  fps, avg decode_us, MBs/s throughput, bytes/MB bitrate.

Sample
------
  decoder stats: codec=h264 frames=300 window=12.32s fps=24.35
                 avg_decode_us=4216.4 mbs_per_s=87643 bs_b_per_mb=1.56

What this tells us
------------------
Steady-state on higgs (Pi CM5) decoding bbb_720p_h264.mp4:
~4 ms decode_us, ~90 K MBs/s, well under the daedalus-fourier
NEON kernel ceilings (IDCT 4×4 @ 175 Mblocks/s, deblock @ 92 Medges/s,
qpel mc20 @ 131 Mblocks/s — all 100-1000× over our actual workload).

Means the 4 ms/frame is mostly libavcodec's CABAC + MV prediction +
intra prediction overhead, NOT the pixel-math primitives.
Substituting a single primitive would shave only a small slice of
the 4 ms.  Useful as guidance for the upcoming substitution work —
we'll pick the primitive with the largest cycle cost relative to
the alternative, and measure CPU saved per substitution.

No behaviour change: counters are static + unsynchronised (the
chardev event loop is single-threaded); reset when codec_id changes.
clock_gettime(CLOCK_MONOTONIC) for timing.
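
A rough sketch of the counter scheme described above, under several assumptions: the struct layout, the names `decode_stats`, `g_stats` and `stats_account`, and the numeric `codec=` field are all hypothetical, and the counters are assumed to accumulate since the last reset (the sample line with `frames=300` suggests this, but the source is not shown here). Timestamps are passed in as seconds so the logic can be exercised without a clock.

```c
#include <stdint.h>
#include <stdio.h>

#define STATS_EVERY 60 /* mirrors DAEDALUS_STATS_EVERY (60) from the commit message */

/* Hypothetical counter layout; the daemon's real field names may differ. */
struct decode_stats {
    int      codec_id;
    uint64_t frames;
    uint64_t decode_us_sum;
    uint64_t mbs;
    uint64_t bytes;
    double   start_s;
};

/* Static and unsynchronised: only safe because one thread touches it. */
static struct decode_stats g_stats;

/* Account one decoded frame. Returns 1 if a summary was written to line. */
static int stats_account(int codec_id, double now_s, uint64_t decode_us,
                         uint64_t frame_mbs, uint64_t frame_bytes,
                         char *line, size_t len)
{
    /* Start a fresh window on the first frame or when the codec changes. */
    if (g_stats.frames == 0 || g_stats.codec_id != codec_id)
        g_stats = (struct decode_stats){ .codec_id = codec_id, .start_s = now_s };

    g_stats.frames++;
    g_stats.decode_us_sum += decode_us;
    g_stats.mbs += frame_mbs;
    g_stats.bytes += frame_bytes;

    if (g_stats.frames % STATS_EVERY != 0)
        return 0;

    double window = now_s - g_stats.start_s;
    double fps = window > 0.0 ? (double)g_stats.frames / window : 0.0;
    double mbs_s = window > 0.0 ? (double)g_stats.mbs / window : 0.0;
    double b_mb = g_stats.mbs ? (double)g_stats.bytes / (double)g_stats.mbs : 0.0;
    double avg_us = (double)g_stats.decode_us_sum / (double)g_stats.frames;

    snprintf(line, len,
             "decoder stats: codec=%d frames=%llu window=%.2fs fps=%.2f "
             "avg_decode_us=%.1f mbs_per_s=%.0f bs_b_per_mb=%.2f",
             g_stats.codec_id, (unsigned long long)g_stats.frames,
             window, fps, avg_us, mbs_s, b_mb);
    return 1;
}
```

Because the state is a plain static struct with no locking, this only holds up under the single-threaded event loop the commit message describes; a multi-threaded caller would need atomics or a mutex.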
marfrit merged commit 3bc0da168c into main 2026-05-21 18:26:50 +00:00
marfrit deleted branch noether/daemon-decode-stats 2026-05-21 18:26:50 +00:00
Reference: reauktion/daedalus-v4l2#15