daemon: per-frame decode_us + periodic stats (#11 step 1) #15
Step 1 of the daemon-rewrite arc in #11. Pure observability — no behaviour change to decode.
Changes
- `decoder: OK` log line gains `decode_us=N` (libavcodec round-trip wall time only; excludes bitstream pack / SPS-PPS synth / plane pack).
- `decoder stats` summary line every 60 decoded frames: codec, frame count, window seconds, fps, avg decode_us, MBs/s throughput, B/MB bitrate.

Baseline on higgs (Pi CM5, 720p H.264 / bbb)
Steady-state ~4 ms decode_us at ~24 fps. Workload (90 K MBs/s) is 0.05-1 % of daedalus-fourier NEON kernel ceilings for the H.264 primitives shipped today (IDCT 4×4 @ 175 Mblocks/s; deblock luma-v @ 92 Medges/s; qpel mc20 @ 131 Mblocks/s). Substituting any single primitive will only shave a small slice of the 4 ms — most of that time is libavcodec's CABAC + MV prediction + intra prediction. Useful to know before picking the first substitution target in step 2.
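For orientation, the ~90 K MBs/s figure can be reproduced from the frame size and frame rate alone. The sketch below assumes "720p" means 1280×720 and uses the standard 16×16 H.264 macroblock; the helper names `mbs_per_frame` and `mbs_per_second` are illustrative and not taken from the daemon.

```c
#include <assert.h>
#include <stdint.h>

/* Macroblocks per frame. H.264 macroblocks cover 16x16 luma samples,
 * so each dimension is rounded up to a whole number of macroblocks. */
static uint32_t mbs_per_frame(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Macroblock throughput at a given frame rate. */
static double mbs_per_second(uint32_t width, uint32_t height, double fps)
{
    return (double)mbs_per_frame(width, height) * fps;
}
```

Under these assumptions a 1280×720 frame is 80 × 45 = 3600 macroblocks, so roughly 24 fps works out to about 86,400 MBs/s, in line with the measured baseline.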
Wire protocol unchanged. No kmod change.
Establishes observable baseline metrics before any daedalus-fourier kernel substitution lands. Step 1 of the daemon-rewrite arc tracked at daedalus-v4l2#11.

Changes
-------
- Per-frame `decoder: OK ...` log line now carries decode_us=N (the send_packet + receive_frame wall-clock cost in microseconds; exclusively the libavcodec round-trip, not the bitstream pack / SPS-PPS synth / pack-to-planes work).
- New "decoder stats" summary line every DAEDALUS_STATS_EVERY (60) decoded frames, reporting: codec, frame count, window seconds, fps, avg decode_us, MBs/s throughput, bytes/MB bitrate.

Sample
------
decoder stats: codec=h264 frames=300 window=12.32s fps=24.35 avg_decode_us=4216.4 mbs_per_s=87643 bs_b_per_mb=1.56

What this tells us
------------------
Steady-state on higgs (Pi CM5) decoding bbb_720p_h264.mp4: ~4 ms decode_us, ~90 K MBs/s, well under the daedalus-fourier NEON kernel ceilings (IDCT 4×4 @ 175 Mblocks/s, deblock @ 92 Medges/s, qpel mc20 @ 131 Mblocks/s; all 100-1000× over our actual workload). Means the 4 ms/frame is mostly libavcodec's CABAC + MV prediction + intra prediction overhead, NOT the pixel-math primitives. Substituting a single primitive would shave only a small slice of the 4 ms. Useful as guidance for the upcoming substitution work: we'll pick the primitive with the largest cycle cost relative to the alternative, and measure CPU saved per substitution.

No behaviour change: counters are static + unsynchronised (the chardev event loop is single-threaded); reset when codec_id changes. clock_gettime(CLOCK_MONOTONIC) for timing.
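The timing and summary described above could be sketched roughly as follows. This is not the daemon's code: the struct, the function names, and `fake_decode` (a placeholder for the `avcodec_send_packet()` + `avcodec_receive_frame()` pair) are hypothetical, and `STATS_EVERY` stands in for `DAEDALUS_STATS_EVERY`. It also assumes, based on the sample line (frames=300 with a 60-frame reporting interval), that the counters accumulate since the last reset rather than per 60-frame window.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define STATS_EVERY 60 /* stands in for DAEDALUS_STATS_EVERY */

/* Hypothetical accumulator; single-threaded use, so no locking. */
struct decode_stats {
    uint32_t frames;        /* frames decoded since the last reset */
    uint64_t decode_us_sum; /* total decode time in microseconds */
    uint64_t mbs;           /* total macroblocks decoded */
    uint64_t bytes;         /* total bitstream bytes consumed */
    struct timespec start;  /* CLOCK_MONOTONIC time of the reset */
};

/* Microseconds from a to b, computed in nanoseconds to handle tv_nsec wrap. */
static int64_t elapsed_us(const struct timespec *a, const struct timespec *b)
{
    int64_t ns = (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
               + (int64_t)(b->tv_nsec - a->tv_nsec);
    return ns / 1000;
}

/* Placeholder for the real libavcodec send/receive round-trip. */
static int fake_decode(void *ctx) { (void)ctx; return 0; }

/* Time only the decode call, which is what decode_us is described to cover. */
static int64_t time_decode_us(int (*decode)(void *), void *ctx, int *ret)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *ret = decode(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(&t0, &t1);
}

/* Clear all counters, e.g. when codec_id changes. */
static void stats_reset(struct decode_stats *s, const struct timespec *now)
{
    memset(s, 0, sizeof *s);
    s->start = *now;
}

/* Account one decoded frame; returns 1 when a summary line is due. */
static int stats_add(struct decode_stats *s, int64_t decode_us,
                     uint32_t frame_mbs, uint32_t pkt_bytes)
{
    s->frames++;
    s->decode_us_sum += (uint64_t)decode_us;
    s->mbs += frame_mbs;
    s->bytes += pkt_bytes;
    return s->frames % STATS_EVERY == 0;
}

/* Render a summary line in the same field order as the sample above. */
static int stats_format(const struct decode_stats *s, const char *codec,
                        const struct timespec *now, char *buf, size_t len)
{
    double window_s = (double)elapsed_us(&s->start, now) / 1e6;
    assert(s->frames > 0 && s->mbs > 0 && window_s > 0.0);
    return snprintf(buf, len,
        "decoder stats: codec=%s frames=%u window=%.2fs fps=%.2f "
        "avg_decode_us=%.1f mbs_per_s=%.0f bs_b_per_mb=%.2f",
        codec, (unsigned)s->frames, window_s,
        (double)s->frames / window_s,
        (double)s->decode_us_sum / (double)s->frames,
        (double)s->mbs / window_s,
        (double)s->bytes / (double)s->mbs);
}
```

In a decode loop, the caller would wrap the decode step in `time_decode_us()`, pass the result to `stats_add()`, and log the output of `stats_format()` whenever `stats_add()` returns 1. Passing the timestamp into `stats_format()` rather than reading the clock inside it keeps the formatting logic deterministic and easy to check.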