814b74d0bb
Establishes observable baseline metrics before any daedalus-fourier
kernel substitution lands. Step 1 of the daemon-rewrite arc tracked
at daedalus-v4l2#11.
Changes
-------
- Per-frame `decoder: OK ...` log line now carries decode_us=N: the
wall-clock cost of send_packet + receive_frame in microseconds.
This covers only the libavcodec round-trip, not the bitstream pack /
SPS-PPS synth / pack-to-planes work.
- New "decoder stats" summary line every DAEDALUS_STATS_EVERY (60)
decoded frames, reporting: codec, frame count, window seconds,
fps, avg decode_us, MBs/s throughput, and bitstream bytes per MB.
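
A minimal sketch of how the per-frame timing and the windowed summary
could be computed, assuming a 16x16 macroblock grid and a
caller-supplied window length. All names here (elapsed_us,
time_call_us, window_stats, stats_add, stats_format) are hypothetical
and not taken from the daemon; the callback stands in for the
send_packet + receive_frame pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Microseconds between two CLOCK_MONOTONIC samples (t1 taken after t0). */
static int64_t elapsed_us(const struct timespec *t0, const struct timespec *t1)
{
    int64_t ns = (int64_t)(t1->tv_sec - t0->tv_sec) * 1000000000LL
               + (int64_t)(t1->tv_nsec - t0->tv_nsec);
    return ns / 1000;
}

/* Time one call; fn is a stand-in for the libavcodec round-trip. */
static int64_t time_call_us(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(&t0, &t1);
}

/* Accumulators for one reporting window (hypothetical layout). */
struct window_stats {
    uint64_t frames;    /* frames decoded in this window */
    uint64_t decode_us; /* summed per-frame decode time */
    uint64_t mbs;       /* summed macroblocks */
    uint64_t bytes;     /* summed bitstream bytes */
};

static void stats_add(struct window_stats *w, int64_t decode_us,
                      int width, int height, size_t pkt_bytes)
{
    /* 16x16 macroblocks, partial macroblocks rounded up. */
    uint64_t mbs = (uint64_t)((width + 15) / 16) * (uint64_t)((height + 15) / 16);
    w->frames++;
    w->decode_us += (uint64_t)decode_us;
    w->mbs += mbs;
    w->bytes += pkt_bytes;
}

/* Format a summary line in the same shape as the commit's sample. */
static int stats_format(const struct window_stats *w, double window_s,
                        const char *codec, char *buf, size_t len)
{
    assert(w->frames > 0 && w->mbs > 0 && window_s > 0.0);
    return snprintf(buf, len,
        "decoder stats: codec=%s frames=%llu window=%.2fs fps=%.2f "
        "avg_decode_us=%.1f mbs_per_s=%.0f bs_b_per_mb=%.2f",
        codec, (unsigned long long)w->frames, window_s,
        (double)w->frames / window_s,
        (double)w->decode_us / (double)w->frames,
        (double)w->mbs / window_s,
        (double)w->bytes / (double)w->mbs);
}
```

Assuming the 720p clip is 1280x720 (3600 MBs per frame), 300 frames
over a 12.32 s window works out to roughly 87.7 K MBs/s, close to the
mbs_per_s value in the sample.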
Sample
------
decoder stats: codec=h264 frames=300 window=12.32s fps=24.35
avg_decode_us=4216.4 mbs_per_s=87643 bs_b_per_mb=1.56
What this tells us
------------------
Steady state on higgs (Pi CM5) decoding bbb_720p_h264.mp4:
~4 ms decode_us at ~90 K MBs/s. That is well under the daedalus-fourier
NEON kernel ceilings (IDCT 4×4 @ 175 Mblocks/s, deblock @ 92 Medges/s,
qpel mc20 @ 131 Mblocks/s), all 100-1000× over our actual workload.
This suggests the 4 ms/frame is mostly libavcodec's CABAC, MV
prediction and intra prediction overhead, NOT the pixel-math primitives.
Substituting a single primitive would shave only a small slice of
the 4 ms. This guides the upcoming substitution work:
we'll pick the primitive with the largest cycle cost relative to
the alternative, and measure CPU saved per substitution.
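
The "small slice" argument can be made concrete with an Amdahl-style
estimate. This is a sketch only: the fraction and speedup values in the
usage note are made-up illustrations, not measurements, and saved_us is
a hypothetical helper.

```c
#include <assert.h>

/* Estimated per-frame time saved (in microseconds) when a primitive that
 * accounts for `fraction` of the frame time becomes `speedup` times faster.
 * The rest of the frame time is assumed unchanged. */
static double saved_us(double frame_us, double fraction, double speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(speedup >= 1.0);
    return frame_us * fraction * (1.0 - 1.0 / speedup);
}
```

For example, if a primitive hypothetically took 10% of the 4216.4 us
average and its replacement were 4x faster, saved_us(4216.4, 0.10, 4.0)
gives about 316 us per frame. Even an infinitely fast replacement could
save no more than fraction * frame_us.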
No behaviour change. Counters are static and unsynchronised (safe
because the chardev event loop is single-threaded) and reset when
codec_id changes. Timing uses clock_gettime(CLOCK_MONOTONIC).
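
A minimal sketch of static, lock-free counters that reset on a codec
change. The struct, the function name and the codec id values are
hypothetical; the approach is only safe under the single-threaded
assumption stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical codec ids for illustration; negative means "none yet". */
enum { CODEC_NONE = -1, CODEC_A = 1, CODEC_B = 2 };

struct counters {
    int      codec_id;
    uint64_t frames;
    uint64_t decode_us;
};

/* Static storage, no locking: correct only with a single caller thread. */
static struct counters g_stats = { .codec_id = CODEC_NONE };

/* Account one decoded frame, starting a fresh window if the codec changed. */
static const struct counters *stats_account(int codec_id, uint64_t decode_us)
{
    assert(codec_id >= 0);
    if (g_stats.codec_id != codec_id) {
        /* Zero every counter and remember the new codec. */
        g_stats = (struct counters){ .codec_id = codec_id };
    }
    g_stats.frames++;
    g_stats.decode_us += decode_us;
    return &g_stats;
}
```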