a7d585eee8cc0b7246d03e1279bef983b4cbf111
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0de0288dce |
Phase 8.10+8.11: libva consumer integration scaffold
Brings daedalus_v4l2 from "standalone test client" to "VAAPI-
discoverable decoder" by adding the surface formats and
media-controller plumbing that libva-v4l2-request-fourier
(sibling repo) requires.
libva-v4l2-request-fourier patches (pushed separately):
- b5b3acf: daedalus_v4l2 added to known_decoder_drivers
- 2146341: meson option gate
This commit (daedalus-v4l2 side, 3 production changes):
1. V4L2_PIX_FMT_NV12 (single-plane) on CAPTURE
- Added to daedalus_capture_formats[] alongside NV12M + P010
- daedalus_fill_capture_fmt handles num_planes=1 case
(sizeimage = W*H*3/2, bytesperline = W)
- daemon pack_nv12_single_to_plane: Y at base+0,
interleaved CbCr at base+(stride*H); same byte content
as NV12M two-plane, different layout
- Required because libva-v4l2-request-fourier's video.c
only knows non-multi-plane NV12 (it advertises
v4l2_mplane=true but uses the single-plane fourcc).
- Verified byte-exact via test_m2m_stream against
ffmpeg -pix_fmt nv12 reference (VP9 1080p 10 frames,
31 MB).
2. V4L2 Request API media ops
- daedalus_media_ops = { vb2_request_validate,
v4l2_m2m_request_queue } assigned to mdev.ops before
media_device_init.
- Without this, MEDIA_IOC_REQUEST_ALLOC returned
-ENOTTY and no VAAPI consumer could allocate a
media_request.
3. Stateless control registration via v4l2_ctrl_new_custom
- Switched from v4l2_ctrl_new_std_compound(NULL p_def)
to v4l2_ctrl_new_custom — pattern rkvdec/cedrus/
hantro use. Adds a no-op s_ctrl callback.
Verification (hertz, Pi 5, 6.12.75+rpt-rpi-2712):
LibVA trace through `ffmpeg -hwaccel vaapi`:
vaInitialize / Profiles / Entrypoints / CreateConfig /
QuerySurfaceAttributes / CreateSurfaces / CreateContext
(cap_pool: 24 slots, 1 plane each) / CreateBuffer
(slice + picture params) / MEDIA_IOC_REQUEST_ALLOC
— all succeed.
Standalone NV12 decode path:
test_m2m_stream vp9_1080_stream.ivf out.nv12 1920 1080 vp9 nv12
→ 10/10 frames, byte-exact vs ffmpeg -pix_fmt nv12
vainfo (via libva-v4l2-request-fourier with our driver):
7 VAProfile entries with VAEntrypointVLD
(H264 Main/High/CBaseline/MultiviewHigh/StereoHigh,
VP9Profile0, AV1Profile0)
What's NOT here (Phase 8.12):
The libva trace stops at VIDIOC_S_EXT_CTRLS returning
EINVAL when populating V4L2_CID_STATELESS_VP9_FRAME on
the request. The compound-control payload validation
against the kernel's expected struct shape rejects.
This isn't a "missing line" fix — it needs proper
stateless control plumbing (the SPS/PPS/SliceParams
get_dims, validate, default-value paths that in-tree
rkvdec/cedrus/hantro implement to satisfy v4l2-core's
std_validate). Documented as Phase 8.12 scope.
The shipped integration is itself a meaningful deliverable:
all the framework scaffolding is in place; the remaining
gap is well-characterised and bounded.
See docs/phase_8_10_11_closure.md for the full trace
analysis + next-phase plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1ae9528e76 |
Phase 8.8: throughput baseline + multi-codec streams + HDR
Per the correctness-before-speed principle: measure before
optimising. Roadmap going in said "QPU dispatch substitution
to hit 30fps@1080p". Measurement on hertz shows the FFmpeg
software path already hits 65-88 fps@1080p across all three
codecs — QPU substitution would be premature optimisation.
So 8.8 ships what's actually useful:
1. Per-frame timing in test_m2m_stream.
2. Multi-frame AV1 + H.264 streams verified byte-exact at
1080p (closes the "VP9-only stream tests" gap from 8.7).
3. HDR / 10-bit via V4L2_PIX_FMT_P010 + daemon
pack_p010_to_plane.
Test harness (tools/test_m2m_stream.c):
- Per-frame µs timing via CLOCK_MONOTONIC; reports mean/p50/
p99/min/max + wall ms + fps.
- Annex-B H.264 parser: split on 3-/4-byte start codes,
accumulate NALs into access units (push on VCL NAL types
1 or 5). Without AU grouping FFmpeg rejects SPS/PPS-only
buffers as "no frame!".
- Format auto-detect (DKIF magic → IVF; else Annex-B).
- Optional 6th arg `[capture]`: nv12m | p010.
- CAPTURE mmap path generalised for num_planes==1 (P010).
Kernel (kernel/daedalus_v4l2_main.c):
- CAPTURE formats array {NV12M, P010}; enum_fmt walks it.
- daedalus_fill_capture_fmt takes a fourcc:
NV12M: 2 planes, W*H + W*H/2 bytes, bpl=W
P010: 1 plane, W*H*2 + W*H bytes, bpl=W*2
- try_fmt preserves caller fourcc when supported.
- daedalus_complete_resp_frame's dmabuf path now sets each
plane's payload to vb2_plane_size(vb,p) — generalises
cleanly across 1-plane (P010) and 2-plane (NV12M) layouts;
the daemon fully populates the plane so payload =
sizeimage.
Daemon (daemon/src/decoder.c):
- pack_p010_to_plane: YUV420P10LE → P010 single-plane.
10-bit samples shifted left by 6 to MSB-align in 16-bit
words per V4L2 ABI. Y at base+0, interleaved CbCr right
after Y plane (per format spec for single-plane P010).
Strips source stride padding; respects destination stride.
- daedalus_decoder_run_request dispatches on
req->capture_pix_fmt (NV12M → pack_nv12_to_planes; P010
→ pack_p010_to_plane; else warn + skip).
- Includes <linux/videodev2.h> for fourcc constants.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
1080p throughput baseline (30 frames testsrc, dmabuf path):
VP9 1080p: mean 12.0 ms, p99 15.9 ms, fps **83.1**, byte-exact ✓
AV1 1080p: mean 15.4 ms, p99 41.0 ms, fps **65.0**, byte-exact ✓
H.264 1080p: mean 11.3 ms, p99 21.5 ms, fps **88.3**, byte-exact ✓
All 2-3× over the 30fps-floor-is-fine criterion.
HDR / 10-bit 1080p P010:
10 frames, 62 MB output, fps **48.8**, byte-exact vs
`ffmpeg -pix_fmt p010le -f rawvideo`.
Small-frame P010 (320×240): fps 966 — fixed daemon overhead
dominates at low resolutions.
v4l2-compliance unchanged from 8.7: 49/49 passing.
Format enumeration confirms NM12 + P010 on CAPTURE.
Clean SIGTERM + rmmod; no kernel oops/WARN.
Roadmap update (docs/roadmap.md):
- 8.8 marked closed with closure-doc reference, including
the explicit "QPU substitution not needed" rationale.
- 8.9 reshaped: libva-v4l2-request consumer integration
(per project_consumer_target memory) — the actual
user-facing endpoint.
Per correctness-before-speed:
- Measured first; QPU work explicitly justified-out via data.
- Byte-exact pixel comparison for every codec/format combo
(NV12: VP9, AV1, H.264; P010: VP9 10-bit at 320×240 and
1080p).
- AU grouping in the Annex-B parser is the correct
semantic boundary, not just a workaround.
- vb2_plane_size for payload generalises to any plane
count, not hardcoded to 2.
Phase 8.9 next: libva-v4l2-request integration — close
the loop from YouTube/Firefox to /dev/video0 + daemon
playback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5965805d86 |
Phase 8.7: media controller + multi-frame streaming verification
Two pieces — both shipped:
1. Media controller binding closes the last v4l2-compliance
failure from 8.6 (DECODER_CMD, which requires has_media on
stateless decoders) and unlocks the V4L2 request API for
libva-v4l2-request.
2. Multi-frame streaming test exercises the daemon's
AVCodecContext state preservation across many REQ_DECODE
calls — Phase 8.6's tests pushed exactly one keyframe per
invocation; real content has P-frame references.
Compliance now reaches **49/49 passing.**
Kernel (kernel/daedalus_v4l2_main.{c,h}):
- Added `struct media_device mdev` to daedalus_dev.
- media_device_init(&mdev) BEFORE v4l2_device_register so
v4l2-core sees v4l2_dev.mdev = &mdev and binds the m2m
entities into the graph during register.
- After video_register_device:
v4l2_m2m_register_media_controller(..., MEDIA_ENT_F_PROC_VIDEO_DECODER)
then media_device_register so userspace sees the complete
graph in /dev/mediaN with the decoder entity tagged.
- daedalus_remove unwinds in reverse: unregister media,
unregister mc, unregister video, release m2m, unregister
v4l2, cleanup mdev.
- Error paths added for both new failure points.
Test harness (tools/test_m2m_stream.c, new):
- Multi-frame V4L2 m2m client: parses IVF → 4-deep buffer
rings on both queues → per-frame QBUF/DQBUF loop →
concatenates decoded NV12 to output file. Returns 0 only
if every input frame decoded without error.
- Same codec vocabulary as test_m2m_decode (vp9 | av1 |
h264 via 5th arg).
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
v4l2-compliance: 49 tests, 49 passed, 0 failed, 0 warnings.
$ v4l2-ctl --list-devices
daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
/dev/video0
/dev/media3
VP9 320×240 30 frames (1 keyframe + 29 P-frames, 3.46 MB
NV12): byte-for-byte match vs `ffmpeg -i in.ivf -pix_fmt
nv12 -f rawvideo`.
VP9 1920×1080 10 frames (31 MB NV12 through the dmabuf
path): byte-for-byte match vs same reference command.
Daemon log shows cookies 1..30 all completing cleanly in
order; lazily-opened AVCodecContext maintains reference
frames across the chardev round-trips.
Clean SIGTERM + rmmod, no oops/WARN.
Roadmap update (docs/roadmap.md):
- 8.7 marked closed with closure-doc reference.
- 8.8 reshaped: perf profiling, QPU dispatch substitution
via daedalus-fourier, multi-frame AV1/H.264, HDR (P010M).
Per correctness-before-speed:
- Order-correct media controller lifecycle (init → bind
v4l2_dev → register video → register mc → register
media; reverse for teardown).
- 4-deep buffer rings on both queues — the scheduler
actually pipelines multiple in-flight cookies through
the chardev (not just one-at-a-time as in 8.5/8.6 tests).
- Bit-exact comparison against ffmpeg, not "looks right."
- All resource paths cleaned on every error branch.
Phase 8.8 next: profile daemon hot loops, dlopen
daedalus-fourier from the daemon, swap FFmpeg per-block
calls for daedalus_dispatch_* where the kernel matches,
target 30fps@1080p from 30fps-floor-is-fine memory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|