daedalus-v4l2

Author	SHA1	Message	Date
marfrit	7ff2d897ea	Merge pull request 'kernel: register H.264 DECODE_MODE + START_CODE menu controls' (#4) from noether/kernel-h264-menu-ctrls into main Reviewed-on: #4	2026-05-21 09:02:43 +00:00
claude-noether	69a62a922f	kernel: register H.264 DECODE_MODE + START_CODE menu controls libva-v4l2-request sets V4L2_CID_STATELESS_H264_DECODE_MODE and V4L2_CID_STATELESS_H264_START_CODE on the device fd at context init (see libva-v4l2-request-fourier src/context.c:577; it is a best-effort call whose result is cast to (void)). Our ctrl_handler did not advertise either control, so v4l2-core returned EINVAL on validate. Userspace then logged the noisy "v4l2-request: Unable to set control(s): Invalid argument" (error_idx=2/2, ioctl-level) at every Firefox/ffmpeg context creation, even though decode itself succeeded (the daemon already operates as FRAME_BASED + ANNEX_B, and the per-request SPS/PPS/SCALING_MATRIX/DECODE_PARAMS batch lands fine). Register the two with v4l2_ctrl_new_std_menu, each limited to the only value the daemon actually supports (FRAME_BASED for DECODE_MODE, ANNEX_B for START_CODE), and mask out the unsupported alternates (SLICE_BASED, NONE). Pattern matches rkvdec / hantro. Update the handler-init capacity hint to ARRAY_SIZE(daedalus_stateless_ctrls) + 2 to cover the additions. Verified: builds clean on 6.18.29+rpt-rpi-2712 (Pi CM5) DKMS source tree.	2026-05-21 11:01:41 +02:00
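This is a userspace model of why the single-value menu registration satisfies libva's probe. It assumes v4l2-core validates a menu value in two steps: first a range check against min/max, then rejection of any item whose bit is set in the skip mask. The struct, the helpers and the way the mask is built are hypothetical stand-ins, not the driver's code. Only the enum values mirror the uapi constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same numeric values as the uapi enums in <linux/v4l2-controls.h>. */
enum { H264_DECODE_MODE_SLICE_BASED = 0, H264_DECODE_MODE_FRAME_BASED = 1 };
enum { H264_START_CODE_NONE = 0, H264_START_CODE_ANNEX_B = 1 };

/* Hypothetical stand-in for the menu-related fields of a control. */
struct menu_ctrl_model {
	int32_t min, max, def;
	uint64_t skip_mask; /* bit N set => menu item N is not offered */
};

/* Model of a std menu registered with max = def = the one supported value
 * and every lower item masked out. */
static struct menu_ctrl_model only_value(int32_t supported)
{
	struct menu_ctrl_model c = { 0, supported, supported, 0 };

	for (int32_t i = 0; i < supported; i++)
		c.skip_mask |= UINT64_C(1) << i;
	return c;
}

/* Assumed validation rule: out of range -> -ERANGE, skipped item -> -EINVAL. */
static int validate_menu(const struct menu_ctrl_model *c, int32_t v)
{
	if (v < c->min || v > c->max)
		return -ERANGE;
	if (v < 64 && (c->skip_mask & (UINT64_C(1) << v)))
		return -EINVAL;
	return 0;
}
```

With this shape, libva's FRAME_BASED / ANNEX_B writes validate cleanly. Writes of SLICE_BASED or NONE are rejected rather than silently accepted.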
marfrit	f0d41867f6	Merge pull request 'kernel: per-ctx vb2 lock — Firefox multi-process VAAPI unblock' (#3) from noether/kernel-per-ctx-vb-mutex into main Reviewed-on: #3	2026-05-20 19:25:02 +00:00
marfrit	a3ada8ba38	kernel: per-ctx vb2 lock so concurrent clients don't serialise on dev mutex daedalus_queue_init was wiring both src_vq->lock and dst_vq->lock to ctx->dev->m2m_lock, a device-wide mutex. That serialises every vb2 ioctl (S_FMT, REQBUFS, QBUF, DQBUF, STREAMON, ...) across ALL concurrent clients of /dev/video0. For a single-client consumer like the test_m2m_* tools it doesn't matter. Firefox, however, spawns separate content + RDD + GPU processes that each open /dev/video0 and run the libva probe simultaneously. There the contention showed up as EBUSY from one libva session's S_FMT(OUTPUT_MPLANE) when another session was mid-streamon on the same device. Observable on higgs (Pi CM5):
  $ MOZ_VA_API_ENABLED=1 LIBVA_DRIVER_NAME=v4l2_request firefox
  ...
  v4l2-request: phase 8.10: opened daedalus_v4l2 at video_fd=32
  ...
  v4l2-request: cap_pool_init: 24 slots ready
  v4l2-request: Unable to set format for type 10: Device or resource busy
After this fix, each open() gets its own ctx->vb_mutex and the per-context vb2_queue locks are independent, so Firefox's multi-process VAAPI clients no longer fight each other. YouTube playback on higgs runs through daedalus at ~230 fps sustained (640x368, libavcodec dlopen path), roughly 7× headroom over the 30fps target. cedrus / rkvdec / hantro all use the per-ctx vb mutex pattern for the same reason; this mirrors them. Lifecycle:
- mutex_init in daedalus_open (right after the kzalloc that creates ctx, before v4l2_fh_init).
- mutex_destroy in daedalus_release (after v4l2_fh_exit, before kfree), and in the err_ctrl unwind path in daedalus_open.
Verified end-to-end on higgs:
- rmmod + modprobe the rebuilt .ko.
- Restart daedalus-v4l2.service.
- Firefox YouTube playback engages VAAPI; the daemon journal shows cookie=1..N codec=3 (H.264) REQ_DECODE / decoder:OK pairs with unique per-frame fnv1a hashes.
- No EBUSY in either firefox stderr or the daemon journal during the entire session. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 21:23:44 +02:00
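This userspace model shows the per-open lock lifecycle, using pthreads as a stand-in for the kernel's mutex_init()/mutex_destroy(). The struct and function names are hypothetical, and the err_ctrl label only mimics the unwind path named in the commit. The point it illustrates is that two opens get two distinct locks, so holding one never blocks the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for vb2_queue: only the lock pointer matters here. */
struct queue_model {
	pthread_mutex_t *lock;
};

/* Hypothetical per-open context carrying its own vb2 lock. */
struct ctx_model {
	pthread_mutex_t vb_mutex;
	struct queue_model src_vq, dst_vq;
};

/* open(): init the mutex right after allocation, then point both queues
 * at it. fail_early simulates a failure that takes the err_ctrl unwind. */
static struct ctx_model *ctx_open(int fail_early)
{
	struct ctx_model *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	pthread_mutex_init(&ctx->vb_mutex, NULL);
	if (fail_early)
		goto err_ctrl;
	ctx->src_vq.lock = &ctx->vb_mutex;
	ctx->dst_vq.lock = &ctx->vb_mutex;
	return ctx;

err_ctrl:
	pthread_mutex_destroy(&ctx->vb_mutex);
	free(ctx);
	return NULL;
}

/* release(): destroy the mutex last, just before freeing the context. */
static void ctx_release(struct ctx_model *ctx)
{
	pthread_mutex_destroy(&ctx->vb_mutex);
	free(ctx);
}
```

With the old device-wide lock, both contexts' queues would point at the same mutex. In that case the second lock call in a sequence like the test below would block instead of succeeding.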
marfrit	462aa4b480	Merge pull request 'kernel: bind request controls to p_cur via v4l2_ctrl_request_setup' (#2) from noether/kernel-ctrl-request-setup into main Reviewed-on: #2	2026-05-20 18:37:12 +00:00
marfrit	29f16ece13	kernel: bind request controls to p_cur before reading them device_run was reading ctrl->p_cur.p_h264_* directly. However, v4l2-m2m's request scheduler does NOT auto-bind the in-flight media_request's control values to the ctrl handler's p_cur slots; drivers have to call v4l2_ctrl_request_setup() explicitly. cedrus / rkvdec / hantro all do this in their device_run; daedalus didn't. Result: daedalus_collect_h264_meta() read stale or default values instead of the S_EXT_CTRLS V4L2_CTRL_WHICH_REQUEST_VAL values libva-v4l2-request-fourier had just sent for THIS frame. Those were whatever the prior request had left in p_cur, or the v4l2_ctrl_new_custom initial state if no prior request had completed. The mismatch was a smoking gun on higgs after libva PR #9 / packages PR #52 landed an instrumentation log at h264_set_controls entry:
  libva boundary (sent to kernel): VAProfile=13 seq_fields=0x00032051 pic_fields=0x00000500 num_ref_frames=1
  daedalus daemon (read from kernel p_cur): prof=100 level=41 ref_frames=0 flags=0x10 pps_flags=0x0
After calling v4l2_ctrl_request_setup() at the top of device_run:
  daedalus daemon (read from kernel p_cur): prof=66 level=11 ref_frames=1 poc_type=2 flags=0x50 pps_flags=0x88
This matches what libva sent and matches the bitstream's actual SPS. End-to-end test on higgs with libva-v4l2-request-fourier 1.0.0+r382+gc1bb444 (after-fix-3-and-fix-4-instrumentation) + this kernel patch:
  $ LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi \
      -hwaccel_device /dev/dri/renderD128 -i h264_test.mp4 \
      -frames:v 1 -f null -
  ...
  rc=0
Daemon journal: zero "error while decoding MB" lines, zero "reference frames exceeds max" lines. Per-frame fnv1a hashes differ (0xf1c515aa, 0x16e915e8, 0x16bd16cc, ...) instead of the constant 0x6a6a05c5 "give-up-and-zero" hash from before, so libavcodec is actually decoding real pixel content from each P-frame. 
Pair note: the kernel's daemon-response path already calls v4l2_ctrl_request_complete in daedalus_complete_resp_frame (line 834); this commit pairs the setup half with that completion half. The daemon-side change (decoder.c) is a small log-level promotion. The per-frame "h264 SPS/PPS prepended ..." trace went from log_debug to log_info, so the journal shows what's being shipped into libavcodec without needing a daemon rebuild with --debug. Matches the libva-side h264_set_controls instrumentation that landed in libva PR #9. Closes part of issue libva-v4l2-request-fourier#8 (the SPS/PPS field-value gap). Profile/level still come from libva's session-derived hardcoded values (h264_profile_to_idc + h264_derive_level_idc), which is sufficient for libavcodec to accept the synthesised NAL unit. A true stream-parsed profile/level would need SPS-NAL parsing in libva, which is a separate operator-design call. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 20:35:06 +02:00
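This is a deliberately simplified model of the bug and the fix, using the values from the instrumentation log. It assumes the handler exposes a single current value per control and that request setup copies the request's staged value into it. All names are hypothetical. The real v4l2_ctrl_request_setup() applies every control stored in the request's handler object, not one struct.

```c
#include <assert.h>

/* The three fields the log lines compare. */
struct h264_fields {
	int profile_idc, level_idc, max_num_ref_frames;
};

struct handler_model { struct h264_fields p_cur; };  /* what device_run reads */
struct request_model { struct h264_fields staged; }; /* what S_EXT_CTRLS wrote */

/* Stand-in for v4l2_ctrl_request_setup(): make this request's values current. */
static void request_setup(struct handler_model *hdl,
			  const struct request_model *req)
{
	hdl->p_cur = req->staged;
}

/* Stand-in for the read in daedalus_collect_h264_meta(), with or without
 * the setup call at the top of device_run. */
static struct h264_fields device_run(struct handler_model *hdl,
				     const struct request_model *req,
				     int call_setup)
{
	if (call_setup)
		request_setup(hdl, req);
	return hdl->p_cur;
}
```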
marfrit	3dd0eb070a	Merge pull request 'DAEMON-PPS: synthesise H.264 SPS/PPS NAL units from V4L2 controls' (#1) from noether/daemon-pps-h264-nal-synth into main Reviewed-on: #1	2026-05-20 16:51:26 +00:00
marfrit	8c1d9960c4	DAEMON-PPS: synthesise H.264 SPS/PPS NAL units from V4L2 controls libva-v4l2-request-fourier (and any V4L2-stateless-API consumer) passes H.264 SPS/PPS as separate V4L2_CID_STATELESS_H264_{SPS,PPS} controls; only the slice NAL goes into the OUTPUT buffer. This is correct per the V4L2 stateless contract. But libavcodec, which the daedalus daemon uses for actual decode (Option γ), wants a self-contained AnnexB stream including SPS+PPS before any slice. Result on higgs: "non-existing PPS 0 referenced" + decode_slice_header errors on every H.264 frame, even after LIBVA-1 and -2 routing correctly delivered the request to the daemon. Fix splits across kernel + daemon, keeping the kernel module as a thin transport and putting the actual NAL encoding in userspace:
include/daedalus_v4l2_proto.h: Add struct daedalus_h264_meta (the four v4l2_ctrl_h264_* structs the kernel collects) and DAEDALUS_REQ_FLAG_H264_META (set in req.flags when the meta block is present between the daedalus_req_decode prefix and the slice bitstream).
kernel/daedalus_v4l2_main.c: Add daedalus_collect_h264_meta(), which reads the H.264 ctrl values from the bound media_request via v4l2_ctrl_find + ctrl->p_cur.p_h264_*. device_run() calls it on H.264 codec_id, copies the structs into the REQ_DECODE payload between the prefix and bitstream, and sets the flag. Payload size is bounds-checked against DAEDALUS_PROTO_MAX_PAYLOAD, so an oversized slice + meta fails loudly instead of truncating.
daemon/src/bitstream_writer.{c,h}: New module: MSB-first bit packer with H.264 Exp-Golomb ue(v) and se(v) coding + rbsp_trailing_bits alignment. Sticky overflow flag so callers can verify the output buffer wasn't truncated.
daemon/src/h264_nal_synth.{c,h}: New module: turns v4l2_ctrl_h264_sps / v4l2_ctrl_h264_pps into AnnexB-framed NAL units per ITU-T H.264 7.3.2.1 / 7.3.2.2. Emits emulation prevention bytes (0x03 after every 00 00 in the EBSP) and the 4-byte start code (0x00000001). 
Coverage matches what the V4L2 stateless surface gives us: VUI parameters and full scaling matrices are NOT emitted (V4L2 doesn't carry them; the seq_scaling_matrix_present_flag is set to 0 and libavcodec uses flat defaults, which matches the de-facto behaviour of most H.264 streams libva-v4l2-request drives).
daemon/src/decoder.c: daedalus_decoder_run_request() now takes an optional h264_meta parameter. For codec_id == H264 with meta != NULL, it synthesises SPS+PPS NAL units, allocates a combined [SPS][PPS][slice] buffer (+ AV_INPUT_BUFFER_PADDING_SIZE), and feeds that to avcodec_send_packet instead of the raw slice. VP9/AV1 path unchanged (frames are self-contained). Cleanup now goes through a unified `out:` label, so the assembled buffer is freed on every exit (including the existing decoder_open_codec / no-frame / receive_frame failure paths).
daemon/src/chardev_client.c: handle_req_decode() peels off the optional meta block when the flag is set and passes it through to the decoder. It also updates the payload-length consistency check, which now allows for an extra sizeof(daedalus_h264_meta) when the flag is on.
Build (boltzmann aarch64): clean compile of all daemon sources, including bitstream_writer + h264_nal_synth + the refactored decoder.c. Kernel module compile to be verified via DKMS rebuild on higgs in the marfrit-packages bump that follows.
Test plan: with this commit + a marfrit-packages daedalus pin bump, higgs's ffmpeg -hwaccel vaapi -i h264_test.mp4 should produce a successful decode (vs. the previous "non-existing PPS 0 referenced" failure). The daemon log should show:
  decoder: opened h264 context
  decoder: h264 prepended SPS=<N>B PPS=<M>B slice=<K>B
  decoder: OK 320x240 fmt=0 (yuv420p) fnv1a=0x...
VP9 / AV1 behaviour unchanged: they don't carry meta, and the existing per-frame self-describing path still applies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 17:35:24 +02:00
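This sketch covers the two bit-level pieces the commit describes. The first is an MSB-first writer with ue(v)/se(v) Exp-Golomb coding, rbsp_trailing_bits and a sticky overflow flag. The second is RBSP-to-EBSP emulation prevention. The condition used here is the H.264 rule: insert 0x03 when two zero bytes are followed by a byte in 0x00..0x03. The commit's shorthand does not say whether h264_nal_synth uses exactly this test. All identifiers are hypothetical, not the daemon's bitstream_writer API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit packer with a sticky overflow flag. */
struct bitwriter {
	uint8_t *buf;
	size_t cap, bitpos;
	int overflow;
};

static void put_bit(struct bitwriter *bw, unsigned bit)
{
	size_t byte = bw->bitpos >> 3;

	if (byte >= bw->cap) {
		bw->overflow = 1;              /* stays set once tripped */
		return;
	}
	if ((bw->bitpos & 7) == 0)
		bw->buf[byte] = 0;             /* clear each byte before first use */
	if (bit)
		bw->buf[byte] |= 0x80 >> (bw->bitpos & 7);
	bw->bitpos++;
}

static void put_bits(struct bitwriter *bw, uint64_t val, int n)
{
	while (n-- > 0)                        /* most significant bit first */
		put_bit(bw, (unsigned)((val >> n) & 1));
}

/* ue(v): (len - 1) zero bits, then codeNum + 1 written in len bits. */
static void put_ue(struct bitwriter *bw, uint32_t val)
{
	uint64_t v = (uint64_t)val + 1;
	int nbits = 0;

	for (uint64_t t = v; t; t >>= 1)
		nbits++;
	put_bits(bw, 0, nbits - 1);
	put_bits(bw, v, nbits);
}

/* se(v): k > 0 maps to 2k - 1, k <= 0 maps to -2k, then ue(v). */
static void put_se(struct bitwriter *bw, int32_t val)
{
	uint32_t code = val > 0 ? 2u * (uint32_t)val - 1u
				: 2u * (uint32_t)(-(int64_t)val);
	put_ue(bw, code);
}

/* rbsp_trailing_bits(): a stop bit, then zeros up to the byte boundary. */
static void put_trailing_bits(struct bitwriter *bw)
{
	put_bit(bw, 1);
	while (bw->bitpos & 7)
		put_bit(bw, 0);
}

/* RBSP -> EBSP. Returns bytes written, or 0 if dst is too small. */
static size_t add_emulation_prevention(const uint8_t *src, size_t n,
				       uint8_t *dst, size_t cap)
{
	size_t o = 0;
	int zeros = 0;

	for (size_t i = 0; i < n; i++) {
		if (zeros == 2 && src[i] <= 0x03) {
			if (o >= cap)
				return 0;
			dst[o++] = 0x03;
			zeros = 0;
		}
		if (o >= cap)
			return 0;
		dst[o++] = src[i];
		zeros = src[i] == 0 ? zeros + 1 : 0;
	}
	return o;
}
```

The output of add_emulation_prevention is what would follow the 4-byte start code in each AnnexB NAL unit.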
marfrit	481279c9bf	packaging/systemd: ship daedalus-v4l2.service + modules-load drop-in Canonical location for the systemd unit + module-autoload conf, referenced by both arch/daedalus-v4l2 and debian/daedalus-v4l2 in marfrit-packages. Was a real gap in the original packaging: postinst installed the daemon binary but nothing started it, so the libva path sent REQ_DECODE messages with nobody listening on /dev/daedalus-v4l2 and timed out.
packaging/systemd/daedalus-v4l2.service:
- Type=simple, ExecStart=/usr/bin/daedalus_v4l2_daemon daemon
- After=systemd-modules-load.service + ConditionPathExists=/dev/daedalus-v4l2 (so it only starts when the kernel module is loaded; doesn't false-fire on non-daedalus hosts that happen to have the package installed)
- Restart=on-failure, RestartSec=2
- MemoryHigh=128M / MemoryMax=256M (Phase 8.9 stress run showed RSS settling around 25 MiB; leaves headroom)
- Hardening: NoNewPrivileges, ProtectSystem=strict, ProtectHome, PrivateTmp, ProtectKernel*, SystemCallFilter=@system-service. PrivateDevices=false because we DO need /dev/daedalus-v4l2
packaging/systemd/daedalus-v4l2.modules-load:
- Drops to /etc/modules-load.d/daedalus-v4l2.conf so the kernel module loads before the .service unit fires.
Both files are picked up by the package recipes (next bump in marfrit-packages); neither lives in /usr/lib/systemd/system or /etc/modules-load.d until the .deb / .pkg installs them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:26:58 +02:00
marfrit	f0cd29a340	kernel: v4l2_fh_add/del gained file* arg in 6.18 — version-conditional DKMS build failure on higgs (Pi CM5, kernel 6.18.29+rpt-rpi-2712):
  daedalus_v4l2_main.c:1049: error: too few arguments to function 'v4l2_fh_add'
  v4l2-fh.h:97: void v4l2_fh_add(struct v4l2_fh *fh, struct file *filp);
  daedalus_v4l2_main.c:1063: error: too few arguments to function 'v4l2_fh_del'
Signature changed exactly at v6.18 (verified v6.13–v6.17 still use the one-arg form via raw.githubusercontent.com tag walk). Wrap the calls with LINUX_VERSION_CODE >= KERNEL_VERSION(6, 18, 0) so the module keeps building against:
  * 6.12 LTS / RPi 6.12.75 (one-arg), on hertz
  * 6.12.88+deb13-arm64 (one-arg)
  * 6.18.29+rpt-rpi-2712 (file* arg), the kernel higgs runs
Build verified on both: hertz 6.12.75 clean, higgs 6.18.29 clean + modprobe daedalus_v4l2 succeeds, /dev/daedalus-v4l2 + /dev/video0 appear. Add #include <linux/version.h> for KERNEL_VERSION + LINUX_VERSION_CODE (also pulled in transitively via module.h, but explicit is better than implicit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:15:24 +02:00
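The following userspace sketch shows how the version gate evaluates. The macro reproduces the packing used by KERNEL_VERSION() in recent <linux/version.h>. The helper name, and the ctx->fh / file argument names in the comment, are hypothetical.

```c
#include <assert.h>

/* Same packing as the kernel macro: major << 16 | minor << 8 | sublevel,
 * with the sublevel clamped to 255. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* Hypothetical helper: does v4l2_fh_add()/v4l2_fh_del() take a file* at
 * this LINUX_VERSION_CODE? The driver makes the same decision at compile
 * time, along the lines of:
 *
 *   #if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 18, 0)
 *       v4l2_fh_add(&ctx->fh, file);
 *   #else
 *       v4l2_fh_add(&ctx->fh);
 *   #endif
 */
static int fh_add_takes_file(unsigned int linux_version_code)
{
	return linux_version_code >= KERNEL_VERSION(6, 18, 0);
}
```

Because the comparison is on the packed integer, any 6.18.x sublevel selects the two-argument form, and every 6.12 or 6.17 build keeps the one-argument form.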
marfrit	f55b2cd002	kernel: media_request_get/put around inf->req (UAF safety) Sonnet pre-deployment review flagged a SHIP-WITH-EYES-OPEN risk: Phase 8.13's inf->req captured src_buf->vb2_buf.req_obj.req as a raw pointer with no media_request_get(). On the normal decode path that's fine, because vb2-core holds its own reference until v4l2_m2m_buf_done_and_job_finish releases it. The concurrent-cancel case is different: MEDIA_REQUEST_IOC_REINIT, or a process kill, can trigger buf_request_complete from the cancel path before RESP_FRAME comes back. There vb2 could drop its reference first. Our inf->req would then dangle through v4l2_ctrl_request_complete + buf_done_and_job_finish: a UAF. Fix matches the cedrus / rkvdec pattern: take our own reference when we capture the pointer, and release it after we're done with it (after buf_done_and_job_finish, to keep the ordering crystal-clear).
  /* in daedalus_device_run, after inf->req = src_buf->...->req */
  if (inf->req)
          media_request_get(inf->req);
  /* in daedalus_complete_resp_frame, after buf_done_and_job_finish */
  if (inf->req)
          media_request_put(inf->req);
Verified on hertz:
- libva path (request-bound, inf->req != NULL): byte-exact NV12, same FNV-1a as standalone.
- test_m2m_stream (direct QBUF, inf->req == NULL): 30/30 frames decoded, conditional skip works.
- No kernel oops / WARN, no leak in dmesg.
Add #include <media/media-request.h> for the helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:39:10 +00:00
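This userspace refcount model covers the cancel race. It assumes vb2's reference is dropped before the completion path runs, which is the ordering the commit describes for the cancel case. The struct and helpers are hypothetical stand-ins for media_request_get()/media_request_put().

```c
#include <assert.h>

/* Hypothetical refcounted request; "freed" marks when the count hits 0. */
struct req_model {
	int refcount;
	int freed;
};

static void req_get(struct req_model *r) { r->refcount++; }

static void req_put(struct req_model *r)
{
	if (--r->refcount == 0)
		r->freed = 1;
}

/* Cancel-path ordering: vb2 drops its reference before RESP_FRAME comes
 * back. take_own_ref models the fix (get in device_run, put after
 * buf_done_and_job_finish). Returns 1 if the request was still alive when
 * the completion path would have dereferenced inf->req. */
static int request_alive_at_completion(int take_own_ref)
{
	struct req_model r = { 1, 0 };     /* vb2-core's reference */
	struct req_model *inf_req = &r;    /* captured in device_run */
	int alive;

	if (take_own_ref)
		req_get(inf_req);
	req_put(&r);                       /* concurrent cancel drops vb2's ref */
	alive = !inf_req->freed;           /* completion path runs here */
	if (take_own_ref)
		req_put(inf_req);
	return alive;
}
```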
marfrit	f04d7000f8	Phase 8.13: byte-exact end-to-end via libva (consumer target hit) The project's consumer-side goal landed: a real VAAPI consumer (ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2 driver → daemon → byte-exact NV12 output back to ffmpeg. ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \ -hwaccel_output_format nv12 -i vp9_small.ivf \ -f rawvideo -y /tmp/vp9_via_libva.nv12 cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12 → match 18432-byte NV12 byte-for-byte identical to plain ffmpeg -pix_fmt nv12 software decode. The project_consumer_target memory's deliverable shape — "V4L2 stateless node consumed by a real VAAPI client" — is achieved. Two related kernel changes: 1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration — matches rkvdec/cedrus/hantro. Brings each registered compound control out of "uninitialised" state via std_init_compound defaults. 2. Per-request control completion in the decode path — the real fix for "Timeout when waiting for media request". vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj on normal decode completion, but the per-request CONTROL object stays bound. buf_request_complete fires only from queue-cancel paths (vb2-core line 2284), NOT from normal buf_done. The driver must call v4l2_ctrl_request_complete(req, hdl) explicitly from the completion path. struct daedalus_inflight gained a `struct media_request *req` field, captured from src_buf->vb2_buf.req_obj.req in device_run. daedalus_complete_resp_frame then calls v4l2_ctrl_request_complete before v4l2_m2m_buf_done_and_job_finish — triggers MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd poll. For non-request flows (test_m2m_stream direct QBUF) inf->req is NULL; the conditional skips the call. Both consumer styles work concurrently. Diagnostic clarification (was Phase 8.13a): strace traced three S_EXT_CTRLS calls per frame: 1. H264_PROFILE + H264_LEVEL → EINVAL (we don't register) 2. 
HEVC_PROFILE + HEVC_LEVEL → EINVAL (we don't register) 3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS The first two are harmless: libva probes whether we support H264/HEVC integer profile/level controls during config negotiation; we don't (we expose them as stateless), so EINVAL just falls through. The actual VP9 stateless controls (#3) succeeded all along — the libva-side "Unable to set control(s)" log was misleading us into thinking the control path was the bug. Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): daemon log: REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes decoder: opened vp9 context decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ... ffmpeg side: no Timeout, no Decoding error /tmp/vp9_via_libva.nv12: 18432 bytes cmp vs reference: byte-for-byte identical. Roadmap update: - 8.10/8.11, 8.12, 8.13 marked closed with closure docs. - 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox higher-level consumers. Per correctness-before-speed: - strace + kernel-source-reading found the actual root cause rather than guessing. - Conditional v4l2_ctrl_request_complete preserves the existing test_m2m_stream non-request path — both consumer styles work concurrently without per-flow branching elsewhere. - Byte-exact pixel comparison, not "frame size matches." Phase 8.14 next: multi-frame stream + multi-codec via libva + mpv/Firefox. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:14:34 +00:00
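This sketch shows the completion ordering and the NULL-request skip. The trace strings and stubs are hypothetical stand-ins for v4l2_ctrl_request_complete() and v4l2_m2m_buf_done_and_job_finish(). Only the ordering and the conditional come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trace of the calls made by the completion path, in order. */
static char trace[64];

static void ctrl_request_complete(void) { strcat(trace, "ctrl_complete;"); }
static void buf_done_and_job_finish(void) { strcat(trace, "buf_done;"); }

/* req is NULL for direct-QBUF clients such as test_m2m_stream. */
struct inflight_model {
	void *req;
};

/* Complete the request's control object first, which per the commit marks
 * the request COMPLETE and wakes the request fd. Then hand the buffers
 * back. Non-request flows skip the first step. */
static void complete_resp_frame(const struct inflight_model *inf)
{
	if (inf->req)
		ctrl_request_complete();
	buf_done_and_job_finish();
}
```

Because the skip is the only branch, both consumer styles share one completion function, matching the "no per-flow branching elsewhere" note in the commit.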
marfrit	a7d585eee8	Phase 8.12: first VP9 frame decoded via libva ffmpeg -hwaccel vaapi → libva-v4l2-request-fourier → /dev/video0 → daedalus_v4l2 kernel → REQ_DECODE on the chardev → daemon FFmpeg decode → byte-exact NV12 (FNV-1a 0x1eb34bfe, same hash the standalone test_m2m_stream produces for the same 128x96 VP9 keyframe). The pixel-correct decode through the libva path is the milestone. What's NOT yet working: libva times out on the media_request fd because buf_request_complete never fires. vb->req_obj.req is NULL when buf_done runs, because the S_EXT_CTRLS EINVAL leaves the buffer unbound from the request even though the buffer queues anyway. Phase 8.13 fixes the EINVAL so the request bind takes and the completion signal propagates. Kernel V4L2 request API integration:
- media_device_ops.req_validate / req_queue = vb2_request_validate / v4l2_m2m_request_queue (Phase 8.11), so MEDIA_IOC_REQUEST_ALLOC succeeds.
- vb2_queue.supports_requests = true on the OUTPUT queue; without this v4l2-core rejects S_EXT_CTRLS(REQUEST_VAL).
- vb2_ops.buf_request_complete = daedalus_buf_request_complete → v4l2_ctrl_request_complete(req, &ctx->hdl). Without this v4l2-core WARNs at videobuf2-v4l2.c:440.
- vb2_ops.buf_out_validate: sets field=V4L2_FIELD_NONE on OUTPUT buf. Required for the same WARN check.
- requires_requests intentionally NOT set: lets the existing test_m2m_stream (direct QBUF, no request) keep working alongside the libva path.
Stateless control re-registration:
- Switched from v4l2_ctrl_new_std_compound(NULL p_def) to v4l2_ctrl_new_custom(&cfg, NULL), the pattern rkvdec / cedrus / hantro use. v4l2-core auto-fills elem_size + type from the std table (verified: VP9_FRAME elem_size=168, matches sizeof(struct v4l2_ctrl_vp9_frame)).
- No-op s_ctrl callback so SET requests don't crash; the daemon ignores the values and FFmpeg re-parses the bitstream. 
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): ffmpeg -hwaccel vaapi -i vp9_small.ivf … daemon: REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes daemon: decoder: opened vp9 context daemon: decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe … Same FNV-1a hash as the standalone test_m2m_stream produces for the same VP9 keyframe. End-to-end through libva. Remaining (Phase 8.13): - S_EXT_CTRLS EINVAL on V4L2_CID_STATELESS_VP9_FRAME despite matching elem_size — needs deeper validate-path debugging. - Once the request bind takes, buf_request_complete fires on buf_done, request fd signals completion, libva DQBUFs the decoded NV12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:01:26 +00:00
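The commits compare decoded frames by their fnv1a hash. Below is a minimal sketch of the standard 32-bit FNV-1a. It assumes the daemon prints the 32-bit variant, inferred only from the 8-hex-digit values in the logs; this is not the daemon's own function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime. */
static uint32_t fnv1a32(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t h = 0x811c9dc5u;          /* FNV-1a 32-bit offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x01000193u;          /* FNV 32-bit prime */
	}
	return h;
}
```

Hashing the whole NV12 buffer gives a cheap equality check. That suits "same hash as standalone" comparisons, but a full cmp is still what proves byte-exactness.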
marfrit	0de0288dce	Phase 8.10+8.11: libva consumer integration scaffold Brings daedalus_v4l2 from "standalone test client" to "VAAPI-discoverable decoder" by adding the surface formats and media-controller plumbing that libva-v4l2-request-fourier (sibling repo) requires. libva-v4l2-request-fourier patches (pushed separately):
- b5b3acf: daedalus_v4l2 added to known_decoder_drivers
- 2146341: meson option gate
This commit (daedalus-v4l2 side, 3 production changes):
1. V4L2_PIX_FMT_NV12 (single-plane) on CAPTURE
   - Added to daedalus_capture_formats[] alongside NV12M + P010
   - daedalus_fill_capture_fmt handles the num_planes=1 case (sizeimage = W*H*3/2, bytesperline = W)
   - daemon pack_nv12_single_to_plane: Y at base+0, interleaved CbCr at base+(stride*H); same byte content as two-plane NV12M, different layout
   - Required because libva-v4l2-request-fourier's video.c only knows non-multi-plane NV12 (it advertises v4l2_mplane=true but uses the single-plane fourcc).
   - Verified byte-exact via test_m2m_stream against an ffmpeg -pix_fmt nv12 reference (VP9 1080p, 10 frames, 31 MB).
2. V4L2 Request API media ops
   - daedalus_media_ops = { vb2_request_validate, v4l2_m2m_request_queue } assigned to mdev.ops before media_device_init.
   - Without this, MEDIA_IOC_REQUEST_ALLOC returned -ENOTTY and no VAAPI consumer could allocate a media_request.
3. Stateless control registration via v4l2_ctrl_new_custom
   - Switched from v4l2_ctrl_new_std_compound(NULL p_def) to v4l2_ctrl_new_custom, the pattern rkvdec/cedrus/hantro use. Adds a no-op s_ctrl callback.
Verification (hertz, Pi 5, 6.12.75+rpt-rpi-2712): LibVA trace through `ffmpeg -hwaccel vaapi`: vaInitialize / Profiles / Entrypoints / CreateConfig / QuerySurfaceAttributes / CreateSurfaces / CreateContext (cap_pool: 24 slots, 1 plane each) / CreateBuffer (slice + picture params) / MEDIA_IOC_REQUEST_ALLOC all succeed. 
Standalone NV12 decode path: test_m2m_stream vp9_1080_stream.ivf out.nv12 1920 1080 vp9 nv12 → 10/10 frames, byte-exact vs ffmpeg -pix_fmt nv12 vainfo (via libva-v4l2-request-fourier with our driver): 7 VAProfile entries with VAEntrypointVLD (H264 Main/High/CBaseline/MultiviewHigh/StereoHigh, VP9Profile0, AV1Profile0) What's NOT here (Phase 8.12): The libva trace stops at VIDIOC_S_EXT_CTRLS returning EINVAL when populating V4L2_CID_STATELESS_VP9_FRAME on the request. The compound-control payload validation against the kernel's expected struct shape rejects. This isn't a "missing line" fix — it needs proper stateless control plumbing (the SPS/PPS/SliceParams get_dims, validate, default-value paths that in-tree rkvdec/cedrus/hantro implement to satisfy v4l2-core's std_validate). Documented as Phase 8.12 scope. The shipped integration is itself a meaningful deliverable: all the framework scaffolding is in place; the remaining gap is well-characterised and bounded. See docs/phase_8_10_11_closure.md for the full trace analysis + next-phase plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:51:16 +00:00
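This sketch illustrates the single-plane NV12 layout described in Phase 8.10+8.11. Luma rows sit at base+0, interleaved CbCr starts at base+(stride*H), and sizeimage = W*H*3/2 when bytesperline equals the width. It assumes 8-bit planar 4:2:0 input with even dimensions. The function names are hypothetical, not the daemon's pack_nv12_single_to_plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Planar 4:2:0 (separate Y, U, V planes with their own strides) into one
 * NV12 buffer. Source stride padding is dropped; dst_stride is honoured. */
static void pack_nv12_single(const uint8_t *y, size_t y_stride,
			     const uint8_t *u, size_t u_stride,
			     const uint8_t *v, size_t v_stride,
			     int w, int h, uint8_t *dst, size_t dst_stride)
{
	for (int row = 0; row < h; row++)
		for (int col = 0; col < w; col++)
			dst[row * dst_stride + col] = y[row * y_stride + col];

	/* Interleaved CbCr plane starts right after H luma rows. */
	uint8_t *uv = dst + dst_stride * (size_t)h;
	for (int row = 0; row < h / 2; row++)
		for (int col = 0; col < w / 2; col++) {
			uv[row * dst_stride + 2 * col]     = u[row * u_stride + col];
			uv[row * dst_stride + 2 * col + 1] = v[row * v_stride + col];
		}
}

/* sizeimage for single-plane NV12 when bytesperline == width. */
static size_t nv12_sizeimage(int w, int h)
{
	return (size_t)w * h * 3 / 2;
}
```

The bytes are the same as in two-plane NV12M. Only the chroma plane's location changes, from a separate buffer to an offset in the same one.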
marfrit	d84efdb125	Phase 8.9: long-form stress + multi-codec HDR + libva scoping Three verification deliverables; no production code changes (infrastructure from 8.8 was sufficient). 1. libva-v4l2-request consumer investigation (task 95): - bootlin/libva-v4l2-request@master supports MPEG-2 / H.264 / HEVC only. No VP9, no AV1. - H264 expects V4L2_PIX_FMT_H264_SLICE_RAW (older fourcc); we advertise V4L2_PIX_FMT_H264_SLICE. - CAPTURE expects V4L2_PIX_FMT_NV12 (single-plane); we advertise NV12M + P010. - Real integration = patch libva-v4l2-request to add VP9 + AV1 mappings + accept the newer H.264 fourcc. Multi-session work — pushed to Phase 8.10. 2. Long-form stress test (task 96): - Built a 1800-frame (60s @ 30fps) VP9 1080p stream by Python concat of vp9_5s.ivf × 12 with PTS adjustment and re-muxed IVF header. - 1800 / 1800 frames decoded cleanly through test_m2m_stream + daemon, fps=120.9 sustained across 14.9 s wall, p99=17.3 ms/frame (well inside the 33 ms 30fps budget). - Daemon alive after 3620 cookies across two back-to-back runs, RSS=23 MiB — no leak. - No kernel oops/WARN, no fps degradation across the long run. 3. Multi-codec HDR (task 97): - AV1 1080p 10-bit → P010: byte-exact vs ffmpeg p010le. fps 17.1 (below 30fps target; AV1 10-bit is intrinsically expensive). - H.264 1080p 10-bit (high10) → P010: byte-exact vs ffmpeg p010le. fps 26.9 (close to target). - Combined with 8.8's VP9-10bit P010 result (48.8 fps): all three codecs' 10-bit paths produce byte-exact P010 output. Roadmap update (docs/roadmap.md): - 8.9 marked closed with the scope-cut explained. - 8.10 = libva-v4l2-request VP9/AV1 patch + end-to-end consumer integration (the actual user-facing loop: mpv --hwdec=vaapi → libva-v4l2-request → /dev/video0 → daemon → decoded frame). Per correctness-before-speed: characterised the libva integration scope rigorously rather than starting a multi-session battle in this phase. 
The bounded deliverables (stress test + HDR matrix) ship clean and prove the existing infrastructure handles real-world workloads stably. Phase 8.10 next: build + patch libva-v4l2-request on hertz; end-to-end with mpv. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:26:42 +00:00
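The commit built the 60 s stream with a Python script. Below is the same IVF bookkeeping in C, as a sketch. It assumes the standard IVF layout: a 32-byte file header that starts with "DKIF" and carries the frame count at offset 24, followed by a 12-byte header per frame holding a little-endian payload size and 64-bit PTS. It also assumes a 150-frame source clip whose PTS values advance by one per frame. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IVF_FILE_HDR  32  /* "DKIF" ... frame count at offset 24 */
#define IVF_FRAME_HDR 12  /* 4-byte payload size + 8-byte PTS */

/* IVF fields are little-endian. */
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void wr32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t rd64(const uint8_t *p) { return rd32(p) | (uint64_t)rd32(p + 4) << 32; }

static void wr64(uint8_t *p, uint64_t v)
{
	wr32(p, (uint32_t)v);
	wr32(p + 4, (uint32_t)(v >> 32));
}

/* Shift one frame's PTS (copy k of the clip gets k * frames_per_copy) and
 * return its payload size so the caller can skip to the next header. */
static uint32_t ivf_shift_frame_pts(uint8_t *frame_hdr, uint64_t pts_offset)
{
	wr64(frame_hdr + 4, rd64(frame_hdr + 4) + pts_offset);
	return rd32(frame_hdr);
}

/* Re-mux step: patch the total frame count in the file header. */
static int ivf_set_frame_count(uint8_t *file_hdr, uint32_t n)
{
	if (memcmp(file_hdr, "DKIF", 4) != 0)
		return -1;
	wr32(file_hdr + 24, n);
	return 0;
}
```

Concatenating 12 copies then means writing one patched file header (frame count 1800) followed by every copy's frames with their PTS shifted.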
marfrit	1d0db3b5a9	docs: pure ffmpeg vs daedalus pipeline CPU comparison Measured on hertz (Pi 5, 6.12.75+rpt-rpi-2712, FFmpeg 7.1.3) to quantify the architectural cost/benefit of routing decode through the V4L2 m2m + chardev + dmabuf path vs running ffmpeg standalone. 1080p × 150 frames, decode-as-fast-as-possible:
  codec         ffmpeg CPU / wall     daedalus CPU / wall
  VP9 8-bit     214.9% / 1083 ms       96.3% / 1229 ms
  AV1 8-bit     201.5% / 1162 ms       96.6% / 1478 ms
  H.264 8-bit   205.8% / 1063 ms      100.1% / 1020 ms
  VP9 10-bit    155.8% /  269 ms       91.6% /  131 ms
Key takeaway: the daedalus pipeline uses ~half the CPU for roughly the same wall throughput (AV1 is the outlier at ~27% slower). FFmpeg standalone defaults to 2 threads. Single-stream decode doesn't parallelise well, so the 2× CPU usage is overhead, not parallelism benefit. The daemon's single-threaded serialised event loop avoids that tax. The project's target is 30fps-floor-is-fine ("daily YouTube with CPU free for vscode"). Against it, daedalus leaves ~2× the CPU headroom for the rest of the desktop at the same playback rate. VP9-10bit is striking: daedalus is faster wallclock too (131 ms vs 269 ms), because at small per-frame work FFmpeg's thread pool spin-up dominates. Note: "daedalus" still uses FFmpeg internally (Phase 8.8 explicitly deferred QPU substitution after measurement showed 30fps@1080p was already met). The benefit here is architectural (single-threaded decode, out-of-process daemon, dmabuf zero-copy), not QPU offload. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:20:22 +00:00
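The "~half the CPU" reading can be checked from the measurements in the commit, since CPU time is roughly utilisation × wall time. The sketch below assumes the %CPU figures are averages over each run and cover every process in that pipeline. The commit does not say how daemon and client time were summed for daedalus.

```c
#include <assert.h>

/* Measurements from the commit (1080p, 150 frames). */
struct run {
	const char *codec;
	double ff_cpu_pct, ff_wall_ms;   /* standalone ffmpeg */
	double dd_cpu_pct, dd_wall_ms;   /* daedalus pipeline */
};

static const struct run runs[] = {
	{ "VP9 8-bit",   214.9, 1083.0,  96.3, 1229.0 },
	{ "AV1 8-bit",   201.5, 1162.0,  96.6, 1478.0 },
	{ "H.264 8-bit", 205.8, 1063.0, 100.1, 1020.0 },
	{ "VP9 10-bit",  155.8,  269.0,  91.6,  131.0 },
};

/* CPU time spent = average utilisation (as a fraction) x wall time. */
static double cpu_ms(double pct, double wall_ms)
{
	return pct / 100.0 * wall_ms;
}

/* daedalus CPU time as a fraction of standalone ffmpeg CPU time. */
static double cpu_ratio(const struct run *r)
{
	return cpu_ms(r->dd_cpu_pct, r->dd_wall_ms) /
	       cpu_ms(r->ff_cpu_pct, r->ff_wall_ms);
}
```

By this measure, daedalus spends roughly 0.29× (VP9 10-bit) to 0.61× (AV1) of standalone ffmpeg's CPU time per clip.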
marfrit	1ae9528e76	Phase 8.8: throughput baseline + multi-codec streams + HDR Per the correctness-before-speed principle: measure before optimising. Roadmap going in said "QPU dispatch substitution to hit 30fps@1080p". Measurement on hertz shows the FFmpeg software path already hits 65-88 fps@1080p across all three codecs, so QPU substitution would be premature optimisation. 8.8 therefore ships what's actually useful:
1. Per-frame timing in test_m2m_stream.
2. Multi-frame AV1 + H.264 streams verified byte-exact at 1080p (closes the "VP9-only stream tests" gap from 8.7).
3. HDR / 10-bit via V4L2_PIX_FMT_P010 + daemon pack_p010_to_plane.
Test harness (tools/test_m2m_stream.c):
- Per-frame µs timing via CLOCK_MONOTONIC; reports mean/p50/p99/min/max + wall ms + fps.
- Annex-B H.264 parser: split on 3-/4-byte start codes, accumulate NALs into access units (push on VCL NAL types 1 or 5). Without AU grouping FFmpeg rejects SPS/PPS-only buffers as "no frame!".
- Format auto-detect (DKIF magic → IVF; else Annex-B).
- Optional 6th arg `[capture]`: nv12m | p010.
- CAPTURE mmap path generalised for num_planes==1 (P010).
Kernel (kernel/daedalus_v4l2_main.c):
- CAPTURE formats array {NV12M, P010}; enum_fmt walks it.
- daedalus_fill_capture_fmt takes a fourcc:
    NV12M: 2 planes, W*H + W*H/2 bytes, bpl=W
    P010: 1 plane, W*H*2 + W*H bytes, bpl=W*2
- try_fmt preserves the caller's fourcc when supported.
- daedalus_complete_resp_frame's dmabuf path now sets each plane's payload to vb2_plane_size(vb, p). This generalises cleanly across 1-plane (P010) and 2-plane (NV12M) layouts, since the daemon fully populates the plane and payload = sizeimage.
Daemon (daemon/src/decoder.c):
- pack_p010_to_plane: YUV420P10LE → P010 single-plane. 10-bit samples shifted left by 6 to MSB-align in 16-bit words per the V4L2 ABI. Y at base+0, interleaved CbCr right after the Y plane (per the format spec for single-plane P010). Strips source stride padding; respects destination stride.
- daedalus_decoder_run_request dispatches on req->capture_pix_fmt (NV12M → pack_nv12_to_planes; P010 → pack_p010_to_plane; else warn + skip). - Includes <linux/videodev2.h> for fourcc constants. Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): 1080p throughput baseline (30 frames testsrc, dmabuf path): VP9 1080p: mean 12.0 ms, p99 15.9 ms, fps 83.1, byte-exact ✓ AV1 1080p: mean 15.4 ms, p99 41.0 ms, fps 65.0, byte-exact ✓ H.264 1080p: mean 11.3 ms, p99 21.5 ms, fps 88.3, byte-exact ✓ All 2-3× over the 30fps-floor-is-fine criterion. HDR / 10-bit 1080p P010: 10 frames, 62 MB output, fps 48.8, byte-exact vs `ffmpeg -pix_fmt p010le -f rawvideo`. Small-frame P010 (320×240): fps 966 — fixed daemon overhead dominates at low resolutions. v4l2-compliance unchanged from 8.7: 49/49 passing. Format enumeration confirms NM12 + P010 on CAPTURE. Clean SIGTERM + rmmod; no kernel oops/WARN. Roadmap update (docs/roadmap.md): - 8.8 marked closed with closure-doc reference, including the explicit "QPU substitution not needed" rationale. - 8.9 reshaped: libva-v4l2-request consumer integration (per project_consumer_target memory) — the actual user-facing endpoint. Per correctness-before-speed: - Measured first; QPU work explicitly justified-out via data. - Byte-exact pixel comparison for every codec/format combo (NV12: VP9, AV1, H.264; P010: VP9 10-bit at 320×240 and 1080p). - AU grouping in the Annex-B parser is the correct semantic boundary, not just a workaround. - vb2_plane_size for payload generalises to any plane count, not hardcoded to 2. Phase 8.9 next: libva-v4l2-request integration — close the loop from YouTube/Firefox to /dev/video0 + daemon playback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:34:05 +00:00
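This sketch shows the YUV420P10LE to single-plane P010 repack described in Phase 8.8. It assumes even dimensions and strides counted in 16-bit samples; AVFrame linesizes are in bytes, so a real implementation would divide by two. It also assumes a little-endian host, so native uint16_t stores match P010's little-endian words. Names are hypothetical, not the daemon's pack_p010_to_plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source samples are 10-bit values in the low bits of 16-bit words; P010
 * wants them MSB-aligned, hence the << 6. */
static void pack_p010(const uint16_t *y, size_t y_stride,
		      const uint16_t *u, size_t u_stride,
		      const uint16_t *v, size_t v_stride,
		      int w, int h, uint16_t *dst, size_t dst_stride)
{
	for (int r = 0; r < h; r++)
		for (int c = 0; c < w; c++)
			dst[r * dst_stride + c] = (uint16_t)(y[r * y_stride + c] << 6);

	/* Interleaved CbCr starts right after the luma plane. */
	uint16_t *uv = dst + dst_stride * (size_t)h;
	for (int r = 0; r < h / 2; r++)
		for (int c = 0; c < w / 2; c++) {
			uv[r * dst_stride + 2 * c]     = (uint16_t)(u[r * u_stride + c] << 6);
			uv[r * dst_stride + 2 * c + 1] = (uint16_t)(v[r * v_stride + c] << 6);
		}
}

/* P010 single-plane sizing in bytes: W*H*2 (luma) + W*H (CbCr). */
static size_t p010_sizeimage(int w, int h)
{
	return (size_t)w * h * 2 + (size_t)w * h;
}
```

The shift maps the 10-bit maximum 1023 to 0xFFC0, so the low 6 bits of every word stay zero, as P010 requires.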
marfrit	5965805d86	Phase 8.7: media controller + multi-frame streaming verification Two pieces, both shipped:
1. Media controller binding closes the last v4l2-compliance failure from 8.6 (DECODER_CMD, which requires has_media on stateless decoders) and unlocks the V4L2 request API for libva-v4l2-request.
2. Multi-frame streaming test exercises the daemon's AVCodecContext state preservation across many REQ_DECODE calls. Phase 8.6's tests pushed exactly one keyframe per invocation; real content has P-frame references.
Compliance now reaches 49/49 passing.
Kernel (kernel/daedalus_v4l2_main.{c,h}):
- Added `struct media_device mdev` to daedalus_dev.
- media_device_init(&mdev) BEFORE v4l2_device_register, so v4l2-core sees v4l2_dev.mdev = &mdev and binds the m2m entities into the graph during register.
- After video_register_device: v4l2_m2m_register_media_controller(..., MEDIA_ENT_F_PROC_VIDEO_DECODER), then media_device_register, so userspace sees the complete graph in /dev/mediaN with the decoder entity tagged.
- daedalus_remove unwinds in reverse: unregister media, unregister mc, unregister video, release m2m, unregister v4l2, cleanup mdev.
- Error paths added for both new failure points.
Test harness (tools/test_m2m_stream.c, new):
- Multi-frame V4L2 m2m client: parses IVF → 4-deep buffer rings on both queues → per-frame QBUF/DQBUF loop → concatenates decoded NV12 to output file. Returns 0 only if every input frame decoded without error.
- Same codec vocabulary as test_m2m_decode (vp9 | av1 | h264 via 5th arg).
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): v4l2-compliance: 49 tests, 49 passed, 0 failed, 0 warnings.
  $ v4l2-ctl --list-devices
  daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
          /dev/video0
          /dev/media3
VP9 320×240, 30 frames (1 keyframe + 29 P-frames, 3.46 MB NV12): byte-for-byte match vs `ffmpeg -i in.ivf -pix_fmt nv12 -f rawvideo`. VP9 1920×1080, 10 frames (31 MB NV12 through the dmabuf path): byte-for-byte match vs the same reference command. 
Daemon log shows cookies 1..30 all completing cleanly in order; lazily-opened AVCodecContext maintains reference frames across the chardev round-trips. Clean SIGTERM + rmmod, no oops/WARN. Roadmap update (docs/roadmap.md): - 8.7 marked closed with closure-doc reference. - 8.8 reshaped: perf profiling, QPU dispatch substitution via daedalus-fourier, multi-frame AV1/H.264, HDR (P010M). Per correctness-before-speed: - Order-correct media controller lifecycle (init → bind v4l2_dev → register video → register mc → register media; reverse for teardown). - 4-deep buffer rings on both queues — the scheduler actually pipelines multiple in-flight cookies through the chardev (not just one-at-a-time as in 8.5/8.6 tests). - Bit-exact comparison against ffmpeg, not "looks right." - All resource paths cleaned on every error branch. Phase 8.8 next: profile daemon hot loops, dlopen daedalus-fourier from the daemon, swap FFmpeg per-block calls for daedalus_dispatch_* where the kernel matches, target 30fps@1080p from 30fps-floor-is-fine memory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:21:58 +00:00
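test_m2m_stream parses IVF before it fills the 4-deep OUTPUT ring. The sketch below covers that parsing step over an in-memory file. It follows the IVF container layout written by libvpx: a 32-byte `DKIF` file header carrying fourcc, width and height, followed by one 12-byte header per frame (little-endian 32-bit payload size plus 64-bit pts). The function and struct names are hypothetical, and this is not the tool's own code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t ivf_le16(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8;
}

static uint32_t ivf_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct ivf_info {
	char fourcc[5];     /* e.g. "VP90", NUL-terminated */
	unsigned width, height;
	size_t first_frame; /* offset of the first 12-byte frame header */
};

/* Parse the IVF file header. Returns 0 on success, -1 otherwise. */
static int ivf_parse_header(const uint8_t *buf, size_t len,
			    struct ivf_info *out)
{
	size_t hdr_len;

	if (len < 32 || memcmp(buf, "DKIF", 4) != 0)
		return -1;
	hdr_len = ivf_le16(buf + 6); /* header length field, normally 32 */
	if (hdr_len < 32 || hdr_len > len)
		return -1;
	memcpy(out->fourcc, buf + 8, 4);
	out->fourcc[4] = '\0';
	out->width = ivf_le16(buf + 12);
	out->height = ivf_le16(buf + 14);
	out->first_frame = hdr_len;
	return 0;
}

/*
 * *off points at a frame header (le32 payload size + le64 pts).
 * Returns 1 and the payload via *data/*size, advancing *off past it;
 * 0 at a clean end of buffer; -1 on a truncated frame.
 */
static int ivf_next_frame(const uint8_t *buf, size_t len, size_t *off,
			  const uint8_t **data, uint32_t *size)
{
	uint32_t sz;

	if (*off >= len)
		return *off == len ? 0 : -1;
	if (len - *off < 12)
		return -1;
	sz = ivf_le32(buf + *off);
	if (len - *off - 12 < sz)
		return -1;
	*data = buf + *off + 12;
	*size = sz;
	*off += 12 + (size_t)sz;
	return 1;
}
```

In a streaming client, each payload returned by `ivf_next_frame` would be copied into the next free OUTPUT buffer and queued with `bytesused` set to `size`.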
marfrit	c7f6fb90cb	Phase 8.6: dmabuf + AV1 + H.264 + stateless controls Removes the Phase 8.5 64 KiB frame-size cap by exporting CAPTURE buffers as dmabuf fds that the daemon mmaps and writes pixels into directly. Adds AV1 + H.264 codec support, V4L2 stateless control registration, and the compliance polish that brings the driver to 47/48 v4l2-compliance tests passing. Protocol (include/daedalus_v4l2_proto.h): - struct daedalus_req_decode grew capture-buffer metadata (width/height/pix_fmt/num_planes + per-plane size+stride). - New DAEDALUS_IOC_GET_DMABUF ioctl on the chardev: daemon asks for a per-plane dmabuf fd, kernel calls vb2_core_expbuf in daemon task context so the fd lands in the daemon's table. Kernel m2m driver (kernel/daedalus_v4l2_main.c): - Both queues switched to vb2_dma_contig_memops. OUTPUT was vmalloc in 8.5; the switch is needed because vmalloc doesn't honour V4L2_MEMORY_FLAG_NON_COHERENT and v4l2-compliance's REQBUFS test rejected the driver because of it. We still read the bitstream via vb2_plane_vaddr (dma_contig gives a kernel virtual address just like vmalloc did). - dma_coerce_mask_and_coherent(DMA_BIT_MASK(32)) in probe. - queue_setup populates alloc_devs[plane] = &pdev->dev for both queues; allow_cache_hints=1 on both. - daedalus_export_capture_dmabuf(cookie, plane, flags, fd): walks the inflight list, calls vb2_core_expbuf on the CAPTURE buffer in the caller's (daemon's) task context. - device_run fills the new REQ_DECODE capture fields from ctx->dst_fmt and maps ctx->src_fmt.pixelformat to DAEDALUS_CODEC_VP9 / _AV1 / _H264 (was hard-wired to VP9). - daedalus_complete_resp_frame handles both the 8.5 inline path (kept for debugging) and the 8.6 dmabuf path (pixels already in the CAPTURE buffer, just set payload from metadata). - enum_fmt advertises all 3 OUTPUT formats (VP9F, AV1F, S264). - try_fmt preserves userspace colorspace fields instead of overwriting them with REC709 defaults (fixes the 8.5 compliance failure). 
- s_fmt propagates OUTPUT colorspace → CAPTURE (stateless decoder round-trip test at v4l2-test-formats.cpp:958). - 12 V4L2 stateless controls registered per open (VP9_FRAME, VP9_COMPRESSED_HDR, H264_SPS/PPS/SCALING/PRED_WEIGHTS/SLICE_PARAMS/DECODE_PARAMS, AV1_FRAME/SEQUENCE/TILE_GROUP_ENTRY/FILM_GRAIN). The daemon ignores the values (FFmpeg re-parses the bitstream); registration is what makes libva-v4l2-request see us. Kernel chardev (kernel/daedalus_v4l2_chardev.c): - New unlocked_ioctl dispatching DAEDALUS_IOC_GET_DMABUF to daedalus_export_capture_dmabuf. - debugfs test_decode cookies unified with the m2m cookie allocator via shared daedalus_next_cookie() — kills the Phase 8.5 namespace collision. Daemon (daemon/src/...): - New dmabuf_capture.{c,h}: GET_DMABUF + mmap each plane on REQ_DECODE; munmap + close on completion. O_RDWR | O_CLOEXEC is essential — vb2_core_expbuf extracts O_ACCMODE from flags and exports read-only by default (caught on first run; mmap -EACCES on PROT_WRITE). - decoder.{c,h}: lazily opens AV1 + H.264 AVCodecContexts in addition to VP9 (dropped the -ENOSYS stubs). pack_nv12_to_planes writes Y line-by-line into planes[0] with planes[0].stride; interleaves Cb/Cr into planes[1] with planes[1].stride. - chardev_client.c handle_req_decode: opens dmabuf planes, runs decode (pixels land in the CAPTURE buffer directly), closes planes, sends metadata-only RESP_FRAME. No wire-pixel allocation. Test harness (tools/test_m2m_decode.c): - Optional 5th arg `codec` (vp9 | av1 | h264). Same client drives all three codecs. Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): Bit-exact end-to-end vs `ffmpeg -pix_fmt nv12`: VP9 1920x1080 3,110,400 bytes MATCH; AV1 128x96 18,432 bytes MATCH; H.264 128x96 18,432 bytes MATCH. VP9 1080p went through the full dmabuf path with no chardev payload bloat — the same chardev that capped at 64 KiB in 8.5 now ferries metadata only and lets the daemon mmap+write a 3.1 MB frame directly into the V4L2 client's buffer. 
v4l2-compliance: Phase 8.1: 44/48; Phase 8.5: 44/48 (different failures after m2m landed); Phase 8.6: 47/48. Only remaining failure: VIDIOC_(TRY_)DECODER_CMD (needs media controller — explicitly Phase 8.7 work). 11 standard compound controls visible: vp9_frame_decode_parameters, vp9_probabilities_updates, h264_sequence_parameter_set, h264_picture_parameter_set, h264_scaling_matrix, h264_prediction_weight_table, h264_slice_parameters, h264_decode_parameters, av1_sequence_parameters, av1_frame_parameters, av1_film_grain (av1_tile_group_entry refused by hdl->error on this kernel — skipped silently). Clean SIGTERM + rmmod, no oops/WARN. Roadmap update (docs/roadmap.md): - Phase 8.6 marked closed with the closure-doc reference. - Phase 8.7 reshaped to (1) media controller, (2) perf + daedalus_dispatch_ substitution, (3) HDR/10-bit, (4) long-form multi-frame streaming. Per correctness-before-speed: - Real V4L2 dmabuf via vb2_core_expbuf (not a sideband fd-passing hack). - O_RDWR access mode threaded through correctly. - Strict pixel-byte comparison against ffmpeg, not "looks right" eyeballing. - Each compliance edge documented with the underlying test source line + the fix. - All resource paths cleaned (munmap + close per plane on every exit, including error paths). Phase 8.7 next: media controller binding (closes the last compliance failure), per-frame profiling, QPU dispatch substitution targeting 30fps@1080p from 30fps-floor-is-fine memory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:16:06 +00:00
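pack_nv12_to_planes has to honour each destination plane's stride, because the CAPTURE plane stride carried in REQ_DECODE need not equal FFmpeg's linesize. The sketch below makes three assumptions: an 8-bit YUV420P source, a two-plane NV12M destination, and a `struct sketch_plane` that stands in for the per-plane pointer + stride the daemon holds after mmapping each dmabuf. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one mmapped CAPTURE plane. */
struct sketch_plane {
	uint8_t *data;
	size_t stride; /* bytes per row, >= visible width */
};

/*
 * Pack 8-bit YUV420P (source strides in bytes, like AVFrame.linesize)
 * into NV12M: Y row by row into planes[0], Cb/Cr interleaved into
 * planes[1]. Assumes even w/h. Padding bytes past the visible width in
 * each destination row are left untouched.
 */
static void sketch_pack_nv12(const uint8_t *y, size_t ys, const uint8_t *cb,
			     const uint8_t *cr, size_t cs, int w, int h,
			     const struct sketch_plane planes[2])
{
	for (int r = 0; r < h; r++)
		memcpy(planes[0].data + r * planes[0].stride, y + r * ys,
		       (size_t)w);

	for (int r = 0; r < h / 2; r++) {
		uint8_t *dst = planes[1].data + r * planes[1].stride;

		for (int c = 0; c < w / 2; c++) {
			dst[2 * c] = cb[r * cs + c];
			dst[2 * c + 1] = cr[r * cs + c];
		}
	}
}
```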
marfrit	6f4b580f7c	Phase 8.5: full V4L2 m2m driver, VP9 decode via QBUF/DQBUF Replaces the Phase 8.4 debugfs-triggered chardev path with a real V4L2 m2m driver. Userspace clients now drive decoding the standard way — S_FMT / REQBUFS / QBUF on the OUTPUT (bitstream) queue, DQBUF on the CAPTURE (NV12M) queue. Kernel device_run packs the bitstream into REQ_DECODE; daemon decodes via FFmpeg; RESP_FRAME's inline NV12 pixel payload lands in the CAPTURE buffer. Phase 8.6 swaps the inline payload for dmabuf so big frames stop being capped at 64 KiB. Kernel (daedalus_v4l2_main.c, rewritten + main.h added): - Per-open struct daedalus_ctx: v4l2_fh, m2m_ctx, ctrl_handler, per-queue v4l2_pix_format_mplane. - Two vb2_queues (vb2_vmalloc_memops for both — no DMA needed yet; 8.6 switches CAPTURE to dma_contig for dmabuf-export): OUTPUT = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, VP9_FRAME CAPTURE = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, NV12M - Full v4l2_ioctl_ops table: querycap, enum_fmt, g/s/try_fmt for both queues, reqbufs/querybuf/qbuf/dqbuf/create_bufs/ prepare_buf/expbuf/streamon/streamoff via v4l2_m2m_ioctl_* helpers. - v4l2_m2m_ops.device_run: peeks next OUTPUT buf, builds REQ_DECODE inline with the bitstream bytes, enqueues with an auto-incrementing cookie, stores {ctx, src_buf, dst_buf} in a per-device inflight list. Job stays open until RESP_FRAME. - daedalus_complete_resp_frame(): pops the inflight entry, memcpys inline NV12 pixels into the CAPTURE buffer (Y plane + interleaved CbCr), finishes via v4l2_m2m_buf_done_and_job_finish — NOT plain buf_done + job_finish, which leaves the src buf on the m2m queue and causes device_run to immediately re-run on the same input (caught on first run; second REQ_DECODE for same bitstream + eventual oops in stop_streaming on teardown). Kernel (daedalus_v4l2_chardev.c): - RESP_FRAME handler now hands inline pixel payload to daedalus_complete_resp_frame so it lands in the CAPTURE vb2 buffer. 
Existing PONG and debugfs test_decode paths still work; the latter produces a harmless ratelimited "unknown cookie" warning since it bypasses V4L2 m2m. Daemon (decoder.c, decoder.h): - daedalus_decoder_run_request signature extended with (nv12_out, nv12_cap, nv12_used). After the FNV-1a digest the decoder packs YUV420P into NV12 in the caller's buffer: Y plane line-by-line stripped of stride padding; Cb/Cr interleaved into a single chroma plane. Truncation is silent — the kernel only memcpys what fits in the CAPTURE plane. Daemon (chardev_client.c): - handle_req_decode allocates a response buffer sized for the full chardev payload, lets the decoder fill the pixel area after the resp_frame struct, and sends the full payload via the existing send_response. Test client (tools/test_m2m_decode.c, new): - Minimal V4L2 m2m client: S_FMT both queues, REQBUFS 1 each, mmap+fill OUTPUT, QBUF both, STREAMON, poll, DQBUF, dump CAPTURE planes to a raw NV12 file. ~250 LOC; verifies the whole flow without needing v4l2-ctl framing. Roadmap update (docs/roadmap.md): - Phase 8.4 retitled "daemon ↔ kernel decode round-trip" to reflect what actually shipped (vs. the original V4L2-ioctl-driven plan, which moved here). - Phase 8.5 retitled "full V4L2 m2m driver" with closure status. - Phase 8.6 reshaped to two tracks: dmabuf + AV1/H.264/stateless controls + media controller. Adds the punch list of v4l2-compliance failures (DECODER_CMD, S_FMT colorspace) that 8.6 will fix. Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): Kernel + daemon build clean (-Wall -Wextra clean on both sides). 
Test harness drives one VP9 keyframe end-to-end: OUTPUT REQBUFS -> 2 CAPTURE REQBUFS -> 2 QBUF OUTPUT[0] bytesused=1566 QBUF CAPTURE[0]; STREAMON both poll revents=0x5 DQBUF OUTPUT[0] flags=0x4001 (DONE) DQBUF CAPTURE[0] flags=0x4000 payloads=[12288, 6144] wrote 12288 Y + 6144 UV bytes to /tmp/out_m2m.nv12 Pixel correctness vs reference: ffmpeg -i vp9_small.ivf -pix_fmt nv12 -f rawvideo -y ref.nv12 cmp /tmp/out_m2m.nv12 /tmp/ref.nv12 → match ✓ Byte-for-byte identical to FFmpeg's stock CPU decode. v4l2-compliance: detected as Stateless Decoder; most ioctls pass; two expected fails documented in closure doc (DECODER_CMD/media controller, S_FMT colorspace). Clean teardown: SIGTERM the daemon, rmmod the module, no oops/WARN in dmesg. Per correctness-before-speed: - Real V4L2 ioctl table (not stubs); uses v4l2-core helpers where they exist instead of reinventing. - v4l2_m2m_buf_done_and_job_finish (not the manual sequence) to keep scheduler state consistent. - Bit-exact reference comparison, not just "looks right." - Documented every compliance failure with the planned fix. - All resource paths (kmalloc/kfree, inflight list cleanup, src/dst buf removal in stop_streaming) handled on every error branch. Phase 8.6 next: dmabuf-export for CAPTURE (removes 64 KiB frame-size cap), add AV1+H.264 codecs, add V4L2 stateless controls + media controller binding, fix the colorspace + cookie-namespace compliance issues. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:55:10 +00:00
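In device_run, the job is stored as {ctx, src_buf, dst_buf} under an auto-incrementing cookie and stays open until the matching RESP_FRAME arrives. At that point daedalus_complete_resp_frame pops the entry, or reports an unknown cookie. The sketch below is a single-threaded userspace model of that bookkeeping. In the kernel this would more likely be a `list_head` protected by a spinlock, and the 32-bit cookie width and the rule of never handing out cookie 0 are assumptions. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of one in-flight m2m job. */
struct inflight {
	uint32_t cookie;
	void *ctx, *src_buf, *dst_buf;
	struct inflight *next;
};

struct inflight_list {
	struct inflight *head;
	uint32_t last_cookie; /* 0 is never handed out */
};

/* device_run side: remember the job, return its cookie (0 on OOM). */
static uint32_t inflight_add(struct inflight_list *l, void *ctx,
			     void *src_buf, void *dst_buf)
{
	struct inflight *e = malloc(sizeof(*e));

	if (!e)
		return 0;
	if (++l->last_cookie == 0) /* skip 0 when the counter wraps */
		l->last_cookie = 1;
	e->cookie = l->last_cookie;
	e->ctx = ctx;
	e->src_buf = src_buf;
	e->dst_buf = dst_buf;
	e->next = l->head;
	l->head = e;
	return e->cookie;
}

/*
 * RESP_FRAME side: unlink and return the entry for cookie. NULL means an
 * unknown cookie (e.g. a response to a request that bypassed m2m).
 */
static struct inflight *inflight_pop(struct inflight_list *l, uint32_t cookie)
{
	for (struct inflight **pp = &l->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->cookie == cookie) {
			struct inflight *e = *pp;

			*pp = e->next;
			return e;
		}
	}
	return NULL;
}
```

Popping the entry before touching the buffers mirrors the ordering the commit stresses. Each job is completed exactly once, which is what v4l2_m2m_buf_done_and_job_finish relies on.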
marfrit	2a449632b9	Phase 8.4: daemon ↔ kernel decode round-trip (VP9 end-to-end) Wires the Phase 8.3 FFmpeg loader through the Phase 8.2 chardev bridge: kernel injects REQ_DECODE carrying a raw VP9 access unit, daemon hands the bitstream to libavcodec via dlopen, sends RESP_FRAME back with a content-dependent FNV-1a digest of the decoded YUV planes. Pure CPU decode for now — Phase 8.5 swaps in dmabuf + QPU dispatch. Protocol (include/daedalus_v4l2_proto.h): - New REQ_DECODE (kernel→daemon) and RESP_FRAME (daemon→kernel) message types, with fixed-size payload structs. - New DAEDALUS_CODEC_VP9/AV1/H264 enum (wire-stable so 8.6's AV1+H.264 work doesn't move existing values). - New DAEDALUS_DECODE_* status enum (OK / NO_FRAME / ERR_OPEN / ERR_SEND / ERR_RECV / ERR_CODEC). - Converted the prior `enum daedalus_msg_type` to #defines — high-bit values exceed INT_MAX and tripped -Wpedantic on userspace; kernel uABI headers use the same idiom. Kernel (kernel/daedalus_v4l2_chardev.c): - New debugfs entry /sys/kernel/debug/daedalus_v4l2/test_decode: writing raw bitstream bytes wraps them in a REQ_DECODE (codec=VP9 for Phase 8.4) and enqueues with an auto-incrementing cookie. - daedalus_chardev_write learned RESP_FRAME: parses the payload and emits a single pr_info line with decode metadata. Keeps existing PONG handling on the default arm. Daemon (daemon/src/...): - chardev_client.{c,h} — opens /dev/daedalus-v4l2, blocking read loop, single-buffer write() responses (kernel chardev has only .write, not .write_iter, so writev lands as -EINVAL — discovered the hard way during first run). - decoder.{c,h} — lazily-opened AVCodecContext per codec, shared AVPacket/AVFrame pair, descriptor-driven plane walker (av_pix_fmt_desc_get) so the same hash path covers YUV420P, YUV422P, YUV444P, GBRP and other 8-bit planar layouts. Generalised after first run decoded testsrc as GBRP (71) rather than the assumed YUV420P. 
- `daemon` command in main.c opens the chardev and runs the loop until SIGINT/SIGTERM. Cookie correlation handled end-to-end. - ffmpeg_loader gained av_pix_fmt_desc_get (23 symbols total). Build: - CMakeLists adds chardev_client.c + decoder.c; explicit -I../include for the shared protocol header. - Still -Wall -Wextra -Wpedantic clean. Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): $ ffmpeg ... -pix_fmt yuv420p -c:v libvpx-vp9 -frames:v 1 \ -y /tmp/vp9_test.ivf $ python3 ... strip IVF framing → vp9_keyframe.bin (3268 B) $ sudo insmod kernel/daedalus_v4l2.ko $ daedalus_v4l2_daemon -v daemon & $ sudo dd if=vp9_keyframe.bin \ of=/sys/kernel/debug/daedalus_v4l2/test_decode daemon: REQ_DECODE cookie=2 → decoded yuv420p 320x240 fnv1a=0x6ef10d71 luma=76800 chroma=38400 kernel: RESP_FRAME cookie=2 status=0 320x240 pixfmt=0 fnv1a=0x6ef10d71 ← matches daemon ✓ Hash properties verified: cookie=2 testsrc 3268 B → 0x6ef10d71 (first decode) cookie=3 red 44 B → 0x7f6e5dc5 (content-dependent ✓) cookie=4 testsrc 3268 B → 0x6ef10d71 (deterministic ✓) cookie=5 64 B random → status=101 (ERR_SEND, daemon alive) Daemon survives bad input (FFmpeg "Invalid sync code" wrapped into structured ERR_SEND response). Clean SIGTERM shutdown, clean rmmod. 
Phase 8.4 acceptance criteria met: - ✓ end-to-end kernel→daemon→FFmpeg→kernel round-trip - ✓ cookie correlation per request/response pair - ✓ content-dependent + deterministic digest - ✓ structured error responses (no daemon crash on bad input) - ✓ clean teardown (SIGTERM + rmmod) - ✓ builds clean on both kernel kbuild and daemon CMake Per correctness-before-speed: - Real chardev I/O (no shortcuts, no select-loop hacks) - Real FFmpeg AVCodecContext lifecycle (lazily opened, properly freed on cleanup) - Descriptor-driven plane walk (generalises across pix_fmts) - Structured error path (not just log-and-continue) - All resource paths cleaned up on every error branch - Documented why FNV-1a digest, why write() not writev(), why pix_desc walk in docs/phase_8_4_closure.md Phase 8.5 next: V4L2 m2m queue submits REQ_DECODE from vidioc_qbuf; dmabuf carries actual pixel data so the chardev's 64 KiB cap doesn't gate frame size; begin substituting daedalus_dispatch_* into the daemon's decode path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:22:16 +00:00
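The RESP_FRAME digest is FNV-1a over the decoded planes. The logged values (e.g. 0x6ef10d71) are 8 hex digits, which suggests the 32-bit variant. The sketch below uses the standard 32-bit offset basis 0x811c9dc5 and prime 0x01000193. It hashes only the visible bytes of each row, so stride padding cannot change the digest. The commit does not say exactly which bytes the daemon feeds in or in what order, so the row-by-row, visible-width-only walk is an assumption. In a descriptor-driven walker, per-plane widths and heights would come from the `av_pix_fmt_desc_get` chroma shifts.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV1A32_OFFSET 0x811c9dc5u
#define FNV1A32_PRIME  0x01000193u

/* Fold n bytes into a running 32-bit FNV-1a state. */
static uint32_t fnv1a32_update(uint32_t h, const uint8_t *p, size_t n)
{
	while (n--) {
		h ^= *p++;
		h *= FNV1A32_PRIME;
	}
	return h;
}

/*
 * Hash one plane row by row, covering width_bytes per row and skipping
 * the padding between width_bytes and stride. Chain the return value
 * into the next plane's call to digest a whole frame.
 */
static uint32_t hash_plane(uint32_t h, const uint8_t *data, size_t stride,
			   size_t width_bytes, size_t rows)
{
	for (size_t r = 0; r < rows; r++)
		h = fnv1a32_update(h, data + r * stride, width_bytes);
	return h;
}
```

A frame digest would then be `hash_plane(hash_plane(hash_plane(FNV1A32_OFFSET, Y...), Cb...), Cr...)`. The result is deterministic for identical content and changes when any visible byte changes, which matches the "content-dependent + deterministic" property checked above.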
marfrit	873a04c622	Phase 8.3: userspace daemon scaffold + FFmpeg dlopen + parse path Builds the daemon executable per the locked Phase 8 architecture (Option γ: dlopen FFmpeg at runtime). Phase 8.3 scope: parse path validation only — no V4L2 wiring, no decode, no chardev connection. Components: - daemon/CMakeLists.txt — CMake with -Wall -Wextra -Wpedantic clean. pkg-config for FFmpeg headers; only -ldl + -lpthread at link time. - daemon/src/main.c — entry point, signal handlers (SIGINT/SIGTERM), command dispatcher. Currently `parse <file>`. - daemon/src/ffmpeg_loader.{c,h} — runtime FFmpeg loader. dlopens libavformat.so.61, libavcodec.so.61, libavutil.so.59. Resolves 22 function pointers using the POSIX-recommended `*(void **)(&fptr) = dlsym(...)` idiom (per POSIX.1-2017 dlsym(3p) Rationale). - daemon/src/parser.{c,h} — demux loop via avformat_open_input + av_read_frame. Per-frame logging on -v. - daemon/src/log.{c,h} — logging facade (stderr in Phase 8.3; syslog/journal planned for 8.5+). Verification on hertz: $ ffmpeg -f lavfi -i testsrc=duration=2:size=320x240:rate=30 \ -c:v libvpx-vp9 -y /tmp/testsrc.ivf $ daedalus_v4l2_daemon parse /tmp/testsrc.ivf [INFO] FFmpeg loaded: 7.1.3-0+deb13u1+rpt1 (libavformat 61.7.100) [INFO] video stream #0: codec=vp9 (Google VP9) 320x240, 0/0 fps [INFO] parse complete: 60 frames (1 key) total 17859 bytes Error paths verified: - Missing file → "avformat_open_input(...): code -2", exit 1 - No command → usage message, exit 2 - Bad command → usage message, exit 2 Per correctness-before-speed: - Real CMake (no Makefile hacks) - pkg-config for headers - POSIX-conformant dlsym pattern (no -Wpedantic suppression) - Real signal handling + proper exit codes - Real logging with timestamp + level - Headers included at compile-time for type safety; dlopen decouples runtime - All FFmpeg resources freed on every exit path - Builds clean on -Wall -Wextra -Wpedantic Phase 8.3 acceptance criteria met: - ✓ daemon binary builds - ✓ dlopen FFmpeg at runtime - ✓ demux a VP9 
IVF file end-to-end - ✓ per-frame metadata logged correctly - ✓ frame count + keyframe count + byte total accurate Phase 8.4 next: wire daemon to /dev/daedalus-v4l2 chardev, add REQ_DECODE / RESP_FRAME handling, drive VP9 decode end-to-end via daedalus_dispatch_ from daedalus-fourier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:10:22 +00:00
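The dlsym pattern referred to above assigns the result through `*(void **)&fptr`. This avoids a direct object-pointer to function-pointer cast, which ISO C leaves undefined and which `-Wpedantic` flags. So that it runs without FFmpeg installed, the sketch below resolves `strlen` from the already-loaded C library via `dlopen(NULL, ...)` rather than an FFmpeg symbol. The daemon's loader presumably does the equivalent per symbol against its libavformat/libavcodec/libavutil handles. `resolve_strlen` is a hypothetical name. On glibc older than 2.34, link with `-ldl`.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

typedef size_t (*strlen_fn)(const char *);

/*
 * Resolve a function pointer the way the POSIX dlsym() rationale
 * recommends: write dlsym's void * result through *(void **)&fn so no
 * object-to-function pointer conversion appears in the source.
 */
static strlen_fn resolve_strlen(void)
{
	void *handle = dlopen(NULL, RTLD_NOW); /* main program + its deps */
	strlen_fn fn = NULL;

	if (!handle) {
		fprintf(stderr, "dlopen: %s\n", dlerror());
		return NULL;
	}
	dlerror(); /* clear any stale error before dlsym */
	*(void **)(&fn) = dlsym(handle, "strlen");
	if (dlerror() != NULL)
		fn = NULL;
	/* The main-program handle stays valid; it is deliberately not closed. */
	return fn;
}
```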
marfrit	895f57c63a	Phase 8.2: kernel ↔ daemon chardev bridge with round-trip test Adds /dev/daedalus-v4l2 misc chardev to the kernel module. The chardev is the IPC channel for the future userspace decoder daemon: kernel enqueues REQ_* messages, daemon read()s them, processes, write()s RESP_* back. Wire protocol (pre-1.0, header in include/daedalus_v4l2_proto.h): - struct daedalus_msg_hdr: magic (D04V) + version + type + cookie + payload_len + reserved - Request/response separated by high bit of type field - Max 64 KiB payload per message - Cookie correlates request with matching response Kernel implementation (kernel/daedalus_v4l2_chardev.{c,h}): - Single-instance chardev (-EBUSY on second open) - In-kernel FIFO bounded at 64 messages - Blocking + non-blocking read; poll() with EPOLLIN on queued - write() parses + validates header, logs response at pr_debug - Bad magic → -EBADMSG, bad version → -EPROTO, oversize → -EMSGSIZE - All error paths free resources Phase 8.2 test trigger via debugfs: - /sys/kernel/debug/daedalus_v4l2/test_ping — any byte enqueues a PING with a fixed 24-byte payload. Removed in Phase 8.4 when real REQ_DECODE from V4L2 path takes over. Userspace verification tool (tools/test_chardev_pingpong.c): - Real C program, proper error reporting via strerror - Validates the 6-step round-trip: open → empty-queue EAGAIN → trigger ping → read PING → verify all fields → write PONG → close - Builds with -Wall -Wextra clean Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): $ sudo insmod daedalus_v4l2.ko $ sudo tools/test_chardev_pingpong opening /dev/daedalus-v4l2... non-blocking read on empty queue: EAGAIN ✓ injected PING via debugfs ✓ read PING: magic ✓ version ✓ type=PING ✓ cookie=0x1234 ✓ payload=24 bytes payload: "DAEDALUS-V4L2-PING-PL" wrote PONG (cookie=0x1234) ✓ ALL TESTS PASSED. 
$ sudo rmmod daedalus_v4l2 # clean Per correctness-before-speed: full kerneldoc on structs, 8-tab kernel style, SPDX headers, proper error paths, real test program (not "I ran it once"), failure-mode coverage documented. Phase 8.3 next: userspace daemon with dlopen'd FFmpeg parse path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:05:54 +00:00
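The Phase 8.2 header checks can be sketched as a pure validation function. The commit fixes the field order (magic, version, type, cookie, payload_len, reserved), the high-bit request/response split, the 64 KiB payload cap and the three error codes. The 32-bit field widths, the byte encoding of the "D04V" magic, the version value 1 and the `-EINVAL` cases are assumptions. The `#define`s mirror the Phase 8.4 change away from an enum, since 0x80000000u exceeds INT_MAX. All names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: field order from the commit, widths assumed. */
struct sketch_msg_hdr {
	uint32_t magic;
	uint32_t version;
	uint32_t type;
	uint32_t cookie;
	uint32_t payload_len;
	uint32_t reserved;
};

#define SKETCH_MAGIC       0x56343044u /* bytes "D04V" read as a LE u32 (assumed) */
#define SKETCH_VERSION     1u          /* assumed */
#define SKETCH_TYPE_RESP   0x80000000u /* high bit set = response */
#define SKETCH_IS_RESPONSE(t) (((t) & SKETCH_TYPE_RESP) != 0)
#define SKETCH_MAX_PAYLOAD (64u * 1024u)

/* Validate one write() from the daemon; returns 0 or a negative errno. */
static int sketch_validate(const void *buf, size_t len,
			   struct sketch_msg_hdr *out)
{
	struct sketch_msg_hdr h;

	if (len < sizeof(h))
		return -EINVAL;
	memcpy(&h, buf, sizeof(h)); /* no alignment assumptions about buf */
	if (h.magic != SKETCH_MAGIC)
		return -EBADMSG;
	if (h.version != SKETCH_VERSION)
		return -EPROTO;
	if (h.payload_len > SKETCH_MAX_PAYLOAD)
		return -EMSGSIZE;
	if (h.payload_len != len - sizeof(h))
		return -EINVAL; /* header and write() length disagree */
	*out = h;
	return 0;
}
```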
marfrit	9415b7e0f7	Phase 8.1: kernel V4L2 device skeleton (out-of-tree module) Out-of-tree Linux kernel module registering /dev/videoNN. Phase 8.1 scope: skeleton only — VIDIOC_QUERYCAP works, no codec ioctls / no vb2_queue / no controls yet. Real V4L2 plumbing throughout per "correctness before speed": platform_device + v4l2_device + video_device, properly nested with error paths and devm_kzalloc-managed lifetime. Per-cycle 9 discipline ports to kernel code: SPDX header, kernel coding style (8-tab, static-by-default), kerneldoc on structs, no shortcuts. Files (~250 LOC total): - kernel/Makefile — out-of-tree kbuild with checkpatch target - kernel/daedalus_v4l2_main.c — module init/exit + probe/remove Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): - Builds clean with -Wall -Wextra. No warnings. - modprobe / rmmod round-trip clean. No dmesg taints beyond the expected "out-of-tree taint" line. - v4l2-ctl --list-devices shows: "daedalus-fourier V3D7+NEON (platform:daedalus_v4l2): /dev/video0" - VIDIOC_QUERYCAP returns driver/card/bus/caps as specified. - v4l2-compliance: 44/48 passing. The 4 failures are exactly the format/buffer ioctls Phase 8.2 will implement (ENUM_FMT, G_FMT, Scaling, REQBUFS) — not skeleton bugs, legitimately-absent features. Documentation: docs/phase_8_1_closure.md captures full verification output + Phase 8.2 plan. Phase 8.1 acceptance criteria met: - ✓ /dev/videoNN appears via v4l2-ctl --list-devices - ✓ VIDIOC_QUERYCAP responds with sensible values - ✓ rmmod is clean (no kref leaks) - ✓ v4l2-compliance passes except for explicit Phase 8.2 work Next: Phase 8.2 chardev bridge for kernel ↔ daemon IPC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:03:22 +00:00
marfrit	89f56e4b49	README: add sibling link back to daedalus-fourier Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 14:57:55 +00:00
marfrit	c7d8050cc9	Initial scaffold: daedalus-v4l2 sibling repo V4L2 stateless decoder for Pi 5, backed by sibling daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video decode kernels on VideoCore VII compute + ARM NEON). Architecture locked 2026-05-18 by mfritsche per daedalus-fourier/docs/phase8_scoping.md: - Option B: Linux kernel V4L2 shim + userspace daemon (not v4l2loopback). Real /dev/videoNN; proper DRM PRIME for browser zero-copy. - Option γ: dlopen FFmpeg at runtime as parser. No vendoring; fastest to v1. - Sibling repo (this repo): V4L2-side work outside of daedalus-fourier so kernel-library API stays clean. Components: kernel/ - Linux out-of-tree kernel module (GPLv2; V4L2 device + chardev bridge to userspace daemon) daemon/ - userspace decoder daemon (BSD-2-Clause; links libdaedalus_core.a from sibling; dlopens FFmpeg) docs/ - architecture + 7-phase roadmap (8.1..8.7) include/ - shared headers between kernel and daemon Roadmap (7 sub-phases, ~1 week each): 8.1 kernel skeleton (/dev/videoNN with no-op ioctls) 8.2 chardev bridge (kernel ↔ daemon ping-pong) 8.3 daemon FFmpeg dlopen + parse path 8.4 VP9 end-to-end via daedalus_dispatch_* 8.5 dmabuf / DRM PRIME for zero-copy 8.6 AV1 + H.264 codec support 8.7 performance: hit 30fps@1080p (project floor) No code yet — only README + design docs + directory structure. First implementation work starts in Phase 8.1 next session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 14:54:56 +00:00

26 Commits