f55b2cd002afdfd08f3c093627317f92e4929074
16 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f55b2cd002 |
kernel: media_request_get/put around inf->req (UAF safety)
Sonnet pre-deployment review flagged a SHIP-WITH-EYES-OPEN risk:
Phase 8.13's inf->req captured src_buf->vb2_buf.req_obj.req as a
raw pointer with no media_request_get(). On the normal decode
path that's fine because vb2-core holds its own reference until
v4l2_m2m_buf_done_and_job_finish releases it.
But on a concurrent cancel (MEDIA_IOC_REQUEST_REINIT or a process
kill triggering buf_request_complete from the cancel path before
RESP_FRAME comes back), vb2 could drop its reference first. Our
inf->req would then dangle through v4l2_ctrl_request_complete +
buf_done_and_job_finish — UAF.
Fix matches the cedrus / rkvdec pattern: take our own reference
when we capture the pointer, release it after we're done with it
(after buf_done_and_job_finish to keep the ordering crystal-clear).
/* in daedalus_device_run, after inf->req = src_buf->...->req */
if (inf->req)
media_request_get(inf->req);
/* in daedalus_complete_resp_frame, after buf_done_and_job_finish */
if (inf->req)
media_request_put(inf->req);
Verified on hertz:
- libva path (request-bound, inf->req != NULL): byte-exact NV12,
same FNV-1a as standalone.
- test_m2m_stream (direct QBUF, inf->req == NULL): 30/30 frames
decoded, conditional skip works.
- No kernel oops / WARN, no leak in dmesg.
Add #include <media/media-request.h> for the helpers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f04d7000f8 |
Phase 8.13: byte-exact end-to-end via libva (consumer target hit)
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format nv12 -i vp9_small.ivf \
-f rawvideo -y /tmp/vp9_via_libva.nv12
cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12 → match
18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.
Two related kernel changes:
1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
matches rkvdec/cedrus/hantro. Brings each registered
compound control out of "uninitialised" state via
std_init_compound defaults.
2. Per-request control completion in the decode path —
the real fix for "Timeout when waiting for media request".
vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
on normal decode completion, but the per-request CONTROL
object stays bound. buf_request_complete fires only from
queue-cancel paths (vb2-core line 2284), NOT from normal
buf_done. The driver must call
v4l2_ctrl_request_complete(req, hdl) explicitly from the
completion path.
struct daedalus_inflight gained a `struct media_request
*req` field, captured from src_buf->vb2_buf.req_obj.req
in device_run. daedalus_complete_resp_frame then calls
v4l2_ctrl_request_complete before
v4l2_m2m_buf_done_and_job_finish — triggers
MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
poll.
For non-request flows (test_m2m_stream direct QBUF)
inf->req is NULL; the conditional skips the call.
Both consumer styles work concurrently.
Diagnostic clarification (was Phase 8.13a):
strace traced three S_EXT_CTRLS calls per frame:
1. H264_PROFILE + H264_LEVEL → EINVAL (we don't register)
2. HEVC_PROFILE + HEVC_LEVEL → EINVAL (we don't register)
3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS
The first two are harmless: libva probes whether we support
H264/HEVC integer profile/level controls during config
negotiation; we don't (we expose them as stateless), so EINVAL
just falls through. The actual VP9 stateless controls (#3)
succeeded all along — the libva-side "Unable to set control(s)"
log was misleading us into thinking the control path was the
bug.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
daemon log:
REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
decoder: opened vp9 context
decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...
ffmpeg side:
no Timeout, no Decoding error
/tmp/vp9_via_libva.nv12: 18432 bytes
cmp vs reference: byte-for-byte identical.
Roadmap update:
- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
higher-level consumers.
Per correctness-before-speed:
- strace + kernel-source-reading found the actual root cause
rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
test_m2m_stream non-request path — both consumer styles work
concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."
Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a7d585eee8 |
Phase 8.12: first VP9 frame decoded via libva
ffmpeg -hwaccel vaapi → libva-v4l2-request-fourier → /dev/video0 → daedalus_v4l2 kernel → REQ_DECODE on the chardev → daemon FFmpeg decode → byte-exact NV12 (FNV-1a 0x1eb34bfe, same hash the standalone test_m2m_stream produces for the same 128x96 VP9 keyframe). The pixel-correct decode through the libva path is the milestone. What's NOT yet working: libva times out on the media_request fd because buf_request_complete never fires (vb->req_obj.req is NULL when buf_done runs — the S_EXT_CTRLS EINVAL leaves the buffer un-bound to the request even though the buffer queues anyway). Phase 8.13 fixes the EINVAL so the request bind takes and the completion signal propagates. Kernel V4L2 request API integration: - media_device_ops.req_validate / req_queue = vb2_request_ validate / v4l2_m2m_request_queue (Phase 8.11) — MEDIA_IOC_REQUEST_ALLOC succeeds. - vb2_queue.supports_requests = true on OUTPUT queue — without this v4l2-core rejects S_EXT_CTRLS(REQUEST_VAL). - vb2_ops.buf_request_complete = daedalus_buf_request_complete → v4l2_ctrl_request_complete(req, &ctx->hdl). Without this v4l2-core WARNs at videobuf2-v4l2.c:440. - vb2_ops.buf_out_validate: sets field=V4L2_FIELD_NONE on OUTPUT buf. Required for the same WARN check. - requires_requests intentionally NOT set: lets the existing test_m2m_stream (direct QBUF, no request) keep working alongside the libva path. Stateless control re-registration: - Switched from v4l2_ctrl_new_std_compound(NULL p_def) to v4l2_ctrl_new_custom(&cfg, NULL) — pattern rkvdec / cedrus / hantro use. v4l2-core auto-fills elem_size + type from std table (verified: VP9_FRAME elem_size=168, matches sizeof(struct v4l2_ctrl_vp9_frame)). - No-op s_ctrl callback so SET requests don't crash — daemon ignores values, FFmpeg re-parses the bitstream. Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): ffmpeg -hwaccel vaapi -i vp9_small.ivf … daemon: REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes daemon: decoder: opened vp9 context daemon: decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe … Same FNV-1a hash as the standalone test_m2m_stream produces for the same VP9 keyframe. End-to-end through libva. Remaining (Phase 8.13): - S_EXT_CTRLS EINVAL on V4L2_CID_STATELESS_VP9_FRAME despite matching elem_size — needs deeper validate-path debugging. - Once the request bind takes, buf_request_complete fires on buf_done, request fd signals completion, libva DQBUFs the decoded NV12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0de0288dce |
Phase 8.10+8.11: libva consumer integration scaffold
Brings daedalus_v4l2 from "standalone test client" to "VAAPI-
discoverable decoder" by adding the surface formats and
media-controller plumbing that libva-v4l2-request-fourier
(sibling repo) requires.
libva-v4l2-request-fourier patches (pushed separately):
- b5b3acf: daedalus_v4l2 added to known_decoder_drivers
- 2146341: meson option gate
This commit (daedalus-v4l2 side, 3 production changes):
1. V4L2_PIX_FMT_NV12 (single-plane) on CAPTURE
- Added to daedalus_capture_formats[] alongside NV12M + P010
- daedalus_fill_capture_fmt handles num_planes=1 case
(sizeimage = W*H*3/2, bytesperline = W)
- daemon pack_nv12_single_to_plane: Y at base+0,
interleaved CbCr at base+(stride*H); same byte content
as NV12M two-plane, different layout
- Required because libva-v4l2-request-fourier's video.c
only knows non-multi-plane NV12 (it advertises
v4l2_mplane=true but uses the single-plane fourcc).
- Verified byte-exact via test_m2m_stream against
ffmpeg -pix_fmt nv12 reference (VP9 1080p 10 frames,
31 MB).
2. V4L2 Request API media ops
- daedalus_media_ops = { vb2_request_validate,
v4l2_m2m_request_queue } assigned to mdev.ops before
media_device_init.
- Without this, MEDIA_IOC_REQUEST_ALLOC returned
-ENOTTY and no VAAPI consumer could allocate a
media_request.
3. Stateless control registration via v4l2_ctrl_new_custom
- Switched from v4l2_ctrl_new_std_compound(NULL p_def)
to v4l2_ctrl_new_custom — pattern rkvdec/cedrus/
hantro use. Adds a no-op s_ctrl callback.
Verification (hertz, Pi 5, 6.12.75+rpt-rpi-2712):
LibVA trace through `ffmpeg -hwaccel vaapi`:
vaInitialize / Profiles / Entrypoints / CreateConfig /
QuerySurfaceAttributes / CreateSurfaces / CreateContext
(cap_pool: 24 slots, 1 plane each) / CreateBuffer
(slice + picture params) / MEDIA_IOC_REQUEST_ALLOC
— all succeed.
Standalone NV12 decode path:
test_m2m_stream vp9_1080_stream.ivf out.nv12 1920 1080 vp9 nv12
→ 10/10 frames, byte-exact vs ffmpeg -pix_fmt nv12
vainfo (via libva-v4l2-request-fourier with our driver):
7 VAProfile entries with VAEntrypointVLD
(H264 Main/High/CBaseline/MultiviewHigh/StereoHigh,
VP9Profile0, AV1Profile0)
What's NOT here (Phase 8.12):
The libva trace stops at VIDIOC_S_EXT_CTRLS returning
EINVAL when populating V4L2_CID_STATELESS_VP9_FRAME on
the request. The compound-control payload validation
against the kernel's expected struct shape rejects.
This isn't a "missing line" fix — it needs proper
stateless control plumbing (the SPS/PPS/SliceParams
get_dims, validate, default-value paths that in-tree
rkvdec/cedrus/hantro implement to satisfy v4l2-core's
std_validate). Documented as Phase 8.12 scope.
The shipped integration is itself a meaningful deliverable:
all the framework scaffolding is in place; the remaining
gap is well-characterised and bounded.
See docs/phase_8_10_11_closure.md for the full trace
analysis + next-phase plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d84efdb125 |
Phase 8.9: long-form stress + multi-codec HDR + libva scoping
Three verification deliverables; no production code changes
(infrastructure from 8.8 was sufficient).
1. libva-v4l2-request consumer investigation (task 95):
- bootlin/libva-v4l2-request@master supports MPEG-2 /
H.264 / HEVC only. No VP9, no AV1.
- H264 expects V4L2_PIX_FMT_H264_SLICE_RAW (older
fourcc); we advertise V4L2_PIX_FMT_H264_SLICE.
- CAPTURE expects V4L2_PIX_FMT_NV12 (single-plane);
we advertise NV12M + P010.
- Real integration = patch libva-v4l2-request to add
VP9 + AV1 mappings + accept the newer H.264 fourcc.
Multi-session work — pushed to Phase 8.10.
2. Long-form stress test (task 96):
- Built a 1800-frame (60s @ 30fps) VP9 1080p stream
by Python concat of vp9_5s.ivf × 12 with PTS
adjustment and re-muxed IVF header.
- 1800 / 1800 frames decoded cleanly through
test_m2m_stream + daemon, fps=120.9 sustained
across 14.9 s wall, p99=17.3 ms/frame (well inside
the 33 ms 30fps budget).
- Daemon alive after 3620 cookies across two
back-to-back runs, RSS=23 MiB — no leak.
- No kernel oops/WARN, no fps degradation across
the long run.
3. Multi-codec HDR (task 97):
- AV1 1080p 10-bit → P010: byte-exact vs ffmpeg
p010le. fps 17.1 (below 30fps target; AV1 10-bit
is intrinsically expensive).
- H.264 1080p 10-bit (high10) → P010: byte-exact
vs ffmpeg p010le. fps 26.9 (close to target).
- Combined with 8.8's VP9-10bit P010 result
(48.8 fps): all three codecs' 10-bit paths
produce byte-exact P010 output.
Roadmap update (docs/roadmap.md):
- 8.9 marked closed with the scope-cut explained.
- 8.10 = libva-v4l2-request VP9/AV1 patch + end-to-end
consumer integration (the actual user-facing loop:
mpv --hwdec=vaapi → libva-v4l2-request → /dev/video0
→ daemon → decoded frame).
Per correctness-before-speed: characterised the libva
integration scope rigorously rather than starting a
multi-session battle in this phase. The bounded
deliverables (stress test + HDR matrix) ship clean and
prove the existing infrastructure handles real-world
workloads stably.
Phase 8.10 next: build + patch libva-v4l2-request on
hertz; end-to-end with mpv.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1d0db3b5a9 |
docs: pure ffmpeg vs daedalus pipeline CPU comparison
Measured on hertz (Pi 5, 6.12.75+rpt-rpi-2712, FFmpeg 7.1.3)
to quantify the architectural cost/benefit of routing decode
through the V4L2 m2m + chardev + dmabuf path vs running
ffmpeg standalone.
1080p × 150 frames, decode-as-fast-as-possible:
VP9 8-bit: ffmpeg 214.9% CPU / 1083ms wall
daedalus 96.3% CPU / 1229ms wall
AV1 8-bit: ffmpeg 201.5% CPU / 1162ms wall
daedalus 96.6% CPU / 1478ms wall
H.264 8-bit: ffmpeg 205.8% CPU / 1063ms wall
daedalus 100.1% CPU / 1020ms wall
VP9 10-bit: ffmpeg 155.8% CPU / 269ms wall
daedalus 91.6% CPU / 131ms wall
Key takeaway: the daedalus pipeline uses ~half the CPU for
roughly the same wall throughput. FFmpeg standalone defaults
to 2 threads; for single-stream decode that doesn't
parallelise well, so the 2× CPU usage is overhead, not
parallelism benefit. The daemon's single-threaded serialised
event loop avoids that tax.
For the project's 30fps-floor-is-fine target ("daily YouTube
with CPU free for vscode"), daedalus leaves ~2× the CPU
headroom for the rest of the desktop at the same playback
rate.
VP9-10bit is striking — daedalus is faster wallclock too
(131ms vs 269ms) because at small per-frame work FFmpeg's
thread pool spin-up dominates.
Note: "daedalus" still uses FFmpeg internally (Phase 8.8
explicitly deferred QPU substitution after measurement showed
30fps@1080p was already met). The benefit here is
architectural — single-threaded decode, out-of-process
daemon, dmabuf zero-copy — not QPU offload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1ae9528e76 |
Phase 8.8: throughput baseline + multi-codec streams + HDR
Per the correctness-before-speed principle: measure before
optimising. Roadmap going in said "QPU dispatch substitution
to hit 30fps@1080p". Measurement on hertz shows the FFmpeg
software path already hits 65-88 fps@1080p across all three
codecs — QPU substitution would be premature optimisation.
So 8.8 ships what's actually useful:
1. Per-frame timing in test_m2m_stream.
2. Multi-frame AV1 + H.264 streams verified byte-exact at
1080p (closes the "VP9-only stream tests" gap from 8.7).
3. HDR / 10-bit via V4L2_PIX_FMT_P010 + daemon
pack_p010_to_plane.
Test harness (tools/test_m2m_stream.c):
- Per-frame µs timing via CLOCK_MONOTONIC; reports mean/p50/
p99/min/max + wall ms + fps.
- Annex-B H.264 parser: split on 3-/4-byte start codes,
accumulate NALs into access units (push on VCL NAL types
1 or 5). Without AU grouping FFmpeg rejects SPS/PPS-only
buffers as "no frame!".
- Format auto-detect (DKIF magic → IVF; else Annex-B).
- Optional 6th arg `[capture]`: nv12m | p010.
- CAPTURE mmap path generalised for num_planes==1 (P010).
Kernel (kernel/daedalus_v4l2_main.c):
- CAPTURE formats array {NV12M, P010}; enum_fmt walks it.
- daedalus_fill_capture_fmt takes a fourcc:
NV12M: 2 planes, W*H + W*H/2 bytes, bpl=W
P010: 1 plane, W*H*2 + W*H bytes, bpl=W*2
- try_fmt preserves caller fourcc when supported.
- daedalus_complete_resp_frame's dmabuf path now sets each
plane's payload to vb2_plane_size(vb,p) — generalises
cleanly across 1-plane (P010) and 2-plane (NV12M) layouts;
the daemon fully populates the plane so payload =
sizeimage.
Daemon (daemon/src/decoder.c):
- pack_p010_to_plane: YUV420P10LE → P010 single-plane.
10-bit samples shifted left by 6 to MSB-align in 16-bit
words per V4L2 ABI. Y at base+0, interleaved CbCr right
after Y plane (per format spec for single-plane P010).
Strips source stride padding; respects destination stride.
- daedalus_decoder_run_request dispatches on
req->capture_pix_fmt (NV12M → pack_nv12_to_planes; P010
→ pack_p010_to_plane; else warn + skip).
- Includes <linux/videodev2.h> for fourcc constants.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
1080p throughput baseline (30 frames testsrc, dmabuf path):
VP9 1080p: mean 12.0 ms, p99 15.9 ms, fps **83.1**, byte-exact ✓
AV1 1080p: mean 15.4 ms, p99 41.0 ms, fps **65.0**, byte-exact ✓
H.264 1080p: mean 11.3 ms, p99 21.5 ms, fps **88.3**, byte-exact ✓
All 2-3× over the 30fps-floor-is-fine criterion.
HDR / 10-bit 1080p P010:
10 frames, 62 MB output, fps **48.8**, byte-exact vs
`ffmpeg -pix_fmt p010le -f rawvideo`.
Small-frame P010 (320×240): fps 966 — fixed daemon overhead
dominates at low resolutions.
v4l2-compliance unchanged from 8.7: 49/49 passing.
Format enumeration confirms NM12 + P010 on CAPTURE.
Clean SIGTERM + rmmod; no kernel oops/WARN.
Roadmap update (docs/roadmap.md):
- 8.8 marked closed with closure-doc reference, including
the explicit "QPU substitution not needed" rationale.
- 8.9 reshaped: libva-v4l2-request consumer integration
(per project_consumer_target memory) — the actual
user-facing endpoint.
Per correctness-before-speed:
- Measured first; QPU work explicitly justified-out via data.
- Byte-exact pixel comparison for every codec/format combo
(NV12: VP9, AV1, H.264; P010: VP9 10-bit at 320×240 and
1080p).
- AU grouping in the Annex-B parser is the correct
semantic boundary, not just a workaround.
- vb2_plane_size for payload generalises to any plane
count, not hardcoded to 2.
Phase 8.9 next: libva-v4l2-request integration — close
the loop from YouTube/Firefox to /dev/video0 + daemon
playback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5965805d86 |
Phase 8.7: media controller + multi-frame streaming verification
Two pieces — both shipped:
1. Media controller binding closes the last v4l2-compliance
failure from 8.6 (DECODER_CMD, which requires has_media on
stateless decoders) and unlocks the V4L2 request API for
libva-v4l2-request.
2. Multi-frame streaming test exercises the daemon's
AVCodecContext state preservation across many REQ_DECODE
calls — Phase 8.6's tests pushed exactly one keyframe per
invocation; real content has P-frame references.
Compliance now reaches **49/49 passing.**
Kernel (kernel/daedalus_v4l2_main.{c,h}):
- Added `struct media_device mdev` to daedalus_dev.
- media_device_init(&mdev) BEFORE v4l2_device_register so
v4l2-core sees v4l2_dev.mdev = &mdev and binds the m2m
entities into the graph during register.
- After video_register_device:
v4l2_m2m_register_media_controller(..., MEDIA_ENT_F_PROC_VIDEO_DECODER)
then media_device_register so userspace sees the complete
graph in /dev/mediaN with the decoder entity tagged.
- daedalus_remove unwinds in reverse: unregister media,
unregister mc, unregister video, release m2m, unregister
v4l2, cleanup mdev.
- Error paths added for both new failure points.
Test harness (tools/test_m2m_stream.c, new):
- Multi-frame V4L2 m2m client: parses IVF → 4-deep buffer
rings on both queues → per-frame QBUF/DQBUF loop →
concatenates decoded NV12 to output file. Returns 0 only
if every input frame decoded without error.
- Same codec vocabulary as test_m2m_decode (vp9 | av1 |
h264 via 5th arg).
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
v4l2-compliance: 49 tests, 49 passed, 0 failed, 0 warnings.
$ v4l2-ctl --list-devices
daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
/dev/video0
/dev/media3
VP9 320×240 30 frames (1 keyframe + 29 P-frames, 3.46 MB
NV12): byte-for-byte match vs `ffmpeg -i in.ivf -pix_fmt
nv12 -f rawvideo`.
VP9 1920×1080 10 frames (31 MB NV12 through the dmabuf
path): byte-for-byte match vs same reference command.
Daemon log shows cookies 1..30 all completing cleanly in
order; lazily-opened AVCodecContext maintains reference
frames across the chardev round-trips.
Clean SIGTERM + rmmod, no oops/WARN.
Roadmap update (docs/roadmap.md):
- 8.7 marked closed with closure-doc reference.
- 8.8 reshaped: perf profiling, QPU dispatch substitution
via daedalus-fourier, multi-frame AV1/H.264, HDR (P010M).
Per correctness-before-speed:
- Order-correct media controller lifecycle (init → bind
v4l2_dev → register video → register mc → register
media; reverse for teardown).
- 4-deep buffer rings on both queues — the scheduler
actually pipelines multiple in-flight cookies through
the chardev (not just one-at-a-time as in 8.5/8.6 tests).
- Bit-exact comparison against ffmpeg, not "looks right."
- All resource paths cleaned on every error branch.
Phase 8.8 next: profile daemon hot loops, dlopen
daedalus-fourier from the daemon, swap FFmpeg per-block
calls for daedalus_dispatch_* where the kernel matches,
target 30fps@1080p from 30fps-floor-is-fine memory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c7f6fb90cb |
Phase 8.6: dmabuf + AV1 + H.264 + stateless controls
Removes the Phase 8.5 64 KiB frame-size cap by exporting CAPTURE
buffers as dmabuf-fds the daemon mmaps and writes pixels into
directly. Adds AV1 + H.264 codec support, V4L2 stateless control
registration, and the compliance polish that brings the driver
to 47/48 v4l2-compliance pass.
Protocol (include/daedalus_v4l2_proto.h):
- struct daedalus_req_decode grew capture-buffer metadata
(width/height/pix_fmt/num_planes + per-plane size+stride).
- New DAEDALUS_IOC_GET_DMABUF ioctl on the chardev: daemon
asks for a per-plane dmabuf fd, kernel calls vb2_core_expbuf
in daemon task context so the fd lands in the daemon's table.
Kernel m2m driver (kernel/daedalus_v4l2_main.c):
- Both queues switched to vb2_dma_contig_memops. OUTPUT was
vmalloc in 8.5; the switch is needed because vmalloc doesn't
honour V4L2_MEMORY_FLAG_NON_COHERENT and v4l2-compliance's
REQBUFS test rejected the driver because of it. We still
read bitstream via vb2_plane_vaddr (dma_contig gives a
kernel virtual address just like vmalloc did).
- dma_coerce_mask_and_coherent(DMA_BIT_MASK(32)) in probe.
- queue_setup populates alloc_devs[plane] = &pdev->dev for
both queues; allow_cache_hints=1 on both.
- daedalus_export_capture_dmabuf(cookie, plane, flags, *fd):
walks inflight list, calls vb2_core_expbuf on the CAPTURE
buffer in the caller's (daemon's) task context.
- device_run fills the new REQ_DECODE capture fields from
ctx->dst_fmt and maps ctx->src_fmt.pixelformat to
DAEDALUS_CODEC_VP9 / _AV1 / _H264 (was hard-wired to VP9).
- daedalus_complete_resp_frame handles both the 8.5 inline
path (kept for debugging) and the 8.6 dmabuf path (pixels
already in CAPTURE buffer, just set payload from metadata).
- enum_fmt advertises all 3 OUTPUT formats (VP9F, AV1F, S264).
- try_fmt preserves userspace colorspace fields instead of
overwriting with REC709 defaults (fixes 8.5 compliance fail).
- s_fmt propagates OUTPUT colorspace → CAPTURE (stateless
decoder round-trip test at v4l2-test-formats.cpp:958).
- 12 V4L2 stateless controls registered per open (VP9_FRAME,
VP9_COMPRESSED_HDR, H264_SPS/PPS/SCALING/PRED_WEIGHTS/
SLICE_PARAMS/DECODE_PARAMS, AV1_FRAME/SEQUENCE/
TILE_GROUP_ENTRY/FILM_GRAIN). Daemon ignores values (FFmpeg
re-parses); registration is what makes libva-v4l2-request
see us.
Kernel chardev (kernel/daedalus_v4l2_chardev.c):
- New unlocked_ioctl dispatching DAEDALUS_IOC_GET_DMABUF to
daedalus_export_capture_dmabuf.
- debugfs test_decode cookies unified with the m2m cookie
allocator via shared daedalus_next_cookie() — kills the
Phase 8.5 namespace collision.
Daemon (daemon/src/...):
- New dmabuf_capture.{c,h}: GET_DMABUF + mmap each plane on
REQ_DECODE; munmap + close on completion. O_RDWR | O_CLOEXEC
is essential — vb2_core_expbuf extracts O_ACCMODE from flags
and exports read-only by default (caught on first run; mmap
-EACCES on PROT_WRITE).
- decoder.{c,h}: lazily opens AV1 + H.264 AVCodecContexts in
addition to VP9 (dropped the -ENOSYS stubs). pack_nv12_to_planes
writes Y line-by-line into planes[0] with planes[0].stride;
interleaves Cb/Cr into planes[1] with planes[1].stride.
- chardev_client.c handle_req_decode: opens dmabuf planes,
runs decode (pixels land in CAPTURE buffer directly), closes
planes, sends metadata-only RESP_FRAME. No wire-pixel
allocation.
Test harness (tools/test_m2m_decode.c):
- Optional 5th arg `codec` (vp9 | av1 | h264). Same client
drives all three codecs.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
Bit-exact end-to-end vs `ffmpeg -pix_fmt nv12`:
VP9 1920x1080 3,110,400 bytes MATCH
AV1 128x96 18,432 bytes MATCH
H.264 128x96 18,432 bytes MATCH
VP9 1080p went through the full dmabuf path with no chardev
payload bloat — the same chardev that capped at 64 KiB in 8.5
now ferries metadata only and lets the daemon mmap+write a
3.1 MB frame directly into the V4L2 client's buffer.
v4l2-compliance:
Phase 8.1: 44/48
Phase 8.5: 44/48 (different fails after m2m landed)
Phase 8.6: 47/48
Only remaining: VIDIOC_(TRY_)DECODER_CMD (needs media
controller — explicitly Phase 8.7 work).
11 standard compound controls visible:
vp9_frame_decode_parameters, vp9_probabilities_updates,
h264_sequence_parameter_set, h264_picture_parameter_set,
h264_scaling_matrix, h264_prediction_weight_table,
h264_slice_parameters, h264_decode_parameters,
av1_sequence_parameters, av1_frame_parameters,
av1_film_grain (av1_tile_group_entry refused by hdl->error
on this kernel — skipped silently).
Clean SIGTERM + rmmod, no oops/WARN.
Roadmap update (docs/roadmap.md):
- Phase 8.6 marked closed with the closure-doc reference.
- Phase 8.7 reshaped to (1) media controller, (2) perf +
daedalus_dispatch_* substitution, (3) HDR/10-bit, (4)
long-form multi-frame streaming.
Per correctness-before-speed:
- Real V4L2 dmabuf via vb2_core_expbuf (not a sideband
fd-passing hack).
- O_RDWR access mode threaded through correctly.
- Strict pixel-byte comparison against ffmpeg, not "looks
right" eyeballing.
- Each compliance edge documented with the underlying test
source-line + the fix.
- All resource paths cleaned (munmap + close per plane on
every exit, including error paths).
Phase 8.7 next: media controller binding (closes last
compliance fail), per-frame profiling, QPU dispatch
substitution targeting 30fps@1080p from
30fps-floor-is-fine memory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6f4b580f7c |
Phase 8.5: full V4L2 m2m driver, VP9 decode via QBUF/DQBUF
Replaces the Phase 8.4 debugfs-triggered chardev path with a
real V4L2 m2m driver. Userspace clients now drive decoding the
standard way — S_FMT / REQBUFS / QBUF on the OUTPUT (bitstream)
queue, DQBUF on the CAPTURE (NV12M) queue. Kernel device_run
packs the bitstream into REQ_DECODE; daemon decodes via FFmpeg;
RESP_FRAME's inline NV12 pixel payload lands in the CAPTURE
buffer. Phase 8.6 swaps the inline payload for dmabuf so big
frames stop being capped at 64 KiB.
Kernel (daedalus_v4l2_main.c, rewritten + main.h added):
- Per-open struct daedalus_ctx: v4l2_fh, m2m_ctx, ctrl_handler,
per-queue v4l2_pix_format_mplane.
- Two vb2_queues (vb2_vmalloc_memops for both — no DMA needed
yet; 8.6 switches CAPTURE to dma_contig for dmabuf-export):
OUTPUT = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, VP9_FRAME
CAPTURE = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, NV12M
- Full v4l2_ioctl_ops table: querycap, enum_fmt, g/s/try_fmt
for both queues, reqbufs/querybuf/qbuf/dqbuf/create_bufs/
prepare_buf/expbuf/streamon/streamoff via v4l2_m2m_ioctl_*
helpers.
- v4l2_m2m_ops.device_run: peeks next OUTPUT buf, builds
REQ_DECODE inline with the bitstream bytes, enqueues with an
auto-incrementing cookie, stores {ctx, src_buf, dst_buf} in
a per-device inflight list. Job stays open until RESP_FRAME.
- daedalus_complete_resp_frame(): pops the inflight entry,
memcpys inline NV12 pixels into the CAPTURE buffer (Y plane
+ interleaved CbCr), finishes via
v4l2_m2m_buf_done_and_job_finish — NOT plain buf_done +
job_finish, which leaves the src buf on the m2m queue and
causes device_run to immediately re-run on the same input
(caught on first run; second REQ_DECODE for same bitstream +
eventual oops in stop_streaming on teardown).
Kernel (daedalus_v4l2_chardev.c):
- RESP_FRAME handler now hands inline pixel payload to
daedalus_complete_resp_frame so it lands in the CAPTURE
vb2 buffer. Existing PONG and debugfs test_decode paths still
work; the latter produces a harmless ratelimited "unknown
cookie" since it bypasses V4L2 m2m.
Daemon (decoder.c, decoder.h):
- daedalus_decoder_run_request signature extended with
(nv12_out, nv12_cap, nv12_used). After the FNV-1a digest the
decoder packs YUV420P into NV12 in the caller's buffer: Y
plane line-by-line stripped of stride padding; Cb/Cr
interleaved into a single chroma plane. Truncation silent —
kernel only memcpys what fits in the CAPTURE plane.
Daemon (chardev_client.c):
- handle_req_decode allocates a response buffer sized for the
full chardev payload, lets decoder fill the pixel area
after the resp_frame struct, sends the full payload via the
existing send_response.
Test client (tools/test_m2m_decode.c, new):
- Minimal V4L2 m2m client: S_FMT both queues, REQBUFS 1 each,
mmap+fill OUTPUT, QBUF both, STREAMON, poll, DQBUF, dump
CAPTURE planes to a raw NV12 file. ~250 LOC; verifies the
whole flow without needing v4l2-ctl framing.
Roadmap update (docs/roadmap.md):
- Phase 8.4 retitled "daemon ↔ kernel decode round-trip"
to reflect what actually shipped (vs. the original V4L2-
ioctl-driven plan which moved here).
- Phase 8.5 retitled "full V4L2 m2m driver" with closure
status.
- Phase 8.6 reshaped to two tracks: dmabuf + AV1/H.264/
stateless controls + media controller. Adds the punch list
of v4l2-compliance failures (DECODER_CMD, S_FMT colorspace)
that 8.6 will fix.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
Kernel + daemon build clean (-Wall -Wextra clean both sides).
Test harness drives one VP9 keyframe end-to-end:
OUTPUT REQBUFS -> 2
CAPTURE REQBUFS -> 2
QBUF OUTPUT[0] bytesused=1566
QBUF CAPTURE[0]; STREAMON both
poll revents=0x5
DQBUF OUTPUT[0] flags=0x4001 (DONE)
DQBUF CAPTURE[0] flags=0x4000 payloads=[12288, 6144]
wrote 12288 Y + 6144 UV bytes to /tmp/out_m2m.nv12
Pixel correctness vs reference:
ffmpeg -i vp9_small.ivf -pix_fmt nv12 -f rawvideo -y ref.nv12
cmp /tmp/out_m2m.nv12 /tmp/ref.nv12 → match ✓
Byte-for-byte identical to FFmpeg's stock CPU decode.
v4l2-compliance: detected as Stateless Decoder; most ioctls
pass; two expected fails documented in closure doc
(DECODER_CMD/media controller, S_FMT colorspace).
Clean teardown: SIGTERM the daemon, rmmod the module, no
oops/WARN in dmesg.
Per correctness-before-speed:
- Real V4L2 ioctl table (not stubs); uses v4l2-core helpers
where they exist instead of reinventing.
- v4l2_m2m_buf_done_and_job_finish (not the manual sequence)
to keep scheduler state consistent.
- Bit-exact reference comparison, not just "looks right."
- Documented every compliance failure with the planned fix.
- All resource paths (kmalloc/kfree, inflight list cleanup,
src/dst buf removal in stop_streaming) handled on every
error branch.
Phase 8.6 next: dmabuf-export for CAPTURE (removes 64 KiB
frame-size cap), add AV1+H.264 codecs, add V4L2 stateless
controls + media controller binding, fix the colorspace +
cookie-namespace compliance issues.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2a449632b9 |
Phase 8.4: daemon ↔ kernel decode round-trip (VP9 end-to-end)
Wires the Phase 8.3 FFmpeg loader through the Phase 8.2 chardev
bridge: kernel injects REQ_DECODE carrying a raw VP9 access unit,
daemon hands the bitstream to libavcodec via dlopen, sends
RESP_FRAME back with a content-dependent FNV-1a digest of the
decoded YUV planes. Pure CPU decode for now — Phase 8.5 swaps in
dmabuf + QPU dispatch.
Protocol (include/daedalus_v4l2_proto.h):
- New REQ_DECODE (kernel→daemon) and RESP_FRAME (daemon→kernel)
message types, with fixed-size payload structs.
- New DAEDALUS_CODEC_VP9/AV1/H264 enum (wire-stable so 8.6's
AV1+H.264 work doesn't move existing values).
- New DAEDALUS_DECODE_* status enum (OK / NO_FRAME / ERR_OPEN /
ERR_SEND / ERR_RECV / ERR_CODEC).
- Converted the prior `enum daedalus_msg_type` to #defines —
high-bit values exceed INT_MAX and tripped -Wpedantic on
userspace; kernel uABI headers use the same idiom.
Kernel (kernel/daedalus_v4l2_chardev.c):
- New debugfs entry /sys/kernel/debug/daedalus_v4l2/test_decode:
writing raw bitstream bytes wraps them in a REQ_DECODE
(codec=VP9 for Phase 8.4) and enqueues with an
auto-incrementing cookie.
- daedalus_chardev_write learned RESP_FRAME: parses the payload
and emits a single pr_info line with decode metadata. Keeps
existing PONG handling on the default arm.
Daemon (daemon/src/...):
- chardev_client.{c,h} — opens /dev/daedalus-v4l2, blocking read
loop, single-buffer write() responses (kernel chardev has only
.write, not .write_iter, so writev lands as -EINVAL —
discovered the hard way during first run).
- decoder.{c,h} — lazily-opened AVCodecContext per codec, shared
AVPacket/AVFrame pair, descriptor-driven plane walker
(av_pix_fmt_desc_get) so the same hash path covers YUV420P,
YUV422P, YUV444P, GBRP and other 8-bit planar layouts.
Generalised after first run decoded testsrc as GBRP (71)
rather than the assumed YUV420P.
- `daemon` command in main.c opens the chardev and runs the loop
until SIGINT/SIGTERM. Cookie correlation handled end-to-end.
- ffmpeg_loader gained av_pix_fmt_desc_get (23 symbols total).
Build:
- CMakeLists adds chardev_client.c + decoder.c; explicit
-I../include for the shared protocol header.
- Still -Wall -Wextra -Wpedantic clean.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
$ ffmpeg ... -pix_fmt yuv420p -c:v libvpx-vp9 -frames:v 1 \
-y /tmp/vp9_test.ivf
$ python3 ... strip IVF framing → vp9_keyframe.bin (3268 B)
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &
$ sudo dd if=vp9_keyframe.bin \
of=/sys/kernel/debug/daedalus_v4l2/test_decode
daemon: REQ_DECODE cookie=2 → decoded yuv420p 320x240
fnv1a=0x6ef10d71 luma=76800 chroma=38400
kernel: RESP_FRAME cookie=2 status=0 320x240 pixfmt=0
fnv1a=0x6ef10d71 ← matches daemon ✓
Hash properties verified:
cookie=2 testsrc 3268 B → 0x6ef10d71 (first decode)
cookie=3 red 44 B → 0x7f6e5dc5 (content-dependent ✓)
cookie=4 testsrc 3268 B → 0x6ef10d71 (deterministic ✓)
cookie=5 64 B random → status=101 (ERR_SEND, daemon alive)
Daemon survives bad input (FFmpeg "Invalid sync code" wrapped
into structured ERR_SEND response). Clean SIGTERM shutdown,
clean rmmod.
Phase 8.4 acceptance criteria met:
- ✓ end-to-end kernel→daemon→FFmpeg→kernel round-trip
- ✓ cookie correlation per request/response pair
- ✓ content-dependent + deterministic digest
- ✓ structured error responses (no daemon crash on bad input)
- ✓ clean teardown (SIGTERM + rmmod)
- ✓ builds clean on both kernel kbuild and daemon CMake
Per correctness-before-speed:
- Real chardev I/O (no shortcuts, no select-loop hacks)
- Real FFmpeg AVCodecContext lifecycle (lazily opened, properly
freed on cleanup)
- Descriptor-driven plane walk (generalises across pix_fmts)
- Structured error path (not just log-and-continue)
- All resource paths cleaned up on every error branch
- Documented why FNV-1a digest, why write() not writev(), why
pix_desc walk in docs/phase_8_4_closure.md
Phase 8.5 next: V4L2 m2m queue submits REQ_DECODE from
vidioc_qbuf; dmabuf carries actual pixel data so the chardev's
64 KiB cap doesn't gate frame size; begin substituting
daedalus_dispatch_* into the daemon's decode path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
873a04c622 |
Phase 8.3: userspace daemon scaffold + FFmpeg dlopen + parse path
Builds the daemon executable per the locked Phase 8 architecture
(Option γ: dlopen FFmpeg at runtime). Phase 8.3 scope: parse
path validation only — no V4L2 wiring, no decode, no chardev
connection.
Components:
- daemon/CMakeLists.txt — CMake with -Wall -Wextra -Wpedantic
clean. pkg-config for FFmpeg headers; only -ldl + -lpthread
at link time.
- daemon/src/main.c — entry point, signal handlers
(SIGINT/SIGTERM), command dispatcher. Currently `parse <file>`.
- daemon/src/ffmpeg_loader.{c,h} — runtime FFmpeg loader.
dlopens libavformat.so.61, libavcodec.so.61, libavutil.so.59.
Resolves 22 function pointers using POSIX-recommended
*(void**)& dlsym idiom (per POSIX.1-2017 dlsym(3p) Rationale).
- daemon/src/parser.{c,h} — demux loop via avformat_open_input +
av_read_frame. Per-frame logging on -v.
- daemon/src/log.{c,h} — logging facade (stderr Phase 8.3;
syslog/journal planned for 8.5+).
Verification on hertz:
$ ffmpeg -f lavfi -i testsrc=duration=2:size=320x240:rate=30 \
-c:v libvpx-vp9 -y /tmp/testsrc.ivf
$ daedalus_v4l2_daemon parse /tmp/testsrc.ivf
[INFO] FFmpeg loaded: 7.1.3-0+deb13u1+rpt1 (libavformat 61.7.100)
[INFO] video stream #0: codec=vp9 (Google VP9) 320x240, 0/0 fps
[INFO] parse complete: 60 frames (1 key) total 17859 bytes
Error paths verified:
- Missing file → "avformat_open_input(...): code -2", exit 1
- No command → usage message, exit 2
- Bad command → usage message, exit 2
Per correctness-before-speed:
- Real CMake (no Makefile hacks)
- pkg-config for headers
- POSIX-conformant dlsym pattern (no -Wpedantic suppression)
- Real signal handling + proper exit codes
- Real logging with timestamp + level
- Headers included at compile-time for type safety; dlopen
decouples runtime
- All FFmpeg resources freed on every exit path
- Builds clean on -Wall -Wextra -Wpedantic
Phase 8.3 acceptance criteria met:
- ✓ daemon binary builds
- ✓ dlopen FFmpeg at runtime
- ✓ demux a VP9 IVF file end-to-end
- ✓ per-frame metadata logged correctly
- ✓ frame count + keyframe count + byte total accurate
Phase 8.4 next: wire daemon to /dev/daedalus-v4l2 chardev,
add REQ_DECODE / RESP_FRAME handling, drive VP9 decode
end-to-end via daedalus_dispatch_* from daedalus-fourier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
895f57c63a |
Phase 8.2: kernel ↔ daemon chardev bridge with round-trip test
Adds /dev/daedalus-v4l2 misc chardev to the kernel module. The
chardev is the IPC channel for the future userspace decoder
daemon: kernel enqueues REQ_* messages, daemon read()s them,
processes, write()s RESP_* back.
Wire protocol (pre-1.0, header in include/daedalus_v4l2_proto.h):
- struct daedalus_msg_hdr: magic (D04V) + version + type +
cookie + payload_len + reserved
- Request/response separated by high bit of type field
- Max 64 KiB payload per message
- Cookie correlates request with matching response
Kernel implementation (kernel/daedalus_v4l2_chardev.{c,h}):
- Single-instance chardev (-EBUSY on second open)
- In-kernel FIFO bounded at 64 messages
- Blocking + non-blocking read; poll() with EPOLLIN on queued
- write() parses + validates header, logs response at pr_debug
- Bad magic → -EBADMSG, bad version → -EPROTO, oversize → -EMSGSIZE
- All error paths free resources
Phase 8.2 test trigger via debugfs:
- /sys/kernel/debug/daedalus_v4l2/test_ping — any byte
enqueues a PING with a fixed 24-byte payload. Removed in
Phase 8.4 when real REQ_DECODE from V4L2 path takes over.
Userspace verification tool (tools/test_chardev_pingpong.c):
- Real C program, proper error reporting via strerror
- Validates the 6-step round-trip: open → empty-queue EAGAIN →
trigger ping → read PING → verify all fields → write PONG → close
- Builds with -Wall -Wextra clean
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
$ sudo insmod daedalus_v4l2.ko
$ sudo tools/test_chardev_pingpong
opening /dev/daedalus-v4l2...
non-blocking read on empty queue: EAGAIN ✓
injected PING via debugfs ✓
read PING: magic ✓ version ✓ type=PING ✓ cookie=0x1234 ✓ payload=24 bytes
payload: "DAEDALUS-V4L2-PING-PL"
wrote PONG (cookie=0x1234) ✓
ALL TESTS PASSED.
$ sudo rmmod daedalus_v4l2 # clean
Per correctness-before-speed: full kerneldoc on structs, 8-tab
kernel style, SPDX headers, proper error paths, real test
program (not "I ran it once"), failure-mode coverage documented.
Phase 8.3 next: userspace daemon with dlopen'd FFmpeg parse path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9415b7e0f7 |
Phase 8.1: kernel V4L2 device skeleton (out-of-tree module)
Out-of-tree Linux kernel module registering /dev/videoNN. Phase 8.1 scope: skeleton only — VIDIOC_QUERYCAP works, no codec ioctls / no vb2_queue / no controls yet. Real V4L2 plumbing throughout per "correctness before speed": platform_device + v4l2_device + video_device, properly nested with error paths and devm_kzalloc-managed lifetime. Per-cycle 9 discipline ports to kernel code: SPDX header, kernel coding style (8-tab, static-by-default), kerneldoc on structs, no shortcuts. Files (~250 LOC total): - kernel/Makefile — out-of-tree kbuild with checkpatch target - kernel/daedalus_v4l2_main.c — module init/exit + probe/remove Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712): - Builds clean with -Wall -Wextra. No warnings. - modprobe / rmmod round-trip clean. No dmesg taints beyond the expected "out-of-tree taint" line. - v4l2-ctl --list-devices shows: "daedalus-fourier V3D7+NEON (platform:daedalus_v4l2): /dev/video0" - VIDIOC_QUERYCAP returns driver/card/bus/caps as specified. - v4l2-compliance: 44/48 passing. The 4 failures are exactly the format/buffer ioctls Phase 8.2 will implement (ENUM_FMT, G_FMT, Scaling, REQBUFS) — not skeleton bugs, legitimately-absent features. Documentation: docs/phase_8_1_closure.md captures full verification output + Phase 8.2 plan. Phase 8.1 acceptance criteria met: - ✓ /dev/videoNN appears via v4l2-ctl --list-devices - ✓ VIDIOC_QUERYCAP responds with sensible values - ✓ rmmod is clean (no kref leaks) - ✓ v4l2-compliance passes except for explicit Phase 8.2 work Next: Phase 8.2 chardev bridge for kernel ↔ daemon IPC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
89f56e4b49 |
README: add sibling link back to daedalus-fourier
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c7d8050cc9 |
Initial scaffold: daedalus-v4l2 sibling repo
V4L2 stateless decoder for Pi 5, backed by sibling
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).
Architecture locked 2026-05-18 by mfritsche per
daedalus-fourier/docs/phase8_scoping.md:
- Option B: Linux kernel V4L2 shim + userspace daemon (not
v4l2loopback). Real /dev/videoNN; proper DRM PRIME for
browser zero-copy.
- Option γ: dlopen FFmpeg at runtime as parser. No vendoring;
fastest to v1.
- Sibling repo (this repo): V4L2-side work outside of
daedalus-fourier so kernel-library API stays clean.
Components:
kernel/ - Linux out-of-tree kernel module (GPLv2; V4L2
device + chardev bridge to userspace daemon)
daemon/ - userspace decoder daemon (BSD-2-Clause; links
libdaedalus_core.a from sibling; dlopens FFmpeg)
docs/ - architecture + 7-phase roadmap (8.1..8.7)
include/ - shared headers between kernel and daemon
Roadmap (7 sub-phases, ~1 week each):
8.1 kernel skeleton (/dev/videoNN with no-op ioctls)
8.2 chardev bridge (kernel ↔ daemon ping-pong)
8.3 daemon FFmpeg dlopen + parse path
8.4 VP9 end-to-end via daedalus_dispatch_*
8.5 dmabuf / DRM PRIME for zero-copy
8.6 AV1 + H.264 codec support
8.7 performance: hit 30fps@1080p (project floor)
No code yet — only README + design docs + directory structure.
First implementation work starts in Phase 8.1 next session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|