daedalus-v4l2/kernel/daedalus_v4l2_chardev.c
marfrit 6f4b580f7c Phase 8.5: full V4L2 m2m driver, VP9 decode via QBUF/DQBUF
Replaces the Phase 8.4 debugfs-triggered chardev path with a
real V4L2 m2m driver. Userspace clients now drive decoding the
standard way — S_FMT / REQBUFS / QBUF on the OUTPUT (bitstream)
queue, DQBUF on the CAPTURE (NV12M) queue. Kernel device_run
packs the bitstream into REQ_DECODE; daemon decodes via FFmpeg;
RESP_FRAME's inline NV12 pixel payload lands in the CAPTURE
buffer. Phase 8.6 swaps the inline payload for dmabuf so big
frames stop being capped at 64 KiB.
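
To make the 64 KiB cap concrete, here is a minimal sketch of the NV12 size arithmetic. The helper names and `INLINE_CAP_BYTES` are illustrative and not taken from the driver; the real budget also has to leave room for the resp_frame struct, which this sketch ignores. The 128x96 resolution used in the usage note is an assumption that happens to match the 12288 + 6144 byte planes in the verification log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative budget: the commit message quotes a 64 KiB cap. */
#define INLINE_CAP_BYTES (64u * 1024u)

/* NV12 luma plane: one byte per pixel. */
static size_t nv12_luma_bytes(uint32_t width, uint32_t height)
{
	return (size_t)width * height;
}

/*
 * NV12 chroma plane: Cb and Cr interleaved, each subsampled by two
 * in both dimensions (rounded up for odd sizes).
 */
static size_t nv12_chroma_bytes(uint32_t width, uint32_t height)
{
	size_t cw = ((size_t)width + 1) / 2;
	size_t ch = ((size_t)height + 1) / 2;

	return cw * ch * 2;
}

/* Non-zero when a whole frame fits in the assumed inline budget. */
static int nv12_fits_inline(uint32_t width, uint32_t height)
{
	assert(width > 0 && height > 0);
	return nv12_luma_bytes(width, height) +
	       nv12_chroma_bytes(width, height) <= INLINE_CAP_BYTES;
}
```

Under these assumptions a 128x96 frame needs 18432 bytes and fits, while a 320x240 frame needs 115200 bytes and does not.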

Kernel (daedalus_v4l2_main.c, rewritten + main.h added):
- Per-open struct daedalus_ctx: v4l2_fh, m2m_ctx, ctrl_handler,
  per-queue v4l2_pix_format_mplane.
- Two vb2_queues (vb2_vmalloc_memops for both — no DMA needed
  yet; 8.6 switches CAPTURE to dma_contig for dmabuf-export):
    OUTPUT  = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,  VP9_FRAME
    CAPTURE = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, NV12M
- Full v4l2_ioctl_ops table: querycap, enum_fmt, g/s/try_fmt
  for both queues, reqbufs/querybuf/qbuf/dqbuf/create_bufs/
  prepare_buf/expbuf/streamon/streamoff via v4l2_m2m_ioctl_*
  helpers.
- v4l2_m2m_ops.device_run: peeks next OUTPUT buf, builds
  REQ_DECODE inline with the bitstream bytes, enqueues with an
  auto-incrementing cookie, stores {ctx, src_buf, dst_buf} in
  a per-device inflight list. Job stays open until RESP_FRAME.
- daedalus_complete_resp_frame(): pops the inflight entry,
  memcpys inline NV12 pixels into the CAPTURE buffer (Y plane
  + interleaved CbCr), finishes via
  v4l2_m2m_buf_done_and_job_finish — NOT plain buf_done +
  job_finish, which leaves the src buf on the m2m queue and
  causes device_run to immediately re-run on the same input
  (caught on first run; second REQ_DECODE for same bitstream +
  eventual oops in stop_streaming on teardown).
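
The cookie-keyed inflight bookkeeping described above can be sketched in plain userspace C. This is not the driver's code: the struct and function names are hypothetical, the buffers are opaque pointers, and locking is left out. It only shows the add-on-submit and pop-on-response pattern, including the "unknown cookie" case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical inflight entry: one per submitted decode job. */
struct inflight {
	struct inflight *next;
	uint32_t cookie;
	void *src_buf;	/* stand-in for the OUTPUT buffer */
	void *dst_buf;	/* stand-in for the CAPTURE buffer */
};

struct inflight_list {
	struct inflight *head;
	uint32_t next_cookie;
};

/* Record a job; returns its cookie, or 0 on allocation failure. */
static uint32_t inflight_add(struct inflight_list *l, void *src, void *dst)
{
	struct inflight *e;

	assert(l);
	e = malloc(sizeof(*e));
	if (!e)
		return 0;
	/* Skip 0 on wrap-around so it can serve as the failure value. */
	if (++l->next_cookie == 0)
		++l->next_cookie;
	e->cookie = l->next_cookie;
	e->src_buf = src;
	e->dst_buf = dst;
	e->next = l->head;
	l->head = e;
	return e->cookie;
}

/* Unlink and return the entry for @cookie, or NULL if it is unknown. */
static struct inflight *inflight_pop(struct inflight_list *l, uint32_t cookie)
{
	struct inflight **pp;

	assert(l);
	for (pp = &l->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->cookie == cookie) {
			struct inflight *e = *pp;

			*pp = e->next;
			e->next = NULL;
			return e;
		}
	}
	return NULL;
}
```

A response whose cookie was never added (for example one triggered outside the m2m path) simply gets NULL back from `inflight_pop`, which is where a driver would log and drop it.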

Kernel (daedalus_v4l2_chardev.c):
- RESP_FRAME handler now hands inline pixel payload to
  daedalus_complete_resp_frame so it lands in the CAPTURE
  vb2 buffer. Existing PONG and debugfs test_decode paths still
  work; the latter produces a harmless ratelimited "unknown
  cookie" since it bypasses V4L2 m2m.

Daemon (decoder.c, decoder.h):
- daedalus_decoder_run_request signature extended with
  (nv12_out, nv12_cap, nv12_used). After the FNV-1a digest the
  decoder packs YUV420P into NV12 in the caller's buffer: Y
  plane line-by-line stripped of stride padding; Cb/Cr
  interleaved into a single chroma plane. Truncation is silent:
  the kernel only memcpys what fits in the CAPTURE plane.
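
The packing step can be sketched as follows. This is an illustration of the technique, not the code in decoder.c: the function name and parameter list are made up, and stopping at whole-row granularity when the buffer runs out is an assumption about how truncation might behave.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack planar YUV 4:2:0 into NV12 in @out (capacity @cap bytes).
 * Returns the number of bytes written; stops early, without an
 * error, once the next row would not fit.
 */
static size_t pack_yuv420p_to_nv12(const uint8_t *y, size_t y_stride,
				   const uint8_t *cb, size_t cb_stride,
				   const uint8_t *cr, size_t cr_stride,
				   size_t width, size_t height,
				   uint8_t *out, size_t cap)
{
	size_t cw = (width + 1) / 2;
	size_t ch = (height + 1) / 2;
	size_t used = 0, row, col;

	assert(y && cb && cr && out);

	/* Y plane: copy row by row, dropping the stride padding. */
	for (row = 0; row < height; row++) {
		if (cap - used < width)
			return used;
		memcpy(out + used, y + row * y_stride, width);
		used += width;
	}

	/* Chroma: interleave Cb and Cr samples into a single plane. */
	for (row = 0; row < ch; row++) {
		if (cap - used < 2 * cw)
			return used;
		for (col = 0; col < cw; col++) {
			out[used++] = cb[row * cb_stride + col];
			out[used++] = cr[row * cr_stride + col];
		}
	}
	return used;
}
```

The return value plays the role of nv12_used: the caller can tell from it how many bytes of the output buffer are valid.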

Daemon (chardev_client.c):
- handle_req_decode allocates a response buffer sized for the
  full chardev payload, lets decoder fill the pixel area
  after the resp_frame struct, sends the full payload via the
  existing send_response.

Test client (tools/test_m2m_decode.c, new):
- Minimal V4L2 m2m client: S_FMT both queues, REQBUFS 1 each,
  mmap+fill OUTPUT, QBUF both, STREAMON, poll, DQBUF, dump
  CAPTURE planes to a raw NV12 file. ~250 LOC; verifies the
  whole flow without needing v4l2-ctl framing.

Roadmap update (docs/roadmap.md):
- Phase 8.4 retitled "daemon ↔ kernel decode round-trip"
  to reflect what actually shipped (vs. the original V4L2-
  ioctl-driven plan which moved here).
- Phase 8.5 retitled "full V4L2 m2m driver" with closure
  status.
- Phase 8.6 reshaped to two tracks: dmabuf + AV1/H.264/
  stateless controls + media controller. Adds the punch list
  of v4l2-compliance failures (DECODER_CMD, S_FMT colorspace)
  that 8.6 will fix.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  Kernel + daemon build clean (-Wall -Wextra clean both sides).
  Test harness drives one VP9 keyframe end-to-end:
    OUTPUT REQBUFS -> 2
    CAPTURE REQBUFS -> 2
    QBUF OUTPUT[0] bytesused=1566
    QBUF CAPTURE[0]; STREAMON both
    poll revents=0x5
    DQBUF OUTPUT[0] flags=0x4001 (DONE)
    DQBUF CAPTURE[0] flags=0x4000 payloads=[12288, 6144]
    wrote 12288 Y + 6144 UV bytes to /tmp/out_m2m.nv12

  Pixel correctness vs reference:
    ffmpeg -i vp9_small.ivf -pix_fmt nv12 -f rawvideo -y ref.nv12
    cmp /tmp/out_m2m.nv12 /tmp/ref.nv12 → match ✓
  Byte-for-byte identical to FFmpeg's stock CPU decode.
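
  When cmp reports a difference, the offset of the first mismatch says which plane is affected (for the frame above, offsets below 12288 are luma). A minimal sketch of such a check, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the first differing byte between @a and @b,
 * or -1 when the first @len bytes are identical.
 */
static long long first_mismatch(const uint8_t *a, const uint8_t *b,
				size_t len)
{
	size_t i;

	assert(a && b);
	for (i = 0; i < len; i++) {
		if (a[i] != b[i])
			return (long long)i;
	}
	return -1;
}
```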

  v4l2-compliance: detected as Stateless Decoder; most ioctls
  pass; two expected fails documented in closure doc
  (DECODER_CMD/media controller, S_FMT colorspace).

  Clean teardown: SIGTERM the daemon, rmmod the module, no
  oops/WARN in dmesg.

Per correctness-before-speed:
- Real V4L2 ioctl table (not stubs); uses v4l2-core helpers
  where they exist instead of reinventing.
- v4l2_m2m_buf_done_and_job_finish (not the manual sequence)
  to keep scheduler state consistent.
- Bit-exact reference comparison, not just "looks right."
- Documented every compliance failure with the planned fix.
- All resource paths (kmalloc/kfree, inflight list cleanup,
  src/dst buf removal in stop_streaming) handled on every
  error branch.

Phase 8.6 next: dmabuf-export for CAPTURE (removes 64 KiB
frame-size cap), add AV1+H.264 codecs, add V4L2 stateless
controls + media controller binding, fix the colorspace +
cookie-namespace compliance issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:55:10 +00:00

492 lines
13 KiB
C

// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * daedalus-v4l2: kernel ↔ daemon chardev bridge.
 *
 * Exposes /dev/daedalus-v4l2 (a misc-class character device)
 * for the userspace daemon to attach to. Single-instance:
 * only one open file at a time. Blocking read() pulls the next
 * request from a kernel-side FIFO; write() submits a response.
 *
 * History: Phase 8.2 added PING handling, where the daemon
 * answers a PING that arrives via read() with a PONG and the
 * kernel injects test PINGs itself via a debugfs trigger.
 * Phase 8.4 added a debugfs-triggered REQ_DECODE round-trip.
 * Since Phase 8.5 the V4L2 m2m path enqueues the real
 * REQ_DECODE requests, and RESP_FRAME completes the CAPTURE
 * buffer via daedalus_complete_resp_frame().
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
#include <linux/wait.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/poll.h>
#include <linux/debugfs.h>
#include "daedalus_v4l2_proto.h"
#include "daedalus_v4l2_chardev.h"
#include "daedalus_v4l2_main.h"

#define DAEDALUS_CHARDEV_NAME "daedalus-v4l2"

/* Cap the number of pending requests so a stuck daemon can't OOM us. */
#define DAEDALUS_QUEUE_MAX 64

/**
 * struct daedalus_chardev_msg - in-kernel queued message
 * @list: queue linkage
 * @hdr: wire header
 * @payload: payload bytes; size = hdr.payload_len
 */
struct daedalus_chardev_msg {
	struct list_head list;
	struct daedalus_msg_hdr hdr;
	u8 *payload;
};

/**
 * struct daedalus_chardev - per-singleton chardev state
 * @misc: misc-class device registration
 * @open_lock: serialises open()/release()
 * @opened: non-zero when the chardev is currently open
 * @req_lock: protects @req_queue / @req_count
 * @req_queue: list of pending REQ_* messages waiting for daemon read()
 * @req_count: current number of queued requests
 * @req_wait: read() blocks here until a request arrives
 * @debugfs_dir: debugfs directory holding the test_ping / test_decode
 *               triggers
 */
struct daedalus_chardev {
	struct miscdevice misc;
	struct mutex open_lock;
	int opened;
	struct mutex req_lock;
	struct list_head req_queue;
	int req_count;
	wait_queue_head_t req_wait;
	struct dentry *debugfs_dir;
};

static struct daedalus_chardev *g_chardev;

/* -- internal helpers ------------------------------------------------ */

static struct daedalus_chardev_msg *
daedalus_chardev_dequeue_locked(struct daedalus_chardev *dev)
{
	struct daedalus_chardev_msg *msg;

	if (list_empty(&dev->req_queue))
		return NULL;
	msg = list_first_entry(&dev->req_queue,
			       struct daedalus_chardev_msg, list);
	list_del(&msg->list);
	dev->req_count--;
	return msg;
}

static void daedalus_chardev_msg_free(struct daedalus_chardev_msg *msg)
{
	if (!msg)
		return;
	kfree(msg->payload);
	kfree(msg);
}

int daedalus_chardev_enqueue_req(u32 type, u32 cookie,
				 const void *payload, size_t payload_len)
{
	struct daedalus_chardev *dev = g_chardev;
	struct daedalus_chardev_msg *msg;

	if (!dev)
		return -ENODEV;
	if (payload_len > DAEDALUS_PROTO_MAX_PAYLOAD)
		return -EMSGSIZE;
	if (type & 0x80000000u) /* responses don't get queued here */
		return -EINVAL;

	msg = kzalloc(sizeof(*msg), GFP_KERNEL);
	if (!msg)
		return -ENOMEM;
	if (payload_len) {
		msg->payload = kmemdup(payload, payload_len, GFP_KERNEL);
		if (!msg->payload) {
			kfree(msg);
			return -ENOMEM;
		}
	}

	msg->hdr.magic = DAEDALUS_PROTO_MAGIC;
	msg->hdr.version = DAEDALUS_PROTO_VERSION;
	msg->hdr.type = type;
	msg->hdr.cookie = cookie;
	msg->hdr.payload_len = (u32) payload_len;
	msg->hdr.reserved = 0;

	mutex_lock(&dev->req_lock);
	if (dev->req_count >= DAEDALUS_QUEUE_MAX) {
		mutex_unlock(&dev->req_lock);
		daedalus_chardev_msg_free(msg);
		return -ENOSPC;
	}
	list_add_tail(&msg->list, &dev->req_queue);
	dev->req_count++;
	mutex_unlock(&dev->req_lock);

	wake_up_interruptible(&dev->req_wait);
	return 0;
}

/* -- file operations ------------------------------------------------- */

static int daedalus_chardev_open(struct inode *inode, struct file *file)
{
	struct daedalus_chardev *dev = g_chardev;

	mutex_lock(&dev->open_lock);
	if (dev->opened) {
		mutex_unlock(&dev->open_lock);
		return -EBUSY;
	}
	dev->opened = 1;
	mutex_unlock(&dev->open_lock);

	file->private_data = dev;
	return 0;
}

static int daedalus_chardev_release(struct inode *inode, struct file *file)
{
	struct daedalus_chardev *dev = file->private_data;
	struct daedalus_chardev_msg *msg;

	mutex_lock(&dev->req_lock);
	while ((msg = daedalus_chardev_dequeue_locked(dev)) != NULL) {
		mutex_unlock(&dev->req_lock);
		daedalus_chardev_msg_free(msg);
		mutex_lock(&dev->req_lock);
	}
	mutex_unlock(&dev->req_lock);

	mutex_lock(&dev->open_lock);
	dev->opened = 0;
	mutex_unlock(&dev->open_lock);
	return 0;
}

static ssize_t daedalus_chardev_read(struct file *file, char __user *buf,
				     size_t count, loff_t *ppos)
{
	struct daedalus_chardev *dev = file->private_data;
	struct daedalus_chardev_msg *msg;
	size_t total;
	int ret;

	if (count < sizeof(struct daedalus_msg_hdr))
		return -EINVAL;

	for (;;) {
		mutex_lock(&dev->req_lock);
		msg = daedalus_chardev_dequeue_locked(dev);
		mutex_unlock(&dev->req_lock);
		if (msg)
			break;
		if (file->f_flags & O_NONBLOCK)
			return -EAGAIN;
		ret = wait_event_interruptible(dev->req_wait,
					       dev->req_count > 0);
		if (ret)
			return ret;
	}

	total = sizeof(msg->hdr) + msg->hdr.payload_len;
	if (count < total) {
		/*
		 * Requeue so the caller can retry with a bigger buffer.
		 * Re-enqueue at HEAD to preserve FIFO order.
		 */
		mutex_lock(&dev->req_lock);
		list_add(&msg->list, &dev->req_queue);
		dev->req_count++;
		mutex_unlock(&dev->req_lock);
		return -EMSGSIZE;
	}

	if (copy_to_user(buf, &msg->hdr, sizeof(msg->hdr))) {
		daedalus_chardev_msg_free(msg);
		return -EFAULT;
	}
	if (msg->hdr.payload_len &&
	    copy_to_user(buf + sizeof(msg->hdr), msg->payload,
			 msg->hdr.payload_len)) {
		daedalus_chardev_msg_free(msg);
		return -EFAULT;
	}

	daedalus_chardev_msg_free(msg);
	return total;
}

static ssize_t daedalus_chardev_write(struct file *file,
				      const char __user *buf,
				      size_t count, loff_t *ppos)
{
	struct daedalus_msg_hdr hdr;
	u8 *payload = NULL;
	size_t expected;

	if (count < sizeof(hdr))
		return -EINVAL;
	if (copy_from_user(&hdr, buf, sizeof(hdr)))
		return -EFAULT;
	if (hdr.magic != DAEDALUS_PROTO_MAGIC)
		return -EBADMSG;
	if (hdr.version != DAEDALUS_PROTO_VERSION)
		return -EPROTO;
	if (hdr.payload_len > DAEDALUS_PROTO_MAX_PAYLOAD)
		return -EMSGSIZE;

	expected = sizeof(hdr) + hdr.payload_len;
	if (count < expected)
		return -EINVAL;

	if (hdr.payload_len) {
		payload = kmalloc(hdr.payload_len, GFP_KERNEL);
		if (!payload)
			return -ENOMEM;
		if (copy_from_user(payload, buf + sizeof(hdr),
				   hdr.payload_len)) {
			kfree(payload);
			return -EFAULT;
		}
	}

	/*
	 * Response dispatch. RESP_FRAME is logged at debug level
	 * and handed to the V4L2 m2m completion path; every other
	 * response type (PONG included) is only logged at debug
	 * level.
	 */
	switch (hdr.type) {
	case DAEDALUS_MSG_RESP_FRAME: {
		struct daedalus_resp_frame fr;
		const u8 *pixels = NULL;
		size_t pixels_len = 0;

		if (hdr.payload_len < sizeof(fr)) {
			pr_warn("daedalus_v4l2: RESP_FRAME payload too short (%u < %zu)\n",
				hdr.payload_len, sizeof(fr));
			kfree(payload);
			return -EBADMSG;
		}
		memcpy(&fr, payload, sizeof(fr));
		if (hdr.payload_len > sizeof(fr)) {
			pixels = payload + sizeof(fr);
			pixels_len = hdr.payload_len - sizeof(fr);
		}
		pr_debug("daedalus_v4l2: RESP_FRAME cookie=%u status=%u codec=%u %ux%u pixfmt=%d luma=%u chroma=%u fnv1a=0x%08x inline_pixels=%zu\n",
			 hdr.cookie, fr.status, fr.codec_id,
			 fr.width, fr.height, fr.pix_fmt,
			 fr.luma_len, fr.chroma_len, fr.fnv1a_yuv,
			 pixels_len);
		/*
		 * Hand off to the V4L2 m2m completion path. If no
		 * V4L2 device is registered yet (e.g. debugfs-only
		 * test_decode used and no V4L2 m2m_ctx exists),
		 * daedalus_complete_resp_frame returns silently after
		 * a ratelimited warn.
		 */
		daedalus_complete_resp_frame(hdr.cookie, &fr, pixels,
					     pixels_len);
		break;
	}
	default:
		pr_debug("daedalus_v4l2: chardev got response type=0x%08x cookie=%u plen=%u\n",
			 hdr.type, hdr.cookie, hdr.payload_len);
		break;
	}

	kfree(payload);
	return expected;
}

static __poll_t daedalus_chardev_poll(struct file *file,
				      struct poll_table_struct *wait)
{
	struct daedalus_chardev *dev = file->private_data;
	__poll_t mask = EPOLLOUT | EPOLLWRNORM;

	poll_wait(file, &dev->req_wait, wait);
	if (READ_ONCE(dev->req_count) > 0)
		mask |= EPOLLIN | EPOLLRDNORM;
	return mask;
}

/*
 * .llseek intentionally unset. The chardev is a streaming
 * request/response channel; no positional semantics. Recent
 * kernels removed `no_llseek`; leaving the slot NULL gets the
 * generic "no-op or -ESPIPE" behaviour the v6.12+ vfs picks.
 */
static const struct file_operations daedalus_chardev_fops = {
	.owner = THIS_MODULE,
	.open = daedalus_chardev_open,
	.release = daedalus_chardev_release,
	.read = daedalus_chardev_read,
	.write = daedalus_chardev_write,
	.poll = daedalus_chardev_poll,
};

/* -- debugfs test triggers ------------------------------------------- */

/*
 * Writing any data to
 * /sys/kernel/debug/daedalus_v4l2/test_ping enqueues a PING
 * request with a fixed 24-byte payload "DAEDALUS-V4L2-PING-PL\0\0\0".
 * The written bytes themselves are ignored. The userspace test
 * daemon (tools/test_chardev_pingpong.c) then reads the request
 * back, sends PONG, and the kernel logs the round-trip at
 * pr_debug level.
 *
 * Introduced in Phase 8.2. Real REQ_DECODE injection now comes
 * from the V4L2 m2m path (Phase 8.5); this debugfs entry still
 * works and exercises the chardev on its own.
 */
static ssize_t daedalus_test_ping_write(struct file *file,
					const char __user *buf,
					size_t count, loff_t *ppos)
{
	static const char payload[24] = "DAEDALUS-V4L2-PING-PL";
	int ret;

	ret = daedalus_chardev_enqueue_req(DAEDALUS_MSG_PING, 0x1234u,
					   payload, sizeof(payload));
	if (ret)
		return ret;
	return count;
}

static const struct file_operations daedalus_test_ping_fops = {
	.owner = THIS_MODULE,
	.write = daedalus_test_ping_write,
};

/*
 * Writing bitstream bytes to
 * /sys/kernel/debug/daedalus_v4l2/test_decode enqueues a REQ_DECODE
 * carrying those bytes as a VP9 access unit (Phase 8.4 fixed
 * codec). The wire payload prepends a struct daedalus_req_decode
 * header so the daemon knows the codec id and bitstream length.
 *
 * Phase 8.6 generalises codec_id (via a sysfs / debugfs control);
 * for Phase 8.4 VP9 is hard-wired since that's what the cycle-9
 * stack targets first.
 */
static atomic_t daedalus_decode_cookie = ATOMIC_INIT(0);

static ssize_t daedalus_test_decode_write(struct file *file,
					  const char __user *buf,
					  size_t count, loff_t *ppos)
{
	struct daedalus_req_decode req;
	u8 *blob;
	size_t total;
	u32 cookie;
	int ret;

	if (count == 0)
		return -EINVAL;
	if (count + sizeof(req) > DAEDALUS_PROTO_MAX_PAYLOAD)
		return -EMSGSIZE;

	total = sizeof(req) + count;
	blob = kmalloc(total, GFP_KERNEL);
	if (!blob)
		return -ENOMEM;

	/*
	 * Zero the header first so that no uninitialised stack bytes
	 * (padding, or any field not set below) reach the daemon.
	 */
	memset(&req, 0, sizeof(req));
	req.codec_id = DAEDALUS_CODEC_VP9;
	req.bitstream_len = (u32) count;
	req.flags = 0;
	memcpy(blob, &req, sizeof(req));
	if (copy_from_user(blob + sizeof(req), buf, count)) {
		kfree(blob);
		return -EFAULT;
	}

	cookie = (u32) atomic_inc_return(&daedalus_decode_cookie);
	ret = daedalus_chardev_enqueue_req(DAEDALUS_MSG_REQ_DECODE, cookie,
					   blob, total);
	kfree(blob);
	if (ret)
		return ret;

	pr_info("daedalus_v4l2: REQ_DECODE enqueued cookie=%u codec=VP9 bitstream=%zu\n",
		cookie, count);
	return count;
}

static const struct file_operations daedalus_test_decode_fops = {
	.owner = THIS_MODULE,
	.write = daedalus_test_decode_write,
};

/* -- registration ---------------------------------------------------- */

int daedalus_chardev_init(void)
{
	struct daedalus_chardev *dev;
	int ret;

	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
	if (!dev)
		return -ENOMEM;

	mutex_init(&dev->open_lock);
	mutex_init(&dev->req_lock);
	INIT_LIST_HEAD(&dev->req_queue);
	init_waitqueue_head(&dev->req_wait);

	dev->misc.minor = MISC_DYNAMIC_MINOR;
	dev->misc.name = DAEDALUS_CHARDEV_NAME;
	dev->misc.fops = &daedalus_chardev_fops;
	dev->misc.mode = 0660; /* root:video, like /dev/videoNN */

	/*
	 * Publish the singleton before misc_register(): the device
	 * node can be opened as soon as registration succeeds, and
	 * open() dereferences g_chardev.
	 */
	g_chardev = dev;

	ret = misc_register(&dev->misc);
	if (ret) {
		g_chardev = NULL;
		kfree(dev);
		return ret;
	}

	dev->debugfs_dir = debugfs_create_dir("daedalus_v4l2", NULL);
	if (!IS_ERR(dev->debugfs_dir)) {
		debugfs_create_file("test_ping", 0200, dev->debugfs_dir,
				    NULL, &daedalus_test_ping_fops);
		debugfs_create_file("test_decode", 0200, dev->debugfs_dir,
				    NULL, &daedalus_test_decode_fops);
	}

	pr_info("daedalus_v4l2: /dev/%s registered\n", DAEDALUS_CHARDEV_NAME);
	return 0;
}

void daedalus_chardev_exit(void)
{
	struct daedalus_chardev *dev = g_chardev;
	struct daedalus_chardev_msg *msg;

	if (!dev)
		return;

	debugfs_remove_recursive(dev->debugfs_dir);
	misc_deregister(&dev->misc);

	while ((msg = list_first_entry_or_null(&dev->req_queue,
					       struct daedalus_chardev_msg,
					       list)) != NULL) {
		list_del(&msg->list);
		daedalus_chardev_msg_free(msg);
	}

	mutex_destroy(&dev->req_lock);
	mutex_destroy(&dev->open_lock);
	kfree(dev);
	g_chardev = NULL;
}
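
For daemon-side testing, the validation order used by daedalus_chardev_write() above can be mirrored in userspace. The sketch below is only an illustration: the header layout, the `EX_*` constants and the function names are hypothetical stand-ins, since the real definitions live in daedalus_v4l2_proto.h, which is not shown here. Only the order of the checks, the errno values and the high "response" bit are taken from the listing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values; the real ones are in daedalus_v4l2_proto.h. */
#define EX_PROTO_MAGIC		0x44414544u
#define EX_PROTO_VERSION	1u
#define EX_PROTO_MAX_PAYLOAD	(64u * 1024u)
#define EX_RESPONSE_BIT		0x80000000u

/* Hypothetical layout: field names follow the kernel code above. */
struct ex_msg_hdr {
	uint32_t magic;
	uint32_t version;
	uint32_t type;
	uint32_t cookie;
	uint32_t payload_len;
	uint32_t reserved;
};

/*
 * Apply the same checks, in the same order, as the chardev write()
 * handler. Returns 0 and fills @hdr on success, or a negative errno.
 */
static int ex_check_frame(const void *buf, size_t count,
			  struct ex_msg_hdr *hdr)
{
	assert(buf && hdr);
	if (count < sizeof(*hdr))
		return -EINVAL;
	memcpy(hdr, buf, sizeof(*hdr));
	if (hdr->magic != EX_PROTO_MAGIC)
		return -EBADMSG;
	if (hdr->version != EX_PROTO_VERSION)
		return -EPROTO;
	if (hdr->payload_len > EX_PROTO_MAX_PAYLOAD)
		return -EMSGSIZE;
	if (count < sizeof(*hdr) + hdr->payload_len)
		return -EINVAL;
	return 0;
}

/* Responses carry the top bit in the type field. */
static int ex_is_response(uint32_t type)
{
	return (type & EX_RESPONSE_BIT) != 0;
}
```

Running a check like this on outgoing frames before write() lets a daemon unit test catch a malformed frame without loading the module.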