codec_store_buffer: resize / re-allocate source_data on resolution upshift (root cause for #13) #15

Closed
opened 2026-05-21 10:20:14 +00:00 by marfrit · 0 comments
Owner

Follow-up to #13 (closed by PR #14) — the bounds checks landed in PR #14 are the memory-safety floor, not the actual fix. With them in place a resolution upshift produces a graceful VA_STATUS_ERROR_ALLOCATION_FAILED to libavcodec instead of a SIGSEGV inside the memcpy, which gives the consumer a chance to re-create the surface — but it still means that decoded frame is dropped and the consumer has to do the recovery itself. The right behaviour is for the libva backend to absorb the resize transparently.

Where the buffer comes from

surface_object->source_data is mirrored in BeginPicture from an OUTPUT-pool slot:

slot = request_pool_slot(&driver_data->output_pool, ...);
surface_object->source_data = slot->data;
surface_object->source_size = slot->size;

The slot's data / size were set once in request_pool_init:

rc = v4l2_query_buffer(video_fd, output_type, pool->slots[i].index,
                       &length, &offset, 1);
pool->slots[i].data = mmap(NULL, length, ...);
pool->slots[i].size = length;

i.e. the kernel driver's negotiated sizeimage from the S_FMT that ran during CreateContext. That number is fixed for the life of the pool — no path in the backend ever re-S_FMTs OUTPUT or re-mmaps the slot.

So any stream-level resolution upshift that lifts the per-frame slice budget past the original S_FMT's sizeimage will trip the new bounds check forever, until the libva client tears down the context and re-creates one.

Two viable shapes for the proper fix

Shape A — re-S_FMT + re-init the OUTPUT pool on detected upshift

  1. In codec_store_buffer, when the bounds check would fire, call a new output_pool_resize(driver_data, needed_bytes) instead of returning the error immediately.
  2. output_pool_resize issues VIDIOC_STREAMOFF, re-does S_FMT with the new sizeimage (or just a generous round-up), re-CREATE_BUFS, re-mmaps, and re-STREAMON. Updates each surface's source_data / source_size mirror on next BeginPicture.
  3. Same for the CAPTURE side — the kernel driver will reject the OUTPUT S_FMT if the dimensions don't match the negotiated CAPTURE format on stateless decoders, so the resize has to be coordinated.

Gotcha: any surface that's mid-Begin..Render..End cycle must finish (or be aborted) before the streamoff. The existing single-context, single-render-surface invariant in picture.c makes that tractable — there's only ever one render-surface-id in flight per context.

Shape B — over-allocate the OUTPUT pool at CreateContext time

If the libva client tells us the peak dimensions up front (it does — vaCreateContext takes picture_width and picture_height), oversize the OUTPUT S_FMT's sizeimage to e.g. 2× the spec maximum slice for that resolution + profile combo. Simpler, but wastes mmap until the libva client tells us the max.

The daedalus_v4l2 daemon path is mid-stream-resolution-change-tolerant on its own side (libavcodec rebuilds its parser on a fresh SPS), so the libva-side fix doesn't need to coordinate with the daemon — only with the kernel S_FMT state machine.

Why this matters beyond the SIGSEGV class of failures

Adaptive streaming (DASH/HLS) flips resolution multiple times per minute on cellular. Every flip currently means a destroyed VA context on the consumer side. mpv copes; Firefox's RDD process can be made to cope but the tear-down/rebuild is a stutter the user sees. A libva-side transparent resize would let the same VA context survive a mid-stream upshift.

Sequencing

This is independent of:

  • the daedalus daemon's OBU synthesiser (#11 daemon track), which already has its own resolution-change handling on the consumer.
  • the h264_set_controls fallback (#8).

Depends on:

  • A way to test mid-stream resolution change in CI / on higgs. The bbb_1080p30 ↔ 720p30 repro from #13 is the obvious one, but it also exercises the CAPTURE-side allocator (cap_pool_init) — that one will need the same kind of resize hook.

Suggested label

backlog, refactor, memory-safety (this is the proper fix; PR #14 was the floor).

Follow-up to #13 (closed by PR #14) — the bounds checks landed in PR #14 are the **memory-safety floor**, not the actual fix. With them in place a resolution upshift produces a graceful `VA_STATUS_ERROR_ALLOCATION_FAILED` to libavcodec instead of a SIGSEGV inside the `memcpy`, which gives the consumer a chance to re-create the surface — but it still means *that decoded frame is dropped* and the consumer has to do the recovery itself. The right behaviour is for the libva backend to absorb the resize transparently. ## Where the buffer comes from `surface_object->source_data` is mirrored in `BeginPicture` from an OUTPUT-pool slot: ```c slot = request_pool_slot(&driver_data->output_pool, ...); surface_object->source_data = slot->data; surface_object->source_size = slot->size; ``` The slot's `data` / `size` were set once in `request_pool_init`: ```c rc = v4l2_query_buffer(video_fd, output_type, pool->slots[i].index, &length, &offset, 1); pool->slots[i].data = mmap(NULL, length, ...); pool->slots[i].size = length; ``` i.e. the kernel driver's negotiated `sizeimage` from the `S_FMT` that ran during `CreateContext`. That number is fixed for the life of the pool — no path in the backend ever re-`S_FMT`s OUTPUT or re-mmaps the slot. So any stream-level resolution upshift that lifts the per-frame slice budget past the original `S_FMT`'s `sizeimage` will trip the new bounds check forever, until the libva client tears down the context and re-creates one. ## Two viable shapes for the proper fix ### Shape A — re-`S_FMT` + re-init the OUTPUT pool on detected upshift 1. In `codec_store_buffer`, when the bounds check would fire, call a new `output_pool_resize(driver_data, needed_bytes)` instead of returning the error immediately. 2. `output_pool_resize` issues `VIDIOC_STREAMOFF`, re-does `S_FMT` with the new `sizeimage` (or just a generous round-up), re-`CREATE_BUFS`, re-mmaps, and re-`STREAMON`. Updates each surface's `source_data` / `source_size` mirror on next `BeginPicture`. 3. Same for the CAPTURE side — the kernel driver will reject the OUTPUT `S_FMT` if the dimensions don't match the negotiated CAPTURE format on stateless decoders, so the resize has to be coordinated. Gotcha: any surface that's mid-`Begin..Render..End` cycle must finish (or be aborted) before the streamoff. The existing single-context, single-render-surface invariant in `picture.c` makes that tractable — there's only ever one render-surface-id in flight per context. ### Shape B — over-allocate the OUTPUT pool at `CreateContext` time If the libva client tells us the *peak* dimensions up front (it does — `vaCreateContext` takes `picture_width` and `picture_height`), oversize the OUTPUT `S_FMT`'s `sizeimage` to e.g. 2× the spec maximum slice for that resolution + profile combo. Simpler, but wastes mmap until the libva client tells us the max. The daedalus_v4l2 daemon path is mid-stream-resolution-change-tolerant on its own side (libavcodec rebuilds its parser on a fresh SPS), so the libva-side fix doesn't need to coordinate with the daemon — only with the kernel S_FMT state machine. ## Why this matters beyond the SIGSEGV class of failures Adaptive streaming (DASH/HLS) flips resolution multiple times per minute on cellular. Every flip currently means a destroyed VA context on the consumer side. mpv copes; Firefox's RDD process can be made to cope but the tear-down/rebuild is a stutter the user sees. A libva-side transparent resize would let the same VA context survive a mid-stream upshift. ## Sequencing This is independent of: - the daedalus daemon's OBU synthesiser (#11 daemon track), which already has its own resolution-change handling on the consumer. - the h264_set_controls fallback (#8). Depends on: - A way to test mid-stream resolution change in CI / on higgs. The bbb_1080p30 ↔ 720p30 repro from #13 is the obvious one, but it also exercises the CAPTURE-side allocator (`cap_pool_init`) — that one will need the same kind of resize hook. ## Suggested label `backlog`, `refactor`, `memory-safety` (this is the *proper* fix; PR #14 was the floor).
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marfrit/libva-v4l2-request-fourier#15