Commit Graph

76 Commits

Author SHA1 Message Date
claude-noether 6f4e5833f0 iter8 Phase 7 fix-fwd: picture.c needs <stdlib.h> for getenv
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:22:40 +00:00
claude-noether 66ecbef5c6 iter8 Phase 7 IMP-1 experiment: LIBVA_V4L2_ZERO_CAPTURE pre-zero gate
Env-gated CAPTURE pre-zero in BeginPicture after cap_pool_acquire. With
LIBVA_V4L2_ZERO_CAPTURE=1, the slot mmap region is memset 0 before the
kernel decode runs. Discriminates "kernel writes partial then aborts"
from "kernel writes nothing, buffer carries stale residue from prior
allocation."

Per Phase 5 IMP-1: the 16x32 patch in libva H.264 frame 1 may be either
real partial kernel write OR stale residue. This gate makes the next
sweep run deterministically zero the buffer; if the patch still appears
after, the kernel really writes it; if not, it was stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:21:54 +00:00
claude-noether beaa914680 fresnel-fourier iter4 Phase 6 commit C: picture.c VP9 dispatch + 2 buffer-type cases
5 sites:
1. include block: add #include "vp9.h".
2. codec_set_controls: add VP9 case calling vp9_set_controls().
3. codec_store_buffer VAPictureParameterBufferType: VP9 inner case
   memcpy'ing into surface_object->params.vp9.picture.
4. codec_store_buffer VASliceParameterBufferType: VP9 inner case
   memcpy'ing into surface_object->params.vp9.slice.
5. (No reset in RequestBeginPicture — VP9 has no iqmatrix_set/
   probability_set-style flag, Picture/Slice are unconditionally
   populated by VAAPI consumer per frame.)

Per Phase 2 B12: NO buffer.c changes — VP9 uses Picture+Slice+Data
which are already in the iter3 allow-list. Per memory
feedback_runtime_enumerates_allowlists.md plan for Commit D
fix-forward if a runtime miss surfaces; predicted clean.

Verified end-to-end on fresnel:
- vainfo enumerates VAProfileVP9Profile0 alongside H.264 + HEVC.
- LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi VP9 decode
  exits 0 (criterion 3 PASS): 5 frames decoded at 0.307x speed,
  cap_pool_init OK, no kernel ioctl errors.
- mpv vp9-vaapi engagement still SW-fallback (iter4-B2 backlog —
  mpv-DRM device-create path doesn't honor LIBVA_DRIVER_NAME the
  way ffmpeg-vaapi does; investigation deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:48:49 +00:00
claude-noether 7f84bbb50f fresnel-fourier iter3 Phase 6 commit C: picture.c VP8 dispatch + 4
buffer-type cases + new VAProbabilityBufferType outer case + per-
frame reset + surface.h params.vp8 union extension

Five sites in picture.c + one site in surface.h wire up the VP8
codec dispatcher introduced by commit B:

  1. Include #include "vp8.h" in the codec headers block.

  2. codec_set_controls: NEW case VAProfileVP8Version0_3 calling
     vp8_set_controls(driver_data, context, surface_object).
     Same shape as MPEG-2 + HEVC dispatch.

  3. codec_store_buffer VAPictureParameterBufferType: NEW VP8 case
     memcpy'ing into surface_object->params.vp8.picture
     (sizeof VAPictureParameterBufferVP8).

  4. codec_store_buffer VASliceParameterBufferType: NEW VP8 case
     memcpy'ing into surface_object->params.vp8.slice (single,
     no slices[] array — VP8 is frame-mode, no multi-slice).

  5. codec_store_buffer VAIQMatrixBufferType: NEW VP8 case
     memcpy'ing into surface_object->params.vp8.iqmatrix +
     setting iqmatrix_set true.

  6. codec_store_buffer NEW outer case VAProbabilityBufferType
     (Phase 5 C3: NOT VAProbabilityDataBufferType — that's the
     STRUCT name; the buffer-type enum constant is
     VAProbabilityBufferType = 13 per va.h:2058). Inner switch
     dispatches by profile, with VP8 case memcpy'ing into
     surface_object->params.vp8.probability + setting
     probability_set true.

  7. RequestBeginPicture: NEW per-frame reset for the two VP8
     flags — params.vp8.iqmatrix_set = false +
     params.vp8.probability_set = false. Mirrors the existing
     iter1 (h264.matrix_set) + iter2 (h265.num_slices) per-frame
     resets.

surface.h extension:

  8. params union: NEW vp8 struct after h265 — holds the 4 VAAPI
     buffer-type structs (VAPictureParameterBufferVP8,
     VASliceParameterBufferVP8, VAIQMatrixBufferVP8 + iqmatrix_set,
     VAProbabilityDataBufferVP8 + probability_set).

The NEW vp8 union member adds ~5300 bytes (sizeof
VAProbabilityDataBufferVP8 dominated by dct_coeff_probs[4][8][3]
[11] = 1056 + bookkeeping). The h265 member with slices[64] array
remains the largest (~17 KB), so the union size doesn't grow.

After this commit: backend builds clean, links cleanly. mpv-vaapi
VP8 decode should engage end-to-end on hantro env binding. Phase
1 criteria 1 + 2 + 3 expected satisfied; criterion 4 (HW=SW byte-
identical) and criterion 5 (3-codec regression) verified at Phase
6 smoke + Phase 7.

Refs:
  ../fresnel-fourier/phase4_iter3_plan.md (Commit C site list)
  ../fresnel-fourier/phase2_iter3_situation.md (B6, B7, B8, B9
                                                 bug enumeration)
  ../fresnel-fourier/phase5_iter3_review.md (C3 VAProbabilityBuffer
                                              Type rename
                                              empirically verified)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:52:24 +00:00
claude-noether 8d71e20bf7 fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API
Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC
dispatch + per-slice accumulation against the modern split V4L2_CID_
STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,
DECODE_MODE,START_CODE} stateless controls. Replaces the staging-era
V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that were
removed from the kernel UAPI.

Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5,
ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS:

  0xa40a90 SPS            (40  bytes)
  0xa40a91 PPS            (64  bytes)
  0xa40a92 SLICE_PARAMS   (variable; dynamic-array; one entry per slice)
  0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list)
  0xa40a94 DECODE_PARAMS  (328 bytes; per-frame DPB info)

Plus device-wide menus set once at context.c init (separate batched
S_EXT_CTRLS call so a kernel without HEVC controls — e.g. hantro on
RK3568/RK3399 — silently fails its batch without invalidating H.264):

  0xa40a95 DECODE_MODE  (FRAME_BASED on rkvdec)
  0xa40a96 START_CODE   (ANNEX_B on rkvdec)

Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
           (v4l2_request_hevc_queue_decode batched submission shape).

Phase 5 review amendments incorporated:

  C1 (data_byte_offset NOT data_bit_offset):
    Old h265.c at lines 184-209 ran an 8-bit search to compute
    bit-granularity offset. New API renames the field to
    data_byte_offset (u32 byte offset). Bit-search dropped; replaced
    with plain byte offset = source_offset + slice->slice_data_byte_offset.

  C2 (dpb_entry.flags only LONG_TERM_REFERENCE; pic_order_cnt_val
      singular; poc_st_curr_*[] arrays hold DPB INDICES not POC):
    h265_fill_decode_params replaces old slice-params DPB iteration
    with explicit DPB classification + index-array population.
    For each VAAPI ReferenceFrames[i]:
      - Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via
        VA_PICTURE_HEVC_RPS_* flags.
      - Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic.
      - Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR.
      - Append j (DPB index, u8) to poc_st_curr_before[k] /
        poc_st_curr_after[k] / poc_lt_curr[k] based on classification.

  C3 (union-aliasing reasoning corrected):
    BeginPicture's params.h265.num_slices = 0 reset is benign for
    non-HEVC profiles because byte ~17764 of the params union is past
    any field non-HEVC profiles read, NOT because RenderPicture's
    per-buffer copies overwrite that location. Wording amended in
    phase4_iter2_plan.md per phase5_iter2_review.md.

  S1 (PPS flags 19 + 20 — DEBLOCKING_FILTER_CONTROL_PRESENT and
      UNIFORM_SPACING):
    Empirically VAAPI does NOT expose either flag in the
    VAPictureParameterBufferHEVC pic_fields.bits or
    slice_parsing_fields.bits. Both bits left zero. BBB-720p10s_hevc
    fixture uses neither tiles nor explicit deblocking-control
    parameters, so the omission is correct for the iter2 binding cell.

  S2 (3 PPS scalars added):
    pic_parameter_set_id (default 0; VAAPI doesn't expose),
    num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_
    active_minus1 (both populated from VAAPI picture struct).

  Q2 (slice_segment_addr populated):
    Was missing in old h265.c. Now sourced from
    VAAPI's slice->slice_segment_address.

  S3 (SCALING_MATRIX content choice):
    Implementer choice taken: when iqmatrix_set==false (BBB has no
    scaling list per SPS flags = SAO|STRONG_INTRA_SMOOTHING),
    h265_fill_scaling_matrix sends memset-zero. Matches FFmpeg's
    sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserves
    byte-equality vs cross-validator anchor).

  S4 (FFmpeg function name fix): cosmetic; no code impact.

Plus one Phase 6 inline correction: phase 5 review S1 suggested
VAAPI exposes uniform_spacing_flag in pic_fields.bits; empirical
test-compile shows it doesn't. Comment added in h265_fill_pps
documenting the omission.

Picture.c changes (3 edits):

  1. codec_set_controls HEVCMain dispatch (lines 204-206 → call
     h265_set_controls; replaces explicit Fourier-local: HEVC stripped
     reject).
  2. codec_store_buffer HEVC VASliceParameterBufferType case: append
     VAAPI slice param to params.h265.slices[N] array, increment
     num_slices. Single-slice mirror at .slice retained for
     h265_fill_pps (which reads dependent_slice_segment_flag from
     LongSliceFlags).
  3. RequestBeginPicture: add params.h265.num_slices = 0 reset
     alongside existing h264.matrix_set = false reset.

Surface.h: extend params.h265 struct with slices[HEVC_MAX_SLICES_PER_
FRAME=64] array + num_slices counter. ~17 KB extra per surface union;
24 surfaces in iter7 cap_pool = ~400 KB total surface_heap growth.
object_heap allocator picks up new size automatically via
sizeof(struct object_surface).

Context.c: separate 2-control batched call sets HEVC DECODE_MODE +
START_CODE device-wide. Same best-effort (void)v4l2_set_controls
pattern as the existing H.264 device-init block; if kernel doesn't
advertise HEVC controls (hantro on RK3568/RK3399), the batch silently
fails without invalidating the H.264 batch.

Meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73)
in sources + headers lists.

H265.h: added HEVC_MAX_SLICES_PER_FRAME=64 #define before struct
forward declarations.

Phase 6 smoke test on fresnel (post Commit A + Commit B):

  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec env binding
              (/dev/video1 + /dev/media0). PASS.

  Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4
              -frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots
              ready. PASS.

  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture:
    HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    HW=SW byte-identical for both frames; frame1 != frame2 (real motion).
    PASS.

  Criterion 5: regression hashes hold for both prior cells:
    H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
    H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
    MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
    MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)
    PASS.

All five criteria green on first build attempt — Phase 5 review
caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset
rename; dpb.rps field gone + pic_order_cnt_val rename + index-array
semantics) that would have been Phase 6 compile failures or silent
Phase 7 byte-compare divergences. Without that review pass, this
commit would have been the start of a 2+ loopback debugging cycle.

Refs:
  ../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses,
                                            File 4 patch shape)
  ../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2,
                                              S3, S4, Q2 amendments
                                              all incorporated)
  ../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/
    ffmpeg_v4l2req.stdout (cross-validator anchor — Phase 7
    bonus byte-compare verification target)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:58:34 +02:00
claude-noether a09c03c154 iter6 fix: per-OUTPUT-slot request_fd binding via REINIT
iter4 (385dee1) replaced the original media_request_reinit pattern
with close+media_request_alloc per frame to escape an EINVAL on
S_EXT_CTRLS that turned out to be a DPB-payload bug (74d8dd1, FFmpeg
V4L2_H264_FRAME_REF semantics). The per-frame close+alloc model
worked for mpv vaapi-copy (single-surface recycle) but raced under
Firefox 150's MediaSource pipeline (multi-surface rotation): fd=30
got reused via lowest-free-fd allocation faster than the kernel-
side per-buffer state-machine could tear down the prior request,
producing intermittent VIDIOC_QBUF EINVAL on OUTPUT after 1..53
successful frames.

Phase 2 telemetry confirmed:
- DQBUF returned the index we passed (no FIFO mismatch)
- SPS/PPS/DECODE_PARAMS/SCALING_MATRIX byte-identical between mpv
  and Firefox first 64 bytes
- Pool size bump 4 -> 16 only delayed the failure (62 frames)
- Different OUTPUT slot indices failed across runs (race signature)

Fix: each OUTPUT pool slot owns a permanent request_fd allocated
once at request_pool_init and REINIT'd between uses in
RequestSyncSurface. 1:1 slot-to-fd binding eliminates cross-slot fd
reuse entirely. Pool stays driver-wide (multi-context safe per
iter5 Track E); slots cycle through 16 distinct fds in round-robin
acquire.

Files:
- request_pool.h: add request_fd field to slot struct; init
  signature takes media_fd
- request_pool.c: alloc per-slot fd at init, close at destroy
- context.c: pass driver_data->media_fd; pool size 4 -> 16
- picture.c: BeginPicture binds slot->request_fd to surface;
  EndPicture's per-frame media_request_alloc removed
- surface.c: RequestSyncSurface uses media_request_reinit instead
  of close+alloc; DestroySurfaces close removed (slot owns fd);
  error path close removed; surface_object NULL-init for the
  -Wmaybe-uninitialized warning fix

Empirical verification (clean build sha ebe396d5..., no diagnostic
instrumentation):
- Firefox 150 + bbb_1080p30_h264.mp4 + LIBVA_DRIVER_NAME=v4l2_request
  + sandbox enabled: 35s+ playback, zero "Unable to queue buffer"
  / "Unable to set control(s)", lsof shows RDD process holds
  /dev/video1 + /dev/media0 throughout. Driver stderr: only the
  single cap_pool_init: 24 slots ready line.
- mpv vaapi-copy 50 frames: zero errors, "Using hardware decoding
  (vaapi-copy)" - no regression vs iter5-end driver.

Pool-size bump diagnostic (Phase 5 sonnet design review feedback):
4 -> 16 alone took 1->62 frames, far short of the 30s success
criterion (~900 frames at 30fps). REINIT discipline is the actual
fix; pool 16 is comfortable headroom over typical H.264 MaxDpbFrames.

Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment
attribution corrected: cleanup runs at RequestTerminate, not
RequestDestroyContext, since the pool is driver-wide).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:30:39 +00:00
test0r d3a299b4cc iter5 sweep: remove iter1 patch-0010 hex-dumps + patch-0011 sentinel
picture.c: remove the 0xab sentinel write into CAPTURE buffer first
32 bytes pre-QBUF + the OUTPUT hex-dump pre-QBUF. Both were iter1
diagnostics for "where does the buffer write go?" investigation.

surface.c: remove the post-DQBUF CAPTURE Y-plane hex-dump + luma
variance signal. The msync(MS_SYNC|MS_INVALIDATE) was added as a
companion fix for the cached-mmap issue surfaced by the dump itself —
removing the dump removes the need for the msync.

With iter1+iter2+iter3+iter4 fixes landed, these dumps fire on every
single frame and produce hundreds of MB of log noise during sustained
decode. Now gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:48:31 +00:00
test0r 951233a12e iter5 sweep: remove iter1 ENTER traces (13 call sites across 4 files)
Removes the iter1 patch-0014 ENTER traces from buffer.c, image.c,
picture.c, surface.c. These were diagnostic-only entry-point logs
added during iter1's "where does Firefox RDD crash?" investigation.
With the iter1+iter2+iter3+iter4 fixes landed, the entry-point
traces are pure noise.

If a future investigation needs entry-point coverage, strace -e trace
on the libva consumer process gives equivalent visibility without
modifying the driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:47:25 +00:00
test0r 19acc76da4 iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.

Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
  (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
  {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
  oldest FREE slot (LRU), binds it for the decode cycle, and the slot
  cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
  vaDestroySurfaces.

Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
  exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
  16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
  cap_pools -- iter3 scope).

Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
  EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
  cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
  path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
  NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).

Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:03:31 +00:00
test0r 21ae311077 DEBUG: ENTER on CreateBuffer + BeginPicture for frame-1 crash narrowing 2026-05-04 14:43:29 +00:00
test0r 841f616e74 h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives and the in-tree h264.matrix struct stays zero-initialised.

fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
so set_controls always submitted the scaling matrix, including the
zero-initialised case.

Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
bytesused=3655712 reported).

Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.

Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.

This patch:
  - surface.h: adds bool matrix_set to params.h264, mirroring
    mpeg2.iqmatrix_set / h265.iqmatrix_set.
  - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
    sets matrix_set = true when the buffer arrives.
  - picture.c RequestBeginPicture: resets matrix_set = false at the
    start of each Begin/Render/End cycle.
  - h264.c h264_set_controls: builds the controls[] array
    incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
    matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
    when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.

The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
not be set."

Cross-reference: kernel UAPI section
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
(matrix supplied iff explicit scaling lists in bitstream) and
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
falls back to defaults).

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 1690dfaa79 DEBUG: sentinel-pattern test for CAPTURE buffer write
Diagnostic-only. Writes 0xab×32 into the CAPTURE buffer's first 32
bytes immediately before VIDIOC_QBUF. The 0010 hex-dump after
DQBUF reveals which case we're in:

  - All 0xab → kernel never wrote to this buffer (wrong buffer
    chosen, alias, or no decode actually happened despite
    bytesused=3655712 reported).
  - All zeros → kernel did write 0x00s (overwriting our
    sentinel), and the apparent "no picture" output is the
    kernel-side decode actually producing zeros (e.g. parser
    rejected the bitstream).
  - Mix of zeros and real luma values → kernel wrote real
    decoded pixels; CPU read sees stale-cached zeros somewhere
    OR the sentinel area was a header that decoder zeroed but
    rest is real. Need to check more bytes.
  - All 0xab still → kernel never touched this region but other
    parts of buffer may be filled (incomplete decode).

Removed once Step 1 decode is verified.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 3609fbb425 DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame
Diagnostic-only patch (NOT for upstream). Hex-dumps:
  - First 32 bytes of OUTPUT buffer at QBUF time in
    picture.c::RequestEndPicture (i.e. what we feed the kernel)
  - First 32 bytes of CAPTURE Y-plane after DQBUF in
    surface.c::RequestSyncSurface (i.e. what kernel returned)

Lets us see whether:
  - OUTPUT bitstream begins with valid ANNEX_B start code + NAL
    header byte (e.g. `00 00 01 65` for IDR slice)
  - CAPTURE Y-plane after decode contains varied luma data
    (working) vs. all-zeros / repeating pattern (kernel didn't
    write anything).

Removed once Step 1 decode is verified working. Output goes via
existing request_log() to stderr.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 565f5c0de4 context: introduce request_pool, decouple OUTPUT buffers from surfaces
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
Replaces the prior per-surface OUTPUT-buffer ownership model with a
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
in flight), allocated unconditionally regardless of caller's
num_render_targets.

Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
"output_count == surfaces_count" model was a category error: OUTPUT
buffers are request-time bitstream slots, CAPTURE buffers are
picture-time DPB slots; their lifecycles and sizing are independent.

Changes:
  * NEW src/request_pool.{c,h} (~200 LoC):
      - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
      - request_pool_destroy(): munmap all, idempotent.
      - request_pool_acquire(): round-robin claim; returns V4L2 buffer
        index of an unused slot or -1.
      - request_pool_release(): mark slot free for reuse.
      - request_pool_slot(): accessor for ptr/size given a buffer index.

  * src/request.h: add struct request_pool output_pool to request_data.

  * src/context.c::RequestCreateContext: replace the per-surface
    OUTPUT loop with a single request_pool_init() call (count=4,
    independent of surfaces_count). Drop the now-unused locals
    (length, offset, source_data, output_buffers_count, index,
    index_base, i, surface_object). DELETES patch 0002's
    "output_buffers_count = ... ? ... : 4" hack inline — the pool's
    own count parameter supersedes it.

  * src/picture.c::RequestBeginPicture: borrow a pool slot at frame
    start, write its mmap pointer/size/index into the surface's
    transient source_* fields. The fields stay (still useful as
    a borrow handle that the existing codec_store_buffer memcpys
    target), but no longer represent surface-permanent ownership.
    Reset slices_size/slices_count here too (was implicit on first
    Render).

  * src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
    the OUTPUT buffer, release the pool slot and clear the surface's
    borrow handle. Fixes the segv on second-frame submission.

  * src/surface.c::RequestDestroySurfaces: remove the munmap of
    source_data — pool owns the mmap.

  * src/request.c::RequestTerminate: call request_pool_destroy()
    before close(video_fd) so munmaps still target a valid fd.

  * src/meson.build: add request_pool.c and request_pool.h to the
    sources/headers lists.

This commit removes 0002's OUTPUT-pool hack inline (the
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
remains until commit 4 lands.

Build-verified clean on aarch64.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r c45fea96e3 fourier-local: stateless control modernization + HEVC strip
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:

- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
  V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
  (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
  passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
  (kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
  build path is excluded from this fork (RK3568 hantro G1/G2 has
  no HEVC, and the kernel-side HEVC controls have a separate
  rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
  source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
  the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
  includes it; the passthrough shim's CID aliases live in the
  kernel header now).

Functionally equivalent to the prior fork master commits:
  c1f5108 V4L2_PIX_FMT_H264_SLICE rename
  4ccbfe9 Strip HEVC build path
  da9f2a5 include/h264-ctrls.h passthrough + CID aliases
  fc4bb10 src/h264.c track upstream UAPI shape
  13e9b64 src/h264.c drop num_slices field
  4d14ffb src/tiled_yuv.S aarch64 stub
  1b02c9b src/h264.c include utils.h

Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:40:14 +00:00
Paul Kocialkowski 0c611c6b7a Implement proper timestamping for references
Reference frames are now identified using their timestamp:
set the timestamp when queuing the output buffer and use it to identify
the frame later on.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:41:56 +01:00
Paul Kocialkowski e29b04ccc7 autotools: Rewrite configuration in a minimalistic fashion
Drop the per-codec options while at it, since we'll soon include a copy
of the associated headers.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Paul Kocialkowski 518d7a0c59 Update and harmonize heading author lists
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Ezequiel Garcia b2944629fa Add support for dynamic detection of supported codecs
H.264 and H.265 support is still not supported upstream,
so it makes sense to autodetect each codec and only
enable those that are supported.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
2018-10-12 16:11:06 -03:00
Paul Kocialkowski 7ff2543e64 Add support for the single-planar V4L2 API
Signed-off-by: Paul Kocialkowski <contact@paulk.fr>
2018-09-07 16:43:13 +02:00
Paul Kocialkowski 13eaae060e Add support for H265 decoding, including predictive frames
Some features are missing, such as scaling lists (quantization) and
10-bit output.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-08-31 10:13:52 +02:00
Paul Kocialkowski 7d1ac10517 Add support for MPEG2 quantization matrices
This adds support for MPEG2 quantization matrices, which are optional
given that fallback default matrices are used (on the kernel side) when
no such matrix is provided.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-25 14:36:36 +02:00
Maxime Ripard 92f6546596 tree: Remove void * casts
void * can be assigned from and stored to any pointer type without any
warning. Remove the explicit casts.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 111f5b209a tree: Rename cedrus_data to request_data
The cedrus_data structure carries the old name. In order to migrate to the
new name, let's rename it to request_data.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 4ad990e087 tree: Rename the header and defines
The sunxi_cedrus.h header contains a bunch of defines prefixed with
SUNXI_CEDRUS.

As part as the ongoing migration to a more generic name, change that prefix
for V4L2_REQUEST, and the header file to request.h

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 913e1e642c tree: Rename the libva hooks
As part of our renaming effort, Rename the libva hooks names to mention
request instead of SunxiCedrus

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 2d1bce38c2 h264: Don't set num_slices anymore
The num_slices parameter was improperly set to the number of reference
frames, which is incorrect.

Add a counter for the number of slices per surface, and set num_slices to
that value.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard acc0cf3475 codecs: pass the context to the controls function as well
Some functions setting the controls will need the context in the future.
Make sure that we provide it as an argument.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard 5aeb07f8bf tree: Run clang-format to conform to the kernel coding style
The coding style has been a bit erratic. Enforce the linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 10:12:15 +02:00
Maxime Ripard b938824c48 tree: Shorten struct sunxi_cedrus_driver_data name
This long structure name makes it quite difficult to fit within the 80
characters limit. Shorten it.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 09:34:15 +02:00
Maxime Ripard 5aa1604a6c tree: add the driver_data parameter to the BUFFER macro as well
The BUFFER macro takes an implicit driver_data argument. In order to make
it obvious that we need it, let's put it as an explicit parameter.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 09:27:57 +02:00
Maxime Ripard fd263773cc tree: Change the macros to take the actual arguments they are using
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-13 16:00:08 +02:00
Maxime Ripard 1efa9d877e Add support for H264 decoding
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 17:07:15 +02:00
Maxime Ripard d7d8fc744b Abstract away MPEG2 support
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 17:07:15 +02:00
Paul Kocialkowski 03fd51b3b3 Reduce switch/case indentation
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 15:32:52 +02:00
Paul Kocialkowski 9f2c069f76 Rework buffer management to be more generic and support untiled format
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 15:16:52 +02:00
Paul Kocialkowski bb73d363a3 Sync with latest definitions from the Cedrus driver and requests API
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-06-21 09:30:06 +02:00
Paul Kocialkowski a412ee8b42 picture: Always sync surface in EndPicture
The libVA API expects rendering to be done after calling EndPicture.
Since SyncSurface calls are generally not issued in a way compatible
with dequeuing buffers (they might be called too early or too late),
always call SyncSurface from EndPicture.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-26 10:00:09 +02:00
Paul Kocialkowski d716c32511 picture: Remove recurrent log message
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-25 11:46:06 +02:00
Paul Kocialkowski f872e345d0 Centralize buffer-related ressources in surface object and avoid dynamic indexes
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-25 10:48:17 +02:00
Paul Kocialkowski 97a087dc73 picture: Resolve various trivial build issues
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-25 09:40:36 +02:00
Paul Kocialkowski c55a2709b0 mpeg2: Pass driver_data along to access reference surfaces
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-25 09:32:45 +02:00
Paul Kocialkowski 6cd00b758c Move log function to a dedicated file and rename it along the way
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-24 17:12:29 +02:00
Paul Kocialkowski 9de2ba88b5 Introduce and use media helpers, updated to the latest media request API
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-24 17:00:50 +02:00
Paul Kocialkowski a8c191b544 Rename mem2mem_fd to video_fd to prepare for media introduction
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-24 16:40:42 +02:00
Paul Kocialkowski 1bf08f9656 mpeg2: Rework helper functions to a more flexible interface
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-24 16:15:00 +02:00
Paul Kocialkowski c7f0d7684a Introduce and use dedicated v4l2 helpers to replace inline ioctls
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-24 15:39:31 +02:00
Paul Kocialkowski 9716acc322 surface: Harmonize coding style
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-24 14:44:57 +02:00
Paul Kocialkowski 56614c25a6 picture: Harmonize coding style
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-24 14:11:08 +02:00
Paul Kocialkowski cd31cb568c Rename va_config to config for consistency
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-04-23 17:09:19 +02:00