Commit Graph

92 Commits

Author SHA1 Message Date
claude-noether d7ef0f6cd9 ampere-av1 Phase 3: SEQUENCE byte-equal kdirect; 3/10 frames PASS bit-exact
Three more fixes after strace-diff localization vs kdirect.

Fix 6 — fill_sequence ENABLE_SUPERRES: gate on
picture->pic_info_fields.bits.use_superres instead of unconditional
set-true. VAAPI doesn't expose enable_superres at sequence level; per
strace diff kdirect clears the flag for streams not using superres
(byte 1 of flags was the only SEQUENCE diff). After this fix,
SEQUENCE ctrl byte-equal kdirect on every call.

Fix 7 — refresh_frame_flags = 0xff (was 0): VAAPI doesn't expose
refresh_frame_flags. Default 0xff = "refresh all DPB slots" matches
kdirect's submission and AV1 spec default for KEY/SWITCH frames; for
inter frames simple P-frame chains naturally tolerate this.

Fix 8 — surface_object->av1_order_hint per-surface tracking. Set in
av1_set_controls from picture->order_hint of the current frame. Also
propagated to the linked display surface (when apply_grain=1 →
cur_frame != cur_display) so future frames referencing the display
surface find the order_hint via the linked_decode_surface_id.

Tried + reverted: ref-name iteration of reference_frame_ts / order_hints
via picture->ref_frame_idx[i-1] → DPB slot (Kwiboo's convention via
FFmpeg's s->ref[i]). Empirically regressed 3/10 → 1/10. V4L2 uAPI's
indexing here looks DPB-slot-direct despite the AV1 spec lexicon —
needs kernel-side disambiguation to settle.

Verification on ampere (av1_larger.ivf 352x288, 10 frames):
  Frames 0, 2, 4: PASS bit-exact (apply_grain=1, grain HW path)
  Frames 1, 3, 5-9: DIFF (apply_grain=0)
  3/10 PASS (was 1/10 after iter checkpoint).
  test_av1.ivf 208x208: unchanged bit-exact PASS sha 029ee72c214b37c1

Remaining open: frame 1 (apply_grain=0, first inter) submits IDENTICAL
FRAME ctrl bytes to kdirect (verified strace-diff post-fix), yet
decoded output diverges. That means the divergence is no longer in
control submission — points at OUTPUT-side bitstream differences
between ffmpeg-vaapi and ffmpeg-v4l2request, or at DPB CAPTURE buffer
state (grain-applied data being used as reference vs pre-grain).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:55:07 +00:00
claude-noether 5803cbcf6c ampere-av1 Phase 3 progress: film_grain link + UPDATE_GRAIN; frame 0 bit-exact
Three structural fixes for AV1 with film_grain on vpu981 (RK3588). Output
is no longer empty / crashed; frame 0 (IDR with apply_grain=1) is
bit-exact vs kdirect. Inter frames still diverge.

Fix 1 — surface.h + surface.c: linked_decode_surface_id field on
object_surface, initialized to VA_INVALID_SURFACE. When AV1 picture has
apply_grain=1, VAAPI's VADecPictureParameterBufferAV1 carries a
current_display_picture distinct from current_frame. ffmpeg-vaapi calls
vaBeginPicture on current_frame (decode surface, slot gets bound) but
vaGetImage on current_display_picture (display surface, no slot) → NULL
deref in copy_surface_to_image.

Fix 2 — av1.c: in av1_set_controls, when cur_frame != cur_display, set
display_surface->linked_decode_surface_id = current_frame. Establishes
the back-link so display surface can borrow decode surface's data.

Fix 3 — image.c copy_surface_to_image: when slot is NULL and the
surface has linked_decode_surface_id, lookup the decode surface and
mirror its destination_data[] + destination_sizes[] +
destination_planes_count. NULL guard with diagnostic log retained.

Fix 4 — av1.c fill_film_grain: when apply_grain=1, also set
V4L2_AV1_FILM_GRAIN_FLAG_UPDATE_GRAIN. Confirmed by strace-diff: kdirect
sends flags=0x0B (APPLY|UPDATE|...), libva was sending 0x09 (APPLY but
no UPDATE). Without UPDATE the kernel tries to reuse from
film_grain_params_ref_idx=0, which is never populated. Earlier reverted
because UPDATE seemed to trigger a SEGV — but that SEGV was the
unmasked NULL-slot deref; with fix 1+2+3 in place UPDATE is safe.

Fix 5 — av1.c reference_frame_ts plumbing: when a referenced surface
has timestamp=0 AND linked_decode_surface_id set, follow the link to
find the decode surface that carries the real timestamp. Display
surfaces don't get OUTPUT QBUF'd by us, so their own timestamp stays
zero.

Also: BeginPicture diagnostic log + surface_unbind_slot diagnostic log
+ v4l2.c error_idx diagnostic (kept from earlier — useful for ongoing
investigation).

Verification on ampere:
  test_av1.ivf (208x208, 2 frames, no grain): bit-exact PASS sha
    029ee72c214b37c1 (unchanged, no regression)
  av1_larger.ivf (352x288, 10 frames, film_grain alternates):
    frame 0 (key, apply_grain=1): PASS bit-exact vs kdirect
    frame 4: PASS bit-exact
    frames 1,2,3,5,6,7,8,9: DIFFER

Frame 0 PASS proves: SEQUENCE + FRAME + TILE_GROUP_ENTRY + FILM_GRAIN
mapping is correct for IDR. Frame 4 PASS is unexplained but encouraging.
Inter-frame divergence (frame 1+) points at: reference handling for
inter prediction is still off — either order_hints[] (still zero,
VAAPI doesn't expose per-ref), or grain-applied vs pre-grain DPB
semantics, or ref_frame_idx pointing into the wrong surface space.

Next investigation: per-frame strace diff between libva and kdirect
controls payload to spot remaining field mis-mappings on inter frames.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:45:31 +00:00
claude-noether 415688dab0 Revert "iter19 α-23 TEST: skip media_request_reinit() in RequestSyncSurface"
This reverts commit aa82bffa35.
2026-05-14 09:03:37 +00:00
claude-noether aa82bffa35 iter19 α-23 TEST: skip media_request_reinit() in RequestSyncSurface
Tests mechanism 2 (REINIT clears controls between S_EXT_CTRLS and QUEUE).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:03:08 +00:00
claude-noether 7eae6eab46 iter8 Phase 6: γ env-gated CAPTURE buffer diagnostic dump
After RequestSyncSurface DQBUFs CAPTURE and marks slot DECODED, optionally
dump first/last 32 bytes of each destination_data plane plus a non-zero
count over a per-plane scan window (one MB row for plane 0, 1024 bytes
for chroma). Gated behind LIBVA_V4L2_DUMP_CAPTURE=1; default off, no
regression on existing flows.

Diagnostic for Bug 4 (H.264 partial-fill): distinguishes "kernel didn't
write" from "libva mis-reads" from "stale-residue" by inspecting the
post-DQBUF buffer state directly.

Phase 5 amendments applied:
- Amendment 1 (CRIT-1): snprintf-buffered hex line, one request_log call.
- Amendment 2 (CRIT-2): dump nested inside current_slot != NULL guard.
- Amendment 4 (IMP-3): placed between cap_pool_mark_decoded and
  status=VASurfaceDisplaying on happy path only.
- Amendment 5 (MIN-2): scan window = max(1024 chroma, bpl*16 luma).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:10:24 +00:00
claude-noether 70196f8065 fresnel-fourier iter5b-β Phase 7 fix-forward commit D: destination_* for vaapi-copy late-surface flow
Phase 7 empirical: all 5 libva codecs returned all-zero because
CreateContext's surfaces_ids[] walk was a no-op for ffmpeg-vaapi-copy
which passes surfaces_count=0 to vaCreateContext (per the iter6
comment at context.c:262). Surfaces existed in driver_data's
surface_heap but weren't in the param array → destination_* stayed
at the zero initialization from CreateSurfaces2 β → BeginPicture's
surface_bind_slot saw destination_planes_count=0 → no data
assignment → copy_surface_to_image read all-zero.

Fix: cache the format-uniform CAPTURE geometry in driver_data
(fmt_valid, fmt_planes_count, fmt_buffers_count, fmt_format_height,
fmt_sizes[], fmt_bytesperlines[]). Populate at CreateContext after
v4l2_get_format(CAPTURE). Walk surface_heap (not just surfaces_ids[])
to fill every existing surface. Add lazy-fill in CreateSurfaces2 for
surfaces created AFTER CreateContext. Invalidate cache in
DestroyContext.

New helper: surface_fill_format_uniform(driver_data, surface_object).
Idempotent on destination_planes_count != 0.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 18:52:33 +00:00
claude-noether 7055b14f5e fresnel-fourier iter5b-β Phase 6 commit C: β refactor — OUTPUT lifecycle to CreateContext + CRIT-1 + CRIT-2
Strip OUTPUT-side V4L2 device-format lifecycle out of
RequestCreateSurfaces2 entirely. Move S_FMT(OUTPUT), CAPTURE-format
probe, cap_pool_init, per-surface destination_* fill into
RequestCreateContext where config_id (and therefore the bound
VAProfile) is known via config_object->pixelformat (wired by
commit B). The α' multi-CreateSurfaces2-mid-stream failure mode
disappears because β has no in-CreateSurfaces2 teardown branch;
each context cycle does its own setup, DestroyContext handles
teardown.

Phase 5 v2 review amendments:
- CRIT-1: removed video_format==NULL early-return at context.c:64-66
  (would have rejected every first β CreateContext).
- CRIT-2: added request_pool_destroy() to DestroyContext before
  REQBUFS(0). Pre-β only surface.c's resolution-change branch
  called request_pool_destroy; β strips that, so DestroyContext
  becomes the sole per-session teardown site.
- IMP-1: probe CAPTURE format first to derive output_type from
  video_format->v4l2_mplane (eliminates the hardcoded mplane=true
  hack from the Phase 4 v2 plan).
- IMP-2: surface_reset_format_cache() deleted (function + declaration
  in surface.h + call in DestroyContext + last_output_{width,height}
  fields in request.h). All dead under β.

CreateSurfaces2 now ~50 LOC (was ~250). Pure surface ID allocation
+ per-surface lifecycle bookkeeping; no V4L2 device state touched.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 14:41:35 +00:00
claude-noether 709ab34624 Revert "fresnel-fourier iter5b Phase 6 commit C: surface.c — profile-derived OUTPUT pixel format"
This reverts commit 4b2288fa9a.
2026-05-12 12:32:56 +00:00
claude-noether 4b2288fa9a fresnel-fourier iter5b Phase 6 commit C: surface.c — profile-derived OUTPUT pixel format
Replace the hardcoded `V4L2_PIX_FMT_H264_SLICE` at surface.c:173 with
a profile-derived lookup via find_sole_active_pixelformat(). The
helper walks the config_heap; with one active config (universal across
mpv, ffmpeg, Firefox, Chromium) it returns the cached pixelformat
populated at CreateConfig in commit B. Falls back to the pre-iter5b
H264_SLICE for the pathological "zero or multiple configs" case
(probe surfaces before CreateConfig; multi-config-then-surfaces).

Extend the existing resolution-change gate to also fire on
pixelformat (codec) change. The teardown branch handles both cases
identically — REQBUFS(0) on both queues before re-S_FMT.

The kernel behavior pre-iter5b on RK3399:
- hantro: hantro_find_format(H264_SLICE) returns NULL on the RK3399
  decoder block (no H.264 support); hantro_try_fmt silently
  substitutes the first format in rk3399_vpu_dec_fmts =
  MPEG2_SLICE → codec_mode = MPEG2_DECODER. VP8 bitstream
  dispatched to MPEG2 ops → all-zero CAPTURE. MPEG-2 worked by
  accident (bitstream matched the substituted codec_mode).
- rkvdec: format/control mismatch; decoder silently drops the
  request → all-zero CAPTURE.

Same bug class as iter4 commit `692eaa0` (h264_start_code
unconditional set). Both fixes thread the active VAProfile into
codec-specific kernel state.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 09:23:31 +00:00
claude-noether 7bd0818792 iter7 Phase 7 finalization: OUTPUT-pool teardown + test refinements
Surfaced during Phase 7 verification on ohm:

1. **OUTPUT pool stale-slot bug (src/surface.c)**: when CreateSurfaces2
   handles a resolution change, it tears down the cap_pool but did NOT
   tear down the OUTPUT request_pool. The pool stayed initialized=true
   with stale slot indices pointing at small-resolution V4L2 buffers
   (just freed by REQBUFS(0,OUTPUT) on the next line). Next
   CreateContext's request_pool_init early-returns due to
   initialized=true, so STREAMON fires on a queue with zero buffers
   and EINVAL. Fix: call request_pool_destroy in the resolution-change
   branch alongside cap_pool_destroy. Mirror the cap_pool teardown.

   Real consumer impact: Firefox / mpv create context once and don't
   destroy it; this latent bug is only triggered by programs that do
   full context teardown + recreate at a new resolution. Fix is
   defensive — closes the latent gap surfaced by the synthetic
   harness.

2. **cap_pool_probe_pattern.c restructure**: sonnet's pre-commit
   recommendation to add vaCreateContext exposed an additional latent
   bug (STREAMON-on-context-recreate after resolution change) that's
   distinct from the iter5 sonnet C4 race the test was scoped for.
   Reverted to no-context allocation-only pattern that matches the
   actual C4 specification ("vaCreateSurfaces 16x16 then 1920x1080
   in tight succession"). The new STREAMON bug is logged as iter8
   candidate.

3. **run_cap_pool_probe.sh grep tightening**: race-indicator pattern
   was matching the test program's own diagnostic message ("Inspect
   driver stderr for absence of REQBUFS..."). Now grep restricts to
   lines starting with "v4l2-request:" prefix.

Phase 7 results (clean iter7 driver sha 54999017... + this fix):
- Track A (msync verify): 100 frames byte-for-byte SW=HW (sha
  58c8f3f4...) -> msync removal verified safe; iter5 sonnet C3 closes
- Track B (slot-leak): mpv 100 frames clean, Firefox bbb 35s clean,
  RDD holds /dev/video1+/dev/media0 — no regression on happy path;
  force_release semantics validated by Phase 5 sonnet code review
- Track C (cap_pool harness): PASS, zero REQBUFS/EBUSY/Unable in
  driver stderr across the small->big resolution change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:29:46 +00:00
claude-noether 988b848908 iter7: A+B+C — slot-leak fix, cap_pool harness, msync verify harness
Closes three internal carry items in one fork commit. iter6 deferred
these as TODOs; iter7 lands the implementations + supporting tests.

# Track B — slot-leak error recovery (src/)

iter6 documented the RequestSyncSurface error paths as a "bounded
leak we accept" — slots stayed busy=true after REINIT/DQBUF failures
until RequestTerminate ran. With pool=16 and rare errors this was
acceptable, but a sustained-error scenario could starve the pool.

Adds request_pool_force_release(pool, index) which:
1. Tries media_request_reinit on the slot's fd (cheap path)
2. Falls back to close + media_request_alloc (recovery)
3. Leaves the slot dead-busy if even alloc fails (other slots
   unaffected, pool capacity reduced by 1 until destroy)

Wires it into surface.c RequestSyncSurface error paths only for
errors before the OUTPUT-DQBUF attempt. After OUTPUT-DQBUF failure
the V4L2 buffer is in indeterminate kernel state, so a separate
error label (`error_buffer_indeterminate`) leaves the slot
dead-busy — reusing the slot would QBUF on a kernel-still-held
buffer and EINVAL.

Phase 5 sonnet review caught this discriminator subtlety pre-commit.

Files: request_pool.{h,c}, surface.c.

# Track C — cap_pool race synthetic harness (tests/)

iter5 sonnet C4 / iter6 candidate A: cap_pool resolution-change
race was organically exercised by YT's quality renegotiations
(iter6 close, 4 cap_pool_init events clean) but had no
deterministic regression test.

tests/cap_pool_probe_pattern.c — ~170-line C program: opens
libva display, vaCreateConfig, vaCreateSurfaces(small) +
vaCreateContext (triggers OUTPUT pool init at small resolution),
dispose, vaCreateSurfaces(big) + vaCreateContext (forces S_FMT
on the new resolution against an in-use OUTPUT pool — the actual
race-hitting path).

Phase 5 sonnet flagged that without vaCreateContext the test
would pass trivially (OUTPUT pool never init'd, REQBUFS(0) on
empty queue is a no-op). Fixed before commit.

tests/run_cap_pool_probe.sh — runner; greps driver stderr for
REQBUFS / EBUSY / "Unable to set format" race indicators.

# Track A — msync pixel-correctness verify harness (tests/)

iter5 sweep removed msync(MS_SYNC|MS_INVALIDATE) from CAPTURE
DQBUF path. iter5 sonnet C3 flagged: no formal pixel verification.

tests/run_msync_pixel_verify.sh — runs FFmpeg SW decode (libavcodec
reference) and FFmpeg HW decode (via our v4l2_request driver),
compares NV12 byte streams. Probes fixture dimensions via ffprobe
and uses crop=$W:$H after hwdownload to normalize MB-padding
artifacts (hantro pads height to 16-line align; SW returns
crop-aligned).

Phase 5 sonnet flagged the stride-mismatch false-failure risk
pre-commit. Fixed: explicit crop + diagnostic that distinguishes
genuine pixel divergence from MB-padding stride artifacts.

# Phase 5 sonnet code review

Verdict: APPROVE-WITH-CHANGES. Three actionable findings, all
addressed before this commit:
1. surface.c error path: separated OUTPUT-DQBUF-failure into
   error_buffer_indeterminate label, slot stays dead-busy
2. cap_pool_probe_pattern.c: added vaCreateContext to actually
   exercise the OUTPUT pool init at the small resolution
3. run_msync_pixel_verify.sh: explicit crop on HW path,
   stride-mismatch diagnostic distinguished from corruption

Empirical verification (Phase 6+7 deploy + run): pending operator
ohm-tools availability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:49:48 +00:00
claude-noether a09c03c154 iter6 fix: per-OUTPUT-slot request_fd binding via REINIT
iter4 (385dee1) replaced the original media_request_reinit pattern
with close+media_request_alloc per frame to escape an EINVAL on
S_EXT_CTRLS that turned out to be a DPB-payload bug (74d8dd1, FFmpeg
V4L2_H264_FRAME_REF semantics). The per-frame close+alloc model
worked for mpv vaapi-copy (single-surface recycle) but raced under
Firefox 150's MediaSource pipeline (multi-surface rotation): fd=30
got reused via lowest-free-fd allocation faster than the kernel-
side per-buffer state-machine could tear down the prior request,
producing intermittent VIDIOC_QBUF EINVAL on OUTPUT after 1..53
successful frames.

Phase 2 telemetry confirmed:
- DQBUF returned the index we passed (no FIFO mismatch)
- SPS/PPS/DECODE_PARAMS/SCALING_MATRIX byte-identical between mpv
  and Firefox first 64 bytes
- Pool size bump 4 -> 16 only delayed the failure (62 frames)
- Different OUTPUT slot indices failed across runs (race signature)

Fix: each OUTPUT pool slot owns a permanent request_fd allocated
once at request_pool_init and REINIT'd between uses in
RequestSyncSurface. 1:1 slot-to-fd binding eliminates cross-slot fd
reuse entirely. Pool stays driver-wide (multi-context safe per
iter5 Track E); slots cycle through 16 distinct fds in round-robin
acquire.

Files:
- request_pool.h: add request_fd field to slot struct; init
  signature takes media_fd
- request_pool.c: alloc per-slot fd at init, close at destroy
- context.c: pass driver_data->media_fd; pool size 4 -> 16
- picture.c: BeginPicture binds slot->request_fd to surface;
  EndPicture's per-frame media_request_alloc removed
- surface.c: RequestSyncSurface uses media_request_reinit instead
  of close+alloc; DestroySurfaces close removed (slot owns fd);
  error path close removed; surface_object NULL-init for the
  -Wmaybe-uninitialized warning fix

Empirical verification (clean build sha ebe396d5..., no diagnostic
instrumentation):
- Firefox 150 + bbb_1080p30_h264.mp4 + LIBVA_DRIVER_NAME=v4l2_request
  + sandbox enabled: 35s+ playback, zero "Unable to queue buffer"
  / "Unable to set control(s)", lsof shows RDD process holds
  /dev/video1 + /dev/media0 throughout. Driver stderr: only the
  single cap_pool_init: 24 slots ready line.
- mpv vaapi-copy 50 frames: zero errors, "Using hardware decoding
  (vaapi-copy)" - no regression vs iter5-end driver.

Pool-size bump diagnostic (Phase 5 sonnet design review feedback):
4 -> 16 alone took 1->62 frames, far short of the 30s success
criterion (~900 frames at 30fps). REINIT discipline is the actual
fix; pool 16 is comfortable headroom over typical H.264 MaxDpbFrames.

Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment
attribution corrected: cleanup runs at RequestTerminate, not
RequestDestroyContext, since the pool is driver-wide).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:30:39 +00:00
test0r c8b6edec3d iter5 sweep follow-up: remove additional DEBUG sites flagged by Phase 5 review
Phase 5 sonnet review caught four DEBUG sites the first sweep pass
missed (the vaapi-copy + --vo=null stress test didn't exercise the
ExportSurfaceHandle path, so per-frame ExportSurfaceHandle dumps went
undetected).

Removed:
- surface.c::CreateSurfaces2 format-dump (per-CreateSurfaces2 noise,
  labeled DEBUG INSTRUMENTATION (surface-export diagnosis 2026-05-04))
- surface.c::ExportSurfaceHandle full-descriptor dump (per-frame for
  consumers using DMA-BUF, also labeled DEBUG)
- surface.c::QuerySurfaceStatus -> status= line (per-call noise)
- h264.c V4L2 readback block (~67 lines): static bool readback_warned
  + the per-frame VIDIOC_G_EXT_CTRLS attempt + the readback success
  log + the "V4L2 readback unavailable" fallback announcement. With
  the iter4 fixes landed, the readback EACCES is no longer load-bearing
  to investigate — drop the block + the per-process global state.

Removing the readback block also resolves Phase 5 finding C2: the
static bool readback_warned was new mutable process-global state
introduced post-Track-E, inconsistent with that track's intent.

Net: -107 lines from src/{h264,surface}.c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:04:03 +00:00
test0r b993355507 iter5 Track E: move LAST_OUTPUT_WIDTH/HEIGHT from process-global to per-driver-data
Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate.
Two libva driver_data instances in the same process (e.g. Firefox
playing two tabs at different resolutions, or Firefox + mpv via the
same dlopened backend) would race on the static cache.

Move to struct request_data.last_output_width/height. The V4L2
device fd is already per-driver_data, so this is the correct binding
unit (one fd, one current OUTPUT format).

Verified: two concurrent mpv processes (2s stagger) both decode
300 frames cleanly with no cross-corruption. Same-instant init still
hits kernel-level fd contention on /dev/video1 (hantro is a
single-instance device); cross-process serialization is out of scope
for a libva backend.

Resolves the surface_reset_format_cache() callsite: now takes
driver_data parameter (was zero-arg).

Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls
that the iter5 sweep left behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:05:41 +00:00
test0r 843febc174 iter5 sweep: remove iter1 slice_header parse + VAPicture dump + Sync RETURN trace
h264.c:
- Remove the slice_header parse success log (the parse data is now
  forwarded into decode_params directly without per-frame echo). Keep
  the FAILED-rc log since it indicates a real decode-blocking error.
- Remove the iter1 patch-0014 VAPictureH264 byte-dump + field-read
  log block. The TopFieldOrderCnt=65536 anomaly it diagnosed was
  resolved by the POC sentinel strip (h264_strip_ffmpeg_poc_sentinel)
  that stays in the codebase.

surface.c:
- Remove the per-call "RequestSyncSurface RETURN status=" trace.
- Remove the per-call "RequestSyncSurface early-exit" trace.

v4l2.c:
- Suppress the per-frame "Unable to get control(s): Permission denied"
  log when errno == EACCES (the expected case on this hantro rig
  per iter1 patch-0014's findings). The one-time announcement in
  h264.c stays. Real EACCES-on-non-request-fd or other errno values
  still log normally.

Per-frame v4l2-request log noise drops from ~30+ lines/frame to
init-time + once-per-resolution-change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:51:10 +00:00
test0r d3a299b4cc iter5 sweep: remove iter1 patch-0010 hex-dumps + patch-0011 sentinel
picture.c: remove the 0xab sentinel write into CAPTURE buffer first
32 bytes pre-QBUF + the OUTPUT hex-dump pre-QBUF. Both were iter1
diagnostics for "where does the buffer write go?" investigation.

surface.c: remove the post-DQBUF CAPTURE Y-plane hex-dump + luma
variance signal. The msync(MS_SYNC|MS_INVALIDATE) was added as a
companion fix for the cached-mmap issue surfaced by the dump itself —
removing the dump removes the need for the msync.

With iter1+iter2+iter3+iter4 fixes landed, these dumps fire on every
single frame and produce hundreds of MB of log noise during sustained
decode. Now gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:48:31 +00:00
test0r 951233a12e iter5 sweep: remove iter1 ENTER traces (13 call sites across 4 files)
Removes the iter1 patch-0014 ENTER traces from buffer.c, image.c,
picture.c, surface.c. These were diagnostic-only entry-point logs
added during iter1's "where does Firefox RDD crash?" investigation.
With the iter1+iter2+iter3+iter4 fixes landed, the entry-point
traces are pure noise.

If a future investigation needs entry-point coverage, strace -e trace
on the libva consumer process gives equivalent visibility without
modifying the driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:47:25 +00:00
test0r 385dee1bbf iter4 fix: fresh request_fd per frame (fixes carryover EINVAL)
This is the load-bearing fix that resolves the iter1+iter2+iter3
"frame-11 EINVAL" carryover. Replace the per-surface request_fd cache
+ MEDIA_REQUEST_IOC_REINIT pattern with allocate-fresh-per-frame:
in RequestSyncSurface, after queue + wait_completion succeed, close
the request_fd and reset surface_object->request_fd = -1 so the next
BeginPicture allocates a new one via media_request_alloc.

Diagnostic root cause: per-control VIDIOC_TRY_EXT_CTRLS isolation
showed all four H.264 controls (SPS/PPS/DECODE_PARAMS/SCALING_MATRIX)
fail individually with EINVAL on the *same* request_fd that had been
through queue+wait+reinit. The fd state was bad even though every
ioctl in the previous decode cycle returned success. Allocating fresh
sidesteps any kernel-side request-state-machine subtlety we don't
fully understand.

Empirical verification (iter4 Phase 7, 90s autonomous run on ohm via
firefox-fourier without MOZ_DISABLE_RDD_SANDBOX=1, bbb_1080p30 H.264):
  - ENETDOWN count: 0
  - S_EXT_CTRLS rejected: 0  (was: fired at frame 11 every iter1-3)
  - Unable to set control(s): 0
  - Generic EINVAL: 0
  - Video stream mTime reached: 49.7 seconds
  - Audio stream mTime reached: 51.5 seconds

Cost: ~one extra MEDIA_IOC_REQUEST_ALLOC + close() per decoded frame.
Negligible (cycles below the V4L2 set_controls + queue + wait stack).

Companion fixes that landed earlier in iter4 to get to this point:
  74d8dd1 — DPB fields=V4L2_H264_FRAME_REF + skip !used entries
            (matches FFmpeg's libavcodec/v4l2_request_h264.c semantics)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:41 +00:00
test0r 19acc76da4 iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.

Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
  (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
  {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
  oldest FREE slot (LRU), binds it for the decode cycle, and the slot
  cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
  vaDestroySurfaces.

Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
  exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
  16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
  cap_pools -- iter3 scope).

Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
  EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
  cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
  path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
  NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).

Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:03:31 +00:00
test0r e64bb0852d iter2 Fix 2: conditional DRM_FORMAT_MOD_INVALID for non-64-aligned pitch
Iteration 2 Fix 2: branch on bytesperline alignment when setting
the drm_format_modifier in RequestExportSurfaceHandle. For
non-64-byte-aligned pitches (e.g. 864 for 864-wide videos),
report DRM_FORMAT_MOD_INVALID instead of DRM_FORMAT_MOD_NONE
(LINEAR explicit). Mesa's WSI rejects LINEAR buffers that
aren't scanout-aligned with 'WSI pitch not properly aligned';
MOD_INVALID tells the importer to treat as texture-only, which
is the correct behavior for buffers that don't meet scanout
alignment requirements.

Diagnosis from operator's mozilla.org session in iteration 1
close: 864-wide intro videos triggered the WSI alignment error
and Firefox fell back to SW for those videos.

Sonnet Phase 5 review endorsed the conditional approach over a
universal MOD_INVALID change to preserve LINEAR semantics for
already-aligned content (avoids unnecessary perf cost on the
common 1920-wide case).

Verification path (Phase 7 of iteration 2): Firefox loads
mozilla.org main page; check no MESA WSI errors in stderr;
operator confirms intro videos engage HW decode (or at least
don't fall back).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:18:55 +00:00
test0r 06beef6248 iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path
Fix 1 of iteration 2 per phase4_iter2_plan.md.

Adds surface_reset_format_cache() exposed from src/surface.h. Called
from RequestDestroyContext after the dual REQBUFS(0). Without this,
multi-video Firefox sessions on mozilla.org corrupted the next
session's CAPTURE format query: the kernel reset to defaults but
our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,'
so the next G_FMT returned 48x48 and the exported descriptor
encoded wrong pitch/offset.

Also adds REQBUFS(0) on CAPTURE in the resolution-change path of
RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The
existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting;
hantro derives CAPTURE format from OUTPUT format, so leftover
CAPTURE buffers from the prior resolution would also block the
implicit format change. Pre-existing bug surfaced by Sonnet's
audit; Fix 3 pool refactor would have exposed it more often.

Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/
HEIGHT cache is a static process-global, so concurrent multi-
context use still races (Sonnet 7.3 / 9.6). Iteration 2 only
addresses sequential sessions. Multi-context safety is iteration 3+.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:11:03 +00:00
test0r ac891a01fa surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle
Firefox 150's RDD calls vaExportSurfaceHandle with the
VA_EXPORT_SURFACE_SEPARATE_LAYERS flag (per FFmpegVideoDecoder.cpp
GetVAAPISurfaceDescriptor at the libva-VAAPI export site). With
that flag, libva consumers expect 2 separate layers — Y as
DRM_FORMAT_R8, UV as DRM_FORMAT_GR88, each with num_planes=1 —
not the COMPOSED single-layer-with-2-planes shape we always
returned regardless of flags.

Our previous code ignored the flag parameter and always built
the COMPOSED descriptor. mpv works with that because mpv passes
the default (COMPOSED) flag and the shape matches. Firefox's
DMABufSurfaceYUV import code parsed our COMPOSED descriptor as
if it were SEPARATE, found bogus layer-1 data, silently fell
back to FFmpeg(FFVPX) software decode after frame 0.

Fix: branch on the flag and build the appropriate descriptor.

  flags & VA_EXPORT_SURFACE_SEPARATE_LAYERS:
    num_layers=2
    layers[0] = Y as DRM_FORMAT_R8, num_planes=1
    layers[1] = UV as DRM_FORMAT_GR88, num_planes=1

  default (COMPOSED, including unflagged):
    num_layers=1, drm_format=DRM_FORMAT_NV12, num_planes=2
    (existing behavior, preserved for mpv et al.)

For the single-fd case (hantro NV12 backed by one CMA buffer),
both layers reference object_index=0 with different offsets and
pitches (both stride=1920 for 1920x1088).

Diagnosed via Firefox source dive (mozilla/gecko-dev master,
dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:1638) — the
explicit flag in the export call was the discriminator between
mpv's success and Firefox's silent SW fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:32:12 +00:00
test0r fdfee2d661 DEBUG: log SyncSurface RETURN to confirm clean exit before crash 2026-05-04 15:08:25 +00:00
test0r 7da2b27454 DEBUG: ENTER logging at libva entry points to trace Firefox call flow
Adds request_log on entry to:
- RequestSyncSurface
- RequestQuerySurfaceAttributes
- RequestQuerySurfaceStatus (including the returned status value)
- RequestDeriveImage
- RequestQueryImageFormats
- RequestGetImage

Goal: identify which API call Firefox 150 makes that returns
differently than it expects, causing the SW fallback after
frame 0. mpv works end-to-end with the surface-export fix in
place; Firefox does not. Per operator's correction: don't assume
mpv's success means the driver is correct — Firefox may detect
a real spec violation that mpv silently tolerates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:17:52 +00:00
test0r 37c0e720fc surface: re-set OUTPUT format on resolution change
The static SET_FORMAT_OF_OUTPUT_ONCE flag pinned the OUTPUT format
to the first call's dimensions, which for mpv's probe pattern means
128x128. Subsequent CreateSurfaces2 for the real 1920x1088 resolution
would then read CAPTURE format from the kernel (which derives from
the OUTPUT format) and get 128x128 sizes back — leading to a
VADRMPRIMESurfaceDescriptor with width=1920 height=1088 but pitch=128
offset=16384. Mesa's WSI rejected this as 'pitch too small,' and the
mpv vaapi --vo=gpu render landed on a solid blue frame. Same root
cause for Firefox 150's SW fallback after frame 0.

Replace SET_FORMAT_OF_OUTPUT_ONCE with LAST_OUTPUT_WIDTH/HEIGHT
tracking. When dimensions change, call REQBUFS(0) on OUTPUT to drop
any stale buffers (S_FMT is rejected by V4L2 while buffers exist),
then re-S_FMT at the new resolution. The kernel will derive the
new CAPTURE format from this OUTPUT format on the next CreateBufs
+ G_FMT cycle.

Caveat (TODO for next iteration): for consumers that legitimately
stream multiple resolutions in sequence (mid-stream resolution
change via V4L2_EVENT_SOURCE_CHANGE), the current approach still
requires CreateSurfaces2 to be called, which mpv does on probe.
A proper context-level redesign would handle SOURCE_CHANGE inline
with STREAMOFF + REQBUFS(0) + new S_FMT.

Diagnosis and root cause: surfaced by 2026-05-04 Phase 5 sonnet
review (finding 7.3) as a 'latent bug to document.' Today's
instrumentation captured it as the active bug — the
ExportSurfaceHandle dump showed pitch=128 for 1920x1088 surfaces
right before MESA reported 'WSI pitch too small' and dropped to
software.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:05:16 +00:00
test0r 2517a1206b DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis
Logs format_width/height + bytesperline + sizes from v4l2_get_format
in CreateSurfaces2, and the full VADRMPRIMESurfaceDescriptor in
ExportSurfaceHandle (fd, fourcc, width/height, num_objects/layers,
obj.size + drm_format/modifier, plane offsets/pitches).

Diagnostic for the surface-export bug surfaced by Phase 7 (mpv
--hwdec=vaapi --vo=gpu shows solid blue, Firefox falls back to SW
after frame 0 — both consumers GL-import the DMA-BUF, both fail
to render correctly while vaapi-copy works).

Phase 5 review (sonnet) suggested format_height might be 1080
(stream) vs 1088 (MB-aligned), miscomputing UV offset by 15360
bytes. Earlier ftrace shows kernel returns height=1088 — the
hypothesis is likely false but verifying in-driver to confirm.

Will compare with mpv --msg-level=vd=v --msg-level=vo=v output to
identify the import-side discrepancy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:00:13 +00:00
test0r a047926dbc DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback
Tier 3E + 3F observability hardening from the libva-multiplanar
campaign Phase 6 follow-up. Improves diagnostic reliability for
future probes; no functional decode path change.

Tier 3E (cache-fix): patch-0010's CAPTURE Y-plane dump now calls
msync(p, 32, MS_SYNC|MS_INVALIDATE) before the read so userspace
sees what the kernel actually DMA-wrote, not a stale CPU cache
line. Without this, the previous version of the dump consistently
showed the patch-0011 sentinel (0xab) even when the kernel had
overwritten it — caused half a day of mistaken "kernel never wrote
the buffer" diagnosis. Also computes a luma min/max/variance
signal so a uniform fill (variance=0) is visually obvious vs
real pixel data (variance > 0).

Tier 3F (VIDIOC_G_EXT_CTRLS readback): after v4l2_set_controls
in h264_set_controls, reads back DECODE_PARAMS + PPS via
v4l2_get_controls (added by patch 0003) on the request fd.
Logs key fields:

  dec.idr_pic_id, poc_lsb, refmark_bits, poc_bits  — confirms
    slice-header parser outputs landed in the V4L2 control batch.
  dec.top_foc / bot_foc                             — confirms
    patch-0015 POC sentinel strip actually applied (should NOT
    show 65536 unless the strip mis-fired).
  dec.frame_num                                     — cross-checks
    against VAAPI's pre-parsed frame_num (also already logged by
    patch 0014).
  pps.flags + (SMP=...)                             — confirms
    SCALING_MATRIX_PRESENT bit set this build.
  pps.refidx_l0/l1                                  — confirms
    Tier 1B num_ref_idx writes landed.

Discriminates "we wrote X but kernel saw Y" from "we wrote zero
all along" — the failure mode the original patch series didn't
catch when slice-header bit_size fields were left zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:58:52 +00:00
test0r 3609fbb425 DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame
Diagnostic-only patch (NOT for upstream). Hex-dumps:
  - First 32 bytes of OUTPUT buffer at QBUF time in
    picture.c::RequestEndPicture (i.e. what we feed the kernel)
  - First 32 bytes of CAPTURE Y-plane after DQBUF in
    surface.c::RequestSyncSurface (i.e. what kernel returned)

Lets us see whether:
  - OUTPUT bitstream begins with valid ANNEX_B start code + NAL
    header byte (e.g. `00 00 01 65` for IDR slice)
  - CAPTURE Y-plane after decode contains varied luma data
    (working) vs. all-zeros / repeating pattern (kernel didn't
    write anything).

Removed once Step 1 decode is verified working. Output goes via
existing request_log() to stderr.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 597e896594 surface: don't VIDIOC_S_FMT the CAPTURE queue
The hantro stateless decoder derives the CAPTURE format from the
SPS attached to the per-request OUTPUT controls. Calling
VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can
leave the driver's vb2 state in an inconsistent configuration
where the queue accepts buffers and DQBUF returns successfully but
the kernel never actually writes decoded pixels into them.

Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/
gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side
(via gst_v4l2_decoder_negotiate_src_format and friends). The
same code path produces correctly-decoded NV12 frames on the
same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT
emits flat-green zeroed CAPTURE buffers.

The v4l2_get_format() call immediately after this block already
gives us the bytesperline / sizes the driver chose; nothing else
in this file consumed the explicit S_FMT side-effects.

Empirical hypothesis test for the lingering "kernel decodes
without errors but emits zeroed CAPTURE" bug. If post-patch
output shows actual picture content, this confirms the
diagnosis: explicit CAPTURE format mutation breaks hantro's
internal state. If output remains flat-green, the bug is
elsewhere and we resume hex-dump-grade instrumentation.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 565f5c0de4 context: introduce request_pool, decouple OUTPUT buffers from surfaces
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
Replaces the prior per-surface OUTPUT-buffer ownership model with a
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
in flight), allocated unconditionally regardless of caller's
num_render_targets.

Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
"output_count == surfaces_count" model was a category error: OUTPUT
buffers are request-time bitstream slots, CAPTURE buffers are
picture-time DPB slots; their lifecycles and sizing are independent.

Changes:
  * NEW src/request_pool.{c,h} (~200 LoC):
      - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
      - request_pool_destroy(): munmap all, idempotent.
      - request_pool_acquire(): round-robin claim; returns V4L2 buffer
        index of an unused slot or -1.
      - request_pool_release(): mark slot free for reuse.
      - request_pool_slot(): accessor for ptr/size given a buffer index.

  * src/request.h: add struct request_pool output_pool to request_data.

  * src/context.c::RequestCreateContext: replace the per-surface
    OUTPUT loop with a single request_pool_init() call (count=4,
    independent of surfaces_count). Drop the now-unused locals
    (length, offset, source_data, output_buffers_count, index,
    index_base, i, surface_object). DELETES patch 0002's
    "output_buffers_count = ... ? ... : 4" hack inline — the pool's
    own count parameter supersedes it.

  * src/picture.c::RequestBeginPicture: borrow a pool slot at frame
    start, write its mmap pointer/size/index into the surface's
    transient source_* fields. The fields stay (still useful as
    a borrow handle that the existing codec_store_buffer memcpys
    target), but no longer represent surface-permanent ownership.
    Reset slices_size/slices_count here too (was implicit on first
    Render).

  * src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
    the OUTPUT buffer, release the pool slot and clear the surface's
    borrow handle. Fixes the segv on second-frame submission.

  * src/surface.c::RequestDestroySurfaces: remove the munmap of
    source_data — pool owns the mmap.

  * src/request.c::RequestTerminate: call request_pool_destroy()
    before close(video_fd) so munmaps still target a valid fd.

  * src/meson.build: add request_pool.c and request_pool.h to the
    sources/headers lists.

This commit removes 0002's OUTPUT-pool hack inline (the
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
remains until commit 4 lands.

Build-verified clean on aarch64.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 10114f6781 mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu
Fourier's local patch already wired multiplanar plumbing through
src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69,
struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane
EXPBUF loop at line 411) and through src/context.c, src/buffer.c,
src/picture.c via the v4l2_type_video_{output,capture}(video_format
->v4l2_mplane) helper calls.

The remaining gap: the NV12 entry in src/video.c was hardcoded to
v4l2_mplane=false, and the bootstrap path in src/surface.c was
hardcoded to singleplanar literals before video_format is populated.

This patch flips the NV12 entry to v4l2_mplane=true and updates the
two singleplanar literals in src/surface.c to their MPLANE variants:

  - src/video.c:42  v4l2_mplane=false -> true (NV12 only;
    Sunxi-tiled NV12 left at false for cedrus compatibility)
  - src/surface.c:84  output_type = v4l2_type_video_output(true)
  - src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12)

Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type
returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer
strace baseline).

Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are
left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false,
__arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the
known userbase here is RK3566/RK3568 hantro.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r c45fea96e3 fourier-local: stateless control modernization + HEVC strip
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:

- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
  V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
  (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
  passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
  (kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
  build path is excluded from this fork (RK3568 hantro G1/G2 has
  no HEVC, and the kernel-side HEVC controls have a separate
  rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
  source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
  the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
  includes it; the passthrough shim's CID aliases live in the
  kernel header now).

Functionally equivalent to the prior fork master commits:
  c1f5108 V4L2_PIX_FMT_H264_SLICE rename
  4ccbfe9 Strip HEVC build path
  da9f2a5 include/h264-ctrls.h passthrough + CID aliases
  fc4bb10 src/h264.c track upstream UAPI shape
  13e9b64 src/h264.c drop num_slices field
  4d14ffb src/tiled_yuv.S aarch64 stub
  1b02c9b src/h264.c include utils.h

Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:40:14 +00:00
Nicolas Dufresne b8ac9bb9ea surface: Only set format if unset
The vaCreateSurface2 may be called multiple times, setting the format
again would lead to EBUSY being returned as you cannot change the
format if you have buffers allocated.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
2019-05-16 16:14:55 +02:00
Paul Kocialkowski 518d7a0c59 Update and harmonize heading author lists
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Ezequiel Garcia 2c27ec3794 Add settable attributes to pixelformats
Apparently, pixelformats are expected to be settable although
the reason is not exactly clear to me.

However, intel vaapi driver sets all its pixelformats as
settable, and gstreamer-vaapi expects that as well.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
2018-10-12 16:13:05 -03:00
Paul Kocialkowski 7ff2543e64 Add support for the single-planar V4L2 API
Signed-off-by: Paul Kocialkowski <contact@paulk.fr>
2018-09-07 16:43:13 +02:00
Paul Kocialkowski 25a8ac4d7e Register video format directly instead of tiled indicator
Signed-off-by: Paul Kocialkowski <contact@paulk.fr>
2018-09-07 12:58:44 +02:00
Paul Kocialkowski c9327dd55a Grab the base index when allocating buffers and mapping them
Because there might be more than a single call to CreateSurfaces,
we cannot assume that the index relative to the number of surfaces
requested in a single call matches the v4l2 index.

Grab the base index (as returned by the kernel) when allocating
buffers and use it for memory mapping and addressing them in v4l2.
This avoids memory-mapping the first (index 0) buffer multiple times
in that scenario instead of the n-th allocated buffer (in the n-th
call in the sequence).

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-20 13:48:51 +02:00
Paul Kocialkowski 3b0e7dbf12 surface: Avoid unitialized variable compiler warning
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-20 13:47:48 +02:00
Paul Kocialkowski d2357862f8 Rename request_buffer helper to query_buffer
Since the V4L2 ioctl is called QUERYBUF, it makes more sense to
call the associated function with the same name.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-20 13:47:48 +02:00
Paul Kocialkowski c2fb5683cf surface: Remove duplicate request fd close
This removes a duplicate conditional close of the request fd.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-20 13:47:48 +02:00
Paul Kocialkowski c764527c17 Add support for QuerySurfaceAttributes
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-18 15:07:42 +02:00
Paul Kocialkowski 7587ef6901 surface: Add ExportSurfaceHandle support for dma-buf export
This is the latest version of dma-buf export, that does support
specifying DRM modifiers.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-18 15:02:37 +02:00
Paul Kocialkowski 829abae895 surface: Add basic support for CreateSurfaces2
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-18 14:40:36 +02:00
Maxime Ripard 92f6546596 tree: Remove void * casts
void * can be assigned from and stored to any pointer type without any
warning. Remove the explicit casts.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 111f5b209a tree: Rename cedrus_data to request_data
The cedrus_data structure carries the old name. In order to migrate to the
new name, let's rename it to request_data.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 4ad990e087 tree: Rename the header and defines
The sunxi_cedrus.h header contains a bunch of defines prefixed with
SUNXI_CEDRUS.

As part as the ongoing migration to a more generic name, change that prefix
for V4L2_REQUEST, and the header file to request.h

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 913e1e642c tree: Rename the libva hooks
As part of our renaming effort, Rename the libva hooks names to mention
request instead of SunxiCedrus

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 2d1bce38c2 h264: Don't set num_slices anymore
The num_slices parameter was improperly set to the number of reference
frames, which is incorrect.

Add a counter for the number of slices per surface, and set num_slices to
that value.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard 5aeb07f8bf tree: Run clang-format to conform to the kernel coding style
The coding style has been a bit erratic. Enforce the linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 10:12:15 +02:00