Commit Graph

306 Commits

Author SHA1 Message Date
claude-noether c106d95869 fresnel-fourier iter7 Phase 6: auto-detect with decoder-entity discrimination (B1a)
Refactor request.c::find_video_node_via_topology to
find_decoder_video_node_via_topology — walks media-topology entities
looking for MEDIA_ENT_F_PROC_VIDEO_DECODER function, then follows the
kernel's link graph (data link from proc to IO entity, interface link
from IO entity to V4L_VIDEO interface) to the correct /dev/videoN.

Two-pass find_codec_device: pass 1 accepts only "rkvdec" (multi-codec
decoder, 3 of 5 codecs); pass 2 accepts any known_decoder_drivers
entry. Pre-iter7 the walk picked whichever media device matched the
hantro-vpu driver name first — which on RK3399 could be the encoder
half of the same media device, surfacing as an empty profile list.

Phase 5 amendments incorporated:
- CRIT-1: use MEDIA_LNK_FL_INTERFACE_LINK (1U<<28) to discriminate
  interface vs data links.
- CRIT-2: check both source_id and sink_id of each link.
- IMP-3: 2-call MEDIA_IOC_G_TOPOLOGY pattern (allocate all 3 arrays
  before second call); pre-iter7 had a spurious memset + third call.

iter4-B1b (multi-decoder routing — open BOTH rkvdec AND hantro from
one backend instance) still deferred. Post-iter7 MPEG-2/VP8 (hantro)
still need LIBVA_V4L2_REQUEST_VIDEO_PATH override.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-13 09:38:54 +00:00
claude-noether 70196f8065 fresnel-fourier iter5b-β Phase 7 fix-forward commit D: destination_* for vaapi-copy late-surface flow
Phase 7 empirical: all 5 libva codecs returned all-zero because
CreateContext's surfaces_ids[] walk was a no-op for ffmpeg-vaapi-copy
which passes surfaces_count=0 to vaCreateContext (per the iter6
comment at context.c:262). Surfaces existed in driver_data's
surface_heap but weren't in the param array → destination_* stayed
at the zero initialization from CreateSurfaces2 β → BeginPicture's
surface_bind_slot saw destination_planes_count=0 → no data
assignment → copy_surface_to_image read all-zero.

Fix: cache the format-uniform CAPTURE geometry in driver_data
(fmt_valid, fmt_planes_count, fmt_buffers_count, fmt_format_height,
fmt_sizes[], fmt_bytesperlines[]). Populate at CreateContext after
v4l2_get_format(CAPTURE). Walk surface_heap (not just surfaces_ids[])
to fill every existing surface. Add lazy-fill in CreateSurfaces2 for
surfaces created AFTER CreateContext. Invalidate cache in
DestroyContext.

New helper: surface_fill_format_uniform(driver_data, surface_object).
Idempotent on destination_planes_count != 0.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 18:52:33 +00:00
claude-noether 7055b14f5e fresnel-fourier iter5b-β Phase 6 commit C: β refactor — OUTPUT lifecycle to CreateContext + CRIT-1 + CRIT-2
Strip OUTPUT-side V4L2 device-format lifecycle out of
RequestCreateSurfaces2 entirely. Move S_FMT(OUTPUT), CAPTURE-format
probe, cap_pool_init, per-surface destination_* fill into
RequestCreateContext where config_id (and therefore the bound
VAProfile) is known via config_object->pixelformat (wired by
commit B). The α' multi-CreateSurfaces2-mid-stream failure mode
disappears because β has no in-CreateSurfaces2 teardown branch;
each context cycle does its own setup, DestroyContext handles
teardown.

Phase 5 v2 review amendments:
- CRIT-1: removed video_format==NULL early-return at context.c:64-66
  (would have rejected every first β CreateContext).
- CRIT-2: added request_pool_destroy() to DestroyContext before
  REQBUFS(0). Pre-β only surface.c's resolution-change branch
  called request_pool_destroy; β strips that, so DestroyContext
  becomes the sole per-session teardown site.
- IMP-1: probe CAPTURE format first to derive output_type from
  video_format->v4l2_mplane (eliminates the hardcoded mplane=true
  hack from the Phase 4 v2 plan).
- IMP-2: surface_reset_format_cache() deleted (function + declaration
  in surface.h + call in DestroyContext + last_output_{width,height}
  fields in request.h). All dead under β.

CreateSurfaces2 now ~50 LOC (was ~250). Pure surface ID allocation
+ per-surface lifecycle bookkeeping; no V4L2 device state touched.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 14:41:35 +00:00
claude-noether cc077a0c06 fresnel-fourier iter5b-β Phase 6 commit B: config.c — wire object_config->pixelformat
Populate the previously-dead pixelformat field at config.h:46 from
pixelformat_for_profile(profile). The switch at lines 54-88 already
rejects unsupported profiles, so by the time we reach the assignment
at line 98, pixelformat_for_profile returns non-zero.

Commit C reads this field at CreateContext to set the V4L2 OUTPUT
format correctly per profile (the β architectural fix for Bug 2 —
HEVC/VP9/VP8 currently dispatch through the pre-iter5b H264_SLICE
hardcode at surface.c:173 because surface.c has no config_id to look
up the profile).

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 14:11:32 +00:00
claude-noether 1c548b136a fresnel-fourier iter5b-β Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper
Re-introduce after the iter5b-α' revert. Helper maps VAProfile to V4L2
OUTPUT-side FOURCC, used at CreateConfig in commit B to populate the
previously-dead object_config->pixelformat field. β reads from there
at CreateContext (commit C).

Single source of truth for the profile→pixelformat mapping; mirrors
the per-profile probes in config.c::RequestQueryConfigProfiles
(lines 138-188).

Register codec.c in meson.build sources, codec.h in headers.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 14:10:46 +00:00
claude-noether 6bc29ec582 Revert "fresnel-fourier iter5b Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper"
This reverts commit ce304ef5af.
2026-05-12 12:32:57 +00:00
claude-noether 9a7f888f1b Revert "fresnel-fourier iter5b Phase 6 commit B: state-tracking — request.h field + config.c wire-up"
This reverts commit f8256e6c2d.
2026-05-12 12:32:57 +00:00
claude-noether 709ab34624 Revert "fresnel-fourier iter5b Phase 6 commit C: surface.c — profile-derived OUTPUT pixel format"
This reverts commit 4b2288fa9a.
2026-05-12 12:32:56 +00:00
claude-noether 4b2288fa9a fresnel-fourier iter5b Phase 6 commit C: surface.c — profile-derived OUTPUT pixel format
Replace the hardcoded `V4L2_PIX_FMT_H264_SLICE` at surface.c:173 with
a profile-derived lookup via find_sole_active_pixelformat(). The
helper walks the config_heap; with one active config (universal across
mpv, ffmpeg, Firefox, Chromium) it returns the cached pixelformat
populated at CreateConfig in commit B. Falls back to the pre-iter5b
H264_SLICE for the pathological "zero or multiple configs" case
(probe surfaces before CreateConfig; multi-config-then-surfaces).

Extend the existing resolution-change gate to also fire on
pixelformat (codec) change. The teardown branch handles both cases
identically — REQBUFS(0) on both queues before re-S_FMT.

The kernel behavior pre-iter5b on RK3399:
- hantro: hantro_find_format(H264_SLICE) returns NULL on the RK3399
  decoder block (no H.264 support); hantro_try_fmt silently
  substitutes the first format in rk3399_vpu_dec_fmts =
  MPEG2_SLICE → codec_mode = MPEG2_DECODER. VP8 bitstream
  dispatched to MPEG2 ops → all-zero CAPTURE. MPEG-2 worked by
  accident (bitstream matched the substituted codec_mode).
- rkvdec: format/control mismatch; decoder silently drops the
  request → all-zero CAPTURE.

Same bug class as iter4 commit `692eaa0` (h264_start_code
unconditional set). Both fixes thread the active VAProfile into
codec-specific kernel state.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 09:23:31 +00:00
claude-noether f8256e6c2d fresnel-fourier iter5b Phase 6 commit B: state-tracking — request.h field + config.c wire-up
request.h: add last_output_pixelformat to struct request_data, alongside
the existing last_output_{width,height} V4L2 device state cache. Gates
re-S_FMT on codec change in addition to resolution change.

config.c::RequestCreateConfig: wire up object_config->pixelformat
(previously dead field at config.h:46) by calling pixelformat_for_profile
on the active profile. The pixelformat field becomes the source of truth
that surface.c reads in commit C.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 09:08:33 +00:00
claude-noether ce304ef5af fresnel-fourier iter5b Phase 6 commit A: NEW src/codec.{h,c} — pixelformat_for_profile helper
Add a small helper that maps a VAProfile to its V4L2 OUTPUT-side
pixel format FOURCC. Single source of truth, mirrors the per-profile
probes in config.c::RequestQueryConfigProfiles (lines 138-188).

Used by commits B + C in this series:
- commit B: populate object_config->pixelformat at CreateConfig
- commit C: surface.c reads the populated field to set OUTPUT format
  per-profile instead of hardcoded H264_SLICE

Register in meson.build sources + headers.

Signed-off-by: claude-noether <claude-noether@reauktion.de>
2026-05-12 09:04:02 +00:00
claude-noether 692eaa0053 fresnel-fourier iter4 Phase 7 fix-forward: gate ANNEX-B start-code prepend on H.264/HEVC profiles
Root cause for VP9 criterion-4 failure traced via runtime
instrumentation: context.c:194 unconditionally set
context_object->h264_start_code = true for every CreateContext,
regardless of codec profile. picture.c:70 then prepends 0x00 0x00 0x01
(ANNEX-B start code) to ALL slice data including VP9 frames.

VP9 has no start codes — its uncompressed_header begins with the raw
frame_marker byte (0x10 in the high 2 bits). The 3-byte prefix
shifted the rkvdec driver's bitstream-read by 24 bits, producing a
silent decode failure (frame_marker mismatch -> driver fails to
locate a valid frame -> CAPTURE slot stays at cap_pool init pattern,
the dim 0x4c green visible in Phase 7 hwdownload PNGs).

iter4 fix: switch on config_object->profile in RequestCreateContext.
Set h264_start_code = true only for VAProfileH264* and VAProfileHEVCMain.
False for MPEG2/VP8/VP9.

iter1 (MPEG-2) and iter3 (VP8) had this same bug latent — they passed
because their criterion-4 verification used different paths (iter1
direct readback was small enough to mask, iter3 used transitive proof
not pixel comparison). The Phase 7 byte-level pixel comparison is what
exposed it.

Empirical proof of the fix on fresnel:
- pre-fix submission FRAME control bytes 0-23: lf.flags=0x01 (only
  DELTA_ENABLED), base_q_idx=0x41 — bit-misaligned because parser was
  reading the prefix bytes.
- post-fix submission FRAME control bytes 0-23 byte-match Phase 3
  kernel-direct anchor: lf.flags=0x03 (ENABLED|UPDATE), base_q_idx=0x2e
  (46). Transitive-proof leg 1 (backend-payload == kernel-direct-payload)
  satisfied for the keyframe.
- s(6) bit-width fix in vp9.c (4 mag + 1 sign -> 6 mag + 1 sign per
  VP9 spec) was a real bug too, latent because Bug 1 (this commit's fix)
  prevented its code path from running. Both fixes ship together.

Pixels still produce 0x4c constant pattern post-fix — that is Bug 2
(substrate-wide cap_pool readback regression on
linux-fresnel-fourier 7.0-1) per phase7_iter4_verification.md.
Bug 2 is out of iter4 scope per Option-A choice; transitive proof
remains the criterion-4 verification path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 09:50:25 +00:00
claude-noether beaa914680 fresnel-fourier iter4 Phase 6 commit C: picture.c VP9 dispatch + 2 buffer-type cases
5 sites:
1. include block: add #include "vp9.h".
2. codec_set_controls: add VP9 case calling vp9_set_controls().
3. codec_store_buffer VAPictureParameterBufferType: VP9 inner case
   memcpy'ing into surface_object->params.vp9.picture.
4. codec_store_buffer VASliceParameterBufferType: VP9 inner case
   memcpy'ing into surface_object->params.vp9.slice.
5. (No reset in RequestBeginPicture — VP9 has no iqmatrix_set/
   probability_set-style flag, Picture/Slice are unconditionally
   populated by VAAPI consumer per frame.)

Per Phase 2 B12: NO buffer.c changes — VP9 uses Picture+Slice+Data
which are already in the iter3 allow-list. Per memory
feedback_runtime_enumerates_allowlists.md plan for Commit D
fix-forward if a runtime miss surfaces; predicted clean.

Verified end-to-end on fresnel:
- vainfo enumerates VAProfileVP9Profile0 alongside H.264 + HEVC.
- LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi VP9 decode
  exits 0 (criterion 3 PASS): 5 frames decoded at 0.307x speed,
  cap_pool_init OK, no kernel ioctl errors.
- mpv vp9-vaapi engagement still SW-fallback (iter4-B2 backlog —
  mpv-DRM device-create path doesn't honor LIBVA_DRIVER_NAME the
  way ffmpeg-vaapi does; investigation deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:48:49 +00:00
claude-noether 406d08e122 fresnel-fourier iter4 Phase 6 commit B: NEW src/vp9.c + src/vp9.h + meson.build + context.h (vp9_lf) + surface.h (params.vp9)
VP9 codec dispatcher implementing 12 contract clauses against
V4L2_CID_STATELESS_VP9_FRAME (0xa40a2c) +
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (0xa40a2d). 2 batched
controls per frame; rkvdec on RK3399 mandatorily requires both
per drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble:752.

Implementation:
- ~80 LOC VPX range coder (vp9_rac_*) — minimal port of FFmpeg
  vpx_rac.[ch] + vp89_rac.h. Stateless static helpers.
- inv_map_table[255] + read_prob_delta — verbatim copy from
  v4l2_request_vp9.c:44-97.
- vp9_parse_uncompressed_header_lf_quant — partial parse for the
  fields VAAPI doesn't expose: lf_delta_enabled / lf_delta_update /
  lf_ref_delta[4] / lf_mode_delta[2] / base_q_idx /
  delta_q_y_dc / delta_q_uv_dc / delta_q_uv_ac. ~120 LOC.
- vp9_fill_compressed_hdr — port of FFmpeg fill_compressed_hdr
  with Phase 5 C3 out_reference_mode parameter. ~140 LOC.
- vp9_set_controls — orchestrates Clauses 1+2+4+5+7+10+11+12.
  ~120 LOC.

Phase 5 amendments incorporated in code:
- C1: frame.interpolation_filter = direct from VAAPI's
  mcomp_filter_type (NO XOR; vaapi_vp9.c:62 already applied it
  before storing into VAAPI's mcomp_filter_type).
- C2: persistent vp9_lf state added to object_context (in
  context.h). Initialized to VP9 spec defaults
  {1,0,-1,-1,0,0} on keyframe / intra_only / error_resilient.
  Updated only when parser sees lf_delta.update=1. Always
  copied to kernel control.
- C3: vp9_fill_compressed_hdr takes uint8_t *out_reference_mode;
  threaded through call site. allowcompinter derived from VAAPI
  sign-bias bits.

Phase 5 S4: uv_mode memcpy from FFmpeg's fill_compressed_hdr
omitted — rkvdec reads uv_mode from kernel's persistent
probability_tables, NOT from prob_updates ctrl.

Clause 3 compile-time _Static_assert on struct sizes (168/2040)
matches Phase 3 empirical baseline; UAPI shifts will fail loudly.

surface.h: extends params union with vp9 { picture, slice }.
context.h: adds vp9_lf { ref_deltas[4], mode_deltas[2], initialized }.
meson.build: adds vp9.c + vp9.h.

Build: clean on fresnel (linux-fresnel-fourier 7.0-1, libva 1.23).
Runtime: not yet wired in picture.c — next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:46:11 +00:00
claude-noether 16b397305d fresnel-fourier iter4 Phase 6 commit A: VP9 enumeration + dispatch in config.c
3 sites:
1. RequestQueryConfigProfiles: probe V4L2_PIX_FMT_VP9_FRAME against
   single + MPLANE OUTPUT formats; advertise VAProfileVP9Profile0.
2. RequestCreateConfig: VAProfileVP9Profile0 case (no profile-specific
   validation; defer to vaCreateContext / control submission time).
3. RequestQueryConfigEntrypoints: add VAProfileVP9Profile0 to the
   VAEntrypointVLD fall-through.

Verified on fresnel: vainfo (auto-detect rkvdec) now shows
VAProfileVP9Profile0 alongside H.264x5 + HEVCMain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:34:14 +00:00
claude-noether 7f8fa93213 fresnel-fourier iter4 Phase 6 commit Z: device-path auto-detect via media controller topology
Pre-iter4 backend hardcoded /dev/video0 + /dev/media0 as defaults
when no env override was set. Linux 7.0 udev/probe order changed,
rockchip-rga (RGB color converter, no codec) now claims
/dev/video0 — legacy default returns empty profile list.

Discovery is driven by the media controller graph (the canonical
v4l2-request approach). NOT a /dev/video* walk by enumeration
order — that mispairs video and media nodes when one driver
registers multiple media devices, and depends on probe-order
luck.

Algorithm:
  1. Walk /dev/media0..15. MEDIA_IOC_DEVICE_INFO names the driver.
     Match against {rkvdec, hantro-vpu, cedrus, sun4i_csi}.
  2. MEDIA_IOC_G_TOPOLOGY enumerates the entity/interface graph.
     The MEDIA_INTF_T_V4L_VIDEO interface carries major:minor of
     the V4L2 video node owned by THIS media controller — paired
     by the kernel, not by /dev/* enumeration order.
  3. Resolve major:minor to /dev/videoN via /sys/dev/char/<M>:<N>
     (the kernel's char-device sysfs symlink whose basename is
     the device node name).

LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 escape hatch reverts to legacy
/dev/video0 + /dev/media0 hardcoded behavior for callers that
depended on it.

Phase 5 C4 amendment: walk-and-pick-first selects rkvdec on RK3399
(rkvdec's media controller enumerates before hantro's). H.264 /
HEVC / VP9 (rkvdec codecs) work without env override after this
commit. MPEG-2 / VP8 (hantro) still require explicit
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 override; full
multi-decoder dispatch is iter4-B1 backlog item.

Verified empirically on fresnel (linux-fresnel-fourier 7.0-1):
- vainfo (no env) -> "auto-selected codec device: /dev/video1 +
  /dev/media0", enumerates H264*5 + HEVCMain (rkvdec) — paired
  via topology graph, not /dev/video* enumeration.
- vainfo NO_AUTODETECT=1 -> empty list (legacy /dev/video0 = rga).
- vainfo with explicit /dev/video3 + /dev/media1 -> MPEG2*2 + VP8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:05:41 +00:00
claude-noether e1aca9cc6b fresnel-fourier iter3 Phase 6 commit D: buffer.c whitelist for
VAProbabilityBufferType

Phase 2 source-read assumed buffer.c was type-agnostic ("the buffer
registry is type-agnostic" per phase2_iter3_situation.md non-bugs
list). FALSE. RequestCreateBuffer at buffer.c:59-70 has an explicit
allow-list switch:

  case VAPictureParameterBufferType:
  case VAIQMatrixBufferType:
  case VASliceParameterBufferType:
  case VASliceDataBufferType:
  case VAImageBufferType:
      break;
  default:
      return VA_STATUS_ERROR_UNSUPPORTED_BUFFERTYPE;

Without VAProbabilityBufferType in the allow-list, the consumer gets
VA_STATUS_ERROR_UNSUPPORTED_BUFFERTYPE on vaCreateBuffer for the
probability buffer, BEFORE codec_store_buffer is ever reached.
ffmpeg-vaapi log:

  [vp8] Failed to create parameter buffer (type 13): 15
        (the requested VABufferType is not supported).

Same iter1 Commit D pattern: Phase 2 grep didn't find this, runtime
enumerated authoritatively. Per memory feedback_header_deletion_
check.md ("let the compiler enumerate them") — but extended here:
runtime enumerates allow-list violations the same way the compiler
enumerates include-site violations.

Fix: add `case VAProbabilityBufferType:` to the buffer.c allow-list.
+1 line, mechanical.

Refs:
  ../fresnel-fourier/phase2_iter3_situation.md (incorrect non-bug
                                                 claim about buffer.c)
  ../fresnel-fourier/phase4_iter3_plan.md (Commit D placeholder for
                                            fix-forward — used)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:03:59 +00:00
claude-noether 7f84bbb50f fresnel-fourier iter3 Phase 6 commit C: picture.c VP8 dispatch + 4
buffer-type cases + new VAProbabilityBufferType outer case + per-
frame reset + surface.h params.vp8 union extension

Five sites in picture.c + one site in surface.h wire up the VP8
codec dispatcher introduced by commit B:

  1. Include #include "vp8.h" in the codec headers block.

  2. codec_set_controls: NEW case VAProfileVP8Version0_3 calling
     vp8_set_controls(driver_data, context, surface_object).
     Same shape as MPEG-2 + HEVC dispatch.

  3. codec_store_buffer VAPictureParameterBufferType: NEW VP8 case
     memcpy'ing into surface_object->params.vp8.picture
     (sizeof VAPictureParameterBufferVP8).

  4. codec_store_buffer VASliceParameterBufferType: NEW VP8 case
     memcpy'ing into surface_object->params.vp8.slice (single,
     no slices[] array — VP8 is frame-mode, no multi-slice).

  5. codec_store_buffer VAIQMatrixBufferType: NEW VP8 case
     memcpy'ing into surface_object->params.vp8.iqmatrix +
     setting iqmatrix_set true.

  6. codec_store_buffer NEW outer case VAProbabilityBufferType
     (Phase 5 C3: NOT VAProbabilityDataBufferType — that's the
     STRUCT name; the buffer-type enum constant is
     VAProbabilityBufferType = 13 per va.h:2058). Inner switch
     dispatches by profile, with VP8 case memcpy'ing into
     surface_object->params.vp8.probability + setting
     probability_set true.

  7. RequestBeginPicture: NEW per-frame reset for the two VP8
     flags — params.vp8.iqmatrix_set = false +
     params.vp8.probability_set = false. Mirrors the existing
     iter1 (h264.matrix_set) + iter2 (h265.num_slices) per-frame
     resets.

surface.h extension:

  8. params union: NEW vp8 struct after h265 — holds the 4 VAAPI
     buffer-type structs (VAPictureParameterBufferVP8,
     VASliceParameterBufferVP8, VAIQMatrixBufferVP8 + iqmatrix_set,
     VAProbabilityDataBufferVP8 + probability_set).

The NEW vp8 union member adds ~5300 bytes (sizeof
VAProbabilityDataBufferVP8 dominated by dct_coeff_probs[4][8][3]
[11] = 1056 + bookkeeping). The h265 member with slices[64] array
remains the largest (~17 KB), so the union size doesn't grow.

After this commit: backend builds clean, links cleanly. mpv-vaapi
VP8 decode should engage end-to-end on hantro env binding. Phase
1 criteria 1 + 2 + 3 expected satisfied; criterion 4 (HW=SW byte-
identical) and criterion 5 (3-codec regression) verified at Phase
6 smoke + Phase 7.

Refs:
  ../fresnel-fourier/phase4_iter3_plan.md (Commit C site list)
  ../fresnel-fourier/phase2_iter3_situation.md (B6, B7, B8, B9
                                                 bug enumeration)
  ../fresnel-fourier/phase5_iter3_review.md (C3 VAProbabilityBuffer
                                              Type rename
                                              empirically verified)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:52:24 +00:00
claude-noether 017e27f389 fresnel-fourier iter3 Phase 6 commit B: NEW src/vp8.c + src/vp8.h
+ meson.build VP8 entries

Net-new VP8 codec dispatcher implemented against
V4L2_CID_STATELESS_VP8_FRAME (kernel UAPI <linux/v4l2-controls.h>:
1900-1958). Single batched control per frame, no init-time device-
wide menus (VP8 has no DECODE_MODE/START_CODE).

Per-frame submission: ONE VIDIOC_S_EXT_CTRLS, count=1, with full
v4l2_ctrl_vp8_frame struct (1232 bytes — corrected vs Phase 2
implicit ~400 estimate; entropy.coeff_probs[4][8][3][11] alone is
1056 bytes).

vp8_set_controls() implements 10 contract clauses per
phase4_iter3_plan.md:

  Clause 1: single-control batched submission (count=1)
  Clause 2: stack alloc + memset zero (covers all padding)
  Clause 3: width/height/version/per-frame scalars; off-by-one
            num_dct_parts = num_of_partitions - 1
  Clause 4: DPB timestamp resolution (3 refs: last/golden/alt;
            NULL surface → 0-sentinel via memset; mirrors iter1
            mpeg2.c::pic.forward_ref_ts)
  Clause 5: loop filter (6 fields + 3 flag bits; ADJ_ENABLE/
            DELTA_UPDATE/FILTER_TYPE_SIMPLE)
  Clause 6: quant base + delta derivation from VAAPI's per-segment
            absolute index matrix (subtraction recovers signed
            deltas; correct for typical content per Phase 5 S1)
  Clause 7: segment fields (segment_probs direct copy; flags
            assembled with DELTA_VALUE_MODE set unconditionally
            per FFmpeg pattern)
  Clause 8: entropy table — 3 VAAPI sources merged (Picture: y_mode +
            uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3]
            [11] direct memcpy; IQMatrix: quant)
  Clause 9: coder state + first-partition fields + flags assembly
  Clause 10: v4l2_set_controls submission

Phase 5 review amendments incorporated:

  C1 first_part_header_bits = slice->macroblock_offset
     NOT 0 — kernel hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_
     dec.c:372 read this field unconditionally to compute the MB-
     data DMA offset. Verified via source identity: vaapi_vp8.c:204
     and v4l2_request_vp8.c:83 use byte-identical formulas
     (8 * (input - data) - bit_count - 8); VAAPI exposes via
     slice->macroblock_offset, V4L2 names it first_part_header_bits.

  C2 first_part_size = slice->partition_size[0] +
                       ((macroblock_offset + 7) / 8)
     VAAPI's partition_size[0] is the REMAINING bytes after parsing
     (vaapi_vp8.c:209; va_dec_vp8.h:193-196). Kernel needs the
     TOTAL control partition size; recover by adding back ceil
     (macroblock_offset/8) bytes.
     Phase 3 keyframe verbatim cross-check: 21923 + 819 = 22742 ✓

  C4 (int8_t) cast (NOT (s8); s8 is kernel-internal typedef from
     <linux/types.h> not exposed to userspace; userspace UAPI
     exposes __s8 with double-underscore; portable userspace cast
     is int8_t from <stdint.h>).

  S3 assert(probability_set) — kernel hantro_vp8.c::hantro_vp8_
     prob_update reads coeff_probs unconditionally; NO default-
     table fallback. Practical risk low (FFmpeg vaapi_vp8.c always
     sends VAProbabilityBufferType per frame), but assert surfaces
     immediately if a future consumer doesn't.

Flags assembly: 6 mainline-documented bits only (KEY_FRAME, SHOW_
FRAME, MB_NO_SKIP_COEFF, SIGN_BIAS_GOLDEN, SIGN_BIAS_ALT). EXP +
bit 0x40 NOT replicated despite ffmpeg-v4l2-request-git setting
them on inter frames — kernel hantro_vp8.c only inspects KEY_FRAME
bit. SHOW_FRAME forced unconditional per Phase 3 Q4 (BBB has no
alt-ref invisible frames; documented fidelity gap).

VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec.
Backend writes V4L2_VP8_FRAME_FLAG_KEY_FRAME iff
!picture->pic_fields.bits.key_frame.

After this commit alone: vp8.o compiles standalone; meson.build
links it into the shared library. picture.c can't dispatch yet
(commit C wires that).

Refs:
  ../fresnel-fourier/phase4_iter3_plan.md (10 contract clauses,
                                            Phase 5 amendments
                                            section)
  ../fresnel-fourier/phase5_iter3_review.md (C1, C2, C3, C4, S3
                                              all incorporated)
  ../fresnel-fourier/phase3_iter3_baseline.md (verbatim payload
                                                anchors)
  references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp8.c (V4L2 ref)
  references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c (VAAPI source ref)
  references/linux-mainline/drivers/media/platform/verisilicon/
    hantro_g1_vp8_dec.c (RK3399 kernel driver — first_part_header_
    bits + first_part_size usage)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:51:12 +00:00
claude-noether 27d82e3cf4 fresnel-fourier iter3 Phase 6 commit A: VP8 enumeration + dispatch in config.c
Three sites enabling VP8 profile recognition through the libva config
path:

  1. RequestQueryConfigProfiles: NEW enumeration block probing
     V4L2_PIX_FMT_VP8_FRAME against single + MPLANE OUTPUT formats.
     Mirrors iter2 HEVC enumeration block. Surfaces VAProfileVP8
     Version0_3 in vainfo on hantro env binding.

  2. RequestCreateConfig: NEW case VAProfileVP8Version0_3 with
     break — same shape as iter1 MPEG-2 + iter2 HEVCMain (no
     profile-specific config validation in the libva backend;
     validation deferred to vaCreateContext / control submission).

  3. RequestQueryConfigEntrypoints: VAProfileVP8Version0_3 added to
     the existing fall-through case list — surfaces VAEntrypointVLD.

After this commit alone, vainfo lists VP8Version0_3 (Phase 1
criterion 1) but vaCreateContext / runtime decode would fail at
later stages because no codec dispatcher exists yet (added in
commit B + C).

Refs:
  ../fresnel-fourier/phase4_iter3_plan.md (Commit A site list)
  ../fresnel-fourier/phase2_iter3_situation.md (B1, B2, B3
                                                  bug enumeration)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:49:28 +00:00
claude-noether 8d71e20bf7 fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API
Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC
dispatch + per-slice accumulation against the modern split V4L2_CID_
STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,
DECODE_MODE,START_CODE} stateless controls. Replaces the staging-era
V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that were
removed from the kernel UAPI.

Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5,
ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS:

  0xa40a90 SPS            (40  bytes)
  0xa40a91 PPS            (64  bytes)
  0xa40a92 SLICE_PARAMS   (variable; dynamic-array; one entry per slice)
  0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list)
  0xa40a94 DECODE_PARAMS  (328 bytes; per-frame DPB info)

Plus device-wide menus set once at context.c init (separate batched
S_EXT_CTRLS call so a kernel without HEVC controls — e.g. hantro on
RK3568/RK3399 — silently fails its batch without invalidating H.264):

  0xa40a95 DECODE_MODE  (FRAME_BASED on rkvdec)
  0xa40a96 START_CODE   (ANNEX_B on rkvdec)

Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
           (v4l2_request_hevc_queue_decode batched submission shape).

Phase 5 review amendments incorporated:

  C1 (data_byte_offset NOT data_bit_offset):
    Old h265.c at lines 184-209 ran an 8-bit search to compute
    bit-granularity offset. New API renames the field to
    data_byte_offset (u32 byte offset). Bit-search dropped; replaced
    with plain byte offset = source_offset + slice->slice_data_byte_offset.

  C2 (dpb_entry.flags only LONG_TERM_REFERENCE; pic_order_cnt_val
      singular; poc_st_curr_*[] arrays hold DPB INDICES not POC):
    h265_fill_decode_params replaces old slice-params DPB iteration
    with explicit DPB classification + index-array population.
    For each VAAPI ReferenceFrames[i]:
      - Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via
        VA_PICTURE_HEVC_RPS_* flags.
      - Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic.
      - Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR.
      - Append j (DPB index, u8) to poc_st_curr_before[k] /
        poc_st_curr_after[k] / poc_lt_curr[k] based on classification.

  C3 (union-aliasing reasoning corrected):
    BeginPicture's params.h265.num_slices = 0 reset is benign for
    non-HEVC profiles because byte ~17764 of the params union is past
    any field non-HEVC profiles read, NOT because RenderPicture's
    per-buffer copies overwrite that location. Wording amended in
    phase4_iter2_plan.md per phase5_iter2_review.md.

  S1 (PPS flags 19 + 20 — DEBLOCKING_FILTER_CONTROL_PRESENT and
      UNIFORM_SPACING):
    Empirically VAAPI does NOT expose either flag in the
    VAPictureParameterBufferHEVC pic_fields.bits or
    slice_parsing_fields.bits. Both bits left zero. BBB-720p10s_hevc
    fixture uses neither tiles nor explicit deblocking-control
    parameters, so the omission is correct for the iter2 binding cell.

  S2 (3 PPS scalars added):
    pic_parameter_set_id (default 0; VAAPI doesn't expose),
    num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_
    active_minus1 (both populated from VAAPI picture struct).

  Q2 (slice_segment_addr populated):
    Was missing in old h265.c. Now sourced from
    VAAPI's slice->slice_segment_address.

  S3 (SCALING_MATRIX content choice):
    Implementer choice taken: when iqmatrix_set==false (BBB has no
    scaling list per SPS flags = SAO|STRONG_INTRA_SMOOTHING),
    h265_fill_scaling_matrix sends memset-zero. Matches FFmpeg's
    sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserves
    byte-equality vs cross-validator anchor).

  S4 (FFmpeg function name fix): cosmetic; no code impact.

Plus one Phase 6 inline correction: phase 5 review S1 suggested
VAAPI exposes uniform_spacing_flag in pic_fields.bits; empirical
test-compile shows it doesn't. Comment added in h265_fill_pps
documenting the omission.

Picture.c changes (3 edits):

  1. codec_set_controls HEVCMain dispatch (lines 204-206 → call
     h265_set_controls; replaces explicit Fourier-local: HEVC stripped
     reject).
  2. codec_store_buffer HEVC VASliceParameterBufferType case: append
     VAAPI slice param to params.h265.slices[N] array, increment
     num_slices. Single-slice mirror at .slice retained for
     h265_fill_pps (which reads dependent_slice_segment_flag from
     LongSliceFlags).
  3. RequestBeginPicture: add params.h265.num_slices = 0 reset
     alongside existing h264.matrix_set = false reset.

Surface.h: extend params.h265 struct with slices[HEVC_MAX_SLICES_PER_
FRAME=64] array + num_slices counter. ~17 KB extra per surface union;
24 surfaces in iter7 cap_pool = ~400 KB total surface_heap growth.
object_heap allocator picks up new size automatically via
sizeof(struct object_surface).

Context.c: separate 2-control batched call sets HEVC DECODE_MODE +
START_CODE device-wide. Same best-effort (void)v4l2_set_controls
pattern as the existing H.264 device-init block; if kernel doesn't
advertise HEVC controls (hantro on RK3568/RK3399), the batch silently
fails without invalidating the H.264 batch.

Meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73)
in sources + headers lists.

H265.h: added HEVC_MAX_SLICES_PER_FRAME=64 #define before struct
forward declarations.

Phase 6 smoke test on fresnel (post Commit A + Commit B):

  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec env binding
              (/dev/video1 + /dev/media0). PASS.

  Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4
              -frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots
              ready. PASS.

  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture:
    HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    HW=SW byte-identical for both frames; frame1 != frame2 (real motion).
    PASS.

  Criterion 5: regression hashes hold for both prior cells:
    H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
    H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
    MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
    MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)
    PASS.

All five criteria green on first build attempt — Phase 5 review
caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset
rename; dpb.rps field gone + pic_order_cnt_val rename + index-array
semantics) that would have been Phase 6 compile failures or silent
Phase 7 byte-compare divergences. Without that review pass, this
commit would have been the start of a 2+ loopback debugging cycle.

Refs:
  ../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses,
                                            File 4 patch shape)
  ../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2,
                                              S3, S4, Q2 amendments
                                              all incorporated)
  ../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/
    ffmpeg_v4l2req.stdout (cross-validator anchor — Phase 7
    bonus byte-compare verification target)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:58:34 +02:00
claude-noether cca539d5f9 fresnel-fourier iter2 Phase 6 commit A: config.c break for HEVCMain case
RequestCreateConfig dispatches H.264 + MPEG-2 cases via break.
HEVCMain previously fell through to default returning
VA_STATUS_ERROR_UNSUPPORTED_PROFILE (= 12). Same fall-through
pattern iter1 fixed for MPEG-2; iter2 closes the loop for HEVC.

Add break for VAProfileHEVCMain. Same shape as iter1 Commit A
pattern — no profile-specific config validation in
RequestCreateConfig (validation happens at vaCreateContext /
control submission time).

This is the substrate fix only. After this commit:
  - vaCreateConfig(VAProfileHEVCMain) returns SUCCESS
  - mpv-vaapi HEVC ATTEMPTS to set up the hwaccel path
  - codec_set_controls at picture.c:204-206 still has the
    explicit case VAProfileHEVCMain: return UNSUPPORTED_PROFILE
    reject in place
  - decode fails downstream with -5 (Input/output error)

Bug 2 (picture.c reject removal) + Bug 3-7 (h265.c rewrite +
meson re-enable + slice_params accumulation + device-init
extension) land together in commit B, where h265_set_controls
exists to dispatch to.

Verified empirically Phase 3 Baseline D (scratch test on
throwaway branch): with this break alone, vaCreateConfig
SUCCESS for HEVCMain, V4L2 setup proceeds, decode fails at
the picture.c reject — confirms Phase 2 prediction. T4 H.264
+ iter1 MPEG-2 reference hashes hold (no collateral
regression).

Refs:
  ../fresnel-fourier/phase0_findings_iter2.md (Phase 1 lock)
  ../fresnel-fourier/phase2_iter2_situation.md Bug 1
  ../fresnel-fourier/phase3_iter2_baseline.md Baseline D
  ../fresnel-fourier/phase4_iter2_plan.md Clause 8, File 1
  ../fresnel-fourier/phase5_iter2_review.md (no Critical findings
                                              touch this commit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:00:30 +02:00
claude-noether 229d6d11be fresnel-fourier iter1 Phase 6 commit D: drop missed mpeg2-ctrls.h include from context.c
Fix-forward for commit C (3aab187): Phase 2 source-read missed a
third occurrence of #include <mpeg2-ctrls.h> in src/context.c:42.
The Phase 2 grep audit reported only two callsites
(src/config.c:37, src/mpeg2.c:38), both removed in commit B.
After commit C deleted include/mpeg2-ctrls.h from disk, the build
broke on context.c with:

  ../src/context.c:42:10: fatal error: mpeg2-ctrls.h:
  No such file or directory
     42 | #include <mpeg2-ctrls.h>
        |          ^~~~~~~~~~~~~~~

The include in context.c was vestigial — context.c references no
V4L2_CID_MPEG_VIDEO_MPEG2_* symbols and never needed the header
even before iter1's rewrite. The Phase 2 grep was simply incomplete.

This commit drops the orphan include line. Build now passes; install
clean; Phase 1 criterion 4 (DMA-BUF GL HW=SW byte-identical pixel
hashes) still PASS:

  HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
  SW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
  HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
  SW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de

Per feedback_dev_process.md Phase 6 discipline:
"If a plan revision is needed mid-implementation, surface it
explicitly and re-enter Phase 4."

This is a 1-line scope expansion of commit B's "drop mpeg2-ctrls.h
include from all callsites" intent. Surfacing explicitly here
rather than silently amending B (which is already pushed). No
re-lock of plan needed; the spirit of File 1+2 in
phase4_iter1_plan.md was "drop the include from every file that
has it." The audit method (Phase 2 grep) was the gap.

Lesson for Phase 8 memory update: a more authoritative completeness
check than naive grep before deleting a header — recursive build
attempt to drive out hidden includes, or grep with no path filter
would have caught it.

Refs:
  ../fresnel-fourier/phase4_iter1_plan.md (File 3 + audit)
  ../fresnel-fourier/phase2_iter1_situation.md Bug 3 (incomplete
                                                     audit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:24:50 +02:00
claude-noether 3aab1879cb fresnel-fourier iter1 Phase 6 commit C: delete staging-era include/mpeg2-ctrls.h
Removes the local fork-internal header include/mpeg2-ctrls.h. The
header explicitly self-described as staging-era in its preamble:

  These are the MPEG2 state controls for use with stateless MPEG-2
  codec drivers. It turns out that these structs are not stable yet
  and will undergo more changes. So keep them private until they
  are stable and ready to become part of the official public API.

The structs DID stabilize and become public in mainline Linux —
at different CIDs (V4L2_CID_STATELESS_MPEG2_{SEQUENCE,PICTURE,
QUANTISATION} = 0xa409dc/dd/de) and with redesigned struct
layouts (split sequence/picture/quantisation, slice header parsing
moved kernel-side, boolean fields collapsed to flags bitmask).

Before this commit, two source files included this header:
  - src/config.c:37 #include <mpeg2-ctrls.h>
  - src/mpeg2.c:38  #include <mpeg2-ctrls.h>

Both includes were removed in commit B. After this commit:

  $ git grep -l 'mpeg2-ctrls' --
  (no matches)

The kernel UAPI providing the new MPEG-2 stateless symbols is in
<linux/v4l2-controls.h>, pulled in transitively via
<linux/videodev2.h> (and explicitly in src/mpeg2.c).

include/hevc-ctrls.h is kept untouched per
phase5_iter1_review.md Nit 6 (lower-risk path; HEVC iteration
will delete its corresponding staging header in a separate commit).

Refs:
  ../fresnel-fourier/phase4_iter1_plan.md (File 3)
  ../fresnel-fourier/phase5_iter1_review.md (Nit 6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:18:43 +02:00
claude-noether 5fe873c144 fresnel-fourier iter1 Phase 6 commit B: rewrite mpeg2.c against new V4L2 stateless API
Rewrites src/mpeg2.c to submit MPEG-2 control payload via the new
split V4L2_CID_STATELESS_MPEG2_{SEQUENCE,PICTURE,QUANTISATION}
controls (mainline kernel <linux/v4l2-controls.h>:1985-2105),
replacing the staging-era V4L2_CID_MPEG_VIDEO_MPEG2_{SLICE_PARAMS,
QUANTIZATION} combined-struct API that the kernel removed.

Per-frame submission: one batched VIDIOC_S_EXT_CTRLS, count=3,
ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS (0xf010000), with the
three controls in order:
  - id=0xa409dc (SEQUENCE)      size=12  bytes
  - id=0xa409dd (PICTURE)       size=32  bytes
  - id=0xa409de (QUANTISATION)  size=256 bytes

Matches FFmpeg libavcodec/v4l2_request_mpeg2.c:130-155 reference
implementation. Verified empirically against fresnel-fourier
Phase 0 cross-validator anchor (bit-for-bit byte equivalence on
SEQUENCE first-row + QUANTISATION 256 bytes).

Six structural changes from old to new API:

  1. Slice header parsing moved to kernel: bit_size,
     data_bit_offset, quantiser_scale_code GONE from new structs.
  2. Reference timestamps moved from slice to picture:
     forward_ref_ts/backward_ref_ts now in
     v4l2_ctrl_mpeg2_picture (offsets 0/8).
  3. Boolean fields collapsed into picture.flags bitmask
     (TOP_FIELD_FIRST 0x01 .. PROGRESSIVE 0x80, 8 bits total).
  4. progressive_sequence collapsed into sequence.flags &
     V4L2_MPEG2_SEQ_FLAG_PROGRESSIVE.
  5. PICTURE_CODING_TYPE renamed to PIC_CODING_TYPE (values same).
  6. Quantisation load_* flags removed; matrices always present;
     British spelling — quantiSation not quantiZation.

Behavioral correction (from old code, was a latent bug):
  Old src/mpeg2.c:104-118 self-referenced surface_object timestamp
  when the VAAPI ref picture was VA_INVALID_ID. New code sets the
  ref_ts to 0, matching kernel doc 0-as-sentinel convention
  (verified Phase 3 Baseline C: I-frame has both ts == 0; FFmpeg
  v4l2_request_mpeg2.c:98-108 same convention).

Quantisation matrix order: zigzag scanning order per kernel doc
v4l2-controls.h:2076. VAAPI VAIQMatrixBufferMPEG2 stores in
zigzag order (per VAAPI spec). Direct memcpy works; no
permutation in libva backend. Kernel hantro_mpeg2.c::
hantro_mpeg2_dec_copy_qtable applies zigzag-to-raster permutation
when copying to the hardware quantisation table.

Default matrices (when iqmatrix_set==false): MPEG-2 spec defaults
per ISO/IEC 13818-2 Table 7-3. The mpeg2_default_intra_matrix
constant was transcribed from fresnel-fourier Phase 3 Baseline C
QUANTISATION verbatim payload bytes 0..63 (256-byte capture from
ffmpeg-v4l2request decode of bbb_720p10s_mpeg2.ts), per
phase5_iter1_review.md S3 amendment that flagged spec-recall as
unreliable. non_intra and chroma_non_intra are 16s per spec
(verified Baseline C bytes 64..127, 192..255). chroma_intra is
copy of intra (Baseline C bytes 128..191, verified identical).

Submission shape: one batched v4l2_set_controls call with all
three v4l2_ext_control entries, matching iter6/7/8 H.264 pattern
at src/h264.c:986. Bound to surface_object->request_fd (the
per-OUTPUT-slot permanent request_fd from iter6 binding).

Behavioral details:
  - sequence.vbv_buffer_size = surface_object->source_size, where
    source_size is set in picture.c:276 from request_pool slot->size,
    which is the V4L2-negotiated sizeimage from VIDIOC_QUERYBUF.
    Matches FFmpeg controls->pic.output->size.
  - sequence.profile_and_level_indication = 0; not exposed by
    VAAPI VAPictureParameterBufferMPEG2.
  - sequence.chroma_format = 1 (4:2:0) hardcoded; campaign codec
    scope is 4:2:0.
  - progressive_frame proxies for progressive_sequence; same bit
    for typical streams.

Phase 6 smoke test (post Commit A + Commit B):
  - vainfo enumerates VAProfileMPEG2Simple + VAProfileMPEG2Main
    on hantro bind. (Phase 1 criterion 1)
  - libva trace: vaCreateConfig(VAProfileMPEG2Main) =
    VA_STATUS_SUCCESS. (Phase 1 criterion 2)
  - ffmpeg -hwaccel vaapi exits 0 with no Failed-to-create-
    decode-configuration. (Phase 1 criterion 3 adjusted)
  - mpv --hwdec=vaapi --vo=image at +02s seek: 2 distinct
    frames with hashes byte-identical to SW reference:
      HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
      SW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
      HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
      SW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
    (Phase 1 criterion 4 — DMA-BUF GL import path; cache-coherency-safe)
  - T4 H.264 reference hashes still match (criterion 5; verified
    Phase 3 Baseline D earlier).

Cache-stale class observation (out-of-scope iter1 work item):
  ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi + hwdownload
  pipeline produces all-zero NV12 for MPEG-2 (same iter1 patch-0011
  cache-coherency bug class observed for H.264 in fresnel-fourier
  T4). Kernel + HW decode is correct (verified via ffmpeg
  -hwaccel v4l2request -hwaccel_output_format drm_prime + hwdownload
  which produces correct non-zero pixels matching SW reference).
  Bug is in libva backend vaDeriveImage path; Phase 4 cross-
  cutting work to add VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC support.
  Not blocking iter1 — DMA-BUF GL import path (mpv --vo=image) is
  cache-coherency-safe and gives bit-exact pixels.

Auxiliary EINVAL noise (out-of-scope iter1 work item):
  src/context.c:142-155 unconditionally sets H.264 device-wide
  controls (V4L2_CID_STATELESS_H264_DECODE_MODE,
  _START_CODE) on every CreateContext, regardless of profile.
  EINVALs on hantro-vpu-dec (no H.264 controls there). Intentional
  best-effort behavior — return value cast to (void) and discarded
  at line 153. The error message "Unable to set control(s):
  Invalid argument" is logged from src/v4l2.c:484 but doesn't
  propagate as a backend error. Stays as documented auxiliary
  noise.

Drop #include <mpeg2-ctrls.h> from src/config.c:37 and src/mpeg2.c
(formerly line 38). The kernel UAPI for MPEG-2 stateless control
IDs comes from <linux/v4l2-controls.h>, pulled transitively via
<linux/videodev2.h> (and explicitly from src/mpeg2.c after this
rewrite). The fork local include/mpeg2-ctrls.h header is deleted
in commit C; this commit removes the last includes of it.
src/config.c:38 still includes <hevc-ctrls.h> — left untouched per
phase5_iter1_review.md Nit 6 (lower-risk path; HEVC iteration
deletes its header).

Refs:
  ../fresnel-fourier/phase4_iter1_plan.md (contract clauses 1-6,
                                           File 2 patch shape)
  ../fresnel-fourier/phase5_iter1_review.md (S3, Q4, Q5 amendments)
  ../fresnel-fourier/phase0_evidence/2026-05-07/iter1_phase3/
    baseline_C_xvalidator/ffmpeg.stdout (cross-validator anchor)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:17:40 +02:00
claude-noether e7dad7abb5 fresnel-fourier iter1 Phase 6 commit A: config.c break for MPEG-2 cases
RequestCreateConfig dispatches H.264 cases via // FIXME + break;
MPEG-2 + HEVC cases fell through to default: which returns
VA_STATUS_ERROR_UNSUPPORTED_PROFILE (= 12). For MPEG-2, fall-through
was a leftover from libva-multiplanar iter1-iter5 H.264 focus —
nobody on that campaign tested MPEG-2 end-to-end, so the missing
break never surfaced as a bug there.

Add break for VAProfileMPEG2Simple + VAProfileMPEG2Main cases.
HEVC stays in fall-through (h265.c excluded from build per
fresnel-fourier campaign Phase 0 finding F-C; honest
UNSUPPORTED_PROFILE is correct until h265.c is reinstated in a
later iteration).

This is the substrate fix only. After this commit, vaCreateConfig
returns SUCCESS for MPEG-2, but actual decode still fails at
VIDIOC_S_EXT_CTRLS time because src/mpeg2.c uses staging-era
control IDs that mainline kernel removed. That fix lands in
commit B (mpeg2.c rewrite against the new V4L2_CID_STATELESS_MPEG2_*
split API).

Verified empirically in Phase 3 baseline B (scratch fix on
throwaway branch): with this break in place, vaCreateConfig
ret = SUCCESS, V4L2 setup proceeds (CREATE_BUFS, REQBUFS,
QUERYBUF, STREAMON, REQUEST_ALLOC, QBUF/DQBUF), then
VIDIOC_S_EXT_CTRLS id=V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS
(0x9909fa) returns -1 EINVAL — exactly the next failure mode
predicted by phase2_iter1_situation.md.

H.264 regression check (T4 reference hashes): with scratch fix
in place, mpv --hwdec=vaapi at +30s into bbb_1080p30_h264.mp4
produces JPEG hashes f623d5f7... (frame 1) and 7d7bc6f2...
(frame 2), exactly matching SW reference and T4 baseline.
No H.264 regression.

Refs:
  ../fresnel-fourier/phase0_findings_iter1.md (Phase 1 lock)
  ../fresnel-fourier/phase2_iter1_situation.md Bug 1
  ../fresnel-fourier/phase3_iter1_baseline.md Baseline A + B
  ../fresnel-fourier/phase4_iter1_plan.md Clause 6, File 1
  ../fresnel-fourier/phase5_iter1_review.md (Nit 6, kept smaller)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 07:45:35 +02:00
claude-noether 65969da3ee iter8 Phase 4: tests/run_perf_binding_cell.sh — perf binding cell harness
Anchors campaign-wide claims with measured numbers. Runs four
consumer configurations against $FIXTURE for $DURATION seconds each:
  1. mpv --hwdec=vaapi          (DMA-BUF zero-copy through libva)
  2. mpv --hwdec=vaapi-copy     (HW decode + VAImage readback)
  3. firefox (iter5-amend, sandbox enabled, file:// URL)
  4. mpv --hwdec=no             (SW decode baseline / control)

Captures per consumer: CPU% (median + p90 from pidstat), GPU freq
median (from /sys/class/devfreq/fde60000.gpu/cur_freq, polled at
100ms cadence), drops in window (from mpv --term-status-msg),
p50 frame interval (mpv only), VmRSS delta (from /proc/PID/status).

Emits a markdown table with raw numbers per consumer — no aggregation,
no improvement ratios, no curated-benchmark framing. Honest schema
including '—' for measurements not available per consumer (e.g.
Firefox drops without internal hooks).

Phase 5 sonnet review caught 3 issues, all addressed before commit:
1. pidstat $8 column heuristic — replaced with header-driven %CPU
   field detection (robust across sysstat 12.x point releases)
2. GPU freq median computation used /dev/stdin in nested subshell-
   over-pipe (unreliable) — replaced with temp-file path
3. --frames=$((DURATION * 30)) hardcoded 30fps (fixture-hardcoding
   per feedback_no_fixture_hardcoding.md) — replaced with
   --length=$DURATION (wall-time bounded, framerate-agnostic)

Plus minor: empty cpu_pct.log now emits ERR rather than silent 0,
distinguishing measurement failure from "process used no CPU."

Reproducibility surface: run date, host, kernel, driver sha256,
fixture path+size, duration captured in the output markdown.
Hardware constants (/dev/video1, /dev/media0, devfreq path,
driver install path) are documented as PineTab2 (RK3566 via
hantro/rk3568-vpu) specific.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:59:13 +00:00
claude-noether dcaa1f12e5 docs: clarify Rockchip silicon — PineTab2 is RK3566, not RK3568
Surfaced during iter7 Track F research: the campaign target hardware
is Rockchip RK3566 silicon (PineTab2). The hantro driver attaches
via the rockchip,rk3568-vpu DT compatible because the RK3566 silicon
is close enough to RK3568 to share that variant. The proper RK3566
mainline driver target (rkvdec2 / vdpu346) has no kernel support yet
— Christian Hewitt's patch series LKML 2025/12/26/206 is unmerged.

Updates the two src/ comments that called the hardware "RK3568":
- context.c: hantro-vpu device-init S_EXT_CTRLS comment now reads
  "via rockchip,rk3568-vpu DT compatible (covers RK3568 and RK3566
  — PineTab2 silicon — since they're close enough)"
- h264.c: DPB pic_num discussion ends "...never surfaced on PineTab2
  (RK3566 via hantro/rk3568-vpu)"

Not a correctness change. Compiles + decodes identically. The
update matters for upstream submission accuracy (bootlin/Rockchip
maintainers will care which silicon the campaign tested on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:39:11 +00:00
claude-noether 7bd0818792 iter7 Phase 7 finalization: OUTPUT-pool teardown + test refinements
Surfaced during Phase 7 verification on ohm:

1. **OUTPUT pool stale-slot bug (src/surface.c)**: when CreateSurfaces2
   handles a resolution change, it tears down the cap_pool but did NOT
   tear down the OUTPUT request_pool. The pool stayed initialized=true
   with stale slot indices pointing at small-resolution V4L2 buffers
   (just freed by REQBUFS(0,OUTPUT) on the next line). Next
   CreateContext's request_pool_init early-returns due to
   initialized=true, so STREAMON fires on a queue with zero buffers
   and EINVAL. Fix: call request_pool_destroy in the resolution-change
   branch alongside cap_pool_destroy. Mirror the cap_pool teardown.

   Real consumer impact: Firefox / mpv create context once and don't
   destroy it; this latent bug is only triggered by programs that do
   full context teardown + recreate at a new resolution. Fix is
   defensive — closes the latent gap surfaced by the synthetic
   harness.

2. **cap_pool_probe_pattern.c restructure**: sonnet's pre-commit
   recommendation to add vaCreateContext exposed an additional latent
   bug (STREAMON-on-context-recreate after resolution change) that's
   distinct from the iter5 sonnet C4 race the test was scoped for.
   Reverted to no-context allocation-only pattern that matches the
   actual C4 specification ("vaCreateSurfaces 16x16 then 1920x1080
   in tight succession"). The new STREAMON bug is logged as iter8
   candidate.

3. **run_cap_pool_probe.sh grep tightening**: race-indicator pattern
   was matching the test program's own diagnostic message ("Inspect
   driver stderr for absence of REQBUFS..."). Now grep restricts to
   lines starting with "v4l2-request:" prefix.

Phase 7 results (clean iter7 driver sha 54999017... + this fix):
- Track A (msync verify): 100 frames byte-for-byte SW=HW (sha
  58c8f3f4...) -> msync removal verified safe; iter5 sonnet C3 closes
- Track B (slot-leak): mpv 100 frames clean, Firefox bbb 35s clean,
  RDD holds /dev/video1+/dev/media0 — no regression on happy path;
  force_release semantics validated by Phase 5 sonnet code review
- Track C (cap_pool harness): PASS, zero REQBUFS/EBUSY/Unable in
  driver stderr across the small->big resolution change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:29:46 +00:00
claude-noether 988b848908 iter7: A+B+C — slot-leak fix, cap_pool harness, msync verify harness
Closes three internal carry items in one fork commit. iter6 deferred
these as TODOs; iter7 lands the implementations + supporting tests.

# Track B — slot-leak error recovery (src/)

iter6 documented the RequestSyncSurface error paths as a "bounded
leak we accept" — slots stayed busy=true after REINIT/DQBUF failures
until RequestTerminate ran. With pool=16 and rare errors this was
acceptable, but a sustained-error scenario could starve the pool.

Adds request_pool_force_release(pool, index) which:
1. Tries media_request_reinit on the slot's fd (cheap path)
2. Falls back to close + media_request_alloc (recovery)
3. Leaves the slot dead-busy if even alloc fails (other slots
   unaffected, pool capacity reduced by 1 until destroy)

Wires it into surface.c RequestSyncSurface error paths only for
errors before the OUTPUT-DQBUF attempt. After OUTPUT-DQBUF failure
the V4L2 buffer is in indeterminate kernel state, so a separate
error label (`error_buffer_indeterminate`) leaves the slot
dead-busy — reusing the slot would QBUF on a kernel-still-held
buffer and EINVAL.

Phase 5 sonnet review caught this discriminator subtlety pre-commit.

Files: request_pool.{h,c}, surface.c.

# Track C — cap_pool race synthetic harness (tests/)

iter5 sonnet C4 / iter6 candidate A: cap_pool resolution-change
race was organically exercised by YT's quality renegotiations
(iter6 close, 4 cap_pool_init events clean) but had no
deterministic regression test.

tests/cap_pool_probe_pattern.c — ~170-line C program: opens
libva display, vaCreateConfig, vaCreateSurfaces(small) +
vaCreateContext (triggers OUTPUT pool init at small resolution),
dispose, vaCreateSurfaces(big) + vaCreateContext (forces S_FMT
on the new resolution against an in-use OUTPUT pool — the actual
race-hitting path).

Phase 5 sonnet flagged that without vaCreateContext the test
would pass trivially (OUTPUT pool never init'd, REQBUFS(0) on
empty queue is a no-op). Fixed before commit.

tests/run_cap_pool_probe.sh — runner; greps driver stderr for
REQBUFS / EBUSY / "Unable to set format" race indicators.

# Track A — msync pixel-correctness verify harness (tests/)

iter5 sweep removed msync(MS_SYNC|MS_INVALIDATE) from CAPTURE
DQBUF path. iter5 sonnet C3 flagged: no formal pixel verification.

tests/run_msync_pixel_verify.sh — runs FFmpeg SW decode (libavcodec
reference) and FFmpeg HW decode (via our v4l2_request driver),
compares NV12 byte streams. Probes fixture dimensions via ffprobe
and uses crop=$W:$H after hwdownload to normalize MB-padding
artifacts (hantro pads height to 16-line align; SW returns
crop-aligned).

Phase 5 sonnet flagged the stride-mismatch false-failure risk
pre-commit. Fixed: explicit crop + diagnostic that distinguishes
genuine pixel divergence from MB-padding stride artifacts.

# Phase 5 sonnet code review

Verdict: APPROVE-WITH-CHANGES. Three actionable findings, all
addressed before this commit:
1. surface.c error path: separated OUTPUT-DQBUF-failure into
   error_buffer_indeterminate label, slot stays dead-busy
2. cap_pool_probe_pattern.c: added vaCreateContext to actually
   exercise the OUTPUT pool init at the small resolution
3. run_msync_pixel_verify.sh: explicit crop on HW path,
   stride-mismatch diagnostic distinguished from corruption

Empirical verification (Phase 6+7 deploy + run): pending operator
ohm-tools availability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:49:48 +00:00
claude-noether a09c03c154 iter6 fix: per-OUTPUT-slot request_fd binding via REINIT
iter4 (385dee1) replaced the original media_request_reinit pattern
with close+media_request_alloc per frame to escape an EINVAL on
S_EXT_CTRLS that turned out to be a DPB-payload bug (74d8dd1, FFmpeg
V4L2_H264_FRAME_REF semantics). The per-frame close+alloc model
worked for mpv vaapi-copy (single-surface recycle) but raced under
Firefox 150's MediaSource pipeline (multi-surface rotation): fd=30
got reused via lowest-free-fd allocation faster than the kernel-
side per-buffer state-machine could tear down the prior request,
producing intermittent VIDIOC_QBUF EINVAL on OUTPUT after 1..53
successful frames.

Phase 2 telemetry confirmed:
- DQBUF returned the index we passed (no FIFO mismatch)
- SPS/PPS/DECODE_PARAMS/SCALING_MATRIX byte-identical between mpv
  and Firefox first 64 bytes
- Pool size bump 4 -> 16 only delayed the failure (62 frames)
- Different OUTPUT slot indices failed across runs (race signature)

Fix: each OUTPUT pool slot owns a permanent request_fd allocated
once at request_pool_init and REINIT'd between uses in
RequestSyncSurface. 1:1 slot-to-fd binding eliminates cross-slot fd
reuse entirely. Pool stays driver-wide (multi-context safe per
iter5 Track E); slots cycle through 16 distinct fds in round-robin
acquire.

Files:
- request_pool.h: add request_fd field to slot struct; init
  signature takes media_fd
- request_pool.c: alloc per-slot fd at init, close at destroy
- context.c: pass driver_data->media_fd; pool size 4 -> 16
- picture.c: BeginPicture binds slot->request_fd to surface;
  EndPicture's per-frame media_request_alloc removed
- surface.c: RequestSyncSurface uses media_request_reinit instead
  of close+alloc; DestroySurfaces close removed (slot owns fd);
  error path close removed; surface_object NULL-init for the
  -Wmaybe-uninitialized warning fix

Empirical verification (clean build sha ebe396d5..., no diagnostic
instrumentation):
- Firefox 150 + bbb_1080p30_h264.mp4 + LIBVA_DRIVER_NAME=v4l2_request
  + sandbox enabled: 35s+ playback, zero "Unable to queue buffer"
  / "Unable to set control(s)", lsof shows RDD process holds
  /dev/video1 + /dev/media0 throughout. Driver stderr: only the
  single cap_pool_init: 24 slots ready line.
- mpv vaapi-copy 50 frames: zero errors, "Using hardware decoding
  (vaapi-copy)" - no regression vs iter5-end driver.

Pool-size bump diagnostic (Phase 5 sonnet design review feedback):
4 -> 16 alone took 1->62 frames, far short of the 30s success
criterion (~900 frames at 30fps). REINIT discipline is the actual
fix; pool 16 is comfortable headroom over typical H.264 MaxDpbFrames.

Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment
attribution corrected: cleanup runs at RequestTerminate, not
RequestDestroyContext, since the pool is driver-wide).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:30:39 +00:00
test0r c8b6edec3d iter5 sweep follow-up: remove additional DEBUG sites flagged by Phase 5 review
Phase 5 sonnet review caught four DEBUG sites the first sweep pass
missed (the vaapi-copy + --vo=null stress test didn't exercise the
ExportSurfaceHandle path, so per-frame ExportSurfaceHandle dumps went
undetected).

Removed:
- surface.c::CreateSurfaces2 format-dump (per-CreateSurfaces2 noise,
  labeled DEBUG INSTRUMENTATION (surface-export diagnosis 2026-05-04))
- surface.c::ExportSurfaceHandle full-descriptor dump (per-frame for
  consumers using DMA-BUF, also labeled DEBUG)
- surface.c::QuerySurfaceStatus -> status= line (per-call noise)
- h264.c V4L2 readback block (~67 lines): static bool readback_warned
  + the per-frame VIDIOC_G_EXT_CTRLS attempt + the readback success
  log + the "V4L2 readback unavailable" fallback announcement. With
  the iter4 fixes landed, the readback EACCES is no longer load-bearing
  to investigate — drop the block + the per-process global state.

Removing the readback block also resolves Phase 5 finding C2: the
static bool readback_warned was new mutable process-global state
introduced post-Track-E, inconsistent with that track's intent.

Net: -107 lines from src/{h264,surface}.c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:04:03 +00:00
test0r b993355507 iter5 Track E: move LAST_OUTPUT_WIDTH/HEIGHT from process-global to per-driver-data
Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate.
Two libva driver_data instances in the same process (e.g. Firefox
playing two tabs at different resolutions, or Firefox + mpv via the
same dlopened backend) would race on the static cache.

Move to struct request_data.last_output_width/height. The V4L2
device fd is already per-driver_data, so this is the correct binding
unit (one fd, one current OUTPUT format).

Verified: two concurrent mpv processes (2s stagger) both decode
300 frames cleanly with no cross-corruption. Same-instant init still
hits kernel-level fd contention on /dev/video1 (hantro is a
single-instance device); cross-process serialization is out of scope
for a libva backend.

Resolves the surface_reset_format_cache() callsite: now takes
driver_data parameter (was zero-arg).

Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls
that the iter5 sweep left behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:05:41 +00:00
test0r 843febc174 iter5 sweep: remove iter1 slice_header parse + VAPicture dump + Sync RETURN trace
h264.c:
- Remove the slice_header parse success log (the parse data is now
  forwarded into decode_params directly without per-frame echo). Keep
  the FAILED-rc log since it indicates a real decode-blocking error.
- Remove the iter1 patch-0014 VAPictureH264 byte-dump + field-read
  log block. The TopFieldOrderCnt=65536 anomaly it diagnosed was
  resolved by the POC sentinel strip (h264_strip_ffmpeg_poc_sentinel)
  that stays in the codebase.

surface.c:
- Remove the per-call "RequestSyncSurface RETURN status=" trace.
- Remove the per-call "RequestSyncSurface early-exit" trace.

v4l2.c:
- Suppress the per-frame "Unable to get control(s): Permission denied"
  log when errno == EACCES (the expected case on this hantro rig
  per iter1 patch-0014's findings). The one-time announcement in
  h264.c stays. Real EACCES-on-non-request-fd or other errno values
  still log normally.

Per-frame v4l2-request log noise drops from ~30+ lines/frame to
init-time + once-per-resolution-change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:51:10 +00:00
test0r d3a299b4cc iter5 sweep: remove iter1 patch-0010 hex-dumps + patch-0011 sentinel
picture.c: remove the 0xab sentinel write into CAPTURE buffer first
32 bytes pre-QBUF + the OUTPUT hex-dump pre-QBUF. Both were iter1
diagnostics for "where does the buffer write go?" investigation.

surface.c: remove the post-DQBUF CAPTURE Y-plane hex-dump + luma
variance signal. The msync(MS_SYNC|MS_INVALIDATE) was added as a
companion fix for the cached-mmap issue surfaced by the dump itself —
removing the dump removes the need for the msync.

With iter1+iter2+iter3+iter4 fixes landed, these dumps fire on every
single frame and produce hundreds of MB of log noise during sustained
decode. Now gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:48:31 +00:00
test0r 951233a12e iter5 sweep: remove iter1 ENTER traces (13 call sites across 4 files)
Removes the iter1 patch-0014 ENTER traces from buffer.c, image.c,
picture.c, surface.c. These were diagnostic-only entry-point logs
added during iter1's "where does Firefox RDD crash?" investigation.
With the iter1+iter2+iter3+iter4 fixes landed, the entry-point
traces are pure noise.

If a future investigation needs entry-point coverage, strace -e trace
on the libva consumer process gives equivalent visibility without
modifying the driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:47:25 +00:00
test0r 39498f0d8e iter5 sweep: remove iter4 DPB census instrumentation from h264.c
Removes the pre-S_EXT_CTRLS DPB census + per-entry dump that helped
diagnose iter4's frame-11 EINVAL bug. With the fix landed (385dee1),
the diagnostic is no longer needed in the release driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:46:10 +00:00
test0r 848fc0c4c4 iter5 sweep: remove iter3+iter4 Y2 instrumentation from v4l2.c
Removes iter3 Y2 v1 (S_EXT_CTRLS rejected logging) + iter4 Y2 v3
(TRY_EXT_CTRLS retry) + iter4 per-control TRY isolation. With the
frame-11 EINVAL fix landed in iter4 (385dee1), these diagnostics no
longer fire under expected workloads, and they're noise for any
upstream submission.

If a future EINVAL re-introduces, the per-control TRY isolation
pattern is documented in feedback_kernel_obfuscation_compound.md and
can be re-applied surgically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:45:43 +00:00
test0r b81ce6981f iter4 fix: B-slice L1 reflist .fields copy-paste bug
In h264_va_slice_to_v4l2, the B-slice L1 reflist loop wrote .fields
into ref_pic_list0[i] instead of ref_pic_list1[i]. This corrupted L0
reflist fields when L1 was being built and left ref_pic_list1[i].fields
zero (which the kernel may interpret as "no valid field reference").

Pre-existing pre-iter4 bug (caught by iter4 Phase 5 sonnet review,
finding C2). Latent on hantro bbb_1080p30 in FRAME_BASED mode because
hantro walks reference_ts directly and ignores SLICE_PARAMS.fields,
but the bug is wrong-by-construction and would surface on any driver
that reads SLICE_PARAMS reflist fields, on interlaced content, or in
SLICE_BASED decode mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:15:40 +00:00
test0r f21bdf0d50 iter4 DEBUG: per-control VIDIOC_TRY_EXT_CTRLS isolation
Iterates each control individually through VIDIOC_TRY_EXT_CTRLS on
S_EXT_CTRLS EINVAL. Used in iter4 Phase 4 to diagnose the carryover
frame-11 EINVAL: discovered all four H.264 controls fail individually
on the same request_fd → diagnosis pivot from "bad control content"
to "bad request_fd state," which led to the fresh-request_fd-per-frame
fix in 385dee1.

Stays in for the iter5 DEBUG sweep alongside iter1 ENTER traces +
iter3 Y2 + iter4 Y2v3 + iter4 DPB census.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:53 +00:00
test0r 385dee1bbf iter4 fix: fresh request_fd per frame (fixes carryover EINVAL)
This is the load-bearing fix that resolves the iter1+iter2+iter3
"frame-11 EINVAL" carryover. Replace the per-surface request_fd cache
+ MEDIA_REQUEST_IOC_REINIT pattern with allocate-fresh-per-frame:
in RequestSyncSurface, after queue + wait_completion succeed, close
the request_fd and reset surface_object->request_fd = -1 so the next
BeginPicture allocates a new one via media_request_alloc.

Diagnostic root cause: per-control VIDIOC_TRY_EXT_CTRLS isolation
showed all four H.264 controls (SPS/PPS/DECODE_PARAMS/SCALING_MATRIX)
fail individually with EINVAL on the *same* request_fd that had been
through queue+wait+reinit. The fd state was bad even though every
ioctl in the previous decode cycle returned success. Allocating fresh
sidesteps any kernel-side request-state-machine subtlety we don't
fully understand.

Empirical verification (iter4 Phase 7, 90s autonomous run on ohm via
firefox-fourier without MOZ_DISABLE_RDD_SANDBOX=1, bbb_1080p30 H.264):
  - ENETDOWN count: 0
  - S_EXT_CTRLS rejected: 0  (was: fired at frame 11 every iter1-3)
  - Unable to set control(s): 0
  - Generic EINVAL: 0
  - Video stream mTime reached: 49.7 seconds
  - Audio stream mTime reached: 51.5 seconds

Cost: ~one extra MEDIA_IOC_REQUEST_ALLOC + close() per decoded frame.
Negligible (cycles below the V4L2 set_controls + queue + wait stack).

Companion fixes that landed earlier in iter4 to get to this point:
  74d8dd1 — DPB fields=V4L2_H264_FRAME_REF + skip !used entries
            (matches FFmpeg's libavcodec/v4l2_request_h264.c semantics)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:41 +00:00
test0r 4892656b3f iter4 DEBUG: pre-S_EXT_CTRLS DPB census + per-entry dump
Inline log of DECODE_PARAMS.flags, sps.max_num_ref_frames, dpb counts
(valid/active/long-term/internally-used), and per-entry frame_num /
pic_num / fields / reference_ts immediately before each S_EXT_CTRLS
submission.

Used in iter4 Phase 4 to identify (a) the dpb->fields=0 bug and
(b) the stale-entry growth bug. Stays in for iter4 Phase 4 continuation
(at least one more bug still produces EINVAL after frame ~20). Remove
at iter5 DEBUG sweep alongside iter1 ENTER/CAPTURE-dump and iter3 Y2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:44:56 +00:00
test0r 74d8dd134a iter4 partial fix: DPB fill matches FFmpeg semantics
Two empirically-validated correctness fixes from comparing our h264.c
fill_dpb against FFmpeg's libavcodec/v4l2_request_h264.c::fill_dpb on
the iter3 test rig (Firefox-fourier on ohm RK3568, bbb_1080p30 H.264):

1. Set dpb[].fields = V4L2_H264_FRAME_REF for every valid entry. The
   kernel's v4l2_h264_init_reflist_builder iterates dpb[] and skips
   entries with fields == 0 — they count as "no field reference"
   regardless of VALID/ACTIVE flags. Without this, P-slices that
   need to walk the reference list (first one in BBB is at frame 11)
   hit "no valid refs" and S_EXT_CTRLS rejects the request with
   EINVAL (error_idx == count = kernel's "application bug" sentinel).

2. Skip entries with valid=true but used=false. dpb_update() clears
   `used` for all entries then re-marks only those in the current
   ReferenceFrames[] list. Stale entries (frames the consumer has
   retired from its DPB) were being included, growing the V4L2 dpb[]
   monotonically until H264_DPB_SIZE while SPS.max_num_ref_frames
   may be 4. FFmpeg iterates h->short_ref[] / h->long_ref[] only —
   the currently-referenced set.

Empirical: from "10 frames decode, frame-11 P-slice EINVAL" to
"~20 frames decode, then a different EINVAL on later frames."
Confirms both fixes are correctness improvements but Track A is not
yet fully resolved — at least one more bug remains. iter4 stays open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:44:53 +00:00
test0r a12d29937c iter4 DEBUG: Y2 v3 — retry with TRY_EXT_CTRLS on S_EXT_CTRLS EINVAL
Per kernel comment in v4l2-ctrls-api.c:222-224, S_EXT_CTRLS deliberately
obfuscates by setting error_idx = count, while TRY_EXT_CTRLS reports
the actual failing index. Adds TRY retry inside the EINVAL diagnostic
path.

Empirical finding (iter4 Phase 4): TRY also returned error_idx == count
on the frame-11 EINVAL on bbb_1080p30. Conclusion: failure is in the
post-validate cluster commit (hantro driver's try_ctrl op or similar
state-coherence check), NOT in any individual control's std_validate.
The kernel comment may be outdated for compound controls, or the
H.264 stateless cluster is committed atomically post-validate where
error_idx is intentionally not updated for either S or TRY.

Path forward (Phase 4 next): switch from "read kernel source" to
"diff our DECODE_PARAMS construction vs FFmpeg's libavcodec/v4l2_request_h264.c"
to identify field-by-field divergence at frame 11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:25:40 +00:00
test0r 086b7ce8cb iter3 DEBUG: S_EXT_CTRLS EINVAL diagnostic in v4l2_ioctl_controls
When VIDIOC_S_EXT_CTRLS returns -EINVAL, log num_controls, error_idx,
and per-control id+size. Lets iter3+ debug "Unable to set control(s):
Invalid argument" failures by naming exactly which control set was
rejected — previously the request_log line in v4l2_set_controls just
printed strerror(errno) with no specificity.

Used in iter3 Phase 7 to confirm the frame-11 EINVAL is request-level
("error_idx == num_controls" sentinel = kernel rejected but couldn't
pinpoint a single field) rather than a single-control size mismatch.

To remove at iter4 DEBUG sweep alongside iter1 ENTER/CAPTURE-dump
instrumentation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:57:01 +00:00
test0r 4a7a07e0f4 iter3 Fix: select() → poll() in media_request_wait_completion
Firefox's RDD seccomp common policy admits poll/ppoll/epoll_* but does
NOT admit select/pselect6. Under the iter3 sandbox-patched RDD process,
our select(except_fds) call returned ENOSYS (Mozilla's seccomp uses
SECCOMP_RET_ERRNO with ENOSYS for filtered syscalls — not SIGSYS),
killing libva decode after just one BeginPicture.

poll(POLLPRI) is functionally equivalent for waiting on the media
request fd's exceptional-condition completion signal, and lives
inside a syscall family Mozilla's sandbox already permits. Driver-side
fix preferred over expanding Firefox's seccomp surface — smaller blast
radius, portable across sandbox policies, and poll() is the modern API.

Verified iter3 Phase 7 on ohm: with this change in place plus the
firefox-fourier broker + seccomp ioctl '|' patches, Firefox decodes
through libva inside the sandboxed RDD without MOZ_DISABLE_RDD_SANDBOX=1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:49 +00:00
test0r 19acc76da4 iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling
Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE
buffer. mpv reusing a surface for a new decode while the compositor still
held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to
write fresh decode output into the same physical memory the compositor
was reading -- visible as stutter / back-and-forth swap on
mpv --hwdec=vaapi --vo=gpu playback.

Architecture:
- New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers
  (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state
  {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t.
- Surfaces no longer own buffers; each vaBeginPicture acquires the
  oldest FREE slot (LRU), binds it for the decode cycle, and the slot
  cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF).
- Slot is released on next BeginPicture for the same surface or on
  vaDestroySurfaces.

Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+):
- Option-A statistical mitigation; race window narrows to "pool
  exhausted, force-recycle of oldest EXPORTED slot." For typical mpv
  16-surface playback with MIN_CAP_POOL=24 the fallback never fires.
- Multi-context concurrent use not addressed (one V4L2 device, multiple
  cap_pools -- iter3 scope).

Other call sites updated:
- picture.c::BeginPicture acquires + binds, releasing prior slot if any.
- surface.c::SyncSurface marks slot DECODED after DQBUF.
- surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR
  EXPBUF fd for force-recycle close().
- surface.c::DestroySurfaces releases via surface_unbind_slot;
  cap_pool owns the mmaps now.
- surface.c::CreateSurfaces2 destroys the pool in the resolution-change
  path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS).
- context.c::DestroyContext invokes cap_pool_destroy.
- image.c::DeriveImage skips copy_surface_to_image when current_slot is
  NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces).

Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly
recycling slot indices, real luma gradient. mpv vaapi --vo=gpu
operator-inspection follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:03:31 +00:00
test0r e64bb0852d iter2 Fix 2: conditional DRM_FORMAT_MOD_INVALID for non-64-aligned pitch
Iteration 2 Fix 2: branch on bytesperline alignment when setting
the drm_format_modifier in RequestExportSurfaceHandle. For
non-64-byte-aligned pitches (e.g. 864 for 864-wide videos),
report DRM_FORMAT_MOD_INVALID instead of DRM_FORMAT_MOD_NONE
(LINEAR explicit). Mesa's WSI rejects LINEAR buffers that
aren't scanout-aligned with 'WSI pitch not properly aligned';
MOD_INVALID tells the importer to treat as texture-only, which
is the correct behavior for buffers that don't meet scanout
alignment requirements.

Diagnosis from operator's mozilla.org session in iteration 1
close: 864-wide intro videos triggered the WSI alignment error
and Firefox fell back to SW for those videos.

Sonnet Phase 5 review endorsed the conditional approach over a
universal MOD_INVALID change to preserve LINEAR semantics for
already-aligned content (avoids unnecessary perf cost on the
common 1920-wide case).

Verification path (Phase 7 of iteration 2): Firefox loads
mozilla.org main page; check no MESA WSI errors in stderr;
operator confirms intro videos engage HW decode (or at least
don't fall back).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:18:55 +00:00
test0r 06beef6248 iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path
Fix 1 of iteration 2 per phase4_iter2_plan.md.

Adds surface_reset_format_cache() exposed from src/surface.h. Called
from RequestDestroyContext after the dual REQBUFS(0). Without this,
multi-video Firefox sessions on mozilla.org corrupted the next
session's CAPTURE format query: the kernel reset to defaults but
our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,'
so the next G_FMT returned 48x48 and the exported descriptor
encoded wrong pitch/offset.

Also adds REQBUFS(0) on CAPTURE in the resolution-change path of
RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The
existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting;
hantro derives CAPTURE format from OUTPUT format, so leftover
CAPTURE buffers from the prior resolution would also block the
implicit format change. Pre-existing bug surfaced by Sonnet's
audit; Fix 3 pool refactor would have exposed it more often.

Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/
HEIGHT cache is a static process-global, so concurrent multi-
context use still races (Sonnet 7.3 / 9.6). Iteration 2 only
addresses sequential sessions. Multi-context safety is iteration 3+.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:11:03 +00:00
test0r c036a44f98 image: fully populate VAImageFormat per VAAPI spec for NV12
QueryImageFormats and DeriveImage previously set only .fourcc and
left byte_order, bits_per_pixel, depth, and color masks zero
(uninitialized in the caller's buffer). VAAPI consumers that read
these fields (FFmpeg's hwcontext_vaapi.c::vaapi_init_pixfmt,
intel-vaapi-driver test paths) inherit caller-stack garbage with
non-deterministic behavior.

Cross-reference: Mesa's gallium/frontends/va/image.c and
intel-vaapi-driver's i965_drv_video.c both publish NV12 with
byte_order=VA_LSB_FIRST and bits_per_pixel=12. We now match.

For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask
are not meaningful (RGB-bitlayout-only fields); leave them zeroed
via memset.

Audit context: 2026-05-04 cross-reference of all libva entry
points Firefox 150 calls vs our backend implementations. The
SEPARATE_LAYERS fix (commit ac891a0) cleared the load-bearing
bug; this fixes a latent uninitialized-field issue that was
masked by mpv's tolerance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:34:50 +00:00