libva-v4l2-request-fourier

Author	SHA1	Message	Date
claude-noether	e1aca9cc6b	fresnel-fourier iter3 Phase 6 commit D: buffer.c whitelist for VAProbabilityBufferType

Phase 2 source-read assumed buffer.c was type-agnostic ("the buffer registry is type-agnostic", per the phase2_iter3_situation.md non-bugs list). FALSE. RequestCreateBuffer at buffer.c:59-70 has an explicit allow-list switch:

    case VAPictureParameterBufferType:
    case VAIQMatrixBufferType:
    case VASliceParameterBufferType:
    case VASliceDataBufferType:
    case VAImageBufferType:
        break;
    default:
        return VA_STATUS_ERROR_UNSUPPORTED_BUFFERTYPE;

Without VAProbabilityBufferType in the allow-list, the consumer gets VA_STATUS_ERROR_UNSUPPORTED_BUFFERTYPE from vaCreateBuffer for the probability buffer, BEFORE codec_store_buffer is ever reached. ffmpeg-vaapi log:

    [vp8] Failed to create parameter buffer (type 13): 15 (the requested VABufferType is not supported).

Same pattern as iter1 Commit D: the Phase 2 grep didn't find this; the runtime enumerated it authoritatively. This extends memory feedback_header_deletion_check.md ("let the compiler enumerate them"): the runtime enumerates allow-list violations the same way the compiler enumerates include-site violations.

Fix: add `case VAProbabilityBufferType:` to the buffer.c allow-list. +1 line, mechanical.

Refs:
../fresnel-fourier/phase2_iter3_situation.md (incorrect non-bug claim about buffer.c)
../fresnel-fourier/phase4_iter3_plan.md (Commit D placeholder for fix-forward, used)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 23:03:59 +00:00
claude-noether	7f84bbb50f	fresnel-fourier iter3 Phase 6 commit C: picture.c VP8 dispatch + 4 buffer-type cases + new VAProbabilityBufferType outer case + per-frame reset + surface.h params.vp8 union extension

Seven sites in picture.c plus one in surface.h wire up the VP8 codec dispatcher introduced by commit B:

1. Include: #include "vp8.h" in the codec headers block.
2. codec_set_controls: NEW case VAProfileVP8Version0_3 calling vp8_set_controls(driver_data, context, surface_object). Same shape as the MPEG-2 + HEVC dispatch.
3. codec_store_buffer VAPictureParameterBufferType: NEW VP8 case memcpy'ing into surface_object->params.vp8.picture (sizeof VAPictureParameterBufferVP8).
4. codec_store_buffer VASliceParameterBufferType: NEW VP8 case memcpy'ing into surface_object->params.vp8.slice (a single struct, no slices[] array: VP8 is frame-mode, with no multi-slice).
5. codec_store_buffer VAIQMatrixBufferType: NEW VP8 case memcpy'ing into surface_object->params.vp8.iqmatrix and setting iqmatrix_set = true.
6. codec_store_buffer: NEW outer case VAProbabilityBufferType (Phase 5 C3: NOT VAProbabilityDataBufferType, which conflates the enum with the struct name VAProbabilityDataBufferVP8; the buffer-type enum constant is VAProbabilityBufferType = 13 per va.h:2058). The inner switch dispatches by profile, with the VP8 case memcpy'ing into surface_object->params.vp8.probability and setting probability_set = true.
7. RequestBeginPicture: NEW per-frame reset for the two VP8 flags, params.vp8.iqmatrix_set = false and params.vp8.probability_set = false. Mirrors the existing iter1 (h264.matrix_set) and iter2 (h265.num_slices) per-frame resets.

surface.h extension:

8. params union: NEW vp8 struct after h265, holding the 4 VAAPI buffer-type structs (VAPictureParameterBufferVP8, VASliceParameterBufferVP8, VAIQMatrixBufferVP8 + iqmatrix_set, VAProbabilityDataBufferVP8 + probability_set). The new vp8 member adds roughly 1.3 KB; its size is dominated by VAProbabilityDataBufferVP8, whose dct_coeff_probs[4][8][3][11] alone is 1056 bytes, while the other three structs contribute only bookkeeping-sized fields. The h265 member with its slices[64] array remains the largest (~17 KB), so the union size doesn't grow.

After this commit: the backend builds and links cleanly. mpv-vaapi VP8 decode should engage end-to-end on the hantro env binding. Phase 1 criteria 1 + 2 + 3 are expected to be satisfied; criterion 4 (HW=SW byte-identical) and criterion 5 (3-codec regression) are verified at the Phase 6 smoke test + Phase 7.

Refs:
../fresnel-fourier/phase4_iter3_plan.md (Commit C site list)
../fresnel-fourier/phase2_iter3_situation.md (B6, B7, B8, B9 bug enumeration)
../fresnel-fourier/phase5_iter3_review.md (C3 VAProbabilityBufferType rename empirically verified)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 22:52:24 +00:00
claude-noether	017e27f389	fresnel-fourier iter3 Phase 6 commit B: NEW src/vp8.c + src/vp8.h + meson.build VP8 entries

Net-new VP8 codec dispatcher implemented against V4L2_CID_STATELESS_VP8_FRAME (kernel UAPI <linux/v4l2-controls.h>:1900-1958). Single batched control per frame; no init-time device-wide menus (VP8 has no DECODE_MODE/START_CODE).

Per-frame submission: ONE VIDIOC_S_EXT_CTRLS, count=1, carrying the full v4l2_ctrl_vp8_frame struct (1232 bytes, correcting the implicit ~400-byte Phase 2 estimate; entropy.coeff_probs[4][8][3][11] alone is 1056 bytes).

vp8_set_controls() implements the 10 contract clauses of phase4_iter3_plan.md:

Clause 1: single-control batched submission (count=1)
Clause 2: stack alloc + memset zero (covers all padding)
Clause 3: width/height/version/per-frame scalars; off-by-one num_dct_parts = num_of_partitions - 1 (VAAPI's count includes the first, control partition; V4L2 counts only DCT partitions)
Clause 4: DPB timestamp resolution (3 refs: last/golden/alt; NULL surface → 0 sentinel via memset; mirrors iter1 mpeg2.c::pic.forward_ref_ts)
Clause 5: loop filter (6 fields + 3 flag bits: ADJ_ENABLE/DELTA_UPDATE/FILTER_TYPE_SIMPLE)
Clause 6: quant base + delta derivation from VAAPI's per-segment absolute index matrix (subtraction recovers the signed deltas; correct for typical content per Phase 5 S1)
Clause 7: segment fields (segment_probs direct copy; flags assembled with DELTA_VALUE_MODE set unconditionally, per the FFmpeg pattern)
Clause 8: entropy table, merged from 3 VAAPI sources (Picture: y_mode + uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3][11] direct memcpy; IQMatrix: quant)
Clause 9: coder state + first-partition fields + flags assembly
Clause 10: v4l2_set_controls submission

Phase 5 review amendments incorporated:

C1: first_part_header_bits = slice->macroblock_offset, NOT 0. Kernel hantro_g1_vp8_dec.c:260 and rockchip_vpu2_hw_vp8_dec.c:372 read this field unconditionally to compute the MB-data DMA offset. Verified via source identity: vaapi_vp8.c:204 and v4l2_request_vp8.c:83 use byte-identical formulas (8 * (input - data) - bit_count - 8); VAAPI exposes the value as slice->macroblock_offset, V4L2 names it first_part_header_bits.

C2: first_part_size = slice->partition_size[0] + ((macroblock_offset + 7) / 8). VAAPI's partition_size[0] is the REMAINING byte count after parsing (vaapi_vp8.c:209; va_dec_vp8.h:193-196). The kernel needs the TOTAL control-partition size; recover it by adding back ceil(macroblock_offset / 8) bytes. Phase 3 keyframe verbatim cross-check: 21923 + 819 = 22742 ✓

C4: (int8_t) cast, NOT (s8). s8 is a kernel-internal typedef from <linux/types.h> and is not exposed to userspace; the userspace UAPI exposes __s8 (double underscore); the portable userspace cast is int8_t from <stdint.h>.

S3: assert(probability_set). Kernel hantro_vp8.c::hantro_vp8_prob_update reads coeff_probs unconditionally; there is NO default-table fallback. Practical risk is low (FFmpeg vaapi_vp8.c always sends VAProbabilityBufferType per frame), but the assert surfaces the problem immediately if a future consumer doesn't.

Flags assembly: only 5 of the 6 mainline-documented bits (KEY_FRAME, SHOW_FRAME, MB_NO_SKIP_COEFF, SIGN_BIAS_GOLDEN, SIGN_BIAS_ALT). EXPERIMENTAL and bit 0x40 are NOT replicated, despite ffmpeg-v4l2-request-git setting them on inter frames: kernel hantro_vp8.c only inspects the KEY_FRAME bit. SHOW_FRAME is forced unconditionally per Phase 3 Q4 (BBB has no invisible alt-ref frames; documented fidelity gap). VAAPI follows the VP8 bitstream convention, in which key_frame=0 means the frame IS a keyframe, so the backend writes V4L2_VP8_FRAME_FLAG_KEY_FRAME iff !picture->pic_fields.bits.key_frame.

After this commit alone: vp8.o compiles standalone and meson.build links it into the shared library. picture.c can't dispatch yet (commit C wires that up).

Refs:
../fresnel-fourier/phase4_iter3_plan.md (10 contract clauses, Phase 5 amendments section)
../fresnel-fourier/phase5_iter3_review.md (C1, C2, C3, C4, S3 all incorporated)
../fresnel-fourier/phase3_iter3_baseline.md (verbatim payload anchors)
references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp8.c (V4L2 ref)
references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c (VAAPI source ref)
references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_vp8_dec.c (RK3399 kernel driver: first_part_header_bits + first_part_size usage)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 22:51:12 +00:00
claude-noether	27d82e3cf4	fresnel-fourier iter3 Phase 6 commit A: VP8 enumeration + dispatch in config.c

Three sites enable VP8 profile recognition through the libva config path:

1. RequestQueryConfigProfiles: NEW enumeration block probing V4L2_PIX_FMT_VP8_FRAME against single-plane + MPLANE OUTPUT formats. Mirrors the iter2 HEVC enumeration block. Surfaces VAProfileVP8Version0_3 in vainfo on the hantro env binding.
2. RequestCreateConfig: NEW case VAProfileVP8Version0_3 with break, same shape as iter1 MPEG-2 + iter2 HEVCMain (no profile-specific config validation in the libva backend; validation is deferred to vaCreateContext / control submission).
3. RequestQueryConfigEntrypoints: VAProfileVP8Version0_3 added to the existing fall-through case list, which surfaces VAEntrypointVLD.

After this commit alone, vainfo lists VP8Version0_3 (Phase 1 criterion 1), but vaCreateContext / runtime decode would fail at later stages because no codec dispatcher exists yet (added in commits B + C).

Refs:
../fresnel-fourier/phase4_iter3_plan.md (Commit A site list)
../fresnel-fourier/phase2_iter3_situation.md (B1, B2, B3 bug enumeration)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 22:49:28 +00:00
claude-noether	8d71e20bf7	fresnel-fourier iter2 Phase 6 commit B: rewrite h265.c against new V4L2 stateless HEVC API

Rewrites src/h265.c (407 lines → 588 lines) and the picture.c HEVC dispatch + per-slice accumulation against the modern split V4L2_CID_STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,DECODE_PARAMS,DECODE_MODE,START_CODE} stateless controls. Replaces the staging-era V4L2_CID_MPEG_VIDEO_HEVC_{SPS,PPS,SLICE_PARAMS} CIDs that were removed from the kernel UAPI.

Per-frame submission: ONE batched VIDIOC_S_EXT_CTRLS, count=5, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS:

    0xa40a90 SPS            (40 bytes)
    0xa40a91 PPS            (64 bytes)
    0xa40a92 SLICE_PARAMS   (variable; dynamic array, one entry per slice)
    0xa40a93 SCALING_MATRIX (1296 bytes; memset-zero when no scaling list)
    0xa40a94 DECODE_PARAMS  (328 bytes; per-frame DPB info)

Plus device-wide menus set once at context.c init, in a separate batched S_EXT_CTRLS call so that a kernel without HEVC controls (e.g. hantro on RK3568/RK3399) silently fails only that batch without invalidating H.264:

    0xa40a95 DECODE_MODE (FRAME_BASED on rkvdec)
    0xa40a96 START_CODE  (ANNEX_B on rkvdec)

Reference: FFmpeg libavcodec/v4l2_request_hevc.c:505-565 (v4l2_request_hevc_queue_decode batched submission shape).

Phase 5 review amendments incorporated:

C1 (data_byte_offset, NOT data_bit_offset): old h265.c at lines 184-209 ran an 8-bit search to compute a bit-granularity offset. The new API renames the field to data_byte_offset (u32 byte offset). The bit search is dropped and replaced with a plain byte offset = source_offset + slice->slice_data_byte_offset.

C2 (dpb_entry.flags carries only LONG_TERM_REFERENCE; pic_order_cnt_val is singular; the poc_st_curr_before[], poc_st_curr_after[] and poc_lt_curr[] arrays hold DPB INDICES, not POCs): h265_fill_decode_params replaces the old slice-params DPB iteration with explicit DPB classification + index-array population. For each VAAPI ReferenceFrames[i]:
- Classify into ST_CURR_BEFORE / ST_CURR_AFTER / LT_CURR via the VA_PICTURE_HEVC_RPS_* flags.
- Set dpb[j].timestamp, .pic_order_cnt_val (singular), .field_pic.
- Set dpb[j].flags = LONG_TERM_REFERENCE iff RPS_LT_CURR.
- Append j (DPB index, u8) to poc_st_curr_before[k] / poc_st_curr_after[k] / poc_lt_curr[k] based on the classification.

C3 (union-aliasing reasoning corrected): BeginPicture's params.h265.num_slices = 0 reset is benign for non-HEVC profiles because byte ~17764 of the params union lies past any field non-HEVC profiles read, NOT because RenderPicture's per-buffer copies overwrite that location. Wording amended in phase4_iter2_plan.md per phase5_iter2_review.md.

S1 (PPS flags 19 + 20: DEBLOCKING_FILTER_CONTROL_PRESENT and UNIFORM_SPACING): empirically, VAAPI does NOT expose either flag in VAPictureParameterBufferHEVC pic_fields.bits or slice_parsing_fields.bits. Both bits are left zero. The BBB-720p10s_hevc fixture uses neither tiles nor explicit deblocking-control parameters, so the omission is correct for the iter2 binding cell.

S2 (3 PPS scalars added): pic_parameter_set_id (default 0; VAAPI doesn't expose it), num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_active_minus1 (both populated from the VAAPI picture struct).

Q2 (slice_segment_addr populated): missing in old h265.c. Now sourced from VAAPI's slice->slice_segment_address.

S3 (SCALING_MATRIX content choice): implementer choice taken. When iqmatrix_set==false (BBB has no scaling list, per SPS flags = SAO|STRONG_INTRA_SMOOTHING), h265_fill_scaling_matrix sends memset-zero. This matches FFmpeg's sl=NULL pattern at v4l2_request_hevc.c:384-403 (preserving byte-equality against the cross-validator anchor).

S4 (FFmpeg function name fix): cosmetic; no code impact.

Plus one Phase 6 inline correction: Phase 5 review S1 suggested VAAPI exposes uniform_spacing_flag in pic_fields.bits; an empirical test-compile shows it doesn't. A comment in h265_fill_pps documents the omission.

picture.c changes (3 edits):
1. codec_set_controls HEVCMain dispatch (lines 204-206 → call h265_set_controls; replaces the explicit "Fourier-local: HEVC stripped" reject).
2. codec_store_buffer HEVC VASliceParameterBufferType case: append the VAAPI slice param to the params.h265.slices[N] array and increment num_slices. The single-slice mirror at .slice is retained for h265_fill_pps (which reads dependent_slice_segment_flag from LongSliceFlags).
3. RequestBeginPicture: add a params.h265.num_slices = 0 reset alongside the existing h264.matrix_set = false reset.

surface.h: extend the params.h265 struct with a slices[HEVC_MAX_SLICES_PER_FRAME=64] array + num_slices counter. ~17 KB extra per surface union; 24 surfaces in the iter7 cap_pool = ~400 KB total surface_heap growth. The object_heap allocator picks up the new size automatically via sizeof(struct object_surface).

context.c: a separate 2-control batched call sets HEVC DECODE_MODE + START_CODE device-wide. Same best-effort (void)v4l2_set_controls pattern as the existing H.264 device-init block; if the kernel doesn't advertise HEVC controls (hantro on RK3568/RK3399), the batch silently fails without invalidating the H.264 batch.

meson.build: uncomment 'h265.c' (line 50) and 'h265.h' (line 73) in the sources + headers lists.

h265.h: added the HEVC_MAX_SLICES_PER_FRAME=64 #define before the struct forward declarations.

Phase 6 smoke test on fresnel (post Commit A + Commit B):

Criterion 1: vainfo lists VAProfileHEVCMain on the rkvdec env binding (/dev/video1 + /dev/media0). PASS.

Criterion 3: ffmpeg -hwaccel vaapi HEVC decode of bbb_720p10s_hevc.mp4 -frames:v 5 -f null -, exit 0. cap_pool_init: 24 slots ready. PASS.

Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek, HEVC fixture:
    HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
HW=SW byte-identical for both frames; frame1 != frame2 (real motion). PASS.

Criterion 5: regression hashes hold for both prior cells:
    H.264 +30s HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
    H.264 +30s HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
    MPEG-2 +02s HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
    MPEG-2 +02s HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)
PASS.

All five criteria green on the first build attempt. The Phase 5 review caught the 3 Critical UAPI errors (data_bit_offset → data_byte_offset rename; dpb.rps field gone + pic_order_cnt_val rename + index-array semantics) that would otherwise have been Phase 6 compile failures or silent Phase 7 byte-compare divergences. Without that review pass, this commit would have been the start of a 2+ loopback debugging cycle.

Refs:
../fresnel-fourier/phase4_iter2_plan.md (10 contract clauses, File 4 patch shape)
../fresnel-fourier/phase5_iter2_review.md (C1, C2, C3, S1, S2, S3, S4, Q2 amendments all incorporated)
../fresnel-fourier/phase0_evidence/2026-05-08/iter2_phase3/ffmpeg_v4l2req.stdout (cross-validator anchor; Phase 7 bonus byte-compare verification target)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:58:34 +02:00
claude-noether	cca539d5f9	fresnel-fourier iter2 Phase 6 commit A: config.c break for HEVCMain case

RequestCreateConfig dispatches the H.264 + MPEG-2 cases via break. HEVCMain previously fell through to default, returning VA_STATUS_ERROR_UNSUPPORTED_PROFILE (= 12). This is the same fall-through pattern iter1 fixed for MPEG-2; iter2 closes the loop for HEVC.

Add break for VAProfileHEVCMain. Same shape as the iter1 Commit A pattern: no profile-specific config validation in RequestCreateConfig (validation happens at vaCreateContext / control submission time).

This is the substrate fix only. After this commit:
- vaCreateConfig(VAProfileHEVCMain) returns SUCCESS
- mpv-vaapi HEVC ATTEMPTS to set up the hwaccel path
- codec_set_controls at picture.c:204-206 still has the explicit case VAProfileHEVCMain: return UNSUPPORTED_PROFILE reject in place
- decode fails downstream with -5 (Input/output error)

Bug 2 (picture.c reject removal) + Bugs 3-7 (h265.c rewrite + meson re-enable + slice_params accumulation + device-init extension) land together in commit B, where h265_set_controls exists to dispatch to.

Verified empirically in Phase 3 Baseline D (scratch test on a throwaway branch): with this break alone, vaCreateConfig returns SUCCESS for HEVCMain, V4L2 setup proceeds, and decode fails at the picture.c reject, confirming the Phase 2 prediction. T4 H.264 + iter1 MPEG-2 reference hashes hold (no collateral regression).

Refs:
../fresnel-fourier/phase0_findings_iter2.md (Phase 1 lock)
../fresnel-fourier/phase2_iter2_situation.md Bug 1
../fresnel-fourier/phase3_iter2_baseline.md Baseline D
../fresnel-fourier/phase4_iter2_plan.md Clause 8, File 1
../fresnel-fourier/phase5_iter2_review.md (no Critical findings touch this commit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:00:30 +02:00
claude-noether	229d6d11be	fresnel-fourier iter1 Phase 6 commit D: drop missed mpeg2-ctrls.h include from context.c

Fix-forward for commit C (`3aab187`): the Phase 2 source-read missed a third occurrence of #include <mpeg2-ctrls.h>, in src/context.c:42. The Phase 2 grep audit reported only two callsites (src/config.c:37, src/mpeg2.c:38), both removed in commit B. After commit C deleted include/mpeg2-ctrls.h from disk, the build broke on context.c with:

    ../src/context.c:42:10: fatal error: mpeg2-ctrls.h: No such file or directory
       42 | #include <mpeg2-ctrls.h>
          |          ^~~~~~~~~~~~~~~

The include in context.c was vestigial: context.c references no V4L2_CID_MPEG_VIDEO_MPEG2_* symbols and never needed the header, even before iter1's rewrite. The Phase 2 grep was simply incomplete.

This commit drops the orphan include line. The build now passes and installs cleanly; Phase 1 criterion 4 (DMA-BUF GL HW=SW byte-identical pixel hashes) still PASSES:

    HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    SW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
    SW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de

Per feedback_dev_process.md Phase 6 discipline: "If a plan revision is needed mid-implementation, surface it explicitly and re-enter Phase 4." This is a 1-line scope expansion of commit B's "drop mpeg2-ctrls.h include from all callsites" intent, surfaced explicitly here rather than by silently amending B (which is already pushed). No re-lock of the plan is needed; the spirit of File 1+2 in phase4_iter1_plan.md was "drop the include from every file that has it." The audit method (Phase 2 grep) was the gap.

Lesson for the Phase 8 memory update: use a more authoritative completeness check than a naive grep before deleting a header. A recursive build attempt to drive out hidden includes, or a grep with no path filter, would have caught it.

Refs:
../fresnel-fourier/phase4_iter1_plan.md (File 3 + audit)
../fresnel-fourier/phase2_iter1_situation.md Bug 3 (incomplete audit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 10:24:50 +02:00
claude-noether	3aab1879cb	fresnel-fourier iter1 Phase 6 commit C: delete staging-era include/mpeg2-ctrls.h

Removes the local fork-internal header include/mpeg2-ctrls.h. The header explicitly self-described as staging-era in its preamble:

    These are the MPEG2 state controls for use with stateless MPEG-2
    codec drivers.

    It turns out that these structs are not stable yet and will undergo
    more changes. So keep them private until they are stable and ready to
    become part of the official public API.

The structs DID stabilize and become public in mainline Linux, at different CIDs (V4L2_CID_STATELESS_MPEG2_{SEQUENCE,PICTURE,QUANTISATION} = 0xa409dc/dd/de) and with redesigned struct layouts (split sequence/picture/quantisation, slice-header parsing moved kernel-side, boolean fields collapsed into a flags bitmask).

Before this commit, two source files included this header:
- src/config.c:37 #include <mpeg2-ctrls.h>
- src/mpeg2.c:38 #include <mpeg2-ctrls.h>

Both includes were removed in commit B. After this commit:

    $ git grep -l 'mpeg2-ctrls' --
    (no matches)

The kernel UAPI providing the new MPEG-2 stateless symbols is in <linux/v4l2-controls.h>, pulled in transitively via <linux/videodev2.h> (and explicitly in src/mpeg2.c).

include/hevc-ctrls.h is kept untouched per phase5_iter1_review.md Nit 6 (lower-risk path; the HEVC iteration will delete its corresponding staging header in a separate commit).

Refs:
../fresnel-fourier/phase4_iter1_plan.md (File 3)
../fresnel-fourier/phase5_iter1_review.md (Nit 6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 10:18:43 +02:00
claude-noether	5fe873c144	fresnel-fourier iter1 Phase 6 commit B: rewrite mpeg2.c against new V4L2 stateless API

Rewrites src/mpeg2.c to submit the MPEG-2 control payload via the new split V4L2_CID_STATELESS_MPEG2_{SEQUENCE,PICTURE,QUANTISATION} controls (mainline kernel <linux/v4l2-controls.h>:1985-2105), replacing the staging-era V4L2_CID_MPEG_VIDEO_MPEG2_{SLICE_PARAMS,QUANTIZATION} combined-struct API that the kernel removed.

Per-frame submission: one batched VIDIOC_S_EXT_CTRLS, count=3, which=V4L2_CTRL_WHICH_REQUEST_VAL (0xf010000; this field shares a union with ctrl_class, and V4L2_CTRL_CLASS_CODEC_STATELESS itself is 0xa40000), with the three controls in order:
- id=0xa409dc (SEQUENCE) size=12 bytes
- id=0xa409dd (PICTURE) size=32 bytes
- id=0xa409de (QUANTISATION) size=256 bytes

Matches the FFmpeg libavcodec/v4l2_request_mpeg2.c:130-155 reference implementation. Verified empirically against the fresnel-fourier Phase 0 cross-validator anchor (bit-for-bit byte equivalence on SEQUENCE first-row + QUANTISATION 256 bytes).

Six structural changes from the old to the new API:
1. Slice-header parsing moved to the kernel: bit_size, data_bit_offset and quantiser_scale_code are GONE from the new structs.
2. Reference timestamps moved from slice to picture: backward_ref_ts/forward_ref_ts are now in v4l2_ctrl_mpeg2_picture (offsets 0/8).
3. Boolean fields collapsed into the picture.flags bitmask (TOP_FIELD_FIRST 0x01 .. PROGRESSIVE 0x80, 8 bits total).
4. progressive_sequence collapsed into sequence.flags & V4L2_MPEG2_SEQ_FLAG_PROGRESSIVE.
5. PICTURE_CODING_TYPE renamed to PIC_CODING_TYPE (values unchanged).
6. Quantisation load_* flags removed; matrices are always present. British spelling: quantiSation, not quantiZation.

Behavioral correction (a latent bug in the old code): old src/mpeg2.c:104-118 self-referenced the surface_object timestamp when the VAAPI ref picture was VA_INVALID_ID. The new code sets the ref_ts to 0, matching the kernel doc's 0-as-sentinel convention (verified in Phase 3 Baseline C: the I-frame has both ts == 0; FFmpeg v4l2_request_mpeg2.c:98-108 uses the same convention).

Quantisation matrix order: zigzag scanning order per kernel doc v4l2-controls.h:2076. VAAPI VAIQMatrixBufferMPEG2 stores matrices in zigzag order (per the VAAPI spec), so a direct memcpy works with no permutation in the libva backend. Kernel hantro_mpeg2.c::hantro_mpeg2_dec_copy_qtable applies the zigzag-to-raster permutation when copying into the hardware quantisation table.

Default matrices (when iqmatrix_set==false): MPEG-2 spec defaults per ISO/IEC 13818-2 Table 7-3. The mpeg2_default_intra_matrix constant was transcribed from the fresnel-fourier Phase 3 Baseline C QUANTISATION verbatim payload bytes 0..63 (256-byte capture from an ffmpeg-v4l2request decode of bbb_720p10s_mpeg2.ts), per the phase5_iter1_review.md S3 amendment that flagged spec recall as unreliable. non_intra and chroma_non_intra are all 16s per spec (verified against Baseline C bytes 64..127 and 192..255). chroma_intra is a copy of intra (Baseline C bytes 128..191, verified identical).

Submission shape: one batched v4l2_set_controls call with all three v4l2_ext_control entries, matching the iter6/7/8 H.264 pattern at src/h264.c:986. Bound to surface_object->request_fd (the per-OUTPUT-slot permanent request_fd from the iter6 binding).

Behavioral details:
- sequence.vbv_buffer_size = surface_object->source_size, where source_size is set in picture.c:276 from request_pool slot->size, which is the V4L2-negotiated sizeimage from VIDIOC_QUERYBUF. Matches FFmpeg controls->pic.output->size.
- sequence.profile_and_level_indication = 0; not exposed by VAAPI VAPictureParameterBufferMPEG2.
- sequence.chroma_format = 1 (4:2:0) hardcoded; the campaign's codec scope is 4:2:0.
- progressive_frame proxies for progressive_sequence; the two bits agree for typical streams.

Phase 6 smoke test (post Commit A + Commit B):
- vainfo enumerates VAProfileMPEG2Simple + VAProfileMPEG2Main on the hantro bind. (Phase 1 criterion 1)
- libva trace: vaCreateConfig(VAProfileMPEG2Main) = VA_STATUS_SUCCESS. (Phase 1 criterion 2)
- ffmpeg -hwaccel vaapi exits 0 with no "Failed to create decode configuration" error. (Phase 1 criterion 3, adjusted)
- mpv --hwdec=vaapi --vo=image at +02s seek: 2 distinct frames with hashes byte-identical to the SW reference:
    HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    SW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
    SW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
  (Phase 1 criterion 4: DMA-BUF GL import path; cache-coherency-safe)
- T4 H.264 reference hashes still match (criterion 5; verified in Phase 3 Baseline D earlier).

Cache-stale class observation (out-of-scope iter1 work item): the ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi + hwdownload pipeline produces all-zero NV12 for MPEG-2 (the same iter1 patch-0011 cache-coherency bug class observed for H.264 in fresnel-fourier T4). Kernel + HW decode is correct (verified via ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime + hwdownload, which produces correct non-zero pixels matching the SW reference). The bug is in the libva backend's vaDeriveImage path; Phase 4 cross-cutting work item: add VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC support. Not blocking iter1: the DMA-BUF GL import path (mpv --vo=image) is cache-coherency-safe and gives bit-exact pixels.

Auxiliary EINVAL noise (out-of-scope iter1 work item): src/context.c:142-155 unconditionally sets the H.264 device-wide controls (V4L2_CID_STATELESS_H264_DECODE_MODE, _START_CODE) on every CreateContext, regardless of profile. These return EINVAL on hantro-vpu-dec (no H.264 controls there). This is intentional best-effort behavior: the return value is cast to (void) and discarded at line 153. The error message "Unable to set control(s): Invalid argument" is logged from src/v4l2.c:484 but doesn't propagate as a backend error. It stays as documented auxiliary noise.

Drop #include <mpeg2-ctrls.h> from src/config.c:37 and src/mpeg2.c (formerly line 38). The kernel UAPI for MPEG-2 stateless control IDs comes from <linux/v4l2-controls.h>, pulled in transitively via <linux/videodev2.h> (and explicitly from src/mpeg2.c after this rewrite). The fork-local include/mpeg2-ctrls.h header is deleted in commit C; this commit removes the last includes of it. src/config.c:38 still includes <hevc-ctrls.h>, left untouched per phase5_iter1_review.md Nit 6 (lower-risk path; the HEVC iteration deletes its header).

Refs:
../fresnel-fourier/phase4_iter1_plan.md (contract clauses 1-6, File 2 patch shape)
../fresnel-fourier/phase5_iter1_review.md (S3, Q4, Q5 amendments)
../fresnel-fourier/phase0_evidence/2026-05-07/iter1_phase3/baseline_C_xvalidator/ffmpeg.stdout (cross-validator anchor)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 10:17:40 +02:00
claude-noether	e7dad7abb5	fresnel-fourier iter1 Phase 6 commit A: config.c break for MPEG-2 cases

RequestCreateConfig dispatches the H.264 cases via // FIXME + break; the MPEG-2 + HEVC cases fell through to default:, which returns VA_STATUS_ERROR_UNSUPPORTED_PROFILE (= 12). For MPEG-2, the fall-through was a leftover from the H.264 focus of libva-multiplanar iter1-iter5: nobody on that campaign tested MPEG-2 end-to-end, so the missing break never surfaced as a bug there.

Add break for the VAProfileMPEG2Simple + VAProfileMPEG2Main cases. HEVC stays in the fall-through (h265.c is excluded from the build per fresnel-fourier campaign Phase 0 finding F-C; an honest UNSUPPORTED_PROFILE is correct until h265.c is reinstated in a later iteration).

This is the substrate fix only. After this commit, vaCreateConfig returns SUCCESS for MPEG-2, but actual decode still fails at VIDIOC_S_EXT_CTRLS time because src/mpeg2.c uses staging-era control IDs that the mainline kernel removed. That fix lands in commit B (mpeg2.c rewrite against the new V4L2_CID_STATELESS_MPEG2_* split API).

Verified empirically in Phase 3 baseline B (scratch fix on a throwaway branch): with this break in place, vaCreateConfig returns SUCCESS, V4L2 setup proceeds (CREATE_BUFS, REQBUFS, QUERYBUF, STREAMON, REQUEST_ALLOC, QBUF/DQBUF), then VIDIOC_S_EXT_CTRLS id=V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS (0x9909fa) returns -1 EINVAL, exactly the next failure mode predicted by phase2_iter1_situation.md.

H.264 regression check (T4 reference hashes): with the scratch fix in place, mpv --hwdec=vaapi at +30s into bbb_1080p30_h264.mp4 produces JPEG hashes f623d5f7... (frame 1) and 7d7bc6f2... (frame 2), exactly matching the SW reference and the T4 baseline. No H.264 regression.

Refs:
../fresnel-fourier/phase0_findings_iter1.md (Phase 1 lock)
../fresnel-fourier/phase2_iter1_situation.md Bug 1
../fresnel-fourier/phase3_iter1_baseline.md Baseline A + B
../fresnel-fourier/phase4_iter1_plan.md Clause 6, File 1
../fresnel-fourier/phase5_iter1_review.md (Nit 6, kept smaller)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 07:45:35 +02:00
claude-noether	65969da3ee	iter8 Phase 4: tests/run_perf_binding_cell.sh — perf binding cell harness

Anchors campaign-wide claims with measured numbers. Runs four consumer configurations against $FIXTURE for $DURATION seconds each:
1. mpv --hwdec=vaapi (DMA-BUF zero-copy through libva)
2. mpv --hwdec=vaapi-copy (HW decode + VAImage readback)
3. firefox (iter5-amend, sandbox enabled, file:// URL)
4. mpv --hwdec=no (SW decode baseline / control)

Captures per consumer: CPU% (median + p90 from pidstat), GPU freq median (from /sys/class/devfreq/fde60000.gpu/cur_freq, polled at 100 ms cadence), drops in window (from mpv --term-status-msg), p50 frame interval (mpv only), VmRSS delta (from /proc/PID/status).

Emits a markdown table with raw numbers per consumer: no aggregation, no improvement ratios, no curated-benchmark framing. Honest schema, including '—' for measurements not available for a given consumer (e.g. Firefox drops without internal hooks).

The Phase 5 sonnet review caught 3 issues, all addressed before commit:
1. pidstat $8 column heuristic: replaced with header-driven %CPU field detection (robust across sysstat 12.x point releases)
2. GPU freq median computation used /dev/stdin in a nested subshell-over-pipe (unreliable): replaced with a temp-file path
3. --frames=$((DURATION * 30)) hardcoded 30 fps (fixture hardcoding, per feedback_no_fixture_hardcoding.md): replaced with --length=$DURATION (wall-time bounded, framerate-agnostic)

Plus a minor fix: an empty cpu_pct.log now emits ERR rather than a silent 0, distinguishing measurement failure from "process used no CPU."

Reproducibility surface: run date, host, kernel, driver sha256, fixture path + size, and duration are captured in the output markdown. Hardware constants (/dev/video1, /dev/media0, devfreq path, driver install path) are documented as PineTab2-specific (RK3566 via hantro/rk3568-vpu).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 11:59:13 +00:00
claude-noether	dcaa1f12e5	docs: clarify Rockchip silicon — PineTab2 is RK3566, not RK3568 Surfaced during iter7 Track F research: the campaign target hardware is Rockchip RK3566 silicon (PineTab2). The hantro driver attaches via the rockchip,rk3568-vpu DT compatible because the RK3566 silicon is close enough to RK3568 to share that variant. The proper RK3566 mainline driver target (rkvdec2 / vdpu346) has no kernel support yet — Christian Hewitt's patch series LKML 2025/12/26/206 is unmerged. Updates the two src/ comments that called the hardware "RK3568": - context.c: hantro-vpu device-init S_EXT_CTRLS comment now reads "via rockchip,rk3568-vpu DT compatible (covers RK3568 and RK3566 — PineTab2 silicon — since they're close enough)" - h264.c: DPB pic_num discussion ends "...never surfaced on PineTab2 (RK3566 via hantro/rk3568-vpu)" Not a correctness change. Compiles + decodes identically. The update matters for upstream submission accuracy (bootlin/Rockchip maintainers will care which silicon the campaign tested on). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 11:39:11 +00:00
claude-noether	7bd0818792	iter7 Phase 7 finalization: OUTPUT-pool teardown + test refinements Surfaced during Phase 7 verification on ohm: 1. OUTPUT pool stale-slot bug (src/surface.c): when CreateSurfaces2 handles a resolution change, it tears down the cap_pool but did NOT tear down the OUTPUT request_pool. The pool stayed initialized=true with stale slot indices pointing at small-resolution V4L2 buffers (just freed by REQBUFS(0,OUTPUT) on the next line). Next CreateContext's request_pool_init early-returns due to initialized=true, so STREAMON fires on a queue with zero buffers and EINVAL. Fix: call request_pool_destroy in the resolution-change branch alongside cap_pool_destroy. Mirror the cap_pool teardown. Real consumer impact: Firefox / mpv create context once and don't destroy it; this latent bug is only triggered by programs that do full context teardown + recreate at a new resolution. Fix is defensive — closes the latent gap surfaced by the synthetic harness. 2. cap_pool_probe_pattern.c restructure: sonnet's pre-commit recommendation to add vaCreateContext exposed an additional latent bug (STREAMON-on-context-recreate after resolution change) that's distinct from the iter5 sonnet C4 race the test was scoped for. Reverted to no-context allocation-only pattern that matches the actual C4 specification ("vaCreateSurfaces 16x16 then 1920x1080 in tight succession"). The new STREAMON bug is logged as iter8 candidate. 3. run_cap_pool_probe.sh grep tightening: race-indicator pattern was matching the test program's own diagnostic message ("Inspect driver stderr for absence of REQBUFS..."). Now grep restricts to lines starting with "v4l2-request:" prefix. Phase 7 results (clean iter7 driver sha 54999017... + this fix): - Track A (msync verify): 100 frames byte-for-byte SW=HW (sha 58c8f3f4...) 
-> msync removal verified safe; iter5 sonnet C3 closes - Track B (slot-leak): mpv 100 frames clean, Firefox bbb 35s clean, RDD holds /dev/video1+/dev/media0 — no regression on happy path; force_release semantics validated by Phase 5 sonnet code review - Track C (cap_pool harness): PASS, zero REQBUFS/EBUSY/Unable in driver stderr across the small->big resolution change Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 09:29:46 +00:00
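The OUTPUT-pool bug above comes down to an init function that early-returns on an `initialized` flag nobody clears on resolution change. A minimal sketch of that shape, with a hypothetical `struct output_pool` standing in for the driver's request pool (only the flag and the sized-for resolution are modeled):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the OUTPUT request pool. */
struct output_pool {
    bool initialized;
    unsigned width, height;   /* resolution the slots were sized for */
};

static void pool_init(struct output_pool *p, unsigned w, unsigned h)
{
    assert(w > 0 && h > 0);
    if (p->initialized)       /* early return keeps whatever slots already exist */
        return;
    p->width = w;
    p->height = h;
    p->initialized = true;
}

static void pool_destroy(struct output_pool *p)
{
    p->initialized = false;   /* next pool_init rebuilds from scratch */
    p->width = p->height = 0;
}

/* Resolution-change branch after the fix: tear the OUTPUT pool down
 * alongside the CAPTURE pool, before REQBUFS(0) frees its buffers. */
static void on_resolution_change(struct output_pool *p)
{
    pool_destroy(p);
    /* ... cap_pool teardown, REQBUFS(0, OUTPUT), S_FMT at the new size ... */
}
```

Without the `pool_destroy` call, the second `pool_init` silently keeps the small-resolution slots, which is the stale-slot state that later makes STREAMON fail.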
claude-noether	988b848908	iter7: A+B+C — slot-leak fix, cap_pool harness, msync verify harness Closes three internal carry items in one fork commit. iter6 deferred these as TODOs; iter7 lands the implementations + supporting tests. # Track B — slot-leak error recovery (src/) iter6 documented the RequestSyncSurface error paths as a "bounded leak we accept" — slots stayed busy=true after REINIT/DQBUF failures until RequestTerminate ran. With pool=16 and rare errors this was acceptable, but a sustained-error scenario could starve the pool. Adds request_pool_force_release(pool, index) which: 1. Tries media_request_reinit on the slot's fd (cheap path) 2. Falls back to close + media_request_alloc (recovery) 3. Leaves the slot dead-busy if even alloc fails (other slots unaffected, pool capacity reduced by 1 until destroy) Wires it into surface.c RequestSyncSurface error paths only for errors before the OUTPUT-DQBUF attempt. After OUTPUT-DQBUF failure the V4L2 buffer is in indeterminate kernel state, so a separate error label (`error_buffer_indeterminate`) leaves the slot dead-busy — reusing the slot would QBUF on a kernel-still-held buffer and EINVAL. Phase 5 sonnet review caught this discriminator subtlety pre-commit. Files: request_pool.{h,c}, surface.c. # Track C — cap_pool race synthetic harness (tests/) iter5 sonnet C4 / iter6 candidate A: cap_pool resolution-change race was organically exercised by YT's quality renegotiations (iter6 close, 4 cap_pool_init events clean) but had no deterministic regression test. tests/cap_pool_probe_pattern.c — ~170-line C program: opens libva display, vaCreateConfig, vaCreateSurfaces(small) + vaCreateContext (triggers OUTPUT pool init at small resolution), dispose, vaCreateSurfaces(big) + vaCreateContext (forces S_FMT on the new resolution against an in-use OUTPUT pool — the actual race-hitting path). 
Phase 5 sonnet flagged that without vaCreateContext the test would pass trivially (OUTPUT pool never init'd, REQBUFS(0) on empty queue is a no-op). Fixed before commit. tests/run_cap_pool_probe.sh — runner; greps driver stderr for REQBUFS / EBUSY / "Unable to set format" race indicators. # Track A — msync pixel-correctness verify harness (tests/) iter5 sweep removed msync(MS_SYNC|MS_INVALIDATE) from CAPTURE DQBUF path. iter5 sonnet C3 flagged: no formal pixel verification. tests/run_msync_pixel_verify.sh — runs FFmpeg SW decode (libavcodec reference) and FFmpeg HW decode (via our v4l2_request driver), compares NV12 byte streams. Probes fixture dimensions via ffprobe and uses crop=$W:$H after hwdownload to normalize MB-padding artifacts (hantro pads height to 16-line align; SW returns crop-aligned). Phase 5 sonnet flagged the stride-mismatch false-failure risk pre-commit. Fixed: explicit crop + diagnostic that distinguishes genuine pixel divergence from MB-padding stride artifacts. # Phase 5 sonnet code review Verdict: APPROVE-WITH-CHANGES. Three actionable findings, all addressed before this commit: 1. surface.c error path: separated OUTPUT-DQBUF-failure into error_buffer_indeterminate label, slot stays dead-busy 2. cap_pool_probe_pattern.c: added vaCreateContext to actually exercise the OUTPUT pool init at the small resolution 3. run_msync_pixel_verify.sh: explicit crop on HW path, stride-mismatch diagnostic distinguished from corruption Empirical verification (Phase 6+7 deploy + run): pending operator ohm-tools availability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 06:49:48 +00:00
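The three-step fallback in request_pool_force_release can be sketched with the kernel calls abstracted behind hypothetical hooks, so the control flow is testable without a media device. In the driver these hooks would wrap media_request_reinit, close and media_request_alloc; the struct and hook names here are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot; only the fields the fallback touches. */
struct slot {
    int request_fd;
    bool busy;
};

/* Hypothetical hooks standing in for the real request-fd calls. */
struct request_ops {
    int (*reinit)(int fd);     /* 0 on success */
    void (*close_fd)(int fd);
    int (*alloc)(void);        /* new fd, or -1 on failure */
};

/* Returns true if the slot is reusable afterwards. */
static bool force_release(struct slot *s, const struct request_ops *ops)
{
    assert(s->busy);
    if (ops->reinit(s->request_fd) == 0) {   /* 1. cheap path */
        s->busy = false;
        return true;
    }
    ops->close_fd(s->request_fd);            /* 2. recovery path */
    s->request_fd = ops->alloc();
    if (s->request_fd >= 0) {
        s->busy = false;
        return true;
    }
    return false;                            /* 3. dead-busy until pool destroy */
}

/* Fakes for a dry run without hardware. */
static int fake_reinit_ok(int fd) { (void)fd; return 0; }
static int fake_reinit_fail(int fd) { (void)fd; return -1; }
static void fake_close(int fd) { (void)fd; }
static int fake_alloc_ok(void) { return 42; }
static int fake_alloc_fail(void) { return -1; }
```

Per the commit, this path is only taken for errors before the OUTPUT-DQBUF attempt; after a DQBUF failure the slot is deliberately left dead-busy instead.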
claude-noether	a09c03c154	iter6 fix: per-OUTPUT-slot request_fd binding via REINIT iter4 (`385dee1`) replaced the original media_request_reinit pattern with close+media_request_alloc per frame to escape an EINVAL on S_EXT_CTRLS that turned out to be a DPB-payload bug (`74d8dd1`, FFmpeg V4L2_H264_FRAME_REF semantics). The per-frame close+alloc model worked for mpv vaapi-copy (single-surface recycle) but raced under Firefox 150's MediaSource pipeline (multi-surface rotation): fd=30 got reused via lowest-free-fd allocation faster than the kernel-side per-buffer state-machine could tear down the prior request, producing intermittent VIDIOC_QBUF EINVAL on OUTPUT after 1..53 successful frames. Phase 2 telemetry confirmed: - DQBUF returned the index we passed (no FIFO mismatch) - SPS/PPS/DECODE_PARAMS/SCALING_MATRIX byte-identical between mpv and Firefox first 64 bytes - Pool size bump 4 -> 16 only delayed the failure (62 frames) - Different OUTPUT slot indices failed across runs (race signature) Fix: each OUTPUT pool slot owns a permanent request_fd allocated once at request_pool_init and REINIT'd between uses in RequestSyncSurface. 1:1 slot-to-fd binding eliminates cross-slot fd reuse entirely. Pool stays driver-wide (multi-context safe per iter5 Track E); slots cycle through 16 distinct fds in round-robin acquire.
Files: - request_pool.h: add request_fd field to slot struct; init signature takes media_fd - request_pool.c: alloc per-slot fd at init, close at destroy - context.c: pass driver_data->media_fd; pool size 4 -> 16 - picture.c: BeginPicture binds slot->request_fd to surface; EndPicture's per-frame media_request_alloc removed - surface.c: RequestSyncSurface uses media_request_reinit instead of close+alloc; DestroySurfaces close removed (slot owns fd); error path close removed; surface_object NULL-init for the -Wmaybe-uninitialized warning fix Empirical verification (clean build sha ebe396d5..., no diagnostic instrumentation): - Firefox 150 + bbb_1080p30_h264.mp4 + LIBVA_DRIVER_NAME=v4l2_request + sandbox enabled: 35s+ playback, zero "Unable to queue buffer" / "Unable to set control(s)", lsof shows RDD process holds /dev/video1 + /dev/media0 throughout. Driver stderr: only the single cap_pool_init: 24 slots ready line. - mpv vaapi-copy 50 frames: zero errors, "Using hardware decoding (vaapi-copy)" - no regression vs iter5-end driver. Pool-size bump diagnostic (Phase 5 sonnet design review feedback): 4 -> 16 alone took 1->62 frames, far short of the 30s success criterion (~900 frames at 30fps). REINIT discipline is the actual fix; pool 16 is comfortable headroom over typical H.264 MaxDpbFrames. Phase 5 sonnet code review: APPROVE-WITH-CHANGES (one comment attribution corrected: cleanup runs at RequestTerminate, not RequestDestroyContext, since the pool is driver-wide). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 21:30:39 +00:00
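The round-robin acquire over slots that each own a permanent request_fd might look like the sketch below. The struct layout and cursor are assumptions; only the pool size of 16 and the "fd allocated once, REINIT'd between uses" rule come from the commit. The point of the cursor is that a just-released slot is not immediately handed out again, unlike lowest-free-fd allocation.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 16           /* pool size from the commit */

/* Hypothetical slot/pool layout for illustration. */
struct rr_slot {
    int request_fd;            /* allocated once at init, REINIT'd between uses */
    bool busy;
};

struct rr_pool {
    struct rr_slot slots[POOL_SIZE];
    unsigned next;             /* round-robin cursor */
};

/* Acquire the next free slot at or after the cursor; -1 if all are busy. */
static int rr_acquire(struct rr_pool *p)
{
    for (unsigned n = 0; n < POOL_SIZE; n++) {
        unsigned i = (p->next + n) % POOL_SIZE;
        if (!p->slots[i].busy) {
            p->slots[i].busy = true;
            p->next = (i + 1) % POOL_SIZE;
            return (int)i;
        }
    }
    return -1;
}

/* After DQBUF + REINIT: the slot keeps its fd, only the busy flag clears. */
static void rr_release(struct rr_pool *p, int i)
{
    assert(i >= 0 && i < POOL_SIZE && p->slots[i].busy);
    p->slots[i].busy = false;
}
```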
Markus Fritsche	c8b6edec3d	iter5 sweep follow-up: remove additional DEBUG sites flagged by Phase 5 review Phase 5 sonnet review caught four DEBUG sites the first sweep pass missed (the vaapi-copy + --vo=null stress test didn't exercise the ExportSurfaceHandle path, so per-frame ExportSurfaceHandle dumps went undetected). Removed: - surface.c::CreateSurfaces2 format-dump (per-CreateSurfaces2 noise, labeled DEBUG INSTRUMENTATION (surface-export diagnosis 2026-05-04)) - surface.c::ExportSurfaceHandle full-descriptor dump (per-frame for consumers using DMA-BUF, also labeled DEBUG) - surface.c::QuerySurfaceStatus -> status= line (per-call noise) - h264.c V4L2 readback block (~67 lines): static bool readback_warned + the per-frame VIDIOC_G_EXT_CTRLS attempt + the readback success log + the "V4L2 readback unavailable" fallback announcement. With the iter4 fixes landed, the readback EACCES is no longer load-bearing to investigate — drop the block + the per-process global state. Removing the readback block also resolves Phase 5 finding C2: the static bool readback_warned was new mutable process-global state introduced post-Track-E, inconsistent with that track's intent. Net: -107 lines from src/{h264,surface}.c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 16:04:03 +00:00
Markus Fritsche	b993355507	iter5 Track E: move LAST_OUTPUT_WIDTH/HEIGHT from process-global to per-driver-data Sonnet review 7.3 / 9.6 from iter1 + carried iter2/3/4 substrate. Two libva driver_data instances in the same process (e.g. Firefox playing two tabs at different resolutions, or Firefox + mpv via the same dlopened backend) would race on the static cache. Move to struct request_data.last_output_width/height. The V4L2 device fd is already per-driver_data, so this is the correct binding unit (one fd, one current OUTPUT format). Verified: two concurrent mpv processes (2s stagger) both decode 300 frames cleanly with no cross-corruption. Same-instant init still hits kernel-level fd contention on /dev/video1 (hantro is a single-instance device); cross-process serialization is out of scope for a libva backend. Resolves the surface_reset_format_cache() callsite: now takes driver_data parameter (was zero-arg). Also drops the 'rc' unused-variable warning in v4l2_ioctl_controls that the iter5 sweep left behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:05:41 +00:00
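Moving the cache into struct request_data means each driver instance compares against its own last OUTPUT format. A minimal sketch, with a hypothetical `struct request_data_sketch` carrying only the cache fields; recording the new size only after S_FMT succeeds is a design choice of this sketch, not something the commit states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_data (cache fields only). */
struct request_data_sketch {
    int video_fd;                  /* per-instance V4L2 fd */
    unsigned last_output_width;    /* 0 = unknown, forces the first S_FMT */
    unsigned last_output_height;
};

static bool output_format_changed(const struct request_data_sketch *d,
                                  unsigned w, unsigned h)
{
    assert(d);
    return d->last_output_width != w || d->last_output_height != h;
}

/* Call only after S_FMT succeeded, so a failed S_FMT is retried next time. */
static void output_format_record(struct request_data_sketch *d, unsigned w, unsigned h)
{
    d->last_output_width = w;
    d->last_output_height = h;
}

/* Per-instance analogue of resetting the format cache on DestroyContext. */
static void output_format_reset(struct request_data_sketch *d)
{
    d->last_output_width = 0;
    d->last_output_height = 0;
}
```

Two instances (two tabs, or Firefox plus mpv in one process) now update disjoint fields, so one can no longer convince the other that its OUTPUT format is already set.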
Markus Fritsche	843febc174	iter5 sweep: remove iter1 slice_header parse + VAPicture dump + Sync RETURN trace h264.c: - Remove the slice_header parse success log (the parse data is now forwarded into decode_params directly without per-frame echo). Keep the FAILED-rc log since it indicates a real decode-blocking error. - Remove the iter1 patch-0014 VAPictureH264 byte-dump + field-read log block. The TopFieldOrderCnt=65536 anomaly it diagnosed was resolved by the POC sentinel strip (h264_strip_ffmpeg_poc_sentinel) that stays in the codebase. surface.c: - Remove the per-call "RequestSyncSurface RETURN status=" trace. - Remove the per-call "RequestSyncSurface early-exit" trace. v4l2.c: - Suppress the per-frame "Unable to get control(s): Permission denied" log when errno == EACCES (the expected case on this hantro rig per iter1 patch-0014's findings). The one-time announcement in h264.c stays. Real EACCES-on-non-request-fd or other errno values still log normally. Per-frame v4l2-request log noise drops from ~30+ lines/frame to init-time + once-per-resolution-change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:51:10 +00:00
Markus Fritsche	d3a299b4cc	iter5 sweep: remove iter1 patch-0010 hex-dumps + patch-0011 sentinel picture.c: remove the 0xab sentinel write into CAPTURE buffer first 32 bytes pre-QBUF + the OUTPUT hex-dump pre-QBUF. Both were iter1 diagnostics for "where does the buffer write go?" investigation. surface.c: remove the post-DQBUF CAPTURE Y-plane hex-dump + luma variance signal. The msync(MS_SYNC|MS_INVALIDATE) was added as a companion fix for the cached-mmap issue surfaced by the dump itself — removing the dump removes the need for the msync. With iter1+iter2+iter3+iter4 fixes landed, these dumps fire on every single frame and produce hundreds of MB of log noise during sustained decode. Now gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:48:31 +00:00
Markus Fritsche	951233a12e	iter5 sweep: remove iter1 ENTER traces (13 call sites across 4 files) Removes the iter1 patch-0014 ENTER traces from buffer.c, image.c, picture.c, surface.c. These were diagnostic-only entry-point logs added during iter1's "where does Firefox RDD crash?" investigation. With the iter1+iter2+iter3+iter4 fixes landed, the entry-point traces are pure noise. If a future investigation needs entry-point coverage, strace -e trace on the libva consumer process gives equivalent visibility without modifying the driver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:47:25 +00:00
Markus Fritsche	39498f0d8e	iter5 sweep: remove iter4 DPB census instrumentation from h264.c Removes the pre-S_EXT_CTRLS DPB census + per-entry dump that helped diagnose iter4's frame-11 EINVAL bug. With the fix landed (`385dee1`), the diagnostic is no longer needed in the release driver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:46:10 +00:00
Markus Fritsche	848fc0c4c4	iter5 sweep: remove iter3+iter4 Y2 instrumentation from v4l2.c Removes iter3 Y2 v1 (S_EXT_CTRLS rejected logging) + iter4 Y2 v3 (TRY_EXT_CTRLS retry) + iter4 per-control TRY isolation. With the frame-11 EINVAL fix landed in iter4 (`385dee1`), these diagnostics no longer fire under expected workloads, and they're noise for any upstream submission. If a future EINVAL re-introduces, the per-control TRY isolation pattern is documented in feedback_kernel_obfuscation_compound.md and can be re-applied surgically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:45:43 +00:00
Markus Fritsche	b81ce6981f	iter4 fix: B-slice L1 reflist .fields copy-paste bug In h264_va_slice_to_v4l2, the B-slice L1 reflist loop wrote .fields into ref_pic_list0[i] instead of ref_pic_list1[i]. This corrupted L0 reflist fields when L1 was being built and left ref_pic_list1[i].fields zero (which the kernel may interpret as "no valid field reference"). Pre-existing pre-iter4 bug (caught by iter4 Phase 5 sonnet review, finding C2). Latent on hantro bbb_1080p30 in FRAME_BASED mode because hantro walks reference_ts directly and ignores SLICE_PARAMS.fields, but the bug is wrong-by-construction and would surface on any driver that reads SLICE_PARAMS reflist fields, on interlaced content, or in SLICE_BASED decode mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:15:40 +00:00
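The copy-paste bug is easiest to see with the L1 loop isolated. The structs below are simplified local mirrors of the uAPI's struct v4l2_h264_reference (fields, index) and the ref_pic_list0/ref_pic_list1 arrays of the slice params; the list length and the fill function's inputs are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define REF_LIST_LEN 32   /* illustrative list length for this sketch */

/* Local mirror of struct v4l2_h264_reference. */
struct h264_ref { uint8_t fields; uint8_t index; };

struct slice_lists {
    struct h264_ref ref_pic_list0[REF_LIST_LEN];
    struct h264_ref ref_pic_list1[REF_LIST_LEN];
};

/* Fill L1 from hypothetical per-entry DPB indices and field flags. */
static void fill_l1(struct slice_lists *s, const uint8_t *dpb_idx,
                    const uint8_t *fields, unsigned n)
{
    assert(n <= REF_LIST_LEN);
    for (unsigned i = 0; i < n; i++) {
        s->ref_pic_list1[i].index = dpb_idx[i];
        s->ref_pic_list1[i].fields = fields[i];   /* the bug wrote ref_pic_list0[i] here */
    }
}
```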
Markus Fritsche	f21bdf0d50	iter4 DEBUG: per-control VIDIOC_TRY_EXT_CTRLS isolation Iterates each control individually through VIDIOC_TRY_EXT_CTRLS on S_EXT_CTRLS EINVAL. Used in iter4 Phase 4 to diagnose the carryover frame-11 EINVAL: discovered all four H.264 controls fail individually on the same request_fd → diagnosis pivot from "bad control content" to "bad request_fd state," which led to the fresh-request_fd-per-frame fix in `385dee1`. Stays in for the iter5 DEBUG sweep alongside iter1 ENTER traces + iter3 Y2 + iter4 Y2v3 + iter4 DPB census. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:12:53 +00:00
Markus Fritsche	385dee1bbf	iter4 fix: fresh request_fd per frame (fixes carryover EINVAL) This is the load-bearing fix that resolves the iter1+iter2+iter3 "frame-11 EINVAL" carryover. Replace the per-surface request_fd cache + MEDIA_REQUEST_IOC_REINIT pattern with allocate-fresh-per-frame: in RequestSyncSurface, after queue + wait_completion succeed, close the request_fd and reset surface_object->request_fd = -1 so the next BeginPicture allocates a new one via media_request_alloc. Diagnostic root cause: per-control VIDIOC_TRY_EXT_CTRLS isolation showed all four H.264 controls (SPS/PPS/DECODE_PARAMS/SCALING_MATRIX) fail individually with EINVAL on the same request_fd that had been through queue+wait+reinit. The fd state was bad even though every ioctl in the previous decode cycle returned success. Allocating fresh sidesteps any kernel-side request-state-machine subtlety we don't fully understand. Empirical verification (iter4 Phase 7, 90s autonomous run on ohm via firefox-fourier without MOZ_DISABLE_RDD_SANDBOX=1, bbb_1080p30 H.264): - ENETDOWN count: 0 - S_EXT_CTRLS rejected: 0 (was: fired at frame 11 every iter1-3) - Unable to set control(s): 0 - Generic EINVAL: 0 - Video stream mTime reached: 49.7 seconds - Audio stream mTime reached: 51.5 seconds Cost: ~one extra MEDIA_IOC_REQUEST_ALLOC + close() per decoded frame. Negligible (cycles below the V4L2 set_controls + queue + wait stack). Companion fixes that landed earlier in iter4 to get to this point: `74d8dd1` — DPB fields=V4L2_H264_FRAME_REF + skip !used entries (matches FFmpeg's libavcodec/v4l2_request_h264.c semantics) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 14:12:41 +00:00
Markus Fritsche	4892656b3f	iter4 DEBUG: pre-S_EXT_CTRLS DPB census + per-entry dump Inline log of DECODE_PARAMS.flags, sps.max_num_ref_frames, dpb counts (valid/active/long-term/internally-used), and per-entry frame_num / pic_num / fields / reference_ts immediately before each S_EXT_CTRLS submission. Used in iter4 Phase 4 to identify (a) the dpb->fields=0 bug and (b) the stale-entry growth bug. Stays in for iter4 Phase 4 continuation (at least one more bug still produces EINVAL after frame ~20). Remove at iter5 DEBUG sweep alongside iter1 ENTER/CAPTURE-dump and iter3 Y2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:44:56 +00:00
Markus Fritsche	74d8dd134a	iter4 partial fix: DPB fill matches FFmpeg semantics Two empirically-validated correctness fixes from comparing our h264.c fill_dpb against FFmpeg's libavcodec/v4l2_request_h264.c::fill_dpb on the iter3 test rig (Firefox-fourier on ohm RK3568, bbb_1080p30 H.264): 1. Set dpb[].fields = V4L2_H264_FRAME_REF for every valid entry. The kernel's v4l2_h264_init_reflist_builder iterates dpb[] and skips entries with fields == 0 — they count as "no field reference" regardless of VALID/ACTIVE flags. Without this, P-slices that need to walk the reference list (first one in BBB is at frame 11) hit "no valid refs" and S_EXT_CTRLS rejects the request with EINVAL (error_idx == count = kernel's "application bug" sentinel). 2. Skip entries with valid=true but used=false. dpb_update() clears `used` for all entries then re-marks only those in the current ReferenceFrames[] list. Stale entries (frames the consumer has retired from its DPB) were being included, growing the V4L2 dpb[] monotonically until H264_DPB_SIZE while SPS.max_num_ref_frames may be 4. FFmpeg iterates h->short_ref[] / h->long_ref[] only — the currently-referenced set. Empirical: from "10 frames decode, frame-11 P-slice EINVAL" to "~20 frames decode, then a different EINVAL on later frames." Confirms both fixes are correctness improvements but Track A is not yet fully resolved — at least one more bug remains. iter4 stays open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:44:53 +00:00
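Both DPB fixes fit in one filtering loop: emit only entries that are valid and still marked used, and give each a full-frame field mask. This is a simplified sketch: the input struct is hypothetical driver bookkeeping, the output struct keeps only two members of struct v4l2_h264_dpb_entry, entries are compacted rather than placed by slot with VALID/ACTIVE flags, and FRAME_REF mirrors V4L2_H264_FRAME_REF (top 0x1 | bottom 0x2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_REF 0x3     /* mirrors V4L2_H264_FRAME_REF */
#define DPB_SIZE  16

/* Hypothetical driver-side bookkeeping. */
struct local_dpb_entry {
    bool valid;
    bool used;            /* re-marked from the current ReferenceFrames[] */
    uint16_t frame_num;
};

/* Stripped-down stand-in for struct v4l2_h264_dpb_entry. */
struct dpb_out {
    uint16_t frame_num;
    uint8_t fields;
};

/* Emit only currently referenced entries; return how many were written. */
static unsigned fill_dpb(struct dpb_out *out, const struct local_dpb_entry *in, unsigned n)
{
    unsigned count = 0;

    assert(n <= DPB_SIZE);
    for (unsigned i = 0; i < n; i++) {
        if (!in[i].valid || !in[i].used)   /* fix 2: skip retired entries */
            continue;
        out[count].frame_num = in[i].frame_num;
        out[count].fields = FRAME_REF;     /* fix 1: fields == 0 reads as "no reference" */
        count++;
    }
    return count;
}
```

Filtering on `used` keeps the emitted set bounded by the consumer's current reference list instead of growing toward DPB_SIZE.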
Markus Fritsche	a12d29937c	iter4 DEBUG: Y2 v3 — retry with TRY_EXT_CTRLS on S_EXT_CTRLS EINVAL Per kernel comment in v4l2-ctrls-api.c:222-224, S_EXT_CTRLS deliberately obfuscates by setting error_idx = count, while TRY_EXT_CTRLS reports the actual failing index. Adds TRY retry inside the EINVAL diagnostic path. Empirical finding (iter4 Phase 4): TRY also returned error_idx == count on the frame-11 EINVAL on bbb_1080p30. Conclusion: failure is in the post-validate cluster commit (hantro driver's try_ctrl op or similar state-coherence check), NOT in any individual control's std_validate. The kernel comment may be outdated for compound controls, or the H.264 stateless cluster is committed atomically post-validate where error_idx is intentionally not updated for either S or TRY. Path forward (Phase 4 next): switch from "read kernel source" to "diff our DECODE_PARAMS construction vs FFmpeg's libavcodec/v4l2_request_h264.c" to identify field-by-field divergence at frame 11. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:25:40 +00:00
Markus Fritsche	086b7ce8cb	iter3 DEBUG: S_EXT_CTRLS EINVAL diagnostic in v4l2_ioctl_controls When VIDIOC_S_EXT_CTRLS returns -EINVAL, log num_controls, error_idx, and per-control id+size. Lets iter3+ debug "Unable to set control(s): Invalid argument" failures by naming exactly which control set was rejected — previously the request_log line in v4l2_set_controls just printed strerror(errno) with no specificity. Used in iter3 Phase 7 to confirm the frame-11 EINVAL is request-level ("error_idx == num_controls" sentinel = kernel rejected but couldn't pinpoint a single field) rather than a single-control size mismatch. To remove at iter4 DEBUG sweep alongside iter1 ENTER/CAPTURE-dump instrumentation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:57:01 +00:00
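The diagnostic hinges on reading error_idx against the control count after an EINVAL. A hedged sketch of that interpretation, following the commit's reading of the sentinel (the enum and function are hypothetical): an index below the count names one control, while error_idx == count means the kernel did not attribute the failure to a single control.

```c
#include <assert.h>
#include <stdint.h>

enum ctrl_failure { CTRL_FAIL_SINGLE, CTRL_FAIL_REQUEST_LEVEL };

/* Interpret error_idx after VIDIOC_S_EXT_CTRLS / VIDIOC_TRY_EXT_CTRLS
 * returned EINVAL for a batch of `count` controls. */
static enum ctrl_failure classify_ctrl_failure(uint32_t error_idx, uint32_t count,
                                               uint32_t *bad_index)
{
    assert(bad_index);
    if (error_idx < count) {
        *bad_index = error_idx;            /* one specific control rejected */
        return CTRL_FAIL_SINGLE;
    }
    return CTRL_FAIL_REQUEST_LEVEL;        /* sentinel: not attributable */
}
```

As the iter4 TRY retry found, even TRY_EXT_CTRLS can return the request-level sentinel, so the second outcome does not guarantee that any single control is at fault.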
Markus Fritsche	4a7a07e0f4	iter3 Fix: select() → poll() in media_request_wait_completion Firefox's RDD seccomp common policy admits poll/ppoll/epoll_* but does NOT admit select/pselect6. Under the iter3 sandbox-patched RDD process, our select(except_fds) call returned ENOSYS (Mozilla's seccomp uses SECCOMP_RET_ERRNO with ENOSYS for filtered syscalls — not SIGSYS), killing libva decode after just one BeginPicture. poll(POLLPRI) is functionally equivalent for waiting on the media request fd's exceptional-condition completion signal, and lives inside a syscall family Mozilla's sandbox already permits. Driver-side fix preferred over expanding Firefox's seccomp surface — smaller blast radius, portable across sandbox policies, and poll() is the modern API. Verified iter3 Phase 7 on ohm: with this change in place plus the firefox-fourier broker + seccomp ioctl '|' patches, Firefox decodes through libva inside the sandboxed RDD without MOZ_DISABLE_RDD_SANDBOX=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:56:49 +00:00
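A minimal poll()-based wait on a media request fd, as a sketch of the select() replacement: completion is signalled as POLLPRI. The return convention and the function name are assumptions of this sketch; note that retrying on EINTR restarts the full timeout.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Wait for a media request fd to signal completion via POLLPRI.
 * Returns 0 on completion, -ETIMEDOUT on timeout, -errno on failure. */
static int wait_request_completion(int request_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
    int ret;

    assert(request_fd >= 0);
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -errno;
    if (ret == 0)
        return -ETIMEDOUT;
    if (!(pfd.revents & POLLPRI))   /* POLLERR/POLLHUP/POLLNVAL without completion */
        return -EIO;
    return 0;
}
```

On a file descriptor that never raises POLLPRI (for example, the read end of an idle pipe), the call simply times out, which makes the timeout path easy to exercise without a media device.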
Markus Fritsche	19acc76da4	iter2 Fix 3: decoupled CAPTURE buffer pool with LRU recycling Pre-iter2 each VA surface was permanently 1:1 bound to one V4L2 CAPTURE buffer. mpv reusing a surface for a new decode while the compositor still held an EXPBUF'd dma_buf fd to the prior frame caused the kernel to write fresh decode output into the same physical memory the compositor was reading -- visible as stutter / back-and-forth swap on mpv --hwdec=vaapi --vo=gpu playback. Architecture: - New cap_pool abstraction (cap_pool.{h,c}) owns N CAPTURE buffers (N = max(surfaces_count, MIN_CAP_POOL=24)) with per-slot state {FREE, IN_DECODE, DECODED, EXPORTED} guarded by pthread_mutex_t. - Surfaces no longer own buffers; each vaBeginPicture acquires the oldest FREE slot (LRU), binds it for the decode cycle, and the slot cycles IN_DECODE -> DECODED (post-DQBUF) -> EXPORTED (post-EXPBUF). - Slot is released on next BeginPicture for the same surface or on vaDestroySurfaces. Limitations (Sonnet Phase 5 review iter2 9.x, deferred to iter3+): - Option-A statistical mitigation; race window narrows to "pool exhausted, force-recycle of oldest EXPORTED slot." For typical mpv 16-surface playback with MIN_CAP_POOL=24 the fallback never fires. - Multi-context concurrent use not addressed (one V4L2 device, multiple cap_pools -- iter3 scope). Other call sites updated: - picture.c::BeginPicture acquires + binds, releasing prior slot if any. - surface.c::SyncSurface marks slot DECODED after DQBUF. - surface.c::ExportSurfaceHandle marks slot EXPORTED, retaining OUR EXPBUF fd for force-recycle close(). - surface.c::DestroySurfaces releases via surface_unbind_slot; cap_pool owns the mmaps now. - surface.c::CreateSurfaces2 destroys the pool in the resolution-change path before REQBUFS(0) (else stale v4l2_index after Fix 1's REQBUFS). - context.c::DestroyContext invokes cap_pool_destroy. 
- image.c::DeriveImage skips copy_surface_to_image when current_slot is NULL (ffmpeg av_hwframe_ctx_init probes derive on undecoded surfaces). Verified: mpv vaapi-copy 200 frames bbb_1080p30, 0 drops, LRU visibly recycling slot indices, real luma gradient. mpv vaapi --vo=gpu operator-inspection follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:03:31 +00:00
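The cap_pool acquire policy (oldest FREE slot first, oldest EXPORTED slot as the force-recycle fallback) can be sketched with a monotonic stamp per slot. The state names follow the commit; the struct layout and stamp scheme are assumptions, and the pthread_mutex_t locking is omitted.

```c
#include <assert.h>
#include <stdint.h>

enum slot_state { SLOT_FREE, SLOT_IN_DECODE, SLOT_DECODED, SLOT_EXPORTED };

#define CAP_POOL_SLOTS 24      /* MIN_CAP_POOL from the commit */

struct cap_slot {
    enum slot_state state;
    uint64_t last_used;        /* monotonic stamp for LRU ordering */
};

struct cap_pool_sketch {
    struct cap_slot slots[CAP_POOL_SLOTS];
    uint64_t clock;
};

/* Index of the least-recently-stamped slot in `want`, or -1. */
static int oldest_in_state(const struct cap_pool_sketch *p, enum slot_state want)
{
    int best = -1;
    for (int i = 0; i < CAP_POOL_SLOTS; i++)
        if (p->slots[i].state == want &&
            (best < 0 || p->slots[i].last_used < p->slots[best].last_used))
            best = i;
    return best;
}

static int cap_acquire(struct cap_pool_sketch *p)
{
    int i = oldest_in_state(p, SLOT_FREE);
    if (i < 0)
        i = oldest_in_state(p, SLOT_EXPORTED);   /* force-recycle fallback */
    if (i >= 0) {
        p->slots[i].state = SLOT_IN_DECODE;
        p->slots[i].last_used = ++p->clock;
    }
    return i;
}

static void cap_release(struct cap_pool_sketch *p, int i)
{
    assert(i >= 0 && i < CAP_POOL_SLOTS);
    p->slots[i].state = SLOT_FREE;
    p->slots[i].last_used = ++p->clock;
}
```

Because a just-released slot gets the newest stamp, it is the last FREE slot to be reused, which maximizes the time a compositor can keep reading an exported frame.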
Markus Fritsche	e64bb0852d	iter2 Fix 2: conditional DRM_FORMAT_MOD_INVALID for non-64-aligned pitch Iteration 2 Fix 2: branch on bytesperline alignment when setting the drm_format_modifier in RequestExportSurfaceHandle. For non-64-byte-aligned pitches (e.g. 864 for 864-wide videos), report DRM_FORMAT_MOD_INVALID instead of DRM_FORMAT_MOD_NONE (LINEAR explicit). Mesa's WSI rejects LINEAR buffers that aren't scanout-aligned with 'WSI pitch not properly aligned'; MOD_INVALID tells the importer to treat as texture-only, which is the correct behavior for buffers that don't meet scanout alignment requirements. Diagnosis from operator's mozilla.org session in iteration 1 close: 864-wide intro videos triggered the WSI alignment error and Firefox fell back to SW for those videos. Sonnet Phase 5 review endorsed the conditional approach over a universal MOD_INVALID change to preserve LINEAR semantics for already-aligned content (avoids unnecessary perf cost on the common 1920-wide case). Verification path (Phase 7 of iteration 2): Firefox loads mozilla.org main page; check no MESA WSI errors in stderr; operator confirms intro videos engage HW decode (or at least don't fall back). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:18:55 +00:00
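The modifier choice reduces to a pitch-alignment test. The constants below mirror <drm/drm_fourcc.h> (DRM_FORMAT_MOD_LINEAR, alias DRM_FORMAT_MOD_NONE, is 0; DRM_FORMAT_MOD_INVALID is the reserved all-ones payload with vendor NONE); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MOD_LINEAR  0ULL                      /* DRM_FORMAT_MOD_LINEAR / _NONE */
#define MOD_INVALID 0x00ffffffffffffffULL     /* DRM_FORMAT_MOD_INVALID */
#define SCANOUT_PITCH_ALIGN 64                /* alignment threshold from the commit */

/* LINEAR only when the pitch meets scanout alignment; otherwise let the
 * importer treat the buffer as texture-only. */
static uint64_t export_modifier(uint32_t bytesperline)
{
    assert(bytesperline > 0);
    return (bytesperline % SCANOUT_PITCH_ALIGN == 0) ? MOD_LINEAR : MOD_INVALID;
}
```

For example, 1920 is a multiple of 64 and keeps LINEAR, while 864 (13.5 × 64) falls back to MOD_INVALID.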
Markus Fritsche	06beef6248	iter2 Fix 1: invalidate format cache on DestroyContext + REQBUFS(0) on CAPTURE in resolution-change path Fix 1 of iteration 2 per phase4_iter2_plan.md. Adds surface_reset_format_cache() exposed from src/surface.h. Called from RequestDestroyContext after the dual REQBUFS(0). Without this, multi-video Firefox sessions on mozilla.org corrupted the next session's CAPTURE format query: the kernel reset to defaults but our LAST_OUTPUT_WIDTH/HEIGHT cache still said 'already 1920x1088,' so the next G_FMT returned 48x48 and the exported descriptor encoded wrong pitch/offset. Also adds REQBUFS(0) on CAPTURE in the resolution-change path of RequestCreateSurfaces2 (Sonnet Phase 5 review iter2 9.1). The existing code only did REQBUFS(0) on OUTPUT before re-S_FMTting; hantro derives CAPTURE format from OUTPUT format, so leftover CAPTURE buffers from the prior resolution would also block the implicit format change. Pre-existing bug surfaced by Sonnet's audit; Fix 3 pool refactor would have exposed it more often. Limitation noted in surface.h docblock: the LAST_OUTPUT_WIDTH/HEIGHT cache is a static process-global, so concurrent multi-context use still races (Sonnet 7.3 / 9.6). Iteration 2 only addresses sequential sessions. Multi-context safety is iteration 3+. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:11:03 +00:00
Markus Fritsche	c036a44f98	image: fully populate VAImageFormat per VAAPI spec for NV12 QueryImageFormats and DeriveImage previously set only .fourcc and left byte_order, bits_per_pixel, depth, and color masks zero (uninitialized in the caller's buffer). VAAPI consumers that read these fields (FFmpeg's hwcontext_vaapi.c::vaapi_init_pixfmt, intel-vaapi-driver test paths) inherit caller-stack garbage with non-deterministic behavior. Cross-reference: Mesa's gallium/frontends/va/image.c and intel-vaapi-driver's i965_drv_video.c both publish NV12 with byte_order=VA_LSB_FIRST and bits_per_pixel=12. We now match. For YUV formats, depth/red_mask/green_mask/blue_mask/alpha_mask are not meaningful (RGB-bitlayout-only fields); leave them zeroed via memset. Audit context: 2026-05-04 cross-reference of all libva entry points Firefox 150 calls vs our backend implementations. The SEPARATE_LAYERS fix (commit `ac891a0`) cleared the load-bearing bug; this fixes a latent uninitialized-field issue that was masked by mpv's tolerance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:34:50 +00:00
Markus Fritsche	ac891a01fa	surface: honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle Firefox 150's RDD calls vaExportSurfaceHandle with the VA_EXPORT_SURFACE_SEPARATE_LAYERS flag (per FFmpegVideoDecoder.cpp GetVAAPISurfaceDescriptor at the libva-VAAPI export site). With that flag, libva consumers expect 2 separate layers — Y as DRM_FORMAT_R8, UV as DRM_FORMAT_GR88, each with num_planes=1 — not the COMPOSED single-layer-with-2-planes shape we always returned regardless of flags. Our previous code ignored the flag parameter and always built the COMPOSED descriptor. mpv works with that because mpv passes the default (COMPOSED) flag and the shape matches. Firefox's DMABufSurfaceYUV import code parsed our COMPOSED descriptor as if it were SEPARATE, found bogus layer-1 data, silently fell back to FFmpeg(FFVPX) software decode after frame 0. Fix: branch on the flag and build the appropriate descriptor. flags & VA_EXPORT_SURFACE_SEPARATE_LAYERS: num_layers=2 layers[0] = Y as DRM_FORMAT_R8, num_planes=1 layers[1] = UV as DRM_FORMAT_GR88, num_planes=1 default (COMPOSED, including unflagged): num_layers=1, drm_format=DRM_FORMAT_NV12, num_planes=2 (existing behavior, preserved for mpv et al.) For the single-fd case (hantro NV12 backed by one CMA buffer), both layers reference object_index=0 with different offsets and pitches (both stride=1920 for 1920x1088). Diagnosed via Firefox source dive (mozilla/gecko-dev master, dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:1638) — the explicit flag in the export call was the discriminator between mpv's success and Firefox's silent SW fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:32:12 +00:00
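Branching on the export flag can be sketched with a simplified local descriptor (not the real VADRMPRIMESurfaceDescriptor layout). The DRM fourccs are built the same way fourcc_code() does in <drm/drm_fourcc.h>; EXPORT_SEPARATE_LAYERS is a hypothetical stand-in for VA_EXPORT_SURFACE_SEPARATE_LAYERS, and the UV offset of pitch × height assumes the single-buffer NV12 layout described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FOURCC(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
                            ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define FMT_NV12 FOURCC('N', 'V', '1', '2')   /* DRM_FORMAT_NV12 */
#define FMT_R8   FOURCC('R', '8', ' ', ' ')   /* DRM_FORMAT_R8 */
#define FMT_GR88 FOURCC('G', 'R', '8', '8')   /* DRM_FORMAT_GR88 */

#define EXPORT_SEPARATE_LAYERS 0x1u   /* hypothetical stand-in flag */

struct layer { uint32_t drm_format, num_planes, object_index[4], offset[4], pitch[4]; };
struct descriptor { uint32_t num_layers; struct layer layers[4]; };

static void build_nv12_descriptor(struct descriptor *d, uint32_t flags,
                                  uint32_t pitch, uint32_t height)
{
    uint32_t uv_offset = pitch * height;   /* UV plane follows Y in one buffer */

    assert(pitch > 0 && height > 0);
    memset(d, 0, sizeof(*d));              /* object_index stays 0: one fd */
    if (flags & EXPORT_SEPARATE_LAYERS) {
        d->num_layers = 2;
        d->layers[0] = (struct layer){ .drm_format = FMT_R8, .num_planes = 1,
                                       .offset = { 0 }, .pitch = { pitch } };
        d->layers[1] = (struct layer){ .drm_format = FMT_GR88, .num_planes = 1,
                                       .offset = { uv_offset }, .pitch = { pitch } };
    } else {                               /* COMPOSED (default) */
        d->num_layers = 1;
        d->layers[0] = (struct layer){ .drm_format = FMT_NV12, .num_planes = 2,
                                       .offset = { 0, uv_offset },
                                       .pitch = { pitch, pitch } };
    }
}
```

For a 1920x1088 surface both shapes describe the same bytes: Y at offset 0 and UV at 1920 × 1088, each with pitch 1920. Only the layer grouping differs.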
Markus Fritsche	fdfee2d661	DEBUG: log SyncSurface RETURN to confirm clean exit before crash	2026-05-04 15:08:25 +00:00
Markus Fritsche	21ae311077	DEBUG: ENTER on CreateBuffer + BeginPicture for frame-1 crash narrowing	2026-05-04 14:43:29 +00:00
Markus Fritsche	92f5b254e6	DEBUG: ENTER on buffer/image entry points to localize Firefox RDD crash	2026-05-04 14:28:53 +00:00
Markus Fritsche	7da2b27454	DEBUG: ENTER logging at libva entry points to trace Firefox call flow Adds request_log on entry to: - RequestSyncSurface - RequestQuerySurfaceAttributes - RequestQuerySurfaceStatus (including the returned status value) - RequestDeriveImage - RequestQueryImageFormats - RequestGetImage Goal: identify which API call Firefox 150 makes that returns differently than it expects, causing the SW fallback after frame 0. mpv works end-to-end with the surface-export fix in place; Firefox does not. Per operator's correction: don't assume mpv's success means the driver is correct — Firefox may detect a real spec violation that mpv silently tolerates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:17:52 +00:00
Markus Fritsche	37c0e720fc	surface: re-set OUTPUT format on resolution change The static SET_FORMAT_OF_OUTPUT_ONCE flag pinned the OUTPUT format to the first call's dimensions, which for mpv's probe pattern means 128x128. Subsequent CreateSurfaces2 for the real 1920x1088 resolution would then read CAPTURE format from the kernel (which derives from the OUTPUT format) and get 128x128 sizes back — leading to a VADRMPRIMESurfaceDescriptor with width=1920 height=1088 but pitch=128 offset=16384. Mesa's WSI rejected this as 'pitch too small,' and the mpv vaapi --vo=gpu render landed on a solid blue frame. Same root cause for Firefox 150's SW fallback after frame 0. Replace SET_FORMAT_OF_OUTPUT_ONCE with LAST_OUTPUT_WIDTH/HEIGHT tracking. When dimensions change, call REQBUFS(0) on OUTPUT to drop any stale buffers (S_FMT is rejected by V4L2 while buffers exist), then re-S_FMT at the new resolution. The kernel will derive the new CAPTURE format from this OUTPUT format on the next CreateBufs + G_FMT cycle. Caveat (TODO for next iteration): for consumers that legitimately stream multiple resolutions in sequence (mid-stream resolution change via V4L2_EVENT_SOURCE_CHANGE), the current approach still requires CreateSurfaces2 to be called, which mpv does on probe. A proper context-level redesign would handle SOURCE_CHANGE inline with STREAMOFF + REQBUFS(0) + new S_FMT. Diagnosis and root cause: surfaced by 2026-05-04 Phase 5 sonnet review (finding 7.3) as a 'latent bug to document.' Today's instrumentation captured it as the active bug — the ExportSurfaceHandle dump showed pitch=128 for 1920x1088 surfaces right before MESA reported 'WSI pitch too small' and dropped to software. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:05:16 +00:00
Markus Fritsche	2517a1206b	DEBUG: instrument surface CreateSurfaces2 + ExportSurfaceHandle for diagnosis Logs format_width/height + bytesperline + sizes from v4l2_get_format in CreateSurfaces2, and the full VADRMPRIMESurfaceDescriptor in ExportSurfaceHandle (fd, fourcc, width/height, num_objects/layers, obj.size + drm_format/modifier, plane offsets/pitches). Diagnostic for the surface-export bug surfaced by Phase 7 (mpv --hwdec=vaapi --vo=gpu shows solid blue, Firefox falls back to SW after frame 0 — both consumers GL-import the DMA-BUF, both fail to render correctly while vaapi-copy works). Phase 5 review (sonnet) suggested format_height might be 1080 (stream) vs 1088 (MB-aligned), miscomputing UV offset by 15360 bytes. Earlier ftrace shows kernel returns height=1088 — the hypothesis is likely false but verifying in-driver to confirm. Will compare with mpv --msg-level=vd=v --msg-level=vo=v output to identify the import-side discrepancy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:00:13 +00:00
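The 15360-byte figure follows from the single-buffer NV12 layout, where the UV plane starts right after the luma rows. A tiny sketch of that arithmetic (`nv12_uv_offset` is an illustrative name, not a function from the fork):

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the UV plane in a single-buffer NV12 surface: chroma
 * starts right after `rows` luma lines of `pitch` bytes each. */
static uint32_t nv12_uv_offset(uint32_t pitch, uint32_t rows)
{
    assert(pitch > 0 && rows > 0);
    return pitch * rows;
}
```

Using 1080 instead of 1088 rows at pitch 1920 shifts the offset by 1920 × 8 = 15360 bytes. The 128x128 case gives the offset=16384 seen in the broken descriptor.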
Markus Fritsche	6be3f3b120	h264: rate-limit V4L2 readback EACCES warning to once per process Hantro+v4l2 on Linux 6.19.x returns EACCES from VIDIOC_G_EXT_CTRLS on a request_fd in QUEUEING state for compound H.264 controls. Not actionable from userspace — kernel-side permission check whose semantics aren't yet investigated. Decode is unaffected (SET-side write succeeds; we just can't verify via readback from this rig). Logging the failure once per process instead of per-frame keeps the diagnostic message visible without flooding stderr. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 13:00:49 +00:00
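One way to get "once per process" without races across decoder threads is a C11 `atomic_flag`. This is a sketch under that assumption; the function name and message text are hypothetical, and the fork may gate the message differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_flag readback_warned = ATOMIC_FLAG_INIT;

/* Emits the readback warning on the first call only; returns true if this
 * call printed it. test_and_set makes the check-and-mark a single atomic step. */
static bool warn_readback_eacces_once(int err)
{
    if (atomic_flag_test_and_set(&readback_warned))
        return false;
    fprintf(stderr, "h264: G_EXT_CTRLS readback failed (errno %d); "
                    "further occurrences suppressed\n", err);
    return true;
}
```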
Markus Fritsche	a047926dbc	DEBUG: cache-fix CAPTURE dump + VIDIOC_G_EXT_CTRLS readback Tier 3E + 3F observability hardening from the libva-multiplanar campaign Phase 6 follow-up. Improves diagnostic reliability for future probes; no functional decode path change. Tier 3E (cache-fix): patch-0010's CAPTURE Y-plane dump now calls msync(p, 32, MS_SYNC|MS_INVALIDATE) before the read so userspace sees what the kernel actually DMA-wrote, not a stale CPU cache line. Without this, the previous version of the dump consistently showed the patch-0011 sentinel (0xab) even when the kernel had overwritten it, which caused half a day of mistaken "kernel never wrote the buffer" diagnosis. Also computes a luma min/max/variance signal so a uniform fill (variance=0) is visually obvious vs real pixel data (variance > 0). Tier 3F (VIDIOC_G_EXT_CTRLS readback): after v4l2_set_controls in h264_set_controls, reads back DECODE_PARAMS + PPS via v4l2_get_controls (added by patch 0003) on the request fd. Logs key fields: dec.idr_pic_id, poc_lsb, refmark_bits, poc_bits: confirms slice-header parser outputs landed in the V4L2 control batch. dec.top_foc / bot_foc: confirms the patch-0015 POC sentinel strip actually applied (should NOT show 65536 unless the strip mis-fired). dec.frame_num: cross-checks against VAAPI's pre-parsed frame_num (also already logged by patch 0014). pps.flags + (SMP=...): confirms the SCALING_MATRIX_PRESENT bit is set in this build. pps.refidx_l0/l1: confirms the Tier 1B num_ref_idx writes landed. Discriminates "we wrote X but kernel saw Y" from "we wrote zero all along", the failure mode the original patch series didn't catch when slice-header bit_size fields were left zero. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:58:52 +00:00
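The min/max/variance signal can be sketched as a single pass over the sampled bytes. It assumes the caller has already done the `msync(..., MS_SYNC|MS_INVALIDATE)` described above. `LumaStats` and `luma_stats` are illustrative names, and the sketch computes population variance via E[x²] − (E[x])².

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t min, max;
    double mean, variance;
} LumaStats;

/* variance == 0 means a uniform fill (sentinel or untouched buffer);
 * variance > 0 suggests the hardware wrote real pixel data. */
static LumaStats luma_stats(const uint8_t *p, size_t n)
{
    assert(p != NULL && n > 0);
    LumaStats s = { .min = 255, .max = 0 };
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] < s.min) s.min = p[i];
        if (p[i] > s.max) s.max = p[i];
        sum += p[i];
        sum_sq += (double)p[i] * p[i];
    }
    s.mean = sum / (double)n;
    s.variance = sum_sq / (double)n - s.mean * s.mean;
    return s;
}
```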
Markus Fritsche	9de1be34ef	h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields The load-bearing fix from diff_against_ffmpeg.md (campaign repo). Adds src/h264_slice_header.{c,h} — a minimal H.264 slice_header() bit-parser per ITU-T H.264 (08/2024) §7.3.3. Parses just enough of the slice header to populate the V4L2 DECODE_PARAMS fields VAAPI doesn't carry and that hantro G1 hardware reads directly out of DECODE_PARAMS into MMIO registers: dec_param->dec_ref_pic_marking_bit_size -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN dec_param->idr_pic_id -> G1_REG_DEC_CTRL5_IDR_PIC_ID dec_param->pic_order_cnt_bit_size -> G1_REG_DEC_CTRL6_POC_LENGTH dec_param->pic_order_cnt_lsb -> hantro reflist builder (poc_type=0) dec_param->delta_pic_order_cnt_bottom -> same dec_param->delta_pic_order_cnt0/1 -> hantro reflist builder (poc_type=1) Without these set correctly, hantro's hardware bitstream parser walks past zero bits in the slice header, lands on garbage, decodes zero pixels — the all-zero CAPTURE output observed across both mpv and Firefox during 2026-05-04 Phase 0 (see libva-multiplanar campaign phase0_evidence/2026-05-04-kernel-trace/findings.md). Implementation: - Minimal RBSP bit reader (br_read_u/_ue/_se), MSB-first, fault-flag on overrun. - Emulation-prevention unescape (strips 0x03 after 0x00 0x00) on the first 64 bytes of the slice — slice headers fit comfortably. - Walks slice_header() up to and including dec_ref_pic_marking(), measuring bit positions for the *_bit_size fields. - Skips ref_pic_list_modification() and pred_weight_table() — needed only to advance the bit position to dec_ref_pic_marking(). - Returns a struct with the V4L2 fields plus diagnostics (first_mb_in_slice, slice_type, pps_id, frame_num). Wired into h264_va_picture_to_v4l2 (src/h264.c) right after the nal_ref_idc/nal_unit_type extraction. 
SPS/PPS context is built from VAPicture's seq_fields and pic_fields; num_ref_idx_l0/l1_active defaults come from VASlice (best available substitute for the parsed PPS values). On parse success, populates decode_params with the recovered values + emits a request_log with the decoded fields for cross-validation against VAAPI's pre-parsed values. src/meson.build: adds h264_slice_header.{c,h} to sources. Cross-references: - FFmpeg libavcodec/h264_slice.c (Kwiboo v4l2-request-n8.1) — populates H264SliceContext::ref_pic_marking_bit_size / pic_order_cnt_bit_size by the same bit-precise parse, then v4l2_request_h264.c forwards to V4L2. - Linux drivers/media/platform/verisilicon/hantro_g1_h264_dec.c set_params() — the register-write code that reads these fields. MVC nal_unit_type 20/21 unhandled (this fork strips MVC alongside HEVC). Multi-slice non-IDR streams parse the first slice's header only; for FRAME_BASED mode that's fine — kernel sees the whole bitstream and parses subsequent slices itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:34:47 +00:00
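The two low-level pieces of the parser are emulation-prevention unescaping and an MSB-first Exp-Golomb reader. They can be sketched as below. The `br_read_*` names echo the commit, but these bodies are our own illustration, not the fork's code. `pos` is the bit counter that the `*_bit_size` fields would be measured against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Drops each 0x03 that follows 0x00 0x00; returns bytes written (<= n). */
static size_t unescape_rbsp(const uint8_t *src, size_t n, uint8_t *dst)
{
    size_t out = 0, zeros = 0;
    for (size_t i = 0; i < n; i++) {
        if (zeros >= 2 && src[i] == 0x03) {
            zeros = 0; /* emulation-prevention byte: skip, reset the run */
            continue;
        }
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
        dst[out++] = src[i];
    }
    return out;
}

typedef struct {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;  /* bits consumed so far */
    int fault;   /* set on overrun; reads then return 0 */
} BitReader;

/* u(n): n bits, MSB first. */
static uint32_t br_read_u(BitReader *br, unsigned n)
{
    assert(n <= 32);
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        if (br->pos >= br->size_bits) {
            br->fault = 1;
            return 0;
        }
        unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* ue(v): count leading zero bits, then read that many suffix bits. */
static uint32_t br_read_ue(BitReader *br)
{
    unsigned zeros = 0;
    while (!br->fault && br_read_u(br, 1) == 0) {
        if (++zeros > 31) {
            br->fault = 1;
            return 0;
        }
    }
    if (br->fault)
        return 0;
    return ((1u << zeros) - 1u) + br_read_u(br, zeros);
}

/* se(v): codeNum k maps to +1, -1, +2, -2, ... for k = 1, 2, 3, 4, ... */
static int32_t br_read_se(BitReader *br)
{
    uint32_t k = br_read_ue(br);
    return (k & 1u) ? (int32_t)((k + 1u) / 2u) : -(int32_t)(k / 2u);
}
```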
Markus Fritsche	d41a4b96b3	h264: always submit SCALING_MATRIX + populate pps num_ref_idx Three Tier-2C/1B fixes from diff_against_ffmpeg.md (campaign repo): 1. Submit V4L2_CID_STATELESS_H264_SCALING_MATRIX every frame, with the H.264 spec flat default (every entry = 16) when the consumer didn't send a VAIQMatrixBufferH264. New helper: h264_default_flat_scaling_matrix(). Mirrors FFmpeg's v4l2_request_h264.c, which always provides a scaling matrix. Replaces patch 0012's VAIQMatrixBuffer-conditional submission, which was corpus-correct (bbb has no explicit scaling lists) but inconsistent with what hantro G1 expects. 2. Set pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT unconditionally. Hantro G1's set_params reads this flag to gate G1_REG_DEC_CTRL2_TYPE1_QUANT_E. 3. Populate pps->num_ref_idx_l0/l1_default_active_minus1 from VASliceParameterBufferH264.num_ref_idx_l*_active_minus1. Hantro G1 writes both into G1_REG_DEC_CTRL6_REFIDX0_ACTIVE / REFIDX1_ACTIVE. VAAPI doesn't expose the parsed-PPS default fields; the per-slice value is the closest available source (it matches the PPS default except on streams with an explicit per-slice override). Why now: the 2026-05-04 Phase 0 kernel-side audit (kernel source drivers/media/platform/verisilicon/hantro_g1_h264_dec.c) showed hantro G1 writes these fields directly into hardware MMIO registers. The prior assumption that they're "informational" or that "VAAPI handles defaults" was wrong: the hardware uses them to bit-walk the slice header and to size reference lists. See ~/src/libva-multiplanar/diff_against_ffmpeg.md. This is the easy half of the fix. The load-bearing half (adding a slice-header bit-parser to populate dec_param->dec_ref_pic_marking_bit_size, idr_pic_id, pic_order_cnt_bit_size) comes in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:21:23 +00:00
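The flat default corresponds to the spec's Flat_4x4_16 / Flat_8x8_16 lists (every weight 16). The sketch below uses a hypothetical local mirror of `struct v4l2_ctrl_h264_scaling_matrix`, assumed to hold six 4x4 lists and six 8x8 lists. It is not the fork's `h264_default_flat_scaling_matrix()` itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct v4l2_ctrl_h264_scaling_matrix. */
typedef struct {
    uint8_t scaling_list_4x4[6][16];
    uint8_t scaling_list_8x8[6][64];
} ScalingMatrixSketch;

/* Every weight = 16: the spec's flat default when no lists are signalled. */
static void default_flat_scaling_matrix(ScalingMatrixSketch *m)
{
    assert(m != NULL);
    memset(m->scaling_list_4x4, 16, sizeof m->scaling_list_4x4);
    memset(m->scaling_list_8x8, 16, sizeof m->scaling_list_8x8);
}
```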
Markus Fritsche	74b3793e3c	STUDY.md: pointer to libva-multiplanar campaign Phase 0 The Phase 0 / Phase 2 substrate that previously lived in this fork's STUDY.md has been migrated to the campaign-level phase0_findings.md at ../phase0_findings.md. This file is a pointer only. Note: after the 2026-05-04 Step 1 reconciliation (resetting fork master to bootlin `a3c2476` and replaying the marfrit-packages 18-patch series as commits), the historical commit referenced as `e0acc33` (STUDY.md phase 2 finding) lives only on the pre-step1 branch. To recover the historical content: 'git show pre-step1:STUDY.md'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 09:45:45 +00:00
Markus Fritsche	8594d74275	h264: derive sps.level_idc from H.264 Annex A.3 MaxFS Replaces patch 0013's hardcoded level_idc = 51 with a small lookup that picks the smallest level whose MaxFS contains the encoded frame size. Patch 0013's TODO is resolved by this change. VAAPI does not expose level_idc on the decode side (VAPictureParameterBufferH264 has no such field; only VAEncSequenceParameterBufferH264 carries it). The H.264 SPS NAL is parsed client-side by ffmpeg-vaapi and only slice data is forwarded in VASliceDataBuffer, so an SPS-NAL byte parser is not viable on the bitstream the libva-v4l2-request layer receives. We therefore derive level_idc from picture dimensions, which VAAPI does provide in VAPictureParameterBufferH264.picture_{width,height}_in_mbs_minus1. Annex A.3 (Table A-1) MaxFS thresholds, in macroblocks:
    Level 1.0:     99  (176×144 = 11×9 = 99)
    Level 1.1:    396  (352×288 = 22×18 = 396)
    Level 2.0:    396
    Level 2.1:    792  (352×480 = 660; 352×576 = 792)
    Level 2.2:   1620  (720×480 = 1350; 720×576 = 1620)
    Level 3.0:   1620
    Level 3.1:   3600  (1280×720 = 80×45 = 3600)
    Level 3.2:   5120
    Level 4.0:   8192  (1920×1088 = 8160 fits)
    Level 4.1:   8192
    Level 4.2:   8704
    Level 5.0:  22080
    Level 5.1:  36864  (3840×2176 = 32640 fits)
    Level 5.2:  36864
    Level 6.0: 139264  (7680×4352 = 130560 fits)
V4L2 control encoding: level_idc = (level major × 10) + (level minor), so Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60. Picks for typical content:
    1080p (1920×1088 = 8160 MBs)   → Level 4.1 (level_idc = 41)
    4K    (3840×2176 = 32640 MBs)  → Level 5.1 (level_idc = 51)
    8K    (7680×4352 = 130560 MBs) → Level 6.0 (level_idc = 60)
The previous hardcode of 51 over-allocated for 1080p; with this patch hantro can pre-allocate based on the actual frame size. For our ohm corpus (1080p) this drops the requested DPB / MV buffer sizing from level-5.1 generosity to a right-sized level 4.1. Without VAAPI exposing framerate we cannot also check MaxMBPS / MaxBR / MaxCPB.
The frame-size-based pick is acceptable in practice: temporally-dense streams almost always also push spatially-large frames, so MaxFS captures the dominant resource-sizing signal. Cross-reference: H.264 spec Annex A, Table A-1 ("Level limits"). ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists level_idc as required-userspace-input, no kernel-derives annotation. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
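A sketch of the MaxFS lookup, built from the thresholds listed above. It assumes a strict "first level whose MaxFS holds the frame" scan; `level_idc_for_frame` and `kLevels` are illustrative names. The inputs would be `picture_width_in_mbs_minus1 + 1` and `picture_height_in_mbs_minus1 + 1`.

```c
#include <assert.h>
#include <stddef.h>

/* MaxFS (macroblocks per frame) per Table A-1, with V4L2's level_idc
 * encoding (major * 10 + minor). Levels 1b, 1.2, 1.3 omitted as above. */
static const struct { unsigned max_fs; unsigned level_idc; } kLevels[] = {
    {99, 10},    {396, 11},   {396, 20},   {792, 21},    {1620, 22},
    {1620, 30},  {3600, 31},  {5120, 32},  {8192, 40},   {8192, 41},
    {8704, 42},  {22080, 50}, {36864, 51}, {36864, 52},  {139264, 60},
};

/* Smallest listed level whose MaxFS holds the frame; 0 if none does. */
static unsigned level_idc_for_frame(unsigned width_mbs, unsigned height_mbs)
{
    assert(width_mbs > 0 && height_mbs > 0);
    unsigned frame_mbs = width_mbs * height_mbs;
    for (size_t i = 0; i < sizeof kLevels / sizeof kLevels[0]; i++)
        if (frame_mbs <= kLevels[i].max_fs)
            return kLevels[i].level_idc;
    return 0;
}
```

Levels 4.0 and 4.1 share MaxFS 8192, so under this strict rule 1080p resolves to 4.0 (level_idc 40). Reproducing the 41 reported above requires a tie-break toward the higher level, so check which rule the fork's table actually applies.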
Markus Fritsche	b0a93e4683	h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id` — the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655, v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum (equation 8-28) for short-term references or LongTermPicNum (equation 8-29) for long-term references. The surface id has no relationship to either. Kernel-side consumers of pic_num: - mediatek/decoder/vdec/vdec_h264_req_common.c (line 210): dst_entry->pic_num = src_entry->pic_num. Used for field-coded short-term reference disambiguation. - hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read pic_num. They resolve refs via reference_ts (timestamp) and POC. This is why fourier's wrong value never surfaced on RK3568 hantro. This patch makes pic_num spec-correct so the libva-v4l2-request fork is upstreamable across drivers without depending on each target's tolerance for non-spec fills. Computation, derived from H.264 spec section 8.2.4.1: For frames (not field-coded), PicNum = FrameNumWrap. FrameNumWrap = (frame_num > cur_frame_num) ? frame_num - max_frame_num : frame_num max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4) cur_frame_num = current picture's frame_num For long-term references: LongTermPicNum = long_term_frame_idx (when not field-coded). VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64): VAPictureH264.frame_idx = long_ref ? pic_id : frame_num So long-term refs already carry long_term_frame_idx in frame_idx; we copy it through. Field-coded streams require an extra factor-of-2 plus a parity adjustment per spec equations 8-28/8-29; this patch does not handle field-coded content. ohm corpus is all frame-coded so this is a follow-up for later. Implementation: add VAPicture parameter to h264_fill_dpb so the function has access to seq_fields.log2_max_frame_num_minus4 and the current picture's frame_num. 
Update the single caller in h264_va_picture_to_v4l2. Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c line 210. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
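The frame-coded PicNum rule above can be sketched as a standalone function. It assumes, as the commit states, that ffmpeg-vaapi's `frame_idx` already carries `long_term_frame_idx` for long-term refs. `frame_pic_num` is a hypothetical name, and field-coded content is deliberately out of scope, as in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PicNum (= FrameNumWrap) for short-term refs, LongTermPicNum for
 * long-term refs; frame-coded pictures only. */
static int32_t frame_pic_num(uint32_t ref_frame_idx, bool long_term,
                             uint32_t cur_frame_num,
                             unsigned log2_max_frame_num_minus4)
{
    assert(log2_max_frame_num_minus4 <= 12); /* spec range 0..12 */
    if (long_term)
        return (int32_t)ref_frame_idx; /* already long_term_frame_idx */

    int32_t max_frame_num = 1 << (log2_max_frame_num_minus4 + 4);
    int32_t frame_num = (int32_t)ref_frame_idx;
    /* A ref with frame_num above the current one was coded before the
     * last wrap, so it maps to a negative FrameNumWrap. */
    return frame_num > (int32_t)cur_frame_num ? frame_num - max_frame_num
                                              : frame_num;
}
```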
Markus Fritsche	05ffd02ff2	h264: derive PFRAME / BFRAME flags from VASlice slice_type v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2- request relied on each backing driver tolerating frame-class ambiguity. Kernel survey (linux 6.19.x): - tegra-vde/h264.c (lines 783-799) consumes both flags to select the inter-frame decode kernel. Without them the I-frame kernel runs on P/B content. - visl-trace-h264.h uses them for decode tracing. - hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do not consume the flags. Hantro on ohm decoded bbb cleanly without these flags set (see phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is an upstreamability fix for cross-driver portability rather than a correctness fix for hantro. VAAPI's VASliceParameterBufferH264.slice_type maps directly to the H.264 slice_header() slice_type field. Per spec 7.4.3: 0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have this slice_type." `slice_type % 5` recovers the underlying type in either encoding form. In FRAME_BASED mode we only see surface->params.h264.slice from the most-recent VASliceParameterBuffer — that's fine: a single coded picture has a uniform slice_type for the purposes of the PFRAME / BFRAME flag (multi-slice frames may mix slice types in some streams, but the flag's semantic is "this is an inter-coded frame," which holds if any slice is P or B; using the last-seen slice's type is a reasonable approximation). Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters Flags table. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
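The `slice_type % 5` mapping can be sketched as follows. `SKETCH_FLAG_PFRAME`/`SKETCH_FLAG_BFRAME` are local stand-ins for `V4L2_H264_DECODE_PARAM_FLAG_PFRAME`/`_BFRAME`; their numeric values here are arbitrary, not the uAPI values. Leaving SP/SI slices (Extended profile only) unflagged is our own judgment call, not something the commit specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary local bits standing in for the V4L2 PFRAME/BFRAME flags. */
enum { SKETCH_FLAG_PFRAME = 1 << 0, SKETCH_FLAG_BFRAME = 1 << 1 };

/* slice_type 0=P 1=B 2=I 3=SP 4=SI; 5..9 repeat 0..4, so % 5 folds both. */
static uint32_t frame_class_flags(unsigned slice_type)
{
    switch (slice_type % 5) {
    case 0:
        return SKETCH_FLAG_PFRAME;
    case 1:
        return SKETCH_FLAG_BFRAME;
    default: /* I, plus SP/SI left unflagged in this sketch */
        return 0;
    }
}
```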
Markus Fritsche	fdb0b728d7	h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2 ROOT CAUSE for "kernel decodes successfully but produces zeroed CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR": ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) = 0x10000 as a sentinel for "uninitialised" at libavcodec/h264dec.c:301 (global init in ff_h264_decode_init) and libavcodec/h264dec.c:444 (IDR reset in the idr() helper). ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes pc->poc_msb = pc->prev_poc_msb whenever the slice header's pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (the typical case for any normal H.264 content with sane POC ordering). The sentinel leaks into field_poc[] (line 305) and from there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78). Empirical confirmation via the meitner 2026-05-02 ground-truth test: an LD_PRELOAD shim around vaCreateBuffer against an i965 VAAPI backend decoding a 60-frame H.264 Main clip. Every frame showed TopFieldOrderCnt = (POC | 0x10000): Frame 1 (IDR): raw bytes "00 00 01 00" at offset 12 → TopFOC=65536; Frame 2: raw bytes "06 00 01 00" → TopFOC=65542; Frame 3: "02 00 01 00" → TopFOC=65538. i965 decodes successfully regardless. V4L2 stateless drivers cannot tolerate the high word (hantro_h264.c::prepare_table feeds the value directly to tbl->poc[i*2]/[32], and the kernel reflist builder uses it directly for the cur_pic_order_count comparison): the kernel's resource sizing math sees POC=65536 for an IDR and breaks. This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static inline in src/h264.c. It checks that bit 16 is set rather than blindly subtracting, so a future ffmpeg version that fixes the leak degrades gracefully. The helper is applied at all four POC sites: 1. h264_fill_dpb: dpb->top_field_order_cnt 2. h264_fill_dpb: dpb->bottom_field_order_cnt 3. h264_va_picture_to_v4l2: decode->top_field_order_cnt 4.
h264_va_picture_to_v4l2: decode->bottom_field_order_cnt VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0 because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already sets POC=0 there; the sentinel never applies. Zeroing them explicitly removes a class of "stale POC value in invalidated slot" foot-guns. Non-trivial follow-ups identified during the meitner experiment that are NOT addressed by this patch: - PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are not yet derived from VASliceParameterBufferH264.slice_type. The bbb corpus is I-only at the start so this hasn't been a blocker, but a clip with B-frames will need the slice-type routing patch. - h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is almost certainly wrong per the kernel doc — pic_num must equal the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface id. Out of scope here; will surface as a defect on streams that have multi-frame DPB lookups. Cross-references: audit_0008_decode_params_2026-05-01.md — kernel-side consumer audit confirming POC fields are userspace-required. api_contract_findings_2026-05-01.md — VAAPI doc gap on POC semantics; H.264 spec section 8.2.1 is the binding contract. meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical capture of the sentinel pattern across 60 frames. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-04 09:45:05 +00:00
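A sketch of the bit-16 check described above. `strip_ffmpeg_poc_sentinel` is an illustrative name, not the fork's `h264_strip_ffmpeg_poc_sentinel()`. The extra `poc >= 0x10000` guard is our own addition: a genuine negative POC (for example -2 = 0xFFFFFFFE) also has bit 16 set and must not be rewritten. Note also that a sentinel-shifted negative POC (65536 - 2 = 65534) has bit 16 clear, so this test does not catch that case.

```c
#include <assert.h>
#include <stdint.h>

#define FFMPEG_POC_SENTINEL 0x10000 /* (1 << 16), ffmpeg's prev_poc_msb init */

/* Subtracts the leaked sentinel only when it is visibly present, so POCs
 * from an ffmpeg without the leak pass through unchanged. */
static inline int32_t strip_ffmpeg_poc_sentinel(int32_t poc)
{
    if (poc >= FFMPEG_POC_SENTINEL && (poc & FFMPEG_POC_SENTINEL))
        return poc - FFMPEG_POC_SENTINEL;
    return poc;
}
```

With the meitner captures, 65536 → 0, 65542 → 6 and 65538 → 2.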


290 Commits