Phase 4 plan for iter2 HEVC fix. Structured per the
feedback_dev_process.md Phase 6 contract-before-code worked example
(0012-h264-omit-scaling-matrix-frame-based.patch shape): contract
clauses with citations first, then code changes mapping 1:1 to
clauses.
10 contract clauses cited from authoritative sources:
Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS, count=5
Authority: linux/v4l2-controls.h:2090-2300 (8 HEVC stateless CIDs)
Reference impl: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
(v4l2_request_hevc_queue_decode)
Empirical anchor: Phase 3 Baseline B verbatim payload
Clause 2 — v4l2_ctrl_hevc_sps layout (40 bytes)
Authority: linux/v4l2-controls.h:2096+ (struct + 9 SPS_FLAG_* bits)
Field-by-field VAAPI source mapping table; existing
h265_fill_sps logic preserved, just routed to flags bitmask
Phase 3 Baseline B BBB SPS bytes: flags=SAO|STRONG_INTRA_SMOOTHING
Clause 3 — v4l2_ctrl_hevc_pps layout (64 bytes, 19 flags)
Authority: linux/v4l2-controls.h:2126-2150
Field source: VAPictureParameterBufferHEVC + slice (for
dependent_slice_segment_flag)
Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array)
Authority: kernel exposes 0xa40a92 elems=1 dims=[600] dynamic-array
Submission shape: size = sizeof(slice_params) * num_slices_in_frame
Reference impl: FFmpeg v4l2_request_hevc.c:540-547
BEHAVIORAL CHANGE: per-slice accumulation in codec_store_buffer
(replace overwrite with append-to-array)
DPB MOVES OUT of slice_params to DECODE_PARAMS (Clause 6)
Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional)
Conditional on kernel availability (probed via VIDIOC_QUERY_EXT_CTRL
at init), NOT on bitstream flag (Phase 3 baseline corrects Phase 2
assumption)
Spec defaults from ISO/IEC 23008-2 Tables 7-5/7-6 when iqmatrix_set==false
PROTOCOL: transcribe defaults from Phase 3 Baseline B verbatim
SCALING_MATRIX bytes, NOT from spec recall (per
memory feedback_review_empirical_over_theoretical.md)
Clause 6 — v4l2_ctrl_hevc_decode_params layout (328 bytes)
NEW in modern API (didn't exist in staging-era)
Contains: DPB array (16 entries), POC, num_active_dpb_entries,
num_poc_st_curr_before/after, num_poc_lt_curr,
poc_st_curr_before[16], etc.
Source: existing h265_fill_slice_params lines 269-315 logic
preserved, routed to new struct
Clause 7 — Device-wide DECODE_MODE + START_CODE menus
Set once at init via v4l2_set_controls(...request_fd=-1, 2 ctrls)
rkvdec accepts: FRAME_BASED + ANNEX_B (only options per kernel menu
constraints, Phase 0 v4l2_inventory)
Default location: extend src/context.c:142-155 device-init block
Clause 8 — config.c HEVCMain case must break;
Authority: C semantics; iter1 Bug 1 pattern verbatim
Empirical anchor: Phase 3 Baseline D scratch confirmed
Clause 9 — picture.c::codec_set_controls HEVCMain dispatch
Authority: existing MPEG-2 dispatch pattern at picture.c:186-191
Replace explicit Fourier-local: HEVC stripped reject with
h265_set_controls call
Clause 10 — Per-slice accumulation in codec_store_buffer
HEVC slice_params dynamic-array source = per-RenderPicture appends
BeginPicture resets num_slices=0; codec_store_buffer appends each
VASliceParameterBufferType to slices[N] array
Diff scope (8 files):
src/config.c — 5-line break addition (Clause 8)
src/picture.c — HEVCMain dispatch (Clause 9) + per-slice
accumulation (Clause 10) + BeginPicture
num_slices reset, ~25 lines
src/surface.h — extend params.h265 with slices[64] +
num_slices, ~17 KB extra per surface union
src/h265.c — full rewrite ~400 lines (Clauses 2-7)
src/h265.h — re-enable
src/meson.build — uncomment h265.c + h265.h
src/context.c — extend device-init for HEVC DECODE_MODE +
START_CODE
include/hevc-ctrls.h — leave as-is (9-line shim, lower-risk path
per iter1 Phase 5 Nit 6 deferral)
Phase 6 implementation order (2 logical commits + optional fix-forward):
A: src/config.c HEVCMain break only (substrate fix in isolation;
Phase 3 Baseline D already verified collateral safe)
B: h265.c rewrite + picture.c dispatch + slice_params accumulation +
meson re-enable + surface.h extension + context.c device-init
C: optional fix-forward if Phase 7 surfaces a regression
Phase 7 verification harness (full Bash incantations in plan body):
Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec
Criterion 2: vaCreateConfig(VAProfileHEVCMain) = SUCCESS via libva trace
Criterion 3: ffmpeg -hwaccel vaapi exit 0, no Failed-to-create
Criterion 4: mpv --hwdec=vaapi --vo=image at +02s; HW=SW byte-identical
(DMA-BUF GL cache-coherency-safe path per memory
feedback_rockchip_pixel_verify_path.md)
Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
Bonus: byte-compare post-fix S_EXT_CTRLS payload vs Baseline B
Pre-identified Phase 7 → Phase 4 loopback triggers:
1. S_EXT_CTRLS EINVAL post-fix → check struct sizes (pahole),
reserved zeroing, SCALING_MATRIX size encoding
2. HW pixel hash mismatch → DPB ordering, slice_params bit_offset,
SPS/PPS flags bit positions, SCALING_MATRIX values
3. mpv --hwdec=vaapi filters HEVC out → fall-forward to ffmpeg
-vf hwdownload (less likely; vaapi engaged MPEG-2 in iter1)
4. iter1/T4 regression → verify diffs scoped right
5. Slice_params dynamic-array submission shape rejected → cross-
validator size encoding anchor
6. SCALING_MATRIX availability detection wrong → defensive
QUERY_EXT_CTRL probe in h265_init_device_controls
7. Latent bug B3 hits HEVC differently than MPEG-2 → byte 240 in
h265.picture; ffmpeg-vaapi sends VAPictureParameterBufferType
per frame so masking holds
Out-of-scope (LOCKED): VP9/VP8; HEVC Main 10 / Main Still Picture /
range ext / tile-wavefront; perf metrics; long-duration stress;
SLICE_BASED decode mode (rkvdec FRAME_BASED only); Phase 4 cross-
cutting backlog (B1 device-discovery, B3 BeginPicture profile-aware,
B4 context.c log suppression, B5 vbv_buffer_size, L3 vaDeriveImage
cache-stale); chromium-fourier 149 install; upstream engagement;
hevc-ctrls.h deletion (Phase 5 Nit 6 lower-risk path continues).
Predicted Phase 8 close: 4-6 commits on the fork (vs iter1's 4).
Iter2 ~3x larger codebase delta than iter1 (mpeg2.c rewrite was
~120 lines; h265.c rewrite is ~400 lines).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 2 — Phase 4 (plan)
Implementation plan for iter2 HEVC Main on rkvdec. Inputs:
- phase0_findings_iter2.md: Phase 1 lock (5 boolean criteria).
- phase2_iter2_situation.md: six bugs identified in HEVC path.
- phase3_iter2_baseline.md: substrate verified post-upgrade, HEVC cross-validator anchor captured (5-control per-frame batch).
Per feedback_dev_process.md Phase 6 contract-before-code: this plan opens with the contract clauses (kernel UAPI + FFmpeg reference + Phase 3 Baseline B verbatim citations), then specifies code changes that map 1:1 to those clauses.
Phase 1 criteria (re-stated; no Phase 3 → Phase 1 loopback this time)
Per phase0_findings_iter2.md, all 5 criteria as locked. No Phase 3 surprises required adjustment (criterion 3 already anchored on ffmpeg-direct from the start, mirroring iter1's Phase 5 Q4 amendment).
1. vainfo enumeration regression: VAProfileHEVCMain continues to be listed on the rkvdec env binding. (Already passes; iter2 must not strip.)
2. vaCreateConfig success: vaCreateConfig(VAProfileHEVCMain, VAEntrypointVLD) returns VA_STATUS_SUCCESS. (Currently VA_STATUS_ERROR_UNSUPPORTED_PROFILE = 12.)
3. End-to-end ffmpeg-direct decode: ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 5 -f null - exits 0; libva trace shows vaCreateConfig SUCCESS; no "Failed to create decode configuration" lines; no EINVAL from VIDIOC_S_EXT_CTRLS.
4. DMA-BUF GL HW=SW byte-identical at +02s: 2 distinct frames hash-equal across HW (mpv --hwdec=vaapi --vo=image) and SW (--hwdec=no); frames 1 vs 2 hash-differ (real motion).
5. Regression on iter1 MPEG-2 AND T4 H.264: both prior-iteration cells continue to pass with their reference hashes.
Contract clauses (cite-before-code)
Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS with 5 controls
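Authority: Linux mainline include/uapi/linux/v4l2-controls.h:2090-2300 defines the 8 HEVC stateless controls listed below: 5 submitted per frame (SCALING_MATRIX only when the kernel advertises it, see Clause 5), 2 device-wide (DECODE_MODE, START_CODE), and ENTRY_POINT_OFFSETS, which iter2 does not use: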
#define V4L2_CID_STATELESS_HEVC_SPS (V4L2_CID_CODEC_STATELESS_BASE+400) /* 0xa40a90 */
#define V4L2_CID_STATELESS_HEVC_PPS (V4L2_CID_CODEC_STATELESS_BASE+401) /* 0xa40a91 */
#define V4L2_CID_STATELESS_HEVC_SLICE_PARAMS (V4L2_CID_CODEC_STATELESS_BASE+402) /* 0xa40a92 */
#define V4L2_CID_STATELESS_HEVC_SCALING_MATRIX (V4L2_CID_CODEC_STATELESS_BASE+403) /* 0xa40a93 */
#define V4L2_CID_STATELESS_HEVC_DECODE_PARAMS (V4L2_CID_CODEC_STATELESS_BASE+404) /* 0xa40a94 */
#define V4L2_CID_STATELESS_HEVC_DECODE_MODE (V4L2_CID_CODEC_STATELESS_BASE+405) /* 0xa40a95 */
#define V4L2_CID_STATELESS_HEVC_START_CODE (V4L2_CID_CODEC_STATELESS_BASE+406) /* 0xa40a96 */
#define V4L2_CID_STATELESS_HEVC_ENTRY_POINT_OFFSETS (V4L2_CID_CODEC_STATELESS_BASE+407) /* not iter2 — tile/wavefront */
Reference implementation: FFmpeg libavcodec/v4l2_request_hevc.c:505-565 (v4l2_request_hevc_queue_decode) builds a 5-element v4l2_ext_control array and submits via ff_v4l2_request_decode_frame (single VIDIOC_S_EXT_CTRLS per frame).
Empirical anchor: Phase 3 Baseline B strace verbatim (phase3_iter2_baseline.md + phase0_evidence/2026-05-08/iter2_phase3/ffmpeg_v4l2req.strace.* gitignored) shows:
ioctl(/dev/video1, VIDIOC_S_EXT_CTRLS,
{ctrl_class=0xf010000 /* V4L2_CTRL_WHICH_REQUEST_VAL */,
count=5,
controls=[
{id=0xa40a90 SPS, size=40, ...},
{id=0xa40a91 PPS, size=64, ...},
{id=0xa40a92 SLICE_PARAMS, size=N, ...}, /* dynamic-array */
{id=0xa40a93 SCALING_MATRIX, size=M, ...}, /* conditional on kernel availability */
{id=0xa40a94 DECODE_PARAMS, size=328, ...}
]}) = 0
Implication for iter2: h265_set_controls() builds a 5-entry struct v4l2_ext_control array and submits via the existing v4l2_set_controls(driver_data->video_fd, surface_object->request_fd, controls, 5) API. One VIDIOC_S_EXT_CTRLS per frame, mirroring iter1 MPEG-2 + iter6/7/8 H.264 patterns.
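The batch shape described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the backend's code: struct ext_ctrl_sketch is a hypothetical stand-in for the kernel's struct v4l2_ext_control (only id, size, ptr), and struct hevc_frame_payload and build_hevc_batch are names invented for this example. It shows the Baseline B ordering and how the count drops from 5 to 4 when the init-time probe reports no SCALING_MATRIX control.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CID values exactly as listed in Clause 1. */
#define CID_HEVC_SPS            0xa40a90u
#define CID_HEVC_PPS            0xa40a91u
#define CID_HEVC_SLICE_PARAMS   0xa40a92u
#define CID_HEVC_SCALING_MATRIX 0xa40a93u
#define CID_HEVC_DECODE_PARAMS  0xa40a94u

/* Hypothetical stand-in for struct v4l2_ext_control (id, size, ptr only). */
struct ext_ctrl_sketch {
    uint32_t id;
    uint32_t size;
    void *ptr;
};

/* Hypothetical bundle of per-frame payloads already filled elsewhere. */
struct hevc_frame_payload {
    void *sps;            uint32_t sps_size;
    void *pps;            uint32_t pps_size;
    void *slices;         uint32_t slice_elem_size, num_slices;
    void *scaling_matrix; uint32_t scaling_matrix_size;
    void *decode_params;  uint32_t decode_params_size;
};

/* Fills out[] in the Baseline B order and returns the entry count (4 or 5). */
static unsigned int build_hevc_batch(const struct hevc_frame_payload *p,
                                     bool has_scaling_matrix,
                                     struct ext_ctrl_sketch out[5])
{
    unsigned int n = 0;

    assert(p != NULL && out != NULL);
    out[n++] = (struct ext_ctrl_sketch){ CID_HEVC_SPS, p->sps_size, p->sps };
    out[n++] = (struct ext_ctrl_sketch){ CID_HEVC_PPS, p->pps_size, p->pps };
    /* Dynamic array: one element per slice accumulated for this frame. */
    out[n++] = (struct ext_ctrl_sketch){ CID_HEVC_SLICE_PARAMS,
                                         p->slice_elem_size * p->num_slices,
                                         p->slices };
    if (has_scaling_matrix) /* result of the init-time availability probe */
        out[n++] = (struct ext_ctrl_sketch){ CID_HEVC_SCALING_MATRIX,
                                             p->scaling_matrix_size,
                                             p->scaling_matrix };
    out[n++] = (struct ext_ctrl_sketch){ CID_HEVC_DECODE_PARAMS,
                                         p->decode_params_size,
                                         p->decode_params };
    return n;
}
```

In the real backend the returned count would be passed as the last argument of v4l2_set_controls instead of a hardcoded 5, so a kernel without SCALING_MATRIX is handled by the same code path.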
Clause 2 — v4l2_ctrl_hevc_sps field layout (40 bytes)
Authority: <linux/v4l2-controls.h>:2096+ struct v4l2_ctrl_hevc_sps:
struct v4l2_ctrl_hevc_sps {
__u8 video_parameter_set_id;
__u8 seq_parameter_set_id;
__u16 pic_width_in_luma_samples;
__u16 pic_height_in_luma_samples;
__u8 bit_depth_luma_minus8;
__u8 bit_depth_chroma_minus8;
__u8 log2_max_pic_order_cnt_lsb_minus4;
__u8 sps_max_dec_pic_buffering_minus1;
__u8 sps_max_num_reorder_pics;
__u8 sps_max_latency_increase_plus1;
__u8 log2_min_luma_coding_block_size_minus3;
__u8 log2_diff_max_min_luma_coding_block_size;
__u8 log2_min_luma_transform_block_size_minus2;
__u8 log2_diff_max_min_luma_transform_block_size;
__u8 max_transform_hierarchy_depth_inter;
__u8 max_transform_hierarchy_depth_intra;
__u8 pcm_sample_bit_depth_luma_minus1;
__u8 pcm_sample_bit_depth_chroma_minus1;
__u8 log2_min_pcm_luma_coding_block_size_minus3;
__u8 log2_diff_max_min_pcm_luma_coding_block_size;
__u8 num_short_term_ref_pic_sets;
__u8 num_long_term_ref_pics_sps;
__u8 chroma_format_idc;
__u8 sps_max_sub_layers_minus1;
__u8 reserved[6];
__u64 flags;
};
Total 40 bytes (verified against Phase 3 Baseline B verbatim payload size). 9 boolean fields collapsed into u64 flags:
#define V4L2_HEVC_SPS_FLAG_SEPARATE_COLOUR_PLANE (1ULL << 0)
#define V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED (1ULL << 1)
#define V4L2_HEVC_SPS_FLAG_AMP_ENABLED (1ULL << 2)
#define V4L2_HEVC_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET (1ULL << 3)
#define V4L2_HEVC_SPS_FLAG_PCM_ENABLED (1ULL << 4)
#define V4L2_HEVC_SPS_FLAG_PCM_LOOP_FILTER_DISABLED (1ULL << 5)
#define V4L2_HEVC_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT (1ULL << 6)
#define V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED (1ULL << 7)
#define V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)
VAAPI source mapping (mostly preserved from current src/h265.c::h265_fill_sps; the boolean fields are now collapsed into the flags bitmask):
| New SPS field | Source: VAPictureParameterBufferHEVC picture |
|---|---|
| pic_width_in_luma_samples | picture->pic_width_in_luma_samples |
| pic_height_in_luma_samples | picture->pic_height_in_luma_samples |
| bit_depth_luma_minus8 | picture->bit_depth_luma_minus8 |
| bit_depth_chroma_minus8 | picture->bit_depth_chroma_minus8 |
| chroma_format_idc | picture->pic_fields.bits.chroma_format_idc |
| log2_max_pic_order_cnt_lsb_minus4 | picture->log2_max_pic_order_cnt_lsb_minus4 |
| sps_max_dec_pic_buffering_minus1 | picture->sps_max_dec_pic_buffering_minus1 |
| sps_max_num_reorder_pics | 0 (current code hardcodes; VAAPI doesn't expose) |
| sps_max_latency_increase_plus1 | 0 (same) |
| log2_min_luma_coding_block_size_minus3 | picture->log2_min_luma_coding_block_size_minus3 |
| log2_diff_max_min_luma_coding_block_size | picture->log2_diff_max_min_luma_coding_block_size |
| log2_min_luma_transform_block_size_minus2 | picture->log2_min_transform_block_size_minus2 |
| log2_diff_max_min_luma_transform_block_size | picture->log2_diff_max_min_transform_block_size |
| max_transform_hierarchy_depth_inter/intra | same fields in VAAPI |
| pcm_sample_bit_depth_luma_minus1, etc. | same fields |
| num_short_term_ref_pic_sets | picture->num_short_term_ref_pic_sets |
| num_long_term_ref_pics_sps | picture->num_long_term_ref_pic_sps |
| sps_max_sub_layers_minus1 | 0 (VAAPI doesn't expose; placeholder) |
| video_parameter_set_id | 0 (VAAPI doesn't expose) |
| seq_parameter_set_id | 0 (VAAPI doesn't expose) |
| flags (OR of:) | |
| _SEPARATE_COLOUR_PLANE | picture->pic_fields.bits.separate_colour_plane_flag |
| _SCALING_LIST_ENABLED | picture->pic_fields.bits.scaling_list_enabled_flag |
| _AMP_ENABLED | picture->pic_fields.bits.amp_enabled_flag |
| _SAMPLE_ADAPTIVE_OFFSET | picture->slice_parsing_fields.bits.sample_adaptive_offset_enabled_flag |
| _PCM_ENABLED | picture->pic_fields.bits.pcm_enabled_flag |
| _PCM_LOOP_FILTER_DISABLED | picture->pic_fields.bits.pcm_loop_filter_disabled_flag |
| _LONG_TERM_REF_PICS_PRESENT | picture->slice_parsing_fields.bits.long_term_ref_pics_present_flag |
| _SPS_TEMPORAL_MVP_ENABLED | picture->slice_parsing_fields.bits.sps_temporal_mvp_enabled_flag |
| _STRONG_INTRA_SMOOTHING_ENABLED | picture->pic_fields.bits.strong_intra_smoothing_enabled_flag |
| reserved[6] | zero (via memset) |
Phase 3 Baseline B verbatim sanity: BBB SPS bytes decode to: 1280×720, 8-bit, 4:2:0, no PCM, flags=SAMPLE_ADAPTIVE_OFFSET | STRONG_INTRA_SMOOTHING_ENABLED (0x108). iter2 implementation must produce the same 40 bytes for this fixture (Phase 7 byte-compare check).
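The boolean-to-bitmask collapse for the SPS can be sketched as below. The bit positions are copied from the V4L2_HEVC_SPS_FLAG_* list in this clause; struct sps_bits_sketch is a hypothetical stand-in for the VAAPI pic_fields / slice_parsing_fields bitfields named in the mapping table, and sps_collapse_flags is an assumed helper name, not an existing function in h265.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions copied from the V4L2_HEVC_SPS_FLAG_* definitions. */
#define SPS_FLAG_SEPARATE_COLOUR_PLANE          (1ULL << 0)
#define SPS_FLAG_SCALING_LIST_ENABLED           (1ULL << 1)
#define SPS_FLAG_AMP_ENABLED                    (1ULL << 2)
#define SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET         (1ULL << 3)
#define SPS_FLAG_PCM_ENABLED                    (1ULL << 4)
#define SPS_FLAG_PCM_LOOP_FILTER_DISABLED       (1ULL << 5)
#define SPS_FLAG_LONG_TERM_REF_PICS_PRESENT     (1ULL << 6)
#define SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED       (1ULL << 7)
#define SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)

/* Hypothetical stand-in for the VAAPI bitfields named in the table. */
struct sps_bits_sketch {
    bool separate_colour_plane_flag;
    bool scaling_list_enabled_flag;
    bool amp_enabled_flag;
    bool sample_adaptive_offset_enabled_flag;
    bool pcm_enabled_flag;
    bool pcm_loop_filter_disabled_flag;
    bool long_term_ref_pics_present_flag;
    bool sps_temporal_mvp_enabled_flag;
    bool strong_intra_smoothing_enabled_flag;
};

/* ORs one flag bit per set boolean; unset booleans contribute nothing. */
static uint64_t sps_collapse_flags(const struct sps_bits_sketch *b)
{
    uint64_t flags = 0;

    assert(b != NULL);
    if (b->separate_colour_plane_flag)          flags |= SPS_FLAG_SEPARATE_COLOUR_PLANE;
    if (b->scaling_list_enabled_flag)           flags |= SPS_FLAG_SCALING_LIST_ENABLED;
    if (b->amp_enabled_flag)                    flags |= SPS_FLAG_AMP_ENABLED;
    if (b->sample_adaptive_offset_enabled_flag) flags |= SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET;
    if (b->pcm_enabled_flag)                    flags |= SPS_FLAG_PCM_ENABLED;
    if (b->pcm_loop_filter_disabled_flag)       flags |= SPS_FLAG_PCM_LOOP_FILTER_DISABLED;
    if (b->long_term_ref_pics_present_flag)     flags |= SPS_FLAG_LONG_TERM_REF_PICS_PRESENT;
    if (b->sps_temporal_mvp_enabled_flag)       flags |= SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED;
    if (b->strong_intra_smoothing_enabled_flag) flags |= SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED;
    return flags;
}
```

With only the SAO and strong-intra-smoothing booleans set, the helper yields bit 3 plus bit 8, i.e. the 0x108 value quoted for the BBB fixture.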
Clause 3 — v4l2_ctrl_hevc_pps field layout (64 bytes)
Authority: <linux/v4l2-controls.h>:2150+ struct v4l2_ctrl_hevc_pps. Total 64 bytes. 19 boolean PPS fields collapsed into u64 flags:
#define V4L2_HEVC_PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED (1ULL << 0)
#define V4L2_HEVC_PPS_FLAG_OUTPUT_FLAG_PRESENT (1ULL << 1)
#define V4L2_HEVC_PPS_FLAG_SIGN_DATA_HIDING_ENABLED (1ULL << 2)
#define V4L2_HEVC_PPS_FLAG_CABAC_INIT_PRESENT (1ULL << 3)
#define V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED (1ULL << 4)
#define V4L2_HEVC_PPS_FLAG_TRANSFORM_SKIP_ENABLED (1ULL << 5)
#define V4L2_HEVC_PPS_FLAG_CU_QP_DELTA_ENABLED (1ULL << 6)
#define V4L2_HEVC_PPS_FLAG_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT (1ULL << 7)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_PRED (1ULL << 8)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_BIPRED (1ULL << 9)
#define V4L2_HEVC_PPS_FLAG_TRANSQUANT_BYPASS_ENABLED (1ULL << 10)
#define V4L2_HEVC_PPS_FLAG_TILES_ENABLED (1ULL << 11)
#define V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED (1ULL << 12)
#define V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED (1ULL << 13)
#define V4L2_HEVC_PPS_FLAG_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED (1ULL << 14)
#define V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_OVERRIDE_ENABLED (1ULL << 15)
#define V4L2_HEVC_PPS_FLAG_PPS_DISABLE_DEBLOCKING_FILTER (1ULL << 16)
#define V4L2_HEVC_PPS_FLAG_LISTS_MODIFICATION_PRESENT (1ULL << 17)
#define V4L2_HEVC_PPS_FLAG_SLICE_SEGMENT_HEADER_EXTENSION_PRESENT (1ULL << 18)
VAAPI source mapping: extracted from BOTH picture (VAPictureParameterBufferHEVC) AND slice (VASliceParameterBufferHEVC for dependent_slice_segment_flag). The current src/h265.c::h265_fill_pps (lines 48-102) does the field extraction correctly; iter2 just collapses booleans into the new u64 flags bitmask:
| New PPS field | VAAPI source (old h265.c location) |
|---|---|
| pps->dependent_slice_segment_flag (now flags & DEPENDENT_SLICE_SEGMENT_ENABLED) | slice->LongSliceFlags.fields.dependent_slice_segment_flag (line 54) |
| pps->output_flag_present_flag (now flags & OUTPUT_FLAG_PRESENT) | picture->slice_parsing_fields.bits.output_flag_present_flag |
| pps->num_extra_slice_header_bits (kept as field) | picture->num_extra_slice_header_bits |
| ... (17 more boolean field-to-flag conversions, mechanical) | |
| pps->init_qp_minus26 (kept) | picture->init_qp_minus26 |
| pps->diff_cu_qp_delta_depth (kept) | picture->diff_cu_qp_delta_depth |
| pps->pps_cb_qp_offset (kept) | picture->pps_cb_qp_offset |
| pps->pps_cr_qp_offset (kept) | picture->pps_cr_qp_offset |
| pps->num_tile_columns_minus1 (kept) | picture->num_tile_columns_minus1 |
| pps->num_tile_rows_minus1 (kept) | picture->num_tile_rows_minus1 |
| pps->pps_beta_offset_div2 (kept) | picture->pps_beta_offset_div2 |
| pps->pps_tc_offset_div2 (kept) | picture->pps_tc_offset_div2 |
| pps->log2_parallel_merge_level_minus2 (kept) | picture->log2_parallel_merge_level_minus2 |
| Fields added: column_width_minus1[20], row_height_minus1[22], num_extra_slice_header_bits, reserved | populate from VAAPI (or zero if VAAPI doesn't expose) |
| flags u64 with the 19 bits OR'd | (mechanical boolean collapse) |
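A compile-time size check is a cheap guard for the 64-byte PPS contract. The struct below is a local mirror whose field order is an assumption based on the mainline header; it is not copied from this repository, and it should be checked against the installed header (for example with pahole) before being trusted. pps_fill_sketch is a hypothetical helper that shows the memset-then-fill pattern, which keeps the reserved byte zero, together with the one flag that comes from the slice buffer rather than the picture buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of struct v4l2_ctrl_hevc_pps; verify against the header. */
struct hevc_pps_mirror {
    uint8_t  pic_parameter_set_id;
    uint8_t  num_extra_slice_header_bits;
    uint8_t  num_ref_idx_l0_default_active_minus1;
    uint8_t  num_ref_idx_l1_default_active_minus1;
    int8_t   init_qp_minus26;
    uint8_t  diff_cu_qp_delta_depth;
    int8_t   pps_cb_qp_offset;
    int8_t   pps_cr_qp_offset;
    uint8_t  num_tile_columns_minus1;
    uint8_t  num_tile_rows_minus1;
    uint8_t  column_width_minus1[20];
    uint8_t  row_height_minus1[22];
    int8_t   pps_beta_offset_div2;
    int8_t   pps_tc_offset_div2;
    uint8_t  log2_parallel_merge_level_minus2;
    uint8_t  reserved;
    uint64_t flags;
};

/* The build fails if the mirror drifts from the 64-byte payload size. */
_Static_assert(sizeof(struct hevc_pps_mirror) == 64, "PPS payload must be 64 bytes");
_Static_assert(offsetof(struct hevc_pps_mirror, flags) == 56, "flags expected at byte 56");

#define PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED (1ULL << 0)
#define PPS_FLAG_OUTPUT_FLAG_PRESENT             (1ULL << 1)

/* Zeroes the whole struct first so reserved and unused fields stay 0. */
static void pps_fill_sketch(struct hevc_pps_mirror *pps,
                            int8_t init_qp_minus26,
                            int dependent_slice_segment_flag, /* from the slice buffer */
                            int output_flag_present_flag)     /* from the picture buffer */
{
    memset(pps, 0, sizeof(*pps));
    pps->init_qp_minus26 = init_qp_minus26;
    if (dependent_slice_segment_flag)
        pps->flags |= PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED;
    if (output_flag_present_flag)
        pps->flags |= PPS_FLAG_OUTPUT_FLAG_PRESENT;
}
```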
Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array per frame)
Authority: <linux/v4l2-controls.h> struct v4l2_ctrl_hevc_slice_params. Contains per-slice info: bit_size, data_bit_offset, slice_type, slice_pic_order_cnt, slice flags, QP deltas, ref_idx_l0/l1[15], pred_weight_table, num_entry_point_offsets, slice_segment_addr, etc.
Phase 0 inventory confirms rkvdec advertises:
hevc_slice_parameters 0x00a40a92 (hevc-slice-params): elems=1 dims=[600] flags=has-payload, dynamic-array
So kernel accepts up to 600 slice_params entries per submission. iter2's bbb_720p10s_hevc.mp4 fixture is x265-ultrafast — typical 1 slice per frame; multi-slice would still fit in the 600-entry envelope.
Submission shape: size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices_in_frame. FFmpeg libavcodec/v4l2_request_hevc.c:540-547 shows the pattern:
if (ctx->max_slice_params && controls->num_slice_params) {
control[count++] = (struct v4l2_ext_control) {
.id = V4L2_CID_STATELESS_HEVC_SLICE_PARAMS,
.ptr = controls->frame_slice_params,
.size = sizeof(*controls->frame_slice_params) *
FFMIN(controls->num_slice_params, ctx->max_slice_params),
};
}
libva backend behavioral change (NEW for iter2): VAAPI clients submit VASliceParameterBufferType once per slice via vaRenderPicture. For HEVC, the current src/picture.c::codec_store_buffer (lines 115-135) does memcpy(&surface->params.h265.slice, …), so each slice overwrites the previous slice's params. iter2 must append instead: each VASliceParameterBufferType arrival writes a new entry into a params.h265.slices[N] array and increments params.h265.num_slices. At EndPicture, h265_set_controls reads the array and submits it as one dynamic-array control.
VAAPI source mapping: existing src/h265.c::h265_fill_slice_params (lines 160-365) does the field extraction per-slice correctly. iter2 preserves the extraction logic (NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight) but routes per-slice into an array slot rather than a single struct.
Critical: NAL header parsing at h265.c:184-209 extracts nal_unit_type and data_bit_offset from the slice bitstream. This logic is preserved — the new V4L2 API still requires per-slice bit_size and data_bit_offset. The new struct keeps these fields (they're per-slice metadata, not per-frame).
One field MOVES OUT of slice_params: the DPB array (dpb[15]) and num_active_dpb_entries / num_rps_poc_st_curr_before/after / num_rps_poc_lt_curr migrate to DECODE_PARAMS (Clause 6). iter2's per-slice fill no longer populates the DPB.
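The dynamic-array size rule from this clause reduces to a clamp and a multiply. The sketch below mirrors the FFmpeg pattern quoted above (omit the control when there are no slices or no capacity, otherwise clamp to the advertised element count); both function names are hypothetical, and max_elems stands for the dims value reported by the kernel (600 on rkvdec per the Phase 0 inventory).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the SLICE_PARAMS control should be part of the batch at all. */
static bool slice_params_should_submit(uint32_t num_slices, uint32_t max_elems)
{
    return num_slices > 0 && max_elems > 0;
}

/* Payload size in bytes: element size times the clamped slice count. */
static uint32_t slice_params_payload_size(uint32_t elem_size,
                                          uint32_t num_slices,
                                          uint32_t max_elems)
{
    uint32_t n = num_slices < max_elems ? num_slices : max_elems;

    return elem_size * n;
}
```

For the single-slice BBB fixture this yields exactly one element, so the size N in the Baseline B payload should equal sizeof(struct v4l2_ctrl_hevc_slice_params).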
Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional submission)
Authority: <linux/v4l2-controls.h> struct v4l2_ctrl_hevc_scaling_matrix. Contains 4 scaling lists (4×4, 8×8, 16×16, 32×32) for luma + chroma intra/inter — substantial struct.
Conditional submission per FFmpeg pattern: query kernel availability once at init via VIDIOC_QUERY_EXT_CTRL for the SCALING_MATRIX CID. If kernel advertises (rkvdec on fresnel does, per Phase 3 Baseline B), include in the per-frame batch unconditionally. If kernel doesn't advertise, omit.
Phase 3 evidence: BBB fixture's per-frame batch always contains SCALING_MATRIX (see Baseline B verbatim 30 occurrences across 5 frames + queries). FFmpeg gates on ctx->has_scaling_matrix set at init from ff_v4l2_request_query_control_default_value(...SCALING_MATRIX). iter2 mirrors: probe at init, store boolean in the libva backend's per-context state, include in batch if true.
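VAAPI source mapping: VAIQMatrixBufferHEVC provides the four scaling lists (ScalingList4x4[6][16], ScalingList8x8[6][64], ScalingList16x16[6][64], ScalingList32x32[2][64]) plus the DC coefficients (ScalingListDC16x16[6], ScalingListDC32x32[2]). When iqmatrix_set==true, copy from the VAAPI struct to the V4L2 struct. When iqmatrix_set==false, populate with default scaling matrices. In the HEVC spec (ISO/IEC 23008-2 Tables 7-5 and 7-6) only the 4×4 default lists and the DC values are flat 16; the 8×8 and larger default lists are not flat, so the default bytes are taken from the Baseline B payload rather than assumed.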
Phase 3 Baseline B SCALING_MATRIX verbatim payload not field-decoded yet (deferred to Phase 6 transcription); will compare bytes against backend-generated payload at Phase 7 verification time.
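The copy path for the scaling matrix can be sketched as follows. Both structs are local mirrors and are assumptions: hevc_scaling_matrix_mirror follows the mainline layout of struct v4l2_ctrl_hevc_scaling_matrix as recalled here (which would make the size M equal to 1000 bytes, to be confirmed against Baseline B), and va_iq_sketch stands in for the list members of VAIQMatrixBufferHEVC. The sketch copies element-for-element; whether the two APIs agree on coefficient scan order is not established by this plan, so a reorder step may be needed where the comment marks it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of struct v4l2_ctrl_hevc_scaling_matrix. */
struct hevc_scaling_matrix_mirror {
    uint8_t scaling_list_4x4[6][16];
    uint8_t scaling_list_8x8[6][64];
    uint8_t scaling_list_16x16[6][64];
    uint8_t scaling_list_32x32[2][64];
    uint8_t scaling_list_dc_coef_16x16[6];
    uint8_t scaling_list_dc_coef_32x32[2];
};

_Static_assert(sizeof(struct hevc_scaling_matrix_mirror) == 1000,
               "expected 96 + 384 + 384 + 128 + 6 + 2 bytes");

/* Hypothetical stand-in for the list members of VAIQMatrixBufferHEVC. */
struct va_iq_sketch {
    uint8_t ScalingList4x4[6][16];
    uint8_t ScalingList8x8[6][64];
    uint8_t ScalingList16x16[6][64];
    uint8_t ScalingList32x32[2][64];
    uint8_t ScalingListDC16x16[6];
    uint8_t ScalingListDC32x32[2];
};

/* iq == NULL models iqmatrix_set == false: fall back to the defaults table. */
static void scaling_matrix_fill(struct hevc_scaling_matrix_mirror *dst,
                                const struct va_iq_sketch *iq,
                                const struct hevc_scaling_matrix_mirror *defaults)
{
    if (iq == NULL) {
        *dst = *defaults;
        return;
    }
    /* Assumption: same element order on both sides; reorder here if not. */
    memcpy(dst->scaling_list_4x4, iq->ScalingList4x4, sizeof(dst->scaling_list_4x4));
    memcpy(dst->scaling_list_8x8, iq->ScalingList8x8, sizeof(dst->scaling_list_8x8));
    memcpy(dst->scaling_list_16x16, iq->ScalingList16x16, sizeof(dst->scaling_list_16x16));
    memcpy(dst->scaling_list_32x32, iq->ScalingList32x32, sizeof(dst->scaling_list_32x32));
    memcpy(dst->scaling_list_dc_coef_16x16, iq->ScalingListDC16x16,
           sizeof(dst->scaling_list_dc_coef_16x16));
    memcpy(dst->scaling_list_dc_coef_32x32, iq->ScalingListDC32x32,
           sizeof(dst->scaling_list_dc_coef_32x32));
}
```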
Clause 6 — v4l2_ctrl_hevc_decode_params field layout (328 bytes)
Authority: <linux/v4l2-controls.h> struct v4l2_ctrl_hevc_decode_params. NEW in modern API (didn't exist in staging-era). Contains:
- pic_order_cnt_val (s32): current picture POC.
- short_term_ref_pic_set_size, long_term_ref_pic_set_size: RPS sizes.
- num_active_dpb_entries: count of valid DPB entries.
- num_poc_st_curr_before/after, num_poc_lt_curr: short-term + long-term ref counts.
- poc_st_curr_before[16], poc_st_curr_after[16], poc_lt_curr[16]: POC arrays for ref pic ordering (each sized to the 16-entry DPB).
- dpb[16]: DPB entries, {timestamp, flags, field_pic, pic_order_cnt_val, _padding} per entry.
- flags (u64): IRAP_PIC, IDR_PIC, NO_OUTPUT_OF_PRIOR_PICS, etc.
Total 328 bytes (verified against Phase 3 Baseline B verbatim payload size).
VAAPI source mapping: largely preserved from current src/h265.c::h265_fill_slice_params lines 269-315 (DPB iteration over picture->ReferenceFrames[15]), just routed to a new struct. The existing logic for dpb[i].timestamp, dpb[i].rps, dpb[i].pic_order_cnt[0], field_pic migrates verbatim to decode_params.dpb[i].timestamp etc. The DPB-counting logic (num_rps_poc_st_curr_before/after, num_rps_poc_lt_curr) migrates to the num_poc_* fields of decode_params.
Submission: per-frame, after SPS + PPS in the batch.
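The 328-byte total and the DPB routing can be checked with a local mirror. The field order below is an assumption based on the mainline header (including the num_delta_pocs_of_ref_rps_idx byte and 3 reserved bytes before the DPB array), as is the long-term flag bit value; struct ref_sketch and decode_params_fill are hypothetical names, and the classification of each reference is taken as given rather than derived from VAAPI flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DPB_MAX 16
#define DPB_ENTRY_LONG_TERM 0x01 /* assumed value of the long-term flag bit */

/* Assumed mirror of struct v4l2_hevc_dpb_entry (16 bytes). */
struct hevc_dpb_entry_mirror {
    uint64_t timestamp;
    uint8_t  flags;
    uint8_t  field_pic;
    uint16_t reserved;
    int32_t  pic_order_cnt_val;
};

/* Assumed mirror of struct v4l2_ctrl_hevc_decode_params. */
struct hevc_decode_params_mirror {
    int32_t  pic_order_cnt_val;
    uint16_t short_term_ref_pic_set_size;
    uint16_t long_term_ref_pic_set_size;
    uint8_t  num_active_dpb_entries;
    uint8_t  num_poc_st_curr_before;
    uint8_t  num_poc_st_curr_after;
    uint8_t  num_poc_lt_curr;
    uint8_t  poc_st_curr_before[DPB_MAX];
    uint8_t  poc_st_curr_after[DPB_MAX];
    uint8_t  poc_lt_curr[DPB_MAX];
    uint8_t  num_delta_pocs_of_ref_rps_idx;
    uint8_t  reserved[3];
    struct hevc_dpb_entry_mirror dpb[DPB_MAX];
    uint64_t flags;
};

_Static_assert(sizeof(struct hevc_dpb_entry_mirror) == 16, "DPB entry must be 16 bytes");
_Static_assert(sizeof(struct hevc_decode_params_mirror) == 328, "must match Baseline B size");

enum ref_kind { REF_ST_BEFORE, REF_ST_AFTER, REF_LT };

/* Hypothetical, already-classified reference picture. */
struct ref_sketch {
    uint64_t timestamp;
    int32_t poc;
    enum ref_kind kind;
};

/* Fills the DPB and records each entry's DPB index in the matching list. */
static void decode_params_fill(struct hevc_decode_params_mirror *dp, int32_t cur_poc,
                               const struct ref_sketch *refs, unsigned int num_refs)
{
    memset(dp, 0, sizeof(*dp));
    dp->pic_order_cnt_val = cur_poc;
    for (unsigned int i = 0; i < num_refs && i < DPB_MAX; i++) {
        uint8_t idx = dp->num_active_dpb_entries++;

        dp->dpb[idx].timestamp = refs[i].timestamp;
        dp->dpb[idx].pic_order_cnt_val = refs[i].poc;
        switch (refs[i].kind) {
        case REF_ST_BEFORE:
            dp->poc_st_curr_before[dp->num_poc_st_curr_before++] = idx;
            break;
        case REF_ST_AFTER:
            dp->poc_st_curr_after[dp->num_poc_st_curr_after++] = idx;
            break;
        case REF_LT:
            dp->dpb[idx].flags |= DPB_ENTRY_LONG_TERM;
            dp->poc_lt_curr[dp->num_poc_lt_curr++] = idx;
            break;
        }
    }
}
```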
Clause 7 — Device-wide DECODE_MODE + START_CODE menu controls
Authority: <linux/v4l2-controls.h> defines:
#define V4L2_CID_STATELESS_HEVC_DECODE_MODE (V4L2_CID_CODEC_STATELESS_BASE+405)
#define V4L2_CID_STATELESS_HEVC_START_CODE (V4L2_CID_CODEC_STATELESS_BASE+406)
enum v4l2_stateless_hevc_decode_mode {
V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED,
V4L2_STATELESS_HEVC_DECODE_MODE_FRAME_BASED,
};
enum v4l2_stateless_hevc_start_code {
V4L2_STATELESS_HEVC_START_CODE_NONE,
V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
};
Phase 0 inventory confirms fresnel rkvdec advertises:
hevc_decode_mode 0x00a40a95 (menu): min=1 max=1 default=1 (Frame-Based) flags=has-min-max
hevc_start_code 0x00a40a96 (menu): min=1 max=1 default=1 (Annex B Start Code) flags=has-min-max
So rkvdec accepts ONLY FRAME_BASED decode mode and ANNEX_B start code — same constraints as H.264 + MPEG-2. Set both at decoder init via v4l2_set_controls(driver_data->video_fd, /* request_fd= */ -1, dev_ctrls, 2) with values FRAME_BASED + ANNEX_B.
Where to set: extend src/context.c:142-155's existing H.264 device-init block to also set HEVC's two device controls when context is HEVC-profile-bound. Current pattern: 2 ext_controls in one batched call with request_fd=-1. iter2 adds 2 more controls (or a separate call) for the HEVC variants.
Alternative: set them inside h265_set_controls once per context (with a "first call" guard). Cleaner location-wise but requires per-context state. Phase 6 implementer chooses.
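The first-call guard variant can be sketched as below. struct dev_ctrl_sketch is a hypothetical stand-in for a value-type struct v4l2_ext_control, struct hevc_ctx_state_sketch is the assumed per-context state, and hevc_device_ctrls_once is an invented name. The menu values are the second enumerator of each enum shown above (index 1), which matches the min=1 max=1 range in the Phase 0 inventory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CID_HEVC_DECODE_MODE 0xa40a95u
#define CID_HEVC_START_CODE  0xa40a96u

/* Menu indices: second enumerator of each enum in this clause. */
#define HEVC_DECODE_MODE_FRAME_BASED 1
#define HEVC_START_CODE_ANNEX_B      1

/* Hypothetical stand-in for a value-type struct v4l2_ext_control. */
struct dev_ctrl_sketch {
    uint32_t id;
    int32_t value;
};

/* Assumed per-context state holding the first-call guard. */
struct hevc_ctx_state_sketch {
    bool device_ctrls_set;
};

/*
 * Returns the number of controls to submit with request_fd = -1:
 * 2 on the first call for a context, 0 on every later call.
 */
static unsigned int hevc_device_ctrls_once(struct hevc_ctx_state_sketch *st,
                                           struct dev_ctrl_sketch out[2])
{
    assert(st != NULL && out != NULL);
    if (st->device_ctrls_set)
        return 0;
    out[0] = (struct dev_ctrl_sketch){ CID_HEVC_DECODE_MODE, HEVC_DECODE_MODE_FRAME_BASED };
    out[1] = (struct dev_ctrl_sketch){ CID_HEVC_START_CODE, HEVC_START_CODE_ANNEX_B };
    st->device_ctrls_set = true;
    return 2;
}
```

One open design point with this variant is that the guard flips before the ioctl result is known; a real implementation may prefer to set it only after the submission succeeds.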
Clause 8 — RequestCreateConfig HEVCMain case must break;
Authority: C language semantics. src/config.c:67 case VAProfileHEVCMain: falls through to default: (line 68) which returns the error. iter1 added break; for MPEG-2 cases; HEVCMain is the last case in the same fall-through bucket.
Empirical anchor: Phase 3 Baseline D verified the patch shape in scratch — adding break; for HEVCMain lets vaCreateConfig return VA_STATUS_SUCCESS without affecting iter1 MPEG-2 or T4 H.264 hashes.
Fix shape: 5 lines (case label preserved; comment + break added; matches iter1 Commit A pattern verbatim).
Clause 9 — picture.c::codec_set_controls HEVCMain dispatch
Authority: existing src/picture.c:186-191 MPEG-2 dispatch pattern from iter1:
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
rc = mpeg2_set_controls(driver_data, context, surface_object);
if (rc < 0) return VA_STATUS_ERROR_OPERATION_FAILED;
break;
iter2 replaces the explicit case VAProfileHEVCMain: return VA_STATUS_ERROR_UNSUPPORTED_PROFILE; (lines 204-206) with the same shape, dispatching to h265_set_controls. Comment updated to remove the stale Fourier-local: HEVC stripped, no HW support on RK3566. reference.
Clause 10 — Per-slice accumulation in codec_store_buffer
Authority: HEVC kernel API requires per-slice slice_params (Clause 4). VAAPI clients submit VASliceParameterBufferType once per slice via vaRenderPicture. The current src/picture.c:115-135 for HEVC VASliceParameterBufferType does:
case VAProfileHEVCMain:
memcpy(&surface_object->params.h265.slice, buffer_object->data, sizeof(...));
break;
Behavior change: replace single-slot copy with array-append:
case VAProfileHEVCMain:
if (surface_object->params.h265.num_slices < HEVC_MAX_SLICES_PER_FRAME) {
memcpy(&surface_object->params.h265.slices[surface_object->params.h265.num_slices],
buffer_object->data,
sizeof(VASliceParameterBufferHEVC));
surface_object->params.h265.num_slices++;
} else {
/* exceeded array bound — log and drop; Phase 7 verification flags */
}
break;
HEVC_MAX_SLICES_PER_FRAME = e.g. 64 (kernel max is 600; conservative). For the BBB fixture this maxes at 1 per frame; the bound is for safety.
At BeginPicture: reset num_slices = 0 per-frame. Currently picture.c:287 only resets params.h264.matrix_set = false; iter2 adds params.h265.num_slices = 0 reset for HEVC surfaces. (Or per-profile: switch on config_object->profile and reset accordingly. iter2 adds params.h265.num_slices = 0 unconditionally for now — benign for non-HEVC since the union aliasing puts num_slices in a region overwritten by RenderPicture's per-buffer copies.)
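The append-and-reset behavior of this clause, including the drop path, can be exercised in isolation. The types below are hypothetical stand-ins (struct slice_sketch replaces VASliceParameterBufferHEVC with an opaque 264-byte blob, the approximate size quoted in the diff scope), and the dropped_slices counter and the buffer-size check are additions for illustration that the plan itself does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEVC_MAX_SLICES_PER_FRAME 64

/* Hypothetical stand-in for VASliceParameterBufferHEVC (opaque bytes). */
struct slice_sketch {
    unsigned char bytes[264];
};

/* Hypothetical subset of params.h265 relevant to accumulation. */
struct h265_params_sketch {
    struct slice_sketch slices[HEVC_MAX_SLICES_PER_FRAME];
    unsigned int num_slices;
    unsigned int dropped_slices; /* illustration only: counts overflow drops */
};

/* BeginPicture: start each frame with an empty slice array. */
static void h265_begin_picture(struct h265_params_sketch *p)
{
    p->num_slices = 0;
    p->dropped_slices = 0;
}

/* RenderPicture: append one slice; returns 0 on success, -1 if rejected. */
static int h265_store_slice(struct h265_params_sketch *p,
                            const void *data, size_t data_size)
{
    if (data == NULL || data_size < sizeof(struct slice_sketch))
        return -1; /* buffer too small to hold one slice struct */
    if (p->num_slices >= HEVC_MAX_SLICES_PER_FRAME) {
        p->dropped_slices++; /* bound exceeded: record and drop */
        return -1;
    }
    memcpy(&p->slices[p->num_slices], data, sizeof(struct slice_sketch));
    p->num_slices++;
    return 0;
}
```

Keeping a drop counter gives Phase 7 something concrete to assert on, instead of relying on a log line.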
Diff scope
File 1: src/config.c — add break; for HEVCMain case (5 lines)
@@ -68,6 +68,11 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
// submission time.
break;
case VAProfileHEVCMain:
+ // fresnel-fourier iter2: HEVC enabled. Same shape as H.264/
+ // MPEG-2 above — no profile-specific config validation in the
+ // libva backend; validation happens at vaCreateContext /
+ // control submission time.
+ break;
default:
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
File 2: src/picture.c — replace HEVCMain reject with dispatch + per-slice slice_params accumulation (~25 lines)
Three distinct changes:
(a) Dispatch HEVCMain in codec_set_controls (lines 204-206):
- case VAProfileHEVCMain:
- /* Fourier-local: HEVC stripped, no HW support on RK3566. */
- return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+ case VAProfileHEVCMain:
+ rc = h265_set_controls(driver_data, context, surface_object);
+ if (rc < 0)
+ return VA_STATUS_ERROR_OPERATION_FAILED;
+ break;
(b) Per-slice accumulation in codec_store_buffer (HEVC VASliceParameterBufferType case, lines 127-131):
- case VAProfileHEVCMain:
- memcpy(&surface_object->params.h265.slice,
- buffer_object->data,
- sizeof(surface_object->params.h265.slice));
- break;
+ case VAProfileHEVCMain: {
+ unsigned int n = surface_object->params.h265.num_slices;
+ if (n < HEVC_MAX_SLICES_PER_FRAME) {
+ memcpy(&surface_object->params.h265.slices[n],
+ buffer_object->data,
+ sizeof(VASliceParameterBufferHEVC));
+ surface_object->params.h265.num_slices = n + 1;
+ }
+ /* note: also keep .slice (singular) populated as last-slice
+ * mirror for h265_fill_pps which reads dependent_slice_segment_flag
+ * from VASliceParameterBufferHEVC->LongSliceFlags */
+ memcpy(&surface_object->params.h265.slice,
+ buffer_object->data,
+ sizeof(surface_object->params.h265.slice));
+ break;
+ }
(c) Reset num_slices in RequestBeginPicture at line 287:
surface_object->params.h264.matrix_set = false;
+ surface_object->params.h265.num_slices = 0;
File 3: src/surface.h — extend params.h265 to hold slice_params array
Add inside the union { ... } params block:
struct {
VAPictureParameterBufferHEVC picture;
VASliceParameterBufferHEVC slice;
+ VASliceParameterBufferHEVC slices[HEVC_MAX_SLICES_PER_FRAME];
+ unsigned int num_slices;
VAIQMatrixBufferHEVC iqmatrix;
bool iqmatrix_set;
} h265;
HEVC_MAX_SLICES_PER_FRAME = 64 defined in surface.h (or h265.h). Total memory cost: sizeof(VASliceParameterBufferHEVC) ≈ 264 bytes × 64 = ~17 KB extra per surface union — significant but acceptable.
Alternative (smaller memory): heap-allocate the slices array dynamically (malloc on first slice arrival, realloc on grow, free at surface destroy). More plumbing; defer to a Phase 4 plan revision if Phase 7 surfaces memory concerns. iter2 default: fixed-size inline array of 64 in the surface struct.
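For reference, the heap-allocated alternative could look roughly like the sketch below. It is not part of the iter2 default; struct slice_vec and its three helpers are hypothetical names, and the doubling growth policy with an initial capacity of 4 is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of fixed-size slice records. */
struct slice_vec {
    unsigned char *data; /* NULL until the first push */
    size_t elem_size;    /* e.g. sizeof(VASliceParameterBufferHEVC) */
    size_t count;        /* slices stored for the current frame */
    size_t cap;          /* allocated capacity in elements */
};

/* Appends one element, growing by doubling; returns 0 or -1 on OOM. */
static int slice_vec_push(struct slice_vec *v, const void *elem)
{
    if (v->count == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        void *p = realloc(v->data, new_cap * v->elem_size);

        if (p == NULL)
            return -1; /* old buffer is still valid and owned by v */
        v->data = p;
        v->cap = new_cap;
    }
    memcpy(v->data + v->count * v->elem_size, elem, v->elem_size);
    v->count++;
    return 0;
}

/* BeginPicture: keep the allocation, forget the previous frame's slices. */
static void slice_vec_reset(struct slice_vec *v)
{
    v->count = 0;
}

/* Surface destroy: release the buffer. */
static void slice_vec_free(struct slice_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->count = 0;
    v->cap = 0;
}
```

Because the buffer is reused across frames, a single-slice stream such as the BBB fixture would settle at the initial capacity instead of reserving 64 slots per surface.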
File 4: src/h265.c — full rewrite against new split API (~400 lines)
Per Clauses 2-7. The bulk of iter2 work. Structure mirrors current h265.c but routes to new struct layouts:
- h265_fill_sps() → fill struct v4l2_ctrl_hevc_sps (40 bytes, flags collapsed). ~40 lines.
- h265_fill_pps() → fill struct v4l2_ctrl_hevc_pps (64 bytes, flags collapsed). ~50 lines.
- h265_fill_slice_params() → fill ONE struct v4l2_ctrl_hevc_slice_params (per-slice; called from a loop in h265_set_controls over surface->params.h265.slices[]). ~80 lines (preserves NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight).
- NEW h265_fill_decode_params() → fill struct v4l2_ctrl_hevc_decode_params (328 bytes: DPB array, POC, num_active_dpb_entries, etc.). ~60 lines.
- NEW h265_fill_scaling_matrix() → fill struct v4l2_ctrl_hevc_scaling_matrix from VAIQMatrixBufferHEVC (or spec defaults if iqmatrix_set==false). ~30 lines.
- NEW h265_init_device_controls() → set DECODE_MODE + START_CODE menus once per context. ~15 lines. Called from h265_set_controls with first-call guard, OR from context.c device-init block.
- h265_set_controls() → orchestrator: build SPS, PPS, all slice_params (loop over array), DECODE_PARAMS, SCALING_MATRIX (conditional on init-time probe); submit batched. ~50 lines.
Plus the static const default scaling matrices (luma + chroma intra/inter, 4 × 64 bytes per scan-size with extra DC values) for the iqmatrix_set==false branch. Per Phase 5 Lesson L2 (feedback_review_empirical_over_theoretical.md): transcribe from Phase 3 Baseline B SCALING_MATRIX verbatim payload, NOT from spec recall. Phase 6 protocol: capture the BBB SCALING_MATRIX bytes via verbose strace, decode into the four 64-byte arrays, transcribe with byte-equality assertion.
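The byte-equality assertion in the transcription protocol needs little more than a comparison that reports where two payloads diverge. A minimal sketch, with first_mismatch as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the offset of the first differing byte between a and b,
 * or -1 when the first len bytes are identical.
 */
static ptrdiff_t first_mismatch(const unsigned char *a,
                                const unsigned char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Reporting the offset rather than a plain equal/not-equal result makes it possible to map a mismatch back to a specific list inside the scaling matrix payload, and the same helper would serve the Phase 7 byte-compare of the other controls.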
File 5: src/h265.h — re-enable
Currently meson.build:73 has # 'h265.h' commented. Uncomment.
h265.h exposes only int h265_set_controls(...) declaration; the new helpers (h265_fill_decode_params, h265_fill_scaling_matrix, h265_init_device_controls) stay file-static.
File 6: src/meson.build — uncomment h265.c + h265.h
@@ -47,7 +47,7 @@ sources = [
'request_pool.c',
'cap_pool.c',
-# 'h265.c'
+ 'h265.c'
]
@@ -70,7 +70,7 @@ headers = [
'cap_pool.h',
-# 'h265.h'
+ 'h265.h'
]
File 7: src/context.c — extend device-init for HEVC (optional)
Decision (deferred to the Phase 6 implementer): either extend the device-init block at src/context.c:142-155 to also set the HEVC DECODE_MODE + START_CODE controls, or set them inside h265_set_controls on first call. Extending context.c would fire EINVAL on hantro-vpu-dec, the same as the existing H.264 controls; that is auxiliary noise, intentionally swallowed by (void)v4l2_set_controls.
Lower-risk path: extend context.c's existing block (mirrors the existing pattern, minimal new code). This adds cosmetic EINVAL noise on non-HEVC devices but matches existing behavior. Phase 6 default: extend context.c.
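For comparison, the alternative (a first-call guard inside h265_set_controls) could look roughly like the sketch below. All names here are hypothetical: set_menu_fn stands in for whatever wrapper the driver uses around v4l2_set_controls, and the control IDs and menu values are passed in rather than hard-coded because this plan does not pin them down. The return value of the setter is deliberately discarded, mirroring the existing (void)v4l2_set_controls pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical setter signature; the real code would go through the
 * driver's own v4l2_set_controls wrapper. */
typedef int (*set_menu_fn)(void *dev, unsigned int id, int value);

/* One instance per context; zero-initialised means "not attempted yet". */
struct hevc_init_state {
    bool attempted;
};

/* Sets DECODE_MODE and START_CODE at most once per context.
 * Returns true if this call performed the attempt, false if it was
 * skipped because an earlier call already did. Setter errors such as
 * EINVAL are swallowed on purpose. */
bool hevc_init_once(struct hevc_init_state *st, set_menu_fn set, void *dev,
                    unsigned int decode_mode_id, int decode_mode,
                    unsigned int start_code_id, int start_code)
{
    assert(st != NULL && set != NULL);
    if (st->attempted)
        return false;
    st->attempted = true;
    (void)set(dev, decode_mode_id, decode_mode);
    (void)set(dev, start_code_id, start_code);
    return true;
}

/* Stub setter for dry runs: counts calls through the dev pointer and
 * always fails with EINVAL, as a device without these controls would. */
int stub_set_einval(void *dev, unsigned int id, int value)
{
    int *calls = dev;

    (void)id;
    (void)value;
    (*calls)++;
    return -EINVAL;
}
```

The guard marks the attempt before calling the setter, so a failing device is not retried on every frame.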
File 8: include/hevc-ctrls.h — leave as-is
The 9-line shim is harmless (per Phase 2 Bug 7 verify-only). NOT deleted in iter2 (lower-risk path; iter1 Phase 5 Nit 6 deferral continues).
Phase 6 implementation order
Phase 6 lands in 2 logical commits + optional fix-forward:
- Commit A (src/config.c HEVCMain break): 5-line diff. Verifies the substrate fix in isolation (Phase 3 Baseline D already proved it). Phase 7 partial verification: criteria 1 + 2 should pass (vainfo enum unchanged, vaCreateConfig SUCCESS); criteria 3-5 still fail because the picture.c reject is in place.
- Commit B (h265.c rewrite + picture.c HEVCMain dispatch + slice_params accumulation + meson re-enable + surface.h extension + context.c device-init extension): the bulk of iter2 work. Phase 7 verification: all 5 criteria green.
- Commit C (optional): fix-forward if Phase 7 surfaces a regression. Per memory/feedback_header_deletion_check.md, iter2 doesn't delete hevc-ctrls.h, so the iter1 Commit-D-style header-completeness oversight doesn't apply. Other fix-forward triggers are Phase 7 → Phase 4 loopback signals; pre-identified below.
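The slice_params accumulation in Commit B (append each slice instead of overwriting, then submit size = sizeof(slice_params) * num_slices_in_frame) can be illustrated with a small sketch. The struct and function names are hypothetical, and struct slice_params_stub is only a placeholder for struct v4l2_ctrl_hevc_slice_params; the 600-element cap is taken from the dims=[600] the kernel advertises for 0xa40a92.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound from the kernel's advertised dims=[600] for 0xa40a92. */
#define HEVC_MAX_SLICES 600

/* Placeholder element type; the real code would use
 * struct v4l2_ctrl_hevc_slice_params from the UAPI header. */
struct slice_params_stub {
    uint32_t bit_size;
    uint32_t data_bit_offset;
    uint8_t  rest[56];
};

/* Per-frame accumulator: one element per slice seen since the last reset. */
struct slice_accum {
    struct slice_params_stub slices[HEVC_MAX_SLICES];
    size_t count;
};

/* Call at the start of each frame. */
void slice_accum_reset(struct slice_accum *a)
{
    assert(a != NULL);
    a->count = 0;
}

/* Appends one slice; returns 0 on success, -1 if the array is full. */
int slice_accum_append(struct slice_accum *a,
                       const struct slice_params_stub *sp)
{
    assert(a != NULL && sp != NULL);
    if (a->count >= HEVC_MAX_SLICES)
        return -1;
    a->slices[a->count++] = *sp;
    return 0;
}

/* Byte size to submit for the dynamic-array control. */
size_t slice_accum_bytes(const struct slice_accum *a)
{
    assert(a != NULL);
    return sizeof(a->slices[0]) * a->count;
}
```

With this shape, a two-slice frame submits exactly two elements' worth of bytes, which is the value to compare against the size=N recorded in Baseline B for a frame with the same slice count.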
Implementation strategy for Commit B: develop incrementally inside h265.c with printf instrumentation showing each per-frame fill (SPS struct hex dump, PPS, decode_params, slice_params count, scaling_matrix presence). After build passes and mpv-vaapi runs without crash, decode 2 frames and compare HW vs SW JPEG hashes. Iterate until match. Strip instrumentation at close (per phase8_iteration1_close.md iter1 sweep precedent).
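The hex-dump instrumentation mentioned above needs nothing more than a helper along these lines. hex_dump() is a hypothetical name, not an existing function in the fork; the sketch writes into a caller-supplied buffer so the same helper can feed printf during development and be removed cleanly at close.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Encodes up to n bytes of p as lowercase hex into out, which is always
 * NUL-terminated when out_size > 0. Stops early if out is too small.
 * Returns the number of input bytes actually encoded. */
size_t hex_dump(const void *p, size_t n, char *out, size_t out_size)
{
    const unsigned char *b = p;
    size_t i = 0;

    assert(p != NULL && out != NULL);
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    /* Each byte needs two hex digits plus room for the terminator. */
    for (; i < n && 2 * i + 3 <= out_size; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned int)b[i]);
    return i;
}
```

A 40-byte SPS needs an 81-byte output buffer; dumping each control once per frame keeps the log directly comparable with the strace payloads.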
Phase 7 verification harness
Re-uses iter1's 5-criterion shape with HEVC fixture substituted. All 5 run in one pass; raw output captured to phase0_evidence/2026-05-08-or-later/iter2_phase7/.
# Re-build + install
ssh fresnel '
cd ~/src/libva-v4l2-request-fourier
git pull --ff-only
ninja -C build && sudo ninja -C build install
sha256sum /usr/lib/dri/v4l2_request_drv_video.so
'
# Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec bind
ssh fresnel '
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
vainfo --display drm --device /dev/dri/renderD128 2>&1 | \
grep -E "VAProfileHEVCMain"
'
# Criteria 2 + 3: vaCreateConfig + ffmpeg-direct decode
ssh fresnel '
mkdir -p /tmp/iter2_phase7
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
LIBVA_TRACE=/tmp/iter2_phase7/libva.trace \
ffmpeg -hide_banner -loglevel info -hwaccel vaapi \
-i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 5 -f null -
'
# Expected: exit 0, no Failed-to-create-decode-config, libva trace
# shows vaCreateConfig SUCCESS, no EINVAL on S_EXT_CTRLS.
# Criterion 4: DMA-BUF GL HW vs SW byte-identical at +02s
ssh fresnel '
mkdir -p /tmp/iter2_phase7/png_hw /tmp/iter2_phase7/png_sw
WAYLAND_DISPLAY=wayland-0 XDG_RUNTIME_DIR=/run/user/1000 \
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/png_hw \
~/fourier-test/bbb_720p10s_hevc.mp4
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/png_sw \
~/fourier-test/bbb_720p10s_hevc.mp4
sha256sum /tmp/iter2_phase7/png_hw/*.jpg /tmp/iter2_phase7/png_sw/*.jpg
'
# Expected: HW frame 1 hash == SW frame 1 hash; HW frame 2 hash ==
# SW frame 2 hash; frame 1 hash != frame 2 hash (real motion).
# Per memory feedback_rockchip_pixel_verify_path.md — DMA-BUF GL is
# the cache-coherency-safe verifier; do NOT use ffmpeg-vaapi+hwdownload
# (cache-stale class on RK3399 for both H.264 + MPEG-2; HEVC expected same).
# Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
ssh fresnel '
# H.264 (T4 reference)
mkdir -p /tmp/iter2_phase7/h264_hw /tmp/iter2_phase7/h264_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:30 \
--vo-image-outdir=/tmp/iter2_phase7/h264_hw \
~/fourier-test/bbb_1080p30_h264.mp4
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:30 \
--vo-image-outdir=/tmp/iter2_phase7/h264_sw \
~/fourier-test/bbb_1080p30_h264.mp4
# MPEG-2 (iter1 reference)
mkdir -p /tmp/iter2_phase7/mpeg2_hw /tmp/iter2_phase7/mpeg2_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/mpeg2_hw \
~/fourier-test/bbb_720p10s_mpeg2.ts
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/mpeg2_sw \
~/fourier-test/bbb_720p10s_mpeg2.ts
sha256sum /tmp/iter2_phase7/h264_hw/*.jpg /tmp/iter2_phase7/h264_sw/*.jpg \
/tmp/iter2_phase7/mpeg2_hw/*.jpg /tmp/iter2_phase7/mpeg2_sw/*.jpg
'
# Expected:
# H.264 frames at +30s: f623d5f7... (frame 1) and 7d7bc6f2... (frame 2)
# MPEG-2 frames at +02s: 6e7873030dbf... (frame 1) and ccc7ce08810d... (frame 2)
# Bonus byte-compare: post-fix S_EXT_CTRLS payload vs Baseline B verbatim
ssh fresnel '
mkdir -p /tmp/iter2_phase7/cross
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
strace -ff -tt -y -v -s 8192 -e trace=ioctl \
-o /tmp/iter2_phase7/cross/ffmpeg.strace \
ffmpeg -hide_banner -loglevel error -hwaccel vaapi \
-i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 2 -f null -
grep "VIDIOC_S_EXT_CTRLS.*ctrl_class=0xf010000.*count=5" \
/tmp/iter2_phase7/cross/ffmpeg.strace.* | head -2
'
# Expected per Baseline B: per frame, count=5 with ids 0xa40a90/91/92/93/94
# in order; SPS bytes for first 40 should match Baseline B's BBB-SPS verbatim
# (1280x720, 8-bit, 4:2:0, flags=SAO|STRONG_INTRA_SMOOTHING).
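Once the SPS bytes have been extracted from the strace output, the comparison against Baseline B reduces to a first-mismatch search. The helper below is a hypothetical sketch (first_mismatch() is not part of the harness); reporting the offset rather than a plain pass/fail makes it easier to map a difference back to a struct field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compares the first n bytes of got and want.
 * Returns the index of the first differing byte, or -1 if all n match. */
long first_mismatch(const uint8_t *got, const uint8_t *want, size_t n)
{
    assert(got != NULL && want != NULL);
    for (size_t i = 0; i < n; i++) {
        if (got[i] != want[i])
            return (long)i;
    }
    return -1;
}
```

For the SPS check described above, n would be 40; the same helper applies to the transcribed default scaling matrices.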
Pass/fail decision
All 5 criteria PASS → Phase 7 closes green; proceed to Phase 8 (memory update + close iter2).
Any criterion FAIL → Phase 7 → Phase 4 loopback per feedback_dev_process.md. Pre-identified loopback triggers:
- VIDIOC_S_EXT_CTRLS returns EINVAL post-fix on per-frame batch. Likely causes:
  - Struct size mismatch between iter2's stack-allocated structs and kernel-expected sizes. Mitigation: pahole against kernel UAPI; compare to Phase 3 Baseline B verbatim sizes (40 + 64 + 328 = 432 bytes for the fixed-size controls).
  - SCALING_MATRIX size encoding wrong (depends on whether kernel expects fixed or runtime-discovered size).
  - Reserved fields not zeroed (memset was forgotten on a struct).
- HW pixel hashes differ from SW. Likely causes:
  - DPB ordering wrong (FFmpeg populates poc_st_curr_before/after in specific order; iter2's translation from VAAPI ReferenceFrames must match).
  - Slice_params bit_size or data_bit_offset off-by-N from NAL header byte alignment quirks (preserved logic from old h265.c, but the dynamic-array shape might affect slice boundaries).
  - SPS/PPS flags bitmask wrong bit position (e.g., _SAMPLE_ADAPTIVE_OFFSET is bit 3, not bit 4; an easy off-by-1).
  - SCALING_MATRIX values wrong (transcribed from spec rather than from Baseline B verbatim; per Lesson L2, this is the common trap).
- mpv --hwdec=vaapi filters HEVC out (analogous to vaapi-copy filtering MPEG-2). Mitigation: per Phase 5 Q4 amendment in iter1, fall-forward to ffmpeg -vf hwdownload path. Less likely than for MPEG-2 because mpv-vaapi DID engage MPEG-2 in iter1.
- iter1 MPEG-2 OR T4 H.264 regression. Bug 1 + picture.c HEVCMain dispatch must not touch MPEG-2 / H.264 paths. Mitigation: verify Phase 3 Baseline D-style scratch was scoped right; re-read the diffs against the dispatch tables.
- Slice_params dynamic-array submission shape rejected by kernel. Possible if kernel expects count as element count rather than size as bytes (the kernel UAPI might want a different size encoding). Mitigation: cross-validator anchor in Phase 3 Baseline B has the verbatim size=N value for one frame's batch; iter2's submission must produce a matching size for matching slice count. If dynamic-array semantics are confusing, FFmpeg v4l2_request_hevc.c:540-547 has the canonical pattern.
- SCALING_MATRIX availability detection wrong. iter2 assumes kernel always advertises (matches Baseline B). If on a different host (e.g., ohm) kernel doesn't advertise, the unconditional submission would fail. Mitigation: probe via VIDIOC_QUERY_EXT_CTRL at h265_init_device_controls; gate inclusion in batch on probe result. Defer this defensive path to Phase 6 if Phase 3 Baseline B is anchor enough.
- Latent bug B3 (h264.matrix_set=false writes inside h265.picture): for HEVC surfaces, byte 240 of the params union lands inside h265.picture (Phase 2 Bug 8 verified). RenderPicture's VAPictureParameterBufferType per-frame copy overwrites it. Iter1 Bug 8 documentation explains the masking; iter2 inherits the same masking via ffmpeg-vaapi sender pattern (always sends VAPictureParameterBufferType per frame). If a VAAPI client surfaces without per-frame picture params, iter2 won't catch it; same latent as iter1.
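Two of the triggers above (a flag placed at the wrong bit position, and reserved fields left unzeroed) can be guarded against with the same fill pattern: zero the whole struct first, then OR in named flag constants. The sketch below is illustrative only. struct sps_stub is a hypothetical placeholder for struct v4l2_ctrl_hevc_sps, bit 3 for SAMPLE_ADAPTIVE_OFFSET is the position stated in this plan, and bit 8 for STRONG_INTRA_SMOOTHING_ENABLED is an assumption to confirm against linux/v4l2-controls.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit 3 is the position given in this plan. */
#define SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET         (1ULL << 3)
/* Assumption: bit 8; verify against the UAPI header before relying on it. */
#define SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)

/* Hypothetical placeholder, NOT the real v4l2_ctrl_hevc_sps layout. */
struct sps_stub {
    uint16_t pic_width_in_luma_samples;
    uint16_t pic_height_in_luma_samples;
    uint8_t  reserved[6];
    uint64_t flags;
};

/* Zeroes the whole struct (including reserved bytes and padding) before
 * setting any field, then builds the flags bitmask from named constants. */
void sps_stub_fill(struct sps_stub *sps, uint16_t width, uint16_t height,
                   bool sao, bool strong_intra_smoothing)
{
    assert(sps != NULL);
    memset(sps, 0, sizeof(*sps));
    sps->pic_width_in_luma_samples = width;
    sps->pic_height_in_luma_samples = height;
    if (sao)
        sps->flags |= SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET;
    if (strong_intra_smoothing)
        sps->flags |= SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED;
}
```

Under these assumptions, the BBB combination flags=SAO|STRONG_INTRA_SMOOTHING evaluates to 0x108, which is a quick value to look for in the instrumented hex dump.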
Out of scope (LOCKED for iter2)
- VP9, VP8 work (iter3/iter4).
- HEVC Main 10 (10-bit) profile.
- HEVC Main Still Picture profile.
- HEVC range extensions (SCC, REXT): EXT_SPS_ST_RPS, EXT_SPS_LT_RPS controls.
- HEVC tile / wavefront parallel processing: ENTRY_POINT_OFFSETS control.
- Performance metrics (Phase 1+ separate iteration).
- Long-duration HEVC stress (>10s).
- Slice-mode decoding (SLICE_BASED decode mode): rkvdec only does FRAME_BASED.
- Phase 4 cross-cutting backlog items B1 (V4L2 device-discovery), B3 (BeginPicture profile-aware reset), B4 (context.c log suppression), B5 (vbv_buffer_size negotiation), L3 (vaDeriveImage cache-stale fix).
- chromium-fourier 149 install on fresnel.
- Upstream Linux engagement.
- include/hevc-ctrls.h deletion (carries forward from iter1 Phase 5 Nit 6).
Phase 5 entry point
Phase 5 (second-model review) inputs: this plan + the Phase 3 Baseline B verbatim payloads. Per feedback_dev_process.md:
Goal, situation, measurements, plan get pasted into DokuWiki. Markus reviews and redacts, then initiates the handover to a fresh model instance. Claude does not curate the artifact going to the reviewer — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.
Concretely: artifacts to hand over are the four primary documents in this campaign repo (phase0_findings_iter2.md, phase2_iter2_situation.md, phase3_iter2_baseline.md, phase4_iter2_plan.md) plus the phase0_evidence/2026-05-08/iter2_phase3/ raw output. No summary, no executive overview, no "the gist is" framing — Markus has the raw bundle, the reviewer reads it directly.
Per memory/feedback_review_empirical_over_theoretical.md: when the reviewer flags a numerical mismatch, the right response is "I'll empirically check during Phase 7" — NOT a same-day source-read rebuttal.
Predicted iter2 outcome
The fix is structurally larger than iter1 (10 contract clauses vs 6) but bounded:
- Trivial: Bugs 1, 8, 9 (config break + meson re-enable + dispatch) total ~15 lines.
- Substantial: Bugs 3, 4, 5, 7, 10 (h265.c rewrite + DECODE_PARAMS + SCALING_MATRIX + slice_params dynamic-array + per-slice accumulation in picture.c) — ~400 lines combined.
Expected Phase 7 outcome: criteria 1+2 pass after Commit A. Criteria 3+4+5 pass after Commit B. Likely 1-2 Phase 7 → Phase 4 loopbacks for off-by-one bit positions in flags bitmasks or DPB ordering nuances. Phase 8 close estimated to land 4-6 commits on the fork (vs iter1's 4).
If a major surprise fires (e.g., slice_params dynamic-array submission requires a different ioctl path, or scaling_matrix structure differs significantly between FFmpeg and kernel UAPI), Phase 7 → Phase 4 → Phase 2 loopback to source-read deeper. Substrate is well-understood; major surprises unlikely.