fresnel-fourier/phase4_iter2_plan.md
claude-noether 348736eb63 iter2 Phase 4: plan — 10 contract clauses, ~400-line h265.c rewrite
Phase 4 plan for iter2 HEVC fix. Structured per the
feedback_dev_process.md Phase 6 contract-before-code worked example
(0012-h264-omit-scaling-matrix-frame-based.patch shape): contract
clauses with citations first, then code changes mapping 1:1 to
clauses.

10 contract clauses cited from authoritative sources:

  Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS, count=5
    Authority: linux/v4l2-controls.h:2090-2300 (8 HEVC stateless CIDs)
    Reference impl: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
                    (v4l2_request_hevc_queue_decode)
    Empirical anchor: Phase 3 Baseline B verbatim payload

  Clause 2 — v4l2_ctrl_hevc_sps layout (40 bytes)
    Authority: linux/v4l2-controls.h:2096+ (struct + 9 SPS_FLAG_* bits)
    Field-by-field VAAPI source mapping table; existing
    h265_fill_sps logic preserved, just routed to flags bitmask
    Phase 3 Baseline B BBB SPS bytes: flags=SAO|STRONG_INTRA_SMOOTHING

  Clause 3 — v4l2_ctrl_hevc_pps layout (64 bytes, 19 flags)
    Authority: linux/v4l2-controls.h:2126-2150
    Field source: VAPictureParameterBufferHEVC + slice (for
                  dependent_slice_segment_flag)

  Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array)
    Authority: kernel exposes 0xa40a92 elems=1 dims=[600] dynamic-array
    Submission shape: size = sizeof(slice_params) * num_slices_in_frame
    Reference impl: FFmpeg v4l2_request_hevc.c:540-547
    BEHAVIORAL CHANGE: per-slice accumulation in codec_store_buffer
                      (replace overwrite with append-to-array)
    DPB MOVES OUT of slice_params to DECODE_PARAMS (Clause 6)

  Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional)
    Conditional on kernel availability (probed via VIDIOC_QUERY_EXT_CTRL
    at init), NOT on bitstream flag (Phase 3 baseline corrects Phase 2
    assumption)
    Spec defaults from ISO/IEC 23008-2 Table 4-1 when iqmatrix_set==false
    PROTOCOL: transcribe defaults from Phase 3 Baseline B verbatim
              SCALING_MATRIX bytes, NOT from spec recall (per
              memory feedback_review_empirical_over_theoretical.md)

  Clause 6 — v4l2_ctrl_hevc_decode_params layout (328 bytes)
    NEW in modern API (didn't exist in staging-era)
    Contains: DPB array (16 entries), POC, num_active_dpb_entries,
              num_poc_st_curr_before/after, num_poc_lt_curr,
              poc_st_curr_before[16], etc.
    Source: existing h265_fill_slice_params lines 269-315 logic
            preserved, routed to new struct

  Clause 7 — Device-wide DECODE_MODE + START_CODE menus
    Set once at init via v4l2_set_controls(...request_fd=-1, 2 ctrls)
    rkvdec accepts: FRAME_BASED + ANNEX_B (only options per kernel menu
                    constraints, Phase 0 v4l2_inventory)
    Default location: extend src/context.c:142-155 device-init block

  Clause 8 — config.c HEVCMain case must break;
    Authority: C semantics; iter1 Bug 1 pattern verbatim
    Empirical anchor: Phase 3 Baseline D scratch confirmed

  Clause 9 — picture.c::codec_set_controls HEVCMain dispatch
    Authority: existing MPEG-2 dispatch pattern at picture.c:186-191
    Replace explicit Fourier-local: HEVC stripped reject with
    h265_set_controls call

  Clause 10 — Per-slice accumulation in codec_store_buffer
    HEVC slice_params dynamic-array source = per-RenderPicture appends
    BeginPicture resets num_slices=0; codec_store_buffer appends each
    VASliceParameterBufferType to slices[N] array

Diff scope (8 files):
  src/config.c     — 5-line break addition (Clause 8)
  src/picture.c    — HEVCMain dispatch (Clause 9) + per-slice
                     accumulation (Clause 10) + BeginPicture
                     num_slices reset, ~25 lines
  src/surface.h    — extend params.h265 with slices[64] +
                     num_slices, ~17 KB extra per surface union
  src/h265.c       — full rewrite ~400 lines (Clauses 2-7)
  src/h265.h       — re-enable
  src/meson.build  — uncomment h265.c + h265.h
  src/context.c    — extend device-init for HEVC DECODE_MODE +
                     START_CODE
  include/hevc-ctrls.h — leave as-is (9-line shim, lower-risk path
                          per iter1 Phase 5 Nit 6 deferral)

Phase 6 implementation order (2 logical commits + optional fix-forward):
  A: src/config.c HEVCMain break only (substrate fix in isolation;
     Phase 3 Baseline D already verified collateral safe)
  B: h265.c rewrite + picture.c dispatch + slice_params accumulation +
     meson re-enable + surface.h extension + context.c device-init
  C: optional fix-forward if Phase 7 surfaces a regression

Phase 7 verification harness (full Bash incantations in plan body):
  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec
  Criterion 2: vaCreateConfig(VAProfileHEVCMain) = SUCCESS via libva trace
  Criterion 3: ffmpeg -hwaccel vaapi exit 0, no Failed-to-create
  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s; HW=SW byte-identical
              (DMA-BUF GL cache-coherency-safe path per memory
              feedback_rockchip_pixel_verify_path.md)
  Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
  Bonus: byte-compare post-fix S_EXT_CTRLS payload vs Baseline B

Pre-identified Phase 7 → Phase 4 loopback triggers:
  1. S_EXT_CTRLS EINVAL post-fix → check struct sizes (pahole),
     reserved zeroing, SCALING_MATRIX size encoding
  2. HW pixel hash mismatch → DPB ordering, slice_params bit_offset,
     SPS/PPS flags bit positions, SCALING_MATRIX values
  3. mpv --hwdec=vaapi filters HEVC out → fall-forward to ffmpeg
     -vf hwdownload (less likely; vaapi engaged MPEG-2 in iter1)
  4. iter1/T4 regression → verify diffs scoped right
  5. Slice_params dynamic-array submission shape rejected → cross-
     validator size encoding anchor
  6. SCALING_MATRIX availability detection wrong → defensive
     QUERY_EXT_CTRL probe in h265_init_device_controls
  7. Latent bug B3 hits HEVC differently than MPEG-2 → byte 240 in
     h265.picture; ffmpeg-vaapi sends VAPictureParameterBufferType
     per frame so masking holds

Out-of-scope (LOCKED): VP9/VP8; HEVC Main 10 / Main Still Picture /
range ext / tile-wavefront; perf metrics; long-duration stress;
SLICE_BASED decode mode (rkvdec FRAME_BASED only); Phase 4 cross-
cutting backlog (B1 device-discovery, B3 BeginPicture profile-aware,
B4 context.c log suppression, B5 vbv_buffer_size, L3 vaDeriveImage
cache-stale); chromium-fourier 149 install; upstream engagement;
hevc-ctrls.h deletion (Phase 5 Nit 6 lower-risk path continues).

Predicted Phase 8 close: 4-6 commits on the fork (vs iter1's 4).
Iter2 ~3x larger codebase delta than iter1 (mpeg2.c rewrite was
~120 lines; h265.c rewrite is ~400 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:56:51 +00:00


Iteration 2 — Phase 4 (plan)

Implementation plan for iter2 HEVC Main on rkvdec. Inputs:

Per feedback_dev_process.md Phase 6 contract-before-code: this plan opens with the contract clauses (kernel UAPI + FFmpeg reference + Phase 3 Baseline B verbatim citations), then specifies code changes that map 1:1 to those clauses.

Phase 1 criteria (re-stated; no Phase 3 → Phase 1 loopback this time)

Per phase0_findings_iter2.md, all 5 criteria stand as locked. No Phase 3 surprises required adjustment (criterion 3 was anchored on ffmpeg-direct from the start, mirroring iter1's Phase 5 Q4 amendment).

  1. vainfo enumeration regression: VAProfileHEVCMain continues to be listed on the rkvdec env binding. (Already passes; iter2 must not strip.)
  2. vaCreateConfig success: vaCreateConfig(VAProfileHEVCMain, VAEntrypointVLD) returns VA_STATUS_SUCCESS. (Currently VA_STATUS_ERROR_UNSUPPORTED_PROFILE = 12.)
  3. End-to-end ffmpeg-direct decode: ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 5 -f null - exits 0; libva trace shows vaCreateConfig SUCCESS; no Failed to create decode configuration lines; no EINVAL from VIDIOC_S_EXT_CTRLS.
  4. DMA-BUF GL HW=SW byte-identical at +02s: 2 distinct frames hash-equal across HW (mpv --hwdec=vaapi --vo=image) and SW (--hwdec=no); frames 1 vs 2 hash-differ (real motion).
  5. Regression on iter1 MPEG-2 AND T4 H.264: both prior-iteration cells continue to pass with their reference hashes.

Contract clauses (cite-before-code)

Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS with 5 controls

Authority: Linux mainline include/uapi/linux/v4l2-controls.h:2090-2300 defines the 8 HEVC stateless controls: 5 submitted per frame (SCALING_MATRIX conditionally), 2 device-wide, and 1 (ENTRY_POINT_OFFSETS) out of scope for iter2:

#define V4L2_CID_STATELESS_HEVC_SPS              (V4L2_CID_CODEC_STATELESS_BASE+400)  /* 0xa40a90 */
#define V4L2_CID_STATELESS_HEVC_PPS              (V4L2_CID_CODEC_STATELESS_BASE+401)  /* 0xa40a91 */
#define V4L2_CID_STATELESS_HEVC_SLICE_PARAMS     (V4L2_CID_CODEC_STATELESS_BASE+402)  /* 0xa40a92 */
#define V4L2_CID_STATELESS_HEVC_SCALING_MATRIX   (V4L2_CID_CODEC_STATELESS_BASE+403)  /* 0xa40a93 */
#define V4L2_CID_STATELESS_HEVC_DECODE_PARAMS    (V4L2_CID_CODEC_STATELESS_BASE+404)  /* 0xa40a94 */
#define V4L2_CID_STATELESS_HEVC_DECODE_MODE      (V4L2_CID_CODEC_STATELESS_BASE+405)  /* 0xa40a95 */
#define V4L2_CID_STATELESS_HEVC_START_CODE       (V4L2_CID_CODEC_STATELESS_BASE+406)  /* 0xa40a96 */
#define V4L2_CID_STATELESS_HEVC_ENTRY_POINT_OFFSETS (V4L2_CID_CODEC_STATELESS_BASE+407) /* not iter2 — tile/wavefront */

Reference implementation: FFmpeg libavcodec/v4l2_request_hevc.c:505-565 (v4l2_request_hevc_queue_decode) builds a 5-element v4l2_ext_control array and submits via ff_v4l2_request_decode_frame (single VIDIOC_S_EXT_CTRLS per frame).

Empirical anchor: Phase 3 Baseline B strace verbatim (phase3_iter2_baseline.md + phase0_evidence/2026-05-08/iter2_phase3/ffmpeg_v4l2req.strace.* gitignored) shows:

ioctl(/dev/video1, VIDIOC_S_EXT_CTRLS,
  {ctrl_class=0xf010000 /* V4L2_CTRL_CLASS_CODEC_STATELESS */,
   count=5,
   controls=[
     {id=0xa40a90 SPS,            size=40,  ...},
     {id=0xa40a91 PPS,            size=64,  ...},
     {id=0xa40a92 SLICE_PARAMS,   size=N,   ...},  /* dynamic-array */
     {id=0xa40a93 SCALING_MATRIX, size=M,   ...},  /* conditional on kernel availability */
     {id=0xa40a94 DECODE_PARAMS,  size=328, ...}
  ]}) = 0

Implication for iter2: h265_set_controls() builds a 5-entry struct v4l2_ext_control array and submits via the existing v4l2_set_controls(driver_data->video_fd, surface_object->request_fd, controls, 5) API. One VIDIOC_S_EXT_CTRLS per frame, mirroring iter1 MPEG-2 + iter6/7/8 H.264 patterns.
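The batch shape described above can be sketched as follows. This is a minimal illustration, not the backend's code: struct ext_ctrl is a local stand-in for struct v4l2_ext_control, and hevc_frame_payloads and hevc_build_batch are hypothetical names. It assumes the SCALING_MATRIX entry is dropped when the init-time probe (Clause 5) reports the control as unavailable, in which case the count passed to v4l2_set_controls would be 4 rather than 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct v4l2_ext_control (illustration only). */
struct ext_ctrl {
    uint32_t id;
    uint32_t size;
    const void *ptr;
};

/* CID values as they appear in the Baseline B strace excerpt. */
enum {
    CID_HEVC_SPS            = 0xa40a90,
    CID_HEVC_PPS            = 0xa40a91,
    CID_HEVC_SLICE_PARAMS   = 0xa40a92,
    CID_HEVC_SCALING_MATRIX = 0xa40a93,
    CID_HEVC_DECODE_PARAMS  = 0xa40a94
};

/* Hypothetical bundle of already-filled payloads for one frame. */
struct hevc_frame_payloads {
    const void *sps;     uint32_t sps_size;
    const void *pps;     uint32_t pps_size;
    const void *slices;  uint32_t slices_size;
    const void *scaling; uint32_t scaling_size;
    const void *decode;  uint32_t decode_size;
    int has_scaling_matrix; /* result of the init-time probe */
};

/* Fills out[] in strace order and returns the control count (4 or 5). */
unsigned int hevc_build_batch(const struct hevc_frame_payloads *p,
                              struct ext_ctrl out[5])
{
    unsigned int n = 0;

    assert(p != NULL && out != NULL);
    out[n++] = (struct ext_ctrl){ CID_HEVC_SPS, p->sps_size, p->sps };
    out[n++] = (struct ext_ctrl){ CID_HEVC_PPS, p->pps_size, p->pps };
    out[n++] = (struct ext_ctrl){ CID_HEVC_SLICE_PARAMS, p->slices_size, p->slices };
    if (p->has_scaling_matrix)
        out[n++] = (struct ext_ctrl){ CID_HEVC_SCALING_MATRIX, p->scaling_size, p->scaling };
    out[n++] = (struct ext_ctrl){ CID_HEVC_DECODE_PARAMS, p->decode_size, p->decode };
    return n;
}
```

The returned count, not a hardcoded 5, is what a caller would hand to the submission helper so the same code path serves kernels with and without SCALING_MATRIX.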

Clause 2 — v4l2_ctrl_hevc_sps field layout (40 bytes)

Authority: <linux/v4l2-controls.h>:2096+ struct v4l2_ctrl_hevc_sps:

struct v4l2_ctrl_hevc_sps {
    __u8  video_parameter_set_id;
    __u8  seq_parameter_set_id;
    __u16 pic_width_in_luma_samples;
    __u16 pic_height_in_luma_samples;
    __u8  bit_depth_luma_minus8;
    __u8  bit_depth_chroma_minus8;
    __u8  log2_max_pic_order_cnt_lsb_minus4;
    __u8  sps_max_dec_pic_buffering_minus1;
    __u8  sps_max_num_reorder_pics;
    __u8  sps_max_latency_increase_plus1;
    __u8  log2_min_luma_coding_block_size_minus3;
    __u8  log2_diff_max_min_luma_coding_block_size;
    __u8  log2_min_luma_transform_block_size_minus2;
    __u8  log2_diff_max_min_luma_transform_block_size;
    __u8  max_transform_hierarchy_depth_inter;
    __u8  max_transform_hierarchy_depth_intra;
    __u8  pcm_sample_bit_depth_luma_minus1;
    __u8  pcm_sample_bit_depth_chroma_minus1;
    __u8  log2_min_pcm_luma_coding_block_size_minus3;
    __u8  log2_diff_max_min_pcm_luma_coding_block_size;
    __u8  num_short_term_ref_pic_sets;
    __u8  num_long_term_ref_pics_sps;
    __u8  chroma_format_idc;
    __u8  sps_max_sub_layers_minus1;
    __u8  reserved[6];
    __u64 flags;
};

Total 40 bytes (verified against Phase 3 Baseline B verbatim payload size). 9 boolean fields collapsed into u64 flags:

#define V4L2_HEVC_SPS_FLAG_SEPARATE_COLOUR_PLANE         (1ULL << 0)
#define V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED          (1ULL << 1)
#define V4L2_HEVC_SPS_FLAG_AMP_ENABLED                   (1ULL << 2)
#define V4L2_HEVC_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET        (1ULL << 3)
#define V4L2_HEVC_SPS_FLAG_PCM_ENABLED                   (1ULL << 4)
#define V4L2_HEVC_SPS_FLAG_PCM_LOOP_FILTER_DISABLED      (1ULL << 5)
#define V4L2_HEVC_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT    (1ULL << 6)
#define V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED      (1ULL << 7)
#define V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)
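The 40-byte claim can be checked at compile time without the kernel headers. The sketch below uses a local mirror of the struct quoted above (hevc_sps_mirror is a hypothetical name, not the UAPI type) and asserts both the total size and the offset of flags; a real implementation would presumably assert on struct v4l2_ctrl_hevc_sps itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the SPS layout quoted above; not the kernel header. */
struct hevc_sps_mirror {
    uint8_t  video_parameter_set_id;
    uint8_t  seq_parameter_set_id;
    uint16_t pic_width_in_luma_samples;
    uint16_t pic_height_in_luma_samples;
    uint8_t  bit_depth_luma_minus8;
    uint8_t  bit_depth_chroma_minus8;
    uint8_t  log2_max_pic_order_cnt_lsb_minus4;
    uint8_t  sps_max_dec_pic_buffering_minus1;
    uint8_t  sps_max_num_reorder_pics;
    uint8_t  sps_max_latency_increase_plus1;
    uint8_t  log2_min_luma_coding_block_size_minus3;
    uint8_t  log2_diff_max_min_luma_coding_block_size;
    uint8_t  log2_min_luma_transform_block_size_minus2;
    uint8_t  log2_diff_max_min_luma_transform_block_size;
    uint8_t  max_transform_hierarchy_depth_inter;
    uint8_t  max_transform_hierarchy_depth_intra;
    uint8_t  pcm_sample_bit_depth_luma_minus1;
    uint8_t  pcm_sample_bit_depth_chroma_minus1;
    uint8_t  log2_min_pcm_luma_coding_block_size_minus3;
    uint8_t  log2_diff_max_min_pcm_luma_coding_block_size;
    uint8_t  num_short_term_ref_pic_sets;
    uint8_t  num_long_term_ref_pics_sps;
    uint8_t  chroma_format_idc;
    uint8_t  sps_max_sub_layers_minus1;
    uint8_t  reserved[6];
    uint64_t flags;
};

/* 2 + 2*2 + 20 scalar bytes + 6 reserved = 32, then the 8-byte flags. */
static_assert(offsetof(struct hevc_sps_mirror, flags) == 32,
              "flags must start at byte 32");
static_assert(sizeof(struct hevc_sps_mirror) == 40,
              "SPS payload must be 40 bytes");
```

The same pattern gives an early build failure if a header update ever changes the layout, which is cheaper than discovering it as an EINVAL at submission time.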

VAAPI source mapping (mostly preserved from current src/h265.c::h265_fill_sps, just routed to flags collapsed bitmask):

| New SPS field | Source: VAPictureParameterBufferHEVC picture |
|---|---|
| pic_width_in_luma_samples | picture->pic_width_in_luma_samples |
| pic_height_in_luma_samples | picture->pic_height_in_luma_samples |
| bit_depth_luma_minus8 | picture->bit_depth_luma_minus8 |
| bit_depth_chroma_minus8 | picture->bit_depth_chroma_minus8 |
| chroma_format_idc | picture->pic_fields.bits.chroma_format_idc |
| log2_max_pic_order_cnt_lsb_minus4 | picture->log2_max_pic_order_cnt_lsb_minus4 |
| sps_max_dec_pic_buffering_minus1 | picture->sps_max_dec_pic_buffering_minus1 |
| sps_max_num_reorder_pics | 0 (current code hardcodes; VAAPI doesn't expose) |
| sps_max_latency_increase_plus1 | 0 (same) |
| log2_min_luma_coding_block_size_minus3 | picture->log2_min_luma_coding_block_size_minus3 |
| log2_diff_max_min_luma_coding_block_size | picture->log2_diff_max_min_luma_coding_block_size |
| log2_min_luma_transform_block_size_minus2 | picture->log2_min_transform_block_size_minus2 |
| log2_diff_max_min_luma_transform_block_size | picture->log2_diff_max_min_transform_block_size |
| max_transform_hierarchy_depth_inter/intra | same fields in VAAPI |
| pcm_sample_bit_depth_luma_minus1, etc. | same fields |
| num_short_term_ref_pic_sets | picture->num_short_term_ref_pic_sets |
| num_long_term_ref_pics_sps | picture->num_long_term_ref_pic_sps |
| sps_max_sub_layers_minus1 | 0 (VAAPI doesn't expose; placeholder) |
| video_parameter_set_id | 0 (VAAPI doesn't expose) |
| seq_parameter_set_id | 0 (VAAPI doesn't expose) |
| flags | (OR of:) |
| _SEPARATE_COLOUR_PLANE | picture->pic_fields.bits.separate_colour_plane_flag |
| _SCALING_LIST_ENABLED | picture->pic_fields.bits.scaling_list_enabled_flag |
| _AMP_ENABLED | picture->pic_fields.bits.amp_enabled_flag |
| _SAMPLE_ADAPTIVE_OFFSET | picture->slice_parsing_fields.bits.sample_adaptive_offset_enabled_flag |
| _PCM_ENABLED | picture->pic_fields.bits.pcm_enabled_flag |
| _PCM_LOOP_FILTER_DISABLED | picture->pic_fields.bits.pcm_loop_filter_disabled_flag |
| _LONG_TERM_REF_PICS_PRESENT | picture->slice_parsing_fields.bits.long_term_ref_pics_present_flag |
| _SPS_TEMPORAL_MVP_ENABLED | picture->slice_parsing_fields.bits.sps_temporal_mvp_enabled_flag |
| _STRONG_INTRA_SMOOTHING_ENABLED | picture->pic_fields.bits.strong_intra_smoothing_enabled_flag |
| reserved[6] | zero (via memset) |

Phase 3 Baseline B verbatim sanity: BBB SPS bytes decode to: 1280×720, 8-bit, 4:2:0, no PCM, flags=SAMPLE_ADAPTIVE_OFFSET | STRONG_INTRA_SMOOTHING_ENABLED (0x108). iter2 implementation must produce the same 40 bytes for this fixture (Phase 7 byte-compare check).
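The boolean-to-bitmask collapse, and the 0x108 value expected for the BBB fixture, can be illustrated with a small sketch. The bit positions are copied from the V4L2_HEVC_SPS_FLAG_* list above; struct sps_bools and sps_collapse_flags are hypothetical names standing in for the VAAPI bitfields and the flag-building part of h265_fill_sps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions copied from the V4L2_HEVC_SPS_FLAG_* definitions. */
#define SPS_FLAG_SEPARATE_COLOUR_PLANE          (1ULL << 0)
#define SPS_FLAG_SCALING_LIST_ENABLED           (1ULL << 1)
#define SPS_FLAG_AMP_ENABLED                    (1ULL << 2)
#define SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET         (1ULL << 3)
#define SPS_FLAG_PCM_ENABLED                    (1ULL << 4)
#define SPS_FLAG_PCM_LOOP_FILTER_DISABLED       (1ULL << 5)
#define SPS_FLAG_LONG_TERM_REF_PICS_PRESENT     (1ULL << 6)
#define SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED       (1ULL << 7)
#define SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)

/* Hypothetical holder for the nine booleans read from VAAPI. */
struct sps_bools {
    bool separate_colour_plane;
    bool scaling_list_enabled;
    bool amp_enabled;
    bool sample_adaptive_offset;
    bool pcm_enabled;
    bool pcm_loop_filter_disabled;
    bool long_term_ref_pics_present;
    bool sps_temporal_mvp_enabled;
    bool strong_intra_smoothing;
};

/* OR each set boolean into its bit of the 64-bit flags word. */
uint64_t sps_collapse_flags(const struct sps_bools *b)
{
    uint64_t flags = 0;

    assert(b != NULL);
    if (b->separate_colour_plane)      flags |= SPS_FLAG_SEPARATE_COLOUR_PLANE;
    if (b->scaling_list_enabled)       flags |= SPS_FLAG_SCALING_LIST_ENABLED;
    if (b->amp_enabled)                flags |= SPS_FLAG_AMP_ENABLED;
    if (b->sample_adaptive_offset)     flags |= SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET;
    if (b->pcm_enabled)                flags |= SPS_FLAG_PCM_ENABLED;
    if (b->pcm_loop_filter_disabled)   flags |= SPS_FLAG_PCM_LOOP_FILTER_DISABLED;
    if (b->long_term_ref_pics_present) flags |= SPS_FLAG_LONG_TERM_REF_PICS_PRESENT;
    if (b->sps_temporal_mvp_enabled)   flags |= SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED;
    if (b->strong_intra_smoothing)     flags |= SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED;
    return flags;
}
```

With only sample_adaptive_offset and strong_intra_smoothing set, the result is (1 << 3) | (1 << 8) = 0x108, matching the Baseline B value.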

Clause 3 — v4l2_ctrl_hevc_pps field layout (64 bytes)

Authority: <linux/v4l2-controls.h>:2150+ struct v4l2_ctrl_hevc_pps. Total 64 bytes. 19 boolean PPS fields collapsed into u64 flags:

#define V4L2_HEVC_PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED        (1ULL << 0)
#define V4L2_HEVC_PPS_FLAG_OUTPUT_FLAG_PRESENT                    (1ULL << 1)
#define V4L2_HEVC_PPS_FLAG_SIGN_DATA_HIDING_ENABLED               (1ULL << 2)
#define V4L2_HEVC_PPS_FLAG_CABAC_INIT_PRESENT                     (1ULL << 3)
#define V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED                 (1ULL << 4)
#define V4L2_HEVC_PPS_FLAG_TRANSFORM_SKIP_ENABLED                 (1ULL << 5)
#define V4L2_HEVC_PPS_FLAG_CU_QP_DELTA_ENABLED                    (1ULL << 6)
#define V4L2_HEVC_PPS_FLAG_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT    (1ULL << 7)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_PRED                          (1ULL << 8)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_BIPRED                        (1ULL << 9)
#define V4L2_HEVC_PPS_FLAG_TRANSQUANT_BYPASS_ENABLED              (1ULL << 10)
#define V4L2_HEVC_PPS_FLAG_TILES_ENABLED                          (1ULL << 11)
#define V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED            (1ULL << 12)
#define V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED       (1ULL << 13)
#define V4L2_HEVC_PPS_FLAG_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED  (1ULL << 14)
#define V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_OVERRIDE_ENABLED     (1ULL << 15)
#define V4L2_HEVC_PPS_FLAG_PPS_DISABLE_DEBLOCKING_FILTER          (1ULL << 16)
#define V4L2_HEVC_PPS_FLAG_LISTS_MODIFICATION_PRESENT             (1ULL << 17)
#define V4L2_HEVC_PPS_FLAG_SLICE_SEGMENT_HEADER_EXTENSION_PRESENT (1ULL << 18)

VAAPI source mapping: extracted from BOTH picture (VAPictureParameterBufferHEVC) AND slice (VASliceParameterBufferHEVC for dependent_slice_segment_flag). The current src/h265.c::h265_fill_pps (lines 48-102) does the field extraction correctly; iter2 just collapses booleans into the new u64 flags bitmask:

| New PPS field | Source (old h265.c location) |
|---|---|
| pps->dependent_slice_segment_flag (now flags & DEPENDENT_SLICE_SEGMENT_ENABLED) | slice->LongSliceFlags.fields.dependent_slice_segment_flag (line 54) |
| pps->output_flag_present_flag (now flags & OUTPUT_FLAG_PRESENT) | picture->slice_parsing_fields.bits.output_flag_present_flag |
| pps->num_extra_slice_header_bits (kept as field) | picture->num_extra_slice_header_bits |
| ... | (15 more boolean field-to-flag conversions, mechanical) |
| pps->init_qp_minus26 (kept) | picture->init_qp_minus26 |
| pps->diff_cu_qp_delta_depth (kept) | picture->diff_cu_qp_delta_depth |
| pps->pps_cb_qp_offset (kept) | picture->pps_cb_qp_offset |
| pps->pps_cr_qp_offset (kept) | picture->pps_cr_qp_offset |
| pps->num_tile_columns_minus1 (kept) | picture->num_tile_columns_minus1 |
| pps->num_tile_rows_minus1 (kept) | picture->num_tile_rows_minus1 |
| pps->pps_beta_offset_div2 (kept) | picture->pps_beta_offset_div2 |
| pps->pps_tc_offset_div2 (kept) | picture->pps_tc_offset_div2 |
| pps->log2_parallel_merge_level_minus2 (kept) | picture->log2_parallel_merge_level_minus2 |
| Fields added: column_width_minus1[20], row_height_minus1[22], num_extra_slice_header_bits, reserved | populate from VAAPI (or zero if VAAPI doesn't expose) |
| flags | u64 with the 19 bits OR'd (mechanical boolean collapse) |
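The 64-byte total can be sanity-checked the same way as the SPS. The mirror below is an assumption: the plan does not quote the PPS struct, so the field order and the signed types are reconstructed from the mainline header as best recalled and should be checked against the installed <linux/v4l2-controls.h> (pahole is the tool the loopback triggers already name). hevc_pps_mirror is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout, reconstructed for illustration; verify with pahole. */
struct hevc_pps_mirror {
    uint8_t  pic_parameter_set_id;
    uint8_t  num_extra_slice_header_bits;
    uint8_t  num_ref_idx_l0_default_active_minus1;
    uint8_t  num_ref_idx_l1_default_active_minus1;
    int8_t   init_qp_minus26;
    uint8_t  diff_cu_qp_delta_depth;
    int8_t   pps_cb_qp_offset;
    int8_t   pps_cr_qp_offset;
    uint8_t  num_tile_columns_minus1;
    uint8_t  num_tile_rows_minus1;
    uint8_t  column_width_minus1[20];
    uint8_t  row_height_minus1[22];
    int8_t   pps_beta_offset_div2;
    int8_t   pps_tc_offset_div2;
    uint8_t  log2_parallel_merge_level_minus2;
    uint8_t  reserved;
    uint64_t flags;
};

/* 10 scalars + 20 + 22 + 3 scalars + 1 reserved = 56, then 8-byte flags. */
static_assert(offsetof(struct hevc_pps_mirror, flags) == 56,
              "flags must start at byte 56");
static_assert(sizeof(struct hevc_pps_mirror) == 64,
              "PPS payload must be 64 bytes");
```

If the real header disagrees with this mirror, the size in the Baseline B strace (64) remains the reference, not this sketch.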

Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array per frame)

Authority: <linux/v4l2-controls.h> struct v4l2_ctrl_hevc_slice_params. Contains per-slice info: bit_size, data_bit_offset, slice_type, slice_pic_order_cnt, slice flags, QP deltas, ref_idx_l0/l1[15], pred_weight_table, num_entry_point_offsets, slice_segment_addr, etc.

Phase 0 inventory confirms rkvdec advertises:

hevc_slice_parameters 0x00a40a92 (hevc-slice-params): elems=1 dims=[600] flags=has-payload, dynamic-array

So kernel accepts up to 600 slice_params entries per submission. iter2's bbb_720p10s_hevc.mp4 fixture is x265-ultrafast — typical 1 slice per frame; multi-slice would still fit in the 600-entry envelope.

Submission shape: size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices_in_frame. FFmpeg libavcodec/v4l2_request_hevc.c:540-547 shows the pattern:

if (ctx->max_slice_params && controls->num_slice_params) {
    control[count++] = (struct v4l2_ext_control) {
        .id = V4L2_CID_STATELESS_HEVC_SLICE_PARAMS,
        .ptr = controls->frame_slice_params,
        .size = sizeof(*controls->frame_slice_params) *
                FFMIN(controls->num_slice_params, ctx->max_slice_params),
    };
}
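The size arithmetic in that FFmpeg excerpt reduces to a clamp and a multiply. A minimal sketch, with slice_params_payload_size as a hypothetical helper name and the element size passed in rather than taken from the kernel struct:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Payload size for the dynamic-array SLICE_PARAMS control:
 * element size times the slice count, clamped to the kernel's
 * advertised maximum (600 on rkvdec per the Phase 0 inventory).
 * A result of 0 means the control should be left out of the batch.
 */
size_t slice_params_payload_size(size_t elem_size,
                                 unsigned int num_slices,
                                 unsigned int max_slices)
{
    unsigned int n = num_slices < max_slices ? num_slices : max_slices;

    assert(elem_size > 0);
    return elem_size * (size_t)n;
}
```

For the single-slice BBB fixture this yields exactly one element's worth of bytes, which is the value expected in the size field of the 0xa40a92 entry.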

libva backend behavioral change (NEW for iter2): VAAPI clients submit VASliceParameterBufferType once per slice via vaRenderPicture. The current HEVC path in src/picture.c::codec_store_buffer (lines 115-135) does memcpy(&surface->params.h265.slice, …), which overwrites the previous slice's params. iter2 must append instead: each VASliceParameterBufferType arrival fills a new entry in a params.h265.slices[N] array and increments params.h265.num_slices. At end_picture, h265_set_controls reads the array and submits it as one dynamic-array control.

VAAPI source mapping: existing src/h265.c::h265_fill_slice_params (lines 160-365) does the field extraction per-slice correctly. iter2 preserves the extraction logic (NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight) but routes per-slice into an array slot rather than a single struct.

Critical: NAL header parsing at h265.c:184-209 extracts nal_unit_type and data_bit_offset from the slice bitstream. This logic is preserved — the new V4L2 API still requires per-slice bit_size and data_bit_offset. The new struct keeps these fields (they're per-slice metadata, not per-frame).

One group of fields MOVES OUT of slice_params: the DPB array (dpb[15]), num_active_dpb_entries, and num_rps_poc_st_curr_before/after / num_rps_poc_lt_curr all migrate to DECODE_PARAMS (Clause 6). iter2's per-slice fill no longer populates the DPB.

Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional submission)

Authority: <linux/v4l2-controls.h> struct v4l2_ctrl_hevc_scaling_matrix. Contains 4 scaling lists (4×4, 8×8, 16×16, 32×32) for luma + chroma intra/inter — substantial struct.

Conditional submission per FFmpeg pattern: query kernel availability once at init via VIDIOC_QUERY_EXT_CTRL for the SCALING_MATRIX CID. If kernel advertises (rkvdec on fresnel does, per Phase 3 Baseline B), include in the per-frame batch unconditionally. If kernel doesn't advertise, omit.

Phase 3 evidence: the BBB fixture's per-frame batch always contains SCALING_MATRIX (Baseline B verbatim shows 30 occurrences across 5 frames plus queries). FFmpeg gates on ctx->has_scaling_matrix, set at init from ff_v4l2_request_query_control_default_value(...SCALING_MATRIX). iter2 mirrors this: probe at init, store a boolean in the libva backend's per-context state, and include the control in the batch if true.

VAAPI source mapping: VAIQMatrixBufferHEVC provides the four scaling lists (scaling_lists_4x4[6][16], _8x8[6][64], _16x16[6][64], _32x32[2][64] plus DC scaling lists). When iqmatrix_set==true, copy from VAAPI struct to V4L2 struct. When iqmatrix_set==false, populate with HEVC spec default scaling matrices (per ISO/IEC 23008-2 Table 4-1 — flat 16 across all positions, with DC values 16).

The Phase 3 Baseline B SCALING_MATRIX verbatim payload has not been field-decoded yet (deferred to Phase 6 transcription); its bytes will be compared against the backend-generated payload at Phase 7 verification time.
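The byte comparison is easier to act on if it reports where the payloads diverge rather than only that they do. A minimal sketch, with first_mismatch as a hypothetical helper; how the baseline bytes are loaded from the strace capture is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the offset of the first differing byte between the captured
 * baseline payload and the backend-generated one, or -1 if all len
 * bytes match.
 */
long first_mismatch(const uint8_t *baseline, const uint8_t *generated,
                    size_t len)
{
    assert(baseline != NULL && generated != NULL);
    for (size_t i = 0; i < len; i++) {
        if (baseline[i] != generated[i])
            return (long)i;
    }
    return -1;
}
```

The returned offset can be mapped back to a struct field with offsetof, which narrows a hash mismatch to a specific scaling list or flag word.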

Clause 6 — v4l2_ctrl_hevc_decode_params field layout (328 bytes)

Authority: <linux/v4l2-controls.h> struct v4l2_ctrl_hevc_decode_params. NEW in modern API (didn't exist in staging-era). Contains:

  • pic_order_cnt_val (s32) — current picture POC.
  • short_term_ref_pic_set_size, long_term_ref_pic_set_size — RPS sizes.
  • num_active_dpb_entries — count of valid DPB entries.
  • num_poc_st_curr_before/after, num_poc_lt_curr — short-term + long-term ref counts.
  • poc_st_curr_before[16], poc_st_curr_after[16], poc_lt_curr[16] — POC arrays for ref pic ordering (16 entries each; 8-entry arrays would not add up to the 328-byte total below).
  • dpb[16] — DPB entries: {timestamp, flags, field_pic, pic_order_cnt_val, _padding} per entry.
  • flags (u64) — IRAP_PIC, IDR_PIC, NO_OUTPUT_OF_PRIOR_PICS, etc.

Total 328 bytes (verified against Phase 3 Baseline B verbatim payload size).
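The 328-byte figure can be reproduced from the field list with a local mirror. This sketch assumes 16-entry POC index arrays and a 16-byte DPB entry, and it places one extra scalar plus three reserved bytes before the DPB array as recalled from the mainline header; those details, and the names hevc_dpb_entry_mirror and hevc_decode_params_mirror, are assumptions to be confirmed with pahole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEVC_DPB_ENTRIES 16

/* ASSUMED entry layout: 8 + 1 + 1 + 2 + 4 = 16 bytes. */
struct hevc_dpb_entry_mirror {
    uint64_t timestamp;
    uint8_t  flags;
    uint8_t  field_pic;
    uint16_t reserved;
    int32_t  pic_order_cnt_val;
};

/* ASSUMED layout; verify against the installed UAPI header. */
struct hevc_decode_params_mirror {
    int32_t  pic_order_cnt_val;
    uint16_t short_term_ref_pic_set_size;
    uint16_t long_term_ref_pic_set_size;
    uint8_t  num_active_dpb_entries;
    uint8_t  num_poc_st_curr_before;
    uint8_t  num_poc_st_curr_after;
    uint8_t  num_poc_lt_curr;
    uint8_t  poc_st_curr_before[HEVC_DPB_ENTRIES];
    uint8_t  poc_st_curr_after[HEVC_DPB_ENTRIES];
    uint8_t  poc_lt_curr[HEVC_DPB_ENTRIES];
    uint8_t  num_delta_pocs_of_ref_rps_idx;
    uint8_t  reserved[3];
    struct hevc_dpb_entry_mirror dpb[HEVC_DPB_ENTRIES];
    uint64_t flags;
};

/* 12 header bytes + 48 POC bytes + 4 = 64; 16 * 16 = 256; + 8 flags. */
static_assert(sizeof(struct hevc_dpb_entry_mirror) == 16,
              "DPB entry must be 16 bytes");
static_assert(offsetof(struct hevc_decode_params_mirror, dpb) == 64,
              "DPB array must start at byte 64");
static_assert(sizeof(struct hevc_decode_params_mirror) == 328,
              "DECODE_PARAMS payload must be 328 bytes");
```

With 8-entry POC arrays the same arithmetic gives 304 bytes, so the strace-observed size of 328 is what pins the array length at 16.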

VAAPI source mapping: largely preserved from current src/h265.c::h265_fill_slice_params lines 269-315 (DPB iteration over picture->ReferenceFrames[15]), just routed to a new struct. The existing logic for dpb[i].timestamp, dpb[i].rps, dpb[i].pic_order_cnt[0], field_pic migrates verbatim to decode_params.dpb[i].timestamp etc. The DPB-counting logic (num_rps_poc_st_curr_before/after, num_rps_poc_lt_curr) migrates to the num_poc_* fields of decode_params.

Submission: per-frame, after SPS + PPS in the batch.

Clause 7 — Device-wide DECODE_MODE + START_CODE menu controls

Authority: <linux/v4l2-controls.h> defines:

#define V4L2_CID_STATELESS_HEVC_DECODE_MODE   (V4L2_CID_CODEC_STATELESS_BASE+405)
#define V4L2_CID_STATELESS_HEVC_START_CODE    (V4L2_CID_CODEC_STATELESS_BASE+406)

enum v4l2_stateless_hevc_decode_mode {
    V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED,
    V4L2_STATELESS_HEVC_DECODE_MODE_FRAME_BASED,
};
enum v4l2_stateless_hevc_start_code {
    V4L2_STATELESS_HEVC_START_CODE_NONE,
    V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
};

Phase 0 inventory confirms fresnel rkvdec advertises:

hevc_decode_mode 0x00a40a95 (menu): min=1 max=1 default=1 (Frame-Based) flags=has-min-max
hevc_start_code  0x00a40a96 (menu): min=1 max=1 default=1 (Annex B Start Code) flags=has-min-max

So rkvdec accepts ONLY FRAME_BASED decode mode and ANNEX_B start code — same constraints as H.264 + MPEG-2. Set both at decoder init via v4l2_set_controls(driver_data->video_fd, /* request_fd= */ -1, dev_ctrls, 2) with values FRAME_BASED + ANNEX_B.

Where to set: extend src/context.c:142-155's existing H.264 device-init block to also set HEVC's two device controls when context is HEVC-profile-bound. Current pattern: 2 ext_controls in one batched call with request_fd=-1. iter2 adds 2 more controls (or a separate call) for the HEVC variants.

Alternative: set them inside h265_set_controls once per context (with a "first call" guard). Cleaner location-wise but requires per-context state. Phase 6 implementer chooses.

Clause 8 — RequestCreateConfig HEVCMain case must break;

Authority: C language semantics. src/config.c:67 case VAProfileHEVCMain: falls through to default: (line 68) which returns the error. iter1 added break; for MPEG-2 cases; HEVCMain is the last case in the same fall-through bucket.

Empirical anchor: Phase 3 Baseline D verified the patch shape in scratch — adding break; for HEVCMain lets vaCreateConfig return VA_STATUS_SUCCESS without affecting iter1 MPEG-2 or T4 H.264 hashes.

Fix shape: 5 lines (case label preserved; comment + break added; matches iter1 Commit A pattern verbatim).
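The fall-through behind this clause can be reproduced in isolation. The sketch below is a toy model, not src/config.c: the enum, the status constants, and both function names are hypothetical, with 12 chosen to match the VA_STATUS_ERROR_UNSUPPORTED_PROFILE value quoted in criterion 2.

```c
#include <assert.h>

/* Hypothetical stand-ins for the VAProfile values and VA status codes. */
enum profile { PROFILE_MPEG2_MAIN, PROFILE_H264_MAIN, PROFILE_HEVC_MAIN, PROFILE_OTHER };
#define STATUS_SUCCESS             0
#define STATUS_UNSUPPORTED_PROFILE 12

/* Before: the HEVC label has no break, so control reaches default. */
int create_config_before(enum profile p)
{
    switch (p) {
    case PROFILE_MPEG2_MAIN:
    case PROFILE_H264_MAIN:
        break;
    case PROFILE_HEVC_MAIN:
    default:
        return STATUS_UNSUPPORTED_PROFILE;
    }
    return STATUS_SUCCESS;
}

/* After: an explicit break separates the HEVC label from default. */
int create_config_after(enum profile p)
{
    switch (p) {
    case PROFILE_MPEG2_MAIN:
    case PROFILE_H264_MAIN:
        break;
    case PROFILE_HEVC_MAIN:
        break;
    default:
        return STATUS_UNSUPPORTED_PROFILE;
    }
    return STATUS_SUCCESS;
}
```

Only the HEVC result changes between the two versions; the other profiles return the same status in both, which is the collateral-safety property Baseline D checked.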

Clause 9 — picture.c::codec_set_controls HEVCMain dispatch

Authority: existing src/picture.c:186-191 MPEG-2 dispatch pattern from iter1:

case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
    rc = mpeg2_set_controls(driver_data, context, surface_object);
    if (rc < 0) return VA_STATUS_ERROR_OPERATION_FAILED;
    break;

iter2 replaces the explicit case VAProfileHEVCMain: return VA_STATUS_ERROR_UNSUPPORTED_PROFILE; (lines 204-206) with the same shape, dispatching to h265_set_controls. Comment updated to remove the stale Fourier-local: HEVC stripped, no HW support on RK3566. reference.

Clause 10 — Per-slice accumulation in codec_store_buffer

Authority: HEVC kernel API requires per-slice slice_params (Clause 4). VAAPI clients submit VASliceParameterBufferType once per slice via vaRenderPicture. The current src/picture.c:115-135 for HEVC VASliceParameterBufferType does:

case VAProfileHEVCMain:
    memcpy(&surface_object->params.h265.slice, buffer_object->data, sizeof(...));
    break;

Behavior change: replace single-slot copy with array-append:

case VAProfileHEVCMain:
    if (surface_object->params.h265.num_slices < HEVC_MAX_SLICES_PER_FRAME) {
        memcpy(&surface_object->params.h265.slices[surface_object->params.h265.num_slices],
               buffer_object->data,
               sizeof(VASliceParameterBufferHEVC));
        surface_object->params.h265.num_slices++;
    } else {
        /* exceeded array bound — log and drop; Phase 7 verification flags */
    }
    break;

HEVC_MAX_SLICES_PER_FRAME = e.g. 64 (kernel max is 600; conservative). For the BBB fixture this maxes at 1 per frame; the bound is for safety.

At BeginPicture: reset num_slices = 0 per-frame. Currently picture.c:287 only resets params.h264.matrix_set = false; iter2 adds params.h265.num_slices = 0 reset for HEVC surfaces. (Or per-profile: switch on config_object->profile and reset accordingly. iter2 adds params.h265.num_slices = 0 unconditionally for now — benign for non-HEVC since the union aliasing puts num_slices in a region overwritten by RenderPicture's per-buffer copies.)
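The reset, append, and bound-check lifecycle can be exercised on its own with stub types. In this sketch slice_stub stands in for VASliceParameterBufferHEVC, DEMO_MAX_SLICES is deliberately tiny (the plan proposes 64), and the dropped counter is a hypothetical addition to make overflow observable instead of silent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_SLICES 4 /* small on purpose; the plan proposes 64 */

/* Stub for VASliceParameterBufferHEVC (two fields are enough here). */
struct slice_stub {
    uint32_t slice_data_size;
    uint32_t slice_data_offset;
};

struct h265_params_stub {
    struct slice_stub slices[DEMO_MAX_SLICES];
    unsigned int num_slices;
    unsigned int dropped; /* hypothetical overflow counter */
};

/* BeginPicture: start every frame with an empty slice array. */
void demo_begin_picture(struct h265_params_stub *p)
{
    assert(p != NULL);
    p->num_slices = 0;
    p->dropped = 0;
}

/* RenderPicture: append one slice, or count it as dropped when full. */
bool demo_store_slice(struct h265_params_stub *p, const struct slice_stub *s)
{
    assert(p != NULL && s != NULL);
    if (p->num_slices >= DEMO_MAX_SLICES) {
        p->dropped++;
        return false;
    }
    memcpy(&p->slices[p->num_slices], s, sizeof(*s));
    p->num_slices++;
    return true;
}
```

A non-zero dropped count at EndPicture would be a natural place to log, since a truncated slice array produces a decodable-looking but wrong frame.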

Diff scope

File 1: src/config.c — add break; for HEVCMain case (5 lines)

@@ -68,6 +68,11 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
 		// submission time.
 		break;
 	case VAProfileHEVCMain:
+		// fresnel-fourier iter2: HEVC enabled. Same shape as H.264/
+		// MPEG-2 above — no profile-specific config validation in the
+		// libva backend; validation happens at vaCreateContext /
+		// control submission time.
+		break;
 	default:
 		return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;

File 2: src/picture.c — replace HEVCMain reject with dispatch + per-slice slice_params accumulation (~25 lines)

Three distinct changes:

(a) Dispatch HEVCMain in codec_set_controls (lines 204-206):

-    case VAProfileHEVCMain:
-        /* Fourier-local: HEVC stripped, no HW support on RK3566. */
-        return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+    case VAProfileHEVCMain:
+        rc = h265_set_controls(driver_data, context, surface_object);
+        if (rc < 0)
+            return VA_STATUS_ERROR_OPERATION_FAILED;
+        break;

(b) Per-slice accumulation in codec_store_buffer (HEVC VASliceParameterBufferType case, lines 127-131):

-    case VAProfileHEVCMain:
-        memcpy(&surface_object->params.h265.slice,
-               buffer_object->data,
-               sizeof(surface_object->params.h265.slice));
-        break;
+    case VAProfileHEVCMain: {
+        unsigned int n = surface_object->params.h265.num_slices;
+        if (n < HEVC_MAX_SLICES_PER_FRAME) {
+            memcpy(&surface_object->params.h265.slices[n],
+                   buffer_object->data,
+                   sizeof(VASliceParameterBufferHEVC));
+            surface_object->params.h265.num_slices = n + 1;
+        }
+        /* note: also keep .slice (singular) populated as last-slice
+         * mirror for h265_fill_pps which reads dependent_slice_segment_flag
+         * from VASliceParameterBufferHEVC->LongSliceFlags */
+        memcpy(&surface_object->params.h265.slice,
+               buffer_object->data,
+               sizeof(surface_object->params.h265.slice));
+        break;
+    }

(c) Reset num_slices in RequestBeginPicture at line 287:

     surface_object->params.h264.matrix_set = false;
+    surface_object->params.h265.num_slices = 0;

File 3: src/surface.h — extend params.h265 to hold slice_params array

Add inside the union { ... } params block:

         struct {
             VAPictureParameterBufferHEVC picture;
             VASliceParameterBufferHEVC slice;
+            VASliceParameterBufferHEVC slices[HEVC_MAX_SLICES_PER_FRAME];
+            unsigned int num_slices;
             VAIQMatrixBufferHEVC iqmatrix;
             bool iqmatrix_set;
         } h265;

HEVC_MAX_SLICES_PER_FRAME = 64 defined in surface.h (or h265.h). Total memory cost: sizeof(VASliceParameterBufferHEVC) ≈ 264 bytes × 64 = ~17 KB extra per surface union — significant but acceptable.

Alternative (smaller memory): heap-allocate the slices array dynamically (malloc on first slice arrival, realloc on grow, free at surface destroy). More plumbing; defer to a Phase 4 plan revision if Phase 7 surfaces memory concerns. iter2 default: fixed-size inline array of 64 entries embedded in the surface struct (not a stack allocation).
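For reference, the heap-backed alternative could look roughly like this. It is a sketch under assumptions: slice_stub again stands in for VASliceParameterBufferHEVC, and slice_vec with its three functions are hypothetical names; the doubling growth policy and initial capacity of 4 are arbitrary choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stub for VASliceParameterBufferHEVC. */
struct slice_stub {
    uint32_t slice_data_size;
    uint32_t slice_data_offset;
};

/* Growable slice array; zero-initialize before first use. */
struct slice_vec {
    struct slice_stub *items;
    size_t len;
    size_t cap;
};

/* Append one slice, growing the buffer when full. Returns 0 or -1. */
int slice_vec_push(struct slice_vec *v, const struct slice_stub *s)
{
    assert(v != NULL && s != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        struct slice_stub *tmp = realloc(v->items, new_cap * sizeof(*tmp));

        if (tmp == NULL)
            return -1; /* old buffer is still valid and owned by v */
        v->items = tmp;
        v->cap = new_cap;
    }
    v->items[v->len++] = *s;
    return 0;
}

/* BeginPicture: forget the slices but keep the allocation for reuse. */
void slice_vec_reset(struct slice_vec *v)
{
    assert(v != NULL);
    v->len = 0;
}

/* Surface destroy: release the buffer. */
void slice_vec_free(struct slice_vec *v)
{
    assert(v != NULL);
    free(v->items);
    v->items = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Keeping the allocation across frames means the steady-state cost is one buffer sized for the largest frame seen, instead of 64 entries reserved on every surface.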

File 4: src/h265.c — full rewrite against new split API (~400 lines)

Per Clauses 2-7. The bulk of iter2 work. Structure mirrors current h265.c but routes to new struct layouts:

  • h265_fill_sps() → fill struct v4l2_ctrl_hevc_sps (40 bytes, flags collapsed). ~40 lines.
  • h265_fill_pps() → fill struct v4l2_ctrl_hevc_pps (64 bytes, flags collapsed). ~50 lines.
  • h265_fill_slice_params() → fill ONE struct v4l2_ctrl_hevc_slice_params (per-slice; called from a loop in h265_set_controls over surface->params.h265.slices[]). ~80 lines (preserves NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight).
  • NEW h265_fill_decode_params() → fill struct v4l2_ctrl_hevc_decode_params (328 bytes: DPB array, POC, num_active_dpb_entries, etc.). ~60 lines.
  • NEW h265_fill_scaling_matrix() → fill struct v4l2_ctrl_hevc_scaling_matrix from VAIQMatrixBufferHEVC (or spec defaults if iqmatrix_set==false). ~30 lines.
  • NEW h265_init_device_controls() → set DECODE_MODE + START_CODE menus once per context. ~15 lines. Called from h265_set_controls with first-call guard, OR from context.c device-init block.
  • h265_set_controls() → orchestrator: build SPS, PPS, all slice_params (loop over array), DECODE_PARAMS, SCALING_MATRIX (conditional on init-time probe); submit batched. ~50 lines.

Plus the static const default scaling matrices (luma + chroma intra/inter, 4 × 64 bytes per scan-size with extra DC values) for the iqmatrix_set==false branch. Per Phase 5 Lesson L2 (feedback_review_empirical_over_theoretical.md): transcribe from Phase 3 Baseline B SCALING_MATRIX verbatim payload, NOT from spec recall. Phase 6 protocol: capture the BBB SCALING_MATRIX bytes via verbose strace, decode into the four 64-byte arrays, transcribe with byte-equality assertion.
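Returning to the "flags collapsed" step in the h265_fill_sps() bullet above, a hedged sketch of the collapse. The SKETCH_SPS_FLAG_* values are local mirrors of the V4L2_HEVC_SPS_FLAG_* bit positions written from memory, an assumption to verify against linux/v4l2-controls.h before use; struct sps_bits_sketch is a hypothetical stand-in for the VAAPI picture-parameter bitfields, and the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; verify against linux/v4l2-controls.h. */
#define SKETCH_SPS_FLAG_SEPARATE_COLOUR_PLANE       (1ULL << 0)
#define SKETCH_SPS_FLAG_SCALING_LIST_ENABLED        (1ULL << 1)
#define SKETCH_SPS_FLAG_AMP_ENABLED                 (1ULL << 2)
#define SKETCH_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET      (1ULL << 3)
#define SKETCH_SPS_FLAG_PCM_ENABLED                 (1ULL << 4)
#define SKETCH_SPS_FLAG_PCM_LOOP_FILTER_DISABLED    (1ULL << 5)
#define SKETCH_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT  (1ULL << 6)
#define SKETCH_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED    (1ULL << 7)
#define SKETCH_SPS_FLAG_STRONG_INTRA_SMOOTHING      (1ULL << 8)

/* Hypothetical stand-in for the per-flag bitfields on the VAAPI side. */
struct sps_bits_sketch {
    unsigned separate_colour_plane_flag : 1;
    unsigned scaling_list_enabled_flag : 1;
    unsigned amp_enabled_flag : 1;
    unsigned sample_adaptive_offset_enabled_flag : 1;
    unsigned pcm_enabled_flag : 1;
    unsigned pcm_loop_filter_disabled_flag : 1;
    unsigned long_term_ref_pics_present_flag : 1;
    unsigned sps_temporal_mvp_enabled_flag : 1;
    unsigned strong_intra_smoothing_enabled_flag : 1;
};

/* Collapse one-bit fields into a single 64-bit flags word. */
uint64_t h265_sps_flags_sketch(const struct sps_bits_sketch *b)
{
    uint64_t flags = 0;

    assert(b != NULL);
    if (b->separate_colour_plane_flag)
        flags |= SKETCH_SPS_FLAG_SEPARATE_COLOUR_PLANE;
    if (b->scaling_list_enabled_flag)
        flags |= SKETCH_SPS_FLAG_SCALING_LIST_ENABLED;
    if (b->amp_enabled_flag)
        flags |= SKETCH_SPS_FLAG_AMP_ENABLED;
    if (b->sample_adaptive_offset_enabled_flag)
        flags |= SKETCH_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET;
    if (b->pcm_enabled_flag)
        flags |= SKETCH_SPS_FLAG_PCM_ENABLED;
    if (b->pcm_loop_filter_disabled_flag)
        flags |= SKETCH_SPS_FLAG_PCM_LOOP_FILTER_DISABLED;
    if (b->long_term_ref_pics_present_flag)
        flags |= SKETCH_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT;
    if (b->sps_temporal_mvp_enabled_flag)
        flags |= SKETCH_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED;
    if (b->strong_intra_smoothing_enabled_flag)
        flags |= SKETCH_SPS_FLAG_STRONG_INTRA_SMOOTHING;
    return flags;
}
```

Under these assumed positions, the Baseline B combination SAO|STRONG_INTRA_SMOOTHING evaluates to 0x108, which gives Phase 6 a single value to compare against the captured SPS bytes.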

File 5: src/h265.h — re-enable

meson.build:73 currently has 'h265.h' commented out (# 'h265.h'). Uncomment it.

h265.h exposes only int h265_set_controls(...) declaration; the new helpers (h265_fill_decode_params, h265_fill_scaling_matrix, h265_init_device_controls) stay file-static.

File 6: src/meson.build — uncomment h265.c + h265.h

@@ -47,7 +47,7 @@ sources = [
 	'request_pool.c',
 	'cap_pool.c',
-#	'h265.c'
+	'h265.c'
 ]
@@ -70,7 +70,7 @@ headers = [
 	'cap_pool.h',
-#	'h265.h'
+	'h265.h'
 ]

File 7: src/context.c — extend device-init for HEVC (optional)

Decision (deferred to the Phase 6 implementer): either extend the device-init block at src/context.c:142-155 to also set the HEVC DECODE_MODE + START_CODE controls, or set them inside h265_set_controls on first call. Extending context.c would fire EINVAL on hantro-vpu-dec, same as the existing H.264 controls; that is auxiliary noise, intentionally swallowed by (void)v4l2_set_controls.

Lower-risk path: extend context.c's existing block (mirrors the existing pattern, minimal new code). The cost is cosmetic EINVAL noise on non-HEVC devices, which matches existing behavior. Phase 6 default: extend context.c.
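For comparison, the first-call-guard alternative can be sketched as below. This is illustrative only: struct hevc_ctx_sketch and set_device_controls_stub are hypothetical stand-ins for the driver context and the real control-setting call, and the stub always fails to model the swallowed-EINVAL case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context state; the real context has other fields. */
struct hevc_ctx_sketch {
    bool device_controls_set;
    int set_calls; /* instrumentation for this sketch only */
};

/* Stand-in for setting DECODE_MODE + START_CODE; always "fails" here
 * to model a device that answers EINVAL. */
int set_device_controls_stub(struct hevc_ctx_sketch *ctx)
{
    ctx->set_calls++;
    return -1;
}

/* Set the device-level controls at most once per context. */
void h265_init_device_controls_once(struct hevc_ctx_sketch *ctx)
{
    assert(ctx != NULL);
    if (ctx->device_controls_set)
        return;
    /* Result deliberately ignored, mirroring the (void) pattern. */
    (void)set_device_controls_stub(ctx);
    ctx->device_controls_set = true;
}
```

The guard is marked done even when the call fails, so a device that rejects the controls is asked once rather than on every frame; whether that is the right retry policy is left to the implementer.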

File 8: include/hevc-ctrls.h — leave as-is

The 9-line shim is harmless (per Phase 2 Bug 7 verify-only). NOT deleted in iter2 (lower-risk path; iter1 Phase 5 Nit 6 deferral continues).

Phase 6 implementation order

Phase 6 lands in 2 logical commits + optional fix-forward:

  1. Commit A (src/config.c HEVCMain break): 5-line diff. Verifies the substrate fix in isolation (Phase 3 Baseline D already proved it). Phase 7 partial verification: criteria 1 + 2 should pass (vainfo enum unchanged, vaCreateConfig SUCCESS); criteria 3 + 4 still fail because the picture.c reject is in place. Criterion 5 (MPEG-2 / H.264 regression hashes) does not exercise the HEVC path and should already pass.

  2. Commit B — h265.c rewrite + picture.c HEVCMain dispatch + slice_params accumulation + meson re-enable + surface.h extension + context.c device-init extension: the bulk of iter2 work. Phase 7 verification: all 5 criteria green.

  3. Commit C (optional) — fix-forward if Phase 7 surfaces a regression. Per memory/feedback_header_deletion_check.md, iter2 doesn't delete hevc-ctrls.h, so the iter1 Commit-D-style header-completeness oversight doesn't apply. Other fix-forward triggers are Phase 7 → Phase 4 loopback signals; pre-identified below.

Implementation strategy for Commit B: develop incrementally inside h265.c with printf instrumentation showing each per-frame fill (SPS struct hex dump, PPS, decode_params, slice_params count, scaling_matrix presence). After build passes and mpv-vaapi runs without crash, decode 2 frames and compare HW vs SW JPEG hashes. Iterate until match. Strip instrumentation at close (per phase8_iteration1_close.md iter1 sweep precedent).

Phase 7 verification harness

Re-uses iter1's 5-criterion shape with HEVC fixture substituted. All 5 run in one pass; raw output captured to phase0_evidence/2026-05-08-or-later/iter2_phase7/.

# Re-build + install
ssh fresnel '
cd ~/src/libva-v4l2-request-fourier
git pull --ff-only
ninja -C build && sudo ninja -C build install
sha256sum /usr/lib/dri/v4l2_request_drv_video.so
'

# Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec bind
ssh fresnel '
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
vainfo --display drm --device /dev/dri/renderD128 2>&1 | \
  grep -E "VAProfileHEVCMain"
'

# Criteria 2 + 3: vaCreateConfig + ffmpeg-direct decode
ssh fresnel '
mkdir -p /tmp/iter2_phase7
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
LIBVA_TRACE=/tmp/iter2_phase7/libva.trace \
ffmpeg -hide_banner -loglevel info -hwaccel vaapi \
  -i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 5 -f null -
'
# Expected: exit 0, no Failed-to-create-decode-config, libva trace
# shows vaCreateConfig SUCCESS, no EINVAL on S_EXT_CTRLS.

# Criterion 4: DMA-BUF GL HW vs SW byte-identical at +02s
ssh fresnel '
mkdir -p /tmp/iter2_phase7/png_hw /tmp/iter2_phase7/png_sw
WAYLAND_DISPLAY=wayland-0 XDG_RUNTIME_DIR=/run/user/1000 \
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
    --no-input-default-bindings --start=00:00:02 \
    --vo-image-outdir=/tmp/iter2_phase7/png_hw \
    ~/fourier-test/bbb_720p10s_hevc.mp4

mpv --hwdec=no --frames=2 --vo=image --no-audio \
    --no-input-default-bindings --start=00:00:02 \
    --vo-image-outdir=/tmp/iter2_phase7/png_sw \
    ~/fourier-test/bbb_720p10s_hevc.mp4

sha256sum /tmp/iter2_phase7/png_hw/*.jpg /tmp/iter2_phase7/png_sw/*.jpg
'
# Expected: HW frame 1 hash == SW frame 1 hash; HW frame 2 hash ==
# SW frame 2 hash; frame 1 hash != frame 2 hash (real motion).
# Per memory feedback_rockchip_pixel_verify_path.md — DMA-BUF GL is
# the cache-coherency-safe verifier; do NOT use ffmpeg-vaapi+hwdownload
# (cache-stale class on RK3399 for both H.264 + MPEG-2; HEVC expected same).

# Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
ssh fresnel '
# H.264 (T4 reference)
mkdir -p /tmp/iter2_phase7/h264_hw /tmp/iter2_phase7/h264_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
    --no-input-default-bindings --start=00:00:30 \
    --vo-image-outdir=/tmp/iter2_phase7/h264_hw \
    ~/fourier-test/bbb_1080p30_h264.mp4
mpv --hwdec=no --frames=2 --vo=image --no-audio \
    --no-input-default-bindings --start=00:00:30 \
    --vo-image-outdir=/tmp/iter2_phase7/h264_sw \
    ~/fourier-test/bbb_1080p30_h264.mp4

# MPEG-2 (iter1 reference)
mkdir -p /tmp/iter2_phase7/mpeg2_hw /tmp/iter2_phase7/mpeg2_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
    --no-input-default-bindings --start=00:00:02 \
    --vo-image-outdir=/tmp/iter2_phase7/mpeg2_hw \
    ~/fourier-test/bbb_720p10s_mpeg2.ts
mpv --hwdec=no --frames=2 --vo=image --no-audio \
    --no-input-default-bindings --start=00:00:02 \
    --vo-image-outdir=/tmp/iter2_phase7/mpeg2_sw \
    ~/fourier-test/bbb_720p10s_mpeg2.ts

sha256sum /tmp/iter2_phase7/h264_hw/*.jpg /tmp/iter2_phase7/h264_sw/*.jpg \
          /tmp/iter2_phase7/mpeg2_hw/*.jpg /tmp/iter2_phase7/mpeg2_sw/*.jpg
'
# Expected:
#   H.264 frames at +30s: f623d5f7... (frame 1) and 7d7bc6f2... (frame 2)
#   MPEG-2 frames at +02s: 6e7873030dbf... (frame 1) and ccc7ce08810d... (frame 2)

# Bonus byte-compare: post-fix S_EXT_CTRLS payload vs Baseline B verbatim
ssh fresnel '
mkdir -p /tmp/iter2_phase7/cross
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
strace -ff -tt -y -v -s 8192 -e trace=ioctl \
  -o /tmp/iter2_phase7/cross/ffmpeg.strace \
  ffmpeg -hide_banner -loglevel error -hwaccel vaapi \
    -i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 2 -f null -
grep "VIDIOC_S_EXT_CTRLS.*ctrl_class=0xf010000.*count=5" \
  /tmp/iter2_phase7/cross/ffmpeg.strace.* | head -2
'
# Expected per Baseline B: per frame, count=5 with ids 0xa40a90/91/92/93/94
# in order; SPS bytes for first 40 should match Baseline B's BBB-SPS verbatim
# (1280x720, 8-bit, 4:2:0, flags=SAO|STRONG_INTRA_SMOOTHING).
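The expected batch shape (count=5, ids in ascending order, slice_params sized by slice count) can be sketched as a small builder. Assumptions: struct ctrl_entry_sketch is a simplified stand-in for the kernel's ext-control entry, not the real UAPI struct; only 0xa40a92 = slice_params is stated in this plan, and the mapping of the other four ids to SPS/PPS/SCALING_MATRIX/DECODE_PARAMS is assumed and should be checked against linux/v4l2-controls.h. The scaling-matrix size is left as an input because the plan does not fix it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ID_HEVC_SPS            = 0xa40a90, /* assumed mapping */
    ID_HEVC_PPS            = 0xa40a91, /* assumed mapping */
    ID_HEVC_SLICE_PARAMS   = 0xa40a92, /* from the plan */
    ID_HEVC_SCALING_MATRIX = 0xa40a93, /* assumed mapping */
    ID_HEVC_DECODE_PARAMS  = 0xa40a94  /* assumed mapping */
};

/* Fixed sizes quoted in the plan (Baseline B). */
#define SPS_SIZE           40u
#define PPS_SIZE           64u
#define DECODE_PARAMS_SIZE 328u

/* Simplified stand-in for one ext-control entry. */
struct ctrl_entry_sketch {
    uint32_t id;
    uint32_t size; /* payload size in bytes */
    const void *ptr;
};

struct batch_inputs_sketch {
    const void *sps, *pps, *slices, *scaling, *decode;
    uint32_t slice_elem_size; /* sizeof one slice_params element */
    uint32_t num_slices;
    uint32_t scaling_size;    /* not fixed by the plan */
};

/* Fill out[0..4] in id order; returns the control count. */
unsigned int h265_build_batch_sketch(const struct batch_inputs_sketch *in,
                                     struct ctrl_entry_sketch out[5])
{
    unsigned int n = 0;

    assert(in != NULL && out != NULL);
    out[n++] = (struct ctrl_entry_sketch){ ID_HEVC_SPS, SPS_SIZE, in->sps };
    out[n++] = (struct ctrl_entry_sketch){ ID_HEVC_PPS, PPS_SIZE, in->pps };
    /* Dynamic array: byte size scales with the slice count. */
    out[n++] = (struct ctrl_entry_sketch){
        ID_HEVC_SLICE_PARAMS, in->slice_elem_size * in->num_slices,
        in->slices };
    out[n++] = (struct ctrl_entry_sketch){
        ID_HEVC_SCALING_MATRIX, in->scaling_size, in->scaling };
    out[n++] = (struct ctrl_entry_sketch){
        ID_HEVC_DECODE_PARAMS, DECODE_PARAMS_SIZE, in->decode };
    return n;
}
```

Logging each entry's id and size from such a builder during Phase 6 would give a line-for-line comparison target for the strace grep above.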

Pass/fail decision

All 5 criteria PASS → Phase 7 closes green; proceed to Phase 8 (memory update + close iter2).

Any criterion FAIL → Phase 7 → Phase 4 loopback per feedback_dev_process.md. Pre-identified loopback triggers:

  1. VIDIOC_S_EXT_CTRLS returns EINVAL post-fix on per-frame batch. Likely causes:

    • Struct size mismatch between iter2's stack-allocated structs and kernel-expected sizes. Mitigation: pahole against kernel UAPI; compare to Phase 3 Baseline B verbatim sizes (40 + 64 + 328 = 432 bytes for the fixed-size controls).
    • SCALING_MATRIX size encoding wrong (depends on whether kernel expects fixed or runtime-discovered size).
    • reserved fields not zeroed (memset was forgotten on a struct).
  2. HW pixel hashes differ from SW. Likely causes:

    • DPB ordering wrong (FFmpeg populates poc_st_curr_before/after in specific order; iter2's translation from VAAPI ReferenceFrames must match).
    • Slice_params bit_size or data_bit_offset off-by-N from NAL header byte alignment quirks (preserved logic from old h265.c, but the dynamic-array shape might affect slice boundaries).
    • SPS/PPS flags bitmask wrong bit position (e.g., _SAMPLE_ADAPTIVE_OFFSET is bit 3, not bit 4 — easy off-by-1).
    • SCALING_MATRIX values wrong (transcribed from spec rather than from Baseline B verbatim — per Lesson L2, this is the common trap).
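  3. mpv --hwdec=vaapi filters HEVC out (analogous to vaapi-copy filtering MPEG-2). Mitigation: per Phase 5 Q4 amendment in iter1, fall-forward to ffmpeg -vf hwdownload path; note that the Criterion 4 comments flag ffmpeg-vaapi+hwdownload as cache-stale-prone on RK3399, so hashes from that path would need separate confirmation. Less likely than for MPEG-2 because mpv-vaapi DID engage MPEG-2 in iter1.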

  4. iter1 MPEG-2 OR T4 H.264 regression. Bug 1 + picture.c HEVCMain dispatch must not touch MPEG-2 / H.264 paths. Mitigation: verify Phase 3 Baseline D-style scratch was scoped right; re-read the diffs against the dispatch tables.

  5. Slice_params dynamic-array submission shape rejected by kernel. Possible if the kernel expects the control's size field to carry something other than a byte count (for example, an element count). Mitigation: the cross-validator anchor in Phase 3 Baseline B has the verbatim size=N value for one frame's batch; iter2's submission must produce a matching size for a matching slice count. If the dynamic-array semantics remain unclear, FFmpeg v4l2_request_hevc.c:540-547 has the canonical pattern.

  6. SCALING_MATRIX availability detection wrong. iter2 assumes kernel always advertises (matches Baseline B). If on a different host (e.g., ohm) kernel doesn't advertise, the unconditional submission would fail. Mitigation: probe via VIDIOC_QUERY_EXT_CTRL at h265_init_device_controls; gate inclusion in batch on probe result. Defer this defensive path to Phase 6 if Phase 3 Baseline B is anchor enough.

  7. Latent bug B3 (h264.matrix_set=false writes inside h265.picture): for HEVC surfaces, byte 240 of the params union lands inside h265.picture (Phase 2 Bug 8 verified). RenderPicture's per-frame VAPictureParameterBufferType copy overwrites it. Iter1 Bug 8 documentation explains the masking; iter2 inherits the same masking via the ffmpeg-vaapi sender pattern (always sends VAPictureParameterBufferType per frame). If a VAAPI client renders a frame without sending per-frame picture params, iter2 won't catch it; this is the same latent bug as in iter1.

Out of scope (LOCKED for iter2)

  • VP9, VP8 work (iter3/iter4).
  • HEVC Main 10 (10-bit) profile.
  • HEVC Main Still Picture profile.
  • HEVC range extensions (SCC, REXT) — EXT_SPS_ST_RPS, EXT_SPS_LT_RPS controls.
  • HEVC tile / wavefront parallel processing — ENTRY_POINT_OFFSETS control.
  • Performance metrics (Phase 1+ separate iteration).
  • Long-duration HEVC stress (>10s).
  • Slice-mode decoding (SLICE_BASED decode mode) — rkvdec only does FRAME_BASED.
  • Phase 4 cross-cutting backlog items B1 (V4L2 device-discovery), B3 (BeginPicture profile-aware reset), B4 (context.c log suppression), B5 (vbv_buffer_size negotiation), L3 (vaDeriveImage cache-stale fix).
  • chromium-fourier 149 install on fresnel.
  • Upstream Linux engagement.
  • include/hevc-ctrls.h deletion (carries forward from iter1 Phase 5 Nit 6).

Phase 5 entry point

Phase 5 (second-model review) inputs: this plan + the Phase 3 Baseline B verbatim payloads. Per feedback_dev_process.md:

Goal, situation, measurements, plan get pasted into DokuWiki. Markus reviews and redacts, then initiates the handover to a fresh model instance. Claude does not curate the artifact going to the reviewer — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.

Concretely: artifacts to hand over are the four primary documents in this campaign repo (phase0_findings_iter2.md, phase2_iter2_situation.md, phase3_iter2_baseline.md, phase4_iter2_plan.md) plus the phase0_evidence/2026-05-08/iter2_phase3/ raw output. No summary, no executive overview, no "the gist is" framing — Markus has the raw bundle, the reviewer reads it directly.

Per memory/feedback_review_empirical_over_theoretical.md: when the reviewer flags a numerical mismatch, the right response is "I'll empirically check during Phase 7" — NOT a same-day source-read rebuttal.

Predicted iter2 outcome

The fix is structurally larger than iter1 (10 contract clauses vs 6) but bounded:

  • Trivial: Bugs 1, 8, 9 (config break + meson re-enable + dispatch) total ~15 lines.
  • Substantial: Bugs 3, 4, 5, 7, 10 (h265.c rewrite + DECODE_PARAMS + SCALING_MATRIX + slice_params dynamic-array + per-slice accumulation in picture.c) — ~400 lines combined.

Expected Phase 7 outcome: criteria 1+2 pass after Commit A. Criteria 3+4+5 pass after Commit B. Likely 1-2 Phase 7 → Phase 4 loopbacks for off-by-one bit positions in flags bitmasks or DPB ordering nuances. Phase 8 close estimated to land 4-6 commits on the fork (vs iter1's 4).

If a major surprise fires (e.g., slice_params dynamic-array submission requires a different ioctl path, or scaling_matrix structure differs significantly between FFmpeg and kernel UAPI), Phase 7 → Phase 4 → Phase 2 loopback to source-read deeper. Substrate is well-understood; major surprises unlikely.