iter2 Phase 4: plan — 10 contract clauses, ~400-line h265.c rewrite

Phase 4 plan for iter2 HEVC fix. Structured per the
feedback_dev_process.md Phase 6 contract-before-code worked example
(0012-h264-omit-scaling-matrix-frame-based.patch shape): contract
clauses with citations first, then code changes mapping 1:1 to
clauses.

10 contract clauses cited from authoritative sources:

  Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS, count=5
    Authority: linux/v4l2-controls.h:2090-2300 (8 HEVC stateless CIDs)
    Reference impl: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
                    (v4l2_request_hevc_queue_decode)
    Empirical anchor: Phase 3 Baseline B verbatim payload

  Clause 2 — v4l2_ctrl_hevc_sps layout (40 bytes)
    Authority: linux/v4l2-controls.h:2096+ (struct + 9 SPS_FLAG_* bits)
    Field-by-field VAAPI source mapping table; existing
    h265_fill_sps logic preserved, just routed to flags bitmask
    Phase 3 Baseline B BBB SPS bytes: flags=SAO|STRONG_INTRA_SMOOTHING

  Clause 3 — v4l2_ctrl_hevc_pps layout (64 bytes, 19 flags)
    Authority: linux/v4l2-controls.h:2126-2150
    Field source: VAPictureParameterBufferHEVC + slice (for
                  dependent_slice_segment_flag)

  Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array)
    Authority: kernel exposes 0xa40a92 elems=1 dims=[600] dynamic-array
    Submission shape: size = sizeof(slice_params) * num_slices_in_frame
    Reference impl: FFmpeg v4l2_request_hevc.c:540-547
    BEHAVIORAL CHANGE: per-slice accumulation in codec_store_buffer
                      (replace overwrite with append-to-array)
    DPB MOVES OUT of slice_params to DECODE_PARAMS (Clause 6)

  Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional)
    Conditional on kernel availability (probed via VIDIOC_QUERY_EXT_CTRL
    at init), NOT on bitstream flag (Phase 3 baseline corrects Phase 2
    assumption)
    Spec defaults from ISO/IEC 23008-2 Table 4-1 when iqmatrix_set==false
    PROTOCOL: transcribe defaults from Phase 3 Baseline B verbatim
              SCALING_MATRIX bytes, NOT from spec recall (per
              memory feedback_review_empirical_over_theoretical.md)

  Clause 6 — v4l2_ctrl_hevc_decode_params layout (328 bytes)
    NEW in modern API (didn't exist in staging-era)
    Contains: DPB array (16 entries), POC, num_active_dpb_entries,
              num_poc_st_curr_before/after, num_poc_lt_curr,
              poc_st_curr_before[16], etc.
    Source: existing h265_fill_slice_params lines 269-315 logic
            preserved, routed to new struct

  Clause 7 — Device-wide DECODE_MODE + START_CODE menus
    Set once at init via v4l2_set_controls(...request_fd=-1, 2 ctrls)
    rkvdec accepts: FRAME_BASED + ANNEX_B (only options per kernel menu
                    constraints, Phase 0 v4l2_inventory)
    Default location: extend src/context.c:142-155 device-init block

  Clause 8 — config.c HEVCMain case must break;
    Authority: C semantics; iter1 Bug 1 pattern verbatim
    Empirical anchor: Phase 3 Baseline D scratch confirmed

  Clause 9 — picture.c::codec_set_controls HEVCMain dispatch
    Authority: existing MPEG-2 dispatch pattern at picture.c:186-191
    Replace the explicit reject ("Fourier-local: HEVC stripped") with an
    h265_set_controls call

  Clause 10 — Per-slice accumulation in codec_store_buffer
    HEVC slice_params dynamic-array source = per-RenderPicture appends
    BeginPicture resets num_slices=0; codec_store_buffer appends each
    VASliceParameterBufferType to slices[N] array

Diff scope (8 files):
  src/config.c     — 5-line break addition (Clause 8)
  src/picture.c    — HEVCMain dispatch (Clause 9) + per-slice
                     accumulation (Clause 10) + BeginPicture
                     num_slices reset, ~25 lines
  src/surface.h    — extend params.h265 with slices[64] +
                     num_slices, ~17 KB extra per surface union
  src/h265.c       — full rewrite ~400 lines (Clauses 2-7)
  src/h265.h       — re-enable
  src/meson.build  — uncomment h265.c + h265.h
  src/context.c    — extend device-init for HEVC DECODE_MODE +
                     START_CODE
  include/hevc-ctrls.h — leave as-is (9-line shim, lower-risk path
                          per iter1 Phase 5 Nit 6 deferral)

Phase 6 implementation order (2 logical commits + optional fix-forward):
  A: src/config.c HEVCMain break only (substrate fix in isolation;
     Phase 3 Baseline D already verified collateral safe)
  B: h265.c rewrite + picture.c dispatch + slice_params accumulation +
     meson re-enable + surface.h extension + context.c device-init
  C: optional fix-forward if Phase 7 surfaces a regression

Phase 7 verification harness (full Bash incantations in plan body):
  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec
  Criterion 2: vaCreateConfig(VAProfileHEVCMain) = SUCCESS via libva trace
  Criterion 3: ffmpeg -hwaccel vaapi exit 0, no Failed-to-create
  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s; HW=SW byte-identical
              (DMA-BUF GL cache-coherency-safe path per memory
              feedback_rockchip_pixel_verify_path.md)
  Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
  Bonus: byte-compare post-fix S_EXT_CTRLS payload vs Baseline B

Pre-identified Phase 7 → Phase 4 loopback triggers:
  1. S_EXT_CTRLS EINVAL post-fix → check struct sizes (pahole),
     reserved zeroing, SCALING_MATRIX size encoding
  2. HW pixel hash mismatch → DPB ordering, slice_params bit_offset,
     SPS/PPS flags bit positions, SCALING_MATRIX values
  3. mpv --hwdec=vaapi filters HEVC out → fall-forward to ffmpeg
     -vf hwdownload (less likely; vaapi engaged MPEG-2 in iter1)
  4. iter1/T4 regression → verify diffs scoped right
  5. Slice_params dynamic-array submission shape rejected → cross-
     validator size encoding anchor
  6. SCALING_MATRIX availability detection wrong → defensive
     QUERY_EXT_CTRL probe in h265_init_device_controls
  7. Latent bug B3 hits HEVC differently than MPEG-2 → byte 240 in
     h265.picture; ffmpeg-vaapi sends VAPictureParameterBufferType
     per frame so masking holds

Out-of-scope (LOCKED): VP9/VP8; HEVC Main 10 / Main Still Picture /
range ext / tile-wavefront; perf metrics; long-duration stress;
SLICE_BASED decode mode (rkvdec FRAME_BASED only); Phase 4 cross-
cutting backlog (B1 device-discovery, B3 BeginPicture profile-aware,
B4 context.c log suppression, B5 vbv_buffer_size, L3 vaDeriveImage
cache-stale); chromium-fourier 149 install; upstream engagement;
hevc-ctrls.h deletion (Phase 5 Nit 6 lower-risk path continues).

Predicted Phase 8 close: 4-6 commits on the fork (vs iter1's 4).
Iter2 ~3x larger codebase delta than iter1 (mpeg2.c rewrite was
~120 lines; h265.c rewrite is ~400 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:56:51 +00:00
parent d35a247948
commit 348736eb63
@@ -0,0 +1,656 @@
# Iteration 2 — Phase 4 (plan)
Implementation plan for iter2 HEVC Main on rkvdec. Inputs:
- [`phase0_findings_iter2.md`](phase0_findings_iter2.md) — Phase 1 lock (5 boolean criteria).
- [`phase2_iter2_situation.md`](phase2_iter2_situation.md) — six bugs identified in HEVC path.
- [`phase3_iter2_baseline.md`](phase3_iter2_baseline.md) — substrate verified post-upgrade, HEVC cross-validator anchor captured (5-control per-frame batch).
Per `feedback_dev_process.md` Phase 6 contract-before-code: this plan opens with the contract clauses (kernel UAPI + FFmpeg reference + Phase 3 Baseline B verbatim citations), then specifies code changes that map 1:1 to those clauses.
## Phase 1 criteria (re-stated; no Phase 3 → Phase 1 loopback this time)
Per [`phase0_findings_iter2.md`](phase0_findings_iter2.md), all 5 criteria as locked. No Phase 3 surprises required adjustment (criterion 3 already anchored on ffmpeg-direct from the start, mirroring iter1's Phase 5 Q4 amendment).
1. **vainfo enumeration regression**: `VAProfileHEVCMain` continues to be listed on the rkvdec env binding. (Already passes; iter2 must not strip.)
2. **vaCreateConfig success**: `vaCreateConfig(VAProfileHEVCMain, VAEntrypointVLD)` returns `VA_STATUS_SUCCESS`. (Currently `VA_STATUS_ERROR_UNSUPPORTED_PROFILE = 12`.)
3. **End-to-end ffmpeg-direct decode**: `ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 5 -f null -` exits 0; libva trace shows `vaCreateConfig SUCCESS`; no `Failed to create decode configuration` lines; no `EINVAL` from `VIDIOC_S_EXT_CTRLS`.
4. **DMA-BUF GL HW=SW byte-identical at +02s**: 2 distinct frames hash-equal across HW (`mpv --hwdec=vaapi --vo=image`) and SW (`--hwdec=no`); frames 1 vs 2 hash-differ (real motion).
5. **Regression on iter1 MPEG-2 AND T4 H.264**: both prior-iteration cells continue to pass with their reference hashes.
## Contract clauses (cite-before-code)
### Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS with 5 controls
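**Authority**: Linux mainline `include/uapi/linux/v4l2-controls.h:2090-2300` defines the 8 HEVC stateless control IDs: 5 submitted per frame (SPS, PPS, SLICE_PARAMS, DECODE_PARAMS, plus SCALING_MATRIX when the kernel advertises it), 2 device-wide (DECODE_MODE, START_CODE), and 1 outside iter2 scope (ENTRY_POINT_OFFSETS):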
```c
#define V4L2_CID_STATELESS_HEVC_SPS (V4L2_CID_CODEC_STATELESS_BASE+400) /* 0xa40a90 */
#define V4L2_CID_STATELESS_HEVC_PPS (V4L2_CID_CODEC_STATELESS_BASE+401) /* 0xa40a91 */
#define V4L2_CID_STATELESS_HEVC_SLICE_PARAMS (V4L2_CID_CODEC_STATELESS_BASE+402) /* 0xa40a92 */
#define V4L2_CID_STATELESS_HEVC_SCALING_MATRIX (V4L2_CID_CODEC_STATELESS_BASE+403) /* 0xa40a93 */
#define V4L2_CID_STATELESS_HEVC_DECODE_PARAMS (V4L2_CID_CODEC_STATELESS_BASE+404) /* 0xa40a94 */
#define V4L2_CID_STATELESS_HEVC_DECODE_MODE (V4L2_CID_CODEC_STATELESS_BASE+405) /* 0xa40a95 */
#define V4L2_CID_STATELESS_HEVC_START_CODE (V4L2_CID_CODEC_STATELESS_BASE+406) /* 0xa40a96 */
#define V4L2_CID_STATELESS_HEVC_ENTRY_POINT_OFFSETS (V4L2_CID_CODEC_STATELESS_BASE+407) /* not iter2 — tile/wavefront */
```
**Reference implementation**: FFmpeg `libavcodec/v4l2_request_hevc.c:505-565` (`v4l2_request_hevc_queue_decode`) builds a 5-element `v4l2_ext_control` array and submits via `ff_v4l2_request_decode_frame` (single `VIDIOC_S_EXT_CTRLS` per frame).
**Empirical anchor**: Phase 3 Baseline B strace verbatim ([`phase3_iter2_baseline.md`](phase3_iter2_baseline.md) + `phase0_evidence/2026-05-08/iter2_phase3/ffmpeg_v4l2req.strace.*` gitignored) shows:
```
ioctl(/dev/video1, VIDIOC_S_EXT_CTRLS,
      {ctrl_class=0xf010000 /* union alias of `which`: V4L2_CTRL_WHICH_REQUEST_VAL */,
       count=5,
       controls=[
         {id=0xa40a90 SPS, size=40, ...},
         {id=0xa40a91 PPS, size=64, ...},
         {id=0xa40a92 SLICE_PARAMS, size=N, ...},   /* dynamic-array */
         {id=0xa40a93 SCALING_MATRIX, size=M, ...}, /* conditional on kernel availability */
         {id=0xa40a94 DECODE_PARAMS, size=328, ...}
       ]}) = 0
```
**Implication for iter2**: `h265_set_controls()` builds a 5-entry `struct v4l2_ext_control` array and submits via the existing `v4l2_set_controls(driver_data->video_fd, surface_object->request_fd, controls, 5)` API. One `VIDIOC_S_EXT_CTRLS` per frame, mirroring iter1 MPEG-2 + iter6/7/8 H.264 patterns.
### Clause 2 — `v4l2_ctrl_hevc_sps` field layout (40 bytes)
**Authority**: `<linux/v4l2-controls.h>:2096+` `struct v4l2_ctrl_hevc_sps`:
```c
struct v4l2_ctrl_hevc_sps {
    __u8  video_parameter_set_id;
    __u8  seq_parameter_set_id;
    __u16 pic_width_in_luma_samples;
    __u16 pic_height_in_luma_samples;
    __u8  bit_depth_luma_minus8;
    __u8  bit_depth_chroma_minus8;
    __u8  log2_max_pic_order_cnt_lsb_minus4;
    __u8  sps_max_dec_pic_buffering_minus1;
    __u8  sps_max_num_reorder_pics;
    __u8  sps_max_latency_increase_plus1;
    __u8  log2_min_luma_coding_block_size_minus3;
    __u8  log2_diff_max_min_luma_coding_block_size;
    __u8  log2_min_luma_transform_block_size_minus2;
    __u8  log2_diff_max_min_luma_transform_block_size;
    __u8  max_transform_hierarchy_depth_inter;
    __u8  max_transform_hierarchy_depth_intra;
    __u8  pcm_sample_bit_depth_luma_minus1;
    __u8  pcm_sample_bit_depth_chroma_minus1;
    __u8  log2_min_pcm_luma_coding_block_size_minus3;
    __u8  log2_diff_max_min_pcm_luma_coding_block_size;
    __u8  num_short_term_ref_pic_sets;
    __u8  num_long_term_ref_pics_sps;
    __u8  chroma_format_idc;
    __u8  sps_max_sub_layers_minus1;
    __u8  reserved[6];
    __u64 flags;
};
```
Total 40 bytes (verified against Phase 3 Baseline B verbatim payload size). 9 boolean fields collapsed into u64 `flags`:
```c
#define V4L2_HEVC_SPS_FLAG_SEPARATE_COLOUR_PLANE (1ULL << 0)
#define V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED (1ULL << 1)
#define V4L2_HEVC_SPS_FLAG_AMP_ENABLED (1ULL << 2)
#define V4L2_HEVC_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET (1ULL << 3)
#define V4L2_HEVC_SPS_FLAG_PCM_ENABLED (1ULL << 4)
#define V4L2_HEVC_SPS_FLAG_PCM_LOOP_FILTER_DISABLED (1ULL << 5)
#define V4L2_HEVC_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT (1ULL << 6)
#define V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED (1ULL << 7)
#define V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)
```
**VAAPI source mapping** (mostly preserved from current `src/h265.c::h265_fill_sps`, just routed to `flags` collapsed bitmask):
| New SPS field | Source: VAPictureParameterBufferHEVC `picture` |
|---|---|
| `pic_width_in_luma_samples` | `picture->pic_width_in_luma_samples` |
| `pic_height_in_luma_samples` | `picture->pic_height_in_luma_samples` |
| `bit_depth_luma_minus8` | `picture->bit_depth_luma_minus8` |
| `bit_depth_chroma_minus8` | `picture->bit_depth_chroma_minus8` |
| `chroma_format_idc` | `picture->pic_fields.bits.chroma_format_idc` |
| `log2_max_pic_order_cnt_lsb_minus4` | `picture->log2_max_pic_order_cnt_lsb_minus4` |
| `sps_max_dec_pic_buffering_minus1` | `picture->sps_max_dec_pic_buffering_minus1` |
| `sps_max_num_reorder_pics` | 0 (current code hardcodes; VAAPI doesn't expose) |
| `sps_max_latency_increase_plus1` | 0 (same) |
| `log2_min_luma_coding_block_size_minus3` | `picture->log2_min_luma_coding_block_size_minus3` |
| `log2_diff_max_min_luma_coding_block_size` | `picture->log2_diff_max_min_luma_coding_block_size` |
| `log2_min_luma_transform_block_size_minus2` | `picture->log2_min_transform_block_size_minus2` |
| `log2_diff_max_min_luma_transform_block_size` | `picture->log2_diff_max_min_transform_block_size` |
| `max_transform_hierarchy_depth_inter/intra` | same fields in VAAPI |
| `pcm_sample_bit_depth_luma_minus1`, etc. | same fields |
| `num_short_term_ref_pic_sets` | `picture->num_short_term_ref_pic_sets` |
| `num_long_term_ref_pics_sps` | `picture->num_long_term_ref_pic_sps` |
| `sps_max_sub_layers_minus1` | 0 (VAAPI doesn't expose; placeholder) |
| `video_parameter_set_id` | 0 (VAAPI doesn't expose) |
| `seq_parameter_set_id` | 0 (VAAPI doesn't expose) |
| `flags` (OR of:) | |
| `_SEPARATE_COLOUR_PLANE` | `picture->pic_fields.bits.separate_colour_plane_flag` |
| `_SCALING_LIST_ENABLED` | `picture->pic_fields.bits.scaling_list_enabled_flag` |
| `_AMP_ENABLED` | `picture->pic_fields.bits.amp_enabled_flag` |
| `_SAMPLE_ADAPTIVE_OFFSET` | `picture->slice_parsing_fields.bits.sample_adaptive_offset_enabled_flag` |
| `_PCM_ENABLED` | `picture->pic_fields.bits.pcm_enabled_flag` |
| `_PCM_LOOP_FILTER_DISABLED` | `picture->pic_fields.bits.pcm_loop_filter_disabled_flag` |
| `_LONG_TERM_REF_PICS_PRESENT` | `picture->slice_parsing_fields.bits.long_term_ref_pics_present_flag` |
| `_SPS_TEMPORAL_MVP_ENABLED` | `picture->slice_parsing_fields.bits.sps_temporal_mvp_enabled_flag` |
| `_STRONG_INTRA_SMOOTHING_ENABLED` | `picture->pic_fields.bits.strong_intra_smoothing_enabled_flag` |
| `reserved[6]` | zero (via `memset`) |
**Phase 3 Baseline B verbatim sanity**: BBB SPS bytes decode to: 1280×720, 8-bit, 4:2:0, no PCM, flags=`SAMPLE_ADAPTIVE_OFFSET | STRONG_INTRA_SMOOTHING_ENABLED` (0x108). iter2 implementation must produce the same 40 bytes for this fixture (Phase 7 byte-compare check).
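
The boolean-to-bitmask collapse in the table above can be sketched in isolation. The bit positions are copied from the `V4L2_HEVC_SPS_FLAG_*` listing under local `SPS_FLAG_*` names so the sketch builds without recent kernel headers; `struct sps_bools` and `sps_collapse_flags` are hypothetical stand-ins for the VAAPI bitfields and the fill helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the bit positions from the SPS flag listing. */
#define SPS_FLAG_SEPARATE_COLOUR_PLANE          (1ULL << 0)
#define SPS_FLAG_SCALING_LIST_ENABLED           (1ULL << 1)
#define SPS_FLAG_AMP_ENABLED                    (1ULL << 2)
#define SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET         (1ULL << 3)
#define SPS_FLAG_PCM_ENABLED                    (1ULL << 4)
#define SPS_FLAG_PCM_LOOP_FILTER_DISABLED       (1ULL << 5)
#define SPS_FLAG_LONG_TERM_REF_PICS_PRESENT     (1ULL << 6)
#define SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED       (1ULL << 7)
#define SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)

/* Hypothetical stand-in for the nine VAAPI booleans in the mapping table. */
struct sps_bools {
    bool separate_colour_plane;
    bool scaling_list_enabled;
    bool amp_enabled;
    bool sample_adaptive_offset;
    bool pcm_enabled;
    bool pcm_loop_filter_disabled;
    bool long_term_ref_pics_present;
    bool sps_temporal_mvp_enabled;
    bool strong_intra_smoothing_enabled;
};

/* OR each set boolean into its bit of the u64 flags word. */
static uint64_t sps_collapse_flags(const struct sps_bools *b)
{
    uint64_t flags = 0;

    if (b->separate_colour_plane)          flags |= SPS_FLAG_SEPARATE_COLOUR_PLANE;
    if (b->scaling_list_enabled)           flags |= SPS_FLAG_SCALING_LIST_ENABLED;
    if (b->amp_enabled)                    flags |= SPS_FLAG_AMP_ENABLED;
    if (b->sample_adaptive_offset)         flags |= SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET;
    if (b->pcm_enabled)                    flags |= SPS_FLAG_PCM_ENABLED;
    if (b->pcm_loop_filter_disabled)       flags |= SPS_FLAG_PCM_LOOP_FILTER_DISABLED;
    if (b->long_term_ref_pics_present)     flags |= SPS_FLAG_LONG_TERM_REF_PICS_PRESENT;
    if (b->sps_temporal_mvp_enabled)       flags |= SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED;
    if (b->strong_intra_smoothing_enabled) flags |= SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED;
    return flags;
}
```

With only SAO and strong intra smoothing set, as in the BBB fixture, the helper yields bit 3 plus bit 8, i.e. the 0x108 quoted for Baseline B.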
### Clause 3 — `v4l2_ctrl_hevc_pps` field layout (64 bytes)
**Authority**: `<linux/v4l2-controls.h>:2150+` `struct v4l2_ctrl_hevc_pps`. Total 64 bytes. 19 boolean PPS fields collapsed into u64 `flags`:
```c
#define V4L2_HEVC_PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED (1ULL << 0)
#define V4L2_HEVC_PPS_FLAG_OUTPUT_FLAG_PRESENT (1ULL << 1)
#define V4L2_HEVC_PPS_FLAG_SIGN_DATA_HIDING_ENABLED (1ULL << 2)
#define V4L2_HEVC_PPS_FLAG_CABAC_INIT_PRESENT (1ULL << 3)
#define V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED (1ULL << 4)
#define V4L2_HEVC_PPS_FLAG_TRANSFORM_SKIP_ENABLED (1ULL << 5)
#define V4L2_HEVC_PPS_FLAG_CU_QP_DELTA_ENABLED (1ULL << 6)
#define V4L2_HEVC_PPS_FLAG_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT (1ULL << 7)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_PRED (1ULL << 8)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_BIPRED (1ULL << 9)
#define V4L2_HEVC_PPS_FLAG_TRANSQUANT_BYPASS_ENABLED (1ULL << 10)
#define V4L2_HEVC_PPS_FLAG_TILES_ENABLED (1ULL << 11)
#define V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED (1ULL << 12)
#define V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED (1ULL << 13)
#define V4L2_HEVC_PPS_FLAG_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED (1ULL << 14)
#define V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_OVERRIDE_ENABLED (1ULL << 15)
#define V4L2_HEVC_PPS_FLAG_PPS_DISABLE_DEBLOCKING_FILTER (1ULL << 16)
#define V4L2_HEVC_PPS_FLAG_LISTS_MODIFICATION_PRESENT (1ULL << 17)
#define V4L2_HEVC_PPS_FLAG_SLICE_SEGMENT_HEADER_EXTENSION_PRESENT (1ULL << 18)
```
**VAAPI source mapping**: extracted from BOTH `picture` (VAPictureParameterBufferHEVC) AND `slice` (VASliceParameterBufferHEVC for `dependent_slice_segment_flag`). The current `src/h265.c::h265_fill_pps` (lines 48-102) does the field extraction correctly; iter2 just collapses booleans into the new u64 `flags` bitmask:
| PPS field (old name; new location) | VAAPI source read by current `h265.c` |
|---|---|
| `pps->dependent_slice_segment_flag` (now `flags & DEPENDENT_SLICE_SEGMENT_ENABLED`) | `slice->LongSliceFlags.fields.dependent_slice_segment_flag` (line 54) |
| `pps->output_flag_present_flag` (now `flags & OUTPUT_FLAG_PRESENT`) | `picture->slice_parsing_fields.bits.output_flag_present_flag` |
| `pps->num_extra_slice_header_bits` (kept as field) | `picture->num_extra_slice_header_bits` |
| ... (17 more boolean field-to-flag conversions, mechanical) | (corresponding `picture` bitfields) |
| `pps->init_qp_minus26` (kept) | `picture->init_qp_minus26` |
| `pps->diff_cu_qp_delta_depth` (kept) | `picture->diff_cu_qp_delta_depth` |
| `pps->pps_cb_qp_offset` (kept) | `picture->pps_cb_qp_offset` |
| `pps->pps_cr_qp_offset` (kept) | `picture->pps_cr_qp_offset` |
| `pps->num_tile_columns_minus1` (kept) | `picture->num_tile_columns_minus1` |
| `pps->num_tile_rows_minus1` (kept) | `picture->num_tile_rows_minus1` |
| `pps->pps_beta_offset_div2` (kept) | `picture->pps_beta_offset_div2` |
| `pps->pps_tc_offset_div2` (kept) | `picture->pps_tc_offset_div2` |
| `pps->log2_parallel_merge_level_minus2` (kept) | `picture->log2_parallel_merge_level_minus2` |
| Fields added: `column_width_minus1[20]`, `row_height_minus1[22]`, `reserved` | populate from VAAPI (or zero if VAAPI doesn't expose) |
| `flags` u64 with the 19 bits OR'd | (mechanical boolean collapse) |
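
A compile-time layout check is a cheap guard against the EINVAL loopback trigger. The mirror below is an assumption: the fields named in the table are placed in the order believed to match the mainline header, and `pic_parameter_set_id` and `num_ref_idx_l0/l1_default_active_minus1` are added from memory of that header. It should be checked against `pahole` output before being trusted. The name `struct hevc_pps_mirror` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 64-byte PPS payload (assumed field order). */
struct hevc_pps_mirror {
    uint8_t  pic_parameter_set_id;
    uint8_t  num_extra_slice_header_bits;
    uint8_t  num_ref_idx_l0_default_active_minus1;
    uint8_t  num_ref_idx_l1_default_active_minus1;
    int8_t   init_qp_minus26;
    uint8_t  diff_cu_qp_delta_depth;
    int8_t   pps_cb_qp_offset;
    int8_t   pps_cr_qp_offset;
    uint8_t  num_tile_columns_minus1;
    uint8_t  num_tile_rows_minus1;
    uint8_t  column_width_minus1[20];
    uint8_t  row_height_minus1[22];
    int8_t   pps_beta_offset_div2;
    int8_t   pps_tc_offset_div2;
    uint8_t  log2_parallel_merge_level_minus2;
    uint8_t  reserved;
    uint64_t flags;
};

/* 10 scalar bytes + 20 + 22 + 4 trailing bytes = 56, then the u64 flags. */
_Static_assert(offsetof(struct hevc_pps_mirror, flags) == 56,
               "flags must start at byte 56");
_Static_assert(sizeof(struct hevc_pps_mirror) == 64,
               "PPS payload must be 64 bytes to match Baseline B");
```

The same pattern applied to the real `struct v4l2_ctrl_hevc_pps` would turn a header mismatch into a build failure rather than a runtime EINVAL.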
### Clause 4 — `v4l2_ctrl_hevc_slice_params` (variable; dynamic-array per frame)
**Authority**: `<linux/v4l2-controls.h>` `struct v4l2_ctrl_hevc_slice_params`. Contains per-slice info: bit_size, the slice data offset (`data_byte_offset` in the destaged mainline struct; the staging-era struct named it `data_bit_offset`), slice_type, slice_pic_order_cnt, slice flags, QP deltas, ref_idx_l0/l1[16], pred_weight_table, num_entry_point_offsets, slice_segment_addr, etc.
**Phase 0 inventory** confirms rkvdec advertises:
```
hevc_slice_parameters 0x00a40a92 (hevc-slice-params): elems=1 dims=[600] flags=has-payload, dynamic-array
```
So kernel accepts up to 600 slice_params entries per submission. iter2's bbb_720p10s_hevc.mp4 fixture is x265-ultrafast — typical 1 slice per frame; multi-slice would still fit in the 600-entry envelope.
**Submission shape**: `size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices_in_frame`. FFmpeg `libavcodec/v4l2_request_hevc.c:540-547` shows the pattern:
```c
if (ctx->max_slice_params && controls->num_slice_params) {
    control[count++] = (struct v4l2_ext_control) {
        .id = V4L2_CID_STATELESS_HEVC_SLICE_PARAMS,
        .ptr = controls->frame_slice_params,
        .size = sizeof(*controls->frame_slice_params) *
                FFMIN(controls->num_slice_params, ctx->max_slice_params),
    };
}
```
**libva backend behavioral change (NEW for iter2)**: VAAPI clients submit `VASliceParameterBufferType` once per slice via `vaRenderPicture`. The current `src/picture.c::codec_store_buffer:115-135` for HEVC `memcpy(&surface->params.h265.slice, …)` **overwrites** the previous slice's params. iter2 must change to **append**: each VASliceParameterBufferType arrival appends a new entry to a `params.h265.slices[N]` array, with `params.h265.num_slices++`. At end_picture, `h265_set_controls` reads the array and submits as one dynamic-array control.
**VAAPI source mapping**: existing `src/h265.c::h265_fill_slice_params` (lines 160-365) does the field extraction per-slice correctly. iter2 preserves the extraction logic (NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight) but routes per-slice into an array slot rather than a single struct.
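Critical: NAL header parsing at `h265.c:184-209` extracts `nal_unit_type` and `data_bit_offset` from the slice bitstream. **This logic is preserved**: the new V4L2 API still requires a per-slice `bit_size` and a slice data offset, and the new struct keeps both (they're per-slice metadata, not per-frame). One unit caveat to confirm against the installed header in Phase 6: the destaged mainline struct names the offset `data_byte_offset` and counts bytes, so the existing bit-offset result would need converting from bits to bytes before it is stored.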
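
For reference, the two-byte HEVC NAL unit header that this parsing step reads can be decoded as below. This is a minimal sketch of the standard header layout (1 forbidden bit, 6-bit `nal_unit_type`, 6-bit `nuh_layer_id`, 3-bit `nuh_temporal_id_plus1`), not a copy of the logic at `h265.c:184-209`. `struct hevc_nal_header` and `hevc_parse_nal_header` are hypothetical names, and the buffer is assumed to start at the header, after any Annex B start code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded header fields. */
struct hevc_nal_header {
    uint8_t nal_unit_type;         /* 6 bits */
    uint8_t nuh_layer_id;          /* 6 bits */
    uint8_t nuh_temporal_id_plus1; /* 3 bits */
};

/* Decode the 2-byte NAL unit header at buf (start code already skipped).
 * Returns 0 on success, -1 on short input or a set forbidden_zero_bit. */
static int hevc_parse_nal_header(const uint8_t *buf, size_t len,
                                 struct hevc_nal_header *out)
{
    if (buf == NULL || out == NULL || len < 2)
        return -1;
    if (buf[0] & 0x80)
        return -1; /* forbidden_zero_bit must be 0 */

    out->nal_unit_type = (buf[0] >> 1) & 0x3f;
    out->nuh_layer_id = (uint8_t)(((buf[0] & 0x01) << 5) | (buf[1] >> 3));
    out->nuh_temporal_id_plus1 = buf[1] & 0x07;
    return 0;
}
```

For example, the byte pair `0x26 0x01` decodes to `nal_unit_type` 19 (IDR_W_RADL) with layer 0 and temporal ID plus one equal to 1.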
**One group of fields MOVES OUT of slice_params**: the DPB array (`dpb[15]`) and `num_active_dpb_entries` / `num_rps_poc_st_curr_before/after` / `num_rps_poc_lt_curr` migrate to **DECODE_PARAMS** (Clause 6). iter2's per-slice fill no longer populates the DPB.
### Clause 5 — `v4l2_ctrl_hevc_scaling_matrix` (size M; conditional submission)
**Authority**: `<linux/v4l2-controls.h>` `struct v4l2_ctrl_hevc_scaling_matrix`. Contains 4 scaling lists (4×4, 8×8, 16×16, 32×32) for luma + chroma intra/inter — substantial struct.
**Conditional submission per FFmpeg pattern**: query kernel availability once at init via `VIDIOC_QUERY_EXT_CTRL` for the SCALING_MATRIX CID. If kernel advertises (rkvdec on fresnel does, per Phase 3 Baseline B), include in the per-frame batch unconditionally. If kernel doesn't advertise, omit.
**Phase 3 evidence**: BBB fixture's per-frame batch always contains SCALING_MATRIX (see Baseline B verbatim 30 occurrences across 5 frames + queries). FFmpeg gates on `ctx->has_scaling_matrix` set at init from `ff_v4l2_request_query_control_default_value(...SCALING_MATRIX)`. iter2 mirrors: probe at init, store boolean in the libva backend's per-context state, include in batch if true.
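**VAAPI source mapping**: `VAIQMatrixBufferHEVC` provides the four scaling lists (`ScalingList4x4[6][16]`, `ScalingList8x8[6][64]`, `ScalingList16x16[6][64]`, `ScalingList32x32[2][64]`) plus the DC lists (`ScalingListDC16x16[6]`, `ScalingListDC32x32[2]`). When `iqmatrix_set==true`, copy from the VAAPI struct to the V4L2 struct. When `iqmatrix_set==false`, populate with the HEVC spec default scaling matrices (cited here as ISO/IEC 23008-2 Table 4-1: flat 16 across all positions, with DC values 16). Treat the table number and the flat-16 values as unverified until they are checked against the Baseline B bytes.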
Phase 3 Baseline B SCALING_MATRIX verbatim payload not field-decoded yet (deferred to Phase 6 transcription); will compare bytes against backend-generated payload at Phase 7 verification time.
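
As a placeholder until the Baseline B bytes are decoded, the flat-16 branch can be sketched against a local mirror of the payload. The field names and the resulting 1000-byte size are assumptions based on the list dimensions quoted above (6×16 + 6×64 + 6×64 + 2×64 + 6 + 2); the plan itself leaves the size as M. `struct hevc_scaling_matrix_mirror` and `hevc_fill_flat16` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the scaling-matrix payload (assumed layout). */
struct hevc_scaling_matrix_mirror {
    uint8_t scaling_list_4x4[6][16];
    uint8_t scaling_list_8x8[6][64];
    uint8_t scaling_list_16x16[6][64];
    uint8_t scaling_list_32x32[2][64];
    uint8_t scaling_list_dc_coef_16x16[6];
    uint8_t scaling_list_dc_coef_32x32[2];
};

/* All members are bytes, so there is no padding: 96+384+384+128+6+2. */
_Static_assert(sizeof(struct hevc_scaling_matrix_mirror) == 1000,
               "scaling matrix mirror should be 1000 bytes");

/* iqmatrix_set == false branch under the plan's stated assumption that
 * the defaults are flat 16; to be replaced by the transcribed bytes. */
static void hevc_fill_flat16(struct hevc_scaling_matrix_mirror *m)
{
    memset(m, 16, sizeof(*m));
}
```

If the captured SCALING_MATRIX control turns out not to be 1000 bytes, or not uniformly 16, the capture wins and this sketch is wrong.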
### Clause 6 — `v4l2_ctrl_hevc_decode_params` field layout (328 bytes)
**Authority**: `<linux/v4l2-controls.h>` `struct v4l2_ctrl_hevc_decode_params`. NEW in modern API (didn't exist in staging-era). Contains:
- `pic_order_cnt_val` (s32) — current picture POC.
- `short_term_ref_pic_set_size`, `long_term_ref_pic_set_size` — RPS sizes.
- `num_active_dpb_entries` — count of valid DPB entries.
- `num_poc_st_curr_before/after, num_poc_lt_curr` — short-term + long-term ref counts.
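- `poc_st_curr_before[16]`, `poc_st_curr_after[16]`, `poc_lt_curr[16]`: arrays for ref pic ordering (16 entries each, `V4L2_HEVC_DPB_ENTRIES_NUM_MAX`; 8-entry arrays cannot account for the 328-byte total).
- `dpb[16]`: DPB entries, `{timestamp, flags, field_pic, reserved, pic_order_cnt_val}` per entry.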
- `flags` (u64) — `IRAP_PIC`, `IDR_PIC`, `NO_OUTPUT_OF_PRIOR_PICS`, etc.
Total **328 bytes** (verified against Phase 3 Baseline B verbatim payload size).
**VAAPI source mapping**: largely preserved from current `src/h265.c::h265_fill_slice_params` lines 269-315 (DPB iteration over `picture->ReferenceFrames[15]`), just routed to a new struct. The existing logic for `dpb[i].timestamp`, `dpb[i].rps`, `dpb[i].pic_order_cnt[0]`, `field_pic` migrates to `decode_params.dpb[i]`, with renames where the new entry layout differs (the entry listed above carries `pic_order_cnt_val` and `flags` rather than `pic_order_cnt[0]` and `rps`). The DPB-counting logic (`num_rps_poc_st_curr_before/after, num_rps_poc_lt_curr`) migrates to the `num_poc_*` fields of decode_params.
**Submission**: per-frame, after SPS + PPS in the batch.
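
The 328-byte total can be cross-checked with a local mirror of the struct. The layout below is an assumption reconstructed from the field list above plus two members recalled from the mainline header (`num_delta_pocs_of_ref_rps_idx`, `reserved[3]`); it is a sizing aid, not a substitute for `pahole` on the real header. All `*_mirror` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEVC_DPB_ENTRIES_MAX 16 /* mirrors V4L2_HEVC_DPB_ENTRIES_NUM_MAX */

/* One DPB entry: 8 + 1 + 1 + 2 + 4 = 16 bytes. */
struct hevc_dpb_entry_mirror {
    uint64_t timestamp;
    uint8_t  flags;
    uint8_t  field_pic;
    uint16_t reserved;
    int32_t  pic_order_cnt_val;
};

/* Assumed layout of the per-frame decode parameters. */
struct hevc_decode_params_mirror {
    int32_t  pic_order_cnt_val;
    uint16_t short_term_ref_pic_set_size;
    uint16_t long_term_ref_pic_set_size;
    uint8_t  num_active_dpb_entries;
    uint8_t  num_poc_st_curr_before;
    uint8_t  num_poc_st_curr_after;
    uint8_t  num_poc_lt_curr;
    uint8_t  poc_st_curr_before[HEVC_DPB_ENTRIES_MAX];
    uint8_t  poc_st_curr_after[HEVC_DPB_ENTRIES_MAX];
    uint8_t  poc_lt_curr[HEVC_DPB_ENTRIES_MAX];
    uint8_t  num_delta_pocs_of_ref_rps_idx;
    uint8_t  reserved[3];
    struct hevc_dpb_entry_mirror dpb[HEVC_DPB_ENTRIES_MAX];
    uint64_t flags;
};

/* 12 header bytes + 3*16 POC bytes + 4 = 64, then 16*16 DPB, then flags. */
_Static_assert(sizeof(struct hevc_dpb_entry_mirror) == 16,
               "DPB entry should be 16 bytes");
_Static_assert(offsetof(struct hevc_decode_params_mirror, dpb) == 64,
               "DPB array should start at byte 64");
_Static_assert(sizeof(struct hevc_decode_params_mirror) == 328,
               "decode_params should match the 328-byte Baseline B size");
```

The arithmetic only reaches 328 with 16-entry POC arrays, which is why the array bounds matter when the fill code loops over them.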
### Clause 7 — Device-wide DECODE_MODE + START_CODE menu controls
**Authority**: `<linux/v4l2-controls.h>` defines:
```c
#define V4L2_CID_STATELESS_HEVC_DECODE_MODE (V4L2_CID_CODEC_STATELESS_BASE+405)
#define V4L2_CID_STATELESS_HEVC_START_CODE (V4L2_CID_CODEC_STATELESS_BASE+406)
enum v4l2_stateless_hevc_decode_mode {
    V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED,
    V4L2_STATELESS_HEVC_DECODE_MODE_FRAME_BASED,
};
enum v4l2_stateless_hevc_start_code {
    V4L2_STATELESS_HEVC_START_CODE_NONE,
    V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
};
```
**Phase 0 inventory** confirms fresnel rkvdec advertises:
```
hevc_decode_mode 0x00a40a95 (menu): min=1 max=1 default=1 (Frame-Based) flags=has-min-max
hevc_start_code 0x00a40a96 (menu): min=1 max=1 default=1 (Annex B Start Code) flags=has-min-max
```
So rkvdec accepts ONLY `FRAME_BASED` decode mode and `ANNEX_B` start code — same constraints as H.264 + MPEG-2. Set both at decoder init via `v4l2_set_controls(driver_data->video_fd, /* request_fd= */ -1, dev_ctrls, 2)` with values `FRAME_BASED` + `ANNEX_B`.
**Where to set**: extend `src/context.c:142-155`'s existing H.264 device-init block to also set HEVC's two device controls when context is HEVC-profile-bound. Current pattern: 2 ext_controls in one batched call with `request_fd=-1`. iter2 adds 2 more controls (or a separate call) for the HEVC variants.
Alternative: set them inside `h265_set_controls` once per context (with a "first call" guard). Cleaner location-wise but requires per-context state. Phase 6 implementer chooses.
### Clause 8 — `RequestCreateConfig` HEVCMain case must `break;`
**Authority**: C language semantics. `src/config.c:67` `case VAProfileHEVCMain:` falls through to `default:` (line 68) which returns the error. iter1 added `break;` for MPEG-2 cases; HEVCMain is the last case in the same fall-through bucket.
**Empirical anchor**: Phase 3 Baseline D verified the patch shape in scratch — adding `break;` for HEVCMain lets `vaCreateConfig` return `VA_STATUS_SUCCESS` without affecting iter1 MPEG-2 or T4 H.264 hashes.
**Fix shape**: 5 lines (case label preserved; comment + break added; matches iter1 Commit A pattern verbatim).
### Clause 9 — `picture.c::codec_set_controls` HEVCMain dispatch
**Authority**: existing `src/picture.c:186-191` MPEG-2 dispatch pattern from iter1:
```c
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
    rc = mpeg2_set_controls(driver_data, context, surface_object);
    if (rc < 0)
        return VA_STATUS_ERROR_OPERATION_FAILED;
    break;
```
iter2 replaces the explicit `case VAProfileHEVCMain: return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;` (lines 204-206) with the same shape, dispatching to `h265_set_controls`. Comment updated to remove the stale `Fourier-local: HEVC stripped, no HW support on RK3566.` reference.
### Clause 10 — Per-slice accumulation in `codec_store_buffer`
**Authority**: HEVC kernel API requires per-slice slice_params (Clause 4). VAAPI clients submit `VASliceParameterBufferType` once per slice via `vaRenderPicture`. The current `src/picture.c:115-135` for HEVC `VASliceParameterBufferType` does:
```c
case VAProfileHEVCMain:
    memcpy(&surface_object->params.h265.slice, buffer_object->data, sizeof(...));
    break;
```
**Behavior change**: replace single-slot copy with array-append:
```c
case VAProfileHEVCMain:
    if (surface_object->params.h265.num_slices < HEVC_MAX_SLICES_PER_FRAME) {
        memcpy(&surface_object->params.h265.slices[surface_object->params.h265.num_slices],
               buffer_object->data,
               sizeof(VASliceParameterBufferHEVC));
        surface_object->params.h265.num_slices++;
    } else {
        /* exceeded array bound: log and drop; Phase 7 verification flags */
    }
    break;
```
`HEVC_MAX_SLICES_PER_FRAME` = e.g. 64 (kernel max is 600; conservative). For the BBB fixture this maxes at 1 per frame; the bound is for safety.
**At BeginPicture**: reset `num_slices = 0` per frame. Currently `picture.c:287` only resets `params.h264.matrix_set = false`; iter2 adds a `params.h265.num_slices = 0` reset alongside it. The reset is unconditional for now rather than switched on `config_object->profile`; this is benign for non-HEVC surfaces since the union aliasing puts `num_slices` in a region overwritten by RenderPicture's per-buffer copies.
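
The reset/append/overflow behaviour can be exercised in isolation with a small model. Everything here is hypothetical: `struct demo_slice_store` stands in for `params.h265`, the 264-byte record size is the plan's own estimate of `sizeof(VASliceParameterBufferHEVC)`, and the `dropped` counter is an addition of this sketch rather than part of the planned diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SLICES  64  /* plays the role of HEVC_MAX_SLICES_PER_FRAME */
#define DEMO_SLICE_BYTES 264 /* plan's estimate of the VAAPI slice struct */

/* Hypothetical stand-in for the slices[] + num_slices pair. */
struct demo_slice_store {
    unsigned char slices[DEMO_MAX_SLICES][DEMO_SLICE_BYTES];
    unsigned int  num_slices;
    unsigned int  dropped; /* sketch-only: counts slices that did not fit */
};

/* BeginPicture: start the frame with an empty slice array. */
static void demo_begin_picture(struct demo_slice_store *s)
{
    s->num_slices = 0;
    s->dropped = 0;
}

/* RenderPicture: append one slice record; refuse when full or mis-sized. */
static bool demo_store_slice(struct demo_slice_store *s,
                             const void *data, size_t size)
{
    if (size != DEMO_SLICE_BYTES || s->num_slices >= DEMO_MAX_SLICES) {
        s->dropped++;
        return false;
    }
    memcpy(s->slices[s->num_slices], data, size);
    s->num_slices++;
    return true;
}
```

Tracking dropped slices would let the end-of-picture path refuse to submit a truncated frame instead of decoding it silently; whether that is wanted is a Phase 6 decision.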
## Diff scope
### File 1: `src/config.c` — add `break;` for HEVCMain case (5 lines)
```diff
@@ -68,6 +68,11 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
// submission time.
break;
case VAProfileHEVCMain:
+ // fresnel-fourier iter2: HEVC enabled. Same shape as H.264/
+ // MPEG-2 above — no profile-specific config validation in the
+ // libva backend; validation happens at vaCreateContext /
+ // control submission time.
+ break;
default:
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
```
### File 2: `src/picture.c` — replace HEVCMain reject with dispatch + per-slice slice_params accumulation (~25 lines)
Two distinct changes:
(a) **Dispatch HEVCMain in `codec_set_controls`** (lines 204-206):
```diff
- case VAProfileHEVCMain:
- /* Fourier-local: HEVC stripped, no HW support on RK3566. */
- return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+ case VAProfileHEVCMain:
+ rc = h265_set_controls(driver_data, context, surface_object);
+ if (rc < 0)
+ return VA_STATUS_ERROR_OPERATION_FAILED;
+ break;
```
(b) **Per-slice accumulation in `codec_store_buffer`** (HEVC VASliceParameterBufferType case, lines 127-131):
```diff
- case VAProfileHEVCMain:
- memcpy(&surface_object->params.h265.slice,
- buffer_object->data,
- sizeof(surface_object->params.h265.slice));
- break;
+ case VAProfileHEVCMain: {
+ unsigned int n = surface_object->params.h265.num_slices;
+ if (n < HEVC_MAX_SLICES_PER_FRAME) {
+ memcpy(&surface_object->params.h265.slices[n],
+ buffer_object->data,
+ sizeof(VASliceParameterBufferHEVC));
+ surface_object->params.h265.num_slices = n + 1;
+ }
+ /* note: also keep .slice (singular) populated as last-slice
+ * mirror for h265_fill_pps which reads dependent_slice_segment_flag
+ * from VASliceParameterBufferHEVC->LongSliceFlags */
+ memcpy(&surface_object->params.h265.slice,
+ buffer_object->data,
+ sizeof(surface_object->params.h265.slice));
+ break;
+ }
```
(c) **Reset `num_slices` in `RequestBeginPicture`** at line 287:
```diff
surface_object->params.h264.matrix_set = false;
+ surface_object->params.h265.num_slices = 0;
```
### File 3: `src/surface.h` — extend `params.h265` to hold slice_params array
Add inside the `union { ... } params` block:
```diff
struct {
VAPictureParameterBufferHEVC picture;
VASliceParameterBufferHEVC slice;
+ VASliceParameterBufferHEVC slices[HEVC_MAX_SLICES_PER_FRAME];
+ unsigned int num_slices;
VAIQMatrixBufferHEVC iqmatrix;
bool iqmatrix_set;
} h265;
```
`HEVC_MAX_SLICES_PER_FRAME` = `64` defined in surface.h (or h265.h). Total memory cost: `sizeof(VASliceParameterBufferHEVC)` ≈ 264 bytes × 64 = ~17 KB extra per surface union — significant but acceptable.
Alternative (smaller memory): heap-allocate the `slices` array dynamically (malloc on first slice arrival, realloc on grow, free at surface destroy). More plumbing; revisit in a plan revision if Phase 7 surfaces memory concerns. iter2 default: fixed-size array of 64 embedded in the surface union.
### File 4: `src/h265.c` — full rewrite against new split API (~400 lines)
Per Clauses 2-7. The bulk of iter2 work. Structure mirrors current h265.c but routes to new struct layouts:
- `h265_fill_sps()` → fill `struct v4l2_ctrl_hevc_sps` (40 bytes, flags collapsed). ~40 lines.
- `h265_fill_pps()` → fill `struct v4l2_ctrl_hevc_pps` (64 bytes, flags collapsed). ~50 lines.
- `h265_fill_slice_params()` → fill ONE `struct v4l2_ctrl_hevc_slice_params` (per-slice; called from a loop in h265_set_controls over surface->params.h265.slices[]). ~80 lines (preserves NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight).
- **NEW** `h265_fill_decode_params()` → fill `struct v4l2_ctrl_hevc_decode_params` (328 bytes: DPB array, POC, num_active_dpb_entries, etc.). ~60 lines.
- **NEW** `h265_fill_scaling_matrix()` → fill `struct v4l2_ctrl_hevc_scaling_matrix` from `VAIQMatrixBufferHEVC` (or spec defaults if `iqmatrix_set==false`). ~30 lines.
- **NEW** `h265_init_device_controls()` → set DECODE_MODE + START_CODE menus once per context. ~15 lines. Called from h265_set_controls with first-call guard, OR from context.c device-init block.
- `h265_set_controls()` → orchestrator: build SPS, PPS, all slice_params (loop over array), DECODE_PARAMS, SCALING_MATRIX (conditional on init-time probe); submit batched. ~50 lines.
Plus the static const default scaling matrices (luma + chroma intra/inter, 4 × 64 bytes per scan-size with extra DC values) for the iqmatrix_set==false branch. Per Phase 5 Lesson L2 (`feedback_review_empirical_over_theoretical.md`): transcribe from Phase 3 Baseline B SCALING_MATRIX verbatim payload, NOT from spec recall. Phase 6 protocol: capture the BBB SCALING_MATRIX bytes via verbose strace, decode into the four 64-byte arrays, transcribe with byte-equality assertion.
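The byte-equality assertion in the transcription protocol above could look roughly like this. It is a sketch under assumptions: `first_mismatch` is a hypothetical helper, and the captured payload is assumed to be available as a plain byte array decoded from the strace output.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare a transcribed table against a captured reference payload.
 * Returns -1 when all `len` bytes are equal, otherwise the offset of
 * the first mismatching byte, so a failed transcription points at the
 * exact entry to re-check.
 */
static long first_mismatch(const unsigned char *transcribed,
                           const unsigned char *captured, size_t len)
{
    size_t i;

    assert(transcribed != NULL && captured != NULL);

    for (i = 0; i < len; i++) {
        if (transcribed[i] != captured[i])
            return (long)i;
    }
    return -1;
}
```

A Phase 6 build could call this once per 64-byte array and refuse to proceed on any return value other than -1. Reporting the offset rather than a bare boolean makes it cheaper to locate a single mistyped value.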
### File 5: `src/h265.h` — re-enable
`meson.build:73` currently has the `'h265.h'` entry commented out (`# 'h265.h'`). Uncomment it.
`h265.h` exposes only `int h265_set_controls(...)` declaration; the new helpers (`h265_fill_decode_params`, `h265_fill_scaling_matrix`, `h265_init_device_controls`) stay file-static.
### File 6: `src/meson.build` — uncomment h265.c + h265.h
```diff
@@ -47,7 +47,7 @@ sources = [
'request_pool.c',
'cap_pool.c',
-# 'h265.c'
+ 'h265.c'
]
@@ -70,7 +70,7 @@ headers = [
'cap_pool.h',
-# 'h265.h'
+ 'h265.h'
]
```
### File 7: `src/context.c` — extend device-init for HEVC (optional)
**Decision (defer to Phase 6 implementer)**: either extend the device-init block at `src/context.c:142-155` to also set the HEVC `DECODE_MODE` + `START_CODE` controls, or set them on the first call to `h265_set_controls`. The context.c option would return EINVAL on hantro-vpu-dec, same as the existing H.264 controls; that is auxiliary noise, intentionally swallowed by `(void)v4l2_set_controls`.
Lower-risk path: extend context.c's existing block (mirrors the existing pattern, minimal new code). The cost is cosmetic EINVAL noise on non-HEVC devices, which matches existing behavior. Phase 6 default: extend context.c.
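For comparison, the first-call alternative would need a per-context guard along these lines. This is only a sketch: `hevc_ctx_stub`, `set_controls_fn`, `failing_set_controls_stub` and `hevc_init_device_controls_once` are hypothetical names, and the callback stands in for the real control submission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context state; the real context struct differs. */
struct hevc_ctx_stub {
    bool device_controls_set;
};

/* Hypothetical callback standing in for the DECODE_MODE + START_CODE submission. */
typedef int (*set_controls_fn)(void *opaque);

/* Illustrative stub: counts calls and fails, like a device answering EINVAL. */
static int failing_set_controls_stub(void *opaque)
{
    int *calls = opaque;

    (*calls)++;
    return -1;
}

/*
 * Run the device-level control setup at most once per context. The result
 * is deliberately ignored, matching the "(void)v4l2_set_controls" pattern
 * described for the existing H.264 block.
 */
static void hevc_init_device_controls_once(struct hevc_ctx_stub *ctx,
                                           set_controls_fn set, void *opaque)
{
    assert(ctx != NULL && set != NULL);

    if (ctx->device_controls_set)
        return;

    (void)set(opaque);
    ctx->device_controls_set = true;
}
```

The guard flips even when the submission fails, so a device that rejects the controls is not retried on every frame. That choice is an assumption of this sketch, not something the plan prescribes.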
### File 8: `include/hevc-ctrls.h` — leave as-is
The 9-line shim is harmless (per Phase 2 Bug 7 verify-only). NOT deleted in iter2 (lower-risk path; iter1 Phase 5 Nit 6 deferral continues).
## Phase 6 implementation order
Phase 6 lands in 2 logical commits + optional fix-forward:
1. **Commit A — `src/config.c` HEVCMain break**: 5-line diff. Verifies the substrate fix in isolation (Phase 3 Baseline D already proved it). Phase 7 partial verification: criterion 1 + 2 should pass (vainfo enum unchanged, `vaCreateConfig` SUCCESS); criteria 3-5 still fail because picture.c reject is in place.
2. **Commit B — h265.c rewrite + picture.c HEVCMain dispatch + slice_params accumulation + meson re-enable + surface.h extension + context.c device-init extension**: the bulk of iter2 work. Phase 7 verification: all 5 criteria green.
3. **Commit C (optional)** — fix-forward if Phase 7 surfaces a regression. Per [`memory/feedback_header_deletion_check.md`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_header_deletion_check.md), iter2 doesn't delete `hevc-ctrls.h`, so the iter1 Commit-D-style header-completeness oversight doesn't apply. Other fix-forward triggers are Phase 7 → Phase 4 loopback signals; pre-identified below.
Implementation strategy for Commit B: develop incrementally inside h265.c with `printf` instrumentation showing each per-frame fill (SPS struct hex dump, PPS, decode_params, slice_params count, scaling_matrix presence). After build passes and mpv-vaapi runs without crash, decode 2 frames and compare HW vs SW JPEG hashes. Iterate until match. Strip instrumentation at close (per [`phase8_iteration1_close.md`](phase8_iteration1_close.md) iter1 sweep precedent).
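The per-struct hex dumps mentioned above need only a small formatter. The following is a minimal sketch; `hex_dump` is a hypothetical helper name, and it writes into a caller-supplied buffer so the caller decides whether to `printf` it or route it to the driver's logging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format `len` bytes of `data` as lowercase, space-separated hex into `out`.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * `out` is too small. The output needs 3 * len bytes (1 byte when len == 0).
 */
static int hex_dump(const void *data, size_t len, char *out, size_t out_size)
{
    const unsigned char *p = data;
    size_t need = len ? 3 * len : 1;
    size_t pos = 0;
    size_t i;

    assert(out != NULL);
    assert(data != NULL || len == 0);

    if (out_size < need)
        return -1;

    for (i = 0; i < len; i++)
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                i ? " %02x" : "%02x", p[i]);
    out[pos] = '\0';
    return (int)pos;
}
```

Dumping a 40-byte SPS this way needs a 120-byte buffer. The output format is close enough to strace's byte listing that the two can be compared by eye, though a scripted comparison would first have to normalize strace's escaping.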
## Phase 7 verification harness
Re-uses iter1's 5-criterion shape with HEVC fixture substituted. All 5 run in one pass; raw output captured to `phase0_evidence/2026-05-08-or-later/iter2_phase7/`.
```bash
# Re-build + install
ssh fresnel '
cd ~/src/libva-v4l2-request-fourier
git pull --ff-only
ninja -C build && sudo ninja -C build install
sha256sum /usr/lib/dri/v4l2_request_drv_video.so
'
# Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec bind
ssh fresnel '
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
vainfo --display drm --device /dev/dri/renderD128 2>&1 | \
grep -E "VAProfileHEVCMain"
'
# Criteria 2 + 3: vaCreateConfig + ffmpeg-direct decode
ssh fresnel '
mkdir -p /tmp/iter2_phase7
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
LIBVA_TRACE=/tmp/iter2_phase7/libva.trace \
ffmpeg -hide_banner -loglevel info -hwaccel vaapi \
-i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 5 -f null -
'
# Expected: exit 0, no Failed-to-create-decode-config, libva trace
# shows vaCreateConfig SUCCESS, no EINVAL on S_EXT_CTRLS.
# Criterion 4: DMA-BUF GL HW vs SW byte-identical at +02s
ssh fresnel '
mkdir -p /tmp/iter2_phase7/png_hw /tmp/iter2_phase7/png_sw
WAYLAND_DISPLAY=wayland-0 XDG_RUNTIME_DIR=/run/user/1000 \
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/png_hw \
~/fourier-test/bbb_720p10s_hevc.mp4
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/png_sw \
~/fourier-test/bbb_720p10s_hevc.mp4
sha256sum /tmp/iter2_phase7/png_hw/*.jpg /tmp/iter2_phase7/png_sw/*.jpg
'
# Expected: HW frame 1 hash == SW frame 1 hash; HW frame 2 hash ==
# SW frame 2 hash; frame 1 hash != frame 2 hash (real motion).
# Per memory feedback_rockchip_pixel_verify_path.md — DMA-BUF GL is
# the cache-coherency-safe verifier; do NOT use ffmpeg-vaapi+hwdownload
# (cache-stale class on RK3399 for both H.264 + MPEG-2; HEVC expected same).
# Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
ssh fresnel '
# H.264 (T4 reference)
mkdir -p /tmp/iter2_phase7/h264_hw /tmp/iter2_phase7/h264_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:30 \
--vo-image-outdir=/tmp/iter2_phase7/h264_hw \
~/fourier-test/bbb_1080p30_h264.mp4
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:30 \
--vo-image-outdir=/tmp/iter2_phase7/h264_sw \
~/fourier-test/bbb_1080p30_h264.mp4
# MPEG-2 (iter1 reference)
mkdir -p /tmp/iter2_phase7/mpeg2_hw /tmp/iter2_phase7/mpeg2_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/mpeg2_hw \
~/fourier-test/bbb_720p10s_mpeg2.ts
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/mpeg2_sw \
~/fourier-test/bbb_720p10s_mpeg2.ts
sha256sum /tmp/iter2_phase7/h264_hw/*.jpg /tmp/iter2_phase7/h264_sw/*.jpg \
/tmp/iter2_phase7/mpeg2_hw/*.jpg /tmp/iter2_phase7/mpeg2_sw/*.jpg
'
# Expected:
# H.264 frames at +30s: f623d5f7... (frame 1) and 7d7bc6f2... (frame 2)
# MPEG-2 frames at +02s: 6e7873030dbf... (frame 1) and ccc7ce08810d... (frame 2)
# Bonus byte-compare: post-fix S_EXT_CTRLS payload vs Baseline B verbatim
ssh fresnel '
mkdir -p /tmp/iter2_phase7/cross
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
strace -ff -tt -y -v -s 8192 -e trace=ioctl \
-o /tmp/iter2_phase7/cross/ffmpeg.strace \
ffmpeg -hide_banner -loglevel error -hwaccel vaapi \
-i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 2 -f null -
grep "VIDIOC_S_EXT_CTRLS.*ctrl_class=0xf010000.*count=5" \
/tmp/iter2_phase7/cross/ffmpeg.strace.* | head -2
'
# Expected per Baseline B: per frame, count=5 with ids 0xa40a90/91/92/93/94
# in order; the first 40 bytes of the SPS control should match Baseline B's BBB-SPS verbatim
# (1280x720, 8-bit, 4:2:0, flags=SAO|STRONG_INTRA_SMOOTHING).
```
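The Criterion 4 expectation (HW equals SW per frame, and the two frames differ) can be written down as a single predicate. This is a sketch with hypothetical names (`frame_hashes`, `criterion4_passes`); the digests are assumed to be the hex strings printed by `sha256sum`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hex digests of the two captured frames for one decode path. */
struct frame_hashes {
    const char *frame1;
    const char *frame2;
};

/*
 * Criterion 4 verdict: each HW frame must hash identically to the SW frame
 * at the same position, and frame 1 must differ from frame 2. Two identical
 * frames would mean the capture shows no motion and proves little.
 */
static bool criterion4_passes(const struct frame_hashes *hw,
                              const struct frame_hashes *sw)
{
    assert(hw != NULL && sw != NULL);

    if (strcmp(hw->frame1, sw->frame1) != 0)
        return false;
    if (strcmp(hw->frame2, sw->frame2) != 0)
        return false;
    if (strcmp(hw->frame1, hw->frame2) == 0)
        return false;
    return true;
}
```

Stating the motion check explicitly guards against a false green where both paths emit the same static or blank frame twice.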
## Pass/fail decision
All 5 criteria PASS → Phase 7 closes green; proceed to Phase 8 (memory update + close iter2).
Any criterion FAIL → Phase 7 → Phase 4 loopback per `feedback_dev_process.md`. Pre-identified loopback triggers:
1. **`VIDIOC_S_EXT_CTRLS` returns EINVAL post-fix on per-frame batch**. Likely causes:
- Struct size mismatch between iter2's stack-allocated structs and kernel-expected sizes. Mitigation: `pahole` against kernel UAPI; compare to Phase 3 Baseline B verbatim sizes (40 + 64 + 328 = 432 bytes for the fixed-size controls).
- SCALING_MATRIX size encoding wrong (depends on whether kernel expects fixed or runtime-discovered size).
- Reserved fields not zeroed (a `memset` was missed on one of the structs).
2. **HW pixel hashes differ from SW**. Likely causes:
- DPB ordering wrong (FFmpeg populates `poc_st_curr_before/after` in specific order; iter2's translation from VAAPI ReferenceFrames must match).
- Slice_params bit_size or data_bit_offset off-by-N from NAL header byte alignment quirks (preserved logic from old h265.c, but the dynamic-array shape might affect slice boundaries).
- SPS/PPS flags bitmask uses a wrong bit position (e.g., `_SAMPLE_ADAPTIVE_OFFSET` is bit 3, not bit 4; an easy off-by-one).
- SCALING_MATRIX values wrong (transcribed from spec rather than from Baseline B verbatim — per Lesson L2, this is the common trap).
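3. **mpv `--hwdec=vaapi` filters HEVC out** (analogous to vaapi-copy filtering MPEG-2). Mitigation: per the Phase 5 Q4 amendment in iter1, fall forward to the ffmpeg `-vf hwdownload` path, bearing in mind that the Criterion 4 notes flag ffmpeg-vaapi+hwdownload as cache-stale on RK3399, so hashes from that path would need separate justification. This trigger is less likely than it was for MPEG-2 because mpv-vaapi DID engage MPEG-2 in iter1.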
4. **iter1 MPEG-2 OR T4 H.264 regression**. Bug 1 + picture.c HEVCMain dispatch must not touch MPEG-2 / H.264 paths. Mitigation: verify Phase 3 Baseline D-style scratch was scoped right; re-read the diffs against the dispatch tables.
5. **Slice_params dynamic-array submission shape rejected by kernel**. Possible if kernel expects `count` as element count rather than `size` as bytes (the kernel UAPI might want a different size encoding). Mitigation: cross-validator anchor in Phase 3 Baseline B has the verbatim `size=N` value for one frame's batch; iter2's submission must produce a matching size for matching slice count. If dynamic-array semantics are confusing, FFmpeg `v4l2_request_hevc.c:540-547` has the canonical pattern.
6. **SCALING_MATRIX availability detection wrong**. iter2 assumes the kernel always advertises the control (matches Baseline B). If the kernel on a different host (e.g., ohm) does not advertise it, the unconditional submission would fail. Mitigation: probe via `VIDIOC_QUERY_EXT_CTRL` in `h265_init_device_controls`; gate inclusion in the batch on the probe result. **Defer this defensive path to Phase 6 if Phase 3 Baseline B is anchor enough**.
7. **Latent bug B3 (h264.matrix_set=false writes inside h265.picture)**. For HEVC surfaces, byte 240 of the `params` union lands inside `h265.picture` (Phase 2 Bug 8 verified). RenderPicture's per-frame `VAPictureParameterBufferType` copy overwrites it. The iter1 Bug 8 documentation explains the masking; iter2 inherits the same masking via the ffmpeg-vaapi sender pattern (always sends `VAPictureParameterBufferType` per frame). If a VAAPI client turns up that does not send picture params on every frame, iter2 won't catch it; this is the same latent bug as in iter1.
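The bit-position hazard in trigger 2 can be pinned down at compile time. The sketch below uses local, hypothetical mirrors of two `V4L2_HEVC_SPS_FLAG_*` bits rather than the kernel header: the SAO position (bit 3) is the one this plan names, while the strong-intra-smoothing position (bit 8) is an assumption to verify against `linux/v4l2-controls.h`. `sps_bits_stub` and `sps_collapse_flags` are illustrative names, not the `h265_fill_sps` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors for illustration only; verify against linux/v4l2-controls.h. */
#define SKETCH_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET (UINT64_C(1) << 3)
#define SKETCH_SPS_FLAG_STRONG_INTRA_SMOOTHING (UINT64_C(1) << 8) /* assumed */

/* Fail the build, not the decode, if a shift count is edited by mistake. */
static_assert(SKETCH_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET == 0x008,
              "SAO flag must be bit 3");
static_assert(SKETCH_SPS_FLAG_STRONG_INTRA_SMOOTHING == 0x100,
              "strong intra smoothing flag must be bit 8");

/* Hypothetical subset of the VAAPI-side booleans that feed the SPS flags. */
struct sps_bits_stub {
    bool sample_adaptive_offset_enabled;
    bool strong_intra_smoothing_enabled;
};

/* Collapse individual booleans into the single 64-bit flags field. */
static uint64_t sps_collapse_flags(const struct sps_bits_stub *in)
{
    uint64_t flags = 0;

    assert(in != NULL);

    if (in->sample_adaptive_offset_enabled)
        flags |= SKETCH_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET;
    if (in->strong_intra_smoothing_enabled)
        flags |= SKETCH_SPS_FLAG_STRONG_INTRA_SMOOTHING;
    return flags;
}
```

Under these assumed positions, the Baseline B combination `SAO|STRONG_INTRA_SMOOTHING` collapses to `0x108`, which gives one concrete value to compare against the captured SPS flags field.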
## Out of scope (LOCKED for iter2)
- VP9, VP8 work (iter3/iter4).
- HEVC Main 10 (10-bit) profile.
- HEVC Main Still Picture profile.
- HEVC range extensions (SCC, REXT) — `EXT_SPS_ST_RPS`, `EXT_SPS_LT_RPS` controls.
- HEVC tile / wavefront parallel processing — `ENTRY_POINT_OFFSETS` control.
- Performance metrics (Phase 1+ separate iteration).
- Long-duration HEVC stress (>10s).
- Slice-mode decoding (`SLICE_BASED` decode mode) — rkvdec only does FRAME_BASED.
- Phase 4 cross-cutting backlog items B1 (V4L2 device-discovery), B3 (BeginPicture profile-aware reset), B4 (context.c log suppression), B5 (vbv_buffer_size negotiation), L3 (vaDeriveImage cache-stale fix).
- chromium-fourier 149 install on fresnel.
- Upstream Linux engagement.
- `include/hevc-ctrls.h` deletion (carries forward from iter1 Phase 5 Nit 6).
## Phase 5 entry point
Phase 5 (second-model review) inputs: this plan + the Phase 3 Baseline B verbatim payloads. Per `feedback_dev_process.md`:
> Goal, situation, measurements, plan get pasted into DokuWiki. Markus reviews and redacts, then initiates the handover to a fresh model instance. Claude does not curate the artifact going to the reviewer — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.
Concretely: artifacts to hand over are the four primary documents in this campaign repo (`phase0_findings_iter2.md`, `phase2_iter2_situation.md`, `phase3_iter2_baseline.md`, `phase4_iter2_plan.md`) plus the `phase0_evidence/2026-05-08/iter2_phase3/` raw output. No summary, no executive overview, no "the gist is" framing — Markus has the raw bundle, the reviewer reads it directly.
Per [`memory/feedback_review_empirical_over_theoretical.md`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_review_empirical_over_theoretical.md): when the reviewer flags a numerical mismatch, the right response is "I'll empirically check during Phase 7" — NOT a same-day source-read rebuttal.
## Predicted iter2 outcome
The fix is structurally larger than iter1 (10 contract clauses vs 6) but bounded:
- Trivial: Bugs 1, 8, 9 (config break + meson re-enable + dispatch) total ~15 lines.
- Substantial: Bugs 3, 4, 5, 7, 10 (h265.c rewrite + DECODE_PARAMS + SCALING_MATRIX + slice_params dynamic-array + per-slice accumulation in picture.c) — ~400 lines combined.
Expected Phase 7 outcome: criteria 1+2 pass after Commit A. Criteria 3+4+5 pass after Commit B. Likely 1-2 Phase 7 → Phase 4 loopbacks for off-by-one bit positions in flags bitmasks or DPB ordering nuances. Phase 8 close estimated to land 4-6 commits on the fork (vs iter1's 4).
If a major surprise fires (e.g., slice_params dynamic-array submission requires a different ioctl path, or scaling_matrix structure differs significantly between FFmpeg and kernel UAPI), Phase 7 → Phase 4 → Phase 2 loopback to source-read deeper. Substrate is well-understood; major surprises unlikely.