fresnel-fourier/phase4_iter2_plan.md
claude-noether 348736eb63 iter2 Phase 4: plan — 10 contract clauses, ~400-line h265.c rewrite
Phase 4 plan for iter2 HEVC fix. Structured per the
feedback_dev_process.md Phase 6 contract-before-code worked example
(0012-h264-omit-scaling-matrix-frame-based.patch shape): contract
clauses with citations first, then code changes mapping 1:1 to
clauses.

10 contract clauses cited from authoritative sources:

  Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS, count=5
    Authority: linux/v4l2-controls.h:2090-2300 (8 HEVC stateless CIDs)
    Reference impl: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
                    (v4l2_request_hevc_queue_decode)
    Empirical anchor: Phase 3 Baseline B verbatim payload

  Clause 2 — v4l2_ctrl_hevc_sps layout (40 bytes)
    Authority: linux/v4l2-controls.h:2096+ (struct + 9 SPS_FLAG_* bits)
    Field-by-field VAAPI source mapping table; existing
    h265_fill_sps logic preserved, just routed to flags bitmask
    Phase 3 Baseline B BBB SPS bytes: flags=SAO|STRONG_INTRA_SMOOTHING

  Clause 3 — v4l2_ctrl_hevc_pps layout (64 bytes, 19 flags)
    Authority: linux/v4l2-controls.h:2126-2150
    Field source: VAPictureParameterBufferHEVC + slice (for
                  dependent_slice_segment_flag)

  Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array)
    Authority: kernel exposes 0xa40a92 elems=1 dims=[600] dynamic-array
    Submission shape: size = sizeof(slice_params) * num_slices_in_frame
    Reference impl: FFmpeg v4l2_request_hevc.c:540-547
    BEHAVIORAL CHANGE: per-slice accumulation in codec_store_buffer
                      (replace overwrite with append-to-array)
    DPB MOVES OUT of slice_params to DECODE_PARAMS (Clause 6)

  Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional)
    Conditional on kernel availability (probed via VIDIOC_QUERY_EXT_CTRL
    at init), NOT on bitstream flag (Phase 3 baseline corrects Phase 2
    assumption)
    Spec defaults from ISO/IEC 23008-2 Table 4-1 when iqmatrix_set==false
    PROTOCOL: transcribe defaults from Phase 3 Baseline B verbatim
              SCALING_MATRIX bytes, NOT from spec recall (per
              memory feedback_review_empirical_over_theoretical.md)

  Clause 6 — v4l2_ctrl_hevc_decode_params layout (328 bytes)
    NEW in modern API (didn't exist in staging-era)
    Contains: DPB array (16 entries), POC, num_active_dpb_entries,
              num_poc_st_curr_before/after, num_poc_lt_curr,
              poc_st_curr_before[8], etc.
    Source: existing h265_fill_slice_params lines 269-315 logic
            preserved, routed to new struct

  Clause 7 — Device-wide DECODE_MODE + START_CODE menus
    Set once at init via v4l2_set_controls(...request_fd=-1, 2 ctrls)
    rkvdec accepts: FRAME_BASED + ANNEX_B (only options per kernel menu
                    constraints, Phase 0 v4l2_inventory)
    Default location: extend src/context.c:142-155 device-init block

  Clause 8 — config.c HEVCMain case must break;
    Authority: C semantics; iter1 Bug 1 pattern verbatim
    Empirical anchor: Phase 3 Baseline D scratch confirmed

  Clause 9 — picture.c::codec_set_controls HEVCMain dispatch
    Authority: existing MPEG-2 dispatch pattern at picture.c:186-191
    Replace explicit Fourier-local: HEVC stripped reject with
    h265_set_controls call

  Clause 10 — Per-slice accumulation in codec_store_buffer
    HEVC slice_params dynamic-array source = per-RenderPicture appends
    BeginPicture resets num_slices=0; codec_store_buffer appends each
    VASliceParameterBufferType to slices[N] array

Diff scope (8 files):
  src/config.c     — 5-line break addition (Clause 8)
  src/picture.c    — HEVCMain dispatch (Clause 9) + per-slice
                     accumulation (Clause 10) + BeginPicture
                     num_slices reset, ~25 lines
  src/surface.h    — extend params.h265 with slices[64] +
                     num_slices, ~17 KB extra per surface union
  src/h265.c       — full rewrite ~400 lines (Clauses 2-7)
  src/h265.h       — re-enable
  src/meson.build  — uncomment h265.c + h265.h
  src/context.c    — extend device-init for HEVC DECODE_MODE +
                     START_CODE
  include/hevc-ctrls.h — leave as-is (9-line shim, lower-risk path
                          per iter1 Phase 5 Nit 6 deferral)

Phase 6 implementation order (2 logical commits + optional fix-forward):
  A: src/config.c HEVCMain break only (substrate fix in isolation;
     Phase 3 Baseline D already verified collateral safe)
  B: h265.c rewrite + picture.c dispatch + slice_params accumulation +
     meson re-enable + surface.h extension + context.c device-init
  C: optional fix-forward if Phase 7 surfaces a regression

Phase 7 verification harness (full Bash incantations in plan body):
  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec
  Criterion 2: vaCreateConfig(VAProfileHEVCMain) = SUCCESS via libva trace
  Criterion 3: ffmpeg -hwaccel vaapi exit 0, no Failed-to-create
  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s; HW=SW byte-identical
              (DMA-BUF GL cache-coherency-safe path per memory
              feedback_rockchip_pixel_verify_path.md)
  Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
  Bonus: byte-compare post-fix S_EXT_CTRLS payload vs Baseline B

Pre-identified Phase 7 → Phase 4 loopback triggers:
  1. S_EXT_CTRLS EINVAL post-fix → check struct sizes (pahole),
     reserved zeroing, SCALING_MATRIX size encoding
  2. HW pixel hash mismatch → DPB ordering, slice_params bit_offset,
     SPS/PPS flags bit positions, SCALING_MATRIX values
  3. mpv --hwdec=vaapi filters HEVC out → fall-forward to ffmpeg
     -vf hwdownload (less likely; vaapi engaged MPEG-2 in iter1)
  4. iter1/T4 regression → verify diffs scoped right
  5. Slice_params dynamic-array submission shape rejected → cross-
     validator size encoding anchor
  6. SCALING_MATRIX availability detection wrong → defensive
     QUERY_EXT_CTRL probe in h265_init_device_controls
  7. Latent bug B3 hits HEVC differently than MPEG-2 → byte 240 in
     h265.picture; ffmpeg-vaapi sends VAPictureParameterBufferType
     per frame so masking holds

Out-of-scope (LOCKED): VP9/VP8; HEVC Main 10 / Main Still Picture /
range ext / tile-wavefront; perf metrics; long-duration stress;
SLICE_BASED decode mode (rkvdec FRAME_BASED only); Phase 4 cross-
cutting backlog (B1 device-discovery, B3 BeginPicture profile-aware,
B4 context.c log suppression, B5 vbv_buffer_size, L3 vaDeriveImage
cache-stale); chromium-fourier 149 install; upstream engagement;
hevc-ctrls.h deletion (Phase 5 Nit 6 lower-risk path continues).

Predicted Phase 8 close: 4-6 commits on the fork (vs iter1's 4).
Iter2 ~3x larger codebase delta than iter1 (mpeg2.c rewrite was
~120 lines; h265.c rewrite is ~400 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:56:51 +00:00

# Iteration 2 — Phase 4 (plan)
Implementation plan for iter2 HEVC Main on rkvdec. Inputs:
- [`phase0_findings_iter2.md`](phase0_findings_iter2.md) — Phase 1 lock (5 boolean criteria).
- [`phase2_iter2_situation.md`](phase2_iter2_situation.md) — six bugs identified in HEVC path.
- [`phase3_iter2_baseline.md`](phase3_iter2_baseline.md) — substrate verified post-upgrade, HEVC cross-validator anchor captured (5-control per-frame batch).
Per `feedback_dev_process.md` Phase 6 contract-before-code: this plan opens with the contract clauses (kernel UAPI + FFmpeg reference + Phase 3 Baseline B verbatim citations), then specifies code changes that map 1:1 to those clauses.
## Phase 1 criteria (re-stated; no Phase 3 → Phase 1 loopback this time)
Per [`phase0_findings_iter2.md`](phase0_findings_iter2.md), all 5 criteria as locked. No Phase 3 surprises required adjustment (criterion 3 already anchored on ffmpeg-direct from the start, mirroring iter1's Phase 5 Q4 amendment).
1. **vainfo enumeration regression**: `VAProfileHEVCMain` continues to be listed on the rkvdec env binding. (Already passes; iter2 must not strip.)
2. **vaCreateConfig success**: `vaCreateConfig(VAProfileHEVCMain, VAEntrypointVLD)` returns `VA_STATUS_SUCCESS`. (Currently `VA_STATUS_ERROR_UNSUPPORTED_PROFILE = 12`.)
3. **End-to-end ffmpeg-direct decode**: `ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 5 -f null -` exits 0; libva trace shows `vaCreateConfig SUCCESS`; no `Failed to create decode configuration` lines; no `EINVAL` from `VIDIOC_S_EXT_CTRLS`.
4. **DMA-BUF GL HW=SW byte-identical at +02s**: 2 distinct frames hash-equal across HW (`mpv --hwdec=vaapi --vo=image`) and SW (`--hwdec=no`); frames 1 vs 2 hash-differ (real motion).
5. **Regression on iter1 MPEG-2 AND T4 H.264**: both prior-iteration cells continue to pass with their reference hashes.
## Contract clauses (cite-before-code)
### Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS with 5 controls
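**Authority**: Linux mainline `include/uapi/linux/v4l2-controls.h:2090-2300` defines 8 HEVC stateless control IDs: the 5 per-frame controls (SCALING_MATRIX is submitted only when the kernel advertises it), 2 device-wide menu controls, and 1 tile/wavefront control that iter2 does not use: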
```c
#define V4L2_CID_STATELESS_HEVC_SPS (V4L2_CID_CODEC_STATELESS_BASE+400) /* 0xa40a90 */
#define V4L2_CID_STATELESS_HEVC_PPS (V4L2_CID_CODEC_STATELESS_BASE+401) /* 0xa40a91 */
#define V4L2_CID_STATELESS_HEVC_SLICE_PARAMS (V4L2_CID_CODEC_STATELESS_BASE+402) /* 0xa40a92 */
#define V4L2_CID_STATELESS_HEVC_SCALING_MATRIX (V4L2_CID_CODEC_STATELESS_BASE+403) /* 0xa40a93 */
#define V4L2_CID_STATELESS_HEVC_DECODE_PARAMS (V4L2_CID_CODEC_STATELESS_BASE+404) /* 0xa40a94 */
#define V4L2_CID_STATELESS_HEVC_DECODE_MODE (V4L2_CID_CODEC_STATELESS_BASE+405) /* 0xa40a95 */
#define V4L2_CID_STATELESS_HEVC_START_CODE (V4L2_CID_CODEC_STATELESS_BASE+406) /* 0xa40a96 */
#define V4L2_CID_STATELESS_HEVC_ENTRY_POINT_OFFSETS (V4L2_CID_CODEC_STATELESS_BASE+407) /* not iter2 — tile/wavefront */
```
**Reference implementation**: FFmpeg `libavcodec/v4l2_request_hevc.c:505-565` (`v4l2_request_hevc_queue_decode`) builds a 5-element `v4l2_ext_control` array and submits via `ff_v4l2_request_decode_frame` (single `VIDIOC_S_EXT_CTRLS` per frame).
**Empirical anchor**: Phase 3 Baseline B strace verbatim ([`phase3_iter2_baseline.md`](phase3_iter2_baseline.md) + `phase0_evidence/2026-05-08/iter2_phase3/ffmpeg_v4l2req.strace.*` gitignored) shows:
```
ioctl(/dev/video1, VIDIOC_S_EXT_CTRLS,
{ctrl_class=0xf010000 /* V4L2_CTRL_WHICH_REQUEST_VAL */,
count=5,
controls=[
{id=0xa40a90 SPS, size=40, ...},
{id=0xa40a91 PPS, size=64, ...},
{id=0xa40a92 SLICE_PARAMS, size=N, ...}, /* dynamic-array */
{id=0xa40a93 SCALING_MATRIX, size=M, ...}, /* conditional on kernel availability */
{id=0xa40a94 DECODE_PARAMS, size=328, ...}
]}) = 0
```
**Implication for iter2**: `h265_set_controls()` builds a 5-entry `struct v4l2_ext_control` array and submits via the existing `v4l2_set_controls(driver_data->video_fd, surface_object->request_fd, controls, 5)` API. One `VIDIOC_S_EXT_CTRLS` per frame, mirroring iter1 MPEG-2 + iter6/7/8 H.264 patterns.
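The batch shape above can be modelled without kernel headers. The following is a minimal sketch, not the backend's code: `struct ext_ctrl_sketch`, `struct hevc_frame_payloads`, and `build_hevc_batch` are hypothetical names, and the struct is a reduced stand-in for `struct v4l2_ext_control` (id, size, pointer only). The fixed sizes 40, 64, and 328 are the Baseline B payload sizes quoted in this plan; the entry order follows the strace capture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Control IDs as listed in Clause 1. */
#define CID_HEVC_SPS            0xa40a90u
#define CID_HEVC_PPS            0xa40a91u
#define CID_HEVC_SLICE_PARAMS   0xa40a92u
#define CID_HEVC_SCALING_MATRIX 0xa40a93u
#define CID_HEVC_DECODE_PARAMS  0xa40a94u

/* Hypothetical, reduced stand-in for struct v4l2_ext_control. */
struct ext_ctrl_sketch {
    uint32_t id;
    uint32_t size;
    const void *ptr;
};

/* Hypothetical container for the payloads filled earlier in the frame. */
struct hevc_frame_payloads {
    const void *sps;             /* 40 bytes per Baseline B */
    const void *pps;             /* 64 bytes per Baseline B */
    const void *slices;          /* num_slices consecutive entries */
    size_t slice_size;           /* size of one slice_params entry */
    unsigned num_slices;
    const void *scaling_matrix;
    size_t scaling_matrix_size;
    const void *decode_params;   /* 328 bytes per Baseline B */
};

/* Fills out[] in the order seen in the strace; returns the entry count
 * (5 with SCALING_MATRIX, 4 without). */
static unsigned build_hevc_batch(const struct hevc_frame_payloads *p,
                                 int has_scaling_matrix,
                                 struct ext_ctrl_sketch out[5])
{
    unsigned n = 0;

    assert(p != NULL && out != NULL);
    out[n++] = (struct ext_ctrl_sketch){ .id = CID_HEVC_SPS, .size = 40, .ptr = p->sps };
    out[n++] = (struct ext_ctrl_sketch){ .id = CID_HEVC_PPS, .size = 64, .ptr = p->pps };
    /* Dynamic array: size scales with the number of slices in the frame. */
    out[n++] = (struct ext_ctrl_sketch){
        .id = CID_HEVC_SLICE_PARAMS,
        .size = (uint32_t)(p->slice_size * p->num_slices),
        .ptr = p->slices,
    };
    if (has_scaling_matrix)
        out[n++] = (struct ext_ctrl_sketch){
            .id = CID_HEVC_SCALING_MATRIX,
            .size = (uint32_t)p->scaling_matrix_size,
            .ptr = p->scaling_matrix,
        };
    out[n++] = (struct ext_ctrl_sketch){ .id = CID_HEVC_DECODE_PARAMS, .size = 328, .ptr = p->decode_params };
    return n;
}
```

In the real backend the returned count would be the last argument to `v4l2_set_controls`, so a kernel without SCALING_MATRIX gets a 4-entry batch rather than a 5-entry batch with a dead slot.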
### Clause 2 — `v4l2_ctrl_hevc_sps` field layout (40 bytes)
**Authority**: `<linux/v4l2-controls.h>:2096+` `struct v4l2_ctrl_hevc_sps`:
```c
struct v4l2_ctrl_hevc_sps {
__u8 video_parameter_set_id;
__u8 seq_parameter_set_id;
__u16 pic_width_in_luma_samples;
__u16 pic_height_in_luma_samples;
__u8 bit_depth_luma_minus8;
__u8 bit_depth_chroma_minus8;
__u8 log2_max_pic_order_cnt_lsb_minus4;
__u8 sps_max_dec_pic_buffering_minus1;
__u8 sps_max_num_reorder_pics;
__u8 sps_max_latency_increase_plus1;
__u8 log2_min_luma_coding_block_size_minus3;
__u8 log2_diff_max_min_luma_coding_block_size;
__u8 log2_min_luma_transform_block_size_minus2;
__u8 log2_diff_max_min_luma_transform_block_size;
__u8 max_transform_hierarchy_depth_inter;
__u8 max_transform_hierarchy_depth_intra;
__u8 pcm_sample_bit_depth_luma_minus1;
__u8 pcm_sample_bit_depth_chroma_minus1;
__u8 log2_min_pcm_luma_coding_block_size_minus3;
__u8 log2_diff_max_min_pcm_luma_coding_block_size;
__u8 num_short_term_ref_pic_sets;
__u8 num_long_term_ref_pics_sps;
__u8 chroma_format_idc;
__u8 sps_max_sub_layers_minus1;
__u8 reserved[6];
__u64 flags;
};
```
Total 40 bytes (verified against Phase 3 Baseline B verbatim payload size). 9 boolean fields collapsed into u64 `flags`:
```c
#define V4L2_HEVC_SPS_FLAG_SEPARATE_COLOUR_PLANE (1ULL << 0)
#define V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED (1ULL << 1)
#define V4L2_HEVC_SPS_FLAG_AMP_ENABLED (1ULL << 2)
#define V4L2_HEVC_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET (1ULL << 3)
#define V4L2_HEVC_SPS_FLAG_PCM_ENABLED (1ULL << 4)
#define V4L2_HEVC_SPS_FLAG_PCM_LOOP_FILTER_DISABLED (1ULL << 5)
#define V4L2_HEVC_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT (1ULL << 6)
#define V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED (1ULL << 7)
#define V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED (1ULL << 8)
```
**VAAPI source mapping** (mostly preserved from current `src/h265.c::h265_fill_sps`, just routed to `flags` collapsed bitmask):
| New SPS field | Source: VAPictureParameterBufferHEVC `picture` |
|---|---|
| `pic_width_in_luma_samples` | `picture->pic_width_in_luma_samples` |
| `pic_height_in_luma_samples` | `picture->pic_height_in_luma_samples` |
| `bit_depth_luma_minus8` | `picture->bit_depth_luma_minus8` |
| `bit_depth_chroma_minus8` | `picture->bit_depth_chroma_minus8` |
| `chroma_format_idc` | `picture->pic_fields.bits.chroma_format_idc` |
| `log2_max_pic_order_cnt_lsb_minus4` | `picture->log2_max_pic_order_cnt_lsb_minus4` |
| `sps_max_dec_pic_buffering_minus1` | `picture->sps_max_dec_pic_buffering_minus1` |
| `sps_max_num_reorder_pics` | 0 (current code hardcodes; VAAPI doesn't expose) |
| `sps_max_latency_increase_plus1` | 0 (same) |
| `log2_min_luma_coding_block_size_minus3` | `picture->log2_min_luma_coding_block_size_minus3` |
| `log2_diff_max_min_luma_coding_block_size` | `picture->log2_diff_max_min_luma_coding_block_size` |
| `log2_min_luma_transform_block_size_minus2` | `picture->log2_min_transform_block_size_minus2` |
| `log2_diff_max_min_luma_transform_block_size` | `picture->log2_diff_max_min_transform_block_size` |
| `max_transform_hierarchy_depth_inter/intra` | same fields in VAAPI |
| `pcm_sample_bit_depth_luma_minus1`, etc. | same fields |
| `num_short_term_ref_pic_sets` | `picture->num_short_term_ref_pic_sets` |
| `num_long_term_ref_pics_sps` | `picture->num_long_term_ref_pic_sps` |
| `sps_max_sub_layers_minus1` | 0 (VAAPI doesn't expose; placeholder) |
| `video_parameter_set_id` | 0 (VAAPI doesn't expose) |
| `seq_parameter_set_id` | 0 (VAAPI doesn't expose) |
| `flags` (OR of:) | |
| `_SEPARATE_COLOUR_PLANE` | `picture->pic_fields.bits.separate_colour_plane_flag` |
| `_SCALING_LIST_ENABLED` | `picture->pic_fields.bits.scaling_list_enabled_flag` |
| `_AMP_ENABLED` | `picture->pic_fields.bits.amp_enabled_flag` |
| `_SAMPLE_ADAPTIVE_OFFSET` | `picture->slice_parsing_fields.bits.sample_adaptive_offset_enabled_flag` |
| `_PCM_ENABLED` | `picture->pic_fields.bits.pcm_enabled_flag` |
| `_PCM_LOOP_FILTER_DISABLED` | `picture->pic_fields.bits.pcm_loop_filter_disabled_flag` |
| `_LONG_TERM_REF_PICS_PRESENT` | `picture->slice_parsing_fields.bits.long_term_ref_pics_present_flag` |
| `_SPS_TEMPORAL_MVP_ENABLED` | `picture->slice_parsing_fields.bits.sps_temporal_mvp_enabled_flag` |
| `_STRONG_INTRA_SMOOTHING_ENABLED` | `picture->pic_fields.bits.strong_intra_smoothing_enabled_flag` |
| `reserved[6]` | zero (via `memset`) |
**Phase 3 Baseline B verbatim sanity**: BBB SPS bytes decode to: 1280×720, 8-bit, 4:2:0, no PCM, flags=`SAMPLE_ADAPTIVE_OFFSET | STRONG_INTRA_SMOOTHING_ENABLED` (0x108). iter2 implementation must produce the same 40 bytes for this fixture (Phase 7 byte-compare check).
### Clause 3 — `v4l2_ctrl_hevc_pps` field layout (64 bytes)
**Authority**: `<linux/v4l2-controls.h>:2150+` `struct v4l2_ctrl_hevc_pps`. Total 64 bytes. 19 boolean PPS fields collapsed into u64 `flags`:
```c
#define V4L2_HEVC_PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED (1ULL << 0)
#define V4L2_HEVC_PPS_FLAG_OUTPUT_FLAG_PRESENT (1ULL << 1)
#define V4L2_HEVC_PPS_FLAG_SIGN_DATA_HIDING_ENABLED (1ULL << 2)
#define V4L2_HEVC_PPS_FLAG_CABAC_INIT_PRESENT (1ULL << 3)
#define V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED (1ULL << 4)
#define V4L2_HEVC_PPS_FLAG_TRANSFORM_SKIP_ENABLED (1ULL << 5)
#define V4L2_HEVC_PPS_FLAG_CU_QP_DELTA_ENABLED (1ULL << 6)
#define V4L2_HEVC_PPS_FLAG_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT (1ULL << 7)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_PRED (1ULL << 8)
#define V4L2_HEVC_PPS_FLAG_WEIGHTED_BIPRED (1ULL << 9)
#define V4L2_HEVC_PPS_FLAG_TRANSQUANT_BYPASS_ENABLED (1ULL << 10)
#define V4L2_HEVC_PPS_FLAG_TILES_ENABLED (1ULL << 11)
#define V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED (1ULL << 12)
#define V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED (1ULL << 13)
#define V4L2_HEVC_PPS_FLAG_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED (1ULL << 14)
#define V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_OVERRIDE_ENABLED (1ULL << 15)
#define V4L2_HEVC_PPS_FLAG_PPS_DISABLE_DEBLOCKING_FILTER (1ULL << 16)
#define V4L2_HEVC_PPS_FLAG_LISTS_MODIFICATION_PRESENT (1ULL << 17)
#define V4L2_HEVC_PPS_FLAG_SLICE_SEGMENT_HEADER_EXTENSION_PRESENT (1ULL << 18)
```
**VAAPI source mapping**: extracted from BOTH `picture` (VAPictureParameterBufferHEVC) AND `slice` (VASliceParameterBufferHEVC for `dependent_slice_segment_flag`). The current `src/h265.c::h265_fill_pps` (lines 48-102) does the field extraction correctly; iter2 just collapses booleans into the new u64 `flags` bitmask:
| New PPS field | VAAPI source (old h265.c location where noted) |
|---|---|
| `pps->dependent_slice_segment_flag` (now `flags & DEPENDENT_SLICE_SEGMENT_ENABLED`) | `slice->LongSliceFlags.fields.dependent_slice_segment_flag` (line 54) |
| `pps->output_flag_present_flag` (now `flags & OUTPUT_FLAG_PRESENT`) | `picture->slice_parsing_fields.bits.output_flag_present_flag` |
| `pps->num_extra_slice_header_bits` (kept as field) | `picture->num_extra_slice_header_bits` |
| ... (17 more boolean field-to-flag conversions, mechanical) | |
| `pps->init_qp_minus26` (kept) | `picture->init_qp_minus26` |
| `pps->diff_cu_qp_delta_depth` (kept) | `picture->diff_cu_qp_delta_depth` |
| `pps->pps_cb_qp_offset` (kept) | `picture->pps_cb_qp_offset` |
| `pps->pps_cr_qp_offset` (kept) | `picture->pps_cr_qp_offset` |
| `pps->num_tile_columns_minus1` (kept) | `picture->num_tile_columns_minus1` |
| `pps->num_tile_rows_minus1` (kept) | `picture->num_tile_rows_minus1` |
| `pps->pps_beta_offset_div2` (kept) | `picture->pps_beta_offset_div2` |
| `pps->pps_tc_offset_div2` (kept) | `picture->pps_tc_offset_div2` |
| `pps->log2_parallel_merge_level_minus2` (kept) | `picture->log2_parallel_merge_level_minus2` |
| Field added: `column_width_minus1[20]`, `row_height_minus1[22]`, `num_extra_slice_header_bits`, `reserved` | populate from VAAPI (or zero if VAAPI doesn't expose) |
| `flags` u64 with the 19 bits OR'd | (mechanical boolean collapse) |
### Clause 4 — `v4l2_ctrl_hevc_slice_params` (variable; dynamic-array per frame)
**Authority**: `<linux/v4l2-controls.h>` `struct v4l2_ctrl_hevc_slice_params`. Contains per-slice info: bit_size, data_bit_offset, slice_type, slice_pic_order_cnt, slice flags, QP deltas, ref_idx_l0/l1[15], pred_weight_table, num_entry_point_offsets, slice_segment_addr, etc.
**Phase 0 inventory** confirms rkvdec advertises:
```
hevc_slice_parameters 0x00a40a92 (hevc-slice-params): elems=1 dims=[600] flags=has-payload, dynamic-array
```
So kernel accepts up to 600 slice_params entries per submission. iter2's bbb_720p10s_hevc.mp4 fixture is x265-ultrafast — typical 1 slice per frame; multi-slice would still fit in the 600-entry envelope.
**Submission shape**: `size = sizeof(struct v4l2_ctrl_hevc_slice_params) * num_slices_in_frame`. FFmpeg `libavcodec/v4l2_request_hevc.c:540-547` shows the pattern:
```c
if (ctx->max_slice_params && controls->num_slice_params) {
control[count++] = (struct v4l2_ext_control) {
.id = V4L2_CID_STATELESS_HEVC_SLICE_PARAMS,
.ptr = controls->frame_slice_params,
.size = sizeof(*controls->frame_slice_params) *
FFMIN(controls->num_slice_params, ctx->max_slice_params),
};
}
```
**libva backend behavioral change (NEW for iter2)**: VAAPI clients submit `VASliceParameterBufferType` once per slice via `vaRenderPicture`. The current HEVC branch of `src/picture.c::codec_store_buffer` (lines 115-135) does `memcpy(&surface->params.h265.slice, …)`, which **overwrites** the previous slice's params. iter2 must **append** instead: each `VASliceParameterBufferType` arrival writes a new entry into a `params.h265.slices[N]` array and increments `params.h265.num_slices`. At end_picture, `h265_set_controls` reads the array and submits it as one dynamic-array control.
**VAAPI source mapping**: existing `src/h265.c::h265_fill_slice_params` (lines 160-365) does the field extraction per-slice correctly. iter2 preserves the extraction logic (NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight) but routes per-slice into an array slot rather than a single struct.
Critical: NAL header parsing at `h265.c:184-209` extracts `nal_unit_type` and `data_bit_offset` from the slice bitstream. **This logic is preserved** — the new V4L2 API still requires per-slice `bit_size` and `data_bit_offset`. The new struct keeps these fields (they're per-slice metadata, not per-frame).
**One group of fields MOVES OUT of slice_params**: the DPB array (`dpb[15]`) together with `num_active_dpb_entries` / `num_rps_poc_st_curr_before/after` / `num_rps_poc_lt_curr` migrates to **DECODE_PARAMS** (Clause 6). iter2's per-slice fill no longer populates the DPB.
### Clause 5 — `v4l2_ctrl_hevc_scaling_matrix` (size M; conditional submission)
**Authority**: `<linux/v4l2-controls.h>` `struct v4l2_ctrl_hevc_scaling_matrix`. Contains 4 scaling lists (4×4, 8×8, 16×16, 32×32) for luma + chroma intra/inter — substantial struct.
**Conditional submission per FFmpeg pattern**: query kernel availability once at init via `VIDIOC_QUERY_EXT_CTRL` for the SCALING_MATRIX CID. If kernel advertises (rkvdec on fresnel does, per Phase 3 Baseline B), include in the per-frame batch unconditionally. If kernel doesn't advertise, omit.
**Phase 3 evidence**: BBB fixture's per-frame batch always contains SCALING_MATRIX (see Baseline B verbatim 30 occurrences across 5 frames + queries). FFmpeg gates on `ctx->has_scaling_matrix` set at init from `ff_v4l2_request_query_control_default_value(...SCALING_MATRIX)`. iter2 mirrors: probe at init, store boolean in the libva backend's per-context state, include in batch if true.
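**VAAPI source mapping**: `VAIQMatrixBufferHEVC` provides the four scaling lists (`ScalingList4x4[6][16]`, `ScalingList8x8[6][64]`, `ScalingList16x16[6][64]`, `ScalingList32x32[2][64]`) plus the DC coefficients (`ScalingListDC16x16[6]`, `ScalingListDC32x32[2]`). When `iqmatrix_set==true`, copy from the VAAPI struct to the V4L2 struct. When `iqmatrix_set==false`, populate with flat 16 across all positions, with DC values 16, which is the scaling the HEVC spec applies when scaling lists are disabled. This is not the same as the spec's default scaling lists (ISO/IEC 23008-2 Tables 7-5 and 7-6, not Table 4-1): only the 4×4 defaults are flat, while the 8×8 and larger defaults are non-flat. The Phase 6 transcription from Baseline B bytes decides which values the backend actually emits.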
Phase 3 Baseline B SCALING_MATRIX verbatim payload not field-decoded yet (deferred to Phase 6 transcription); will compare bytes against backend-generated payload at Phase 7 verification time.
### Clause 6 — `v4l2_ctrl_hevc_decode_params` field layout (328 bytes)
**Authority**: `<linux/v4l2-controls.h>` `struct v4l2_ctrl_hevc_decode_params`. NEW in modern API (didn't exist in staging-era). Contains:
- `pic_order_cnt_val` (s32) — current picture POC.
- `short_term_ref_pic_set_size`, `long_term_ref_pic_set_size` — RPS sizes.
- `num_active_dpb_entries` — count of valid DPB entries.
- `num_poc_st_curr_before/after, num_poc_lt_curr` — short-term + long-term ref counts.
- `poc_st_curr_before[16]`, `poc_st_curr_after[16]`, `poc_lt_curr[16]` — POC arrays for ref pic ordering (16 entries each; 8-entry arrays would not add up to the 328-byte total below).
- `dpb[16]` — DPB entries: `{timestamp, flags, field_pic, pic_order_cnt_val, _padding}` per entry.
- `flags` (u64) — `IRAP_PIC`, `IDR_PIC`, `NO_OUTPUT_OF_PRIOR_PICS`, etc.
Total **328 bytes** (verified against Phase 3 Baseline B verbatim payload size).
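The 328-byte total can be cross-checked at compile time with a local mirror of the struct. This is a sketch from recollection of the mainline header, not a copy of it: the `_sketch` type names are hypothetical, and the `num_delta_pocs_of_ref_rps_idx` and `reserved` members are assumptions that should be confirmed against the installed `<linux/v4l2-controls.h>` (for example with pahole). The arithmetic only reaches 328 with 16-entry POC arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEVC_DPB_ENTRIES_SKETCH 16

/* Mirror of one DPB entry: 8 + 1 + 1 + 2 + 4 = 16 bytes. */
struct hevc_dpb_entry_sketch {
    uint64_t timestamp;
    uint8_t  flags;
    uint8_t  field_pic;
    uint16_t reserved;
    int32_t  pic_order_cnt_val;
};

/* Mirror of the decode_params layout (assumed, see lead-in). */
struct hevc_decode_params_sketch {
    int32_t  pic_order_cnt_val;                           /* offset 0  */
    uint16_t short_term_ref_pic_set_size;                 /* offset 4  */
    uint16_t long_term_ref_pic_set_size;                  /* offset 6  */
    uint8_t  num_active_dpb_entries;                      /* offset 8  */
    uint8_t  num_poc_st_curr_before;
    uint8_t  num_poc_st_curr_after;
    uint8_t  num_poc_lt_curr;
    uint8_t  poc_st_curr_before[HEVC_DPB_ENTRIES_SKETCH]; /* offset 12 */
    uint8_t  poc_st_curr_after[HEVC_DPB_ENTRIES_SKETCH];  /* offset 28 */
    uint8_t  poc_lt_curr[HEVC_DPB_ENTRIES_SKETCH];        /* offset 44 */
    uint8_t  num_delta_pocs_of_ref_rps_idx;               /* offset 60 */
    uint8_t  reserved[3];
    struct hevc_dpb_entry_sketch dpb[HEVC_DPB_ENTRIES_SKETCH]; /* offset 64, 256 bytes */
    uint64_t flags;                                       /* offset 320 */
};

/* 64 bytes of header + 16 * 16 bytes of DPB + 8 bytes of flags = 328. */
_Static_assert(sizeof(struct hevc_dpb_entry_sketch) == 16, "DPB entry must be 16 bytes");
_Static_assert(sizeof(struct hevc_decode_params_sketch) == 328, "decode_params must be 328 bytes");
```

A similar `_Static_assert` on the real kernel struct inside `h265.c` would turn loopback trigger 1 (EINVAL from a size mismatch) into a build failure instead of a runtime one.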
**VAAPI source mapping**: largely preserved from current `src/h265.c::h265_fill_slice_params` lines 269-315 (DPB iteration over `picture->ReferenceFrames[15]`), just routed to a new struct. The existing logic for `dpb[i].timestamp`, `dpb[i].rps`, `dpb[i].pic_order_cnt[0]`, `field_pic` migrates verbatim to `decode_params.dpb[i].timestamp` etc. The DPB-counting logic (`num_rps_poc_st_curr_before/after, num_rps_poc_lt_curr`) migrates to the `num_poc_*` fields of decode_params.
**Submission**: per-frame, after SPS + PPS in the batch.
### Clause 7 — Device-wide DECODE_MODE + START_CODE menu controls
**Authority**: `<linux/v4l2-controls.h>` defines:
```c
#define V4L2_CID_STATELESS_HEVC_DECODE_MODE (V4L2_CID_CODEC_STATELESS_BASE+405)
#define V4L2_CID_STATELESS_HEVC_START_CODE (V4L2_CID_CODEC_STATELESS_BASE+406)
enum v4l2_stateless_hevc_decode_mode {
V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED,
V4L2_STATELESS_HEVC_DECODE_MODE_FRAME_BASED,
};
enum v4l2_stateless_hevc_start_code {
V4L2_STATELESS_HEVC_START_CODE_NONE,
V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
};
```
**Phase 0 inventory** confirms fresnel rkvdec advertises:
```
hevc_decode_mode 0x00a40a95 (menu): min=1 max=1 default=1 (Frame-Based) flags=has-min-max
hevc_start_code 0x00a40a96 (menu): min=1 max=1 default=1 (Annex B Start Code) flags=has-min-max
```
So rkvdec accepts ONLY `FRAME_BASED` decode mode and `ANNEX_B` start code — same constraints as H.264 + MPEG-2. Set both at decoder init via `v4l2_set_controls(driver_data->video_fd, /* request_fd= */ -1, dev_ctrls, 2)` with values `FRAME_BASED` + `ANNEX_B`.
**Where to set**: extend `src/context.c:142-155`'s existing H.264 device-init block to also set HEVC's two device controls when context is HEVC-profile-bound. Current pattern: 2 ext_controls in one batched call with `request_fd=-1`. iter2 adds 2 more controls (or a separate call) for the HEVC variants.
Alternative: set them inside `h265_set_controls` once per context (with a "first call" guard). Cleaner location-wise but requires per-context state. Phase 6 implementer chooses.
### Clause 8 — `RequestCreateConfig` HEVCMain case must `break;`
**Authority**: C language semantics. `src/config.c:67` `case VAProfileHEVCMain:` falls through to `default:` (line 68) which returns the error. iter1 added `break;` for MPEG-2 cases; HEVCMain is the last case in the same fall-through bucket.
**Empirical anchor**: Phase 3 Baseline D verified the patch shape in scratch — adding `break;` for HEVCMain lets `vaCreateConfig` return `VA_STATUS_SUCCESS` without affecting iter1 MPEG-2 or T4 H.264 hashes.
**Fix shape**: 5 lines (case label preserved; comment + break added; matches iter1 Commit A pattern verbatim).
### Clause 9 — `picture.c::codec_set_controls` HEVCMain dispatch
**Authority**: existing `src/picture.c:186-191` MPEG-2 dispatch pattern from iter1:
```c
case VAProfileMPEG2Simple:
case VAProfileMPEG2Main:
rc = mpeg2_set_controls(driver_data, context, surface_object);
if (rc < 0) return VA_STATUS_ERROR_OPERATION_FAILED;
break;
```
iter2 replaces the explicit `case VAProfileHEVCMain: return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;` (lines 204-206) with the same shape, dispatching to `h265_set_controls`. Comment updated to remove the stale `Fourier-local: HEVC stripped, no HW support on RK3566.` reference.
### Clause 10 — Per-slice accumulation in `codec_store_buffer`
**Authority**: HEVC kernel API requires per-slice slice_params (Clause 4). VAAPI clients submit `VASliceParameterBufferType` once per slice via `vaRenderPicture`. The current `src/picture.c:115-135` for HEVC `VASliceParameterBufferType` does:
```c
case VAProfileHEVCMain:
memcpy(&surface_object->params.h265.slice, buffer_object->data, sizeof(...));
break;
```
**Behavior change**: replace single-slot copy with array-append:
```c
case VAProfileHEVCMain:
if (surface_object->params.h265.num_slices < HEVC_MAX_SLICES_PER_FRAME) {
memcpy(&surface_object->params.h265.slices[surface_object->params.h265.num_slices],
buffer_object->data,
sizeof(VASliceParameterBufferHEVC));
surface_object->params.h265.num_slices++;
} else {
/* exceeded array bound — log and drop; Phase 7 verification flags */
}
break;
```
`HEVC_MAX_SLICES_PER_FRAME` = e.g. 64 (kernel max is 600; conservative). For the BBB fixture this maxes at 1 per frame; the bound is for safety.
**At BeginPicture**: reset `num_slices = 0` once per frame. Currently `picture.c:287` only resets `params.h264.matrix_set = false`; iter2 adds an unconditional `params.h265.num_slices = 0` next to it. A per-profile reset (switch on `config_object->profile`) would be stricter, but the unconditional write is benign for non-HEVC surfaces for now: the union aliasing puts `num_slices` in a region that RenderPicture's per-buffer copies overwrite anyway.
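The append-and-reset lifecycle can be exercised outside the driver. The sketch below is a standalone model, not the patch itself: `struct slice_sketch` is an opaque stand-in for `VASliceParameterBufferHEVC` (264 bytes is this plan's estimate, not a verified size), and `dropped_slices`, `begin_picture_reset`, and `append_slice` are hypothetical names. It adds a drop counter so the "log and drop" branch leaves evidence that Phase 7 can inspect.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SLICES_SKETCH 64

/* Opaque stand-in for VASliceParameterBufferHEVC; size is the plan's estimate. */
struct slice_sketch {
    unsigned char bytes[264];
};

/* Hypothetical model of the params.h265 slice storage. */
struct h265_params_sketch {
    struct slice_sketch slices[MAX_SLICES_SKETCH];
    unsigned int num_slices;
    unsigned int dropped_slices;
};

/* Models the BeginPicture reset: a new frame starts with an empty array. */
static void begin_picture_reset(struct h265_params_sketch *p)
{
    assert(p != NULL);
    p->num_slices = 0;
    p->dropped_slices = 0;
}

/* Models one VASliceParameterBufferType arrival. Returns false and counts
 * a drop when the buffer has the wrong size or the array is full. */
static bool append_slice(struct h265_params_sketch *p, const void *data, size_t size)
{
    assert(p != NULL && data != NULL);
    if (size != sizeof(p->slices[0]) || p->num_slices >= MAX_SLICES_SKETCH) {
        p->dropped_slices++;
        return false;
    }
    memcpy(&p->slices[p->num_slices], data, size);
    p->num_slices++;
    return true;
}
```

Checking the incoming buffer size before the `memcpy` is a design choice of this sketch; the plan's diff copies `sizeof(VASliceParameterBufferHEVC)` unconditionally.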
## Diff scope
### File 1: `src/config.c` — add `break;` for HEVCMain case (5 lines)
```diff
@@ -68,6 +68,11 @@ VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile,
// submission time.
break;
case VAProfileHEVCMain:
+ // fresnel-fourier iter2: HEVC enabled. Same shape as H.264/
+ // MPEG-2 above — no profile-specific config validation in the
+ // libva backend; validation happens at vaCreateContext /
+ // control submission time.
+ break;
default:
return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
```
### File 2: `src/picture.c` — replace HEVCMain reject with dispatch + per-slice slice_params accumulation (~25 lines)
Two distinct changes:
(a) **Dispatch HEVCMain in `codec_set_controls`** (lines 204-206):
```diff
- case VAProfileHEVCMain:
- /* Fourier-local: HEVC stripped, no HW support on RK3566. */
- return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+ case VAProfileHEVCMain:
+ rc = h265_set_controls(driver_data, context, surface_object);
+ if (rc < 0)
+ return VA_STATUS_ERROR_OPERATION_FAILED;
+ break;
```
(b) **Per-slice accumulation in `codec_store_buffer`** (HEVC VASliceParameterBufferType case, lines 127-131):
```diff
- case VAProfileHEVCMain:
- memcpy(&surface_object->params.h265.slice,
- buffer_object->data,
- sizeof(surface_object->params.h265.slice));
- break;
+ case VAProfileHEVCMain: {
+ unsigned int n = surface_object->params.h265.num_slices;
+ if (n < HEVC_MAX_SLICES_PER_FRAME) {
+ memcpy(&surface_object->params.h265.slices[n],
+ buffer_object->data,
+ sizeof(VASliceParameterBufferHEVC));
+ surface_object->params.h265.num_slices = n + 1;
+ }
+ /* note: also keep .slice (singular) populated as last-slice
+ * mirror for h265_fill_pps which reads dependent_slice_segment_flag
+ * from VASliceParameterBufferHEVC->LongSliceFlags */
+ memcpy(&surface_object->params.h265.slice,
+ buffer_object->data,
+ sizeof(surface_object->params.h265.slice));
+ break;
+ }
```
(c) **Reset `num_slices` in `RequestBeginPicture`** at line 287:
```diff
surface_object->params.h264.matrix_set = false;
+ surface_object->params.h265.num_slices = 0;
```
### File 3: `src/surface.h` — extend `params.h265` to hold slice_params array
Add inside the `union { ... } params` block:
```diff
struct {
VAPictureParameterBufferHEVC picture;
VASliceParameterBufferHEVC slice;
+ VASliceParameterBufferHEVC slices[HEVC_MAX_SLICES_PER_FRAME];
+ unsigned int num_slices;
VAIQMatrixBufferHEVC iqmatrix;
bool iqmatrix_set;
} h265;
```
`HEVC_MAX_SLICES_PER_FRAME` = `64` defined in surface.h (or h265.h). Total memory cost: `sizeof(VASliceParameterBufferHEVC)` ≈ 264 bytes × 64 = ~17 KB extra per surface union — significant but acceptable.
Alternative (smaller memory): heap-allocate the `slices` array dynamically (malloc on first slice arrival, realloc on grow, free at surface destroy). More plumbing; defer to a Phase 4 plan revision if Phase 7 surfaces memory concerns. iter2 default: a fixed-size inline array of 64 entries embedded in the surface struct (not a stack array).
### File 4: `src/h265.c` — full rewrite against new split API (~400 lines)
Per Clauses 2-7. The bulk of iter2 work. Structure mirrors current h265.c but routes to new struct layouts:
- `h265_fill_sps()` → fill `struct v4l2_ctrl_hevc_sps` (40 bytes, flags collapsed). ~40 lines.
- `h265_fill_pps()` → fill `struct v4l2_ctrl_hevc_pps` (64 bytes, flags collapsed). ~50 lines.
- `h265_fill_slice_params()` → fill ONE `struct v4l2_ctrl_hevc_slice_params` (per-slice; called from a loop in h265_set_controls over surface->params.h265.slices[]). ~80 lines (preserves NAL header parse, data_bit_offset bit-search, ref_idx, pred_weight).
- **NEW** `h265_fill_decode_params()` → fill `struct v4l2_ctrl_hevc_decode_params` (328 bytes: DPB array, POC, num_active_dpb_entries, etc.). ~60 lines.
- **NEW** `h265_fill_scaling_matrix()` → fill `struct v4l2_ctrl_hevc_scaling_matrix` from `VAIQMatrixBufferHEVC` (or spec defaults if `iqmatrix_set==false`). ~30 lines.
- **NEW** `h265_init_device_controls()` → set DECODE_MODE + START_CODE menus once per context. ~15 lines. Called from h265_set_controls with first-call guard, OR from context.c device-init block.
- `h265_set_controls()` → orchestrator: build SPS, PPS, all slice_params (loop over array), DECODE_PARAMS, SCALING_MATRIX (conditional on init-time probe); submit batched. ~50 lines.
Plus the static const default scaling matrices (luma + chroma intra/inter, 4 × 64 bytes per scan-size with extra DC values) for the iqmatrix_set==false branch. Per Phase 5 Lesson L2 (`feedback_review_empirical_over_theoretical.md`): transcribe from Phase 3 Baseline B SCALING_MATRIX verbatim payload, NOT from spec recall. Phase 6 protocol: capture the BBB SCALING_MATRIX bytes via verbose strace, decode into the four 64-byte arrays, transcribe with byte-equality assertion.
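The byte-equality assertion in that protocol could be as small as the sketch below. `first_mismatch` and `check_transcription` are hypothetical helper names; the captured bytes would come from the decoded strace payload and are deliberately not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns -1 when both buffers are equal, else the offset of the first differing byte. */
static long first_mismatch(const unsigned char *a, const unsigned char *b,
			   size_t len)
{
	assert(a != NULL && b != NULL);
	for (size_t i = 0; i < len; i++) {
		if (a[i] != b[i])
			return (long)i;
	}
	return -1;
}

/*
 * Compare a transcribed table against the captured payload bytes.
 * Returns 0 on byte-equality, -1 otherwise, and reports the first
 * mismatch so the transcription error can be located directly.
 */
static int check_transcription(const char *name, const unsigned char *table,
			       const unsigned char *captured, size_t len)
{
	long off = first_mismatch(table, captured, len);

	if (off >= 0) {
		fprintf(stderr,
			"%s: mismatch at byte %ld (table 0x%02x, captured 0x%02x)\n",
			name, off, (unsigned int)table[off],
			(unsigned int)captured[off]);
		return -1;
	}
	return 0;
}
```

Running one such check per 64-byte array (plus the DC values) at development time keeps the comparison empirical, in line with Lesson L2.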
### File 5: `src/h265.h` — re-enable
Currently `meson.build:73` has `'h265.h'` commented out (`# 'h265.h'`). Uncomment it.
`h265.h` exposes only the `int h265_set_controls(...)` declaration; the new helpers (`h265_fill_decode_params`, `h265_fill_scaling_matrix`, `h265_init_device_controls`) stay file-static in h265.c.
### File 6: `src/meson.build` — uncomment h265.c + h265.h
```diff
@@ -47,7 +47,7 @@ sources = [
'request_pool.c',
'cap_pool.c',
-# 'h265.c'
+ 'h265.c'
]
@@ -70,7 +70,7 @@ headers = [
'cap_pool.h',
-# 'h265.h'
+ 'h265.h'
]
```
### File 7: `src/context.c` — extend device-init for HEVC (optional)
**Decision (defer to Phase 6 implementer)**: two options. Either extend the device-init block at `src/context.c:142-155` to also set the HEVC `DECODE_MODE` + `START_CODE` controls (this would return EINVAL on hantro-vpu-dec, same as the existing H.264 controls; auxiliary noise, intentionally swallowed by `(void)v4l2_set_controls`), or set them inside `h265_set_controls` on first call.
Lower-risk path: extend context.c's existing block (mirrors the existing pattern, minimal new code). This adds cosmetic EINVAL noise on non-HEVC devices but matches existing behavior. Phase 6 default: extend context.c.
### File 8: `include/hevc-ctrls.h` — leave as-is
The 9-line shim is harmless (per Phase 2 Bug 7 verify-only). NOT deleted in iter2 (lower-risk path; iter1 Phase 5 Nit 6 deferral continues).
## Phase 6 implementation order
Phase 6 lands in 2 logical commits + optional fix-forward:
1. **Commit A — `src/config.c` HEVCMain break**: 5-line diff. Verifies the substrate fix in isolation (Phase 3 Baseline D already proved it). Phase 7 partial verification: criterion 1 + 2 should pass (vainfo enum unchanged, `vaCreateConfig` SUCCESS); criteria 3-5 still fail because picture.c reject is in place.
2. **Commit B — h265.c rewrite + picture.c HEVCMain dispatch + slice_params accumulation + meson re-enable + surface.h extension + context.c device-init extension**: the bulk of iter2 work. Phase 7 verification: all 5 criteria green.
3. **Commit C (optional)** — fix-forward if Phase 7 surfaces a regression. Per [`memory/feedback_header_deletion_check.md`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_header_deletion_check.md), iter2 doesn't delete `hevc-ctrls.h`, so the iter1 Commit-D-style header-completeness oversight doesn't apply. Other fix-forward triggers are Phase 7 → Phase 4 loopback signals; pre-identified below.
Implementation strategy for Commit B: develop incrementally inside h265.c with `printf` instrumentation showing each per-frame fill (SPS struct hex dump, PPS, decode_params, slice_params count, scaling_matrix presence). After build passes and mpv-vaapi runs without crash, decode 2 frames and compare HW vs SW JPEG hashes. Iterate until match. Strip instrumentation at close (per [`phase8_iteration1_close.md`](phase8_iteration1_close.md) iter1 sweep precedent).
## Phase 7 verification harness
Re-uses iter1's 5-criterion shape with HEVC fixture substituted. All 5 run in one pass; raw output captured to `phase0_evidence/2026-05-08-or-later/iter2_phase7/`.
```bash
# Re-build + install
ssh fresnel '
cd ~/src/libva-v4l2-request-fourier
git pull --ff-only
ninja -C build && sudo ninja -C build install
sha256sum /usr/lib/dri/v4l2_request_drv_video.so
'
# Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec bind
ssh fresnel '
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
vainfo --display drm --device /dev/dri/renderD128 2>&1 | \
grep -E "VAProfileHEVCMain"
'
# Criteria 2 + 3: vaCreateConfig + ffmpeg-direct decode
ssh fresnel '
mkdir -p /tmp/iter2_phase7
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
LIBVA_TRACE=/tmp/iter2_phase7/libva.trace \
ffmpeg -hide_banner -loglevel info -hwaccel vaapi \
-i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 5 -f null -
'
# Expected: exit 0, no Failed-to-create-decode-config, libva trace
# shows vaCreateConfig SUCCESS, no EINVAL on S_EXT_CTRLS.
# Criterion 4: DMA-BUF GL HW vs SW byte-identical at +02s
ssh fresnel '
mkdir -p /tmp/iter2_phase7/png_hw /tmp/iter2_phase7/png_sw
WAYLAND_DISPLAY=wayland-0 XDG_RUNTIME_DIR=/run/user/1000 \
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/png_hw \
~/fourier-test/bbb_720p10s_hevc.mp4
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/png_sw \
~/fourier-test/bbb_720p10s_hevc.mp4
sha256sum /tmp/iter2_phase7/png_hw/*.jpg /tmp/iter2_phase7/png_sw/*.jpg
'
# Expected: HW frame 1 hash == SW frame 1 hash; HW frame 2 hash ==
# SW frame 2 hash; frame 1 hash != frame 2 hash (real motion).
# Per memory feedback_rockchip_pixel_verify_path.md — DMA-BUF GL is
# the cache-coherency-safe verifier; do NOT use ffmpeg-vaapi+hwdownload
# (cache-stale class on RK3399 for both H.264 + MPEG-2; HEVC expected same).
# Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
ssh fresnel '
# H.264 (T4 reference)
mkdir -p /tmp/iter2_phase7/h264_hw /tmp/iter2_phase7/h264_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:30 \
--vo-image-outdir=/tmp/iter2_phase7/h264_hw \
~/fourier-test/bbb_1080p30_h264.mp4
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:30 \
--vo-image-outdir=/tmp/iter2_phase7/h264_sw \
~/fourier-test/bbb_1080p30_h264.mp4
# MPEG-2 (iter1 reference)
mkdir -p /tmp/iter2_phase7/mpeg2_hw /tmp/iter2_phase7/mpeg2_sw
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 \
mpv --hwdec=vaapi --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/mpeg2_hw \
~/fourier-test/bbb_720p10s_mpeg2.ts
mpv --hwdec=no --frames=2 --vo=image --no-audio \
--no-input-default-bindings --start=00:00:02 \
--vo-image-outdir=/tmp/iter2_phase7/mpeg2_sw \
~/fourier-test/bbb_720p10s_mpeg2.ts
sha256sum /tmp/iter2_phase7/h264_hw/*.jpg /tmp/iter2_phase7/h264_sw/*.jpg \
/tmp/iter2_phase7/mpeg2_hw/*.jpg /tmp/iter2_phase7/mpeg2_sw/*.jpg
'
# Expected:
# H.264 frames at +30s: f623d5f7... (frame 1) and 7d7bc6f2... (frame 2)
# MPEG-2 frames at +02s: 6e7873030dbf... (frame 1) and ccc7ce08810d... (frame 2)
# Bonus byte-compare: post-fix S_EXT_CTRLS payload vs Baseline B verbatim
ssh fresnel '
mkdir -p /tmp/iter2_phase7/cross
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media0 \
strace -ff -tt -y -v -s 8192 -e trace=ioctl \
-o /tmp/iter2_phase7/cross/ffmpeg.strace \
ffmpeg -hide_banner -loglevel error -hwaccel vaapi \
-i ~/fourier-test/bbb_720p10s_hevc.mp4 -frames:v 2 -f null -
grep "VIDIOC_S_EXT_CTRLS.*ctrl_class=0xf010000.*count=5" \
/tmp/iter2_phase7/cross/ffmpeg.strace.* | head -2
'
# Expected per Baseline B: per frame, count=5 with ids 0xa40a90/91/92/93/94
# in order; SPS bytes for first 40 should match Baseline B's BBB-SPS verbatim
# (1280x720, 8-bit, 4:2:0, flags=SAO|STRONG_INTRA_SMOOTHING).
```
## Pass/fail decision
All 5 criteria PASS → Phase 7 closes green; proceed to Phase 8 (memory update + close iter2).
Any criterion FAIL → Phase 7 → Phase 4 loopback per `feedback_dev_process.md`. Pre-identified loopback triggers:
1. **`VIDIOC_S_EXT_CTRLS` returns EINVAL post-fix on per-frame batch**. Likely causes:
- Struct size mismatch between iter2's stack-allocated structs and kernel-expected sizes. Mitigation: `pahole` against kernel UAPI; compare to Phase 3 Baseline B verbatim sizes (40 + 64 + 328 = 432 bytes for the fixed-size controls).
- SCALING_MATRIX size encoding wrong (depends on whether kernel expects fixed or runtime-discovered size).
- reserved fields not zeroed (`memset` was forgotten on a struct).
2. **HW pixel hashes differ from SW**. Likely causes:
- DPB ordering wrong (FFmpeg populates `poc_st_curr_before/after` in specific order; iter2's translation from VAAPI ReferenceFrames must match).
- Slice_params bit_size or data_bit_offset off-by-N from NAL header byte alignment quirks (preserved logic from old h265.c, but the dynamic-array shape might affect slice boundaries).
- SPS/PPS flags bitmask wrong bit position (e.g., `_SAMPLE_ADAPTIVE_OFFSET` is bit 3, not bit 4 — easy off-by-1).
- SCALING_MATRIX values wrong (transcribed from spec rather than from Baseline B verbatim — per Lesson L2, this is the common trap).
3. **mpv `--hwdec=vaapi` filters HEVC out** (analogous to vaapi-copy filtering MPEG-2). Mitigation: per Phase 5 Q4 amendment in iter1, fall forward to the ffmpeg `-vf hwdownload` path, keeping in mind that the Phase 7 harness notes flag that path as cache-stale on RK3399 for pixel verification. Less likely than for MPEG-2 because mpv-vaapi DID engage MPEG-2 in iter1.
4. **iter1 MPEG-2 OR T4 H.264 regression**. Bug 1 + picture.c HEVCMain dispatch must not touch MPEG-2 / H.264 paths. Mitigation: verify Phase 3 Baseline D-style scratch was scoped right; re-read the diffs against the dispatch tables.
5. **Slice_params dynamic-array submission shape rejected by kernel**. Possible if kernel expects `count` as element count rather than `size` as bytes (the kernel UAPI might want a different size encoding). Mitigation: cross-validator anchor in Phase 3 Baseline B has the verbatim `size=N` value for one frame's batch; iter2's submission must produce a matching size for matching slice count. If dynamic-array semantics are confusing, FFmpeg `v4l2_request_hevc.c:540-547` has the canonical pattern.
6. **SCALING_MATRIX availability detection wrong**. iter2 assumes kernel always advertises (matches Baseline B). If on a different host (e.g., ohm) kernel doesn't advertise, the unconditional submission would fail. Mitigation: probe via `VIDIOC_QUERY_EXT_CTRL` at h265_init_device_controls; gate inclusion in batch on probe result. **Defer this defensive path to Phase 6 if Phase 3 Baseline B is anchor enough**.
7. **Latent bug B3 (h264.matrix_set=false writes inside h265.picture)**. For HEVC surfaces, byte 240 of the `params` union lands inside `h265.picture` (Phase 2 Bug 8 verified), and RenderPicture's per-frame `VAPictureParameterBufferType` copy overwrites it. The iter1 Bug 8 documentation explains the masking; iter2 inherits the same masking via the ffmpeg-vaapi sender pattern (always sends `VAPictureParameterBufferType` per frame). If a VAAPI client submits a frame without per-frame picture params, iter2 won't catch it; this is the same latent bug as in iter1.
## Out of scope (LOCKED for iter2)
- VP9, VP8 work (iter3/iter4).
- HEVC Main 10 (10-bit) profile.
- HEVC Main Still Picture profile.
- HEVC range extensions (SCC, REXT) — `EXT_SPS_ST_RPS`, `EXT_SPS_LT_RPS` controls.
- HEVC tile / wavefront parallel processing — `ENTRY_POINT_OFFSETS` control.
- Performance metrics (Phase 1+ separate iteration).
- Long-duration HEVC stress (>10s).
- Slice-mode decoding (`SLICE_BASED` decode mode) — rkvdec only does FRAME_BASED.
- Phase 4 cross-cutting backlog items B1 (V4L2 device-discovery), B3 (BeginPicture profile-aware reset), B4 (context.c log suppression), B5 (vbv_buffer_size negotiation), L3 (vaDeriveImage cache-stale fix).
- chromium-fourier 149 install on fresnel.
- Upstream Linux engagement.
- `include/hevc-ctrls.h` deletion (carries forward from iter1 Phase 5 Nit 6).
## Phase 5 entry point
Phase 5 (second-model review) inputs: this plan + the Phase 3 Baseline B verbatim payloads. Per `feedback_dev_process.md`:
> Goal, situation, measurements, plan get pasted into DokuWiki. Markus reviews and redacts, then initiates the handover to a fresh model instance. Claude does not curate the artifact going to the reviewer — that would re-introduce the blind-spot accumulation the review is meant to escape. Do not summarize when handing over; paste the actual artifacts.
Concretely: artifacts to hand over are the four primary documents in this campaign repo (`phase0_findings_iter2.md`, `phase2_iter2_situation.md`, `phase3_iter2_baseline.md`, `phase4_iter2_plan.md`) plus the `phase0_evidence/2026-05-08/iter2_phase3/` raw output. No summary, no executive overview, no "the gist is" framing — Markus has the raw bundle, the reviewer reads it directly.
Per [`memory/feedback_review_empirical_over_theoretical.md`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_review_empirical_over_theoretical.md): when the reviewer flags a numerical mismatch, the right response is "I'll empirically check during Phase 7" — NOT a same-day source-read rebuttal.
## Predicted iter2 outcome
The fix is structurally larger than iter1 (10 contract clauses vs 6) but bounded:
- Trivial: Bugs 1, 8, 9 (config break + meson re-enable + dispatch) total ~15 lines.
- Substantial: Bugs 3, 4, 5, 7, 10 (h265.c rewrite + DECODE_PARAMS + SCALING_MATRIX + slice_params dynamic-array + per-slice accumulation in picture.c) — ~400 lines combined.
Expected Phase 7 outcome: criteria 1+2 pass after Commit A. Criteria 3+4+5 pass after Commit B. Likely 1-2 Phase 7 → Phase 4 loopbacks for off-by-one bit positions in flags bitmasks or DPB ordering nuances. Phase 8 close estimated to land 4-6 commits on the fork (vs iter1's 4).
If a major surprise fires (e.g., slice_params dynamic-array submission requires a different ioctl path, or scaling_matrix structure differs significantly between FFmpeg and kernel UAPI), Phase 7 → Phase 4 → Phase 2 loopback to source-read deeper. Substrate is well-understood; major surprises unlikely.