# Iteration 3 — Phase 5 (sonnet-architect review) Second-model review of `phase4_iter3_plan.md` (commit `2918dda`) by sonnet-architect on 2026-05-08. Reviewer ran cold-context source reads of fork files + reference sources + kernel hantro driver + VAAPI headers + verbatim Phase 3 baseline. All Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2. ## Verdict **BLOCK** — 4 Critical findings, all blocking iter3 close until amended: - C1, C2: latent decode-correctness failures (wrong hardware DMA addresses) → criterion-4 byte-compare divergence - C3, C4: Phase 6 build failures (undefined identifier + non-existent typedef in userspace) All 4 amendments accepted into the amended Phase 4 plan. ## Critical findings ### C1 — `first_part_header_bits = 0` is a correctness bug, NOT a cosmetic gap **Plan claim** (Clause 9): "kernel hantro_vp8.c re-parses the uncompressed header and may not depend on this field." Marked as a known fidelity gap, leaving it 0. **Empirical evidence** (verified by author): ``` references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_vp8_dec.c:260: hdr->first_part_header_bits + 8; references/linux-mainline/drivers/media/platform/verisilicon/rockchip_vpu2_hw_vp8_dec.c:372: hdr->first_part_header_bits + 8; ``` Both kernel VP8 decoders (G1 on RK3399 + VPU2 on RK3568) read `hdr->first_part_header_bits` unconditionally to compute the macroblock-data DMA offset: ```c mb_offset_bits = first_part_offset * 8 + hdr->first_part_header_bits + 8; mb_offset_bytes = mb_offset_bits / 8; mb_size = hdr->first_part_size - (mb_offset_bytes - first_part_offset) + ...; ``` Setting this to 0 places the hardware at the wrong DMA offset for ALL macroblock data. The decode produces garbage or hangs. **Fix** (verified by source identity): ```c frame.first_part_header_bits = slice->macroblock_offset; ``` **Proof** (author verified via grep): ``` references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c:204: .macroblock_offset = (8 * (s->coder_state_at_header_end.input - data) - s->coder_state_at_header_end.bit_count - 8), references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp8.c:83: .first_part_header_bits = (8 * (s->coder_state_at_header_end.input - data) - s->coder_state_at_header_end.bit_count - 8), ``` Identical formulas, identical inputs. `slice->macroblock_offset` IS `first_part_header_bits` by source identity. **Anchor**: VAAPI `va_dec_vp8.h:186` confirms `macroblock_offset` field exists in `VASliceParameterBufferVP8`. Phase 3 keyframe verbatim: `first_part_header_bits = 6550`. ### C2 — `frame.first_part_size = slice->partition_size[0]` is short by `ceil(macroblock_offset/8)` bytes **Plan claim** (Clause 9): `frame.first_part_size = slice->partition_size[0]`. **Empirical evidence**: ``` references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c:209: sp.partition_size[0] = s->header_partition_size - ((sp.macroblock_offset + 7) / 8); ``` VAAPI's `partition_size[0]` is the **remaining** bytes of the control partition AFTER the header-bits portion (`ceil(macroblock_offset / 8)` bytes consumed up front). Confirmed by VAAPI spec comment at `va_dec_vp8.h:193-196`: "remaining bytes of control partition after parsed by application." But kernel needs the TOTAL control partition size: ``` hantro_g1_vp8_dec.c:265: mb_size = hdr->first_part_size - (mb_offset_bytes - first_part_offset) + ...; hantro_g1_vp8_dec.c:293: dct_part_offset = first_part_offset + hdr->first_part_size; ``` `first_part_size` here is the TOTAL control partition size (= FFmpeg's `s->header_partition_size`). **Numerical verification** (Phase 3 frame-1 keyframe verbatim): - Kernel `first_part_size = 22742` bytes - `macroblock_offset` = `first_part_header_bits` = 6550 bits → `ceil(6550/8) = 819` bytes - VAAPI's `partition_size[0]` = `22742 - 819 = 21923` bytes - Plan-as-written would set `first_part_size = 21923` → DCT partitions begin 819 bytes too early → garbage output **Fix**: ```c frame.first_part_size = slice->partition_size[0] + ((uint32_t)slice->macroblock_offset + 7) / 8; ``` ### C3 — `VAProbabilityDataBufferType` does not exist; will fail Phase 6 compile **Plan refers to** (Phase 2 B7, Phase 4 Commit C): `case VAProbabilityDataBufferType:` in `codec_store_buffer` outer switch. **Empirical evidence** (verified by grep): ``` reference/libva/va/va.h:2058: VAProbabilityBufferType = 13, reference/libva/va/va_dec_vp8.h:218: typedef struct _VAProbabilityDataBufferVP8 { ``` The plan confused the buffer-type enum constant (`VAProbabilityBufferType`) with the buffer-data struct name (`VAProbabilityDataBufferVP8`). Switch case must use the enum constant. **Fix**: replace `VAProbabilityDataBufferType` → `VAProbabilityBufferType` throughout (Phase 2 B7 + Phase 4 Commit C site list). **Cross-check**: `vaapi_vp8.c:147` confirms FFmpeg's VAAPI encoder uses `VAProbabilityBufferType` correctly. ### C4 — `(s8)` cast undefined in userspace **Plan code** (Clauses 6, 7): ```c frame.quant.y_dc_delta = (s8)(iqmatrix->quantization_index[0][1] - iqmatrix->quantization_index[0][0]); frame.segment.quant_update[i] = (s8)(...); ``` **Empirical evidence** (verified by grep): ``` /usr/include/asm-generic/int-ll64.h:20: typedef __signed__ char __s8; ``` The kernel UAPI exposes `__s8` (with leading double-underscore prefix). Bare `s8` is a kernel-internal typedef from `linux/types.h` — NOT exposed to userspace. Compile fails: undefined identifier. **Fix**: replace `(s8)` with `(int8_t)` from ``. Or drop the explicit cast entirely — implicit conversion to `__s8` is well-defined for values in `[-128, 127]` which the BBB content stays well within. ## Suggested improvements ### S1 — Quantization derivation correctness for non-BBB content (author's Q1) The plan's `iqmatrix[0][N] - iqmatrix[0][0]` subtraction is correct **for streams where no clip saturation occurs**. Verified via `vaapi_vp8.c:157-162`: VAAPI fills `quantization_index[0][N] = clip(base_quant[0] + yac_qi + delta_N)`. Edge case: when `yac_qi + delta_N` saturates at 0 or 127 in VAAPI, the subtraction can't recover the original signed delta. BBB does not exercise this path. Document as known fidelity gap; not blocking iter3 close. ### S2 — `segment.quant_update[s]` semantics (author's Q5) VAAPI: `iqmatrix[s][0] = clip(base_quant[s] + yac_qi)`. Plan: `quant_update[s] = iqmatrix[s][0] - iqmatrix[0][0]` = `base_quant[s] - base_quant[0]` (when no clip). When `base_quant[0] = 0` (typical), this matches FFmpeg V4L2's `quant_update[i] = s->segmentation.base_quant[i]`. When `base_quant[0] != 0`, plan's formula produces `base_quant[s] - base_quant[0]` not `base_quant[s]`. BBB has segmentation disabled; flag as known fidelity gap for non-standard streams. ### S3 — Probability buffer fallback comment is factually wrong **Plan Clause 8 comment**: "if not, kernel hantro driver re-derives from default tables." This is incorrect. `hantro_vp8.c::hantro_vp8_prob_update` (lines 49-143) reads `entropy->coeff_probs` unconditionally — there is no default-table fallback in the driver. If `probability_set==false` and `coeff_probs` is zero, decode produces garbage. Practical risk is low (`vaapi_vp8.c:146-148` shows FFmpeg VAAPI always sends `VAProbabilityBufferType` per frame, so mpv-vaapi via FFmpeg should always set probability_set), but the plan comment should be corrected and an `assert(probability_set)` added (or a logged warning) so any miss surfaces immediately during Phase 6 runtime. **Recommended amendment**: change the comment, add the runtime assertion. ### S4 — `lf.level` source for segmentation-enabled streams **Plan**: `frame.lf.level = picture->loop_filter_level[0]`. `vaapi_vp8.c:87-95`: VAAPI fills `pp.loop_filter_level[i] = filter_level[i] + filter.level` (effective level for segment i). For segmentation-disabled BBB, `loop_filter_level[0] = filter.level` (the base) — plan is correct. For segmentation-enabled streams in DELTA mode, kernel computes `clamp(lf.level + lf_update[i])`; using effective level for segment 0 as the base produces wrong results for segments 1-3. BBB out-of-scope; document fidelity gap. ### S5 — `segment.lf_update[0]` is implicitly correct via self-subtraction Plan Clause 7 fills `lf_update[i]` for `i = 0..3` as `loop_filter_level[i] - loop_filter_level[0]`. For i=0: yields 0 regardless. Kernel computes `clamp(lf.level + lf_update[0]) = lf.level`. Correct. No issue. ## Questions ### Q1 — Does mpv-vaapi consumer always send `VAProbabilityBufferType`? `vaapi_vp8.c:146-148` shows FFmpeg's VAAPI encoder always sends it. mpv's libavcodec VP8 VAAPI path uses `ff_vp8_vaapi_hwaccel` which is the same code, so yes — `probability_set` should always be true. Recommended: add `assert(probability_set);` debug guard so any miss surfaces in Phase 6 (this is the runtime check from S3). ### Q2 — `first_part_size` recovery formula for non-byte-aligned `macroblock_offset` `((macroblock_offset + 7) / 8)` rounds up. Phase 3 keyframe: `macroblock_offset = 6550` bits → `ceil(6550/8) = 819` bytes. Reconstructed `first_part_size = 21923 + 819 = 22742` ✓. Author should verify this also holds for inter frames in Phase 7 (Phase 3 captured inter `first_part_size = 1218` etc.; Phase 7 byte-compare will validate empirically). ### Q3 — Which kernel VP8 driver runs on RK3399 (G1 vs VPU2)? Both G1 (`hantro_g1_vp8_dec.c`) and VPU2 (`rockchip_vpu2_hw_vp8_dec.c`) use `first_part_header_bits` identically. RK3399 uses G1 (per `hantro_drv.c` machine-data table). C1 + C2 apply regardless. No action. ## Empirical verification log | Claim | Verification | Result | |---|---|---| | `first_part_header_bits` used by hantro G1 | `grep -rn first_part_header_bits references/.../verisilicon/` | Read at `hantro_g1_vp8_dec.c:260` + `rockchip_vpu2_hw_vp8_dec.c:372` | | `macroblock_offset == first_part_header_bits` formula identity | Source compare `vaapi_vp8.c:204-205` vs `v4l2_request_vp8.c:83-84` | Identical formulas | | `partition_size[0] != first_part_size` | `vaapi_vp8.c:209` derivation + Phase 3 numerics | Differ by `ceil(macroblock_offset/8)` = 819 bytes | | `VAProbabilityDataBufferType` non-existent | `grep VAProbability* va.h va_dec_vp8.h` | Only `VAProbabilityBufferType=13` exists; struct is `VAProbabilityDataBufferVP8` | | `s8` userspace typedef | `grep typedef.*s8 /usr/include/` | Only `__s8` (double-underscore) defined; bare `s8` kernel-only | | VAAPI `quantization_index` is absolute | `vaapi_vp8.c:152-163` | Confirmed `clip(base_quant[s] + yac_qi + delta_c)` | | `coeff_probs` direct memcpy safe | Both paths apply `coeff_bands_inverse` reordering | `vaapi_vp8.c:133-143` and `v4l2_request_vp8.c:141-153` use identical reordering — direct memcpy correct | | `num_dct_parts` off-by-one direction | `vaapi_vp8.c:206`: `num_of_partitions = num_coeff_partitions + 1`; `v4l2_request_vp8.c:80`: `num_dct_parts = num_coeff_partitions` | Plan's `num_of_partitions - 1` confirmed correct | | VAAPI `mb_mode_delta` offset baked in | `vaapi_vp8.c:100`: `[i + 4]`; `v4l2_request_vp8.c:109`: `[i + MODE_I4x4]` (= 4) | Same offset; plan's direct copy correct | | KEY_FRAME inversion direction | `vaapi_vp8.c:58`: `key_frame = !s->keyframe` | Plan's `!pic_fields.bits.key_frame` correct | | `hantro_vp8_prob_update` no fallback | Read `hantro_vp8.c:49-143` | Confirmed — comment in plan Clause 8 was wrong | | All VAAPI field names valid | Line-by-line grep against `va_dec_vp8.h:71-241` | All present | ## Author's 7 review questions — direct answers | Q | Answer | |---|---| | Q1 quantization derivation | Correct for typical content (no clip saturation, base_quant[0]=0). See S1+S2 for edge cases. | | Q2 `first_part_header_bits=0` safety | UNSAFE → C1 | | Q3 `num_dct_parts` off-by-one | `num_of_partitions - 1` correct | | Q4 field availability | All present; `(s8)` fails compile (C4); `VAProbabilityDataBufferType` fails compile (C3) | | Q5 `quant_update[s]` semantics | Signed delta; plan's subtraction correct when `base_quant[0]=0` | | Q6 `SHOW_FRAME` unconditional | Safe for BBB scope; documented fidelity gap for invisible-frame content | | Q7 buffer order independence | Confirmed — each `codec_store_buffer` writes a distinct field | ## Required Phase 4 plan amendments | ID | Location | Change | |---|---|---| | C1 | Clause 9, plan-comment + code | `frame.first_part_header_bits = slice->macroblock_offset;` | | C2 | Clause 9, code | `frame.first_part_size = slice->partition_size[0] + ((uint32_t)slice->macroblock_offset + 7) / 8;` | | C3 | Phase 2 B7 + Phase 4 Commit C | `case VAProbabilityBufferType:` (drop "Data") | | C4 | Clauses 6+7 | `(s8)` → `(int8_t)` (or drop cast) | | S3 | Clause 8 comment + add runtime check | Correct comment; add `assert(probability_set)` or warn-log | C1 + C2 affect Clause 9 jointly; rewrite as a single block. C3 affects 5 places (Phase 2 B7 + Phase 4 Commit C site list + B7 patch shape). C4 affects ~6 places (Clauses 6+7). S3 affects 2 places. ## Process discipline note This review was conducted with empirical verification of every Critical claim BEFORE inclusion (memory `feedback_review_empirical_over_theoretical.md` Direction 2). Every C1-C4 finding was verified by grep / source-read of multiple files, and the proposed amendments were cross-checked against working code (FFmpeg VAAPI ref, FFmpeg V4L2 ref). Author independently verified each before incorporating into the amended plan. The review caught what would have been a 2-loopback Phase 6 failure cycle (compile failures C3+C4) plus a Phase 7 byte-compare divergence (C1+C2 hardware-DMA-offset bug). Without this review pass, iter3 Phase 6 would have failed first compile attempt, then after fix-forward would have produced visible output but with wrong DMA addresses → Phase 7 → Phase 4 loopback.