iter3 Phase 5: sonnet review — 4 Critical findings, 4 amendments

Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.

Critical findings:

  C1 first_part_header_bits = 0 was claimed cosmetic; actually
     UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
     both read this field unconditionally to compute the macroblock
     DMA offset. Setting 0 would place hardware at wrong DMA offset
     for ALL macroblock data → garbage decode.
     Fix: frame.first_part_header_bits = slice->macroblock_offset
     (verified by source identity — vaapi_vp8.c:204 and
     v4l2_request_vp8.c:83 use byte-identical formulas).

  C2 first_part_size = slice->partition_size[0] was wrong; VAAPI's
     partition_size[0] is the REMAINING bytes after parsing
     (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
     Kernel needs the TOTAL control partition size.
     Fix: frame.first_part_size = slice->partition_size[0] +
                                  ((macroblock_offset + 7) / 8)
     Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.

  C3 VAProbabilityDataBufferType does not exist as a buffer-type
     enum; it's the struct name. The actual enum constant is
     VAProbabilityBufferType (= 13 per va.h:2058). Switch case
     using the wrong identifier would have failed Phase 6 compile.
     Fix: replace globally in phase2 + phase4 docs.

  C4 (s8) cast undefined in userspace. Kernel has 's8' typedef in
     linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
     underscore). Userspace portable cast is int8_t from <stdint.h>.
     Fix: replace (s8) with (int8_t) in Clauses 6+7.

Suggested:

  S3 Clause 8 comment was factually wrong: hantro_vp8.c::
     hantro_vp8_prob_update reads coeff_probs unconditionally;
     there is NO default-table fallback. If probability_set==false,
     decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
     always sends VAProbabilityBufferType per frame), but corrected
     comment + added assert(probability_set) runtime guard for
     immediate Phase 6 surfacing.

Plus 5 minor S/Q items documented; non-blocking for iter3.

Author's 7 review questions all answered directly in the review:
  Q1 quantization derivation: correct for typical content
  Q2 first_part_header_bits=0 safety: UNSAFE → C1
  Q3 num_dct_parts off-by-one: confirmed correct
  Q4 field availability: 2 compile failures found (C3 + C4)
  Q5 quant_update[s] semantics: signed delta confirmed
  Q6 SHOW_FRAME unconditional: safe for BBB scope
  Q7 buffer order independence: confirmed

Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. Review pass is the right path forward per memory
rule "Reviews are never skippable" — empty-review value =
empirical-verification value, regardless of finding count.

Refs:
  phase4_iter3_plan.md (amended in-place; Phase 5 amendments
                         section appended)
  phase2_iter3_situation.md (amended C3 globally)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-08 21:27:53 +00:00
parent 2918dda2e0
commit 656596aa6b
3 changed files with 301 additions and 34 deletions
+69 -30
View File
@@ -135,20 +135,25 @@ For segment 0 (always present):
```c
if (iqmatrix_set) {
frame.quant.y_ac_qi = iqmatrix->quantization_index[0][0];
frame.quant.y_dc_delta = (s8)(iqmatrix->quantization_index[0][1] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_dc_delta = (s8)(iqmatrix->quantization_index[0][2] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_ac_delta = (s8)(iqmatrix->quantization_index[0][3] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_dc_delta = (s8)(iqmatrix->quantization_index[0][4] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_ac_delta = (s8)(iqmatrix->quantization_index[0][5] -
iqmatrix->quantization_index[0][0]);
frame.quant.y_dc_delta = (int8_t)(iqmatrix->quantization_index[0][1] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_dc_delta = (int8_t)(iqmatrix->quantization_index[0][2] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_ac_delta = (int8_t)(iqmatrix->quantization_index[0][3] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_dc_delta = (int8_t)(iqmatrix->quantization_index[0][4] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_ac_delta = (int8_t)(iqmatrix->quantization_index[0][5] -
iqmatrix->quantization_index[0][0]);
}
/* if iqmatrix_set==false: frame.quant stays zero from memset; safe
* default for fixtures with implicit-zero deltas (most VP8 streams,
* including BBB). */
* including BBB).
*
* Phase 5 C4: cast is (int8_t), NOT (s8) — the latter is a kernel-
* internal typedef from <linux/types.h> not exposed to userspace.
* Userspace UAPI exposes __s8 (double-underscore). int8_t from
* <stdint.h> is the portable userspace equivalent. */
```
For segments 1..3 (only used when segmentation enabled):
@@ -156,7 +161,7 @@ For segments 1..3 (only used when segmentation enabled):
```c
if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
for (i = 1; i < 4; i++)
frame.segment.quant_update[i] = (s8)
frame.segment.quant_update[i] = (int8_t)
(iqmatrix->quantization_index[i][0] -
iqmatrix->quantization_index[0][0]);
}
@@ -190,7 +195,7 @@ frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;
/* segment.lf_update[] populated only when segmentation enabled */
if (picture->pic_fields.bits.segmentation_enabled) {
for (i = 0; i < 4; i++)
frame.segment.lf_update[i] = (s8)
frame.segment.lf_update[i] = (int8_t)
(picture->loop_filter_level[i] -
picture->loop_filter_level[0]);
}
@@ -219,11 +224,14 @@ if (probability_set) {
probability->dct_coeff_probs,
sizeof frame.entropy.coeff_probs);
}
/* If probability_set==false: leave coeff_probs zero from memset.
* Most consumers will send VAProbabilityDataBuffer per frame; if not,
* kernel hantro driver re-derives from default tables. Phase 5
* review will flag this as a fidelity gap if Phase 7 mpv consumer
* doesn't send probability buffers. */
/* Phase 5 S3 correction: if probability_set==false, decode produces
* GARBAGE. hantro_vp8.c::hantro_vp8_prob_update reads
* entropy.coeff_probs unconditionally; there is NO default-table
* fallback in the kernel driver. Practical risk is low — FFmpeg
* vaapi_vp8.c:146-148 always sends VAProbabilityBufferType per
* frame, and mpv-vaapi VP8 uses ff_vp8_vaapi_hwaccel under the hood.
* Add an assert() so any consumer violation surfaces immediately. */
assert(probability_set); /* runtime guard per Phase 5 S3 */
```
**Anchors**:
@@ -239,16 +247,29 @@ frame.coder_state.value = picture->bool_coder_ctx.value;
frame.coder_state.bit_count = picture->bool_coder_ctx.count;
/* coder_state.padding stays zero from memset */
frame.first_part_size = slice->partition_size[0];
frame.first_part_header_bits = 0;
/* first_part_header_bits: VAAPI doesn't expose this directly. FFmpeg
* derives from internal parser bit-cursor state. Phase 3 keyframe
* empirical = 6550, inter ranges 86..254. Leaving 0 here is a known
* fidelity gap; kernel hantro_vp8.c re-parses the uncompressed header
* and may not depend on this field for correctness. Phase 7 byte-
* compare will reveal whether Phase 5 review needs to re-open this
* with bitstream-side derivation (closest VAAPI analog: slice->
* macroblock_offset, but it's the MB-data offset not header-bits). */
/* Phase 5 C1+C2: macroblock_offset IS first_part_header_bits by
* source identity (vaapi_vp8.c:204 and v4l2_request_vp8.c:83 set
* the field with the identical 3-input formula
* 8 * (input - data) - bit_count - 8
* — VAAPI just renames it). Kernel hantro_g1_vp8_dec.c:260 and
* rockchip_vpu2_hw_vp8_dec.c:372 read it unconditionally to compute
* the macroblock-data DMA offset; leaving 0 places hardware at the
* wrong offset → garbage decode. NOT cosmetic.
*
* Phase 5 C2: VAAPI's slice->partition_size[0] is the REMAINING
* bytes of the control partition AFTER ceil(macroblock_offset/8)
* bytes already consumed by the application's bitstream parser
* (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec comment
* confirms). Kernel needs the TOTAL control partition size, so
* recover by adding back the header bytes: */
frame.first_part_header_bits = slice->macroblock_offset;
frame.first_part_size = slice->partition_size[0] +
((uint32_t)slice->macroblock_offset + 7) / 8;
/* Phase 3 keyframe verbatim cross-check:
* first_part_size = 22742 bytes
* first_part_header_bits = 6550 bits → ceil(6550/8) = 819 bytes
* VAAPI partition_size[0] = 22742 - 819 = 21923 (predicted)
* reconstructed = 21923 + 819 = 22742 ✓ */
for (i = 0; i < 8; i++)
frame.dct_part_sizes[i] = slice->partition_size[i + 1];
@@ -322,7 +343,7 @@ Predicted ~200 LOC for vp8.c (the 10 clauses), ~40 LOC for vp8.h, +2 lines meson
5 sites:
1. `picture.c:34-36` include block: add `#include "vp8.h"`.
2. `picture.c::codec_set_controls` (line ~218 after HEVCMain case): add VP8 dispatch case calling `vp8_set_controls`.
3. `picture.c::codec_store_buffer` (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case `VAProbabilityDataBufferType` with VP8 inner case.
3. `picture.c::codec_store_buffer` (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case `VAProbabilityBufferType` with VP8 inner case.
4. `picture.c::RequestBeginPicture` (line ~300): add 2 reset lines for `params.vp8.iqmatrix_set = false;` and `params.vp8.probability_set = false;`.
5. `surface.h::object_surface::params` union (line ~112): insert vp8 struct after h265 (~10 LOC).
@@ -372,7 +393,7 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f
3. **`first_part_header_bits = 0` fallback**: is this actually safe for hantro? Reviewer should grep kernel `hantro_vp8.c` + `rkvdec_vp8.c` for any read of `first_part_header_bits` in the decode hot path. If used, the gap blocks correctness; if unused (re-parsed from bitstream), the gap is cosmetic.
4. **Probability buffer ordering**: VAAPI's mpv consumer might send VAProbabilityDataBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's `codec_store_buffer` is order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation.
4. **Probability buffer ordering**: VAAPI's mpv consumer might send VAProbabilityBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's `codec_store_buffer` is order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation.
5. **Endianness assumption**: VAAPI structs use host-byte-order; kernel UAPI structs use host-byte-order; both little-endian on aarch64. No byte-swap needed — confirm review.
@@ -397,3 +418,21 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f
- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter3_phase3/`.
- Memory rules carry forward; `feedback_fresnel_hostname` added.
- Phase 4 plan ready for sonnet-architect review (Phase 5).
## Phase 5 amendments incorporated (post-review)
Sonnet-architect review (`phase5_iter3_review.md`) returned 4 Critical findings + 1 Suggested correction. All 4 Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2. Amendments applied in-place to this plan:
| ID | Site | Change | Verified |
|---|---|---|---|
| C1 | Clause 9 | `frame.first_part_header_bits = slice->macroblock_offset;` (kernel reads it unconditionally per `hantro_g1_vp8_dec.c:260`) | ✓ grep + source identity |
| C2 | Clause 9 | `frame.first_part_size = slice->partition_size[0] + ((macroblock_offset+7)/8)` (recover total partition size from VAAPI's post-parse remainder per `vaapi_vp8.c:209`) | ✓ Phase 3 numerics: 21923+819=22742 |
| C3 | Phase 2 B7 + Phase 4 Commit C | `VAProbabilityDataBufferType``VAProbabilityBufferType` (correct enum constant from `va.h:2058`; the "Data" suffix is the struct name not the enum) | ✓ grep |
| C4 | Clauses 6+7 | `(s8)``(int8_t)` (kernel-internal `s8` not exposed to userspace; UAPI exposes `__s8`; portable userspace cast is `int8_t` from `<stdint.h>`) | ✓ grep `/usr/include/asm-generic/int-ll64.h:20` |
| S3 | Clause 8 | Comment corrected (kernel has NO probability-table fallback); added `assert(probability_set)` runtime guard | ✓ source-read `hantro_vp8.c:49-143` |
Without the Phase 5 review, iter3 Phase 6 would have:
- **Failed first compile attempt** (C3 + C4) — ~2 fix-forward commits required
- **Then produced visible-but-wrong output** (C1 + C2) — Phase 7 byte-compare would have diverged, triggering Phase 7 → Phase 4 loopback for hardware-DMA-offset bug
Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-forward commits. The review pass is the right path forward per memory rule "Reviews are never skippable."