Files
fresnel-fourier/phase5_iter3_review.md
T
claude-noether 656596aa6b iter3 Phase 5: sonnet review — 4 Critical findings, 4 amendments
Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.

Critical findings:

  C1 first_part_header_bits = 0 was claimed cosmetic; actually
     UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
     both read this field unconditionally to compute the macroblock
     DMA offset. Setting 0 would place hardware at wrong DMA offset
     for ALL macroblock data → garbage decode.
     Fix: frame.first_part_header_bits = slice->macroblock_offset
     (verified by source identity — vaapi_vp8.c:204 and
     v4l2_request_vp8.c:83 use byte-identical formulas).

  C2 first_part_size = slice->partition_size[0] was wrong; VAAPI's
     partition_size[0] is the REMAINING bytes after parsing
     (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
     Kernel needs the TOTAL control partition size.
     Fix: frame.first_part_size = slice->partition_size[0] +
                                  ((macroblock_offset + 7) / 8)
     Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.

  C3 VAProbabilityDataBufferType does not exist as a buffer-type
     enum; it's the struct name. The actual enum constant is
     VAProbabilityBufferType (= 13 per va.h:2058). Switch case
     using the wrong identifier would have failed Phase 6 compile.
     Fix: replace globally in phase2 + phase4 docs.

  C4 (s8) cast undefined in userspace. Kernel has 's8' typedef in
     linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
     underscore). Userspace portable cast is int8_t from <stdint.h>.
     Fix: replace (s8) with (int8_t) in Clauses 6+7.

Suggested:

  S3 Clause 8 comment was factually wrong: hantro_vp8.c::
     hantro_vp8_prob_update reads coeff_probs unconditionally;
     there is NO default-table fallback. If probability_set==false,
     decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
     always sends VAProbabilityBufferType per frame), but corrected
     comment + added assert(probability_set) runtime guard for
     immediate Phase 6 surfacing.

Plus 5 minor S/Q items documented; non-blocking for iter3.

Author's 7 review questions all answered directly in the review:
  Q1 quantization derivation: correct for typical content
  Q2 first_part_header_bits=0 safety: UNSAFE → C1
  Q3 num_dct_parts off-by-one: confirmed correct
  Q4 field availability: 2 compile failures found (C3 + C4)
  Q5 quant_update[s] semantics: signed delta confirmed
  Q6 SHOW_FRAME unconditional: safe for BBB scope
  Q7 buffer order independence: confirmed

Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. Review pass is the right path forward per memory
rule "Reviews are never skippable" — empty-review value =
empirical-verification value, regardless of finding count.

Refs:
  phase4_iter3_plan.md (amended in-place; Phase 5 amendments
                         section appended)
  phase2_iter3_situation.md (amended C3 globally)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:27:53 +00:00

14 KiB

Iteration 3 — Phase 5 (sonnet-architect review)

Second-model review of phase4_iter3_plan.md (commit 2918dda) by sonnet-architect on 2026-05-08. Reviewer ran cold-context source reads of fork files + reference sources + kernel hantro driver + VAAPI headers + verbatim Phase 3 baseline. All Critical findings empirically verified by author before incorporation per memory feedback_review_empirical_over_theoretical.md Direction 2.

Verdict

BLOCK — 4 Critical findings, all blocking iter3 close until amended:

  • C1, C2: latent decode-correctness failures (wrong hardware DMA addresses) → criterion-4 byte-compare divergence
  • C3, C4: Phase 6 build failures (undefined identifier + non-existent typedef in userspace)

All 4 amendments accepted into the amended Phase 4 plan.

Critical findings

C1 — first_part_header_bits = 0 is a correctness bug, NOT a cosmetic gap

Plan claim (Clause 9): "kernel hantro_vp8.c re-parses the uncompressed header and may not depend on this field." Marked as a known fidelity gap, leaving it 0.

Empirical evidence (verified by author):

references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_vp8_dec.c:260:
        hdr->first_part_header_bits + 8;

references/linux-mainline/drivers/media/platform/verisilicon/rockchip_vpu2_hw_vp8_dec.c:372:
        hdr->first_part_header_bits + 8;

Both kernel VP8 decoders (G1 on RK3399 + VPU2 on RK3568) read hdr->first_part_header_bits unconditionally to compute the macroblock-data DMA offset:

mb_offset_bits = first_part_offset * 8 + hdr->first_part_header_bits + 8;
mb_offset_bytes = mb_offset_bits / 8;
mb_size = hdr->first_part_size - (mb_offset_bytes - first_part_offset) + ...;

Setting this to 0 places the hardware at the wrong DMA offset for ALL macroblock data. The decode produces garbage or hangs.

Fix (verified by source identity):

frame.first_part_header_bits = slice->macroblock_offset;

Proof (author verified via grep):

references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c:204:
    .macroblock_offset = (8 * (s->coder_state_at_header_end.input - data) -
                          s->coder_state_at_header_end.bit_count - 8),

references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp8.c:83:
    .first_part_header_bits = (8 * (s->coder_state_at_header_end.input - data) -
                               s->coder_state_at_header_end.bit_count - 8),

Identical formulas, identical inputs. slice->macroblock_offset IS first_part_header_bits by source identity.

Anchor: VAAPI va_dec_vp8.h:186 confirms macroblock_offset field exists in VASliceParameterBufferVP8. Phase 3 keyframe verbatim: first_part_header_bits = 6550.

C2 — frame.first_part_size = slice->partition_size[0] is short by ceil(macroblock_offset/8) bytes

Plan claim (Clause 9): frame.first_part_size = slice->partition_size[0].

Empirical evidence:

references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c:209:
    sp.partition_size[0] = s->header_partition_size - ((sp.macroblock_offset + 7) / 8);

VAAPI's partition_size[0] is the remaining bytes of the control partition AFTER the header-bits portion (ceil(macroblock_offset / 8) bytes consumed up front). Confirmed by VAAPI spec comment at va_dec_vp8.h:193-196: "remaining bytes of control partition after parsed by application."

But kernel needs the TOTAL control partition size:

hantro_g1_vp8_dec.c:265:  mb_size = hdr->first_part_size - (mb_offset_bytes - first_part_offset) + ...;
hantro_g1_vp8_dec.c:293:  dct_part_offset = first_part_offset + hdr->first_part_size;

first_part_size here is the TOTAL control partition size (= FFmpeg's s->header_partition_size).

Numerical verification (Phase 3 frame-1 keyframe verbatim):

  • Kernel first_part_size = 22742 bytes
  • macroblock_offset = first_part_header_bits = 6550 bits → ceil(6550/8) = 819 bytes
  • VAAPI's partition_size[0] = 22742 - 819 = 21923 bytes
  • Plan-as-written would set first_part_size = 21923 → DCT partitions begin 819 bytes too early → garbage output

Fix:

frame.first_part_size = slice->partition_size[0] +
                        ((uint32_t)slice->macroblock_offset + 7) / 8;

C3 — VAProbabilityDataBufferType does not exist; will fail Phase 6 compile

Plan refers to (Phase 2 B7, Phase 4 Commit C): case VAProbabilityDataBufferType: in codec_store_buffer outer switch.

Empirical evidence (verified by grep):

reference/libva/va/va.h:2058:           VAProbabilityBufferType             = 13,
reference/libva/va/va_dec_vp8.h:218:    typedef struct _VAProbabilityDataBufferVP8 {

The plan confused the buffer-type enum constant (VAProbabilityBufferType) with the buffer-data struct name (VAProbabilityDataBufferVP8). Switch case must use the enum constant.

Fix: replace VAProbabilityDataBufferTypeVAProbabilityBufferType throughout (Phase 2 B7 + Phase 4 Commit C site list).

Cross-check: vaapi_vp8.c:147 confirms FFmpeg's VAAPI encoder uses VAProbabilityBufferType correctly.

C4 — (s8) cast undefined in userspace

Plan code (Clauses 6, 7):

frame.quant.y_dc_delta = (s8)(iqmatrix->quantization_index[0][1] -
                              iqmatrix->quantization_index[0][0]);
frame.segment.quant_update[i] = (s8)(...);

Empirical evidence (verified by grep):

/usr/include/asm-generic/int-ll64.h:20:  typedef __signed__ char __s8;

The kernel UAPI exposes __s8 (with leading double-underscore prefix). Bare s8 is a kernel-internal typedef from linux/types.h — NOT exposed to userspace. Compile fails: undefined identifier.

Fix: replace (s8) with (int8_t) from <stdint.h>. Or drop the explicit cast entirely — implicit conversion to __s8 is well-defined for values in [-128, 127] which the BBB content stays well within.

Suggested improvements

S1 — Quantization derivation correctness for non-BBB content (author's Q1)

The plan's iqmatrix[0][N] - iqmatrix[0][0] subtraction is correct for streams where no clip saturation occurs. Verified via vaapi_vp8.c:157-162: VAAPI fills quantization_index[0][N] = clip(base_quant[0] + yac_qi + delta_N).

Edge case: when yac_qi + delta_N saturates at 0 or 127 in VAAPI, the subtraction can't recover the original signed delta. BBB does not exercise this path. Document as known fidelity gap; not blocking iter3 close.

S2 — segment.quant_update[s] semantics (author's Q5)

VAAPI: iqmatrix[s][0] = clip(base_quant[s] + yac_qi). Plan: quant_update[s] = iqmatrix[s][0] - iqmatrix[0][0] = base_quant[s] - base_quant[0] (when no clip).

When base_quant[0] = 0 (typical), this matches FFmpeg V4L2's quant_update[i] = s->segmentation.base_quant[i]. When base_quant[0] != 0, plan's formula produces base_quant[s] - base_quant[0] not base_quant[s]. BBB has segmentation disabled; flag as known fidelity gap for non-standard streams.

S3 — Probability buffer fallback comment is factually wrong

Plan Clause 8 comment: "if not, kernel hantro driver re-derives from default tables."

This is incorrect. hantro_vp8.c::hantro_vp8_prob_update (lines 49-143) reads entropy->coeff_probs unconditionally — there is no default-table fallback in the driver. If probability_set==false and coeff_probs is zero, decode produces garbage.

Practical risk is low (vaapi_vp8.c:146-148 shows FFmpeg VAAPI always sends VAProbabilityBufferType per frame, so mpv-vaapi via FFmpeg should always set probability_set), but the plan comment should be corrected and an assert(probability_set) added (or a logged warning) so any miss surfaces immediately during Phase 6 runtime.

Recommended amendment: change the comment, add the runtime assertion.

S4 — lf.level source for segmentation-enabled streams

Plan: frame.lf.level = picture->loop_filter_level[0].

vaapi_vp8.c:87-95: VAAPI fills pp.loop_filter_level[i] = filter_level[i] + filter.level (effective level for segment i). For segmentation-disabled BBB, loop_filter_level[0] = filter.level (the base) — plan is correct. For segmentation-enabled streams in DELTA mode, kernel computes clamp(lf.level + lf_update[i]); using effective level for segment 0 as the base produces wrong results for segments 1-3. BBB out-of-scope; document fidelity gap.

S5 — segment.lf_update[0] is implicitly correct via self-subtraction

Plan Clause 7 fills lf_update[i] for i = 0..3 as loop_filter_level[i] - loop_filter_level[0]. For i=0: yields 0 regardless. Kernel computes clamp(lf.level + lf_update[0]) = lf.level. Correct. No issue.

Questions

Q1 — Does mpv-vaapi consumer always send VAProbabilityBufferType?

vaapi_vp8.c:146-148 shows FFmpeg's VAAPI encoder always sends it. mpv's libavcodec VP8 VAAPI path uses ff_vp8_vaapi_hwaccel which is the same code, so yes — probability_set should always be true.

Recommended: add assert(probability_set); debug guard so any miss surfaces in Phase 6 (this is the runtime check from S3).

Q2 — first_part_size recovery formula for non-byte-aligned macroblock_offset

((macroblock_offset + 7) / 8) rounds up. Phase 3 keyframe: macroblock_offset = 6550 bits → ceil(6550/8) = 819 bytes. Reconstructed first_part_size = 21923 + 819 = 22742 ✓. Author should verify this also holds for inter frames in Phase 7 (Phase 3 captured inter first_part_size = 1218 etc.; Phase 7 byte-compare will validate empirically).

Q3 — Which kernel VP8 driver runs on RK3399 (G1 vs VPU2)?

Both G1 (hantro_g1_vp8_dec.c) and VPU2 (rockchip_vpu2_hw_vp8_dec.c) use first_part_header_bits identically. RK3399 uses G1 (per hantro_drv.c machine-data table). C1 + C2 apply regardless. No action.

Empirical verification log

Claim Verification Result
first_part_header_bits used by hantro G1 grep -rn first_part_header_bits references/.../verisilicon/ Read at hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
macroblock_offset == first_part_header_bits formula identity Source compare vaapi_vp8.c:204-205 vs v4l2_request_vp8.c:83-84 Identical formulas
partition_size[0] != first_part_size vaapi_vp8.c:209 derivation + Phase 3 numerics Differ by ceil(macroblock_offset/8) = 819 bytes
VAProbabilityDataBufferType non-existent grep VAProbability* va.h va_dec_vp8.h Only VAProbabilityBufferType=13 exists; struct is VAProbabilityDataBufferVP8
s8 userspace typedef grep typedef.*s8 /usr/include/ Only __s8 (double-underscore) defined; bare s8 kernel-only
VAAPI quantization_index is absolute vaapi_vp8.c:152-163 Confirmed clip(base_quant[s] + yac_qi + delta_c)
coeff_probs direct memcpy safe Both paths apply coeff_bands_inverse reordering vaapi_vp8.c:133-143 and v4l2_request_vp8.c:141-153 use identical reordering — direct memcpy correct
num_dct_parts off-by-one direction vaapi_vp8.c:206: num_of_partitions = num_coeff_partitions + 1; v4l2_request_vp8.c:80: num_dct_parts = num_coeff_partitions Plan's num_of_partitions - 1 confirmed correct
VAAPI mb_mode_delta offset baked in vaapi_vp8.c:100: [i + 4]; v4l2_request_vp8.c:109: [i + MODE_I4x4] (= 4) Same offset; plan's direct copy correct
KEY_FRAME inversion direction vaapi_vp8.c:58: key_frame = !s->keyframe Plan's !pic_fields.bits.key_frame correct
hantro_vp8_prob_update no fallback Read hantro_vp8.c:49-143 Confirmed — comment in plan Clause 8 was wrong
All VAAPI field names valid Line-by-line grep against va_dec_vp8.h:71-241 All present

Author's 7 review questions — direct answers

Q Answer
Q1 quantization derivation Correct for typical content (no clip saturation, base_quant[0]=0). See S1+S2 for edge cases.
Q2 first_part_header_bits=0 safety UNSAFE → C1
Q3 num_dct_parts off-by-one num_of_partitions - 1 correct
Q4 field availability All present; (s8) fails compile (C4); VAProbabilityDataBufferType fails compile (C3)
Q5 quant_update[s] semantics Signed delta; plan's subtraction correct when base_quant[0]=0
Q6 SHOW_FRAME unconditional Safe for BBB scope; documented fidelity gap for invisible-frame content
Q7 buffer order independence Confirmed — each codec_store_buffer writes a distinct field

Required Phase 4 plan amendments

ID Location Change
C1 Clause 9, plan-comment + code frame.first_part_header_bits = slice->macroblock_offset;
C2 Clause 9, code frame.first_part_size = slice->partition_size[0] + ((uint32_t)slice->macroblock_offset + 7) / 8;
C3 Phase 2 B7 + Phase 4 Commit C case VAProbabilityBufferType: (drop "Data")
C4 Clauses 6+7 (s8)(int8_t) (or drop cast)
S3 Clause 8 comment + add runtime check Correct comment; add assert(probability_set) or warn-log

C1 + C2 affect Clause 9 jointly; rewrite as a single block. C3 affects 5 places (Phase 2 B7 + Phase 4 Commit C site list + B7 patch shape). C4 affects ~6 places (Clauses 6+7). S3 affects 2 places.

Process discipline note

This review was conducted with empirical verification of every Critical claim BEFORE inclusion (memory feedback_review_empirical_over_theoretical.md Direction 2). Every C1-C4 finding was verified by grep / source-read of multiple files, and the proposed amendments were cross-checked against working code (FFmpeg VAAPI ref, FFmpeg V4L2 ref). Author independently verified each before incorporating into the amended plan.

The review caught what would have been a 2-loopback Phase 6 failure cycle (compile failures C3+C4) plus a Phase 7 byte-compare divergence (C1+C2 hardware-DMA-offset bug). Without this review pass, iter3 Phase 6 would have failed first compile attempt, then after fix-forward would have produced visible output but with wrong DMA addresses → Phase 7 → Phase 4 loopback.