iter3 Phase 5: sonnet review — 4 Critical findings, 4 amendments
Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.
Critical findings:
C1 first_part_header_bits = 0 was claimed cosmetic; actually
UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
both read this field unconditionally to compute the macroblock
DMA offset. Setting 0 would place hardware at wrong DMA offset
for ALL macroblock data → garbage decode.
Fix: frame.first_part_header_bits = slice->macroblock_offset
(verified by source identity — vaapi_vp8.c:204 and
v4l2_request_vp8.c:83 use byte-identical formulas).
C2 first_part_size = slice->partition_size[0] was wrong; VAAPI's
partition_size[0] is the REMAINING bytes after parsing
(vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
Kernel needs the TOTAL control partition size.
Fix: frame.first_part_size = slice->partition_size[0] +
((macroblock_offset + 7) / 8)
Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.
C3 VAProbabilityDataBufferType does not exist as a buffer-type
enum; it's the struct name. The actual enum constant is
VAProbabilityBufferType (= 13 per va.h:2058). Switch case
using the wrong identifier would have failed Phase 6 compile.
Fix: replace globally in phase2 + phase4 docs.
C4 (s8) cast undefined in userspace. Kernel has 's8' typedef in
linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
underscore). Userspace portable cast is int8_t from <stdint.h>.
Fix: replace (s8) with (int8_t) in Clauses 6+7.
Suggested:
S3 Clause 8 comment was factually wrong: hantro_vp8.c::
hantro_vp8_prob_update reads coeff_probs unconditionally;
there is NO default-table fallback. If probability_set==false,
decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
always sends VAProbabilityBufferType per frame), but corrected
comment + added assert(probability_set) runtime guard for
immediate Phase 6 surfacing.
Plus 5 minor S/Q items documented; non-blocking for iter3.
Author's 7 review questions all answered directly in the review:
Q1 quantization derivation: correct for typical content
Q2 first_part_header_bits=0 safety: UNSAFE → C1
Q3 num_dct_parts off-by-one: confirmed correct
Q4 field availability: 2 compile failures found (C3 + C4)
Q5 quant_update[s] semantics: signed delta confirmed
Q6 SHOW_FRAME unconditional: safe for BBB scope
Q7 buffer order independence: confirmed
Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. Review pass is the right path forward per memory
rule "Reviews are never skippable" — empty-review value =
empirical-verification value, regardless of finding count.
Refs:
phase4_iter3_plan.md (amended in-place; Phase 5 amendments
section appended)
phase2_iter3_situation.md (amended C3 globally)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -78,7 +78,7 @@ VAAPI VP8 sends FOUR distinct per-frame buffer types (per `va_dec_vp8.h:71-241`)
|
||||
|---|---|---|
|
||||
| `VAPictureParameterBufferType` | `VAPictureParameterBufferVP8` | once |
|
||||
| `VASliceParameterBufferType` | `VASliceParameterBufferVP8` | once (frame-mode) |
|
||||
| `VAProbabilityDataBufferType` | `VAProbabilityDataBufferVP8` | once |
|
||||
| `VAProbabilityBufferType` | `VAProbabilityDataBufferVP8` | once |
|
||||
| `VAIQMatrixBufferType` | `VAIQMatrixBufferVP8` | once |
|
||||
| `VASliceDataBufferType` | raw bitstream | once |
|
||||
|
||||
@@ -87,13 +87,13 @@ VAAPI VP8 sends FOUR distinct per-frame buffer types (per `va_dec_vp8.h:71-241`)
|
||||
- `VAPictureParameterBufferType` (lines 85-113) — switch over profile; MPEG-2/H.264/HEVC handled. Default → break (silent ignore). Bug: no VP8 case.
|
||||
- `VASliceParameterBufferType` (lines 115-146) — switch; H.264/HEVC handled. Bug: no MPEG-2 case (intentional — MPEG-2 has only Picture + Quant + Slice-data per VAAPI), no VP8 case.
|
||||
- `VAIQMatrixBufferType` (lines 148-179) — switch; MPEG-2/H.264/HEVC handled. Bug: no VP8 case.
|
||||
- `VAProbabilityDataBufferType` — NOT IN THE OUTER SWITCH. VAAPI defines this enum value for VP8, but the fork's `codec_store_buffer` outer switch doesn't list it. Currently falls through to `default: break;` at line 181. Bug: VAProbabilityDataBufferType case missing entirely.
|
||||
- `VAProbabilityBufferType` — NOT IN THE OUTER SWITCH. VAAPI defines this enum value for VP8, but the fork's `codec_store_buffer` outer switch doesn't list it. Currently falls through to `default: break;` at line 181. Bug: VAProbabilityBufferType case missing entirely.
|
||||
|
||||
**Patch shape**: 4 nested case adds + 1 outer-case add:
|
||||
- VAPictureParameterBufferType → add VP8 case → memcpy into `surface_object->params.vp8.picture`
|
||||
- VASliceParameterBufferType → add VP8 case → memcpy into `surface_object->params.vp8.slice` (single, no slices[] array — VP8 is frame-mode)
|
||||
- VAIQMatrixBufferType → add VP8 case → memcpy into `surface_object->params.vp8.iqmatrix` + set `iqmatrix_set` true
|
||||
- NEW outer case `VAProbabilityDataBufferType` → switch over profile → VP8 case → memcpy into `surface_object->params.vp8.probability` + set `probability_set` true
|
||||
- NEW outer case `VAProbabilityBufferType` → switch over profile → VP8 case → memcpy into `surface_object->params.vp8.probability` + set `probability_set` true
|
||||
|
||||
### B8 — `src/picture.c::RequestBeginPicture` — no per-frame VP8 reset needed (probably)
|
||||
|
||||
@@ -144,7 +144,7 @@ struct {
|
||||
- `src/context.c` — VP8 has no DECODE_MODE/START_CODE menus per Phase 0 V4L2 inventory. iter2's HEVC additions to context.c have no analog. **No context.c changes.**
|
||||
- `src/video.c::formats[]` — the format list is CAPTURE-side (NV12 + Sunxi NV12). VP8 is OUTPUT-side; OUTPUT format probing is `v4l2_find_format()` calls in config.c, NOT video.c. **No video.c changes.**
|
||||
- `src/v4l2.c` — `v4l2_find_format()` is fourcc-agnostic. **No v4l2.c changes.**
|
||||
- `src/buffer.c` — `VAProbabilityDataBufferType` is a standard VAAPI buffer type; the buffer registry is type-agnostic. **No buffer.c changes.**
|
||||
- `src/buffer.c` — `VAProbabilityBufferType` is a standard VAAPI buffer type; the buffer registry is type-agnostic. **No buffer.c changes.**
|
||||
- `include/hevc-ctrls.h` — already a 9-line shim including `<linux/v4l2-controls.h>`. VP8's V4L2_CID_STATELESS_VP8_FRAME is in the same kernel UAPI header (line 1900). No header-shim work like iter1's `mpeg2-ctrls.h` deletion.
|
||||
|
||||
## Contract surface (verbatim from kernel UAPI + VAAPI)
|
||||
|
||||
+69
-30
@@ -135,20 +135,25 @@ For segment 0 (always present):
|
||||
```c
|
||||
if (iqmatrix_set) {
|
||||
frame.quant.y_ac_qi = iqmatrix->quantization_index[0][0];
|
||||
frame.quant.y_dc_delta = (s8)(iqmatrix->quantization_index[0][1] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.y2_dc_delta = (s8)(iqmatrix->quantization_index[0][2] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.y2_ac_delta = (s8)(iqmatrix->quantization_index[0][3] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.uv_dc_delta = (s8)(iqmatrix->quantization_index[0][4] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.uv_ac_delta = (s8)(iqmatrix->quantization_index[0][5] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.y_dc_delta = (int8_t)(iqmatrix->quantization_index[0][1] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.y2_dc_delta = (int8_t)(iqmatrix->quantization_index[0][2] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.y2_ac_delta = (int8_t)(iqmatrix->quantization_index[0][3] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.uv_dc_delta = (int8_t)(iqmatrix->quantization_index[0][4] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.quant.uv_ac_delta = (int8_t)(iqmatrix->quantization_index[0][5] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
}
|
||||
/* if iqmatrix_set==false: frame.quant stays zero from memset; safe
|
||||
* default for fixtures with implicit-zero deltas (most VP8 streams,
|
||||
* including BBB). */
|
||||
* including BBB).
|
||||
*
|
||||
* Phase 5 C4: cast is (int8_t), NOT (s8) — the latter is a kernel-
|
||||
* internal typedef from <linux/types.h> not exposed to userspace.
|
||||
* Userspace UAPI exposes __s8 (double-underscore). int8_t from
|
||||
* <stdint.h> is the portable userspace equivalent. */
|
||||
```
|
||||
|
||||
For segments 1..3 (only used when segmentation enabled):
|
||||
@@ -156,7 +161,7 @@ For segments 1..3 (only used when segmentation enabled):
|
||||
```c
|
||||
if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
|
||||
for (i = 1; i < 4; i++)
|
||||
frame.segment.quant_update[i] = (s8)
|
||||
frame.segment.quant_update[i] = (int8_t)
|
||||
(iqmatrix->quantization_index[i][0] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
}
|
||||
@@ -190,7 +195,7 @@ frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;
|
||||
/* segment.lf_update[] populated only when segmentation enabled */
|
||||
if (picture->pic_fields.bits.segmentation_enabled) {
|
||||
for (i = 0; i < 4; i++)
|
||||
frame.segment.lf_update[i] = (s8)
|
||||
frame.segment.lf_update[i] = (int8_t)
|
||||
(picture->loop_filter_level[i] -
|
||||
picture->loop_filter_level[0]);
|
||||
}
|
||||
@@ -219,11 +224,14 @@ if (probability_set) {
|
||||
probability->dct_coeff_probs,
|
||||
sizeof frame.entropy.coeff_probs);
|
||||
}
|
||||
/* If probability_set==false: leave coeff_probs zero from memset.
|
||||
* Most consumers will send VAProbabilityDataBuffer per frame; if not,
|
||||
* kernel hantro driver re-derives from default tables. Phase 5
|
||||
* review will flag this as a fidelity gap if Phase 7 mpv consumer
|
||||
* doesn't send probability buffers. */
|
||||
/* Phase 5 S3 correction: if probability_set==false, decode produces
|
||||
* GARBAGE. hantro_vp8.c::hantro_vp8_prob_update reads
|
||||
* entropy.coeff_probs unconditionally; there is NO default-table
|
||||
* fallback in the kernel driver. Practical risk is low — FFmpeg
|
||||
* vaapi_vp8.c:146-148 always sends VAProbabilityBufferType per
|
||||
* frame, and mpv-vaapi VP8 uses ff_vp8_vaapi_hwaccel under the hood.
|
||||
* Add an assert() so any consumer violation surfaces immediately. */
|
||||
assert(probability_set); /* runtime guard per Phase 5 S3 */
|
||||
```
|
||||
|
||||
**Anchors**:
|
||||
@@ -239,16 +247,29 @@ frame.coder_state.value = picture->bool_coder_ctx.value;
|
||||
frame.coder_state.bit_count = picture->bool_coder_ctx.count;
|
||||
/* coder_state.padding stays zero from memset */
|
||||
|
||||
frame.first_part_size = slice->partition_size[0];
|
||||
frame.first_part_header_bits = 0;
|
||||
/* first_part_header_bits: VAAPI doesn't expose this directly. FFmpeg
|
||||
* derives from internal parser bit-cursor state. Phase 3 keyframe
|
||||
* empirical = 6550, inter ranges 86..254. Leaving 0 here is a known
|
||||
* fidelity gap; kernel hantro_vp8.c re-parses the uncompressed header
|
||||
* and may not depend on this field for correctness. Phase 7 byte-
|
||||
* compare will reveal whether Phase 5 review needs to re-open this
|
||||
* with bitstream-side derivation (closest VAAPI analog: slice->
|
||||
* macroblock_offset, but it's the MB-data offset not header-bits). */
|
||||
/* Phase 5 C1+C2: macroblock_offset IS first_part_header_bits by
|
||||
* source identity (vaapi_vp8.c:204 and v4l2_request_vp8.c:83 set
|
||||
* the field with the identical 3-input formula
|
||||
* 8 * (input - data) - bit_count - 8
|
||||
* — VAAPI just renames it). Kernel hantro_g1_vp8_dec.c:260 and
|
||||
* rockchip_vpu2_hw_vp8_dec.c:372 read it unconditionally to compute
|
||||
* the macroblock-data DMA offset; leaving 0 places hardware at the
|
||||
* wrong offset → garbage decode. NOT cosmetic.
|
||||
*
|
||||
* Phase 5 C2: VAAPI's slice->partition_size[0] is the REMAINING
|
||||
* bytes of the control partition AFTER ceil(macroblock_offset/8)
|
||||
* bytes already consumed by the application's bitstream parser
|
||||
* (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec comment
|
||||
* confirms). Kernel needs the TOTAL control partition size, so
|
||||
* recover by adding back the header bytes: */
|
||||
frame.first_part_header_bits = slice->macroblock_offset;
|
||||
frame.first_part_size = slice->partition_size[0] +
|
||||
((uint32_t)slice->macroblock_offset + 7) / 8;
|
||||
/* Phase 3 keyframe verbatim cross-check:
|
||||
* first_part_size = 22742 bytes
|
||||
* first_part_header_bits = 6550 bits → ceil(6550/8) = 819 bytes
|
||||
* VAAPI partition_size[0] = 22742 - 819 = 21923 (predicted)
|
||||
* reconstructed = 21923 + 819 = 22742 ✓ */
|
||||
|
||||
for (i = 0; i < 8; i++)
|
||||
frame.dct_part_sizes[i] = slice->partition_size[i + 1];
|
||||
@@ -322,7 +343,7 @@ Predicted ~200 LOC for vp8.c (the 10 clauses), ~40 LOC for vp8.h, +2 lines meson
|
||||
5 sites:
|
||||
1. `picture.c:34-36` include block: add `#include "vp8.h"`.
|
||||
2. `picture.c::codec_set_controls` (line ~218 after HEVCMain case): add VP8 dispatch case calling `vp8_set_controls`.
|
||||
3. `picture.c::codec_store_buffer` (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case `VAProbabilityDataBufferType` with VP8 inner case.
|
||||
3. `picture.c::codec_store_buffer` (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case `VAProbabilityBufferType` with VP8 inner case.
|
||||
4. `picture.c::RequestBeginPicture` (line ~300): add 2 reset lines for `params.vp8.iqmatrix_set = false;` and `params.vp8.probability_set = false;`.
|
||||
5. `surface.h::object_surface::params` union (line ~112): insert vp8 struct after h265 (~10 LOC).
|
||||
|
||||
@@ -372,7 +393,7 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f
|
||||
|
||||
3. **`first_part_header_bits = 0` fallback**: is this actually safe for hantro? Reviewer should grep kernel `hantro_vp8.c` + `rkvdec_vp8.c` for any read of `first_part_header_bits` in the decode hot path. If used, the gap blocks correctness; if unused (re-parsed from bitstream), the gap is cosmetic.
|
||||
|
||||
4. **Probability buffer ordering**: VAAPI's mpv consumer might send VAProbabilityDataBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's `codec_store_buffer` is order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation.
|
||||
4. **Probability buffer ordering**: VAAPI's mpv consumer might send VAProbabilityBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's `codec_store_buffer` is order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation.
|
||||
|
||||
5. **Endianness assumption**: VAAPI structs use host-byte-order; kernel UAPI structs use host-byte-order; both little-endian on aarch64. No byte-swap needed — confirm review.
|
||||
|
||||
@@ -397,3 +418,21 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f
|
||||
- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter3_phase3/`.
|
||||
- Memory rules carry forward; `feedback_fresnel_hostname` added.
|
||||
- Phase 4 plan ready for sonnet-architect review (Phase 5).
|
||||
|
||||
## Phase 5 amendments incorporated (post-review)
|
||||
|
||||
Sonnet-architect review (`phase5_iter3_review.md`) returned 4 Critical findings + 1 Suggested correction. All 4 Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2. Amendments applied in-place to this plan:
|
||||
|
||||
| ID | Site | Change | Verified |
|
||||
|---|---|---|---|
|
||||
| C1 | Clause 9 | `frame.first_part_header_bits = slice->macroblock_offset;` (kernel reads it unconditionally per `hantro_g1_vp8_dec.c:260`) | ✓ grep + source identity |
|
||||
| C2 | Clause 9 | `frame.first_part_size = slice->partition_size[0] + ((macroblock_offset+7)/8)` (recover total partition size from VAAPI's post-parse remainder per `vaapi_vp8.c:209`) | ✓ Phase 3 numerics: 21923+819=22742 |
|
||||
| C3 | Phase 2 B7 + Phase 4 Commit C | `VAProbabilityDataBufferType` → `VAProbabilityBufferType` (correct enum constant from `va.h:2058`; the "Data" suffix is the struct name not the enum) | ✓ grep |
|
||||
| C4 | Clauses 6+7 | `(s8)` → `(int8_t)` (kernel-internal `s8` not exposed to userspace; UAPI exposes `__s8`; portable userspace cast is `int8_t` from `<stdint.h>`) | ✓ grep `/usr/include/asm-generic/int-ll64.h:20` |
|
||||
| S3 | Clause 8 | Comment corrected (kernel has NO probability-table fallback); added `assert(probability_set)` runtime guard | ✓ source-read `hantro_vp8.c:49-143` |
|
||||
|
||||
Without the Phase 5 review, iter3 Phase 6 would have:
|
||||
- **Failed first compile attempt** (C3 + C4) — ~2 fix-forward commits required
|
||||
- **Then produced visible-but-wrong output** (C1 + C2) — Phase 7 byte-compare would have diverged, triggering Phase 7 → Phase 4 loopback for hardware-DMA-offset bug
|
||||
|
||||
Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-forward commits. The review pass is the right path forward per memory rule "Reviews are never skippable."
|
||||
|
||||
@@ -0,0 +1,228 @@
|
||||
# Iteration 3 — Phase 5 (sonnet-architect review)
|
||||
|
||||
Second-model review of `phase4_iter3_plan.md` (commit `2918dda`) by sonnet-architect on 2026-05-08. Reviewer ran cold-context source reads of fork files + reference sources + kernel hantro driver + VAAPI headers + verbatim Phase 3 baseline. All Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2.
|
||||
|
||||
## Verdict
|
||||
|
||||
**BLOCK** — 4 Critical findings, all blocking iter3 close until amended:
|
||||
- C1, C2: latent decode-correctness failures (wrong hardware DMA addresses) → criterion-4 byte-compare divergence
|
||||
- C3, C4: Phase 6 build failures (undefined identifier + non-existent typedef in userspace)
|
||||
|
||||
All 4 amendments accepted into the amended Phase 4 plan.
|
||||
|
||||
## Critical findings
|
||||
|
||||
### C1 — `first_part_header_bits = 0` is a correctness bug, NOT a cosmetic gap
|
||||
|
||||
**Plan claim** (Clause 9): "kernel hantro_vp8.c re-parses the uncompressed header and may not depend on this field." Marked as a known fidelity gap, leaving it 0.
|
||||
|
||||
**Empirical evidence** (verified by author):
|
||||
|
||||
```
|
||||
references/linux-mainline/drivers/media/platform/verisilicon/hantro_g1_vp8_dec.c:260:
|
||||
hdr->first_part_header_bits + 8;
|
||||
|
||||
references/linux-mainline/drivers/media/platform/verisilicon/rockchip_vpu2_hw_vp8_dec.c:372:
|
||||
hdr->first_part_header_bits + 8;
|
||||
```
|
||||
|
||||
Both kernel VP8 decoders (G1 on RK3399 + VPU2 on RK3568) read `hdr->first_part_header_bits` unconditionally to compute the macroblock-data DMA offset:
|
||||
|
||||
```c
|
||||
mb_offset_bits = first_part_offset * 8 + hdr->first_part_header_bits + 8;
|
||||
mb_offset_bytes = mb_offset_bits / 8;
|
||||
mb_size = hdr->first_part_size - (mb_offset_bytes - first_part_offset) + ...;
|
||||
```
|
||||
|
||||
Setting this to 0 places the hardware at the wrong DMA offset for ALL macroblock data. The decode produces garbage or hangs.
|
||||
|
||||
**Fix** (verified by source identity):
|
||||
|
||||
```c
|
||||
frame.first_part_header_bits = slice->macroblock_offset;
|
||||
```
|
||||
|
||||
**Proof** (author verified via grep):
|
||||
|
||||
```
|
||||
references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c:204:
|
||||
.macroblock_offset = (8 * (s->coder_state_at_header_end.input - data) -
|
||||
s->coder_state_at_header_end.bit_count - 8),
|
||||
|
||||
references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp8.c:83:
|
||||
.first_part_header_bits = (8 * (s->coder_state_at_header_end.input - data) -
|
||||
s->coder_state_at_header_end.bit_count - 8),
|
||||
```
|
||||
|
||||
Identical formulas, identical inputs. `slice->macroblock_offset` IS `first_part_header_bits` by source identity.
|
||||
|
||||
**Anchor**: VAAPI `va_dec_vp8.h:186` confirms `macroblock_offset` field exists in `VASliceParameterBufferVP8`. Phase 3 keyframe verbatim: `first_part_header_bits = 6550`.
|
||||
|
||||
### C2 — `frame.first_part_size = slice->partition_size[0]` is short by `ceil(macroblock_offset/8)` bytes
|
||||
|
||||
**Plan claim** (Clause 9): `frame.first_part_size = slice->partition_size[0]`.
|
||||
|
||||
**Empirical evidence**:
|
||||
|
||||
```
|
||||
references/ffmpeg-kwiboo/libavcodec/vaapi_vp8.c:209:
|
||||
sp.partition_size[0] = s->header_partition_size - ((sp.macroblock_offset + 7) / 8);
|
||||
```
|
||||
|
||||
VAAPI's `partition_size[0]` is the **remaining** bytes of the control partition AFTER the header-bits portion (`ceil(macroblock_offset / 8)` bytes consumed up front). Confirmed by VAAPI spec comment at `va_dec_vp8.h:193-196`: "remaining bytes of control partition after parsed by application."
|
||||
|
||||
But kernel needs the TOTAL control partition size:
|
||||
|
||||
```
|
||||
hantro_g1_vp8_dec.c:265: mb_size = hdr->first_part_size - (mb_offset_bytes - first_part_offset) + ...;
|
||||
hantro_g1_vp8_dec.c:293: dct_part_offset = first_part_offset + hdr->first_part_size;
|
||||
```
|
||||
|
||||
`first_part_size` here is the TOTAL control partition size (= FFmpeg's `s->header_partition_size`).
|
||||
|
||||
**Numerical verification** (Phase 3 frame-1 keyframe verbatim):
|
||||
- Kernel `first_part_size = 22742` bytes
|
||||
- `macroblock_offset` = `first_part_header_bits` = 6550 bits → `ceil(6550/8) = 819` bytes
|
||||
- VAAPI's `partition_size[0]` = `22742 - 819 = 21923` bytes
|
||||
- Plan-as-written would set `first_part_size = 21923` → DCT partitions begin 819 bytes too early → garbage output
|
||||
|
||||
**Fix**:
|
||||
|
||||
```c
|
||||
frame.first_part_size = slice->partition_size[0] +
|
||||
((uint32_t)slice->macroblock_offset + 7) / 8;
|
||||
```
|
||||
|
||||
### C3 — `VAProbabilityDataBufferType` does not exist; will fail Phase 6 compile
|
||||
|
||||
**Plan refers to** (Phase 2 B7, Phase 4 Commit C): `case VAProbabilityDataBufferType:` in `codec_store_buffer` outer switch.
|
||||
|
||||
**Empirical evidence** (verified by grep):
|
||||
|
||||
```
|
||||
reference/libva/va/va.h:2058: VAProbabilityBufferType = 13,
|
||||
reference/libva/va/va_dec_vp8.h:218: typedef struct _VAProbabilityDataBufferVP8 {
|
||||
```
|
||||
|
||||
The plan confused the buffer-type enum constant (`VAProbabilityBufferType`) with the buffer-data struct name (`VAProbabilityDataBufferVP8`). Switch case must use the enum constant.
|
||||
|
||||
**Fix**: replace `VAProbabilityDataBufferType` → `VAProbabilityBufferType` throughout (Phase 2 B7 + Phase 4 Commit C site list).
|
||||
|
||||
**Cross-check**: `vaapi_vp8.c:147` confirms FFmpeg's VAAPI encoder uses `VAProbabilityBufferType` correctly.
|
||||
|
||||
### C4 — `(s8)` cast undefined in userspace
|
||||
|
||||
**Plan code** (Clauses 6, 7):
|
||||
|
||||
```c
|
||||
frame.quant.y_dc_delta = (s8)(iqmatrix->quantization_index[0][1] -
|
||||
iqmatrix->quantization_index[0][0]);
|
||||
frame.segment.quant_update[i] = (s8)(...);
|
||||
```
|
||||
|
||||
**Empirical evidence** (verified by grep):
|
||||
|
||||
```
|
||||
/usr/include/asm-generic/int-ll64.h:20: typedef __signed__ char __s8;
|
||||
```
|
||||
|
||||
The kernel UAPI exposes `__s8` (with leading double-underscore prefix). Bare `s8` is a kernel-internal typedef from `linux/types.h` — NOT exposed to userspace. Compile fails: undefined identifier.
|
||||
|
||||
**Fix**: replace `(s8)` with `(int8_t)` from `<stdint.h>`. Or drop the explicit cast entirely — implicit conversion to `__s8` is well-defined for values in `[-128, 127]` which the BBB content stays well within.
|
||||
|
||||
## Suggested improvements
|
||||
|
||||
### S1 — Quantization derivation correctness for non-BBB content (author's Q1)
|
||||
|
||||
The plan's `iqmatrix[0][N] - iqmatrix[0][0]` subtraction is correct **for streams where no clip saturation occurs**. Verified via `vaapi_vp8.c:157-162`: VAAPI fills `quantization_index[0][N] = clip(base_quant[0] + yac_qi + delta_N)`.
|
||||
|
||||
Edge case: when `yac_qi + delta_N` saturates at 0 or 127 in VAAPI, the subtraction can't recover the original signed delta. BBB does not exercise this path. Document as known fidelity gap; not blocking iter3 close.
|
||||
|
||||
### S2 — `segment.quant_update[s]` semantics (author's Q5)
|
||||
|
||||
VAAPI: `iqmatrix[s][0] = clip(base_quant[s] + yac_qi)`. Plan: `quant_update[s] = iqmatrix[s][0] - iqmatrix[0][0]` = `base_quant[s] - base_quant[0]` (when no clip).
|
||||
|
||||
When `base_quant[0] = 0` (typical), this matches FFmpeg V4L2's `quant_update[i] = s->segmentation.base_quant[i]`. When `base_quant[0] != 0`, plan's formula produces `base_quant[s] - base_quant[0]` not `base_quant[s]`. BBB has segmentation disabled; flag as known fidelity gap for non-standard streams.
|
||||
|
||||
### S3 — Probability buffer fallback comment is factually wrong
|
||||
|
||||
**Plan Clause 8 comment**: "if not, kernel hantro driver re-derives from default tables."
|
||||
|
||||
This is incorrect. `hantro_vp8.c::hantro_vp8_prob_update` (lines 49-143) reads `entropy->coeff_probs` unconditionally — there is no default-table fallback in the driver. If `probability_set==false` and `coeff_probs` is zero, decode produces garbage.
|
||||
|
||||
Practical risk is low (`vaapi_vp8.c:146-148` shows FFmpeg VAAPI always sends `VAProbabilityBufferType` per frame, so mpv-vaapi via FFmpeg should always set probability_set), but the plan comment should be corrected and an `assert(probability_set)` added (or a logged warning) so any miss surfaces immediately during Phase 6 runtime.
|
||||
|
||||
**Recommended amendment**: change the comment, add the runtime assertion.
|
||||
|
||||
### S4 — `lf.level` source for segmentation-enabled streams
|
||||
|
||||
**Plan**: `frame.lf.level = picture->loop_filter_level[0]`.
|
||||
|
||||
`vaapi_vp8.c:87-95`: VAAPI fills `pp.loop_filter_level[i] = filter_level[i] + filter.level` (effective level for segment i). For segmentation-disabled BBB, `loop_filter_level[0] = filter.level` (the base) — plan is correct. For segmentation-enabled streams in DELTA mode, kernel computes `clamp(lf.level + lf_update[i])`; using effective level for segment 0 as the base produces wrong results for segments 1-3. BBB out-of-scope; document fidelity gap.
|
||||
|
||||
### S5 — `segment.lf_update[0]` is implicitly correct via self-subtraction
|
||||
|
||||
Plan Clause 7 fills `lf_update[i]` for `i = 0..3` as `loop_filter_level[i] - loop_filter_level[0]`. For i=0: yields 0 regardless. Kernel computes `clamp(lf.level + lf_update[0]) = lf.level`. Correct. No issue.
|
||||
|
||||
## Questions
|
||||
|
||||
### Q1 — Does mpv-vaapi consumer always send `VAProbabilityBufferType`?
|
||||
|
||||
`vaapi_vp8.c:146-148` shows FFmpeg's VAAPI encoder always sends it. mpv's libavcodec VP8 VAAPI path uses `ff_vp8_vaapi_hwaccel` which is the same code, so yes — `probability_set` should always be true.
|
||||
|
||||
Recommended: add `assert(probability_set);` debug guard so any miss surfaces in Phase 6 (this is the runtime check from S3).
|
||||
|
||||
### Q2 — `first_part_size` recovery formula for non-byte-aligned `macroblock_offset`
|
||||
|
||||
`((macroblock_offset + 7) / 8)` rounds up. Phase 3 keyframe: `macroblock_offset = 6550` bits → `ceil(6550/8) = 819` bytes. Reconstructed `first_part_size = 21923 + 819 = 22742` ✓. Author should verify this also holds for inter frames in Phase 7 (Phase 3 captured inter `first_part_size = 1218` etc.; Phase 7 byte-compare will validate empirically).
|
||||
|
||||
### Q3 — Which kernel VP8 driver runs on RK3399 (G1 vs VPU2)?
|
||||
|
||||
Both G1 (`hantro_g1_vp8_dec.c`) and VPU2 (`rockchip_vpu2_hw_vp8_dec.c`) use `first_part_header_bits` identically. RK3399 uses G1 (per `hantro_drv.c` machine-data table). C1 + C2 apply regardless. No action.
|
||||
|
||||
## Empirical verification log
|
||||
|
||||
| Claim | Verification | Result |
|
||||
|---|---|---|
|
||||
| `first_part_header_bits` used by hantro G1 | `grep -rn first_part_header_bits references/.../verisilicon/` | Read at `hantro_g1_vp8_dec.c:260` + `rockchip_vpu2_hw_vp8_dec.c:372` |
|
||||
| `macroblock_offset == first_part_header_bits` formula identity | Source compare `vaapi_vp8.c:204-205` vs `v4l2_request_vp8.c:83-84` | Identical formulas |
|
||||
| `partition_size[0] != first_part_size` | `vaapi_vp8.c:209` derivation + Phase 3 numerics | Differ by `ceil(macroblock_offset/8)` = 819 bytes |
|
||||
| `VAProbabilityDataBufferType` non-existent | `grep VAProbability* va.h va_dec_vp8.h` | Only `VAProbabilityBufferType=13` exists; struct is `VAProbabilityDataBufferVP8` |
|
||||
| `s8` userspace typedef | `grep typedef.*s8 /usr/include/` | Only `__s8` (double-underscore) defined; bare `s8` kernel-only |
|
||||
| VAAPI `quantization_index` is absolute | `vaapi_vp8.c:152-163` | Confirmed `clip(base_quant[s] + yac_qi + delta_c)` |
|
||||
| `coeff_probs` direct memcpy safe | Both paths apply `coeff_bands_inverse` reordering | `vaapi_vp8.c:133-143` and `v4l2_request_vp8.c:141-153` use identical reordering — direct memcpy correct |
|
||||
| `num_dct_parts` off-by-one direction | `vaapi_vp8.c:206`: `num_of_partitions = num_coeff_partitions + 1`; `v4l2_request_vp8.c:80`: `num_dct_parts = num_coeff_partitions` | Plan's `num_of_partitions - 1` confirmed correct |
|
||||
| VAAPI `mb_mode_delta` offset baked in | `vaapi_vp8.c:100`: `[i + 4]`; `v4l2_request_vp8.c:109`: `[i + MODE_I4x4]` (= 4) | Same offset; plan's direct copy correct |
|
||||
| KEY_FRAME inversion direction | `vaapi_vp8.c:58`: `key_frame = !s->keyframe` | Plan's `!pic_fields.bits.key_frame` correct |
|
||||
| `hantro_vp8_prob_update` no fallback | Read `hantro_vp8.c:49-143` | Confirmed — comment in plan Clause 8 was wrong |
|
||||
| All VAAPI field names valid | Line-by-line grep against `va_dec_vp8.h:71-241` | All present |
|
||||
|
||||
## Author's 7 review questions — direct answers
|
||||
|
||||
| Q | Answer |
|
||||
|---|---|
|
||||
| Q1 quantization derivation | Correct for typical content (no clip saturation, base_quant[0]=0). See S1+S2 for edge cases. |
|
||||
| Q2 `first_part_header_bits=0` safety | UNSAFE → C1 |
|
||||
| Q3 `num_dct_parts` off-by-one | `num_of_partitions - 1` correct |
|
||||
| Q4 field availability | All present; `(s8)` fails compile (C4); `VAProbabilityDataBufferType` fails compile (C3) |
|
||||
| Q5 `quant_update[s]` semantics | Signed delta; plan's subtraction correct when `base_quant[0]=0` |
|
||||
| Q6 `SHOW_FRAME` unconditional | Safe for BBB scope; documented fidelity gap for invisible-frame content |
|
||||
| Q7 buffer order independence | Confirmed — each `codec_store_buffer` writes a distinct field |
|
||||
|
||||
## Required Phase 4 plan amendments
|
||||
|
||||
| ID | Location | Change |
|
||||
|---|---|---|
|
||||
| C1 | Clause 9, plan-comment + code | `frame.first_part_header_bits = slice->macroblock_offset;` |
|
||||
| C2 | Clause 9, code | `frame.first_part_size = slice->partition_size[0] + ((uint32_t)slice->macroblock_offset + 7) / 8;` |
|
||||
| C3 | Phase 2 B7 + Phase 4 Commit C | `case VAProbabilityBufferType:` (drop "Data") |
|
||||
| C4 | Clauses 6+7 | `(s8)` → `(int8_t)` (or drop cast) |
|
||||
| S3 | Clause 8 comment + add runtime check | Correct comment; add `assert(probability_set)` or warn-log |
|
||||
|
||||
C1 + C2 affect Clause 9 jointly; rewrite as a single block. C3 affects 5 places (Phase 2 B7 + Phase 4 Commit C site list + B7 patch shape). C4 affects ~6 places (Clauses 6+7). S3 affects 2 places.
|
||||
|
||||
## Process discipline note
|
||||
|
||||
This review was conducted with empirical verification of every Critical claim BEFORE inclusion (memory `feedback_review_empirical_over_theoretical.md` Direction 2). Every C1-C4 finding was verified by grep / source-read of multiple files, and the proposed amendments were cross-checked against working code (FFmpeg VAAPI ref, FFmpeg V4L2 ref). Author independently verified each before incorporating into the amended plan.
|
||||
|
||||
The review caught what would have been a 2-loopback Phase 6 failure cycle (compile failures C3+C4) plus a Phase 7 byte-compare divergence (C1+C2 hardware-DMA-offset bug). Without this review pass, iter3 Phase 6 would have failed first compile attempt, then after fix-forward would have produced visible output but with wrong DMA addresses → Phase 7 → Phase 4 loopback.
|
||||
Reference in New Issue
Block a user