656596aa6b
Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.

Critical findings:

  C1  first_part_header_bits = 0 was claimed cosmetic; actually
      UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
      both read this field unconditionally to compute the macroblock
      DMA offset. Setting 0 would place hardware at wrong DMA offset
      for ALL macroblock data → garbage decode.
      Fix: frame.first_part_header_bits = slice->macroblock_offset
      (verified by source identity — vaapi_vp8.c:204 and
      v4l2_request_vp8.c:83 use byte-identical formulas).

  C2  first_part_size = slice->partition_size[0] was wrong; VAAPI's
      partition_size[0] is the REMAINING bytes after parsing
      (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
      Kernel needs the TOTAL control partition size.
      Fix: frame.first_part_size = slice->partition_size[0] +
           ((macroblock_offset + 7) / 8)
      Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.

  C3  VAProbabilityDataBufferType does not exist as a buffer-type
      enum; it's the struct name. The actual enum constant is
      VAProbabilityBufferType (= 13 per va.h:2058). Switch case
      using the wrong identifier would have failed Phase 6 compile.
      Fix: replace globally in phase2 + phase4 docs.

  C4  (s8) cast undefined in userspace. Kernel has 's8' typedef in
      linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
      underscore). Userspace portable cast is int8_t from <stdint.h>.
      Fix: replace (s8) with (int8_t) in Clauses 6+7.

Suggested:

  S3  Clause 8 comment was factually wrong: hantro_vp8.c::
      hantro_vp8_prob_update reads coeff_probs unconditionally;
      there is NO default-table fallback. If probability_set==false,
      decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
      always sends VAProbabilityBufferType per frame), but corrected
      comment + added assert(probability_set) runtime guard for
      immediate Phase 6 surfacing.

  Plus 5 minor S/Q items documented; non-blocking for iter3.

Author's 7 review questions all answered directly in the review:

  Q1  quantization derivation: correct for typical content
  Q2  first_part_header_bits=0 safety: UNSAFE → C1
  Q3  num_dct_parts off-by-one: confirmed correct
  Q4  field availability: 2 compile failures found (C3 + C4)
  Q5  quant_update[s] semantics: signed delta confirmed
  Q6  SHOW_FRAME unconditional: safe for BBB scope
  Q7  buffer order independence: confirmed

Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. Review pass is the right path forward per memory
rule "Reviews are never skippable" — empty-review value =
empirical-verification value, regardless of finding count.

Refs:
  phase4_iter3_plan.md (amended in-place; Phase 5 amendments
  section appended)
  phase2_iter3_situation.md (amended C3 globally)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
377 lines
27 KiB
Markdown
# Iteration 3 — Phase 2 (situation analysis)

Source-read of every file the iter3 patch series will touch, plus the kernel UAPI + VAAPI + downstream FFmpeg + kernel hantro reference sources. Written immediately after iter3 Phase 1 lock (commit `ea2413e`). Conducted on noether against fork tip `8d71e20` (iter2 Phase 6 commit B); fresnel.vpn was unreachable at Phase 2 open, so the read is against the noether mirror — verified at commit hash level pre-read.

This is a contract-before-code analysis per `feedback_dev_process.md` Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, queue the Phase 3 baseline questions.

## Bug enumeration (sites the iter3 patch series must touch)

### B1 — `src/config.c::RequestQueryConfigProfiles` — VP8 enumeration block missing

**Site**: `config.c:121-165`.

**Current state** (lines 128-160): three enumeration blocks for MPEG-2 (lines 128-137), H.264 (139-151), HEVC (153-160). Each calls `v4l2_find_format()` on the OUTPUT-side pixfmt against both single-plane and MPLANE buffer types, then conditionally appends profile constants to the output array under a count guard.

**Bug**: no analogous block for `V4L2_PIX_FMT_VP8_FRAME` → `VAProfileVP8Version0_3`. Without this, `vainfo` (and any consumer that calls `vaQueryConfigProfiles`) sees no VP8 profile in the enumeration → criterion 1 fails before vaCreateConfig is ever attempted.

**Different from iter1+iter2**: iter1 (MPEG-2) and iter2 (HEVC) had the enumeration block already in place pre-iter; only the case label fall-through in `RequestCreateConfig` was missing. iter3 has neither, so both are ADDs.

### B2 — `src/config.c::RequestCreateConfig` — VP8 case label missing entirely

**Site**: `config.c:54-78`.

**Current state**: switch over `profile`. iter1 added `case VAProfileMPEG2Simple/Main:` with explicit `break;` (lines 63-69). iter2 added `case VAProfileHEVCMain:` with `break;` (lines 70-75). H.264 always existed (lines 56-62, marked `// FIXME` from upstream). Default → `VA_STATUS_ERROR_UNSUPPORTED_PROFILE`.

**Bug**: no `case VAProfileVP8Version0_3:`. Hits default → consumer gets `VA_STATUS_ERROR_UNSUPPORTED_PROFILE` from vaCreateConfig → criterion 2 fails.

**Patch shape**: add 4-line case (label + comment + `break;`) directly after the iter2 HEVCMain block, mirroring iter1+iter2 style.

### B3 — `src/config.c::RequestQueryConfigEntrypoints` — VP8 case missing

**Site**: `config.c:167-191`.

**Current state**: switch over `profile`; case list at lines 173-180 covers MPEG-2/H.264/HEVC and falls through to `entrypoints[0] = VAEntrypointVLD; *entrypoints_count = 1;`. Default sets count to 0.

**Bug**: no `case VAProfileVP8Version0_3:`. mpv-vaapi's profile probe queries entry points; without VLD, it skips VP8 → criterion 3 fails (mpv falls through to SW decode silently).

**Patch shape**: add `case VAProfileVP8Version0_3:` to the existing fall-through case list.

### B4 — `src/vp8.c` — file does not exist; needs net-new implementation

**Site**: NEW FILE `src/vp8.c`.

**Bug**: there is no VP8 codec dispatcher in the fork. The fork's predecessor (libva-v4l2-request bootlin master) only implements MPEG-2 + H.264 + HEVC. VP8 was never added upstream.

**Patch shape**: NEW file, ~150-200 lines. Mirror the iter1 mpeg2.c template (`src/mpeg2.c:53-249`):
- Includes block (mpeg2.h-equivalent + context + request + surface + v4l2-controls)
- `vp8_set_controls()` function entry point matching the existing dispatcher signature `(struct request_data *driver_data, struct object_context *context_object, struct object_surface *surface_object) -> int`
- Local `v4l2_ctrl_vp8_frame` struct populated from VAAPI buffers (Picture + IQMatrix + Probability + Slice param)
- DPB-timestamp lookup for `last_frame_ts`/`golden_frame_ts`/`alt_frame_ts` from `VASurfaceID` references in VAPictureParameterBufferVP8
- One-element `v4l2_ext_control` array, single `V4L2_CID_STATELESS_VP8_FRAME` control
- Single `v4l2_set_controls(driver_data->video_fd, surface_object->request_fd, ctrls, 1)` call

### B5 — `src/vp8.h` — header does not exist

**Site**: NEW FILE `src/vp8.h`.

**Bug**: companion header for vp8.c. Declare `vp8_set_controls()`. Mirror `src/mpeg2.h` (forward declarations of `request_data`, `object_context`, `object_surface`, function prototype). No struct definitions needed (no array dimensions to declare like HEVC's `HEVC_MAX_SLICES_PER_FRAME`).

### B6 — `src/picture.c::codec_set_controls` — VP8 dispatch case missing

**Site**: `picture.c:188-225` (function `codec_set_controls`).

**Current state**: switch over profile; MPEG-2 → `mpeg2_set_controls` (lines 196-201), H.264 → `h264_set_controls` (203-212), HEVCMain → `h265_set_controls` (214-218). Default → `VA_STATUS_ERROR_UNSUPPORTED_PROFILE`.

**Bug**: no VP8 case. Hits default after RequestEndPicture → vaEndPicture returns error → consumer aborts decode.

**Patch shape**: add `case VAProfileVP8Version0_3:` calling `vp8_set_controls(driver_data, context_object, surface_object)` with same `if (rc < 0) return VA_STATUS_ERROR_OPERATION_FAILED;` shape as MPEG-2 + HEVC.

Plus include directive update: add `#include "vp8.h"` near `picture.c:34-36` (the existing `h264.h`/`h265.h`/`mpeg2.h` block).

### B7 — `src/picture.c::codec_store_buffer` — 4 VAAPI buffer types unmapped

**Site**: `picture.c:54-186` (function `codec_store_buffer`).

VAAPI VP8 sends FOUR distinct per-frame parameter buffer types (per `va_dec_vp8.h:71-241`), plus the raw slice data buffer:

| VAAPI buffer type | VAAPI struct | Per-frame |
|---|---|---|
| `VAPictureParameterBufferType` | `VAPictureParameterBufferVP8` | once |
| `VASliceParameterBufferType` | `VASliceParameterBufferVP8` | once (frame-mode) |
| `VAProbabilityBufferType` | `VAProbabilityDataBufferVP8` | once |
| `VAIQMatrixBufferType` | `VAIQMatrixBufferVP8` | once |
| `VASliceDataBufferType` | raw bitstream | once |

**Current state**:
- `VASliceDataBufferType` (lines 61-83) — already universal, no per-profile branch. `context->h264_start_code` flag prepends `00 00 01` for H.264 only; VP8 does not need start-code prefix (VP8 has its own 3-byte uncompressed frame header). The slice-data path is fine for VP8 unmodified.
- `VAPictureParameterBufferType` (lines 85-113) — switch over profile; MPEG-2/H.264/HEVC handled. Default → break (silent ignore). Bug: no VP8 case.
- `VASliceParameterBufferType` (lines 115-146) — switch; H.264/HEVC handled. The missing MPEG-2 case is intentional (MPEG-2 has only Picture + Quant + Slice-data per VAAPI). Bug: no VP8 case.
- `VAIQMatrixBufferType` (lines 148-179) — switch; MPEG-2/H.264/HEVC handled. Bug: no VP8 case.
- `VAProbabilityBufferType` — NOT IN THE OUTER SWITCH. VAAPI defines this enum value for VP8, but the fork's `codec_store_buffer` outer switch doesn't list it. Currently falls through to `default: break;` at line 181. Bug: VAProbabilityBufferType case missing entirely.

**Patch shape**: 4 nested VP8 case adds, one of them inside 1 new outer case:
- VAPictureParameterBufferType → add VP8 case → memcpy into `surface_object->params.vp8.picture`
- VASliceParameterBufferType → add VP8 case → memcpy into `surface_object->params.vp8.slice` (single, no slices[] array — VP8 is frame-mode)
- VAIQMatrixBufferType → add VP8 case → memcpy into `surface_object->params.vp8.iqmatrix` + set `iqmatrix_set` true
- NEW outer case `VAProbabilityBufferType` → switch over profile → VP8 case → memcpy into `surface_object->params.vp8.probability` + set `probability_set` true

### B8 — `src/picture.c::RequestBeginPicture` — per-frame VP8 flag resets missing

**Site**: `picture.c:227-306`.

iter1 added `surface_object->params.h264.matrix_set = false;` at line 299. iter2 added `surface_object->params.h265.num_slices = 0;` at line 300.

**Bug analysis**: VP8 has no slice-array (single per-frame), so there is no slice counter to reset. It does have two per-frame flags, `probability_set` and `iqmatrix_set`, and both need a reset at the start of every frame.

**Patch shape**: add two lines:
- `surface_object->params.vp8.iqmatrix_set = false;`
- `surface_object->params.vp8.probability_set = false;`

This mirrors iter1's `matrix_set = false` reset pattern (one reset line per flag).

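The reset-then-store lifecycle of the two flags can be sketched in isolation. This is a minimal sketch under stated assumptions: `struct vp8_flags` and every helper name below are hypothetical stand-ins, not the fork's actual `object_surface` layout. The `vp8_guard()` helper follows the `assert(probability_set)` guard described in review item S3; `vp8_buffers_ready()` is an added, non-aborting variant for callers that prefer an error code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags in surface_object->params.vp8. */
struct vp8_flags {
	bool iqmatrix_set;
	bool probability_set;
};

/* RequestBeginPicture side: forget buffers left over from the last frame. */
static void vp8_begin_picture(struct vp8_flags *f)
{
	f->iqmatrix_set = false;
	f->probability_set = false;
}

/* codec_store_buffer side: mark a buffer as received for this frame. */
static void vp8_store_iqmatrix(struct vp8_flags *f)
{
	f->iqmatrix_set = true;
}

static void vp8_store_probability(struct vp8_flags *f)
{
	f->probability_set = true;
}

/* Dispatcher side: 0 when both per-frame buffers arrived, -1 otherwise. */
static int vp8_buffers_ready(const struct vp8_flags *f)
{
	return (f->iqmatrix_set && f->probability_set) ? 0 : -1;
}

/* Guard in the style of review item S3: aborts when no probability
 * buffer was stored for the current frame. */
static void vp8_guard(const struct vp8_flags *f)
{
	assert(f->probability_set);
}
```

Without the reset in `vp8_begin_picture()`, a reused surface would still report both flags as set from its previous frame, which is the stale-state case the two new lines are meant to rule out.
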
### B9 — `src/surface.h::object_surface::params` union — no `vp8` member

**Site**: `surface.h:92-113`.

**Current state**: union of three structs: `mpeg2`, `h264`, `h265`. Each holds the buffer-type structs the dispatcher reads.

**Bug**: no `vp8` member. iter1 B3 latent surface-reuse bug (per phase0_findings_iter3.md): `picture.c:299` writes byte 240 of the union (h264.matrix_set offset). The iter2 union is dominated by h265 with its 64-slot `slices[]` array; total union size ~17 KB. Adding a `vp8` member doesn't grow the union (h265 is the dominant member by far).

**Patch shape**: add `vp8` struct after `h265`:
```c
struct {
    VAPictureParameterBufferVP8 picture;
    VASliceParameterBufferVP8 slice;
    VAIQMatrixBufferVP8 iqmatrix;
    bool iqmatrix_set;
    VAProbabilityDataBufferVP8 probability;
    bool probability_set;
} vp8;
```

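The claim that the new member does not grow the union can be checked at compile time. The sketch below is illustrative only: `struct h265_params`, `struct vp8_params`, and `union surface_params` are hypothetical stand-ins, and every size is an assumed placeholder except `dct_coeff_probs[4][8][3][11]` and `quantization_index[4][6]`, whose dimensions are quoted later in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the dominant member: 64 slice slots of an assumed
 * 256 bytes each, plus the slice counter. */
struct h265_params {
	uint8_t slices[64][256];
	unsigned int num_slices;
};

/* Stand-in for the proposed vp8 member. */
struct vp8_params {
	uint8_t picture[256];              /* assumed upper bound */
	uint8_t slice[64];                 /* assumed upper bound */
	uint16_t quantization_index[4][6]; /* VAIQMatrixBufferVP8 */
	bool iqmatrix_set;
	uint8_t dct_coeff_probs[4][8][3][11]; /* VAProbabilityDataBufferVP8 */
	bool probability_set;
};

union surface_params {
	struct h265_params h265;
	struct vp8_params vp8;
};

/* A union is as large as its largest member, so the build fails here
 * if vp8 ever outgrows h265. */
static_assert(sizeof(union surface_params) == sizeof(struct h265_params),
	      "vp8 member must not grow the union");
```

A similar `static_assert` against the real types in `surface.h` would turn the "doesn't grow the union" statement into a build-time check rather than an estimate.
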
### B10 — `src/meson.build` — `vp8.c` + `vp8.h` not in sources/headers

**Site**: `meson.build:30-74`.

**Current state**: `sources` list has `mpeg2.c`/`h264.c`/`h264_slice_header.c`/`h265.c` (line 50, uncommented in iter2). `headers` list has `mpeg2.h`/`h264.h`/`h264_slice_header.h`/`h265.h` (line 73).

**Bug**: vp8.c + vp8.h are NEW files, must be ADDED.

**Patch shape**: insert `'vp8.c'` after `'h265.c'` in sources, insert `'vp8.h'` after `'h265.h'` in headers.

### Non-bugs (intentionally NOT touched)

- `src/context.c` — VP8 has no DECODE_MODE/START_CODE menus per Phase 0 V4L2 inventory. iter2's HEVC additions to context.c have no analog. **No context.c changes.**
- `src/video.c::formats[]` — the format list is CAPTURE-side (NV12 + Sunxi NV12). VP8 is OUTPUT-side; OUTPUT format probing is `v4l2_find_format()` calls in config.c, NOT video.c. **No video.c changes.**
- `src/v4l2.c` — `v4l2_find_format()` is fourcc-agnostic. **No v4l2.c changes.**
- `src/buffer.c` — `VAProbabilityBufferType` is a standard VAAPI buffer type; the buffer registry is type-agnostic. **No buffer.c changes.**
- `include/hevc-ctrls.h` — already a 9-line shim including `<linux/v4l2-controls.h>`. VP8's V4L2_CID_STATELESS_VP8_FRAME is in the same kernel UAPI header (line 1900). No header-shim work like iter1's `mpeg2-ctrls.h` deletion.

## Contract surface (verbatim from kernel UAPI + VAAPI)

### Kernel UAPI: `V4L2_CID_STATELESS_VP8_FRAME`

`<linux/v4l2-controls.h>:1900` — `V4L2_CID_STATELESS_VP8_FRAME = V4L2_CID_CODEC_STATELESS_BASE + 200 = 0x00a409c8`. Matches the per-device control advertised by hantro-vpu-dec in Phase 0 V4L2 inventory (`vp8_frame_parameters 0x00a409c8`).

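The control ID arithmetic can be reproduced with local copies of the UAPI constants. In this sketch the `EX_`-prefixed names are hypothetical; the class value `0x00a40000` and the `| 0x900` base offset are taken from the upstream `<linux/v4l2-controls.h>` definitions of `V4L2_CTRL_CLASS_CODEC_STATELESS` and `V4L2_CID_CODEC_STATELESS_BASE`, not from this document, so treat them as an assumption to confirm against the header in use.

```c
#include <assert.h>

/* Local copies so the example builds without kernel headers. */
#define EX_CTRL_CLASS_CODEC_STATELESS 0x00a40000u
#define EX_CID_CODEC_STATELESS_BASE   (EX_CTRL_CLASS_CODEC_STATELESS | 0x900u)
#define EX_CID_STATELESS_VP8_FRAME    (EX_CID_CODEC_STATELESS_BASE + 200u)

/* 0x900 + 200 (0xc8) = 0x9c8, giving the ID seen in the Phase 0 inventory. */
static_assert(EX_CID_STATELESS_VP8_FRAME == 0x00a409c8u,
	      "VP8 frame control ID mismatch");
```
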
### Kernel UAPI: `struct v4l2_ctrl_vp8_frame` (`<linux/v4l2-controls.h>:1929-1958`)

```c
struct v4l2_ctrl_vp8_frame {
    struct v4l2_vp8_segment segment;          /* offset 0 */
    struct v4l2_vp8_loop_filter lf;           /* loop filter parameters */
    struct v4l2_vp8_quantization quant;       /* base quant indices */
    struct v4l2_vp8_entropy entropy;          /* update probabilities */
    struct v4l2_vp8_entropy_coder_state coder_state;

    __u16 width;
    __u16 height;

    __u8 horizontal_scale;
    __u8 vertical_scale;

    __u8 version;
    __u8 prob_skip_false;
    __u8 prob_intra;
    __u8 prob_last;
    __u8 prob_gf;
    __u8 num_dct_parts;

    __u32 first_part_size;
    __u32 first_part_header_bits;
    __u32 dct_part_sizes[8];

    __u64 last_frame_ts;
    __u64 golden_frame_ts;
    __u64 alt_frame_ts;

    __u64 flags;
};
```

Sub-structs (`<linux/v4l2-controls.h>:1785-1888`):

- `v4l2_vp8_segment`: `__s8 quant_update[4]; __s8 lf_update[4]; __u8 segment_probs[3]; __u8 padding; __u32 flags;` (segment-id probabilities, per-segment quant/lf overrides, flags `V4L2_VP8_SEGMENT_FLAG_{ENABLED, UPDATE_MAP, UPDATE_FEATURE_DATA, DELTA_VALUE_MODE}`)
- `v4l2_vp8_loop_filter`: `__s8 ref_frm_delta[4]; __s8 mb_mode_delta[4]; __u8 sharpness_level; __u8 level; __u16 padding; __u32 flags;` (flags `V4L2_VP8_LF_{ADJ_ENABLE, DELTA_UPDATE, FILTER_TYPE_SIMPLE}`)
- `v4l2_vp8_quantization`: `__u8 y_ac_qi; __s8 y_dc_delta; __s8 y2_dc_delta; __s8 y2_ac_delta; __s8 uv_dc_delta; __s8 uv_ac_delta; __u16 padding;` — base values; per-segment overrides come from `segment.quant_update[]`
- `v4l2_vp8_entropy`: `__u8 coeff_probs[4][8][3][11]; __u8 y_mode_probs[4]; __u8 uv_mode_probs[3]; __u8 mv_probs[2][19]; __u8 padding[3];` — probability update tables
- `v4l2_vp8_entropy_coder_state`: `__u8 range; __u8 value; __u8 bit_count; __u8 padding;` — boolean coder state at end of header

Frame flags (`<linux/v4l2-controls.h>:1890-1895`):

- `V4L2_VP8_FRAME_FLAG_KEY_FRAME = 0x01`
- `V4L2_VP8_FRAME_FLAG_EXPERIMENTAL = 0x02`
- `V4L2_VP8_FRAME_FLAG_SHOW_FRAME = 0x04`
- `V4L2_VP8_FRAME_FLAG_MB_NO_SKIP_COEFF = 0x08`
- `V4L2_VP8_FRAME_FLAG_SIGN_BIAS_GOLDEN = 0x10`
- `V4L2_VP8_FRAME_FLAG_SIGN_BIAS_ALT = 0x20`

### VAAPI buffer types (`/home/mfritsche/src/ohm_gl_fix/phase6/step1/reference/libva/va/va_dec_vp8.h`)

`VAPictureParameterBufferVP8` (lines 71-160):
- `frame_width`, `frame_height` (u32)
- `last_ref_frame`, `golden_ref_frame`, `alt_ref_frame`, `out_of_loop_frame` (VASurfaceID)
- `pic_fields.bits.{key_frame, version, segmentation_enabled, update_mb_segmentation_map, update_segment_feature_data, filter_type, sharpness_level, loop_filter_adj_enable, mode_ref_lf_delta_update, sign_bias_golden, sign_bias_alternate, mb_no_coeff_skip, loop_filter_disable}` (packed bitfield)
- `mb_segment_tree_probs[3]` (u8)
- `loop_filter_level[4]`, `loop_filter_deltas_ref_frame[4]`, `loop_filter_deltas_mode[4]` (per-segment / per-ref / per-mode)
- `prob_skip_false`, `prob_intra`, `prob_last`, `prob_gf` (u8)
- `y_mode_probs[4]`, `uv_mode_probs[3]` (u8 — luma + chroma intra-prediction probs)
- `mv_probs[2][19]` (u8)
- `bool_coder_ctx.{range, value, count}` (u8 — same bytes as kernel `v4l2_vp8_entropy_coder_state` minus `padding`)

`VASliceParameterBufferVP8` (lines 170-202):
- `slice_data_size`, `slice_data_offset`, `slice_data_flag`, `macroblock_offset` (u32)
- `num_of_partitions` (u8)
- `partition_size[9]` (u32) — partition_size[0] is control-partition remaining bytes; partition_size[1..8] are DCT partition sizes (max 8 DCT partitions per VP8 spec)

`VAProbabilityDataBufferVP8` (lines 218-223):
- `dct_coeff_probs[4][8][3][11]` (u8) — direct match to kernel `v4l2_vp8_entropy.coeff_probs`

`VAIQMatrixBufferVP8` (lines 232-241):
- `quantization_index[4][6]` (u16) — per-segment, per-component effective Q index. Component order: yac(0), ydc(1), y2dc(2), y2ac(3), uvdc(4), uvac(5). Already includes per-segment effective values.

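Because `quantization_index` carries effective values rather than a base plus deltas, the kernel's `v4l2_vp8_quantization` fields have to be recovered by subtraction. The following is a minimal sketch of that recovery, assuming the segment-0 `yac` entry is the base index. `struct iqmatrix_vp8`, `struct quant_vp8`, `clamp_s8()`, and `vp8_derive_quant()` are hypothetical names with simplified layouts; the `int8_t` result type follows review finding C4. The review rated this derivation as correct for typical content, so it should be read as an approximation rather than an exact inverse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for VAIQMatrixBufferVP8:
 * [segment][yac, ydc, y2dc, y2ac, uvdc, uvac]. */
struct iqmatrix_vp8 {
	uint16_t quantization_index[4][6];
};

/* Simplified stand-in for the kernel-side quant + segment fields. */
struct quant_vp8 {
	uint8_t y_ac_qi;
	int8_t y_dc_delta, y2_dc_delta, y2_ac_delta, uv_dc_delta, uv_ac_delta;
	int8_t quant_update[4];
};

/* Saturate an int into the int8_t range. */
static int8_t clamp_s8(int v)
{
	if (v < INT8_MIN)
		return INT8_MIN;
	if (v > INT8_MAX)
		return INT8_MAX;
	return (int8_t)v;
}

static void vp8_derive_quant(const struct iqmatrix_vp8 *iq,
			     bool segmentation_enabled, struct quant_vp8 *q)
{
	const uint16_t *seg0 = iq->quantization_index[0];
	int base = seg0[0];
	int s;

	assert(base <= UINT8_MAX); /* VP8 quantizer indices are 7-bit */

	q->y_ac_qi = (uint8_t)base;
	q->y_dc_delta = clamp_s8(seg0[1] - base);
	q->y2_dc_delta = clamp_s8(seg0[2] - base);
	q->y2_ac_delta = clamp_s8(seg0[3] - base);
	q->uv_dc_delta = clamp_s8(seg0[4] - base);
	q->uv_ac_delta = clamp_s8(seg0[5] - base);

	/* Segment 0 is the reference; others are deltas against it. */
	q->quant_update[0] = 0;
	for (s = 1; s < 4; s++)
		q->quant_update[s] = segmentation_enabled ?
			clamp_s8(iq->quantization_index[s][0] - base) : 0;
}
```

With segmentation disabled (the BBB case), only the segment-0 row matters and every `quant_update[]` entry stays zero.
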
### FFmpeg downstream reference (`v4l2_request_vp8.c:31-187`)

Submission shape: single batched S_EXT_CTRLS at end_frame, count=1, V4L2_CID_STATELESS_VP8_FRAME with full v4l2_ctrl_vp8_frame struct. **No init-time device-wide menus** (no DECODE_MODE/START_CODE for VP8 — confirmed by absence in FFmpeg ref + Phase 0 V4L2 inventory).

Bitstream is appended verbatim (`v4l2_request_vp8_decode_slice` calls `ff_v4l2_request_append_output(buffer, size)` once per frame with the WHOLE VP8 frame including 3-byte uncompressed header). NO Annex-B start codes, NO start-code emulation prevention. The kernel hantro driver re-parses the 3-byte (or 10-byte for keyframe) uncompressed header.

### Kernel hantro driver reference (`hantro_vp8.c:49-143`)

`hantro_vp8_prob_update()` reads:
- `hdr->prob_skip_false`, `hdr->prob_intra`, `hdr->prob_last`, `hdr->prob_gf`
- `hdr->segment.segment_probs[0..2]`
- `hdr->entropy.{y_mode_probs[4], uv_mode_probs[3], mv_probs[2][19], coeff_probs[4][8][3][11]}`

The kernel does NOT read `hdr->coder_state.padding`, `quant.padding`, or `lf.padding`. In the FFmpeg ref they end up zero because its C99 designated initializer defaults every unset field to zero. **All `padding` fields must be left zero in the libva backend**, matching the FFmpeg ref.

## Mapping table (VAAPI → V4L2 / kernel)

The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice + Probability + IQMatrix) and write the kernel's `v4l2_ctrl_vp8_frame`. The VAAPI consumer (mpv/ffmpeg-vaapi) has already parsed the bitstream — the libva backend is field-shuffling only, no bitstream parsing.

| Kernel field | VAAPI source | Notes |
|---|---|---|
| `width`, `height` | `picture->frame_width`, `frame_height` | u32 → u16, both ≤65535 within campaign codec scope (1920 max) |
| `version` | `picture->pic_fields.bits.version` | 3-bit field |
| `horizontal_scale`, `vertical_scale` | 0, 0 | VAAPI doesn't expose; FFmpeg ref also hardcodes 0 |
| `prob_skip_false` | `picture->prob_skip_false` | direct |
| `prob_intra` | `picture->prob_intra` | direct |
| `prob_last` | `picture->prob_last` | direct |
| `prob_gf` | `picture->prob_gf` | direct |
| `num_dct_parts` | `slice->num_of_partitions - 1` | VAAPI's count includes control partition; kernel's excludes (per-spec). Verify against Phase 3 trace. |
| `first_part_size` | `slice->partition_size[0] + ((slice->macroblock_offset + 7) / 8)` | total control-partition size; VAAPI's `partition_size[0]` holds only the bytes remaining after header parsing (review finding C2) |
| `first_part_header_bits` | DERIVED — see below | not in VAAPI directly |
| `dct_part_sizes[0..7]` | `slice->partition_size[1..8]` | shift by 1 to skip control partition |
| `last_frame_ts` | DPB lookup `picture->last_ref_frame` | VASurfaceID → object_surface->timestamp → v4l2_timeval_to_ns() (mirror mpeg2.c::pic.forward_ref_ts pattern) |
| `golden_frame_ts` | DPB lookup `picture->golden_ref_frame` | same as above |
| `alt_frame_ts` | DPB lookup `picture->alt_ref_frame` | same as above |
| `flags & KEY_FRAME` | `picture->pic_fields.bits.key_frame == 0` | inverted sense: VAAPI keeps the bitstream convention, where key_frame=0 means key frame |
| `flags & SHOW_FRAME` | not in VAAPI | force 1; mpv only renders shown frames, and alt-ref invisible frames also appear as shown=1 on the mpv consumer side, so forcing is safe |
| `flags & MB_NO_SKIP_COEFF` | `picture->pic_fields.bits.mb_no_coeff_skip` | direct |
| `flags & SIGN_BIAS_GOLDEN` | `picture->pic_fields.bits.sign_bias_golden` | direct |
| `flags & SIGN_BIAS_ALT` | `picture->pic_fields.bits.sign_bias_alternate` | direct |
| `flags & EXPERIMENTAL` | 0 | VAAPI doesn't expose; FFmpeg uses `s->profile & 0x4` which has no VAAPI analog. Leave 0. |
| `coder_state.range` | `picture->bool_coder_ctx.range` | direct |
| `coder_state.value` | `picture->bool_coder_ctx.value` | direct |
| `coder_state.bit_count` | `picture->bool_coder_ctx.count` | VAAPI calls it `count` |
| `lf.sharpness_level` | `picture->pic_fields.bits.sharpness_level` | direct |
| `lf.level` | `picture->loop_filter_level[0]` | base level (segment 0); VAAPI exposes per-segment, kernel takes base only |
| `lf.ref_frm_delta[0..3]` | `picture->loop_filter_deltas_ref_frame[0..3]` | direct |
| `lf.mb_mode_delta[0..3]` | `picture->loop_filter_deltas_mode[0..3]` | direct |
| `lf.flags & ADJ_ENABLE` | `picture->pic_fields.bits.loop_filter_adj_enable` | direct |
| `lf.flags & DELTA_UPDATE` | `picture->pic_fields.bits.mode_ref_lf_delta_update` | direct |
| `lf.flags & FILTER_TYPE_SIMPLE` | `picture->pic_fields.bits.filter_type` | VAAPI: filter_type=0 normal, =1 simple |
| `quant.y_ac_qi` | `iqmatrix->quantization_index[0][0]` | segment 0, yac component |
| `quant.y_dc_delta` | `iqmatrix->quantization_index[0][1] - iqmatrix->quantization_index[0][0]` | u16 - u16 → s8 (clamp) |
| `quant.y2_dc_delta` | `iqmatrix->quantization_index[0][2] - iqmatrix->quantization_index[0][0]` | same |
| `quant.y2_ac_delta` | `iqmatrix->quantization_index[0][3] - iqmatrix->quantization_index[0][0]` | same |
| `quant.uv_dc_delta` | `iqmatrix->quantization_index[0][4] - iqmatrix->quantization_index[0][0]` | same |
| `quant.uv_ac_delta` | `iqmatrix->quantization_index[0][5] - iqmatrix->quantization_index[0][0]` | same |
| `segment.quant_update[s]` | for s∈[1..3]: `iqmatrix->quantization_index[s][0] - iqmatrix->quantization_index[0][0]` if segmentation enabled, else 0 | when segmentation_enabled=0 (BBB case), all quant_updates are 0 — bypass the per-segment math |
| `segment.lf_update[s]` | for s∈[1..3]: `picture->loop_filter_level[s] - picture->loop_filter_level[0]` if segmentation enabled, else 0 | same |
| `segment.segment_probs[0..2]` | `picture->mb_segment_tree_probs[0..2]` | direct |
| `segment.flags & ENABLED` | `picture->pic_fields.bits.segmentation_enabled` | direct |
| `segment.flags & UPDATE_MAP` | `picture->pic_fields.bits.update_mb_segmentation_map` | direct |
| `segment.flags & UPDATE_FEATURE_DATA` | `picture->pic_fields.bits.update_segment_feature_data` | direct |
| `segment.flags & DELTA_VALUE_MODE` | NOT in VAAPI directly | VAAPI doesn't expose abs_delta. Per VP8 spec default, segment values are deltas unless explicitly absolute — the FFmpeg ref sets DELTA_VALUE_MODE iff `!s->segmentation.absolute_vals`. For BBB (segmentation disabled), this flag's value is irrelevant. Leave 0; document the gap for Phase 5 review. |
| `entropy.y_mode_probs[0..3]` | `picture->y_mode_probs[0..3]` | direct |
| `entropy.uv_mode_probs[0..2]` | `picture->uv_mode_probs[0..2]` | direct |
| `entropy.mv_probs[i][j]` | `picture->mv_probs[i][j]` | direct, [2][19] both sides |
| `entropy.coeff_probs[i][j][k][l]` | `probability->dct_coeff_probs[i][j][k][l]` | DIFFERENT BUFFER — sourced from VAProbabilityDataBuffer not Picture. Direct shape match [4][8][3][11]. |

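The frame-flag rows of the table are the one place where a bit changes sense on the way through, so a small sketch may help. It assumes the flag values listed under "Frame flags" and uses hypothetical names throughout: `struct pic_bits` stands in for the relevant part of `pic_fields.bits`, and the `EX_` macros are local copies rather than the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the V4L2_VP8_FRAME_FLAG_* values. */
#define EX_FLAG_KEY_FRAME        0x01u
#define EX_FLAG_SHOW_FRAME       0x04u
#define EX_FLAG_MB_NO_SKIP_COEFF 0x08u
#define EX_FLAG_SIGN_BIAS_GOLDEN 0x10u
#define EX_FLAG_SIGN_BIAS_ALT    0x20u

/* Stand-in for the four pic_fields.bits members that feed the flags. */
struct pic_bits {
	unsigned int key_frame : 1; /* 0 means key frame */
	unsigned int mb_no_coeff_skip : 1;
	unsigned int sign_bias_golden : 1;
	unsigned int sign_bias_alternate : 1;
};

static uint64_t vp8_frame_flags(const struct pic_bits *b)
{
	/* SHOW_FRAME is forced on; EXPERIMENTAL is left clear. */
	uint64_t flags = EX_FLAG_SHOW_FRAME;

	if (b->key_frame == 0) /* inverted sense */
		flags |= EX_FLAG_KEY_FRAME;
	if (b->mb_no_coeff_skip)
		flags |= EX_FLAG_MB_NO_SKIP_COEFF;
	if (b->sign_bias_golden)
		flags |= EX_FLAG_SIGN_BIAS_GOLDEN;
	if (b->sign_bias_alternate)
		flags |= EX_FLAG_SIGN_BIAS_ALT;

	assert((flags & EX_FLAG_SHOW_FRAME) != 0);
	return flags;
}
```

For a key frame with no other bits set this yields `0x05`, which is a convenient value to look for when comparing against a captured S_EXT_CTRLS payload.
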
### `first_part_header_bits` derivation

This field is kernel-imposed metadata about the bitstream: the number of bits consumed by the uncompressed header partition before the boolean coder takes over. FFmpeg derives it from internal parser state:

```c
.first_part_header_bits = (8 * (s->coder_state_at_header_end.input - data) -
                           s->coder_state_at_header_end.bit_count - 8),
```

VAAPI does not expose this directly. **Open question for Phase 3 baseline**: derive from `slice->macroblock_offset` (bit offset of MB layer from start of slice data) — likely equal or off by a known constant. Phase 3 captures the verbatim payload from ffmpeg-v4l2request and computes the relationship.

Leaving the field at zero is not a safe fallback. Review finding C1 established that `hantro_g1_vp8_dec.c:260` and `rockchip_vpu2_hw_vp8_dec.c:372` both read it unconditionally to compute the macroblock DMA offset, so a zero value misplaces all macroblock data and the decode is garbage. The review's fix is `frame.first_part_header_bits = slice->macroblock_offset`, on the grounds that `vaapi_vp8.c:204` and `v4l2_request_vp8.c:83` use byte-identical formulas.

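The partition-related fields can be sketched together, following review findings C1 and C2 from the amending commit. This is an illustration under stated assumptions, not the fork's code: `struct slice_vp8`, `struct frame_vp8`, and `vp8_map_partitions()` are hypothetical names holding only the fields involved, and the range check on `num_of_partitions` is an added safety measure rather than something the plan specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of VASliceParameterBufferVP8 (stand-in). */
struct slice_vp8 {
	uint32_t macroblock_offset;  /* in bits */
	uint8_t num_of_partitions;   /* includes the control partition */
	uint32_t partition_size[9];  /* [0] = remaining control bytes */
};

/* Subset of v4l2_ctrl_vp8_frame (stand-in). */
struct frame_vp8 {
	uint8_t num_dct_parts;
	uint32_t first_part_size;
	uint32_t first_part_header_bits;
	uint32_t dct_part_sizes[8];
};

/* Returns 0 on success, -1 if the partition count is out of range. */
static int vp8_map_partitions(const struct slice_vp8 *slice,
			      struct frame_vp8 *frame)
{
	unsigned int i;

	if (slice->num_of_partitions < 2 || slice->num_of_partitions > 9)
		return -1;

	frame->num_dct_parts = (uint8_t)(slice->num_of_partitions - 1);
	frame->first_part_header_bits = slice->macroblock_offset;

	/* Add back the header bytes the VAAPI-side parser already consumed,
	 * rounding the bit offset up to whole bytes. */
	frame->first_part_size = slice->partition_size[0] +
				 (slice->macroblock_offset + 7) / 8;

	/* Shift by one to skip the control partition; zero unused slots. */
	for (i = 0; i < 8; i++)
		frame->dct_part_sizes[i] = i < frame->num_dct_parts ?
			slice->partition_size[i + 1] : 0;

	assert(frame->first_part_size >= slice->partition_size[0]);
	return 0;
}
```

As a worked check against the Phase 3 keyframe numerics quoted in the commit (21923 + 819 = 22742): any `macroblock_offset` between 6545 and 6552 bits rounds up to 819 bytes, so with `partition_size[0] = 21923` the function yields `first_part_size = 22742`. The exact bit offset is not recorded here, so the 6552 used for testing is an assumed value.
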
## Patch shape prediction

| Site | Action | LOC delta |
|---|---|---|
| `src/config.c:121-160` | INSERT VP8 enumeration block (~10 lines) | +10 |
| `src/config.c:54-78` | INSERT case label + break + comment (~5 lines) | +5 |
| `src/config.c:167-191` | INSERT case label (~1 line) | +1 |
| `src/vp8.c` | NEW FILE | +160-220 |
| `src/vp8.h` | NEW FILE | +35-45 |
| `src/picture.c:34-36` | INSERT `#include "vp8.h"` | +1 |
| `src/picture.c:188-225` | INSERT VP8 dispatch case (~6 lines) | +6 |
| `src/picture.c:54-186` | INSERT 4 nested cases + 1 outer case | +30-40 |
| `src/picture.c:299-300` | INSERT 2 reset lines | +2 |
| `src/surface.h:92-113` | INSERT vp8 struct (~8 lines) | +8 |
| `src/meson.build:50,73` | INSERT 2 entries | +2 |

Total: ~260-340 LOC across 4 modified files (`config.c`, `picture.c`, `surface.h`, `meson.build`) + 2 new files. Compared to iter1 (~120 LOC, 4 modified + 0 new + 1 deleted) and iter2 (~470 LOC, 5 modified + 0 new + 0 deleted), iter3 is medium-sized — the new file dominates. The dispatcher additions in picture.c + config.c are mechanical ports of iter1+iter2 patterns.

## Open questions for Phase 3 baseline

The Phase 3 baseline run will capture verbatim S_EXT_CTRLS payloads from `ffmpeg -hwaccel v4l2request bbb_720p10s_vp8.webm` (cross-validator anchor). Questions to answer empirically before Phase 4 plan locks:

1. **first_part_header_bits exact value**: capture for frame 1 (key) and frame 2 (inter). Compare against `slice->macroblock_offset` from a parallel `vainfo --vbo`-equivalent capture.
2. **num_dct_parts vs num_of_partitions**: confirm off-by-one (kernel excludes, VAAPI includes control partition). Verify dct_part_sizes[] indexing.
3. **DPB timestamp lookup**: confirm v4l2_timeval_to_ns(picture->last_ref_frame's surface_object->timestamp) matches what the kernel hantro driver reads. Any 0-sentinel for missing refs? (FFmpeg leaves zero for missing refs by C99 designated init.)
4. **show_frame handling**: VAAPI doesn't expose. Force 1 vs derive — which matches the kernel's expectation? (BBB has no alt-ref invisible frames; both options should work for the binding cell, but verify.)
5. **lf.flags FILTER_TYPE_SIMPLE bit**: VAAPI's filter_type=1 means simple. Confirm against bitstream baseline.
6. **First-frame DPB sentinel**: when `picture->last_ref_frame == VA_INVALID_SURFACE`, what does FFmpeg ref's `last_frame_ts` end up as? (Likely 0; verify.)

These answers feed Phase 4 plan clauses. None are blocking — all have safe defaults that work for the BBB binding cell.

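Questions 3 and 6 both concern how a reference surface becomes a nanosecond timestamp. The sketch below shows one plausible shape, assuming a zero timestamp is an acceptable sentinel for a missing reference, which is exactly what those questions still need to confirm. `struct ex_surface`, the flat `pool` array, and both function names are hypothetical; the fork resolves surfaces through its own object heap. `ex_timeval_to_ns()` is a local equivalent of the kernel UAPI helper `v4l2_timeval_to_ns()`, and `EX_INVALID_SURFACE` uses the same value as `VA_INVALID_SURFACE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

#define EX_INVALID_SURFACE 0xffffffffu

/* Stand-in for the part of object_surface used by the lookup. */
struct ex_surface {
	uint32_t id;
	struct timeval timestamp;
};

/* Local equivalent of v4l2_timeval_to_ns(). */
static uint64_t ex_timeval_to_ns(const struct timeval *tv)
{
	return (uint64_t)tv->tv_sec * 1000000000ULL +
	       (uint64_t)tv->tv_usec * 1000ULL;
}

/* Resolve a reference surface ID to its timestamp in nanoseconds.
 * Returns 0 for an invalid or unknown ID (assumed sentinel). */
static uint64_t vp8_ref_timestamp(const struct ex_surface *pool, size_t count,
				  uint32_t id)
{
	size_t i;

	assert(pool != NULL || count == 0);

	if (id == EX_INVALID_SURFACE)
		return 0;

	for (i = 0; i < count; i++)
		if (pool[i].id == id)
			return ex_timeval_to_ns(&pool[i].timestamp);

	return 0;
}
```

The same helper would be called three times per frame, once each for `last_ref_frame`, `golden_ref_frame`, and `alt_ref_frame`.
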
## Phase 3 baseline targets (work plan)

To answer the open questions above, Phase 3 will run on fresnel (when reachable):

1. **Cross-validator capture**: `strace -ff -tt -y -v -e trace=ioctl ffmpeg -hwaccel v4l2request -i ~/fourier-test/bbb_720p10s_vp8.webm -frames:v 5 -f null - 2>strace.log` with hantro-vpu-dec env vars. Extract S_EXT_CTRLS payload bytes for VP8_FRAME control across frames 1 (key) and 2 (inter).
2. **VAAPI-side trace**: `LIBVA_TRACE=/tmp/vp8_libva.trace mpv --hwdec=no --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp8.webm` to confirm VAAPI consumer chain (mpv's parser produces VAPictureParameterBufferVP8 + slice + iqmatrix + probability buffers).
3. **Cache-safe verify path baseline**: `mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp8.webm` and capture `frame-0001.jpg` + `frame-0002.jpg` SHA256s (SW reference for criterion 4 byte-compare in Phase 7).

## Phase 4 plan structure (anticipated)

Following iter2's 10-clause plan template:

- Clause 1: device-init batched submission contract (VP8 has none — clause is empty / N/A)
- Clause 2: per-frame batched submission shape (count=1, VP8_FRAME control)
- Clause 3: VAAPI → V4L2 mapping table (the table above, normalized to plan-prose form)
- Clause 4: DPB timestamp resolution
- Clause 5: quantization base+delta derivation from VAAPI's denormalized matrix
- Clause 6: probability table mapping (separate buffer source)
- Clause 7: BeginPicture per-frame reset (iqmatrix_set, probability_set)
- Clause 8: surface union extension
- Clause 9: enumeration + dispatch wiring (config.c + picture.c)
- Clause 10: meson + new file integration

The plan will cite verbatim Phase 3 baseline payload bytes for fields where the mapping is non-obvious (quant deltas, first_part_header_bits) per `feedback_dev_process.md` Phase 6 contract-before-code.

## Substrate state at Phase 2 close

- iter3 Phase 1 commit `ea2413e` pushed to gitea (campaign repo).
- Fork on noether at iter2 tip `8d71e20` (synced via `git fetch origin && git merge --ff-only origin/master` from previous commit `229d6d1`).
- Fresnel.vpn unreachable at Phase 2 read time; Phase 3 baseline + Phase 6 builds need the laptop online. Memory rule — don't offer pause prompts; will wait for fresnel to come back online OR the user to wake it before Phase 3.
- All 5 memory entries still apply: gitea-as-claude-noether, no-session-termination-attempts, header-deletion-check, review-empirical-over-theoretical (BOTH directions), rockchip-pixel-verify-path.
- Phase 3 baseline questions queued (6 items above).