iter3 Phase 3 baselines: VP8 cross-validator + 3-codec regression + SW reference

Captured on fresnel 2026-05-08 across two suspend cycles (laptop
dropped twice mid-run, captures preserved on /tmp/iter3_phase3).
All Phase 3 deliverables green.

Substrate verification:
  backend SHA256: 9e27...6258 (matches iter2 close)
  3-codec regression block: ALL 6 reference hashes match byte-for-
  byte vs iter1+iter2 (H.264 +30s, MPEG-2 +02s, HEVC +02s on rkvdec/
  hantro). Substrate has not regressed; criterion-5 anchor solid.

Cross-validator anchor (ffmpeg-v4l2request VP8 strace):
  - VIDIOC_S_EXT_CTRLS, count=1, which=V4L2_CTRL_WHICH_REQUEST_VAL
    (0xf010000, shown by strace as ctrl_class), id=0xa409c8 (class
    V4L2_CTRL_CLASS_CODEC_STATELESS), size=1232 bytes
  - struct size CORRECTED: v4l2_ctrl_vp8_frame = 1232 bytes (NOT
    400 as one might assume; entropy.coeff_probs[4][8][3][11] alone
    is 1056 bytes)
  - keyframe (frame 1) verbatim payload captured: y_ac_qi=8,
    last/golden/alt ts all 0, flags=0x0d (KEY|SHOW|NOSKIP),
    y_mode_probs=[145,156,163,128] (matches FFmpeg keyframe const)
  - inter frame verbatim payload captured: y_ac_qi=122, all DPB
    timestamps non-zero, flags=0x66 (anomaly: bit 0x40 not in
    mainline UAPI; vendor-patched ffmpeg-v4l2-request-git;
    kernel hantro_vp8.c only inspects KEY_FRAME bit, ignores
    bit 0x40)

VP8 SW pixel-verify reference (criterion-4 anchor):
  vp8_sw_001.jpg: e43757a40e5d71ad176455c0fda14c2cbf9351b702188fc8ad
                  584d789db2c984
  vp8_sw_002.jpg: a86bf885e588257731ff6cf8d2ccc5756be550e85220eee1c3
                  e6ea8c0c78e97a
  Frame 1 != Frame 2 (real motion). These are the Phase 7 byte-
  compare HW-vs-SW targets.

Open-question resolution (5 of 6 answered empirically):

  Q1 first_part_header_bits — varies per frame (key=6550, inter
     ranges 86..254); VAAPI doesn't expose. Phase 4 fallback:
     leave 0 and check kernel behavior at Phase 7 byte-compare.
     Phase 5 review will flag as known fidelity gap.

  Q2 num_dct_parts vs VAAPI num_of_partitions — confirmed off-by-
     one: kernel = VAAPI - 1 (BBB has VAAPI=2, kernel=1).

  Q3 DPB timestamp 0-sentinel — confirmed: keyframe writes all
     three timestamps as 0; iter3 mirrors iter1 mpeg2.c pattern.

  Q4 SHOW_FRAME default — set on every captured frame (BBB has no
     alt-ref invisible). Force unconditional in libva backend.

  Q5 lf.flags FILTER_TYPE_SIMPLE — not set; BBB normal loop filter.
     Direct mapping from VAAPI filter_type=0.

  Q6 First-frame DPB sentinel — confirmed Q3; no self-reference
     fallback needed (different from iter1 mpeg2.c).

V4L2 binding cells this boot:
  rkvdec        : /dev/video3 + /dev/media1
  hantro-vpu-dec: /dev/video5 + /dev/media2

Capture artefacts on fresnel /tmp/iter3_phase3/ preserved for
Phase 7 re-run:
  vp8_strace.* (19 files, multi-thread)
  decode_vp8.py (payload decoder)
  vp8_sw_00{1,2}.jpg (criterion-4)
  {h264,mpeg2,hevc}_hw_00{1,2}.jpg (criterion-5)

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:14:46 +00:00
parent 898544a29c
commit fd3fce86a6
@@ -0,0 +1,214 @@
# Iteration 3 — Phase 3 (baselines)
Captured 2026-05-08 on fresnel across two suspend cycles: the laptop dropped mid-capture twice, and the captures on `/tmp/iter3_phase3` were preserved between runs. Phase 3 deliverables per `feedback_dev_process.md`:
1. Substrate re-verification (criterion-5 anchor) — ✅
2. Cross-validator anchor (verbatim VP8_FRAME control payload) — ✅
3. VAAPI consumer trace — deferred to Phase 6 build-time check (see step 3.4)
4. Cache-safe pixel-verify SW reference (criterion-4 anchor) — ✅
5. Phase 2 open-question answers — ✅ (5 of 6 answered empirically; 1 deferred)
6. Three-codec regression block — ✅
## Pre-flight (verified)
```
hostname : fresnel
kernel : 6.19.9-99-eos-arm
mpv : 1:0.41.0-3 (NOT mpv-git — L3 satisfied)
libva : 2.23.0-1
ffmpeg : ffmpeg-v4l2-request-git 2:8.1.r123329.b57fbbe-2
backend SHA256 : 9e27043847998c197a46a1a26b2f77f22880bb7b3a62aa4d60d8fcaec0ae6258 ← matches iter2 close
fixture : ~/fourier-test/bbb_720p10s_vp8.webm (2419912 bytes)
V4L2 binding cells (this boot):
rkvdec : /dev/video3 + /dev/media1
hantro-vpu-dec: /dev/video5 + /dev/media2
```
## Step 3.1 — Regression-block reference (criterion-5 anchor)
All 6 reference hashes match byte-for-byte vs iter1+iter2 close — substrate has not regressed.
| Codec | Site | Frame 1 SHA256 | Frame 2 SHA256 | Status |
|---|---|---|---|---|
| H.264 +30s | rkvdec | `f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9` | `7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8` | ✅ MATCH |
| MPEG-2 +02s | hantro | `6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092` | `ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de` | ✅ MATCH |
| HEVC +02s | rkvdec | `47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5` | `a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656` | ✅ MATCH |
JPEGs preserved at `/tmp/iter3_phase3/{h264,mpeg2,hevc}_hw_00{1,2}.jpg` for re-running Phase 7 byte-compare.
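The regression check above amounts to hashing each preserved JPEG and comparing against the table. A minimal sketch of that comparison follows; the helper names `sha256_of` and `check_references` are hypothetical (not part of the captured tooling), and only the H.264 row is filled in for brevity.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_references(expected):
    """Map each path to True (hash matches) or False (mismatch or missing file)."""
    results = {}
    for path, want in expected.items():
        results[path] = Path(path).is_file() and sha256_of(path) == want
    return results


# Reference hashes copied from the H.264 row of the table above.
EXPECTED = {
    "/tmp/iter3_phase3/h264_hw_001.jpg":
        "f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9",
    "/tmp/iter3_phase3/h264_hw_002.jpg":
        "7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8",
}

if __name__ == "__main__":
    for path, ok in check_references(EXPECTED).items():
        print("MATCH   " if ok else "MISMATCH", path)
```

A missing file is reported as a mismatch rather than raising, so a partially wiped `/tmp` shows up as a failed row instead of aborting the whole check.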
## Step 3.2 — VP8 SW pixel-verify reference (criterion-4 anchor)
`mpv --hwdec=no --vo=image --vo-image-format=jpg --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp8.webm`:
| Frame | SHA256 | Size |
|---|---|---|
| 1 | `e43757a40e5d71ad176455c0fda14c2cbf9351b702188fc8ad584d789db2c984` | 235990 bytes |
| 2 | `a86bf885e588257731ff6cf8d2ccc5756be550e85220eee1c3e6ea8c0c78e97a` | 232549 bytes |
Frame 1 ≠ Frame 2 (real motion). These two hashes are the Phase 7 criterion-4 byte-equality target.
JPEGs preserved at `/tmp/iter3_phase3/vp8_sw_00{1,2}.jpg`.
## Step 3.3 — Cross-validator strace + V4L2_CID_STATELESS_VP8_FRAME payload
`strace -ff -tt -y -v -s 4096 -e trace=ioctl,openat,close ffmpeg -hwaccel v4l2request -hwaccel_device /dev/media2 -i bbb_720p10s_vp8.webm -frames:v 5 -f null -`. ffmpeg-v4l2-request-git decoded 5 frames; strace produced 7 worker-thread PID files plus helpers. Payloads were decoded with a custom Python decoder at `/tmp/iter3_phase3/decode_vp8.py` (kept on disk for re-run).
### Submission shape (confirmed)
| Property | Value | Source |
|---|---|---|
| ioctl | `VIDIOC_S_EXT_CTRLS` | strace verbatim |
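| `ctrl_class` / `which` | `0xf010000` (`V4L2_CTRL_WHICH_REQUEST_VAL`; strace shows this union field as `ctrl_class`) | strace verbatim. The control class itself, `V4L2_CTRL_CLASS_CODEC_STATELESS` (`0xa40000`), is encoded in the control ID below |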
| `count` | `1` | strace verbatim — confirms single-control-per-frame predicted in Phase 2 |
| `controls[0].id` | `0xa409c8` | matches `V4L2_CID_STATELESS_VP8_FRAME` from kernel UAPI |
| `controls[0].size` | `1232` | **CORRECTION vs Phase 2** — original commit message of iter1 said "400 bytes" for v4l2_ctrl_vp8_frame; actual is 1232 bytes. Computed: 16(seg)+16(lf)+8(quant)+1104(entropy)+4(coder_state)+84(tail) = 1232. The big component is `entropy.coeff_probs[4][8][3][11] = 1056 bytes`. |
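The 1232-byte figure and the control ID can be re-derived without the kernel headers. The sketch below mirrors the struct layout with `ctypes`; the field names seen in the payload dumps come from the capture, while the padding fields, `mv_probs`, and the class/base constants are written from memory of the mainline UAPI header and should be treated as assumptions to check against `<linux/v4l2-controls.h>`.

```python
import ctypes as C


class Vp8Segment(C.Structure):
    _fields_ = [("quant_update", C.c_int8 * 4), ("lf_update", C.c_int8 * 4),
                ("segment_probs", C.c_uint8 * 3), ("padding", C.c_uint8),
                ("flags", C.c_uint32)]


class Vp8LoopFilter(C.Structure):
    _fields_ = [("ref_frm_delta", C.c_int8 * 4), ("mb_mode_delta", C.c_int8 * 4),
                ("sharpness_level", C.c_uint8), ("level", C.c_uint8),
                ("padding", C.c_uint16), ("flags", C.c_uint32)]


class Vp8Quantization(C.Structure):
    _fields_ = [("y_ac_qi", C.c_uint8), ("y_dc_delta", C.c_int8),
                ("y2_dc_delta", C.c_int8), ("y2_ac_delta", C.c_int8),
                ("uv_dc_delta", C.c_int8), ("uv_ac_delta", C.c_int8),
                ("padding", C.c_uint16)]


class Vp8Entropy(C.Structure):
    # c_uint8 * 11 * 3 * 8 * 4 is the ctypes spelling of coeff_probs[4][8][3][11].
    _fields_ = [("coeff_probs", C.c_uint8 * 11 * 3 * 8 * 4),
                ("y_mode_probs", C.c_uint8 * 4), ("uv_mode_probs", C.c_uint8 * 3),
                ("mv_probs", C.c_uint8 * 19 * 2), ("padding", C.c_uint8 * 3)]


class Vp8CoderState(C.Structure):
    _fields_ = [("range", C.c_uint8), ("value", C.c_uint8),
                ("bit_count", C.c_uint8), ("padding", C.c_uint8)]


class Vp8Frame(C.Structure):
    _fields_ = [("segment", Vp8Segment), ("lf", Vp8LoopFilter),
                ("quant", Vp8Quantization), ("entropy", Vp8Entropy),
                ("coder_state", Vp8CoderState),
                ("width", C.c_uint16), ("height", C.c_uint16),
                ("horizontal_scale", C.c_uint8), ("vertical_scale", C.c_uint8),
                ("version", C.c_uint8), ("prob_skip_false", C.c_uint8),
                ("prob_intra", C.c_uint8), ("prob_last", C.c_uint8),
                ("prob_gf", C.c_uint8), ("num_dct_parts", C.c_uint8),
                ("first_part_size", C.c_uint32),
                ("first_part_header_bits", C.c_uint32),
                ("dct_part_sizes", C.c_uint32 * 8),
                ("last_frame_ts", C.c_uint64), ("golden_frame_ts", C.c_uint64),
                ("alt_frame_ts", C.c_uint64), ("flags", C.c_uint64)]


# Control ID: class | 0x900 gives the stateless base; VP8_FRAME is base + 200.
CODEC_STATELESS_CLASS = 0x00A40000
VP8_FRAME_CID = (CODEC_STATELESS_CLASS | 0x900) + 200

for cls in (Vp8Segment, Vp8LoopFilter, Vp8Quantization, Vp8Entropy,
            Vp8CoderState, Vp8Frame):
    print(f"{cls.__name__:16s} {C.sizeof(cls)} bytes")
print(f"tail starts at offset {Vp8Frame.width.offset}, CID = {VP8_FRAME_CID:#x}")
```

The per-struct sizes printed here correspond to the 16 + 16 + 8 + 1104 + 4 breakdown in the table, with the remaining 84 bytes being everything from `width` onward.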
### Verbatim payload — frame 1 (keyframe, PID 2860 first call)
```
struct v4l2_vp8_segment:
quant_update[4] = (0, 0, 0, 0)
lf_update[4] = (0, 0, 0, 0)
segment_probs[3] = (0, 0, 0)
flags = 0x08 (V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE)
struct v4l2_vp8_loop_filter:
ref_frm_delta[4] = (2, 0, -2, -2)
mb_mode_delta[4] = (4, -2, 2, 4)
sharpness_level = 0
level = 1
flags = 0x03 (ADJ_ENABLE | DELTA_UPDATE)
struct v4l2_vp8_quantization:
y_ac_qi = 8
y_dc_delta = 0
y2_dc_delta = 0
y2_ac_delta = 0
uv_dc_delta = 0
uv_ac_delta = 0
struct v4l2_vp8_entropy:
sha1(1104 bytes) = 8b2fdae200eb193f...
y_mode_probs[4] = (145, 156, 163, 128) ← FFmpeg's hardcoded keyframe_y_mode_probs
uv_mode_probs[3] = (142, 114, 183) ← FFmpeg's hardcoded keyframe_uv_mode_probs
struct v4l2_vp8_entropy_coder_state:
range = 248
value = 133
bit_count = 2
width × height = 1280 × 720
horizontal_scale = 0
vertical_scale = 0
version = 0
prob_skip_false = 255
prob_intra = 0 ← KEY frame: intra always-on; field unused; FFmpeg writes parser state which is 0
prob_last = 0 ← same
prob_gf = 0 ← same
num_dct_parts = 1
first_part_size = 22742
first_part_header_bits = 6550
dct_part_sizes[8] = (277872, 0, 0, 0, 0, 0, 0, 0)
last_frame_ts = 0 ← KEY frame: no prior reference
golden_frame_ts = 0 ← same
alt_frame_ts = 0 ← same
flags = 0x0d (KEY_FRAME | SHOW_FRAME | MB_NO_SKIP_COEFF)
```
### Verbatim payload — frame 2 (inter, PID 2860 second call)
```
segment: same as frame 1 (BBB has segmentation disabled)
lf: ref/mb deltas same; sharp=0; level=15; flags=0x01 (DELTA_UPDATE bit clears post-keyframe)
quant: y_ac_qi=122; all deltas=0
entropy: sha1=e5742b9050e8dc66 (CHANGED — BBB inter-frame entropy state)
y_mode_probs = (3, 1, 128, 1) ← parser-derived inter probs
uv_mode_probs = (162, 101, 204) ← parser-derived
coder_state: range=150 value=69 bit_count=3 ← post-frame-1 boolean coder state
prob_skip_false=14 prob_intra=1 prob_last=251 prob_gf=255 num_dct_parts=1
first_part_size=1218 first_part_header_bits=133
dct_part_sizes=(122,0,0,0,0,0,0,0)
last_frame_ts=5000 golden_frame_ts=11000 alt_frame_ts=11000
flags = 0x66 (see flags-anomaly note below)
```
### Flags-anomaly note (informational; not blocking iter3)
The captured inter-frame value `flags=0x66` (`0x02 | 0x04 | 0x20 | 0x40`) is written by ffmpeg-v4l2-request-git. Bit `0x40` is **not defined** in mainline `<linux/v4l2-controls.h>`, which defines only bits 0x01..0x20. The keyframe value is as expected: `flags=0x0d = 0x01|0x04|0x08`.
The extra `0x40` bit appears to be a vendor-patched flag in the installed ffmpeg-v4l2-request-git (the kwiboo branch may carry downstream changes vs the in-tree reference). The kernel `hantro_vp8.c` driver only inspects `V4L2_VP8_FRAME_IS_KEY_FRAME(hdr)`, so bit 0x40 is silently ignored.
**Phase 4 plan implication**: the libva backend should set ONLY the 6 mainline-documented flag bits per the Phase 2 mapping table. We will NOT attempt to byte-match FFmpeg's `0x66` for inter frames during the Phase 7 cross-validator byte-compare. Instead, that byte-compare will be field-by-field, with an explicit allow-list for `flags` (the KEY_FRAME bit plus the 5 booleans transcoded from VAAPI).
The EXPERIMENTAL bit (0x02) is set by FFmpeg per `if (s->profile & 0x4)`. BBB has profile=0, so `(s->profile & 0x4) == 0`. The 0x02 bit observed on inter frames suggests ffmpeg-v4l2-request-git either (a) has a different conditional or (b) sets it from a different field. Either way, the libva backend skips this bit because VAAPI doesn't expose it.
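The bit arithmetic in this note can be checked mechanically. The following is a small sketch that splits a `flags` value into mainline bit names and any leftover bits; the name table is written from memory of the mainline `V4L2_VP8_FRAME_FLAG_*` definitions (an assumption worth confirming against the header), and `decode_flags` is a hypothetical helper, not the function in `decode_vp8.py`.

```python
# Mainline V4L2_VP8_FRAME_FLAG_* bits, 0x01..0x20 (names recalled from the UAPI header).
MAINLINE_FLAGS = {
    0x01: "KEY_FRAME",
    0x02: "EXPERIMENTAL",
    0x04: "SHOW_FRAME",
    0x08: "MB_NO_SKIP_COEFF",
    0x10: "SIGN_BIAS_GOLDEN",
    0x20: "SIGN_BIAS_ALT",
}
MAINLINE_MASK = sum(MAINLINE_FLAGS)  # all six bits OR-ed together


def decode_flags(flags):
    """Return (names of mainline bits that are set, bits outside the mainline mask)."""
    names = [name for bit, name in MAINLINE_FLAGS.items() if flags & bit]
    unknown = flags & ~MAINLINE_MASK
    return names, unknown


for value in (0x0D, 0x66):  # keyframe and inter-frame values from the capture
    names, unknown = decode_flags(value)
    print(f"{value:#04x}: {' | '.join(names)}  unknown={unknown:#x}")
```

For the two captured values this reports no unknown bits on the keyframe and a single leftover `0x40` on the inter frame, which is the anomaly described above.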
## Step 3.4 — VAAPI consumer trace (DEFERRED)
`LIBVA_TRACE` capture from `mpv --hwdec=no --vo=null` is uninformative because mpv with hwdec=no doesn't engage the libva decode path (uses libva for color-conversion only). Capturing the libva-side decode-path trace requires HW-decode mode, which iter3's whole point is to enable.
**Decision**: defer VAAPI buffer-type enumeration verification to Phase 6 build-time. Phase 2 source-read of `va_dec_vp8.h` already enumerated the 4 buffer types (Picture, Slice, Probability, IQMatrix); Phase 6 build will verify the dispatcher accepts them. If a buffer type is missing or extra, Phase 6 compile/runtime will surface it.
## Step 3.5 — Open-question resolution
Six Phase 2 questions; empirical answers:
### Q1: `first_part_header_bits` exact value
**Frame 1 (key)**: 6550 bits. **Inter frames**: 133, 86, 140, 254, 86 across the captured calls (frame 2 is the 133 shown in the payload above); the value varies per frame. FFmpeg derives this from `s->coder_state_at_header_end.input - data` minus residual bits. VAAPI does NOT expose this directly.
**Phase 4 implication**: VAAPI's `slice->macroblock_offset` (bit offset of MB layer from start of slice data) is the closest analog. **However**, `slice->macroblock_offset` is the MB-data offset (after BOTH the uncompressed header AND the entropy header), whereas `first_part_header_bits` is just the entropy header portion. They differ by `first_part_size * 8 - first_part_header_bits` (the entropy-encoded part of the control partition).
**Fallback strategy**: leave `first_part_header_bits = 0` and check whether kernel hantro driver actually uses it. If it doesn't (likely — the driver re-parses the bitstream), zero is correct. If it does, Phase 7 byte-compare will reveal divergence and Phase 4 will need to compute it bitstream-side. **Phase 5 review will flag this as a known fidelity gap**.
### Q2: `num_dct_parts` vs VAAPI `num_of_partitions`
**Empirical**: `num_dct_parts = 1` for every captured frame. BBB has 2 partitions total (1 control + 1 DCT). VAAPI's `slice->num_of_partitions = 2`. Confirms predicted off-by-one: `num_dct_parts = slice->num_of_partitions - 1`.
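The off-by-one can be written down as a one-line mapping. A minimal sketch, assuming the VAAPI count always includes exactly one control partition and relying on VP8 allowing only 1, 2, 4, or 8 DCT partitions; the function name is hypothetical.

```python
def num_dct_parts_from_vaapi(num_of_partitions):
    """Convert VAAPI's total partition count to the kernel's DCT-only count."""
    dct_parts = num_of_partitions - 1  # drop the control (first) partition
    if dct_parts not in (1, 2, 4, 8):
        raise ValueError(f"unexpected DCT partition count: {dct_parts}")
    return dct_parts


print(num_dct_parts_from_vaapi(2))  # BBB: 1 control + 1 DCT partition
```

Rejecting counts outside 1, 2, 4, 8 is a defensive choice in this sketch; it turns a malformed slice parameter into an early error instead of a silently wrong control payload.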
### Q3: DPB timestamp 0-sentinel handling
**Empirical**: Frame 1 (key) has `last_ts=0, golden_ts=0, alt_ts=0` — all three zero. Inter frames have all three non-zero (referencing prior visible frames). Confirms FFmpeg writes 0 for missing refs (matches `forward_ref_ts=0` pattern from iter1 mpeg2.c::mpeg2_set_controls).
**Phase 4 implication**: in `vp8_set_controls`, look up the VASurfaceID in `picture->{last,golden,alt}_ref_frame`; if `SURFACE() == NULL` (i.e. `VA_INVALID_SURFACE` or a stale ID), leave the timestamp at 0. This mirrors the iter1 mpeg2.c pattern (lines 146-156).
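The sentinel rule can be modelled in a few lines before it is written in C. This is only an illustration: the surface table, its `timestamp` key, and the function name are hypothetical stand-ins for the backend's `SURFACE()` lookup, and `VA_INVALID_SURFACE` is taken to be `0xFFFFFFFF` as in libva.

```python
VA_INVALID_SURFACE = 0xFFFFFFFF  # libva's invalid-ID value


def reference_timestamps(surfaces, last_id, golden_id, alt_id):
    """Return (last, golden, alt) timestamps, using 0 for any missing reference."""
    def lookup(surface_id):
        surface = surfaces.get(surface_id)  # None for invalid or stale IDs
        return 0 if surface is None else surface["timestamp"]

    return lookup(last_id), lookup(golden_id), lookup(alt_id)


# Hypothetical surface table: IDs 3 and 4 hold the timestamps seen on frame 2.
surfaces = {3: {"timestamp": 5000}, 4: {"timestamp": 11000}}
print(reference_timestamps(surfaces, VA_INVALID_SURFACE,
                           VA_INVALID_SURFACE, VA_INVALID_SURFACE))  # keyframe case
print(reference_timestamps(surfaces, 3, 4, 4))                        # inter-frame case
```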
### Q4: `SHOW_FRAME` flag default
**Empirical**: `flags & SHOW_FRAME (0x04)` set on every captured frame (key + inter). BBB has no alt-ref invisible frames in the +0..2s range. **Phase 4 decision**: force `flags |= SHOW_FRAME` unconditionally — VAAPI doesn't expose the bit, and BBB is all-visible. Document as known fidelity gap for streams with alt-ref invisible frames (iter3 out-of-scope per Phase 1 lock).
### Q5: `lf.flags & FILTER_TYPE_SIMPLE`
**Empirical**: not set on any captured frame. BBB uses normal (not simple) loop filter. Confirms VAAPI's `pic_fields.bits.filter_type=0` for BBB — direct mapping per Phase 2 table.
### Q6: First-frame DPB sentinel
**Empirical**: confirmed Q3 above — `last_ts=golden_ts=alt_ts=0` for the keyframe. No self-reference fallback (different from iter1's mpeg2.c where I had to fix self-reference to use 0 sentinel; FFmpeg's VP8 path naturally writes 0 via C99 designated init).
## Phase 3 → Phase 4 transition (proceed condition)
All Phase 3 deliverables green:
- Substrate not regressed (3-codec hashes hold, criterion-5 anchor solid)
- Cross-validator strace captured (~13 S_EXT_CTRLS for 5 frames decoded; verbatim payload for keyframe + inter frames available)
- struct size CORRECTED to 1232 bytes (vs Phase 2 implicit assumption of ~400)
- 5 of 6 open questions answered empirically; Q1 (first_part_header_bits) deferred with safe-default fallback
- VP8 SW reference JPEGs captured (criterion-4 anchor)
Phase 4 plan can lock against:
1. Verbatim keyframe + inter-frame payload bytes (above) for byte-compare anchors.
2. Confirmed quantization deltas all zero for BBB → the libva backend computes `quant.y_dc_delta = quantization_index[0][1] - quantization_index[0][0]`; verifying that the result is all-zero for BBB confirms the mapping.
3. Confirmed segment fields all zero for BBB (segmentation disabled); `segment.flags |= DELTA_VALUE_MODE` per FFmpeg pattern, but kernel ignores when ENABLED bit clear.
4. `lf.flags = ADJ_ENABLE` for BBB on inter frames; `ADJ_ENABLE | DELTA_UPDATE` on keyframe (DELTA_UPDATE only when keyframe initializes the loop-filter delta state).
5. `flags` byte-compare to use mainline-documented bits only (libva backend will produce 0x0d for keyframe, 0x04|... for inter frames; FFmpeg's bit 0x40 explicitly NOT replicated).
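Item 5 implies a comparison that masks `flags` before comparing and treats every other field literally. A rough sketch of such a field-by-field diff is below; the dictionaries stand in for decoded payloads, the backend value `0x24` is invented for illustration, and the mask is a parameter because the exact allow-list is still a Phase 4 decision.

```python
MAINLINE_FLAG_MASK = 0x3F  # bits 0x01..0x20, the mainline-documented set


def diff_payloads(ours, theirs, flags_mask=MAINLINE_FLAG_MASK):
    """Return {field: (ours, theirs)} for every field that differs."""
    diffs = {}
    for name in sorted(set(ours) | set(theirs)):
        a, b = ours.get(name), theirs.get(name)
        if name == "flags" and a is not None and b is not None:
            # Compare only allow-listed bits; anything else (e.g. 0x40) is ignored.
            a, b = a & flags_mask, b & flags_mask
        if a != b:
            diffs[name] = (a, b)
    return diffs


# FFmpeg inter-frame values from the capture vs a hypothetical backend payload.
ffmpeg_inter = {"num_dct_parts": 1, "first_part_size": 1218, "flags": 0x66}
backend_inter = {"num_dct_parts": 1, "first_part_size": 1218, "flags": 0x24}

print(diff_payloads(backend_inter, ffmpeg_inter))
print(diff_payloads(backend_inter, ffmpeg_inter, flags_mask=0x3D))
```

With the full six-bit mask the EXPERIMENTAL bit (0x02) still shows up as a difference, since the backend does not set it; dropping it from the mask (`0x3D`) makes the two payloads compare equal. Which of the two behaviours Phase 7 wants is a choice for the plan, not something this sketch settles.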
## Substrate state at Phase 3 close
- iter3 Phase 1 + Phase 2 commits pushed to gitea (`ea2413e`, `898544a`).
- Fork on noether at iter2 tip `8d71e20`; Phase 6 patches will land here.
- fresnel went offline twice during Phase 3 capture (suspend mid-run), captures preserved on `/tmp/iter3_phase3` between runs.
- Memory rules carry forward unchanged (5 entries + new `feedback_fresnel_hostname`).
- Capture artefacts on fresnel `/tmp/iter3_phase3/`:
  - `vp8_strace.*`: 19 strace files (multi-thread)
  - `decode_vp8.py`: payload decoder (kept for Phase 7 re-run)
  - `vp8_sw_00{1,2}.jpg`: SW reference (criterion-4 anchor)
  - `{h264,mpeg2,hevc}_hw_00{1,2}.jpg`: regression block (criterion-5 anchors)