iter3 Phase 3: baselines — VP8 cross-validator + 3-codec regression + SW reference

Captured on fresnel 2026-05-08 across two suspend cycles (laptop dropped
twice mid-run, captures preserved on /tmp/iter3_phase3). All Phase 3
deliverables green.

Substrate verification:
  backend SHA256: 9e27...6258 (matches iter2 close)
  3-codec regression block: ALL 6 reference hashes match byte-for-byte
  vs iter1+iter2 (H.264 +30s, MPEG-2 +02s, HEVC +02s on rkvdec/hantro).
  Substrate has not regressed; criterion-5 anchor solid.

Cross-validator anchor (ffmpeg-v4l2request VP8 strace):
  - VIDIOC_S_EXT_CTRLS, count=1, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS,
    id=0xa409c8, size=1232 bytes
  - struct size CORRECTED: v4l2_ctrl_vp8_frame = 1232 bytes (NOT 400 as
    one might assume; entropy.coeff_probs[4][8][3][11] alone is 1056 bytes)
  - keyframe (frame 1) verbatim payload captured: y_ac_qi=8,
    last/golden/alt ts all 0, flags=0x0d (KEY|SHOW|NOSKIP),
    y_mode_probs=[145,156,163,128] (matches FFmpeg keyframe const)
  - inter frame verbatim payload captured: y_ac_qi=122, all DPB
    timestamps non-zero, flags=0x66 (anomaly: bit 0x40 not in mainline
    UAPI; vendor-patched ffmpeg-v4l2-request-git; kernel hantro_vp8.c
    only inspects KEY_FRAME bit, ignores bit 0x40)

VP8 SW pixel-verify reference (criterion-4 anchor):
  vp8_sw_001.jpg: e43757a40e5d71ad176455c0fda14c2cbf9351b702188fc8ad584d789db2c984
  vp8_sw_002.jpg: a86bf885e588257731ff6cf8d2ccc5756be550e85220eee1c3e6ea8c0c78e97a
  Frame 1 != Frame 2 (real motion). These are the Phase 7 byte-compare
  HW-vs-SW targets.

Open-question resolution (5 of 6 answered empirically):
  Q1 first_part_header_bits: varies per frame (key=6550, inter ranges
     86..254); VAAPI doesn't expose. Phase 4 fallback: leave 0 and check
     kernel behavior at Phase 7 byte-compare. Phase 5 review will flag
     as known fidelity gap.
  Q2 num_dct_parts vs VAAPI num_of_partitions: confirmed off-by-one:
     kernel = VAAPI - 1 (BBB has VAAPI=2, kernel=1).
  Q3 DPB timestamp 0-sentinel: confirmed: keyframe writes all three
     timestamps as 0; iter3 mirrors iter1 mpeg2.c pattern.
  Q4 SHOW_FRAME default: set on every captured frame (BBB has no
     alt-ref invisible). Force unconditional in libva backend.
  Q5 lf.flags FILTER_TYPE_SIMPLE: not set; BBB normal loop filter.
     Direct mapping from VAAPI filter_type=0.
  Q6 First-frame DPB sentinel: confirmed Q3; no self-reference fallback
     needed (different from iter1 mpeg2.c).

V4L2 binding cells this boot:
  rkvdec        : /dev/video3 + /dev/media1
  hantro-vpu-dec: /dev/video5 + /dev/media2

Capture artefacts on fresnel /tmp/iter3_phase3/ preserved for Phase 7
re-run:
  vp8_strace.* (19 files, multi-thread)
  decode_vp8.py (payload decoder)
  vp8_sw_00{1,2}.jpg (criterion-4)
  {h264,mpeg2,hevc}_hw_00{1,2}.jpg (criterion-5)

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:14:46 +00:00
parent 898544a29c
commit fd3fce86a6
1 changed files with 214 additions and 0 deletions
@@ -0,0 +1,214 @@
+# Iteration 3 — Phase 3 (baselines)
+
+Captured 2026-05-08 on fresnel across two suspend cycles (the laptop suspended twice mid-capture; captures were preserved on `/tmp/iter3_phase3` between runs). Phase 3 deliverables per `feedback_dev_process.md`:
+
+1. Substrate re-verification (criterion-5 anchor) — ✅
+2. Cross-validator anchor (verbatim VP8_FRAME control payload) — ✅
+3. VAAPI consumer trace — deferred to Phase 6 build-time check (see step 3.4)
+4. Cache-safe pixel-verify SW reference (criterion-4 anchor) — ✅
+5. Phase 2 open-question answers — ✅ (5 of 6 answered empirically; 1 deferred)
+6. Three-codec regression block — ✅
+
+## Pre-flight (verified)
+
+```
+hostname           : fresnel
+kernel             : 6.19.9-99-eos-arm
+mpv                : 1:0.41.0-3 (NOT mpv-git — L3 satisfied)
+libva              : 2.23.0-1
+ffmpeg             : ffmpeg-v4l2-request-git 2:8.1.r123329.b57fbbe-2
+backend SHA256     : 9e27043847998c197a46a1a26b2f77f22880bb7b3a62aa4d60d8fcaec0ae6258  ← matches iter2 close
+fixture            : ~/fourier-test/bbb_720p10s_vp8.webm (2419912 bytes)
+
+V4L2 binding cells (this boot):
+  rkvdec        : /dev/video3 + /dev/media1
+  hantro-vpu-dec: /dev/video5 + /dev/media2
+```
+
+## Step 3.1 — Regression-block reference (criterion-5 anchor)
+
+All 6 reference hashes match byte-for-byte vs iter1+iter2 close — substrate has not regressed.
+
+| Codec | Site | Frame 1 SHA256 | Frame 2 SHA256 | Status |
+|---|---|---|---|---|
+| H.264 +30s | rkvdec | `f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9` | `7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8` | ✅ MATCH |
+| MPEG-2 +02s | hantro | `6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092` | `ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de` | ✅ MATCH |
+| HEVC +02s | rkvdec | `47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5` | `a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656` | ✅ MATCH |
+
+JPEGs preserved at `/tmp/iter3_phase3/{h264,mpeg2,hevc}_hw_00{1,2}.jpg` for re-running Phase 7 byte-compare.
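Before a Phase 7 re-run, the regression block can be re-checked by recomputing each JPEG's SHA256 and comparing it with the table above. The following is a minimal sketch, assuming the JPEGs are still under `/tmp/iter3_phase3/`; the helper names `sha256_of` and `check_refs` are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# Reference hashes copied from the Step 3.1 table (frame 1, frame 2 per codec).
REFS = {
    "h264_hw_001.jpg": "f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9",
    "h264_hw_002.jpg": "7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8",
    "mpeg2_hw_001.jpg": "6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092",
    "mpeg2_hw_002.jpg": "ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de",
    "hevc_hw_001.jpg": "47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5",
    "hevc_hw_002.jpg": "a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656",
}


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_refs(directory: Path, refs: dict) -> dict:
    """Map each file name to 'MATCH', 'MISMATCH' or 'MISSING'."""
    status = {}
    for name, expected in refs.items():
        path = directory / name
        if not path.is_file():
            status[name] = "MISSING"
        elif sha256_of(path) == expected:
            status[name] = "MATCH"
        else:
            status[name] = "MISMATCH"
    return status
```

Calling `check_refs(Path("/tmp/iter3_phase3"), REFS)` on fresnel should report `MATCH` for all six files as long as the captures are intact; a `MISSING` entry would usually mean `/tmp` was cleared by a reboot.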
+
+## Step 3.2 — VP8 SW pixel-verify reference (criterion-4 anchor)
+
+`mpv --hwdec=no --vo=image --vo-image-format=jpg --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp8.webm`:
+
+| Frame | SHA256 | Size |
+|---|---|---|
+| 1 | `e43757a40e5d71ad176455c0fda14c2cbf9351b702188fc8ad584d789db2c984` | 235990 bytes |
+| 2 | `a86bf885e588257731ff6cf8d2ccc5756be550e85220eee1c3e6ea8c0c78e97a` | 232549 bytes |
+
+Frame 1 ≠ Frame 2 (real motion). These two hashes are the Phase 7 criterion-4 byte-equality target.
+
+JPEGs preserved at `/tmp/iter3_phase3/vp8_sw_00{1,2}.jpg`.
+
+## Step 3.3 — Cross-validator strace + V4L2_CID_STATELESS_VP8_FRAME payload
+
+Command: `strace -ff -tt -y -v -s 4096 -e trace=ioctl,openat,close ffmpeg -hwaccel v4l2request -hwaccel_device /dev/media2 -i bbb_720p10s_vp8.webm -frames:v 5 -f null -`. ffmpeg-v4l2-request-git decoded 5 frames; strace produced 7 worker-thread PID files plus helpers. Payloads were decoded with a custom Python decoder at `/tmp/iter3_phase3/decode_vp8.py` (kept on disk for re-run).
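A quick way to find which per-thread file carries the control submissions is to count the `VIDIOC_S_EXT_CTRLS` lines per file. This is a rough sketch, assuming the files follow the `vp8_strace.*` naming from the artefact list; `count_s_ext_ctrls` is a hypothetical helper and is not the `decode_vp8.py` decoder.

```python
from pathlib import Path


def count_s_ext_ctrls(directory: Path, pattern: str = "vp8_strace.*") -> dict:
    """Count lines mentioning VIDIOC_S_EXT_CTRLS in each strace output file.

    Files with no such line are left out of the result.
    """
    counts = {}
    for path in sorted(directory.glob(pattern)):
        with path.open("r", errors="replace") as handle:
            hits = sum(1 for line in handle if "VIDIOC_S_EXT_CTRLS" in line)
        if hits:
            counts[path.name] = hits
    return counts
```

The file with the highest count is the natural starting point for payload decoding; files with only `openat`/`close` lines drop out of the result.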
+
+### Submission shape (confirmed)
+
+| Property | Value | Source |
+|---|---|---|
+| ioctl | `VIDIOC_S_EXT_CTRLS` | strace verbatim |
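+| `ctrl_class` / `which` | `0xf010000` (`V4L2_CTRL_WHICH_REQUEST_VAL`; the field is a union with `which`, and `V4L2_CTRL_CLASS_CODEC_STATELESS` itself is `0x00a40000`) | strace verbatim |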
+| `count` | `1` | strace verbatim — confirms single-control-per-frame predicted in Phase 2 |
+| `controls[0].id` | `0xa409c8` | matches `V4L2_CID_STATELESS_VP8_FRAME` from kernel UAPI |
+| `controls[0].size` | `1232` | **CORRECTION vs Phase 2** — original commit message of iter1 said "400 bytes" for v4l2_ctrl_vp8_frame; actual is 1232 bytes. Computed: 16(seg)+16(lf)+8(quant)+1104(entropy)+4(coder_state)+84(tail) = 1232. The big component is `entropy.coeff_probs[4][8][3][11] = 1056 bytes`. |
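The size arithmetic in the table can be reproduced with Python's `struct` module. The sketch below is illustrative only: the per-component totals come from the table, while the field-level widths and padding inside each format string are an assumption based on a reading of the mainline UAPI header, and the control ID is recomputed from the class and base offsets as defined there.

```python
import struct

# "<" disables native alignment, so padding is written explicitly as "x".
# ASSUMPTION: field widths/padding follow the mainline UAPI structs.
LAYOUT = {
    "segment": "<4b4b3BxI",          # quant_update, lf_update, segment_probs, pad, flags
    "lf": "<4b4bBB2xI",              # ref_frm_delta, mb_mode_delta, sharpness, level, pad, flags
    "quant": "<B5b2x",               # y_ac_qi, five deltas, pad
    "entropy": "<1056s4B3B38s3x",    # coeff_probs, y_mode_probs, uv_mode_probs, mv_probs, pad
    "coder_state": "<BBBx",          # range, value, bit_count, pad
    "tail": "<HH8BII8I3QQ",          # width..num_dct_parts, part sizes, 3 timestamps, flags
}

SIZES = {name: struct.calcsize(fmt) for name, fmt in LAYOUT.items()}
TOTAL = sum(SIZES.values())

# coeff_probs[4][8][3][11], one byte per probability.
COEFF_PROBS_BYTES = 4 * 8 * 3 * 11

# Control ID derivation from the stateless codec class.
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00A40000
V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900
V4L2_CID_STATELESS_VP8_FRAME = V4L2_CID_CODEC_STATELESS_BASE + 200

print(SIZES, TOTAL, hex(V4L2_CID_STATELESS_VP8_FRAME))
```

With these format strings the components come out as 16, 16, 8, 1104, 4 and 84 bytes, summing to 1232, and the derived control ID is `0xa409c8`, both consistent with the strace capture.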
+
+### Verbatim payload — frame 1 (keyframe, PID 2860 first call)
+
+```
+struct v4l2_vp8_segment:
+  quant_update[4]   = (0, 0, 0, 0)
+  lf_update[4]      = (0, 0, 0, 0)
+  segment_probs[3]  = (0, 0, 0)
+  flags             = 0x08  (V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE)
+
+struct v4l2_vp8_loop_filter:
+  ref_frm_delta[4]  = (2, 0, -2, -2)
+  mb_mode_delta[4]  = (4, -2, 2, 4)
+  sharpness_level   = 0
+  level             = 1
+  flags             = 0x03  (ADJ_ENABLE | DELTA_UPDATE)
+
+struct v4l2_vp8_quantization:
+  y_ac_qi           = 8
+  y_dc_delta        = 0
+  y2_dc_delta       = 0
+  y2_ac_delta       = 0
+  uv_dc_delta       = 0
+  uv_ac_delta       = 0
+
+struct v4l2_vp8_entropy:
+  sha1(1104 bytes)  = 8b2fdae200eb193f...
+  y_mode_probs[4]   = (145, 156, 163, 128)  ← FFmpeg's hardcoded keyframe_y_mode_probs
+  uv_mode_probs[3]  = (142, 114, 183)        ← FFmpeg's hardcoded keyframe_uv_mode_probs
+
+struct v4l2_vp8_entropy_coder_state:
+  range             = 248
+  value             = 133
+  bit_count         = 2
+
+width × height      = 1280 × 720
+horizontal_scale    = 0
+vertical_scale      = 0
+version             = 0
+prob_skip_false     = 255
+prob_intra          = 0      ← KEY frame: intra always-on; field unused; FFmpeg writes parser state which is 0
+prob_last           = 0      ← same
+prob_gf             = 0      ← same
+num_dct_parts       = 1
+first_part_size     = 22742
+first_part_header_bits = 6550
+dct_part_sizes[8]   = (277872, 0, 0, 0, 0, 0, 0, 0)
+last_frame_ts       = 0      ← KEY frame: no prior reference
+golden_frame_ts     = 0      ← same
+alt_frame_ts        = 0      ← same
+flags               = 0x0d   (KEY_FRAME | SHOW_FRAME | MB_NO_SKIP_COEFF)
+```
+
+### Verbatim payload — frame 2 (inter, PID 2860 second call)
+
+```
+segment: same as frame 1 (BBB has segmentation disabled)
+lf: ref/mb deltas same; sharp=0; level=15; flags=0x01  (DELTA_UPDATE bit clears post-keyframe)
+quant: y_ac_qi=122; all deltas=0
+entropy: sha1=e5742b9050e8dc66 (CHANGED — BBB inter-frame entropy state)
+  y_mode_probs   = (3, 1, 128, 1)             ← parser-derived inter probs
+  uv_mode_probs  = (162, 101, 204)            ← parser-derived
+coder_state: range=150 value=69 bit_count=3   ← post-frame-1 boolean coder state
+prob_skip_false=14 prob_intra=1 prob_last=251 prob_gf=255 num_dct_parts=1
+first_part_size=1218 first_part_header_bits=133
+dct_part_sizes=(122,0,0,0,0,0,0,0)
+last_frame_ts=5000  golden_frame_ts=11000  alt_frame_ts=11000
+flags = 0x66  (see flags-anomaly note below)
+```
+
+### Flags-anomaly note (informational; not blocking iter3)
+
+The empirical inter-frame `flags=0x66` (`0x02 | 0x04 | 0x20 | 0x40`) is written by ffmpeg-v4l2-request-git. Bit `0x40` is **not defined** in mainline `<linux/v4l2-controls.h>` (only bits `0x01` through `0x20` are). The keyframe value `flags=0x0d` (`0x01 | 0x04 | 0x08`) uses only defined bits.
+
+The extra `0x40` bit appears to be a vendor-patched flag in the installed ffmpeg-v4l2-request-git (the kwiboo branch may carry downstream changes vs the in-tree reference). The kernel `hantro_vp8.c` driver only inspects `V4L2_VP8_FRAME_IS_KEY_FRAME(hdr)`, so bit `0x40` is silently ignored.
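The decomposition of both captured `flags` values can be checked mechanically. A small sketch follows, with bit names taken from the mainline `V4L2_VP8_FRAME_FLAG_*` definitions; `decode_flags` is an illustrative helper and not part of `decode_vp8.py`.

```python
# Mainline frame-flag bits (0x01 through 0x20).
VP8_FRAME_FLAGS = {
    0x01: "KEY_FRAME",
    0x02: "EXPERIMENTAL",
    0x04: "SHOW_FRAME",
    0x08: "MB_NO_SKIP_COEFF",
    0x10: "SIGN_BIAS_GOLDEN",
    0x20: "SIGN_BIAS_ALT",
}


def decode_flags(value: int):
    """Split a flags word into (names of known bits set, leftover unknown bits)."""
    known_mask = 0
    names = []
    for bit, name in VP8_FRAME_FLAGS.items():
        known_mask |= bit
        if value & bit:
            names.append(name)
    return names, value & ~known_mask


print(decode_flags(0x0D))
print(decode_flags(0x66))
```

For `0x0d` the leftover is zero; for `0x66` the known bits are EXPERIMENTAL, SHOW_FRAME and SIGN_BIAS_ALT, with a leftover of `0x40`, which is the anomaly described above.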
+
+**Phase 4 plan implication**: the libva backend should set ONLY the 6 mainline-documented flag bits per the Phase 2 mapping table. We will NOT attempt to byte-match FFmpeg's `0x66` for inter frames during the Phase 7 cross-validator byte-compare; instead, that compare will be field-by-field with an explicit allow-list for `flags` (the KEY_FRAME bit plus the 5 booleans transcoded from VAAPI).
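A field-by-field compare with a flag allow-list could look roughly like the sketch below. It assumes both payloads have already been decoded into plain dictionaries keyed by field name; `diff_payloads`, the mask values and the example `flags` value for the libva side are hypothetical and only illustrate the idea.

```python
MAINLINE_FLAG_MASK = 0x3F  # bits 0x01 through 0x20


def diff_payloads(ours: dict, theirs: dict, flag_mask: int = MAINLINE_FLAG_MASK) -> dict:
    """Return {field: (ours, theirs)} for every field that differs.

    The `flags` field is compared only on the bits selected by `flag_mask`,
    so bits outside the allow-list never cause a mismatch.
    """
    diffs = {}
    for key in sorted(set(ours) | set(theirs)):
        a, b = ours.get(key), theirs.get(key)
        if key == "flags" and a is not None and b is not None:
            a, b = a & flag_mask, b & flag_mask
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Hypothetical libva-side inter frame (0x24) vs the captured FFmpeg value (0x66).
ours = {"y_ac_qi": 122, "num_dct_parts": 1, "flags": 0x24}
theirs = {"y_ac_qi": 122, "num_dct_parts": 1, "flags": 0x66}
print(diff_payloads(ours, theirs))                  # EXPERIMENTAL (0x02) still differs
print(diff_payloads(ours, theirs, flag_mask=0x3D))  # EXPERIMENTAL excluded as well
```

Masking with `0x3F` hides only the undefined `0x40` bit; dropping `0x02` from the mask as well reflects a backend that never sets EXPERIMENTAL.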
+
+The EXPERIMENTAL bit (`0x02`) is set by FFmpeg per `if (s->profile & 0x4)`. BBB has profile=0, so `(s->profile & 0x4) == 0` and the bit should be clear. The empirical `0x02` on inter frames suggests ffmpeg-v4l2-request-git either (a) uses a different conditional or (b) sets it from a different field. Either way, the libva backend skips this bit because VAAPI doesn't expose it.
+
+## Step 3.4 — VAAPI consumer trace (DEFERRED)
+
+`LIBVA_TRACE` capture from `mpv --hwdec=no --vo=null` is uninformative because mpv with hwdec=no doesn't engage the libva decode path (uses libva for color-conversion only). Capturing the libva-side decode-path trace requires HW-decode mode, which iter3's whole point is to enable.
+
+**Decision**: defer VAAPI buffer-type enumeration verification to Phase 6 build-time. Phase 2 source-read of `va_dec_vp8.h` already enumerated the 4 buffer types (Picture, Slice, Probability, IQMatrix); Phase 6 build will verify the dispatcher accepts them. If a buffer type is missing or extra, Phase 6 compile/runtime will surface it.
+
+## Step 3.5 — Open-question resolution
+
+Six Phase 2 questions; empirical answers:
+
+### Q1: `first_part_header_bits` exact value
+
+**Frame 1 (key)**: 6550 bits. **Inter frames**: 133, 86, 140, 254, 86 (varies per frame). FFmpeg derives this from `s->coder_state_at_header_end.input - data` minus residual bits. VAAPI does NOT expose this directly.
+
+**Phase 4 implication**: VAAPI's `slice->macroblock_offset` (bit offset of MB layer from start of slice data) is the closest analog. **However**, `slice->macroblock_offset` is the MB-data offset (after BOTH the uncompressed header AND the entropy header), whereas `first_part_header_bits` is just the entropy header portion. They differ by `first_part_size * 8 - first_part_header_bits` (the entropy-encoded part of the control partition).
+
+**Fallback strategy**: leave `first_part_header_bits = 0` and check whether kernel hantro driver actually uses it. If it doesn't (likely — the driver re-parses the bitstream), zero is correct. If it does, Phase 7 byte-compare will reveal divergence and Phase 4 will need to compute it bitstream-side. **Phase 5 review will flag this as a known fidelity gap**.
+
+### Q2: `num_dct_parts` vs VAAPI `num_of_partitions`
+
+**Empirical**: `num_dct_parts = 1` for every captured frame. BBB has 2 partitions total (1 control + 1 DCT). VAAPI's `slice->num_of_partitions = 2`. Confirms predicted off-by-one: `num_dct_parts = slice->num_of_partitions - 1`.
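The off-by-one mapping is small enough to state as code. The sketch below is illustrative only; the function name is hypothetical, and the validity check relies on VP8 permitting 1, 2, 4 or 8 DCT partitions.

```python
def num_dct_parts_from_vaapi(num_of_partitions: int) -> int:
    """Convert VAAPI's partition count (control + DCT) to the kernel's DCT-only count."""
    dct_parts = num_of_partitions - 1
    if dct_parts not in (1, 2, 4, 8):
        raise ValueError(f"unexpected VAAPI num_of_partitions={num_of_partitions}")
    return dct_parts


print(num_dct_parts_from_vaapi(2))  # BBB: 1 control + 1 DCT partition
```

Rejecting other counts early keeps a malformed slice parameter from reaching the `dct_part_sizes[8]` fill loop.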
+
+### Q3: DPB timestamp 0-sentinel handling
+
+**Empirical**: Frame 1 (key) has `last_ts=0, golden_ts=0, alt_ts=0` — all three zero. Inter frames have all three non-zero (referencing prior visible frames). Confirms FFmpeg writes 0 for missing refs (matches `forward_ref_ts=0` pattern from iter1 mpeg2.c::mpeg2_set_controls).
+
+**Phase 4 implication**: in vp8_set_controls, lookup VASurfaceID `picture->{last,golden,alt}_ref_frame`; if `SURFACE() == NULL` (i.e. `VA_INVALID_SURFACE` or stale ID), leave timestamp = 0. Mirror iter1 mpeg2.c pattern (lines 146-156).
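The sentinel rule can be modelled in a few lines. This is a sketch under stated assumptions: the `surfaces` dictionary stands in for the backend's `SURFACE()` lookup and maps a surface ID to its V4L2 timestamp, and the surface IDs in the example are made up; only `VA_INVALID_SURFACE` and the timestamps 5000 and 11000 come from libva and the capture respectively.

```python
VA_INVALID_SURFACE = 0xFFFFFFFF  # same value as VA_INVALID_ID in va.h


def ref_timestamp(surface_id: int, surfaces: dict) -> int:
    """Return the timestamp for a reference surface, or 0 when no reference exists."""
    if surface_id == VA_INVALID_SURFACE:
        return 0
    # A stale or unknown ID also falls back to the 0 sentinel.
    return surfaces.get(surface_id, 0)


# Hypothetical surface table: IDs 3 and 7 hold previously decoded frames.
surfaces = {3: 5000, 7: 11000}
keyframe_refs = [VA_INVALID_SURFACE] * 3  # last, golden, alt
inter_refs = [3, 7, 7]

print([ref_timestamp(s, surfaces) for s in keyframe_refs])
print([ref_timestamp(s, surfaces) for s in inter_refs])
```

The keyframe case yields three zeros and the inter case yields 5000, 11000, 11000, mirroring the two captured payloads.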
+
+### Q4: `SHOW_FRAME` flag default
+
+**Empirical**: `flags & SHOW_FRAME (0x04)` set on every captured frame (key + inter). BBB has no alt-ref invisible frames in the +0..2s range. **Phase 4 decision**: force `flags |= SHOW_FRAME` unconditionally — VAAPI doesn't expose the bit, and BBB is all-visible. Document as known fidelity gap for streams with alt-ref invisible frames (iter3 out-of-scope per Phase 1 lock).
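As a rough illustration of the Phase 4 decision, frame `flags` could be assembled as below, with SHOW_FRAME forced on. The function and its boolean parameters are hypothetical; how each boolean is obtained from the VAAPI picture parameters is left to the Phase 2 mapping table.

```python
KEY_FRAME = 0x01
SHOW_FRAME = 0x04
MB_NO_SKIP_COEFF = 0x08
SIGN_BIAS_GOLDEN = 0x10
SIGN_BIAS_ALT = 0x20


def build_frame_flags(is_key_frame: bool, mb_no_skip_coeff: bool,
                      sign_bias_golden: bool, sign_bias_alt: bool) -> int:
    """Assemble frame flags; SHOW_FRAME is unconditional because VAAPI does not expose it."""
    flags = SHOW_FRAME
    if is_key_frame:
        flags |= KEY_FRAME
    if mb_no_skip_coeff:
        flags |= MB_NO_SKIP_COEFF
    if sign_bias_golden:
        flags |= SIGN_BIAS_GOLDEN
    if sign_bias_alt:
        flags |= SIGN_BIAS_ALT
    return flags


print(hex(build_frame_flags(True, True, False, False)))
```

With the keyframe inputs shown, the result is `0xd`, matching the captured keyframe payload.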
+
+### Q5: `lf.flags & FILTER_TYPE_SIMPLE`
+
+**Empirical**: not set on any captured frame. BBB uses normal (not simple) loop filter. Confirms VAAPI's `pic_fields.bits.filter_type=0` for BBB — direct mapping per Phase 2 table.
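The loop-filter flag mapping can be sketched the same way. The bit values for ADJ_ENABLE and DELTA_UPDATE follow from the captured `0x03` and `0x01` payloads; the value for FILTER_TYPE_SIMPLE and the VAAPI-side parameter names are an assumption based on the mainline header and `va_dec_vp8.h`.

```python
LF_ADJ_ENABLE = 0x01
LF_DELTA_UPDATE = 0x02
LF_FILTER_TYPE_SIMPLE = 0x04  # ASSUMPTION: next bit after DELTA_UPDATE


def build_lf_flags(filter_type: int, loop_filter_adj_enable: int,
                   mode_ref_lf_delta_update: int) -> int:
    """Map VAAPI loop-filter bits to the V4L2 loop-filter flags word."""
    flags = 0
    if loop_filter_adj_enable:
        flags |= LF_ADJ_ENABLE
    if mode_ref_lf_delta_update:
        flags |= LF_DELTA_UPDATE
    if filter_type:
        flags |= LF_FILTER_TYPE_SIMPLE
    return flags


print(hex(build_lf_flags(0, 1, 1)))  # keyframe capture
print(hex(build_lf_flags(0, 1, 0)))  # inter-frame capture
```

Both BBB cases keep `filter_type=0`, so FILTER_TYPE_SIMPLE never appears; the results are `0x3` and `0x1`, as in the captured payloads.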
+
+### Q6: First-frame DPB sentinel
+
+**Empirical**: confirmed Q3 above — `last_ts=golden_ts=alt_ts=0` for the keyframe. No self-reference fallback (different from iter1's mpeg2.c where I had to fix self-reference to use 0 sentinel; FFmpeg's VP8 path naturally writes 0 via C99 designated init).
+
+## Phase 3 → Phase 4 transition (proceed condition)
+
+All Phase 3 deliverables green:
+
+- Substrate not regressed (3-codec hashes hold, criterion-5 anchor solid)
+- Cross-validator strace captured (~13 S_EXT_CTRLS for 5 frames decoded; verbatim payload for keyframe + inter frames available)
+- struct size CORRECTED to 1232 bytes (vs Phase 2 implicit assumption of ~400)
+- 5 of 6 open questions answered empirically; Q1 (first_part_header_bits) deferred with safe-default fallback
+- VP8 SW reference JPEGs captured (criterion-4 anchor)
+
+Phase 4 plan can lock against:
+
+1. Verbatim keyframe + inter-frame payload bytes (above) for byte-compare anchors.
+2. Confirmed quantization deltas all zero for BBB → the libva backend computes `quant.y_dc_delta = quantization_index[0][1] - quantization_index[0][0]` (likewise for the other deltas) and verifies all-zero for BBB, which confirms the mapping.
+3. Confirmed segment fields all zero for BBB (segmentation disabled); `segment.flags |= DELTA_VALUE_MODE` per FFmpeg pattern, but kernel ignores when ENABLED bit clear.
+4. `lf.flags = ADJ_ENABLE` for BBB on inter frames; `ADJ_ENABLE | DELTA_UPDATE` on keyframe (DELTA_UPDATE only when keyframe initializes the loop-filter delta state).
+5. `flags` byte-compare to use mainline-documented bits only (libva backend will produce 0x0d for keyframe, 0x04|... for inter frames; FFmpeg's bit 0x40 explicitly NOT replicated).
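Item 2 above can be illustrated with a short sketch. It assumes one row of VAAPI's `quantization_index` holds six absolute indices in the order y_ac, y_dc, y2_dc, y2_ac, uv_dc, uv_ac; only the `[0][1] - [0][0]` relation for `y_dc_delta` is stated in these notes, so the remaining order is an assumption, and `quant_from_vaapi` is a hypothetical name.

```python
def quant_from_vaapi(row) -> dict:
    """Convert one row of absolute quantizer indices into base index plus deltas."""
    # ASSUMPTION: entry order is y_ac, y_dc, y2_dc, y2_ac, uv_dc, uv_ac.
    y_ac, y_dc, y2_dc, y2_ac, uv_dc, uv_ac = row
    return {
        "y_ac_qi": y_ac,
        "y_dc_delta": y_dc - y_ac,
        "y2_dc_delta": y2_dc - y_ac,
        "y2_ac_delta": y2_ac - y_ac,
        "uv_dc_delta": uv_dc - y_ac,
        "uv_ac_delta": uv_ac - y_ac,
    }


keyframe = quant_from_vaapi([8] * 6)      # y_ac_qi=8 in the keyframe capture
inter = quant_from_vaapi([122] * 6)       # y_ac_qi=122 in the inter capture
print(keyframe)
print(inter)
```

For BBB every delta comes out as zero. If the VAAPI producer clamps the per-plane indices, a recovered delta could differ from the bitstream delta near the ends of the index range, which is worth keeping in mind for streams other than BBB.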
+
+## Substrate state at Phase 3 close
+
+- iter3 Phase 1 + Phase 2 commits pushed to gitea (`ea2413e`, `898544a`).
+- Fork on noether at iter2 tip `8d71e20`; Phase 6 patches will land here.
+- fresnel went offline twice during Phase 3 capture (suspend mid-run), captures preserved on `/tmp/iter3_phase3` between runs.
+- Memory rules carry forward unchanged (5 entries + new `feedback_fresnel_hostname`).
+- Capture artefacts on fresnel `/tmp/iter3_phase3/`:
+  - `vp8_strace.*` — 19 strace files (multi-thread)
+  - `decode_vp8.py` — payload decoder (kept for Phase 7 re-run)
+  - `vp8_sw_00{1,2}.jpg` — SW reference (criterion-4 anchor)
+  - `{h264,mpeg2,hevc}_hw_00{1,2}.jpg` — regression block (criterion-5 anchors)