# Iteration 14: Phase 8 (close)

Closes 2026-05-14.

iter14 = α-16 OUTPUT bitstream byte dump. **Definitive empirical narrowing of Bug 4 + Bug 5 to kernel-side.** PARTIAL on the campaign's success criteria, but the largest single jump in understanding since iter5b.

## Outcome

| Metric | Value |
|---|---|
| Fork tip end | `522fb6d` (α-16 OUTPUT dump) |
| LOC delta | +43 in `src/picture.c` |
| Backend SHA on fresnel | `fa2098b69fd484ea2e4e9b6208d9e1a996358ae64401b47b5ac8bdb166e3c972` |
| Phase 1 criteria | 5/6 PASS: Bug 4/5 hashes unchanged but **cause definitively localized** |

## The key result

For HEVC frame 1 (IDR keyframe), 96893-byte OUTPUT dump from libva:

```
size: 96893
start codes (00 00 01) at: [0]    # exactly ONE start code, at position 0
total: 1
pos 0: NAL header 0x28, nal_unit_type=20    # IDR_N_LP, correct
```

Comparison against the input file's raw HEVC Annex-B IDR NAL:

```
libva slice (after start code) size: 96890 bytes
input file's slice NAL+data size (up to next start code): 96891 bytes
byte-by-byte diff over min(96890,96891)=96890: 0 bytes differ
```

**0 bytes differ.** The comparison covers the 96890 bytes both sides share; the input-side count is one byte longer (96891 vs 96890). Within that range, libva's OUTPUT buffer contains exactly the IDR NAL the kernel should decode. kdirect (ffmpeg-v4l2request) submits the same bytes from the same parser, so both backends submit an identical bitstream.
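
The three checks behind the key result (start-code scan, NAL type extraction, prefix byte diff) can be sketched as below. This is a minimal illustration, not the α-16 tooling itself: the function names and the synthetic buffers are assumptions made for the example, and the extra trailing byte only mimics the 96890 vs 96891 length mismatch (its value is arbitrary).

```python
def find_start_codes(buf: bytes) -> list[int]:
    """Return the offset of every 3-byte Annex-B start code (00 00 01)."""
    offsets = []
    pos = buf.find(b"\x00\x00\x01")
    while pos != -1:
        offsets.append(pos)
        pos = buf.find(b"\x00\x00\x01", pos + 3)
    return offsets


def hevc_nal_unit_type(header_byte: int) -> int:
    """nal_unit_type is the 6 bits after forbidden_zero_bit in the first header byte."""
    return (header_byte >> 1) & 0x3F


def count_diffs(a: bytes, b: bytes) -> tuple[int, int]:
    """Compare over the shared prefix; return (compared_length, differing_bytes)."""
    n = min(len(a), len(b))
    return n, sum(x != y for x, y in zip(a[:n], b[:n]))


if __name__ == "__main__":
    # Synthetic stand-in data (hypothetical, not the real 96893-byte dump).
    nal = bytes([0x28, 0x01]) + b"\xaf\x10\x80"
    dump = b"\x00\x00\x01" + nal      # what the OUTPUT dump would look like
    reference = nal + b"\x00"         # input-side NAL, one byte longer

    print("start codes at:", find_start_codes(dump))
    print("nal_unit_type:", hevc_nal_unit_type(dump[3]))
    print("compared, differing:", count_diffs(dump[3:], reference))
```

Comparing only over the shared prefix mirrors the `min(96890,96891)` convention in the dump above; a length mismatch is reported separately by the caller rather than counted as a byte difference.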

## What this empirically rules out

Cumulative iter8–iter14 eliminations for Bug 4 + Bug 5:

| Iter | Hypothesis | Status |
|---|---|---|
| iter8 P7 | γ dump: libva mis-reads | ❌ Eliminated |
| iter8 P7 | Slot binding wrong | ❌ Eliminated |
| iter8 IMP-1 | Stale residue (memset test) | ❌ Eliminated |
| iter8 Phase 5b | SPS constraint_set_flags | ❌ Eliminated (rkvdec ignores) |
| iter9 α-2 | POC sentinel | ❌ Eliminated |
| iter9 α-7 | Reference_ts magnitude | ❌ Eliminated |
| iter11 α-13 | sps_max_num_reorder_pics | ❌ Eliminated (rkvdec ignores) |
| iter11 α-14 | DECODE_PARAMS IRAP/IDR flags | ❌ Eliminated (rkvdec ignores) |
| iter11 | num_entry_point_offsets | ❌ Eliminated (rkvdec ignores) |
| iter11 | Slice qp_delta | ❌ Eliminated (rkvdec ignores) |
| iter12 | RFC v2 vb2_dma_resv fences | ❌ Eliminated (orthogonal path) |
| iter13 α-17 | DMA_BUF_IOCTL_SYNC CPU cache | ❌ Eliminated (ioctls work, output unchanged) |
| **iter14 α-16** | **OUTPUT bitstream bytes wrong** | **❌ Eliminated (byte-identical to input)** |

**13 hypotheses eliminated.** The libva backend produces byte-correct ioctls + controls + bitstream. Bug 4 + Bug 5 are **kernel-side**, not libva-side.

## Where the bug actually is

Given:

- libva submits the same byte-identical bitstream as kdirect.
- libva submits kernel-correct controls (rkvdec reads the same SPS / PPS / DPB / slice fields from both).
- libva uses the same V4L2 ioctl sequence shape (REQBUFS, S_FMT, EXPBUF, QBUF, MEDIA_REQUEST_IOC_QUEUE, DQBUF).
- Same kernel (linux-fresnel-fourier 7.0-2).
- Same hardware (rkvdec on RK3399).

But:

- libva HEVC → all-zero CAPTURE.
- kdirect HEVC → correct CAPTURE.

The cause must be one of:

- Some subtle ioctl-sequence difference (timing of STREAMON, QBUF ordering, request_fd reuse pattern) that triggers different rkvdec state.
- Some allocator difference (libva's CAPTURE buffer goes through one vb2 allocator, kdirect's through another, even though both end up V4L2_MEMORY_MMAP).
- Some kernel-side state-machine bug specific to how libva sequences calls.

These are NOT visible at the wire-byte / payload level. They are visible at the syscall-sequence level.

The natural next investigation is a **full ioctl trace comparison** (not just the S_EXT_CTRLS payload):

- libva strace: every ioctl from open → REQBUFS → S_FMT → EXPBUF → STREAMON → QBUF → MEDIA_REQUEST_IOC_QUEUE → DQBUF → close.
- kdirect strace: same.
- Find the FIRST diverging ioctl or its FIRST diverging argument.

## Lessons

1. **OUTPUT byte verification is the gold-standard ruling-out check.** Two iters (12, 13) thrashed on kernel-substrate / cache hypotheses before this one byte-compared the actual slice data. Doing α-16 EARLIER (iter5 / iter6) would have saved many cycles.
2. **The campaign has been chasing wire-byte fields the kernel ignores.** Same anti-pattern as iter8 α-1. The reviewer's "grep rkvdec source for field reference" methodology saves iterations.
3. **VP9 works through the same libva backend**, so this isn't a categorical libva failure. It's a kernel codec-specific failure (HEVC + H.264 paths) that libva's particular ioctl sequence triggers and kdirect's doesn't.

## Substrate state at iter14 close

- Fork tip `522fb6d` on noether + fresnel + gitea.
- Backend SHA `fa2098b6…` on fresnel.
- Kernel `7.0-2` (RFC v2 included).
- Cumulative libva improvements that ship clean (zero regression, wire correctness): γ dump (iter8), IMP-1 memset gate (iter8), α-2 POC strip removed (iter9), α-7 timestamp counter (iter9), α-13 SPS hygiene (iter11), α-14 IRAP/IDR flags (iter11), α-17 DMA_BUF_IOCTL_SYNC (iter13), α-16 OUTPUT dump (iter14). 8 commits, all env-gated or wire-correctness.

## iter15+ candidates

Given iter14's localization:

- **Full ioctl-sequence trace diff**: strace libva vs kdirect, complete syscall sequence, find the first divergence. Likely 1-2 hours.
- **Kernel-side rkvdec hot-path trace**: instrument rkvdec-hevc.c via ftrace or eBPF kprobe; compare how kernel state evolves between the libva-triggered and kdirect-triggered decode of the same input. Route via kernel-agent.
- **Investigate libva STREAMON timing**: libva's STREAMON happens at CreateContext (iter5b-β); kdirect's STREAMON timing may differ.
- **Campaign close-out documentation**: VP9 + MPEG-2 PASS direct via libva; HEVC + H.264 + VP8 remain kernel-side bugs, with narrowing complete down to wire-byte AND OUTPUT-byte identity. Campaign deliverable: a libva backend that's byte-correct on its side; the kernel-side gap is upstream.

## Memory rule candidate (defer)

**Strong empirical evidence that libva backend ioctl/control/OUTPUT byte production for HEVC + H.264 on RK3399 is byte-correct relative to the working reference (ffmpeg-v4l2request). Bug 4 + Bug 5 are KERNEL-SIDE failures in how rkvdec processes specific ioctl sequences.** Future iters that target libva-side fixes for these bugs are unlikely to succeed without kernel cooperation.
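
The "full ioctl-sequence trace diff" candidate amounts to finding the first diverging ioctl, or the first diverging argument, between two traces. A minimal sketch follows, assuming strace-style text lines of the form `ioctl(fd, REQUEST, args) = ret`. The regex, the function names, and the sample traces in the test data are all hypothetical; real strace output may render some requests (for example media request ioctls) in raw `_IOC(...)` form, which this simple pattern does not handle.

```python
import re
from itertools import zip_longest

# Assumed line shape: "ioctl(5, VIDIOC_QBUF, {...}) = 0".
# The fd is dropped on purpose: fd numbers differ between processes.
IOCTL_RE = re.compile(r"ioctl\(\d+,\s*(\w+)(.*)$")


def parse_ioctls(trace_text: str) -> list[tuple[str, str]]:
    """Extract (request_name, rest_of_line) for every ioctl line in a trace."""
    calls = []
    for line in trace_text.splitlines():
        m = IOCTL_RE.search(line)
        if m:
            calls.append((m.group(1), m.group(2).strip()))
    return calls


def first_divergence(a, b):
    """Return (index, kind, call_a, call_b) for the first mismatch, or None.

    kind is "request" when the ioctl names differ (or one trace ended early),
    and "argument" when the names match but the remaining text differs.
    """
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x is None or y is None or x[0] != y[0]:
            return i, "request", x, y
        if x[1] != y[1]:
            return i, "argument", x, y
    return None
```

Argument mismatches need manual triage: pointers, request fds, and buffer offsets can differ legitimately between two runs, so an "argument" hit is a place to look rather than a finding in itself.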