marfrit bece7b7016 iter6 Phase 2: situation — VP8 control bytes are correct; bug is elsewhere

Empirical byte-diff of libva vs kdirect VP8 control payloads on
current substrate:
- Keyframe (payloads 0+1): BYTE-IDENTICAL (0 diffs / 1232 bytes)
- Inter frames: only 24 bytes diff at offset 1200-1223, which are
  the 3 reference-frame timestamps. libva uses gettimeofday→ns
  (large values), kdirect uses pts-derived (small). Both internally
  consistent; kernel uses them as keys, absolute values don't matter.

Verdict: Bug 6 is NOT in vp8.c control generation. The bytes match.
With identical controls and same hardware, libva produces 0.4% pixel
match for keyframe — bug lives in slice-data path, bytesused, cache
coherency, or CAPTURE slot rotation.

5 hypotheses (H-A..H-E) for Phase 3 to narrow:
- H-A slice data corruption in libva path (picture.c memcpy)
- H-B slices_size wrong on OUTPUT QBUF
- H-C cache coherency on OUTPUT mmap before kernel DMA read
- H-D CAPTURE slot rotation mismatch
- H-E other (deeper kernel-side)

Pre-iter5b masked all of these via the OUTPUT format mismatch
producing all-zero output. β fixed format → kernel actually decodes →
underlying bug now visible.

iter3's transitive proof verified specific control fields. Did not
verify slice data, bytesused, cache state, or slot rotation. Same
pattern as iter2's HEVC transitive PASS missing Bug 5. Future
transitive PASS claims must enumerate non-verified artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 20:17:05 +00:00


Iteration 6 — Phase 2 (situation analysis)

Captured 2026-05-12 evening, immediately after Phase 1 lock. Empirical diff of VP8 control payloads (libva backend vs ffmpeg-v4l2request kdirect) ruled out the obvious "wrong fields in v4l2_ctrl_vp8_frame" hypothesis. Bug 6's root cause lives elsewhere.

Byte-level pixel divergence (Phase 1 anchor recap)

Raw NV12-derived yuv420p, 3 frames @720p:

Frame	Y match	U match	V match
0 (keyframe)	3779/921600 (0.4%)	2728/230400 (1.2%)	3001/230400 (1.3%)
1 (inter)	93/921600 (0.0%)	46/230400 (0.0%)	127/230400 (0.1%)
2 (inter)	33/921600 (0.0%)	1/230400 (0.0%)	21/230400 (0.0%)
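A minimal sketch of how per-plane match counts like those in the table can be computed, assuming both decodes were dumped as raw yuv420p frames of identical dimensions. The function name and the in-memory `bytes` inputs are illustrative, not taken from the Phase 1 tooling.

```python
def plane_match_counts(frame_a: bytes, frame_b: bytes, width: int, height: int):
    """Count byte-identical samples per plane for two yuv420p frames.

    Returns {"Y": (matches, total), "U": (...), "V": (...)}.
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)  # 4:2:0 chroma planes
    expected = y_size + 2 * c_size
    if len(frame_a) != expected or len(frame_b) != expected:
        raise ValueError(f"expected {expected} bytes per frame")

    bounds = {
        "Y": (0, y_size),
        "U": (y_size, y_size + c_size),
        "V": (y_size + c_size, expected),
    }
    result = {}
    for name, (start, end) in bounds.items():
        matches = sum(
            a == b for a, b in zip(frame_a[start:end], frame_b[start:end])
        )
        result[name] = (matches, end - start)
    return result
```

For 1280x720 this gives plane totals of 921600 (Y) and 230400 (U, V), the denominators used in the table.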

Frame 0 first 16 bytes (libva): 93 8e 8a 89 85 72 8c 6d 82 79 92 7e 80 80 80 80, plausibly real decoded content.

Frame 0 first 16 bytes (kdirect): different (real decoded content from the SAME bitstream).

Both decoded SOMETHING; the somethings disagree.

The decoder ran (no DQBUF ERROR, unlike HEVC) but produced bytes that don't match the same kernel's kdirect output for the same bitstream.

Control-payload diff (libva vs kdirect on current substrate)

Method: strace both paths with -x -s 99999, extract every VIDIOC_S_EXT_CTRLS payload at id=0xa409c8 (V4L2_CID_STATELESS_VP8_FRAME), unescape strace's hex string back to raw bytes, byte-compare across payload indices.
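The unescape step can be sketched as below, assuming the payload has already been cut out of the strace line as a C-style escaped string (without the surrounding quotes). The function name is hypothetical; the actual /tmp/vp8diff2.py may do this differently.

```python
_SIMPLE_ESCAPES = {
    "n": 0x0A, "t": 0x09, "r": 0x0D, "v": 0x0B, "f": 0x0C,
    "\\": 0x5C, '"': 0x22,
}


def unescape_strace(text: str) -> bytes:
    """Turn a C-style escaped string (\\xNN, octal, \\n, ...) into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))  # literal printable character
            i += 1
            continue
        i += 1
        if i >= len(text):
            raise ValueError("dangling backslash")
        esc = text[i]
        if esc == "x":
            digits = text[i + 1:i + 3]
            if len(digits) != 2:
                raise ValueError("truncated \\x escape")
            out.append(int(digits, 16))
            i += 3
        elif esc in "01234567":
            j = i
            while j < len(text) and j - i < 3 and text[j] in "01234567":
                j += 1
            out.append(int(text[i:j], 8) & 0xFF)
            i = j
        elif esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 1
        else:
            raise ValueError(f"unknown escape \\{esc}")
    return bytes(out)
```

A truncated capture would silently shorten the payload, so it is worth asserting that each unescaped payload has the expected control size before comparing.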

Result (full table at /tmp/vp8diff2.py):

- Payload 0 + 1 (frame 0 keyframe; both TRY_EXT_CTRLS and S_EXT_CTRLS for the same frame): 0 bytes differ. last_frame_ts = golden_frame_ts = alt_frame_ts = 0x0, key=True, fp_size (first_part_size)=22742, fp_hb (first_part_header_bits)=6550. Byte-identical between libva and kdirect.
- Payload 2+ (inter frames): 24 bytes differ at offsets 1200..1223, exactly the three __u64 reference timestamps (last_frame_ts, golden_frame_ts, alt_frame_ts). libva uses wall-clock ns from gettimeofday(&surface_object->timestamp, NULL) (e.g. 0x18aeea0df8c7c628); kdirect uses small pts-derived values (e.g. 0x1388 = 5000). Both are valid timestamps as far as the kernel is concerned: the kernel uses them only as keys to look up CAPTURE buffers. Absolute values don't matter as long as they're internally consistent, meaning the same value is used at the producer-side OUTPUT QBUF when the reference frame was decoded and in the consumer-side reference control when that reference is read.
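The byte comparison and the timestamp decode can be sketched as follows. The offset 1200 comes from the diff result above; little-endian layout is an assumption (it holds for aarch64 userspace on RK3399), and both function names are illustrative.

```python
import struct

REF_TS_OFFSET = 1200  # start of the three __u64 reference timestamps, per the diff


def diff_ranges(a: bytes, b: bytes):
    """Return inclusive (first, last) offset ranges where two payloads differ."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    ranges = []
    start = None
    for off, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = off
        elif start is not None:
            ranges.append((start, off - 1))
            start = None
    if start is not None:
        ranges.append((start, len(a) - 1))
    return ranges


def reference_timestamps(payload: bytes):
    """Decode (last, golden, alt) timestamps, assuming little-endian __u64."""
    return struct.unpack_from("<3Q", payload, REF_TS_OFFSET)
```

Individual bytes of two different timestamps can coincide, so the reported ranges may be narrower or more fragmented than a single 1200..1223 span.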

Verdict: control payloads for keyframes are byte-identical. For inter frames the only diff is reference timestamps (different domains, both internally consistent). Bug 6 is NOT in vp8.c::vp8_set_controls — the control bytes are correct.
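The "internally consistent" claim can itself be checked mechanically. The sketch below assumes each frame has been reduced to a hypothetical tuple (is_keyframe, output_timestamp_ns, (last, golden, alt)) in decode order; `timeval_to_ns` mirrors the seconds-and-microseconds to nanoseconds conversion applied to buffer timestamps.

```python
def timeval_to_ns(tv_sec: int, tv_usec: int) -> int:
    """Convert a timeval to nanoseconds."""
    return tv_sec * 1_000_000_000 + tv_usec * 1_000


def unresolved_references(frames):
    """Return (frame_index, ref_name, ts) for references to unseen timestamps.

    frames: iterable of (is_keyframe, output_ts_ns, (last, golden, alt)).
    Keyframes are skipped because their reference fields are unused.
    """
    seen = set()
    problems = []
    for index, (is_key, out_ts, refs) in enumerate(frames):
        if not is_key:
            for name, ref in zip(("last", "golden", "alt"), refs):
                if ref not in seen:
                    problems.append((index, name, ref))
        seen.add(out_ts)
    return problems
```

An empty result for both the libva and kdirect traces would back the verdict that the two timestamp domains differ only in absolute value.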

Where does the divergence live?

Hypothesis space narrowed:

H-A — Slice data corruption in libva path

picture.c::codec_store_buffer at line 78 does memcpy(surface_object->source_data + slices_size, buffer_object->data, buffer_object->size * buffer_object->count). If source_data points to the wrong slot, or slices_size is wrong, or the memcpy size is wrong, the OUTPUT buffer contents don't match what kdirect submits.

Pre-iter5b VP8 ran through this same code path and produced all-zero (because of the iter5b-fixed OUTPUT format mismatch). Now post-β VP8 runs the same code path with the correct format — so any pre-existing slice-data bug surfaces only now.

Diagnostic step: dump the OUTPUT buffer contents right before QBUF and compare to what kdirect writes. If different bytes, slice-data path is the bug.
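A minimal sketch of the comparison for this diagnostic, assuming each path's OUTPUT buffer has been dumped to bytes per frame (how the dumps are produced is not covered here). Unlike the fixed-size control payloads, the two buffers may differ in length, so the helper reports that separately.

```python
def first_mismatch(libva_data: bytes, kdirect_data: bytes):
    """Return (offset, length_differs) for two OUTPUT-buffer dumps.

    offset is the first differing byte within the common prefix,
    or None if the common prefix is identical.
    """
    common = min(len(libva_data), len(kdirect_data))
    offset = None
    for i in range(common):
        if libva_data[i] != kdirect_data[i]:
            offset = i
            break
    return offset, len(libva_data) != len(kdirect_data)
```

A result of (None, True) would mean one dump is a prefix of the other, which points at the submitted size rather than at the copied bytes.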

H-B — `slices_size` (bytesused on OUTPUT QBUF) wrong

picture.c::RequestEndPicture lines 462-464:

rc = v4l2_queue_buffer(driver_data->video_fd, request_fd, output_type,
               &surface_object->timestamp,
               surface_object->source_index,
               surface_object->slices_size, 1);

slices_size is the bytesused for OUTPUT. An earlier strace of HEVC showed bytesused=0 in the top-level v4l2_buffer field, but strace doesn't show m.planes[].bytesused for MPLANE buffers. So strace alone cannot tell us whether slices_size is being set correctly on the wire.

Diagnostic step: instrument the v4l2_queue_buffer call site to log slices_size per frame, OR use a kernel ftrace event to see what bytesused the kernel sees on the OUTPUT queue.
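For the logged value to be useful it needs an expected size to compare against. The sketch below derives one from the control fields, assuming the RFC 6386 frame layout: a 3-byte frame tag (10 bytes of header on keyframes), the first partition, a table of 3-byte partition sizes for all but the last DCT partition, then the DCT partitions. Whether each path includes the frame header in what it submits is not established here, so that is a parameter.

```python
def expected_vp8_frame_size(key_frame: bool, first_part_size: int,
                            dct_part_sizes, include_frame_header: bool = True):
    """Estimate the byte size of one VP8 frame from its partition sizes.

    dct_part_sizes must contain only the partitions actually present.
    """
    header = (10 if key_frame else 3) if include_frame_header else 0
    # One 3-byte size entry per DCT partition except the last.
    size_table = 3 * (len(dct_part_sizes) - 1) if dct_part_sizes else 0
    return header + first_part_size + size_table + sum(dct_part_sizes)
```

Computing both variants (with and without the header) and seeing which one each path's bytesused matches also says something about H-A, since a header-sized discrepancy shifts every byte of the slice data.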

H-C — Cache coherency on the OUTPUT bitstream side

CPU writes slice data to mmap'd OUTPUT buffer; kernel reads via DMA. On RK3399 with non-coherent DMA (memory feedback_rockchip_pixel_verify_path.md documents the cache-mmap weirdness), if no cache flush happens before QBUF, the kernel sees stale buffer contents.

Diagnostic step: add an explicit msync(MS_SYNC) before QBUF on the OUTPUT buffer and see if output changes.

H-D — CAPTURE-side slot rotation

cap_pool acquires a slot at BeginPicture; binds destination_data via surface_bind_slot. If the slot index passed to QBUF (surface_object->destination_index) doesn't match the slot whose mmap is bound to destination_data, the readback reads the WRONG slot.

Diagnostic step: log slot index per frame at QBUF time AND at copy_surface_to_image time.
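Once both sites log, the check can be automated. This sketch assumes a hypothetical log line format, `frame=<n> stage=<qbuf|readback> index=<n>`, which would have to match whatever the instrumentation actually prints.

```python
import re

# Hypothetical log format; adjust to the real instrumentation output.
_SLOT_LINE = re.compile(r"frame=(\d+) stage=(qbuf|readback) index=(\d+)")


def slot_mismatches(log_lines):
    """Return sorted frame numbers whose QBUF and readback slot indices differ."""
    seen = {}
    for line in log_lines:
        m = _SLOT_LINE.search(line)
        if not m:
            continue
        frame, stage, index = int(m.group(1)), m.group(2), int(m.group(3))
        seen.setdefault(frame, {})[stage] = index
    return sorted(
        frame for frame, stages in seen.items()
        if len(stages) == 2 and stages["qbuf"] != stages["readback"]
    )
```

Frames that logged only one of the two stages are ignored rather than reported, so a missing log line does not masquerade as a slot mismatch.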

H-E — kernel-side hantro VP8 quirk we don't understand yet

Some kernel-side handling we haven't traced.

Why H-A / H-B / H-C / H-D didn't surface pre-iter5b

Pre-β VP8 produced all-zero because the OUTPUT pixel format was H264_SLICE (not VP8_FRAME). Hantro substituted MPEG2_DECODER codec_mode (per feedback_unconditional_codec_state.md analysis). The kernel never actually attempted VP8 decode. So no slice-data bug, bytesused bug, cache bug, or slot bug could surface — the kernel-side dispatch was wrong before any of those mattered.

Post-β fixed the format → kernel now actually attempts VP8 decode → underlying bugs become visible.

iter3 transitive-proof reframing

iter3 Phase 5 C1+C2 verified specific control fields (first_part_size, first_part_header_bits) matched kdirect. iter3's "PASS via transitive proof" was technically correct for what it verified. It did NOT verify slice-data bytes, bytesused-on-the-wire, cache flush, or CAPTURE slot rotation. Bug 6 lives in one of those — invisible to iter3's proof structure.

This is the same shape as iter2's HEVC "PASS via transitive proof" missing Bug 5. The pattern: transitive proofs against ONE artifact (control payload) don't catch bugs in OTHER artifacts (slice data, ioctl ordering, cache state). Future "transitive PASS" claims should explicitly enumerate which artifacts are NOT proven equivalent.

Phase 3 plan (empirical narrowing)

iter6 Phase 3 (the next phase, since Phase 2 is wrapping here) will narrow which of H-A through H-E is Bug 6:

1. Dump libva's OUTPUT-buffer slice-data bytes right before QBUF. Compare to kdirect's bytes (extractable from kdirect strace). If different → H-A confirmed.
2. Log slices_size per frame. Compare to expected size (from VAAPI's buffer_object->size * count plus accumulation). Verify it matches first_part_size + dct_part_sizes total. If it does not → H-B.
3. Insert an msync(source_data, slices_size, MS_SYNC) before QBUF. If output changes → H-C.
4. Log surface_object->destination_index vs surface_object->current_slot->v4l2_index at each cap_pool_acquire and at copy_surface_to_image time. If they ever diverge → H-D.
5. If none of 1-4 reveals the bug → H-E and deeper kernel-side debugging.

iter6 Phase 3 is empirical investigation; Phase 4 is the plan based on findings.

Memory rules touched

feedback_review_empirical_over_theoretical.md (Direction 2): re-verify before declaring. Phase 2 author hypothesized "wrong vp8.c control field"; empirical diff disproved it. Hypothesis pivoted to slice-data / bytesused / cache / slot rotation.
feedback_trace_fix_mechanism_to_consumer.md: tracing chain producer→primitive→consumer-read-site. For Bug 6 the consumer is "the kernel's decode path reading the OUTPUT bitstream"; the producer is "libva backend writing slice data to the OUTPUT mmap." The trace must continue from libva's memcpy through to kernel DMA read.

Phase 5 review hand-off (when Phase 4 lands)

Phase 5 review of Bug 6's fix plan must:

- Verify the diagnostic that picked H-A/B/C/D/E was empirical (run the test, observe).
- Verify the fix patches the right site (the artifact that diagnostic identified).
- Verify the fix is regression-free for the other 4 codecs.

Phase 4 anticipated

Once H-A through H-E are narrowed to ONE root cause, Phase 4 writes a small patch (likely 5-50 LOC) against that site. Phase 6 lands the patch. Phase 7 re-runs the sweep.

Substrate state at iter6 Phase 2 close

Fork tip 70196f8 on noether + fresnel + gitea.
Backend installed at /usr/lib/dri/v4l2_request_drv_video.so SHA 2c6ff82c….
Kernel linux-fresnel-fourier 7.0-1.
Phase 1 VP8-related anchors at /tmp/iter5b_p7v2/, /tmp/vp8_libva_traces/, /tmp/vp8_kdirect_traces/. Phase 3 artifacts will accumulate here.
VP9, MPEG-2, H.264 keyframe-partial, HEVC all-zero status unchanged from iter5b-β close.
