marfrit 007cf6ca8e iter6 Phase 3: narrowed Bug 6 — H-A/B/C eliminated; H-D/E (kernel) remain

Empirical Phase 3 narrowing:
- H-A slice data corruption: ELIMINATED. SHA256 of libva-dumped slice 0
  (300614 bytes) byte-identical to raw VP8 frame 0 from .webm at
  offset 10..300624 (post-VP8-header).
- H-B slices_size wrong: ELIMINATED. slices_size = fp_size +
  sum(dct_part_sizes) = 300614 exactly.
- H-C cache coherency: ELIMINATED. msync attempt yielded no output
  change; VP9 uses same image.c path and works fine.
- Control payloads: byte-identical between libva and kdirect for VP8
  keyframe (pre-Phase-2 finding).

Output pattern: erratic partial-write. Frame 0 Y plane has real
content rows 0-535, then 100% zero rows 536-719. UV plane real
rows 0-133, zero 134-359. Frame 1 Y plane real rows 0-23, zero
24-719. Per-frame transitions differ — not buffer-size truncation,
not slot rotation.

Remaining:
- H-D slot rotation (untested; needs instrumentation)
- H-E kernel-side hantro VP8 partial-write quirk (likely; needs
  ftrace / kernel investigation)

iter5b-β did fix Bug 2 for VP8 (pre-β all-zero was format mismatch;
post-β real-but-partial content is a separate kernel-side issue).

Phase 3 hands off 4 candidate directions to user:
- K: continue H-D investigation (1-2h next session)
- L: pivot to H-E kernel-side work (multi-session)
- M: park Bug 6, pick different bug (Bug 4/5 or iter4-B1)
- N: close iter6 PARTIAL, defer Bug 6 to iter7+

Substrate unchanged; no regression. Backend SHA still 2c6ff82c....

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 22:43:14 +00:00

7.4 KiB


Iteration 6 — Phase 3 (empirical narrowing for Bug 6)

Captured 2026-05-12 evening. Bug 6 is further narrowed but not fully root-caused in this Phase 3 session. Three of five hypotheses are eliminated; of the remaining two, H-D needs backend instrumentation and H-E needs deeper kernel-side investigation.

Eliminations

H-A — slice data corruption: ELIMINATED

Instrumented picture.c::RequestEndPicture to dump surface_object->source_data[0..slices_size] to /tmp/iter6_slice_libva_N.bin right before QBUF on OUTPUT.

Frame 0 (keyframe): 300614 bytes dumped.
Frame 1: 417 bytes.
Frame 2: 1122 bytes.
...

Extracted raw VP8 frames from the fixture (ffmpeg -i bbb_720p10s_vp8.webm -c:v copy -f rawvideo). Frame 0 starts with the 10-byte VP8 keyframe header (d0 1a 0b 9d 01 2a 00 05 d0 02) then the control-partition data.
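The 10-byte header quoted above can be decoded by hand as a sanity check on the fixture. The following is a minimal sketch that follows the uncompressed frame-header layout from RFC 6386 (a 3-byte little-endian frame tag, then for keyframes a 3-byte start code and two 16-bit size fields). The function name `parse_vp8_keyframe_header` is illustrative and is not part of the backend.

```python
def parse_vp8_keyframe_header(data: bytes) -> dict:
    """Decode the uncompressed VP8 keyframe header (RFC 6386 layout)."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # 24-bit little-endian frame tag.
    tag = data[0] | (data[1] << 8) | (data[2] << 16)
    if tag & 1:
        # Bit 0 is 0 for keyframes, 1 for inter frames.
        raise ValueError("not a keyframe")
    if data[3:6] != b"\x9d\x01\x2a":
        raise ValueError("bad keyframe start code")
    width_field = int.from_bytes(data[6:8], "little")
    height_field = int.from_bytes(data[8:10], "little")
    return {
        "version": (tag >> 1) & 0x7,
        "show_frame": (tag >> 4) & 0x1,
        "first_part_size": (tag >> 5) & 0x7FFFF,
        "width": width_field & 0x3FFF,    # upper 2 bits are the scale
        "height": height_field & 0x3FFF,  # upper 2 bits are the scale
    }


header = parse_vp8_keyframe_header(bytes.fromhex("d01a0b9d012a0005d002"))
print(header)
```

For the bytes quoted above this yields a first partition size of 22742 and dimensions of 1280x720, which agrees with the fp_size value read from the control payload under H-B.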

SHA256 of raw frame 0 bytes 10..300624 = SHA256 of libva slice 0 dump:

9e74956c75388e8a8c5f4d8f747e2ac99801b5fef14fe890b708b9d0272e9407

Byte-identical. The libva backend submits the correct VP8 frame bytes (post-header, as expected for VAAPI's pre-parsed-coder-state submission convention).
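The comparison above can be sketched as a small helper. This is an illustration rather than the script actually used: the function name and the synthetic demo bytes are assumptions, and reading the dump and the extracted frame from disk is left to the caller.

```python
import hashlib
from typing import Tuple

# 3-byte frame tag + 3-byte start code + 4 bytes of dimensions (keyframes only).
VP8_KEYFRAME_HEADER_LEN = 10


def slice_digests(raw_frame: bytes, slice_dump: bytes,
                  header_len: int = VP8_KEYFRAME_HEADER_LEN) -> Tuple[str, str]:
    """Return (sha256 of raw frame payload, sha256 of dump) as hex strings.

    The raw payload is the span of raw_frame that starts after the header
    and has the same length as the dump.
    """
    payload = raw_frame[header_len:header_len + len(slice_dump)]
    return (hashlib.sha256(payload).hexdigest(),
            hashlib.sha256(slice_dump).hexdigest())


# Synthetic stand-ins for the real files, for demonstration only.
demo_dump = b"control-partition-and-dct-data"
demo_raw = bytes(VP8_KEYFRAME_HEADER_LEN) + demo_dump
raw_hex, dump_hex = slice_digests(demo_raw, demo_dump)
print(raw_hex == dump_hex)
```

With the real inputs, equal digests correspond to the byte-identical result reported above; `header_len` is a parameter because only the keyframe case (10 bytes) is documented here.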

H-B — `slices_size` wrong on OUTPUT QBUF: ELIMINATED

Extracted from the control payload: fp_size = 22742, dct_part_sizes[0] = 277872. Expected slice data = 22742 + 277872 = 300614. Dumped slice size = 300614 exactly. slices_size is right.
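The arithmetic can be written down as a one-line check. This sketch assumes the control carries eight dct_part_sizes entries with only the first one non-zero for this frame, as the sum above implies; the function name is illustrative.

```python
from typing import Iterable


def expected_slices_size(first_part_size: int, dct_part_sizes: Iterable[int]) -> int:
    """Bytes expected on the OUTPUT buffer: first partition plus all DCT partitions."""
    return first_part_size + sum(dct_part_sizes)


# Values from the frame 0 control payload; unused partitions assumed to be 0.
dct_sizes = [277872] + [0] * 7
expected = expected_slices_size(22742, dct_sizes)
print(expected)  # 300614
```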

H-C — CAPTURE-side cache coherency: ELIMINATED (probable)

Added msync(MS_SYNC | MS_INVALIDATE) instrumentation before the copy_surface_to_image memcpy in image.c. msync returned EINVAL (page-alignment issue on V4L2 mmap addresses), so the call never took effect and this test is inconclusive by itself. The output hash was the same with and without the attempt: bcc57ed5c9021d02a3134949c6e483f13df22ff1f1dc0764097570fbcc4904e6 in both runs.
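If the msync experiment were repeated, the EINVAL could be avoided by rounding the range outward to page boundaries, since msync requires a page-aligned start address. The sketch below shows only that arithmetic, in Python for brevity; the same computation would apply in the C instrumentation. Whether msync has any effect on V4L2 mmap buffers is a separate question that this does not answer.

```python
import mmap
from typing import Tuple


def page_align_range(addr: int, length: int,
                     page_size: int = mmap.PAGESIZE) -> Tuple[int, int]:
    """Round [addr, addr + length) outward to page boundaries.

    Returns (aligned_start, aligned_length).
    """
    start = addr & ~(page_size - 1)
    end = (addr + length + page_size - 1) & ~(page_size - 1)
    return start, end - start


# Example address is made up; 4096 is passed explicitly to keep the demo fixed.
aligned = page_align_range(0x7F0000001234, 100, 4096)
print(hex(aligned[0]), aligned[1])
```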

Stronger argument against H-C: VP9 uses the same picture.c → image.c readback path and produces byte-identical output to kdirect. If cache coherency were the bug, VP9 would be broken too.

Control payload byte-equality: pre-eliminated at Phase 2

VP8 keyframe control payload byte-identical between libva and kdirect on the current substrate. Inter-frame payloads differ only in reference timestamps (libva: wall-clock ns; kdirect: small pts-derived; both internally consistent — kernel uses them as opaque keys to look up CAPTURE buffers).

Remaining hypotheses

H-D — CAPTURE slot rotation mismatch: open

Not directly tested in this session. Would need: log slot->v4l2_index at cap_pool_acquire time and at copy_surface_to_image read time; verify they match. If they diverge, libva reads from a stale slot while kernel wrote to a different one.
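Once both log points exist, the comparison itself is mechanical. The sketch below assumes a purely hypothetical log format (`acquire surface=N v4l2_index=M` and `readback surface=N v4l2_index=M`); the backend does not emit these lines today, and the real instrumentation may key on something other than a surface id.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical log line format, invented for this sketch.
LINE_RE = re.compile(r"^(acquire|readback) surface=(\d+) v4l2_index=(\d+)$")


def find_slot_mismatches(lines: List[str]) -> List[Tuple[int, int, int]]:
    """Return (surface, acquired_index, read_index) for each readback whose
    index differs from the most recent acquire for the same surface."""
    last_acquired: Dict[int, int] = {}
    mismatches = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore unrelated log output
        event, surface, index = m.group(1), int(m.group(2)), int(m.group(3))
        if event == "acquire":
            last_acquired[surface] = index
        elif surface in last_acquired and last_acquired[surface] != index:
            mismatches.append((surface, last_acquired[surface], index))
    return mismatches


demo_log = [
    "acquire surface=0 v4l2_index=0",
    "readback surface=0 v4l2_index=0",
    "acquire surface=1 v4l2_index=1",
    "readback surface=1 v4l2_index=2",  # deliberately divergent demo line
]
mismatches = find_slot_mismatches(demo_log)
print(mismatches)
```

An empty result over a full sweep would eliminate H-D; any entry identifies the surface and the two indices that diverged.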

H-E — kernel-side hantro VP8 quirk: open and increasingly likely

The output bytes show an erratic partial-write pattern:

| Frame | First fully-zero row (Y plane) | First fully-zero row (UV plane) | Real-content rows |
|---|---|---|---|
| 0 (keyframe) | 536 of 720 | 134 of 360 | 0..535 (Y) + 0..133 (UV) |
| 1 (inter) | 24 of 720 | (not measured) | 0..23 (Y) |
| 2 (inter) | (not measured) | (not measured) | (not measured) |

Per-frame transition rows differ (536, 24, …). This is not a simple "first N rows decoded, rest zero" pattern with fixed N. It does not look like a slot-rotation bug, which would be expected to produce shifted or stale real content rather than partial-then-zero, although H-D itself remains untested. It is not a buffer-size truncation either, which would give a clean cutoff at the same row in every frame.
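The transition rows can be located with a scan for the trailing run of all-zero rows in each plane. This is a minimal sketch, not the script used for the measurements above; it assumes the plane is passed in as tightly packed bytes with a known stride, and the function name is illustrative.

```python
from typing import Optional


def first_zero_row(plane: bytes, stride: int, rows: int) -> Optional[int]:
    """Index of the first row of the trailing all-zero run, or None if the
    last row contains non-zero data."""
    zero_row = bytes(stride)
    first = None
    # Walk upward from the bottom so isolated dark rows in real content
    # are not mistaken for the transition.
    for r in range(rows - 1, -1, -1):
        if plane[r * stride:(r + 1) * stride] == zero_row:
            first = r
        else:
            break
    return first


# Synthetic 6-row plane: 4 rows of content, then 2 zero rows.
demo_stride, demo_rows = 8, 6
demo_plane = bytes([1]) * (demo_stride * 4) + bytes(demo_stride * 2)
transition = first_zero_row(demo_plane, demo_stride, demo_rows)
print(transition)  # 4
```

For a 1280x720 dump with 720 Y rows followed by 360 UV rows (an assumed unpadded layout), the function would be called once per plane with the matching byte slice.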

Plausible H-E sub-hypotheses:

Kernel decoder runs asynchronously and DQBUF returns before the kernel has finished writing all macroblock rows. Each frame would then stop at a different row depending on the timing at the moment DQBUF returned. A blocking DQBUF should not hand back a buffer until videobuf2 has marked it VB2_BUF_STATE_DONE, so this would imply the hantro VP8 path has a bug where it signals DONE early.
Kernel decoder rate-limited or interrupted at random macroblock counts due to some kernel-internal scheduler / IRQ issue.
vb2_dma_resv-style cache invalidation gap (the iter5-rejected RFC v2 patches addressed this for DMA-BUF-import; maybe also matters for the libva-MMAP-EXPBUF readback path despite Phase 5 iter5 analysis showing the fence doesn't reach the consumer).

H-E doesn't yield to a single small backend patch. Would need kernel ftrace / instrumented hantro driver to confirm or deny.

What's confirmed

iter5b-β fixed the OUTPUT pixel format bug (Bug 2 for VP8 specifically). Pre-β VP8 was all-zero because hantro substituted MPEG2_DECODER codec_mode. Post-β VP8 has VP8_FRAME OUTPUT format → kernel ACTUALLY dispatches to VP8 decoder → partial output (Bug 6).
The libva backend's VP8 control bytes are correct (byte-identical to kdirect on the same hardware).
The libva backend's slice data is correct (byte-identical to the raw VP8 bitstream post-header).
slices_size (the OUTPUT QBUF bytesused) is correct (matches fp_size + sum(dct_part_sizes)).

What's NOT yet confirmed

Whether H-D (slot rotation) is happening — needs instrumentation.
Whether H-E (kernel-side partial-write) is happening — needs kernel-side investigation.
Whether the erratic per-frame transition rows have a discoverable pattern that points at a specific kernel bug.
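One cheap first look at the pattern question is to express each transition row in macroblock units, since VP8 decodes in 16x16 luma macroblocks. The sketch below only does that conversion for the two rows measured so far; it does not establish a cause.

```python
from typing import Tuple

MB_SIZE = 16  # VP8 macroblocks cover 16x16 luma pixels


def luma_row_to_mb_row(row: int) -> Tuple[int, int]:
    """Return (macroblock_row, offset_within_macroblock_row) for a luma row."""
    return divmod(row, MB_SIZE)


# Transition rows measured in Phase 3: frame 0 and frame 1, Y plane.
measured = {0: 536, 1: 24}
mb_positions = {frame: luma_row_to_mb_row(row) for frame, row in measured.items()}
for frame, (mb_row, offset) in mb_positions.items():
    print(f"frame {frame}: macroblock row {mb_row}, offset {offset}")
```

Both measured transitions land 8 luma rows into a macroblock row (rows 33 and 1 respectively). Two samples are too few to call this a pattern, but repeating the measurement for frame 2 onward would show whether it holds.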

Phase 4 candidates

Given Phase 3 narrowing, iter6 has multiple possible directions:

Candidate K — Continue H-D investigation (next session)

Add slot-index logging at cap_pool_acquire + image.c::copy_surface_to_image; run sweep; verify indices match. If they diverge → fix in backend's slot binding. If they match → H-D eliminated, proceed to H-E.

Estimated wallclock: 1-2 hours next session.

Candidate L — Move to H-E kernel-side investigation

Pivot to kernel-side ftrace, hantro source-read, possibly local kernel patches. Substantially heavier; aligns with the original iter5 Candidate B (kernel work) that the user rejected at iter5b open.

Estimated wallclock: multi-session.

Candidate M — Park Bug 6, pick a different bug

Phase 3 narrowing points to Bug 6 being a kernel-side partial-write issue rather than a quick backend fix, though H-D and H-E are both still unconfirmed. Drop Bug 6 from iter6 and switch to:

Bug 4 (H.264 inter race-loss) — also kernel-related, but the iter4 prior work touched H.264 backend extensively so backend instrumentation is more familiar.
Bug 5 (HEVC DQBUF FLAG_ERROR) — pre-existing kernel rejection; diff strace of libva vs kdirect HEVC.
iter4-B1 (auto-detect device discrimination) — pure backend, ~100 LOC.

Candidate N — Document Bug 6 partial root cause, close iter6 PARTIAL

iter6 closes with: "Bug 6 narrowed but kernel-side, deferred to iter7+. iter6's Phase 3 work establishes H-A/B/C are NOT Bug 6's cause; H-D/E remain. Substrate state unchanged; no regression introduced."

iter6 hands off Bugs 4/5/6 to iter7+. Memory updates for the iter6 lesson on transitive-proof partial-coverage.

Decision point

This is a user-decision point. Phase 3 has done its narrowing job. The options for Bug 6 from here are:

1-2 hours more empirical work (Candidate K — likely productive)
Multi-session kernel-side work (Candidate L)
Pivot to a different bug (Candidate M)
Close iter6 partial and document (Candidate N)

Substrate state at iter6 Phase 3 close

Fork tip 70196f8 (iter5b-β + Commit D). Unchanged.
Backend installed SHA 2c6ff82cbdc156ff8910d0c7fe58e75eeecdfd6e6a1caabb049c8adf43a098b8. Phase 3 diagnostic instrumentation reverted.
Kernel 7.0.0-fresnel-fourier. Unchanged.
Phase 3 artifacts at fresnel /tmp/iter5b_p7v2/, /tmp/iter6_slice_libva_*.bin, /tmp/vp8_raw.bin. Plus /tmp/vp8_libva_traces/ and /tmp/vp8_kdirect_traces/ on noether (strace captures + extract scripts).
