fresnel-fourier/phase0_findings_iter9.md
marfrit fa771b0625 iter9 Phase 0: lock α-7 timestamp scheme — only remaining wire diff
Phase 0 deep-strace yielded a critical narrowing:
- Post-DPB DECODE_PARAMS bytes (512-559): IDENTICAL libva vs kdirect
- PPS: IDENTICAL
- SPS: identical except inert constraint_set_flags
- DPB[0] beyond reference_ts: IDENTICAL after α-2

The ONLY remaining wire-byte diff between libva (broken) and kdirect
(working) is reference_ts magnitude. libva uses gettimeofday giving
~1.78e18 ns; kdirect uses an internal counter giving ~10000 ns.

α-7 hypothesis: V4L2 stateless decoder (rkvdec) reference-resolution
fails for very large reference_ts values. Possible mechanisms:
M-A: vb2_find_buffer_by_timestamp truncates/overflows on giant values.
M-B: V4L2 framework transforms OUTPUT QBUF ts before storing on CAPTURE
     but DPB.reference_ts left untransformed → mismatch.
M-C: gettimeofday + v4l2_timeval_to_ns produce slightly different ns
     values than the kernel computes from the timeval QBUF.

Fix: ~10 LOC. Add timestamp_counter to driver_data; replace
gettimeofday in EndPicture with monotonic counter.

If α-7 works → iter9 PASS, Bug 4 closed.
If α-7 doesn't → iter9 PARTIAL, wire-byte search space effectively
exhausted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:27:01 +00:00


Iteration 9 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-13 immediately after iter8 PARTIAL close (phase8_iteration8_close.md, commit 3ed1e45). User confirmed proceed to iter9.

Empirical surprise from iter9 Phase 0 deep-strace

I dumped the FULL 560-byte DECODE_PARAMS for the first P-frame of libva vs kdirect with strace -s 8192:

| Region | libva | kdirect | Diff |
|---|---|---|---|
| DPB[0] bytes 0-7 (reference_ts) | 30c2ea5cd622af18 (giant gettimeofday ns) | f82a000000000000 (0x2af8 = 11000 ns) | DIFF |
| DPB[0] bytes 8-31 (pic_num/frame_num/fields/tfoc/bfoc/flags) | 00 00 00 00 00 00 03 00 ... 00 00 01 00 00 00 01 00 03 00 00 00 | identical | match |
| DPB[1..15] | all zero | all zero | match |
| Post-DPB bytes 512-559 | 01 00 01 00 04 00 01 00 ... | identical | match |

The ONLY wire-byte diff in DECODE_PARAMS is reference_ts magnitude. Post-DPB fields, DPB entry contents (after α-2 fixed POC), pic_num, frame_num, fields, flags — all byte-identical between libva and kdirect.
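As a cross-check on the table, the two reference_ts byte strings can be decoded as little-endian 64-bit values. The sketch below is illustrative only: `le64_from_bytes` and the two array names are hypothetical, and it assumes the dump shows the field in little-endian byte order, which matches the 0x2af8 reading used in this doc.

```c
#include <stdint.h>

/* Assemble a little-endian u64 from 8 raw bytes, in the order they
 * appear in the strace hex dump. Hypothetical helper, not driver code. */
static uint64_t le64_from_bytes(const uint8_t b[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* DPB[0] bytes 0-7 as dumped for each path. */
static const uint8_t libva_ref_ts[8] =
    { 0x30, 0xc2, 0xea, 0x5c, 0xd6, 0x22, 0xaf, 0x18 };
static const uint8_t kdirect_ref_ts[8] =
    { 0xf8, 0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
```

Decoded this way, the libva bytes give 0x18af22d65ceac230 (about 1.78 × 10^18 ns, a wall-clock value) and the kdirect bytes give 0x2af8 = 11000 ns.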

Combined with iter8's other findings:

  • SPS: identical except constraint_set_flags (rkvdec ignores per Phase 5b CRIT-1)
  • PPS: identical (verified in this Phase 0)
  • SCALING_MATRIX: identical (always-flat default)

This leaves reference_ts (and adjacent OUTPUT QBUF timestamp) as the single remaining wire-byte hypothesis.

Locked research question (iter9, 2026-05-13)

"Does replacing libva's gettimeofday-based timestamp scheme with a monotonic per-context counter (matching kdirect's small-value pattern) unblock libva's H.264 decode on rkvdec? After fix: libva_h264.yuv == kdirect_h264.yuv byte-identical."

Pass/fail (boolean)

  1. H.264 libva == kdirect: cmp -s libva_h264.yuv kdirect_h264.yuv returns 0. Hash matches 1e7a0bc9….
  2. VP9 unchanged: 4f1565e8….
  3. MPEG-2 unchanged: 19eefbf4….
  4. HEVC unchanged: 06b2c5a0… (Bug 5 still deferred).
  5. VP8 unchanged: bcc57ed5… (Bug 6 still deferred).
  6. Control-payload anchors hold for 4 non-H.264 codecs.

Clean iter9 close = all 6 PASS. If criterion 1 still fails, iter9 PARTIAL close with timestamp eliminated — at which point the search space for Bug 4 is effectively exhausted on the libva wire-payload side, and iter10+ would shift to slice-data encoding or kernel-internal investigation.

Substrate state at iter9 open

| Property | Value |
|---|---|
| Kernel | linux-fresnel-fourier 7.0-1 (unchanged) |
| Fork tip | 0226684 (iter8 close: γ + IMP-1 + α-2 POC strip removal) |
| Backend installed | b6a3958a5bca945164262339dea5cc28f17accce13d57bd9f0c5a5dabbdf1b53 |
| Test fixtures | unchanged |
| Bug 4 narrowing | 5 eliminations (libva-readback, slot-binding, stale-residue, constraint_set_flags, POC sentinel) |
| Bug 4 remaining wire diff | reference_ts magnitude (giant vs small) |

Mechanism the question targets

picture.c::RequestEndPicture line 440:

gettimeofday(&surface_object->timestamp, NULL);

This produces a real-clock timeval (~1.78 × 10^9 sec = 1.78 × 10^18 ns) as the OUTPUT QBUF timestamp. The kernel stores this on the CAPTURE buffer via V4L2_BUF_FLAG_TIMESTAMP_COPY after decode.

h264.c::h264_fill_dpb line 268-269:

timestamp = v4l2_timeval_to_ns(&surface->timestamp);
dpb->reference_ts = timestamp;

This sends the same giant ns value as reference_ts for inter-frame references.
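To make the magnitude concrete, here is a minimal sketch of the timeval-to-nanoseconds arithmetic. `timeval_to_ns_sketch` is a local stand-in written for this note; it is assumed to mirror what `v4l2_timeval_to_ns` computes (seconds × 10^9 plus microseconds × 1000), so the block compiles without kernel headers.

```c
#include <stdint.h>
#include <sys/time.h>

/* Stand-in for v4l2_timeval_to_ns(): assumed arithmetic, reproduced
 * locally so the sketch has no dependency on linux/videodev2.h. */
static uint64_t timeval_to_ns_sketch(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL
         + (uint64_t)tv->tv_usec * 1000ULL;
}

/* Example wall-clock reading of the kind gettimeofday() returns in
 * May 2026 (illustrative value, not taken from a trace). */
static const struct timeval example_wallclock = {
    .tv_sec = 1778678821,
    .tv_usec = 123456,
};
```

The conversion is a pure function of the stored timeval, so as long as the OUTPUT QBUF and h264_fill_dpb both read the same `surface->timestamp`, the two ns values should agree bit for bit.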

In principle the kernel's vb2_find_buffer_by_timestamp does an exact 64-bit ns match and should not care about magnitude. But empirically, libva fails and kdirect (which uses ffmpeg-v4l2request's internal counter generating tiny ns values like 0x2af8 = 11000) succeeds.

Possible mechanisms:

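  • M-A: Kernel vb2_find_buffer_by_timestamp has a comparison that fails for very large values (e.g., overflow on a u32 truncation, or a signed comparison treating bit 63 as negative). Note that 1.78 × 10^18 is below 2^63 ≈ 9.22 × 10^18, so bit 63 is clear for the observed values; of the two examples, only the truncation variant can apply here.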
  • M-B: The OUTPUT QBUF's timestamp gets truncated/transformed by the V4L2 framework before being stored on the CAPTURE buffer, but the DPB.reference_ts is left at full resolution. The kernel then compares full-resolution reference_ts against truncated stored ts — never matches.
  • M-C: gettimeofday and v4l2_timeval_to_ns produce slightly different ns values (e.g., due to a re-read or rounding), making OUTPUT QBUF's ts and DPB.reference_ts not byte-equal.
  • M-D: Some other reason a small counter works but a giant one doesn't.
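The two M-A examples can be checked against the observed values without touching the kernel. The helpers below are hypothetical predicates written for this note; they say nothing about what the kernel actually does, only which hypothetical failure modes could tell the two timestamps apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the value is unchanged by a round trip through u32,
 * i.e. a hypothetical u32 truncation would be harmless for it. */
static bool survives_u32_truncation(uint64_t ts)
{
    return (uint64_t)(uint32_t)ts == ts;
}

/* True if bit 63 is set, i.e. the value would read as negative
 * when reinterpreted as a signed 64-bit integer. */
static bool has_bit63_set(uint64_t ts)
{
    return (ts >> 63) != 0;
}

/* Observed reference_ts values from the Phase 0 dump. */
static const uint64_t LIBVA_TS = 0x18af22d65ceac230ULL;
static const uint64_t KDIRECT_TS = 0x2af8ULL;
```

Under these predicates the kdirect value survives truncation and the libva value does not, while neither value has bit 63 set.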

Fix shape

α-7: monotonic per-context counter

Replace gettimeofday(&surface_object->timestamp, NULL) with a counter scheme that produces small ns values matching kdirect's pattern.

Simplest implementation:

  • Add u64 timestamp_counter to request_data (init at 0 in CreateContext).
  • In EndPicture, increment counter and set surface->timestamp from it.

Code shape:

driver_data->timestamp_counter += 1000;  /* counter is in ns: 1 µs increments, small magnitude */
surface_object->timestamp.tv_sec = driver_data->timestamp_counter / 1000000000;
surface_object->timestamp.tv_usec = (driver_data->timestamp_counter % 1000000000) / 1000;

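Or even simpler: keep the counter in microseconds and increment it by 1 each frame, giving tv_usec values of 1, 2, 3, ... (struct timeval cannot represent steps smaller than 1 µs, so that is the smallest usable increment).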

LOC estimate: ~10 LOC.
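A self-contained sketch of the counter scheme follows, under stated assumptions: `demo_driver_data`, `next_timestamp`, and `demo_timeval_to_ns` are hypothetical stand-ins for the real driver types and for `v4l2_timeval_to_ns`, and the counter is held in nanoseconds with a 1 µs step. The point is to show that the value round-trips exactly through struct timeval.

```c
#include <stdint.h>
#include <sys/time.h>

#define TS_STEP_NS 1000ULL /* 1 µs per frame, the resolution of struct timeval */

/* Hypothetical stand-in for the driver's per-driver state. */
struct demo_driver_data {
    uint64_t timestamp_counter; /* nanoseconds, starts at 0 */
};

/* Advance the counter and express it as the timeval queued on OUTPUT. */
static void next_timestamp(struct demo_driver_data *d, struct timeval *tv)
{
    d->timestamp_counter += TS_STEP_NS;
    tv->tv_sec = (time_t)(d->timestamp_counter / 1000000000ULL);
    tv->tv_usec = (suseconds_t)((d->timestamp_counter % 1000000000ULL) / 1000ULL);
}

/* Assumed equivalent of v4l2_timeval_to_ns(), reproduced locally. */
static uint64_t demo_timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL
         + (uint64_t)tv->tv_usec * 1000ULL;
}
```

Because the step is a whole number of microseconds, converting the timeval back to ns returns the counter itself, so reference_ts and the queued timestamp stay equal, including across a seconds boundary.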

Risk

  • R-1: Timestamp uniqueness. If the counter wraps in a long-lived process, timestamp uniqueness fails. For a campaign verifier (3 frames), a wrap is impossible. For production playback, a u64 nanosecond counter stepped by 1 µs per frame allows roughly 1.8 × 10^16 frames before wrapping, which is millions of years at any realistic frame rate.
  • R-2: VP9 / VP8 / HEVC / MPEG-2 regression. The timestamp is shared infrastructure: all codecs go through the same gettimeofday path, so all 5 codecs get the new counter. VP9 and MPEG-2 currently PASS direct via libva; switching from gettimeofday to the counter should be neutral or beneficial for them.
  • R-3: Some consumer (ffmpeg outside the decoder path) might read the surface timestamp as a real wallclock. Probably not: VAAPI surfaces don't expose timestamps to consumers.
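The R-1 wrap margin is easy to check with integer arithmetic. This is a back-of-envelope sketch, assuming a u64 counter in nanoseconds; the function names and the frame rate passed in are illustrative choices, not values taken from the driver.

```c
#include <stdint.h>

#define SECONDS_PER_YEAR 31557600ULL /* Julian year, 365.25 days */

/* Number of increments of step_ns that fit in a u64 before it wraps. */
static uint64_t frames_until_wrap(uint64_t step_ns)
{
    return UINT64_MAX / step_ns;
}

/* Whole years of continuous playback at fps frames per second
 * before the counter wraps. */
static uint64_t years_until_wrap(uint64_t step_ns, uint64_t fps)
{
    return frames_until_wrap(step_ns) / fps / SECONDS_PER_YEAR;
}
```

With a 1000 ns step the counter accommodates about 1.8 × 10^16 frames, so counter wrap is not a practical concern for either the verifier or long-running playback.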

In scope

  • src/request.h — add timestamp_counter to driver_data.
  • src/request.c or src/context.c — init counter to 0 in CreateContext.
  • src/picture.c::RequestEndPicture — replace gettimeofday(&surface->timestamp, NULL) with counter-based assignment.

Out of scope

  • Per-codec changes (this is a shared-infrastructure change).
  • Kernel patches.

Phase 2 source-read targets

Already done in Phase 0 strace dump. Phase 2 will be brief — just confirm the gettimeofday call site is unique and that no other code reads surface->timestamp as a real wallclock.

Phase 3 baseline

Already captured in iter8 Phase 3 + Phase 7c regression sweep. iter9 Phase 3 may just re-anchor.

Phase 4 plan shape (predicted)

Already implicit in α-7. Will be drafted explicitly in Phase 4.

Phase 5 review concerns to invite

  • Reviewer should challenge M-B plausibility: does the V4L2 framework really transform OUTPUT QBUF timestamp before storing it on CAPTURE? Check drivers/media/v4l2-core/v4l2-dev.c or videobuf2-v4l2.c::vb2_buffer_done.
  • Reviewer should verify the monotonic-counter approach doesn't break MPEG-2 (which works via libva on hantro, a different driver) or VP9 (works via libva on rkvdec).
  • Per feedback_wire_vs_behavior.md: don't claim α-7 success on wire-byte change alone; criterion-1 hash test required.

Predicted iter9 cadence

  • Phase 0: this doc.
  • Phase 4: 15 min.
  • Phase 5: 30 min.
  • Phase 6: 15 min.
  • Phase 7: 15 min.
  • Phase 8: 10 min.

Total: ~90 min. Quick iteration.

What "iter9 PASS" looks like

If α-7 closes Bug 4:

  • iter9 PASS.
  • Bug 4 closed. H.264 row goes from PARTIAL to PASS direct.
  • Memory rule worth recording: V4L2 stateless decoders may require small/relative reference_ts values; gettimeofday is unsafe. See feedback_v4l2_timestamp_scheme.

If α-7 doesn't close:

  • iter9 PARTIAL. iter10 candidates: slice-data encoding (DEEP), DPB entry ordering (cosmetic, kernel sorts internally anyway), or pivot to a different bug (Bug 5 HEVC).
  • Realistically, iter10 may shift entirely to a kernel-side or fresh investigation since the wire-byte search space is now exhausted.