Files
fresnel-fourier/phase5_iter9_review.md
marfrit 3b0880a97f iter9 Phase 5: CRIT-1 — α-7 contradicted by VP9/MPEG-2 PASS evidence
VP9 (vp9.c:624) and MPEG-2 (mpeg2.c:150,156) use v4l2_timeval_to_ns
identically to H.264. Both PASS via libva with the same gettimeofday-
based giant ns values. If timestamp magnitude were the bug, VP9/MPEG-2
should also fail. They don't.

Reviewer flagged α-7 as low-probability fix and pointed to iter10
kernel-side investigation (M-A vb2_find_buffer_by_timestamp overflow)
if α-7 confirmed inert.

IMP-1: timestamp_counter should live in object_context not driver_data
to avoid multi-context collisions.

Decision: implement α-7 anyway as empirical confirmation (5 min) since
test cost is trivial. If α-7 fails as predicted, iter9 closes PARTIAL
with wire-byte search exhausted; iter10 candidates pivot to slice-data
encoding or kernel investigation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:33:54 +00:00

69 lines
5.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 9 — Phase 5 Review
Reviewer: second-model agent, 2026-05-13.
---
## CRIT
### CRIT-1 — VP9 / MPEG-2 use timestamp reference just as H.264 does; "neutral" claim is unverified
The plan states the change is "neutral OR a positive" for VP9/MPEG-2. That is wrong. VP9 uses `v4l2_timeval_to_ns(&last_ref->timestamp)` / `golden_frame_ts` / `alt_frame_ts` for *every inter frame* (`vp9.c:624-626`), and MPEG-2 does the same for forward/backward refs (`mpeg2.c:150,156`). These are structurally identical to H.264's `reference_ts`. If gettimeofday giant-ns timestamps are broken for H.264 inter-frames, they should be equally broken for VP9 inter-frames and MPEG-2 B/P-frames.
VP9 and MPEG-2 currently PASS via libva. This contradicts the hypothesis that giant-ns magnitude per se causes failure on rkvdec.
**The plan must address this contradiction before Phase 6.** Two possibilities:
- (a) The PASS test clip is keyframe-only (I-frames only, zero inter references used at decode time). If the test clip has P/B-frames, this rules out magnitude as the cause.
- (b) hantro (MPEG-2) and rkvdec (VP9) handle `vb2_find_buffer_by_timestamp` differently, but that is a kernel-driver difference, not a timestamp magnitude difference.
If neither (a) nor (b) is true, α-7 likely will not fix H.264 either, and the Phase 7 pass is a coin flip. The Phase 3 baseline notes should be checked to see whether the PASS codecs decode inter-frames from the libva path.
### CRIT-2 — Formula path is symmetric; M-C is implausible, M-B is the live candidate
`v4l2_timeval_to_ns` in userland (`videodev2.h:1124`):
```c
return (__u64)tv->tv_sec * 1000000000ULL + tv->tv_usec * 1000;
```
The kernel's `vb2_v4l2.c::vb2_fill_vb2_v4l2_buffer` does:
```c
vb->vb2_buf.timestamp = timeval_to_ns(&b->timestamp);
```
which uses `ktime_t timeval_to_ns` — identical formula. So userspace `v4l2_timeval_to_ns` and the kernel's ingestion formula are the same. M-C (formula mismatch) is ruled out; M-B (kernel truncation before storing on CAPTURE, while DPB.reference_ts is full-resolution) remains the most credible live hypothesis.
Concretely for M-B: the OUTPUT QBUF carries the timeval, kernel converts it to u64 ns and stamps it on the resulting CAPTURE buffer via `V4L2_BUF_FLAG_TIMESTAMP_COPY`. DPB.reference_ts is computed in userspace from the same timeval by the same formula. Unless the kernel loses precision anywhere in that path, M-B is also implausible. **M-A** (overflow / signed comparison in `vb2_find_buffer_by_timestamp` for large u64) remains the only mechanism consistent with all evidence, including the VP9/MPEG-2 anomaly if those test clips happen to be all-keyframes.
The plan does not clearly commit to M-A as the mechanism. It should, because that is the mechanism α-7 is actually testing.
---
## IMP
### IMP-1 — `request_data` vs per-context scope: counter must be per-context, not global
The plan puts `timestamp_counter` in `request_data` (driver-global). If two contexts decode concurrently (unlikely for the campaign, common in real applications), they share the counter and timestamps collide within a single context's DPB. The counter should live in `object_context` and be initialised in `CreateContext`. For the 3-frame campaign test this is irrelevant, but the plan claims 18,500-year production safety while leaving a multi-context collision.
### IMP-2 — VP9 test clip inter-frame status must be verified before claiming regression safety
Run `ffprobe -show_frames vp9_clip.ivf | grep pict_type` (or the equivalent) to confirm whether the VP9 regression clip exercises P-frames at the rkvdec libva path. If it does and still PASSes, that is definitive evidence against magnitude being the root cause, and α-7 should be downgraded to "worth trying, but expected to fail".
---
## MIN
### MIN-1 — Strace QUERYBUF cross-check as a cheaper pre-build diagnostic
Before rebuilding, a 2-line strace filter on an existing run can confirm whether the CAPTURE buffer's stored timestamp (after `V4L2_BUF_FLAG_TIMESTAMP_COPY`) matches the OUTPUT QBUF timestamp at the kernel level. If it does not match, M-B is real and α-7 fixes nothing. This can save a build cycle.
### MIN-2 — `tv_sec = counter / 1e6`, `tv_usec = counter % 1e6` is fine but add a comment
The wrap-safe form in the plan is correct. Add a one-line comment that `tv_usec < 1,000,000` is maintained by construction, to document why the modulus is 1e6 and not USEC_PER_SEC (they are equal, but the intent is not obvious).
---
## Summary
**One plan-level blocker (CRIT-1):** the VP9/MPEG-2 working-today evidence contradicts the hypothesis. Resolve by auditing whether those codecs' PASS clips use inter-frame decode under libva, before committing Phase 6 resources. If they do use inter-frame decode and still PASS, α-7 is a low-probability fix — document that and proceed anyway (it's only 10 LOC), but Phase 7 PARTIAL is the expected outcome and iter10 should pivot to kernel-side investigation (M-A: overflow in `vb2_find_buffer_by_timestamp`). If the clips are all-keyframes, CRIT-1 dissolves.
**CRIT-2** is an analytical sharpening, not a plan-killer: M-C is ruled out; M-A is the live mechanism; the plan should say so explicitly.