fresnel-fourier/phase0_findings_iter9.md
marfrit fa771b0625 iter9 Phase 0: lock α-7 timestamp scheme — only remaining wire diff
Phase 0 deep-strace yielded a critical narrowing:
- Post-DPB DECODE_PARAMS bytes (512-559): IDENTICAL libva vs kdirect
- PPS: IDENTICAL
- SPS: identical except inert constraint_set_flags
- DPB[0] beyond reference_ts: IDENTICAL after α-2

The ONLY remaining wire-byte diff between libva (broken) and kdirect
(working) is reference_ts magnitude. libva uses gettimeofday giving
~1.78e18 ns; kdirect uses an internal counter giving ~10000 ns.

α-7 hypothesis: V4L2 stateless decoder (rkvdec) reference-resolution
fails for very large reference_ts values. Possible mechanisms:
M-A: vb2_find_buffer_by_timestamp truncates/overflows on giant values.
M-B: V4L2 framework transforms OUTPUT QBUF ts before storing on CAPTURE
     but DPB.reference_ts left untransformed → mismatch.
M-C: gettimeofday + v4l2_timeval_to_ns produce slightly different ns
     values than the kernel computes from the timeval QBUF.

Fix: ~10 LOC. Add timestamp_counter to driver_data; replace
gettimeofday in EndPicture with monotonic counter.

If α-7 works → iter9 PASS, Bug 4 closed.
If α-7 doesn't → iter9 PARTIAL, wire-byte search space effectively
exhausted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:27:01 +00:00

# Iteration 9 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
Opens 2026-05-13 immediately after iter8 PARTIAL close ([`phase8_iteration8_close.md`](phase8_iteration8_close.md), commit `3ed1e45`). User confirmed proceed to iter9.
## Empirical surprise from iter9 Phase 0 deep-strace
I dumped the FULL 560-byte DECODE_PARAMS for the first P-frame of libva vs kdirect with `strace -s 8192`:
| Region | libva | kdirect | Diff |
|---|---|---|---|
| DPB[0] bytes 0-7 (reference_ts) | `30c2ea5cd622af18` (giant gettimeofday ns) | `f82a000000000000` (little-endian 0x2af8 = 11000 ns) | **DIFF** |
| DPB[0] bytes 8-31 (pic_num/frame_num/fields/tfoc/bfoc/flags) | `00 00 00 00 00 00 03 00 ... 00 00 01 00 00 00 01 00 03 00 00 00` | identical | match |
| DPB[1..15] | all zero | all zero | match |
| Post-DPB bytes 512-559 | `01 00 01 00 04 00 01 00 ...` | identical | **match** |
**The ONLY wire-byte diff in DECODE_PARAMS is `reference_ts` magnitude.** Post-DPB fields, DPB entry contents (after α-2 fixed POC), pic_num, frame_num, fields, flags — all byte-identical between libva and kdirect.
Combined with iter8's other findings:
- SPS: identical except `constraint_set_flags` (rkvdec ignores per Phase 5b CRIT-1)
- PPS: identical (verified in this Phase 0)
- SCALING_MATRIX: identical (always-flat default)
**This leaves `reference_ts` (and adjacent OUTPUT QBUF timestamp) as the single remaining wire-byte hypothesis.**
## Locked research question (iter9, 2026-05-13)
> *"Does replacing libva's gettimeofday-based timestamp scheme with a monotonic per-context counter (matching kdirect's small-value pattern) unblock libva's H.264 decode on rkvdec? After fix: `libva_h264.yuv == kdirect_h264.yuv` byte-identical."*
### Pass/fail (boolean)
1. **H.264 libva == kdirect**: `cmp -s libva_h264.yuv kdirect_h264.yuv` returns 0. Hash matches `1e7a0bc9…`.
2. **VP9 unchanged**: `4f1565e8…`.
3. **MPEG-2 unchanged**: `19eefbf4…`.
4. **HEVC unchanged**: `06b2c5a0…` (Bug 5 still deferred).
5. **VP8 unchanged**: `bcc57ed5…` (Bug 6 still deferred).
6. **Control-payload anchors hold for 4 non-H.264 codecs**.
Clean iter9 close = all 6 PASS. If criterion 1 still fails, iter9 PARTIAL close with timestamp eliminated — at which point the search space for Bug 4 is effectively exhausted on the libva wire-payload side, and iter10+ would shift to slice-data encoding or kernel-internal investigation.
## Substrate state at iter9 open
| Property | Value |
|---|---|
| Kernel | `linux-fresnel-fourier 7.0-1` (unchanged) |
| Fork tip | `0226684` (iter8 close: γ + IMP-1 + α-2 POC strip removal) |
| Backend installed | `b6a3958a5bca945164262339dea5cc28f17accce13d57bd9f0c5a5dabbdf1b53` |
| Test fixtures | unchanged |
| Bug 4 narrowing | 5 eliminations (libva-readback, slot-binding, stale-residue, constraint_set_flags, POC sentinel) |
| Bug 4 remaining wire diff | reference_ts magnitude (giant vs small) |
## Mechanism the question targets
`picture.c::RequestEndPicture` line 440:
```c
gettimeofday(&surface_object->timestamp, NULL);
```
This produces a real-clock timeval (~1.78 × 10^9 sec = 1.78 × 10^18 ns) as the OUTPUT QBUF timestamp. The kernel stores this on the CAPTURE buffer via `V4L2_BUF_FLAG_TIMESTAMP_COPY` after decode.
`h264.c::h264_fill_dpb` line 268-269:
```c
timestamp = v4l2_timeval_to_ns(&surface->timestamp);
dpb->reference_ts = timestamp;
```
Sends the same giant ns value as reference_ts for inter-frame references.
In principle the kernel's `vb2_find_buffer_by_timestamp` (in mainline this lookup is `vb2_find_buffer()`, formerly `vb2_find_timestamp()`) does an exact 64-bit ns match and should not care about magnitude. But empirically, libva fails and kdirect succeeds. kdirect uses ffmpeg-v4l2request's internal counter, which generates tiny ns values like `0x2af8 = 11000`.
Possible mechanisms:
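- **M-A**: Kernel `vb2_find_buffer_by_timestamp` has a comparison that fails for very large values (e.g., overflow on a u32 truncation, or signed comparison treating bit 63 as negative). Two constraints apply to the observed value:
  - ~1.78 × 10^18 ns is below 2^63 ≈ 9.22 × 10^18, so bit 63 is clear.
  - A truncation applied identically to both the stored ts and the lookup key would still match.

  M-A therefore needs the truncation or sign handling to affect only one side.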
- **M-B**: The OUTPUT QBUF's timestamp gets truncated/transformed by the V4L2 framework before being stored on the CAPTURE buffer, but the DPB.reference_ts is left at full resolution. The kernel then compares full-resolution reference_ts against truncated stored ts — never matches.
- **M-C**: `gettimeofday` and `v4l2_timeval_to_ns` produce slightly different ns values (e.g., due to a re-read or rounding), making OUTPUT QBUF's ts and DPB.reference_ts not byte-equal.
- **M-D**: Some other reason a small counter works but a giant one doesn't.
## Fix shape
### α-7: monotonic per-context counter
Replace `gettimeofday(&surface_object->timestamp, NULL)` with a counter scheme that produces small ns values matching kdirect's pattern.
Simplest implementation:
- Add `uint64_t timestamp_counter` to `request_data` (init at 0 in CreateContext).
- In `EndPicture`, increment counter and set `surface->timestamp` from it.
Code shape:
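```c
/* Counter is in ns; +1000 ns = 1 µs per frame keeps reference_ts small. */
driver_data->timestamp_counter += 1000;
/* struct timeval has µs resolution: split the ns counter into sec + µs. */
surface_object->timestamp.tv_sec = driver_data->timestamp_counter / 1000000000ULL;
surface_object->timestamp.tv_usec = (driver_data->timestamp_counter % 1000000000ULL) / 1000;
```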
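Or even simpler: count in microseconds and increment by 1 each frame, giving reference_ts values of 1000, 2000, 3000, ... ns. Either way the per-frame step must be at least 1 µs. `struct timeval` cannot represent sub-microsecond steps, so a 1 ns increment would give consecutive frames identical timestamps.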
LOC estimate: ~10 LOC.
### Risk
- **R-1**: Timestamp uniqueness. If the counter wraps in some long-lived process, ts uniqueness fails. For a campaign verifier (3 frames), counter wraps are impossible. For production playback, the binding limit is the u64 ns value. At 1 µs/frame, 2^64 ns covers ~1.8 × 10^16 frames, i.e. millions of years at 60 fps.
- **R-2**: VP9 / VP8 / HEVC / MPEG-2 regression. Timestamp is shared infrastructure; all codecs use this same gettimeofday path. The change is uniform across codecs, so all 5 codecs get the new counter. VP9/MPEG-2 currently PASS direct via libva; switching from gettimeofday to counter should be neutral OR also a positive.
- **R-3**: Some consumer (ffmpeg outside the decoder path) reads the surface timestamp as a real wallclock. Probably no — VAAPI surfaces don't expose timestamps to consumers.
## In scope
- `src/request.h` — add `timestamp_counter` to driver_data.
- `src/request.c` or `src/context.c` — init counter to 0 in CreateContext.
- `src/picture.c::RequestEndPicture` — replace `gettimeofday(&surface->timestamp, NULL)` with counter-based assignment.
## Out of scope
- Per-codec changes (this is a shared-infrastructure change).
- Kernel patches.
## Phase 2 source-read targets
Already done in Phase 0 strace dump. Phase 2 will be brief — just confirm the gettimeofday call site is unique and that no other code reads `surface->timestamp` as a real wallclock.
## Phase 3 baseline
Already captured in iter8 Phase 3 + Phase 7c regression sweep. iter9 Phase 3 may just re-anchor.
## Phase 4 plan shape (predicted)
Already implicit in α-7. Will be drafted explicitly in Phase 4.
## Phase 5 review concerns to invite
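- Reviewer should challenge `M-B` plausibility: does the V4L2 framework really transform the OUTPUT QBUF timestamp before storing it on CAPTURE? Check:
  - `drivers/media/common/videobuf2/videobuf2-v4l2.c` (timestamp copy on QBUF);
  - `videobuf2-core.c::vb2_buffer_done`;
  - `drivers/media/v4l2-core/v4l2-mem2mem.c::v4l2_m2m_buf_copy_metadata`.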
- Reviewer should verify the monotonic-counter approach doesn't break MPEG-2 (which works via libva on hantro, a different driver) or VP9 (works via libva on rkvdec).
- Per `feedback_wire_vs_behavior.md`: don't claim α-7 success on wire-byte change alone; criterion-1 hash test required.
## Predicted iter9 cadence
- Phase 0: this doc.
- Phase 4: 15 min.
- Phase 5: 30 min.
- Phase 6: 15 min.
- Phase 7: 15 min.
- Phase 8: 10 min.
Total: ~90 min. Quick iteration.
## What "iter9 PASS" looks like
If α-7 closes Bug 4:
- iter9 PASS.
- Bug 4 closed. H.264 row goes from PARTIAL to PASS direct.
- Memory rule worth recording: V4L2 stateless decoders may require small/relative reference_ts values; gettimeofday is unsafe. See [[feedback_v4l2_timestamp_scheme]].
If α-7 doesn't close:
- iter9 PARTIAL. iter10 candidates: slice-data encoding (DEEP), DPB entry ordering (cosmetic, kernel sorts internally anyway), or pivot to a different bug (Bug 5 HEVC).
- Realistically, iter10 may shift entirely to a kernel-side or fresh investigation since the wire-byte search space is now exhausted.