fa771b0625
Phase 0 deep-strace yielded a critical narrowing:
- Post-DPB DECODE_PARAMS bytes (512-559): IDENTICAL libva vs kdirect
- PPS: IDENTICAL
- SPS: identical except inert constraint_set_flags
- DPB[0] beyond reference_ts: IDENTICAL after α-2
The ONLY remaining wire-byte diff between libva (broken) and kdirect
(working) is reference_ts magnitude. libva uses gettimeofday giving
~1.78e18 ns; kdirect uses an internal counter giving ~10000 ns.
α-7 hypothesis: V4L2 stateless decoder (rkvdec) reference-resolution
fails for very large reference_ts values. Possible mechanisms:
M-A: vb2_find_buffer_by_timestamp truncates/overflows on giant values.
M-B: V4L2 framework transforms OUTPUT QBUF ts before storing on CAPTURE
but DPB.reference_ts left untransformed → mismatch.
M-C: gettimeofday + v4l2_timeval_to_ns produce slightly different ns
values than the kernel computes from the timeval QBUF.
Fix: ~10 LOC. Add timestamp_counter to driver_data; replace
gettimeofday in EndPicture with monotonic counter.
If α-7 works → iter9 PASS, Bug 4 closed.
If α-7 doesn't → iter9 PARTIAL, wire-byte search space effectively
exhausted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
153 lines
8.0 KiB
Markdown
# Iteration 9 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-13 immediately after iter8 PARTIAL close ([`phase8_iteration8_close.md`](phase8_iteration8_close.md), commit `3ed1e45`). User confirmed proceed to iter9.

## Empirical surprise from iter9 Phase 0 deep-strace

I dumped the full 560-byte DECODE_PARAMS for the first P-frame of libva vs kdirect with `strace -s 8192`:

| Region | libva | kdirect | Diff |
|---|---|---|---|
| DPB[0] bytes 0-7 (reference_ts) | `30c2ea5cd622af18` (giant gettimeofday ns, little-endian 0x18af22d65ceac230 ≈ 1.78 × 10^18) | `f82a000000000000` (0x2af8 = 11000 ns) | **DIFF** |
| DPB[0] bytes 8-31 (pic_num/frame_num/fields/tfoc/bfoc/flags) | `00 00 00 00 00 00 03 00 ... 00 00 01 00 00 00 01 00 03 00 00 00` | identical | match |
| DPB[1..15] | all zero | all zero | match |
| Post-DPB bytes 512-559 | `01 00 01 00 04 00 01 00 ...` | identical | **match** |

**The ONLY wire-byte diff in DECODE_PARAMS is `reference_ts` magnitude.** Post-DPB fields, DPB entry contents (after α-2 fixed POC), pic_num, frame_num, fields, and flags are all byte-identical between libva and kdirect.

Combined with iter8's other findings:

- SPS: identical except `constraint_set_flags` (rkvdec ignores per Phase 5b CRIT-1)
- PPS: identical (verified in this Phase 0)
- SCALING_MATRIX: identical (always-flat default)

**This leaves `reference_ts` (and the adjacent OUTPUT QBUF timestamp) as the single remaining wire-byte hypothesis.**

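To make the `reference_ts` row concrete, here is a minimal sketch that decodes the two 8-byte fields exactly as they appear in the strace hex, assuming little-endian byte order (as the `0x2af8` reading in the table already does). The helper name `decode_le64` and the array names are illustrative only and do not exist in the codebase.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode 8 little-endian bytes (strace hex order) into a 64-bit value. */
static uint64_t decode_le64(const uint8_t bytes[8])
{
    uint64_t value = 0;

    for (size_t i = 0; i < 8; i++)
        value |= (uint64_t)bytes[i] << (8 * i);
    return value;
}

/* DPB[0] bytes 0-7 as captured in the Phase 0 dump (hypothetical array names). */
static const uint8_t libva_ts_bytes[8] = {
    0x30, 0xc2, 0xea, 0x5c, 0xd6, 0x22, 0xaf, 0x18
};
static const uint8_t kdirect_ts_bytes[8] = {
    0xf8, 0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
```

Calling `decode_le64(kdirect_ts_bytes)` gives 0x2af8 = 11000 ns (11 µs), and `decode_le64(libva_ts_bytes)` gives 0x18af22d65ceac230, roughly 1.78 × 10^18 ns, which is a wall-clock value in May 2026.
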
## Locked research question (iter9, 2026-05-13)

> *"Does replacing libva's gettimeofday-based timestamp scheme with a monotonic per-context counter (matching kdirect's small-value pattern) unblock libva's H.264 decode on rkvdec? After fix: `libva_h264.yuv == kdirect_h264.yuv` byte-identical."*

### Pass/fail (boolean)

1. **H.264 libva == kdirect**: `cmp -s libva_h264.yuv kdirect_h264.yuv` exits 0. Hash matches `1e7a0bc9…`.
2. **VP9 unchanged**: `4f1565e8…`.
3. **MPEG-2 unchanged**: `19eefbf4…`.
4. **HEVC unchanged**: `06b2c5a0…` (Bug 5 still deferred).
5. **VP8 unchanged**: `bcc57ed5…` (Bug 6 still deferred).
6. **Control-payload anchors hold for 4 non-H.264 codecs**.

Clean iter9 close = all 6 PASS. If criterion 1 still fails, iter9 closes PARTIAL with the timestamp eliminated. At that point the search space for Bug 4 is effectively exhausted on the libva wire-payload side, and iter10+ would shift to slice-data encoding or kernel-internal investigation.

## Substrate state at iter9 open

| Property | Value |
|---|---|
| Kernel | `linux-fresnel-fourier 7.0-1` (unchanged) |
| Fork tip | `0226684` (iter8 close: γ + IMP-1 + α-2 POC strip removal) |
| Backend installed | `b6a3958a5bca945164262339dea5cc28f17accce13d57bd9f0c5a5dabbdf1b53` |
| Test fixtures | unchanged |
| Bug 4 narrowing | 5 eliminations (libva-readback, slot-binding, stale-residue, constraint_set_flags, POC sentinel) |
| Bug 4 remaining wire diff | reference_ts magnitude (giant vs small) |

## Mechanism the question targets

`picture.c::RequestEndPicture` line 440:

```c
gettimeofday(&surface_object->timestamp, NULL);
```

This produces a real-clock timeval (~1.78 × 10^9 sec = 1.78 × 10^18 ns) as the OUTPUT QBUF timestamp. The kernel stores this on the CAPTURE buffer via `V4L2_BUF_FLAG_TIMESTAMP_COPY` after decode.

`h264.c::h264_fill_dpb` lines 268-269:

```c
timestamp = v4l2_timeval_to_ns(&surface->timestamp);
dpb->reference_ts = timestamp;
```

This sends the same giant ns value as `reference_ts` for inter-frame references.

In principle the kernel's `vb2_find_buffer_by_timestamp` does an exact 64-bit ns match and should not care about magnitude. But empirically, libva fails and kdirect (which uses ffmpeg-v4l2request's internal counter generating tiny ns values like `0x2af8 = 11000`) succeeds.

Possible mechanisms:

- **M-A**: Kernel `vb2_find_buffer_by_timestamp` has a comparison that fails for very large values (e.g., overflow on a u32 truncation, or signed comparison treating bit 63 as negative). Note that 1.78 × 10^18 is below 2^63 ≈ 9.22 × 10^18, so bit 63 is clear in the captured value; the signed variant would need an additional shift or multiply to bite.
- **M-B**: The OUTPUT QBUF's timestamp gets truncated/transformed by the V4L2 framework before being stored on the CAPTURE buffer, but the DPB.reference_ts is left at full resolution. The kernel then compares full-resolution reference_ts against the truncated stored ts, which never matches.
- **M-C**: `gettimeofday` and `v4l2_timeval_to_ns` produce slightly different ns values (e.g., due to a re-read or rounding), making OUTPUT QBUF's ts and DPB.reference_ts not byte-equal.
- **M-D**: Some other reason a small counter works but a giant one doesn't.

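Parts of M-A and M-C can be probed with plain arithmetic in userspace before touching the kernel. The sketch below is only that: `timeval_to_ns` is a local reimplementation that assumes the helper computes `tv_sec * 1e9 + tv_usec * 1e3`, and the three probe functions are hypothetical names, not kernel or libva APIs. It says nothing about what the kernel actually does with the value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

/* Assumed arithmetic of v4l2_timeval_to_ns(): sec * 1e9 + usec * 1e3.
 * Reimplemented locally so the sketch builds without V4L2 headers. */
static uint64_t timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL +
           (uint64_t)tv->tv_usec * 1000ULL;
}

/* M-A probe: is the value unchanged when viewed as a signed 64-bit number? */
static bool survives_s64(uint64_t ns)
{
    return ns <= (uint64_t)INT64_MAX;
}

/* M-A probe: is the value unchanged after truncation to 32 bits? */
static bool survives_u32(uint64_t ns)
{
    return ns == (uint64_t)(uint32_t)ns;
}

/* M-C probe: ns -> timeval -> ns is lossless only for whole microseconds. */
static bool roundtrips_via_timeval(uint64_t ns)
{
    struct timeval tv;

    tv.tv_sec = (time_t)(ns / 1000000000ULL);
    tv.tv_usec = (suseconds_t)((ns % 1000000000ULL) / 1000ULL);
    return timeval_to_ns(&tv) == ns;
}
```

Fed the captured libva value 0x18af22d65ceac230, `survives_s64` holds and `survives_u32` does not; the kdirect value 11000 passes both. Both values are whole microseconds, so both round-trip through a timeval. That is consistent with a 32-bit truncation somewhere, but it does not demonstrate one.
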
## Fix shape

### α-7: monotonic per-context counter

Replace `gettimeofday(&surface_object->timestamp, NULL)` with a counter scheme that produces small ns values matching kdirect's pattern.

Simplest implementation:

- Add `u64 timestamp_counter` to `request_data` (init at 0 in CreateContext).
- In `EndPicture`, increment counter and set `surface->timestamp` from it.

Code shape:

```c
/* Counter is in nanoseconds; advance 1 µs per picture to keep values small. */
driver_data->timestamp_counter += 1000;
/* struct timeval has µs resolution: split the ns counter into sec + µs. */
surface_object->timestamp.tv_sec = driver_data->timestamp_counter / 1000000000;
surface_object->timestamp.tv_usec = (driver_data->timestamp_counter / 1000) % 1000000;
```

Or, even simpler, increment by 1 each frame, giving small values like 1, 2, 3, ...

LOC estimate: ~10 LOC.

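A self-contained version of the counter scheme can be exercised outside the driver. This is a sketch, assuming the counter is held in nanoseconds and advanced by 1 µs per picture; `struct ts_state`, `next_timestamp`, and the local `timeval_to_ns` are hypothetical stand-ins for the driver state and helpers, not the real definitions.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for the driver state; only the counter matters. */
struct ts_state {
    uint64_t timestamp_counter_ns; /* starts at 0 */
};

/* Advance the counter by 1 µs and return it as a timeval for OUTPUT QBUF. */
static struct timeval next_timestamp(struct ts_state *state)
{
    struct timeval tv;

    state->timestamp_counter_ns += 1000;
    tv.tv_sec = (time_t)(state->timestamp_counter_ns / 1000000000ULL);
    tv.tv_usec = (suseconds_t)((state->timestamp_counter_ns / 1000ULL) %
                               1000000ULL);
    return tv;
}

/* ns value that would land in reference_ts for that timeval
 * (assumed arithmetic: sec * 1e9 + usec * 1e3). */
static uint64_t timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL +
           (uint64_t)tv->tv_usec * 1000ULL;
}
```

Starting from a zeroed `ts_state`, successive calls yield timevals that convert to 1000 ns, 2000 ns, and so on, so the OUTPUT QBUF timestamp and the later `reference_ts` are derived from the same stored timeval and stay unique per picture.
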
### Risk

- **R-1**: Timestamp uniqueness. If the counter wraps in some long-lived process, ts uniqueness fails. For a campaign verifier (3 frames), counter wraps are impossible. For production playback, a u64 ns value advancing 1 µs per frame does not wrap for about 1.8 × 10^16 frames, far beyond any process lifetime.
- **R-2**: VP9 / VP8 / HEVC / MPEG-2 regression. Timestamp is shared infrastructure; all codecs use this same gettimeofday path. The change is uniform across codecs, so all 5 codecs get the new counter. VP9/MPEG-2 currently PASS direct via libva; switching from gettimeofday to counter should be neutral or an improvement.
- **R-3**: Some consumer (ffmpeg outside the decoder path) reads the surface timestamp as a real wallclock. Probably not: VAAPI surfaces don't expose timestamps to consumers.

## In scope

- `src/request.h`: add `timestamp_counter` to driver_data.
- `src/request.c` or `src/context.c`: init counter to 0 in CreateContext.
- `src/picture.c::RequestEndPicture`: replace `gettimeofday(&surface->timestamp, NULL)` with counter-based assignment.

## Out of scope

- Per-codec changes (this is a shared-infrastructure change).
- Kernel patches.

## Phase 2 source-read targets

Already done in the Phase 0 strace dump. Phase 2 will be brief: confirm the gettimeofday call site is unique and that no other code reads `surface->timestamp` as a real wallclock.

## Phase 3 baseline

Already captured in iter8 Phase 3 + Phase 7c regression sweep. iter9 Phase 3 may just re-anchor.

## Phase 4 plan shape (predicted)

Already implicit in α-7. Will be drafted explicitly in Phase 4.

## Phase 5 review concerns to invite

- Reviewer should challenge `M-B` plausibility: does the V4L2 framework really transform the OUTPUT QBUF timestamp before storing it on CAPTURE? Check `drivers/media/v4l2-core/v4l2-dev.c` or `videobuf2-v4l2.c::vb2_buffer_done`.
- Reviewer should verify the monotonic-counter approach doesn't break MPEG-2 (which works via libva on hantro, a different driver) or VP9 (works via libva on rkvdec).
- Per `feedback_wire_vs_behavior.md`: don't claim α-7 success on wire-byte change alone; criterion-1 hash test required.

## Predicted iter9 cadence

- Phase 0: this doc.
- Phase 4: 15 min.
- Phase 5: 30 min.
- Phase 6: 15 min.
- Phase 7: 15 min.
- Phase 8: 10 min.

Total: ~90 min. Quick iteration.

## What "iter9 PASS" looks like

If α-7 closes Bug 4:

- iter9 PASS.
- Bug 4 closed. H.264 row goes from PARTIAL to PASS direct.
- Memory rule worth recording: V4L2 stateless decoders may require small/relative reference_ts values; gettimeofday is unsafe. See [[feedback_v4l2_timestamp_scheme]].

If α-7 doesn't close Bug 4:

- iter9 PARTIAL. iter10 candidates: slice-data encoding (DEEP), DPB entry ordering (cosmetic, kernel sorts internally anyway), or pivot to a different bug (Bug 5 HEVC).
- Realistically, iter10 may shift entirely to a kernel-side or fresh investigation, since the wire-byte search space would then be exhausted.