Files
fresnel-fourier/phase4c_iter8_plan.md
marfrit 16034152a8 iter8 Phase 4c: α-2 plan — remove POC sentinel strip for rkvdec
Phase 3 strace re-decoded with correct struct layout:
- libva sends dpb[0] tfoc=0, bfoc=0 (sentinel stripped)
- kdirect sends dpb[0] tfoc=65536, bfoc=65536 (FFmpeg sentinel preserved)
- flags match between both (0x03 VALID|ACTIVE)

rkvdec config_registers() writes top/bottom_field_order_cnt directly to
MMIO. The strip was added in h264.c:219 for hantro's prepare_table; for
rkvdec, kdirect's path (no strip) decodes correctly while libva's
(strip) produces 16x32 partial decode.

Option A: remove the strip entirely (~5 LOC).
Option B: per-driver gating (~20 LOC).

Hantro+H.264 not exercised on RK3399 — Option A is safe. Phase 5c
review then Phase 6c implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:48:52 +00:00

118 lines
5.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 8 — Phase 4c plan (α-2 POC strip audit)
Drafted 2026-05-13 after Phase 5b CRIT-1 killed α-1 (rkvdec ignores `constraint_set_flags`). Phase 4c shifts to the reviewer's IMP-1 redirect: audit DPB entry construction. Empirical re-read of Phase 3 strace bytes pins the leading hypothesis to **POC sentinel strip**, not flags.
## Strace re-decode for kdirect frame 2 dpb[0]
```
\370*\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\1\0\0\0\1\0\3\0\0\0
```
Struct field decode (32 bytes per `struct v4l2_h264_dpb_entry` from `<linux/v4l2-controls.h>`):
| Offset | Field | Width | Hex | Value | Interpretation |
|---|---|---|---|---|---|
| 07 | reference_ts | u64 | f8 2a 00 00 00 00 00 00 | 0x2af8 = 10,968 | small relative ns |
| 811 | pic_num | u32 | 00 00 00 00 | 0 | |
| 1213 | frame_num | u16 | 00 00 | 0 | |
| 14 | fields | u8 | 03 | V4L2_H264_FRAME_REF | |
| 1519 | reserved | u8 [5] | 00 × 5 | — | |
| 2023 | top_field_order_cnt | s32 | 00 00 01 00 | **0x00010000 = 65536** | **FFmpeg sentinel-encoded POC** |
| 2427 | bottom_field_order_cnt | s32 | 00 00 01 00 | **0x00010000 = 65536** | **FFmpeg sentinel-encoded POC** |
| 2831 | flags | u32 | 03 00 00 00 | 0x03 = VALID \| ACTIVE | |
**libva sends** for the same reference frame:
```
\310\224,\33\256\34\257\30\0\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0
```
- top_field_order_cnt = 0
- bottom_field_order_cnt = 0
- flags = 0x03 (VALID|ACTIVE) — same as kdirect
**The diff that matters: libva's POC fields are 0 (sentinel stripped); kdirect's are 65536 (sentinel preserved).**
flags match between libva and kdirect (both 0x03). My earlier Phase 3 / Phase 5b interpretation of "kdirect flags = 0x30001" was incorrect — those bytes are tfoc + bfoc + flags, not just flags.
## Why this could be the kernel-side bug
Per rkvdec-h264.c lines 975-978 (config_registers):
```c
writel_relaxed(dpb[i].top_field_order_cnt,
rkvdec->regs + poc_reg_tbl_top_field[i]);
writel_relaxed(dpb[i].bottom_field_order_cnt,
rkvdec->regs + poc_reg_tbl_bottom_field[i]);
```
The driver writes POC values directly to MMIO registers. The hardware uses POC for motion-compensation reference picture lookup. If libva strips bit 16 and kdirect preserves it, the values sent to hardware differ by 65536 — a large numerical difference.
For the current frame:
```c
reg = RKVDEC_CUR_POC(dec_params->top_field_order_cnt);
writel_relaxed(reg, rkvdec->regs + RKVDEC_REG_CUR_POC0);
```
Same direct MMIO write. RKVDEC_CUR_POC may mask the value (e.g., to 32-bit signed), but if the strip doesn't match what hardware expects, the comparison "is this reference frame's POC equal to current frame's POC delta" fails → wrong reference chosen → garbage motion compensation.
## Why the strip was added
h264.c:191-218 docblock says the strip was added for hantro's `prepare_table` which feeds tbl->poc[] directly:
> Working VAAPI backends (intel-iHD, i965 verified empirically on meitner 2026-05-02) tolerate the high word — they either mask it or treat POCs as relative comparisons. V4L2 stateless H.264 driver-side consumers (hantro_h264.c::prepare_table feeds the value direct to tbl->poc[]) need the spec value, so we strip the sentinel here at the libva-v4l2-request boundary.
So the strip was hantro-specific. **For rkvdec, kdirect (via ffmpeg-v4l2request) preserves the sentinel and decodes correctly.** Stripping for rkvdec may be the wrong move.
## Proposed fix
### α-2 — Remove POC sentinel strip OR make it driver-aware
**Option A (simplest)**: Remove the strip entirely. Match what kdirect (ffmpeg-v4l2request) sends. If hantro+H.264 still works after, great; if it regresses, fall back to Option B.
**Option B (per-driver gating)**: Add driver-detect logic. For rkvdec: don't strip. For hantro: strip. Requires knowing which driver is active (already known via `info.driver` from MEDIA_IOC_DEVICE_INFO at init).
Hantro+H.264 is not in the campaign scoreboard (hantro-decoder doesn't support H.264 on RK3399 anyway, and the iter4 transitive path didn't exercise hantro+H.264 directly). So Option A is safe in practice.
### LOC
~5 LOC for Option A: change `h264_strip_ffmpeg_poc_sentinel` to return `poc` unchanged, OR delete the function and call sites.
For Option B: ~20 LOC (driver-detect bool + per-driver branch).
## Risks
- **R-1**: POC strip removal breaks hantro+H.264 elsewhere (deferred backend that does decode H.264). Probability: irrelevant on RK3399 (hantro-decoder doesn't support H.264). Mitigation: Option B.
- **R-2**: The POC diff isn't the root cause; H.264 still fails. Probability: medium. Mitigation: Phase 7c γ dump shows full-plane writes or not.
- **R-3**: VP9 / HEVC / MPEG-2 / VP8 regression. Probability: zero. POC strip is H.264-specific (only in h264.c).
## Scope
In scope: `src/h264.c``h264_strip_ffmpeg_poc_sentinel` and its 4 call sites (h264.c:309, 312, 462, 465).
Out of scope: VP9 / VP8 / HEVC / MPEG-2 paths.
## Phase 5c review
I'd like a quick review of this plan. The change is small (5 LOC for Option A). Reviewer should:
- Confirm Option A doesn't break anything else.
- Validate the strace re-decode of POC bytes is correct.
- Suggest whether to skip α-2's Phase 6c if confidence is high.
If reviewer comes back with "looks good" — Phase 6c is a 5-LOC change + build + test. Very fast.
## Phase 7c success criteria
- C1: libva_h264 == kdirect_h264 (criterion 1 PASS).
- C2-C6: 5 other codecs unchanged.
If γ dump confirms full-plane writes after fix → Bug 4 closed.
## Predicted cadence
- Phase 5c review: 5-10 min (quick).
- Phase 6c implementation: 5 min.
- Phase 7c verification: 10 min.
- Phase 8 close: 10 min.
Total: 30 min if α-2 works.
If α-2 doesn't work → iter8 PARTIAL close with Bug 4 narrowed further (POC isn't load-bearing either; iter9 candidate explores PPS/SLICE deeper diff).