Files
fresnel-fourier/phase4c_iter8_plan.md
T
marfrit 16034152a8 iter8 Phase 4c: α-2 plan — remove POC sentinel strip for rkvdec
Phase 3 strace re-decoded with correct struct layout:
- libva sends dpb[0] tfoc=0, bfoc=0 (sentinel stripped)
- kdirect sends dpb[0] tfoc=65536, bfoc=65536 (FFmpeg sentinel preserved)
- flags match between both (0x03 VALID|ACTIVE)

rkvdec config_registers() writes top/bottom_field_order_cnt directly to
MMIO. The strip was added in h264.c:219 for hantro's prepare_table; for
rkvdec, kdirect's path (no strip) decodes correctly while libva's
(strip) produces 16x32 partial decode.

Option A: remove the strip entirely (~5 LOC).
Option B: per-driver gating (~20 LOC).

Hantro+H.264 not exercised on RK3399 — Option A is safe. Phase 5c
review then Phase 6c implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:48:52 +00:00

5.8 KiB
Raw Blame History

Iteration 8 — Phase 4c plan (α-2 POC strip audit)

Drafted 2026-05-13 after Phase 5b CRIT-1 killed α-1 (rkvdec ignores constraint_set_flags). Phase 4c shifts to the reviewer's IMP-1 redirect: audit DPB entry construction. Empirical re-read of Phase 3 strace bytes pins the leading hypothesis to POC sentinel strip, not flags.

Strace re-decode for kdirect frame 2 dpb[0]

\370*\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\1\0\0\0\1\0\3\0\0\0

Struct field decode (32 bytes per struct v4l2_h264_dpb_entry from <linux/v4l2-controls.h>):

Offset Field Width Hex Value Interpretation
07 reference_ts u64 f8 2a 00 00 00 00 00 00 0x2af8 = 10,968 small relative ns
811 pic_num u32 00 00 00 00 0
1213 frame_num u16 00 00 0
14 fields u8 03 V4L2_H264_FRAME_REF
1519 reserved u8 [5] 00 × 5
2023 top_field_order_cnt s32 00 00 01 00 0x00010000 = 65536 FFmpeg sentinel-encoded POC
2427 bottom_field_order_cnt s32 00 00 01 00 0x00010000 = 65536 FFmpeg sentinel-encoded POC
2831 flags u32 03 00 00 00 0x03 = VALID | ACTIVE

libva sends for the same reference frame:

\310\224,\33\256\34\257\30\0\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0
  • top_field_order_cnt = 0
  • bottom_field_order_cnt = 0
  • flags = 0x03 (VALID|ACTIVE) — same as kdirect

The diff that matters: libva's POC fields are 0 (sentinel stripped); kdirect's are 65536 (sentinel preserved).

flags match between libva and kdirect (both 0x03). My earlier Phase 3 / Phase 5b interpretation of "kdirect flags = 0x30001" was incorrect — those bytes are tfoc + bfoc + flags, not just flags.

Why this could be the kernel-side bug

Per rkvdec-h264.c lines 975-978 (config_registers):

writel_relaxed(dpb[i].top_field_order_cnt,
               rkvdec->regs + poc_reg_tbl_top_field[i]);
writel_relaxed(dpb[i].bottom_field_order_cnt,
               rkvdec->regs + poc_reg_tbl_bottom_field[i]);

The driver writes POC values directly to MMIO registers. The hardware uses POC for motion-compensation reference picture lookup. If libva strips bit 16 and kdirect preserves it, the values sent to hardware differ by 65536 — a large numerical difference.

For the current frame:

reg = RKVDEC_CUR_POC(dec_params->top_field_order_cnt);
writel_relaxed(reg, rkvdec->regs + RKVDEC_REG_CUR_POC0);

Same direct MMIO write. RKVDEC_CUR_POC may mask the value (e.g., to 32-bit signed), but if the strip doesn't match what hardware expects, the comparison "is this reference frame's POC equal to current frame's POC delta" fails → wrong reference chosen → garbage motion compensation.

Why the strip was added

h264.c:191-218 docblock says the strip was added for hantro's prepare_table which feeds tbl->poc[] directly:

Working VAAPI backends (intel-iHD, i965 verified empirically on meitner 2026-05-02) tolerate the high word — they either mask it or treat POCs as relative comparisons. V4L2 stateless H.264 driver-side consumers (hantro_h264.c::prepare_table feeds the value direct to tbl->poc[]) need the spec value, so we strip the sentinel here at the libva-v4l2-request boundary.

So the strip was hantro-specific. For rkvdec, kdirect (via ffmpeg-v4l2request) preserves the sentinel and decodes correctly. Stripping for rkvdec may be the wrong move.

Proposed fix

α-2 — Remove POC sentinel strip OR make it driver-aware

Option A (simplest): Remove the strip entirely. Match what kdirect (ffmpeg-v4l2request) sends. If hantro+H.264 still works after, great; if it regresses, fall back to Option B.

Option B (per-driver gating): Add driver-detect logic. For rkvdec: don't strip. For hantro: strip. Requires knowing which driver is active (already known via info.driver from MEDIA_IOC_DEVICE_INFO at init).

Hantro+H.264 is not in the campaign scoreboard (hantro-decoder doesn't support H.264 on RK3399 anyway, and the iter4 transitive path didn't exercise hantro+H.264 directly). So Option A is safe in practice.

LOC

~5 LOC for Option A: change h264_strip_ffmpeg_poc_sentinel to return poc unchanged, OR delete the function and call sites.

For Option B: ~20 LOC (driver-detect bool + per-driver branch).

Risks

  • R-1: POC strip removal breaks hantro+H.264 elsewhere (deferred backend that does decode H.264). Probability: irrelevant on RK3399 (hantro-decoder doesn't support H.264). Mitigation: Option B.
  • R-2: The POC diff isn't the root cause; H.264 still fails. Probability: medium. Mitigation: Phase 7c γ dump shows full-plane writes or not.
  • R-3: VP9 / HEVC / MPEG-2 / VP8 regression. Probability: zero. POC strip is H.264-specific (only in h264.c).

Scope

In scope: src/h264.ch264_strip_ffmpeg_poc_sentinel and its 4 call sites (h264.c:309, 312, 462, 465).

Out of scope: VP9 / VP8 / HEVC / MPEG-2 paths.

Phase 5c review

I'd like a quick review of this plan. The change is small (5 LOC for Option A). Reviewer should:

  • Confirm Option A doesn't break anything else.
  • Validate the strace re-decode of POC bytes is correct.
  • Suggest whether to skip α-2's Phase 6c if confidence is high.

If reviewer comes back with "looks good" — Phase 6c is a 5-LOC change + build + test. Very fast.

Phase 7c success criteria

  • C1: libva_h264 == kdirect_h264 (criterion 1 PASS).
  • C2-C6: 5 other codecs unchanged.

If γ dump confirms full-plane writes after fix → Bug 4 closed.

Predicted cadence

  • Phase 5c review: 5-10 min (quick).
  • Phase 6c implementation: 5 min.
  • Phase 7c verification: 10 min.
  • Phase 8 close: 10 min.

Total: 30 min if α-2 works.

If α-2 doesn't work → iter8 PARTIAL close with Bug 4 narrowed further (POC isn't load-bearing either; iter9 candidate explores PPS/SLICE deeper diff).