fresnel-fourier/phase8_iteration18_close.md
marfrit a449cec92e iter18 Phase 8 close: mechanisms 3 + 5 disproved; iter17 finding stands
α-21 (heap-persist HEVC controls past IOC_QUEUE): hash unchanged.
  -> Kernel does copy at S_EXT_CTRLS time, not deferred. Mechanism 3 dead.

α-22 (log error_idx after S_EXT_CTRLS): error_idx = count - 1 in BOTH
  the working device-init batch AND the broken per-frame batch. Not
  a failure indicator in this kernel version. Mechanism 5 dead.

Backend reverted to iter15 stable state c1d4bb53... All 5-codec
anchors preserved.

Remaining mechanisms (untested):
  1. request_fd mismatch (unlikely; strace shows consistent fd)
  2. REINIT clears controls between S_EXT_CTRLS and QUEUE (LEADING)
  4. ctrl_hdl mismatch (libva submits to one, rkvdec reads from another)

iter17's empirical finding still stands as the campaign's strongest
narrowing: rkvdec sees zero SPS for libva, correct for kdirect. The
mechanism is between S_EXT_CTRLS submission and ctx->ctrl_hdl->p_cur
read, specific to libva's invocation pattern.

iter19 candidate (α-23): test mechanism 2 by disabling
media_request_reinit() in libva's RequestSyncSurface. If hashes
change, REINIT timing is the bug. Alternative (mechanism 4): kernel
printk that dumps &ctx->ctrl_hdl + per-request handler pointer,
comparing libva vs kdirect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:02:19 +00:00


Iteration 18 — Phase 8 (close)

Closed 2026-05-14. iter18 tested mechanisms 3 (stale pointer) and 5 (silent partial failure) as explanations for iter17's finding. PARTIAL close: both mechanisms disproved, root cause still open.

Mechanism tests

α-21 (mechanism 3 — stale stack-local pointers)

Made libva's HEVC control structs static (file-scope), persisting indefinitely. Suppressed free(slice_params_array) so the heap-allocated SLICE_PARAMS also persists past MEDIA_REQUEST_IOC_QUEUE.

Result: Hash unchanged (06b2c5a0…). Kernel pr_info still shows w=0 h=0 for libva. Mechanism 3 DISPROVED: the kernel copies the control payload at S_EXT_CTRLS time rather than deferring the read.

α-22 (mechanism 5 — silent partial failure via error_idx)

Added libva-side logging of controls.error_idx after each S_EXT_CTRLS. Output:

S_EXT_CTRLS rc=0 errno=0 count=2 error_idx=1 request_fd=0 which=0x0      # device-init: WORKS
S_EXT_CTRLS rc=0 errno=0 count=5 error_idx=4 request_fd=8 which=0xf010000  # per-frame: BROKEN

error_idx = count - 1 in BOTH the working (device-init, sets HEVC_DECODE_MODE + HEVC_START_CODE correctly) and broken cases. This is not a failure indicator in this kernel version — it appears to just be "index of last control processed."
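The α-22 logging can be sketched as a small formatting helper. This is a minimal sketch, not the backend's actual code: the function name and all parameter names are hypothetical, and only the line layout is taken from the output above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats one diagnostic line in the same layout as the
 * S_EXT_CTRLS log lines above. Names are illustrative, not taken from the
 * libva backend. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the buffer is too small. */
static int format_s_ext_ctrls_log(char *buf, size_t len, int rc, int err,
                                  unsigned count, unsigned error_idx,
                                  int request_fd, unsigned which)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len,
                     "S_EXT_CTRLS rc=%d errno=%d count=%u error_idx=%u "
                     "request_fd=%d which=0x%x",
                     rc, err, count, error_idx, request_fd, which);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Called right after the ioctl with the saved errno, this yields one line per batch, which makes the device-init and per-frame cases directly comparable.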

A follow-up α-22 test removed DECODE_PARAMS (count=4); error_idx was still count - 1 (= 3) and the kernel state stayed all-zero, so removing DECODE_PARAMS did not unblock decoding.

Mechanism 5 DISPROVED — error_idx isn't reporting partial failure.

Both reverted, backend restored to iter15 state (c1d4bb53…)

iter18 ships zero code changes. Both tests falsified their hypotheses.

Remaining mechanisms (post-iter18)

| # | Mechanism | Status |
|---|-----------|--------|
| 1 | request_fd mismatch | unlikely (strace shows consistent fd) |
| 2 | REINIT clears controls between S_EXT_CTRLS and QUEUE | untested; leading hypothesis |
| 3 | Stack-locals stale | DISPROVED by α-21 |
| 4 | ctrl_hdl mismatch (libva submits to one, rkvdec reads from another) | untested; possible |
| 5 | Silent partial failure via error_idx | DISPROVED by α-22 |

iter19 candidate (α-23)

Mechanism 2 test: temporarily disable media_request_reinit() in libva's RequestSyncSurface for HEVC. If the controls then reach the kernel intact (the hash changes), mechanism 2 is confirmed. The fix would then be a reorder: REINIT must run before the next S_EXT_CTRLS, not after the previous decode (which is libva's current iter6 model).
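The mechanism 2 hypothesis can be stated as a toy ordering model. This is a sketch of the hypothesis only, not of the kernel or of libva: the enum and function names are hypothetical, and the model assumes that REINIT discards whatever controls are staged on the request and that QUEUE needs them to still be staged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three operations on one request fd. */
enum req_op { OP_REINIT, OP_S_EXT_CTRLS, OP_QUEUE };

/* Returns true if controls are still staged at every QUEUE, under the
 * mechanism-2 assumption that REINIT clears staged controls. */
static bool controls_staged_at_queue(const enum req_op *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    bool staged = false;
    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_S_EXT_CTRLS:
            staged = true;      /* controls attached to the request */
            break;
        case OP_REINIT:
            staged = false;     /* assumed: REINIT wipes the request */
            break;
        case OP_QUEUE:
            if (!staged)
                return false;   /* decoder would see empty controls */
            break;
        }
    }
    return true;
}
```

In this model the sequence S_EXT_CTRLS, REINIT, QUEUE fails while REINIT, S_EXT_CTRLS, QUEUE passes, which is the ordering the α-23 experiment is meant to distinguish on real hardware.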

Alternatively (mechanism 4): add a deeper kernel printk that dumps the ctx->ctrl_hdl pointer and the per-request req->req (V4L2 request handler) pointer, comparing a libva-triggered decode against a kdirect-triggered one. If the handlers differ, libva is staging controls to the wrong one.

The kernel-side approach (deeper printk) is more invasive but more definitive. A variant: rebuild rkvdec_hevc_run_preamble to dump &ctx->ctrl_hdl and the first 16 bytes of *run->sps. If the pointer is the same as a previous frame's, that suggests no per-request update took place.
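The "first 16 bytes" dump can be prototyped in userspace before touching the kernel. A minimal sketch with hypothetical helper names; in the kernel the equivalent would probably be a pr_info using the %*ph hex-buffer format, which is an assumption about how the diagnostic would be written, not something the iter18 patch contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DUMP_BYTES 16

/* True if every byte in the buffer is zero (the "all-zero SPS" symptom). */
static bool is_all_zero(const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return false;
    return true;
}

/* Writes up to DUMP_BYTES leading bytes as space-separated hex into out.
 * Returns the number of characters written (excluding the NUL). Output is
 * cut at the last whole byte that fits. */
static size_t hex_prefix(const unsigned char *p, size_t len,
                         char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    assert(p != NULL || len == 0);
    size_t n = len < DUMP_BYTES ? len : DUMP_BYTES;
    size_t pos = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + pos, out_len - pos, i ? " %02x" : "%02x", p[i]);
        if (w < 0 || (size_t)w >= out_len - pos) {
            out[pos] = '\0';    /* drop the partially written byte */
            break;
        }
        pos += (size_t)w;
    }
    return pos;
}
```

Logging the hex prefix next to the pointer value for each frame would make a stale, unchanged payload visible as an identical line repeated across frames.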

Substrate state at iter18 close

  • Fork tip fc78ed4 on noether + fresnel + gitea (clean iter15 state).
  • Backend SHA c1d4bb53… on fresnel (iter15 stable).
  • Kernel linux-fresnel-fourier 7.0-3 (with diagnostic printk, to be kept for iter19).
  • 5-codec anchors: byte-identical to iter15 anchors. Zero regression.

iter17 finding stands

Despite iter18's negative results on mechanisms 3 and 5, the iter17 empirical finding remains the campaign's strongest narrowing:

The kernel sees zero control values for libva HEVC (run.sps->{w,h,reorder,chroma}=0,0,0,0) but correct values for kdirect (w=1280, h=720, reorder=2, chroma=1).

The mechanism is still unknown but localized. The remaining productive direction is targeted kernel investigation of where libva's S_EXT_CTRLS payload lands vs where rkvdec reads from.

Lessons

  1. error_idx semantics differ between kernel versions / paths. Don't rely on it for partial-failure detection without first verifying the success-case value.
  2. Stack-local control pointers are SAFE for V4L2 compound controls — the kernel copies immediately at S_EXT_CTRLS time.
  3. The S_EXT_CTRLS → request → ctrl_hdl chain has at least one bug specific to libva's invocation pattern on RK3399 rkvdec, despite identical wire bytes vs kdirect.
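Lesson 1 can be turned into a small guard that never reads error_idx on the success path. A minimal sketch, assuming the V4L2 contract as documented upstream (worth re-checking against the running kernel): error_idx is only meaningful when the ioctl fails, error_idx < count points at the failing control, and error_idx == count means the failure is not tied to one control, for example a failed validation pass. The enum and function names are hypothetical.

```c
#include <assert.h>

enum s_ext_ctrls_outcome {
    OUTCOME_OK,                 /* ioctl succeeded; error_idx ignored */
    OUTCOME_CONTROL_FAILED,     /* error_idx names the failing control */
    OUTCOME_VALIDATION_FAILED   /* failure not tied to a single control */
};

/* rc is the ioctl return value (0 or -1); count and error_idx come from the
 * struct passed to S_EXT_CTRLS. */
static enum s_ext_ctrls_outcome
classify_s_ext_ctrls(int rc, unsigned count, unsigned error_idx)
{
    assert(rc == 0 || rc == -1);
    if (rc == 0)
        return OUTCOME_OK;      /* do not interpret error_idx on success */
    return error_idx < count ? OUTCOME_CONTROL_FAILED
                             : OUTCOME_VALIDATION_FAILED;
}
```

With this guard both α-22 log lines (rc=0, error_idx = count - 1) classify as OUTCOME_OK, which matches the conclusion that error_idx was not reporting a partial failure.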