fresnel-fourier/phase0_iter5_loopback.md
marfrit cd34ec1918 iter5 Phase 0 loopback: real Bug 2 is surface.c:173 hardcoded OUTPUT format
Empirical strace of all 5 codecs through libva shows VIDIOC_S_FMT on
OUTPUT_MPLANE ships pixelformat V4L2_PIX_FMT_H264_SLICE for EVERY
profile. HEVC controls submitted on H264_SLICE OUTPUT → kernel rkvdec
silently rejects/no-ops → CAPTURE stays in cap_pool init (all-zero).

Per-codec Bug 2 taxonomy:
- HEVC, VP9, VP8: OUTPUT format mismatch on rkvdec/hantro-strict → 100% zero
- MPEG-2: format mismatch but hantro tolerates → works
- H.264: format right by coincidence; keyframe decodes, inter all-zero
  (Bug 4, separate, deferred from iter5b)

Site: src/surface.c:173 `unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE`.
Same bug class as feedback_unconditional_codec_state.md
(iter4 h264_start_code = true).

iter5b new Phase 1: fix surface.c to switch pixelformat on
config_object->profile. 4 criteria locked, all backend-side, no kernel
patches. RFC v2 series filed back to backlog for a future
DMABUF-import-consumer campaign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:21:41 +00:00


Iteration 5 — Phase 0 loopback (re-root-cause Bug 2)

Captured 2026-05-11 mid-day after Phase 5 review CRIT-1 invalidated the iter5 Phase 4 plan. iter5 returns to Phase 0 per feedback_dev_process.md. The vb2_dma_resv-RFC-v2-as-Bug-2-fix hypothesis is rejected. This document captures the new empirical evidence and re-frames iter5.

What Phase 5 + this loopback found

The original Bug 2 framing (Phase 0 Candidate B, Phase 2 situation, Phase 3 baseline, Phase 4 plan) was: "userspace cap_pool readback races ahead of kernel decoder completion; vb2_dma_resv RFC v2 closes the race." Phase 5 reviewer empirically traced producer→primitive→consumer-read-site through the actual libva backend code path and found the fence mechanism never reaches the MMAP+EXPBUF path the backend uses. The author re-verified at the surface.c source level: RequestSyncSurface already does media_request_wait_completion + v4l2_dequeue_buffer, which already block until decode-DONE. The fence would have been a no-op for the libva path even if it had reached the right resv.

So why are pages all-zero?

This loopback's empirical investigation answers it.

Empirical finding: OUTPUT pixel format hardcoded H264_SLICE

/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/src/surface.c:173:

unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE;

This is the OUTPUT-side pixel format the backend sets on the V4L2 queue. It is hardcoded, regardless of which profile is active. When the backend then submits HEVC controls (V4L2_CID_STATELESS_HEVC_*) on an OUTPUT buffer queued with H264_SLICE format, the kernel rkvdec driver sees a fundamental contract mismatch:

  • OUTPUT buffer format claims: H.264 NAL slices.
  • Submitted controls claim: HEVC SPS/PPS/decode_params/slice_params/scaling_matrix.

The kernel does not decode, and logs no error in dmesg (silent rejection). The CAPTURE buffer stays in the cap_pool init state (all-zero). When userspace reads it back through vaDeriveImage + vaMapBuffer + ffmpeg hwdownload, it reads the unmodified all-zero pages.

This is the same class of bug as memory feedback_unconditional_codec_state.md: codec-specific state set unconditionally, without profile gating. The iter4 fix for h264_start_code gated it on H.264/HEVC profiles. The pixelformat in surface.c needs an analogous gate.
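A minimal sketch of such a gate, under stated assumptions: `enum sketch_codec`, `SKETCH_FOURCC` and `sketch_output_pixelformat` are hypothetical names, not the backend's. The real code would presumably switch on the `VAProfile` held by the config object and use the `V4L2_PIX_FMT_*` constants from `<linux/videodev2.h>`; the FOURCC characters below are the ones that header defines for the stateless formats.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's v4l2_fourcc() macro (hypothetical name). */
#define SKETCH_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Compile-time check against the value seen in the strace ("S264"). */
static_assert(SKETCH_FOURCC('S', '2', '6', '4') == 0x34363253u, "S264 FOURCC");

/* Hypothetical codec classes; the backend would use VAProfile values instead. */
enum sketch_codec {
    SKETCH_CODEC_UNKNOWN = 0,
    SKETCH_CODEC_MPEG2,
    SKETCH_CODEC_H264,
    SKETCH_CODEC_HEVC,
    SKETCH_CODEC_VP8,
    SKETCH_CODEC_VP9
};

/*
 * Return the OUTPUT-queue pixel format for a codec, or 0 when the codec is
 * not recognised. Returning 0 instead of defaulting to H264_SLICE forces the
 * caller to fail loudly rather than queue a mismatched format.
 */
uint32_t sketch_output_pixelformat(enum sketch_codec codec)
{
    switch (codec) {
    case SKETCH_CODEC_MPEG2: return SKETCH_FOURCC('M', 'G', '2', 'S'); /* MPEG2_SLICE */
    case SKETCH_CODEC_H264:  return SKETCH_FOURCC('S', '2', '6', '4'); /* H264_SLICE */
    case SKETCH_CODEC_HEVC:  return SKETCH_FOURCC('S', '2', '6', '5'); /* HEVC_SLICE */
    case SKETCH_CODEC_VP8:   return SKETCH_FOURCC('V', 'P', '8', 'F'); /* VP8_FRAME */
    case SKETCH_CODEC_VP9:   return SKETCH_FOURCC('V', 'P', '9', 'F'); /* VP9_FRAME */
    default:                 return 0;
    }
}
```

The design choice worth noting is the absence of a default format: an unknown profile yields 0, so the hardcoded-H.264 behaviour cannot reappear silently.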

Per-codec Bug 2 taxonomy (post-loopback)

| Codec | OUTPUT format in libva | Match? | Phase 3 libva result | Phase 3 explanation |
|---|---|---|---|---|
| H.264 | H264_SLICE | right | 99.99% zero + traces of keyframe content | Format right; separate H.264 inter-frame bug (race or something else) |
| HEVC | H264_SLICE | wrong | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
| VP9 | H264_SLICE | wrong | 100% zero | OUTPUT format mismatch → kernel rkvdec rejects/no-ops |
| MPEG-2 | H264_SLICE | wrong but tolerated | Real decoded pixels (libva == kdirect) | hantro is single-codec; ignores OUTPUT format mismatch, dispatches on control class. Or got lucky on timing. |
| VP8 | H264_SLICE | wrong | 100% zero | OUTPUT format mismatch on hantro → unlike MPEG-2, VP8 doesn't tolerate it |

Empirically verified via strace inspection at fresnel /tmp/iter5_fmt/<codec>/trace.* 2026-05-11: each codec's first VIDIOC_S_FMT on V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE ships pixelformat S264 (= V4L2_PIX_FMT_H264_SLICE, FOURCC 0x34363253). No corrective re-S_FMT later in any codec's trace.
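For reading such traces, a FOURCC can be unpacked by taking its bytes from least to most significant. The helper below is an illustrative sketch (`sketch_fourcc_to_string` is a hypothetical name, not part of the backend or of any V4L2 library):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack a V4L2 FOURCC into its four characters plus a terminating NUL.
 * The first character lives in the least significant byte.
 */
void sketch_fourcc_to_string(uint32_t fourcc, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((fourcc >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Applied to 0x34363253, the bytes are 0x53, 0x32, 0x36, 0x34, which spell "S264", matching the pixelformat shown in the strace output.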

What this means for iter1-iter4 close validity

iter2 (HEVC), iter3 (VP8), and iter4 (VP9) all closed via transitive proof: backend's VIDIOC_S_EXT_CTRLS payload byte-matched the kernel-direct ffmpeg-v4l2request anchor. That proof verified the control-payload contract, not the OUTPUT pixel format. The transitive proof was technically sound for what it claimed (controls correctly shaped), but silently masked the OUTPUT format bug — because the test never actually decoded a frame through libva and read pixels; it only compared control bytes.

iter1 (MPEG-2) closed via direct pixel verification, and it happens to work because hantro tolerates the wrong OUTPUT format. iter1's "PASS direct" stands.

H.264 (T4) is in the carry-over from libva-multiplanar's ohm work. It also happens to use the default H264_SLICE format. Decoded keyframes through libva were verified at iter1 P0 cross-validator (real pixels at +30s seek). But the new Phase 3 data shows inter frames go all-zero for H.264 too. That's a separate Bug 4 (H.264 inter-frame race or sync gap) that the iter1 P0 testing didn't expose because mpv's --vo=image --frames=2 --start=+30s happens to hit a keyframe.

Bug taxonomy after loopback

| Bug | Root cause | Affects | Severity |
|---|---|---|---|
| Bug 2 | surface.c:173 hardcodes OUTPUT format H264_SLICE for every profile | HEVC, VP9, VP8 (rkvdec + hantro that strictly checks format) | HIGH: codec-class bug, masks 3 of 5 codecs |
| Bug 3 | None: Phase 5 confirmed Bug 3 doesn't exist as UAPI drift | (n/a) | (n/a) |
| Bug 4 (new, from Phase 3 + this loopback) | H.264 inter frames produce all-zero pages through libva even though OUTPUT format matches | H.264 inter frames specifically | MEDIUM: keyframes work, inter frames don't |

Re-locked Phase 1 success criteria (iter5b)

The original 4 criteria are partially still valid. Criterion 1 changes target (backend fix, not kernel patch). Criterion 2 (substrate ships from kernel-agent) is dropped: no kernel patches are needed for the new fix. Criterion 3 (no codec-contract regression) stays. Criterion 4 (5/5 direct) stays as the bar.

"Fix the libva backend's OUTPUT pixel-format hardcoding (surface.c:173) so that each profile sets the correct V4L2_PIX_FMT_*_SLICE / _FRAME on the OUTPUT_MPLANE buffer. After fix: ffmpeg-vaapi-hwdownload for HEVC + VP9 + VP8 produces YUV byte-identical to the kernel-direct + SW reference. MPEG-2 continues to pass. H.264 keyframes continue to decode correctly through libva. H.264 inter-frame Bug 4 is OUT OF SCOPE for iter5b; deferred to a follow-up iteration."

Pass/fail (boolean, iter5b)

  1. Bug 2 closed for HEVC, VP9, VP8: libva_<codec>.yuv == kdirect_<codec>.yuv == sw_<codec>.yuv (SHA256-equal raw YUV bytes), for bbb_720p10s_{hevc.mp4, vp9.webm, vp8.webm}, 3-frame test, on the current linux-fresnel-fourier 7.0-1 kernel (no kernel patches).
  2. No regression on MPEG-2: libva_mpeg2.yuv == kdirect_mpeg2.yuv still holds (matches Phase 3 baseline). MPEG-2 already worked; the OUTPUT format fix should not break it.
  3. H.264 keyframe still decodes: H.264 first frame (keyframe) through libva still produces real content (81 81 80 80 … neutral chroma row at byte offset 0 of frame 1). Bug 4 (inter frames all-zero) is acceptable for iter5b close; it is out of scope and recorded as backlog. H.264 still passes via the iter1-P0 keyframe-seek path.
  4. Control-payload anchors hold: VIDIOC_S_EXT_CTRLS payload for each codec on the fixed backend byte-matches the iter5 Phase 3 anchor. Backend control-handling code is unchanged; only the OUTPUT pixel format setup changes.

Clean iter5b close = 4/4 criteria green.
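Criterion 1 reduces to a three-way byte-identity check. The sketch below uses a direct `memcmp` rather than the SHA256 comparison named in the criterion; both answer the same question for raw YUV dumps already loaded in memory. `struct sketch_yuv` and `sketch_criterion1_pass` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A raw YUV dump held in memory (hypothetical helper type). */
struct sketch_yuv {
    const uint8_t *data;
    size_t len;
};

/* True when both dumps have the same length and the same bytes. */
static bool sketch_same_bytes(struct sketch_yuv a, struct sketch_yuv b)
{
    assert(a.len == 0 || a.data != NULL);
    assert(b.len == 0 || b.data != NULL);
    return a.len == b.len && (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}

/*
 * libva == kdirect == sw, with an empty libva dump treated as a failure so
 * that three missing files cannot pass by being trivially equal.
 */
bool sketch_criterion1_pass(struct sketch_yuv libva,
                            struct sketch_yuv kdirect,
                            struct sketch_yuv sw)
{
    if (libva.len == 0)
        return false;
    return sketch_same_bytes(libva, kdirect) && sketch_same_bytes(kdirect, sw);
}
```

Rejecting an empty dump is a deliberate choice in this sketch: an equality-only test would report green if every decode path produced nothing.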

Phase 2 source-read targets for iter5b

The fix site is src/surface.c:173. The supporting per-profile mapping table needs to derive from:

  • VA-API VAProfile* enum (/usr/include/va/va.h).
  • Backend's existing profile dispatch (e.g., config.c::RequestCreateConfig switch, picture.c::codec_set_controls dispatch).
  • Kernel UAPI per-codec OUTPUT pixel formats (<linux/videodev2.h>, FOURCC values).

Expected mapping:

| VA profile (object_config->profile) | V4L2_PIX_FMT_* | Already in config.c? (grep at iter5 Phase 0) |
|---|---|---|
| VAProfileH264* | H264_SLICE (0x34363253 = 'S','2','6','4') | yes, lines 151-154 |
| VAProfileHEVCMain | HEVC_SLICE (0x35363253 = 'S','2','6','5') | yes, lines 165-168 |
| VAProfileMPEG2* | MPEG2_SLICE | yes, lines 140-143 |
| VAProfileVP8Version0_3 | VP8_FRAME | yes, lines 175-178 |
| VAProfileVP9Profile0 | VP9_FRAME | yes, lines 185-188 |

config.c already knows the mapping (uses it for profile-enumeration probes). Phase 4 plan for iter5b is to thread the active profile from CreateContext through to CreateSurfaces, OR defer the OUTPUT-side v4l2_set_format from CreateSurfaces to CreateContext when the profile is known, OR look up the active context's profile from driver_data at CreateSurfaces time.

Which approach is cleanest is a Phase 4 plan question, not a Phase 0 question.

Open question for iter5b Phase 4

Where in the VA-API lifecycle should the OUTPUT format S_FMT happen?

  • Option α: At CreateSurfaces (current site), but read profile from driver_data->current_profile which must be set at CreateConfig OR CreateContext. Simplest patch.
  • Option β: Defer the S_FMT + CREATE_BUFS lifecycle entirely from CreateSurfaces to CreateContext (when profile is unambiguously known). Larger refactor but architecturally cleanest.
  • Option γ: Trigger S_FMT lazily at first BeginPicture, when profile is definitely active. Requires checking format on every BeginPicture and conditionally REQBUFS(0)+S_FMT+CREATE_BUFS — heavyweight if BeginPicture fires often.

Phase 4 of iter5b picks one of these.
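To make the cost concern in Option γ concrete, here is a sketch of the lazy check, assuming a hypothetical `struct sketch_output_queue` that caches the last format set. The real reconfiguration (REQBUFS(0) + S_FMT + CREATE_BUFS) is replaced by a counter, so the block only illustrates the control flow, not the ioctl sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cached OUTPUT-queue state (hypothetical; not the backend's driver_data). */
struct sketch_output_queue {
    bool configured;                /* has any format been set yet? */
    uint32_t pixelformat;           /* last FOURCC set on the queue */
    unsigned int reconfigure_count; /* stands in for the ioctl sequence */
};

/*
 * Called at the start of each picture. Returns true when the queue had to be
 * reconfigured, false when the cached format already matched.
 */
bool sketch_ensure_output_format(struct sketch_output_queue *q, uint32_t wanted)
{
    assert(q != NULL);
    if (q->configured && q->pixelformat == wanted)
        return false; /* fast path: one comparison per picture */

    /* A real backend would issue REQBUFS(0), S_FMT and CREATE_BUFS here. */
    q->pixelformat = wanted;
    q->configured = true;
    q->reconfigure_count++;
    return true;
}
```

Under this shape the per-picture overhead is a single comparison; the heavyweight path runs only on the first picture or when the format actually changes.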

Bug 4 — H.264 inter frame race (OUT OF SCOPE for iter5b)

Empirical signature (Phase 3 + this loopback):

  • H.264 keyframe: real content (81 81 80 80 … chroma row in frame 1 first 16 bytes).
  • H.264 inter frame: all-zero (frame 2 + 3 byte-by-byte zero).
  • This pattern is consistent (not intermittent), which is not what the original "race" hypothesis would predict.
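The signature above can be checked mechanically on a raw YUV dump. The following is a sketch only: `sketch_first_all_zero_frame` is a hypothetical helper, and it assumes the caller already knows the per-frame byte size for the dump's pixel format and resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scan a dump of frame_count frames, each frame_size bytes long, and return
 * the zero-based index of the first frame whose bytes are all zero, or -1
 * when every frame contains at least one non-zero byte.
 */
long sketch_first_all_zero_frame(const uint8_t *yuv, size_t frame_size,
                                 size_t frame_count)
{
    assert(yuv != NULL && frame_size > 0);
    for (size_t f = 0; f < frame_count; f++) {
        const uint8_t *frame = yuv + f * frame_size;
        size_t i = 0;
        while (i < frame_size && frame[i] == 0)
            i++;
        if (i == frame_size)
            return (long)f; /* no non-zero byte found in this frame */
    }
    return -1;
}
```

On a three-frame H.264 dump showing the pattern listed above, this would return index 1, that is, the second frame.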

Possible causes:

  • DPB-related issue specifically affecting inter frames (kernel rejects inter decode because reference list is malformed).
  • Some other backend-side state that's wrong for inter (e.g., decode_params flags).
  • An actual race that happens to lose every time for inter (decode is slower for inter because of motion compensation, while keyframes can finish before the readback).

Filed as Bug 4, to be investigated in a future iteration (likely iter6). It is not in iter5b's scope: the Bug 2 OUTPUT format fix is enough for 3 codecs (HEVC, VP9, VP8) to gain direct-verification PASS status.

After iter5b: campaign scoreboard becomes "5/5 with 4 direct + 1 (H.264 inter) partial" — better than the current "5/5 with 1 direct (MPEG-2) + 4 transitive."

Substrate at loopback open

  • Kernel: linux-fresnel-fourier 7.0-1. Unchanged.
  • Fork tip: 692eaa0. Unchanged.
  • Backend installed: SHA256 6e90b7a9b2c33480…. Unchanged.
  • Test fixtures: unchanged.
  • Boltzmann: reachable as of Phase 4/Phase 5; not needed for iter5b (no kernel work).
  • vb2_dma_resv RFC v2 patches: still local at ~/src/linux-rfc/. Filed back to the kernel-agent backlog for a future campaign that targets the DMABUF-import consumer path (KWin/Mesa).

Memory rules touched

  • New: feedback_trace_fix_mechanism_to_consumer.md (pinned at start of this loopback per operator instruction).
  • feedback_unconditional_codec_state.md — applied: surface.c's pixelformat hardcode is the same class as the h264_start_code unconditional-set bug iter4 fixed. The lesson generalizes.
  • feedback_review_empirical_over_theoretical.md — Phase 5 review embodied Direction 2 by tracing producer→consumer empirically.

What iter5b looks like

  1. Phase 1 lock (this doc): bullets above. 4 criteria.
  2. Phase 2 source-read: pick implementation option α/β/γ. Cite line numbers.
  3. Phase 3 baseline: reuse the existing Phase 3 sweep (already captured at iter5_phase3_baseline.tgz). No new baseline needed.
  4. Phase 4 plan: patch shape, exact diff for surface.c, ancillary changes to track profile in driver_data.
  5. Phase 5 review: sonnet-architect, focused on the profile-threading mechanism.
  6. Phase 6 implementation: backend patch + rebuild + install. Estimated <100 LOC.
  7. Phase 7 verification: re-run Phase 3 sweep with the fixed backend; expect libva == kdirect for HEVC+VP9+VP8.
  8. Phase 8 close: campaign scoreboard updated.

Estimated cadence: half a session for Phases 2-7. The fix is small, the verification is fast, and no kernel build is needed.