Files
fresnel-fourier/phase0_findings_iter1.md
claude-noether f720c7784b iter1 Phase 0 + Phase 1 lock: MPEG-2 boolean correctness on hantro
Iteration 1 of the campaign 8(+1)-phase loop opens following the
campaign Phase 0 close (b74551b). Per the suggested order in
phase0_evidence/2026-05-07/cross_validator_traces.md, iter1 attacks
MPEG-2 — the cheapest fix in the codec status sweep (mpeg2.c is
already compiled, src/config.c has the case statements, but
vaCreateConfig returns 12 / UNSUPPORTED_PROFILE).

Locked research question:

  Make MPEG-2 the second codec to pass boolean-correctness on
  fresnel via the libva-v4l2-request-fourier path —
  mpv --hwdec=vaapi-copy bbb_720p10s_mpeg2.ts engages the backend
  cleanly and DMA-BUF GL import yields HW pixels byte-identical
  to a software-decoded reference for the same frames.

Phase 1 success criterion (5 boolean checks, all must be green):

  1. vainfo: VAProfileMPEG2Simple + VAProfileMPEG2Main still
     enumerated on the hantro env binding (regression check).
  2. vaCreateConfig(VAProfileMPEG2Main, VLD): VA_STATUS_SUCCESS.
  3. mpv --hwdec=vaapi-copy --frames=2 --vo=null on
     bbb_720p10s_mpeg2.ts: engages backend, exit 0, no
     "Failed to create decode configuration" lines.
  4. mpv --hwdec=vaapi --vo=image (DMA-BUF GL import) at +02s
     seek: 2 distinct frames hash-equal to SW reference frames
     and hash-differ from each other.
  5. T4 H.264 regression: re-run T4 incantation, hashes match
     f623d5f7... and 7d7bc6f2... reference values.

Mechanism the question targets:

The kernel + driver path is solid (cross_validator_traces.md —
ffmpeg-v4l2request decodes the same fixture exit 0). vaCreateConfig
rejection must be downstream of src/config.c:64-65's case match
(both VAProfileMPEG2Simple and VAProfileMPEG2Main are present in
the validation switch) but upstream of return-success. Plausible
suspects for Phase 2 source-read:

  - V4L2 capability probe (e.g., VIDIOC_TRY_FMT against MG2S) that
    fails because the libva backend was bound to /dev/video5 but
    is checking format against the wrong codec list.
  - Device-discovery routing reaches a default-reject because
    bound device didn't match an expected codec-to-device map.
  - media_request allocation step fails on /dev/media2 for some
    MPEG-2-specific reason.
  - iter6/iter7 regression in the dispatch-by-profile path that
    broke MPEG-2 silently because nobody on libva-multiplanar
    tested it (iter5 close: "MPEG-2 was iter1 backlog, dropped
    at iter6 close because A55 CPU handles it fine" — fresnel
    runs A53 so the disposition doesn't transfer).

Phase 4 plan must cite the contract before patching, per
feedback_dev_process.md Phase 6 contract-before-code: read kernel
drivers/media/platform/verisilicon/hantro_mpeg2.c, read FFmpeg
downstream libavcodec/v4l2_request_mpeg2.c, state the MPEG-2
control-submission contract explicitly before any code lands.

Predecessor carry-over (state vs data) explicit per
feedback_dev_process.md Phase 0 + feedback_replicate_baseline_first.md:

  Carries forward (re-verified in campaign Phase 0):
    - iter8 master fork (65969da) installed on fresnel,
      vainfo enumeration confirmed
    - bbb_720p10s_mpeg2.ts fixture (5.3 MB, MPEG-2 Main, 720p,
      10s, MPEG-TS) on fresnel ~/fourier-test/, provenance
      documented in test_fixtures.md
    - hantro-vpu-dec /dev/video5 + /dev/media2 binding
    - Cross-validator anchor: ffmpeg-v4l2request mpeg2 trace
      captured (5 S_EXT_CTRLS, 4 REQUEST_ALLOC, 4 DMA_BUF_SYNC)
    - T4 reference hashes for H.264 regression check

  Does NOT carry forward (re-acquire if needed):
    - ohm/RK3568 hantro MPEG-2 behaviour — different kernel
      driver variant inside drivers/media/platform/verisilicon/
    - Pre-iter6 libva-multiplanar MPEG-2 trace data (untested
      at iter6 close)

  Open questions inherited:
    - Cache-stale vaDeriveImage bug class on RK3399 (T4) —
      iter1 uses DMA-BUF GL import for verify; Phase 4 cross-
      cutting fix is not iter1-scoped
    - Architectural divergence ffmpeg vs our backend (EXPBUF +
      DMA_BUF_SYNC vs cap_pool + vaDeriveImage) — Phase 4
      design decision, not iter1-blocking

Out-of-scope (LOCKED): HEVC/VP9/VP8 (later iterations); vaDerive
Image cache-stale fix (Phase 4 cross-cutting); chromium-fourier
149 install (Phase 0 follow-up to iter1 substrate, non-gating);
performance metrics (Phase 1+ separate iteration); long-duration
stress (>10s); other MPEG-2 containers beyond MPEG-TS;
upstream engagement.

Iter1 Phase 2 source-read targets:
  - src/config.c::RequestCreateConfig (rejection site)
  - src/picture.c MPEG-2 dispatch path
  - src/mpeg2.c set-controls path
  - kernel drivers/media/platform/verisilicon/hantro_mpeg2.c
  - FFmpeg downstream libavcodec/v4l2_request_mpeg2.c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 22:04:52 +00:00

16 KiB
Raw Permalink Blame History

Iteration 1 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock

Opens 2026-05-07 evening immediately after campaign phase0_findings.md close (commit b74551b). This is the first per-iteration loop on fresnel-fourier; the campaign-level Phase 0 already locked scope (5 codecs, RK3399, boolean correctness per codec) and produced the empirical groundwork on which iter1 commits.

Locked research question (iteration 1)

"Make MPEG-2 the second codec to pass boolean-correctness on fresnel via the libva-v4l2-request-fourier path — mpv --hwdec=vaapi-copy bbb_720p10s_mpeg2.ts engages the backend cleanly and DMA-BUF GL import yields HW pixels byte-identical to a software-decoded reference for the same frames."

Pass/fail (boolean):

  1. Profile enumeration regression check. vainfo --display drm --device /dev/dri/renderD128 with the hantro-vpu-dec env binding (LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video5, LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media2) continues to list VAProfileMPEG2Simple and VAProfileMPEG2Main. (Already passes today; this exists to prove iter1 work didn't strip the enumeration.)
  2. Config creation succeeds. vaCreateConfig(VAProfileMPEG2Main, VAEntrypointVLD) returns VA_STATUS_SUCCESS. (Today returns 12 = VA_STATUS_ERROR_UNSUPPORTED_PROFILE.)
  3. End-to-end decode engages the backend. mpv --hwdec=vaapi-copy --frames=2 --vo=null --no-audio --no-input-default-bindings ~/fourier-test/bbb_720p10s_mpeg2.ts with the hantro env binding logs the [vaapi] libva: Trying to open /usr/lib/dri/v4l2_request_drv_video.so chain, the Using hardware decoding (vaapi-copy) confirmation, and exits 0 with no Failed to create decode configuration lines.
  4. Cache-safe pixel verification matches SW reference. mpv --hwdec=vaapi --vo=image --frames=2 --start=00:00:02 --vo-image-outdir=/tmp/iter1_mpeg2_hw and the equivalent --hwdec=no SW run produce JPEGs whose sha256sum outputs match for both frame 1 and frame 2. The seek to +02s (~48 frames into the 10s 720p MPEG-2 fixture) avoids an all-solid-color intro and exercises real bunny content. Frames 1 and 2 must hash-differ between each other (motion content) AND hash-equal across HW vs SW.
  5. Regression check on H.264. The T4 re-run incantation against bbb_1080p30_h264.mp4 continues to pass — H.264 hashes at +30s seek match the reference values from phase0_evidence/2026-05-07/h264_baseline_trace.md (f623d5f7… for frame 1, 7d7bc6f2… for frame 2). Iter1 must not break H.264.

A clean iter1 close has all five checks green. Anything less loops back to Phase 4 per feedback_dev_process.md Phase 7 → Phase 4 edge.

Mechanism the question targets

Phase 0 cross-validator sweep (phase0_evidence/2026-05-07/cross_validator_traces.md) established that the kernel + driver path works for all five locked codecs. ffmpeg -hwaccel v4l2request -i bbb_720p10s_mpeg2.ts decodes 2 frames to exit 0 — hantro-vpu-dec on /dev/video5 accepts the MG2S (V4L2_PIX_FMT_MPEG2_SLICE) request-API contract end-to-end.

Phase 0 codec-status sweep then established that our libva backend is the lone broken link. Reading src/config.c on the iter8 master tip (65969da):

  • src/config.c:38: #include <hevc-ctrls.h> — kernel UAPI HEVC headers loaded (used elsewhere for HEVC enumeration).
  • src/config.c:45: VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile, …) — entry point.
  • src/config.c:64: case VAProfileMPEG2Simple: and src/config.c:65: case VAProfileMPEG2Main: — the profiles ARE in the validation switch.
  • src/config.c:113: ...VAProfile *profiles, int *profiles_count)RequestQueryConfigProfiles, the enumerator.
  • src/config.c:126: profiles[index++] = VAProfileMPEG2Simple; and :127: profiles[index++] = VAProfileMPEG2Main; — both unconditionally enumerated.
  • src/config.c:164: case VAProfileMPEG2Simple: and :165: case VAProfileMPEG2Main: — present in another switch (likely RequestQueryConfigEntrypoints).

So all the case statements are in place. Yet vaCreateConfig rejects with 12 (UNSUPPORTED_PROFILE). The rejection must be downstream of the case match — somewhere in RequestCreateConfig between line 67 (after the cases) and the function's return. Plausible suspects:

  • A V4L2 capability probe (e.g., VIDIOC_TRY_FMT against MG2S on the bound device) that fails because the libva backend was bound to /dev/video5 but is checking the format against the wrong codec list.
  • A device-discovery routing decision that reaches a default-reject because the bound device didn't match an expected codec-to-device map.
  • A media_request allocation step that fails with EINVAL on /dev/media2 for some MPEG-2-specific reason.
  • An iter6 or iter7 regression in the dispatch-by-profile path that broke MPEG-2 silently because nobody on libva-multiplanar tested it (per phase0_findings.md carry-over: "MPEG-2 was iter1 backlog in libva-multiplanar, dropped at iter6 close because A55 CPU handles it fine").

Phase 2 source-read of RequestCreateConfig end-to-end + picture.c MPEG-2 dispatch + mpeg2.c set-controls path will identify the exact rejection site. Phase 4 plan must cite the contract before patching. Per feedback_dev_process.md Phase 6 contract-before-code: read kernel drivers/media/platform/verisilicon/hantro_mpeg2.c, read FFmpeg downstream libavcodec/v4l2_request_mpeg2.c, state the MPEG-2 control-submission contract explicitly in the Phase 4 plan or commit message before any code lands.

Predecessor carry-over (campaign Phase 0 → iter1)

State that carries forward (re-verified in campaign Phase 0)

  • Hardware: fresnel RK3399, kernel 6.19.9-99-eos-arm. Custom OC kernel with CONFIG_FTRACE=y, CONFIG_FUNCTION_TRACER=y, CONFIG_DYNAMIC_FTRACE=y, CONFIG_TRACING=y (phase0_findings.md line 29). No rebuild needed.
  • Hantro-vpu-dec node: /dev/video5 + /dev/media2 bind. DT compatible rockchip,rk3399-vpu. Same parent device as the JPEG encoder on /dev/video4. Card type: rockchip,rk3399-vpu-dec (per v4l2-ctl --info).
  • Decoder formats (from phase0_evidence/2026-05-07/v4l2_inventory.txt): OUTPUT_MPLANE = MG2S (MPEG-2 Parsed Slice Data, compressed) + VP8F. CAPTURE_MPLANE = NV12.
  • Stateless control payloads (kernel surface): mpeg_2_sequence_header (0x00a409dc), mpeg_2_picture_header (0x00a409dd), mpeg_2_quantisation_matrices (0x00a409de). All flagged unsupported payload type by v4l2-ctl --list-ctrls-menus (normal — v4l2-ctl can't serialize compound controls; the kernel ABI uses VIDIOC_S_EXT_CTRLS with V4L2_CTRL_WHICH_REQUEST_VAL).
  • Userspace: libva 1.23.0, libdrm 2.4.131, mpv 0.41.0 stock (replaced mpv-git which was libplacebo-broken; phase0_evidence/2026-05-07/h264_baseline_trace.md), ffmpeg n8.1-13-gb57fbbe50c (Kwiboo v4l2-request-n8.1 branch).
  • Backend build state: libva-v4l2-request-fourier master tip 65969da (iter8 Phase 4) built directly on fresnel via meson setup --prefix=/usr build && ninja -C build && sudo ninja -C build install. Installed at /usr/lib/dri/v4l2_request_drv_video.so, mode 0755 root:root, BuildID 89addcc37a8e6ed2240b0e7ef78789a2e09a2245, single export __vaDriverInit_1_23. Compiled-in codecs: mpeg2.c, h264.c, h264_slice_header.c. Excluded: h265.c (commented out in src/meson.build). Absent: VP8, VP9 source files.
  • Test fixture: ~/fourier-test/bbb_720p10s_mpeg2.ts on fresnel (5.3 MB, MPEG-2 Main, 1280×720@24fps yuv420p, 10s, MPEG-TS container, generated 2026-05-07 23:35 from H.264 master via ffmpeg -ss 30 -t 10 -vf scale=1280:720 -c:v mpeg2video -profile:v 4 -level:v 8 -b:v 4M -pix_fmt yuv420p). Provenance + reproducibility: phase0_evidence/2026-05-07/test_fixtures.md.
  • H.264 reference for regression: ~/fourier-test/bbb_1080p30_h264.mp4 (725 MB, H.264 High@4.0, 1920×1080@24fps). Reference hashes from T4: HW frame 1 (+30s) sha256 f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9, frame 2 sha256 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8.
  • Cross-validator anchor: ffmpeg-v4l2request MPEG-2 contract from phase0_evidence/2026-05-07/cross_validator/mpeg2/. 5 S_EXT_CTRLS for 2 frames (= 2.5/frame: sequence_header + picture_header + quantisation_matrices, partially batched), 4 MEDIA_IOC_REQUEST_ALLOC, 4 DMA_BUF_IOCTL_SYNC. Single-threaded dec0:0:mpeg2vid worker (no frame-threading for MPEG-2 in ffmpeg). 32 ftrace lines for 2 frames.
  • Cache-safe verify path: mpv --hwdec=vaapi --vo=image (DMA-BUF + EGL_EXT_image_dma_buf_import + glReadPixels + JPEG encode). Proven equivalent to SW reference on H.264 (T4 — byte-identical at +30s mid-content). Same path applies for MPEG-2 verification.

Data that does NOT carry forward (re-acquire if needed)

  • ohm/RK3568 hantro MPEG-2 behaviour. ohm uses RK3568 hantro (rockchip,rk3568-vpu kernel variant); fresnel uses RK3399 hantro (rockchip,rk3399-vpu variant). Different driver path inside drivers/media/platform/verisilicon/. Reference history only — re-verify any contract claim against fresnel.
  • The "MPEG-2 was dropped at iter6 close because A55 CPU handles it fine" disposition. fresnel runs A53 (weaker than ohm's A55), so the disposition for ohm doesn't transfer; HW MPEG-2 decode is potentially valuable on RK3399.
  • Pre-iter6 libva-multiplanar MPEG-2 trace data, if any. Don't have a path to it; if Phase 2 source-read shows the MPEG-2 codepath has been quiet (untested) since iter1, treat MPEG-2 in this fork as a bring-up-from-scratch.

Open questions inherited from campaign Phase 0

  • Cache-stale vaDeriveImage bug class on RK3399 (T4 finding). Iter1 must use the DMA-BUF GL import path for pixel verification (per Pass/fail #4 above), not vaDeriveImage. The image-export bug fix is Phase 4 cross-cutting work, not iter1-scoped.
  • ffmpeg -hwaccel v4l2request MPEG-2 architectural divergence: 4 MEDIA_IOC_REQUEST_ALLOC (vs our backend's 16 in iter6 binding), VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC for cache-safe readback. Whether to mirror the EXPBUF + SYNC pattern in our backend or stay with the iter6 cap_pool model is a Phase 4 design decision; iter1 doesn't have to converge on ffmpeg's pattern as long as the boolean criteria pass.
  • HEVC profile enumerated despite h265.c not compiled (T3 finding). Orthogonal to iter1; cleaning up the false-advertising is Phase 4 cross-cutting.

Tooling and measurement-instrument inventory (live verification)

Re-verified on fresnel at iter1 open:

  • strace -ff -tt -y -e trace=ioctl,openat,close for libva-side V4L2 ioctl tracing — proven working in T4.
  • sudo sh -c "echo 1 > /sys/kernel/tracing/events/v4l2/enable" for kernel v4l2 tracepoints — proven working in T4 + cross-validator sweep.
  • mpv --hwdec=vaapi --vo=image (cache-safe pixel verify) — proven on T4, replicates for MPEG-2 in iter1 binding cells.
  • ffmpeg -hwaccel v4l2request (independent V4L2 client cross-validator) — proven on all 5 codecs in T6.
  • Backend build harness on fresnel: ninja -C ~/src/libva-v4l2-request-fourier/build && sudo ninja -C ~/src/libva-v4l2-request-fourier/build install.

Iter1 will likely add per-source debug printf/fprintf(stderr, ...) instrumentation in src/config.c's RequestCreateConfig (and possibly picture.c MPEG-2 dispatch) to pin the rejection site. That instrumentation is iter1-internal scratch — clean sweep at iter1 close per Phase 5 review precedent (libva-multiplanar iter5 sweep removed ~339 lines of debug instrumentation at close).

In-scope (LOCKED 2026-05-07 for iteration 1)

  • libva-v4l2-request-fourier backend MPEG-2 path on hantro-vpu-dec.
  • src/config.c::RequestCreateConfig MPEG-2 rejection-site investigation + fix.
  • src/picture.c MPEG-2 dispatch path (if Phase 2 source-read finds it implicated).
  • src/mpeg2.c set-controls path verification against kernel hantro_mpeg2.c and FFmpeg v4l2_request_mpeg2.c.
  • iter1 binding-cell test harness: a script that runs the five Pass/fail checks above, captures evidence to phase0_evidence/<date>/iter1_mpeg2/, and emits a markdown verdict.
  • Cache-safe pixel verify must use DMA-BUF GL import (not vaDeriveImage).
  • Regression check on H.264 (re-run T4 incantation, compare hashes against reference).

Out-of-scope (LOCKED 2026-05-07 for iteration 1)

  • HEVC, VP9, VP8 work — separate iterations per the suggested order in cross_validator_traces.md.
  • The vaDeriveImage cache-stale bug class fix — Phase 4 cross-cutting work (potentially under a separate iteration).
  • chromium-fourier 149 install on fresnel — not gating; can land as a Phase 0 follow-up to iter1's substrate when convenient.
  • MPEG-2 performance metrics (FPS, CPU%, drops) — Phase 1+ separate iteration. iter1 is boolean correctness only, per the campaign-locked criterion.
  • Long-duration MPEG-2 stress (>10s) — boolean correctness on 2 frames is enough; longer-run regressions surface as a separate iteration if iter1 exposes any.
  • MPEG-2 Simple-only fixtures — the campaign locked fixture is Main profile; Simple is a strict subset and likely passes once Main does.
  • AVI, MPG (program-stream), or other MPEG-2 containers beyond the iter1 fixture's MPEG-TS shape. iter1 fixture is bbb_720p10s_mpeg2.ts; container-shape coverage is a Phase 1+ matter.
  • Upstream Linux engagement (per feedback_no_upstream.md). Kernel side works; nothing to file.

Phase 1 success criterion (LOCKED 2026-05-07)

Per feedback_dev_process.md Phase 1 — define the objective in measurable terms before touching anything. The five Pass/fail bullets at the top of this document are the iter1 success criterion, locked. Phase 3 baseline measurement (the strace + ftrace contract trace of current MPEG-2 failure on our backend, plus the reference ffmpeg-v4l2request trace already in phase0_evidence/2026-05-07/cross_validator/mpeg2/) feeds Phase 4 plan; Phase 7 verification re-runs all five checks against the patched backend.

If Phase 3 baseline reveals the chosen criterion is the wrong target (per feedback_dev_process.md's Phase 3 → Phase 1 loopback), the criterion will be rewritten and re-locked. Plausible reasons that would trigger the loopback:

  • The MPEG-2 fixture is malformed in a way that exposes a fixture-side bug rather than a backend-side bug. (Mitigation: ffmpeg-v4l2request decodes the same fixture cleanly per cross_validator data, so this is unlikely.)
  • Pixel verification via DMA-BUF GL import for MPEG-2 produces non-matching hashes for reasons unrelated to the backend (e.g., GL color-space conversion divergence, panfrost MPEG-2-specific quirk). In that case the criterion gets a different verifier — direct ffmpeg hwdownload,format=nv12 from our libva path, or a custom C reproducer with msync(MS_SYNC|MS_INVALIDATE).
  • The vaCreateConfig rejection site turns out to be in libva itself (not our backend), making "fix RequestCreateConfig" the wrong scope. (Unlikely — the error message threads through our backend's return path.)

What "iteration 1 close" looks like

A clean iter1 close per feedback_dev_process.md Phase 8 yields:

  • All five Pass/fail criteria green.
  • A phase8_iteration1_close.md document in this campaign repo summarizing the bug, the contract, the fix, and the binding-cell numbers.
  • A second-codec passing entry in the campaign-level scoreboard (currently 1/5, target 2/5 after iter1).
  • Memory entry distilling the lesson — per feedback_dev_process.md Phase 8 "do not let the lesson rot in chat history."
  • A debug-instrumentation sweep — any printf/fprintf added during Phase 6 must be removed before close.
  • The Phase 5 sonnet-architect review pass (per feedback_dev_process.md Phase 5) signed off.
  • Commit history under git.reauktion.de/marfrit/fresnel-fourier reflecting the iter1 phases, all authored as claude-noether per memory/feedback_gitea_as_claude_noether.md.