fresnel-fourier/phase0_findings_iter1.md
claude-noether f720c7784b iter1 Phase 0 + Phase 1 lock: MPEG-2 boolean correctness on hantro
Iteration 1 of the campaign 8(+1)-phase loop opens following the
campaign Phase 0 close (b74551b). Per the suggested order in
phase0_evidence/2026-05-07/cross_validator_traces.md, iter1 attacks
MPEG-2 — the cheapest fix in the codec status sweep (mpeg2.c is
already compiled, src/config.c has the case statements, but
vaCreateConfig returns 12 / UNSUPPORTED_PROFILE).

Locked research question:

  Make MPEG-2 the second codec to pass boolean-correctness on
  fresnel via the libva-v4l2-request-fourier path —
  mpv --hwdec=vaapi-copy bbb_720p10s_mpeg2.ts engages the backend
  cleanly and DMA-BUF GL import yields HW pixels byte-identical
  to a software-decoded reference for the same frames.

Phase 1 success criterion (5 boolean checks, all must be green):

  1. vainfo: VAProfileMPEG2Simple + VAProfileMPEG2Main still
     enumerated on the hantro env binding (regression check).
  2. vaCreateConfig(VAProfileMPEG2Main, VLD): VA_STATUS_SUCCESS.
  3. mpv --hwdec=vaapi-copy --frames=2 --vo=null on
     bbb_720p10s_mpeg2.ts: engages backend, exit 0, no
     "Failed to create decode configuration" lines.
  4. mpv --hwdec=vaapi --vo=image (DMA-BUF GL import) at +02s
     seek: 2 distinct frames hash-equal to SW reference frames
     and hash-differ from each other.
  5. T4 H.264 regression: re-run T4 incantation, hashes match
     f623d5f7... and 7d7bc6f2... reference values.

Mechanism the question targets:

The kernel + driver path is solid (cross_validator_traces.md —
ffmpeg-v4l2request decodes the same fixture exit 0). vaCreateConfig
rejection must be downstream of src/config.c:64-65's case match
(both VAProfileMPEG2Simple and VAProfileMPEG2Main are present in
the validation switch) but upstream of return-success. Plausible
suspects for Phase 2 source-read:

  - V4L2 capability probe (e.g., VIDIOC_TRY_FMT against MG2S) that
    fails because the libva backend was bound to /dev/video5 but
    is checking format against the wrong codec list.
  - Device-discovery routing reaches a default-reject because
    bound device didn't match an expected codec-to-device map.
  - media_request allocation step fails on /dev/media2 for some
    MPEG-2-specific reason.
  - iter6/iter7 regression in the dispatch-by-profile path that
    broke MPEG-2 silently because nobody on libva-multiplanar
    tested it (iter5 close: "MPEG-2 was iter1 backlog, dropped
    at iter6 close because A55 CPU handles it fine" — fresnel
    runs A53 so the disposition doesn't transfer).

Phase 4 plan must cite the contract before patching, per
feedback_dev_process.md Phase 6 contract-before-code: read kernel
drivers/media/platform/verisilicon/hantro_mpeg2.c, read FFmpeg
downstream libavcodec/v4l2_request_mpeg2.c, state the MPEG-2
control-submission contract explicitly before any code lands.

Predecessor carry-over (state vs data) explicit per
feedback_dev_process.md Phase 0 + feedback_replicate_baseline_first.md:

  Carries forward (re-verified in campaign Phase 0):
    - iter8 master fork (65969da) installed on fresnel,
      vainfo enumeration confirmed
    - bbb_720p10s_mpeg2.ts fixture (5.3 MB, MPEG-2 Main, 720p,
      10s, MPEG-TS) on fresnel ~/fourier-test/, provenance
      documented in test_fixtures.md
    - hantro-vpu-dec /dev/video5 + /dev/media2 binding
    - Cross-validator anchor: ffmpeg-v4l2request mpeg2 trace
      captured (5 S_EXT_CTRLS, 4 REQUEST_ALLOC, 4 DMA_BUF_SYNC)
    - T4 reference hashes for H.264 regression check

  Does NOT carry forward (re-acquire if needed):
    - ohm/RK3568 hantro MPEG-2 behaviour — different kernel
      driver variant inside drivers/media/platform/verisilicon/
    - Pre-iter6 libva-multiplanar MPEG-2 trace data (untested
      at iter6 close)

  Open questions inherited:
    - Cache-stale vaDeriveImage bug class on RK3399 (T4) —
      iter1 uses DMA-BUF GL import for verify; Phase 4 cross-
      cutting fix is not iter1-scoped
    - Architectural divergence ffmpeg vs our backend (EXPBUF +
      DMA_BUF_SYNC vs cap_pool + vaDeriveImage) — Phase 4
      design decision, not iter1-blocking

Out-of-scope (LOCKED): HEVC/VP9/VP8 (later iterations);
vaDeriveImage cache-stale fix (Phase 4 cross-cutting); chromium-fourier
149 install (Phase 0 follow-up to iter1 substrate, non-gating);
performance metrics (Phase 1+ separate iteration); long-duration
stress (>10s); other MPEG-2 containers beyond MPEG-TS;
upstream engagement.

Iter1 Phase 2 source-read targets:
  - src/config.c::RequestCreateConfig (rejection site)
  - src/picture.c MPEG-2 dispatch path
  - src/mpeg2.c set-controls path
  - kernel drivers/media/platform/verisilicon/hantro_mpeg2.c
  - FFmpeg downstream libavcodec/v4l2_request_mpeg2.c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 22:04:52 +00:00

# Iteration 1 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
Opens 2026-05-07 evening immediately after campaign [`phase0_findings.md`](phase0_findings.md) close (commit `b74551b`). This is the first per-iteration loop on fresnel-fourier; the campaign-level Phase 0 already locked scope (5 codecs, RK3399, `boolean correctness` per codec) and produced the empirical groundwork on which iter1 commits.
## Locked research question (iteration 1)
> **"Make MPEG-2 the second codec to pass boolean-correctness on fresnel via the libva-v4l2-request-fourier path — `mpv --hwdec=vaapi-copy bbb_720p10s_mpeg2.ts` engages the backend cleanly and DMA-BUF GL import yields HW pixels byte-identical to a software-decoded reference for the same frames."**
Pass/fail (boolean):
1. **Profile enumeration regression check.** `vainfo --display drm --device /dev/dri/renderD128` with the hantro-vpu-dec env binding (`LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video5`, `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media2`) continues to list `VAProfileMPEG2Simple` and `VAProfileMPEG2Main`. (Already passes today; this exists to prove iter1 work didn't strip the enumeration.)
2. **Config creation succeeds.** `vaCreateConfig(VAProfileMPEG2Main, VAEntrypointVLD)` returns `VA_STATUS_SUCCESS`. (Today returns `12 = VA_STATUS_ERROR_UNSUPPORTED_PROFILE`.)
3. **End-to-end decode engages the backend.** `mpv --hwdec=vaapi-copy --frames=2 --vo=null --no-audio --no-input-default-bindings ~/fourier-test/bbb_720p10s_mpeg2.ts` with the hantro env binding logs the `[vaapi] libva: Trying to open /usr/lib/dri/v4l2_request_drv_video.so` chain, the `Using hardware decoding (vaapi-copy)` confirmation, and exits 0 with no `Failed to create decode configuration` lines.
4. **Cache-safe pixel verification matches SW reference.** `mpv --hwdec=vaapi --vo=image --frames=2 --start=00:00:02 --vo-image-outdir=/tmp/iter1_mpeg2_hw` and the equivalent `--hwdec=no` SW run produce JPEGs whose `sha256sum` outputs match for both frame 1 and frame 2. The seek to `+02s` (~48 frames into the 10s 720p MPEG-2 fixture) avoids an all-solid-color intro and exercises real bunny content. Frames 1 and 2 must hash-differ between each other (motion content) AND hash-equal across HW vs SW.
5. **Regression check on H.264.** The T4 re-run incantation against `bbb_1080p30_h264.mp4` continues to pass — H.264 hashes at +30s seek match the reference values from `phase0_evidence/2026-05-07/h264_baseline_trace.md` (`f623d5f7…` for frame 1, `7d7bc6f2…` for frame 2). Iter1 must not break H.264.
A clean iter1 close has all five checks green. Anything less loops back to Phase 4 per `feedback_dev_process.md` Phase 7 → Phase 4 edge.
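As a minimal sketch of the comparison in check 4, assuming the two mpv runs have already written their JPEGs: the helper names (`sha256_file`, `frames_pass`) are hypothetical and not part of any existing harness.

```python
import hashlib
from pathlib import Path


def sha256_file(path):
    """Return the hex SHA-256 of a file, equivalent to `sha256sum <path>`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def frames_pass(hw_hashes, sw_hashes):
    """Check 4: HW equals SW per frame, and the frames differ from each other.

    Both arguments are lists of hex digests in frame order.
    """
    if len(hw_hashes) != len(sw_hashes) or len(hw_hashes) < 2:
        return False
    hw_matches_sw = all(h == s for h, s in zip(hw_hashes, sw_hashes))
    frames_distinct = len(set(hw_hashes)) == len(hw_hashes)
    return hw_matches_sw and frames_distinct
```

The distinctness guard matters because a decoder that emits the same picture twice could otherwise match a reference that is also stuck on one frame; requiring different hashes for frames 1 and 2 rules that case out.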
## Mechanism the question targets
Phase 0 cross-validator sweep ([`phase0_evidence/2026-05-07/cross_validator_traces.md`](phase0_evidence/2026-05-07/cross_validator_traces.md)) established that the kernel + driver path works for all five locked codecs. `ffmpeg -hwaccel v4l2request -i bbb_720p10s_mpeg2.ts` decodes 2 frames to exit 0 — hantro-vpu-dec on `/dev/video5` accepts the `MG2S` (`V4L2_PIX_FMT_MPEG2_SLICE`) request-API contract end-to-end.
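For reference, the four-character code `MG2S` packs into the 32-bit pixel-format value that V4L2 passes through its format ioctls. The sketch below reproduces the standard little-endian `v4l2_fourcc` packing; the Python helper names are illustrative only.

```python
def v4l2_fourcc(code):
    """Pack a 4-character code the way the kernel's v4l2_fourcc() macro does."""
    if len(code) != 4:
        raise ValueError("fourcc must be exactly 4 characters")
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


def fourcc_to_str(value):
    """Unpack a 32-bit pixel-format value back into its 4-character code."""
    return "".join(chr((value >> shift) & 0xFF) for shift in (0, 8, 16, 24))


MG2S = v4l2_fourcc("MG2S")
print(hex(MG2S))            # 0x5332474d
print(fourcc_to_str(MG2S))  # MG2S
```

This is handy when a raw pixel-format integer shows up in a debug print or a trace and needs to be matched back to `MG2S`, `VP8F`, or `NV12`.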
Phase 0 codec-status sweep then established that **our libva backend is the lone broken link**. Reading [`src/config.c`](../libva-multiplanar/libva-v4l2-request-fourier/src/config.c) on the iter8 master tip (`65969da`):
- `src/config.c:38: #include <hevc-ctrls.h>` — kernel UAPI HEVC headers loaded (used elsewhere for HEVC enumeration).
- `src/config.c:45: VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile, …)` — entry point.
- `src/config.c:64: case VAProfileMPEG2Simple:` and `src/config.c:65: case VAProfileMPEG2Main:` — the profiles ARE in the validation switch.
- `src/config.c:113: ...VAProfile *profiles, int *profiles_count)`: `RequestQueryConfigProfiles`, the enumerator.
- `src/config.c:126: profiles[index++] = VAProfileMPEG2Simple;` and `:127: profiles[index++] = VAProfileMPEG2Main;` — both unconditionally enumerated.
- `src/config.c:164: case VAProfileMPEG2Simple:` and `:165: case VAProfileMPEG2Main:` — present in another switch (likely `RequestQueryConfigEntrypoints`).
So all the case statements are in place. Yet `vaCreateConfig` rejects with `12 (UNSUPPORTED_PROFILE)`. The rejection must be downstream of the case match — somewhere in `RequestCreateConfig` between line 67 (after the cases) and the function's return. Plausible suspects:
- A V4L2 capability probe (e.g., `VIDIOC_TRY_FMT` against `MG2S` on the bound device) that fails because the libva backend was bound to `/dev/video5` but is checking the format against the wrong codec list.
- A device-discovery routing decision that reaches a default-reject because the bound device didn't match an expected codec-to-device map.
- A `media_request` allocation step that fails with EINVAL on `/dev/media2` for some MPEG-2-specific reason.
- An iter6 or iter7 regression in the dispatch-by-profile path that broke MPEG-2 silently because nobody on libva-multiplanar tested it (per `phase0_findings.md` carry-over: "MPEG-2 was iter1 backlog in libva-multiplanar, dropped at iter6 close because A55 CPU handles it fine").
Phase 2 source-read of `RequestCreateConfig` end-to-end + `picture.c` MPEG-2 dispatch + `mpeg2.c` set-controls path will identify the exact rejection site. **Phase 4 plan must cite the contract before patching.** Per `feedback_dev_process.md` Phase 6 contract-before-code: read kernel `drivers/media/platform/verisilicon/hantro_mpeg2.c`, read FFmpeg downstream `libavcodec/v4l2_request_mpeg2.c`, state the MPEG-2 control-submission contract explicitly in the Phase 4 plan or commit message before any code lands.
## Predecessor carry-over (campaign Phase 0 → iter1)
### State that carries forward (re-verified in campaign Phase 0)
- **Hardware**: fresnel RK3399, kernel `6.19.9-99-eos-arm`. Custom OC kernel with `CONFIG_FTRACE=y, CONFIG_FUNCTION_TRACER=y, CONFIG_DYNAMIC_FTRACE=y, CONFIG_TRACING=y` (`phase0_findings.md` line 29). No rebuild needed.
- **Hantro-vpu-dec node**: `/dev/video5` + `/dev/media2` bind. DT compatible `rockchip,rk3399-vpu`. Same parent device as the JPEG encoder on `/dev/video4`. Card type: `rockchip,rk3399-vpu-dec` (per `v4l2-ctl --info`).
- **Decoder formats** (from `phase0_evidence/2026-05-07/v4l2_inventory.txt`): OUTPUT_MPLANE = `MG2S` (MPEG-2 Parsed Slice Data, compressed) + `VP8F`. CAPTURE_MPLANE = `NV12`.
- **Stateless control payloads** (kernel surface): `mpeg_2_sequence_header` (`0x00a409dc`), `mpeg_2_picture_header` (`0x00a409dd`), `mpeg_2_quantisation_matrices` (`0x00a409de`). All flagged `unsupported payload type` by `v4l2-ctl --list-ctrls-menus` (normal — v4l2-ctl can't serialize compound controls; the kernel ABI uses `VIDIOC_S_EXT_CTRLS` with `V4L2_CTRL_WHICH_REQUEST_VAL`).
- **Userspace**: libva 1.23.0, libdrm 2.4.131, mpv 0.41.0 stock (replaced mpv-git which was libplacebo-broken; `phase0_evidence/2026-05-07/h264_baseline_trace.md`), ffmpeg `n8.1-13-gb57fbbe50c` (Kwiboo `v4l2-request-n8.1` branch).
- **Backend build state**: libva-v4l2-request-fourier master tip `65969da` (iter8 Phase 4) built directly on fresnel via `meson setup --prefix=/usr build && ninja -C build && sudo ninja -C build install`. Installed at `/usr/lib/dri/v4l2_request_drv_video.so`, mode 0755 root:root, BuildID `89addcc37a8e6ed2240b0e7ef78789a2e09a2245`, single export `__vaDriverInit_1_23`. Compiled-in codecs: `mpeg2.c`, `h264.c`, `h264_slice_header.c`. Excluded: `h265.c` (commented out in `src/meson.build`). Absent: VP8, VP9 source files.
- **Test fixture**: `~/fourier-test/bbb_720p10s_mpeg2.ts` on fresnel (5.3 MB, MPEG-2 Main, 1280×720@24fps yuv420p, 10s, MPEG-TS container, generated 2026-05-07 23:35 from H.264 master via `ffmpeg -ss 30 -t 10 -vf scale=1280:720 -c:v mpeg2video -profile:v 4 -level:v 8 -b:v 4M -pix_fmt yuv420p`). Provenance + reproducibility: [`phase0_evidence/2026-05-07/test_fixtures.md`](phase0_evidence/2026-05-07/test_fixtures.md).
- **H.264 reference for regression**: `~/fourier-test/bbb_1080p30_h264.mp4` (725 MB, H.264 High@4.0, 1920×1080@24fps). Reference hashes from T4: HW frame 1 (`+30s`) sha256 `f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9`, frame 2 sha256 `7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8`.
- **Cross-validator anchor**: ffmpeg-v4l2request MPEG-2 contract from [`phase0_evidence/2026-05-07/cross_validator/mpeg2/`](phase0_evidence/2026-05-07/cross_validator/mpeg2/). 5 `S_EXT_CTRLS` for 2 frames (= 2.5/frame: sequence_header + picture_header + quantisation_matrices, partially batched), 4 `MEDIA_IOC_REQUEST_ALLOC`, 4 `DMA_BUF_IOCTL_SYNC`. Single-threaded `dec0:0:mpeg2vid` worker (no frame-threading for MPEG-2 in ffmpeg). 32 ftrace lines for 2 frames.
- **Cache-safe verify path**: `mpv --hwdec=vaapi --vo=image` (DMA-BUF + EGL_EXT_image_dma_buf_import + glReadPixels + JPEG encode). Proven equivalent to SW reference on H.264 (T4 — byte-identical at +30s mid-content). Same path applies for MPEG-2 verification.
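The three control IDs in the stateless-control-payloads bullet above decompose into a control class plus an offset. The sketch below assumes the layout in the kernel UAPI header `linux/v4l2-controls.h` (class `0x00a40000`, stateless base offset `0x900`); those constants are recalled from the header rather than taken from this document, so check them against the headers installed on fresnel.

```python
# Assumed from linux/v4l2-controls.h; verify against the installed header.
V4L2_CTRL_CLASS_CODEC_STATELESS = 0x00A40000
V4L2_CID_CODEC_STATELESS_BASE = V4L2_CTRL_CLASS_CODEC_STATELESS | 0x900

# Control IDs as reported in the inventory above.
OBSERVED = {
    "mpeg_2_sequence_header": 0x00A409DC,
    "mpeg_2_picture_header": 0x00A409DD,
    "mpeg_2_quantisation_matrices": 0x00A409DE,
}


def split_control_id(cid):
    """Return (control class, offset from the stateless codec base)."""
    ctrl_class = cid & 0x0FFF0000  # same mask as V4L2_CTRL_ID2CLASS
    offset = cid - V4L2_CID_CODEC_STATELESS_BASE
    return ctrl_class, offset


for name, cid in OBSERVED.items():
    ctrl_class, offset = split_control_id(cid)
    print(f"{name}: class={ctrl_class:#010x} base+{offset}")
```

Under these assumptions the three IDs land at consecutive offsets 220, 221, and 222 from the stateless base, all inside the stateless codec class.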
### Data that does NOT carry forward (re-acquire if needed)
- ohm/RK3568 hantro MPEG-2 behaviour. ohm uses RK3568 hantro (`rockchip,rk3568-vpu` kernel variant); fresnel uses RK3399 hantro (`rockchip,rk3399-vpu` variant). Different driver path inside `drivers/media/platform/verisilicon/`. Reference history only — re-verify any contract claim against fresnel.
- The "MPEG-2 was dropped at iter6 close because A55 CPU handles it fine" disposition. fresnel runs A53 (weaker than ohm's A55), so the disposition for ohm doesn't transfer; HW MPEG-2 decode is potentially valuable on RK3399.
- Pre-iter6 libva-multiplanar MPEG-2 trace data, if any. We don't have a path to it; if Phase 2 source-read shows the MPEG-2 codepath has been quiet (untested) since iter1, treat MPEG-2 in this fork as a bring-up-from-scratch.
### Open questions inherited from campaign Phase 0
- **Cache-stale `vaDeriveImage` bug class on RK3399** (T4 finding). Iter1 must use the DMA-BUF GL import path for pixel verification (per Pass/fail #4 above), not `vaDeriveImage`. The image-export bug fix is Phase 4 cross-cutting work, not iter1-scoped.
- **`ffmpeg -hwaccel v4l2request` MPEG-2 architectural divergence**: 4 `MEDIA_IOC_REQUEST_ALLOC` (vs our backend's 16 in iter6 binding), `VIDIOC_EXPBUF` + `DMA_BUF_IOCTL_SYNC` for cache-safe readback. Whether to mirror the EXPBUF + SYNC pattern in our backend or stay with the iter6 cap_pool model is a Phase 4 design decision; iter1 doesn't have to converge on ffmpeg's pattern as long as the boolean criteria pass.
- **HEVC profile enumerated despite `h265.c` not compiled** (T3 finding). Orthogonal to iter1; cleaning up the false-advertising is Phase 4 cross-cutting.
## Tooling and measurement-instrument inventory (live verification)
Re-verified on fresnel at iter1 open:
- `strace -ff -tt -y -e trace=ioctl,openat,close` for libva-side V4L2 ioctl tracing — proven working in T4.
- `sudo sh -c "echo 1 > /sys/kernel/tracing/events/v4l2/enable"` for kernel v4l2 tracepoints — proven working in T4 + cross-validator sweep.
- `mpv --hwdec=vaapi --vo=image` (cache-safe pixel verify) — proven on T4, replicates for MPEG-2 in iter1 binding cells.
- `ffmpeg -hwaccel v4l2request` (independent V4L2 client cross-validator) — proven on all 5 codecs in T6.
- Backend build harness on fresnel: `ninja -C ~/src/libva-v4l2-request-fourier/build && sudo ninja -C ~/src/libva-v4l2-request-fourier/build install`.
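The per-ioctl counts quoted for the cross-validator anchor (5 `S_EXT_CTRLS`, 4 `MEDIA_IOC_REQUEST_ALLOC`, 4 `DMA_BUF_IOCTL_SYNC`) can be recomputed from `strace` output with a small tally. This is a sketch, assuming the `-y` output shape `ioctl(FD<path>, REQUEST, ...)`; the sample lines are invented for illustration and are not taken from the evidence directory.

```python
import re
from collections import Counter

# Matches e.g. "ioctl(5</dev/video5>, VIDIOC_S_EXT_CTRLS, ...". The
# timestamp that -tt prepends is skipped because search() is used.
IOCTL_RE = re.compile(r"ioctl\(\d+<(?P<path>[^>]*)>, (?P<req>[A-Z0-9_]+)")


def count_ioctls(lines):
    """Tally ioctl request names per device path from strace -y output."""
    counts = Counter()
    for line in lines:
        m = IOCTL_RE.search(line)
        if m:
            counts[(m.group("path"), m.group("req"))] += 1
    return counts


# Illustrative lines only (hypothetical, not captured evidence).
sample = [
    '22:04:01.000001 openat(AT_FDCWD, "/dev/video5", O_RDWR) = 5</dev/video5>',
    "22:04:01.000002 ioctl(6</dev/media2>, MEDIA_IOC_REQUEST_ALLOC, 0x7fd0) = 0",
    "22:04:01.000003 ioctl(5</dev/video5>, VIDIOC_S_EXT_CTRLS, {...}) = 0",
    "22:04:01.000004 ioctl(5</dev/video5>, VIDIOC_S_EXT_CTRLS, {...}) = 0",
]
for (path, req), n in sorted(count_ioctls(sample).items()):
    print(path, req, n)
```

Running the same tally over the backend's trace and the ffmpeg trace gives a like-for-like comparison of the two clients' ioctl mix on `/dev/video5` and `/dev/media2`.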
Iter1 will likely add per-source debug `printf`/`fprintf(stderr, ...)` instrumentation in `src/config.c`'s `RequestCreateConfig` (and possibly `picture.c` MPEG-2 dispatch) to pin the rejection site. That instrumentation is iter1-internal scratch — clean sweep at iter1 close per Phase 5 review precedent (libva-multiplanar iter5 sweep removed ~339 lines of debug instrumentation at close).
## In-scope (LOCKED 2026-05-07 for iteration 1)
- libva-v4l2-request-fourier backend MPEG-2 path on hantro-vpu-dec.
- `src/config.c::RequestCreateConfig` MPEG-2 rejection-site investigation + fix.
- `src/picture.c` MPEG-2 dispatch path (if Phase 2 source-read finds it implicated).
- `src/mpeg2.c` set-controls path verification against kernel `hantro_mpeg2.c` and FFmpeg `v4l2_request_mpeg2.c`.
- iter1 binding-cell test harness: a script that runs the five Pass/fail checks above, captures evidence to `phase0_evidence/<date>/iter1_mpeg2/`, and emits a markdown verdict.
- Cache-safe pixel verify must use DMA-BUF GL import (not `vaDeriveImage`).
- Regression check on H.264 (re-run T4 incantation, compare hashes against reference).
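One possible shape for the binding-cell harness's log check and verdict step, sketched under assumptions: the function names, the result dictionary, and the table layout are hypothetical; only the three log strings come from Pass/fail check 3.

```python
DRIVER_OPEN = "Trying to open /usr/lib/dri/v4l2_request_drv_video.so"
HW_CONFIRM = "Using hardware decoding (vaapi-copy)"
CONFIG_FAIL = "Failed to create decode configuration"


def check_mpv_log(log_text, exit_code):
    """Check 3: backend engaged, clean exit, no config-creation failure."""
    return (
        exit_code == 0
        and DRIVER_OPEN in log_text
        and HW_CONFIRM in log_text
        and CONFIG_FAIL not in log_text
    )


def render_verdict(results):
    """Render {check name: bool} as a markdown table plus an overall line."""
    lines = ["| Check | Result |", "| --- | --- |"]
    for name, ok in results.items():
        lines.append(f"| {name} | {'PASS' if ok else 'FAIL'} |")
    overall = "GREEN" if results and all(results.values()) else "RED"
    lines.append("")
    lines.append(f"Overall: {overall}")
    return "\n".join(lines)
```

An empty result set renders as RED on purpose, so a harness run that executed no checks cannot be mistaken for a clean close.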
## Out-of-scope (LOCKED 2026-05-07 for iteration 1)
- HEVC, VP9, VP8 work — separate iterations per the suggested order in `cross_validator_traces.md`.
- The vaDeriveImage cache-stale bug class fix — Phase 4 cross-cutting work (potentially under a separate iteration).
- chromium-fourier 149 install on fresnel — not gating; can land as a Phase 0 follow-up to iter1's substrate when convenient.
- MPEG-2 performance metrics (FPS, CPU%, drops) — Phase 1+ separate iteration. iter1 is boolean correctness only, per the campaign-locked criterion.
- Long-duration MPEG-2 stress (>10s) — boolean correctness on 2 frames is enough; longer-run regressions surface as a separate iteration if iter1 exposes any.
- MPEG-2 Simple-only fixtures — the campaign locked fixture is Main profile; Simple is a strict subset and likely passes once Main does.
- AVI, MPG (program-stream), or other MPEG-2 containers beyond the iter1 fixture's MPEG-TS shape. iter1 fixture is `bbb_720p10s_mpeg2.ts`; container-shape coverage is a Phase 1+ matter.
- Upstream Linux engagement (per `feedback_no_upstream.md`). Kernel side works; nothing to file.
## Phase 1 success criterion (LOCKED 2026-05-07)
Per `feedback_dev_process.md` Phase 1 — define the objective in measurable terms before touching anything. The five Pass/fail bullets at the top of this document are the iter1 success criterion, locked. Phase 3 baseline measurement (the strace + ftrace contract trace of *current* MPEG-2 failure on our backend, plus the reference ffmpeg-v4l2request trace already in `phase0_evidence/2026-05-07/cross_validator/mpeg2/`) feeds Phase 4 plan; Phase 7 verification re-runs all five checks against the patched backend.
If Phase 3 baseline reveals the chosen criterion is the wrong target (per `feedback_dev_process.md`'s Phase 3 → Phase 1 loopback), the criterion will be rewritten and re-locked. Plausible reasons that would trigger the loopback:
- The MPEG-2 fixture is malformed in a way that exposes a fixture-side bug rather than a backend-side bug. (Mitigation: ffmpeg-v4l2request decodes the same fixture cleanly per cross_validator data, so this is unlikely.)
- Pixel verification via DMA-BUF GL import for MPEG-2 produces non-matching hashes for reasons unrelated to the backend (e.g., GL color-space conversion divergence, panfrost MPEG-2-specific quirk). In that case the criterion gets a different verifier — direct ffmpeg `hwdownload,format=nv12` from our libva path, or a custom C reproducer with `msync(MS_SYNC|MS_INVALIDATE)`.
- The vaCreateConfig rejection site turns out to be in libva itself (not our backend), making "fix RequestCreateConfig" the wrong scope. (Unlikely — the error message threads through our backend's return path.)
## What "iteration 1 close" looks like
A clean iter1 close per `feedback_dev_process.md` Phase 8 yields:
- All five Pass/fail criteria green.
- A `phase8_iteration1_close.md` document in this campaign repo summarizing the bug, the contract, the fix, and the binding-cell numbers.
- A second-codec passing entry in the campaign-level scoreboard (currently 1/5, target 2/5 after iter1).
- Memory entry distilling the lesson — per `feedback_dev_process.md` Phase 8 "do not let the lesson rot in chat history."
- A debug-instrumentation sweep — any `printf`/`fprintf` added during Phase 6 must be removed before close.
- The Phase 5 sonnet-architect review pass (per `feedback_dev_process.md` Phase 5) signed off.
- Commit history under `git.reauktion.de/marfrit/fresnel-fourier` reflecting the iter1 phases, all authored as `claude-noether` per [`memory/feedback_gitea_as_claude_noether.md`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_gitea_as_claude_noether.md).