diff --git a/phase0_findings_iter1.md b/phase0_findings_iter1.md
new file mode 100644
index 0000000..f2a0220
--- /dev/null
+++ b/phase0_findings_iter1.md
@@ -0,0 +1,121 @@
+# Iteration 1 — Phase 0 (substrate / motivation / inventory) → Phase 1 lock
+
+Opens 2026-05-07 evening immediately after campaign [`phase0_findings.md`](phase0_findings.md) close (commit `b74551b`). This is the first per-iteration loop on fresnel-fourier; the campaign-level Phase 0 already locked scope (5 codecs, RK3399, `boolean correctness` per codec) and produced the empirical groundwork on which iter1 commits.
+
+## Locked research question (iteration 1)
+
+> **"Make MPEG-2 the second codec to pass boolean-correctness on fresnel via the libva-v4l2-request-fourier path — `mpv --hwdec=vaapi-copy bbb_720p10s_mpeg2.ts` engages the backend cleanly and DMA-BUF GL import yields HW pixels byte-identical to a software-decoded reference for the same frames."**
+
+Pass/fail (boolean):
+
+1. **Profile enumeration regression check.** `vainfo --display drm --device /dev/dri/renderD128` with the hantro-vpu-dec env binding (`LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video5`, `LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media2`) continues to list `VAProfileMPEG2Simple` and `VAProfileMPEG2Main`. (Already passes today; this exists to prove iter1 work didn't strip the enumeration.)
+2. **Config creation succeeds.** `vaCreateConfig(VAProfileMPEG2Main, VAEntrypointVLD)` returns `VA_STATUS_SUCCESS`. (Today returns `12 = VA_STATUS_ERROR_UNSUPPORTED_PROFILE`.)
+3. **End-to-end decode engages the backend.** `mpv --hwdec=vaapi-copy --frames=2 --vo=null --no-audio --no-input-default-bindings ~/fourier-test/bbb_720p10s_mpeg2.ts` with the hantro env binding logs the `[vaapi] libva: Trying to open /usr/lib/dri/v4l2_request_drv_video.so` chain, the `Using hardware decoding (vaapi-copy)` confirmation, and exits 0 with no `Failed to create decode configuration` lines.
+4. **Cache-safe pixel verification matches SW reference.** `mpv --hwdec=vaapi --vo=image --frames=2 --start=00:00:02 --vo-image-outdir=/tmp/iter1_mpeg2_hw` and the equivalent `--hwdec=no` SW run produce JPEGs whose `sha256sum` outputs match for both frame 1 and frame 2. The seek to `+02s` (~48 frames into the 10s 720p MPEG-2 fixture) avoids an all-solid-color intro and exercises real bunny content. Frames 1 and 2 must hash-differ from each other (motion content) AND hash-equal across HW vs SW.
+5. **Regression check on H.264.** The T4 re-run incantation against `bbb_1080p30_h264.mp4` continues to pass — H.264 hashes at +30s seek match the reference values from `phase0_evidence/2026-05-07/h264_baseline_trace.md` (`f623d5f7…` for frame 1, `7d7bc6f2…` for frame 2). Iter1 must not break H.264.
+
+A clean iter1 close has all five checks green. Anything less loops back to Phase 4 per `feedback_dev_process.md` Phase 7 → Phase 4 edge.
+
+## Mechanism the question targets
+
+Phase 0 cross-validator sweep ([`phase0_evidence/2026-05-07/cross_validator_traces.md`](phase0_evidence/2026-05-07/cross_validator_traces.md)) established that the kernel + driver path works for all five locked codecs. `ffmpeg -hwaccel v4l2request -i bbb_720p10s_mpeg2.ts` decodes 2 frames to exit 0 — hantro-vpu-dec on `/dev/video5` accepts the `MG2S` (`V4L2_PIX_FMT_MPEG2_SLICE`) request-API contract end-to-end.
+
+Phase 0 codec-status sweep then established that **our libva backend is the lone broken link**. Reading [`src/config.c`](../libva-multiplanar/libva-v4l2-request-fourier/src/config.c) on the iter8 master tip (`65969da`):
+
+- `src/config.c:38: #include ` — kernel UAPI HEVC headers loaded (used elsewhere for HEVC enumeration).
+- `src/config.c:45: VAStatus RequestCreateConfig(VADriverContextP context, VAProfile profile, …)` — entry point.
+- `src/config.c:64: case VAProfileMPEG2Simple:` and `src/config.c:65: case VAProfileMPEG2Main:` — the profiles ARE in the validation switch.
+- `src/config.c:113: ...VAProfile *profiles, int *profiles_count)` — `RequestQueryConfigProfiles`, the enumerator.
+- `src/config.c:126: profiles[index++] = VAProfileMPEG2Simple;` and `:127: profiles[index++] = VAProfileMPEG2Main;` — both unconditionally enumerated.
+- `src/config.c:164: case VAProfileMPEG2Simple:` and `:165: case VAProfileMPEG2Main:` — present in another switch (likely `RequestQueryConfigEntrypoints`).
+
+So all the case statements are in place. Yet `vaCreateConfig` rejects with `12 (UNSUPPORTED_PROFILE)`. The rejection must be downstream of the case match — somewhere in `RequestCreateConfig` between line 67 (after the cases) and the function's return. Plausible suspects:
+
+- A V4L2 capability probe (e.g., `VIDIOC_TRY_FMT` against `MG2S` on the bound device) that fails because the libva backend was bound to `/dev/video5` but is checking the format against the wrong codec list.
+- A device-discovery routing decision that reaches a default-reject because the bound device didn't match an expected codec-to-device map.
+- A `media_request` allocation step that fails with EINVAL on `/dev/media2` for some MPEG-2-specific reason.
+- An iter6 or iter7 regression in the dispatch-by-profile path that broke MPEG-2 silently because nobody on libva-multiplanar tested it (per `phase0_findings.md` carry-over: "MPEG-2 was iter1 backlog in libva-multiplanar, dropped at iter6 close because A55 CPU handles it fine").
+
+Phase 2 source-read of `RequestCreateConfig` end-to-end + `picture.c` MPEG-2 dispatch + `mpeg2.c` set-controls path will identify the exact rejection site. **Phase 4 plan must cite the contract before patching.** Per `feedback_dev_process.md` Phase 6 contract-before-code: read kernel `drivers/media/platform/verisilicon/hantro_mpeg2.c`, read FFmpeg downstream `libavcodec/v4l2_request_mpeg2.c`, state the MPEG-2 control-submission contract explicitly in the Phase 4 plan or commit message before any code lands.
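When chasing the capability-probe suspect, the pixel format can show up in raw traces as a 32-bit integer and the libva failure as a bare status number. The sketch below is a reading aid, not backend code: it packs and unpacks fourcc values the same way the kernel's `v4l2_fourcc()` macro does and names a small subset of `VAStatus` codes from `va.h`. The helper names (`v4l2_fourcc`, `fourcc_to_str`, `va_status_name`) are illustrative and exist only in this example.

```python
# Reading aid only: these helpers are hypothetical and not part of libva
# or the libva-v4l2-request-fourier backend.


def v4l2_fourcc(code: str) -> int:
    """Pack a 4-character code like the kernel's v4l2_fourcc() macro."""
    if len(code) != 4:
        raise ValueError("fourcc must be exactly 4 characters")
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


def fourcc_to_str(value: int) -> str:
    """Unpack a 32-bit pixelformat value back into its 4-character code."""
    return "".join(chr((value >> shift) & 0xFF) for shift in (0, 8, 16, 24))


# Small subset of VAStatus values from va.h, enough to read
# config-creation failures. Extend as needed.
VA_STATUS_NAMES = {
    0x00: "VA_STATUS_SUCCESS",
    0x01: "VA_STATUS_ERROR_OPERATION_FAILED",
    0x02: "VA_STATUS_ERROR_ALLOCATION_FAILED",
    0x0C: "VA_STATUS_ERROR_UNSUPPORTED_PROFILE",
    0x0D: "VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT",
    0x0E: "VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT",
}


def va_status_name(status: int) -> str:
    """Map a numeric VAStatus to its symbolic name, if known."""
    return VA_STATUS_NAMES.get(status, f"unknown VAStatus {status:#x}")


if __name__ == "__main__":
    print(hex(v4l2_fourcc("MG2S")), fourcc_to_str(v4l2_fourcc("NV12")))
    print(va_status_name(12))
```

With this, a `pixelformat` of `0x5332474d` in a trace reads as `MG2S`, and the status `12` from Pass/fail check 2 reads as `VA_STATUS_ERROR_UNSUPPORTED_PROFILE`.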
+
+## Predecessor carry-over (campaign Phase 0 → iter1)
+
+### State that carries forward (re-verified in campaign Phase 0)
+
+- **Hardware**: fresnel RK3399, kernel `6.19.9-99-eos-arm`. Custom OC kernel with `CONFIG_FTRACE=y, CONFIG_FUNCTION_TRACER=y, CONFIG_DYNAMIC_FTRACE=y, CONFIG_TRACING=y` (`phase0_findings.md` line 29). No rebuild needed.
+- **Hantro-vpu-dec node**: `/dev/video5` + `/dev/media2` bind. DT compatible `rockchip,rk3399-vpu`. Same parent device as the JPEG encoder on `/dev/video4`. Card type: `rockchip,rk3399-vpu-dec` (per `v4l2-ctl --info`).
+- **Decoder formats** (from `phase0_evidence/2026-05-07/v4l2_inventory.txt`): OUTPUT_MPLANE = `MG2S` (MPEG-2 Parsed Slice Data, compressed) + `VP8F`. CAPTURE_MPLANE = `NV12`.
+- **Stateless control payloads** (kernel surface): `mpeg_2_sequence_header` (`0x00a409dc`), `mpeg_2_picture_header` (`0x00a409dd`), `mpeg_2_quantisation_matrices` (`0x00a409de`). All flagged `unsupported payload type` by `v4l2-ctl --list-ctrls-menus` (normal — v4l2-ctl can't serialize compound controls; the kernel ABI uses `VIDIOC_S_EXT_CTRLS` with `V4L2_CTRL_WHICH_REQUEST_VAL`).
+- **Userspace**: libva 1.23.0, libdrm 2.4.131, mpv 0.41.0 stock (replaced mpv-git which was libplacebo-broken; `phase0_evidence/2026-05-07/h264_baseline_trace.md`), ffmpeg `n8.1-13-gb57fbbe50c` (Kwiboo `v4l2-request-n8.1` branch).
+- **Backend build state**: libva-v4l2-request-fourier master tip `65969da` (iter8 Phase 4) built directly on fresnel via `meson setup --prefix=/usr build && ninja -C build && sudo ninja -C build install`. Installed at `/usr/lib/dri/v4l2_request_drv_video.so`, mode 0755 root:root, BuildID `89addcc37a8e6ed2240b0e7ef78789a2e09a2245`, single export `__vaDriverInit_1_23`. Compiled-in codecs: `mpeg2.c`, `h264.c`, `h264_slice_header.c`. Excluded: `h265.c` (commented out in `src/meson.build`). Absent: VP8, VP9 source files.
+- **Test fixture**: `~/fourier-test/bbb_720p10s_mpeg2.ts` on fresnel (5.3 MB, MPEG-2 Main, 1280×720@24fps yuv420p, 10s, MPEG-TS container, generated 2026-05-07 23:35 from H.264 master via `ffmpeg -ss 30 -t 10 -vf scale=1280:720 -c:v mpeg2video -profile:v 4 -level:v 8 -b:v 4M -pix_fmt yuv420p`). Provenance + reproducibility: [`phase0_evidence/2026-05-07/test_fixtures.md`](phase0_evidence/2026-05-07/test_fixtures.md).
+- **H.264 reference for regression**: `~/fourier-test/bbb_1080p30_h264.mp4` (725 MB, H.264 High@4.0, 1920×1080@24fps). Reference hashes from T4: HW frame 1 (`+30s`) sha256 `f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9`, frame 2 sha256 `7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8`.
+- **Cross-validator anchor**: ffmpeg-v4l2request MPEG-2 contract from [`phase0_evidence/2026-05-07/cross_validator/mpeg2/`](phase0_evidence/2026-05-07/cross_validator/mpeg2/). 5 `S_EXT_CTRLS` for 2 frames (= 2.5/frame: sequence_header + picture_header + quantisation_matrices, partially batched), 4 `MEDIA_IOC_REQUEST_ALLOC`, 4 `DMA_BUF_IOCTL_SYNC`. Single-threaded `dec0:0:mpeg2vid` worker (no frame-threading for MPEG-2 in ffmpeg). 32 ftrace lines for 2 frames.
+- **Cache-safe verify path**: `mpv --hwdec=vaapi --vo=image` (DMA-BUF + EGL_EXT_image_dma_buf_import + glReadPixels + JPEG encode). Proven equivalent to SW reference on H.264 (T4 — byte-identical at +30s mid-content). Same path applies for MPEG-2 verification.
+
+### Data that does NOT carry forward (re-acquire if needed)
+
+- ohm/RK3568 hantro MPEG-2 behaviour. ohm uses RK3568 hantro (`rockchip,rk3568-vpu` kernel variant); fresnel uses RK3399 hantro (`rockchip,rk3399-vpu` variant). Different driver path inside `drivers/media/platform/verisilicon/`. Reference history only — re-verify any contract claim against fresnel.
+- The "MPEG-2 was dropped at iter6 close because A55 CPU handles it fine" disposition. fresnel runs A53 (weaker than ohm's A55), so the disposition for ohm doesn't transfer; HW MPEG-2 decode is potentially valuable on RK3399.
+- Pre-iter6 libva-multiplanar MPEG-2 trace data, if any. There is no known path to it; if Phase 2 source-read shows the MPEG-2 codepath has been quiet (untested) since iter1, treat MPEG-2 in this fork as a bring-up-from-scratch.
+
+### Open questions inherited from campaign Phase 0
+
+- **Cache-stale `vaDeriveImage` bug class on RK3399** (T4 finding). Iter1 must use the DMA-BUF GL import path for pixel verification (per Pass/fail #4 above), not `vaDeriveImage`. The image-export bug fix is Phase 4 cross-cutting work, not iter1-scoped.
+- **`ffmpeg -hwaccel v4l2request` MPEG-2 architectural divergence**: 4 `MEDIA_IOC_REQUEST_ALLOC` (vs our backend's 16 in iter6 binding), `VIDIOC_EXPBUF` + `DMA_BUF_IOCTL_SYNC` for cache-safe readback. Whether to mirror the EXPBUF + SYNC pattern in our backend or stay with the iter6 cap_pool model is a Phase 4 design decision; iter1 doesn't have to converge on ffmpeg's pattern as long as the boolean criteria pass.
+- **HEVC profile enumerated despite `h265.c` not compiled** (T3 finding). Orthogonal to iter1; cleaning up the false-advertising is Phase 4 cross-cutting.
+
+## Tooling and measurement-instrument inventory (live verification)
+
+Re-verified on fresnel at iter1 open:
+
+- `strace -ff -tt -y -e trace=ioctl,openat,close` for libva-side V4L2 ioctl tracing — proven working in T4.
+- `sudo sh -c "echo 1 > /sys/kernel/tracing/events/v4l2/enable"` for kernel v4l2 tracepoints — proven working in T4 + cross-validator sweep.
+- `mpv --hwdec=vaapi --vo=image` (cache-safe pixel verify) — proven on T4, replicates for MPEG-2 in iter1 binding cells.
+- `ffmpeg -hwaccel v4l2request` (independent V4L2 client cross-validator) — proven on all 5 codecs in T6.
+- Backend build harness on fresnel: `ninja -C ~/src/libva-v4l2-request-fourier/build && sudo ninja -C ~/src/libva-v4l2-request-fourier/build install`.
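To compare a backend `strace` capture against the ffmpeg cross-validator numbers, the ioctl request names can be tallied mechanically. The following is a minimal sketch, assuming `strace` prints symbolic request names and, with `-y`, annotates descriptors as `5</dev/video5>`. The function names are hypothetical, and the reference counts are the 2-frame ffmpeg figures quoted in the carry-over section. Since iter1 does not have to converge on ffmpeg's pattern, a mismatch is informational rather than a failure.

```python
import re
from collections import Counter

# Matches e.g. "ioctl(5</dev/video5>, VIDIOC_S_EXT_CTRLS, ..." and captures
# the symbolic request name. Assumes strace resolved the name; raw
# "_IOC(...)" forms are deliberately not counted.
IOCTL_RE = re.compile(r"\bioctl\(\d+(?:<[^>]*>)?,\s*([A-Z][A-Z0-9_]*)")

# 2-frame ffmpeg-v4l2request MPEG-2 figures from the cross-validator anchor.
FFMPEG_MPEG2_REFERENCE = {
    "VIDIOC_S_EXT_CTRLS": 5,
    "MEDIA_IOC_REQUEST_ALLOC": 4,
    "DMA_BUF_IOCTL_SYNC": 4,
}


def count_ioctls(trace_text: str) -> Counter:
    """Count ioctl request names in strace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = IOCTL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def diff_against_reference(counts, reference):
    """Return {name: (observed, expected)} for each mismatching ioctl."""
    return {
        name: (counts.get(name, 0), expected)
        for name, expected in reference.items()
        if counts.get(name, 0) != expected
    }


if __name__ == "__main__":
    import sys

    # Usage: python count_ioctls.py trace.12345 [trace.12346 ...]
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            total += count_ioctls(handle.read())
    for name, (seen, want) in diff_against_reference(
        total, FFMPEG_MPEG2_REFERENCE
    ).items():
        print(f"{name}: observed {seen}, ffmpeg reference {want}")
```

Because `-ff` splits output per process when combined with `-o`, the script accepts several trace files and sums them before comparing.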
+
+Iter1 will likely add per-source debug `printf`/`fprintf(stderr, ...)` instrumentation in `src/config.c`'s `RequestCreateConfig` (and possibly `picture.c` MPEG-2 dispatch) to pin the rejection site. That instrumentation is iter1-internal scratch — clean sweep at iter1 close per Phase 5 review precedent (libva-multiplanar iter5 sweep removed ~339 lines of debug instrumentation at close).
+
+## In-scope (LOCKED 2026-05-07 for iteration 1)
+
+- libva-v4l2-request-fourier backend MPEG-2 path on hantro-vpu-dec.
+- `src/config.c::RequestCreateConfig` MPEG-2 rejection-site investigation + fix.
+- `src/picture.c` MPEG-2 dispatch path (if Phase 2 source-read finds it implicated).
+- `src/mpeg2.c` set-controls path verification against kernel `hantro_mpeg2.c` and FFmpeg `v4l2_request_mpeg2.c`.
+- iter1 binding-cell test harness: a script that runs the five Pass/fail checks above, captures evidence to `phase0_evidence//iter1_mpeg2/`, and emits a markdown verdict.
+- Cache-safe pixel verify must use DMA-BUF GL import (not `vaDeriveImage`).
+- Regression check on H.264 (re-run T4 incantation, compare hashes against reference).
+
+## Out-of-scope (LOCKED 2026-05-07 for iteration 1)
+
+- HEVC, VP9, VP8 work — separate iterations per the suggested order in `cross_validator_traces.md`.
+- The vaDeriveImage cache-stale bug class fix — Phase 4 cross-cutting work (potentially under a separate iteration).
+- chromium-fourier 149 install on fresnel — not gating; can land as a Phase 0 follow-up to iter1's substrate when convenient.
+- MPEG-2 performance metrics (FPS, CPU%, drops) — Phase 1+ separate iteration. iter1 is boolean correctness only, per the campaign-locked criterion.
+- Long-duration MPEG-2 stress (>10s) — boolean correctness on 2 frames is enough; longer-run regressions surface as a separate iteration if iter1 exposes any.
+- MPEG-2 Simple-only fixtures — the campaign locked fixture is Main profile; Simple is a strict subset and likely passes once Main does.
+- AVI, MPG (program-stream), or other MPEG-2 containers beyond the iter1 fixture's MPEG-TS shape. iter1 fixture is `bbb_720p10s_mpeg2.ts`; container-shape coverage is a Phase 1+ matter.
+- Upstream Linux engagement (per `feedback_no_upstream.md`). Kernel side works; nothing to file.
+
+## Phase 1 success criterion (LOCKED 2026-05-07)
+
+Per `feedback_dev_process.md` Phase 1 — define the objective in measurable terms before touching anything. The five numbered Pass/fail checks at the top of this document are the iter1 success criterion, locked. Phase 3 baseline measurement (the strace + ftrace contract trace of *current* MPEG-2 failure on our backend, plus the reference ffmpeg-v4l2request trace already in `phase0_evidence/2026-05-07/cross_validator/mpeg2/`) feeds Phase 4 plan; Phase 7 verification re-runs all five checks against the patched backend.
+
+If Phase 3 baseline reveals the chosen criterion is the wrong target (per `feedback_dev_process.md`'s Phase 3 → Phase 1 loopback), the criterion will be rewritten and re-locked. Plausible reasons that would trigger the loopback:
+
+- The MPEG-2 fixture is malformed in a way that exposes a fixture-side bug rather than a backend-side bug. (Mitigation: ffmpeg-v4l2request decodes the same fixture cleanly per cross_validator data, so this is unlikely.)
+- Pixel verification via DMA-BUF GL import for MPEG-2 produces non-matching hashes for reasons unrelated to the backend (e.g., GL color-space conversion divergence, panfrost MPEG-2-specific quirk). In that case the criterion gets a different verifier — direct ffmpeg `hwdownload,format=nv12` from our libva path, or a custom C reproducer with `msync(MS_SYNC|MS_INVALIDATE)`.
+- The vaCreateConfig rejection site turns out to be in libva itself (not our backend), making "fix RequestCreateConfig" the wrong scope. (Unlikely — the error message threads through our backend's return path.)
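The hash rule in Pass/fail check 4 has two parts: the two frames must differ from each other, and each must match its SW counterpart. A minimal sketch of that rule follows, assuming each mpv output directory holds one `.jpg` per frame with names that sort in frame order. `sha256_file`, `frame_hashes`, and `pixel_verdict` are hypothetical names for this example, not part of any existing harness.

```python
import hashlib
from pathlib import Path


def sha256_file(path) -> str:
    """Hash a file in chunks, equivalent to the digest sha256sum prints."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def frame_hashes(outdir) -> list:
    """Hash every .jpg in outdir, ordered by file name (assumed frame order)."""
    return [sha256_file(p) for p in sorted(Path(outdir).glob("*.jpg"))]


def pixel_verdict(hw_hashes, sw_hashes, frames=2):
    """Apply the check-4 rule; return (passed, reason)."""
    if len(hw_hashes) != frames or len(sw_hashes) != frames:
        return False, (
            f"expected {frames} frames, "
            f"got HW={len(hw_hashes)} SW={len(sw_hashes)}"
        )
    if len(set(hw_hashes)) != frames:
        return False, "HW frames are identical to each other (no motion)"
    if list(hw_hashes) != list(sw_hashes):
        return False, "HW and SW hashes differ"
    return True, "HW frames match the SW reference"


if __name__ == "__main__":
    import sys

    # Usage: python pixel_verdict.py <hw_outdir> <sw_outdir>
    ok, reason = pixel_verdict(
        frame_hashes(sys.argv[1]), frame_hashes(sys.argv[2])
    )
    print(("PASS: " if ok else "FAIL: ") + reason)
    sys.exit(0 if ok else 1)
```

Keeping the verdict a pure function over hash lists means the same rule can be reused for the H.264 regression check by comparing against the recorded reference hashes instead of a fresh SW run.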
+
+## What "iteration 1 close" looks like
+
+A clean iter1 close per `feedback_dev_process.md` Phase 8 yields:
+
+- All five Pass/fail criteria green.
+- A `phase8_iteration1_close.md` document in this campaign repo summarizing the bug, the contract, the fix, and the binding-cell numbers.
+- A second-codec passing entry in the campaign-level scoreboard (currently 1/5, target 2/5 after iter1).
+- A memory entry distilling the lesson — per `feedback_dev_process.md` Phase 8 "do not let the lesson rot in chat history."
+- A debug-instrumentation sweep — any `printf`/`fprintf` added during Phase 6 must be removed before close.
+- The Phase 5 sonnet-architect review pass (per `feedback_dev_process.md` Phase 5) signed off.
+- Commit history under `git.reauktion.de/marfrit/fresnel-fourier` reflecting the iter1 phases, all authored as `claude-noether` per [`memory/feedback_gitea_as_claude_noether.md`](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_gitea_as_claude_noether.md).
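The debug-instrumentation sweep in the list above can be spot-checked mechanically. The sketch below scans a unified diff (for example, the backend diff between the pre-iter1 tip and the close commit) for added lines that call `printf` or `fprintf`. It is only a heuristic under stated assumptions: `leftover_debug_lines` is a hypothetical helper, it cannot tell scratch instrumentation from intentional logging, and it does not look for other output functions.

```python
import re

# Matches printf( and fprintf( but not snprintf( or vsnprintf(.
DEBUG_CALL_RE = re.compile(r"\b(?:printf|fprintf)\s*\(")


def leftover_debug_lines(diff_text: str) -> list:
    """Return (file, added_line) pairs for added lines calling printf/fprintf."""
    hits = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/src/config.c" -> "src/config.c"
            path = line[4:].strip()
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and DEBUG_CALL_RE.search(line):
            hits.append((current_file, line[1:].strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: git diff <baseline>..HEAD -- src/ | python debug_sweep.py
    found = leftover_debug_lines(sys.stdin.read())
    for path, text in found:
        print(f"{path}: {text}")
    sys.exit(1 if found else 0)
```

A non-zero exit only means something needs a human look; any hit that is deliberate logging can be waved through in the Phase 5 review.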