# ampere-av1-enablement Phase 3+5 close — AV1 PASS on all-intra; film_grain+show_existing edge case localized
Closed 2026-05-17 evening. Phase 2.1 + Phase 3 + Phase 5 review iteration. Substrate state on ampere: backend tip `902d6c1` on `av1-iter1` branch (matches `av1-iter1-imported` on noether, pushed to gitea). Module installed at `/usr/lib/dri/v4l2_request_drv_video.so`. Kernel `7.0.0-rc3-devices+` unchanged.
## Phase 5 review outcome (post-Phase-3)
sonnet-architect review found 3 amendments:
- **Amendment 1 (loop_restoration remap table) — REVERTED**: the reviewer proposed a permutation matching Kwiboo's mapping and the AV1 spec wire encoding. Applied empirically, it regressed allintra from 10/10 to 0/10 and turned test_av1 from bit-exact to DIFF. Per [[feedback_review_empirical_over_theoretical]], the identity mapping wins. Either VAAPI's `yframe_restoration_type` is already in V4L2-enum order, or vpu981 interprets the V4L2 enum differently from what the uAPI header documents.
- **Amendment 4 (stale `linked_decode_surface_id`) — KEPT**: clear the field in BeginPicture, after the iter2 Fix 3 release. This prevents spurious link-borrows when ffmpeg-vaapi recycles former display surfaces as decode targets. No-op for non-AV1 codecs.
- **Amendment 5 (SEPARATE_UV_DELTA_Q seq flag missing) — noted, not actionable**: VAAPI doesn't expose `color_config.separate_uv_delta_q`, so setting the flag would require information from the bitstream.
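Why a single table permutation can break an otherwise working path is easiest to see with the spec's `Remap_Lr_Type` table. A minimal C sketch, assuming the first hypothesis above (VAAPI's `yframe_restoration_type` already holds a `FrameRestorationType` value) and assuming the V4L2 enum uses the same order (NONE=0, WIENER=1, SGRPROJ=2, SWITCHABLE=3). The function names are hypothetical, not the backend's.

```c
#include <assert.h>
#include <stdint.h>

/* FrameRestorationType values from the AV1 spec. Assumption: the V4L2 enum
 * v4l2_av1_frame_restoration_type uses the same numbering. */
enum restoration_type {
    RESTORE_NONE = 0,
    RESTORE_WIENER = 1,
    RESTORE_SGRPROJ = 2,
    RESTORE_SWITCHABLE = 3,
};

/* Remap_Lr_Type from the AV1 spec: the 2-bit lr_type as coded in the
 * bitstream -> FrameRestorationType. Only correct on raw wire values. */
static const uint8_t remap_lr_type[4] = {
    RESTORE_NONE, RESTORE_SWITCHABLE, RESTORE_WIENER, RESTORE_SGRPROJ,
};

static uint8_t lr_type_from_wire(unsigned lr_type)
{
    assert(lr_type < 4);
    return remap_lr_type[lr_type];
}

/* Identity mapping, as kept after reverting Amendment 1. Valid only if the
 * VAAPI field is already a FrameRestorationType (hypothesis, not verified). */
static uint8_t lr_type_va_to_v4l2(unsigned va_type)
{
    assert(va_type < 4);
    return (uint8_t)va_type;
}
```

Applying `lr_type_from_wire()` to a value that is already a `FrameRestorationType` turns WIENER into SWITCHABLE and SGRPROJ into WIENER, leaving only NONE intact. That would be one way a table change could break every frame that uses loop restoration.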
## Test matrix
| Fixture | Source | Frames | Resolution | Profile | Result |
|---|---|---|---|---|---|
| `test_av1.ivf` | AOM `av1-1-b8-01-size-208x208` (also `/tmp/test_av1.ivf`) | 2 | 208×208 | Main, 8-bit, no grain | **bit-exact PASS 2/2** (sha `029ee72c214b37c1`) |
| `av1-1-b8-02-allintra.ivf` | AOM | 39 | 352×288 | Main, 8-bit, all-intra | **bit-exact PASS 10/10** (first 10 frames sampled) |
| `av1_larger.ivf` / `av1-1-b8-23-film_grain-50.ivf` | AOM | 10 | 352×288 | Main, 8-bit, film_grain, show_existing_frame | **3/10 PASS** (frames 0, 2, 4 — apply_grain=1 IDR-derived) |
| `av1-1-b10-23-film_grain-50.ivf` | AOM | 10 | 352×288 | Main, 10-bit, film_grain | both libva AND kdirect produce 0 bytes — vpu981 may not support 10-bit AV1 |
## What works (validated on hardware)
- **AV1 dispatch + control submission**: `SEQUENCE`, `FRAME`, `TILE_GROUP_ENTRY`, `FILM_GRAIN` all submitted correctly; strace shows kernel accepts every batch.
- **All four V4L2 controls byte-identical to kdirect** for the first 7 EndPicture calls (verified with a patched `libavcodec.so`, loaded via an LD_LIBRARY_PATH override, that adds an fwrite diagnostic to `ff_v4l2_request_append_output`).
- **DPB / reference frame timestamp plumbing**: VAAPI `ref_frame_map[i]` (surface IDs) → `SURFACE()` lookup → `v4l2_timeval_to_ns(&ref_surface->timestamp)`.
- **Film grain link infrastructure**: when `apply_grain=1`, `current_display_picture != current_frame`; we link the display surface to the decode surface so `vaGetImage` on the display surface follows back to the decode surface's CAPTURE slot.
- **`refresh_frame_flags = 0xff`**: VAAPI doesn't expose this field; the 0xff default matches both the AV1 spec for SWITCH frames and shown KEY frames (where `refresh_frame_flags = allFrames`) and kdirect's submission.
- **`ENABLE_SUPERRES` gated on `picture->pic_info_fields.bits.use_superres`**: matches kdirect; previously the flag was set unconditionally.
- **Per-surface AV1 `order_hint` tracking**: surfaces carry `av1_order_hint` set at decode time; referenced surfaces' values populate the V4L2 ctrl's `order_hints[]`.
- **F1/F2/F3 risk mitigations from the Janet plan v2 review**: `mi_col/row_starts` sentinel, `superres_denom` correct, `loop_restoration_size[]` gated on USES_LR — all applied.
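The film grain link and the Amendment 4 clear interact roughly as in this minimal C sketch. Apart from `linked_decode_surface_id`, the names are hypothetical, small indices stand in for real surface IDs, and `INVALID_SURFACE` stands in for libva's `VA_INVALID_SURFACE`. EndPicture links the display surface to the decode surface when `apply_grain=1`, image reads follow that link one hop, and BeginPicture clears the target's link so a recycled display surface stops borrowing another surface's CAPTURE slot.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SURFACES 16u
#define INVALID_SURFACE 0xffffffffu /* stand-in for VA_INVALID_SURFACE */

struct surface {
    uint32_t linked_decode_surface_id; /* display -> decode redirect */
};

static struct surface surfaces[NUM_SURFACES];

static void surfaces_init(void)
{
    for (uint32_t i = 0; i < NUM_SURFACES; i++)
        surfaces[i].linked_decode_surface_id = INVALID_SURFACE;
}

/* BeginPicture: the target is about to receive fresh decode output, so any
 * link left over from its time as a display surface is stale (Amendment 4). */
static void begin_picture(uint32_t target)
{
    assert(target < NUM_SURFACES);
    surfaces[target].linked_decode_surface_id = INVALID_SURFACE;
}

/* EndPicture: with apply_grain=1 the display surface differs from the decode
 * target; point it back at the surface whose CAPTURE slot holds the data. */
static void link_film_grain(uint32_t decode, uint32_t display, int apply_grain)
{
    assert(decode < NUM_SURFACES && display < NUM_SURFACES);
    if (apply_grain && display != decode)
        surfaces[display].linked_decode_surface_id = decode;
}

/* vaGetImage: follow one link hop to the surface that owns the pixels. */
static uint32_t image_source(uint32_t surface)
{
    assert(surface < NUM_SURFACES);
    uint32_t linked = surfaces[surface].linked_decode_surface_id;
    return linked == INVALID_SURFACE ? surface : linked;
}
```

Without the clear in `begin_picture()`, a surface first used as a grain display target and later recycled as a decode target would keep resolving to the old decode surface.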
## What's open (Phase 4 territory)
Remaining 7/10 divergence on `av1_larger.ivf` localized to:
- ffmpeg-vaapi's AV1 hwaccel issues 2 EXTRA `vaEndPicture` calls on **REUSED surfaces** (`0x4000008` repeated at t8, `0x4000006` repeated at t9) compared to ffmpeg-v4l2request's 7 calls for the same input.
- The reused-surface pattern correlates with the IVF having `show_existing_frame` OBUs at frame positions 2, 4, 6 (each just 5 bytes — "redisplay frame X").
- Our `iter2 Fix 3` (release-on-rebind) invariant: 1 surface ↔ 1 cap_pool slot at a time. When ffmpeg-vaapi rebinds, prior CAPTURE data is gone.
- **Falsified**: a `LIBVA_SKIP_REBIND=1` experiment (do not unbind in BeginPicture; leak the old slot) produced the same 3/10 PASS count as the default behavior. So `iter2 Fix 3` is NOT the cause; the issue lies deeper, in how the surface→slot accounting interacts with ffmpeg-vaapi's surface reuse.
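The 5-byte `show_existing_frame` stubs can be spotted from the bitstream alone. A minimal C sketch (hypothetical function names, not `/tmp/ivf_split.py`), assuming standard IVF framing (a `DKIF` file header whose length is stored at offset 6, then a 12-byte size + timestamp header per frame) and `reduced_still_picture_header == 0`, so `show_existing_frame` is the first bit of the uncompressed header. One plausible layout for such a stub, not verified against the fixture bytes, is a 2-byte temporal delimiter OBU followed by a 3-byte frame header OBU whose single payload byte carries `show_existing_frame`, `frame_to_show_map_idx`, and trailing bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OBU_FRAME_HEADER = 3, OBU_FRAME = 6 };

static uint16_t read_le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* leb128() from the AV1 spec: little-endian base-128, at most 8 bytes. */
static int read_leb128(const uint8_t *p, size_t len, uint64_t *value, size_t *used)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8 && i < len; i++) {
        v |= (uint64_t)(p[i] & 0x7f) << (7 * i);
        if (!(p[i] & 0x80)) {
            *value = v;
            *used = i + 1;
            return 0;
        }
    }
    return -1;
}

/* 1 (and *map_idx set) if the temporal unit's frame header has
 * show_existing_frame = 1, 0 if it codes a new frame, -1 if malformed. */
static int tu_show_existing(const uint8_t *p, size_t len, int *map_idx)
{
    size_t pos = 0;
    while (pos < len) {
        uint8_t hdr = p[pos];
        int type = (hdr >> 3) & 0xf;
        size_t hlen = 1 + ((hdr >> 2) & 1); /* optional extension byte */
        uint64_t size;
        size_t used;
        if (hlen > len - pos)
            return -1;
        if (hdr & 0x02) { /* obu_has_size_field */
            if (read_leb128(p + pos + hlen, len - pos - hlen, &size, &used))
                return -1;
            hlen += used;
        } else {
            size = len - pos - hlen; /* OBU runs to the end of the unit */
        }
        if (size > len - pos - hlen)
            return -1;
        if ((type == OBU_FRAME_HEADER || type == OBU_FRAME) && size > 0) {
            uint8_t b = p[pos + hlen];
            if (!(b & 0x80))
                return 0;
            *map_idx = (b >> 4) & 7; /* frame_to_show_map_idx */
            return 1;
        }
        pos += hlen + (size_t)size;
    }
    return 0;
}

/* Walks an in-memory IVF file, counting frames and show_existing stubs. */
static int ivf_scan(const uint8_t *buf, size_t len, int *frames, int *show_existing)
{
    if (len < 32 || memcmp(buf, "DKIF", 4) != 0)
        return -1;
    size_t pos = read_le16(buf + 6); /* file header length, normally 32 */
    *frames = 0;
    *show_existing = 0;
    while (pos + 12 <= len) {
        uint32_t size = read_le32(buf + pos); /* 8-byte pts follows, ignored */
        pos += 12;
        if (size > len - pos)
            return -1; /* truncated frame */
        int idx;
        if (tu_show_existing(buf + pos, size, &idx) == 1)
            (*show_existing)++;
        (*frames)++;
        pos += size;
    }
    return 0;
}
```

If the observation above holds, `ivf_scan()` over `av1_larger.ivf` should report three show_existing stubs, at frame positions 2, 4 and 6.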
### Hypothesized fix paths (Phase 4)
1. **Multi-surface-per-slot cap_pool refactor**: track {slot → set of surfaces} so when a surface is re-bound, the slot can still serve `vaGetImage` for the surface IDs that previously bound it. Bigger refactor than this iteration.
2. **Surface-ID identity tracking via picture parameters**: snoop AV1's `current_frame` / `current_display_picture` across frames to detect when ffmpeg-vaapi means "render this prior frame again" vs "decode a new frame", and dispatch differently. Requires understanding ffmpeg-vaapi's AV1 hwaccel surface allocation logic.
3. **ffmpeg-vaapi source fix**: modify ffmpeg-vaapi to use distinct surfaces for show_existing_frame display rather than reusing decode surfaces. Cross-package; rejected as default first move.
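Fix path 1 might look roughly like the following C sketch. Everything in it is hypothetical (`struct cap_slot`, `pool_bind()`, the fixed sizes, and small integer indices standing in for driver surface IDs such as `0x4000008`). Each CAPTURE slot keeps a set of surface IDs rather than a single owner, rebinding moves only the rebinding surface, and a slot counts as free only once no surface references it.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 8
#define MAX_ALIASES 4
#define NUM_SURFACES 32u
#define NO_SLOT (-1)

struct cap_slot {
    uint32_t surfaces[MAX_ALIASES]; /* surface IDs this slot still serves */
    int count;
};

static struct cap_slot slots[NUM_SLOTS];
static int surface_slot[NUM_SURFACES]; /* surface -> slot, NO_SLOT if unbound */

static void pool_init(void)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        slots[i].count = 0;
    for (uint32_t s = 0; s < NUM_SURFACES; s++)
        surface_slot[s] = NO_SLOT;
}

static void slot_remove(int slot, uint32_t surface)
{
    struct cap_slot *c = &slots[slot];
    for (int i = 0; i < c->count; i++) {
        if (c->surfaces[i] == surface) {
            c->surfaces[i] = c->surfaces[--c->count]; /* swap-remove */
            return;
        }
    }
}

/* Binds a surface to a slot. Only this surface leaves its old slot; other
 * surfaces that bound the old slot keep it alive. -1 if the slot is full. */
static int pool_bind(uint32_t surface, int slot)
{
    assert(surface < NUM_SURFACES && slot >= 0 && slot < NUM_SLOTS);
    if (surface_slot[surface] == slot)
        return 0;
    if (slots[slot].count == MAX_ALIASES)
        return -1;
    if (surface_slot[surface] != NO_SLOT)
        slot_remove(surface_slot[surface], surface);
    slots[slot].surfaces[slots[slot].count++] = surface;
    surface_slot[surface] = slot;
    return 0;
}

/* Slot that vaGetImage on this surface would read, or NO_SLOT. */
static int pool_lookup(uint32_t surface)
{
    return surface < NUM_SURFACES ? surface_slot[surface] : NO_SLOT;
}

static int pool_slot_free(int slot) { return slots[slot].count == 0; }
```

With this shape, a surface that ffmpeg-vaapi reuses for a new decode no longer takes earlier CAPTURE data away from another surface ID still pointing at the old slot. The trade-off is that slots are recycled later, so the cap_pool may need to grow.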
## Commits delivered this iteration
On `av1-iter1` branch (both ampere `/home/mfritsche/src/libva-v4l2-request-fourier/` + noether `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/`):
```
902d6c1 ampere-av1 Phase 5 review: stale linked_decode_surface_id clear; remap fix REVERTED
c839b94 ampere-av1 Phase 3 finding: iter2 Fix 3 release is NOT the divergence cause
d7ef0f6 ampere-av1 Phase 3: SEQUENCE byte-equal kdirect; 3/10 frames PASS bit-exact
5803cbc ampere-av1 Phase 3 progress: film_grain link + UPDATE_GRAIN; frame 0 bit-exact
ab79ed5 ampere-av1 Phase 3 in-progress notes: UPDATE_GRAIN segfault; 352x288 still 0-byte
5fb7e36 ampere-av1 Phase 3 fix: wire reference_frame_ts[] from VAAPI ref_frame_map[]
85bcddb v4l2: surface error_idx + errno on VIDIOC_S_EXT_CTRLS failure
9c30ecc ampere-av1 Phase 2.1: implement av1_set_controls body (~500 LoC)
78a9978 ampere-av1 Phase 2 step 4: AV1 dispatch scaffolding compiles and wires
61db76e ampere-av1 Phase 2 step 2: advertise VAProfileAV1Profile0 via libva
bed75c0 ampere-av1 Phase 2 step 1: third-device fd scaffolding for vpu981
```
Branch pushed to gitea at `marfrit/libva-v4l2-request-fourier` (no ssh sideband disconnect this time; pre-amnesia me's concern from the vp9 branch did not reproduce for av1-iter1).
## Verifier scripts retained
- `/tmp/diff_av1_ctrls.py` on ampere: per-CID byte diff between two strace logs. Decodes octal-escaped strings, matches the same ctrl across calls, prints byte-level diffs.
- `/tmp/ivf_split.py` on ampere: splits an IVF file into per-frame `.bin` files. Reveals AV1 show_existing_frame OBUs as 5-byte stub frames.
- patched `libavcodec.so.62.28.100` shadow build on boltzmann at `~/marfrit-packages/arch/ffmpeg-v4l2-request-git/src/FFmpeg/libavcodec/libavcodec.so.62`. The source has since been restored to clean from the `.bak`; re-apply the diagnostic via:
```sh
ssh boltzmann 'sed -i "/memcpy(pic->output->addr + pic->output->bytesused, data, size);/a\\ do { static unsigned int __dump_idx = 0; char __p[256]; snprintf(__p, sizeof(__p), \"/tmp/K_dump_kdirect/append_%04u_size%u.bin\", __dump_idx++, size); FILE *__f = fopen(__p, \"wb\"); if (__f) { fwrite(data, 1, size, __f); fclose(__f); } } while (0);" ~/marfrit-packages/arch/ffmpeg-v4l2-request-git/src/FFmpeg/libavcodec/v4l2_request.c'
```
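then run `make -j8 libavcodec/libavcodec.so`, scp the result to ampere's `/tmp/lib_patched/`, and run kdirect with `LD_LIBRARY_PATH=/tmp/lib_patched:$LD_LIBRARY_PATH`. Create `/tmp/K_dump_kdirect/` on ampere first: the patch silently skips the write when `fopen` fails.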
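The decode-and-diff core of `/tmp/diff_av1_ctrls.py` can be sketched in C as below. This is not that script: it assumes strace's default C-style string escaping (`\\`, `\"`, `\t`, `\n`, `\v`, `\f`, `\r`, and 1 to 3 digit octal such as `\0` or `\377`), assumes strings were captured untruncated (e.g. with a large `-s`), ignores the `-x` hex form, and skips matching controls by CID across calls. `strace_unescape()` and `byte_diff()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decodes the body of a strace-quoted string (without the surrounding
 * quotes) into out[]. Returns bytes written, or -1 on malformed input or
 * if out[] is too small. */
static long strace_unescape(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (*s) {
        unsigned char c = (unsigned char)*s++;
        if (c == '\\') {
            char e = *s++;
            switch (e) {
            case 't': c = '\t'; break;
            case 'n': c = '\n'; break;
            case 'v': c = '\v'; break;
            case 'f': c = '\f'; break;
            case 'r': c = '\r'; break;
            case '\\': case '"': c = (unsigned char)e; break;
            default:
                if (e >= '0' && e <= '7') {
                    unsigned v = (unsigned)(e - '0');
                    /* up to two more octal digits */
                    for (int k = 0; k < 2 && *s >= '0' && *s <= '7'; k++)
                        v = v * 8 + (unsigned)(*s++ - '0');
                    if (v > 255)
                        return -1;
                    c = (unsigned char)v;
                } else {
                    return -1; /* unknown escape or trailing backslash */
                }
            }
        }
        if (n == cap)
            return -1;
        out[n++] = c;
    }
    return (long)n;
}

/* Prints each differing offset and returns the number of differing bytes;
 * a length mismatch counts every extra byte as a difference. */
static size_t byte_diff(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen, const char *label)
{
    size_t n = alen < blen ? alen : blen, diffs = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            printf("%s +%zu: %02x != %02x\n", label, i, a[i], b[i]);
            diffs++;
        }
    }
    if (alen != blen) {
        printf("%s: length %zu != %zu\n", label, alen, blen);
        diffs += alen > blen ? alen - blen : blen - alen;
    }
    return diffs;
}
```

Reading up to three octal digits matches strace's habit of padding an escape to three digits when the next character is itself a digit, so `\0001` decodes as a NUL followed by `'1'`.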
## Stance
Phase 3 closes with **AV1 hardware decode WORKING for the common cases**: simple intra-only (10/10), grain-free inter (test_av1), and grain-IDR frames (3/10 on av1_larger). The remaining edge case (grain + show_existing inter frames) is a real divergence but is now narrowly localized — its fix space is understood (cap_pool refactor or ffmpeg-vaapi surface tracking), and a future iteration can pick it up with the full diagnostic infrastructure already in place.
Task #21 carries the resumption details.