ampere-av1-enablement Phase 3+5 close — AV1 PASS on all-intra; film_grain+show_existing edge case localized
Closed 2026-05-17 evening. Phase 2.1 + Phase 3 + Phase 5 review iteration. Substrate state on ampere: backend tip 902d6c1 on av1-iter1 branch (matches av1-iter1-imported on noether, pushed to gitea). Module installed at /usr/lib/dri/v4l2_request_drv_video.so. Kernel 7.0.0-rc3-devices+ unchanged.
Phase 5 review outcome (post-Phase-3)
sonnet-architect review found 3 amendments:
- Amendment 1 (loop_restoration remap table), REVERTED: the reviewer proposed a permutation matching Kwiboo + AV1 spec wire encoding. Applying it regressed allintra from 10/10 to 0/10 and turned the test_av1 bit-exact result into a DIFF. Per feedback_review_empirical_over_theoretical, the identity mapping wins. Either VAAPI's `yframe_restoration_type` is already in V4L2-enum order, or vpu981's V4L2 enum interpretation differs from the uAPI header docs.
- Amendment 4 (stale `linked_decode_surface_id`), KEPT: clear in BeginPicture after iter2 Fix 3 release. Prevents spurious link-borrows when ffmpeg-vaapi recycles former display surfaces as decode targets. No-op for non-AV1 codecs.
- Amendment 5 (SEPARATE_UV_DELTA_Q seq flag missing), noted, not actionable: VAAPI doesn't expose `color_config.separate_uv_delta_q`. Needs bitstream-side info.
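To make the Amendment 1 disagreement concrete, here is a minimal sketch of the two candidate mappings. It assumes the reverted permutation was the AV1 spec's `Remap_Lr_Type` table (wire-coded `lr_type` to `FrameRestorationType`); the function names are hypothetical and not taken from the driver.

```python
# FrameRestorationType values as defined in the AV1 specification.
RESTORE_NONE, RESTORE_WIENER, RESTORE_SGRPROJ, RESTORE_SWITCHABLE = 0, 1, 2, 3

# AV1 spec Remap_Lr_Type: the index is the 2-bit lr_type as coded in the bitstream.
REMAP_LR_TYPE = (RESTORE_NONE, RESTORE_SWITCHABLE, RESTORE_WIENER, RESTORE_SGRPROJ)


def remap_wire(lr_type):
    """Candidate A (assumed form of the reverted amendment): input is wire-coded."""
    return REMAP_LR_TYPE[lr_type]


def remap_identity(restoration_type):
    """Candidate B (kept): pass the VAAPI value through unchanged."""
    return restoration_type


def disagreements():
    """Input values for which the two candidates produce different output."""
    return [v for v in range(4) if remap_wire(v) != remap_identity(v)]
```

The two candidates agree only for value 0, so any stream that actually enables loop restoration separates them, which is consistent with the allintra regression showing up immediately.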
Test matrix
| Fixture | Source | Frames | Resolution | Profile | Result |
|---|---|---|---|---|---|
| `test_av1.ivf` | AOM av1-1-b8-01-size-208x208 (also /tmp/test_av1.ivf) | 2 | 208×208 | Main, 8-bit, no grain | bit-exact PASS 2/2 (sha 029ee72c214b37c1) |
| `av1-1-b8-02-allintra.ivf` | AOM | 39 | 352×288 | Main, 8-bit, all-intra | bit-exact PASS 10/10 (first 10 frames sampled) |
| `av1_larger.ivf` / `av1-1-b8-23-film_grain-50.ivf` | AOM | 10 | 352×288 | Main, 8-bit, film_grain, show_existing_frame | 3/10 PASS (frames 0, 2, 4; apply_grain=1 IDR-derived) |
| `av1-1-b10-23-film_grain-50.ivf` | AOM | 10 | 352×288 | Main, 10-bit, film_grain | both libva AND kdirect produce 0 bytes; vpu981 may not support 10-bit AV1 |
What works (validated on hardware)
- AV1 dispatch + control submission: `SEQUENCE`, `FRAME`, `TILE_GROUP_ENTRY`, `FILM_GRAIN` all submitted correctly; strace shows kernel accepts every batch.
- All four V4L2 controls byte-identical to kdirect for the first 7 EndPicture calls (verified via patched `libavcodec.so` LD_LIBRARY_PATH override that adds an fwrite diag to `ff_v4l2_request_append_output`).
- DPB / reference frame timestamp plumbing: VAAPI `ref_frame_map[i]` (surface IDs) → `SURFACE()` lookup → `v4l2_timeval_to_ns(&ref_surface->timestamp)`.
- Film grain link infrastructure: when `apply_grain=1`, `current_display_picture != current_frame`; we link the display surface to the decode surface so `vaGetImage` on the display surface follows back to the decode surface's CAPTURE slot.
- `refresh_frame_flags = 0xff`: VAAPI doesn't expose it; default 0xff matches AV1 spec for KEY/SWITCH frames and kdirect's submission.
- `ENABLE_SUPERRES` gated on `picture->pic_info_fields.bits.use_superres`: matches kdirect; was previously unconditional set-true.
- Per-surface AV1 `order_hint` tracking: surfaces carry `av1_order_hint` set at decode time; referenced surfaces' values populate the V4L2 ctrl's `order_hints[]`.
- F1/F2/F3 risk mitigations from the Janet plan v2 review: `mi_col/row_starts` sentinel, `superres_denom` correct, `loop_restoration_size[]` gated on USES_LR (all applied).
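The reference-timestamp and order-hint plumbing listed above can be modelled in a few lines. This is a sketch, not the driver code: the `Surface` class and `build_reference_arrays` are hypothetical stand-ins, zero-filling unknown surfaces is an assumption, and the real V4L2 arrays may be indexed differently from `ref_frame_map[]`. The nanosecond conversion follows what the kernel's `v4l2_timeval_to_ns()` helper computes.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000
VA_INVALID_SURFACE = 0xFFFFFFFF  # libva's invalid-surface sentinel


@dataclass
class Surface:
    """Hypothetical stand-in for the driver's per-surface state."""
    tv_sec: int
    tv_usec: int
    av1_order_hint: int = 0


def timeval_to_ns(tv_sec, tv_usec):
    """Same arithmetic as v4l2_timeval_to_ns(): seconds and microseconds to ns."""
    return tv_sec * NSEC_PER_SEC + tv_usec * 1000


def build_reference_arrays(ref_frame_map, surfaces):
    """Map VAAPI surface IDs to (timestamps, order_hints) lists, one entry per slot."""
    timestamps, order_hints = [], []
    for surface_id in ref_frame_map:
        surface = surfaces.get(surface_id)
        if surface is None:
            # Assumption: unused or invalid references are submitted as zero.
            timestamps.append(0)
            order_hints.append(0)
        else:
            timestamps.append(timeval_to_ns(surface.tv_sec, surface.tv_usec))
            order_hints.append(surface.av1_order_hint)
    return timestamps, order_hints
```

The timestamp is the only handle the kernel has for matching a reference to a previously decoded CAPTURE buffer, so a surface whose timestamp is stale or zero silently references the wrong frame rather than failing.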
What's open (Phase 4 territory)
Remaining 7/10 divergence on av1_larger.ivf localized to:
- ffmpeg-vaapi's AV1 hwaccel issues 2 EXTRA `vaEndPicture` calls on REUSED surfaces (`0x4000008` repeated at t8, `0x4000006` repeated at t9) compared to ffmpeg-v4l2request's 7 calls for the same input.
- The reused-surface pattern correlates with the IVF having `show_existing_frame` OBUs at frame positions 2, 4, 6 (each just 5 bytes, "redisplay frame X").
- Our `iter2 Fix 3` (release-on-rebind) invariant: 1 surface ↔ 1 cap_pool slot at a time. When ffmpeg-vaapi rebinds, prior CAPTURE data is gone.
- Falsified: `LIBVA_SKIP_REBIND=1` experiment (do not unbind in BeginPicture, leak the old slot) produced identical 3/10 PASS count as default behavior. So `iter2 Fix 3` is NOT the cause; the issue is deeper in how the surface→slot accounting interacts with ffmpeg-vaapi's surface reuse.
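The 5-byte stub frames mentioned above are easy to spot by walking the IVF container. The following is a minimal sketch of that walk, not the `/tmp/ivf_split.py` script itself; the function names are hypothetical, and treating "payload of at most 5 bytes" as a show_existing_frame candidate is a heuristic taken from the observation above, not an OBU parse.

```python
import struct


def iter_ivf_frames(data):
    """Yield (index, pts, payload) for each frame in an IVF byte string."""
    if data[:4] != b"DKIF":
        raise ValueError("not an IVF file")
    # Bytes 6..7 hold the file header length (normally 32).
    (header_len,) = struct.unpack_from("<H", data, 6)
    pos, index = header_len, 0
    while pos + 12 <= len(data):
        # Each frame header is a 32-bit size followed by a 64-bit timestamp.
        size, pts = struct.unpack_from("<IQ", data, pos)
        pos += 12
        payload = data[pos:pos + size]
        if len(payload) < size:
            raise ValueError("truncated frame %d" % index)
        yield index, pts, payload
        pos += size
        index += 1


def small_frames(data, max_size=5):
    """Indices of frames small enough to be redisplay stubs (heuristic)."""
    return [i for i, _, payload in iter_ivf_frames(data) if len(payload) <= max_size]
```

Running `small_frames()` over the fixture bytes gives the container positions to line up against the extra `vaEndPicture` calls seen in the strace timeline.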
Hypothesized fix paths (Phase 4)
- Multi-surface-per-slot cap_pool refactor: track {slot → set of surfaces} so when a surface is re-bound, the slot can still serve `vaGetImage` for the surface IDs that previously bound it. Bigger refactor than this iteration.
- Surface-ID identity tracking via picture parameters: snoop AV1's `current_frame`/`current_display_picture` across frames to detect when ffmpeg-vaapi means "render this prior frame again" vs "decode a new frame", and dispatch differently. Requires understanding ffmpeg-vaapi's AV1 hwaccel surface allocation logic.
- ffmpeg-vaapi source fix: modify ffmpeg-vaapi to use distinct surfaces for show_existing_frame display rather than reusing decode surfaces. Cross-package; rejected as default first move.
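As a rough picture of the first path, the toy model below contrasts release-on-rebind with {slot → set of surfaces} bookkeeping. Everything here is hypothetical (the `CapPool` class, its fields, the `keep_history` switch); it only illustrates the accounting, not the driver's cap_pool, and says nothing about whether the refactor resolves the divergence.

```python
class CapPool:
    """Toy CAPTURE-slot pool with two rebind policies."""

    def __init__(self, n_slots, keep_history):
        self.free = list(range(n_slots))
        self.bound = {}     # surface_id -> slot currently bound for decode
        self.history = {}   # slot -> set of surface IDs that have bound it
        self.keep_history = keep_history

    def bind(self, surface_id):
        """Bind a surface to a fresh slot for a new decode."""
        old = self.bound.get(surface_id)
        if old is not None and not self.keep_history:
            # Release-on-rebind: the old slot and its CAPTURE data go back to the pool.
            self.history.pop(old, None)
            self.free.append(old)
        if not self.free:
            raise RuntimeError("cap_pool exhausted")
        slot = self.free.pop(0)
        self.bound[surface_id] = slot
        self.history.setdefault(slot, set()).add(surface_id)
        return slot

    def readable_slots(self, surface_id):
        """Slots that still hold data decoded on behalf of this surface."""
        return sorted(s for s, ids in self.history.items() if surface_id in ids)
```

With `keep_history=True` slots are never returned to the pool, so a real design would also need a retirement rule (for example, dropping a slot once no reference or pending `vaGetImage` can reach it) and a rule for which of several readable slots a given `vaGetImage` should use.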
Commits delivered this iteration
On av1-iter1 branch (both ampere /home/mfritsche/src/libva-v4l2-request-fourier/ + noether /home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/):
902d6c1 ampere-av1 Phase 5 review: stale linked_decode_surface_id clear; remap fix REVERTED
c839b94 ampere-av1 Phase 3 finding: iter2 Fix 3 release is NOT the divergence cause
d7ef0f6 ampere-av1 Phase 3: SEQUENCE byte-equal kdirect; 3/10 frames PASS bit-exact
5803cbc ampere-av1 Phase 3 progress: film_grain link + UPDATE_GRAIN; frame 0 bit-exact
ab79ed5 ampere-av1 Phase 3 in-progress notes: UPDATE_GRAIN segfault; 352x288 still 0-byte
5fb7e36 ampere-av1 Phase 3 fix: wire reference_frame_ts[] from VAAPI ref_frame_map[]
85bcddb v4l2: surface error_idx + errno on VIDIOC_S_EXT_CTRLS failure
9c30ecc ampere-av1 Phase 2.1: implement av1_set_controls body (~500 LoC)
78a9978 ampere-av1 Phase 2 step 4: AV1 dispatch scaffolding compiles and wires
61db76e ampere-av1 Phase 2 step 2: advertise VAProfileAV1Profile0 via libva
bed75c0 ampere-av1 Phase 2 step 1: third-device fd scaffolding for vpu981
Branch pushed to gitea at marfrit/libva-v4l2-request-fourier (no ssh sideband disconnect this time — pre-amnesia me's vp9 branch concern not reproduced for av1-iter1).
Verifier scripts retained
- `/tmp/diff_av1_ctrls.py` on ampere: per-CID byte diff between two strace logs. Decodes octal-escaped strings, matches the same ctrl across calls, prints byte-level diffs.
- `/tmp/ivf_split.py` on ampere: splits an IVF file into per-frame `.bin` files. Reveals AV1 show_existing_frame OBUs as 5-byte stub frames.
- patched `libavcodec.so.62.28.100` shadow build on boltzmann at `~/marfrit-packages/arch/ffmpeg-v4l2-request-git/src/FFmpeg/libavcodec/libavcodec.so.62`; the `.bak` source was restored to clean. Re-apply via:

      ssh boltzmann 'sed -i "/memcpy(pic->output->addr + pic->output->bytesused, data, size);/a\\ do { static unsigned int __dump_idx = 0; char __p[256]; snprintf(__p, sizeof(__p), \"/tmp/K_dump_kdirect/append_%04u_size%u.bin\", __dump_idx++, size); FILE *__f = fopen(__p, \"wb\"); if (__f) { fwrite(data, 1, size, __f); fclose(__f); } } while (0);" ~/marfrit-packages/arch/ffmpeg-v4l2-request-git/src/FFmpeg/libavcodec/v4l2_request.c'

  then `make -j8 libavcodec/libavcodec.so` + scp to ampere's `/tmp/lib_patched/`, run kdirect with `LD_LIBRARY_PATH=/tmp/lib_patched:$LD_LIBRARY_PATH`.
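For anyone rebuilding the control diff from scratch, the core of it is decoding strace's escaped string dumps and comparing bytes. The sketch below shows only that core; it is not `/tmp/diff_av1_ctrls.py`, the function names are hypothetical, and extracting the quoted payloads from strace lines and matching them per CID is left out.

```python
import re

# Single-character escapes that strace emits inside quoted strings.
_SIMPLE = {"n": 10, "t": 9, "r": 13, "v": 11, "f": 12, '"': 34, "\\": 92}
# Either a backslash escape (octal, \xNN, or simple) or one literal character.
_TOKEN = re.compile(r'\\([0-7]{1,3}|x[0-9a-fA-F]{2}|[ntrvf"\\])|(.)', re.S)


def decode_strace_string(text):
    """Turn the inside of a strace-quoted string into raw bytes."""
    out = bytearray()
    for esc, literal in _TOKEN.findall(text):
        if literal:
            out += literal.encode("latin-1")
        elif esc[0] == "x":
            out.append(int(esc[1:], 16))
        elif esc[0] in "01234567":
            out.append(int(esc, 8) & 0xFF)
        else:
            out.append(_SIMPLE[esc])
    return bytes(out)


def byte_diff(a, b):
    """List (offset, a_byte, b_byte) for each difference; None marks a missing byte."""
    diffs = []
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            diffs.append((i, x, y))
    return diffs
```

strace truncates long strings by default, so the logs being compared need to have been captured with a large enough `-s` value for the full control payload to be present.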
Stance
Phase 3 closes with AV1 hardware decode WORKING for the common cases: simple intra-only (10/10), grain-free inter (test_av1), and grain-IDR frames (3/10 on av1_larger). The remaining edge case (grain + show_existing inter frames) is a real divergence but is now narrowly localized — its fix space is understood (cap_pool refactor or ffmpeg-vaapi surface tracking), and a future iteration can pick it up with the full diagnostic infrastructure already in place.
Task #21 carries the resumption details.