4 Commits

Author SHA1 Message Date
claude-noether 695588aaaf Phase 1 v3 amendment: Janet PROCEED with 2 plan-text fixes
Janet re-review verdict on v2: PROCEED with no architecture changes.
Two minor plan-text amendments:

A. cap_pool init for vpu981: at backend init (NOT lazy). Three
   explicit init calls in VA_DRIVER_INIT_FUNC. Lazy init creates a race
   window in which concurrent VP8 (legacy hantro) and AV1 (vpu981)
   sessions can alias cap_pool state across the wrong device.

B. Lock the F1 test vector: av1-1-b8-23-film_grain-50.ivf (AOM data
   set). Already used in Phase 0 kdirect; reuse via libva for byte-
   compare. Default 1080p encodes may be single-tile and never
   exercise F1; a named conformance vector locks the test surface.

Cross-compile hazard documented: _Static_assert validates against
COMPILE-TIME kernel headers. Sibling-campaign pattern is native arm64
build on ampere — no cross-compile in scope. Comment added near the
assert for future Makefile changes.
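
A minimal sketch of the compile-time guard described above. The struct
here is a hypothetical stand-in, not the kernel's definition; real code
would apply the assert to the uAPI struct pulled in from the kernel
headers, and the expected size is likewise an illustrative assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a kernel uAPI struct. In the backend the
 * assert would target the struct from the installed kernel headers.
 */
struct example_tile_group_entry {
	uint32_t tile_offset;
	uint32_t tile_size;
	uint32_t tile_row;
	uint32_t tile_col;
};

/* Size the userspace code was written against (assumed value). */
#define EXAMPLE_EXPECTED_SIZE 16

/*
 * Evaluated by the compiler against whatever headers are on the
 * include path at build time. When cross-compiling, those headers may
 * not match the kernel running on the target, so a passing assert
 * says nothing about the target's layout.
 */
_Static_assert(sizeof(struct example_tile_group_entry) ==
	       EXAMPLE_EXPECTED_SIZE,
	       "uAPI struct size drifted; re-check field mappings");
```

With a native build on the target, the headers and the running kernel
normally come from the same source, which is why the hazard only
matters if the Makefile later gains a cross-compile path.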

Phase 2 entry checklist locked. Plan stack: v1 → v2 (Q5 BLOCKER fix)
→ v3 amendment (Janet PROCEED + 2 fixes). Implementation starts with
request.c third-device scaffolding (smallest atomic change), then
small file edits, then NEW av1.h, then NEW av1.c (~700 LoC), then
build + Phase 3 hardware test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:51:56 +00:00
claude-noether c734a82c41 Phase 1 plan v2: address Janet AMEND verdict
Janet v1 verdict was AMEND with 1 structural BLOCKER (Q5) + 3
implementation-time risks (F1-F3).

Q5 (BLOCKER) empirically confirmed on ampere: RK3588 has THREE
hantro-vpu instances (legacy MPEG2/VP8 at /dev/video2, encoder at
/dev/video3, vpu981 AV1 at /dev/video4). Current backend's 2-device
fd model (rkvdec + hantro) is "RK3399-shaped knowledge" per its own
comment — silently picks the wrong hantro on RK3588.

v2 fix: add third device slot (video_fd_vpu981 + media_fd_vpu981),
discriminate by V4L2_PIX_FMT_AV1_FRAME capability (not driver name),
extend request_device_kind_for_profile with 'a' kind for
VAProfileAV1Profile0, extend cap_pool pair-of-flags layout per iter38
pattern.

Q1 amendment: tile_group_entry DYNAMIC_ARRAY size = sizeof(entry) *
MAX(N, 1); add _Static_assert to catch kernel uAPI drift.
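
The size rule can be sketched as below. The helper name is
hypothetical; the point is only that a frame reporting zero tile
entries still yields a payload of one element rather than zero bytes.

```c
#include <stddef.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Payload size for a dynamic-array control: one element minimum, so
 * the control is never submitted with a zero-length payload.
 */
static size_t dynamic_array_payload_size(size_t entry_size,
					 size_t num_entries)
{
	return entry_size * MAX(num_entries, (size_t)1);
}
```

For a 16-byte entry this gives 16 bytes for N = 0 or N = 1 and 64 bytes
for N = 4.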

Q2 amendment: VIDIOC_QUERY_EXT_CTRL probe at context init for film_grain
availability; gate per-frame send on the flag.

Q3 PROCEED: per-frame SEQUENCE send (no caching).
Q4 PROCEED: ignore VAOpaqueAV1 (codec_store_buffer has default fallback).
Q6 PROCEED: V4L2_REQUEST_MAX_PROFILES=11 exactly full with AV1; add
            comment for future bumps.

F1-F3 implementation risks catalogued for Phase 2 code review:
- mi_col/row_starts sentinel (silent corruption on multi-tile)
- superres_denom AV1_SUPERRES_NUM=8 default offset
- loop_restoration_size[] USES_LR flag gating
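
To illustrate the F1 risk, here is a sketch of filling a tile-start
array with its trailing sentinel. It follows the AV1 specification's
rule that start positions are superblock counts shifted into mode-info
units, with one extra final entry equal to the frame size in mode-info
units. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Fill starts[0..tiles] from per-tile sizes given in superblocks
 * minus one. sb_shift is 4 for 64x64 superblocks and 5 for 128x128.
 * starts[] must have room for tiles + 1 entries.
 */
static void fill_mi_starts(uint32_t *starts,
			   const uint16_t *size_in_sbs_minus_1,
			   unsigned int tiles, unsigned int sb_shift,
			   uint32_t mi_total)
{
	uint32_t sb_pos = 0;
	unsigned int i;

	for (i = 0; i < tiles; i++) {
		starts[i] = sb_pos << sb_shift;
		sb_pos += (uint32_t)size_in_sbs_minus_1[i] + 1;
	}

	/*
	 * Sentinel: end of the last tile. On a single-tile stream a
	 * missing sentinel can go unnoticed, which is why multi-tile
	 * content is needed to exercise this path.
	 */
	starts[tiles] = mi_total;
}
```

For a 1920-wide frame with 64x64 superblocks split into two 15-SB tile
columns, mi_cols is 480 and the array becomes {0, 240, 480}.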

File estimate up from 800 to 880 LoC (added vpu981 scaffolding).
Phase 0 test vectors were single-tile (208×208 + 352×288); Phase 3
verification must include multi-tile 1080p+ AV1 to exercise F1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:50:47 +00:00
claude-noether 794bac72df Phase 1 plan: libva-v4l2-request-fourier AV1 dispatch
Goal: VAAPI consumers (mpv, VLC, GStreamer-VAAPI, browsers) can decode
AV1 via libva backend, same HW path that ffmpeg-v4l2request kdirect
already uses bit-perfectly (Phase 0).

Plan ~800 LoC across 7 files (new av1.c ~700 LoC, av1.h, plus edits to
codec.c, config.c, picture.c, surface.h, Makefile.am).

Canonical reference: Kwiboo/FFmpeg v4l2-request-n8.1
libavcodec/v4l2_request_av1.c (636 LoC) — exact field mappings for
v4l2_ctrl_av1_sequence/_frame/_film_grain/_tile_group_entry.

Architectural pattern: existing vp9.c (700+ LoC) in the backend.

6 open architectural questions for Janet review before Phase 2 code:
Q1 4-control batching (vs vp9's 2)
Q2 film_grain conditional vs unconditional submit
Q3 SEQUENCE caching strategy
Q4 VAOpaqueAV1 opaque payload semantics
Q5 vpu981 vs rkvdec device selection in cap_pool
Q6 multi-device probe extension (iter38b pattern + vpu981 for AV1)

Phase 2 starts after Janet sign-off.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:47:28 +00:00
claude-noether f9562972bd Phase 0: AV1 hw decode on ampere VERIFIED bit-perfect first-try
Pivoted from ampere-vp9-enablement (closed at structural impossibility
on rkvdec/vdpu381). Janet PIVOT verdict pointed at AV1 — verification
shows it works out-of-the-box on mainline 7.0.0-rc3:

- Kernel driver: drivers/media/platform/verisilicon/
  rockchip_vpu981_hw_av1_dec.c (in-tree, loaded as hantro-vpu)
- Hardware: vpu981 dedicated AV1 IP at fdc70000 (separate from rkvdec)
- V4L2 node: /dev/video4 enumerates AV1F format
- Userspace: ffmpeg -hwaccel v4l2request kdirect path works

Verification: byte-compare HW (hantro-vpu) vs SW (libdav1d) on two
AOM test vectors:
  - av1-1-b8-01-size-208x208.ivf  (2 frames):  100.0000% exact match
  - av1-1-b8-23-film_grain-50.ivf (10 frames): 100.0000% exact match
    per frame, including AV1 film_grain post-processing

Pivot outcome:
- VP9 campaign: 10 iterations + 2 architect reviews → structural
  impossibility (kernel-side gap that needs upstream/Collabora coord)
- AV1 verification: 0 iterations → bit-perfect first try

The "enablement campaign" framing is mostly inappropriate for AV1 —
this is a verification campaign. Real upstream work was done by
Verisilicon + Collabora; we just confirm it works on ampere.

Optional follow-ups (out of Phase 0 scope):
A. libva backend AV1 dispatch (enables VAAPI consumers)
B. Fluster AV1-TEST-VECTORS comprehensive validation
C. 1080p/4K real-world AV1 stress test

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:00:02 +00:00