ampere-av1-enablement/phase0_findings.md
claude-noether f9562972bd Phase 0: AV1 hw decode on ampere VERIFIED bit-perfect first-try
Pivoted from ampere-vp9-enablement (closed at structural impossibility
on rkvdec/vdpu381). Janet PIVOT verdict pointed at AV1 — verification
shows it works out-of-the-box on mainline 7.0.0-rc3:

- Kernel driver: drivers/media/platform/verisilicon/
  rockchip_vpu981_hw_av1_dec.c (in-tree, loaded as hantro-vpu)
- Hardware: vpu981 dedicated AV1 IP at fdc70000 (separate from rkvdec)
- V4L2 node: /dev/video4 enumerates AV1F format
- Userspace: ffmpeg -hwaccel v4l2request kdirect path works

Verification: byte-compare HW (hantro-vpu) vs SW (libdav1d) on two
AOM test vectors:
  - av1-1-b8-01-size-208x208.ivf  (2 frames):  100.0000% exact match
  - av1-1-b8-23-film_grain-50.ivf (10 frames): 100.0000% exact match
    per frame, including AV1 film_grain post-processing

Pivot outcome:
- VP9 campaign: 10 iterations + 2 architect reviews → structural
  impossibility (kernel-side gap that needs upstream/Collabora coord)
- AV1 verification: 0 iterations → bit-perfect first try

The "enablement campaign" framing is mostly inappropriate for AV1 —
this is a verification campaign. Real upstream work was done by
Verisilicon + Collabora; we just confirm it works on ampere.

Optional follow-ups (out of Phase 0 scope):
A. libva backend AV1 dispatch (enables VAAPI consumers)
B. Fluster AV1-TEST-VECTORS comprehensive validation
C. 1080p/4K real-world AV1 stress test

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:00:02 +00:00


Phase 0 findings — AV1 on RK3588 ampere is already a working mainline path

Date: 2026-05-17 09:00. Campaign opened immediately after the sibling ampere-vp9-enablement closed at structural-impossibility (10 failed iterations of VP9-on-vdpu381 register tuning; Janet PIVOT verdict recommended AV1).

Substrate

| Component | Where | Status |
|---|---|---|
| Kernel AV1 driver | drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c (mainline 7.0) | Loaded as hantro-vpu module |
| V4L2 device node | /dev/video4 on ampere | Enumerates AV1F (AV1 Frame, compressed) |
| Hardware IP | vpu981 @ fdc70000 (RK3588 dedicated AV1 decoder) | Active, IOMMU mapped |
| Userspace path 1 | ffmpeg -hwaccel v4l2request (kdirect) | Bit-perfect verified |
| Userspace path 2 | libva -hwaccel vaapi via libva-v4l2-request-fourier | NOT implemented (no av1.c in backend) |
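The AV1F entry reported by the V4L2 node is a fourcc, i.e. four ASCII characters packed into a 32-bit value. As a minimal sketch (the helper and constant names are illustrative and not part of the campaign tooling), the packing done by the kernel's v4l2_fourcc() macro can be reproduced like this, which gives the numeric value a format enumeration on the node would report:

```python
def v4l2_fourcc(code: str) -> int:
    """Pack a 4-character code the way the kernel's v4l2_fourcc() macro does."""
    if len(code) != 4:
        raise ValueError("fourcc must be exactly 4 characters")
    a, b, c, d = (ord(ch) for ch in code)
    # First character lands in the least significant byte.
    return a | (b << 8) | (c << 16) | (d << 24)


def fourcc_to_str(value: int) -> str:
    """Inverse of v4l2_fourcc(): unpack a 32-bit value into its 4 characters."""
    return "".join(chr((value >> shift) & 0xFF) for shift in (0, 8, 16, 24))


# Hypothetical constant name; the value is derived from the 'AV1F' code above.
AV1_FRAME_FOURCC = v4l2_fourcc("AV1F")
print(hex(AV1_FRAME_FOURCC), fourcc_to_str(AV1_FRAME_FOURCC))
```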

Verification methodology (per feedback_compare_hw_against_sw_reference)

Two AOM test vectors were each decoded twice: once via HW (v4l2request → hantro-vpu) and once via SW (libdav1d). The raw NV12 outputs were then byte-compared.
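The exact ffmpeg invocations are not recorded here, so the following is only a sketch of what the two decode runs might look like. The v4l2request hwaccel name comes from this document; forcing ffmpeg's native av1 decoder on the HW path, and the output file name, are assumptions.

```python
import shlex


def hw_decode_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg argv for the hardware path (v4l2request -> hantro-vpu)."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "v4l2request",  # hwaccel name as given in this document
        "-c:v", "av1",              # assumption: select the native AV1 decoder that hosts hwaccels
        "-i", src,
        "-pix_fmt", "nv12", "-f", "rawvideo", dst,
    ]


def sw_decode_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg argv for the software reference path (libdav1d)."""
    return [
        "ffmpeg", "-y",
        "-c:v", "libdav1d",
        "-i", src,
        "-pix_fmt", "nv12", "-f", "rawvideo", dst,
    ]


# Output names are hypothetical; they only follow the /tmp/hw-av1*.nv12 pattern.
print(shlex.join(hw_decode_cmd("/tmp/test_av1.ivf", "/tmp/hw-av1.nv12")))
print(shlex.join(sw_decode_cmd("/tmp/test_av1.ivf", "/tmp/sw-av1.nv12")))
```

Writing both outputs as raw NV12 keeps the comparison a plain byte-for-byte check with no container or pixel-format differences in the way.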

Test 1: av1-1-b8-01-size-208x208.ivf (small smoke test)

  • 2 frames, 208×208 yuv420p
  • Both HW and SW produced 129792 bytes (= 2 × 208 × 208 × 1.5)
  • 100.0000% exact match

Test 2: av1-1-b8-23-film_grain-50.ivf (complex AV1 feature: film grain)

  • 10 frames, 352×288 yuv420p, includes AV1 film_grain synthesis
  • Both produced 1520640 bytes (= 10 × 352 × 288 × 1.5)
  • All 10 frames: 100.0000% exact match

Film grain is one of AV1's more complex post-processing features (decoder-side noise synthesis with provided model parameters); bit-perfect match here is a strong signal that the HW decoder's spec compliance is solid.
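The size arithmetic and per-frame match figures above can be reproduced with a short script. This is a sketch under stated assumptions (8-bit 4:2:0 output, even dimensions, both outputs small enough to hold in memory); the function names are illustrative and not taken from the campaign's tooling.

```python
def frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit 4:2:0 frame: full-resolution luma plus half-size chroma."""
    return width * height * 3 // 2


def per_frame_match(hw: bytes, sw: bytes, width: int, height: int) -> list[float]:
    """Return the percentage of identical bytes for each frame."""
    size = frame_bytes(width, height)
    if len(hw) != len(sw) or len(hw) % size != 0:
        raise ValueError("outputs differ in length or are not a whole number of frames")
    rates = []
    for off in range(0, len(hw), size):
        a, b = hw[off:off + size], sw[off:off + size]
        same = sum(x == y for x, y in zip(a, b))
        rates.append(100.0 * same / size)
    return rates


# Expected raw sizes for the two test vectors described above.
print(2 * frame_bytes(208, 208))    # 129792
print(10 * frame_bytes(352, 288))   # 1520640
```

With the two output files read into memory, `all(r == 100.0 for r in per_frame_match(hw, sw, 352, 288))` would express the bit-perfect criterion for Test 2.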

What this verifies

  1. The mainline rockchip_vpu981_hw_av1_dec.c driver is functional on ampere's 7.0.0-rc3 kernel
  2. The hantro-vpu framework correctly routes AV1 to vpu981 (not to vdpu381 rkvdec)
  3. ffmpeg's v4l2request hwaccel correctly invokes the kernel AV1 path
  4. HW output matches libdav1d SW reference at byte level

What this does NOT verify

  • libva backend AV1 support (still completely absent from libva-v4l2-request-fourier)
  • Higher resolutions (1080p/4K AV1 — current tests are 208×208 + 352×288)
  • Longer sequences (current tests are 2 + 10 frames)
  • Profile 1/2 (high-bit-depth, 10-bit) — current tests are 8-bit Profile 0
  • Real-time-ish workload (current tests are tiny vectors)

Why this campaign closed at Phase 0

Unlike VP9 (where the gap was kernel-side register-layout incompleteness requiring multi-week porting), AV1 was already enabled by upstream Verisilicon + Collabora work and just needed verification. The "enablement campaign" framing is mostly inappropriate here — the right word is "verification campaign".

Open scope (optional follow-ups)

A: Libva backend AV1 support (multi-day work)

Add ~/src/libva-v4l2-request-fourier/src/av1.{c,h} to dispatch VAAPI AV1 controls (VADecPictureParameterBufferAV1, VASliceParameterBufferAV1, etc.) onto the V4L2_CID_STATELESS_AV1_* control set. Pattern mirrors existing VP9 + HEVC dispatch. Would enable VAAPI consumers (VLC, mpv --hwdec=vaapi, GStreamer-VAAPI, browsers).
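The dispatch described above amounts to a routing table from VAAPI buffer types to V4L2 stateless AV1 controls. The sketch below is conceptual only: the real backend would be C, and the assignment of buffer types to controls is an assumption about how such a backend could be organized, not a description of existing code. The control IDs themselves are the ones defined in the kernel's stateless AV1 uAPI.

```python
# Conceptual routing table; the grouping is an assumption, not existing backend code.
AV1_DISPATCH = {
    "VAPictureParameterBufferType": [
        "V4L2_CID_STATELESS_AV1_SEQUENCE",
        "V4L2_CID_STATELESS_AV1_FRAME",
        "V4L2_CID_STATELESS_AV1_FILM_GRAIN",
    ],
    "VASliceParameterBufferType": ["V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY"],
    # Bitstream bytes go into the OUTPUT queue buffer, not into a control.
    "VASliceDataBufferType": [],
}


def controls_for(buffer_types: list[str]) -> list[str]:
    """Ordered, de-duplicated V4L2 controls to fill for one picture's VA buffers."""
    controls: list[str] = []
    for buf in buffer_types:
        if buf not in AV1_DISPATCH:
            raise KeyError(f"no AV1 dispatch for {buf}")
        for ctrl in AV1_DISPATCH[buf]:
            if ctrl not in controls:
                controls.append(ctrl)
    return controls


print(controls_for(["VAPictureParameterBufferType", "VASliceParameterBufferType"]))
```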

Useful if the ampere desktop has consumers that prefer VAAPI over direct v4l2request.

B: Fluster AV1-TEST-VECTORS comprehensive run

Set up gst-launch-1.0 + Fluster GStreamer-AV1-V4L2SL-Gst1.0 test runner. Get a quantitative pass-rate against the canonical AV1 conformance suite.
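As a rough sketch of that follow-up, the run could be driven and summarized as below. The decoder and suite names come from this document; the results dictionary is hypothetical sample data for illustration, not measured output, and the script location is assumed to be a Fluster checkout in the current directory.

```python
def fluster_cmd(decoder: str = "GStreamer-AV1-V4L2SL-Gst1.0",
                suite: str = "AV1-TEST-VECTORS") -> list[str]:
    """argv for a Fluster run of one decoder against one test suite."""
    return ["./fluster.py", "run", "-d", decoder, "-ts", suite]


def pass_rate(results: dict[str, bool]) -> float:
    """Percentage of vectors that passed."""
    if not results:
        raise ValueError("no results to summarize")
    return 100.0 * sum(results.values()) / len(results)


# Hypothetical per-vector outcomes, keyed by vector name.
sample = {
    "av1-1-b8-01-size-208x208": True,
    "av1-1-b8-23-film_grain-50": True,
}
print(" ".join(fluster_cmd()))
print(f"{pass_rate(sample):.1f}% of {len(sample)} vectors passed")
```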

C: Real-world bitstream verification

Encode + decode a real 1080p/4K AV1 sample (e.g., a 30s clip from a Netflix-encoded AV1 sample, if available). Verify bit-perfect at scale.
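At 4K, one 8-bit 4:2:0 frame is 3840 × 2160 × 1.5 = 12,441,600 bytes, so a 30 s clip at an assumed 30 fps would produce roughly 11 GB of raw output per decode path. Loading both outputs into memory is impractical at that size, so a streaming comparison is a better fit. A minimal sketch, assuming fixed-size 8-bit frames and ordinary buffered file objects:

```python
import io
from typing import BinaryIO, Optional


def first_mismatch(hw: BinaryIO, sw: BinaryIO, frame_size: int) -> Optional[int]:
    """Return the index of the first differing frame, or None if the streams are identical."""
    index = 0
    while True:
        a = hw.read(frame_size)
        b = sw.read(frame_size)
        if a != b:          # also catches one stream ending before the other
            return index
        if not a:           # both streams ended together
            return None
        index += 1


# Tiny in-memory demonstration with 4-byte "frames".
same = first_mismatch(io.BytesIO(b"aaaabbbb"), io.BytesIO(b"aaaabbbb"), 4)
diff = first_mismatch(io.BytesIO(b"aaaabbbb"), io.BytesIO(b"aaaabbXb"), 4)
print(same, diff)
```

For real files, the two streams would be opened with `open(path, "rb")` and `frame_size` set to `3840 * 2160 * 3 // 2` for 4K or `1920 * 1080 * 3 // 2` for 1080p.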

Persistent state

  • Repo: git.reauktion.de/claude-noether/ampere-av1-enablement — this README + Phase 0 close
  • Local working dir: fresnel:~/src/ampere-av1-enablement/
  • Ampere: restored to sibling-campaign-close kernel module (HEVC bit-perfect retained). AV1 path is hantro-vpu (separate from rkvdec which the VP9 campaign attempted to modify).
  • Test vectors: /tmp/test_av1.ivf + /tmp/av1_larger.ivf on both fresnel and ampere
  • Outputs: /tmp/hw-av1*.nv12 + /tmp/sw-av1*.nv12 on ampere (~1.65 MB per decode path, ~3.3 MB total for the two test vectors above)

Verdict

AV1 hardware decode on ampere is WORKING via mainline + ffmpeg-v4l2request, bit-perfect against the SW reference. The pivot from VP9 succeeded.