Phase 0: AV1 hw decode on ampere VERIFIED bit-perfect first-try

Pivoted from ampere-vp9-enablement (closed at structural impossibility
on rkvdec/vdpu381). Janet PIVOT verdict pointed at AV1 — verification
shows it works out-of-the-box on mainline 7.0.0-rc3:

- Kernel driver: drivers/media/platform/verisilicon/
  rockchip_vpu981_hw_av1_dec.c (in-tree, loaded as hantro-vpu)
- Hardware: vpu981 dedicated AV1 IP at fdc70000 (separate from rkvdec)
- V4L2 node: /dev/video4 enumerates AV1F format
- Userspace: ffmpeg -hwaccel v4l2request kdirect path works

Verification: byte-compare HW (hantro-vpu) vs SW (libdav1d) on two
AOM test vectors:
  - av1-1-b8-01-size-208x208.ivf  (2 frames):  100.0000% exact match
  - av1-1-b8-23-film_grain-50.ivf (10 frames): 100.0000% exact match
    per frame, including AV1 film_grain post-processing

Pivot outcome:
- VP9 campaign: 10 iterations + 2 architect reviews → structural
  impossibility (kernel-side gap that needs upstream/Collabora coord)
- AV1 verification: 0 iterations → bit-perfect first try

The "enablement campaign" framing is mostly inappropriate for AV1 —
this is a verification campaign. Real upstream work was done by
Verisilicon + Collabora; we just confirm it works on ampere.

Optional follow-ups (out of Phase 0 scope):
A. libva backend AV1 dispatch (enables VAAPI consumers)
B. Fluster AV1-TEST-VECTORS comprehensive validation
C. 1080p/4K real-world AV1 stress test

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:00:02 +00:00
commit f9562972bd
2 changed files with 131 additions and 0 deletions
@@ -0,0 +1,53 @@
# ampere-av1-enablement
AV1 hardware decode verification on Rockchip RK3588 ampere (CoolPi CM5 GenBook).
## Status (2026-05-17 09:00)
**VERIFIED WORKING bit-perfect first-try** using mainline 7.0.0-rc3 + ffmpeg-v4l2request kdirect path on the hantro `vpu981` AV1 driver. Zero new code required.
Sibling campaign [ampere-vp9-enablement](https://git.reauktion.de/claude-noether/ampere-vp9-enablement) closed at structural-impossibility on rkvdec/vdpu381 VP9; Janet PIVOT verdict pointed to AV1; verification confirmed AV1 works out-of-the-box.
## Verification
```
$ ssh ampere
$ ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime \
-i /tmp/av1_larger.ivf -vf 'hwdownload,format=nv12' \
-f rawvideo -pix_fmt nv12 /tmp/hw-av1-all.nv12
[AVHWFramesContext] Using V4L2 media driver hantro-vpu (7.0.0) for AV1F
```
Byte-compare against ffmpeg's libdav1d SW reference, all 10 frames of the av1-1-b8-23-film_grain-50.ivf test vector (352×288, includes film-grain feature):
```
frame 0: exact=100.0000%
frame 1: exact=100.0000%
...
frame 9: exact=100.0000%
```
Smaller test vector (av1-1-b8-01-size-208x208.ivf, 2 frames): also 100% match.
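The tool that printed the `frame N: exact=...` lines is not shown here, so the following is only a minimal sketch of how such a per-frame byte comparison could be done. The function names `compare_frames` and `format_report` are hypothetical. It assumes both decodes were written as raw 8-bit NV12 with identical dimensions, so each frame occupies a fixed number of bytes.

```python
from pathlib import Path


def compare_frames(hw: bytes, sw: bytes, frame_size: int) -> list[float]:
    """Return, for each frame, the percentage of byte positions that match."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    if len(hw) != len(sw):
        raise ValueError(f"size mismatch: hw={len(hw)} sw={len(sw)}")
    if len(hw) % frame_size:
        raise ValueError("buffer length is not a whole number of frames")
    results = []
    for offset in range(0, len(hw), frame_size):
        a = hw[offset:offset + frame_size]
        b = sw[offset:offset + frame_size]
        matching = sum(x == y for x, y in zip(a, b))
        results.append(100.0 * matching / frame_size)
    return results


def format_report(percentages: list[float]) -> list[str]:
    """Format results in the same style as the listing above."""
    return [f"frame {i}: exact={pct:.4f}%" for i, pct in enumerate(percentages)]


def compare_files(hw_path: str, sw_path: str, frame_size: int) -> list[str]:
    """Load two raw dumps from disk and produce the per-frame report."""
    hw = Path(hw_path).read_bytes()
    sw = Path(sw_path).read_bytes()
    return format_report(compare_frames(hw, sw, frame_size))
```

For the 352×288 vector, the frame size would be `352 * 288 * 3 // 2` bytes. A call such as `compare_files("/tmp/hw-av1-all.nv12", "/tmp/sw-av1-all.nv12", 352 * 288 * 3 // 2)` would then apply; the SW file name here is illustrative.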
## Driver stack
- IP: `vpu981` (Rockchip's dedicated AV1 hardware on RK3588, MMIO at fdc70000)
- Kernel driver: `drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c` (in-tree)
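- DT compatible: presumably `rockchip,rk3588-av1-vpu`, the string the mainline hantro driver matches for vpu981 (not read back from ampere's device tree; only the module load was verified, via `lsmod | grep hantro_vpu`)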
- V4L2 node: `/dev/video4` (enumerates AV1F format)
- Userspace path: ffmpeg `-hwaccel v4l2request` (kdirect, not libva)
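Because `/dev/videoN` numbering can change between boots or kernels, it may help to identify the decoder node by name rather than by index. The sketch below reads the `name` attribute that the kernel exposes under `/sys/class/video4linux/`. The helper names are hypothetical, and the exact name string the hantro AV1 decoder reports is not recorded in this repo, so the search substring is an assumption to adjust. It does not query pixel formats; confirming `AV1F` still needs something like `v4l2-ctl -d /dev/video4 --list-formats-out`.

```python
from pathlib import Path


def list_video_nodes(sysfs_root: str = "/sys/class/video4linux") -> dict[str, str]:
    """Map each /dev/videoN node to the device name reported in sysfs."""
    root = Path(sysfs_root)
    nodes: dict[str, str] = {}
    if not root.is_dir():
        return nodes
    for entry in sorted(root.glob("video*")):
        name_file = entry / "name"
        if name_file.is_file():
            nodes[f"/dev/{entry.name}"] = name_file.read_text().strip()
    return nodes


def find_nodes(nodes: dict[str, str], needle: str) -> list[str]:
    """Return the device paths whose sysfs name contains the given substring."""
    return [dev for dev, name in nodes.items() if needle in name]
```

Something like `find_nodes(list_video_nodes(), "av1")` could then be used to locate the candidate node, assuming the driver's name string contains `av1`.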
## What's NOT done
- **libva-v4l2-request-fourier backend AV1 dispatch**: the backend has no AV1 codec module. ffmpeg-v4l2request kdirect works without libva. Adding libva AV1 support would make AV1 available to other VAAPI consumers (VLC, mpv with VA-API, GStreamer-VAAPI, browsers via VAAPI). Estimated effort: 1-2 days mirroring the existing HEVC/H.264/VP9 dispatch patterns in `~/src/libva-v4l2-request-fourier/src/` (sibling repo).
- Fluster `AV1-TEST-VECTORS` comprehensive validation (only ran 2 of the AOM test vectors).
- Stress test (long bitstream, 1080p+, complex features beyond film_grain).
## Out of scope
- Adding AV1 to rkvdec/vdpu381 (would duplicate the working hantro-vpu981 path)
- Reviving the failed VP9 work from sibling campaign
## Process
This is a verification campaign more than an enablement campaign. The work was done upstream by Verisilicon + Collabora; this repo documents that it works on ampere out-of-the-box.
@@ -0,0 +1,78 @@
# Phase 0 findings — AV1 on RK3588 ampere is already a working mainline path
Date: 2026-05-17 09:00. Campaign opened immediately after the sibling ampere-vp9-enablement closed at structural-impossibility (10 failed iterations of VP9-on-vdpu381 register tuning; Janet PIVOT verdict recommended AV1).
## Substrate
| Component | Where | Status |
|---|---|---|
| Kernel AV1 driver | `drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c` (mainline 7.0) | Loaded as `hantro-vpu` module |
| V4L2 device node | `/dev/video4` on ampere | Enumerates `AV1F (AV1 Frame, compressed)` |
| Hardware IP | vpu981 @ fdc70000 (RK3588 dedicated AV1 decoder) | Active, IOMMU mapped |
| Userspace path 1 | ffmpeg `-hwaccel v4l2request` (kdirect) | **Bit-perfect verified** |
| Userspace path 2 | libva `-hwaccel vaapi` via libva-v4l2-request-fourier | NOT implemented (no `av1.c` in backend) |
## Verification methodology (per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md))
Two AOM test vectors decoded twice — once via HW (`v4l2request → hantro-vpu`) and once via SW (libdav1d). Byte-compare the raw NV12 outputs.
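The two decodes can be sketched as a pair of ffmpeg invocations. The HW command line is the one recorded in the README. The SW command line is an assumption: it forces the `libdav1d` decoder with `-c:v libdav1d` and writes the same raw NV12 layout. The function names and the idea of driving ffmpeg from Python are illustrative only.

```python
import subprocess


def hw_decode_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg argv for the v4l2request (hantro-vpu) decode, as used in the README."""
    return [
        "ffmpeg", "-hwaccel", "v4l2request",
        "-hwaccel_output_format", "drm_prime",
        "-i", src,
        "-vf", "hwdownload,format=nv12",
        "-f", "rawvideo", "-pix_fmt", "nv12", dst,
    ]


def sw_decode_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg argv for the software reference decode (assumed form, not from the source)."""
    return [
        "ffmpeg", "-c:v", "libdav1d",
        "-i", src,
        "-f", "rawvideo", "-pix_fmt", "nv12", dst,
    ]


def decode_both(src: str, hw_out: str, sw_out: str) -> None:
    """Run both decodes; raises CalledProcessError if either ffmpeg run fails."""
    subprocess.run(hw_decode_cmd(src, hw_out), check=True)
    subprocess.run(sw_decode_cmd(src, sw_out), check=True)
```

Writing both outputs as `rawvideo` in `nv12` keeps the byte layout identical, so the comparison does not need to account for container or pixel-format differences.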
### Test 1: av1-1-b8-01-size-208x208.ivf (small smoke test)
- 2 frames, 208×208 yuv420p
- Both HW and SW produced 129792 bytes (= 2 × 208 × 208 × 1.5)
- **100.0000% exact match**
### Test 2: av1-1-b8-23-film_grain-50.ivf (complex AV1 feature: film grain)
- 10 frames, 352×288 yuv420p, includes AV1 film_grain synthesis
- Both produced 1520640 bytes (= 10 × 352 × 288 × 1.5)
- **All 10 frames: 100.0000% exact match**
Film grain is one of AV1's more complex post-processing features (decoder-side noise synthesis with provided model parameters); bit-perfect match here is a strong signal that the HW decoder's spec compliance is solid.
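The byte counts quoted for both tests follow from the NV12 layout: one full-resolution 8-bit luma plane plus one interleaved Cb/Cr plane at half resolution in each dimension. A small sketch of that arithmetic (the helper names are hypothetical) can serve as a sanity check on output size before any byte comparison:

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit NV12 frame of the given dimensions."""
    luma = width * height
    # Chroma is subsampled 2x2; Cb and Cr are interleaved, hence the factor 2.
    # Odd dimensions round the chroma plane up.
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return luma + chroma


def expected_output_bytes(width: int, height: int, frames: int) -> int:
    """Total size of a raw NV12 dump containing `frames` frames."""
    return frames * nv12_frame_bytes(width, height)


# The two vectors from this report:
small = expected_output_bytes(208, 208, 2)    # 129792
grain = expected_output_bytes(352, 288, 10)   # 1520640
```

If a decode drops or duplicates a frame, the file size departs from this value, which is cheaper to detect than a full byte-compare.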
## What this verifies
1. The mainline `rockchip_vpu981_hw_av1_dec.c` driver is functional on ampere's 7.0.0-rc3 kernel
2. The hantro-vpu framework correctly routes AV1 to vpu981 (not to vdpu381 rkvdec)
3. ffmpeg's v4l2request hwaccel correctly invokes the kernel AV1 path
4. HW output matches libdav1d SW reference at byte level
## What this does NOT verify
- libva backend AV1 support (still completely absent from `libva-v4l2-request-fourier`)
- Higher resolutions (1080p/4K AV1 — current tests are 208×208 + 352×288)
- Longer sequences (current tests are 2 + 10 frames)
- 10-bit and other non-8-bit content: current tests are 8-bit 4:2:0 Profile 0 only. (In AV1, 10-bit 4:2:0 is also Profile 0; Profiles 1/2 add 4:4:4, 4:2:2 and 12-bit.)
- Real-time or sustained workloads (current tests are tiny vectors, so throughput is untested)
## Why this campaign closed at Phase 0
Unlike VP9 (where the gap was kernel-side register-layout incompleteness requiring multi-week porting), AV1 was already enabled by upstream Verisilicon + Collabora work and just needed verification. The "enablement campaign" framing is mostly inappropriate here — the right word is "verification campaign".
## Open scope (optional follow-ups)
### A: Libva backend AV1 support (multi-day work)
Add `~/src/libva-v4l2-request-fourier/src/av1.{c,h}` to dispatch VAAPI AV1 buffers (VADecPictureParameterBufferAV1, VASliceParameterBufferAV1, etc.) onto the V4L2_CID_STATELESS_AV1_* control set. The pattern mirrors the existing VP9 + HEVC dispatch. This would enable VAAPI consumers (VLC, mpv `--hwdec=vaapi`, GStreamer-VAAPI, browsers).
Useful if the ampere desktop has consumers that prefer VAAPI over direct v4l2request.
### B: Fluster AV1-TEST-VECTORS comprehensive run
Set up `gst-launch-1.0` + Fluster GStreamer-AV1-V4L2SL-Gst1.0 test runner. Get a quantitative pass-rate against the canonical AV1 conformance suite.
### C: Real-world bitstream verification
Obtain or encode a real 1080p/4K AV1 sample (e.g., a 30 s clip from a Netflix-encoded AV1 sample, if available), decode it through both the HW and SW paths, and verify that the bit-perfect match holds at scale.
## Persistent state
- **Repo**: `git.reauktion.de/claude-noether/ampere-av1-enablement` — this README + Phase 0 close
- **Local working dir**: `fresnel:~/src/ampere-av1-enablement/`
- **Ampere**: kernel module restored to its state at the close of the sibling campaign (HEVC bit-perfect retained). The AV1 path is hantro-vpu, separate from the rkvdec module that the VP9 campaign attempted to modify.
- **Test vectors**: `/tmp/test_av1.ivf` + `/tmp/av1_larger.ivf` on both fresnel and ampere
- **Outputs**: `/tmp/hw-av1*.nv12` + `/tmp/sw-av1*.nv12` on ampere (~1.5 MB per film-grain output, ~130 KB per 208×208 output)
## Verdict
AV1 hardware decode on ampere is **WORKING** via mainline + ffmpeg-v4l2request, bit-perfect against the SW reference on both tested vectors. The pivot away from VP9 succeeded.