Phase 0: AV1 hw decode on ampere VERIFIED bit-perfect first-try

Pivoted from ampere-vp9-enablement (closed at structural impossibility
on rkvdec/vdpu381). Janet PIVOT verdict pointed at AV1; verification
shows it works out-of-the-box on mainline 7.0.0-rc3:
- Kernel driver: drivers/media/platform/verisilicon/
  rockchip_vpu981_hw_av1_dec.c (in-tree, loaded as hantro-vpu)
- Hardware: vpu981 dedicated AV1 IP at fdc70000 (separate from rkvdec)
- V4L2 node: /dev/video4 enumerates AV1F format
- Userspace: ffmpeg -hwaccel v4l2request kdirect path works

Verification: byte-compare HW (hantro-vpu) vs SW (libdav1d) on two
AOM test vectors:
- av1-1-b8-01-size-208x208.ivf (2 frames): 100.0000% exact match
- av1-1-b8-23-film_grain-50.ivf (10 frames): 100.0000% exact match
  per frame, including AV1 film_grain post-processing

Pivot outcome:
- VP9 campaign: 10 iterations + 2 architect reviews → structural
  impossibility (kernel-side gap that needs upstream/Collabora coord)
- AV1 verification: 0 iterations → bit-perfect first try

The "enablement campaign" framing is mostly inappropriate for AV1;
this is a verification campaign. Real upstream work was done by
Verisilicon + Collabora; we just confirm it works on ampere.

Optional follow-ups (out of Phase 0 scope):
A. libva backend AV1 dispatch (enables VAAPI consumers)
B. Fluster AV1-TEST-VECTORS comprehensive validation
C. 1080p/4K real-world AV1 stress test

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,53 @@
# ampere-av1-enablement

AV1 hardware decode verification on Rockchip RK3588 ampere (CoolPi CM5 GenBook).

## Status (2026-05-17 09:00)

**VERIFIED WORKING bit-perfect first-try** using mainline 7.0.0-rc3 + ffmpeg-v4l2request kdirect path on the hantro `vpu981` AV1 driver. Zero new code required.

Sibling campaign [ampere-vp9-enablement](https://git.reauktion.de/claude-noether/ampere-vp9-enablement) closed at structural-impossibility on rkvdec/vdpu381 VP9; Janet PIVOT verdict pointed to AV1; verification confirmed AV1 works out-of-the-box.

## Verification

```
$ ssh ampere
$ ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime \
    -i /tmp/av1_larger.ivf -vf 'hwdownload,format=nv12' \
    -f rawvideo -pix_fmt nv12 /tmp/hw-av1-all.nv12
[AVHWFramesContext] Using V4L2 media driver hantro-vpu (7.0.0) for AV1F
```

Byte-compare against ffmpeg's libdav1d SW reference, all 10 frames of the av1-1-b8-23-film_grain-50.ivf test vector (352×288, includes film-grain feature):

```
frame 0: exact=100.0000%
frame 1: exact=100.0000%
...
frame 9: exact=100.0000%
```

Smaller test vector (av1-1-b8-01-size-208x208.ivf, 2 frames): also 100% match.
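
The per-frame percentages above can be reproduced with a short comparison script. What follows is a minimal Python sketch, not the tool actually used in this campaign. It assumes both files are tightly packed 8-bit NV12 with even dimensions; the function names are illustrative.

```python
from pathlib import Path


def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one tightly packed 8-bit NV12 frame (Y plane + interleaved UV)."""
    return width * height * 3 // 2


def per_frame_match(hw: bytes, sw: bytes, frame_bytes: int) -> list[float]:
    """Percentage of byte-identical positions, computed separately per frame."""
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    if len(hw) != len(sw):
        raise ValueError(f"size mismatch: hw={len(hw)} sw={len(sw)}")
    if len(hw) % frame_bytes:
        raise ValueError("file size is not a whole number of frames")
    results = []
    for start in range(0, len(hw), frame_bytes):
        a = hw[start:start + frame_bytes]
        b = sw[start:start + frame_bytes]
        same = sum(x == y for x, y in zip(a, b))
        results.append(100.0 * same / frame_bytes)
    return results


def report(hw_path: str, sw_path: str, width: int, height: int) -> None:
    """Print one 'frame N: exact=P%' line per frame."""
    matches = per_frame_match(
        Path(hw_path).read_bytes(),
        Path(sw_path).read_bytes(),
        nv12_frame_bytes(width, height),
    )
    for index, pct in enumerate(matches):
        print(f"frame {index}: exact={pct:.4f}%")
```

A call such as `report("/tmp/hw-av1-all.nv12", "/tmp/sw-av1-all.nv12", 352, 288)` would print lines in the same shape as the listing above; the SW file name here is an assumption chosen to mirror the HW one. Reporting a percentage per frame, rather than a single pass/fail, makes it easier to see whether a mismatch is confined to one frame or propagates through reference frames.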

## Driver stack

- IP: `vpu981` (Rockchip's dedicated AV1 hardware on RK3588, MMIO at fdc70000)
- Kernel driver: `drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c` (in-tree)
- DT compatible: presumably `rockchip,rk3588-av1-vpu` (the string mainline's hantro driver matches for vpu981); driver load verified via `lsmod | grep hantro_vpu`
- V4L2 node: `/dev/video4` (enumerates AV1F format)
- Userspace path: ffmpeg `-hwaccel v4l2request` (kdirect, not libva)
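
The node number can differ between boots or kernel configs, so it may be worth checking for AV1F programmatically rather than hard-coding `/dev/video4`. The sketch below is one way to do that in Python, assuming `v4l2-ctl` from v4l-utils is installed and that the coded format is listed on the OUTPUT queue (as is usual for stateless decoders). The helper names are illustrative.

```python
import re
import subprocess

# Matches entries such as:  [0]: 'AV1F' (AV1 Frame, compressed)
FOURCC_RE = re.compile(r"\[\d+\]:\s+'(.{4})'")


def coded_formats(listing: str) -> set[str]:
    """Extract the FourCC codes from `v4l2-ctl --list-formats-out` text."""
    return set(FOURCC_RE.findall(listing))


def node_supports_av1(device: str = "/dev/video4") -> bool:
    """Return True if the given V4L2 node lists AV1F as a coded format."""
    out = subprocess.run(
        ["v4l2-ctl", "-d", device, "--list-formats-out"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return "AV1F" in coded_formats(out)
```

Looping `node_supports_av1` over the available `/dev/video*` nodes would locate the vpu981 decoder without relying on enumeration order.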

## What's NOT done

- **libva-v4l2-request-fourier backend AV1 dispatch**: the backend has no AV1 codec module. ffmpeg-v4l2request kdirect works without libva. Adding libva AV1 support would make AV1 available to other VAAPI consumers (VLC, mpv with VA-API, GStreamer-VAAPI, browsers via VAAPI). Estimated effort: 1-2 days mirroring the existing HEVC/H.264/VP9 dispatch patterns in `~/src/libva-v4l2-request-fourier/src/` (sibling repo).
- Fluster `AV1-TEST-VECTORS` comprehensive validation (only ran 2 of the AOM test vectors).
- Stress test (long bitstream, 1080p+, complex features beyond film_grain).

## Out of scope

- Adding AV1 to rkvdec/vdpu381 (would duplicate the working hantro-vpu981 path)
- Reviving the failed VP9 work from sibling campaign

## Process

This is a verification campaign more than an enablement campaign. The work was done upstream by Verisilicon + Collabora; this repo documents that it works on ampere out-of-the-box.
@@ -0,0 +1,78 @@
# Phase 0 findings: AV1 on RK3588 ampere is already a working mainline path

Date: 2026-05-17 09:00. Campaign opened immediately after the sibling ampere-vp9-enablement closed at structural-impossibility (10 failed iterations of VP9-on-vdpu381 register tuning; Janet PIVOT verdict recommended AV1).

## Substrate

| Component | Where | Status |
|---|---|---|
| Kernel AV1 driver | `drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c` (mainline 7.0) | Loaded as `hantro-vpu` module |
| V4L2 device node | `/dev/video4` on ampere | Enumerates `AV1F (AV1 Frame, compressed)` |
| Hardware IP | vpu981 @ fdc70000 (RK3588 dedicated AV1 decoder) | Active, IOMMU mapped |
| Userspace path 1 | ffmpeg `-hwaccel v4l2request` (kdirect) | **Bit-perfect verified** |
| Userspace path 2 | libva `-hwaccel vaapi` via libva-v4l2-request-fourier | NOT implemented (no `av1.c` in backend) |

## Verification methodology (per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md))

Two AOM test vectors were each decoded twice: once via HW (`v4l2request → hantro-vpu`) and once via SW (libdav1d). The raw NV12 outputs were then byte-compared.
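
The two decode passes can be scripted so that both outputs always come from the same input file. The Python sketch below builds the two ffmpeg command lines. The HW arguments follow the invocation recorded in the README, while the SW arguments (`-c:v libdav1d` with the same raw NV12 output settings) are an assumption about how the reference was produced. `-hwaccel v4l2request` requires the patched ffmpeg build used on ampere, and `-y` is added here only to allow reruns.

```python
import subprocess


def hw_decode_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg argv for the v4l2request (hantro-vpu) decode path."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "v4l2request",
        "-hwaccel_output_format", "drm_prime",
        "-i", src,
        "-vf", "hwdownload,format=nv12",
        "-f", "rawvideo", "-pix_fmt", "nv12", dst,
    ]


def sw_decode_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg argv for the libdav1d software reference (assumed form)."""
    return [
        "ffmpeg", "-y",
        "-c:v", "libdav1d",  # decoder choice must precede -i
        "-i", src,
        "-f", "rawvideo", "-pix_fmt", "nv12", dst,
    ]


def decode_both(src: str, hw_out: str, sw_out: str) -> None:
    """Run both decodes; raises CalledProcessError if either fails."""
    for cmd in (hw_decode_cmd(src, hw_out), sw_decode_cmd(src, sw_out)):
        subprocess.run(cmd, check=True)
```

Writing both outputs as NV12 keeps the comparison a plain byte compare, since neither side needs a pixel-format conversion afterwards.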

### Test 1: av1-1-b8-01-size-208x208.ivf (small smoke test)

- 2 frames, 208×208 yuv420p
- Both HW and SW produced 129792 bytes (= 2 × 208 × 208 × 1.5)
- **100.0000% exact match**

### Test 2: av1-1-b8-23-film_grain-50.ivf (complex AV1 feature: film grain)

- 10 frames, 352×288 yuv420p, includes AV1 film_grain synthesis
- Both produced 1520640 bytes (= 10 × 352 × 288 × 1.5)
- **All 10 frames: 100.0000% exact match**

Film grain is one of AV1's more complex post-processing features (decoder-side noise synthesis from model parameters carried in the bitstream); a bit-perfect match here is a good sign for the HW decoder's spec compliance on this feature.
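
The byte counts quoted for both tests double as a cheap completeness check: a truncated decode would not divide into whole frames. A small Python sketch of that check, assuming tightly packed 8-bit NV12 (1.5 bytes per pixel) and even dimensions; the function name is illustrative.

```python
def frames_in(size_bytes: int, width: int, height: int) -> int:
    """Number of whole NV12 frames in a raw file of the given size."""
    if width % 2 or height % 2:
        raise ValueError("NV12 with 4:2:0 chroma needs even dimensions")
    frame_bytes = width * height * 3 // 2
    frames, remainder = divmod(size_bytes, frame_bytes)
    if remainder:
        raise ValueError(f"{remainder} trailing bytes: output is not whole frames")
    return frames


# Sizes reported for the two test vectors.
print(frames_in(129792, 208, 208))   # 2
print(frames_in(1520640, 352, 288))  # 10
```

In practice the first argument would come from `os.path.getsize()` on the HW and SW output files, checked before the byte compare is run.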

## What this verifies

1. The mainline `rockchip_vpu981_hw_av1_dec.c` driver is functional on ampere's 7.0.0-rc3 kernel
2. The hantro-vpu framework correctly routes AV1 to vpu981 (not to vdpu381 rkvdec)
3. ffmpeg's v4l2request hwaccel correctly invokes the kernel AV1 path
4. HW output matches libdav1d SW reference at byte level

## What this does NOT verify

- libva backend AV1 support (still completely absent from `libva-v4l2-request-fourier`)
- Higher resolutions (1080p/4K AV1; current tests are 208×208 + 352×288)
- Longer sequences (current tests are 2 + 10 frames)
- 10-bit content and Profiles 1/2 (4:4:4, 4:2:2, 12-bit); current tests are 8-bit 4:2:0 Profile 0
- Real-time or sustained workloads (current tests are tiny vectors)

## Why this campaign closed at Phase 0

Unlike VP9 (where the gap was kernel-side register-layout incompleteness requiring multi-week porting), AV1 was already enabled by upstream Verisilicon + Collabora work and just needed verification. The "enablement campaign" framing is mostly inappropriate here; the right term is "verification campaign".

## Open scope (optional follow-ups)

### A: Libva backend AV1 support (multi-day work)

Add `~/src/libva-v4l2-request-fourier/src/av1.{c,h}` to dispatch VAAPI AV1 buffers (VADecPictureParameterBufferAV1, VASliceParameterBufferAV1, etc.) onto the V4L2_CID_STATELESS_AV1_* control set. Pattern mirrors existing VP9 + HEVC dispatch. Would enable VAAPI consumers (VLC, mpv `--hwdec=vaapi`, GStreamer-VAAPI, browsers).

Useful if the ampere desktop has consumers that prefer VAAPI over direct v4l2request.
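
As a planning aid for this follow-up, the dispatch can be outlined as a table from VA-API buffer structs to the V4L2 stateless AV1 controls they would have to populate. The Python sketch below is only such an outline, not backend code. The four control names are the ones defined in the kernel uAPI, and the picture-parameter struct is named `VADecPictureParameterBufferAV1` in libva's `va_dec_av1.h`. The grouping itself is an unverified assumption that would need checking field by field.

```python
# The four stateless AV1 controls defined by the kernel uAPI.
V4L2_AV1_CONTROLS = (
    "V4L2_CID_STATELESS_AV1_SEQUENCE",
    "V4L2_CID_STATELESS_AV1_FRAME",
    "V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY",
    "V4L2_CID_STATELESS_AV1_FILM_GRAIN",
)

# Assumed grouping: which VA-API struct would feed which control(s).
VA_TO_V4L2 = {
    # Carries sequence-level fields, frame header fields and film grain info.
    "VADecPictureParameterBufferAV1": (
        "V4L2_CID_STATELESS_AV1_SEQUENCE",
        "V4L2_CID_STATELESS_AV1_FRAME",
        "V4L2_CID_STATELESS_AV1_FILM_GRAIN",
    ),
    # One per tile: offsets, sizes and tile coordinates.
    "VASliceParameterBufferAV1": (
        "V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY",
    ),
}


def uncovered_controls(mapping: dict[str, tuple[str, ...]] = VA_TO_V4L2) -> list[str]:
    """Controls that no VA-API buffer in the mapping would populate."""
    covered = {ctrl for ctrls in mapping.values() for ctrl in ctrls}
    return sorted(set(V4L2_AV1_CONTROLS) - covered)
```

An empty result from `uncovered_controls()` only says that every control has a candidate source; it says nothing about whether each field can actually be derived from the VA-API side.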

### B: Fluster AV1-TEST-VECTORS comprehensive run

Set up `gst-launch-1.0` + Fluster GStreamer-AV1-V4L2SL-Gst1.0 test runner. Get a quantitative pass-rate against the canonical AV1 conformance suite.
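
A Fluster run of this kind usually comes down to two invocations: download the suite, then run it against the chosen decoder. The Python sketch below builds those command lines. The `./fluster.py` path and the single-job default are assumptions (one job avoids contention on the single vpu981 instance), and nothing here has been run on ampere.

```python
import subprocess

SUITE = "AV1-TEST-VECTORS"
DECODER = "GStreamer-AV1-V4L2SL-Gst1.0"


def fluster_download_cmd(fluster: str = "./fluster.py") -> list[str]:
    """argv that fetches the test vectors for the suite."""
    return ["python3", fluster, "download", SUITE]


def fluster_run_cmd(fluster: str = "./fluster.py", jobs: int = 1) -> list[str]:
    """argv that runs the suite against the V4L2 stateless GStreamer decoder."""
    return ["python3", fluster, "run", "-d", DECODER, "-ts", SUITE, "-j", str(jobs)]


def run_suite(fluster: str = "./fluster.py") -> None:
    """Download the vectors, then run them; raises if either step fails."""
    subprocess.run(fluster_download_cmd(fluster), check=True)
    subprocess.run(fluster_run_cmd(fluster), check=True)
```

Fluster prints a per-vector result table and a pass count at the end of `run`, which would supply the pass rate this follow-up is after.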

### C: Real-world bitstream verification

Encode (or obtain) and decode a real 1080p/4K AV1 sample (e.g., a 30 s clip from a Netflix-encoded AV1 sample, if available). Verify that the HW output stays bit-perfect against the SW reference at scale.

## Persistent state

- **Repo**: `git.reauktion.de/claude-noether/ampere-av1-enablement` (this README + Phase 0 close)
- **Local working dir**: `fresnel:~/src/ampere-av1-enablement/`
- **Ampere**: restored to the kernel module state left at the close of the sibling campaign (HEVC bit-perfect retained). The AV1 path is hantro-vpu, separate from the rkvdec module that the VP9 campaign attempted to modify.
- **Test vectors**: `/tmp/test_av1.ivf` + `/tmp/av1_larger.ivf` on both fresnel and ampere
- **Outputs**: `/tmp/hw-av1*.nv12` + `/tmp/sw-av1*.nv12` on ampere (~1.5MB total)

## Verdict

AV1 hardware decode on ampere is **WORKING** via mainline + ffmpeg-v4l2request, bit-perfect against the SW reference. The pivot from VP9 succeeded.