Janet v1 verdict was AMEND with 1 structural BLOCKER (Q5) + 3
implementation-time risks (F1-F3).
Q5 (BLOCKER) empirically confirmed on ampere: RK3588 has THREE
hantro-vpu instances (legacy MPEG2/VP8 at /dev/video2, encoder at
/dev/video3, vpu981 AV1 at /dev/video4). Current backend's 2-device
fd model (rkvdec + hantro) is "RK3399-shaped knowledge" per its own
comment — silently picks the wrong hantro on RK3588.
v2 fix: add third device slot (video_fd_vpu981 + media_fd_vpu981),
discriminate by V4L2_PIX_FMT_AV1_FRAME capability (not driver name),
extend request_device_kind_for_profile with 'a' kind for VAProfile-
AV1Profile0, extend cap_pool pair-of-flags layout per iter38 pattern.
Q1 amendment: tile_group_entry DYNAMIC_ARRAY size = sizeof * MAX(N,1);
add _Static_assert for kernel uAPI drift catch.
Q2 amendment: VIDIOC_QUERY_EXT_CTRL probe at context init for film_grain
availability; gate per-frame send on the flag.
Q3 PROCEED: per-frame SEQUENCE send (no caching).
Q4 PROCEED: ignore VAOpaqueAV1 (codec_store_buffer has default fallback).
Q6 PROCEED: V4L2_REQUEST_MAX_PROFILES=11 exactly full with AV1; add
comment for future bumps.
F1-F3 implementation risks catalogued for Phase 2 code review:
- mi_col/row_starts sentinel (silent corruption on multi-tile)
- superres_denom AV1_SUPERRES_NUM=8 default offset
- loop_restoration_size[] USES_LR flag gating
File estimate up from 800 to 880 LoC (added vpu981 scaffolding).
Phase 0 test vectors were single-tile (208×208 + 352×288); Phase 3
verification must include multi-tile 1080p+ AV1 to exercise F1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5.8 KiB
Phase 1 plan v2 — Janet amendments
Date: 2026-05-17 09:40. v1 returned AMEND from Janet with one structural BLOCKER (Q5) + 3 implementation-time risks. v2 addresses all six findings.
Q5 BLOCKER: vpu981 is a THIRD physical hantro device on RK3588
Empirically confirmed on ampere via v4l2-ctl --device:
/dev/video1 rkvdec — rkvdec (H.264, HEVC, VP9-broken)
/dev/video2 hantro-vpu — rockchip,rk3568-vpu-dec (MPEG2, VP8, ...)
/dev/video3 hantro-vpu — rockchip,rk3588-vepu121-enc (encoder, n/a)
/dev/video4 hantro-vpu — rockchip,rk3588-av1-vpu-dec (vpu981 = AV1!)
Three /dev/media* controllers, three hantro-vpu driver instances. The current backend (request.c:386 comment: "RK3399-shaped knowledge") opens exactly two device fds (video_fd_rkvdec + video_fd_hantro) discriminated by driver name alone. On RK3588, find_decoder_device_by_driver("hantro-vpu") returns whichever node appears first — currently /dev/video2 (legacy), and silently skips vpu981.
Structural fix — add a third device slot
struct request_data {
int video_fd_rkvdec; /* existing */
int media_fd_rkvdec; /* existing */
int video_fd_hantro; /* existing — legacy hantro (MPEG2/VP8) */
int media_fd_hantro; /* existing */
int video_fd_vpu981; /* NEW — AV1 hantro on RK3588 */
int media_fd_vpu981; /* NEW */
/* ... */
};
Capability-based probe (not driver-name)
Driver name "hantro-vpu" is ambiguous on RK3588. Discriminate by which OUTPUT format the device supports:
/* New helper in request.c */
static int find_decoder_device_supporting_fmt(u32 pixfmt,
int *video_fd, int *media_fd);
Probe sequence:
- Open all 3
hantro-vpucandidates (legacy, encoder, vpu981). - For each:
VIDIOC_ENUM_FMTonV4L2_BUF_TYPE_VIDEO_OUTPUTand check forV4L2_PIX_FMT_AV1_FRAME. - The match is vpu981; the non-match with
V4L2_PIX_FMT_MPEG2_SLICE(or VP8) is legacy. - Encoder (
vepu121-enc) advertises NO OUTPUT decode formats — naturally falls out.
request_device_kind_for_profile extension
case VAProfileAV1Profile0: return 'a'; /* AV1 → vpu981 */
request_switch_device_for_profile extension
case 'a':
target_video = driver_data->video_fd_vpu981;
target_media = driver_data->media_fd_vpu981;
break;
cap_pool extension
Add a per-vpu981 CAP pool slot. The iter38 architectural pattern (per feedback_multi_device_probe_design) used a 2-flag pair; extending to 3 needs the same pair-of-flags layout for vpu981.
Cleanup (request_terminate)
Close video_fd_vpu981 + media_fd_vpu981 symmetrically.
Q1 AMENDMENT: tile group entry DYNAMIC_ARRAY size formula
Per kernel uAPI, V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY has V4L2_CTRL_FLAG_DYNAMIC_ARRAY. Submit:
ctrls[i].size = sizeof(struct v4l2_ctrl_av1_tile_group_entry) *
MAX(num_entries, 1);
Single-tile case: num_entries=1 not 0 — kernel UB on size=0 dynamic array.
Add _Static_assert(sizeof(struct v4l2_ctrl_av1_tile_group_entry) == 16, "uAPI drift") to catch kernel-uAPI changes.
Q2 AMENDMENT: film_grain probe at context init
Mirror Kwiboo's pattern. At backend context-open time:
struct v4l2_query_ext_ctrl qec = { .id = V4L2_CID_STATELESS_AV1_FILM_GRAIN };
ctx->has_film_grain = (ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, &qec) == 0);
Gate the 4th control on this flag at end_frame.
Q3 PROCEED: SEQUENCE_HEADER per-frame send
No caching. Tiny struct, no perf concern.
Q4 PROCEED: ignore VAOpaqueAV1
Verify with grep VAOpaqueAV1 /usr/include/va/*.h on ampere; if absent, definitive. codec_store_buffer has default: break; fallback already.
Q6 PROCEED with comment
V4L2_REQUEST_MAX_PROFILES = 11 (10 existing + AV1 = exactly 11). Add a comment: "Bumping this requires recompiling all callers — verify libva consumers' profiles[] sizing".
General fill_frame implementation risks (catalogued for Phase 2 review)
| # | Risk | Mitigation |
|---|---|---|
| F1 | tile_info.mi_col_starts[tile_cols] / mi_row_starts[tile_rows] sentinel values must be computed from frame width/height in superblock units (2 * ((frame_width + 7) >> 3)). Silent corruption on multi-tile streams if wrong. |
Mirror Kwiboo lines 238/244 exactly. Add diagnostic dump for first multi-tile frame |
| F2 | superres_denom zero is wrong; must be AV1_SUPERRES_NUM = 8 when use_superres=0. VAAPI's coded_denom needs +AV1_SUPERRES_DENOM_MIN offset. |
Mirror Kwiboo lines 186-189 |
| F3 | loop_restoration.loop_restoration_size[] array gated on USES_LR flag. Inverted flag check would cause decoder hang. |
Mirror Kwiboo lines 281-287 |
Phase 0 test vectors are 208×208 + 352×288 (single-tile). 1080p+ AV1 streams exercise F1; Phase 3 verification should include at least one multi-tile stream.
Updated file breakdown
| File | Lines (revised) |
|---|---|
request.c |
+80 (third-device scaffolding, capability probe, kind 'a') |
request.h |
+5 (vpu981 fd slots) |
codec.c |
+5 |
config.c |
+15 (AV1 profile + AV1 fmt check) |
picture.c |
+30 |
surface.h |
+10 |
NEW av1.h |
+30 |
NEW av1.c |
+700 (per Kwiboo reference, with F1/F2/F3 cautions) |
Makefile.am |
+2 |
Total: ~880 LoC. Up from v1's 800 due to vpu981 scaffolding.
Verdict carry-forward
Janet v1 verdict: AMEND (Q5 structural blocker). v2 addresses Q5 with explicit third-device fd + capability-based probe; Q1 with exact dynamic-array size formula; Q2 with init-time probe; Q4 verified-via-headers PROCEED; Q6 PROCEED-with-comment. Implementation-time risks F1/F2/F3 catalogued for Phase 2 review.
With these amendments, expect PROCEED on re-review.