Files
claude-noether c734a82c41 Phase 1 plan v2: address Janet AMEND verdict
Janet v1 verdict was AMEND with 1 structural BLOCKER (Q5) + 3
implementation-time risks (F1-F3).

Q5 (BLOCKER) empirically confirmed on ampere: RK3588 has THREE
hantro-vpu instances (legacy MPEG2/VP8 at /dev/video2, encoder at
/dev/video3, vpu981 AV1 at /dev/video4). Current backend's 2-device
fd model (rkvdec + hantro) is "RK3399-shaped knowledge" per its own
comment — silently picks the wrong hantro on RK3588.

v2 fix: add third device slot (video_fd_vpu981 + media_fd_vpu981),
discriminate by V4L2_PIX_FMT_AV1_FRAME capability (not driver name),
extend request_device_kind_for_profile with 'a' kind for VAProfile-
AV1Profile0, extend cap_pool pair-of-flags layout per iter38 pattern.

Q1 amendment: tile_group_entry DYNAMIC_ARRAY size = sizeof * MAX(N,1);
add _Static_assert for kernel uAPI drift catch.

Q2 amendment: VIDIOC_QUERY_EXT_CTRL probe at context init for film_grain
availability; gate per-frame send on the flag.

Q3 PROCEED: per-frame SEQUENCE send (no caching).
Q4 PROCEED: ignore VAOpaqueAV1 (codec_store_buffer has default fallback).
Q6 PROCEED: V4L2_REQUEST_MAX_PROFILES=11 exactly full with AV1; add
            comment for future bumps.

F1-F3 implementation risks catalogued for Phase 2 code review:
- mi_col/row_starts sentinel (silent corruption on multi-tile)
- superres_denom AV1_SUPERRES_NUM=8 default offset
- loop_restoration_size[] USES_LR flag gating

File estimate up from 800 to 880 LoC (added vpu981 scaffolding).
Phase 0 test vectors were single-tile (208×208 + 352×288); Phase 3
verification must include multi-tile 1080p+ AV1 to exercise F1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:50:47 +00:00

5.8 KiB
Raw Permalink Blame History

Phase 1 plan v2 — Janet amendments

Date: 2026-05-17 09:40. v1 returned AMEND from Janet with one structural BLOCKER (Q5) + 3 implementation-time risks. v2 addresses all six findings.

Q5 BLOCKER: vpu981 is a THIRD physical hantro device on RK3588

Empirically confirmed on ampere via v4l2-ctl --device:

/dev/video1  rkvdec       — rkvdec               (H.264, HEVC, VP9-broken)
/dev/video2  hantro-vpu   — rockchip,rk3568-vpu-dec   (MPEG2, VP8, ...)
/dev/video3  hantro-vpu   — rockchip,rk3588-vepu121-enc (encoder, n/a)
/dev/video4  hantro-vpu   — rockchip,rk3588-av1-vpu-dec (vpu981 = AV1!)

Three /dev/media* controllers, three hantro-vpu driver instances. The current backend (request.c:386 comment: "RK3399-shaped knowledge") opens exactly two device fds (video_fd_rkvdec + video_fd_hantro) discriminated by driver name alone. On RK3588, find_decoder_device_by_driver("hantro-vpu") returns whichever node appears first — currently /dev/video2 (legacy), and silently skips vpu981.

Structural fix — add a third device slot

struct request_data {
    int video_fd_rkvdec;   /* existing */
    int media_fd_rkvdec;   /* existing */
    int video_fd_hantro;   /* existing — legacy hantro (MPEG2/VP8) */
    int media_fd_hantro;   /* existing */
    int video_fd_vpu981;   /* NEW — AV1 hantro on RK3588 */
    int media_fd_vpu981;   /* NEW */
    /* ... */
};

Capability-based probe (not driver-name)

Driver name "hantro-vpu" is ambiguous on RK3588. Discriminate by which OUTPUT format the device supports:

/* New helper in request.c */
static int find_decoder_device_supporting_fmt(u32 pixfmt,
                                              int *video_fd, int *media_fd);

Probe sequence:

  1. Open all 3 hantro-vpu candidates (legacy, encoder, vpu981).
  2. For each: VIDIOC_ENUM_FMT on V4L2_BUF_TYPE_VIDEO_OUTPUT and check for V4L2_PIX_FMT_AV1_FRAME.
  3. The match is vpu981; the non-match with V4L2_PIX_FMT_MPEG2_SLICE (or VP8) is legacy.
  4. Encoder (vepu121-enc) advertises NO OUTPUT decode formats — naturally falls out.

request_device_kind_for_profile extension

case VAProfileAV1Profile0: return 'a';   /* AV1 → vpu981 */

request_switch_device_for_profile extension

case 'a':
    target_video = driver_data->video_fd_vpu981;
    target_media = driver_data->media_fd_vpu981;
    break;

cap_pool extension

Add a per-vpu981 CAP pool slot. The iter38 architectural pattern (per feedback_multi_device_probe_design) used a 2-flag pair; extending to 3 needs the same pair-of-flags layout for vpu981.

Cleanup (request_terminate)

Close video_fd_vpu981 + media_fd_vpu981 symmetrically.

Q1 AMENDMENT: tile group entry DYNAMIC_ARRAY size formula

Per kernel uAPI, V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY has V4L2_CTRL_FLAG_DYNAMIC_ARRAY. Submit:

ctrls[i].size = sizeof(struct v4l2_ctrl_av1_tile_group_entry) *
                MAX(num_entries, 1);

Single-tile case: num_entries=1 not 0 — kernel UB on size=0 dynamic array.

Add _Static_assert(sizeof(struct v4l2_ctrl_av1_tile_group_entry) == 16, "uAPI drift") to catch kernel-uAPI changes.

Q2 AMENDMENT: film_grain probe at context init

Mirror Kwiboo's pattern. At backend context-open time:

struct v4l2_query_ext_ctrl qec = { .id = V4L2_CID_STATELESS_AV1_FILM_GRAIN };
ctx->has_film_grain = (ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, &qec) == 0);

Gate the 4th control on this flag at end_frame.

Q3 PROCEED: SEQUENCE_HEADER per-frame send

No caching. Tiny struct, no perf concern.

Q4 PROCEED: ignore VAOpaqueAV1

Verify with grep VAOpaqueAV1 /usr/include/va/*.h on ampere; if absent, definitive. codec_store_buffer has default: break; fallback already.

Q6 PROCEED with comment

V4L2_REQUEST_MAX_PROFILES = 11 (10 existing + AV1 = exactly 11). Add a comment: "Bumping this requires recompiling all callers — verify libva consumers' profiles[] sizing".

General fill_frame implementation risks (catalogued for Phase 2 review)

# Risk Mitigation
F1 tile_info.mi_col_starts[tile_cols] / mi_row_starts[tile_rows] sentinel values must be computed from frame width/height in superblock units (2 * ((frame_width + 7) >> 3)). Silent corruption on multi-tile streams if wrong. Mirror Kwiboo lines 238/244 exactly. Add diagnostic dump for first multi-tile frame
F2 superres_denom zero is wrong; must be AV1_SUPERRES_NUM = 8 when use_superres=0. VAAPI's coded_denom needs +AV1_SUPERRES_DENOM_MIN offset. Mirror Kwiboo lines 186-189
F3 loop_restoration.loop_restoration_size[] array gated on USES_LR flag. Inverted flag check would cause decoder hang. Mirror Kwiboo lines 281-287

Phase 0 test vectors are 208×208 + 352×288 (single-tile). 1080p+ AV1 streams exercise F1; Phase 3 verification should include at least one multi-tile stream.

Updated file breakdown

File Lines (revised)
request.c +80 (third-device scaffolding, capability probe, kind 'a')
request.h +5 (vpu981 fd slots)
codec.c +5
config.c +15 (AV1 profile + AV1 fmt check)
picture.c +30
surface.h +10
NEW av1.h +30
NEW av1.c +700 (per Kwiboo reference, with F1/F2/F3 cautions)
Makefile.am +2

Total: ~880 LoC. Up from v1's 800 due to vpu981 scaffolding.

Verdict carry-forward

Janet v1 verdict: AMEND (Q5 structural blocker). v2 addresses Q5 with explicit third-device fd + capability-based probe; Q1 with exact dynamic-array size formula; Q2 with init-time probe; Q4 verified-via-headers PROCEED; Q6 PROCEED-with-comment. Implementation-time risks F1/F2/F3 catalogued for Phase 2 review.

With these amendments, expect PROCEED on re-review.