# Phase 1 plan v2 — Janet amendments Date: 2026-05-17 09:40. v1 returned AMEND from Janet with one structural BLOCKER (Q5) + 3 implementation-time risks. v2 addresses all six findings. ## Q5 BLOCKER: vpu981 is a THIRD physical hantro device on RK3588 **Empirically confirmed** on ampere via `v4l2-ctl --device`: ``` /dev/video1 rkvdec — rkvdec (H.264, HEVC, VP9-broken) /dev/video2 hantro-vpu — rockchip,rk3568-vpu-dec (MPEG2, VP8, ...) /dev/video3 hantro-vpu — rockchip,rk3588-vepu121-enc (encoder, n/a) /dev/video4 hantro-vpu — rockchip,rk3588-av1-vpu-dec (vpu981 = AV1!) ``` Three `/dev/media*` controllers, three `hantro-vpu` driver instances. The current backend (`request.c:386` comment: *"RK3399-shaped knowledge"*) opens exactly two device fds (`video_fd_rkvdec` + `video_fd_hantro`) discriminated by driver name alone. On RK3588, `find_decoder_device_by_driver("hantro-vpu")` returns whichever node appears first — currently `/dev/video2` (legacy), and silently skips vpu981. ### Structural fix — add a third device slot ```c struct request_data { int video_fd_rkvdec; /* existing */ int media_fd_rkvdec; /* existing */ int video_fd_hantro; /* existing — legacy hantro (MPEG2/VP8) */ int media_fd_hantro; /* existing */ int video_fd_vpu981; /* NEW — AV1 hantro on RK3588 */ int media_fd_vpu981; /* NEW */ /* ... */ }; ``` ### Capability-based probe (not driver-name) Driver name "hantro-vpu" is ambiguous on RK3588. Discriminate by which OUTPUT format the device supports: ```c /* New helper in request.c */ static int find_decoder_device_supporting_fmt(u32 pixfmt, int *video_fd, int *media_fd); ``` Probe sequence: 1. Open all 3 `hantro-vpu` candidates (legacy, encoder, vpu981). 2. For each: `VIDIOC_ENUM_FMT` on `V4L2_BUF_TYPE_VIDEO_OUTPUT` and check for `V4L2_PIX_FMT_AV1_FRAME`. 3. The match is vpu981; the non-match with `V4L2_PIX_FMT_MPEG2_SLICE` (or VP8) is legacy. 4. Encoder (`vepu121-enc`) advertises NO OUTPUT decode formats — naturally falls out. ### `request_device_kind_for_profile` extension ```c case VAProfileAV1Profile0: return 'a'; /* AV1 → vpu981 */ ``` ### `request_switch_device_for_profile` extension ```c case 'a': target_video = driver_data->video_fd_vpu981; target_media = driver_data->media_fd_vpu981; break; ``` ### `cap_pool` extension Add a per-vpu981 CAP pool slot. The iter38 architectural pattern (per [feedback_multi_device_probe_design](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_multi_device_probe_design.md)) used a 2-flag pair; extending to 3 needs the same pair-of-flags layout for vpu981. ### Cleanup (`request_terminate`) Close `video_fd_vpu981` + `media_fd_vpu981` symmetrically. ## Q1 AMENDMENT: tile group entry DYNAMIC_ARRAY size formula Per kernel uAPI, V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY has `V4L2_CTRL_FLAG_DYNAMIC_ARRAY`. Submit: ```c ctrls[i].size = sizeof(struct v4l2_ctrl_av1_tile_group_entry) * MAX(num_entries, 1); ``` Single-tile case: `num_entries=1` not 0 — kernel UB on size=0 dynamic array. Add `_Static_assert(sizeof(struct v4l2_ctrl_av1_tile_group_entry) == 16, "uAPI drift")` to catch kernel-uAPI changes. ## Q2 AMENDMENT: film_grain probe at context init Mirror Kwiboo's pattern. At backend context-open time: ```c struct v4l2_query_ext_ctrl qec = { .id = V4L2_CID_STATELESS_AV1_FILM_GRAIN }; ctx->has_film_grain = (ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, &qec) == 0); ``` Gate the 4th control on this flag at end_frame. ## Q3 PROCEED: SEQUENCE_HEADER per-frame send No caching. Tiny struct, no perf concern. ## Q4 PROCEED: ignore VAOpaqueAV1 Verify with `grep VAOpaqueAV1 /usr/include/va/*.h` on ampere; if absent, definitive. `codec_store_buffer` has `default: break;` fallback already. ## Q6 PROCEED with comment `V4L2_REQUEST_MAX_PROFILES = 11` (10 existing + AV1 = exactly 11). Add a comment: *"Bumping this requires recompiling all callers — verify libva consumers' profiles[] sizing"*. ## General fill_frame implementation risks (catalogued for Phase 2 review) | # | Risk | Mitigation | |---|---|---| | F1 | `tile_info.mi_col_starts[tile_cols]` / `mi_row_starts[tile_rows]` sentinel values must be computed from frame width/height in superblock units (`2 * ((frame_width + 7) >> 3)`). Silent corruption on multi-tile streams if wrong. | Mirror Kwiboo lines 238/244 exactly. Add diagnostic dump for first multi-tile frame | | F2 | `superres_denom` zero is wrong; must be `AV1_SUPERRES_NUM = 8` when `use_superres=0`. VAAPI's `coded_denom` needs `+AV1_SUPERRES_DENOM_MIN` offset. | Mirror Kwiboo lines 186-189 | | F3 | `loop_restoration.loop_restoration_size[]` array gated on `USES_LR` flag. Inverted flag check would cause decoder hang. | Mirror Kwiboo lines 281-287 | Phase 0 test vectors are 208×208 + 352×288 (single-tile). 1080p+ AV1 streams exercise F1; Phase 3 verification should include at least one multi-tile stream. ## Updated file breakdown | File | Lines (revised) | |---|---| | `request.c` | +80 (third-device scaffolding, capability probe, kind 'a') | | `request.h` | +5 (vpu981 fd slots) | | `codec.c` | +5 | | `config.c` | +15 (AV1 profile + AV1 fmt check) | | `picture.c` | +30 | | `surface.h` | +10 | | NEW `av1.h` | +30 | | NEW `av1.c` | +700 (per Kwiboo reference, with F1/F2/F3 cautions) | | `Makefile.am` | +2 | Total: ~880 LoC. Up from v1's 800 due to vpu981 scaffolding. ## Verdict carry-forward Janet v1 verdict: AMEND (Q5 structural blocker). v2 addresses Q5 with explicit third-device fd + capability-based probe; Q1 with exact dynamic-array size formula; Q2 with init-time probe; Q4 verified-via-headers PROCEED; Q6 PROCEED-with-comment. Implementation-time risks F1/F2/F3 catalogued for Phase 2 review. **With these amendments, expect PROCEED on re-review.**