c734a82c41
Janet v1 verdict was AMEND with 1 structural BLOCKER (Q5) + 3
implementation-time risks (F1-F3).
Q5 (BLOCKER) empirically confirmed on ampere: RK3588 has THREE
hantro-vpu instances (legacy MPEG2/VP8 at /dev/video2, encoder at
/dev/video3, vpu981 AV1 at /dev/video4). Current backend's 2-device
fd model (rkvdec + hantro) is "RK3399-shaped knowledge" per its own
comment — silently picks the wrong hantro on RK3588.
v2 fix: add third device slot (video_fd_vpu981 + media_fd_vpu981),
discriminate by V4L2_PIX_FMT_AV1_FRAME capability (not driver name),
extend request_device_kind_for_profile with 'a' kind for VAProfile-
AV1Profile0, extend cap_pool pair-of-flags layout per iter38 pattern.
Q1 amendment: tile_group_entry DYNAMIC_ARRAY size = sizeof * MAX(N,1);
add _Static_assert for kernel uAPI drift catch.
Q2 amendment: VIDIOC_QUERY_EXT_CTRL probe at context init for film_grain
availability; gate per-frame send on the flag.
Q3 PROCEED: per-frame SEQUENCE send (no caching).
Q4 PROCEED: ignore VAOpaqueAV1 (codec_store_buffer has default fallback).
Q6 PROCEED: V4L2_REQUEST_MAX_PROFILES=11 exactly full with AV1; add
comment for future bumps.
F1-F3 implementation risks catalogued for Phase 2 code review:
- mi_col/row_starts sentinel (silent corruption on multi-tile)
- superres_denom AV1_SUPERRES_NUM=8 default offset
- loop_restoration_size[] USES_LR flag gating
File estimate up from 800 to 880 LoC (added vpu981 scaffolding).
Phase 0 test vectors were single-tile (208×208 + 352×288); Phase 3
verification must include multi-tile 1080p+ AV1 to exercise F1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
138 lines
5.8 KiB
Markdown
138 lines
5.8 KiB
Markdown
# Phase 1 plan v2 — Janet amendments
|
||
|
||
Date: 2026-05-17 09:40. v1 returned AMEND from Janet with one structural BLOCKER (Q5) + 3 implementation-time risks. v2 addresses all six findings.
|
||
|
||
## Q5 BLOCKER: vpu981 is a THIRD physical hantro device on RK3588
|
||
|
||
**Empirically confirmed** on ampere via `v4l2-ctl --device`:
|
||
|
||
```
|
||
/dev/video1 rkvdec — rkvdec (H.264, HEVC, VP9-broken)
|
||
/dev/video2 hantro-vpu — rockchip,rk3568-vpu-dec (MPEG2, VP8, ...)
|
||
/dev/video3 hantro-vpu — rockchip,rk3588-vepu121-enc (encoder, n/a)
|
||
/dev/video4 hantro-vpu — rockchip,rk3588-av1-vpu-dec (vpu981 = AV1!)
|
||
```
|
||
|
||
Three `/dev/media*` controllers, three `hantro-vpu` driver instances. The current backend (`request.c:386` comment: *"RK3399-shaped knowledge"*) opens exactly two device fds (`video_fd_rkvdec` + `video_fd_hantro`) discriminated by driver name alone. On RK3588, `find_decoder_device_by_driver("hantro-vpu")` returns whichever node appears first — currently `/dev/video2` (legacy), and silently skips vpu981.
|
||
|
||
### Structural fix — add a third device slot
|
||
|
||
```c
|
||
struct request_data {
|
||
int video_fd_rkvdec; /* existing */
|
||
int media_fd_rkvdec; /* existing */
|
||
int video_fd_hantro; /* existing — legacy hantro (MPEG2/VP8) */
|
||
int media_fd_hantro; /* existing */
|
||
int video_fd_vpu981; /* NEW — AV1 hantro on RK3588 */
|
||
int media_fd_vpu981; /* NEW */
|
||
/* ... */
|
||
};
|
||
```
|
||
|
||
### Capability-based probe (not driver-name)
|
||
|
||
Driver name "hantro-vpu" is ambiguous on RK3588. Discriminate by which OUTPUT format the device supports:
|
||
|
||
```c
|
||
/* New helper in request.c */
|
||
static int find_decoder_device_supporting_fmt(u32 pixfmt,
|
||
int *video_fd, int *media_fd);
|
||
```
|
||
|
||
Probe sequence:
|
||
1. Open all 3 `hantro-vpu` candidates (legacy, encoder, vpu981).
|
||
2. For each: `VIDIOC_ENUM_FMT` on `V4L2_BUF_TYPE_VIDEO_OUTPUT` and check for `V4L2_PIX_FMT_AV1_FRAME`.
|
||
3. The match is vpu981; the non-match with `V4L2_PIX_FMT_MPEG2_SLICE` (or VP8) is legacy.
|
||
4. Encoder (`vepu121-enc`) advertises NO OUTPUT decode formats — naturally falls out.
|
||
|
||
### `request_device_kind_for_profile` extension
|
||
|
||
```c
|
||
case VAProfileAV1Profile0: return 'a'; /* AV1 → vpu981 */
|
||
```
|
||
|
||
### `request_switch_device_for_profile` extension
|
||
|
||
```c
|
||
case 'a':
|
||
target_video = driver_data->video_fd_vpu981;
|
||
target_media = driver_data->media_fd_vpu981;
|
||
break;
|
||
```
|
||
|
||
### `cap_pool` extension
|
||
|
||
Add a per-vpu981 CAP pool slot. The iter38 architectural pattern (per [feedback_multi_device_probe_design](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_multi_device_probe_design.md)) used a 2-flag pair; extending to 3 needs the same pair-of-flags layout for vpu981.
|
||
|
||
### Cleanup (`request_terminate`)
|
||
|
||
Close `video_fd_vpu981` + `media_fd_vpu981` symmetrically.
|
||
|
||
## Q1 AMENDMENT: tile group entry DYNAMIC_ARRAY size formula
|
||
|
||
Per kernel uAPI, V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY has `V4L2_CTRL_FLAG_DYNAMIC_ARRAY`. Submit:
|
||
|
||
```c
|
||
ctrls[i].size = sizeof(struct v4l2_ctrl_av1_tile_group_entry) *
|
||
MAX(num_entries, 1);
|
||
```
|
||
|
||
Single-tile case: `num_entries=1` not 0 — kernel UB on size=0 dynamic array.
|
||
|
||
Add `_Static_assert(sizeof(struct v4l2_ctrl_av1_tile_group_entry) == 16, "uAPI drift")` to catch kernel-uAPI changes.
|
||
|
||
## Q2 AMENDMENT: film_grain probe at context init
|
||
|
||
Mirror Kwiboo's pattern. At backend context-open time:
|
||
|
||
```c
|
||
struct v4l2_query_ext_ctrl qec = { .id = V4L2_CID_STATELESS_AV1_FILM_GRAIN };
|
||
ctx->has_film_grain = (ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, &qec) == 0);
|
||
```
|
||
|
||
Gate the 4th control on this flag at end_frame.
|
||
|
||
## Q3 PROCEED: SEQUENCE_HEADER per-frame send
|
||
|
||
No caching. Tiny struct, no perf concern.
|
||
|
||
## Q4 PROCEED: ignore VAOpaqueAV1
|
||
|
||
Verify with `grep VAOpaqueAV1 /usr/include/va/*.h` on ampere; if absent, definitive. `codec_store_buffer` has `default: break;` fallback already.
|
||
|
||
## Q6 PROCEED with comment
|
||
|
||
`V4L2_REQUEST_MAX_PROFILES = 11` (10 existing + AV1 = exactly 11). Add a comment: *"Bumping this requires recompiling all callers — verify libva consumers' profiles[] sizing"*.
|
||
|
||
## General fill_frame implementation risks (catalogued for Phase 2 review)
|
||
|
||
| # | Risk | Mitigation |
|
||
|---|---|---|
|
||
| F1 | `tile_info.mi_col_starts[tile_cols]` / `mi_row_starts[tile_rows]` sentinel values must be computed from frame width/height in superblock units (`2 * ((frame_width + 7) >> 3)`). Silent corruption on multi-tile streams if wrong. | Mirror Kwiboo lines 238/244 exactly. Add diagnostic dump for first multi-tile frame |
|
||
| F2 | `superres_denom` zero is wrong; must be `AV1_SUPERRES_NUM = 8` when `use_superres=0`. VAAPI's `coded_denom` needs `+AV1_SUPERRES_DENOM_MIN` offset. | Mirror Kwiboo lines 186-189 |
|
||
| F3 | `loop_restoration.loop_restoration_size[]` array gated on `USES_LR` flag. Inverted flag check would cause decoder hang. | Mirror Kwiboo lines 281-287 |
|
||
|
||
Phase 0 test vectors are 208×208 + 352×288 (single-tile). 1080p+ AV1 streams exercise F1; Phase 3 verification should include at least one multi-tile stream.
|
||
|
||
## Updated file breakdown
|
||
|
||
| File | Lines (revised) |
|
||
|---|---|
|
||
| `request.c` | +80 (third-device scaffolding, capability probe, kind 'a') |
|
||
| `request.h` | +5 (vpu981 fd slots) |
|
||
| `codec.c` | +5 |
|
||
| `config.c` | +15 (AV1 profile + AV1 fmt check) |
|
||
| `picture.c` | +30 |
|
||
| `surface.h` | +10 |
|
||
| NEW `av1.h` | +30 |
|
||
| NEW `av1.c` | +700 (per Kwiboo reference, with F1/F2/F3 cautions) |
|
||
| `Makefile.am` | +2 |
|
||
|
||
Total: ~880 LoC. Up from v1's 800 due to vpu981 scaffolding.
|
||
|
||
## Verdict carry-forward
|
||
|
||
Janet v1 verdict: AMEND (Q5 structural blocker). v2 addresses Q5 with explicit third-device fd + capability-based probe; Q1 with exact dynamic-array size formula; Q2 with init-time probe; Q4 verified-via-headers PROCEED; Q6 PROCEED-with-comment. Implementation-time risks F1/F2/F3 catalogued for Phase 2 review.
|
||
|
||
**With these amendments, expect PROCEED on re-review.**
|