Files
claude-noether c734a82c41 Phase 1 plan v2: address Janet AMEND verdict
Janet v1 verdict was AMEND with 1 structural BLOCKER (Q5) + 3
implementation-time risks (F1-F3).

Q5 (BLOCKER) empirically confirmed on ampere: RK3588 has THREE
hantro-vpu instances (legacy MPEG2/VP8 at /dev/video2, encoder at
/dev/video3, vpu981 AV1 at /dev/video4). Current backend's 2-device
fd model (rkvdec + hantro) is "RK3399-shaped knowledge" per its own
comment — silently picks the wrong hantro on RK3588.

v2 fix: add third device slot (video_fd_vpu981 + media_fd_vpu981),
discriminate by V4L2_PIX_FMT_AV1_FRAME capability (not driver name),
extend request_device_kind_for_profile with 'a' kind for VAProfile-
AV1Profile0, extend cap_pool pair-of-flags layout per iter38 pattern.

Q1 amendment: tile_group_entry DYNAMIC_ARRAY size = sizeof * MAX(N,1);
add _Static_assert for kernel uAPI drift catch.

Q2 amendment: VIDIOC_QUERY_EXT_CTRL probe at context init for film_grain
availability; gate per-frame send on the flag.

Q3 PROCEED: per-frame SEQUENCE send (no caching).
Q4 PROCEED: ignore VAOpaqueAV1 (codec_store_buffer has default fallback).
Q6 PROCEED: V4L2_REQUEST_MAX_PROFILES=11 exactly full with AV1; add
            comment for future bumps.

F1-F3 implementation risks catalogued for Phase 2 code review:
- mi_col/row_starts sentinel (silent corruption on multi-tile)
- superres_denom AV1_SUPERRES_NUM=8 default offset
- loop_restoration_size[] USES_LR flag gating

File estimate up from 800 to 880 LoC (added vpu981 scaffolding).
Phase 0 test vectors were single-tile (208×208 + 352×288); Phase 3
verification must include multi-tile 1080p+ AV1 to exercise F1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:50:47 +00:00

138 lines
5.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 1 plan v2 — Janet amendments
Date: 2026-05-17 09:40. v1 returned AMEND from Janet with one structural BLOCKER (Q5) + 3 implementation-time risks. v2 addresses all six findings.
## Q5 BLOCKER: vpu981 is a THIRD physical hantro device on RK3588
**Empirically confirmed** on ampere via `v4l2-ctl --device`:
```
/dev/video1 rkvdec — rkvdec (H.264, HEVC, VP9-broken)
/dev/video2 hantro-vpu — rockchip,rk3568-vpu-dec (MPEG2, VP8, ...)
/dev/video3 hantro-vpu — rockchip,rk3588-vepu121-enc (encoder, n/a)
/dev/video4 hantro-vpu — rockchip,rk3588-av1-vpu-dec (vpu981 = AV1!)
```
Three `/dev/media*` controllers, three `hantro-vpu` driver instances. The current backend (`request.c:386` comment: *"RK3399-shaped knowledge"*) opens exactly two device fds (`video_fd_rkvdec` + `video_fd_hantro`) discriminated by driver name alone. On RK3588, `find_decoder_device_by_driver("hantro-vpu")` returns whichever node appears first — currently `/dev/video2` (legacy), and silently skips vpu981.
### Structural fix — add a third device slot
```c
struct request_data {
int video_fd_rkvdec; /* existing */
int media_fd_rkvdec; /* existing */
int video_fd_hantro; /* existing — legacy hantro (MPEG2/VP8) */
int media_fd_hantro; /* existing */
int video_fd_vpu981; /* NEW — AV1 hantro on RK3588 */
int media_fd_vpu981; /* NEW */
/* ... */
};
```
### Capability-based probe (not driver-name)
Driver name "hantro-vpu" is ambiguous on RK3588. Discriminate by which OUTPUT format the device supports:
```c
/* New helper in request.c */
static int find_decoder_device_supporting_fmt(u32 pixfmt,
int *video_fd, int *media_fd);
```
Probe sequence:
1. Open all 3 `hantro-vpu` candidates (legacy, encoder, vpu981).
2. For each: `VIDIOC_ENUM_FMT` on `V4L2_BUF_TYPE_VIDEO_OUTPUT` and check for `V4L2_PIX_FMT_AV1_FRAME`.
3. The match is vpu981; the non-match with `V4L2_PIX_FMT_MPEG2_SLICE` (or VP8) is legacy.
4. Encoder (`vepu121-enc`) advertises NO OUTPUT decode formats — naturally falls out.
### `request_device_kind_for_profile` extension
```c
case VAProfileAV1Profile0: return 'a'; /* AV1 → vpu981 */
```
### `request_switch_device_for_profile` extension
```c
case 'a':
target_video = driver_data->video_fd_vpu981;
target_media = driver_data->media_fd_vpu981;
break;
```
### `cap_pool` extension
Add a per-vpu981 CAP pool slot. The iter38 architectural pattern (per [feedback_multi_device_probe_design](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_multi_device_probe_design.md)) used a 2-flag pair; extending to 3 needs the same pair-of-flags layout for vpu981.
### Cleanup (`request_terminate`)
Close `video_fd_vpu981` + `media_fd_vpu981` symmetrically.
## Q1 AMENDMENT: tile group entry DYNAMIC_ARRAY size formula
Per kernel uAPI, V4L2_CID_STATELESS_AV1_TILE_GROUP_ENTRY has `V4L2_CTRL_FLAG_DYNAMIC_ARRAY`. Submit:
```c
ctrls[i].size = sizeof(struct v4l2_ctrl_av1_tile_group_entry) *
MAX(num_entries, 1);
```
Single-tile case: `num_entries=1` not 0 — kernel UB on size=0 dynamic array.
Add `_Static_assert(sizeof(struct v4l2_ctrl_av1_tile_group_entry) == 16, "uAPI drift")` to catch kernel-uAPI changes.
## Q2 AMENDMENT: film_grain probe at context init
Mirror Kwiboo's pattern. At backend context-open time:
```c
struct v4l2_query_ext_ctrl qec = { .id = V4L2_CID_STATELESS_AV1_FILM_GRAIN };
ctx->has_film_grain = (ioctl(video_fd, VIDIOC_QUERY_EXT_CTRL, &qec) == 0);
```
Gate the 4th control on this flag at end_frame.
## Q3 PROCEED: SEQUENCE_HEADER per-frame send
No caching. Tiny struct, no perf concern.
## Q4 PROCEED: ignore VAOpaqueAV1
Verify with `grep VAOpaqueAV1 /usr/include/va/*.h` on ampere; if absent, definitive. `codec_store_buffer` has `default: break;` fallback already.
## Q6 PROCEED with comment
`V4L2_REQUEST_MAX_PROFILES = 11` (10 existing + AV1 = exactly 11). Add a comment: *"Bumping this requires recompiling all callers — verify libva consumers' profiles[] sizing"*.
## General fill_frame implementation risks (catalogued for Phase 2 review)
| # | Risk | Mitigation |
|---|---|---|
| F1 | `tile_info.mi_col_starts[tile_cols]` / `mi_row_starts[tile_rows]` sentinel values must be computed from frame width/height in superblock units (`2 * ((frame_width + 7) >> 3)`). Silent corruption on multi-tile streams if wrong. | Mirror Kwiboo lines 238/244 exactly. Add diagnostic dump for first multi-tile frame |
| F2 | `superres_denom` zero is wrong; must be `AV1_SUPERRES_NUM = 8` when `use_superres=0`. VAAPI's `coded_denom` needs `+AV1_SUPERRES_DENOM_MIN` offset. | Mirror Kwiboo lines 186-189 |
| F3 | `loop_restoration.loop_restoration_size[]` array gated on `USES_LR` flag. Inverted flag check would cause decoder hang. | Mirror Kwiboo lines 281-287 |
Phase 0 test vectors are 208×208 + 352×288 (single-tile). 1080p+ AV1 streams exercise F1; Phase 3 verification should include at least one multi-tile stream.
## Updated file breakdown
| File | Lines (revised) |
|---|---|
| `request.c` | +80 (third-device scaffolding, capability probe, kind 'a') |
| `request.h` | +5 (vpu981 fd slots) |
| `codec.c` | +5 |
| `config.c` | +15 (AV1 profile + AV1 fmt check) |
| `picture.c` | +30 |
| `surface.h` | +10 |
| NEW `av1.h` | +30 |
| NEW `av1.c` | +700 (per Kwiboo reference, with F1/F2/F3 cautions) |
| `Makefile.am` | +2 |
Total: ~880 LoC. Up from v1's 800 due to vpu981 scaffolding.
## Verdict carry-forward
Janet v1 verdict: AMEND (Q5 structural blocker). v2 addresses Q5 with explicit third-device fd + capability-based probe; Q1 with exact dynamic-array size formula; Q2 with init-time probe; Q4 verified-via-headers PROCEED; Q6 PROCEED-with-comment. Implementation-time risks F1/F2/F3 catalogued for Phase 2 review.
**With these amendments, expect PROCEED on re-review.**