fresnel-fourier/phase4_iter39_subprofile_plan.md
marfrit 407c7c56e1 iter39 Phase 4-6 LANDED on backend — Phase 7 awaiting fresnel power-on
Adds the iter39 sub-profile (H264 Hi10P + HEVC Main10) FR landing
materials and resumption sequence to the campaign repo.

- phase4_iter39_subprofile_plan.md: full Phase 4 plan with Phase 5
  sonnet-architect review amendments folded in. Documents the
  Option A/B/C/D scope tree, the locked Option C choice (full NV15→P010
  userspace unpack), the LOC breakdown (~180), and the test plan.
- phase7_iter39_test_rig.sh: end-to-end test script for fresnel. Encodes
  Hi10P + Main10 fixtures, runs libva vs kdirect bit-exact comparison
  (both via `-vf hwdownload,format=p010le` to normalize the NV15 stride
  difference between paths), SSIM_Y check vs SW reference, and verifies
  the iter38 5/5 baseline still holds.
- PRE_COMPACT_HANDOFF.md: TL;DR table row for iter39 (committed
  pending validation), Phase 7 resumption sequence, internals-summary
  for future-session resumption.

Backend tip: `662f887` (iter39 α-31) + `8746690` (unpack self-test) on
gitea master. Self-test passes on noether x86_64; compile-test clean on
boltzmann aarch64 native; self-review of commit vs Phase 5 amendments
APPROVED. Phase 7 actual decode test blocked on fresnel power-on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:22:34 +00:00

# Phase 4 — iter39 sub-profile support plan
**Status:** Phase 6 LANDED at backend `662f887` on gitea master (pushed 2026-05-17).
Phase 5 review (sonnet-architect, 3 mandatory amendments + 1 corrected claim) folded in below.
Phase 7 test rig at `phase7_iter39_test_rig.sh`; blocked on fresnel power-on.
## FR
PRE_COMPACT_HANDOFF.md "Open items" #2 — H264 Hi10P, HEVC Main10, VP9 Profile 2 are advertised as HW-capable on RK3399 but the libva backend has no entries. Drop them in.
## Phase 0/2 findings (locked from linux-mmind-v7.0 rkvdec source on boltzmann)
`drivers/media/platform/rockchip/rkvdec/rkvdec.c` ctrl tables, with `rk3399_rkvdec_variant` binding `rkvdec_coded_fmts`:
| VAProfile | rkvdec HW | V4L2 OUTPUT pix_fmt | rkvdec CAPTURE pix_fmt | notes |
|-------------------|-----------|------------------------|------------------------|-------|
| `H264High10` | ✅ yes | `H264_SLICE` | `NV15` (4:2:0 10-bit) | ctrl `cfg.max=HIGH_422_INTRA`, `bit_depth_luma_minus8==2` path live in `rkvdec-h264-common.c:196` |
| `HEVCMain10` | ✅ yes | `HEVC_SLICE` | `NV15` | ctrl `cfg.max=MAIN_10`, rkvdec-hevc.c:514 "only 8-bit and 10-bit are supported" |
| `VP9Profile2` | ❌ no | n/a | n/a | rkvdec-vp9.c:670 "We only support profile 0"; ctrl `cfg.max=PROFILE_0`. **EXCLUDED FROM SCOPE.** |

Bonus / aside: H264 4:2:2 profiles also supported by HW (NV16/NV20) but VAAPI's `VAProfileH264High422` is non-standard and most consumers won't use it. **OUT OF SCOPE.**
## Architectural ripple — NV15 ↔ VA-standard pixel format
The hard part. RK3399 rkvdec emits 10-bit frames as **NV15**: four 10-bit samples packed into every 5 bytes, no padding. VAAPI's standard 10-bit fourcc is **P010**: 2 bytes per sample, value in the 10 high bits, 6 low bits zero. Mapping requires a bit unpack pass.
`/usr/include/va/va.h`:
- `VA_FOURCC_P010 = 0x30313050` (defined, standard)
- `VA_FOURCC_NV15` (not defined — would need inline `VA_FOURCC('N','V','1','5')`)
- `VA_RT_FORMAT_YUV420_10 = 0x100` (defined)
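
For reference, the inline NV15 fourcc mentioned above can be worked out by hand. The sketch below uses a local stand-in macro (`SKETCH_FOURCC`, a hypothetical name) that packs characters the same way libva's `VA_FOURCC` does, first character in the least significant byte, so it builds without `va.h`.

```c
#include <stdint.h>

/* Local stand-in for libva's VA_FOURCC: first character lands in the
 * least significant byte. The name is hypothetical, not part of libva. */
#define SKETCH_FOURCC(a, b, c, d)                                   \
    ((uint32_t)(uint8_t)(a) | ((uint32_t)(uint8_t)(b) << 8) |       \
     ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

/* P010 reproduces the value va.h defines; NV15 is what an inline
 * VA_FOURCC('N','V','1','5') would evaluate to. */
static const uint32_t sketch_fourcc_p010 = SKETCH_FOURCC('P', '0', '1', '0');
static const uint32_t sketch_fourcc_nv15 = SKETCH_FOURCC('N', 'V', '1', '5');
```

With this packing, P010 comes out as `0x30313050` (matching the header) and NV15 as `0x3531564E`.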
Initial assumption (corrected by the Phase 5 review, see "kdirect does NOT unpack" below): the ffmpeg-v4l2-request kdirect path converts NV15 → P010 internally inside the `libavcodec/v4l2_request_hevc.c` family for the kdirect-test-rig codepath, so kdirect output for a Main10 fixture would already land in P010 buffers. Either way, our libva-vs-kdirect bit-exact contract can still hold if we surface P010.
### Four scope options
**Option A — Enumerate-only.** Wire profile lists, pixelformat_for_profile, picture.c, synthetic SPS bit_depth. Skip the unpack. vainfo will list Hi10P / Main10 but actual decode emits NV15 in raw buffers that no standard VAAPI consumer understands. Misleading — not recommended.
**Option B — NV15-as-FOURCC expose.** Surface decoded NV15 with a non-standard `VA_FOURCC('N','V','1','5')`. Mesa/ffmpeg-vaapi will reject it in their `vaCreateImage` path. Only useful for `vaExportSurfaceHandle` (DRM-PRIME) consumers that understand the `DRM_FORMAT_NV15` format; Mesa panfrost-Midgard support is unclear.
**Option C — Userspace unpack to P010.** Add `nv15_to_p010()` in surface.c / image.c, run in `copy_surface_to_image`. ~150 LOC bit-twiddling. Adds 1× decoded-frame-size memcpy + bit unpack per `vaDeriveImage` / `vaGetImage` call. Standard VAAPI consumers work as expected.
**Option D — Skeleton only / documented partial.** Profiles enumerated, decode goes through, but `vaDeriveImage` returns `VA_STATUS_ERROR_UNIMPLEMENTED` for 10-bit surfaces with a clear log. Real downstream usage broken; flagged as known limitation in README + memory.
## Recommended plan: Option C (full P010 unpack) — LOCKED
Reasoning: PRE_COMPACT_HANDOFF.md marks this as "open polish item" — partial-work options A/B/D would resurface as a TODO. Sub-profile support is only useful end-to-end. Option C is the only one that lets a Main10 fixture round-trip through `ffmpeg -hwaccel vaapi -i x.hevc.10b.mp4 ...` with our backend.
NV15 packing per `Documentation/userspace-api/media/v4l/pixfmt-nv15.rst` (linux-mmind-v7.0): 4×10-bit values packed in 5 consecutive bytes — `A[9:2] A[1:0]B[9:4] B[3:0]C[9:6] C[5:0]D[9:8] D[7:0]` (the third byte carries the top four bits of C, so every byte holds exactly 8 bits). Unpack one 5-byte group → 4 P010 words (2B each, value in bits [15:6], zeros in [5:0]).
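
The group unpack can be sketched as below. This is a minimal illustration, not the landed `nv15.c`: the function name is hypothetical, and the bit layout is taken from the text above (first byte carries `A[9:2]`), reading the third byte as `B[3:0] C[9:6]` so that each byte holds eight bits. It has not been checked against the tree's pixfmt document or against hardware output. If the tree describes each group in little-endian order (byte 0 = `A[7:0]`), the byte indices would need to be mirrored.

```c
#include <stdint.h>

/* Unpack one 5-byte NV15 group into four P010 words.
 * ASSUMPTION: MSB-first layout as quoted in the plan text:
 *   g[0] = A[9:2]
 *   g[1] = A[1:0] B[9:4]
 *   g[2] = B[3:0] C[9:6]
 *   g[3] = C[5:0] D[9:8]
 *   g[4] = D[7:0]
 * Each P010 word keeps the 10-bit value in bits [15:6], zeros below. */
static void nv15_group_to_p010_sketch(const uint8_t g[5], uint16_t out[4])
{
    unsigned int a = ((unsigned int)g[0] << 2) | (g[1] >> 6);
    unsigned int b = ((unsigned int)(g[1] & 0x3Fu) << 4) | (g[2] >> 4);
    unsigned int c = ((unsigned int)(g[2] & 0x0Fu) << 6) | (g[3] >> 2);
    unsigned int d = ((unsigned int)(g[3] & 0x03u) << 8) | g[4];

    out[0] = (uint16_t)(a << 6);
    out[1] = (uint16_t)(b << 6);
    out[2] = (uint16_t)(c << 6);
    out[3] = (uint16_t)(d << 6);
}
```

For example, under this layout the bytes `FF C0 05 56 AA` hold the samples `0x3FF, 0x000, 0x155, 0x2AA`, which become the P010 words `0xFFC0, 0x0000, 0x5540, 0xAA80`.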
**Stride gotcha (per Phase 5 review)**: NV15 row stride is `ceil(width/4)*5` bytes, NOT `width*2`. The kernel's `G_FMT` returns the NV15 stride in `destination_bytesperlines[0]`. The unpack must use NV15 stride for source iteration and compute P010 stride independently as `width * 2`.
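
The stride arithmetic is small enough to spell out. The helpers below are hypothetical names for illustration. They treat `ceil(width/4)*5` as the minimum NV15 row size and assume the kernel-reported stride may be larger because of alignment, which is why the reported value is preferred whenever it is plausible.

```c
#include <stddef.h>

/* Minimum NV15 row size: every started group of 4 samples takes 5 bytes. */
static size_t nv15_min_stride(unsigned int width)
{
    return ((size_t)width + 3) / 4 * 5;
}

/* P010 row size: one 16-bit word per sample. */
static size_t p010_stride(unsigned int width)
{
    return (size_t)width * 2;
}

/* Pick the source stride for the unpack loop. `reported` stands for the
 * bytesperline value obtained from the kernel; 0 is returned when it is
 * too small to hold a full row, so the caller can fail early. */
static size_t nv15_source_stride(unsigned int width, size_t reported)
{
    return reported >= nv15_min_stride(width) ? reported : 0;
}
```

For a 1280-pixel-wide frame this gives 1600 bytes per NV15 row against 2560 bytes per P010 row, so iterating the source with `width * 2` would walk off every row.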
**kdirect does NOT unpack (Phase 5 correction)**: ffmpeg-v4l2-request's hwcontext_v4l2request.c maps NV15 as `AV_PIX_FMT_YUV420P10` with the `DRM_FORMAT_NV15` fourcc (a DRM format, not a modifier) and exports raw DRM_PRIME. The downstream `av_frame_copy` uses libswscale's unpack; kdirect itself emits raw NV15 in DRM-PRIME buffers. **The libva backend cannot use that trick**: `vaGetImage` consumers receive a `VAImage` buffer (no AVFrame, no libswscale call). The Option C userspace unpack is the only path.
## Code changes (Option C, amended after Phase 5 review)
### libva-v4l2-request-fourier (gitea master, claude-noether identity)
**src/codec.c**, `pixelformat_for_profile`:
- Add `case VAProfileH264High10:` → `V4L2_PIX_FMT_H264_SLICE` (same OUTPUT format as 8-bit H.264; bit depth is signaled via the SPS contents, not the OUTPUT pix_fmt)
- Add `case VAProfileHEVCMain10:` → `V4L2_PIX_FMT_HEVC_SLICE`
**src/config.c**:
- `RequestCreateConfig` switch: add the 2 profile cases (no-op validation, same shape as siblings)
- `RequestQueryConfigProfiles`: append `VAProfileH264High10` after existing H264 block (guard `-5` → `-6`), `VAProfileHEVCMain10` after `HEVCMain` (guard `-1` → `-2`). Bump `V4L2_REQUEST_MAX_PROFILES` from 11 → 13 in `request.h`.
- `RequestQueryConfigEntrypoints`: add the 2 profile cases
- `RequestGetConfigAttributes` + the inline assignment in `RequestCreateConfig` (line 122): branch on profile to return `VA_RT_FORMAT_YUV420_10` instead of `VA_RT_FORMAT_YUV420` for 10-bit profiles.
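
The profile-to-RT-format branch described in the last bullet could look roughly like this. It is a sketch under stated assumptions: the enum is a stand-in for libva's `VAProfile` values, the function names are hypothetical, and only the two `VA_RT_FORMAT_*` constants are copied from `va.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in profile tags; the real code would switch on VAProfile. */
enum sketch_profile {
    SKETCH_H264_HIGH,
    SKETCH_H264_HIGH10,
    SKETCH_HEVC_MAIN,
    SKETCH_HEVC_MAIN10,
};

/* Values as defined in va.h. */
#define SKETCH_RT_FORMAT_YUV420    0x00000001u
#define SKETCH_RT_FORMAT_YUV420_10 0x00000100u

/* True for the two sub-profiles this plan adds. */
static bool sketch_profile_is_10bit(enum sketch_profile p)
{
    return p == SKETCH_H264_HIGH10 || p == SKETCH_HEVC_MAIN10;
}

/* RT format to report from both the attribute query and config creation,
 * so the two call sites cannot drift apart. */
static uint32_t sketch_rt_format_for_profile(enum sketch_profile p)
{
    return sketch_profile_is_10bit(p) ? SKETCH_RT_FORMAT_YUV420_10
                                      : SKETCH_RT_FORMAT_YUV420;
}
```

Routing both `RequestGetConfigAttributes` and the inline assignment in `RequestCreateConfig` through one helper is a design suggestion rather than something the plan prescribes.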
**src/context.c**, `RequestCreateContext`:
- Line 111-131 CAPTURE-probe block: extend to try NV15 first for 10-bit profiles (else NULL `video_format` → NULL-deref at line 135). Profile-gated branch.
- Line 178 `capture_pixelformat = V4L2_PIX_FMT_NV12` → branch on profile to set `V4L2_PIX_FMT_NV15` for 10-bit profiles.
- Synthetic SPS (line 235+): add Hi10P / Main10 cases with `bit_depth_luma_minus8 = 2`, `bit_depth_chroma_minus8 = 2`. H264 `profile_idc=110` is benign-but-unnecessary per Phase 5 review (kernel ignores in `get_image_fmt`); HEVC SPS has no profile_idc field at all. Image_fmt resolution is purely on `bit_depth_luma_minus8` (==2) + `chroma_format_idc` (==1).
**src/video.c — CRITICAL (Phase 5 amendment 1)**:
- `formats[]` table (line 37) currently has only NV12 entry. **Add NV15 entry.** Without this, `video_format_find(V4L2_PIX_FMT_NV15)` at context.c:117 returns NULL → `v4l2_type_video_capture()` at context.c:135 NULL-derefs.
**src/picture.c** — 5 switch blocks (lines 102, 123, 165, 210, 277). Add the new cases routing to the same per-codec function as the 8-bit profile siblings.
**src/h264.c / src/h265.c** — verify `bit_depth_luma_minus8 != 0` paths exist (they do, mapped to V4L2_CTRL field). No change expected.
**src/surface.c — CRITICAL (Phase 5 amendment 2)**:
- Line 185: `if (format != VA_RT_FORMAT_YUV420) return UNSUPPORTED;` — extend to `(format != YUV420 && format != YUV420_10)`. Without this fix the 10-bit `vaCreateSurfaces` aborts before context creation.
- `copy_surface_to_image`: branch on surface fourcc — if NV15 source, call unpack into P010 destination
- `RequestExportSurfaceHandle` (line ~685): emit `DRM_FORMAT_P010` (copy path) or pass-through `DRM_FORMAT_NV15` for PRIME path. Ship copy path only for v1; PRIME path is follow-up.
**src/image.c — CRITICAL (Phase 5 amendment 3)**:
- `RequestDeriveImage` line ~272: hardcoded `format.fourcc = VA_FOURCC_NV12` + `bits_per_pixel = 12`. **Branch on the underlying surface's bit depth** — emit `VA_FOURCC_P010` + `bits_per_pixel = 24` for 10-bit surfaces.
- `RequestQueryImageFormats` line ~326: currently advertises NV12 only. Extend to also advertise P010 when the active session is 10-bit. Requires per-session `is_10bit` flag on `driver_data` or a config lookup.
**src/request.h or struct request_data**:
- Add `bool is_10bit` (or equivalent — could derive from active config), set in `RequestCreateContext` based on `config_object->profile`, used by image.c branches.
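
To illustrate what the `is_10bit` branch in image.c has to produce, the sketch below fills the bit-depth-dependent image fields for a tightly packed two-plane 4:2:0 layout. The struct and function are hypothetical mirrors of the relevant `VAImage` members, and the absence of row alignment padding is an assumption; the backend may well align pitches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the VAImage fields that depend on bit depth. */
struct sketch_image_layout {
    uint32_t fourcc;
    uint32_t bits_per_pixel;
    uint32_t pitches[2];   /* bytes per row: Y plane, interleaved UV plane */
    uint32_t offsets[2];   /* byte offset of each plane in the buffer */
    uint32_t data_size;
};

#define SKETCH_FOURCC_NV12 0x3231564Eu  /* value of VA_FOURCC_NV12 */
#define SKETCH_FOURCC_P010 0x30313050u  /* value of VA_FOURCC_P010 */

/* ASSUMPTION: even width/height, no padding between rows or planes. */
static struct sketch_image_layout
sketch_pick_image_layout(bool is_10bit, uint32_t width, uint32_t height)
{
    struct sketch_image_layout l;
    uint32_t bytes_per_sample = is_10bit ? 2u : 1u;

    l.fourcc = is_10bit ? SKETCH_FOURCC_P010 : SKETCH_FOURCC_NV12;
    l.bits_per_pixel = is_10bit ? 24u : 12u;
    l.pitches[0] = width * bytes_per_sample;
    l.pitches[1] = l.pitches[0];            /* width/2 UV pairs per row */
    l.offsets[0] = 0;
    l.offsets[1] = l.pitches[0] * height;
    l.data_size = l.offsets[1] + l.pitches[1] * (height / 2);
    return l;
}
```

For a 1280×720 surface this yields a 2560-byte pitch and a 2,764,800-byte buffer in the 10-bit case, exactly twice the NV12 figures, which is where `bits_per_pixel = 24` comes from.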
**src/nv15.c + src/nv15.h (new files)**:
- `nv15_to_p010(const uint8_t *src, uint16_t *dst, unsigned int width, unsigned int height, unsigned int src_stride)` — pure C bit unpack. ~30-40 LOC for the function itself. `dst_stride = width * 2` (computed inline). `src_stride = ceil(width/4)*5` from kernel G_FMT.
- Two calls: luma plane (full size), chroma plane (UV interleaved, half-height).
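
A plane-level loop with the signature given above might look like the following. It is a sketch, not the code at `662f887`: the `_sketch` names are hypothetical, the bit order is the MSB-first layout quoted earlier in this plan (unverified against hardware), the chroma plane is assumed to share the luma stride, and the 16-bit stores assume a little-endian host so that the output bytes match `p010le`.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack one NV15 plane into P010 words.
 * Source rows are src_stride bytes apart; destination rows are tightly
 * packed at width words (width * 2 bytes). Sample x starts at bit 10*x
 * of its row, counted MSB-first, and never spans more than two bytes. */
static void nv15_to_p010_sketch(const uint8_t *src, uint16_t *dst,
                                unsigned int width, unsigned int height,
                                unsigned int src_stride)
{
    for (unsigned int y = 0; y < height; y++) {
        const uint8_t *row = src + (size_t)y * src_stride;
        uint16_t *out = dst + (size_t)y * width;

        for (unsigned int x = 0; x < width; x++) {
            unsigned int bit = 10u * x;
            unsigned int i = bit / 8;       /* first byte of the sample */
            unsigned int shift = bit % 8;   /* 0, 2, 4 or 6 */
            unsigned int v = ((unsigned int)row[i] << 8) | row[i + 1];

            out[x] = (uint16_t)(((v >> (6 - shift)) & 0x3FFu) << 6);
        }
    }
}

/* The two calls from the plan: full-height luma, half-height interleaved
 * chroma (width/2 UV pairs, i.e. width samples per row). */
static void nv15_frame_to_p010_sketch(const uint8_t *src_y, const uint8_t *src_uv,
                                      uint16_t *dst_y, uint16_t *dst_uv,
                                      unsigned int width, unsigned int height,
                                      unsigned int src_stride)
{
    nv15_to_p010_sketch(src_y, dst_y, width, height, src_stride);
    nv15_to_p010_sketch(src_uv, dst_uv, width, height / 2, src_stride);
}
```

Addressing each sample by bit offset keeps the loop correct for widths that are not a multiple of 4, at the cost of some speed compared with a group-at-a-time unpack. The second byte read always stays inside the started 5-byte group, which a stride of at least `ceil(width/4)*5` guarantees is present.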
### Diff size estimate (revised)
- codec.c, config.c, picture.c, context.c, video.c: ~80 LOC
- surface.c + image.c branches: ~50 LOC
- NV15→P010 unpack (nv15.c new): ~50 LOC
- Total: ~180 LOC (Phase 5 corrected pessimistic 150→100 for unpack itself; offset by extra video.c + image.c + surface.c amendments)
## Test plan (Phase 7)
**Fixtures (acquire on fresnel):**
```bash
# Re-encode BBB into Main10 + Hi10P
ffmpeg -i ~/measurements/source/bbb_720p.mov -t 10 -c:v libx265 -preset fast -crf 28 \
-pix_fmt yuv420p10le -profile:v main10 ~/measurements/encoded/bbb_main10.mp4
ffmpeg -i ~/measurements/source/bbb_720p.mov -t 10 -c:v libx264 -preset medium -crf 23 \
-pix_fmt yuv420p10le -profile:v high10 ~/measurements/encoded/bbb_hi10p.mp4
```
**Criteria:**
1. `vainfo` lists `VAProfileHEVCMain10` + `VAProfileH264High10`.
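2. `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb_main10.mp4 -vf hwdownload,format=p010le -frames:v 10 -f rawvideo /tmp/L_main10.yuv` succeeds without error (`-hwaccel_output_format vaapi` keeps decoded frames on the VA surface so that `hwdownload` has a hardware frame to read back).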
3. SHA matches between libva and kdirect for 10 frames each codec — **using the same `-vf hwdownload,format=p010le` on BOTH paths** (kdirect emits NV15 via DRM-PRIME and libswscale unpacks via the format filter; libva emits P010 directly via our new unpack). The format filter normalizes both into P010 byte stream.
4. SSIM vs libavcodec SW reference ≥ 0.999 against `-pix_fmt yuv420p10le` SW decode (Main10 SW reference encoded above; convert P010 to YUV420P10 in the compare step or compare SSIM_Y after conversion).
5. No regression — 5/5 PASS still holds on the existing 5-codec smoke (run after changes).
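
For criterion 4, the P010 → YUV420P10 step in the compare can be done with ffmpeg's format conversion. The sketch below only shows what that conversion amounts to, in case a standalone checker is preferred. The function name is hypothetical, and it assumes tightly packed planes, even dimensions, and a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert tightly packed P010 (value in bits [15:6]) to planar
 * yuv420p10 (value in bits [9:0]): shift every word right by 6 and
 * split the interleaved UV plane into separate U and V planes. */
static void p010_to_yuv420p10_sketch(const uint16_t *y_in, const uint16_t *uv_in,
                                     unsigned int width, unsigned int height,
                                     uint16_t *y_out, uint16_t *u_out,
                                     uint16_t *v_out)
{
    size_t luma = (size_t)width * height;
    size_t chroma = (size_t)(width / 2) * (height / 2);

    for (size_t i = 0; i < luma; i++)
        y_out[i] = (uint16_t)(y_in[i] >> 6);

    for (size_t i = 0; i < chroma; i++) {
        u_out[i] = (uint16_t)(uv_in[2 * i] >> 6);
        v_out[i] = (uint16_t)(uv_in[2 * i + 1] >> 6);
    }
}
```

Because the shift discards only the six zero padding bits, a bit-exact P010 frame converts to a bit-exact YUV420P10 frame, so the SHA comparison in criterion 3 and the SSIM check here test the same data.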
## Open / pending decisions
- **PRIME path vs copy path for 10-bit**: ship copy-only first (P010 derived/created images), defer PRIME path for a follow-up. Many consumers read frames back through `vaDeriveImage` / `vaGetImage` rather than `vaExportSurfaceHandle`, so this covers most cases.
- **VP9 Profile 2**: confirmed HW-unsupported on RK3399; do NOT add to enumeration. Add an explicit `/* not on RK3399 rkvdec */` comment in config.c near the VP9 line to prevent future "completeness" PRs adding it.