Adds the iter39 sub-profile (H264 Hi10P + HEVC Main10) FR landing materials and resumption sequence to the campaign repo.

- phase4_iter39_subprofile_plan.md: full Phase 4 plan with Phase 5 sonnet-architect review amendments folded in. Documents the Option A/B/C/D scope tree, the locked Option C choice (full NV15→P010 userspace unpack), the LOC breakdown (~180), and the test plan.
- phase7_iter39_test_rig.sh: end-to-end test script for fresnel. Encodes Hi10P + Main10 fixtures, runs libva vs kdirect bit-exact comparison (both via `-vf hwdownload,format=p010le` to normalize the NV15 stride difference between paths), SSIM_Y check vs SW reference, and verifies the iter38 5/5 baseline still holds.
- PRE_COMPACT_HANDOFF.md: TL;DR table row for iter39 (committed pending validation), Phase 7 resumption sequence, internals-summary for future-session resumption.

Backend tip: `662f887` (iter39 α-31) + `8746690` (unpack self-test) on gitea master. Self-test passes on noether x86_64; compile-test clean on boltzmann aarch64 native; self-review of commit vs Phase 5 amendments APPROVED. Phase 7 actual decode test blocked on fresnel power-on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 4 — iter39 sub-profile support plan
Status: Phase 6 LANDED at backend 662f887 on gitea master (pushed 2026-05-17).
Phase 5 review (sonnet-architect, 3 mandatory amendments + 1 corrected claim) folded in below.
Phase 7 test rig at phase7_iter39_test_rig.sh; blocked on fresnel power-on.
FR
PRE_COMPACT_HANDOFF.md "Open items" #2 — H264 Hi10P, HEVC Main10, VP9 Profile 2 are advertised as HW-capable on RK3399 but the libva backend has no entries. Drop them in.
Phase 0/2 findings (locked from linux-mmind-v7.0 rkvdec source on boltzmann)
drivers/media/platform/rockchip/rkvdec/rkvdec.c ctrl tables, with rk3399_rkvdec_variant binding rkvdec_coded_fmts:
| VAProfile | rkvdec HW | V4L2 OUTPUT pix_fmt | rkvdec CAPTURE pix_fmt | notes |
|---|---|---|---|---|
| H264High10 | ✅ yes | H264_SLICE | NV15 (4:2:0 10-bit) | ctrl cfg.max=HIGH_422_INTRA, bit_depth_luma_minus8==2 path live in rkvdec-h264-common.c:196 |
| HEVCMain10 | ✅ yes | HEVC_SLICE | NV15 | ctrl cfg.max=MAIN_10, rkvdec-hevc.c:514 "only 8-bit and 10-bit are supported" |
| VP9Profile2 | ❌ no | n/a | n/a | rkvdec-vp9.c:670 "We only support profile 0"; ctrl cfg.max=PROFILE_0. EXCLUDED FROM SCOPE. |
Bonus / aside: H264 4:2:2 profiles also supported by HW (NV16/NV20) but VAAPI's VAProfileH264High422 is non-standard and most consumers won't use it. OUT OF SCOPE.
Architectural ripple — NV15 ↔ VA-standard pixel format
The hard part. RK3399 rkvdec emits 10-bit frames as NV15: four 10-bit samples packed into 5 bytes, with no padding. VAAPI's standard 10-bit fourcc is P010: 2 bytes per sample, with the value in the 10 high bits and the 6 low bits zero. Mapping between the two requires a bit-unpack pass.
/usr/include/va/va.h:
- `VA_FOURCC_P010 = 0x30313050` (defined, standard)
- `VA_FOURCC_NV15` (not defined; would need inline `VA_FOURCC('N','V','1','5')`)
- `VA_RT_FORMAT_YUV420_10 = 0x100` (defined)
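The fourcc arithmetic behind those values can be checked with a small sketch. It uses a local `SK_FOURCC` macro as a stand-in for `VA_FOURCC` so that it builds without va.h; the `SK_*` names and the `sk_fourcc_to_str` logging helper are illustrative assumptions, not part of libva or the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for VA_FOURCC: four ASCII bytes, first character in the
 * least significant byte. */
#define SK_FOURCC(a, b, c, d)                                  \
    ((uint32_t)(uint8_t)(a) | ((uint32_t)(uint8_t)(b) << 8) |  \
     ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

/* Hypothetical names, for illustration only. */
#define SK_FOURCC_P010 SK_FOURCC('P', '0', '1', '0')
#define SK_FOURCC_NV15 SK_FOURCC('N', 'V', '1', '5')

/* Turn a fourcc back into a printable 4-character string for log lines. */
static void sk_fourcc_to_str(uint32_t fourcc, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((fourcc >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

With this macro, `SK_FOURCC_P010` evaluates to `0x30313050`, matching the value quoted from va.h.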
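Initial assumption, since corrected by the Phase 5 review (see "kdirect does NOT unpack" below): the ffmpeg-v4l2-request kdirect path was thought to handle NV15 → P010 internally inside the libavcodec/v4l2_request_hevc.c family for the kdirect-test-rig codepath, so that kdirect output for a Main10 fixture would already land in P010 buffers. Our libva-vs-kdirect bit-exact contract can still hold if we surface P010 too, provided both paths are normalized to P010 before comparison.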
Four scope options
Option A — Enumerate-only. Wire profile lists, pixelformat_for_profile, picture.c, synthetic SPS bit_depth. Skip the unpack. vainfo will list Hi10P / Main10 but actual decode emits NV15 in raw buffers that no standard VAAPI consumer understands. Misleading — not recommended.
Option B — NV15-as-FOURCC expose. Surface decoded NV15 with a non-standard VA_FOURCC('N','V','1','5'). Mesa/ffmpeg-vaapi will reject in their vaCreateImage path. Only useful for vaExportSurfaceHandle (DRM-PRIME) consumers that understand DRM_FORMAT_NV15 modifiers — Mesa panfrost-Midgard support unclear.
Option C — Userspace unpack to P010. Add nv15_to_p010() in surface.c / image.c, run in copy_surface_to_image. ~150 LOC bit-twiddling. Adds 1× decoded-frame-size memcpy + bit unpack per vaDeriveImage / vaGetImage call. Standard VAAPI consumers work as expected.
Option D — Skeleton only / documented partial. Profiles enumerated, decode goes through, but vaDeriveImage returns VA_STATUS_ERROR_UNIMPLEMENTED for 10-bit surfaces with a clear log. Real downstream usage broken; flagged as known limitation in README + memory.
Recommended plan: Option C (full P010 unpack) — LOCKED
Reasoning: PRE_COMPACT_HANDOFF.md marks this as "open polish item" — partial-work options A/B/D would resurface as a TODO. Sub-profile support is only useful end-to-end. Option C is the only one that lets a Main10 fixture round-trip through ffmpeg -hwaccel vaapi -i x.hevc.10b.mp4 ... with our backend.
NV15 packing per Documentation/userspace-api/media/v4l/pixfmt-nv15.rst (linux-mmind-v7.0): 4×10-bit values packed in 5 consecutive bytes, listed as A[9:2], A[1:0]B[9:4], B[3:0]C[9:6], C[5:0]D[9:8], D[7:0]. Unpack one 5-byte group → 4 P010 words (2 bytes each, value in bits [15:6], zeros in [5:0]).
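A minimal sketch of the single-group unpack follows. It is not the committed src/nv15.c, and it rests on one loudly stated assumption: the five bytes form a little-endian 40-bit word with the first sample in bits [9:0], which is how drm_fourcc.h describes DRM_FORMAT_NV15 ([39:0] Y3:Y2:Y1:Y0). Under that reading the listing above runs from the most significant byte down, so D is the first sample in memory. If the hardware emits the bytes in the listed order instead, the shifts mirror. The name `nv15_group_to_p010` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack one 5-byte NV15 group into four P010 words.
 * ASSUMPTION: little-endian 40-bit group, first sample in bits [9:0].
 */
static void nv15_group_to_p010(const uint8_t g[5], uint16_t out[4])
{
    assert(g != NULL && out != NULL);

    /* Reassemble the 40-bit word from its little-endian bytes. */
    uint64_t word = (uint64_t)g[0] | ((uint64_t)g[1] << 8) |
                    ((uint64_t)g[2] << 16) | ((uint64_t)g[3] << 24) |
                    ((uint64_t)g[4] << 32);

    for (int i = 0; i < 4; i++) {
        uint16_t v = (uint16_t)((word >> (10 * i)) & 0x3ff); /* 10-bit sample */
        out[i] = (uint16_t)(v << 6); /* P010: value in [15:6], zeros in [5:0] */
    }
}
```

A group whose first sample is all ones (`FF 03 00 00 00`) unpacks to `0xFFC0` in the first word and zero in the other three.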
Stride gotcha (per Phase 5 review): NV15 row stride is ceil(width/4)*5 bytes, NOT width*2. The kernel's G_FMT returns the NV15 stride in destination_bytesperlines[0]. The unpack must use NV15 stride for source iteration and compute P010 stride independently as width * 2.
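The stride arithmetic can be written out as a short sketch. The helper names are hypothetical. The validity check assumes the kernel may report a stride larger than the packed minimum, for example because of alignment, so it tests for "at least" rather than "exactly".

```c
#include <assert.h>
#include <stddef.h>

/* Minimum bytes per NV15 row: every 4 samples occupy 5 bytes, rounded up. */
static size_t nv15_min_stride(unsigned int width)
{
    assert(width > 0);
    return ((size_t)width + 3) / 4 * 5;
}

/* Bytes per tightly packed P010 row: one 16-bit word per sample. */
static size_t p010_stride(unsigned int width)
{
    return (size_t)width * 2;
}

/* The stride reported by G_FMT must cover at least one packed row. */
static int nv15_stride_is_valid(unsigned int width, size_t reported_stride)
{
    return reported_stride >= nv15_min_stride(width);
}
```

For a 1280-wide frame this gives 1600 bytes per NV15 row against 2560 bytes per P010 row, which is why iterating the source with `width * 2` would read past each row.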
kdirect does NOT unpack (Phase 5 correction): ffmpeg-v4l2-request's hwcontext_v4l2request.c maps NV15 as AV_PIX_FMT_YUV420P10 with DRM_FORMAT_NV15 modifier and exports raw DRM_PRIME. The downstream av_frame_copy uses libswscale's unpack — kdirect itself emits raw NV15 in DRM-PRIME buffers. The libva backend cannot use that trick — vaGetImage consumers receive a VAImage buffer (no AVFrame, no libswscale call). The Option C userspace unpack is the only path.
Code changes (Option C, amended after Phase 5 review)
libva-v4l2-request-fourier (gitea master, claude-noether identity)
src/codec.c — pixelformat_for_profile:
- Add `case VAProfileH264High10:` → `V4L2_PIX_FMT_H264_SLICE` (same OUTPUT slice format as 8-bit H.264; bit depth is signaled via SPS contents, not the OUTPUT pix_fmt)
- Add `case VAProfileHEVCMain10:` → `V4L2_PIX_FMT_HEVC_SLICE`
src/config.c:
- `RequestCreateConfig` switch: add the 2 profile cases (no-op validation, same shape as siblings)
- `RequestQueryConfigProfiles`: append `VAProfileH264High10` after existing H264 block (guard `-5` → `-6`), `VAProfileHEVCMain10` after `HEVCMain` (guard `-1` → `-2`). Bump `V4L2_REQUEST_MAX_PROFILES` from 11 → 13 in `request.h`.
- `RequestQueryConfigEntrypoints`: add the 2 profile cases
- `RequestGetConfigAttributes` + the inline assignment in `RequestCreateConfig` (line 122): branch on profile to return `VA_RT_FORMAT_YUV420_10` instead of `VA_RT_FORMAT_YUV420` for 10-bit profiles.
src/context.c — RequestCreateContext:
- Line 111-131 CAPTURE-probe block: extend to try NV15 first for 10-bit profiles (else NULL `video_format` → NULL-deref at line 135). Profile-gated branch.
- Line 178 `capture_pixelformat = V4L2_PIX_FMT_NV12` → branch on profile to set `V4L2_PIX_FMT_NV15` for 10-bit profiles.
- Synthetic SPS (line 235+): add Hi10P / Main10 cases with `bit_depth_luma_minus8 = 2`, `bit_depth_chroma_minus8 = 2`. H264 `profile_idc=110` is benign-but-unnecessary per Phase 5 review (kernel ignores in `get_image_fmt`); HEVC SPS has no profile_idc field at all. Image_fmt resolution is purely on `bit_depth_luma_minus8` (==2) + `chroma_format_idc` (==1).
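The profile-gated branches above all hinge on one question: is the profile 10-bit? A minimal sketch of funnelling that decision through a single helper follows. The `sk_*` enums and struct are stand-ins invented for this example so it builds without va.h or videodev2.h; they are not the backend's real types.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the VAProfile values and V4L2 capture formats involved. */
enum sk_profile { SK_H264_HIGH, SK_H264_HIGH10, SK_HEVC_MAIN, SK_HEVC_MAIN10 };
enum sk_capture { SK_CAPTURE_NV12, SK_CAPTURE_NV15 };

struct sk_context_params {
    enum sk_capture capture_format;
    unsigned int bit_depth_luma_minus8;
    unsigned int bit_depth_chroma_minus8;
};

static bool sk_profile_is_10bit(enum sk_profile profile)
{
    switch (profile) {
    case SK_H264_HIGH10:
    case SK_HEVC_MAIN10:
        return true;
    case SK_H264_HIGH:
    case SK_HEVC_MAIN:
        return false;
    }
    assert(!"unhandled profile");
    return false;
}

/* Derive the capture format and synthetic-SPS bit depths from the profile. */
static struct sk_context_params sk_context_params_for(enum sk_profile profile)
{
    const bool ten_bit = sk_profile_is_10bit(profile);
    struct sk_context_params p;

    p.capture_format = ten_bit ? SK_CAPTURE_NV15 : SK_CAPTURE_NV12;
    p.bit_depth_luma_minus8 = ten_bit ? 2 : 0;
    p.bit_depth_chroma_minus8 = ten_bit ? 2 : 0;
    return p;
}
```

Keeping the decision in one place means a later profile addition touches one switch rather than each call site.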
src/video.c — CRITICAL (Phase 5 amendment 1):
- `formats[]` table (line 37) currently has only NV12 entry. Add NV15 entry. Without this, `video_format_find(V4L2_PIX_FMT_NV15)` at context.c:117 returns NULL → `v4l2_type_video_capture()` at context.c:135 NULL-derefs.
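The failure mode is a table lookup that returns NULL for an unlisted format, followed by an unchecked dereference. A hedged sketch of the pattern, with a hypothetical struct layout and `sk_*` names that do not mirror the real `struct video_format`:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in format ids; the real table is keyed on V4L2 pixel formats. */
enum sk_pixfmt { SK_PIX_FMT_NV12, SK_PIX_FMT_NV15, SK_PIX_FMT_OTHER };

/* Hypothetical table entry layout. */
struct sk_video_format {
    const char *description;
    enum sk_pixfmt pixelformat;
    unsigned int planes_count;
    unsigned int bits_per_sample;
};

static const struct sk_video_format sk_formats[] = {
    { "NV12 YUV", SK_PIX_FMT_NV12, 2, 8 },
    { "NV15 YUV (10-bit packed)", SK_PIX_FMT_NV15, 2, 10 }, /* the new entry */
};

/* Returns NULL when the format is not in the table. */
static const struct sk_video_format *sk_video_format_find(enum sk_pixfmt fmt)
{
    for (size_t i = 0; i < sizeof(sk_formats) / sizeof(sk_formats[0]); i++)
        if (sk_formats[i].pixelformat == fmt)
            return &sk_formats[i];
    return NULL;
}

/* Caller-side guard: report failure instead of dereferencing NULL. */
static int sk_pick_capture_format(enum sk_pixfmt fmt,
                                  const struct sk_video_format **out)
{
    assert(out != NULL);
    *out = sk_video_format_find(fmt);
    return *out ? 0 : -1;
}
```

Adding the table entry fixes the 10-bit case. The guard is an optional extra that turns any future missing entry into an error return rather than a crash.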
src/picture.c — 5 switch blocks (lines 102, 123, 165, 210, 277). Add the new cases routing to the same per-codec function as the 8-bit profile siblings.
src/h264.c / src/h265.c — verify bit_depth_luma_minus8 != 0 paths exist (they do, mapped to V4L2_CTRL field). No change expected.
src/surface.c — CRITICAL (Phase 5 amendment 2):
- Line 185: `if (format != VA_RT_FORMAT_YUV420) return UNSUPPORTED;` → extend to `(format != YUV420 && format != YUV420_10)`. Without this fix the 10-bit `vaCreateSurfaces` aborts before context creation.
- `copy_surface_to_image`: branch on surface fourcc; if NV15 source, call unpack into P010 destination.
- `RequestExportSurfaceHandle` (line ~685): emit `DRM_FORMAT_P010` (copy path) or pass-through `DRM_FORMAT_NV15` for PRIME path. Ship copy path only for v1; PRIME path is follow-up.
src/image.c — CRITICAL (Phase 5 amendment 3):
- `RequestDeriveImage` line ~272: hardcoded `format.fourcc = VA_FOURCC_NV12` + `bits_per_pixel = 12`. Branch on the underlying surface's bit depth: emit `VA_FOURCC_P010` + `bits_per_pixel = 24` for 10-bit surfaces.
- `RequestQueryImageFormats` line ~326: currently advertises NV12 only. Extend to also advertise P010 when the active session is 10-bit. Requires per-session `is_10bit` flag on `driver_data` or a config lookup.
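The `bits_per_pixel = 24` figure follows from the P010 layout: 16 bits of luma per pixel, plus one 2×16-bit UV pair shared by four pixels. A sketch of the plane layout, assuming a tightly packed image with no pitch alignment (the real backend may align pitches); `struct sk_image_layout` is a stand-in for the relevant VAImage fields, not the libva type.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the VAImage fields that describe plane layout. */
struct sk_image_layout {
    uint32_t pitches[2];
    uint32_t offsets[2];
    uint32_t data_size;
    uint32_t bits_per_pixel;
};

/* Tightly packed P010 layout for an even-sized 4:2:0 frame. */
static struct sk_image_layout sk_p010_layout(uint32_t width, uint32_t height)
{
    struct sk_image_layout l;

    assert(width % 2 == 0 && height % 2 == 0);

    l.pitches[0] = width * 2;             /* luma: one 16-bit word per pixel */
    l.pitches[1] = width * 2;             /* chroma: width/2 UV pairs per row */
    l.offsets[0] = 0;
    l.offsets[1] = l.pitches[0] * height; /* chroma plane follows luma */
    l.data_size = l.offsets[1] + l.pitches[1] * (height / 2);
    l.bits_per_pixel = 24;                /* 16 (Y) + 32/4 (UV at 4:2:0) */
    return l;
}
```

For 1280×720 this yields a 2560-byte pitch and a 2,764,800-byte image, which is exactly 24 bits for each of the 921,600 pixels.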
src/request.h or struct request_data:
- Add `bool is_10bit` (or equivalent; could derive from active config), set in `RequestCreateContext` based on `config_object->profile`, used by image.c branches.
src/nv15.c + src/nv15.h (new files):
- `nv15_to_p010(const uint8_t *src, uint16_t *dst, unsigned int width, unsigned int height, unsigned int src_stride)`: pure C bit unpack. ~30-40 LOC for the function itself.
- `dst_stride = width * 2` (computed inline).
- `src_stride = ceil(width/4)*5` from kernel G_FMT.
- Two calls: luma plane (full size), chroma plane (UV interleaved, half-height).
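A sketch of how the plane-level function and its two calls could fit together, using the signature listed above. It is an illustration rather than the committed code, and it makes three assumptions: the little-endian 40-bit group order, a chroma plane that directly follows the luma plane in both buffers, and a shared `src_stride` for both planes. The `nv15_frame_to_p010` wrapper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack one plane. src_stride is the NV15 stride in bytes; the P010
 * destination is tightly packed at `width` 16-bit words per row. */
static void nv15_to_p010(const uint8_t *src, uint16_t *dst,
                         unsigned int width, unsigned int height,
                         unsigned int src_stride)
{
    assert(src_stride >= (width + 3) / 4 * 5);

    for (unsigned int y = 0; y < height; y++) {
        const uint8_t *s = src + (size_t)y * src_stride;
        uint16_t *d = dst + (size_t)y * width;

        for (unsigned int x = 0; x < width; x += 4, s += 5) {
            /* ASSUMPTION: little-endian 40-bit group, first sample lowest. */
            uint64_t g = (uint64_t)s[0] | ((uint64_t)s[1] << 8) |
                         ((uint64_t)s[2] << 16) | ((uint64_t)s[3] << 24) |
                         ((uint64_t)s[4] << 32);
            /* The last group of a row may hold fewer than 4 valid samples. */
            unsigned int n = (width - x < 4) ? width - x : 4;

            for (unsigned int i = 0; i < n; i++)
                d[x + i] = (uint16_t)(((g >> (10 * i)) & 0x3ff) << 6);
        }
    }
}

/* Hypothetical wrapper: luma at full height, then interleaved UV at half. */
static void nv15_frame_to_p010(const uint8_t *src, uint16_t *dst,
                               unsigned int width, unsigned int height,
                               unsigned int src_stride)
{
    nv15_to_p010(src, dst, width, height, src_stride);
    nv15_to_p010(src + (size_t)src_stride * height,
                 dst + (size_t)width * height,
                 width, height / 2, src_stride);
}
```

The chroma call reuses the luma `width` because a half-width row of UV pairs carries the same number of samples as a luma row.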
Diff size estimate (revised)
- codec.c, config.c, picture.c, context.c, video.c: ~80 LOC
- surface.c + image.c branches: ~50 LOC
- NV15→P010 unpack (nv15.c new): ~50 LOC
- Total: ~180 LOC (Phase 5 corrected pessimistic 150→100 for unpack itself; offset by extra video.c + image.c + surface.c amendments)
Test plan (Phase 7)
Fixtures (acquire on fresnel):
```sh
# Re-encode BBB into Main10 + Hi10P
ffmpeg -i ~/measurements/source/bbb_720p.mov -t 10 -c:v libx265 -preset fast -crf 28 \
  -pix_fmt yuv420p10le -profile:v main10 ~/measurements/encoded/bbb_main10.mp4
ffmpeg -i ~/measurements/source/bbb_720p.mov -t 10 -c:v libx264 -preset medium -crf 23 \
  -pix_fmt yuv420p10le -profile:v high10 ~/measurements/encoded/bbb_hi10p.mp4
```
Criteria:
- `vainfo` lists `VAProfileHEVCMain10` + `VAProfileH264High10`.
- `ffmpeg -hwaccel vaapi -i bbb_main10.mp4 -vf hwdownload,format=p010le -frames:v 10 -f rawvideo /tmp/L_main10.yuv` succeeds without error.
- SHA matches between libva and kdirect for 10 frames each codec, using the same `-vf hwdownload,format=p010le` on BOTH paths (kdirect emits NV15 via DRM-PRIME and libswscale unpacks via the format filter; libva emits P010 directly via our new unpack). The format filter normalizes both into P010 byte stream.
- SSIM vs libavcodec SW reference ≥ 0.999 against `-pix_fmt yuv420p10le` SW decode (Main10 SW reference encoded above; convert P010 to YUV420P10 in the compare step or compare SSIM_Y after conversion).
- No regression: 5/5 PASS still holds on the existing 5-codec smoke (run after changes).
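The P010 → YUV420P10 conversion mentioned for the compare step is a shift plus a chroma de-interleave. A minimal sketch, assuming tightly packed planes and 16-bit words already in host byte order (true for `p010le` on a little-endian host); `p010_to_yuv420p10` is a hypothetical helper and not part of the test rig.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one P010 frame to planar 10-bit 4:2:0 (values in bits [9:0]). */
static void p010_to_yuv420p10(const uint16_t *y_in, const uint16_t *uv_in,
                              unsigned int width, unsigned int height,
                              uint16_t *y_out, uint16_t *u_out, uint16_t *v_out)
{
    assert(width % 2 == 0 && height % 2 == 0);

    /* Luma: drop the six zero padding bits. */
    for (size_t i = 0; i < (size_t)width * height; i++)
        y_out[i] = (uint16_t)(y_in[i] >> 6);

    /* Chroma: split interleaved UV pairs into separate U and V planes. */
    for (size_t i = 0; i < (size_t)(width / 2) * (height / 2); i++) {
        u_out[i] = (uint16_t)(uv_in[2 * i] >> 6);
        v_out[i] = (uint16_t)(uv_in[2 * i + 1] >> 6);
    }
}
```

After this step the hardware output and the `yuv420p10le` software decode share one sample layout, so SSIM can be computed plane by plane.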
Open / pending decisions
- PRIME path vs copy path for 10-bit: ship copy-only first (P010 derived/created images), defer PRIME path for a follow-up. Many consumers actually read frames back through the image path (`vaGetImage` / `vaDeriveImage`) rather than `vaExportSurfaceHandle`, so this covers most cases.
- VP9 Profile 2: confirmed HW-unsupported on RK3399; do NOT add to enumeration. Add an explicit `/* not on RK3399 rkvdec */` comment in config.c near the VP9 line to prevent future "completeness" PRs adding it.