fresnel-fourier/phase4_iter39_subprofile_plan.md
marfrit 407c7c56e1 iter39 Phase 4-6 LANDED on backend — Phase 7 awaiting fresnel power-on
Adds the iter39 sub-profile (H264 Hi10P + HEVC Main10) FR landing
materials and resumption sequence to the campaign repo.

- phase4_iter39_subprofile_plan.md: full Phase 4 plan with Phase 5
  sonnet-architect review amendments folded in. Documents the
  Option A/B/C/D scope tree, the locked Option C choice (full NV15→P010
  userspace unpack), the LOC breakdown (~180), and the test plan.
- phase7_iter39_test_rig.sh: end-to-end test script for fresnel. Encodes
  Hi10P + Main10 fixtures, runs libva vs kdirect bit-exact comparison
  (both via `-vf hwdownload,format=p010le` to normalize the NV15 stride
  difference between paths), SSIM_Y check vs SW reference, and verifies
  the iter38 5/5 baseline still holds.
- PRE_COMPACT_HANDOFF.md: TL;DR table row for iter39 (committed
  pending validation), Phase 7 resumption sequence, internals-summary
  for future-session resumption.

Backend tip: `662f887` (iter39 α-31) + `8746690` (unpack self-test) on
gitea master. Self-test passes on noether x86_64; compile-test clean on
boltzmann aarch64 native; self-review of commit vs Phase 5 amendments
APPROVED. Phase 7 actual decode test blocked on fresnel power-on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:22:34 +00:00


Phase 4 — iter39 sub-profile support plan

Status: Phase 6 LANDED at backend 662f887 on gitea master (pushed 2026-05-17). Phase 5 review (sonnet-architect, 3 mandatory amendments + 1 corrected claim) folded in below. Phase 7 test rig at phase7_iter39_test_rig.sh; blocked on fresnel power-on.

FR

PRE_COMPACT_HANDOFF.md "Open items" #2 — H264 Hi10P, HEVC Main10, VP9 Profile 2 are advertised as HW-capable on RK3399 but the libva backend has no entries. Drop them in.

Phase 0/2 findings (locked from linux-mmind-v7.0 rkvdec source on boltzmann)

drivers/media/platform/rockchip/rkvdec/rkvdec.c ctrl tables, with rk3399_rkvdec_variant binding rkvdec_coded_fmts:

| VAProfile | rkvdec HW | V4L2 OUTPUT pix_fmt | rkvdec CAPTURE pix_fmt | Notes |
|---|---|---|---|---|
| H264High10 | yes | H264_SLICE | NV15 (4:2:0 10-bit) | ctrl cfg.max=HIGH_422_INTRA; bit_depth_luma_minus8==2 path live in rkvdec-h264-common.c:196 |
| HEVCMain10 | yes | HEVC_SLICE | NV15 | ctrl cfg.max=MAIN_10; rkvdec-hevc.c:514 "only 8-bit and 10-bit are supported" |
| VP9Profile2 | no | n/a | n/a | rkvdec-vp9.c:670 "We only support profile 0"; ctrl cfg.max=PROFILE_0. EXCLUDED FROM SCOPE. |

Bonus / aside: H264 4:2:2 profiles also supported by HW (NV16/NV20) but VAAPI's VAProfileH264High422 is non-standard and most consumers won't use it. OUT OF SCOPE.

Architectural ripple — NV15 ↔ VA-standard pixel format

The hard part. RK3399 rkvdec emits 10-bit frames as NV15: groups of 4 × 10-bit samples packed into 5 bytes, with no padding between samples. VAAPI's standard 10-bit fourcc is P010: 2 bytes per sample, value in the 10 high bits, 6 low bits zero. Mapping one to the other requires a bit-unpack pass.

/usr/include/va/va.h:

  • VA_FOURCC_P010 = 0x30313050 (defined, standard)
  • VA_FOURCC_NV15 (not defined — would need inline VA_FOURCC('N','V','1','5'))
  • VA_RT_FORMAT_YUV420_10 = 0x100 (defined)
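Since va.h has no NV15 fourcc, the backend would need a local definition. A minimal sketch, assuming libva's VA_FOURCC byte order (first character in the least significant byte). The macro is re-spelled locally only so the snippet builds without libva headers, and VA_FOURCC_NV15 is a hypothetical name, not a libva symbol.

```c
#include <stdint.h>

/* Same byte order as libva's VA_FOURCC(): ch0 lands in bits [7:0]. */
#ifndef VA_FOURCC
#define VA_FOURCC(ch0, ch1, ch2, ch3)                                   \
    ((uint32_t)(uint8_t)(ch0) | ((uint32_t)(uint8_t)(ch1) << 8) |       \
     ((uint32_t)(uint8_t)(ch2) << 16) | ((uint32_t)(uint8_t)(ch3) << 24))
#endif

/* Standard in va.h; repeated here for a header-free build. */
#ifndef VA_FOURCC_P010
#define VA_FOURCC_P010 0x30313050
#endif

/* Not in va.h: hypothetical inline definition for an NV15 surface fourcc. */
#ifndef VA_FOURCC_NV15
#define VA_FOURCC_NV15 VA_FOURCC('N', 'V', '1', '5')
#endif
```

The #ifndef guards let the build pick up an upstream definition if a future va.h ever adds one.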

Pre-review assumption (superseded by the Phase 5 correction below): the ffmpeg-v4l2-request kdirect path was thought to handle NV15 → P010 internally in the libavcodec/v4l2_request_hevc.c family, so kdirect output for a Main10 fixture would already land in P010 buffers. On that basis, our libva-vs-kdirect bit-exact contract can still hold if we surface P010 too.

Three scope options

Option A — Enumerate-only. Wire profile lists, pixelformat_for_profile, picture.c, synthetic SPS bit_depth. Skip the unpack. vainfo will list Hi10P / Main10 but actual decode emits NV15 in raw buffers that no standard VAAPI consumer understands. Misleading — not recommended.

Option B — NV15-as-FOURCC expose. Surface decoded NV15 with a non-standard VA_FOURCC('N','V','1','5'). Mesa/ffmpeg-vaapi would reject it in their vaCreateImage path. Only useful for vaExportSurfaceHandle (DRM-PRIME) consumers that understand DRM_FORMAT_NV15 buffers; Mesa panfrost-Midgard support is unclear.

Option C — Userspace unpack to P010. Add nv15_to_p010() in surface.c / image.c, run in copy_surface_to_image. ~150 LOC bit-twiddling. Adds 1× decoded-frame-size memcpy + bit unpack per vaDeriveImage / vaGetImage call. Standard VAAPI consumers work as expected.

Option D — Skeleton only / documented partial. Profiles enumerated, decode goes through, but vaDeriveImage returns VA_STATUS_ERROR_UNIMPLEMENTED for 10-bit surfaces with a clear log. Real downstream usage broken; flagged as known limitation in README + memory.

Reasoning: PRE_COMPACT_HANDOFF.md marks this as "open polish item" — partial-work options A/B/D would resurface as a TODO. Sub-profile support is only useful end-to-end. Option C is the only one that lets a Main10 fixture round-trip through ffmpeg -hwaccel vaapi -i x.hevc.10b.mp4 ... with our backend.

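NV15 packing per Documentation/userspace-api/media/v4l/pixfmt-nv15.rst (linux-mmind-v7.0): 4 × 10-bit samples A, B, C, D are packed into 5 consecutive bytes as one little-endian 40-bit word, with A in bits [9:0], B in [19:10], C in [29:20] and D in [39:30]. Byte by byte (high bits first within each byte): byte 0 = A[7:0]; byte 1 = B[5:0] A[9:8]; byte 2 = C[3:0] B[9:6]; byte 3 = D[1:0] C[9:4]; byte 4 = D[9:2]. Unpacking one 5-byte group yields 4 P010 words (2 bytes each, value in bits [15:6], zeros in [5:0]).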
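A minimal sketch of the single-group unpack, assuming the little-endian 40-bit layout that the DRM_FORMAT_NV15 definition in drm_fourcc.h describes ("[39:0] Y3:Y2:Y1:Y0 little endian", 10 bits per sample). The helper name nv15_group_to_p010 is hypothetical. Each expression mirrors the byte-by-byte breakdown of that layout.

```c
#include <stdint.h>

/* Unpack one NV15 group (5 bytes = 4 x 10-bit samples, little-endian
 * 40-bit word: A bits [9:0], B [19:10], C [29:20], D [39:30]) into four
 * P010 words: sample in bits [15:6], zeros in bits [5:0]. */
static inline void nv15_group_to_p010(const uint8_t in[5], uint16_t out[4])
{
    uint16_t a = (uint16_t)(in[0] | ((in[1] & 0x03) << 8));        /* A[7:0], A[9:8] */
    uint16_t b = (uint16_t)((in[1] >> 2) | ((in[2] & 0x0f) << 6)); /* B[5:0], B[9:6] */
    uint16_t c = (uint16_t)((in[2] >> 4) | ((in[3] & 0x3f) << 4)); /* C[3:0], C[9:4] */
    uint16_t d = (uint16_t)((in[3] >> 6) | (in[4] << 2));          /* D[1:0], D[9:2] */

    out[0] = (uint16_t)(a << 6);
    out[1] = (uint16_t)(b << 6);
    out[2] = (uint16_t)(c << 6);
    out[3] = (uint16_t)(d << 6);
}
```

Worked example: the bytes {0xFF, 0x03, 0x50, 0x95, 0xAA} hold A=0x3FF, B=0x000, C=0x155, D=0x2AA and unpack to the P010 words 0xFFC0, 0x0000, 0x5540, 0xAA80.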

Stride gotcha (per Phase 5 review): the NV15 row stride is at least ceil(width/4)*5 bytes (every 4 samples take 5 bytes), NOT width*2. The actual stride comes from the kernel's G_FMT bytesperline, which the backend stores in destination_bytesperlines[0]. The unpack must iterate the source with the NV15 stride and compute the P010 stride independently as width * 2.
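The stride arithmetic can be sketched as a few helpers. These are hypothetical names, not backend functions; the sanity check assumes only that the kernel-reported bytesperline may include driver padding but never falls below the packed minimum.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimum NV15 row size in bytes: each group of 4 samples takes 5 bytes,
 * and a partial trailing group still occupies all 5. Applies to the luma
 * plane and to the interleaved CbCr plane (width samples per row). */
static size_t nv15_min_stride(size_t width)
{
    return (width + 3) / 4 * 5;
}

/* P010 stride, computed independently of the source: 2 bytes per sample. */
static size_t p010_stride(size_t width)
{
    return width * 2;
}

/* Sanity check on the G_FMT bytesperline before unpacking. */
static bool nv15_stride_ok(size_t width, size_t bytesperline)
{
    return bytesperline >= nv15_min_stride(width);
}
```

For a 1280-wide frame this gives 1600 bytes per NV15 row against 2560 per P010 row, which is exactly the mismatch the gotcha warns about.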

kdirect does NOT unpack (Phase 5 correction): ffmpeg-v4l2-request's hwcontext_v4l2request.c maps NV15 as AV_PIX_FMT_YUV420P10 with DRM_FORMAT_NV15 as the DRM layer format and exports raw DRM_PRIME. The unpack happens downstream in libswscale (e.g. via a format=p010le filter), so kdirect itself emits raw NV15 in DRM-PRIME buffers. The libva backend cannot use that trick: vaGetImage consumers receive a VAImage buffer (no AVFrame, no libswscale call). The Option C userspace unpack is the only path.

Code changes (Option C, amended after Phase 5 review)

libva-v4l2-request-fourier (gitea master, claude-noether identity)

src/codec.c (pixelformat_for_profile):

  • Add case VAProfileH264High10 → V4L2_PIX_FMT_H264_SLICE (same OUTPUT pix_fmt as 8-bit H.264; bit depth is signaled via the SPS contents, not the OUTPUT pix_fmt)
  • Add case VAProfileHEVCMain10 → V4L2_PIX_FMT_HEVC_SLICE

src/config.c:

  • RequestCreateConfig switch: add the 2 profile cases (no-op validation, same shape as siblings)
  • RequestQueryConfigProfiles: append VAProfileH264High10 after existing H264 block (guard -5-6), VAProfileHEVCMain10 after HEVCMain (guard -1-2). Bump V4L2_REQUEST_MAX_PROFILES from 11 → 13 in request.h.
  • RequestQueryConfigEntrypoints: add the 2 profile cases
  • RequestGetConfigAttributes + the inline assignment in RequestCreateConfig (line 122): branch on profile to return VA_RT_FORMAT_YUV420_10 instead of VA_RT_FORMAT_YUV420 for 10-bit profiles.
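The RT-format branch can be sketched as one shared helper. The enum is a hypothetical stand-in for the VAProfile values touched here; the two RT-format constants carry the va.h values (VA_RT_FORMAT_YUV420 = 0x1, VA_RT_FORMAT_YUV420_10 = 0x100) and are re-spelled only so the snippet builds without libva.

```c
#include <stdbool.h>
#include <stdint.h>

#define RT_FORMAT_YUV420    0x00000001u /* VA_RT_FORMAT_YUV420 */
#define RT_FORMAT_YUV420_10 0x00000100u /* VA_RT_FORMAT_YUV420_10 */

/* Hypothetical stand-in for the VAProfile values this plan touches. */
enum sketch_profile {
    SKETCH_H264_HIGH,
    SKETCH_H264_HIGH10,
    SKETCH_HEVC_MAIN,
    SKETCH_HEVC_MAIN10,
};

static bool profile_is_10bit(enum sketch_profile p)
{
    return p == SKETCH_H264_HIGH10 || p == SKETCH_HEVC_MAIN10;
}

/* Value reported for VAConfigAttribRTFormat. */
static uint32_t rt_format_for_profile(enum sketch_profile p)
{
    return profile_is_10bit(p) ? RT_FORMAT_YUV420_10 : RT_FORMAT_YUV420;
}
```

Routing both RequestGetConfigAttributes and the inline assignment in RequestCreateConfig through one helper keeps the two call sites from drifting apart.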

src/context.c (RequestCreateContext):

  • Lines 111-131, CAPTURE-probe block: extend it to try NV15 first for 10-bit profiles; otherwise video_format stays NULL and line 135 dereferences it. Profile-gated branch.
  • Line 178 hardcodes capture_pixelformat = V4L2_PIX_FMT_NV12; branch on profile to set V4L2_PIX_FMT_NV15 for 10-bit profiles.
  • Synthetic SPS (line 235+): add Hi10P / Main10 cases with bit_depth_luma_minus8 = 2, bit_depth_chroma_minus8 = 2. Setting H264 profile_idc=110 is benign but unnecessary per Phase 5 review (the kernel ignores it in get_image_fmt); the V4L2 HEVC SPS control has no profile_idc field at all. The kernel resolves image_fmt purely from bit_depth_luma_minus8 (==2) + chroma_format_idc (==1).
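To keep capture_pixelformat consistent with what the kernel will pick, the backend can mirror the selection rule in user space. A sketch under assumptions: the struct is a hypothetical subset of the synthetic SPS fields, the helper is not the kernel's get_image_fmt, and anything other than 8-bit or 10-bit 4:2:0 is simply reported as unsupported (0) here. The fourcc macro reproduces the packing of v4l2_fourcc() from linux/videodev2.h.

```c
#include <stdint.h>

/* Same packing as v4l2_fourcc() in <linux/videodev2.h>. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define PIX_FMT_NV12 FOURCC('N', 'V', '1', '2')
#define PIX_FMT_NV15 FOURCC('N', 'V', '1', '5')

/* Hypothetical subset of the synthetic SPS fields that matter here. */
struct synthetic_sps {
    uint8_t chroma_format_idc;       /* 1 = 4:2:0 */
    uint8_t bit_depth_luma_minus8;   /* 0 = 8-bit, 2 = 10-bit */
    uint8_t bit_depth_chroma_minus8;
};

static void synthetic_sps_init(struct synthetic_sps *sps, int is_10bit)
{
    sps->chroma_format_idc = 1;
    sps->bit_depth_luma_minus8 = is_10bit ? 2 : 0;
    sps->bit_depth_chroma_minus8 = is_10bit ? 2 : 0;
}

/* Only chroma_format_idc and bit_depth_luma_minus8 decide the format. */
static uint32_t capture_fmt_for_sps(const struct synthetic_sps *sps)
{
    if (sps->chroma_format_idc != 1)
        return 0; /* 4:2:2 etc. not covered by this sketch */
    switch (sps->bit_depth_luma_minus8) {
    case 0:
        return PIX_FMT_NV12;
    case 2:
        return PIX_FMT_NV15;
    default:
        return 0;
    }
}
```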

src/video.c — CRITICAL (Phase 5 amendment 1):

  • formats[] table (line 37) currently has only NV12 entry. Add NV15 entry. Without this, video_format_find(V4L2_PIX_FMT_NV15) at context.c:117 returns NULL → v4l2_type_video_capture() at context.c:135 NULL-derefs.
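Why the missing table entry crashes can be shown with a trimmed-down lookup. This is a hypothetical sketch: the struct keeps only the fields the lookup needs, and video_format_find_sketch stands in for video_format_find.

```c
#include <stddef.h>
#include <stdint.h>

#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define PIX_FMT_NV12 FOURCC('N', 'V', '1', '2')
#define PIX_FMT_NV15 FOURCC('N', 'V', '1', '5')

/* Hypothetical, trimmed-down formats[] entry. */
struct video_format_sketch {
    const char *description;
    uint32_t v4l2_format;
};

static const struct video_format_sketch formats[] = {
    { "NV12 YUV", PIX_FMT_NV12 },
    /* New entry: without it the lookup returns NULL for NV15. */
    { "NV15 10-bit YUV", PIX_FMT_NV15 },
};

static const struct video_format_sketch *video_format_find_sketch(uint32_t pixelformat)
{
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
        if (formats[i].v4l2_format == pixelformat)
            return &formats[i];
    return NULL; /* the caller at context.c:135 dereferences this unchecked */
}
```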

src/picture.c — 5 switch blocks (lines 102, 123, 165, 210, 277). Add the new cases routing to the same per-codec function as the 8-bit profile siblings.

src/h264.c / src/h265.c — verify bit_depth_luma_minus8 != 0 paths exist (they do, mapped to V4L2_CTRL field). No change expected.

src/surface.c — CRITICAL (Phase 5 amendment 2):

  • Line 185: if (format != VA_RT_FORMAT_YUV420) return UNSUPPORTED; extend the condition to (format != VA_RT_FORMAT_YUV420 && format != VA_RT_FORMAT_YUV420_10). Without this fix, vaCreateSurfaces fails for 10-bit surfaces before context creation is ever reached.
  • copy_surface_to_image: branch on surface fourcc — if NV15 source, call unpack into P010 destination
  • RequestExportSurfaceHandle (line ~685): emit DRM_FORMAT_P010 (copy path) or pass-through DRM_FORMAT_NV15 for PRIME path. Ship copy path only for v1; PRIME path is follow-up.
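The two surface.c branches reduce to a format gate and a copy-path choice. A hypothetical sketch: the RT-format constants carry the va.h values, and the copy_path enum is an illustrative tag, not an existing backend type.

```c
#include <stdbool.h>
#include <stdint.h>

#define RT_FORMAT_YUV420    0x00000001u /* VA_RT_FORMAT_YUV420 */
#define RT_FORMAT_YUV420_10 0x00000100u /* VA_RT_FORMAT_YUV420_10 */

#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define PIX_FMT_NV12 FOURCC('N', 'V', '1', '2')
#define PIX_FMT_NV15 FOURCC('N', 'V', '1', '5')

/* Amended line-185 gate: accept 10-bit 4:2:0 alongside 8-bit. */
static bool surface_rt_format_supported(uint32_t format)
{
    return format == RT_FORMAT_YUV420 || format == RT_FORMAT_YUV420_10;
}

/* Hypothetical dispatch tag for copy_surface_to_image. */
enum copy_path { COPY_PLAIN, COPY_NV15_UNPACK };

static enum copy_path copy_path_for(uint32_t capture_pixelformat)
{
    return capture_pixelformat == PIX_FMT_NV15 ? COPY_NV15_UNPACK : COPY_PLAIN;
}
```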

src/image.c — CRITICAL (Phase 5 amendment 3):

  • RequestDeriveImage line ~272: hardcoded format.fourcc = VA_FOURCC_NV12 + bits_per_pixel = 12. Branch on the underlying surface's bit depth — emit VA_FOURCC_P010 + bits_per_pixel = 24 for 10-bit surfaces.
  • RequestQueryImageFormats line ~326: currently advertises NV12 only. Extend to also advertise P010 when the active session is 10-bit. Requires per-session is_10bit flag on driver_data or a config lookup.
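What the RequestDeriveImage branch has to change can be sketched as a layout helper. Assumptions: the struct is a hypothetical subset of the real VAImage fields (format.fourcc, format.bits_per_pixel, pitches, offsets, data_size); the frame is tightly packed with no row or plane alignment, which the real backend may add; and width is even.

```c
#include <stdint.h>

#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical subset of the VAImage fields RequestDeriveImage fills. */
struct derived_image_sketch {
    uint32_t fourcc;
    uint32_t bits_per_pixel;
    uint32_t pitches[2];
    uint32_t offsets[2];
    uint32_t data_size;
};

/* NV12 (8-bit) or P010 (10-bit) layout. The chroma plane has width
 * samples per row (interleaved Cb/Cr) and (height + 1) / 2 rows. */
static void derive_image_layout(struct derived_image_sketch *img,
                                uint32_t width, uint32_t height, int is_10bit)
{
    uint32_t pitch = width * (is_10bit ? 2u : 1u);
    uint32_t chroma_rows = (height + 1) / 2;

    img->fourcc = is_10bit ? FOURCC('P', '0', '1', '0') : FOURCC('N', 'V', '1', '2');
    img->bits_per_pixel = is_10bit ? 24 : 12;
    img->pitches[0] = pitch;
    img->pitches[1] = pitch;
    img->offsets[0] = 0;
    img->offsets[1] = pitch * height;
    img->data_size = pitch * (height + chroma_rows);
}
```

For 1280×720, P010 comes out at 2,764,800 bytes, exactly 24 bits per pixel, against 1,382,400 bytes (12 bits per pixel) for NV12.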

src/request.h or struct request_data:

  • Add bool is_10bit (or equivalent — could derive from active config), set in RequestCreateContext based on config_object->profile, used by image.c branches.

src/nv15.c + src/nv15.h (new files):

  • nv15_to_p010(const uint8_t *src, uint16_t *dst, unsigned int width, unsigned int height, unsigned int src_stride): pure C bit unpack, ~30-40 LOC for the function itself. dst_stride = width * 2 bytes (width uint16_t per row, computed inline). src_stride is the NV15 stride from kernel G_FMT, at least ceil(width/4)*5.
  • Two calls: luma plane (full size), chroma plane (UV interleaved, half-height).
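A minimal sketch of nv15_to_p010() with the proposed signature. Assumptions: dst rows are tightly packed (width uint16_t each); the host is little-endian, so plain uint16_t stores produce P010's little-endian byte order (true for the x86_64 and aarch64 machines in this campaign); and each 5-byte group decodes as one little-endian 40-bit word, per the DRM_FORMAT_NV15 definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack one NV15 plane into P010 (sample in bits [15:6]). */
void nv15_to_p010(const uint8_t *src, uint16_t *dst, unsigned int width,
                  unsigned int height, unsigned int src_stride)
{
    for (unsigned int y = 0; y < height; y++) {
        const uint8_t *in = src + (size_t)y * src_stride; /* NV15 stride */
        uint16_t *out = dst + (size_t)y * width;          /* P010: width * 2 bytes */

        for (unsigned int x = 0; x < width; x += 4) {
            /* Load one 5-byte group as a little-endian 40-bit word. */
            uint64_t w = (uint64_t)in[0] | (uint64_t)in[1] << 8 |
                         (uint64_t)in[2] << 16 | (uint64_t)in[3] << 24 |
                         (uint64_t)in[4] << 32;
            in += 5;

            /* Emit up to 4 samples; the last group of a row may be partial.
             * Reading all 5 bytes stays in bounds because the stride is at
             * least ceil(width / 4) * 5. */
            for (unsigned int i = 0; i < 4 && x + i < width; i++)
                out[x + i] = (uint16_t)(((w >> (10 * i)) & 0x3ff) << 6);
        }
    }
}
```

For the two calls, luma converts width × height samples. Chroma reuses the same width, because Cb and Cr interleave to width samples per row, converts (height + 1) / 2 rows, and writes starting at dst + width * height.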

Diff size estimate (revised)

  • codec.c, config.c, picture.c, context.c, video.c: ~80 LOC
  • surface.c + image.c branches: ~50 LOC
  • NV15→P010 unpack (nv15.c new): ~50 LOC
  • Total: ~180 LOC (Phase 5 corrected pessimistic 150→100 for unpack itself; offset by extra video.c + image.c + surface.c amendments)

Test plan (Phase 7)

Fixtures (acquire on fresnel):

# Re-encode BBB into Main10 + Hi10P
ffmpeg -i ~/measurements/source/bbb_720p.mov -t 10 -c:v libx265 -preset fast -crf 28 \
    -pix_fmt yuv420p10le -profile:v main10 ~/measurements/encoded/bbb_main10.mp4
ffmpeg -i ~/measurements/source/bbb_720p.mov -t 10 -c:v libx264 -preset medium -crf 23 \
    -pix_fmt yuv420p10le -profile:v high10 ~/measurements/encoded/bbb_hi10p.mp4

Criteria:

  1. vainfo lists VAProfileHEVCMain10 + VAProfileH264High10.
  2. ffmpeg -hwaccel vaapi -i bbb_main10.mp4 -vf hwdownload,format=p010le -frames:v 10 -f rawvideo /tmp/L_main10.yuv succeeds without error.
  3. SHA matches between libva and kdirect for 10 frames each codec — using the same -vf hwdownload,format=p010le on BOTH paths (kdirect emits NV15 via DRM-PRIME and libswscale unpacks via the format filter; libva emits P010 directly via our new unpack). The format filter normalizes both into P010 byte stream.
  4. SSIM_Y vs the libavcodec SW reference ≥ 0.999: decode the same fixture (encoded above) in software with -pix_fmt yuv420p10le, convert the P010 output to yuv420p10le in the compare step, then compare SSIM_Y.
  5. No regression — 5/5 PASS still holds on the existing 5-codec smoke (run after changes).
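If the compare step for criterion 4 runs outside ffmpeg, the P010 → yuv420p10le conversion is a shift plus a Cb/Cr deinterleave. A hypothetical sketch, assuming a tightly packed P010 frame with even width and height on a little-endian host. In yuv420p10le the 10-bit value sits in the low bits of each 16-bit sample, whereas P010 keeps it in the high bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one tightly packed P010 frame to planar yuv420p10le. */
static void p010_to_yuv420p10(const uint16_t *p010, uint16_t *y, uint16_t *u,
                              uint16_t *v, size_t width, size_t height)
{
    const uint16_t *uv = p010 + width * height; /* interleaved CbCr plane */

    for (size_t i = 0; i < width * height; i++)
        y[i] = (uint16_t)(p010[i] >> 6);

    for (size_t i = 0; i < (width / 2) * (height / 2); i++) {
        u[i] = (uint16_t)(uv[2 * i] >> 6);     /* Cb comes first in each pair */
        v[i] = (uint16_t)(uv[2 * i + 1] >> 6);
    }
}
```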

Open / pending decisions

  • PRIME path vs copy path for 10-bit: ship copy-only first (P010 derived/created images) and defer the PRIME path to a follow-up. Many consumers read decoded frames back through vaDeriveImage / vaGetImage rather than vaExportSurfaceHandle, so this covers most cases.
  • VP9 Profile 2: confirmed HW-unsupported on RK3399; do NOT add to enumeration. Add an explicit /* not on RK3399 rkvdec */ comment in config.c near the VP9 line to prevent future "completeness" PRs adding it.