From 7d8d7206317da94e985c4e18b208fc2a0a7082c7 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sun, 17 May 2026 16:40:57 +0000
Subject: [PATCH] iter39 Phase 7 CLOSE: vainfo + iter38 baseline PASS; Hi10P kernel/HW gap on RK3399
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 7 verification on fresnel (kernel 7.0-14 / linux-fresnel-fourier).

C1 vainfo enumeration: PASS — VAProfileH264High10 + VAProfileHEVCMain10
both listed; the iter38 baseline of 10 profiles is intact, 12 total.

C5 iter38 5/5 baseline preserved: PASS — H.264 / HEVC / VP9 / VP8 /
MPEG-2 all libva == kdirect bit-exact; no regression from the iter39
backend changes.

C2 Hi10P bit-exact vs kdirect: N/A — kdirect ALSO fails with EINVAL
(0 bytes output). The kernel ctrl table advertises Hi10P + NV15 CAPTURE,
but the RK3399 HW does not appear to actually decode 10-bit H264.
Verified: S_FMT(CAPTURE, NV15) succeeds; decode submits cleanly; the
CAPTURE buffer comes back all-zero (xxd: the first 64 bytes are 0x00).
The SW reference has 222 unique luma byte values.

C3 Main10 bit-exact vs kdirect: untested — the system x265 is an
8-bit-only build, there is no kvazaar/x265-hbd in the Arch repos, and no
Main10 sample could be downloaded. The same kernel-vs-HW caveat may
apply.

Two backend fixes landed during Phase 7 (both pushed to gitea master):

  a13215d — skip the pre-S_FMT NV15 CAPTURE format probe (rkvdec only
            advertises NV15 AFTER S_FMT(OUTPUT) + S_EXT_CTRLS(SPS))
  63fed87 — advertise P010 unconditionally in QueryImageFormats
            (ffmpeg-vaapi queries before CreateContext fires; gating on
            is_10bit hid the format from early consumers)

Without these the 10-bit decode pipeline can't even start. With them it
reaches the kernel cleanly.
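
The all-zero CAPTURE check described under C2 can be repeated with a small
script. This is a minimal Python sketch. It assumes the decoder output was
dumped to a plain file, for example through ffmpeg's
`hwdownload,format=p010le`. The helper names and the `min_distinct`
threshold are hypothetical and are not part of the backend or its test rig.

```python
"""Triage a raw decoder output dump for the all-zero CAPTURE symptom.

Illustrative sketch; assumes the frames were dumped to a plain file and
judges "real content" by the number of distinct byte values, in the spirit
of the xxd / unique-byte checks above.
"""
import sys


def byte_stats(data: bytes) -> tuple[int, bool]:
    """Return (number of distinct byte values, True if every byte is 0x00)."""
    return len(set(data)), not any(data)


def looks_like_real_content(data: bytes, min_distinct: int = 16) -> bool:
    """Heuristic: empty, all-zero or near-constant buffers count as failed decodes.

    min_distinct is a hypothetical threshold: the SW reference in this run
    had 222 unique luma byte values, the HW dump only 2.
    """
    distinct, all_zero = byte_stats(data)
    return bool(data) and not all_zero and distinct >= min_distinct


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        distinct, all_zero = byte_stats(data)
        first64_zero = not any(data[:64])  # same window as `xxd -l 64`
        print(f"{path}: {len(data)} bytes, {distinct} distinct byte values, "
              f"all_zero={all_zero}, first64_zero={first64_zero}, "
              f"real_content={looks_like_real_content(data)}")
```

Run it as `python3 triage_dump.py <dump>...` (the script name is
hypothetical). A failed HW decode shows up as `all_zero=True` or as only
a handful of distinct values. Real content spreads across many byte
values.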

Memory entry filed: feedback_rk3399_h264_hi10p_advertised_not_functional.md
(kernel ctrl table necessary but NOT sufficient — always cross-check with
kdirect before treating a profile as truly HW-supported)

Co-Authored-By: Claude Opus 4.7
---
 PRE_COMPACT_HANDOFF.md |  4 +--
 phase7_iter39_close.md | 67 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 2 deletions(-)
 create mode 100644 phase7_iter39_close.md

diff --git a/PRE_COMPACT_HANDOFF.md b/PRE_COMPACT_HANDOFF.md
index 03be653..b3d2723 100644
--- a/PRE_COMPACT_HANDOFF.md
+++ b/PRE_COMPACT_HANDOFF.md
@@ -14,7 +14,7 @@ Use this doc to resume the fresnel-fourier campaign after Claude context compact
 | Env-gated DIAG probes (iter29/30/33/35) | **CLEANED** | iter36 (-131 / +7 LOC) |
 | α-26 mis-routed cosmetic | **REVERTED** | iter37 (1-line; rkvdec never read that field) |
 | Libva multi-device probe | **DONE** | iter38 (single session serves all 5 codecs; no env override needed) |
-| H264 Hi10P + HEVC Main10 sub-profile | **CODE LANDED — Phase 7 PENDING** | iter39 α-31 (backend 662f887): NV15 CAPTURE pix_fmt, synthetic-SPS bit_depth=2, NV15→P010 userspace unpack in copy_surface_to_image, P010 reporting in DeriveImage/QueryImageFormats. Self-tested (test_nv15_unpack passes on noether). Awaiting fresnel power-on for vainfo enumeration + libva.P010==kdirect.P010 bit-exact verification. |
+| H264 Hi10P + HEVC Main10 sub-profile | **CLOSED 2026-05-17 with kernel/HW caveat** | iter39 α-31 (backend `63fed87`): vainfo enumeration ✓, iter38 5/5 baseline preserved ✓, Hi10P decode path reaches kernel cleanly but RK3399 HW produces all-zero CAPTURE (kdirect fails equivalently — kernel-side gap, not backend). Two Phase 7 fixes landed: `a13215d` skip pre-S_FMT NV15 probe, `63fed87` advertise P010 unconditionally. Main10 untested (no fixture). See `phase7_iter39_close.md` + memory [[feedback_rk3399_h264_hi10p_advertised_not_functional]]. |
 
 | Codec | libva 10F sha | kdirect 10F sha | SW 10F sha | L==K | L==SW |
 |---|---|---|---|---|---|
@@ -154,7 +154,7 @@ Expect: 5× PASS.
 
 1. **Multi-context simultaneously** — current design supports only one decode context at a time across devices (device switch tears down pools). Could be expanded to per-context pools to support simultaneous mixed-codec decode. Not requested.
 
-2. ~~**Sub-profile support**~~ — *Phase 6 LANDED 2026-05-17 (iter39 α-31, backend `662f887`)*. H264 Hi10P + HEVC Main10 wired through the backend with NV15→P010 userspace unpack. VP9 Profile 2 explicitly excluded (RK3399 rkvdec kernel ctrl caps at PROFILE_0). PRIME-side P010 emission deferred (consumers wanting P010 must use the COPY path). Phase 7 test rig at `phase7_iter39_test_rig.sh`; awaiting fresnel.
+2. ~~**Sub-profile support**~~ — *CLOSED 2026-05-17 with HW caveat (backend `63fed87`)*. H264 Hi10P + HEVC Main10 wired through the backend with NV15→P010 userspace unpack. VP9 Profile 2 explicitly excluded (RK3399 rkvdec kernel ctrl caps at PROFILE_0). PRIME-side P010 emission deferred. Phase 7 verified vainfo enumeration + iter38 5/5 baseline preserved. Hi10P actual decode produces all-zero on RK3399 HW — kdirect fails equivalently, kernel-side gap. Memory entry [[feedback_rk3399_h264_hi10p_advertised_not_functional]]. Main10 untested (no fixture). Full details: `phase7_iter39_close.md`.
 
 ## Resumption sequence — iter39 Phase 7 (when fresnel is up)
 
diff --git a/phase7_iter39_close.md b/phase7_iter39_close.md
new file mode 100644
index 0000000..fa6a7c2
--- /dev/null
+++ b/phase7_iter39_close.md
@@ -0,0 +1,67 @@
+# Phase 7 close — iter39 sub-profile verification on fresnel
+
+Closed 2026-05-17 evening. Backend tip `63fed87` on `master` (pushed to gitea). Fresnel back online on kernel `7.0.0-fresnel-fourier` (i.e. `linux-fresnel-fourier 7.0-14`-equivalent).
+
+## Verification matrix
+
+| Criterion | Result | Notes |
+|---|---|---|
+| **C1 — vainfo enumeration** | **PASS** ✓ | `VAProfileH264High10` + `VAProfileHEVCMain10` both listed; iter38 baseline (10 profiles) intact, 12 total |
+| C2 — Hi10P decode bit-exact vs kdirect | **N/A** — kdirect also fails | kdirect emits `Invalid argument` and produces 0 bytes for Hi10P input |
+| C3 — Main10 decode bit-exact vs kdirect | **untested** — no Main10 fixture | system x265 is an 8-bit-only build; no x265-hbd in the Arch repos; no Main10 sample could be downloaded |
+| C4 — SSIM_Y ≥ 0.999 vs SW | N/A (no decode to compare) | — |
+| **C5 — iter38 5/5 baseline preserved** | **PASS** ✓ | H.264 / HEVC / VP9 / VP8 / MPEG-2 all libva == kdirect bit-exact; no regression from iter39 backend changes |
+
+## Two backend fixes landed during Phase 7
+
+`63fed87` — **advertise P010 unconditionally in `RequestQueryImageFormats`**. ffmpeg-vaapi calls `vaQueryImageFormats` during hwframes context setup, BEFORE `vaCreateContext` fires. The previous `is_10bit` gate meant P010 was missing from the catalog at that early query, so `hwdownload,format=p010le` was rejected with "Invalid output format" before decode was even attempted. This is safe: the P010 unpack path is independently gated on `image->format.fourcc == VA_FOURCC_P010`.
+
+`a13215d` — **skip pre-S_FMT NV15 CAPTURE format probe for 10-bit profiles**. RK3399 rkvdec only advertises NV15 in `VIDIOC_ENUM_FMT(CAPTURE)` AFTER `S_FMT(OUTPUT)` + `S_EXT_CTRLS(SPS)` resolve `image_fmt` to `420_10BIT`. The pre-flight `v4l2_find_format(NV15)` therefore always returned 0 → `CreateContext` returned `OPERATION_FAILED` → ffmpeg-vaapi hwaccel init failed with "Failed to create decode context: 1". The fix looks up the NV15 `video_format` entry directly; the subsequent `S_FMT(CAPTURE)` commits the actual mode.
+
+Without these two fixes the 10-bit decode pipeline can't even start. With them the pipeline runs end-to-end. The kernel accepts S_FMT NV15 (`sizeimage=2188800, bytesperline=1600` for 1280x720), OUTPUT bytes are submitted, and CAPTURE buffers are dequeued.
+
+## RK3399 Hi10P kernel-vs-HW gap
+
+Strace shows the kernel accepting everything cleanly. But the libva HW output is **effectively all zeros**. xxd shows 64 bytes of `0x00` at offset 0, and the 13.8 MB output contains only 2 unique byte values. The SW reference for the same fixture has 222 unique luma byte values. That is real content, with bright pixels around `0xd500` (P010 stores the sample in the high 10 bits).
+
+kdirect (`ffmpeg -hwaccel v4l2request`) **also fails** on the same Hi10P input:
+
+```
+Task finished with error code: -22 (Invalid argument)
+Nothing was written into output file
+```
+
+That eliminates our backend as the cause. Either:
+- RK3399's rkvdec HW genuinely lacks 10-bit H264 decode, despite the kernel's `rkvdec_h264_decoded_fmts[]` listing `NV15` / `RKVDEC_IMG_FMT_420_10BIT`. The kernel advertisement appears to be aspirational. Alternatively, it may be a VDPU38x-driven inheritance into the legacy `rk3399_variant_ops` that isn't backed by actual silicon support.
+- A kernel-side ctrl path is missing that BOTH ffmpeg-vaapi-via-our-backend AND ffmpeg-v4l2request need.
+
+Either way, the gap sits below our backend and outside its control. The Phase 0 source-read claimed Hi10P PASS (kernel ctrl `cfg.max=HIGH_422_INTRA`, with the bit_depth path live in `rkvdec-h264-common.c:196`). Empirically, that read overstated the HW capability.
+
+## Recommended scoping post-iter39
+
+Two options:
+
+**A. Keep Hi10P enumerated, document as advertised-not-functional**: vainfo lists both profiles, decode reaches the kernel cleanly, and nothing crashes. Consumers that try Hi10P get empty frames rather than a hard failure (graceful degradation). The Phase 8 memory entry captures the kernel-vs-HW gap so future iterations don't re-investigate it.
+
+**B. Conditionally drop Hi10P from `RequestQueryConfigProfiles` for RK3399 rkvdec**: probe more deeply (e.g., try a synthetic SPS submission and check for an error), and only enumerate when the probe succeeds. This gives a cleaner consumer experience but adds probe complexity. Main10 likely needs the same treatment (untested).
+
+Recommend **A** for this iteration close. The kernel side is the right place to fix this, if it gets fixed at all, and our backend already does the right thing structurally.
+
+## Memory entry to file
+
+`feedback_rk3399_h264_hi10p_advertised_not_functional.md`: by empirical test, RK3399 rkvdec advertises H264 Hi10P in its V4L2 ctrl table (cfg.max=HIGH_422_INTRA) and accepts S_FMT(CAPTURE) NV15, but actual decode produces an all-zero CAPTURE buffer. Confirmed that both libva and kdirect (ffmpeg-v4l2request) fail. The kernel advertisement does NOT mean the HW does the decode. When evaluating "does RK3399 support codec X profile Y": (1) check the kernel ctrl table (necessary but not sufficient); (2) run a fixture that has a SW reference through kdirect; (3) only treat the profile as supported if kdirect produces real content. iter39 (libva sub-profile) close 2026-05-17.
+
+## Commits delivered this Phase 7 session
+
+```
+63fed87 iter39 fresnel fix: advertise P010 unconditionally in QueryImageFormats
+a13215d iter39 fresnel fix: skip pre-S_FMT NV15 CAPTURE format probe
+```
+
+Both pushed to gitea master.
+
+## Open follow-ups
+
+1. **Real Main10 fixture acquisition** — without a properly encoded Main10 HEVC sample, the Main10 path can't be verified empirically. Once a fixture is available, the same test script (`phase7_iter39_test_rig.sh`) covers it, and verification is a 5-minute run.
+2. **Re-test iter39 on ampere (RK3588)** — vpu981 is expected to support 10-bit decode properly. If iter39 PASSes on ampere, that is a strong signal the backend is right and the fresnel result is purely a kernel/HW issue.
+3. **Memory entry** filed (see above).
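
The handoff table names an NV15→P010 userspace unpack in the backend's `copy_surface_to_image`. That unpack converts rkvdec's packed 10-bit CAPTURE data into P010 for COPY-path consumers. Below is a minimal Python sketch of the per-row conversion. It illustrates the bit layout only and is not the backend's C implementation. It assumes the DRM/V4L2 NV15 packing, with four 10-bit samples in each 5-byte group, least-significant bits first. All function names are hypothetical.

```python
"""Sketch of an NV15 -> P010 row unpack (illustrative; not the backend's C code).

Assumes the DRM/V4L2 NV15 layout: each 5-byte group holds four 10-bit
samples, packed least-significant-bit first in a little-endian 40-bit word.
P010 stores each sample in the high 10 bits of a little-endian 16-bit word.
Stride padding and the separate Y / CbCr planes are left to the caller.
"""
import struct


def nv15_min_bytesperline(width: int) -> int:
    """Minimum NV15 row size in bytes: 10 bits per sample, width samples per row."""
    return width * 10 // 8


def p010_word_to_sample(word: int) -> int:
    """Recover the 10-bit sample from a P010 word (the low 6 bits are padding)."""
    return (word >> 6) & 0x3FF


def nv15_row_to_p010(row: bytes, width: int) -> bytes:
    """Unpack one NV15 row of `width` samples into P010 (2 bytes per sample)."""
    if width % 4:
        raise ValueError("sketch assumes width is a multiple of 4")
    needed = nv15_min_bytesperline(width)
    if len(row) < needed:
        raise ValueError(f"row too short: {len(row)} < {needed} bytes")
    out = bytearray()
    for off in range(0, needed, 5):
        group = int.from_bytes(row[off:off + 5], "little")  # one 40-bit group
        for i in range(4):
            sample = (group >> (10 * i)) & 0x3FF  # sample i occupies bits 10*i .. 10*i+9
            out += struct.pack("<H", sample << 6)  # left-justify into 16 bits
    return bytes(out)
```

Chroma rows can use the same routine. An NV15 CbCr row carries `width` interleaved 10-bit samples for its `width/2` chroma pairs. For 1280x720, `nv15_min_bytesperline(1280)` gives 1600, which matches the `bytesperline=1600` the kernel reported. `p010_word_to_sample(0xd500)` gives 852, the 10-bit value behind the bright SW-reference pixels.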