7d8d720631
Phase 7 verification on fresnel (kernel 7.0-14 / linux-fresnel-fourier).
C1 vainfo enumeration: PASS — VAProfileH264High10 + VAProfileHEVCMain10
both listed; iter38 baseline 10 profiles intact at 12 total.
C5 iter38 5/5 baseline preserved: PASS — H.264 / HEVC / VP9 / VP8 /
MPEG-2 all libva == kdirect bit-exact, no regression from iter39
backend changes.
C2 Hi10P bit-exact vs kdirect: N/A — kdirect ALSO fails with EINVAL
(0 bytes output). The kernel ctrl table advertises Hi10P + NV15
CAPTURE but RK3399 HW doesn't actually decode 10-bit H264. Verified:
S_FMT(CAPTURE, NV15) succeeds; decode submits cleanly; CAPTURE buffer
returns all-zero. xxd 64 bytes of 0x00. SW reference has 222 unique
luma bytes.
C3 Main10 bit-exact vs kdirect: untested — system x265 is 8-bit-only
build, no kvazaar/x265-hbd in Arch repos, no Main10 sample downloaded
successfully. Same kernel-vs-HW caveat may apply.
Two backend fixes landed during Phase 7 (both pushed to gitea master):
a13215d — skip pre-S_FMT NV15 CAPTURE format probe (rkvdec only
advertises NV15 AFTER S_FMT(OUTPUT) + S_EXT_CTRLS(SPS))
63fed87 — advertise P010 unconditionally in QueryImageFormats
(ffmpeg-vaapi queries before CreateContext fires; gating
on is_10bit hid the format from early consumers)
Without these the 10-bit decode pipeline can't even start. With them
it reaches the kernel cleanly.
Memory entry filed:
feedback_rk3399_h264_hi10p_advertised_not_functional.md
(kernel ctrl table necessary but NOT sufficient — always cross-check
with kdirect before treating a profile as truly HW-supported)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
247 lines
16 KiB
Markdown

# Pre-Compact Handoff — Session 2026-05-17 (iter39 sub-profile work landed, pending fresnel test)

Use this doc to resume the fresnel-fourier campaign after Claude context compaction. **Iter38 close still holds (5/5 PASS, single libva session). Iter39 sub-profile work (H264 Hi10P + HEVC Main10) committed at backend `662f887` and awaiting Phase 7 validation on fresnel.**

## TL;DR

| Bug / Item | Status | Fix iter |
|---|---|---|
| Bug 4 (H.264 keyframe-partial) | **FIXED** | iter25 α-25 (rkvdec image_fmt pre-seed via synthetic SPS at CreateContext) |
| Bug 5 (HEVC libva all-zero CAPTURE) | **FIXED** | iter25 α-25 (frame 1) + iter31 α-29 (frames 2+: slice_params.short_term_ref_pic_set_size from VAAPI st_rps_bits) |
| VP8 wrong output through libva | **FIXED** | iter33 α-30 (prepend 10/3 byte VP8 uncompressed header to OUTPUT — ffmpeg-vaapi strips it) |
| MPEG-2 HW differs from SW | **NOT A BUG** | hantro IDCT precision (≤3 LSB / pixel, SSIM > 0.9999); libva == kdirect bit-exact |
| Kernel diagnostic printks | **CLEANED** | iter32 (7.0-11) + iter34 (7.0-14) |
| Env-gated DIAG probes (iter29/30/33/35) | **CLEANED** | iter36 (-131 / +7 LOC) |
| α-26 mis-routed cosmetic | **REVERTED** | iter37 (1-line; rkvdec never read that field) |
| Libva multi-device probe | **DONE** | iter38 (single session serves all 5 codecs; no env override needed) |
| H264 Hi10P + HEVC Main10 sub-profile | **CLOSED 2026-05-17 with kernel/HW caveat** | iter39 α-31 (backend `63fed87`): vainfo enumeration ✓, iter38 5/5 baseline preserved ✓, Hi10P decode path reaches kernel cleanly but RK3399 HW produces all-zero CAPTURE (kdirect fails equivalently — kernel-side gap, not backend). Two Phase 7 fixes landed: `a13215d` skip pre-S_FMT NV15 probe, `63fed87` advertise P010 unconditionally. Main10 untested (no fixture). See `phase7_iter39_close.md` + memory [[feedback_rk3399_h264_hi10p_advertised_not_functional]]. |

| Codec | libva SHA (10 frames) | kdirect SHA (10 frames) | SW SHA (10 frames) | libva == kdirect | libva == SW |
|---|---|---|---|---|---|
| H.264 | dd4f5f2d552c07bc | same | same | ✓ | ✓ |
| HEVC | 108f925bb6cbb6c9 | same | same | ✓ | ✓ |
| VP9 | cf35908ae0f9ab60 | same | same | ✓ | ✓ |
| VP8 | d3231e5b6c0ee10b | same | same | ✓ | ✓ |
| MPEG-2 | 95c5905890c937d4 | same | 933b744134e47ba4 | ✓ | ~ (≤3 LSB IDCT precision) |

**5/5 PASS** on the libva-vs-kdirect bit-exact correctness contract. 4/5 are also bit-identical to SW.

`vainfo` with NO env override enumerates the union of profiles from rkvdec + hantro:

```
v4l2-request: auto-selected codec device: /dev/video3 + /dev/media1
v4l2-request: iter38: also opened hantro-vpu decoder at /dev/video2 + /dev/media0
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
```

## Substrate state

| Component | Location | Tip |
|---|---|---|
| Campaign repo (this) | `/home/mfritsche/src/fresnel-fourier/` | `ba4b6fd` on gitea master |
| Libva backend fork (noether) | `/home/mfritsche/src/libva-multiplanar/libva-v4l2-request-fourier/` | `662f887` on gitea master (iter39 α-31; iter38b is `7ac934e`) |
| Libva backend (fresnel deploy) | `/home/mfritsche/src/libva-v4l2-request-fourier/` | sync to gitea master, `ninja -C build` |
| Kernel source (boltzmann) | `~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/linux-fresnel-fourier/` | pkgrel=14 clean |
| Kernel running on fresnel | `linux-fresnel-fourier 7.0-14` | clean shipping kernel, no diagnostic printks |
| Test fixtures (fresnel) | `/home/mfritsche/fourier-test/bbb_*.{mp4,ts,webm}` | 5 codecs at 720p10s or 1080p30 |
| Memory | `~/.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/` | see entries below |

## Identity for gitea pushes

All `git.reauktion.de` interactions use the `claude-noether` identity (per memory `feedback_gitea_as_claude_noether.md`). Backend remote URL: `ssh://gitea@git.reauktion.de.claude-noether/marfrit/libva-v4l2-request-fourier.git`.

## Device map on 7.0-14

`/dev/video*` and `/dev/media*` numbers SHIFT between kernel boots based on probe order. On the current 7.0-14 boot:

| Driver | /dev/videoN | /dev/mediaN |
|---|---|---|
| rockchip-rga | video0 | n/a |
| rk3399-vpu-enc | video1 | (shared) |
| rk3399-vpu-dec (hantro) | **video2** | **media0** |
| rkvdec | **video3** | **media1** |

`v4l2-ctl --info` + `media-ctl -p` if mapping uncertain on a fresh boot. Iter38 makes this irrelevant for typical use — libva auto-probes both.

## Backend commits delivered (newest first, this campaign day)

```
7ac934e iter38b: bounds check uses MAX_PROFILES (11), not MAX_CONFIG_ATTRIBUTES (10)
c56a77b iter38: multi-device probe — single libva session serves all 5 codecs ← architectural close
25d3e5f iter37: revert α-26 — decode_params.short_term_ref_pic_set_size back to 0
7db15a5 iter36: remove env-gated DIAG probes (iter29/30/33/35)
48fd028 iter35 DIAG: env-gated dump of v4l2_ctrl_mpeg2_* contents (removed iter36)
7e0848d iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer ← VP8 fix
bf3e3d8 iter33: extend VP8 DIAG to dump VAAPI probability struct directly (removed iter36)
4b3c21b iter33 DIAG: env-gated dump of v4l2_ctrl_vp8_frame contents (removed iter36)
23eb1bd iter31 α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits ← HEVC fix
68dbbdd iter30 DIAG: LIBVA_TS_SCALE env-gated timestamp multiplier (removed iter36)
0eca3ff iter29 DIAG: env-gated dump of HEVC slice_data trailing 80 bytes (removed iter36)
6646b16 Revert iter28b DIAG: trim=40 universal-trim broke IDR frame 1
cd286d9 iter28 α-28: bit_size = (slice_data_size - data_byte_offset) * 8 for HEVC
754be1d iter27 diag: env-gated VAAPI slice fields dump
719d813 iter27 α-27: populate slice_params.num_entry_point_offsets (no-op)
66ef848 iter26 α-26: decode_params.short_term_ref_pic_set_size from VAAPI (reverted iter37)
d062fec iter25 α-25 fix: FRAME_MBS_ONLY flag for H264 dummy SPS
db0b7f9 iter25 α-25: inject synthetic SPS before cap_pool_init to seed image_fmt ← H264+HEVC frame 1 fix
```

Load-bearing commits: `db0b7f9 + d062fec` (α-25), `23eb1bd` (α-29), `7e0848d` (α-30), `c56a77b + 7ac934e` (iter38 multi-device).

## Campaign repo commits delivered (today's arc)

```
ba4b6fd iter38 close: multi-device probe — 5/5 codecs in one libva session
7e3eadf iter36 close: env-gated DIAG removed, 5/5 PASS retained
7c06c51 iter35 close: MPEG-2 verified libva-correct; HW IDCT precision intrinsic
70ddbd6 iter34 close: kernel 7.0-14 CLEAN ship — 5/5 codecs PASS
cd2d077 iter33: MPEG-2 closed (libva==kdirect bit-exact) — 5/5 codecs PASS
51eee19 iter33 α-30 close: VP8 FIXED — 4/5 codecs PASS
acacf3d iter32 close: kernel substrate cleanup landed → 7.0-11 SHIPPING
85cc178 Update campaign session doc: full-day arc closes at 3/3 PASS
fde8a25 Update handoff doc: HEVC Bug 5 fully fixed (3/3 PASS)
c1f9738 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
422ecaf Add pre-compact handoff doc for session resumption
… earlier in day: c15fc6c, 8b17bf7, 02c4192, bf67900 (iter20-28 chain)
```

## How to verify the current state

Run on fresnel (post-7.0-14 boot, no env override needed):

```bash
for codec in h264:bbb_1080p30_h264.mp4 hevc:bbb_720p10s_hevc.mp4 vp9:bbb_720p10s_vp9.webm vp8:bbb_720p10s_vp8.webm mpeg2:bbb_720p10s_mpeg2.ts; do
  name="${codec%%:*}"; fixture="${codec#*:}"
  env LIBVA_DRIVER_NAME=v4l2_request \
      LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
      ffmpeg -hide_banner -loglevel error -y \
      -hwaccel vaapi -hwaccel_output_format vaapi \
      -i "/home/mfritsche/fourier-test/$fixture" \
      -vf "hwdownload,format=nv12" -frames:v 10 \
      -f rawvideo -pix_fmt nv12 "/tmp/L_${name}.yuv"
  ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request -hwaccel_output_format drm_prime \
      -i "/home/mfritsche/fourier-test/$fixture" -vf "hwdownload,format=nv12" \
      -frames:v 10 -f rawvideo -pix_fmt nv12 "/tmp/K_${name}.yuv"
  L=$(sha256sum "/tmp/L_${name}.yuv" | cut -c1-16)
  K=$(sha256sum "/tmp/K_${name}.yuv" | cut -c1-16)
  [ "$L" = "$K" ] && echo "$name: PASS" || echo "$name: FAIL"
done
```

Expect: 5× PASS.

## Root cause summary

**Bug 4 + Bug 5 frame 1 (iter25 α-25)**: `rkvdec_s_ctrl` returns -EBUSY when first SPS triggers image_fmt reset on a busy CAPTURE queue. libva pre-allocated 24 CAPTURE buffers at CreateContext (iter5b-β design) before per-frame S_EXT_CTRLS. Fix: inject synthetic SPS at CreateContext, pre-cap_pool_init, while CAPTURE is still empty.

**Bug 5 frame 2+ (iter31 α-29)**: libva backend set `slice_params->short_term_ref_pic_set_size = 0` (stale "VAAPI doesn't expose" comment). rkvdec's `assemble_sw_rps` (rkvdec-hevc.c:386-389) reads this; when zero with `num_short_term_ref_pic_sets <= 1`, falls back to 0 → entropy decoder consumes slice-header bits as long-term-RPS → garbage for every non-IDR slice. IDR is gated by `!IDR_PIC` so frame 1 was unaffected. Fix: `slice_params->short_term_ref_pic_set_size = picture->st_rps_bits` (VAAPI's field IS the slice-header bit count, per `va_dec_hevc.h` doc). α-26 had mis-routed this value into `decode_params` (same field name in V4L2, different semantics — SPS-side bit count) — reverted in iter37.

**VP8 (iter33 α-30)**: ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes interframe / 10 bytes keyframe) before submitting via VAAPI. ffmpeg-v4l2request keeps it. Hantro hard-codes `first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3` and uses it for both `mb_offset_bits` and `dct_part_offset`. Without the prepended header in libva's OUTPUT, hantro's offset arithmetic lands inside the compressed bitstream and the entropy decoder produces garbage. Fix: in `codec_store_buffer`, prepend `header_size` zero bytes to OUTPUT for VP8 profile (hantro skips these bytes for actual parsing, uses ctrl-struct values).

**Multi-device probe (iter38)**: VA_DRIVER_INIT opens BOTH rkvdec + hantro fds. `RequestCreateConfig` retargets `driver_data->{video,media}_fd` to the right device per profile (tearing down pools on switch). `RequestQueryConfigProfiles` unions across all open fds. iter38b fixed a latent off-by-one: bounds checks used `MAX_CONFIG_ATTRIBUTES` (10) but profile array is sized by `MAX_PROFILES` (11) — pre-iter38 never returned more than 9 profiles so the bug never bit.
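
The VP8 fix (iter33 α-30) summarised above amounts to padding the front of the OUTPUT payload so that hantro's hard-coded 10/3-byte offsets line up again. A minimal shell sketch of that padding step, working on a file instead of a V4L2 buffer; the function names `vp8_header_size` and `vp8_prepend_header` are illustrative only and do not exist in the backend:

```bash
# Size of the VP8 uncompressed frame header that ffmpeg-vaapi strips:
# 10 bytes for a key frame, 3 bytes for an interframe.
# $1: 1 for a key frame, 0 for an interframe
vp8_header_size() {
    if [ "$1" -eq 1 ]; then echo 10; else echo 3; fi
}

# Write <header_size> zero bytes followed by the stripped payload.
# $1: key-frame flag, $2: payload file, $3: output file
vp8_prepend_header() {
    local is_key=$1 payload=$2 out=$3
    { head -c "$(vp8_header_size "$is_key")" /dev/zero; cat "$payload"; } > "$out"
}
```

Zero bytes are enough here because, per the root cause above, hantro takes the header values from the control struct and only uses the header length for its offset arithmetic.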

## Open items (low priority, optional polish)

1. **Multi-context simultaneously** — current design supports only one decode context at a time across devices (device switch tears down pools). Could be expanded to per-context pools to support simultaneous mixed-codec decode. Not requested.

2. ~~**Sub-profile support**~~ — *CLOSED 2026-05-17 with HW caveat (backend `63fed87`)*. H264 Hi10P + HEVC Main10 wired through the backend with NV15→P010 userspace unpack. VP9 Profile 2 explicitly excluded (RK3399 rkvdec kernel ctrl caps at PROFILE_0). PRIME-side P010 emission deferred. Phase 7 verified vainfo enumeration + iter38 5/5 baseline preserved. Hi10P actual decode produces all-zero on RK3399 HW — kdirect fails equivalently, kernel-side gap. Memory entry [[feedback_rk3399_h264_hi10p_advertised_not_functional]]. Main10 untested (no fixture). Full details: `phase7_iter39_close.md`.
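
The all-zero CAPTURE symptom from item 2 is easy to confuse with an empty output file, so a small helper that tells the two apart can be handy when re-checking a profile. This is a sketch only; `is_all_zero` is a hypothetical name and is not part of the test rig:

```bash
# Exit status: 0 = file is non-empty and every byte is 0x00,
#              1 = file contains at least one non-zero byte,
#              2 = file is missing or empty (decoder wrote nothing).
is_all_zero() {
    local file=$1
    [ -s "$file" ] || return 2
    # Delete every NUL byte; if nothing is left, the file was all zero.
    [ "$(tr -d '\000' < "$file" | wc -c)" -eq 0 ]
}
```

Typical use would be along the lines of `is_all_zero /tmp/out.yuv && echo "all-zero CAPTURE"`, where the path stands in for whatever raw dump the decode run produced.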

## Resumption sequence — iter39 Phase 7 (when fresnel is up)

```bash
# 1. Sync + build backend on fresnel
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && \
    git fetch && git reset --hard origin/master && \
    ninja -C build && \
    sudo install -m644 build/src/v4l2_request_drv_video.so /usr/lib/dri/'

# 2. Push test rig + run
scp ~/src/fresnel-fourier/phase7_iter39_test_rig.sh fresnel:/tmp/
ssh fresnel 'bash /tmp/phase7_iter39_test_rig.sh'

# Expected pass criteria:
#   1. vainfo lists VAProfileH264High10 + VAProfileHEVCMain10
#   2. libva.P010 SHA == kdirect.P010 SHA for Hi10P and Main10 fixtures
#      (both paths use -vf hwdownload,format=p010le to normalize NV15)
#   3. SSIM_Y vs libavcodec SW (yuv420p10le) >= 0.999
#   4. iter38 5/5 PASS baseline still holds on H264/HEVC/VP9/VP8/MPEG-2
```
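
Pass criterion 3 above needs the luma SSIM as a number. One way to get it, assuming the comparison is run through ffmpeg's `ssim` filter (which prints a summary line of the form `SSIM Y:… U:… V:… All:…`), is to parse that line and compare against the threshold. The helper names below are hypothetical and the input file names in the comment are placeholders, not fixtures from this campaign:

```bash
# Print the last "SSIM Y:<value>" figure found on stdin.
ssim_y_from_log() {
    sed -n 's/.*SSIM Y:\([0-9.]*\).*/\1/p' | tail -n 1
}

# Succeed when the luma SSIM read from stdin is >= the threshold in $1.
# Exit status 2 means no SSIM summary line was found at all.
ssim_y_at_least() {
    local threshold=$1 value
    value=$(ssim_y_from_log)
    [ -n "$value" ] || return 2
    awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v >= t) }'
}

# Assumed invocation (placeholder file names):
#   ffmpeg -i hw_output.mkv -i sw_reference.mkv -lavfi ssim -f null - 2>&1 \
#       | ssim_y_at_least 0.999
```

The summary is printed at ffmpeg's default log level, so the comparison run should not use `-loglevel error`; raw `.yuv` inputs would additionally need explicit format and size options.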

## Iter39 internals — pre-Phase 7 verification done

- **Self-test** of `nv15_unpack_plane_to_p010` (`tests/test_nv15_unpack.c` in backend): zero / all-max / 8 known vectors / remainder widths {1,2,3,7} / multi-row stride-padding / chroma-shape — ALL PASS on noether x86_64.
- **Compile-test**: aarch64 native build on boltzmann clean (gcc 15.2.1 / libva 1.23.0 / libdrm 2.4.133), .so produced, 0 new warnings.
- **Self-review of commit 662f887** vs Phase 5 amendments: APPROVED. All 3 mandatory amendments + MAX_PROFILES bump + guard updates + NV15-stride source confirmed present.

## Iter39 design notes (load-bearing)

- `driver_data->is_10bit` is the per-session flag (request.h). Set in `RequestCreateContext` from `config_object->profile`, cleared in `RequestDestroyContext`. Drives image.c P010 reporting/unpack and context.c CAPTURE pix_fmt.
- `video_format` cache invalidated on bit-depth transition (sibling to iter38's device-switch invalidation in `request_switch_device_for_profile`). Same session can now alternate Main → Main10 contexts.
- Synthetic SPS pre-seed (α-25 lineage) extended for 10-bit: `bit_depth_luma_minus8 = 2`. Image_fmt resolution in `rkvdec-h264-common.c:196` + `rkvdec-hevc-common.c:467` dispatches on bit_depth_luma_minus8 only — profile_idc ignored, `v4l2_ctrl_hevc_sps` has no profile_idc field at all.
- NV15 stride = V4L2-reported `destination_bytesperlines[i]` (kernel may pad above `ceil(width/4)*5`). NEVER assume `width*2`.
- VP9 Profile 2 NOT in any path. Added comment in config.c near VAProfileVP9Profile0 case to deter future "completeness" PRs.
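
The NV15 stride note above is plain arithmetic: four 10-bit samples occupy five bytes, so the smallest legal row is `ceil(width/4)*5` bytes and the kernel may report more. A quick shell sketch for sanity-checking a reported stride against that lower bound; both function names are made up for illustration:

```bash
# Minimum NV15 row size in bytes for a given width in pixels:
# 4 pixels are packed into 5 bytes, so a row needs ceil(width / 4) * 5.
nv15_min_stride() {
    local width=$1
    echo $(( (width + 3) / 4 * 5 ))
}

# Succeed only if the kernel-reported stride ($2) can hold a row of
# the given width ($1). The reported value remains the one to use.
nv15_stride_ok() {
    local width=$1 reported=$2
    [ "$reported" -ge "$(nv15_min_stride "$width")" ]
}
```

For a 1920-pixel row the lower bound is 2400 bytes, well short of the 3840 bytes that `width*2` would give, which is why the unpack has to walk the source with the V4L2-reported stride.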

## Memory entries (full campaign set)

- `feedback_rkvdec_image_fmt_pre_seed.md` — α-25 (Bug 4 + Bug 5 frame 1)
- `feedback_va_st_rps_bits_is_slice_field.md` — α-29 (Bug 5 frame 2+)
- `feedback_vaapi_strips_vp8_uncompressed_header.md` — α-30 (VP8)
- `feedback_mpeg2_hw_sw_idct_precision.md` — MPEG-2 PASS criterion = libva==kdirect (HW vs SW gap intrinsic per spec)
- `feedback_multi_device_probe_design.md` — iter38 dual-fd architecture + MAX_PROFILES bounds gotcha
- `feedback_libva_byte_correct_kernel_bug.md` — **FULLY OVERTURNED** (both Bug 4 + Bug 5 are libva-side fixes)
- `reference_fresnel_kernel_substrate.md` — 7.0-14 clean, device-enumeration-shift caveat
- MEMORY.md index updated

## Key commands quick reference

```bash
# Sync backend on fresnel + rebuild
ssh fresnel 'cd ~/src/libva-v4l2-request-fourier && git fetch && git reset --hard origin/master && ninja -C build'

# 5-codec smoke (above script). Each codec ~5s.

# Identify which video device is rkvdec vs hantro after a fresh boot
# (v4l2-ctl indents the "Card type" line, so allow leading whitespace)
ssh fresnel 'for v in /dev/video*; do v4l2-ctl -d $v --info 2>/dev/null | grep -E "^[[:space:]]*Card type" | head -1 | awk -v dev=$v "{print dev,\$0}"; done'

# vainfo (auto-detects + opens both decoders since iter38)
ssh fresnel 'env LIBVA_DRIVER_NAME=v4l2_request \
    LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
    vainfo'

# kdirect reference (works for any codec; hwaccel auto-routes)
ssh fresnel 'ffmpeg -hide_banner -loglevel error -y -hwaccel v4l2request -hwaccel_output_format drm_prime \
    -i /home/mfritsche/fourier-test/bbb_720p10s_hevc.mp4 \
    -vf "hwdownload,format=nv12" -frames:v 10 -f rawvideo -pix_fmt nv12 /tmp/y.yuv'

# Force single-device mode (skip iter38 alt-probe)
env LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1 ...

# Reboot fresnel (sddm autologin reseats mfritsche)
ssh fresnel 'sudo systemctl reboot'; sleep 60
```

## Safe vs needs-confirmation actions

**Safe (no confirmation needed)**:
- Read/grep on noether, boltzmann, fresnel
- Push to gitea (claude-noether identity)
- Reboot fresnel (sddm autologin restores session)
- Build kernel on boltzmann via `makepkg -ef --skipinteg --noconfirm`
- Deploy kernel via `scp` + `sudo pacman -U`
- Run ffmpeg/cmp tests on fresnel

**Needs confirmation**:
- Significant rebuild (~25-30 min CPU on boltzmann, e.g. ffmpeg full rebuild or fresh kernel build)
- Per-context pool refactor (item 1 — would allow simultaneous mixed-codec decode but is invasive)
- Sub-profile rollout (item 2)