Adds the iter39 sub-profile (H264 Hi10P + HEVC Main10) FR landing
materials and resumption sequence to the campaign repo.
- phase4_iter39_subprofile_plan.md: full Phase 4 plan with Phase 5
sonnet-architect review amendments folded in. Documents the
Option A/B/C/D scope tree, the locked Option C choice (full NV15→P010
userspace unpack), the LOC breakdown (~180), and the test plan.
- phase7_iter39_test_rig.sh: end-to-end test script for fresnel. Encodes
Hi10P + Main10 fixtures, runs libva vs kdirect bit-exact comparison
(both via `-vf hwdownload,format=p010le` to normalize the NV15 stride
difference between paths), SSIM_Y check vs SW reference, and verifies
the iter38 5/5 baseline still holds.
- PRE_COMPACT_HANDOFF.md: TL;DR table row for iter39 (committed
pending validation), Phase 7 resumption sequence, internals-summary
for future-session resumption.
Backend tip: `662f887` (iter39 α-31) + `8746690` (unpack self-test) on
gitea master. Self-test passes on noether x86_64; compile-test clean on
boltzmann aarch64 native; self-review of commit vs Phase 5 amendments
APPROVED. Phase 7 actual decode test blocked on fresnel power-on.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes the last architectural open item from iter37 campaign-close.
VA_DRIVER_INIT now opens BOTH rkvdec and hantro fds when no env
override is set. RequestQueryConfigProfiles enumerates the union
(via any_fd_supports_output_format helper). RequestCreateConfig
calls request_switch_device_for_profile to retarget the active fd
to the device that serves the profile; tears down output_pool +
capture_pool + video_format cache on switch so the next
RequestCreateContext rebuilds them on the new device.
Bonus iter38b: fixed latent bounds-check bug in
RequestQueryConfigProfiles — profile array bounds checks should use
MAX_PROFILES (11), not MAX_CONFIG_ATTRIBUTES (10). Pre-iter38 single-
device probes never returned more than 9 profiles so the off-by-one
never bit. iter38's 10-profile union surfaced it.
Result: 5/5 codecs PASS bit-exact vs kdirect with NO env override.
$ vainfo (no env)
v4l2-request: auto-selected: /dev/video3 + /dev/media1
v4l2-request: iter38: also opened hantro at /dev/video2 + /dev/media0
→ all 10 profiles enumerated (MPEG2+H264+HEVC+VP8+VP9)
Backend tip 7ac934e.
Investigation requested via /user 'mpeg2 next' after iter33 closed
MPEG-2 at libva==kdirect with libva!=SW gap. iter35 verifies:
1. libva==kdirect holds across:
- BBB 720p MPEG-2 240F (sha a75015b10fe205ea)
- Synthetic testsrc 720p MPEG-2 48F (sha 5557dae9fa99d01b)
Rules out fixture coincidence.
2. v4l2 ctrl structs (SEQUENCE + PICTURE + QUANTISATION) semantically
correct per V4L2 spec via iter35 env-gated DIAG dump. Intra Q
matrix in zigzag scan order; flags = PROGRESSIVE+FRAME_PRED_DCT.
3. HW vs SW gap quantified:
- mean byte-diff = 0.015, max = 3
- SSIM = 0.999923 (visually identical)
Within IEEE 1180 / ISO 13818-2 Annex A MPEG-2 IDCT precision
tolerance. Not fixable at V4L2/libva layer.
ffmpeg's own SW IDCT options (-idct auto/int/simple/...) produce 4
distinct hashes for the same bitstream — libavcodec internally has
the same precision divergence between impls. Hantro HW IDCT is yet
another impl, not matching any SW.
Backend tip 48fd028 (added env-gated LIBVA_MPEG2_DUMP_FRAME).
Memory: feedback_mpeg2_hw_sw_idct_precision.md added.
Confirmed the 40-byte inflation is non-uniform — IDR slice has correct
size from VAAPI; only P/B slices are inflated. Real fix requires dynamic
rbsp_stop_bit detection or per-slice-type logic.
α-23 test (skip media_request_reinit): no change. HEVC still 06b2c5a0...
all-zero. Kernel printk still shows w=0 h=0 for libva.
Cumulative disproved mechanisms (iter17-iter19):
2. REINIT clears between S_EXT_CTRLS and QUEUE: DISPROVED (α-23)
3. Stale stack-local pointer: DISPROVED (α-21)
5. Silent partial failure via error_idx: DISPROVED (α-22)
1. request_fd mismatch: unlikely per strace evidence
Remaining:
4. ctrl_hdl mismatch — libva submits to one v4l2_ctrl_handler,
rkvdec reads from another.
iter20 candidate: kernel printk dumping &ctx->ctrl_hdl, per-ID
ctrl pointer, and *p_cur.p first bytes during rkvdec_hevc_run_preamble.
Comparing libva vs kdirect will pinpoint where the mismatch sits.
State at close: backend c1d4bb53... (iter15 stable). Fork tip 415688d.
5-codec anchors held. Diagnostic kernel 7.0-3 still running on fresnel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
α-21 (heap-persist HEVC controls past IOC_QUEUE): hash unchanged.
-> Kernel does copy at S_EXT_CTRLS time, not deferred. Mechanism 3 dead.
α-22 (log error_idx after S_EXT_CTRLS): error_idx = count - 1 in BOTH
the working device-init batch AND the broken per-frame batch. Not
a failure indicator in this kernel version. Mechanism 5 dead.
Backend reverted to iter15 stable state c1d4bb53... All 5-codec
anchors preserved.
Remaining mechanisms (untested):
1. request_fd mismatch (unlikely; strace shows consistent fd)
2. REINIT clears controls between S_EXT_CTRLS and QUEUE (LEADING)
4. ctrl_hdl mismatch (libva submits to one, rkvdec reads from another)
iter17's empirical finding still stands as the campaign's strongest
narrowing: rkvdec sees zero SPS for libva, correct for kdirect. The
mechanism is between S_EXT_CTRLS submission and ctx->ctrl_hdl->p_cur
read, specific to libva's invocation pattern.
iter19 candidate (α-23): test mechanism 2 by disabling
media_request_reinit() in libva's RequestSyncSurface. If hashes
change, REINIT timing is the bug. Alternative (mechanism 4): kernel
printk that dumps &ctx->ctrl_hdl + per-request handler pointer,
comparing libva vs kdirect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DEFINITIVE FINDING via pr_info in rkvdec_hevc_run on RK3399:
libva HEVC: w=0 h=0 reorder=0 chroma=0 nal_unit_type=0 decode_flags=0x0
kdirect HEVC: w=1280 h=720 reorder=2 chroma=1 nal_unit_type=20 decode_flags=0x3
The kernel sees ALL-ZERO control structs for libva HEVC, but CORRECT values
for kdirect. Same kernel, same code path, same /dev/video1, same
rkvdec_hevc_run_preamble fetching v4l2_ctrl_find(ctx->ctrl_hdl,
HEVC_SPS)->p_cur.p.
This overturns iter11-iter15's "wire-byte search exhausted" conclusion.
The S_EXT_CTRLS payloads ARE byte-correct at the strace observer level,
but the kernel sees zeros. The bug is in the
S_EXT_CTRLS -> request -> ctx->ctrl_hdl path, specifically for libva.
Five mechanisms hypothesized:
1. request_fd mismatch
2. REINIT clears controls before QUEUE
3. Compound-control copy deferred until QUEUE -> stack-locals stale
4. ctrl_hdl mismatch (libva submits to one, rkvdec reads another)
5. error_idx silently fails
Key difference observed:
libva stores SPS/PPS/decode_params as STACK LOCALS in h265_set_controls
kdirect stores them in heap-allocated hwaccel_picture_private
Mechanism 3 (kernel defers compound-ctrl copy_from_user) is the leading
hypothesis. iter18 α-21: heap-allocate libva's HEVC control structs;
if Bug 5 fixes, apply same pattern to H.264 (Bug 4) and VP8 (Bug 6).
This is the strongest narrowing since iter5b-β.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per iter16 close (Bug 4/5/6 confirmed kernel-side, libva byte-correct),
add a single pr_info at rkvdec_hevc_run entry dumping key state values
from run->sps / pps / slices_params[0] / decode_params. Build 7.0-3,
deploy, reboot, run libva-HEVC + kdirect-HEVC, diff dmesg output.
Outcome interpretations:
identical -> bug is in rkvdec assemble_hw_*/config_registers/HW path
different -> libva somehow leaks different struct contents via non-
ioctl path despite identical V4L2 ioctls
Build running on boltzmann via kernel-agent workflow; pkgrel 7.0-2 -> 7.0-3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Applied iter14's α-16 OUTPUT byte verification to VP8. Result:
libva VP8 frame 1 OUTPUT dump: 300614 bytes
input IVF frame 1: 300624 bytes
diff with +10-byte offset (VP8 uncompressed header stripped by VAAPI
consumer client-side): 0 bytes differ.
Libva's VP8 OUTPUT bytes are byte-identical to the input frame minus
the 10-byte uncompressed header. Same correctness as iter14's HEVC
verification.
Cumulative finding: ALL THREE remaining campaign bugs (Bug 4 H.264
partial-fill, Bug 5 HEVC all-zero, Bug 6 VP8 partial) have:
- libva controls byte-equal to kdirect on rkvdec-read fields
- libva OUTPUT bitstream bytes byte-identical to input
- libva ioctl sequence structurally close to kdirect after iter15 α-19
But:
- VP9 + MPEG-2 work via the same libva backend on the same kernel.
- libva HEVC/H.264 hash to wrong output; kdirect HEVC/H.264 hash to
correct output. Same kernel.
Therefore Bug 4 + 5 + 6 are kernel-side rkvdec/hantro per-codec bugs
specific to libva's ioctl pattern. Per
feedback_libva_byte_correct_kernel_bug.md (saved iter14), libva-side
changes are confirmed inert for these bugs.
iter17 productive direction: kernel-side investigation via
kernel-agent workflow. Read rkvdec source, instrument via ftrace/
eBPF kprobe, compare kernel state evolution between libva-trigger
and kdirect-trigger for same bitstream.
No code changes in iter16. Substrate unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 ioctl-sequence diff identified missing S_FMT CAPTURE in libva
init (only G_FMT was being called, per iter5b-β's hantro-targeted
comment). α-19 added explicit S_FMT CAPTURE with NV12 + dims after
S_FMT OUTPUT, before CREATE_BUFS. strace confirms libva now emits
identical S_FMT CAPTURE call to kdirect:
S_FMT CAPTURE NV12 1280x720 -> sizeimage=1843200, bytesperline=1280
5-codec sweep on α-19 backend: byte-identical anchors. HEVC still
06b2c5a0... all-zero, H.264 still 71ac099b... partial. Wire correct,
behavior unchanged.
Cumulative iter8-iter15: 14 hypotheses eliminated for Bug 4 + 5. Libva
backend ioctl + payload sequence is now structurally equivalent to
kdirect's at every byte/field level rkvdec reads. Remaining diffs are
in allocation pattern (REQBUFS vs incremental CREATE_BUFS) and pool
sizes (libva 24+16, kdirect ~13+4) — high-risk to change without
clearer kernel evidence; VP9/MPEG-2 work with libva's pattern.
Bug 4 + 5 confirmed kernel-side rkvdec failures specific to HEVC +
H.264 paths on RK3399 that libva's pattern triggers and kdirect's
doesn't. Per-codec kernel-level investigation is the only productive
direction; route via kernel-agent.
α-19 ships as wire-correctness hygiene (zero regression). Backend
SHA c1d4bb53...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
α-16 OUTPUT byte dump: libva HEVC frame 1 = 96893 bytes = 1 ANNEX-B
start code + 96890 byte IDR NAL with header 0x28 (nal_unit_type 20 =
IDR_N_LP, correct). Byte-compared against input file's raw HEVC
ANNEX-B stream (after VPS+SPS+PPS): 0 bytes differ over 96890 byte
overlap. The 1-byte tail diff is an inter-NAL boundary marker, not
slice payload.
Libva submits BYTE-IDENTICAL slice bytes as what the input contains
and what kdirect submits. Combined with iter11's wire-byte audit
showing every libva-vs-kdirect control diff is in a field rkvdec
ignores, AND iter12's RFC v2 substrate upgrade producing zero
codec-correctness change, AND iter13's DMA_BUF_IOCTL_SYNC ioctl
working but inert:
Cumulative iter8-iter14: 13 hypotheses eliminated. Libva backend
is empirically byte-correct on its side. Bug 4 + Bug 5 are
KERNEL-SIDE failures specific to how rkvdec processes the libva
ioctl sequence vs the kdirect sequence — NOT a libva backend bug.
iter15+ candidates:
- Full ioctl-sequence trace diff (libva vs kdirect, find first
divergence in syscall order/args).
- kernel-side rkvdec ftrace/eBPF kprobe instrumentation; route
via kernel-agent.
- Campaign close-out: VP9+MPEG-2 PASS direct, HEVC+H.264+VP8 narrowed
to kernel-side with byte-clean libva submission.
Backend SHA fa2098b6... 8 cumulative iter11-iter14 commits all ship
clean (wire-correctness, env-gated diagnostics, zero regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
α-17 implemented and deployed. strace confirms VIDIOC_EXPBUF +
DMA_BUF_IOCTL_SYNC(START|READ) before memcpy + END after, all returning 0.
The libva backend now follows the V4L2+dma-buf cache-sync contract
correctly. But 5-codec sweep hashes are byte-identical to anchors:
no Bug 4/5 movement.
Cache-sync hypothesis empirically falsified. Bug 4 + 5 are NOT a CPU
cache-coherency issue on the libva cached-mmap path.
Three consecutive PARTIAL closes (iter11 wire-byte, iter12 RFC v2,
iter13 cache-sync) confirms libva-backend-side hypothesis space for
Bug 4+5 is exhausted. The live source is kernel-side write-
completeness for HEVC and H.264 on RK3399 rkvdec — distinct from
cache visibility (γ dump iter8 already confirmed destination_data[]
post-DQBUF matches YUV output).
Backend SHA on fresnel: 9ba47002...
iter14 candidates:
α-16: OUTPUT byte dump (cheapest remaining)
kernel-side rkvdec audit (deepest; route via kernel-agent)
pivot to Bug 6 VP8 or campaign close-out documentation
α-17 itself is real wire-correctness progress even as a non-fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Boltzmann built linux-fresnel-fourier 7.0-2 in ~50 min (8-core native,
no distcc). Package sha 843fd4462a09b3d9... Deployed to fresnel:
sudo pacman -U clean. extlinux hook updated entry. sddm autologin as
mfritsche persisted. Reboot succeeded; fresnel up on new kernel
within 30s.
5-codec sweep post-reboot: all 5 hashes BYTE-IDENTICAL to pre-iter12
anchors. RFC v2's dma_resv fence machinery does NOT engage libva's
cached-mmap pixel readback path. Consistent with what
reference_dmabuf_resv_blocker.md memo always said: vaDeriveImage /
cached-mmap is the broken path; RFC v2 helps DRM_PRIME / compositor
paths.
Substrate state moved forward (kernel 7.0-1 -> 7.0-2 with RFC v2).
Memory entries updated:
reference_fresnel_kernel_substrate.md (pkg version + patch list)
feedback_rfc_v2_vb2_dma_resv_scope.md (NEW — scope clarification)
iter13 candidates ranked:
α-17: DMA_BUF_IOCTL_SYNC(START|END) in libva backend around image
read sites (~30 LOC).
α-18: switch libva image export to DRM_PRIME (larger refactor).
α-16: OUTPUT byte dump (deferred again).
α-17 is the natural follow-on — Figa's 2024 "userspace responsibility
for explicit sync" line directly addresses the libva-cached-mmap path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User signaled RFC v2 is prepared at boltzmann:~/v2-patch-work/v2-out/.
Three patches:
0001 media: videobuf2: add opt-in dma_resv producer fence helper
0002 media: hantro: attach dma_resv release fence at device_run
0003 media: rockchip-rga: attach dma_resv release fence at ...
v2 key change vs v1: attach moves from buf_queue to m2m device_run
(Dufresne's finite-time-contract concern). Build the kernel package
on boltzmann (~/src/kernel-agent-bootstrap/.../linux-fresnel-fourier/),
deploy to fresnel, reboot, retest.
sddm auto-login as mfritsche staged in /etc/sddm.conf.d/20-autologin.conf
on fresnel before reboot per user authorization.
Phase 0's α-16 OUTPUT-byte dump candidate parked; kernel substrate
upgrade takes precedence given RFC v2 is the long-stalled
reference_dmabuf_resv_blocker.md unblock.
Iter12 outcomes:
PASS = Bug 4/5 hashes shift toward kdirect after reboot.
PARTIAL = kernel upgraded cleanly, no regression, hashes unchanged.
Either outcome is valuable — substrate moves forward regardless.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After iter11 close, both Bug 4 (H.264 partial fill) and Bug 5 (HEVC
all-zero) share the same architectural pattern: libva control payloads
can be made byte-equivalent to kdirect for fields rkvdec consumes,
yet libva produces wrong output while kdirect succeeds.
Remaining unexamined surface = OUTPUT bitstream bytes (source_data
that the kernel reads). iter12 candidate α-16: extend γ infra to
dump source_data pre-QBUF, compare with kdirect.
If bytes match → both bugs are outside libva (kernel/HW state).
If bytes differ → narrow to bitstream-write divergence site.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
α-13 + α-14 changed two HEVC wire-byte fields to match kdirect
(sps_max_num_reorder_pics, decode_params IRAP|IDR flags). Output
unchanged (06b2c5a0... still all-zero). 5-codec regression sweep:
zero regression.
Cumulative iter11 eliminations: 4 fields (sps_max_num_reorder_pics,
sps_max_latency_increase_plus1, IRAP/IDR flags, num_entry_point_offsets)
all confirmed kernel-ignored on RK3399 per rkvdec-hevc.c grep.
Wire-byte landscape after iter11: every observable libva-vs-kdirect
HEVC control-payload diff is in a field rkvdec ignores. Bug 5 root
cause is NOT in S_EXT_CTRLS payload.
Same wall as Bug 4 / iter9: wire-byte search exhausted. Real cause is
in OUTPUT bitstream bytes the kernel reads. iter12 candidate: extend
γ infrastructure to dump source_data pre-QBUF, compare with kdirect
byte-by-byte. Bug 4 and Bug 5 likely both close via this same
instrumentation given the parallel structure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer empirically read rkvdec-hevc.c on boltzmann kernel-agent
tree. sps_max_num_reorder_pics is NOT read by rkvdec. α-13 would
match kdirect's wire bytes but produce no behavioural change.
CRIT-2: num_entry_point_offsets (libva hardcoded 0 at h265.c:356,
kdirect 22 from slice header parse) + PPS UNIFORM_SPACING flag are
the live candidates. BBB HEVC uses WPP (ENTROPY_CODING_SYNC flag
set in PPS) not tiles; 22 entry points = 23 CTB rows for 720p with
32-pixel CTBs.
Decision: land α-13 as wire-correctness hygiene (matches kdirect,
no regression risk), then α-14 for the actual fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
iter10 closed negative at Phase 0 (Bommarito unreachable on RK3399).
Saved kernel build + reboot cycle by source-tree reachability check.
iter11 opens with Bug 5 (HEVC libva all-zero) as research target.
Replay iter8/iter9 methodology: deep strace HEVC libva vs kdirect,
decode V4L2_CID_STATELESS_HEVC_* control payload bytes, find the
diff that causes rkvdec to produce all-zero output for libva while
kdirect's submission produces correct decode.
In scope: src/h265.c (libva HEVC), Phase 3 strace + byte-decode.
Out of scope: ext_sps_st/lt_rps (VDPU381/383-only, not RK3399),
kernel patches until empirical evidence of a kernel-side gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical reachability check on linux-fresnel-fourier 7.0-1 source tree
at boltzmann:~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/.../linux-7.0/.
rkvdec_hevc_assemble_hw_rps() is defined in rkvdec-hevc-common.c:411 and
called ONLY from rkvdec-vdpu381-hevc.c:609 (RK3576) and
rkvdec-vdpu383-hevc.c:620 (RK3588). RK3399's variant_ops bind to
rockchip,rk3399-vdec and route HEVC through the older standalone
rkvdec-hevc.c, which does NOT call rkvdec_hevc_assemble_hw_rps.
Bommarito's May 13 patch is real and load-bearing on RK3588/3576,
but inert on RK3399 / fresnel. Not iter10 vehicle for Bug 5.
Saved a kernel build/reboot cycle by Phase-0 reachability check.
Memory rule candidate: before applying any upstream patch to fresnel's
kernel, verify the patched path is reachable from rockchip,rk3399-vdec.
mainline rkvdec has diverging per-variant code (VDPU381/383 vs RK3399
legacy).
iter10 candidate pivots:
- α-10: audit rkvdec-hevc.c (RK3399 legacy) for analogous OOB gaps;
same KUnit/KASAN methodology Bommarito used. Route via kernel-agent
per user directive.
- α-12: stay on Bug 4 H.264 (PPS deep diff / OUTPUT bitstream).
User directive registered: "consult kernel-agent for kernel work."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
α-7 (monotonic timestamp counter) changed wire bytes but H.264 output
unchanged (71ac099b...). Confirms Phase 5 CRIT-1 prediction: VP9/MPEG-2
PASS via libva with the same v4l2_timeval_to_ns(&ref->timestamp)
pattern; therefore timestamp magnitude was never load-bearing.
5-codec regression sweep: all 4 non-H.264 anchors hold. Zero regression.
Cumulative state after iter8+iter9:
- 6 hypotheses eliminated (libva-readback, slot-binding, stale-residue,
constraint_set_flags, POC sentinel, reference_ts magnitude)
- libva-vs-kdirect H.264 wire-byte diff is now empirically zero
- α-2 + α-7 shipped as wire-payload hygiene cleanups (zero behavior
change but cleaner semantics)
iter10 candidate ranking:
1. α-8 OUTPUT bitstream byte dump (compare in-memory slice bytes)
2. α-9 untraced control diff (device-wide controls beyond DECODE_MODE
+ START_CODE)
3. Kernel-side investigation (rkvdec source dive for 16x32 partial-
decode signature)
4. Pivot to Bug 5 (HEVC) or Bug 6 (VP8)
Two more iterations of diminishing returns suggest either deeper
empirical work (OUTPUT-byte dump or kernel investigation) or pivot
to a different bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VP9 (vp9.c:624) and MPEG-2 (mpeg2.c:150,156) use v4l2_timeval_to_ns
identically to H.264. Both PASS via libva with the same gettimeofday-
based giant ns values. If timestamp magnitude were the bug, VP9/MPEG-2
should also fail. They don't.
Reviewer flagged α-7 as low-probability fix and pointed to iter10
kernel-side investigation (M-A vb2_find_buffer_by_timestamp overflow)
if α-7 confirmed inert.
IMP-1: timestamp_counter should live in object_context not driver_data
to avoid multi-context collisions.
Decision: implement α-7 anyway as empirical confirmation (5 min) since
test cost is trivial. If α-7 fails as predicted, iter9 closes PARTIAL
with wire-byte search exhausted; iter10 candidates pivot to slice-data
encoding or kernel investigation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 deep-strace yielded a critical narrowing:
- Post-DPB DECODE_PARAMS bytes (512-559): IDENTICAL libva vs kdirect
- PPS: IDENTICAL
- SPS: identical except inert constraint_set_flags
- DPB[0] beyond reference_ts: IDENTICAL after α-2
The ONLY remaining wire-byte diff between libva (broken) and kdirect
(working) is reference_ts magnitude. libva uses gettimeofday giving
~1.78e18 ns; kdirect uses an internal counter giving ~10000 ns.
α-7 hypothesis: V4L2 stateless decoder (rkvdec) reference-resolution
fails for very large reference_ts values. Possible mechanisms:
M-A: vb2_find_buffer_by_timestamp truncates/overflows on giant values.
M-B: V4L2 framework transforms OUTPUT QBUF ts before storing on CAPTURE
but DPB.reference_ts left untransformed → mismatch.
M-C: gettimeofday + v4l2_timeval_to_ns produce slightly different ns
values than the kernel computes from the timeval QBUF.
Fix: ~10 LOC. Add timestamp_counter to driver_data; replace
gettimeofday in EndPicture with monotonic counter.
If α-7 works → iter9 PASS, Bug 4 closed.
If α-7 doesn't → iter9 PARTIAL, wire-byte search space effectively
exhausted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sonnet-architect Phase 5b read rkvdec-h264.c end-to-end and confirmed:
constraint_set_flags is NEVER accessed by the driver. assemble_hw_pps()
reads only chroma_format_idc, bit_depth_*, log2_max_frame_num_minus4,
max_num_ref_frames, pic_order_cnt_type, log2_max_pic_order_cnt_lsb_minus4,
and dimension fields. rkvdec_h264_validate_sps() doesn't validate it.
CONSTRAINT_SET3_FLAG and PROFILE_IDC in the hardware PPS packet are
hardcoded constants (1 and 0xFF respectively), not propagated from the
incoming SPS.
α-1 will not unblock Bug 4. Plan-killer.
CRIT-2: ConstrainedBaseline 0x42 mapping is wrong (bit 6 reserved);
correct value 0x12 (bit 1 | bit 4) per H.264 §A.2.1.1.
IMP-1 redirects: DPB entry flags + POC fields are the next candidate.
rkvdec config_registers() reads dpb[i].flags ACTIVE/FIELD bits and
dpb[i].fields TOP/BOT bits. lookup_ref_buf_idx() substitutes destination
buffer as reference when ACTIVE missing — silent corruption matching
observed symptom.
IMP-2/3: full PPS byte comparison + close-criteria framing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
γ dump confirms libva reads buffer correctly; the 16x32 patch and
stride-4 UV markers appear at YUV output exactly as in the dump.
IMP-1 memset-before-QBUF test: pre-zeroing buffer does NOT change output
(identical hash). The 512 bytes ARE deterministic kernel writes, not
stale residue.
Bug root cause: rkvdec accepts libva's H.264 decode request without
error flags but writes only 16x32 of luma-neutral data + stride-4 UV
scratch. Kernel decoded a tiny bit then stopped.
Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's
0x02 — likely the kernel hint that triggers rkvdec's full decode path
for Main profile. Phase 4b α-1 fix: derive constraint_set_flags per
VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CRIT-1: request_log prepends prefix on every call; per-byte loop in γ
sketch would emit 32 prefix-only lines. Fix: snprintf buffered emit.
CRIT-2: γ dump block missing null guard on destination_data[]; the
plan's env-var check is outside the current_slot != NULL guard. Fix:
nest the dump inside the existing slot-null guard.
IMP-1: "stale residue from prior decode" not eliminated as alternative
explanation for the 16x32 patch. Add memset-zero-before-QBUF experiment
to Phase 7 to discriminate.
IMP-2: γ-first defensible but on IMP-1 grounds, not the
three-signature argument (which is weaker than stated).
IMP-3/4 placement clarifications. MIN-1/2/3 cosmetic.
5 mechanical amendments locked for Phase 6. γ-first strategy stands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 redefined Bug 4 to partial-fill (not inter race). Three distinct
per-codec signatures (VP9 correct, HEVC zero, H.264 partial-leak) can't
be explained by a single hypothesis. Phase 4 commits to γ first: a
~30 LOC env-gated diagnostic dump in RequestSyncSurface that fires
after CAPTURE DQBUF, prints first/last 32 bytes of each destination_data
plane and a non-zero-count of the first 1024 bytes.
γ definitively distinguishes "kernel didn't write" from "libva mis-reads"
from "slot binding wrong". Phase 4b targeted fix follows γ's outcome.
Out of scope: per-codec H.264 control-fill changes (gated on γ's
findings), VP9/VP8/HEVC/MPEG-2 paths, kernel patches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 strace + byte-level analysis on fresnel rkvdec. Findings:
1. Bug 4 is NOT inter-race-loss. The IDR keyframe itself fails through
libva (only 512 bytes of real Y data at top-left 16x32 region).
2. The 16x32 leak is structured real image content (smooth gradients,
neutral luma ~0x80) — kernel decoded one tile / one MB pair, then
stopped.
3. VP9 via libva WORKS through the same readback path (100% non-zero,
real image data). So the bug isn't generic DMA-BUF cache coherency.
4. HEVC fails via libva (all-zero, distinct from H.264 partial-fill).
5. OUTPUT sizeimage = 1MB (SOURCE_SIZE_MAX) is sufficient — BBB IDR is
only 6321 bytes. Not the bug.
6. Control payload diffs: SPS.constraint_set_flags = 0 vs kdirect's 2
(probably cosmetic); DECODE_PARAMS.dpb[0].bottom_field_order_cnt = 0
vs kdirect's 1 (load-bearing for POC).
Refined hypothesis: a specific H.264 control field libva sends causes
rkvdec to abort after partial decode. Phase 4 candidates: α fix POC
fields, β bump OUTPUT sizeimage, γ instrumentation dump, δ relative
timestamps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User pick at iter8 open. Carried unchanged through 5 iters (iter4..iter7);
keyframe partially decodes (frame-1 first 16 bytes = real chroma) while
inter frames return all-zero. Pass criterion: libva_h264 == kdirec_h264
== sw_h264 byte-identical for bbb_1080p30_h264.mp4 3-frame, including
inter frames.
In scope: src/h264.c, src/h264_slice_header.c, src/picture.c H.264 paths,
per-frame request_fd lifecycle. Out of scope: VP9/VP8/HEVC/MPEG-2, kernel
patches, performance, all other backlog items.
Substrate at iter8 open: fork tip 6df2159 (iter7), backend SHA 520507f6..,
kernel linux-fresnel-fourier 7.0-1, auto-detect picks rkvdec on every boot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>