Commit Graph

98 Commits

Author SHA1 Message Date
marfrit c1f9738368 iter31 α-29 close: HEVC Bug 5 remainder FIXED — 3/3 PASS
Fix: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits
(was 0, mis-routed via α-26 into decode_params with same field name).

Final 5-codec state:
- H.264 10F: PASS (byte-equal SW)
- HEVC 10F: PASS (byte-equal SW)  ← THIS ITER
- VP9 10F: PASS (byte-equal SW)
- MPEG-2 / VP8: untestable through libva single-device probe
  (pre-existing limitation, orthogonal to Bug 4/5)

Backend fork tip: 23eb1bd. Kernel: 7.0-10 (diagnostic printks still in,
production cleanup outstanding).
2026-05-14 15:30:48 +00:00
marfrit 422ecafca9 Add pre-compact handoff doc for session resumption 2026-05-14 14:55:15 +00:00
marfrit c15fc6c0f6 iter28b DIAG documented: universal trim=40 breaks IDR (reverted)
Confirmed the 40-byte inflation is non-uniform — IDR slice has correct
size from VAAPI; only P/B slices are inflated. Real fix requires dynamic
rbsp_stop_bit detection or per-slice-type logic.
2026-05-14 14:45:35 +00:00
marfrit 8b17bf797a Final session summary: H264 + VP9 + HEVC frame 1 byte-equal to SW
Bug 4 (H264 keyframe-partial): FIXED.
Bug 5 (HEVC libva all-zero): partial fix, frame 1 byte-equal.
Root cause: rkvdec_s_ctrl -EBUSY when first SPS triggers image_fmt
reset on busy CAPTURE queue (libva pre-allocates buffers at
CreateContext, kernel blocks the reset).
Fix: 90-LOC synthetic SPS injection in libva CreateContext before
cap_pool_init pre-seeds ctx->image_fmt.

Remaining: HEVC frame 2+ (ffmpeg-vaapi slice_data 40-byte inflation),
MPEG-2/VP8 (libva multi-device probe). Both deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 12:10:08 +00:00
marfrit 02c4192902 iter27/28: probe HEVC frame 2+ divergence; α-27/α-28 no-op; ffmpeg-vaapi slice_data inflation localized
α-27: num_entry_point_offsets — VAAPI returns 0, rkvdec doesn't use it
α-28: bit_size = (slice_data_size - data_byte_offset) * 8 — matches kdirect's
      printk value, but rkvdec doesn't use bit_size either. Output unchanged.

Remaining HEVC frame 2+ root cause: libva's slice_data buffer (from VAAPI)
is 40 bytes larger per slice than what ffmpeg-v4l2request appends from
libavcodec for the same frame. The trailing bytes inflate OUTPUT buffer
content → rkvdec reads past slice payload into garbage → frame 2+ wrong.

Campaign status: H264 PASS (Bug 4 fixed), HEVC frame 1 PASS (Bug 5 partial),
VP9 PASS, HEVC frame 2+ ⚠️ (deferred to ffmpeg-vaapi-level fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 10:28:34 +00:00
marfrit bf67900cd8 iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5
iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24:    pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
           to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
           before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
           ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
           still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
           HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
           picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
           dp now match kdirect. HEVC frame 2+ still diverges
           (separate bug, likely DPB entry mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 10:10:56 +00:00
marfrit a443ad73d3 iter19 Phase 8 close: mechanism 2 (REINIT) disproved; ctrl_hdl mismatch is sole remaining hypothesis
α-23 test (skip media_request_reinit): no change. HEVC still 06b2c5a0...
all-zero. Kernel printk still shows w=0 h=0 for libva.

Cumulative disproved mechanisms (iter17-iter19):
  2. REINIT clears between S_EXT_CTRLS and QUEUE: DISPROVED (α-23)
  3. Stale stack-local pointer: DISPROVED (α-21)
  5. Silent partial failure via error_idx: DISPROVED (α-22)
  1. request_fd mismatch: unlikely per strace evidence

Remaining:
  4. ctrl_hdl mismatch — libva submits to one v4l2_ctrl_handler,
     rkvdec reads from another.

iter20 candidate: kernel printk dumping &ctx->ctrl_hdl, per-ID
ctrl pointer, and *p_cur.p first bytes during rkvdec_hevc_run_preamble.
Comparing libva vs kdirect will pinpoint where the mismatch sits.

State at close: backend c1d4bb53... (iter15 stable). Fork tip 415688d.
5-codec anchors held. Diagnostic kernel 7.0-3 still running on fresnel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:04:11 +00:00
marfrit a449cec92e iter18 Phase 8 close: mechanisms 3 + 5 disproved; iter17 finding stands
α-21 (heap-persist HEVC controls past IOC_QUEUE): hash unchanged.
  -> Kernel does copy at S_EXT_CTRLS time, not deferred. Mechanism 3 dead.

α-22 (log error_idx after S_EXT_CTRLS): error_idx = count - 1 in BOTH
  the working device-init batch AND the broken per-frame batch. Not
  a failure indicator in this kernel version. Mechanism 5 dead.

Backend reverted to iter15 stable state c1d4bb53... All 5-codec
anchors preserved.

Remaining mechanisms (untested):
  1. request_fd mismatch (unlikely; strace shows consistent fd)
  2. REINIT clears controls between S_EXT_CTRLS and QUEUE (LEADING)
  4. ctrl_hdl mismatch (libva submits to one, rkvdec reads from another)

iter17's empirical finding still stands as the campaign's strongest
narrowing: rkvdec sees zero SPS for libva, correct for kdirect. The
mechanism is between S_EXT_CTRLS submission and ctx->ctrl_hdl->p_cur
read, specific to libva's invocation pattern.

iter19 candidate (α-23): test mechanism 2 by disabling
media_request_reinit() in libva's RequestSyncSurface. If hashes
change, REINIT timing is the bug. Alternative (mechanism 4): kernel
printk that dumps &ctx->ctrl_hdl + per-request handler pointer,
comparing libva vs kdirect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:02:19 +00:00
marfrit cbead4ec64 iter17 Phase 7: KERNEL PRINTK FINDS THE BUG — controls lost between S_EXT_CTRLS and rkvdec read
DEFINITIVE FINDING via pr_info in rkvdec_hevc_run on RK3399:

libva HEVC:    w=0 h=0 reorder=0 chroma=0 nal_unit_type=0 decode_flags=0x0
kdirect HEVC:  w=1280 h=720 reorder=2 chroma=1 nal_unit_type=20 decode_flags=0x3
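
The two dmesg lines above can be compared field by field. A minimal
Python sketch, assuming the key=value layout shown in this message; the
helper names are hypothetical and are not part of the backend or kernel:

```python
def parse_kv(line):
    """Parse 'w=0 h=0 flags=0x3' style pairs into a dict of ints."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            # base 0 accepts both decimal ("1280") and hex ("0x3")
            fields[key] = int(value, 0)
    return fields


def diff_fields(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


libva = parse_kv("w=0 h=0 reorder=0 chroma=0 nal_unit_type=0 decode_flags=0x0")
kdirect = parse_kv(
    "w=1280 h=720 reorder=2 chroma=1 nal_unit_type=20 decode_flags=0x3"
)
mismatches = diff_fields(libva, kdirect)
for name, (seen_libva, seen_kdirect) in mismatches.items():
    print(f"{name}: libva={seen_libva} kdirect={seen_kdirect}")
```

Every field differs, and every libva value is zero, which is the
signature this commit reports.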

The kernel sees ALL-ZERO control structs for libva HEVC, but CORRECT values
for kdirect. Same kernel, same code path, same /dev/video1, same
rkvdec_hevc_run_preamble fetching v4l2_ctrl_find(ctx->ctrl_hdl,
HEVC_SPS)->p_cur.p.

This overturns iter11-iter15's "wire-byte search exhausted" conclusion.
The S_EXT_CTRLS payloads ARE byte-correct at the strace observer level,
but the kernel sees zeros. The bug is in the
S_EXT_CTRLS -> request -> ctx->ctrl_hdl path, specifically for libva.

Five mechanisms hypothesized:
  1. request_fd mismatch
  2. REINIT clears controls before QUEUE
  3. Compound-control copy deferred until QUEUE -> stack-locals stale
  4. ctrl_hdl mismatch (libva submits to one, rkvdec reads another)
  5. error_idx silently fails

Key difference observed:
  libva stores SPS/PPS/decode_params as STACK LOCALS in h265_set_controls
  kdirect stores them in heap-allocated hwaccel_picture_private

Mechanism 3 (kernel defers compound-ctrl copy_from_user) is the leading
hypothesis. iter18 α-21: heap-allocate libva's HEVC control structs;
if that fixes Bug 5, apply the same pattern to H.264 (Bug 4) and VP8 (Bug 6).

This is the strongest narrowing since iter5b-β.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:55:58 +00:00
marfrit 57051b665c iter17 Phase 0: kernel-side rkvdec_hevc_run diagnostic printk
Per iter16 close (Bug 4/5/6 confirmed kernel-side, libva byte-correct),
add a single pr_info at rkvdec_hevc_run entry dumping key state values
from run->sps / pps / slices_params[0] / decode_params. Build 7.0-3,
deploy, reboot, run libva-HEVC + kdirect-HEVC, diff dmesg output.

Outcome interpretations:
  identical -> bug is in rkvdec assemble_hw_*/config_registers/HW path
  different -> libva somehow leaks different struct contents via non-
                ioctl path despite identical V4L2 ioctls

Build running on boltzmann via kernel-agent workflow; pkgrel 7.0-2 -> 7.0-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:44:57 +00:00
marfrit caf480ef71 iter16 Phase 8 close: VP8 OUTPUT byte-verified — Bug 4/5/6 same cause class
Applied iter14's α-16 OUTPUT byte verification to VP8. Result:
  libva VP8 frame 1 OUTPUT dump: 300614 bytes
  input IVF frame 1: 300624 bytes
  diff with +10-byte offset (VP8 uncompressed header stripped by VAAPI
  consumer client-side): 0 bytes differ.
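
The offset comparison above can be sketched as follows. This is an
illustrative Python helper with a hypothetical name, run here on
synthetic bytes rather than the real dumps:

```python
def count_diffs_with_offset(output, source, offset):
    """Count bytes where output differs from source[offset:]."""
    tail = source[offset:]
    if len(tail) != len(output):
        raise ValueError("lengths do not line up after applying the offset")
    return sum(1 for a, b in zip(output, tail) if a != b)


# The sizes reported above differ by exactly the stripped header length.
header_len = 300624 - 300614

# Synthetic stand-in: a header of that length followed by a payload.
payload = bytes(range(256)) * 4
source_frame = bytes(header_len) + payload
submitted = payload

differing = count_diffs_with_offset(submitted, source_frame, header_len)
print(header_len, differing)
```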

Libva's VP8 OUTPUT bytes are byte-identical to the input frame minus
the 10-byte uncompressed header. Same correctness as iter14's HEVC
verification.

Cumulative finding: ALL THREE remaining campaign bugs (Bug 4 H.264
partial-fill, Bug 5 HEVC all-zero, Bug 6 VP8 partial) have:
- libva controls byte-equal to kdirect on rkvdec-read fields
- libva OUTPUT bitstream bytes byte-identical to input
- libva ioctl sequence structurally close to kdirect after iter15 α-19

But:
- VP9 + MPEG-2 work via the same libva backend on the same kernel.
- libva HEVC/H.264 hash to wrong output; kdirect HEVC/H.264 hash to
  correct output. Same kernel.

Therefore Bug 4 + 5 + 6 are kernel-side rkvdec/hantro per-codec bugs
specific to libva's ioctl pattern. Per
feedback_libva_byte_correct_kernel_bug.md (saved iter14), libva-side
changes are confirmed inert for these bugs.

iter17 productive direction: kernel-side investigation via
kernel-agent workflow. Read rkvdec source, instrument via ftrace/
eBPF kprobe, compare kernel state evolution between libva-trigger
and kdirect-trigger for same bitstream.

No code changes in iter16. Substrate unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:37:26 +00:00
marfrit 42c0515900 iter15 Phase 8 close: α-19 S_FMT CAPTURE wires up, 14 hypotheses eliminated
Phase 3 ioctl-sequence diff identified missing S_FMT CAPTURE in libva
init (only G_FMT was being called, per iter5b-β's hantro-targeted
comment). α-19 added explicit S_FMT CAPTURE with NV12 + dims after
S_FMT OUTPUT, before CREATE_BUFS. strace confirms libva now emits
identical S_FMT CAPTURE call to kdirect:
  S_FMT CAPTURE NV12 1280x720 -> sizeimage=1843200, bytesperline=1280
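
The reported sizeimage is larger than a bare NV12 frame. One
decomposition that reproduces the number, ASSUMING the driver appends
128 bytes of per-macroblock scratch to the NV12 payload (this is an
assumption, not stated in this commit; verify against the rkvdec source):

```python
def div_round_up(n, d):
    """Integer ceiling division."""
    return -(-n // d)


def nv12_payload(width, height):
    """Luma plane plus half-size interleaved chroma plane."""
    return width * height * 3 // 2


def sizeimage_with_mb_scratch(width, height, bytes_per_mb=128):
    """NV12 payload plus an ASSUMED per-16x16-macroblock scratch area."""
    macroblocks = div_round_up(width, 16) * div_round_up(height, 16)
    return nv12_payload(width, height) + bytes_per_mb * macroblocks


bare = nv12_payload(1280, 720)
padded = sizeimage_with_mb_scratch(1280, 720)
print(bare, padded)
```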

5-codec sweep on α-19 backend: byte-identical anchors. HEVC still
06b2c5a0... all-zero, H.264 still 71ac099b... partial. Wire correct,
behavior unchanged.

Cumulative iter8-iter15: 14 hypotheses eliminated for Bug 4 + 5. Libva
backend ioctl + payload sequence is now structurally equivalent to
kdirect's at every byte/field level rkvdec reads. Remaining diffs are
in allocation pattern (REQBUFS vs incremental CREATE_BUFS) and pool
sizes (libva 24+16, kdirect ~13+4) — high-risk to change without
clearer kernel evidence; VP9/MPEG-2 work with libva's pattern.

Bug 4 + 5 confirmed kernel-side rkvdec failures specific to HEVC +
H.264 paths on RK3399 that libva's pattern triggers and kdirect's
doesn't. Per-codec kernel-level investigation is the only productive
direction; route via kernel-agent.

α-19 ships as wire-correctness hygiene (zero regression). Backend
SHA c1d4bb53...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:35:37 +00:00
marfrit 18f24cd26d iter14 Phase 8 close: α-16 finds libva HEVC OUTPUT bytes BYTE-IDENTICAL to input
α-16 OUTPUT byte dump: libva HEVC frame 1 = 96893 bytes = 1 ANNEX-B
start code + 96890 byte IDR NAL with header 0x28 (nal_unit_type 20 =
IDR_N_LP, correct). Byte-compared against input file's raw HEVC
ANNEX-B stream (after VPS+SPS+PPS): 0 bytes differ over 96890 byte
overlap. The 1-byte tail diff is an inter-NAL boundary marker, not
slice payload.
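
The header check above follows from the HEVC NAL unit header layout:
nal_unit_type is the six bits after forbidden_zero_bit in the first
header byte. A small Python sketch (function and table names are
illustrative only):

```python
# Subset of HEVC nal_unit_type values relevant to this log.
HEVC_NAL_NAMES = {
    19: "IDR_W_RADL",
    20: "IDR_N_LP",
    32: "VPS_NUT",
    33: "SPS_NUT",
    34: "PPS_NUT",
}


def hevc_nal_unit_type(first_header_byte):
    """Extract the 6-bit nal_unit_type from the first NAL header byte."""
    return (first_header_byte >> 1) & 0x3F


nal_type = hevc_nal_unit_type(0x28)
print(nal_type, HEVC_NAL_NAMES.get(nal_type, "other"))

# 96893 total bytes minus the 96890-byte NAL leaves the start code length.
start_code_len = 96893 - 96890
```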

Libva submits slice bytes BYTE-IDENTICAL to what the input contains
and what kdirect submits. Combined with iter11's wire-byte audit
showing every libva-vs-kdirect control diff is in a field rkvdec
ignores, AND iter12's RFC v2 substrate upgrade producing zero
codec-correctness change, AND iter13's DMA_BUF_IOCTL_SYNC ioctl
working but inert:

Cumulative iter8-iter14: 13 hypotheses eliminated. Libva backend
is empirically byte-correct on its side. Bug 4 + Bug 5 are
KERNEL-SIDE failures specific to how rkvdec processes the libva
ioctl sequence vs the kdirect sequence — NOT a libva backend bug.

iter15+ candidates:
  - Full ioctl-sequence trace diff (libva vs kdirect, find first
    divergence in syscall order/args).
  - kernel-side rkvdec ftrace/eBPF kprobe instrumentation; route
    via kernel-agent.
  - Campaign close-out: VP9+MPEG-2 PASS direct, HEVC+H.264+VP8 narrowed
    to kernel-side with byte-clean libva submission.

Backend SHA fa2098b6... 8 cumulative iter11-iter14 commits all ship
clean (wire-correctness, env-gated diagnostics, zero regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:29:10 +00:00
marfrit 2eaf737145 iter13 Phase 8 close: α-17 DMA_BUF_IOCTL_SYNC ioctls fire but hashes unchanged
α-17 implemented and deployed. strace confirms VIDIOC_EXPBUF +
DMA_BUF_IOCTL_SYNC(START|READ) before memcpy + END after, all returning 0.
The libva backend now follows the V4L2+dma-buf cache-sync contract
correctly. But 5-codec sweep hashes are byte-identical to anchors:
no Bug 4/5 movement.

Cache-sync hypothesis empirically falsified. Bug 4 + 5 are NOT a CPU
cache-coherency issue on the libva cached-mmap path.

Three consecutive PARTIAL closes (iter11 wire-byte, iter12 RFC v2,
iter13 cache-sync) confirm that the libva-backend-side hypothesis space for
Bug 4+5 is exhausted. The live source is kernel-side write-
completeness for HEVC and H.264 on RK3399 rkvdec — distinct from
cache visibility (γ dump iter8 already confirmed destination_data[]
post-DQBUF matches YUV output).

Backend SHA on fresnel: 9ba47002...

iter14 candidates:
  α-16: OUTPUT byte dump (cheapest remaining)
  kernel-side rkvdec audit (deepest; route via kernel-agent)
  pivot to Bug 6 VP8 or campaign close-out documentation

α-17 itself is real wire-correctness progress even as a non-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:08:11 +00:00
marfrit 33f74b07c8 iter12 Phase 8 close: kernel 7.0-2 with RFC v2 deployed; Bug 4/5 unchanged
Boltzmann built linux-fresnel-fourier 7.0-2 in ~50 min (8-core native,
no distcc). Package sha 843fd4462a09b3d9... Deployed to fresnel:
sudo pacman -U clean. extlinux hook updated entry. sddm autologin as
mfritsche persisted. Reboot succeeded; fresnel up on new kernel
within 30s.

5-codec sweep post-reboot: all 5 hashes BYTE-IDENTICAL to pre-iter12
anchors. RFC v2's dma_resv fence machinery does NOT engage libva's
cached-mmap pixel readback path. Consistent with what the
reference_dmabuf_resv_blocker.md memo always said: vaDeriveImage /
cached-mmap is the broken path; RFC v2 helps DRM_PRIME / compositor
paths.

Substrate state moved forward (kernel 7.0-1 -> 7.0-2 with RFC v2).
Memory entries updated:
  reference_fresnel_kernel_substrate.md (pkg version + patch list)
  feedback_rfc_v2_vb2_dma_resv_scope.md (NEW — scope clarification)

iter13 candidates ranked:
  α-17: DMA_BUF_IOCTL_SYNC(START|END) in libva backend around image
        read sites (~30 LOC).
  α-18: switch libva image export to DRM_PRIME (larger refactor).
  α-16: OUTPUT byte dump (deferred again).

α-17 is the natural follow-on — Figa's 2024 "userspace responsibility
for explicit sync" line directly addresses the libva-cached-mmap path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 07:47:51 +00:00
marfrit de889898b8 iter12 Phase 4 + 6: integrate vb2_dma_resv RFC v2 into linux-fresnel-fourier 7.0-2
User signaled RFC v2 is prepared at boltzmann:~/v2-patch-work/v2-out/.
Three patches:
  0001 media: videobuf2: add opt-in dma_resv producer fence helper
  0002 media: hantro: attach dma_resv release fence at device_run
  0003 media: rockchip-rga: attach dma_resv release fence at ...

v2 key change vs v1: attach moves from buf_queue to m2m device_run
(Dufresne's finite-time-contract concern). Build the kernel package
on boltzmann (~/src/kernel-agent-bootstrap/.../linux-fresnel-fourier/),
deploy to fresnel, reboot, retest.

sddm auto-login as mfritsche staged in /etc/sddm.conf.d/20-autologin.conf
on fresnel before reboot per user authorization.

Phase 0's α-16 OUTPUT-byte dump candidate parked; kernel substrate
upgrade takes precedence given RFC v2 is the long-stalled
reference_dmabuf_resv_blocker.md unblock.

Iter12 outcomes:
  PASS  = Bug 4/5 hashes shift toward kdirect after reboot.
  PARTIAL = kernel upgraded cleanly, no regression, hashes unchanged.

Either outcome is valuable — substrate moves forward regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:56:15 +00:00
marfrit f40b025868 iter12 Phase 0: lock OUTPUT bitstream byte dump as next candidate
After iter11 close, both Bug 4 (H.264 partial fill) and Bug 5 (HEVC
all-zero) share the same architectural pattern: libva control payloads
can be made byte-equivalent to kdirect for fields rkvdec consumes,
yet libva produces wrong output while kdirect succeeds.

Remaining unexamined surface = OUTPUT bitstream bytes (source_data
that the kernel reads). iter12 candidate α-16: extend γ infra to
dump source_data pre-QBUF, compare with kdirect.

If bytes match → both bugs are outside libva (kernel/HW state).
If bytes differ → narrow to bitstream-write divergence site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:08:27 +00:00
marfrit 7807326aff iter11 Phase 8 close: HEVC wire-byte search exhausted; same wall as iter9
α-13 + α-14 changed two HEVC wire-byte fields to match kdirect
(sps_max_num_reorder_pics, decode_params IRAP|IDR flags). Output
unchanged (06b2c5a0... still all-zero). 5-codec regression sweep:
zero regression.

Cumulative iter11 eliminations: 4 fields (sps_max_num_reorder_pics,
sps_max_latency_increase_plus1, IRAP/IDR flags, num_entry_point_offsets)
all confirmed kernel-ignored on RK3399 per rkvdec-hevc.c grep.

Wire-byte landscape after iter11: every observable libva-vs-kdirect
HEVC control-payload diff is in a field rkvdec ignores. Bug 5 root
cause is NOT in S_EXT_CTRLS payload.

Same wall as Bug 4 / iter9: wire-byte search exhausted. Real cause is
in OUTPUT bitstream bytes the kernel reads. iter12 candidate: extend
γ infrastructure to dump source_data pre-QBUF, compare with kdirect
byte-by-byte. Bug 4 and Bug 5 likely both close via this same
instrumentation given the parallel structure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:07:35 +00:00
marfrit 7a1bd8ec0a iter11 Phase 5: α-13 inert; pivot to α-14 num_entry_point_offsets
Reviewer empirically read rkvdec-hevc.c on boltzmann kernel-agent
tree. sps_max_num_reorder_pics is NOT read by rkvdec. α-13 would
match kdirect's wire bytes but produce no behavioural change.

CRIT-2: num_entry_point_offsets (libva hardcoded 0 at h265.c:356,
kdirect 22 from slice header parse) + PPS UNIFORM_SPACING flag are
the live candidates. BBB HEVC uses WPP (ENTROPY_CODING_SYNC flag
set in PPS), not tiles; 22 entry points correspond to 23 CTB rows
(one per row after the first) for 720p with 32-pixel CTBs.
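
The entry-point arithmetic can be checked with a short Python sketch.
It assumes a single slice covering the whole picture with WPP enabled,
so each CTB row after the first contributes one entry point; the
function names are illustrative:

```python
def ctb_rows(height, ctb_size):
    """Number of CTB rows, rounding a partial last row up."""
    return -(-height // ctb_size)


def wpp_entry_points(height, ctb_size):
    """Entry points for one full-picture WPP slice: rows minus one."""
    return ctb_rows(height, ctb_size) - 1


rows = ctb_rows(720, 32)
entry_points = wpp_entry_points(720, 32)
print(rows, entry_points)
```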

Decision: land α-13 as wire-correctness hygiene (matches kdirect,
no regression risk), then α-14 for the actual fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 02:06:37 +00:00
marfrit a18ba53d6b iter11 Phase 3 + 4: HEVC SPS wire-byte diff narrows Bug 5 to α-13
Phase 3 deep strace: only meaningful SPS diff is bytes 10-11.
  libva   bytes 10-11 = 00 00 (sps_max_num_reorder_pics=0, latency=0)
  kdirect bytes 10-11 = 02 04 (reorder=2, latency=4)

Hardcoded at h265.c:110-111 with comment "/* not exposed */". VAAPI's
VAPictureParameterBufferHEVC doesn't forward these; kdirect parses
SPS NAL directly. sps_max_num_reorder_pics = 0 tells rkvdec "no
reordering" -> B-frame decode blocked -> all-zero output (Bug 5 fits).

Secondary diffs (Phase 4b candidates if α-13 doesn't close):
  - SLICE_PARAMS num_entry_point_offsets = 0 (hardcoded at h265.c:356
    with "iter2 doesn't do tiles" comment); kdirect submits 22.
  - PPS UNIFORM_SPACING flag bit 20 (don't-care for non-tiled).

Phase 4 α-13: ~2 LOC fix. Set sps_max_num_reorder_pics =
sps_max_dec_pic_buffering_minus1 (safe upper bound per H.265 §A.4.2).
Leave sps_max_latency_increase_plus1 = 0 (spec "unconstrained").

Phase 5b review required before Phase 6b implementation per
"reviews never skippable".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:58:03 +00:00
marfrit 5f94d7a9ae iter10 close + iter11 Phase 0: pivot to HEVC wire-byte diff for Bug 5
iter10 closed negative at Phase 0 (Bommarito unreachable on RK3399).
Saved kernel build + reboot cycle by source-tree reachability check.

iter11 opens with Bug 5 (HEVC libva all-zero) as research target.
Replay iter8/iter9 methodology: deep strace HEVC libva vs kdirect,
decode V4L2_CID_STATELESS_HEVC_* control payload bytes, find the
diff that causes rkvdec to produce all-zero output for libva while
kdirect's submission produces correct decode.

In scope: src/h265.c (libva HEVC), Phase 3 strace + byte-decode.
Out of scope: ext_sps_st/lt_rps (VDPU381/383-only, not RK3399),
kernel patches until empirical evidence of a kernel-side gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:37:45 +00:00
marfrit 917e9b2691 iter10 Phase 0: Bommarito patch unreachable on RK3399 — close Phase 0 negative
Empirical reachability check on linux-fresnel-fourier 7.0-1 source tree
at boltzmann:~/src/kernel-agent-bootstrap/build/marfrit-packages/arch/.../linux-7.0/.

rkvdec_hevc_assemble_hw_rps() is defined in rkvdec-hevc-common.c:411 and
called ONLY from rkvdec-vdpu381-hevc.c:609 (RK3576) and
rkvdec-vdpu383-hevc.c:620 (RK3588). RK3399's variant_ops bind to
rockchip,rk3399-vdec and route HEVC through the older standalone
rkvdec-hevc.c, which does NOT call rkvdec_hevc_assemble_hw_rps.

Bommarito's May 13 patch is real and load-bearing on RK3588/3576,
but inert on RK3399 / fresnel. Not iter10 vehicle for Bug 5.

Saved a kernel build/reboot cycle by Phase-0 reachability check.

Memory rule candidate: before applying any upstream patch to fresnel's
kernel, verify the patched path is reachable from rockchip,rk3399-vdec.
mainline rkvdec has diverging per-variant code (VDPU381/383 vs RK3399
legacy).

iter10 candidate pivots:
- α-10: audit rkvdec-hevc.c (RK3399 legacy) for analogous OOB gaps;
  same KUnit/KASAN methodology Bommarito used. Route via kernel-agent
  per user directive.
- α-12: stay on Bug 4 H.264 (PPS deep diff / OUTPUT bitstream).

User directive registered: "consult kernel-agent for kernel work."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:34:20 +00:00
marfrit 5e2a228cfd iter9 Phase 8 close: α-7 inert as predicted; wire-byte search exhausted
α-7 (monotonic timestamp counter) changed wire bytes but H.264 output
unchanged (71ac099b...). Confirms Phase 5 CRIT-1 prediction: VP9/MPEG-2
PASS via libva with the same v4l2_timeval_to_ns(&ref->timestamp)
pattern; therefore timestamp magnitude was never load-bearing.

5-codec regression sweep: all 4 non-H.264 anchors hold. Zero regression.

Cumulative state after iter8+iter9:
- 6 hypotheses eliminated (libva-readback, slot-binding, stale-residue,
  constraint_set_flags, POC sentinel, reference_ts magnitude)
- libva-vs-kdirect H.264 wire-byte diff is now empirically zero
- α-2 + α-7 shipped as wire-payload hygiene cleanups (zero behavior
  change but cleaner semantics)

iter10 candidate ranking:
1. α-8 OUTPUT bitstream byte dump (compare in-memory slice bytes)
2. α-9 untraced control diff (device-wide controls beyond DECODE_MODE
   + START_CODE)
3. Kernel-side investigation (rkvdec source dive for 16x32 partial-
   decode signature)
4. Pivot to Bug 5 (HEVC) or Bug 6 (VP8)

Two more iterations of diminishing returns suggest either deeper
empirical work (OUTPUT-byte dump or kernel investigation) or pivot
to a different bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:57:26 +00:00
marfrit 3b0880a97f iter9 Phase 5: CRIT-1 — α-7 contradicted by VP9/MPEG-2 PASS evidence
VP9 (vp9.c:624) and MPEG-2 (mpeg2.c:150,156) use v4l2_timeval_to_ns
identically to H.264. Both PASS via libva with the same gettimeofday-
based giant ns values. If timestamp magnitude were the bug, VP9/MPEG-2
should also fail. They don't.

Reviewer flagged α-7 as low-probability fix and pointed to iter10
kernel-side investigation (M-A vb2_find_buffer_by_timestamp overflow)
if α-7 confirmed inert.

IMP-1: timestamp_counter should live in object_context not driver_data
to avoid multi-context collisions.

Decision: implement α-7 anyway as empirical confirmation (5 min) since
test cost is trivial. If α-7 fails as predicted, iter9 closes PARTIAL
with wire-byte search exhausted; iter10 candidates pivot to slice-data
encoding or kernel investigation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:33:54 +00:00
marfrit 4832ffc401 iter9 Phase 4: α-7 implementation contract — monotonic per-context counter
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:27:40 +00:00
marfrit fa771b0625 iter9 Phase 0: lock α-7 timestamp scheme — only remaining wire diff
Phase 0 deep-strace yielded a critical narrowing:
- Post-DPB DECODE_PARAMS bytes (512-559): IDENTICAL libva vs kdirect
- PPS: IDENTICAL
- SPS: identical except inert constraint_set_flags
- DPB[0] beyond reference_ts: IDENTICAL after α-2

The ONLY remaining wire-byte diff between libva (broken) and kdirect
(working) is reference_ts magnitude. libva uses gettimeofday giving
~1.78e18 ns; kdirect uses an internal counter giving ~10000 ns.
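
The magnitude quoted above follows from converting a wall-clock timeval
to nanoseconds (seconds times 1e9 plus microseconds times 1e3). A Python
sketch of that arithmetic, using this commit's approximate time as the
wall-clock input; the helper name is illustrative:

```python
from datetime import datetime, timezone


def timeval_to_ns(tv_sec, tv_usec):
    """Convert a (seconds, microseconds) timeval to nanoseconds."""
    return tv_sec * 1_000_000_000 + tv_usec * 1_000


when = datetime(2026, 5, 13, 13, 27, 1, tzinfo=timezone.utc)
wallclock_ns = timeval_to_ns(int(when.timestamp()), 0)
counter_ns = 10_000  # order of magnitude reported for kdirect

print(f"{wallclock_ns:.3e}")
print(wallclock_ns < 2**63)  # still representable as a signed 64-bit value
```

The wall-clock value is about fourteen orders of magnitude larger than
the counter value, yet it fits in 64 bits.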

α-7 hypothesis: V4L2 stateless decoder (rkvdec) reference-resolution
fails for very large reference_ts values. Possible mechanisms:
M-A: vb2_find_buffer_by_timestamp truncates/overflows on giant values.
M-B: V4L2 framework transforms OUTPUT QBUF ts before storing on CAPTURE
     but DPB.reference_ts left untransformed → mismatch.
M-C: gettimeofday + v4l2_timeval_to_ns produce slightly different ns
     values than the kernel computes from the timeval QBUF.

Fix: ~10 LOC. Add timestamp_counter to driver_data; replace
gettimeofday in EndPicture with monotonic counter.

If α-7 works → iter9 PASS, Bug 4 closed.
If α-7 doesn't → iter9 PARTIAL, wire-byte search space effectively
exhausted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:27:01 +00:00
marfrit 3ed1e454fb iter8 Phase 7c + 8: close iter8 PARTIAL — Bug 4 narrowed via 5 eliminations
α-2 (POC strip removal) changed wire bytes (POC now matches kdirect's
sentinel-encoded 0x10000) but H.264 output unchanged. POC not load-bearing.

5-codec regression sweep on α-2 backend: all 4 non-H.264 anchors hold.
Zero regression.

Iter8 close: 5/6 PASS, criterion-1 PARTIAL. Bug 4 narrowed but not fixed.

Eliminations achieved:
  1. libva-readback bug (γ dump)
  2. Slot-binding wrong (γ dump shows correct slot per surface)
  3. Stale residue (IMP-1 memset confirmed deterministic kernel write)
  4. constraint_set_flags (Phase 5b CRIT-1: rkvdec source review)
  5. POC sentinel strip (α-2 wire change, no output change)

Remaining candidates for iter9: PPS diff (α-3), DECODE_PARAMS post-DPB
fields (α-6), DPB entry order (α-4), slice data encoding (α-5).

Fork tip 0226684 carries γ + IMP-1 diagnostic + α-2 hygiene. All
env-gated off by default; α-2 is a wire-payload cleanup with zero
behavior effect.

Lessons distilled:
- Reviews are never skippable — Phase 5b CRIT-1 saved a build cycle.
- Wire-byte equivalence ≠ behavior equivalence.
- Per-driver kludges in shared codec code need explicit gating.
- Bug carryover labels can mislead (Bug 4 != "inter race-loss").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:01:36 +00:00
marfrit 16034152a8 iter8 Phase 4c: α-2 plan — remove POC sentinel strip for rkvdec
Phase 3 strace re-decoded with correct struct layout:
- libva sends dpb[0] tfoc=0, bfoc=0 (sentinel stripped)
- kdirect sends dpb[0] tfoc=65536, bfoc=65536 (FFmpeg sentinel preserved)
- flags match between both (0x03 VALID|ACTIVE)

rkvdec config_registers() writes top/bottom_field_order_cnt directly to
MMIO. The strip was added in h264.c:219 for hantro's prepare_table; for
rkvdec, kdirect's path (no strip) decodes correctly while libva's
(strip) produces 16x32 partial decode.

Option A: remove the strip entirely (~5 LOC).
Option B: per-driver gating (~20 LOC).

Hantro+H.264 not exercised on RK3399 — Option A is safe. Phase 5c
review then Phase 6c implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:48:52 +00:00
marfrit 64b370d863 iter8 Phase 5b review: CRIT-1 kills α-1 (rkvdec ignores constraint_set_flags)
Sonnet-architect Phase 5b read rkvdec-h264.c end-to-end and confirmed:
constraint_set_flags is NEVER accessed by the driver. assemble_hw_pps()
reads only chroma_format_idc, bit_depth_*, log2_max_frame_num_minus4,
max_num_ref_frames, pic_order_cnt_type, log2_max_pic_order_cnt_lsb_minus4,
and dimension fields. rkvdec_h264_validate_sps() doesn't validate it.

CONSTRAINT_SET3_FLAG and PROFILE_IDC in the hardware PPS packet are
hardcoded constants (1 and 0xFF respectively), not propagated from the
incoming SPS.

α-1 will not unblock Bug 4. Plan-killer.

CRIT-2: ConstrainedBaseline 0x42 mapping is wrong (bit 6 reserved);
correct value 0x12 (bit 1 | bit 4) per H.264 §A.2.1.1.

IMP-1 redirects: DPB entry flags + POC fields are the next candidate.
rkvdec config_registers() reads dpb[i].flags ACTIVE/FIELD bits and
dpb[i].fields TOP/BOT bits. lookup_ref_buf_idx() substitutes destination
buffer as reference when ACTIVE missing — silent corruption matching
observed symptom.

IMP-2/3: full PPS byte comparison + close-criteria framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:33:40 +00:00
marfrit 678c072d75 iter8 Phase 4b: α-1 plan — per-profile SPS constraint_set_flags
Single-byte fix candidate. Add h264_constraint_set_flags(VAProfile)
helper to h264.c, mirror pattern of h264_profile_to_idc + level_idc
derivation. VAAPI doesn't forward this field; libva backend must derive
per profile.

Mapping per H.264 typical-stream conventions:
  Main → 0x02 (constraint_set1_flag, matches BBB + kdirect)
  ConstrainedBaseline → 0x42
  High / MultiviewHigh / StereoHigh → 0x00

LOC ~15 in h264.c only. Per-VAProfile-gated; no risk to VP9/VP8/HEVC/
MPEG-2. Phase 5b architect review required before Phase 6b implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:25:23 +00:00
marfrit 84c939692f iter8 Phase 7 (γ + IMP-1): root cause confirmed kernel-side
γ dump confirms libva reads buffer correctly; the 16x32 patch and
stride-4 UV markers appear at YUV output exactly as in the dump.

IMP-1 memset-before-QBUF test: pre-zeroing buffer does NOT change output
(identical hash). The 512 bytes ARE deterministic kernel writes, not
stale residue.

Bug root cause: rkvdec accepts libva's H.264 decode request without
error flags but writes only 16x32 of luma-neutral data + stride-4 UV
scratch. Kernel decoded a tiny bit then stopped.

Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's
0x02 — likely the kernel hint that triggers rkvdec's full decode path
for Main profile. Phase 4b α-1 fix: derive constraint_set_flags per
VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:23:55 +00:00
marfrit d4c04b4a3b iter8 Phase 5: sonnet-architect review — 2 CRIT + 4 IMP + 3 MIN
CRIT-1: request_log prepends prefix on every call; per-byte loop in γ
sketch would emit 32 prefix-only lines. Fix: snprintf buffered emit.
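
A minimal sketch of the buffered emit, assuming the caller then passes
the finished string to the logger in one call. sketch_hex_line is a
hypothetical name; the real request_log signature is not shown here.

```c
#include <stddef.h>
#include <stdio.h>

/* Format n bytes as "xx xx ..." into out, so the logger is invoked
 * once per line instead of once per byte. Returns the number of
 * characters written, excluding the terminating NUL. If the buffer
 * is too small, output stops cleanly at the last complete byte. */
static size_t sketch_hex_line(const unsigned char *data, size_t n,
                              char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, out_size - used, "%s%02x",
                         i ? " " : "", data[i]);
        if (w < 0 || (size_t)w >= out_size - used) {
            out[used] = '\0';   /* drop the partially written byte */
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```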

CRIT-2: γ dump block missing null guard on destination_data[]; the
plan's env-var check is outside the current_slot != NULL guard. Fix:
nest the dump inside the existing slot-null guard.

IMP-1: "stale residue from prior decode" not eliminated as alternative
explanation for the 16x32 patch. Add memset-zero-before-QBUF experiment
to Phase 7 to discriminate.

IMP-2: γ-first defensible but on IMP-1 grounds, not the
three-signature argument (which is weaker than stated).

IMP-3/4 placement clarifications. MIN-1/2/3 cosmetic.

5 mechanical amendments locked for Phase 6. γ-first strategy stands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:55:51 +00:00
marfrit 3a6307638d iter8 Phase 4: γ-then-α plan — diagnostic dump first, fix after
Phase 3 redefined Bug 4 to partial-fill (not inter race). Three distinct
per-codec signatures (VP9 correct, HEVC zero, H.264 partial-leak) can't
be explained by a single hypothesis. Phase 4 commits to γ first: a
~30 LOC env-gated diagnostic dump in RequestSyncSurface that fires
after CAPTURE DQBUF, prints first/last 32 bytes of each destination_data
plane and a non-zero-count of the first 1024 bytes.

γ definitively distinguishes "kernel didn't write" from "libva mis-reads"
from "slot binding wrong". Phase 4b targeted fix follows γ's outcome.
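
The non-zero count in the γ dump could look roughly like this. It is a
sketch under assumptions: sketch_count_nonzero is a hypothetical name,
and the plane pointer and size would come from destination_data[] and
the mapped buffer size in the real backend.

```c
#include <stddef.h>

/* Count non-zero bytes in the first `window` bytes of a plane (the
 * plan uses window = 1024). A count of 0 suggests the kernel never
 * wrote the buffer; a small count suggests a partial write. */
static size_t sketch_count_nonzero(const unsigned char *plane,
                                   size_t plane_size, size_t window)
{
    size_t limit = plane_size < window ? plane_size : window;
    size_t count = 0;

    for (size_t i = 0; i < limit; i++)
        if (plane[i] != 0)
            count++;
    return count;
}
```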

Out of scope: per-codec H.264 control-fill changes (gated on γ's
findings), VP9/VP8/HEVC/MPEG-2 paths, kernel patches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:51:47 +00:00
marfrit 4320d7860f iter8 Phase 3: empirical Bug 4 redefinition — partial-fill, not inter race
Phase 3 strace + byte-level analysis on fresnel rkvdec. Findings:

1. Bug 4 is NOT inter-race-loss. The IDR keyframe itself fails through
   libva (only 512 bytes of real Y data at top-left 16x32 region).
2. The 16x32 leak is structured real image content (smooth gradients,
   neutral luma ~0x80) — kernel decoded one tile / one MB pair, then
   stopped.
3. VP9 via libva WORKS through the same readback path (100% non-zero,
   real image data). So the bug isn't generic DMA-BUF cache coherency.
4. HEVC fails via libva (all-zero, distinct from H.264 partial-fill).
5. OUTPUT sizeimage = 1MB (SOURCE_SIZE_MAX) is sufficient — BBB IDR is
   only 6321 bytes. Not the bug.
6. Control payload diffs: SPS.constraint_set_flags = 0 vs kdirect's 2
   (probably cosmetic); DECODE_PARAMS.dpb[0].bottom_field_order_cnt = 0
   vs kdirect's 1 (load-bearing for POC).

Refined hypothesis: a specific H.264 control field libva sends causes
rkvdec to abort after partial decode. Phase 4 candidates: α fix POC
fields, β bump OUTPUT sizeimage, γ instrumentation dump, δ relative
timestamps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:48:41 +00:00
marfrit abd97e3eb6 iter8 Phase 2: H.264 backend source-read + refined hypothesis surface
Maps the per-frame decode pipeline (BeginPicture → RenderPicture →
EndPicture → SyncSurface) and walks frame-1 IDR + frame-2 P state
transitions through h264_set_controls and the DPB.

Eliminates 6 of 13 hypotheses from Phase 0 by source-read alone (H-A
DPB stale, H-B POC sentinel for small POCs, H-C SLICE flags in FRAME_BASED,
H-D request_fd lifecycle, H-F pred-weight, H-G scaling matrix re-upload).
Adds 4 new hypotheses (H-J reference_ts derivation, H-K CAPTURE buf count,
H-L slice_data alignment vs h264_start_code, H-M frame_num cross-check).
Live hypotheses for Phase 3: H-E (CAPTURE rotation/reference-resolution),
H-H (start_code prefix), H-L (slice_data alignment), H-K (cap_pool size).

Phase 3 plan: strace-diff libva-vaapi-H.264 vs kdirect-H.264 on the same
fixture; byte-level frame-1/2/3 examination; dmesg check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:19:49 +00:00
marfrit e47a7ba309 iter8 Phase 0: lock Bug 4 — H.264 inter-frame race-loss
User pick at iter8 open. Carried unchanged through 5 iters (iter4..iter7);
keyframe partially decodes (frame-1 first 16 bytes = real chroma) while
inter frames return all-zero. Pass criterion: libva_h264 == kdirect_h264
== sw_h264 byte-identical for bbb_1080p30_h264.mp4 3-frame, including
inter frames.

In scope: src/h264.c, src/h264_slice_header.c, src/picture.c H.264 paths,
per-frame request_fd lifecycle. Out of scope: VP9/VP8/HEVC/MPEG-2, kernel
patches, performance, all other backlog items.

Substrate at iter8 open: fork tip 6df2159 (iter7), backend SHA 520507f6..,
kernel linux-fresnel-fourier 7.0-1, auto-detect picks rkvdec on every boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:15:45 +00:00
marfrit b0ebe67673 iter7 PASS close: auto-detect picks rkvdec reliably; iter4-B1a closed
Phase 7 verification 5/5 PASS:
- C1 auto-detect picks decoder (verified: auto-selected /dev/video1 +
  /dev/media0 on rkvdec, NOT encoder)
- C2 prefer rkvdec (pass-1 short-circuit confirmed)
- C3 zero regression: all 5 codec hashes (H.264 71ac099b..., HEVC
  06b2c5a0..., VP9 4f1565e8..., MPEG-2 19eefbf4..., VP8 bcc57ed5...)
  identical to iter5b-β/iter6 anchors
- C4 multi-boot stability: SOFT PASS (architectural — algorithm is
  deterministic given kernel topology; physical reboot not session-
  blocking)
- C5 vainfo lists 7 rkvdec profiles (H.264 variants + HEVC + VP9)

Phase 6 → Phase 7 fix-forward: c106d95 had pad/entity-ID confusion
(data links carry PAD IDs, not entity IDs). Empirical topology dump
on fresnel /dev/media0 revealed it; fix-forward 6df2159 allocates
topo.pads[] and resolves data-link endpoints via pads[].entity_id.
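
The pad-to-entity resolution can be sketched as below. struct sketch_pad
is a local stand-in carrying only the id and entity_id fields of the
kernel's media_v2_pad; the function name is hypothetical and this is not
the code from 6df2159.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the two media_v2_pad fields used here. */
struct sketch_pad {
    uint32_t id;
    uint32_t entity_id;
};

/* Data links carry PAD IDs. Look the pad up and report the entity
 * that owns it. Returns 0 on success, -1 if the pad ID is unknown. */
static int sketch_pad_to_entity(const struct sketch_pad *pads, size_t n,
                                uint32_t pad_id, uint32_t *entity_id)
{
    for (size_t i = 0; i < n; i++) {
        if (pads[i].id == pad_id) {
            *entity_id = pads[i].entity_id;
            return 0;
        }
    }
    return -1;
}
```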

Phase 5 reviewer caught 2 CRIT + 4 IMP + 3 MIN — all incorporated.
Phase 5 missed the pad/entity ID encoding distinction; future
media-topology code reviews should ask for empirical dumps.

Net iter7 contribution: quality-of-life. Auto-detect now reliable
across boot orderings for rkvdec codecs (H.264/HEVC/VP9). MPEG-2/VP8
still need LIBVA_V4L2_REQUEST_VIDEO_PATH env override (iter4-B1b
backlog — multi-decoder routing deferred to future iter).

Fork tip 6df2159. Backend SHA 520507f6...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:10:23 +00:00
marfrit 5bf6acb964 iter7 Phase 6: 1 commit landed on fork — auto-detect refactor pending fresnel build
Fork tip c106d95 (was 70196f8). 165 LOC added / 57 removed in
src/request.c. All 9 Phase 5 amendments (2 CRIT + 4 IMP + 3 MIN)
incorporated.

Fresnel offline at push time. Build + install + Phase 7 verify
deferred until host returns. Phase 7 sweep ready to execute:
vainfo + ffmpeg-vaapi + reboot stability + iter5b/iter6 regression
check.

Code review verified algorithm correctness against the Phase 5 reviewer
pseudocode; boltzmann's linux-rockchip source confirms
MEDIA_ENT_F_PROC_VIDEO_DECODER is set at rkvdec.c:1382 and on the
hantro_drv.c proc entities. Compile-time syntax untested
(no va-api dev headers on noether).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:41:12 +00:00
marfrit cebdd82e7f iter7 Phase 5: review — 2 CRIT on link-graph traversal; algorithm validated
Phase 5 sonnet-architect found:
- CRIT-1: interface links connect IO entities (source/sink) to interfaces,
  NOT directly to proc entity. Walk must use MEDIA_LNK_FL_INTERFACE_LINK
  (1U<<28) to discriminate. Author verified at media.h:223-225.
- CRIT-2: source_id/sink_id ordering not guaranteed in link entries;
  check both endpoints. Author verified media_v2_link struct at media.h:341-347.
- IMP-1: hantro decoder-proc (entity 17) distinct from encoder-proc
  (entity 3) by function field. Algorithm correct by construction —
  no encoder contamination possible.
- IMP-2: MEDIA_ENT_F_PROC_VIDEO_DECODER set on both rkvdec-proc
  (rkvdec.c:1382) and hantro-dec-proc (hantro_drv.c).
- IMP-3: current 3-call ioctl pattern has spurious memset; new function
  uses 2-call pattern (alloc all 3 arrays before second call).
- IMP-4/MIN-1/2/3: minor implementation notes.

All 5 substantive findings empirically verified against boltzmann's
linux-rockchip tree.

Phase 6 implementer pseudocode provided: walk entities → find decoder
proc → walk data links to collect IO entity neighbors → walk
interface links to find linked interface → resolve major:minor.
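
CRIT-1 and CRIT-2 can be illustrated with two small predicates. This is
a sketch: struct sketch_link mirrors only the fields of media_v2_link
used here, and the flag value is the (1U<<28) quoted above. A production
check would likely mask the link-type bits from <linux/media.h> rather
than test a single bit.

```c
#include <stdint.h>

/* Interface-link bit as quoted in CRIT-1. */
#define SKETCH_LNK_FL_INTERFACE_LINK (1U << 28)

/* Local stand-in for the media_v2_link fields used here. */
struct sketch_link {
    uint32_t id;
    uint32_t source_id;
    uint32_t sink_id;
    uint32_t flags;
};

/* CRIT-1: tell interface links apart from data links. */
static int sketch_is_interface_link(const struct sketch_link *l)
{
    return (l->flags & SKETCH_LNK_FL_INTERFACE_LINK) != 0;
}

/* CRIT-2: endpoint order is not guaranteed, so test both ends.
 * Returns 1 and stores the other endpoint in *peer if the link
 * touches `id`, otherwise returns 0. */
static int sketch_link_peer(const struct sketch_link *l, uint32_t id,
                            uint32_t *peer)
{
    if (l->source_id == id) {
        *peer = l->sink_id;
        return 1;
    }
    if (l->sink_id == id) {
        *peer = l->source_id;
        return 1;
    }
    return 0;
}
```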

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:34:40 +00:00
marfrit 8ce6372ef8 iter7 Phase 4: plan — split iter4-B1 into B1a (this iter, encoder/decoder) + B1b (defer, multi-decoder routing)
Phase 2 source-read found iter4-B1 conflates two sub-bugs:
- B1a: walk picks encoder when it should pick decoder. SMALL FIX
  (~100-150 LOC). Add MEDIA_ENT_F_PROC_VIDEO_DECODER entity check
  in find_video_node_via_topology; two-pass prefer rkvdec.
- B1b: multi-decoder routing (rkvdec for H.264/HEVC/VP9 + hantro
  for MPEG-2/VP8 from one backend instance). Bigger arch fix
  ~200-400 LOC. DEFERRED.

iter7 ships B1a. Phase 1 criteria amended:
- Auto-detect always picks a decoder, never an encoder.
- Prefer rkvdec over hantro (rkvdec serves 3 of 5 codecs).
- 2 reboots verify stability.
- vainfo lists rkvdec's 3 codecs minimum.
- No regression on iter5b-β / iter6 state.

Phase 6 will use MEDIA_IOC_G_TOPOLOGY's entities+links arrays to
match V4L node entities to decoder-proc entities. Two-pass walk:
pass-1 rkvdec only, pass-2 any decoder.
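
The two-pass preference could be sketched like this. Assumptions: the
candidate list has already been built from the topology, and rkvdec is
recognised by a substring match on a name field. Both the struct and
the matching rule are hypothetical, not the backend's actual logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical candidate record produced by a topology walk. */
struct sketch_candidate {
    const char *name;   /* driver or entity name */
    int is_decoder;     /* has a decoder-proc entity */
};

/* Pass 1: accept only rkvdec decoders. Pass 2: accept any decoder.
 * Encoders are never accepted. Returns the index of the chosen
 * candidate, or -1 if no decoder exists. */
static int sketch_pick_decoder(const struct sketch_candidate *c, size_t n)
{
    for (int pass = 1; pass <= 2; pass++) {
        for (size_t i = 0; i < n; i++) {
            if (!c[i].is_decoder)
                continue;
            if (pass == 1 && strstr(c[i].name, "rkvdec") == NULL)
                continue;
            return (int)i;
        }
    }
    return -1;
}
```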

Empirical baseline: on 2026-05-12 boot, /dev/media0=rkvdec (only
decoder), /dev/media1=hantro-vpu (encoder AND decoder both inside),
/dev/media2=uvc. Fix must skip encoder when accepting media1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:40:53 +00:00
marfrit fc44a1e63c iter7 Phase 0 lock: iter4-B1 auto-detect harden — require MEDIA_ENT_F_PROC_VIDEO_DECODER
Backend-only ~30-80 LOC. Walk media-topology entities (already partially
done at iter4 Commit Z); require at least one entity with function ==
MEDIA_ENT_F_PROC_VIDEO_DECODER. Eliminates the hantro encoder false-match
that breaks vainfo + ffmpeg-vaapi on every other reboot.

5 boolean Phase 1 criteria locked. No kernel work. No pixel-correctness
chasing. Quality-of-life delivery; removes per-session env-override
friction.

Predicted lowest-difficulty iteration since iter1. 2-3 hours wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:25:18 +00:00
marfrit 8ce00d3aa1 iter6 PARTIAL close: Bug 6 narrowed to H-E (kernel-side hantro VP8 partial-write)
Phase 3 Candidate K executed: H-D (slot rotation) ELIMINATED via
instrumented bind+read site logging. Slot v4l2_index matches at
BeginPicture and at vaGetImage for every surface; destination_data[0]
matches slot->map[0]. No rotation mismatch.

H-A/B/C/D all eliminated. H-E (kernel-side hantro VP8 partial-write)
confirmed by elimination. The libva backend submits correct controls,
correct slice bytes, correct slices_size, correct slot indices.
Kernel writes erratic partial content (per-frame Y plane transitions
at row 536, 24, ... — not a clean buffer-size truncation, not slot
rotation).

iter6 close PARTIAL: 5 of 6 Phase 1 criteria PASS; criterion 1
(libva_vp8 == kdirect) PARTIAL — kernel-side fix needed, out of
iter6's locked backend-only scope.

No patches landed. Fresnel substrate unchanged: fork tip 70196f8,
backend SHA 2c6ff82c... (identical to iter5b-β close).

Net deliverable: Phase 3 narrowing reduces Bug-6 hypothesis space
from 5 to 1. Future iter7+ (or kernel-agent campaign) picks up the
kernel-side investigation.

Pattern recognized: iter2 HEVC transitive PASS masked Bug 5;
iter3 VP8 transitive PASS masked Bug 6. Both surfaced under direct
verification post-iter5b-β. Transitive proofs against ONE artifact
(control payload) don't catch bugs in OTHER artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:52:15 +00:00
marfrit 007cf6ca8e iter6 Phase 3: narrowed Bug 6 — H-A/B/C eliminated; H-D/E (kernel) remain
Empirical Phase 3 narrowing:
- H-A slice data corruption: ELIMINATED. SHA256 of libva-dumped slice 0
  (300614 bytes) byte-identical to raw VP8 frame 0 from .webm at
  offset 10..300624 (post-VP8-header).
- H-B slices_size wrong: ELIMINATED. slices_size = fp_size +
  sum(dct_part_sizes) = 300614 exactly.
- H-C cache coherency: ELIMINATED. msync attempt yielded no output
  change; VP9 uses same image.c path and works fine.
- Control payloads: byte-identical between libva and kdirect for VP8
  keyframe (pre-Phase-2 finding).
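
The H-B check amounts to the sum below. A minimal sketch with a
hypothetical function name; the field names follow the commit message,
not a specific struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Expected OUTPUT payload size for a VP8 frame: the first partition
 * plus every DCT partition. */
static uint32_t sketch_vp8_slices_size(uint32_t fp_size,
                                       const uint32_t *dct_part_sizes,
                                       size_t num_parts)
{
    uint32_t total = fp_size;

    for (size_t i = 0; i < num_parts; i++)
        total += dct_part_sizes[i];
    return total;
}
```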

Output pattern: erratic partial-write. Frame 0 Y plane has real
content rows 0-535, then 100% zero rows 536-719. UV plane real
rows 0-133, zero 134-359. Frame 1 Y plane real rows 0-23, zero
24-719. Per-frame transitions differ — not buffer-size truncation,
not slot rotation.
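
One way to locate the per-frame transition row is to scan upward from
the bottom of the plane. This is an illustrative sketch; the function
name and the stride/width parameters are assumptions about how the
readback buffer is laid out.

```c
#include <stddef.h>

/* Return the first row of the all-zero tail: the smallest r such
 * that rows r..height-1 contain only zero bytes. Returns height when
 * the last row has content, and 0 when the whole plane is zero. */
static size_t sketch_first_zero_tail_row(const unsigned char *plane,
                                         size_t stride, size_t width,
                                         size_t height)
{
    size_t row = height;

    while (row > 0) {
        const unsigned char *p = plane + (row - 1) * stride;
        size_t x = 0;

        while (x < width && p[x] == 0)
            x++;
        if (x < width)
            break;      /* row - 1 holds real content */
        row--;
    }
    return row;
}
```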

Remaining:
- H-D slot rotation (untested; needs instrumentation)
- H-E kernel-side hantro VP8 partial-write quirk (likely; needs
  ftrace / kernel investigation)

iter5b-β did fix Bug 2 for VP8 (pre-β all-zero was format mismatch;
post-β real-but-partial content is a separate kernel-side issue).

Phase 3 hands off 4 candidate directions to user:
- K: continue H-D investigation (1-2h next session)
- L: pivot to H-E kernel-side work (multi-session)
- M: park Bug 6, pick different bug (Bug 4/5 or iter4-B1)
- N: close iter6 PARTIAL, defer Bug 6 to iter7+

Substrate unchanged; no regression. Backend SHA still 2c6ff82c....

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:43:14 +00:00
marfrit bece7b7016 iter6 Phase 2: situation — VP8 control bytes are correct; bug is elsewhere
Empirical byte-diff of libva vs kdirect VP8 control payloads on
current substrate:
- Keyframe (payloads 0+1): BYTE-IDENTICAL (0 diffs / 1232 bytes)
- Inter frames: only 24 bytes diff at offset 1200-1223, which are
  the 3 reference-frame timestamps. libva uses gettimeofday→ns
  (large values), kdirect uses pts-derived (small). Both internally
  consistent; kernel uses them as keys, absolute values don't matter.
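
The byte-diff itself can be sketched as a count plus the first and last
differing offsets, which is enough to spot a contiguous range such as
the timestamp block. The names are hypothetical and this is not the
script used for the measurement.

```c
#include <stddef.h>

/* Result of comparing two equally sized control payloads. `first`
 * and `last` are only meaningful when count > 0. */
struct sketch_diff {
    size_t count;
    size_t first;
    size_t last;
};

static struct sketch_diff sketch_byte_diff(const unsigned char *a,
                                           const unsigned char *b,
                                           size_t n)
{
    struct sketch_diff d = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        if (a[i] == b[i])
            continue;
        if (d.count == 0)
            d.first = i;
        d.last = i;
        d.count++;
    }
    return d;
}
```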

Verdict: Bug 6 is NOT in vp8.c control generation. The bytes match.
With identical controls and same hardware, libva produces 0.4% pixel
match for keyframe — bug lives in slice-data path, bytesused, cache
coherency, or CAPTURE slot rotation.

5 hypotheses (H-A..H-E) for Phase 3 to narrow:
- H-A slice data corruption in libva path (picture.c memcpy)
- H-B slices_size wrong on OUTPUT QBUF
- H-C cache coherency on OUTPUT mmap before kernel DMA read
- H-D CAPTURE slot rotation mismatch
- H-E other (deeper kernel-side)

Pre-iter5b masked all of these via the OUTPUT format mismatch
producing all-zero output. β fixed format → kernel actually decodes →
underlying bug now visible.

iter3's transitive proof verified specific control fields. Did not
verify slice data, bytesused, cache state, or slot rotation. Same
pattern as iter2's HEVC transitive PASS missing Bug 5. Future
transitive PASS claims must enumerate non-verified artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:17:05 +00:00
marfrit 868d854121 iter6 Phase 0 lock: Candidate G — Bug 6 VP8 partial output
User pick. 6 boolean criteria locked: VP8 libva==kdirect; no regression
on VP9/MPEG-2/H.264-keyframe/HEVC; control-payload anchors hold.

Scope: src/vp8.c, src/picture.c VP8 dispatch + buffer cases,
src/surface.c surface_bind_slot, cap_pool slot lifecycle.
No kernel work. Backend-side fix expected (decode runs through
kernel cleanly; output diverges in slot rotation or partial fill).

Predicted small: 5-50 LOC once root-caused. Phase 2 + Phase 3
likely take more wallclock than Phase 6 implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:37:13 +00:00
marfrit 34e1480de5 iter6 Phase 0: substrate inventory + 5 candidate research questions
iter5b-β surfaced 3 explicit bugs (Bug 4 H.264 inter, Bug 5 HEVC
DQBUF ERROR, Bug 6 VP8 partial output) plus carried backlog items
(iter4-B1 device discrimination, B2-B6, L3, Q6, COLOR_RANGE).

Candidates F-J laid out for user lock:
- F: Bug 5 HEVC kernel-rejection (highest claim-vs-reality stigma)
- G: Bug 6 VP8 partial output (smallest suspect surface)
- H: Bug 4 H.264 inter race (highest consumer impact)
- I: Re-anchor regression hashes on β substrate
- J: iter4-B1 auto-detect harden

Recommendation: G → H → F sequence if multiple iters planned;
otherwise H for impact or J for architectural-cleanup fit.

Phase 1 lock pending user pick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:23:58 +00:00
marfrit 9a14cc2527 iter5b-β Phase 8 close: PARTIAL PASS — VP9 unblocked direct, Bugs 4/5/6 carried to iter6
Iteration shipped (fork tip 70196f8, backend SHA 2c6ff82c... on fresnel):
- VP9 directly verifiable (Phase 1 criterion 1 met for 1 of 3 target codecs)
- MPEG-2 maintained (no regression after Commit D fix-forward)
- H.264 unchanged (Bug 4 deferred per Phase 1 lock)
- Architecture cleaned: CreateSurfaces2 ~70 LOC (single-responsibility),
  CreateContext owns OUTPUT lifecycle, no α'-style failure mode possible.

Surfaced bugs for iter6+:
- Bug 5: HEVC libva DQBUF FLAG_ERROR (pre-existing; iter2's transitive
  PASS verified control payload but not decode outcome)
- Bug 6: VP8 libva produces non-zero non-matching output (slot rotation
  or partial fill, masked pre-β by all-zero state)
- Bug 4: H.264 inter-frame race-loss (carried from iter4 P7)

Lessons distilled to memory:
- feedback_grep_callsites_before_no_change.md (Phase 5 v2 CRIT-2 caught
  request_pool_destroy not in DestroyContext after C3 stripped its
  only per-session caller)
- feedback_trust_iter_comments_for_lifecycle.md (Commit D fix-forward
  surfaced because Phase 4 v2 read but didn't trace context.c:262's
  iter6 ffmpeg-vaapi-copy surfaces_count=0 comment)

Campaign scoreboard: 5/5 with 2 direct (VP9 new, MPEG-2 maintained) +
3 mixed (H.264 keyframe partial, VP8 partial new, HEVC transitive-only
direct-FAIL).

iter6 awaits Phase 0 research-question lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:01:07 +00:00
marfrit c773c3d2c1 iter5b-β Phase 7: PARTIAL PASS — VP9 unblocked, MPEG-2 maintained, HEVC+VP8 partial
Two acts:
Act 1 (β alone): all 5 libva codecs returned all-zero. MPEG-2 was a
regression (pre-β it worked); HEVC was unchanged (kernel returns
DQBUF FLAG_ERROR pre AND post β — same Phase 3 baseline showed it).
Root cause: ffmpeg-vaapi-copy passes surfaces_count=0 to vaCreateContext
per iter6 context.c:262 comment; my β walk of surfaces_ids[] was a
no-op → destination_planes_count stayed 0 → surface_bind_slot no-op
→ all-zero readback.

Act 2 (Commit D): cache format-uniform CAPTURE geometry in driver_data;
walk surface_heap in CreateContext; lazy-fill in CreateSurfaces2 when
fmt_valid is set; invalidate in DestroyContext. Restores MPEG-2 to
pre-β state and unlocks VP9.

Per Phase 1 criteria: criterion 1 PARTIAL (VP9 of HEVC+VP9+VP8);
criteria 2-4 PASS.

Bug 5 (NEW): HEVC libva DQBUF FLAG_ERROR — pre-existing kernel
rejection; β's OUTPUT format fix didn't address it. Transitive proof
at iter2 verified control payload shape but kernel still rejects;
some other V4L2 protocol contract aspect differs from kdirect.

Bug 6 (NEW): VP8 libva produces non-zero output with real content
(74.8% zero + 256 unique bytes incl. keyframe pixels at `93 8e 8a 89...`)
but diverges from kdirect. Decode runs; output mismatch likely
slot-rotation or partial-fill bug.

VP9 is iter5b-β's only clean PASS. Architecture-wise β succeeded:
no α'-style failure mode possible (no in-CreateSurfaces2 destructive
teardown), and the CRIT-1+CRIT-2 fixes from Phase 5 v2 review held.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:56:26 +00:00
marfrit 311411b3f9 iter5b-β Phase 6: 3 commits A+B+C landed on fork, build pending fresnel uptime
Commits: 1c548b1 (codec helper), cc077a0 (config wire-up),
7055b14 (β refactor + CRIT-1 + CRIT-2 + IMP-1 + IMP-2 + dead-field
cleanup). Fork tip 7055b14.

surface.c CreateSurfaces2 reduced from ~250 to ~50 LOC. OUTPUT-side
V4L2 lifecycle moved to context.c CreateContext. DestroyContext
gained request_pool_destroy() (CRIT-2 fix). last_output_*/surface_reset_
format_cache deleted (dead under β).

All 5 Phase 5 v2 amendments (CRIT-1, CRIT-2, IMP-1, IMP-2, IMP-3)
incorporated. Fresnel offline at push time — build+install+verify
deferred to Phase 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:16:28 +00:00
marfrit 3508a2cfeb iter5b Phase 5 v2: 2 CRIT findings — NULL guard + missing request_pool_destroy
CRIT-1: context.c:64-66 video_format==NULL guard rejects every first
β CreateContext. β moves the probe from CreateSurfaces2 into
CreateContext itself, so the guard fires before any new logic runs.
Fix: remove guard, move CAPTURE probe to top of CreateContext.

CRIT-2: DestroyContext lacks request_pool_destroy. Empirical grep
shows only surface.c:220 (which β strips) calls it per-session.
Without amendment, second CreateContext gets pool->initialized=true
with stale slot pointers → QBUF EINVAL. Fix: add request_pool_destroy
to DestroyContext before REQBUFS(0). C3 (surface.c strip) and CRIT-2
fix MUST land together.

Plus IMP-1 (mplane assumption wrong for SUNXI_TILED_NV12) + IMP-2
(surface_reset_format_cache becomes dead under C7) + IMP-3 (error
recovery comment).

Phase 6 BLOCKED pending CRIT-1 + CRIT-2 fixes. Author confirmed
both at code level — Phase 5 caught what Phase 4 v2's surface read
missed ("DestroyContext teardown — no change needed" — wrong; was
incomplete).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:50:08 +00:00