Commit Graph

103 Commits

Author SHA1 Message Date
marfrit 34e1480de5 iter6 Phase 0: substrate inventory + 5 candidate research questions
iter5b-β surfaced 3 explicit bugs (Bug 4 H.264 inter, Bug 5 HEVC
DQBUF ERROR, Bug 6 VP8 partial output) plus carried backlog items
(iter4-B1 device discrimination, B2-B6, L3, Q6, COLOR_RANGE).

Candidates F-J laid out for user lock:
- F: Bug 5 HEVC kernel-rejection (highest claim-vs-reality stigma)
- G: Bug 6 VP8 partial output (smallest suspect surface)
- H: Bug 4 H.264 inter race (highest consumer impact)
- I: Re-anchor regression hashes on β substrate
- J: iter4-B1 auto-detect harden

Recommendation: G → H → F sequence if multiple iters planned;
otherwise H for impact or J for architectural-cleanup fit.

Phase 1 lock pending user pick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:23:58 +00:00
marfrit 9a14cc2527 iter5b-β Phase 8 close: PARTIAL PASS — VP9 unblocked direct, Bugs 4/5/6 carried to iter6
Iteration shipped (fork tip 70196f8, backend SHA 2c6ff82c... on fresnel):
- VP9 directly verifiable (Phase 1 criterion 1 met for 1 of 3 target codecs)
- MPEG-2 maintained (no regression after Commit D fix-forward)
- H.264 unchanged (Bug 4 deferred per Phase 1 lock)
- Architecture cleaned: CreateSurfaces2 ~70 LOC (single-responsibility),
  CreateContext owns OUTPUT lifecycle, no α'-style failure mode possible.

Surfaced bugs for iter6+:
- Bug 5: HEVC libva DQBUF FLAG_ERROR (pre-existing; iter2's transitive
  PASS verified control payload but not decode outcome)
- Bug 6: VP8 libva produces non-zero non-matching output (slot rotation
  or partial fill, masked pre-β by all-zero state)
- Bug 4: H.264 inter-frame race-loss (carried from iter4 P7)

Lessons distilled to memory:
- feedback_grep_callsites_before_no_change.md (Phase 5 v2 CRIT-2 caught
  request_pool_destroy not in DestroyContext after C3 stripped its
  only per-session caller)
- feedback_trust_iter_comments_for_lifecycle.md (Commit D fix-forward
  surfaced because Phase 4 v2 read but didn't trace context.c:262's
  iter6 ffmpeg-vaapi-copy surfaces_count=0 comment)

Campaign scoreboard: 5/5 with 2 direct (VP9 new, MPEG-2 maintained) +
3 mixed (H.264 keyframe partial, VP8 partial new, HEVC transitive-only
direct-FAIL).

iter6 awaits Phase 0 research-question lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:01:07 +00:00
marfrit c773c3d2c1 iter5b-β Phase 7: PARTIAL PASS — VP9 unblocked, MPEG-2 maintained, HEVC+VP8 partial
Two acts:
Act 1 (β alone): all 5 libva codecs returned all-zero. MPEG-2 was a
regression (pre-β it worked); HEVC was unchanged (kernel returns
DQBUF FLAG_ERROR pre AND post β — same Phase 3 baseline showed it).
Root cause: ffmpeg-vaapi-copy passes surfaces_count=0 to vaCreateContext
per iter6 context.c:262 comment; my β walk of surfaces_ids[] was a
no-op → destination_planes_count stayed 0 → surface_bind_slot no-op
→ all-zero readback.

Act 2 (Commit D): cache format-uniform CAPTURE geometry in driver_data;
walk surface_heap in CreateContext; lazy-fill in CreateSurfaces2 when
fmt_valid is set; invalidate in DestroyContext. Restores MPEG-2 to
pre-β state and unlocks VP9.

Per Phase 1 criteria: criterion 1 PARTIAL (VP9 of HEVC+VP9+VP8);
criteria 2-4 PASS.

Bug 5 (NEW): HEVC libva DQBUF FLAG_ERROR — pre-existing kernel
rejection; β's OUTPUT format fix didn't address it. Transitive proof
at iter2 verified control payload shape but kernel still rejects;
some other V4L2 protocol contract aspect differs from kdirect.

Bug 6 (NEW): VP8 libva produces non-zero output with real content
(74.8% zero + 256 unique bytes incl. keyframe pixels at `93 8e 8a 89...`)
but diverges from kdirect. Decode runs; output mismatch likely
slot-rotation or partial-fill bug.

VP9 is iter5b-β's only clean PASS. Architecture-wise β succeeded:
no α'-style failure mode possible (no in-CreateSurfaces2 destructive
teardown), and the CRIT-1+CRIT-2 fixes from Phase 5 v2 review held.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:56:26 +00:00
marfrit 311411b3f9 iter5b-β Phase 6: 3 commits A+B+C landed on fork, build pending fresnel uptime
Commits: 1c548b1 (codec helper), cc077a0 (config wire-up),
7055b14 (β refactor + CRIT-1 + CRIT-2 + IMP-1 + IMP-2 + dead-field
cleanup). Fork tip 7055b14.

surface.c CreateSurfaces2 reduced from ~250 to ~50 LOC. OUTPUT-side
V4L2 lifecycle moved to context.c CreateContext. DestroyContext
gained request_pool_destroy() (CRIT-2 fix). last_output_*/surface_reset_
format_cache deleted (dead under β).

All 5 Phase 5 v2 amendments (CRIT-1, CRIT-2, IMP-1, IMP-2, IMP-3)
incorporated. Fresnel offline at push time — build+install+verify
deferred to Phase 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:16:28 +00:00
marfrit 3508a2cfeb iter5b Phase 5 v2: 2 CRIT findings — NULL guard + missing request_pool_destroy
CRIT-1: context.c:64-66 video_format==NULL guard rejects every first
β CreateContext. β moves the probe from CreateSurfaces2 into
CreateContext itself, so the guard fires before any new logic runs.
Fix: remove guard, move CAPTURE probe to top of CreateContext.

CRIT-2: DestroyContext lacks request_pool_destroy. Empirical grep
shows only surface.c:220 (which β strips) calls it per-session.
Without amendment, second CreateContext gets pool->initialized=true
with stale slot pointers → QBUF EINVAL. Fix: add request_pool_destroy
to DestroyContext before REQBUFS(0). C3 (surface.c strip) and CRIT-2
fix MUST land together.

Plus IMP-1 (mplane assumption wrong for SUNXI_TILED_NV12) + IMP-2
(surface_reset_format_cache becomes dead under C7) + IMP-3 (error
recovery comment).

Phase 6 BLOCKED pending CRIT-1 + CRIT-2 fixes. Author confirmed
both at code level — Phase 5 caught what Phase 4 v2's surface read
missed ("DestroyContext teardown — no change needed" — wrong; was
incomplete).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:50:08 +00:00
marfrit 5abea730a0 iter5b Phase 4 v2: re-plan with option β — CreateContext-centric OUTPUT lifecycle
Supersedes phase4_iter5b_plan.md (the α' plan rejected at Phase 7).
β architecture: strip OUTPUT-side V4L2 device state from
RequestCreateSurfaces2 entirely; move it to RequestCreateContext
where config_id (and therefore the bound profile) is unambiguously
known. CreateSurfaces2 becomes ID-allocation + per-surface
bookkeeping only.

9 contract clauses (C1..C9). Reuses 2 of 3 reverted iter5b commits
(codec.h/codec.c helper; object_config->pixelformat wire-up at
CreateConfig). New work: C3 strip surface.c, C4 build out
context.c — predicted ~120 LOC into context.c, ~190 LOC stripped
from surface.c (net ~70 LOC delta).

Risk register: 7 items; highest is multi-context resolution change
within shared driver_data (medium impact, mitigated by existing
DestroyContext teardown). α''s destructive teardown failure mode
disappears because β has no in-CreateSurfaces2 teardown branch.

Phase 5 review focus: error-recovery branches in CreateContext,
per-surface destination_* fill semantics (format-uniform fields
at CreateContext vs per-slot fields at BeginPicture), ohm
backwards-compat verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:57:54 +00:00
marfrit 864af258e9 iter5b Phase 7: FAIL — HEVC SIGSEGV, option α' rejected, revert + loopback to β
Empirical sweep on iter5b backend (SHA d7722da...) crashed in
copy_surface_to_image during HEVC libva-vaapi-hwdownload. Coredump
backtrace shows memcpy on stale surface_object->destination_data[i]
pointer — cap_pool_destroy ran during my pixfmt-change teardown
branch, but the subsequent S_FMT got EBUSY because the OUTPUT
queue was already streaming. State corruption mid-decode.

Root cause: ffmpeg-vaapi calls vaCreateSurfaces2 *twice*, with
CreateContext+STREAMON between them. My CreateSurfaces2 gate
destructively tears down cap_pool on pixelformat change but can't
recover when REQBUFS(0) silently fails on a streaming queue.

surface.c:164-171 TODO comment from iter1 anticipated exactly this:
"STREAMOFF + REQBUFS(0) + new S_FMT + new CREATE_BUFS — that's a
context-level redesign for the next iteration." Phase 4 dismissed
the comment as targeting multi-resolution mid-stream. That
dismissal was wrong; ffmpeg-vaapi triggers the same code path.

3 reverts on fork master: 4b2288f, f8256e6, ce304ef reverted by
709ab34, 9a7f888, 6bc29ec. Backend rebuilt + reinstalled on fresnel
at iter4-tip SHA 6e90b7a9.... Post-revert HEVC libva returns the
pre-iter5b broken-but-non-crashing all-zero pattern.

Per Phase 1 lock: criteria 1 FAIL (HEVC/VP9/VP8 still all-zero);
criteria 2-4 PASS (no regression on MPEG-2/H.264 keyframe/control
payloads). iter5b does not close.

Phase 7 → Phase 4 loopback: re-plan as option β (defer OUTPUT-side
S_FMT+CREATE_BUFS to CreateContext where config_id is known and
streams haven't started). User pick: revert + re-plan with β.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:46:16 +00:00
marfrit 550bb81a3e iter5b Phase 6: 3 commits A+B+C landed clean, backend installed on fresnel
Fork tip 4b2288f. Backend SHA256 d7722da742bfcb86a9136b07e6d9a5de23668f37fcad328258966c5338265e82
on /usr/lib/dri/v4l2_request_drv_video.so (pre-iter5b was 6e90b7a9b2c33480...).

LOC: 188 across 5 modified files + 2 new (codec.h, codec.c). All 4
Phase 5 amendments (CRIT-1 + 3 IMPs) incorporated in the actual
commits, no follow-ups needed.

Phase 7 sweep ready: re-run /tmp/iter5_p3/sweep.sh on fresnel; expect
libva == kdirect == sw for HEVC + VP9 + VP8 (3 codecs unblocked);
MPEG-2 unchanged; H.264 unchanged (Bug 4 deferred to iter6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:41:04 +00:00
marfrit 7d1c44bd90 iter5b Phase 5: review — CRIT-1 mechanical pseudocode fix, 3 IMP amendments
Sonnet-architect found one Critical pseudocode error and three
Important amendments. All mechanical; no structural plan change.

CRIT-1: Phase 4 C2 pseudocode used non-existent
`struct object_heap_iterator`. Actual API at object_heap.h:67-68 uses
`int *iterator`. Author re-verified vs request.c:411-418 canonical
usage. Verbatim paste would have compile-failed.

IMP-1: gate comment at surface.c:178-195 should mention codec/profile
change alongside resolution change.

IMP-2: dead `object_config->pixelformat` field at config.h:46 — accept
option (a): wire up at CreateConfig, return directly from heap walk.
Saves one pixelformat_for_profile() call in surface.c path.

IMP-3: characterize hantro mechanism precisely — substitution to
default MPEG2_DECODER codec_mode, not rejection. Explains why MPEG-2
worked but VP8 didn't pre-fix.

10 contract clauses scorecard: 1 FAIL (C2), 2 CONDITIONAL (C3, C10),
7 PASS. Phase 6 cleared conditionally pending all 4 amendments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 08:04:38 +00:00
marfrit eca03d2641 iter5b Phase 4: plan — option α' (single-config lookup), 10 contract clauses
Picks α' over the Phase 2 recommendation of β: smaller scope (~50 LOC
vs ~250), targets iter5b's actual bug (wrong OUTPUT format at INITIAL
CreateSurfaces2, not the multi-resolution mid-stream case the
surface.c:164-171 TODO comment anticipates).

Patches:
- C1/C6: NEW src/codec.{h,c} + meson.build — pixelformat_for_profile()
- C2: NEW find_sole_active_profile() static helper in surface.c
- C3: Replace surface.c:173 hardcode with profile-derived lookup
- C5: Extend last_output_* gate with pixelformat

Phase 7 expected post-fix matrix: HEVC + VP9 + VP8 libva == kdirect
== sw (3 codecs unblocked); MPEG-2 unchanged (already worked);
H.264 still race-loses inter frames (Bug 4, deferred to iter6).

Phase 5 review concerns laid out: helper completeness, heap iterator
API, gate semantics, hantro CAPTURE-derivation on correct format,
mpv probe-then-real flow, memory rule placement.

Option β deferral note: cleaner refactor exists but not necessary
for iter5b's bug; defer to future iteration when multi-resolution
mid-stream becomes a target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:24:35 +00:00
marfrit 6b0e023e7f iter5b Phase 2: situation — lifecycle traced, option β (defer to CreateContext) recommended
VA-API lifecycle traced: CreateConfig stores profile in object_config;
CreateSurfaces2 has NO config_id, can't access profile; CreateContext
takes VAConfigID and already does profile-switch for h264_start_code
(context.c:205-217, iter4 fix-forward 692eaa0).

surface.c:164-171 already flags this as deferred-work in a TODO comment:
"that's a context-level redesign for the next iteration." iter5b picks
up that deferred work.

Three options analyzed empirically:
- α: thread current_profile through driver_data (15 LOC, fragile semantic)
- β: move OUTPUT-side lifecycle to CreateContext (80-150 LOC, clean)
- γ: lazy at BeginPicture (architecturally wrong site)

Recommendation: option β. iter4 reviewer accepted the deferred-work
flag in surface.c; iter5b is the iteration that addresses it.

object_config->pixelformat field at config.h:46 is declared but never
assigned — opportunity for wiring up cleanly via the profile→pixelformat
map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:12:58 +00:00
marfrit cd34ec1918 iter5 Phase 0 loopback: real Bug 2 is surface.c:173 hardcoded OUTPUT format
Empirical strace of all 5 codecs through libva shows VIDIOC_S_FMT on
OUTPUT_MPLANE ships pixelformat V4L2_PIX_FMT_H264_SLICE for EVERY
profile. HEVC controls submitted on H264_SLICE OUTPUT → kernel rkvdec
silently rejects/no-ops → CAPTURE stays in cap_pool init (all-zero).

Per-codec Bug 2 taxonomy:
- HEVC, VP9, VP8: OUTPUT format mismatch on rkvdec/hantro-strict → 100% zero
- MPEG-2: format mismatch but hantro tolerates → works
- H.264: format right by coincidence; keyframe decodes, inter all-zero
  (Bug 4, separate, deferred from iter5b)

Site: src/surface.c:173 `unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE`.
Same bug class as feedback_unconditional_codec_state.md
(iter4 h264_start_code = true).

iter5b new Phase 1: fix surface.c to switch pixelformat on
config_object->profile. 4 criteria locked, all backend-side, no kernel
patches. RFC v2 series filed back to backlog for a future
DMABUF-import-consumer campaign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:21:41 +00:00
marfrit 0adfb11fff iter5 Phase 5: review CRIT-1 invalidates Phase 4 — loop back to Phase 0/3
Sonnet-architect review found that the RFC v2 fix mechanism does not
reach the libva backend's consumer path:
- Backend uses V4L2_MEMORY_MMAP for both OUTPUT + CAPTURE buffers.
- For MMAP buffers, vb->planes[].dbuf stays NULL.
- RFC v2 helper's plane loop skips planes with !dbuf, fence attached
  to no dma_resv.
- EXPBUF (vb2_dc_get_dmabuf) creates a fresh disjoint dma_resv.
- The fence-mechanism fix would be a no-op for the cap_pool path even
  if it did reach the right resv, because RequestSyncSurface already
  blocks on media_request_wait_completion + v4l2_dequeue_buffer.

Three alternative root-cause hypotheses for Phase 0/3 to disambiguate:
cache coherency, cap_pool slot-rotation bug, or a separate-sync gap
in vaDeriveImage/vaMapBuffer that bypasses RequestSyncSurface.

Phase 5 saved ~half a session of build-install-test wallclock that
would have ended in a Phase 7 → Phase 0 loopback anyway.

Three Important + 2 Minor findings also recorded for when iter5 reopens.

User pick: loop back to Phase 0/3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 10:56:11 +00:00
marfrit a809e9c0b8 iter5 Phase 4: plan — 4 patches + manifest diff + PKGBUILD bump
12 contract clauses (C1..C12) covering: 3 RFC v2 patches verbatim,
1 new rkvdec consumer (claude-noether-authored, dry-applied clean
on v7.0 in worktree test), kernel-agent patches/ scope tag +
fleet/fresnel.yaml diff, marfrit-packages PKGBUILD bump 7.0-1 → 7.0-2,
boltzmann build + hertz publish + fresnel install commands per
bootstrap README's manual ka-* substitutes, Phase 7 verification
expected-hash matrix.

Rebase risk eliminated empirically on boltzmann: 3 RFC v2 patches
apply cleanly on Linux 7.0, all 10 dma_fence/dma_resv API symbols
present, rkvdec consumer site (rkvdec_buf_queue:954) unchanged
post-staging-promotion.

Phase 5 review questions: patch ordering, return-value handling
of vb2_buffer_attach_release_fence, rkvdec m2m completion semantics,
scope-tag depth, libva==kdirect vs libva==sw PASS bar,
OUTPUT-side fence attachment implications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 07:40:44 +00:00
marfrit 3c05564e99 iter5 Phase 3: baseline — 4/5 libva codecs race-lose, MPEG-2 wins, kdirect clean
5-codec sweep matrix on linux-fresnel-fourier 7.0-1 confirms:
- libva path returns all-zero cap_pool init pattern for H.264 (mostly)
  HEVC, VP9, VP8 (always). MPEG-2 wins the race (fastest hantro decode).
- kernel-direct ffmpeg-v4l2request hwdownload byte-matches SW for all
  4 race-losing codecs.
- B4 cosmetic init-probe EINVAL noise reproduced on hantro (2 ioctl per
  codec); MPEG-2 + VP8 stateless control submissions follow at = 0.

iter4 P7's "RGB(0,0x4c,0)" pattern corrected to all-zero raw bytes
(the 0x4c was YUV→RGB conversion of all-zero NV12). Same SHA shape
as iter3's hantro b34860e0 blocker fingerprint.

Control-payload strace anchors persisted as phase-7 invariants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 05:14:57 +00:00
marfrit 9941523f1f iter5 Phase 2: situation analysis — 4-patch plan (3 RFC v2 + 1 new rkvdec consumer)
Source-read complete: 3 RFC v2 patches dissected, v7.0 rkvdec_buf_queue
site identified at line 954 of drivers/media/platform/rockchip/rkvdec/rkvdec.c,
empirical disproof of Bug 3 UAPI drift via byte-identical v6.12↔v7.0 struct
diff, hantro_v4l2.c confirmed unchanged across the same range.

Rebase risk concentrated in videobuf2-core.c (medium — vb2 core sees regular
activity); deferred to Phase 4 when boltzmann is reachable for the
git apply --3way verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 03:58:07 +00:00
marfrit 31b9255d63 iter5 Phase 0 amend: Bug 3 collapses, locked criteria 5→4
Phase 2 source-read mid-execution found that v4l2_ctrl_mpeg2_*
and v4l2_ctrl_vp8_frame are byte-identical v6.12 ↔ v7.0 mainline.
On-fresnel re-trace with correct hantro-decoder bind shows MPEG-2
controls submit at = 0; the "Unable to set control(s)" log noise
is the backend's H.264/HEVC init-probe EINVAL on a non-H.264 device
(B4 backlog), not a UAPI drift.

iter5 locked scope is now vb2_dma_resv (4 patches: 3 existing
operator-authored RFC v2 + new rkvdec consumer). Criteria reduced
from 5 to 4. B4 stays in backlog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 03:55:36 +00:00
marfrit 8acfca3fe0 iter5 Phase 0: lock Candidate B — vb2_dma_resv + hantro UAPI drift in linux-fresnel-fourier
Five Phase 1 criteria: Bug 2 closed (cap_pool readback returns real
pixels through libva); Bug 3 closed (hantro MPEG-2 + VP8 controls
accepted on new kernel); patches ship from kernel-agent (local-carry
acceptable, mainline bonus); zero codec-contract regression vs iter4;
5/5 direct-verification block restored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:54:54 +00:00
marfrit 9d2b7c1944 iter4 Phase 7 close: Option-A transitive proof complete — VP9 PASS 4/5
Leg 1: FRAME control 168/168 bytes byte-identical to kernel-direct anchor.
Leg 2: COMPRESSED_HDR 1950/2040 match; 90-byte uv_mode[10][9] delta is the
       documented S4 carve-out (rkvdec persistent kernel table).
Leg 3: kernel-direct YUV (NV12→YUV420P, 3 frames @1280x720) SHA256-identical
       to libvpx-vp9 SW reference: 4f1565e89cd720c4eb6e59d8bbb46127b02cf13102911afc4e174925e5b36094

iter4 criteria 1+2+3 direct PASS, 4 transitive PASS, 5 carried as substrate
issue (cap_pool readback, Bug 2 + hantro UAPI drift, Bug 3) outside iter4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:01:09 +00:00
marfrit f510ac6be5 iter4 Phase 7 pause: fork fix-forward 692eaa0, awaiting fresnel return for transitive-proof closure
Mid-Phase-7 fix-forward landed on fork
(marfrit/libva-multiplanar:692eaa0): unconditional
context_object->h264_start_code = true was prepending 0x00 0x00 0x01
to VP9 slice data, shifting the rkvdec bitstream by 24 bits and
producing silent decode failure. Now gated on
config_object->profile (H.264 + HEVC only).

Empirical verification when fresnel was online: post-fix VP9 keyframe
FRAME control bytes 0-23 byte-match Phase 3 anchor:
  lf.flags=0x03 (DELTA_ENABLED|DELTA_UPDATE) — was 0x01
  base_q_idx=0x2e=46 — was 0x41=65

This is the transitive-proof leg-1 (backend-payload == kernel-direct-payload)
for the iter4 keyframe.

Open verification when fresnel returns:
- Full 168-byte FRAME control diff mine vs Phase 3 anchor
- Full 2040-byte COMPRESSED_HDR control diff
- ffmpeg-v4l2request kernel-direct VP9 decode + hwdownload pixels =
  Phase 3 SW reference (transitive-proof leg-2)

If both legs PASS, iter4 closes 5/5 (4 direct from earlier iters
+ 1 transitive iter4) per Option-A choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 10:18:07 +00:00
marfrit d87c940788 iter4 Phase 7: criterion 1+2+3 PASS, criterion 4+5 FAIL — three bug classes identified
Verification on linux-fresnel-fourier 7.0-1:

PASS:
- Criterion 1: vainfo enumerates VAProfileVP9Profile0 via auto-detect.
- Criterion 2: vaCreateConfig SUCCESS (implicit).
- Criterion 3: ffmpeg-vaapi VP9 5-frame decode exit 0 at 0.307x, no
  ioctl errors.

FAIL — three distinguishable bug classes:

Bug 1 (VP9-specific, my Clause 6 parser):
  Strace of frame-1 keyframe FRAME control vs Phase 3 anchor:
  - byte 8 (lf.flags): mine=0x01 (DELTA_ENABLED only) vs ref=0x03
    (ENABLED|UPDATE).
  - byte 16 (base_q_idx): mine=0x41 (65) vs ref=0x2e (46).
  - byte 17 (delta_q_y_dc): mine=8 vs ref=0.
  Bit-trace shows my parser is 2 bits ahead of correct position by
  the time it reaches lf_delta_enabled. Fix path: faithful port of
  FFmpeg vp9.c::decode_frame_header.

Bug 2 (substrate-wide, cap_pool readback):
  Constant RGB(0, 0x4c, 0) "0x4c gray" pattern across all codecs
  (VP9, HEVC, MPEG-2, VP8). H.264 keyframe DOES read correctly with
  real RGB(0, 0xe3, 0) content; H.264 inter frames revert to 0x4c.
  Kernel decode succeeds (Phase 3 strace + ffmpeg-v4l2request
  standalone confirm). libva readback returns cap_pool init scratch.
  Sibling of iter3 dma_resv blocker but with different signature
  (constant 0x4c instead of all-zero 0x00).

Bug 3 (hantro UAPI drift):
  MPEG-2 + VP8 produce kernel "Unable to set control(s): Invalid
  argument" errors. UAPI struct sizes/fields likely shifted between
  6.19.9 and 7.0 (sibling of Phase 3 VP9 struct-size correction
  144/1947 -> 168/2040).

Three loopback options proposed (decision pending user):
- A: VP9-only fix (Clause 6 parser); accept Bug 2/3 as substrate
     pre-existing; criterion 4 transitive-only per iter3.
- B: Full loopback covering all 3 bugs; possibly requires kernel
     patches (vb2_dma_resv RFC v2).
- C: Phase 0 reset; substrate is the primary issue; pause iter4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 07:20:51 +00:00
marfrit 42b9ec333a iter4 Phase 6: 4 commits landed (Z+A+B+C), ffmpeg-vaapi VP9 decode PASS
Fork at marfrit/libva-multiplanar tip beaa914:
- Z (7f8fa93) device-path auto-detect via media controller topology;
  walk /dev/media*, MEDIA_IOC_DEVICE_INFO match, MEDIA_IOC_G_TOPOLOGY
  -> MEDIA_INTF_T_V4L_VIDEO -> resolve via /sys/dev/char.
  LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 escape hatch.
- A (16b3973) src/config.c VP9 enumeration + dispatch + entrypoints.
- B (406d08e) NEW src/vp9.c (~750 LOC: VPX rac + inv_map_table +
  uncompressed-header partial parser + compressed-header parser +
  vp9_set_controls) + src/vp9.h + meson.build + context.h
  (persistent vp9_lf state for Phase 5 C2) + surface.h
  (params.vp9 union extension).
- C (beaa914) src/picture.c VP9 dispatcher + 2 buffer-type cases.

NO Commit D — buffer.c allow-list already permissive for VP9's 3
buffer types (Picture, Slice, SliceData; all in iter3 baseline).

Phase 5 amendments all in code: C1 no-XOR direct, C2 persistent
vp9_lf with VP9 spec defaults, C3 out_reference_mode parameter,
C4 NO_AUTODETECT escape, S4 uv_mode memcpy omitted.

Plan amendment to Commit Z section in phase4_iter4_plan.md
documents the canonical media-topology approach (replacing the
original /dev/video* walk).

Verification empirically on fresnel:
- Criterion 1: vainfo enumerates VAProfileVP9Profile0 alongside
  H.264 + HEVC under auto-detect rkvdec.
- Criterion 2 (implicit via successful ffmpeg run).
- Criterion 3: ffmpeg-vaapi VP9 5-frame decode exit 0 at
  0.307x speed, no ioctl errors.
- Criterion 4: deferred to Phase 7 verification.
- Criterion 5: rkvdec codecs work without env override; hantro
  (MPEG-2/VP8) still need env override per iter4-B1 backlog.

Open iter4 backlog: B1 (multi-decoder dispatch refactor),
B2 (mpv-vaapi Could-not-create-device — ffmpeg-vaapi works fine
through same backend, mpv does not), Q6 (per-segment ALT_Q
mapping for non-BBB), COLOR_RANGE (VAAPI gap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:55:45 +00:00
marfrit 9865416ed2 iter4 Phase 5: sonnet-architect review — 4 Critical findings, all amendments incorporated
Review by sonnet-architect with cold-context source reads of fork +
kernel UAPI + VAAPI + FFmpeg references + kernel rkvdec source.
Reviewer applied Direction 2 (empirical-over-theoretical) by
test-compiling struct sizes, gcc-c-checking VAAPI field accesses,
and source-tracing FFmpeg's filter-mode XOR provenance.

Critical findings (all empirically validated by author before
incorporation per feedback_review_empirical_over_theoretical.md):

C1 - interpolation_filter double-XOR: vaapi_vp9.c:62 ALREADY applies
     `filtermode ^ (filtermode <= 1)` when filling VAAPI's
     mcomp_filter_type. Plan's second XOR was incorrect; would swap
     EIGHTTAP and EIGHTTAP_SMOOTH for inter frames -> wrong
     loop-filter strength. Fix: direct assignment, no XOR.

C2 - LF deltas not persistent: kernel UAPI explicitly says
     "users should pass its last value" when delta_update=0. Plan
     memset-zeroed each frame; would send {0,0,0,0,0,0} on BBB inter
     frames instead of {1,0,-1,-1,0,0}. Fix: add persistent vp9_lf
     state to object_context, init to VP9 spec defaults, update only
     when parser sees delta_update=1, always copy to kernel control.

C3 - reference_mode out-parameter missing: reference_mode lives in
     FRAME struct, not COMPRESSED_HDR. Plan referenced
     `compressed_hdr_reference_mode` placeholder which would be an
     undefined identifier -> compile failure. Fix: add
     `uint8_t *out_reference_mode` param to vp9_fill_compressed_hdr;
     derive `allowcompinter` at call site from the 3 sign biases.

C4 - Mitigation B scope claim overstated: walk-and-pick-first always
     selects rkvdec on 7.0 (since video1 enumerates first). Hantro
     codecs (MPEG-2, VP8) at video3 still require env override.
     Fix: qualify criterion-5 trace; add LIBVA_V4L2_REQUEST_NO_
     AUTODETECT=1 escape hatch for legacy callers.

6 Suggested (S1-S6): all confirm plan correctness OR are scope-
aligned non-issues. S4 (uv_mode memcpy omission safe for rkvdec)
baked into Clause 9 amended text.

Without this review, iter4 Phase 6 would have failed first compile
(C3) + produced wrong inter-frame output (C1+C2) + caused user
confusion (C4). Estimated saving: 1 compile failure + 1 Phase 7 ->
Phase 4 loopback + 1 doc correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 05:49:13 +00:00
marfrit 4b36077b17 iter4 Phase 4: plan locks 12 contract clauses + Mitigation B
5-commit plan (Z, A, B, C, optional D):
- Commit Z: src/request.c — walk /dev/video* + /dev/media*, match by
  driver name in {rkvdec, hantro-vpu, cedrus, sun4i_csi}; restores
  baseline functionality on 7.0 (where /dev/video0 is rockchip-rga).
- Commit A: src/config.c — VAProfileVP9Profile0 enumeration + dispatch
  + entrypoints (~16 LOC, 1 file).
- Commit B: NEW src/vp9.c + .h + meson — 12 contract clauses; ~580 LOC
  vp9.c (50 infra + 80 VPX rac + 50 uncompressed-header partial parse +
  180 compressed-header parser + ~200 frame-fill).
- Commit C: src/picture.c + surface.h — VP9 dispatch + 2 buffer-type
  cases + union extension; NO BeginPicture reset (VP9 has no
  iqmatrix_set-style flags).
- Commit D: optional fix-forward placeholder (predicted no-op per
  feedback_runtime_enumerates_allowlists.md).

Total ~699 LOC, 7 files.

12 contract clauses include 2 NEW vs iter3:
- Clause 3: compile-time _Static_assert sizeof v4l2_ctrl_vp9_frame ==
  168 && ..._compressed_hdr == 2040 (any UAPI shift fails loudly).
- Clause 6: uncompressed-header partial parse for lf_delta_* +
  base_q_idx (VAAPI doesn't expose; BBB keyframe needs non-zero
  ref_deltas={1,0,-1,-1} per Phase 3 anchor).

7 Phase 5 review questions queued, all empirical-leaning per
feedback_review_empirical_over_theoretical.md Direction 2:
parser-vs-bitstream cross-check, FFmpeg-XOR-remap validation,
struct-size stability, mitigation B regression risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:10:47 +00:00
marfrit 56abe3d6a2 iter4 Phase 3: VP9 baseline + 4-codec regression on 7.0 substrate
Captured on linux-fresnel-fourier 7.0-1 (post 6.19 decommission).

VP9 baseline (kernel-direct via ffmpeg-v4l2request on rkvdec):
- 5-frame SW reference PNG SHA256 anchors (criterion-4)
- VIDIOC_S_EXT_CTRLS strace with full payload at -s 16384
- Empirical struct sizes 168 B (FRAME) / 2040 B (COMPRESSED_HDR)
  supersede Phase 2 estimates of 144 / 1947
- Probe pattern: count=1 (FRAME-only) then count=2 (FRAME + COMPRESSED_HDR)

Phase 2 doc fix: control IDs corrected 0xa40b2c/d -> 0xa40a2c/d.

4-codec regression (H.264, MPEG-2, HEVC, VP8): all fall back to SW on
default config because /dev/video0 is now rockchip-rga (RGB color
converter), not a codec device. Fork hardcodes /dev/video0 in
request.c:149. Env override LIBVA_V4L2_REQUEST_VIDEO_PATH /
_MEDIA_PATH restores per-driver profile enumeration; mitigation A/B/C
queued for user decision.

New contract clauses surfaced:
- Clause 11: uncompressed-header partial parse for lf_delta /
  base_q_idx (VAAPI doesn't expose these; keyframe ref_deltas non-zero
  for BBB so leave-at-zero is wrong)
- Clause 12: compile-time sizeof asserts on the two control structs
  so future UAPI shifts fail loudly

iter4_phase3.tgz: full Phase 3 artifact bundle (strace + PNG refs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 20:31:53 +00:00
claude-noether 2651e4cfdf iter4 Phase 2: situation analysis — VP9 backend gaps + compressed-
header parser requirement

Source-read of every file the iter4 patch series will touch, plus
kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference
sources. Conducted on noether against fork tip e1aca9c (iter3 close).

Critical scope-shaping finding: rkvdec on RK3399 REQUIRES
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (not optional). Per
drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble
lines 752-754:

  ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
                        V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
  if (WARN_ON(!ctrl))
      return -EINVAL;

VAAPI does NOT expose compressed-header probability updates
(va_dec_vp9.h:50-192 — only frame parameters + segmentation;
vendor VAAPI drivers parse compressed header in firmware/GPU).
Therefore the libva backend MUST parse the compressed header
itself via a VPX boolean decoder + inv_map_table[]. ~150-200 LOC
of bitstream parsing logic (port from FFmpeg
v4l2_request_vp9.c::fill_compressed_hdr).

Bug enumeration (12 sites):

  B1   config.c::RequestQueryConfigProfiles    enum block missing
  B2   config.c::RequestCreateConfig           VP9 case missing
  B3   config.c::RequestQueryConfigEntrypoints VP9 case missing
  B4   src/vp9.c                               new file ~500-600 LOC
  B5   src/vp9.h                               new file ~35-45 LOC
  B6   src/vp9_rac.h                           NEW or inline (Phase 4
                                                 plan locks Option A:
                                                 inline in vp9.c)
  B7   picture.c::codec_set_controls           VP9 dispatch missing
  B8   picture.c::codec_store_buffer           2 buffer-type cases
                                                 (Picture + Slice;
                                                 NOT 4 like VP8)
  B9   picture.c::RequestBeginPicture          predicted no reset
                                                 needed (no flag-state
                                                 like VP8 iqmatrix_set)
  B10  surface.h::object_surface::params union vp9 member missing
  B11  meson.build                             vp9.c/vp9.h not in lists
  B12  buffer.c                                predicted no change
                                                 needed (VP9 uses
                                                 Picture/Slice/SliceData
                                                 — all whitelisted)

Non-bugs (intentionally untouched): context.c (no DECODE_MODE/
START_CODE menus per FFmpeg ref), video.c (CAPTURE-side format
list), v4l2.c (fourcc-agnostic), include/hevc-ctrls.h (already
includes <linux/v4l2-controls.h>).

Contract surface cited verbatim:

  V4L2_CID_STATELESS_VP9_FRAME = 0xa40b2c (~144 bytes — much
    smaller than VP8's 1232 bytes because VP9_FRAME carries no
    entropy table; that's in COMPRESSED_HDR)
  V4L2_CID_STATELESS_VP9_COMPRESSED_HDR = 0xa40b2d (~1947 bytes
    — coef[4][2][2][6][6][3] alone is 1728 bytes)
  Per-frame submission: 2 controls batched in single S_EXT_CTRLS
  v4l2_request_vp9.c references confirmed: 2-control shape,
    runtime-probed COMPRESSED_HDR availability (rkvdec advertises
    it; we MUST provide)

VAAPI buffer types: 2 per frame (Picture + Slice) vs iter3 VP8's
4. NO Probability buffer (VP9 keeps probs in compressed header).
NO IQMatrix (VP9 keeps quant in slice's per-segment seg_param[8]).

VAAPI → V4L2 mapping table: 30+ fields enumerated. Several gap
candidates identified for Phase 3 empirical resolution:

  Q1 lf.ref_deltas/mode_deltas/flags — not in VAAPI; FFmpeg reads
     from VP9Context internal. BBB likely zero.
  Q2 quant.base_q_idx + deltas — VAAPI exposes only effective
     per-segment scales. Inverse-derive needed.
  Q3 reference_mode — not in VAAPI. Default to SELECT?
  Q4 interpolation_filter mapping (FFmpeg ^ remap)
  Q5 reset_frame_context off-by-one (FFmpeg > 0 ? - 1 : 0)
  Q6 Per-segment feature_data[8][4] derivation from VAAPI's
     effective scales is non-trivial
  Q7 mpv 0.41.0 VP9 hwdec engagement (per memory feedback_hw_
     decode_engagement_check.md — known gap from iter3 VP8)
  Q8 rkvdec dma_resv issue? (predicted NO based on iter1+iter2
     successful mpv-DMA-BUF-GL on rkvdec)

Patch-shape prediction: ~580-690 LOC across 5 modified + 2 new
files (closer to iter2 HEVC's 470 than iter3 VP8's 370). Compressed-
header parser is the dominant cost.

Phase 3 baseline targets queued: cross-validator strace verbatim
S_EXT_CTRLS payloads (both controls), VAAPI consumer trace, mpv-
VP9-vaapi engagement check, rkvdec readback non-zero check.

Phase 4 plan structure anticipated: 10-clause template per
iter2/iter3, with new Clause 8 dedicated to compressed-header
parser.

Refs:
  phase0_findings_iter4.md (Phase 1 lock)
  phase8_iteration3_close.md (predecessor)
  references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c (V4L2 ref)
  references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c (VAAPI ref)
  /home/mfritsche/src/linux-rfc/drivers/staging/media/rkvdec/
    rkvdec-vp9.c (kernel driver — confirms COMPRESSED_HDR
    requirement at lines 752-754)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 05:20:07 +00:00
claude-noether 9a71dbf4c3 iter4 Phase 0 + Phase 1 lock: VP9 on rkvdec
Opens iter4 immediately after iter3 close (d5d4beb). Targets VP9
Profile 0 as the fifth (final) codec to pass boolean-correctness
on fresnel via libva-v4l2-request-fourier — completes the campaign
codec scope.

Locked research question:
  mpv --hwdec=vaapi bbb_720p10s_vp9.webm engages backend cleanly
  on rkvdec, and HW pixel readback yields byte-identical output
  to a software-decoded reference for the same frames.

Five Phase 1 boolean criteria:
  1. vainfo enumerates VAProfileVP9Profile0 on rkvdec env binding
  2. vaCreateConfig(VAProfileVP9Profile0, VLD) = SUCCESS
  3. ffmpeg -hwaccel vaapi VP9 5-frame decode exit 0
  4. HW=SW byte-identical with HW engagement verified per memory
     feedback_hw_decode_engagement_check.md (mpv -v log inspection
     before claiming match). If mpv falls back to SW for VP9 like
     it did for iter3 VP8, OR if rkvdec exhibits the same dma_resv
     kernel issue as hantro, fall through to transitive proof per
     memory reference_dmabuf_resv_blocker.md (libva backend
     payload == kernel-direct payload AND kernel-direct decode ==
     SW reference).
  5. FOUR-codec regression block: H.264 + MPEG-2 + HEVC + VP8
     reference hashes hold

Substrate carry-forward (re-verified):
  - fork tip e1aca9c (post-iter3-close)
  - /usr/lib/dri/v4l2_request_drv_video.so SHA256 0ab5b2ba...4ef
  - linux-eos-arm 6.19.9-99-eos-arm
  - bbb_720p10s_vp9.webm fixture on fresnel ~/fourier-test/ (3.4 MB)
  - rkvdec OUTPUT_MPLANE VP9F + 2 VP9 stateless controls
    (V4L2_CID_STATELESS_VP9_FRAME = 0xa40b2c, COMPRESSED_HDR =
    0xa40b2d)
  - cross-validator anchor confirmed: rkvdec advertises VP9 per
    Phase 0 V4L2 inventory
  - Reference sources local:
    references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c
    references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c
    references/linux-mainline/drivers/staging/media/rkvdec/
      rkvdec-vp9.c (verify presence at Phase 2)

Predicted scope:
  - config.c: ADD VP9 enumeration block + RequestCreateConfig case
    + RequestQueryConfigEntrypoints case (3 sites; same shape as
    iter3 VP8)
  - src/vp9.c NEW file (~250-350 LOC; 2 V4L2 controls per frame:
    FRAME + COMPRESSED_HDR; 8-entry DPB vs VP8's 3)
  - src/vp9.h NEW file
  - src/meson.build add 'vp9.c' + 'vp9.h' entries
  - picture.c codec_set_controls VP9 dispatch + codec_store_buffer
    cases for 2 VAAPI VP9 buffer types (Picture + Slice; NO
    Probability + IQMatrix unlike iter3 VP8)
  - surface.h params union extend with vp9 member
  - context.c: NO changes expected (no init-time menus per FFmpeg
    ref pattern)
  - buffer.c: predicted no Commit D needed (VP9 uses Picture +
    Slice + SliceData buffer types — all already whitelisted by
    H.264 path); plan for fix-forward if runtime miss surfaces
    per memory feedback_runtime_enumerates_allowlists.md

Predicted total: ~400-500 LOC, 3-4 commits + 0-1 fix-forwards.
Larger than iter3 VP8 (370 LOC) but comparable to iter2 HEVC
(470 LOC).

VP9 contract surface:
  - 2 controls per frame batched in single S_EXT_CTRLS:
    FRAME (struct v4l2_ctrl_vp9_frame) + COMPRESSED_HDR
    (struct v4l2_ctrl_vp9_compressed_hdr — probability updates
    from compressed header)
  - 8 reference frames in DPB (active_ref_frames[8])
  - Tile-based decoding (VP9 has 1..N tiles per frame)
  - Profile 0 only (8-bit 4:2:0); Profile 1/2/3 OUT-OF-SCOPE

Phase 2 source-read targets queued: config.c enumeration pattern,
picture.c dispatch + per-buffer-type cases, surface.h params union,
VAAPI <va/va_dec_vp9.h>, kernel UAPI v4l2_ctrl_vp9_frame +
v4l2_ctrl_vp9_compressed_hdr (lines 2696-2870), kernel rkvdec-
vp9.c driver, FFmpeg v4l2_request_vp9.c + vaapi_vp9.c.

Memory carry-forward (all 9 entries apply unchanged):
  feedback_gitea_as_claude_noether
  feedback_no_session_termination_attempts
  feedback_header_deletion_check
  feedback_runtime_enumerates_allowlists (NEW iter3)
  feedback_review_empirical_over_theoretical (BOTH directions)
  feedback_rockchip_pixel_verify_path
  feedback_fresnel_hostname (NEW iter3)
  feedback_hw_decode_engagement_check (NEW iter3)
  reference_dmabuf_resv_blocker (NEW iter3)

Open questions inherited from iter3 close (not blocking iter4
lock):
  - Does mpv 0.41.0 engage HW for VP9 hwdec=vaapi or fall back
    like it did for VP8? Phase 0+3 verifies via mpv -v log.
  - Does rkvdec exhibit the same vb2_dma_resv kernel issue as
    hantro? Likely no (different driver subsystem; iter1+iter2
    mpv-DMA-BUF-GL paths worked on rkvdec). Phase 3 baseline
    answers via ffmpeg-vaapi-hwdownload non-zero check.

iter4 = final codec in campaign scope. Clean close → 5/5 codecs
passing → campaign complete.

Refs:
  phase0_findings_iter1.md (iter1 MPEG-2 lock template)
  phase0_findings_iter2.md (iter2 HEVC lock template)
  phase0_findings_iter3.md (iter3 VP8 lock template)
  phase8_iteration3_close.md (immediate predecessor close)
  phase0_evidence/2026-05-07/v4l2_inventory_findings.md (rkvdec
    VP9 capability)
  phase0_evidence/2026-05-07/test_fixtures.md (bbb_720p10s_vp9.
    webm provenance)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:36:04 +00:00
claude-noether d5d4beb64d iter3 Phase 8 close: 4/5 codecs passing, 3 new memory entries
distilled, 0 Phase 7 → Phase 4 loopbacks

iter3 = VP8 on hantro-vpu-dec via libva-v4l2-request-fourier on
RK3399 (fresnel / Pinebook Pro). Fourth codec to ship.

Final state:

  Fork tip: e1aca9c (post iter2 close 8d71e20 + 4 commits)
  Phase 1 criteria: 5/5 GREEN (4 direct + 1 transitive)
  LOC delta: +373 across 7 files (2 new + 5 modified)
  Phase 7 → Phase 4 loopbacks: 0
  Phase 6 fix-forwards: 1 (Commit D buffer.c allow-list)
  Phase 5 review findings: 4 Critical, all empirically validated

Lessons distilled to memory (3 NEW entries):

  feedback_hw_decode_engagement_check.md
    Mandatory HW engagement check before claiming criterion-4
    HW=SW PASS. mpv silently falls back to SW for some codec/
    backend combos. Use lsof/strace/mpv -v/ffmpeg log to verify
    HW path actually engaged. Established by user catch
    mid-Phase-7: initial criterion-4 PASS was vacuous SW=SW.

  reference_dmabuf_resv_blocker.md
    Cross-campaign blocker. RK3399 hantro CAPTURE → libva
    readback returns all-zero pages (videobuf2 missing
    dma_resv release fence + panfrost no IOMMU_CACHE).
    Tracked at git.reauktion.de/marfrit/dmabuf-modifier-triage/
    issues/2. vb2_dma_resv kernel patches in flight (RFC v2,
    2026-04 linux-media). Use transitive proof until patches
    land: backend payload == kernel-direct payload AND
    kernel-direct decode == SW reference.

  feedback_runtime_enumerates_allowlists.md
    Sibling to feedback_header_deletion_check.md. When ADDING
    new enum values (buffer types, profiles, ioctls), grep
    misses switch-default-rejection sites. Runtime enumerates
    authoritatively — let fix-forward catch what grep missed.
    Established by Phase 6 Commit D fix-forward: Phase 2 source-
    read claimed buffer.c was type-agnostic; runtime enumerated
    the explicit allow-list switch on first vaCreateBuffer.

Phase 5 amendments empirically validated (all 4 Critical correct):

  C1 first_part_header_bits = slice->macroblock_offset → 6550 ✓
  C2 first_part_size = partition_size[0]+ceil(macroblock_offset/8)
     → 22742 ✓ (= 21923 + 819, exact match for Phase 3 anchor)
  C3 VAProbabilityBufferType (not VAProbabilityDataBufferType)
     → compiled clean post-Commit-D
  C4 (int8_t) cast (not (s8)) → compiled clean Commit B first try

Estimated savings without Phase 5 review: 2 Phase 6 compile-fail/
fix-forward cycles (C3 + C4) + 1 Phase 7 → Phase 4 loopback (C1
+ C2 hardware-DMA-offset bug, would have produced visible-but-
corrupt output). Actual cost with review: 1 fix-forward (Commit
D, +1 LOC, was a Phase 2 source-read miss outside Phase 5 scope).

Cross-cutting backlog updates:

  iter3-Q1 first_part_header_bits → CLOSED by Phase 5 C1
  iter3-flags-anomaly bit 0x40 → not iter3 scope; kernel ignores
  iter3-criterion-4-readback → blocked on dmabuf-modifier-triage
                                iter1; transitive proof used
  iter3-mpv-vp8-fallback → mpv 0.41.0 falls back to SW for VP8;
                            consumer-side, not backend; verify
                            via chrome-fourier when convenient

Inherited backlog (B1, B3, B4, B5, B6, L3) — no closures from
iter3.

Campaign scoreboard: 3/5 → 4/5 codecs passing.

  H.264   | rkvdec | T4    | PASS direct
  MPEG-2  | hantro | iter1 | PASS direct
  HEVC    | rkvdec | iter2 | PASS direct
  VP8     | hantro | iter3 | PASS transitive (readback blocked)
  VP9     | rkvdec | iter4 | PENDING

iter4 (VP9 on rkvdec) prediction: comparable scope to iter2 HEVC
(VP9 has compressed-header control + probability state).
~400-500 LOC, 3-4 commits + 1 fix-forward. mpv may engage HW for
VP9 (different from VP8 fallback) — verify at iter4 Phase 0.

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (situation analysis)
  phase3_iter3_baseline.md (verbatim payload anchors)
  phase4_iter3_plan.md (10 contract clauses + Phase 5 amendments)
  phase5_iter3_review.md (4 Critical, all validated correct)
  phase7_iter3_verification.md (4 direct + 1 transitive PASS)
  Fork commits 27d82e3 + 017e27f + 7f84bbb + e1aca9c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:31:23 +00:00
claude-noether afb9b1450f iter3 Phase 7: verification — 4 direct PASS, 1 transitive PASS
Phase 1 5-criterion verification on iter3 backend (fork tip e1aca9c).
4 direct PASS + 1 transitive PASS. Vacuous-pass mode caught + corrected
mid-Phase-7 (initial mpv --hwdec=vaapi --vo=image HW=SW match was
SW=SW; mpv silently fell back to SW for VP8).

Criterion results:

  1. vainfo enumerates VAProfileVP8Version0_3       PASS (direct)
  2. vaCreateConfig SUCCESS                          PASS (direct, implied)
  3. ffmpeg-vaapi VP8 5-frame decode exit 0          PASS (direct)
  4. HW=SW byte-identical via DMA-BUF GL             PASS (transitive)
  5. 3-codec regression (H.264 + MPEG-2 + HEVC)      PASS (direct)

Criterion 4 transitive proof:

  Step A: Strace of ffmpeg-vaapi via libva backend captures the
          V4L2_CID_STATELESS_VP8_FRAME control payload — keyframe
          y_ac_qi=8, first_part_size=22742, first_part_header_bits=
          6550, all 30 fields enumerated.

  Step B: Phase 3 baseline already captured the kernel-direct
          (ffmpeg-v4l2request) keyframe payload — IDENTICAL to A
          field-for-field.

  Step C: ffmpeg-v4l2request kernel-direct VP8 decode produces
          5 raw frames byte-identical to SW reference (cmp on
          full 6.7 MB vp8_kerneldirect.yuv vs vp8_sw5.yuv = silent
          BYTE-IDENTICAL).

  Conclusion: A == B (libva backend produces correct kernel input)
              AND C (kernel-direct decode is correct), therefore
              libva backend's HW decode IS correct by transitivity.

Direct readback BLOCKED by kernel-layer dma_resv issue (sibling
campaign git.reauktion.de/marfrit/dmabuf-modifier-triage/issues/2):

  - ffmpeg-vaapi -hwaccel_output_format vaapi -vf hwdownload
    returns all-zero pages (SHA b34860e0... = SHA of all-zero
    1382400-byte block) for ALL 5 frames.
  - Same all-zero from -hwaccel_output_format nv12 + auto-DL.
  - mpv --hwdec=vaapi-copy returns Y=128 gray (uninitialized).
  - Root cause: videobuf2 missing dma_resv release fence + panfrost
    IOMMU_CACHE absence on RK3399 (per dmabuf-modifier-triage iter1
    RFC). vb2_dma_resv kernel patches in flight (linux-media RFC v2,
    2026-04). When patches land, direct verification re-runnable.

Phase 5 amendments empirically validated:

  C1 first_part_header_bits = slice->macroblock_offset → 6550 ✓
  C2 first_part_size = partition_size[0] + ceil(macroblock_offset/8)
     → 22742 ✓ (= 21923 + 819, exact match for Phase 3 anchor)
  C3 VAProbabilityBufferType (not VAProbabilityDataBufferType) →
     compiled clean post-Commit-D fix-forward
  C4 (int8_t) cast → compiled clean Commit B first try
  S3 assert(probability_set) → has not fired (FFmpeg vaapi_vp8.c
     always sends VAProbabilityBufferType per frame)

Phase 6 fix-forward Commit D documented: buffer.c had an explicit
allow-list switch (Phase 2 source-read missed it). Same iter1 Commit
D pattern — runtime enumerates authoritatively what grep missed.

HW-engagement check applied per new memory rule
feedback_hw_decode_engagement_check.md (established this session):

  - mpv-vaapi VP8: SILENT FALLBACK to SW. mpv-side, not backend
    issue. ffmpeg-vaapi VP8: HW engaged (Format vaapi chosen by
    get_format(); cap_pool_init: 24 slots ready).
  - V4L2 strace: VIDIOC_S_EXT_CTRLS for VP8_FRAME (0xa409c8)
    returns 0 (kernel accepts payload). CAPTURE buffer indexes
    advance through distinct slots per decode.

Cross-cutting backlog updates:

  iter3-Q1 first_part_header_bits → closed by Phase 5 C1
  iter3-flags 0x40 → not iter3 scope; kernel ignores
  iter3-criterion-4 readback → blocked on dmabuf-modifier-triage
                                iter1 (vb2_dma_resv kernel patches)

Campaign scoreboard: 3/5 → 4/5 codecs passing.

Memory entries added:
  feedback_hw_decode_engagement_check.md (mandatory HW engagement
    verification before claiming criterion-4 PASS)
  reference_dmabuf_resv_blocker.md (cross-campaign blocker tracking
    + transitive proof pattern)

Refs:
  phase4_iter3_plan.md (10 contract clauses + Phase 5 amendments)
  phase5_iter3_review.md (4 Critical findings, all empirically
                            validated in Phase 7)
  phase3_iter3_baseline.md (verbatim payload anchors used in
                              transitive proof Step B)
  git.reauktion.de/marfrit/dmabuf-modifier-triage/issues/2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:26:27 +00:00
claude-noether 656596aa6b iter3 Phase 5: sonnet review — 4 Critical findings, 4 amendments
Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.

Critical findings:

  C1 first_part_header_bits = 0 was claimed cosmetic; actually
     UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
     both read this field unconditionally to compute the macroblock
     DMA offset. Setting 0 would place hardware at wrong DMA offset
     for ALL macroblock data → garbage decode.
     Fix: frame.first_part_header_bits = slice->macroblock_offset
     (verified by source identity — vaapi_vp8.c:204 and
     v4l2_request_vp8.c:83 use byte-identical formulas).

  C2 first_part_size = slice->partition_size[0] was wrong; VAAPI's
     partition_size[0] is the REMAINING bytes after parsing
     (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
     Kernel needs the TOTAL control partition size.
     Fix: frame.first_part_size = slice->partition_size[0] +
                                  ((macroblock_offset + 7) / 8)
     Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.

  C3 VAProbabilityDataBufferType does not exist as a buffer-type
     enum; it's the struct name. The actual enum constant is
     VAProbabilityBufferType (= 13 per va.h:2058). Switch case
     using the wrong identifier would have failed Phase 6 compile.
     Fix: replace globally in phase2 + phase4 docs.

  C4 (s8) cast undefined in userspace. Kernel has 's8' typedef in
     linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
     underscore). Userspace portable cast is int8_t from <stdint.h>.
     Fix: replace (s8) with (int8_t) in Clauses 6+7.

Suggested:

  S3 Clause 8 comment was factually wrong: hantro_vp8.c::
     hantro_vp8_prob_update reads coeff_probs unconditionally;
     there is NO default-table fallback. If probability_set==false,
     decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
     always sends VAProbabilityBufferType per frame), but corrected
     comment + added assert(probability_set) runtime guard for
     immediate Phase 6 surfacing.

Plus 5 minor S/Q items documented; non-blocking for iter3.

Author's 7 review questions all answered directly in the review:
  Q1 quantization derivation: correct for typical content
  Q2 first_part_header_bits=0 safety: UNSAFE → C1
  Q3 num_dct_parts off-by-one: confirmed correct
  Q4 field availability: 2 compile failures found (C3 + C4)
  Q5 quant_update[s] semantics: signed delta confirmed
  Q6 SHOW_FRAME unconditional: safe for BBB scope
  Q7 buffer order independence: confirmed

Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. Review pass is the right path forward per memory
rule "Reviews are never skippable" — empty-review value =
empirical-verification value, regardless of finding count.

Refs:
  phase4_iter3_plan.md (amended in-place; Phase 5 amendments
                         section appended)
  phase2_iter3_situation.md (amended C3 globally)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:27:53 +00:00
claude-noether 2918dda2e0 iter3 Phase 4: plan — 10 contract clauses, ~308-LOC patch, 3 commits
Locks the iter3 patch shape against Phase 3 verbatim cross-validator
payload + Phase 2 contract surface. 10 contract clauses cite kernel
UAPI + VAAPI + FFmpeg ref + Phase 3 byte anchors throughout.

Patch shape (mirrors iter1 ABCD pattern):

  Commit A: src/config.c — enumeration block + CreateConfig case +
            QueryConfigEntrypoints case (3 sites, +16 LOC, 1 file).
            After: vainfo lists VP8Version0_3.
  Commit B: NEW src/vp8.c (~200 LOC) + NEW src/vp8.h (~40 LOC) +
            meson.build sources/headers entries (+2). 3 files
            (2 new + 1 modified).
            After: vp8.o compiles standalone.
  Commit C: src/picture.c — codec_set_controls dispatch +
            codec_store_buffer 4 buffer-type cases + outer
            VAProbabilityDataBufferType case + BeginPicture
            per-frame reset (4 sites, +40 LOC) + src/surface.h
            params.vp8 union member (+10 LOC). 2 files modified.
            After: end-to-end VP8 decode through libva backend.

Total: ~308 LOC, 6 files (2 new + 4 modified), 3 commits.

Contract clauses summary:

  1. Submission shape — single VIDIOC_S_EXT_CTRLS, count=1, ctrl_class=
     V4L2_CTRL_CLASS_CODEC_STATELESS (0xf010000), id=0xa409c8,
     size=1232 bytes
  2. Local struct alloc + zero-init (memset clears all padding)
  3. Frame geometry + version + per-frame scalars (off-by-one
     num_dct_parts = num_of_partitions - 1)
  4. DPB timestamp resolution (3 refs: last/golden/alt; 0-sentinel
     when SURFACE() returns NULL — mirrors iter1 mpeg2.c pattern)
  5. Loop filter mapping (6 fields + 3 flag bits)
  6. Quantization base + delta derivation (segment 0 = base via
     iqmatrix[0][0]; deltas = iqmatrix[0][N+1] - iqmatrix[0][0]
     signed; per-segment quant_update[1..3] only when segmentation
     enabled)
  7. Segment fields (segment_probs direct copy; flags assembled +
     DELTA_VALUE_MODE set unconditionally per FFmpeg pattern)
  8. Entropy table mapping — 3 VAAPI sources (Picture: y_mode +
     uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3][11]
     direct memcpy; IQMatrix: quant)
  9. Coder state + first-partition fields + flags (6 mainline-
     documented bits only; bit 0x40 + EXPERIMENTAL NOT replicated
     vs ffmpeg-v4l2-request-git anomaly; first_part_header_bits=0
     fallback documented as known fidelity gap)
  10. Final batched submission via v4l2_set_controls

Phase 5 review questions queued (7 items): quantization derivation
correctness, per-segment quant_update semantics, first_part_header_
bits=0 safety, probability buffer ordering, endianness, struct size
sizeof correctness, field-availability test-compile per memory
feedback_review_empirical_over_theoretical Direction 2.

Cross-cutting backlog deferred (B1, B3, B4, B5, B6, L3 inherited;
iter3-Q1 first_part_header_bits + iter3-flags 0x40 anomaly NEW).

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)
  phase3_iter3_baseline.md (Phase 3 verbatim payload anchors)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:39:52 +00:00
claude-noether fd3fce86a6 iter3 Phase 3: baselines — VP8 cross-validator + 3-codec regression
+ SW reference

Captured on fresnel 2026-05-08 across two suspend cycles (laptop
dropped twice mid-run, captures preserved on /tmp/iter3_phase3).
All Phase 3 deliverables green.

Substrate verification:
  backend SHA256: 9e27...6258 (matches iter2 close)
  3-codec regression block: ALL 6 reference hashes match byte-for-
  byte vs iter1+iter2 (H.264 +30s, MPEG-2 +02s, HEVC +02s on rkvdec/
  hantro). Substrate has not regressed; criterion-5 anchor solid.

Cross-validator anchor (ffmpeg-v4l2request VP8 strace):
  - VIDIOC_S_EXT_CTRLS, count=1, ctrl_class=V4L2_CTRL_CLASS_CODEC_
    STATELESS, id=0xa409c8, size=1232 bytes
  - struct size CORRECTED: v4l2_ctrl_vp8_frame = 1232 bytes (NOT
    400 as one might assume; entropy.coeff_probs[4][8][3][11] alone
    is 1056 bytes)
  - keyframe (frame 1) verbatim payload captured: y_ac_qi=8,
    last/golden/alt ts all 0, flags=0x0d (KEY|SHOW|NOSKIP),
    y_mode_probs=[145,156,163,128] (matches FFmpeg keyframe const)
  - inter frame verbatim payload captured: y_ac_qi=122, all DPB
    timestamps non-zero, flags=0x66 (anomaly: bit 0x40 not in
    mainline UAPI; vendor-patched ffmpeg-v4l2-request-git;
    kernel hantro_vp8.c only inspects KEY_FRAME bit, ignores
    bit 0x40)

VP8 SW pixel-verify reference (criterion-4 anchor):
  vp8_sw_001.jpg: e43757a40e5d71ad176455c0fda14c2cbf9351b702188fc8ad
                  584d789db2c984
  vp8_sw_002.jpg: a86bf885e588257731ff6cf8d2ccc5756be550e85220eee1c3
                  e6ea8c0c78e97a
  Frame 1 != Frame 2 (real motion). These are the Phase 7 byte-
  compare HW-vs-SW targets.

Open-question resolution (5 of 6 answered empirically):

  Q1 first_part_header_bits — varies per frame (key=6550, inter
     ranges 86..254); VAAPI doesn't expose. Phase 4 fallback:
     leave 0 and check kernel behavior at Phase 7 byte-compare.
     Phase 5 review will flag as known fidelity gap.

  Q2 num_dct_parts vs VAAPI num_of_partitions — confirmed off-by-
     one: kernel = VAAPI - 1 (BBB has VAAPI=2, kernel=1).

  Q3 DPB timestamp 0-sentinel — confirmed: keyframe writes all
     three timestamps as 0; iter3 mirrors iter1 mpeg2.c pattern.

  Q4 SHOW_FRAME default — set on every captured frame (BBB has no
     alt-ref invisible). Force unconditional in libva backend.

  Q5 lf.flags FILTER_TYPE_SIMPLE — not set; BBB normal loop filter.
     Direct mapping from VAAPI filter_type=0.

  Q6 First-frame DPB sentinel — confirmed Q3; no self-reference
     fallback needed (different from iter1 mpeg2.c).

V4L2 binding cells this boot:
  rkvdec        : /dev/video3 + /dev/media1
  hantro-vpu-dec: /dev/video5 + /dev/media2

Capture artefacts on fresnel /tmp/iter3_phase3/ preserved for
Phase 7 re-run:
  vp8_strace.* (19 files, multi-thread)
  decode_vp8.py (payload decoder)
  vp8_sw_00{1,2}.jpg (criterion-4)
  {h264,mpeg2,hevc}_hw_00{1,2}.jpg (criterion-5)

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:14:46 +00:00
claude-noether 898544a29c iter3 Phase 2: situation analysis — VP8 backend gaps + contract surface
Source-read of every file the iter3 patch series will touch, plus the
kernel UAPI + VAAPI + downstream FFmpeg + kernel hantro reference
sources. Conducted on noether against fork tip 8d71e20 (iter2 Phase 6
commit B); fresnel.vpn was unreachable so Phase 3 baseline empirical
capture defers until laptop reachable.

Bug enumeration (10 sites the patch series must touch):

  B1  config.c::RequestQueryConfigProfiles    enumeration block missing
  B2  config.c::RequestCreateConfig           VP8 case label missing
  B3  config.c::RequestQueryConfigEntrypoints VP8 case missing
  B4  src/vp8.c                               new file ~160-220 LOC
  B5  src/vp8.h                               new file ~35-45 LOC
  B6  picture.c::codec_set_controls           VP8 dispatch missing
  B7  picture.c::codec_store_buffer           4 buffer-type cases +
                                              VAProbabilityDataBufferType
                                              outer case missing
  B8  picture.c::RequestBeginPicture          per-frame reset additions
  B9  surface.h::object_surface::params union vp8 member missing
  B10 meson.build                             vp8.c/vp8.h not in lists

Non-bugs (intentionally untouched):
  - context.c (no DECODE_MODE/START_CODE menus for VP8)
  - video.c (CAPTURE-side format list; VP8 is OUTPUT-side)
  - v4l2.c (fourcc-agnostic helpers)
  - buffer.c (buffer registry is type-agnostic)
  - include/hevc-ctrls.h (already includes <linux/v4l2-controls.h>
    which holds V4L2_CID_STATELESS_VP8_FRAME)

Contract surface cited verbatim:
  - V4L2_CID_STATELESS_VP8_FRAME = V4L2_CID_CODEC_STATELESS_BASE+200
    = 0x00a409c8 (matches Phase 0 V4L2 inventory)
  - struct v4l2_ctrl_vp8_frame at <linux/v4l2-controls.h>:1929-1958
    + 5 sub-structs (segment, lf, quant, entropy, coder_state) at
    1785-1888
  - VAAPI VAPictureParameterBufferVP8 + VASliceParameterBufferVP8 +
    VAProbabilityDataBufferVP8 + VAIQMatrixBufferVP8 at
    references/libva/va/va_dec_vp8.h
  - FFmpeg v4l2_request_vp8.c reference: single batched S_EXT_CTRLS
    at end_frame, count=1, no init-time menus
  - Kernel hantro_vp8.c::hantro_vp8_prob_update reads 9 fields from
    hdr (skip/intra/last/gf probs, segment_probs, entropy.{y,uv,mv,
    coeff}_probs)

VAAPI → V4L2 mapping table: 30 fields enumerated. Open questions for
Phase 3 baseline (6 items: first_part_header_bits derivation, num_
dct_parts off-by-one, DPB timestamp 0-sentinel handling, show_frame
default, lf.flags FILTER_TYPE_SIMPLE bit, first-frame DPB sentinel).

Patch-shape prediction: ~260-340 LOC across 6 modified + 2 new
files. Medium-sized iter — between iter1's 120 LOC (3 modified +
1 deleted) and iter2's 470 LOC (5 modified). The new file dominates.

Phase 3 baseline targets queued: cross-validator strace verbatim
S_EXT_CTRLS payload capture, VAAPI consumer trace, mpv-SW reference
JPEG capture for criterion 4 byte-compare anchor.

Phase 4 plan structure anticipated: 10-clause template per iter2.

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase8_iteration2_close.md (predecessor close)
  src/mpeg2.c (iter1 single-codec template; iter3 will mirror shape)
  src/h265.c (iter2 dispatcher pattern; iter3 takes structure cues)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:27:06 +00:00
claude-noether ea2413e957 iter3 Phase 0 + Phase 1 lock: VP8 on hantro-vpu-dec
Opens iter3 of the fresnel-fourier campaign immediately after iter2
close (df787a6). Targets VP8 as the fourth codec to pass boolean-
correctness on fresnel via libva-v4l2-request-fourier.

Locked research question:
  mpv --hwdec=vaapi bbb_720p10s_vp8.webm engages backend cleanly and
  DMA-BUF GL import yields HW pixels byte-identical to SW reference.

Five Phase 1 boolean criteria:
  1. vainfo enumerates VAProfileVP8Version0_3 on hantro env binding
  2. vaCreateConfig(VAProfileVP8Version0_3, VLD) = SUCCESS
  3. ffmpeg -hwaccel vaapi VP8 decode exit 0
  4. mpv --hwdec=vaapi --vo=image @ +02s seek: HW=SW byte-identical
     for 2 distinct frames; frame1 != frame2
  5. THREE-codec regression block: iter1 MPEG-2 + iter2 HEVC + T4
     H.264 reference hashes all hold

Substrate carry-forward (re-verified):
  - fork master tip post-iter2-close (cca539d + 8d71e20)
  - /usr/lib/dri/v4l2_request_drv_video.so SHA256 9e27...6258
  - linux-eos-arm 6.19.9-99-eos-arm (post linux-7 headers-only upgrade)
  - bbb_720p10s_vp8.webm fixture on fresnel ~/fourier-test/ (2.4 MB)
  - hantro-vpu-dec OUTPUT_MPLANE VP8F + vp8_frame_parameters control
  - cross-validator anchor confirmed: ffmpeg-v4l2request VP8 = exit 0

Predicted scope (smaller than iter1+iter2):
  - config.c: ADD VP8 enumeration block + RequestCreateConfig case
    + RequestQueryConfigEntrypoints case (3 sites; iter1+iter2
    only had 1-2 existing-but-broken case labels)
  - src/vp8.c NEW file (~150-250 lines vs iter2's 588 h265.c)
  - src/vp8.h NEW file
  - src/meson.build add 'vp8.c' + 'vp8.h' entries
  - picture.c codec_set_controls VP8 dispatch + codec_store_buffer
    cases for 4 VAAPI VP8 buffer types (Picture, Slice, Probability,
    IQMatrix)
  - surface.h params union extend with vp8 member
  - context.c: NO changes (VP8 has no DECODE_MODE/START_CODE menus
    on hantro per Phase 0 v4l2_inventory)

VP8 contract surface: single V4L2_CID_STATELESS_VP8_FRAME control
per frame (no batch); no slice_params dynamic-array (frame-mode);
no SCALING_MATRIX (entropy + quant carried in v4l2_ctrl_vp8_frame
sub-structs).

Phase 2 source-read targets queued: config.c enumeration pattern,
picture.c dispatch + per-buffer-type cases, surface.h params union,
VAAPI <va/va_dec_vp8.h>, kernel UAPI <linux/v4l2-controls.h>
v4l2_ctrl_vp8_frame, kernel hantro_vp8.c driver, FFmpeg
v4l2_request_vp8.c.

Memory carry-forward (all five entries apply unchanged):
  feedback_gitea_as_claude_noether
  feedback_no_session_termination_attempts
  feedback_header_deletion_check
  feedback_review_empirical_over_theoretical (BOTH directions)
  feedback_rockchip_pixel_verify_path

Refs:
  phase0_findings_iter1.md (iter1 MPEG-2 lock template)
  phase0_findings_iter2.md (iter2 HEVC lock template)
  phase8_iteration2_close.md (immediate predecessor close)
  phase0_evidence/2026-05-07/v4l2_inventory_findings.md (hantro VP8
    capability)
  phase0_evidence/2026-05-07/cross_validator_traces.md (VP8 kernel
    decode path proven)
  phase0_evidence/2026-05-07/test_fixtures.md (bbb_720p10s_vp8.webm
    provenance)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:49:28 +00:00
claude-noether df787a6cc2 iter2 Phase 8 close: 3/5 codecs passing, lesson L1 extended (BOTH directions)
Iteration 2 closes with all 5 Phase 1 boolean-correctness criteria
green. Third codec passes — campaign scoreboard 2/5 → 3/5 (H.264
in T4, MPEG-2 in iter1, HEVC in iter2). Loop terminates per
feedback_dev_process.md Phase 8.

Notable: ZERO Phase 7 → Phase 4 loopbacks needed. Phase 5 review
caught all 3 would-be loopback triggers in advance (data_byte_offset
rename, dpb.rps→index-arrays semantics, pic_order_cnt_val rename).
This is the dev-process ideal: review catches bugs before
implementation lands; verification confirms contract.

What landed:

  Code (libva-v4l2-request-fourier master 229d6d1 → 8d71e20):
    cca539d iter2 Phase 6 commit A: config.c break for HEVCMain case
    8d71e20 iter2 Phase 6 commit B: rewrite h265.c against new V4L2
            stateless HEVC API (6 files, 463 ins, 236 del)

  Both authored as Claude (noether) per feedback_gitea_as_claude_noether.md.

  Campaign docs (fresnel-fourier):
    6e8c970 iter2 Phase 0 + Phase 1 lock
    b3ba157 iter2 Phase 2 situation analysis (6 bugs)
    d35a247 iter2 Phase 3 baselines (substrate post-pacman-Syu + HEVC anchor)
    348736e iter2 Phase 4 plan (10 contract clauses)
    9eae068 iter2 Phase 5 sonnet review (3 Critical UAPI errors caught)
    05b4bd5 iter2 Phase 7 verification (5/5 GREEN)
    [this commit] iter2 Phase 8 close

Lesson L1 distilled to memory (extension of iter1 entry):

  feedback_review_empirical_over_theoretical.md updated with
  Direction 2 corollary. Original iter1 lesson covered
  author-rebuttal-of-reviewer-finding (lean empirical, defer to
  Phase 7 byte-compare). iter2 surfaced opposite direction:
  author-too-credulously-adopting-reviewer-amendment.

  Concrete iter2 instance: Phase 5 S1 suggested
  picture->pic_fields.bits.uniform_spacing_flag exists in VAAPI as
  source for V4L2_HEVC_PPS_FLAG_UNIFORM_SPACING. I adopted into
  amended plan without verifying. Phase 6 build failed:
    error: struct has no member named 'uniform_spacing_flag'
  VAAPI's VAPictureParameterBufferHEVC doesn't expose either bit
  19 or bit 20. Reviewer-cited mapping was wrong; cheap gcc
  test-compile would have caught it.

  Memory updated with Direction 2 protocol: when reviewer
  suggests a field mapping, verify empirically (test-compile,
  struct dump, kernel UAPI grep) BEFORE incorporating into the
  amended plan. Generalized rule: empirical evidence trumps
  source-read theory in BOTH directions of Phase 5 review.

Backlog items deferred (campaign-internal, not durable memory):

  B6 — VAAPI ↔ V4L2 SPS field-fidelity gaps:
    sps_max_num_reorder_pics (post-fix=0, baseline=2),
    sps_max_latency_increase_plus1 (post-fix=0, baseline=4),
    possibly PPS bit 12 ENTROPY_CODING_SYNC_ENABLED.
    VAAPI doesn't expose; FFmpeg parses from bitstream directly.
    Operational impact NIL (Phase 7 Criterion 4 byte-identical
    pixel pass). Phase 8 polish-backlog candidate: add SPS
    bitstream parsing to h265_fill_sps when VAAPI doesn't supply
    the fields. Probably low ROI — kernel HEVC handler tolerates
    0 values for the BBB fixture. Defer until a real-world consumer
    surfaces a fixture that breaks on this.

  B7 — Phase 4 plan body sizeof typo:
    Plan claimed sizeof(scaling_matrix) = 1296. Empirical = 1000
    bytes. Code uses sizeof() symbolically so produces correct
    bytes; only plan body's expected-value comment was wrong.
    Phase 7 byte-compare structural check caught it. Future polish:
    state struct sizes via sizeof() references in plan bodies, not
    hand-computed values.

iter1 carryover backlog (still deferred):

  B3 latent surface-reuse bug — picture.c:287 h264.matrix_set=false
      hits union byte 240. For HEVC: byte 240 lands in h265.picture.
      RenderPicture's per-frame VAPictureParameterBufferType
      overwrite masks the corruption (verified Phase 5 Q3 — mpv-vaapi
      sends VAPictureParameterBufferType per frame for HEVC, no
      MPEG-2-style filtering). Iter2+ Phase 4 cross-cutting candidate.

  B4 context.c H.264 device-init log noise (rkvdec accepts both
      H.264 + HEVC controls cleanly; on hantro both EINVAL with
      cosmetic log). Iter2 added a 2nd batched call for HEVC; same
      (void) swallow pattern. Cosmetic.

  B5 vbv_buffer_size 1MB vs 1.31MB (MPEG-2-only; not exercised by
      HEVC).

Phase 4 cross-cutting work items collected:
  - VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC for vaDeriveImage cache-stale
    fix (still applies to all codecs)
  - V4L2 device-discovery probe (still applies)
  - picture.c BeginPicture profile-aware reset (B3)
  - context.c H.264+HEVC device-init log suppression (B4)
  - mpeg2 vbv_buffer_size negotiation polish (B5)
  - h265 SPS bitstream-parse fidelity polish (B6)

Campaign roadmap (codec iterations remaining):
  iter3: VP8 on hantro — implement vp8.c. Smaller scope than iter2;
         predicting closer to iter1 MPEG-2 in size. No slice_params
         dynamic-array; single v4l2_ctrl_vp8_frame struct per kernel
         UAPI.
  iter4: VP9 on rkvdec — implement vp9.c. Largest control surface
         remaining.

Phase 5 review value confirmed empirically AGAIN: 3 Critical findings
caught (data_byte_offset rename, dpb.rps→index-arrays semantics,
pic_order_cnt_val rename) — would have been Phase 6 compile failures
or silent semantic bugs. Without that review pass, iter2 would have
required at least 1-2 Phase 7 → Phase 4 loopback cycles. Reviews
are never skippable per global ~/.claude/CLAUDE.md rule; iter2
exemplifies why.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:16:55 +00:00
claude-noether 05b4bd56ec iter2 Phase 7: verification — all 5 criteria GREEN, third codec PASS
Phase 7 verification of iter2 HEVC fix executed against fork tip
8d71e20 (libva-v4l2-request-fourier master = post-iter2-Commit-B).
Verbatim raw output captured to phase0_evidence/2026-05-08/
iter2_phase7/. All five Phase 1 criteria green; bonus byte-compare
confirms structural match against Baseline B with two minor field-
value divergences (informational SPS fields VAAPI doesn't expose;
non-blocking per Criterion 4 byte-identical pixel pass).

Phase 1 → Phase 7 scoreboard:

  Criterion 1 (vainfo VAProfileHEVCMain enum):                  PASS
    rkvdec bind: H.264 (5 profiles) + HEVCMain — same as Baseline.

  Criterion 2 (vaCreateConfig SUCCESS for HEVCMain):            PASS
    Pre-iter2: VA_STATUS_ERROR_UNSUPPORTED_PROFILE (12)
    Post-iter2: VA_STATUS_SUCCESS (verified verbatim libva trace)

  Criterion 3 (ffmpeg-direct HEVC engages backend, exit 0):     PASS
    5 frames decoded clean, cap_pool_init: 24 slots ready,
    no Failed-to-create lines, no S_EXT_CTRLS EINVAL.

  Criterion 4 (DMA-BUF GL HEVC HW=SW byte-identical at +02s):   PASS
    HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    Frames 1 vs 2 hash-differ (real motion).

  Criterion 5 (iter1 MPEG-2 + T4 H.264 reference hashes):       PASS
    H.264 +30s HW1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
    H.264 +30s HW2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
    MPEG-2 +02s HW1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
    MPEG-2 +02s HW2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)

Bonus byte-compare against Phase 3 Baseline B verbatim:

  count=5, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS=0xf010000:
    SPS            id=0xa40a90 size=40   (matches Baseline B)
    PPS            id=0xa40a91 size=64   (matches)
    SLICE_PARAMS   id=0xa40a92 size=280  (1 slice × sizeof(slice_params))
    SCALING_MATRIX id=0xa40a93 size=1000 (matches sizeof(scaling_matrix);
                                          Phase 4 plan typo'\''d 1296 — actual
                                          struct sums to 1000 = 96+384+384+
                                          128+6+2)
    DECODE_PARAMS  id=0xa40a94 size=328  (matches)
    All return = 0 (kernel accepts every batched call).

  SPS field-value divergences vs Baseline B (FFmpeg-v4l2request):
    sps_max_num_reorder_pics:    post-fix=0  baseline=2   DIVERGE
    sps_max_latency_increase_plus1: post-fix=0  baseline=4 DIVERGE
    All other SPS fields match (pic_width=1280, pic_height=720,
    bit_depth=0, flags=0x180=SAO|STRONG_INTRA_SMOOTHING).

  PPS flags also diverge slightly (bit 12 ENTROPY_CODING_SYNC_ENABLED:
  post-fix unset, baseline set). Other PPS fields match.

  Cause: VAAPI'\''s VAPictureParameterBufferHEVC doesn'\''t expose
  sps_max_num_reorder_pics, sps_max_latency_increase_plus1, or
  always-truthful entropy_coding_sync. FFmpeg parses these from
  bitstream directly. Operational impact NIL (Criterion 4 byte-
  identical pixel pass — kernel decoded correctly with these fields
  defaulted to 0). Phase 8 polish backlog candidate (low priority):
  add SPS bitstream parsing to extract these fields when VAAPI
  doesn'\''t supply them.

Phase 7 → Phase 8: clean transition, no loopback.

Notable Phase 7 observations for Phase 8 memory:

  1. Phase 5 review value confirmed: 3 Critical findings (C1
     data_byte_offset rename, C2 dpb.rps→index-arrays semantics,
     C3 pic_order_cnt_val rename) caught at Phase 5 — prevented
     Phase 6 compile failures + at least 1-2 Phase 7→Phase 4
     loopback cycles. Per memory feedback_review_empirical_over_
     theoretical.md: every Critical/Should-fix verified
     empirically before responding. Lesson held.

  2. One Phase 5 amendment was empirically wrong: S1 suggested
     uniform_spacing_flag exists in VAAPI; gcc test-compile rejected.
     Both PPS bits 19+20 left zero (VAAPI exposes neither).
     Documented inline. Lesson: even reviewer-cited field mappings
     warrant empirical verification.

  3. Phase 4 plan typo: claimed sizeof(scaling_matrix) = 1296;
     empirical size is 1000. Code uses sizeof() so produces correct
     bytes. Plan body amendment-by-side-channel; not blocking.

  4. VAAPI↔V4L2 field-fidelity gaps surfaced: 2 SPS fields +
     possibly 1 PPS bit not exposed by VAAPI. Operational nil;
     Phase 8 polish-backlog candidate.

  5. mpv --hwdec=vaapi engages HEVC cleanly (no MPEG-2-style
     filtering). Confirms Phase 5 Q3 — VAPictureParameterBufferType
     sent per-frame for HEVC; latent B3 bug masked same as MPEG-2.

  6. BBB HEVC fixture is 1 slice per frame (slice_params size=280
     = 1 × sizeof). Multi-slice path in iter2 is coded but
     untested by binding cell.

Campaign scoreboard: 2/5 → 3/5 codecs passing
(H.264 in T4, MPEG-2 in iter1, HEVC in iter2). iter2 advances
to Phase 8.

Refs:
  ../libva-v4l2-request-fourier@8d71e20 (the fork tip verified)
  phase4_iter2_plan.md (10 contract clauses; SCALING_MATRIX size
                        typo noted)
  phase5_iter2_review.md (3 Critical + 4 Should-fix amendments
                          all incorporated; S1 partially empirically
                          incorrect — VAAPI doesn'\''t expose
                          uniform_spacing_flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:52:10 +00:00
claude-noether 9eae068f11 iter2 Phase 5: sonnet review — 3 critical UAPI errors caught, 7 amendments
Phase 5 review run via Plan subagent with model: sonnet per
feedback_dev_process.md Phase 5 discipline. 13 findings: 3 Critical
+ 4 Should-fix + 3 Question + 3 Nit. Reviewer's bottom-line: medium
confidence (vs iter1's medium-high) — lower because the plan had
3 concrete-and-wrong claims about kernel UAPI struct fields that
would have caused compile errors or silent semantic bugs in Phase 6.

Per memory feedback_review_empirical_over_theoretical.md: every
Critical and Should-fix finding was VERIFIED against fresnel's
kernel UAPI before responding. No source-read rebuttals attempted.

Critical resolutions:

C1 (data_byte_offset, not data_bit_offset):
  Plan Clause 4 said new API "still requires bit_size + data_bit_
  offset, this logic is preserved." Empirical: struct has
  data_byte_offset (u32 byte count). FFmpeg uses straight byte
  offset, no bit search. Plan amendment: drop bit-search at
  h265.c:196-209; replace with byte-offset assignment.
  ACCEPTED.

C2 (dpb.rps GONE, pic_order_cnt_val rename, poc_st_curr_*
    arrays hold DPB indices):
  Plan Clause 6 said "DPB extraction migrates verbatim." Empirical:
    - dpb_entry has flags (only LONG_TERM_REFERENCE bit), no .rps
    - pic_order_cnt_val (singular s32) replaces pic_order_cnt[0]
    - poc_st_curr_before[16]/_after[16]/_lt_curr[16] are u8 DPB
      INDICES, not POC values; populate via FFmpeg
      get_ref_pic_index() pattern (search dpb[] by timestamp,
      return index)
  Plan amendment: replace "verbatim migration" claim with explicit
  re-spec: classify VAAPI ReferenceFrames into ST_CURR_BEFORE/
  AFTER/LT_CURR lists, assign DPB indices, populate arrays with
  indices.
  ACCEPTED.

C3 (union-aliasing reasoning wrong, claim still right):
  Same anti-pattern as iter1 review C1. Plan said reset is benign
  because RenderPicture per-buffer copies overwrite byte 17764.
  Empirical: byte 17764 lands in num_slices region; non-HEVC
  profiles never read that location. Reset is benign because
  non-aliasing, NOT because of overwriting. Wording amended.
  ACCEPTED.

Should-fix resolutions:

S1 (PPS flags 19+20 missing): empirical confirms
  V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT (1ULL<<19)
  V4L2_HEVC_PPS_FLAG_UNIFORM_SPACING (1ULL<<20)
  Plan amended to add both. ACCEPTED.

S2 (3 PPS scalars missing): empirical PPS struct dump confirms
  pic_parameter_set_id, num_ref_idx_l0_default_active_minus1,
  num_ref_idx_l1_default_active_minus1 all present in modern
  struct. Plan amended to populate. ACCEPTED.

S3 (SCALING_MATRIX content divergence FFmpeg vs libva):
  FFmpeg sends memset-zero when no scaling list in stream
  (BBB has no scaling_list — SPS flags=SAO|STRONG_INTRA only).
  Plan said "populate spec defaults when iqmatrix_set==false."
  Phase 6 implementer choice; document in commit which path
  taken. Phase 7 byte-compare validates. ACCEPTED as choice
  rather than mandate.

S4 (FFmpeg function name wrong cite):
  Plan cited ff_v4l2_request_query_control_default_value;
  actual is ff_v4l2_request_query_control. Cosmetic fix.
  ACCEPTED.

Question resolutions:

Q1 (object_heap allocator size handling): VERIFIED safe.
  request.c:142-143 uses sizeof(struct object_surface). Adding
  slices[64] auto-picks-up the larger size.

Q2 (slice_segment_addr field): VERIFIED present in struct.
  Plan amended Clause 4: populate from VAAPI
  slice->slice_segment_address. Single-slice BBB safe with
  implicit zero; multi-slice would corrupt without this field.

Q3 (VAPictureParameterBufferType per-frame send for HEVC):
  Deferred to Phase 7 LIBVA_TRACE capture. iter1+T4 patterns
  suggest yes, worth grepping at verification time.

Nits N1+N2+N3: array size [16] not [8]; image-output
  directory naming cosmetic; BeginPicture cleanup deferred.

Plan amendments consolidated:
  1. Clause 4: data_byte_offset; drop bit-search; add
     slice_segment_addr population (C1 + Q2)
  2. Clause 6: explicit DPB classification + index-array logic;
     pic_order_cnt_val rename; drop dpb.rps (C2)
  3. Clause 3: 2 PPS flags + 3 scalars (S1, S2)
  4. Clause 5: function name fix (S4); SCALING_MATRIX divergence
     deferred to Phase 6 implementer (S3)
  5. Clause 10: union-aliasing reasoning corrected (C3)
  6. Clause 6: V4L2_HEVC_DPB_ENTRIES_NUM_MAX=16 macro reference (N1)
  7. Phase 7 harness: rename png_* → image_* dirs (N2)

Plan re-locks with these amendments. Phase 6 proceeds.

Per global ~/.claude/CLAUDE.md rule: Phase 5 reviews never
skippable. iter2's review was the right path forward — caught
3 concrete UAPI errors (data_bit_offset → data_byte_offset rename;
dpb.rps field gone; pic_order_cnt struct shape) that would have
been Phase 6 compile failures or silent Phase 7 byte-compare
divergences requiring loopback. Outside-look value substantial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:51:33 +00:00
claude-noether 348736eb63 iter2 Phase 4: plan — 10 contract clauses, ~400-line h265.c rewrite
Phase 4 plan for iter2 HEVC fix. Structured per the
feedback_dev_process.md Phase 6 contract-before-code worked example
(0012-h264-omit-scaling-matrix-frame-based.patch shape): contract
clauses with citations first, then code changes mapping 1:1 to
clauses.

10 contract clauses cited from authoritative sources:

  Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS, count=5
    Authority: linux/v4l2-controls.h:2090-2300 (8 HEVC stateless CIDs)
    Reference impl: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
                    (v4l2_request_hevc_queue_decode)
    Empirical anchor: Phase 3 Baseline B verbatim payload

  Clause 2 — v4l2_ctrl_hevc_sps layout (40 bytes)
    Authority: linux/v4l2-controls.h:2096+ (struct + 9 SPS_FLAG_* bits)
    Field-by-field VAAPI source mapping table; existing
    h265_fill_sps logic preserved, just routed to flags bitmask
    Phase 3 Baseline B BBB SPS bytes: flags=SAO|STRONG_INTRA_SMOOTHING

  Clause 3 — v4l2_ctrl_hevc_pps layout (64 bytes, 19 flags)
    Authority: linux/v4l2-controls.h:2126-2150
    Field source: VAPictureParameterBufferHEVC + slice (for
                  dependent_slice_segment_flag)

  Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array)
    Authority: kernel exposes 0xa40a92 elems=1 dims=[600] dynamic-array
    Submission shape: size = sizeof(slice_params) * num_slices_in_frame
    Reference impl: FFmpeg v4l2_request_hevc.c:540-547
    BEHAVIORAL CHANGE: per-slice accumulation in codec_store_buffer
                      (replace overwrite with append-to-array)
    DPB MOVES OUT of slice_params to DECODE_PARAMS (Clause 6)

  Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional)
    Conditional on kernel availability (probed via VIDIOC_QUERY_EXT_CTRL
    at init), NOT on bitstream flag (Phase 3 baseline corrects Phase 2
    assumption)
    Spec defaults from ISO/IEC 23008-2 Table 4-1 when iqmatrix_set==false
    PROTOCOL: transcribe defaults from Phase 3 Baseline B verbatim
              SCALING_MATRIX bytes, NOT from spec recall (per
              memory feedback_review_empirical_over_theoretical.md)

  Clause 6 — v4l2_ctrl_hevc_decode_params layout (328 bytes)
    NEW in modern API (didn't exist in staging-era)
    Contains: DPB array (16 entries), POC, num_active_dpb_entries,
              num_poc_st_curr_before/after, num_poc_lt_curr,
              poc_st_curr_before[8], etc.
    Source: existing h265_fill_slice_params lines 269-315 logic
            preserved, routed to new struct

  Clause 7 — Device-wide DECODE_MODE + START_CODE menus
    Set once at init via v4l2_set_controls(...request_fd=-1, 2 ctrls)
    rkvdec accepts: FRAME_BASED + ANNEX_B (only options per kernel menu
                    constraints, Phase 0 v4l2_inventory)
    Default location: extend src/context.c:142-155 device-init block

  Clause 8 — config.c HEVCMain case must break;
    Authority: C semantics; iter1 Bug 1 pattern verbatim
    Empirical anchor: Phase 3 Baseline D scratch confirmed

  Clause 9 — picture.c::codec_set_controls HEVCMain dispatch
    Authority: existing MPEG-2 dispatch pattern at picture.c:186-191
    Replace explicit Fourier-local: HEVC stripped reject with
    h265_set_controls call

  Clause 10 — Per-slice accumulation in codec_store_buffer
    HEVC slice_params dynamic-array source = per-RenderPicture appends
    BeginPicture resets num_slices=0; codec_store_buffer appends each
    VASliceParameterBufferType to slices[N] array

Diff scope (8 files):
  src/config.c     — 5-line break addition (Clause 8)
  src/picture.c    — HEVCMain dispatch (Clause 9) + per-slice
                     accumulation (Clause 10) + BeginPicture
                     num_slices reset, ~25 lines
  src/surface.h    — extend params.h265 with slices[64] +
                     num_slices, ~17 KB extra per surface union
  src/h265.c       — full rewrite ~400 lines (Clauses 2-7)
  src/h265.h       — re-enable
  src/meson.build  — uncomment h265.c + h265.h
  src/context.c    — extend device-init for HEVC DECODE_MODE +
                     START_CODE
  include/hevc-ctrls.h — leave as-is (9-line shim, lower-risk path
                          per iter1 Phase 5 Nit 6 deferral)

Phase 6 implementation order (2 logical commits + optional fix-forward):
  A: src/config.c HEVCMain break only (substrate fix in isolation;
     Phase 3 Baseline D already verified collateral safe)
  B: h265.c rewrite + picture.c dispatch + slice_params accumulation +
     meson re-enable + surface.h extension + context.c device-init
  C: optional fix-forward if Phase 7 surfaces a regression

Phase 7 verification harness (full Bash incantations in plan body):
  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec
  Criterion 2: vaCreateConfig(VAProfileHEVCMain) = SUCCESS via libva trace
  Criterion 3: ffmpeg -hwaccel vaapi exit 0, no Failed-to-create
  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s; HW=SW byte-identical
              (DMA-BUF GL cache-coherency-safe path per memory
              feedback_rockchip_pixel_verify_path.md)
  Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
  Bonus: byte-compare post-fix S_EXT_CTRLS payload vs Baseline B

Pre-identified Phase 7 → Phase 4 loopback triggers:
  1. S_EXT_CTRLS EINVAL post-fix → check struct sizes (pahole),
     reserved zeroing, SCALING_MATRIX size encoding
  2. HW pixel hash mismatch → DPB ordering, slice_params bit_offset,
     SPS/PPS flags bit positions, SCALING_MATRIX values
  3. mpv --hwdec=vaapi filters HEVC out → fall-forward to ffmpeg
     -vf hwdownload (less likely; vaapi engaged MPEG-2 in iter1)
  4. iter1/T4 regression → verify diffs scoped right
  5. Slice_params dynamic-array submission shape rejected → cross-
     validator size encoding anchor
  6. SCALING_MATRIX availability detection wrong → defensive
     QUERY_EXT_CTRL probe in h265_init_device_controls
  7. Latent bug B3 hits HEVC differently than MPEG-2 → byte 240 in
     h265.picture; ffmpeg-vaapi sends VAPictureParameterBufferType
     per frame so masking holds

Out-of-scope (LOCKED): VP9/VP8; HEVC Main 10 / Main Still Picture /
range ext / tile-wavefront; perf metrics; long-duration stress;
SLICE_BASED decode mode (rkvdec FRAME_BASED only); Phase 4 cross-
cutting backlog (B1 device-discovery, B3 BeginPicture profile-aware,
B4 context.c log suppression, B5 vbv_buffer_size, L3 vaDeriveImage
cache-stale); chromium-fourier 149 install; upstream engagement;
hevc-ctrls.h deletion (Phase 5 Nit 6 lower-risk path continues).

Predicted Phase 8 close: 4-6 commits on the fork (vs iter1's 4).
Iter2 ~3x larger codebase delta than iter1 (mpeg2.c rewrite was
~120 lines; h265.c rewrite is ~400 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:56:51 +00:00
claude-noether d35a247948 iter2 Phase 3: baselines — substrate verified post-upgrade, HEVC anchor captured
Phase 3 baselines for iter2 HEVC. Substrate-update verification
ran first (post pacman -Syu rolling upgrade), then iter2-specific
HEVC cross-validator anchor + Bug 1 scratch.

Pre-Phase-3 substrate event: pacman -Syu landed 71 packages.
The "scheduled for linux-7" upgrade was headers-only —
linux-eos-arm-headers 6.19.9-99 → 7.0.3-1, but linux-eos-arm
kernel binary stayed at 6.19.9-99 (EOS-ARM repo hasn't
published the matching 7.x kernel yet). Userland refreshed:
qt6-base epoch bump, libdrm 2.4.131 → 2.4.133, chromium
147 → 148, KDE 26.04.1 batch, mkinitcpio 41-3, etc. OC DTB
intact (sha256 unchanged). mfritsche Plasma session active
throughout, no SDDM regression on this kernel boot.
eos-reboot-recommended marker installed; reboot deferred.

Baseline A (substrate validation post-upgrade):

  T4 H.264 +30s and iter1 MPEG-2 +02s reference hashes all
  8 match exactly:
    H.264 HW1=SW1=f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9
    H.264 HW2=SW2=7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8
    MPEG-2 HW1=SW1=6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    MPEG-2 HW2=SW2=ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
  Userland upgrade did not regress kernel-side decode or
  DMA-BUF GL readback.

Baseline B (HEVC cross-validator verbatim contract anchor):

  ffmpeg -hwaccel v4l2request decoded bbb_720p10s_hevc.mp4
  -frames:v 5 cleanly. Per-frame submission shape:

    VIDIOC_S_EXT_CTRLS, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS,
                        count=5
      0xa40a90 SPS            size=40
      0xa40a91 PPS            size=64
      0xa40a92 SLICE_PARAMS   size=N (dynamic-array)
      0xa40a93 SCALING_MATRIX size=M
      0xa40a94 DECODE_PARAMS  size=328
    Plus init device-wide:
      0xa40a95 DECODE_MODE    (menu, set once)
      0xa40a96 START_CODE     (menu, set once)

  Key Phase 2 amendments from Phase 3 evidence:
    - Per-frame batch is 5 controls (not "up to 6" — BBB
      doesn't trigger ENTRY_POINT_OFFSETS / EXT_SPS_*).
    - SCALING_MATRIX is sent unconditionally for BBB. FFmpeg
      gates on ctx->has_scaling_matrix from kernel
      VIDIOC_QUERY_EXT_CTRL at init, NOT on per-frame
      bitstream flags. Phase 4 plan amends: query kernel for
      SCALING_MATRIX availability at init, submit if available.

  SPS payload field-decoded (40 bytes verbatim from BBB
  fixture): 1280x720, 8-bit, 4:2:0, no PCM, flags = SAO |
  STRONG_INTRA_SMOOTHING. PPS + DECODE_PARAMS + SLICE_PARAMS +
  SCALING_MATRIX payloads captured for Phase 4 transcription.

Baseline C (slice-count probe): deferred. ffprobe confirms
1 video stream HEVC Main 1280x720 24fps 10s. Per-frame
slice-count not directly extracted; assume 1 slice/frame for
x265 ultrafast preset until Phase 6 verifies. Kernel
advertises slice_params dynamic-array max 600 entries
(phase0 v4l2_inventory), so multi-slice frames are supported
by the contract.

Baseline D (Bug 1 scratch test, collateral safety):

  Applied Bug 1 (config.c break for HEVCMain) on throwaway
  branch; h265.c stayed disabled. Built + installed.
    H.264 HW frames @ +30s: f623d5f7..., 7d7bc6f2... (match T4)
    MPEG-2 HW frames @ +02s: 6e7873030dbf..., ccc7ce08810d...
                              (match iter1)
  Bug 1 in isolation does not regress H.264 or MPEG-2.

  HEVC behavior with Bug 1 only:
    libva trace: vaCreateConfig SUCCESS for VAProfileHEVCMain
    ffmpeg: Task finished with error code: -5 (Input/output error)
  Decode fails downstream because picture.c:204-206 still has
  the explicit case VAProfileHEVCMain: return UNSUPPORTED_PROFILE
  reject (Bug 2). Confirms Phase 2 prediction; Bug 2 fix
  requires h265_set_controls to exist (Bug 3-6: enable +
  rewrite). Bug 2 lands together with the h265.c rewrite in
  Commit B (analogous to iter1 Commit B).

  Scratch state cleaned: git checkout + rebuild + reinstall
  master backend. H.264 + MPEG-2 still pass. Back to Baseline-A-
  equivalent state.

Phase 4 plan inputs updated:
  - Per-frame batch: 5 controls (not "up to 6")
  - SCALING_MATRIX: unconditional iff kernel advertises (init
    QUERY_EXT_CTRL probe), not bitstream-conditional
  - SLICE_PARAMS: dynamic-array (max 600 elems per kernel UAPI)
  - DECODE_MODE + START_CODE: 2 device-wide menus at init
  - Phase 7 harness anchors on mpv-vaapi-vo=image (DMA-BUF GL
    cache-coherency-safe path per
    feedback_rockchip_pixel_verify_path.md)
  - Phase 7 bonus: byte-compare post-fix S_EXT_CTRLS payload
    against Baseline B (per feedback_review_empirical_over_
    theoretical.md — empirical wins)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:13:38 +00:00
claude-noether b3ba157cb4 iter2 Phase 2: situation analysis — six bugs in HEVC path
Phase 2 source-read of the HEVC path post-iter1-close (fork master
229d6d1). Six bugs identified, all in libva backend; kernel + driver
path proven for HEVC in Phase 0 cross-validator sweep.

Substrate timing caveat: Phase 2 conducted against fresnel kernel
6.19.9-99. Operator-scheduled rolling pacman -Syyuu to linux-7
imminent. Phase 2 source-read findings are kernel-agnostic (fork
code + UAPI + FFmpeg reference); they carry forward across the
kernel jump unchanged. Phase 3 baselines will run on linux-7.

Bug 1 — src/config.c:64-69 HEVCMain falls through to default,
returns VA_STATUS_ERROR_UNSUPPORTED_PROFILE. Verbatim match for
iter1 Bug 1 pattern; fix is 3-line break addition.

Bug 2 — src/picture.c:204-206 explicit
case VAProfileHEVCMain: return UNSUPPORTED_PROFILE
with stale comment "Fourier-local: HEVC stripped, no HW support
on RK3566." (RK3566 is ohm context; fresnel is RK3399 where
rkvdec DOES support HEVC.) Fix: replace explicit reject with
dispatch to h265_set_controls() (mirrors MPEG-2 dispatch at
picture.c:186-191).

Bug 3 — src/h265.c uses staging-era CIDs:
  V4L2_CID_MPEG_VIDEO_HEVC_PPS / _SPS / _SLICE_PARAMS
These don't exist on fresnel's 6.19 kernel headers (verified via
test-compile: gcc reports undeclared identifiers, suggests
V4L2_CID_MPEG_VIDEO_DEC_PTS as nearest match). Mainline kernel
UAPI splits HEVC stateless into 7 controls:
  V4L2_CID_STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,
                            DECODE_PARAMS,DECODE_MODE,START_CODE}
  + ENTRY_POINT_OFFSETS, EXT_SPS_ST_RPS, EXT_SPS_LT_RPS
(0xa40a90..0xa40a96 + extensions, V4L2_CID_CODEC_STATELESS_BASE
+ 400..407+).

Fix shape: rewrite h265.c against new split API. Substantially
larger than iter1's mpeg2.c rewrite (HEVC has 7 controls vs MPEG-2
3, + slice_params dynamic-array, + per-slice accumulation logic
needed).

Bug 4 — h265.c uses single-slice_params shape; new API is
dynamic-array. Fresnel rkvdec advertises:
  hevc_slice_parameters 0xa40a92 elems=1 dims=[600] dynamic-array
Up to 600 slice_params entries per submission. Current
codec_store_buffer:115-135 OVERWRITES previous slice on
VASliceParameterBufferType arrival. Multi-slice frames need
APPEND-not-overwrite. FFmpeg reference v4l2_request_hevc.c:540-547
shows the pattern.

Fix shape: extend params.h265 to hold slice_params array (or
pointer+count); codec_store_buffer appends; h265_set_controls
flushes the array at end_picture as a single dynamic-array
S_EXT_CTRLS entry.

Bug 5 — h265.c missing controls: doesn't submit DECODE_PARAMS
(per-frame DPB info; new in modern API), SCALING_MATRIX (conditional
on iqmatrix_set + sps.scaling_list_enabled), DECODE_MODE+START_CODE
(device-wide menus, set once per context init).

Fix shape: add h265_fill_decode_params() (DPB ordering from VAAPI
ReferenceFrames[15] — preserve current extraction logic from
h265_fill_slice_params:269-315, route to new struct). Conditional
SCALING_MATRIX from VAIQMatrixBufferHEVC. Device-wide
DECODE_MODE+START_CODE either at first h265_set_controls call or
in extended context.c device-init block.

Bug 6 — src/meson.build comments out 'h265.c' (line 50) and
'h265.h' (line 73). Fix: uncomment both. Trivial.

Bug 7 (verify only) — include/hevc-ctrls.h is a 9-line shim that
just #include <linux/v4l2-controls.h>. Comment dates the
modernization to "linux-media 6.6+". Adds zero value; harmless.
Leave in place per iter1 Phase 5 Nit 6 lower-risk path.

Bug 8 (latent) — picture.c:287 params.h264.matrix_set=false
writes union byte 240. For HEVC: byte 240 lands inside
h265.picture (range [0..604), size 604) — different field than
MPEG-2's chroma_intra_quantiser_matrix. ffmpeg-vaapi's
per-frame VAPictureParameterBufferHEVC re-send overwrites the
corrupted byte before h265_set_controls reads. Latent for
clients that reuse a surface without re-sending picture params.
iter2+ Phase 4 cross-cutting backlog candidate; not iter2 scope.

Things verified NOT bugs:
  - h265_fill_pps/sps/slice_params field extraction from VAAPI
    structs is sound (just routes to wrong destination structs)
  - NAL header parsing (data_bit_offset bit-search) is preserved
    in new API — slice_params still has bit_size + data_bit_offset
  - v4l2_set_controls batching API in place (used by H.264 + iter1
    MPEG-2; iter2 uses same)

Substrate / kernel observation:
  - Linux mainline 7.1.0-rc2 reference checkout has
    drivers/staging/media/rkvdec/ with rkvdec.c, rkvdec-h264.c,
    rkvdec-vp9.c — NO rkvdec_hevc.c. fresnel's HEVC support is
    out-of-tree (Christian Hewitt patches per phase0_findings.md
    external references). May land in stable 7.x.
  - Phase 4 contract-before-code therefore can't cite kernel-side
    HEVC handler source until/unless rkvdec_hevc.c lands in
    mainline. UAPI doc + FFmpeg reference + Phase 3 cross-validator
    bytes are the contract anchor.

Open questions tabled for Phase 3 (post-linux-7-upgrade):
  1. iter1 + T4 references on linux-7 (regression check of closed
     iter1 work)
  2. SDDM watchpoint on linux-7
  3. Cross-validator HEVC re-anchor (Baseline C equivalent for
     HEVC) — verbatim payload bytes for SPS, PPS, DECODE_PARAMS,
     SLICE_PARAMS array, SCALING_MATRIX
  4. Pre-fix scratch test (Bug 1 + Bug 2 only, h265.c kept
     commented out) — confirm collateral safe
  5. Slice-count for bbb_720p10s_hevc.mp4 fixture
  6. Whether linux-7 brings rkvdec_hevc.c into mainline

Predicted iter2 close shape: trivial Bugs 1+2+6 fixes + sizable
h265.c rewrite (~250-400 lines, ~3x iter1's mpeg2.c) + new
codec_store_buffer slice accumulation logic. If Phase 7 fails:
likely struct-size mismatch (run pahole), DPB ordering, or
slice_params array size encoding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:28:08 +00:00
claude-noether 6e8c970c1d iter2 Phase 0 + Phase 1 lock: HEVC Main on rkvdec
Iteration 2 of the campaign 8(+1)-phase loop opens following iter1
close (dc69378). Per phase0_evidence/2026-05-07/cross_validator_
traces.md suggested ordering, iter2 attacks HEVC Main on rkvdec —
the kernel + driver path is verified working (cross-validator sweep
exit 0); broken link is the libva backend at five distinct sites:

  src/config.c HEVCMain case fall-through (analogous to iter1 Bug 1)
  src/picture.c HEVCMain explicit UNSUPPORTED_PROFILE reject (NEW)
  src/h265.c uncompiled in build (presumably staging-era CIDs;
              Phase 2 source-read decides scope of rewrite)
  include/hevc-ctrls.h staging-era local header (deferred from
              iter1 Phase 5 Nit 6; iter2 closes the loop)
  src/meson.build h265.c commented out (re-enable)

Plus possible novel issues vs iter1's MPEG-2 work:
  - HEVC has 10 stateless control IDs vs MPEG-2's 3 (much larger
    rewrite if h265.c uses staging-era API)
  - HEVC slice_params is dynamic-array (kernel rkvdec accepts up
    to 600 entries) — different submission shape vs MPEG-2 single-
    struct or H.264 fixed-shape
  - HEVC SCALING_MATRIX is conditional (only when scaling_list_
    enabled in SPS); mapping VAIQMatrixBufferHEVC to V4L2 control
  - HEVC ENTRY_POINT_OFFSETS is in kernel surface (tile/slice
    resync) but campaign fixture doesn't use tiles — defer

Locked research question:

  Make HEVC Main the third codec to pass boolean-correctness on
  fresnel via libva-v4l2-request-fourier — mpv --hwdec=vaapi
  bbb_720p10s_hevc.mp4 engages backend cleanly and DMA-BUF GL
  import yields HW pixels byte-identical to SW reference for the
  same frames.

Phase 1 success criterion (5 boolean checks, all must pass):

  1. vainfo enumerates VAProfileHEVCMain on rkvdec env binding
     (regression check; already passes today).
  2. vaCreateConfig(VAProfileHEVCMain, VLD) returns VA_STATUS_
     SUCCESS. (Pre-iter2: VA_STATUS_ERROR_UNSUPPORTED_PROFILE.)
  3. ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 5
     -f null - exits 0 cleanly with no Failed-to-create-decode-
     configuration lines and no S_EXT_CTRLS EINVAL on HEVC
     controls. (Phase 1 criterion 3 anchored on ffmpeg-direct,
     mirroring iter1 Phase 5 Q4 amendment for codecs mpv may
     filter out.)
  4. mpv --hwdec=vaapi --vo=image at +02s seek: 2 distinct frames
     hash-equal to SW reference, hash-differ from each other (real
     motion). DMA-BUF GL import path per memory feedback_rockchip_
     pixel_verify_path.md (NOT ffmpeg-vaapi+hwdownload, which is
     cache-stale on RK3399 for both H.264 and MPEG-2 per iter1
     Phase 6/7 findings).
  5. iter1 MPEG-2 + T4 H.264 reference hashes BOTH still match
     (regression check on prior-iteration cells):
       MPEG-2 +02s: HW1=6e7873030dbf...   HW2=ccc7ce08810d...
       H.264 +30s:  HW1=f623d5f7a416...   HW2=7d7bc6f2146d...

Substrate carry-over:

  - libva-v4l2-request-fourier master tip post-iter1-close
    (commits e7dad7a..229d6d1 stack on iter8 65969da).
  - bbb_720p10s_hevc.mp4 fixture (620 KB, HEVC Main, 1280x720,
    24fps, 10s, yuv420p; provenance phase0_evidence/2026-05-07/
    test_fixtures.md).
  - Cross-validator anchor: phase0_evidence/2026-05-07/
    cross_validator/hevc/ — 14 S_EXT_CTRLS + 5 QUERY_EXT_CTRL
    (HEVC slice_params dynamic-array introspection unique among
    the 5 codecs) + 4 REQUEST_ALLOC.
  - Memory carries forward: feedback_gitea_as_claude_noether,
    feedback_no_session_termination_attempts, feedback_header_
    deletion_check (iter1 lesson L1 — apply to hevc-ctrls.h
    deletion), feedback_review_empirical_over_theoretical
    (iter1 lesson L2 — apply to Phase 5 review responses),
    feedback_rockchip_pixel_verify_path (iter1 lesson L3 —
    DMA-BUF GL is the verifier, NOT cached-mmap).

Out-of-scope (LOCKED): VP9/VP8 (later iterations); HEVC Main 10
(silicon support unverified); HEVC Main Still Picture; performance
metrics; long-duration HEVC stress; tile / wavefront parallel
processing (ENTRY_POINT_OFFSETS); Phase 4 cross-cutting backlog
(B1 device-discovery, B3 BeginPicture profile-aware reset, B4
context.c log suppression, B5 vbv_buffer_size negotiation, L3
vaDeriveImage cache-stale fix); chromium-fourier 149 install;
src/context.c changes; upstream engagement.

Predecessor open questions:
  - iter1 B3 latent surface-reuse bug (picture.c:287
    h264.matrix_set=false hits union byte 240) — for HEVC, the
    union member is params.h265.{picture,slice,iqmatrix,
    iqmatrix_set}. params.h265 layout differs from params.mpeg2.
    Phase 2 source-read action item: verify whether byte 240 lands
    in a meaningful HEVC field. If so, iter2 may need to address
    even though MPEG-2 didn't.

Phase 2 source-read targets (queued for next phase):
  - src/h265.c (~267 lines) — current state, target API
  - src/picture.c:204-206 (the explicit HEVC reject)
  - src/config.c:55-69 (confirm HEVCMain fall-through)
  - src/surface.h:103-108 (params.h265 struct)
  - include/hevc-ctrls.h (staging-era; identify CID/struct refs)
  - src/meson.build (commented-out h265.c)
  - linux/v4l2-controls.h:2110+ (modern HEVC stateless UAPI)
  - drivers/staging/media/rkvdec/rkvdec_hevc.c (rkvdec contract)
  - libavcodec/v4l2_request_hevc.c (FFmpeg reference impl)
  - va/va_dec_hevc.h (VAAPI HEVC buffer structs)

Predicted iter2 close shape: similar pattern to iter1 (config
break + h265.c new-API rewrite + header delete + meson re-enable
+ picture.c reject removal). Larger code change than iter1
(predicting 250-400 lines for h265.c rewrite vs iter1's ~120 lines
for mpeg2.c). One novel construct (slice_params dynamic-array)
worth Phase 4 contract-clause-level attention. Expect Phase 6
takes longer than iter1; Phase 7 harness re-uses iter1's pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:44:30 +00:00
claude-noether dc6937868a iter1 Phase 8 close: 2/5 codecs passing, 3 lessons distilled to memory
Iteration 1 closes with all five Phase 1 boolean-correctness criteria
green. Second codec passes — campaign scoreboard 1/5 → 2/5 (H.264
in T4, MPEG-2 in iter1). Loop terminates per
feedback_dev_process.md Phase 8.

What landed:

  Code (libva-v4l2-request-fourier master 65969da..229d6d1):
    e7dad7a iter1 Phase 6 commit A: config.c break for MPEG-2 cases
    5fe873c iter1 Phase 6 commit B: rewrite mpeg2.c against new V4L2 stateless API
    3aab187 iter1 Phase 6 commit C: delete staging-era include/mpeg2-ctrls.h
    229d6d1 iter1 Phase 6 commit D: drop missed mpeg2-ctrls.h include from context.c (fix-forward)

  All four authored as Claude (noether) per feedback_gitea_as_claude_noether.md.

  Campaign docs (fresnel-fourier):
    f720c77 iter1 Phase 0 + Phase 1 lock
    cc55a6e iter1 Phase 2 situation analysis (3 bugs)
    b9625af iter1 Phase 3 baseline measurements (4 baselines)
    3e996d0 iter1 Phase 4 plan (6 contract clauses)
    0e2e1c2 iter1 Phase 5 sonnet review (6 findings, 4 amendments)
    ec9133a iter1 Phase 7 verification (5/5 GREEN)
    [this commit] iter1 Phase 8 close

Three lessons distilled to durable memory:

  L1 — feedback_header_deletion_check.md
    Phase 2 grep audit found 2 of 3 include sites for the
    staging-era mpeg2-ctrls.h header; build broke on Commit C
    delete because of context.c:42. Rule: when removing a header,
    let the compiler enumerate includes authoritatively (clean
    rebuild after include-removal patches, before git rm). Grep
    is a hint; the compiler is the authority.

  L2 — feedback_review_empirical_over_theoretical.md
    Phase 5 reviewer S2 flagged a numerical mismatch in
    vbv_buffer_size between Baseline C (1.31 MB) and predicted
    post-fix (1 MB). I rejected with confident source-read
    reasoning (slot->size = sizeimage = matches). Phase 7
    byte-compare empirically showed the reviewer was right —
    post-fix produces 1 MB, not 1.31 MB. Operational impact nil
    (kernel ignores the field), but my Phase 5 rebuttal had a
    source-read gap. Rule: when a reviewer cites a concrete
    numerical discrepancy, defer to Phase 7 byte-compare; don't
    reject on source-read alone.

  L3 — feedback_rockchip_pixel_verify_path.md
    Iter1 + T4 (H.264) both empirically confirm: libva backend's
    vaDeriveImage / cached-mmap readback returns all-zero NV12
    on RK3399 — same iter1 patch-0011 cache-coherency bug class
    observed on RK3568. Pixel verification must use DMA-BUF GL
    import (mpv --vo=image, ffmpeg-v4l2request DRM_PRIME +
    hwdownload). NOT ffmpeg-vaapi+hwdownload (cache-stale on
    Rockchip). Codec-agnostic; applies to all 5 codecs in
    campaign scope.

Backlog items deferred (campaign-internal, not durable memory):

  B1: V4L2 /dev/videoN numbering shuffles across reboots
      (rkvdec moved video3+media1 → video1+media0 between
       Phase 0/3 and Phase 7). Backend should probe /dev/media*
      for driver match. Iter2+ Phase 4 cross-cutting candidate.

  B2: mpv --hwdec=vaapi-copy silently filters MPEG-2 out before
      libva is loaded. mpv --hwdec=vaapi (DMA-BUF) DOES engage.
      Phase 1 criterion 3 ended up anchored on ffmpeg-direct.
      Mpv-side investigation as separate follow-up.

  B3: Latent surface-reuse bug — picture.c:287 h264.matrix_set
      reset writes byte 240 of params union, lands inside
      mpeg2.iqmatrix.chroma_intra_quantiser_matrix[20] (verified
      via offsetof on fresnel via gcc + libva). Per-frame
      RenderPicture overwrites this byte for ffmpeg-vaapi flows
      that send VAIQMatrixBufferType every frame. Latent for
      VAAPI clients that reuse a surface without re-sending
      IQMatrix. Iter2+ candidate.

  B4: src/context.c:142-155 H.264 device-init runs unconditionally
      on every CreateContext, EINVALs on hantro. Intentional
      best-effort but request_log fires "Unable to set
      control(s)" cosmetically. Suppress-log candidate, low
      priority.

  B5: vbv_buffer_size = SOURCE_SIZE_MAX (1 MB) rather than
      negotiated sizeimage. Kernel ignores. Polish candidate.

Phase 4 cross-cutting work items collected:
  - Add VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC to libva backend
    image-export (fixes L3's vaDeriveImage cache-stale bug for
    all codecs).
  - V4L2 device-discovery probe (fixes B1).
  - Picture.c BeginPicture profile-aware reset (fixes B3).
  - Context.c H.264 device-init log suppression (fixes B4).

Campaign roadmap (codec iterations remaining):
  iter2: HEVC on rkvdec — re-enable h265.c in build, audit against
         rkvdec kernel HEVC contract.
  iter3: VP8 on hantro — implement vp8.c.
  iter4: VP9 on rkvdec — implement vp9.c (largest control surface).

Phase 5 review S2 historical-record correction: Phase 5 reviewer
was numerically right about vbv_buffer_size. My Phase 5 rebuttal
in phase5_iter1_review.md was empirically wrong. Acknowledged in
phase7_iter1_verification.md and phase8_iteration1_close.md;
Phase 5 doc preserved as-is for the historical record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:37:19 +00:00
claude-noether ec9133a5e4 iter1 Phase 7: verification — all 5 criteria GREEN, second codec PASS
Phase 7 verification of iter1 MPEG-2 fix executed against fork tip
229d6d1 (libva-v4l2-request-fourier master = post-Commit-D).
Verbatim raw output captured to phase0_evidence/2026-05-08/
iter1_phase7/. All five Phase 1 criteria green; bonus byte-compare
confirms structural match against Baseline C with one numerical
divergence (vbv_buffer_size, kernel-ignored, non-blocking).

Phase 1 → Phase 7 scoreboard:

  Criterion 1 (vainfo MPEG-2 Simple+Main enum):           PASS
  Criterion 2 (vaCreateConfig SUCCESS for MPEG2Main):     PASS
    Pre-iter1: VA_STATUS_ERROR_UNSUPPORTED_PROFILE (12)
    Post-iter1: VA_STATUS_SUCCESS (verified verbatim libva trace)
  Criterion 3 (ffmpeg-hwaccel-vaapi engages backend):     PASS
    5 frames decoded, exit 0, no Failed-to-create lines,
    no S_EXT_CTRLS EINVAL on the MPEG-2 path
  Criterion 4 (DMA-BUF GL HW=SW byte-identical at +02s):  PASS
    HW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    SW frame 1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    HW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
    SW frame 2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
    Frames 1 vs 2 differ in size (real motion).
  Criterion 5 (T4 H.264 reference hashes match):          PASS
    HW + SW frames at +30s into bbb_1080p30_h264.mp4 match
    f623d5f7... and 7d7bc6f2... exactly. No H.264 regression.

Bonus byte-compare against Phase 3 Baseline C verbatim:

  count=3, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS=0xf010000:
    SEQUENCE     id=0xa409dc size=12   (matches)
    PICTURE      id=0xa409dd size=32   (matches structurally)
    QUANTISATION id=0xa409de size=256  (intra matrix bytes
                                        IDENTICAL to Baseline C
                                        verbatim 64 bytes;
                                        non_intra all 16's)
    All return = 0 (kernel accepts every batched call).

  One numerical divergence: sequence.vbv_buffer_size
    post-fix:    0x100000 = 1 048 576 (= SOURCE_SIZE_MAX)
    Baseline C:  0x151800 = 1 376 256 (= negotiated sizeimage)
    Kernel ignores per v4l2-controls.h:2003 (informational).
    Decode is bit-exact correct regardless. Phase 5 reviewer S2
    was numerically prescient; my Phase 5 response (rejected with
    "slot->size = sizeimage") was wrong empirically; operational
    impact nil. Tracked as low-priority post-iter1 polish.

Phase 7 → Phase 8: clean transition, no loopback to Phase 4.

Notable observations for Phase 8 memory update:

  1. V4L2 /dev/videoN numbering shuffles across reboots on RK3399.
     Phase 0/3 had rkvdec=video3+media1, hantro=video5+media2; this
     boot has rkvdec=video1+media0, hantro=video3+media1. Phase 1
     binding cells using fixed paths fragile across reboots. Phase
     4 cross-cutting fix candidate: backend probes /dev/media* for
     driver=hantro-vpu/rkvdec rather than env-var stability.

  2. iter1 patch-0011 cache-stale bug class also affects MPEG-2
     (verified empirically; same as H.264 in T4). vaDeriveImage
     readback returns all-zero NV12 via ffmpeg-vaapi+hwdownload.
     Workaround: DMA-BUF GL import (mpv --vo=image) is cache-
     coherency-safe. Phase 4 cross-cutting fix candidate: add
     VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC support to libva backend
     image-export path.

  3. src/context.c:142-155 H.264 device-init logs noisy EINVAL on
     hantro every CreateContext (return value cast to (void) but
     v4l2.c:484 still calls request_log). Cosmetic suppression
     candidate; low priority.

  4. Phase 6 commit D (fix-forward for missed mpeg2-ctrls.h
     include in context.c) — Phase 2 grep audit was incomplete.
     Phase 8 lesson: when deleting a header, completeness check
     is git rm + clean rebuild, not grep alone.

Campaign scoreboard: 1/5 → 2/5 codecs passing
(H.264 in T4, MPEG-2 in iter1). Iter1 advances to Phase 8.

Refs:
  ../libva-v4l2-request-fourier@229d6d1 (the fork tip verified)
  phase4_iter1_plan.md (criteria as locked, including Phase 5
                        amendments to criterion 3 + criterion 4)
  phase5_iter1_review.md (S2 partial-correct; S3, Q4, Q5
                          confirmed empirically)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:00:14 +00:00
claude-noether 0e2e1c2293 iter1 Phase 5: sonnet-architect review — 6 findings, 4 amendments
Phase 5 review run via Plan subagent with model: sonnet per
feedback_dev_process.md Phase 5 discipline. Review verbatim
preserved in phase5_iter1_review.md alongside per-finding response.

Findings: 1 Critical (latent), 2 Should-fix (1 valid, 1 misreading),
2 Question/clarification, 1 Nit. Reviewer's bottom-line: medium-high
confidence in the plan as written.

Resolutions:

C1 (union-aliasing reasoning was wrong; iter1 unaffected; latent bug):
  Verified offsets on fresnel via gcc + libva headers:
    h264.matrix_set       at union byte 240
    mpeg2.iqmatrix_set    at union byte 376
    mpeg2.iqmatrix range  [88..376) — sizeof=288
  Setting h264.matrix_set=false writes byte 240, which lands inside
  mpeg2.iqmatrix.chroma_intra_quantiser_matrix at offset 20.
  Phase 2 said the byte gets overwritten by RenderPicture before
  mpeg2_set_controls reads it. That was true only because ffmpeg-
  vaapi sends VAIQMatrixBufferType every frame; codec_store_buffer
  then copies the full 260-byte payload over the corrupted byte.
  ACCEPTED: update Phase 2 + Phase 4 wording to cite the correct
  safety chain. Latent bug for clients that reuse a surface without
  re-sending IQMatrix logged for iter2+ backlog.

S2 (vbv_buffer_size source — reviewer misread):
  Reviewer assumed slot->size = SOURCE_SIZE_MAX (1MB). Verified
  source: src/request_pool.c:71 sets pool->slots[i].size = length,
  where length is the V4L2-reported buffer length from
  VIDIOC_QUERYBUF (= negotiated sizeimage from S_FMT). Phase 3
  Baseline C strace shows S_FMT(OUTPUT_MPLANE) returns
  sizeimage=1382400=0x151800 — exactly matches Baseline C's
  vbv_buffer_size payload. Plan is correct as-is.
  REJECTED (reviewer's claim wrong); 1-line note added to Phase 6
  Commit B message clarifying the dynamic source.

S3 (default-matrix transcription byte-verify protocol):
  ACCEPTED. Phase 6 protocol amendment: when transcribing the
  64-entry default_intra[] in src/mpeg2.c, derive values from
  Baseline C QUANTISATION verbatim payload, then run a diff-based
  assertion before commit lands. Same for non_intra (all 16's),
  chroma_intra (= intra), chroma_non_intra (all 16's) — verified
  against Baseline C bytes 0..63 / 64..127 / 128..191 / 192..255.

Q4 (criterion 4 — ffmpeg+hwdownload primary, not fallback):
  ACCEPTED. Phase 7 harness criterion 4 changes from
    mpv --hwdec=vaapi --vo=image first, ffmpeg fallback
  to
    ffmpeg -hwaccel vaapi -vf hwdownload,format=nv12 primary,
    mpv-vaapi-vo=image backup
  Critical addition: Phase 7 must check both hashes match AND
  content non-zero/non-sentinel. T4 found ffmpeg-vaapi
  -hwaccel_output_format nv12 returns mostly zeros via cached-mmap
  on RK3399 (iter1 patch-0011 cache-stale bug class). For MPEG-2,
  hwdownload may use a different readback path; if it also exposes
  the cache-stale bug, swap to mpv-vaapi-vo=image. Empirical
  determination during Phase 7.

Q5 (timestamp behavior is a correction, not "no semantic change"):
  ACCEPTED. Phase 4 Clause 3 amendment: explicitly note that
  forward_ref_ts/backward_ref_ts = 0 when reference surface is
  VA_INVALID_ID is a CORRECTION vs current code's self-referencing
  behavior. Old code at src/mpeg2.c:106-107, 113-115 set
  forward_reference_surface = surface_object (self-ref) when ref
  was VA_INVALID_ID. New code sets ts to 0. Baseline C frame 1
  confirms 0-as-sentinel; FFmpeg v4l2_request_mpeg2.c:98-108
  matches. Iter1 fixes a latent bug.

Nit 6 (hevc-ctrls.h left alongside removed mpeg2-ctrls.h):
  ACCEPTED (lower-risk path). Phase 6 Commit B removes mpeg2-ctrls.h
  include only; Commit C deletes include/mpeg2-ctrls.h only.
  Hevc-ctrls.h header + include left untouched, deferred to HEVC
  iteration. Optional cleanup if Phase 6 chooses to bundle, but
  default is the smaller diff.

Phase 4 → Phase 6 amendments consolidated:
  1. Clause 3 timestamp behavior explicit (Q5)
  2. Clause 4 default-matrix Baseline-C-derived transcription (S3)
  3. Phase 7 criterion 4 ffmpeg+hwdownload primary + non-zero check (Q4)
  4. Hevc-ctrls.h cleanup deferred (Nit 6)
  5. Phase 2 + Phase 4 wording fix on union safety chain (C1 partial)
  6. Latent surface-reuse bug logged for iter2+ backlog (C1 follow-up)

Plan re-locks with these amendments. Phase 6 proceeds.

Per global ~/.claude/CLAUDE.md rule: Phase 5 reviews are never
skippable. This review was the right path forward; surfaced 2 plan
amendments + 1 latent bug worth documenting + 1 reviewer-misreading
worth pinning so the trail is clear. Material outside-look value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 05:36:12 +00:00
claude-noether 3e996d09e2 iter1 Phase 4: plan — contract clauses, diff scope, Phase 7 harness
Phase 4 plan for the iter1 MPEG-2 fix, structured per the
feedback_dev_process.md Phase 6 contract-before-code worked example
(0012-h264-omit-scaling-matrix-frame-based.patch shape):
contract clauses with citations first, then code changes mapping
1:1 to clauses.

Phase 1 criterion #3 re-locked per Phase 3 → Phase 1 loopback:

  Original: "mpv --hwdec=vaapi-copy ... engages backend"
  Adjusted: "ffmpeg -hwaccel vaapi ... engages backend"

  Phase 3 Baseline A established mpv silently filters MPEG-2 out
  before libva is loaded; the original wording was unfalsifiable.
  ffmpeg-direct exercises the path. mpv-driven testing → separate
  follow-up task.

Other 4 criteria unchanged (vainfo regression, vaCreateConfig
SUCCESS, DMA-BUF GL pixel verify HW=SW at +02s, T4 H.264
regression).

Six contract clauses cited from authoritative sources:

  Clause 1 — Three split controls in one batched VIDIOC_S_EXT_CTRLS
    Authority: linux/v4l2-controls.h:1988-2105
    Reference impl: FFmpeg libavcodec/v4l2_request_mpeg2.c:130-155
    Empirical anchor: Phase 3 Baseline C verbatim payload

  Clause 2 — v4l2_ctrl_mpeg2_sequence layout (12 bytes)
    Authority: linux/v4l2-controls.h:2009-2017
    Field-by-field VAAPI source mapping table
    Note: progressive_frame is used as proxy for progressive_sequence
          (VAAPI doesn't expose the latter; same bit for BBB).

  Clause 3 — v4l2_ctrl_mpeg2_picture layout (32 bytes)
    Authority: linux/v4l2-controls.h:2056-2065
    reserved[5] MUST be zeroed (kernel doc 2052)
    8 picture flags decoded; field-by-field VAAPI mapping

  Clause 4 — v4l2_ctrl_mpeg2_quantisation layout (256 bytes)
    Authority: linux/v4l2-controls.h:2089-2096
    Matrices in zigzag scanning order; no permutation in libva backend
    (kernel hantro_mpeg2_dec_copy_qtable handles zigzag-to-raster)
    Decision: when iqmatrix_set is false, populate from MPEG-2 spec
    defaults (ISO/IEC 13818-2 Table 7-3) to avoid kernel rejecting
    a batch missing the QUANTISATION control.

  Clause 5 — Per-frame submission via v4l2_set_controls
    Authority: existing src/h264.c:986 pattern
    surface_object->request_fd binds controls to per-surface request

  Clause 6 — config.c MPEG-2 case must break;
    Authority: C semantics; H.264 case shape at config.c:62-63
    Empirical anchor: Phase 3 Baseline B confirmed scratch-fix shape.

Diff scope:

  src/config.c    — 3 lines added (break for MPEG-2 cases) + drop
                    stale #include <mpeg2-ctrls.h>
  src/mpeg2.c     — full rewrite of mpeg2_set_controls against new
                    split API; ~120 lines replaced; switches from
                    2× v4l2_set_control(single) to 1× batched
                    v4l2_set_controls(3-control array)
  include/mpeg2-ctrls.h — DELETE (staging-era, masks kernel UAPI)
  src/picture.c, src/context.c, meson.build — no changes
                    (verified Phase 2 + Phase 3)

Phase 6 implementation order (3 logical commits):

  Commit A: config.c break — substrate fix in isolation
  Commit B: mpeg2.c rewrite + drop mpeg2-ctrls.h includes
  Commit C: delete include/mpeg2-ctrls.h

Phase 7 verification harness (full Bash incantations included in
plan body):

  Criterion 1: vainfo MPEG-2 enumeration regression check
  Criterion 2: vaCreateConfig SUCCESS via libva trace
  Criterion 3: ffmpeg -hwaccel vaapi exit 0, no Failed-to-create
  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s seek; HW=SW
               byte-identical hashes for 2 distinct frames
               (fallback to ffmpeg hwdownload if mpv-vaapi also
               filters MPEG-2; criterion holds, harness adapts)
  Criterion 5: T4 H.264 hashes still f623d5f7... and 7d7bc6f2...
  Bonus: byte-compare post-fix S_EXT_CTRLS payload vs Baseline C

Pre-identified Phase 7 → Phase 4 loopback triggers:

  1. S_EXT_CTRLS EINVAL post-fix → check pic.reserved[5] memset,
     struct sizes, flag value collisions
  2. Pixel hash mismatch → check f_code packing, field/frame
     interpretation, ref timestamps, IQ matrix order
  3. mpv-vaapi filters MPEG-2 out (same as -copy) → fall-forward
     to ffmpeg hwdownload pixel verify (criterion holds, harness
     adapts; do not redefine criterion)
  4. H.264 regression → re-locate the offending change in Bug 1
  5. Header deletion breaks unaudited consumer → git grep audit

Out of scope (LOCKED): HEVC/VP9/VP8 (later iterations); vaDerive
Image cache-stale fix; chromium-fourier 149 install; perf metrics;
long-duration stress; other MPEG-2 containers; mpv-hwdec follow-up;
context.c H.264 device-init EINVAL (auxiliary, intentional);
profile/chroma/progressive_sequence refinement; upstream engagement.

Phase 5 entry: artifacts handover (no summary, raw bundle) per
feedback_dev_process.md — phase0_findings_iter1.md,
phase2_iter1_situation.md, phase3_iter1_baseline.md,
phase4_iter1_plan.md, plus phase0_evidence/2026-05-07/iter1_phase3/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 04:38:53 +00:00
claude-noether b9625af278 iter1 Phase 3: baseline measurements — Phase 2 confirmed empirically
Four Phase 3 baselines captured on fresnel post-reboot 2026-05-08
00:39 CEST. SDDM watchpoint condition stayed green (greeter passed
cleanly on the new boot). All four baselines confirm Phase 2's
situation analysis empirically; one Phase 1 criterion needs minor
adjustment (Phase 3 → Phase 1 loopback per feedback_dev_process.md).

Baseline A — pre-patch failure mode (master tip 65969da):

  ffmpeg -hwaccel vaapi -i bbb_720p10s_mpeg2.ts ... under strace +
  LIBVA_TRACE captures the chain:

  vaInitialize ret = SUCCESS
  vaQueryConfigProfiles ret = SUCCESS
  vaCreateConfig(profile=VAProfileMPEG2Main, entrypoint=VLD)
    ret = VA_STATUS_ERROR_UNSUPPORTED_PROFILE

  No V4L2 ioctls beyond ENUM_FMT probes from RequestQueryConfigProfiles.
  Confirms Phase 2 Bug 1 (config.c:55-69 fall-through to default).

Baseline B — post Bug 1 scratch patch (the missing break added):

  vaCreateConfig now returns SUCCESS. V4L2 setup proceeds:
  CREATE_BUFS, QUERYBUF (40), REQBUFS, STREAMON, S_FMT, etc.
  Then VIDIOC_S_EXT_CTRLS fails:

    ioctl(/dev/video5, VIDIOC_S_EXT_CTRLS,
          {ctrl_class=0xf010000,
           count=1,
           controls=[
             {id=V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS,
              size=56, ...}
           ]})
        = -1 EINVAL

  CID 0x9909fa (V4L2_CID_MPEG_BASE+250) doesn't exist on this kernel —
  mainline removed it in favor of the split V4L2_CID_STATELESS_MPEG2_*
  CIDs. Size 56 = sizeof(combined v4l2_ctrl_mpeg2_slice_params) from
  the fork's local include/mpeg2-ctrls.h. Confirms Phase 2 Bug 2.

  Auxiliary EINVAL: src/context.c:142-155 unconditionally sets H.264
  device-wide controls (H264_DECODE_MODE, H264_START_CODE) on every
  CreateContext, regardless of profile. EINVALs on hantro-vpu-dec
  (no H.264 controls there). Intentional best-effort behavior —
  return value is cast to (void) and discarded. Auxiliary, not iter1
  scope.

Baseline C — cross-validator verbatim contract anchor:

  ffmpeg -hwaccel v4l2request strace shows ONE batched call per frame:

    ioctl(/dev/video5, VIDIOC_S_EXT_CTRLS,
          {ctrl_class=0xf010000,    // V4L2_CTRL_CLASS_CODEC_STATELESS
           count=3,
           controls=[
             {id=0xa409dc, size=12,  ...},   // SEQUENCE
             {id=0xa409dd, size=32,  ...},   // PICTURE
             {id=0xa409de, size=256, ...}    // QUANTISATION
           ]}) = 0

  Field-by-field decode of frame 1 (I-picture):
    SEQUENCE: 1280×720, vbv=0x151800, profile_level=0,
              chroma_format=1, flags=PROGRESSIVE
    PICTURE: back/fwd_ref_ts=0/0, flags=0x82
             (FRAME_PRED_DCT|PROGRESSIVE), f_code=0xF×4 (I-frame
             default), P_C_T=1 (I), structure=3 (FRAME),
             intra_dc_precision=0
    QUANTISATION: starts [8, 16, 16, 19, 16, 19, 22, 22, ...] —
                  canonical MPEG-2 default intra matrix in zigzag
                  scanning order.

  Frame 2 (P-picture) shows real f_code values {{1,1},{15,15}}
  and forward_ref_ts pointing to frame 1's timestamp. Confirms
  Phase 2's claim that matrices arrive in zigzag order;
  no permutation needed in the libva backend (kernel's
  hantro_mpeg2_dec_copy_qtable handles zigzag-to-raster).

  This is the iter1 contract anchor: every Phase 4 implementation
  diff must produce a structurally indistinguishable
  VIDIOC_S_EXT_CTRLS call.

Baseline D — H.264 regression check (Phase 1 criterion #5):

  T4 reference hashes match exactly with scratch Bug 1 fix installed:
    HW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9
    SW frame 1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9
    HW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8
    SW frame 2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8

  Bug 1 fix in isolation does not regress H.264.

Phase 1 criterion #3 needs adjustment (Phase 3 → Phase 1 loopback):

  Original wording: "mpv --hwdec=vaapi-copy ... engages the backend"
  Reality: mpv-vaapi-copy never loads libva for MPEG-2. mpv's hwdec
  policy filters MPEG-2 out before libva is touched. Zero V4L2
  ioctls, zero libva trace, silent SW fallback. Independent of
  Bug 1 fix state.

  Adjusted criterion #3 (proposed; locks alongside Phase 4 plan):
    "ffmpeg -hwaccel vaapi -i bbb_720p10s_mpeg2.ts -frames:v 2
     -f null - shows vaCreateConfig SUCCESS, no Failed to create
     decode configuration lines, no EINVAL from VIDIOC_S_EXT_CTRLS,
     exits 0 cleanly."

  mpv-driven testing moves to a follow-up task (mpv hwdec-codecs
  filter override), separate from iter1.

Other 4 Phase 1 criteria (vainfo regression, vaCreateConfig SUCCESS,
DMA-BUF GL pixel verify HW=SW, T4 H.264 regression) hold as locked.

Scratch state cleanup: scratch patch reverted, master backend
reinstalled, MPEG-2 fails again with vaCreateConfig=12 — back to
Baseline A state, no leak.

Phase 4 plan inputs:

  - Diff scope: src/config.c (1 break), src/mpeg2.c (rewrite to
    new API), include/mpeg2-ctrls.h (delete or empty). picture.c
    + context.c unchanged.
  - Contract anchor: cite verbatim from
    linux/v4l2-controls.h:1985-2105, FFmpeg
    libavcodec/v4l2_request_mpeg2.c:130-155, kernel
    drivers/media/platform/verisilicon/hantro_mpeg2.c, AND this
    document's Baseline C verbatim payload.
  - Phase 7 verification: re-run all 5 Phase 1 criteria
    (with #3 adjusted), byte-by-byte compare post-fix
    VIDIOC_S_EXT_CTRLS payload against Baseline C.

Evidence files:

  Tracked (text):
    phase3_iter1_baseline.md (writeup with verbatim raw output)
    phase0_evidence/2026-05-07/iter1_phase3/baseline_A_ffmpeg/ffmpeg.stdout
    phase0_evidence/2026-05-07/iter1_phase3/baseline_B_postbug1/ffmpeg.stdout
    phase0_evidence/2026-05-07/iter1_phase3/baseline_C_xvalidator/ffmpeg.stdout

  Gitignored (regenerable from re-run incantations in the writeup):
    *.strace.*  *.txt (ftrace) libva.trace.* (added the latter pattern)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 04:04:25 +00:00
claude-noether cc55a6e60a iter1 Phase 2: situation analysis — three bugs in MPEG-2 path
Phase 2 source-read of the libva-v4l2-request-fourier MPEG-2 path
on master tip 65969da identifies three independent bugs, all in
the libva backend (kernel + driver path proven solid by Phase 0
cross-validator sweep).

Bug 1 — fall-through to default in RequestCreateConfig
(src/config.c:55-69):

  case VAProfileH264*:
      // FIXME
      break;
  case VAProfileMPEG2Simple:
  case VAProfileMPEG2Main:
  case VAProfileHEVCMain:
  default:
      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;

H.264 cases have a break, MPEG-2 + HEVC fall through to default.
This explains the vaCreateConfig: 12 (UNSUPPORTED_PROFILE) error
observed in Phase 0 cross-validator sweep for both codecs.

Likely history: H.264 was libva-multiplanar focus iter1-iter5;
the FIXME comment suggests profile-specific validation logic was
expected but never landed. MPEG-2 stayed in fall-through bucket.

Fix shape: add break for MPEG-2 cases. HEVC stays in fall-through
(h265.c excluded from build per Phase 0 finding F-C; honest
UNSUPPORTED_PROFILE is correct until h265.c is reinstated in a
later iteration).

Bug 2 — staging-era UAPI in mpeg2.c; mainline kernel removed it:

src/mpeg2.c uses:
  V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS  (V4L2_CID_MPEG_BASE+250 = 0x9909fa)
  V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION  (V4L2_CID_MPEG_BASE+251 = 0x9909fb)

Mainline kernel UAPI (include/uapi/linux/v4l2-controls.h:1985-2105):
  V4L2_CID_STATELESS_MPEG2_SEQUENCE      (CODEC_STATELESS_BASE+220 = 0xa409dc)
  V4L2_CID_STATELESS_MPEG2_PICTURE       (CODEC_STATELESS_BASE+221 = 0xa409dd)
  V4L2_CID_STATELESS_MPEG2_QUANTISATION  (CODEC_STATELESS_BASE+222 = 0xa409de)

Fresnel V4L2 inventory confirms kernel exposes the new IDs only.
The fork's local include/mpeg2-ctrls.h is the staging-era header
that masks the kernel's modern definitions.

Six structural changes from old to new API:

1. Slice header parsing moved to kernel — bit_size, data_bit_offset,
   quantiser_scale_code GONE from new structs.
2. Reference timestamps moved from slice to picture
   (forward_ref_ts, backward_ref_ts now in v4l2_ctrl_mpeg2_picture).
3. Boolean fields collapsed into v4l2_ctrl_mpeg2_picture.flags
   bitmask (TOP_FIELD_FIRST, FRAME_PRED_DCT, CONCEALMENT_MV,
   Q_SCALE_TYPE, INTRA_VLC, ALT_SCAN, REPEAT_FIRST, PROGRESSIVE).
4. progressive_sequence collapsed into
   v4l2_ctrl_mpeg2_sequence.flags & V4L2_MPEG2_SEQ_FLAG_PROGRESSIVE.
5. PICTURE_CODING_TYPE renamed to PIC_CODING_TYPE
   (V4L2_MPEG2_PICTURE_CODING_TYPE_X → V4L2_MPEG2_PIC_CODING_TYPE_X).
6. Quantisation load_* flags removed; matrices always present;
   British spelling — quantiSation not quantiZation.

Quantisation matrix order: kernel doc says zigzag scanning order;
VAAPI VAIQMatrixBufferMPEG2 also stores in zigzag scanning order;
direct memcpy works. Kernel hantro_mpeg2.c does the
zigzag-to-raster permutation kernel-side
(hantro_mpeg2_dec_copy_qtable lines 12-26). No userspace
permutation needed in the libva backend (unlike FFmpeg, which
unwinds its internal idsp.idct_permutation order).

Per-frame submission: FFmpeg reference (libavcodec/
v4l2_request_mpeg2.c:130-155) batches 3 controls in single
VIDIOC_S_EXT_CTRLS. Backend's v4l2_set_controls (src/v4l2.c:475)
already supports batching — used by iter6/7/8 H.264
(src/h264.c:986). MPEG-2 rewrite follows H.264's batched pattern.

Bug 3 — include/mpeg2-ctrls.h is the staging-era local header:

The fork's local include/mpeg2-ctrls.h is the staging-era header
that defines the old (removed) API. config.c:37 + mpeg2.c:38
include it via meson's include_directories('../include'). Should
be deleted (or emptied); rely on kernel <linux/v4l2-controls.h>
pulled transitively via <linux/videodev2.h>.

Things verified NOT to be bugs:

- src/picture.c MPEG-2 dispatch is fully wired:
  - codec_store_buffer handles VAPictureParameterBuffer + VAIQMatrix
  - codec_set_controls dispatches MPEG-2 to mpeg2_set_controls
  - HEVC explicitly UNSUPPORTED_PROFILE (correct for build state)
- src/picture.c:287 unconditional h264.matrix_set=false reset is
  benign for MPEG-2 (union aliasing puts it in mpeg2.picture or
  .slice region; RenderPicture overwrites that byte before
  mpeg2_set_controls reads anything).
- src/mpeg2.c field extraction from VAAPI structs is sound; only
  the destination control IDs and struct shape need rewiring.
- src/v4l2.c batching API (v4l2_set_controls) is in place.

Open questions tabled for Phase 3 baseline:

1. Live ftrace of failing libva MPEG-2 attempt post Bug-1-fix
   (verify expected EINVAL on VIDIOC_S_EXT_CTRLS for old CID).
2. VAAPI VAIQMatrixBufferMPEG2 matrix order from real mpv decode
   (verify zigzag, no pre-permutation).
3. Cross-reference verbatim VIDIOC_S_EXT_CTRLS payload from
   ffmpeg-v4l2request cross-validator anchor strace dump.
4. SDDM watchpoint resolution — fresnel SSH No route to host at
   Phase 2 start (network event, SDDM regression, or operator
   power-state). Resolve before Phase 3.

Predicted iter1 outcome: small mechanical diff (config.c break
+ mpeg2.c rewrite + drop local mpeg2-ctrls.h). Phase 7 verification
should land all 5 Phase 1 boolean checks green on first or second
try. Likely Phase 7 → Phase 4 loopback triggers if any: forgotten
struct padding zero, garbage timestamps on first I-frame, or
device-state precondition we missed in hantro_mpeg2.c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 22:24:50 +00:00
claude-noether f720c7784b iter1 Phase 0 + Phase 1 lock: MPEG-2 boolean correctness on hantro
Iteration 1 of the campaign 8(+1)-phase loop opens following the
campaign Phase 0 close (b74551b). Per the suggested order in
phase0_evidence/2026-05-07/cross_validator_traces.md, iter1 attacks
MPEG-2 — the cheapest fix in the codec status sweep (mpeg2.c is
already compiled, src/config.c has the case statements, but
vaCreateConfig returns 12 / UNSUPPORTED_PROFILE).

Locked research question:

  Make MPEG-2 the second codec to pass boolean-correctness on
  fresnel via the libva-v4l2-request-fourier path —
  mpv --hwdec=vaapi-copy bbb_720p10s_mpeg2.ts engages the backend
  cleanly and DMA-BUF GL import yields HW pixels byte-identical
  to a software-decoded reference for the same frames.

Phase 1 success criterion (5 boolean checks, all must be green):

  1. vainfo: VAProfileMPEG2Simple + VAProfileMPEG2Main still
     enumerated on the hantro env binding (regression check).
  2. vaCreateConfig(VAProfileMPEG2Main, VLD): VA_STATUS_SUCCESS.
  3. mpv --hwdec=vaapi-copy --frames=2 --vo=null on
     bbb_720p10s_mpeg2.ts: engages backend, exit 0, no
     "Failed to create decode configuration" lines.
  4. mpv --hwdec=vaapi --vo=image (DMA-BUF GL import) at +02s
     seek: 2 distinct frames hash-equal to SW reference frames
     and hash-differ from each other.
  5. T4 H.264 regression: re-run T4 incantation, hashes match
     f623d5f7... and 7d7bc6f2... reference values.

Mechanism the question targets:

The kernel + driver path is solid (cross_validator_traces.md —
ffmpeg-v4l2request decodes the same fixture exit 0). vaCreateConfig
rejection must be downstream of src/config.c:64-65's case match
(both VAProfileMPEG2Simple and VAProfileMPEG2Main are present in
the validation switch) but upstream of return-success. Plausible
suspects for Phase 2 source-read:

  - V4L2 capability probe (e.g., VIDIOC_TRY_FMT against MG2S) that
    fails because the libva backend was bound to /dev/video5 but
    is checking format against the wrong codec list.
  - Device-discovery routing reaches a default-reject because
    bound device didn't match an expected codec-to-device map.
  - media_request allocation step fails on /dev/media2 for some
    MPEG-2-specific reason.
  - iter6/iter7 regression in the dispatch-by-profile path that
    broke MPEG-2 silently because nobody on libva-multiplanar
    tested it (iter5 close: "MPEG-2 was iter1 backlog, dropped
    at iter6 close because A55 CPU handles it fine" — fresnel
    runs A53 so the disposition doesn't transfer).

Phase 4 plan must cite the contract before patching, per
feedback_dev_process.md Phase 6 contract-before-code: read kernel
drivers/media/platform/verisilicon/hantro_mpeg2.c, read FFmpeg
downstream libavcodec/v4l2_request_mpeg2.c, state the MPEG-2
control-submission contract explicitly before any code lands.

Predecessor carry-over (state vs data) explicit per
feedback_dev_process.md Phase 0 + feedback_replicate_baseline_first.md:

  Carries forward (re-verified in campaign Phase 0):
    - iter8 master fork (65969da) installed on fresnel,
      vainfo enumeration confirmed
    - bbb_720p10s_mpeg2.ts fixture (5.3 MB, MPEG-2 Main, 720p,
      10s, MPEG-TS) on fresnel ~/fourier-test/, provenance
      documented in test_fixtures.md
    - hantro-vpu-dec /dev/video5 + /dev/media2 binding
    - Cross-validator anchor: ffmpeg-v4l2request mpeg2 trace
      captured (5 S_EXT_CTRLS, 4 REQUEST_ALLOC, 4 DMA_BUF_SYNC)
    - T4 reference hashes for H.264 regression check

  Does NOT carry forward (re-acquire if needed):
    - ohm/RK3568 hantro MPEG-2 behaviour — different kernel
      driver variant inside drivers/media/platform/verisilicon/
    - Pre-iter6 libva-multiplanar MPEG-2 trace data (untested
      at iter6 close)

  Open questions inherited:
    - Cache-stale vaDeriveImage bug class on RK3399 (T4) —
      iter1 uses DMA-BUF GL import for verify; Phase 4 cross-
      cutting fix is not iter1-scoped
    - Architectural divergence ffmpeg vs our backend (EXPBUF +
      DMA_BUF_SYNC vs cap_pool + vaDeriveImage) — Phase 4
      design decision, not iter1-blocking

Out-of-scope (LOCKED): HEVC/VP9/VP8 (later iterations); vaDerive
Image cache-stale fix (Phase 4 cross-cutting); chromium-fourier
149 install (Phase 0 follow-up to iter1 substrate, non-gating);
performance metrics (Phase 1+ separate iteration); long-duration
stress (>10s); other MPEG-2 containers beyond MPEG-TS;
upstream engagement.

Iter1 Phase 2 source-read targets:
  - src/config.c::RequestCreateConfig (rejection site)
  - src/picture.c MPEG-2 dispatch path
  - src/mpeg2.c set-controls path
  - kernel drivers/media/platform/verisilicon/hantro_mpeg2.c
  - FFmpeg downstream libavcodec/v4l2_request_mpeg2.c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 22:04:52 +00:00
claude-noether b74551bc56 phase 0 close: deliverables 5 + 6 — fixtures + cross-validator anchor
Closes Phase 0 for fresnel-fourier. Per-codec test fixtures and
cross-validator contract traces complete the campaign-locked
boolean-correctness baseline.

Deliverable #5 — per-codec test fixtures (test_fixtures.md):

Generated 4 new fixtures on fresnel from the bbb_1080p30_h264.mp4
master via stock ffmpeg (libx265 ultrafast, libvpx-vp9 speed 5,
mpeg2video, libvpx vp8). All 720p 10s 8-bit yuv420p — matching the
silicon-supported profile/pixfmt for each codec on RK3399:

  bbb_720p10s_hevc.mp4   620 KB  (HEVC Main, rkvdec target)
  bbb_720p10s_vp9.webm   3.4 MB  (VP9 Profile 0, rkvdec target)
  bbb_720p10s_mpeg2.ts   5.3 MB  (MPEG-2 Main, hantro-vpu-dec target)
  bbb_720p10s_vp8.webm   2.4 MB  (VP8, hantro-vpu-dec target)

Encode wall times on fresnel: HEVC 13s, VP9 93s, MPEG-2 6s, VP8 26s.
H.264 master is 725 MB carryover from libva-multiplanar / fourier_attribution.

Deliverable #6 — cross-validator anchor (cross_validator_traces.md):

phase0_findings.md named chromium-fourier 149 as the cross-validator;
that package isn't installed on fresnel and marfrit-packages isn't
configured (no auto-install path tonight). Substituted ffmpeg
-hwaccel v4l2request as a better-fit cross-validator: it's an
independent V4L2 client (uses no libva at all, lives in
libavcodec/v4l2_request*.c), already on the box (stock
ffmpeg n8.1-13-gb57fbbe50c, the Kwiboo v4l2-request-n8.1 branch),
and implements all 5 codecs the campaign locked.

Headline finding: ALL 5 CODECS WORK end-to-end via the kernel
direct path on RK3399.

  ffmpeg -hwaccel v4l2request -i bbb_<codec>.<ext> -frames:v 2 -f null -
  H.264:  exit 0
  HEVC:   exit 0
  VP9:    exit 0
  MPEG-2: exit 0
  VP8:    exit 0

The Linux kernel + rkvdec + hantro-vpu drivers are solid for the
entire campaign codec scope. Phase 6 work scope is purely libva-
backend code — no kernel patches, no upstream Linux engagement.

Per-codec libva (iter8) vs ffmpeg-v4l2request status sweep:

  H.264   libva: PASS (T4 PASS + bit-exact pixel verify) | ffmpeg-v4l2req: PASS
  HEVC    libva: vaCreateConfig=12 (UNSUPPORTED_PROFILE) | ffmpeg-v4l2req: PASS
          → src/h265.c is excluded in src/meson.build but src/config.c:151
            enumerates HEVCMain via V4L2_PIX_FMT_HEVC_SLICE probe;
            vaCreateConfig fails downstream of the case match.
  VP9     libva: profile not enumerated                  | ffmpeg-v4l2req: PASS
          → no vp9.c in fork
  MPEG-2  libva: vaCreateConfig=12 (UNSUPPORTED_PROFILE) | ffmpeg-v4l2req: PASS
          → mpeg2.c IS compiled, config.c:64-65 has the case statements,
            yet vaCreateConfig rejects. Phase 2 source-read needed.
  VP8     libva: profile not enumerated                  | ffmpeg-v4l2req: PASS
          → no vp8.c in fork

Suggested Phase 6 iteration order (subject to Phase 1 lock):
  iter1: MPEG-2 — likely cheapest (config.c-level path; mpeg2.c
                  already compiled)
  iter2: HEVC   — re-enable h265.c in build, audit against rkvdec
  iter3: VP8    — implement vp8.c on hantro
  iter4: VP9    — implement vp9.c on rkvdec (largest control surface)

Per-codec ioctl frequency anchor (2-frame ffmpeg -hwaccel v4l2request):

  ioctl                  H.264 HEVC  VP9  MPEG-2 VP8
  VIDIOC_DQBUF              45   49   40    26   49
  VIDIOC_QBUF               22   24   20    10   20
  VIDIOC_CREATE_BUFS        17   17   17    12   17
  VIDIOC_QUERYBUF           15   15   15    10   15
  VIDIOC_S_EXT_CTRLS        13   14   11     5   10
  VIDIOC_EXPBUF             11   11   11     6   11
  VIDIOC_QUERY_EXT_CTRL      0    5    0     0    0
  MEDIA_IOC_REQUEST_ALLOC    4    4    4     4    4
  DMA_BUF_IOCTL_SYNC         0    0    0     4    0
  MEDIA_REQUEST_IOC_REINIT   0    0    0     0    3

Architectural divergence ffmpeg-v4l2request vs libva-v4l2-request-fourier:

  - ffmpeg uses VIDIOC_EXPBUF + DMA-BUF for downstream readback.
    Our libva backend uses cached mmap via vaDeriveImage — the
    iter1 patch-0011 cache-stale bug class. Phase 4 work item
    consistent with T4's finding: adding VIDIOC_EXPBUF + DMA-BUF-
    backed image export to the libva backend would fix the
    cache-coherency issue identified in T4's H.264 readback.
  - ffmpeg uses 4 request_fds pooled. Our backend uses 16 (iter6
    per-OUTPUT-slot binding). Both valid; different pool depth.
  - HEVC alone needs VIDIOC_QUERY_EXT_CTRL for hevc_slice_params
    dynamic-array introspection — unique among the 5 codecs.

Substrate change deferred (not a Phase 0 blocker): chromium-fourier
149 install on fresnel is Phase 1+ work. When done, a follow-up
trace pass per codec will cross-check ffmpeg-v4l2request and
chromium contracts. For Phase 0 baseline, ffmpeg-v4l2request is
the anchor.

Phase 0 fully closed. Six deliverables landed. Phase 1 lock can proceed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 21:52:21 +00:00
claude-noether d8a9903ef4 phase 0 deliverable 4: H.264 baseline trace — PASS boolean correctness
H.264 hardware decode on RK3399 / rkvdec / libva-v4l2-request-fourier
@ master tip 65969da (iter8 Phase 4) verified bit-exact correct against
software reference, when read via the cache-safe DMA-BUF GL import path.

Test method:

  - mpv --hwdec=vaapi --vo=image (DMA-BUF + EGL_EXT_image_dma_buf_import
    + glReadPixels + JPEG encode — cache-coherency-safe per the iter1
    patch-0011 lesson).
  - Decoded 2 frames at +30s seek (mid-content bunny motion, not BBB
    intro fade-in) so size + content variation is genuine.
  - Compared HW JPEGs vs SW reference JPEGs (same mpv invocation with
    --hwdec=no).

Result:

  HW frame 1 sha256 = f623d5f7...  (651,726 bytes)  byte-identical
  SW frame 1 sha256 = f623d5f7...  (651,726 bytes)  to SW reference
  HW frame 2 sha256 = 7d7bc6f2...  (630,433 bytes)  byte-identical
  SW frame 2 sha256 = 7d7bc6f2...  (630,433 bytes)  to SW reference

  Frames 1 vs 2 differ in size — real content change captured.

Phase 0 boolean-correctness criterion for H.264: PASS.

Contract trace:

The V4L2 + media-request ioctl sequence per H.264 frame is the
canonical iter6/iter7 pattern:

  S_EXT_CTRLS (CODEC_STATELESS class, request_fd=N)
  QBUF CAPTURE_MPLANE  index=K
  QBUF OUTPUT_MPLANE   index=K  (compressed slice)
  MEDIA_REQUEST_IOC_QUEUE   (request_fd=N)
  MEDIA_REQUEST_IOC_REINIT  (request_fd=N)  ← per-OUTPUT-slot reuse
  DQBUF OUTPUT_MPLANE  index=K
  DQBUF CAPTURE_MPLANE index=K

REINIT-before-DQBUF works because the kernel completes decode in
~0.6 ms (request → COMPLETE state), and mainline media_request_
ioctl_reinit accepts both IDLE and COMPLETE. iter7 cap_pool
instantiates 24 slots cleanly: "v4l2-request: cap_pool_init: 24
slots ready" in mpv stdout.

No EINVAL, no EBUSY, no errors observed across 5 frames. iter4's
frame-11 EINVAL bug from libva-multiplanar does not reproduce on
RK3399 in this short window (longer-run repro is Phase 1+ work).

Side finding — cache-stale readback bug present in libva-backend's
vaDeriveImage path on RK3399:

When pixels are read via the cached-mmap path (libva's vaDeriveImage
+ vaMapBuffer, used by ffmpeg -hwaccel vaapi -hwaccel_output_format
nv12), readback is corrupted in exactly the iter1 patch-0011 pattern:

  size=6,220,800 bytes (correct: 2 × 1920×1080×1.5 NV12)
  non-zero=544 (0.009%)
  pattern: 16 consecutive non-zero bytes at every 1920-byte row stride,
           rest of buffer reads as zero
  diff vs SW reference: 100% of bytes differ, MAE=53.3 per byte

This is the canonical stale-cached-mmap pattern. Kernel writes real
pixels (proven by DMA-BUF GL import readback succeeding), but the
libva backend's image-export path returns a cached pointer without
the correct cache-invalidation incantation. Userspace reads stale
all-zero memory punctuated by whichever cache lines happened to fetch
post-write.

Phase 4 work item: audit whether the iter1 patch-0011 cache-flush
fix is present, effective, or RK3399-routing-bypassed. Three
possibilities: (a) fix landed for RK3568 but cache topology differs
on RK3399, (b) fix is gated on something that's not true on RK3399,
or (c) RK3399 V4L2_MEMORY_MMAP page protection bypasses the flush.
Not gating Phase 0 — kernel-side decode is correct.

Phase 1+ binding cells must use the DMA-BUF GL import path for pixel
verification, not vaDeriveImage / cached-mmap. The iter1 lesson
restated: cached-mmap readback is unreliable on this hardware family.

Evidence files (under phase0_evidence/2026-05-07/h264_baseline_trace.md
and h264_baseline/):

  - mpv.stdout — libva log, vaapi-copy engaged, cap_pool_init
  - h264_baseline_trace.md — full writeup with re-run incantations
  - mpv.strace.* (gitignored) — 19 per-thread ioctl/openat traces
  - ftrace_v4l2.txt (gitignored) — kernel qbuf/dqbuf events
  - merged_ioctls.tsv (gitignored) — time-sorted V4L2/MEDIA/DRM
    ioctls across all threads
  - *.jpg (gitignored) — HW vs SW JPEG comparison artefacts
  - frames_hw_cached_readback.nv12 (gitignored) — broken nv12
    readback for forensic reference

gitignore: extended extension list (jpg, png, nv12, yuv, tsv, strace*).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 21:32:36 +00:00