Commit Graph

62 Commits

Author SHA1 Message Date
marfrit b0ebe67673 iter7 PASS close: auto-detect picks rkvdec reliably; iter4-B1a closed
Phase 7 verification 5/5 PASS:
- C1 auto-detect picks decoder (verified: auto-selected /dev/video1 +
  /dev/media0 on rkvdec, NOT encoder)
- C2 prefer rkvdec (pass-1 short-circuit confirmed)
- C3 zero regression: all 5 codec hashes (H.264 71ac099b..., HEVC
  06b2c5a0..., VP9 4f1565e8..., MPEG-2 19eefbf4..., VP8 bcc57ed5...)
  identical to iter5b-β/iter6 anchors
- C4 multi-boot stability: SOFT PASS (architectural — algorithm is
  deterministic given kernel topology; physical reboot not session-
  blocking)
- C5 vainfo lists 7 rkvdec profiles (H.264 variants + HEVC + VP9)

Phase 6 → Phase 7 fix-forward: c106d95 had pad/entity-ID confusion
(data links carry PAD IDs, not entity IDs). Empirical topology dump
on fresnel /dev/media0 revealed it; fix-forward 6df2159 allocates
topo.pads[] and resolves data-link endpoints via pads[].entity_id.
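
The pad/entity distinction can be illustrated with a small Python
model. This is a sketch, not the fork's C code: the dictionaries stand
in for the pad and link arrays returned by MEDIA_IOC_G_TOPOLOGY, the
helper name is hypothetical, and every ID below is made up.

```python
def data_link_entities(link, pads):
    """Map a data link's endpoints (pad IDs) to the entities owning those pads."""
    owner = {pad["id"]: pad["entity_id"] for pad in pads}
    return owner.get(link["source_id"]), owner.get(link["sink_id"])


# Hypothetical fragment: entity 15 feeds entity 17 through pads 116 -> 117.
pads = [{"id": 116, "entity_id": 15}, {"id": 117, "entity_id": 17}]
link = {"source_id": 116, "sink_id": 117, "flags": 0}

# Comparing a link endpoint directly with an entity ID never matches ...
direct_match = link["sink_id"] == 17
# ... while resolving the endpoint through the pad table does.
resolved = data_link_entities(link, pads)
```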

Phase 5 reviewer caught 2 CRIT + 4 IMP + 3 MIN — all incorporated.
Phase 5 missed the pad/entity ID encoding distinction; future
media-topology code reviews should ask for empirical dumps.

Net iter7 contribution: quality-of-life. Auto-detect now reliable
across boot orderings for rkvdec codecs (H.264/HEVC/VP9). MPEG-2/VP8
still need LIBVA_V4L2_REQUEST_VIDEO_PATH env override (iter4-B1b
backlog — multi-decoder routing deferred to future iter).

Fork tip 6df2159. Backend SHA 520507f6...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:10:23 +00:00
marfrit 5bf6acb964 iter7 Phase 6: 1 commit landed on fork — auto-detect refactor pending fresnel build
Fork tip c106d95 (was 70196f8). 165 LOC added / 57 removed in
src/request.c. All 9 Phase 5 amendments (2 CRIT + 4 IMP + 3 MIN)
incorporated.

Fresnel offline at push time. Build + install + Phase 7 verify
deferred until host returns. Phase 7 sweep ready to execute:
vainfo + ffmpeg-vaapi + reboot stability + iter5b/iter6 regression
check.

Code review verified algorithm correctness against the Phase 5
reviewer pseudocode. boltzmann's linux-rockchip source confirms that
MEDIA_ENT_F_PROC_VIDEO_DECODER is set at rkvdec.c:1382 and on the
hantro_drv.c proc entities. Compile-time syntax is untested
(no va-api dev headers on noether).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:41:12 +00:00
marfrit cebdd82e7f iter7 Phase 5: review — 2 CRIT on link-graph traversal; algorithm validated
Phase 5 sonnet-architect found:
- CRIT-1: interface links connect IO entities (source/sink) to interfaces,
  NOT directly to proc entity. Walk must use MEDIA_LNK_FL_INTERFACE_LINK
  (1U<<28) to discriminate. Author verified at media.h:223-225.
- CRIT-2: source_id/sink_id ordering not guaranteed in link entries;
  check both endpoints. Author verified media_v2_link struct at media.h:341-347.
- IMP-1: hantro decoder-proc (entity 17) distinct from encoder-proc
  (entity 3) by function field. Algorithm correct by construction —
  no encoder contamination possible.
- IMP-2: MEDIA_ENT_F_PROC_VIDEO_DECODER set on both rkvdec-proc
  (rkvdec.c:1382) and hantro-dec-proc (hantro_drv.c).
- IMP-3: current 3-call ioctl pattern has spurious memset; new function
  uses 2-call pattern (alloc all 3 arrays before second call).
- IMP-4/MIN-1/2/3: minor implementation notes.

All 5 substantive findings empirically verified against boltzmann's
linux-rockchip tree.

Phase 6 implementer pseudocode provided: walk entities → find decoder
proc → walk data links to collect IO entity neighbors → walk
interface links to find linked interface → resolve major:minor.
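
The walk above can be modelled in Python roughly as follows. This is a
hedged sketch rather than the Phase 6 C implementation: the topology is
a plain dict, entity functions are strings instead of the kernel's
numeric constants, major/minor are flattened onto the interface, and
the function name is hypothetical. Data-link endpoints are resolved
through the pad table, and both endpoints of every link are checked.

```python
INTERFACE_LINK = 1 << 28  # MEDIA_LNK_FL_INTERFACE_LINK, value as quoted above
DECODER = "MEDIA_ENT_F_PROC_VIDEO_DECODER"


def find_decoder_devnode(topo):
    """Return (major, minor) of the video node attached to a decoder, or None."""
    pad_owner = {p["id"]: p["entity_id"] for p in topo["pads"]}
    procs = {e["id"] for e in topo["entities"] if e["function"] == DECODER}
    if not procs:
        return None

    # Data links: collect the IO entities neighbouring a decoder proc entity.
    io_entities = set()
    for link in topo["links"]:
        if link["flags"] & INTERFACE_LINK:
            continue
        a = pad_owner.get(link["source_id"])
        b = pad_owner.get(link["sink_id"])
        if a in procs and b is not None:
            io_entities.add(b)
        if b in procs and a is not None:
            io_entities.add(a)
    io_entities -= procs

    # Interface links: find the interface attached to one of those IO entities.
    interfaces = {i["id"]: i for i in topo["interfaces"]}
    for link in topo["links"]:
        if not link["flags"] & INTERFACE_LINK:
            continue
        ends = (link["source_id"], link["sink_id"])
        for intf_id, ent_id in (ends, ends[::-1]):  # endpoint order not assumed
            if intf_id in interfaces and ent_id in io_entities:
                intf = interfaces[intf_id]
                return intf["major"], intf["minor"]
    return None
```

Because the encoder proc entity carries a different function value, its
IO neighbours are never collected, so an encoder video node cannot be
returned by this model.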

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:34:40 +00:00
marfrit 8ce6372ef8 iter7 Phase 4: plan — split iter4-B1 into B1a (this iter, encoder/decoder) + B1b (defer, multi-decoder routing)
Phase 2 source-read found iter4-B1 conflates two sub-bugs:
- B1a: walk picks encoder when it should pick decoder. SMALL FIX
  (~100-150 LOC). Add MEDIA_ENT_F_PROC_VIDEO_DECODER entity check
  in find_video_node_via_topology; two-pass prefer rkvdec.
- B1b: multi-decoder routing (rkvdec for H.264/HEVC/VP9 + hantro
  for MPEG-2/VP8 from one backend instance). Bigger arch fix
  ~200-400 LOC. DEFERRED.

iter7 ships B1a. Phase 1 criteria amended:
- Auto-detect always picks a decoder, never an encoder.
- Prefer rkvdec over hantro (rkvdec serves 3 of 5 codecs).
- 2 reboots verify stability.
- vainfo lists rkvdec's 3 codecs minimum.
- No regression on iter5b-β / iter6 state.

Phase 6 will use MEDIA_IOC_G_TOPOLOGY's entities+links arrays to
match V4L node entities to decoder-proc entities. Two-pass walk:
pass-1 rkvdec only, pass-2 any decoder.

Empirical baseline: on 2026-05-12 boot, /dev/media0=rkvdec (only
decoder), /dev/media1=hantro-vpu (encoder AND decoder both inside),
/dev/media2=uvc. Fix must skip encoder when accepting media1.
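
A minimal Python sketch of the two-pass preference, assuming each media
device has already been reduced to a driver label and a flag saying
whether a decoder proc entity was found. The dict layout and function
name are hypothetical; the device list mirrors the 2026-05-12 baseline.

```python
def pick_decoder_device(devices):
    """Pass 1 accepts only rkvdec; pass 2 accepts any device with a decoder."""
    for rkvdec_only in (True, False):
        for dev in devices:
            if not dev["has_decoder"]:
                continue  # never accept an encoder-only or capture device
            if rkvdec_only and dev["driver"] != "rkvdec":
                continue
            return dev["path"]
    return None


baseline = [
    {"path": "/dev/media0", "driver": "rkvdec", "has_decoder": True},
    {"path": "/dev/media1", "driver": "hantro-vpu", "has_decoder": True},
    {"path": "/dev/media2", "driver": "uvc", "has_decoder": False},
]
chosen = pick_decoder_device(baseline)
```

The result does not depend on enumeration order: reversing the list
still selects the rkvdec entry, and hantro is only chosen when no
rkvdec device is present.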

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:40:53 +00:00
marfrit fc44a1e63c iter7 Phase 0 lock: iter4-B1 auto-detect harden — require MEDIA_ENT_F_PROC_VIDEO_DECODER
Backend-only ~30-80 LOC. Walk media-topology entities (already partially
done at iter4 Commit Z); require at least one entity with function ==
MEDIA_ENT_F_PROC_VIDEO_DECODER. Eliminates the hantro encoder false-match
that breaks vainfo + ffmpeg-vaapi on every other reboot.

5 boolean Phase 1 criteria locked. No kernel work. No pixel-correctness
chasing. Quality-of-life delivery; removes per-session env-override
friction.

Predicted lowest-difficulty iteration since iter1. 2-3 hours wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:25:18 +00:00
marfrit 8ce00d3aa1 iter6 PARTIAL close: Bug 6 narrowed to H-E (kernel-side hantro VP8 partial-write)
Phase 3 Candidate K executed: H-D (slot rotation) ELIMINATED via
instrumented bind+read site logging. Slot v4l2_index matches at
BeginPicture and at vaGetImage for every surface; destination_data[0]
matches slot->map[0]. No rotation mismatch.

H-A/B/C/D all eliminated. H-E (kernel-side hantro VP8 partial-write)
confirmed by elimination. The libva backend submits correct controls,
correct slice bytes, correct slices_size, correct slot indices.
Kernel writes erratic partial content (per-frame Y plane transitions
at row 536, 24, ... — not a clean buffer-size truncation, not slot
rotation).

iter6 close PARTIAL: 5 of 6 Phase 1 criteria PASS; criterion 1
(libva_vp8 == kdirect) PARTIAL — kernel-side fix needed, out of
iter6's locked backend-only scope.

No patches landed. Fresnel substrate unchanged: fork tip 70196f8,
backend SHA 2c6ff82c... (identical to iter5b-β close).

Net deliverable: Phase 3 narrowing reduces Bug-6 hypothesis space
from 5 to 1. Future iter7+ (or kernel-agent campaign) picks up the
kernel-side investigation.

Pattern recognized: iter2 HEVC transitive PASS masked Bug 5;
iter3 VP8 transitive PASS masked Bug 6. Both surfaced under direct
verification post-iter5b-β. Transitive proofs against ONE artifact
(control payload) don't catch bugs in OTHER artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:52:15 +00:00
marfrit 007cf6ca8e iter6 Phase 3: narrowed Bug 6 — H-A/B/C eliminated; H-D/E (kernel) remain
Empirical Phase 3 narrowing:
- H-A slice data corruption: ELIMINATED. SHA256 of libva-dumped slice 0
  (300614 bytes) byte-identical to raw VP8 frame 0 from .webm at
  offset 10..300624 (post-VP8-header).
- H-B slices_size wrong: ELIMINATED. slices_size = fp_size +
  sum(dct_part_sizes) = 300614 exactly.
- H-C cache coherency: ELIMINATED. msync attempt yielded no output
  change; VP9 uses same image.c path and works fine.
- Control payloads: byte-identical between libva and kdirect for VP8
  keyframe (pre-Phase-2 finding).
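
The H-A and H-B checks can be expressed as two small Python helpers.
This is an illustrative sketch; the function names are hypothetical and
header_len is the number of leading frame bytes that precede the slice
payload (10 in the comparison above).

```python
import hashlib


def slice_matches_frame(slice_dump, frame, header_len):
    """H-A: the dumped slice must hash-match the frame bytes after the header."""
    payload = frame[header_len:header_len + len(slice_dump)]
    return hashlib.sha256(slice_dump).digest() == hashlib.sha256(payload).digest()


def expected_slices_size(fp_size, dct_part_sizes):
    """H-B: slices_size is the first partition plus all DCT partitions."""
    return fp_size + sum(dct_part_sizes)
```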

Output pattern: erratic partial-write. Frame 0 Y plane has real
content rows 0-535, then 100% zero rows 536-719. UV plane real
rows 0-133, zero 134-359. Frame 1 Y plane real rows 0-23, zero
24-719. Per-frame transitions differ — not buffer-size truncation,
not slot rotation.
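
The per-frame transition row can be located with a helper along these
lines. It is a sketch assuming a tightly packed plane (stride equal to
width); the function name is hypothetical.

```python
def first_zero_tail_row(plane, width, height):
    """Return the first row from which every remaining row is all zero.

    Returns height when the last row still carries content.
    """
    row = height
    while row > 0 and not any(plane[(row - 1) * width:row * width]):
        row -= 1
    return row
```

Applied to the frame 0 and frame 1 Y planes described above, such a
scan would report 536 and 24 respectively.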

Remaining:
- H-D slot rotation (untested; needs instrumentation)
- H-E kernel-side hantro VP8 partial-write quirk (likely; needs
  ftrace / kernel investigation)

iter5b-β did fix Bug 2 for VP8 (pre-β all-zero was format mismatch;
post-β real-but-partial content is a separate kernel-side issue).

Phase 3 hands off 4 candidate directions to user:
- K: continue H-D investigation (1-2h next session)
- L: pivot to H-E kernel-side work (multi-session)
- M: park Bug 6, pick different bug (Bug 4/5 or iter4-B1)
- N: close iter6 PARTIAL, defer Bug 6 to iter7+

Substrate unchanged; no regression. Backend SHA still 2c6ff82c....

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:43:14 +00:00
marfrit bece7b7016 iter6 Phase 2: situation — VP8 control bytes are correct; bug is elsewhere
Empirical byte-diff of libva vs kdirect VP8 control payloads on
current substrate:
- Keyframe (payloads 0+1): BYTE-IDENTICAL (0 diffs / 1232 bytes)
- Inter frames: only 24 bytes diff at offset 1200-1223, which are
  the 3 reference-frame timestamps. libva uses gettimeofday→ns
  (large values), kdirect uses pts-derived (small). Both internally
  consistent; kernel uses them as keys, absolute values don't matter.
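
A byte-diff of two control payloads, of the kind summarised above, can
be sketched in Python as follows. The function name is hypothetical and
the payloads are assumed to be equal-length byte strings.

```python
def diff_ranges(a, b):
    """Return inclusive (start, end) offset ranges where two payloads differ."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, len(a) - 1))
    return ranges
```

An empty result corresponds to the byte-identical keyframe case; a
single range of 1200 to 1223 corresponds to the inter-frame timestamp
difference.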

Verdict: Bug 6 is NOT in vp8.c control generation. The bytes match.
With identical controls and same hardware, libva produces 0.4% pixel
match for keyframe — bug lives in slice-data path, bytesused, cache
coherency, or CAPTURE slot rotation.

5 hypotheses (H-A..H-E) for Phase 3 to narrow:
- H-A slice data corruption in libva path (picture.c memcpy)
- H-B slices_size wrong on OUTPUT QBUF
- H-C cache coherency on OUTPUT mmap before kernel DMA read
- H-D CAPTURE slot rotation mismatch
- H-E other (deeper kernel-side)

Pre-iter5b masked all of these via the OUTPUT format mismatch
producing all-zero output. β fixed format → kernel actually decodes →
underlying bug now visible.

iter3's transitive proof verified specific control fields. Did not
verify slice data, bytesused, cache state, or slot rotation. Same
pattern as iter2's HEVC transitive PASS missing Bug 5. Future
transitive PASS claims must enumerate non-verified artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:17:05 +00:00
marfrit 868d854121 iter6 Phase 0 lock: Candidate G — Bug 6 VP8 partial output
User pick. 6 boolean criteria locked: VP8 libva==kdirect; no regression
on VP9/MPEG-2/H.264-keyframe/HEVC; control-payload anchors hold.

Scope: src/vp8.c, src/picture.c VP8 dispatch + buffer cases,
src/surface.c surface_bind_slot, cap_pool slot lifecycle.
No kernel work. Backend-side fix expected (decode runs through
kernel cleanly; output diverges in slot rotation or partial fill).

Predicted small: 5-50 LOC once root-caused. Phase 2 + Phase 3
likely take more wallclock than Phase 6 implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:37:13 +00:00
marfrit 34e1480de5 iter6 Phase 0: substrate inventory + 5 candidate research questions
iter5b-β surfaced 3 explicit bugs (Bug 4 H.264 inter, Bug 5 HEVC
DQBUF ERROR, Bug 6 VP8 partial output) plus carried backlog items
(iter4-B1 device discrimination, B2-B6, L3, Q6, COLOR_RANGE).

Candidates F-J laid out for user lock:
- F: Bug 5 HEVC kernel-rejection (highest claim-vs-reality stigma)
- G: Bug 6 VP8 partial output (smallest suspect surface)
- H: Bug 4 H.264 inter race (highest consumer impact)
- I: Re-anchor regression hashes on β substrate
- J: iter4-B1 auto-detect harden

Recommendation: G → H → F sequence if multiple iters planned;
otherwise H for impact or J for architectural-cleanup fit.

Phase 1 lock pending user pick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:23:58 +00:00
marfrit 9a14cc2527 iter5b-β Phase 8 close: PARTIAL PASS — VP9 unblocked direct, Bugs 4/5/6 carried to iter6
Iteration shipped (fork tip 70196f8, backend SHA 2c6ff82c... on fresnel):
- VP9 directly verifiable (Phase 1 criterion 1 met for 1 of 3 target codecs)
- MPEG-2 maintained (no regression after Commit D fix-forward)
- H.264 unchanged (Bug 4 deferred per Phase 1 lock)
- Architecture cleaned: CreateSurfaces2 ~70 LOC (single-responsibility),
  CreateContext owns OUTPUT lifecycle, no α'-style failure mode possible.

Surfaced bugs for iter6+:
- Bug 5: HEVC libva DQBUF FLAG_ERROR (pre-existing; iter2's transitive
  PASS verified control payload but not decode outcome)
- Bug 6: VP8 libva produces non-zero non-matching output (slot rotation
  or partial fill, masked pre-β by all-zero state)
- Bug 4: H.264 inter-frame race-loss (carried from iter4 P7)

Lessons distilled to memory:
- feedback_grep_callsites_before_no_change.md (Phase 5 v2 CRIT-2 caught
  request_pool_destroy not in DestroyContext after C3 stripped its
  only per-session caller)
- feedback_trust_iter_comments_for_lifecycle.md (Commit D fix-forward
  surfaced because Phase 4 v2 read but didn't trace context.c:262's
  iter6 ffmpeg-vaapi-copy surfaces_count=0 comment)

Campaign scoreboard: 5/5 with 2 direct (VP9 new, MPEG-2 maintained) +
3 mixed (H.264 keyframe partial, VP8 partial new, HEVC transitive-only
direct-FAIL).

iter6 awaits Phase 0 research-question lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:01:07 +00:00
marfrit c773c3d2c1 iter5b-β Phase 7: PARTIAL PASS — VP9 unblocked, MPEG-2 maintained, HEVC+VP8 partial
Two acts:
Act 1 (β alone): all 5 libva codecs returned all-zero. MPEG-2 was a
regression (pre-β it worked); HEVC was unchanged (kernel returns
DQBUF FLAG_ERROR pre AND post β — same Phase 3 baseline showed it).
Root cause: ffmpeg-vaapi-copy passes surfaces_count=0 to vaCreateContext
per iter6 context.c:262 comment; my β walk of surfaces_ids[] was a
no-op → destination_planes_count stayed 0 → surface_bind_slot no-op
→ all-zero readback.

Act 2 (Commit D): cache format-uniform CAPTURE geometry in driver_data;
walk surface_heap in CreateContext; lazy-fill in CreateSurfaces2 when
fmt_valid is set; invalidate in DestroyContext. Restores MPEG-2 to
pre-β state and unlocks VP9.

Per Phase 1 criteria: criterion 1 PARTIAL (VP9 of HEVC+VP9+VP8);
criteria 2-4 PASS.

Bug 5 (NEW): HEVC libva DQBUF FLAG_ERROR — pre-existing kernel
rejection; β's OUTPUT format fix didn't address it. Transitive proof
at iter2 verified control payload shape but kernel still rejects;
some other V4L2 protocol contract aspect differs from kdirect.

Bug 6 (NEW): VP8 libva produces non-zero output with real content
(74.8% zero + 256 unique bytes incl. keyframe pixels at `93 8e 8a 89...`)
but diverges from kdirect. Decode runs; output mismatch likely
slot-rotation or partial-fill bug.

VP9 is iter5b-β's only clean PASS. Architecture-wise β succeeded:
no α'-style failure mode possible (no in-CreateSurfaces2 destructive
teardown), and the CRIT-1+CRIT-2 fixes from Phase 5 v2 review held.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:56:26 +00:00
marfrit 311411b3f9 iter5b-β Phase 6: 3 commits A+B+C landed on fork, build pending fresnel uptime
Commits: 1c548b1 (codec helper), cc077a0 (config wire-up),
7055b14 (β refactor + CRIT-1 + CRIT-2 + IMP-1 + IMP-2 + dead-field
cleanup). Fork tip 7055b14.

surface.c CreateSurfaces2 reduced from ~250 to ~50 LOC. OUTPUT-side
V4L2 lifecycle moved to context.c CreateContext. DestroyContext
gained request_pool_destroy() (CRIT-2 fix). last_output_* and
surface_reset_format_cache deleted (dead under β).

All 5 Phase 5 v2 amendments (CRIT-1, CRIT-2, IMP-1, IMP-2, IMP-3)
incorporated. Fresnel offline at push time — build+install+verify
deferred to Phase 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:16:28 +00:00
marfrit 3508a2cfeb iter5b Phase 5 v2: 2 CRIT findings — NULL guard + missing request_pool_destroy
CRIT-1: context.c:64-66 video_format==NULL guard rejects every first
β CreateContext. β moves the probe from CreateSurfaces2 into
CreateContext itself, so the guard fires before any new logic runs.
Fix: remove guard, move CAPTURE probe to top of CreateContext.

CRIT-2: DestroyContext lacks request_pool_destroy. Empirical grep
shows only surface.c:220 (which β strips) calls it per-session.
Without amendment, second CreateContext gets pool->initialized=true
with stale slot pointers → QBUF EINVAL. Fix: add request_pool_destroy
to DestroyContext before REQBUFS(0). C3 (surface.c strip) and CRIT-2
fix MUST land together.

Plus IMP-1 (mplane assumption wrong for SUNXI_TILED_NV12) + IMP-2
(surface_reset_format_cache becomes dead under C7) + IMP-3 (error
recovery comment).

Phase 6 BLOCKED pending CRIT-1 + CRIT-2 fixes. Author confirmed
both at code level — Phase 5 caught what Phase 4 v2's surface read
missed ("DestroyContext teardown — no change needed" — wrong; was
incomplete).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:50:08 +00:00
marfrit 5abea730a0 iter5b Phase 4 v2: re-plan with option β — CreateContext-centric OUTPUT lifecycle
Supersedes phase4_iter5b_plan.md (the α' plan rejected at Phase 7).
β architecture: strip OUTPUT-side V4L2 device state from
RequestCreateSurfaces2 entirely; move it to RequestCreateContext
where config_id (and therefore the bound profile) is unambiguously
known. CreateSurfaces2 becomes ID-allocation + per-surface
bookkeeping only.

9 contract clauses (C1..C9). Reuses 2 of 3 reverted iter5b commits
(codec.h/codec.c helper; object_config->pixelformat wire-up at
CreateConfig). New work: C3 strip surface.c, C4 build out
context.c — predicted ~120 LOC into context.c, ~190 LOC stripped
from surface.c (net ~70 LOC delta).

Risk register: 7 items; highest is multi-context resolution change
within shared driver_data (medium impact, mitigated by existing
DestroyContext teardown). α''s destructive teardown failure mode
disappears because β has no in-CreateSurfaces2 teardown branch.

Phase 5 review focus: error-recovery branches in CreateContext,
per-surface destination_* fill semantics (format-uniform fields
at CreateContext vs per-slot fields at BeginPicture), ohm
backwards-compat verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:57:54 +00:00
marfrit 864af258e9 iter5b Phase 7: FAIL — HEVC SIGSEGV, option α' rejected, revert + loopback to β
Empirical sweep on iter5b backend (SHA d7722da...) crashed in
copy_surface_to_image during HEVC libva-vaapi-hwdownload. Coredump
backtrace shows memcpy on stale surface_object->destination_data[i]
pointer — cap_pool_destroy ran during my pixfmt-change teardown
branch, but the subsequent S_FMT got EBUSY because the OUTPUT
queue was already streaming. State corruption mid-decode.

Root cause: ffmpeg-vaapi calls vaCreateSurfaces2 *twice*, with
CreateContext+STREAMON between them. My CreateSurfaces2 gate
destructively tears down cap_pool on pixelformat change but can't
recover when REQBUFS(0) silently fails on a streaming queue.

surface.c:164-171 TODO comment from iter1 anticipated exactly this:
"STREAMOFF + REQBUFS(0) + new S_FMT + new CREATE_BUFS — that's a
context-level redesign for the next iteration." Phase 4 dismissed
the comment as targeting multi-resolution mid-stream. That
dismissal was wrong; ffmpeg-vaapi triggers the same code path.

3 reverts on fork master: 4b2288f, f8256e6, ce304ef reverted by
709ab34, 9a7f888, 6bc29ec. Backend rebuilt + reinstalled on fresnel
at iter4-tip SHA 6e90b7a9.... Post-revert HEVC libva returns the
pre-iter5b broken-but-non-crashing all-zero pattern.

Per Phase 1 lock: criterion 1 FAIL (HEVC/VP9/VP8 still all-zero);
criteria 2-4 PASS (no regression on MPEG-2/H.264 keyframe/control
payloads). iter5b does not close.

Phase 7 → Phase 4 loopback: re-plan as option β (defer OUTPUT-side
S_FMT+CREATE_BUFS to CreateContext where config_id is known and
streams haven't started). User pick: revert + re-plan with β.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:46:16 +00:00
marfrit 550bb81a3e iter5b Phase 6: 3 commits A+B+C landed clean, backend installed on fresnel
Fork tip 4b2288f. Backend SHA256 d7722da742bfcb86a9136b07e6d9a5de23668f37fcad328258966c5338265e82
on /usr/lib/dri/v4l2_request_drv_video.so (pre-iter5b was 6e90b7a9b2c33480...).

LOC: 188 across 5 modified files + 2 new (codec.h, codec.c). All 4
Phase 5 amendments (CRIT-1 + 3 IMPs) incorporated in the actual
commits, no follow-ups needed.

Phase 7 sweep ready: re-run /tmp/iter5_p3/sweep.sh on fresnel; expect
libva == kdirect == sw for HEVC + VP9 + VP8 (3 codecs unblocked);
MPEG-2 unchanged; H.264 unchanged (Bug 4 deferred to iter6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:41:04 +00:00
marfrit 7d1c44bd90 iter5b Phase 5: review — CRIT-1 mechanical pseudocode fix, 3 IMP amendments
Sonnet-architect found one Critical pseudocode error and three
Important amendments. All mechanical; no structural plan change.

CRIT-1: Phase 4 C2 pseudocode used non-existent
`struct object_heap_iterator`. Actual API at object_heap.h:67-68 uses
`int *iterator`. Author re-verified vs request.c:411-418 canonical
usage. Verbatim paste would have compile-failed.

IMP-1: gate comment at surface.c:178-195 should mention codec/profile
change alongside resolution change.

IMP-2: dead `object_config->pixelformat` field at config.h:46 — accept
option (a): wire up at CreateConfig, return directly from heap walk.
Saves one pixelformat_for_profile() call in surface.c path.

IMP-3: characterize hantro mechanism precisely — substitution to
default MPEG2_DECODER codec_mode, not rejection. Explains why MPEG-2
worked but VP8 didn't pre-fix.

10 contract clauses scorecard: 1 FAIL (C2), 2 CONDITIONAL (C3, C10),
7 PASS. Phase 6 cleared conditionally pending all 4 amendments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 08:04:38 +00:00
marfrit eca03d2641 iter5b Phase 4: plan — option α' (single-config lookup), 10 contract clauses
Picks α' over the Phase 2 recommendation of β: smaller scope (~50 LOC
vs ~250), targets iter5b's actual bug (wrong OUTPUT format at INITIAL
CreateSurfaces2, not the multi-resolution mid-stream case the
surface.c:164-171 TODO comment anticipates).

Patches:
- C1/C6: NEW src/codec.{h,c} + meson.build — pixelformat_for_profile()
- C2: NEW find_sole_active_profile() static helper in surface.c
- C3: Replace surface.c:173 hardcode with profile-derived lookup
- C5: Extend last_output_* gate with pixelformat

Phase 7 expected post-fix matrix: HEVC + VP9 + VP8 libva == kdirect
== sw (3 codecs unblocked); MPEG-2 unchanged (already worked);
H.264 still race-loses inter frames (Bug 4, deferred to iter6).

Phase 5 review concerns laid out: helper completeness, heap iterator
API, gate semantics, hantro CAPTURE-derivation on correct format,
mpv probe-then-real flow, memory rule placement.

Option β deferral note: cleaner refactor exists but not necessary
for iter5b's bug; defer to future iteration when multi-resolution
mid-stream becomes a target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:24:35 +00:00
marfrit 6b0e023e7f iter5b Phase 2: situation — lifecycle traced, option β (defer to CreateContext) recommended
VA-API lifecycle traced: CreateConfig stores profile in object_config;
CreateSurfaces2 has NO config_id, can't access profile; CreateContext
takes VAConfigID and already does profile-switch for h264_start_code
(context.c:205-217, iter4 fix-forward 692eaa0).

surface.c:164-171 already flags this as deferred-work in a TODO comment:
"that's a context-level redesign for the next iteration." iter5b picks
up that deferred work.

Three options analyzed empirically:
- α: thread current_profile through driver_data (15 LOC, fragile semantic)
- β: move OUTPUT-side lifecycle to CreateContext (80-150 LOC, clean)
- γ: lazy at BeginPicture (architecturally wrong site)

Recommendation: option β. iter4 reviewer accepted the deferred-work
flag in surface.c; iter5b is the iteration that addresses it.

object_config->pixelformat field at config.h:46 is declared but never
assigned — opportunity for wiring up cleanly via the profile→pixelformat
map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:12:58 +00:00
marfrit cd34ec1918 iter5 Phase 0 loopback: real Bug 2 is surface.c:173 hardcoded OUTPUT format
Empirical strace of all 5 codecs through libva shows VIDIOC_S_FMT on
OUTPUT_MPLANE ships pixelformat V4L2_PIX_FMT_H264_SLICE for EVERY
profile. HEVC controls submitted on H264_SLICE OUTPUT → kernel rkvdec
silently rejects/no-ops → CAPTURE stays in cap_pool init (all-zero).

Per-codec Bug 2 taxonomy:
- HEVC, VP9, VP8: OUTPUT format mismatch on rkvdec/hantro-strict → 100% zero
- MPEG-2: format mismatch but hantro tolerates → works
- H.264: format right by coincidence; keyframe decodes, inter all-zero
  (Bug 4, separate, deferred from iter5b)

Site: src/surface.c:173 `unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE`.
Same bug class as feedback_unconditional_codec_state.md
(iter4 h264_start_code = true).
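
For illustration, a profile-keyed lookup of the kind the fix needs
might look like this in Python. It is a sketch only: the helper name is
hypothetical, the profile and pixel-format identifiers are given as
strings, and the set of profiles is an assumed one-per-codec sample
rather than the backend's actual list.

```python
# Assumed sample: one VA-API profile per codec, mapped to the V4L2
# stateless OUTPUT pixel format for that codec.
OUTPUT_PIXELFORMAT = {
    "VAProfileH264Main": "V4L2_PIX_FMT_H264_SLICE",
    "VAProfileHEVCMain": "V4L2_PIX_FMT_HEVC_SLICE",
    "VAProfileVP9Profile0": "V4L2_PIX_FMT_VP9_FRAME",
    "VAProfileVP8Version0_3": "V4L2_PIX_FMT_VP8_FRAME",
    "VAProfileMPEG2Main": "V4L2_PIX_FMT_MPEG2_SLICE",
}


def output_pixelformat(profile):
    """Return the OUTPUT pixel format for a profile, or None if unsupported."""
    return OUTPUT_PIXELFORMAT.get(profile)
```

Returning None for an unknown profile, instead of falling back to
H264_SLICE, keeps a mismatch visible at the call site rather than
letting it surface later as all-zero CAPTURE output.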

iter5b new Phase 1: fix surface.c to switch pixelformat on
config_object->profile. 4 criteria locked, all backend-side, no kernel
patches. RFC v2 series filed back to backlog for a future
DMABUF-import-consumer campaign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:21:41 +00:00
marfrit 0adfb11fff iter5 Phase 5: review CRIT-1 invalidates Phase 4 — loop back to Phase 0/3
Sonnet-architect review found that the RFC v2 fix mechanism does not
reach the libva backend's consumer path:
- Backend uses V4L2_MEMORY_MMAP for both OUTPUT + CAPTURE buffers.
- For MMAP buffers, vb->planes[].dbuf stays NULL.
- RFC v2 helper's plane loop skips planes with !dbuf, so the fence is
  attached to no dma_resv.
- EXPBUF (vb2_dc_get_dmabuf) creates a fresh disjoint dma_resv.
- The fence-mechanism fix would be a no-op for the cap_pool path even
  if it did reach the right resv, because RequestSyncSurface already
  blocks on media_request_wait_completion + v4l2_dequeue_buffer.

Three alternative root-cause hypotheses for Phase 0/3 to disambiguate:
cache coherency, cap_pool slot-rotation bug, or a separate-sync gap
in vaDeriveImage/vaMapBuffer that bypasses RequestSyncSurface.

Phase 5 saved ~half a session of build-install-test wallclock that
would have ended in a Phase 7 → Phase 0 loopback anyway.

Three Important + 2 Minor findings also recorded for when iter5 reopens.

User pick: loop back to Phase 0/3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 10:56:11 +00:00
marfrit a809e9c0b8 iter5 Phase 4: plan — 4 patches + manifest diff + PKGBUILD bump
12 contract clauses (C1..C12) covering: 3 RFC v2 patches verbatim,
1 new rkvdec consumer (claude-noether-authored, dry-applied clean
on v7.0 in worktree test), kernel-agent patches/ scope tag +
fleet/fresnel.yaml diff, marfrit-packages PKGBUILD bump 7.0-1 → 7.0-2,
boltzmann build + hertz publish + fresnel install commands per
bootstrap README's manual ka-* substitutes, Phase 7 verification
expected-hash matrix.

Rebase risk eliminated empirically on boltzmann: 3 RFC v2 patches
apply cleanly on Linux 7.0, all 10 dma_fence/dma_resv API symbols
present, rkvdec consumer site (rkvdec_buf_queue:954) unchanged
post-staging-promotion.

Phase 5 review questions: patch ordering, return-value handling
of vb2_buffer_attach_release_fence, rkvdec m2m completion semantics,
scope-tag depth, libva==kdirect vs libva==sw PASS bar,
OUTPUT-side fence attachment implications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 07:40:44 +00:00
marfrit 3c05564e99 iter5 Phase 3: baseline — 4/5 libva codecs race-lose, MPEG-2 wins, kdirect clean
5-codec sweep matrix on linux-fresnel-fourier 7.0-1 confirms:
- libva path returns all-zero cap_pool init pattern for H.264 (mostly),
  HEVC, VP9, VP8 (always). MPEG-2 wins the race (fastest hantro decode).
- kernel-direct ffmpeg-v4l2request hwdownload byte-matches SW for all
  4 race-losing codecs.
- B4 cosmetic init-probe EINVAL noise reproduced on hantro (2 ioctls per
  codec); MPEG-2 + VP8 stateless control submissions follow and return 0.

iter4 P7's "RGB(0,0x4c,0)" pattern corrected to all-zero raw bytes
(the 0x4c was YUV→RGB conversion of all-zero NV12). Same SHA shape
as iter3's hantro b34860e0 blocker fingerprint.

Control-payload strace anchors persisted as phase-7 invariants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 05:14:57 +00:00
marfrit 9941523f1f iter5 Phase 2: situation analysis — 4-patch plan (3 RFC v2 + 1 new rkvdec consumer)
Source-read complete: 3 RFC v2 patches dissected, v7.0 rkvdec_buf_queue
site identified at line 954 of drivers/media/platform/rockchip/rkvdec/rkvdec.c,
empirical disproof of Bug 3 UAPI drift via byte-identical v6.12↔v7.0 struct
diff, hantro_v4l2.c confirmed unchanged across the same range.

Rebase risk concentrated in videobuf2-core.c (medium — vb2 core sees regular
activity); deferred to Phase 4 when boltzmann is reachable for the
git apply --3way verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 03:58:07 +00:00
marfrit 31b9255d63 iter5 Phase 0 amend: Bug 3 collapses, locked criteria 5→4
Phase 2 source-read mid-execution found that v4l2_ctrl_mpeg2_*
and v4l2_ctrl_vp8_frame are byte-identical v6.12 ↔ v7.0 mainline.
On-fresnel re-trace with correct hantro-decoder bind shows MPEG-2
controls submit successfully (ioctl returns 0); the "Unable to set control(s)" log noise
is the backend's H.264/HEVC init-probe EINVAL on a non-H.264 device
(B4 backlog), not a UAPI drift.

iter5 locked scope is now vb2_dma_resv (4 patches: 3 existing
operator-authored RFC v2 + new rkvdec consumer). Criteria reduced
from 5 to 4. B4 stays in backlog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 03:55:36 +00:00
marfrit 8acfca3fe0 iter5 Phase 0: lock Candidate B — vb2_dma_resv + hantro UAPI drift in linux-fresnel-fourier
Five Phase 1 criteria: Bug 2 closed (cap_pool readback returns real
pixels through libva); Bug 3 closed (hantro MPEG-2 + VP8 controls
accepted on new kernel); patches ship from kernel-agent (local-carry
acceptable, mainline bonus); zero codec-contract regression vs iter4;
5/5 direct-verification block restored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:54:54 +00:00
marfrit 9d2b7c1944 iter4 Phase 7 close: Option-A transitive proof complete — VP9 PASS 4/5
Leg 1: FRAME control 168/168 bytes byte-identical to kernel-direct anchor.
Leg 2: COMPRESSED_HDR 1950/2040 match; 90-byte uv_mode[10][9] delta is the
       documented S4 carve-out (rkvdec persistent kernel table).
Leg 3: kernel-direct YUV (NV12→YUV420P, 3 frames @1280x720) SHA256-identical
       to libvpx-vp9 SW reference: 4f1565e89cd720c4eb6e59d8bbb46127b02cf13102911afc4e174925e5b36094

iter4 criteria 1+2+3 direct PASS, 4 transitive PASS, 5 carried as substrate
issue (cap_pool readback, Bug 2 + hantro UAPI drift, Bug 3) outside iter4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:01:09 +00:00
marfrit f510ac6be5 iter4 Phase 7 pause: fork fix-forward 692eaa0, awaiting fresnel return for transitive-proof closure
Mid-Phase-7 fix-forward landed on fork
(marfrit/libva-multiplanar:692eaa0): unconditional
context_object->h264_start_code = true was prepending 0x00 0x00 0x01
to VP9 slice data, shifting the rkvdec bitstream by 24 bits and
producing silent decode failure. Now gated on
config_object->profile (H.264 + HEVC only).
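
The gating can be modelled in a few lines of Python. This is a sketch
with hypothetical names, using short codec labels in place of VA-API
profile values.

```python
START_CODE = b"\x00\x00\x01"
START_CODE_CODECS = {"H264", "HEVC"}


def build_slice_payload(codec, slice_data):
    """Prepend the start code only for codecs that expect one."""
    if codec in START_CODE_CODECS:
        return START_CODE + slice_data
    return slice_data
```

With the unconditional variant, a VP9 payload would grow by
len(START_CODE) bytes, which is the 24-bit shift described above.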

Empirical verification when fresnel was online: post-fix VP9 keyframe
FRAME control bytes 0-23 byte-match Phase 3 anchor:
  lf.flags=0x03 (DELTA_ENABLED|DELTA_UPDATE) — was 0x01
  base_q_idx=0x2e=46 — was 0x41=65

This is the transitive-proof leg-1 (backend-payload == kernel-direct-payload)
for the iter4 keyframe.

Open verification when fresnel returns:
- Full 168-byte FRAME control diff, mine vs Phase 3 anchor
- Full 2040-byte COMPRESSED_HDR control diff
- ffmpeg-v4l2request kernel-direct VP9 decode + hwdownload pixels =
  Phase 3 SW reference (transitive-proof leg-2)

If both legs PASS, iter4 closes 5/5 (4 direct from earlier iters
+ 1 transitive iter4) per Option-A choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 10:18:07 +00:00
marfrit d87c940788 iter4 Phase 7: criterion 1+2+3 PASS, criterion 4+5 FAIL — three bug classes identified
Verification on linux-fresnel-fourier 7.0-1:

PASS:
- Criterion 1: vainfo enumerates VAProfileVP9Profile0 via auto-detect.
- Criterion 2: vaCreateConfig SUCCESS (implicit).
- Criterion 3: ffmpeg-vaapi VP9 5-frame decode exit 0 at 0.307x, no
  ioctl errors.

FAIL — three distinguishable bug classes:

Bug 1 (VP9-specific, my Clause 6 parser):
  Strace of frame-1 keyframe FRAME control vs Phase 3 anchor:
  - byte 8 (lf.flags): mine=0x01 (DELTA_ENABLED only) vs ref=0x03
    (ENABLED|UPDATE).
  - byte 16 (base_q_idx): mine=0x41 (65) vs ref=0x2e (46).
  - byte 17 (delta_q_y_dc): mine=8 vs ref=0.
  Bit-trace shows my parser is 2 bits ahead of correct position by
  the time it reaches lf_delta_enabled. Fix path: faithful port of
  FFmpeg vp9.c::decode_frame_header.

Bug 2 (substrate-wide, cap_pool readback):
  Constant RGB(0, 0x4c, 0) "0x4c gray" pattern across all codecs
  (VP9, HEVC, MPEG-2, VP8). H.264 keyframe DOES read correctly with
  real RGB(0, 0xe3, 0) content; H.264 inter frames revert to 0x4c.
  Kernel decode succeeds (Phase 3 strace + ffmpeg-v4l2request
  standalone confirm). libva readback returns cap_pool init scratch.
  Sibling of iter3 dma_resv blocker but with different signature
  (constant 0x4c instead of all-zero 0x00).

Bug 3 (hantro UAPI drift):
  MPEG-2 + VP8 produce kernel "Unable to set control(s): Invalid
  argument" errors. UAPI struct sizes/fields likely shifted between
  6.19.9 and 7.0 (sibling of Phase 3 VP9 struct-size correction
  144/1947 -> 168/2040).

Three loopback options proposed (decision pending user):
- A: VP9-only fix (Clause 6 parser); accept Bug 2/3 as substrate
     pre-existing; criterion 4 transitive-only per iter3.
- B: Full loopback covering all 3 bugs; possibly requires kernel
     patches (vb2_dma_resv RFC v2).
- C: Phase 0 reset; substrate is the primary issue; pause iter4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 07:20:51 +00:00
marfrit 42b9ec333a iter4 Phase 6: 4 commits landed (Z+A+B+C), ffmpeg-vaapi VP9 decode PASS
Fork at marfrit/libva-multiplanar tip beaa914:
- Z (7f8fa93) device-path auto-detect via media controller topology;
  walk /dev/media*, MEDIA_IOC_DEVICE_INFO match, MEDIA_IOC_G_TOPOLOGY
  -> MEDIA_INTF_T_V4L_VIDEO -> resolve via /sys/dev/char.
  LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 escape hatch.
- A (16b3973) src/config.c VP9 enumeration + dispatch + entrypoints.
- B (406d08e) NEW src/vp9.c (~750 LOC: VPX rac + inv_map_table +
  uncompressed-header partial parser + compressed-header parser +
  vp9_set_controls) + src/vp9.h + meson.build + context.h
  (persistent vp9_lf state for Phase 5 C2) + surface.h
  (params.vp9 union extension).
- C (beaa914) src/picture.c VP9 dispatcher + 2 buffer-type cases.
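
The last hop of Commit Z (interface major:minor to a /sys/dev/char entry) can be sketched as below. This covers only the path construction; the ioctl walk is omitted, and `sysfs_char_path` is a hypothetical name, not the fork's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path for a character device from the major:minor
 * pair reported for a V4L interface in the media topology.
 * Returns 0 on success, -1 if buf is too small. */
static int sysfs_char_path(unsigned int major, unsigned int minor,
                           char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/dev/char/%u:%u", major, minor);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

On Linux that path is a symlink into the device tree, and the basename of its target (for example video1) names the matching /dev node.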

NO Commit D — buffer.c allow-list already permissive for VP9's 3
buffer types (Picture, Slice, SliceData; all in iter3 baseline).

Phase 5 amendments all in code: C1 no-XOR direct, C2 persistent
vp9_lf with VP9 spec defaults, C3 out_reference_mode parameter,
C4 NO_AUTODETECT escape, S4 uv_mode memcpy omitted.
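
The C2 amendment (persistent loop-filter deltas) roughly follows this shape. It is a sketch under simplifying assumptions: `lf_state` and both functions are hypothetical names, and the whole struct is replaced on update, whereas a real parser may update individual entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-context state standing in for the fork's vp9_lf. */
struct lf_state {
    int8_t ref_deltas[4];
    int8_t mode_deltas[2];
};

/* Defaults as quoted in the review: {1,0,-1,-1} and {0,0}. */
static void lf_state_init(struct lf_state *s)
{
    static const struct lf_state defaults = { { 1, 0, -1, -1 }, { 0, 0 } };

    *s = defaults;
}

/* Overwrite the persistent state only when the header carried an
 * update; either way, out receives the values for the kernel control. */
static void lf_state_apply(struct lf_state *s, bool delta_update,
                           const struct lf_state *parsed,
                           struct lf_state *out)
{
    if (delta_update)
        *s = *parsed;
    *out = *s;
}
```

The point of the design is that frames without an update still send the last known deltas rather than zeros.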

Plan amendment to Commit Z section in phase4_iter4_plan.md
documents the canonical media-topology approach (replacing the
original /dev/video* walk).

Verification empirically on fresnel:
- Criterion 1: vainfo enumerates VAProfileVP9Profile0 alongside
  H.264 + HEVC under auto-detect rkvdec.
- Criterion 2: implicit via successful ffmpeg run.
- Criterion 3: ffmpeg-vaapi VP9 5-frame decode exit 0 at
  0.307x speed, no ioctl errors.
- Criterion 4: deferred to Phase 7 verification.
- Criterion 5: rkvdec codecs work without env override; hantro
  (MPEG-2/VP8) still need env override per iter4-B1 backlog.

Open iter4 backlog: B1 (multi-decoder dispatch refactor),
B2 (mpv-vaapi Could-not-create-device — ffmpeg-vaapi works fine
through same backend, mpv does not), Q6 (per-segment ALT_Q
mapping for non-BBB), COLOR_RANGE (VAAPI gap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:55:45 +00:00
marfrit 9865416ed2 iter4 Phase 5: sonnet-architect review — 4 Critical findings, all amendments incorporated
Review by sonnet-architect with cold-context source reads of fork +
kernel UAPI + VAAPI + FFmpeg references + kernel rkvdec source.
Reviewer applied Direction 2 (empirical-over-theoretical) by
test-compiling struct sizes, gcc-c-checking VAAPI field accesses,
and source-tracing FFmpeg's filter-mode XOR provenance.

Critical findings (all empirically validated by author before
incorporation per feedback_review_empirical_over_theoretical.md):

C1 - interpolation_filter double-XOR: vaapi_vp9.c:62 ALREADY applies
     `filtermode ^ (filtermode <= 1)` when filling VAAPI's
     mcomp_filter_type. Plan's second XOR was incorrect; would swap
     EIGHTTAP and EIGHTTAP_SMOOTH for inter frames -> wrong
     loop-filter strength. Fix: direct assignment, no XOR.

C2 - LF deltas not persistent: kernel UAPI explicitly says
     "users should pass its last value" when delta_update=0. Plan
     memset-zeroed each frame; would send {0,0,0,0,0,0} on BBB inter
     frames instead of {1,0,-1,-1,0,0}. Fix: add persistent vp9_lf
     state to object_context, init to VP9 spec defaults, update only
     when parser sees delta_update=1, always copy to kernel control.

C3 - reference_mode out-parameter missing: reference_mode lives in
     FRAME struct, not COMPRESSED_HDR. Plan referenced
     `compressed_hdr_reference_mode` placeholder which would be an
     undefined identifier -> compile failure. Fix: add
     `uint8_t *out_reference_mode` param to vp9_fill_compressed_hdr;
     derive `allowcompinter` at call site from the 3 sign biases.

C4 - Mitigation B scope claim overstated: walk-and-pick-first always
     selects rkvdec on 7.0 (since video1 enumerates first). Hantro
     codecs (MPEG-2, VP8) at video3 still require env override.
     Fix: qualify criterion-5 trace; add
     LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 escape hatch for legacy callers.
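
Finding C1 rests on a small property of the quoted expression, shown here in isolation. `remap_filter` is a hypothetical wrapper around the `filtermode ^ (filtermode <= 1)` form cited from vaapi_vp9.c.

```c
#include <assert.h>

/* Swaps values 0 and 1 and leaves every other value unchanged. */
static unsigned remap_filter(unsigned filtermode)
{
    return filtermode ^ (filtermode <= 1);
}
```

Applying the remap twice is the identity, so a second XOR in the backend would undo the one already applied upstream and swap filter values 0 and 1 again.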

6 Suggested (S1-S6): all confirm plan correctness OR are scope-
aligned non-issues. S4 (uv_mode memcpy omission safe for rkvdec)
baked into Clause 9 amended text.

Without this review, iter4 Phase 6 would have failed first compile
(C3) + produced wrong inter-frame output (C1+C2) + caused user
confusion (C4). Estimated saving: 1 compile failure + 1 Phase 7 ->
Phase 4 loopback + 1 doc correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 05:49:13 +00:00
marfrit 4b36077b17 iter4 Phase 4: plan locks 12 contract clauses + Mitigation B
5-commit plan (Z, A, B, C, optional D):
- Commit Z: src/request.c — walk /dev/video* + /dev/media*, match by
  driver name in {rkvdec, hantro-vpu, cedrus, sun4i_csi}; restores
  baseline functionality on 7.0 (where /dev/video0 is rockchip-rga).
- Commit A: src/config.c — VAProfileVP9Profile0 enumeration + dispatch
  + entrypoints (~16 LOC, 1 file).
- Commit B: NEW src/vp9.c + .h + meson — 12 contract clauses; ~580 LOC
  vp9.c (50 infra + 80 VPX rac + 50 uncompressed-header partial parse +
  180 compressed-header parser + ~200 frame-fill).
- Commit C: src/picture.c + surface.h — VP9 dispatch + 2 buffer-type
  cases + union extension; NO BeginPicture reset (VP9 has no
  iqmatrix_set-style flags).
- Commit D: optional fix-forward placeholder (predicted no-op per
  feedback_runtime_enumerates_allowlists.md).
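
The driver-name match planned for Commit Z could look roughly like this. The name list is the one quoted above; `is_candidate_driver` is a hypothetical name, and the device enumeration itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Driver names accepted by the planned walk. */
static const char *const candidate_drivers[] = {
    "rkvdec", "hantro-vpu", "cedrus", "sun4i_csi",
};

/* True when the driver string reported by the device is in the list. */
static bool is_candidate_driver(const char *name)
{
    size_t n = sizeof(candidate_drivers) / sizeof(candidate_drivers[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(name, candidate_drivers[i]) == 0)
            return true;
    return false;
}
```

A non-codec node such as rockchip-rga is rejected, which is what restores baseline behaviour when it occupies /dev/video0.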

Total ~699 LOC, 7 files.

12 contract clauses include 2 NEW vs iter3:
- Clause 3: compile-time _Static_assert sizeof v4l2_ctrl_vp9_frame ==
  168 && ..._compressed_hdr == 2040 (any UAPI shift fails loudly).
- Clause 6: uncompressed-header partial parse for lf_delta_* +
  base_q_idx (VAAPI doesn't expose; BBB keyframe needs non-zero
  ref_deltas={1,0,-1,-1} per Phase 3 anchor).
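
A sketch of what the Clause 3 guard checks, written against a local mirror struct so it compiles without kernel headers. The mirror layout and field names are assumptions based on a reading of the kernel UAPI; production code would assert on struct v4l2_ctrl_vp9_frame itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors for illustration only (assumed layout). */
struct lf_mirror {
    int8_t  ref_deltas[4];
    int8_t  mode_deltas[2];
    uint8_t level, sharpness, flags;
    uint8_t reserved[7];
};

struct quant_mirror {
    uint8_t base_q_idx;
    int8_t  delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
    uint8_t reserved[4];
};

struct seg_mirror {
    int16_t feature_data[8][4];
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};

struct vp9_frame_mirror {
    struct lf_mirror    lf;
    struct quant_mirror quant;
    struct seg_mirror   seg;
    uint32_t flags;
    uint16_t compressed_header_size, uncompressed_header_size;
    uint16_t frame_width_minus_1, frame_height_minus_1;
    uint16_t render_width_minus_1, render_height_minus_1;
    uint64_t last_frame_ts, golden_frame_ts, alt_frame_ts;
    uint8_t  ref_frame_sign_bias, reset_frame_context, frame_context_idx;
    uint8_t  profile, bit_depth, interpolation_filter;
    uint8_t  tile_cols_log2, tile_rows_log2, reference_mode;
    uint8_t  reserved[7];
};

/* Any layout shift fails the build instead of failing at runtime. */
_Static_assert(sizeof(struct vp9_frame_mirror) == 168, "FRAME size drift");
_Static_assert(offsetof(struct vp9_frame_mirror, lf.flags) == 8,
               "lf.flags moved");
_Static_assert(offsetof(struct vp9_frame_mirror, quant.base_q_idx) == 16,
               "base_q_idx moved");
```

The offsets 8 and 16 are the same byte positions used when comparing strace payloads against the Phase 3 anchor.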

7 Phase 5 review questions queued, all empirical-leaning per
feedback_review_empirical_over_theoretical.md Direction 2:
parser-vs-bitstream cross-check, FFmpeg-XOR-remap validation,
struct-size stability, mitigation B regression risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:10:47 +00:00
marfrit 56abe3d6a2 iter4 Phase 3: VP9 baseline + 4-codec regression on 7.0 substrate
Captured on linux-fresnel-fourier 7.0-1 (post 6.19 decommission).

VP9 baseline (kernel-direct via ffmpeg-v4l2request on rkvdec):
- 5-frame SW reference PNG SHA256 anchors (criterion-4)
- VIDIOC_S_EXT_CTRLS strace with full payload at -s 16384
- Empirical struct sizes 168 B (FRAME) / 2040 B (COMPRESSED_HDR)
  supersede Phase 2 estimates of 144 / 1947
- Probe pattern: count=1 (FRAME-only) then count=2 (FRAME + COMPRESSED_HDR)

Phase 2 doc fix: control IDs corrected 0xa40b2c/d -> 0xa40a2c/d.
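
The corrected IDs can be re-derived from the control-class arithmetic. The macro names below are local stand-ins (the X_ prefix is hypothetical); the numeric offsets are my reading of linux/v4l2-controls.h and should be checked against the installed header.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel macros (assumed values). */
#define X_CLASS_CODEC_STATELESS  0x00a40000u
#define X_CID_STATELESS_BASE     (X_CLASS_CODEC_STATELESS | 0x900u)
#define X_CID_VP8_FRAME          (X_CID_STATELESS_BASE + 200u)
#define X_CID_VP9_FRAME          (X_CID_STATELESS_BASE + 300u)
#define X_CID_VP9_COMPRESSED_HDR (X_CID_STATELESS_BASE + 301u)

/* 0x900 + 300 = 0xa2c, hence 0xa40a2c rather than 0xa40b2c. */
_Static_assert(X_CID_VP9_FRAME == 0xa40a2cu, "VP9 FRAME id");
_Static_assert(X_CID_VP9_COMPRESSED_HDR == 0xa40a2du, "VP9 HDR id");
_Static_assert(X_CID_VP8_FRAME == 0xa409c8u, "VP8 FRAME id");
```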

4-codec regression (H.264, MPEG-2, HEVC, VP8): all fall back to SW on
default config because /dev/video0 is now rockchip-rga (RGB color
converter), not a codec device. Fork hardcodes /dev/video0 in
request.c:149. Env override LIBVA_V4L2_REQUEST_VIDEO_PATH /
_MEDIA_PATH restores per-driver profile enumeration; mitigation A/B/C
queued for user decision.
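
The env override amounts to a lookup with a fallback, sketched here. `pick_path` and `video_path` are hypothetical names, and treating an empty value as unset is an assumption, not confirmed fork behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Prefer a non-empty override, otherwise use the fallback path. */
static const char *pick_path(const char *override, const char *fallback)
{
    return (override != NULL && override[0] != '\0') ? override : fallback;
}

/* Thin wrapper reading the variable named in the log. */
static const char *video_path(void)
{
    return pick_path(getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH"), "/dev/video0");
}
```

Keeping the decision in a pure helper makes the fallback testable without touching the process environment.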

New contract clauses surfaced:
- Clause 11: uncompressed-header partial parse for lf_delta /
  base_q_idx (VAAPI doesn't expose these; keyframe ref_deltas non-zero
  for BBB so leave-at-zero is wrong)
- Clause 12: compile-time sizeof asserts on the two control structs
  so future UAPI shifts fail loudly

iter4_phase3.tgz: full Phase 3 artifact bundle (strace + PNG refs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 20:31:53 +00:00
claude-noether 2651e4cfdf iter4 Phase 2: situation analysis — VP9 backend gaps + compressed-
header parser requirement

Source-read of every file the iter4 patch series will touch, plus
kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference
sources. Conducted on noether against fork tip e1aca9c (iter3 close).

Critical scope-shaping finding: rkvdec on RK3399 REQUIRES
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (not optional). Per
drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble
lines 752-754:

  ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
                        V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
  if (WARN_ON(!ctrl))
      return -EINVAL;

VAAPI does NOT expose compressed-header probability updates
(va_dec_vp9.h:50-192 — only frame parameters + segmentation;
vendor VAAPI drivers parse compressed header in firmware/GPU).
Therefore the libva backend MUST parse the compressed header
itself via a VPX boolean decoder + inv_map_table[]. ~150-200 LOC
of bitstream parsing logic (port from FFmpeg
v4l2_request_vp9.c::fill_compressed_hdr).

Bug enumeration (12 sites):

  B1   config.c::RequestQueryConfigProfiles    enum block missing
  B2   config.c::RequestCreateConfig           VP9 case missing
  B3   config.c::RequestQueryConfigEntrypoints VP9 case missing
  B4   src/vp9.c                               new file ~500-600 LOC
  B5   src/vp9.h                               new file ~35-45 LOC
  B6   src/vp9_rac.h                           NEW or inline (Phase 4
                                                 plan locks Option A:
                                                 inline in vp9.c)
  B7   picture.c::codec_set_controls           VP9 dispatch missing
  B8   picture.c::codec_store_buffer           2 buffer-type cases
                                                 (Picture + Slice;
                                                 NOT 4 like VP8)
  B9   picture.c::RequestBeginPicture          predicted no reset
                                                 needed (no flag-state
                                                 like VP8 iqmatrix_set)
  B10  surface.h::object_surface::params union vp9 member missing
  B11  meson.build                             vp9.c/vp9.h not in lists
  B12  buffer.c                                predicted no change
                                                 needed (VP9 uses
                                                 Picture/Slice/SliceData
                                                 — all whitelisted)

Non-bugs (intentionally untouched): context.c (no DECODE_MODE/
START_CODE menus per FFmpeg ref), video.c (CAPTURE-side format
list), v4l2.c (fourcc-agnostic), include/hevc-ctrls.h (already
includes <linux/v4l2-controls.h>).

Contract surface cited verbatim:

  V4L2_CID_STATELESS_VP9_FRAME = 0xa40b2c (~144 bytes — much
    smaller than VP8's 1232 bytes because VP9_FRAME carries no
    entropy table; that's in COMPRESSED_HDR)
  V4L2_CID_STATELESS_VP9_COMPRESSED_HDR = 0xa40b2d (~1947 bytes
    — coef[4][2][2][6][6][3] alone is 1728 bytes)
  Per-frame submission: 2 controls batched in single S_EXT_CTRLS
  v4l2_request_vp9.c references confirmed: 2-control shape,
    runtime-probed COMPRESSED_HDR availability (rkvdec advertises
    it; we MUST provide)
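
The 1728-byte figure for the coefficient table follows directly from its dimensions, assuming one byte per probability. The typedef name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Dimensions as quoted for coef[]: 4*2*2*6*6*3 entries of one byte. */
typedef uint8_t vp9_coef_probs[4][2][2][6][6][3];

_Static_assert(sizeof(vp9_coef_probs) == 1728, "coef table size changed");
```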

VAAPI buffer types: 2 per frame (Picture + Slice) vs iter3 VP8's
4. NO Probability buffer (VP9 keeps probs in compressed header).
NO IQMatrix (VP9 keeps quant in slice's per-segment seg_param[8]).

VAAPI → V4L2 mapping table: 30+ fields enumerated. Several gap
candidates identified for Phase 3 empirical resolution:

  Q1 lf.ref_deltas/mode_deltas/flags — not in VAAPI; FFmpeg reads
     from VP9Context internal. BBB likely zero.
  Q2 quant.base_q_idx + deltas — VAAPI exposes only effective
     per-segment scales. Inverse-derive needed.
  Q3 reference_mode — not in VAAPI. Default to SELECT?
  Q4 interpolation_filter mapping (FFmpeg ^ remap)
  Q5 reset_frame_context off-by-one (FFmpeg > 0 ? - 1 : 0)
  Q6 Per-segment feature_data[8][4] derivation from VAAPI's
     effective scales is non-trivial
  Q7 mpv 0.41.0 VP9 hwdec engagement (per memory
     feedback_hw_decode_engagement_check.md; known gap from iter3 VP8)
  Q8 rkvdec dma_resv issue? (predicted NO based on iter1+iter2
     successful mpv-DMA-BUF-GL on rkvdec)

Patch-shape prediction: ~580-690 LOC across 5 modified + 2 new
files (closer to iter2 HEVC's 470 than iter3 VP8's 370). Compressed-
header parser is the dominant cost.

Phase 3 baseline targets queued: cross-validator strace verbatim
S_EXT_CTRLS payloads (both controls), VAAPI consumer trace,
mpv-VP9-vaapi engagement check, rkvdec readback non-zero check.

Phase 4 plan structure anticipated: 10-clause template per
iter2/iter3, with new Clause 8 dedicated to compressed-header
parser.

Refs:
  phase0_findings_iter4.md (Phase 1 lock)
  phase8_iteration3_close.md (predecessor)
  references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c (V4L2 ref)
  references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c (VAAPI ref)
  /home/mfritsche/src/linux-rfc/drivers/staging/media/rkvdec/
    rkvdec-vp9.c (kernel driver — confirms COMPRESSED_HDR
    requirement at lines 752-754)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 05:20:07 +00:00
claude-noether 9a71dbf4c3 iter4 Phase 0 + Phase 1 lock: VP9 on rkvdec
Opens iter4 immediately after iter3 close (d5d4beb). Targets VP9
Profile 0 as the fifth (final) codec to pass boolean-correctness
on fresnel via libva-v4l2-request-fourier — completes the campaign
codec scope.

Locked research question:
  mpv --hwdec=vaapi bbb_720p10s_vp9.webm engages backend cleanly
  on rkvdec, and HW pixel readback yields byte-identical output
  to a software-decoded reference for the same frames.

Five Phase 1 boolean criteria:
  1. vainfo enumerates VAProfileVP9Profile0 on rkvdec env binding
  2. vaCreateConfig(VAProfileVP9Profile0, VLD) = SUCCESS
  3. ffmpeg -hwaccel vaapi VP9 5-frame decode exit 0
  4. HW=SW byte-identical with HW engagement verified per memory
     feedback_hw_decode_engagement_check.md (mpv -v log inspection
     before claiming match). If mpv falls back to SW for VP9 like
     it did for iter3 VP8, OR if rkvdec exhibits the same dma_resv
     kernel issue as hantro, fall through to transitive proof per
     memory reference_dmabuf_resv_blocker.md (libva backend
     payload == kernel-direct payload AND kernel-direct decode ==
     SW reference).
  5. FOUR-codec regression block: H.264 + MPEG-2 + HEVC + VP8
     reference hashes hold

Substrate carry-forward (re-verified):
  - fork tip e1aca9c (post-iter3-close)
  - /usr/lib/dri/v4l2_request_drv_video.so SHA256 0ab5b2ba...4ef
  - linux-eos-arm 6.19.9-99-eos-arm
  - bbb_720p10s_vp9.webm fixture on fresnel ~/fourier-test/ (3.4 MB)
  - rkvdec OUTPUT_MPLANE VP9F + 2 VP9 stateless controls
    (V4L2_CID_STATELESS_VP9_FRAME = 0xa40b2c, COMPRESSED_HDR =
    0xa40b2d)
  - cross-validator anchor confirmed: rkvdec advertises VP9 per
    Phase 0 V4L2 inventory
  - Reference sources local:
    references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c
    references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c
    references/linux-mainline/drivers/staging/media/rkvdec/
      rkvdec-vp9.c (verify presence at Phase 2)

Predicted scope:
  - config.c: ADD VP9 enumeration block + RequestCreateConfig case
    + RequestQueryConfigEntrypoints case (3 sites; same shape as
    iter3 VP8)
  - src/vp9.c NEW file (~250-350 LOC; 2 V4L2 controls per frame:
    FRAME + COMPRESSED_HDR; 8-entry DPB vs VP8's 3)
  - src/vp9.h NEW file
  - src/meson.build add 'vp9.c' + 'vp9.h' entries
  - picture.c codec_set_controls VP9 dispatch + codec_store_buffer
    cases for 2 VAAPI VP9 buffer types (Picture + Slice; NO
    Probability + IQMatrix unlike iter3 VP8)
  - surface.h params union extend with vp9 member
  - context.c: NO changes expected (no init-time menus per FFmpeg
    ref pattern)
  - buffer.c: predicted no Commit D needed (VP9 uses Picture +
    Slice + SliceData buffer types — all already whitelisted by
    H.264 path); plan for fix-forward if runtime miss surfaces
    per memory feedback_runtime_enumerates_allowlists.md

Predicted total: ~400-500 LOC, 3-4 commits + 0-1 fix-forwards.
Larger than iter3 VP8 (370 LOC) but comparable to iter2 HEVC
(470 LOC).

VP9 contract surface:
  - 2 controls per frame batched in single S_EXT_CTRLS:
    FRAME (struct v4l2_ctrl_vp9_frame) + COMPRESSED_HDR
    (struct v4l2_ctrl_vp9_compressed_hdr — probability updates
    from compressed header)
  - 8 reference frames in DPB (active_ref_frames[8])
  - Tile-based decoding (VP9 has 1..N tiles per frame)
  - Profile 0 only (8-bit 4:2:0); Profile 1/2/3 OUT-OF-SCOPE

Phase 2 source-read targets queued: config.c enumeration pattern,
picture.c dispatch + per-buffer-type cases, surface.h params union,
VAAPI <va/va_dec_vp9.h>, kernel UAPI v4l2_ctrl_vp9_frame +
v4l2_ctrl_vp9_compressed_hdr (lines 2696-2870), kernel
rkvdec-vp9.c driver, FFmpeg v4l2_request_vp9.c + vaapi_vp9.c.

Memory carry-forward (all 9 entries apply unchanged):
  feedback_gitea_as_claude_noether
  feedback_no_session_termination_attempts
  feedback_header_deletion_check
  feedback_runtime_enumerates_allowlists (NEW iter3)
  feedback_review_empirical_over_theoretical (BOTH directions)
  feedback_rockchip_pixel_verify_path
  feedback_fresnel_hostname (NEW iter3)
  feedback_hw_decode_engagement_check (NEW iter3)
  reference_dmabuf_resv_blocker (NEW iter3)

Open questions inherited from iter3 close (not blocking iter4
lock):
  - Does mpv 0.41.0 engage HW for VP9 hwdec=vaapi or fall back
    like it did for VP8? Phase 0+3 verifies via mpv -v log.
  - Does rkvdec exhibit the same vb2_dma_resv kernel issue as
    hantro? Likely no (different driver subsystem; iter1+iter2
    mpv-DMA-BUF-GL paths worked on rkvdec). Phase 3 baseline
    answers via ffmpeg-vaapi-hwdownload non-zero check.

iter4 = final codec in campaign scope. Clean close → 5/5 codecs
passing → campaign complete.

Refs:
  phase0_findings_iter1.md (iter1 MPEG-2 lock template)
  phase0_findings_iter2.md (iter2 HEVC lock template)
  phase0_findings_iter3.md (iter3 VP8 lock template)
  phase8_iteration3_close.md (immediate predecessor close)
  phase0_evidence/2026-05-07/v4l2_inventory_findings.md (rkvdec
    VP9 capability)
  phase0_evidence/2026-05-07/test_fixtures.md
    (bbb_720p10s_vp9.webm provenance)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:36:04 +00:00
claude-noether d5d4beb64d iter3 Phase 8 close: 4/5 codecs passing, 3 new memory entries
distilled, 0 Phase 7 → Phase 4 loopbacks

iter3 = VP8 on hantro-vpu-dec via libva-v4l2-request-fourier on
RK3399 (fresnel / Pinebook Pro). Fourth codec to ship.

Final state:

  Fork tip: e1aca9c (post iter2 close 8d71e20 + 4 commits)
  Phase 1 criteria: 5/5 GREEN (4 direct + 1 transitive)
  LOC delta: +373 across 7 files (2 new + 5 modified)
  Phase 7 → Phase 4 loopbacks: 0
  Phase 6 fix-forwards: 1 (Commit D buffer.c allow-list)
  Phase 5 review findings: 4 Critical, all empirically validated

Lessons distilled to memory (3 NEW entries):

  feedback_hw_decode_engagement_check.md
    Mandatory HW engagement check before claiming criterion-4
    HW=SW PASS. mpv silently falls back to SW for some codec/
    backend combos. Use lsof/strace/mpv -v/ffmpeg log to verify
    HW path actually engaged. Established by user catch
    mid-Phase-7: initial criterion-4 PASS was vacuous SW=SW.

  reference_dmabuf_resv_blocker.md
    Cross-campaign blocker. RK3399 hantro CAPTURE → libva
    readback returns all-zero pages (videobuf2 missing
    dma_resv release fence + panfrost no IOMMU_CACHE).
    Tracked at git.reauktion.de/marfrit/dmabuf-modifier-triage/
    issues/2. vb2_dma_resv kernel patches in flight (RFC v2,
    2026-04 linux-media). Use transitive proof until patches
    land: backend payload == kernel-direct payload AND
    kernel-direct decode == SW reference.

  feedback_runtime_enumerates_allowlists.md
    Sibling to feedback_header_deletion_check.md. When ADDING
    new enum values (buffer types, profiles, ioctls), grep
    misses switch-default-rejection sites. Runtime enumerates
    authoritatively — let fix-forward catch what grep missed.
    Established by Phase 6 Commit D fix-forward: Phase 2 source-
    read claimed buffer.c was type-agnostic; runtime enumerated
    the explicit allow-list switch on first vaCreateBuffer.
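
The switch-default-rejection pattern behind the third memory entry looks roughly like this. The enum values are hypothetical stand-ins for the VAAPI buffer types, and which type was actually missing in buffer.c is not stated here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer-type tags standing in for the VAAPI enum. */
enum buf_type {
    BUF_PICTURE, BUF_SLICE_PARAM, BUF_SLICE_DATA, BUF_IQ_MATRIX,
    BUF_PROBABILITY,
};

static bool buffer_type_allowed(enum buf_type t)
{
    switch (t) {
    case BUF_PICTURE:
    case BUF_SLICE_PARAM:
    case BUF_SLICE_DATA:
    case BUF_IQ_MATRIX:
        return true;
    default:
        /* A newly used enum value lands here silently; grepping for
         * existing case labels will not flag the omission. */
        return false;
    }
}
```

Only a runtime call with the new type exposes the rejection, which is why the first vaCreateBuffer is treated as the authoritative check.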

Phase 5 amendments empirically validated (all 4 Critical correct):

  C1 first_part_header_bits = slice->macroblock_offset → 6550 ✓
  C2 first_part_size = partition_size[0]+ceil(macroblock_offset/8)
     → 22742 ✓ (= 21923 + 819, exact match for Phase 3 anchor)
  C3 VAProbabilityBufferType (not VAProbabilityDataBufferType)
     → compiled clean post-Commit-D
  C4 (int8_t) cast (not (s8)) → compiled clean Commit B first try

Estimated cost without Phase 5 review: 2 Phase 6 compile-fail/
fix-forward cycles (C3 + C4) + 1 Phase 7 → Phase 4 loopback (C1
+ C2 hardware-DMA-offset bug, would have produced visible-but-
corrupt output). Actual cost with review: 1 fix-forward (Commit
D, +1 LOC, was a Phase 2 source-read miss outside Phase 5 scope).

Cross-cutting backlog updates:

  iter3-Q1 first_part_header_bits → CLOSED by Phase 5 C1
  iter3-flags-anomaly bit 0x40 → not iter3 scope; kernel ignores
  iter3-criterion-4-readback → blocked on dmabuf-modifier-triage
                                iter1; transitive proof used
  iter3-mpv-vp8-fallback → mpv 0.41.0 falls back to SW for VP8;
                            consumer-side, not backend; verify
                            via chrome-fourier when convenient

Inherited backlog (B1, B3, B4, B5, B6, L3) — no closures from
iter3.

Campaign scoreboard: 3/5 → 4/5 codecs passing.

  H.264   | rkvdec | T4    | PASS direct
  MPEG-2  | hantro | iter1 | PASS direct
  HEVC    | rkvdec | iter2 | PASS direct
  VP8     | hantro | iter3 | PASS transitive (readback blocked)
  VP9     | rkvdec | iter4 | PENDING

iter4 (VP9 on rkvdec) prediction: comparable scope to iter2 HEVC
(VP9 has compressed-header control + probability state).
~400-500 LOC, 3-4 commits + 1 fix-forward. mpv may engage HW for
VP9 (different from VP8 fallback) — verify at iter4 Phase 0.

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (situation analysis)
  phase3_iter3_baseline.md (verbatim payload anchors)
  phase4_iter3_plan.md (10 contract clauses + Phase 5 amendments)
  phase5_iter3_review.md (4 Critical, all validated correct)
  phase7_iter3_verification.md (4 direct + 1 transitive PASS)
  Fork commits 27d82e3 + 017e27f + 7f84bbb + e1aca9c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:31:23 +00:00
claude-noether afb9b1450f iter3 Phase 7: verification — 4 direct PASS, 1 transitive PASS
Phase 1 5-criterion verification on iter3 backend (fork tip e1aca9c).
4 direct PASS + 1 transitive PASS. Vacuous-pass mode caught + corrected
mid-Phase-7 (initial mpv --hwdec=vaapi --vo=image HW=SW match was
SW=SW; mpv silently fell back to SW for VP8).

Criterion results:

  1. vainfo enumerates VAProfileVP8Version0_3       PASS (direct)
  2. vaCreateConfig SUCCESS                          PASS (direct, implied)
  3. ffmpeg-vaapi VP8 5-frame decode exit 0          PASS (direct)
  4. HW=SW byte-identical via DMA-BUF GL             PASS (transitive)
  5. 3-codec regression (H.264 + MPEG-2 + HEVC)      PASS (direct)

Criterion 4 transitive proof:

  Step A: Strace of ffmpeg-vaapi via libva backend captures the
          V4L2_CID_STATELESS_VP8_FRAME control payload — keyframe
          y_ac_qi=8, first_part_size=22742, first_part_header_bits=
          6550, all 30 fields enumerated.

  Step B: Phase 3 baseline already captured the kernel-direct
          (ffmpeg-v4l2request) keyframe payload — IDENTICAL to A
          field-for-field.

  Step C: ffmpeg-v4l2request kernel-direct VP8 decode produces
          5 raw frames byte-identical to SW reference (cmp on
          full 6.7 MB vp8_kerneldirect.yuv vs vp8_sw5.yuv = silent
          BYTE-IDENTICAL).

  Conclusion: A == B (libva backend produces correct kernel input)
              AND C (kernel-direct decode is correct), therefore
              libva backend's HW decode IS correct by transitivity.

Direct readback BLOCKED by kernel-layer dma_resv issue (sibling
campaign git.reauktion.de/marfrit/dmabuf-modifier-triage/issues/2):

  - ffmpeg-vaapi -hwaccel_output_format vaapi -vf hwdownload
    returns all-zero pages (SHA b34860e0... = SHA of all-zero
    1382400-byte block) for ALL 5 frames.
  - Same all-zero from -hwaccel_output_format nv12 + auto-DL.
  - mpv --hwdec=vaapi-copy returns Y=128 gray (uninitialized).
  - Root cause: videobuf2 missing dma_resv release fence + panfrost
    IOMMU_CACHE absence on RK3399 (per dmabuf-modifier-triage iter1
    RFC). vb2_dma_resv kernel patches in flight (linux-media RFC v2,
    2026-04). When patches land, direct verification re-runnable.
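
The 1382400-byte block size and the all-zero symptom can both be checked with a few lines. This is a sketch; the function names are hypothetical, and it assumes a tightly packed NV12 frame with no stride padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* NV12: full-resolution Y plane plus interleaved half-resolution UV. */
static size_t nv12_frame_size(size_t width, size_t height)
{
    return width * height + 2 * ((width + 1) / 2) * ((height + 1) / 2);
}

/* True when a readback buffer contains nothing but zero bytes. */
static bool is_all_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}
```

For 1280x720 the size is 921600 + 460800 = 1382400 bytes, matching the block whose hash was compared above.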

Phase 5 amendments empirically validated:

  C1 first_part_header_bits = slice->macroblock_offset → 6550 ✓
  C2 first_part_size = partition_size[0] + ceil(macroblock_offset/8)
     → 22742 ✓ (= 21923 + 819, exact match for Phase 3 anchor)
  C3 VAProbabilityBufferType (not VAProbabilityDataBufferType) →
     compiled clean post-Commit-D fix-forward
  C4 (int8_t) cast → compiled clean Commit B first try
  S3 assert(probability_set) → has not fired (FFmpeg vaapi_vp8.c
     always sends VAProbabilityBufferType per frame)

Phase 6 fix-forward Commit D documented: buffer.c had an explicit
allow-list switch (Phase 2 source-read missed it). Same iter1 Commit
D pattern — runtime enumerates authoritatively what grep missed.

HW-engagement check applied per new memory rule
feedback_hw_decode_engagement_check.md (established this session):

  - mpv-vaapi VP8: SILENT FALLBACK to SW. mpv-side, not backend
    issue. ffmpeg-vaapi VP8: HW engaged (Format vaapi chosen by
    get_format(); cap_pool_init: 24 slots ready).
  - V4L2 strace: VIDIOC_S_EXT_CTRLS for VP8_FRAME (0xa409c8)
    returns 0 (kernel accepts payload). CAPTURE buffer indexes
    advance through distinct slots per decode.

Cross-cutting backlog updates:

  iter3-Q1 first_part_header_bits → closed by Phase 5 C1
  iter3-flags 0x40 → not iter3 scope; kernel ignores
  iter3-criterion-4 readback → blocked on dmabuf-modifier-triage
                                iter1 (vb2_dma_resv kernel patches)

Campaign scoreboard: 3/5 → 4/5 codecs passing.

Memory entries added:
  feedback_hw_decode_engagement_check.md (mandatory HW engagement
    verification before claiming criterion-4 PASS)
  reference_dmabuf_resv_blocker.md (cross-campaign blocker tracking
    + transitive proof pattern)

Refs:
  phase4_iter3_plan.md (10 contract clauses + Phase 5 amendments)
  phase5_iter3_review.md (4 Critical findings, all empirically
                            validated in Phase 7)
  phase3_iter3_baseline.md (verbatim payload anchors used in
                              transitive proof Step B)
  git.reauktion.de/marfrit/dmabuf-modifier-triage/issues/2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:26:27 +00:00
claude-noether 656596aa6b iter3 Phase 5: sonnet review — 4 Critical findings, 4 amendments
Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.

Critical findings:

  C1 first_part_header_bits = 0 was claimed cosmetic; actually
     UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
     both read this field unconditionally to compute the macroblock
     DMA offset. Setting 0 would place hardware at wrong DMA offset
     for ALL macroblock data → garbage decode.
     Fix: frame.first_part_header_bits = slice->macroblock_offset
     (verified by source identity — vaapi_vp8.c:204 and
     v4l2_request_vp8.c:83 use byte-identical formulas).

  C2 first_part_size = slice->partition_size[0] was wrong; VAAPI's
     partition_size[0] is the REMAINING bytes after parsing
     (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
     Kernel needs the TOTAL control partition size.
     Fix: frame.first_part_size = slice->partition_size[0] +
                                  ((macroblock_offset + 7) / 8)
     Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.

  C3 VAProbabilityDataBufferType does not exist as a buffer-type
     enum; it's the struct name. The actual enum constant is
     VAProbabilityBufferType (= 13 per va.h:2058). Switch case
     using the wrong identifier would have failed Phase 6 compile.
     Fix: replace globally in phase2 + phase4 docs.

  C4 (s8) cast undefined in userspace. Kernel has 's8' typedef in
     linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
     underscore). Userspace portable cast is int8_t from <stdint.h>.
     Fix: replace (s8) with (int8_t) in Clauses 6+7.
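
The C2 formula can be checked against the Phase 3 keyframe numbers. The function name is hypothetical; the arithmetic is the one given in the finding.

```c
#include <assert.h>
#include <stdint.h>

/* Total first-partition size: the bytes remaining after header parsing
 * plus the header bits already consumed, rounded up to whole bytes. */
static uint32_t vp8_first_part_size(uint32_t partition_size0,
                                    uint32_t macroblock_offset_bits)
{
    return partition_size0 + (macroblock_offset_bits + 7) / 8;
}
```

With partition_size[0] = 21923 and macroblock_offset = 6550 bits, the rounded-up header is 819 bytes and the total is 22742.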

Suggested:

  S3 Clause 8 comment was factually wrong: hantro_vp8.c::
     hantro_vp8_prob_update reads coeff_probs unconditionally;
     there is NO default-table fallback. If probability_set==false,
     decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
     always sends VAProbabilityBufferType per frame), but corrected
     comment + added assert(probability_set) runtime guard for
     immediate Phase 6 surfacing.

Plus 5 minor S/Q items documented; non-blocking for iter3.

Author's 7 review questions all answered directly in the review:
  Q1 quantization derivation: correct for typical content
  Q2 first_part_header_bits=0 safety: UNSAFE → C1
  Q3 num_dct_parts off-by-one: confirmed correct
  Q4 field availability: 2 compile failures found (C3 + C4)
  Q5 quant_update[s] semantics: signed delta confirmed
  Q6 SHOW_FRAME unconditional: safe for BBB scope
  Q7 buffer order independence: confirmed

Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. The review pass was the right path forward per the
memory rule "Reviews are never skippable": even a review with no
findings has empirical-verification value, regardless of finding
count.

Refs:
  phase4_iter3_plan.md (amended in-place; Phase 5 amendments
                         section appended)
  phase2_iter3_situation.md (amended C3 globally)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:27:53 +00:00
claude-noether 2918dda2e0 iter3 Phase 4: plan — 10 contract clauses, ~308-LOC patch, 3 commits
Locks the iter3 patch shape against Phase 3 verbatim cross-validator
payload + Phase 2 contract surface. 10 contract clauses cite kernel
UAPI + VAAPI + FFmpeg ref + Phase 3 byte anchors throughout.

Patch shape (mirrors iter1 ABCD pattern):

  Commit A: src/config.c — enumeration block + CreateConfig case +
            QueryConfigEntrypoints case (3 sites, +16 LOC, 1 file).
            After: vainfo lists VP8Version0_3.
  Commit B: NEW src/vp8.c (~200 LOC) + NEW src/vp8.h (~40 LOC) +
            meson.build sources/headers entries (+2). 3 files
            (2 new + 1 modified).
            After: vp8.o compiles standalone.
  Commit C: src/picture.c — codec_set_controls dispatch +
            codec_store_buffer 4 buffer-type cases + outer
            VAProbabilityDataBufferType case + BeginPicture
            per-frame reset (4 sites, +40 LOC) + src/surface.h
            params.vp8 union member (+10 LOC). 2 files modified.
            After: end-to-end VP8 decode through libva backend.

Total: ~308 LOC, 6 files (2 new + 4 modified), 3 commits.

Contract clauses summary:

  1. Submission shape — single VIDIOC_S_EXT_CTRLS, count=1, which=
     V4L2_CTRL_WHICH_REQUEST_VAL (0xf010000; the control itself is in
     class V4L2_CTRL_CLASS_CODEC_STATELESS), id=0xa409c8,
     size=1232 bytes
  2. Local struct alloc + zero-init (memset clears all padding)
  3. Frame geometry + version + per-frame scalars (off-by-one
     num_dct_parts = num_of_partitions - 1)
  4. DPB timestamp resolution (3 refs: last/golden/alt; 0-sentinel
     when SURFACE() returns NULL — mirrors iter1 mpeg2.c pattern)
  5. Loop filter mapping (6 fields + 3 flag bits)
  6. Quantization base + delta derivation (segment 0 = base via
     iqmatrix[0][0]; deltas = iqmatrix[0][N+1] - iqmatrix[0][0]
     signed; per-segment quant_update[1..3] only when segmentation
     enabled)
  7. Segment fields (segment_probs direct copy; flags assembled +
     DELTA_VALUE_MODE set unconditionally per FFmpeg pattern)
  8. Entropy table mapping — 3 VAAPI sources (Picture: y_mode +
     uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3][11]
     direct memcpy; IQMatrix: quant)
  9. Coder state + first-partition fields + flags (6 mainline-
     documented bits only; bit 0x40 + EXPERIMENTAL NOT replicated
     vs ffmpeg-v4l2-request-git anomaly; first_part_header_bits=0
     fallback documented as known fidelity gap)
  10. Final batched submission via v4l2_set_controls
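
Clause 6 can be sketched as follows. This is a sketch under stated
assumptions, not the planned vp8.c code: both structs are hypothetical
stand-ins (the first shaped like VAIQMatrixBufferVP8's 4x6 index
table), and the sample values other than the base index 8 are
illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for VAIQMatrixBufferVP8 (4 segments x 6 indices). */
struct iq_vp8_sketch {
	uint16_t quantization_index[4][6];
};

/* Hypothetical result holder: base index plus five signed deltas. */
struct vp8_quant_sketch {
	uint8_t base_qi;  /* segment 0, column 0 */
	int8_t delta[5];  /* column N+1 minus column 0, N = 0..4 */
};

static struct vp8_quant_sketch
vp8_derive_quant(const struct iq_vp8_sketch *iq)
{
	struct vp8_quant_sketch q;
	int base = iq->quantization_index[0][0];

	q.base_qi = (uint8_t)base;
	for (int n = 0; n < 5; n++)
		/* int8_t from <stdint.h>, not the kernel-only s8 typedef */
		q.delta[n] = (int8_t)(iq->quantization_index[0][n + 1] - base);
	return q;
}

static void vp8_derive_quant_selfcheck(void)
{
	struct iq_vp8_sketch iq = {
		.quantization_index = { { 8, 8, 4, 12, 8, 8 } }
	};
	struct vp8_quant_sketch q = vp8_derive_quant(&iq);

	assert(q.base_qi == 8);
	assert(q.delta[0] == 0 && q.delta[1] == -4 && q.delta[2] == 4);
}
```

The subtraction is done in int before narrowing, so a column smaller
than the base yields a negative delta instead of wrapping.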

Phase 5 review questions queued (7 items): quantization derivation
correctness, per-segment quant_update semantics, first_part_header_
bits=0 safety, probability buffer ordering, endianness, struct size
sizeof correctness, field-availability test-compile per memory
feedback_review_empirical_over_theoretical Direction 2.

Cross-cutting backlog deferred (B1, B3, B4, B5, B6, L3 inherited;
iter3-Q1 first_part_header_bits + iter3-flags 0x40 anomaly NEW).

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)
  phase3_iter3_baseline.md (Phase 3 verbatim payload anchors)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:39:52 +00:00
claude-noether fd3fce86a6 iter3 Phase 3: baselines — VP8 cross-validator + 3-codec regression
+ SW reference

Captured on fresnel 2026-05-08 across two suspend cycles (laptop
dropped twice mid-run, captures preserved on /tmp/iter3_phase3).
All Phase 3 deliverables green.

Substrate verification:
  backend SHA256: 9e27...6258 (matches iter2 close)
  3-codec regression block: ALL 6 reference hashes match byte-for-
  byte vs iter1+iter2 (H.264 +30s, MPEG-2 +02s, HEVC +02s on rkvdec/
  hantro). Substrate has not regressed; criterion-5 anchor solid.

Cross-validator anchor (ffmpeg-v4l2request VP8 strace):
  - VIDIOC_S_EXT_CTRLS, count=1, which=V4L2_CTRL_WHICH_REQUEST_VAL
    (control class V4L2_CTRL_CLASS_CODEC_STATELESS), id=0xa409c8,
    size=1232 bytes
  - struct size CORRECTED: v4l2_ctrl_vp8_frame = 1232 bytes (NOT
    400 as one might assume; entropy.coeff_probs[4][8][3][11] alone
    is 1056 bytes)
  - keyframe (frame 1) verbatim payload captured: y_ac_qi=8,
    last/golden/alt ts all 0, flags=0x0d (KEY|SHOW|NOSKIP),
    y_mode_probs=[145,156,163,128] (matches FFmpeg keyframe const)
  - inter frame verbatim payload captured: y_ac_qi=122, all DPB
    timestamps non-zero, flags=0x66 (anomaly: bit 0x40 not in
    mainline UAPI; vendor-patched ffmpeg-v4l2-request-git;
    kernel hantro_vp8.c only inspects KEY_FRAME bit, ignores
    bit 0x40)
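
The two captured flags words can be decoded with a short sketch. The
SK_* constants are local copies, assumed to match the six mainline
V4L2_VP8_FRAME_FLAG_* bits; check the installed
<linux/v4l2-controls.h> before relying on them. The helper name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the six mainline flag bits (assumed values). */
#define SK_VP8_FLAG_KEY_FRAME        0x01u
#define SK_VP8_FLAG_EXPERIMENTAL     0x02u
#define SK_VP8_FLAG_SHOW_FRAME       0x04u
#define SK_VP8_FLAG_MB_NO_SKIP_COEFF 0x08u
#define SK_VP8_FLAG_SIGN_BIAS_GOLDEN 0x10u
#define SK_VP8_FLAG_SIGN_BIAS_ALT    0x20u
#define SK_VP8_FLAGS_MAINLINE        0x3fu

/* Bits set in a captured flags word that mainline does not define. */
static uint64_t vp8_unknown_flag_bits(uint64_t flags)
{
	return flags & ~(uint64_t)SK_VP8_FLAGS_MAINLINE;
}

static void vp8_flags_selfcheck(void)
{
	/* Keyframe capture: KEY | SHOW | NOSKIP, nothing unknown. */
	assert(0x0du == (SK_VP8_FLAG_KEY_FRAME | SK_VP8_FLAG_SHOW_FRAME |
			 SK_VP8_FLAG_MB_NO_SKIP_COEFF));
	assert(vp8_unknown_flag_bits(0x0d) == 0);
	/* Inter-frame capture: only bit 0x40 falls outside mainline. */
	assert(vp8_unknown_flag_bits(0x66) == 0x40);
}
```

Under these assumed values, 0x66 also carries the EXPERIMENTAL bit
(0x02) alongside SHOW_FRAME and SIGN_BIAS_ALT.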

VP8 SW pixel-verify reference (criterion-4 anchor):
  vp8_sw_001.jpg: e43757a40e5d71ad176455c0fda14c2cbf9351b702188fc8ad
                  584d789db2c984
  vp8_sw_002.jpg: a86bf885e588257731ff6cf8d2ccc5756be550e85220eee1c3
                  e6ea8c0c78e97a
  Frame 1 != Frame 2 (real motion). These are the Phase 7 byte-
  compare HW-vs-SW targets.

Open-question resolution (5 of 6 answered empirically):

  Q1 first_part_header_bits — varies per frame (key=6550, inter
     ranges 86..254); VAAPI doesn't expose. Phase 4 fallback:
     leave 0 and check kernel behavior at Phase 7 byte-compare.
     Phase 5 review will flag as known fidelity gap.

  Q2 num_dct_parts vs VAAPI num_of_partitions — confirmed off-by-
     one: kernel = VAAPI - 1 (BBB has VAAPI=2, kernel=1).

  Q3 DPB timestamp 0-sentinel — confirmed: keyframe writes all
     three timestamps as 0; iter3 mirrors iter1 mpeg2.c pattern.

  Q4 SHOW_FRAME default — set on every captured frame (BBB has no
     alt-ref invisible). Force unconditional in libva backend.

  Q5 lf.flags FILTER_TYPE_SIMPLE — not set; BBB normal loop filter.
     Direct mapping from VAAPI filter_type=0.

  Q6 First-frame DPB sentinel — confirmed Q3; no self-reference
     fallback needed (different from iter1 mpeg2.c).
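
Q2 and Q3 reduce to two small mappings, sketched here under stated
assumptions: struct surface_sketch and both helper names are
hypothetical, and the real backend resolves references through its
own SURFACE() lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical surface record holding the decode timestamp. */
struct surface_sketch {
	uint64_t timestamp_ns;
};

/* Q2: kernel num_dct_parts = VAAPI num_of_partitions - 1. */
static uint8_t vp8_num_dct_parts(uint8_t va_num_of_partitions)
{
	/* Guard against underflow on a malformed zero count. */
	return va_num_of_partitions ? (uint8_t)(va_num_of_partitions - 1) : 0;
}

/* Q3: a missing reference maps to the 0 timestamp sentinel. */
static uint64_t vp8_ref_timestamp(const struct surface_sketch *ref)
{
	return ref ? ref->timestamp_ns : 0;
}

static void vp8_mapping_selfcheck(void)
{
	struct surface_sketch last = { .timestamp_ns = 33366000 };

	assert(vp8_num_dct_parts(2) == 1);        /* BBB: VAAPI=2, kernel=1 */
	assert(vp8_ref_timestamp(NULL) == 0);     /* keyframe: no references */
	assert(vp8_ref_timestamp(&last) == 33366000);
}
```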

V4L2 binding cells this boot:
  rkvdec        : /dev/video3 + /dev/media1
  hantro-vpu-dec: /dev/video5 + /dev/media2

Capture artefacts on fresnel /tmp/iter3_phase3/ preserved for
Phase 7 re-run:
  vp8_strace.* (19 files, multi-thread)
  decode_vp8.py (payload decoder)
  vp8_sw_00{1,2}.jpg (criterion-4)
  {h264,mpeg2,hevc}_hw_00{1,2}.jpg (criterion-5)

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:14:46 +00:00
claude-noether 898544a29c iter3 Phase 2: situation analysis — VP8 backend gaps + contract surface
Source-read of every file the iter3 patch series will touch, plus the
kernel UAPI + VAAPI + downstream FFmpeg + kernel hantro reference
sources. Conducted on noether against fork tip 8d71e20 (iter2 Phase 6
commit B); fresnel.vpn was unreachable so Phase 3 baseline empirical
capture defers until laptop reachable.

Bug enumeration (10 sites the patch series must touch):

  B1  config.c::RequestQueryConfigProfiles    enumeration block missing
  B2  config.c::RequestCreateConfig           VP8 case label missing
  B3  config.c::RequestQueryConfigEntrypoints VP8 case missing
  B4  src/vp8.c                               new file ~160-220 LOC
  B5  src/vp8.h                               new file ~35-45 LOC
  B6  picture.c::codec_set_controls           VP8 dispatch missing
  B7  picture.c::codec_store_buffer           4 buffer-type cases +
                                              VAProbabilityDataBufferType
                                              outer case missing
  B8  picture.c::RequestBeginPicture          per-frame reset additions
  B9  surface.h::object_surface::params union vp8 member missing
  B10 meson.build                             vp8.c/vp8.h not in lists

Non-bugs (intentionally untouched):
  - context.c (no DECODE_MODE/START_CODE menus for VP8)
  - video.c (CAPTURE-side format list; VP8 is OUTPUT-side)
  - v4l2.c (fourcc-agnostic helpers)
  - buffer.c (buffer registry is type-agnostic)
  - include/hevc-ctrls.h (already includes <linux/v4l2-controls.h>
    which holds V4L2_CID_STATELESS_VP8_FRAME)

Contract surface cited verbatim:
  - V4L2_CID_STATELESS_VP8_FRAME = V4L2_CID_CODEC_STATELESS_BASE+200
    = 0x00a409c8 (matches Phase 0 V4L2 inventory)
  - struct v4l2_ctrl_vp8_frame at <linux/v4l2-controls.h>:1929-1958
    + 5 sub-structs (segment, lf, quant, entropy, coder_state) at
    1785-1888
  - VAAPI VAPictureParameterBufferVP8 + VASliceParameterBufferVP8 +
    VAProbabilityDataBufferVP8 + VAIQMatrixBufferVP8 at
    references/libva/va/va_dec_vp8.h
  - FFmpeg v4l2_request_vp8.c reference: single batched S_EXT_CTRLS
    at end_frame, count=1, no init-time menus
  - Kernel hantro_vp8.c::hantro_vp8_prob_update reads 9 fields from
    hdr (skip/intra/last/gf probs, segment_probs, entropy.{y,uv,mv,
    coeff}_probs)
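
The control ID arithmetic can be checked in isolation. The SK_*
macros are local copies of the mainline values, an assumption to
verify against the installed <linux/v4l2-controls.h>; the function
name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the mainline values (assumed; check your header). */
#define SK_CTRL_CLASS_CODEC_STATELESS 0x00a40000u
#define SK_CID_CODEC_STATELESS_BASE   (SK_CTRL_CLASS_CODEC_STATELESS | 0x900u)
#define SK_CID_STATELESS_VP8_FRAME    (SK_CID_CODEC_STATELESS_BASE + 200u)

/* Control ID for the single per-frame VP8 control. */
static uint32_t sk_vp8_frame_cid(void)
{
	return SK_CID_STATELESS_VP8_FRAME;
}

static void sk_vp8_frame_cid_selfcheck(void)
{
	/* 0x00a40900 + 200 (0xc8) = 0x00a409c8, as seen in the strace. */
	assert(sk_vp8_frame_cid() == 0x00a409c8u);
}
```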

VAAPI → V4L2 mapping table: 30 fields enumerated. Open questions for
Phase 3 baseline (6 items: first_part_header_bits derivation, num_
dct_parts off-by-one, DPB timestamp 0-sentinel handling, show_frame
default, lf.flags FILTER_TYPE_SIMPLE bit, first-frame DPB sentinel).

Patch-shape prediction: ~260-340 LOC across 6 files (4 modified +
2 new, per B1-B10 above). Medium-sized iter — between iter1's 120 LOC
(3 modified + 1 deleted) and iter2's 470 LOC (5 modified). The new
vp8.c dominates.

Phase 3 baseline targets queued: cross-validator strace verbatim
S_EXT_CTRLS payload capture, VAAPI consumer trace, mpv-SW reference
JPEG capture for criterion 4 byte-compare anchor.

Phase 4 plan structure anticipated: 10-clause template per iter2.

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase8_iteration2_close.md (predecessor close)
  src/mpeg2.c (iter1 single-codec template; iter3 will mirror shape)
  src/h265.c (iter2 dispatcher pattern; iter3 takes structure cues)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:27:06 +00:00
claude-noether ea2413e957 iter3 Phase 0 + Phase 1 lock: VP8 on hantro-vpu-dec
Opens iter3 of the fresnel-fourier campaign immediately after iter2
close (df787a6). Targets VP8 as the fourth codec to pass boolean-
correctness on fresnel via libva-v4l2-request-fourier.

Locked research question:
  mpv --hwdec=vaapi bbb_720p10s_vp8.webm engages backend cleanly and
  DMA-BUF GL import yields HW pixels byte-identical to SW reference.

Five Phase 1 boolean criteria:
  1. vainfo enumerates VAProfileVP8Version0_3 on hantro env binding
  2. vaCreateConfig(VAProfileVP8Version0_3, VLD) = SUCCESS
  3. ffmpeg -hwaccel vaapi VP8 decode exit 0
  4. mpv --hwdec=vaapi --vo=image @ +02s seek: HW=SW byte-identical
     for 2 distinct frames; frame1 != frame2
  5. THREE-codec regression block: iter1 MPEG-2 + iter2 HEVC + T4
     H.264 reference hashes all hold

Substrate carry-forward (re-verified):
  - fork master tip post-iter2-close (cca539d + 8d71e20)
  - /usr/lib/dri/v4l2_request_drv_video.so SHA256 9e27...6258
  - linux-eos-arm 6.19.9-99-eos-arm (post linux-7 headers-only upgrade)
  - bbb_720p10s_vp8.webm fixture on fresnel ~/fourier-test/ (2.4 MB)
  - hantro-vpu-dec OUTPUT_MPLANE VP8F + vp8_frame_parameters control
  - cross-validator anchor confirmed: ffmpeg-v4l2request VP8 = exit 0

Predicted scope (smaller than iter1+iter2):
  - config.c: ADD VP8 enumeration block + RequestCreateConfig case
    + RequestQueryConfigEntrypoints case (3 sites; iter1+iter2
    only had 1-2 existing-but-broken case labels)
  - src/vp8.c NEW file (~150-250 lines vs iter2's 588 h265.c)
  - src/vp8.h NEW file
  - src/meson.build add 'vp8.c' + 'vp8.h' entries
  - picture.c codec_set_controls VP8 dispatch + codec_store_buffer
    cases for 4 VAAPI VP8 buffer types (Picture, Slice, Probability,
    IQMatrix)
  - surface.h params union extend with vp8 member
  - context.c: NO changes (VP8 has no DECODE_MODE/START_CODE menus
    on hantro per Phase 0 v4l2_inventory)

VP8 contract surface: single V4L2_CID_STATELESS_VP8_FRAME control
per frame (no batch); no slice_params dynamic-array (frame-mode);
no SCALING_MATRIX (entropy + quant carried in v4l2_ctrl_vp8_frame
sub-structs).

Phase 2 source-read targets queued: config.c enumeration pattern,
picture.c dispatch + per-buffer-type cases, surface.h params union,
VAAPI <va/va_dec_vp8.h>, kernel UAPI <linux/v4l2-controls.h>
v4l2_ctrl_vp8_frame, kernel hantro_vp8.c driver, FFmpeg
v4l2_request_vp8.c.

Memory carry-forward (all five entries apply unchanged):
  feedback_gitea_as_claude_noether
  feedback_no_session_termination_attempts
  feedback_header_deletion_check
  feedback_review_empirical_over_theoretical (BOTH directions)
  feedback_rockchip_pixel_verify_path

Refs:
  phase0_findings_iter1.md (iter1 MPEG-2 lock template)
  phase0_findings_iter2.md (iter2 HEVC lock template)
  phase8_iteration2_close.md (immediate predecessor close)
  phase0_evidence/2026-05-07/v4l2_inventory_findings.md (hantro VP8
    capability)
  phase0_evidence/2026-05-07/cross_validator_traces.md (VP8 kernel
    decode path proven)
  phase0_evidence/2026-05-07/test_fixtures.md (bbb_720p10s_vp8.webm
    provenance)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:49:28 +00:00
claude-noether df787a6cc2 iter2 Phase 8 close: 3/5 codecs passing, lesson L1 extended (BOTH directions)
Iteration 2 closes with all 5 Phase 1 boolean-correctness criteria
green. Third codec passes — campaign scoreboard 2/5 → 3/5 (H.264
in T4, MPEG-2 in iter1, HEVC in iter2). Loop terminates per
feedback_dev_process.md Phase 8.

Notable: ZERO Phase 7 → Phase 4 loopbacks needed. Phase 5 review
caught all 3 would-be loopback triggers in advance (data_byte_offset
rename, dpb.rps→index-arrays semantics, pic_order_cnt_val rename).
This is the dev-process ideal: review catches bugs before
implementation lands; verification confirms contract.

What landed:

  Code (libva-v4l2-request-fourier master 229d6d1 → 8d71e20):
    cca539d iter2 Phase 6 commit A: config.c break for HEVCMain case
    8d71e20 iter2 Phase 6 commit B: rewrite h265.c against new V4L2
            stateless HEVC API (6 files, 463 ins, 236 del)

  Both authored as Claude (noether) per feedback_gitea_as_claude_noether.md.

  Campaign docs (fresnel-fourier):
    6e8c970 iter2 Phase 0 + Phase 1 lock
    b3ba157 iter2 Phase 2 situation analysis (6 bugs)
    d35a247 iter2 Phase 3 baselines (substrate post-pacman-Syu + HEVC anchor)
    348736e iter2 Phase 4 plan (10 contract clauses)
    9eae068 iter2 Phase 5 sonnet review (3 Critical UAPI errors caught)
    05b4bd5 iter2 Phase 7 verification (5/5 GREEN)
    [this commit] iter2 Phase 8 close

Lesson L1 distilled to memory (extension of iter1 entry):

  feedback_review_empirical_over_theoretical.md updated with
  Direction 2 corollary. Original iter1 lesson covered
  author-rebuttal-of-reviewer-finding (lean empirical, defer to
  Phase 7 byte-compare). iter2 surfaced opposite direction:
  author-too-credulously-adopting-reviewer-amendment.

  Concrete iter2 instance: Phase 5 S1 suggested
  picture->pic_fields.bits.uniform_spacing_flag exists in VAAPI as
  source for V4L2_HEVC_PPS_FLAG_UNIFORM_SPACING. I adopted into
  amended plan without verifying. Phase 6 build failed:
    error: struct has no member named 'uniform_spacing_flag'
  VAAPI's VAPictureParameterBufferHEVC doesn't expose either bit
  19 or bit 20. Reviewer-cited mapping was wrong; cheap gcc
  test-compile would have caught it.

  Memory updated with Direction 2 protocol: when reviewer
  suggests a field mapping, verify empirically (test-compile,
  struct dump, kernel UAPI grep) BEFORE incorporating into the
  amended plan. Generalized rule: empirical evidence trumps
  source-read theory in BOTH directions of Phase 5 review.

Backlog items deferred (campaign-internal, not durable memory):

  B6 — VAAPI ↔ V4L2 SPS field-fidelity gaps:
    sps_max_num_reorder_pics (post-fix=0, baseline=2),
    sps_max_latency_increase_plus1 (post-fix=0, baseline=4),
    possibly PPS bit 12 ENTROPY_CODING_SYNC_ENABLED.
    VAAPI doesn't expose; FFmpeg parses from bitstream directly.
    Operational impact NIL (Phase 7 Criterion 4 byte-identical
    pixel pass). Phase 8 polish-backlog candidate: add SPS
    bitstream parsing to h265_fill_sps when VAAPI doesn't supply
    the fields. Probably low ROI — kernel HEVC handler tolerates
    0 values for the BBB fixture. Defer until a real-world consumer
    surfaces a fixture that breaks on this.

  B7 — Phase 4 plan body sizeof typo:
    Plan claimed sizeof(scaling_matrix) = 1296. Empirical = 1000
    bytes. Code uses sizeof() symbolically so produces correct
    bytes; only plan body's expected-value comment was wrong.
    Phase 7 byte-compare structural check caught it. Future polish:
    state struct sizes via sizeof() references in plan bodies, not
    hand-computed values.

iter1 carryover backlog (still deferred):

  B3 latent surface-reuse bug — picture.c:287 h264.matrix_set=false
      hits union byte 240. For HEVC: byte 240 lands in h265.picture.
      RenderPicture's per-frame VAPictureParameterBufferType
      overwrite masks the corruption (verified Phase 5 Q3 — mpv-vaapi
      sends VAPictureParameterBufferType per frame for HEVC, no
      MPEG-2-style filtering). Iter2+ Phase 4 cross-cutting candidate.

  B4 context.c H.264 device-init log noise (rkvdec accepts both
      H.264 + HEVC controls cleanly; on hantro both EINVAL with
      cosmetic log). Iter2 added a 2nd batched call for HEVC; same
      (void) swallow pattern. Cosmetic.

  B5 vbv_buffer_size 1MB vs 1.31MB (MPEG-2-only; not exercised by
      HEVC).

Phase 4 cross-cutting work items collected:
  - VIDIOC_EXPBUF + DMA_BUF_IOCTL_SYNC for vaDeriveImage cache-stale
    fix (still applies to all codecs)
  - V4L2 device-discovery probe (still applies)
  - picture.c BeginPicture profile-aware reset (B3)
  - context.c H.264+HEVC device-init log suppression (B4)
  - mpeg2 vbv_buffer_size negotiation polish (B5)
  - h265 SPS bitstream-parse fidelity polish (B6)

Campaign roadmap (codec iterations remaining):
  iter3: VP8 on hantro — implement vp8.c. Smaller scope than iter2;
         predicting closer to iter1 MPEG-2 in size. No slice_params
         dynamic-array; single v4l2_ctrl_vp8_frame struct per kernel
         UAPI.
  iter4: VP9 on rkvdec — implement vp9.c. Largest control surface
         remaining.

Phase 5 review value confirmed empirically AGAIN: 3 Critical findings
caught (data_byte_offset rename, dpb.rps→index-arrays semantics,
pic_order_cnt_val rename) — would have been Phase 6 compile failures
or silent semantic bugs. Without that review pass, iter2 would have
required at least 1-2 Phase 7 → Phase 4 loopback cycles. Reviews
are never skippable per global ~/.claude/CLAUDE.md rule; iter2
exemplifies why.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:16:55 +00:00
claude-noether 05b4bd56ec iter2 Phase 7: verification — all 5 criteria GREEN, third codec PASS
Phase 7 verification of iter2 HEVC fix executed against fork tip
8d71e20 (libva-v4l2-request-fourier master = post-iter2-Commit-B).
Verbatim raw output captured to phase0_evidence/2026-05-08/
iter2_phase7/. All five Phase 1 criteria green; bonus byte-compare
confirms structural match against Baseline B with two minor field-
value divergences (informational SPS fields VAAPI doesn't expose;
non-blocking per Criterion 4 byte-identical pixel pass).

Phase 1 → Phase 7 scoreboard:

  Criterion 1 (vainfo VAProfileHEVCMain enum):                  PASS
    rkvdec bind: H.264 (5 profiles) + HEVCMain — same as Baseline.

  Criterion 2 (vaCreateConfig SUCCESS for HEVCMain):            PASS
    Pre-iter2: VA_STATUS_ERROR_UNSUPPORTED_PROFILE (12)
    Post-iter2: VA_STATUS_SUCCESS (verified verbatim libva trace)

  Criterion 3 (ffmpeg-direct HEVC engages backend, exit 0):     PASS
    5 frames decoded clean, cap_pool_init: 24 slots ready,
    no Failed-to-create lines, no S_EXT_CTRLS EINVAL.

  Criterion 4 (DMA-BUF GL HEVC HW=SW byte-identical at +02s):   PASS
    HW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    SW frame 1: 47a5f3850df5d8c732767a227830c2272ff78402a7b6adeea329e29838808be5
    HW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    SW frame 2: a467b3bc9d7b6374b6786ecfac46932d6c7bb932ab11d311edaa233d7863e656
    Frames 1 vs 2 hash-differ (real motion).

  Criterion 5 (iter1 MPEG-2 + T4 H.264 reference hashes):       PASS
    H.264 +30s HW1: f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9 (T4 ref MATCH)
    H.264 +30s HW2: 7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8 (T4 ref MATCH)
    MPEG-2 +02s HW1: 6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092 (iter1 ref MATCH)
    MPEG-2 +02s HW2: ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de (iter1 ref MATCH)

Bonus byte-compare against Phase 3 Baseline B verbatim:

  count=5, which=V4L2_CTRL_WHICH_REQUEST_VAL=0xf010000 (controls in
  class V4L2_CTRL_CLASS_CODEC_STATELESS):
    SPS            id=0xa40a90 size=40   (matches Baseline B)
    PPS            id=0xa40a91 size=64   (matches)
    SLICE_PARAMS   id=0xa40a92 size=280  (1 slice × sizeof(slice_params))
    SCALING_MATRIX id=0xa40a93 size=1000 (matches sizeof(scaling_matrix);
                                          Phase 4 plan typo'd 1296 — actual
                                          struct sums to 1000 = 96+384+384+
                                          128+6+2)
    DECODE_PARAMS  id=0xa40a94 size=328  (matches)
    All return = 0 (kernel accepts every batched call).
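
The 1000-byte figure can be pinned with a compile-time check instead
of hand arithmetic. A minimal sketch: the struct is a local mirror
with a hypothetical name, and its field names are assumed from
mainline struct v4l2_ctrl_hevc_scaling_matrix; only the six sizes
come from the sum above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror; all members are bytes, so there is no padding. */
struct hevc_scaling_matrix_sketch {
	uint8_t scaling_list_4x4[6][16];        /*  96 */
	uint8_t scaling_list_8x8[6][64];        /* 384 */
	uint8_t scaling_list_16x16[6][64];      /* 384 */
	uint8_t scaling_list_32x32[2][64];      /* 128 */
	uint8_t scaling_list_dc_coef_16x16[6];  /*   6 */
	uint8_t scaling_list_dc_coef_32x32[2];  /*   2 */
};

/* Fails the build if the layout ever stops summing to 1000. */
static_assert(sizeof(struct hevc_scaling_matrix_sketch) == 1000,
	      "scaling matrix mirror must be 1000 bytes");

/* Size to pass as the control payload length. */
static size_t hevc_scaling_matrix_size(void)
{
	return sizeof(struct hevc_scaling_matrix_sketch);
}
```

Against the real header, the same static_assert on the kernel struct
would have caught the 1296 plan typo at test-compile time.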

  SPS field-value divergences vs Baseline B (FFmpeg-v4l2request):
    sps_max_num_reorder_pics:    post-fix=0  baseline=2   DIVERGE
    sps_max_latency_increase_plus1: post-fix=0  baseline=4 DIVERGE
    All other SPS fields match (pic_width=1280, pic_height=720,
    bit_depth=0, flags=0x180=SAO|STRONG_INTRA_SMOOTHING).

  PPS flags also diverge slightly (bit 12 ENTROPY_CODING_SYNC_ENABLED:
  post-fix unset, baseline set). Other PPS fields match.

  Cause: VAAPI's VAPictureParameterBufferHEVC doesn't expose
  sps_max_num_reorder_pics or sps_max_latency_increase_plus1, and its
  entropy_coding_sync flag is not always truthful. FFmpeg parses these
  from the bitstream directly. Operational impact NIL (Criterion 4
  byte-identical pixel pass: the kernel decoded correctly with these
  fields defaulted to 0). Phase 8 polish backlog candidate (low
  priority): add SPS bitstream parsing to extract these fields when
  VAAPI doesn't supply them.

Phase 7 → Phase 8: clean transition, no loopback.

Notable Phase 7 observations for Phase 8 memory:

  1. Phase 5 review value confirmed: 3 Critical findings (C1
     data_byte_offset rename, C2 dpb.rps→index-arrays semantics,
     C3 pic_order_cnt_val rename) caught at Phase 5 — prevented
     Phase 6 compile failures + at least 1-2 Phase 7→Phase 4
     loopback cycles. Per memory feedback_review_empirical_over_
     theoretical.md: every Critical/Should-fix verified
     empirically before responding. Lesson held.

  2. One Phase 5 amendment was empirically wrong: S1 suggested
     uniform_spacing_flag exists in VAAPI; gcc test-compile rejected.
     Both PPS bits 19+20 left zero (VAAPI exposes neither).
     Documented inline. Lesson: even reviewer-cited field mappings
     warrant empirical verification.

  3. Phase 4 plan typo: claimed sizeof(scaling_matrix) = 1296;
     empirical size is 1000. Code uses sizeof() so produces correct
     bytes. Plan body corrected out of band; not blocking.

  4. VAAPI↔V4L2 field-fidelity gaps surfaced: 2 SPS fields +
     possibly 1 PPS bit not exposed by VAAPI. Operational nil;
     Phase 8 polish-backlog candidate.

  5. mpv --hwdec=vaapi engages HEVC cleanly (no MPEG-2-style
     filtering). Confirms Phase 5 Q3 — VAPictureParameterBufferType
     sent per-frame for HEVC; latent B3 bug masked same as MPEG-2.

  6. BBB HEVC fixture is 1 slice per frame (slice_params size=280
     = 1 × sizeof). Multi-slice path in iter2 is coded but
     untested by binding cell.

Campaign scoreboard: 2/5 → 3/5 codecs passing
(H.264 in T4, MPEG-2 in iter1, HEVC in iter2). iter2 advances
to Phase 8.

Refs:
  ../libva-v4l2-request-fourier@8d71e20 (the fork tip verified)
  phase4_iter2_plan.md (10 contract clauses; SCALING_MATRIX size
                        typo noted)
  phase5_iter2_review.md (3 Critical + 4 Should-fix amendments
                          all incorporated; S1 partially empirically
                          incorrect — VAAPI doesn't expose
                          uniform_spacing_flag)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:52:10 +00:00
claude-noether 9eae068f11 iter2 Phase 5: sonnet review — 3 critical UAPI errors caught, 7 amendments
Phase 5 review run via Plan subagent with model: sonnet per
feedback_dev_process.md Phase 5 discipline. 13 findings: 3 Critical
+ 4 Should-fix + 3 Question + 3 Nit. Reviewer's bottom-line: medium
confidence (vs iter1's medium-high) — lower because the plan had
3 concrete-and-wrong claims about kernel UAPI struct fields that
would have caused compile errors or silent semantic bugs in Phase 6.

Per memory feedback_review_empirical_over_theoretical.md: every
Critical and Should-fix finding was VERIFIED against fresnel's
kernel UAPI before responding. No source-read rebuttals attempted.

Critical resolutions:

C1 (data_byte_offset, not data_bit_offset):
  Plan Clause 4 said new API "still requires bit_size + data_bit_
  offset, this logic is preserved." Empirical: struct has
  data_byte_offset (u32 byte count). FFmpeg uses straight byte
  offset, no bit search. Plan amendment: drop bit-search at
  h265.c:196-209; replace with byte-offset assignment.
  ACCEPTED.

C2 (dpb.rps GONE, pic_order_cnt_val rename, poc_st_curr_*
    arrays hold DPB indices):
  Plan Clause 6 said "DPB extraction migrates verbatim." Empirical:
    - dpb_entry has flags (only LONG_TERM_REFERENCE bit), no .rps
    - pic_order_cnt_val (singular s32) replaces pic_order_cnt[0]
    - poc_st_curr_before[16]/_after[16]/_lt_curr[16] are u8 DPB
      INDICES, not POC values; populate via FFmpeg
      get_ref_pic_index() pattern (search dpb[] by timestamp,
      return index)
  Plan amendment: replace "verbatim migration" claim with explicit
  re-spec: classify VAAPI ReferenceFrames into ST_CURR_BEFORE/
  AFTER/LT_CURR lists, assign DPB indices, populate arrays with
  indices.
  ACCEPTED.
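
The index-versus-POC distinction in C2 can be sketched as a lookup
by timestamp. This is not FFmpeg's get_ref_pic_index(); it is a
hypothetical equivalent, and the struct, the helper name, and the
0xff not-found sentinel are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DPB_ENTRIES_MAX 16   /* mirrors V4L2_HEVC_DPB_ENTRIES_NUM_MAX */
#define SK_DPB_NOT_FOUND   0xffu /* hypothetical sentinel, not a UAPI value */

/* Hypothetical stand-in for a DPB entry; only the timestamp matters. */
struct dpb_entry_sketch {
	uint64_t timestamp;
};

/* Return the DPB index whose timestamp matches, or SK_DPB_NOT_FOUND. */
static uint8_t dpb_index_by_timestamp(const struct dpb_entry_sketch *dpb,
				      size_t num_entries, uint64_t timestamp)
{
	for (size_t i = 0; i < num_entries && i < SK_DPB_ENTRIES_MAX; i++)
		if (dpb[i].timestamp == timestamp)
			return (uint8_t)i;
	return SK_DPB_NOT_FOUND;
}

static void dpb_index_selfcheck(void)
{
	const struct dpb_entry_sketch dpb[] = { { 1000 }, { 2000 }, { 3000 } };
	uint8_t poc_st_curr_before[SK_DPB_ENTRIES_MAX] = { 0 };

	/* The array stores the index (1), not the POC or the timestamp. */
	poc_st_curr_before[0] = dpb_index_by_timestamp(dpb, 3, 2000);
	assert(poc_st_curr_before[0] == 1);
	assert(dpb_index_by_timestamp(dpb, 3, 9999) == SK_DPB_NOT_FOUND);
}
```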

C3 (union-aliasing reasoning wrong, claim still right):
  Same anti-pattern as iter1 review C1. Plan said reset is benign
  because RenderPicture per-buffer copies overwrite byte 17764.
  Empirical: byte 17764 lands in num_slices region; non-HEVC
  profiles never read that location. Reset is benign because
  non-aliasing, NOT because of overwriting. Wording amended.
  ACCEPTED.

Should-fix resolutions:

S1 (PPS flags 19+20 missing): empirical confirms
  V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT (1ULL<<19)
  V4L2_HEVC_PPS_FLAG_UNIFORM_SPACING (1ULL<<20)
  Plan amended to add both. ACCEPTED.

S2 (3 PPS scalars missing): empirical PPS struct dump confirms
  pic_parameter_set_id, num_ref_idx_l0_default_active_minus1,
  num_ref_idx_l1_default_active_minus1 all present in modern
  struct. Plan amended to populate. ACCEPTED.

S3 (SCALING_MATRIX content divergence FFmpeg vs libva):
  FFmpeg sends memset-zero when no scaling list in stream
  (BBB has no scaling_list — SPS flags=SAO|STRONG_INTRA only).
  Plan said "populate spec defaults when iqmatrix_set==false."
  Phase 6 implementer choice; document in commit which path
  taken. Phase 7 byte-compare validates. ACCEPTED as choice
  rather than mandate.

S4 (FFmpeg function name wrong cite):
  Plan cited ff_v4l2_request_query_control_default_value;
  actual is ff_v4l2_request_query_control. Cosmetic fix.
  ACCEPTED.

Question resolutions:

Q1 (object_heap allocator size handling): VERIFIED safe.
  request.c:142-143 uses sizeof(struct object_surface). Adding
  slices[64] auto-picks-up the larger size.

Q2 (slice_segment_addr field): VERIFIED present in struct.
  Plan amended Clause 4: populate from VAAPI
  slice->slice_segment_address. Single-slice BBB safe with
  implicit zero; multi-slice would corrupt without this field.

Q3 (VAPictureParameterBufferType per-frame send for HEVC):
  Deferred to Phase 7 LIBVA_TRACE capture. iter1+T4 patterns
  suggest yes, worth grepping at verification time.

Nits N1+N2+N3: array size [16] not [8]; image-output
  directory naming cosmetic; BeginPicture cleanup deferred.

Plan amendments consolidated:
  1. Clause 4: data_byte_offset; drop bit-search; add
     slice_segment_addr population (C1 + Q2)
  2. Clause 6: explicit DPB classification + index-array logic;
     pic_order_cnt_val rename; drop dpb.rps (C2)
  3. Clause 3: 2 PPS flags + 3 scalars (S1, S2)
  4. Clause 5: function name fix (S4); SCALING_MATRIX divergence
     deferred to Phase 6 implementer (S3)
  5. Clause 10: union-aliasing reasoning corrected (C3)
  6. Clause 6: V4L2_HEVC_DPB_ENTRIES_NUM_MAX=16 macro reference (N1)
  7. Phase 7 harness: rename png_* → image_* dirs (N2)

Plan re-locks with these amendments. Phase 6 proceeds.

Per global ~/.claude/CLAUDE.md rule: Phase 5 reviews never
skippable. iter2's review was the right path forward — caught
3 concrete UAPI errors (data_bit_offset → data_byte_offset rename;
dpb.rps field gone; pic_order_cnt struct shape) that would have
been Phase 6 compile failures or silent Phase 7 byte-compare
divergences requiring loopback. Outside-look value substantial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:51:33 +00:00
claude-noether 348736eb63 iter2 Phase 4: plan — 10 contract clauses, ~400-line h265.c rewrite
Phase 4 plan for iter2 HEVC fix. Structured per the
feedback_dev_process.md Phase 6 contract-before-code worked example
(0012-h264-omit-scaling-matrix-frame-based.patch shape): contract
clauses with citations first, then code changes mapping 1:1 to
clauses.

10 contract clauses cited from authoritative sources:

  Clause 1 — Per-frame batched VIDIOC_S_EXT_CTRLS, count=5
    Authority: linux/v4l2-controls.h:2090-2300 (8 HEVC stateless CIDs)
    Reference impl: FFmpeg libavcodec/v4l2_request_hevc.c:505-565
                    (v4l2_request_hevc_queue_decode)
    Empirical anchor: Phase 3 Baseline B verbatim payload

  Clause 2 — v4l2_ctrl_hevc_sps layout (40 bytes)
    Authority: linux/v4l2-controls.h:2096+ (struct + 9 SPS_FLAG_* bits)
    Field-by-field VAAPI source mapping table; existing
    h265_fill_sps logic preserved, just routed to flags bitmask
    Phase 3 Baseline B BBB SPS bytes: flags=SAO|STRONG_INTRA_SMOOTHING

  Clause 3 — v4l2_ctrl_hevc_pps layout (64 bytes, 19 flags)
    Authority: linux/v4l2-controls.h:2126-2150
    Field source: VAPictureParameterBufferHEVC + slice (for
                  dependent_slice_segment_flag)

  Clause 4 — v4l2_ctrl_hevc_slice_params (variable; dynamic-array)
    Authority: kernel exposes 0xa40a92 elems=1 dims=[600] dynamic-array
    Submission shape: size = sizeof(slice_params) * num_slices_in_frame
    Reference impl: FFmpeg v4l2_request_hevc.c:540-547
    BEHAVIORAL CHANGE: per-slice accumulation in codec_store_buffer
                      (replace overwrite with append-to-array)
    DPB MOVES OUT of slice_params to DECODE_PARAMS (Clause 6)

  Clause 5 — v4l2_ctrl_hevc_scaling_matrix (size M; conditional)
    Conditional on kernel availability (probed via VIDIOC_QUERY_EXT_CTRL
    at init), NOT on bitstream flag (Phase 3 baseline corrects Phase 2
    assumption)
    Spec defaults from ISO/IEC 23008-2 Table 4-1 when iqmatrix_set==false
    PROTOCOL: transcribe defaults from Phase 3 Baseline B verbatim
              SCALING_MATRIX bytes, NOT from spec recall (per
              memory feedback_review_empirical_over_theoretical.md)

  Clause 6 — v4l2_ctrl_hevc_decode_params layout (328 bytes)
    NEW in modern API (didn't exist in staging-era)
    Contains: DPB array (16 entries), POC, num_active_dpb_entries,
              num_poc_st_curr_before/after, num_poc_lt_curr,
              poc_st_curr_before[8], etc.
    Source: existing h265_fill_slice_params lines 269-315 logic
            preserved, routed to new struct

  Clause 7 — Device-wide DECODE_MODE + START_CODE menus
    Set once at init via v4l2_set_controls(...request_fd=-1, 2 ctrls)
    rkvdec accepts: FRAME_BASED + ANNEX_B (only options per kernel menu
                    constraints, Phase 0 v4l2_inventory)
    Default location: extend src/context.c:142-155 device-init block

  Clause 8 — config.c HEVCMain case must break;
    Authority: C semantics; iter1 Bug 1 pattern verbatim
    Empirical anchor: Phase 3 Baseline D scratch confirmed

  Clause 9 — picture.c::codec_set_controls HEVCMain dispatch
    Authority: existing MPEG-2 dispatch pattern at picture.c:186-191
    Replace the explicit "Fourier-local: HEVC stripped" reject with
    an h265_set_controls call

  Clause 10 — Per-slice accumulation in codec_store_buffer
    HEVC slice_params dynamic-array source = per-RenderPicture appends
    BeginPicture resets num_slices=0; codec_store_buffer appends each
    VASliceParameterBufferType to slices[N] array
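
The append-not-overwrite behavior in Clauses 4 and 10 could be
sketched roughly as follows. This is a minimal illustration, not the
fork's code: struct slice_entry, struct h265_slice_store, MAX_SLICES
and the three helper names are hypothetical stand-ins, and the real
entry type would be the kernel's HEVC slice_params struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLICES 64 /* mirrors the slices[64] figure in the diff scope */

/* Hypothetical stand-in for one V4L2 HEVC slice_params entry. */
struct slice_entry {
    unsigned int bit_size;
    unsigned int data_bit_offset;
};

/* Hypothetical stand-in for the params.h265 extension. */
struct h265_slice_store {
    struct slice_entry slices[MAX_SLICES];
    unsigned int num_slices;
};

/* BeginPicture: every frame starts with an empty slice array. */
static void h265_begin_picture(struct h265_slice_store *st)
{
    assert(st != NULL);
    st->num_slices = 0;
}

/* RenderPicture: append one slice; refuse rather than overwrite when full. */
static bool h265_append_slice(struct h265_slice_store *st,
                              const struct slice_entry *s)
{
    assert(st != NULL && s != NULL);
    if (st->num_slices >= MAX_SLICES)
        return false;
    st->slices[st->num_slices++] = *s;
    return true;
}

/* EndPicture: byte size of the dynamic-array payload for this frame. */
static size_t h265_slice_payload_size(const struct h265_slice_store *st)
{
    assert(st != NULL);
    return sizeof(st->slices[0]) * st->num_slices;
}
```

The payload size scales with num_slices, which matches the
"size = sizeof(slice_params) * num_slices_in_frame" submission shape
in Clause 4. Returning false on overflow is a design choice of this
sketch; the plan does not say how the backend should react to a 65th
slice.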

Diff scope (8 files):
  src/config.c     — 5-line break addition (Clause 8)
  src/picture.c    — HEVCMain dispatch (Clause 9) + per-slice
                     accumulation (Clause 10) + BeginPicture
                     num_slices reset, ~25 lines
  src/surface.h    — extend params.h265 with slices[64] +
                     num_slices, ~17 KB extra per surface union
  src/h265.c       — full rewrite ~400 lines (Clauses 2-7)
  src/h265.h       — re-enable
  src/meson.build  — uncomment h265.c + h265.h
  src/context.c    — extend device-init for HEVC DECODE_MODE +
                     START_CODE
  include/hevc-ctrls.h — leave as-is (9-line shim, lower-risk path
                          per iter1 Phase 5 Nit 6 deferral)

Phase 6 implementation order (2 logical commits + optional fix-forward):
  A: src/config.c HEVCMain break only (substrate fix in isolation;
     Phase 3 Baseline D already verified collateral safe)
  B: h265.c rewrite + picture.c dispatch + slice_params accumulation +
     meson re-enable + surface.h extension + context.c device-init
  C: optional fix-forward if Phase 7 surfaces a regression

Phase 7 verification harness (full Bash incantations in plan body):
  Criterion 1: vainfo lists VAProfileHEVCMain on rkvdec
  Criterion 2: vaCreateConfig(VAProfileHEVCMain) = SUCCESS via libva trace
  Criterion 3: ffmpeg -hwaccel vaapi exit 0, no Failed-to-create
  Criterion 4: mpv --hwdec=vaapi --vo=image at +02s; HW=SW byte-identical
              (DMA-BUF GL cache-coherency-safe path per memory
              feedback_rockchip_pixel_verify_path.md)
  Criterion 5: iter1 MPEG-2 + T4 H.264 reference hashes still match
  Bonus: byte-compare post-fix S_EXT_CTRLS payload vs Baseline B

Pre-identified Phase 7 → Phase 4 loopback triggers:
  1. S_EXT_CTRLS EINVAL post-fix → check struct sizes (pahole),
     reserved zeroing, SCALING_MATRIX size encoding
  2. HW pixel hash mismatch → DPB ordering, slice_params bit_offset,
     SPS/PPS flags bit positions, SCALING_MATRIX values
  3. mpv --hwdec=vaapi filters HEVC out → fall-forward to ffmpeg
     -vf hwdownload (less likely; vaapi engaged MPEG-2 in iter1)
  4. iter1/T4 regression → verify diffs scoped right
  5. Slice_params dynamic-array submission shape rejected → cross-
     validator size encoding anchor
  6. SCALING_MATRIX availability detection wrong → defensive
     QUERY_EXT_CTRL probe in h265_init_device_controls
  7. Latent bug B3 hits HEVC differently than MPEG-2 → byte 240 in
     h265.picture; ffmpeg-vaapi sends VAPictureParameterBufferType
     per frame so masking holds

Out-of-scope (LOCKED): VP9/VP8; HEVC Main 10 / Main Still Picture /
range ext / tile-wavefront; perf metrics; long-duration stress;
SLICE_BASED decode mode (rkvdec FRAME_BASED only); Phase 4 cross-
cutting backlog (B1 device-discovery, B3 BeginPicture profile-aware,
B4 context.c log suppression, B5 vbv_buffer_size, L3 vaDeriveImage
cache-stale); chromium-fourier 149 install; upstream engagement;
hevc-ctrls.h deletion (Phase 5 Nit 6 lower-risk path continues).

Predicted Phase 8 close: 4-6 commits on the fork (vs iter1's 4).
iter2's code delta is roughly 3x iter1's (the mpeg2.c rewrite was
~120 lines; the h265.c rewrite is ~400 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:56:51 +00:00
claude-noether d35a247948 iter2 Phase 3: baselines — substrate verified post-upgrade, HEVC anchor captured
Phase 3 baselines for iter2 HEVC. Substrate-update verification
ran first (post pacman -Syu rolling upgrade), then iter2-specific
HEVC cross-validator anchor + Bug 1 scratch.

Pre-Phase-3 substrate event: pacman -Syu landed 71 packages.
The "scheduled for linux-7" upgrade was headers-only —
linux-eos-arm-headers 6.19.9-99 → 7.0.3-1, but linux-eos-arm
kernel binary stayed at 6.19.9-99 (EOS-ARM repo hasn't
published the matching 7.x kernel yet). Userland refreshed:
qt6-base epoch bump, libdrm 2.4.131 → 2.4.133, chromium
147 → 148, KDE 26.04.1 batch, mkinitcpio 41-3, etc. OC DTB
intact (sha256 unchanged). mfritsche Plasma session active
throughout, no SDDM regression on this kernel boot.
eos-reboot-recommended marker installed; reboot deferred.

Baseline A (substrate validation post-upgrade):

  All 8 reference hashes (T4 H.264 at +30s and iter1 MPEG-2 at
  +02s; HW and SW, 2 frames each) match exactly:
    H.264 HW1=SW1=f623d5f7a41697f67dd227275c6f1b21ffc257f65626d32fde8229357f8764c9
    H.264 HW2=SW2=7d7bc6f2146dda8b2d223bba622c4b9fbe9674181ff1e02afe286b620342e0a8
    MPEG-2 HW1=SW1=6e7873030dbf0403c67f35dd106ebef3c7909a0fd12433b82ad758e7fee9f092
    MPEG-2 HW2=SW2=ccc7ce08810d4a96e9ba7a19f4f95bbf6cc861bda9337604b5c668ad52bef7de
  Userland upgrade did not regress kernel-side decode or
  DMA-BUF GL readback.

Baseline B (HEVC cross-validator verbatim contract anchor):

  ffmpeg -hwaccel v4l2request decoded bbb_720p10s_hevc.mp4
  -frames:v 5 cleanly. Per-frame submission shape:

    VIDIOC_S_EXT_CTRLS, ctrl_class=V4L2_CTRL_CLASS_CODEC_STATELESS,
                        count=5
      0xa40a90 SPS            size=40
      0xa40a91 PPS            size=64
      0xa40a92 SLICE_PARAMS   size=N (dynamic-array)
      0xa40a93 SCALING_MATRIX size=M
      0xa40a94 DECODE_PARAMS  size=328
    Plus init device-wide:
      0xa40a95 DECODE_MODE    (menu, set once)
      0xa40a96 START_CODE     (menu, set once)

  Key Phase 2 amendments from Phase 3 evidence:
    - Per-frame batch is 5 controls (not "up to 6" — BBB
      doesn't trigger ENTRY_POINT_OFFSETS / EXT_SPS_*).
    - SCALING_MATRIX is sent unconditionally for BBB. FFmpeg
      gates on ctx->has_scaling_matrix from kernel
      VIDIOC_QUERY_EXT_CTRL at init, NOT on per-frame
      bitstream flags. Phase 4 plan amends: query kernel for
      SCALING_MATRIX availability at init, submit if available.

  SPS payload field-decoded (40 bytes verbatim from BBB
  fixture): 1280x720, 8-bit, 4:2:0, no PCM, flags = SAO |
  STRONG_INTRA_SMOOTHING. PPS + DECODE_PARAMS + SLICE_PARAMS +
  SCALING_MATRIX payloads captured for Phase 4 transcription.

Baseline C (slice-count probe): deferred. ffprobe confirms
1 video stream HEVC Main 1280x720 24fps 10s. Per-frame
slice-count not directly extracted; assume 1 slice/frame for
x265 ultrafast preset until Phase 6 verifies. Kernel
advertises slice_params dynamic-array max 600 entries
(phase0 v4l2_inventory), so multi-slice frames are supported
by the contract.

Baseline D (Bug 1 scratch test, collateral safety):

  Applied Bug 1 (config.c break for HEVCMain) on throwaway
  branch; h265.c stayed disabled. Built + installed.
    H.264 HW frames @ +30s: f623d5f7..., 7d7bc6f2... (match T4)
    MPEG-2 HW frames @ +02s: 6e7873030dbf..., ccc7ce08810d...
                              (match iter1)
  Bug 1 in isolation does not regress H.264 or MPEG-2.

  HEVC behavior with Bug 1 only:
    libva trace: vaCreateConfig SUCCESS for VAProfileHEVCMain
    ffmpeg: Task finished with error code: -5 (Input/output error)
  Decode fails downstream because picture.c:204-206 still holds
  the explicit reject (case VAProfileHEVCMain: return
  UNSUPPORTED_PROFILE), i.e. Bug 2. This confirms the Phase 2
  prediction. The Bug 2 fix requires h265_set_controls to exist
  (Bugs 3-6: enable +
  rewrite). Bug 2 lands together with the h265.c rewrite in
  Commit B (analogous to iter1 Commit B).

  Scratch state cleaned: git checkout + rebuild + reinstall
  master backend. H.264 + MPEG-2 still pass. Back to Baseline-A-
  equivalent state.

Phase 4 plan inputs updated:
  - Per-frame batch: 5 controls (not "up to 6")
  - SCALING_MATRIX: unconditional iff kernel advertises (init
    QUERY_EXT_CTRL probe), not bitstream-conditional
  - SLICE_PARAMS: dynamic-array (max 600 elems per kernel UAPI)
  - DECODE_MODE + START_CODE: 2 device-wide menus at init
  - Phase 7 harness anchors on mpv-vaapi-vo=image (DMA-BUF GL
    cache-coherency-safe path per
    feedback_rockchip_pixel_verify_path.md)
  - Phase 7 bonus: byte-compare post-fix S_EXT_CTRLS payload
    against Baseline B (empirical wins, per
    feedback_review_empirical_over_theoretical.md)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:13:38 +00:00
claude-noether b3ba157cb4 iter2 Phase 2: situation analysis — six bugs in HEVC path
Phase 2 source-read of the HEVC path post-iter1-close (fork master
229d6d1). Six bugs identified, all in libva backend; kernel + driver
path proven for HEVC in Phase 0 cross-validator sweep.

Substrate timing caveat: Phase 2 conducted against fresnel kernel
6.19.9-99. Operator-scheduled rolling pacman -Syyuu to linux-7
imminent. Phase 2 source-read findings are kernel-agnostic (fork
code + UAPI + FFmpeg reference); they carry forward across the
kernel jump unchanged. Phase 3 baselines will run on linux-7.

Bug 1 — src/config.c:64-69 HEVCMain falls through to default,
returns VA_STATUS_ERROR_UNSUPPORTED_PROFILE. Verbatim match for
iter1 Bug 1 pattern; fix is 3-line break addition.
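
The fall-through pattern can be illustrated in isolation. This is a
hypothetical reduction, not the text of src/config.c: the enum names
and both functions are invented for the sketch, and the real switch
may be shaped differently.

```c
#include <assert.h>

enum profile { PROFILE_MPEG2, PROFILE_H264, PROFILE_HEVC_MAIN, PROFILE_OTHER };
enum status { STATUS_SUCCESS = 0, STATUS_UNSUPPORTED_PROFILE = 1 };

static_assert(STATUS_SUCCESS == 0, "callers test status against zero");

/* Buggy shape: the HEVC case has no break, so control reaches default. */
static enum status check_profile_buggy(enum profile p)
{
    switch (p) {
    case PROFILE_MPEG2:
    case PROFILE_H264:
        break;
    case PROFILE_HEVC_MAIN:
        /* fall through */
    default:
        return STATUS_UNSUPPORTED_PROFILE;
    }
    return STATUS_SUCCESS;
}

/* Fixed shape: the added break lets HEVC reach the success path. */
static enum status check_profile_fixed(enum profile p)
{
    switch (p) {
    case PROFILE_MPEG2:
    case PROFILE_H264:
        break;
    case PROFILE_HEVC_MAIN:
        break;
    default:
        return STATUS_UNSUPPORTED_PROFILE;
    }
    return STATUS_SUCCESS;
}
```

Profiles that were already rejected stay rejected after the fix; only
the HEVC case changes outcome.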

Bug 2 — src/picture.c:204-206 explicit
case VAProfileHEVCMain: return UNSUPPORTED_PROFILE
with stale comment "Fourier-local: HEVC stripped, no HW support
on RK3566." (RK3566 is ohm context; fresnel is RK3399 where
rkvdec DOES support HEVC.) Fix: replace explicit reject with
dispatch to h265_set_controls() (mirrors MPEG-2 dispatch at
picture.c:186-191).

Bug 3 — src/h265.c uses staging-era CIDs:
  V4L2_CID_MPEG_VIDEO_HEVC_PPS / _SPS / _SLICE_PARAMS
These don't exist on fresnel's 6.19 kernel headers (verified via
test-compile: gcc reports undeclared identifiers, suggests
V4L2_CID_MPEG_VIDEO_DEC_PTS as nearest match). Mainline kernel
UAPI splits HEVC stateless into 7 controls:
  V4L2_CID_STATELESS_HEVC_{SPS,PPS,SLICE_PARAMS,SCALING_MATRIX,
                            DECODE_PARAMS,DECODE_MODE,START_CODE}
  + ENTRY_POINT_OFFSETS, EXT_SPS_ST_RPS, EXT_SPS_LT_RPS
(0xa40a90..0xa40a96 + extensions, V4L2_CID_CODEC_STATELESS_BASE
+ 400..407+).
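
The id arithmetic can be checked with a few lines of C. A minimal
sketch: the class and base values are restated locally so it builds
without recent kernel headers, and hevc_cid plus the enum are
hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Restated from linux/v4l2-controls.h so no recent headers are needed. */
#define CTRL_CLASS_CODEC_STATELESS 0x00a40000u
#define CID_CODEC_STATELESS_BASE   (CTRL_CLASS_CODEC_STATELESS | 0x900u)
#define HEVC_CID_OFFSET            400u

/* Order follows the 0xa40a90..0xa40a96 listing in this log. */
enum hevc_ctrl_index {
    HEVC_SPS = 0,
    HEVC_PPS,
    HEVC_SLICE_PARAMS,
    HEVC_SCALING_MATRIX,
    HEVC_DECODE_PARAMS,
    HEVC_DECODE_MODE,
    HEVC_START_CODE
};

/* Compute the numeric control id for one of the seven core controls. */
static uint32_t hevc_cid(enum hevc_ctrl_index idx)
{
    assert(idx <= HEVC_START_CODE);
    return CID_CODEC_STATELESS_BASE + HEVC_CID_OFFSET + (uint32_t)idx;
}
```

Since 400 is 0x190, the base 0xa40900 plus 400 gives 0xa40a90 for
SPS, and the remaining six controls follow consecutively.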

Fix shape: rewrite h265.c against new split API. Substantially
larger than iter1's mpeg2.c rewrite (HEVC has 7 controls vs MPEG-2
3, + slice_params dynamic-array, + per-slice accumulation logic
needed).

Bug 4 — h265.c uses single-slice_params shape; new API is
dynamic-array. Fresnel rkvdec advertises:
  hevc_slice_parameters 0xa40a92 elems=1 dims=[600] dynamic-array
Up to 600 slice_params entries per submission. Current
codec_store_buffer:115-135 OVERWRITES previous slice on
VASliceParameterBufferType arrival. Multi-slice frames need
APPEND-not-overwrite. FFmpeg reference v4l2_request_hevc.c:540-547
shows the pattern.

Fix shape: extend params.h265 to hold slice_params array (or
pointer+count); codec_store_buffer appends; h265_set_controls
flushes the array at end_picture as a single dynamic-array
S_EXT_CTRLS entry.

Bug 5 — h265.c missing controls: doesn't submit DECODE_PARAMS
(per-frame DPB info; new in modern API), SCALING_MATRIX (conditional
on iqmatrix_set + sps.scaling_list_enabled), DECODE_MODE+START_CODE
(device-wide menus, set once per context init).

Fix shape: add h265_fill_decode_params() (DPB ordering from VAAPI
ReferenceFrames[15] — preserve current extraction logic from
h265_fill_slice_params:269-315, route to new struct). Conditional
SCALING_MATRIX from VAIQMatrixBufferHEVC. Device-wide
DECODE_MODE+START_CODE either at first h265_set_controls call or
in extended context.c device-init block.

Bug 6 — src/meson.build comments out 'h265.c' (line 50) and
'h265.h' (line 73). Fix: uncomment both. Trivial.

Bug 7 (verify only) — include/hevc-ctrls.h is a 9-line shim that
just #include <linux/v4l2-controls.h>. Comment dates the
modernization to "linux-media 6.6+". Adds zero value; harmless.
Leave in place per iter1 Phase 5 Nit 6 lower-risk path.

Bug 8 (latent) — picture.c:287 params.h264.matrix_set=false
writes union byte 240. For HEVC: byte 240 lands inside
h265.picture (range [0..604), size 604) — different field than
MPEG-2's chroma_intra_quantiser_matrix. ffmpeg-vaapi's
per-frame VAPictureParameterBufferHEVC re-send overwrites the
corrupted byte before h265_set_controls reads. Latent for
clients that reuse a surface without re-sending picture params.
iter2+ Phase 4 cross-cutting backlog candidate; not iter2 scope.
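
The aliasing mechanism behind Bug 8 can be reproduced with a toy
union. The layouts below are hypothetical and chosen only so that the
H.264 flag sits at byte 240 and the HEVC picture block spans 604
bytes, as described above; they are not the fork's structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts: only the offsets matter for the illustration. */
struct fake_h264_params {
    unsigned char picture[240];
    bool matrix_set; /* lands at byte 240 */
};

struct fake_h265_params {
    unsigned char picture[604]; /* covers bytes [0..604) */
};

union fake_codec_params {
    struct fake_h264_params h264;
    struct fake_h265_params h265;
};

static_assert(offsetof(struct fake_h264_params, matrix_set) == 240,
              "flag must sit at byte 240 for this illustration");

/*
 * Simulate a profile-unaware reset on a surface that holds HEVC data.
 * Returns true when the H.264-typed write changed the HEVC picture bytes.
 */
static bool stale_reset_clobbers_h265(void)
{
    union fake_codec_params p;

    memset(&p, 0xff, sizeof(p));  /* pretend HEVC picture params were stored */
    p.h264.matrix_set = false;    /* the H.264-specific reset */
    return p.h265.picture[240] != 0xff;
}
```

A client that re-sends picture parameters every frame overwrites the
clobbered byte before it is read, which is why the bug stays latent in
that usage.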

Things verified NOT bugs:
  - h265_fill_pps/sps/slice_params field extraction from VAAPI
    structs is sound (just routes to wrong destination structs)
  - NAL header parsing (data_bit_offset bit-search) is preserved
    in new API — slice_params still has bit_size + data_bit_offset
  - v4l2_set_controls batching API in place (used by H.264 + iter1
    MPEG-2; iter2 uses same)

Substrate / kernel observation:
  - Linux mainline 7.1.0-rc2 reference checkout has
    drivers/staging/media/rkvdec/ with rkvdec.c, rkvdec-h264.c,
    rkvdec-vp9.c — NO rkvdec_hevc.c. fresnel's HEVC support is
    out-of-tree (Christian Hewitt patches per phase0_findings.md
    external references). May land in stable 7.x.
  - Phase 4 contract-before-code therefore can't cite kernel-side
    HEVC handler source until/unless rkvdec_hevc.c lands in
    mainline. UAPI doc + FFmpeg reference + Phase 3 cross-validator
    bytes are the contract anchor.

Open questions tabled for Phase 3 (post-linux-7-upgrade):
  1. iter1 + T4 references on linux-7 (regression check of closed
     iter1 work)
  2. SDDM watchpoint on linux-7
  3. Cross-validator HEVC re-anchor (Baseline C equivalent for
     HEVC) — verbatim payload bytes for SPS, PPS, DECODE_PARAMS,
     SLICE_PARAMS array, SCALING_MATRIX
  4. Pre-fix scratch test (Bug 1 + Bug 2 only, h265.c kept
     commented out) — confirm collateral safe
  5. Slice-count for bbb_720p10s_hevc.mp4 fixture
  6. Whether linux-7 brings rkvdec_hevc.c into mainline

Predicted iter2 close shape: trivial Bugs 1+2+6 fixes + sizable
h265.c rewrite (~250-400 lines, ~3x iter1's mpeg2.c) + new
codec_store_buffer slice accumulation logic. If Phase 7 fails:
likely struct-size mismatch (run pahole), DPB ordering, or
slice_params array size encoding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:28:08 +00:00
claude-noether 6e8c970c1d iter2 Phase 0 + Phase 1 lock: HEVC Main on rkvdec
Iteration 2 of the campaign 8(+1)-phase loop opens following iter1
close (dc69378). Per the ordering suggested in
phase0_evidence/2026-05-07/cross_validator_traces.md, iter2 attacks
HEVC Main on rkvdec. The kernel + driver path is verified working
(cross-validator sweep exit 0); the broken link is the libva backend
at five distinct sites:

  src/config.c HEVCMain case fall-through (analogous to iter1 Bug 1)
  src/picture.c HEVCMain explicit UNSUPPORTED_PROFILE reject (NEW)
  src/h265.c uncompiled in build (presumably staging-era CIDs;
              Phase 2 source-read decides scope of rewrite)
  include/hevc-ctrls.h staging-era local header (deferred from
              iter1 Phase 5 Nit 6; iter2 closes the loop)
  src/meson.build h265.c commented out (re-enable)

Plus possible novel issues vs iter1's MPEG-2 work:
  - HEVC has 10 stateless control IDs vs MPEG-2's 3 (much larger
    rewrite if h265.c uses staging-era API)
  - HEVC slice_params is dynamic-array (kernel rkvdec accepts up
    to 600 entries) — different submission shape vs MPEG-2 single-
    struct or H.264 fixed-shape
  - HEVC SCALING_MATRIX is conditional (only when
    scaling_list_enabled is set in SPS); needs a mapping from
    VAIQMatrixBufferHEVC to the V4L2 control
  - HEVC ENTRY_POINT_OFFSETS is in kernel surface (tile/slice
    resync) but campaign fixture doesn't use tiles — defer

Locked research question:

  Make HEVC Main the third codec to pass boolean-correctness on
  fresnel via libva-v4l2-request-fourier — mpv --hwdec=vaapi
  bbb_720p10s_hevc.mp4 engages backend cleanly and DMA-BUF GL
  import yields HW pixels byte-identical to SW reference for the
  same frames.

Phase 1 success criterion (5 boolean checks, all must pass):

  1. vainfo enumerates VAProfileHEVCMain on rkvdec env binding
     (regression check; already passes today).
  2. vaCreateConfig(VAProfileHEVCMain, VLD) returns
     VA_STATUS_SUCCESS. (Pre-iter2:
     VA_STATUS_ERROR_UNSUPPORTED_PROFILE.)
  3. ffmpeg -hwaccel vaapi -i bbb_720p10s_hevc.mp4 -frames:v 5
     -f null - exits 0 cleanly with no Failed-to-create-decode-
     configuration lines and no S_EXT_CTRLS EINVAL on HEVC
     controls. (Phase 1 criterion 3 anchored on ffmpeg-direct,
     mirroring iter1 Phase 5 Q4 amendment for codecs mpv may
     filter out.)
  4. mpv --hwdec=vaapi --vo=image at +02s seek: 2 distinct frames
     hash-equal to SW reference, hash-differ from each other (real
     motion). DMA-BUF GL import path per memory
     feedback_rockchip_pixel_verify_path.md (NOT
     ffmpeg-vaapi+hwdownload, which is
     cache-stale on RK3399 for both H.264 and MPEG-2 per iter1
     Phase 6/7 findings).
  5. iter1 MPEG-2 + T4 H.264 reference hashes BOTH still match
     (regression check on prior-iteration cells):
       MPEG-2 +02s: HW1=6e7873030dbf...   HW2=ccc7ce08810d...
       H.264 +30s:  HW1=f623d5f7a416...   HW2=7d7bc6f2146d...
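
The pass condition in criterion 4 reduces to three string
comparisons. A minimal sketch, assuming the hashes arrive as
hex strings (for example from sha256sum); pixel_check_passes is a
hypothetical helper name, not part of the harness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Criterion 4: each HW frame must hash equal to its SW reference, and
 * the two frames must differ from each other so that a decoder stuck
 * on one frame cannot pass.
 */
static bool pixel_check_passes(const char *hw1, const char *sw1,
                               const char *hw2, const char *sw2)
{
    assert(hw1 != NULL && sw1 != NULL && hw2 != NULL && sw2 != NULL);

    bool hw_matches_sw = strcmp(hw1, sw1) == 0 && strcmp(hw2, sw2) == 0;
    bool real_motion = strcmp(hw1, hw2) != 0;

    return hw_matches_sw && real_motion;
}
```

The motion check matters because two identical frames that both
match the reference would also be consistent with a frozen or
repeated output.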

Substrate carry-over:

  - libva-v4l2-request-fourier master tip post-iter1-close
    (commits e7dad7a..229d6d1 stack on iter8 65969da).
  - bbb_720p10s_hevc.mp4 fixture (620 KB, HEVC Main, 1280x720,
    24fps, 10s, yuv420p; provenance phase0_evidence/2026-05-07/
    test_fixtures.md).
  - Cross-validator anchor: phase0_evidence/2026-05-07/
    cross_validator/hevc/ — 14 S_EXT_CTRLS + 5 QUERY_EXT_CTRL
    (HEVC slice_params dynamic-array introspection unique among
    the 5 codecs) + 4 REQUEST_ALLOC.
  - Memory carries forward: feedback_gitea_as_claude_noether,
    feedback_no_session_termination_attempts,
    feedback_header_deletion_check (iter1 lesson L1; apply to
    hevc-ctrls.h deletion),
    feedback_review_empirical_over_theoretical (iter1 lesson L2;
    apply to Phase 5 review responses),
    feedback_rockchip_pixel_verify_path (iter1 lesson L3; DMA-BUF
    GL is the verifier, NOT cached-mmap).

Out-of-scope (LOCKED): VP9/VP8 (later iterations); HEVC Main 10
(silicon support unverified); HEVC Main Still Picture; performance
metrics; long-duration HEVC stress; tile / wavefront parallel
processing (ENTRY_POINT_OFFSETS); Phase 4 cross-cutting backlog
(B1 device-discovery, B3 BeginPicture profile-aware reset, B4
context.c log suppression, B5 vbv_buffer_size negotiation, L3
vaDeriveImage cache-stale fix); chromium-fourier 149 install;
src/context.c changes; upstream engagement.

Predecessor open questions:
  - iter1 B3 latent surface-reuse bug (picture.c:287
    h264.matrix_set=false hits union byte 240) — for HEVC, the
    union member is params.h265.{picture,slice,iqmatrix,
    iqmatrix_set}. params.h265 layout differs from params.mpeg2.
    Phase 2 source-read action item: verify whether byte 240 lands
    in a meaningful HEVC field. If so, iter2 may need to address
    even though MPEG-2 didn't.

Phase 2 source-read targets (queued for next phase):
  - src/h265.c (~267 lines) — current state, target API
  - src/picture.c:204-206 (the explicit HEVC reject)
  - src/config.c:55-69 (confirm HEVCMain fall-through)
  - src/surface.h:103-108 (params.h265 struct)
  - include/hevc-ctrls.h (staging-era; identify CID/struct refs)
  - src/meson.build (commented-out h265.c)
  - linux/v4l2-controls.h:2110+ (modern HEVC stateless UAPI)
  - drivers/staging/media/rkvdec/rkvdec_hevc.c (rkvdec contract)
  - libavcodec/v4l2_request_hevc.c (FFmpeg reference impl)
  - va/va_dec_hevc.h (VAAPI HEVC buffer structs)

Predicted iter2 close shape: similar pattern to iter1 (config
break + h265.c new-API rewrite + header delete + meson re-enable
+ picture.c reject removal). Larger code change than iter1
(predicting 250-400 lines for h265.c rewrite vs iter1's ~120 lines
for mpeg2.c). One novel construct (slice_params dynamic-array)
worth Phase 4 contract-clause-level attention. Expect Phase 6
takes longer than iter1; Phase 7 harness re-uses iter1's pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:44:30 +00:00