α-7 (monotonic timestamp counter) changed wire bytes but H.264 output
unchanged (71ac099b...). Confirms Phase 5 CRIT-1 prediction: VP9/MPEG-2
PASS via libva with the same v4l2_timeval_to_ns(&ref->timestamp)
pattern; therefore timestamp magnitude was never load-bearing.
5-codec regression sweep: all 4 non-H.264 anchors hold. Zero regression.
Cumulative state after iter8+iter9:
- 6 hypotheses eliminated (libva-readback, slot-binding, stale-residue,
constraint_set_flags, POC sentinel, reference_ts magnitude)
- libva-vs-kdirect H.264 wire-byte diff is now empirically zero
- α-2 + α-7 shipped as wire-payload hygiene cleanups (zero behavior
change but cleaner semantics)
iter10 candidate ranking:
1. α-8 OUTPUT bitstream byte dump (compare in-memory slice bytes)
2. α-9 untraced control diff (device-wide controls beyond DECODE_MODE
+ START_CODE)
3. Kernel-side investigation (rkvdec source dive for 16x32 partial-
decode signature)
4. Pivot to Bug 5 (HEVC) or Bug 6 (VP8)
Two more iterations of diminishing returns suggest either deeper
empirical work (OUTPUT-byte dump or kernel investigation) or pivot
to a different bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VP9 (vp9.c:624) and MPEG-2 (mpeg2.c:150,156) use v4l2_timeval_to_ns
identically to H.264. Both PASS via libva with the same gettimeofday-
based giant ns values. If timestamp magnitude were the bug, VP9/MPEG-2
should also fail. They don't.
Reviewer flagged α-7 as low-probability fix and pointed to iter10
kernel-side investigation (M-A vb2_find_buffer_by_timestamp overflow)
if α-7 confirmed inert.
IMP-1: timestamp_counter should live in object_context not driver_data
to avoid multi-context collisions.
Decision: implement α-7 anyway as empirical confirmation (5 min) since
test cost is trivial. If α-7 fails as predicted, iter9 closes PARTIAL
with wire-byte search exhausted; iter10 candidates pivot to slice-data
encoding or kernel investigation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 deep-strace yielded a critical narrowing:
- Post-DPB DECODE_PARAMS bytes (512-559): IDENTICAL libva vs kdirect
- PPS: IDENTICAL
- SPS: identical except inert constraint_set_flags
- DPB[0] beyond reference_ts: IDENTICAL after α-2
The ONLY remaining wire-byte diff between libva (broken) and kdirect
(working) is reference_ts magnitude. libva uses gettimeofday giving
~1.78e18 ns; kdirect uses an internal counter giving ~10000 ns.
α-7 hypothesis: V4L2 stateless decoder (rkvdec) reference-resolution
fails for very large reference_ts values. Possible mechanisms:
M-A: vb2_find_buffer_by_timestamp truncates/overflows on giant values.
M-B: V4L2 framework transforms OUTPUT QBUF ts before storing on CAPTURE
but DPB.reference_ts left untransformed → mismatch.
M-C: gettimeofday + v4l2_timeval_to_ns produce slightly different ns
values than the kernel computes from the timeval QBUF.
Fix: ~10 LOC. Add timestamp_counter to driver_data; replace
gettimeofday in EndPicture with monotonic counter.
If α-7 works → iter9 PASS, Bug 4 closed.
If α-7 doesn't → iter9 PARTIAL, wire-byte search space effectively
exhausted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sonnet-architect Phase 5b read rkvdec-h264.c end-to-end and confirmed:
constraint_set_flags is NEVER accessed by the driver. assemble_hw_pps()
reads only chroma_format_idc, bit_depth_*, log2_max_frame_num_minus4,
max_num_ref_frames, pic_order_cnt_type, log2_max_pic_order_cnt_lsb_minus4,
and dimension fields. rkvdec_h264_validate_sps() doesn't validate it.
CONSTRAINT_SET3_FLAG and PROFILE_IDC in the hardware PPS packet are
hardcoded constants (1 and 0xFF respectively), not propagated from the
incoming SPS.
α-1 will not unblock Bug 4. Plan-killer.
CRIT-2: ConstrainedBaseline 0x42 mapping is wrong (bit 6 reserved);
correct value 0x12 (bit 1 | bit 4) per H.264 §A.2.1.1.
IMP-1 redirects: DPB entry flags + POC fields are the next candidate.
rkvdec config_registers() reads dpb[i].flags ACTIVE/FIELD bits and
dpb[i].fields TOP/BOT bits. lookup_ref_buf_idx() substitutes destination
buffer as reference when ACTIVE missing — silent corruption matching
observed symptom.
IMP-2/3: full PPS byte comparison + close-criteria framing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
γ dump confirms libva reads buffer correctly; the 16x32 patch and
stride-4 UV markers appear at YUV output exactly as in the dump.
IMP-1 memset-before-QBUF test: pre-zeroing buffer does NOT change output
(identical hash). The 512 bytes ARE deterministic kernel writes, not
stale residue.
Bug root cause: rkvdec accepts libva's H.264 decode request without
error flags but writes only 16x32 of luma-neutral data + stride-4 UV
scratch. The kernel decoded only that small region and then stopped.
Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's
0x02 — likely the kernel hint that triggers rkvdec's full decode path
for Main profile. Phase 4b α-1 fix: derive constraint_set_flags per
VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CRIT-1: request_log prepends prefix on every call; per-byte loop in γ
sketch would emit 32 prefix-only lines. Fix: snprintf buffered emit.
CRIT-2: γ dump block missing null guard on destination_data[]; the
plan's env-var check is outside the current_slot != NULL guard. Fix:
nest the dump inside the existing slot-null guard.
IMP-1: "stale residue from prior decode" not eliminated as alternative
explanation for the 16x32 patch. Add memset-zero-before-QBUF experiment
to Phase 7 to discriminate.
IMP-2: γ-first defensible but on IMP-1 grounds, not the
three-signature argument (which is weaker than stated).
IMP-3/4 placement clarifications. MIN-1/2/3 cosmetic.
5 mechanical amendments locked for Phase 6. γ-first strategy stands.
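
The CRIT-1 fix (buffer the hex bytes, emit once) could look roughly like this. `format_hex_line` is a hypothetical helper, not the fork's code; the caller would pass the finished string to the logger in a single call so the prefix appears once per line:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format `len` bytes as "xx xx ..." into `out`. Stops cleanly when the
 * buffer is full. Returns the number of characters written (without NUL).
 */
static size_t format_hex_line(char *out, size_t out_size,
			      const unsigned char *data, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(out != NULL && out_size > 0);
	out[0] = '\0';
	for (i = 0; i < len; i++) {
		int n = snprintf(out + used, out_size - used, "%s%02x",
				 i ? " " : "", data[i]);

		if (n < 0 || (size_t)n >= out_size - used) {
			out[used] = '\0'; /* drop the partially written byte */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

A 32-byte dump needs 32 * 3 bytes of buffer (two hex digits plus a separator or the final NUL per byte).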
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 redefined Bug 4 to partial-fill (not inter race). Three distinct
per-codec signatures (VP9 correct, HEVC zero, H.264 partial-leak) can't
be explained by a single hypothesis. Phase 4 commits to γ first: a
~30 LOC env-gated diagnostic dump in RequestSyncSurface that fires
after CAPTURE DQBUF, prints first/last 32 bytes of each destination_data
plane and a non-zero-count of the first 1024 bytes.
γ definitively distinguishes "kernel didn't write" from "libva mis-reads"
from "slot binding wrong". Phase 4b targeted fix follows γ's outcome.
Out of scope: per-codec H.264 control-fill changes (gated on γ's
findings), VP9/VP8/HEVC/MPEG-2 paths, kernel patches.
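
The non-zero-count part of the γ dump can be sketched as below. `count_nonzero` is a hypothetical helper name; the 1024-byte window comes from the plan above and is passed in as `limit`:

```c
#include <assert.h>
#include <stddef.h>

/* Count non-zero bytes in the first min(len, limit) bytes of a plane. */
static size_t count_nonzero(const unsigned char *data, size_t len, size_t limit)
{
	size_t n = len < limit ? len : limit;
	size_t count = 0;
	size_t i;

	assert(data != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (data[i] != 0)
			count++;
	}
	return count;
}
```

A count of zero after CAPTURE DQBUF points at "kernel didn't write"; a non-zero count that disagrees with the YUV output points at the readback or slot-binding side.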
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 strace + byte-level analysis on fresnel rkvdec. Findings:
1. Bug 4 is NOT inter-race-loss. The IDR keyframe itself fails through
libva (only 512 bytes of real Y data at top-left 16x32 region).
2. The 16x32 leak is structured real image content (smooth gradients,
neutral luma ~0x80) — kernel decoded one tile / one MB pair, then
stopped.
3. VP9 via libva WORKS through the same readback path (100% non-zero,
real image data). So the bug isn't generic DMA-BUF cache coherency.
4. HEVC fails via libva (all-zero, distinct from H.264 partial-fill).
5. OUTPUT sizeimage = 1MB (SOURCE_SIZE_MAX) is sufficient — BBB IDR is
only 6321 bytes. Not the bug.
6. Control payload diffs: SPS.constraint_set_flags = 0 vs kdirect's 2
(probably cosmetic); DECODE_PARAMS.dpb[0].bottom_field_order_cnt = 0
vs kdirect's 1 (load-bearing for POC).
Refined hypothesis: a specific H.264 control field libva sends causes
rkvdec to abort after partial decode. Phase 4 candidates: α fix POC
fields, β bump OUTPUT sizeimage, γ instrumentation dump, δ relative
timestamps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User pick at iter8 open. Carried unchanged through 5 iters (iter4..iter7);
keyframe partially decodes (frame-1 first 16 bytes = real chroma) while
inter frames return all-zero. Pass criterion: libva_h264 == kdirect_h264
== sw_h264 byte-identical for bbb_1080p30_h264.mp4 3-frame, including
inter frames.
In scope: src/h264.c, src/h264_slice_header.c, src/picture.c H.264 paths,
per-frame request_fd lifecycle. Out of scope: VP9/VP8/HEVC/MPEG-2, kernel
patches, performance, all other backlog items.
Substrate at iter8 open: fork tip 6df2159 (iter7), backend SHA 520507f6..,
kernel linux-fresnel-fourier 7.0-1, auto-detect picks rkvdec on every boot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 5 sonnet-architect found:
- CRIT-1: interface links connect IO entities (source/sink) to interfaces,
NOT directly to proc entity. Walk must use MEDIA_LNK_FL_INTERFACE_LINK
(1U<<28) to discriminate. Author verified at media.h:223-225.
- CRIT-2: source_id/sink_id ordering not guaranteed in link entries;
check both endpoints. Author verified media_v2_link struct at media.h:341-347.
- IMP-1: hantro decoder-proc (entity 17) distinct from encoder-proc
(entity 3) by function field. Algorithm correct by construction —
no encoder contamination possible.
- IMP-2: MEDIA_ENT_F_PROC_VIDEO_DECODER set on both rkvdec-proc
(rkvdec.c:1382) and hantro-dec-proc (hantro_drv.c).
- IMP-3: current 3-call ioctl pattern has spurious memset; new function
uses 2-call pattern (alloc all 3 arrays before second call).
- IMP-4/MIN-1/2/3: minor implementation notes.
All 5 substantive findings empirically verified against boltzmann's
linux-rockchip tree.
Phase 6 implementer pseudocode provided: walk entities → find decoder
proc → walk data links to collect IO entity neighbors → walk
interface links to find linked interface → resolve major:minor.
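
A sketch of the link walk described above, assuming the links array has already been fetched. `demo_link` is a local stand-in for the media_v2_link fields the walk needs, and the two flag macros mirror MEDIA_LNK_FL_LINK_TYPE and MEDIA_LNK_FL_INTERFACE_LINK; function names are hypothetical. Returning 0 for "not found" assumes valid graph object IDs are non-zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the struct media_v2_link fields used by the walk. */
struct demo_link {
	uint32_t id;
	uint32_t source_id;
	uint32_t sink_id;
	uint32_t flags;
};

#define DEMO_LNK_FL_LINK_TYPE      (0xfU << 28)
#define DEMO_LNK_FL_INTERFACE_LINK (1U << 28)

static int is_interface_link(const struct demo_link *l)
{
	return (l->flags & DEMO_LNK_FL_LINK_TYPE) == DEMO_LNK_FL_INTERFACE_LINK;
}

/* Endpoint ordering is not guaranteed, so test both ends (CRIT-2). */
static uint32_t link_peer(const struct demo_link *l, uint32_t id)
{
	if (l->source_id == id)
		return l->sink_id;
	if (l->sink_id == id)
		return l->source_id;
	return 0;
}

/*
 * Data links give the proc entity's IO neighbours; interface links on
 * those neighbours give the interface (CRIT-1). Returns 0 if none found.
 */
static uint32_t find_interface_for_proc(const struct demo_link *links,
					size_t count, uint32_t proc_id)
{
	size_t i, j;

	assert(links != NULL || count == 0);
	for (i = 0; i < count; i++) {
		uint32_t io_id;

		if (is_interface_link(&links[i]))
			continue;
		io_id = link_peer(&links[i], proc_id);
		if (io_id == 0)
			continue;
		for (j = 0; j < count; j++) {
			uint32_t intf_id;

			if (!is_interface_link(&links[j]))
				continue;
			intf_id = link_peer(&links[j], io_id);
			if (intf_id != 0)
				return intf_id;
		}
	}
	return 0;
}
```

The returned interface ID would then be resolved to major:minor through the interfaces array of the same topology query.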
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 source-read found iter4-B1 conflates two sub-bugs:
- B1a: walk picks encoder when it should pick decoder. SMALL FIX
(~100-150 LOC). Add MEDIA_ENT_F_PROC_VIDEO_DECODER entity check
in find_video_node_via_topology; two-pass prefer rkvdec.
- B1b: multi-decoder routing (rkvdec for H.264/HEVC/VP9 + hantro
for MPEG-2/VP8 from one backend instance). Bigger arch fix
~200-400 LOC. DEFERRED.
iter7 ships B1a. Phase 1 criteria amended:
- Auto-detect always picks a decoder, never an encoder.
- Prefer rkvdec over hantro (rkvdec serves 3 of 5 codecs).
- 2 reboots verify stability.
- vainfo lists rkvdec's 3 codecs minimum.
- No regression on iter5b-β / iter6 state.
Phase 6 will use MEDIA_IOC_G_TOPOLOGY's entities+links arrays to
match V4L node entities to decoder-proc entities. Two-pass walk:
pass-1 rkvdec only, pass-2 any decoder.
Empirical baseline: on 2026-05-12 boot, /dev/media0=rkvdec (only
decoder), /dev/media1=hantro-vpu (encoder AND decoder both inside),
/dev/media2=uvc. Fix must skip encoder when accepting media1.
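
The two-pass preference can be sketched as follows, assuming each media device has already been reduced to a driver name plus a "has decoder proc entity" flag. `demo_media_dev` and `pick_decoder_device` are hypothetical, and matching on the literal string "rkvdec" is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_media_dev {
	const char *driver;   /* driver name reported for the media device */
	int has_decoder_proc; /* non-zero if a decoder proc entity exists */
};

/* Pass 1: rkvdec with a decoder. Pass 2: any decoder. -1 if none. */
static int pick_decoder_device(const struct demo_media_dev *devs, size_t count)
{
	size_t i;

	assert(devs != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (devs[i].has_decoder_proc &&
		    strcmp(devs[i].driver, "rkvdec") == 0)
			return (int)i;
	}
	for (i = 0; i < count; i++) {
		if (devs[i].has_decoder_proc)
			return (int)i;
	}
	return -1;
}
```

Because the result does not depend on enumeration order, a reboot that swaps /dev/media0 and /dev/media1 yields the same pick.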
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend-only ~30-80 LOC. Walk media-topology entities (already partially
done at iter4 Commit Z); require at least one entity with function ==
MEDIA_ENT_F_PROC_VIDEO_DECODER. Eliminates the hantro encoder false-match
that breaks vainfo + ffmpeg-vaapi on every other reboot.
5 boolean Phase 1 criteria locked. No kernel work. No pixel-correctness
chasing. Quality-of-life delivery; removes per-session env-override
friction.
Predicted lowest-difficulty iteration since iter1. 2-3 hours wallclock.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 Candidate K executed: H-D (slot rotation) ELIMINATED via
instrumented bind+read site logging. Slot v4l2_index matches at
BeginPicture and at vaGetImage for every surface; destination_data[0]
matches slot->map[0]. No rotation mismatch.
H-A/B/C/D all eliminated. H-E (kernel-side hantro VP8 partial-write)
confirmed by elimination. The libva backend submits correct controls,
correct slice bytes, correct slices_size, correct slot indices.
Kernel writes erratic partial content (per-frame Y plane transitions
at row 536, 24, ... — not a clean buffer-size truncation, not slot
rotation).
iter6 close PARTIAL: 5 of 6 Phase 1 criteria PASS; criterion 1
(libva_vp8 == kdirect) PARTIAL — kernel-side fix needed, out of
iter6's locked backend-only scope.
No patches landed. Fresnel substrate unchanged: fork tip 70196f8,
backend SHA 2c6ff82c... (identical to iter5b-β close).
Net deliverable: Phase 3 narrowing reduces Bug-6 hypothesis space
from 5 to 1. Future iter7+ (or kernel-agent campaign) picks up the
kernel-side investigation.
Pattern recognized: iter2 HEVC transitive PASS masked Bug 5;
iter3 VP8 transitive PASS masked Bug 6. Both surfaced under direct
verification post-iter5b-β. Transitive proofs against ONE artifact
(control payload) don't catch bugs in OTHER artifacts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical Phase 3 narrowing:
- H-A slice data corruption: ELIMINATED. SHA256 of libva-dumped slice 0
(300614 bytes) byte-identical to raw VP8 frame 0 from .webm at
offset 10..300624 (post-VP8-header).
- H-B slices_size wrong: ELIMINATED. slices_size = fp_size +
sum(dct_part_sizes) = 300614 exactly.
- H-C cache coherency: ELIMINATED. msync attempt yielded no output
change; VP9 uses same image.c path and works fine.
- Control payloads: byte-identical between libva and kdirect for VP8
keyframe (pre-Phase-2 finding).
Output pattern: erratic partial-write. Frame 0 Y plane has real
content rows 0-535, then 100% zero rows 536-719. UV plane real
rows 0-133, zero 134-359. Frame 1 Y plane real rows 0-23, zero
24-719. Per-frame transitions differ — not buffer-size truncation,
not slot rotation.
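
One way to locate the per-frame transition row is to scan the plane from the bottom for the last row containing any non-zero byte. This is an illustrative helper (`first_zero_tail_row` is a hypothetical name), not the tool used for the numbers above:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first row from which every remaining row is
 * all-zero. Returns `height` if the last row has content, 0 if the whole
 * plane is zero.
 */
static size_t first_zero_tail_row(const unsigned char *plane, size_t width,
				  size_t height, size_t stride)
{
	size_t row = height;

	assert(plane != NULL && width <= stride);
	while (row > 0) {
		const unsigned char *p = plane + (row - 1) * stride;
		size_t x;

		for (x = 0; x < width; x++) {
			if (p[x] != 0)
				return row; /* row - 1 is the last row with data */
		}
		row--;
	}
	return 0;
}
```

For the frame 0 Y plane described above this would report 536; for frame 1 it would report 24.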
Remaining:
- H-D slot rotation (untested; needs instrumentation)
- H-E kernel-side hantro VP8 partial-write quirk (likely; needs
ftrace / kernel investigation)
iter5b-β did fix Bug 2 for VP8 (pre-β all-zero was format mismatch;
post-β real-but-partial content is a separate kernel-side issue).
Phase 3 hands off 4 candidate directions to user:
- K: continue H-D investigation (1-2h next session)
- L: pivot to H-E kernel-side work (multi-session)
- M: park Bug 6, pick different bug (Bug 4/5 or iter4-B1)
- N: close iter6 PARTIAL, defer Bug 6 to iter7+
Substrate unchanged; no regression. Backend SHA still 2c6ff82c....
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical byte-diff of libva vs kdirect VP8 control payloads on
current substrate:
- Keyframe (payloads 0+1): BYTE-IDENTICAL (0 diffs / 1232 bytes)
- Inter frames: only 24 bytes diff at offset 1200-1223, which are
the 3 reference-frame timestamps. libva uses gettimeofday→ns
(large values), kdirect uses pts-derived (small). Both internally
consistent; kernel uses them as keys, absolute values don't matter.
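
The byte-diff summary above (count plus first and last differing offset) can be reproduced with a helper along these lines; `diff_span` and `diff_payloads` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

struct diff_span {
	size_t count; /* number of differing bytes */
	size_t first; /* offset of first difference (valid if count > 0) */
	size_t last;  /* offset of last difference (valid if count > 0) */
};

/* Compare two equally sized control payloads byte by byte. */
static struct diff_span diff_payloads(const unsigned char *a,
				      const unsigned char *b, size_t len)
{
	struct diff_span span = { 0, 0, 0 };
	size_t i;

	assert((a != NULL && b != NULL) || len == 0);
	for (i = 0; i < len; i++) {
		if (a[i] == b[i])
			continue;
		if (span.count == 0)
			span.first = i;
		span.last = i;
		span.count++;
	}
	return span;
}
```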
Verdict: Bug 6 is NOT in vp8.c control generation. The bytes match.
With identical controls and same hardware, libva produces 0.4% pixel
match for keyframe — bug lives in slice-data path, bytesused, cache
coherency, or CAPTURE slot rotation.
5 hypotheses (H-A..H-E) for Phase 3 to narrow:
- H-A slice data corruption in libva path (picture.c memcpy)
- H-B slices_size wrong on OUTPUT QBUF
- H-C cache coherency on OUTPUT mmap before kernel DMA read
- H-D CAPTURE slot rotation mismatch
- H-E other (deeper kernel-side)
Pre-iter5b masked all of these via the OUTPUT format mismatch
producing all-zero output. β fixed format → kernel actually decodes →
underlying bug now visible.
iter3's transitive proof verified specific control fields. Did not
verify slice data, bytesused, cache state, or slot rotation. Same
pattern as iter2's HEVC transitive PASS missing Bug 5. Future
transitive PASS claims must enumerate non-verified artifacts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two acts:
Act 1 (β alone): all 5 libva codecs returned all-zero. MPEG-2 was a
regression (pre-β it worked); HEVC was unchanged (kernel returns
DQBUF FLAG_ERROR pre AND post β — same Phase 3 baseline showed it).
Root cause: ffmpeg-vaapi-copy passes surfaces_count=0 to vaCreateContext
per iter6 context.c:262 comment; my β walk of surfaces_ids[] was a
no-op → destination_planes_count stayed 0 → surface_bind_slot no-op
→ all-zero readback.
Act 2 (Commit D): cache format-uniform CAPTURE geometry in driver_data;
walk surface_heap in CreateContext; lazy-fill in CreateSurfaces2 when
fmt_valid is set; invalidate in DestroyContext. Restores MPEG-2 to
pre-β state and unlocks VP9.
Per Phase 1 criteria: criterion 1 PARTIAL (VP9 of HEVC+VP9+VP8);
criteria 2-4 PASS.
Bug 5 (NEW): HEVC libva DQBUF FLAG_ERROR — pre-existing kernel
rejection; β's OUTPUT format fix didn't address it. Transitive proof
at iter2 verified control payload shape but kernel still rejects;
some other V4L2 protocol contract aspect differs from kdirect.
Bug 6 (NEW): VP8 libva produces non-zero output with real content
(74.8% zero + 256 unique bytes incl. keyframe pixels at `93 8e 8a 89...`)
but diverges from kdirect. Decode runs; output mismatch likely
slot-rotation or partial-fill bug.
VP9 is iter5b-β's only clean PASS. Architecture-wise β succeeded:
no α'-style failure mode possible (no in-CreateSurfaces2 destructive
teardown), and the CRIT-1+CRIT-2 fixes from Phase 5 v2 review held.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CRIT-1: context.c:64-66 video_format==NULL guard rejects every first
β CreateContext. β moves the probe from CreateSurfaces2 into
CreateContext itself, so the guard fires before any new logic runs.
Fix: remove guard, move CAPTURE probe to top of CreateContext.
CRIT-2: DestroyContext lacks request_pool_destroy. Empirical grep
shows only surface.c:220 (which β strips) calls it per-session.
Without amendment, second CreateContext gets pool->initialized=true
with stale slot pointers → QBUF EINVAL. Fix: add request_pool_destroy
to DestroyContext before REQBUFS(0). C3 (surface.c strip) and CRIT-2
fix MUST land together.
Plus IMP-1 (mplane assumption wrong for SUNXI_TILED_NV12) + IMP-2
(surface_reset_format_cache becomes dead under C7) + IMP-3 (error
recovery comment).
Phase 6 BLOCKED pending CRIT-1 + CRIT-2 fixes. Author confirmed
both at code level — Phase 5 caught what Phase 4 v2's surface read
missed ("DestroyContext teardown — no change needed" — wrong; was
incomplete).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Supersedes phase4_iter5b_plan.md (the α' plan rejected at Phase 7).
β architecture: strip OUTPUT-side V4L2 device state from
RequestCreateSurfaces2 entirely; move it to RequestCreateContext
where config_id (and therefore the bound profile) is unambiguously
known. CreateSurfaces2 becomes ID-allocation + per-surface
bookkeeping only.
9 contract clauses (C1..C9). Reuses 2 of 3 reverted iter5b commits
(codec.h/codec.c helper; object_config->pixelformat wire-up at
CreateConfig). New work: C3 strip surface.c, C4 build out
context.c — predicted ~120 LOC into context.c, ~190 LOC stripped
from surface.c (net ~70 LOC delta).
Risk register: 7 items; highest is multi-context resolution change
within shared driver_data (medium impact, mitigated by existing
DestroyContext teardown). The destructive-teardown failure mode of α'
disappears because β has no in-CreateSurfaces2 teardown branch.
Phase 5 review focus: error-recovery branches in CreateContext,
per-surface destination_* fill semantics (format-uniform fields
at CreateContext vs per-slot fields at BeginPicture), ohm
backwards-compat verification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical sweep on iter5b backend (SHA d7722da...) crashed in
copy_surface_to_image during HEVC libva-vaapi-hwdownload. Coredump
backtrace shows memcpy on stale surface_object->destination_data[i]
pointer — cap_pool_destroy ran during my pixfmt-change teardown
branch, but the subsequent S_FMT got EBUSY because the OUTPUT
queue was already streaming. State corruption mid-decode.
Root cause: ffmpeg-vaapi calls vaCreateSurfaces2 *twice*, with
CreateContext+STREAMON between them. My CreateSurfaces2 gate
destructively tears down cap_pool on pixelformat change but can't
recover when REQBUFS(0) silently fails on a streaming queue.
surface.c:164-171 TODO comment from iter1 anticipated exactly this:
"STREAMOFF + REQBUFS(0) + new S_FMT + new CREATE_BUFS — that's a
context-level redesign for the next iteration." Phase 4 dismissed
the comment as targeting multi-resolution mid-stream. That
dismissal was wrong; ffmpeg-vaapi triggers the same code path.
3 reverts on fork master: 4b2288f, f8256e6, ce304ef reverted by
709ab34, 9a7f888, 6bc29ec. Backend rebuilt + reinstalled on fresnel
at iter4-tip SHA 6e90b7a9.... Post-revert HEVC libva returns the
pre-iter5b broken-but-non-crashing all-zero pattern.
Per Phase 1 lock: criterion 1 FAIL (HEVC/VP9/VP8 still all-zero);
criteria 2-4 PASS (no regression on MPEG-2/H.264 keyframe/control
payloads). iter5b does not close.
Phase 7 → Phase 4 loopback: re-plan as option β (defer OUTPUT-side
S_FMT+CREATE_BUFS to CreateContext where config_id is known and
streams haven't started). User pick: revert + re-plan with β.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sonnet-architect found one Critical pseudocode error and three
Important amendments. All mechanical; no structural plan change.
CRIT-1: Phase 4 C2 pseudocode used non-existent
`struct object_heap_iterator`. Actual API at object_heap.h:67-68 uses
`int *iterator`. Author re-verified vs request.c:411-418 canonical
usage. Verbatim paste would have compile-failed.
IMP-1: gate comment at surface.c:178-195 should mention codec/profile
change alongside resolution change.
IMP-2: dead `object_config->pixelformat` field at config.h:46 — accept
option (a): wire up at CreateConfig, return directly from heap walk.
Saves one pixelformat_for_profile() call in surface.c path.
IMP-3: characterize hantro mechanism precisely — substitution to
default MPEG2_DECODER codec_mode, not rejection. Explains why MPEG-2
worked but VP8 didn't pre-fix.
10 contract clauses scorecard: 1 FAIL (C2), 2 CONDITIONAL (C3, C10),
7 PASS. Phase 6 cleared conditionally pending all 4 amendments.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks α' over the Phase 2 recommendation of β: smaller scope (~50 LOC
vs ~250), targets iter5b's actual bug (wrong OUTPUT format at INITIAL
CreateSurfaces2, not the multi-resolution mid-stream case the
surface.c:164-171 TODO comment anticipates).
Patches:
- C1/C6: NEW src/codec.{h,c} + meson.build — pixelformat_for_profile()
- C2: NEW find_sole_active_profile() static helper in surface.c
- C3: Replace surface.c:173 hardcode with profile-derived lookup
- C5: Extend last_output_* gate with pixelformat
Phase 7 expected post-fix matrix: HEVC + VP9 + VP8 libva == kdirect
== sw (3 codecs unblocked); MPEG-2 unchanged (already worked);
H.264 still race-loses inter frames (Bug 4, deferred to iter6).
Phase 5 review concerns laid out: helper completeness, heap iterator
API, gate semantics, hantro CAPTURE-derivation on correct format,
mpv probe-then-real flow, memory rule placement.
Option β deferral note: cleaner refactor exists but not necessary
for iter5b's bug; defer to future iteration when multi-resolution
mid-stream becomes a target.
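
A rough sketch of what a profile-to-pixelformat helper could look like. The enum is a stand-in for the VAProfile values the real helper would switch on, and the FOURCC values are written out locally; they are believed to match the V4L2_PIX_FMT_*_SLICE / *_FRAME definitions but should be checked against videodev2.h:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Stand-in for the VAProfile families handled by the backend. */
enum demo_profile {
	DEMO_PROFILE_H264,
	DEMO_PROFILE_HEVC,
	DEMO_PROFILE_VP8,
	DEMO_PROFILE_VP9,
	DEMO_PROFILE_MPEG2,
	DEMO_PROFILE_UNKNOWN
};

/* Map a profile family to the OUTPUT-queue pixelformat; 0 if unknown. */
static uint32_t demo_pixelformat_for_profile(enum demo_profile profile)
{
	switch (profile) {
	case DEMO_PROFILE_H264:
		return DEMO_FOURCC('S', '2', '6', '4');
	case DEMO_PROFILE_HEVC:
		return DEMO_FOURCC('S', '2', '6', '5');
	case DEMO_PROFILE_VP8:
		return DEMO_FOURCC('V', 'P', '8', 'F');
	case DEMO_PROFILE_VP9:
		return DEMO_FOURCC('V', 'P', '9', 'F');
	case DEMO_PROFILE_MPEG2:
		return DEMO_FOURCC('M', 'G', '2', 'S');
	default:
		return 0;
	}
}
```

Returning 0 for an unknown profile lets the caller fail the S_FMT path explicitly instead of falling back to a hardcoded H.264 format.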
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VA-API lifecycle traced: CreateConfig stores profile in object_config;
CreateSurfaces2 has NO config_id, can't access profile; CreateContext
takes VAConfigID and already does profile-switch for h264_start_code
(context.c:205-217, iter4 fix-forward 692eaa0).
surface.c:164-171 already flags this as deferred-work in a TODO comment:
"that's a context-level redesign for the next iteration." iter5b picks
up that deferred work.
Three options analyzed empirically:
- α: thread current_profile through driver_data (15 LOC, fragile semantic)
- β: move OUTPUT-side lifecycle to CreateContext (80-150 LOC, clean)
- γ: lazy at BeginPicture (architecturally wrong site)
Recommendation: option β. iter4 reviewer accepted the deferred-work
flag in surface.c; iter5b is the iteration that addresses it.
object_config->pixelformat field at config.h:46 is declared but never
assigned — opportunity for wiring up cleanly via the profile→pixelformat
map.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical strace of all 5 codecs through libva shows VIDIOC_S_FMT on
OUTPUT_MPLANE ships pixelformat V4L2_PIX_FMT_H264_SLICE for EVERY
profile. HEVC controls submitted on H264_SLICE OUTPUT → kernel rkvdec
silently rejects/no-ops → CAPTURE stays in cap_pool init (all-zero).
Per-codec Bug 2 taxonomy:
- HEVC, VP9, VP8: OUTPUT format mismatch on rkvdec/hantro-strict → 100% zero
- MPEG-2: format mismatch but hantro tolerates → works
- H.264: format right by coincidence; keyframe decodes, inter all-zero
(Bug 4, separate, deferred from iter5b)
Site: src/surface.c:173 `unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE`.
Same bug class as feedback_unconditional_codec_state.md
(iter4 h264_start_code = true).
iter5b new Phase 1: fix surface.c to switch pixelformat on
config_object->profile. 4 criteria locked, all backend-side, no kernel
patches. RFC v2 series filed back to backlog for a future
DMABUF-import-consumer campaign.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sonnet-architect review found that the RFC v2 fix mechanism does not
reach the libva backend's consumer path:
- Backend uses V4L2_MEMORY_MMAP for both OUTPUT + CAPTURE buffers.
- For MMAP buffers, vb->planes[].dbuf stays NULL.
- RFC v2 helper's plane loop skips planes with !dbuf, fence attached
to no dma_resv.
- EXPBUF (vb2_dc_get_dmabuf) creates a fresh disjoint dma_resv.
- The fence-mechanism fix would be a no-op for the cap_pool path even
if it did reach the right resv, because RequestSyncSurface already
blocks on media_request_wait_completion + v4l2_dequeue_buffer.
Three alternative root-cause hypotheses for Phase 0/3 to disambiguate:
cache coherency, cap_pool slot-rotation bug, or a separate-sync gap
in vaDeriveImage/vaMapBuffer that bypasses RequestSyncSurface.
Phase 5 saved ~half a session of build-install-test wallclock that
would have ended in a Phase 7 → Phase 0 loopback anyway.
Three Important + 2 Minor findings also recorded for when iter5 reopens.
User pick: loop back to Phase 0/3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Source-read complete: 3 RFC v2 patches dissected, v7.0 rkvdec_buf_queue
site identified at line 954 of drivers/media/platform/rockchip/rkvdec/rkvdec.c,
empirical disproof of Bug 3 UAPI drift via byte-identical v6.12↔v7.0 struct
diff, hantro_v4l2.c confirmed unchanged across the same range.
Rebase risk concentrated in videobuf2-core.c (medium — vb2 core sees regular
activity); deferred to Phase 4 when boltzmann is reachable for the
git apply --3way verification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 source-read mid-execution found that v4l2_ctrl_mpeg2_*
and v4l2_ctrl_vp8_frame are byte-identical v6.12 ↔ v7.0 mainline.
On-fresnel re-trace with correct hantro-decoder bind shows MPEG-2
controls submit with return value 0; the "Unable to set control(s)" log noise
is the backend's H.264/HEVC init-probe EINVAL on a non-H.264 device
(B4 backlog), not a UAPI drift.
iter5 locked scope is now vb2_dma_resv (4 patches: 3 existing
operator-authored RFC v2 + new rkvdec consumer). Criteria reduced
from 5 to 4. B4 stays in backlog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five Phase 1 criteria: Bug 2 closed (cap_pool readback returns real
pixels through libva); Bug 3 closed (hantro MPEG-2 + VP8 controls
accepted on new kernel); patches ship from kernel-agent (local-carry
acceptable, mainline bonus); zero codec-contract regression vs iter4;
5/5 direct-verification block restored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mid-Phase-7 fix-forward landed on fork
(marfrit/libva-multiplanar:692eaa0): unconditional
context_object->h264_start_code = true was prepending 0x00 0x00 0x01
to VP9 slice data, shifting the rkvdec bitstream by 24 bits and
producing silent decode failure. Now gated on
config_object->profile (H.264 + HEVC only).
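
The gating can be illustrated as below. `demo_codec`, `codec_wants_start_code` and `append_slice` are hypothetical stand-ins for the profile check and the slice copy, not the fork's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_codec { DEMO_H264, DEMO_HEVC, DEMO_VP8, DEMO_VP9, DEMO_MPEG2 };

/* Only H.264 and HEVC slice data gets the 3-byte start code. */
static int codec_wants_start_code(enum demo_codec codec)
{
	return codec == DEMO_H264 || codec == DEMO_HEVC;
}

/*
 * Copy one slice into the OUTPUT buffer, prepending 00 00 01 only when the
 * codec wants it. Returns bytes written, or 0 if the slice does not fit.
 */
static size_t append_slice(unsigned char *dst, size_t dst_size,
			   const unsigned char *slice, size_t len,
			   enum demo_codec codec)
{
	static const unsigned char start_code[3] = { 0x00, 0x00, 0x01 };
	size_t prefix = codec_wants_start_code(codec) ? sizeof(start_code) : 0;

	assert(dst != NULL && slice != NULL);
	if (len > dst_size || prefix > dst_size - len)
		return 0;
	memcpy(dst, start_code, prefix);
	memcpy(dst + prefix, slice, len);
	return prefix + len;
}
```

With the unconditional flag, the VP9 case would have returned len + 3, which is the 24-bit shift described above.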
Empirical verification when fresnel was online: post-fix VP9 keyframe
FRAME control bytes 0-23 byte-match Phase 3 anchor:
lf.flags=0x03 (DELTA_ENABLED|DELTA_UPDATE) — was 0x01
base_q_idx=0x2e=46 — was 0x41=65
This is the transitive-proof leg-1 (backend-payload == kernel-direct-payload)
for the iter4 keyframe.
Open verification when fresnel returns:
- Full 168-byte FRAME control diff mine vs Phase 3 anchor
- Full 2040-byte COMPRESSED_HDR control diff
- ffmpeg-v4l2request kernel-direct VP9 decode + hwdownload pixels =
Phase 3 SW reference (transitive-proof leg-2)
If both legs PASS, iter4 closes 5/5 (4 direct from earlier iters
+ 1 transitive iter4) per Option-A choice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review by sonnet-architect with cold-context source reads of fork +
kernel UAPI + VAAPI + FFmpeg references + kernel rkvdec source.
Reviewer applied Direction 2 (empirical-over-theoretical) by
test-compiling struct sizes, gcc-c-checking VAAPI field accesses,
and source-tracing FFmpeg's filter-mode XOR provenance.
Critical findings (all empirically validated by author before
incorporation per feedback_review_empirical_over_theoretical.md):
C1 - interpolation_filter double-XOR: vaapi_vp9.c:62 ALREADY applies
`filtermode ^ (filtermode <= 1)` when filling VAAPI's
mcomp_filter_type. Plan's second XOR was incorrect; would swap
EIGHTTAP and EIGHTTAP_SMOOTH for inter frames -> wrong
motion-compensation interpolation. Fix: direct assignment, no XOR.
C2 - LF deltas not persistent: kernel UAPI explicitly says
"users should pass its last value" when delta_update=0. Plan
memset-zeroed each frame; would send {0,0,0,0,0,0} on BBB inter
frames instead of {1,0,-1,-1,0,0}. Fix: add persistent vp9_lf
state to object_context, init to VP9 spec defaults, update only
when parser sees delta_update=1, always copy to kernel control.
C3 - reference_mode out-parameter missing: reference_mode lives in
FRAME struct, not COMPRESSED_HDR. Plan referenced
`compressed_hdr_reference_mode` placeholder which would be an
undefined identifier -> compile failure. Fix: add
`uint8_t *out_reference_mode` param to vp9_fill_compressed_hdr;
derive `allowcompinter` at call site from the 3 sign biases.
C4 - Mitigation B scope claim overstated: walk-and-pick-first always
selects rkvdec on 7.0 (since video1 enumerates first). Hantro
codecs (MPEG-2, VP8) at video3 still require env override.
Fix: qualify criterion-5 trace; add LIBVA_V4L2_REQUEST_NO_
AUTODETECT=1 escape hatch for legacy callers.
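
The C2 fix (persistent loop-filter deltas) can be sketched like this. The struct and function names are hypothetical, and the update is simplified to replace both arrays at once; the real bitstream signals per-entry updates:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-context copy of the last signalled LF deltas. */
struct demo_vp9_lf {
	int8_t ref_deltas[4];
	int8_t mode_deltas[2];
};

/* Initialise to the VP9 defaults: ref {1,0,-1,-1}, mode {0,0}. */
static void demo_vp9_lf_init(struct demo_vp9_lf *lf)
{
	static const int8_t default_ref[4] = { 1, 0, -1, -1 };

	assert(lf != NULL);
	memcpy(lf->ref_deltas, default_ref, sizeof(default_ref));
	memset(lf->mode_deltas, 0, sizeof(lf->mode_deltas));
}

/* Overwrite the stored deltas only when the frame signals an update. */
static void demo_vp9_lf_apply(struct demo_vp9_lf *lf, int delta_update,
			      const int8_t ref[4], const int8_t mode[2])
{
	assert(lf != NULL);
	if (!delta_update)
		return; /* keep the last values, as the UAPI asks */
	memcpy(lf->ref_deltas, ref, sizeof(lf->ref_deltas));
	memcpy(lf->mode_deltas, mode, sizeof(lf->mode_deltas));
}
```

The stored values are what gets copied into the kernel control on every frame, whether or not the frame carried an update.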
6 Suggested (S1-S6): all confirm plan correctness OR are scope-
aligned non-issues. S4 (uv_mode memcpy omission safe for rkvdec)
baked into Clause 9 amended text.
Without this review, iter4 Phase 6 would have failed first compile
(C3) + produced wrong inter-frame output (C1+C2) + caused user
confusion (C4). Estimated saving: 1 compile failure + 1 Phase 7 ->
Phase 4 loopback + 1 doc correction.
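
Why the second XOR in C1 was harmful: the expression quoted from vaapi_vp9.c swaps values 0 and 1 and leaves the rest alone, so applying it twice undoes the first mapping. A small check, with hypothetical function names:

```c
#include <assert.h>

/* The remap quoted above: swaps 0 and 1, leaves other values unchanged. */
static unsigned remap_filter(unsigned filtermode)
{
	return filtermode ^ (unsigned)(filtermode <= 1);
}

/* Applying the remap twice is the identity, so a second XOR reverts it. */
static void remap_filter_self_check(void)
{
	unsigned m;

	for (m = 0; m < 4; m++)
		assert(remap_filter(remap_filter(m)) == m);
}
```

Since the value arriving in mcomp_filter_type has already been remapped once, the backend passes it through unchanged.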
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captured on linux-fresnel-fourier 7.0-1 (post 6.19 decommission).
VP9 baseline (kernel-direct via ffmpeg-v4l2request on rkvdec):
- 5-frame SW reference PNG SHA256 anchors (criterion-4)
- VIDIOC_S_EXT_CTRLS strace with full payload at -s 16384
- Empirical struct sizes 168 B (FRAME) / 2040 B (COMPRESSED_HDR)
supersede Phase 2 estimates of 144 / 1947
- Probe pattern: count=1 (FRAME-only) then count=2 (FRAME + COMPRESSED_HDR)
Phase 2 doc fix: control IDs corrected 0xa40b2c/d -> 0xa40a2c/d.
4-codec regression (H.264, MPEG-2, HEVC, VP8): all fall back to SW on
default config because /dev/video0 is now rockchip-rga (RGB color
converter), not a codec device. Fork hardcodes /dev/video0 in
request.c:149. Env override LIBVA_V4L2_REQUEST_VIDEO_PATH /
_MEDIA_PATH restores per-driver profile enumeration; mitigation A/B/C
queued for user decision.
New contract clauses surfaced:
- Clause 11: uncompressed-header partial parse for lf_delta /
base_q_idx (VAAPI doesn't expose these; keyframe ref_deltas non-zero
for BBB so leave-at-zero is wrong)
- Clause 12: compile-time sizeof asserts on the two control structs
so future UAPI shifts fail loudly
iter4_phase3.tgz: full Phase 3 artifact bundle (strace + PNG refs).
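
Clause 12 in sketch form. The two structs are placeholders sized to the empirical values above so the example builds without kernel headers; real code would assert on the UAPI VP9 frame and compressed-header control structs instead:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VP9_FRAME_SIZE          168
#define DEMO_VP9_COMPRESSED_HDR_SIZE 2040

/* Placeholders standing in for the kernel UAPI control structs. */
struct demo_vp9_frame {
	uint8_t raw[DEMO_VP9_FRAME_SIZE];
};

struct demo_vp9_compressed_hdr {
	uint8_t raw[DEMO_VP9_COMPRESSED_HDR_SIZE];
};

/* Fail the build, not the decode, if a UAPI update changes the layout. */
_Static_assert(sizeof(struct demo_vp9_frame) == DEMO_VP9_FRAME_SIZE,
	       "VP9 FRAME control size changed");
_Static_assert(sizeof(struct demo_vp9_compressed_hdr) ==
	       DEMO_VP9_COMPRESSED_HDR_SIZE,
	       "VP9 COMPRESSED_HDR control size changed");
```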
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>