Commit Graph

25 Commits

Author SHA1 Message Date
marfrit e46b2ed2d6 Iteration 5 Phase 1 lock — A + G + B + E
Heavyweight four-track iteration:
- Track A: DEBUG instrumentation sweep (carried four iterations)
- Track G: PGO-disabled Firefox-fourier rebuild
- Track B: mpv libplacebo --vo=gpu segfault investigation
- Track E: multi-context libva safety (Sonnet review 9.6)

Natural sequence: A first (clean codebase), G in parallel on boltzmann
(~2h rebuild offloaded), E next (architectural on clean source), B last
(consumer-side investigation on post-A+E driver).

Phase 4 will subdivide into 4A/4G/4E/4B sub-phases. Phase 5 sonnet
review + Phase 7 verify span all four tracks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:42:49 +00:00
marfrit f1aec7bdeb Iteration 5 Phase 0: substrate doc with 8 candidate questions
DEBUG sweep (A) is the carried-four-iterations backlog and natural
prerequisite for upstreaming. mpv libplacebo segfault (B) and perf
binding cell (C) are also long-deferred carryovers. New candidates
this iteration: PGO-disabled Firefox rebuild (G), and the natural
codec/hardware extensions (H).

Recommended primary: A + F (sweep + upstream prep) — with Track A
fixed in iter4, the fork is upstreamable in shape and just needs the
diagnostic noise removed. F is gated on explicit operator instruction
per feedback_no_upstream.md.

Phase 1 lock awaits user candidate pick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:32:59 +00:00
marfrit 67494ae7ee Iteration 4 close — Track A locked, three-iteration carryover resolved
The iter1+iter2+iter3 frame-11 EINVAL is empirically eliminated. mpv
direct stress test on ohm via patched libva-v4l2-request-fourier:

  RequestBeginPicture:     2130
  RequestSyncSurface:      4254
  S_EXT_CTRLS EINVAL:      0
  Unable to set control(s): 0
  Generic EINVAL:          0
  ENETDOWN:                0

2130 frames at 24 fps = real-time HW decode (>98% of 2160-frame max
in 90 seconds wall time). Track A's Phase 1 success criterion crushed.
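
The real-time claim above is plain arithmetic and can be checked directly. A minimal sketch, assuming the 24 fps rate and the 90-second wall-time window quoted in this message; the function name is illustrative and not part of the fork.

```python
def decode_coverage(frames_decoded: int, fps: float, wall_seconds: float):
    """Return (stream_seconds, fraction_of_real_time_max) for a timed run."""
    max_frames = fps * wall_seconds        # frames a real-time decoder could emit
    stream_seconds = frames_decoded / fps  # stream-time actually covered
    return stream_seconds, frames_decoded / max_frames


stream_s, fraction = decode_coverage(2130, fps=24, wall_seconds=90)
print(f"{stream_s:.2f}s of stream-time, {fraction:.1%} of the real-time maximum")
```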

Three correctness fixes (three of the four Phase 4/5 fork commits; the
fourth, f21bdf0, is debug-only):
- 74d8dd1: DPB fields=V4L2_H264_FRAME_REF + skip stale entries
- 385dee1: fresh request_fd per frame (THE load-bearing fix)
- b81ce69: B-slice L1 reflist .fields copy-paste

Plus diagnostic instrumentation (a12d299, 4892656, f21bdf0) deferred
to iter5 sweep alongside earlier iter1/iter3 instrumentation.

Three new memory entries: kernel obfuscation extends to compound TRY,
request_fd lifecycle (fresh per frame), FFmpeg as empirical authority.
README iteration table updated.

Carries to iter5 substrate: DEBUG sweep, mpv libplacebo segfault,
multi-context libva safety, PGO Firefox rebuild, eventual upstream
prep (Mozilla bug + bootlin libva-v4l2-request).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:29:43 +00:00
marfrit ebbcda3d75 Iteration 4 Phase 5: sonnet review YELLOW → GREEN
Sonnet Phase 5 review found two caveats:
- C1: BeginPicture=18 didn't satisfy ≥720 criterion (PGO Firefox throttle)
- C2: pre-existing B-slice L1 reflist .fields copy-paste bug

Both resolved during Phase 5 close.
- C2: fork commit b81ce69 — ref_pic_list0[i].fields → ref_pic_list1[i].fields
- C1: mpv direct stress test on ohm — 2130 BeginPictures, 4254 SyncSurfaces,
       0 EINVAL of any kind, ~89s of bbb stream-time decoded clean. >98% of
       2160 max frames at 24 fps = real-time HW decode through libva.

Phase 5 closed GREEN. Phases 6+7 satisfied by the same mpv stress test
(deploy via fix-loop, verify via mpv counters).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:21:54 +00:00
marfrit 08a04c3791 Iteration 4 Phase 4: load-bearing fix landed — fresh request_fd per frame
Phase 4's diagnostic journey: TRY_EXT_CTRLS retry didn't pinpoint
(kernel obfuscation extends to TRY for compound controls). Per-control
TRY isolation showed all 4 H.264 controls fail individually on the same
fd → pivot from "bad control content" to "bad request_fd state."
Replaced REINIT with close+realloc; iter1+2+3 carryover frame-11
EINVAL empirically eliminated.
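
The pivot described above rests on an isolation argument: if every control is rejected when tried alone on the same fd, the payload of any single control is unlikely to be the cause. A minimal sketch of that reasoning, assuming a hypothetical `try_controls` callable that stands in for a VIDIOC_TRY_EXT_CTRLS wrapper (it is not a function from the fork); the four control labels are illustrative, since the message does not name them.

```python
from typing import Callable, Sequence


def isolate_failing_controls(
    controls: Sequence[str],
    try_controls: Callable[[Sequence[str]], bool],
) -> str:
    """Classify a batch failure by trying each control on its own."""
    failing = [c for c in controls if not try_controls([c])]
    if not failing:
        return "only the combination fails"
    if len(failing) == len(controls):
        # Every control is rejected individually: suspect the fd/request
        # state they share rather than any one control's content.
        return "suspect request_fd state"
    return "suspect control content: " + ", ".join(failing)


# Illustrative labels only (assumption, not taken from the commit).
h264_controls = ["SPS", "PPS", "DECODE_PARAMS", "SCALING_MATRIX"]

# Stand-in for a request_fd in a bad state: every TRY is rejected.
verdict = isolate_failing_controls(h264_controls, lambda batch: False)
print(verdict)
```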

Fork commits: 74d8dd1 (DPB FFmpeg-semantics fixes), 385dee1 (fresh
request_fd per frame, load-bearing), f21bdf0 (debug TRY iso),
b81ce69 (B-slice L1 copy-paste fix from Phase 5 review).

Empirical: 49.7s of bbb_1080p30 stream-time decoded clean on
firefox-fourier without MOZ_DISABLE_RDD_SANDBOX=1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:16:00 +00:00
marfrit ec277eab57 Iteration 4 Phase 2: kernel control validation analysis
Reading v4l2-core/v4l2-ctrls-api.c and v4l2-ctrls-core.c on the cloned
linux-pinetab2 v6.19.10-danctnix1 source: error_idx == count for
S_EXT_CTRLS is intentional kernel obfuscation, not under-reporting.
Line 629 deliberately overwrites error_idx with cs->count after
validate_ctrls failures in set mode, forcing the caller to bail rather
than partial-set.

The escape hatch is VIDIOC_TRY_EXT_CTRLS, which "never modifies controls
[so] error_idx is just set to whatever control has an invalid value"
(quoting v4l2-ctrls-api.c:222-224).

Path forward into Phase 4: amend Y2 instrumentation to retry with
TRY_EXT_CTRLS on S_EXT_CTRLS EINVAL, extract the actual failing control
index. From there, narrow the failing field by comparing frame-11
values against frames 1-10.
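
The error_idx semantics described above can be written down as a small decision function. This is a sketch of the behaviour as this message describes it, not a binding to the real ioctl; the function name and the `tried_only` parameter are assumptions made for illustration.

```python
def interpret_error_idx(error_idx: int, count: int, tried_only: bool) -> str:
    """Interpret error_idx after EINVAL from S_EXT_CTRLS or TRY_EXT_CTRLS."""
    if not tried_only and error_idx == count:
        # Set mode: the index is deliberately overwritten with count, so it
        # says nothing about which control failed validation.
        return "unknown: retry with TRY_EXT_CTRLS"
    if 0 <= error_idx < count:
        return f"control #{error_idx} rejected"
    return "no specific control identified"


print(interpret_error_idx(4, 4, tried_only=False))
print(interpret_error_idx(2, 4, tried_only=True))
```

The last branch matters in practice: the Phase 4 message reports that the TRY retry did not pinpoint a control for compound controls either.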

Phase 3 baseline anchored from iter3 Phase 7 — same rig, same EINVAL,
deterministic. No re-acquire needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:21:40 +00:00
marfrit 8dd3f39963 Iteration 4 Phase 1 lock — A solo (frame-11 EINVAL fix)
Track A locked solo. Pairing options A+B and A+D both deferred to iter5+;
Track A is the load-bearing carry from iter1+iter2+iter3, and the fix-loop
wants the focus.

Phase 1 success criterion: ≥30s (≥720 frames @ 24fps) of bbb_1080p30
decoded by patched-Firefox-fourier on ohm without S_EXT_CTRLS EINVAL,
with operator visual ack of frames rendering in the Firefox window.

Diagnosis path: read hantro_g1_h264_dec.c set_params validation, diff
our DECODE_PARAMS / SLICE_PARAMS / SPS / PPS struct construction vs
FFmpeg reference, speculative-fix loop on ohm. Sonnet 7.5 (mid-stream
non-IDR DPB state) is the suspect surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:02:25 +00:00
marfrit f181780294 Iteration 4 Phase 0: substrate doc with 7 candidate questions
Track A is the natural primary lock — frame-11 EINVAL has carried for
three iterations; iter3's rig + Y2 instrumentation make the diagnosis
loop short. Other candidates: DEBUG sweep (B, iter1+2+3 backlog), mpv
libplacebo segfault (C, iter3 carryover), perf binding cell (D, iter1+2+3
backlog), V4L2_MEMORY_DMABUF Option B (E, iter2 carryover), multi-context
safety (F, Sonnet review 9.6), Mozilla bug filing (G, gated on operator).

Recommended: A primary, pair with B (cleanup) or D (perf anchor).

Phase 1 lock awaits user candidate pick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:59:46 +00:00
marfrit f91469abe3 Iteration 3 close — F GREEN, A reproduced + diagnosed for iter4
Phase 1 locked F (Firefox RDD sandbox verify-by-patch) and A (frame-11
EINVAL diagnose) running in parallel on a single firefox-fourier build.

Track F: GREEN. Patched Firefox 150.0.1 (firefox-fourier, pkgrel=1.1)
launches on ohm WITHOUT MOZ_DISABLE_RDD_SANDBOX=1 and engages our
libva-v4l2-request backend end-to-end. Three patches needed (Phase 2
identified one and deferred two):
  - Broker policy (SandboxBrokerPolicyFactory.cpp): allow /dev/media*,
    extend cap-filter to admit stateless decoders that lack M2M caps.
  - Seccomp policy (SandboxFilter.cpp): allow ioctl magic byte '|'
    for <linux/media.h> request-API ioctls.
  - Driver (media.c): replace select() with poll() — Mozilla's RDD
    seccomp common policy admits poll/ppoll/epoll_* but not
    select/pselect6. Driver-side fix preferred; smaller surface,
    portable across sandbox policies, and poll() is the modern API.
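
The driver itself is C and calls poll(2); the same waiting pattern can be illustrated with Python's standard `select.poll` wrapper. A minimal sketch, assuming a pipe as a stand-in for the media request fd so that it runs without V4L2 hardware; the event mask and helper name are illustrative choices, not what media.c uses.

```python
import os
import select


def wait_ready(fd: int, timeout_ms: int) -> bool:
    """Wait for fd to become ready using poll() rather than select()."""
    poller = select.poll()
    poller.register(fd, select.POLLIN | select.POLLPRI)
    events = poller.poll(timeout_ms)  # list of (fd, event_mask) pairs
    return any(ready_fd == fd for ready_fd, _ in events)


# A pipe stands in for the request fd (assumption for the sketch).
read_end, write_end = os.pipe()
ready_before = wait_ready(read_end, timeout_ms=0)
os.write(write_end, b"x")
ready_after = wait_ready(read_end, timeout_ms=100)
os.close(read_end)
os.close(write_end)
print(ready_before, ready_after)
```

Unlike select(), poll() takes an explicit list of descriptors and is not limited by FD_SETSIZE, which is one reason it is usually preferred in new code.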

Track A: REPRODUCES + DIAGNOSED. Frame-11 EINVAL fires deterministically
on a single-slice P-frame (slice_type=0, frame_num=5, post-IDR) — the
exact iter1/iter2 carryover signature, confirming it isn't environmental.
Y2 instrumentation (in v4l2_ioctl_controls) now logs num_controls /
error_idx / per-control id+size on EINVAL. Sizes match kernel UAPI;
error_idx == num_controls is the kernel's "all bad / no specific control"
sentinel — it's a request-level rejection, not a single-field violation.
Fix is iter4's lock; rig + Y2 in place for fast iter4 turnaround.

Build infrastructure introduced: firefox-fourier LXD container on
boltzmann (RK3588 aarch64, persistent, ssh -J boltzmann
builder@firefox-fourier). Upstream Arch x86_64 wasi packages installed
to work around 4-year-stale ALARM versions. PGO generation crashes at
exit (LXC has no display); obj/dist/ tarball used as the deployable
artifact instead of the pacman package.

Phase 6 surprises captured in phase6_iter3_findings.md: malformed
first-cut patch (descriptive vs numeric hunk headers), --enable-v4l2
isn't a Firefox 150 build flag (auto-set on aarch64+GTK), Mozilla 2025 PGP
key rotation, ALARM-stale wasi, onnxruntime missing in ALARM, and the
"no tricks" lesson (revert workarounds first when redirected).

Carries to iter4 substrate: Track A fix is the natural lock; mpv
libplacebo --vo=gpu segfault stays as separate iter4 candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:56:34 +00:00
marfrit c555040b24 Iteration 3 Phase 0: substrate doc with 7 candidate questions
Carries iter2 close state forward. Lists 7 candidate research questions
(A frame-11 EINVAL, B DEBUG sweep, C perf binding cell, D multi-context
safety, E V4L2_MEMORY_DMABUF as Option B for true DMA-BUF lifecycle fix,
F Firefox RDD sandbox, G Sonnet 7.x carryovers) plus recommended
pairings.

Candidate F enriched with Sonnet web research: NO existing Mozilla bug
covers V4L2-stateless request-API; Bug 1833354 (FF116) is V4L2-M2M only.
Sandbox allowlist in SandboxBrokerPolicyFactory.cpp::GetRDDPolicy() +
AddV4l2Dependencies() filters by V4L2_CAP_VIDEO_M2M, explicitly excluding
Hantro stateless. /dev/media* completely absent from allowlist. Patch
needed is small in scope (~30 lines, 2 functions, 1 file) following the
renderD128 broker pattern. Real Mozilla patch needed, not env-var-only.

Stop point at Phase 1 lock — user picks among A..G.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:43:20 +00:00
marfrit c36c61e197 Phase 8 close: append Firefox 150 regression re-verify result
Operator-driven Firefox 150 video session under iter2 driver:
- With MOZ_DISABLE_RDD_SANDBOX=1: our libva backend engages, decodes 10
  frames cleanly through hantro (luma gradient 0x10..0x1c matching BBB
  intro fade), surface ID recycle + cap_pool slot cycling work as
  designed. EINVAL on frame 11 is iter1-carryover (Sonnet 7.x family),
  not an iter2 code regression.
- Without sandbox bypass: Firefox 150 RDD sandbox blocks open of
  /dev/media0 with ENETDOWN, libva init fails, Firefox SW-falls-back.
  iter1 evidence shows libva ran in the utility process
  (sandboxingKind=0) at that time; Firefox routing has since changed
  to RDD. NOT iter2-side; flag for iter3 substrate.

Cap_pool architecture confirmed working with Firefox: 8 surfaces, slots
recycled IN_DECODE -> DECODED -> reacquired across multiple BeginPicture
cycles for the same surface ID. Decoded NV12 content is real (matches
mpv vaapi-copy luma signature on the same clip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:30:45 +00:00
marfrit 2c57816fb1 Phase 8 iteration 2 close — DMA-BUF lifecycle hardened, goal met
mpv --hwdec=vaapi --vo=gpu plays bbb_1080p30 smoothly per operator
inspection. Three independent fixes landed in the fork: format-cache
invalidation on multi-resolution sessions (Fix 1), conditional
DRM_FORMAT_MOD_INVALID for non-64-aligned pitches (Fix 2), decoupled
CAPTURE buffer pool with LRU recycling (Fix 3, the load-bearing one).

Documented limitations carried into iter3 substrate: Option-A
statistical mitigation not provably race-free under pathological
workloads; multi-context safety not addressed; LAST_OUTPUT_*
process-global cache; iter1 carry-over Sonnet 7.x questions still open.

Open recommendation for iter3 substrate: re-run iter1's 4-consumer
regression matrix under Fix 3, especially Firefox 150 multi-video.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:09:58 +00:00
marfrit 3fd51d5e7e Iter2 Phase 5 sonnet review: 3 architecture blockers + CAPTURE REQBUFS gap
Plan subagent with model: sonnet, open-consultation. Raw verbatim
review text — operator transcribes to DokuWiki.

Substantive findings beyond the iteration 2 plan as written:

ARCHITECTURE BLOCKERS (must address before Fix 3 implementation):

1. The 3-state slot machine (FREE/IN_DECODE/EXPORTED) is incomplete.
   Need a 4th state DECODED for the window between SyncSurface DQBUF
   and either ExportSurfaceHandle (vaapi path) or DeriveImage
   (vaapi-copy path). Without it, vaapi-copy races Fix 3 — a slot
   in FREE state can be claimed by BeginPicture for a new frame
   while DeriveImage is still copying from its mmap.

2. pthread_mutex_t required around cap_pool slot state. VAAPI is
   re-entrant for multi-threaded consumers; mpv vaapi may call
   EndPicture/SyncSurface from decoder thread and ExportSurfaceHandle
   from display thread.

3. The patch-0011 sentinel write at picture.c:365-373 currently uses
   surface_object->destination_map[0]. After pool refactor, must
   reference slot->map[0] instead.
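
Blockers 1 and 2 together describe a four-state slot machine whose transitions happen under one lock. A toy model of that idea follows, assuming the transition table shown in the code and an explicit release step; both are simplifications made for illustration, and the class and method names are hypothetical rather than the fork's C implementation.

```python
import threading
from enum import Enum


class SlotState(Enum):
    FREE = "FREE"
    IN_DECODE = "IN_DECODE"
    DECODED = "DECODED"
    EXPORTED = "EXPORTED"


# Transitions assumed for this sketch (current state -> allowed next states).
ALLOWED = {
    SlotState.IN_DECODE: {SlotState.DECODED},
    SlotState.DECODED: {SlotState.EXPORTED, SlotState.FREE},
    SlotState.EXPORTED: {SlotState.FREE},
}


class CapPool:
    """Toy CAPTURE slot pool; every state change happens under one lock."""

    def __init__(self, size: int) -> None:
        self._lock = threading.Lock()
        self._states = [SlotState.FREE] * size
        self._free_order = list(range(size))  # least recently freed first

    def acquire(self) -> int:
        """Claim the least recently freed slot for a new decode."""
        with self._lock:
            if not self._free_order:
                raise RuntimeError("no FREE slot available")
            slot = self._free_order.pop(0)
            self._states[slot] = SlotState.IN_DECODE
            return slot

    def advance(self, slot: int, new_state: SlotState) -> None:
        """Move a slot to new_state if the transition table permits it."""
        with self._lock:
            current = self._states[slot]
            if new_state not in ALLOWED.get(current, set()):
                raise RuntimeError(
                    f"slot {slot}: {current.value} -> {new_state.value} not allowed"
                )
            self._states[slot] = new_state
            if new_state is SlotState.FREE:
                self._free_order.append(slot)

    def state(self, slot: int) -> SlotState:
        with self._lock:
            return self._states[slot]


pool = CapPool(size=2)
first = pool.acquire()
pool.advance(first, SlotState.DECODED)  # decode finished, consumer not done
second = pool.acquire()                 # takes the other slot, never `first`
print(first, pool.state(first).value, second)
```

The point of the DECODED state is visible in the last two lines: a slot whose decode has finished but which a consumer may still be reading is not in the free list, so a new decode cannot claim it.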

ADDITIONAL FINDINGS (addressed in Fix 1):

4. The resolution-change path at surface.c:121-123 only does REQBUFS(0)
   on OUTPUT, not CAPTURE. Hantro derives CAPTURE format from OUTPUT,
   so leftover CAPTURE buffers from prior resolution would also block
   the implicit format change. Pre-existing bug; Fix 3 pool refactor
   would expose more frequently. Folded into Fix 1 implementation.

5. WSI fix should be conditional on pitch alignment, not universal
   MOD_INVALID — preserves LINEAR semantics for already-aligned
   1920-wide content.
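
Finding 5 amounts to a one-line decision on the plane pitch. A minimal sketch, assuming the 64-byte alignment used elsewhere in this log and a Y-plane pitch equal to the frame width for 8-bit NV12; the function name is hypothetical, and the two modifier constants carry the values defined in drm_fourcc.h.

```python
DRM_FORMAT_MOD_LINEAR = 0x0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF  # "no explicit modifier"

PITCH_ALIGNMENT = 64  # bytes; assumption taken from the "non-64-aligned" wording


def pick_modifier(pitch_bytes: int) -> int:
    """Keep LINEAR for aligned pitches, fall back to INVALID otherwise."""
    if pitch_bytes % PITCH_ALIGNMENT == 0:
        return DRM_FORMAT_MOD_LINEAR
    return DRM_FORMAT_MOD_INVALID


for width in (1920, 864):
    print(width, hex(pick_modifier(width)))
```

With these inputs a 1920-byte pitch keeps LINEAR, while an 864-byte pitch (864 = 13 * 64 + 32) gets MOD_INVALID.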

DEFERRED to iteration 3:

- LAST_OUTPUT_WIDTH/HEIGHT thread safety (static globals; multi-context race)
- Full Option B (V4L2_MEMORY_DMABUF + userspace allocation)
- Multi-context concurrent use

Sonnet endorsed Option A (more buffers + LRU recycling) over Option B
for iteration 2's hardening scope. MIN_CAP_POOL=24 conservatively sized
for single-stream playback (verified against Firefox VideoFramePool
source for hold-window estimates).

Fix 1 + Fix 2 already implemented per revised architecture and pushed
to fork (06beef6, e64bb08). Fix 3 implementation pending ohm
availability for verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:39:08 +00:00
marfrit 9b1d7737cd Iteration 2 Phase 0-4: substrate, situation analysis, baseline anchor, plan
Iteration 2 opens. Hardening, not new feature work.

Phase 0 (phase0_findings_iter2.md): locked research question — make
DMA-BUF (vaapi no -copy) path render without artifacts on direct-
render consumers. Plus WSI alignment for 864-wide videos in Firefox
and multi-resolution kernel-state recovery.

Phase 2 (phase2_iter2_analysis.md): the bug origin is picture.c:375
re-QBUFing surface_object->destination_index every decode cycle
while the kernel CAPTURE buffer is still being read by the consumer
via an EXPBUF'd dma_buf fd. V4L2 doesn't enforce the constraint;
userspace must coordinate.

Three architecture options analyzed:
  A. More buffers + LRU recycling (cheapest, statistical mitigation)
  B. Per-buffer dma_buf-refcount-aware recycling (correct, requires
     kernel changes or userspace V4L2_MEMORY_DMABUF rewrite)
  C. Kernel patch to enforce QBUF rejection (out of campaign scope)

Picking Option A for iteration 2.

Phase 3 (in-session baseline anchor): mpv vaapi --vo=gpu shows
91 drops in 14s at 1080p, 9 CAPTURE buffer indices used (2-10),
~10-19 re-queues per buffer in 14s window. Per-buffer re-queue
interval ~875ms vs typical compositor hold ~50ms — race window
opens episodically.

Phase 4 (phase4_iter2_plan.md): three independent fixes ordered
cheapest-first.
  Fix 1: invalidate LAST_OUTPUT_WIDTH cache on session teardown.
  Fix 2: try DRM_FORMAT_MOD_INVALID for WSI compatibility.
  Fix 3: decoupled CAPTURE buffer pool with LRU recycling
         (~150-200 lines, the load-bearing fix).

Phase 5 sonnet review BEFORE Fix 3 implementation.

Out-of-scope for iteration 2 (carry to iter 3): EACCES probe,
multi-slice num_ref_idx, HACK block MPEG-2 cleanup, seek-to-non-IDR,
DEBUG instrumentation cleanup, V4L2_MEMORY_DMABUF rewrite, perf
metrics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:55:31 +00:00
marfrit 7e66abb72f Phase 8: iteration 1 close — deliverable lands for vaapi-copy + Firefox + vainfo
Iteration 1 close. Boolean-correctness criterion met for vainfo +
mpv vaapi-copy + Firefox 150 in live Plasma session. mpv vaapi
(DMA-BUF) engages and decodes correctly but stutters due to a
DMA-BUF lifecycle race — deferred to iteration 2.

Four load-bearing functional commits on the fork:
  9de1be3 — slice-header bit-parser populates DECODE_PARAMS bit-size
            fields hantro G1 reads into MMIO registers
  d41a4b9 — always submit SCALING_MATRIX + populate pps num_ref_idx
  37c0e72 — re-set OUTPUT format on resolution change (mpv probe-pattern)
  ac891a0 — honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle

Lessons distilled to memory:
  feedback_read_consumer_source_first (NEW)
  feedback_one_consumer_success_is_not_validation
  feedback_kernel_source_audit_for_uapi_contract (Sonnet Phase 5)
  feedback_stdout_is_data_too (NEW)
  feedback_no_premature_closure

Predecessor claims either reframed or invalidated:
- 'vainfo + mpv probes work end-to-end' was true at libva engagement
  layer, wrong at kernel-decode layer until 9de1be3
- 'chromium-fourier 149 = libva-multi-planar working' was wrong about
  mechanism; chromium-fourier uses chromium-internal V4L2 backend,
  not libva. The 83 pp browser-CPU finding from fourier_attribution
  cell A stands; the path attribution does not.
- 'patch 0011 sentinel test reliably detects decode failure' had a
  cache-coherency bug fixed in a047926.

Open questions carried to iteration 2: DMA-BUF EXPBUF refcount
lifecycle (load-bearing), WSI pitch alignment for non-64-aligned
widths, multi-resolution kernel-state corruption, plus six items
from Sonnet's Phase 5 review.

Iteration 2 opens with these as Phase 0 substrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:46:07 +00:00
marfrit a2d5210ee9 Phase 5 review (sonnet): substrate sound, parser correct, surface-export attack plan + 6 unsurfaced findings
Plan subagent with model: sonnet, open-consultation pattern. Raw
verbatim review text — operator transcribes to DokuWiki.

Headlines:
- Phase 0 substrate: process not defective, but "end-to-end" framing
  conflated libva engagement with pixel output. New invariant for V4L2
  stateless work: pixel-content check via cache-invalidating path is
  mandatory Phase 0 deliverable.
- Phase 4 plan: correctly scoped, surface-export bug was undiagnosable
  before decode landed (all-zeros NV12 produces solid colors regardless
  of export geometry).
- Slice-header parser: correct, no off-by-one. Two non-bugs to comment
  (MMCO 5 implicitly handled, POC type 2 implicitly correct).
- Phase 7 criterion: right. Both vaapi and vaapi-copy paths required.
  chromium-fourier 149 should be removed from libva corpus (it uses
  chromium's own internal V4L2 decoder, bypasses libva entirely).
- Keeping the slice-header parser in-fork is the right call versus lobbying upstream VAAPI.
- DEBUG patches stay until DMA-BUF path works end-to-end, then clean
  sweep before any bootlin snapshot.
- FRAME_BASED hardcode is correct for ohm; revisit at fresnel/boltzmann.

Surface-export attack plan (5 load-bearing items):
1. Log RequestExportSurfaceHandle descriptor (object size, plane
   offsets, pitches, fd).
2. Log format_width/height from v4l2_get_format — confirm 1088 not 1080.
3. Run mpv --hwdec=vaapi --msg-level=vd=v --msg-level=vo=v, capture
   what mesa reports about imported DRM format/modifier/geometry.
4. Verify UV offset matches 1920×1088 (=2,088,960). If format_height
   is 1080 (=2,073,600), UV plane reads start 8 rows into Y data —
   most likely structural bug.
5. Read FFmpeg's hwcontext_drm.c for NV12 nb_objects + UV-offset
   computation pattern.
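
The numbers in item 4 follow from the NV12 layout, where the interleaved UV plane starts right after the Y plane. A minimal sketch, assuming a pitch of 1920 bytes (pitch equal to width) and no extra padding between the planes; the helper name is illustrative.

```python
def nv12_uv_offset(pitch_bytes: int, coded_height: int) -> int:
    """Byte offset of the UV plane when it directly follows the Y plane."""
    return pitch_bytes * coded_height


pitch = 1920
offset_1088 = nv12_uv_offset(pitch, 1088)  # macroblock-aligned height
offset_1080 = nv12_uv_offset(pitch, 1080)  # display height
rows_into_y = (offset_1088 - offset_1080) // pitch

print(offset_1088, offset_1080, rows_into_y)
```

If a consumer is told the smaller offset, its UV reads begin `rows_into_y` rows before the end of the Y plane, which is the structural bug item 4 is looking for.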

Six previously-unsurfaced findings:
7.1 EACCES on VIDIOC_G_EXT_CTRLS deserves one more probe (try moving
    readback to before MEDIA_REQUEST_IOC_QUEUE).
7.2 num_ref_idx_l0/l1_default_active_minus1 from VASlice is wrong for
    multi-slice streams with per-slice override (defer).
7.3 SET_FORMAT_OF_OUTPUT_ONCE global breaks multi-resolution use
    (latent bug, document).
7.4 // HACK in surface.c is architecturally wrong for multi-codec
    consumers (latent bug, document).
7.5 Firefox non-IDR first frame (mid-stream seek) handling unverified.
7.6 The mechanism behind the fourier_attribution cell A wheat verdict is now
    attributed to the Step 2 chromium-internal V4L2 decoder, not Step 1 libva.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:56:08 +00:00
marfrit a052d5d7cd Phase 7 verification: vaapi-copy works, DMA-BUF surface-export bug surfaces
Live Plasma 6 Wayland session retest of all 4 target consumers
against fork commit 6be3f3b.

Results:
- vainfo: ✓ no regression (7 H.264 + 2 MPEG-2 profiles)
- mpv --hwdec=vaapi-copy --vo=gpu: ✓ bunny (Phase 6 success
  re-confirmed in live session)
- mpv --hwdec=vaapi --vo=gpu: ⚠ solid blue frame
- Firefox 150 (live session): ⚠ engages libva for 1 frame
  (gets real pixels per slice_header parse log), then falls
  back to FFmpeg(FFVPX) software for sustained playback
- chromium-fourier 149: ✓ no regression but ORTHOGONAL — uses
  chromium's own V4L2 stateless decoder, bypasses libva entirely

Tests A (mpv vaapi) and B (Firefox) converge on the same
DMA-BUF surface-export bug: vaExportSurfaceHandle in libva-
v4l2-request produces a DMA-BUF that Mesa/Firefox can't render
correctly — likely wrong DRM_FORMAT modifier or plane offset/
stride mismatch with hantro's tile-padded NV12 (sizeimage=
3,655,712 vs vanilla 3,133,440 for 1920x1088).
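
The two sizes quoted above can be compared arithmetically. The sketch below reproduces the plain NV12 size and then decomposes the difference per 16x16 macroblock; reading the surplus as per-macroblock storage is an arithmetic observation made here, not something this message states (it attributes the difference to tile padding).

```python
def nv12_size(width: int, height: int) -> int:
    """Plain NV12: full-resolution Y plane plus half-size interleaved UV."""
    return width * height * 3 // 2


width, height = 1920, 1088
reported_sizeimage = 3_655_712  # value quoted in the message

plain = nv12_size(width, height)
extra = reported_sizeimage - plain
macroblocks = (width // 16) * (height // 16)
per_macroblock, remainder = divmod(extra, macroblocks)

print(plain, extra, macroblocks, per_macroblock, remainder)
```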

Also disambiguated: chromium-fourier 149's decode path does
NOT go through libva-v4l2-request — uses chromium's own V4L2
backend (Step-2 chromium-side patch). Reframes the 2026-05-03
fourier_attribution cell-A wheat verdict's path validation.

Boolean-correctness criterion (sharpened): met for vaapi-copy,
not for vaapi (DMA-BUF). Phase 1 lock should wait until both
paths work. Iteration 2 (perf) is gated on the DMA-BUF path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:25:16 +00:00
marfrit b18c6f70f9 Phase 6 success: hantro decodes real H.264 pixels via libva-v4l2-request
First end-to-end hardware decode in this campaign. mpv --hwdec=
vaapi-copy --vo=gpu on bbb_1080p30_h264.mp4 in operator's live
Plasma 6 Wayland session shows the bunny playing — real decoded
NV12 pixel data, not the all-zero (green) or solid-color output
we had all day.

Operator confirmation 2026-05-04: "A big fat white bunny shows up."

Two fork commits got us here:
  d41a4b9 — h264: always submit SCALING_MATRIX + populate pps num_ref_idx
  9de1be3 — h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields

The load-bearing fix was 9de1be3 (slice-header bit-parser) — it
populates dec_param->dec_ref_pic_marking_bit_size, idr_pic_id,
pic_order_cnt_bit_size which hantro G1 writes directly into MMIO
registers (G1_REG_DEC_CTRL5_REFPIC_MK_LEN, G1_REG_DEC_CTRL5_IDR_PIC_ID,
G1_REG_DEC_CTRL6_POC_LENGTH).

Phase 1 boolean-correctness criterion now sharpened to require
real-VO pixel-content verification. Met for mpv vaapi-copy in
live session.

Phase 7 verification still owed across the full test corpus
(vainfo, mpv vaapi (no -copy), Firefox, chromium-fourier 149).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:49:59 +00:00
marfrit c6e3f58958 Phase 4 input: diff against FFmpeg + hantro kernel source identifies bug
Read FFmpeg's libavcodec/v4l2_request_h264.c (Kwiboo downstream
v4l2-request-n8.1 head b57fbbe, proven working on hantro hardware) and
Linux mainline drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
(the actual register-write code).

Smoking gun: hantro G1 writes three DECODE_PARAMS fields directly into
hardware MMIO registers:

  dec_ref_pic_marking_bit_size  -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
  idr_pic_id                    -> G1_REG_DEC_CTRL5_IDR_PIC_ID
  pic_order_cnt_bit_size        -> G1_REG_DEC_CTRL6_POC_LENGTH

When all three are zero (our current state — patch 0008 left them
uninitialized with the open question "does hantro tolerate?"), the
hardware bitstream parser advances by zero bits past slice_type,
lands on garbage, decodes zero pixels into the CAPTURE buffer.

This explains the all-zero CAPTURE output we see across mpv and
Firefox in the live-session real-VO tests. Patch 0008's "Empirical
question" was empirically answered today: NO, hantro does not
tolerate zeros in these fields.

Additional easy fixes identified:
- pps->num_ref_idx_l0_default_active_minus1 (uninitialized;
  VAAPI provides; written to MMIO REFIDX0_ACTIVE)
- pps->num_ref_idx_l1_default_active_minus1 (same)
- pps->pic_parameter_set_id (uninitialized; for single-PPS BBB
  accidentally correct at 0; written to MMIO PPS_ID)
- pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT always
  (FFmpeg comment: "FFmpeg always provide a scaling matrix")
- always send V4L2_CID_STATELESS_H264_SCALING_MATRIX in the request
  (with default flat matrix when no VAIQMatrixBuffer arrives)

The load-bearing fix requires implementing a small slice_header()
bit-parser in libva-v4l2-request — VAAPI doesn't carry the
bit-position fields and hantro requires them. ~50-100 lines of C
in a new helper. This is real Phase 6 work, well-scoped.
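
The bit-size fields are lengths of slice-header syntax elements, so a parser has to track its bit position while decoding Exp-Golomb codes. A minimal sketch of that mechanism follows; it is not the fork's parser, the class and helper names are illustrative, and it assumes the input has already had emulation-prevention bytes removed.

```python
class BitReader:
    """MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits from the start of data

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: N zeros, a one, then N suffix bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def measure_ue(reader: BitReader):
    """Decode one ue(v) element and report how many bits it occupied."""
    start = reader.pos
    value = reader.read_ue()
    return value, reader.pos - start


# Bit string 1 | 010 | 011 | 00100 packed MSB-first, zero-padded.
reader = BitReader(bytes([0xA6, 0x40]))
decoded = [measure_ue(reader) for _ in range(4)]
print(decoded)
```

Recording `reader.pos` before and after a group of syntax elements is the same trick that yields a field's size in bits, which is the kind of value the hardware registers listed above expect.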

Saved feedback memory:
- ../.claude/projects/-home-mfritsche-src/memory/
    feedback_kernel_source_audit_for_uapi_contract.md
  on the broader lesson: read kernel-side driver source for any
  UAPI control fields userspace fills, especially when the kernel
  writes them into hardware MMIO. "Empirical question -- does the
  kernel tolerate?" is not an acceptable resolution; either read
  the source or get a definitive empirical answer.

Reference sources gitignored — separate-repo discipline. Pull
recipes (ffmpeg-kwiboo + linux-mainline sparse clones) preserved
in shell history; can be reconstructed from diff_against_ffmpeg.md
URL/branch references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:08:43 +00:00
marfrit 365764fffb Phase 0 amendment: hantro writes zeros, sentinel test cache-buggy
Re-baselined libva-v4l2-request decode path with kernel-side
observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug)
and visual disambiguator (mpv --vo=gpu in operator's live Plasma
session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame:
   ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088
   + hantro tile padding). dmesg completely silent — no
   hantro/vpu/decode/error/warn messages.

2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a
   solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE.
   Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would
   render cream). Both colors are consistent with the kernel
   writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same
   buffer GL-imported as DMA-BUF with different colorspace → blue).

3. Patch 0011 sentinel test has a cache-coherency bug: writes
   0xab via cached surface_object->destination_map[0] mmap, never
   invalidates cache before readback. So the readback always
   shows the stale sentinel even when kernel DMA-overwrote it
   with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly
   invalidate cache and see the real (zero) contents.
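
The green reading in finding 2 can be reproduced with the standard BT.709 limited-range conversion. A minimal sketch, assuming 8-bit samples and the usual rounded matrix coefficients; the function name is illustrative and the blue DMA-BUF path is not modelled.

```python
def bt709_limited_to_rgb(y: int, cb: int, cr: int):
    """Convert one 8-bit limited-range BT.709 YCbCr sample to 8-bit RGB."""

    def clamp(value: float) -> int:
        return max(0, min(255, round(value)))

    luma = 1.164383 * (y - 16)
    r = luma + 1.792741 * (cr - 128)
    g = luma - 0.213249 * (cb - 128) - 0.532909 * (cr - 128)
    b = luma + 2.112402 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


# An all-zero NV12 buffer means Y=0 and Cb=Cr=0 for every pixel.
zero_pixel = bt709_limited_to_rgb(0, 0, 0)
print(zero_pixel)
```

Red and blue clamp to zero while green stays positive, so a zero-filled buffer shows up as a uniform green frame rather than black.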

This corrects the previous Phase 0 verdicts twice in one day:
- Original commit f15ba8b ("the 2026-04-26 picture holds") was
  wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel
  output, sentinel survives") was half right: kernel does write,
  writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is
  silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis:
"All zeros → kernel did write 0x00s (overwriting our sentinel),
and the apparent 'no picture' output is the kernel-side decode
actually producing zeros (e.g. parser rejected the bitstream)."
That hypothesis was right; we just couldn't confirm it via the
sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't
engage hantro" — it's "hantro engages but its parser produces
zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS
readback to verify writes stick, diff against FFmpeg's
v4l2_request_h264.c (proven working on hantro), verify SPS
completeness, resolve patch 0008's slice_header bit_size open
question, dyndbg the hantro module, etc. Phase 1 boolean-
correctness criterion needs a working pixel-content check before
lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:39:42 +00:00
marfrit e892cea858 Phase 0 deliverable #3 (Firefox live session): inverted verdict
Re-tested Firefox 150.0.1 inside operator's active Plasma 6 Wayland
session (not Xvfb). Two-layer finding:

1. Firefox engages libva in real Plasma session: full V4L2-stateless
   contract lifecycle completes, no EINVAL on the request-API path,
   v4l2_request_drv_video.so successfully loaded, /dev/video1 +
   /dev/media0 opened by RDD utility process 146420.

2. Kernel produces no decoded pixel output: CAPTURE buffer returns
   from DQBUF with the patch-0011 sentinel pattern 0xab unchanged.
   Hantro never wrote the buffer despite the contract trace looking
   clean. Firefox detected the failed first frame and silently fell
   back to SW decode in RDD's FFmpeg-OS-library PDM. User-visible
   playback continues normally for 5+ minutes (operator confirmed
   t=337s playback time in live inspection).

Cross-checked against the prior 2026-05-04 mpv vaapi-copy run: 68 of
68 mpv CAPTURE buffers show the same sentinel-survives pattern.
mpv's --vo=null consumed all 68 sentinel buffers as if they were
valid NV12 frames; the failure was invisible. OUTPUT bytes are
byte-for-byte identical between mpv and Firefox (same IDR slice via
libavcodec, both consumers feed hantro the same data, hantro
silently drops both).

Implication: the prior Phase 0 in-session re-verification verdict
(commit f15ba8b: "the 2026-04-26 picture holds at boolean-correctness
level") was wrong at the kernel-decode layer. The patch-0011 sentinel
test in the deployed Step 1 build was authored specifically to detect
this failure mode; the predecessor close-out didn't grep for it, and
contract-trace cleanliness was mistaken for end-to-end success.

Phase 1 lock should be deferred until: (a) boolean-correctness
criterion is sharpened to require pixel-content verification,
(b) Phase 0 acquires kernel-side observability (ftrace, dmesg) to
characterize WHY hantro is silent. Step 1 engages libva but doesn't
make hantro decode -- Phase 6 has substantive work beyond the
18-patch series.

Likely failure-mode candidates flagged in findings_live.md priority
order: reference_ts not propagated; DECODE_PARAMS slice_header
bit_size zero; POC sentinel may still leak past patch-0015 strip;
level_idc over-allocation; SOURCE_CHANGE event handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:38:57 +00:00
marfrit f115fa6cbc Phase 0 deliverable #3 (Firefox): headless-rig finding
Firefox 150.0.1 + media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=
v4l2_request, executed under Xvfb on ohm.

Result: inconclusive at the boolean-correctness level. RDD process
dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability
probe then immediately closes them; never reaches vaInitialize, never
opens /dev/dri/renderD128, never reaches v4l2_request_drv_video.so.
Falls back to software H.264 in RDD via FFmpeg-OS-library PDM
(Broadcast support from 'RDD', support=H264 SWDEC).

Root cause: Xvfb provides software framebuffer with no DRI/DRM
render-node integration. Firefox's gfx-environment platform-fitness
check rejects VAAPI before adding it to the RDD PDM order list.
Not a libva-side or driver-side fault — mpv --hwdec=vaapi-copy in
the same headless rig DID engage end-to-end (per
phase0_evidence/2026-05-04/findings.md).

Definitive Firefox verdict requires retesting inside a live Plasma
session — deferred to live-session run (next commit).

Also: Phase 0 deliverable #2 (Step 1 reconciliation into fork
master) was completed and pushed to marfrit/libva-v4l2-request-fourier
between this and the prior Phase 0 commit; status table updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:19:14 +00:00
marfrit f15ba8b147 Phase 0 in-session re-verification: 2026-04-26 picture holds
Re-executed deliverables #1 (verify failure-mode finding) and #4 (capture
contract trace) on ohm against the substrate that's actually deployed —
not the libva-v4l2-request-fourier git fork master, but the
libva-v4l2-request-ohm-gl-fix package built on boltzmann from the Step 1
18-patch series.

Result: vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; mpv
--hwdec=vaapi-copy decodes 68 H.264 frames end-to-end through the full
V4L2-stateless contract on hantro /dev/video1 + /dev/media0. Zero
EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring
Phase 2 loopback.

Inventory finding documented: the git fork at e8c3937 is a pre-Step-1
substrate; rebuilding from it as-is would be a regression. Step 1
reconciliation (deliverable #2) is upstream of any future build-from-fork
action.

Rig caveat captured: --hwdec=vaapi requires a real VO; --hwdec=vaapi-copy
is the headless-safe alternative for SSH-driven test rigs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:16:38 +00:00
marfrit 35b5617c71 .gitignore: hide nested fork repo from campaign git
2026-05-04 08:10:19 +00:00
marfrit affc1752e0 Initial campaign substrate: README + phase0_findings
Single-question campaign — make multi-planar libva accepted by VA-API
consumers on Rockchip hantro RK3568 (PineTab2/ohm first iteration).
Backend only, success criterion is boolean correctness, performance
deferred. Substrate carried over from libva-v4l2-request-fourier
STUDY.md (commit e0acc33 in the fork) plus locked decisions from the
2026-05-04 setup exchange.

Fork lives as a subdirectory: libva-v4l2-request-fourier/ (separate
git repo, origin marfrit/libva-v4l2-request-fourier, upstream
bootlin/libva-v4l2-request).

Empty Gitea repo created at git.reauktion.de/marfrit/libva-multiplanar;
local origin remote set, no push yet (per operator instruction —
wait for publish-worthy state).
2026-05-04 08:10:03 +00:00