Commit Graph

12 Commits

Author SHA1 Message Date
marfrit 9b1d7737cd Iteration 2 Phase 0-4: substrate, situation analysis, baseline anchor, plan
Iteration 2 opens. Hardening, not new feature work.

Phase 0 (phase0_findings_iter2.md): locked research question — make
DMA-BUF (vaapi no -copy) path render without artifacts on direct-
render consumers. Plus WSI alignment for 864-wide videos in Firefox
and multi-resolution kernel-state recovery.

Phase 2 (phase2_iter2_analysis.md): the bug origin is picture.c:375
re-QBUFing surface_object->destination_index every decode cycle
while the kernel CAPTURE buffer is still being read by the consumer
via an EXPBUF'd dma_buf fd. V4L2 doesn't enforce the constraint;
userspace must coordinate.

Three architecture options analyzed:
  A. More buffers + LRU recycling (cheapest, statistical mitigation)
  B. Per-buffer dma_buf-refcount-aware recycling (correct, requires
     kernel changes or userspace V4L2_MEMORY_DMABUF rewrite)
  C. Kernel patch to enforce QBUF rejection (out of campaign scope)

Picking Option A for iteration 2.

Phase 3 (in-session baseline anchor): mpv vaapi --vo=gpu shows
91 drops in 14s at 1080p, 9 CAPTURE buffer indices used (2-10),
~10-19 re-queues per buffer in 14s window. Per-buffer re-queue
interval ~875ms vs typical compositor hold ~50ms — race window
opens episodically.

Phase 4 (phase4_iter2_plan.md): three independent fixes ordered
cheapest-first.
  Fix 1: invalidate LAST_OUTPUT_WIDTH cache on session teardown.
  Fix 2: try DRM_FORMAT_MOD_INVALID for WSI compatibility.
  Fix 3: decoupled CAPTURE buffer pool with LRU recycling
         (~150-200 lines, the load-bearing fix).

Phase 5 sonnet review BEFORE Fix 3 implementation.

Out-of-scope for iteration 2 (carry to iter 3): EACCES probe,
multi-slice num_ref_idx, HACK block MPEG-2 cleanup, seek-to-non-IDR,
DEBUG instrumentation cleanup, V4L2_MEMORY_DMABUF rewrite, perf
metrics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:55:31 +00:00
marfrit 7e66abb72f Phase 8: iteration 1 close — deliverable lands for vaapi-copy + Firefox + vainfo
Iteration 1 close. Boolean-correctness criterion met for vainfo +
mpv vaapi-copy + Firefox 150 in live Plasma session. mpv vaapi
(DMA-BUF) engages and decodes correctly but stutters due to a
DMA-BUF lifecycle race — deferred to iteration 2.

Four load-bearing functional commits on the fork:
  9de1be3 — slice-header bit-parser populates DECODE_PARAMS bit-size
            fields hantro G1 reads into MMIO registers
  d41a4b9 — always submit SCALING_MATRIX + populate pps num_ref_idx
  37c0e72 — re-set OUTPUT format on resolution change (mpv probe-pattern)
  ac891a0 — honor VA_EXPORT_SURFACE_SEPARATE_LAYERS in vaExportSurfaceHandle

Lessons distilled to memory:
  feedback_read_consumer_source_first (NEW)
  feedback_one_consumer_success_is_not_validation
  feedback_kernel_source_audit_for_uapi_contract (Sonnet Phase 5)
  feedback_stdout_is_data_too (NEW)
  feedback_no_premature_closure

Predecessor claims either reframed or invalidated:
- 'vainfo + mpv probes work end-to-end' was true at libva engagement
  layer, wrong at kernel-decode layer until 9de1be3
- 'chromium-fourier 149 = libva-multi-planar working' was wrong about
  mechanism; chromium-fourier uses chromium-internal V4L2 backend,
  not libva. The 83 pp browser-CPU finding from fourier_attribution
  cell A stands; the path attribution does not.
- 'patch 0011 sentinel test reliably detects decode failure' had a
  cache-coherency bug fixed in a047926.

Open questions carried to iteration 2: DMA-BUF EXPBUF refcount
lifecycle (load-bearing), WSI pitch alignment for non-64-aligned
widths, multi-resolution kernel-state corruption, plus six items
from Sonnet's Phase 5 review.

Iteration 2 opens with these as Phase 0 substrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:46:07 +00:00
marfrit a2d5210ee9 Phase 5 review (sonnet): substrate sound, parser correct, surface-export attack plan + 6 unsurfaced findings
Plan subagent with model: sonnet, open-consultation pattern. Raw
verbatim review text — operator transcribes to DokuWiki.

Headlines:
- Phase 0 substrate: process not defective, but "end-to-end" framing
  conflated libva engagement with pixel output. New invariant for V4L2
  stateless work: pixel-content check via cache-invalidating path is
  mandatory Phase 0 deliverable.
- Phase 4 plan: correctly scoped, surface-export bug was undiagnosable
  before decode landed (all-zeros NV12 produces solid colors regardless
  of export geometry).
- Slice-header parser: correct, no off-by-one. Two non-bugs to comment
  (MMCO 5 implicitly handled, POC type 2 implicitly correct).
- Phase 7 criterion: right. Both vaapi and vaapi-copy paths required.
  chromium-fourier 149 should be removed from libva corpus (it uses
  chromium's own internal V4L2 decoder, bypasses libva entirely).
- Slice-header parser in-fork is right call vs upstream VAAPI lobbying.
- DEBUG patches stay until DMA-BUF path works end-to-end, then clean
  sweep before any bootlin snapshot.
- FRAME_BASED hardcode is correct for ohm; revisit at fresnel/boltzmann.

Surface-export attack plan (5 load-bearing items):
1. Log RequestExportSurfaceHandle descriptor (object size, plane
   offsets, pitches, fd).
2. Log format_width/height from v4l2_get_format — confirm 1088 not 1080.
3. Run mpv --hwdec=vaapi --msg-level=vd=v --msg-level=vo=v, capture
   what mesa reports about imported DRM format/modifier/geometry.
4. Verify UV offset matches 1920×1088 (=2,088,960). If format_height
   is 1080 (=2,073,600), UV plane reads start 8 rows into Y data —
   most likely structural bug.
5. Read FFmpeg's hwcontext_drm.c for NV12 nb_objects + UV-offset
   computation pattern.

Six previously-unsurfaced findings:
7.1 EACCES on VIDIOC_G_EXT_CTRLS deserves one more probe (try moving
    readback to before MEDIA_REQUEST_IOC_QUEUE).
7.2 num_ref_idx_l0/l1_default_active_minus1 from VASlice is wrong for
    multi-slice streams with per-slice override (defer).
7.3 SET_FORMAT_OF_OUTPUT_ONCE global breaks multi-resolution use
    (latent bug, document).
7.4 // HACK in surface.c is architecturally wrong for multi-codec
    consumers (latent bug, document).
7.5 Firefox non-IDR first frame (mid-stream seek) handling unverified.
7.6 fourier_attribution cell A wheat verdict mechanism is now Step 2
    chromium-internal V4L2 decoder, not Step 1 libva.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:56:08 +00:00
marfrit a052d5d7cd Phase 7 verification: vaapi-copy works, DMA-BUF surface-export bug surfaces
Live Plasma 6 Wayland session retest of all 4 target consumers
against fork commit 6be3f3b.

Results:
- vainfo: ✓ no regression (7 H.264 + 2 MPEG-2 profiles)
- mpv --hwdec=vaapi-copy --vo=gpu: ✓ bunny (Phase 6 success
  re-confirmed in live session)
- mpv --hwdec=vaapi --vo=gpu: ⚠ solid blue frame
- Firefox 150 (live session): ⚠ engages libva for 1 frame
  (gets real pixels per slice_header parse log), then falls
  back to FFmpeg(FFVPX) software for sustained playback
- chromium-fourier 149: ✓ no regression but ORTHOGONAL — uses
  chromium's own V4L2 stateless decoder, bypasses libva entirely

Tests A (mpv vaapi) and B (Firefox) converge on the same
DMA-BUF surface-export bug: vaExportSurfaceHandle in libva-
v4l2-request produces a DMA-BUF that Mesa/Firefox can't render
correctly — likely wrong DRM_FORMAT modifier or plane offset/
stride mismatch with hantro's tile-padded NV12 (sizeimage=
3,655,712 vs vanilla 3,133,440 for 1920x1088).

Also disambiguated: chromium-fourier 149's decode path does
NOT go through libva-v4l2-request — uses chromium's own V4L2
backend (Step-2 chromium-side patch). Reframes the 2026-05-03
fourier_attribution cell-A wheat verdict's path validation.

Boolean-correctness criterion (sharpened): met for vaapi-copy,
not for vaapi (DMA-BUF). Phase 1 lock should wait until both
paths work. Iteration 2 (perf) is gated on the DMA-BUF path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:25:16 +00:00
marfrit b18c6f70f9 Phase 6 success: hantro decodes real H.264 pixels via libva-v4l2-request
First end-to-end hardware decode in this campaign. mpv --hwdec=
vaapi-copy --vo=gpu on bbb_1080p30_h264.mp4 in operator's live
Plasma 6 Wayland session shows the bunny playing — real decoded
NV12 pixel data, not the all-zero (green) or solid-color output
we had all day.

Operator confirmation 2026-05-04: "A big fat white bunny shows up."

Two fork commits got us here:
  d41a4b9 — h264: always submit SCALING_MATRIX + populate pps num_ref_idx
  9de1be3 — h264: bit-parse slice_header to populate DECODE_PARAMS bit-size fields

The load-bearing fix was 9de1be3 (slice-header bit-parser) — it
populates dec_param->dec_ref_pic_marking_bit_size, idr_pic_id,
pic_order_cnt_bit_size which hantro G1 writes directly into MMIO
registers (G1_REG_DEC_CTRL5_REFPIC_MK_LEN, G1_REG_DEC_CTRL5_IDR_PIC_ID,
G1_REG_DEC_CTRL6_POC_LENGTH).

Phase 1 boolean-correctness criterion now sharpened to require
real-VO pixel-content verification. Met for mpv vaapi-copy in
live session.

Phase 7 verification still owed across the full test corpus
(vainfo, mpv vaapi (no -copy), Firefox, chromium-fourier 149).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:49:59 +00:00
marfrit c6e3f58958 Phase 4 input: diff against FFmpeg + hantro kernel source identifies bug
Read FFmpeg's libavcodec/v4l2_request_h264.c (Kwiboo downstream
v4l2-request-n8.1 head b57fbbe, proven working on hantro hardware) and
Linux mainline drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
(the actual register-write code).

Smoking gun: hantro G1 writes three DECODE_PARAMS fields directly into
hardware MMIO registers:

  dec_ref_pic_marking_bit_size  -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
  idr_pic_id                    -> G1_REG_DEC_CTRL5_IDR_PIC_ID
  pic_order_cnt_bit_size        -> G1_REG_DEC_CTRL6_POC_LENGTH

When all three are zero (our current state — patch 0008 left them
uninitialized with the open question "does hantro tolerate?"), the
hardware bitstream parser advances by zero bits past slice_type,
lands on garbage, decodes zero pixels into the CAPTURE buffer.

This explains the all-zero CAPTURE output we see across mpv and
Firefox in the live-session real-VO tests. Patch 0008's "Empirical
question" was empirically answered today: NO, hantro does not
tolerate zeros in these fields.

Additional easy fixes identified:
- pps->num_ref_idx_l0_default_active_minus1 (uninitialized;
  VAAPI provides; written to MMIO REFIDX0_ACTIVE)
- pps->num_ref_idx_l1_default_active_minus1 (same)
- pps->pic_parameter_set_id (uninitialized; for single-PPS BBB
  accidentally correct at 0; written to MMIO PPS_ID)
- pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT always
  (FFmpeg comment: "FFmpeg always provide a scaling matrix")
- always send V4L2_CID_STATELESS_H264_SCALING_MATRIX in the request
  (with default flat matrix when no VAIQMatrixBuffer arrives)

The load-bearing fix requires implementing a small slice_header()
bit-parser in libva-v4l2-request — VAAPI doesn't carry the
bit-position fields and hantro requires them. ~50-100 lines of C
in a new helper. This is real Phase 6 work, well-scoped.

Saved feedback memory:
- ../.claude/projects/-home-mfritsche-src/memory/
    feedback_kernel_source_audit_for_uapi_contract.md
  on the broader lesson: read kernel-side driver source for any
  UAPI control fields userspace fills, especially when the kernel
  writes them into hardware MMIO. "Empirical question -- does the
  kernel tolerate?" is not an acceptable resolution; either read
  the source or get a definitive empirical answer.

Reference sources gitignored — separate-repo discipline. Pull
recipes (ffmpeg-kwiboo + linux-mainline sparse clones) preserved
in shell history; can be reconstructed from diff_against_ffmpeg.md
URL/branch references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:08:43 +00:00
marfrit 365764fffb Phase 0 amendment: hantro writes zeros, sentinel test cache-buggy
Re-baselined libva-v4l2-request decode path with kernel-side
observability (ftrace v4l2/vb2/dma_fence + dmesg + dynamic_debug)
and visual disambiguator (mpv --vo=gpu in operator's live Plasma
session).

Findings:

1. Kernel reports successful CAPTURE buffer write every frame:
   ftrace vb2_buf_done shows bytesused=3655712 (full NV12 1920x1088
   + hantro tile padding). dmesg completely silent — no
   hantro/vpu/decode/error/warn messages.

2. Visual disambiguator: mpv --hwdec=vaapi-copy --vo=gpu shows a
   solid GREEN frame; --hwdec=vaapi --vo=gpu shows solid BLUE.
   Neither shows the sentinel mid-beige (NV12 Y=0xab,UV=0xab would
   render cream). Both colors are consistent with the kernel
   writing all-zero NV12 (Y=0,UV=0 → green via BT.709 limited; same
   buffer GL-imported as DMA-BUF with different colorspace → blue).

3. Patch 0011 sentinel test has a cache-coherency bug: writes
   0xab via cached surface_object->destination_map[0] mmap, never
   invalidates cache before readback. So the readback always
   shows the stale sentinel even when kernel DMA-overwrote it
   with zeros. vaapi-copy and Mesa DMA-BUF GL import correctly
   invalidate cache and see the real (zero) contents.

This corrects the previous Phase 0 verdicts twice in one day:
- Original commit f15ba8b ("the 2026-04-26 picture holds") was
  wrong: clean contract trace, never checked pixel content.
- Revised commit e892cea ("kernel produces no decoded pixel
  output, sentinel survives") was half right: kernel does write,
  writes zeros, and the sentinel test was reading stale cache.
- Now: kernel writes ALL ZEROS to the CAPTURE buffer. Hantro is
  silently failing the bitstream parse or some control validation.

This is consistent with patch 0011's own commit message hypothesis:
"All zeros → kernel did write 0x00s (overwriting our sentinel),
and the apparent 'no picture' output is the kernel-side decode
actually producing zeros (e.g. parser rejected the bitstream)."
That hypothesis was right; we just couldn't confirm it via the
sentinel test (cache bug) and went down the wrong rabbit hole.

Phase 6 direction sharpens substantially. Bug isn't "we can't
engage hantro" — it's "hantro engages but its parser produces
zeros." Bisect the control submission: VIDIOC_G_EXT_CTRLS
readback to verify writes stick, diff against FFmpeg's
v4l2_request_h264.c (proven working on hantro), verify SPS
completeness, resolve patch 0008's slice_header bit_size open
question, dyndbg the hantro module, etc. Phase 1 boolean-
correctness criterion needs a working pixel-content check before
lock; fix patch 0011's cache sync first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:39:42 +00:00
marfrit e892cea858 Phase 0 deliverable #3 (Firefox live session): inverted verdict
Re-tested Firefox 150.0.1 inside operator's active Plasma 6 Wayland
session (not Xvfb). Two-layer finding:

1. Firefox engages libva in real Plasma session: full V4L2-stateless
   contract lifecycle completes, no EINVAL on the request-API path,
   v4l2_request_drv_video.so successfully loaded, /dev/video1 +
   /dev/media0 opened by RDD utility process 146420.

2. Kernel produces no decoded pixel output: CAPTURE buffer returns
   from DQBUF with the patch-0011 sentinel pattern 0xab unchanged.
   Hantro never wrote the buffer despite the contract trace looking
   clean. Firefox detected the failed first frame and silently fell
   back to SW decode in RDD's FFmpeg-OS-library PDM. User-visible
   playback continues normally for 5+ minutes (operator confirmed
   t=337s playback time in live inspection).

Cross-checked against the prior 2026-05-04 mpv vaapi-copy run: 68 of
68 mpv CAPTURE buffers show the same sentinel-survives pattern.
mpv's --vo=null consumed all 68 sentinel buffers as if they were
valid NV12 frames; the failure was invisible. OUTPUT bytes are
byte-for-byte identical between mpv and Firefox (same IDR slice via
libavcodec, both consumers feed hantro the same data, hantro
silently drops both).

Implication: the prior Phase 0 in-session re-verification verdict
(commit f15ba8b: "the 2026-04-26 picture holds at boolean-correctness
level") was wrong at the kernel-decode layer. The patch-0011 sentinel
test in the deployed Step 1 build was authored specifically to detect
this failure mode; the predecessor close-out didn't grep for it, and
contract-trace cleanliness was mistaken for end-to-end success.

Phase 1 lock should be deferred until: (a) boolean-correctness
criterion is sharpened to require pixel-content verification,
(b) Phase 0 acquires kernel-side observability (ftrace, dmesg) to
characterize WHY hantro is silent. Step 1 engages libva but doesn't
make hantro decode -- Phase 6 has substantive work beyond the
18-patch series.

Likely failure-mode candidates flagged in findings_live.md priority
order: reference_ts not propagated; DECODE_PARAMS slice_header
bit_size zero; POC sentinel may still leak past patch-0015 strip;
level_idc over-allocation; SOURCE_CHANGE event handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:38:57 +00:00
marfrit f115fa6cbc Phase 0 deliverable #3 (Firefox): headless-rig finding
Firefox 150.0.1 + media.ffmpeg.vaapi.enabled=true + LIBVA_DRIVER_NAME=
v4l2_request, executed under Xvfb on ohm.

Result: inconclusive at the boolean-correctness level. RDD process
dlopens libva.so.2 + libva-drm.so.2 + libva-x11.so.2 for capability
probe then immediately closes them; never reaches vaInitialize, never
opens /dev/dri/renderD128, never reaches v4l2_request_drv_video.so.
Falls back to software H.264 in RDD via FFmpeg-OS-library PDM
(Broadcast support from 'RDD', support=H264 SWDEC).

Root cause: Xvfb provides software framebuffer with no DRI/DRM
render-node integration. Firefox's gfx-environment platform-fitness
check rejects VAAPI before adding it to the RDD PDM order list.
Not a libva-side or driver-side fault — mpv --hwdec=vaapi-copy in
the same headless rig DID engage end-to-end (per
phase0_evidence/2026-05-04/findings.md).

Definitive Firefox verdict requires retesting inside a live Plasma
session — deferred to live-session run (next commit).

Also: Phase 0 deliverable #2 (Step 1 reconciliation into fork
master) was completed and pushed to marfrit/libva-v4l2-request-fourier
between this and the prior Phase 0 commit; status table updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:19:14 +00:00
marfrit f15ba8b147 Phase 0 in-session re-verification: 2026-04-26 picture holds
Re-executed deliverables #1 (verify failure-mode finding) and #4 (capture
contract trace) on ohm against the substrate that's actually deployed —
not the libva-v4l2-request-fourier git fork master, but the
libva-v4l2-request-ohm-gl-fix package built on boltzmann from the Step 1
18-patch series.

Result: vainfo enumerates 7 H.264 + 2 MPEG-2 profiles cleanly; mpv
--hwdec=vaapi-copy decodes 68 H.264 frames end-to-end through the full
V4L2-stateless contract on hantro /dev/video1 + /dev/media0. Zero
EINVAL/EAGAIN/EBUSY on the request-API path. No rig drift requiring
Phase 2 loopback.

Inventory finding documented: the git fork at e8c3937 is a pre-Step-1
substrate; rebuilding from it as-is would be a regression. Step 1
reconciliation (deliverable #2) is upstream of any future build-from-fork
action.

Rig caveat captured: --hwdec=vaapi requires a real VO; --hwdec=vaapi-copy
is the headless-safe alternative for SSH-driven test rigs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:16:38 +00:00
marfrit 35b5617c71 .gitignore: hide nested fork repo from campaign git 2026-05-04 08:10:19 +00:00
marfrit affc1752e0 Initial campaign substrate: README + phase0_findings
Single-question campaign — make multi-planar libva accepted by VA-API
consumers on Rockchip hantro RK3568 (PineTab2/ohm first iteration).
Backend only, success criterion is boolean correctness, performance
deferred. Substrate carried over from libva-v4l2-request-fourier
STUDY.md (commit e0acc33 in the fork) plus locked decisions from the
2026-05-04 setup exchange.

Fork lives as a subdirectory: libva-v4l2-request-fourier/ (separate
git repo, origin marfrit/libva-v4l2-request-fourier, upstream
bootlin/libva-v4l2-request).

Empty Gitea repo created at git.reauktion.de/marfrit/libva-multiplanar;
local origin remote set, no push yet (per operator instruction —
wait for publish-worthy state).
2026-05-04 08:10:03 +00:00