Files
libva-multiplanar/phase2_iter7_situation.md
claude-noether ec769a9687 docs: clarify Rockchip silicon across operative docs (RK3566)
PineTab2 is Rockchip RK3566 silicon, not RK3568. The hantro driver
attaches via the rockchip,rk3568-vpu DT compatible because RK3566/
RK3568 silicon is close enough to share that variant. The proper
RK3566 mainline driver target (rkvdec2 / vdpu346) has no kernel
support yet — Christian Hewitt's patch series LKML 2025/12/26/206
is unmerged.

Updated operative docs to use the consistent form:
"PineTab2 (Rockchip RK3566 silicon; hantro driver via the
rockchip,rk3568-vpu DT compatible)" or shorter variants.

Files updated:
- README.md (campaign top-level): TL;DR, deliverable, KWin link,
  hardware target, hardware listing
- firefox-fourier/README.md: tested-on line
- phase8_iteration7_close.md: hardware carry
- phase8_iteration6_close.md: hardware carry, MPEG-2 drop
  rationale
- phase0_findings_iter7.md: predecessor summary, fourier-fresnel
  description, hardware carry
- phase2_iter7_situation.md: msync hypothesis hardware reference

Historical iter1-iter5 phase docs left as-is — they're snapshots
of what the campaign believed at the time. The canonical source
for the silicon-ID correction is track_F_research_2026-05-06.md
(commit 358801b).

Not a correctness change. The campaign's empirical evidence is
unaffected — the hantro/rk3568-vpu driver path that we exercised
was always the actual decode path on PineTab2 silicon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:39:28 +00:00

8.6 KiB

Phase 2 (iter7) — situation analysis: A + B + C

Opened 2026-05-06 immediately after iter7 Phase 1 lock. Three independent sub-tracks; analysis per track.

Track A — msync pixel-correctness verification

Context

iter1 had msync(MS_SYNC | MS_INVALIDATE) on the CAPTURE buffer mmap after DQBUF, before any userspace read. Was a companion to the iter1 patch-0010 hex-dump diagnostic ("where does the buffer write go?"). iter5 sweep commit d3a299b removed both the hex-dump and the msync. Phase 5 sonnet caveat C3 (iter5) flagged: "no pixel-correctness verification post-msync-removal."

Hypothesis

On hantro G1 / PineTab2 (RK3566 via rk3568-vpu) / kernel 6.19.10 with CMA-backed DMA contiguous allocator, the V4L2 framework (videobuf2-dma-contig.c) does cache sync at DQBUF time for cached mappings. If our CAPTURE mmap is cached, msync(MS_INVALIDATE) is structurally redundant. If it's write-combine / uncached, the kernel-side sync is unnecessary. Either way, msync removal should be safe.

But this is testable, not just theoretical. The cleanest test:

Plan

  1. SW reference: ffmpeg -i bbb_1080p30_h264.mp4 -frames:v 100 -f rawvideo -pix_fmt nv12 sw_ref.yuv — pure libavcodec SW decode, well-defined NV12 packing.
  2. HW subject: same input through our driver via ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb -vf hwdownload,format=nv12 -frames:v 100 -f rawvideo -pix_fmt nv12 hw_capture.yuv. Forces the same NV12 packing on output. The internal hwdownload path exercises libva's vaDeriveImage / vaGetImage readback, which is exactly what mpv-vaapi-copy uses.
  3. Compare: cmp byte-for-byte. If identical, the msync removal is verified safe — formally closes iter5 sonnet C3. If divergent, capture the divergence pattern (which frame, which plane, what bytes), restore msync, re-run.

What could go wrong

  • FFmpeg HW path may use slightly different vaapi entrypoint than mpv-vaapi-copy → test verifies a different code path than the one we care about. Mitigation: also run mpv-vaapi-copy with --o=mpv_capture.yuv and compare both against SW reference. If both match, we're covered for the two main consumer patterns.
  • Frame ordering may differ (B-frames out-of-order at decode vs display). Mitigation: extract reference and subject in display order (default for -f rawvideo). bbb is largely IPB-pattern; FFmpeg handles the reorder uniformly.
  • Bbb fixture's first 100 frames include I + multiple GOPs → exercises both intra and inter prediction paths.

Effort

1-2 hours including the harness script. Result is binary (PASS = no msync needed; FAIL = restore msync).


Track B — Slot-leak error recovery

Context

iter6 fix bound one media_request_alloc-allocated fd per OUTPUT pool slot, REINIT'd between uses. But the error path in RequestSyncSurface (commit a09c03c) doesn't release the slot when REINIT or DQBUF fails — it only clears surface_object->request_fd = -1. The slot's busy = true stays. With pool size 16, after 16 such errors the pool is starved and acquire returns -1, decode permanently broken.

Phase 5 sonnet code review for iter6 explicitly noted this as a "bounded leak we accept" but flagged the TODO: request_pool_force_release for error recovery.

Hypothesis

Adding request_pool_force_release(pool, slot_index) that REINITs the slot's fd and clears busy = true will recover the slot. If REINIT fails (kernel object too far gone), fall back to close + media_request_alloc. If alloc also fails, mark the slot "dead" and skip it in future acquires — count effectively decremented by 1.

Plan

  1. Add int media_fd to struct request_pool so force_release can re-alloc.
  2. Update request_pool_init signature → already takes media_fd from iter6; just store it in the struct.
  3. Add request_pool_force_release(pool, slot_index):
    void request_pool_force_release(struct request_pool *pool, unsigned int index)
    {
        struct request_pool_slot *slot = lookup_slot(pool, index);
        if (slot == NULL) return;
        if (media_request_reinit(slot->request_fd) < 0) {
            close(slot->request_fd);
            slot->request_fd = media_request_alloc(pool->media_fd);
            /* If realloc fails, slot stays busy=true (effectively dead).
             * Other slots unaffected. */
            if (slot->request_fd < 0) return;
        }
        slot->busy = false;
    }
    
  4. Call from RequestSyncSurface error path in surface.c after REINIT or DQBUF failure:
    if (rc < 0) {
        request_pool_force_release(&driver_data->output_pool, surface_object->source_index);
        status = VA_STATUS_ERROR_OPERATION_FAILED;
        goto error;
    }
    
  5. Fault inject test: build with -DITER7_FAULT_INJECT_REINIT that returns -EBUSY from media_request_reinit after N frames; observe pre-fix (pool starves at 16) vs post-fix (recovers).

What could go wrong

  • media_request_alloc returns lowest-free fd, which after close might be the SAME fd number — exactly the iter6 issue. But this is in error recovery, not normal flow; happens rarely; the primary safeguard (1:1 slot-fd binding for HAPPY path) still holds. Documented tradeoff.
  • The pool is driver-wide (multi-context safe per iter5 Track E); force_release is called from a single-threaded VA context path, no concurrency hazard.

Effort

1-2 hours code + test scaffolding.


Track C — cap_pool race synthetic harness

Context

iter5 sonnet C4 flagged: "cap_pool resolution-change race — mpv's libplacebo Vulkan-fallback path triggers it; mpv recovers via SW fallback (no segfault), but the race exists. Fix: drain CAPTURE properly before issuing REQBUFs(0) on resolution change in CreateSurfaces2."

Phase 5 sonnet for iter6 said: "the test plan does not yet include the probe-pattern test harness for the candidate A (REQBUFS-EBUSY) path. This is a gap. Without it, the 'mpv libplacebo GREEN' criterion may pass trivially."

iter6's REINIT discipline organically resolved the race (4 cap_pool_init events on YouTube clean), but a deterministic-repro synthetic test would anchor the claim formally.

Hypothesis

A 50-line C program using libva that:

  1. Opens /dev/dri/renderD128, gets VADisplay
  2. Initializes libva, creates VAConfig for H264Main + VLD
  3. vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, 16, 16, small[4], 4, NULL, 0) — probe-pattern small
  4. vaDestroySurfaces(dpy, small, 4) — dispose
  5. vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, 1920, 1080, big[4], 4, NULL, 0) — real resolution
  6. vaDestroySurfaces(dpy, big, 4), vaTerminate(dpy), close DRM

… should succeed without REQBUFS-EBUSY in driver stderr on the iter7 driver. Pre-iter5 / pre-iter6 it would fail.

Plan

  1. Write tests/cap_pool_probe_pattern.c. ~50 lines.
  2. Add a Meson build target that compiles it against installed libva (don't bundle it into the .so; build separately with pkg-config --libs libva libva-drm).
  3. Run on ohm under LIBVA_DRIVER_NAME=v4l2_request ... env vars.
  4. Capture stderr; grep for REQBUFS|EBUSY|Unable to request buffers|Unable to set format; non-zero match = FAIL.
  5. Optional extension: add a decode step to validate the produced surface contains a real frame. Defer to iter8 if needed; this iteration's harness is allocation-pattern-only.

What could go wrong

  • libva client API shape on PineTab2's libva 2.23.0 may differ slightly from the headers I'm coding against. Mitigation: read /usr/include/va/va.h from ohm to confirm the function signatures I use.
  • 16x16 probe surfaces may not be accepted by every driver (minimum size constraints). bbb's actual sizes are 1920x1080 and 864x486 — could use a more reasonable probe-pattern small (e.g. 128x128).
  • Test may pass trivially if the cap_pool init path doesn't actually run with surfaces_count=4 — need to verify the path is exercised. Diagnostic: temporarily add a request_log line in cap_pool_init confirming entry.

Effort

2-3 hours including writing the test, debugging the libva API calls, building it on ohm, running, validating. Plus another ~30 min to stitch into the campaign repo with a Meson build description.


Execution order

B → C → A:

  1. B first (smallest, additive — no impact on happy path). Land + test, then build iter7-final driver.
  2. C second (synthetic test, no driver change required, but needs iter7-final driver to test against).
  3. A last (verification — validates the iter7-final driver including B's changes). Result either confirms no msync needed (closes iter5 sonnet C3) or restores msync as a follow-up.

Stop point

Phase 2 closes here. Proceed autonomously to Phase 3 (baseline anchors) and Phase 4 (implementation).