Predecessor (iter6): primary user goal MET. Firefox + YouTube avc1 HW decode works on PineTab2. Remaining campaign work is polish, formal verification, and upstream-prep.

Candidates:
- A: msync removal pixel-correctness verification (carry from iter5 sonnet C3)
- B: Slot-leak error recovery: request_pool_force_release for REINIT/DQBUF mid-cycle failures (iter6 internal carry)
- C: Probe-pattern test harness for cap_pool race: formal anchor for iter5 sonnet C4 / iter6 candidate A organic exercise
- D: Bootlin / Mozilla upstreaming prep (carried iter3+4+5+6)
- E: Performance binding cell (carried six iterations)
- F: V4L2_MEMORY_DMABUF (high-risk architectural)

G (WiFi-IRQ frame drops) flagged out-of-campaign-scope. H/I (fourier-fresnel, panvk-bifrost) are separate top-level campaigns.

Recommended primary: A+B+C, which closes all three internal carry items in one iteration. Alternative: D alone for the upstream-prep iteration. Phase 1 lock requires operator input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 — iteration 7 substrate (libva-multiplanar campaign)
Opened 2026-05-06 immediately after iter6 close. iter6 closed GREEN with a single architectural fix (per-OUTPUT-slot request_fd binding via REINIT, fork commit a09c03c) that resolved the merged A∪I scope. Both candidate I (Firefox VIDIOC_QBUF EINVAL) and candidate A (cap_pool resolution-change race) closed in the same fix; the resolution-change race was organically exercised by YouTube's quality-renegotiation cap_pool_init events (4 events on a ~95s YT avc1 run, all clean).
Operator's primary goal — Firefox HW decode of YouTube avc1 on PineTab2 — is now MET end-to-end. Remaining campaign work is polish, formal verification, and upstream-prep.
Predecessor close-out summary (iteration 6 → iteration 7)
iter6 landed a single fork commit:
a09c03c: per-OUTPUT-slot request_fd binding via MEDIA_REQUEST_IOC_REINIT. Replaces iter4's 385dee1 close + media_request_alloc-per-frame model. Pool size 4 → 16. The slot owns the fd; the surface borrows it. iter4's case against REINIT was confirmed to be a DPB-payload confounder (since fixed in 74d8dd1).
iter6 dropped MPEG-2 from the carry list (CPU handles it fine on RK3568).
iter6 carried into iter7:
- msync pixel-correctness verification (carry from iter5 Phase 5 sonnet C3)
- Slot-leak error recovery (iter6 internal carry: request_pool_force_release for the rare case where REINIT or DQBUF fails mid-cycle)
- Probe-pattern test harness for cap_pool race (carry from iter5 sonnet C4; the race is organically exercised by YT, but a synthetic test would anchor the claim formally)
- WiFi-IRQ frame drops (out-of-campaign system concern; flagged but not on iter7 scope)
Iteration 7 candidate research questions
A. msync removal pixel-correctness verification (carried from iter5 Phase 5 sonnet C3)
Confirm that decoded frames are byte-identical (or visually correct and deterministic) after the iter5 sweep, which removed msync(MS_SYNC | MS_INVALIDATE) alongside the iter1 patch-0010 hex-dump diagnostic.
Plan: 100-frame vaapi-copy run on bbb_1080p30_h264.mp4, capturing a sha256 of each decoded YUV plane and comparing against (a) the iter1 baseline (msync present), (b) an FFmpeg software-decode reference, and (c) the iter5-end baseline.
Risk: low. If frames diverge, msync goes back in. If frames match, formally close iter5 sonnet C3.
Effort: 1-2 hours including writing the frame-hash harness.
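The per-plane hashing step of this plan can be sketched in Python. This is a minimal sketch, not the campaign's harness. It assumes tightly packed NV12 frames at a known size, with stride equal to width and a full-resolution Y plane followed by an interleaved half-resolution CbCr plane. The function names are hypothetical. Both inputs must use the same pixel format and layout for the hashes to be comparable at all.

```python
import hashlib
import sys


def nv12_plane_hashes(data: bytes, width: int, height: int) -> list[tuple[str, str]]:
    """Split raw NV12 bytes into frames and sha256 each plane separately.

    Assumes tightly packed frames (stride == width, no alignment padding):
    a width*height Y plane followed by a width*height/2 interleaved CbCr plane.
    """
    y_size = width * height
    frame_size = y_size + y_size // 2
    if len(data) % frame_size:
        raise ValueError("input is not a whole number of NV12 frames")
    hashes = []
    for off in range(0, len(data), frame_size):
        y_plane = data[off:off + y_size]
        uv_plane = data[off + y_size:off + frame_size]
        hashes.append((hashlib.sha256(y_plane).hexdigest(),
                       hashlib.sha256(uv_plane).hexdigest()))
    return hashes


def first_divergence(a: list, b: list):
    """Index of the first frame whose plane hashes differ, or None if identical."""
    for i, (ha, hb) in enumerate(zip(a, b)):
        if ha != hb:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one run produced fewer frames
    return None


if __name__ == "__main__":
    # usage: frame_hash.py candidate.yuv reference.yuv [width height]
    w, h = (int(sys.argv[3]), int(sys.argv[4])) if len(sys.argv) > 4 else (1920, 1080)
    with open(sys.argv[1], "rb") as f:
        cand = nv12_plane_hashes(f.read(), w, h)
    with open(sys.argv[2], "rb") as f:
        ref = nv12_plane_hashes(f.read(), w, h)
    idx = first_divergence(cand, ref)
    print("MATCH" if idx is None else f"DIVERGE at frame {idx}")
    sys.exit(0 if idx is None else 1)
```

A non-zero exit names the first divergent frame, which under this plan is the signal to restore msync. Hashing Y and CbCr separately distinguishes a luma-only mismatch from a chroma-only one. The height argument must match what the capture path actually writes: 1080 lines, or the 1088-line coded height if uncropped surfaces are dumped.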
B. Slot-leak error recovery (iter6 internal carry)
Add request_pool_force_release(pool, slot_index), which REINITs the slot's fd and clears its busy flag, callable from RequestSyncSurface error paths.
Currently, when REINIT or DQBUF fails mid-cycle, the slot stays busy=true until RequestTerminate. With pool=16 and rare errors this is bounded, but it is a slow leak that would eventually starve acquire under sustained-error scenarios.
Plan: add request_pool_force_release to request_pool.{c,h} and call it from the surface.c error paths. Verify with a fault-injection harness (force REINIT to return -EBUSY after N frames).
Risk: low — additive function, doesn't change happy path.
Effort: 1 hour code + smoke test.
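The slot lifecycle and the proposed force-release path can be modelled in Python before touching request_pool.c. This is a hypothetical model, not the fork's C code. RequestPool, FaultyOps and run_frames are illustrative names. Only request_pool_force_release's intent (REINIT, then clear busy) comes from the plan above. The ioctl numbers are transcribed from <linux/media.h>.

One design choice here is an assumption. The kernel rejects REINIT with EBUSY while a request is queued but not yet completed, so this sketch falls back to closing the fd and allocating a fresh one if the retried REINIT also fails.

```python
import errno
import fcntl
import os
import struct

# ioctl numbers transcribed from <linux/media.h>
MEDIA_IOC_REQUEST_ALLOC = 0x80047C05  # _IOR('|', 0x05, int)
MEDIA_REQUEST_IOC_REINIT = 0x7C81     # _IO('|', 0x81)


class LinuxMediaOps:
    """Real backend: request fds allocated from an open /dev/mediaN fd."""

    def __init__(self, media_fd: int):
        self.media_fd = media_fd

    def alloc(self) -> int:
        buf = bytearray(4)  # the kernel writes the new request fd here
        fcntl.ioctl(self.media_fd, MEDIA_IOC_REQUEST_ALLOC, buf, True)
        return struct.unpack("i", buf)[0]

    def reinit(self, req_fd: int) -> None:
        fcntl.ioctl(req_fd, MEDIA_REQUEST_IOC_REINIT)

    def close(self, req_fd: int) -> None:
        os.close(req_fd)


class RequestPool:
    """Hypothetical model of request_pool.{c,h}: each slot owns one request fd."""

    def __init__(self, ops, size: int = 16):
        self.ops = ops
        self.fds = [ops.alloc() for _ in range(size)]
        self.busy = [False] * size

    def acquire(self) -> int:
        for i, busy in enumerate(self.busy):
            if not busy:
                self.busy[i] = True
                return i
        raise OSError(errno.EAGAIN, "request pool exhausted")

    def release(self, i: int) -> None:
        # Happy path: recycle the slot's fd with REINIT. If REINIT raises,
        # the slot stays busy: the slow leak candidate B describes.
        self.ops.reinit(self.fds[i])
        self.busy[i] = False

    def force_release(self, i: int) -> None:
        # Error path: retry REINIT; if the kernel still refuses (e.g. EBUSY for a
        # request that is queued but not completed), replace the fd outright.
        try:
            self.ops.reinit(self.fds[i])
        except OSError:
            self.ops.close(self.fds[i])
            self.fds[i] = self.ops.alloc()
        self.busy[i] = False


class FaultyOps:
    """Fault injector: every REINIT after the first `ok_reinits` fails with EBUSY."""

    def __init__(self, ok_reinits: int):
        self.ok_reinits = ok_reinits
        self.reinit_calls = 0
        self.next_fd = 100

    def alloc(self) -> int:
        self.next_fd += 1
        return self.next_fd

    def reinit(self, req_fd: int) -> None:
        self.reinit_calls += 1
        if self.reinit_calls > self.ok_reinits:
            raise OSError(errno.EBUSY, "injected REINIT failure")

    def close(self, req_fd: int) -> None:
        pass


def run_frames(pool: RequestPool, n_frames: int, use_force_release: bool) -> int:
    """Simulate n_frames acquire/release cycles; return how many got a slot."""
    decoded = 0
    for _ in range(n_frames):
        try:
            slot = pool.acquire()
        except OSError:
            break  # pool starvation: every slot is stuck busy
        decoded += 1
        try:
            pool.release(slot)
        except OSError:
            if use_force_release:
                pool.force_release(slot)
    return decoded
```

Consider FaultyOps(10) with the default 16-slot pool. run_frames(..., 100, False) stops at 26 frames: 10 clean cycles, then one leaked slot per frame until all 16 are stuck. The same run with use_force_release=True completes all 100. The close-and-realloc fallback reintroduces iter4's per-frame model only on the error path, so the happy path keeps iter6's REINIT discipline.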
C. Probe-pattern test harness for cap_pool race (iter5 sonnet C4 / iter6 candidate A formal-anchor)
Write a synthetic test program that exercises the vaCreateSurfaces(small) → vaCreateSurfaces(big) resolution-change pattern that originally triggered REQBUFS-EBUSY. This confirms that iter6's REINIT discipline holds the cap_pool race closed in a deterministic-repro form, not just organically via YT.
Plan: 50-line C program using libva: open device, create context, create 4 surfaces at 128×128, vaPutSurface noop, destroy, create 4 surfaces at 1920×1080, decode bbb's first I-frame, sha256 the output. No EBUSY events expected in driver stderr.
Risk: low. Test harness only; doesn't change driver.
Effort: 2-3 hours including writing test, fixturing bbb's first I-frame as raw H.264 NAL.
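The pass/fail check for a harness run can be scripted separately from the libva program itself. The following is a minimal Python sketch under two assumptions. First, the driver's stderr is captured as text. Second, a REQBUFS failing with EBUSY shows up as a line containing "REQBUFS" together with "EBUSY" or strerror's "Device or resource busy". That log wording is a hypothetical stand-in, since the driver's actual request_log text is not recorded here. The reference sha256 is whatever the FFmpeg SW decode of the same I-frame produces.

```python
import hashlib


def is_reqbufs_ebusy(line: str) -> bool:
    """Hypothetical match for a REQBUFS-EBUSY log line; adjust to the real wording."""
    return "REQBUFS" in line and ("EBUSY" in line or "Device or resource busy" in line)


def count_reqbufs_ebusy(stderr_text: str) -> int:
    """Count driver stderr lines that look like REQBUFS failing with EBUSY."""
    return sum(1 for line in stderr_text.splitlines() if is_reqbufs_ebusy(line))


def harness_passes(stderr_text: str, frame: bytes, reference_sha256: str) -> bool:
    """Candidate C criterion: zero REQBUFS-EBUSY events and a matching frame hash."""
    return (count_reqbufs_ebusy(stderr_text) == 0
            and hashlib.sha256(frame).hexdigest() == reference_sha256.lower())
```

Keeping the criterion in a separate checker means the libva program stays a plain reproducer, and the same check can be rerun against stored stderr captures.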
D. Bootlin / Mozilla upstreaming prep (carried from iter3 candidate G + iter4 + iter5 + iter6)
File the firefox-fourier patch with Mozilla Bugzilla (bug 1833354 / 1965646 reference for V4L2 stateless analogue). Send the libva-v4l2-request fork's iter1-iter6 patches to bootlin's libva-v4l2-request maintainer (Paul Kocialkowski) as a coherent series.
Plan (Mozilla): Bugzilla account, write up "V4L2 stateless decoders blocked by RDD+Utility sandbox" with reproducer, attach combined 160-line patch.
Plan (bootlin): structure the iter1-iter6 commits as a clean patch series. Possibly squash some of iter5's instrumentation-removal commits with their original patch landings. Address upstream concerns about per-call diagnostics that survived as request_log lines.
Risk: socially-mediated. Maintainer may push back on architectural decisions; per feedback_no_upstream.md no PR/MR happens without explicit operator instruction.
Effort: bug filing 2-4 hours; patch-series prep 4-8 hours.
E. Performance binding cell (carried from iter1+2+3+4+5+6)
Anchor measured numbers for the four primary consumer paths: mpv-vaapi DMA-BUF, mpv-vaapi-copy, Firefox-fourier HW, SW baseline. Drop count, CPU%, frame timing, GPU freq, on bbb_1080p30. Reproducible from a documented script.
Plan: shell script that runs each consumer for 30s, captures pidstat -u -p <PID>, reads /sys/class/devfreq/fde60000.gpu/cur_freq, parses mpv stats / Firefox about:processes. Generate a markdown table. Carried six iterations now.
Risk: low. Measurement only.
Effort: 3-4 hours including script + run + table.
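The parsing and table-rendering half of that script can be sketched in Python under a few assumptions. pidstat's column set varies across sysstat versions: newer releases add %wait, and some locales print an AM/PM time field. For that reason this sketch locates the %CPU column from the header, counting from the right. devfreq's cur_freq is reported in Hz. The table covers only a subset of the planned columns, because drop counts and frame timing would come from mpv stats or about:processes and are not parsed here. All function names are hypothetical.

```python
DEVFREQ_CUR = "/sys/class/devfreq/fde60000.gpu/cur_freq"


def pidstat_average_cpu(text: str) -> float:
    """Return %CPU from the 'Average:' row of `pidstat -u -p <PID> 1 N` output.

    The %CPU column is located from the header, counted from the right, so the
    parse tolerates an optional AM/PM time field and the newer %wait column.
    """
    offset = None
    for line in text.splitlines():
        tokens = line.split()
        if "%CPU" in tokens:
            offset = len(tokens) - tokens.index("%CPU")
        elif tokens and tokens[0] == "Average:" and offset is not None:
            return float(tokens[-offset])
    raise ValueError("no pidstat Average: row found")


def devfreq_mhz(cur_freq_text: str) -> float:
    """devfreq cur_freq is reported in Hz; convert to MHz."""
    return int(cur_freq_text.strip()) / 1_000_000


def read_gpu_mhz(path: str = DEVFREQ_CUR) -> float:
    with open(path) as f:
        return devfreq_mhz(f.read())


def render_table(rows: list[dict]) -> str:
    """Render measured rows as a markdown table for the campaign notes."""
    lines = ["| Consumer | Drops | CPU % | GPU MHz |", "|---|---:|---:|---:|"]
    for r in rows:
        lines.append(f"| {r['consumer']} | {r['drops']} | {r['cpu']:.1f} | {r['gpu_mhz']:.0f} |")
    return "\n".join(lines)
```

A single cur_freq read is only a snapshot. Sampling it once per second alongside pidstat and averaging gives a figure comparable to the pidstat average.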
F. V4L2_MEMORY_DMABUF (carried from iter2+3+4+5+6)
Replace V4L2_MEMORY_MMAP with userspace dma-buf allocation for the OUTPUT pool. Architectural fix for the iter2 cap_pool race (statistical / LRU mitigation now superseded by iter6's REINIT discipline, but DMABUF is structurally cleaner).
Risk: highest unknown of any candidate. Possibly requires kernel work. Hantro on this kernel may not accept DMABUF on OUTPUT.
Effort: 1-3 days, possibly more.
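Part of F's risk ("Hantro on this kernel may not accept DMABUF on OUTPUT") can be probed cheaply before committing days to it. V4L2 drivers report per-queue memory-model support in the capabilities field that VIDIOC_REQBUFS fills in. On vb2-based drivers, a count=0 call requests no buffers and still returns that field.

The following is a minimal Python sketch. It assumes the decoder's video node path (the /dev/video1 default is hypothetical) and a freshly opened fd with no buffers allocated. The constants are transcribed from <linux/videodev2.h>, and struct v4l2_requestbuffers is laid out as 20 bytes. The actual result on kernel 6.19.10 is not known here.

```python
import fcntl
import os
import struct
import sys

# Constants transcribed from <linux/videodev2.h>
VIDIOC_REQBUFS = 0xC0145608  # _IOWR('V', 8, struct v4l2_requestbuffers)
V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE = 10
V4L2_MEMORY_MMAP = 1
BUF_CAPS = {  # V4L2_BUF_CAP_SUPPORTS_* bits
    "MMAP": 1 << 0,
    "USERPTR": 1 << 1,
    "DMABUF": 1 << 2,
    "REQUESTS": 1 << 3,
}
# struct v4l2_requestbuffers: count, type, memory, capabilities (u32 each),
# then flags (u8) plus 3 reserved bytes: 20 bytes in total.
REQBUFS_FMT = "=IIIIB3x"


def decode_caps(capabilities: int) -> set[str]:
    """Names of the V4L2_BUF_CAP_SUPPORTS_* bits set in a capabilities word."""
    return {name for name, bit in BUF_CAPS.items() if capabilities & bit}


def probe_output_caps(video_fd: int) -> int:
    """Issue REQBUFS(count=0, MMAP) on the OUTPUT_MPLANE queue; return capabilities.

    count=0 requests no buffers; call this before any buffers exist on the fd.
    """
    buf = bytearray(struct.pack(REQBUFS_FMT, 0, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
                                V4L2_MEMORY_MMAP, 0, 0))
    fcntl.ioctl(video_fd, VIDIOC_REQBUFS, buf, True)
    return struct.unpack(REQBUFS_FMT, buf)[3]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/dev/video1"  # hypothetical node
    fd = os.open(path, os.O_RDWR)
    try:
        caps = probe_output_caps(fd)
    finally:
        os.close(fd)
    if caps == 0:
        print(path, "does not report buffer capabilities")
    else:
        print(path, "OUTPUT caps:", sorted(decode_caps(caps)))
```

The probe uses MMAP rather than DMABUF as the memory type, because MMAP is always valid and the capabilities word reports DMABUF support regardless. A DMABUF bit on OUTPUT would narrow F's remaining risk to the userspace allocation side. Its absence would point straight at kernel work.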
G. WiFi-IRQ frame drops (NEW from iter6, out-of-campaign-scope flagged)
ohm shows visible frame drops during YouTube playback whenever the brcm/iwlwifi driver does heavy IRQ work. Decode pipeline is fine; presentation schedule slips.
Stance: out of campaign scope (system-level concern, not a libva-multiplanar bug). Listed here because operator surfaced it during iter6 verification. If desired, separate investigation into IRQ affinity, GRO offload, network-stack buffer settings.
H. fourier-fresnel campaign (carried from iter5 followon-campaigns memory)
Open the fourier-fresnel campaign: port libva-v4l2-request-fourier from ohm (RK3568) to fresnel (RK3399, Pinebook Pro). Validates generality of iter1-iter6 fixes on a second hardware target.
Stance: separate top-level campaign, not an iter7 candidate. Charter at ~/src/fourier-fresnel/ once opened. iter5 memory entry project_followon_campaigns.md records sequencing: fourier-fresnel before panvk-bifrost.
I. panvk-bifrost campaign (carried from iter5 followon-campaigns memory)
A charter at ~/src/panvk-bifrost/ already exists as document-only. Sequenced after fourier-fresnel.
Stance: separate top-level campaign.
Recommended pairings
- A + B (msync verify + slot-leak fix) — both small, both formally close iter6's carry items. Tightest scope.
- A + C (msync verify + cap_pool race harness) — formally closes the two iter5 sonnet caveats. Mid scope.
- A + B + C — closes all three internal carry items in one iteration. Reasonable upper bound.
- D alone — gated on operator instruction. Big effort but the campaign's culmination.
- E alone — anchors campaign-wide claims to numbers. Carried six iterations now.
- F alone — high-risk architectural.
Recommended primary: A + B + C — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants the upstream-filing iteration first or the perf-anchor iteration first.
Alternative: D alone if operator wants to make iter7 the upstream-prep / external-filing iteration. Carries political weight (Mozilla + bootlin) but minimal new code.
State that carries (re-verified iter6 close)
- Hardware: ohm (RK3568, hantro G1/G2), kernel 6.19.10. Access: ohm (LAN); VPN currently flaky.
- Userspace: firefox 150.0.1-1.1 (iter5 amendment), libva 2.23.0, mesa 26.0.5, libdrm 2.4.131, mpv 0.41.0-3.
- Driver installed: /usr/lib/dri/v4l2_request_drv_video.so, sha256 ebe396d55104dbfedfa1065232d7f1959c519b4afe6fe33f46c1b9af13465ed6 (iter6-end, REINIT discipline + pool=16).
- Test fixture: bbb_1080p30_h264.mp4, sha256 dcf8a7170fbd...
- Build container: firefox-fourier LXD on boltzmann, persistent.
- Diagnostic instrument: /home/mfritsche/iter6-fork-dx/ on ohm. This is a full fork tree with ITER6_DX log instrumentation and can be reactivated for iter7 candidate B fault-injection or candidate C harness wiring.
State that does NOT carry
- iter6 telemetry logs are tmpfs-volatile.
- The original (iter5-end, pre-iter6) driver binary is backed up at /home/mfritsche/v4l2_request_drv_video.so.iter5end.bak. The iter6-DX build with diagnostics is at /home/mfritsche/iter6-fork-dx/build/src/v4l2_request_drv_video.so if needed for re-instrumentation.
Tooling and measurement-instrument inventory
Same as iter6. Plus for new candidates:
- For A (msync verify): vaapi-copy already produces NV12 output to mpv's vo=null path. Add an mpv --frames=N --o=output.yuv script to capture raw YUV; sha256 each frame. FFmpeg's ffmpeg -i bbb -vframes N output_%d.yuv provides the SW reference. Add -pix_fmt nv12 so it writes NV12 rather than the decoder's native yuv420p (I420) layout; otherwise the chroma planes cannot hash-match the vaapi-copy output.
- For B (slot-leak): synthesize a fault by adding a debug toggle in surface.c that returns -EBUSY from REINIT after N frames; observe pool starvation, then add force_release and verify recovery.
- For C (cap_pool harness): write a 50-line libva test program, reusing the driver_data inspection from the iter5 Track E test pattern.
- For D (upstream): no new tooling; the existing patch series live in firefox-fourier/ and in the fork's iter1-iter6 git log.
- For E (perf): pidstat -u, /sys/class/devfreq/fde60000.gpu, the mpv stats overlay, Firefox about:processes.
- For F (DMABUF): kernel docs Documentation/userspace-api/media/v4l/buffer.rst; hantro driver source under drivers/media/platform/verisilicon/ (moved out of drivers/staging/media/hantro/ in Linux 6.1).
In-scope (LOCKING DEFERRED — Phase 1 user input)
To be locked at Phase 1 from candidates A..F above. G is out-of-campaign-scope. H, I are separate top-level campaign decisions, not iter7 candidates.
Out-of-scope (LOCKED 2026-05-06 for iteration 7)
- iter6-completed work (per-slot REINIT discipline) — done.
- iter5-completed work (Track A sweep, Track G PGO rebuild, Track E multi-context, Track B mpv libplacebo) — done.
- iter5 amendment (Utility seccomp / Firefox-fourier sandbox) — done.
- iter4-completed work (Track A frame-11 EINVAL fix) — done.
- iter3-completed work (Track F sandbox patch) — done in firefox-fourier; upstream filing is iter7 candidate D.
- New codecs OUTSIDE H.264 (VP8/VP9/AV1/HEVC out per iter1 lock; MPEG-2 dropped at iter6 close).
- New target hardware (fresnel, ampere) — separate campaigns (H, I above).
- WiFi-IRQ frame drops — system-level, not libva-multiplanar.
Phase 1 success criterion (will lock after user picks candidate)
Pre-lock template:
- For candidate A: "100-frame vaapi-copy produces frame hashes matching either FFmpeg SW baseline (preferred) or iter1 baseline (if msync-removal causes any divergence). If divergence, msync restored and verified."
- For candidate B: "Synthetic fault-injection (REINIT returns -EBUSY after N frames) demonstrates pool starvation pre-fix; post-fix demonstrates request_pool_force_release reclaims the slot and decode resumes."
- For candidate C: "Synthetic test program issues vaCreateSurfaces(small) then vaCreateSurfaces(big) then decodes bbb's first I-frame; driver stderr has zero REQBUFS-EBUSY events; output frame sha matches FFmpeg SW reference for that I-frame."
- For candidate D: "Mozilla Bugzilla bug filed with combined 160-line patch attached, references bug 1833354/1965646. Bootlin patch series prepared as a clean iter1-iter6 sequence on a separate branch, ready to send (no PR until operator OK)."
- For candidate E: "Anchored perf table for {mpv vaapi DMA-BUF, mpv vaapi-copy, Firefox-fourier HW, SW baseline} across drop count + CPU% + frame timing + GPU freq on bbb_1080p30. Reproducible from documented script."
- For candidate F: "vaapi-copy + vaapi --vo=null still produce real frames with V4L2_MEMORY_DMABUF-backed OUTPUT buffers; race window architecturally closed."
Stop point
Phase 1 lock requires user input — pick from A..F (and any pairing).
Recommended primary: A + B + C — closes all three internal carry items, leaves D/E/F for iter8+ depending on whether operator wants upstream-filing or perf-anchor next.
Alternative leans:
- D alone if operator wants the upstream-prep iteration now
- E alone if perf measurement matters more than carryover-closure
- F alone if architectural cleanliness drives the next iteration
After lock, iter7 phases 2..8 proceed autonomously per "Stop only if user is needed."