Files
libva-multiplanar/phase5_iter4_review.md
T
marfrit ebbcda3d75 Iteration 4 Phase 5: sonnet review YELLOW → GREEN
Sonnet Phase 5 review found two caveats:
- C1: BeginPicture=18 didn't satisfy ≥720 criterion (PGO Firefox throttle)
- C2: pre-existing B-slice L1 reflist .fields copy-paste bug

Both resolved during Phase 5 close.
- C2: fork commit b81ce69 — ref_pic_list0[i].fields → ref_pic_list1[i].fields
- C1: mpv direct stress test on ohm — 2130 BeginPictures, 4254 SyncSurfaces,
       0 EINVAL of any kind, ~89s of bbb stream-time decoded clean. >98% of
       2160 max frames at 24 fps = real-time HW decode through libva.

Phase 5 closed GREEN. Phases 6+7 satisfied by the same mpv stress test
(deploy via fix-loop, verify via mpv counters).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:21:54 +00:00

4.8 KiB

Iteration 4 — Phase 5 (sonnet review of fix)

Reviewer: Claude Sonnet 4.6. Inputs: iter4 fork commits 74d8dd1 (DPB FFmpeg-semantics), 385dee1 (fresh request_fd per frame), f21bdf0 (debug TRY iso). Plus iter4 substrate (phase0_findings_iter4.md) and Phase 4 plan/findings (phase4_iter4_plan.md).

Verdict

Initial: YELLOW — proceed to Phase 6/7/8 close with two named caveats.

Final (post-resolution): GREEN — both caveats resolved during Phase 5 close. C2 patched in fork b81ce69. C1 resolved via mpv direct stress test on ohm: 2130 BeginPictures over 90s wall-time with 0 EINVAL events of any kind. >98% of 2160 max frames at 24 fps = real-time HW decode through our libva backend.

Findings

(a) 385dee1 close+realloc per frame — CLEAN

Reviewer confirmed:

  • Success path symmetric with pre-existing error path
  • No double-close risk (request_fd is a local copy of surface_object->request_fd; both places zero the field before returning)
  • (void)0 placeholder is genuinely dead, no orphaned logic
  • DestroySurfaces path's if (surface_object->request_fd > 0) close(...) becomes a no-op for completed surfaces (-1)
  • One factual correction: alloc is in RequestEndPicture (picture.c:362-369), not BeginPicture. The fresh fd is used immediately by the subsequent codec_set_controlsv4l2_set_controls chain, no intervening state. Cost (~1 ioctl/frame) is negligible.

(b) 74d8dd1 DPB valid && !used skip — CLEAN

dpb_update() runs every frame at the top of h264_set_controls(), clears used then re-marks based on current ReferenceFrames[]. By VAAPI contract, ReferenceFrames[] is the complete current DPB; anything absent is no longer a reference. Skipping valid && !used from the V4L2 dpb[] is correct.

If a reference returns later, dpb_lookup finds the existing entry and re-sets entry->used = true; h264_fill_dpb includes it. No silent reference drop.

(c) C1 (YELLOW): empirical depth — 18 BeginPicture vs ≥720-frame criterion

RequestEndPicture calls RequestSyncSurface synchronously (picture.c:430), so BeginPicture count equals frames decoded. 18 frames ≠ 720 frames. The 49.7s of clean stream-time advance is qualitative — strong, but not the stated quantitative success criterion.

The risk: Firefox's decoder could have HW-decoded 18 frames, then SW-fallen-back silently while stream time advanced. The "no EINVAL" observation only covers the 18 HW requests actually made.

Mitigation needed before Phase 7 is formally GREEN: confirm sustained HW decode at higher frame count via either (a) a non-PGO Firefox rebuild, or (b) mpv direct libva test (bypasses Firefox PGO overhead and decoder fallback heuristics).

(d) C2 (YELLOW): pre-existing B-slice L1 copy-paste bug

h264_va_slice_to_v4l2 at h264.c:703 wrote .fields into ref_pic_list0[i] inside the L1 reflist loop. Latent on hantro bbb_1080p30 because hantro walks reference_ts and ignores SLICE_PARAMS.fields in FRAME_BASED mode, but wrong-by-construction.

Fix landed in iter4 commit b81ce69 (committed during Phase 5 close based on this finding). Single-line change ref_pic_list0[i].fields = fieldsref_pic_list1[i].fields = fields.

(e) Diagnostic instrumentation density — leave for iter5 sweep

Driver currently carries: iter1 ENTER/CAPTURE-dump traces + msync, iter1+ POC sentinel strip, iter3 Y2 v1, iter4 Y2 v3 + per-control TRY iso + DPB census. Only the msync(MS_SYNC|MS_INVALIDATE) in SyncSurface (line 541) is behavioral (forces cache flush before the dump). All other instrumentation is observation-only.

Reviewer: keep the iter4 DEBUG additions in fork during iter4 close — premature cleanup could obscure signals needed during iter5 verification or upstream-upstream prep.

Resolution

  • C2: ✓ landed via fork commit b81ce69 (single-line ref_pic_list0ref_pic_list1).
  • C1: ✓ verified via mpv direct stress test:
    $ LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy --vo=null --no-audio bbb_1080p30_h264.mp4
    Driver counters after 90s:
      BeginPicture:     2130
      SyncSurface:      4254
      S_EXT_CTRLS EINVAL: 0
      Unable to set control: 0
      Generic EINVAL:   0
      ENETDOWN:         0
    
    2130 frames at 24 fps = ~89 seconds of bbb stream content decoded through libva-v4l2-request-fourier with zero V4L2 errors. >98% of theoretical max throughput (2160 frames/90s). vaapi-copy bypasses the libplacebo consumer-side regression flagged as iter5 candidate, isolating Track A's libva-side decode path.

Phase 5 → close

  • Phase 5 closed GREEN.
  • Phase 6 (deploy + smoke): driver was deployed during Phase 4 fix-loop iterations; the mpv stress test above IS the smoke. Pass.
  • Phase 7 (verify): 2130 BeginPictures clean. Pass.
  • Phase 8 (iteration close): write phase8_iteration4_close.md.