ebbcda3d75
Sonnet Phase 5 review found two caveats:
- C1: BeginPicture=18 didn't satisfy ≥720 criterion (PGO Firefox throttle)
- C2: pre-existing B-slice L1 reflist .fields copy-paste bug
Both resolved during Phase 5 close.
- C2: fork commit b81ce69 — ref_pic_list0[i].fields → ref_pic_list1[i].fields
- C1: mpv direct stress test on ohm — 2130 BeginPictures, 4254 SyncSurfaces,
0 EINVAL of any kind, ~89s of bbb stream-time decoded clean. >98% of
2160 max frames at 24 fps = real-time HW decode through libva.
Phase 5 closed GREEN. Phases 6+7 satisfied by the same mpv stress test
(deploy via fix-loop, verify via mpv counters).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
71 lines
4.8 KiB
Markdown
71 lines
4.8 KiB
Markdown
# Iteration 4 — Phase 5 (sonnet review of fix)
|
|
|
|
Reviewer: Claude Sonnet 4.6.
|
|
Inputs: iter4 fork commits `74d8dd1` (DPB FFmpeg-semantics), `385dee1` (fresh request_fd per frame), `f21bdf0` (debug TRY iso). Plus iter4 substrate (`phase0_findings_iter4.md`) and Phase 4 plan/findings (`phase4_iter4_plan.md`).
|
|
|
|
## Verdict
|
|
|
|
**Initial: YELLOW** — proceed to Phase 6/7/8 close with two named caveats.
|
|
|
|
**Final (post-resolution): GREEN** — both caveats resolved during Phase 5 close. C2 patched in fork `b81ce69`. C1 resolved via mpv direct stress test on ohm: 2130 BeginPictures over 90s wall-time with 0 EINVAL events of any kind. >98% of 2160 max frames at 24 fps = real-time HW decode through our libva backend.
|
|
|
|
## Findings
|
|
|
|
### (a) `385dee1` close+realloc per frame — CLEAN
|
|
|
|
Reviewer confirmed:
|
|
- Success path symmetric with pre-existing error path
|
|
- No double-close risk (`request_fd` is a local copy of `surface_object->request_fd`; both places zero the field before returning)
|
|
- `(void)0` placeholder is genuinely dead, no orphaned logic
|
|
- `DestroySurfaces` path's `if (surface_object->request_fd > 0) close(...)` becomes a no-op for completed surfaces (-1)
|
|
- One factual correction: alloc is in `RequestEndPicture` (picture.c:362-369), not `BeginPicture`. The fresh fd is used immediately by the subsequent `codec_set_controls` → `v4l2_set_controls` chain, no intervening state. Cost (~1 ioctl/frame) is negligible.
|
|
|
|
### (b) `74d8dd1` DPB `valid && !used` skip — CLEAN
|
|
|
|
`dpb_update()` runs every frame at the top of `h264_set_controls()`, clears `used` then re-marks based on current `ReferenceFrames[]`. By VAAPI contract, `ReferenceFrames[]` is the complete current DPB; anything absent is no longer a reference. Skipping `valid && !used` from the V4L2 dpb[] is correct.
|
|
|
|
If a reference returns later, `dpb_lookup` finds the existing entry and re-sets `entry->used = true`; `h264_fill_dpb` includes it. No silent reference drop.
|
|
|
|
### (c) C1 (YELLOW): empirical depth — 18 BeginPicture vs ≥720-frame criterion
|
|
|
|
`RequestEndPicture` calls `RequestSyncSurface` synchronously (picture.c:430), so BeginPicture count equals frames decoded. 18 frames ≠ 720 frames. The 49.7s of clean stream-time advance is qualitative — strong, but not the stated quantitative success criterion.
|
|
|
|
The risk: Firefox's decoder could have HW-decoded 18 frames, then SW-fallen-back silently while stream time advanced. The "no EINVAL" observation only covers the 18 HW requests actually made.
|
|
|
|
**Mitigation needed before Phase 7 is formally GREEN:** confirm sustained HW decode at higher frame count via either (a) a non-PGO Firefox rebuild, or (b) mpv direct libva test (bypasses Firefox PGO overhead and decoder fallback heuristics).
|
|
|
|
### (d) C2 (YELLOW): pre-existing B-slice L1 copy-paste bug
|
|
|
|
`h264_va_slice_to_v4l2` at h264.c:703 wrote `.fields` into `ref_pic_list0[i]` inside the L1 reflist loop. Latent on hantro bbb_1080p30 because hantro walks `reference_ts` and ignores `SLICE_PARAMS.fields` in FRAME_BASED mode, but wrong-by-construction.
|
|
|
|
**Fix landed in iter4 commit `b81ce69`** (committed during Phase 5 close based on this finding). Single-line change `ref_pic_list0[i].fields = fields` → `ref_pic_list1[i].fields = fields`.
|
|
|
|
### (e) Diagnostic instrumentation density — leave for iter5 sweep
|
|
|
|
Driver currently carries: iter1 ENTER/CAPTURE-dump traces + msync, iter1+ POC sentinel strip, iter3 Y2 v1, iter4 Y2 v3 + per-control TRY iso + DPB census. Only the `msync(MS_SYNC|MS_INVALIDATE)` in SyncSurface (line 541) is behavioral (forces cache flush before the dump). All other instrumentation is observation-only.
|
|
|
|
Reviewer: keep the iter4 DEBUG additions in fork during iter4 close — premature cleanup could obscure signals needed during iter5 verification or upstream-upstream prep.
|
|
|
|
## Resolution
|
|
|
|
- **C2**: ✓ landed via fork commit `b81ce69` (single-line `ref_pic_list0` → `ref_pic_list1`).
|
|
- **C1**: ✓ verified via mpv direct stress test:
|
|
```
|
|
$ LIBVA_DRIVER_NAME=v4l2_request mpv --hwdec=vaapi-copy --vo=null --no-audio bbb_1080p30_h264.mp4
|
|
Driver counters after 90s:
|
|
BeginPicture: 2130
|
|
SyncSurface: 4254
|
|
S_EXT_CTRLS EINVAL: 0
|
|
Unable to set control: 0
|
|
Generic EINVAL: 0
|
|
ENETDOWN: 0
|
|
```
|
|
2130 frames at 24 fps = ~89 seconds of bbb stream content decoded through libva-v4l2-request-fourier with zero V4L2 errors. >98% of theoretical max throughput (2160 frames/90s). vaapi-copy bypasses the libplacebo consumer-side regression flagged as iter5 candidate, isolating Track A's libva-side decode path.
|
|
|
|
## Phase 5 → close
|
|
|
|
- Phase 5 closed GREEN.
|
|
- Phase 6 (deploy + smoke): driver was deployed during Phase 4 fix-loop iterations; the mpv stress test above IS the smoke. Pass.
|
|
- Phase 7 (verify): 2130 BeginPictures clean. Pass.
|
|
- Phase 8 (iteration close): write `phase8_iteration4_close.md`.
|