From 5abea730a0212a42a79fd04f4ffb12a23fb33ad5 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Tue, 12 May 2026 12:57:54 +0000 Subject: [PATCH] =?UTF-8?q?iter5b=20Phase=204=20v2:=20re-plan=20with=20opt?= =?UTF-8?q?ion=20=CE=B2=20=E2=80=94=20CreateContext-centric=20OUTPUT=20lif?= =?UTF-8?q?ecycle?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Supersedes phase4_iter5b_plan.md (the α' plan rejected at Phase 7). β architecture: strip OUTPUT-side V4L2 device state from RequestCreateSurfaces2 entirely; move it to RequestCreateContext where config_id (and therefore the bound profile) is unambiguously known. CreateSurfaces2 becomes ID-allocation + per-surface bookkeeping only. 9 contract clauses (C1..C9). Reuses 2 of 3 reverted iter5b commits (codec.h/codec.c helper; object_config->pixelformat wire-up at CreateConfig). New work: C3 strip surface.c, C4 build out context.c — predicted ~120 LOC into context.c, ~190 LOC stripped from surface.c (net ~70 LOC delta). Risk register: 7 items; highest is multi-context resolution change within shared driver_data (medium impact, mitigated by existing DestroyContext teardown). α''s destructive teardown failure mode disappears because β has no in-CreateSurfaces2 teardown branch. Phase 5 review focus: error-recovery branches in CreateContext, per-surface destination_* fill semantics (format-uniform fields at CreateContext vs per-slot fields at BeginPicture), ohm backwards-compat verification. Co-Authored-By: Claude Opus 4.7 (1M context) --- phase4_iter5b_plan_v2.md | 267 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 267 insertions(+) create mode 100644 phase4_iter5b_plan_v2.md diff --git a/phase4_iter5b_plan_v2.md b/phase4_iter5b_plan_v2.md new file mode 100644 index 0000000..b2e929c --- /dev/null +++ b/phase4_iter5b_plan_v2.md @@ -0,0 +1,267 @@ +# Iteration 5b — Phase 4 v2 (plan, option β) + +Captured 2026-05-12 after Phase 7 → Phase 4 loopback on the failed α' attempt. Re-plan with **option β: defer all OUTPUT-side V4L2 device state from `RequestCreateSurfaces2` to `RequestCreateContext`** — the lifecycle redesign the `surface.c:164-171` TODO comment anticipated and the Phase 5 reviewer-of-Phase-2 originally recommended. + +## Why β, this time + +Phase 7 empirically demonstrated α' fails because ffmpeg-vaapi calls `vaCreateSurfaces2` multiple times per session, with `vaCreateContext + STREAMON + frame decode` interleaved. α''s pixfmt-change-triggers-teardown gate corrupts state when called mid-stream. β eliminates the problem at the architectural level: + +- **`CreateSurfaces2` no longer touches V4L2 device state.** No S_FMT, no CREATE_BUFS, no cap_pool. It allocates surface_object IDs and stores width/height. That's it. +- **`CreateContext` owns the full lifecycle.** It looks up `config_object->profile` (already does at context.c:71), derives the right pixelformat via `pixelformat_for_profile()`, does `S_FMT(OUTPUT, …)`, queries CAPTURE format, runs `cap_pool_init`, walks the bound `surface_ids[]` to fill per-surface `destination_*`. Then proceeds to existing `request_pool_init` + STREAMON + device-wide controls. +- **`DestroyContext` already handles teardown correctly.** STREAMOFF both queues, REQBUFS(0), cap_pool_destroy at context.c:276-307. No change needed. + +The α' gate's destructive failure mode disappears because there's no gate — each `CreateContext` is a fresh setup, and the corresponding `DestroyContext` cleans up. mpv probe-then-real pattern: probe context creates with H264_SLICE (or whatever the probe profile), then DestroyContext, then real CreateContext with the real profile — STREAMOFF + REQBUFS(0) happens in DestroyContext between them. + +## Reusable from failed iter5b + +Of the three reverted commits, two are reusable without modification for β: + +| iter5b commit (reverted) | Reusable for β? | Notes | +|---|---|---| +| `ce304ef` Commit A — `codec.h/codec.c` helper | **YES, identical** | `pixelformat_for_profile()` is profile→FOURCC map. Used at CreateConfig (commit B). | +| `f8256e6` Commit B — `last_output_pixelformat` field + `object_config->pixelformat` wire-up | **PARTIAL** | `object_config->pixelformat` wire-up at CreateConfig is reusable; the `last_output_pixelformat` field in driver_data is unnecessary in β (gate semantics move to per-CreateContext). | +| `4b2288f` Commit C — `find_sole_active_pixelformat` + surface.c gate change | **NO** | Wrong approach. β has no heap walk (CreateContext has config_id directly) and no surface.c gate (lifecycle is CreateContext-centric). | + +## Contract clauses + +### C1 — `pixelformat_for_profile()` helper (re-introduce from reverted commit A) + +**New files**: `src/codec.h`, `src/codec.c` (identical to reverted commit A). + +Helper: +```c +unsigned int pixelformat_for_profile(VAProfile profile); +``` + +Maps `VAProfile*` → `V4L2_PIX_FMT_*_SLICE/FRAME`. Returns 0 for unhandled. + +Register in `meson.build`: add `'codec.c'` to sources, `'codec.h'` to headers. + +### C2 — Wire `object_config->pixelformat` at CreateConfig (re-introduce from reverted commit B) + +**Edit**: `src/config.c::RequestCreateConfig` at line ~98. + +```diff ++ #include "codec.h" +... + config_object->profile = profile; + config_object->entrypoint = entrypoint; ++ config_object->pixelformat = pixelformat_for_profile(profile); +``` + +Populates the previously-dead `object_config->pixelformat` field (config.h:46) with the right FOURCC for each profile. CreateContext reads this field. + +### C3 — Strip OUTPUT-side lifecycle from `RequestCreateSurfaces2` + +**Edit**: `src/surface.c::RequestCreateSurfaces2`. + +Remove: +- Lines 173-232 (OUTPUT format S_FMT + teardown branch + last_output_*). +- Lines 237-274 (`driver_data->video_format` probe — moves to CreateContext). +- Lines 276-280 (G_FMT CAPTURE — moves to CreateContext). +- Lines 282-326 (destination_* computation — moves to CreateContext). +- Lines 305-313 (cap_pool_init — moves to CreateContext). +- Inner per-surface destination_* fill at lines 340-359 (deferred to CreateContext walk). + +Keep: +- Line 135-153: function entry, declarations. +- Line 234-235: `VA_RT_FORMAT_YUV420` validation. +- Lines 328-377: per-surface object_heap_allocate + minimal field init. Specifically: `width`, `height`, `status = VASurfaceReady`, `current_slot = NULL`, etc. Skip the destination_* fill (do it at CreateContext). + +Result: `RequestCreateSurfaces2` becomes ~60 LOC (was ~250 LOC). Pure ID allocation + per-surface bookkeeping. + +### C4 — Move OUTPUT-side lifecycle into `RequestCreateContext` + +**Edit**: `src/context.c::RequestCreateContext`. + +Current flow (post-revert): + +```c +config_object = CONFIG(driver_data, config_id); // line 71 +... +request_pool_init(&driver_data->output_pool, ...); // line 103 +... +v4l2_set_controls(... H264_DECODE_MODE/START_CODE ...); // line 142+ +v4l2_set_controls(... HEVC_DECODE_MODE/START_CODE ...); // line 164+ +switch (config_object->profile) { // line 205 + case VAProfileH264*/HEVCMain: h264_start_code = true; + default: h264_start_code = false; +} +v4l2_set_stream(... output_type, true); // line 219 STREAMON +v4l2_set_stream(... capture_type, true); // line 225 STREAMON +``` + +New flow: + +```c +config_object = CONFIG(driver_data, config_id); + +// β: derive OUTPUT pixelformat from the bound config's profile. +unsigned int pixelformat = config_object->pixelformat; +if (pixelformat == 0) { + // Defensive: shouldn't happen because CreateConfig rejects unhandled profiles, + // but Phase 5 review insists on never deref-ing 0 from helper return. + status = VA_STATUS_ERROR_UNSUPPORTED_PROFILE; + goto error; +} + +unsigned int output_type_pre = v4l2_type_video_output(true); // assume mplane; CAPTURE probe sets true/false below +rc = v4l2_set_format(driver_data->video_fd, output_type_pre, pixelformat, + picture_width, picture_height); +if (rc < 0) { status = VA_STATUS_ERROR_OPERATION_FAILED; goto error; } + +// β: probe CAPTURE format (moved from surface.c:237-274). +if (driver_data->video_format == NULL) { + /* SUNXI_TILED_NV12 / NV12 probe identical to surface.c logic */ + ... + driver_data->video_format = ...; +} +video_format = driver_data->video_format; +output_type = v4l2_type_video_output(video_format->v4l2_mplane); +capture_type = v4l2_type_video_capture(video_format->v4l2_mplane); + +// β: G_FMT CAPTURE to get sizes (moved from surface.c:276-280). +rc = v4l2_get_format(driver_data->video_fd, capture_type, + &format_width, &format_height, + destination_bytesperlines, destination_sizes, NULL); +if (rc < 0) { ... } + +// β: cap_pool_init (moved from surface.c:305-313). +if (!driver_data->capture_pool.initialized) { + unsigned int pool_count = surfaces_count > MIN_CAP_POOL ? + surfaces_count : MIN_CAP_POOL; + rc = cap_pool_init(&driver_data->capture_pool, + driver_data->video_fd, capture_type, + pool_count, video_format->v4l2_buffers_count); + if (rc < 0) { ... } +} + +// β: walk surface_ids and fill destination_* (moved from surface.c). +unsigned int destination_planes_count = video_format->planes_count; +if (video_format->v4l2_buffers_count == 1) { + destination_sizes[0] = destination_bytesperlines[0] * format_height; + for (j = 1; j < destination_planes_count; j++) + destination_sizes[j] = destination_sizes[0] / 2; +} +for (i = 0; i < surfaces_count; i++) { + struct object_surface *surface_object = SURFACE(driver_data, surfaces_ids[i]); + if (surface_object == NULL) { ... } + surface_object->destination_planes_count = destination_planes_count; + surface_object->destination_buffers_count = video_format->v4l2_buffers_count; + /* Fill destination_offsets[], destination_sizes[], destination_bytesperlines[] + per video_format->v4l2_buffers_count branch (1 vs n). Same logic as surface.c + lines 340-359. */ + ... +} + +// existing flow continues: +request_pool_init(...); +v4l2_set_controls(... H264_DECODE_MODE/START_CODE ...); +v4l2_set_controls(... HEVC_DECODE_MODE/START_CODE ...); +switch (config_object->profile) { ... h264_start_code ... } +v4l2_set_stream(... output_type, true); +v4l2_set_stream(... capture_type, true); +``` + +Predicted LOC: ~120 LOC added to context.c (mostly transplanted from surface.c). + +### C5 — Resolution-change handling moves to DestroyContext + +The existing DestroyContext at context.c:276-307 already does: +- STREAMOFF both queues +- REQBUFS(0) on OUTPUT +- cap_pool_destroy + +No change needed — that path is already a complete teardown. mpv's probe-then-real pattern works because each context cycle does its own setup + teardown. + +### C6 — `surface.h` per-surface fields stay the same + +The `object_surface` struct still has `destination_planes_count`, `destination_buffers_count`, `destination_sizes`, `destination_offsets`, `destination_bytesperlines`, `destination_data`, etc. These are now filled at CreateContext time instead of CreateSurfaces2 time, but the field layout doesn't change. No header edit needed. + +### C7 — No `last_output_*` fields needed in driver_data + +β doesn't need the `last_output_width/height/pixelformat` gate because the lifecycle is CreateContext-centric. The existing `last_output_width/height` fields in driver_data become dead. + +**Two sub-options** for what to do with the dead fields: + +a) Leave them in `request_data` with a `/* DEAD post-iter5b-β; was used by pre-β CreateSurfaces2 gate */` comment. +b) Delete them entirely. + +Recommend **(b) delete**: cleaner, the dev-process discipline favors removing dead code. Also delete `surface_reset_format_cache` at surface.c:75-79 which assigns to these dead fields. + +### C8 — Phase 7 verification matrix + +Re-run iter5 Phase 3 sweep. Expected hash matrix: + +| Codec | Pre-β libva | Expected post-β libva | kdirect (anchor) | +|---|---|---|---| +| H.264 | `71ac099b…` keyframe partial | unchanged (Bug 4 deferred) | `1e7a0bc9…` | +| HEVC | `06b2c5a0…` all-zero | **`9340b832…`** | `9340b832…` | +| VP9 | `06b2c5a0…` all-zero | **`4f1565e8…`** | `4f1565e8…` | +| MPEG-2 | `19eefbf4…` worked | `19eefbf4…` (unchanged) | `19eefbf4…` | +| VP8 | `06b2c5a0…` all-zero | **`136ce5cb…`** | `136ce5cb…` | + +PASS criteria same as Phase 4 v1: 3 codecs unblocked (HEVC + VP9 + VP8) + MPEG-2 unchanged + H.264 unchanged. + +Plus an extra robustness check: **re-run the sweep within the same backend process** (or back-to-back ffmpeg invocations) to confirm no SIGSEGV. The α' failure mode was multi-CreateSurfaces2 mid-stream; β should be robust because each session is a self-contained CreateContext lifecycle. + +### C9 — Phase 8 close criteria + +- C8 verification 3/3 (HEVC/VP9/VP8) libva == kdirect. +- No regression on H.264 keyframe + MPEG-2. +- `phase8_iteration5b_close.md` documenting commits + verification. +- Memory updates: + - Extend `feedback_unconditional_codec_state.md` with the pixelformat instance. + - Add `feedback_va_lifecycle_profile_threading.md` describing the VA-API CreateSurfaces-has-no-config_id observation and why CreateContext-centric lifecycle (β) is the right architecture for v4l2-request style backends. + - Add (or extend) a rule about **trust the Phase 2 reviewer's architectural recommendation when the smaller alternative carries lifecycle ordering risk** — the iter5b α' failure cost half a day's wallclock that β would have avoided. +- Campaign scoreboard: "5/5 with 4 direct + 1 (H.264 inter) partial" — same goal as iter5b-α' tried to reach. + +## Risk register + +| Risk | Probability | Impact | Mitigation | +|---|---|---|---| +| `video_format` probe at CreateContext fails because CAPTURE format isn't queryable until OUTPUT format is set first | LOW | MEDIUM | The S_FMT(OUTPUT) call happens BEFORE the CAPTURE probe in C4's flow; hantro's CAPTURE-derivation happens at OUTPUT S_FMT time. Order is correct. | +| Stripping CreateSurfaces2's logic accidentally drops state that consumers expect | LOW | MEDIUM | Per VA-API spec, surfaces between CreateSurfaces and CreateContext are inert. Consumers should not query `destination_*` before CreateContext binds them. Phase 5 reviewer verifies. | +| `cap_pool_init` at CreateContext when surfaces_count==0 (ffmpeg vaapi-copy might pass 0) | LOW | LOW | Existing code at surface.c:306 already handles this via `surfaces_count > MIN_CAP_POOL ?` clamp. Transplanted as-is. | +| Multiple CreateContext calls in the same session (e.g., ffmpeg-vaapi re-init with different config) | MEDIUM | LOW | First CreateContext: clean setup. Subsequent CreateContext: skips `video_format` and `cap_pool_init` if already initialized (existing `if (!driver_data->video_format)` and `if (!driver_data->capture_pool.initialized)` checks transplant). For multi-context resolution OR profile change, mpv/ffmpeg patterns call DestroyContext between contexts, which already STREAMOFF+REQBUFS(0)+cap_pool_destroy. Phase 7 verifies. | +| Resolution-change within a single context (no DestroyContext) | LOW | LOW | Not in scope for iter5b-β; consumers that do this would need a separate iteration (the TODO comment's "for the next iteration after this one" target). | +| Failing CreateContext leaves V4L2 state half-set (S_FMT done but cap_pool_init failed) | LOW | MEDIUM | Error paths goto `error:` which frees the context. Need to also REQBUFS(0) + STREAMOFF if partial state was created. Phase 5 reviewer verifies the error-recovery branch. | +| `picture_width`/`picture_height` from VA-API differ from surface width/height | LOW | LOW | VA-API spec: surfaces and context picture dimensions should match for hardware decode. If they differ, S_FMT uses picture_*, which is the right kernel-facing dimension. | + +## Phase 5 review concerns to invite + +When Phase 4 v2 lands and Phase 5 reviewer engages, attack: + +1. **Error-recovery branches in CreateContext** — if S_FMT succeeds but cap_pool_init fails, do we correctly REQBUFS(0) and STREAMOFF? Walk the `goto error:` flow. +2. **Multi-context with shared driver_data** — `video_format` and `cap_pool` are driver_data-level (shared across contexts). First CreateContext sets them up; subsequent CreateContexts skip the init. Is the skip-logic correct? What if the second context has a different resolution? +3. **Surface ID lookup at CreateContext** — what if a `surfaces_ids[i]` doesn't resolve via `SURFACE()` macro (already-destroyed)? Existing code path handles NULL via error return. Phase 5 verifies. +4. **picture_width vs surface->width/height** — `surface_object` stores width/height from CreateSurfaces2 (the surface's own dimensions). CreateContext uses picture_width/picture_height (the context's picture dimensions). What if they differ? Use picture_* for S_FMT (kernel state); fill surface_object->{width,height} with picture_*. +5. **`destination_data` fill ordering** — currently filled at surface_bind_slot (BeginPicture time, with the cap_pool_slot's actual mmap pointer). β's CreateContext walks should fill ONLY format-uniform fields (sizes, offsets, bytesperlines, planes_count, buffers_count); destination_data stays NULL until BeginPicture binds a slot. Phase 5 verifies the fields are correctly partitioned: per-format-uniform (CreateContext) vs per-slot (BeginPicture). +6. **Backwards compat** — the libva-multiplanar (ohm) campaign uses the same fork. ohm tests H.264 + MPEG-2 only. Both worked pre-iter5b (with the H264_SLICE hardcode happenstance-correct for H.264, and hantro's tolerance for MPEG-2). Post-β: H.264 path on ohm hantro gets explicit H264_SLICE; MPEG-2 path gets explicit MPEG2_SLICE. Both should still work or improve. Phase 5 verifies via ohm's Phase 3 anchors if available. + +## Predicted iter5b-β cadence + +- Phase 5 review: ~30 min (medium surface). +- Phase 6 implementation: + - Re-introduce commit A + commit B (mostly identical to iter5b commits A + B): ~15 min. + - Commit C: strip surface.c — ~30 min careful editing. + - Commit D: build out context.c — ~45 min careful editing. + - Build + install on fresnel — ~10 min. +- Phase 7 verification: + - Re-run sweep — ~5 min. + - Multi-codec back-to-back stress test — ~10 min. +- Phase 8 close — ~30 min. + +Total: 2-3 hours wallclock. + +## Phase 4 v2 → Phase 5 handoff + +This document supersedes `phase4_iter5b_plan.md` (the rejected α' plan). Phase 5 reviewer reads this fresh; the prior review (`phase5_iter5b_review.md`) is historical context only. + +Most relevant context for the reviewer: +- `phase7_iter5b_verification.md` — the empirical failure mode that mandates β. +- `phase2_iter5b_situation.md` — original lifecycle analysis. Recommended β; this Phase 4 v2 implements that recommendation. +- `phase0_iter5_loopback.md` — root cause of Bug 2 (surface.c:173 hardcode). +- `phase3_iter5_baseline.md` — pre-fix per-codec hash matrix (anchors for Phase 7). + +Reviewer should specifically verify the C4 logic transplant from surface.c to context.c — the per-surface destination_* walk has tricky single-buffer vs multi-buffer-per-frame logic that mustn't break.