Files

T

marfrit 5abea730a0 iter5b Phase 4 v2: re-plan with option β — CreateContext-centric OUTPUT lifecycle

Supersedes phase4_iter5b_plan.md (the α' plan rejected at Phase 7).
β architecture: strip OUTPUT-side V4L2 device state from
RequestCreateSurfaces2 entirely; move it to RequestCreateContext
where config_id (and therefore the bound profile) is unambiguously
known. CreateSurfaces2 becomes ID-allocation + per-surface
bookkeeping only.

9 contract clauses (C1..C9). Reuses 2 of 3 reverted iter5b commits
(codec.h/codec.c helper; object_config->pixelformat wire-up at
CreateConfig). New work: C3 strip surface.c, C4 build out
context.c — predicted ~120 LOC into context.c, ~190 LOC stripped
from surface.c (net ~70 LOC delta).

Risk register: 7 items; highest is multi-context resolution change
within shared driver_data (medium impact, mitigated by existing
DestroyContext teardown). α''s destructive teardown failure mode
disappears because β has no in-CreateSurfaces2 teardown branch.

Phase 5 review focus: error-recovery branches in CreateContext,
per-surface destination_* fill semantics (format-uniform fields
at CreateContext vs per-slot fields at BeginPicture), ohm
backwards-compat verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 12:57:54 +00:00

17 KiB

Raw Blame History

Iteration 5b — Phase 4 v2 (plan, option β)

Captured 2026-05-12 after Phase 7 → Phase 4 loopback on the failed α' attempt. Re-plan with option β: defer all OUTPUT-side V4L2 device state from RequestCreateSurfaces2 to RequestCreateContext — the lifecycle redesign the surface.c:164-171 TODO comment anticipated and the Phase 5 reviewer-of-Phase-2 originally recommended.

Why β, this time

Phase 7 empirically demonstrated α' fails because ffmpeg-vaapi calls vaCreateSurfaces2 multiple times per session, with vaCreateContext + STREAMON + frame decode interleaved. α''s pixfmt-change-triggers-teardown gate corrupts state when called mid-stream. β eliminates the problem at the architectural level:

CreateSurfaces2 no longer touches V4L2 device state. No S_FMT, no CREATE_BUFS, no cap_pool. It allocates surface_object IDs and stores width/height. That's it.
CreateContext owns the full lifecycle. It looks up config_object->profile (already does at context.c:71), derives the right pixelformat via pixelformat_for_profile(), does S_FMT(OUTPUT, …), queries CAPTURE format, runs cap_pool_init, walks the bound surface_ids[] to fill per-surface destination_*. Then proceeds to existing request_pool_init + STREAMON + device-wide controls.
DestroyContext already handles teardown correctly. STREAMOFF both queues, REQBUFS(0), cap_pool_destroy at context.c:276-307. No change needed.

The α' gate's destructive failure mode disappears because there's no gate — each CreateContext is a fresh setup, and the corresponding DestroyContext cleans up. mpv probe-then-real pattern: probe context creates with H264_SLICE (or whatever the probe profile), then DestroyContext, then real CreateContext with the real profile — STREAMOFF + REQBUFS(0) happens in DestroyContext between them.

Reusable from failed iter5b

Of the three reverted commits, two are reusable without modification for β:

iter5b commit (reverted)	Reusable for β?	Notes
`ce304ef` Commit A — `codec.h/codec.c` helper	YES, identical	`pixelformat_for_profile()` is profile→FOURCC map. Used at CreateConfig (commit B).
`f8256e6` Commit B — `last_output_pixelformat` field + `object_config->pixelformat` wire-up	PARTIAL	`object_config->pixelformat` wire-up at CreateConfig is reusable; the `last_output_pixelformat` field in driver_data is unnecessary in β (gate semantics move to per-CreateContext).
`4b2288f` Commit C — `find_sole_active_pixelformat` + surface.c gate change	NO	Wrong approach. β has no heap walk (CreateContext has config_id directly) and no surface.c gate (lifecycle is CreateContext-centric).

Contract clauses

C1 — `pixelformat_for_profile()` helper (re-introduce from reverted commit A)

New files: src/codec.h, src/codec.c (identical to reverted commit A).

Helper:

unsigned int pixelformat_for_profile(VAProfile profile);

Maps VAProfile* → V4L2_PIX_FMT_*_SLICE/FRAME. Returns 0 for unhandled.

C2 — Wire `object_config->pixelformat` at CreateConfig (re-introduce from reverted commit B)

Edit: src/config.c::RequestCreateConfig at line ~98.

+ #include "codec.h"
...
  config_object->profile = profile;
  config_object->entrypoint = entrypoint;
+ config_object->pixelformat = pixelformat_for_profile(profile);

Populates the previously-dead object_config->pixelformat field (config.h:46) with the right FOURCC for each profile. CreateContext reads this field.

C3 — Strip OUTPUT-side lifecycle from `RequestCreateSurfaces2`

Edit: src/surface.c::RequestCreateSurfaces2.

Remove:

Lines 173-232 (OUTPUT format S_FMT + teardown branch + last_output_*).
Lines 237-274 (driver_data->video_format probe — moves to CreateContext).
Lines 276-280 (G_FMT CAPTURE — moves to CreateContext).
Lines 282-326 (destination_* computation — moves to CreateContext).
Lines 305-313 (cap_pool_init — moves to CreateContext).
Inner per-surface destination_* fill at lines 340-359 (deferred to CreateContext walk).

Keep:

Line 135-153: function entry, declarations.
Line 234-235: VA_RT_FORMAT_YUV420 validation.
Lines 328-377: per-surface object_heap_allocate + minimal field init. Specifically: width, height, status = VASurfaceReady, current_slot = NULL, etc. Skip the destination_* fill (do it at CreateContext).

Result: RequestCreateSurfaces2 becomes ~60 LOC (was ~250 LOC). Pure ID allocation + per-surface bookkeeping.

C4 — Move OUTPUT-side lifecycle into `RequestCreateContext`

Edit: src/context.c::RequestCreateContext.

Current flow (post-revert):

config_object = CONFIG(driver_data, config_id);          // line 71
...
request_pool_init(&driver_data->output_pool, ...);       // line 103
...
v4l2_set_controls(... H264_DECODE_MODE/START_CODE ...);  // line 142+
v4l2_set_controls(... HEVC_DECODE_MODE/START_CODE ...);  // line 164+
switch (config_object->profile) {                        // line 205
   case VAProfileH264*/HEVCMain: h264_start_code = true;
   default: h264_start_code = false;
}
v4l2_set_stream(... output_type, true);                  // line 219 STREAMON
v4l2_set_stream(... capture_type, true);                 // line 225 STREAMON

New flow:

config_object = CONFIG(driver_data, config_id);

// β: derive OUTPUT pixelformat from the bound config's profile.
unsigned int pixelformat = config_object->pixelformat;
if (pixelformat == 0) {
    // Defensive: shouldn't happen because CreateConfig rejects unhandled profiles,
    // but Phase 5 review insists on never deref-ing 0 from helper return.
    status = VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
    goto error;
}

unsigned int output_type_pre = v4l2_type_video_output(true);  // assume mplane; CAPTURE probe sets true/false below
rc = v4l2_set_format(driver_data->video_fd, output_type_pre, pixelformat,
                     picture_width, picture_height);
if (rc < 0) { status = VA_STATUS_ERROR_OPERATION_FAILED; goto error; }

// β: probe CAPTURE format (moved from surface.c:237-274).
if (driver_data->video_format == NULL) {
    /* SUNXI_TILED_NV12 / NV12 probe identical to surface.c logic */
    ...
    driver_data->video_format = ...;
}
video_format = driver_data->video_format;
output_type = v4l2_type_video_output(video_format->v4l2_mplane);
capture_type = v4l2_type_video_capture(video_format->v4l2_mplane);

// β: G_FMT CAPTURE to get sizes (moved from surface.c:276-280).
rc = v4l2_get_format(driver_data->video_fd, capture_type,
                     &format_width, &format_height,
                     destination_bytesperlines, destination_sizes, NULL);
if (rc < 0) { ... }

// β: cap_pool_init (moved from surface.c:305-313).
if (!driver_data->capture_pool.initialized) {
    unsigned int pool_count = surfaces_count > MIN_CAP_POOL ?
                              surfaces_count : MIN_CAP_POOL;
    rc = cap_pool_init(&driver_data->capture_pool,
                       driver_data->video_fd, capture_type,
                       pool_count, video_format->v4l2_buffers_count);
    if (rc < 0) { ... }
}

// β: walk surface_ids and fill destination_* (moved from surface.c).
unsigned int destination_planes_count = video_format->planes_count;
if (video_format->v4l2_buffers_count == 1) {
    destination_sizes[0] = destination_bytesperlines[0] * format_height;
    for (j = 1; j < destination_planes_count; j++)
        destination_sizes[j] = destination_sizes[0] / 2;
}
for (i = 0; i < surfaces_count; i++) {
    struct object_surface *surface_object = SURFACE(driver_data, surfaces_ids[i]);
    if (surface_object == NULL) { ... }
    surface_object->destination_planes_count = destination_planes_count;
    surface_object->destination_buffers_count = video_format->v4l2_buffers_count;
    /* Fill destination_offsets[], destination_sizes[], destination_bytesperlines[]
       per video_format->v4l2_buffers_count branch (1 vs n). Same logic as surface.c
       lines 340-359. */
    ...
}

// existing flow continues:
request_pool_init(...);
v4l2_set_controls(... H264_DECODE_MODE/START_CODE ...);
v4l2_set_controls(... HEVC_DECODE_MODE/START_CODE ...);
switch (config_object->profile) { ... h264_start_code ... }
v4l2_set_stream(... output_type, true);
v4l2_set_stream(... capture_type, true);

Predicted LOC: ~120 LOC added to context.c (mostly transplanted from surface.c).

C5 — Resolution-change handling moves to DestroyContext

The existing DestroyContext at context.c:276-307 already does:

STREAMOFF both queues
REQBUFS(0) on OUTPUT
cap_pool_destroy

No change needed — that path is already a complete teardown. mpv's probe-then-real pattern works because each context cycle does its own setup + teardown.

C6 — `surface.h` per-surface fields stay the same

The object_surface struct still has destination_planes_count, destination_buffers_count, destination_sizes, destination_offsets, destination_bytesperlines, destination_data, etc. These are now filled at CreateContext time instead of CreateSurfaces2 time, but the field layout doesn't change. No header edit needed.

C7 — No `last_output_*` fields needed in driver_data

β doesn't need the last_output_width/height/pixelformat gate because the lifecycle is CreateContext-centric. The existing last_output_width/height fields in driver_data become dead.

Two sub-options for what to do with the dead fields:

a) Leave them in request_data with a /* DEAD post-iter5b-β; was used by pre-β CreateSurfaces2 gate */ comment. b) Delete them entirely.

Recommend (b) delete: cleaner, the dev-process discipline favors removing dead code. Also delete surface_reset_format_cache at surface.c:75-79 which assigns to these dead fields.

C8 — Phase 7 verification matrix

Re-run iter5 Phase 3 sweep. Expected hash matrix:

Codec	Pre-β libva	Expected post-β libva	kdirect (anchor)
H.264	`71ac099b…` keyframe partial	unchanged (Bug 4 deferred)	`1e7a0bc9…`
HEVC	`06b2c5a0…` all-zero	`9340b832…`	`9340b832…`
VP9	`06b2c5a0…` all-zero	`4f1565e8…`	`4f1565e8…`
MPEG-2	`19eefbf4…` worked	`19eefbf4…` (unchanged)	`19eefbf4…`
VP8	`06b2c5a0…` all-zero	`136ce5cb…`	`136ce5cb…`

PASS criteria same as Phase 4 v1: 3 codecs unblocked (HEVC + VP9 + VP8) + MPEG-2 unchanged + H.264 unchanged.

Plus an extra robustness check: re-run the sweep within the same backend process (or back-to-back ffmpeg invocations) to confirm no SIGSEGV. The α' failure mode was multi-CreateSurfaces2 mid-stream; β should be robust because each session is a self-contained CreateContext lifecycle.

C9 — Phase 8 close criteria

C8 verification 3/3 (HEVC/VP9/VP8) libva == kdirect.
No regression on H.264 keyframe + MPEG-2.
phase8_iteration5b_close.md documenting commits + verification.
Memory updates:
- Extend feedback_unconditional_codec_state.md with the pixelformat instance.
- Add feedback_va_lifecycle_profile_threading.md describing the VA-API CreateSurfaces-has-no-config_id observation and why CreateContext-centric lifecycle (β) is the right architecture for v4l2-request style backends.
- Add (or extend) a rule about trust the Phase 2 reviewer's architectural recommendation when the smaller alternative carries lifecycle ordering risk — the iter5b α' failure cost half a day's wallclock that β would have avoided.
Campaign scoreboard: "5/5 with 4 direct + 1 (H.264 inter) partial" — same goal as iter5b-α' tried to reach.

Risk register

Risk	Probability	Impact	Mitigation
`video_format` probe at CreateContext fails because CAPTURE format isn't queryable until OUTPUT format is set first	LOW	MEDIUM	The S_FMT(OUTPUT) call happens BEFORE the CAPTURE probe in C4's flow; hantro's CAPTURE-derivation happens at OUTPUT S_FMT time. Order is correct.
Stripping CreateSurfaces2's logic accidentally drops state that consumers expect	LOW	MEDIUM	Per VA-API spec, surfaces between CreateSurfaces and CreateContext are inert. Consumers should not query `destination_*` before CreateContext binds them. Phase 5 reviewer verifies.
`cap_pool_init` at CreateContext when surfaces_count==0 (ffmpeg vaapi-copy might pass 0)	LOW	LOW	Existing code at surface.c:306 already handles this via `surfaces_count > MIN_CAP_POOL ?` clamp. Transplanted as-is.
Multiple CreateContext calls in the same session (e.g., ffmpeg-vaapi re-init with different config)	MEDIUM	LOW	First CreateContext: clean setup. Subsequent CreateContext: skips `video_format` and `cap_pool_init` if already initialized (existing `if (!driver_data->video_format)` and `if (!driver_data->capture_pool.initialized)` checks transplant). For multi-context resolution OR profile change, mpv/ffmpeg patterns call DestroyContext between contexts, which already STREAMOFF+REQBUFS(0)+cap_pool_destroy. Phase 7 verifies.
Resolution-change within a single context (no DestroyContext)	LOW	LOW	Not in scope for iter5b-β; consumers that do this would need a separate iteration (the TODO comment's "for the next iteration after this one" target).
Failing CreateContext leaves V4L2 state half-set (S_FMT done but cap_pool_init failed)	LOW	MEDIUM	Error paths goto `error:` which frees the context. Need to also REQBUFS(0) + STREAMOFF if partial state was created. Phase 5 reviewer verifies the error-recovery branch.
`picture_width`/`picture_height` from VA-API differ from surface width/height	LOW	LOW	VA-API spec: surfaces and context picture dimensions should match for hardware decode. If they differ, S_FMT uses picture_*, which is the right kernel-facing dimension.

Phase 5 review concerns to invite

When Phase 4 v2 lands and Phase 5 reviewer engages, attack:

Error-recovery branches in CreateContext — if S_FMT succeeds but cap_pool_init fails, do we correctly REQBUFS(0) and STREAMOFF? Walk the goto error: flow.
Multi-context with shared driver_data — video_format and cap_pool are driver_data-level (shared across contexts). First CreateContext sets them up; subsequent CreateContexts skip the init. Is the skip-logic correct? What if the second context has a different resolution?
Surface ID lookup at CreateContext — what if a surfaces_ids[i] doesn't resolve via SURFACE() macro (already-destroyed)? Existing code path handles NULL via error return. Phase 5 verifies.
picture_width vs surface->width/height — surface_object stores width/height from CreateSurfaces2 (the surface's own dimensions). CreateContext uses picture_width/picture_height (the context's picture dimensions). What if they differ? Use picture_* for S_FMT (kernel state); fill surface_object->{width,height} with picture_*.
destination_data fill ordering — currently filled at surface_bind_slot (BeginPicture time, with the cap_pool_slot's actual mmap pointer). β's CreateContext walks should fill ONLY format-uniform fields (sizes, offsets, bytesperlines, planes_count, buffers_count); destination_data stays NULL until BeginPicture binds a slot. Phase 5 verifies the fields are correctly partitioned: per-format-uniform (CreateContext) vs per-slot (BeginPicture).
Backwards compat — the libva-multiplanar (ohm) campaign uses the same fork. ohm tests H.264 + MPEG-2 only. Both worked pre-iter5b (with the H264_SLICE hardcode happenstance-correct for H.264, and hantro's tolerance for MPEG-2). Post-β: H.264 path on ohm hantro gets explicit H264_SLICE; MPEG-2 path gets explicit MPEG2_SLICE. Both should still work or improve. Phase 5 verifies via ohm's Phase 3 anchors if available.

Predicted iter5b-β cadence

Phase 5 review: ~30 min (medium surface).
Phase 6 implementation:
- Re-introduce commit A + commit B (mostly identical to iter5b commits A + B): ~15 min.
- Commit C: strip surface.c — ~30 min careful editing.
- Commit D: build out context.c — ~45 min careful editing.
- Build + install on fresnel — ~10 min.
Phase 7 verification:
- Re-run sweep — ~5 min.
- Multi-codec back-to-back stress test — ~10 min.
Phase 8 close — ~30 min.

Total: 2-3 hours wallclock.

Phase 4 v2 → Phase 5 handoff

This document supersedes phase4_iter5b_plan.md (the rejected α' plan). Phase 5 reviewer reads this fresh; the prior review (phase5_iter5b_review.md) is historical context only.

Most relevant context for the reviewer:

phase7_iter5b_verification.md — the empirical failure mode that mandates β.
phase2_iter5b_situation.md — original lifecycle analysis. Recommended β; this Phase 4 v2 implements that recommendation.
phase0_iter5_loopback.md — root cause of Bug 2 (surface.c:173 hardcode).
phase3_iter5_baseline.md — pre-fix per-codec hash matrix (anchors for Phase 7).

Reviewer should specifically verify the C4 logic transplant from surface.c to context.c — the per-surface destination_* walk has tricky single-buffer vs multi-buffer-per-frame logic that mustn't break.

17 KiB Raw Blame History Unescape Escape