Files

T

marfrit 6b0e023e7f iter5b Phase 2: situation — lifecycle traced, option β (defer to CreateContext) recommended

VA-API lifecycle traced: CreateConfig stores profile in object_config;
CreateSurfaces2 has NO config_id, can't access profile; CreateContext
takes VAConfigID and already does profile-switch for h264_start_code
(context.c:205-217, iter4 fix-forward 692eaa0).

surface.c:164-171 already flags this as deferred-work in a TODO comment:
"that's a context-level redesign for the next iteration." iter5b picks
up that deferred work.

Three options analyzed empirically:
- α: thread current_profile through driver_data (15 LOC, fragile semantic)
- β: move OUTPUT-side lifecycle to CreateContext (80-150 LOC, clean)
- γ: lazy at BeginPicture (architecturally wrong site)

Recommendation: option β. iter4 reviewer accepted the deferred-work
flag in surface.c; iter5b is the iteration that addresses it.

object_config->pixelformat field at config.h:46 is declared but never
assigned — opportunity for wiring up cleanly via the profile→pixelformat
map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-11 19:12:58 +00:00

15 KiB

Raw Permalink Blame History

Iteration 5b — Phase 2 (situation analysis)

Captured 2026-05-11 immediately after Phase 0 loopback. Per feedback_dev_process.md Phase 2: source-read the libva backend lifecycle, locate the right fix site for the surface.c:173 hardcoded OUTPUT pixel format, cite line numbers, propose a Phase 4 plan shape.

VA-API decode lifecycle (call ordering)

VA-API contract (standard sequence for a single decode session):

1. vaInitialize()                              — backend dlopen + __vaDriverInit
2. vaQueryConfigProfiles()                     — backend RequestQueryConfigProfiles
3. vaCreateConfig(profile, entrypoint, ...)    — backend RequestCreateConfig
4. vaCreateSurfaces(format, w, h, ...)         — backend RequestCreateSurfaces2
5. vaCreateContext(config_id, w, h, surfaces)  — backend RequestCreateContext
6. vaBeginPicture(context, surface)            — backend RequestBeginPicture
7. vaRenderPicture(context, buffers, n)        — backend RequestRenderPicture
8. vaEndPicture(context)                       — backend RequestEndPicture
9. vaSyncSurface(surface)                      — backend RequestSyncSurface
10. (consumer reads pixels via vaDeriveImage / vaExportSurfaceHandle / etc.)

Profile is known at step 3 (CreateConfig) and stored in object_config->profile (config.h:42). Surfaces are created at step 4 (CreateSurfaces2) without a config_id parameter — they're codec-agnostic at allocation time. Config and surfaces are linked at step 5 (CreateContext) via VAConfigID config_id.

Current backend state — relevant call sites

config.c:44 — `RequestCreateConfig`

Stores profile and entrypoint in object_config. Per config.h:39-47:

struct object_config {
    struct object_base base;
    VAProfile profile;                              // ← set at line 44+
    VAEntrypoint entrypoint;
    VAConfigAttrib attributes[V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES];
    int attributes_count;
    unsigned int pixelformat;                       // ← DECLARED, NEVER ASSIGNED
};

The pixelformat field at config.h:46 is intent-of-some-prior-iteration scaffolding that was never wired up. grep -nE "\bpixelformat\s*=" config.c returns zero hits. The field is dead.

surface.c:135-232 — `RequestCreateSurfaces2`

The OUTPUT pixel-format S_FMT site, which is the actual bug:

surface.c:173: unsigned int pixelformat = V4L2_PIX_FMT_H264_SLICE; — hardcoded for every profile. This is Bug 2's root.
surface.c:225-226: v4l2_set_format(driver_data->video_fd, output_type, pixelformat, width, height); — the S_FMT call using the hardcoded format.
surface.c:176-232: guarded by last_output_{width,height} change. Per-resolution gating exists (mpv probe pattern); per-profile gating doesn't.

CreateSurfaces2 has no config_id parameter, no path to object_config->profile. The function signature is fixed by the VA-API contract:

VAStatus RequestCreateSurfaces2(VADriverContextP context, unsigned int format,
                unsigned int width, unsigned int height,
                VASurfaceID *surfaces_ids, unsigned int surfaces_count,
                VASurfaceAttrib *attributes, unsigned int attributes_count)

format here is the RT-format (e.g. VA_RT_FORMAT_YUV420), not a codec — confirmed by the check at surface.c:234 (if (format != VA_RT_FORMAT_YUV420)).

context.c:49-227 — `RequestCreateContext`

Profile-aware. Already does the profile switch pattern from the iter4 h264_start_code fix:

context.c:49: VAStatus RequestCreateContext(..., VAConfigID config_id, ...) — config_id is a parameter.
context.c:71-75: config_object = CONFIG(driver_data, config_id); — lookup makes profile available.
context.c:103-109: request_pool_init(&driver_data->output_pool, ...) — OUTPUT CREATE_BUFS happens here (16 buffers, iter6 size). NOT in CreateSurfaces2.

context.c:205-217: existing profile switch for h264_start_code (iter4 fix-forward 692eaa0):

switch (config_object->profile) {
case VAProfileH264Main:
case VAProfileH264High:
case VAProfileH264ConstrainedBaseline:
case VAProfileH264MultiviewHigh:
case VAProfileH264StereoHigh:
case VAProfileHEVCMain:
    context_object->h264_start_code = true;
    break;
default:
    context_object->h264_start_code = false;
    break;
}

context.c:219-227: v4l2_set_stream(...output_type, true) and v4l2_set_stream(...capture_type, true) — STREAMON for both queues.

So the actual ordering on a real strace (iter5 P3 loopback, HEVC):

trace line 22 :  S_FMT(OUTPUT, H264_SLICE)               ← surface.c:225 inside CreateSurfaces2
trace line 27 :  CREATE_BUFS(CAPTURE, 24 buffers, NV12)  ← CreateSurfaces2 cap_pool init
trace lines 29-77: mmap CAPTURE buffers + REQBUFS         ← surface.c later
trace line 99 :  CREATE_BUFS(OUTPUT, 16 buffers, H264_SLICE)  ← context.c:103 inside CreateContext
trace lines 101-137: mmap OUTPUT buffers                  ← request_pool_init
trace line 148-149: S_EXT_CTRLS(H.264/HEVC DECODE_MODE+START_CODE device-wide controls)
trace line 152+: S_EXT_CTRLS(actual HEVC controls)        ← context.c HEVC controls + per-request controls

The S_FMT at line 22 sets H264_SLICE. By the time CreateContext at line 99 onwards runs with the correct profile in hand, the OUTPUT queue is already locked to H264_SLICE format AND CREATE_BUFS has been done at line 99 for that wrong format.

Why this is a non-trivial lifecycle gap

The fix isn't just "switch on profile at surface.c:173" because surface.c doesn't know the profile. Three architectural options:

Option α — Thread profile into driver_data

Add unsigned int current_pixelformat (or VAProfile current_profile) to struct request_data (request.h:50+). Set it at CreateConfig time. Read it at surface.c:173.

Pro: minimal patch. ~15 LOC: one struct field + one assignment + one read.

Con: fragile semantic. If multiple configs are alive (vainfo probing N profiles), current_* holds whichever was created last — irrelevant for vainfo (no surfaces), but a footgun if a future consumer creates surfaces between probes. Per VA-API spec this is allowed but unusual.

Option β — Defer OUTPUT-side lifecycle from CreateSurfaces to CreateContext

CreateSurfaces2 stops doing S_FMT(OUTPUT). CreateContext (after the config_object = CONFIG(...) lookup at line 71) does:

S_FMT(OUTPUT, profile-derived-pixelformat)
CREATE_BUFS(OUTPUT, ...) — already happens here at line 103
(existing CREATE_BUFS for OUTPUT in request_pool_init)

CreateSurfaces2 keeps CAPTURE-side work (CAPTURE CREATE_BUFS, mmap, cap_pool init) — that's codec-agnostic NV12 anyway.

Pro: profile unambiguously known at the moment the OUTPUT format is set. Clean semantic.

Con: bigger refactor. Need to think about the last_output_width/height tracking — it currently lives in CreateSurfaces2 and gates the S_FMT on resolution change. Moving it requires either replicating the gate in CreateContext or accepting the redundant S_FMT on every CreateContext (probably harmless if the kernel is idempotent on same-format S_FMT, but worth testing).

The surface.c:164-171 comment block already flags this:

TODO: this is still not a clean architecture — v4l2_set_format after CREATE_BUFS requires REQBUFS(0) first (kernel returns EBUSY otherwise). For mpv's pattern (probe with small, then allocate big) the small probe surfaces have not been streamed yet, so REQBUFS(0) on them works. For consumers that legitimately stream multiple resolutions in sequence, we'd need to STREAMOFF + REQBUFS(0) + new S_FMT + new CREATE_BUFS — that's a context-level redesign for the next iteration.

iter5b is "the next iteration" the comment anticipates.

Option γ — Lazy at BeginPicture

CreateSurfaces2 sets no OUTPUT format. CreateContext sets no OUTPUT format. First RequestBeginPicture checks the active context's profile and conditionally sets format if it changed.

Pro: profile fully bound; resolution fully known; format set lazily.

Con: heavyweight — every BeginPicture pays a cost (cheap if the gate is a profile comparison, but adds branch). Plus picture.c gets format-management responsibility which is architecturally out of place.

Picking α vs β

Recommendation: Option β with the gate-tracking moved to a small driver_data field. Specifically:

Strip surface.c:173-232's OUTPUT-side S_FMT block. CreateSurfaces2 keeps CAPTURE-side allocation only.
CreateContext (context.c:103-ish, right before request_pool_init) does:
- Look up config_object->profile (already done at line 71).
- Map profile → V4L2_PIX_FMT_*_SLICE / *_FRAME via the existing config.c mapping inside RequestQueryConfigProfiles (config.c:140-188 enumerates all these mappings).
- If driver_data->last_output_pixelformat != desired || resolution changed: REQBUFS(0) on OUTPUT, REQBUFS(0) on CAPTURE (since hantro derives CAPTURE from OUTPUT), re-S_FMT(OUTPUT, profile-format), proceed to CREATE_BUFS.
- Track driver_data->last_output_pixelformat alongside the existing last_output_width/height.
Move the cap_pool_init from CreateSurfaces2 to CreateContext (since CAPTURE format depends on OUTPUT format, and OUTPUT format is now set in CreateContext).

This is a moderate refactor — predicted ~80-150 LOC delta. The existing iter1+iter2+iter6+iter7 work on the OUTPUT/CAPTURE pool teardown semantics (surface.c:178-223) carries forward but moves from surface.c to context.c.

Option α is smaller (~15 LOC) but semantically fragile. The fragility is unlikely to bite in production (mpv/ffmpeg/Firefox/Chromium all create single config + single context per decode session), but it's wrong for the contract.

Phase 4 plan shape (iter5b)

Tentatively:

Patch 1: refactor — relocate OUTPUT format S_FMT + CREATE_BUFS lifecycle from surface.c to context.c. The cap_pool init also moves to context.c (since CAPTURE format is implicit from OUTPUT format on hantro). Likely 1 file, 80-150 LOC.
Patch 2: introduce profile→pixelformat helper function. Probably in config.c or a new codec.c shared utility. Returns the right V4L2 fourcc for each VAProfile. Single switch statement.
Patch 3: wire patch 1 to call patch 2. ~10 LOC.
Manifest / build: rebuild backend via meson on fresnel. No kernel work.

Estimated total: 100-200 LOC across 1-2 files, single rebuild + install + Phase 7 sweep.

Why option α is tempting but wrong

The smallest possible fix is:

// request.h: add `VAProfile current_profile;` to struct request_data

// config.c::RequestCreateConfig: at the end, before return:
driver_data->current_profile = profile;

// surface.c:173: replace hardcoded constant with:
unsigned int pixelformat = pixelformat_for_profile(driver_data->current_profile);

This works for every real-world consumer (mpv/ffmpeg/Firefox/Chromium). It even handles mpv's small-probe → real-allocate flow correctly (the second CreateSurfaces2 fires AFTER any new CreateConfig in mpv's flow). But:

The semantic is "the surface implicitly inherits the most recent config's profile" — invented, not specified by VA-API.
The Phase 5 reviewer will rightly flag this as fragile and ask: "what if the consumer creates two configs and one surface, expecting the surface to work with both?"
And the comment block at surface.c:164-171 ALREADY says "that's a context-level redesign for the next iteration" — meaning the operator (and the iter1 reviewer) already saw this gap and accepted it as deferred-work.

Option β honors the comment's deferred-work flag. Option α defers it again.

Recommendation reaffirmed: option β. iter5b is the iteration that picks up the deferred work the surface.c comment flagged.

Cross-references / memory rules touched

feedback_unconditional_codec_state.md — the surface.c:173 hardcode is the same class of bug as the iter4 h264_start_code = true unconditional set. The fix shape (profile switch) generalizes.
feedback_trace_fix_mechanism_to_consumer.md — Phase 0 loopback discovered the actual bug by tracing producer→primitive→consumer. Phase 2 traces the fix mechanism (option β) end-to-end to ensure it actually reaches the kernel's OUTPUT-format check at S_FMT.
feedback_review_empirical_over_theoretical.md — every claim in this doc cites file:line. No "the lifecycle is X" without grep evidence.

Open questions for Phase 4

Hantro's CAPTURE-format-derivation — when OUTPUT format is HEVC_SLICE vs H264_SLICE on hantro, does hantro's implicit CAPTURE format derivation produce the same NV12 layout? rkvdec we know stays NV12. Hantro decoder block on RK3399 derives CAPTURE shape from SPS bytes; OUTPUT format is the type-tag. Phase 4 verifies that hantro tolerates profile-correct OUTPUT format without breaking CAPTURE side.
Format-cache key shape — current last_output_{width,height} gates re-S_FMT on resolution change. Adding last_output_pixelformat to the gate key is mechanical; pick the struct/touch ordering carefully.
request_pool / cap_pool lifetime relocation — moving from CreateSurfaces2 to CreateContext means the teardown branch at surface.c:196-223 also needs to move. CreateContext currently destroys nothing at entry (assumes pools were either created or are fresh). Phase 4 designs the resolution-change-AND-profile-change-AND-first-context teardown logic.
iter4's existing h264_start_code switch (context.c:205-217) — same site as patch 1's pixelformat switch. Phase 4 considers whether to fold both into one switch or keep them separate (separate is probably cleaner — they're different decisions).

What Phase 3 produces (already produced)

Phase 3 baseline anchor data is at iter5_phase3_baseline.tgz — 5-codec sweep, control-payload strace anchors. No new Phase 3 work needed: the post-fix sweep at Phase 7 will compare against the same baseline. Specifically:

Pre-fix libva hashes (will change): 71ac099b… (h264), 06b2c5a0… (hevc/vp9/vp8 all-zero shared signature), 19eefbf4… (mpeg2 working).
Post-fix libva hashes expected: equal to kdirect hashes 1e7a0bc9… (h264 keyframe-only), 9340b832… (hevc), 4f1565e8… (vp9), 136ce5cb… (vp8); mpeg2 stays 19eefbf4… unchanged.

Phase 5 review concerns (to invite the reviewer to attack)

When Phase 4 plan is ready, the sonnet-architect should specifically check:

Profile-to-pixelformat mapping completeness — does the patch handle every VAProfile the backend currently advertises? Missing one would leave that codec broken silently.
Hantro CAPTURE format implication — patches the OUTPUT format change without breaking the hantro CAPTURE-derivation contract for MPEG-2 + VP8.
cap_pool / request_pool relocation ordering — particularly the teardown branches on resolution change AND profile change AND first-context-init.
Phase 7 verification matrix — strict equality on libva==kdirect raw YUV hashes for HEVC, VP9, VP8. MPEG-2 expected unchanged. H.264 partial expected (Bug 4 not in scope).
Backwards compat for ohm campaign — the same fork master is used by libva-multiplanar. ohm campaign tests H.264 + MPEG-2 only (per "I cheated. I descoped vp8 and mpeg2 on ohm" finding). MPEG-2 path on ohm hantro is unchanged (always was correct format via accidental match, since pre-iter5b only H.264 worked correctly anyway). Verify the patch doesn't introduce ohm regression.

15 KiB Raw Permalink Blame History Unescape Escape