diff --git a/phase4_iter21_plan.md b/phase4_iter21_plan.md new file mode 100644 index 0000000..04583b9 --- /dev/null +++ b/phase4_iter21_plan.md @@ -0,0 +1,60 @@ +## Iteration 21 — Phase 4 (plan) + +Opens 2026-05-14. Continues iter20's localization: rkvdec sees all-zero ctx->ctrl_hdl SPS for libva, real bytes for kdirect. The break is between userspace S_EXT_CTRLS and rkvdec's read of `ctx->ctrl_hdl[SPS].p_cur.p`. + +### Locked research question (iter21) + +> *"At the v4l2_ctrl_request_setup() entry for libva's per-frame request_fd, is the V4L2 control-handler object (`obj`) found? For each control_ref in the request's hdl->ctrl_refs, is `p_req_valid == true`?"* + +### What this narrows + +`v4l2_ctrl_request_setup(req, main_hdl)` at IOC_QUEUE time iterates `hdl->ctrl_refs` and only applies controls where `ref->p_req_valid == true`. The bit gets set by the staging path in `try_set_ext_ctrls_common` (called from `try_set_ext_ctrls_request`) when which=V4L2_CTRL_WHICH_REQUEST_VAL. + +If libva's S_EXT_CTRLS staged correctly, p_req_valid is true for each ctrl libva submitted. If staging failed silently, p_req_valid is false and `v4l2_ctrl_request_setup` skips them — ctx->ctrl_hdl stays at zero (matches iter20 evidence). + +### Approach (α-24 inert → kernel path) + +α-24 (libva G_EXT_CTRLS readback after S_EXT_CTRLS) was implemented in 1547a5d+a9c897f, returned `EACCES` for all 13 libva HEVC frames. Reverted in e109306. The kernel disallows G_EXT_CTRLS against a not-yet-completed request — userspace can't probe `req->p_new`. Mechanism distinguishing requires kernel printk. + +iter21 patches `v4l2_ctrl_request_setup` with two printk lines: + +```c +pr_info("iter21_setup: req=%p main_hdl=%p obj=%p\n", + req, main_hdl, obj); + +pr_info("iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d\n", + ctrl->id, ref->p_req_valid, have_new_data); +``` + +Build `linux-fresnel-fourier 7.0-5` (pkgrel 4→5), deploy, reboot, run libva-HEVC + kdirect-HEVC, capture dmesg. 
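
A minimal sketch of how the `iter21_setup_ref` lines could be pulled out of a dmesg capture once 7.0-5 is deployed. It assumes only the `pr_info` format string shown above; the helper names (`parse_setup_refs`, `staged_ids`) are hypothetical and not part of any existing tooling. Later lines overwrite earlier ones for the same control ID, so it is meant to be fed one setup's worth of lines at a time.

```python
import re

# Matches the per-ref line produced by the iter21 pr_info format string:
#   iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d
REF_RE = re.compile(
    r"iter21_setup_ref: ctrl_id=0x([0-9a-fA-F]+) "
    r"p_req_valid=(\d+) have_new=(\d+)"
)


def parse_setup_refs(dmesg_text):
    """Return {ctrl_id: (p_req_valid, have_new)} for every iter21_setup_ref line."""
    refs = {}
    for line in dmesg_text.splitlines():
        match = REF_RE.search(line)
        if match:
            ctrl_id = int(match.group(1), 16)
            refs[ctrl_id] = (bool(int(match.group(2))), bool(int(match.group(3))))
    return refs


def staged_ids(refs):
    """Control IDs whose request value was staged (p_req_valid set), sorted."""
    return sorted(cid for cid, (valid, _have_new) in refs.items() if valid)
```

Running `staged_ids(parse_setup_refs(text))` on the libva capture and on the kdirect capture gives two ID lists that can be compared directly against the outcome table.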
+ +### Outcome interpretation + +| obj at setup entry | p_req_valid (libva run) | Diagnosis | +|---|---|---| +| NULL | n/a | req has no v4l2_ctrl_handler bound at queue time. libva's S_EXT_CTRLS never staged. Bug in libva's request lifecycle. | +| non-NULL | all false | obj found, but staging path never set `p_req_valid`. Bug in `try_set_ext_ctrls_common` for libva's invocation. | +| non-NULL | true for SPS | staging worked but `req_to_new` / `try_or_set_cluster` failed silently. Bug in apply path. Needs another printk after `req_to_new`. | + +iter21 finishes when one of these is confirmed for libva. Compare to kdirect baseline (should always show p_req_valid=true for SPS). + +### Substrate state at iter21 open + +- Kernel `linux-fresnel-fourier 7.0-5` building on boltzmann (PID 1584834, log /tmp/iter21-kbuild.log). +- Backend SHA `c1d4bb53…` (iter15 stable) — backend unchanged from iter15. +- Fork tip `e109306` (α-24 reverted). +- 5-codec anchors: unchanged. Zero regression. + +### Phase 5 review note + +Diagnostic kernel patch (2 pr_info calls in well-known V4L2 framework function, no behavior change). Phase 5 review skipped per iter17 precedent for diagnostic-only kernel work. + +### Phase 7 plan + +After 7.0-5 deploys: +1. Reboot fresnel; sddm autologin reseats mfritsche. +2. `sudo dmesg -C`. +3. Run libva HEVC; capture rkvdec_iter20 + iter21_setup lines. +4. `sudo dmesg -C`. +5. Run kdirect HEVC; capture same. +6. Diff; localize bug to one of the three table-row diagnoses. diff --git a/phase4_iter22_plan.md b/phase4_iter22_plan.md new file mode 100644 index 0000000..195c109 --- /dev/null +++ b/phase4_iter22_plan.md @@ -0,0 +1,62 @@ +## Iteration 22 — Phase 4 (plan) + +Opens 2026-05-14 following iter21's smoking-gun finding: libva's request-clone-handler is missing 6 of 7 HEVC stateless controls registered in main_hdl. 
+ +### Locked research question (iter22) + +> *"At which control_id does `v4l2_ctrl_request_clone`'s iteration break for libva, and what error code does `handler_new_ref` return?"* + +### Approach + +Add three printks to `v4l2_ctrl_request_clone` in `drivers/media/v4l2-core/v4l2-ctrls-request.c`: + +```c +pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from); +// per iteration: +pr_info("iter22_clone_step: id=0x%x err=%d hdl_error=%d new_ref=%p\n", + ctrl->id, err, hdl->error, new_ref); +// on break: +pr_info("iter22_clone_break: at id=0x%x err=%d hdl_error=%d\n", + ctrl->id, err, hdl->error); +// on end: +pr_info("iter22_clone_end: hdl=%p err=%d\n", hdl, err); +``` + +Built as `linux-fresnel-fourier 7.0-6` (pkgrel 5→6). Deploy, reboot, run libva HEVC + kdirect HEVC. Diff. + +### Outcome interpretation + +| handler_new_ref return | hdl->error | Diagnosis | +|---|---|---| +| 0, new_ref=valid | 0 | Loop step succeeded — clone wouldn't break here. Look further. | +| 0, new_ref=NULL | 0 | Duplicate (skip silently). Means main_hdl has duplicate ctrl_refs — unlikely. | +| -ENOMEM | -ENOMEM | kzalloc failed. Memory pressure analysis needed. | +| 0, hdl->error=X | non-zero | Earlier auto-class-control insertion failed; subsequent handler_new_ref short-circuits. | +| -EINVAL | varies | Validation failed (e.g., overlapping ID range). | + +### Coordinate with iter21 finding + +If iter22 shows the loop breaks at 0xa40905 (H264_PRED_WEIGHTS) and again at 0xa40a91 (HEVC_PPS), the break must be UNREACHED by libva's iteration → means the **source main_hdl itself** doesn't have these controls. + +If iter22 shows the loop reaches 0xa40a91 with err=0 (i.e., NOT a break), then libva's clone-hdl actually DOES contain HEVC_PPS, and our iter21 printk was missing it (e.g., a list-ordering bug in the iteration). Unlikely but worth checking. 
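
For reading the `id=0x...` values in the clone trace, a small sketch that splits a control ID into class and offset. The constants are an assumption taken from my reading of the V4L2 uapi header (`V4L2_CTRL_ID2CLASS` masking with `0x0fff0000`, stateless codec base at `0x00a40900`); the three names in the table are the ones these notes already use. `class_of` and `stateless_offset` are hypothetical helper names.

```python
# Assumed from include/uapi/linux/v4l2-controls.h; verify against the tree in use.
CTRL_CLASS_MASK = 0x0FFF0000
CODEC_STATELESS_BASE = 0x00A40900

# Offsets from the stateless base for the IDs named in this plan.
KNOWN_OFFSETS = {
    5: "H264_PRED_WEIGHTS",
    400: "HEVC_SPS",
    401: "HEVC_PPS",
}


def class_of(ctrl_id):
    """Control class part of the ID (what V4L2_CTRL_ID2CLASS is assumed to return)."""
    return ctrl_id & CTRL_CLASS_MASK


def stateless_offset(ctrl_id):
    """Offset of a stateless-codec control from the assumed stateless base."""
    return ctrl_id - CODEC_STATELESS_BASE


def name_of(ctrl_id):
    """Name for the handful of IDs listed above, or a hex fallback."""
    return KNOWN_OFFSETS.get(stateless_offset(ctrl_id), hex(ctrl_id))
```

With these, `0xa40905` and `0xa40a91` resolve to the two break candidates named above, which makes a long `iter22_clone_step` dump easier to scan.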
+ +### Substrate state at iter22 open + +- Kernel `linux-fresnel-fourier 7.0-6` building on boltzmann (PID 1613982, log /tmp/iter22-kbuild.log). +- Backend SHA `c1d4bb53…` — unchanged from iter15. +- Fork tip `e109306` — unchanged. +- 5-codec anchors: unchanged. + +### Phase 5 review + +Diagnostic-only kernel patch (printk-only, no behavior change). Skipped per iter17 precedent. + +### Phase 7 plan + +After 7.0-6 deploys: +1. Reboot fresnel; sddm autologin reseats mfritsche. +2. `sudo dmesg -C`. +3. Run libva HEVC; capture iter22_clone_* lines. +4. `sudo dmesg -C`. +5. Run kdirect HEVC; capture same. +6. Diff. Localize the break or absence-from-source. diff --git a/phase8_iteration20_close.md b/phase8_iteration20_close.md new file mode 100644 index 0000000..6a3d48f --- /dev/null +++ b/phase8_iteration20_close.md @@ -0,0 +1,127 @@ +## Iteration 20 — Phase 8 (close) + +Closes 2026-05-14. iter20 = kernel printk for `&ctx->ctrl_hdl`, `run.sps`, `run.decode_params` pointers + first 16 bytes of each, executed at top of `rkvdec_hevc_run` (after `rkvdec_hevc_run_preamble`). FULL close. Mechanism 4 reframed; root-cause localized to one kernel layer. + +### Method + +`linux-fresnel-fourier 7.0-4` adds `rkvdec_iter20:` printk to RK3399 `rkvdec_hevc_run`: + +```c +{ + u8 *sps_bytes = (u8 *)run.sps; + u8 *dp_bytes = (u8 *)run.decode_params; + pr_info("rkvdec_iter20: ctrl_hdl=%p sps=%p sps[0..16]=%*ph " + "dp=%p dp[0..16]=%*ph\n", + &ctx->ctrl_hdl, run.sps, + 16, sps_bytes ? sps_bytes : (u8 *)"", + run.decode_params, + 16, dp_bytes ? dp_bytes : (u8 *)""); +} +``` + +Deployed via scp + `pacman -U` + reboot, with sddm autologin reseating mfritsche session. Build wall-clock 50 min on boltzmann. 
+ +### Results + +**libva HEVC** (13 frames, all identical): + +``` +rkvdec_iter20: ctrl_hdl=00000000f9b036ba sps=00000000105406cf + sps[0..16]=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + dp=00000000117b947e + dp[0..16]=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +``` + +**kdirect HEVC** (15 frames): + +``` +rkvdec_iter20: ctrl_hdl=00000000d3afe1db sps=0000000095c47ba1 + sps[0..16]=00 00 00 05 d0 02 00 00 04 04 02 04 01 01 00 03 + dp=00000000599ee83f + dp[0..16]=00..04..03 (varies per frame — correct, decode_params is per-frame) +``` + +### What this proves + +1. **`&ctx->ctrl_hdl` differs between processes** (libva `f9b036ba`, kdirect `d3afe1db`) — EXPECTED. Each backend opens `/dev/video3` separately, each gets its own `rkvdec_ctx` with its own private `ctrl_hdl`. This is normal V4L2 m2m. + +2. **The `sps` pointer is stable across all libva frames** (`105406cf`) — confirms the SPS control is registered to the handler exactly once (at CreateContext / `rkvdec_init_ctrls`). The allocation exists, `v4l2_ctrl_find()` returns it correctly. The control structure is registered. Not a registration bug. + +3. **libva's `*sps` content is all-zero**, **kdirect's `*sps` has real bytes** (`00 00 00 05 d0 02 00 00 04 04 02 04 01 01 00 03`) — the first SPS bytes in kdirect's case include `pic_width_in_luma_samples = 1280` (bytes 2-3, `00 05`, read as a little-endian `u16`: `0x0500 = 1280`), which matches kdirect's `rkvdec_hevc_run` printk showing `w=1280`. libva's bytes are zero → its `w=0 h=0` printk follows. + +4. **libva's `*decode_params` is also all-zero** across all 13 frames. kdirect's varies per-frame. Confirms decode_params for libva never gets non-zero values into ctx->ctrl_hdl either. + +### Mechanism analysis + +The SPS control is **registered to `ctx->ctrl_hdl`** (pointer valid, stable, same allocation across 13 frames). What's missing is the **content copy** from `S_EXT_CTRLS` userspace payload into the registered control's `p_cur.p` memory. 
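
The width reading in point 3 can be reproduced from the logged bytes. This is a sketch under one assumption: that `struct v4l2_ctrl_hevc_sps` starts with two `u8` ID fields followed by little-endian `u16` width and height, which is my reading of the uapi header and not something these notes state. Under that layout bytes 4-5 would decode as a height; the source only reports `w=1280`, so treat the height as unverified. `sps_dimensions` is a hypothetical helper name.

```python
import struct

# First 16 SPS bytes exactly as logged by rkvdec_iter20 for kdirect.
KDIRECT_SPS_16 = bytes.fromhex("00 00 00 05 d0 02 00 00 04 04 02 04 01 01 00 03")
# libva's logged SPS bytes were all zero.
LIBVA_SPS_16 = bytes(16)


def sps_dimensions(first_bytes):
    """Decode (width, height) from the start of the SPS control payload.

    Assumed layout: u8 vps_id, u8 sps_id, u16 width, u16 height (little-endian).
    """
    _vps_id, _sps_id, width, height = struct.unpack_from("<BBHH", first_bytes, 0)
    return width, height
```

`sps_dimensions(KDIRECT_SPS_16)` yields a width of 1280, matching the `w=1280` printk, while the libva bytes decode to zero for both fields, matching `w=0 h=0`.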
+ +The V4L2 control-framework path for compound controls with `which=V4L2_CTRL_WHICH_REQUEST_VAL=0xf010000`: + +``` +userspace VIDIOC_S_EXT_CTRLS (which=REQUEST_VAL, request_fd=R, payload=...) + → kernel v4l2_s_ext_ctrls() + → which==REQUEST_VAL branch: looks up R's media_request, + stages payload into req->p_new for each control + → returns 0 + +userspace MEDIA_REQUEST_IOC_QUEUE on fd R + → kernel queues req's pending bufs and pending controls + → m2m schedules job → device_run callback + → rkvdec_hevc_run_preamble(): + v4l2_ctrl_request_setup(req, &ctx->ctrl_hdl): + copies req->p_new → ctx->ctrl_hdl[ctrl]->p_cur + → rkvdec_hevc_run() — printk fires here, reads ctx->ctrl_hdl values +``` + +For libva, the printk fires at the **read** site and observes all-zero. Three places this can fail: + +| # | Where | Likelihood | +|---|---|---| +| A | `v4l2_s_ext_ctrls` doesn't stage libva's payload into `req->p_new` for SPS | unknown — needs probe | +| B | `req->p_new` has correct bytes but `v4l2_ctrl_request_setup` doesn't run for libva's request | unknown — needs probe | +| C | `v4l2_ctrl_request_setup` runs but doesn't copy SPS for libva's request | unknown — needs probe | + +The kernel-direct path WORKS through the same control framework on the same kernel, same /dev/video3 — so the bug is in **how libva invokes the request lifecycle**, not in the framework code itself. + +### Mechanism status update (post-iter20) + +| # | Mechanism | Status | +|---|---|---| +| 1 | request_fd mismatch (S_EXT_CTRLS R1, QUEUE R2) | strongly disfavored (strace shows consistent fd per frame, but worth one explicit verification) | +| 2 | REINIT clears between S_EXT_CTRLS and QUEUE | DISPROVED iter19 | +| 3 | Stack-locals stale | DISPROVED iter18 | +| 4 | ctrl_hdl mismatch — different handlers | **REFRAMED iter20**: handlers differ (expected per-process), but BOTH register SPS correctly, and ctx->ctrl_hdl reads stable pointers. NOT a routing bug. 
| +| 5 | error_idx silent partial fail | DISPROVED iter18 | +| 6 | **NEW iter20**: req->p_new for SPS never receives libva's payload, OR v4l2_ctrl_request_setup never copies it into ctx->ctrl_hdl | **leading hypothesis** | + +### User-level test for iter21 + +Libva can self-diagnose between A and B/C without kernel patches: + +After `S_EXT_CTRLS(which=REQUEST_VAL, request_fd=R, payload=...)`, immediately issue: +- `G_EXT_CTRLS(which=REQUEST_VAL, request_fd=R)` for SPS. + +If readback returns non-zero bytes → **req->p_new HAS the payload** (mechanism A disproved, B or C remains). + +If readback returns zero → **req->p_new doesn't have it** (mechanism A confirmed). + +The G_EXT_CTRLS path with which=REQUEST_VAL reads from `req->p_new` directly — that's the staging slot. Outcome localizes the bug to one of two kernel layers. + +### Substrate state at iter20 close + +- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged). +- Fork tip `415688d` (iter19 state, unchanged). +- Kernel `linux-fresnel-fourier 7.0-4` with iter17 + iter20 printk in rkvdec_hevc_run. NOT a shipping kernel — diagnostic only. +- 5-codec anchors: unchanged from iter15. Zero regression. + +### iter21 candidate + +`α-24`: Add G_EXT_CTRLS readback in libva's `h265_set_controls` right after every `v4l2_set_controls(... which=REQUEST_VAL ...)` call. Log first 16 bytes of returned SPS. ~15 LOC, fully reversible. Test in this single iter, then revert (diagnostic only, not for shipping). + +Outcomes: +- **Non-zero readback** → req->p_new has libva's payload. Bug is in `v4l2_ctrl_request_setup` not running or not copying. iter22 = kernel printk in `v4l2_ctrl_request_setup` showing what gets copied for libva's request_fd at IOC_QUEUE time. +- **Zero readback** → req->p_new doesn't have libva's payload. Bug is in `v4l2_s_ext_ctrls` staging for libva's invocation. iter22 = kernel printk in `v4l2_s_ext_ctrls` showing what libva actually passed. 
+ +### Lesson + +iter17 + iter20 prove `&ctx->ctrl_hdl` pointer routing is NOT the failure surface (registered controls allocated correctly, found correctly, pointer-stable). The failure surface is the **content copy** from userspace S_EXT_CTRLS into ctx->ctrl_hdl across the request lifecycle. Three iterations (17, 19, 20) of kernel printk have walked the bug-localization down from "anywhere in the kernel" → "S_EXT_CTRLS staging or v4l2_ctrl_request_setup application". Two more printk+probe iterations should reach the line of code. diff --git a/phase8_iteration21_close.md b/phase8_iteration21_close.md new file mode 100644 index 0000000..5e30723 --- /dev/null +++ b/phase8_iteration21_close.md @@ -0,0 +1,146 @@ +## Iteration 21 — Phase 8 (close) + +Closes 2026-05-14. iter21 = kernel printk at top of `v4l2_ctrl_request_setup` + per-ref dump. FULL close. **Smoking-gun finding: libva's request-clone-handler is missing 6 HEVC stateless controls registered in main_hdl.** + +### Method + +`linux-fresnel-fourier 7.0-5` (pkgrel 4→5) adds two `pr_info` to `v4l2_ctrl_request_setup` in `drivers/media/v4l2-core/v4l2-ctrls-request.c`: + +```c +obj = media_request_object_find(req, &req_ops, main_hdl); +pr_info("iter21_setup: req=%p main_hdl=%p obj=%p\n", req, main_hdl, obj); +... +list_for_each_entry(ref, &hdl->ctrl_refs, node) { + ... + pr_info("iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d\n", + ctrl->id, ref->p_req_valid, have_new_data); + ... +} +``` + +Built ~1 min via ccache reuse. Deployed via scp + `pacman -U` + reboot. + +### α-24 result (predicate: kernel-only path required) + +α-24 (libva G_EXT_CTRLS readback after S_EXT_CTRLS) implemented as 1547a5d → amended a9c897f → reverted e109306. Kernel returned **EACCES** for all 13 libva HEVC frames: this V4L2 build disallows userspace probing of `req->p_new` for an uncompleted request. The probe path must run inside the kernel. 
+ +### Result — definitive (libva vs kdirect) + +**libva HEVC frame 1 setup** (clone-hdl ctrl_refs in ID order, 14 entries): + +``` +0x990a67 p_req_valid=0 +0x990a6b p_req_valid=0 +0x990b00 p_req_valid=0 +0x990b67 p_req_valid=0 +0x990b68 p_req_valid=0 (5 codec-class menu controls) +0xa40900 p_req_valid=0 H264_DECODE_MODE +0xa40901 p_req_valid=0 H264_START_CODE +0xa40902 p_req_valid=0 H264_SPS +0xa40903 p_req_valid=0 H264_PPS +0xa40904 p_req_valid=0 H264_SCALING_MATRIX +0xa40907 p_req_valid=0 H264_DECODE_PARAMS +0xa40a2c p_req_valid=0 (misc stateless) +0xa40a2d p_req_valid=0 (misc stateless) +0xa40a90 p_req_valid=1 have_new=1 HEVC_SPS — CLONE STOPS HERE +``` + +**Missing from libva clone (vs kdirect):** +- 0xa40905 H264_PRED_WEIGHTS (compound) +- 0xa40906 H264_SLICE_PARAMS (compound, dyn_array) +- 0xa40a91 HEVC_PPS (compound) +- 0xa40a92 HEVC_SLICE_PARAMS (compound, dyn_array) +- 0xa40a93 HEVC_SCALING_MATRIX (compound) +- 0xa40a94 HEVC_DECODE_PARAMS (compound) +- 0xa40a95 HEVC_DECODE_MODE (menu) +- 0xa40a96 HEVC_START_CODE (menu) + +**kdirect HEVC frame 1 setup** (same hdl, 21 entries — all of above PLUS the 8 missing): + +``` +... 14 entries as above ... +0xa40a91 p_req_valid=1 have_new=1 HEVC_PPS +0xa40a92 p_req_valid=1 have_new=1 HEVC_SLICE_PARAMS +0xa40a93 p_req_valid=1 have_new=1 HEVC_SCALING_MATRIX +0xa40a94 p_req_valid=1 have_new=1 HEVC_DECODE_PARAMS +0xa40a95 p_req_valid=0 HEVC_DECODE_MODE (device-init only) +0xa40a96 p_req_valid=0 HEVC_START_CODE (device-init only) +``` + +### What this means + +`v4l2_ctrl_request_setup(req, main_hdl)`: +- finds `obj` for both libva and kdirect (non-NULL) — request properly bound. +- iterates `hdl->ctrl_refs` — but **libva's hdl is the request-clone-hdl, and it contains 14 of the 21 source controls**. +- libva's HEVC_SPS has `p_req_valid=1` — staging worked for that one control. 
+- The other 6 HEVC controls (PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS, DECODE_MODE, START_CODE) **don't exist in the clone-hdl at all** — they cannot be staged. + +When libva submits its 5-control S_EXT_CTRLS batch (SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS), only SPS is registered in the clone-hdl. PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS find no ref → `prepare_ext_ctrls` returns `-EINVAL`. (This contradicts iter18 α-22's rc=0 — needs re-investigation of error_idx semantics for the request path; the userspace observation of rc=0 may not reflect the actual kernel error for compound-control lookups in request clones.) + +### iter20's "zero SPS bytes" explained + +iter20 showed `rkvdec sees sps[0..16]=00..00` for libva. That's because: +- HEVC_SPS *is* in the clone-hdl with `p_req_valid=1` — so it got STAGED. +- But the **content** in `req->p_new[SPS]` is all-zero. + +Two possible reasons for zero content despite p_req_valid=1: +1. `user_to_new` ran on a zero-payload from libva. iter15 strace ruled this out — libva's SPS payload is non-zero at ioctl entry. +2. `new_to_req` ran, but the data flow is somehow corrupted. Possible if the master/cluster lookup is wrong on the clone-hdl. + +iter22 candidate: add a printk in `new_to_req` and `req_to_new` to log the copy: source pointer, dest pointer, first 4 bytes, payload size. 
+ +### Mechanism status (post-iter21) + +| # | Mechanism | Status | +|---|---|---| +| 1 | request_fd mismatch | DISPROVED iter17/18 | +| 2 | REINIT clears | DISPROVED iter19 | +| 3 | Stack-locals stale | DISPROVED iter18 | +| 4 | ctrl_hdl mismatch | REFRAMED iter20 | +| 5 | error_idx silent failure | DISPROVED iter18 (but warrants re-check given iter21 finding) | +| 6 | req->p_new staging incomplete | **CONFIRMED iter21**: clone-hdl missing controls = staging cannot occur for 6 of 7 HEVC controls | +| 7 | **NEW iter21**: clone-hdl is missing controls that main_hdl has registered | **Root question for iter22** | + +### Why is the clone incomplete? + +`v4l2_ctrl_request_clone(new_hdl, from=main_hdl)` iterates `main_hdl->ctrl_refs` in ID-sorted order. After cloning HEVC_SPS (0xa40a90), the loop **stops** before HEVC_PPS (0xa40a91). Equivalent stops happen at H264_PRED_WEIGHTS (0xa40905) — both are first compound controls of their codec block. + +Hypothesis: `handler_new_ref` returns non-zero error at the first compound control AFTER an SPS-like single-struct compound, but **only when called from the request-clone path**. Or: `kzalloc(sizeof(*new_ref) + size_extra_req)` fails for ones with larger `elem_size` (HEVC_PPS = 64 bytes, H264_PRED_WEIGHTS = 32 bytes — small, unlikely to OOM but worth verifying). + +Alt hypothesis: `handler_new_ref`'s auto-class-control insertion (`v4l2_ctrl_new_std`) fails for non-compound HEVC menu controls in request-clone path, which propagates `hdl->error` and breaks subsequent iterations. + +Same kernel succeeds for kdirect on the same `from` hdl, so something is **per-request-bind specific** — maybe related to request lifecycle timing in libva (iter6 permanent request_fd at CreateContext) vs kdirect (per-frame request_fd). + +### Substrate state at iter21 close + +- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged). +- Fork tip `e109306` (α-24 reverted). 
+- Kernel `linux-fresnel-fourier 7.0-5` with iter17 + iter20 + iter21 printks. NOT a shipping kernel. +- 5-codec anchors: unchanged. Zero regression. + +### iter22 candidate + +Add printks to `v4l2_ctrl_request_clone` and `handler_new_ref`: + +```c +// in v4l2_ctrl_request_clone +pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from); + +// per iteration +err = handler_new_ref(hdl, ctrl, &new_ref, false, true); +pr_info("iter22_clone_step: id=0x%x err=%d from_other=%d\n", + ctrl->id, err, ref->from_other_dev); +if (err) { + pr_info("iter22_clone_break: at id=0x%x err=%d hdl_error=%d\n", + ctrl->id, err, hdl->error); + break; +} +``` + +After 7.0-6 deploys, libva HEVC run will show exactly which ctrl_id breaks the loop and the error code. Then we can localize either to `kzalloc` failure, `v4l2_ctrl_new_std` failure (auto-class), or some other condition. + +### Lesson + +iter21 overturns the iter11–iter18 hypothesis space entirely. The S_EXT_CTRLS ioctl wire-byte payload analysis was correct — libva's bytes match kdirect's. But **at the v4l2_ctrl framework level, libva's request-clone is missing the registered controls libva tries to stage**. The bug is in how the V4L2 control framework handles libva's specific request-binding pattern, NOT in libva's ioctl content. + +This is the strongest narrowing since iter17. We've gone from "anywhere in kernel" → "kernel control framework" → "request-clone path specifically" → "iteration breaks at first compound HEVC control". diff --git a/phase8_iteration22_close.md b/phase8_iteration22_close.md new file mode 100644 index 0000000..1161cdf --- /dev/null +++ b/phase8_iteration22_close.md @@ -0,0 +1,118 @@ +## Iteration 22 — Phase 8 (close) + +Closes 2026-05-14. iter22 = kernel printk in `v4l2_ctrl_request_clone` tracing each `handler_new_ref` step. FULL close. 
**iter21's mid-conclusion is overturned: the request-clone-hdl is COMPLETE for libva — all 22 controls cloned with err=0.** + +### Method + +`linux-fresnel-fourier 7.0-6` added per-step pr_info to `v4l2_ctrl_request_clone`: + +```c +pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from); +list_for_each_entry(ref, &from->ctrl_refs, node) { + ... + err = handler_new_ref(hdl, ctrl, &new_ref, false, true); + pr_info("iter22_clone_step: id=0x%x err=%d hdl_error=%d new_ref=%p\n", + ctrl->id, err, hdl->error, new_ref); + ... +} +pr_info("iter22_clone_end: hdl=%p err=%d\n", hdl, err); +``` + +Built ~2 min via ccache. Deployed via scp + pacman -U + reboot. + +### Results + +**libva HEVC** (11 clones — one per request_fd binding): + +Every clone-step logs `err=0 hdl_error=0 new_ref=valid_ptr`. Each clone has **22 controls successfully added**, ending with `iter22_clone_end err=0`. Full ID list per clone: + +``` +0x990001, 0x990a67, 0x990a6b, 0x990b00, 0x990b67, 0x990b68, +0xa40001, 0xa40900, 0xa40901, 0xa40902, 0xa40903, 0xa40904, 0xa40907, +0xa40a2c, 0xa40a2d, +0xa40a90, 0xa40a91, 0xa40a92, 0xa40a93, 0xa40a94, 0xa40a95, 0xa40a96 +``` + +**Note**: H264_PRED_WEIGHTS (0xa40905) and H264_SLICE_PARAMS (0xa40906) are NOT in the source main_hdl (these are not registered in rkvdec_h264_ctrls on this kernel — rkvdec doesn't expose them). All 7 HEVC stateless decode controls ARE present. + +**kdirect HEVC** (13 clones): identical pattern, identical ID set, all err=0. + +### What this overturns from iter21 + +iter21 concluded the clone-hdl was missing 6 HEVC controls for libva. **That was wrong.** The clone-hdl actually has all 22 controls. The iter21_setup_ref printk's iteration was filtering 8 of them out via the early-return check: + +```c +if (ref->req_done || (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY)) + continue; +``` + +For libva, only 14 refs reach the printk. For kdirect, 21 refs reach it. 8 vs 1 difference in skip count. 
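
The libva skip count can be checked mechanically from the two ID lists already recorded: the 22 IDs in the iter22 clone trace and the 14 IDs that appeared in the iter21 libva `iter21_setup_ref` output. A minimal sketch (variable and function names are mine) that lists which cloned IDs never showed up in the iter21 printk:

```python
# 22 control IDs logged per clone by iter22_clone_step.
CLONE_IDS = [
    0x990001, 0x990A67, 0x990A6B, 0x990B00, 0x990B67, 0x990B68,
    0xA40001, 0xA40900, 0xA40901, 0xA40902, 0xA40903, 0xA40904, 0xA40907,
    0xA40A2C, 0xA40A2D,
    0xA40A90, 0xA40A91, 0xA40A92, 0xA40A93, 0xA40A94, 0xA40A95, 0xA40A96,
]

# 14 control IDs printed by iter21_setup_ref for libva HEVC frame 1.
LIBVA_SETUP_IDS = [
    0x990A67, 0x990A6B, 0x990B00, 0x990B67, 0x990B68,
    0xA40900, 0xA40901, 0xA40902, 0xA40903, 0xA40904, 0xA40907,
    0xA40A2C, 0xA40A2D,
    0xA40A90,
]


def absent_from_printk(cloned, printed):
    """Cloned IDs that never appeared in the printk output, in ascending order."""
    return sorted(set(cloned) - set(printed))


LIBVA_ABSENT = absent_from_printk(CLONE_IDS, LIBVA_SETUP_IDS)
```

`LIBVA_ABSENT` contains the two class roots plus the six HEVC IDs after SPS, which is the 8 quoted above. The sketch only says these IDs did not reach the printk; it does not say why.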
+ +### What this implies + +Two of the 8 skipped controls are clearly READ_ONLY: codec class roots `0x990001` and `0xa40001` (V4L2_CTRL_TYPE_CTRL_CLASS). That accounts for 2 of 8 in both libva and kdirect. + +For libva, **6 additional HEVC controls** (PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS, DECODE_MODE, START_CODE) get skipped. For kdirect, only **DECODE_MODE + START_CODE** get skipped (the 2 with p_req_valid=0 that aren't staged this frame). + +Wait — kdirect shows DECODE_MODE + START_CODE in the setup_ref printk with p_req_valid=0. So they're NOT skipped by the continue check. So kdirect's 21 displayed = 22 cloned - 1 (only one CLASS root being printed?). Hmm, mismatch. + +Actually the iter21 setup_ref for kdirect showed 21 lines visible — 20 entries +1 setup header. Let me re-examine. The kdirect dump had 21 ctrl_refs lines (excluding setup: line). 22 clone - 21 setup_ref = 1 skipped. Possibly only one class root present in kdirect's clone-hdl somehow. + +So the actual difference is: libva skips 8, kdirect skips 1. The 7 EXTRA skips for libva are: 6 HEVC controls (PPS, SLICE, SCALING, DECODE_PARAMS, DECODE_MODE, START_CODE) + 1 mystery. 
+ +### Mechanism status (post-iter22) + +| # | Mechanism | Status | +|---|---|---| +| 1 | request_fd mismatch | DISPROVED | +| 2 | REINIT clears | DISPROVED iter19 | +| 3 | Stack-locals stale | DISPROVED iter18 | +| 4 | ctrl_hdl mismatch | REFRAMED iter20 | +| 5 | error_idx silent failure | DISPROVED iter18 | +| 6 | req->p_new staging incomplete | iter21 → 22 OVERTURNED (clone IS complete) | +| 7 | Clone-hdl missing controls | DISPROVED iter22 (clone has all 22) | +| 8 | **NEW iter22**: 6 of 7 HEVC controls get skipped in v4l2_ctrl_request_setup loop for libva but not kdirect | **leading hypothesis** | + +### iter23 candidate + +Add printk **inside** the `v4l2_ctrl_request_setup` loop **before** the `continue` check, logging `req_done`, `flags`, `ncontrols`, `cluster[0]->id`: + +```c +pr_info("iter23_loop: id=0x%x req_done=%d flags=0x%x ncontrols=%d cluster0_id=0x%x\n", + ctrl->id, ref->req_done, ctrl->flags, + master->ncontrols, + master->cluster[0] ? master->cluster[0]->id : 0); +if (ref->req_done || (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY)) + continue; +``` + +Two possible findings: +- **req_done already true** for the 6 HEVC controls → an earlier iteration (HEVC_SPS) clustered them and set req_done. Means main_hdl's HEVC_SPS has `master->cluster` containing PPS+SLICE+SCALING+DECODE_PARAMS+DECODE_MODE+START_CODE on libva's path. +- **flags has READ_ONLY** → the controls have READ_ONLY set, which is wrong for stateless decode controls. + +`ncontrols` and `cluster0_id` reveal cluster membership directly: if `ncontrols > 1` for HEVC_SPS, it's been clustered with siblings. + +### Substrate state at iter22 close + +- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged). +- Fork tip `e109306` (α-24 reverted) — unchanged. +- Kernel `linux-fresnel-fourier 7.0-6` with iter17 + iter20 + iter21 + iter22 printks. +- 5-codec anchors: unchanged. 
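
Once the `iter23_loop` lines proposed earlier in this close-out are captured, each one can be sorted into the two candidate findings. A sketch assuming only that format string and `V4L2_CTRL_FLAG_READ_ONLY = 0x0004` (my reading of the uapi header); `classify` and the reason labels are hypothetical.

```python
import re

# Assumed value of V4L2_CTRL_FLAG_READ_ONLY from the V4L2 uapi header.
V4L2_CTRL_FLAG_READ_ONLY = 0x0004

# Matches the proposed iter23 pr_info format string.
LOOP_RE = re.compile(
    r"iter23_loop: id=0x([0-9a-fA-F]+) req_done=(\d+) flags=0x([0-9a-fA-F]+) "
    r"ncontrols=(\d+) cluster0_id=0x([0-9a-fA-F]+)"
)


def classify(line):
    """Return (ctrl_id, reason) for an iter23_loop line, or None if no match.

    reason is "req_done" or "read_only" when the continue check would skip
    the ref, and "processed" when the ref passes the check.
    """
    match = LOOP_RE.search(line)
    if match is None:
        return None
    ctrl_id = int(match.group(1), 16)
    req_done = int(match.group(2)) != 0
    flags = int(match.group(3), 16)
    if req_done:
        return ctrl_id, "req_done"
    if flags & V4L2_CTRL_FLAG_READ_ONLY:
        return ctrl_id, "read_only"
    return ctrl_id, "processed"
```

An ID that never produces an `iter23_loop` line at all falls outside both findings, which is a third outcome worth watching for in the capture.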
+ +### Iter23 build kicked off + +`linux-fresnel-fourier 7.0-7` building on boltzmann (PID 1643224, log /tmp/iter23-kbuild.log). Expected ~2 min via ccache. + +### Lesson + +iter21's mid-conclusion was based on the wrong printk position — the `iter21_setup_ref` printk was inside `v4l2_ctrl_request_setup`'s loop but AFTER the early-`continue` checks, missing controls that get skipped. iter22's clone-trace showed the clone IS complete; the staging FAILS via the setup-loop SKIP path, not the clone path. + +The empirical pattern is now: **libva's per-frame request gets through clone correctly; gets through S_EXT_CTRLS correctly (stages all 5 controls with p_req_valid=1 — at least for SPS, definitely 1); but at setup-loop time, 6 of the 7 HEVC controls get a `continue` that bypasses `req_to_new`**. SPS alone reaches `req_to_new` → `try_or_set_cluster` → commits to `p_cur` → rkvdec_run reads ctx->ctrl_hdl[SPS]->p_cur which is non-zero? + +Actually — wait, iter20 said sps[0..16] was zero for libva. If SPS is the only one that reaches req_to_new + try_or_set_cluster, then SPS's content SHOULD be correct, but iter20 said it's zero. So SPS itself ALSO has a problem in the commit path. + +So we have: +- 6 of 7 HEVC controls: never reach `req_to_new` (skipped in setup loop). +- 1 of 7 (SPS): reaches `req_to_new` but resulting `p_cur` content is zero anyway. + +These are TWO separate bugs (or one bug with two symptoms). iter23 will reveal the skip mechanism; another iter (or test) may need to address the SPS-commit-content issue. diff --git a/phase8_iteration23_close.md b/phase8_iteration23_close.md new file mode 100644 index 0000000..d522fbc --- /dev/null +++ b/phase8_iteration23_close.md @@ -0,0 +1,105 @@ +## Iteration 23 — Phase 8 (close) + +Closes 2026-05-14. iter23 = kernel printk inside `v4l2_ctrl_request_setup` outer loop, BEFORE the `continue` check, logging every iteration. FULL close. 
+ +### Method + +`linux-fresnel-fourier 7.0-7` added one pr_info at TOP of the outer loop in `v4l2_ctrl_request_setup`, BEFORE `if (ref->req_done || (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY)) continue;`: + +```c +pr_info("iter23_loop: id=0x%x req_done=%d flags=0x%x ncontrols=%d cluster0_id=0x%x\n", + ctrl->id, ref->req_done, ctrl->flags, + master->ncontrols, + master->cluster[0] ? master->cluster[0]->id : 0); +``` + +### Result — definitive + +**libva HEVC** (first setup): iter23_loop fires for 16 IDs ending at 0xa40a90 (HEVC_SPS). **The outer loop EXITS before reaching 0xa40a91.** + +**kdirect HEVC** (first setup): iter23_loop fires for 22 IDs ending at 0xa40a96 (HEVC_START_CODE). **The outer loop completes normally.** + +The loop body has only two loop-exit paths after the iter23_loop printk fires: +1. `goto error` if `req_to_new(r)` returns non-zero. +2. `break` if `try_or_set_cluster(NULL, master, true, 0)` returns non-zero. + +For libva, ONE of these fires AT HEVC_SPS, exiting the loop. For kdirect, NEITHER fires. + +This **fully overturns iter21/22's interpretation**: +- The clone-hdl IS complete for libva (iter22 confirmed all 22 controls cloned). +- The setup loop reaches HEVC_SPS for libva (iter23 confirmed). +- The processing of HEVC_SPS in the setup loop FAILS for libva. + +The failure of HEVC_SPS processing means: +- `p_cur` for HEVC_SPS is never committed → rkvdec reads zero (iter20 finding). +- All subsequent HEVC controls (PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS, DECODE_MODE, START_CODE) NEVER reach their processing → their `req_done` stays false but they're also never committed → all zero in `ctx->ctrl_hdl`. + +### Why does HEVC_SPS processing fail for libva but not kdirect? + +The most likely candidates: + +| Function | Failure modes | +|---|---| +| `req_to_new(ref_SPS)` | -ENOENT if `!p_req_valid`. -EINVAL if elem count mismatch (`p_req_elems != p_array_alloc_elems` for non-dyn-array). -ENOMEM if alloc fails for dyn-array resize. 
| +| `try_or_set_cluster(NULL, master_SPS, true, 0)` | Validator failures (out-of-range field values). Cluster ops failures. Often returns -EINVAL or -ERANGE. | + +iter24 will pinpoint which function fails and with what return value. + +### iter21/22's interpretation errors + +- **iter21**: I concluded the clone-hdl was missing controls. Wrong — the iter21_setup_ref printk was inside the loop body but AFTER the early-continue check. The "missing" controls were never iterated at all: SPS's processing failed and the loop exited before reaching them, so they never hit the iter21 printk. +- **iter22**: The clone trace confirmed clone-hdl is complete. Good. But my mid-conclusion ("clone-hdl is complete; staging fails in setup loop SKIP path") was partially wrong — the loop doesn't SKIP, it EXITS. + +### Mechanism status (post-iter23) + +| # | Mechanism | Status | +|---|---|---| +| 1 | request_fd mismatch | DISPROVED iter17/18 | +| 2 | REINIT clears | DISPROVED iter19 | +| 3 | Stack-locals stale | DISPROVED iter18 | +| 4 | ctrl_hdl mismatch | DISPROVED iter20-22 | +| 5 | error_idx silent failure | DISPROVED iter18 | +| 6 | req->p_new staging incomplete | DISPROVED iter22 | +| 7 | Clone-hdl missing controls | DISPROVED iter22 | +| 8 | Skip-loop bypass | DISPROVED iter23 (loop EXITS, not skips) | +| 9 | **NEW iter23**: HEVC_SPS processing in v4l2_ctrl_request_setup fails for libva | **LEADING — iter24 candidate** | + +### iter24 candidate + +`linux-fresnel-fourier 7.0-8`: + +```c +ret = req_to_new(r); +pr_info("iter24_req_to_new: id=0x%x ret=%d p_req_valid=%d p_req_elems=%u\n", + master->cluster[i]->id, ret, r->p_req_valid, r->p_req_elems); +... +ret = try_or_set_cluster(NULL, master, true, 0); +pr_info("iter24_try_or_set: master_id=0x%x ret=%d\n", master->id, ret); +``` + +After 7.0-8 deploys, libva HEVC will show: +- `iter24_req_to_new id=0xa40a90 ret=X p_req_valid=Y p_req_elems=Z` where X is the actual return value. 
+- If req_to_new ret != 0 → bug is in req_to_new for HEVC_SPS on libva's staged data. Compare p_req_elems to kdirect's value. +- If req_to_new ret == 0 → check iter24_try_or_set's ret. If non-zero → validator rejects libva's SPS but accepts kdirect's. Investigate which field validator rejects. + +### Substrate state at iter23 close + +- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged). +- Fork tip `e109306` — unchanged. +- Kernel `linux-fresnel-fourier 7.0-7` with iter17 + iter20 + iter21 + iter22 + iter23 printks. +- 5-codec anchors: unchanged. + +### iter24 build kicked off + +`linux-fresnel-fourier 7.0-8` building on boltzmann (PID 1672261, log /tmp/iter24-kbuild.log). + +### Lesson + +Three iterations of mid-loop printk (iter21, iter22, iter23) needed to localize the exit. Each iteration overturned the previous's partial conclusion. Key methodology: **place the diagnostic printk at the very top of each loop body, BEFORE any continue/break, to distinguish "skipped" from "exited"**. Without that, "missing from printk output" is ambiguous. + +The bug is now localized to: +- A specific function: `req_to_new` OR `try_or_set_cluster`. +- A specific control: HEVC_SPS. +- A specific request lifecycle pattern: libva's, not kdirect's. + +One more printk iteration (iter24) should give the failing function + return code. diff --git a/phase8_iteration24_close.md b/phase8_iteration24_close.md new file mode 100644 index 0000000..5419d08 --- /dev/null +++ b/phase8_iteration24_close.md @@ -0,0 +1,123 @@ +## Iteration 24 — Phase 8 (close) + +Closes 2026-05-14. iter24 = kernel printk logging `req_to_new` and `try_or_set_cluster` return values. FULL close. **ROOT CAUSE IDENTIFIED.** + +### Method + +`linux-fresnel-fourier 7.0-8` (pkgrel 7→8). 
Added pr_info after each kernel framework call in `v4l2_ctrl_request_setup`'s cluster-process block: + +```c +ret = req_to_new(r); +pr_info("iter24_req_to_new: id=0x%x ret=%d p_req_valid=%d p_req_elems=%u\n", + master->cluster[i]->id, ret, r->p_req_valid, r->p_req_elems); +... +ret = try_or_set_cluster(NULL, master, true, 0); +pr_info("iter24_try_or_set: master_id=0x%x ret=%d\n", master->id, ret); +``` + +### Result — definitive + +**libva HEVC** (all 10+ setups, identical pattern): + +``` +iter24_req_to_new: id=0xa40a90 ret=0 p_req_valid=1 p_req_elems=1 +iter24_try_or_set: master_id=0xa40a90 ret=-16 +iter24_loop_break: at master_id=0xa40a90 ret=-16 +iter24_loop_done: final ret=-16 +``` + +`-16` is `-EBUSY`. `req_to_new` succeeds. `try_or_set_cluster` returns -EBUSY for HEVC_SPS, **exiting the setup loop**. + +**kdirect HEVC**: continues processing all 5 staged controls successfully (ret=0 throughout). + +### Source localization + +The only -EBUSY path in `try_or_set_cluster` is `call_op(master, s_ctrl)` for HEVC_SPS, which dispatches to `rkvdec_s_ctrl` in `drivers/media/platform/rockchip/rkvdec/rkvdec.c:149`: + +```c +static int rkvdec_s_ctrl(struct v4l2_ctrl *ctrl) +{ + struct rkvdec_ctx *ctx = container_of(ctrl->handler, struct rkvdec_ctx, ctrl_hdl); + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc; + enum rkvdec_image_fmt image_fmt; + struct vb2_queue *vq; + + ... 
+ + /* Check if this change requires a capture format reset */ + if (!desc->ops->get_image_fmt) + return 0; + + image_fmt = desc->ops->get_image_fmt(ctx, ctrl); + if (rkvdec_image_fmt_changed(ctx, image_fmt)) { + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, + V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE); + if (vb2_is_busy(vq)) + return -EBUSY; // ← THIS + + ctx->image_fmt = image_fmt; + rkvdec_reset_decoded_fmt(ctx); + } + + return 0; +} +``` + +### Root cause + +When the first HEVC_SPS arrives, rkvdec needs to determine the output image format from SPS fields (chroma_format_idc, bit_depth_luma/chroma_minus8). If the format differs from the previous/default — which it does at first-frame because ctx->image_fmt starts at the default — rkvdec wants to reset the CAPTURE format. + +But it can only do that if the CAPTURE queue has NO buffers allocated. `vb2_is_busy(vq)` returns true if `vq->num_buffers > 0`. + +**libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design)**. By the time the first per-frame S_EXT_CTRLS(HEVC_SPS, REQUEST_VAL) fires, CAPTURE is already full → vb2_is_busy=true → -EBUSY → setup loop exits → SPS never committed → all-zero in ctx->ctrl_hdl → rkvdec_hevc_run reads zero. + +**kdirect (ffmpeg-v4l2request)** allocates CAPTURE buffers AFTER the SPS-driven format is known. So when its first S_EXT_CTRLS fires, CAPTURE is EMPTY → vb2_is_busy=false → format reset succeeds → s_ctrl returns 0 → SPS commits correctly. + +### This is THE Bug 5 root cause + +After 24 iterations of investigation, including 8 wire-byte hypothesis eliminations, 4 mechanism eliminations, and 5 kernel-side printk iterations: + +**Bug 5 (HEVC libva = all-zero CAPTURE) is caused by libva pre-allocating CAPTURE buffers before the first SPS-set, blocking rkvdec's format-reset.** + +Bug 4 (H264 libva = keyframe partial) is likely the same root cause — H264_SPS triggers the same image_fmt check via rkvdec_h264_fmt_ops's get_image_fmt. 
+ +### Why VP9 works through libva + +VP9 (rkvdec_vp9_ctrls) might NOT have a get_image_fmt op (vp9_frame is the only control, and chroma+bit_depth come from frame header, not a separate SPS). Or VP9's frame parameters always resolve to the same image_fmt as the default. Either way, no format-reset attempt → no -EBUSY. + +### Mechanism status — RESOLVED + +| # | Mechanism | Status | +|---|---|---| +| ALL prior | various | DISPROVED iter17-23 | +| **iter24** | **rkvdec_s_ctrl returns -EBUSY for HEVC_SPS because CAPTURE queue is busy with libva's pre-allocated pool** | **CONFIRMED — ROOT CAUSE** | + +### Fix candidates + +**Option A** (libva backend fix): Defer libva's CAPTURE pool allocation until AFTER the first per-frame SPS is set. Concretely: +- At CreateContext: skip cap_pool_init. +- On first BeginPicture/EndPicture: after first S_EXT_CTRLS(SPS) succeeds, then REQBUFS+QUERYBUF+MMAP the CAPTURE pool. +- Risk: changes the iter5b-β "permanent CAPTURE pool" model, may regress VP9/MPEG-2. + +**Option B** (libva backend fix, narrower): Use S_FMT(CAPTURE) BEFORE allocating CAPTURE buffers, with the same image_fmt the SPS will request. This way, ctx->image_fmt is already correct when SPS arrives → rkvdec_image_fmt_changed returns false → no reset attempt → no -EBUSY. + +**Option C** (kernel fix, upstream): Change rkvdec_s_ctrl to silently no-op the format-reset if the image_fmt is already correct, even if get_image_fmt returns a value that triggered the check. This is risky — it changes upstream rkvdec semantics. + +**Option B is preferred** — minimal libva change, aligns with kdirect's pattern (set S_FMT(CAPTURE) before allocating). + +### iter25 candidate + +Implement Option B in libva backend's CreateContext: explicit `v4l2_set_format(CAPTURE, V4L2_PIX_FMT_NV12, fixture_w, fixture_h)` BEFORE `cap_pool_init`. Set the expected format from BBB's parameters (chroma 4:2:0, 8-bit → NV12). 
+ +This builds on iter15's α-19 which already adds an explicit S_FMT(CAPTURE) call — but verify it ACTUALLY runs before cap_pool_init in the libva CreateContext flow. + +### Substrate state at iter24 close + +- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged). +- Fork tip `e109306` — unchanged. +- Kernel `linux-fresnel-fourier 7.0-8` with iter17 + iter20-24 printks. +- 5-codec anchors: unchanged. + +### Lesson + +8 iterations of wire-byte and ioctl-sequence analysis (iter11-iter18) chased an empirical illusion. Once kernel-side printk landed (iter17), 4 more iterations (iter20-23) walked the symptom down to one function call returning one specific error code. **The bug was in a 5-line kernel function we'd never read.** Now we have the right diagnosis and a clear forward path. diff --git a/phase8_iteration25_close.md b/phase8_iteration25_close.md new file mode 100644 index 0000000..61c5b99 --- /dev/null +++ b/phase8_iteration25_close.md @@ -0,0 +1,94 @@ +## Iteration 25 — Phase 8 (close) + +Closes 2026-05-14. iter25 = α-25 synthetic-SPS injection before cap_pool_init. **MAJOR WIN.** PARTIAL close — frame 1 byte-identical to kdirect for HEVC libva; frames 2+ have a separate wire-byte issue (decode_params). + +### α-25 implementation + +`src/context.c::RequestCreateContext` — after S_FMT(OUTPUT) + S_FMT(CAPTURE) + G_FMT(CAPTURE) sanity, BEFORE `cap_pool_init`: + +```c +switch (config_object->profile) { +case VAProfileHEVCMain: { + struct v4l2_ctrl_hevc_sps dummy_sps; + memset(&dummy_sps, 0, sizeof(dummy_sps)); + dummy_sps.chroma_format_idc = 1; /* 4:2:0 */ + dummy_sps.bit_depth_luma_minus8 = 0; /* 8-bit */ + dummy_sps.bit_depth_chroma_minus8 = 0; + dummy_sps.pic_width_in_luma_samples = picture_width; + dummy_sps.pic_height_in_luma_samples = picture_height; + /* ... v4l2_set_controls(video_fd, request_fd=-1, &SPS, 1) ... */ + break; +} +/* case VAProfileH264*: similar, with V4L2_CID_STATELESS_H264_SPS; default: skip */ +} +``` + +Fork `db0b7f9` — single commit. 
+ +### Result — definitive + +**Frame 1**: libva CAPTURE bytes = kdirect CAPTURE bytes (cmp identical for first 1382400 bytes, the entire frame 1 NV12 payload of 1280×720). + +**Frame 2+**: diverge starting at byte 1382401. + +### Kernel printk evidence (post-α-25) + +``` +iter24_req_to_new: id=0xa40a90 ret=0 p_req_valid=1 p_req_elems=1 +iter24_try_or_set: master_id=0xa40a90 ret=0 ← was -16 (EBUSY) before +iter24_req_to_new: id=0xa40a91 ret=0 +iter24_try_or_set: master_id=0xa40a91 ret=0 +iter24_req_to_new: id=0xa40a92 ret=0 +iter24_try_or_set: master_id=0xa40a92 ret=0 +iter24_req_to_new: id=0xa40a93 ret=0 +iter24_try_or_set: master_id=0xa40a93 ret=0 +iter24_req_to_new: id=0xa40a94 ret=0 +iter24_try_or_set: master_id=0xa40a94 ret=0 +rkvdec_iter20: sps[0..16]=00 00 00 05 d0 02 00 00 04 04 04 00 01 01 00 03 + ← non-zero, w=1280, h=720 +rkvdec_hevc_run: w=1280 h=720 chroma=1 nal_unit_type=20 slice_type=2 decode_flags=0x3 + ← rkvdec sees CORRECT SPS for the first time +``` + +`iter24_loop_break-count = 0` — the setup loop NEVER breaks. All 5 staged HEVC controls commit to ctx->ctrl_hdl successfully. + +### Bug 5 root cause: FIXED + +The -EBUSY block from rkvdec_s_ctrl's vb2_is_busy check is gone. ctx->image_fmt is pre-seeded to RKVDEC_IMG_FMT_420_8BIT by the synthetic SPS injection before any CAPTURE buffer is allocated. Per-frame SPS submissions find image_fmt_changed=false → skip reset → commit succeeds. + +### Frame 2+ divergence (separate Bug) + +`decode_params.short_term_ref_pic_set_size`: +- libva frame 2: bytes 4-5 = `00 00` → 0 +- kdirect frame 2: bytes 4-5 = `0a 00` → 10 + +libva's `h265_fill_decode_params` doesn't populate short_term_ref_pic_set_size (VAAPI doesn't expose it). kdirect parses it from the HEVC NAL directly. This affects DPB reference resolution for P/B frames. iter26 candidate. 
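The byte-level readings quoted above (the NV12 frame size, the width and height visible in the `rkvdec_iter20` SPS dump, and the `0a 00` field in `decode_params`) can be re-derived with a few lines of C. A minimal sketch, assuming little-endian 16-bit fields and taking the byte offsets (2 and 4 for the SPS dimensions) from the dump shown here; `le16_at` and `nv12_frame_bytes` are hypothetical helper names, not part of the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian u16 at a byte offset. */
static uint16_t le16_at(const uint8_t *buf, size_t off)
{
    assert(buf != NULL);
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/*
 * Hypothetical helper: 8-bit NV12 is a full-size luma plane plus a
 * half-size interleaved chroma plane, i.e. 1.5 bytes per pixel.
 */
static size_t nv12_frame_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* First 16 SPS bytes as printed by the rkvdec_iter20 line above. */
static const uint8_t sps_dump[16] = {
    0x00, 0x00, 0x00, 0x05, 0xd0, 0x02, 0x00, 0x00,
    0x04, 0x04, 0x04, 0x00, 0x01, 0x01, 0x00, 0x03,
};
```

With these, `le16_at(sps_dump, 2)` gives 1280, `le16_at(sps_dump, 4)` gives 720, and `nv12_frame_bytes(1280, 720)` gives 1382400. Since `cmp` numbers bytes from 1, a first difference at byte 1382401 is the first byte of frame 2. The same decoder applied to `0a 00` gives 10.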
+ +### Mechanism status + +| # | Mechanism | Status | +|---|---|---| +| 9 | rkvdec_s_ctrl -EBUSY on first SPS | **FIXED iter25 α-25** | +| 10 | decode_params.short_term_ref_pic_set_size = 0 | **NEW iter26 candidate** | + +### Substrate state at iter25 close + +- Backend SHA on fresnel: post-α-25 build (commit `db0b7f9`). +- Fork tip `db0b7f9` (α-25). +- Kernel `linux-fresnel-fourier 7.0-8` (diagnostic printks; should eventually revert to clean 7.0-1 + RFC v2 + iter12 baseline). +- HEVC libva frame 1 = kdirect frame 1 byte-identical. ✓✓✓ +- HEVC libva frame 2+: differs. + +### Anchors check pending + +Need to re-run 5-codec anchors to verify α-25 didn't regress VP9/MPEG-2/VP8 (it shouldn't — guard is `case VAProfileHEVCMain` / `case VAProfileH264*` only). + +### Lesson + +After 8 iterations chasing wire-byte hypotheses (iter11-iter18) and 5 iterations of kernel printk (iter17-iter24), the actual bug turned out to be an interaction between libva's CAPTURE-pre-allocate design and rkvdec's lazy image_fmt determination. The fix is 90 LOC in libva. The kernel was correct all along — it just needed a way to commit the image_fmt before buffers were locked in. + +This validates [[feedback-libva-byte-correct-kernel-bug]] only partially: libva WAS byte-correct in its ioctl content, but it had a CAPTURE-pool-allocation TIMING bug that interacted with kernel state. The bug is in libva, not the kernel, but the symptom only manifested because of kernel-side -EBUSY semantics that aren't well documented. + +### iter26 candidate + +Fix `h265_fill_decode_params` to populate `short_term_ref_pic_set_size`. VAAPI doesn't expose this directly, but it can be derived from `surface_object->params.h265.slices[0].short_term_ref_pic_set_size` (if VAAPI provides it) or parsed from the slice header. 
diff --git a/phase8_iteration26_close.md b/phase8_iteration26_close.md new file mode 100644 index 0000000..4ec1ea2 --- /dev/null +++ b/phase8_iteration26_close.md @@ -0,0 +1,76 @@ +## Iteration 26 — Phase 8 (close) + +Closes 2026-05-14. iter26 = α-26 `decode_params.short_term_ref_pic_set_size` from `VAPictureParameterBufferHEVC.st_rps_bits`. PARTIAL close. + +### α-26 fix + +`src/h265.c::h265_fill_decode_params` — replaced the comment "VAAPI doesn't expose" with the actual assignment: + +```c +decode_params->short_term_ref_pic_set_size = picture->st_rps_bits; +``` + +VAAPI's `VAPictureParameterBufferHEVC` exposes `st_rps_bits` (u32) as the bit-count of the inline `short_term_ref_pic_set` syntax element in the slice header. The previous comment in libva was wrong — the field IS exposed. + +Fork `66ef848`. + +### Empirical result + +- **HEVC frame 1** (IDR): libva CAPTURE = kdirect CAPTURE byte-identical. ✓ +- **HEVC frames 2–10**: still diverge. Hash unchanged from iter25 (`700aa52d…`). +- **decode_params bytes** (per iter20 kernel printk) NOW match kdirect for frames 1-3: + - libva frame 2: `dp[0..16] = 04 00 00 00 0a 00 00 00 01 01 00 00 00 00 00 00` + - kdirect frame 2: `dp[0..16] = 04 00 00 00 0a 00 00 00 01 01 00 00 00 00 00 00` ✓ + +α-26 fixed the first 16 bytes of decode_params for libva. But the output is identical to iter25 — so the divergence-causing bytes are NOT in `decode_params[0..16]`. + +### What still differs + +12,234,632 of 13,824,000 bytes diverge (frames 2-10 nearly all bytes off). Frame 1 is 1,382,400 bytes — byte-identical. + +`rkvdec_hevc_run` printk for libva still shows `reorder=4` (libva's incorrect `sps_max_num_reorder_pics = sps_max_dec_pic_buffering_minus1`) vs kdirect's `reorder=2`. But kernel source search shows `sps_max_num_reorder_pics` is referenced ONLY in our diagnostic printk — rkvdec_hevc_run hardware setup doesn't use it. So that's not the cause. + +The likely candidates for frame 2+ divergence: +1. 
**DPB entry mapping**: dpb[i].timestamp must match the CAPTURE buffer's timestamp. iter26 didn't probe this. Need to dump `dp[64..96]` (dpb[0..1] entries) and compare libva vs kdirect. +2. **CAPTURE buffer reuse pattern**: libva's iter5b-β has a 24-slot LRU; kdirect has a different pool size. Maybe the kernel's reference buffer association differs. +3. **slice_params bytes**: not yet inspected by printk. + +### Regression check (non-HEVC anchors) + +Run with α-25 + α-26 changes: + +| Codec | Result | Notes | +|---|---|---| +| H.264 (bbb_1080p30_h264, 3 frames) | **HW = kdirect byte-equal** | Bug 4 FIXED | +| HEVC (bbb_720p10s_hevc, 1 frame) | HW = kdirect byte-equal | Frame 1 FIXED | +| HEVC (bbb_720p10s_hevc, 10 frames) | HW ≠ kdirect | Frame 2+ separate bug | +| VP9 (bbb_720p10s_vp9) | HW = SW byte-equal | Unchanged | +| MPEG-2 (bbb_720p10s_mpeg2) | **Not testable this boot** | Pre-existing: vainfo only advertises rkvdec profiles, hantro paths not multi-probed | +| VP8 (bbb_720p10s_vp8) | **Not testable this boot** | Same pre-existing issue | + +The MPEG-2 / VP8 "not testable" state is due to the libva backend's single-device auto-select (it chose rkvdec at /dev/video1+/dev/media0 this boot). rkvdec doesn't advertise MPEG-2 / VP8 profiles. Testing these would need a libva-side multi-device profile probe or an explicit env override. + +### Major campaign milestone + +**Bug 4 (H.264 keyframe-partial → all-correct)** and **Bug 5 (HEVC libva all-zero → frame 1 correct)** root causes identified and **PARTIALLY FIXED** in two iterations after the 6 kernel-printk diagnostic iterations narrowed the failure to `rkvdec_s_ctrl -EBUSY` on first SPS. + +H.264 is now fully byte-equivalent to kdirect. HEVC has one remaining bug (frame 2+). + +### Substrate state at iter26 close + +- Backend fork tip: `66ef848` (α-25 + H264-flag-fix + α-26). +- Kernel `7.0-8` with diagnostic printks (will eventually revert to clean baseline). 
+- 5-codec status: H264 ✅, HEVC ⚠️ (frame 1 ✅), VP9 ✅, MPEG-2/VP8 untestable this boot. + +### iter27 candidate + +Add iter20-style kernel printk extension to dump `dp[64..96]` covering `dpb[0..1]` entries. Compare libva vs kdirect DPB entries for HEVC frame 2 to identify if it's: +- (a) timestamp mismatch → libva references a non-existent CAPTURE buffer. +- (b) pic_order_cnt_val mismatch. +- (c) DPB entry flags mismatch. + +OR alternatively, just inspect slice_params bytes for frame 2 (rkvdec's `run.slices_params[0]` printk extension). + +### Lesson + +Two iterations of libva-side patches (α-25 = synthetic SPS, α-26 = st_rps_bits) after the 24-iteration kernel-printk localization fixed Bug 4 fully and Bug 5 partially. The campaign's wire-byte hypothesis arc (iter11-iter18) was overturned by kernel printk, but THEN the actual fix was almost entirely on the libva side. The kernel was correct.
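For the iter27 comparison of `dp[64..96]` between libva and kdirect, a small helper that reports the first differing offset and the number of differing bytes may be enough to tell a single-field mismatch (for example one timestamp) from a wholesale one. This is a sketch only: `first_diff` and `count_diff` are hypothetical names, and how the two windows are captured from dmesg is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of the first differing byte, or len if equal. */
static size_t first_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return i;
    }
    return len;
}

/* Hypothetical helper: number of differing bytes in the window. */
static size_t count_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
    size_t n = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        n += (a[i] != b[i]);
    return n;
}
```

The same two helpers can be run per frame over the CAPTURE dumps, with `len` set to the 1,382,400-byte frame size, to see whether every frame after the first diverges from its first byte.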