iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5
iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24: pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
dp now match kdirect. HEVC frame 2+ still diverges
(separate bug, likely DPB entry mapping).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,60 @@
## Iteration 21 — Phase 4 (plan)

Opens 2026-05-14. Continues iter20's localization: rkvdec sees an all-zero SPS in `ctx->ctrl_hdl` for libva and real bytes for kdirect. The break lies between the userspace S_EXT_CTRLS call and rkvdec's read of `ctx->ctrl_hdl[SPS].p_cur.p`.

### Locked research question (iter21)

> *"At the v4l2_ctrl_request_setup() entry for libva's per-frame request_fd, is the V4L2 control-handler object (`obj`) found? For each control_ref in the request's hdl->ctrl_refs, is `p_req_valid == true`?"*

### What this narrows

`v4l2_ctrl_request_setup(req, main_hdl)` runs after IOC_QUEUE (from the driver's run preamble), iterates `hdl->ctrl_refs`, and only applies controls where `ref->p_req_valid == true`. The flag is set by the staging path in `try_set_ext_ctrls_common` (called from `try_set_ext_ctrls_request`) when `which=V4L2_CTRL_WHICH_REQUEST_VAL`.

If libva's S_EXT_CTRLS staged correctly, `p_req_valid` is true for each control libva submitted. If staging failed silently, `p_req_valid` is false and `v4l2_ctrl_request_setup` skips those controls, so `ctx->ctrl_hdl` stays at zero (which matches the iter20 evidence).

### Approach (α-24 inert → kernel path)

α-24 (libva G_EXT_CTRLS readback after S_EXT_CTRLS) was implemented in 1547a5d+a9c897f and returned `EACCES` for all 13 libva HEVC frames. It was reverted in e109306. The kernel disallows G_EXT_CTRLS against a not-yet-completed request, so userspace cannot probe `req->p_new`. Distinguishing between the candidate mechanisms therefore requires kernel printk.

iter21 patches `v4l2_ctrl_request_setup` with two printk lines:

```c
/* Once, right after the request object lookup. */
pr_info("iter21_setup: req=%p main_hdl=%p obj=%p\n",
        req, main_hdl, obj);

/* Once per control ref inside the setup loop. */
pr_info("iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d\n",
        ctrl->id, ref->p_req_valid, have_new_data);
```

Build `linux-fresnel-fourier 7.0-5` (pkgrel 4→5), deploy, reboot, run libva-HEVC + kdirect-HEVC, capture dmesg.

### Outcome interpretation

| obj at setup entry | p_req_valid (libva run) | Diagnosis |
|---|---|---|
| NULL | n/a | req has no v4l2_ctrl_handler bound at queue time. libva's S_EXT_CTRLS never staged. Bug in libva's request lifecycle. |
| non-NULL | all false | obj found, but staging path never set `p_req_valid`. Bug in `try_set_ext_ctrls_common` for libva's invocation. |
| non-NULL | true for SPS | staging worked but `req_to_new` / `try_or_set_cluster` failed silently. Bug in apply path. Needs another printk after `req_to_new`. |

iter21 finishes when one of these is confirmed for libva. Compare against the kdirect baseline (which should always show p_req_valid=true for SPS).

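
The three-row table can be read as a small decision function. The sketch below is only an illustration of that reading: the enum and function names are hypothetical, and the "all false" row is simplified to the SPS flag alone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three table rows. */
enum iter21_diag {
    DIAG_REQUEST_NOT_BOUND, /* obj == NULL */
    DIAG_STAGING_PATH,      /* obj found, p_req_valid false */
    DIAG_APPLY_PATH,        /* obj found, p_req_valid true for SPS */
};

/* Map the two observations from the iter21 printks to a table row. */
static enum iter21_diag iter21_diagnose(const void *obj, bool sps_req_valid)
{
    if (obj == NULL)
        return DIAG_REQUEST_NOT_BOUND;
    if (!sps_req_valid)
        return DIAG_STAGING_PATH;
    return DIAG_APPLY_PATH;
}
```

Each return value corresponds to one follow-up: the request lifecycle in libva, the staging path, or the apply path.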
### Substrate state at iter21 open

- Kernel `linux-fresnel-fourier 7.0-5` building on boltzmann (PID 1584834, log /tmp/iter21-kbuild.log).
- Backend SHA `c1d4bb53…` (iter15 stable), unchanged from iter15.
- Fork tip `e109306` (α-24 reverted).
- 5-codec anchors: unchanged. Zero regression.

### Phase 5 review note

Diagnostic kernel patch (2 pr_info calls in a well-known V4L2 framework function, no behavior change). Phase 5 review skipped per iter17 precedent for diagnostic-only kernel work.

### Phase 7 plan

After 7.0-5 deploys:

1. Reboot fresnel; sddm autologin reseats mfritsche.
2. `sudo dmesg -C`.
3. Run libva HEVC; capture rkvdec_iter20 + iter21_setup lines.
4. `sudo dmesg -C`.
5. Run kdirect HEVC; capture same.
6. Diff; localize bug to one of the three table-row diagnoses.
@@ -0,0 +1,62 @@
## Iteration 22 — Phase 4 (plan)

Opens 2026-05-14 following iter21's smoking-gun finding: libva's request-clone-handler is missing 6 of 7 HEVC stateless controls registered in main_hdl.

### Locked research question (iter22)

> *"At which control_id does `v4l2_ctrl_request_clone`'s iteration break for libva, and what error code does `handler_new_ref` return?"*

### Approach

Add printks to `v4l2_ctrl_request_clone` in `drivers/media/v4l2-core/v4l2-ctrls-request.c` (at clone start, per iteration, on break, and at the end):

```c
pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from);
// per iteration:
pr_info("iter22_clone_step: id=0x%x err=%d hdl_error=%d new_ref=%p\n",
        ctrl->id, err, hdl->error, new_ref);
// on break:
pr_info("iter22_clone_break: at id=0x%x err=%d hdl_error=%d\n",
        ctrl->id, err, hdl->error);
// on end:
pr_info("iter22_clone_end: hdl=%p err=%d\n", hdl, err);
```

Built as `linux-fresnel-fourier 7.0-6` (pkgrel 5→6). Deploy, reboot, run libva HEVC + kdirect HEVC. Diff.

### Outcome interpretation

| handler_new_ref return | hdl->error | Diagnosis |
|---|---|---|
| 0, new_ref=valid | 0 | Loop step succeeded, so the clone wouldn't break here. Look further. |
| 0, new_ref=NULL | 0 | Duplicate (skip silently). Means main_hdl has duplicate ctrl_refs, which is unlikely. |
| -ENOMEM | -ENOMEM | kzalloc failed. Memory pressure analysis needed. |
| 0, hdl->error=X | non-zero | Earlier auto-class-control insertion failed; subsequent handler_new_ref short-circuits. |
| -EINVAL | varies | Validation failed (e.g., overlapping ID range). |

### Coordinate with iter21 finding

If iter22 shows that libva's iteration never reaches 0xa40905 (H264_PRED_WEIGHTS) or 0xa40a91 (HEVC_PPS), with no `iter22_clone_break` logged at either ID, then the **source main_hdl itself** doesn't have these controls.

If iter22 shows the loop reaches 0xa40a91 with err=0 (i.e., NOT a break), then libva's clone-hdl actually DOES contain HEVC_PPS, and our iter21 printk was missing it (e.g., a list-ordering bug in the iteration). Unlikely but worth checking.

### Substrate state at iter22 open

- Kernel `linux-fresnel-fourier 7.0-6` building on boltzmann (PID 1613982, log /tmp/iter22-kbuild.log).
- Backend SHA `c1d4bb53…`, unchanged from iter15.
- Fork tip `e109306`, unchanged.
- 5-codec anchors: unchanged.

### Phase 5 review

Diagnostic-only kernel patch (printk-only, no behavior change). Skipped per iter17 precedent.

### Phase 7 plan

After 7.0-6 deploys:

1. Reboot fresnel; sddm autologin reseats mfritsche.
2. `sudo dmesg -C`.
3. Run libva HEVC; capture iter22_clone_* lines.
4. `sudo dmesg -C`.
5. Run kdirect HEVC; capture same.
6. Diff. Localize the break or absence-from-source.
@@ -0,0 +1,127 @@
## Iteration 20 — Phase 8 (close)

Closes 2026-05-14. iter20 = kernel printk for `&ctx->ctrl_hdl`, `run.sps`, `run.decode_params` pointers + first 16 bytes of each, executed at top of `rkvdec_hevc_run` (after `rkvdec_hevc_run_preamble`). FULL close. Mechanism 4 reframed; root-cause localized to one kernel layer.

### Method

`linux-fresnel-fourier 7.0-4` adds `rkvdec_iter20:` printk to RK3399 `rkvdec_hevc_run`:

```c
{
        u8 *sps_bytes = (u8 *)run.sps;
        u8 *dp_bytes = (u8 *)run.decode_params;

        pr_info("rkvdec_iter20: ctrl_hdl=%p sps=%p sps[0..16]=%*ph "
                "dp=%p dp[0..16]=%*ph\n",
                &ctx->ctrl_hdl, run.sps,
                16, sps_bytes ? sps_bytes : (u8 *)"",
                run.decode_params,
                16, dp_bytes ? dp_bytes : (u8 *)"");
}
```

Deployed via scp + `pacman -U` + reboot, with sddm autologin reseating mfritsche session. Build wall-clock 50 min on boltzmann.

### Results

**libva HEVC** (13 frames, all identical):

```
rkvdec_iter20: ctrl_hdl=00000000f9b036ba sps=00000000105406cf
sps[0..16]=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dp=00000000117b947e
dp[0..16]=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
```

**kdirect HEVC** (15 frames):

```
rkvdec_iter20: ctrl_hdl=00000000d3afe1db sps=0000000095c47ba1
sps[0..16]=00 00 00 05 d0 02 00 00 04 04 02 04 01 01 00 03
dp=00000000599ee83f
dp[0..16]=00..04..03 (varies per frame; correct, decode_params is per-frame)
```

### What this proves

1. **`&ctx->ctrl_hdl` differs between processes** (libva `f9b036ba`, kdirect `d3afe1db`). This is EXPECTED: each backend opens `/dev/video3` separately and gets its own `rkvdec_ctx` with its own private `ctrl_hdl`. This is normal V4L2 m2m.

2. **The `sps` pointer is stable across all libva frames** (`105406cf`), which confirms the SPS control is registered to the handler exactly once (at CreateContext / `rkvdec_init_ctrls`). The allocation exists and `v4l2_ctrl_find()` returns it correctly. The control structure is registered. Not a registration bug.

3. **libva's `*sps` content is all-zero**, while **kdirect's `*sps` has real bytes** (`00 00 00 05 d0 02 00 00 04 04 02 04 01 01 00 03`). Assuming the uAPI `struct v4l2_ctrl_hevc_sps` layout (two one-byte parameter-set IDs followed by the 16-bit `pic_width_in_luma_samples`), bytes 2-3 are `00 05`, i.e. `0x0500 = 1280` in little-endian. That matches kdirect's `rkvdec_hevc_run` printk showing `w=1280`. libva's bytes are zero, hence its `w=0 h=0` printk.

4. **libva's `*decode_params` is also all-zero** across all 13 frames. kdirect's varies per-frame. Confirms decode_params for libva never gets non-zero values into ctx->ctrl_hdl either.

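
As a cross-check of point 3, the kdirect bytes can be decoded by hand. This is a minimal sketch that assumes the width sits at byte offset 2 and the height at byte offset 4, both little-endian 16-bit (the layout of the uAPI `struct v4l2_ctrl_hevc_sps` as far as we can tell); `read_le16`, `sps_width`, and `sps_height` are illustrative helper names, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u16 from buf at the given byte offset. */
static uint16_t read_le16(const uint8_t *buf, size_t off)
{
    return (uint16_t)(buf[off] | ((uint16_t)buf[off + 1] << 8));
}

/* First 16 SPS bytes from the kdirect rkvdec_iter20 line. */
static const uint8_t kdirect_sps[16] = {
    0x00, 0x00, 0x00, 0x05, 0xd0, 0x02, 0x00, 0x00,
    0x04, 0x04, 0x02, 0x04, 0x01, 0x01, 0x00, 0x03,
};

/* Assumed offsets: width at bytes 2-3, height at bytes 4-5. */
static uint16_t sps_width(const uint8_t *sps)
{
    return read_le16(sps, 2);
}

static uint16_t sps_height(const uint8_t *sps)
{
    return read_le16(sps, 4);
}
```

Under that assumption the bytes decode to 1280 (`0x0500`) and 720 (`0x02d0`); the same helpers applied to libva's all-zero dump give 0 and 0.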
### Mechanism analysis

The SPS control is **registered to `ctx->ctrl_hdl`** (pointer valid, stable, same allocation across 13 frames). What's missing is the **content copy** from `S_EXT_CTRLS` userspace payload into the registered control's `p_cur.p` memory.

The V4L2 control-framework path for compound controls with `which=V4L2_CTRL_WHICH_REQUEST_VAL=0xf010000`:

```
userspace VIDIOC_S_EXT_CTRLS (which=REQUEST_VAL, request_fd=R, payload=...)
  → kernel v4l2_s_ext_ctrls()
  → which==REQUEST_VAL branch: looks up R's media_request,
    stages payload into req->p_new for each control
  → returns 0

userspace MEDIA_REQUEST_IOC_QUEUE on fd R
  → kernel queues req's pending bufs and pending controls
  → m2m schedules job → device_run callback
  → rkvdec_hevc_run_preamble():
      v4l2_ctrl_request_setup(req, &ctx->ctrl_hdl):
        copies req->p_new → ctx->ctrl_hdl[ctrl]->p_cur
  → rkvdec_hevc_run(): printk fires here, reads ctx->ctrl_hdl values
```

For libva, the printk fires at the **read** site and observes all-zero. Three places this can fail:

| # | Where | Likelihood |
|---|---|---|
| A | `v4l2_s_ext_ctrls` doesn't stage libva's payload into `req->p_new` for SPS | unknown, needs probe |
| B | `req->p_new` has correct bytes but `v4l2_ctrl_request_setup` doesn't run for libva's request | unknown, needs probe |
| C | `v4l2_ctrl_request_setup` runs but doesn't copy SPS for libva's request | unknown, needs probe |

The kernel-direct path WORKS through the same control framework on the same kernel and the same /dev/video3, so the bug is in **how libva invokes the request lifecycle**, not in the framework code itself.

### Mechanism status update (post-iter20)

| # | Mechanism | Status |
|---|---|---|
| 1 | request_fd mismatch (S_EXT_CTRLS R1, QUEUE R2) | strongly disfavored (strace shows consistent fd per frame, but worth one explicit verification) |
| 2 | REINIT clears between S_EXT_CTRLS and QUEUE | DISPROVED iter19 |
| 3 | Stack-locals stale | DISPROVED iter18 |
| 4 | ctrl_hdl mismatch (different handlers) | **REFRAMED iter20**: handlers differ (expected per-process), but BOTH register SPS correctly, and ctx->ctrl_hdl reads stable pointers. NOT a routing bug. |
| 5 | error_idx silent partial fail | DISPROVED iter18 |
| 6 | **NEW iter20**: req->p_new for SPS never receives libva's payload, OR v4l2_ctrl_request_setup never copies it into ctx->ctrl_hdl | **leading hypothesis** |

### User-level test for iter21

Libva can self-diagnose between A and B/C without kernel patches:

After `S_EXT_CTRLS(which=REQUEST_VAL, request_fd=R, payload=...)`, immediately issue:

- `G_EXT_CTRLS(which=REQUEST_VAL, request_fd=R)` for SPS.

If readback returns non-zero bytes → **req->p_new HAS the payload** (mechanism A disproved, B or C remains).

If readback returns zero → **req->p_new doesn't have it** (mechanism A confirmed).

The G_EXT_CTRLS path with which=REQUEST_VAL reads from `req->p_new` directly; that's the staging slot. The outcome localizes the bug to one of two kernel layers.

### Substrate state at iter20 close

- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged).
- Fork tip `415688d` (iter19 state, unchanged).
- Kernel `linux-fresnel-fourier 7.0-4` with iter17 + iter20 printk in rkvdec_hevc_run. NOT a shipping kernel; diagnostic only.
- 5-codec anchors: unchanged from iter15. Zero regression.

### iter21 candidate

`α-24`: Add G_EXT_CTRLS readback in libva's `h265_set_controls` right after every `v4l2_set_controls(... which=REQUEST_VAL ...)` call. Log first 16 bytes of returned SPS. ~15 LOC, fully reversible. Test in this single iter, then revert (diagnostic only, not for shipping).

Outcomes:

- **Non-zero readback** → req->p_new has libva's payload. Bug is in `v4l2_ctrl_request_setup` not running or not copying. iter22 = kernel printk in `v4l2_ctrl_request_setup` showing what gets copied for libva's request_fd at IOC_QUEUE time.
- **Zero readback** → req->p_new doesn't have libva's payload. Bug is in `v4l2_s_ext_ctrls` staging for libva's invocation. iter22 = kernel printk in `v4l2_s_ext_ctrls` showing what libva actually passed.

### Lesson

iter17 + iter20 prove `&ctx->ctrl_hdl` pointer routing is NOT the failure surface (registered controls are allocated correctly, found correctly, and pointer-stable). The failure surface is the **content copy** from userspace S_EXT_CTRLS into ctx->ctrl_hdl across the request lifecycle. Three iterations (17, 19, 20) of kernel printk have walked the bug-localization down from "anywhere in the kernel" → "S_EXT_CTRLS staging or v4l2_ctrl_request_setup application". Two more printk+probe iterations should reach the line of code.
@@ -0,0 +1,146 @@
## Iteration 21 — Phase 8 (close)

Closes 2026-05-14. iter21 = kernel printk at top of `v4l2_ctrl_request_setup` + per-ref dump. FULL close. **Smoking-gun finding: libva's request-clone-handler is missing 6 HEVC stateless controls registered in main_hdl.**

### Method

`linux-fresnel-fourier 7.0-5` (pkgrel 4→5) adds two `pr_info` calls to `v4l2_ctrl_request_setup` in `drivers/media/v4l2-core/v4l2-ctrls-request.c`:

```c
obj = media_request_object_find(req, &req_ops, main_hdl);
pr_info("iter21_setup: req=%p main_hdl=%p obj=%p\n", req, main_hdl, obj);
...
list_for_each_entry(ref, &hdl->ctrl_refs, node) {
        ...
        pr_info("iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d\n",
                ctrl->id, ref->p_req_valid, have_new_data);
        ...
}
```

Built in ~1 min via ccache reuse. Deployed via scp + `pacman -U` + reboot.

### α-24 result (predicate: kernel-only path required)

α-24 (libva G_EXT_CTRLS readback after S_EXT_CTRLS) implemented as 1547a5d → amended a9c897f → reverted e109306. Kernel returned **EACCES** for all 13 libva HEVC frames: this V4L2 build disallows userspace probing of `req->p_new` for an uncompleted request. The probe path must run inside the kernel.

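
The α-24 plan only anticipated two readback outcomes (non-zero or zero bytes); the EACCES result is a third, in which nothing can be inferred about `req->p_new`. A minimal sketch of that three-way classification, with hypothetical names and assuming `rc` is the raw `ioctl()` return value:

```c
#include <stddef.h>
#include <stdint.h>

enum readback_verdict {
    READBACK_STAGED,       /* non-zero bytes: payload reached the staging slot */
    READBACK_NOT_STAGED,   /* all-zero bytes: payload missing from staging */
    READBACK_INCONCLUSIVE, /* ioctl failed (e.g. EACCES): no inference possible */
};

/* Classify one G_EXT_CTRLS readback attempt. */
static enum readback_verdict classify_readback(int rc, const uint8_t *buf,
                                               size_t len)
{
    size_t i;

    if (rc != 0)
        return READBACK_INCONCLUSIVE;
    for (i = 0; i < len; i++)
        if (buf[i] != 0)
            return READBACK_STAGED;
    return READBACK_NOT_STAGED;
}
```

All 13 libva frames fell into the inconclusive case, which is why the probe moved into the kernel.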
### Result — definitive (libva vs kdirect)

**libva HEVC frame 1 setup** (clone-hdl ctrl_refs in ID order, 14 entries):

```
0x990a67 p_req_valid=0
0x990a6b p_req_valid=0
0x990b00 p_req_valid=0
0x990b67 p_req_valid=0
0x990b68 p_req_valid=0 (5 codec-class menu controls)
0xa40900 p_req_valid=0 H264_DECODE_MODE
0xa40901 p_req_valid=0 H264_START_CODE
0xa40902 p_req_valid=0 H264_SPS
0xa40903 p_req_valid=0 H264_PPS
0xa40904 p_req_valid=0 H264_SCALING_MATRIX
0xa40907 p_req_valid=0 H264_DECODE_PARAMS
0xa40a2c p_req_valid=0 (misc stateless)
0xa40a2d p_req_valid=0 (misc stateless)
0xa40a90 p_req_valid=1 have_new=1 HEVC_SPS (CLONE STOPS HERE)
```

**Missing from libva clone (vs kdirect):**

- 0xa40905 H264_PRED_WEIGHTS (compound)
- 0xa40906 H264_SLICE_PARAMS (compound, dyn_array)
- 0xa40a91 HEVC_PPS (compound)
- 0xa40a92 HEVC_SLICE_PARAMS (compound, dyn_array)
- 0xa40a93 HEVC_SCALING_MATRIX (compound)
- 0xa40a94 HEVC_DECODE_PARAMS (compound)
- 0xa40a95 HEVC_DECODE_MODE (menu)
- 0xa40a96 HEVC_START_CODE (menu)

**kdirect HEVC frame 1 setup** (same hdl, 21 entries: all of above PLUS the 8 missing):

```
... 14 entries as above ...
0xa40a91 p_req_valid=1 have_new=1 HEVC_PPS
0xa40a92 p_req_valid=1 have_new=1 HEVC_SLICE_PARAMS
0xa40a93 p_req_valid=1 have_new=1 HEVC_SCALING_MATRIX
0xa40a94 p_req_valid=1 have_new=1 HEVC_DECODE_PARAMS
0xa40a95 p_req_valid=0 HEVC_DECODE_MODE (device-init only)
0xa40a96 p_req_valid=0 HEVC_START_CODE (device-init only)
```

### What this means

`v4l2_ctrl_request_setup(req, main_hdl)`:

- finds `obj` for both libva and kdirect (non-NULL), so the request is properly bound.
- iterates `hdl->ctrl_refs`, but **libva's hdl is the request-clone-hdl, and it contains 14 of the 21 source controls**.
- libva's HEVC_SPS has `p_req_valid=1`, so staging worked for that one control.
- The other 6 HEVC controls (PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS, DECODE_MODE, START_CODE) **don't exist in the clone-hdl at all**, so they cannot be staged.

When libva submits its 5-control S_EXT_CTRLS batch (SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS), only SPS is registered in the clone-hdl. PPS, SLICE_PARAMS, SCALING_MATRIX, and DECODE_PARAMS find no ref → `prepare_ext_ctrls` returns `-EINVAL`. (This contradicts iter18 α-22's rc=0 and needs re-investigation of error_idx semantics for the request path; the userspace observation of rc=0 may not reflect the actual kernel error for compound-control lookups in request clones.)

### iter20's "zero SPS bytes" explained

iter20 showed `rkvdec sees sps[0..16]=00..00` for libva. That's because:

- HEVC_SPS *is* in the clone-hdl with `p_req_valid=1`, so it got STAGED.
- But the **content** in `req->p_new[SPS]` is all-zero.

Two possible reasons for zero content despite p_req_valid=1:

1. `user_to_new` ran on a zero payload from libva. iter15 strace ruled this out: libva's SPS payload is non-zero at ioctl entry.
2. `new_to_req` ran, but the data flow is somehow corrupted. Possible if the master/cluster lookup is wrong on the clone-hdl.

iter22 candidate: add a printk in `new_to_req` and `req_to_new` to log the copy: source pointer, dest pointer, first 4 bytes, payload size.

### Mechanism status (post-iter21)

| # | Mechanism | Status |
|---|---|---|
| 1 | request_fd mismatch | DISPROVED iter17/18 |
| 2 | REINIT clears | DISPROVED iter19 |
| 3 | Stack-locals stale | DISPROVED iter18 |
| 4 | ctrl_hdl mismatch | REFRAMED iter20 |
| 5 | error_idx silent failure | DISPROVED iter18 (but warrants re-check given iter21 finding) |
| 6 | req->p_new staging incomplete | **CONFIRMED iter21**: clone-hdl missing controls = staging cannot occur for 6 of 7 HEVC controls |
| 7 | **NEW iter21**: clone-hdl is missing controls that main_hdl has registered | **Root question for iter22** |

### Why is the clone incomplete?

`v4l2_ctrl_request_clone(new_hdl, from=main_hdl)` iterates `main_hdl->ctrl_refs` in ID-sorted order. After cloning HEVC_SPS (0xa40a90), the loop **stops** before HEVC_PPS (0xa40a91). An equivalent stop appears at H264_PRED_WEIGHTS (0xa40905); in both cases the stop is at a compound control inside its codec block.

Hypothesis: `handler_new_ref` returns a non-zero error at the first compound control AFTER an SPS-like single-struct compound, but **only when called from the request-clone path**. Or: `kzalloc(sizeof(*new_ref) + size_extra_req)` fails for controls with a larger `elem_size` (HEVC_PPS = 64 bytes, H264_PRED_WEIGHTS = 32 bytes; both are small and unlikely to OOM, but worth verifying).

Alt hypothesis: `handler_new_ref`'s auto-class-control insertion (`v4l2_ctrl_new_std`) fails for non-compound HEVC menu controls in the request-clone path, which propagates `hdl->error` and breaks subsequent iterations.

The same kernel succeeds for kdirect on the same `from` hdl, so something is **per-request-bind specific**, maybe related to request lifecycle timing in libva (iter6 permanent request_fd at CreateContext) vs kdirect (per-frame request_fd).

### Substrate state at iter21 close

- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged).
- Fork tip `e109306` (α-24 reverted).
- Kernel `linux-fresnel-fourier 7.0-5` with iter17 + iter20 + iter21 printks. NOT a shipping kernel.
- 5-codec anchors: unchanged. Zero regression.

### iter22 candidate

Add printks to `v4l2_ctrl_request_clone` and `handler_new_ref`:

```c
// in v4l2_ctrl_request_clone
pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from);

// per iteration
err = handler_new_ref(hdl, ctrl, &new_ref, false, true);
pr_info("iter22_clone_step: id=0x%x err=%d from_other=%d\n",
        ctrl->id, err, ref->from_other_dev);
if (err) {
        pr_info("iter22_clone_break: at id=0x%x err=%d hdl_error=%d\n",
                ctrl->id, err, hdl->error);
        break;
}
```

After 7.0-6 deploys, the libva HEVC run will show exactly which ctrl_id breaks the loop and the error code. Then we can localize either to `kzalloc` failure, `v4l2_ctrl_new_std` failure (auto-class), or some other condition.

### Lesson

iter21 overturns the iter11–iter18 hypothesis space entirely. The S_EXT_CTRLS ioctl wire-byte payload analysis was correct: libva's bytes match kdirect's. But **at the v4l2_ctrl framework level, libva's request-clone is missing the registered controls libva tries to stage**. The bug is in how the V4L2 control framework handles libva's specific request-binding pattern, NOT in libva's ioctl content.

This is the strongest narrowing since iter17. We've gone from "anywhere in kernel" → "kernel control framework" → "request-clone path specifically" → "iteration breaks at first compound HEVC control".
@@ -0,0 +1,118 @@
## Iteration 22 — Phase 8 (close)

Closes 2026-05-14. iter22 = kernel printk in `v4l2_ctrl_request_clone` tracing each `handler_new_ref` step. FULL close. **iter21's mid-conclusion is overturned: the request-clone-hdl is COMPLETE for libva, with all 22 controls cloned with err=0.**

### Method

`linux-fresnel-fourier 7.0-6` added per-step pr_info to `v4l2_ctrl_request_clone`:

```c
pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from);
list_for_each_entry(ref, &from->ctrl_refs, node) {
        ...
        err = handler_new_ref(hdl, ctrl, &new_ref, false, true);
        pr_info("iter22_clone_step: id=0x%x err=%d hdl_error=%d new_ref=%p\n",
                ctrl->id, err, hdl->error, new_ref);
        ...
}
pr_info("iter22_clone_end: hdl=%p err=%d\n", hdl, err);
```

Built in ~2 min via ccache. Deployed via scp + pacman -U + reboot.

### Results

**libva HEVC** (11 clones, one per request_fd binding):

Every clone-step logs `err=0 hdl_error=0 new_ref=valid_ptr`. Each clone has **22 controls successfully added**, ending with `iter22_clone_end err=0`. Full ID list per clone:

```
0x990001, 0x990a67, 0x990a6b, 0x990b00, 0x990b67, 0x990b68,
0xa40001, 0xa40900, 0xa40901, 0xa40902, 0xa40903, 0xa40904, 0xa40907,
0xa40a2c, 0xa40a2d,
0xa40a90, 0xa40a91, 0xa40a92, 0xa40a93, 0xa40a94, 0xa40a95, 0xa40a96
```

**Note**: H264_PRED_WEIGHTS (0xa40905) and H264_SLICE_PARAMS (0xa40906) are NOT in the source main_hdl (they are not registered in rkvdec_h264_ctrls on this kernel; rkvdec doesn't expose them). All 7 HEVC stateless decode controls ARE present.

**kdirect HEVC** (13 clones): identical pattern, identical ID set, all err=0.

### What this overturns from iter21

iter21 concluded the clone-hdl was missing 6 HEVC controls for libva. **That was wrong.** The clone-hdl actually has all 22 controls. The iter21_setup_ref printk's iteration was filtering 8 of them out via the early-return check:

```c
if (ref->req_done || (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY))
        continue;
```

For libva, only 14 refs reach the printk. For kdirect, 21 refs reach it. That is a skip count of 8 vs 1.

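
The skip count for libva can be reproduced from the two ID lists already recorded: the 22 IDs logged by `iter22_clone_step` and the 14 IDs logged by `iter21_setup_ref`. The sketch below is plain set subtraction over those lists; the array and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* IDs logged by iter22_clone_step for each libva clone (22 entries). */
static const uint32_t cloned[] = {
    0x990001, 0x990a67, 0x990a6b, 0x990b00, 0x990b67, 0x990b68,
    0xa40001, 0xa40900, 0xa40901, 0xa40902, 0xa40903, 0xa40904, 0xa40907,
    0xa40a2c, 0xa40a2d,
    0xa40a90, 0xa40a91, 0xa40a92, 0xa40a93, 0xa40a94, 0xa40a95, 0xa40a96,
};

/* IDs logged by iter21_setup_ref for libva frame 1 (14 entries). */
static const uint32_t visible[] = {
    0x990a67, 0x990a6b, 0x990b00, 0x990b67, 0x990b68,
    0xa40900, 0xa40901, 0xa40902, 0xa40903, 0xa40904, 0xa40907,
    0xa40a2c, 0xa40a2d, 0xa40a90,
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static int contains(const uint32_t *set, size_t n, uint32_t id)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (set[i] == id)
            return 1;
    return 0;
}

/* Write IDs present in `cloned` but absent from `visible`; return the count. */
static size_t missing_ids(uint32_t *out, size_t cap)
{
    size_t i, n = 0;

    for (i = 0; i < COUNT(cloned); i++)
        if (!contains(visible, COUNT(visible), cloned[i]) && n < cap)
            out[n++] = cloned[i];
    return n;
}
```

The result is 8 IDs: the two class roots (`0x990001`, `0xa40001`) and the six HEVC controls `0xa40a91` through `0xa40a96`.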
### What this implies

Two of the 8 skipped controls are clearly READ_ONLY: the codec class roots `0x990001` and `0xa40001` (V4L2_CTRL_TYPE_CTRL_CLASS). That accounts for 2 of 8 in both libva and kdirect.

For libva, **6 additional HEVC controls** (PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS, DECODE_MODE, START_CODE) get skipped. For kdirect, only **DECODE_MODE + START_CODE** would be skipped (the 2 with p_req_valid=0 that aren't staged this frame).

But kdirect shows DECODE_MODE + START_CODE in the setup_ref printk with p_req_valid=0, so they are NOT skipped by the continue check. That would make kdirect's 21 displayed = 22 cloned - 1 (only one CLASS root being printed?). This is a mismatch.

Re-examining: the iter21 setup_ref dump for kdirect had 21 ctrl_refs lines (excluding the `setup:` header line). 22 clone - 21 setup_ref = 1 skipped. Possibly only one class root is present in kdirect's clone-hdl somehow.

So the actual difference is: libva skips 8, kdirect skips 1. The 7 EXTRA skips for libva are: 6 HEVC controls (PPS, SLICE, SCALING, DECODE_PARAMS, DECODE_MODE, START_CODE) + 1 mystery.

### Mechanism status (post-iter22)

| # | Mechanism | Status |
|---|---|---|
| 1 | request_fd mismatch | DISPROVED |
| 2 | REINIT clears | DISPROVED iter19 |
| 3 | Stack-locals stale | DISPROVED iter18 |
| 4 | ctrl_hdl mismatch | REFRAMED iter20 |
| 5 | error_idx silent failure | DISPROVED iter18 |
| 6 | req->p_new staging incomplete | iter21 → 22 OVERTURNED (clone IS complete) |
| 7 | Clone-hdl missing controls | DISPROVED iter22 (clone has all 22) |
| 8 | **NEW iter22**: 6 of 7 HEVC controls get skipped in v4l2_ctrl_request_setup loop for libva but not kdirect | **leading hypothesis** |

### iter23 candidate

Add printk **inside** the `v4l2_ctrl_request_setup` loop **before** the `continue` check, logging `req_done`, `flags`, `ncontrols`, `cluster[0]->id`:

```c
pr_info("iter23_loop: id=0x%x req_done=%d flags=0x%x ncontrols=%d cluster0_id=0x%x\n",
        ctrl->id, ref->req_done, ctrl->flags,
        master->ncontrols,
        master->cluster[0] ? master->cluster[0]->id : 0);
if (ref->req_done || (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY))
        continue;
```

Two possible findings:

- **req_done already true** for the 6 HEVC controls → an earlier iteration (HEVC_SPS) clustered them and set req_done. Means main_hdl's HEVC_SPS has `master->cluster` containing PPS+SLICE+SCALING+DECODE_PARAMS+DECODE_MODE+START_CODE on libva's path.
- **flags has READ_ONLY** → the controls have READ_ONLY set, which is wrong for stateless decode controls.

`ncontrols` and `cluster0_id` reveal cluster membership directly: if `ncontrols > 1` for HEVC_SPS, it's been clustered with siblings.

### Substrate state at iter22 close

- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged).
- Fork tip `e109306` (α-24 reverted), unchanged.
- Kernel `linux-fresnel-fourier 7.0-6` with iter17 + iter20 + iter21 + iter22 printks.
- 5-codec anchors: unchanged.

### Iter23 build kicked off

`linux-fresnel-fourier 7.0-7` building on boltzmann (PID 1643224, log /tmp/iter23-kbuild.log). Expected ~2 min via ccache.

### Lesson

iter21's mid-conclusion was based on the wrong printk position: the `iter21_setup_ref` printk was inside `v4l2_ctrl_request_setup`'s loop but AFTER the early-`continue` checks, so it missed controls that get skipped. iter22's clone-trace showed the clone IS complete; the staging FAILS via the setup-loop SKIP path, not the clone path.

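
The effect of printk placement can be shown with a toy model of a loop that has an early `continue`. This is a sketch, not the kernel loop: `struct loop_ref` and `refs_logged` are hypothetical, and the model only illustrates why a log line placed after the skip check under-reports the refs in the handler. Whether the skip check is really what hides the HEVC controls is the question iter23 tests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a control ref in the setup loop. */
struct loop_ref {
    unsigned int id;
    bool req_done;
    bool read_only;
};

/*
 * Count how many refs a log line would report, depending on whether it
 * sits before or after the early-continue check.
 */
static size_t refs_logged(const struct loop_ref *refs, size_t n,
                          bool log_before_skip)
{
    size_t logged = 0, i;

    for (i = 0; i < n; i++) {
        if (log_before_skip)
            logged++;
        if (refs[i].req_done || refs[i].read_only)
            continue;
        if (!log_before_skip)
            logged++;
    }
    return logged;
}
```

With the log line before the check, every ref in the list is reported; after it, only refs that survive the check are.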
The empirical pattern is now: **libva's per-frame request gets through clone correctly; gets through S_EXT_CTRLS correctly (stages all 5 controls with p_req_valid=1; this is confirmed for SPS); but at setup-loop time, 6 of the 7 HEVC controls get a `continue` that bypasses `req_to_new`**. SPS alone reaches `req_to_new` → `try_or_set_cluster` → commits to `p_cur` → rkvdec_run reads ctx->ctrl_hdl[SPS]->p_cur, which should then be non-zero?

But iter20 said sps[0..16] was zero for libva. If SPS is the only control that reaches req_to_new + try_or_set_cluster, then SPS's content SHOULD be correct, yet iter20 shows zero. So SPS itself ALSO has a problem in the commit path.

So we have:

- 6 of 7 HEVC controls: never reach `req_to_new` (skipped in setup loop).
- 1 of 7 (SPS): reaches `req_to_new` but resulting `p_cur` content is zero anyway.

These are TWO separate bugs (or one bug with two symptoms). iter23 will reveal the skip mechanism; another iter (or test) may need to address the SPS-commit-content issue.
@@ -0,0 +1,105 @@
## Iteration 23 — Phase 8 (close)

Closes 2026-05-14. iter23 = kernel printk inside `v4l2_ctrl_request_setup` outer loop, BEFORE the `continue` check, logging every iteration. FULL close.

### Method

`linux-fresnel-fourier 7.0-7` added one pr_info at TOP of the outer loop in `v4l2_ctrl_request_setup`, BEFORE `if (ref->req_done || (ctrl->flags & V4L2_CTRL_FLAG_READ_ONLY)) continue;`:

```c
pr_info("iter23_loop: id=0x%x req_done=%d flags=0x%x ncontrols=%d cluster0_id=0x%x\n",
        ctrl->id, ref->req_done, ctrl->flags,
        master->ncontrols,
        master->cluster[0] ? master->cluster[0]->id : 0);
```

### Result — definitive

**libva HEVC** (first setup): iter23_loop fires for 16 IDs ending at 0xa40a90 (HEVC_SPS). **The outer loop EXITS before reaching 0xa40a91.**

**kdirect HEVC** (first setup): iter23_loop fires for 22 IDs ending at 0xa40a96 (HEVC_START_CODE). **The outer loop completes normally.**

The loop body has only two exit-loop paths after the iter23_loop printk fires:
1. `goto error` if `req_to_new(r)` returns non-zero.
2. `break` if `try_or_set_cluster(NULL, master, true, 0)` returns non-zero.

For libva, ONE of these fires AT HEVC_SPS, exiting the loop. For kdirect, NEITHER fires.

This **fully overturns iter21/22**:
- The clone-hdl IS complete for libva (iter22 confirmed all 22 controls cloned).
- The setup loop reaches HEVC_SPS for libva (iter23 confirmed).
- The processing of HEVC_SPS in the setup loop FAILS for libva.

The failure of HEVC_SPS processing means:
- `p_cur` for HEVC_SPS is never committed → rkvdec reads zero (iter20 finding).
- All subsequent HEVC controls (PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS, DECODE_MODE, START_CODE) are NEVER processed: their `req_done` stays false and nothing is committed, so they are all zero in `ctx->ctrl_hdl`.
### Why does HEVC_SPS processing fail for libva but not kdirect?

The most likely candidates:

| Function | Failure modes |
|---|---|
| `req_to_new(ref_SPS)` | -ENOENT if `!p_req_valid`. -EINVAL if elem count mismatch (`p_req_elems != p_array_alloc_elems` for non-dyn-array). -ENOMEM if alloc fails for dyn-array resize. |
| `try_or_set_cluster(NULL, master_SPS, true, 0)` | Validator failures (out-of-range field values). Cluster ops failures. Often returns -EINVAL or -ERANGE. |

iter24 will pinpoint which function fails and what return value.
### iter21/22's interpretation errors

- **iter21**: I concluded the clone-hdl was missing controls. Wrong: the iter21_setup_ref printk was inside the loop body but AFTER the early-continue check. The "missing" controls were never iterated at all, because SPS's processing failed and the loop exited before reaching them, so they never hit the iter21 printk.
- **iter22**: The clone trace confirmed clone-hdl is complete. Good. But my mid-conclusion ("clone-hdl is complete; staging fails in setup loop SKIP path") was partially wrong — the loop doesn't SKIP, it EXITS.
### Mechanism status (post-iter23)

| # | Mechanism | Status |
|---|---|---|
| 1 | request_fd mismatch | DISPROVED iter17/18 |
| 2 | REINIT clears | DISPROVED iter19 |
| 3 | Stack-locals stale | DISPROVED iter18 |
| 4 | ctrl_hdl mismatch | DISPROVED iter20-22 |
| 5 | error_idx silent failure | DISPROVED iter18 |
| 6 | req->p_new staging incomplete | DISPROVED iter22 |
| 7 | Clone-hdl missing controls | DISPROVED iter22 |
| 8 | Skip-loop bypass | DISPROVED iter23 (loop EXITS, not skips) |
| 9 | **NEW iter23**: HEVC_SPS processing in v4l2_ctrl_request_setup fails for libva | **LEADING — iter24 candidate** |
### iter24 candidate

`linux-fresnel-fourier 7.0-8`:

```c
ret = req_to_new(r);
pr_info("iter24_req_to_new: id=0x%x ret=%d p_req_valid=%d p_req_elems=%u\n",
	master->cluster[i]->id, ret, r->p_req_valid, r->p_req_elems);
...
ret = try_or_set_cluster(NULL, master, true, 0);
pr_info("iter24_try_or_set: master_id=0x%x ret=%d\n", master->id, ret);
```

After 7.0-8 deploys, libva HEVC will show:
- `iter24_req_to_new id=0xa40a90 ret=X p_req_valid=Y p_req_elems=Z` where X is the actual return value.
- If req_to_new ret != 0 → bug is in req_to_new for HEVC_SPS on libva's staged data. Compare p_req_elems to kdirect's value.
- If req_to_new ret == 0 → check iter24_try_or_set's ret. If non-zero → validator rejects libva's SPS but accepts kdirect's. Investigate which field validator rejects.
### Substrate state at iter23 close

- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged).
- Fork tip `e109306` — unchanged.
- Kernel `linux-fresnel-fourier 7.0-7` with iter17 + iter20 + iter21 + iter22 + iter23 printks.
- 5-codec anchors: unchanged.

### iter24 build kicked off

`linux-fresnel-fourier 7.0-8` building on boltzmann (PID 1672261, log /tmp/iter24-kbuild.log).

### Lesson

Three iterations of mid-loop printk (iter21, iter22, iter23) needed to localize the exit. Each iteration overturned the previous's partial conclusion. Key methodology: **place the diagnostic printk at the very top of each loop body, BEFORE any continue/break, to distinguish "skipped" from "exited"**. Without that, "missing from printk output" is ambiguous.

The bug is now localized to:
- A specific function: `req_to_new` OR `try_or_set_cluster`.
- A specific control: HEVC_SPS.
- A specific request lifecycle pattern: libva's, not kdirect's.

One more printk iteration (iter24) should give the failing function + return code.
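
The printk-placement lesson above can be reproduced in a userspace toy. This is a minimal sketch, not kernel code: `toy_ref`, `toy_trace` and `toy_walk` are hypothetical names, and the `skip` / `fail` flags only stand in for the `continue` and `break` conditions of the setup loop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one control ref; none of these names are kernel API. */
struct toy_ref {
	int skip;	/* models req_done / READ_ONLY: the loop does `continue` */
	int fail;	/* models try_or_set_cluster() != 0: the loop does `break` */
};

struct toy_trace {
	int seen_top;	/* hits of a probe placed at the top of the loop body */
	int seen_late;	/* hits of a probe placed after the `continue` check */
};

/* Walk the refs the way the setup loop does and count both probe positions. */
static struct toy_trace toy_walk(const struct toy_ref *refs, size_t n)
{
	struct toy_trace t = { 0, 0 };

	assert(refs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		t.seen_top++;		/* iter23-style probe */
		if (refs[i].skip)
			continue;
		t.seen_late++;		/* iter21-style probe */
		if (refs[i].fail)
			break;
	}
	return t;
}
```

With four refs, a run where the two middle refs are skipped and a run where the second ref fails both give `seen_late == 2`, so the late probe cannot tell the cases apart. The top-of-body probe gives 4 versus 2, which is the distinction iter23 needed.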
@@ -0,0 +1,123 @@
## Iteration 24 — Phase 8 (close)

Closes 2026-05-14. iter24 = kernel printk logging `req_to_new` and `try_or_set_cluster` return values. FULL close. **ROOT CAUSE IDENTIFIED.**

### Method

`linux-fresnel-fourier 7.0-8` (pkgrel 7→8). Added pr_info after each kernel framework call in `v4l2_ctrl_request_setup`'s cluster-process block:

```c
ret = req_to_new(r);
pr_info("iter24_req_to_new: id=0x%x ret=%d p_req_valid=%d p_req_elems=%u\n",
	master->cluster[i]->id, ret, r->p_req_valid, r->p_req_elems);
...
ret = try_or_set_cluster(NULL, master, true, 0);
pr_info("iter24_try_or_set: master_id=0x%x ret=%d\n", master->id, ret);
```
### Result — definitive

**libva HEVC** (all 10+ setups, identical pattern):

```
iter24_req_to_new: id=0xa40a90 ret=0 p_req_valid=1 p_req_elems=1
iter24_try_or_set: master_id=0xa40a90 ret=-16
iter24_loop_break: at master_id=0xa40a90 ret=-16
iter24_loop_done: final ret=-16
```

`-16` is `-EBUSY`. `req_to_new` succeeds. `try_or_set_cluster` returns -EBUSY for HEVC_SPS, **exiting the setup loop**.

**kdirect HEVC**: continues processing all 5 staged controls successfully (ret=0 throughout).
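
For reading `ret=` values in these logs without looking them up each time, a small decoder helps. This is a sketch that assumes Linux errno numbering (where `EBUSY` is 16); `ret_name` is a hypothetical helper and covers only the codes mentioned in these notes.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map a kernel-style return value (0 or a negative errno) to a name.
 * Only the codes discussed in these notes are listed.
 */
static const char *ret_name(int ret)
{
	assert(ret <= 0);
	switch (-ret) {
	case 0:
		return "OK";
	case ENOENT:
		return "ENOENT";
	case ENOMEM:
		return "ENOMEM";
	case EBUSY:
		return "EBUSY";
	case EINVAL:
		return "EINVAL";
	case ERANGE:
		return "ERANGE";
	default:
		return "unknown";
	}
}
```

Under that assumption, `ret_name(-16)` names the `iter24_try_or_set` failure as `EBUSY`, which rules out the validator-style `-EINVAL` / `-ERANGE` candidates listed for iter23.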
### Source localization

The only -EBUSY path in `try_or_set_cluster` is `call_op(master, s_ctrl)` for HEVC_SPS, which dispatches to `rkvdec_s_ctrl` in `drivers/media/platform/rockchip/rkvdec/rkvdec.c:149`:

```c
static int rkvdec_s_ctrl(struct v4l2_ctrl *ctrl)
{
	struct rkvdec_ctx *ctx = container_of(ctrl->handler, struct rkvdec_ctx, ctrl_hdl);
	const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
	enum rkvdec_image_fmt image_fmt;
	struct vb2_queue *vq;

	...

	/* Check if this change requires a capture format reset */
	if (!desc->ops->get_image_fmt)
		return 0;

	image_fmt = desc->ops->get_image_fmt(ctx, ctrl);
	if (rkvdec_image_fmt_changed(ctx, image_fmt)) {
		vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
				     V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE);
		if (vb2_is_busy(vq))
			return -EBUSY;  // ← THIS

		ctx->image_fmt = image_fmt;
		rkvdec_reset_decoded_fmt(ctx);
	}

	return 0;
}
```
### Root cause

When the first HEVC_SPS arrives, rkvdec needs to determine the output image format from SPS fields (chroma_format_idc, bit_depth_luma/chroma_minus8). If the format differs from the previous/default — which it does at first-frame because ctx->image_fmt starts at the default — rkvdec wants to reset the CAPTURE format.

But it can only do that if the CAPTURE queue has NO buffers allocated. `vb2_is_busy(vq)` returns true once buffers have been allocated on the queue (classically `vq->num_buffers > 0`).

**libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design)**. By the time the first per-frame S_EXT_CTRLS(HEVC_SPS, REQUEST_VAL) fires, the CAPTURE queue is already populated → vb2_is_busy=true → -EBUSY → setup loop exits → SPS never committed → all-zero in ctx->ctrl_hdl → rkvdec_hevc_run reads zero.

**kdirect (ffmpeg-v4l2request)** allocates CAPTURE buffers AFTER the SPS-driven format is known. So when its first S_EXT_CTRLS fires, CAPTURE is EMPTY → vb2_is_busy=false → format reset succeeds → s_ctrl returns 0 → SPS commits correctly.
### This is THE Bug 5 root cause

After 24 iterations of investigation, including 8 wire-byte hypothesis eliminations, 4 mechanism eliminations, and 5 kernel-side printk iterations:

**Bug 5 (HEVC libva = all-zero CAPTURE) is caused by libva pre-allocating CAPTURE buffers before the first SPS-set, blocking rkvdec's format-reset.**

Bug 4 (H264 libva = keyframe partial) is likely the same root cause — H264_SPS triggers the same image_fmt check via rkvdec_h264_fmt_ops's get_image_fmt.

### Why VP9 works through libva

VP9 (rkvdec_vp9_ctrls) might NOT have a get_image_fmt op (vp9_frame is the only control, and chroma+bit_depth come from frame header, not a separate SPS). Or VP9's frame parameters always resolve to the same image_fmt as the default. Either way, no format-reset attempt → no -EBUSY.

### Mechanism status — RESOLVED

| # | Mechanism | Status |
|---|---|---|
| ALL prior | various | DISPROVED iter17-23 |
| **iter24** | **rkvdec_s_ctrl returns -EBUSY for HEVC_SPS because CAPTURE queue is busy with libva's pre-allocated pool** | **CONFIRMED — ROOT CAUSE** |
### Fix candidates

**Option A** (libva backend fix): Defer libva's CAPTURE pool allocation until AFTER the first per-frame SPS is set. Concretely:
- At CreateContext: skip cap_pool_init.
- On first BeginPicture/EndPicture: after first S_EXT_CTRLS(SPS) succeeds, then REQBUFS+QUERYBUF+MMAP the CAPTURE pool.
- Risk: changes the iter5b-β "permanent CAPTURE pool" model, may regress VP9/MPEG-2.

**Option B** (libva backend fix, narrower): Use S_FMT(CAPTURE) BEFORE allocating CAPTURE buffers, with the same image_fmt the SPS will request. This way, ctx->image_fmt is already correct when SPS arrives → rkvdec_image_fmt_changed returns false → no reset attempt → no -EBUSY.

**Option C** (kernel fix, upstream): Change rkvdec_s_ctrl to silently no-op the format-reset if the image_fmt is already correct, even if get_image_fmt returns a value that triggered the check. This is risky — it changes upstream rkvdec semantics.

**Option B is preferred** — minimal libva change, aligns with kdirect's pattern (set S_FMT(CAPTURE) before allocating).

### iter25 candidate

Implement Option B in libva backend's CreateContext: explicit `v4l2_set_format(CAPTURE, V4L2_PIX_FMT_NV12, fixture_w, fixture_h)` BEFORE `cap_pool_init`. Set the expected format from BBB's parameters (chroma 4:2:0, 8-bit → NV12).

This builds on iter15's α-19 which already adds an explicit S_FMT(CAPTURE) call — but verify it ACTUALLY runs before cap_pool_init in the libva CreateContext flow.

### Substrate state at iter24 close

- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged).
- Fork tip `e109306` — unchanged.
- Kernel `linux-fresnel-fourier 7.0-8` with iter17 + iter20-24 printks.
- 5-codec anchors: unchanged.

### Lesson

8 iterations of wire-byte and ioctl-sequence analysis (iter11-iter18) chased an empirical illusion. Once kernel-side printk landed (iter17), 4 more iterations (iter20-23) walked the symptom down to one function call returning one specific error code. **The bug was in a 5-line kernel function we'd never read.** Now we have the right diagnosis and a clear forward path.
@@ -0,0 +1,94 @@
## Iteration 25 — Phase 8 (close)

Closes 2026-05-14. iter25 = α-25 synthetic-SPS injection before cap_pool_init. **MAJOR WIN.** PARTIAL close — frame 1 byte-identical to kdirect for HEVC libva; frames 2+ have separate wire-byte issue (decode_params).

### α-25 implementation

`src/context.c::RequestCreateContext` — after S_FMT(OUTPUT) + S_FMT(CAPTURE) + G_FMT(CAPTURE) sanity, BEFORE `cap_pool_init`:

```c
switch (config_object->profile) {
case VAProfileHEVCMain: {
	struct v4l2_ctrl_hevc_sps dummy_sps;

	memset(&dummy_sps, 0, sizeof(dummy_sps));
	dummy_sps.chroma_format_idc = 1;        /* 4:2:0 */
	dummy_sps.bit_depth_luma_minus8 = 0;    /* 8-bit */
	dummy_sps.bit_depth_chroma_minus8 = 0;
	dummy_sps.pic_width_in_luma_samples = picture_width;
	dummy_sps.pic_height_in_luma_samples = picture_height;
	/* ... v4l2_set_controls(video_fd, request_fd=-1, &SPS, 1) ... */
	break;
}
/* case VAProfileH264*: similar, with V4L2_CID_STATELESS_H264_SPS */
default:
	break;  /* skip */
}
```

Fork `db0b7f9` — single commit.
### Result — definitive

**Frame 1**: libva CAPTURE bytes = kdirect CAPTURE bytes (cmp identical for first 1382400 bytes, the entire frame 1 NV12 payload of 1280×720).

**Frame 2+**: diverge starting at byte 1382401.
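
The byte counts above follow from the NV12 layout: a full-resolution Y plane plus an interleaved UV plane at half the size. A short sketch of that arithmetic, assuming tightly packed planes (stride equal to width); `nv12_frame_bytes` and `frame_of_offset` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes in one NV12 frame: Y plane (w*h) plus interleaved UV plane (w*h/2).
 * Assumes tightly packed planes and even dimensions.
 */
static uint64_t nv12_frame_bytes(uint32_t width, uint32_t height)
{
	assert(width % 2 == 0 && height % 2 == 0);
	return (uint64_t)width * height * 3 / 2;
}

/* Map a 1-based byte offset (as cmp reports it) to a 1-based frame number. */
static uint64_t frame_of_offset(uint64_t offset_1based, uint64_t frame_bytes)
{
	assert(offset_1based >= 1 && frame_bytes > 0);
	return (offset_1based - 1) / frame_bytes + 1;
}
```

For 1280×720 this gives 1382400 bytes per frame, so a first difference at byte 1382401 is the very first byte of frame 2.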
### Kernel printk evidence (post-α-25)

```
iter24_req_to_new: id=0xa40a90 ret=0 p_req_valid=1 p_req_elems=1
iter24_try_or_set: master_id=0xa40a90 ret=0    ← was -16 (EBUSY) before
iter24_req_to_new: id=0xa40a91 ret=0
iter24_try_or_set: master_id=0xa40a91 ret=0
iter24_req_to_new: id=0xa40a92 ret=0
iter24_try_or_set: master_id=0xa40a92 ret=0
iter24_req_to_new: id=0xa40a93 ret=0
iter24_try_or_set: master_id=0xa40a93 ret=0
iter24_req_to_new: id=0xa40a94 ret=0
iter24_try_or_set: master_id=0xa40a94 ret=0
rkvdec_iter20: sps[0..16]=00 00 00 05 d0 02 00 00 04 04 04 00 01 01 00 03
                          ← non-zero, w=1280, h=720
rkvdec_hevc_run: w=1280 h=720 chroma=1 nal_unit_type=20 slice_type=2 decode_flags=0x3
                          ← rkvdec sees CORRECT SPS for the first time
```

`iter24_loop_break-count = 0` — the setup loop NEVER breaks. All 5 staged HEVC controls commit to ctx->ctrl_hdl successfully.

### Bug 5 root cause: FIXED

The -EBUSY block from rkvdec_s_ctrl's vb2_is_busy check is gone. ctx->image_fmt is pre-seeded to RKVDEC_IMG_FMT_420_8BIT by the synthetic SPS injection before any CAPTURE buffer is allocated. Per-frame SPS submissions find image_fmt_changed=false → skip reset → commit succeeds.
### Frame 2+ divergence (separate Bug)

`decode_params.short_term_ref_pic_set_size`:
- libva frame 2: bytes 4-5 = `00 00` → 0
- kdirect frame 2: bytes 4-5 = `0a 00` → 10

libva's `h265_fill_decode_params` doesn't populate short_term_ref_pic_set_size (VAAPI doesn't expose it). kdirect parses it from the HEVC NAL directly. This affects DPB reference resolution for P/B frames. iter26 candidate.

### Mechanism status

| # | Mechanism | Status |
|---|---|---|
| 9 | rkvdec_s_ctrl -EBUSY on first SPS | **FIXED iter25 α-25** |
| 10 | decode_params.short_term_ref_pic_set_size = 0 | **NEW iter26 candidate** |

### Substrate state at iter25 close

- Backend SHA on fresnel: post-α-25 build (commit `db0b7f9`).
- Fork tip `db0b7f9` (α-25).
- Kernel `linux-fresnel-fourier 7.0-8` (diagnostic printks; should eventually revert to clean 7.0-1 + RFC v2 + iter12 baseline).
- HEVC libva frame 1 = kdirect frame 1 byte-identical. ✓✓✓
- HEVC libva frame 2+: differs.

### Anchors check pending

Need to re-run 5-codec anchors to verify α-25 didn't regress VP9/MPEG-2/VP8 (it shouldn't — guard is `case VAProfileHEVCMain` / `case VAProfileH264*` only).

### Lesson

After 8 iterations chasing wire-byte hypotheses (iter11-iter18) and 5 iterations of kernel printk (iter17-iter24), the actual bug was an interaction between libva's CAPTURE-pre-allocate design and rkvdec's lazy image_fmt determination. The fix is 90 LOC in libva. The kernel was correct all along — it just needed a way to commit the image_fmt before buffers were locked in.

This validates [[feedback-libva-byte-correct-kernel-bug]] only partially: libva WAS byte-correct in its ioctl content, but it had a CAPTURE-pool-allocation TIMING bug that interacted with kernel state. The bug is in libva, not the kernel, but the symptom only manifested because of kernel-side -EBUSY semantics that aren't well documented.

### iter26 candidate

Fix `h265_fill_decode_params` to populate `short_term_ref_pic_set_size`. VAAPI doesn't expose this directly, but it can be derived from `surface_object->params.h265.slices[0].short_term_ref_pic_set_size` (if VAAPI provides it) or parsed from the slice header.
@@ -0,0 +1,76 @@
## Iteration 26 — Phase 8 (close)

Closes 2026-05-14. iter26 = α-26 `decode_params.short_term_ref_pic_set_size` from `VAPictureParameterBufferHEVC.st_rps_bits`. PARTIAL close.

### α-26 fix

`src/h265.c::h265_fill_decode_params` — replaced the comment "VAAPI doesn't expose" with the actual assignment:

```c
decode_params->short_term_ref_pic_set_size = picture->st_rps_bits;
```

VAAPI's `VAPictureParameterBufferHEVC` exposes `st_rps_bits` (u32) as the bit-count of the inline `short_term_ref_pic_set` syntax element in the slice header. The previous comment in libva was wrong — the field IS exposed.

Fork `66ef848`.

### Empirical result

- **HEVC frame 1** (IDR): libva CAPTURE = kdirect CAPTURE byte-identical. ✓
- **HEVC frames 2–10**: still diverge. Hash unchanged from iter25 (`700aa52d…`).
- **decode_params bytes** (per iter20 kernel printk) NOW match kdirect for frames 1-3:
  - libva frame 2: `dp[0..16] = 04 00 00 00 0a 00 00 00 01 01 00 00 00 00 00 00`
  - kdirect frame 2: `dp[0..16] = 04 00 00 00 0a 00 00 00 01 01 00 00 00 00 00 00` ✓

α-26 fixed the first 16 bytes of decode_params for libva. But the output is identical to iter25 — so the divergence-causing bytes are NOT in `decode_params[0..16]`.
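
To read the `dp[0..16]` dumps field by field rather than byte by byte, the leading bytes can be decoded as little-endian integers. This sketch assumes the leading layout of `struct v4l2_ctrl_hevc_decode_params` is a 32-bit `pic_order_cnt_val`, then two 16-bit set sizes, then an 8-bit `num_active_dpb_entries`; only the bytes 4-5 mapping is confirmed by these notes, the rest should be checked against the uAPI header in use. `dp_head` and `dp_decode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading decode_params fields, under the layout assumed in the lead-in. */
struct dp_head {
	int32_t pic_order_cnt_val;		/* bytes 0-3 */
	uint16_t short_term_ref_pic_set_size;	/* bytes 4-5 */
	uint16_t long_term_ref_pic_set_size;	/* bytes 6-7 */
	uint8_t num_active_dpb_entries;		/* byte 8 */
};

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first 9 bytes of a little-endian printk dump. */
static struct dp_head dp_decode(const uint8_t *dump, size_t len)
{
	struct dp_head h;

	assert(dump != NULL && len >= 9);
	h.pic_order_cnt_val = (int32_t)le32(dump);
	h.short_term_ref_pic_set_size = le16(dump + 4);
	h.long_term_ref_pic_set_size = le16(dump + 6);
	h.num_active_dpb_entries = dump[8];
	return h;
}
```

Applied to the frame 2 dump above, this yields a set size of 10 in bytes 4-5, the value kdirect reported in iter25, and under the assumed layout one active DPB entry, which is the part of the structure the dump does not yet cover.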
### What still differs

12,234,632 of 13,824,000 bytes diverge (frames 2-10 nearly all bytes off). Frame 1 is 1,382,400 bytes — byte-identical.

`rkvdec_hevc_run` printk for libva still shows `reorder=4` (libva's incorrect `sps_max_num_reorder_pics = sps_max_dec_pic_buffering_minus1`) vs kdirect's `reorder=2`. But kernel source search shows `sps_max_num_reorder_pics` is referenced ONLY in our diagnostic printk — rkvdec_hevc_run hardware setup doesn't use it. So that's not the cause.

The likely candidates for frame 2+ divergence:
1. **DPB entry mapping**: dpb[i].timestamp must match the CAPTURE buffer's timestamp. iter26 didn't probe this. Need to dump `dp[64..96]` (dpb[0..1] entries) and compare libva vs kdirect.
2. **CAPTURE buffer reuse pattern**: libva's iter5b-β has 24-slot LRU, kdirect has different pool size. Maybe the kernel's reference buffer association differs.
3. **slice_params bytes**: not yet inspected by printk.
### Regression check (non-HEVC anchors)

Run with α-25 + α-26 changes:

| Codec | Result | Notes |
|---|---|---|
| H.264 (bbb_1080p30_h264, 3 frames) | **HW = kdirect byte-equal** | Bug 4 FIXED |
| HEVC (bbb_720p10s_hevc, 1 frame) | HW = kdirect byte-equal | Frame 1 FIXED |
| HEVC (bbb_720p10s_hevc, 10 frames) | HW ≠ kdirect | Frame 2+ separate bug |
| VP9 (bbb_720p10s_vp9) | HW = SW byte-equal | Unchanged |
| MPEG-2 (bbb_720p10s_mpeg2) | **Not testable this boot** | Pre-existing: vainfo only advertises rkvdec profiles, hantro paths not multi-probed |
| VP8 (bbb_720p10s_vp8) | **Not testable this boot** | Same pre-existing issue |

The MPEG-2 / VP8 "not testable" state is due to libva backend's single-device auto-select (chose rkvdec at /dev/video1+/dev/media0 this boot). rkvdec doesn't advertise MPEG-2 / VP8 profiles. To test these, would need libva-side multi-device profile-probe or explicit env override.
### Major campaign milestone

**Bug 4 (H.264 keyframe-partial → all-correct)** and **Bug 5 (HEVC libva all-zero → frame 1 correct)** root causes identified and **PARTIALLY FIXED** in two iterations after the 6 kernel-printk diagnostic iterations narrowed the failure to `rkvdec_s_ctrl -EBUSY` on first SPS.

H.264 is now fully byte-equivalent to kdirect. HEVC has one remaining bug (frame 2+).

### Substrate state at iter26 close

- Backend fork tip: `66ef848` (α-25 + H264-flag-fix + α-26).
- Kernel `7.0-8` with diagnostic printks (will eventually revert to clean baseline).
- 5-codec status: H264 ✅, HEVC ⚠️ (frame 1 ✅), VP9 ✅, MPEG-2/VP8 untestable this boot.

### iter27 candidate

Add iter20-style kernel printk extension to dump `dp[64..96]` covering `dpb[0..1]` entries. Compare libva vs kdirect DPB entries for HEVC frame 2 to identify if it's:
- (a) timestamp mismatch → libva references a non-existent CAPTURE buffer.
- (b) pic_order_cnt_val mismatch.
- (c) DPB entry flags mismatch.

OR alternatively, just inspect slice_params bytes for frame 2 (rkvdec's `run.slices_params[0]` printk extension).

### Lesson

Two iterations of libva-side patches (α-25 = synthetic SPS, α-26 = st_rps_bits), following the 24-iteration localization, fixed Bug 4 fully and Bug 5 partially. The campaign's wire-byte hypothesis arc (iter11-iter18) was overturned by kernel printk, but THEN the actual fix was almost entirely on the libva side. The kernel was correct.