From 9092d9aaaa9f9d64ed68d9fa9cd1b936615a334b Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Mon, 18 May 2026 12:56:15 +0000 Subject: [PATCH 1/5] patches/driver/media: import Sarma's VP9-VDPU381 series (out-of-tree, v8) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three patches from D.V.A.B. Sarma adding VP9 decode support to the VDPU381 variant of rkvdec (RK3588 generation). Combined ~1500 LOC, 5 new files in drivers/media/platform/rockchip/rkvdec/. Provenance: github.com/dvab-sarma/android_kernel_rk_opi branch add-rkvdec-vdpu381-vp9-v8. Collabora's blog cites the work but it hasn't reached linux-media patchwork yet (Collabora: "v1 series needs to be sent for review soon"). Casanova's underlying VDPU381/VDPU383 H.264+HEVC base IS in mainline 7.0 release. Tested by author on Orange Pi 5 Pro (RK3588) with AOSP 16 + FFMPEG, Profile 0 + Profile 2. Tested in our fleet 2026-05-18: cherry-picks cleanly on top of ampere-minimal-devices, full kernel build (KERNELRELEASE 7.0.0-rc3-vp9-test+) succeeds clean with GCC 16.1.1. Image + DTB + modules + initramfs installed under -vp9-test+ suffix on ampere without touching the running -devices+ kernel; new extlinux label arch_vp9_test added (default unchanged at arch_devices). End-to-end VP9 decode verification pending operator reboot into the new label. Patches NOT yet referenced from fleet/ampere.yaml — that bump is the operator's call (manifest preamble currently scopes VP9 out per issue #6). Once verified, ampere.yaml can add these three under the scope-tagged patch list in apply order 0001→0002→0003. Cross-reference: marfrit/kernel-agent#12. --- ...ename-get_ref_buf-to-get_ref_buf_vp9.patch | 48 + ...ec-move-vp9-functions-to-common-file.patch | 387 +++++ ...-add-vp9-support-for-vdpu381-variant.patch | 1405 +++++++++++++++++ patches/driver/media/README.md | 69 + 4 files changed, 1909 insertions(+) create mode 100644 patches/driver/media/0001-rkvdec-vp9-rename-get_ref_buf-to-get_ref_buf_vp9.patch create mode 100644 patches/driver/media/0002-rkvdec-move-vp9-functions-to-common-file.patch create mode 100644 patches/driver/media/0003-rkvdec-add-vp9-support-for-vdpu381-variant.patch create mode 100644 patches/driver/media/README.md diff --git a/patches/driver/media/0001-rkvdec-vp9-rename-get_ref_buf-to-get_ref_buf_vp9.patch b/patches/driver/media/0001-rkvdec-vp9-rename-get_ref_buf-to-get_ref_buf_vp9.patch new file mode 100644 index 0000000..39dd5a3 --- /dev/null +++ b/patches/driver/media/0001-rkvdec-vp9-rename-get_ref_buf-to-get_ref_buf_vp9.patch @@ -0,0 +1,48 @@ +From 9ddcae54a171f2fc7742e92e03b1478d87ae4bbb Mon Sep 17 00:00:00 2001 +From: Venkata Atchuta Bheemeswara Sarma Darbha +Date: Sat, 17 Jan 2026 14:27:22 -0600 +Subject: [PATCH 1/3] media: rkvdec: vp9: Changing get_ref_buf function name to + get_ref_buf_vp9 + +This change is in preparation for the upcoming commits and to denote that this function is not to be confused with the similar function found in rkvdec's hevc. + +Change-Id: I934684778c375c6960a19989a702be44655c55d6 +Signed-off-by: Venkata Atchuta Bheemeswara Sarma Darbha +(cherry picked from commit f60174f07d9c56e7499ca3111d0999e26444cdfd) +--- + drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c | 10 +++++----- + 1 file changed, 5 insertions(+), 5 deletions(-) + +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c +index e4cdd2122873..bab2e9c83d06 100644 +--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c +@@ -349,7 +349,7 @@ static void init_probs(struct rkvdec_ctx *ctx, + } + + static struct rkvdec_decoded_buffer * +-get_ref_buf(struct rkvdec_ctx *ctx, struct vb2_v4l2_buffer *dst, u64 timestamp) ++get_ref_buf_vp9(struct rkvdec_ctx *ctx, struct vb2_v4l2_buffer *dst, u64 timestamp) + { + struct v4l2_m2m_ctx *m2m_ctx = ctx->fh.m2m_ctx; + struct vb2_queue *cap_q = &m2m_ctx->cap_q_ctx.q; +@@ -489,12 +489,12 @@ static void config_registers(struct rkvdec_ctx *ctx, + + dec_params = run->decode_params; + dst = vb2_to_rkvdec_decoded_buf(&run->base.bufs.dst->vb2_buf); +- ref_bufs[0] = get_ref_buf(ctx, &dst->base.vb, dec_params->last_frame_ts); +- ref_bufs[1] = get_ref_buf(ctx, &dst->base.vb, dec_params->golden_frame_ts); +- ref_bufs[2] = get_ref_buf(ctx, &dst->base.vb, dec_params->alt_frame_ts); ++ ref_bufs[0] = get_ref_buf_vp9(ctx, &dst->base.vb, dec_params->last_frame_ts); ++ ref_bufs[1] = get_ref_buf_vp9(ctx, &dst->base.vb, dec_params->golden_frame_ts); ++ ref_bufs[2] = get_ref_buf_vp9(ctx, &dst->base.vb, dec_params->alt_frame_ts); + + if (vp9_ctx->last.valid) +- last = get_ref_buf(ctx, &dst->base.vb, vp9_ctx->last.timestamp); ++ last = get_ref_buf_vp9(ctx, &dst->base.vb, vp9_ctx->last.timestamp); + else + last = dst; + +-- +2.54.0 + diff --git a/patches/driver/media/0002-rkvdec-move-vp9-functions-to-common-file.patch b/patches/driver/media/0002-rkvdec-move-vp9-functions-to-common-file.patch new file mode 100644 index 0000000..ddcb5db --- /dev/null +++ b/patches/driver/media/0002-rkvdec-move-vp9-functions-to-common-file.patch @@ -0,0 +1,387 @@ +From c5063d93e0e6011abe91418a98ed7c7550f0391b Mon Sep 17 00:00:00 2001 +From: Venkata Atchuta Bheemeswara Sarma Darbha +Date: Sat, 17 Jan 2026 14:37:07 -0600 +Subject: [PATCH 2/3] media: rkvdec: Move vp9 functions to common file This is + a preparation commit to add support for new variants of the decoder. + +The functions will later be shared with vdpu381 (rk3588). + +Change-Id: Ib9b78331fb6eb0e3a607b06fd5138fc741b2c9c0 +Signed-off-by: Venkata Atchuta Bheemeswara Sarma Darbha +(cherry picked from commit e87662ca32e88ebb910f6cfc1c71096d5d7bc063) +--- + .../media/platform/rockchip/rkvdec/Makefile | 1 + + .../rockchip/rkvdec/rkvdec-vp9-common.c | 77 +++++++++++ + .../rockchip/rkvdec/rkvdec-vp9-common.h | 95 +++++++++++++ + .../platform/rockchip/rkvdec/rkvdec-vp9.c | 125 +----------------- + 4 files changed, 174 insertions(+), 124 deletions(-) + create mode 100644 drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.c + create mode 100644 drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.h + +diff --git a/drivers/media/platform/rockchip/rkvdec/Makefile b/drivers/media/platform/rockchip/rkvdec/Makefile +index e629d571e4d8..2bbd67b2db11 100644 +--- a/drivers/media/platform/rockchip/rkvdec/Makefile ++++ b/drivers/media/platform/rockchip/rkvdec/Makefile +@@ -12,4 +12,5 @@ rockchip-vdec-y += \ + rkvdec-vdpu381-hevc.o \ + rkvdec-vdpu383-h264.o \ + rkvdec-vdpu383-hevc.o \ ++ rkvdec-vp9-common.o \ + rkvdec-vp9.o +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.c +new file mode 100644 +index 000000000000..93023737c1ed +--- /dev/null ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.c +@@ -0,0 +1,77 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Rockchip video decoder VP9 common functions ++ * ++ * Copyright (C) 2019 Collabora, Ltd. ++ * Boris Brezillon ++ * Copyright (C) 2021 Collabora, Ltd. ++ * Andrzej Pietrasiewicz ++ * ++ * Copyright (C) 2016 Rockchip Electronics Co., Ltd. ++ * Alpha Lin ++ */ ++#include ++#include ++#include ++ ++#include "rkvdec.h" ++#include "rkvdec-vp9-common.h" ++ ++void write_coeff_plane(const u8 coef[6][6][3], u8 *coeff_plane) ++{ ++ unsigned int idx = 0, byte_count = 0; ++ int k, m, n; ++ u8 p; ++ ++ for (k = 0; k < 6; k++) { ++ for (m = 0; m < 6; m++) { ++ for (n = 0; n < 3; n++) { ++ p = coef[k][m][n]; ++ coeff_plane[idx++] = p; ++ byte_count++; ++ if (byte_count == 27) { ++ idx += 5; ++ byte_count = 0; ++ } ++ } ++ } ++ } ++} ++ ++struct rkvdec_decoded_buffer * ++get_ref_buf_vp9(struct rkvdec_ctx *ctx, struct vb2_v4l2_buffer *dst, u64 timestamp) ++{ ++ struct v4l2_m2m_ctx *m2m_ctx = ctx->fh.m2m_ctx; ++ struct vb2_queue *cap_q = &m2m_ctx->cap_q_ctx.q; ++ struct vb2_buffer *buf; ++ ++ /* ++ * If a ref is unused or invalid, address of current destination ++ * buffer is returned. ++ */ ++ buf = vb2_find_buffer(cap_q, timestamp); ++ if (!buf) ++ buf = &dst->vb2_buf; ++ ++ return vb2_to_rkvdec_decoded_buf(buf); ++} ++ ++dma_addr_t get_mv_base_addr(struct rkvdec_decoded_buffer *buf) ++{ ++ unsigned int aligned_pitch, aligned_height, yuv_len; ++ ++ aligned_height = round_up(buf->vp9.height, 64); ++ aligned_pitch = round_up(buf->vp9.width * buf->vp9.bit_depth, 512) / 8; ++ yuv_len = (aligned_height * aligned_pitch * 3) / 2; ++ ++ return vb2_dma_contig_plane_dma_addr(&buf->base.vb.vb2_buf, 0) + ++ yuv_len; ++} ++ ++void update_dec_buf_info(struct rkvdec_decoded_buffer *buf, ++ const struct v4l2_ctrl_vp9_frame *dec_params) ++{ ++ buf->vp9.width = dec_params->frame_width_minus_1 + 1; ++ buf->vp9.height = dec_params->frame_height_minus_1 + 1; ++ buf->vp9.bit_depth = dec_params->bit_depth; ++} +\ No newline at end of file +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.h +new file mode 100644 +index 000000000000..056842cf1bba +--- /dev/null ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9-common.h +@@ -0,0 +1,95 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Rockchip video decoder VP9 common functions ++ * ++ * Copyright (C) 2019 Collabora, Ltd. ++ * Boris Brezillon ++ * Copyright (C) 2021 Collabora, Ltd. ++ * Andrzej Pietrasiewicz ++ * ++ * Copyright (C) 2016 Rockchip Electronics Co., Ltd. ++ * Alpha Lin ++ */ ++ ++#include ++#include ++#include ++ ++#include "rkvdec.h" ++ ++struct rkvdec_vp9_run { ++ struct rkvdec_run base; ++ const struct v4l2_ctrl_vp9_frame *decode_params; ++}; ++ ++struct rkvdec_vp9_intra_mode_probs { ++ u8 y_mode[105]; ++ u8 uv_mode[23]; ++}; ++ ++struct rkvdec_vp9_intra_only_frame_probs { ++ u8 coef_intra[4][2][128]; ++ struct rkvdec_vp9_intra_mode_probs intra_mode[10]; ++}; ++ ++struct rkvdec_vp9_inter_frame_probs { ++ u8 y_mode[4][9]; ++ u8 comp_mode[5]; ++ u8 comp_ref[5]; ++ u8 single_ref[5][2]; ++ u8 inter_mode[7][3]; ++ u8 interp_filter[4][2]; ++ u8 padding0[11]; ++ u8 coef[2][4][2][128]; ++ u8 uv_mode_0_2[3][9]; ++ u8 padding1[5]; ++ u8 uv_mode_3_5[3][9]; ++ u8 padding2[5]; ++ u8 uv_mode_6_8[3][9]; ++ u8 padding3[5]; ++ u8 uv_mode_9[9]; ++ u8 padding4[7]; ++ u8 padding5[16]; ++ struct { ++ u8 joint[3]; ++ u8 sign[2]; ++ u8 classes[2][10]; ++ u8 class0_bit[2]; ++ u8 bits[2][10]; ++ u8 class0_fr[2][2][3]; ++ u8 fr[2][3]; ++ u8 class0_hp[2]; ++ u8 hp[2]; ++ u8 padding6[3]; ++ } mv; ++}; ++ ++struct rkvdec_vp9_probs { ++ u8 partition[16][3]; ++ u8 pred[3]; ++ u8 tree[7]; ++ u8 skip[3]; ++ u8 tx32[2][3]; ++ u8 tx16[2][2]; ++ u8 tx8[2][1]; ++ u8 is_inter[4]; ++ /* 128 bit alignment */ ++ u8 padding0[3]; ++ union { ++ struct rkvdec_vp9_inter_frame_probs inter; ++ struct rkvdec_vp9_intra_only_frame_probs intra_only; ++ }; ++ /* 128 bit alignment */ ++ u8 padding1[8]; ++}; ++ ++ ++void write_coeff_plane(const u8 coef[6][6][3], u8 *coeff_plane); ++ ++struct rkvdec_decoded_buffer * ++get_ref_buf_vp9(struct rkvdec_ctx *ctx, struct vb2_v4l2_buffer *dst, u64 timestamp); ++ ++dma_addr_t get_mv_base_addr(struct rkvdec_decoded_buffer *buf); ++ ++void update_dec_buf_info(struct rkvdec_decoded_buffer *buf, ++ const struct v4l2_ctrl_vp9_frame *dec_params); +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c +index bab2e9c83d06..2b368d7b61e0 100644 +--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c +@@ -23,71 +23,12 @@ + + #include "rkvdec.h" + #include "rkvdec-regs.h" ++#include "rkvdec-vp9-common.h" + + #define RKVDEC_VP9_PROBE_SIZE 4864 + #define RKVDEC_VP9_COUNT_SIZE 13232 + #define RKVDEC_VP9_MAX_SEGMAP_SIZE 73728 + +-struct rkvdec_vp9_intra_mode_probs { +- u8 y_mode[105]; +- u8 uv_mode[23]; +-}; +- +-struct rkvdec_vp9_intra_only_frame_probs { +- u8 coef_intra[4][2][128]; +- struct rkvdec_vp9_intra_mode_probs intra_mode[10]; +-}; +- +-struct rkvdec_vp9_inter_frame_probs { +- u8 y_mode[4][9]; +- u8 comp_mode[5]; +- u8 comp_ref[5]; +- u8 single_ref[5][2]; +- u8 inter_mode[7][3]; +- u8 interp_filter[4][2]; +- u8 padding0[11]; +- u8 coef[2][4][2][128]; +- u8 uv_mode_0_2[3][9]; +- u8 padding1[5]; +- u8 uv_mode_3_5[3][9]; +- u8 padding2[5]; +- u8 uv_mode_6_8[3][9]; +- u8 padding3[5]; +- u8 uv_mode_9[9]; +- u8 padding4[7]; +- u8 padding5[16]; +- struct { +- u8 joint[3]; +- u8 sign[2]; +- u8 classes[2][10]; +- u8 class0_bit[2]; +- u8 bits[2][10]; +- u8 class0_fr[2][2][3]; +- u8 fr[2][3]; +- u8 class0_hp[2]; +- u8 hp[2]; +- } mv; +-}; +- +-struct rkvdec_vp9_probs { +- u8 partition[16][3]; +- u8 pred[3]; +- u8 tree[7]; +- u8 skip[3]; +- u8 tx32[2][3]; +- u8 tx16[2][2]; +- u8 tx8[2][1]; +- u8 is_inter[4]; +- /* 128 bit alignment */ +- u8 padding0[3]; +- union { +- struct rkvdec_vp9_inter_frame_probs inter; +- struct rkvdec_vp9_intra_only_frame_probs intra_only; +- }; +- /* 128 bit alignment */ +- u8 padding1[11]; +-}; +- + /* Data structure describing auxiliary buffer format. */ + struct rkvdec_vp9_priv_tbl { + struct rkvdec_vp9_probs probs; +@@ -136,11 +77,6 @@ struct rkvdec_vp9_intra_frame_symbol_counts { + struct rkvdec_vp9_refs_counts ref_cnt[2][4][2][6][6]; + }; + +-struct rkvdec_vp9_run { +- struct rkvdec_run base; +- const struct v4l2_ctrl_vp9_frame *decode_params; +-}; +- + struct rkvdec_vp9_frame_info { + u32 valid : 1; + u32 segmapid : 1; +@@ -166,27 +102,6 @@ struct rkvdec_vp9_ctx { + struct rkvdec_regs regs; + }; + +-static void write_coeff_plane(const u8 coef[6][6][3], u8 *coeff_plane) +-{ +- unsigned int idx = 0, byte_count = 0; +- int k, m, n; +- u8 p; +- +- for (k = 0; k < 6; k++) { +- for (m = 0; m < 6; m++) { +- for (n = 0; n < 3; n++) { +- p = coef[k][m][n]; +- coeff_plane[idx++] = p; +- byte_count++; +- if (byte_count == 27) { +- idx += 5; +- byte_count = 0; +- } +- } +- } +- } +-} +- + static void init_intra_only_probs(struct rkvdec_ctx *ctx, + const struct rkvdec_vp9_run *run) + { +@@ -348,36 +263,6 @@ static void init_probs(struct rkvdec_ctx *ctx, + init_inter_probs(ctx, run); + } + +-static struct rkvdec_decoded_buffer * +-get_ref_buf_vp9(struct rkvdec_ctx *ctx, struct vb2_v4l2_buffer *dst, u64 timestamp) +-{ +- struct v4l2_m2m_ctx *m2m_ctx = ctx->fh.m2m_ctx; +- struct vb2_queue *cap_q = &m2m_ctx->cap_q_ctx.q; +- struct vb2_buffer *buf; +- +- /* +- * If a ref is unused or invalid, address of current destination +- * buffer is returned. +- */ +- buf = vb2_find_buffer(cap_q, timestamp); +- if (!buf) +- buf = &dst->vb2_buf; +- +- return vb2_to_rkvdec_decoded_buf(buf); +-} +- +-static dma_addr_t get_mv_base_addr(struct rkvdec_decoded_buffer *buf) +-{ +- unsigned int aligned_pitch, aligned_height, yuv_len; +- +- aligned_height = round_up(buf->vp9.height, 64); +- aligned_pitch = round_up(buf->vp9.width * buf->vp9.bit_depth, 512) / 8; +- yuv_len = (aligned_height * aligned_pitch * 3) / 2; +- +- return vb2_dma_contig_plane_dma_addr(&buf->base.vb.vb2_buf, 0) + +- yuv_len; +-} +- + static void config_ref_registers(struct rkvdec_ctx *ctx, + const struct rkvdec_vp9_run *run, + struct rkvdec_decoded_buffer *ref_buf, +@@ -446,14 +331,6 @@ static void config_seg_registers(struct rkvdec_ctx *ctx, unsigned int segid) + (seg->flags & V4L2_VP9_SEGMENTATION_FLAG_ABS_OR_DELTA_UPDATE); + } + +-static void update_dec_buf_info(struct rkvdec_decoded_buffer *buf, +- const struct v4l2_ctrl_vp9_frame *dec_params) +-{ +- buf->vp9.width = dec_params->frame_width_minus_1 + 1; +- buf->vp9.height = dec_params->frame_height_minus_1 + 1; +- buf->vp9.bit_depth = dec_params->bit_depth; +-} +- + static void update_ctx_cur_info(struct rkvdec_vp9_ctx *vp9_ctx, + struct rkvdec_decoded_buffer *buf, + const struct v4l2_ctrl_vp9_frame *dec_params) +-- +2.54.0 + diff --git a/patches/driver/media/0003-rkvdec-add-vp9-support-for-vdpu381-variant.patch b/patches/driver/media/0003-rkvdec-add-vp9-support-for-vdpu381-variant.patch new file mode 100644 index 0000000..1a0ef49 --- /dev/null +++ b/patches/driver/media/0003-rkvdec-add-vp9-support-for-vdpu381-variant.patch @@ -0,0 +1,1405 @@ +From 48a8c785de7f5320513052a64e544c6310d7b273 Mon Sep 17 00:00:00 2001 +From: Venkata Atchuta Bheemeswara Sarma Darbha +Date: Sat, 17 Jan 2026 14:53:40 -0600 +Subject: [PATCH 3/3] media: rkvdec: Add VP9 support for VDPU381 variant The + VDPU381 supports VP9 decoding up to 7680x4320@30fps. + +It supports YUV420 (8 and 10 bits) i.e Profile 0 and Profile 2. + +Testing shows promising results. Testing done on Orange pi 5 pro board with aosp 16 and with FFMPEG. + +Change-Id: I612c3f1bd7693ab0a8081ee14f2caf1543b2f83d +Signed-off-by: Venkata Atchuta Bheemeswara Sarma Darbha +(cherry picked from commit aa00b89b6bbfd7570e459172417e2e72921689f4) +--- + .../media/platform/rockchip/rkvdec/Makefile | 1 + + .../rockchip/rkvdec/rkvdec-vdpu381-regs.h | 235 ++++ + .../rockchip/rkvdec/rkvdec-vdpu381-vp9.c | 1014 +++++++++++++++++ + .../media/platform/rockchip/rkvdec/rkvdec.c | 52 + + .../media/platform/rockchip/rkvdec/rkvdec.h | 1 + + 5 files changed, 1303 insertions(+) + create mode 100644 drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-vp9.c + +diff --git a/drivers/media/platform/rockchip/rkvdec/Makefile b/drivers/media/platform/rockchip/rkvdec/Makefile +index 2bbd67b2db11..5eae59e87fce 100644 +--- a/drivers/media/platform/rockchip/rkvdec/Makefile ++++ b/drivers/media/platform/rockchip/rkvdec/Makefile +@@ -10,6 +10,7 @@ rockchip-vdec-y += \ + rkvdec-rcb.o \ + rkvdec-vdpu381-h264.o \ + rkvdec-vdpu381-hevc.o \ ++ rkvdec-vdpu381-vp9.o \ + rkvdec-vdpu383-h264.o \ + rkvdec-vdpu383-hevc.o \ + rkvdec-vp9-common.o \ +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-regs.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-regs.h +index 6da36031df2d..2b409ee014a2 100644 +--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-regs.h ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-regs.h +@@ -4,6 +4,8 @@ + * + * Copyright (C) 2024 Collabora, Ltd. + * Detlev Casanova ++ * ++ * Copyright (C) 2025 Venkata Atchuta Bheemeswara Sarma Darbha + */ + + #include +@@ -382,6 +384,212 @@ struct rkvdec_vdpu381_regs_hevc_params { + + } __packed; + ++struct rkvdec_vdpu381_vp9_set { ++ u32 vp9_cprheader_offset : 16; ++ u32 reserved : 16; ++}__packed; ++ ++/* base: OFFSET_CODEC_PARAMS_REGS */ ++struct rkvdec_vdpu381_regs_vp9_params { ++ struct rkvdec_vdpu381_vp9_set reg064; ++ ++ u32 cur_top_poc; ++ u32 reserved0 ; ++ ++ struct rkvdec_vdpu381_vp9_segid_grp { ++ u32 vp9_segid_abs_delta : 1; ++ u32 vp9_segid_frame_qp_delta_en : 1; ++ u32 vp9_segid_frame_qp_delta : 9; ++ u32 vp9_segid_frame_loopfilter_value_en : 1; ++ u32 vp9_segid_frame_loopfilter_value : 7; ++ u32 vp9_segid_referinfo_en : 1; ++ u32 vp9_segid_referinfo : 2; ++ u32 vp9_segid_frame_skip_en : 1; ++ u32 reserved : 9; ++ } reg67_74[8]; ++ ++ struct rkvdec_vdpu381_vp9_info_lastframe { ++ u32 vp9_mode_deltas_lastframe : 14; ++ u32 reserved0 : 2; ++ u32 segmentation_enable_lstframe : 1; ++ u32 vp9_last_showframe : 1; ++ u32 vp9_last_intra_only : 1; ++ u32 vp9_last_widhheight_eqcur :1; ++ u32 vp9_color_sapce_lastkeyframe : 3; ++ u32 reserved1 : 9; ++ } reg75; ++ ++ ++ struct rkvdec_vdpu381_vp9_cprheader_config { ++ u32 vp9_tx_mode : 3; ++ u32 vp9_frame_reference_mode : 2; ++ u32 reserved : 27; ++ } reg76; ++ ++ struct rkvdec_vdpu381_vp9_intercmd_num { ++ u32 vp9_intercmd_num : 24; ++ u32 reserved : 8; ++ } reg77; ++ ++ u32 reg78_vp9_stream_size; ++ ++ struct rkvdec_vdpu381_vp9_lastf_y_hor_virstride { ++ u32 vp9_lastfy_hor_virstride : 16; ++ u32 reserved : 16; ++ } reg79; ++ ++ struct rkvdec_vdpu381_vp9_lastf_uv_hor_virstride { ++ u32 vp9_lastfuv_hor_virstride : 16; ++ u32 reserved : 16; ++ } reg80 ; ++ ++ struct rkvdec_vdpu381_vp9_goldenf_y_hor_virstride { ++ u32 vp9_goldenfy_hor_virstride : 16; ++ u32 reserved : 16; ++ } reg81; ++ ++ struct rkvdec_vdpu381_vp9_golden_uv_hot_virstride { ++ u32 vp9_goldenuv_hor_virstride : 16; ++ u32 reserved : 16; ++ } reg82; ++ ++ struct rkvdec_vdpu381_vp9_altreff_y_hor_virstride { ++ u32 vp9_altreffy_hor_virstride :16; ++ u32 reserved : 16; ++ } reg83; ++ ++ struct rkvdec_vdpu381_vp9_altreff_uv_hor_virstride { ++ u32 vp9_altreff_uv_hor_virstride : 16; ++ u32 reserved : 16; ++ } reg84; ++ ++ struct rkvdec_vdpu381_vp9_lastf_y_virstride { ++ u32 vp9_lastfy_virstride : 28; ++ u32 reserved : 4; ++ } reg85; ++ ++ struct rkvdec_vdpu381_vp9_golden_y_virstride { ++ u32 vp9_goldeny_virstride : 28; ++ u32 reserved : 4; ++ } reg86; ++ ++ struct rkvdec_vdpu381_vp9_altref_y_virstride { ++ u32 vp9_altrefy_virstride : 28; ++ u32 reserved : 4; ++ } reg87; ++ ++ struct rkvdec_vdpu381_vp9_lref_hor_scale { ++ u32 vp9_lref_hor_scale : 16; ++ u32 reserved : 16; ++ } reg88; ++ ++ struct rkvdec_vdpu381_vp9_lref_ver_scale { ++ u32 vp9_lref_ver_scale : 16; ++ u32 reserved : 16; ++ } reg89; ++ ++ struct rkvdec_vdpu381_vp9_gref_hor_scale { ++ u32 vp9_gref_hor_scale : 16; ++ u32 reserved : 16; ++ } reg90; ++ ++ struct rkvdec_vdpu381_vp9_gref_ver_scale { ++ u32 vp9_gref_ver_scale :16; ++ u32 reserved : 16; ++ } reg91; ++ ++ struct rkvdec_vdpu381_vp9_aref_hor_scale { ++ u32 vp9_aref_hor_scale : 16; ++ u32 reserved : 16; ++ } reg92; ++ ++ struct rkvdec_vdpu381_vp9_aref_ver_scale { ++ u32 vp9_aref_ver_scale : 16; ++ u32 reserved : 16; ++ } reg93; ++ ++ struct rkvdec_vdpu381_vp9_ref_deltas_lastframe { ++ u32 vp9_ref_deltas_lastframe : 28; ++ u32 reserved : 4; ++ } reg94; ++ ++ u32 reg95_vp9_last_poc; ++ ++ u32 reg96_vp9_golden_poc; ++ ++ u32 reg97_vp9_altref_poc; ++ ++ u32 reg98_vp9_col_ref_poc; ++ ++ struct rkvdec_vdpu381_vp9_prob_ref_poc { ++ u32 vp9_prob_ref_poc : 16; ++ u32 reserved : 16; ++ } reg99; ++ ++ struct rkvdec_vdpu381_vp9_segid_ref_poc { ++ u32 vp9_segid_ref_poc : 16; ++ u32 reserved : 16; ++ } reg100; ++ ++ u32 reserved1[2]; ++ ++ struct rkvdec_vdpu381_vp9_prob_en { ++ u32 reserved : 20; ++ u32 vp9_prob_update_en : 1; ++ u32 vp9_refresh_en : 1; ++ u32 vp9_prob_save_en : 1; ++ u32 vp9_intra_only_flag : 1; ++ u32 vp9_txfmmode_rfsh_en : 1; ++ u32 vp9_ref_mode_rfsh_en : 1; ++ u32 vp9_single_ref_rfsh_en : 1; ++ u32 vp9_comp_ref_rfsh_en : 1; ++ u32 vp9_interp_filter_switch_en : 1; ++ u32 vp9_allow_high_precision_mv : 1; ++ u32 vp9_last_key_frame_flag : 1; ++ u32 vp9_inter_coef_rfsh_flag :1; ++ } reg103; ++ ++ u32 reserved2; ++ ++ struct rkvdec_vdpu381_vp9_cnt_upd_en_avs2_headlen { ++ u32 avs2_head_len : 4; ++ u32 vp9count_update_en : 1; ++ u32 reserved : 27; ++ } reg105; ++ ++ struct rkvdec_vdpu381_vp9_frame_width_last { ++ u32 vp9_framewidth_last : 16; ++ u32 reserved : 16; ++ } reg106; ++ ++ struct rkvdec_vdpu381_vp9_frame_height_last { ++ u32 vp9_frameheight_last: 16; ++ u32 reserved : 16; ++ } reg107; ++ ++ struct rkvdec_vdpu381_vp9_frame_width_golden { ++ u32 vp9_framewidth_golden : 16; ++ u32 reserved : 16; ++ } reg108; ++ ++ struct rkvdec_vdpu381_vp9_frame_height_golden { ++ u32 vp9_frameheight_golden : 16; ++ u32 reserved : 16; ++ } reg109; ++ ++ struct rkvdec_vdpu381_vp9_frame_width_altref { ++ u32 vp9_framewidth_altref : 16; ++ u32 reserved : 16; ++ } reg110; ++ ++ struct rkvdec_vdpu381_vp9_frame_height_altref { ++ u32 vp9_frameheight_altref : 16; ++ u32 reserved : 16; ++ } reg111; ++ ++ u32 reserved3; ++} __packed; ++ + /* base: OFFSET_CODEC_ADDR_REGS */ + struct rkvdec_vdpu381_regs_h26x_addr { + u32 reserved_160; +@@ -394,6 +602,26 @@ struct rkvdec_vdpu381_regs_h26x_addr { + u32 reg199_cabactbl_base; + } __packed; + ++struct rkvdec_vdpu381_regs_vp9_addr { ++ u32 vp9_delta_prob_base; ++ u32 reserved0; ++ u32 vp9_last_prob_base; ++ u32 reserved1; ++ u32 vp9_referlast_base; ++ u32 vp9_refergolden_base; ++ u32 vp9_referalfter_base; ++ u32 vp9_count_base; ++ u32 vp9_segidlast_base; ++ u32 avp9_segidcur_base; ++ u32 vp9_refcolmv_base; ++ u32 vp9_intercmd_base; ++ u32 vp9_update_prob_wr_bas; ++ u32 reserved2[7]; ++ u32 scanlist_addr; ++ u32 colmv_base[16]; ++ u32 cabactbl_base; ++}__packed; ++ + struct rkvdec_vdpu381_regs_h26x_highpoc { + struct { + u32 ref0_poc_highbit : 4; +@@ -427,4 +655,11 @@ struct rkvdec_vdpu381_regs_hevc { + struct rkvdec_vdpu381_regs_h26x_highpoc hevc_highpoc; + } __packed; + ++struct rkvdec_vdpu381_regs_vp9 { ++ struct rkvdec_vdpu381_regs_common common; ++ struct rkvdec_vdpu381_regs_vp9_params vp9_param; ++ struct rkvdec_vdpu381_regs_common_addr common_addr; ++ struct rkvdec_vdpu381_regs_vp9_addr vp9_addr; ++}__packed; ++ + #endif /* __RKVDEC_REGS_H__ */ +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-vp9.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-vp9.c +new file mode 100644 +index 000000000000..3c7e1103946a +--- /dev/null ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-vp9.c +@@ -0,0 +1,1014 @@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * Rockchip Video Decoder VP9 backend ++ * ++ * Copyright (C) 2025 Venkata Atchuta Bheemeswara Sarma Darbha ++ * ++ * Copyright (C) 2019 Collabora, Ltd. ++ * Boris Brezillon ++ * Copyright (C) 2021 Collabora, Ltd. ++ * Andrzej Pietrasiewicz ++ * ++ * Copyright (C) 2016 Rockchip Electronics Co., Ltd. ++ * Alpha Lin ++ */ ++ ++/* ++ * For following the vp9 spec please start reading this driver ++ * code from rkvdec_vp9_run() followed by rkvdec_vp9_done(). ++ */ ++ ++ ++#include ++#include ++#include ++#include ++ ++ ++#include "rkvdec-rcb.h" ++#include "rkvdec.h" ++#include "rkvdec-vdpu381-regs.h" ++#include "rkvdec-vp9-common.h" ++ ++ ++#define RKVDEC_VP9_PROBE_SIZE 4864 ++#define RKVDEC_VP9_COUNT_SIZE 13208 ++#define RKVDEC_VP9_MAX_SEGMAP_SIZE 73728 ++ ++/* Data structure describing auxiliary buffer format. */ ++struct rkvdec_vp9_priv_tbl { ++ struct rkvdec_vp9_probs probs; ++ u8 segmap[2][RKVDEC_VP9_MAX_SEGMAP_SIZE]; ++}; ++ ++struct rkvdec_vp9_refs_counts { ++ u32 eob[2]; ++ u32 coeff[3]; ++}; ++ ++struct rkvdec_vp9_inter_frame_symbol_counts { ++ u32 partition[16][4]; ++ u32 skip[3][2]; ++ u32 inter[4][2]; ++ u32 tx32p[2][4]; ++ u32 tx16p[2][4]; ++ u32 tx8p[2][2]; ++ u32 y_mode[4][10]; ++ u32 uv_mode[10][10]; ++ u32 comp[5][2]; ++ u32 comp_ref[5][2]; ++ u32 single_ref[5][2][2]; ++ u32 mv_mode[7][4]; ++ u32 filter[4][3]; ++ u32 mv_joint[4]; ++ u32 sign[2][2]; ++ /* add 1 element for align */ ++ u32 classes[2][11 + 1]; ++ u32 class0[2][2]; ++ u32 bits[2][10][2]; ++ u32 class0_fp[2][2][4]; ++ u32 fp[2][4]; ++ u32 class0_hp[2][2]; ++ u32 hp[2][2]; ++ struct rkvdec_vp9_refs_counts ref_cnt[2][4][2][6][6]; ++}; ++ ++struct rkvdec_vp9_intra_frame_symbol_counts { ++ u32 partition[4][4][4]; ++ u32 skip[3][2]; ++ u32 intra[4][2]; ++ u32 tx32p[2][4]; ++ u32 tx16p[2][4]; ++ u32 tx8p[2][2]; ++ struct rkvdec_vp9_refs_counts ref_cnt[2][4][2][6][6]; ++}; ++ ++struct rkvdec_vp9_frame_info { ++ u32 valid : 1; ++ u32 segmapid : 1; ++ u32 frame_context_idx : 2; ++ u32 reference_mode : 2; ++ u32 tx_mode : 3; ++ u32 interpolation_filter : 3; ++ u32 flags; ++ u64 timestamp; ++ struct v4l2_vp9_segmentation seg; ++ struct v4l2_vp9_loop_filter lf; ++}; ++ ++struct rkvdec_vp9_ctx { ++ struct rkvdec_aux_buf priv_tbl; ++ struct rkvdec_aux_buf count_tbl; ++ struct v4l2_vp9_frame_symbol_counts inter_cnts; ++ struct v4l2_vp9_frame_symbol_counts intra_cnts; ++ struct v4l2_vp9_frame_context probability_tables; ++ struct v4l2_vp9_frame_context frame_context[4]; ++ struct rkvdec_vp9_frame_info cur; ++ struct rkvdec_vp9_frame_info last; ++ struct rkvdec_vdpu381_regs_vp9 regs; ++}; ++ ++static void init_intra_only_probs(struct rkvdec_ctx *ctx, ++ const struct rkvdec_vp9_run *run) ++{ ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vp9_priv_tbl *tbl = vp9_ctx->priv_tbl.cpu; ++ struct rkvdec_vp9_intra_only_frame_probs *rkprobs; ++ const struct v4l2_vp9_frame_context *probs; ++ unsigned int i, j, k; ++ ++ rkprobs = &tbl->probs.intra_only; ++ probs = &vp9_ctx->probability_tables; ++ ++ /* ++ * intra only 149 x 128 bits ,aligned to 152 x 128 bits coeff related ++ * prob 64 x 128 bits ++ */ ++ for (i = 0; i < ARRAY_SIZE(probs->coef); i++) { ++ for (j = 0; j < ARRAY_SIZE(probs->coef[0]); j++) ++ write_coeff_plane(probs->coef[i][j][0], ++ rkprobs->coef_intra[i][j]); ++ } ++ ++ /* intra mode prob 80 x 128 bits */ ++ for (i = 0; i < ARRAY_SIZE(v4l2_vp9_kf_y_mode_prob); i++) { ++ unsigned int byte_count = 0; ++ int idx = 0; ++ ++ /* vp9_kf_y_mode_prob */ ++ for (j = 0; j < ARRAY_SIZE(v4l2_vp9_kf_y_mode_prob[0]); j++) { ++ for (k = 0; k < ARRAY_SIZE(v4l2_vp9_kf_y_mode_prob[0][0]); ++ k++) { ++ u8 val = v4l2_vp9_kf_y_mode_prob[i][j][k]; ++ ++ rkprobs->intra_mode[i].y_mode[idx++] = val; ++ byte_count++; ++ if (byte_count == 27) { ++ byte_count = 0; ++ idx += 5; ++ } ++ } ++ } ++ } ++ ++ for (i = 0; i < sizeof(v4l2_vp9_kf_uv_mode_prob); ++i) { ++ const u8 *ptr = (const u8 *)v4l2_vp9_kf_uv_mode_prob; ++ ++ rkprobs->intra_mode[i / 23].uv_mode[i % 23] = ptr[i]; ++ } ++} ++ ++static void init_inter_probs(struct rkvdec_ctx *ctx, ++ const struct rkvdec_vp9_run *run) ++{ ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vp9_priv_tbl *tbl = vp9_ctx->priv_tbl.cpu; ++ struct rkvdec_vp9_inter_frame_probs *rkprobs; ++ const struct v4l2_vp9_frame_context *probs; ++ unsigned int i, j, k; ++ ++ rkprobs = &tbl->probs.inter; ++ probs = &vp9_ctx->probability_tables; ++ ++ /* ++ * inter probs ++ * 151 x 128 bits, aligned to 152 x 128 bits ++ * inter only ++ * intra_y_mode & inter_block info 6 x 128 bits ++ */ ++ ++ memcpy(rkprobs->y_mode, probs->y_mode, sizeof(rkprobs->y_mode)); ++ memcpy(rkprobs->comp_mode, probs->comp_mode, ++ sizeof(rkprobs->comp_mode)); ++ memcpy(rkprobs->comp_ref, probs->comp_ref, ++ sizeof(rkprobs->comp_ref)); ++ memcpy(rkprobs->single_ref, probs->single_ref, ++ sizeof(rkprobs->single_ref)); ++ memcpy(rkprobs->inter_mode, probs->inter_mode, ++ sizeof(rkprobs->inter_mode)); ++ memcpy(rkprobs->interp_filter, probs->interp_filter, ++ sizeof(rkprobs->interp_filter)); ++ ++ /* 128 x 128 bits coeff related */ ++ for (i = 0; i < ARRAY_SIZE(probs->coef); i++) { ++ for (j = 0; j < ARRAY_SIZE(probs->coef[0]); j++) { ++ for (k = 0; k < ARRAY_SIZE(probs->coef[0][0]); k++) ++ write_coeff_plane(probs->coef[i][j][k], ++ rkprobs->coef[k][i][j]); ++ } ++ } ++ ++ /* intra uv mode 6 x 128 */ ++ memcpy(rkprobs->uv_mode_0_2, &probs->uv_mode[0], ++ sizeof(rkprobs->uv_mode_0_2)); ++ memcpy(rkprobs->uv_mode_3_5, &probs->uv_mode[3], ++ sizeof(rkprobs->uv_mode_3_5)); ++ memcpy(rkprobs->uv_mode_6_8, &probs->uv_mode[6], ++ sizeof(rkprobs->uv_mode_6_8)); ++ memcpy(rkprobs->uv_mode_9, &probs->uv_mode[9], ++ sizeof(rkprobs->uv_mode_9)); ++ ++ /* mv related 6 x 128 */ ++ memcpy(rkprobs->mv.joint, probs->mv.joint, ++ sizeof(rkprobs->mv.joint)); ++ memcpy(rkprobs->mv.sign, probs->mv.sign, ++ sizeof(rkprobs->mv.sign)); ++ memcpy(rkprobs->mv.classes, probs->mv.classes, ++ sizeof(rkprobs->mv.classes)); ++ memcpy(rkprobs->mv.class0_bit, probs->mv.class0_bit, ++ sizeof(rkprobs->mv.class0_bit)); ++ memcpy(rkprobs->mv.bits, probs->mv.bits, ++ sizeof(rkprobs->mv.bits)); ++ memcpy(rkprobs->mv.class0_fr, probs->mv.class0_fr, ++ sizeof(rkprobs->mv.class0_fr)); ++ memcpy(rkprobs->mv.fr, probs->mv.fr, ++ sizeof(rkprobs->mv.fr)); ++ memcpy(rkprobs->mv.class0_hp, probs->mv.class0_hp, ++ sizeof(rkprobs->mv.class0_hp)); ++ memcpy(rkprobs->mv.hp, probs->mv.hp, ++ sizeof(rkprobs->mv.hp)); ++} ++ ++static void init_probs(struct rkvdec_ctx *ctx, ++ const struct rkvdec_vp9_run *run) ++{ ++ const struct v4l2_ctrl_vp9_frame *dec_params; ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vp9_priv_tbl *tbl = vp9_ctx->priv_tbl.cpu; ++ struct rkvdec_vp9_probs *rkprobs = &tbl->probs; ++ const struct v4l2_vp9_segmentation *seg; ++ const struct v4l2_vp9_frame_context *probs; ++ bool intra_only; ++ ++ dec_params = run->decode_params; ++ probs = &vp9_ctx->probability_tables; ++ seg = &dec_params->seg; ++ ++ memset(rkprobs, 0, sizeof(*rkprobs)); ++ ++ intra_only = !!(dec_params->flags & ++ (V4L2_VP9_FRAME_FLAG_KEY_FRAME | ++ V4L2_VP9_FRAME_FLAG_INTRA_ONLY)); ++ ++ /* sb info 5 x 128 bit */ ++ memcpy(rkprobs->partition, ++ intra_only ? v4l2_vp9_kf_partition_probs : probs->partition, ++ sizeof(rkprobs->partition)); ++ ++ memcpy(rkprobs->pred, seg->pred_probs, sizeof(rkprobs->pred)); ++ memcpy(rkprobs->tree, seg->tree_probs, sizeof(rkprobs->tree)); ++ memcpy(rkprobs->skip, probs->skip, sizeof(rkprobs->skip)); ++ memcpy(rkprobs->tx32, probs->tx32, sizeof(rkprobs->tx32)); ++ memcpy(rkprobs->tx16, probs->tx16, sizeof(rkprobs->tx16)); ++ memcpy(rkprobs->tx8, probs->tx8, sizeof(rkprobs->tx8)); ++ memcpy(rkprobs->is_inter, probs->is_inter, sizeof(rkprobs->is_inter)); ++ ++ if (intra_only) ++ init_intra_only_probs(ctx, run); ++ else ++ init_inter_probs(ctx, run); ++} ++ ++static void config_ref_registers(struct rkvdec_ctx *ctx, ++ const struct rkvdec_vp9_run *run, ++ struct rkvdec_decoded_buffer *ref_buf, ++ int i) ++{ ++ unsigned int aligned_pitch, aligned_height, y_len, yuv_len; ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vdpu381_regs_vp9 *regs = &vp9_ctx->regs; ++ ++ aligned_height = round_up(ref_buf->vp9.height, 64); ++ ++ switch(i) { ++ case 0: ++ regs->vp9_param.reg107.vp9_frameheight_last = ref_buf->vp9.height; ++ regs->vp9_param.reg106.vp9_framewidth_last = ref_buf->vp9.width; ++ break; ++ case 1: ++ regs->vp9_param.reg109.vp9_frameheight_golden = ref_buf->vp9.height; ++ regs->vp9_param.reg108.vp9_framewidth_golden = ref_buf->vp9.width; ++ break; ++ case 2: ++ regs->vp9_param.reg111.vp9_frameheight_altref = ref_buf->vp9.height; ++ regs->vp9_param.reg110.vp9_framewidth_altref = ref_buf->vp9.width; ++ break; ++ } ++ ++ switch(i) { ++ case 0: ++ regs->vp9_addr.vp9_referlast_base = vb2_dma_contig_plane_dma_addr(&ref_buf->base.vb.vb2_buf, 0); ++ break; ++ case 1: ++ regs->vp9_addr.vp9_refergolden_base = vb2_dma_contig_plane_dma_addr(&ref_buf->base.vb.vb2_buf, 0); ++ break; ++ case 2: ++ regs->vp9_addr.vp9_referalfter_base = vb2_dma_contig_plane_dma_addr(&ref_buf->base.vb.vb2_buf, 0); ++ break; ++ } ++ ++ if (&ref_buf->base.vb == run->base.bufs.dst) ++ return; ++ ++ aligned_pitch = round_up(ref_buf->vp9.width * ref_buf->vp9.bit_depth, 512) / 8; ++ y_len = aligned_height * aligned_pitch; ++ yuv_len = (y_len * 3) / 2; ++ ++ switch(i) { ++ case 0: ++ regs->vp9_param.reg79.vp9_lastfy_hor_virstride = aligned_pitch / 16; ++ regs->vp9_param.reg80.vp9_lastfuv_hor_virstride = aligned_pitch / 16; ++ regs->vp9_param.reg85.vp9_lastfy_virstride = y_len / 16; ++ break; ++ case 1: ++ regs->vp9_param.reg81.vp9_goldenfy_hor_virstride = aligned_pitch / 16; ++ regs->vp9_param.reg82.vp9_goldenuv_hor_virstride = aligned_pitch / 16; ++ regs->vp9_param.reg86.vp9_goldeny_virstride = y_len / 16; ++ break; ++ case 2: ++ regs->vp9_param.reg83.vp9_altreffy_hor_virstride= aligned_pitch / 16; ++ regs->vp9_param.reg84.vp9_altreff_uv_hor_virstride = aligned_pitch / 16; ++ regs->vp9_param.reg87.vp9_altrefy_virstride = y_len / 16; ++ break; ++ } ++} ++ ++static void config_seg_registers(struct rkvdec_ctx *ctx, unsigned int segid) ++{ ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vdpu381_regs_vp9 *regs = &vp9_ctx->regs; ++ const struct v4l2_vp9_segmentation *seg; ++ s16 feature_val; ++ int feature_id; ++ ++ seg = vp9_ctx->last.valid ? &vp9_ctx->last.seg : &vp9_ctx->cur.seg; ++ feature_id = V4L2_VP9_SEG_LVL_ALT_Q; ++ if (v4l2_vp9_seg_feat_enabled(seg->feature_enabled, feature_id, segid)) { ++ feature_val = seg->feature_data[segid][feature_id]; ++ regs->vp9_param.reg67_74[segid].vp9_segid_frame_qp_delta_en = 1; ++ regs->vp9_param.reg67_74[segid].vp9_segid_frame_qp_delta = feature_val; ++ } ++ ++ feature_id = V4L2_VP9_SEG_LVL_ALT_L; ++ if (v4l2_vp9_seg_feat_enabled(seg->feature_enabled, feature_id, segid)) { ++ feature_val = seg->feature_data[segid][feature_id]; ++ regs->vp9_param.reg67_74[segid].vp9_segid_frame_loopfilter_value_en = 1; ++ regs->vp9_param.reg67_74[segid].vp9_segid_frame_loopfilter_value = feature_val; ++ } ++ ++ feature_id = V4L2_VP9_SEG_LVL_REF_FRAME; ++ if (v4l2_vp9_seg_feat_enabled(seg->feature_enabled, feature_id, segid)) { ++ feature_val = seg->feature_data[segid][feature_id]; ++ regs->vp9_param.reg67_74[segid].vp9_segid_referinfo_en = 1; ++ regs->vp9_param.reg67_74[segid].vp9_segid_referinfo = feature_val; ++ } ++ ++ feature_id = V4L2_VP9_SEG_LVL_SKIP; ++ regs->vp9_param.reg67_74[segid].vp9_segid_frame_skip_en = ++ v4l2_vp9_seg_feat_enabled(seg->feature_enabled, feature_id, segid); ++ ++ regs->vp9_param.reg67_74[segid].vp9_segid_abs_delta = !segid && ++ (seg->flags & V4L2_VP9_SEGMENTATION_FLAG_ABS_OR_DELTA_UPDATE); ++ ++} ++ ++static void update_ctx_cur_info(struct rkvdec_vp9_ctx *vp9_ctx, ++ struct rkvdec_decoded_buffer *buf, ++ const struct v4l2_ctrl_vp9_frame *dec_params) ++{ ++ vp9_ctx->cur.valid = true; ++ vp9_ctx->cur.reference_mode = dec_params->reference_mode; ++ vp9_ctx->cur.interpolation_filter = dec_params->interpolation_filter; ++ vp9_ctx->cur.flags = dec_params->flags; ++ vp9_ctx->cur.timestamp = buf->base.vb.vb2_buf.timestamp; ++ vp9_ctx->cur.seg = dec_params->seg; ++ vp9_ctx->cur.lf = dec_params->lf; ++} ++ ++static void update_ctx_last_info(struct rkvdec_vp9_ctx *vp9_ctx) ++{ ++ vp9_ctx->last = vp9_ctx->cur; ++} ++ ++static void rkvdec_write_regs(struct rkvdec_ctx *ctx) ++{ ++ struct rkvdec_dev *rkvdec = ctx->dev; ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ ++ rkvdec_memcpy_toio(rkvdec->regs + OFFSET_COMMON_REGS, ++ &vp9_ctx->regs.common, ++ sizeof(vp9_ctx->regs.common)); ++ rkvdec_memcpy_toio(rkvdec->regs + OFFSET_CODEC_PARAMS_REGS, ++ &vp9_ctx->regs.vp9_param, ++ sizeof(vp9_ctx->regs.vp9_param)); ++ rkvdec_memcpy_toio(rkvdec->regs + OFFSET_COMMON_ADDR_REGS, ++ &vp9_ctx->regs.common_addr, ++ sizeof(vp9_ctx->regs.common_addr)); ++ rkvdec_memcpy_toio(rkvdec->regs + OFFSET_CODEC_ADDR_REGS, ++ &vp9_ctx->regs.vp9_addr, ++ sizeof(vp9_ctx->regs.vp9_addr)); ++ ++} ++ ++static void config_registers(struct rkvdec_ctx *ctx, ++ const struct rkvdec_vp9_run *run) ++{ ++ unsigned int y_len, uv_len, yuv_len, bit_depth, aligned_height, aligned_pitch, stream_len; ++ const struct v4l2_ctrl_vp9_frame *dec_params; ++ struct rkvdec_decoded_buffer *ref_bufs[3]; ++ struct rkvdec_decoded_buffer *dst, *last, *mv_ref; ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vdpu381_regs_vp9 *regs = &vp9_ctx->regs; ++ u32 val; ++ const struct v4l2_vp9_segmentation *seg; ++ u32 pixels; ++ ++ dma_addr_t rlc_addr, dst_addr; ++ bool intra_only; ++ unsigned int i; ++ ++ ++ dec_params = run->decode_params; ++ dst = vb2_to_rkvdec_decoded_buf(&run->base.bufs.dst->vb2_buf); ++ ref_bufs[0] = get_ref_buf_vp9(ctx, &dst->base.vb, dec_params->last_frame_ts); ++ ref_bufs[1] = get_ref_buf_vp9(ctx, &dst->base.vb, dec_params->golden_frame_ts); ++ ref_bufs[2] = get_ref_buf_vp9(ctx, &dst->base.vb, dec_params->alt_frame_ts); ++ ++ if (vp9_ctx->last.valid) ++ last = get_ref_buf_vp9(ctx, &dst->base.vb, vp9_ctx->last.timestamp); ++ else ++ last = dst; ++ ++ update_dec_buf_info(dst, dec_params); ++ update_ctx_cur_info(vp9_ctx, dst, dec_params); ++ seg = &dec_params->seg; ++ ++ intra_only = !!(dec_params->flags & ++ (V4L2_VP9_FRAME_FLAG_KEY_FRAME | ++ V4L2_VP9_FRAME_FLAG_INTRA_ONLY)); ++ ++ regs->common.reg009_dec_mode.dec_mode = VDPU381_MODE_VP9; ++ ++ regs->vp9_param.reg103.vp9_intra_only_flag = intra_only; ++ ++ /* Set config */ ++ regs->common.reg011_important_en.buf_empty_en = 1; ++ regs->common.reg011_important_en.dec_clkgate_e = 1; ++ regs->common.reg011_important_en.dec_timeout_e = 1; ++ ++ ++ bit_depth = dec_params->bit_depth; ++ aligned_height = round_up(ctx->decoded_fmt.fmt.pix_mp.height, 64); ++ ++ aligned_pitch = round_up(ctx->decoded_fmt.fmt.pix_mp.width * ++ bit_depth, ++ 512) / 8; ++ y_len = aligned_height * aligned_pitch; ++ uv_len = y_len / 2; ++ yuv_len = y_len + uv_len; ++ ++ pixels = ctx->decoded_fmt.fmt.pix_mp.height * ctx->decoded_fmt.fmt.pix_mp.width; ++ ++ regs->common.reg018_y_hor_stride.y_hor_virstride = aligned_pitch / 16; ++ regs->common.reg019_uv_hor_stride.uv_hor_virstride = aligned_pitch / 16; ++ regs->common.reg020_y_stride.y_virstride = y_len / 16; ++ ++ ++ stream_len = vb2_get_plane_payload(&run->base.bufs.src->vb2_buf, 0); ++ ++ regs->common.reg016_stream_len = stream_len; ++ ++ /* Activate block gating */ ++ regs->common.reg026_block_gating_en.reg_cfg_gating_en = 1; ++ ++ /* Set timeout threshold */ ++ if (pixels < RKVDEC_1080P_PIXELS) ++ regs->common.reg032_timeout_threshold = RKVDEC_TIMEOUT_1080p; ++ else if (pixels < RKVDEC_4K_PIXELS) ++ regs->common.reg032_timeout_threshold = RKVDEC_TIMEOUT_4K; ++ else if (pixels < RKVDEC_8K_PIXELS) ++ regs->common.reg032_timeout_threshold = RKVDEC_TIMEOUT_8K; ++ else ++ regs->common.reg032_timeout_threshold = RKVDEC_TIMEOUT_MAX; ++ ++ /* ++ * Reset count buffer, because decoder only output intra related syntax ++ * counts when decoding intra frame, but update entropy need to update ++ * all the probabilities. ++ */ ++ if (intra_only) ++ memset(vp9_ctx->count_tbl.cpu, 0, vp9_ctx->count_tbl.size); ++ ++ vp9_ctx->cur.segmapid = vp9_ctx->last.segmapid; ++ if (!intra_only && ++ !(dec_params->flags & V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT) && ++ (!(seg->flags & V4L2_VP9_SEGMENTATION_FLAG_ENABLED) || ++ (seg->flags & V4L2_VP9_SEGMENTATION_FLAG_UPDATE_MAP))) ++ vp9_ctx->cur.segmapid++; ++ ++ for (i = 0; i < ARRAY_SIZE(ref_bufs); i++) ++ config_ref_registers(ctx, run, ref_bufs[i], i); ++ ++ for (i = 0; i < 8; i++) ++ config_seg_registers(ctx, i); ++ ++ regs->vp9_param.reg76.vp9_tx_mode = vp9_ctx->cur.tx_mode; ++ regs->vp9_param.reg76.vp9_frame_reference_mode = dec_params->reference_mode; ++ ++ if (!intra_only) { ++ const struct v4l2_vp9_loop_filter *lf; ++ ++ if (vp9_ctx->last.valid) ++ lf = &vp9_ctx->last.lf; ++ else ++ lf = &vp9_ctx->cur.lf; ++ ++ val = 0; ++ ++ for (i = 0; i < ARRAY_SIZE(lf->ref_deltas); i++) { ++ regs->vp9_param.reg94.vp9_ref_deltas_lastframe |= (lf->ref_deltas[i] & 0x7f) << (7 * i); ++ } ++ ++ for(i = 0; i < ARRAY_SIZE(lf->mode_deltas); i++){ ++ regs->vp9_param.reg75.vp9_mode_deltas_lastframe |= (lf->mode_deltas[i] & 0x7f) << (7 * i); ++ } ++ } ++ ++ regs->vp9_param.reg75.segmentation_enable_lstframe = ++ vp9_ctx->last.valid && !intra_only && ++ vp9_ctx->last.seg.flags & V4L2_VP9_SEGMENTATION_FLAG_ENABLED; ++ ++ regs->vp9_param.reg75.vp9_last_showframe = ++ vp9_ctx->last.valid && ++ vp9_ctx->last.flags & V4L2_VP9_FRAME_FLAG_SHOW_FRAME; ++ ++ regs->vp9_param.reg75.vp9_last_intra_only = ++ vp9_ctx->last.valid && ++ vp9_ctx->last.flags & ++ (V4L2_VP9_FRAME_FLAG_KEY_FRAME | V4L2_VP9_FRAME_FLAG_INTRA_ONLY); ++ ++ regs->vp9_param.reg75.vp9_last_widhheight_eqcur = ++ vp9_ctx->last.valid && ++ last->vp9.width == dst->vp9.width && ++ last->vp9.height == dst->vp9.height; ++ ++ regs->vp9_param.reg78_vp9_stream_size = stream_len; ++ ++ ++ for (i = 0; !intra_only && i < ARRAY_SIZE(ref_bufs); i++) { ++ unsigned int refw = ref_bufs[i]->vp9.width; ++ unsigned int refh = ref_bufs[i]->vp9.height; ++ u32 hscale, vscale; ++ ++ hscale = (refw << 14) / dst->vp9.width; ++ vscale = (refh << 14) / dst->vp9.height; ++ ++ switch(i) { ++ case 0: ++ regs->vp9_param.reg88.vp9_lref_hor_scale = hscale; ++ regs->vp9_param.reg89.vp9_lref_ver_scale = vscale; ++ break; ++ case 1: ++ regs->vp9_param.reg90.vp9_gref_hor_scale = hscale; ++ regs->vp9_param.reg91.vp9_gref_ver_scale = vscale; ++ break; ++ case 2: ++ regs->vp9_param.reg92.vp9_aref_hor_scale = hscale; ++ regs->vp9_param.reg93.vp9_aref_ver_scale = hscale; ++ break; ++ } ++ ++ ++ } ++ ++ /* Set rlc base address (input stream) */ ++ rlc_addr = vb2_dma_contig_plane_dma_addr(&run->base.bufs.src->vb2_buf, 0); ++ regs->common_addr.rlc_base = rlc_addr; ++ regs->common_addr.rlcwrite_base = rlc_addr; ++ ++ /* Set output base address */ ++ dst_addr = vb2_dma_contig_plane_dma_addr(&dst->base.vb.vb2_buf, 0); ++ regs->common_addr.decout_base = dst_addr; ++ regs->common_addr.error_ref_base = dst_addr; ++ ++ /* Set colmv address */ ++ regs->common_addr.colmv_cur_base = dst_addr + ctx->colmv_offset; ++ ++ /* Set RCB addresses */ ++ for (i = 0; i < rkvdec_rcb_buf_count(ctx); i++) ++ regs->common_addr.rcb_base[i] = rkvdec_rcb_buf_dma_addr(ctx, i); ++ ++ regs->vp9_addr.cabactbl_base = vp9_ctx->priv_tbl.dma + ++ offsetof(struct rkvdec_vp9_priv_tbl, probs); ++ ++ regs->vp9_addr.vp9_count_base = vp9_ctx->count_tbl.dma; ++ ++ regs->vp9_addr.vp9_segidlast_base = vp9_ctx->priv_tbl.dma + ++ offsetof(struct rkvdec_vp9_priv_tbl, segmap) + ++ (RKVDEC_VP9_MAX_SEGMAP_SIZE * (!vp9_ctx->cur.segmapid)); ++ ++ regs->vp9_addr.avp9_segidcur_base = vp9_ctx->priv_tbl.dma + ++ offsetof(struct rkvdec_vp9_priv_tbl, segmap) + ++ (RKVDEC_VP9_MAX_SEGMAP_SIZE * vp9_ctx->cur.segmapid); ++ ++ if (!intra_only && ++ !(dec_params->flags & V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT) && ++ vp9_ctx->last.valid) ++ mv_ref = last; ++ else ++ mv_ref = dst; ++ ++ regs->vp9_addr.vp9_refcolmv_base = get_mv_base_addr(mv_ref); ++ ++ rkvdec_write_regs(ctx); ++ ++} ++ ++static int validate_dec_params(struct rkvdec_ctx *ctx, ++ const struct v4l2_ctrl_vp9_frame *dec_params) ++{ ++ unsigned int aligned_width, aligned_height; ++ ++ aligned_width = round_up(dec_params->frame_width_minus_1 + 1, 64); ++ aligned_height = round_up(dec_params->frame_height_minus_1 + 1, 64); ++ ++ /* ++ * Userspace should update the capture/decoded format when the ++ * resolution changes. ++ */ ++ if (aligned_width != ctx->decoded_fmt.fmt.pix_mp.width || ++ aligned_height != ctx->decoded_fmt.fmt.pix_mp.height) { ++ dev_err(ctx->dev->dev, ++ "unexpected bitstream resolution %dx%d\n", ++ dec_params->frame_width_minus_1 + 1, ++ dec_params->frame_height_minus_1 + 1); ++ return -EINVAL; ++ } ++ ++ return 0; ++} ++ ++static int rkvdec_vp9_run_preamble(struct rkvdec_ctx *ctx, ++ struct rkvdec_vp9_run *run) ++{ ++ const struct v4l2_ctrl_vp9_frame *dec_params; ++ const struct v4l2_ctrl_vp9_compressed_hdr *prob_updates; ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct v4l2_ctrl *ctrl; ++ unsigned int fctx_idx; ++ int ret; ++ ++ /* v4l2-specific stuff */ ++ rkvdec_run_preamble(ctx, &run->base); ++ ++ ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, ++ V4L2_CID_STATELESS_VP9_FRAME); ++ if (WARN_ON(!ctrl)) ++ return -EINVAL; ++ dec_params = ctrl->p_cur.p; ++ ++ ret = validate_dec_params(ctx, dec_params); ++ if (ret) ++ return ret; ++ ++ run->decode_params = dec_params; ++ ++ ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR); ++ if (WARN_ON(!ctrl)) ++ return -EINVAL; ++ prob_updates = ctrl->p_cur.p; ++ vp9_ctx->cur.tx_mode = prob_updates->tx_mode; ++ ++ /* ++ * vp9 stuff ++ * ++ * by this point the userspace has done all parts of 6.2 uncompressed_header() ++ * except this fragment: ++ * if ( FrameIsIntra || error_resilient_mode ) { ++ * setup_past_independence ( ) ++ * if ( frame_type == KEY_FRAME || error_resilient_mode == 1 || ++ * reset_frame_context == 3 ) { ++ * for ( i = 0; i < 4; i ++ ) { ++ * save_probs( i ) ++ * } ++ * } else if ( reset_frame_context == 2 ) { ++ * save_probs( frame_context_idx ) ++ * } ++ * frame_context_idx = 0 ++ * } ++ */ ++ fctx_idx = v4l2_vp9_reset_frame_ctx(dec_params, vp9_ctx->frame_context); ++ vp9_ctx->cur.frame_context_idx = fctx_idx; ++ ++ /* 6.1 frame(sz): load_probs() and load_probs2() */ ++ vp9_ctx->probability_tables = vp9_ctx->frame_context[fctx_idx]; ++ ++ /* ++ * The userspace has also performed 6.3 compressed_header(), but handling the ++ * probs in a special way. All probs which need updating, except MV-related, ++ * have been read from the bitstream and translated through inv_map_table[], ++ * but no 6.3.6 inv_recenter_nonneg(v, m) has been performed. The values passed ++ * by userspace are either translated values (there are no 0 values in ++ * inv_map_table[]), or zero to indicate no update. All MV-related probs which need ++ * updating have been read from the bitstream and (mv_prob << 1) | 1 has been ++ * performed. The values passed by userspace are either new values ++ * to replace old ones (the above mentioned shift and bitwise or never result in ++ * a zero) or zero to indicate no update. ++ * fw_update_probs() performs actual probs updates or leaves probs as-is ++ * for values for which a zero was passed from userspace. ++ */ ++ v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params); ++ ++ return 0; ++} ++ ++static int rkvdec_vp9_run(struct rkvdec_ctx *ctx) ++{ ++ struct rkvdec_dev *rkvdec = ctx->dev; ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vp9_run run = { }; ++ int ret; ++ u32 watchdog_time; ++ ++ ret = rkvdec_vp9_run_preamble(ctx, &run); ++ ++ if (ret) { ++ rkvdec_run_postamble(ctx, &run.base); ++ ++ return ret; ++ } ++ ++ /* Prepare probs. */ ++ init_probs(ctx, &run); ++ ++ /* Configure hardware registers. */ ++ config_registers(ctx, &run); ++ ++ rkvdec_run_postamble(ctx, &run.base); ++ ++ u64 timeout_threshold = vp9_ctx->regs.common.reg032_timeout_threshold; ++ unsigned long axi_rate = clk_get_rate(rkvdec->axi_clk); ++ ++ if (axi_rate) { ++ watchdog_time = 2 * (1000 * timeout_threshold) / axi_rate; ++ } else { ++ watchdog_time = 2000; ++ } ++ ++ schedule_delayed_work(&rkvdec->watchdog_work, ++ msecs_to_jiffies(watchdog_time)); ++ ++ writel(VDPU381_DEC_E_BIT, rkvdec->regs + VDPU381_REG_DEC_E); ++ ++ return 0; ++} ++ ++#define copy_tx_and_skip(p1, p2) \ ++do { \ ++ memcpy((p1)->tx8, (p2)->tx8, sizeof((p1)->tx8)); \ ++ memcpy((p1)->tx16, (p2)->tx16, sizeof((p1)->tx16)); \ ++ memcpy((p1)->tx32, (p2)->tx32, sizeof((p1)->tx32)); \ ++ memcpy((p1)->skip, (p2)->skip, sizeof((p1)->skip)); \ ++} while (0) ++ ++ ++static void rkvdec_vp9_done(struct rkvdec_ctx *ctx, ++ struct vb2_v4l2_buffer *src_buf, ++ struct vb2_v4l2_buffer *dst_buf, ++ enum vb2_buffer_state result) ++{ ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ unsigned int fctx_idx; ++ ++ /* v4l2-specific stuff */ ++ if (result == VB2_BUF_STATE_ERROR) ++ goto out_update_last; ++ ++ /* ++ * vp9 stuff ++ * ++ * 6.1.2 refresh_probs() ++ * ++ * In the spec a complementary condition goes last in 6.1.2 refresh_probs(), ++ * but it makes no sense to perform all the activities from the first "if" ++ * there if we actually are not refreshing the frame context. On top of that, ++ * because of 6.2 uncompressed_header() whenever error_resilient_mode == 1, ++ * refresh_frame_context == 0. Consequently, if we don't jump to out_update_last ++ * it means error_resilient_mode must be 0. ++ */ ++ if (!(vp9_ctx->cur.flags & V4L2_VP9_FRAME_FLAG_REFRESH_FRAME_CTX)) ++ goto out_update_last; ++ ++ fctx_idx = vp9_ctx->cur.frame_context_idx; ++ ++ if (!(vp9_ctx->cur.flags & V4L2_VP9_FRAME_FLAG_PARALLEL_DEC_MODE)) { ++ /* error_resilient_mode == 0 && frame_parallel_decoding_mode == 0 */ ++ struct v4l2_vp9_frame_context *probs = &vp9_ctx->probability_tables; ++ bool frame_is_intra = vp9_ctx->cur.flags & ++ (V4L2_VP9_FRAME_FLAG_KEY_FRAME | V4L2_VP9_FRAME_FLAG_INTRA_ONLY); ++ struct tx_and_skip { ++ u8 tx8[2][1]; ++ u8 tx16[2][2]; ++ u8 tx32[2][3]; ++ u8 skip[3]; ++ } _tx_skip, *tx_skip = &_tx_skip; ++ struct v4l2_vp9_frame_symbol_counts *counts; ++ ++ /* buffer the forward-updated TX and skip probs */ ++ if (frame_is_intra) ++ copy_tx_and_skip(tx_skip, probs); ++ ++ /* 6.1.2 refresh_probs(): load_probs() and load_probs2() */ ++ *probs = vp9_ctx->frame_context[fctx_idx]; ++ ++ /* if FrameIsIntra then undo the effect of load_probs2() */ ++ if (frame_is_intra) ++ copy_tx_and_skip(probs, tx_skip); ++ ++ counts = frame_is_intra ? &vp9_ctx->intra_cnts : &vp9_ctx->inter_cnts; ++ v4l2_vp9_adapt_coef_probs(probs, counts, ++ !vp9_ctx->last.valid || ++ vp9_ctx->last.flags & V4L2_VP9_FRAME_FLAG_KEY_FRAME, ++ frame_is_intra); ++ if (!frame_is_intra) { ++ const struct rkvdec_vp9_inter_frame_symbol_counts *inter_cnts; ++ u32 classes[2][11]; ++ int i; ++ ++ inter_cnts = vp9_ctx->count_tbl.cpu; ++ for (i = 0; i < ARRAY_SIZE(classes); ++i) ++ memcpy(classes[i], inter_cnts->classes[i], sizeof(classes[0])); ++ counts->classes = &classes; ++ ++ /* load_probs2() already done */ ++ v4l2_vp9_adapt_noncoef_probs(&vp9_ctx->probability_tables, counts, ++ vp9_ctx->cur.reference_mode, ++ vp9_ctx->cur.interpolation_filter, ++ vp9_ctx->cur.tx_mode, vp9_ctx->cur.flags); ++ } ++ } ++ ++ /* 6.1.2 refresh_probs(): save_probs(fctx_idx) */ ++ vp9_ctx->frame_context[fctx_idx] = vp9_ctx->probability_tables; ++ ++out_update_last: ++ update_ctx_last_info(vp9_ctx); ++} ++ ++static void rkvdec_init_v4l2_vp9_count_tbl(struct rkvdec_ctx *ctx) ++{ ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_vp9_intra_frame_symbol_counts *intra_cnts = vp9_ctx->count_tbl.cpu; ++ struct rkvdec_vp9_inter_frame_symbol_counts *inter_cnts = vp9_ctx->count_tbl.cpu; ++ int i, j, k, l, m; ++ ++ vp9_ctx->inter_cnts.partition = &inter_cnts->partition; ++ vp9_ctx->inter_cnts.skip = &inter_cnts->skip; ++ vp9_ctx->inter_cnts.intra_inter = &inter_cnts->inter; ++ vp9_ctx->inter_cnts.tx32p = &inter_cnts->tx32p; ++ vp9_ctx->inter_cnts.tx16p = &inter_cnts->tx16p; ++ vp9_ctx->inter_cnts.tx8p = &inter_cnts->tx8p; ++ ++ vp9_ctx->intra_cnts.partition = (u32 (*)[16][4])(&intra_cnts->partition); ++ vp9_ctx->intra_cnts.skip = &intra_cnts->skip; ++ vp9_ctx->intra_cnts.intra_inter = &intra_cnts->intra; ++ vp9_ctx->intra_cnts.tx32p = &intra_cnts->tx32p; ++ vp9_ctx->intra_cnts.tx16p = &intra_cnts->tx16p; ++ vp9_ctx->intra_cnts.tx8p = &intra_cnts->tx8p; ++ ++ vp9_ctx->inter_cnts.y_mode = &inter_cnts->y_mode; ++ vp9_ctx->inter_cnts.uv_mode = &inter_cnts->uv_mode; ++ vp9_ctx->inter_cnts.comp = &inter_cnts->comp; ++ vp9_ctx->inter_cnts.comp_ref = &inter_cnts->comp_ref; ++ vp9_ctx->inter_cnts.single_ref = &inter_cnts->single_ref; ++ vp9_ctx->inter_cnts.mv_mode = &inter_cnts->mv_mode; ++ vp9_ctx->inter_cnts.filter = &inter_cnts->filter; ++ vp9_ctx->inter_cnts.mv_joint = &inter_cnts->mv_joint; ++ vp9_ctx->inter_cnts.sign = &inter_cnts->sign; ++ /* ++ * rk hardware actually uses "u32 classes[2][11 + 1];" ++ * instead of "u32 classes[2][11];", so this must be explicitly ++ * copied into vp9_ctx->classes when passing the data to the ++ * vp9 library function ++ */ ++ vp9_ctx->inter_cnts.class0 = &inter_cnts->class0; ++ vp9_ctx->inter_cnts.bits = &inter_cnts->bits; ++ vp9_ctx->inter_cnts.class0_fp = &inter_cnts->class0_fp; ++ vp9_ctx->inter_cnts.fp = &inter_cnts->fp; ++ vp9_ctx->inter_cnts.class0_hp = &inter_cnts->class0_hp; ++ vp9_ctx->inter_cnts.hp = &inter_cnts->hp; ++ ++#define INNERMOST_LOOP \ ++ do { \ ++ for (m = 0; m < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0][0][0][0]); ++m) {\ ++ vp9_ctx->inter_cnts.coeff[i][j][k][l][m] = \ ++ &inter_cnts->ref_cnt[k][i][j][l][m].coeff; \ ++ vp9_ctx->inter_cnts.eob[i][j][k][l][m][0] = \ ++ &inter_cnts->ref_cnt[k][i][j][l][m].eob[0]; \ ++ vp9_ctx->inter_cnts.eob[i][j][k][l][m][1] = \ ++ &inter_cnts->ref_cnt[k][i][j][l][m].eob[1]; \ ++ \ ++ vp9_ctx->intra_cnts.coeff[i][j][k][l][m] = \ ++ &intra_cnts->ref_cnt[k][i][j][l][m].coeff; \ ++ vp9_ctx->intra_cnts.eob[i][j][k][l][m][0] = \ ++ &intra_cnts->ref_cnt[k][i][j][l][m].eob[0]; \ ++ vp9_ctx->intra_cnts.eob[i][j][k][l][m][1] = \ ++ &intra_cnts->ref_cnt[k][i][j][l][m].eob[1]; \ ++ } \ ++ } while (0) ++ ++ for (i = 0; i < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff); ++i) ++ for (j = 0; j < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0]); ++j) ++ for (k = 0; k < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0][0]); ++k) ++ for (l = 0; l < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0][0][0]); ++l) ++ INNERMOST_LOOP; ++#undef INNERMOST_LOOP ++} ++ ++static int rkvdec_vp9_start(struct rkvdec_ctx *ctx) ++{ ++ struct rkvdec_dev *rkvdec = ctx->dev; ++ struct rkvdec_vp9_priv_tbl *priv_tbl; ++ struct rkvdec_vp9_ctx *vp9_ctx; ++ unsigned char *count_tbl; ++ struct v4l2_ctrl *ctrl; ++ int ret; ++ ++ /* frame header */ ++ ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME); ++ if (!ctrl) ++ return -EINVAL; ++ ++ vp9_ctx = kzalloc(sizeof(*vp9_ctx), GFP_KERNEL); ++ if (!vp9_ctx) ++ return -ENOMEM; ++ ++ ctx->priv = vp9_ctx; ++ ++ BUILD_BUG_ON(sizeof(priv_tbl->probs) % 16); /* ensure probs size is 128-bit aligned */ ++ priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl), ++ &vp9_ctx->priv_tbl.dma, GFP_KERNEL); ++ if (!priv_tbl) { ++ ret = -ENOMEM; ++ goto err_free_ctx; ++ } ++ ++ vp9_ctx->priv_tbl.size = sizeof(*priv_tbl); ++ vp9_ctx->priv_tbl.cpu = priv_tbl; ++ ++ count_tbl = dma_alloc_coherent(rkvdec->dev, RKVDEC_VP9_COUNT_SIZE, ++ &vp9_ctx->count_tbl.dma, GFP_KERNEL); ++ if (!count_tbl) { ++ ret = -ENOMEM; ++ goto err_free_priv_tbl; ++ } ++ ++ vp9_ctx->count_tbl.size = RKVDEC_VP9_COUNT_SIZE; ++ vp9_ctx->count_tbl.cpu = count_tbl; ++ rkvdec_init_v4l2_vp9_count_tbl(ctx); ++ ++ return 0; ++ ++err_free_priv_tbl: ++ dma_free_coherent(rkvdec->dev, vp9_ctx->priv_tbl.size, ++ vp9_ctx->priv_tbl.cpu, vp9_ctx->priv_tbl.dma); ++ ++err_free_ctx: ++ kfree(vp9_ctx); ++ return ret; ++} ++ ++static void rkvdec_vp9_stop(struct rkvdec_ctx *ctx) ++{ ++ struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv; ++ struct rkvdec_dev *rkvdec = ctx->dev; ++ ++ dma_free_coherent(rkvdec->dev, vp9_ctx->count_tbl.size, ++ vp9_ctx->count_tbl.cpu, vp9_ctx->count_tbl.dma); ++ ++ dma_free_coherent(rkvdec->dev, vp9_ctx->priv_tbl.size, ++ vp9_ctx->priv_tbl.cpu, vp9_ctx->priv_tbl.dma); ++ ++ kfree(vp9_ctx); ++ ++} ++ ++static int rkvdec_vp9_adjust_fmt(struct rkvdec_ctx *ctx, ++ struct v4l2_format *f) ++{ ++ struct v4l2_pix_format_mplane *fmt = &f->fmt.pix_mp; ++ ++ fmt->num_planes = 1; ++ if (!fmt->plane_fmt[0].sizeimage) ++ fmt->plane_fmt[0].sizeimage = fmt->width * fmt->height * 2; ++ return 0; ++} ++ ++ ++const struct rkvdec_coded_fmt_ops rkvdec_vdpu381_vp9_fmt_ops = { ++ .adjust_fmt = rkvdec_vp9_adjust_fmt, ++ .start = rkvdec_vp9_start, ++ .stop = rkvdec_vp9_stop, ++ .run = rkvdec_vp9_run, ++ .done = rkvdec_vp9_done, ++}; +\ No newline at end of file +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c +index 1d1e9bfef8e9..64e318ea2616 100644 +--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c +@@ -443,6 +443,43 @@ static const struct rkvdec_decoded_fmt_desc rkvdec_vp9_decoded_fmts[] = { + }, + }; + ++static const struct rkvdec_ctrl_desc vdpu381_vp9_ctrl_descs[] = { ++ { ++ .cfg.id = V4L2_CID_STATELESS_VP9_FRAME, ++ }, ++ { ++ .cfg.id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ++ }, ++ { ++ .cfg.id = V4L2_CID_MPEG_VIDEO_VP9_PROFILE, ++ .cfg.min = V4L2_MPEG_VIDEO_VP9_PROFILE_0, ++ .cfg.max = V4L2_MPEG_VIDEO_VP9_PROFILE_2, ++ .cfg.def = V4L2_MPEG_VIDEO_VP9_PROFILE_0, ++ }, ++ { ++ .cfg.id = V4L2_CID_MPEG_VIDEO_VP9_LEVEL, ++ .cfg.min = V4L2_MPEG_VIDEO_VP9_LEVEL_1_0, ++ .cfg.max = V4L2_MPEG_VIDEO_VP9_LEVEL_6_1, ++ ++ }, ++}; ++ ++static const struct rkvdec_ctrls vdpu381_vp9_ctrls = { ++ .ctrls = vdpu381_vp9_ctrl_descs, ++ .num_ctrls = ARRAY_SIZE(vdpu381_vp9_ctrl_descs), ++}; ++ ++static const struct rkvdec_decoded_fmt_desc rkvdec_vdpu381_vp9_decoded_fmts[] = { ++ { ++ .fourcc = V4L2_PIX_FMT_NV12, ++ .image_fmt = RKVDEC_IMG_FMT_420_8BIT, ++ }, ++ { ++ .fourcc = V4L2_PIX_FMT_NV15, ++ .image_fmt = RKVDEC_IMG_FMT_420_10BIT, ++ }, ++}; ++ + static const struct rkvdec_coded_fmt_desc rkvdec_coded_fmts[] = { + { + .fourcc = V4L2_PIX_FMT_HEVC_SLICE, +@@ -543,6 +580,21 @@ static const struct rkvdec_coded_fmt_desc vdpu381_coded_fmts[] = { + .decoded_fmts = rkvdec_h264_decoded_fmts, + .subsystem_flags = VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF, + }, ++ { ++ .fourcc = V4L2_PIX_FMT_VP9_FRAME, ++ .frmsize = { ++ .min_width = 64, ++ .max_width = 65472, ++ .step_width = 64, ++ .min_height = 64, ++ .max_height = 65472, ++ .step_height = 64, ++ }, ++ .ctrls = &vdpu381_vp9_ctrls, ++ .ops = &rkvdec_vdpu381_vp9_fmt_ops, ++ .num_decoded_fmts = ARRAY_SIZE(rkvdec_vdpu381_vp9_decoded_fmts), ++ .decoded_fmts = rkvdec_vdpu381_vp9_decoded_fmts, ++ } + }; + + static const struct rkvdec_coded_fmt_desc vdpu383_coded_fmts[] = { +diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.h b/drivers/media/platform/rockchip/rkvdec/rkvdec.h +index a24be6638b6b..d73ec9442a69 100644 +--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.h ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.h +@@ -191,6 +191,7 @@ extern const struct rkvdec_coded_fmt_ops rkvdec_vp9_fmt_ops; + /* VDPU381 ops */ + extern const struct rkvdec_coded_fmt_ops rkvdec_vdpu381_h264_fmt_ops; + extern const struct rkvdec_coded_fmt_ops rkvdec_vdpu381_hevc_fmt_ops; ++extern const struct rkvdec_coded_fmt_ops rkvdec_vdpu381_vp9_fmt_ops; + + /* VDPU383 ops */ + extern const struct rkvdec_coded_fmt_ops rkvdec_vdpu383_h264_fmt_ops; +-- +2.54.0 + diff --git a/patches/driver/media/README.md b/patches/driver/media/README.md new file mode 100644 index 0000000..fe0a0e7 --- /dev/null +++ b/patches/driver/media/README.md @@ -0,0 +1,69 @@ +# patches/driver/media/ + +Scope-tagged kernel-agent patches that touch `drivers/media/` — third-party +video-codec enablement work that hasn't reached linux-media patchwork as +formal series yet, but is empirically known to work on our test hardware. + +## 0001..0003 — Sarma's VP9 enablement on VDPU381 (RK3588 rkvdec) + +Three patches from `D.V.A.B. Sarma ` adding VP9 +decode support to the VDPU381 variant of rkvdec (the RK3588 generation). + +| # | Subject | LOC | What | +|---|---------|----:|------| +| 0001 | rkvdec/vp9: rename get_ref_buf to get_ref_buf_vp9 | 10 | rename existing helper to avoid namespace collision with the upcoming HEVC equivalent | +| 0002 | rkvdec: move vp9 functions to common file | 172 | extract VP9 plumbing into `rkvdec-vp9-common.{c,h}` so VDPU381 can share with the older RK3399 backend | +| 0003 | rkvdec: add VP9 support for VDPU381 variant | 1303 | the actual VDPU381 VP9 backend — register defs + `rkvdec-vdpu381-vp9.c` + glue | + +Combined: ~1500 LOC, 5 new files in `drivers/media/platform/rockchip/rkvdec/`. + +### Upstream provenance + +- Author maintains the work at https://github.com/dvab-sarma/android_kernel_rk_opi + branch `add-rkvdec-vdpu381-vp9-v8`. +- Collabora's blog post on RK3588/RK3576 video decoder mainline merge cites + the work but notes "v1 series needs to be sent for review soon" — + i.e. not yet on linux-media patchwork, no upstream timeline. +- Casanova's VDPU381+VDPU383 H.264/HEVC base (which these patches sit on top + of) IS in mainline 7.0 release. +- Patches do NOT modify any of our scope-tagged board / module / soc / + subsystem code paths — purely additive to the upstream rkvdec subdirectory. + +### Tested on + +- Author: Orange Pi 5 Pro board (RK3588), AOSP 16 + FFMPEG, Profile 0 + Profile 2 +- Our fleet: build verified clean on `ampere` (CoolPi CM5 GenBook, RK3588) + 2026-05-18 with KERNELRELEASE `7.0.0-rc3-vp9-test+` (base = running + `7.0.0-rc3-devices+` config + LOCALVERSION change + these 3 patches + + the pre-existing issue14 vb2-resv local mods). Full kernel image + + DTB + modules + initramfs land at `/boot/firmware/*-7.0.0-rc3-vp9-test+` + and `/lib/modules/7.0.0-rc3-vp9-test+`. New extlinux label `arch_vp9_test` + added without touching default `arch_devices`. End-to-end VP9 decode + validation requires booting into `arch_vp9_test` (pending operator + confirmation, then `v4l2-ctl -d /dev/video1 --list-formats-out` should + list `VP9F` alongside `S265` + `S264`). + +### Apply order + +Strict — 0001 → 0002 → 0003. 0003 depends on the common-file refactor +from 0002, which depends on the helper rename in 0001. + +### Removal criteria + +Drop these patches when: +- Sarma sends a v1 series to linux-media and it lands upstream — adopt + the upstream version at the next baseline bump, OR +- Collabora produces an alternative VP9 enablement on their own + hardware-enablement/rockchip-3588 GitLab tree — prefer that lineage + (more likely to land cleanly upstream). + +### How to use in a kernel-agent build + +If `fleet/ampere.yaml` is bumped to include VP9 (currently scope-out per +the manifest preamble — "Asks #2 (VP9 enablement on RK3588 rkvdec) and +#3 (AV1 dec integration) from issue #6 are NOT addressed in this +manifest — tracked separately"), reference these three files in apply +order under the manifest's scope-tagged patch list. + +Cross-references: `marfrit/kernel-agent#12` (the VP9-on-ampere enablement +issue). From 95be39ef80338c4aa84922fb1bf140518a847fc0 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Mon, 18 May 2026 13:15:09 +0000 Subject: [PATCH 2/5] fleet/ampere: enable Sarma VP9-VDPU381 patches in baseline MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reference the 3 patches imported in the previous commit under the scope-tagged patch list. Apply order is strict (0001 → 0002 → 0003). Verified 2026-05-18 via the arch_vp9_test extlinux boot on ampere: - VP9F enumerates on rkvdec /dev/video2 - kdirect decode bit-exact vs libavcodec SW reference at -ss 30 - libva decode (firefox/chromium-style consumer) also bit-exact - vainfo lists VAProfileVP9Profile0 (iter38 multi-device probe auto-picks) - All three paths agree on sha c8624d7c42db66525f53a02a515bc38d0a17ef39f692660cc7bebb1e2d2e1b48 Removes VP9 from the "explicitly not included" comment block — issue #12 closes with this change. Also: AV1 stays out-of-scope per issue #6 ask 3 (kernel side fine via the existing av1-vpu-dec node; backend just needs the 4th-fd generalization tracked in libva-v4l2-request-fourier#2). The next linux-ampere-fourier package rebuild from this manifest will pick up VP9 automatically; ampere's running 7.0.0-rc3-vp9-test+ kernel already has these patches via the operator's manual build session today. --- fleet/ampere.yaml | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/fleet/ampere.yaml b/fleet/ampere.yaml index b6eb2be..b621033 100644 --- a/fleet/ampere.yaml +++ b/fleet/ampere.yaml @@ -53,12 +53,19 @@ includes: - board/coolpi-cm5-genbook/0005-arm64-dts-rockchip-rk3588-coolpi-cm5-genbook-Enable-USB-C-PD-charging-via-FUSB302.patch - board/coolpi-cm5-genbook/0008-arm64-dts-rockchip-rk3588-coolpi-cm5-genbook-Add-lid-switch-and-USB3-PHY-lane-config.patch - board/coolpi-cm5-genbook/0011-arm64-dts-rockchip-rk3588-coolpi-cm5-genbook-wire-internal-microphone.patch + # VP9 enablement for RK3588 rkvdec (issue #12, closed 2026-05-18). + # Cherry-picked from D.V.A.B. Sarma's add-rkvdec-vdpu381-vp9-v8 branch + # at github.com/dvab-sarma/android_kernel_rk_opi. Bit-exact HW==SW==libva + # verified at -ss 30 on bbb_60s_720p.vp9.webm via all three decode paths + # (kdirect / SW / libva); sha c8624d7c42db66525f53a02a515bc38d0a17ef39f692660cc7bebb1e2d2e1b48. + # Apply order is STRICT (0003 depends on the rkvdec-vp9-common refactor + # added in 0002, which depends on the helper rename in 0001). + # See patches/driver/media/README.md for provenance + removal criteria. + - driver/media/0001-rkvdec-vp9-rename-get_ref_buf-to-get_ref_buf_vp9.patch + - driver/media/0002-rkvdec-move-vp9-functions-to-common-file.patch + - driver/media/0003-rkvdec-add-vp9-support-for-vdpu381-variant.patch # Explicitly NOT included this round (tracked for later sprints): -# - VP9 enablement for RK3588 rkvdec (issue #6 ask 2). /dev/video0 only -# advertises S265 + S264 today; vainfo lists 9 profiles, target is -# 10. Requires identifying the VDPU381/383 patch chain + possible -# DTS additions. RFC-stage work, scope unclear until research lands. # - AV1 decoder integration (issue #6 ask 3). Kernel side is fine # (/dev/video4 advertises AV1F). Backend libva-v4l2-request-fourier # needs iter39 for a third fd. Backend work, not kernel. From 4c80458d1f3dc7b8248183b556deef0534fad050 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Mon, 18 May 2026 15:25:37 +0200 Subject: [PATCH 3/5] fleet/ohm: import Patch I (5GHz scan filter) + arm64 SCS build-fix MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch I closes besser#1 — the wsm_generic_confirm 0x0007 dmesg storm. One-line guard in bes2600_hw_scan() refuses the 5 GHz iteration of mac80211's per-band hw_scan loop with -EOPNOTSUPP, so the firmware never sees the scan request that would be rejected with status 2 → -EINVAL cascade. Phase 7 verified 2026-05-18 on ohm running pkgrel=2: Pattern A 14.3/h → 0/h over 30-min window, no WARN/BUG, single-band 2.4 GHz scans still return BSSes cleanly. Two flavors imported (scan-filter-5ghz and scan-filter-5ghz-danctnix) matching the convention of other bes2600 series — the code path doesn't touch timer APIs so the two are byte-identical for now; flavor separation is kept to preserve consistency in ohm.yaml. The arm64 scs-arm-neon-build-fix series is a build-environment workaround: GCC 15.2.1 strictly validates that -fsanitize=shadow- call-stack requires -ffixed-x18, and arm_neon.h's #pragma target/ push/pop blocks lose x18 fixing inside the wrapped section. The Makefile tweak re-adds -ffixed-x18 explicitly for xor-neon.o. It's a no-op when SCS is off (current pkgrel=2 ohm config) and unblocks SCS=y once GCC upstream is fixed. ohm.yaml gains a CONFIG_SHADOW_CALL_STACK=n config override with a pointer to besser#20 (the re-enable tracking issue) so future manifest-driven kconfig generation honors the workaround without silently dropping it. Source-of-truth commit for Patch I: marfrit/bes2600-dkms branch bes2600/scan-filter-5ghz sha 093a503 PKGBUILD-side (already deployed to ohm via pkgrel=2): marfrit/besser branch claude-noether-14 sha ae175f9 Refs: besser#1 (closed), besser#20, kernel-agent#5 --- fleet/ohm.yaml | 18 ++++ ...-arm64-xor-neon-ffixed-x18-build-fix.patch | 36 ++++++++ ...r-5-GHz-scans-at-the-driver-boundary.patch | 91 +++++++++++++++++++ ...r-5-GHz-scans-at-the-driver-boundary.patch | 91 +++++++++++++++++++ 4 files changed, 236 insertions(+) create mode 100644 patches/arch/arm64/scs-arm-neon-build-fix/0001-arm64-xor-neon-ffixed-x18-build-fix.patch create mode 100644 patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch create mode 100644 patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch diff --git a/fleet/ohm.yaml b/fleet/ohm.yaml index efd2003..1bbcff4 100644 --- a/fleet/ohm.yaml +++ b/fleet/ohm.yaml @@ -55,6 +55,18 @@ includes: - driver/bes2600/drop-orphan-file-io-danctnix/ - driver/bes2600/remove-chardev-user-interface/ - driver/bes2600/enable-testmode/ + # Patch I — besser#1 closure. Filter 5 GHz scan iteration at the + # driver boundary (refuses 5 GHz drv_hw_scan with -EOPNOTSUPP). + # Eliminates the wsm_generic_confirm 0x0007 dmesg storm. + # Phase 7 verified 2026-05-18: Pattern A 14.3/h → 0/h. + - driver/bes2600/scan-filter-5ghz-danctnix/ + + # Build-environment workaround for GCC 15.2.1 + CONFIG_SHADOW_CALL_STACK=y + # + arm_neon.h #pragma pop_options interaction. See besser#20 for the + # re-enable-once-GCC-fixed tracking; for now we ship with SCS=n in the + # config and this Makefile tweak as belt-and-suspenders (no-op if SCS + # is off; allows SCS=on once GCC permits). Cross-arch fix, not bes2600. + - arch/arm64/scs-arm-neon-build-fix/ # Explicitly NOT included (decision logged): # - debian-copyright-fsf-address: Debian packaging metadata, not kernel @@ -64,6 +76,12 @@ config: source: hand-managed config file in boltzmann:~/src/besser/marfrit-besser/danctnix-besser-pkgbuild/kernel/config strategy: snapshot, fold to baseline, accept-new with rationale on diff TODO: migrate config into kernel-agent flow once kconfig-by-manifest lands + # Override applied for pkgrel=2 (2026-05-18): CONFIG_SHADOW_CALL_STACK=n + # to work around GCC 15.2.1 arm_neon.h pragma issue. Track besser#20 + # for re-enable plan. Flip back to =y in the manifest once verified + # to build clean on current Arch ARM GCC. + overrides: + CONFIG_SHADOW_CALL_STACK: n # WORKAROUND besser#20 — restore to y when GCC is fixed package: name: linux-pinetab2-danctnix-besser diff --git a/patches/arch/arm64/scs-arm-neon-build-fix/0001-arm64-xor-neon-ffixed-x18-build-fix.patch b/patches/arch/arm64/scs-arm-neon-build-fix/0001-arm64-xor-neon-ffixed-x18-build-fix.patch new file mode 100644 index 0000000..a264806 --- /dev/null +++ b/patches/arch/arm64/scs-arm-neon-build-fix/0001-arm64-xor-neon-ffixed-x18-build-fix.patch @@ -0,0 +1,36 @@ +From: Markus Fritsche +Date: Mon, 18 May 2026 11:42:00 +0200 +Subject: [PATCH] arm64: xor-neon: restore -ffixed-x18 when SHADOW_CALL_STACK=y + (GCC 15+ build fix) + +GCC 15.2.1 enforces that -fsanitize=shadow-call-stack requires +-ffixed-x18 inside arm_neon.h's #pragma GCC target() blocks. The +existing CFLAGS_REMOVE_xor-neon.o line strips the kernel-wide +-ffixed-x18 (it's part of CC_FLAGS_NO_FPU) and CC_FLAGS_FPU does not +restore it, so xor-neon.c fails to build on stricter GCC versions +when CONFIG_SHADOW_CALL_STACK=y. + +Add an explicit -ffixed-x18 just for this object, gated on the +SCS config so non-SCS builds are unaffected. + +Build environment workaround; not a kernel-runtime bug. +--- + arch/arm64/lib/Makefile | 4 ++++ + 1 file changed, 4 insertions(+) + +diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile +index 1234567..2345678 100644 +--- a/arch/arm64/lib/Makefile ++++ b/arch/arm64/lib/Makefile +@@ -9,6 +9,10 @@ ifeq ($(CONFIG_KERNEL_MODE_NEON), y) + obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o + CFLAGS_xor-neon.o += $(CC_FLAGS_FPU) + CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU) ++# GCC 15+ enforces that -fsanitize=shadow-call-stack requires -ffixed-x18 ++# even after a #pragma GCC pop_options inside arm_neon.h. CC_FLAGS_REMOVE ++# above strips the kernel-wide -ffixed-x18 (part of CC_FLAGS_NO_FPU); add ++# it back here so xor-neon.c still compiles when SHADOW_CALL_STACK=y. ++CFLAGS_xor-neon.o += $(if $(CONFIG_SHADOW_CALL_STACK),-ffixed-x18) + endif + + lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o diff --git a/patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch b/patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch new file mode 100644 index 0000000..4447378 --- /dev/null +++ b/patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch @@ -0,0 +1,91 @@ +From 093a5038b8b68f316d976b7cb69609ca7f24f322 Mon Sep 17 00:00:00 2001 +From: Markus Fritsche +Date: Mon, 18 May 2026 11:27:40 +0200 +Subject: [PATCH] bes2600: filter 5 GHz scans at the driver boundary (besser#1) + +The BES2600 firmware refuses WSM start-scan for 5 GHz with status 2 +("rejected by policy"). This shows up in dmesg as the recurring + + wsm_generic_confirm failed for request 0x0007. + [SCAN] Scan failed (-22). + +pattern (besser issue #1, ~14-16/h on ohm/PineTab2 baseline). + +Trace shows every reject is the second of a back-to-back pair: mac80211 +splits multi-band hw_scan requests per band when the driver does not +set IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't), then re-invokes +drv_hw_scan from __ieee80211_scan_completed for each subsequent band. +The 2.4 GHz iteration succeeds; the 5 GHz iteration is what the +firmware rejects. See ieee80211_prep_hw_scan in net/mac80211/scan.c +for the loop, and the existing memory reference_bes2600_5ghz_scan_reject +for the firmware behaviour. + +The 056a71a defer-on-reject patch already in this tree handles the +BT-A2DP-coex branch and the consecutive-reject backoff, but it cannot +prevent the per-band-loop reject: by the time defer_should_scan is +consulted, the per-band call is already in flight, and the reject_count +gets reset on every successful 2.4 GHz scan in between (which is +~36% of attempts), so the threshold never trips. + +The fix: refuse the 5 GHz iteration upfront in bes2600_hw_scan. The +2.4 GHz scan still runs normally. The 5 GHz portion is reported as +aborted to userspace -- same outcome as today, minus the dmesg storm +and the wsm_generic_confirm WARN cascade. + +5 GHz band registration is intentionally left in place: direct-BSSID +association to a known 5 GHz AP still works (no scan is needed for +that path), and a future firmware update that fixes the scan behaviour +should not be foreclosed by changing band advertisement. + +Contract: per include/net/mac80211.h ieee80211_ops.hw_scan, a negative +return aborts the scan without requiring ieee80211_scan_completed(). +-EOPNOTSUPP is the semantically accurate code (operation is legal, +driver can't service it on this band today). + +Phase 3 evidence: +- baseline N=3: rate ~14.3-23.6/h converged at 14.3/h (matches OP) +- back-to-back scan gap: 6/6 rejected pairs <200us, 1/1 successful + pair was 114ms (single-band-only, no 5 GHz leg) +- defer log fires: 0/9 in 30-min window (056a71a structurally bypassed) + +Predicted Phase 7 delta: Pattern A 14/h -> 0/h. +--- + bes2600/scan.c | 22 ++++++++++++++++++++++ + 1 file changed, 22 insertions(+) + +diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c +index fb1d298..a81afb6 100644 +--- a/drivers/staging/bes2600/scan.c ++++ b/drivers/staging/bes2600/scan.c +@@ -238,6 +238,28 @@ int bes2600_hw_scan(struct ieee80211_hw *hw, + /* Scan when P2P_GO corrupt firmware MiniAP mode */ + if (priv->join_status == BES2600_JOIN_STATUS_AP) + return -EOPNOTSUPP; ++ ++ /* ++ * Firmware refuses WSM start-scan for 5 GHz with status 2 ("rejected ++ * by policy"); see besser issue #1. mac80211 splits multi-band ++ * hw_scan requests per-band when the driver does not set ++ * IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't -- see ++ * ieee80211_hw_set() calls in bes2600_main.c), so each per-band call ++ * has req->channels[] from one band only (see ieee80211_prep_hw_scan ++ * in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver ++ * boundary so userspace gets a clean aborted-scan for that portion ++ * rather than waiting for the firmware reject to cascade up. 5 GHz ++ * band registration stays intact so direct-BSSID association to a ++ * known 5 GHz AP still works (no scan needed for that path). ++ * ++ * Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan ++ * documentation, a negative return aborts the scan without requiring ++ * ieee80211_scan_completed(). ++ */ ++ if (req->n_channels > 0 && ++ req->channels[0]->band == NL80211_BAND_5GHZ) ++ return -EOPNOTSUPP; ++ + #if 0 + if (work_pending(&priv->offchannel_work) || + (hw_priv->roc_if_id != -1)) { +-- +2.54.0 + diff --git a/patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch b/patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch new file mode 100644 index 0000000..4447378 --- /dev/null +++ b/patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch @@ -0,0 +1,91 @@ +From 093a5038b8b68f316d976b7cb69609ca7f24f322 Mon Sep 17 00:00:00 2001 +From: Markus Fritsche +Date: Mon, 18 May 2026 11:27:40 +0200 +Subject: [PATCH] bes2600: filter 5 GHz scans at the driver boundary (besser#1) + +The BES2600 firmware refuses WSM start-scan for 5 GHz with status 2 +("rejected by policy"). This shows up in dmesg as the recurring + + wsm_generic_confirm failed for request 0x0007. + [SCAN] Scan failed (-22). + +pattern (besser issue #1, ~14-16/h on ohm/PineTab2 baseline). + +Trace shows every reject is the second of a back-to-back pair: mac80211 +splits multi-band hw_scan requests per band when the driver does not +set IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't), then re-invokes +drv_hw_scan from __ieee80211_scan_completed for each subsequent band. +The 2.4 GHz iteration succeeds; the 5 GHz iteration is what the +firmware rejects. See ieee80211_prep_hw_scan in net/mac80211/scan.c +for the loop, and the existing memory reference_bes2600_5ghz_scan_reject +for the firmware behaviour. + +The 056a71a defer-on-reject patch already in this tree handles the +BT-A2DP-coex branch and the consecutive-reject backoff, but it cannot +prevent the per-band-loop reject: by the time defer_should_scan is +consulted, the per-band call is already in flight, and the reject_count +gets reset on every successful 2.4 GHz scan in between (which is +~36% of attempts), so the threshold never trips. + +The fix: refuse the 5 GHz iteration upfront in bes2600_hw_scan. The +2.4 GHz scan still runs normally. The 5 GHz portion is reported as +aborted to userspace -- same outcome as today, minus the dmesg storm +and the wsm_generic_confirm WARN cascade. + +5 GHz band registration is intentionally left in place: direct-BSSID +association to a known 5 GHz AP still works (no scan is needed for +that path), and a future firmware update that fixes the scan behaviour +should not be foreclosed by changing band advertisement. + +Contract: per include/net/mac80211.h ieee80211_ops.hw_scan, a negative +return aborts the scan without requiring ieee80211_scan_completed(). +-EOPNOTSUPP is the semantically accurate code (operation is legal, +driver can't service it on this band today). + +Phase 3 evidence: +- baseline N=3: rate ~14.3-23.6/h converged at 14.3/h (matches OP) +- back-to-back scan gap: 6/6 rejected pairs <200us, 1/1 successful + pair was 114ms (single-band-only, no 5 GHz leg) +- defer log fires: 0/9 in 30-min window (056a71a structurally bypassed) + +Predicted Phase 7 delta: Pattern A 14/h -> 0/h. +--- + bes2600/scan.c | 22 ++++++++++++++++++++++ + 1 file changed, 22 insertions(+) + +diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c +index fb1d298..a81afb6 100644 +--- a/drivers/staging/bes2600/scan.c ++++ b/drivers/staging/bes2600/scan.c +@@ -238,6 +238,28 @@ int bes2600_hw_scan(struct ieee80211_hw *hw, + /* Scan when P2P_GO corrupt firmware MiniAP mode */ + if (priv->join_status == BES2600_JOIN_STATUS_AP) + return -EOPNOTSUPP; ++ ++ /* ++ * Firmware refuses WSM start-scan for 5 GHz with status 2 ("rejected ++ * by policy"); see besser issue #1. mac80211 splits multi-band ++ * hw_scan requests per-band when the driver does not set ++ * IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't -- see ++ * ieee80211_hw_set() calls in bes2600_main.c), so each per-band call ++ * has req->channels[] from one band only (see ieee80211_prep_hw_scan ++ * in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver ++ * boundary so userspace gets a clean aborted-scan for that portion ++ * rather than waiting for the firmware reject to cascade up. 5 GHz ++ * band registration stays intact so direct-BSSID association to a ++ * known 5 GHz AP still works (no scan needed for that path). ++ * ++ * Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan ++ * documentation, a negative return aborts the scan without requiring ++ * ieee80211_scan_completed(). ++ */ ++ if (req->n_channels > 0 && ++ req->channels[0]->band == NL80211_BAND_5GHZ) ++ return -EOPNOTSUPP; ++ + #if 0 + if (work_pending(&priv->offchannel_work) || + (hw_priv->roc_if_id != -1)) { +-- +2.54.0 + From 43c8f0cba8fc007f22725a483204d4b4ad81c58e Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Mon, 18 May 2026 15:57:20 +0200 Subject: [PATCH 4/5] =?UTF-8?q?patches/driver/bes2600:=20scan-filter-5ghz?= =?UTF-8?q?=20refinement=20=E2=80=94=20allow=20targeted=20single-channel?= =?UTF-8?q?=20scans?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updates both flavors with the n_channels > 1 refinement (was > 0). The original guard refused ALL 5 GHz scans which broke 5 GHz association via NM band=a profiles (NM iterates freq_list per channel, single-channel scans were also refused). Tightened: only multi-channel 5 GHz scans (the per-band-sweep that triggers the firmware storm) are refused; single-channel 5 GHz scans pass through so NM/wpa_supplicant can find and associate to 5 GHz BSSes. Verified on ohm with locally-built pkgrel=3 (srcversion BEB625FA7443171EA8D55F7): associated to 5 GHz BSSID c0:25:06:e6:5b:33 on 5240 MHz / ch.48, 150 Mbit/s MCS 7 40MHz short-GI; Pattern A still 0 since boot. Patch file is now a concatenation of two commits from marfrit/bes2600-dkms bes2600/scan-filter-5ghz branch: 093a503 (original Patch I) 8cd10f4 (this refinement) patch -Np1 applies them sequentially -> net effect = single squash. Refs: besser#1 (closed), PKGBUILD update at marfrit/besser claude-noether-14 commit 122582e (pkgrel=3 deployed to ohm on 2026-05-18 same session). --- ...r-5-GHz-scans-at-the-driver-boundary.patch | 79 ++++++++++++++++++- ...r-5-GHz-scans-at-the-driver-boundary.patch | 79 ++++++++++++++++++- 2 files changed, 156 insertions(+), 2 deletions(-) diff --git a/patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch b/patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch index 4447378..76df117 100644 --- a/patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch +++ b/patches/driver/bes2600/scan-filter-5ghz-danctnix/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch @@ -1,7 +1,8 @@ From 093a5038b8b68f316d976b7cb69609ca7f24f322 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Mon, 18 May 2026 11:27:40 +0200 -Subject: [PATCH] bes2600: filter 5 GHz scans at the driver boundary (besser#1) +Subject: [PATCH 1/2] bes2600: filter 5 GHz scans at the driver boundary + (besser#1) The BES2600 firmware refuses WSM start-scan for 5 GHz with status 2 ("rejected by policy"). This shows up in dmesg as the recurring @@ -89,3 +90,79 @@ index fb1d298..a81afb6 100644 -- 2.54.0 + +From 8cd10f487c8144d462a510812ba0fa717b3e24df Mon Sep 17 00:00:00 2001 +From: Markus Fritsche +Date: Mon, 18 May 2026 15:56:34 +0200 +Subject: [PATCH 2/2] bes2600: scan-filter-5ghz: allow targeted single-channel + scans (besser#1 follow-up) + +The original Patch I refused EVERY 5 GHz scan request unconditionally +(req->n_channels > 0 && band == NL80211_BAND_5GHZ). This eliminated +the Pattern A storm but also broke 5 GHz association entirely: +NM / wpa_supplicant iterates a freq_list when a connection profile +specifies 802-11-wireless.band=a, issuing per-frequency single-channel +scans to find the BSS before associating. Those single-channel scans +were also refused by our guard, so the BSS was never seen and +'Wi-Fi network could not be found' was the only outcome. + +Tighten the guard: refuse only multi-channel 5 GHz scans (n_channels +> 1), which is the per-band-sweep pattern mac80211 issues internally +and the only one that triggers the firmware storm at the per-band +loop boundary. Single-channel 5 GHz scans pass through to firmware, +which generally accepts them -- and when they happen to be rejected, +the failure is isolated and doesn't cascade. + +Verified on ohm with pkgrel=3 (srcversion BEB625FA7443171EA8D55F7): + - Pattern A count since boot: 0 (Phase 7 prediction still holds) + - iw dev wlan0 scan freq 5180 -> allowed + - iw dev wlan0 scan freq 5180 5200 ... -> refused -EOPNOTSUPP + - NM 'nmcli connection up' with band=a -> associated to BSSID + c0:25:06:e6:5b:33 on 5240 MHz / ch.48 in ~1 second + - TX bitrate 150 Mbit/s MCS 7 40MHz short-GI (vs 72.2 Mbit/s + HT20 on 2.4 GHz) -- ~2x throughput recovered + +The change is a single byte (> 0 -> > 1) plus comment update; the +test confirmation above is what motivates it. + +Refs: besser#1 (closed but tracked for follow-up like this), original +Patch I sha 093a503. +--- + bes2600/scan.c | 16 ++++++++++++---- + 1 file changed, 12 insertions(+), 4 deletions(-) + +diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c +index a81afb6..497523b 100644 +--- a/drivers/staging/bes2600/scan.c ++++ b/drivers/staging/bes2600/scan.c +@@ -248,15 +248,23 @@ int bes2600_hw_scan(struct ieee80211_hw *hw, + * has req->channels[] from one band only (see ieee80211_prep_hw_scan + * in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver + * boundary so userspace gets a clean aborted-scan for that portion +- * rather than waiting for the firmware reject to cascade up. 5 GHz +- * band registration stays intact so direct-BSSID association to a +- * known 5 GHz AP still works (no scan needed for that path). ++ * rather than waiting for the firmware reject to cascade up. ++ * ++ * Only the multi-channel case is refused (n_channels > 1): that's ++ * the per-band-sweep pattern mac80211 issues internally and the ++ * one that triggers the firmware storm at the per-band loop ++ * boundary. Single-channel 5 GHz scans (BSS verification, NM's ++ * per-freq iteration when 802-11-wireless.band=a is set) pass ++ * through to firmware, which generally accepts them since the ++ * storm is the back-to-back per-band issue, not a blanket 5 GHz ++ * reject. This preserves 5 GHz association via the ++ * "wpa_supplicant iterates freq_list per channel" path. + * + * Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan + * documentation, a negative return aborts the scan without requiring + * ieee80211_scan_completed(). + */ +- if (req->n_channels > 0 && ++ if (req->n_channels > 1 && + req->channels[0]->band == NL80211_BAND_5GHZ) + return -EOPNOTSUPP; + +-- +2.54.0 + diff --git a/patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch b/patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch index 4447378..76df117 100644 --- a/patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch +++ b/patches/driver/bes2600/scan-filter-5ghz/0001-bes2600-filter-5-GHz-scans-at-the-driver-boundary.patch @@ -1,7 +1,8 @@ From 093a5038b8b68f316d976b7cb69609ca7f24f322 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Mon, 18 May 2026 11:27:40 +0200 -Subject: [PATCH] bes2600: filter 5 GHz scans at the driver boundary (besser#1) +Subject: [PATCH 1/2] bes2600: filter 5 GHz scans at the driver boundary + (besser#1) The BES2600 firmware refuses WSM start-scan for 5 GHz with status 2 ("rejected by policy"). This shows up in dmesg as the recurring @@ -89,3 +90,79 @@ index fb1d298..a81afb6 100644 -- 2.54.0 + +From 8cd10f487c8144d462a510812ba0fa717b3e24df Mon Sep 17 00:00:00 2001 +From: Markus Fritsche +Date: Mon, 18 May 2026 15:56:34 +0200 +Subject: [PATCH 2/2] bes2600: scan-filter-5ghz: allow targeted single-channel + scans (besser#1 follow-up) + +The original Patch I refused EVERY 5 GHz scan request unconditionally +(req->n_channels > 0 && band == NL80211_BAND_5GHZ). This eliminated +the Pattern A storm but also broke 5 GHz association entirely: +NM / wpa_supplicant iterates a freq_list when a connection profile +specifies 802-11-wireless.band=a, issuing per-frequency single-channel +scans to find the BSS before associating. Those single-channel scans +were also refused by our guard, so the BSS was never seen and +'Wi-Fi network could not be found' was the only outcome. + +Tighten the guard: refuse only multi-channel 5 GHz scans (n_channels +> 1), which is the per-band-sweep pattern mac80211 issues internally +and the only one that triggers the firmware storm at the per-band +loop boundary. Single-channel 5 GHz scans pass through to firmware, +which generally accepts them -- and when they happen to be rejected, +the failure is isolated and doesn't cascade. + +Verified on ohm with pkgrel=3 (srcversion BEB625FA7443171EA8D55F7): + - Pattern A count since boot: 0 (Phase 7 prediction still holds) + - iw dev wlan0 scan freq 5180 -> allowed + - iw dev wlan0 scan freq 5180 5200 ... -> refused -EOPNOTSUPP + - NM 'nmcli connection up' with band=a -> associated to BSSID + c0:25:06:e6:5b:33 on 5240 MHz / ch.48 in ~1 second + - TX bitrate 150 Mbit/s MCS 7 40MHz short-GI (vs 72.2 Mbit/s + HT20 on 2.4 GHz) -- ~2x throughput recovered + +The change is a single byte (> 0 -> > 1) plus comment update; the +test confirmation above is what motivates it. + +Refs: besser#1 (closed but tracked for follow-up like this), original +Patch I sha 093a503. +--- + bes2600/scan.c | 16 ++++++++++++---- + 1 file changed, 12 insertions(+), 4 deletions(-) + +diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c +index a81afb6..497523b 100644 +--- a/drivers/staging/bes2600/scan.c ++++ b/drivers/staging/bes2600/scan.c +@@ -248,15 +248,23 @@ int bes2600_hw_scan(struct ieee80211_hw *hw, + * has req->channels[] from one band only (see ieee80211_prep_hw_scan + * in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver + * boundary so userspace gets a clean aborted-scan for that portion +- * rather than waiting for the firmware reject to cascade up. 5 GHz +- * band registration stays intact so direct-BSSID association to a +- * known 5 GHz AP still works (no scan needed for that path). ++ * rather than waiting for the firmware reject to cascade up. ++ * ++ * Only the multi-channel case is refused (n_channels > 1): that's ++ * the per-band-sweep pattern mac80211 issues internally and the ++ * one that triggers the firmware storm at the per-band loop ++ * boundary. Single-channel 5 GHz scans (BSS verification, NM's ++ * per-freq iteration when 802-11-wireless.band=a is set) pass ++ * through to firmware, which generally accepts them since the ++ * storm is the back-to-back per-band issue, not a blanket 5 GHz ++ * reject. This preserves 5 GHz association via the ++ * "wpa_supplicant iterates freq_list per channel" path. + * + * Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan + * documentation, a negative return aborts the scan without requiring + * ieee80211_scan_completed(). + */ +- if (req->n_channels > 0 && ++ if (req->n_channels > 1 && + req->channels[0]->band == NL80211_BAND_5GHZ) + return -EOPNOTSUPP; + +-- +2.54.0 + From c9e9ad973cfe0909ff547598f68c915ca6b53e44 Mon Sep 17 00:00:00 2001 From: "Claude (noether)" Date: Mon, 18 May 2026 16:59:28 +0200 Subject: [PATCH 5/5] patches/driver/bes2600/queue-pending-record-lock-bh-danctnix: mirror besser#18 fix from bes2600-dkms Single-patch series-dir, mirror of the Markus-authored commit d95453c on marfrit/bes2600-dkms branch bes2600/queue-pending-record-lock-bh-fix (PR #11). Paths rewritten from DKMS-style (bes2600/foo.c) to in-tree staging (drivers/staging/bes2600/foo.c) via sed -- this is the in-tree variant. Fix: convert plain spin_lock(&pending_record_lock) to spin_lock_bh() at the 5 sites where it's taken in non-BH-disabled contexts (queue.c:832/839/844, tx_loop.c:112/114). queue.c:289/295 stays as plain spin_lock because BH is already disabled by the outer queue->lock_bh acquired at queue.c:285. Eliminates the SOFTIRQ-safe -> SOFTIRQ-unsafe lockdep warning reported in besser#18 (PROVE_LOCKING-only -- non-fatal on production builds where lockdep is off, but real AB-BA window between bes2600_join_work workqueue context and bes2600_tx softirq context). This commit does NOT add the include to fleet/ohm.yaml. The patch will be wired into ohm's manifest in a follow-up commit (or this branch's PR can extend with the ohm.yaml change once the migration PR #28 lands and the bes2600-dkms PR #11 is reviewed). Closes: besser#18 Refs: marfrit/bes2600-dkms #11 (source-of-truth PR) --- ...600-take-pending-record-lock-with-bh.patch | 121 ++++++++++++++++++ .../README.md | 19 +++ 2 files changed, 140 insertions(+) create mode 100644 patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/0001-bes2600-take-pending-record-lock-with-bh.patch create mode 100644 patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/README.md diff --git a/patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/0001-bes2600-take-pending-record-lock-with-bh.patch b/patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/0001-bes2600-take-pending-record-lock-with-bh.patch new file mode 100644 index 0000000..ff82b10 --- /dev/null +++ b/patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/0001-bes2600-take-pending-record-lock-with-bh.patch @@ -0,0 +1,121 @@ +From d95453c98e31d7a47bc227aef5d0b426ac9e334b Mon Sep 17 00:00:00 2001 +From: Markus Fritsche +Date: Mon, 18 May 2026 16:58:49 +0200 +Subject: [PATCH] =?UTF-8?q?bes2600:=20take=20pending=5Frecord=5Flock=20wit?= + =?UTF-8?q?h=20=5Fbh()=20to=20fix=20SOFTIRQ-safe=20=E2=86=92=20-unsafe=20i?= + =?UTF-8?q?nversion=20(besser#18)?= +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +PROVE_LOCKING reports: + + WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected + kworker/u16:1 is trying to acquire: + &hw_priv->tx_loop.pending_record_lock at bes2600_queue_clear+0x80 + and this task is already holding: + &queue->lock at bes2600_queue_clear+0x60 + + which would create a new lock dependency: + (&queue->lock){+.-.} -> (&hw_priv->tx_loop.pending_record_lock){+.+.} + + but this new dependency connects a SOFTIRQ-irq-safe lock: + (&queue->lock){+.-.} + ... which became SOFTIRQ-irq-safe at: + bes2600_tx -> ieee80211_handle_wake_tx_queue -> tasklet_action + to a SOFTIRQ-irq-unsafe lock: + (&hw_priv->tx_loop.pending_record_lock){+.+.} + ... which became SOFTIRQ-irq-unsafe at: + bes2600_queue_get_skb -> bes2600_join_work -> process_one_work + +queue->lock is taken consistently with spin_lock_bh() at 22 sites; +the nested acquisition of pending_record_lock at queue.c:289 (inside +the outer queue->lock_bh held at line 285) had it implicitly BH-safe +via the outer scope. But pending_record_lock is ALSO taken from +non-BH-disabled contexts: + + bes2600_queue_get_skb (queue.c:832) — process context via + bes2600_join_work (workqueue), no outer queue->lock held + bes2600_tx_loop_item_pending_check (tx_loop.c:112) + — TX-loop context, no outer + queue->lock held + +When CPU0 holds pending_record_lock from one of those non-BH paths +and a softirq fires that wants queue->lock, and CPU1 in softirq has +queue->lock and is about to acquire pending_record_lock — classic AB-BA +SOFTIRQ deadlock. + +The fix is the conservative one: take pending_record_lock with _bh() +at every site that's not already inside a queue->lock_bh-held scope. +That makes the lock consistently SOFTIRQ-safe, eliminating the +inversion. queue.c:289/295 stays as plain spin_lock because BH is +already disabled by the outer queue->lock_bh acquired at queue.c:285. + +Five sites converted: + bes2600/queue.c:832 -- spin_lock -> spin_lock_bh + bes2600/queue.c:839 -- spin_unlock -> spin_unlock_bh + bes2600/queue.c:844 -- spin_unlock -> spin_unlock_bh + bes2600/tx_loop.c:112 -- spin_lock -> spin_lock_bh + bes2600/tx_loop.c:114 -- spin_unlock -> spin_unlock_bh + +Contract: + - Documentation/locking/locktypes.rst spelling: spin_lock_bh() is + the canonical way to make a non-IRQ spinlock safe against + softirq preemption that might re-enter the same lock. + - Same shape as queue->lock in this driver and as is_drv->lock + in the cw1200 ancestor. + +Closes: besser#18 +Fixes: +Signed-off-by: Markus Fritsche +--- + bes2600/queue.c | 6 +++--- + bes2600/tx_loop.c | 4 ++-- + 2 files changed, 5 insertions(+), 5 deletions(-) + +diff --git a/drivers/staging/bes2600/queue.c b/drivers/staging/bes2600/queue.c +index cc606c1..4016b76 100644 +--- a/drivers/staging/bes2600/queue.c ++++ b/drivers/staging/bes2600/queue.c +@@ -829,19 +829,19 @@ int bes2600_queue_get_skb(struct bes2600_queue *queue, u32 packetID, + bes2600_queue_parse_id(packetID, &queue_generation, &queue_id, + &item_generation, &item_id, &if_id, &link_id); + +- spin_lock(&queue->stats->hw_priv->tx_loop.pending_record_lock); ++ spin_lock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock); + if (!list_empty(&queue->stats->hw_priv->tx_loop.pending_record_list)) { + list_for_each_entry_safe(record_item, temp_record_item, &queue->stats->hw_priv->tx_loop.pending_record_list, head) { + if (record_item->packetID == packetID) { + list_del(&record_item->head); + dev_kfree_skb(record_item->skb); + kfree(record_item); +- spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock); ++ spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock); + return -EINVAL; + } + } + } +- spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock); ++ spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock); + + item = &queue->pool[item_id]; + +diff --git a/drivers/staging/bes2600/tx_loop.c b/drivers/staging/bes2600/tx_loop.c +index e6cf072..0cf7ce1 100644 +--- a/drivers/staging/bes2600/tx_loop.c ++++ b/drivers/staging/bes2600/tx_loop.c +@@ -109,9 +109,9 @@ void bes2600_tx_loop_set_enable(struct bes2600_common *hw_priv, bool need_warn) + bes2600_queue_iterate_pending_packet(&hw_priv->tx_queue[i], + bes2600_tx_loop_item_pending_item); + } +- spin_lock(&hw_priv->tx_loop.pending_record_lock); ++ spin_lock_bh(&hw_priv->tx_loop.pending_record_lock); + bes2600_queue_iterate_record_pending_packet(hw_priv, bes2600_tx_loop_item_pending_item); +- spin_unlock(&hw_priv->tx_loop.pending_record_lock); ++ spin_unlock_bh(&hw_priv->tx_loop.pending_record_lock); + + if (atomic_read(&hw_priv->bh_rx) > 0) + wake_up(&hw_priv->bh_wq); +-- +2.54.0 + diff --git a/patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/README.md b/patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/README.md new file mode 100644 index 0000000..28f809f --- /dev/null +++ b/patches/driver/bes2600/queue-pending-record-lock-bh-danctnix/README.md @@ -0,0 +1,19 @@ +# queue-pending-record-lock-bh-danctnix — close besser#18 + +Converts `pending_record_lock` to `spin_lock_bh()` at the 5 sites +where it is taken in non-BH-disabled contexts (`bes2600_queue_get_skb` +called from `bes2600_join_work`, and `bes2600_tx_loop_item_pending_check`). + +Eliminates the PROVE_LOCKING SOFTIRQ-safe → SOFTIRQ-unsafe warning +reported in besser#18: `&queue->lock` (taken with `_bh` everywhere, +including the nested acquisition at `queue.c:289` that holds +`pending_record_lock` as inner) was registered SOFTIRQ-irq-safe by +lockdep, but `pending_record_lock` was sometimes taken without BH +disable, creating an AB-BA deadlock window. + +Provenance: +- Source-of-truth commit on `marfrit/bes2600-dkms` branch + `bes2600/queue-pending-record-lock-bh-fix`, commit `d95453c`. +- This file is the same commit's `git format-patch` output with + the DKMS-style `bes2600/foo.c` paths rewritten to in-tree + `drivers/staging/bes2600/foo.c` paths via sed.