v3d_runner: persistent per-pipeline command buffer

Phase 2 of the QPU-default substrate campaign: eliminate
vkAllocateCommandBuffers from the dispatch hot path.

Attaches a VkCommandBuffer to each v3d_pipeline, allocated once in
v3d_runner_create_pipeline() and freed in destroy_pipeline(). The
five dispatch_*_qpu sites switch from v3d_runner_alloc_cmdbuf() to
v3d_runner_pipeline_cmdbuf_reset(); vkResetCommandBuffer is O(1)
versus the driver-side allocation walk. The pool was already created
with VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT, so resetting
individual command buffers is permitted.

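The lifecycle described above can be modelled without any Vulkan calls. The sketch below is illustrative only: every `model_*` name is hypothetical and does not exist in v3d_runner or Vulkan. The counters stand in for vkAllocateCommandBuffers and vkResetCommandBuffer calls so the effect on the hot path is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in v3d_runner or Vulkan. */
typedef struct {
    unsigned allocs;   /* models vkAllocateCommandBuffers calls */
    unsigned resets;   /* models vkResetCommandBuffer calls */
} model_pool;

typedef struct {
    int recorded;      /* 1 while the buffer holds recorded commands */
} model_cmdbuf;

typedef struct {
    model_cmdbuf cb;   /* persistent buffer, in the role of v3d_pipeline.cb */
    int has_cb;        /* 1 once the buffer has been allocated */
} model_pipeline;

/* Create-pipeline time: the only place an allocation happens. */
void model_create_pipeline(model_pool *pool, model_pipeline *p)
{
    assert(pool != NULL && p != NULL);
    pool->allocs++;
    p->cb.recorded = 0;
    p->has_cb = 1;
}

/* Hot path: reset the existing buffer instead of allocating a new one.
 * Returns 0 on success, -1 if the pipeline has no buffer. */
int model_cmdbuf_reset(model_pool *pool, model_pipeline *p)
{
    if (pool == NULL || p == NULL || !p->has_cb)
        return -1;
    pool->resets++;
    p->cb.recorded = 0;
    return 0;
}

/* One dispatch: reset, then re-record into the same buffer. */
int model_dispatch(model_pool *pool, model_pipeline *p)
{
    if (model_cmdbuf_reset(pool, p) != 0)
        return -1;
    p->cb.recorded = 1;   /* stands in for begin/record/end/submit */
    return 0;
}
```

In this model, any number of calls to model_dispatch() after a single model_create_pipeline() leaves allocs at 1 and raises only resets, which is the property the commit is after.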
Microbench (hertz, Pi 5, kernel 6.18.29, V3D 7.1):

  before (task 160 pool only):
    steady-state p50:  76.44 us
    steady-state mean: 77.95 us
  after (task 160 pool + task 161 persistent cb):
    steady-state p50:  54.56 us
    steady-state mean: 56.00 us

  -> 28% per-dispatch reduction

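For reference, the 28% figure follows from the numbers above as (before - after) / before. A minimal sketch of that arithmetic, with percent_reduction() being a hypothetical helper name rather than anything in the tree:

```c
#include <assert.h>

/* Relative reduction, in percent, going from `before` to `after`.
 * Hypothetical helper, not part of v3d_runner. */
double percent_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Applied to the reported values, percent_reduction(76.44, 54.56) gives roughly 28.6 for p50 and percent_reduction(77.95, 56.00) gives roughly 28.2 for the mean, so both statistics agree with the rounded 28%.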
The remaining ~54 us steady-state is dominated by vkQueueWaitIdle,
shader execution, and the two memcpy calls (in/out) on the dst buffer.
Task 162 (dmabuf import for dst) targets the memcpy half.

test_api_idct stays bit-exact across CPU/QPU/AUTO substrates.

Refs daedalus-fourier task #161.
@@ -34,6 +34,12 @@ typedef struct {
     VkDescriptorSet desc_set;
     uint32_t n_ssbos;
     uint32_t push_const_size;
+    /* Persistent command buffer. Allocated at create-pipeline time;
+     * dispatch sites use v3d_runner_pipeline_cmdbuf_reset() to
+     * vkResetCommandBuffer instead of paying vkAllocateCommandBuffers
+     * per dispatch. Pool flagged RESET_COMMAND_BUFFER_BIT so reset
+     * is permitted. */
+    VkCommandBuffer cb;
 } v3d_pipeline;
 
 /*
@@ -121,6 +127,12 @@ int v3d_runner_bind_buffers(v3d_runner *r,
 /* Allocate a primary command buffer from the runner's pool. */
 VkCommandBuffer v3d_runner_alloc_cmdbuf(v3d_runner *r);
 
+/* Reset @p->cb so it can be re-recorded. Returns 0 on success.
+ * Replaces v3d_runner_alloc_cmdbuf() on the dispatch hot path —
+ * vkResetCommandBuffer is O(1) vs vkAllocateCommandBuffers' ~1-5us
+ * driver cost. */
+int v3d_runner_pipeline_cmdbuf_reset(v3d_runner *r, v3d_pipeline *p);
+
 /* Submit `cb` to the queue and wait for completion. The classic
  * timed operation. Returns 0 on success.
  */