Commit Graph

1 Commits

Author SHA1 Message Date
claude-noether 74687d9def cycle 7: V3D shader for H.264 IDCT 8x8
Mirrors cycle 6 (PR #7 prior commit) but at 8x8 scale: 8 lanes per
block, 8 blocks per WG.  H.264 §8.5.13.2 1D butterfly twice (row
pass, column pass), (val + 32) >> 6 rounded + clipped + added to
dst.

Bit-exact first try on hertz (Pi 5, V3D 7.1):

  H264_IDCT4 recipe substrate:      2 (QPU)
  H264_IDCT8 recipe substrate:      2 (QPU)    ← flipped
  H264_DEBLOCK_LV recipe substrate: 2 (QPU)
  H264_QPEL_MC20 recipe substrate:  1 (CPU)    ← task #165 remaining
  H.264 IDCT 4x4: 2048/2048 bytes bit-exact
  H.264 IDCT 8x8: 2048/2048 bytes bit-exact    ← QPU
  H.264 deblock luma v: 2048/2048 bytes bit-exact
  H.264 qpel mc20: 1024/1024 bytes bit-exact

8 of 9 daedalus-fourier cycles now QPU-by-recipe.  Only cycle 9
(H.264 luma qpel mc20) still CPU — different shape (6-tap MC
filter, not a transform) so needs its own shader template; task
#165 covers it as a follow-on.

Same pattern as cycle 6 commit (65bd5c3): adds h264_idct8_pipe
field + lazy init, dispatch_h264_idct8_qpu() with 3 SSBOs,
v3d_h264_idct8.spv install rule.

Uses v3d_runner_create_buffer / destroy_buffer (will swap to
pool API once PR #6 lands).
2026-05-23 20:09:25 +02:00