h264: V3D shader for qpel mc22 (2D half-pel 'j' position) #31
Reference in New Issue
Block a user
Delete Branch "noether/v3d-shader-h264-qpel-mc22"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Cascaded H+V 6-tap per H.264 §8.4.2.2.1. Highest per-frame impact among missing qpel positions (2.33 ms/frame worst-case at 1080p per PR #24 bench).
Per-lane: each computes 6 horizontal lowpass int16 intermediates + 1 vertical lowpass with +512>>10 scale. ~50 ALU ops/lane. NO shared memory / barriers — V3D L2 absorbs redundant src reads (alternative was 13×8 int16 shmem + barrier; rejected as more complex without proven win).
CANNOT just cascade mc20→mc02 (int16 intermediate matters); dedicated kernel required.
Test on hertz:
qpel mc22: 2048/2048 bytes bit-exactvia QPU.Qpel QPU anchors complete: mc20 ✓, mc02 ✓, mc22 ✓. The 12 remaining qpel positions are quarter-pel L2-averaged combinations of these — can either dispatch via compose step or get dedicated shaders.