Casey Muratori stream about vector lane masking
Posted: 2022-06-21, 22:16:10
Casey Muratori recently did a stream about what looks like a design mistake in the very new RISC-V SIMD proposal:
https://twitter.com/cmuratori/status/15 ... 1307251713
Basically, only v0 (the first vector register) can be used as a mask. Each lane is masked by a single bit, and those bits are tightly packed into the lowest bits of v0, so the whole mask sits in the first lane or two.
What I wonder is, how much will this affect hardware complexity and performance? Will implementations have to shadow the mask across the lanes? Will there be a latency penalty? Or can it all be solved with just a bit more register renaming and value forwarding? Also, how does it compare to ForwardCom's solution of using sparse masking instead (where each mask bit is located in its corresponding lane rather than tightly packed, and more than one vector register can be used as a mask)? Any opinions?