What I like about the "caching" approach is that it is completely transparent.
If it is not implemented by the hardware, the software still behaves exactly the same (only slower).
And it does not require any cross-lane communication.
After some thinking, though, such caching would be beneficial only for slow operations like division (and maybe multiplication?).
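To illustrate why such a cache is transparent, here is a hypothetical software model. The class `DivUnit`, its one-entry cache and the separate `div`/`rem` pair are illustrative assumptions, not a documented design. A `rem` that follows a `div` on the same unsigned operands reuses the cached result instead of dividing again. Disabling the cache changes only how much work is done, never the results.

```python
# Hypothetical model (not a real design): a one-entry result cache for a slow
# operation (unsigned division). The cache only saves work; results never change.

class DivUnit:
    def __init__(self, use_cache=True):
        self.use_cache = use_cache
        self.cached_key = None      # operands of the last full division
        self.cached_result = None   # (quotient, remainder) of that division
        self.full_divisions = 0     # how many times the slow path actually ran

    def _divide(self, a, b):
        key = (a, b)
        if self.use_cache and key == self.cached_key:
            return self.cached_result          # hit: reuse the earlier division
        self.full_divisions += 1               # miss: do the slow division once
        result = (a // b, a % b)               # one division yields both parts
        self.cached_key, self.cached_result = key, result
        return result

    def div(self, a, b):
        return self._divide(a, b)[0]

    def rem(self, a, b):
        return self._divide(a, b)[1]
```

With the cache, `div(100, 7)` followed by `rem(100, 7)` runs the slow path once. Without it, the slow path runs twice, yet both units return 14 and 2.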
That might be personal taste, but I'm not a big fan of giving odd and even indices in a vector different meanings.
Back to add with carry: is it really necessary to have a single instruction that produces both the result and the carry?
It might be worth recalculating the sum to get the carry instead. An integer addition is fast, so recalculating it would not add much overhead.
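As a sketch of why a dual-output instruction is not strictly needed, the carry of an unsigned 64-bit addition can be recovered afterwards from the operands with one more add and a compare. The Python below emulates 64-bit registers by masking; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def add(a, b):
    """Wrapping 64-bit unsigned addition (the ordinary add instruction)."""
    return (a + b) & MASK64

def carry_of_add(a, b):
    """Carry out of a 64-bit add, recomputed from the operands alone.

    The wrapped sum is smaller than the first operand exactly when the
    addition carried out of bit 63, so a second add plus a compare suffices.
    """
    return 1 if add(a, b) < a else 0
```

For example, `add(MASK64, 1)` wraps to 0 and `carry_of_add(MASK64, 1)` is 1. This `sum < a` test is the same trick commonly used for multi-word arithmetic in C.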
And at this point, would it be interesting to drop the saturating instructions and instead have a set of instructions for building saturated arithmetic?
Let me introduce three instructions and give some examples:
add_hi src1 src2: compute (src1 + src2) >> width(src), i.e. just the carry
overflow src1 src2: if src1 is zero, return src2; if src1 is positive, return MAX_INT; if src1 is negative, return MIN_INT
overflow_u src1 src2: same as overflow, for unsigned integers (any non-zero src1 saturates to the unsigned maximum)
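If you need to check unsigned operands for overflow, just check whether add_hi or mul_hi is non-zero. For signed operands, the high word of an in-range result is the sign extension of the low word (0 or -1), so the check is whether the high word differs from that sign extension.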
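A minimal Python model of the three proposed instructions for 64-bit registers might look like this. It is a sketch under assumptions: the operands of add_hi and mul_hi are taken as unsigned, and overflow_u is assumed to saturate to the unsigned maximum whenever src1 is non-zero.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1
INT_MAX = (1 << (WIDTH - 1)) - 1
INT_MIN = -(1 << (WIDTH - 1))
UINT_MAX = MASK

def add_hi(src1, src2):
    """(src1 + src2) >> width for unsigned operands: the carry, 0 or 1."""
    return (src1 + src2) >> WIDTH

def mul_hi(src1, src2):
    """Upper 64 bits of the full 128-bit unsigned product."""
    return (src1 * src2) >> WIDTH

def overflow(src1, src2):
    """src1 == 0 -> src2; src1 > 0 -> INT_MAX; src1 < 0 -> INT_MIN."""
    if src1 == 0:
        return src2
    return INT_MAX if src1 > 0 else INT_MIN

def overflow_u(src1, src2):
    """Unsigned variant (assumed): any non-zero src1 saturates to UINT_MAX."""
    return src2 if src1 == 0 else UINT_MAX
```

An unsigned overflow check is then just `add_hi(a, b) != 0` or `mul_hi(a, b) != 0`.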
To add two int128 values:
// int128 A = r0 r1
// int128 B = r2 r3
int64 r4 = add(r0,r2)        // low word of the sum
int64 r6 = add_hi(r0,r2)     // carry out of the low word (0 or 1)
int64 r5 = add_add(r1,r3,r6) // r1 + r3 + carry
// int128 R = r4 r5
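The int128 sequence can be checked against Python's arbitrary-precision integers. In this sketch each register holds an unsigned 64-bit pattern, add_add is assumed to be a wrapping three-operand add, and add_hi on the low words yields the carry. Because two's complement addition wraps the same way, the sequence is equally valid for signed int128.

```python
MASK64 = (1 << 64) - 1

def add(a, b):
    return (a + b) & MASK64          # wrapping 64-bit add

def add_hi(a, b):
    return (a + b) >> 64             # carry out of the unsigned add (0 or 1)

def add_add(a, b, c):
    return (a + b + c) & MASK64      # assumed three-operand wrapping add

def add_int128(r0, r1, r2, r3):
    """A = (r1:r0), B = (r3:r2), low word first; returns R = (r4, r5)."""
    r4 = add(r0, r2)                 # low word of the sum
    r6 = add_hi(r0, r2)              # carry from the low word
    r5 = add_add(r1, r3, r6)         # high word plus carry
    return r4, r5
```

For instance, adding A = 2^64 - 1 (r0 = MASK64, r1 = 0) and B = 1 gives r4 = 0, r5 = 1, i.e. 2^64.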
If you want saturating addition (for multiplication, replace add and add_hi with mul and mul_hi):
int64 r3 = add(r0,r1)        // wrapped sum
int64 r4 = add_hi(r0,r1)     // high part of the full sum
int64 r2 = overflow(r4,r3)   // r3 if r4 is zero, otherwise saturate
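This sketch covers the unsigned case with overflow_u, where a non-zero high word means exactly that the result did not fit in 64 bits. The helper names mirror the proposed instructions but are Python stand-ins, and overflow_u is assumed to return the unsigned maximum for any non-zero high word.

```python
MASK64 = (1 << 64) - 1
UINT64_MAX = MASK64

def add(a, b):
    return (a + b) & MASK64          # low word of the sum

def add_hi(a, b):
    return (a + b) >> 64             # high word of the sum (carry)

def mul(a, b):
    return (a * b) & MASK64          # low word of the product

def mul_hi(a, b):
    return (a * b) >> 64             # high word of the product

def overflow_u(hi, lo):
    return lo if hi == 0 else UINT64_MAX

def add_sat_u(r0, r1):
    r3 = add(r0, r1)
    r4 = add_hi(r0, r1)
    return overflow_u(r4, r3)        # saturating unsigned add

def mul_sat_u(r0, r1):
    r3 = mul(r0, r1)                 # same pattern with mul / mul_hi
    r4 = mul_hi(r0, r1)
    return overflow_u(r4, r3)        # saturating unsigned multiply
```

Each saturating operation costs three simple instructions instead of one dedicated one, which is the trade-off discussed next.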
This approach requires executing more instructions to get the job done, but far fewer instructions in the ISA:
There is no need for overflow-check instructions, saturating instructions, or *_ex instructions.
I assume those features are used rarely enough that the extra instructions have almost no impact on performance.
Would that be acceptable?