Emulating multiple output instructions with caching
Posted: 2018-04-13, 10:01:26
Hi Agner,
Some operations naturally have multiple outputs: multiplication (low part and high part), division (quotient and remainder), addition (result and carry), and so on.
As far as I understand, you want to avoid multiple outputs to simplify the hardware.
What if each of those outputs were the single output of a separate instruction:
    int64 r2 = mul_lo r0 r1
    int64 r3 = mul_hi r0 r1
To avoid the recomputation, we would have a "cache" within the ALU/FPU (or anywhere else), so that the shared result is computed only once.
I think this would be very interesting for division and for overflowing arithmetic.
The most interesting thing I can see here is that the architecture specification stays simple, with only single-output instructions, while implementations remain free to optimize such pairs of instructions by avoiding the recomputation.
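To make the idea concrete, here is a small software model of it in Python. It is only a sketch under assumptions: the class name CachingMultiplier, the one-entry cache keyed on operand values, and the full_multiplies counter are all hypothetical, and the operands are treated as unsigned 64-bit values. It does not describe how any real ALU is built.

```python
MASK64 = (1 << 64) - 1  # registers are modeled as unsigned 64-bit values (assumption)


class CachingMultiplier:
    """Toy model of a multiplier that remembers its last full product."""

    def __init__(self):
        self.cached_operands = None  # operand pair of the last real multiplication
        self.cached_product = 0      # full 128-bit product of that pair
        self.full_multiplies = 0     # how many times the full product was computed

    def _product(self, a, b):
        key = (a & MASK64, b & MASK64)
        if key != self.cached_operands:
            # Cache miss: do the expensive work once and remember the result.
            self.cached_product = key[0] * key[1]
            self.cached_operands = key
            self.full_multiplies += 1
        return self.cached_product

    def mul_lo(self, a, b):
        """Single-output instruction: low 64 bits of the product."""
        return self._product(a, b) & MASK64

    def mul_hi(self, a, b):
        """Single-output instruction: high 64 bits of the product."""
        return self._product(a, b) >> 64


alu = CachingMultiplier()
r0, r1 = 0xFFFFFFFFFFFFFFFF, 3
r2 = alu.mul_lo(r0, r1)  # computes the full product and caches it
r3 = alu.mul_hi(r0, r1)  # same operands: served from the cache
print(hex(r2), hex(r3), alu.full_multiplies)
```

Both instructions stay single-output at the architectural level, and the counter shows that the pair triggers only one full multiplication. An implementation without the cache would still give the same results, just with the work done twice.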
What do you think?