Load From Const Array Instruction
Posted: 2021-06-08, 22:17:49
Programs contain many Read Only Arrays/Lookup Tables. Currently they are stored in static data memory. To load from such an array, the CPU needs to calculate the address as pointer + index and load the value from the dcache.
This solution requires a pointer and doesn't exploit locality very well.
It would be more efficient to have a Load From Const Array instruction which would be directly followed by an array of integers/floats, effectively offloading Read Only Arrays from dcache into the icache and embedding them right next to the instruction that accesses them.
Such an instruction would have an Operand Size field, an immediate giving the number of elements in the array, a register field specifying the index to read from, and another register field for the destination register.
The load address would be calculated as:
Code:
Instruction pointer + Index * Operand Size
The data would be fetched directly from the icache.
The address of the following instruction would be calculated as:
Code:
Instruction pointer + align_to_4B( Number of elements * Operand Size )
This effectively creates an unconditional jump from the beginning of the array to its end.
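To make the two address calculations concrete, here is a minimal Python sketch that models them on a flat byte buffer. It rests on several assumptions that are not spelled out in the post: the instruction pointer is taken to point at the first array element (i.e. just past the instruction itself), elements are unsigned and little-endian, and an out-of-range index raises an error. All function names are hypothetical.

```python
def align_to_4(n: int) -> int:
    """Round n up to the next multiple of 4 bytes."""
    return (n + 3) & ~3


def load_address(ip: int, index: int, operand_size: int) -> int:
    """Instruction pointer + Index * Operand Size."""
    return ip + index * operand_size


def next_instruction_address(ip: int, num_elements: int, operand_size: int) -> int:
    """Instruction pointer + align_to_4B(Number of elements * Operand Size)."""
    return ip + align_to_4(num_elements * operand_size)


def execute(code: bytes, ip: int, index: int, num_elements: int, operand_size: int):
    """Model one execution: return (loaded value, address of next instruction).

    Assumption: ip points at the first array element, and an out-of-range
    index is an error. The post does not specify either behaviour.
    """
    if not 0 <= index < num_elements:
        raise IndexError("index outside the embedded array")
    addr = load_address(ip, index, operand_size)
    value = int.from_bytes(code[addr:addr + operand_size], "little")
    return value, next_instruction_address(ip, num_elements, operand_size)


# Example: a stand-in 8-byte instruction followed by five 16-bit elements.
# 5 * 2 = 10 bytes of data, padded with 2 bytes to reach a 4-byte boundary.
table = [10, 20, 30, 40, 50]
code = bytes(8) + b"".join(v.to_bytes(2, "little") for v in table) + bytes(2)
value, next_ip = execute(code, ip=8, index=3, num_elements=5, operand_size=2)
```

In this example the load reads element 3 of the table, and the next instruction begins right after the padded array, which is what makes the instruction behave like a jump over its own data.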
This instruction removes the need for a pointer to the array and improves the locality of Read Only Arrays, because they are stored right next to the instruction that accesses them, which makes the array easier to cache.
This should make Lookup Tables, which are already very fast and very useful for optimization, even faster.
It also allows a trade-off in pressure and capacity between the icache and the dcache, which compilers or programmers can use to balance the load on the caches.
If a program makes heavy use of the dcache, read-only arrays can be offloaded to the icache, which decreases dcache pressure and increases its usable capacity.
If a program makes heavy use of the icache, arrays can still be implemented the standard way so as not to pollute the icache (it may still be beneficial to offload small arrays).
ForwardCom already employs a similar system for constants. Quoting the ForwardCom manual, section 1.4:
"The ForwardCom design makes it possible to store constant data in instruction codes instead of constants scattered in static data memory. This reduces cache misses."
This suggestion would also work well with the pipeline design from section 8.2 of the manual, which proposes executing simple instructions right in the front end if their operands are available in the Permanent register file.
If the register that holds the array index is available in the Permanent register file, then the array lookup can be completely resolved in the front end, which should be both more energy efficient and lower in latency than doing the load in the out-of-order back end.
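The front-end case can be sketched as a simple decision, shown below in Python. This is only an illustration of the idea, not a model of the actual pipeline in the manual: the Permanent register file is represented as a plain dictionary of register number to value, the function name is hypothetical, and deferring an out-of-range index to the back end is an assumption.

```python
from typing import Dict, Optional, Sequence


def try_front_end_lookup(
    table: Sequence[int],
    index_reg: int,
    permanent_regs: Dict[int, int],
) -> Optional[int]:
    """Return the looked-up value if it can be resolved in the front end.

    Returns None when the lookup has to be dispatched to the back end.
    """
    if index_reg not in permanent_regs:
        # The index is still being computed, so its value is not yet
        # in the Permanent register file.
        return None
    index = permanent_regs[index_reg]
    if not 0 <= index < len(table):
        # Assumption: out-of-range handling is left to the back end.
        return None
    # The array sits right after the instruction, so the value is
    # already at hand when the instruction is fetched.
    return table[index]


# Register 5 holds a committed index; register 7 is still in flight.
permanent = {5: 2}
resolved = try_front_end_lookup([10, 20, 30, 40], 5, permanent)
deferred = try_front_end_lookup([10, 20, 30, 40], 7, permanent)
```

Here the first lookup is resolved immediately, while the second falls through to the back end because its index register is not yet available.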