1. Speculative memory read
I would like an option for reading from memory without causing a trap in case the address is invalid. A read from an illegal address should produce zero in a speculative read.
This can be useful in several situations:
- Speculative execution of a code branch before knowing whether it will be used. Here, I mean software speculation, not hardware speculation. The software will execute two branches and afterwards select one of the results with a conditional move rather than a branch. A read error in the discarded branch will be ignored.
- Vectorizing a 'while' loop before the loop count is known. Assume, for example, that a while loop reads one vector element per iteration. We want to vectorize the loop and read, for example, 16 vector elements at a time into a vector register. After handling 16 elements, we find that the loop should be iterated only 5 times. The remaining 11 vector elements are discarded. We don't want a trap in case the unused vector elements happen to lie beyond the limit of readable memory.
- Searching for the end of a zero-terminated string. This is probably the most common case of the above point. When implementing a strlen() function, we want to read one full-length vector register at a time and test if it contains a zero. We don't want an error trap if the full-length read happens to go beyond the limit of readable memory.
- Popping a vector register off the stack. We don't know the length of the vector in advance. We can do the pop in two steps: first read the length, and then read the vector with this length. But we may want to save time by reading both the length and a maximum-length vector in a single memory operation. By this method we risk reading beyond the limit of readable memory.
We may discuss how to implement speculative reads. One possibility is to have a global flag that suppresses memory read traps. This has the disadvantage that errors in non-speculative code may go undetected. Instead, I would like to have a dedicated instruction for speculative read. This instruction should read a maximum-length vector and set any invalid part to zero. It could be implemented as a variant of the vector pop instruction.
The solution with an instruction that reads a full vector register speculatively would cover the most relevant uses, but it has certain limitations: it does not work for general purpose registers, it does not support all addressing modes, and it does not support read-modify instructions.
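To make the intended semantics concrete, here is a minimal C sketch that emulates a speculative full-vector read in software and uses it for strlen(). It is only an illustration, not a proposed implementation: the names spec_read and spec_strlen, the 16-byte vector length VLEN, and the explicit 'limit' pointer standing in for the end of readable memory are all assumptions made for this example. Real hardware would know the limit from its memory map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLEN 16  /* assumed vector length in bytes, for illustration only */

/* Emulates the proposed speculative read: copy VLEN bytes starting at p,
   but treat every byte at or beyond 'limit' as unreadable and deliver
   zero for it instead of trapping. */
static void spec_read(uint8_t out[VLEN], const uint8_t *p, const uint8_t *limit)
{
    size_t valid = 0;                 /* number of readable bytes */
    if (p < limit) {
        valid = (size_t)(limit - p);
        if (valid > VLEN) valid = VLEN;
    }
    memset(out, 0, VLEN);             /* invalid part reads as zero */
    if (valid > 0) memcpy(out, p, valid);
}

/* strlen that reads one full vector per step and never needs to know
   in advance where the string or the readable memory ends. */
static size_t spec_strlen(const char *s, const char *limit)
{
    uint8_t v[VLEN];
    size_t n = 0;
    assert(s != NULL && s <= limit);
    for (;;) {
        spec_read(v, (const uint8_t *)s + n, (const uint8_t *)limit);
        for (size_t i = 0; i < VLEN; i++) {
            if (v[i] == 0) return n + i;   /* first zero byte found */
        }
        n += VLEN;
    }
}
```

Note that in this sketch a string without a terminator inside readable memory ends at the limit, because the zero-filled part looks like a terminator. Whether that is desirable for a real strlen() is a separate question.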
2. Reduce the number of instruction codes
Some similar multi-format instructions can be merged to use the same opcode and be distinguished by option bits instead. This will free opcodes for future extensions.
- max, min, max_u, min_u. These instructions could be joined together and have option bits for distinguishing between min and max and between signed and unsigned. There should be an additional option bit for specifying how to handle NAN inputs.
- MUL_ADD and MUL_ADD2. These instructions have option bits anyway for specifying operand signs. An extra option bit could specify the order of operands, which is what distinguishes MUL_ADD from MUL_ADD2.
- div and div_u. Signed and unsigned division. These instructions have option bits anyway for specifying rounding mode. div_rev would be similar.
- rem and rem_u. Signed and unsigned remainder.
- mul_hi and mul_hi_u. Signed and unsigned high part of extended multiply.
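As an illustration of the merging idea, the following C sketch models a single min/max operation selected by option bits. The bit assignments (OPT_MAX, OPT_UNSIGNED, OPT_NAN_PROP) and the function names are hypothetical. The proposal does not fix an encoding, and the two NaN behaviors shown are just one possible pair of choices.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical option bits; the real encoding is not specified here. */
enum {
    OPT_MAX      = 1,  /* 0 = min, 1 = max */
    OPT_UNSIGNED = 2,  /* integers only: 0 = signed, 1 = unsigned compare */
    OPT_NAN_PROP = 4   /* floats only: 0 = ignore a NaN input, 1 = propagate it */
};

/* One integer operation replacing min, max, min_u, and max_u. */
static int32_t minmax_i32(int32_t a, int32_t b, unsigned opts)
{
    int a_less;
    assert((opts & ~(unsigned)(OPT_MAX | OPT_UNSIGNED)) == 0);
    if (opts & OPT_UNSIGNED) a_less = (uint32_t)a < (uint32_t)b;
    else                     a_less = a < b;
    if (opts & OPT_MAX) return a_less ? b : a;
    return a_less ? a : b;
}

/* Floating point variant with an option bit for NaN handling. */
static double minmax_f64(double a, double b, unsigned opts)
{
    if (isnan(a) || isnan(b)) {
        if (opts & OPT_NAN_PROP) return isnan(a) ? a : b;  /* result is NaN */
        return isnan(a) ? b : a;       /* the non-NaN operand, if there is one */
    }
    if (opts & OPT_MAX) return a > b ? a : b;
    return a < b ? a : b;
}
```

For example, minmax_i32(-1, 2, OPT_UNSIGNED) returns 2, because -1 is the largest value when compared as unsigned.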
3. Change naming of template fields
The field names IM2, IM3, and IM4 in the code templates are not unique. They should be changed to unambiguous names. This does not affect the code, only the documentation.
4. Optional dot product instruction
An optional vector instruction doing r[0] = ±a[0]*b[0] ±a[1]*b[1] with extended precision of the intermediate products. This will be useful for complex number multiplication, division, and dot products.
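A rough C model of what such an instruction would compute, assuming single precision operands and double precision intermediates. The function names dot2 and cmul, the cfloat type, and the sign flags are made up for this sketch. The actual operand format and the precision of the intermediates are not decided by the proposal.

```c
/* r = (+/-)a0*b0 (+/-)a1*b1 with float inputs.
   neg0 and neg1 select the sign of each product (nonzero = negate). */
static float dot2(float a0, float b0, float a1, float b1, int neg0, int neg1)
{
    /* The product of two 24-bit significands fits in the 53-bit
       significand of a double, so both products are exact. */
    double p0 = (double)a0 * (double)b0;
    double p1 = (double)a1 * (double)b1;
    if (neg0) p0 = -p0;
    if (neg1) p1 = -p1;
    /* The sum is rounded to double first, then to float. */
    return (float)(p0 + p1);
}

typedef struct { float re, im; } cfloat;

/* Complex multiplication expressed as two such dot products. */
static cfloat cmul(cfloat x, cfloat y)
{
    cfloat r;
    r.re = dot2(x.re, y.re, x.im, y.im, 0, 1);  /* x.re*y.re - x.im*y.im */
    r.im = dot2(x.re, y.im, x.im, y.re, 0, 0);  /* x.re*y.im + x.im*y.re */
    return r;
}
```

The extended intermediates matter when the two products nearly cancel. With x = 1 + 2^-12 and y = 1 + 2^-11, the exact value of x*x - 1*y is 2^-24, which this model returns, while rounding x*x to single precision first would lose that term.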
Your comments and suggestions for other improvements are welcome.