Methodology for choosing instructions to include or omit

Post by **JoeDuarte** » 2020-05-15, 18:48:42

Hi Agner – What's the methodology for deciding on the specific instructions to include in the ISA? Is likely application performance a factor? How do you model it or make predictions about likely performance?

John Regehr's post on Discovering New Instructions seems relevant and helpful: https://blog.regehr.org/archives/669

ForwardCom is an opportunity to start over and build an instruction set from first principles and empirical predictions, so it might help to formalize a methodology for deciding on instructions. This is also related to my questions on how you know in advance whether a given instruction will require microcode or not.

An interesting application of such a methodology would be to answer the question: should the Bit Manipulation Instructions (BMI) be included, as they are in Intel Haswell and later CPUs and in their AMD counterparts?
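To make the BMI question concrete, here is a small Python sketch of what a few BMI1/BMI2 instructions compute when written with ordinary operations. The function names follow the x86 mnemonics, and 64-bit operands are assumed. ANDN, BLSR, BLSI and BLSMSK each replace two simple operations, while PEXT replaces a loop, which suggests that the payoff per instruction differs a lot.

```python
MASK64 = (1 << 64) - 1  # assume 64-bit operands


def andn(a, b):
    """ANDN (BMI1): ~a & b, otherwise a NOT followed by an AND."""
    return ~a & b & MASK64


def blsr(x):
    """BLSR (BMI1): reset lowest set bit, otherwise a subtract and an AND."""
    return x & (x - 1) & MASK64


def blsi(x):
    """BLSI (BMI1): isolate lowest set bit, otherwise a negate and an AND."""
    return x & -x & MASK64


def blsmsk(x):
    """BLSMSK (BMI1): mask up to and including the lowest set bit."""
    return (x ^ (x - 1)) & MASK64


def tzcnt(x):
    """TZCNT (BMI1): count trailing zeros; returns the operand size for x == 0."""
    if x == 0:
        return 64
    return (x & -x).bit_length() - 1


def pext(src, mask):
    """PEXT (BMI2): gather the src bits selected by mask into the low bits.

    Without the instruction this needs a loop with one iteration per set mask bit.
    """
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask          # lowest remaining selected bit
        if src & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1               # move on to the next selected bit
    return result
```

Counting how often such patterns occur in real code, and how many base instructions each replaces, is one way to turn the BMI question into numbers.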

Post by **agner** » 2020-05-16, 4:50:11

Thanks for the link.

The x86-64 ISA has thousands of instructions. In fact, my x86 disassembler has a table of 2029 instructions. It is miraculous that Intel and AMD have actually managed to put all this on a single silicon chip and still get good performance, but it is obviously not an optimal design. Neither the hardware design nor the software benefits from such high complexity. Part of the reason for the excessive number of x86 instructions is a long history of incremental additions with little foresight. Another reason is marketing: the vendors need a sales argument to keep consumers buying the next CPU version, namely that it has this or that new instruction set extension with a silly name. Over the years, AMD has invented its own instruction set extensions to stay competitive with Intel. Some of these extensions have been successful, while others were soon forgotten. Unlike AMD, Intel has kept supporting all the old, obsolete instructions for reasons of compatibility. This bizarre history is one of the things that motivated me to design a new instruction set. Right from the beginning it was obvious to me that the choice of instructions for ForwardCom should be based on an open process with technical criteria rather than on commercial competition.

A guiding principle is how to get a maximum of performance out of a minimum of hardware complexity. Other SIMD instruction sets have a new set of instructions for every new vector register length. ForwardCom has just one set of vector instructions with variable vector length. x86 has many different instructions for different variations of the same or similar operations, while ForwardCom has one instruction with extra option bits to indicate different operand types, condition codes, signs of operands, rounding modes, etc.
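The benefit of a single set of variable-length vector instructions can be sketched in Python. This is not ForwardCom code: `vector_add` is a hypothetical helper, and `max_vector_length` stands in for whatever maximum vector length the hardware offers. Each loop iteration processes at most `max_vector_length` elements, and the last iteration simply uses a shorter vector, so the same code runs unchanged for any register size and needs no separate scalar tail loop.

```python
def vector_add(a, b, max_vector_length):
    """Add two equal-length lists in chunks whose size is never hard-coded.

    max_vector_length models the hardware's maximum vector length
    (a hypothetical parameter); the loop logic does not depend on its value.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result = []
    i = 0
    while i < len(a):
        # Active vector length: a full vector, or whatever elements remain.
        vl = min(len(a) - i, max_vector_length)
        # One "vector instruction": element-wise add on vl elements.
        result.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return result


print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], max_vector_length=4))
```

Calling it with `max_vector_length` set to 2, 4 or 16 gives the same result; only the number of iterations changes. With fixed-width SIMD, each of those widths would typically need its own code path and its own instructions.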

Now that I have started to make a soft core, I am discovering some useful features that can be implemented efficiently with a minimum of hardware complexity and other instructions or features that may be revised because the initial design requires too much hardware complexity.

As long as we are only making experimental soft cores, it is still possible to experiment with different instructions and designs to see what is most useful and efficient.

Post by **JoeDuarte** » 2020-05-27, 14:23:54

I just thought of something. Are you familiar with superoptimization and superoptimizers such as STOKE?

Link: http://stoke.stanford.edu/

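STOKE searches for the fastest machine code it can find for a given function or other code segment. It works on the x86-64 code produced by a compiler such as gcc, so it can superoptimize compiled C and C++ code. Rather than searching exhaustively, it uses a stochastic (Markov chain Monte Carlo) search over candidate instruction sequences, so the result is not guaranteed to be the true optimum.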

1. It would be interesting to see whether the best code STOKE finds tends to use microcoded instructions or sequences of simpler, non-microcoded instructions. That would help answer whether microcode pays off, given the x86-64 ISA.

I'm not clear on whether this would generalize. For example, if STOKE shows that simpler, non-microcoded instructions are faster, would that be true for all possible ISAs, or just for x86-64?

2. Could you use STOKE to help design an ISA such as ForwardCom? Right now it knows only x86-64 instructions. I'm not sure how that is implemented, but it must rely on some kind of instruction table or data layer that, I assume, could be swapped out for a different ISA. Perhaps you could feed it the ForwardCom ISA, with the relevant attributes of each instruction, then use it to compile various programs and discover which instructions prove most useful for performance.

I'm not sure how that would be different from your soft core approach, or if they're complementary.
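The idea in point 2 can be sketched with a toy superoptimizer in Python. This is not how STOKE works internally (STOKE uses stochastic search on real x86-64 code); it is a simpler exhaustive search in the style of Massalin's original superoptimizer. Everything here is hypothetical: a 4-bit toy machine, the ISA tables `BASE_ISA` and `BMI_ISA`, and three target functions. Because the ISA is just a data table, it can be swapped, and counting which instructions appear in the shortest programs gives a crude measure of how useful each one is. With 4-bit words, equivalence can be checked exactly on all 16 inputs.

```python
from collections import Counter
from itertools import product

BITS = 4  # toy word size, small enough to check every possible input
MASK = (1 << BITS) - 1
INPUTS = tuple(range(1 << BITS))

# Hypothetical toy ISA tables: name -> (operand count, semantics on BITS-bit words).
BASE_ISA = {
    "add": (2, lambda a, b: (a + b) & MASK),
    "sub": (2, lambda a, b: (a - b) & MASK),
    "and": (2, lambda a, b: a & b),
    "or":  (2, lambda a, b: a | b),
    "xor": (2, lambda a, b: a ^ b),
    "not": (1, lambda a: ~a & MASK),
    "neg": (1, lambda a: -a & MASK),
    "dec": (1, lambda a: (a - 1) & MASK),
}
# The same machine plus two BMI-style instructions.
BMI_ISA = dict(BASE_ISA,
               blsr=(1, lambda a: a & (a - 1) & MASK),   # reset lowest set bit
               andn=(2, lambda a, b: ~a & b & MASK))     # and-not


def candidates(values, isa):
    """Yield every (instruction, result) from applying one op to existing values.

    Each value is a tuple holding its result for every input in INPUTS.
    """
    for name, (arity, fn) in isa.items():
        for operands in product(range(len(values)), repeat=arity):
            result = tuple(fn(*args) for args in zip(*(values[i] for i in operands)))
            yield (name, operands), result


def shortest_program(target, isa, max_len=3):
    """Breadth-first search for the shortest straight-line program computing target."""
    goal = tuple(target(x) & MASK for x in INPUTS)
    frontier = [((), (INPUTS,))]  # (program so far, values so far); value 0 is the input x
    for _ in range(max_len):
        next_frontier = []
        for program, values in frontier:
            for instr, result in candidates(values, isa):
                if result == goal:
                    return program + (instr,)
                next_frontier.append((program + (instr,), values + (result,)))
        frontier = next_frontier
    return None  # nothing found within max_len instructions


TARGETS = {
    "clear lowest set bit": lambda x: x & (x - 1),
    "isolate lowest set bit": lambda x: x & -x,
    "trailing-zero mask": lambda x: ~x & (x - 1),
}


def instruction_usage(isa):
    """Count how often each instruction appears in the shortest program for each target."""
    usage = Counter()
    for target in TARGETS.values():
        program = shortest_program(target, isa)
        if program is not None:
            usage.update(name for name, _ in program)
    return usage


print(instruction_usage(BASE_ISA))
print(instruction_usage(BMI_ISA))
```

With the BMI-style table, "clear lowest set bit" shrinks from two instructions to one and "trailing-zero mask" from three to two. A real study would weight instructions by latency and throughput rather than count them, and the search space grows exponentially with program length, which is why tools like STOKE resort to stochastic search.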

3. Have you looked at genetic algorithms to generate the optimal set of instructions?
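For point 3, a toy genetic algorithm could pick which candidate instructions to include under a hardware budget. All names and numbers in `CANDIDATES` and `BUDGET` are invented placeholders, not measurements. In practice the "cycles saved" column might come from benchmarks or a superoptimizer, and the cost column from synthesis results of a soft core. Under these assumptions the selection is a knapsack problem, and the GA below (truncation selection, one-point crossover, bit-flip mutation) is one heuristic for it.

```python
import random

# Invented placeholder numbers, for illustration only:
# name -> (estimated cycles saved across a benchmark suite, relative hardware cost)
CANDIDATES = {
    "andn":     (12, 3),
    "blsr":     (18, 4),
    "tzcnt":    (25, 6),
    "popcount": (20, 8),
    "pext":     (30, 15),
    "pdep":     (22, 15),
}
NAMES = list(CANDIDATES)
BUDGET = 30  # hypothetical limit on total hardware cost


def fitness(genome):
    """Total cycles saved by the selected instructions; zero if over budget."""
    chosen = [CANDIDATES[n] for n, bit in zip(NAMES, genome) if bit]
    cost = sum(c for _, c in chosen)
    saving = sum(s for s, _ in chosen)
    return saving if cost <= BUDGET else 0


def evolve(generations=60, pop_size=20, mutation_rate=0.1, seed=1):
    rng = random.Random(seed)  # seeded so runs are repeatable
    # Each genome is one bit per candidate: 1 = include the instruction.
    population = [[rng.randint(0, 1) for _ in NAMES] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # Truncation selection; parents survive unchanged, so the best never gets worse.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(NAMES))  # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in a[:cut] + b[cut:]]  # bit-flip mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [n for n, bit in zip(NAMES, best) if bit], fitness(best)


chosen, score = evolve()
print(chosen, score)
```

With only six candidates, trying all 64 subsets would be trivial; a GA only becomes worthwhile when the candidate list is long or when instructions interact, for example when one instruction makes another redundant.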
