Forwardcom simulations

Post by **JoeDuarte** » 2018-06-13, 1:54:58

Hi Agner -- Have you run any simulations of ForwardCom? Is it possible to determine things like the benefits of not having a TLB with simulations? It's hard to get a sense of whether ForwardCom would be faster or more efficient than x86-64 or ARM64/v8.

Given its exclusion of microcode, does the compiler need to compete with Intel's OoO secrets? From Hubert's post, I got the impression that the OoO optimizations need to be carried out either in the silicon or in the compiler.

By the way, does 64-bit floating point require 80-bit intermediate values? Would 80-bit immediate constants fit into a 3-word instruction?

Post by **agner** » 2018-06-13, 17:26:26

The tools that I have developed so far can emulate the ForwardCom processor but not simulate memory latencies.

The instruction set is designed for making OoO execution efficient. The compiler does not need to do this.

Only the old x87 instructions use 80-bit intermediates. Most other instruction sets use 32 bits for single-precision floats and 64 bits for double precision. The fused multiply-and-add instructions need to calculate the intermediate product with extended precision in order to comply with the IEEE 754 floating-point standard.
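
To illustrate why a fused multiply-add needs the unrounded product, here is a small Python sketch. It does not model any real hardware: the helper `fused_multiply_add` is a name chosen for this example, and it computes a*b + c exactly with `fractions.Fraction` before rounding once at the end, which is the single rounding IEEE 754 asks of a fused operation. With a = 1 + 2^-30, the exact square is 1 + 2^-29 + 2^-60, and the 2^-60 term is lost as soon as the product is rounded to a double on its own.

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: exact a*b + c, rounded once to double."""
    # Fraction(x) converts a float exactly, so the product and sum are exact.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # float() of a Fraction rounds the exact rational to the nearest double.
    return float(exact)

a = 1.0 + 2.0**-30
p = a * a                              # product rounded to double: 1 + 2**-29
print(fused_multiply_add(a, a, -p))    # exact residual a*a - p, i.e. 2**-60
print(a * a - p)                       # product rounded first: 0.0
```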

Post by **Kulasko** » 2018-06-15, 0:49:48

I am working on a simulator for exactly this kind of thing right now. The project has been heavily delayed for several reasons, but it is still on track. Depending on how much work it turns out to be, I want my simulator to simulate not only ForwardCom but also other architectures such as ARM. That way it becomes easy to design a CPU, switch the underlying architecture, make a few adjustments, and get a good picture of the actual performance differences between the ISAs. Of course, this only holds if you have similarly optimized versions of your test program for each ISA.
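
A rough Python sketch of the kind of separation this describes. Everything in it is hypothetical (the class names, the toy instruction syntax and the latency numbers are invented for illustration and are not taken from Kulasko's simulator): each ISA front end only decodes instructions into generic micro-operations, and one shared timing model charges cycles for them, so switching the ISA means swapping the front end while the CPU model stays the same.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class MicroOp:
    kind: str  # "alu", "load" or "store"

class IsaFrontend(Protocol):
    """Hypothetical interface: turn one instruction into generic micro-ops."""
    def decode(self, instruction: str) -> list[MicroOp]: ...

class LoadStoreIsa:
    """Toy RISC-style ISA: memory is accessed only by ld/st."""
    def decode(self, instruction: str) -> list[MicroOp]:
        mnemonic = instruction.split()[0]
        return [MicroOp({"ld": "load", "st": "store"}.get(mnemonic, "alu"))]

class MemOperandIsa:
    """Toy CISC-style ISA: ALU instructions may take a memory source operand.
    This decoder only handles register forms and memory-source forms."""
    def decode(self, instruction: str) -> list[MicroOp]:
        if "[" in instruction:  # memory source operand: load, then compute
            return [MicroOp("load"), MicroOp("alu")]
        return [MicroOp("alu")]

# Shared, ISA-independent timing model (illustrative latencies in cycles).
LATENCY = {"alu": 1, "load": 4, "store": 1}

def simulate(frontend: IsaFrontend, program: list[str]) -> int:
    """Naive in-order model: micro-ops execute strictly one after another."""
    return sum(LATENCY[op.kind] for insn in program for op in frontend.decode(insn))

# The same computation, r1 += mem[r2], hand-written for each toy ISA.
risc_program = ["ld r3, [r2]", "add r1, r1, r3"]
cisc_program = ["add r1, [r2]"]
print(simulate(LoadStoreIsa(), risc_program))   # 4 + 1 = 5 cycles
print(simulate(MemOperandIsa(), cisc_program))  # 4 + 1 = 5 cycles
```

A real timing model would track pipelining, caches and dependencies; the point of the split is only that the timing model is left untouched when the front end changes.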

Post by **marioxcc** » 2018-07-17, 16:29:10

JoeDuarte wrote (2018-06-13, 1:54:58): By the way, does 64-bit floating point require 80-bit intermediate values? Would 80-bit immediate constants fit into a 3-word instruction?

Architecturally: no. When arithmetic is done in Float64, the programmer always sees a Float64. Internally (at the register transfer level) the floating-point number may be stored in a different format (for example, using a wider exponent so that subnormal Float64 values correspond to normal numbers in the internal representation). For arithmetic operations, the implementation must internally compute a wider intermediate value that is then rounded to what the programmer sees, to guarantee correct rounding for the operations that require it in IEEE 754. This is completely transparent to the programmer in IEEE 754-compliant hardware.
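
As an illustration of the wider internal exponent, here is a Python sketch of a hypothetical internal format (the functions `to_internal` and `from_internal` and their conventions are invented for this example, not taken from any real hardware). It unpacks a binary64 value and always returns a normalized 53-bit significand; for subnormal inputs the significand is shifted left and the exponent drops below -1022, the smallest normal binary64 exponent. The smallest subnormal, 2^-1074, therefore needs exponent -1074, which the 11-bit binary64 exponent field cannot express for a normal number but a 12-bit signed field (range -2048 to 2047) can.

```python
import math
import struct

def to_internal(x: float) -> tuple[int, int, int]:
    """Split a finite double into (sign, exponent, significand) such that
    |x| == significand * 2**(exponent - 52), with the significand normalized
    to [2**52, 2**53) even when x is subnormal (zero returns significand 0)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    biased = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased == 0x7FF:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if biased == 0:
        if fraction == 0:
            return sign, 0, 0                  # signed zero
        # Subnormal: |x| == fraction * 2**-1074. Shift until bit 52 is set.
        shift = 53 - fraction.bit_length()
        return sign, -1022 - shift, fraction << shift
    # Normal: restore the implicit leading 1 bit.
    return sign, biased - 1023, fraction | (1 << 52)

def from_internal(sign: int, exponent: int, significand: int) -> float:
    """Inverse of to_internal for values representable as doubles."""
    value = math.ldexp(significand, exponent - 52)
    return -value if sign else value

print(to_internal(5e-324))   # (0, -1074, 4503599627370496): normal internally
print(to_internal(1.0))      # (0, 0, 4503599627370496)
```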

Thus an architecture that offers floating-point numbers with a 64-bit significand (like the 80-bit floats of x86) necessarily uses wider intermediates internally, and the extra precision is discarded to get a correctly rounded result with the required precision. Note that the extra precision (beyond the 80-bit format) is used exclusively to get a correctly rounded result; it is otherwise not available to the programmer, and its existence is completely hidden at the architectural level.
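
The role of the extra bits can be sketched in Python for double-precision multiplication; the same reasoning applies to the 64-bit significand of the 80-bit format, but Python's float is a binary64 double. The helpers `round_to_precision` and `multiply` are hypothetical names for this example, and only positive, normal inputs with a normal result are handled. The exact product of two 53-bit significands has 105 or 106 bits. Rounding it to 53 bits with round-to-nearest, ties-to-even needs the first discarded bit (the round bit) and whether any lower bit is set (the sticky bit); simply truncating the extra bits can give a different answer. For 3 * (1/3) the exact product is 1 - 2^-54, a tie that rounds to 1.0, while truncation gives 1 - 2^-53.

```python
import math

def round_to_precision(m: int, p: int, nearest_even: bool = True) -> tuple[int, int]:
    """Reduce positive integer m to at most p significant bits.
    Returns (r, s) with m ~ r * 2**s. With nearest_even=False the extra
    bits are simply truncated."""
    shift = m.bit_length() - p
    if shift <= 0:
        return m, 0                                  # already fits
    kept = m >> shift
    round_bit = (m >> (shift - 1)) & 1               # first discarded bit
    sticky = (m & ((1 << (shift - 1)) - 1)) != 0     # any lower bit set?
    if nearest_even and round_bit and (sticky or kept & 1):
        kept += 1
        if kept == 1 << p:                           # carry out: renormalize
            kept >>= 1
            shift += 1
    return kept, shift

def multiply(a: float, b: float, nearest_even: bool = True) -> float:
    """Multiply two positive normal doubles via an exact integer product."""
    fa, ea = math.frexp(a)                           # a == fa * 2**ea, 0.5 <= fa < 1
    fb, eb = math.frexp(b)
    ma, mb = int(fa * 2**53), int(fb * 2**53)        # exact 53-bit significands
    r, s = round_to_precision(ma * mb, 53, nearest_even)
    return math.ldexp(r, ea + eb - 106 + s)

print(multiply(3.0, 1 / 3))                       # 1.0 (tie, rounded to even)
print(multiply(3.0, 1 / 3, nearest_even=False))   # 0.9999999999999999 (truncated)
```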

forwardcom forum