Systems differ in how function calls work. Some systems use a so-called link register to hold the return address when a function is called. The main disadvantage of a link register is that it must be saved on the stack and later restored if the function calls another function. Another common method is to store the return address on a stack and retrieve it with a return instruction. It is common to use the same stack for function return addresses, function parameters, and local variables. Using the same stack for multiple purposes makes the system vulnerable to buffer overflows: an overflow in a local array may overwrite the return address and send the program to a wrong address. ForwardCom avoids these problems by using a separate call stack for return addresses and another stack for parameters and local function variables.
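The difference between the two layouts can be illustrated with a small simulation. The following is only a sketch in Python, not ForwardCom code: the addresses, the buffer size, the memory layout, and the function names are all hypothetical and chosen for illustration.

```python
RETURN_ADDRESS = 0x4000    # hypothetical legitimate return address
ATTACKER_ADDRESS = 0xBAD   # hypothetical address written by the overflow

def call_on_shared_stack(data, buffer_size=4):
    """Model one stack: a local buffer directly followed by the return address."""
    # Layout (assumed): buffer slots, return address, three slots of the caller's frame
    stack = [0] * buffer_size + [RETURN_ADDRESS] + [0] * 3
    # Unchecked copy into the local buffer, as in code without bounds checking
    for i, value in enumerate(data):
        stack[i] = value
    # The address that the return instruction would jump to
    return stack[buffer_size]

def call_on_separate_stacks(data, buffer_size=4):
    """Model two stacks: return addresses live apart from local data."""
    call_stack = [RETURN_ADDRESS]
    data_stack = [0] * (buffer_size + 4)   # buffer plus other local data
    # The same unchecked copy can corrupt neighbouring data,
    # but it cannot reach the call stack
    for i, value in enumerate(data):
        data_stack[i] = value
    return call_stack[-1]

overflowing_input = [1, 2, 3, 4, ATTACKER_ADDRESS]   # one element too many
print(hex(call_on_shared_stack(overflowing_input)))
print(hex(call_on_separate_stacks(overflowing_input)))
```

In the shared layout the fifth element lands on the return address. In the two-stack layout the same write damages only other local data, so the return address is unaffected.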
The two-stack design has further advantages in the hardware implementation. A traditional one-stack system has the problem that the stack pointer is modified in the ALU by push, pop, and arithmetic instructions. This occurs at the end of the CPU pipeline, while the stack pointer is needed at the beginning of the pipeline for call and return instructions. Advanced microprocessors have a complicated “stack engine” to compensate for the delay, equal to the length of the pipeline, between the point where the stack pointer is needed at the beginning of the pipeline and the point where it is updated at the end. The two-stack design in ForwardCom avoids this problem by having a separate stack pointer for the call stack that is accessed only at the beginning of the pipeline, unlike the data stack pointer.
While it is common for functions to call other functions, the total depth of function calls is rarely more than 10 to 20 in a typical program. This limited call depth makes it possible to contain the entire call stack on the CPU chip rather than in external memory. This makes call and return operations very efficient, except in the rare case of very deeply recursive functions. Branch prediction for the return instruction is unnecessary because the return address is immediately available from the on-chip call stack. Data stacks, on the other hand, are typically much bigger because they may contain most or all of the non-static data of a program. If the data stack and the call stack are mixed, the return addresses are scattered over a possibly large memory address space.
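The effect of a small on-chip call stack can be sketched with a simple model. The code below is an illustration only. The capacity of 32 entries and the policy of moving the oldest entries to memory when the on-chip stack is full are assumptions made for this sketch, not properties stated above, and the class and attribute names are hypothetical.

```python
class OnChipCallStack:
    """Toy model of a fixed-size on-chip call stack with overflow to memory."""

    def __init__(self, capacity=32):      # capacity is an assumed value
        self.capacity = capacity
        self.on_chip = []                 # most recent return addresses
        self.in_memory = []               # older entries moved out (assumed policy)
        self.memory_accesses = 0          # counts the slow operations

    def call(self, return_address):
        if len(self.on_chip) == self.capacity:
            # On-chip stack is full: move the oldest entry to memory
            self.in_memory.append(self.on_chip.pop(0))
            self.memory_accesses += 1
        self.on_chip.append(return_address)

    def ret(self):
        if not self.on_chip:
            # On-chip stack is empty: fetch the newest entry back from memory
            self.on_chip.append(self.in_memory.pop())
            self.memory_accesses += 1
        return self.on_chip.pop()

typical = OnChipCallStack()
for address in range(15):                 # call depth of a typical program
    typical.call(address)
for _ in range(15):
    typical.ret()
print(typical.memory_accesses)            # stays at zero for shallow call chains
```

In this model, a call chain that fits within the on-chip capacity never touches memory. Only a chain deeper than the capacity pays for memory accesses, which corresponds to the deeply recursive case mentioned above.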
Function parameters are transferred in registers rather than on the stack in the ForwardCom system. A maximum of 16 integer parameters and 16 floating-point or vector parameters can be transferred in registers to a function. In the rare case that a function has more parameters than this, the additional parameters are transferred in an array stored anywhere in memory and pointed to by the last register. This ensures that the data stack and the return address are not corrupted in case of errors in the parameter list.
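The parameter rule for integer parameters can be sketched as follows. This is an illustration under stated assumptions: the register names r0 to r15 are hypothetical, and the sketch assumes that when there are more than 16 parameters, the first 15 stay in registers while the last register holds the pointer to an array with the rest. The tuple ("pointer_to", list) merely stands in for a memory address.

```python
MAX_REGISTER_PARAMS = 16   # limit stated in the text for integer parameters

def assign_integer_parameters(params):
    """Return (register assignment, overflow array) for a list of parameters."""
    if len(params) <= MAX_REGISTER_PARAMS:
        # Everything fits: one parameter per register, nothing in memory
        return {f"r{i}": p for i, p in enumerate(params)}, []
    # Assumed split: 15 parameters in registers, the rest in an array in memory
    overflow = list(params[MAX_REGISTER_PARAMS - 1:])
    registers = {f"r{i}": p
                 for i, p in enumerate(params[:MAX_REGISTER_PARAMS - 1])}
    # The last register points to the array instead of holding a parameter
    registers[f"r{MAX_REGISTER_PARAMS - 1}"] = ("pointer_to", overflow)
    return registers, overflow

registers, overflow = assign_integer_parameters(list(range(20)))
print(sorted(registers, key=lambda name: int(name[1:])))
print(overflow)
```

Because the overflow array can be placed anywhere in memory, a mismatch between the caller's and the callee's view of the parameter list does not shift anything on the data stack.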
A special mechanism makes it possible to allocate registers optimally across functions and program modules. Library functions and other separately compiled functions include information about which registers each function modifies. Local variables that need to be preserved across a function call can be kept in registers that the called function does not modify. This minimizes the need for spilling registers to memory. The information about register use is contained in each object file. The linker checks that caller and callee agree on which registers are used.
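The idea of keeping live variables in registers that the callee leaves untouched can be sketched as below. This is a simplified illustration, not the actual object file format or allocation algorithm: the eight-register file, the register names, and the function name are hypothetical.

```python
def place_live_variables(live_variables, callee_modifies, register_file):
    """Assign variables that must survive a call to registers the callee keeps intact.

    Returns (variable-to-register mapping, list of variables that must be spilled).
    """
    # Registers that the callee's register-use information marks as unmodified
    safe = [r for r in register_file if r not in callee_modifies]
    # Pair variables with safe registers until one of the two lists runs out
    in_registers = dict(zip(live_variables, safe))
    # Any remaining variables have to be saved in memory across the call
    spilled = live_variables[len(safe):]
    return in_registers, spilled

register_file = [f"r{i}" for i in range(8)]    # hypothetical small register file
callee_modifies = {"r0", "r1", "r2", "r3"}     # hypothetical info from the object file

placed, spilled = place_live_variables(["a", "b"], callee_modifies, register_file)
print(placed, spilled)
```

With this information available, a caller only needs to spill a variable when the callee modifies so many registers that no safe register is left for it.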