Hi,
I've just learned about ForwardCom and I have a question about instruction templates. As you're designing an ISA from scratch, have you considered using a self-synchronizing encoding similar to UTF-8? If not, why not?
For example, if the count of leading 1s encodes the number of words in an instruction, we would have:
- 10xxxx: 1-word insn
- 110xxx: 2-word insn
- 1110xx: 3-word insn
- etc.
- 0xxxxx: not an instruction leading word (instruction payload or misc data, see below)
I believe it might:
- make the decoder faster/parallel: it doesn't need to decode the size of an instruction to start looking for the next one; it can just skip over the 0xxxxx words
- be more forward compatible with instruction extensions: any instruction size would be supported.
- be more forward compatible with tools: e.g. a disassembler could disassemble such machine code to some extent without understanding every instruction (e.g. extensions), correctly identifying unknown instructions and data interleaved with the code.
- allow storing data in the instruction stream without having to jump over it: as long as words have the 0xxxxx format, we could store anything (jump tables, large immediates, etc.) and the decoder would just ignore them if they're not in an instruction payload position.
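To make the idea concrete, here is a minimal Python sketch of the scheme above. It assumes 32-bit code words (the 6-bit patterns in the list are only illustrative), and the names `lead_length`, `find_instruction_starts` and `resync` are made up for this example; none of this is an existing ForwardCom format.

```python
WORD_BITS = 32  # assumption: 32-bit code words


def lead_length(word):
    """Return the instruction length in words if `word` is a leading word, else 0."""
    if word >> (WORD_BITS - 1) == 0:
        return 0  # 0xxx...: payload or data word, never an instruction start
    # Count the leading 1 bits: 10... -> 1 word, 110... -> 2 words, and so on.
    ones = 0
    mask = 1 << (WORD_BITS - 1)
    while mask and word & mask:
        ones += 1
        mask >>= 1
    return ones


def find_instruction_starts(words):
    """Each word is classified on its own, so this loop could run in parallel."""
    return [i for i, w in enumerate(words) if lead_length(w) > 0]


def resync(words, start):
    """From an arbitrary word index, return the index of the next leading word."""
    for i in range(start, len(words)):
        if lead_length(words[i]) > 0:
            return i
    return None


# A 1-word insn, a 2-word insn with its payload, one interleaved data word,
# then a 3-word insn with two payload words.
words = [0x80000001, 0xC0000000, 0x00000005, 0x12345678,
         0xE0000000, 0x00000000, 0x00000000]
print(find_instruction_starts(words))  # [0, 1, 4]
print(resync(words, 2))                # 4
```

Note that the word at index 3 is plain data: it is not in a payload position, yet the scan skips it without any jump instruction around it.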
I'm not a hardware person (I work on the Haskell compiler GHC) so maybe there are hardware concerns that I don't see. I'd be curious to have your opinions on this encoding scheme.
Thanks,
Sylvain
Self-synchronizing ISA
Moderator: agner
Re: Self-synchronizing ISA
Thanks for your comment.
ForwardCom is not using such a system for the following reasons:
- It would use more of the precious code bits. That means instructions would be longer, and there would be fewer single-word instructions.
- The instructions can already store large immediate values of 32 or 64 bits, for example double-precision floating point constants. Such immediate data words do not necessarily start with a zero.
- Old x86 compilers would sometimes store jump tables in the code section. This was bad for prefetching and for disassemblers and debuggers because they couldn't distinguish between code and data. ForwardCom avoids this problem by storing jump tables in a read-only data section, while the code section is execute-only and non-readable. This gives maximum protection against hacking.
- Labels in the object files and executable files tell where each function begins. Any unused part of the code section is filled with filler words. There is no risk that reading of the code can get out of sync.
Re: Self-synchronizing ISA
Interesting idea
The refutation is unconvincing:
"It would be using more of the precious code bits": Maybe, but there would be the same number of single-word instructions since that takes 2 bits either way.
"Large immediates": The leading bit can be switched by instruction variants/mode.
Points 3 and 4: irrelevant.
So the advantages stand strong: more forward compatible + easier decoding + random-access decoding (allowing multi-threaded / SIMD decoding).
This is worth exploring.
Re: Self-synchronizing ISA
Currently, 1-word instructions can begin with 00 or 01. With your proposal, they can only begin with 10. This means that we would have only half as many 1-word combinations available. Furthermore, we would have fewer available bits in the first word because we have to move the leading bits of the second and third word to the first word. All this means that we would need some 4-word instructions in order to cover as many instruction codes and options as we can now do with up to 3 words, where most instructions are single word.
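The bit budget behind this argument can be checked with a little arithmetic. The sketch below is only a raw count of free bit patterns, assuming 32-bit words, two length bits in the first word for the current format, and for the proposal a prefix of n ones plus a zero in the first word and one marker bit in each following word. The function names are hypothetical.

```python
WORD_BITS = 32  # assumption: 32-bit code words


def current_encodings(n_words):
    """Bit patterns available today: 2 length bits in the first word only."""
    prefixes = 2 if n_words == 1 else 1  # both 00 and 01 mean a 1-word instruction
    return prefixes * 2 ** (n_words * WORD_BITS - 2)


def proposed_encodings(n_words):
    """Bit patterns with the UTF-8-like scheme."""
    # First word: n ones and a terminating 0. Each following word: a leading 0.
    overhead = (n_words + 1) + (n_words - 1)
    return 2 ** (n_words * WORD_BITS - overhead)


# Half as many 1-word combinations.
print(current_encodings(1) // proposed_encodings(1))  # 2
# A 3-word instruction loses 4 bits (94 free bits versus 90).
print(current_encodings(3) // proposed_encodings(3))  # 16
# Only a 4-word instruction of the proposed form offers at least as many
# patterns as a current 3-word instruction.
print(proposed_encodings(4) >= current_encodings(3))  # True
```

This counts raw patterns only; it says nothing about how the fields inside an instruction would have to be rearranged.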
In the current version, the instruction length is indicated by 2 bits in the first word. With your proposal, you have to read 3 or 4 bits to detect the instruction length.
Self-synchronization is not needed because the decoding never gets out of sync (unlike x86). It is no problem to decode multiple instructions per clock cycle because it is so easy to detect instruction length.
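As a rough illustration of length detection from two bits, here is a small Python sketch. It assumes 32-bit words with the two length bits at the top of the first word, 00 and 01 meaning one word as stated above, and (an assumption of this example) 10 meaning two words and 11 meaning three. The function names are hypothetical.

```python
def instruction_length(first_word):
    """Length in words, read from the top two bits of the first word."""
    il = (first_word >> 30) & 0b11
    return 1 if il < 2 else il  # 00, 01 -> 1 word; 10 -> 2 words; 11 -> 3 words


def instruction_starts(code, entry=0):
    """Walk the code from a known entry point (e.g. a function label)."""
    starts = []
    i = entry
    while i < len(code):
        starts.append(i)
        i += instruction_length(code[i])
    return starts


# A 1-word insn, a 2-word insn whose immediate word is all ones,
# another 1-word insn, then a 3-word insn.
code = [0x00000000, 0x80000000, 0xFFFFFFFF, 0x40000000,
        0xC0000000, 0x00000001, 0x00000002]
print(instruction_starts(code))  # [0, 1, 3, 4]
```

The immediate word `0xFFFFFFFF` is never inspected as an instruction, which is why its leading bits are free to take any value as long as decoding starts at a known instruction boundary.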
Re: Self-synchronizing ISA
Thanks for your answers!
Another thought I had that I'd like to share: if we can have data in the instruction stream (with MSB=0 to indicate that it's not the beginning of an instruction), then encoding instructions with an immediate operand is just a special case of an instruction with a memory access of the form `[IP]` (if IP is the instruction pointer after the current instruction, as with x86).
Without additional addressing forms, this limits immediates to 31 bits. But we could add another form with 1 bit in the instruction for the MSB, or 2 bits for instructions which need a 64-bit immediate.
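A possible way to picture the 64-bit case, as a Python sketch under my own assumptions: the immediate is stored as two 32-bit data words (low half first) whose MSBs are forced to 0, and the two displaced MSBs travel in the instruction word itself. The names `split_imm64` and `join_imm64` are hypothetical.

```python
MASK31 = (1 << 31) - 1  # the 31 payload bits of a data word


def split_imm64(value):
    """Return (msb_bits, data_words) for a 64-bit immediate.

    Both data words have MSB=0, so neither looks like an instruction start.
    """
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    msb_bits = ((hi >> 31) << 1) | (lo >> 31)  # 2 bits kept in the instruction
    return msb_bits, [lo & MASK31, hi & MASK31]


def join_imm64(msb_bits, data_words):
    """Rebuild the immediate from the data words and the 2 stored MSBs."""
    lo = data_words[0] | ((msb_bits & 1) << 31)
    hi = data_words[1] | ((msb_bits >> 1) << 31)
    return (hi << 32) | lo


msb_bits, data_words = split_imm64(0xBFF8000080000001)
print(msb_bits)                      # 3
print([hex(w) for w in data_words])  # ['0x1', '0x3ff80000']
print(hex(join_imm64(msb_bits, data_words)))  # 0xbff8000080000001
```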
Re: Self-synchronizing ISA
The ForwardCom format already supports instructions with immediate data of up to 64 bits. Other instruction sets require that you put immediate floating point data in data memory or load them piecewise.