Self-synchronizing ISA
Posted: 2024-09-03, 13:18:55
Hi,
I've just learned about ForwardCom and I have a question about instruction templates. As you're designing an ISA from scratch, have you considered using a self-synchronizing encoding similar to UTF-8? If not, why not?
E.g. if we use the count of leading 1s in an instruction's first word to encode the number of words in that instruction, we would have:
- 10xxxx: 1-word insn
- 110xxx: 2-word insn
- 1110xx: 3-word insn
- etc.
- 0xxxxx: not the leading word of an instruction (instruction payload or misc data, see below)
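A minimal sketch of how a decoder could classify a single word under this scheme. It assumes the 6-bit words used in the illustration above (a real ISA would more likely use 32-bit words, so the width is a parameter), and the function names are hypothetical:

```python
def leading_ones(word: int, width: int) -> int:
    """Count consecutive 1 bits starting from the most significant bit."""
    n = 0
    while n < width and (word >> (width - 1 - n)) & 1:
        n += 1
    return n


def insn_length(word: int, width: int = 6):
    """Return the instruction length in words if `word` is a leading word,
    or None if it is a payload/data word (top bit 0)."""
    ones = leading_ones(word, width)
    if ones == 0:
        return None  # 0xxxxx: payload or inline data, not a leading word
    if ones == width:
        # all-ones word: the length prefix never terminates (reserved here)
        raise ValueError("all-ones word has no terminating 0 bit")
    return ones  # 10xxxx -> 1, 110xxx -> 2, 1110xx -> 3, ...


print(insn_length(0b100101), insn_length(0b110001), insn_length(0b011111))
```

The key property is that each word's role depends only on its own top bits, never on the words before it.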
I believe it might:
- make the decoder faster/parallel: it doesn't need to decode the size of an instruction before it can start looking for the next one; it just skips over the 0xxxxx words
- be more forward compatible with instruction extensions: any instruction size would be supported.
- be more forward compatible with tools: e.g. a disassembler could partially disassemble such machine code without understanding every instruction (e.g. extensions), correctly identifying unknown instructions and data interleaved with the code.
- allow storing data in the instruction stream without having to jump over it: as long as the words have the 0xxxxx format, we could store anything (jump tables, large immediates, etc.) and the decoder would just ignore them if they're not in an instruction payload position.
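To make the points above concrete, here is a rough sketch of how a decoder or disassembler might split a word stream into instructions and inline data. It again assumes 6-bit words, and `decode_stream` and its helpers are hypothetical names. Marking instruction starts only looks at each word's top bit, so it could be done for all words in parallel. Grouping payload words with their leading word still uses the length prefix. Starting from an arbitrary offset simply classifies stray payload words as data until the next leading word, which is the self-synchronizing part.

```python
def insn_length(word: int, width: int = 6):
    """Words in the instruction if `word` is a leading word, else None."""
    n = 0
    while n < width and (word >> (width - 1 - n)) & 1:
        n += 1
    if n == width:
        raise ValueError("all-ones word has no terminating 0 bit")
    return n or None  # 0 leading ones -> payload/data word


def instruction_starts(words, width=6):
    """Per-word start flags; each depends on one word only (parallelizable)."""
    return [(w >> (width - 1)) & 1 == 1 for w in words]


def decode_stream(words, width=6):
    """Split a stream into ("insn", [...]) and ("data", [...]) chunks."""
    chunks = []
    i = 0
    while i < len(words):
        n = insn_length(words[i], width)
        if n is None:
            # 0xxxxx word outside a payload position: inline data, skipped
            if chunks and chunks[-1][0] == "data":
                chunks[-1][1].append(words[i])
            else:
                chunks.append(("data", [words[i]]))
            i += 1
            continue
        insn = words[i:i + n]
        # payload words must all be 0xxxxx and the stream must not end early
        if len(insn) < n or any(insn_length(w, width) is not None for w in insn[1:]):
            raise ValueError(f"malformed instruction at word {i}")
        chunks.append(("insn", insn))
        i += n
    return chunks


words = [0b100001, 0b110010, 0b000111, 0b010101, 0b011000,
         0b111000, 0b000001, 0b000010]
for kind, ws in decode_stream(words):
    print(kind, [format(w, "06b") for w in ws])
```

With this stream, the two words `010101` and `011000` come out as inline data between a 2-word and a 3-word instruction.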
I'm not a hardware person (I work on the Haskell compiler GHC), so maybe there are hardware concerns that I don't see. I'd be curious to hear your opinions on this encoding scheme.
Thanks,
Sylvain