What if we bring back the tiny instructions but make them 21-bit instead of 16-bit?
We'd pack 3 of them + a 1-bit flag into 64-bit "bundles" (3 x 21 + 1 = 64). Now yes, the problem with the (16-bit) tiny instructions was that it was difficult to find a partner for each one to fill out a word. But with 21 bits we can have all 32 registers, 3-operand instructions, 64+ OPs, and longer immediates. Surely there would be a lot more opportunities to use them, so finding 21-bit triplets would actually be easier than finding 16-bit pairs?
The 1-bit flag can be used to indicate the bundle's contents: say, 0 means a triplet, and 1 means a second bit picks between the other two layouts:
- [0][ 21-bit instr... ][ 21-bit instr... ][ 21-bit instr... ]
- [10][ 62-bit: current 2-word instruction without IL ]
- [11][... 31-bit instruction ...][... 31-bit instruction ...]
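To make the three layouts concrete, here is a rough Python sketch of how a decoder might classify a bundle and split it into its fields. It assumes the prefix sits in the most significant bits (as the diagrams read left to right), and the names `decode_bundle`, `"triplet"`, `"single62"`, and `"pair31"` are made up for illustration only.

```python
MASK21 = (1 << 21) - 1
MASK31 = (1 << 31) - 1
MASK62 = (1 << 62) - 1


def decode_bundle(bundle: int):
    """Classify a 64-bit bundle and return (kind, list_of_fields).

    Assumption: the prefix occupies the top bits of the 64-bit word.
    """
    if not 0 <= bundle < (1 << 64):
        raise ValueError("bundle must fit in 64 bits")

    if (bundle >> 63) == 0:
        # [0][21-bit][21-bit][21-bit]
        return "triplet", [
            (bundle >> 42) & MASK21,
            (bundle >> 21) & MASK21,
            bundle & MASK21,
        ]

    if (bundle >> 62) == 0b10:
        # [10][62-bit payload]
        return "single62", [bundle & MASK62]

    # [11][31-bit][31-bit]
    return "pair31", [(bundle >> 31) & MASK31, bundle & MASK31]


if __name__ == "__main__":
    print(decode_bundle((1 << 42) | (2 << 21) | 3))
    print(decode_bundle((0b11 << 62) | (7 << 31) | 9))
```

Note that the triplet case only has to look at one bit, so the common (hopefully) case is also the cheapest one to detect.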
For 3-word instructions, we can have bundles that are "no instruction, all immediates", to be combined with the next instruction (which can be a tiny 21-bit one -- a 3-word instruction in 85 bits? Neat!). If such an ALL-IMM bundle is followed by another ALL-IMM bundle, they will be combined with the next two instructions, and so on (3 max).
To indicate ALL-IMM bundles, we can use the "Mode" bits of the currently-unused Format 2.7:
- [10][111][ ... 59-bit immediate, no instruction ... ]
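A minimal sketch of how a decoder could recognize such a bundle, assuming the `[10]` prefix and the three Mode bits `111` occupy the top five bits of the word. The names `all_imm_payload` and `ALL_IMM_PREFIX` are hypothetical, and how the payload is then attached to the following instruction(s) is left out because it is not pinned down yet.

```python
ALL_IMM_PREFIX = 0b10111        # [10] bundle prefix + [111] Mode bits (assumed to be the top 5 bits)
IMM59_MASK = (1 << 59) - 1


def all_imm_payload(bundle: int):
    """Return the 59-bit immediate if `bundle` is an ALL-IMM bundle, else None."""
    if not 0 <= bundle < (1 << 64):
        raise ValueError("bundle must fit in 64 bits")
    if (bundle >> 59) == ALL_IMM_PREFIX:
        return bundle & IMM59_MASK
    return None


if __name__ == "__main__":
    print(hex(all_imm_payload((ALL_IMM_PREFIX << 59) | 0x1234)))
    print(all_imm_payload(0))  # a triplet bundle, so not ALL-IMM
```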
So, only instructions that:
1. use RT,
2. AND use IM5/IM6
3. AND use IM7
4. AND need an immediate wider than 59 bits
need to use 128 bits (e.g. ALL-IMM followed by Template E2).
(In cases where we can't have full 64-bit immediates, should the 59 bits be split into a 1-bit sign/extend flag + the lower 58 bits, to cover more of the common values?)
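One possible reading of that split, as a sketch: bit 58 of the field is the extend flag, and it gets replicated into bits 58..63 of the final 64-bit value, which would cover everything from -2^58 to 2^58 - 1. The function names here are invented for the example, and the exact placement of the flag bit is an assumption.

```python
LOW58_MASK = (1 << 58) - 1


def expand_imm59(field: int) -> int:
    """Expand a 59-bit field (1 extend bit + 58 low bits) to a 64-bit pattern.

    Assumption: bit 58 is the extend flag and is copied into bits 58..63.
    """
    if not 0 <= field < (1 << 59):
        raise ValueError("field must fit in 59 bits")
    low = field & LOW58_MASK
    extend = (field >> 58) & 1
    high = 0b111111 if extend else 0   # six copies of the extend bit
    return (high << 58) | low


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    return value - (1 << 64) if value >> 63 else value


if __name__ == "__main__":
    print(to_signed64(expand_imm59(5)))
    print(to_signed64(expand_imm59((1 << 58) | (LOW58_MASK - 1))))
```

With this reading, small negative constants and all-ones-style masks fit in a single ALL-IMM bundle, while values that need arbitrary bits above bit 57 still fall back to the 128-bit form.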
Anyway, bundles are aligned to 8 bytes (easier decoding?), but since each one can contain 1, 2, or 3 instructions, JUMPs and IP-dependent instructions need to be adjusted. Should we make it so a jump can only target the first instruction in a bundle? (Like with the old 16-bit tiny instructions: we couldn't jump to the second instruction in a pair.) Or should we treat the last bit of the address as an "instruction index" (ignored on single-instruction bundles, and making the last instruction in a 3-instruction bundle not addressable)? Or even a 2-bit index? (Wasteful, but all instructions become addressable.)
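To compare the three options side by side, here is a small sketch that splits a jump target into a bundle address and a slot index. It assumes the index lives in the low address bits that 8-byte alignment leaves free; `resolve_target`, `SLOTS`, and the bundle kind names are hypothetical. Setting `index_bits` to 0, 1, or 2 corresponds to "first instruction only", "1-bit index", and "2-bit index".

```python
# Number of instructions per bundle kind (names are illustrative only).
SLOTS = {"triplet": 3, "pair31": 2, "single62": 1}


def resolve_target(target: int, kind: str, index_bits: int = 2):
    """Split a jump target into (bundle_address, slot).

    Assumption: bundles are 8-byte aligned and the slot index is stored
    in the lowest `index_bits` bits of the target address.
    """
    if index_bits not in (0, 1, 2):
        raise ValueError("index_bits must be 0, 1, or 2")
    bundle_addr = target & ~0b111
    slot = target & ((1 << index_bits) - 1)
    count = SLOTS[kind]
    if count == 1:
        slot = 0                      # index ignored on single-instruction bundles
    elif slot >= count:
        raise ValueError(f"slot {slot} does not exist in a {kind} bundle")
    return bundle_addr, slot


if __name__ == "__main__":
    print(resolve_target(0x1002, "triplet"))                 # 2-bit index
    print(resolve_target(0x1001, "triplet", index_bits=1))   # 1-bit index
```

With a 1-bit index the third slot of a triplet can never be named, and with a 2-bit index the value 3 is always invalid (and 2 is invalid for pairs), which is the "wasteful" part.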
Thoughts?