Hi Agner – How difficult do you anticipate it will be to build an LLVM backend for ForwardCom? This requires compiling LLVM IR (bitcode) into ForwardCom assembly. Your ideas and constructs are much newer and fresher than LLVM, so I wonder if there might be some fundamental incompatibilities.
If it could be done, in theory it opens the door to all the languages that have LLVM front-ends: C, C++, Objective-C/C++, Haskell, Crystal, Swift, D, Julia, et al., though I expect it won't be drop-in easy for most of them.
Is ForwardCom LLVM-friendly
Re: Is ForwardCom LLVM-friendly
LLVM would be my first choice for a compiler for ForwardCom. I haven't had the time to look into it, so I don't know if there are any obstacles. The variable-length vectors and vector loops might be a problem.
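For readers who haven't seen the manual's loop examples, here is a minimal scalar sketch (plain C++ with invented names; nothing below is real ForwardCom or LLVM code) of the vector-loop shape being referred to: the loop counts remaining elements downward and the final pass simply runs with a shorter vector length, so no scalar remainder loop is needed. This is the pattern an autovectorizer would have to emit, which is an awkward fit for LLVM's fixed-width vector types.

```cpp
#include <algorithm>
#include <cstddef>

// Scalar emulation of the ForwardCom vector-loop idiom: the remaining element
// count is decremented by a whole vector per iteration, and the last
// iteration runs with a shorter vector instead of a scalar tail loop.
// Function and parameter names are invented for this sketch.
void add_arrays(float *a, const float *b, std::size_t n,
                std::size_t max_vector_elems) {
    std::size_t done = 0;
    for (std::size_t remaining = n; remaining > 0; ) {
        // On real hardware this min() is what the vector-length mechanism does.
        std::size_t len = std::min(remaining, max_vector_elems);
        for (std::size_t i = 0; i < len; ++i)   // stands in for one vector op
            a[done + i] += b[done + i];
        done += len;
        remaining -= len;
    }
}
```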
Re: Is ForwardCom LLVM-friendly
Yeah, ForwardCom would definitely be targetable from a compiler, at least for the integer part. Presumably you'd end up with the following register pools for the register allocator (see the sketch after this list):
- r0 and r8-r15: can be used in tiny instructions and don't overlap the mask registers, so they should be used preferentially.
- r1-r7: can be used in tiny instructions but overlap the mask registers.
- r16-r27: don't overlap the mask registers but aren't usable in tiny instructions.
- r28-r31: special usage (includes the stack pointer); not in the compiler's register allocation pool.
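To make the split concrete, here is a purely illustrative classification of the integer registers into those pools. The enum names, the function, and the preference order are assumptions for this sketch, not taken from any existing ForwardCom backend:

```cpp
// Hypothetical classification of ForwardCom integer registers into the
// allocation pools described above. Pool names are made up for this sketch.
enum class IntRegPool {
    TinyNoMask,   // r0, r8-r15: tiny-instruction capable, no mask overlap
    TinyMask,     // r1-r7: tiny-instruction capable, overlaps mask registers
    FullNoMask,   // r16-r27: full instructions only, no mask overlap
    Reserved      // r28-r31: stack pointer etc., not allocatable
};

IntRegPool classify_int_reg(unsigned r) {
    if (r == 0 || (r >= 8 && r <= 15)) return IntRegPool::TinyNoMask;
    if (r >= 1 && r <= 7)              return IntRegPool::TinyMask;
    if (r >= 16 && r <= 27)            return IntRegPool::FullNoMask;
    return IntRegPool::Reserved;       // r28-r31
}

// The allocator would try pools in roughly this order of preference.
constexpr IntRegPool kAllocationOrder[] = {
    IntRegPool::TinyNoMask, IntRegPool::TinyMask, IntRegPool::FullNoMask
};
```

In an actual LLVM backend this preference would instead be expressed through register classes and allocation-order hints in the target's TableGen files.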
The vector unit is a different matter. I'm not sure how good state-of-the-art autovectorization is (I've never seen MSVC do it, though that might be because my code has lots of feedback loops and I'm not on bleeding-edge versions). For code that doesn't autovectorize, floating-point register allocation is similar to the integer case: you get similar pools based on tiny-instruction eligibility and so on. Maybe it would make sense to reserve some vector registers for scalars only, to have a pool that can be quickly saved/restored across function calls in code with lots of small functions and little auto-vectorization.
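As a rough, purely hypothetical sketch of that scalar-only reservation (the v0-v7 / v8-v31 split below is invented for illustration and is not part of any ForwardCom ABI proposal):

```cpp
// Partition the vector register file so scalar FP values live in a small
// subset that is cheap to save/restore across calls, while full-length
// vectors use the rest. The split point is an assumption for this sketch.
enum class VecRegUse { ScalarOnly, FullVector };

VecRegUse vector_reg_use(unsigned v) {
    return v < 8 ? VecRegUse::ScalarOnly   // spill/restore a single element
                 : VecRegUse::FullVector;  // spill/restore variable length
}
```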
As far as I can tell, x86 code generators fold memory accesses that feed only a single math opcode into a load+alu or load+alu+store instruction at the last minute in their pipeline, and presumably a ForwardCom code generator would do the same. I'm not sure how masked opcodes are handled for targets like ARM - possibly as a math+CMOV combination that gets folded into a conditional opcode at the last minute.
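To make that folding concrete, here is a toy sketch on an invented mini-IR (this is not LLVM's SelectionDAG; in a real backend the same effect comes from instruction-selection patterns): a load whose only user is one ALU op becomes a fused load+alu node, and a select fed by an ALU op becomes a masked op that keeps the old value where the mask is false.

```cpp
#include <vector>

// Toy nodes and two late peephole folds. Everything here is invented for
// the example; operand order is {mask, true-value, false-value} for Select.
struct Node {
    enum Kind { Load, Add, Select, FusedLoadAdd, MaskedAdd } kind;
    std::vector<Node*> operands;
    int num_users = 0;
};

void fold_memory_operand(Node &n) {
    // add(load(addr), x) -> fused load+add, if the load has no other users.
    if (n.kind == Node::Add &&
        n.operands[0]->kind == Node::Load &&
        n.operands[0]->num_users == 1) {
        Node *load = n.operands[0];
        n.kind = Node::FusedLoadAdd;
        n.operands[0] = load->operands[0];   // use the address directly
    }
}

void fold_mask(Node &n) {
    // select(mask, add(a, b), a) -> masked add, if the add has no other users
    // and the false value is the unmodified input a.
    if (n.kind == Node::Select &&
        n.operands[1]->kind == Node::Add &&
        n.operands[1]->num_users == 1 &&
        n.operands[1]->operands[0] == n.operands[2]) {
        Node *mask = n.operands[0];
        Node *add  = n.operands[1];
        n.kind = Node::MaskedAdd;            // keeps old value where mask is 0
        n.operands = { mask, add->operands[0], add->operands[1] };
    }
}
```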