Hello, I was wondering whether ForwardCom should let bits be the smallest addressable unit rather than bytes. I don't know the hardware or ISA implications, but I thought it could improve space and performance for software in a lot of scenarios. Yes, it would reduce the maximum addressable space by a factor of 8, limiting byte addresses to 61 bits, but who is ever going to need more than 2^61 bytes (roughly 2.3 million terabytes) of addressable space on a single machine? For reference, current x86_64 machines only allow 48 bits' worth of addressable bytes (I read on Wikipedia that current CPUs fault if the upper 16 bits of a pointer are not copies of bit 47, i.e. the address must be sign-extended). On historic machines, bit addressing would have severely limited the addressable space, but with 64-bit pointers I'd think there's plenty of room. Was bit addressing never considered only because pointers used to be too small?
Some advantages I can think of:
Booleans could take only 1 bit of space (if not padded, obviously). Of course, this is already doable in some cases with bit tricks, but bools are typically whole bytes just because current computers happen to use byte addressing. Bit addressing would significantly reduce friction in this area (stack space, arrays of bools, etc.).
Many bitwise operations could be eliminated or simplified, since extracting or changing a single bit would no longer require operating on whole bytes.
Succinct data structures could (presumably) be significantly faster. Ironically, this would eliminate the need for many (mostly static) pointer-based structures. The succinct renaissance could be upon us. Here's a paper explaining the basic idea and the fundamental operations these structures rely on: http://www.eecs.tufts.edu/~aloupis/comp ... actice.pdf
Maybe ForwardCom could directly support the 5 fundamental operations as hardware instructions?
If data and code get smaller through any of the above benefits, that's better for caches, improving overall execution speed.
Code written for byte-addressable hardware could still work. Either all byte addresses could be shifted left by 3 bit positions, or the compiler could insert bit shifts before and after a pointer is modified, or maybe the uppermost 3 bits of a pointer could specify the bit position rather than the lowest 3. If x86 pointers are taken as a reference, maybe the 3 uppermost bits would be inverted if bit 47 is set? I'm just throwing out ideas, not sure which of these would be most appropriate, but I'm hoping it's the simplest one. Another idea: might it be useful to express pointers with a binary point, with 3 fractional bits specifying the bit position within the byte? (A rough sketch of one such encoding follows below.)
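To make the lowest-3-bits idea concrete, here is a minimal C++ sketch. The names and the encoding are my own illustration, not anything ForwardCom actually defines: the bit index lives in the low 3 bits of a 64-bit "bit pointer" and the byte address in the upper 61 bits.

[code]
#include <cstdint>

// Hypothetical encoding: bits 63..3 hold the byte address (so only 61 bits
// of byte addresses survive), bits 2..0 hold the bit index within the byte.
using bit_ptr = uint64_t;

constexpr bit_ptr make_bit_ptr(uint64_t byte_addr, unsigned bit) {
    return (byte_addr << 3) | (bit & 7u);
}

constexpr uint64_t byte_of(bit_ptr p) { return p >> 3; }
constexpr unsigned bit_of(bit_ptr p)  { return static_cast<unsigned>(p & 7u); }

// On byte-addressed hardware, reading the addressed bit still needs a shift
// and a mask -- exactly the friction the proposal wants to remove.
// 'mem' is the base of a buffer; byte_of(p) is treated as an offset into it.
inline bool load_bit(const uint8_t* mem, bit_ptr p) {
    return (mem[byte_of(p)] >> bit_of(p)) & 1u;
}
[/code]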
ForwardCom has many instructions for bit manipulation including instructions to extract, insert, or move bit fields with an arbitrary number of bits in each element. This allows efficient packing of information. If you have many booleans then it is most efficient to make bit vectors that can handle hundreds of bits in a single operation. If you have few booleans then performance is not an issue. I think it is more efficient to handle many bits per operation than reading and writing one bit at a time.
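To illustrate that point with ordinary C++ rather than ForwardCom code, here is a rough sketch of handling many booleans per operation by packing them into 64-bit words; a vector instruction set could do the same thing hundreds of bits at a time.

[code]
#include <cstdint>
#include <vector>

// AND two packed boolean vectors, 64 booleans per loop iteration.
// Assumes a and b have the same length.
std::vector<uint64_t> and_all(const std::vector<uint64_t>& a,
                              const std::vector<uint64_t>& b) {
    std::vector<uint64_t> out(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] & b[i];   // 64 logical ANDs in one operation
    return out;
}
[/code]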
For instance, in a scheme where part of the address space aliases single bits, the memory byte 0x20000000 would also appear as eight single-bit addresses, at 0x22000000 through 0x22000007.
For modern beefy general-purpose architectures, if you need to save space by squeezing multiple variables together, you're expected to write code with lots of bit shifts and AND/OR/XOR operations. Often you don't see much speed difference at all in the end, since speed is often limited by load/store operations and other factors.
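For concreteness, this is the kind of shift-and-mask code meant here; the field layout is just an illustration of my own, not anything from the discussion above.

[code]
#include <cstdint>

// Pack three small fields into one 32-bit word:
//   bits  0..3  : kind   (4 bits)
//   bits  4..15 : length (12 bits)
//   bit   16    : dirty  (1 bit)
constexpr uint32_t pack(uint32_t kind, uint32_t length, bool dirty) {
    return (kind & 0xFu) | ((length & 0xFFFu) << 4) | (uint32_t(dirty) << 16);
}

constexpr uint32_t kind_of(uint32_t w)   { return w & 0xFu; }
constexpr uint32_t length_of(uint32_t w) { return (w >> 4) & 0xFFFu; }
constexpr bool     dirty_of(uint32_t w)  { return (w >> 16) & 1u; }
[/code]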
I would like to add to the discussion that I expect the ability of a general-purpose CPU to interpret opcodes that are bit-addressed to be unreasonably expensive to implement in an actual hardware design. Not only would it make the hardware more complex, it would also make things slow.
From a software security perspective I advocate word-aligned instructions only, because this makes it impossible to compile malicious code segments that change their meaning depending on where a jump enters them. If executable instructions could be jumped to with bit addressing, the situation would be worse than on the current x86 platform, which only expects instructions to be byte-aligned. Some early RISC designs, in contrast, forced all code to be word-aligned, which allowed for simpler decoding of instructions that always had the size of one word.
But for accessing data that is not treated as instructions, bit addressing might be very helpful and effective. The boolean data type is a good example; data structures built on bit streams are the next one. It can be argued that bits and bit streams can already be accessed and processed in all relevant ways, and the newer architectures have distinct features for this, but there is plenty of room for improvement. In current real-world code it is mostly not worthwhile to optimise away a few inefficient booleans, because the extra instructions you have to invest to do so need more space and also some time to execute.
There's presumably something nice to be done with succinct trees, as per the paper you linked. I like how they avoid long chains of pointers (which are a massive problem, speed-wise). But this doesn't imply that you should do bit addressing: instead, it's probably optimal to store the tree-structure bitfield in something like a std::vector<uint64_t>, and use the large integers as sets of 64 bools packed together, making them into SIMD/vectorized booleans processed with parallel operations such as popcount, bit search, shifts/and/or/xor, etc.
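As a sketch of that approach in plain C++20 (nothing ForwardCom-specific): store the bits in a std::vector<uint64_t> and implement rank (the count of set bits before a position, one of the operations succinct structures rely on) with popcount over whole words. The class and its names are my own illustration.

[code]
#include <bit>        // std::popcount (C++20)
#include <cstdint>
#include <vector>

struct PackedBits {
    std::vector<uint64_t> words;

    explicit PackedBits(size_t nbits) : words((nbits + 63) / 64, 0) {}

    void set(size_t i)        { words[i / 64] |= uint64_t(1) << (i % 64); }
    bool test(size_t i) const { return (words[i / 64] >> (i % 64)) & 1u; }

    // rank(i): number of set bits in positions [0, i).
    // A real succinct structure would add precomputed block counts to make
    // this O(1); the word-at-a-time popcount is the core idea.
    size_t rank(size_t i) const {
        size_t r = 0;
        for (size_t w = 0; w < i / 64; ++w) r += std::popcount(words[w]);
        if (i % 64) r += std::popcount(words[i / 64] & ((uint64_t(1) << (i % 64)) - 1));
        return r;
    }
};
[/code]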
Thanks to everyone for the discussion. I would first like to say that I am of course aware that you can do bit masks and shifts to figure out if a particular bit position is set. I also appreciate what Webenson Diolen said about word-aligned instructions, and I was not advocating that instructions would be able to start and end at any bit position. I just noticed that we have instructions that operate on 8 bits in x86 and I did not know why we couldn't go lower than that besides historical reasons (pointers used to be only 32 bits).
The other reason this feature could be useful is to eliminate bit contention. For example, say you are implementing the sieve of Eratosthenes and want multiple cores to work simultaneously on the bit string representing all the integers. If two cores want to set bits in the same byte at the same time, but each of them has to write 8 or 64 bits to do it, then the program needs some kind of locking or atomic compare-and-exchange so that the bit-set operations don't overwrite each other. With instructions that operate on single bits, especially in a situation where you only ever set bits to 1 (never to 0) and only care that the final bit string is right, any core could set a bit at any time without worrying about in-between states or about one core's work interfering with another's.
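On today's byte- and word-addressed hardware, that graceful behaviour has to be bought with an atomic read-modify-write. Here is a minimal sketch of the word-granularity version, assuming a shared composite bitmap; the names and the limit are mine, not from any particular implementation.

[code]
#include <atomic>
#include <cstdint>
#include <vector>

constexpr uint64_t LIMIT = 100'000'000;   // sieve up to LIMIT (example value)

// Shared bitmap: bit n set means "n is composite".
std::vector<std::atomic<uint64_t>> composite((LIMIT + 63) / 64);

// Safe to call from any thread. fetch_or makes the read-modify-write of the
// containing 64-bit word atomic, so one core's write cannot wipe out a bit
// another core set in the same word a moment earlier.
void mark_composite(uint64_t n) {
    composite[n / 64].fetch_or(uint64_t(1) << (n % 64),
                               std::memory_order_relaxed);
}
[/code]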
Your multicore sieve of Eratosthenes would be very expensive to implement in hardware because the data cache needs to combine single bits from each core.
One-off bit operations are, I think, common enough to be worth supporting; the sieve of Eratosthenes is a bit of a strawman, but consider a tracing GC as a case that would benefit appreciably.
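For the tracing-GC case, the same atomic OR is what current hardware offers. As a rough sketch (my own names, not any particular collector): the old word returned by fetch_or also tells the marker whether it was the first to set the bit, i.e. whether the object still needs to be scanned.

[code]
#include <atomic>
#include <cstdint>
#include <vector>

// One mark bit per object slot in the heap (example size).
std::vector<std::atomic<uint64_t>> mark_bits(1 << 20);

// Returns true if this call set the bit (we won the race and must scan the
// object), false if some other thread had already marked it.
bool try_mark(uint64_t slot) {
    uint64_t bit = uint64_t(1) << (slot % 64);
    uint64_t old = mark_bits[slot / 64].fetch_or(bit, std::memory_order_relaxed);
    return (old & bit) == 0;
}
[/code]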
There are instructions for set, clear, toggle, and test a single bit or multiple bits. There is bit scan, popcount, and bitfield manipulation. What else do you need?