Universal boolean instruction

discussion of forwardcom instruction set and corresponding hardware and software

Moderator: agner

Post Reply
agner
Site Admin
Posts: 192
Joined: 2017-10-15, 8:07:27
Contact:

Universal boolean instruction

Post by agner »

I have an idea for a multi-purpose 3-input bitwise boolean instruction.

This instruction can implement all functions of the type RESULT = (A AND/OR B) AND/OR/XOR C with optional inversion on all inputs and outputs. It can also do A XOR B XOR C and bit selection: A AND B OR NOT A AND C. The latter can be interpreted as A ? B : C.

The general multiformat instructions can have six option bits, which I am making full use of in this design. It is defined by the diagram below, or by this C code:

Code: Select all

int bitwise_logic (int A, int B, int C, bool op[6]) {
  int A1 = op[0] ? ~A : A;
  int B1 = op[1] ? ~B : B;
  int C1 = op[2] ? ~C : C;
  int D  = A1 & B1;
  int D1 = op[3] ? ~D : D;
  int E  = C1 & D1;
  int E1 = op[4] ? ~E : E;
  int result;
  
  if (op[5]) {
    result = E1;
  }
  else if (op[4]) {
    if (op[3]) {
      result = A1 ^ B1 ^ C1;
    } 
    else {
      result = D ^ C1;
    }
  }
  else {
    result = D | ~A1 & C1;
  }
  return result;
}
Image

Intel processors with AVX512 have a fully universal 3-input boolean instruction named VPTERNLOGD with an 8-bit truth table. I think it is too expensive to implement a full dynamic truth table in hardware for every bit in an integer or vector. I think the bitwise_logic instruction defined above is cheaper to implement - or is it?

The current ForwardCom specification has the truth_tab3 instruction which implements a truth table only for single-bit boolean variables, because a full n-bit implementation may be too expensive. The truth_tab3 instruction is probably not needed anymore if we implement the bitwise_logic instruction.

Another question is whether compilers can handle the complications of selecting the right option bits. Considering what compilers like clang and gcc can do nowadays, I think this is realistic.

The bitwise_logic instruction can do all possible boolean functions with 2 inputs, and the most important functions with 3 inputs. It cannot do everything. For example, it cannot do A & B & C | ~A & ~B & ~C. Are there any other important 3-input boolean functions that you think should be supported?

Examples of option bits
Option bits Function
0b100000 A & B & C
0b110100 ~(A & B & ~C)
0b110111 A | B | C
0b111100 A & B | C
0b010000 A & B ^ C
0b011001 ~(A ^ B ^ C)
0b000000 A & B | ~A & C
0b000001 A & B | ~A & ~C
Do you think this instruction is sufficiently useful to justify the complexity? It is possible to make a simpler implementation that cannot do XOR.
agner
Site Admin
Posts: 192
Joined: 2017-10-15, 8:07:27
Contact:

Re: Universal boolean instruction

Post by agner »

Now I have tried to synthesize it in an FPGA. The solution with a complete truth table is using 18% more slices and 40% more LUTs than the bitwise_logic circuit I described above. The truth table implementation is using less resources than I expected because the FPGA has efficient ways of implementing an 8-input multiplexer.

I don't know how many resources it will use in a dedicated IC, but I think that the resource use of the truth table solution is not excessive, so we can go with the truth table. Advantage: it is easier to use and covers all possible cases. Disadvantage: It needs an extra 8-bit constant for the truth table while the multiformat templates have only 6 option bits. Therefore, it has to be implemented as a single-format instruction that cannot have a memory operand.
mbitsnbites
Posts: 6
Joined: 2018-08-25, 16:55:25

Re: Universal boolean instruction

Post by mbitsnbites »

Out of curiosity - where do you plan to put the six option bits? In the instruction word (that would equal 64 different opcodes)? Or in an extra operand?

In MRISC32 I have four bitwise instructions: OR, AND, XOR and SEL.

The OR, AND and XOR instructions have two control bits in the instruction word that optionally negate any of the two input operands (thus we can implement NAND, NOR, BIC and other common constructs).

The SEL instruction is bitwise select: (A & C) | (B & ~C)

For SEL I use two control bits to allow for different operand permutations (since only three registers are specified one of them must be both a source operand and a destination operand).

All in all this uses up four opcodes (2 bits worth of entropy) and two control bits (that are otherwise reserved for data size specification for arithmetic operations, i.e. byte/half-word/word/double-word).
agner
Site Admin
Posts: 192
Joined: 2017-10-15, 8:07:27
Contact:

Re: Universal boolean instruction

Post by agner »

Thank you for your comment. I was not aware of MRISC32. It looks like we have got some of the same ideas.

ForwardCom allows instructions to be up to three 32-bit words. This makes it possible to overcome the problem of cramming a lot of information into a single code word, that most RISC designs suffer from. Different instruction formats give space for 8, 16, 32, or 64 bit constants or addresses. You can load a 32-bit address or a 64-bit double precision constant with a single instruction. The same instruction can be coded in different formats, depending on the size of constants and whether the destination operand is the same as a source operand.

Regarding your question about where I put the option bits: Please see https://www.forwardcom.info/. The E2 instruction format is using two 32-bit words. it has a 6 bit field for option bits and a 16 bit field for address or data, as well as three source registers and a destination register.
Dom324
Posts: 5
Joined: 2019-08-06, 10:40:57

Re: Universal boolean instruction

Post by Dom324 »

The E2 instruction format would even allow for a four input LUT, since it has a 16 bit immediate that can be used as a truth table.
That would mean that the destination register would also serve as a source register (A = A AND B AND C AND D) or alternatively, it could be used as a three input LUT with dedicated result register.

Not sure though how resource heavy 16-input multiplexers are in comparison to 8-input multiplexers. Also, four input operands might violate some ForwardCom rules regarding register usage.
HubertLamontagne
Posts: 80
Joined: 2017-11-17, 21:39:51

Re: Universal boolean instruction

Post by HubertLamontagne »

I imagine that Intel's implementation is probably just a whole bunch of 8:1 multiplexers (and high fanout buffers), and I think that part of the idea is that the compiler could fold any number of operations on the same 3 inputs by changing the 8 entry lookup table.
Post Reply