Love this feature. In the description, you could also mention that the command, even as it stands, can be used to produce more accurate (components of a) cross product. But I do not understand what is meant by
...store the result in y0 or y1, where y0,y1 are elements of the result vector. The other
... element of y will receive a corresponding element of a third source vector c0,c1
Also, is there a way to recover the approximate error in the dot product command? How much extra work would it take to make this command an augmented operation such that it returns the sum and the error in that sum?
Or is it easier to instead provide the more general operation as:
... s += a0 * b0 + a1 * b1
Thanks.
Dot Product
Moderator: agner
Re: Dot Product
Damian, you need to use this instruction twice for a complex number multiplication or division. First, you calculate the real part of the product and place it in y0, where (y0,y1) is the result vector. Then you calculate the imaginary part and place it in y1, while retaining y0 by using (c0,c1) = (y0,0).
You are right that it can be useful for cross products as well, but this requires extra permutations.
Augmented addition is better handled by a separate instruction.
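The two-step complex multiplication described above can be sketched in Python as follows. This is only a model under stated assumptions: `dot2` is a hypothetical stand-in for the proposed instruction, and its parameters (`dest`, `subtract`) are invented for illustration. The operand swap needed for the imaginary part is written out explicitly, since the thread does not say whether the instruction performs it internally. The model also rounds each product separately, which a hardware implementation may not do.

```python
def dot2(a, b, c, dest, subtract=False):
    """Hypothetical model of the instruction: compute a0*b0 +/- a1*b1,
    store it in element `dest` of the result, and copy the other
    element of the result from the third source vector c."""
    if subtract:
        p = a[0] * b[0] - a[1] * b[1]
    else:
        p = a[0] * b[0] + a[1] * b[1]
    y = list(c)
    y[dest] = p
    return tuple(y)


def complex_mul(a, b):
    """Multiply a0 + i*a1 by b0 + i*b1 using the instruction twice."""
    # First use: real part a0*b0 - a1*b1 goes into y0.
    y = dot2(a, b, (0.0, 0.0), dest=0, subtract=True)
    # Second use: imaginary part a0*b1 + a1*b0 goes into y1,
    # while y0 is retained by passing (c0, c1) = (y0, 0).
    y = dot2(a, (b[1], b[0]), (y[0], 0.0), dest=1)
    return y


# (1 + 2i) * (3 + 4i) = -5 + 10i
print(complex_mul((1.0, 2.0), (3.0, 4.0)))
```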
Re: Dot Product
Thanks for the explanation about the real and imaginary parts. Silly me. I should have twigged to the context.
I realize that using it in a cross product needs extra permutations. Will they be incorporated? I thought that exploiting this instruction in as many applications as possible would yield the best ROI.
I will ponder the problem of augmented multiplication/addition within the computation of a dot product of a vector of length N, N > 2 or N >> 2 and get back to you with hopefully some wiser thoughts.
In the last year, I have been working on several routines, all of which compute
.... (s, ds) = a0 * b0 + a1 * b1
and then I work with both s and ds in various ways, not just in augmented additions.
Re: Dot Product
Given two complex numbers:
... a0 + i a1
... b0 + i b1
then the real part of the complex multiplication is a component of the cross product:
... a0 * b0 - a1 * b1
So it appears to me that your proposed instruction already covers the basics of a cross product.
Re: Dot Product
Is this an instruction (or a variant of it) that we need to suggest go into IEEE 754-2019?
I still think it needs to produce a tuple (y, dy) such that
... y + dy = +/-a0*b0 +/- a1*b1
for maximum generality, to satisfy as many people in the discussion as possible.
My 2c.
Re: Dot Product
A cross product has 3 dimensions. It requires more permutations. Permutations and data movement across vector lanes are expensive in hardware. Therefore, I have no plans to implement an instruction with more dimensions than 2.
Augmented multiplication is possible with FMA instructions:
Y = a * b
dY = FMA(a, b, -Y)
This can be combined with augmented addition to produce a longer sum of products.
I am not sure whether we need to add a dot product instruction to the IEEE 754 standard. It is more important to discuss how to make long sums.
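A minimal Python sketch of the FMA-based augmented multiplication above, combined with an augmented addition to form a two-term sum of products with an error estimate. Several things here are assumptions rather than anything specified in the thread: `fma` is emulated with exact rational arithmetic so the sketch does not depend on a particular Python version, the augmented addition is the classical branch-free two-sum, and the names `two_prod`, `two_sum` and `dot2_with_error` are made up for illustration. It assumes finite inputs and no overflow or underflow, and `ds` is only an approximation of the total error because the error terms are themselves added with rounding.

```python
from fractions import Fraction


def fma(a, b, c):
    # Emulated fused multiply-add: a*b + c computed exactly,
    # then rounded once to the nearest double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def two_prod(a, b):
    # Augmented multiplication: y + dy == a*b exactly
    # (barring overflow and underflow).
    y = a * b
    dy = fma(a, b, -y)
    return y, dy


def two_sum(a, b):
    # Augmented addition: s + err == a + b exactly (barring overflow).
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dot2_with_error(a0, b0, a1, b1):
    # (s, ds) ~= a0*b0 + a1*b1, with ds an approximate error term.
    p0, e0 = two_prod(a0, b0)
    p1, e1 = two_prod(a1, b1)
    s, e = two_sum(p0, p1)
    ds = e + (e0 + e1)
    return s, ds


x = 1.0 + 2.0**-30
s, ds = dot2_with_error(x, x, 1.0, -1.0)
print(s, ds)  # the plain expression x*x - 1.0 loses the ds part
```

A longer sum of products would repeat the same pattern, accumulating the high parts with `two_sum` and the low parts in a separate running error term.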