Print Page - The General Class of Karatsuba-Like Multiplication

General Discussion => Other Discussions => Math and Science => Topic started by: Xeda112358 on March 16, 2019, 11:51:33 am

Title: The General Class of Karatsuba-Like Multiplication
Post by: Xeda112358 on March 16, 2019, 11:51:33 am

In playing with ideas to improve multiplication in my float routines, I recreated an algorithm that I soon learned was a known, but rarely discussed generalization of the Karatsuba algorithm!

For a long time, I thought of the Karatsuba algorithm as a special case of Toom-Cook multiplication. If you aren't familiar with that, I discuss it here (https://www.omnimaga.org/math-and-science/fast-3-digit-squaring-toom-cook/) (see the spoiler!). To be fair, it is a special case, but there is actually another way to arrive to the Karatsuba algorithm, and honestly it is a bit easier to implement and recognize.

To arrive at the algorithm, we start with polynomial multiplication. I'll start off with multiplying two cubic polynomials and that should give us enough terms to get a good view of what's going on:
##
\begin{array}{l}
f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3})(b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3})\\

&=&a_{0}b_{0}\\
&&+x(a_{0}b_{1}+a_{1}b_{0})\\
&&+x^{2}(a_{0}b_{2}+a_{1}b_{1}+a_{2}b_{0})\\
&&+x^{3}(a_{0}b_{3}+a_{1}b_{2}+a_{2}b_{1}+a_{3}b_{0})\\
&&+x^{4}(a_{1}b_{3}+a_{2}b_{2}+a_{3}b_{1})\\
&&+x^{5}(a_{2}b_{3}+a_{3}b_{2})\\
&&+x^{6}(a_{3}b_{3})\\
\end{array}
##

There is a lot of symmetry going on there in the form of ##a_{n}b_{m}+\dots+a_{m}b_{n}##. My observation was that ##a_{n}b_{m}+a_{m}b_{n} = (a_{n}+a_{m})(b_{n}+b_{m})-a_{n}b_{n}-a_{m}b_{m}##.On the surface, this looks like we are trading two multiplications for three (more work), but we actually get to reuse some of those terms in other calculations. So let's reorganize:

##
\begin{array}{l}
f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3})(b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3})\\

&=&a_{0}b_{0}\\
&&+x((a_{0}b_{1}+a_{1}b_{0}))\\
&&+x^{2}((a_{0}b_{2}+a_{2}b_{0})+a_{1}b_{1})\\
&&+x^{3}((a_{0}b_{3}+a_{3}b_{0})+(a_{1}b_{2}+a_{2}b_{1}))\\
&&+x^{4}((a_{1}b_{3}+a_{3}b_{1})+a_{2}b_{2})\\
&&+x^{5}((a_{2}b_{3}+a_{3}b_{2}))\\
&&+x^{6}(a_{3}b_{3})\\
\end{array}
##

Now let's define some auxiliary calculations and rewrite the product:
##
z_{0} = a_{0}b_{0}\\
z_{1} = a_{1}b_{1}\\
z_{2} = a_{2}b_{2}\\
z_{3} = a_{3}b_{3}\\

z_{4} = (a_{0}+a_{1})(b_{0}+b_{1})\\
z_{5} = (a_{0}+a_{2})(b_{0}+b_{2})\\
z_{6} = (a_{0}+a_{3})(b_{0}+b_{3})\\
z_{7} = (a_{1}+a_{2})(b_{1}+b_{2})\\
z_{8} = (a_{1}+a_{3})(b_{1}+b_{3})\\
z_{9} = (a_{2}+a_{3})(b_{2}+b_{3})\\

\begin{array}{l}
f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3})(b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3})\\

&=&z_{0}\\
&&+x(z_{4}-z_{0}-z_{1})\\
&&+x^{2}(z_{5}-z_{0}-z_{2}+z_{1})\\
&&+x^{3}(z_{6}-z_{0}-z_{3}+z_{7}-z_{1}-z_{2})\\
&&+x^{4}(z_{8}-z_{1}-z_{3}+z_{2})\\
&&+x^{5}(z_{9}-z_{2}-z_{3})\\
&&+x^{6}z_{3}\\
\end{array}
##

So with just this trick, we went from 16 multiplies to 10, at the cost of 12 extra additions/subtractions. In general, to multiply two n-th degree polynomials, we need ##\frac{(n+1)(n+2)}{2}## multiplies instead of ##(n+1)^{2}##, so almost half as many multiplies are needed. These are both ##O(n^{2})## operations, but the trick is to apply recursion. For example, if we want to multiply two degree-15 polynomials, we can break it up into 4 cubic polynomials and treat the smaller polynomials as coefficients. Then we need to multiply 10 pairs of cubic polynomials, resulting in 100 total multiplies. That's a lot, but it's less than 256 with the standard method, or even 136 with the algorithm above. If we carry this out another step, we can multiply two 63-degree polynomials with 1000 multiplies instead of 4096 or 2080. In fact, for a ##(4^{n}-1)-th## degree polynomial, we need ##10^{n}## multiplies, making this algorithm ##O(n^{log_{4}(10)})=O(n^{1.66096\dots})##. This is a non-trivial, better-than-##O(n^{2})## multiplication algorithm.

Using our generalized technique, let's multiply two quadratic polynomials:

Spoiler For Spoiler:

And the Karatsuba algorithm that we've all come to know and love:

Spoiler For Spoiler:

But is Karatsuba the best of this class of algorithm?

Spoiler For Yes:

A neat application of the non-recursive algorithm discussed above can be used to multiply a number by it's complement, which is another approach to how I came up with this Sine Approximation Algorithm (https://www.omnimaga.org/asm-language/sine-approximation-(z80)/).

Spoiler For Spoiler:

So let's assume that all of our coefficients are either 0 or 1. The complement of 0 is 1, and vice versa. This is useful in programming, since inverting the bits of a number is basically doing -x-1.

Let's define some operations. Where x and y are binary (0 or 1), then

~x means the complement of x, and can be rewritten (1-x).
x|y is bitwise-OR and can be rewritten as x+y-xy.
x&y is bitwise-AND and can be rewritten as x*y or xy for brevity.
x^y is bitwise-XOR and can be rewritten as x+y-2xy.

Observe, then, that:

x~x is always 0.
x&x=x.
x~y+y~x = x-xy+y-yx = x+y-2xy = x^y.

So if we want to multiply, say, a 4-bit number by its complement, we have:

Code: [Select]

z0 = a0~a0=0
z1 = a1~a1=0
z2 = a2~a2=0
z3 = a3~a3=0

z4 = (a0+a1)(~a0+~a1) = a0~a1 + a1~a0 = a0^a1
z5 = (a0+a2)(~a0+~a2) = a0~a2 + a2~a0 = a0^a2
z6 = (a0+a3)(~a0+~a3) = a0~a3 + a3~a0 = a0^a3
z7 = (a1+a2)(~a1+~a2) = a1~a2 + a2~a1 = a1^a2
z8 = (a1+a3)(~a1+~a3) = a1~a3 + a3~a1 = a1^a3
z9 = (a2+a3)(~a2+~a3) = a2~a3 + a3~a2 = a2^a3
f(x)g(x) = (a0+a1x+a2x^2+a3x^3)(~a0+~a1x+~a2x^2+~a3x^3)

=z0                      |=0
 +x(z4-z0-z1)            | +x(a0^a1)
 +x^2(z5-z0-z2+z1)       | +x^2(a0^a2)
 +x^3(z6-z0-z3+z7-z1-z2) | +x^3(a0^a3+a1^a2)
 +x^4(z8-z1-z3+z2)       | +x^4(a1^a3)
 +x^5(z9-z2-z3)          | +x^5(a2^a3)
 +x^6z3                  | +0

Omnimaga

General Discussion => Other Discussions => Math and Science => Topic started by: Xeda112358 on March 16, 2019, 11:51:33 am