Author Topic: Quadratic and Faster Convergence for Division (Read 6613 times)

Xeda112358 · « **on:** December 24, 2014, 02:12:45 am »

EDIT: See the necro-update below-- it provides a much more efficient algorithm, even 1 multiply better than most suggested implementations of Newton-Raphson division

In the study of numerical algorithms, quadratic convergence refers to an iterative algorithm that approximates a function and each iteration doubles the digits of accuracy. In this post, I will expose an algorithm for division that can double, triple, quadruple, etc. the number of digits of accuracy at each iteration. I will then show that the optimal algorithm is that which offers cubic convergence (digits of accuracy tripled).

What does it mean to multiply the number of correct digits by a?
Suppose that we have a sequence ##x_{0},x_{1},x_{2},x_{3},...## and suppose that ##x_{n}\rightarrow c## Then by the definition of convergence, ##\lim\limits_{n\rightarrow\infty}{|x_{n}-c|}=0##. You should always think of |x-y| as a "distance" between two points, and in this case, that distance is the error of our approximations, ##x_{n}##.

If the number of correct digits is multiplied by a, then that means ##\log(|x_{n+1}-c|)=a\cdot\log|x_{n}-c|##, or rewritten using logarithm tricks, ##|x_{n+1}-c|=|x_{n}-c|^{a}##. This assumes that the error is initially less than 1.

The Algorithm
The algorithm is very simple, but to break it down we need to look at "obvious" bits of math. Firstly, ##\frac{N}{D}=\frac{N\cdot c}{D\cdot c}##, and secondly, all real numbers can be written in the form of ##x=y\cdot 2^{m}, y\in(.5,1]##. You may have said "duh" at the first statement, but if you need convincing of the second, it basically says that if you divide or multiply a number by 2 enough times, then it will eventually get between 1/2 and 1. Or in another way, shown by example: 2¹¹<3767<2¹², so dividing it all by 2¹², .5<3767/4096<1. All real numbers can be bounded by consecutive power of 2, therefore they can all be written in the above form.

The final piece of the puzzle is to recognize that ##\frac{N}{1}={N}##. What we will do is recursively multiply D by some constant c at each iteration in order to drive it closer to 1. If we use range reduction techniques to get D on (.5,1] (multiply both N and D by the power of 2 that gets D in that range). Then what we want is to choose c so that ##|1-D_{n}\cdot c|= |1-D_{n}|^{a}##. If ##0\leq D_{n}\leq 1##, then we have ##1-D_{n}\cdot c = (1-D_{n})^{a}##. When a is a natural number greater than 1 (if it is 1, then there is no convergence

), we have the following:
##1-D_{n}\cdot c = \sum\limits_{k=0}^{a}{{a \choose k}(-D_{n})^{k}}##
##D_{n}\cdot c = 1-\sum\limits_{k=0}^{a}{{a \choose k}(-D_{n})^{k}}##
##D_{n}\cdot c = 1-(1+\sum\limits_{k=1}^{a}{{a \choose k}(-D_{n})^{k}})##
##D_{n}\cdot c = -\sum\limits_{k=1}^{a}{{a \choose k}(-D_{n})^{k}}##
##D_{n}\cdot c = {a \choose 1}D_{n}-{a \choose 2}D_{n}^{2}+{a \choose 3}D_{n}^{3}+\cdots##
##c = {a \choose 1}-{a \choose 2}D_{n}+{a \choose 3}D_{n}^{2}+\cdots##
Using a technique similar to Horner's Method, we can obtain:
##c = {a \choose 1}-D_{n}({a \choose 2}-D_{n}({a \choose 3}-D_{n}({a \choose 4}-D_{n}\cdots##
Then the selection for c requires a-2 multiplications and a-1 subtractions. If we take ##N_{n+1}=N_{n}\cdot c## and ##D_{n+1}=D_{n}\cdot c##, then the number of digits of accuracy is multiplied by a using a total of a multiplications and a-1.

Example
Lets say we want quadratic convergence. That is, a=2. Then ##c_{n+1}=2-D_{n}## That's right-- all you have to do is multiply by 2 minus the denominator at each iteration and if you started with 1 digit of accuracy, you next get 2 digits, then 4, then 8, 16, 32,.... Let's perform 49/39. Divide numerator and denominator by 2⁶=64 and we get .765625/.609375.

First iteration:
c=2-D=1.390625
Nc=1.064697266
Dc=.8474121094

Second iteration:
c=1.152587891
Nc=1.227157176
Dc=.9767169356

next iterations in the form {c,N,D}:
{1.023283064,1.255729155,.9994578989}
{1.000542101,1.256409887,.9999997061}
{1.000000294,1.256410256,.9999999999}
{1.000000294,1.256410256,1.000000000}

And in fact, 49/39 is 1.256410256 according to my calc.

Can we find an optimal choice for a?
The only variable over which we have control is a-- how quickly the algorithm converges. Therefore we should find an optimal choice of a. In this case, "optimal" means getting the most accuracy for the least work (computation time).

First, ##1-D_{0}<.5=2^{-1}## So then after n iterations, we have at least aⁿ bits of accuracy and since ##D_{0}## may get arbitrarily close to .5, this is the best lower bound on the number of bits of accuracy achieved for n iterations of a randomly chosen D. As well, at n iterations, we need to use ##a\cdot n## multiplications and the subtractions are treated as trivial. Then in order to get m bits of accuracy, we need roughly ##a\log_{a}(m)## multiplications. We want to minimize this function of a for a is an integer greater than 1. To do that, we find when ##0=\frac{d}{da}(a\log_{a}(m))##:
##0=\frac{d}{da}(a\log_{a}(m))##
##0=\frac{d}{da}(a\frac{\log(m)}{\log(a)})##
##0=\frac{d}{da}(\frac{a}{\log(a)})##
##0=\frac{1}{\log(a)}-\frac{1}{\log^{2}(a)}##
Since a>1:
##0=1-\frac{1}{\log(a)}##
##1=\frac{1}{\log(a)}##
##\log(a)=1##
##a=e##

However, e is not an integer, but it is bounded by 2 and 3, so check which is smaller: 2/log(2) or 3/log(3) and we find that a=3 is the optimal value that achieves the most accuracy for the least work.

Example:
Our algorithm body is:

Code: [Select]

    c=3-D*(3-D)
    N=N*c
    D=D*c

Then 49/39=.765625/.609375 and {c,N,D}=:
{1.543212891,1.181522369,.9403953552}
{1.063157358,1.256144201,.9997882418}
{1.000211803,1.256410256,1.000000000}

Conclusion
In reality, their is a specific M so that for all m>M, a=3 always requires less computational power than a=2. Before that cutoff, there are values for which they require the same amount of work or even the cost is in favor of a=2. For example, 32768 bits of accuracy can be achieved with the same amount of work. The cutoff appears to be m=2^24 which is pretty gosh darn huge (16777216 bits of accuracy is over 5 million digits).

pimathbrainiac · « **Reply #1 on:** December 24, 2014, 07:22:01 am »

Great number theory stuff, as always (although I barely understand this).
I think that accuracy is awesome. You should really publish your stuff in a book or journal if you aren't already.

Xeda112358 · « **Reply #2 on:** December 24, 2014, 10:26:23 am »

When I first learned about Goldshmidt Division (case a=2), my first thought was to generalize it. Of course, after I did that I researched if it had already been done and it has

I even think that it is known that a=3 is the most optimal case judging by this.

EDIT: And now I found a paper that mentioned an even more efficient way of doing it. It finds the inverse of D using one fewer mul at each iteration, then you use a final multiplication to do N*(1/D). For the curious:

Code: [Select]

r=1 (or some approximation of 1/D)
    x=1-rD
    r=r+rx(1+x(1+x(1+...

The last step is carried out finitely. The number of 'x' terms is how quickly it converges.

Xeda112358 · « **Reply #3 on:** October 23, 2016, 09:11:30 am »

Necro-addition:
Let's look at ##\frac{x}{1-y}, 0<|y|<1##. Then we know that since ##0<|y|<1##, then ##0<y^{k>1}<|y|<1##. Then if we multiply the numerator and denominator by ##1+y+y^{2}+...+y^{k-1}##, the numerator is now ##x(1+y+y^{2}+...+y^{k-1})## and the denominator is now ##1-y^{k}##.

Spoiler For "Why its 1-y^k":

So now since ##y^{k}## is smaller than ##y##, this means the denominator is closer to 1 so the numerator is closer to the original ##\frac{x}{1-y}##. Even better, this multiplies the digits of accuracy by k each time.

Converting this to an algorithm, we get:
##x_{0}=x, y_{0}=y##
##x_{n+1}=x_{n}(1+y_{n}+y_{n}^{2}+...+y_{n}^{k-1}), y_{n+1}=y_{n}^{k}##
There are optimizations for this, too. For example, if ##k=2^{n}##, then ##1+y+y^{2}+...+y^{k-1}=(1+y)(1+y^{2})(1+y^{4})...(1+y^{2^{n-1}})## and the code becomes something like:

Code: [Select]

x*=(1+y)
y*=y
x*=(1+y)
y*=y
...
x*=(1+y)
y*=y

For example, take k=8, then ##1+y+y^{2}+y^{3}+y^{4}+y^{5}+y^{6}+y^{7}=(1+y)(1+y^{2})(1+y^{4})##
But then, this is identical to n iterations where k=2. In fact, k=2 or any higher power of two by extension, provides the most optimal trade-off of multiplications for digits of accuracy.

For a real world application, suppose I want to get 9 digits (~30 bits) accuracy for ##\frac{x}{10}=\frac{x}{1.25}\frac{1}{8}##. Then ##y=-.25##, and because y is constant, we can precompute all of the powers of ##y##. Further, since ##y=-2^{-2}##, this will amount to simple bit-shifts. So the code is:

Code: [Select]

            #x=379          integer only
x-=x>>2     #x=284.25       285
x+=x>>4     #302.015625     302
x+=x>>8     #303.195374     303
x+=x>>16    #303.2          303
x>>=3       #37.9           37      the final /8.

SpiroH · « **Reply #4 on:** October 25, 2016, 10:29:17 am »

Hi Xeda,
I would be interesting to know a bit more about which 'real world application' are you targeting. Applications tend to bring about some money, you know, hehe.
Not really my business right now, but IMO it would make it more appealing to the less math savvy.

Xeda112358 · « **Reply #5 on:** October 26, 2016, 11:23:42 am »

Oh, it's a common programming technique for performing ##\frac{x}{2^{n}(2^{k}\pm1)}##.

Author Topic: Quadratic and Faster Convergence for Division (Read 6613 times)

Xeda112358

Quadratic and Faster Convergence for Division

pimathbrainiac

Re: Quadratic and Faster Convergence for Division

Xeda112358

Re: Quadratic and Faster Convergence for Division

Xeda112358

Re: Quadratic and Faster Convergence for Division

SpiroH

Re: Quadratic and Faster Convergence for Division

Xeda112358

Re: Quadratic and Faster Convergence for Division