Author Topic: A faster Newton's Method Square Root  (Read 3018 times)

0 Members and 1 Guest are viewing this topic.

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4687
  • Rating: +718/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
A faster Newton's Method Square Root
« on: October 10, 2017, 02:04:50 pm »
I decided to compute some square roots by hand to keep my tired self awake and alert at work. I opted to use Newton's Method this time and I noticed an optimization!
Instead of attaining 2N precision with a 2N-by-N->2N division, I split it up into an N-by-N multiply and a N-by-N->N division, which on the Z80 or similar processor can be nearly twice as fast.

The standard algorithm works as follows:

To find the square root of c, start with a guess called x.
Iterate the following: (x+c/x)/2

Each iteration doubles the digits of precision; try it on your calculator if you aren't familiar with it! To fin the square root of 21, guess '5'. Type 5, hit [Enter]. Then do .5(Ans+21/Ans and hit [Enter] a few times.

The way it works is, if the current x is bigger than the square root of c, then c/x is less, and vice-versa. So averaging the two gets a closer estimate to the actual square root. My observation was that if I had, say, 3 digits precision, then those digits stay the same in the next iteration. Through some observations, this would mean the first 3 digits of c/x will match that of x. In fact, for c/x, I'll only need to compute it to 6 digits, but if I can skip the first 3, then the division is easier and faster! So, lets optimize:


So it's probably not so clear, but essentially for 2n-digits precision, I need to square an n-digit number and divide an n-digit number by n digits, to n digits precision. This replaces a 2n/2n division.

The former is faster, so let's try an example of finding the square root of 21:

Guess: x=5
(c-x*x) = -4
-4/5/2 = -.4
x-.4 = 4.6

(c-x*x) = 21-4.6*4.6 = -.16
-.16/4.6/2 = -.0173913... ~.02
x-.02 = 4.58

(c-x*x) = 21-4.58*4.58 = .0236
.0236/4.58/2 = .0025764... ~.0026
x+.0026 = 4.5826

(c-x*x) = 21-4.5826*4.5826 = -.00022276
-.00022276/4.5826/2 = -.000024304979... ~-.000024305
x-.000024305 = 4.582575695

In practice by hand, it actually doesn't save much time, but that's because by hand I usually can perform a 2n-by-2n division a little slower than half that of a 2n-by-2n and I can do multiplication and division in roughly the same speed. So overall This gets me about 20% faster.

On the Z80, a 2N-by-N->2N division takes roughly the time of 1.6 2N-by-2N->4N multiplies. As well, a 2N-by-2N->4N multiply takes roughly 3.5 times the time of an N-by-N->2N multiply using recursive Karatsuba multiplication.

So the standard algorithm takes roughly 5.6 N-by-N->2N multiplies worth of time.
The modified algorithm takes roughly 2.9 N-by-N->2N multiplies worth of time.