I never followed up with this! A little while after, I refined this significantly and then a year or two ago, I implemented it in my

~~z80float ~~z80float library (link is directly to

~~sqrtHLIX~~ sqrt32) and that's how I made square roots faster than multiplication.

Today I was excited to learn that this is how

**GMP does it**, it just looks scarier there because they are optimizing the notation to make it easy to implement in code without necessarily understanding

*how* it works (which is cool, they present the pseudocode in a very easy-to-digest manner). But that's not what I'm here for, I'm here to show how it works!

Here is the TL;DR if you don't want to go through the derivation:

##

\begin{array}{rcl}

x_{0} & \approx & \sqrt{c} \\

a_{0} &=& c - x_{0}^{2} \\

q_{n} + \frac{r_{n}}{2x_{n}} &=& \frac{a_{n}}{2x_{n}} \\

x_{n+1} &=& x_{n} + q_{n} \\

a_{n+1} &=& r_{n} - q_{n}^{2} \\

\end{array}

##(The third line means "divide

##\frac{a_{n}}{2x_{n}}##, but truncate at some precision to get the quotient and the remainder")

Let's start. I like the "Babylonian Method" where you start with a guess

##x_{0}## as an approximation to the square root of

##c##, then you run this iteration a few times until you have the precision you want:

##

\begin{array}{l}

x_{n+1} &=& \frac{\left(x_{n}+\frac{c}{x_{n}}\right)}{2}

\end{array}

##All you are doing is averaging the guess, with

##c## divided by that guess. If your guess was an overestimate,

##\frac{c}{x_{n}}## is going to be an underestimate, so their average will be closer to the actual answer.

And that's just a simplification of the Newton Iteration:

##

\begin{array}{l}

x_{n+1} &=& x_{n} + \frac{c-x_{n}^{2}}{2x_{n}}

\end{array}

##We are going to build off of the Newton Iteration, but make it a little more complicated by turning

##c-x_{n}^{2}## into it's own recurrence:

##

\begin{array}{l}

a_{n} &=& c-x_{n}^{2} \\

x_{n+1} &=& x_{n} + \frac{a_{n}}{2x_{n}}

\end{array}

##And actually, in practice we are going to truncate that division so that

##q_{n} + \frac{r_{n}}{2x_{n}} = \frac{a_{n}}{2x_{n}}## where

##q_{n}## is the (truncated) quotient, and

##r_{n}## is whatever the remainder was after dividing by

##2x_{n}##. So what we'd

*really* be doing is:

##

\begin{array}{rcl}

a_{n} &=& c-x_{n}^{2} \\

q_{n} + \frac{r_{n}}{2x_{n}} &=& \frac{a_{n}}{2x_{n}} \\

x_{n+1} &=& x_{n} + q_{n}

\end{array}

##Now that everything is more complicated, let's take a look at what happens to

##a_{n+1}##:

##

\begin{array}{l}

a_{n+1} &=& c-x_{n+1}^{2} \\

&=& c-(x_{n} + q_{n})^{2} \\

&=& c-(x_{n}^{2} + 2x_{n}q_{n} + q_{n}^{2}) \\

&=& c - x_{n}^{2} - 2x_{n}q_{n} - q_{n}^{2} \\

&=& a_{n} - 2x_{n}q_{n} - q_{n}^{2} \\

\end{array}

##Already, we have an optimization. Those two multiplications are half the precision of a full

##x_{n+1}^{2}## if you do it right, and that is strictly faster. But we're not done! Remember that division? Let's re-write it:

##

\begin{array}{rcl}

q_{n} + \frac{r_{n}}{2x_{n}} &=& \frac{a_{n}}{2x_{n}} \\

2x_{n}q_{n} + r_{n} &=& a_{n} \\

r_{n} &=& a_{n} - 2x_{n}q_{n} \\

\end{array}

##Well ain't that awfully convenient? Let's rewrite our

##a_{n+1}## recurrence:

##

\begin{array}{l}

a_{n+1} &=& a_{n} - 2x_{n}q_{n} - q_{n}^{2} \\

&=& r_{n} - q_{n}^{2} \\

\end{array}

##So now we've significantly improved that step by replacing the squaring operation with one at half the precision! Let's look at the recurrence relations now:

##

\begin{array}{rcl}

q_{n} + \frac{r_{n}}{2x_{n}} &=& \frac{a_{n}}{2x_{n}} \\

x_{n+1} &=& x_{n} + q_{n} \\

a_{n+1} &=& r_{n} - q_{n}^{2} \\

\end{array}

##Neat!

Now let's retry our example of finding the square-root of 21:

##

\begin{array}{rcl}

x_{0} & \approx & \sqrt{c} \\

&=& 4\\

a_{0} &=& c - x_{0}^{2} \\

&=& 21-16 \\

&=& 5 \\

q_{0} + \frac{r_{0}}{2x_{0}} &=& \frac{a_{0}}{2x_{0}} \\

&=&\frac{5}{8} \\

&=&.5 + \frac{1}{8} \\

x_{1} &=& x_{0} + q_{0} \\

&=& 4.5 \\

a_{1} &=& r_{0} - q_{0}^{2} \\

&=& 1 - .5^{2} \\

&=& .75 \\

q_{1} + \frac{r_{1}}{2x_{1}} &=& \frac{a_{1}}{2x_{1}} \\

&=&\frac{.75}{9} \\

&=&.083 + \frac{.003}{9} \\

x_{2} &=& x_{1} + q_{1} \\

&=& 4.583 \\

a_{2} &=& r_{1} - q_{1}^{2} \\

&=& .003 - .083^{2} \\

&=& -.003889 \\

q_{2} + \frac{r_{2}}{2x_{2}} &=& \frac{a_{2}}{2x_{2}} \\

&=&\frac{-.003889}{9.166} \\

&=&-.0004243 + \frac{.0000001338}{9.166} \\

x_{3} &=& x_{2} + q_{2} \\

&=& 4.5825757 \\

\end{array}

##**EDIT:** updates z80float link to point to the routine that actually implements this