[z80] Floating Point Routines

joijoi:
Hello,
are you still working on the 80-bit versions of these routines?
Thanks in advance for any answer you can give.

Xeda112358:
At the moment, I am not working on the 80-bit versions. However, I am not finished with them. Life is busy again, so it will probably be a while before I get back to the floating point routines.

Xeda112358:
As an update and a backup to Omni, I have been working on some of the extended precision routines. So far, I've rewritten multiplication and square roots, and I am about to work on division.

The square root routine is now averaging 9183.443cc, a full three times faster than the previous implementation! (and over 9 times faster than the TI-OS float routine).
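
For illustration, the classic shift-and-subtract square root that z80 mantissa square roots are typically built around looks roughly like this in C (this is just a sketch of the general technique, not the actual xsqrt code):

Code:
#include <stdint.h>

/* Classic shift-and-subtract integer square root: each iteration consumes
   two input bits and produces one result bit.  This is the general shape
   most z80 mantissa square roots follow. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = 1UL << 30;          /* highest power of 4 that fits */

    while (bit > x)
        bit >>= 2;

    while (bit) {
        if (x >= res + bit) {
            x  -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}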

The multiplication routine is now averaging 9925.527cc, a more modest 8.5% improvement (still ~3.5 times faster than the TI-OS float multiplication). A good chunk of the speed improvement comes from a slightly faster 16-bit multiply routine from Runer112, which also has much nicer/more useful output registers.
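
To see why the 16-bit multiply matters so much: the extended multiply is essentially schoolbook multiplication over 16-bit limbs, so nearly all of the time is spent inside that one primitive. A rough C sketch of the idea (the limb count and names here are only for illustration, not the actual routine):

Code:
#include <stdint.h>

#define LIMBS 4   /* e.g. a 64-bit mantissa as four 16-bit limbs */

/* Schoolbook multiply: every 16x16->32 partial product is accumulated into
   the result at the offset of its two source limbs, so the cost of the
   whole routine is dominated by the 16-bit multiply primitive. */
void mul_mant(const uint16_t a[LIMBS], const uint16_t b[LIMBS],
              uint16_t out[2 * LIMBS])
{
    for (int i = 0; i < 2 * LIMBS; i++)
        out[i] = 0;

    for (int i = 0; i < LIMBS; i++) {
        uint32_t carry = 0;
        for (int j = 0; j < LIMBS; j++) {
            uint32_t p = (uint32_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint16_t)p;
            carry      = p >> 16;
        }
        out[i + LIMBS] = (uint16_t)carry;   /* top limb of this row */
    }
}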


I had an issue with the new square root algorithm, so for now I have patched it up so that it works, but it's about 1000cc slower than necessary. In the process of implementing this new algorithm, I rewrote some faster division routines and I might be able to get division down below 13000cc, a 30+% speed improvement (3 times faster than the OS routine).
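
For reference, the simplest mantissa divide is plain restoring (shift-and-subtract) long division, one quotient bit per step. The C sketch below shows the general shape, though the actual xdiv may well be organized differently:

Code:
#include <stdint.h>

/* Restoring long division, one quotient bit per iteration (den must be
   nonzero).  An n-bit mantissa divide done this way costs roughly n
   compare/subtract/shift steps, which is why it tends to land somewhat
   above the cost of a multiply. */
uint32_t div_mant(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint64_t r = 0;   /* remainder kept wide to avoid overflow on the shift */
    uint32_t q = 0;

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((num >> i) & 1);   /* bring down the next bit */
        q <<= 1;
        if (r >= den) {
            r -= den;
            q |= 1;
        }
    }
    if (rem)
        *rem = (uint32_t)r;
    return q;
}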

Xeda112358:
Okay, after a lot of discussion with Runer112 over IRC/Discord and a lot of coding, here is an update!
First, it seems like there might be an accuracy issue with the lower bits of xdiv, but that could just be an issue from truncating the input.

xsqrt is averaging ~9183.443cc
xmul is averaging ~9925.527cc
xdiv is averaging ~11107.370cc

Comparing to the old set of routines from a few years ago, division is now almost exactly 40% faster, multiplication is still about 8.5% faster, and square roots are three times faster.

Comparing to the OS routines, multiply is about 3.59 times faster, divide is about 3.65 times faster, and square root is about 9.45 times faster.

Pretty much the one thing the OS does better is converting floats to strings; since it uses BCD floats, it is way faster there. I project that converting to a string could take on average 50000cc with my method, versus maybe 600cc for a TI float (assuming no special formatting).
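
The gap is structural: a BCD mantissa already stores one decimal digit per nibble, so printing is just unpacking, while a binary mantissa needs real arithmetic (a full-width scale by 10 per digit). A rough C illustration of the two situations (toy code, not either actual routine):

Code:
#include <stdint.h>
#include <stddef.h>

/* BCD mantissa: two decimal digits per byte, so conversion is pure
   unpacking with no arithmetic -- essentially why the OS routine is cheap. */
void bcd_to_digits(const uint8_t *bcd, size_t bytes, char *out)
{
    for (size_t i = 0; i < bytes; i++) {
        *out++ = '0' + (bcd[i] >> 4);     /* high nibble */
        *out++ = '0' + (bcd[i] & 0x0F);   /* low nibble  */
    }
    *out = '\0';
}

/* Binary fraction in [0,1): every decimal digit costs a full-width
   multiply by 10.  A real routine does this on the whole extended
   mantissa rather than a double, hence the tens of thousands of cycles. */
void frac_to_digits(double frac, int digits, char *out)
{
    for (int i = 0; i < digits; i++) {
        frac *= 10.0;
        int d = (int)frac;
        *out++ = '0' + d;
        frac  -= d;
    }
    *out = '\0';
}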

EDIT: I'm still not good with GitHub, so I'm going to try to get this to work, but I make no promises. z80float

Xeda112358:
So I added addition/subtraction and float->str. I haven't calculated timing for add/sub or float->str, but about 1500cc and 90000cc, respectively, seem like reasonable guesses. The conversion introduces error in the last digits, so I should only return about 16 digits. At the moment, the only formatting that truncates to 16 digits max is when the exponent is too large in magnitude (e.g. 1.234567890123456e-9).
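
Add/sub being an order of magnitude cheaper than multiply makes sense: it's just aligning exponents, adding mantissas, and renormalizing, which is all shifts and adds. A toy C sketch of the same-sign case (a made-up 32-bit format for illustration, not the actual 80-bit layout or routine):

Code:
#include <stdint.h>

/* Toy float: 32-bit normalized mantissa plus a signed exponent.  A real
   add/sub also handles signs, rounding, and zero; the core is simply:
   align, add, renormalize. */
typedef struct { uint32_t mant; int exp; } toyfloat;

toyfloat toy_add(toyfloat a, toyfloat b)
{
    toyfloat r;

    if (a.exp < b.exp) { toyfloat t = a; a = b; b = t; }  /* a has the larger exponent */

    int shift = a.exp - b.exp;
    uint32_t bm = (shift < 32) ? (b.mant >> shift) : 0;   /* align b to a */

    uint64_t sum = (uint64_t)a.mant + bm;
    r.exp = a.exp;
    if (sum >> 32) {          /* mantissa overflowed: shift back down */
        sum >>= 1;
        r.exp += 1;
    }
    r.mant = (uint32_t)sum;
    return r;
}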

Anyway, you can grab the source on GitHub, but I've attached some ugly screenshots with evident rounding issues :P


EDIT: It turns out I had computed the 10^-(2^k) table at a lower precision than intended. Now that it is fixed, numbers are displayed with a bit better precision. I am currently working on implementing the B-G algorithm.
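
For anyone wondering what the 10^-(2^k) table is for: the decimal exponent is peeled off bit by bit, multiplying by the matching precomputed power until the value lands in [1,10), which is why any sloppiness in those table entries shows up in every printed digit. A rough C illustration using doubles (not the actual extended-precision code):

Code:
#include <stdio.h>

/* Precomputed 10^(2^k): 10, 100, 1e4, 1e8, ...  A float routine stores
   these (or their reciprocals) to full extended precision; low-precision
   entries leak error into every digit of the printed result. */
static const double pow10_2k[] = { 1e1, 1e2, 1e4, 1e8, 1e16, 1e32, 1e64, 1e128, 1e256 };

/* Scale x >= 1 into [1,10), accumulating the decimal exponent from the
   binary decomposition of the table indices. */
static double scale_down(double x, int *dec_exp)
{
    *dec_exp = 0;
    for (int k = 8; k >= 0; k--) {
        if (x >= pow10_2k[k]) {
            x *= 1.0 / pow10_2k[k];   /* i.e. multiply by 10^-(2^k) */
            *dec_exp += 1 << k;
        }
    }
    return x;
}

int main(void)
{
    int e;
    double m = scale_down(6.02214076e23, &e);
    printf("%f e %d\n", m, e);        /* ~6.022141 e 23 */
    return 0;
}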
