Messages - Xeda112358

256

ASM / Re: [z80] Floating Point Routines

« on: January 14, 2019, 02:40:17 pm »

x-post

I have been focusing on the single-precision floats this past week or so. I rewrote or re-worked a lot of routines. I got rid of most of the tables by switching to a polynomial approximation for the 2^x routine (thanks to the Sollya program!) and using the B-G algorithm to compute lnSingle. It turned out to be faster this way, anyways.
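To illustrate the table-free 2^x idea, here is a rough Python sketch: split x into an integer part and a small fraction, evaluate a short polynomial on the fraction, then fix up the exponent. Note that it uses plain Taylor coefficients as a stand-in; the actual routine uses minimax coefficients from Sollya, which are not reproduced here. The function name `pow2_sketch` is made up for this example.

```python
import math

def pow2_sketch(x):
    """Approximate 2**x with range reduction plus a short polynomial."""
    n = math.floor(x + 0.5)          # nearest integer, so f lies in [-0.5, 0.5)
    f = x - n
    t = f * math.log(2.0)            # 2**f == e**t with |t| <= ~0.347
    # Taylor coefficients of e**t up to t**7 (NOT Sollya's minimax coefficients)
    coeffs = [1.0 / math.factorial(k) for k in range(8)]
    p = 0.0
    for c in reversed(coeffs):       # Horner evaluation
        p = p * t + c
    return math.ldexp(p, n)          # p * 2**n, i.e. just an exponent adjustment

print(pow2_sketch(3.7), 2.0 ** 3.7)
```

A minimax polynomial should reach the same accuracy with fewer terms than the Taylor series used here, which is presumably why it is worth the trip through Sollya.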

I implemented sine, cosine, and tangent, the first two, again, using minimax polynomial approximation. I optimized the square-root routine (much faster but a few bytes bigger). I re-implemented the B-G algorithm using math optimizations I came up with a few months ago. I opted for two B-G implementations-- one for lnSingle which requires only 1 iteration for single precision, and one for the inverse trig and hyperbolic functions which needs 2 iterations. For anybody looking to save on size, you can just use the second B-G routine for natural logarithm. It will be a little slower, but it'll work just fine (maybe even give you an extra half-bit of precision).
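For anyone unfamiliar with the B-G (Borchardt-Gauss) algorithm, here is a plain, unoptimized Python sketch of how it yields both ln and atan. This is the textbook iteration, which gains only about 2 bits per step, so it simply runs many iterations. The 1- and 2-iteration versions mentioned above rely on the extra acceleration coefficients, which are not reproduced here. The function names are made up for the example.

```python
import math

def borchardt_gauss(a, b, iterations=30):
    """Textbook Borchardt-Gauss mean: both sequences converge to one limit."""
    for _ in range(iterations):
        a = (a + b) / 2.0            # arithmetic mean
        b = math.sqrt(a * b)         # geometric mean using the NEW a
    return b

def ln_bg(x):
    # BG((1+x)/2, sqrt(x)) == (x-1)/ln(x), so ln(x) == (x-1)/BG(...)
    return (x - 1.0) / borchardt_gauss((1.0 + x) / 2.0, math.sqrt(x))

def atan_bg(x):
    # BG(1, sqrt(1+x^2)) == x/atan(x), so atan(x) == x/BG(...)
    return x / borchardt_gauss(1.0, math.sqrt(1.0 + x * x))

print(ln_bg(2.0), math.log(2.0))
print(atan_bg(1.0), math.pi / 4)
```

The same core loop serves the logarithm and the inverse trig/hyperbolic functions; only the starting values and the final division change.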

I included the Python program that I use for converting numbers to my single precision format. You can use it to convert a single float or a bunch of them. I also included a Python tool I made for computing more efficient coefficients in the B-G algorithm, but that'll only be useful to me and maybe a handful of other people. It's there on the off chance somebody stumbles across my project looking for a B-G implementation.

The single precision floats are largely complete in that I can't think of any other functions that I want to add. There is still work to be done on range reduction and verification, as well as bug fixes and more extensive testing.

Here is a current screenshot of some of the routines and their outputs:

The current list of single-precision routines:

Code:

Basic arithmetic:
  absSingle     |x| -> z       Computes the absolute value
  addSingle     x+y -> z
  ameanSingle   (x+y)/2 -> z   Arithmetic mean of two numbers.
  cmpSingle     cmp(x,y)       Compare two numbers. Output is in the flags register!
  rsubSingle    y-x -> z
  subSingle     x-y -> z
  divSingle     x/y -> z
  invSingle     1/x -> z
  mulSingle     x*y -> z
  negSingle     -x  -> z
  sqrtSingle    sqrt(x) -> z
  geomeanSingle sqrt(x*y) -> z

Logs, Exponentials, Powers
  expSingle    e^x -> z
  pow2Single   2^x -> z
  pow10Single  10^x-> z
  powSingle    y^x -> z
  lgSingle     log2(x)  -> z
  lnSingle     ln(x)    -> z
  log10Single  log10(x) -> z
  logSingle    log_y(x) -> z

Trig, Hyperbolic, and their Inverses
  acoshSingle   acosh(x) -> z
  acosSingle    acos(x)  -> z
  asinhSingle   asinh(x) -> z
  asinSingle    asin(x)  -> z
  atanhSingle   atanh(x) -> z
  atanSingle    atan(x)  -> z
  coshSingle    cosh(x)  -> z
  cosSingle     cos(x)   -> z
  sinhSingle    sinh(x)  -> z
  sinSingle     sin(x)   -> z
  tanhSingle    tanh(x)  -> z
  tanSingle     tan(x)   -> z

Special-Purpose    Used by various internal functions, or optimized for special cases
  bg2iSingle     1/BG(x,y) -> z   Fewer iterations, but enough to be suitable for ln(x). Kind of a special-purpose routine
  bgiSingle      1/BG(x,y) -> z   More iterations, general-purpose, needed for the inverse trig and hyperbolics
  div255Single   x/255 -> z
  div85Single    x/85  -> z
  div51Single    x/51  -> z
  div17Single    x/17  -> z
  div15Single    x/15  -> z
  div5Single     x/5   -> z
  div3Single     x/3   -> z
  mul10Single    x*10  -> z
  mulSingle_p375         x*0.375  -> z      Used in bg2iSingle.  x*(3/8)
  mulSingle_p34375       x*0.34375-> z      Used in bgiSingle.   x*(11/32)
  mulSingle_p041015625   x*0.041015625-> z  Used in bgiSingle.   x*(21/512)

Miscellaneous and Utility
  randSingle    rand   -> z
  single2str    str(x) -> z           Convert a single to a null-terminated string, with formatting
  single2TI     tifloat(x) -> z       Converts a single to a TI-float. Useful for interacting with the TI-OS
  ti2single     single(tifloat x)->z  Converts a TI-float to a single. Useful for interacting with the TI-OS
  single2char   Honestly, I forgot what it does, but I use it in some string routines. probably converts to a uint8
  pushpop       pushes the main registers to the stack and sets up a routine so that when your code exits, it restores registers. Replaces manually surrounding code with push...pop
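A note on the odd-looking list of special-purpose divisors (3, 5, 15, 17, 51, 85, 255): they are exactly the divisors of 255 greater than 1, and 1/255 has a very simple binary expansion (a 1 bit every 8 places). The Python sketch below shows why that makes these divisions cheap. Whether the actual routines work this way internally is an assumption on my part, and the function name is made up.

```python
def div_by_factor_of_255(x, d, blocks=4):
    """Approximate x/d for d in {3, 5, 15, 17, 51, 85, 255}."""
    if 255 % d != 0:
        raise ValueError("d must divide 255")
    k = 255 // d                      # 1/d == k/255
    # 1/255 == 2**-8 + 2**-16 + 2**-24 + ...  (truncated after `blocks` terms)
    acc = 0.0
    for i in range(1, blocks + 1):
        acc += x * k * 2.0 ** (-8 * i)   # a small multiply plus a byte shift
    return acc

print(div_by_factor_of_255(1.0, 3), 1 / 3)
```

With four 8-bit blocks the relative error is 2^-32, which is more than enough for a 24-bit mantissa.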

257

ASM / Re: [z80] Floating Point Routines

« on: January 05, 2019, 12:58:36 pm »

As an update, the B-G algorithm is implemented, as well as the inverse trig and inverse hyperbolic functions and the natural logarithm. There have been a bunch of bug fixes and optimizations.

My initial implementation of the 64-bit square root had an issue that was too daunting to track down. Instead, I just made a quick patch to make it work, requiring a 32-bit square operation. I finally decided to rework the code a bit and fixed the issue, allowing me to get rid of that 32-bit square and replace it with a 16-bit square (as originally intended).

xsqrt is now under 6600 clock cycles on average!

EDIT:
I made some more accurate timings, including taking into account the time it takes to make a bcall.

Code:

          TI-OS       z80float  dif         %
ln    = 131547.46cc   ~165000   +33452.54   125.43%   much slower :(
atan  = 173317.82cc   ~174000   +682.18     100.39%   slightly slower
atanh = 175320.91cc   ~174000   -1320.91     99.25%   slightly faster
sqrt  =  77699.51cc   6540.79   -71158.72     8.42%   Way faster!
mul   =  30229.53cc   9928.23   -20301.30    32.84%   over 3 times faster
add   =   1737.99cc   2094.31   +356.32     120.50%   slower :(
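For reference, the "dif" and "%" columns above are just the difference and the ratio of the two cycle counts. A small Python sketch that recomputes them from the figures in the table (the dictionary layout and function name are only for illustration):

```python
# (TI-OS cycles, z80float cycles), copied from the table above
timings = {
    "ln":    (131547.46, 165000.0),
    "atan":  (173317.82, 174000.0),
    "atanh": (175320.91, 174000.0),
    "sqrt":  (77699.51, 6540.79),
    "mul":   (30229.53, 9928.23),
    "add":   (1737.99, 2094.31),
}

def compare(ti_os, z80float):
    """Return (difference in cycles, z80float time as a percentage of TI-OS time)."""
    return z80float - ti_os, 100.0 * z80float / ti_os

for name, (ti_os, z80float) in timings.items():
    dif, pct = compare(ti_os, z80float)
    print(f"{name:5s} {dif:+12.2f} {pct:8.2f}%")
```

A percentage below 100 means z80float is faster; for example, 8.42% for sqrt corresponds to roughly a 12x speedup.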

Also, have a recent screenshot:

Keep in mind that I'm displaying one extra digit beyond the accuracy of these floats, so the last digit is useless. In the digit before that, the error is less than 2.0!

258

ASM / Re: [z80] Floating Point Routines

« on: December 15, 2018, 11:52:40 pm »

So I added in addition/subtraction and float->str. I haven't calculated timings for add/sub or float->str, but about 1500cc and 90000cc, respectively, seem like reasonable guesses. The conversion introduces error in the last digits, so I should only return about 16 digits. At the moment, the only formatting that truncates to 16 digits max is when the exponent is too high in magnitude (e.g. 1.234567890123456e-9).

Anyways, you can git the source on GitHub, but here are some ugly screenshots attached with evident rounding issues.

EDIT: It turns out I computed the 10^-(2^k) table with lower precision. Now that it is fixed, numbers are being displayed with a bit better precision. I am currently working on implementing the B-G algorithm.
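To show what a 10^-(2^k) table is good for, here is a rough Python sketch of scaling a positive float into the range [1, 10) while tracking the decimal exponent, using only a handful of multiplications. This is a generic version of the technique, not a transcription of the z80 code. The names and the table size (k = 0..5) are assumptions for the example.

```python
# Tables of 10^(2^k) and 10^-(2^k) for k = 0..5
POW10 = [10.0 ** (2 ** k) for k in range(6)]
INV_POW10 = [10.0 ** -(2 ** k) for k in range(6)]

def to_decimal_scale(x):
    """Return (m, e) with x ~= m * 10**e and m roughly in [1, 10), for x > 0."""
    m, e = x, 0
    for k in reversed(range(6)):
        if m >= POW10[k]:             # too big: divide out 10^(2^k)
            m *= INV_POW10[k]
            e += 2 ** k
        elif m * POW10[k] < 10.0:     # too small: multiply in 10^(2^k)
            m *= POW10[k]
            e -= 2 ** k
    return m, e

print(to_decimal_scale(1234.5))    # mantissa near 1.2345, exponent 3
print(to_decimal_scale(0.00042))   # mantissa near 4.2, exponent -4
```

Every table entry that gets used is multiplied straight into the mantissa, so any error in the table ends up in the printed digits. That is consistent with the display improving once the table was recomputed at full precision.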

259

ASM / Re: Custom APD Help

« on: December 13, 2018, 01:43:10 pm »

Oh, awesome! So it worked?

260

ASM / Re: Custom APD Help

« on: December 12, 2018, 10:51:49 pm »

One quick thing I notice is that in the APD routine, you halt, but interrupts aren't enabled. As well, it looks like registers will get clobbered.
I'm too lazy to check at the moment, but try this modification:

Code:

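APD:
 ld hl, (APDCounter)   ; load the 16-bit countdown
 dec hl
 ld a, h
 or l                  ; Z is set only when HL has reached 0
 ld (APDCounter), hl   ; store the new count (ld does not affect flags)
 ret nz                ; not expired yet, so return
 ld hl, 1800
 ld (APDCounter), hl   ; reset the countdown for next time
 xor a
 out (30h), a
 ld a, 1
 out (03h), a
 exx                   ; swap to the shadow register set while halted
 ex af,af'
 ei                    ; interrupts must be enabled or halt never wakes up
 halt
 exx                   ; swap back
 ex af,af'
 xor a
 out (03h), a
 jq initialize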

261

ASM / Re: [z80] Floating Point Routines

« on: December 12, 2018, 10:37:30 pm »

Okay, with a lot of discussion with Runer112 over IRC/Discord, and a lot of coding, here is an update!
First, it seems like there might be an accuracy issue with the lower bits of xdiv, but that could just be an issue from truncating the input.

xsqrt is averaging ~9183.443cc
xmul is averaging ~9925.527cc
xdiv is averaging ~11107.370cc

Comparing to the old set of routines from a few years ago, division is now almost exactly 40% faster, multiplication is still about 8.5% faster, and square roots are three times faster.

Comparing to the OS routines, multiply is about 3.59 times faster, divide is about 3.65 times faster, and square root is about 9.45 times faster.

Pretty much the one thing the OS has better is converting floats to strings, which the OS, using BCD floats, is wayyyyy faster for. I project that converting to a string could take on average 50000cc with my method, versus maybe 600cc for a TI float (assuming no special formatting).

EDIT: I'm still not good with GitHub, so I'm going to try to get this to work, but I make no promises. z80float

262

The Axe Parser Project / Re: Maps with Axe?

« on: December 12, 2018, 04:46:07 pm »

Cool, I'm glad you figured it out!

263

ASM / Re: [z80] Floating Point Routines

« on: December 10, 2018, 11:45:55 pm »

As an update and backup to Omni, I have been working on some of the extended precision routines. At this moment, I've rewritten multiplication and square roots, and I am about to work on division.

The square root routine is now averaging 9183.443cc, a full three times faster than the previous implementation (and over 9 times faster than the TI-OS float algorithm)!

The multiplication routine is now averaging 9925.527cc, a more humble 8.5% improvement (still ~3.5 times faster than the TI-OS float multiplication). A good chunk of the speed improvement comes from a slightly faster 16-bit multiply routine from Runer112, which also has much nicer/more useful output registers.
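For context on why a faster 16-bit multiply matters so much: a wide mantissa product can be built out of many 16x16 -> 32-bit partial products, so any savings in that primitive is multiplied many times over. A schoolbook Python sketch of the idea, assuming a 64-bit mantissa split into four 16-bit limbs (the real routine's layout, and whether it skips or truncates low partial products, is not shown here):

```python
def mul_limbs(x, y, limbs=4):
    """Multiply two (16*limbs)-bit integers using only 16x16-bit products."""
    mask = 0xFFFF
    xs = [(x >> (16 * i)) & mask for i in range(limbs)]   # little-endian limbs
    ys = [(y >> (16 * i)) & mask for i in range(limbs)]
    acc = 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            acc += (xi * yj) << (16 * (i + j))            # one 16x16 multiply each
    return acc

a, b = 0x123456789ABCDEF0, 0xFEDCBA9876543210
print(hex(mul_limbs(a, b)))
```

With four limbs per operand that is 16 calls to the 16-bit multiply per product, before any shortcuts.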

I had an issue with the new square root algorithm, so for now I have patched it up so that it works, but it's about 1000cc slower than necessary. In the process of implementing this new algorithm, I rewrote some faster division routines and I might be able to get division down below 13000cc, a 30+% speed improvement (3 times faster than the OS routine).

264

News / Re: POTY 2018 !

« on: December 04, 2018, 01:42:56 pm »

Okay, added screenshots, thanks!

265

News / POTY 2018 !

« on: December 04, 2018, 11:37:42 am »

TICalc.org's Program Of The Year polls are open!
For the TI-83+/84+ category, it looks like a bunch of older projects split between two authors-- myself and squidgetx (ticalc profile).

Up for the vote are the following programs (in the order listed on ticalc.org).

Batlib (ticalc link)
Batlib is a huge library for BASIC programmers. It has 120+ functions mostly for graphics and string and data manipulation, list and matrix manipulation, compression, and much more.

CopyProg (ticalc link)
This is a small program that allows users to copy variables from RAM or Archive to another variable. CopyProg2 has a few other features as well (like line reading and reading the names of the variables on your calc). It allows you to do things like copy an archived appvar to a temp program for execution.

Embers (ticalc link)
Embers is an ARPG that won Omnimaga Contest 2012. This was a really well put together game. It features good graphics, good AI, and storyline.

FloatLib (ticalc link)
Floatlib is an app that holds many single-precision float routines from addition to hyperbolic tangent. It comes with a built-in reference manual and there are some examples like computing the Mandelbrot Set.

Gravity Guy (ticalc link)
Gravity Guy is "a port/variation of the popular iphone/flash game" of the same name. You basically get to flip the direction of gravity to help navigate obstacles.

LblRW (ticalc link)
LblRW is a small utility for BASIC programmers that lets you read or modify data within the BASIC program, using a label offset. That's a mouthful. Basically, "Hey, I want to store player data in my RPG, let's store it after Lbl PD." Or you can store monster data for quick access, for example.

(no screenshot, sorry)

StickNinja (ticalc link)
StickNinja was an Omnimaga Contest 2011 entry that earned 3rd place. It's basically a platformer with awesome Stick Figure and Ninja graphics. Collect coins, destroy enemies; it's got it all.

266

ASM / Re: Grayscale Help

« on: November 30, 2018, 12:52:54 pm »

If you are a masochist, then technically no, as you can type in the hexadecimal opcodes directly on your calc. For everyone else, you can use a program that compiles a text file to a binary. The calculators have two main compilers -- Mimas and asmdream (I'm on mobile, otherwise I would look for the links). On a computer, I prefer spasm-ng, but there is Brass as well.
Personally, I just use a text editor, save my file with the .asm or .z80 extension (not really important), and then compile that to a .8xp.

267

ASM / Re: Grayscale Help

« on: November 24, 2018, 07:49:01 pm »

Oh, I'm glad you figured it out! That was the first mistake I had to fix in my first implementation.

268

Grammer / Re: Grammer 2-The APP

« on: November 23, 2018, 12:50:37 am »

I'm too tired for a full report, but I'll use this as a backup.
I fixed a few bugs introduced in the last version, optimized and cleaned up some more code, and most importantly, I totally overhauled the module system. I renamed the token to just '$'. By storing to it, you can basically register a module to be searched after the default one is searched. At the moment, up to 5 additional modules can be used, which could greatly extend the functionality of Grammer. They can be archived now, and I moved the Menu routine to an external module (appv Grampkg). I still need to do a lot of documentation, but not tonight. Attached is a screenshot of what the code looks like (with the token hook). Good night y'all.

269

Grammer / Re: Grammer 2-The APP

« on: November 19, 2018, 03:44:40 pm »

Code:

13 Nov. 2018
  - Cleaning up code, removing (at least temporarily)
    routines that aren't vital, or useful.
      - Fire graphics code.     **Probably temporary.
      - Factoring code.         **Probably permanent.
  - Optimized CopyHex
  - ConvHexStr is reorganized to be smaller
14 Nov. 2018
  - Optimized ConvOP1 to be smaller, updated performance
    analysis
  - Optimized GetPixelLoc 3 bytes smaller, 10cc faster
  - Optimized ReadArc routine. 3 bytes smaller, 18cc
    faster for archived data, 3cc slower for data in RAM.
  - Removed LoadTSA as the only internal usage was to load
    the ReadArc routine. Instead it is a specialized routine
    that no longer destroys IX. Net savings of 11 bytes,
    even after extending the mov9 LDI chain to a mov13. Saves
    172cc overall (186cc, actually, since no more ld ix,**)
  - Error also takes advantage of mov13, saving 2 bytes.
  - Added a few more fixed-size moves, including mov768.
    Total cost was 6 bytes.
  - ClrHome uses the faster SetSmallMem, saves a byte.
    Makes it 901cc faster, roughly 21% faster
  - I removed the unknown routine I labeled "lbl000", as it
    isn't used anywhere (or shouldn't be!) It looks like an
    attempt at making an off-page call, probably when the
    low mem scared me.
  - Optimized and fixed IsHexTok. It used to accept the
    ' and ' token as equivalent to 9. Saved a byte and 2cc
    when the token was 0~9.
  - Optimized DE_Times_BC. No change in size, 120cc faster
    in the average case. No longer leaves A=0.
  - Optimized HL_Div_BC to be 264cc faster on average, with
    a net cost of five bytes. DE_Div_BC is now the
    subroutine, though, and is 272cc faster than if you had
    called it previously. Nearly 18% speed up
  - Fixed SearchString at a cost of 8 bytes, but should
    perform roughly 4 times faster. Also, there is now no
    risk of it entering an infinite loop, an issue the
    previous routine had.
  - SqrtHL is optimized. Actually replaced with SqrtDE.
    Saved 2 bytes, on average 261cc faster (20.17% faster).
    Worst case is still 165cc faster (12.75% faster).
15 Nov. 2018
  - I replaced the Sqrt routine with Runer112's from Axe.
    It is 221cc faster with the small modifications on my
    part to fit the output registers, and 2 bytes smaller.
    That's roughly 21.5% faster.
  - Removed ConvNumBase and HL_Div_C. HL_Div_C was only
    used by ConvNumBase, and ConvNumBase wasn't used
    anywhere in the code.
  - Changed Is_2_Byte. It's no faster or slower, just
    a little more sensible and readable.
  - Removed HexTok and GetHexAtDE.
  - Moved CompatCall so it didn't have to JP to
    IsOP1GrammerProg, saving 3 bytes and 10cc
  - Removed EndHook2 as it appears unused?
  - Optimized ONErr to be 1 byte less, 2cc faster.
  - Optimized TileMap1. 21cc faster, 2 bytes smaller.
  - Removed HL_SDiv_BC replacing the only use of it
    with a wrapper around a call to HL_Div_BC.
    Net 21 bytes smaller. Signed division command now
    averages about 47cc faster.
  - Removed PutIM, ParseFullArgI, CallI, CopyZStr,
    CreateZVar, FindVar.
  - Renamed memory addresses in the Menu command.
    May have messed something up.
  - vPutscr is 1 byte smaller, 3cc faster.
  - Optimized DrawRectToGraphI since it didn't need
    to preserve registers. Saved 9 bytes.
19 Nov. 2018
  - Did some testing and fixed some new bugs.
  - Fixed LoadReadArc. Needed 6 more bytes, saves
    another 76cc.
  - Opted to use interrupts for the Pause routine. It
    isn't as close to 1/100 seconds, but it is more
    energy efficient, smaller, and more reliable.

No screenshots as it's just behind-the-scenes code modifications.
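Since several of the entries above are about 16-bit integer square roots, here is the classic bit-by-bit method as a Python sketch, for anyone curious what such a routine computes. This is the generic textbook algorithm, not Runer112's routine or the old SqrtHL/SqrtDE code, and the name `isqrt16` is made up.

```python
import math

def isqrt16(n):
    """Floor of the square root of a 16-bit unsigned integer, one result bit per step."""
    assert 0 <= n < 1 << 16
    root = 0
    bit = 1 << 14                 # highest power of 4 that fits in 16 bits
    while bit:
        if n >= root + bit:       # does this bit belong in the result?
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root

print(isqrt16(65535), math.isqrt(65535))
```

The loop uses only compares, subtracts, and shifts, which is why this style of algorithm maps so well onto the z80.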

270

HP Calculators / Re: HP Prime Emulator

« on: November 17, 2018, 10:35:44 pm »

Oh, nice work! I wish I knew the HP Prime better so that I could offer some real input
