Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Topics - Xeda112358

Pages: [1] 2 3 ... 12
1
TI Z80 / Hash-Based Instruction Lookup
« on: November 04, 2017, 11:24:57 am »
So I was working on some code to take an alphanumeric input string and look it up in a table and do something. After rehashing a binary search algorithm for the nth time, I remembered an old idea that I had for finding a string in set of data.
Spoiler For Spoiler:
I could basically speed up the search by xor-ing each byte of the search string and using that value is an indicator of a potential match. If the input is n bytes long, then I xor the first n starting bytes of the data to search. If the XORed value matches, double check that you haven't already found a match! Otherwise, set variables tail=data_start and head=data_start+n and x as the XORed value of the input, and acc as the XORed value of the first n bytes of the data. Now XOR the byte at tail with acc and store it back to acc. Increment tail, increment head. XOR the byte at head with acc and store it back to acc. If acc==x then double check the string between tail and head isn't a match, otherwise, continue until head goes beyond the data.
After some poking around on Google, I came across a hash based search algorithm that was just the insight that I needed!

So I came up with some sample commands and I was lucky enough to find a simple hashing function that provided a unique value for all 46 functions.

I'm going to provide the code here, even though it isn't very pretty. This way if I ever lose it, I'll have this as reference :)
This example executes the command "Disp" which I just made display everything after it.
Code: [Select]
#include "grammer3.inc"

_DispHL = 4507h
_NewLine = 452Eh
_PutS   = 450Ah
#define bcall(x) rst 28h \ .dw x
scrap = 8000h
denom = scrap
numer = scrap+4
out   = scrap+8
.db $BB,$6D
.org $9D95
    ld hl,testinput
    ld bc,testinput_end-testinput
    call hashlookup
    jr c,+_
    ld hl,s_NOTFOUND
    bcall(_PutS)
    ret
_:
    inc de \ inc de ; Now DE points to the pointer to the parsing code
    ex de,hl
    ld a,(hl) \ inc hl \ ld h,(hl) \ ld l,a
    ex de,hl
    cpi
    ret po
    ;HL points to the input, BC is the size left
    ;DE points to the code that parses the instruction
    push de
    ret
testinput:
    .db "Disp Yay, it works!",0
testinput_end:
hashlookup:
;HL points to the input
;BC is the max size
;return nc if failed to match, c if success
    ld (input_save),hl
    ld (input_savesize),bc
    ld a,b
    or c
    jr z,match_null
    call computehash
    ld hl,(input_savesize)
    xor a
    sbc hl,bc
    jr z,match_fail
    ld b,h
    ld c,l
    ld d,a
    ex de,hl
    add hl,hl
    ld (hash),hl


    ld de,hashlut_builtin
    add hl,de
    ld e,(hl)
    inc hl
    ld d,(hl)

    ld hl,(input_save)
;BC is the input size
;HL points to the input string
;DE points to the comparison
    ld a,(de)
    cp c
    jr nz,match_fail
    inc de
    ld b,c

_:
    ld a,(de)
    inc de
    cp (hl)
    jr nz,match_fail
    inc hl
    djnz -_
    scf
    ret
match_null:
    ld de,t_null+1
match_fail:
    ld hl,(input_save)
    ld bc,(input_savesize)
    or a
    ret
computehash:
    ld e,0
_:
    ld a,(hl)
    sub 48
    ret c
    cp 10
    jr c,$+5
    sub 7
    ret c
    cp 68
    ret nc
    ld d,a
    add a,a
    add a,d
    xor e
    ld e,a
    cpi
    jp pe,-_
    ret
   
   
hashlut_builtin:
.dw t_null              ;00
.dw t_null              ;01
.dw t_null              ;02
.dw t_null              ;03
.dw t_null              ;04
.dw t_null              ;05
.dw t_null              ;06
.dw t_null              ;07
.dw t_null              ;08
.dw t_null              ;09
.dw t_null              ;0a
.dw t_cosh              ;0b
.dw t_null              ;0c
.dw t_null              ;0d
.dw t_null              ;0e
.dw t_null              ;0f
.dw t_null              ;10
.dw t_null              ;11
.dw t_atan              ;12
.dw t_null              ;13
.dw t_sinh              ;14
.dw t_null              ;15
.dw t_null              ;16
.dw t_null              ;17
.dw t_null              ;18
.dw t_null              ;19
.dw t_null              ;1a
.dw t_null              ;1b
.dw t_sqrt              ;1c
.dw t_null              ;1d
.dw t_null              ;1e
.dw t_max               ;1f
.dw t_null              ;20
.dw t_null              ;21
.dw t_ClrDraw           ;22
.dw t_null              ;23
.dw t_null              ;24
.dw t_null              ;25
.dw t_null              ;26
.dw t_null              ;27
.dw t_null              ;28
.dw t_Ellipse           ;29
.dw t_null              ;2a
.dw t_null              ;2b
.dw t_null              ;2c
.dw t_null              ;2d
.dw t_null              ;2e
.dw t_null              ;2f
.dw t_null              ;30
.dw t_null              ;31
.dw t_null              ;32
.dw t_null              ;33
.dw t_null              ;34
.dw t_null              ;35
.dw t_null              ;36
.dw t_null              ;37
.dw t_null              ;38
.dw t_null              ;39
.dw t_ln                ;3a
.dw t_null              ;3b
.dw t_null              ;3c
.dw t_pi                ;3d
.dw t_null              ;3e
.dw t_null              ;3f
.dw t_null              ;40
.dw t_null              ;41
.dw t_null              ;42
.dw t_null              ;43
.dw t_null              ;44
.dw t_null              ;45
.dw t_null              ;46
.dw t_null              ;47
.dw t_null              ;48
.dw t_null              ;49
.dw t_null              ;4a
.dw t_null              ;4b
.dw t_null              ;4c
.dw t_null              ;4d
.dw t_null              ;4e
.dw t_null              ;4f
.dw t_null              ;50
.dw t_null              ;51
.dw t_null              ;52
.dw t_null              ;53
.dw t_null              ;54
.dw t_null              ;55
.dw t_null              ;56
.dw t_null              ;57
.dw t_null              ;58
.dw t_null              ;59
.dw t_null              ;5a
.dw t_null              ;5b
.dw t_null              ;5c
.dw t_null              ;5d
.dw t_null              ;5e
.dw t_null              ;5f
.dw t_null              ;60
.dw t_null              ;61
.dw t_null              ;62
.dw t_null              ;63
.dw t_null              ;64
.dw t_null              ;65
.dw t_null              ;66
.dw t_null              ;67
.dw t_null              ;68
.dw t_randint           ;69
.dw t_asinh             ;6a
.dw t_null              ;6b
.dw t_tan               ;6c
.dw t_null              ;6d
.dw t_null              ;6e
.dw t_null              ;6f
.dw t_null              ;70
.dw t_null              ;71
.dw t_null              ;72
.dw t_null              ;73
.dw t_null              ;74
.dw t_acosh             ;75
.dw t_null              ;76
.dw t_null              ;77
.dw t_null              ;78
.dw t_null              ;79
.dw t_null              ;7a
.dw t_null              ;7b
.dw t_null              ;7c
.dw t_null              ;7d
.dw t_null              ;7e
.dw t_null              ;7f
.dw t_null              ;80
.dw t_atanh             ;81
.dw t_null              ;82
.dw t_null              ;83
.dw t_null              ;84
.dw t_null              ;85
.dw t_Line              ;86
.dw t_sin               ;87
.dw t_null              ;88
.dw t_null              ;89
.dw t_e                 ;8a
.dw t_null              ;8b
.dw t_null              ;8c
.dw t_mod               ;8d
.dw t_null              ;8e
.dw t_null              ;8f
.dw t_null              ;90
.dw t_min               ;91
.dw t_Circle            ;92
.dw t_gcd               ;93
.dw t_null              ;94
.dw t_null              ;95
.dw t_null              ;96
.dw t_null              ;97
.dw t_cos               ;98
.dw t_null              ;99
.dw t_null              ;9a
.dw t_null              ;9b
.dw t_null              ;9c
.dw t_null              ;9d
.dw t_null              ;9e
.dw t_null              ;9f
.dw t_null              ;a0
.dw t_log2              ;a1
.dw t_null              ;a2
.dw t_Tilemap           ;a3
.dw t_log10             ;a4
.dw t_null              ;a5
.dw t_null              ;a6
.dw t_null              ;a7
.dw t_null              ;a8
.dw t_Text              ;a9
.dw t_null              ;aa
.dw t_null              ;ab
.dw t_null              ;ac
.dw t_null              ;ad
.dw t_Disp              ;ae
.dw t_null              ;af
.dw t_null              ;b0
.dw t_null              ;b1
.dw t_null              ;b2
.dw t_null              ;b3
.dw t_null              ;b4
.dw t_null              ;b5
.dw t_null              ;b6
.dw t_null              ;b7
.dw t_DispBuf           ;b8
.dw t_lcm               ;b9
.dw t_setseed           ;ba
.dw t_null              ;bb
.dw t_null              ;bc
.dw t_null              ;bd
.dw t_null              ;be
.dw t_null              ;bf
.dw t_pow10             ;c0
.dw t_null              ;c1
.dw t_null              ;c2
.dw t_null              ;c3
.dw t_null              ;c4
.dw t_pow2              ;c5
.dw t_null              ;c6
.dw t_null              ;c7
.dw t_null              ;c8
.dw t_null              ;c9
.dw t_null              ;ca
.dw t_null              ;cb
.dw t_null              ;cc
.dw t_null              ;cd
.dw t_null              ;ce
.dw t_null              ;cf
.dw t_null              ;d0
.dw t_null              ;d1
.dw t_null              ;d2
.dw t_null              ;d3
.dw t_null              ;d4
.dw t_null              ;d5
.dw t_null              ;d6
.dw t_null              ;d7
.dw t_null              ;d8
.dw t_null              ;d9
.dw t_null              ;da
.dw t_null              ;db
.dw t_null              ;dc
.dw t_Shiftbuf          ;dd
.dw t_randseed          ;de
.dw t_null              ;df
.dw t_null              ;e0
.dw t_null              ;e1
.dw t_exp               ;e2
.dw t_Output            ;e3
.dw t_null              ;e4
.dw t_Sprite            ;e5
.dw t_acos              ;e6
.dw t_null              ;e7
.dw t_Rect              ;e8
.dw t_null              ;e9
.dw t_null              ;ea
.dw t_null              ;eb
.dw t_null              ;ec
.dw t_rand              ;ed
.dw t_null              ;ee
.dw t_null              ;ef
.dw t_null              ;f0
.dw t_null              ;f1
.dw t_null              ;f2
.dw t_null              ;f3
.dw t_null              ;f4
.dw t_null              ;f5
.dw t_null              ;f6
.dw t_null              ;f7
.dw t_null              ;f8
.dw t_asin              ;f9
.dw t_null              ;fa
.dw t_null              ;fb
.dw t_null              ;fc
.dw t_null              ;fd
.dw t_null              ;fe
.dw t_tanh              ;ff
t_null:
.db 0
t_cosh:
.db 4,"cosh"
#include "commands\cosh.z80"
t_atan:
.db 4,"atan"
#include "commands\atan.z80"
t_sinh:
.db 4,"sinh"
#include "commands\sinh.z80"
t_sqrt:
.db 4,"sqrt"
#include "commands\sqrt.z80"
t_max:
.db 3,"max"
#include "commands\max.z80"
t_ClrDraw:
.db 7,"ClrDraw"
#include "commands\ClrDraw.z80"
t_Ellipse:
.db 7,"Ellipse"
#include "commands\Ellipse.z80"
t_ln:
.db 2,"ln"
#include "commands\ln.z80"
t_pi:
.db 2,"pi"
#include "commands\pi.z80"
t_randint:
.db 7,"randint"
#include "commands\randint.z80"
t_asinh:
.db 5,"asinh"
#include "commands\asinh.z80"
t_tan:
.db 3,"tan"
#include "commands\tan.z80"
t_acosh:
.db 5,"acosh"
#include "commands\acosh.z80"
t_atanh:
.db 5,"atanh"
#include "commands\atanh.z80"
t_Line:
.db 4,"Line"
#include "commands\Line.z80"
t_sin:
.db 3,"sin"
#include "commands\sin.z80"
t_e:
.db 1,"e"
#include "commands\e.z80"
t_mod:
.db 3,"mod"
#include "commands\mod.z80"
t_min:
.db 3,"min"
#include "commands\min.z80"
t_Circle:
.db 6,"Circle"
#include "commands\Circle.z80"
t_gcd:
.db 3,"gcd"
#include "commands\gcd.z80"
t_cos:
.db 3,"cos"
#include "commands\cos.z80"
t_log2:
.db 4,"log2"
#include "commands\log2.z80"
t_Tilemap:
.db 7,"Tilemap"
#include "commands\Tilemap.z80"
t_log10:
.db 5,"log10"
#include "commands\log10.z80"
t_Text:
.db 4,"Text"
#include "commands\Text.z80"
t_Disp:
.db 4,"Disp"
#include "commands\Disp.z80"
t_DispBuf:
.db 7,"DispBuf"
#include "commands\DispBuf.z80"
t_lcm:
.db 3,"lcm"
#include "commands\lcm.z80"
t_setseed:
.db 7,"setseed"
#include "commands\setseed.z80"
t_pow10:
.db 5,"pow10"
#include "commands\pow10.z80"
t_pow2:
.db 4,"pow2"
#include "commands\pow2.z80"
t_Shiftbuf:
.db 8,"Shiftbuf"
#include "commands\Shiftbuf.z80"
t_randseed:
.db 8,"randseed"
#include "commands\randseed.z80"
t_exp:
.db 3,"exp"
#include "commands\exp.z80"
t_Output:
.db 6,"Output"
#include "commands\Output.z80"
t_Sprite:
.db 6,"Sprite"
#include "commands\Sprite.z80"
t_acos:
.db 4,"acos"
#include "commands\acos.z80"
t_Rect:
.db 4,"Rect"
#include "commands\Rect.z80"
t_rand:
.db 4,"rand"
#include "commands\rand.z80"
t_asin:
.db 4,"asin"
#include "commands\asin.z80"
t_tanh:
.db 4,"tanh"
#include "commands\tanh.z80"

poparg:
    ret
tostr:
    ret
s_NOTFOUND:
    .db "Command Not Found",0
input_save:
    .dw 0
input_savesize:
    .dw 0
hash:
    .dw 0
The code for the Disp.z80 file, already convoluted for your viewing pleasure :P
Code: [Select]
    .dw __tokenizeDisp  ;First code generates the token, returns a pointer to the name in HL, size of the name in BC
    .dw __parsecodeDisp
    .dw __compiledcodeDisp
    .dw __helpDisp

__tokenizeDisp:
    ld hl,+_
    ld bc,1
    ret
_:
    .db tokGraphics+0
__helpDisp:
    .db 0
__compiledcodeDisp:
    .db _poparg
    .db _tostr
    .db _inlineasm \ .dw +_-$-2
    bcall(_PutS)
    ret
_:
__parsecodeDisp:
;    call poparg
;    call tostr
    bcall(_PutS)
    bcall(_NewLine)
    ret
Edit:
Oops, and the grammer3.inc file (don't get your hopes up, it's just a test :P).
Code: [Select]
tokVars     = $00   ;allows for 32 variable types (int, str, float, etc.)
tokGraphics = $20   ;allows for 32 basic graphics commands (plot(), Pxl(), Rect(), etc.)
tokControl  = $40   ;allows for 32 control commands (If, Goto, etc.)
tokMath     = $60   ;allows for 128 basic math commands
tokExtend   = $E0
tokInclude  = $F0

var_uint8   = 0
var_int8    = 1
var_uint16  = 2
var_int16   = 3
var_uint32  = 4
var_int32   = 5
var_uint64  = 6
var_int64   = 7
var_float   = 8
var_floatext= 9     ;extended precision float
var_fixed88 = 10    ;8.8 fixed point number, optimized for speed over var_fixed
var_fixed1616=11    ;16.16 fixed point number, optimized for speed over var_fixed
var_fixed   = 12    ;customized size for integer and fractional part, up to 64 bits total
var_rational= 13    ;customized size for the numerator and denominator of: int8, int16, int32, and int64. Denominator is always unsigned.
var_complex = 14    ;complex pair of floats.
var_complexext=15   ;complex pair of extended precision floats.
var_gaussian= 16    ;complex pair of integers. Customized size for the real and imaginary parts: int8, int16, int32, and int64.
var_gaussian8=17
var_gaussian16=18


var_sprite  = 28
var_array   = 29
var_list    = 30
var_str     = 31


_poparg     = 0
_pusharg    = 1
_tostr      = 2
_inlineasm  = 3

2
Math and Science / A faster Newton's Method Square Root
« on: October 10, 2017, 02:04:50 pm »
I decided to compute some square roots by hand to keep my tired self awake and alert at work. I opted to use Newton's Method this time and I noticed an optimization!
Instead of attaining 2N precision with a 2N-by-N->2N division, I split it up into an N-by-N multiply and a N-by-N->N division, which on the Z80 or similar processor can be nearly twice as fast.


The standard algorithm works as follows:

To find the square root of c, start with a guess called x.
Iterate the following: (x+c/x)/2


Each iteration doubles the digits of precision; try it on your calculator if you aren't familiar with it! To fin the square root of 21, guess '5'. Type 5, hit [Enter]. Then do .5(Ans+21/Ans and hit [Enter] a few times.

The way it works is, if the current x is bigger than the square root of c, then c/x is less, and vice-versa. So averaging the two gets a closer estimate to the actual square root. My observation was that if I had, say, 3 digits precision, then those digits stay the same in the next iteration. Through some observations, this would mean the first 3 digits of c/x will match that of x. In fact, for c/x, I'll only need to compute it to 6 digits, but if I can skip the first 3, then the division is easier and faster! So, lets optimize:


=>.5(x+c/x)
=>.5(x+x+(c/x-x))
=>x+.5(c-x*x)/x

So it's probably not so clear, but essentially for 2n-digits precision, I need to square an n-digit number and divide an n-digit number by n digits, to n digits precision. This replaces a 2n/2n division.

The former is faster, so let's try an example of finding the square root of 21:

c=21
Guess: x=5
(c-x*x) = -4
-4/5/2 = -.4
x-.4 = 4.6

(c-x*x) = 21-4.6*4.6 = -.16
-.16/4.6/2 = -.0173913... ~.02
x-.02 = 4.58

(c-x*x) = 21-4.58*4.58 = .0236
.0236/4.58/2 = .0025764... ~.0026
x+.0026 = 4.5826

(c-x*x) = 21-4.5826*4.5826 = -.00022276
-.00022276/4.5826/2 = -.000024304979... ~-.000024305
x-.000024305 = 4.582575695



In practice by hand, it actually doesn't save much time, but that's because by hand I usually can perform a 2n-by-2n division a little slower than half that of a 2n-by-2n and I can do multiplication and division in roughly the same speed. So overall This gets me about 20% faster.

On the Z80, a 2N-by-N->2N division takes roughly the time of 1.6 2N-by-2N->4N multiplies. As well, a 2N-by-2N->4N multiply takes roughly 3.5 times the time of an N-by-N->2N multiply using recursive Karatsuba multiplication.

So the standard algorithm takes roughly 5.6 N-by-N->2N multiplies worth of time.
The modified algorithm takes roughly 2.9 N-by-N->2N multiplies worth of time.

3
TI Z80 / Shunting-Yard Algorithm
« on: September 27, 2017, 08:37:33 pm »
So I have been really wanting to implement the Shunting-Yard algorithm for a few years now and I think I've got it in assembly. I keep finding bugs, but in case other people are interested, here is my code. It assumes that the operator stack and output (combined) won't exceed 768 bytes.

Things that would be useful:
  • Converting numbers to an intermediate format.
  • Handling multi-byte tokens and named vars and functions.
This way the code can be pseudo-compiled when it is converted to RPN, then an RPN calculator can parse the code faster or prep it for actual compiling.
Code: [Select]
#define bcall(x) rst 28h \ .dw x
saveSScreen = 86ECh
scrap=saveSScreen
outhead=8000h
stackhead = 8002h
.db $BB,$6D
.org $9D95
    ld hl,test
    ld bc,test_end-test
    call shuntingyard
    call rpn
    ld hl,scrap
    bcall(450Ah)
    bcall(452Eh)
    ret
shuntingyard:
    ld de,scrap
    ld (outhead),de
    ld d,(scrap/256)+3
    ld (stackhead),de
_:
    ld a,(hl)
    call +_
    cpi
    jp pe,-_
    ld hl,scrap+768
    ld de,(stackhead)
    or a
    sbc hl,de
    ld b,h
    ld c,l
    ld hl,(outhead)
    ex de,hl
    jr z,$+3
    ldir
    dec de
    xor a
    ld (de),a
    ret
_:
    cp '.'
    jp z,num_dec
    cp 30h
    jr c,+_
    cp 3Ah
    jp c,num
_:
    cp '('
    jp nz,+_
    ex de,hl
    ld hl,(stackhead)
    dec hl
    ld (hl),','
    dec hl
    ld (hl),a
    ld (stackhead),hl
    ex de,hl
    ret
_:
    cp ')'
    jp nz,checkunops
    push hl
    push bc
    ld hl,scrap+768
    ld de,(stackhead)
    sbc hl,de
    jp z,ERR_Unmatched_lparens
    ld b,h
    ld c,l
    ex de,hl
    ld de,(outhead)
;BC is the size of the stack. Use this in case there is a missing ')' so we don't read garbage.
;basically search for the matching '(' while piping out the stack to the output.
outerloop:
    ld a,(hl)
    cp '('
    jr z,parens_found
    ld a,','
_:
    cp (hl)
    ldi
    jp po,ERR_Unmatched_lparens
    jr z,outerloop
    jp -_
parens_found:
    inc hl
    inc hl
    ld (outhead),de
    ld (stackhead),hl
    pop bc
    pop hl
    ret
checkunops:
checkbinops:
;; if the token is an operator, then:
;; while there is an operator at the top of the operator stack with
;; greater than or equal to precedence and the operator is left associative:
;; pop operators from the operator stack, onto the output queue.
;; push the read operator onto the operator stack.
;;
;;
    push bc
    ex de,hl
    call getprecedence
    ld a,c
    pop bc
    ex de,hl
    jp c,search_function
    ;now C is the precedence, with lower bit = 1 if left-associative
    push bc
    push hl
    ld de,(stackhead)
    ld hl,scrap+768
    sbc hl,de
    ld b,h
    ld c,l
    ld hl,(outhead)
    ex de,hl
    jr z,pushop
    ;a is the precedence against which to compare
_:
    push hl
    push bc
    push af
    ld a,(hl)
    call getprecedence
    jr c,+_
    pop hl
    ld a,h      ;incoming
    cp c
    jr nz,$+4
    rra \ nop

    pop bc
    pop hl
   
 ;======================================================
    jr nc,pushop
.echo "The following code only works until we have to add >1 byte tokens."
 ldi
 ldi
    jp pe,-_
    jp $+6
_:
    pop af
    pop bc
    pop hl
pushop:
    ld (outhead),de
    pop de
    dec hl
    ld (hl),','
    dec hl
    ld a,(de)
    ld (stackhead),hl
    ld (hl),a
    ex de,hl
    pop bc
    ret
search_function:
    jp ERR_Func_Not_Found
getprecedence:
    ld hl,binops
    ld b,(binops_end-binops)/2
_:
    cp (hl)
    inc hl
    ld c,(hl)
    ret z
    inc hl
    djnz -_
    scf
    ret
binops:
    .db 4,  $01
    .db '=',$50
    .db '|',$60
    .db '&',$70
    .db '-',$81     ;right associative is odd
    .db '+',$80     ;left associative is even
    .db '/',$83     ;right associative
    .db '*',$82     ;left associative
    .db '^',$85     ;right associative
binops_end:
num:
    ld de,(outhead)
_:
    ldi
    jp po,+_
    ld a,(hl)
    cp '.'
    jr z,num_dec+4
    cp 30h
    jr c,+_
    cp 3Ah
    jr c,-_
_:
    ld a,','
    ld (de),a
    inc de
    ld (outhead),de
    dec hl
    inc bc
    ret
num_dec:
    ld de,(outhead)
_:
    ldi
    jp po,+_
    ld a,(hl)
    cp 30h
    jr c,+_
    cp 3Ah
    jr c,-_
_:
    cp '.'
    jp z,ERR_Syntax_00
    ld a,','
    ld (de),a
    inc de
    ld (outhead),de
    dec hl
    inc bc
    ret
ERR_Syntax_00:      ;Too many decimal points.
ERR_Func_Not_Found:
ERR_Unmatched_lparens:
    ret
rpn:
    ret
test:
;    .db "(3.1415926535)"
.db "(3.142+6/2-7)*3^6*3"
test_end:

4
Hi all, I lost my previous references and example programs and it took me this morning to locate this algorithm, digest it, and spew out my own version.  I looked on all of my calculators and Omni first, so I'm going to post it here for when it happens again :P

Anyways, this is one of my favorite algorithms for evaluating logarithms:

Code: [Select]
;Natural Logarithm on [.5,2]
;Single precision
a=.5(1+x)
g=.5(a+x/a) ;half precision divide
g=.5(g+x/g) ;full precision divide
b=a
a=(a+g)/2
c=.5(a+g)
g=.5(c+a*g/c) ;full precision divide
c=a
b = a-b/4
a=(a+g)/2
c = a-c/4-b/16
return (x-1)(1-1/4)(1-1/16)/c
  • It achieves single precision accuracy (at least 24.4996 bits) on the range of inputs from [.5,2].
  • During range reduction, x is usually reduced to some value on [c,2c].
    • The best precision is found when c=.5sqrt(2) (range is [.5sqrt(2),sqrt(2)], achieving at least 31.5 bits
    • I prefer c=2/3 since 2/3 and 4/3 are equidistant from 1-- it makes it easier for me to analyze time complexity. This still offers at least 29.97 bits, which is better than single precision
  • Cost is:
    • amean: 7 . 'amean' is the same cost as an add in binary floats
    • half divide: 1
    • full divide: 3
    • multiply:    1
    • shift-by-2:  3
    • shift-by-4:  2. This is sightly more efficient on the Z80 than 4 single-shifts when values are in RAM
    • add/sub:     5
    • add/sub const:1

I derived this algorithm from this wonderful paper which is annoyingly never at the top of a Google search and in fact took me a loooong time to ever stumble upon it, sadly.

This paper greatly accelerates the classic Borchardt-Gauss algorithm to be on par with the AGM algorithm. At their core, both algorithms perform an arithmetic and geometric mean, but AGM requires them to be done in parallel (emulated on single-core processors by some simple variable juggling), whereas B-G does them sequentially. As well, AGM achieves quadratic convergence or better (I've seen some exponential convergence in specific special cases), whereas classic B-G usually achieves linear convergence. Carlson's version of the B-G algorithm adds O(log(n)^2) additions and O(log(n)) space for quadratic convergence (where n is the number of desired bits of accuracy).

I like the B-G-C algorithm since I can easily obtain the inverse trig functions and inverse hyperbolic functions as well as the natural logarithm.

5
Sorry, I'm on my phone so I'll probably not go too in-depth on this :( Bug me for details if I don't get around to it and you need them :P

So:
Given ##x\in[-.5ln2,.5ln2]##
Let ##y=x^{2}##
Let ##a=\frac{x}{2}\frac{1+\frac{5y}{156}\left(1+\frac{3y}{550}\left(1+\frac{y}{1512}\right)\right)}{1+\frac{3y}{26}\left(1+\frac{5y}{396}\left(1+\frac{y}{450}\right)\right)}##
Then ##e^{x}\approx\frac{1+a}{1-a}##

Accuracy is ~75.9666 bits.
7 increments, 1 decrement
6 constant multiplications
6 general multiplications
2 general divisions
1 div by 2 (right shift or decrement an exponent)

For comparison, that's comparable to 16 terms of the Taylor's series, or 8 terms of the standard Padé expansion (exponential is special in that it comes out to f(x)/f(-x) so it can be done even easier than most).

I basically carried out a Padé expansion for e^x to infinity, noticed that after the constant term all the even coefficients were zero, so I used a Padé expansion on that function to quickly find our approximation for a.

In my usage, I actually implemented a 2x function since I'm using binary floats with 64-bit precision. I take int(x) and save that for a final exponent for the float. Remove that value from x. By definition of int(), x is now non-negative. If x≥.5, increment that saved exponent, subtract 1 from x. Now x is on [-.5,.5]. Now we need to perform 2x, but that is equivalent to ex*ln(2). We've effectively applied range reduction to our original input and now we can rely on the algorithm at the top. That integer part that we saved earlier now gets added to the exponent byte, et voilà !

I think when I calculated max error using my floats it was accurate up to the last two bits or something. Don't quote me on that as I don't have those notes on me at the moment and it was done months ago.

6
TI Z80 / Floatlib
« on: May 14, 2016, 06:44:13 pm »
I made a cute little app for my calc that I thought I would share! It basically puts all of my single-precision floats into one app and provides a somewhat easy way to access the routines through a jump table.

It even features an in-app reference for the routines which is useful for when I'm programming on my calc, as well as a hexadecimal translation of the header just in case I forget the code.

I included two examples. The first simply computes and displays eπ. The second uses an algorithm similar to that of the Arithmetic-Geometric Mean algorithm to compute arctangent. It takes the input number as a string in Ans.

Warning: I've run into a few bugs (some are documented, others I haven't chased down yet). Pro-tip: Don't store your program variables in the range of scrap to about scrap+30. The default value for scrap is 8000h. If you change this, please note that the app expects the scrap region to be 256 bytes.

7
ASM / [help] App Signing, SPASM
« on: February 09, 2016, 10:14:48 am »
My computer is a Rasberry Pi with Debian, and I'm trying to create and sign apps with SPASM. Unfortunately I get the error that appsigning isn't available in my build of Spasm. So then I installed rabbitsign, but rabbitsign says the number of app pages is wrong and when I try to send the resulting file to the calc, it has an error.

Can anybody help me? And is it possible for spasm to do it all (I know that's what I used to do with my old computer).

8
TI Z80 / Yet Another Code Parser
« on: February 08, 2016, 11:17:34 am »
As many of you know, I have a predilection for making code parsing engines. This time I don't have any grand expectations to finish this as a programming language. I will, however, offer my latest attempt for anybody who wants to make a custom parser.

Adding a binary operator
   Operations like ‘+’ and ‘-’ are considered binary operators. They take two inputs and produce one output. In this parser, binary operators are performed left-to-right. To add a new bin op, open up the source file 'binop.z80'. In the routine isBinOp, after the ld a,(de) line, insert binoperator(token,my_op). Then add the routine to the file somewhere like:
Code: [Select]
my_op:
;;checks if x<y
    xor a
    sbc hl,de
    ld h,a
    rla
    ld l,a
    xor a    ;sets type as uint16
    ret

Adding functions
This is a tad more complex of a process, but once you add a few, it's super easy. There are several built-in packages for different types of functions (math or graphics, for example). Each of these packages are listed in the look up table in the file "includes.z80". You can even add your own built-in packages if you want. Each package has an alphabetically sorted lookup table of functions and you will need to add your function to one of these tables. First however, let's add the code:
Code: [Select]
my_func00:
    .db "SPRITE",0    ;This is the name of the function

;;To make sure Ans is preserved, use this default code
    ld hl,(ans_ptr)
    push hl
    push af
    ld a,(ans_type)
    push af

;;Parse the arguments. Remember you can't modify DE or BC unless you save them.
;;parseNum returns A as the type, HL as the value. It checks to make sure the input was a number, and since only uint16 is supported, you don't need to check the type.
    call parseNum \ ld a,l \ push af
    call parseNum \ pop af \ ld h,a \ push hl
    call parseData
;;Now the stack holds the coordinates, HL points to the data string.
;;Save BC,DE so we can have full reign
    ld (parsePtr),de    ;Save these values
    ld (parseSize),bc   ;
    pop bc          ;B=X,C=Y
    call drawSprite

;;Now restore BC,DE
    ld bc,(parseSize)
    ld de,(parsePtr)

;This is the default ending. Jump to closeFUnc with
;A = ans_type
;H = opening token. Ex. '(' for SPRITE(x,y,data)
;<stack> holds ans_ptr
    pop hl
    pop af
    jp closeFunc
When that is finished, insert .dw my_func00 into the graphics LUT (located in graphics.z80). You need to insert this so that the LUT is in alphabetical order! Since "SPRITE" comes after "RECTXOR", you would insert it at the end of the LUT.

To Use:
Send prgmPARSER to your calc. You need Ans to contain the name of the program file to parse. For prgmTEST, use "ETEST" or "[TEST".

Included functions:
DISP x
RECTXOR(y,x,h,w)
ABS(x)

Binary ops:
+,-,*,/ are the traditional binary ops, currently only performed on uint16
plotdot, plotcross, plotbox are used as in Axe for 16-bit AND, OR, and XOR.

Things that would make this much better:

-Adding metadata after the function names to indicate stuff like whether it is a binary operator (so that 'x nCr y' can be added, for example.)
-An ASCII editor, since this is actually supposed to take ASCII input, not tokens.
-Variable support

9
TI Z80 / Balltrix - z80 ASM Remake
« on: January 09, 2016, 11:22:08 pm »
Updated Here

I've remade DJ Omnimaga's Balltrix in z80 Assembly. As requested in an older port, I've kept the original title screen.

The biggest issue that needs fixing is the name entry menu for highscores. It works, but it is super ugly and tedious. Some quirks/features are:
-The highscore info is saved directly to the program, so no external save file.
-Only 48 bytes of RAM are used to store the graphics data and none of the 768-byte buffers are modified by this program.
-This program spends a lot of it's time in a low-power halt state and operates at 6MHz.

I'm still in the process of tweaking and modifying the program, but I think I've got the control sensitivity almost perfect as well as the gradual increase in speed. My highest score so far is 87 ;D

10
TI Z80 / Mandelbrot Set explorer
« on: September 01, 2015, 09:00:55 am »
I had no internet for a few days, but at the same time I had a day off from work, so I made a Mandelbrot Set explorer! It uses 4.12 fixed point arithmetic. Here is a gif of it in action, and attached are stills of some deeper zooms.



I also made my own 4x4 mini font!

11
TI Z80 / Tok2Char
« on: August 06, 2015, 08:22:36 am »
This is a work-in-progress, but the intent is for it to convert a token into it's individual chars. It first converts the token to ASCII (this is easy; the OS provides a call), then the hard part is converting the ASCII to individual tokens. Uppercase is easy, and I also added conversion for lowercase chars, space, and "(" so that should get most tokens. However, something like tan-1( will not properly convert the -1.

12
TI Z80 / CatElem - Catalog Reader
« on: August 05, 2015, 10:10:09 am »
There was a request on Cemetech that I took interest in, and this is my work so far. You basically pass a number into the program, and it returns the nth catalog element. So 1:Asm(prgmCATELEM would return "abs(" as an example. If you pass 0 into it, it returns how many catalog elements there are.

Before you can use prgmCATELEM for it's intended purpose, you must run it once, and it will set itself up for your specific OS.
Suggestion: If you are going to use prgmCATELEM in a program that you want to release into the wild, provide a fresh prgmCATELEM, not the version on your calculator. Further, you should run it once at the start of your program to install, just in case the user of your program didn't run the installation step.

There are definitely issues with how prgmCESETUP searches for the catalog table. It assumes that the catalog will not have its first 4 bytes split on two pages and it assumes that catalog starts with "abs(" and then " and ". It also assumes that the catalog end is on the same page and ends with "?".

13
ASM / Link Port Woes
« on: July 27, 2015, 12:03:43 pm »
I am trying to write some link port routines and I am having some issues. Every so often, data won't be correctly transmitted and I have deduced that this only occurs when the sent byte has bit 1,3, or 5 set, it sometimes causes the next bit up to be set (so if bit 3 is supposed to be set, it also sets bit 4). This is a chaotic occurence.

What I do know is when the odd bits are read, bit 0 of port 0 (so the tip) reads 1. As well, the sending calc had a low battery, so I am not sure if it is an issue of bringing the lines up. I am pretty sure this is not the problem, though, as I switched out batteries.

I also tried adding code that guaranteed that the lines were reading correctly, otherwise it would rewrite the value to port 0 until it did. This still didn't work.
I tried adding huge delays, it still was just as unreliable.
Interrupts and link assist are disabled.

Below is my code if you want to muddle through it. It is supposed to send a value in Ans to the other calc and the receiving calc receives in Ans (modulo 100).
Code: [Select]
#define bcall(x) rst 28h \ .dw x
.db $BB,$6D
.org $9D95
    bcall(4AD7h)
    bcall(4AEFh)
sendb:
    di
    ld c,a
    xor a
    out (8),a
    ld e,55h
sendloop:
    rrc c
    rla
    sla e
    ccf
    rla
    out (0),a
    ld b,6          ;this is a delay, and not claculated to be precise. We probably do not even need a delay
    djnz $
    jr nz,sendloop
    ld b,10         ;delay
    djnz $
    xor a
    out (0),a
    ret
Code: [Select]
#define bcall(x) rst 28h \ .dw x
.db $BB,$6D
.org $9D95
    call getb
    bcall(478Ch)
    bcall(4ABFh)
    ret
getb:
    di
    xor a
    out (8),a
    ld bc,$08AA
readloop:
    in a,(0)
    xor c
    rra
    jr c,readloop
    rra             ;bits cycled in are masked with 0x55 as a sideeffect. Need to invert anyways, so mask at the end with 0xAA
    rr c
    djnz readloop
    ld a,c
    and 1
    rrca
    xor c
    xor $AA
    ret

14
Technology and Development / Serial Link, TI Z80
« on: July 15, 2015, 10:29:19 am »
I was not sure where to post this, but I made a calc-to-calc serial cable!

I ended up finding two sets of headphones with the right size tips (not sure what they are called), so I cut them up with an ample length of wire. I also have a bunch of jumper wires that I got for my Raspberry Pi, so after cutting the headphone wires, I exposed the three sets of wire, used an abrasive to strip off the enameling, and spliced each wire with a female tip of a jumper wire. From there, I can connect them with male-male jumpers and it properly transmits data between my calcs!

The reason that I didn't just splice the two sets of wires together directly is because now I can plug in other items. For example, I connected male leads to an electric motor and LED-- it lets me control them through the link port now! I can also hook them up to a breadboard if I wanted to, making it possible to connect more calculators, or I can directly plug them into the GPIO board on my Raspberry Pi. A few months ago I did this and wrote a Python program and a calc program that could communicate with each other (basically just sending a string in Ans to the Pi, and have it displayed in the terminal).

Unfortunately, I lost the Python program due to a crash, but I still have the calc programmed backed up on calc. My plan is to come up with some working routines from scratch since I am still pretty terrible with figuring out the link port. It took me forever to figure out that the receiving calc can't change the lines.

15
ASM / isDivisible Routine
« on: June 23, 2015, 12:30:33 pm »
I don't want to put this in the Optimized Routines thread as I haven't had it long enough for it to be ultra-optimized.

This code is intended as a method of testing if HL is divisible by DE. It's average run time for random 16-bit inputs is 163cc, and where DE<=HL (more likely in practice), it is 229cc. For a code that is doing prime-checking, it will more likely take about 707.5cc (Since DE will be roughly the square root of HL or less).
Code: [Select]
isDivisible:
;;Inputs: DE,HL
;;Outputs: c flag set if HL is not divisible by DE, else c flag is set.
;;         HL is 0 if true.
;;1087cc worst case (HL=65535, DE=1).
;;Average for random inputs: 163cc
;;Average for DE<=HL: 229cc.
    ld a,d \ or e \ ccf \ ret z         ;remove this if DE is always guaranteed non-zero
;step 1
    ld a,e \ or l \ rra \ jr c,step2    ;\
    srl d \ rr e \ rr h \ rr l          ; |
    ld a,e \ or l \ rra \ jr nc,$-11    ; |Remove these if DE is always guaranteed odd at input.
step2:                                  ; |
    ld a,e \ rra \ ccf \ ret c          ;/
;steps 3, 4, and 5
    ld a,l
    or a
loop:
    sbc hl,de \ ret c \ ret z
    rr h \ rra \ bit 0,a \ jr z,$-5
    ld l,a
    jp loop

Motivation and Development
I often find myself in a situation where I need to find the factors of a number, but I have no technology around to aid me. This means I need to use... mental arithmetic! I've been doing this for 15 years, so I have refined my mental process quite a bit. It is still a trial division algorithm, but with a very obfuscated "division" technique. We don't need to do 1131/7 to see if it is divisible by 7, we just need to see if 7 divides 1131 and this is what my algorithm does. Interestingly, testing divisibility at the algorithmic level is a little faster than division. Not by much, but it is also non-negligible.
The Algorithm
The core algorithm is designed around checking that (A mod B == 0) is true or false. We also make the assumption that B is odd and by extension, non-zero. The case where B is non-zero and even will be discussed later.

Since B is odd, 2 does not divide B. This means that if A is even:
    (A mod B == 0) if and only if  (A/2 mod B)==0.
We also know by the definition of divisibility that
    (A mod B) == (A+c*B mod B)
where c is any integer. Combining all this, we have an algorithm:

  • Remove all factors of 2 from A
  • With A now odd, do A=A-B
    • If the result is zero, that means (A mod B == 0)
    • If the result underflow (becomes "negative", or on the Z80, sets the carry flag), it means that A was somewhere on [1,B-1], so it is not divisible by B.
  • Continue back at 1.
Now suppose B is allowed to be non-zero and even. Then B is of the form d*2^k where d is odd. This just means there are some factors of 2 that can be removed from B until it is odd. The only way A is divisible by B, is if it has the same number or more of factors of 2 as B. If we factor out common factors of 2 and find B is still even, then A is not divisible by B. Otherwise we have an odd number and only need to check the new (A mod d) for which we can use the "odd algorithm" above.
So putting it all together:
  • If B==0, return FALSE.
  • Remove common factors of 2 from A and B.
  • If B is even, return FALSE.
  • Remove all factors of 2 from A.
  • Subtract B from A (A=A-B).
    • If the result is zero, return TRUE.
    • If the result is "negative" (setting the carry flag on many processors), return FALSE.
  • Repeat at 4.
The overhead steps are (1) to (3). The iterated steps are (4) and (5).
Because (5) always produces an even number, when it then performs step 4, it always divides by at least one factor of 2. This means the algorithm takes at most 1+ceil(log2(A))-floor(log2(B) iterations. For example, if A is a 37-bit number and B is a 13-bit number,this takes at most 38-13 = 25 iterations. However, in practice it is usually fewer iterations.
Example Time:
Say I wanted to test if 1337 is divisible by 17.
Since 17 is odd, we can proceed.
1337 is odd, so no factors of 2 to remove.
1337-17 == 1320.
1320/2 == 660
660/2 == 330
330/2 == 165
165-17 == 148
148/2 == 74
74/2 == 37
37-17 == 20
20/2 == 10
10/2 == 5
5-17 == -12
So 1337 is not divisible by 17.

Now test divisibility by 7:
1337 => 1330
=>665
=>658
=>329
=>322
=>161
=>154
=>77
=>70
=>35
=>28
=>14
=>7
=>0

So 1337 is divisible by 7.


The worst-case timing is 66*16+31 = 1087

Pages: [1] 2 3 ... 12