Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - Xeda112358

Pages: 1 ... 21 22 [23] 24 25 ... 317
331
TI Z80 / Hash-Based Instruction Lookup
« on: November 04, 2017, 11:24:57 am »
So I was working on some code to take an alphanumeric input string and look it up in a table and do something. After rehashing a binary search algorithm for the nth time, I remembered an old idea that I had for finding a string in set of data.
Spoiler For Spoiler:
I could basically speed up the search by xor-ing each byte of the search string and using that value is an indicator of a potential match. If the input is n bytes long, then I xor the first n starting bytes of the data to search. If the XORed value matches, double check that you haven't already found a match! Otherwise, set variables tail=data_start and head=data_start+n and x as the XORed value of the input, and acc as the XORed value of the first n bytes of the data. Now XOR the byte at tail with acc and store it back to acc. Increment tail, increment head. XOR the byte at head with acc and store it back to acc. If acc==x then double check the string between tail and head isn't a match, otherwise, continue until head goes beyond the data.
After some poking around on Google, I came across a hash based search algorithm that was just the insight that I needed!

So I came up with some sample commands and I was lucky enough to find a simple hashing function that provided a unique value for all 46 functions.

I'm going to provide the code here, even though it isn't very pretty. This way if I ever lose it, I'll have this as reference :)
This example executes the command "Disp" which I just made display everything after it.
Code: [Select]
#include "grammer3.inc"

_DispHL = 4507h
_NewLine = 452Eh
_PutS   = 450Ah
#define bcall(x) rst 28h \ .dw x
scrap = 8000h
denom = scrap
numer = scrap+4
out   = scrap+8
.db $BB,$6D
.org $9D95
    ld hl,testinput
    ld bc,testinput_end-testinput
    call hashlookup
    jr c,+_
    ld hl,s_NOTFOUND
    bcall(_PutS)
    ret
_:
    inc de \ inc de ; Now DE points to the pointer to the parsing code
    ex de,hl
    ld a,(hl) \ inc hl \ ld h,(hl) \ ld l,a
    ex de,hl
    cpi
    ret po
    ;HL points to the input, BC is the size left
    ;DE points to the code that parses the instruction
    push de
    ret
testinput:
    .db "Disp Yay, it works!",0
testinput_end:
hashlookup:
;HL points to the input
;BC is the max size
;return nc if failed to match, c if success
    ld (input_save),hl
    ld (input_savesize),bc
    ld a,b
    or c
    jr z,match_null
    call computehash
    ld hl,(input_savesize)
    xor a
    sbc hl,bc
    jr z,match_fail
    ld b,h
    ld c,l
    ld d,a
    ex de,hl
    add hl,hl
    ld (hash),hl


    ld de,hashlut_builtin
    add hl,de
    ld e,(hl)
    inc hl
    ld d,(hl)

    ld hl,(input_save)
;BC is the input size
;HL points to the input string
;DE points to the comparison
    ld a,(de)
    cp c
    jr nz,match_fail
    inc de
    ld b,c

_:
    ld a,(de)
    inc de
    cp (hl)
    jr nz,match_fail
    inc hl
    djnz -_
    scf
    ret
match_null:
    ld de,t_null+1
match_fail:
    ld hl,(input_save)
    ld bc,(input_savesize)
    or a
    ret
computehash:
    ld e,0
_:
    ld a,(hl)
    sub 48
    ret c
    cp 10
    jr c,$+5
    sub 7
    ret c
    cp 68
    ret nc
    ld d,a
    add a,a
    add a,d
    xor e
    ld e,a
    cpi
    jp pe,-_
    ret
   
   
hashlut_builtin:
.dw t_null              ;00
.dw t_null              ;01
.dw t_null              ;02
.dw t_null              ;03
.dw t_null              ;04
.dw t_null              ;05
.dw t_null              ;06
.dw t_null              ;07
.dw t_null              ;08
.dw t_null              ;09
.dw t_null              ;0a
.dw t_cosh              ;0b
.dw t_null              ;0c
.dw t_null              ;0d
.dw t_null              ;0e
.dw t_null              ;0f
.dw t_null              ;10
.dw t_null              ;11
.dw t_atan              ;12
.dw t_null              ;13
.dw t_sinh              ;14
.dw t_null              ;15
.dw t_null              ;16
.dw t_null              ;17
.dw t_null              ;18
.dw t_null              ;19
.dw t_null              ;1a
.dw t_null              ;1b
.dw t_sqrt              ;1c
.dw t_null              ;1d
.dw t_null              ;1e
.dw t_max               ;1f
.dw t_null              ;20
.dw t_null              ;21
.dw t_ClrDraw           ;22
.dw t_null              ;23
.dw t_null              ;24
.dw t_null              ;25
.dw t_null              ;26
.dw t_null              ;27
.dw t_null              ;28
.dw t_Ellipse           ;29
.dw t_null              ;2a
.dw t_null              ;2b
.dw t_null              ;2c
.dw t_null              ;2d
.dw t_null              ;2e
.dw t_null              ;2f
.dw t_null              ;30
.dw t_null              ;31
.dw t_null              ;32
.dw t_null              ;33
.dw t_null              ;34
.dw t_null              ;35
.dw t_null              ;36
.dw t_null              ;37
.dw t_null              ;38
.dw t_null              ;39
.dw t_ln                ;3a
.dw t_null              ;3b
.dw t_null              ;3c
.dw t_pi                ;3d
.dw t_null              ;3e
.dw t_null              ;3f
.dw t_null              ;40
.dw t_null              ;41
.dw t_null              ;42
.dw t_null              ;43
.dw t_null              ;44
.dw t_null              ;45
.dw t_null              ;46
.dw t_null              ;47
.dw t_null              ;48
.dw t_null              ;49
.dw t_null              ;4a
.dw t_null              ;4b
.dw t_null              ;4c
.dw t_null              ;4d
.dw t_null              ;4e
.dw t_null              ;4f
.dw t_null              ;50
.dw t_null              ;51
.dw t_null              ;52
.dw t_null              ;53
.dw t_null              ;54
.dw t_null              ;55
.dw t_null              ;56
.dw t_null              ;57
.dw t_null              ;58
.dw t_null              ;59
.dw t_null              ;5a
.dw t_null              ;5b
.dw t_null              ;5c
.dw t_null              ;5d
.dw t_null              ;5e
.dw t_null              ;5f
.dw t_null              ;60
.dw t_null              ;61
.dw t_null              ;62
.dw t_null              ;63
.dw t_null              ;64
.dw t_null              ;65
.dw t_null              ;66
.dw t_null              ;67
.dw t_null              ;68
.dw t_randint           ;69
.dw t_asinh             ;6a
.dw t_null              ;6b
.dw t_tan               ;6c
.dw t_null              ;6d
.dw t_null              ;6e
.dw t_null              ;6f
.dw t_null              ;70
.dw t_null              ;71
.dw t_null              ;72
.dw t_null              ;73
.dw t_null              ;74
.dw t_acosh             ;75
.dw t_null              ;76
.dw t_null              ;77
.dw t_null              ;78
.dw t_null              ;79
.dw t_null              ;7a
.dw t_null              ;7b
.dw t_null              ;7c
.dw t_null              ;7d
.dw t_null              ;7e
.dw t_null              ;7f
.dw t_null              ;80
.dw t_atanh             ;81
.dw t_null              ;82
.dw t_null              ;83
.dw t_null              ;84
.dw t_null              ;85
.dw t_Line              ;86
.dw t_sin               ;87
.dw t_null              ;88
.dw t_null              ;89
.dw t_e                 ;8a
.dw t_null              ;8b
.dw t_null              ;8c
.dw t_mod               ;8d
.dw t_null              ;8e
.dw t_null              ;8f
.dw t_null              ;90
.dw t_min               ;91
.dw t_Circle            ;92
.dw t_gcd               ;93
.dw t_null              ;94
.dw t_null              ;95
.dw t_null              ;96
.dw t_null              ;97
.dw t_cos               ;98
.dw t_null              ;99
.dw t_null              ;9a
.dw t_null              ;9b
.dw t_null              ;9c
.dw t_null              ;9d
.dw t_null              ;9e
.dw t_null              ;9f
.dw t_null              ;a0
.dw t_log2              ;a1
.dw t_null              ;a2
.dw t_Tilemap           ;a3
.dw t_log10             ;a4
.dw t_null              ;a5
.dw t_null              ;a6
.dw t_null              ;a7
.dw t_null              ;a8
.dw t_Text              ;a9
.dw t_null              ;aa
.dw t_null              ;ab
.dw t_null              ;ac
.dw t_null              ;ad
.dw t_Disp              ;ae
.dw t_null              ;af
.dw t_null              ;b0
.dw t_null              ;b1
.dw t_null              ;b2
.dw t_null              ;b3
.dw t_null              ;b4
.dw t_null              ;b5
.dw t_null              ;b6
.dw t_null              ;b7
.dw t_DispBuf           ;b8
.dw t_lcm               ;b9
.dw t_setseed           ;ba
.dw t_null              ;bb
.dw t_null              ;bc
.dw t_null              ;bd
.dw t_null              ;be
.dw t_null              ;bf
.dw t_pow10             ;c0
.dw t_null              ;c1
.dw t_null              ;c2
.dw t_null              ;c3
.dw t_null              ;c4
.dw t_pow2              ;c5
.dw t_null              ;c6
.dw t_null              ;c7
.dw t_null              ;c8
.dw t_null              ;c9
.dw t_null              ;ca
.dw t_null              ;cb
.dw t_null              ;cc
.dw t_null              ;cd
.dw t_null              ;ce
.dw t_null              ;cf
.dw t_null              ;d0
.dw t_null              ;d1
.dw t_null              ;d2
.dw t_null              ;d3
.dw t_null              ;d4
.dw t_null              ;d5
.dw t_null              ;d6
.dw t_null              ;d7
.dw t_null              ;d8
.dw t_null              ;d9
.dw t_null              ;da
.dw t_null              ;db
.dw t_null              ;dc
.dw t_Shiftbuf          ;dd
.dw t_randseed          ;de
.dw t_null              ;df
.dw t_null              ;e0
.dw t_null              ;e1
.dw t_exp               ;e2
.dw t_Output            ;e3
.dw t_null              ;e4
.dw t_Sprite            ;e5
.dw t_acos              ;e6
.dw t_null              ;e7
.dw t_Rect              ;e8
.dw t_null              ;e9
.dw t_null              ;ea
.dw t_null              ;eb
.dw t_null              ;ec
.dw t_rand              ;ed
.dw t_null              ;ee
.dw t_null              ;ef
.dw t_null              ;f0
.dw t_null              ;f1
.dw t_null              ;f2
.dw t_null              ;f3
.dw t_null              ;f4
.dw t_null              ;f5
.dw t_null              ;f6
.dw t_null              ;f7
.dw t_null              ;f8
.dw t_asin              ;f9
.dw t_null              ;fa
.dw t_null              ;fb
.dw t_null              ;fc
.dw t_null              ;fd
.dw t_null              ;fe
.dw t_tanh              ;ff
t_null:
.db 0
t_cosh:
.db 4,"cosh"
#include "commands\cosh.z80"
t_atan:
.db 4,"atan"
#include "commands\atan.z80"
t_sinh:
.db 4,"sinh"
#include "commands\sinh.z80"
t_sqrt:
.db 4,"sqrt"
#include "commands\sqrt.z80"
t_max:
.db 3,"max"
#include "commands\max.z80"
t_ClrDraw:
.db 7,"ClrDraw"
#include "commands\ClrDraw.z80"
t_Ellipse:
.db 7,"Ellipse"
#include "commands\Ellipse.z80"
t_ln:
.db 2,"ln"
#include "commands\ln.z80"
t_pi:
.db 2,"pi"
#include "commands\pi.z80"
t_randint:
.db 7,"randint"
#include "commands\randint.z80"
t_asinh:
.db 5,"asinh"
#include "commands\asinh.z80"
t_tan:
.db 3,"tan"
#include "commands\tan.z80"
t_acosh:
.db 5,"acosh"
#include "commands\acosh.z80"
t_atanh:
.db 5,"atanh"
#include "commands\atanh.z80"
t_Line:
.db 4,"Line"
#include "commands\Line.z80"
t_sin:
.db 3,"sin"
#include "commands\sin.z80"
t_e:
.db 1,"e"
#include "commands\e.z80"
t_mod:
.db 3,"mod"
#include "commands\mod.z80"
t_min:
.db 3,"min"
#include "commands\min.z80"
t_Circle:
.db 6,"Circle"
#include "commands\Circle.z80"
t_gcd:
.db 3,"gcd"
#include "commands\gcd.z80"
t_cos:
.db 3,"cos"
#include "commands\cos.z80"
t_log2:
.db 4,"log2"
#include "commands\log2.z80"
t_Tilemap:
.db 7,"Tilemap"
#include "commands\Tilemap.z80"
t_log10:
.db 5,"log10"
#include "commands\log10.z80"
t_Text:
.db 4,"Text"
#include "commands\Text.z80"
t_Disp:
.db 4,"Disp"
#include "commands\Disp.z80"
t_DispBuf:
.db 7,"DispBuf"
#include "commands\DispBuf.z80"
t_lcm:
.db 3,"lcm"
#include "commands\lcm.z80"
t_setseed:
.db 7,"setseed"
#include "commands\setseed.z80"
t_pow10:
.db 5,"pow10"
#include "commands\pow10.z80"
t_pow2:
.db 4,"pow2"
#include "commands\pow2.z80"
t_Shiftbuf:
.db 8,"Shiftbuf"
#include "commands\Shiftbuf.z80"
t_randseed:
.db 8,"randseed"
#include "commands\randseed.z80"
t_exp:
.db 3,"exp"
#include "commands\exp.z80"
t_Output:
.db 6,"Output"
#include "commands\Output.z80"
t_Sprite:
.db 6,"Sprite"
#include "commands\Sprite.z80"
t_acos:
.db 4,"acos"
#include "commands\acos.z80"
t_Rect:
.db 4,"Rect"
#include "commands\Rect.z80"
t_rand:
.db 4,"rand"
#include "commands\rand.z80"
t_asin:
.db 4,"asin"
#include "commands\asin.z80"
t_tanh:
.db 4,"tanh"
#include "commands\tanh.z80"

poparg:
    ret
tostr:
    ret
s_NOTFOUND:
    .db "Command Not Found",0
input_save:
    .dw 0
input_savesize:
    .dw 0
hash:
    .dw 0
The code for the Disp.z80 file, already convoluted for your viewing pleasure :P
Code: [Select]
    .dw __tokenizeDisp  ;First code generates the token, returns a pointer to the name in HL, size of the name in BC
    .dw __parsecodeDisp
    .dw __compiledcodeDisp
    .dw __helpDisp

__tokenizeDisp:
    ld hl,+_
    ld bc,1
    ret
_:
    .db tokGraphics+0
__helpDisp:
    .db 0
__compiledcodeDisp:
    .db _poparg
    .db _tostr
    .db _inlineasm \ .dw +_-$-2
    bcall(_PutS)
    ret
_:
__parsecodeDisp:
;    call poparg
;    call tostr
    bcall(_PutS)
    bcall(_NewLine)
    ret
Edit:
Oops, and the grammer3.inc file (don't get your hopes up, it's just a test :P).
Code: [Select]
tokVars     = $00   ;allows for 32 variable types (int, str, float, etc.)
tokGraphics = $20   ;allows for 32 basic graphics commands (plot(), Pxl(), Rect(), etc.)
tokControl  = $40   ;allows for 32 control commands (If, Goto, etc.)
tokMath     = $60   ;allows for 128 basic math commands
tokExtend   = $E0
tokInclude  = $F0

var_uint8   = 0
var_int8    = 1
var_uint16  = 2
var_int16   = 3
var_uint32  = 4
var_int32   = 5
var_uint64  = 6
var_int64   = 7
var_float   = 8
var_floatext= 9     ;extended precision float
var_fixed88 = 10    ;8.8 fixed point number, optimized for speed over var_fixed
var_fixed1616=11    ;16.16 fixed point number, optimized for speed over var_fixed
var_fixed   = 12    ;customized size for integer and fractional part, up to 64 bits total
var_rational= 13    ;customized size for the numerator and denominator of: int8, int16, int32, and int64. Denominator is always unsigned.
var_complex = 14    ;complex pair of floats.
var_complexext=15   ;complex pair of extended precision floats.
var_gaussian= 16    ;complex pair of integers. Customized size for the real and imaginary parts: int8, int16, int32, and int64.
var_gaussian8=17
var_gaussian16=18


var_sprite  = 28
var_array   = 29
var_list    = 30
var_str     = 31


_poparg     = 0
_pusharg    = 1
_tostr      = 2
_inlineasm  = 3

332
TI Z80 / Re: Antelope (formerly "OPIA") - A Polymorphic z80 language
« on: November 04, 2017, 09:55:09 am »
I'll be honest, I have no clue what a lot of that means. I've played with OOPLs here and there, but I've never managed to really "get it." However, I know for many programmers it is more natural and it would be cool to have such a language available on the (e)Z80 calculators.

I totally get being busy, but maybe you'll be lucky enough to pull a JamesV and come back in ten years with renewed energy/time/motivation.

333
ASM / Re: ASM Optimized routines
« on: October 16, 2017, 11:52:22 pm »
I find that doing the sign check before and after is the best method. I did find a few optimizations and noticed that the remainder isn't properly restored to A in the case where the result is negative. Also keep in mind that the routine may fail when C>127.

Code: [Select]
; divide HL by C (HL is signed, C is not)
; output: HL = quotient, A = remainder
divHLbyCs:
    bit 7,h
    push    af
    jr      z,divHLbyCsStart
    xor a \ sub l \ ld l,a
    sbc a,a \ sub h \ ld h,a
divHLbyCsStart:
    xor     a
    ld      b,16
divHLbyCsLoop:
    add     hl,hl
    rla
    cp      c
    jp      c,divHLbyCsNext
    sub     c
    inc     l
divHLbyCsNext:
    djnz    divHLbyCLoop
    ld      b,a
    pop     af
    ld      a,b
    ret     z
    xor a \ sub l \ ld l,a
    sbc a,a \ sub h \ ld h,a
    ld a,c
    sub b
    ret
I changed the output remainder so that it returns the remainder modulo c. So -7/3 returns 2 instead of 1. This means that no bytes nor clock cycles were saved after all :P

334
ASM / Re: ASM Optimized routines
« on: October 16, 2017, 10:19:05 pm »
Since I was able to optimize my best 16-bit multiply even further today, I thought I'd share! I even think it can be further optimized. And for that matter, here is my favorite DE_Times_A
mul16 596.34375cc, 92 bytes (incl. DE_Times_A)
Code: [Select]
mul16:
;Inputs:
;   BC,DE are unsigned integers
;Output:
;   HL:DE is the 32-bit product
;Destroys:
;   A,B,C
;min: 359cc
;max: 717cc
;avg: 596.34375cc
;92 bytes
    ld a,c
    call DE_Times_A
    push hl
    push af
    ld a,b
    call DE_Times_A+2
    pop bc
    pop de
;AHL
; BDE
    ld c,d
    add hl,bc
    adc a,0
;AHLE
    ld d,l
    ld l,h
    ld h,a
;HLDE
    ret
DE_Times_A:
;Input: DE,A
;Output: A:HL is the product, C=0, B,DE unaffected, z flag set if result is zero, c flag set if A is input as 1, else nc.
;A:128~255 219+6{0,10}+{0,19}    avg=258.5   *1/2
;A:64~127  203+5{0,10}+{0,19}    avg=237.5   *1/4
;A:32~63   187+4{0,10}+{0,19}    avg=216.5   *1/8
;A:16~31   171+3{0,10}+{0,19}    avg=195.5   *1/16
;A:8~15    155+2{0,10}+{0,19}    avg=174.5   *1/32
;A:4~7     139+{0,10}+{0,19}     avg=153.5   *1/64
;A:2~3     123+{0,19}            avg=132.5   *1/128
;A:1       107cc                 avg=107     *1/256
;A:0       119cc                 avg=119     *1/256
;overall avg: 237.671875cc
    ld c,0
    ld h,d
    ld l,e
    rla \ jr c,mul_07
    rla \ jr c,mul_06
    rla \ jr c,mul_05
    rla \ jr c,mul_04
    rla \ jr c,mul_03
    rla \ jr c,mul_02
    rla \ jr c,mul_01
    rla
    ret c
    ld h,a
    ld l,a
    ret
mul_07:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_06:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_05:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_04:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_03:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_02:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_01:
    add hl,hl \ rla \ ret nc \ add hl,de \ adc a,c
    ret
   

DE_Times_A, 237.671875cc, 72 bytes
Code: [Select]
DE_Times_A:
;Input: DE,A
;Output: A:HL is the product, C=0, B,DE unaffected, z flag set if result is zero, c flag set if A is input as 1, else nc.
;A:128~255 219+6{0,10}+{0,19}    avg=258.5   *1/2
;A:64~127  203+5{0,10}+{0,19}    avg=237.5   *1/4
;A:32~63   187+4{0,10}+{0,19}    avg=216.5   *1/8
;A:16~31   171+3{0,10}+{0,19}    avg=195.5   *1/16
;A:8~15    155+2{0,10}+{0,19}    avg=174.5   *1/32
;A:4~7     139+{0,10}+{0,19}     avg=153.5   *1/64
;A:2~3     123+{0,19}            avg=132.5   *1/128
;A:1       107cc                 avg=107     *1/256
;A:0       119cc                 avg=119     *1/256
;overall avg: 237.671875cc
    ld c,0
    ld h,d
    ld l,e
    rla \ jr c,mul_07
    rla \ jr c,mul_06
    rla \ jr c,mul_05
    rla \ jr c,mul_04
    rla \ jr c,mul_03
    rla \ jr c,mul_02
    rla \ jr c,mul_01
    rla
    ret c
    ld h,a
    ld l,a
    ret
mul_07:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_06:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_05:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_04:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_03:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_02:
    add hl,hl \ rla \ jr nc,$+4 \ add hl,de \ adc a,c
mul_01:
    add hl,hl \ rla \ ret nc \ add hl,de \ adc a,c
    ret



335
I have one and it was my primary computer for a while (and before that it was the Pi 2, and before that it was the Pi B). I used it mostly for math-related programming and TI-programming and video games! Tilem and TiLP work well, I installed Sage for more powerful math computing, and RetroPie to play some of my old PSP and PS1 games-- the latter of which runs smoothly, the former requires some tweaking.

I also wrote the brunt of a Point-Of-Sale (POS) system from scratch that it handled just fine, but I need to modify it to be web-based now :( My Pi 2 uses a 128GB USB drive for the root file system which makes it my second highest memory computer (superseded by a 140GB Windows laptop).

336
Math and Science / A faster Newton's Method Square Root
« on: October 10, 2017, 02:04:50 pm »
EDIT 2022-06-29: The post below has a much needed improvement :)

I decided to compute some square roots by hand to keep my tired self awake and alert at work. I opted to use Newton's Method this time and I noticed an optimization!
Instead of attaining 2N precision with a 2N-by-N->2N division, I split it up into an N-by-N multiply and a N-by-N->N division, which on the Z80 or similar processor can be nearly twice as fast.


The standard algorithm works as follows:

To find the square root of c, start with a guess called x.
Iterate the following: (x+c/x)/2


Each iteration doubles the digits of precision; try it on your calculator if you aren't familiar with it! To fin the square root of 21, guess '5'. Type 5, hit [Enter]. Then do .5(Ans+21/Ans and hit [Enter] a few times.

The way it works is, if the current x is bigger than the square root of c, then c/x is less, and vice-versa. So averaging the two gets a closer estimate to the actual square root. My observation was that if I had, say, 3 digits precision, then those digits stay the same in the next iteration. Through some observations, this would mean the first 3 digits of c/x will match that of x. In fact, for c/x, I'll only need to compute it to 6 digits, but if I can skip the first 3, then the division is easier and faster! So, lets optimize:


=>.5(x+c/x)
=>.5(x+x+(c/x-x))
=>x+.5(c-x*x)/x

So it's probably not so clear, but essentially for 2n-digits precision, I need to square an n-digit number and divide an n-digit number by n digits, to n digits precision. This replaces a 2n/2n division.

The former is faster, so let's try an example of finding the square root of 21:

c=21
Guess: x=5
(c-x*x) = -4
-4/5/2 = -.4
x-.4 = 4.6

(c-x*x) = 21-4.6*4.6 = -.16
-.16/4.6/2 = -.0173913... ~.02
x-.02 = 4.58

(c-x*x) = 21-4.58*4.58 = .0236
.0236/4.58/2 = .0025764... ~.0026
x+.0026 = 4.5826

(c-x*x) = 21-4.5826*4.5826 = -.00022276
-.00022276/4.5826/2 = -.000024304979... ~-.000024305
x-.000024305 = 4.582575695



In practice by hand, it actually doesn't save much time, but that's because by hand I usually can perform a 2n-by-2n division a little slower than half that of a 2n-by-2n and I can do multiplication and division in roughly the same speed. So overall This gets me about 20% faster.

On the Z80, a 2N-by-N->2N division takes roughly the time of 1.6 2N-by-2N->4N multiplies. As well, a 2N-by-2N->4N multiply takes roughly 3.5 times the time of an N-by-N->2N multiply using recursive Karatsuba multiplication.

So the standard algorithm takes roughly 5.6 N-by-N->2N multiplies worth of time.
The modified algorithm takes roughly 2.9 N-by-N->2N multiplies worth of time.

337
TI Z80 / Shunting-Yard Algorithm
« on: September 27, 2017, 08:37:33 pm »
So I have been really wanting to implement the Shunting-Yard algorithm for a few years now and I think I've got it in assembly. I keep finding bugs, but in case other people are interested, here is my code. It assumes that the operator stack and output (combined) won't exceed 768 bytes.

Things that would be useful:
  • Converting numbers to an intermediate format.
  • Handling multi-byte tokens and named vars and functions.
This way the code can be pseudo-compiled when it is converted to RPN, then an RPN calculator can parse the code faster or prep it for actual compiling.
Code: [Select]
#define bcall(x) rst 28h \ .dw x
saveSScreen = 86ECh
scrap=saveSScreen
outhead=8000h
stackhead = 8002h
.db $BB,$6D
.org $9D95
    ld hl,test
    ld bc,test_end-test
    call shuntingyard
    call rpn
    ld hl,scrap
    bcall(450Ah)
    bcall(452Eh)
    ret
shuntingyard:
    ld de,scrap
    ld (outhead),de
    ld d,(scrap/256)+3
    ld (stackhead),de
_:
    ld a,(hl)
    call +_
    cpi
    jp pe,-_
    ld hl,scrap+768
    ld de,(stackhead)
    or a
    sbc hl,de
    ld b,h
    ld c,l
    ld hl,(outhead)
    ex de,hl
    jr z,$+3
    ldir
    dec de
    xor a
    ld (de),a
    ret
_:
    cp '.'
    jp z,num_dec
    cp 30h
    jr c,+_
    cp 3Ah
    jp c,num
_:
    cp '('
    jp nz,+_
    ex de,hl
    ld hl,(stackhead)
    dec hl
    ld (hl),','
    dec hl
    ld (hl),a
    ld (stackhead),hl
    ex de,hl
    ret
_:
    cp ')'
    jp nz,checkunops
    push hl
    push bc
    ld hl,scrap+768
    ld de,(stackhead)
    sbc hl,de
    jp z,ERR_Unmatched_lparens
    ld b,h
    ld c,l
    ex de,hl
    ld de,(outhead)
;BC is the size of the stack. Use this in case there is a missing ')' so we don't read garbage.
;basically search for the matching '(' while piping out the stack to the output.
outerloop:
    ld a,(hl)
    cp '('
    jr z,parens_found
    ld a,','
_:
    cp (hl)
    ldi
    jp po,ERR_Unmatched_lparens
    jr z,outerloop
    jp -_
parens_found:
    inc hl
    inc hl
    ld (outhead),de
    ld (stackhead),hl
    pop bc
    pop hl
    ret
checkunops:
checkbinops:
;; if the token is an operator, then:
;; while there is an operator at the top of the operator stack with
;; greater than or equal to precedence and the operator is left associative:
;; pop operators from the operator stack, onto the output queue.
;; push the read operator onto the operator stack.
;;
;;
    push bc
    ex de,hl
    call getprecedence
    ld a,c
    pop bc
    ex de,hl
    jp c,search_function
    ;now C is the precedence, with lower bit = 1 if left-associative
    push bc
    push hl
    ld de,(stackhead)
    ld hl,scrap+768
    sbc hl,de
    ld b,h
    ld c,l
    ld hl,(outhead)
    ex de,hl
    jr z,pushop
    ;a is the precedence against which to compare
_:
    push hl
    push bc
    push af
    ld a,(hl)
    call getprecedence
    jr c,+_
    pop hl
    ld a,h      ;incoming
    cp c
    jr nz,$+4
    rra \ nop

    pop bc
    pop hl
   
 ;======================================================
    jr nc,pushop
.echo "The following code only works until we have to add >1 byte tokens."
 ldi
 ldi
    jp pe,-_
    jp $+6
_:
    pop af
    pop bc
    pop hl
pushop:
    ld (outhead),de
    pop de
    dec hl
    ld (hl),','
    dec hl
    ld a,(de)
    ld (stackhead),hl
    ld (hl),a
    ex de,hl
    pop bc
    ret
search_function:
    jp ERR_Func_Not_Found
getprecedence:
    ld hl,binops
    ld b,(binops_end-binops)/2
_:
    cp (hl)
    inc hl
    ld c,(hl)
    ret z
    inc hl
    djnz -_
    scf
    ret
binops:
    .db 4,  $01
    .db '=',$50
    .db '|',$60
    .db '&',$70
    .db '-',$81     ;right associative is odd
    .db '+',$80     ;left associative is even
    .db '/',$83     ;right associative
    .db '*',$82     ;left associative
    .db '^',$85     ;right associative
binops_end:
num:
    ld de,(outhead)
_:
    ldi
    jp po,+_
    ld a,(hl)
    cp '.'
    jr z,num_dec+4
    cp 30h
    jr c,+_
    cp 3Ah
    jr c,-_
_:
    ld a,','
    ld (de),a
    inc de
    ld (outhead),de
    dec hl
    inc bc
    ret
num_dec:
    ld de,(outhead)
_:
    ldi
    jp po,+_
    ld a,(hl)
    cp 30h
    jr c,+_
    cp 3Ah
    jr c,-_
_:
    cp '.'
    jp z,ERR_Syntax_00
    ld a,','
    ld (de),a
    inc de
    ld (outhead),de
    dec hl
    inc bc
    ret
ERR_Syntax_00:      ;Too many decimal points.
ERR_Func_Not_Found:
ERR_Unmatched_lparens:
    ret
rpn:
    ret
test:
;    .db "(3.1415926535)"
.db "(3.142+6/2-7)*3^6*3"
test_end:

338
Math and Science / Re: Identifying 4D intersections with two unknowns.
« on: September 27, 2017, 08:27:01 pm »
I'll be honest, my eyes glossed over. So you have a 3-D space and two buckets that move back and forth along a fixed (randomly initialized) path. A projectile is launched from a bucket, in only one direction and  the goal is for it to fall into the other moving bucket. The problem then is to determine if it is possible with a given setup.

Since the projectile is launched in one fixed direction, I'm assuming directly 'up' and it falls directly back 'down', let's pretend we do that at every single point and map the result. If the path made a loop, the plot would look something like a cookie cutter. Plot the other path and find if it ever intersects. If not, return FALSE.

If it does, we need to then verify if they intersect in time. What I would do is determine how many integer time units it takes both paths to complete. If the time units of both paths are coprime, then return TRUE.

Perform the following loop in the even that they are not coprime:
  • 0. Given the length of time of the first path is A, and the length of the second is B.
  • 1. Start with a first intersection.
  • 2. Identify the time units it takes to reach that point on both paths called C and D.
  • 3. If gcd(|C-D|,gcd(A,B))=1, return TRUE, else move to the next intersection and go to Step 2
  • 4. If all intersections are exhausted, return FALSE.


WARNING: My math skills have been deteriorating since college. I'm sure there are optimizations and identities to apply.

339
Hi all, I lost my previous references and example programs and it took me this morning to locate this algorithm, digest it, and spew out my own version.  I looked on all of my calculators and Omni first, so I'm going to post it here for when it happens again :P

Anyways, this is one of my favorite algorithms for evaluating logarithms:

Code: [Select]
;Natural Logarithm on [.5,2]
;Single precision
a=.5(1+x)
g=.5(a+x/a) ;half precision divide
g=.5(g+x/g) ;full precision divide
b=a
a=(a+g)/2
c=.5(a+g)
g=.5(c+a*g/c) ;full precision divide
c=a
b = a-b/4
a=(a+g)/2
c = a-c/4-b/16
return (x-1)(1-1/4)(1-1/16)/c
  • It achieves single precision accuracy (at least 24.4996 bits) on the range of inputs from [.5,2].
  • During range reduction, x is usually reduced to some value on [c,2c].
    • The best precision is found when c=.5sqrt(2) (range is [.5sqrt(2),sqrt(2)], achieving at least 31.5 bits
    • I prefer c=2/3 since 2/3 and 4/3 are equidistant from 1-- it makes it easier for me to analyze time complexity. This still offers at least 29.97 bits, which is better than single precision
  • Cost is:
    • amean: 7 . 'amean' is the same cost as an add in binary floats
    • half divide: 1
    • full divide: 3
    • multiply:    1
    • shift-by-2:  3
    • shift-by-4:  2. This is sightly more efficient on the Z80 than 4 single-shifts when values are in RAM
    • add/sub:     5
    • add/sub const:1

I derived this algorithm from this wonderful paper which is annoyingly never at the top of a Google search and in fact took me a loooong time to ever stumble upon it, sadly.

This paper greatly accelerates the classic Borchardt-Gauss algorithm to be on par with the AGM algorithm. At their core, both algorithms perform an arithmetic and geometric mean, but AGM requires them to be done in parallel (emulated on single-core processors by some simple variable juggling), whereas B-G does them sequentially. As well, AGM achieves quadratic convergence or better (I've seen some exponential convergence in specific special cases), whereas classic B-G usually achieves linear convergence. Carlson's version of the B-G algorithm adds O(log(n)^2) additions and O(log(n)) space for quadratic convergence (where n is the number of desired bits of accuracy).

I like the B-G-C algorithm since I can easily obtain the inverse trig functions and inverse hyperbolic functions as well as the natural logarithm.

340
General Discussion / Re: Watcha Been Listening To?
« on: August 02, 2017, 07:25:03 am »
For a moment I thought you meant  to type Otep XD I haven't heard anything about that one for a while.

341
Sorry, I'm on my phone so I'll probably not go too in-depth on this :( Bug me for details if I don't get around to it and you need them :P

So:
Given ##x\in[-.5ln2,.5ln2]##
Let ##y=x^{2}##
Let ##a=\frac{x}{2}\frac{1+\frac{5y}{156}\left(1+\frac{3y}{550}\left(1+\frac{y}{1512}\right)\right)}{1+\frac{3y}{26}\left(1+\frac{5y}{396}\left(1+\frac{y}{450}\right)\right)}##
Then ##e^{x}\approx\frac{1+a}{1-a}##

Accuracy is ~75.9666 bits.
7 increments, 1 decrement
6 constant multiplications
6 general multiplications
2 general divisions
1 div by 2 (right shift or decrement an exponent)

For comparison, that's comparable to 16 terms of the Taylor's series, or 8 terms of the standard Padé expansion (exponential is special in that it comes out to f(x)/f(-x) so it can be done even easier than most).

I basically carried out a Padé expansion for e^x to infinity, noticed that after the constant term all the even coefficients were zero, so I used a Padé expansion on that function to quickly find our approximation for a.

In my usage, I actually implemented a 2x function since I'm using binary floats with 64-bit precision. I take int(x) and save that for a final exponent for the float. Remove that value from x. By definition of int(), x is now non-negative. If x≥.5, increment that saved exponent, subtract 1 from x. Now x is on [-.5,.5]. Now we need to perform 2x, but that is equivalent to ex*ln(2). We've effectively applied range reduction to our original input and now we can rely on the algorithm at the top. That integer part that we saved earlier now gets added to the exponent byte, et voilà !

I think when I calculated max error using my floats it was accurate up to the last two bits or something. Don't quote me on that as I don't have those notes on me at the moment and it was done months ago.

342
Miscellaneous / Re: Hello again everyone
« on: May 02, 2017, 09:08:38 am »
Hey Art! It's been the same way for me, too. Adulthood and social media... ugh :P I love programming and do program on my calc most days, but it's usually just a couple of minutes here and there throughout the day :( I just don't have time for projects. I couldn't tell if I wasn't seeing you around or if it was because I don't check in regularly XD

343
TI-BASIC / Re: TI-Concours 2017 has started!
« on: March 25, 2017, 08:22:47 pm »
Oh, this looks fun! I make no promises, but I might enter!

I have a new job and my free time is much more consistent. Aaaand the competition ends on my birthday :)

344
ASM / Re: Shifting data block down a nibble?
« on: March 25, 2017, 08:18:15 pm »
harold's method is essentially what I did to shift the screen left or right by multiples of 4. It's slow, but faster than shifting by 1 four times.

345
ASM / Re: Should I learn ASM?
« on: March 15, 2017, 10:32:24 am »
I'm also guessing division would be the subtraction instead of addition, trunkating any remainder.
Division is typically performed exactly like 'schoolbook' long division (until you get to higher precision, then you can use some state-of-the-art algorithms and math magic).

Start with an accumulator and quotient set to 0.
Rotate the one bit of the numerator into the accumulator.
If accumulator>=denominator, then subtract the denominator from the accumulator and shift in a 1 to the quotient, else shift in a zero
Repeat at the "rotate" step.

In code, since we are getting rid of one bit at a time in the numerator and adding 1 bit at a time to the quotient, we can actually recycle the freed up bits in the numerator.
Here is an example of HL/C where C<128, A is the accumulator and HL doubles as the quotient and numerator:
Code: [Select]
HL_div_C:
;Input:
;  HL is the numerator
;  C is the denominator. C<128
;Output:
;  A is the remainder
;  HL is the quotient.
    xor a
    ld b,16
loop:
    add hl,hl   ;this works like shifting HL left by 1. Overflow (thus, the top bit) is left in the carry flag
    rla         ;shift a left, rotating in the c flag as the low bit.
    cp c        ;compare a to c. Basically does A-C and returns the flags. If C>A, then there will be underflow setting the C flag.
    jr c,skip   ;skip the next two bytes if c flag is set (so A<C)
    inc l       ;we know the low bit of HL is 0, so incrementing HL will set that bit.
    sub c
skip:
    djnz loop
    ret
ld b,8 - This must be telling the loop to repeat 8 times to check each bit, so, how does the loop relate to b? I'm thinking about how nested loops would work...
djnz loop - kindof like "goto" loop, with the automatic decrementing of register b?
That is correct and that's why we initially do ld b,8. Keep in mind that djnz * and jr * are intended for small (ish) loops or redirecting to code relatively close by. It can jump backwards up to 126 bytes and forward up to 128 (back 1 byte is simply a slower way of performing rst 38h and I do not suggest it, back 2 bytes creates an infinite loop, back 0 bytes does nothing but waste 7 or 9 clock cycles). Most assemblers will warn you of out-of-bounds jumps.
For longer loops or jumping to code far away, use jp *. djnz * only works with register b and always decrements.

rrc e - you say this rotates register e, is this the same as changing the register input from left to right?
For example, if e=01111011, then rrc e would change it to e=10111101. The bottom bit gets returned in the c flag as well as being returned to the top bit of e. I used rrc * since rotating 8 times would leave it exactly how it was input. I could have used rr * or even sra * or srl *, but I prefer to avoid destroying registers if I can.

Pages: 1 ... 21 22 [23] 24 25 ... 317