Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Topics - Xeda112358

Pages: 1 [2] 3 4 ... 13
16
Site Feedback and Questions / Login Broken
« on: August 06, 2018, 04:31:38 pm »
I've noticed that when I try to sign in to Omni via the home page ( https://www.omnimaga.org/index.php ), it just hangs with a "Loading..." status bar at the top and never actually logs in. However, when I navigate to any other page and try to log in at the top, it works fine.
It has happened to me a few times in the past, but I so rarely need to log in that I attributed it to my internet connection.

17
TI Z80 / Heapsort, VATSort, and ListSort
« on: July 03, 2018, 10:29:08 pm »
EDIT: Some updates can be found later in this thread, here.

Hey all, I implemented the heapsort algorithm with some features inspired by Sean McLaughlin's implementation. Heapsort operates in O(n*log(n)) time, so it is fast even as array sizes increase. The cool perk to Sean's implementation that I wanted to include in mine is the ability to have a callback function that performs comparisons. This is fantastic if your array contains pointers to, say, an array of strings. The callback function can take the inputs (in my case, HL and DE), perform a string comparison, and return the result to the heapsort routine. Heapsort will then know whether or not to swap the elements.

As an example, I created a VAT sorting routine. It creates an array of pointers to the VAT entries of named variables (like Programs, Appvars, etc.), and then uses a fancy callback function that compares the names of the VAT entries. Returned is an alphabetically sorted array of pointers.

As well, I made a very fast routine to sort TI lists. The callback routine for this takes indices, constructs pointers to the actual list data, compares them, then returns the result. Since the OS routine uses an O(n2) algorithm, my program can perform way faster on larger lists. I ran a few tests on random lists and came up with:

* at 100 elements, the OS routine and my routine were pretty close in speed.
* at 400 elements, my routine sorted it in ~1.3 seconds, versus ~9.5. for the OS routine
* at 999 elements, my routine sorted it in ~3.0 seconds, versus ~55.5. for the OS.

Code: (Heapsort) [Select]
#ifndef scrap
#define scrap 8000h
.echo "Warning, 'scrap' not defined, defining scrap=8000h"
#endif
#ifndef SMC
arraybase= scrap
arraylen = scrap+2
#endif
heapflags= 33
curchild = 0
heapsort:
;   HL points to the array data
;   BC is the size of the array. NOT GREATER THAN 32767
;   IX points to the routine that compares the values
#ifdef fast
    call heapify
    ld hl,(arraylen)
#else
    push bc
    call heapify
    pop hl
#endif
_:
    dec hl
    ld (arraylen),hl
#ifndef fast
    push hl
#endif
    ld de,(arraybase)
    add hl,hl
    inc de
    add hl,de
#ifdef fast
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
#else
    call swap
#endif
    ld bc,1
    call propogate
#ifdef fast
    ld hl,(arraylen)
#else
    pop hl
#endif
    ld a,h
    or l
    jr nz,-_   
    ret
heapify:
;Inputs:
;   HL points to the array data
;   BC is the size of the array. NOT GREATER THAN 32767
;   IX points to the routine that compares the values
    ld (arraybase),hl
    ld (arraylen),bc
    srl b
    rr c
_:
    push bc
    call propogate
    pop bc
    dec bc
    ld a,b
    or c
    jr nz,-_
    ret
propogate:
;BC-1 is the parent index
;2BC is the child1 index
    res curchild,(iy+heapflags)
proppost:
    sla c
    rl b
    ld d,b
    ld e,c
#ifdef SMC
arraylen=$+1
    ld hl,0
#else
    ld hl,(arraylen)
#endif
    sbc hl,de
    add hl,de
    ret c  ;no children
;compare the two children
#ifdef SMC
arraybase=$+1
    ld hl,0
#else
    ld hl,(arraybase)
#endif
    add hl,de
    add hl,de
    inc hl
    ld d,(hl)
    dec hl
    ld e,(hl)
    dec hl
    push hl
    ld a,(hl)
    dec hl
    ld l,(hl)
    ld h,a
    ;HL holds the value of child0
    ;DE holds the value of child1
    jr z,+_
    call callix
    jr nc,+_
    ex de,hl
    pop de
    inc de
    inc de
    set curchild,(iy+heapflags)
    push de
_:
;{stack} points to the child
;HL is the value of the child
;BC points to the parent
;now compare the child and parent
    ex de,hl
    ld hl,(arraybase)
    add hl,bc
    push hl
    dec hl
    ld a,(hl)
    dec hl
    ld l,(hl)
    ld h,a
    call callix
    pop hl
    pop de
    ret nc
    dec hl
    call swap
;values swapped, now set parent=child
;BC is the index of child1
    bit curchild,(iy+heapflags)
    jp z,proppost
    inc bc
    jp propogate
swap:
;HL points to the top of one word
;DE points to the top of another
;Must preserve BC
#ifdef fast
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    inc c
    inc bc
#else
    call +_
_:
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    inc bc
#endif
    ret
callix:
    jp (ix)
Even when optimizing for speed over size, my code is smaller, but I'm fairly sure there are more optimizations to be found!

18
TI Z80 / LZD Image Compression/Decompression
« on: June 27, 2018, 08:41:18 pm »
Hey all, I have been playing with compression. I made a Python program that compresses data, such as a TI Image, as well as a calculator program that can decompress such data to the graph screen.

While the Python program is general-purpose, the Z80 program assumes that the input is a compressed image, so you can mess things up! The neat thing is that in my example, the calc program (LZD) and the compressed data, along with their file headers, all came out to less than the size of the original image!

19
Math and Science / Fast 3-digit Squaring, Toom-Cook
« on: June 26, 2018, 02:18:40 pm »
Hi all, I realized that I have a lot of unshared results that I've worked out over the years! (Then again, I have a lot of shared results :P )
Today's is dealing with squaring 3-digit numbers.
If you aren't familiar with the Toom-Cook algorithm, read the spoiler! It's cool stuff. Otherwise, continue down.
Spoiler For Prerequisite:
The Toom-Cook algorithm basically generalizes multiplying integers as multiplying polynomials.For example, if we want to multiply 47*81, we can think of it as multiplying (7+4x)*(1+8x) and evaluating at x=10 (since our numbers were in base 10). This seems like a lot of unnecessary complexity, and it is for such small numbers, but let's look at a few key ideas.

If you are given 3 points on a plane, you can construct a unique quadratic function that passes through each point. The key here is that it is unique meaning there are no other degree-2 polynomials that pass through each point-- there is only one and that one must exists. On the other hand given 4 points, there might be a parabola that passes through each point, but it is not a guarantee and is in fact unlikely. As well, given three points, there exist infinitely many cubic polynomials (or quartic or quintic, etc.) that pass through each point. In general, if you have n+1 points, you are guaranteed to be able to find a unique polynomial of degree n that is the only such polynomial that passes through each point.

Guarantees and uniqueness are cool and all, but actually constructing the polynomials from those points can be tricky (but still "quickly" doable in a computational sense)! Luckily for our case, the points won't be totally arbitrary. Let's look back at our Toom-Cook example of multiplying 47*81. The polynomial product is (7+4x)*(1+8x), which we know should be a quadratic equation. Then if we evaluate this product at three easy points, we should be able to reconstruct the result. Specifically, let's evaluate at x=-1, x=0, and x=1:
(7-4)*(1-8)=-21
7*1=7
(7+4)*(1+8)=99

The easiest way to reconstruct the polynomial by hand, in my opinion, is to use matrices to represent a system of equations. In this case, with ai being the points we evaluated at -1,0, and 1:

[[  1   (-1)1    (-1)2   a0]
 [  1   01       02      a1]
 [  1   11       12      a2]]


   [[  1  -1   1   a0]
=   [  1   0   0   a1]
    [  1   1   1   a2]]

And now we reduce this to reduced row-eschelon form.

   [[  0   1   0   (a2-a0)/2]
=>  [  1   0   0   a1]
    [  1   0   1   (a2+a0)/2]]


   [[  0   1   0   (a2-a0)/2]
=>  [  1   0   0   a1]
    [  0   0   1   (a2+a0)/2-a1]]


Now that we know the resulting coefficients, let's construct our polynomial:
a1+(a2-a0)x/2+((a2+a0)/2-a1)x2
=7+60x+32x2
Let's evaluate this at x=10 and we have 3200+600+7 = 3807 which is indeed 47*81.

The nice thing about this is, since we've already solved the system of equations, we can reuse this and we don't need to compute it every time. We just compute the product at -1, 0, and , then plug it into our formula et voila. In fact, let's make a BASIC program that multiplies two ten-digit numbers and returns a 20-digit result. Input is passed through two 2-element lists, L1 and L2, and use those elements as 5-digit 'digits' to compute the product.
Code: [Select]
{1.-1
sum(L1Ans)sum(L2Ans->A
L1(1)L2(1->B
sum(L1)sum(L2->C
.5(C-A
{B,Ans,Ans+A-B->L1
0->C
L1e-5->L1    ;engineering E
For(K,1,3
C+L1(k->L1(K
e-5int(Ans->C
End
C->L1(4
e5fPart(L1->L1
The output is a 20-digit integer comprised of four 5-digit numbers. The first element is the lowest 5 digits, the fourth element is the upper 5 digits.
To square a 3-digit number, we'll result in a degree-4 polynomial. Consequently, we'll need to evaluate at 5 points. The points I chose were {0,1,i,-1,-i} where 'i' is the imaginary unit. It may seem like evaluating at a complex point would be, well, complex, but we'll see. First, let's write our polynomial as a0+a1x+a2x2.
Then:
b0=f(0)=a02
b1=f(1)=(a0+a1+a2)2
b2=f(i)=(a0+a1i-a2)2
b3=f(-1)=(a0-a1+a2)2
b4=f(-i)=(a0-ia1-a2)2
Squaring complex numbers is easy though-- (a+bi)2= a2-b2+2abi = (a-b)(a+b)+2abi
So simplifying:
b2=(a0-a2+a1)(a0-a2-a1)+2i(a0-a2)a1
b4=(a0-a2+a1)(a0-a2-a1)-2i(a0-a2)a1

Noticing symmetry, let's define instead:
c3=(a0-a2+a1)(a0-a2-a1)
c4=2(a0-a2)a1

Then:
b2=c3+ic4
b4=c3-ic4


    1       0       0       0       0   b0
    1       1       1       1       1   b1
    1       i      -1      -i       1   c3+ic4
    1      -1       1      -1       1   b3
    1      -i      -1       i       1   c3-ic4


    1       0       0       0       0   b0
    1       0       1       0       1   (b1+b3)/2 = c1
=>  0       1       0       1       0   (b1-b3)/2 = c2
    1       0      -1       0       1   c3
    0       1       0      -1       0   c4


    1       0       0       0       0   b0
    1       0       0       0       1   (c1+c3)/2 = d1
=>  0       0       1       0       0   (c1-c3)/2 = d2
    0       1       0       0       0   (c2+c4)/2 = d3
    0       0       0       1       0   (c2-c4)/2 = d4


Then our polynomial is:
b0+d3x+d2x2+d4x3+(d1-b0)x4


So for an example, let's find 348*348
Then we have:
a0=8
a1=4
a2=3
b0=64
b1=(8+4+3)2=225
b3=(8-4+3)2=49
c3=(8-3+4)(8-3-4)=9
c4=2(8-3)*4=40
c1=(225+49)/2 = 137
c2=c1-49 = 88
d1=(137+9)/2 = 73
d2=d1-9 = 64
d3=(88+40)/2 = 64
d4=d3-40 = 24

So then, our result is:
64+64x+64x2+24x3+9x4
Evaluating at x=10 eyelids yields:
64+640+6400+24000+90000 = 121104, which is indeed 384*384

Now let's generate some Python code:
Code: [Select]
def sqr3(a, b, c):
    d = a + b + c    # roughly 75% of cases exceed the original bit-width
    e = a - b + c    # roughly 1/6 of the time this will exceed the original bit-width. However, whenever this exceeds the bit-width, then d also exceeds it.
    d *= d
    e *= e
    d = (d + e) / 2
    e = d - e
    f = (a + b - c) * (a - b - c)   # it is never the case that both multiplicands exceed  their original word width, but about 1/3 of the time, one of them exceeds it (~2/3 of the time, neither exceed).
    b *= (a - c)     # neither multiplicand exceeds the original bit-width (though the second operand may become negative).
    b += b
    a *= a
    e = (e + b) / 2
    b = e - b
    d = (d - f) / 2
    f += d - a
    return [a, e, d, b, f]
Note that three of the sub-multiplications are just squaring the input. Also note that all divisions are by a power of two, and all results and operations are integer operations, not complex (assuming all inputs are integers).

20
Computer Programming / How To Add Two Sprites
« on: March 14, 2018, 01:39:13 pm »
For a few years now, I've been playing on-and-off with various ideas related to parsing and evaluating expressions. Today as I was working out the code for adding two values, I came across the case of adding a 'sprite' variable to some other variable.

If you had a programming language where you had two sprite variables, what would you want or expect 'sprite1+sprite2' to do, if anything? For that matter, how about adding an integer or float to it? How about multiplication or division?

I know, for example, that I would want 'sprite1 or sprite2' to return a sprite with it's contents bit-wise ORed together.

21
TI Z80 / Hash-Based Instruction Lookup
« on: November 04, 2017, 11:24:57 am »
So I was working on some code to take an alphanumeric input string and look it up in a table and do something. After rehashing a binary search algorithm for the nth time, I remembered an old idea that I had for finding a string in set of data.
Spoiler For Spoiler:
I could basically speed up the search by xor-ing each byte of the search string and using that value is an indicator of a potential match. If the input is n bytes long, then I xor the first n starting bytes of the data to search. If the XORed value matches, double check that you haven't already found a match! Otherwise, set variables tail=data_start and head=data_start+n and x as the XORed value of the input, and acc as the XORed value of the first n bytes of the data. Now XOR the byte at tail with acc and store it back to acc. Increment tail, increment head. XOR the byte at head with acc and store it back to acc. If acc==x then double check the string between tail and head isn't a match, otherwise, continue until head goes beyond the data.
After some poking around on Google, I came across a hash based search algorithm that was just the insight that I needed!

So I came up with some sample commands and I was lucky enough to find a simple hashing function that provided a unique value for all 46 functions.

I'm going to provide the code here, even though it isn't very pretty. This way if I ever lose it, I'll have this as reference :)
This example executes the command "Disp" which I just made display everything after it.
Code: [Select]
#include "grammer3.inc"

_DispHL = 4507h
_NewLine = 452Eh
_PutS   = 450Ah
#define bcall(x) rst 28h \ .dw x
scrap = 8000h
denom = scrap
numer = scrap+4
out   = scrap+8
.db $BB,$6D
.org $9D95
    ld hl,testinput
    ld bc,testinput_end-testinput
    call hashlookup
    jr c,+_
    ld hl,s_NOTFOUND
    bcall(_PutS)
    ret
_:
    inc de \ inc de ; Now DE points to the pointer to the parsing code
    ex de,hl
    ld a,(hl) \ inc hl \ ld h,(hl) \ ld l,a
    ex de,hl
    cpi
    ret po
    ;HL points to the input, BC is the size left
    ;DE points to the code that parses the instruction
    push de
    ret
testinput:
    .db "Disp Yay, it works!",0
testinput_end:
hashlookup:
;HL points to the input
;BC is the max size
;return nc if failed to match, c if success
    ld (input_save),hl
    ld (input_savesize),bc
    ld a,b
    or c
    jr z,match_null
    call computehash
    ld hl,(input_savesize)
    xor a
    sbc hl,bc
    jr z,match_fail
    ld b,h
    ld c,l
    ld d,a
    ex de,hl
    add hl,hl
    ld (hash),hl


    ld de,hashlut_builtin
    add hl,de
    ld e,(hl)
    inc hl
    ld d,(hl)

    ld hl,(input_save)
;BC is the input size
;HL points to the input string
;DE points to the comparison
    ld a,(de)
    cp c
    jr nz,match_fail
    inc de
    ld b,c

_:
    ld a,(de)
    inc de
    cp (hl)
    jr nz,match_fail
    inc hl
    djnz -_
    scf
    ret
match_null:
    ld de,t_null+1
match_fail:
    ld hl,(input_save)
    ld bc,(input_savesize)
    or a
    ret
computehash:
    ld e,0
_:
    ld a,(hl)
    sub 48
    ret c
    cp 10
    jr c,$+5
    sub 7
    ret c
    cp 68
    ret nc
    ld d,a
    add a,a
    add a,d
    xor e
    ld e,a
    cpi
    jp pe,-_
    ret
   
   
hashlut_builtin:
.dw t_null              ;00
.dw t_null              ;01
.dw t_null              ;02
.dw t_null              ;03
.dw t_null              ;04
.dw t_null              ;05
.dw t_null              ;06
.dw t_null              ;07
.dw t_null              ;08
.dw t_null              ;09
.dw t_null              ;0a
.dw t_cosh              ;0b
.dw t_null              ;0c
.dw t_null              ;0d
.dw t_null              ;0e
.dw t_null              ;0f
.dw t_null              ;10
.dw t_null              ;11
.dw t_atan              ;12
.dw t_null              ;13
.dw t_sinh              ;14
.dw t_null              ;15
.dw t_null              ;16
.dw t_null              ;17
.dw t_null              ;18
.dw t_null              ;19
.dw t_null              ;1a
.dw t_null              ;1b
.dw t_sqrt              ;1c
.dw t_null              ;1d
.dw t_null              ;1e
.dw t_max               ;1f
.dw t_null              ;20
.dw t_null              ;21
.dw t_ClrDraw           ;22
.dw t_null              ;23
.dw t_null              ;24
.dw t_null              ;25
.dw t_null              ;26
.dw t_null              ;27
.dw t_null              ;28
.dw t_Ellipse           ;29
.dw t_null              ;2a
.dw t_null              ;2b
.dw t_null              ;2c
.dw t_null              ;2d
.dw t_null              ;2e
.dw t_null              ;2f
.dw t_null              ;30
.dw t_null              ;31
.dw t_null              ;32
.dw t_null              ;33
.dw t_null              ;34
.dw t_null              ;35
.dw t_null              ;36
.dw t_null              ;37
.dw t_null              ;38
.dw t_null              ;39
.dw t_ln                ;3a
.dw t_null              ;3b
.dw t_null              ;3c
.dw t_pi                ;3d
.dw t_null              ;3e
.dw t_null              ;3f
.dw t_null              ;40
.dw t_null              ;41
.dw t_null              ;42
.dw t_null              ;43
.dw t_null              ;44
.dw t_null              ;45
.dw t_null              ;46
.dw t_null              ;47
.dw t_null              ;48
.dw t_null              ;49
.dw t_null              ;4a
.dw t_null              ;4b
.dw t_null              ;4c
.dw t_null              ;4d
.dw t_null              ;4e
.dw t_null              ;4f
.dw t_null              ;50
.dw t_null              ;51
.dw t_null              ;52
.dw t_null              ;53
.dw t_null              ;54
.dw t_null              ;55
.dw t_null              ;56
.dw t_null              ;57
.dw t_null              ;58
.dw t_null              ;59
.dw t_null              ;5a
.dw t_null              ;5b
.dw t_null              ;5c
.dw t_null              ;5d
.dw t_null              ;5e
.dw t_null              ;5f
.dw t_null              ;60
.dw t_null              ;61
.dw t_null              ;62
.dw t_null              ;63
.dw t_null              ;64
.dw t_null              ;65
.dw t_null              ;66
.dw t_null              ;67
.dw t_null              ;68
.dw t_randint           ;69
.dw t_asinh             ;6a
.dw t_null              ;6b
.dw t_tan               ;6c
.dw t_null              ;6d
.dw t_null              ;6e
.dw t_null              ;6f
.dw t_null              ;70
.dw t_null              ;71
.dw t_null              ;72
.dw t_null              ;73
.dw t_null              ;74
.dw t_acosh             ;75
.dw t_null              ;76
.dw t_null              ;77
.dw t_null              ;78
.dw t_null              ;79
.dw t_null              ;7a
.dw t_null              ;7b
.dw t_null              ;7c
.dw t_null              ;7d
.dw t_null              ;7e
.dw t_null              ;7f
.dw t_null              ;80
.dw t_atanh             ;81
.dw t_null              ;82
.dw t_null              ;83
.dw t_null              ;84
.dw t_null              ;85
.dw t_Line              ;86
.dw t_sin               ;87
.dw t_null              ;88
.dw t_null              ;89
.dw t_e                 ;8a
.dw t_null              ;8b
.dw t_null              ;8c
.dw t_mod               ;8d
.dw t_null              ;8e
.dw t_null              ;8f
.dw t_null              ;90
.dw t_min               ;91
.dw t_Circle            ;92
.dw t_gcd               ;93
.dw t_null              ;94
.dw t_null              ;95
.dw t_null              ;96
.dw t_null              ;97
.dw t_cos               ;98
.dw t_null              ;99
.dw t_null              ;9a
.dw t_null              ;9b
.dw t_null              ;9c
.dw t_null              ;9d
.dw t_null              ;9e
.dw t_null              ;9f
.dw t_null              ;a0
.dw t_log2              ;a1
.dw t_null              ;a2
.dw t_Tilemap           ;a3
.dw t_log10             ;a4
.dw t_null              ;a5
.dw t_null              ;a6
.dw t_null              ;a7
.dw t_null              ;a8
.dw t_Text              ;a9
.dw t_null              ;aa
.dw t_null              ;ab
.dw t_null              ;ac
.dw t_null              ;ad
.dw t_Disp              ;ae
.dw t_null              ;af
.dw t_null              ;b0
.dw t_null              ;b1
.dw t_null              ;b2
.dw t_null              ;b3
.dw t_null              ;b4
.dw t_null              ;b5
.dw t_null              ;b6
.dw t_null              ;b7
.dw t_DispBuf           ;b8
.dw t_lcm               ;b9
.dw t_setseed           ;ba
.dw t_null              ;bb
.dw t_null              ;bc
.dw t_null              ;bd
.dw t_null              ;be
.dw t_null              ;bf
.dw t_pow10             ;c0
.dw t_null              ;c1
.dw t_null              ;c2
.dw t_null              ;c3
.dw t_null              ;c4
.dw t_pow2              ;c5
.dw t_null              ;c6
.dw t_null              ;c7
.dw t_null              ;c8
.dw t_null              ;c9
.dw t_null              ;ca
.dw t_null              ;cb
.dw t_null              ;cc
.dw t_null              ;cd
.dw t_null              ;ce
.dw t_null              ;cf
.dw t_null              ;d0
.dw t_null              ;d1
.dw t_null              ;d2
.dw t_null              ;d3
.dw t_null              ;d4
.dw t_null              ;d5
.dw t_null              ;d6
.dw t_null              ;d7
.dw t_null              ;d8
.dw t_null              ;d9
.dw t_null              ;da
.dw t_null              ;db
.dw t_null              ;dc
.dw t_Shiftbuf          ;dd
.dw t_randseed          ;de
.dw t_null              ;df
.dw t_null              ;e0
.dw t_null              ;e1
.dw t_exp               ;e2
.dw t_Output            ;e3
.dw t_null              ;e4
.dw t_Sprite            ;e5
.dw t_acos              ;e6
.dw t_null              ;e7
.dw t_Rect              ;e8
.dw t_null              ;e9
.dw t_null              ;ea
.dw t_null              ;eb
.dw t_null              ;ec
.dw t_rand              ;ed
.dw t_null              ;ee
.dw t_null              ;ef
.dw t_null              ;f0
.dw t_null              ;f1
.dw t_null              ;f2
.dw t_null              ;f3
.dw t_null              ;f4
.dw t_null              ;f5
.dw t_null              ;f6
.dw t_null              ;f7
.dw t_null              ;f8
.dw t_asin              ;f9
.dw t_null              ;fa
.dw t_null              ;fb
.dw t_null              ;fc
.dw t_null              ;fd
.dw t_null              ;fe
.dw t_tanh              ;ff
t_null:
.db 0
t_cosh:
.db 4,"cosh"
#include "commands\cosh.z80"
t_atan:
.db 4,"atan"
#include "commands\atan.z80"
t_sinh:
.db 4,"sinh"
#include "commands\sinh.z80"
t_sqrt:
.db 4,"sqrt"
#include "commands\sqrt.z80"
t_max:
.db 3,"max"
#include "commands\max.z80"
t_ClrDraw:
.db 7,"ClrDraw"
#include "commands\ClrDraw.z80"
t_Ellipse:
.db 7,"Ellipse"
#include "commands\Ellipse.z80"
t_ln:
.db 2,"ln"
#include "commands\ln.z80"
t_pi:
.db 2,"pi"
#include "commands\pi.z80"
t_randint:
.db 7,"randint"
#include "commands\randint.z80"
t_asinh:
.db 5,"asinh"
#include "commands\asinh.z80"
t_tan:
.db 3,"tan"
#include "commands\tan.z80"
t_acosh:
.db 5,"acosh"
#include "commands\acosh.z80"
t_atanh:
.db 5,"atanh"
#include "commands\atanh.z80"
t_Line:
.db 4,"Line"
#include "commands\Line.z80"
t_sin:
.db 3,"sin"
#include "commands\sin.z80"
t_e:
.db 1,"e"
#include "commands\e.z80"
t_mod:
.db 3,"mod"
#include "commands\mod.z80"
t_min:
.db 3,"min"
#include "commands\min.z80"
t_Circle:
.db 6,"Circle"
#include "commands\Circle.z80"
t_gcd:
.db 3,"gcd"
#include "commands\gcd.z80"
t_cos:
.db 3,"cos"
#include "commands\cos.z80"
t_log2:
.db 4,"log2"
#include "commands\log2.z80"
t_Tilemap:
.db 7,"Tilemap"
#include "commands\Tilemap.z80"
t_log10:
.db 5,"log10"
#include "commands\log10.z80"
t_Text:
.db 4,"Text"
#include "commands\Text.z80"
t_Disp:
.db 4,"Disp"
#include "commands\Disp.z80"
t_DispBuf:
.db 7,"DispBuf"
#include "commands\DispBuf.z80"
t_lcm:
.db 3,"lcm"
#include "commands\lcm.z80"
t_setseed:
.db 7,"setseed"
#include "commands\setseed.z80"
t_pow10:
.db 5,"pow10"
#include "commands\pow10.z80"
t_pow2:
.db 4,"pow2"
#include "commands\pow2.z80"
t_Shiftbuf:
.db 8,"Shiftbuf"
#include "commands\Shiftbuf.z80"
t_randseed:
.db 8,"randseed"
#include "commands\randseed.z80"
t_exp:
.db 3,"exp"
#include "commands\exp.z80"
t_Output:
.db 6,"Output"
#include "commands\Output.z80"
t_Sprite:
.db 6,"Sprite"
#include "commands\Sprite.z80"
t_acos:
.db 4,"acos"
#include "commands\acos.z80"
t_Rect:
.db 4,"Rect"
#include "commands\Rect.z80"
t_rand:
.db 4,"rand"
#include "commands\rand.z80"
t_asin:
.db 4,"asin"
#include "commands\asin.z80"
t_tanh:
.db 4,"tanh"
#include "commands\tanh.z80"

poparg:
    ret
tostr:
    ret
s_NOTFOUND:
    .db "Command Not Found",0
input_save:
    .dw 0
input_savesize:
    .dw 0
hash:
    .dw 0
The code for the Disp.z80 file, already convoluted for your viewing pleasure :P
Code: [Select]
    .dw __tokenizeDisp  ;First code generates the token, returns a pointer to the name in HL, size of the name in BC
    .dw __parsecodeDisp
    .dw __compiledcodeDisp
    .dw __helpDisp

__tokenizeDisp:
    ld hl,+_
    ld bc,1
    ret
_:
    .db tokGraphics+0
__helpDisp:
    .db 0
__compiledcodeDisp:
    .db _poparg
    .db _tostr
    .db _inlineasm \ .dw +_-$-2
    bcall(_PutS)
    ret
_:
__parsecodeDisp:
;    call poparg
;    call tostr
    bcall(_PutS)
    bcall(_NewLine)
    ret
Edit:
Oops, and the grammer3.inc file (don't get your hopes up, it's just a test :P).
Code: [Select]
tokVars     = $00   ;allows for 32 variable types (int, str, float, etc.)
tokGraphics = $20   ;allows for 32 basic graphics commands (plot(), Pxl(), Rect(), etc.)
tokControl  = $40   ;allows for 32 control commands (If, Goto, etc.)
tokMath     = $60   ;allows for 128 basic math commands
tokExtend   = $E0
tokInclude  = $F0

var_uint8   = 0
var_int8    = 1
var_uint16  = 2
var_int16   = 3
var_uint32  = 4
var_int32   = 5
var_uint64  = 6
var_int64   = 7
var_float   = 8
var_floatext= 9     ;extended precision float
var_fixed88 = 10    ;8.8 fixed point number, optimized for speed over var_fixed
var_fixed1616=11    ;16.16 fixed point number, optimized for speed over var_fixed
var_fixed   = 12    ;customized size for integer and fractional part, up to 64 bits total
var_rational= 13    ;customized size for the numerator and denominator of: int8, int16, int32, and int64. Denominator is always unsigned.
var_complex = 14    ;complex pair of floats.
var_complexext=15   ;complex pair of extended precision floats.
var_gaussian= 16    ;complex pair of integers. Customized size for the real and imaginary parts: int8, int16, int32, and int64.
var_gaussian8=17
var_gaussian16=18


var_sprite  = 28
var_array   = 29
var_list    = 30
var_str     = 31


_poparg     = 0
_pusharg    = 1
_tostr      = 2
_inlineasm  = 3

22
Math and Science / A faster Newton's Method Square Root
« on: October 10, 2017, 02:04:50 pm »
EDIT 2022-06-29: The post below has a much needed improvement :)

I decided to compute some square roots by hand to keep my tired self awake and alert at work. I opted to use Newton's Method this time and I noticed an optimization!
Instead of attaining 2N precision with a 2N-by-N->2N division, I split it up into an N-by-N multiply and a N-by-N->N division, which on the Z80 or similar processor can be nearly twice as fast.


The standard algorithm works as follows:

To find the square root of c, start with a guess called x.
Iterate the following: (x+c/x)/2


Each iteration doubles the digits of precision; try it on your calculator if you aren't familiar with it! To fin the square root of 21, guess '5'. Type 5, hit [Enter]. Then do .5(Ans+21/Ans and hit [Enter] a few times.

The way it works is, if the current x is bigger than the square root of c, then c/x is less, and vice-versa. So averaging the two gets a closer estimate to the actual square root. My observation was that if I had, say, 3 digits precision, then those digits stay the same in the next iteration. Through some observations, this would mean the first 3 digits of c/x will match that of x. In fact, for c/x, I'll only need to compute it to 6 digits, but if I can skip the first 3, then the division is easier and faster! So, lets optimize:


=>.5(x+c/x)
=>.5(x+x+(c/x-x))
=>x+.5(c-x*x)/x

So it's probably not so clear, but essentially for 2n-digits precision, I need to square an n-digit number and divide an n-digit number by n digits, to n digits precision. This replaces a 2n/2n division.

The former is faster, so let's try an example of finding the square root of 21:

c=21
Guess: x=5
(c-x*x) = -4
-4/5/2 = -.4
x-.4 = 4.6

(c-x*x) = 21-4.6*4.6 = -.16
-.16/4.6/2 = -.0173913... ~.02
x-.02 = 4.58

(c-x*x) = 21-4.58*4.58 = .0236
.0236/4.58/2 = .0025764... ~.0026
x+.0026 = 4.5826

(c-x*x) = 21-4.5826*4.5826 = -.00022276
-.00022276/4.5826/2 = -.000024304979... ~-.000024305
x-.000024305 = 4.582575695



In practice by hand, it actually doesn't save much time, but that's because by hand I usually can perform a 2n-by-2n division a little slower than half that of a 2n-by-2n and I can do multiplication and division in roughly the same speed. So overall This gets me about 20% faster.

On the Z80, a 2N-by-N->2N division takes roughly the time of 1.6 2N-by-2N->4N multiplies. As well, a 2N-by-2N->4N multiply takes roughly 3.5 times the time of an N-by-N->2N multiply using recursive Karatsuba multiplication.

So the standard algorithm takes roughly 5.6 N-by-N->2N multiplies worth of time.
The modified algorithm takes roughly 2.9 N-by-N->2N multiplies worth of time.

23
TI Z80 / Shunting-Yard Algorithm
« on: September 27, 2017, 08:37:33 pm »
So I have been really wanting to implement the Shunting-Yard algorithm for a few years now and I think I've got it in assembly. I keep finding bugs, but in case other people are interested, here is my code. It assumes that the operator stack and output (combined) won't exceed 768 bytes.

Things that would be useful:
  • Converting numbers to an intermediate format.
  • Handling multi-byte tokens and named vars and functions.
This way the code can be pseudo-compiled when it is converted to RPN, then an RPN calculator can parse the code faster or prep it for actual compiling.
Code: [Select]
#define bcall(x) rst 28h \ .dw x
saveSScreen = 86ECh
scrap=saveSScreen
outhead=8000h
stackhead = 8002h
.db $BB,$6D
.org $9D95
    ld hl,test
    ld bc,test_end-test
    call shuntingyard
    call rpn
    ld hl,scrap
    bcall(450Ah)
    bcall(452Eh)
    ret
shuntingyard:
    ld de,scrap
    ld (outhead),de
    ld d,(scrap/256)+3
    ld (stackhead),de
_:
    ld a,(hl)
    call +_
    cpi
    jp pe,-_
    ld hl,scrap+768
    ld de,(stackhead)
    or a
    sbc hl,de
    ld b,h
    ld c,l
    ld hl,(outhead)
    ex de,hl
    jr z,$+3
    ldir
    dec de
    xor a
    ld (de),a
    ret
_:
    cp '.'
    jp z,num_dec
    cp 30h
    jr c,+_
    cp 3Ah
    jp c,num
_:
    cp '('
    jp nz,+_
    ex de,hl
    ld hl,(stackhead)
    dec hl
    ld (hl),','
    dec hl
    ld (hl),a
    ld (stackhead),hl
    ex de,hl
    ret
_:
    cp ')'
    jp nz,checkunops
    push hl
    push bc
    ld hl,scrap+768
    ld de,(stackhead)
    sbc hl,de
    jp z,ERR_Unmatched_lparens
    ld b,h
    ld c,l
    ex de,hl
    ld de,(outhead)
;BC is the size of the stack. Use this in case there is a missing ')' so we don't read garbage.
;basically search for the matching '(' while piping out the stack to the output.
outerloop:
    ld a,(hl)
    cp '('
    jr z,parens_found
    ld a,','
_:
    cp (hl)
    ldi
    jp po,ERR_Unmatched_lparens
    jr z,outerloop
    jp -_
parens_found:
    inc hl
    inc hl
    ld (outhead),de
    ld (stackhead),hl
    pop bc
    pop hl
    ret
checkunops:
checkbinops:
;; if the token is an operator, then:
;; while there is an operator at the top of the operator stack with
;; greater than or equal to precedence and the operator is left associative:
;; pop operators from the operator stack, onto the output queue.
;; push the read operator onto the operator stack.
;;
;;
    push bc
    ex de,hl
    call getprecedence
    ld a,c
    pop bc
    ex de,hl
    jp c,search_function
    ;now C is the precedence, with lower bit = 1 if left-associative
    push bc
    push hl
    ld de,(stackhead)
    ld hl,scrap+768
    sbc hl,de
    ld b,h
    ld c,l
    ld hl,(outhead)
    ex de,hl
    jr z,pushop
    ;a is the precedence against which to compare
_:
    push hl
    push bc
    push af
    ld a,(hl)
    call getprecedence
    jr c,+_
    pop hl
    ld a,h      ;incoming
    cp c
    jr nz,$+4
    rra \ nop

    pop bc
    pop hl
   
 ;======================================================
    jr nc,pushop
.echo "The following code only works until we have to add >1 byte tokens."
 ldi
 ldi
    jp pe,-_
    jp $+6
_:
    pop af
    pop bc
    pop hl
pushop:
    ld (outhead),de
    pop de
    dec hl
    ld (hl),','
    dec hl
    ld a,(de)
    ld (stackhead),hl
    ld (hl),a
    ex de,hl
    pop bc
    ret
search_function:
    jp ERR_Func_Not_Found
getprecedence:
    ld hl,binops
    ld b,(binops_end-binops)/2
_:
    cp (hl)
    inc hl
    ld c,(hl)
    ret z
    inc hl
    djnz -_
    scf
    ret
binops:
    .db 4,  $01
    .db '=',$50
    .db '|',$60
    .db '&',$70
    .db '-',$81     ;right associative is odd
    .db '+',$80     ;left associative is even
    .db '/',$83     ;right associative
    .db '*',$82     ;left associative
    .db '^',$85     ;right associative
binops_end:
num:
    ld de,(outhead)
_:
    ldi
    jp po,+_
    ld a,(hl)
    cp '.'
    jr z,num_dec+4
    cp 30h
    jr c,+_
    cp 3Ah
    jr c,-_
_:
    ld a,','
    ld (de),a
    inc de
    ld (outhead),de
    dec hl
    inc bc
    ret
num_dec:
    ld de,(outhead)
_:
    ldi
    jp po,+_
    ld a,(hl)
    cp 30h
    jr c,+_
    cp 3Ah
    jr c,-_
_:
    cp '.'
    jp z,ERR_Syntax_00
    ld a,','
    ld (de),a
    inc de
    ld (outhead),de
    dec hl
    inc bc
    ret
ERR_Syntax_00:      ;Too many decimal points.
ERR_Func_Not_Found:
ERR_Unmatched_lparens:
    ret
rpn:
    ret
test:
;    .db "(3.1415926535)"
.db "(3.142+6/2-7)*3^6*3"
test_end:

24
Hi all, I lost my previous references and example programs and it took me this morning to locate this algorithm, digest it, and spew out my own version.  I looked on all of my calculators and Omni first, so I'm going to post it here for when it happens again :P

Anyways, this is one of my favorite algorithms for evaluating logarithms:

Code: [Select]
;Natural Logarithm on [.5,2]
;Single precision
a=.5(1+x)
g=.5(a+x/a) ;half precision divide
g=.5(g+x/g) ;full precision divide
b=a
a=(a+g)/2
c=.5(a+g)
g=.5(c+a*g/c) ;full precision divide
c=a
b = a-b/4
a=(a+g)/2
c = a-c/4-b/16
return (x-1)(1-1/4)(1-1/16)/c
  • It achieves single precision accuracy (at least 24.4996 bits) on the range of inputs from [.5,2].
  • During range reduction, x is usually reduced to some value on [c,2c].
    • The best precision is found when c=.5sqrt(2) (range is [.5sqrt(2),sqrt(2)], achieving at least 31.5 bits
    • I prefer c=2/3 since 2/3 and 4/3 are equidistant from 1-- it makes it easier for me to analyze time complexity. This still offers at least 29.97 bits, which is better than single precision
  • Cost is:
    • amean: 7 . 'amean' is the same cost as an add in binary floats
    • half divide: 1
    • full divide: 3
    • multiply:    1
    • shift-by-2:  3
    • shift-by-4:  2. This is sightly more efficient on the Z80 than 4 single-shifts when values are in RAM
    • add/sub:     5
    • add/sub const:1

I derived this algorithm from this wonderful paper which is annoyingly never at the top of a Google search and in fact took me a loooong time to ever stumble upon it, sadly.

This paper greatly accelerates the classic Borchardt-Gauss algorithm to be on par with the AGM algorithm. At their core, both algorithms perform an arithmetic and geometric mean, but AGM requires them to be done in parallel (emulated on single-core processors by some simple variable juggling), whereas B-G does them sequentially. As well, AGM achieves quadratic convergence or better (I've seen some exponential convergence in specific special cases), whereas classic B-G usually achieves linear convergence. Carlson's version of the B-G algorithm adds O(log(n)^2) additions and O(log(n)) space for quadratic convergence (where n is the number of desired bits of accuracy).

I like the B-G-C algorithm since I can easily obtain the inverse trig functions and inverse hyperbolic functions as well as the natural logarithm.

25
Sorry, I'm on my phone so I'll probably not go too in-depth on this :( Bug me for details if I don't get around to it and you need them :P

So:
Given ##x\in[-.5ln2,.5ln2]##
Let ##y=x^{2}##
Let ##a=\frac{x}{2}\frac{1+\frac{5y}{156}\left(1+\frac{3y}{550}\left(1+\frac{y}{1512}\right)\right)}{1+\frac{3y}{26}\left(1+\frac{5y}{396}\left(1+\frac{y}{450}\right)\right)}##
Then ##e^{x}\approx\frac{1+a}{1-a}##

Accuracy is ~75.9666 bits.
7 increments, 1 decrement
6 constant multiplications
6 general multiplications
2 general divisions
1 div by 2 (right shift or decrement an exponent)

For comparison, that's comparable to 16 terms of the Taylor's series, or 8 terms of the standard Padé expansion (exponential is special in that it comes out to f(x)/f(-x) so it can be done even easier than most).

I basically carried out a Padé expansion for e^x to infinity, noticed that after the constant term all the even coefficients were zero, so I used a Padé expansion on that function to quickly find our approximation for a.

In my usage, I actually implemented a 2x function since I'm using binary floats with 64-bit precision. I take int(x) and save that for a final exponent for the float. Remove that value from x. By definition of int(), x is now non-negative. If x≥.5, increment that saved exponent, subtract 1 from x. Now x is on [-.5,.5]. Now we need to perform 2x, but that is equivalent to ex*ln(2). We've effectively applied range reduction to our original input and now we can rely on the algorithm at the top. That integer part that we saved earlier now gets added to the exponent byte, et voilà !

I think when I calculated max error using my floats it was accurate up to the last two bits or something. Don't quote me on that as I don't have those notes on me at the moment and it was done months ago.

26
TI Z80 / Floatlib
« on: May 14, 2016, 06:44:13 pm »
EDIT 14 January 2019 : I put this project up on GitHub (Floatlib), incorporating the z80float library that I have been updating and fixing and adding to. Floatlib now has extended-precision support as well as a more complete library of single-precision float routines!

I made a cute little app for my calc that I thought I would share! It basically puts all of my single-precision floats into one app and provides a somewhat easy way to access the routines through a jump table.

It even features an in-app reference for the routines which is useful for when I'm programming on my calc, as well as a hexadecimal translation of the header just in case I forget the code.

I included two examples. The first simply computes and displays eπ. The second uses an algorithm similar to that of the Arithmetic-Geometric Mean algorithm to compute arctangent. It takes the input number as a string in Ans.

Warning: I've run into a few bugs (some are documented, others I haven't chased down yet). Pro-tip: Don't store your program variables in the range of scrap to about scrap+30. The default value for scrap is 8000h. If you change this, please note that the app expects the scrap region to be 256 bytes.

27
ASM / [help] App Signing, SPASM
« on: February 09, 2016, 10:14:48 am »
My computer is a Rasberry Pi with Debian, and I'm trying to create and sign apps with SPASM. Unfortunately I get the error that appsigning isn't available in my build of Spasm. So then I installed rabbitsign, but rabbitsign says the number of app pages is wrong and when I try to send the resulting file to the calc, it has an error.

Can anybody help me? And is it possible for spasm to do it all (I know that's what I used to do with my old computer).

28
TI Z80 / Yet Another Code Parser
« on: February 08, 2016, 11:17:34 am »
As many of you know, I have a predilection for making code parsing engines. This time I don't have any grand expectations to finish this as a programming language. I will, however, offer my latest attempt for anybody who wants to make a custom parser.

Adding a binary operator
   Operations like ‘+’ and ‘-’ are considered binary operators. They take two inputs and produce one output. In this parser, binary operators are performed left-to-right. To add a new bin op, open up the source file 'binop.z80'. In the routine isBinOp, after the ld a,(de) line, insert binoperator(token,my_op). Then add the routine to the file somewhere like:
Code: [Select]
my_op:
;;checks if x<y
    xor a
    sbc hl,de
    ld h,a
    rla
    ld l,a
    xor a    ;sets type as uint16
    ret

Adding functions
This is a tad more complex of a process, but once you add a few, it's super easy. There are several built-in packages for different types of functions (math or graphics, for example). Each of these packages are listed in the look up table in the file "includes.z80". You can even add your own built-in packages if you want. Each package has an alphabetically sorted lookup table of functions and you will need to add your function to one of these tables. First however, let's add the code:
Code: [Select]
my_func00:
    .db "SPRITE",0    ;This is the name of the function

;;To make sure Ans is preserved, use this default code
    ld hl,(ans_ptr)
    push hl
    push af
    ld a,(ans_type)
    push af

;;Parse the arguments. Remember you can't modify DE or BC unless you save them.
;;parseNum returns A as the type, HL as the value. It checks to make sure the input was a number, and since only uint16 is supported, you don't need to check the type.
    call parseNum \ ld a,l \ push af
    call parseNum \ pop af \ ld h,a \ push hl
    call parseData
;;Now the stack holds the coordinates, HL points to the data string.
;;Save BC,DE so we can have full reign
    ld (parsePtr),de    ;Save these values
    ld (parseSize),bc   ;
    pop bc          ;B=X,C=Y
    call drawSprite

;;Now restore BC,DE
    ld bc,(parseSize)
    ld de,(parsePtr)

;This is the default ending. Jump to closeFUnc with
;A = ans_type
;H = opening token. Ex. '(' for SPRITE(x,y,data)
;<stack> holds ans_ptr
    pop hl
    pop af
    jp closeFunc
When that is finished, insert .dw my_func00 into the graphics LUT (located in graphics.z80). You need to insert this so that the LUT is in alphabetical order! Since "SPRITE" comes after "RECTXOR", you would insert it at the end of the LUT.

To Use:
Send prgmPARSER to your calc. You need Ans to contain the name of the program file to parse. For prgmTEST, use "ETEST" or "[TEST".

Included functions:
DISP x
RECTXOR(y,x,h,w)
ABS(x)

Binary ops:
+,-,*,/ are the traditional binary ops, currently only performed on uint16
plotdot, plotcross, plotbox are used as in Axe for 16-bit AND, OR, and XOR.

Things that would make this much better:

-Adding metadata after the function names to indicate stuff like whether it is a binary operator (so that 'x nCr y' can be added, for example.)
-An ASCII editor, since this is actually supposed to take ASCII input, not tokens.
-Variable support

29
TI Z80 / Balltrix - z80 ASM Remake
« on: January 09, 2016, 11:22:08 pm »
Updated Here

I've remade DJ Omnimaga's Balltrix in z80 Assembly. As requested in an older port, I've kept the original title screen.

The biggest issue that needs fixing is the name entry menu for highscores. It works, but it is super ugly and tedious. Some quirks/features are:
-The highscore info is saved directly to the program, so no external save file.
-Only 48 bytes of RAM are used to store the graphics data and none of the 768-byte buffers are modified by this program.
-This program spends a lot of it's time in a low-power halt state and operates at 6MHz.

I'm still in the process of tweaking and modifying the program, but I think I've got the control sensitivity almost perfect as well as the gradual increase in speed. My highest score so far is 87 ;D

30
TI Z80 / Mandelbrot Set explorer
« on: September 01, 2015, 09:00:55 am »
I had no internet for a few days, but at the same time I had a day off from work, so I made a Mandelbrot Set explorer! It uses 4.12 fixed point arithmetic. Here is a gif of it in action, and attached are stills of some deeper zooms.



I also made my own 4x4 mini font!

Pages: 1 [2] 3 4 ... 13