Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Topics - Xeda112358

Pages: 1 2 3 [4] 5 6 ... 13
46
Math and Science / A question about zeroes of the Zeta function
« on: October 28, 2014, 01:24:11 pm »

The problem:
Is there an ##s\in \mathbb{C}## such that ##\zeta(s)=\zeta(2s)=0##? If so, what is ##\lim_{z\rightarrow s}{\frac{\zeta(z)^{2}}{\zeta(2z)}}##?



Okee dokee, I know this is getting dangerously close to the Riemann Hypothesis, but I swear that wasn't my original intent!


I was playing around with infinite sums and products and I wanted to figure out the sum form of ##\prod_{p}{\frac{1+p^{-s}}{1-p^{-s}}}##. I started by knowing that ##\prod_{p}{(1-p^{-s})}^{-1} = \zeta(s)##, and from previous work, ##\prod_{p}{\sum_{k=0}^{n-1}{(p^{-s})^{k}}} = \frac{\zeta(s)}{\zeta(ns)}##. From that, I knew ##\prod_{p}{(1+p^{-s})} = \frac{\zeta(s)}{\zeta(2s)}##, thus I know the product converges when ##\zeta(2s) \neq 0, \zeta(s) \in \mathbb{C}##. I knew convergence wasn't an issue (for the most part) so I expanded the product some (a reversal of Euler's conversion from ##\zeta(s) = \sum_{k=1}^{\infty}{k^{-s}} = \prod_{p}{(1-p^{-s})}^{-1}##) and obtained:

##\prod_{p}{\frac{1+p^{-s}}{1-p^{-s}}}= \sum_{k=1}^{\infty}{2^{f(k)}k^{-s}}##,


 where ##f(k)## is the number of distinct prime factors of k. It turns out that this has been known since 1979 and after I had access to the internet, I figured out the typical symbol used for my ##f(k)## in this context is ##\omega(k)##. So that was cool, and I could write that sum and product in terms of the zeta function as ##\frac{\zeta(s)}{\zeta(2s)}\zeta(s) = \frac{\zeta(s)^{2}}{\zeta(2s)}##, so do with that what you will (I tried to use it to find the number of prime factors of a number n, but I didn't get anywhere useful). What I decided to pursue was when this function was zero. As long as ##\zeta(s)=0## and ##\zeta(2s)\neq 0##, we know that it is 0, but I haven't yet figured out if there is ever an instance where ##s\in \mathbb{C}, \zeta(s)=\zeta(2s)=0##. If there is such an s, then the Riemann Hypothesis is false. However, if such an s does not exist, this says nothing about the Riemann Hypothesis :P .
Spoiler For Why the existence of an s would prove RH false:
It is simple: RH says that the non-trivial zeros of ##\zeta## all satisfy ##Re(s)=\frac{1}{2}##. So if ##\zeta(s)=\zeta(2s)=0## at non-trivial zeros, assuming RH is true, then ##Re(s)=\frac{1}{2}## and ##Re(2s)=2Re(s)=1##, so 2s would be a non-trivial zero off the critical line, a contradiction to the Riemann Hypothesis which we just assumed was true!


As well, if there is such an s, I wanted to know if it could be one of those removable discontinuity thingies.
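
As an aside, the identity itself is easy to poke at numerically. Here is a quick Python check (my own sketch, using mpmath and sympy; the 20000-term cutoff is arbitrary) that the series really does match ##\frac{\zeta(s)^{2}}{\zeta(2s)}## when the real part of s is greater than 1:
Code: [Select]
# numerically compare zeta(s)^2/zeta(2s) against sum of 2^omega(k) * k^(-s)
from mpmath import zeta, mpf
from sympy import primefactors   # distinct prime factors, so len() gives omega(k)

s = 3   # any s with Re(s) > 1, where everything converges absolutely
lhs = zeta(s)**2 / zeta(2*s)
rhs = sum(2**len(primefactors(k)) * mpf(k)**(-s) for k in range(1, 20000))
print(lhs, rhs)   # these should agree to several decimal places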


EDIT: fixed a tag and an incorrect restatement of the Riemann Hypothesis :P

47
Math and Science / PrettyPrinting your Math Posts
« on: October 28, 2014, 12:24:57 pm »
If you want to post math stuff, it is a fantastic idea to take advantage of the [tex] <expression> [/tex] tags. Here is an example:

##e^{\sqrt{\frac{-3x}{2}}}## versus e^(sqrt(-3x/2)).


The code used in the tags is called LaTeX and you can look up the fun stuff you can do with it via Google. However, for a bit of a quick start, I will give the following nonsense expression and its code:


##\sqrt{3} \neq \frac{\pi^{2}}{6} = \sum_{k=1}^{\infty}{\frac{1}{k^{2}}} \gt \zeta \left(2+\epsilon\right), \epsilon \in \mathbb{R}, \epsilon>0##


The code is:
[tex]\sqrt{3} \neq \frac{\pi^{2}}{6} = \sum_{k=1}^{\infty}{\frac{1}{k^{2}}} \gt \zeta \left(2+\epsilon\right), \epsilon \in \mathbb{R}, \epsilon>0[/tex]


That might look scary, so let's break it down:
\sqrt{} will put a square root symbol over everything inside {}
\neq is the "not equals" sign. There are also \leq, \lt, \geq, \gt.
\frac{numerator}{denominator} makes an expression in fraction form.
\pi is the pi symbol
^{exponent} raises the expression to a power
= is the = symbol
\sum_{lower limit}^{upper limit}{expression} creates a sum from a lower to an upper limit (the limit arguments are optional). As a note, _{} creates a subscript and ^{} a superscript, if that helps to figure the stuff out.
\infty is the infinity symbol.
\gt is the greater than symbol
\zeta is the lower case zeta symbol (the uppercase is just the letter Z). LaTeX is case sensitive.
\left( is a left parenthesis. It gets sized according to what comes between it and its matching \right).
\in makes the "element of" symbol
\mathbb{} makes a "blackboard bold" character for certain special sets (N,Z,Q,A,R,C,H,O,S)


So for fun, here are some simpler codes and their outputs:
\left( 3^{2} \right)
##\left( 3^{2} \right)##


x_{n}
##x_{n}##


\frac{1}{k}
##\frac{1}{k}##


\sum_{k=1}^{n}{\frac{1}{k}}
##\sum_{k=1}^{n}{\frac{1}{k}}##

48
So I've been developing a way to compute the geometric mean of two numbers (defined by ##\sqrt{xy}##) and my original algorithm had a few snazzy properties:
-It is an add-shift algorithm
-No multiplication needed and so no risk of overflow
-No explicit square root operation performed

I played with the math a bit, and if I take it out of the realm of add-shift routines and allow division and multiplication, I came up with an algorithm that is relatively easy to do by hand, converges quartically (quadrupling the digits of accuracy at each step), and still has no overflow risk. In practice, on my TI-84+, I get full 14-digit precision in 2 iterations. The next iteration should theoretically get 56 digits, and each iteration requires 2 divisions, 1 inverse, and 1 multiplication (plus a shift and a few adds). Here is an example to show how it works:

Say we wish to find the geometric mean of two numbers, x=209, y=654321. First, for notation, I am just going to refer to the geometric mean as {x,y}, so {209,654321}. The first step is to find the first 'n' so that ##x\cdot 4^{n}## is bigger than y, so in this case, n=6.
Then ##\{x\cdot 2^{n},y\cdot 2^{-n}\}=\{x,y\}##, but now the numbers are much "closer." We get {13376,10223.765625}. Now factor out the largest power of 2 that fits in both numbers, which is 8192. We have 8192{1.6328125,1.248018265}.

All of that preemptive stuff has set up a new {x,y} where we have y<x<2y, so 1<x/y<2, or 1>y/x>.5. This will allow for some fine-tuned magic in a bit. What we should realize is that if we can get x=y somehow, then we have ##\{x,y\}=\{r,r\}=\sqrt{r^{2}}=r##. So notice that if we choose some number ##a##, we have ##\{xa,y/a\}=\sqrt{xa\cdot y/a}=\sqrt{xy}=\{x,y\}##. So what we need to do is find some ##a## such that ##xa=y/a##, so ##xa^{2}=y##, ##a=\sqrt{y/x}##. However, we want to avoid computing square roots in our computation of square roots, so what we can do is make an estimate for a. As luck would have it, based on our restrictions above, we can make a pretty decent estimate of the square root just by averaging the quotient with 1. In this case, that is ##\frac{\frac{1.248018265}{1.6328125}+1}{2}\approx .8821682725##. So then we get 8192{1.440415382,1.414716788}. Iterating:
8192{1.427566085,1.427450431}
8192{1.427508258,1.427508256}
8192{1.427508257,1.427508257}

And for safety, we could do one more iteration and average the two numbers in case of extra hidden precision, but we get the final result 11694.147638883, which is pretty close to 11694.147638883.

But that required 5 iterations requiring 2 divisions and 1 multiplication each. This is not the promised 2 iterations! So here is where we do some magic. At the division step for y/x, we want to find the square root, right? Why not try a couple of iterations at this step? We would get ##\{1,y/x\}## as the square root, our estimate is still ##\frac{\frac{y}{x}+1}{2}##, so if we apply this and average the two components, we get ##\left\{\frac{\frac{y}{x}+1}{2},\frac{y}{x}\cdot\frac{2}{\frac{y}{x}+1}\right\}\approx \frac{\frac{y}{x}+1}{4}+\frac{\frac{y}{x}}{\frac{y}{x}+1}##. So in pseudo-code, we could do:
Code: [Select]
1+y/x→a
1-1/a+a>>2→a
{xa,y/a}→{x,y}
This is the same as performing two iterations at the cost of 3 divisions, 1 multiplication, compared to 4 divisions, 2 multiplications. So we save 1 division, 1 multiplication.
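
To see the whole procedure in one place, here is a rough Python model of it (my own sketch, not calculator code; math.frexp stands in for finding the power-of-two prescale, and 3 iterations is plenty for double precision):
Code: [Select]
import math

def geomean(x, y, iters=3):
    # prescale by a power of two so the two numbers are "close"
    shift = (math.frexp(y)[1] - math.frexp(x)[1]) // 2
    x, y = x * 2.0**shift, y * 2.0**(-shift)
    for _ in range(iters):
        a = 1 + y / x
        a = 1 - 1 / a + a / 4    # the fused double-averaging step
        x, y = x * a, y / a      # {xa, y/a} preserves the geometric mean
    return (x + y) / 2           # final average for hidden precision

print(geomean(209.0, 654321.0))  # ~11694.147638883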



So, for some working TI-BASIC code:
Code: [Select]
{max(Ans),min(Ans→L1
int(log(L1(1)/L1(2))/log(4
L1{2^-Ans,2^Ans→L1
2^int(log(Ans(1))/log(2→P
L1/Ans→L1
While E-10<abs(max(deltaList(L1    ;engineering E
1+L1(2)/L1(1
1-Ans^-1+.25Ans
L1{Ans,Ans^-1→L1
End
Pmean(L1

49
Math and Science / My Newest Trick For Division
« on: March 25, 2014, 02:03:43 pm »
Okay, so yesterday, I was thinking about the algorithms I use for computing logs and exponentials with my floating point numbers and I realized that I had the core of a fast division algorithm right there. Then I thought, "oh wait this is just Goldschmidt Division." I was wrong though, now that I am more awake and looking at the algorithm. The biggest difference is that mine works on the same time complexity as long-division so I need 64 iterations for my 64 bits of precision versus 5.

But there is a but. Those 5 iterations would require 2 full precision 64-bit multiplications, which I currently have down to about 9000 t-states each. My method would require 3*32 shifts+subtracts. On the surface, it looks like it would require 32 more, but I have tricks and magic on my side.

So how does it work? I decided to look at x/y as a vector {x,y} where I try to drive y towards 1 (I have been doing a lot of CORDIC stuff lately). The speed comes from how I drive y towards 1. In its normal state, I have mantissas as 1.xxxxx, so x and y can be treated as numbers on [1,2). Now as a quick observation, if y>=4/3, then y*3/4 is less than y, but greater than or equal to 1. However, y*3/4=y-y/4 = y-(y>>2) where ">>2" refers to a right shift of 2. Similarly, I compare to 8/7, 16/15, et cetera.

That comparison step can get annoying, especially if you are using a lookup table of 64-bit numbers. It can also get slow. What I did was look at the binary expansions of these numbers:
4/3 = 1.01010101...
8/7 = 1.001001001...
16/15 = 1.000100010001...
32/31 = 1.000010000100001...

See the pattern? Basically, I can count the leading zeroes after the decimal point and if the result is 63, I can stop (denominator is 1) else if it is n, the denominator is guaranteed to be on ##[\frac{2^{n+2}}{2^{n+2}-1},\frac{2^{n+1}}{2^{n+1}-1}]##. From this, a quick pseudo-code example would be:
Code: [Select]
Label:
   2+clz(y)→a
   if a>m : Return
   x-x>>a→x
   y-y>>a→y
The code assumes clz will count the leading 0s after the first digit (which is always 1) and m is the desired accuracy. A speed up can occur by noticing that if y=1.000...001xxxxx in binary with n 0s, then y-y>>(n+1) isn't going to change the next sequence of n bits after the 1 (because you are subtracting 0s from xs). So if you have gone up to m bits, the next m bits are basically computed for free in y. This leads to a 2-part algorithm:
Code: [Select]
Label1:
   2+clz(y)→a
   if a>m : Goto Label2
   x-x>>a→x
   y-y>>a→y
Label2:
For n from m+1 to 2m-1
   if bit n of y is 1: x-x>>n→x
Adding that extra bit of code gets 2M bits of precision.
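
For anyone who wants to play with the idea outside of assembly, here is a hedged Python model of just the first loop (no free-bits trick); the fixed-point layout and the m cutoff are my own choices:
Code: [Select]
# x and y are integers representing x/2^64 and y/2^64, both in [1,2)
FRAC = 64
ONE = 1 << FRAC

def divide(x, y, m=60):
    while y != ONE:
        n = FRAC - (y - ONE).bit_length()  # leading 0s after the point
        if n >= m:
            break                          # y is 1 to m bits: done
        a = n + 2                          # the "2+clz(y)" step
        x -= x >> a                        # multiply both by (1 - 2^-a),
        y -= y >> a                        # preserving x/y, driving y to 1
    return x

q = divide(int(1.75 * ONE), int(1.25 * ONE))
print(q / ONE)  # ~1.4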

I just thought I would share :) This sucks on the Z80 because there isn't a CLZ instruction (but this can be simulated at decent speed); those arbitrary shifts are what really kill it.

50
TI Z80 / SpriteFont
« on: March 17, 2014, 09:38:44 am »
I threw this together after Art_of_camelot mentioned that it could be useful. It is a lightweight font hook program that does the following:
  • During program execution, you can change the homescreen font.
  • Upon exiting, the previous font hook is restored
Fonts must be appvars without a header (so Omnicalc fonts won't work, for example), but they can be archived or in RAM. The "commands" are simple. A string in Ans contains the appvar name. Lowercase won't work very easily, as I don't include a tokens→ASCII routine, so use uppercase letters and numbers. For example:
Code: [Select]
"FONT":Asm(prgmSPRTFNT
If you want to swap fonts put a plus sign in front:
Code: [Select]

"+FONT":Asm(prgmSPRTFNT
If you want to disable the font hook, either pass a non-string or "-":
Code: [Select]
"-":Asm(prgmSPRTFNT


Notes- If you don't use "+" to swap fonts, the hook won't be able to load in the default font after exiting.
If you use the assembly program to allow you to use Delvar / Archive / Unarchive on a program, displaying text before setting the mode back to "program mode" will cause the font to be disabled.


EDIT: And the code:
Code: [Select]

#include    "ti83plus.inc"
#define     progStart   $9D95
.org        progStart-2
.db         $BB,$6D
name = n-SpriteFont+9872h
adjust = a-SpriteFont+9872h


SetUpFont:
bcall(_RclAns)
sub 4
jr z,Parse
deacthook1:
call Is_Hook_Active
ret nz
jp deacthook-SpriteFont+9872h
Parse:
ex de,hl
ld c,(hl)
inc hl
ld b,(hl)
inc hl
ld a,(hl)
cp 71h
jr z,deacthook1
cp 70h ;+ sign
jr nz,LoadHook
call Is_Hook_Active
inc hl
jr nz,LoadHook
ld de,name+1
ldir
xor a
ld (de),a
ret
LoadHook:
push hl
    push bc
ld hl,(fontHookPtr)
ld a,(fontHookPtr+2)
ld (prevhook),hl
ld (prevpage),a


ld hl,SpriteFont
ld de,9872h ;appbackupscreen
ld bc,SpriteFontEnd-SpriteFont
ldir
    pop bc
pop hl
ld de,name+1
ldir
xor a
ld (de),a




ld hl,9872h
ld a,l
bcall(4FE4h)         ;Sets the font hook
ret
Is_Hook_Active:
ld a,(9872h)
sub 83h
ret nz
bit 5,(iy+35h)
ret nz
push hl
ld hl,(fontHookPtr)
ld de,9872h
sbc hl,de
pop hl
ret nz
SpriteFont:
.db 83h
dec a
jr z,$+5
exithook:
xor a
inc a
ret
bit 1,(iy+8)
jr nz,locate_replace
    push hl
deacthook:
.db 21h
prevhook:
.dw 0
.db 3Eh
prevpage:
.db 0
bcall(4FE4h)         ;Sets the font hook
    pop hl
jr exithook
locate_replace:


;B is the char
ld (restoreBC-SpriteFont+9872h),bc
ld l,b
ld h,a
in a,(6)
ld (restorePage-SpriteFont+9872h),a
ld c,l
ld b,h
add hl,hl
add hl,bc
add hl,hl
add hl,bc
push hl
ld hl,name
rst 20h
bcall(_ChkFindSym)
pop hl
jr c,deacthook
ex de,hl
inc hl
inc hl
add hl,de
ld a,b
or a
jr z,ram_font+2
ld b,0
add hl,bc
ld c,10
add hl,bc
jp p,ram_font-SpriteFont+9872h
inc a \ ld h,40h
ram_font:
out (6),a
ld de,lFont_record
ld b,7


ld a,(hl) \ ld (de),a
inc e \ inc l
call z,adjust
djnz $-7


ld hl,lFont_record
.db 3Eh
restorePage:
.db 0
out (6),a
.db 1
restoreBC:
.dw 0
xor a
ret
a:
inc h
ret po
in a,(6)
inc a
out (6),a
ld h,40h
ret
n:
.db 15h,0,0,0,0,0,0,0,0
SpriteFontEnd:

51
Math and Science / Computing Logarithms
« on: March 06, 2014, 12:45:55 pm »
One of the cooler ways of computing logarithms that I know of generates 1 bit per iteration using a lookup table and, in programming terms, add/shift operations. It works using some simple rules about logs, in particular:
##log(xy)=log(x)+log(y)##



So for example:
##log(x)=log(x\frac{3}{4}\frac{4}{3})=log(x\frac{3}{4})+log(\frac{4}{3})##


log(1)=0 for any base, so if we multiply x by enough stuff to get it close to 1, we end up with 0+stuff and "stuff" can be in a precomputed table. Multiplying by 3/4, for example, is easy. Division by a power of 2 can be done with bit shifts, so to translate to code, x-x>>2. If we only multiply by ##\frac{2^{n}-1}{2^{n}}## then the process is an add and shift, and we just need to add ##log(\frac{2^{n}}{2^{n}-1})## to the accumulator. Let's do some practice to find log(23):


First, get 23 into the range [1,2). If we divide by 2^4, then we get 1.4375, so:
##log(23)=4log(2)+log(1.4375)##
Since 1.4375>4/3, if we multiply by 3/4, we will get a smaller number that is still bigger than 1. So:
##4log(2)+log(1.078125)+log(4/3)##
In turn, 1.078125<8/7, so we skip that.
1.078125>16/15, 1.078125*15/16=1.0107421875
##4log(2)+log(1.0107421875)+log(4/3)+log(16/15)##

Continuing this process up to 1024/1023 for 3 digits of accuracy, we get:

##4log(2)+log(1.002845764...)+log(4/3)+log(16/15)+log(128/127)+log(512/511)##
##log(23)\approx 4log(2)+log(4/3)+log(16/15)+log(128/127)+log(512/511)##


For comparison, ##log_{10}(23)\approx1.361727836##, and  ##4log_{10}(2)+log_{10}(4/3)+log_{10}(16/15)+log_{10}(128/127)+log_{10}(512/511)\approx 1.361342752##. We only wanted 3 digits of accuracy, but this is actually accurate to within 1/2596.
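
As a concrete illustration, here is roughly what that loop looks like in Python (my sketch; base-10 logs and the n<32 cutoff are arbitrary choices of mine):
Code: [Select]
import math

# precomputed table of log10(2^n / (2^n - 1)) for n = 2..31
TABLE = [math.log10(2.0**n / (2.0**n - 1)) for n in range(2, 32)]

def log10_shift_add(x):
    e = 0                                 # first, get x into [1,2)
    while x >= 2: x /= 2; e += 1
    while x < 1:  x *= 2; e -= 1
    acc = e * math.log10(2)
    for n in range(2, 32):
        # multiply by (2^n - 1)/2^n, i.e. x - x>>n, while x stays >= 1
        while x * (1 - 2.0**-n) >= 1:
            x -= x * 2.0**-n
            acc += TABLE[n - 2]
    return acc

print(log10_shift_add(23), math.log10(23))  # both ~1.3617278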

This is fast, but can we make it faster? (hint: You know the answer to this.)


Basically, we are comparing to 4/3, 4/3 (yes, twice), 8/7, 16/15, ... but if we are computing logs in a program, especially at the assembly level, we will be working with bits. Allow me to rewrite these numbers in binary:


4/3=1.0101010101...
8/7=1.0010010010...
16/15=1.000100010001...


Do you see the pattern? On top of that, we can really just count the leading zeroes after the decimal. If there are 5, we can multiply by 63/64, and in the worst case it will be an overestimate by 1 shift (for example, 1.000001 is not bigger than 64/63, so we would actually need x*127/128 = x-x>>7 instead of x-x>>6). This lets us skip any unnecessary comparison steps and we don't need an LUT of these values. As a new example, let's compute log(1.890625). In binary, this is log(1.111001). There are no leading zeroes, so multiply by 3/4:
1.111001-.01111001=1.01101011
[Aside: this is a case where we have to do an initial multiply by 3/4 and then do it again]


Now we jump into the main iteration. There is 1 leading 0, so multiplying by 3/4, we still get a number bigger than 1:
1.01101011-.0101101011=1.0001000001
Now we have 3 leading zeroes, but it happens to be smaller than 16/15 (notice it is "3 zeroes, 1, 5 zeroes, 1") so we multiply by 31/32:
1.0001000001
-.000010001000001
--------------------------
1.000001111011111

Now there are 5 leading zeroes, so we will stop here and estimate log(4/3)+log(4/3)+log(32/31)+log(64/63) which is accurate to two decimal places.


I promised faster, though. That was an improvement, but not by as much as the following. Notice that multiplying by 15/16 is the same as shifting x 4 times to the right and then subtracting that from x. Since x is on [1,2), the shifted copy always has 3 leading zeroes after the decimal, followed by a 1 and everything else. We also know the first 3 fractional digits of x have to be zeroes, too (that is the case where we multiply by 15/16), so after that 1 we get 3 more zeroes and then another 1, so it looks like this:
Code: [Select]
1.0001abcdefgh
-.00010001abcdefgh
But if we subtract them, we see that we will get about 1.0000abc.... The abc are left alone if d=1 (no carry from subtraction) otherwise the bit propagates. Still, if a is set, we can expect that the next iteration is going to require multiplying by 31/32, but that won't likely change b or c, so then the next iteration depends on what b and c are.

Basically, what I am saying is this: If we have the number log(1.01abcdefg), we would multiply by 3/4 and if we stop here, we could get a better approximation by adding a*log(8/7). If we stop at log(1.000001abcdefgh), though, there are 5 leading zeroes, so we can add to the end of this for a better approximation: a*log(128/127)+b*log(256/255)+c*log(512/511)+d*log(1024/1023)+e*log(2048/2047). Going back to an example in the previous section, we had x=1.000001111011111 with the approximation log(4/3)+log(4/3)+log(32/31)+log(64/63). If we want to guess and get pretty close to double the accuracy, we can add 1*log(128/127)+1*log(256/255)+1*log(512/511)+0*log(1024/1023)+1*log(2048/2047). Indeed, we get 4 digits of added accuracy (.2766723863 estimated vs. .2766053963... actual).


We can apply this recursively to double the digits of accuracy at each iteration, but this usually overestimates, especially early on. This is easy to detect and correct when you subtract x-x>>n→x for each bit, so you can adjust there. Ideally, this is useful for squeezing as much extra precision as you can out of the last iteration. If I wanted to write a logarithm routine to 64 bits of accuracy, I could wait until 32 or more bits were completed, and then I would have a pretty safe estimate for at least the next 31 bits (and most likely the 64th bit) without all the computation of x-x>>n.




52
Site Feedback and Questions / Omni Feature Request
« on: March 03, 2014, 09:51:19 am »
Hi. Some of you might not know me, but I was wondering if we could have something like [math][/math] / [itex][/itex] / [latex][/latex]-like tags. They make equations look a lot nicer than "237/101-4sqrt(2)/(x+sqrt(2))".




But really, I do like to post in our Math section and some formulas start looking pretty ugly in linear format.

53
ASM / Fast 8-bit Squaring
« on: January 28, 2014, 09:03:43 am »
Last night I was thinking again about my Sine Approximation routine and holy crud, I must have been really tired not to recognize this:

In the approximation, I was taking the upper 8 bits, truncating propagation. This was so that I had a fixed-point routine. But if I had taken the lower 8 bits, I would have had a useful 8-bit integer routine for -x^2-x.

But wait.

If I start the accumulator with x (which is actually 3 cycles faster to do), then I end up with -x^2.

Well then, anybody wanna optimize this? ^^ :
Code: [Select]
L_sqrd:
;Input: L
;Output: L*L->A
;159 t-states
;39 bytes
ld h,l
ld c,l
rr h
sbc a,a
xor l
add a,c
ld c,a
rl l

rr h
sbc a,a
xor l
and %11111000
add a,c
ld c,a
rl l

rr h
sbc a,a
xor l
and %11100000
add a,c
ld c,a
rl l

ld a,h
rrca
xor l
and 128
xor c
neg
ret

It might be easier if I explain that it is computing:
Code: [Select]
=(hhhhhhhh^abcdefgh)+(gggggg00^bcdefg00)+(ffff0000^cdef0000)+(ee000000^de000000)
=(hhhhhhhh^abcdefgh)+(ggggg000^bcdef000)+(fff00000^cde00000)+(e0000000^d0000000)
=((hhhhhhhh^abcdefgh)+(ggggg000^bcdef000)+(fff00000^cde00000))^e0000000^d0000000

^ is XOR
+ is addition modulo 256
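
If you want to sanity-check that expansion, here is a quick Python brute force (my sketch; a..h are the bits of L from most to least significant). Note the sum works out to -L*L-L mod 256, which is exactly why seeding the accumulator with L and negating at the end gives the square:
Code: [Select]
def rep(bit, mask):                # replicate one bit across a mask
    return mask if bit else 0

for L in range(256):
    h, g, f, e = L & 1, (L >> 1) & 1, (L >> 2) & 1, (L >> 3) & 1
    total  = rep(h, 0xFF) ^ L                  # hhhhhhhh^abcdefgh
    total += rep(g, 0xFC) ^ ((L << 1) & 0xFC)  # gggggg00^bcdefg00
    total += rep(f, 0xF0) ^ ((L << 2) & 0xF0)  # ffff0000^cdef0000
    total += rep(e, 0xC0) ^ ((L << 3) & 0xC0)  # ee000000^de000000
    assert total % 256 == (-L * L - L) % 256
print("matches -L*L-L for all 256 inputs")
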
EDIT: Cross-posted:
I thought of a way to optimize the first iteration. Saved 2 bytes, 8 t-states.
Basically, at the first iteration, C=-1 or C=2L, then I shift L up one bit for the next iteration. I got rid of the initial ld c,a and use the first iteration to compute c. To do this, I just shift L at the beginning, then after the "sbc a,a" I just OR that with L. If a was $FF, the result is FF (which is -1), else it is 2*L:
Code: [Select]
L_sqrd:
;Input: L
;Output: L*L->A
;151 t-states
;37 bytes
ld h,l
;First iteration, get the lowest 3 bits
sla l
rr h
sbc a,a
or l
;second iteration, get the next 2 bits
ld c,a
rr h
sbc a,a
xor l
and $F8
add a,c
;third iteration, get the next 2 bits
ld c,a
sla l
rr h
sbc a,a
xor l
and $E0
add a,c
;fourth iteration, get the last bit
ld c,a
ld a,l
add a,a
rrc h
xor h
and $80
xor c
neg
ret
Also, as a note, if you stop at any iteration and perform NEG, you can mask to get the lower bits of the square. So for example, the first iteration gives you -L*L mod 8->A, the second returns -L*L mod 32->A, the third gives it mod 128.

54
ASM / Sine Approximation [Z80]
« on: January 25, 2014, 08:50:07 pm »
I made a neat little algorithm today that seems to do an okay job at approximating sine. My original intention was to create a routine for the following with a fixed point format where x is on [0,1)
x(1-x)

However, if 'x' is in 'a', I can just do a*(-a), but NEG is 2 bytes, 8 cycles, whereas CPL is 1 byte, 4 cycles. That made me wonder if I could make a fast routine to multiply A by its complement (is it called 1s complement?). After working in my notebook with some more abstract approaches (I used 7th-degree polynomials) I reduced it to a fairly simple solution that would be much simpler to integrate into an IC or something. However, I decided to experiment a little with the result by seeing what happens if I don't let some bits propagate into the upper 8 bits (remember, 8.8 fixed point). So basically, I computed the product of 7th-degree polynomials with binary coefficients, truncated any terms of degree 7 or lower, then converted it back to binary, and the result is a close approximation of x(1-x). This happens to be a close approximation of sine, too, so after some tweaking, I got it to have the same input/output range as Axe (I recall that Quigibo said he used x(1-x) as an approximation to sine, too).
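
For intuition on why x(1-x) works at all here, a two-line Python comparison (my sketch; the factor of 4 is just my normalization so the peaks line up):
Code: [Select]
import math
for x in [0.125, 0.25, 0.375, 0.5]:
    print(x, 4 * x * (1 - x), math.sin(math.pi * x))  # parabola vs. sine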

The following uses 3 iterations as opposed to 8 and is faster than the Axe version, but larger by 8 (?) bytes:
Code: [Select]
p_Cos:
ld a,l
add a,64
ld l,a
p_Sin:
ld a,l
add a,a
push af
ld d,a
ld h,a
ld l,0
ld bc,037Fh
__SinLoop:
sla h
sbc a,a
xor d
and c
add a,l
ld l,a
rrc d
srl c
srl c
djnz __SinLoop
ld h,b
pop af
ret nc
xor a
sub l
ret z
ld l,a
dec h
ret
;This:
;  34 bytes
;  269 t-states min, else 282, else 294
;  avg. 76 t-states faster than Axe
;Axe:
;  27 bytes
;  341+b t-states min, else 354, else 366

If the bits of register 'l' are sabcdefg, this basically returns:
Code: [Select]
(0aaaaaaa^0bcdefg0)+(000bbbbb^000cdefg)+(00000ccc^00000def)

^ is for XOR logic, + is regular integer addition modulo 256
(s is the sign)

The original algorithm was going to be for computing the natural logarithm :P I came up with some new (to me) algorithms for that as well.

EDIT1: Thanks to calc84maniac for pointing out the optimisation using a different order of operations with xor/and ! Saved 1 byte, 12 cycles. This let me shuffle some code to save 8 more cycles.
EDIT2: Modified the last note to reflect the normalized input/output to match that of Axe.

55
TI Z80 / LangZ80
« on: January 16, 2014, 11:40:27 pm »
I started posting about LangZ80 in the FileSyst thread. Although I don't consider it a project-- it is more of a "proof of concept" program-- I have been having updates and instead of derailing the FileSyst thread, I brought it here.

Basically, I want to get FileSyst to a point where it can use variables and functions and basically work as a higher-level programming language and not just an exposed filesystem. I decided to experiment by creating a parser (yet again).

It is really broken and in a messy state, especially with the variable storage. There are lots of bugs there, as well as a few other bugs (integers that are multiples of 256 don't work properly, and I believe I left a few bugs in the float multiply routine). However, when I am not introducing bugs and things are working as they should, here is what it has:
  • Commands are typed out, like "UPDATELCD()"
  • Integers are arbitrary precision, from 0 to 2040 bits, in multiples of 8. These are unsigned.
  • Strings are supported. Start them with either " or ' and end them with the same token. This allows the use of quotes in strings.
  • Floats are supported. They default to at least 128 bits of precision (38 decimal digits) and at least 32-bits of precision after the decimal point. Floats can be arbitrary sizes, though, also up to a total of 2040 bits.
  • I have written the float_times_float and int_times_int routines, but not int_times_float and others.
  • It has an XORSPRITE() routine that was just for testing. It totally fakes the sprite rendering by not actually reading a sprite var and just assuming 8x8 sprites. Later this will change.
  • It has an UPDATELCD() command
  • It has a WAIT() command to wait for a key press
  • It has a few stacks to hold intermediate data. This part of the app is complete, to my knowledge. It pushes, pops, converts, updates pointers, all that good stuff.
  • When exiting, it returns "Ans" in the OS ans variable as a string (or it returns one of the various errors)
  • As such, it has an int→str routine, str→str, and most recently, about an hour ago, a float→str routine.
  • It parses from prgmTEST
  • Storing and reading variables is almost supported. It is still broken and causes crashes, though.

It is really kind of garbage at the moment, but feel free to download it to test out in WabbitEmu. It is not stable and it will likely end up crashing a few times.

If at any point it seems like the program has frozen, it might instead be in the WAIT() loop or in another loop I made. To get out, try the following first:
  • Press any key that isn't [ON]
  • Press [ON]
If you have to press [ON] to exit, it was in a loop where I haven't added in the needed code yet, so I used that loop as a filler.


(also, it's 3344 bytes of code out of 16384)

56
TI Z80 / AsmDraw
« on: January 09, 2014, 10:06:03 pm »
I actually made the first version of this program a long time ago, before I became active in the community. I remade it a few nights ago and uploaded it to TICalc, but it basically works like this:

-Give it a list of coordinates and types
-It generates a hex opcode that draws the rectangles with the given coordinates and methods.

It takes input in the form of {x,y,width,height,type} and you can give it multiple rectangles. It does 6 rectangle types, including the typical black/white/invert, using OS bcalls. This is also a BASIC program that generates the opcodes. As well, it tries to optimise the opcode a little, so if there are 8 or more rectangles, it uses a different method of drawing them (generating an array of coordinates/types and reading them, using 29 bytes extra overhead, 4 bytes less for each rectangle).

Anyways, it was useful for me when I was working on BASIC RPGs to make an assembly program that drew a map. The download is below. With so many other tools available now, this type of program is a bit outdated :P

57
TI Z80 / Particle Engine (Buffer Based)
« on: January 09, 2014, 02:18:22 pm »
When I was working on the Grammer particle engine, I had an idea that I put to the side. I finally got around to testing it, and it was a bit of a success, so I will share it, even though it isn't a project.

If you have seen the Grammer particle engine, it is fast, but it gets slower as you add more particles. Eventually, after 2048 particles, it still gets almost 6FPS, but the hope was that this alternative method would have speed independent of the number of particles on screen.

The trick? Instead of adding particles by pushing them on to a stack-like array of coordinates, I would use something like a graph buffer. In grayscale, we have a front buffer and back buffer. For this, we have a "fixed buffer" and a "particle buffer." When updating the LCD, these two get ORed together to show the particles superimposed on the fixed buffer.

Disadvantage:
You have to scan every pixel on the particle buffer to locate particles and move them accordingly.
Advantage:
-The particle buffer is a fixed 768 bytes and can hold up to 6144 particles (compared to 12288 bytes).
-Updating particles takes about the same time no matter how many particles there are.
-Adding particles can be done with simple drawing commands since it is just a graphics buffer in disguise.
-Deleting particles is done with the same ease. Deleting individual particles with the other method is very slow, having to search through the buffer to locate the particle with the given coordinates.
-Performing large scale operations on a group of particles is very easy


My goal was to implement a simple rule to move down if possible, or move left/right. This is a standard rule. However, scanning each pixel individually would have been very slow. Instead, I scanned one row of pixels at a time, applying some bit logic to see which pixels can move in a given direction. This turns out to be really effective:

In fact, using a circle as the pen tool and updating the LCD cause the biggest speed bottlenecks. The number of particles on screen does not affect the speed at all since every pixel in the particle buffer is treated as a particle and has all of the transformations performed on it.

I decided to keep the pen tool as it is, but interleaving the particle updating with LCD updating gains some FPS.
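
If it helps, here is a rough Python illustration of the row-at-a-time trick (my own sketch; it only does the straight-down rule and ignores the fixed buffer): each row is one big integer, and a single pass of bit logic drops every particle that has an empty cell below it.
Code: [Select]
def fall_step(rows):
    # rows[y] is an integer bitmask; a set bit is a particle
    for y in range(len(rows) - 2, -1, -1):  # bottom-up, one pass
        falling = rows[y] & ~rows[y + 1]    # particles with space below
        rows[y] ^= falling                  # leave the current row...
        rows[y + 1] |= falling              # ...and land one row down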

Readme:
Run it with the Asm( token.
Use [ON] to exit.
Use arrows to move the cursor.
Use [2nd] to draw on the fixed buffer (makes impassable barriers).
Use [Mode] to draw particles.
Use [Del] to delete from the fixed buffer (deletes barriers).
Use [Y=]+[Up] to increase the radius of the circle.
Use [Y=]+[Down] to decrease the radius of the circle.
Multiple key presses are supported.
The F2-F5 keys shadow [up],[right],[left],[down]
15MHz is automatically set if it is supported by your calculator.
6MHz should still expect ~22FPS according to WabbitEmu.

58
Math and Science / Fermat Factoring (Zeda's Method)
« on: December 31, 2013, 03:45:50 pm »
As some have seen me talking in IRC, I have my own rendition of Fermat factoring. I compared it to what I thought was a typical approach and my method asymptotically replaces a square root operation with a subtract and an increment. In complexity terms:

Square root is on the level of multiplication and division, so O(M(n))
Subtraction is O(n)
Increment, if my calculations are correct, is O(1) (in the case of the algorithm described below, invert bit 1, if that returns 0, invert bit 2, ... to increment by 2)

So while Fermat factoring isn't as efficient as the general number field sieve, the method I have is more efficient than a typical approach to Fermat factoring.

To describe the idea, notice that a^2-b^2=(a-b)(a+b). This means that, given our number c, if we can find a difference of two squares so that they equal c, then we have a pair of factors for c. For example, 10^2-5^2=75, so 75=(10-5)(10+5)=5*15. If c>1 and is a positive odd integer, it will always have a trivial solution (the difference between consecutive squares is 1,3,5,7,..., so this is "trivial"). Any other solutions are non-trivial and imply that c is composite. If only the trivial solution exists, c is prime.
Spoiler For proof:
Assume c is composite. Then c=nm where n,m are natural numbers greater than 1. If we let n be the larger of the two, then we have:
n=(a+b)
m=(a-b)
m+2b=a-b+2b=a+b
m+2b=n
n-m=2b
(n-m)/2=b
n=a+b
n=a+(n-m)/2
n-(n-m)/2=a
(n+m)/2=a
(n-m)/2=b

The trivial solution is (c+1)/2=a, (c-1)/2=b, factoring to 1*c=c. If c is composite, it has a non-trivial solution. Now the other direction: if c is prime, it does not have a non-trivial solution. Essentially, notice that if there were a non-trivial solution, where a is not (c+1)/2 and b is not (c-1)/2, then c=(a-b)(a+b) would be a factorization with both factors different from 1 and c, which is a contradiction.

Summary:
If c is prime, it has only the trivial solution.
If c is composite, it has non-trivial solutions.
The next few steps toward solving this formula are the typical approaches and I do the same thing. We notice that a>=sqrt(c).
Spoiler For proof:
Rewrite a^2-b^2=c as a^2-c=b^2. Then we see that a^2-c is non-negative, so:
a^2>=c,
a>=sqrt(c).
So then with our algorithm, we can initialise a=ceiling(sqrt(c)). That is the smallest that a can be, so now we enter the naive algorithm, checking if the difference a^2-c is a perfect square:
Code: [Select]
ceiling(sqrt(c))→a
While notperfectsquare(a*a-c)
a+1→a
EndWhile
sqrt(a*a-c)→b
Return {a-b,a+b}
However, the next step away from the naive approach is to notice that every time you add 1 to a, the difference increases by 2a+1. So you can save some computations by keeping track of the difference and updating at each iteration:
Code: [Select]
ceiling(sqrt(c))→a
a*a-c→b2
While notperfectsquare(b2)
b2+2a+1→b2
a+1→a
EndWhile
sqrt(b2)→b
Return {a-b,a+b}
However, that notperfectsquare() routine will likely require computing the square root at least once (twice if done naively). My approach takes the optimisation another step further by keeping track of how far away b2 is from a perfect square. With this, note that if b^2 is the largest perfect square not exceeding a^2-c, then the next one up is b^2+2b+1, so whenever the difference between b2 and b^2 reaches 2b+1, we need to increment b and adjust the difference:
Code: [Select]
ceiling(sqrt(c))→a
a*a-c→diff
floor(sqrt(diff))→b
diff-b*b→diff
While diff>0
  diff+a+a+1→diff
  a+1→a
  While diff>=(2b+1)
    diff-b-b-1→diff
    b+1→b
  EndWhile
EndWhile
Return {a-b,a+b}
As it is, the inner while loop quickly converges to iterating just once per outer iteration (bounded by the square root of the difference between the initial guess for a and c). We can say then that the original square root required at each iteration is replaced by 1 subtract and 1 increment.

Rewriting this for some more efficiency, we can do:
Code: [Select]
ceiling(sqrt(c))→a
a*a-c→diff
floor(sqrt(diff))→b
diff-b*b→diff
2a+1→a
2b+1→b
While diff>0
  diff+a→diff
  a+2→a
  While diff>=b
    diff-b→diff
    b+2→b
  EndWhile
EndWhile
floor(a/2)→a
floor(b/2)→b
Return {a-b,a+b}
At the chip level, those +2 can also be increments requiring O(1) time instead of additions requiring O(n) time. This means that each iteration of the algorithm requires about:
- 2 add/sub operations
- 2 increments
- 3 compares (that can be changed to simply checking for overflow, so these can be trivially removed)

And the maximum number of iterations are based on how far the actual a value is from the original estimate. If we know C is an odd composite (as, for example, RSAs) then this is a pretty crummy upper bound of c/3-sqrt(c) in the worst case (c=3*p).
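
For reference, here is the routine above transcribed into Python (my sketch; c should be an odd integer greater than 1, since an even c can loop forever):
Code: [Select]
import math

def fermat_factor(c):
    a = math.isqrt(c)
    if a * a < c: a += 1          # ceiling(sqrt(c))
    diff = a * a - c
    b = math.isqrt(diff)
    diff -= b * b                 # distance above the square b*b
    while diff > 0:
        diff += 2 * a + 1         # step a^2 - c up to (a+1)^2 - c
        a += 1
        while diff >= 2 * b + 1:  # re-anchor to the nearest square below
            diff -= 2 * b + 1
            b += 1
    return (a - b, a + b)

print(fermat_factor(14111))  # (103, 137)
print(fermat_factor(75))     # (5, 15)
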
Spoiler For Example 1:
So let's try to factor 14111. First, estimate a=119. Then 119^2-14111=50, so our initial b is 7, with difference of 1 (50-7^2=1).
a=119, b=7, diff=1. Increment a, add 2*119+1 to diff
a=120, b=7, diff=240. 240>=7*2+1, so increment b, subtract 15 from diff
a=120, b=8, diff=225. increment b, subtract 17 from diff
a=120, b=9, diff=208. increment b, subtract 19 from diff
a=120, b=10, diff=189. increment b, subtract 21 from diff
a=120, b=11, diff=168. increment b, subtract 23 from diff
a=120, b=12, diff=145. increment b, subtract 25 from diff
a=120, b=13, diff=120. increment b, subtract 27 from diff
a=120, b=14, diff=93. increment b, subtract 29 from diff
a=120, b=15, diff=64. increment b, subtract 31 from diff
a=120, b=16, diff=33. increment b, subtract 33 from diff
a=120, b=17, diff=0. diff=0, so we have found a and b

So 14111=(120-17)(120+17)=103*137.
Spoiler For Example 2:
c=75. Then a=9 initially, 81-75=6, so b=2, diff=2
a=9, b=2, diff =2
a=10, b=2, diff =21
a=10, b=3, diff =16
a=10, b=4, diff =9
a=10, b=5, diff =0

So 75=(10-5)(10+5)=5*15.
Spoiler For More Notes:
I came up with this algorithm a few years ago while working on Grammer. This was before Grammer had Line() or Circle() drawing routines, during the summer (when I didn't have internet). Grammer had a Circle() routine long before Line() because the Circle() routine was derived from my Fermat factoring routine above. The routine above can be thought of as interpolating a curve of the form y=sqrt(x^2-b) compared to y=sqrt(z-x^2) where z is the radius squared. However, the algorithm above stops when it lands exactly on a point (when its error=0) whereas circle drawing typically requires the whole curve to be drawn. I believe that my method of drawing circles and lines is like the Bresenham method, so I made a tutorial saying as much. However, I haven't validated that, so I haven't officially released it.
0,1,4,9, Fermat, Bresenham, Circles, and Lines

My circle and line drawing routines takes advantage of overflow (the c flag) to avoid making comparisons. Likewise:
Code: [Select]
If 0==c%2
  Return {2,c/2}
sqrt(c)→a
If floor(a)=a
  Return {a,a}
1+floor(a)→a
a*a-c→diff
floor(sqrt(diff))→b
diff-b*b→diff
2a-1→a
2b-1→b
diff-b→diff
Loop0:
  a+2→a
  diff+a→diff
  if carry_set
Loop1:
    b+2→b
    diff-b→diff
    GOTO Loop1 IF carry_not_set
  GOTO Loop0 IF zero_not_set
a>>1→a
b>>1→b
Return {a-b,a+b}
I should also note that at the assembly level, my integer square root routines typically also returns the difference between the original input and the next smaller perfect square, so that also helps speed things up a little (getting rid of all of the multiplications).

Sorry for the code nested in a spoiler tag :/

59
TI Z80 / Chaos Game
« on: December 18, 2013, 02:37:11 pm »
The Chaos Game is not a 'game' in the standard sense. Instead, it is a mathematical 'game' like Conway's Game of Life or Langton's Ant. There are so many cool, neat things about the Chaos Game, and many uses for it, but that would take a good semester or two just to describe and employ it. Instead, I will describe the basics and the way I am using it at the moment:

  • Start with a polygon, like a triangle or a square, or something more inventive
  • Start at some point, maybe on the perimeter-- it doesn't really matter (well, some deeper investigation will say otherwise, but... :P)
  • Now select a vertex randomly and move halfway to that vertex. Repeat this step many times.
On the surface, it makes neat things happen if you use a triangle-- it actually does not fill it in. There are certain points that can never be reached inside the triangle and in fact, it will make a Sierpinski's Triangle. However, squares and such will be filled in.

USES: One of the really cool effects of this deals with the very important part of the iterative step. The key word is "random." If you use an algorithm that cycles, this game will show that with strong biases for certain points. Any common runs or cycles will do this. For example, if you use a square and an RNG that returns the sequence {2,3} more frequently than other pairings with 2, the space between points 2 and 3 will be darker than any other points. Since we cannot produce real "random" numbers, we can test pseudo-random number generators to see if they produce the correct result or where biases might be. A very chaotic PRNG that has a very long period will yield good results. A PRNG that has a shorter period or common runs will produce a less amazing result.
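
For anyone who wants to try this on a PC first, a minimal Python version of the game itself might look like this (my sketch, not the assembly below; plug in any integer-returning rng and plot the points however you like):
Code: [Select]
import random

def chaos_game(vertices, rng, n=20000):
    x, y = vertices[0]
    pts = []
    for _ in range(n):
        vx, vy = vertices[rng() % len(vertices)]  # pick a random vertex
        x, y = (x + vx) / 2, (y + vy) / 2         # move halfway toward it
        pts.append((x, y))
    return pts

triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]
pts = chaos_game(triangle, lambda: random.getrandbits(8))
# a good RNG traces out the Sierpinski gasket; a biased one shows structure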

Since I have been working on PRNGs, I have been using some simple implementations of the Chaos Game. Below is a code that allows you to add in your own 'Random' routine to test as well as the choice of using a square or triangle. In z80 assembly, we often mask out bits to generate integers in a smaller range like [0,1,2,3] or [0,1,2,3,4,5,6,7] (powers of 2, minus 1 make easy masks). While this is fast and simple, the Chaos Game will tell you if this is actually destroying the chaotic behavior of your RNG. If this is the case, you might have better results using a range that isn't a power of 2. However, this can be more time consuming. It's your pick :)
Code: [Select]
.nolist
#define TASM
#define bcall(xxxx)     rst 28h \ .dw xxxx
#define equ .equ
.list
shape=4
rand=2
update=1

     .org 9D93h
     .db $BB,$6D
    bcall(4BD0h)    ;_GrBufClr
    bcall(486Ah)    ;_GrBufCpy
    bcall(4570h)    ;_RunIndicOff
    di
    ld a,$FD
    out (1),a
    ld hl,0
ChaosGame_SierpinskisGasket:
    in a,(4)    ;test ON press
    and 8
    ret z
    in a,(1)    ;test Clear press
    and 64
    ret z
    ld (coords),hl
    call Plot
    ld hl,(coords)
    srl h
    srl l
    push hl
#IF update==256
    ld a,(count)
    inc a
    ld (count),a
    jr nz,$+5
    bcall(486Ah)  ;_GrBufCpy
#ENDIF
    call Random
#IF shape==3
    call L_mod_3
    pop hl
    jr z,ChaosGame_SierpinskisGasket
    dec a
    ld a,32
    jr nz,$+7
      add a,l
      ld l,a
      jp ChaosGame_SierpinskisGasket
    add a,h
    ld h,a
    jp ChaosGame_SierpinskisGasket
#ENDIF
#IF shape==4
    and 3
    pop hl
    jr z,ChaosGame_SierpinskisGasket
    rrca
    ld b,a
    ld a,32
    jr nc,$+6
      add a,l
      ld l,a
      ld a,32
    rrc b
    jp nc,ChaosGame_SierpinskisGasket
      add a,h
      ld h,a
    jp ChaosGame_SierpinskisGasket
#ENDIF
#IF rand==0
;  From CaDan, by Iambian and Geekboy
Random:
LFSR:
    ld    hl,lfsrseed1
    ld    a,(hl)
    rrca
    ld    (hl),a
    jp nc,+_
    xor    %00111000
    ld    (hl),a
_:
 ld l,a
 ret
#ENDIF
#IF rand==1
;  From CaDan, by Iambian and Geekboy
Random:
LFSR2Byte:
  ld hl,(lfsrseed2)
  ld a,L
  rlca
  xor h
  rlca  ;
  rlca  ;
  xor h ;
  rlca
  xor h
  rlca
  adc hl,hl
  ld (lfsrseed2),hl
  ld a,h     ;for compatibility
  ret
#ENDIF
#IF rand==2
;  From Zeda's library
Random:
Rand8:
;Output:  L is a pseudo-random number, DE=509, B=0, C=seed1
;Destroys:A
;Size:    141 bytes
;Speed:   fastest: 672, slowest: 796
;Notes:
;  Requires 2 seeds of 1 byte each, seed1 and seed2
;Method
;  We iterates 2 Linear Congruential Generators. 'x' is seed1, 'y' is seed2
;    (33x+17) mod 251 -> x
;    (13y+83) mod 241 -> y
;  Next, we output the lower 8 bits of x*y mod 509
;  This all gives a period of 60491 (251*241) which should be overkill for games and such
    ld hl,(seed1)
    ld h,0
;multiply by 13
    ld b,h
    ld c,l
    add hl,hl
    add hl,bc
    add hl,hl
    add hl,hl
    add hl,bc
;add 83
    ld c,83
    add hl,bc
;mod_241
    ld c,15
    ld a,h
    add a,a
    add a,a
    add a,a
    add a,a
    sub h
    add a,l        ;139
    jr nc,$+6
    add a,c
    jr nc,$+3
    add a,c
    add a,c
    jr c,$+3
    sub c
;store back to seed1
    ld (seed1),a
;seed2
    ld hl,(seed2)
    ld h,b
;multiply by 33
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,bc
;add 17
    ld c,17
    add hl,bc
;---
;make BC=seed1
    ld c,a
;---
;mod_251
    ld a,h
    add a,a
    add a,a
    add a,h
    add a,l
    jr nc,$+8
    add a,5
    jr nc,$+4
    add a,5
    cp 251
    jr c,$+4
    add a,5
;store seed2
    ld (seed2),a

    ld h,a
    ld l,b
;H_Times_C
    sla h \ jr nc,$+3 \ ld l,c
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
    add hl,hl \ jr nc,$+3 \ add hl,bc
HL_mod_509:
;HL_mod_509 -> HL
    ld a,h
    ld d,b
    srl a
    rl d
    add a,h
    sub d
    jr nc,$+3
    inc d
    add a,l
    jr nc,$+4
    inc d
    ccf
    ld e,a
    ld hl,509
    ex de,hl
    sbc hl,de
    jr c,$+6
    add hl,de
    sbc hl,de
    ret nc
    add hl,de
    ret
#ENDIF
#IF shape==3
L_mod_3:
;Outputs:
;    A is the remainder
;    destroys L,BC
;    z flag if divisible by 3, else nz

    ld bc,0306h
    ld a,l
    res 4,l
    rlca
    rlca
    rlca
    rlca
    and 15
    add a,l
    and 31
    ld l,a
    sra l
    sra l
    and b
    add a,l
    sub c \ jr nc,$+3 \ add a,c
    sub b \ ret nc \ add a,b
    ret
;at most 123 cycles, at least 114, 30 bytes
#ENDIF
#IF rand==3
;  From Zeda's library
Random:
Rand24:
;Inputs: seed1,seed2
;Outputs:
; HLA is the pseudo-random number
; seed1,seed2 incremented accordingly
;Destroys: BC,DE
;Notes:
; seed1*243+83 mod 65519 -> seed1
; seed2*251+43 mod 65521 -> seed2
; Output seed1*seed2
    ld hl,(seed1)
    xor a
    ld b,h \ ld c,l
    ld de,83
    add hl,hl \ rla    ;2
    add hl,bc \ adc a,d    ;3
    add hl,hl \ rla    ;6
    add hl,bc \ adc a,d    ;7
    add hl,hl \ rla    ;14
    add hl,bc \ adc a,d    ;15
    add hl,hl \ rla    ;30
    add hl,hl \ rla    ;60
    add hl,hl \ rla    ;120
    add hl,bc \ adc a,d    ;121
    add hl,hl \ rla    ;242
    add hl,bc \ adc a,d    ;243
    add hl,de \ adc a,d    ;243x+83
;now AHL mod 65519
; Essentially, we can at least subtract A*65519=A(65536-17), so add A*17 to HL
    ex de,hl
    ld l,a
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,bc
    add hl,de
    ld c,17
    jr nc,$+3
    add hl,bc
    add hl,bc
    jr c,$+4
    sbc hl,bc
    ld (seed1),hl
;seed1 done, now we need to do seed2
    ld hl,(seed2)
; seed1*243+83 mod 65519 -> seed1
; seed2*251+43 mod 65521 -> seed2
; Output seed1*seed2
    xor a
    ld b,h \ ld c,l
    ld de,43
    add hl,hl \ rla  ;2
    add hl,bc \ adc a,d ;3
    add hl,hl \ rla  ;6
    add hl,bc \ adc a,d ;7
    add hl,hl \ rla  ;14
    add hl,bc \ adc a,d ;15
    add hl,hl \ rla  ;30
    add hl,bc \ adc a,d ;31
    add hl,hl \ rla  ;62
    add hl,hl \ rla  ;124
    add hl,bc \ adc a,d ;125
    add hl,hl \ rla  ;250
    add hl,bc \ adc a,d ;251
    add hl,de \ adc a,d ;251x+83
;now AHL mod 65521
; Essentially, we can at least subtract A*65521=A(65536-15), so add A*15 to HL
    ex de,hl
    ld l,a
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,hl
    sbc hl,bc
    add hl,de
    ld c,15
    jr nc,$+3
    add hl,bc
    add hl,bc
    jr c,$+4
    sbc hl,bc
    ld (seed2),hl
;now seed1 and seed 2 are computed
    ld bc,(seed1)
    ex de,hl
    call BC_Times_DE
    ex de,hl
    ld l,b
    ld h,0
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    add hl,bc
    add hl,hl
    add hl,hl
    add hl,bc
    add hl,hl
    add hl,bc
    ld c,d
    ld d,e
    ld e,a
    ld a,c
    sbc hl,bc
    sbc a,b
    ret nc
    ld c,43
    add hl,bc
    ret
BC_Times_DE:
;BHLA is the result
 ld a,b
 or a
 ld hl,0
 ld b,h
;1
 add a,a
 jr nc,$+4
 ld h,d
 ld l,e
;2
 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,b
;227+10b-7p
 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,b

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,b

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,b

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,b

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,b

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,b

;===
;AHL is the result of B*DE*256
 push hl
 ld h,b
 ld l,b
 ld b,a
 ld a,c
 ld c,h
;1
 add a,a
 jr nc,$+4
 ld h,d
 ld l,e
;2
 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,c
;227+10b-7p
 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,c

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,c

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,c

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,c

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,c

 add hl,hl
 rla
 jr nc,$+4
 add hl,de
 adc a,c

 pop de
;Now BDE*256+AHL
 ld c,a
 ld a,l
 ld l,h
 ld h,c
 add hl,de
 ret nc
 inc b
;BHLA is the 32-bit result
 ret
#ENDIF
#IF rand==4
Random:
;from ionRandom
    ld hl,(seed1)
    ld a,r
    ld d,a
    ld e,(hl)
    add hl,de
    add a,l
    xor h
    ld (seed1),hl
    ld hl,0
    ld e,a
    ld d,h
    add hl,de
    djnz $-1
    ld l,h
    ret
#ENDIF
#IF rand==5
Random:
    ld hl,(seed1)
    xor a
    ld b,h \ ld c,l
    ld de,83
    add hl,hl \ rla    ;2
    add hl,bc \ adc a,d    ;3
    add hl,hl \ rla    ;6
    add hl,bc \ adc a,d    ;7
    add hl,hl \ rla    ;14
    add hl,bc \ adc a,d    ;15
    add hl,hl \ rla    ;30
    add hl,hl \ rla    ;60
    add hl,hl \ rla    ;120
    add hl,bc \ adc a,d    ;121
    add hl,hl \ rla    ;242
    add hl,bc \ adc a,d    ;243
    add hl,de \ adc a,d    ;243x+83
;now AHL mod 65519
; Essentially, we can at least subtract A*65519=A(65536-17), so add A*17 to HL
    ex de,hl
    ld l,a
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,bc
    add hl,de
    ld c,17
    jr nc,$+3
    add hl,bc
    add hl,bc
    jr c,$+4
    sbc hl,bc
    ld (seed1),hl
    ret
#ENDIF
#IF rand==6
Random:
    ld hl,(seed2)
; seed2*251+43 mod 65521 -> seed2
    xor a
    ld b,h \ ld c,l
    ld de,43
    add hl,hl \ rla  ;2
    add hl,bc \ adc a,d ;3
    add hl,hl \ rla  ;6
    add hl,bc \ adc a,d ;7
    add hl,hl \ rla  ;14
    add hl,bc \ adc a,d ;15
    add hl,hl \ rla  ;30
    add hl,bc \ adc a,d ;31
    add hl,hl \ rla  ;62
    add hl,hl \ rla  ;124
    add hl,bc \ adc a,d ;125
    add hl,hl \ rla  ;250
    add hl,bc \ adc a,d ;251
    add hl,de \ adc a,d ;251x+83
;now AHL mod 65521
; Essentially, we can at least subtract A*65521=A(65536-15), so add A*15 to HL
    ex de,hl
    ld l,a
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,hl
    sbc hl,bc
    add hl,de
    ld c,15
    jr nc,$+3
    add hl,bc
    add hl,bc
    jr c,$+4
    sbc hl,bc
    ld (seed2),hl
    ret
#ENDIF
Plot:
;need to plot the pixel at HL
    ld a,l
    cp 64 \ ret nc
#IF update==1
    add a,80h
    out (16),a
#ENDIF
    ld a,h
    cp 96 \ ret nc
    ld c,l
    ld h,0
    ld b,h
    add hl,hl
    add hl,bc
    add hl,hl
    add hl,hl
    ld b,a
    rrca \ rrca \ rrca
    and 0Fh
#IF update==1
    add a,20h
    out (16),a
    add a,20h
#ELSE
    or 40h
#ENDIF
    ld c,a
    ld a,b
    ld b,93h
    add hl,bc
    and 7
    ld b,a
    inc b
    ld a,1
      rrca
      djnz $-1
    or (hl)
    ld (hl),a
#IF update==1
    out (17),a
#ENDIF

    ret

coords:    .db 0,0
count:    .db 0
lfsrseed2:
lfsrseed1:
seed1:
 .dw 256
seed2:
 .dw 0
As notes:
  • Press [ON] or [CLEAR] to exit
  • Set shape as 3 or 4 based on the mode you want to compile.
  • Set rand accordingly. There are 7 included (0,1,2,3,4,5,6). See below.
  • Set update to 256 to update every 256 iterations using the OS routine. This is not recommended. Use 1. It is faster and you see every pixel plotted.
There are 7 included PRNGs and 2 shapes. The shapes are 3 for a triangle, 4 for a square. The PRNGS are:

0 for a 1-byte Linear Feedback Shift Register (LFSR) that I was testing. It is a very fast code and pretty chaotic, but using shape=4 shows that it is actually strongly biased for simple masking. Using shape=3 yields excellent results with some minor bias. (credit goes to Iambian and Geekboy for this)

1 for a 2-byte LFSR, also from Iambian and Geekboy

2 for a much heavier/slower 8-bit PRNG I wrote. It works well for shape=3 and shows some fairly weak bias in shape=4, so it is decent for chaotic behavior

3 for an even larger, slower routine (about half the speed of the previous), but it yields a 24-bit pseudo-random number and passes both tests very well. This routine is on par with the OS rand routine, but many times faster (probably hundreds or thousands of times faster).

4 is the ION routine

5 and 6 are the LCGs used in (3)

You can add your own routines named Random as well, but they should return the result in L.
EDIT: Updated the code, added a download with a readme.

60
ASM / Speed Optimised HL_mod_10
« on: December 15, 2013, 01:44:11 pm »
I was trying to make an HL_mod_10 routine today employing some of the trickery I used with my Rand24 routine (24-bit pseudo-random number generator). I wanted to make it as fast as I could, so I was wondering if anybody wanted to have a fun game of optimise-the-short-assembly-routine.

My approach was to break HL into two parts as H*256+L. Mod 10, that is equivalent to:
(H*260-H*4+L) mod 10=(L-H*4) mod 10

H and L are 8-bit values, so H*4 has the potential of being a 10 bit number. If I call this X*256+Y where X is the upper 2 bits of H*4:
(L-H*4) mod 10 = (L-X*256-Y) mod 10
(L-X*256-Y) mod 10 =(L-(X*260-X*4)-Y) mod 10
(L-(X*260-X*4)-Y) mod 10=(L-Y+X*4) mod 10

Now that is an 8-bit subtraction with L and Y. If there is 8-bit 'overflow' with the subtraction, then I need to adjust it by adding a constant. Then, X*4 is at most a 4 bit number to add to that. Once that is all done, you can perform mod 10 stuff on an 8-bit number. I came up with this code:
Code: [Select]
HL_mod_10
;Input: HL
;Output: (HL_mod_10 -> A)
;197.75 avg cycles
ld a,l
ld l,h
ld h,0
add hl,hl
add hl,hl
sub l \ jr nc,$+8 \ add a,4 \ jr nc,$+4 \ add a,6
sla h \ sla h
add a,h \ jr nc,$+8 \ sub 4 \ jr nc,$+4 \ sub 6
sub 160 \ jr nc,$+4 \ add a,160
sub 80 \ jr nc,$+4 \ add a,80
sub 40 \ jr nc,$+4 \ add a,40
sub 20 \ jr nc,$+4 \ add a,20
sub 10 \ ret nc \ add a,10 \ ret

It averages a smidge under 197.75 t-states.


EDIT: If anybody wants to modify this for other languages, here is a supplemental explanation:
This is the equivalent of HL mod 10:
(L-Y+X*4) mod 10

If the input bits of HL are abcdefghijklmnop₂, then:
L=ijklmnop₂
Y=cdefgh00₂
X=000000ab₂

So to put it into an 8-bit value to operate on:
ijklmnop₂-cdefgh00₂+0000ab00₂

If this is >255 (it will be at most 267), then add 6 and keep the lower 8 bits.
If this is <0 (it will be at the lowest -255), then take the lower 8 bits (signed, two's complement form) and add 4. If that overflows past 255, then add another 6, keeping only the lower 8 bits.
So for example, if ijklmnop₂-cdefgh00₂+0000ab00₂ = -3 → FD+04 → 101₁₆ → 01+06 → 07.

Now perform mod 10 on the 8-bit result. You can apply more tricks to this if you like. abcdefgh₂=128a+0bcdefgh₂, so if a=1, you can do 0bcdefgh₂-2, for example. Or:
0000ab00₂+00cdefgh₂
Or:
0000abc0₂+000defgh₂
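
And if you want to convince yourself of the identity before porting it, a brute-force Python check (my sketch) over all 16-bit inputs:
Code: [Select]
# check: (256*H + L) mod 10 == (L - Y + 4*X) mod 10, where 4*H = 256*X + Y
for H in range(256):
    for L in range(256):
        X, Y = (4 * H) >> 8, (4 * H) & 0xFF
        assert (256 * H + L) % 10 == (L - Y + 4 * X) % 10
print("identity holds for all HL")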
