I made a neat little algorithm today that seems to do an okay job at approximating sine. My original intention was to create a routine for the following, in a fixed-point format where x is on [0,1):

x(1-x)

However, if x is in `A`, I can just compute a*(-a), since in this format 1-x is -a modulo 256. But `NEG` is 2 bytes, 8 cycles, whereas `CPL` is 1 byte, 4 cycles. That made me wonder if I could make a fast routine to multiply `A` by its complement (the one's complement, which is what `CPL` computes). After working in my notebook with some more abstract approaches (I used 7th-degree polynomials), I reduced it to a fairly simple solution that would be much simpler to integrate into an IC or something.

However, I decided to experiment a little with the result by seeing what happens if I don't let some bits propagate into the upper 8 bits (remember, 8.8 fixed point). So basically, I computed the product of 7th-degree polynomials with binary coefficients, truncated any terms of degree 7 or lower, and converted the result back to binary. The result is a close approximation of x(1-x). This happens to be a close approximation of sine, too, so after some tweaking, I got it to have the same input/output range as Axe (I recall that Quigibo said he used x(1-x) as an approximation to sine, too).
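To make the `NEG` vs. `CPL` trade-off concrete, here is a small Python sketch (not Z80 code; the helper names `mul_neg` and `mul_cpl` are made up for illustration). It assumes `a` is an unsigned 0.8 fixed-point value and that only the high byte of the 16-bit product is kept. Under those assumptions the one's-complement product is never more than one unit below the `NEG` one:

```python
def mul_neg(a: int) -> int:
    """High byte of a * (-a mod 256): 1-x formed the way NEG would."""
    return (a * ((-a) & 0xFF)) >> 8


def mul_cpl(a: int) -> int:
    """High byte of a * (a XOR FFh): 1-x formed the way CPL would."""
    return (a * (a ^ 0xFF)) >> 8


if __name__ == "__main__":
    # Largest gap between the two variants over every 8-bit input.
    worst = max(mul_neg(a) - mul_cpl(a) for a in range(256))
    print("largest difference:", worst)
    print("a = 128:", mul_neg(128), mul_cpl(128))
```

The gap comes from `CPL` giving 255-a rather than 256-a, so the product is short by a/256 before truncation.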

The following uses 3 iterations as opposed to 8 and is faster than the Axe version, but larger by 7 bytes (34 vs. 27):

p_Cos:

ld a,l

add a,64 ; cos(x) = sin(x + quarter period), 64 = 256/4

ld l,a ; falls through into p_Sin

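p_Sin:

ld a,l

add a,a ; shift the sign bit s out into carry

push af ; save the sign for later

ld d,a

ld h,a

ld l,0 ; l accumulates the result

ld bc,037Fh ; b = 3 iterations, c = 7Fh mask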

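__SinLoop:

sla h ; next bit of the angle -> carry

sbc a,a ; a = FFh if carry was set, else 00h

xor d

and c ; keep only the bits allowed by the mask

add a,l

ld l,a

rrc d

srl c

srl c ; mask shrinks by two bits each pass

djnz __SinLoop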

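ld h,b ; b = 0 here, so hl = l

pop af ; restore the sign in carry

ret nc ; positive half: done

xor a

sub l ; a = -l

ret z ; zero stays zero

ld l,a

dec h ; h = FFh, sign-extending the negative result

ret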

;This:

; 34 bytes

; 269 t-states min, else 282, else 294

; avg. 76 t-states faster than Axe

;Axe:

; 27 bytes

; 341+b t-states min, else 354, else 366
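For anyone who wants to poke at the routine without a calculator, here is a rough Python model of it, following the registers instruction by instruction. It is only a sketch: the function names `p_sin` and `p_cos` mirror the labels above, the return value is HL read as a signed integer, and cycle counts and flags other than carry are not modelled.

```python
def p_sin(angle: int) -> int:
    """Step-by-step model of p_Sin; returns HL as a signed integer."""
    a = (angle & 0xFF) << 1           # add a,a
    sign = a >> 8                     # carry out of the doubling = sign bit
    a &= 0xFF
    d = h = a                         # ld d,a / ld h,a
    l = 0                             # ld l,0
    c = 0x7F                          # ld bc,037Fh (b = 3 is the loop count)
    for _ in range(3):
        carry = h >> 7                # sla h: top bit goes to carry
        h = (h << 1) & 0xFF
        a = 0xFF if carry else 0x00   # sbc a,a
        a = (a ^ d) & c               # xor d / and c
        l = (l + a) & 0xFF            # add a,l / ld l,a
        d = ((d >> 1) | (d << 7)) & 0xFF  # rrc d
        c >>= 2                       # srl c / srl c
    return -l if sign else l          # ret nc, otherwise negate


def p_cos(angle: int) -> int:
    """Model of p_Cos: advance the angle by a quarter period, then p_Sin."""
    return p_sin((angle + 64) & 0xFF)


if __name__ == "__main__":
    for t in (0, 16, 32, 64, 128, 192):
        print(t, p_sin(t), p_cos(t))
```

With a period of 256, an input of 64 is a quarter turn and 192 is three quarters, so those two are handy sanity checks for the peak values.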

If the bits of register 'l' are sabcdefg, this basically returns:

`(0aaaaaaa^0bcdefg0)+(000bbbbb^000cdefg)+(00000ccc^00000def)`

Here ^ is bitwise XOR, + is regular integer addition modulo 256, and s is the sign bit (the result is negated when s is set).
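The three XOR terms can also be evaluated directly, which makes it easy to compare against a real sine. The sketch below is an illustration only: `sin_xor` and `worst_error` are made-up names, and the reference curve assumes a period of 256 and a peak of 127, which is the value the expression gives at a quarter period.

```python
import math


def sin_xor(angle: int) -> int:
    """Evaluate the three XOR terms for an 8-bit angle with bits sabcdefg."""
    s = (angle >> 7) & 1
    m = angle & 0x7F                      # abcdefg
    a, b, c = (m >> 6) & 1, (m >> 5) & 1, (m >> 4) & 1
    t1 = (0x7F * a) ^ ((m << 1) & 0x7F)   # 0aaaaaaa ^ 0bcdefg0
    t2 = (0x1F * b) ^ (m & 0x1F)          # 000bbbbb ^ 000cdefg
    t3 = (0x07 * c) ^ ((m >> 1) & 0x07)   # 00000ccc ^ 00000def
    r = (t1 + t2 + t3) & 0xFF
    return -r if s else r


def worst_error() -> float:
    """Largest absolute deviation from 127*sin(2*pi*angle/256)."""
    return max(
        abs(sin_xor(t) - 127 * math.sin(2 * math.pi * t / 256))
        for t in range(256)
    )


if __name__ == "__main__":
    print("sin_xor(64) =", sin_xor(64))
    print("worst error vs. scaled sine:", round(worst_error(), 2))
```

Most of the deviation is inherited from x(1-x) itself, since the parabola sits slightly above the sine curve away from the peak.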

The original algorithm was going to be for computing the natural logarithm.

I came up with some new (to me) algorithms for that as well.

**EDIT1:** Thanks to calc84maniac for pointing out the optimisation using a different order of operations with xor/and! Saved 1 byte, 12 cycles. This let me shuffle some code to save 8 more cycles.

**EDIT2:** Modified the last note to reflect the normalized input/output to match that of Axe.