Omnimaga
Calculator Community => TI Calculators => ASM => Topic started by: Xeda112358 on May 20, 2013, 08:49:55 am
-
Hey everyone, I wrote a routine for computing fixed point 8.8 natural logs. The issue is that it is huge and slow, taking about 11000 t-states and being 188 bytes for the routine plus the 8.8 FP division routine.
I am sure there are optimisations to make-- I know a few that will increase the size, but speed it up a tiny bit. But I also wonder if there is an even faster algorithm? Here is a breakdown of what my code does:
- First, I use part of a continued fraction. I tried three methods-- a Maclaurin Series (the most computationally expensive), a series I derived yesterday (slightly less computationally expensive), and continued fractions. The continued fractions converges faster than the other two and requires less than half the number of Multiplication/Division calls.
- The very first step, is testing if the input is zero, in which case it returns $FFFF
- Next, it puts the number in the range of [1,2] by multiplying it by a power of 2 (that power is either negative or positive or zero). It stores the power for later use.
- Next it performs the following computations in BASIC-ish pseudocode (for the continued fraction)
X-1→X
push X
4*X→X ;in code, X is in HL, so I just use add hl,hl \ add hl,hl. This doesn't really count as a multiplication routine, then.
Ans+4 ;In code, this is just inc h \ inc h \ inc h \ inc h
X/Ans
pop X
Ans+3
X/Ans
Ans+2
X/Ans
Ans+1
X/Ans
- The next step is to take the power of two that the original value was multiplied by, multiply that power by ln(2) and add it to the result. If the power was negative (so it was divided by a power of two), I store ln(2) in 8.16 FP format and add it a number of times to an accumulator , then add it to the result. If the power is positive, I simply subtract the 8.8 approximation of ln(2) from the result the number of times needed.
Now for the actual code:
FPLog88:
;Input:
; HL is the 8.8 Fixed Point input. H is the integer part, L is the fractional part.
;Output:
; HL is the natural log of the input, in 8.8 Fixed Point format.
ld a,h
or l
dec hl
ret z
inc hl
push hl
ld b,15
add hl,hl
jr c,$+4
djnz $-3
ld a,b
sub 8
jr nc,$+4
neg
ld b,a
pop hl
push af
jr nz,lnx
jr nc,$+7
add hl,hl
djnz $-1
jr lnx
sra h
rr l
djnz $-4
lnx:
dec h ;subtract 1 so that we are doing ln((x-1)+1) = ln(x)
ld d,h
ld e,l
inc h
call FPDE_Div_HL ;preserves DE, returns AHL as the 16.8 result
inc h ;now we are doing x/(3+Ans)
inc h
inc h
call FPDE_Div_HL
inc h ;now we are doing x/(2+Ans)
inc h
call FPDE_Div_HL
inc h ;now we are doing x/(1+Ans)
call FPDE_Div_HL ;now it is computed to pretty decent accuracy
pop af ;the power of 2 that we divided the initial input by
ret z ;if it was 0, we don't need to add/subtract anything else
ld b,a
jr c,SubtLn2
push hl
xor a
ld de,$B172 ;this is approximately ln(2) in 0.16 FP format
ld h,a
ld l,a
add hl,de
jr nc,$+3
inc a
djnz $-4
pop de
rl l ;returns c flag if we need to round up
ld l,h
ld h,a
jr nc,$+3
inc hl
add hl,de
ret
SubtLn2:
ld de,$00B1
or a
sbc hl,de
djnz $-3
ret
FPDE_Div_HL:
;Inputs:
; DE,HL are 8.8 Fixed Point numbers
;Outputs:
; DE is preserved
; AHL is the 16.8 Fixed Point result (rounded to the least significant bit)
di
push de
ld b,h
ld c,l
ld a,16
ld hl,0
Loop1:
sla e
rl d
adc hl,hl
jr nc,$+8
or a
sbc hl,bc
jp incE
sbc hl,bc
jr c,$+5
incE:
inc e
jr $+3
add hl,bc
dec a
jr nz,Loop1
ex af,af'
ld a,8
Loop2:
ex af,af'
sla e
rl d
rla
ex af,af'
add hl,hl
jr nc,$+8
or a
sbc hl,bc
jp incE_2
sbc hl,bc
jr c,$+5
incE_2:
inc e
jr $+3
add hl,bc
dec a
jr nz,Loop2
;round
ex af,af'
add hl,hl
jr c,$+6
sbc hl,de
jr c,$+9
inc e
jr nz,$+6
inc d
jr nz,$+3
inc a
ex de,hl
pop de
ret
On average this is off from the actual value by about 1/400. Since this is smaller than the smallest 8.8 value, it is pretty accurate. The worst case scenario that I have found so far was trying to find ln(1/256) which was off by about 7/512 from the actual value.
Later I will hopefully be working on sin() and cos() among others, but I thought I would share this and see if anybody had better methods.
EDIT: Thought of a fairly simple optimisation. I am going to test another optimisation that may speed it up by as much as 40%
EDIT2: Here is a much smaller and more efficient log2 routine. It is 71 bytes (so 117 bytes smaller) and takes <4000 t-states:
EDIT3: I found a handful of bugs (errors in translating from hex to assembly) so I went through the code and fixed it. I also optimised a few areas, saving a total of 1 byte and some clock cycles. Also, a legitimate use of 'rr a' instead of 'rra'. I only added that for ambiguity since I only needed to check if A>1. In a later post, I have a size optimised version that averages about 39 t-states slower which is about 1% slower.
Log_2_88:
;Inputs:
; HL is an unsigned 8.8 fixed point number.
;Outputs:
; HL is the signed 8.8 fixed point value of log base 2 of the input.
;Example:
; pass HL = 3.0, returns 1.58203125 (actual is ~1.584962501...)
;70 bytes
ex de,hl
ld hl,0
ld a,d
ld c,8
or a
jr z,DE_lessthan_1
srl d
jr z,logloop-1
inc l
rr e
jp $-7
DE_lessthan_1:
ld a,e
dec hl
or a
ret z
inc l
dec l
add a,a
jr nc,$-2
ld e,a
inc d
logloop:
add hl,hl
push hl
ld h,d
ld l,e
ld a,e
ld b,7
add hl,hl
rla
jr nc,$+3
add hl,de
djnz $-5
adc a,0
add hl,hl
rla
jr nc,$+5
add hl,de
adc a,0
ld e,h
ld d,a
pop hl
rr a
jr z,$+7
srl d
rr e
inc l
dec c
jr nz,logloop
ret
-
Wow nice! Im not sure ive seen anyone write such a routine before?!
-
Very nice. A minor optimization: in line 109 you can use rla instead of rl a, since you're not checking that zero flag anywhere.
-
I can't really tell, since because of my school level, I even don't know what a logarithm is :P but I wonder what'd be the use of your routine ..?
-
Logarithms are used to find out the power you should raise a certian number to to get another number. The natural logarithm (used here) is a base e logarithm, ln(x) is asking to what power should I raise e to get x. Hope you got it :P
-
In equation form, ln(a) is the solution to e^x=a, which is the definition we got in Terminale. BTW e is Euler's constant which is so that e^x=exp(x) where exp is the exponential function. :P
-
Okay, I got it, thanks :P but I still don't really see what could be the use of this routine in a program.
-
I'd just create two look-up tables: one for 0<n<1 and another for 1<n<255. Then, branch depending on the MSB of n. Actually, if you compute a reciprocal, you only need one table. Basically, your input really only has 8 significant bits; surely that can be used for an optimization. Your output domain is -5.6<n<5.6.
Edit: Use change-of-base rule: It's trivial to compute n-log-base-2, so just compute that then divide by the hardcoded log-base-2 of e value.
-
Very nice. A minor optimization: in line 109 you can use rla instead of rl a, since you're not checking that zero flag anywhere.
Oh, oops, I copied that from my calc incorrectly, thanks!
I'd just create two look-up tables: one for 0<n<1 and another for 1<n<255. Then, branch depending on the MSB of n. Actually, if you compute a reciprocal, you only need one table. Basically, your input really only has 8 significant bits; surely that can be used for an optimization. Your output domain is -5.6<n<5.6.
Edit: Use change-of-base rule: It's trivial to compute n-log-base-2, so just compute that then divide by the hardcoded log-base-2 of e value.
To your first point, you would only really need to look at the interval [1,2] which is all my code does. Essentially, it computes ln(x) by taking x=y2n where y is on [1,2], then ln(y2n) = ln(y)+ln(2n)= ln(y)+n*ln(2) and I have ln(2) hardcoded.
That being said, what you put in your edit makes a lot of sense. Shortly after this routine, I wrote a log2 routine that is much faster, and because I would be dividing by a constant log2(e) to change the base to natural log, the division routine could be optimised a bit, saving ~400 cycles. Also, the log2 routine is much better because I can extend it to just about any size fixed-point (or floating point) numbers that I want. I'll edit my first post with that.
-
I found a few errors in the log_2 routine (translation issues from hex->assembly). As well, I rearranged the code and in all saved a byte and some t-states :) I updated the first post with my preferred version which is slightly speed optimised in the same way as the last version. Here is a cleaner, smaller version and is only slightly slower (about 39 t-states on average):
Log_2_88_size:
;Inputs:
; HL is an unsigned 8.8 fixed point number.
;Outputs:
; HL is the signed 8.8 fixed point value of log base 2 of the input.
;Example:
; pass HL = 3.0, returns 1.58203125 (actual is ~1.584962501...)
;averages about 39 t-states slower than original
;62 bytes
ex de,hl
ld hl,0
ld a,d
ld c,8
or a
jr z,DE_lessthan_1
srl d
jr z,logloop-1
inc l
rr e
jr $-7
DE_lessthan_1:
ld a,e
dec hl
or a
ret z
inc l
dec l
add a,a
jr nc,$-2
ld e,a
inc d
logloop:
add hl,hl
push hl
ld h,d
ld l,e
ld a,e
ld b,8
add hl,hl
rla
jr nc,$+5
add hl,de
adc a,0
djnz $-7
ld e,h
ld d,a
pop hl
rr a ;this is right >_>
jr z,$+7
srl d
rr e
inc l
dec c
jr nz,logloop
ret
Also, I used 'rr a' here for the flags to test if a>1 (the c flag is reset before hand, but I could have used sra a or srl a or bit 1,a or bit 1,d).