Omnimaga

Calculator Community => TI Calculators => ASM => Topic started by: Xeda112358 on May 20, 2013, 08:49:55 am

Title: Fixed Point Logarithm
Post by: Xeda112358 on May 20, 2013, 08:49:55 am
Hey everyone, I wrote a routine for computing fixed point 8.8 natural logs. The issue is that it is huge and slow, taking about 11000 t-states and being 188 bytes for the routine plus the 8.8 FP division routine.

I am sure there are optimisations to make-- I know a few that will increase the size, but speed it up a tiny bit. But I also wonder if there is an even faster algorithm? Here is a breakdown of what my code does:

Code: [Select]
    X-1→X
     push X
     4*X→X    ;in code, X is in HL, so I just use add hl,hl \ add hl,hl. This doesn't really count as a multiplication routine, then.
     Ans+4     ;In code, this is just inc h \ inc h \ inc h \ inc h
     X/Ans
     pop X
     Ans+3
     X/Ans
     Ans+2
     X/Ans
     Ans+1
     X/Ans


Now for the actual code:
Code: [Select]
FPLog88:
;Input:
;     HL is the 8.8 Fixed Point input. H is the integer part, L is the fractional part.
;Output:
;     HL is the natural log of the input, in 8.8 Fixed Point format.
     ld a,h
     or l
     dec hl
     ret z
     inc hl
     push hl
     ld b,15
     add hl,hl
     jr c,$+4
     djnz $-3
     ld a,b
     sub 8
     jr nc,$+4
     neg
     ld b,a
     pop hl
     push af
     jr nz,lnx
     jr nc,$+7
     add hl,hl
     djnz $-1
     jr lnx
     sra h
     rr l
     djnz $-4
lnx:
     dec h        ;subtract 1 so that we are doing ln((x-1)+1) = ln(x)
     ld d,h
     ld e,l
     inc h
     call FPDE_Div_HL  ;preserves DE, returns AHL as the 16.8 result
     inc h        ;now we are doing x/(3+Ans)
     inc h
     inc h
     call FPDE_Div_HL
     inc h        ;now we are doing x/(2+Ans)
     inc h
     call FPDE_Div_HL
     inc h        ;now we are doing x/(1+Ans)
     call FPDE_Div_HL  ;now it is computed to pretty decent accuracy
     pop af       ;the power of 2 that we divided the initial input by
     ret z        ;if it was 0, we don't need to add/subtract anything else
     ld b,a
     jr c,SubtLn2
     push hl
     xor a
     ld de,$B172  ;this is approximately ln(2) in 0.16 FP format
     ld h,a
     ld l,a
     add hl,de
     jr nc,$+3
     inc a
     djnz $-4
     pop de
     rl l         ;returns c flag if we need to round up
     ld l,h
     ld h,a
     jr nc,$+3
     inc hl
     add hl,de
     ret
SubtLn2:
     ld de,$00B1
       or a
       sbc hl,de
       djnz $-3
     ret


FPDE_Div_HL:
;Inputs:
;     DE,HL are 8.8 Fixed Point numbers
;Outputs:
;     DE is preserved
;     AHL is the 16.8 Fixed Point result (rounded to the least significant bit)
     di
     push de
     ld b,h
     ld c,l
     ld a,16
     ld hl,0
Loop1:
     sla e
     rl d
     adc hl,hl
     jr nc,$+8
     or a
     sbc hl,bc
     jp incE
     sbc hl,bc
     jr c,$+5
incE:
     inc e
     jr $+3
     add hl,bc
     dec a
     jr nz,Loop1
     ex af,af'
     ld a,8
Loop2:
     ex af,af'
     sla e
     rl d
     rla
     ex af,af'
     add hl,hl
     jr nc,$+8
     or a
     sbc hl,bc
     jp incE_2
     sbc hl,bc
     jr c,$+5
incE_2:
     inc e
     jr $+3
     add hl,bc
     dec a
     jr nz,Loop2
;round
     ex af,af'
     add hl,hl
     jr c,$+6
     sbc hl,de
     jr c,$+9
     inc e
     jr nz,$+6
     inc d
     jr nz,$+3
     inc a
     ex de,hl
     pop de
     ret
On average this is off from the actual value by about 1/400. Since this is smaller than the smallest 8.8 value, it is pretty accurate. The worst case scenario that I have found so far was trying to find ln(1/256) which was off by about 7/512 from the actual value.

Later I will hopefully be working on sin() and cos() among others, but I thought I would share this and see if anybody had better methods.

EDIT: Thought of a fairly simple optimisation. I am going to test another optimisation that may speed it up by as much as 40%
EDIT2: Here is a much smaller and more efficient log2 routine. It is 71 bytes (so 117 bytes smaller) and takes <4000 t-states:
EDIT3: I found a handful of bugs (errors in translating from hex to assembly) so I went through the code and fixed it. I also optimised a few areas, saving a total of 1 byte and some clock cycles. Also, a legitimate use of 'rr a' instead of 'rra'. I only added that for ambiguity since I only needed to check if A>1. In a later post, I have a size optimised version that averages about 39 t-states slower which is about 1% slower.
Code: [Select]
Log_2_88:
;Inputs:
;     HL is an unsigned 8.8 fixed point number.
;Outputs:
;     HL is the signed 8.8 fixed point value of log base 2 of the input.
;Example:
;     pass HL = 3.0, returns 1.58203125 (actual is ~1.584962501...)
;70 bytes
     ex de,hl
     ld hl,0
     ld a,d
     ld c,8
     or a
     jr z,DE_lessthan_1
     srl d
     jr z,logloop-1
     inc l
     rr e
     jp $-7
DE_lessthan_1:
     ld a,e
     dec hl
     or a
     ret z
     inc l
     dec l
     add a,a
     jr nc,$-2
     ld e,a

     inc d
logloop:
     add hl,hl
     push hl
     ld h,d
     ld l,e
     ld a,e
     ld b,7

     add hl,hl
     rla
     jr nc,$+3
       add hl,de
     djnz $-5

     adc a,0
     add hl,hl
     rla
     jr nc,$+5
       add hl,de
       adc a,0
     ld e,h
     ld d,a
     pop hl
     rr a
     jr z,$+7
       srl d
       rr e
       inc l
     dec c
     jr nz,logloop
     ret
Title: Re: Fixed Point Logarithm
Post by: tr1p1ea on May 23, 2013, 04:47:35 pm
Wow nice! Im not sure ive seen anyone write such a routine before?!
Title: Re: Fixed Point Logarithm
Post by: utz on May 23, 2013, 07:06:23 pm
Very nice. A minor optimization: in line 109 you can use rla instead of rl a, since you're not checking that zero flag anywhere.
Title: Re: Fixed Point Logarithm
Post by: Matrefeytontias on May 24, 2013, 01:42:35 am
I can't really tell, since because of my school level, I even don't know what a logarithm is :P but I wonder what'd be the use of your routine ..?
Title: Re: Fixed Point Logarithm
Post by: ElementCoder on May 24, 2013, 02:18:53 am
Logarithms are used to find out the power you should raise a certian number to to get another number. The natural logarithm (used here) is a base e logarithm, ln(x) is asking to what power should I raise e to get x. Hope you got it :P
Title: Re: Fixed Point Logarithm
Post by: Streetwalrus on May 24, 2013, 07:02:45 am
In equation form, ln(a) is the solution to e^x=a, which is the definition we got in Terminale. BTW e is Euler's constant which is so that e^x=exp(x) where exp is the exponential function. :P
Title: Re: Fixed Point Logarithm
Post by: Matrefeytontias on May 25, 2013, 04:37:10 am
Okay, I got it, thanks :P but I still don't really see what could be the use of this routine in a program.
Title: Re: Fixed Point Logarithm
Post by: DrDnar on May 25, 2013, 05:06:16 pm
I'd just create two look-up tables: one for 0<n<1 and another for 1<n<255. Then, branch depending on the MSB of n. Actually, if you compute a reciprocal, you only need one table. Basically, your input really only has 8 significant bits; surely that can be used for an optimization. Your output domain is -5.6<n<5.6.

Edit: Use change-of-base rule: It's trivial to compute n-log-base-2, so just compute that then divide by the hardcoded log-base-2 of e value.
Title: Re: Fixed Point Logarithm
Post by: Xeda112358 on May 31, 2013, 06:55:30 am
Very nice. A minor optimization: in line 109 you can use rla instead of rl a, since you're not checking that zero flag anywhere.
Oh, oops, I copied that from my calc incorrectly, thanks!

I'd just create two look-up tables: one for 0<n<1 and another for 1<n<255. Then, branch depending on the MSB of n. Actually, if you compute a reciprocal, you only need one table. Basically, your input really only has 8 significant bits; surely that can be used for an optimization. Your output domain is -5.6<n<5.6.

Edit: Use change-of-base rule: It's trivial to compute n-log-base-2, so just compute that then divide by the hardcoded log-base-2 of e value.
To your first point, you would only really need to look at the interval [1,2] which is all my code does. Essentially, it computes ln(x) by taking x=y2n where y is on [1,2], then ln(y2n) = ln(y)+ln(2n)= ln(y)+n*ln(2) and I have ln(2) hardcoded.

That being said, what you put in your edit makes a lot of sense. Shortly after this routine, I wrote a log2 routine that is much faster, and because I would be dividing by a constant log2(e) to change the base to natural log, the division routine could be optimised a bit, saving ~400 cycles. Also, the log2 routine is much better because I can extend it to just about any size fixed-point (or floating point) numbers that I want. I'll edit my first post with that.
Title: Re: Fixed Point Logarithm
Post by: Xeda112358 on June 30, 2013, 06:21:22 pm
I found a few errors in the log_2 routine (translation issues from hex->assembly). As well, I rearranged the code and in all saved a byte and some t-states :) I updated the first post with my preferred version which is slightly speed optimised in the same way as the last version. Here is a cleaner, smaller version and is only slightly slower (about 39 t-states on average):
Code: [Select]
Log_2_88_size:
;Inputs:
;     HL is an unsigned 8.8 fixed point number.
;Outputs:
;     HL is the signed 8.8 fixed point value of log base 2 of the input.
;Example:
;     pass HL = 3.0, returns 1.58203125 (actual is ~1.584962501...)
;averages about 39 t-states slower than original
;62 bytes
     ex de,hl
     ld hl,0
     ld a,d
     ld c,8
     or a
     jr z,DE_lessthan_1
     srl d
     jr z,logloop-1
     inc l
     rr e
     jr $-7
DE_lessthan_1:
     ld a,e
     dec hl
     or a
     ret z
     inc l
     dec l
     add a,a
     jr nc,$-2
     ld e,a

     inc d
logloop:
     add hl,hl
     push hl
     ld h,d
     ld l,e
     ld a,e
     ld b,8

     add hl,hl
     rla
     jr nc,$+5
       add hl,de
       adc a,0
     djnz $-7

     ld e,h
     ld d,a
     pop hl
     rr a           ;this is right >_>
     jr z,$+7
       srl d
       rr e
       inc l
     dec c
     jr nz,logloop
     ret
Also, I used 'rr a' here for the flags to test if a>1 (the c flag is reset before hand, but I could have used sra a or srl a or bit 1,a or bit 1,d).