Omnimaga

Calculator Community => TI Calculators => ASM => Topic started by: cerzus69 on December 07, 2011, 08:13:26 am

Title: 24 bit multiplication
Post by: cerzus69 on December 07, 2011, 08:13:26 am
I have been looking for two days now to find a z80 ASM routine that multiplies two 24 bit numbers to get a 48 bit result.

I need it because I want to try and make a Mandelbrot set generator for the Ti84+ using Q2.21 Fixed Point numbers.

Now I would try to come up with one myself but if anyone knows of an existing one I'd like to know!

Thanks
Title: Re: 24 bit multiplication
Post by: jacobly on December 07, 2011, 11:05:18 pm
I do have a 24-bit floating-point multiplication routine ;D

Anyway, for normal 24-bit multiplication, it might just barely be possible without using memory, iy, or shadow registers...
Code: [Select]
; hldebc = hla * cde, a = 0
ld b,0
push hl
ld l,b
ld h,a
ld ix,0
ld a,24
Loop:
add ix,ix
adc hl,hl
ex (sp),hl
adc hl,hl
ex (sp),hl
jr nc,Next
add ix,de
adc hl,bc
jr nc,Next
ex (sp),hl
inc hl
ex (sp),hl
Next:
dec a
jr nz,Loop
pop de
ex de,hl
push ix ; ld c,ixl
pop bc ; ld b,ixh
...cool, I was right!

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)
Code: [Select]
; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
rl c
jr nc,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 10:01:41 am
Thanks a lot jacobly for your reply!

So, those 2 routines, are they for floating point numbers or for integers? I assume the latter, but just to be sure :)
Any idea how long they take to complete on average? I could count myself but seeing you talking about saving 1149 cycles makes me think you have an idea yourself ;)
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 10:11:48 am
And, I'm assuming the result is in bc?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 10:12:35 am
hldebc, actually, according to the note in the first code (I am assuming)
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 10:27:12 am
That's right, because the a 24 bit times 24 bit multiplication can have a result of at most 48 bits or 6 bytes, hence it should be stored in 'hldebc' and not just 'bc'.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 10:38:11 am
Ok, this is a bit of a stretch, but how about square-rooting a 24-bit number. Because I need to do the distance formula with a three-byte inputs for x1, x2, y1, y2, z1, and z2.
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 10:53:11 am
Quote
Ok, this is a bit of a stretch, but how about square-rooting a 24-bit number

Interesting question. I have thought about this myself since I would need to square as well for the Mandelbrot formula. I would just multiplicate normally, but I bet there could be some optimization when both the multiplier and multiplicator are the same. Perhaps you could try something with just 8 bits first to keep it simple and then if it works, work up to 24 bits.

EDIT: Sorry, read you question wrong.. thought you meant squaring. Square-rooting would be a little harder I suppose...
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 10:55:41 am
Hmm, that is an interesting question... I know of a simple algorithm that I have ported to 8-bit and 16-bit z80 code, but I haven't tried 24-bit or 32-bit...

EDIT: @cerzus69: he meant sqrt(x) as opposed to x2 I believe
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 11:12:03 am
Yes. Basically, here's what I'm trying to do:

1. (x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2  --> Ans , where all x,y,and z coords are 3 bytes large
2. √Ans
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 11:19:11 am
If all the coords are 3 bytes large wouldn't that mean that Ans could possibly be 6 bytes large?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 11:20:31 am
So you really need 48-bit square root...
This could be fun x.x :D
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 11:31:05 am
Ideas? Anyone willing to have "fun" with this?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 11:36:34 am
Well I am almost positive that using my algorithm would require the use of RAM, even if you used all the shadow registers, too.
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 11:36:52 am
I wouldn't know where to start, I'm having problems enough with 24bit multiplication... :(
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 03:11:20 pm
Well I am almost positive that using my algorithm would require the use of RAM, even if you used all the shadow registers, too.
Can you help me, conceptually with doing math higher than 2 bytes? Maybe I'll write it myself.
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 03:54:25 pm
Okay, here is some pseudo code for an algorithm that will always work and is fast (each cycle gives the next digit):
Code: [Select]
    0→A
     0→D          ;This is where the result will go
     0→C          ;This is a carry
     For(B,1,12)  ;12 is half the number of input bits
     D+D→D        ;We want to shift D left
     D+D+1→C      ;The difference of consecutive squares?
     A<<E         ;rotate E left, then rotate the carry into A
     A<<E         ;In z80, "rlc e \ rla"
     If A>=C      ;If A is greater than or equal to C
       D+1→D
       A-C→A
     End
;You will need an input       (24 bits in the above example)
;You will need an output      (12 bits in the above example)
;You will need a counter      (4 bits in the above example)
;You will need an accumulator (24 bits in the above example)
;You will need a carry        (13 bits in the above example)
;
;You will need at least       (77 bits in the above example)
So for a simple 8-bit implementation in Z80 code:
Code: [Select]
;===============================================================
sqrtE:
;===============================================================
;Input:
;     E is the value to find the square root of
;Outputs:
;     A is E-D^2
;     B is 0
;     D is the result
;     E is not changed
;     HL is not changed
;Destroys:
;     C=2D+1 if D is even, 2D-1 if D is odd

        xor a               ;1      4         4
        ld d,a              ;1      4         4
        ld c,a              ;1      4         4
        ld b,4              ;2      7         7
sqrtELoop:
        rlc d               ;2      8        32
        ld c,d              ;1      4        16
        scf                 ;1      4        16
        rl c                ;2      8        32

        rlc e               ;2      8        32
        rla                 ;1      4        16
        rlc e               ;2      8        32
        rla                 ;1      4        16

        cp c                ;1      4        16
        jr c,$+4            ;4    12|15      48+3x
          inc d             ;--    --        --
          sub c             ;--    --        --
        djnz sqrtELoop      ;2    13|8       47
        ret                 ;1     10        10
If you want to refine the accuracy to round to the nearest integer, you can add this code to the end of the Z80 code:
Code: [Select]
       cp d                ;1      4         4
        jr c,$+3            ;3    12|11     12|11
          inc d             ;--    --        --
It takes advantage of the linear differences of a quadratic (meaning (a(x+1)2+b(x+1)+c)-(ax2+bx+c) is a lnear equation)

Anywho, I am being told I need to finish this up, so I hope this helps until I can find time later >.>
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 03:57:06 pm
Ok, I'm confused. I'll leave this to the more advanced programmers for now.
Title: Re: 24 bit multiplication
Post by: thepenguin77 on December 08, 2011, 07:40:32 pm
Edit2:
  Dang, I need to read closer. Oh well, at least mine is way bigger than jacobly's.

These things actually aren't that difficult to write if you understand the principle by which they work.

Code: [Select]

multBCDbyEHL:
ld a, b
ld b, c
ld c, d
ld d, a
multDBCbyEHL:

push de
push hl

ld ix, $8000
ld (ix), l
xor a
ld h, a
ld l, a

call do8Bits

ld (ix+1), a
pop af
ld (ix), a
push de
ld a, (ix+1)

call do8Bits

ld (ix+1), a
pop af ;least sig number
ex de, hl
ex (sp), hl ;D and 2nd least sig in for new number
ld (ix), l
pop hl ;D and 2nd least sig
push af ;least sig number
push hl ;2nd least sig number
ex de, hl
ld a, (ix+1)

call do8Bits

ld b, a
ld c, h
ld d, l
pop hl
ld a, l
pop hl
ld h, a
ret





;####
;input: DBC = 1 number
; AHL = running number
; (ix) = to multiply by
;output: AHLE = output
; E is done
; DBC = 1 number


do8Bits:
ld (ix+1), 8
loop:
srl (ix)
jr nc, skip

add hl, bc
adc a, d
skip:
rra
rr h
rr l
rr e
dec (ix+1)
jr nz, loop
ret

I had to use 2 bytes of memory, so sorry about that. If you don't like it, you could just replace (ix) with ixl and (ix+1) with ixh and it would be ram independent, but then it wouldn't run on the Nspire.

Also, to my extreme delight, this thing worked on the first try ;D

(The result is in BCDEHL)

Edit:
   Added to wikiti so people will forever be able to multiply large numbers.
Title: Re: 24 bit multiplication
Post by: jacobly on December 08, 2011, 08:35:58 pm
My first multiplication routine takes 2746 - 4570 cycles, the second takes 1680 - 2880 cycles.
Edit: And yes, they are 24*24 bit integer multiplication routines (floating point routines are much more complicated :P).

As for square root, I should be able to do it using only af bc de hl ix (sp), or af bc de hl ix iy, just like the multiplication. Just give me a couple hours to port it to 48-bit.

Edit:
uses af bc de hl ix iy (for two bytes of memory)
84 bytes, 9186-9258 cycles
Code: [Select]
; ahl = sqrt(hldebc)
push bc ; ld c,ixl
pop ix ; ld b,ixh
push de
ld c,l
ld a,h
ld hl,0
ld b,h
ld e,l
ld d,h
ld (iy+asm_Flag1),d
ld (iy+asm_Flag2),24
Loop:
cp $40
push af
sbc hl,de
ld a,b
sbc a,(iy+asm_Flag1)
jr c,Restore
ld b,a
pop af
sub $40
scf
jr Skip
Restore:
pop af
adc hl,de
or a
Skip:
rl e
rl d
rl (iy+asm_Flag1)
add ix,ix
ex (sp),hl
adc hl,hl
ex (sp),hl
rl c
rla
adc hl,hl
rl b
add ix,ix
ex (sp),hl
adc hl,hl
ex (sp),hl
rl c
rla
adc hl,hl
rl b
dec (iy+asm_Flag2)
jr nz,Loop
pop hl
ld a,(iy+asm_Flag1)
ret

uses af bc b' de de' hl hl' ix
74 bytes, 5985-6777 cycles
Code: [Select]
; dea = sqrt(hldebc)
di
push bc ; ld c,ixl
pop ix ; ld b,ixh
ld bc,$40
ld a,l
ld l,h
ld h,b
exx
ld de,0
ld l,e
ld h,d
ld b,24
or a
Loop:
exx
sbc hl,bc
exx
sbc hl,de
jr nc,Skip
exx
add hl,bc
exx
adc hl,de
Skip:
exx
ccf
rl b
exx
rl e
rl d
exx
add ix,ix
rl e
rl d
rla
adc hl,hl
exx
adc hl,hl
exx
add ix,ix
rl e
rl d
rla
adc hl,hl
exx
adc hl,hl
djnz Loop
exx
ld a,b
exx
ei
ret
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 09, 2011, 07:39:22 pm
Okay, so this isn't too helpful, but it is an idea: I came up with a very simple algorithm that gets very accurate very fast (12 cycles for 16-bit numbers). On the Z80 it isn't that fast since it requires multiplication and division, but the algorithm looks like this:
Code: [Select]
B/2→A             ;start with anything, really
For(C,1,12
(A+B/A)/2→A
End

To implement this in assembly code, you will want to multiply the input by 4 (shifting left twice) and then divide the output by 2. Then if you want added precision, if there is a carry, increment by 1. Again, in z80, this algorithm isn't really worth it since it isn't as speed efficient as other methods, but it can be easier to grasp :) Plus, if you move on to coding for devices with the ability to multiply and divide, you will have a really tiny algorithm that uses a counter, input, and output as opposed to a bajillion registers.
Title: Re: 24 bit multiplication
Post by: jacobly on December 10, 2011, 02:45:24 am
Actually, my routine only uses about 3 virtual registers. The only reason it uses so many z80 registers is because the virtual registers are so big. In fact, the first routine uses the same number of bits as your's would (don't forget that A and B in your routine would each be 48 bits wide). However, your routine does have the advantage of using similarly sized registers (and fewer iterations), so it probably would be more useful on other processors.
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 10, 2011, 08:26:06 am
Hey, jacobly, I tested both of your multiplication routines and it doesn't seem like they're doing the same thing... I've tried both of them in my program but they have different results.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 10, 2011, 09:24:34 am
Ok, I'm actually going to be sticking with just two bytes for position and scrapping the sector. That means I'll need to do 2-byte subtraction (no problem), square a 2-byte (rendering a max 4-byte), then add three 4-bytes (do you only need a 4-byte output or should you go up to 5), then square root that. That should be easier.

√( (x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)

√( (16bit-16bit)^2 + (16bit-16bit)^2 + (16bit-16bit)^2)


Can someone explain to me the theory behind multiplying/dividing/sqrt'ing numbers? Is there a standard theorem, regardless of the bit-size of the number?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 10, 2011, 12:25:16 pm
Okay, so multiplying and dividing you probably actually know-- you just don't know it. So you remember in grade school learning how to multiply large numbers:
Code: [Select]
  432
  x27
-----
Well that is actually the best method and fastest. What you do is multiply 2*432. THen you shift the digits left and throw a zero at the end. Then you multiply 432*7 and add the two values together. Voila, multiplication :) Now in binary, math is almost always easier. Because you are using 0 and 1, multiplication is easy:
Code: [Select]
11000101
11010111
so you take the last digit of the second number and you get 1. Multiply 1 times the first and you get 11000101. Now shift that left and check the next bit of the bottom number. If that is 1, add it to the accumulator (the accumulator in this case is the running total, not necessarily the a register).
So here is what it looks like in assembly code:
Code: [Select]
;D*E
     xor a       ;This is the accumulator
     ld b,8      ;this is the counter
MultLoop:
     add a,a     ;this is to rotate the accumulator left. Use rlca, too
     rlc e       ;This puts the next bit in E in the carry
     jr nc,$+3   ;If the bit in E was not 1, you add 0 (so don't add!)
       add a,d   ;1*D=D, so add D. Binary makes math easy.
     djnz MultLoop
     ret

When it comes to division in z80, you are pretty much just doing long division. Finding the square root, however, is something you probably didn't do in school since we have calculators and it was taken out of the school curriculum (I sound old D:). Anywho, I learned how to do it in decimal from a textbook from the 1950's and then I extended it to binary since it is much, much easier in binary. It is difficult to explain in a post (you kind of need to see it to understand it), but here is an excellent site:
http://www.homeschoolmath.net/teaching/square-root-algorithm.php

The second method is what you will want to use as it gives each consecutive digit every cycle. If you can understand that well, you will probably see why it is so much easier in binary and how to create a z80 algorithm :)

Title: Re: 24 bit multiplication
Post by: jacobly on December 10, 2011, 06:13:47 pm
Hey, jacobly, I tested both of your multiplication routines and it doesn't seem like they're doing the same thing... I've tried both of them in my program but they have different results.
They both have different inputs. :)
The first is hla * cde, and the second is hlc * bde.
You probably want to add some code to the beginning of each so that the input works better for what you are doing.
For example:

   ld   a,c
   ld   c,b

at the beginning of the first routine causes them to have the same input (hlc and bde).

Edit: Here's some pseudo code that might help.
Code: [Select]
// Multiply a times b
temp = 0
repeat for each bit in a
temp <<= 1
if (high bit of a set)
temp += b
a <<= 1
return temp

// Divide a by b
temp = 0
repeat for each bit in a
temp <<= 1
temp += high bit of a
a <<= 1
if (temp >= b)
temp -= b
set low bit of a
return a

// Sqrt a
temp = 0
b = 0
repeat for every 2 bits in a
temp += high 2 bits of a
a <<= 2
test = b << 2 + 1
b <<= 1
if (temp >= test)
temp -= test
set low bit of b
return b
// Sqrt a, sometimes better with multiple-of-a-byte registers
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
test = b << 8 + 0x40
b <<= 1
if (temp >= test)
temp -= test
set low bit of b
temp += high 2 bits of a
a <<= 2
return b
The tricky part is figuring out how many bits are in each variable and allocating the z80 registers accordingly.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 11:31:56 am
Ok, so let me just get something straight...

 00100110
x11011001
-------------
1. 00100110
2. 0,01001100
3. 00,10011000
4. 001,00110001
5. 0010,01100011
6. 00100,11000110
7. 001001,10001101
8. 0010011,00011011

Thus, the answer is %00010011 %00011011

Now, let's check: 38 x 217 = 4891  XX it's wrong??
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 11, 2011, 11:55:42 am
Hmm, here is the algorithm in decimal:
Code: [Select]
333
x471
----
1) 4*333
   +Acc  = 1332
   Acc*10= 13320
2) 7*333
   +Acc  = 15651
   Acc*10= 156510
3) 1*333
   +Acc  = 156843
Here is the algorithm in binary:
Code: [Select]
00100110
x11011001
---------------
1) 1x00100110
     +Acc = 00100110
     Acc*2= 01001100
2) 1x00100110
     +Acc = 01001100+00100110=01110010
     Acc*2= 11100100
3) 0x00100110
     +Acc = 11100100
     Acc*2= 111001000
4) 1x00100110
     +Acc = 111101110
     Acc*2= 1111011100
5) 1x00100110
     +Acc = 10000000010
     Acc*2= 100000000100
6) 0x00100110
     +Acc = 100000000100
     Acc*2= 1000000001000
7) 0x00100110
     +Acc = 1000000001000
     Acc*2= 10000000010000
8) 1x00100110
     +Acc = 10000000110110
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 12:47:33 pm
Ok, I think I get it.

You move through the second register. If the current bit is 1, you add 1 x a to a, then rla a. If the current bit is 0, just rla a.
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 11, 2011, 01:07:43 pm
Yes :) Typically you do rla, then check the next bit and if it is set, add the other register :)

The rla at the beginning works because a starts at 0 so 0*0=0.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 01:41:47 pm
Ok. I am particularly interested now in 2-byte multiplication and 4-byte square rooting. How would they be done?
Title: Re: 24 bit multiplication
Post by: jacobly on December 11, 2011, 02:06:55 pm
Code: [Select]
// Multiply a times b
temp = 0
repeat for each bit in a
temp <<= 1
if (high bit of a set)
temp += b
a <<= 1
return temp
if a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
Spoiler For for code:
stolen from Axe
p_MulFull:
   ; Input in hl, result in cahl
   ld   c,h
   ld   a,l
   ld   hl,0   ;11
   ld   b,16   ;7
__MulFullNext:
   add   hl,hl   ;11
   rla      ;4
   rl   c   ;8
   jr   nc,__MulFullSkip   ;12/7
   add   hl,de   ;11
   adc   a,0   ;7
   jr   nc,__MulFullSkip
   inc   c
__MulFullSkip:
   djnz   __MulFullNext
   ret
__MulFullEnd:

Code: [Select]
// Sqrt a
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
test = b << 8 + 0x40
b <<= 1
if (temp >= test)
temp -= test
set low bit of b
temp += high 2 bits of a
a <<= 2
return b
If a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.
Spoiler For code:
stole my own routine from axe (and modified it)
p_Sqrt88:
   ; input in hlde, result in de
   ld   b,16
   ld   a,h
   ld   c,l
   push   de ; ld ixh,d
   pop   ix ; ld ixl,e
   ld   de,0
   ld   h,d
   ld   l,e
__Sqrt88Loop:
   sub   $40
   sbc   hl,de
   jr   nc,__Sqrt88Skip
   add   a,$40
   adc   hl,de
__Sqrt88Skip:
   ccf
   rl   e
   rl   d
   add   ix,ix
   rl   c
   rla
   adc   hl,hl
   add   ix,ix
   rl   c
   rla
   adc    hl,hl
   djnz   __Sqrt88Loop
   ret
__Sqrt88End:
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 02:23:11 pm
Code: [Select]
// Multiply a times b
temp = 0
repeat for each bit in a
temp <<= 1
if (high bit of a set)
temp += b
a <<= 1
return temp
if a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
Spoiler For for code:
stolen from Axe
p_MulFull:
   ; Input in hl, result in cahl
   ld   c,h
   ld   a,l
   ld   hl,0   ;11
   ld   b,16   ;7
__MulFullNext:
   add   hl,hl   ;11
   rla      ;4
   rl   c   ;8
   jr   nc,__MulFullSkip   ;12/7
   add   hl,de   ;11
   adc   a,0   ;7
   jr   nc,__MulFullSkip
   inc   c
__MulFullSkip:
   djnz   __MulFullNext
   ret
__MulFullEnd:

Code: [Select]
// Sqrt a
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
test = b << 8 + 0x40
b <<= 1
if (temp >= test)
temp -= test
set low bit of b
temp += high 2 bits of a
a <<= 2
return b
If a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.
Spoiler For code:
stole my own routine from axe (and modified it)
p_Sqrt88:
   ; input in hlde, result in de
   ld   b,16
   ld   a,h
   ld   c,l
   push   de ; ld ixh,d
   pop   ix ; ld ixl,e
   ld   de,0
   ld   h,d
   ld   l,e
__Sqrt88Loop:
   sub   $40
   sbc   hl,de
   jr   nc,__Sqrt88Skip
   add   a,$40
   adc   hl,de
__Sqrt88Skip:
   ccf
   rl   e
   rl   d
   add   ix,ix
   rl   c
   rla
   adc   hl,hl
   add   ix,ix
   rl   c
   rla
   adc    hl,hl
   djnz   __Sqrt88Loop
   ret
__Sqrt88End:

Kool. Thanks. But, shouldn't the first one have two inputs?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 11, 2011, 02:30:16 pm
So with two-byte multiplication, you can take advantage of the fact that add hl,hl is the same as shifting hl left. It even gives you the carry! So in this case:
Code: [Select]
     ld hl,0
     ld a,16
MultLoop:
     add hl,hl      ;shifts hl left
     rl e \ rl d    ;shifts de left and if hl overflowed, it overflows into de
     jr nc,$+6      ;if the bit in DE is o, skip this chunk
       add hl,bc    ;add bc to hl (think of this as the first number)
       jr nc,$+3    ;overflow into de
         inc de
     dec a
     jr nz,MultLoop
     ret
That will multiply DE times BC and return the result in DEHL. I will see if I can port a square root routine for 32-bit...

EDIT: changed inc e to inc de
Title: Re: 24 bit multiplication
Post by: jacobly on December 11, 2011, 02:48:20 pm
Code: [Select]
// Multiply a times b
temp = 0
repeat for each bit in a
temp <<= 1
if (high bit of a set)
temp += b
a <<= 1
return temp
if a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
Spoiler For for code:
stolen from Axe
p_MulFull:
   ; Input in hl and de, result in cahl
   ld   c,h
   ld   a,l
   ld   hl,0   ;11
   ld   b,16   ;7
__MulFullNext:
   add   hl,hl   ;11
   rla      ;4
   rl   c   ;8
   jr   nc,__MulFullSkip   ;12/7
   add   hl,de   ;11
   adc   a,0   ;7
   jr   nc,__MulFullSkip
   inc   c
__MulFullSkip:
   djnz   __MulFullNext
   ret
__MulFullEnd:

Code: [Select]
// Sqrt a
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
test = b << 8 + 0x40
b <<= 1
if (temp >= test)
temp -= test
set low bit of b
temp += high 2 bits of a
a <<= 2
return b
If a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.
Spoiler For code:
stole my own routine from axe (and modified it)
p_Sqrt88:
   ; input in hlde, result in de
   ld   b,16
   ld   a,h
   ld   c,l
   push   de ; ld ixh,d
   pop   ix ; ld ixl,e
   ld   de,0
   ld   h,d
   ld   l,e
__Sqrt88Loop:
   sub   $40
   sbc   hl,de
   jr   nc,__Sqrt88Skip
   add   a,$40
   adc   hl,de
__Sqrt88Skip:
   ccf
   rl   e
   rl   d
   add   ix,ix
   rl   c
   rla
   adc   hl,hl
   add   ix,ix
   rl   c
   rla
   adc    hl,hl
   djnz   __Sqrt88Loop
   ret
__Sqrt88End:

Kool. Thanks. But, shouldn't the first one have two inputs?

Of course, hl and de, isn't that what I said ;)
Title: Re: 24 bit multiplication
Post by: FloppusMaximus on December 11, 2011, 04:41:31 pm
My first multiplication routine takes 2746 - 4570 cycles, the second takes 1680 - 2880 cycles.
Oh boy, optimization time :D

The best I have so far is somewhere around 1800 cycles average (I'm too lazy to work out the exact probabilities at the moment, and not counting memory delays) using a squaring table and undocumented IX instructions.  Input is BDE and CHL, output is BCDEAL.  This routine works by expanding the formula 2xy = x²+y²-|x-y|², summed over each of the 9 pairs of bytes in the input.

(I'm not saying this is practical - unless you really have thousands of 24-bit multiplications to perform, you don't need this kind of speed.  This is just for fun.)
Code: [Select]
SUBFIRST .macro src1, src2, hdest, ldest
exx
ld a, src1
sub src2
jr nc, $ + 4
neg
exx
ld l, a
ld a, ldest
sub (hl)
ld ldest, a
inc h
ld a, hdest
sbc a, (hl)
ld hdest, a
  .endm

SUBNEXT .macro src1, src2, hdest, ldest
dec h
ex af, af'
exx
ld a, src1
sub src2
jr nc, $ + 4
neg
exx
ld l, a
ex af, af'
ld a, ldest
sbc a, (hl)
ld ldest, a
inc h
ld a, hdest
sbc a, (hl)
ld hdest, a
  .endm

BDE_times_CHL_sqrdiff_v3:
ld a, d
exx
ld h, high(sqrtab)
ld l, a
ld e, (hl)
inc h
ld d, (hl) ; DE = d²
exx
ld a, b
exx
ld l, a
ld b, (hl)
dec h
ld c, (hl) ; BC = b²
exx
ld a, e
exx
ld l, a
ld a, (hl)
inc h
ld h, (hl)
ld l, a ; HL = e²
call BC_DE_HL_times_10101
push bc
push hl
  push de
   exx
   ld a, h
   exx
   ld h, high(sqrtab)
   ld l, a
   ld e, (hl)
   inc h
   ld d, (hl) ; DE = h²
   exx
   ld a, c
   exx
   ld l, a
   ld b, (hl)
   dec h
   ld c, (hl) ; BC = c²
   exx
   ld a, l
   exx
   ld l, a
   ld a, (hl)
   inc h
   ld h, (hl)
   ld l, a ; HL = l²
   call BC_DE_HL_times_10101
   pop ix
  add ix, de
  pop de
adc hl, de
ex de, hl
pop hl
adc hl, bc
ld b, h
ld c, l ; BCDEIX = total
push af

ld h, high(sqrtab)
SUBFIRST e, l, ixh, ixl
SUBNEXT  d, h, d, e
SUBNEXT  b, c, b, c
jp nc, BDE_times_CHL_sqrdiff_v3_nc1
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc1:

inc b

dec h
SUBFIRST e, h, e, ixh
SUBNEXT  d, c, c, d
jr nc, BDE_times_CHL_sqrdiff_v3_nc2
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc2
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc2:

dec h
SUBFIRST d, l, e, ixh
SUBNEXT  b, h, c, d
jr nc, BDE_times_CHL_sqrdiff_v3_nc3
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc3
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc3:

inc c

dec h
SUBFIRST b, l, d, e
jr nc, BDE_times_CHL_sqrdiff_v3_nc4
dec c
jp nz, BDE_times_CHL_sqrdiff_v3_nc4
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc4
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc4:

dec h
SUBFIRST e, c, d, e
pop hl
jr nc, BDE_times_CHL_sqrdiff_v3_nc5
dec c
jp nz, BDE_times_CHL_sqrdiff_v3_nc5
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc5
inc l
BDE_times_CHL_sqrdiff_v3_nc5:

dec b
dec c

rr l
rr b
rr c
rr d
rr e
ld a, ixl
ld l, a
ld a, ixh
rra
rr l
ret


BC_DE_HL_times_10101:
push bc
ld a, h
ex af, af'
sub a
ld c, a
ld b, l
add hl, bc
adc a, a
ld b, e
add hl, bc
adc a, c ; AHL = [ L+H+E L ]
pop bc
push hl
push bc
  ld c, a
  ld b, 0
  ex af, af'
  ld h, a
  add hl, bc ; no way this can carry (initial HL is a square)
  ld c, a
  ld b, e
  sub a
  add hl, bc
  adc a, a ; AHL(SP+2) = [ H+E L+H L+H+E L ]
  add hl, de
  adc a, 0 ; AHL(SP+2) = [ H+E+D L+H+E L+H+E L ]
  pop bc
add hl, bc
adc a, 0 ; AHL(SP) = [ H+E+D+B L+H+E+C L+H+E L ]
ld e, d
ld d, c
add hl, de
adc a, b
jr nc, BC_DE_HL_times_10101_nc1
inc b ; BAHL(SP) = [ B B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc1:
add a, e
jr nc, BC_DE_HL_times_10101_nc2
inc b ; BAHL(SP) = [ B D+B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc2:
pop de
add a, c
ld c, a
ret nc
inc b ; BCHLDE = [ B D+C+B H+E+D+C+B L+H+E+D+C L+H+E L ]
ret

To get back to the topic somewhat, ACagliano, it sounds like you're more interested in squaring than in general multiplication.  Squaring can be considerably faster, especially if you use a lookup table (e.g., my best 16-bit squaring routine is around 170 cycles, versus around 800 for general multiplication.)
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 12, 2011, 10:43:43 am
I do have a 24-bit floating-point multiplication routine ;D

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)
Code: [Select]
; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
rl c
jr nc,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh

Jacobly, are you sure this works because I've been going through the code and it seems to me like the second 'rl c' should instead be 'add carry flag to c'. 2 times 'rl c' per loop seems wrong to me. Could you explain please? Because I've tried it as well in wabbitemu, taking the different input in account, and it is still not doing the right thing.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 12, 2011, 02:35:39 pm
Yeah, all I need is 16-bit subtraction (which 'sub' supports, I think), 16-bit squaring, 32-bit addition, then 32-bit square rooting (or will I need to go up to 40-bit?).
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 12, 2011, 02:48:12 pm
16-bit subtraction
Code: [Select]
or a     ;to make sure the c flag is reset. Not always necessary if you know the c flag will be reset
sbc hl,bc  ;you can do sbc hl,de also.
32-bit addition (you mean two 32-bit inputs?)
Code: [Select]
;Inputs:
;     HLBC is one of the 32-bit inputs
;     DE points to the other 32-bit input in RAM
;Outputs:
;     HLBC is the 32-bit result
;     DE is incremented 3 times
;     A=H
;     c flag is set if there is an overflow
     ld a,(de) \ inc de
     add a,c \ ld c,a
     ld a,(de) \ inc de
     adc a,b \ ld b,a
     ld a,(de) \ inc de
     adc a,l \ ld l,a
     ld a,(de)
     adc a,h \ ld h,a
     ret
Squaring and square rooting... I will think on it D:

Also, I am working on a mini math library that will include RAM based math (so all the values will be in RAM). It seems like a few of these commands will need to rely on some memory. If they do, I suggest using the OP registers (11 bytes of RAM each).
Title: Re: 24 bit multiplication
Post by: jacobly on December 12, 2011, 07:07:45 pm
I do have a 24-bit floating-point multiplication routine ;D

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)
Code: [Select]
; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
rl c
jr nc,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh

Jacobly, are you sure this works because I've been going through the code and it seems to me like the second 'rl c' should instead be 'add carry flag to c'. 2 times 'rl c' per loop seems wrong to me. Could you explain please? Because I've tried it as well in wabbitemu, taking the different input in account, and it is still not doing the right thing.

That's strange. My test program must not have been working right, because when I went back and changed it a bit, it suddenly started telling me that the second routine doesn't work. :/
Anyway, my new test program seems to agree with this change. :)
Code: [Select]
; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
jr nc,Next
inc c
jr nz,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 13, 2011, 11:06:38 am
That's strange. My test program must not have been working right, because when I went back and changed it a bit, it suddenly started telling me that the second routine doesn't work. :/
Anyway, my new test program seems to agree with this change. :)
Code: [Select]
; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
jr nc,Next
inc c
jr nz,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh

Cool, thanks a lot, indeed it works now! :D