Omnimaga

Calculator Community => TI Calculators => ASM => Topic started by: cerzus69 on December 07, 2011, 08:13:26 am

Title: 24 bit multiplication
Post by: cerzus69 on December 07, 2011, 08:13:26 am: I have been looking for two days now to find a z80 ASM routine that multiplies two 24 bit numbers to get a 48 bit result.

I need it because I want to try and make a Mandelbrot set generator for the Ti84+ using Q2.21 Fixed Point numbers.

Now I would try to come up with one myself but if anyone knows of an existing one I'd like to know!

Thanks
Title: Re: 24 bit multiplication
Post by: jacobly on December 07, 2011, 11:05:18 pm: I do have a 24-bit floating-point multiplication routine ;D

Anyway, for normal 24-bit multiplication, it might just barely be possible without using memory, iy, or shadow registers...
Code: [Select]
; hldebc = hla * cde, a = 0 ld b,0 push hl ld l,b ld h,a ld ix,0 ld a,24 Loop: add ix,ix adc hl,hl ex (sp),hl adc hl,hl ex (sp),hl jr nc,Next add ix,de adc hl,bc jr nc,Next ex (sp),hl inc hl ex (sp),hl Next: dec a jr nz,Loop pop de ex de,hl push ix ; ld c,ixl pop bc ; ld b,ixh...cool, I was right!

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)
Code: [Select]
; hldebc = hlc * bde ld (iy+asm_Flag1),b xor a ld ix,0 ld b,24 Loop: add ix,ix rla rl c adc hl,hl jr nc,Next add ix,de adc a,(iy+asm_Flag1) rl c jr nc,Next inc hl Next: djnz Loop ld e,a ld d,c push ix ; ld c,ixl pop bc ; ld b,ixh
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 10:01:41 am: Thanks a lot jacobly for your reply!

So, those 2 routines, are they for floating point numbers or for integers? I assume the latter, but just to be sure :)
Any idea how long they take to complete on average? I could count myself but seeing you talking about saving 1149 cycles makes me think you have an idea yourself ;)
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 10:11:48 am: And, I'm assuming the result is in bc?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 10:12:35 am: hldebc, actually, according to the note in the first code (I am assuming)
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 10:27:12 am: That's right, because the a 24 bit times 24 bit multiplication can have a result of at most 48 bits or 6 bytes, hence it should be stored in 'hldebc' and not just 'bc'.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 10:38:11 am: Ok, this is a bit of a stretch, but how about square-rooting a 24-bit number. Because I need to do the distance formula with a three-byte inputs for x1, x2, y1, y2, z1, and z2.
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 10:53:11 am: Quote
Ok, this is a bit of a stretch, but how about square-rooting a 24-bit number

Interesting question. I have thought about this myself since I would need to square as well for the Mandelbrot formula. I would just multiplicate normally, but I bet there could be some optimization when both the multiplier and multiplicator are the same. Perhaps you could try something with just 8 bits first to keep it simple and then if it works, work up to 24 bits.

EDIT: Sorry, read you question wrong.. thought you meant squaring. Square-rooting would be a little harder I suppose...
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 10:55:41 am: Hmm, that is an interesting question... I know of a simple algorithm that I have ported to 8-bit and 16-bit z80 code, but I haven't tried 24-bit or 32-bit...

EDIT: @cerzus69: he meant sqrt(x) as opposed to x² I believe
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 11:12:03 am: Yes. Basically, here's what I'm trying to do:

1. (x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2 --> Ans , where all x,y,and z coords are 3 bytes large
2. √Ans
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 11:19:11 am: If all the coords are 3 bytes large wouldn't that mean that Ans could possibly be 6 bytes large?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 11:20:31 am: So you really need 48-bit square root...
This could be fun x.x :D
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 11:31:05 am: Ideas? Anyone willing to have "fun" with this?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 11:36:34 am: Well I am almost positive that using my algorithm would require the use of RAM, even if you used all the shadow registers, too.
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 08, 2011, 11:36:52 am: I wouldn't know where to start, I'm having problems enough with 24bit multiplication... :(
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 03:11:20 pm: Quote from: Xeda112358 on December 08, 2011, 11:36:34 am
Well I am almost positive that using my algorithm would require the use of RAM, even if you used all the shadow registers, too.
Can you help me, conceptually with doing math higher than 2 bytes? Maybe I'll write it myself.
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 08, 2011, 03:54:25 pm: Okay, here is some pseudo code for an algorithm that will always work and is fast (each cycle gives the next digit):
Code: [Select]
0→A    0→D ;This is where the result will go    0→C ;This is a carry    For(B,1,12) ;12 is half the number of input bits    D+D→D ;We want to shift D left    D+D+1→C ;The difference of consecutive squares?    A<<E ;rotate E left, then rotate the carry into A    A<<E ;In z80, "rlc e \ rla"    If A>=C ;If A is greater than or equal to C    D+1→D    A-C→A    End ;You will need an input (24 bits in the above example) ;You will need an output (12 bits in the above example) ;You will need a counter (4 bits in the above example) ;You will need an accumulator (24 bits in the above example) ;You will need a carry (13 bits in the above example) ; ;You will need at least (77 bits in the above example)So for a simple 8-bit implementation in Z80 code:
Code: [Select]
;=============================================================== sqrtE: ;=============================================================== ;Input: ; E is the value to find the square root of ;Outputs: ; A is E-D^2 ; B is 0 ; D is the result ; E is not changed ; HL is not changed ;Destroys: ; C=2D+1 if D is even, 2D-1 if D is odd    xor a ;1 4 4    ld d,a ;1 4 4    ld c,a ;1 4 4    ld b,4 ;2 7 7 sqrtELoop:    rlc d ;2 8 32    ld c,d ;1 4 16    scf ;1 4 16    rl c ;2 8 32    rlc e ;2 8 32    rla ;1 4 16    rlc e ;2 8 32    rla ;1 4 16    cp c ;1 4 16    jr c,$+4 ;4 12|15 48+3x    inc d ;-- -- --    sub c ;-- -- --    djnz sqrtELoop ;2 13|8 47    ret ;1 10 10If you want to refine the accuracy to round to the nearest integer, you can add this code to the end of the Z80 code:
Code: [Select]
cp d ;1 4 4    jr c,$+3 ;3 12|11 12|11    inc d ;-- -- --It takes advantage of the linear differences of a quadratic (meaning (a(x+1)²+b(x+1)+c)-(ax²+bx+c) is a lnear equation)

Anywho, I am being told I need to finish this up, so I hope this helps until I can find time later >.>
Title: Re: 24 bit multiplication
Post by: ACagliano on December 08, 2011, 03:57:06 pm: Ok, I'm confused. I'll leave this to the more advanced programmers for now.
Title: Re: 24 bit multiplication
Post by: thepenguin77 on December 08, 2011, 07:40:32 pm: Edit2:
Dang, I need to read closer. Oh well, at least mine is way bigger than jacobly's.

These things actually aren't that difficult to write if you understand the principle by which they work.

Code: [Select]
multBCDbyEHL: ld a, b ld b, c ld c, d ld d, a multDBCbyEHL: push de push hl ld ix, $8000 ld (ix), l xor a ld h, a ld l, a call do8Bits ld (ix+1), a pop af ld (ix), a push de ld a, (ix+1) call do8Bits ld (ix+1), a pop af ;least sig number ex de, hl ex (sp), hl ;D and 2nd least sig in for new number ld (ix), l pop hl ;D and 2nd least sig push af ;least sig number push hl ;2nd least sig number ex de, hl ld a, (ix+1) call do8Bits ld b, a ld c, h ld d, l pop hl ld a, l pop hl ld h, a ret ;#### ;input: DBC = 1 number ; AHL = running number ; (ix) = to multiply by ;output: AHLE = output ; E is done ; DBC = 1 number do8Bits: ld (ix+1), 8 loop: srl (ix) jr nc, skip add hl, bc adc a, d skip: rra rr h rr l rr e dec (ix+1) jr nz, loop ret
I had to use 2 bytes of memory, so sorry about that. If you don't like it, you could just replace (ix) with ixl and (ix+1) with ixh and it would be ram independent, but then it wouldn't run on the Nspire.

Also, to my extreme delight, this thing worked on the first try ;D

(The result is in BCDEHL)

Edit:
Added to wikiti so people will forever be able to multiply large numbers.
Title: Re: 24 bit multiplication
Post by: jacobly on December 08, 2011, 08:35:58 pm: My first multiplication routine takes 2746 - 4570 cycles, the second takes 1680 - 2880 cycles.
Edit: And yes, they are 24*24 bit integer multiplication routines (floating point routines are much more complicated :P).

As for square root, I should be able to do it ~~using only af bc de hl ix (sp), or af bc de hl ix iy, just like the multiplication. Just give me a couple hours to port it to 48-bit.~~

Edit:
uses af bc de hl ix iy (for two bytes of memory)
84 bytes, 9186-9258 cycles
Code: [Select]
; ahl = sqrt(hldebc) push bc ; ld c,ixl pop ix ; ld b,ixh push de ld c,l ld a,h ld hl,0 ld b,h ld e,l ld d,h ld (iy+asm_Flag1),d ld (iy+asm_Flag2),24 Loop: cp $40 push af sbc hl,de ld a,b sbc a,(iy+asm_Flag1) jr c,Restore ld b,a pop af sub $40 scf jr Skip Restore: pop af adc hl,de or a Skip: rl e rl d rl (iy+asm_Flag1) add ix,ix ex (sp),hl adc hl,hl ex (sp),hl rl c rla adc hl,hl rl b add ix,ix ex (sp),hl adc hl,hl ex (sp),hl rl c rla adc hl,hl rl b dec (iy+asm_Flag2) jr nz,Loop pop hl ld a,(iy+asm_Flag1) ret
uses af bc b' de de' hl hl' ix
74 bytes, 5985-6777 cycles
Code: [Select]
; dea = sqrt(hldebc) di push bc ; ld c,ixl pop ix ; ld b,ixh ld bc,$40 ld a,l ld l,h ld h,b exx ld de,0 ld l,e ld h,d ld b,24 or a Loop: exx sbc hl,bc exx sbc hl,de jr nc,Skip exx add hl,bc exx adc hl,de Skip: exx ccf rl b exx rl e rl d exx add ix,ix rl e rl d rla adc hl,hl exx adc hl,hl exx add ix,ix rl e rl d rla adc hl,hl exx adc hl,hl djnz Loop exx ld a,b exx ei ret
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 09, 2011, 07:39:22 pm: Okay, so this isn't too helpful, but it is an idea: I came up with a very simple algorithm that gets very accurate very fast (12 cycles for 16-bit numbers). On the Z80 it isn't that fast since it requires multiplication and division, but the algorithm looks like this:
Code: [Select]
B/2→A ;start with anything, really For(C,1,12 (A+B/A)/2→A End
To implement this in assembly code, you will want to multiply the input by 4 (shifting left twice) and then divide the output by 2. Then if you want added precision, if there is a carry, increment by 1. Again, in z80, this algorithm isn't really worth it since it isn't as speed efficient as other methods, but it can be easier to grasp :) Plus, if you move on to coding for devices with the ability to multiply and divide, you will have a really tiny algorithm that uses a counter, input, and output as opposed to a bajillion registers.
Title: Re: 24 bit multiplication
Post by: jacobly on December 10, 2011, 02:45:24 am: Actually, my routine only uses about 3 virtual registers. The only reason it uses so many z80 registers is because the virtual registers are so big. In fact, the first routine uses the same number of bits as your's would (don't forget that A and B in your routine would each be 48 bits wide). However, your routine does have the advantage of using similarly sized registers (and fewer iterations), so it probably would be more useful on other processors.
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 10, 2011, 08:26:06 am: Hey, jacobly, I tested both of your multiplication routines and it doesn't seem like they're doing the same thing... I've tried both of them in my program but they have different results.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 10, 2011, 09:24:34 am: Ok, I'm actually going to be sticking with just two bytes for position and scrapping the sector. That means I'll need to do 2-byte subtraction (no problem), square a 2-byte (rendering a max 4-byte), then add three 4-bytes (do you only need a 4-byte output or should you go up to 5), then square root that. That should be easier.

√( (x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)

√( (16bit-16bit)^2 + (16bit-16bit)^2 + (16bit-16bit)^2)

Can someone explain to me the theory behind multiplying/dividing/sqrt'ing numbers? Is there a standard theorem, regardless of the bit-size of the number?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 10, 2011, 12:25:16 pm: Okay, so multiplying and dividing you probably actually know-- you just don't know it. So you remember in grade school learning how to multiply large numbers:
Code: [Select]
432 x27 -----Well that is actually the best method and fastest. What you do is multiply 2*432. THen you shift the digits left and throw a zero at the end. Then you multiply 432*7 and add the two values together. Voila, multiplication :) Now in binary, math is almost always easier. Because you are using 0 and 1, multiplication is easy:
Code: [Select]
11000101 11010111so you take the last digit of the second number and you get 1. Multiply 1 times the first and you get 11000101. Now shift that left and check the next bit of the bottom number. If that is 1, add it to the accumulator (the accumulator in this case is the running total, not necessarily the a register).
So here is what it looks like in assembly code:
Code: [Select]
;D*E xor a ;This is the accumulator ld b,8 ;this is the counter MultLoop: add a,a ;this is to rotate the accumulator left. Use rlca, too rlc e ;This puts the next bit in E in the carry jr nc,$+3 ;If the bit in E was not 1, you add 0 (so don't add!) add a,d ;1*D=D, so add D. Binary makes math easy. djnz MultLoop ret
When it comes to division in z80, you are pretty much just doing long division. Finding the square root, however, is something you probably didn't do in school since we have calculators and it was taken out of the school curriculum (I sound old D:). Anywho, I learned how to do it in decimal from a textbook from the 1950's and then I extended it to binary since it is much, much easier in binary. It is difficult to explain in a post (you kind of need to see it to understand it), but here is an excellent site:
http://www.homeschoolmath.net/teaching/square-root-algorithm.php

The second method is what you will want to use as it gives each consecutive digit every cycle. If you can understand that well, you will probably see why it is so much easier in binary and how to create a z80 algorithm :)
Title: Re: 24 bit multiplication
Post by: jacobly on December 10, 2011, 06:13:47 pm: Quote from: cerzus69 on December 10, 2011, 08:26:06 am
Hey, jacobly, I tested both of your multiplication routines and it doesn't seem like they're doing the same thing... I've tried both of them in my program but they have different results.
They both have different inputs. :)
The first is hla * cde, and the second is hlc * bde.
You probably want to add some code to the beginning of each so that the input works better for what you are doing.
For example:
ld a,c ld c,b
at the beginning of the first routine causes them to have the same input (hlc and bde).

Edit: Here's some pseudo code that might help.
Code: [Select]
// Multiply a times b temp = 0 repeat for each bit in a temp <<= 1 if (high bit of a set) temp += b a <<= 1 return temp // Divide a by b temp = 0 repeat for each bit in a temp <<= 1 temp += high bit of a a <<= 1 if (temp >= b) temp -= b set low bit of a return a // Sqrt a temp = 0 b = 0 repeat for every 2 bits in a temp += high 2 bits of a a <<= 2 test = b << 2 + 1 b <<= 1 if (temp >= test) temp -= test set low bit of b return b // Sqrt a, sometimes better with multiple-of-a-byte registers temp = high byte of a a <<= 8 b = 0 repeat for every 2 bits in a test = b << 8 + 0x40 b <<= 1 if (temp >= test) temp -= test set low bit of b temp += high 2 bits of a a <<= 2 return bThe tricky part is figuring out how many bits are in each variable and allocating the z80 registers accordingly.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 11:31:56 am: Ok, so let me just get something straight...

00100110
x11011001
-------------
1. 00100110
2. 0,01001100
3. 00,10011000
4. 001,00110001
5. 0010,01100011
6. 00100,11000110
7. 001001,10001101
8. 0010011,00011011

Thus, the answer is %00010011 %00011011

Now, let's check: 38 x 217 = 4891 XX it's wrong??
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 11, 2011, 11:55:42 am: Hmm, here is the algorithm in decimal:
Code: [Select]
333 x471 ---- 1) 4*333 +Acc = 1332 Acc*10= 13320 2) 7*333 +Acc = 15651 Acc*10= 156510 3) 1*333 +Acc = 156843Here is the algorithm in binary:
Code: [Select]
00100110 x11011001 --------------- 1) 1x00100110 +Acc = 00100110 Acc*2= 01001100 2) 1x00100110 +Acc = 01001100+00100110=01110010 Acc*2= 11100100 3) 0x00100110 +Acc = 11100100 Acc*2= 111001000 4) 1x00100110 +Acc = 111101110 Acc*2= 1111011100 5) 1x00100110 +Acc = 10000000010 Acc*2= 100000000100 6) 0x00100110 +Acc = 100000000100 Acc*2= 1000000001000 7) 0x00100110 +Acc = 1000000001000 Acc*2= 10000000010000 8) 1x00100110 +Acc = 10000000110110
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 12:47:33 pm: Ok, I think I get it.

You move through the second register. If the current bit is 1, you add 1 x a to a, then rla a. If the current bit is 0, just rla a.
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 11, 2011, 01:07:43 pm: Yes :) Typically you do rla, then check the next bit and if it is set, add the other register :)

The rla at the beginning works because a starts at 0 so 0*0=0.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 01:41:47 pm: Ok. I am particularly interested now in 2-byte multiplication and 4-byte square rooting. How would they be done?
Title: Re: 24 bit multiplication
Post by: jacobly on December 11, 2011, 02:06:55 pm: Code: [Select]
// Multiply a times b temp = 0 repeat for each bit in a temp <<= 1 if (high bit of a set) temp += b a <<= 1 return tempif a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
Spoiler For for code:
~~stolen~~ from Axep_MulFull: ; Input in hl, result in cahl ld c,h ld a,l ld hl,0 ;11 ld b,16 ;7 __MulFullNext: add hl,hl ;11 rla ;4 rl c ;8 jr nc,__MulFullSkip ;12/7 add hl,de ;11 adc a,0 ;7 jr nc,__MulFullSkip inc c __MulFullSkip: djnz __MulFullNext ret __MulFullEnd:

Code: [Select]
// Sqrt a temp = high byte of a a <<= 8 b = 0 repeat for every 2 bits in a test = b << 8 + 0x40 b <<= 1 if (temp >= test) temp -= test set low bit of b temp += high 2 bits of a a <<= 2 return bIf a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.
Spoiler For code:
~~stole~~ my own routine from axe (and modified it)p_Sqrt88: ; input in hlde, result in de ld b,16 ld a,h ld c,l push de ; ld ixh,d pop ix ; ld ixl,e ld de,0 ld h,d ld l,e __Sqrt88Loop: sub $40 sbc hl,de jr nc,__Sqrt88Skip add a,$40 adc hl,de __Sqrt88Skip: ccf rl e rl d add ix,ix rl c rla adc hl,hl add ix,ix rl c rla adc hl,hl djnz __Sqrt88Loop ret __Sqrt88End:
Title: Re: 24 bit multiplication
Post by: ACagliano on December 11, 2011, 02:23:11 pm: Quote from: jacobly on December 11, 2011, 02:06:55 pm
Code: [Select]
// Multiply a times b temp = 0 repeat for each bit in a temp <<= 1 if (high bit of a set) temp += b a <<= 1 return tempif a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
Spoiler For for code:
~~stolen~~ from Axep_MulFull: ; Input in hl, result in cahl ld c,h ld a,l ld hl,0 ;11 ld b,16 ;7 __MulFullNext: add hl,hl ;11 rla ;4 rl c ;8 jr nc,__MulFullSkip ;12/7 add hl,de ;11 adc a,0 ;7 jr nc,__MulFullSkip inc c __MulFullSkip: djnz __MulFullNext ret __MulFullEnd:

Code: [Select]
// Sqrt a temp = high byte of a a <<= 8 b = 0 repeat for every 2 bits in a test = b << 8 + 0x40 b <<= 1 if (temp >= test) temp -= test set low bit of b temp += high 2 bits of a a <<= 2 return bIf a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.
Spoiler For code:
~~stole~~ my own routine from axe (and modified it)p_Sqrt88: ; input in hlde, result in de ld b,16 ld a,h ld c,l push de ; ld ixh,d pop ix ; ld ixl,e ld de,0 ld h,d ld l,e __Sqrt88Loop: sub $40 sbc hl,de jr nc,__Sqrt88Skip add a,$40 adc hl,de __Sqrt88Skip: ccf rl e rl d add ix,ix rl c rla adc hl,hl add ix,ix rl c rla adc hl,hl djnz __Sqrt88Loop ret __Sqrt88End:

Kool. Thanks. But, shouldn't the first one have two inputs?
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 11, 2011, 02:30:16 pm: So with two-byte multiplication, you can take advantage of the fact that add hl,hl is the same as shifting hl left. It even gives you the carry! So in this case:
Code: [Select]
ld hl,0 ld a,16 MultLoop: add hl,hl ;shifts hl left rl e \ rl d ;shifts de left and if hl overflowed, it overflows into de jr nc,$+6 ;if the bit in DE is o, skip this chunk add hl,bc ;add bc to hl (think of this as the first number) jr nc,$+3 ;overflow into de inc de dec a jr nz,MultLoop retThat will multiply DE times BC and return the result in DEHL. I will see if I can port a square root routine for 32-bit...

EDIT: changed inc e to inc de
Title: Re: 24 bit multiplication
Post by: jacobly on December 11, 2011, 02:48:20 pm: Quote from: ACagliano on December 11, 2011, 02:23:11 pm
Quote from: jacobly on December 11, 2011, 02:06:55 pm
Code: [Select]
// Multiply a times b temp = 0 repeat for each bit in a temp <<= 1 if (high bit of a set) temp += b a <<= 1 return tempif a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
Spoiler For for code:
~~stolen~~ from Axep_MulFull: ; Input in hl and de, result in cahl ld c,h ld a,l ld hl,0 ;11 ld b,16 ;7 __MulFullNext: add hl,hl ;11 rla ;4 rl c ;8 jr nc,__MulFullSkip ;12/7 add hl,de ;11 adc a,0 ;7 jr nc,__MulFullSkip inc c __MulFullSkip: djnz __MulFullNext ret __MulFullEnd:

Code: [Select]
// Sqrt a temp = high byte of a a <<= 8 b = 0 repeat for every 2 bits in a test = b << 8 + 0x40 b <<= 1 if (temp >= test) temp -= test set low bit of b temp += high 2 bits of a a <<= 2 return bIf a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.
Spoiler For code:
~~stole~~ my own routine from axe (and modified it)p_Sqrt88: ; input in hlde, result in de ld b,16 ld a,h ld c,l push de ; ld ixh,d pop ix ; ld ixl,e ld de,0 ld h,d ld l,e __Sqrt88Loop: sub $40 sbc hl,de jr nc,__Sqrt88Skip add a,$40 adc hl,de __Sqrt88Skip: ccf rl e rl d add ix,ix rl c rla adc hl,hl add ix,ix rl c rla adc hl,hl djnz __Sqrt88Loop ret __Sqrt88End:

Kool. Thanks. But, shouldn't the first one have two inputs?

Of course, hl and de, isn't that what I said ;)
Title: Re: 24 bit multiplication
Post by: FloppusMaximus on December 11, 2011, 04:41:31 pm: Quote from: jacobly on December 08, 2011, 08:35:58 pm
My first multiplication routine takes 2746 - 4570 cycles, the second takes 1680 - 2880 cycles.
Oh boy, optimization time :D

The best I have so far is somewhere around 1800 cycles average (I'm too lazy to work out the exact probabilities at the moment, and not counting memory delays) using a squaring table and undocumented IX instructions. Input is BDE and CHL, output is BCDEAL. This routine works by expanding the formula 2xy = x²+y²-|x-y|², summed over each of the 9 pairs of bytes in the input.

(I'm not saying this is practical - unless you really have thousands of 24-bit multiplications to perform, you don't need this kind of speed. This is just for fun.)
Code: [Select]
SUBFIRST .macro src1, src2, hdest, ldest exx ld a, src1 sub src2 jr nc, $ + 4 neg exx ld l, a ld a, ldest sub (hl) ld ldest, a inc h ld a, hdest sbc a, (hl) ld hdest, a .endm SUBNEXT .macro src1, src2, hdest, ldest dec h ex af, af' exx ld a, src1 sub src2 jr nc, $ + 4 neg exx ld l, a ex af, af' ld a, ldest sbc a, (hl) ld ldest, a inc h ld a, hdest sbc a, (hl) ld hdest, a .endm BDE_times_CHL_sqrdiff_v3: ld a, d exx ld h, high(sqrtab) ld l, a ld e, (hl) inc h ld d, (hl) ; DE = d² exx ld a, b exx ld l, a ld b, (hl) dec h ld c, (hl) ; BC = b² exx ld a, e exx ld l, a ld a, (hl) inc h ld h, (hl) ld l, a ; HL = e² call BC_DE_HL_times_10101 push bc push hl push de exx ld a, h exx ld h, high(sqrtab) ld l, a ld e, (hl) inc h ld d, (hl) ; DE = h² exx ld a, c exx ld l, a ld b, (hl) dec h ld c, (hl) ; BC = c² exx ld a, l exx ld l, a ld a, (hl) inc h ld h, (hl) ld l, a ; HL = l² call BC_DE_HL_times_10101 pop ix add ix, de pop de adc hl, de ex de, hl pop hl adc hl, bc ld b, h ld c, l ; BCDEIX = total push af ld h, high(sqrtab) SUBFIRST e, l, ixh, ixl SUBNEXT d, h, d, e SUBNEXT b, c, b, c jp nc, BDE_times_CHL_sqrdiff_v3_nc1 pop af ccf push af BDE_times_CHL_sqrdiff_v3_nc1: inc b dec h SUBFIRST e, h, e, ixh SUBNEXT d, c, c, d jr nc, BDE_times_CHL_sqrdiff_v3_nc2 dec b jp nz, BDE_times_CHL_sqrdiff_v3_nc2 pop af ccf push af BDE_times_CHL_sqrdiff_v3_nc2: dec h SUBFIRST d, l, e, ixh SUBNEXT b, h, c, d jr nc, BDE_times_CHL_sqrdiff_v3_nc3 dec b jp nz, BDE_times_CHL_sqrdiff_v3_nc3 pop af ccf push af BDE_times_CHL_sqrdiff_v3_nc3: inc c dec h SUBFIRST b, l, d, e jr nc, BDE_times_CHL_sqrdiff_v3_nc4 dec c jp nz, BDE_times_CHL_sqrdiff_v3_nc4 dec b jp nz, BDE_times_CHL_sqrdiff_v3_nc4 pop af ccf push af BDE_times_CHL_sqrdiff_v3_nc4: dec h SUBFIRST e, c, d, e pop hl jr nc, BDE_times_CHL_sqrdiff_v3_nc5 dec c jp nz, BDE_times_CHL_sqrdiff_v3_nc5 dec b jp nz, BDE_times_CHL_sqrdiff_v3_nc5 inc l BDE_times_CHL_sqrdiff_v3_nc5: dec b dec c rr l rr b rr c rr d rr e ld a, ixl ld l, a ld a, ixh rra rr l ret BC_DE_HL_times_10101: push bc ld a, h ex af, af' sub a ld c, a ld b, l add hl, bc adc a, a ld b, e add hl, bc adc a, c ; AHL = [ L+H+E L ] pop bc push hl push bc ld c, a ld b, 0 ex af, af' ld h, a add hl, bc ; no way this can carry (initial HL is a square) ld c, a ld b, e sub a add hl, bc adc a, a ; AHL(SP+2) = [ H+E L+H L+H+E L ] add hl, de adc a, 0 ; AHL(SP+2) = [ H+E+D L+H+E L+H+E L ] pop bc add hl, bc adc a, 0 ; AHL(SP) = [ H+E+D+B L+H+E+C L+H+E L ] ld e, d ld d, c add hl, de adc a, b jr nc, BC_DE_HL_times_10101_nc1 inc b ; BAHL(SP) = [ B B H+E+D+C+B L+H+E+D+C L+H+E L ] BC_DE_HL_times_10101_nc1: add a, e jr nc, BC_DE_HL_times_10101_nc2 inc b ; BAHL(SP) = [ B D+B H+E+D+C+B L+H+E+D+C L+H+E L ] BC_DE_HL_times_10101_nc2: pop de add a, c ld c, a ret nc inc b ; BCHLDE = [ B D+C+B H+E+D+C+B L+H+E+D+C L+H+E L ] ret
To get back to the topic somewhat, ACagliano, it sounds like you're more interested in squaring than in general multiplication. Squaring can be considerably faster, especially if you use a lookup table (e.g., my best 16-bit squaring routine is around 170 cycles, versus around 800 for general multiplication.)
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 12, 2011, 10:43:43 am: Quote from: jacobly on December 07, 2011, 11:05:18 pm
I do have a 24-bit floating-point multiplication routine ;D

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)
Code: [Select]
; hldebc = hlc * bde ld (iy+asm_Flag1),b xor a ld ix,0 ld b,24 Loop: add ix,ix rla rl c adc hl,hl jr nc,Next add ix,de adc a,(iy+asm_Flag1) rl c jr nc,Next inc hl Next: djnz Loop ld e,a ld d,c push ix ; ld c,ixl pop bc ; ld b,ixh

Jacobly, are you sure this works because I've been going through the code and it seems to me like the second 'rl c' should instead be 'add carry flag to c'. 2 times 'rl c' per loop seems wrong to me. Could you explain please? Because I've tried it as well in wabbitemu, taking the different input in account, and it is still not doing the right thing.
Title: Re: 24 bit multiplication
Post by: ACagliano on December 12, 2011, 02:35:39 pm: Yeah, all I need is 16-bit subtraction (which 'sub' supports, I think), 16-bit squaring, 32-bit addition, then 32-bit square rooting (or will I need to go up to 40-bit?).
Title: Re: 24 bit multiplication
Post by: Xeda112358 on December 12, 2011, 02:48:12 pm: 16-bit subtraction
Code: [Select]
or a ;to make sure the c flag is reset. Not always necessary if you know the c flag will be reset sbc hl,bc ;you can do sbc hl,de also.32-bit addition (you mean two 32-bit inputs?)
Code: [Select]
;Inputs: ; HLBC is one of the 32-bit inputs ; DE points to the other 32-bit input in RAM ;Outputs: ; HLBC is the 32-bit result ; DE is incremented 3 times ; A=H ; c flag is set if there is an overflow ld a,(de) \ inc de add a,c \ ld c,a ld a,(de) \ inc de adc a,b \ ld b,a ld a,(de) \ inc de adc a,l \ ld l,a ld a,(de) adc a,h \ ld h,a retSquaring and square rooting... I will think on it D:

Also, I am working on a mini math library that will include RAM based math (so all the values will be in RAM). It seems like a few of these commands will need to rely on some memory. If they do, I suggest using the OP registers (11 bytes of RAM each).
Title: Re: 24 bit multiplication
Post by: jacobly on December 12, 2011, 07:07:45 pm: Quote from: cerzus69 on December 12, 2011, 10:43:43 am
Quote from: jacobly on December 07, 2011, 11:05:18 pm
I do have a 24-bit floating-point multiplication routine ;D

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)
Code: [Select]
; hldebc = hlc * bde ld (iy+asm_Flag1),b xor a ld ix,0 ld b,24 Loop: add ix,ix rla rl c adc hl,hl jr nc,Next add ix,de adc a,(iy+asm_Flag1) rl c jr nc,Next inc hl Next: djnz Loop ld e,a ld d,c push ix ; ld c,ixl pop bc ; ld b,ixh

Jacobly, are you sure this works because I've been going through the code and it seems to me like the second 'rl c' should instead be 'add carry flag to c'. 2 times 'rl c' per loop seems wrong to me. Could you explain please? Because I've tried it as well in wabbitemu, taking the different input in account, and it is still not doing the right thing.

That's strange. My test program must not have been working right, because when I went back and changed it a bit, it suddenly started telling me that the second routine doesn't work. :/
Anyway, my new test program seems to agree with this change. :)
Code: [Select]
; hldebc = hlc * bde ld (iy+asm_Flag1),b xor a ld ix,0 ld b,24 Loop: add ix,ix rla rl c adc hl,hl jr nc,Next add ix,de adc a,(iy+asm_Flag1) jr nc,Next inc c jr nz,Next inc hl Next: djnz Loop ld e,a ld d,c push ix ; ld c,ixl pop bc ; ld b,ixh
Title: Re: 24 bit multiplication
Post by: cerzus69 on December 13, 2011, 11:06:38 am: Quote from: jacobly on December 12, 2011, 07:07:45 pm
That's strange. My test program must not have been working right, because when I went back and changed it a bit, it suddenly started telling me that the second routine doesn't work. :/
Anyway, my new test program seems to agree with this change. :)
Code: [Select]
; hldebc = hlc * bde ld (iy+asm_Flag1),b xor a ld ix,0 ld b,24 Loop: add ix,ix rla rl c adc hl,hl jr nc,Next add ix,de adc a,(iy+asm_Flag1) jr nc,Next inc c jr nz,Next inc hl Next: djnz Loop ld e,a ld d,c push ix ; ld c,ixl pop bc ; ld b,ixh

Cool, thanks a lot, indeed it works now! :D

SMF 2.0.15 | SMF © 2017, Simple Machines
Dream Portal 1.1 © 2009–2025 Dream Portal Team