Omnimaga: The Coders Of Tomorrow
24 May, 2013, 00:43:00 *
Topic: 24 bit multiplication - multiply two 24 bit numbers to get 48 bit result  (Read 1784 times)
ACagliano
« Reply #30 on: 11 December, 2011, 20:41:47 »

Ok. I am particularly interested now in 2-byte multiplication and 4-byte square rooting. How would they be done?

-ACagliano
TI-Basic software developer

My Website


Current Projects
----------------------------
1. Legend of Zelda "Revenge of Ganon"
        -maps: 100%
        -graphics engine: 20% (sprites)
        -AI engine: 0%
        -event scripts: 60% (text left)
        -walking engine: 100%
        -miscellaneous: 40%
  -total progress:  54%

jacobly
« Reply #31 on: 11 December, 2011, 21:06:55 »


// Multiply a times b
temp = 0
repeat for each bit in a
    temp <<= 1
    if (high bit of a set)
        temp += b
    a <<= 1
return temp
If a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.
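
For anyone who wants to check the pseudocode on a PC first, here is a minimal sketch of it in Python. The function name `mul16` and the explicit masks are assumptions added for illustration; they only make the 16-bit and 32-bit widths visible.

```python
def mul16(a, b):
    """Shift-and-add multiply of two 16-bit values into a 32-bit result."""
    temp = 0
    for _ in range(16):                 # one pass per bit of a
        temp = (temp << 1) & 0xFFFFFFFF
        if a & 0x8000:                  # high bit of a set?
            temp += b
        a = (a << 1) & 0xFFFF
    return temp
```

The multiplier is consumed from its high bit down, which is why the running total is doubled before each conditional add.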
Spoiler for code:
stolen from Axe
p_MulFull:
   ; Input in hl, result in cahl
   ld   c,h
   ld   a,l
   ld   hl,0   ;11
   ld   b,16   ;7
__MulFullNext:
   add   hl,hl   ;11
   rla      ;4
   rl   c   ;8
   jr   nc,__MulFullSkip   ;12/7
   add   hl,de   ;11
   adc   a,0   ;7
   jr   nc,__MulFullSkip
   inc   c
__MulFullSkip:
   djnz   __MulFullNext
   ret
__MulFullEnd:


// Sqrt a
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
    test = (b << 8) + 0x40
    b <<= 1
    if (temp >= test)
        temp -= test
        set low bit of b
    temp = (temp << 2) + high 2 bits of a
    a <<= 2
return b
If a is 4 bytes, then b is 2 bytes, temp is 3 bytes, and you loop 16 times.
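
A hedged Python sketch of the same digit-by-digit square root, for checking results against known values. The name `isqrt32` is an assumption, and the remainder is shifted left by two each pass before the next pair of bits is brought in, which is what the shift chain in the routine below does with hl, a, c and ix.

```python
def isqrt32(a):
    """Digit-by-digit integer square root of a 32-bit value (16-bit result)."""
    temp = a >> 24                      # high byte of a
    a = (a << 8) & 0xFFFFFFFF
    b = 0
    for _ in range(16):                 # two bits of a per pass
        test = (b << 8) + 0x40          # (4*b + 1), aligned 6 bits up
        b <<= 1
        if temp >= test:
            temp -= test
            b |= 1
        temp = (temp << 2) | (a >> 30)  # bring in the next two bits of a
        a = (a << 2) & 0xFFFFFFFF
    return b
```

The low 6 bits of temp always hold input bits that have not been reached yet, so the comparison against test is really a comparison of the upper bits against 4*b + 1.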
Spoiler for code:
stole my own routine from axe (and modified it)
p_Sqrt88:
   ; input in hlde, result in de
   ld   b,16
   ld   a,h
   ld   c,l
   push   de ; ld ixh,d
   pop   ix ; ld ixl,e
   ld   de,0
   ld   h,d
   ld   l,e
__Sqrt88Loop:
   sub   $40
   sbc   hl,de
   jr   nc,__Sqrt88Skip
   add   a,$40
   adc   hl,de
__Sqrt88Skip:
   ccf
   rl   e
   rl   d
   add   ix,ix
   rl   c
   rla
   adc   hl,hl
   add   ix,ix
   rl   c
   rla
   adc    hl,hl
   djnz   __Sqrt88Loop
   ret
__Sqrt88End:
ACagliano
« Reply #32 on: 11 December, 2011, 21:23:11 »

Kool. Thanks. But, shouldn't the first one have two inputs?

Xeda112358
« Reply #33 on: 11 December, 2011, 21:30:16 »

So with two-byte multiplication, you can take advantage of the fact that add hl,hl is the same as shifting hl left. It even gives you the carry! So in this case:

     ld hl,0
     ld a,16
MultLoop:
     add hl,hl      ;shifts hl left
     rl e \ rl d    ;shifts de left and if hl overflowed, it overflows into de
     jr nc,$+6      ;if the bit shifted out of DE is 0, skip this chunk
       add hl,bc    ;add bc to hl (think of this as the first number)
       jr nc,$+3    ;overflow into de
         inc de
     dec a
     jr nz,MultLoop
     ret
That will multiply DE times BC and return the result in DEHL. I will see if I can port a square root routine for 32-bit...
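
The routine keeps the multiplier and the growing product in the same DEHL register chain. A rough Python model of that idea follows; `mul_de_bc` is a made-up name and the 32-bit integer `dehl` stands in for the four registers, so treat it as a sketch rather than a cycle-accurate emulation.

```python
def mul_de_bc(de, bc):
    """Model of the loop above: DE * BC -> 32-bit DEHL."""
    hl = 0
    for _ in range(16):
        dehl = ((de << 16) | hl) << 1   # add hl,hl / rl e / rl d
        carry = dehl >> 32              # bit shifted out of the top of DE
        dehl &= 0xFFFFFFFF
        if carry:
            # add hl,bc with the overflow carried into DE (the inc de)
            dehl = (dehl + bc) & 0xFFFFFFFF
        de, hl = dehl >> 16, dehl & 0xFFFF
    return (de << 16) | hl
```

After k passes the partial product is below 2^(16+k), while the unread multiplier bits sit above that position, so the carry from the add never disturbs them. That is also why the carry has to go into all of DE and not only E.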

EDIT: changed inc e to inc de
« Last Edit: 12 December, 2011, 04:43:27 by Xeda112358 »



Grammer Download (2.29.04.12)
Latest update (possibly incomplete)
My pastebin
Spoiler for FileSyst:
FileSyst is an application that provides a folder and filesystem for the TI-83+/84+ calculators. It is designed to be easy to access and use in BASIC, and it can be used to access game files and save data, or to create a command prompt, among other things:

Spoiler for Graphiti:
This is a graph explorer for graph theory. It will require lots of work to finish. Currently you can:
Add/delete vertices
Add edges (direction not shown, but they are directed)
Arrange vertices in a circle (in the future, you will be able to define levels of rings and the number of nodes in each)
Create complete graphs quickly

Plans:
Add adjacency matrix viewer
Deleting edges
Multiple graphs support
Arrows for directed graphs
Planarity testing
Matrix operations
Weighted edges
Chromatic polynomials
Chromatic numbers

Spoiler for Stats:

Samocal             [o---------]
Virtual Processor   [o---------]
EnG                 [oo--------]
Grammer             [ooo-------]
AsmComp             [ooo-------]
Partex              [oooo------]
BatLib              [oooooooo--]
Grammer82           [----------]
Grammer68000        [----------]


Pseudonyms:  Zeda, Xeda, Thunderbolt
Languages:   English, français
Programming: z80 Assembly
             Grammer
             TI-BASIC (83/84/+/SE, 89/89t/92)
Known For:   -Creator of the Grammer programming language
              (Winning program of zContest2011)
             -BatLib- One of the most feature packed libraries for BASIC programmers available
              with over 100 functions and a simple programming language
             -Learning to program z80 in hexadecimal before using an assembler (no computer was
              available!)
╔═╦╗░╠═╬╣▒║ ║║▓╚═╩╝█


jacobly
« Reply #34 on: 11 December, 2011, 21:48:20 »

Of course, hl and de, isn't that what I said? ;)
FloppusMaximus
« Reply #35 on: 11 December, 2011, 23:41:31 »

My first multiplication routine takes 2746 - 4570 cycles, the second takes 1680 - 2880 cycles.
Oh boy, optimization time :D

The best I have so far is somewhere around 1800 cycles average (I'm too lazy to work out the exact probabilities at the moment, and not counting memory delays) using a squaring table and undocumented IX instructions.  Input is BDE and CHL, output is BCDEAL.  This routine works by expanding the formula 2xy = x²+y²-|x-y|², summed over each of the 9 pairs of bytes in the input.

(I'm not saying this is practical - unless you really have thousands of 24-bit multiplications to perform, you don't need this kind of speed.  This is just for fun.)
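
To see the identity at work without reading the Z80 listing, here is a small Python sketch. The names `SQR`, `mul8` and `mul24` are illustrative assumptions, and this version halves each byte-pair product separately, whereas the routine below appears to accumulate the squares first and halve once at the end with the rr chain.

```python
SQR = [i * i for i in range(256)]       # stand-in for a table of squares

def mul8(x, y):
    """8x8 product from table lookups only: 2xy = x^2 + y^2 - |x - y|^2."""
    return (SQR[x] + SQR[y] - SQR[abs(x - y)]) >> 1

def mul24(x, y):
    """24x24 -> 48-bit product as the sum of the 9 byte-pair products."""
    xs = [(x >> (8 * i)) & 0xFF for i in range(3)]
    ys = [(y >> (8 * j)) & 0xFF for j in range(3)]
    total = 0
    for i, xb in enumerate(xs):
        for j, yb in enumerate(ys):
            total += mul8(xb, yb) << (8 * (i + j))   # weight by byte position
    return total
```

Every lookup index stays in 0..255 because |x - y| of two bytes is itself a byte, so a single 256-entry table is enough.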

SUBFIRST .macro src1, src2, hdest, ldest
exx
ld a, src1
sub src2
jr nc, $ + 4
neg
exx
ld l, a
ld a, ldest
sub (hl)
ld ldest, a
inc h
ld a, hdest
sbc a, (hl)
ld hdest, a
  .endm

SUBNEXT .macro src1, src2, hdest, ldest
dec h
ex af, af'
exx
ld a, src1
sub src2
jr nc, $ + 4
neg
exx
ld l, a
ex af, af'
ld a, ldest
sbc a, (hl)
ld ldest, a
inc h
ld a, hdest
sbc a, (hl)
ld hdest, a
  .endm

BDE_times_CHL_sqrdiff_v3:
ld a, d
exx
ld h, high(sqrtab)
ld l, a
ld e, (hl)
inc h
ld d, (hl) ; DE = d²
exx
ld a, b
exx
ld l, a
ld b, (hl)
dec h
ld c, (hl) ; BC = b²
exx
ld a, e
exx
ld l, a
ld a, (hl)
inc h
ld h, (hl)
ld l, a ; HL = e²
call BC_DE_HL_times_10101
push bc
push hl
  push de
   exx
   ld a, h
   exx
   ld h, high(sqrtab)
   ld l, a
   ld e, (hl)
   inc h
   ld d, (hl) ; DE = h²
   exx
   ld a, c
   exx
   ld l, a
   ld b, (hl)
   dec h
   ld c, (hl) ; BC = c²
   exx
   ld a, l
   exx
   ld l, a
   ld a, (hl)
   inc h
   ld h, (hl)
   ld l, a ; HL = l²
   call BC_DE_HL_times_10101
   pop ix
  add ix, de
  pop de
adc hl, de
ex de, hl
pop hl
adc hl, bc
ld b, h
ld c, l ; BCDEIX = total
push af

ld h, high(sqrtab)
SUBFIRST e, l, ixh, ixl
SUBNEXT  d, h, d, e
SUBNEXT  b, c, b, c
jp nc, BDE_times_CHL_sqrdiff_v3_nc1
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc1:

inc b

dec h
SUBFIRST e, h, e, ixh
SUBNEXT  d, c, c, d
jr nc, BDE_times_CHL_sqrdiff_v3_nc2
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc2
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc2:

dec h
SUBFIRST d, l, e, ixh
SUBNEXT  b, h, c, d
jr nc, BDE_times_CHL_sqrdiff_v3_nc3
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc3
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc3:

inc c

dec h
SUBFIRST b, l, d, e
jr nc, BDE_times_CHL_sqrdiff_v3_nc4
dec c
jp nz, BDE_times_CHL_sqrdiff_v3_nc4
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc4
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc4:

dec h
SUBFIRST e, c, d, e
pop hl
jr nc, BDE_times_CHL_sqrdiff_v3_nc5
dec c
jp nz, BDE_times_CHL_sqrdiff_v3_nc5
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc5
inc l
BDE_times_CHL_sqrdiff_v3_nc5:

dec b
dec c

rr l
rr b
rr c
rr d
rr e
ld a, ixl
ld l, a
ld a, ixh
rra
rr l
ret


BC_DE_HL_times_10101:
push bc
ld a, h
ex af, af'
sub a
ld c, a
ld b, l
add hl, bc
adc a, a
ld b, e
add hl, bc
adc a, c ; AHL = [ L+H+E L ]
pop bc
push hl
push bc
  ld c, a
  ld b, 0
  ex af, af'
  ld h, a
  add hl, bc ; no way this can carry (initial HL is a square)
  ld c, a
  ld b, e
  sub a
  add hl, bc
  adc a, a ; AHL(SP+2) = [ H+E L+H L+H+E L ]
  add hl, de
  adc a, 0 ; AHL(SP+2) = [ H+E+D L+H+E L+H+E L ]
  pop bc
add hl, bc
adc a, 0 ; AHL(SP) = [ H+E+D+B L+H+E+C L+H+E L ]
ld e, d
ld d, c
add hl, de
adc a, b
jr nc, BC_DE_HL_times_10101_nc1
inc b ; BAHL(SP) = [ B B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc1:
add a, e
jr nc, BC_DE_HL_times_10101_nc2
inc b ; BAHL(SP) = [ B D+B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc2:
pop de
add a, c
ld c, a
ret nc
inc b ; BCHLDE = [ B D+C+B H+E+D+C+B L+H+E+D+C L+H+E L ]
ret

To get back to the topic somewhat, ACagliano, it sounds like you're more interested in squaring than in general multiplication.  Squaring can be considerably faster, especially if you use a lookup table (e.g., my best 16-bit squaring routine is around 170 cycles, versus around 800 for general multiplication.)
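
The 170-cycle routine itself is not posted, so the following is only a guess at the general idea in Python: split n into bytes h and l, then n² = 65536·h² + 512·h·l + l², where 2·h·l again comes from the table of squares. `SQR` and `square16` are made-up names.

```python
SQR = [i * i for i in range(256)]       # stand-in for a table of squares

def square16(n):
    """Square a 16-bit value using three lookups in an 8-bit square table."""
    h, l = n >> 8, n & 0xFF
    cross = SQR[h] + SQR[l] - SQR[abs(h - l)]   # equals 2*h*l
    return (SQR[h] << 16) + (cross << 8) + SQR[l]
```

Squaring needs three lookups here where a general 16-bit multiply built the same way needs a lookup triple for each of its four byte pairs, which fits the speed gap described above.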
cerzus69
Topic starter
« Reply #36 on: 12 December, 2011, 17:43:43 »

I do have a 24-bit floating-point multiplication routine ;D

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)

; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
rl c
jr nc,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh

Jacobly, are you sure this works? I've been going through the code, and it seems to me that the second 'rl c' should instead be 'add carry flag to c'. Two 'rl c' per loop iteration seems wrong to me. Could you explain, please? I've also tried it in wabbitemu, taking the different inputs into account, and it still isn't doing the right thing.
« Last Edit: 12 December, 2011, 17:51:09 by cerzus69 »
ACagliano
« Reply #37 on: 12 December, 2011, 21:35:39 »

Yeah, all I need is 16-bit subtraction (which 'sub' supports, I think), 16-bit squaring, 32-bit addition, then 32-bit square rooting (or will I need to go up to 40-bit?).

Xeda112358
« Reply #38 on: 12 December, 2011, 21:48:12 »

16-bit subtraction

or a     ;to make sure the c flag is reset. Not always necessary if you know the c flag will be reset
sbc hl,bc  ;you can do sbc hl,de also.
32-bit addition (you mean two 32-bit inputs?)

;Inputs:
;     HLBC is one of the 32-bit inputs
;     DE points to the other 32-bit input in RAM
;Outputs:
;     HLBC is the 32-bit result
;     DE is incremented 3 times
;     A=H
;     c flag is set if there is an overflow
     ld a,(de) \ inc de
     add a,c \ ld c,a
     ld a,(de) \ inc de
     adc a,b \ ld b,a
     ld a,(de) \ inc de
     adc a,l \ ld l,a
     ld a,(de)
     adc a,h \ ld h,a
     ret
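
A short Python sketch of the same byte-at-a-time addition, useful for generating test vectors. The name `add32` and the `bytes` argument standing in for the RAM at DE are assumptions; the low byte comes first, matching the order in which the routine reads memory.

```python
def add32(hlbc, mem):
    """Add the little-endian 32-bit value in mem to hlbc, byte by byte."""
    result, carry = 0, 0
    for i in range(4):                  # C, B, L, H in that order
        s = ((hlbc >> (8 * i)) & 0xFF) + mem[i] + carry
        result |= (s & 0xFF) << (8 * i)
        carry = s >> 8                  # becomes the carry-in of the next adc
    return result, bool(carry)
```

For example, `add32(0xFFFFFFFF, bytes([1, 0, 0, 0]))` wraps to 0 and reports the overflow, which corresponds to the c flag being set on return.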
Squaring and square rooting... I will think on it :(

Also, I am working on a mini math library that will include RAM based math (so all the values will be in RAM). It seems like a few of these commands will need to rely on some memory. If they do, I suggest using the OP registers (11 bytes of RAM each).

jacobly
« Reply #39 on: 13 December, 2011, 02:07:45 »

That's strange. My test program must not have been working right, because when I went back and changed it a bit, it suddenly started telling me that the second routine doesn't work. :-\
Anyway, my new test program seems to agree with this change. :)

; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
jr nc,Next
inc c
jr nz,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh
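
For readers who want to reproduce the difference between the two versions off-calculator, here is a rough register-level model in Python. It is a sketch under assumptions: `mul24_model` and the `second_rl_c` switch are invented names, `top` plays the role of the byte saved at (iy+asm_Flag1), and only the carry behaviour is modelled, not timing or other flags.

```python
def mul24_model(x, y, second_rl_c=False):
    """HLC = x, BDE = y; the result is read back as the chain HL:C:A:IX."""
    hl, c = x >> 8, x & 0xFF
    a, ix = 0, 0
    top, de = y >> 16, y & 0xFFFF
    for _ in range(24):
        # add ix,ix / rla / rl c / adc hl,hl: shift the 48-bit chain left
        ix <<= 1
        cy, ix = ix >> 16, ix & 0xFFFF
        a = (a << 1) | cy
        cy, a = a >> 8, a & 0xFF
        c = (c << 1) | cy
        cy, c = c >> 8, c & 0xFF
        hl = (hl << 1) | cy
        cy, hl = hl >> 16, hl & 0xFFFF
        if not cy:                      # jr nc,Next
            continue
        ix += de                        # add ix,de
        cy, ix = ix >> 16, ix & 0xFFFF
        a += top + cy                   # adc a,(iy+asm_Flag1)
        cy, a = a >> 8, a & 0xFF
        if second_rl_c:                 # first version: rl c, then inc hl
            c = (c << 1) | cy
            cy, c = c >> 8, c & 0xFF
            if cy:
                hl = (hl + 1) & 0xFFFF
        elif cy:                        # corrected: inc c, inc hl on wrap
            c = (c + 1) & 0xFF
            if c == 0:
                hl = (hl + 1) & 0xFFFF
    return (hl << 32) | (c << 24) | (a << 16) | ix
```

With `second_rl_c=True` the extra rotate shifts the remaining multiplier bits in C a second time in the same pass, so an input such as 0x800001 times 1 no longer comes back as 0x800001, while the corrected carry handling matches ordinary multiplication.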
cerzus69
« Reply #40 on: 13 December, 2011, 18:06:38 »

Cool, thanks a lot, indeed it works now! :D