Omnimaga
Calculator Community => TI Calculators => ASM => Topic started by: ACagliano on October 10, 2011, 10:14:06 am
-
I need a couple asm routines. They are for a program I am working on. They are as follows. The data the routine reads will be pushed onto the stack. Assuming you pop into hl:
1. Render a circle onto the screen
hl = time to wait, after rendering element, before moving on
hl + 1 = circle center x coord
hl + 2 = circle center y coord
hl + 3 = radius
2. Render a white rectangle with black border onto screen
hl = time to wait, after rendering element, before moving on
hl + 1 = x of upper left corner
hl + 2 = y of upper left corner
hl + 3 = width
hl + 4 = height
3. Render text onto the screen
hl = time to wait...
hl + 1 = x to start display
hl + 2 = y to start display
hl + 3 = width of text display (in chars)
hl + 4 = zero-terminated string
4. Render sprite onto screen
hl = time to wait...
hl + 1 = x to start
hl + 2 = y to start
hl + 3 = width
hl + 4 = height
hl + 5 = sprite data
Any help would be great.
-
While I'm not necessarily going to write the routines for you (I've already written quite a few) here's how I would go about writing them:
1. I know my house might get bombed for this, but honestly, the only way I can see to do this would be to check out how Axe does it. Axe draws perfect circles very quickly and if I needed a circle routine, I would just copy axe's.
2. This one has actually been done for you, bcall(_DrawRectBorderClear) (http://education.ti.com/calculators/downloads/US/Software/Download/en/177/6585/83psysroutines.pdf#page=142) draws a rectangle with a white inner section. (Page 142 if your browser doesn't redirect you)
3. If you mean just draw text, then bcall(_VPutS) is your routine. But, if you mean draw text within a certain boundary area, bcall(_SFont_Len) will tell you how wide an individual letter is. From there, you can draw the letters one by one with bcall(_VPutMap) and start a new line whenever you run out of space. (bcall(_SFont_Len) is on page 47 of that pdf I linked above)
4. There are so many sprite rendering routines, there's no need to make a new one. Here (http://wikiti.brandonw.net/index.php?title=Z80_Routines:Graphic:put8x8sprite) is a page for 8-pixel-wide sprites and here (http://wikiti.brandonw.net/index.php?title=Z80_Routines:Graphic:put16xBsprite) is a page for 16-pixel-wide sprites. However, if you want just a single routine to do all of your sprites no matter how big they are, this (http://wikiti.brandonw.net/index.php?title=Z80_Routines:Graphic:putLargeSprite) is your routine. Of course, picking one of the smaller ones would be faster than this one.
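To make the idea in point 3 concrete, here is a rough sketch of wrapping small-font text with bcall(_VPutMap). It wraps after a fixed number of characters (the "width in chars" from the first post) instead of measuring pixel widths with bcall(_SFont_Len). The names DrawWrapped, WrapLoop and LINE_HEIGHT are made up for this example, the 6-pixel line spacing is an assumption, and it expects ti83plus.inc for penCol, penRow and bcall.

```asm
;Sketch only, not tested on hardware.
;Inputs (assumed): hl -> zero-terminated string
;                  d = pen column of the left edge
;                  e = pen row of the first line
;                  c = characters per line (1 or more)
LINE_HEIGHT .equ 6          ;assumed small-font line spacing in pixels
DrawWrapped:
    ld a, d
    ld (penCol), a          ;back to the left edge
    ld a, e
    ld (penRow), a
    ld b, c                 ;b = characters left on this line
WrapLoop:
    ld a, (hl)
    inc hl
    or a
    ret z                   ;zero terminator: done
    push hl                 ;save everything across the bcall to be safe
    push de
    push bc
    bcall(_VPutMap)         ;draws the character in A at the pen position
    pop bc
    pop de
    pop hl
    djnz WrapLoop           ;still room on this line
    ld a, e
    add a, LINE_HEIGHT
    ld e, a                 ;move down one line
    jr DrawWrapped
```

There is no check for running off the bottom of the screen, so the caller has to keep the text short enough to fit.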
As for the delays, your best bet is to make a single delay routine and call it. In all honesty this routine right here is probably all you need:
Delay:
dec hl
ld a, h
or l
jr nz, Delay
ret
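For a feel of how long that loop runs, here is a small usage sketch. The cycle figures are my own count from the standard Z80 timings (6 + 4 + 4 + 12 = 26 t-states per pass), not something measured on a calculator.

```asm
    ld hl, 0        ;0 wraps around, so the loop runs 65,536 times
    call Delay      ;about 65,536 * 26 = 1.7 million t-states
                    ;roughly 0.28 s at 6 MHz, roughly 0.11 s at 15.5 MHz
```

So a single call tops out at a fraction of a second; longer waits need an outer loop or a slower inner loop.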
-
1. I know my house might get bombed for this, but honestly, the only way I can see to do this would be to check out how Axe does it. Axe draws perfect circles very quickly and if I needed a circle routine, I would just copy axe's.
Ok. Will do.
2. This one has actually been done for you, bcall(_DrawRectBorderClear) (http://education.ti.com/calculators/downloads/US/Software/Download/en/177/6585/83psysroutines.pdf#page=142) draws a rectangle with a white inner section. (Page 142 if your browser doesn't redirect you)
Awesome. Thanks.
3. If you mean just draw text, then bcall(_VPutS) is your routine. But, if you mean draw text within a certain boundary area, bcall(_SFont_Len) will tell you how wide an individual letter is. From there, you can draw the letters one by one with bcall(_VPutMap) and start a new line whenever you run out of space. (bcall(_SFont_Len) is on page 47 of that pdf I linked above)
Yeah, I'll be drawing text of the small font within a boundary area.
4. There are so many sprite rendering routines, there's no need to make a new one. Here (http://wikiti.brandonw.net/index.php?title=Z80_Routines:Graphic:put8x8sprite) is a page for 8-pixel-wide sprites and here (http://wikiti.brandonw.net/index.php?title=Z80_Routines:Graphic:put16xBsprite) is a page for 16-pixel-wide sprites. However, if you want just a single routine to do all of your sprites no matter how big they are, this (http://wikiti.brandonw.net/index.php?title=Z80_Routines:Graphic:putLargeSprite) is your routine. Of course, picking one of the smaller ones would be faster than this one.
I need one routine to draw sprites of different sizes. Thanks.
As for the delays, your best bet is to make a single delay routine and call it. In all honesty this routine right here is probably all you need:
Delay:
dec hl
ld a, h
or l
jr nz, Delay
ret
Yeah. I think I can handle that. How many 'nop' or 'halt' cycles would equal 1 second, in fast mode?
-
For nops, you need a whole heck of a lot. Each nop takes 4 t-states, and the median calculator runs at about 15,500,000 t-states per second, so that is 15,500,000 / 4, or roughly 3.9 million nops.
If you want to use instructions to slow down your program, here's a routine you can use:
;hl = milliseconds of delay
milliDelay:
ld b, 174 ;7
innerLoop:
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
djnz innerLoop ;13*173+8 ;2257
dec hl ;6
ld a, h ;4
or l ;4
jr nz, milliDelay ;12
ret ;15,514 t-states per pass of the outer loop
Halts work entirely differently, though. A halt waits for an interrupt, and the interrupts fire at a constant rate: 118 Hz on an 83+ and 107.79 Hz on everything else. Since you said you're running in fast mode, you are clearly not on an 83+, so you would need 108 halts to wait for one second.
(I said "median" calculator up above because calculators run anywhere from 14.5 MHz to 17.0 MHz in fast mode)
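As a usage sketch for the routine above (assuming the calculator really is running near 15.5 MHz, which is what the cycle counts are tuned for):

```asm
    ld hl, 1000         ;1000 ms = about one second
    call milliDelay     ;returns with hl = 0, a = 0 and b = 0
```

The four ex (sp), hl instructions swap hl with the top of the stack an even number of times, so both hl and the return address are intact at the end of each pass. Note that hl = 0 on entry wraps around and waits 65,536 ms.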
-
Well, this may be the delay I utilize. Let me clarify...by "fast mode" I mean that I have opened the program with this...
in a,($2E) ;initialize faster processing
push af
ld a,0
out ($2E),a
call Start ;jump to main program. the main program will return here when 'ret' is called
pop af
out ($2E),a
ret
-
Oh, well, that's not really fast mode. Doing that will get you a 15% speed increase in the best-case scenario, which is running from flash in 15 MHz mode.
Port ($2E) is actually just a delay port that TI added. Zeroing it takes away the 1 t-state delay per read from flash that TI added. Furthermore, its effects are only seen if you are running in 15 MHz mode; it has no effect in 6 MHz mode. (Unless you do some other stuff. Check WikiTI for the full interaction between ports 29, 2A, 2B, and 2E.)
This code however:
ld a, 3
out ($20), a
Will make the calculator run about 2.5 times as fast (15 MHz instead of 6 MHz). This is all it takes to put the calculator into 15 MHz mode. In all honesty, you really don't even need your code. The only reason you might need it is if you are running a very time-sensitive app. But, since you know about it, you might as well zero port ($2E) anyways. Also, you are 100% allowed to leave port ($20) at 03 (default 00) because the OS is going to throw it in 03 when you return anyways. Leaving port ($2E) at 00 should be fine as well ;)
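Putting the two together, a start-up wrapper in the same shape as the port ($2E) code posted earlier might look like the sketch below. Restoring port ($20) on exit is optional, as noted above; it is shown only for symmetry. This assumes a model that actually has 15 MHz mode. On hardware without it, port ($20) does not control the CPU speed, so real code should check the model first.

```asm
;Sketch only. Start is the main program label from the earlier post.
    in a, ($20)         ;remember the current speed setting
    push af
    ld a, 3
    out ($20), a        ;15 MHz mode
    call Start          ;main program returns here with 'ret'
    pop af
    out ($20), a        ;put the old speed back (optional)
    ret
```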
-
;hl = milliseconds of delay
milliDelay:
ld b, 174 ;7
innerLoop:
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
djnz innerLoop ;13*173+8 ;2257
dec hl ;6
ld a, h ;4
or l ;4
jr nz, milliDelay ;12
ret ;15,514 t-states per pass of the outer loop
What would I do if hl is in seconds, rather than milliseconds? And, actually, my code holds the number of seconds to delay in 'a'.
-
Multiply hl by 1000 to turn it into milliseconds. If you don't need all of the precision, left-shifting hl by 10 (that is, multiplying by 1024) is approximately the same.
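Since the seconds count is in 'a', both options can be sketched as below. SecondsToMillis, AddLoop and SecondsToMillisFast are made-up names. The result has to fit in 16 bits, which limits the exact version to 65 seconds and the shifted version to 63.

```asm
;Exact: hl = a * 1000. a must be 1 to 65 (a = 0 would loop 256 times).
SecondsToMillis:
    ld b, a
    ld hl, 0
    ld de, 1000
AddLoop:
    add hl, de          ;hl = hl + 1000
    djnz AddLoop
    ret

;Approximate: hl = a * 1024, about 2.4% too long. a must be 1 to 63.
SecondsToMillisFast:
    ld h, a
    ld l, 0             ;hl = a * 256 (a shift left by 8)
    add hl, hl          ;a * 512
    add hl, hl          ;a * 1024
    ret
```

Either routine can be followed directly by call milliDelay.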
-
Can I use a for this, rather than hl? And what routines can I use to draw a line between two pixel coords?
-
Can I use a for this, rather than hl?
;a = milliseconds of delay
milliDelay:
ld b, 174 ;7 <-Increase this for larger delay increments.
innerLoop:
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
djnz innerLoop ;13*173+8 ;2257
dec a ;4
nop ;4
nop ;4 Left here to keep similar timing.
jr nz, milliDelay ;12
ret ;15,512 t-states per pass of the outer loop
Or you can use A as the high byte of HL
;a = high byte of milliseconds of delay
ld l, 0 ;7
ld h, a ;4
milliDelay:
ld b, 174 ;7
innerLoop:
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
ex (sp), hl ;19*174 ;3306
djnz innerLoop ;13*173+8 ;2257
dec hl ;6
ld a, h ;4
or l ;4
jr nz, milliDelay ;12
ret ;15,514 t-states per pass of the outer loop
-
Ok, now, all I have to do is figure out...right now 'a' is in milliseconds. My 'a' is in seconds.
-
So if you want to delay for approximately 'a' seconds you can try this:
ld b,107
halt
djnz $-1
dec a
jr nz,$-6
ret
Does this help?
-
What does $-6 mean? $-1?
-
$ refers to the address of the current instruction. We usually use that if we're too lazy to use label names. This code is equivalent:
delayLoop:
ld b,107
haltLoop:
halt
djnz haltLoop
dec a
jr nz,delayLoop
ret
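To see where the -1 and -6 come from, count the bytes each instruction takes. The offsets below are relative to the start of the routine and use the standard Z80 instruction sizes.

```asm
;offset  size  instruction
;  +0     2    ld b,107       <- target of jr nz,$-6
;  +2     1    halt           <- target of djnz $-1
;  +3     2    djnz $-1       ;$ = +3, so $-1 = +2 (the halt)
;  +5     1    dec a
;  +6     2    jr nz,$-6      ;$ = +6, so $-6 = +0 (the ld b,107)
;  +8     1    ret
```

This is also why labels are safer: insert or remove one instruction and every hand-counted offset after it is wrong.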
-
Remember to have interrupts enabled. ;)
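If there is any chance interrupts were disabled earlier, a defensive version of the labelled routine just starts with ei. DelaySeconds is a made-up name for this sketch.

```asm
;a = seconds to wait (a = 0 wraps around and waits 256 seconds)
DelaySeconds:
    ei                  ;halt never wakes up if interrupts are off
delayLoop:
    ld b, 107           ;about 107 interrupts per second
haltLoop:
    halt
    djnz haltLoop
    dec a
    jr nz, delayLoop
    ret
```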
-
Thank you mucho. I don't actually disable them to begin with...lol.
-
If you don't want to look up the instruction bytes and are too lazy to write out labels, you could also use relative labels (with Spasm):
_ ld b,107
_ halt
djnz -_
dec a
jr nz, --_
ret
-
Okay, so Runer was speculating about how to get the best speed out of a math routine, so the first challenge he gave was for 8x8 multiplication with a 16-bit output. I am not doing all that well with the challenge, but here is a variation of what I came up with that is actually an 8x16 multiplication (it requires 4 more cycles to make it 8x8).
A_Times_DE:
;Input:
; A,DE
;Outputs:
; A is 0
; BC is not changed
; DE is not changed
; HL is the result
; z flag is set
; c flag is set if the input A is not 0
;Notes:
; If A is 0, 29 cycles
;Speed: 145+6n+21b cycles
; n=floor(log(a)/log(2))
; b is the number of bits in the number
; Testing over all values of A from 1 to 255:
; 313.7058824 average cycles
; Worst: 355
; Best : 166 (non trivial)
;Size: 25 bytes
ld hl,0 ;10
or a \ ret z ;9
cpl \ scf ;8
adc a,a ;4
jp nc,$+7 ;10 ;45
Loop:
add a,a ;4
jp c,$-1 ;10 ;14(7-n)
add hl,de ;11 ;11 (the rest are counted below)
add a,a ;4 ;4b
ret z ;5|11 ;5b+6
add hl,hl ;11 ;11b-11
jp p,$-4 ;21|20 ;20n+b
jp $-7
So that code is about twice as large as the following more standard routine, but it is also on average about 52 cycles faster and the worst case scenario is 35 cycles better than the worst case for the next routine. The advantage with the following code is that the speed is not all over the place:
DE_Times_A:
;Inputs:
; DE and A are factors
;Outputs:
; A is not changed
; B is 0
; C is not changed
; DE is not changed
; HL is the product
;Speed:
; 342+6x, x is the number of bits in A
; Average: 366 cycles
;Size:
; 13 bytes
ld b,8 ;7 7
ld hl,0 ;10 10
add hl,hl ;11*8 88
rlca ;4*8 32
jr nc,$+3 ;(12|18)*8 96+6x
add hl,de ;-- --
djnz $-5 ;13*7+8 99
ret ;10 10
Another interesting note is that the first routine does not use a counter and so it preserves BC. For the best speed without an LUT, I still think unrolled routines are the best :)
Actually, another note-- the first routine doesn't really get a speed boost from unrolling :/
You can make a hybrid of the two codes that will cost the b register, but it will be 21 bytes and average a smidge over 320 cycles, too.
EDIT: Also, I posted here because the topic title had a nice name and I am not ready to submit it elsewhere without the scrutiny of the better asm coders, first :P
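As a usage sketch tying this back to the delay question earlier in the topic, the shorter DE_Times_A routine above can turn a seconds count in 'a' into milliseconds for milliDelay. The product is only 16 bits wide, so this assumes 'a' is 65 or less.

```asm
    ld a, 5             ;seconds
    ld de, 1000
    call DE_Times_A     ;hl = 5 * 1000 = 5000, a is still 5, b = 0
    call milliDelay     ;wait about 5 seconds at 15.5 MHz
```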
-
Okay, so Runer was speculating about how to get the best speed out of a math routine, so the first challenge he gave was for 8x8 multiplication with a 16-bit output. I am not doing all that well with the challenge, but here is a variation of what I came up with that is actually an 8x16 multiplication (it requires 4 more cycles to make it 8x8).
Unless I've made a mistake somewhere, this routine doesn't work as written, because at the time you're testing the sign flag, the bit you're interested in has already been shifted out. But it's an interesting idea, so here's a version that works (but could probably be optimized more):
ld hl,0 ; 10
or a ; 4
ret z ; 5
scf ; 4
skip_zeroes:
adc a,a ; 4(9-n)
jr nc,skip_zeroes ; 12(9-n) - 5
jp loop_add0 ; 10
loop_add:
ret z ; 5k + 6
add hl,hl ; 11k
loop_add0:
add hl,de ; 11(k+1)
loop_noadd:
add a,a ; 4n
jr c, loop_add ; 7n + 5k
add hl,hl ; 11(n-k-1)
jp loop_noadd ; 10(n-k-1)
If I've worked it out correctly, this has a minimum (non-trivial) running time of 192, a maximum of 437, and average of ~368.57.
-
For reference, here are the timings for the routines from Z80 Bits (http://baze.au.com/misc/z80bits.html#1.1).
Rolled:
                      ; Ref    Worst  Best
MultHbyE:
    ld l, 0           ; 7      7      7
    ld d, l           ; 4      4      4
    sla h             ; 8      8      8
    jr nc,$+3         ; 12/7   12     7
    ld l,e            ; 4             4
    ld b, 7           ; 7      7      7
_:  add hl,hl         ; 11     11     11
    jr nc,$+3         ; 12/7   7      12
    add hl,de         ; 11     11
    djnz -_           ; 13/8   13     13
                      ;        -5     -5
                      ;        327    284
MultAbyDE:
    ld hl, 0          ; 10     10     10
    ld c, l           ; 4      4      4
    add a,a           ; 4      4      4
    jr nc,$+4         ; 12/7   7      12
    ld h,d            ; 4      4
    ld l,e            ; 4      4
    ld b, 7           ; 7      7      7
_:  add hl,hl         ; 11     11     11
    rla               ; 4      4      4
    jr nc,$+4         ; 12/7   7      12
    add hl,de         ; 11     11
    adc a,c           ; 4      4
    djnz -_           ; 13/8   13     13
                      ;        -5     -5
                      ;        385    312
Unrolled:
MultHbyE:
; L and D must already be 0
    sla h             ; 8      8      8
    jr nc,$+3         ; 12/7   12     7
    ld l,e            ; 4             4
    add hl,hl         ; 11     11     11
    jr nc,$+3         ; 12/7   7      12
    add hl,de         ; 11     11
                      ; *7     *7     *7
                      ;        223    180
MultAbyDE:
; HL and C must already be 0
    add a,a           ; 4      4      4
    jr nc,$+4         ; 12/7   7      12
    ld h,d            ; 4      4
    ld l,e            ; 4      4
    add hl,hl         ; 11     11     11
    rla               ; 4      4      4
    jr nc,$+4         ; 12/7   7      12
    add hl,de         ; 11     11
    adc a,c           ; 4      4
                      ; *7     *7     *7
                      ;        278    205
-
This was to Floppus, but I lost internet connection for a while:
Dang, I must have messed up big time somewhere. You are right, that routine really does not work at all. Now I need to figure out where I went wrong because I was sure I had a working version :(
I think you overcounted your cycles, too o.o What I did to make things easier was to remove the code that stripped the leading zeroes, and that alone has a worst case of 404 cycles and a best case of 327 cycles. Pretty much, it is 316+11b, where b is the number of bits set in A (b=0 is the trivial case and I am not counting that). I still haven't figured out how best to analyse your routine, but I think it is a lot faster than you thought and better than the normal routine.