Calculator Community > ASM

ASM Optimized routines

(1/22) > >>

Galandros:
There are some cools optimized routines around. Calcmaniac is the recordist in z80, probably. At least in calculators z80 forums is.

On to the code:

--- Code: ---;calcmaniac84
cpHLDE:
or a
sbc hl,de
ret
;Important note: because the code is 3 bytes and a call is 3 bytes, just macro in:
;SPASM, TASM and BRASS compatible, I guess
#define cp_HLDE  or a \ sbc hl,de \ add hl,de

;- Reverse a
;input: Byte in A
;output: Reversed byte in A
;destroys B
;Clock cycles: 66
;Bytes: 18
;author: calcmaniac84
reversea:
ld b,a
rrca
rrca
xor b
and %10101010
xor b
ld b,a
rrca
rrca
rrca
rrca
xor b
and %01100110
xor b
rrca
ret

;reverse hl
;curiosity: a easy port of a common reverse A register is more efficient than tricky stuff
;calcmaniac84
;28 bytes and 104 cycles
ld a,l
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rrca
ld l,a
ret

;calc84maniac
;in: a = ABCDEFGH
;out: hl= AABBCCDDEEFFGGHH
rrca
rra
rra
ld l,a
rra
sra l
rla
rr l
sra l
rra
rr l
sra l

rrca
rra
rra
ld h,a
rra
sra h
rla
rr h
sra h
rra
rr h
sra h
ret

--- End code ---

--- Code: ---;Galandros optimized routines
;try to beat me... maybe is possible...

;Displays A register content on screen in decimal ASCII number, using no addition memory
DispA:
ld c,-100
call Na1
ld c,-10
call Na1
ld c,-1
Na1: ld b,'0'-1
Na2: inc b
jr c,Na2
sub c ;works as add 100/10/1
push af ;safer than ld c,a
ld a,b ;char is in b
CALL PUTCHAR ;plot a char. Replace with bcall(_PutC) or similar.
pop af ;safer than ld a,c
ret

;Note the following one is optimized for RPGs menus and the such, it is quite flexible. I am going to use in Lost Legends I ^^
;I started with one which used addition RAM for temporary storage (made by me, too), and optimized for size, speed and no extra memory use! ^.^
;the inc's and dec's were trick to debug -.-", the registers b and c are like counters and flags

;DispHL for games
;input: hl=num, d=row,e=col, c=number of algarisms to skip
;number of numbers' characters to display: 5 ; example: 65000
;output: hl displayed, with algarisms skiped and spaces for initial zeros
DispHL_games:
inc c
ld b,1 ;skip 0 flag
ld (CurRow),de
;Number in hl to decimal ASCII
;Thanks to z80 Bits
;inputs: hl = number to ASCII
;example: hl=300 outputs '  300'
;destroys: af, hl, de used
ld de,-10000
call Num1
ld de,-1000
call Num1
ld de,-100
call Num1
ld e,-10
call Num1
ld e,-1
Num1:
ld a,'0'-1
Num2: inc a
jr c,Num2
sbc hl,de
dec c ;c is skipping
jr nz,skipnum
inc c
djnz notcharnumzero
cp '0'
jr nz,notcharnumzero
inc b
skipnum:
ld a,' '
notcharnumzero:
push bc
call PUTCHAR  ;bcall(_PutC) works, not sure if it preserves bc
pop bc
ret

PUTCHAR:
bcall(_PutC)
ret

;Example usage of DispHL_games to understand what I mean
Test2:
ld hl,60003
ld de,\$0101
ld c,0
call DispHL_games
ld hl,60003
ld de,\$0102
ld c,1
call DispHL_games
ret

--- End code ---

Well, don't try to understand or optimize calcmaniac84 ones. j/k, trying to understand can be harsh (tip: have a good instruction set summary) but teaches some inner details of the z80 asm.

Quigibo:
Here is a little optimization I use but haven't really seen around.  When you need a direct key press, you have to wait about 7 clock cycles between setting the port and reading it.  Most people just fill in the extra space with a waste instruction like this:

--- Code: ---ld a,xx
out (1),a
ld a,(de)
in a,(1)
and yy

--- End code ---
9 Bytes, 43 T-States.

You can actually use the waste instruction to do something useful.  It gives a slight speed increase.

--- Code: ---ld a,xx
out (1),a
ld b,yy
in a,(1)
and b

--- End code ---
9 Bytes, 40 T-States.

calc84maniac:
Small and quick setup for IM 2 (this example sets up vector table at \$9900 and interrupt jump at \$9a9a, but values can be changed)

--- Code: ---di
ld a,\$99
ld bc,\$0100
ld h,a
ld d,a
ld l,c
ld e,b
ld i,a
inc a
ld (hl),a
ldir
ld l,a
ld (hl),\$c3
inc l
ld (hl),intvec & \$ff
inc l
ld (hl),intvec >> 8
im 2
ei
--- End code ---

Galandros:
I found this optimized routine around. It is as far optimized as z80 string copy can get.

--- Code: ---;author: calcmaniac84, I think
;Copy zero terminated string at HL to DE.
StrCopy:
xor a
docopystr:
cp (hl)
ldi
jr nz,docopystr
ret

--- End code ---

These are quite optimized. But may be is possible to optimize further. (speed and size) But it is not needed...
They shift a graphics buffer (optimized to 96x64) up or down by pixels passed in A register.

--- Code: ---scroll_up:
#ifdef DEBUG
cp 64+1
call c,ErrorOverFlow
#endif
ld l,a
ld e,a
ld h,0
ld d,h

push hl
ld de,768
ex de,hl
; carry is never set here if input is correct
; or a
sbc hl,de
ld b,h
ld c,l ; bc=768-12*a
ex de,hl
ld de,plotsscreen
ldir
;blank remaining area
ld h,d
ld l,e
inc de
ld (hl),\$00
pop bc
dec bc ; bc=12*a-1
ldir
ret
;PSEUDO CODE
; ld hl,plotsscreen+12*a
; ld de,plotsscreen
; ld bc,768-12*a
; ldir
; ld h,d
; ld l,e
; ld (hl),\$00
; inc de
; ld bc,12*a
; dec bc
; ldir
; ret

scroll_down:
#ifdef DEBUG
cp 64+1
call c,ErrorOverFlow
#endif
; a can be from 1 to 63
; a can be multiplied by 4
ld l,a ; hl = a*4
ld e,a
xor a
ld h,a
ld d,a
add hl,hl ; hl = a*8
add hl,de ; hl = a*12
ld e,a ; de = 0

push hl ; a*12 will needed later
push hl ; 2 times
ex de,hl
;carry is never set here
; or a
sbc hl,de ; hl= -a*12, de=a*12
ld de,plotsscreen+767
pop bc
push hl
ld hl,768+1
;carry always set
; or a
sbc hl,bc
ld b,h
ld c,l
pop hl
lddr
;blank remaining area
ld h,d
ld l,e
ld (hl),\$00
dec de
pop bc
dec bc
lddr
ret

; ld hl,plotsscreen+767-12*a
; ld de,plotsscreen+767
; ld bc,768-12*a
; lddr
; or
; ld (hl),\$00 ;; ld hl,plotsscreen
; ld h,d ;; ld (hl),\$00
; ld l,e ;; ld de,hl+1
; dec de ;; ld bc,12*a-1
; ld bc,12*a-1 ;; ldir
; lddr ;; ret
; ret

--- End code ---

mapar007:
Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k