### Topic: ASM Optimized routines

#### Galandros

• LV9 Veteran (Next: 1337)
• Posts: 1140
• Rating: +42/-10
##### ASM Optimized routines
« on: February 28, 2010, 07:27:53 am »
There are some cool optimized routines around. calc84maniac is probably the record holder for z80 optimization, at least on the calculator z80 forums.

On to the code:
Code: [Select]
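;calc84maniac
;Compare hl with de without changing either: z set if hl = de, carry set if hl < de
cpHLDE:
or a ;clear carry so sbc is a plain subtraction
sbc hl,de
add hl,de ;restore hl; z is untouched and carry comes out the same
ret
;Important note: because the code is only 4 bytes and a call is 3 bytes, just use a macro instead:
;SPASM, TASM and BRASS compatible, I guess
#define cp_HLDE  or a \ sbc hl,de \ add hl,de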

;- Reverse a
;input: Byte in A
;output: Reversed byte in A
;destroys B
;Clock cycles: 66
;Bytes: 18
;author: calc84maniac
reversea:
ld b,a
rrca
rrca
xor b
and %10101010
xor b
ld b,a
rrca
rrca
rrca
rrca
xor b
and %01100110
xor b
rrca
ret

;reverse hl
;curiosity: an easy port of a common reverse A register routine is more efficient than tricky stuff
;calc84maniac
;28 bytes and 104 cycles
ld a,l
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla ;shifts in the last bit of the old h; a now holds the old h reversed
ld l,a
ret

;calc84maniac
;in: a = ABCDEFGH
;out: hl= AABBCCDDEEFFGGHH
rrca
rra
rra
ld l,a
rra
sra l
rla
rr l
sra l
rra
rr l
sra l

rrca
rra
rra
ld h,a
rra
sra h
rla
rr h
sra h
rra
rr h
sra h
ret
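
A usage sketch for the cp_HLDE macro above, written out in full. After the compare, carry means HL < DE and zero means HL = DE, just like `cp` on 8-bit registers, and HL is left unchanged. The names `limit`, `hl_below` and `hl_equal` are made up for illustration.

```z80
; Sketch only: branch on an unsigned 16-bit compare of HL against DE.
; limit is a hypothetical constant; hl_below and hl_equal are hypothetical labels.
 ld de,limit        ; value to compare against
 or a               ; clear carry so sbc is a plain subtraction
 sbc hl,de          ; z and carry now describe hl - de
 add hl,de          ; restore hl; z is untouched, carry comes out the same
 jr c,hl_below      ; carry set: hl < de
 jr z,hl_equal      ; zero set:  hl = de
 ; execution falls through to here when hl > de
```

The `add hl,de` is what makes this a compare rather than a subtraction: it carries exactly when the `sbc` borrowed, and it never touches the zero flag.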

Code: [Select]
;Galandros optimized routines
;try to beat me... maybe it is possible...

;Displays A register content on screen in decimal ASCII number, using no additional memory
DispA:
ld c,-100
call Na1
ld c,-10
call Na1
ld c,-1
Na1: ld b,'0'-1
Na2: inc b
add a,c ;adds -100/-10/-1; carry stays set while a was still >= 100/10/1
jr c,Na2
sub c ;works as add 100/10/1
push af ;safer than ld c,a
ld a,b ;char is in b
CALL PUTCHAR ;plot a char. Replace with bcall(_PutC) or similar.
pop af ;safer than ld a,c
ret

;Note: the following one is optimized for RPG menus and the like, and it is quite flexible. I am going to use it in Lost Legends I ^^
;I started with one which used additional RAM for temporary storage (made by me, too), and optimized it for size, speed and no extra memory use! ^.^
;the inc's and dec's were tricky to debug -.-", the registers b and c act as counters and flags

;DispHL for games
;input: hl=num, e=row, d=col (they are stored with ld (CurRow),de), c=number of digits to skip
;number of characters displayed: 5 ; example: 65000
;output: hl displayed, with the skipped digits and the leading zeros shown as spaces
DispHL_games:
inc c
ld b,1 ;skip 0 flag
ld (CurRow),de
;Number in hl to decimal ASCII
;Thanks to z80 Bits
;inputs: hl = number to ASCII
;example: hl=300 outputs '  300'
;destroys: af, hl, de used
ld de,-10000
call Num1
ld de,-1000
call Num1
ld de,-100
call Num1
ld e,-10
call Num1
ld e,-1
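Num1:
ld a,'0'-1
Num2: inc a
add hl,de ;adds the negative power of ten; carry stays set while hl was still big enough
jr c,Num2
sbc hl,de ;carry is clear here, so this just undoes the last subtraction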
dec c ;c is skipping
jr nz,skipnum
inc c
djnz notcharnumzero
cp '0'
jr nz,notcharnumzero
inc b
skipnum:
ld a,' '
notcharnumzero:
push bc
call PUTCHAR  ;bcall(_PutC) works, not sure if it preserves bc
pop bc
ret

PUTCHAR:
bcall(_PutC)
ret

;Example usage of DispHL_games to understand what I mean
Test2:
ld hl,60003
ld de,$0101
ld c,0
call DispHL_games
ld hl,60003
ld de,$0102
ld c,1
call DispHL_games
ret

Well, don't try to understand or optimize calc84maniac's ones. j/k, trying to understand them can be hard (tip: keep a good instruction set summary at hand), but it teaches some inner details of z80 asm.
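
For anyone trying to follow reversea: the recurring `xor b` / `and mask` / `xor b` sequence is a bit merge. The result takes its bits from A wherever the mask has a 1 and from B wherever it has a 0, because ((A xor B) and mask) xor B gives back A's bit under a 1 and B's bit under a 0. A small sketch of the idiom on its own (the values and the nibble mask are example values, not taken from the routine):

```z80
; Merge idiom: a = (a and mask) or (b and not mask), with no temporaries.
; Example: keep the high nibble of a, take the low nibble from b.
 ld a,%10100101
 ld b,%00111100
 xor b              ; a = %10011001 (the bits where a and b differ)
 and %11110000      ; a = %10010000 (keep the differences only under the mask)
 xor b              ; a = %10101100 (high nibble from a, low nibble from b)
```

In reversea the A side is a rotated copy of the byte that was saved in B, so each merge moves a group of bits to its mirrored position.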
Hobbing in calculator projects.

#### Quigibo

• The Executioner
• CoT Emeritus
• LV11 Super Veteran (Next: 3000)
• Posts: 2031
• Rating: +1075/-24
##### Re: ASM Optimized routines
« Reply #1 on: February 28, 2010, 05:21:57 pm »
Here is a little optimization I use but haven't really seen around.  When you need a direct key press, you have to wait about 7 clock cycles between setting the port and reading it.  Most people just fill in the extra space with a waste instruction like this:

Code: [Select]
ld a,xx
out (1),a
ld a,(de)
in a,(1)
and yy
9 Bytes, 43 T-States.

You can actually use the waste instruction to do something useful.  It gives a slight speed increase.

Code: [Select]
ld a,xx
out (1),a
ld b,yy
in a,(1)
and b
9 Bytes, 40 T-States.
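
To make this concrete, here is a sketch with values filled in, assuming the usual TI-83+/84+ key matrix: group mask $FE selects the arrow keys, bit 0 of that group is Down, and the bits are active low, so a held key reads as 0. `down_pressed` is a made-up label.

```z80
; Sketch: test the Down arrow by direct input (TI-83+/84+ key matrix assumed).
 ld a,$FE           ; select the arrow-key group (active low)
 out (1),a
 ld b,%00000001     ; bit 0 = Down; loading the mask doubles as the port delay
 in a,(1)
 and b              ; z is set when the Down bit reads 0
 jr z,down_pressed  ; 0 means the key is held
```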
« Last Edit: February 28, 2010, 05:23:48 pm by Quigibo »
___Axe_Parser___
Today the calculator, tomorrow the world!

#### calc84maniac

• eZ80 Guru
• Coder Of Tomorrow
• LV11 Super Veteran (Next: 3000)
• Posts: 2897
• Rating: +467/-17
##### Re: ASM Optimized routines
« Reply #2 on: February 28, 2010, 08:12:27 pm »
Small and quick setup for IM 2 (this example sets up the vector table at $9900 and the interrupt jump at $9a9a, but the values can be changed)
Code: [Select]
di
ld a,$99
ld bc,$0100
ld h,a
ld d,a
ld l,c
ld e,b
ld i,a
inc a
ld (hl),a
ldir
ld l,a
ld (hl),$c3
inc l
ld (hl),intvec & $ff
inc l
ld (hl),intvec >> 8
im 2
ei
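
A note on why the table is 257 identical bytes, plus a sketch of what `intvec` could look like. In IM 2 the CPU builds the vector address from I (high byte) and whatever value is on the data bus (low byte). On the calculators that bus value is generally treated as unpredictable, so the two-byte read can start anywhere from $9900 to $99FF. Filling $9900 through $9A00 with $9A makes every possible read return $9A9A, which is where the `jp intvec` is placed. The handler below is only an assumption about typical use: it bumps a counter and then hands off to the OS handler at $0038. `tick_count` is a hypothetical 2-byte variable in free RAM.

```z80
; Sketch of a minimal IM 2 handler that chains to the OS interrupt routine.
; tick_count is a hypothetical 16-bit variable; place it in any free RAM.
intvec:
 push hl
 ld hl,(tick_count)
 inc hl             ; 16-bit inc leaves the flags alone, so af needs no saving
 ld (tick_count),hl
 pop hl
 jp $0038           ; let the OS handler acknowledge the interrupt and return
```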
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

#### Galandros

• LV9 Veteran (Next: 1337)
• Posts: 1140
• Rating: +42/-10
##### Re: ASM Optimized routines
« Reply #3 on: April 24, 2010, 12:12:44 pm »
I found this optimized routine around. It is about as optimized as a z80 string copy can get.
Code: [Select]
;author: calc84maniac, I think
;Copy zero terminated string at HL to DE (the terminating zero is copied too).
;destroys a and bc (ldi decrements bc once per byte copied)
StrCopy:
xor a
docopystr:
cp (hl)
ldi
jr nz,docopystr
ret
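
A usage sketch for StrCopy; `title_text` and `name_buf` are hypothetical labels, and `name_buf` is assumed to point at free RAM with room for the text plus its terminator. It works because `ldi` does not change the zero flag, so the `jr nz` still sees the result of the `cp (hl)` done before the copy.

```z80
; Usage sketch for StrCopy. title_text and name_buf are hypothetical labels.
 ld hl,title_text   ; source: zero terminated string
 ld de,name_buf     ; destination: needs room for the text plus the 0
 call StrCopy
 ; here hl and de both point one past the copied terminator
 ret

title_text:
 .db "Hello",0
```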

These are quite optimized, but it may be possible to optimize them further (speed and size). It is not really needed, though...
They shift a graphics buffer (optimized for 96x64) up or down by the number of pixel rows passed in the A register.
Code: [Select]
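scroll_up:
#ifdef DEBUG
cp 64+1
call nc,ErrorOverFlow ; error if a is 65 or more
#endif
ld l,a
ld e,a
ld h,0
ld d,h
add hl,hl ; hl = a*2
add hl,de ; hl = a*3
add hl,hl ; hl = a*6
add hl,hl ; hl = a*12 (12 bytes per pixel row)

push hl
ld de,768
ex de,hl ; hl = 768, de = 12*a
; carry is never set here if input is correct
; or a
sbc hl,de
ld b,h
ld c,l ; bc=768-12*a
ex de,hl ; hl = 12*a
ld de,plotsscreen
add hl,de ; hl = plotsscreen+12*a
ldir
;blank remaining area
ld h,d
ld l,e
inc de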
ld (hl),$00
pop bc
dec bc ; bc=12*a-1
ldir
ret

;PSEUDO CODE
; ld hl,plotsscreen+12*a
; ld de,plotsscreen
; ld bc,768-12*a
; ldir
; ld h,d
; ld l,e
; ld (hl),$00
; inc de
; ld bc,12*a
; dec bc
; ldir
; ret
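
A usage sketch for the scroll routine, assuming the standard ti83plus.inc names (`plotsscreen` is the 768-byte graph buffer, 12 bytes per 96-pixel row, which is where the 12*a comes from). The value 8 is just an example.

```z80
; Usage sketch: scroll the graph buffer up by 8 pixel rows, then show it.
 ld a,8             ; number of rows, must be from 1 to 63
 call scroll_up
 bcall(_GrBufCpy)   ; copy plotsscreen to the LCD
```

Passing 0 or 64 would make one of the two block copies run with a byte count of 0 or -1, which the z80 treats as a full 64K copy, so the range check matters.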

scroll_down:
#ifdef DEBUG
cp 64+1
call nc,ErrorOverFlow ; error if a is 65 or more
#endif
; a can be from 1 to 63
; a can be multiplied by 4
add a,a
add a,a ; a*4 still fits in 8 bits (at most 252)
ld l,a ; hl = a*4
ld e,a
xor a
ld h,a
ld d,a
add hl,hl ; hl = a*8
add hl,de ; hl = a*12
ld e,a ; de = 0

push hl ; a*12 will be needed later
push hl ; 2 times
ex de,hl
;carry is never set here
; or a
sbc hl,de ; hl= -a*12, de=a*12
ld de,plotsscreen+767
add hl,de ; hl = plotsscreen+767-12*a
pop bc
push hl
ld hl,768+1
;carry always set
; or a
sbc hl,bc
ld b,h
ld c,l
pop hl
lddr
;blank remaining area
ld h,d
ld l,e
ld (hl),$00
dec de
pop bc
dec bc
lddr
ret

; ld hl,plotsscreen+767-12*a
; ld de,plotsscreen+767
; ld bc,768-12*a
; lddr
;                    or
; ld (hl),$00     ;; ld hl,plotsscreen
; ld h,d          ;; ld (hl),$00
; ld l,e          ;; ld de,hl+1
; dec de          ;; ld bc,12*a-1
; ld bc,12*a-1    ;; ldir
; lddr            ;; ret
; ret
« Last Edit: April 24, 2010, 12:15:14 pm by Galandros »
Hobbing in calculator projects.

#### mapar007

• LV7 Elite (Next: 700)
• Posts: 550
• Rating: +28/-5
• The Great Mata Mata
##### Re: ASM Optimized routines
« Reply #4 on: April 25, 2010, 03:58:56 am »
Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k

#### Galandros

• LV9 Veteran (Next: 1337)
• Posts: 1140
• Rating: +42/-10
##### Re: ASM Optimized routines
« Reply #5 on: April 25, 2010, 05:04:47 am »
Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k

Actually I am working on something like that. I am hand writing C functions in z80 assembly just for fun. I will share them when I finish.
After seeing Axe Parser, it seems that it is possible to make a good C compiler for z80. And we have documentation on how to optimize z80 assembly that could be used to build an optimizer; check the WikiTI topic: http://wikiti.brandonw.net/index.php?title=Z80_Optimization.
« Last Edit: April 25, 2010, 05:14:53 am by Galandros »
Hobbing in calculator projects.

#### DJ Omnimaga

• Former TI programmer
• CoT Emeritus
• LV15 Omnimagician (Next: --)
• Posts: 55832
• Rating: +3151/-232
• CodeWalrus founder & retired Omnimaga founder
##### Re: ASM Optimized routines
« Reply #6 on: April 25, 2010, 12:19:53 pm »
Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k

I think I remember this, it was Halifax from the old Omnimaga forums who worked on it, right?
There was a thread about it somewhere
In case you are wondering where I went, I left Omni back in 2015 to form CodeWalrus due to various reasons explained back then, but I stopped calc dev in 2016 and am now mostly active on the CW Discord server at https://discord.gg/cuZcfcF

#### Quigibo

• The Executioner
• CoT Emeritus
• LV11 Super Veteran (Next: 3000)
• Posts: 2031
• Rating: +1075/-24
• I wish real life had a "Save" and "Load" button...
##### Re: ASM Optimized routines
« Reply #7 on: April 29, 2010, 05:59:58 pm »
Quigibo's Challenge!

Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.

• Multiply by 128?
• Signed division by any nontrivial constant, other than 2, including negative numbers?
• Modulus with any constant that is not a power of 2?
I'm rewriting my math engine almost from scratch so I decided I would just optimize everything I could possibly conceive of at the same time.  These are the ones I'm having trouble finding.
___Axe_Parser___
Today the calculator, tomorrow the world!

#### calc84maniac

• eZ80 Guru
• Coder Of Tomorrow
• LV11 Super Veteran (Next: 3000)
• Posts: 2897
• Rating: +467/-17
##### Re: ASM Optimized routines
« Reply #8 on: April 29, 2010, 06:31:16 pm »
Seems pretty impossible to me.
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

#### Quigibo

• The Executioner
• CoT Emeritus
• LV11 Super Veteran (Next: 3000)
• Posts: 2031
• Rating: +1075/-24
• I wish real life had a "Save" and "Load" button...
##### Re: ASM Optimized routines
« Reply #9 on: April 29, 2010, 06:58:39 pm »
Okay, that's good.  I spent hours trying to optimize some of these using all the tricks I know.  That reassures me it was a wild goose chase.
___Axe_Parser___
Today the calculator, tomorrow the world!

#### DJ Omnimaga

• Former TI programmer
• CoT Emeritus
• LV15 Omnimagician (Next: --)
• Posts: 55832
• Rating: +3151/-232
• CodeWalrus founder & retired Omnimaga founder
##### Re: ASM Optimized routines
« Reply #10 on: April 29, 2010, 07:01:08 pm »
Seems pretty impossible to me.

No way! You're calc84god, you can do everything, even the impossible! (see TI-Boy SE/Project M/F-Zero) j/k

I can't wait to see what kind of optimizations there will be in the next versions of Axe
In case you are wondering where I went, I left Omni back in 2015 to form CodeWalrus due to various reasons explained back then, but I stopped calc dev in 2016 and am now mostly active on the CW Discord server at https://discord.gg/cuZcfcF

#### Quigibo

• The Executioner
• CoT Emeritus
• LV11 Super Veteran (Next: 3000)
• Posts: 2031
• Rating: +1075/-24
• I wish real life had a "Save" and "Load" button...
##### Re: ASM Optimized routines
« Reply #11 on: April 29, 2010, 07:34:45 pm »
It's nothing big.  Mostly it just extends multiplication, modulus, and addition to higher powers of 2.  The big optimizations won't come for a long time unfortunately.  Functionality is more important right now.

By the way, is there a better way to display hl at the coordinates (xx,yy) than this?

Code: [Select]
B_CALL(_SetXXXXOP2)
B_CALL(_Op2ToOP1)
ld hl,$yyxx
ld (PenCol),hl
ld a,5
B_CALL(_DispOP1A)

It seems really roundabout to me.  Is there a bcall I don't know about that does this automatically?
___Axe_Parser___
Today the calculator, tomorrow the world!

#### calcdude84se

• Needs Motivation
• LV11 Super Veteran (Next: 3000)
• Posts: 2272
• Rating: +78/-13
• Wondering where their free time went...
##### Re: ASM Optimized routines
« Reply #12 on: April 29, 2010, 07:57:10 pm »
yeah, there's _DispHL
so your code would be:
Code: [Select]
push hl
ld hl,$yyxx
ld (PenCol),hl
pop hl
B_CALL(_DispHL)

Just be aware it's right-justified in 5 spaces. (Since $ffff is 5 decimal digits, 65535)
EDIT: oh, wait, that's pencol? so this code doesn't work then. Oops...
« Last Edit: April 30, 2010, 05:49:37 pm by calcdude84se »
"People think computers will keep them from making mistakes. They're wrong. With computers you make mistakes faster."
I'll put it online when it does something.

#### calc84maniac

• eZ80 Guru
• Coder Of Tomorrow
• LV11 Super Veteran (Next: 3000)
• Posts: 2897
• Rating: +467/-17
##### Re: ASM Optimized routines
« Reply #13 on: April 29, 2010, 10:27:56 pm »
He's talking about graph screen display.
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

#### Galandros

• LV9 Veteran (Next: 1337)
• Posts: 1140
• Rating: +42/-10
##### Re: ASM Optimized routines
« Reply #14 on: April 30, 2010, 09:21:30 am »
Quigibo's Challenge!

Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.

• Multiply by 128?
• Signed division by any nontrivial constant, other than 2, including negative numbers?
• Modulus with any constant that is not a power of 2?
Challenge accepted.

Answer to the multiplication by 128 in 6 bytes:

I started by coding a routine that multiplies A by 128:
Spoiler For Spoiler:
; The old trick to multiply by 256, by moving the low byte to high byte
ld h,a
xor a   ; resets carry
rr h     ; divide h by 2
rra      ; and pass bit 0 to a
ld l,a   ; store to l
; hl is a*128

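After that, I very easily modified it to work on HL, giving (hl*128) mod 2^16. Unsigned version:
Spoiler For Spoiler:
ld h,l
xor a
rr h
rra
ld l,a
; 6 bytes and 24 clocks to multiply hl by 128, not bad O_o
; caveat: bit 0 of the old h (bit 15 of the true result) is dropped, so this is exact only when h is even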

I am pretty sure this routine works, but I have not tested it.
EDIT4: tested with a few values, it works.
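
One thing the few test values may not have covered: the 6-byte version only looks at L, so bit 0 of the original H, which belongs in bit 15 of the product, is lost. HL = $0100 should give $8000 but gives $0000; any input below 256 is fine. If the full (hl*128) mod 2^16 is needed, here is a sketch that keeps that bit. It costs two more bytes, so it does not meet the 6-byte challenge.

```z80
; hl = hl * 128 (mod 65536), destroys a. 8 bytes, 32 clocks.
 xor a              ; a = 0, carry = 0
 srl h              ; carry = bit 0 of the old h
 ld h,l             ; h = old l (ld leaves the flags alone)
 rr h               ; h = that bit followed by l7..l1, carry = l0
 rra                ; a = l0 in bit 7, the rest 0
 ld l,a
```

Checking by hand: $0100 gives $8000 and $0003 gives $0180 (384). Since the low 16 bits of a two's complement product are the same for signed and unsigned inputs, the same sequence serves for both cases.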

EDIT3:
Multiply hl by 128, now signed. If I am right, to do signed, you only need to preserve the bit 7? If that's so:
Spoiler For Spoiler:
ld h,l
xor a
sra h
rra
ld l,a
; 6 bytes, 24 clocks, too

Now I will think about the others when I have more free time. Fun, fun, fun.