ASM Optimized routines
19 June, 2013, 10:45:55
Galandros
There are some cool optimized routines around. calc84maniac is probably the z80 record holder, at least on the calculator z80 forums.

On to the code:
;calc84maniac
;Compare HL with DE: sets Z if HL=DE and C if HL<DE (unsigned), like cp does for A.
cpHLDE:
 or a        ;clear carry
 sbc hl,de
 add hl,de   ;restore HL; add hl,de keeps Z and restores the same C
 ret
;Important note: the code is only 4 bytes and a call is 3 bytes, so just inline it as a macro:
;SPASM, TASM and BRASS compatible, I guess
#define cp_HLDE  or a \ sbc hl,de \ add hl,de

;- Reverse a
;input: Byte in A
;output: Reversed byte in A
;destroys B
;Clock cycles: 66 (not counting the ret)
;Bytes: 18
;author: calc84maniac
reversea:
 ld b,a
 rrca
 rrca
 xor b
 and %10101010
 xor b
 ld b,a
 rrca
 rrca
 rrca
 rrca
 xor b
 and %01100110
 xor b
 rrca
 ret

;reverse hl
;curiosity: an easy port of the common reverse-A loop is more efficient here than tricky stuff
;calc84maniac
;28 bytes and 108 clock cycles (not counting the ret)
;out: H = old L reversed, L = old H reversed
 ld a,l
 rla        ;bit 7 of A -> carry
 rr h       ;carry -> bit 7 of H, bit 0 of H -> carry (shifted into A by the next rla)
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla        ;A = old H reversed
 ld l,a
 ret

;calc84maniac
;in: a = ABCDEFGH
;out: hl = AABBCCDDEEFFGGHH (every bit of A doubled)
 rrca
 rra
 rra
 ld l,a
 rra
 sra l
 rla
 rr l
 sra l
 rra
 rr l
 sra l      ;L = EEFFGGHH
 rrca
 rra
 rra
 ld h,a
 rra
 sra h
 rla
 rr h
 sra h
 rra
 rr h
 sra h      ;H = AABBCCDD
 ret
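One way to convince yourself these bit tricks work is to model them off-calculator. The sketch below is a Python model written for that purpose (not z80 code, and not from the thread): each helper mimics one z80 rotate/shift on 8-bit integers with a 0/1 carry, and reverse_a and spread follow reversea and the ABCDEFGH -> AABBCCDDEEFFGGHH routine instruction by instruction. All function names are hypothetical, and only the carry flag is modelled.

```python
# Python model of calc84maniac's reversea and bit-doubling routines.
# Registers are ints in 0..255; the carry flag is an int 0 or 1.
# Function names are hypothetical; only carry is modelled, not S/Z/P/H/N.

def rrca(a):
    """RRCA: rotate A right; old bit 0 goes to bit 7 and to carry."""
    return (a >> 1) | ((a & 1) << 7), a & 1

def rra(r, c):
    """RRA / RR r: rotate right through carry."""
    return (r >> 1) | (c << 7), r & 1

def rla(a, c):
    """RLA: rotate left through carry."""
    return ((a << 1) & 0xFF) | c, a >> 7

def sra(r):
    """SRA r: shift right, keeping bit 7."""
    return (r >> 1) | (r & 0x80), r & 1

def reverse_a(a):
    b = a                                 # ld b,a
    a, _ = rrca(a)                        # rrca
    a, _ = rrca(a)                        # rrca
    a = ((a ^ b) & 0b10101010) ^ b        # xor b / and %10101010 / xor b
    b = a                                 # ld b,a
    for _ in range(4):                    # rrca x4
        a, _ = rrca(a)
    a = ((a ^ b) & 0b01100110) ^ b        # xor b / and %01100110 / xor b
    a, _ = rrca(a)                        # rrca
    return a

def _spread_half(a, c):
    """One 12-instruction half of the doubling routine; returns (reg, a, c)."""
    a, c = rrca(a)        # rrca (ignores the incoming carry)
    a, c = rra(a, c)      # rra
    a, c = rra(a, c)      # rra
    r = a                 # ld l,a  (ld h,a in the second half)
    a, c = rra(a, c)      # rra
    r, c = sra(r)         # sra r
    a, c = rla(a, c)      # rla
    r, c = rra(r, c)      # rr r
    r, c = sra(r)         # sra r
    a, c = rra(a, c)      # rra
    r, c = rra(r, c)      # rr r
    r, c = sra(r)         # sra r
    return r, a, c

def spread(a):
    """in: a = ABCDEFGH, out: hl = AABBCCDDEEFFGGHH."""
    l, a, c = _spread_half(a, 0)
    h, a, c = _spread_half(a, c)
    return (h << 8) | l

def spread_reference(a):
    """Straightforward version: bit i of a goes to bits 2i and 2i+1."""
    out = 0
    for i in range(8):
        if (a >> i) & 1:
            out |= 0b11 << (2 * i)
    return out

for x in range(256):
    assert reverse_a(x) == int(f"{x:08b}"[::-1], 2)
    assert spread(x) == spread_reference(x)
```

The two halves of the doubling routine are the same 12 instructions; the first half leaves A and the carry in exactly the state the second half needs, which is why the model can reuse one helper.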

;Galandros optimized routines
;try to beat me... maybe it is possible...

;Displays the A register on screen as a 3-digit decimal ASCII number (leading zeros are printed), using no additional memory
DispA:
 ld c,-100
 call Na1
 ld c,-10
 call Na1
 ld c,-1
Na1:
 ld b,'0'-1
Na2:
 inc b
 add a,c      ;subtract 100/10/1; carry is set while A was >= 100/10/1
 jr c,Na2
 sub c        ;works as add 100/10/1 (undoes the last subtraction)
 push af      ;safer than ld c,a
 ld a,b       ;char is in b
 call PUTCHAR ;plot a char. Replace with bcall(_PutC) or similar.
 pop af       ;safer than ld a,c
 ret

;Note: the following one is optimized for RPG menus and the like; it is quite flexible. I am going to use it in Lost Legends I ^^
;I started with one which used additional RAM for temporary storage (made by me, too), and optimized it for size, speed and no extra memory use! ^.^
;the inc's and dec's were tricky to debug -.-", the registers b and c act as counters and flags
;DispHL for games
;input: hl = num, e = row, d = col (ld (CurRow),de stores e in CurRow and d in CurCol),
;       c = number of leading digits to skip
;number of characters displayed: 5 ; example: 65000
;output: hl displayed; skipped digits and leading zeros are shown as spaces (so 0 shows as 5 spaces)
DispHL_games:
 inc c
 ld b,1           ;skip 0 flag (leading-zero suppression)
 ld (CurRow),de
;Number in hl to decimal ASCII
;Thanks to z80 Bits
;inputs: hl = number to ASCII
;example: hl=300 outputs '  300'
;destroys: af, bc, de, hl
 ld de,-10000
 call Num1
 ld de,-1000
 call Num1
 ld de,-100
 call Num1
 ld e,-10         ;d is already $ff
 call Num1
 ld e,-1
Num1:
 ld a,'0'-1
Num2:
 inc a
 add hl,de
 jr c,Num2
 sbc hl,de        ;carry is clear here, so this undoes the last add
 dec c            ;c is skipping
 jr nz,skipnum
 inc c
 djnz notcharnumzero
 cp '0'
 jr nz,notcharnumzero
leadingzero:
 inc b
skipnum:
 ld a,' '
notcharnumzero:
 push bc
 call PUTCHAR     ;bcall(_PutC) works, not sure if it preserves bc
 pop bc
 ret
PUTCHAR:
 bcall(_PutC)
 ret

;Example usage of DispHL_games to understand what I mean
Test2:
 ld hl,60003
 ld de,$0101      ;row 1, col 1
 ld c,0
 call DispHL_games
 ld hl,60003
 ld de,$0102      ;row 2, col 1
 ld c,1
 call DispHL_games
 ret
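The digit loops in both routines are repeated subtraction done with add and a negative constant: with c = -100 (156 as an 8-bit value), add a,c sets carry exactly when A >= 100. The rough Python model below is not from the thread; it tracks only the carry flag and collects characters in a string instead of calling PUTCHAR, which may make DispHL_games's b/c bookkeeping easier to follow. The names disp_a and disp_hl_games are hypothetical.

```python
# Python model of DispA and DispHL_games (hypothetical names): characters
# are collected in a string instead of being sent to PUTCHAR.

def disp_a(a):
    """DispA: three digits of A, leading zeros included."""
    out = ""
    for step in (100, 10, 1):
        c = (-step) & 0xFF              # ld c,-100 / -10 / -1
        b = ord("0") - 1                # ld b,'0'-1
        while True:
            b += 1                      # inc b
            total = a + c               # add a,c
            a, carry = total & 0xFF, total > 0xFF
            if not carry:               # jr c,Na2
                break
        a = (a - c) & 0xFF              # sub c: undo the last add
        out += chr(b)
    return out

def disp_hl_games(hl, skip):
    """DispHL_games: 5 characters; skipped digits and leading zeros -> ' '."""
    out = ""
    c = (skip + 1) & 0xFF               # inc c
    b = 1                               # ld b,1 (leading-zero flag)
    for step in (10000, 1000, 100, 10, 1):
        digit, hl = divmod(hl, step)    # the add hl,de / jr c loop
        ch = chr(ord("0") + digit)
        c = (c - 1) & 0xFF              # dec c
        if c != 0:                      # jr nz,skipnum
            ch = " "
        else:
            c += 1                      # inc c: stays 1 from now on
            b = (b - 1) & 0xFF          # djnz jumps to notcharnumzero unless b hits 0
            if b == 0 and ch == "0":    # fall-through case: cp '0'
                b += 1                  # leadingzero: inc b
                ch = " "                # skipnum: ld a,' '
        out += ch
    return out

print(repr(disp_a(7)), repr(disp_hl_games(60003, 0)), repr(disp_hl_games(60003, 1)))
# prints '007' '60003' '    3'
```

One thing the model makes visible: after the first nonzero digit, b is nonzero, so djnz always jumps and later zeros are printed; with hl = 0, however, every digit counts as a leading zero and only spaces are printed.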

Well, don't try to understand or optimize calc84maniac's ones. j/k. Trying to understand them can be hard (tip: keep a good instruction set summary at hand), but it teaches some inner details of z80 asm.
Hobbing in calculator projects.
Quigibo
The Executioner
« Reply #1 on: 01 March, 2010, 00:21:57 »

Here is a little optimization I use but haven't really seen around.  When you need a direct key press, you have to wait about 7 clock cycles between setting the port and reading it.  Most people just fill in the extra space with a waste instruction like this:

 ld a,xx
 out (1),a
 ld a,(de)
 in a,(1)
 and yy
9 Bytes, 43 T-States.

You can actually use the wasted instruction to do something useful: load the mask into B during the delay, so the final and becomes the 1-byte, 4-cycle and b. It gives a slight speed increase.

 ld a,xx
 out (1),a
 ld b,yy
 in a,(1)
 and b
9 Bytes, 40 T-States.
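The saving can be checked by summing the standard Z80 size and timing of each instruction in both sequences. A tiny Python tally, with the xx/yy operands left symbolic since only sizes and timings matter here:

```python
# (mnemonic, bytes, T-states) using the standard Z80 instruction timings
plain = [("ld a,xx", 2, 7), ("out (1),a", 2, 11), ("ld a,(de)", 1, 7),
         ("in a,(1)", 2, 11), ("and yy", 2, 7)]
useful = [("ld a,xx", 2, 7), ("out (1),a", 2, 11), ("ld b,yy", 2, 7),
          ("in a,(1)", 2, 11), ("and b", 1, 4)]

def totals(seq):
    """Return (total bytes, total T-states) of an instruction list."""
    return sum(size for _, size, _ in seq), sum(t for _, _, t in seq)

print(totals(plain), totals(useful))   # prints (9, 43) (9, 40)
```

The filler ld a,(de) (1 byte, 7 T-states) becomes ld b,yy (2 bytes, same 7 T-states), and that extra byte is paid back because and b is 1 byte and 3 T-states cheaper than and yy.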
___Axe_Parser___
Today the calculator, tomorrow the world!
calc84maniac
Epic z80 roflpwner
Coder Of Tomorrow
« Reply #2 on: 01 March, 2010, 03:12:27 »

Small and quick setup for IM 2 (this example sets up the vector table at $9900 and the interrupt jump at $9a9a, but the values can be changed)
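 di
 ld a,$99
 ld bc,$0100
 ld h,a
 ld d,a
 ld l,c
 ld e,b          ;hl = $9900, de = $9901
 ld i,a          ;vector table at $99xx
 inc a
 ld (hl),a       ;first table byte = $9a
 ldir            ;fill $9900-$9a00 (257 bytes) with $9a
 ld l,a          ;hl was $9a00, now $9a9a
 ld (hl),$c3     ;jp opcode
 inc l
 ld (hl),intvec & $ff
 inc l
 ld (hl),intvec >> 8
 im 2
 ei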
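Here is a Python model of the memory this setup leaves behind, written for illustration only (intvec is given the arbitrary example value $abcd, and the function names are hypothetical). In IM 2 the CPU forms a pointer from I and the byte on the data bus, then jumps to the word stored there. Since all 257 table bytes are $9a, every possible bus byte (including $ff, which reads $99ff and $9a00) yields $9a9a, where the jp intvec sits. The usual reason for covering every bus value is that the byte on the bus during the interrupt acknowledge cannot be relied on; treat that as background, since the post does not say so.

```python
def im2_setup(intvec):
    """Model of the IM 2 setup above; returns (memory, I register)."""
    mem = bytearray(0x10000)
    a = 0x99                       # ld a,$99
    hl, de = 0x9900, 0x9901        # ld h,a / ld d,a / ld l,c / ld e,b (bc=$0100)
    i = a                          # ld i,a
    a += 1                         # inc a -> $9a
    mem[hl] = a                    # ld (hl),a
    bc = 0x0100
    while bc:                      # ldir: copy (hl) to (de), bc times
        mem[de] = mem[hl]
        hl, de, bc = hl + 1, de + 1, bc - 1
    hl = (hl & 0xFF00) | a         # ld l,a  (hl was $9a00, now $9a9a)
    mem[hl] = 0xC3                 # ld (hl),$c3  (jp opcode)
    mem[hl + 1] = intvec & 0xFF    # inc l / ld (hl),intvec & $ff
    mem[hl + 2] = intvec >> 8      # inc l / ld (hl),intvec >> 8
    return mem, i

def im2_handler_address(mem, i, bus):
    """Address an IM 2 interrupt jumps to for a given data-bus byte."""
    ptr = (i << 8) | bus
    return mem[ptr] | (mem[ptr + 1] << 8)

mem, i = im2_setup(0xABCD)
assert all(im2_handler_address(mem, i, bus) == 0x9A9A for bus in range(256))
assert mem[0x9A9A:0x9A9D] == bytes([0xC3, 0xCD, 0xAB])
```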
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman
Galandros
« Reply #3 on: 24 April, 2010, 18:12:44 »

I found this optimized routine around. It is about as optimized as a z80 string copy can get.
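;author: calc84maniac, I think
;Copy zero terminated string at HL to DE (the terminator is copied too).
;destroys: af, bc; HL and DE end up just past the terminator
StrCopy:
 xor a
docopystr:
 cp (hl)         ;Z set when (hl) is the terminator
 ldi             ;copy one byte; LDI does not change Z
 jr nz,docopystr
 ret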
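The loop is this short because LDI leaves the Z flag alone, so the flag set by cp (hl) (with A = 0) is still there when jr nz tests it, after the byte has already been copied. A small Python model (hypothetical name str_copy, memory as a bytearray, starting BC chosen arbitrarily) shows the consequences: the terminator is copied too, HL and DE end one past it, and BC is decremented once per byte.

```python
def str_copy(mem, hl, de, bc=0):
    """Model of StrCopy; returns the final (hl, de, bc)."""
    a = 0                          # xor a
    while True:
        zero = mem[hl] == a        # cp (hl): Z set on the terminator
        mem[de] = mem[hl]          # ldi: (de) <- (hl) ...
        hl, de = hl + 1, de + 1    # ... hl++, de++ ...
        bc = (bc - 1) & 0xFFFF     # ... bc--, while Z is left alone
        if zero:                   # jr nz,docopystr falls through on Z
            return hl, de, bc

mem = bytearray(b"HI\0" + bytes(13))
print(str_copy(mem, 0, 8), bytes(mem[8:11]))   # prints (3, 11, 65533) b'HI\x00'
```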

These are quite optimized, though it may be possible to optimize them further (for speed and size). But it is not needed...
They scroll a 96x64 graphics buffer (plotsscreen, 768 bytes) up or down by the number of pixels passed in the A register (1 to 63).
scroll_up:
#ifdef DEBUG
 or a
 call z,ErrorOverFlow  ;a=0 would make bc=12*a-1 wrap to $ffff
 cp 64
 call nc,ErrorOverFlow ;valid range is 1..63
#endif
 add a,a
 add a,a
 ld l,a
 ld e,a
 ld h,0
 ld d,h
 add hl,hl
 add hl,de        ;hl=a*12
 push hl
 ld de,768
 ex de,hl
;carry is never set here if input is correct
; or a
 sbc hl,de
 ld b,h
 ld c,l           ;bc=768-12*a
 ex de,hl
 ld de,plotsscreen
 add hl,de
 ldir
;blank remaining area
 ld h,d
 ld l,e
 inc de
 ld (hl),$00
 pop bc
 dec bc           ;bc=12*a-1
 ldir
 ret
;PSEUDO CODE
; ld hl,plotsscreen+12*a
; ld de,plotsscreen
; ld bc,768-12*a
; ldir
; ld h,d
; ld l,e
; ld (hl),$00
; inc de
; ld bc,12*a
; dec bc
; ldir
; ret

scroll_down:
#ifdef DEBUG
 or a
 call z,ErrorOverFlow  ;a=0 would make bc=12*a-1 wrap to $ffff
 cp 64
 call nc,ErrorOverFlow ;valid range is 1..63
#endif
;a can be from 1 to 63
;so a can be multiplied by 4 without overflowing 8 bits
 add a,a
 add a,a          ;a*4
 ld l,a           ;hl = a*4
 ld e,a
 xor a
 ld h,a
 ld d,a
 add hl,hl        ;hl = a*8
 add hl,de        ;hl = a*12
 ld e,a           ;de = 0
 push hl          ;a*12 will be needed later
 push hl          ;2 times
 ex de,hl
;carry is never set here
; or a
 sbc hl,de        ;hl = -a*12, de = a*12
 ld de,plotsscreen+767
 add hl,de        ;hl = plotsscreen+767-12*a
 pop bc
 push hl
 ld hl,768+1
;carry always set
; or a
 sbc hl,bc        ;hl = 769-12*a-1 = 768-12*a
 ld b,h
 ld c,l
 pop hl
 lddr
;blank remaining area
 ld h,d
 ld l,e
 ld (hl),$00
 dec de
 pop bc
 dec bc           ;bc = 12*a-1
 lddr
 ret
;PSEUDO CODE
; ld hl,plotsscreen+767-12*a
; ld de,plotsscreen+767
; ld bc,768-12*a
; lddr
; ld h,d
; ld l,e
; ld (hl),$00
; dec de
; ld bc,12*a-1
; lddr
; ret
;or blank the top area forwards instead:
; ld hl,plotsscreen
; ld (hl),$00
; ld de,plotsscreen+1
; ld bc,12*a-1
; ldir
; ret
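Stripped of the register juggling, both routines move 768-12*a bytes by 12*a (12 bytes per 96-pixel row) and clear the 12*a bytes left behind. The sketch below is a Python model of that behaviour on a bytearray standing in for plotsscreen; the function names are hypothetical. It also shows why a must stay in 1..63: at a=64 the copy count 768-12*a is 0, which LDIR/LDDR treat as 65536, and at a=0 the blanking count 12*a-1 wraps to $ffff. The z80 version of scroll_down needs LDDR because source and destination overlap and the copy must start from the end; Python slicing copies the source first, so the model does not have to care.

```python
ROW_BYTES = 12                 # 96 pixels / 8 bits per byte
BUF_BYTES = 768                # 12 bytes * 64 rows

def times12(a):
    """a*12 the way the routines build it: a*4, then a*8 + a*4."""
    x2 = (a + a) & 0xFF        # add a,a
    x4 = (x2 + x2) & 0xFF      # add a,a  (no overflow for a <= 63)
    return (x4 + x4) + x4      # add hl,hl / add hl,de (16-bit)

def scroll_up(buf, a):
    assert 1 <= a <= 63, "a must be 1..63"
    n = times12(a)
    buf[0:BUF_BYTES - n] = buf[n:BUF_BYTES]   # ldir from plotsscreen+12*a
    buf[BUF_BYTES - n:] = bytes(n)            # blank the bottom 12*a bytes

def scroll_down(buf, a):
    assert 1 <= a <= 63, "a must be 1..63"
    n = times12(a)
    buf[n:BUF_BYTES] = buf[0:BUF_BYTES - n]   # lddr from the end of the buffer
    buf[0:n] = bytes(n)                       # blank the top 12*a bytes

screen = bytearray(i % 251 for i in range(BUF_BYTES))
up = bytearray(screen)
scroll_up(up, 2)
down = bytearray(screen)
scroll_down(down, 2)
assert up[:744] == screen[24:] and up[744:] == bytes(24)
assert down[24:] == screen[:744] and down[:24] == bytes(24)
```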
mapar007
« Reply #4 on: 25 April, 2010, 09:58:56 »

Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
Galandros
« Reply #5 on: 25 April, 2010, 11:04:47 »

Quote from: mapar007
Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
Actually I am working on something like that. I am hand-writing C functions in z80 assembly just for fun. I will share them when I finish.
After seeing Axe Parser, it seems possible to make a good C compiler for the z80. And we already have documentation on optimizing z80 assembly that could guide an optimizer; check the WikiTI topic: http://wikiti.brandonw.net/index.php?title=Z80_Optimization.
DJ Omnimaga
Retired Omnimaga founder (Site issues must be PM'ed to Netham45, Eeems, Shmibs, Deep Thought and AngelFish, not me.)
« Reply #6 on: 25 April, 2010, 18:19:53 »

Quote from: mapar007
Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
I think I remember this; it was Halifax from the old Omnimaga forums who worked on it, right? There was a thread about it somewhere.
Retired 83+ coder, Omnimaga/TIMGUL founder. Now doing power metal music (formerly did electronica)

Quigibo
« Reply #7 on: 29 April, 2010, 23:59:58 »

Quigibo's Challenge!

Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.

• Multiply by 128?
• Signed division by any nontrivial constant, other than 2, including negative numbers?
• Modulus with any constant that is not a power of 2?

I'm rewriting my math engine almost from scratch so I decided I would just optimize everything I could possibly conceive of at the same time.  These are the ones I'm having trouble finding.
calc84maniac
« Reply #8 on: 30 April, 2010, 00:31:16 »

Seems pretty impossible to me.
Quigibo
« Reply #9 on: 30 April, 2010, 00:58:39 »

Okay, that's good.  I spent hours trying to optimize some of these using all the tricks I know.  That reassures me it was a wild goose chase.
DJ Omnimaga
Editor
« Reply #10 on: 30 April, 2010, 01:01:08 »

Quote from: calc84maniac
Seems pretty impossible to me.

No way!

You're calc84god, you can do everything, even the impossible! (see TI-Boy SE/Project M/F-Zero)

j/k I can't wait to see what kind of optimizations there will be in the next versions of Axe
Quigibo
« Reply #11 on: 30 April, 2010, 01:34:45 »

It's nothing big. Mostly it just extends multiplication, modulus, and addition to higher powers of 2. The big optimizations won't come for a long time, unfortunately. Functionality is more important right now.

By the way, is there a better way to display hl at the coordinates (xx,yy) than this?
 B_CALL(_SetXXXXOP2)
 B_CALL(_Op2ToOP1)
 ld hl,$yyxx
 ld (PenCol),hl
 ld a,5
 B_CALL(_DispOP1A)

It seems really roundabout to me. Is there a bcall I don't know about that does this automatically?
calcdude84se
Needs Motivation
Members
« Reply #12 on: 30 April, 2010, 01:57:10 »

yeah, there's _DispHL
so your code would be:
 push hl
 ld hl,$yyxx
 ld (PenCol),hl
 pop hl
 B_CALL(_DispHL)
Just be aware that it's right-justified in 5 characters (since $ffff = 65535 has 5 decimal digits).
EDIT: oh, wait, that's PenCol? Then this code doesn't work, since _DispHL writes to the home screen at CurRow/CurCol. Oops...
"People think computers will keep them from making mistakes. They're wrong. With computers you make mistakes faster."
Bug me about PartesOS. I might just need reminding.
calc84maniac
« Reply #13 on: 30 April, 2010, 04:27:56 »

He's talking about graph screen display.
Galandros
« Reply #14 on: 30 April, 2010, 15:21:30 »

Quote from: Quigibo
Quigibo's Challenge!

Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.

• Multiply by 128?
• Signed division by any nontrivial constant, other than 2, including negative numbers?
• Modulus with any constant that is not a power of 2?
Challenge accepted.

Answer to the multiplication by 128 in 6 bytes:

I started by coding a routine that multiplies A by 128:
; The old trick to multiply by 256, by moving the low byte to high byte
ld h,a
xor a   ; resets carry
rr h     ; divide h by 2
rra      ; and pass bit 0 to a
ld l,a   ; store to l
; hl is a*128
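Why the A*128 version works: ld h,a alone would give a*256 once L is 0, and halving that 16-bit value moves H down one bit, with the bit that falls out of H landing in bit 7 of L. A Python model of the five instructions (hypothetical function name), checked against plain multiplication for every A:

```python
def a_times_128(a):
    h = a                                            # ld h,a
    acc, carry = 0, 0                                # xor a (A = 0, carry reset)
    h, carry = (carry << 7) | (h >> 1), h & 1        # rr h
    acc, carry = (carry << 7) | (acc >> 1), acc & 1  # rra: old bit 0 of H -> bit 7
    l = acc                                          # ld l,a
    return (h << 8) | l

assert all(a_times_128(a) == a * 128 for a in range(256))
```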

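After that, I easily modified it to compute (hl*128) mod 2^16. Unsigned version:
ld h,l
xor a
rr h
rra
ld l,a
; 6 bytes and 24 clocks to multiply hl by 128, not bad O_o
; caution: ld h,l throws away the original H, whose bit 0 belongs in bit 15 of the
; result, so this is only exact when H is even (for example, when hl < 256)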

I am very sure this routine works, but I have not tested it.
EDIT4: tested with a few values, it works.

EDIT3:
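Multiply hl by 128, now signed. If I am right, to make it signed you only need to preserve bit 7? If so:
ld h,l
xor a
sra h
rra
ld l,a
; 6 bytes, 24 clocks, too
; caution: truncated to 16 bits, signed and unsigned multiplication give the same bits,
; so keeping bit 7 is not what makes it signed; this is exact when bit 8 of hl equals
; bit 7 (which covers hl = -128..127), but e.g. hl = 128 gives $c000 instead of $4000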
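Since the challenge asks for HL in and HL out, it is worth checking both 6-byte versions against (hl*128) mod 2^16 over all 65536 inputs, which the Python model below does. It also includes an 8-byte, 32-clock sequence, xor a / srl h / rr l / rra / ld h,l / ld l,a, that is not from the thread: it is offered only as a sketch of a version that keeps bit 0 of H, and the model checks that it is exact for every input. Because a 16-bit truncated product has the same bits whether the input is read as signed or unsigned, that one sequence would cover both cases. Function names are hypothetical.

```python
def exact(hl):
    """(hl*128) mod 2^16; the same bits for signed and unsigned hl."""
    return (hl * 128) & 0xFFFF

def mul128_unsigned_6b(hl):
    l = hl & 0xFF
    h = l                                         # ld h,l (old H is lost)
    h, carry = h >> 1, h & 1                      # xor a / rr h
    return (h << 8) | (carry << 7)                # rra / ld l,a

def mul128_signed_6b(hl):
    l = hl & 0xFF
    h = l                                         # ld h,l
    h, carry = (h >> 1) | (h & 0x80), h & 1       # xor a / sra h
    return (h << 8) | (carry << 7)                # rra / ld l,a

def mul128_8b(hl):
    h, l = hl >> 8, hl & 0xFF
    h, carry = h >> 1, h & 1                      # xor a / srl h
    l, carry = (carry << 7) | (l >> 1), l & 1     # rr l
    a = carry << 7                                # rra (A was 0)
    return (l << 8) | a                           # ld h,l / ld l,a

for hl in range(0x10000):
    h_even = ((hl >> 8) & 1) == 0
    assert (mul128_unsigned_6b(hl) == exact(hl)) == h_even
    assert mul128_8b(hl) == exact(hl)
for v in range(-128, 128):                        # signed inputs that fit in L
    assert mul128_signed_6b(v & 0xFFFF) == exact(v & 0xFFFF)
assert mul128_signed_6b(128) == 0xC000 != exact(128)
```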

Now I will think about the others when I have more free time. Fun, fun, fun.