ASM Optimized routines
Galandros
LV9 Veteran (Next: 1337)

Offline

Last Login: 27 March, 2011, 01:13:41
Date Registered: 18 October, 2008, 14:21:07
Posts: 1150

Topic starter
Total Post Ratings: +32

 « on: 28 February, 2010, 14:27:53 » +2

There are some cool optimized routines around. calc84maniac is probably the record holder in z80, at least on the calculator z80 forums.

On to the code:
;calc84maniac
cpHLDE:
 or a
 sbc hl,de
 add hl,de
 ret
;Important note: the inline code is 4 bytes and a call is 3 bytes, so just use a macro and skip the call/ret overhead:
;SPASM, TASM and BRASS compatible, I guess
#define cp_HLDE  or a \ sbc hl,de \ add hl,de

;- Reverse a
;input: Byte in A
;output: Reversed byte in A
;destroys B
;Clock cycles: 66
;Bytes: 18
;author: calc84maniac
reversea:
 ld b,a
 rrca
 rrca
 xor b
 and %10101010
 xor b
 ld b,a
 rrca
 rrca
 rrca
 rrca
 xor b
 and %01100110
 xor b
 rrca
 ret

;reverse hl
;curiosity: an easy port of a common reverse-A routine is more efficient than tricky stuff
;calc84maniac
;28 bytes and 108 cycles (not counting the ret)
 ld a,l
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla
 rr h
 rla ;the ninth rla brings in the last bit of the old h
 ld l,a
 ret

;calc84maniac
;in: a = ABCDEFGH
;out: hl= AABBCCDDEEFFGGHH
 rrca
 rra
 rra
 ld l,a
 rra
 sra l
 rla
 rr l
 sra l
 rra
 rr l
 sra l
 rrca
 rra
 rra
 ld h,a
 rra
 sra h
 rla
 rr h
 sra h
 rra
 rr h
 sra h
 ret
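A small usage sketch for the cp_HLDE macro, in case the flag behaviour is not obvious. The names Score, SCORE_TARGET, ScoreEqual and ScoreBelow are made up for the example; the point is only that the flags come out the same way as after a normal 8-bit cp, with hl left unchanged.

```z80
; assumes the cp_HLDE macro defined above
; Score, SCORE_TARGET, ScoreEqual and ScoreBelow are hypothetical names
 ld hl,(Score)       ; hl = value to test
 ld de,SCORE_TARGET  ; de = value to compare against
 cp_HLDE             ; z if hl = de, c if hl < de (unsigned), hl is restored
 jr z,ScoreEqual
 jr c,ScoreBelow
 ; execution falls through here when hl > de
```

The add hl,de that restores hl does not touch the zero flag, and it produces a carry exactly when the sbc borrowed, so both flags survive the restore.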

;Galandros optimized routines
;try to beat me... maybe it is possible...

;Displays A register content on screen as a decimal ASCII number, using no additional memory
DispA:
 ld c,-100
 call Na1
 ld c,-10
 call Na1
 ld c,-1
Na1:
 ld b,'0'-1
Na2:
 inc b
 add a,c
 jr c,Na2
 sub c ;works as add 100/10/1
 push af ;safer than ld c,a
 ld a,b ;char is in b
 CALL PUTCHAR ;plot a char. Replace with bcall(_PutC) or similar.
 pop af ;safer than ld a,c
 ret

;Note the following one is optimized for RPG menus and the like, it is quite flexible. I am going to use it in Lost Legends I ^^
;I started with one which used additional RAM for temporary storage (made by me, too), and optimized it for size, speed and no extra memory use! ^.^
;the inc's and dec's were tricky to debug -.-", the registers b and c are like counters and flags

;DispHL for games
;input: hl=num, d=col, e=row (ld (CurRow),de puts e in CurRow and d in CurCol), c=number of digits to skip
;number of characters displayed: 5 ; example: 65000
;output: hl displayed, with skipped digits and leading zeros shown as spaces
DispHL_games:
 inc c
 ld b,1 ;skip 0 flag
 ld (CurRow),de
;Number in hl to decimal ASCII
;Thanks to z80 Bits
;inputs: hl = number to ASCII
;example: hl=300 outputs '  300'
;destroys: af, hl, de used
 ld de,-10000
 call Num1
 ld de,-1000
 call Num1
 ld de,-100
 call Num1
 ld e,-10 ;d is still $ff, so de=-10
 call Num1
 ld e,-1
Num1:
 ld a,'0'-1
Num2:
 inc a
 add hl,de
 jr c,Num2
 sbc hl,de ;carry is clear here, so this undoes the last add
 dec c ;c is skipping
 jr nz,skipnum
 inc c
 djnz notcharnumzero
 cp '0'
 jr nz,notcharnumzero
leadingzero:
 inc b
skipnum:
 ld a,' '
notcharnumzero:
 push bc
 call PUTCHAR  ;bcall(_PutC) works, not sure if it preserves bc
 pop bc
 ret
PUTCHAR:
 bcall(_PutC)
 ret

;Example usage of DispHL_games to understand what I mean
Test2:
 ld hl,60003
 ld de,$0101
 ld c,0
 call DispHL_games
 ld hl,60003
 ld de,$0102
 ld c,1
 call DispHL_games
 ret

Well, don't try to understand or optimize calc84maniac's ones. j/k, trying to understand them can be hard (tip: keep a good instruction set summary at hand) but it teaches some inner details of z80 asm.
 Logged

Hobbing in calculator projects.
Quigibo
The Executioner
LV11 Super Veteran (Next: 3000)

Offline

Gender:
Last Login: 31 May, 2013, 10:48:29
Date Registered: 22 January, 2010, 05:02:37
Location: Los Angeles
Posts: 2022

Total Post Ratings: +1019

 « Reply #1 on: 01 March, 2010, 00:21:57 » 0

Here is a little optimization I use but haven't really seen around.  When you need a direct key press, you have to wait about 7 clock cycles between setting the port and reading it.  Most people just fill in the extra space with a waste instruction like this:

 ld a,xx
 out (1),a
 ld a,(de)
 in a,(1)
 and yy
9 Bytes, 43 T-States.

You can actually use the waste instruction to do something useful.  It gives a slight speed increase.

 ld a,xx
 out (1),a
 ld b,yy
 in a,(1)
 and b
9 Bytes, 40 T-States.
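To make xx and yy concrete, here is a sketch that tests the down arrow. It assumes the usual TI-83+ keypad layout, where writing $FE to port 1 selects the arrow group and the down key reads back on bit 0, active low. DownHeld is a made-up label.

```z80
; assumed keypad values: group $FE = arrow keys, bit 0 = down
 ld a,$FE          ; select the arrow-key group
 out (1),a
 ld b,%00000001    ; useful filler: load the key mask while the port settles
 in a,(1)
 and b             ; the bit reads 0 while the key is held down
 jr z,DownHeld     ; DownHeld is a hypothetical label
```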
 « Last Edit: 01 March, 2010, 00:23:48 by Quigibo » Logged

___Axe_Parser___
Today the calculator, tomorrow the world!
calc84maniac
Epic z80 roflpwner
Coder Of Tomorrow
LV11 Super Veteran (Next: 3000)

Offline

Gender:
Date Registered: 28 August, 2008, 05:09:05
Location: Right behind you.
Posts: 2737

Total Post Ratings: +376

 « Reply #2 on: 01 March, 2010, 03:12:27 » 0

Small and quick setup for IM 2 (this example sets up vector table at $9900 and interrupt jump at $9a9a, but values can be changed)
 di
 ld a,$99
 ld bc,$0100
 ld h,a
 ld d,a
 ld l,c
 ld e,b
 ld i,a
 inc a
 ld (hl),a
 ldir
 ld l,a
 ld (hl),$c3
 inc l
 ld (hl),intvec & $ff
 inc l
 ld (hl),intvec >> 8
 im 2
 ei
 Logged

"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman
Galandros

 « Reply #3 on: 24 April, 2010, 18:12:44 » 0

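I found this optimized routine around. It is about as optimized as a z80 string copy can get.
;author: calc84maniac, I think
;Copy zero terminated string at HL to DE.
;the terminating zero is copied too; ldi leaves the z flag alone, so jr still tests the cp
;destroys: a, bc (ldi decrements bc)
StrCopy:
 xor a
docopystr:
 cp (hl)
 ldi
 jr nz,docopystr
 ret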
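A minimal calling sketch for StrCopy. SrcText and DestBuf are hypothetical names; DestBuf is assumed to be a free RAM area with room for the text plus its terminating zero (6 bytes here).

```z80
; SrcText and DestBuf are made-up names for this sketch
 ld hl,SrcText     ; source, zero terminated
 ld de,DestBuf     ; destination, needs at least 6 free bytes here
 call StrCopy      ; on return hl and de point just past the copied zero
 ret

SrcText:
 .db "HELLO",0
```

Since de ends up just past the copied zero, doing dec de before copying a second string would join the two strings.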

These are quite optimized, but it may be possible to optimize them further (speed and size). It is not really needed, though...
They shift a graphics buffer (optimized for 96x64) up or down by the number of pixels passed in the A register.
scroll_up:
#ifdef DEBUG
 cp 64
 call nc,ErrorOverFlow ; a must be from 1 to 63
#endif
 add a,a
 add a,a
 ld l,a
 ld e,a
 ld h,0
 ld d,h
 add hl,hl
 add hl,de ; hl=a*12
 push hl
 ld de,768
 ex de,hl
; carry is never set here if input is correct
; or a
 sbc hl,de
 ld b,h
 ld c,l ; bc=768-12*a
 ex de,hl
 ld de,plotsscreen
 add hl,de
 ldir
;blank remaining area
 ld h,d
 ld l,e
 inc de
 ld (hl),$00
 pop bc
 dec bc ; bc=12*a-1
 ldir
 ret
;PSEUDO CODE
; ld hl,plotsscreen+12*a
; ld de,plotsscreen
; ld bc,768-12*a
; ldir
; ld h,d
; ld l,e
; ld (hl),$00
; inc de
; ld bc,12*a
; dec bc
; ldir
; ret

scroll_down:
#ifdef DEBUG
 cp 64
 call nc,ErrorOverFlow ; a must be from 1 to 63
#endif
; a can be from 1 to 63
; a can be multiplied by 4
 add a,a
 add a,a ; a*4
 ld l,a ; hl = a*4
 ld e,a
 xor a
 ld h,a
 ld d,a
 add hl,hl ; hl = a*8
 add hl,de ; hl = a*12
 ld e,a ; de = 0
 push hl ; a*12 will be needed later
 push hl ; 2 times
 ex de,hl
;carry is never set here
; or a
 sbc hl,de ; hl= -a*12, de=a*12
 ld de,plotsscreen+767
 add hl,de ; hl=plotsscreen+767-12*a
 pop bc
 push hl
 ld hl,768+1
;carry always set
; or a
 sbc hl,bc
 ld b,h
 ld c,l ; bc=768-12*a
 pop hl
 lddr
;blank remaining area
 ld h,d
 ld l,e
 ld (hl),$00
 dec de
 pop bc
 dec bc
 lddr
 ret
;PSEUDO CODE
; ld hl,plotsscreen+767-12*a
; ld de,plotsscreen+767
; ld bc,768-12*a
; lddr                 ; or
; ld (hl),$00 ;        ; ld hl,plotsscreen
; ld h,d ;             ; ld (hl),$00
; ld l,e ;             ; ld de,hl+1
; dec de ;             ; ld bc,12*a-1
; ld bc,12*a-1 ;       ; ldir
; lddr ;               ; ret
; ret
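A short calling sketch, assuming a normal TI-83+ program where plotsscreen is the graph buffer and the OS routine _GrBufCpy is available to copy that buffer to the LCD:

```z80
; scroll the graph buffer up by 8 pixel rows and show the result
 ld a,8             ; number of rows, must be from 1 to 63
 call scroll_up     ; picture moves up 8 rows, the bottom 8 rows are cleared
 bcall(_GrBufCpy)   ; OS routine: copy the graph buffer to the screen
```

Passing 0 or 64 is not safe with these routines: the byte counts handed to ldir/lddr would wrap around, so it is worth guarding the input when it is not a constant.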
 « Last Edit: 24 April, 2010, 18:15:14 by Galandros » Logged

mapar007
LV7 Elite (Next: 700)

Offline

Gender:
Last Login: 21 May, 2013, 17:24:56
Date Registered: 09 October, 2008, 17:38:37
Location: Mechelen, Flanders, Belgium
Posts: 553

Total Post Ratings: +23

 « Reply #4 on: 25 April, 2010, 09:58:56 » 0

Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
 Logged

Galandros

 « Reply #5 on: 25 April, 2010, 11:04:47 » 0

Quote from: mapar007 on 25 April, 2010, 09:58:56
> Very nice! I'll add these to my utils.z80 file that is included in all my app builds.
>
> Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
Actually I am working on something like that. I am hand-writing C functions in z80 assembly just for fun. I will share them when I finish.
After seeing Axe Parser, it seems possible to write a good C compiler for z80. And we have documentation on how to optimize z80 assembly that could be used to build an optimizer, check the WikiTI topic: http://wikiti.brandonw.net/index.php?title=Z80_Optimization.
 « Last Edit: 25 April, 2010, 11:14:53 by Galandros » Logged

DJ Omnimaga
Retired Omnimaga founder (Site issues must be PM'ed to Netham45, Eeems, Shmibs, Deep Thought and AngelFish, not me.)
Editor
LV15 Omnimagician (Next: --)

Online

Gender:
Date Registered: 25 August, 2008, 07:00:21
Posts: 50634

Total Post Ratings: +2637

 « Reply #6 on: 25 April, 2010, 18:19:53 » 0

Quote from: mapar007 on 25 April, 2010, 09:58:56
> Very nice! I'll add these to my utils.z80 file that is included in all my app builds.
>
> Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
I think I remember this, it was Halifax from the old Omnimaga forums who worked on it, right? There was a thread about it somewhere
 Logged

Retired 83+ coder, Omnimaga/TIMGUL founder. Now doing power metal music (formerly did electronica)

Quigibo

 « Reply #7 on: 29 April, 2010, 23:59:58 » 0

Quigibo's Challenge!

Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.

• Multiply by 128?
• Signed division by any nontrivial constant, other than 2, including negative numbers?
• Modulus with any constant that is not a power of 2?

I'm rewriting my math engine almost from scratch so I decided I would just optimize everything I could possibly conceive of at the same time.  These are the ones I'm having trouble finding.
 Logged

calc84maniac

 « Reply #8 on: 30 April, 2010, 00:31:16 » 0

Seems pretty impossible to me.
 Logged

Quigibo

 « Reply #9 on: 30 April, 2010, 00:58:39 » 0

Okay, that's good.  I spent hours trying to optimize some of these using all the tricks I know.  That reassures me it was a wild goose chase.
 Logged

DJ Omnimaga

 « Reply #10 on: 30 April, 2010, 01:01:08 » 0

Quote from: calc84maniac on 30 April, 2010, 00:31:16
> Seems pretty impossible to me.

No way!

You're calc84god, you can do everything, even the impossible! (see TI-Boy SE/Project M/F-Zero)

j/k I can't wait to see what kind of optimizations there will be in the next versions of Axe
 Logged


Quigibo

 « Reply #11 on: 30 April, 2010, 01:34:45 » 0

It's nothing big.  Mostly it just extends multiplication, modulus, and addition to higher powers of 2.  The big optimizations won't come for a long time unfortunately.  Functionality is more important right now.

By the way, is there a better way to display hl at the coordinates (xx,yy) than this?
 B_CALL(_SetXXXXOP2)
 B_CALL(_Op2ToOP1)
 ld hl,$yyxx
 ld (PenCol),hl
 ld a,5
 B_CALL(_DispOP1A)

It seems really roundabout to me.  Is there a bcall I don't know about that does this automatically?
 Logged

calcdude84se
Needs Motivation
Members
LV11 Super Veteran (Next: 3000)

Offline

Gender:
Last Login: 14 May, 2013, 16:12:14
Date Registered: 21 April, 2010, 04:20:59
Posts: 2207

Total Post Ratings: +62

 « Reply #12 on: 30 April, 2010, 01:57:10 » 0

yeah, there's _DispHL
so your code would be:
 push hl
 ld hl,$yyxx
 ld (PenCol),hl
 pop hl
 B_CALL(_DispHL)
Just be aware it's right-justified in 5 spaces. (Since $ffff is 5 decimal digits, 65535)
EDIT: oh, wait, that's pencol? so this code doesn't work then. Oops...
 « Last Edit: 30 April, 2010, 23:49:37 by calcdude84se » Logged

"People think computers will keep them from making mistakes. They're wrong. With computers you make mistakes faster."
Bug me about PartesOS. I might just need reminding.
calc84maniac

 « Reply #13 on: 30 April, 2010, 04:27:56 » 0

He's talking about graph screen display.
 Logged

Galandros

 « Reply #14 on: 30 April, 2010, 15:21:30 » +1

Quote from: Quigibo on 29 April, 2010, 23:59:58
> Quigibo's Challenge!
>
> Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.
>
> • Multiply by 128?
> • Signed division by any nontrivial constant, other than 2, including negative numbers?
> • Modulus with any constant that is not a power of 2?
Challenge accepted.

Answer to the multiplication by 128 in 6 bytes:

I started by coding a routine that multiplies A by 128:
Spoiler for Hidden:
; The old trick to multiply by 256, by moving the low byte to high byte
ld h,a
xor a   ; resets carry
rr h     ; divide h by 2
rra      ; and pass bit 0 to a
ld l,a   ; store to l
; hl is a*128

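After that, I very easily adapted it to take hl as input, aiming for (hl*128) mod 2^16. Unsigned version:
Spoiler for Hidden:
ld h,l
xor a
rr h
rra
ld l,a
; 6 bytes and 24 clocks to multiply hl by 128, not bad O_o
; caveat: ld h,l throws away the old h, but its bit 0 (bit 8 of hl) belongs in bit 15 of the result,
; so the result is only exact when that bit is 0, for example for any hl below 256

I am very sure this routine works but I have not tested it.
EDIT4: tested with a few values, it works.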

EDIT3:
Multiply hl by 128, now signed. If I am right, to make it signed you only need to preserve bit 7? If that's so:
Spoiler for Hidden:
ld h,l
xor a
sra h
rra
ld l,a
; 6 bytes, 24 clocks, too
; sra h copies bit 7 of l into bit 15, so this is exact for hl from -128 to 127

Now I will think about the others when I have more free time. Fun, fun, fun.
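
If hl can be 256 or more, bit 8 of hl has to be carried into bit 15 as well. A possible variant is sketched below. I have only traced it by hand, and at 8 bytes and 32 clocks it no longer meets the 6-byte limit of the challenge.

```z80
; hl = (hl*128) mod 65536 for any hl, destroys a
 xor a     ; a = 0
 srl h     ; carry = old bit 0 of h (bit 8 of hl)
 ld h,l    ; does not touch the flags
 rr h      ; h = bit 8 followed by bits 7..1 of l, carry = bit 0 of l
 rra       ; a = bit 0 of l moved to bit 7
 ld l,a
```

Traced by hand, hl=$0100 gives $8000 and hl=$0003 gives $0180. Two's complement multiplication has the same low 16 bits for signed and unsigned inputs, so the same code should cover signed hl too.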