Omnimaga: The Coders Of Tomorrow
Topic: ASM Optimized routines
Galandros (topic starter)
« on: 28 February, 2010, 14:27:53 »

There are some cool optimized routines around. calc84maniac is probably the record holder in z80, at least on the calculator z80 forums.

On to the code:

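;calc84maniac
;compare HL with DE like cp does: carry set if HL<DE, Z set if HL=DE; HL is preserved
cpHLDE:
 or a        ;clear carry for sbc
 sbc hl,de   ;sets C and Z from HL-DE
 add hl,de   ;restores HL; does not touch Z, and its carry matches the borrow from sbc
 ret
;Important note: the code is 4 bytes and a call is 3 bytes, so just inline it as a macro
;(one byte more per use, but it saves the 27 T-states of call+ret):
;SPASM, TASM and BRASS compatible, I guess
#define cp_HLDE  or a \ sbc hl,de \ add hl,de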
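To see why the macro behaves like a 16-bit cp, here is a small Python model of the three instructions (a sketch; it tracks only the carry and zero flags and ignores the others). Carry ends up set exactly when HL < DE, zero exactly when HL = DE, and HL comes back unchanged.

```python
def cp_hl_de(hl, de):
    """Model 'or a / sbc hl,de / add hl,de' on 16-bit values.

    Returns (hl_after, carry, zero). Only the C and Z flags are modelled.
    """
    carry = 0                            # or a: clears carry
    diff = hl - de - carry               # sbc hl,de
    carry = 1 if diff < 0 else 0         # borrow out of bit 15
    diff &= 0xFFFF
    zero = diff == 0                     # sbc sets Z from the 16-bit result
    total = diff + de                    # add hl,de (does not touch Z)
    carry = 1 if total > 0xFFFF else 0   # carry out of bit 15
    return total & 0xFFFF, bool(carry), zero


for hl, de in [(5, 7), (7, 7), (9, 7)]:  # HL < DE, HL == DE, HL > DE
    print(hl, de, cp_hl_de(hl, de))
```

When HL < DE the subtraction borrows and the addition wraps past $FFFF again, so both instructions agree on the carry; that is why the restoring add hl,de does not spoil the comparison.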

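;- Reverse A
;input: byte in A
;output: reversed byte in A (bit 7 <-> bit 0, bit 6 <-> bit 1, ...)
;destroys: B
;clock cycles: 66 (76 including ret)
;bytes: 18 (including ret)
;author: calc84maniac
;each "xor b \ and mask \ xor b" takes the mask bits from A and the other bits from B
reversea:
ld b,a
rrca
rrca
xor b
and %10101010
xor b
ld b,a
rrca
rrca
rrca
rrca
xor b
and %01100110
xor b
rrca
ret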
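The routine can be checked exhaustively with a Python model that follows it instruction by instruction (a sketch; rrca is modelled as an 8-bit rotate, and the flags are ignored because the routine does not depend on them). It is compared against a naive string-based reversal for all 256 inputs.

```python
def rrca(a):
    """RRCA: rotate an 8-bit value right by one; bit 0 wraps into bit 7."""
    return ((a >> 1) | (a << 7)) & 0xFF


def reverse_a(a):
    """Model of reversea, one line per instruction group."""
    b = a                              # ld b,a
    a = rrca(rrca(a))                  # rrca \ rrca
    a = ((a ^ b) & 0b10101010) ^ b     # xor b \ and %10101010 \ xor b
    b = a                              # ld b,a
    for _ in range(4):                 # rrca x4: swaps the nibbles
        a = rrca(a)
    a = ((a ^ b) & 0b01100110) ^ b     # xor b \ and %01100110 \ xor b
    return rrca(a)                     # final rrca


def reverse_naive(a):
    """Reference: reverse the 8-character binary string."""
    return int(format(a, "08b")[::-1], 2)


print(all(reverse_a(x) == reverse_naive(x) for x in range(256)))
```

The identity ((a xor b) and m) xor b picks bits of a where m is 1 and bits of b where m is 0, which is what lets the routine merge two rotated copies with three cheap instructions.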

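;reverse HL (bit 15 <-> bit 0, bit 14 <-> bit 1, ...)
;curiosity: a straightforward port of the common bit-by-bit reverse-A loop is more efficient than tricky stuff
;calc84maniac
;28 bytes (including ret), 108 T-states (excluding ret)
ld a,l
rla      ;bit 7 of A out to carry, old carry in at bit 0
rr h     ;carry in at bit 7 of H, bit 0 of H out to carry
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h     ;now H = reversed L
rla      ;9th rla: A = reversed H, and the entry carry has been shifted back out
ld l,a
ret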
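A Python model of the rla / rr h chain makes the data flow visible (a sketch that models only A, H, L and the carry). Each rla pushes the next bit of L out through the carry into H and pulls one bit of H into A. After eight pairs plus a ninth rla, H holds the reversed L, A holds the reversed H, and whatever carry was present on entry has left A again, so nothing else is needed before ld l,a.

```python
def rla(a, c):
    """RLA: rotate A left through carry; returns (a, carry)."""
    return ((a << 1) & 0xFF) | c, a >> 7


def rr(r, c):
    """RR r: rotate r right through carry; returns (r, carry)."""
    return (r >> 1) | (c << 7), r & 1


def reverse_hl(hl, carry_in=0):
    """Model of: ld a,l / 8 x (rla / rr h) / rla / ld l,a."""
    h, l = hl >> 8, hl & 0xFF
    a, c = l, carry_in               # ld a,l; carry is whatever it was on entry
    for _ in range(8):
        a, c = rla(a, c)             # next bit of L out through carry...
        h, c = rr(h, c)              # ...into H, while H's low bit goes to carry
    a, c = rla(a, c)                 # 9th rla: last bit of H in, entry carry out
    l = a                            # ld l,a
    return (h << 8) | l


def reverse16(x):
    """Reference: reverse the 16-character binary string."""
    return int(format(x, "016b")[::-1], 2)


print(hex(reverse_hl(0x1234)))
```

Because the entry carry is shifted in by the first rla and out by the ninth, the result does not depend on the flags at entry.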

;calc84maniac
;in: a = ABCDEFGH
;out: hl= AABBCCDDEEFFGGHH
rrca
rra
rra
ld l,a
rra
sra l
rla
rr l
sra l
rra
rr l
sra l
;L = EEFFGGHH is done; A still holds the bits needed to build H = AABBCCDD
rrca
rra
rra
ld h,a
rra
sra h
rla
rr h
sra h
rra
rr h
sra h
ret
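The bit-doubling routine is hard to follow by eye, so here is a Python model that replays it instruction by instruction (a sketch; rrca, rra/rr, rla and sra are modelled on 8-bit values with an explicit carry, and no other flags are tracked). The result is compared against a plain loop that copies bit i of A into bits 2i and 2i+1.

```python
def rrca(a):
    """RRCA: rotate right; bit 0 goes to both bit 7 and the carry."""
    return (a >> 1) | ((a & 1) << 7), a & 1


def rr(r, c):
    """RRA / RR r: rotate right through carry."""
    return (r >> 1) | (c << 7), r & 1


def rla(a, c):
    """RLA: rotate left through carry."""
    return ((a << 1) & 0xFF) | c, a >> 7


def sra(r):
    """SRA r: shift right keeping bit 7; bit 0 goes to the carry."""
    return (r >> 1) | (r & 0x80), r & 1


def double_bits(a):
    """Model of the routine: A = ABCDEFGH -> HL = AABBCCDDEEFFGGHH."""
    a, c = rrca(a)
    a, c = rr(a, c)          # rra
    a, c = rr(a, c)          # rra
    l = a                    # ld l,a
    a, c = rr(a, c)          # rra
    l, c = sra(l)
    a, c = rla(a, c)
    l, c = rr(l, c)
    l, c = sra(l)
    a, c = rr(a, c)          # rra
    l, c = rr(l, c)
    l, c = sra(l)            # L = EEFFGGHH
    a, c = rrca(a)
    a, c = rr(a, c)          # rra
    a, c = rr(a, c)          # rra
    h = a                    # ld h,a
    a, c = rr(a, c)          # rra
    h, c = sra(h)
    a, c = rla(a, c)
    h, c = rr(h, c)
    h, c = sra(h)
    a, c = rr(a, c)          # rra
    h, c = rr(h, c)
    h, c = sra(h)            # H = AABBCCDD
    return (h << 8) | l


def double_bits_ref(a):
    """Reference: bit i of A becomes bits 2i and 2i+1 of the result."""
    out = 0
    for i in range(8):
        if (a >> i) & 1:
            out |= 3 << (2 * i)
    return out


print(all(double_bits(a) == double_bits_ref(a) for a in range(256)))
```

The first rrca overwrites the carry, so the routine does not depend on the flags at entry either.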


;Galandros optimized routines
;try to beat me... maybe it is possible...

;Displays the contents of A on screen as a decimal ASCII number, using no additional memory
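DispA:
ld c,-100
call Na1     ;hundreds digit
ld c,-10
call Na1     ;tens digit
ld c,-1      ;ones digit: falls through into Na1, whose ret returns from DispA
Na1: ld b,'0'-1
Na2: inc b
add a,c      ;subtract 100/10/1; carry is set while A was still >= 100/10/1
jr c,Na2
sub c ;works as add 100/10/1, undoing the last subtraction
push af ;safer than ld c,a
ld a,b ;char is in b
CALL PUTCHAR ;plot a char. Replace with bcall(_PutC) or similar.
pop af ;safer than ld a,c
ret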
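The digit loop works by adding a negative constant and watching the carry. A Python model of DispA (a sketch; it returns the three characters instead of calling PUTCHAR) shows that the carry stays set as long as A was at least 100, 10 or 1 before the add, and that sub c undoes the final overshoot.

```python
def disp_a(a):
    """Model of DispA: return the three characters it would print."""
    out = []
    for c in (-100, -10, -1):        # ld c,-100 / ld c,-10 / ld c,-1
        c &= 0xFF                    # 8-bit two's complement constant
        b = ord('0') - 1             # ld b,'0'-1
        while True:
            b += 1                   # inc b
            total = a + c            # add a,c
            a = total & 0xFF
            if total <= 0xFF:        # no carry: A was below 100/10/1
                break
        a = (a - c) & 0xFF           # sub c undoes the last add
        out.append(chr(b))           # PUTCHAR
    return "".join(out)


print(disp_a(250), disp_a(7))
```

Note that DispA always prints three digits, with leading zeros.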


;Note: the following one is optimized for RPG menus and the like; it is quite flexible. I am going to use it in Lost Legends I ^^
;I started with one (also made by me) which used additional RAM for temporary storage, and optimized it for size, speed and no extra memory use! ^.^
;the incs and decs were tricky to debug -.-", registers b and c act as counters and flags

;DispHL for games
;input: hl=num, e=row, d=col (ld (CurRow),de stores e in curRow and d in curCol), c=number of digits to skip
;characters displayed: always 5, e.g. 65000
;output: hl displayed, with skipped digits and leading zeros shown as spaces
;note: hl=0 is displayed as 5 spaces
DispHL_games:
inc c
ld b,1 ;skip 0 flag
ld (CurRow),de
;Number in hl to decimal ASCII
;Thanks to z80 Bits
;inputs: hl = number to ASCII
;example: hl=300 outputs '  300'
;destroys: af, hl, de
ld de,-10000
call Num1
ld de,-1000
call Num1
ld de,-100
call Num1
ld e,-10
call Num1
ld e,-1
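Num1:
ld a,'0'-1
Num2: inc a
add hl,de    ;subtract 10000/1000/100/10/1
jr c,Num2
sbc hl,de    ;carry is clear here, so this adds the power of 10 back
dec c ;c is the skip counter
jr nz,skipnum ;still skipping: print a space
inc c         ;c stays 1 from now on
djnz notcharnumzero ;b is 1 while zeros are suppressed; djnz falls through only then
cp '0'
jr nz,notcharnumzero ;first nonzero digit: print it (b stays 0)
leadingzero:
inc b         ;keep suppressing zeros
skipnum:
ld a,' '
notcharnumzero:
push bc
call PUTCHAR  ;bcall(_PutC) works, not sure if it preserves bc
pop bc
ret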

PUTCHAR:
bcall(_PutC)
ret

;Example usage of DispHL_games, to show what I mean
Test2:
ld hl,60003
ld de,$0101   ;col 1, row 1
ld c,0        ;no skip: displays 60003
call DispHL_games
ld hl,60003
ld de,$0102   ;col 1, row 2
ld c,1        ;skip the first digit: displays '    3'
call DispHL_games
ret
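The interplay of the c (skip) and b (leading-zero) counters is easiest to see in a Python model of DispHL_games (a sketch; it returns the five characters instead of printing them, and the cursor setup is left out). divmod stands in for the repeated add hl,de loop, which produces the same digit.

```python
def disp_hl_games(hl, skip):
    """Model of DispHL_games: the 5 characters printed for hl."""
    c = (skip + 1) & 0xFF                # inc c
    b = 1                                # ld b,1: leading-zero flag
    out = []
    for place in (10000, 1000, 100, 10, 1):
        digit, hl = divmod(hl, place)    # Num2 loop: repeated add of -place
        ch = chr(ord('0') + digit)
        c = (c - 1) & 0xFF               # dec c
        if c != 0:                       # jr nz,skipnum: skipped digit
            ch = ' '
        else:
            c += 1                       # inc c: c stays 1 from now on
            b = (b - 1) & 0xFF           # djnz (jumps while b != 0)
            if b == 0 and ch == '0':     # still in the leading zeros
                b = 1                    # leadingzero: inc b
                ch = ' '
        out.append(ch)
    return "".join(out)


print(repr(disp_hl_games(60003, 0)), repr(disp_hl_games(60003, 1)))
```

Because the last digit goes through the same zero-suppression test, a value of 0 comes out as five spaces.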

Well, don't try to understand or optimize calc84maniac's ones. j/k. Trying to understand them can be hard (tip: keep a good instruction set summary at hand), but it teaches some inner details of z80 asm.
As for mine, do your best.

Hobbing in calculator projects.
Quigibo
« Reply #1 on: 01 March, 2010, 00:21:57 »

Here is a little optimization I use but haven't really seen around. When you read the keypad directly, you have to wait about 7 clock cycles between setting the port and reading it. Most people just fill that gap with a do-nothing instruction, like this:


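ld a,xx
out (1),a   ;select the key group
ld a,(de)   ;filler: 7 T-states of delay, result discarded
in a,(1)    ;read the keys
and yy      ;mask the key(s) of interest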
9 Bytes, 43 T-States.

You can actually use the filler instruction to do something useful. It gives a slight speed increase:


ld a,xx
out (1),a
ld b,yy     ;useful delay: load the mask now
in a,(1)
and b       ;and r takes 4 T-states instead of 7 for and n
9 Bytes, 40 T-States.
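The byte and T-state totals above can be checked with a bit of bookkeeping (a Python sketch; the per-instruction costs are the standard Z80 figures). Both versions spend 7 T-states between out and in. The saving comes from and b (1 byte, 4 T-states) replacing and n (2 bytes, 7 T-states), while ld b,n costs one byte more than ld a,(de), so the size stays at 9.

```python
# (bytes, T-states) for each instruction used, standard Z80 timings
COST = {
    "ld a,n": (2, 7),
    "out (n),a": (2, 11),
    "ld a,(de)": (1, 7),
    "in a,(n)": (2, 11),
    "and n": (2, 7),
    "ld b,n": (2, 7),
    "and b": (1, 4),
}


def totals(seq):
    """Sum the byte and T-state cost of an instruction sequence."""
    size = sum(COST[op][0] for op in seq)
    time = sum(COST[op][1] for op in seq)
    return size, time


FILLER = ["ld a,n", "out (n),a", "ld a,(de)", "in a,(n)", "and n"]
USEFUL = ["ld a,n", "out (n),a", "ld b,n", "in a,(n)", "and b"]

print(totals(FILLER), totals(USEFUL))
```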
« Last Edit: 01 March, 2010, 00:23:48 by Quigibo »

___Axe_Parser___
Today the calculator, tomorrow the world!
calc84maniac
« Reply #2 on: 01 March, 2010, 03:12:27 »

Small and quick setup for IM 2 (this example puts the vector table at $9900 and the interrupt jump at $9A9A, but the values can be changed):

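di
ld a,$99
ld bc,$0100   ;b=1, c=0: bc=256
ld h,a
ld d,a
ld l,c        ;hl=$9900
ld e,b        ;de=$9901
ld i,a        ;I=$99: vector table at $99xx
inc a         ;a=$9A
ld (hl),a
ldir          ;fill $9900-$9A00 (257 bytes) with $9A
ld l,a        ;after ldir hl=$9A00, so now hl=$9A9A
ld (hl),$c3   ;jp opcode
inc l
ld (hl),intvec & $ff
inc l
ld (hl),intvec >> 8  ;$9A9A: jp intvec
im 2
ei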
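Why does every interrupt end up at $9A9A? In IM 2 the CPU reads a 16-bit pointer from address I*256 + (byte on the data bus), and since that byte cannot be relied on, the table has to give the same answer for all 256 values. The Python model below mirrors the setup on a 64 KB bytearray and checks that (a sketch; the intvec address $9D95 is a hypothetical value chosen only for the test). The 257th byte is needed because a bus value of $FF reads the pointer from $99FF and $9A00.

```python
def build_im2(memory, intvec):
    """Mirror the setup: table at $9900-$9A00, 'jp intvec' at $9A9A."""
    a = 0x99
    i = a                                   # ld i,a
    a += 1                                  # inc a -> $9A
    for addr in range(0x9900, 0x9A01):      # ld (hl),a + ldir: 257 bytes
        memory[addr] = a
    memory[0x9A9A] = 0xC3                   # jp opcode
    memory[0x9A9B] = intvec & 0xFF          # low byte of the target
    memory[0x9A9C] = intvec >> 8            # high byte of the target
    return i


def im2_target(memory, i, bus):
    """Address the CPU jumps to in IM 2 for a given data-bus byte."""
    ptr = (i << 8) | bus
    return memory[ptr] | (memory[ptr + 1] << 8)


mem = bytearray(65536)
i_reg = build_im2(mem, 0x9D95)              # 0x9D95: hypothetical intvec
print(all(im2_target(mem, i_reg, bus) == 0x9A9A for bus in range(256)))
```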

"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman
Galandros
« Reply #3 on: 24 April, 2010, 18:12:44 »

I found this optimized routine around. It is about as optimized as a z80 string copy can get.

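;author: calc84maniac, I think
;Copy zero terminated string at HL to DE (the terminator is copied too).
;destroys: a, bc (ldi decrements bc); hl and de end just past the terminator
StrCopy:
xor a
docopystr:
cp (hl)        ;Z set if this byte is the terminator
ldi            ;copy it and advance; ldi does not change Z
jr nz,docopystr
ret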
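The trick is that cp (hl) sets Z before ldi copies the byte, and ldi leaves Z alone, so the terminator is copied and then ends the loop. A Python model (a sketch; memory is a bytearray and only HL, DE, BC and Z are tracked) shows this, including the side effect on BC.

```python
def str_copy(mem, hl, de, bc=0):
    """Model of StrCopy. Returns (hl, de, bc) after the loop."""
    a = 0                                # xor a
    while True:
        zero = mem[hl] == a              # cp (hl): Z set on the terminator
        mem[de] = mem[hl]                # ldi: (de) = (hl)
        hl, de = hl + 1, de + 1          #      hl++, de++
        bc = (bc - 1) & 0xFFFF           #      bc--, Z is left alone
        if zero:                         # jr nz,docopystr falls through on Z
            return hl, de, bc


mem = bytearray(32)
mem[0:3] = b"HI\x00"
print(str_copy(mem, 0, 16), bytes(mem[16:19]))
```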

These are quite optimized, though it may be possible to optimize them further (for speed and size). But it is not needed...
They scroll a graphics buffer (optimized for 96x64) up or down by the number of pixels passed in register A.

scroll_up:
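#ifdef DEBUG
or a           ;a=0 is invalid: 12*a-1 would wrap to 65535
call z,ErrorOverFlow
cp 64          ;a>=64 is invalid: a=64 makes bc=0, and ldir with bc=0 copies 64 KB
call nc,ErrorOverFlow
#endif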
add a,a
add a,a
ld l,a
ld e,a
ld h,0
ld d,h
add hl,hl
add hl,de ; hl=a*12

push hl
ld de,768
ex de,hl
; carry is never set here if input is correct
; or a
sbc hl,de
ld b,h
ld c,l ; bc=768-12*a
ex de,hl
ld de,plotsscreen
add hl,de
ldir
;blank remaining area
ld h,d
ld l,e
inc de
ld (hl),$00
pop bc
dec bc ; bc=12*a-1
ldir
ret
;PSEUDO CODE
; ld hl,plotsscreen+12*a
; ld de,plotsscreen
; ld bc,768-12*a
; ldir
; ld h,d
; ld l,e
; ld (hl),$00
; inc de
; ld bc,12*a
; dec bc
; ldir
; ret



scroll_down:
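#ifdef DEBUG
or a           ;a=0 is invalid: 12*a-1 would wrap to 65535
call z,ErrorOverFlow
cp 64          ;a>=64 is invalid: a=64 makes bc=0, and lddr with bc=0 copies 64 KB
call nc,ErrorOverFlow
#endif
; a can be from 1 to 63
; so a*4 still fits in 8 bits (max 252)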
add a,a
add a,a ; a*4
ld l,a ; hl = a*4
ld e,a
xor a
ld h,a
ld d,a
add hl,hl ; hl = a*8
add hl,de ; hl = a*12
ld e,a ; de = 0

push hl ; a*12 will be needed later
push hl ; (pushed twice)
ex de,hl
;carry is never set here
; or a
sbc hl,de ; hl= -a*12, de=a*12
ld de,plotsscreen+767
add hl,de ; hl=plotsscreen+767-12*a
pop bc
push hl
ld hl,768+1 ; +1 because the sbc below also subtracts the carry
;carry is always set here (add hl,de above overflowed)
; or a
sbc hl,bc
ld b,h
ld c,l
pop hl
lddr
;blank remaining area
ld h,d
ld l,e
ld (hl),$00
dec de
pop bc
dec bc
lddr
ret

;PSEUDO CODE
; ld hl,plotsscreen+767-12*a
; ld de,plotsscreen+767
; ld bc,768-12*a
; lddr
; ;blank the top 12*a bytes, either going down as above:
; ld h,d
; ld l,e
; ld (hl),$00
; dec de
; ld bc,12*a-1
; lddr
; ret
; ;or going up from the start of the buffer:
; ld hl,plotsscreen
; ld (hl),$00
; ld de,hl+1
; ld bc,12*a-1
; ldir
; ret
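What the two routines do to the buffer can be summarized in a Python model (a sketch; the 768-byte bytearray stands in for plotsscreen, and raising ValueError stands in for the DEBUG check). Python slice assignment copies the source first, which matches what the asm gets by choosing ldir when moving data to lower addresses and lddr when moving it to higher ones. The 1..63 range reflects the asm's bc arithmetic: a=0 makes 12*a-1 wrap to 65535, and a=64 makes 768-12*a zero, which ldir/lddr treat as 65536 bytes.

```python
ROW_BYTES = 12            # 96 pixels / 8 bits per byte
BUF_SIZE = 12 * 64        # 768 bytes, like plotsscreen


def check_rows(rows):
    """Stand-in for the DEBUG check: only 1..63 rows are safe."""
    if not 1 <= rows <= 63:
        raise ValueError("rows must be in 1..63")


def scroll_up(buf, rows):
    """Model of scroll_up: move everything up, clear the bottom rows."""
    check_rows(rows)
    n = ROW_BYTES * rows
    buf[0:BUF_SIZE - n] = buf[n:BUF_SIZE]   # ldir from plotsscreen+12*a
    buf[BUF_SIZE - n:BUF_SIZE] = bytes(n)   # ld (hl),0 then ldir smears it


def scroll_down(buf, rows):
    """Model of scroll_down: move everything down, clear the top rows."""
    check_rows(rows)
    n = ROW_BYTES * rows
    buf[n:BUF_SIZE] = buf[0:BUF_SIZE - n]   # lddr from the last byte down
    buf[0:n] = bytes(n)                     # ld (hl),0 then lddr smears it


screen = bytearray(i % 256 for i in range(BUF_SIZE))
scroll_up(screen, 1)
print(screen[0], bytes(screen[-12:]))
```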
« Last Edit: 24 April, 2010, 18:15:14 by Galandros »

mapar007
« Reply #4 on: 25 April, 2010, 09:58:56 »

Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k

Galandros
« Reply #5 on: 25 April, 2010, 11:04:47 »

Quote from mapar007:
  Very nice! I'll add these to my utils.z80 file that is included in all my app builds.
  Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
Actually I am working on something like that. I am hand-writing C functions in z80 assembly just for fun. :P I will share them when I finish.
After seeing Axe Parser, it seems possible to make a good C compiler for z80. And we already have documentation on how to optimize z80 assembly, which could guide an optimizer; check the WikiTI topic: http://wikiti.brandonw.net/index.php?title=Z80_Optimization.
« Last Edit: 25 April, 2010, 11:14:53 by Galandros »

DJ Omnimaga
« Reply #6 on: 25 April, 2010, 18:19:53 »

Quote from mapar007:
  Very nice! I'll add these to my utils.z80 file that is included in all my app builds.
  Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k
I think I remember this. It was Halifax from the old Omnimaga forums who worked on it, right? There was a thread about it somewhere.

Retired 83+ coder, Omnimaga/TIMGUL founder. Now doing power metal music (formerly did electronica)

Quigibo
« Reply #7 on: 29 April, 2010, 23:59:58 »

Quigibo's Challenge!

Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.

  • Multiply by 128?
  • Signed division by any nontrivial constant, other than 2, including negative numbers?
  • Modulus with any constant that is not a power of 2?

I'm rewriting my math engine almost from scratch, so I decided I would just optimize everything I could possibly conceive of at the same time. These are the ones I'm having trouble finding.
calc84maniac
« Reply #8 on: 30 April, 2010, 00:31:16 »

Seems pretty impossible to me.
Quigibo
« Reply #9 on: 30 April, 2010, 00:58:39 »

Okay, that's good.  I spent hours trying to optimize some of these using all the tricks I know.  That reassures me it was a wild goose chase.
DJ Omnimaga
« Reply #10 on: 30 April, 2010, 01:01:08 »

Quote from calc84maniac:
  Seems pretty impossible to me.
:o

No way!

You're calc84god, you can do everything, even the impossible! (see TI-Boy SE/Project M/F-Zero)

j/k. I can't wait to see what kind of optimizations there will be in the next versions of Axe.
Quigibo
« Reply #11 on: 30 April, 2010, 01:34:45 »

It's nothing big. Mostly it just extends multiplication, modulus, and addition to higher powers of 2. The big optimizations won't come for a long time, unfortunately. Functionality is more important right now.

By the way, is there a better way to display hl at the coordinates (xx,yy) than this?

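B_CALL(_SetXXXXOP2)   ;OP2 = hl as a floating-point number
B_CALL(_Op2ToOP1)     ;OP1 = OP2
ld hl,$yyxx
ld (PenCol),hl        ;PenCol = xx, PenRow = yy
ld a,5                ;display at most 5 characters
B_CALL(_DispOP1A)     ;draw OP1 in the small font at (PenCol,PenRow)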

It seems really roundabout to me. Is there a bcall I don't know about that does this automatically?
calcdude84se
« Reply #12 on: 30 April, 2010, 01:57:10 »

Yeah, there's _DispHL, so your code would be:

push hl
ld hl,$yyxx
ld (PenCol),hl
pop hl
B_CALL(_DispHL)   ;see the EDIT below: _DispHL draws on the home screen, not at PenCol/PenRow
Just be aware it's right-justified in 5 spaces (since $FFFF is 65535, which has 5 decimal digits).
EDIT: oh, wait, that's PenCol? _DispHL uses the home-screen cursor, so this code doesn't work then. Oops... :-[
« Last Edit: 30 April, 2010, 23:49:37 by calcdude84se »

"People think computers will keep them from making mistakes. They're wrong. With computers you make mistakes faster."
-Adam Osborne
Bug me about PartesOS. I might just need reminding.
calc84maniac
« Reply #13 on: 30 April, 2010, 04:27:56 »

He's talking about graph screen display.
Galandros
« Reply #14 on: 30 April, 2010, 15:21:30 »

Quote from Quigibo:
  Quigibo's Challenge!
  Can any of the following be done in 6 or fewer bytes? The input and output must be HL.
    • Multiply by 128?
    • Signed division by any nontrivial constant, other than 2, including negative numbers?
    • Modulus with any constant that is not a power of 2?
Challenge accepted.

Answer to the multiplication by 128 in 6 bytes:

I started by coding a routine that multiplies A by 128:
; The old trick to multiply by 256, by moving the low byte to high byte
 ld h,a
 xor a   ; resets carry
 rr h     ; divide h by 2
 rra      ; and pass bit 0 to a
 ld l,a   ; store to l
; hl is a*128
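The A*128 trick is "multiply by 256 by moving A into the high byte, then halve": rr h moves bit 0 of A into the carry and rra drops it into bit 7 of L. A Python model (a sketch that tracks only A, H, L and the carry) checks it for all 256 inputs.

```python
def a_times_128(a):
    """Model of: ld h,a / xor a / rr h / rra / ld l,a."""
    h = a                                              # ld h,a
    acc, carry = 0, 0                                  # xor a: A=0, carry cleared
    h, carry = (h >> 1) | (carry << 7), h & 1          # rr h: bit 0 of h to carry
    acc, carry = (acc >> 1) | (carry << 7), acc & 1    # rra: carry into bit 7 of A
    l = acc                                            # ld l,a
    return (h << 8) | l


print(all(a_times_128(a) == a * 128 for a in range(256)))
```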

After that, I very easily modified it to compute (hl*128) mod 2^16. Unsigned version:
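 ld h,l
 xor a
 rr h
 rra
 ld l,a
; 6 bytes and 24 clocks to multiply hl by 128, not bad O_o
; caveat: bit 0 of the original H should end up in bit 15, but ld h,l discards it,
; so this is exact only when H is even (e.g. any hl < 256)
; full (hl*128) mod 2^16 in 8 bytes, 32 clocks: xor a \ srl h \ rr l \ rra \ ld h,l \ ld l,a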

I am very sure this routine works, but I have not tested it.
EDIT4: tested with a few values, it works.
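Testing every 16-bit input, rather than a few values, shows where the 6-byte version stops matching (hl*128) mod 2^16. The Python sketch below models it next to an 8-byte variant, xor a / srl h / rr l / rra / ld h,l / ld l,a, which is a suggestion of this edit and not from the thread. The 6-byte version loses bit 0 of H, which should become bit 15, so it is only exact when H is even.

```python
def times128_6byte(hl):
    """Model of: ld h,l / xor a / rr h / rra / ld l,a."""
    h = hl & 0xFF                                  # ld h,l (old H is lost)
    a, carry = 0, 0                                # xor a
    h, carry = (h >> 1) | (carry << 7), h & 1      # rr h
    a, carry = (a >> 1) | (carry << 7), a & 1      # rra
    return (h << 8) | a                            # ld l,a


def times128_8byte(hl):
    """Model of: xor a / srl h / rr l / rra / ld h,l / ld l,a."""
    h, l = hl >> 8, hl & 0xFF
    a, carry = 0, 0                                # xor a
    h, carry = h >> 1, h & 1                       # srl h: bit 0 of H to carry
    l, carry = (l >> 1) | (carry << 7), l & 1      # rr l: it becomes bit 15
    a, carry = (a >> 1) | (carry << 7), a & 1      # rra
    return (l << 8) | a                            # ld h,l / ld l,a


for x in (0x0003, 0x0100):
    print(hex(x), hex(times128_6byte(x)), hex(times128_8byte(x)), hex((x * 128) & 0xFFFF))
```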

EDIT3:
Multiply hl by 128, now signed. If I am right, to do signed, you only need to preserve bit 7? If that's so:
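 ld h,l
 xor a
 sra h
 rra
 ld l,a
; 6 bytes, 24 clocks, too
; caveat: in two's complement, hl*128 mod 2^16 is the same bit pattern for signed and
; unsigned hl, so no separate signed version is needed; sra h copies bit 7 of L into
; bit 15, which is wrong e.g. for hl=$0080 (gives $C000 instead of $4000)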
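In two's complement, multiplication modulo 2^16 gives the same bit pattern whether HL is read as signed or unsigned, so a left shift by 7 needs no special signed handling. The Python sketch below checks that for every 16-bit value and models the sra h version to show a case where copying bit 7 of L into the top bit gives a different result (HL = $0080).

```python
def times128_sra(hl):
    """Model of: ld h,l / xor a / sra h / rra / ld l,a."""
    h = hl & 0xFF                                  # ld h,l
    a, carry = 0, 0                                # xor a
    h, carry = (h >> 1) | (h & 0x80), h & 1        # sra h: bit 7 is kept
    a, carry = (a >> 1) | (carry << 7), a & 1      # rra
    return (h << 8) | a                            # ld l,a


def to_signed(x):
    """Read a 16-bit value as two's complement."""
    return x - 0x10000 if x & 0x8000 else x


same = all((to_signed(x) * 128) & 0xFFFF == (x * 128) & 0xFFFF for x in range(65536))
print(same, hex(times128_sra(0x0080)), hex((0x0080 * 128) & 0xFFFF))
```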

Now I will think about the others when I have more free time. Fun, fun, fun.
Give me some time, please. :)
EDIT: I am thinking of putting some of these challenges on WikiTI when we finish the challenge. And maybe Axe's routines. If you have other routines or optimization challenges, share them and I'll see what I can do.
EDIT2: fixed a bug/typo and commented the code even more
« Last Edit: 30 April, 2010, 19:18:05 by Galandros »
