ASM Optimized routines
Galandros:
There are some cool optimized routines around. calc84maniac probably holds the record for z80 optimization, at least on the calculator z80 forums. On to the code:

--- Code: ---
;calc84maniac
cpHLDE:
    or a
    sbc hl,de
    add hl,de
    ret
;Important note: the code is only 4 bytes inline and a call is 3 bytes, so just use a macro:
;SPASM, TASM and BRASS compatible, I guess
#define cp_HLDE or a \ sbc hl,de \ add hl,de

;- Reverse a
;input: Byte in A
;output: Reversed byte in A
;destroys B
;Clock cycles: 66
;Bytes: 18
;author: calc84maniac
reversea:
    ld b,a
    rrca
    rrca
    xor b
    and %10101010
    xor b
    ld b,a
    rrca
    rrca
    rrca
    rrca
    xor b
    and %01100110
    xor b
    rrca
    ret

;reverse hl
;curiosity: an easy port of a common reverse-A routine is more efficient than tricky stuff
;calc84maniac
;28 bytes and 108 cycles (not counting the ret)
    ld a,l
    rla
    rr h
    rla
    rr h
    rla
    rr h
    rla
    rr h
    rla
    rr h
    rla
    rr h
    rla
    rr h
    rla
    rr h
    rla
    ld l,a
    ret

;calc84maniac
;in: a = ABCDEFGH
;out: hl= AABBCCDDEEFFGGHH
    rrca
    rra
    rra
    ld l,a
    rra
    sra l
    rla
    rr l
    sra l
    rra
    rr l
    sra l
    rrca
    rra
    rra
    ld h,a
    rra
    sra h
    rla
    rr h
    sra h
    rra
    rr h
    sra h
    ret
--- End code ---

--- Code: ---
;Galandros optimized routines
;try to beat me... maybe it is possible...

;Displays the content of the A register on screen as a decimal ASCII number, using no additional memory
DispA:
    ld c,-100
    call Na1
    ld c,-10
    call Na1
    ld c,-1
Na1:
    ld b,'0'-1
Na2:
    inc b
    add a,c
    jr c,Na2
    sub c           ;works as add 100/10/1
    push af         ;safer than ld c,a
    ld a,b          ;char is in b
    call PUTCHAR    ;plot a char. Replace with bcall(_PutC) or similar.
    pop af          ;safer than ld a,c
    ret

;Note: the following one is optimized for RPG menus and the like, it is quite flexible. I am going to use it in Lost Legends I ^^
;I started with one which used additional RAM for temporary storage (made by me, too), and optimized it for size, speed and no extra memory use! ^.^
;the inc's and dec's were tricky to debug -.-", the registers b and c are like counters and flags

;DispHL for games
;input: hl=num, e=row, d=col, c=number of digits to skip
;number of characters displayed: 5 ; example: 65000
;output: hl displayed, with digits skipped and spaces for leading zeros
DispHL_games:
    inc c
    ld b,1          ;skip 0 flag
    ld (CurRow),de
;Number in hl to decimal ASCII
;Thanks to z80 Bits
;inputs: hl = number to ASCII
;example: hl=300 outputs '  300'
;destroys: af, hl, de used
    ld de,-10000
    call Num1
    ld de,-1000
    call Num1
    ld de,-100
    call Num1
    ld e,-10
    call Num1
    ld e,-1
Num1:
    ld a,'0'-1
Num2:
    inc a
    add hl,de
    jr c,Num2
    sbc hl,de
    dec c           ;c is skipping
    jr nz,skipnum
    inc c
    djnz notcharnumzero
    cp '0'
    jr nz,notcharnumzero
leadingzero:
    inc b
skipnum:
    ld a,' '
notcharnumzero:
    push bc
    call PUTCHAR    ;bcall(_PutC) works, not sure if it preserves bc
    pop bc
    ret

PUTCHAR:
    bcall(_PutC)
    ret

;Example usage of DispHL_games to understand what I mean
Test2:
    ld hl,60003
    ld de,$0101
    ld c,0
    call DispHL_games
    ld hl,60003
    ld de,$0102
    ld c,1
    call DispHL_games
    ret
--- End code ---

Well, don't try to understand or optimize calc84maniac's ones. j/k. Trying to understand them can be hard (tip: have a good instruction set summary at hand), but it teaches some inner details of z80 asm. As for mine, do your best.
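A minimal sketch of how the flags left by the cp_HLDE macro above can be used. The values and the labels hl_equals_de and hl_below_de are hypothetical. The flags come out as with an 8-bit cp: Z when HL = DE, carry when HL < DE (unsigned), and HL is left unchanged.

--- Code: ---
    ld hl,1000
    ld de,2000
    cp_HLDE             ;compares HL with DE, HL is still 1000 afterwards
    jr z,hl_equals_de   ;not taken here: 1000 is not equal to 2000
    jr c,hl_below_de    ;taken here: 1000 < 2000 (unsigned)
--- End code ---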
Quigibo:
Here is a little optimization I use but haven't really seen around. When you need a direct key press, you have to wait about 7 clock cycles between setting the port and reading it. Most people just fill the gap with a wasted instruction like this:

--- Code: ---
    ld a,xx
    out (1),a
    ld a,(de)       ;wasted instruction, only there for the delay
    in a,(1)
    and yy
--- End code ---

9 Bytes, 43 T-States.

You can actually use the delay instruction to do something useful. It gives a slight speed increase.

--- Code: ---
    ld a,xx
    out (1),a
    ld b,yy         ;loads the mask and doubles as the delay
    in a,(1)
    and b
--- End code ---

9 Bytes, 40 T-States.
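A minimal sketch of the second variant with concrete values, assuming the usual TI-83 Plus keypad layout (group mask $FE selects the arrow keys, bit 0 is the down key, and pressed keys read as 0). The label down_pressed is hypothetical.

--- Code: ---
    ld a,$FE            ;assumed: selects the arrow-key group
    out (1),a
    ld b,%00000001      ;assumed: mask for the down key, doubles as the delay
    in a,(1)
    and b               ;result is zero when the key is held (active low)
    jr z,down_pressed
--- End code ---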
calc84maniac:
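Small and quick setup for IM 2 (this example sets up the vector table at $9900 and the interrupt jump at $9a9a, but the values can be changed):

--- Code: ---
    di
    ld a,$99
    ld bc,$0100     ;b=1, c=0, and bc=256 bytes for the ldir
    ld h,a
    ld d,a
    ld l,c          ;hl=$9900 (start of the vector table)
    ld e,b          ;de=$9901
    ld i,a          ;i=$99 (high byte of the vector table)
    inc a           ;a=$9a
    ld (hl),a
    ldir            ;fill $9900-$9a00 (257 bytes) with $9a
    ld l,a          ;hl=$9a9a (h is $9a after the ldir)
    ld (hl),$c3     ;opcode of jp
    inc l
    ld (hl),intvec & $ff
    inc l
    ld (hl),intvec >> 8
    im 2
    ei
--- End code ---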
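A matching teardown, as a minimal sketch. It assumes the program returns to the TI-OS afterwards, which runs in interrupt mode 1, and that nothing else still relies on the custom handler.

--- Code: ---
    di
    im 1            ;back to the interrupt mode the OS expects
    ei
--- End code ---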
Galandros:
I found this optimized routine around. It is about as optimized as a z80 string copy can get.

--- Code: ---
;author: calc84maniac, I think
;Copy zero terminated string at HL to DE.
StrCopy:
    xor a
docopystr:
    cp (hl)
    ldi
    jr nz,docopystr
    ret
--- End code ---

These are quite optimized, though it may be possible to optimize them further (speed and size). It is not really needed, though. They shift a graphics buffer (optimized for 96x64) up or down by the number of pixels passed in the A register.

--- Code: ---
scroll_up:
#ifdef DEBUG
    cp 63+1
    call nc,ErrorOverFlow   ; a must be from 1 to 63
#endif
    add a,a
    add a,a
    ld l,a
    ld e,a
    ld h,0
    ld d,h
    add hl,hl
    add hl,de       ; hl=a*12
    push hl
    ld de,768
    ex de,hl
; carry is never set here if input is correct
;   or a
    sbc hl,de
    ld b,h
    ld c,l          ; bc=768-12*a
    ex de,hl
    ld de,plotsscreen
    add hl,de
    ldir
;blank remaining area
    ld h,d
    ld l,e
    inc de
    ld (hl),$00
    pop bc
    dec bc          ; bc=12*a-1
    ldir
    ret
;PSEUDO CODE
;   ld hl,plotsscreen+12*a
;   ld de,plotsscreen
;   ld bc,768-12*a
;   ldir
;   ld h,d
;   ld l,e
;   ld (hl),$00
;   inc de
;   ld bc,12*a
;   dec bc
;   ldir
;   ret

scroll_down:
#ifdef DEBUG
    cp 63+1
    call nc,ErrorOverFlow   ; a must be from 1 to 63
#endif
; a can be from 1 to 63
; a can be multiplied by 4
    add a,a
    add a,a         ; a*4
    ld l,a          ; hl = a*4
    ld e,a
    xor a
    ld h,a
    ld d,a
    add hl,hl       ; hl = a*8
    add hl,de       ; hl = a*12
    ld e,a          ; de = 0
    push hl         ; a*12 will be needed later
    push hl         ; 2 times
    ex de,hl
;carry is never set here
;   or a
    sbc hl,de       ; hl= -a*12, de=a*12
    ld de,plotsscreen+767
    add hl,de       ; hl=plotsscreen+767-12*a
    pop bc
    push hl
    ld hl,768+1
;carry always set
;   or a
    sbc hl,bc
    ld b,h
    ld c,l
    pop hl
    lddr
;blank remaining area
    ld h,d
    ld l,e
    ld (hl),$00
    dec de
    pop bc
    dec bc
    lddr
    ret
;PSEUDO CODE
;   ld hl,plotsscreen+767-12*a
;   ld de,plotsscreen+767
;   ld bc,768-12*a
;   lddr
;                           or
;                           ;   ld (hl),$00
;;  ld hl,plotsscreen       ;   ld h,d
;;  ld (hl),$00             ;   ld l,e
;;  ld de,hl+1              ;   dec de
;;  ld bc,12*a-1            ;   ld bc,12*a-1
;;  ldir                    ;   lddr
;;  ret                     ;   ret
--- End code ---
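A minimal usage sketch for the routines above, assuming a TI-83 Plus style environment. The labels message and buffer are hypothetical, and _GrBufCpy is assumed to be the OS call that copies plotsscreen to the LCD.

--- Code: ---
    ld hl,message
    ld de,buffer        ;hypothetical destination with room for 6 bytes
    call StrCopy        ;copies "HELLO" and its terminating zero
    ld a,8
    call scroll_up      ;shift the graph buffer up by 8 pixel rows
    bcall(_GrBufCpy)    ;assumed: copies plotsscreen to the LCD
    ret
message:
    .db "HELLO",0
--- End code ---

Note that StrCopy decrements BC once per copied byte, because ldi always does, so save BC first if its value matters.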
mapar007:
Very nice! I'll add these to my utils.z80 file that is included in all my app builds. Anyone want to compile a stdlib.c and revive the tisdcc project? j/k