Author Topic: Assembly Programmers - Help Axe Optimize! (Read 145157 times)

Happybobjr · « **Reply #240 on:** August 30, 2011, 06:23:00 am »

What does that code do, though?

Runer112 · « **Reply #241 on:** September 18, 2011, 03:15:23 am »

At this rate, I'll have optimized just about every Axe routine eventually!

p_ToHex: 31 cycles faster.

Code: (Old code: 25 bytes, 670 cycles) [Select]

p_ToHex:
 .db __ToHexEnd-$-1
 ld b,4
 ld de,vx_SptBuff
 push de
__ToHexLoop:
 ld a,$1F
__ToHexShift:
 add hl,hl
 rla
 jr nc,__ToHexShift
 daa
 add a,$A0
 adc a,$40
 ld (de),a
 inc de
 djnz __ToHexLoop
 xor a
 ld (de),a
 pop hl
 ret
__ToHexEnd:

Code: (New code: 25 bytes, 639 cycles) [Select]

p_ToHex:
 .db __ToHexEnd-$-1
 ld bc,4<<8+$1F
 ld de,vx_SptBuff
__ToHexLoop:
 ld a,c
__ToHexShift:
 add hl,hl
 rla
 jr nc,__ToHexShift
 daa
 add a,$A0
 adc a,$40
 ld (de),a
 inc e
 djnz __ToHexLoop
 ex de,hl
 ld (hl),b
 ld l,vx_SptBuff&$FF
 ret
__ToHexEnd:
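
The digit conversion in both versions is compact enough that a model may help. Below is a Python sketch of one reading of the routine, assuming the documented after-addition behaviour of daa (N and H are clear after rla). The names nibble_to_ascii and to_hex are made up for illustration and are not part of Axe.

```python
def nibble_to_ascii(n: int) -> str:
    """Model `daa / add a,$A0 / adc a,$40` for a = $F0 + n with carry set."""
    a, carry = 0xF0 | n, 1          # register state when the shift loop exits

    # daa after an addition-type instruction (N = 0, H = 0 after rla)
    adjust = 0
    if (a & 0x0F) > 9:
        adjust |= 0x06
    if carry or a > 0x99:
        adjust |= 0x60
        carry = 1
    a = (a + adjust) & 0xFF         # $50-$59 for 0-9, $60-$65 for 10-15

    total = a + 0xA0                # add a,$A0: carries out only for 10-15
    a, carry = total & 0xFF, total >> 8

    a = (a + 0x40 + carry) & 0xFF   # adc a,$40
    return chr(a)


def to_hex(hl: int) -> str:
    """Model the whole loop: four nibbles of hl, most significant first."""
    digits = []
    for _ in range(4):              # b = 4
        a = 0x1F                    # marker bit reaches the carry on the 4th rla
        while True:
            hl <<= 1                # add hl,hl
            bit, hl = hl >> 16, hl & 0xFFFF
            carry_out = a >> 7      # rla: old bit 7 goes to carry
            a = ((a << 1) | bit) & 0xFF
            if carry_out:           # jr nc falls through
                break
        digits.append(nibble_to_ascii(a & 0x0F))
    return "".join(digits)
```

With this model, to_hex(0xBEEF) gives "BEEF". The buffer handling is left out; note that the new listing's inc e and ld l,vx_SptBuff&$FF only work if the five output bytes sit inside one 256-byte page.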

p_ShiftLeft: 1 byte smaller, 67 cycles faster. You could save an additional 384 cycles by giving up the minor size savings and loading 12<<8+4 into de at the start of the routine and then replacing the immediate data operands in the loop with d and e.

Code: (Old code: 17 bytes, 27542 cycles) [Select]

p_ShiftLeft:
 .db __ShiftLeftEnd-1-$
 ld hl,plotSScreen+767
 ld c,64
__ShiftLeftLoop:
 ld b,12
 or a
__ShiftLeftShift:
 rl (hl)
 dec hl
 djnz __ShiftLeftShift
 dec c
 jr nz,__ShiftLeftLoop
 ret
__ShiftLeftEnd:

Code: (New code: 16 bytes, 27475 cycles) [Select]

p_ShiftLeft:
 .db __ShiftLeftEnd-1-$
 ld hl,plotSScreen+767
 xor a
__ShiftLeftLoop:
 ld b,12
__ShiftLeftShift:
 rl (hl)
 dec hl
 djnz __ShiftLeftShift
 add a,4
 jr nz,__ShiftLeftLoop
 ret
__ShiftLeftEnd:
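
The row counter trick in the new version (xor a, then add a,4 until a wraps to zero) can be checked with a small model. This is a Python sketch, not Axe source; shift_left and EXTRA_SAVING are illustrative names, and the cycle counts are the standard Z80 figures (7 for an immediate load or add, 4 for the register forms).

```python
def shift_left(screen: bytearray) -> None:
    """Shift a 96x64 1-bpp buffer (12 bytes per row) left by one pixel."""
    hl = 767                        # last byte of the buffer
    a, carry = 0, 0                 # xor a: a = 0, carry clear
    while True:
        for _ in range(12):         # ld b,12 ... djnz
            value = (screen[hl] << 1) | carry   # rl (hl)
            screen[hl], carry = value & 0xFF, value >> 8
            hl -= 1                 # dec hl
        total = a + 4               # add a,4: also clears carry for the next row
        a, carry = total & 0xFF, total >> 8
        if a == 0:                  # 64 * 4 = 256, so a wraps after 64 rows
            break


ROWS = 64
# ld b,12 -> ld b,d and add a,4 -> add a,e each drop from 7 to 4 cycles per row
EXTRA_SAVING = ROWS * ((7 - 4) + (7 - 4))
```

EXTRA_SAVING comes out to 384, matching the figure quoted for the de variant; the one-off ld de,nn at the start (10 cycles) is not counted in that number.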

p_ShiftRight: 1 byte smaller, 67 cycles faster. Same deal as p_ShiftLeft.

Code: (Old code: 17 bytes, 27542 cycles) [Select]

p_ShiftRight:
 .db __ShiftRightEnd-1-$
 ld hl,plotSScreen
 ld c,64
__ShiftRightLoop:
 ld b,12
 or a
__ShiftRightShift:
 rr (hl)
 inc hl
 djnz __ShiftRightShift
 dec c
 jr nz,__ShiftRightLoop
 ret
__ShiftRightEnd:

Code: (New code: 16 bytes, 27475 cycles) [Select]

p_ShiftRight:
 .db __ShiftRightEnd-1-$
 ld hl,plotSScreen
 xor a
__ShiftRightLoop:
 ld b,12
__ShiftRightShift:
 rr (hl)
 inc hl
 djnz __ShiftRightShift
 add a,4
 jr nz,__ShiftRightLoop
 ret
__ShiftRightEnd:

p_FreqOut: 1 byte smaller. Takes advantage of an absolute jump. This is a strange routine to optimize, because the optimized version runs about 15% faster, which makes notes slightly higher pitched and shorter. Although this command is rarely used, that change in sound might still make the optimization not worth it. Whether or not you include the optimization, it might be a good idea to change this routine to use p_Safety.

Code: (Old code: 23 bytes) [Select]

p_FreqOut:
 .db __FreqOutEnd-1-$
 xor a
__FreqOutLoop1:
 push bc
 ld e,a
__FreqOutLoop2:
 ld a,h
 or l
 jr z,__FreqOutDone
 dec hl
 dec bc
 ld a,b
 or c
 jr nz,__FreqOutLoop2
 ld a,e
 xor %00000011
 scf
__FreqOutDone:
 pop bc
 out ($00),a
 ret nc
 jr __FreqOutLoop1
__FreqOutEnd:

Code: (New code: 22 bytes) [Select]

p_FreqOut:
 .db __FreqOutEnd-1-$
 xor a
__FreqOutLoop1:
 push bc
 ld e,a
__FreqOutLoop2:
 ld a,h
 or l
 jr z,__FreqOutDone
 cpd
 jp pe,__FreqOutLoop2
 ld a,e
 xor %00000011
 scf
__FreqOutDone:
 pop bc
 out ($00),a
 ret nc
 jr __FreqOutLoop1
__FreqOutEnd:
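
The "about 15% faster" figure can be reproduced by adding up the inner loop. The Python sketch below uses the standard Z80 T-state counts and ignores any calculator-specific wait states, which is an assumption; the table and list names are only for illustration.

```python
# Standard Z80 T-state counts (assumed; no hardware wait states included).
T = {
    "ld a,h": 4, "or l": 4, "jr z (not taken)": 7,
    "dec hl": 6, "dec bc": 6, "ld a,b": 4, "or c": 4, "jr nz (taken)": 12,
    "cpd": 16, "jp pe": 10,
}

# One pass of __FreqOutLoop2 while neither counter has run out.
OLD_LOOP = ["ld a,h", "or l", "jr z (not taken)",
            "dec hl", "dec bc", "ld a,b", "or c", "jr nz (taken)"]
NEW_LOOP = ["ld a,h", "or l", "jr z (not taken)", "cpd", "jp pe"]

old_cycles = sum(T[op] for op in OLD_LOOP)   # 47
new_cycles = sum(T[op] for op in NEW_LOOP)   # 41
speedup = old_cycles / new_cycles            # roughly 1.15
```

cpd decrements both hl and bc and leaves P/V set while bc is nonzero, so jp pe stands in for the dec hl / dec bc / ld a,b / or c / jr nz sequence; the compare it performs against (hl) is simply ignored.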

p_IntSetup: 4 bytes smaller. I thought this was some pretty impressive work.

And regarding interrupts, I still think the port 6 saving and restoring shenanigans aren't necessary for programs. The only reason port 6 would need to be restored to the value it held when interrupts were enabled is if the user is using a shell application in conjunction with their Axe program. In that case, either the designer of the shell application interface system could provide modified interrupt routines in an Axiom, or the user is probably intelligent enough to provide their own interrupt routines. (Actually, it wouldn't even need to be their own; they could just copy the one for applications from the Commands.inc file.)

Code: (Old code: 42 bytes, a lot of cycles) [Select]

p_IntSetup:
 .db __IntEnd-p_IntSetup-1
 di
 ld de,$8B01
 ld a,d
 ld i,a
 ld a,l
 ld hl,$8B00
 ld b,e
 ld c,l
 ld (hl),$8A
 ldir

 and %00000110
 out (4),a
 ld a,%00001000
 out (3),a
 ld a,(hl)
 out (3),a

 ld d,a
 ld e,a
 ld c,__IntDataEnd-__IntData
 ld hl,$0000
 ldir

 in a,(6)
 ld ($8A8A+__IntDataSMC-__IntData+1),a
__IntEnd:
 .db rp_Ans,9

Code: (New code: 38 bytes, more cycles but who cares?) [Select]

p_IntSetup:
 .db __IntEnd-p_IntSetup-1
 di
 ld a,l
 ld hl,$8C06
 ld de,$8C05
 ld bc,$8C05-$8A8A

 and l
 out (4),a
 ld a,h
 out (3),a
 dec a
 ld i,a
 dec a
 out (3),a

 ld (hl),a
 lddr

 ld hl,$0000
 ld c,__IntDataEnd-__IntData
 ldir

 in a,(6)
 ld ($8A8A+__IntDataSMC-__IntData+1),a
__IntEnd:
 .db rp_Ans,11
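
The new version fills memory with an overlapping lddr instead of ldir. The Python sketch below models just that step, as one reading of the listing: the source byte at $8C06 is copied downward over itself, so everything from $8C06 down to $8A8B ends up holding $8A, and de is left at $8A8A, ready for the ldir that follows. The lddr helper is an illustrative model, not Axe code.

```python
def lddr(mem: bytearray, hl: int, de: int, bc: int):
    """Model Z80 lddr: copy bc bytes from hl to de, both descending."""
    while True:
        mem[de] = mem[hl]
        hl = (hl - 1) & 0xFFFF
        de = (de - 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc


mem = bytearray(0x10000)
hl, de, bc = 0x8C06, 0x8C05, 0x8C05 - 0x8A8A   # register setup from the listing
mem[hl] = 0x8A                                 # ld (hl),a with a = $8A
hl, de, bc = lddr(mem, hl, de, bc)

# With i = $8B, every vector byte in $8B00-$8C00 now reads $8A.
table_ok = all(byte == 0x8A for byte in mem[0x8B00:0x8C01])
```

After the fill, b is zero, which is why the listing only needs ld c,__IntDataEnd-__IntData before the final ldir.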

p_DtoF: 2 bytes smaller. Takes advantage of a bcall to do the same thing. It appears that B_CALL(_SetXXXXOP2) always returns OP2+1, which could be used to save an additional 2 bytes, but this bcall could theoretically be changed in future OS versions and break this optimization.

Code: (Old code: 13 bytes, a lot of cycles) [Select]

p_DtoF:
 .db 13
 ex (sp),hl
 B_CALL(_SetXXXXOP2)
 ld hl,OP2
 pop de
 ld bc,9
 ldir

Code: (New code: 11 bytes, a lot plus a few cycles) [Select]

p_DtoF:
 .db 11
 ex (sp),hl
 B_CALL(_SetXXXXOP2)
 ld hl,OP2
 pop de
 B_CALL(_Mov9B)

calc84maniac · « **Reply #242 on:** September 20, 2011, 12:26:00 am »

p_Length: 1 byte smaller, 2 cycles faster. Takes advantage of the fact that you will not need to search more than 16384 bytes starting at $4000-$7FFF or 32768 bytes starting at $8000-$FFFF, and also you shouldn't be searching at $0000-$3FFF.

Code: (Old code: 11 bytes) [Select]

p_Length:
 .db __LengthEnd-$-1
 xor a
 ld b,a
 ld c,a
 cpir
 ld hl,-1
 sbc hl,bc
 ret
__LengthEnd:

Code: (New code: 10 bytes) [Select]

p_Length:
 .db __LengthEnd-$-1
 xor a
 ld b,h
 ld d,h
 ld e,l
 cpir
 scf
 sbc hl,de
 ret
__LengthEnd:
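
To see why loading only b from h is enough, here is a Python model of the new version. It is a sketch under the assumption that c holds an arbitrary leftover value; cpir and length are illustrative helper names. Since b = h, the search limit bc is at least h*256, which is at least 16384 for a start address in $4000-$7FFF and at least 32768 for one in $8000-$FFFF.

```python
def cpir(mem: bytearray, a: int, hl: int, bc: int):
    """Model Z80 cpir: scan upward until mem[hl] == a or bc reaches 0."""
    while True:
        found = mem[hl] == a
        hl = (hl + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if found or bc == 0:
            return hl, bc


def length(mem: bytearray, hl: int, c: int = 0) -> int:
    """Length of the zero-terminated data starting at hl."""
    a = 0                           # xor a (also clears carry)
    bc = ((hl >> 8) << 8) | c       # ld b,h ; c is whatever it held before
    de = hl                         # ld d,h / ld e,l
    hl, bc = cpir(mem, a, hl, bc)   # hl ends one past the terminator
    return (hl - de - 1) & 0xFFFF   # scf / sbc hl,de
```

For a buffer holding "Hello" followed by a zero byte at $9000, length(mem, 0x9000) returns 5 regardless of the value passed for c.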

jacobly · « **Reply #243 on:** October 09, 2011, 10:16:40 am »

Speed optimization for p_CheckSum by using an absolute jump.

Code: (Old Code: 19 bytes, 63.5*n+37 cycles) [Select]

p_CheckSum:
 .db __CheckSumEnd-$-1
 ld b,h
 ld c,l
 pop af
 pop hl
 push af
 xor a
 ld d,a
__CheckSumLoop:
 add a,(hl)
 ld e,a
 jr nc,$+3
 inc d
 cpi
 ex de,hl
 ret po
 ex de,hl
 jr __CheckSumLoop
__CheckSumEnd:

Code: (New Code: 19 bytes, 44.5*n+65 cycles) [Select]

p_CheckSum:
 .db __CheckSumEnd-$-1
 ld b,h
 ld c,l
 pop af
 pop hl
 push af
 xor a
 ld d,a
__CheckSumLoop:
 add a,(hl)
 jr nc,$+3
 inc d
 cpi
 jp pe,__CheckSumLoop
 ld h,d
 ld l,a
 ret
__CheckSumEnd:
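
For reference, the loop appears to compute a plain 16-bit sum of the bytes, and the two cycle formulas in the headers cross over almost immediately. The Python sketch below models both; the function names are illustrative, the formulas are taken from the code headers as given, and the model assumes at least one byte (with a count of zero the real loop would wrap bc around).

```python
def checksum(data: bytes) -> int:
    """16-bit sum of the bytes, built in d (high byte) and a (low byte)."""
    a = d = 0                       # xor a / ld d,a
    for byte in data:               # cpi / jp pe: one pass per byte
        total = a + byte            # add a,(hl)
        a = total & 0xFF
        if total > 0xFF:            # jr nc,$+3 skips the inc d
            d = (d + 1) & 0xFF
    return (d << 8) | a             # ld h,d / ld l,a


def old_cycles(n: int) -> float:
    return 63.5 * n + 37            # formula from the old code header


def new_cycles(n: int) -> float:
    return 44.5 * n + 65            # formula from the new code header
```

By these formulas the old code wins only for a single byte (100.5 against 109.5 cycles); from two bytes onward the new loop is faster, and the gap grows by 19 cycles per byte.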

Xeda112358 · « **Reply #244 on:** October 09, 2011, 05:38:20 pm »

Hmm, would this optimisation work to save one more byte? (sorry, I could be wrong):

Code: [Select]

p_CheckSum:
 .db __CheckSumEnd-$-1
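 ld b,h                  ; bc = byte count (passed in hl)
 ld c,l
 pop hl                  ; hl = return address
 ex (sp),hl              ; hl = data pointer, return address back on the stack
 xor a                   ; a = low byte of the sum
 ld d,a                  ; d = high byte of the sum
__CheckSumLoop:
 add a,(hl)              ; add the next byte
 jr nc,$+3               ; no carry: skip the inc d
 inc d
 cpi                     ; inc hl, dec bc; P/V set while bc is still nonzero
 jp pe,__CheckSumLoop
 ld h,d                  ; return the 16-bit sum in hl
 ld l,a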
 ret
__CheckSumEnd:

calc84maniac · « **Reply #245 on:** October 09, 2011, 07:21:47 pm »

Ah, nice use of ex (sp),hl

Xeda112358 · « **Reply #246 on:** October 09, 2011, 07:26:47 pm »

Thanks

I think I learned it from you folks

EDIT: It does use 2 more cycles though, right?

calc84maniac · « **Reply #247 on:** October 09, 2011, 07:30:34 pm »

Quote from: Xeda112358 on October 09, 2011, 07:26:47 pm

Thanks I think I learned it from you folks
EDIT: It does use 2 more cycles though, right?

Actually, ex (sp),hl takes 2 fewer cycles than pop af and push af combined, so it's faster too
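
The two prologues can be compared with a tiny stack model. This is a Python sketch with illustrative function names; the cycle counts are the standard Z80 figures (pop 10, push 11, ex (sp),hl 19), and each function returns the resulting hl together with its cycle total.

```python
def prologue_pop_push(stack: list):
    """pop af / pop hl / push af: 3 bytes."""
    af = stack.pop()                # pop af: return address
    hl = stack.pop()                # pop hl: argument
    stack.append(af)                # push af: return address goes back
    return hl, 10 + 10 + 11


def prologue_ex_sp(stack: list):
    """pop hl / ex (sp),hl: 2 bytes."""
    hl = stack.pop()                # pop hl: return address
    hl, stack[-1] = stack[-1], hl   # ex (sp),hl: swap hl with the stack top
    return hl, 10 + 19
```

Both leave the argument in hl and the return address on top of the stack; the totals are 31 and 29 cycles, so the shorter form is also 2 cycles faster.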

Happybobjr · « **Reply #248 on:** October 09, 2011, 07:37:42 pm »

What does checksum do?

calc84maniac · « **Reply #249 on:** October 13, 2011, 11:32:57 am »

Here's a slightly optimized Bitmap().
Old code, 7 bytes and lots of cycles:

Code: [Select]

p_EzSprite:
 .db 7
 pop de
 ld a,e
 pop de
 ld d,a
 B_CALL(_DisplayImage)

New code, 6 bytes and lots of cycles minus 4

Code: [Select]

p_EzSprite:
 .db 6
 pop bc
 pop de
 ld d,c
 B_CALL(_DisplayImage)

Xeda112358 · « **Reply #250 on:** October 14, 2011, 02:54:36 pm »

Is this an optimisation? I get the feeling that there is a reason it doesn't end in a ret and that it uses a jr...

Code: (Old Code: 7 bytes, 30 or 38 cycles) [Select]

p_DecWord:
 .db 7
 ld a,(hl)
 dec (hl)
 or a
 jr nz,$+4
 inc hl
 dec (hl)

Code: (New Code: 6 bytes, 29 or 36) [Select]

p_DecWord:
 .db 6
 ld a,(hl)
 dec (hl)
 or a
 ret nz
 inc hl
 dec (hl)

EDIT: Yep, suspicion confirmed.

Quigibo · « **Reply #251 on:** November 04, 2011, 01:58:14 am »

Not an optimization, but I'm posting this here since more assembly people will read it. Since the Bitmap() command is being replaced with something actually useful, "Fix 8" and "Fix 9" will also need to be replaced. Are there any flags (particularly for text) that would be useful to Axe programmers and that I haven't already covered with the other Fix commands? A couple I can think of are an APD toggle or a lowercase toggle.

LincolnB · « **Reply #252 on:** November 04, 2011, 10:24:39 am »

Hm...I say this as an Axe programmer, not knowing ASM...how about UPSIDE DOWN TEXT! om nom nom nom

jacobly · « **Reply #253 on:** November 15, 2011, 12:01:37 am »

p_Input: saves three bytes and lots of cycles

Code: (Old code) [Select]

p_Input:
 .db __InputEnd-$-1
 res 6,(iy+$1C)
 set 7,(iy+$09)
 xor a
 ld (ioPrompt),a
 B_CALL(_GetStringInput)
 B_CALL(_ZeroOP1)
 ld hl,$2D04
 ld (OP1),hl
 B_CALL(_ChkFindSym)
 inc de
 inc de
 ex de,hl
 ret
__InputEnd:

Code: (New code) [Select]

p_Input:
 .db __InputEnd-$-1
 res 6,(iy+$1C)
 set 7,(iy+$09)
 xor a
 ld (ioPrompt),a
 B_CALL(_GetStringInput)
 B_CALL(_ZeroOP1)
 ld a,$2D
 ld (OP1+1),a
 rst rFindSym
 inc de
 inc de
 ex de,hl
 ret
__InputEnd:

Quigibo · « **Reply #254 on:** November 16, 2011, 05:52:32 pm »

Thanks!
