### Topic: Assembly Programmers - Help Axe Optimize!

#### Quigibo

• The Executioner
• CoT Emeritus
• LV11 Super Veteran (Next: 3000)
• Posts: 2031
• Rating: +1075/-24
##### Re: Assembly Programmers - Help Axe Optimize!
« Reply #195 on: May 26, 2011, 02:22:21 am »
Anyone up for some math?

I want to implement the reciprocal function for fixed point math.  For 8.8 numbers, A⁻¹ is essentially just E10000//A; however, that division requires a dividend larger than can fit in a register pair.  Ideally, the routine could hijack a jump point into the current division routine instead of rewriting another one.  But it's possible, due to the symmetry involved, that there is a significantly more optimized method using a slightly different approach; I just can't think of how that would work.  Has anyone seen or written a routine like this before?
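For reference, the arithmetic behind that request can be modelled in a few lines of Python. This is only a sketch of the math, not Axe's routine: `recip_88` is a hypothetical name, and the sign handling (negate, divide the magnitudes, negate back) is an assumption about how a signed version would behave.

```python
def recip_88(a):
    """Reciprocal of a signed 8.8 fixed-point value, given as a raw 16-bit word.

    1.0 is 0x0100 in 8.8, so 1/A in 8.8 is (0x0100 << 8) // A = 0x10000 // A.
    The dividend 0x10000 needs 17 bits, which is why it does not fit in a
    16-bit register pair.
    """
    a &= 0xFFFF
    negative = bool(a & 0x8000)
    mag = (0x10000 - a) & 0xFFFF if negative else a   # magnitude of the input
    if mag == 0:
        raise ZeroDivisionError("reciprocal of 0.0")
    q = (0x10000 // mag) & 0xFFFF                     # wraps on overflow (mag == 1)
    return (0x10000 - q) & 0xFFFF if negative else q


print(hex(recip_88(0x0200)))  # 2.0 as raw 8.8
```

With these assumptions, 2.0 (0x0200) maps to 0.5 (0x0080), and the smallest positive input (0x0001) overflows and wraps to 0.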
___Axe_Parser___
Today the calculator, tomorrow the world!

#### Runer112

• Project Author
• LV11 Super Veteran (Next: 3000)
• Posts: 2289
• Rating: +639/-31
« Reply #196 on: May 26, 2011, 04:35:58 pm »
I don't know of any speed-optimized function specific to taking the inverse. But that definitely doesn't mean one doesn't exist. However, you could easily implement it if you added 8.8 fixed point division:

p_Inverse:
.db 7
ex   de,hl
ld   hl,$100
call   $0000      ;sub_88Div
.db rp_Ans,2

p_88Div:
.db __88DivEnd-1-$
ld   a,h
xor   d
push   af
bit   7,h
jr   z,$+8
xor   a
sub   l
ld   l,a
sbc   a,a
sub   h
ld   h,a
bit   7,d
jr   z,$+8
xor   a
sub   e
ld   e,a
sbc   a,a
sub   d
ld   d,a
ld   b,24
call   $0000      ;sub_Div+2
pop   af
ret   nc
xor   a
sub   l
ld   l,a
sbc   a,a
sub   h
ld   h,a
ret
__88DivEnd:
.db   rp_Ans,12

EDIT: Just kidding, that hijacking of the 16/16 division routine to make an 8.8 division routine doesn't work. But it's definitely possible to hijack the 16/16 division routine at least for an 8.8 inverse.
« Last Edit: May 26, 2011, 06:57:40 pm by Runer112 »

#### Quigibo

« Reply #197 on: May 26, 2011, 08:27:08 pm »
I'm not planning to add 8.8 division.  I think just multiplying by the inverse should work with enough accuracy.

#### Runer112

« Reply #198 on: May 26, 2011, 08:28:41 pm »
But the logical way to get the inverse is to divide, is it not?

#### Quigibo

« Reply #199 on: May 26, 2011, 08:31:35 pm »
Right, but an inverse can use a standard 16/16 division instead of a 24/16.

#### Runer112

« Reply #200 on: May 26, 2011, 08:37:17 pm »
Yeah, I actually had a routine written for that which hijacked the 16/16 division routine, but deleted it in favor of the 8.8 division routine. However I realized that the 8.8 division routine doesn't work, so I'll try to recreate what I had before:

Code: [Select]
p_Inverse:
	.db __InverseEnd-1-$
	xor a
	bit 7,h
	push af
	jr z,$+8
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	xor a
	ex de,hl
	ld bc,16<<8
	ld hl,1
	call $0000 ;sub_Div+10
	pop af
	ret z
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	ret
__InverseEnd:
	.db rp_Ans,12
« Last Edit: May 26, 2011, 08:39:05 pm by Runer112 »

#### Quigibo

• I wish real life had a "Save" and "Load" button...
« Reply #201 on: May 26, 2011, 08:40:56 pm »
I actually have a copy of the routine you posted earlier and it was a bit more optimized, so no worries.

#### Runer112

« Reply #202 on: May 26, 2011, 08:42:33 pm »
Yeah it was more optimized, but I don't think it worked. It would've screwed up normal 16/16 division because of how I reordered the initialization in p_Div to destroy hl before loading hl into ac.
« Last Edit: May 26, 2011, 08:43:33 pm by Runer112 »

#### thepenguin77

• z80 Assembly Master
• LV10 31337 u53r (Next: 2000)
• Posts: 1594
• Rating: +823/-5
• The game in my avatar is bit.ly/p0zPWu
« Reply #203 on: June 10, 2011, 03:14:11 pm »
This is a really simple one. When an interrupt is called, interrupts are automatically disabled. So you don't need to start the interrupt routine with DI.

(Thinking that interrupts were enabled by default caused runer quite a headache over IRC)
« Last Edit: June 10, 2011, 03:14:54 pm by thepenguin77 »
zStart v1.3.013 9-20-2013
All of my utilities
TI-Connect Help
You can build a statue out of either 1'x1' blocks or 12'x12' blocks. The 1'x1' blocks will take a lot longer, but the final product is worth it.
-Runer112

#### Runer112

« Reply #204 on: June 10, 2011, 11:20:26 pm »
More stuff regarding interrupts. SMC'ing the active port 6 page into the interrupt handler is, as far as I know, only necessary for applications. You could get rid of this if the code is being compiled to a program to save 9 bytes.

And on the topic of stuff that involves port 6, I think it would be nice if the archive byte reading routine avoided using a B_CALL for a massive speed boost, especially for code compiled as programs:

p_ReadArc: 18 bytes (2x) larger, but ~1400 cycles (!!!10x!!!) faster

Code: (36 bytes, ~142 cycles) [Select]
p_ReadArc:
	.db __ReadArcEnd-1-$
	ld c,a
	in a,(6)
	ld b,a
	ld a,h
	set 6,h
	res 7,h
	rlca
	rlca
	dec a
	and %00000011
	add a,c
	out (6),a
	ld c,(hl)
	inc hl
	bit 7,h
	jr z,__ReadArcNoBoundary
	set 6,h
	res 7,h
	inc a
	out (6),a
__ReadArcNoBoundary:
	ld l,(hl)
	ld h,c
	ld a,b
	out (6),a
	ret
__ReadArcEnd:

p_ReadArcApp: 36 bytes (3x) larger, but ~1050 cycles (4x) faster

Code: (54 bytes, ~396 cycles) [Select]
p_ReadArcApp:
	.db __ReadArcAppEnd-1-$
	push hl
	ld hl,$0000
	ld de,ramCode
	ld bc,__ReadArcAppRamCodeEnd-__ReadArcAppRamCode
	ldir
	pop hl
	ld e,a
	ld c,6
	in b,(c)
	ld a,h
	set 6,h
	res 7,h
	rlca
	rlca
	dec a
	and %00000011
	add a,e
	call ramCode
	ld e,d
	inc hl
	bit 7,h
	jr z,__ReadArcAppNoBoundary
	set 6,h
	res 7,h
	inc a
__ReadArcAppNoBoundary:
	call ramCode
	ex de,hl
	ret
__ReadArcAppEnd:
	.db rp_Ans,__ReadArcAppEnd-p_ReadArcApp-3
__ReadArcAppRamCode:
	out (6),a
	ld d,(hl)
	out (c),b
	ret
__ReadArcAppRamCodeEnd:
« Last Edit: June 11, 2011, 12:51:24 am by Runer112 »
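The pointer-to-page arithmetic at the top of p_ReadArc can be checked with a small Python model. This is a sketch of how I read the posted instructions, not part of Axe: `arc_window` and `next_byte` are hypothetical names, and the model covers only the address math, not the port 6 writes themselves.

```python
def arc_window(base_page, ptr):
    """Model of the page/offset math in p_ReadArc.

    base_page: the value passed in A; ptr: the 16-bit value passed in HL.
    Returns (page that would be written to port 6, address inside $4000-$7FFF).
    """
    h = (ptr >> 8) & 0xFF
    a = ((h << 2) | (h >> 6)) & 0xFF      # rlca / rlca: top two bits of H land in bits 1-0
    a = (a - 1) & 0x03                    # dec a / and %00000011
    page = (base_page + a) & 0xFF         # add a,c
    h = (h | 0x40) & 0x7F                 # set 6,h / res 7,h
    return page, (h << 8) | (ptr & 0xFF)


def next_byte(page, addr):
    """Model of the boundary check that follows `inc hl`."""
    addr = (addr + 1) & 0xFFFF
    if addr & 0x8000:                     # bit 7,h is set: ran past $7FFF
        addr = (addr | 0x4000) & 0x7FFF   # set 6,h / res 7,h
        page = (page + 1) & 0xFF          # inc a
    return page, addr


print(arc_window(0x10, 0x8123))
print(next_byte(0x10, 0x7FFF))
```

In this model a pointer in $8000-$BFFF selects the page after the base page, and stepping past $7FFF wraps to $4000 on the next page, which is the page-boundary case the routine handles.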

#### Quigibo

« Reply #205 on: June 11, 2011, 01:42:59 am »
Quote
This is a really simple one. When an interrupt is called, interrupts are automatically disabled. So you don't need to start the interrupt routine with DI.

They are disabled automatically already... there is a di at the start of the interrupt routine.  Is there some bug with that?

Also, about those archive reading commands... archive reading isn't as useful as it should be due to those sector boundary issues.  For instance, you can't reliably iterate over a tilemap in archive because there is a small chance it could straddle a sector boundary, and iterating over it would add a "glitch byte" to the map since each sector adds an extra byte in front.  Although I guess you could modify those routines to take that into account; that might work since you can't read more than 64 consecutive kilobytes anyway.
« Last Edit: June 11, 2011, 01:43:38 am by Quigibo »

#### calc84maniac

• eZ80 Guru
• Coder Of Tomorrow
• LV11 Super Veteran (Next: 3000)
• Posts: 2912
• Rating: +471/-17
« Reply #206 on: June 11, 2011, 01:51:59 am »
Quote
This is a really simple one. When an interrupt is called, interrupts are automatically disabled. So you don't need to start the interrupt routine with DI.

They are disabled automatically already... there is a di at the start of the interrupt routine.  Is there some bug with that?

Also, about those archive reading commands... archive reading isn't as useful as it should be due to those sector boundary issues.  For instance, you can't reliably iterate over a tilemap in archive because there is a small chance it could straddle a sector boundary, and iterating over it would add a "glitch byte" to the map since each sector adds an extra byte in front.  Although I guess you could modify those routines to take that into account; that might work since you can't read more than 64 consecutive kilobytes anyway.
There's no chance of overlapping a sector boundary, but yeah you can overlap a page boundary. TI-OS doesn't allow variables to cross sector boundaries.

Edit: About the DI thing, he means that it's a waste of a byte and 4 cycles to DI when it has already been done by the hardware.
« Last Edit: June 11, 2011, 01:52:50 am by calc84maniac »
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

#### calc84maniac

« Reply #207 on: June 17, 2011, 03:49:26 am »
I made a one-byte optimization to p_SDiv:
Old:
Code: [Select]
p_SDiv:
	.db __SDivEnd-1-$
	ld a,h
	xor d
	push af
	bit 7,h
	jr z,$+8
	xor a
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	bit 7,d
	jr z,$+8
	xor a
	sub e
	ld e,a
	sbc a,a
	sub d
	ld d,a
	call $3F00+sub_Div
	pop af
	add a,a
	ret nc
	xor a
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	ret
__SDivEnd:

New:
Code: [Select]
p_SDiv:
	.db __SDivEnd-1-$
	ld a,h
	xor d
	push af
	bit 7,h
	jr z,$+8
	xor a
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	bit 7,d
	jr z,$+8
	xor a
	sub e
	ld e,a
	sbc a,a
	sub d
	ld d,a
	call $3F00+sub_Div
	pop af
	ret p
	xor a
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	ret
__SDivEnd:
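To see why the one-byte saving is safe, the two endings can be compared exhaustively in Python. This is a sketch under the assumption that `pop af` restores exactly the A and flags left by `xor d` (the `bit 7,h` comes after the `push af`, so it does not affect them); the function names are hypothetical.

```python
def old_returns_early(h, d):
    """Old ending: pop af / add a,a / ret nc."""
    a = (h ^ d) & 0xFF           # A as pushed right after `xor d`
    carry = (a + a) > 0xFF       # add a,a moves bit 7 of A into carry
    return not carry             # ret nc: return without negating


def new_returns_early(h, d):
    """New ending: pop af / ret p."""
    sign = bool((h ^ d) & 0x80)  # S flag as left by `xor d`
    return not sign              # ret p: return without negating


# Compare both endings for every pair of high bytes.
all_equal = all(old_returns_early(h, d) == new_returns_early(h, d)
                for h in range(256) for d in range(256))
print(all_equal)
```

Both versions skip the final negation exactly when the operands have the same sign, so dropping `add a,a` changes nothing except size and speed.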

I'm also working on a fixed-point division routine (that hijacks the normal division routine), but I think I need to make sure it works before I post it.

Edit:
Well, I've convinced myself now that it works. You'll need to add in the stuff to correctly format the routine since I don't fully understand how that works (especially calling in the middle of other routines)
Code: [Select]
p_88Div:
	ld a,h
	xor d
	push af
	bit 7,h
	jr z,$+8
	xor a
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	bit 7,d
	jr z,$+8
	xor a
	sub e
	ld e,a
	sbc a,a
	sub d
	ld d,a
	ld bc,$1000
	ld a,l
	ld l,h
	ld h,c
	call __DivLoop
	pop af
	ret p
	xor a
	sub l
	ld l,a
	sbc a,a
	sub h
	ld h,a
	ret
Overflow checking isn't handled, but I suppose that's normal. It might be nice to saturate the result, though.
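As a reference for testing, the intended result of an 8.8 division can be modelled in Python. This is a sketch of the arithmetic only and does not model `__DivLoop`; `div_88`, `to_signed16`, and the `saturate` flag are hypothetical, and the saturating behaviour is just one possible reading of the suggestion above.

```python
def to_signed16(x):
    """Interpret a raw 16-bit word as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def div_88(x, y, saturate=False):
    """Signed 8.8 division on raw 16-bit words: (|x| << 8) // |y|, sign fixed up after."""
    sx, sy = to_signed16(x), to_signed16(y)
    negative = (sx < 0) != (sy < 0)    # same test as the sign of h xor d
    q = (abs(sx) << 8) // abs(sy)      # 24-bit dividend, 16-bit divisor
    if saturate:
        q = min(q, 0x7FFF)             # clamp instead of wrapping
    else:
        q &= 0xFFFF                    # overflow silently wraps
    return (-q) & 0xFFFF if negative else q


print(hex(div_88(0x0300, 0x0200)))  # 3.0 / 2.0
```

Division by zero raises `ZeroDivisionError` here; what the assembly routine returns in that case is not modelled.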
« Last Edit: June 17, 2011, 04:18:06 am by calc84maniac »

#### Quigibo

« Reply #208 on: June 17, 2011, 04:55:30 am »
Cool, thanks!  I was also able to do that same sign flag optimization to the 8.8 multiplication routine.  Any idea what might be a good token for fixed point division?  That's the main thing holding me back from adding it.  /* is the first thing that comes to mind but I think it's confusing.  /// could also work but that's a lot to type...

#### Deep Toaster

• So much to do, so much time, so little motivation