Author Topic: Assembly Programmers - Help Axe Optimize!  (Read 142083 times)

0 Members and 1 Guest are viewing this topic.

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #75 on: September 22, 2010, 06:42:19 pm »
And how would the parser decide to do that ???

Offline Quigibo

  • The Executioner
  • CoT Emeritus
  • LV11 Super Veteran (Next: 3000)
  • *
  • Posts: 2031
  • Rating: +1075/-24
  • I wish real life had a "Save" and "Load" button...
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #76 on: September 22, 2010, 06:48:29 pm »
The problem with conditional "short-circuit evaluation" is that it has to do a lot of non-linear "look-ahead" parsing to determine if it's okay to get out of the statement early or not.  You might for example have If A≠5 and sub(EQL,B,C) which might need to evaluate the second expression even if the first one is false.  The idea definitely sounds good though, but it seems like it would be really complicated for the compiler to tell whether or not it can actually use that optimization and be completely compatible with previous versions.  And even when it can, I would have to write completely new block code and assembly templates for those conditionals.
___Axe_Parser___
Today the calculator, tomorrow the world!

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #77 on: October 20, 2010, 08:56:30 am »
Oops, necropost, oh well :P

I don't know if this approach was purposely left out, as it's 15 bytes larger than the current routine and sometimes slower. I'm referring to the square root routine. Whereas the current routine (14 bytes) takes 37n+38 T-states (linear time), where n is the result+1 (1-256), the following routine (29 bytes) takes 5n+800 T-states (near constant time), where n is the number of set bits in the result (0-8). The existing routine is faster for values that would yield results of 0-19, but this routine would be faster for values that would yield results of 20-255, which is a much broader range of the 8-bit spectrum. Also, it would be much more reliable to run at a near constant speed in programs which rely on that to run smoothly themselves. The existing routine would take only a few hundred T-states for low inputs, but would take up to OVER NINE THOUSAND T-states to calculate the square roots for the highest inputs. So it's up to you if this is something you want to use.

Code: [Select]
p_Sqrt:
.db __SqrtEnd-1-$
ld a,l
ld l,h
ld de,$0040
ld h,d
ld b,8
or a
__SqrtLoop:
sbc hl,de
jr nc,__SqrtSkip
add hl,de
__SqrtSkip:
ccf
rl d
rla
adc hl,hl
rla
adc hl,hl
djnz __SqrtLoop
ld h,0
ld l,d
ret
__SqrtEnd:
« Last Edit: October 20, 2010, 09:10:44 am by Runer112 »

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #78 on: October 30, 2010, 08:11:33 pm »
I think it's been long enough that I can safely double post :P


Bit routine optimizations! Please tell me if any of these wouldn't work correctly, as I wrote them myself and I'm not a terribly experienced assembly programmer so that's a definite possibility.


Code: (Current code) [Select]
p_GetBit0:
.db 5 ;5 bytes, 36 T-states
add hl,hl
ccf
sbc hl,hl
inc hl


p_GetBit1:
.db 6 ;6 bytes, 47 T-states
add hl,hl
add hl,hl
ccf
sbc hl,hl
inc hl


p_GetBit2:
.db 7 ;7 bytes, 58 T-states
add hl,hl
add hl,hl
add hl,hl
ccf
sbc hl,hl
inc hl


p_GetBit6:
.db 7 ;7 bytes, 37 T-states
ld a,h
rra
rra
ccf
sbc hl,hl
inc hl

p_GetBit7:
.db 6 ;6 bytes, 33 T-states
rr h
ccf
sbc hl,hl
inc hl

p_GetBit8:
.db 6 ;6 bytes, 33 T-states
rl l
ccf
sbc hl,hl
inc hl


p_GetBit9:
.db 7 ;7 bytes, 37 T-states
ld a,l
rla
rla
ccf
sbc hl,hl
inc hl

p_GetBit10:
.db 8 ;8 bytes, 30/29 T-states
bit 5,l
ld hl,0
jr z,$+3
inc l




p_GetBit14:
.db 7 ;7 bytes, 37 T-states
ld a,l
rra
rra
ccf
sbc hl,hl
inc hl

p_GetBit15:
.db 6 ;6 bytes, 33 T-states
rr l
ccf
sbc hl,hl
inc hl

 
       
Code: (Optimized code) [Select]
p_GetBit0:
.db 5 ;5 bytes, 27 T-states
xor a
add hl,hl
ld h,a
rla
ld l,a

p_GetBit1:
.db 6 ;6 bytes, 38 T-states
xor a
add hl,hl
add hl,hl
ld h,a
rla
ld l,a

p_GetBit2:
.db 7 ;7 bytes, 49 T-states
xor a
add hl,hl
add hl,hl
add hl,hl
ld h,a
rla
ld l,a

p_GetBit6:
.db 7 ;7 bytes, 26 T-states
ld a,%00000010
and h
rrca
ld h,0
ld l,a


p_GetBit7:
.db 6 ;6 bytes, 22 T-states
ld a,%00000001
and h
ld h,0
ld l,a

p_GetBit8:
.db 5 ;5 bytes, 27 T-states
xor a
ld h,a
add hl,hl
ld l,h
ld h,a

p_GetBit9:
.db 6 ;6 bytes, 38 T-states
xor a
add hl,hl
ld h,a
add hl,hl
ld l,h
ld h,a

p_GetBit10:
.db 7 ;7 bytes, 49 T-states
xor a
add hl,hl
add hl,hl
ld h,a
add hl,hl
ld l,h
ld h,a

p_GetBit14:
.db 7 ;7 bytes, 26 T-states
ld a,%00000010
and l
rrca
ld h,0
ld l,a


p_GetBit15:
.db 5 ;5 bytes, 20 T-states
xor a
ld h,a
inc a
and l
ld l,a
 


Other optimizations:
  • The signed less than zero comparison (p_SLT0) can be optimized to the optimized p_GetBit0 above.
« Last Edit: October 30, 2010, 09:12:42 pm by Runer112 »

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #79 on: November 10, 2010, 02:38:37 am »
Signed greater than comparison:

Code: (Current code) [Select]
p_SIntGt:
.db 13 ;13 bytes, 48 T-states
ex de,hl
xor a
ld b,h
sbc hl,de
ld h,a
rra
xor b
xor d
rlca
and 1
ld l,a
       
Code: (Optimized code) [Select]
p_SIntGt:
.db 12 ;12 bytes, 67 T-states
ld bc,$8000
add hl,bc
ex de,hl
add hl,bc
xor a
sbc hl,de
ld h,a
rla
ld l,a


You getting all this Quigibo? :P
« Last Edit: November 10, 2010, 02:38:59 am by Runer112 »

Offline DJ Omnimaga

  • Clacualters are teh gr33t
  • CoT Emeritus
  • LV15 Omnimagician (Next: --)
  • *
  • Posts: 55942
  • Rating: +3154/-232
  • CodeWalrus founder & retired Omnimaga founder
    • View Profile
    • Dream of Omnimaga Music
Re: Assembly Programmers - Help Axe Optimize!
« Reply #80 on: November 10, 2010, 02:40:58 am »
I think he is too busy, which might explain why he doesn't respond.
* DJ Omnimaga hopes his school schedule doesn't get so drastic that he gets forced to quit the community for good... I am not too worried about the future of Axe programming, though. I was worried that if her became less active, there would be less activity in his sub-forum since he replied to a lot of help topics, but then activity still continued. I guess a huge thank to you and a bunch of other people is in order. Sadly, having quit programming a while ago I did not really participate much, though X.x
« Last Edit: November 10, 2010, 02:42:45 am by DJ Omnimaga »
Now active at https://discord.gg/cuZcfcF (CodeWalrus server)

Offline Quigibo

  • The Executioner
  • CoT Emeritus
  • LV11 Super Veteran (Next: 3000)
  • *
  • Posts: 2031
  • Rating: +1075/-24
  • I wish real life had a "Save" and "Load" button...
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #81 on: November 10, 2010, 07:55:16 pm »
Yeah, I'm still reading all of this, even though I'm less active, I still visit just about every day :)  I've even been able to do a little more progress with Axe even with my busy schedule.

Runer112, are you sure that comparison is correct?  It seems like all it does is just change the high order bit before doing the subtraction.  It needs to check if the parity changed in that bit before and after the subtraction.  I actually already have plans to optimize this since I will be able to use the parity/overflow flag once I get relative jump replacement working with the axioms (so I can carry that feature over to the built-in commands).
___Axe_Parser___
Today the calculator, tomorrow the world!

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2912
  • Rating: +471/-17
    • View Profile
    • TI-Boy CE
Re: Assembly Programmers - Help Axe Optimize!
« Reply #82 on: November 10, 2010, 09:20:06 pm »
Changing the high order bit does work, actually. It changes a comparison in the -32768 to 32767 range to a comparison in the 0 to 65535 range (effectively changing from a signed comparison to an unsigned comparison).
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #83 on: November 10, 2010, 11:55:08 pm »
Changing the high order bit does work, actually. It changes a comparison in the -32768 to 32767 range to a comparison in the 0 to 65535 range (effectively changing from a signed comparison to an unsigned comparison).

Yup :) This is the only signed comparison for which this method is better though.

Do all the bit optimizations look correct by the way?



EDIT: If you plan on optimizing the signed comparisons to use the parity/overflow flag, you might want to check into that a bit. I was playing around with signed comparisons and wabbitemu was telling me very strange things. It seemed to tell me that signed comparisons relied on an xor of the p/v and s flags. Which makes no sense, but that's what wabbitemu was telling me. See table below.

hldesbc hl,de    cp/vshl>>de
2000    6000    C0001    0    1        0    
2000A00080001111
2000E00040001001
6000200040000001
6000A000C0001111
6000E00080001111
A000200080000010
A000600040000100
A000E000C0001010
E0002000C0000010
E000600080000010
E000A00040000001
« Last Edit: November 11, 2010, 12:20:11 am by Runer112 »

Offline DJ Omnimaga

  • Clacualters are teh gr33t
  • CoT Emeritus
  • LV15 Omnimagician (Next: --)
  • *
  • Posts: 55942
  • Rating: +3154/-232
  • CodeWalrus founder & retired Omnimaga founder
    • View Profile
    • Dream of Omnimaga Music
Re: Assembly Programmers - Help Axe Optimize!
« Reply #84 on: November 11, 2010, 12:32:17 am »
Yeah, I'm still reading all of this, even though I'm less active, I still visit just about every day :)  I've even been able to do a little more progress with Axe even with my busy schedule.
Ah phew, good to hear x.x. Still, I hope the schedule won't get even more hectic with the time. X.x
Now active at https://discord.gg/cuZcfcF (CodeWalrus server)

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2912
  • Rating: +471/-17
    • View Profile
    • TI-Boy CE
Re: Assembly Programmers - Help Axe Optimize!
« Reply #85 on: November 11, 2010, 01:04:30 am »
It seemed to tell me that signed comparisons relied on an xor of the p/v and s flags. Which makes no sense, but that's what wabbitemu was telling me.
It actually does make a bit of sense. Whether the mathematical (non-overflowed) result of the subtraction is positive or negative should give you the result of the comparison.  However, if there was a signed overflow, it will give the wrong result. So the sign flag needs to be inverted if there was an overflow, and XOR achieves this perfectly.
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #86 on: November 11, 2010, 01:43:50 am »
Yeah, my point is that Quigibo is probably just better off using the signed comparisons he already uses instead of bothering with the p/v flag, because it gets messy.
« Last Edit: November 11, 2010, 01:44:07 am by Runer112 »

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2912
  • Rating: +471/-17
    • View Profile
    • TI-Boy CE
Re: Assembly Programmers - Help Axe Optimize!
« Reply #87 on: November 15, 2010, 08:27:44 am »
Actually, the main reason he didn't use the p/v flag is because his routines didn't support absolute jumps. They apparently do now, so some speed-up using these flags might be possible.
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #88 on: November 28, 2010, 06:04:40 pm »
Cool, Quigibo added all my optimized auto-optimizations :) But I think you missed p_GetBit15, which can be optimized to be the same as p_Mod2.

Offline Munchor

  • LV13 Extreme Addict (Next: 9001)
  • *************
  • Posts: 6199
  • Rating: +295/-121
  • Code Recycler
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #89 on: November 28, 2010, 06:08:28 pm »
Cool, Quigibo added all my optimized auto-optimizations :) But I think you missed p_GetBit15, which can be optimized to be the same as p_Mod2.

Great! So, it optimizes the Axe script or the Asm conversion?

Like, the following program:

Code: [Select]
Output(0,0,"Hello World")
Is optimized to:

Code: [Select]
Output(0,0,"Hello World
Or is it Assembly that is optimized?