### Author Topic: The Optimization Compilation  (Read 72718 times)

0 Members and 1 Guest are viewing this topic.

#### Happybobjr

• James Oldiges
• LV11 Super Veteran (Next: 3000)
• Posts: 2325
• Rating: +128/-20
• Howdy :)
##### Re: The Optimization Compilation
« Reply #30 on: January 03, 2011, 07:13:12 pm »
is this an optimization?

if x=5
0->A
end

to

If x=5
-1->A  (minus)
end
School: East Central High School

Axe: １.０.0
TI-84 +SE  ||| OS: 2.53 MP (patched) ||| Version: "M"
TI-Nspire    |||  Lent out, and never returned
____________________________________________________________

#### Deep Toaster

• So much to do, so much time, so little motivation
• LV13 Extreme Addict (Next: 9001)
• Posts: 8217
• Rating: +758/-15
##### Re: The Optimization Compilation
« Reply #31 on: January 03, 2011, 07:20:06 pm »
is this an optimization?

if x=5
0->A
end

to

If x=5
-1->A  (minus)
end

Yep, it is, and really useful, too.

It's even better to do

Code: (Axe) [Select]
:!If A-5:→A:End

#### Happybobjr

• James Oldiges
• LV11 Super Veteran (Next: 3000)
• Posts: 2325
• Rating: +128/-20
• Howdy :)
##### Re: The Optimization Compilation
« Reply #32 on: January 03, 2011, 07:21:29 pm »

i'm an idiot sometimes.
thank you very much

also,
would this be better?
?*256

to

?*2*2*2*2*2*2*2*2

and.
?/8
to
?/2/2/2

[/edit]
« Last Edit: January 03, 2011, 07:22:58 pm by happybobjr »
School: East Central High School

Axe: １.０.0
TI-84 +SE  ||| OS: 2.53 MP (patched) ||| Version: "M"
TI-Nspire    |||  Lent out, and never returned
____________________________________________________________

#### ztrumpet

• The Rarely Active One
• CoT Emeritus
• LV13 Extreme Addict (Next: 9001)
• Posts: 5712
• Rating: +364/-4
• If you see this, send me a PM. Just for fun.
##### Re: The Optimization Compilation
« Reply #33 on: January 03, 2011, 07:29:07 pm »
That would make it faster, but the code would be bigger.
If I'm wrong, please correct me!
Unfinished Projects:
 Elmgon 14% Basic Movement Demo Homescreen Game Pack 80% Basic Latest Release Cube Droid Saves the Galaxy 65% Axe Demo Detonate 70% Axe
Completed Projects:
Exodus | Midnight |Drifter | Axe Snake | Jump! | Factory Theta | Spider | Plot Drop | Papi Jump | Numb3rs | Nibbler | Boost | Duel Tile Map Editor | Homescreen Map Editor | Key Group Check | Oasis

#### Deep Toaster

• So much to do, so much time, so little motivation
• LV13 Extreme Addict (Next: 9001)
• Posts: 8217
• Rating: +758/-15
##### Re: The Optimization Compilation
« Reply #34 on: January 03, 2011, 07:31:29 pm »

i'm an idiot sometimes.
thank you very much

also,
would this be better?
?*256

to

?*2*2*2*2*2*2*2*2

and.
?/8
to
?/2/2/2

[/edit]

#### squidgetx

• Food.
• CoT Emeritus
• LV10 31337 u53r (Next: 2000)
• Posts: 1881
• Rating: +503/-17
• rawr.
##### Re: The Optimization Compilation
« Reply #35 on: January 03, 2011, 08:31:51 pm »
Just ran some tests:

Code: [Select]
.TESTFor(A,0,50000)*2*2*2*2*2*2*2*2Endvs
Code: [Select]
.TESTFor(A,0,50000)*256End
The first completed in about 1.8 seconds, in a 50 byte size program
The second completed in about 1.3 seconds in a 45 byte size program

HOWEVER

Code: [Select]
.TESTFor(A,0,50000)*2*2*2*2*2*2*2Endcompleted in 1.8 seconds while
Code: [Select]
.TESTFor(A,0,50000)*128Endcompleted in 2.4 seconds

Seems like that trick only works on powers of 2 that are even?

I don't have much time to run more tests, can anyone confirm this?

#### DJ Omnimaga

• Former TI programmer
• CoT Emeritus
• LV15 Omnimagician (Next: --)
• Posts: 55918
• Rating: +3152/-232
• CodeWalrus founder & retired Omnimaga founder
##### Re: The Optimization Compilation
« Reply #36 on: January 03, 2011, 08:50:43 pm »
Interesting stuff, the multiplication one seems weird O.o

#### Quigibo

• The Executioner
• CoT Emeritus
• LV11 Super Veteran (Next: 3000)
• Posts: 2031
• Rating: +1075/-24
##### Re: The Optimization Compilation
« Reply #37 on: January 03, 2011, 09:58:59 pm »
Only *64 and *128 are more optimized as a bunch of *2's but it will make them larger by 1 and 2 bytes respectively.  You can always look at the commands.inc file to see exactly what routines are used and how big they are.  Also, I think *256/2 is faster than *128 but that only works if the original number is between 0 to 32,767.
___Axe_Parser___
Today the calculator, tomorrow the world!

#### Runer112

• Project Author
• LV11 Super Veteran (Next: 3000)
• Posts: 2289
• Rating: +639/-31
##### Re: The Optimization Compilation
« Reply #38 on: January 03, 2011, 11:07:14 pm »
Science lesson! YAY (By the way I completely understand if you don't want to read this whole thing, it's a bit lengthy. I'll put the important parts in bold.)

All Axe operations rely on using the register pair hl as the "Ans" value, using it to hold the running value of calculations. For those who aren't fully familiar with z80 assembly, hl is a combination of h and l, two 8-bit (1-byte) "registers." Registers are like variables you might store in memory, but they're stored directly inside of the the processor, so they can be used quickly. The basic registers (a, b, c, d, e, h, and l) are all 8-bit values, so most commands were built to work with these 8-bit registers, hence the z80 being an 8-bit system. However, Zilog knew that 8 bits was a little restrictive, and especially for systems that would have more than 256 bytes of memory, being able to easily use and manipulate 16-bit values (like pointers) would be very helpful. Did you notice that the other 5 basic registers go in alphabetical order before randomly jumping to h and l? Well that's because h and l are two very special 8-bit registers, designed to easily be combined into the Higher and Lower halves of a 16-bit value. With this special designation come very useful 16-bit operations built-in.

*2, for instance, simply breaks down to one assembly instruction: add hl,hl. This simply adds the value of hl to itself, which in other words multiplies it by 2. Because Zilog knew a basic function like adding would be a core operation, they made sure to make it small and swift: 1 byte to call and 11 cycles to execute.

*256 is a multiplication by 2^8, so one could achieve this by adding hl to itself 8 times. But there's an easier way to think of this. Just like how multiplication by 10 in a decimal system shifts every digit left one place, multiplication by 2 in a binary system shifts every digit left one place as well. And because hl is a 16-bit value with the high 8 bits in h and the low 8 bits in l, shifting these bits left 8 places would just result in the value in l being shifted all the way out and into h, and 8 trailing zeros being shifted into l. So instead of using add hl,hl eight times, *256 uses the following instructions: ld h,l  /  ld l,0. The first costs 1 byte and 4 cycles and the second costs 2 bytes and 7 cycles, for a grand total of 3 bytes and 11 cycles. Just as fast as *2!

*128, however, isn't so easy. Again, the obvious approach is to add hl to itself 7 times. This would cost 7 bytes and 77 cycles. You may also think to use the previous technique to multiply hl by 256, and then divide it by 2. However, we have a problem (pun not intended). If we multiplied the value by 256 and then divided it by 2, we would have lost the highest bit from the multiplication by 256 before dividing it by 2 again! So that's a bit of a problem. Anyways, Axe is optimized for size, and we can do better: using a loop. And although the z80 is a pretty old system, they were nice enough to give us a built-in loop structure: ld b,7  /  add hl,hl  /  djnz \$-1. This loads 7 into the b register, adds hl to hl, and then repeats adding hl to hl 6 more (b-1) times. Although this is a good amount slower than either of the previous options, coming in at 170 cycles, it only takes 2 bytes to initialize the loop counter, 1 byte for the add instruction, and 2 bytes for the loop execution instruction, for a total of 5 bytes.

Sorry to bore you... But congratulations, you now all know at least a little bit about z80 assembly, the structure of compiled Axe code! And the more you know about Axe's internals, the more you can optimize it, whether it be for speed or size!

EDIT: Wow, it took me a whole hour to write this? Major ninja'd.
« Last Edit: January 03, 2011, 11:12:30 pm by Runer112 »

#### Deep Toaster

• So much to do, so much time, so little motivation
• LV13 Extreme Addict (Next: 9001)
• Posts: 8217
• Rating: +758/-15
##### Re: The Optimization Compilation
« Reply #39 on: January 03, 2011, 11:20:21 pm »
Just ran some tests:

Just seeing those numbers (For(A,0,50000)) shows how fast Axe is compared to BASIC

And that's a really clear explanation, Runer! Hopefully it'll help people understans and find their own opts.

#### DJ Omnimaga

• Former TI programmer
• CoT Emeritus
• LV15 Omnimagician (Next: --)
• Posts: 55918
• Rating: +3152/-232
• CodeWalrus founder & retired Omnimaga founder
##### Re: The Optimization Compilation
« Reply #40 on: January 03, 2011, 11:52:43 pm »
I like how in Runer112 post I understand what hl is and some stuff about it easily, while with ASM in 28 days, in my 3 attempts, it gave me a brain f***. I guess your post was explained in a way that suits me more than ASM in 28 days. Thanks for explaining how multiplication works in Axe, though.

#### Happybobjr

• James Oldiges
• LV11 Super Veteran (Next: 3000)
• Posts: 2325
• Rating: +128/-20
• Howdy :)
##### Re: The Optimization Compilation
« Reply #41 on: January 20, 2011, 01:43:03 pm »
Which is faster?

Pxl-on(X,Y)
Pxl-on(X+1,Y)
Pxl-on(X,Y+1)
Pxl-on(X+1,Y+1)

or

Rect(X,Y,2,,)
School: East Central High School

Axe: １.０.0
TI-84 +SE  ||| OS: 2.53 MP (patched) ||| Version: "M"
TI-Nspire    |||  Lent out, and never returned
____________________________________________________________

#### Runer112

• Project Author
• LV11 Super Veteran (Next: 3000)
• Posts: 2289
• Rating: +639/-31
##### Re: The Optimization Compilation
« Reply #42 on: January 20, 2011, 02:14:46 pm »
The rectangle command will be faster than pixel commands. Assuming equally-distributed onscreen X and Y values, the Pxl-On() method will take an average of 1306.5 cycles, whereas the Rect() method will take an average of 790.5 cycles. Which reminds me that I need to add timing information for commands like Rect() to my commands list.

#### Happybobjr

• James Oldiges
• LV11 Super Veteran (Next: 3000)
• Posts: 2325
• Rating: +128/-20
• Howdy :)
##### Re: The Optimization Compilation
« Reply #43 on: January 20, 2011, 02:36:34 pm »
Thank you very much.
One more question.

Which is faster.

For(A,0,5)
For(B,0,5)
Pt-On(A*8,B*8,Pic1)
End
End

Or

For(A,0,35)
Pt-On(A/6*8,A^6*8,Pic1)
end
School: East Central High School

Axe: １.０.0
TI-84 +SE  ||| OS: 2.53 MP (patched) ||| Version: "M"
TI-Nspire    |||  Lent out, and never returned
____________________________________________________________

#### Runer112

• Project Author
• LV11 Super Veteran (Next: 3000)
• Posts: 2289
• Rating: +639/-31
##### Re: The Optimization Compilation
« Reply #44 on: January 20, 2011, 02:58:55 pm »
I can say without testing that the first will definitely be faster, as it avoids using division and moduli. The fastest way to achieve this with Pt-On() logic that I can imagine would be the following:

Code: [Select]
5*8Lbl LY  →Y  5*8  Lbl LX    →X    Pt-On(,Y,Pic1)  If X    -8    Goto LX  EndIf Y  -8  Goto LYEnd

This would also be slightly faster if you used Pt-Off() instead, but I don't know if your situation requires Pt-On() logic. If you're okay with Pt-Off() logic, I could actually give you an even faster way probably.
« Last Edit: January 20, 2011, 02:59:44 pm by Runer112 »