Print Page - The Optimization Compilation

Calculator Community => TI Calculators => Axe => Topic started by: squidgetx on December 28, 2010, 12:17:53 pm

Title: The Optimization Compilation
Post by: squidgetx on December 28, 2010, 12:17:53 pm

Optimization Compilation: The Quick & Dirty Guide to Optimizing in Axe

This is intended to be a guide for every Axe programmer with general optimization :)

For demonstration purposes, I will always use the variable A. Of course, you can do these optimizations with any other var or memory address. Most opts are either in code boxes or on bullet points.

Part I: The BASIC Stuff
Many of the tricks we've learned from BASIC can also be applied to Axe. However, not all of them are applicable. Let's take a look:
Boolean Stuff: As with BASIC, we can optimize

Code: [Select]

If VAR?0 to simply

Code: [Select]

If VAR
However, be warned that this may not always work with compound statements (If A and B). Because the logic operators are bitwise operations, the statement "2 or 0" will return 0. This means that unless you are certain that A and B are always boolean values (1 or 0), you need to do If A?0 and (B?0) instead of If A and B

Another random thing is to ALWAYS CLOSE YOUR QUOTES AND PARENTHESES. Unlike BASIC, the store arrow does not close them for you. Unlike BASIC, leaving them out does NOT improve the size of your program; it is unrelated to optimization but I figured I'd just stick it here since it is a good coding habit and makes code more readable.

Pre-evaluating expressions: Especially in games that heavily reference arrays throughout a section of code, it is often good both for speed and memory to pre-evaluate expressions that do not change throughout the loop. Look at this code for drawing a scrolling 16x16 tilemapper with X and Y positions in pixels:

Code: [Select]

For(A,0,11)
For(B,0,7)
If {Y/8+B*16+(X/8)+A+L1}
Pt-On(A*8-(X^8),B*8-(Y^8),{Y/8+B*32+(X/8)+A}*8+Pic0
End
End
End

There is a HUGE speed gain from simply preevaluating some of the expressions before entering the loop:

Code: [Select]

X/8->E
Y/8->F
X^8->I
Y^8->J
For(A,0,7)
For(B,0,11)
If {F+B*16+E+A+L1}->C
Pt-On(A*8-I,B*8-J,C*8+Pic0
End
End
End

Pixel Testing

Pixel testing can be a mean and nasty cycle stealer from many programs. But never fear, it can be optimized...a lot. Remember that we have access to the screen buffer in L6.

If you are pixel testing a constant pixel, like pxl-Test(20,20), you can more than halve the speed of this command with the following optimization:

Code: [Select]

{20*12+L6+2}^^re4 This optimization relies on the fact that the numbers can basically be pre-computed: use the following formula to derive the numbers you should use:

Code: [Select]

If A=EXP optimizes to

Code: [Select]

!If A xor EXPNote that you should use !If A – EXP if it is an optimized addition/subtraction (See Optimized Math)

Code: [Select]

If A=EXP and (B=EXP) optimizes to

Code: [Select]

!If A-EXP + (B-EXP) where + is the 16bit 'or' operator.
Now, if you are checking the same variable for more than one possible expression, then it yields a greater optimization to do this:

Code: [Select]

If A=EXP1 or (A=EXP2) to

Code: [Select]

If inData(A,Data(EXP1,EXP2,0)) You just have to make sure that you take care of the 0 case first, since this will return a non-zero value if the variable=0 :P Also, as Quigibo pointed out, this only works with constant, 8bit values.

Simple Math Stuff

If you can, always put the constants last in your expressions:CONST+A to A+CONST
Use two byte vars over one byte vars or nibbles if you can afford the free RAM; it will be slightly smaller and faster. (but only by a little)
Note: As far as I can tell the speed difference is negligible. Unless you’re raycasting or something
Avoid parentheses as much as possible when writing expressions. Use Axe's left-to-right order of operations to your advantage. This is in the documentation.
A quick way to determine the sign of a value (better than >>0) is EXP//32768 It will return -1 if the value is negative, and 0 if the value is 0 or positive
The new ++ -- arguments are a smaller and faster way of doing {EXP}+/-1->{EXP}. In the old form, EXP had to be evaluated twice. Now, it only needs to be evaluated once.
One random trick is if you need to initialize a 1-byte variable to 0, the line and 0->{EXP} will be smaller than 0->{EXP}

Optimized Math

Some operations are hard-coded optimized versions that don’t use the usual arithmetic operations. A complete list of them can be found in the following spoiler which I stole from Runer112’s handy list of command speed/size (http://ourl.ca/8693)

Spoiler For Optimized Math:

;Optimized Math ;----------------------------------------------- p_Add0: 0 bytes 0 cycles +0 Results in no compiled code p_Add1: 1 byte 6 cycles +1 Tied for the smallest way to change a value in Axe p_Add2: 2 bytes 12 cycles +2 p_Add3: 3 bytes 18 cycles +3 p_Add254: 3 bytes 16 cycles +254 p_Add255: 2 bytes 10 cycles +255 p_Add256: 1 byte 4 cycles +256 The absolute smallest and fastest way to change a value in Axe p_Add257: 2 bytes 10 cycles +257 p_Add258: 3 bytes 16 cycles +258 p_Add510: 4 bytes 20 cycles +510 p_Add511: 3 bytes 14 cycles +511 p_Add512: 2 bytes 8 cycles +512 p_Add513: 3 bytes 14 cycles +513 p_Add514: 4 bytes 20 cycles +514 p_Add767: 4 bytes 18 cycles +767 p_Add768: 3 bytes 12 cycles +768 p_Add769: 4 bytes 18 cycles +769 p_Add1024: 4 bytes 16 cycles +1024 p_Sub0: 0 bytes 0 cycles -0 Results in no compiled code p_Sub1: 1 byte 6 cycles -1 Tied for the smallest way to change a value in Axe p_Sub2: 2 bytes 12 cycles -2 p_Sub3: 3 bytes 18 cycles -3 p_Sub254: 3 bytes 16 cycles -254 p_Sub255: 2 bytes 10 cycles -255 p_Sub256: 1 byte 4 cycles -256 Also the absolute smallest and fastest way to change a value in Axe p_Sub257: 2 bytes 10 cycles -257 p_Sub258: 3 bytes 16 cycles -258 p_Sub510: 4 bytes 20 cycles -510 p_Sub511: 3 bytes 14 cycles -511 p_Sub512: 2 bytes 8 cycles -512 p_Sub513: 3 bytes 14 cycles -513 p_Sub514: 4 bytes 20 cycles -514 p_Sub767: 4 bytes 18 cycles -767 p_Sub768: 3 bytes 12 cycles -768 p_Sub769: 4 bytes 18 cycles -769 p_Sub1024: 4 bytes 16 cycles -1024 p_Mul0: 3 bytes 10 cycles *0 Same as loading the constant 0 p_Mul1: 0 bytes 0 cycles *1 Results in no compiled code p_MulN1: 6 bytes 24 cycles *?1 Same as p_IntNeg p_Mul2: 1 byte 11 cycles *2 Tied for the smallest way to change a value in Axe p_Mul3: 4 bytes 30 cycles *3 p_Mul4: 2 bytes 22 cycles *4 p_Mul5: 5 bytes 41 cycles *5 p_Mul6: 5 bytes 41 cycles *6 p_Mul7: 6 bytes 52 cycles *7 p_Mul8: 3 bytes 33 cycles *8 p_Mul9: 6 bytes 52 cycles *9 p_Mul10: 6 bytes 52 cycles *10 p_Mul12: 6 bytes 52 cycles *12 p_Mul16: 4 bytes 44 cycles *16 p_Mul32: 5 bytes 55 cycles *32 p_Mul64: 5 bytes 144 cycles *64 p_Mul128: 5 bytes 170 cycles *128 p_Mul255: 6 bytes 31 cycles *255 p_Mul256: 3 bytes 11 cycles *256 p_Mul257: 3 bytes 12 cycles *257 p_Mul258: 4 bytes 23 cycles *258 p_Mul260: 5 bytes 34 cycles *260 p_Mul264: 6 bytes 45 cycles *264 p_Mul512: 4 bytes 22 cycles *512 p_Mul513: 6 bytes 37 cycles *513 p_Mul514: 4 bytes 23 cycles *514 p_Mul516: 5 bytes 34 cycles *516 p_Mul520: 6 bytes 45 cycles *520 p_Mul768: 6 bytes 23 cycles *768 p_Mul1024: 5 bytes 33 cycles *1024 p_Mul1028: 5 bytes 34 cycles *1028 p_Mul1032: 6 bytes 45 cycles *1032 p_Mul2048: 6 bytes 44 cycles *2048 p_Mul2056: 6 bytes 45 cycles *2056 p_Mul4096: 5 bytes 290 cycles *4096 p_Mul8192: 5 bytes 314 cycles *8192 p_Mul16384: 5 bytes 338 cycles *16384 p_Mul32768: 6 bytes 24 cycles *32768 p_Mul65535: 6 bytes 24 cycles *65535 Same as p_MulN1 p_Div0: 3 bytes 10 cycles /0 Same as loading the constant 65535 p_Div1: 0 bytes 0 cycles /1 Results in no compiled code p_Div2: 4 bytes 16 cycles /2 p_Div10: 3 bytes ~1896 cycles /10 n*3+1878 cycles, n=number of set bits in result p_Div128: 5 bytes 27 cycles /128 p_Div256: 3 bytes 11 cycles /256 p_Div512: 5 bytes 19 cycles /512 p_Div32768: 5 bytes 27 cycles /32768 p_SDiv0: 3 bytes 10 cycles //0 **NOT MATHEMATICALLY CORRECT** Same as loading the constant 65535 p_SDiv1: 0 bytes 0 cycles //1 Results in no compiled code p_SDiv2: 4 bytes 16 cycles //2 p_SDiv64: 6 bytes 38 cycles //64 p_SDiv128: 4 bytes 23 cycles //128 p_SDiv256: 5 bytes 20 cycles //256 p_SDiv512: 6 bytes 38 cycles //512 p_SDiv16384: 6 bytes 38 cycles //16384 p_SDiv32768: 3 bytes 26 cycles //32768 p_Mod1: 3 bytes 10 cycles ^1 Same as loading the constant 0 p_Mod2: 5 bytes 20 cycles ^2 p_Mod4: 6 bytes 22 cycles ^4 p_Mod8: 6 bytes 22 cycles ^8 p_Mod16: 6 bytes 22 cycles ^16 p_Mod32: 6 bytes 22 cycles ^32 p_Mod64: 6 bytes 22 cycles ^64 p_Mod128: 4 bytes 15 cycles ^128 p_Mod256: 2 bytes 7 cycles ^256 p_Mod512: 4 bytes 15 cycles ^512 p_Mod1024: 4 bytes 15 cycles ^1024 p_Mod2048: 4 bytes 15 cycles ^2048 p_Mod4096: 4 bytes 15 cycles ^4096 p_Mod8192: 4 bytes 15 cycles ^8192 p_Mod16384: 4 bytes 15 cycles ^16384 p_Mod32768: 2 bytes 8 cycles ^32768 p_EQN512: 9 bytes 44 cycles =?512 p_EQN256: 8 bytes 40 cycles =?256 p_EQN2: 8 bytes 40 cycles =?2 p_EQN1: 7 bytes 36 cycles =?1 p_EQ0: 7 bytes 36 cycles =0 p_EQ1: 7 bytes ~29 cycles =1 24 cycles if true, 34 cycles if false p_EQ2: 8 bytes ~33 cycles =2 28 cycles if true, 38 cycles if false p_EQ256: 8 bytes 40 cycles =256 p_EQ512: 9 bytes 44 cycles =512 p_NEN512: 9 bytes ~31 cycles ??512 33 cycles if true, 28 cycles if false p_NEN256: 8 bytes ~27 cycles ??256 29 cycles if true, 24 cycles if false p_NEN2: 8 bytes 40 cycles ??2 p_NEN1: 7 bytes 36 cycles ??1 p_NE0: 7 bytes ~23 cycles ?0 25 cycles if true, 20 cycles if false p_NE1: 8 bytes ~27 cycles ?1 29 cycles if true, 24 cycles if false p_NE2: 9 bytes ~31 cycles ?2 33 cycles if true, 28 cycles if false p_NE256: 8 bytes ~27 cycles ?1 29 cycles if true, 24 cycles if false p_NE512: 9 bytes ~31 cycles ?2 33 cycles if true, 28 cycles if false p_GE0: 3 bytes 10 cycles ?0 Same as loading the constant 1 p_GT65535: 3 bytes 10 cycles >65535 Same as loading the constant 0 p_LE65535: 3 bytes 10 cycles ?65535 Same as loading the constant 1 p_LT0: 3 bytes 10 cycles <0 Same as loading the constant 0 p_GE1: 7 bytes ~23 cycles ?1 25 cycles if true, 20 cycles if false p_GT0: 7 bytes ~23 cycles >0 25 cycles if true, 20 cycles if false p_LE0: 7 bytes 36 cycles ?0 p_LT1: 7 bytes 36 cycles <1 p_SGE0: 4 bytes 32 cycles ??0 p_SLT0: 5 bytes 27 cycles <<0 p_GetBit0: 5 bytes 27 cycles ee0 p_GetBit1: 6 bytes 38 cycles ee1 p_GetBit2: 7 bytes 37 cycles ee2 p_GetBit3: 7 bytes 37 cycles ee3 p_GetBit4: 7 bytes 37 cycles ee4 p_GetBit5: 7 bytes 37 cycles ee5 p_GetBit6: 7 bytes 26 cycles ee6 p_GetBit7: 6 bytes 22 cycles ee7 p_GetBit8: 5 bytes 27 cycles ee8 e0 p_GetBit9: 6 bytes 38 cycles ee9 e1 p_GetBit10: 7 bytes 37 cycles ee10 e2 p_GetBit11: 7 bytes 37 cycles ee11 e3 p_GetBit12: 7 bytes 37 cycles ee12 e4 p_GetBit13: 7 bytes 37 cycles ee13 e5 p_GetBit14: 7 bytes 26 cycles ee14 e6 p_GetBit15: 5 bytes 20 cycles ee15 e7

Part III: "HL is the Ans of Axe" -Runer112

Like BASIC, Axe also has an Ans-counterpart, a register called HL. It is written to every time you "load" an expression or constant. This yields a multitude of
small optimizations:
-Duplicate Arguments:

Code: [Select]

Text(0,0,PTR) optimizes to

Code: [Select]

Text(0,,PTR)See how that works? When Axe goes to parse the second argument of Text(, it would normally load the second argument into HL and then use that as the Y position of the text. However, in the optimized piece it doesn't find anything to load for the second argument, so it just uses what was already in HL, namely, 0.

This, extended, yields further optimization opportunities:
-More Omitted Arguments: Not only does loading arguments in commands write to HL, so does loading arguments in normal math and control structure operations. This means that we can omit lots of things, saving more space and speed. For example...

Code: [Select]

If A>3: 1?B becomes

Code: [Select]

If A>3: ?BBut wait! What if you want to use this optimization on an equality check? Normally, If A=1:?B would work, but if we wanted to optimize the equality check as well, it doesn't work quite right. !If A-1 returns zero if true (or rather, if false), so you might then think that it's probably best to skip this particular optimization and go

Code: [Select]

!If A-1:1?B However, although it may seem counterintuitive, the following is actually smaller and faster:

Code: [Select]

!If A-1:+1?B Similarly,

Code: [Select]

If EXP: -1?B:End(with a minus sign) is preferable to

Code: [Select]

If EXP:0?B:End

Code: [Select]

0?B: 0?A becomes

Code: [Select]

0?B?Anote: When initializing variables that are 2 or less apart, it also yields further optimization to do this:

Code: [Select]

0?A: 1?B: 3?C to

Code: [Select]

0?A+1?B+2?CAn example of HL abuse in a subroutine:

Code: [Select]

sub(LBL,EXP).....Lbl LBL: EXP*2?{L1} to

Code: [Select]

sub(LBL,EXP)....Lbl LBL:*2?{L1]
-Here's another abuse of HL, creating the fastest and smallest loop structure (by Runer 112)

Quote from: Runer112 on December 29, 2010, 11:46:14 pm

It will execute the loop n times, with A starting at n-1 and decreasing down to 0:

Code: [Select]
n While -1?A ;Code A End

Moving around a sprite
Check this out: again courtesy of Runer112: (comments by me)

Code: [Select]

:.SMILE
:[004466000000817E]->Pic1
:DiagnosticOff
:0->X->Y
:Repeat getkey(15)
:ClrDraw
:getKey(3)-getKey(2)+X //check getkeys for X. But we're not going to store the value just yet...
:!If +1                         //First, check if X is negative one; that is, we'll check if X+1=0
:+1                             //if so, add one to make the value in HL 1 (!If statements return 0 if true)
:End
:-1                             //subtract 1 since we added 1 earlier
:Pt-On(min(,88)?X,getKey(1)-getKey(4)+Y+(=?1)min(,56)?Y,Pic1)    //now we'll use the smallest value of either HL or 88 as the X position. For Y, we'll first handle keypresses, then add 1 (the boolean value of y=-1) if Y=-1. Then we'll use the smallest value of either HL or 56 as the Y position.
:DispGraph
:End

I can't post every possible abuse of HL here: there are so many ways to use it. Study these examples to see how they work and you can apply it in your own programs.

Much of HL abuse has been obsoleted by Axe's peephole optimizer. However, you can still make use of some of these tricks. While the weird syntax is not always necessary, remember to arrange your code and operations such that the peephole optimizer can target it. I left the section somewhat intact so that you can look through it to understand the concept; the optimizer is useless if your code isn’t written intelligently.

Part IV: Subroutines
Subroutines probably are capable of saving the most space than any other type of optimization. And they are easy to use, too. There are only a couple rules of thumb to follow:
1. If you are rewriting a section of code more than once (and it is more than just one command), best put it into a subroutine.
2. If you aren't, then don't put it in a subroutine.

Routine calls are around 3-5 bytes, and an additional 3 bytes for every argument you load. Using the recursive subroutine feature that saves your arguments (sub(LBL^r....) costs you 15 bytes per argument.

Tail Call Optimization
This is lesser known and lesser used, but it is still worth sticking here I guess: If you are calling subroutines from a subroutine as the last line, then you can use a process known as Tail Call Optimization to change this:

Code: [Select]

Lbl A : Stuff : sub(B) : Return to this:

Code: [Select]

Lbl A: Stuff : Goto B The Return is not needed because you end up "stealing" subroutine B's return instead of having to return to A and then return again to the main program.

Part V: Speed over Size
These are all optimizations for aggressive speed gain at the expense of size.

Short Circuit Evaluation:
In most cases, it yields a (rather impressive) speed gain to change

Code: [Select]

If EXP1 and EXP2 to

Code: [Select]

If EXP1 : If EXP2 Make sure to have EXP1 (the outside If block) be the expression that is less likely to be true to gain the most speed.

Factoring Constants (and powers of 2); When multiplying and dividing by multiples of powers of 2, it yields an optimization to factor the number first.

Code: [Select]

EXP*96 to

Code: [Select]

EXP*32*3

Part VI: Miscellaneous
All the stuff I couldn't fit into another category...

If you need to draw horizontal or vertical lines, use the Rect() function with a width or height of 1 instead: Line(0,,20,0) to Rect(0,,20,1) This improves both speed and size
Use ElseIf and DS<( if you can. That's why they're there. They are surprisingly useful.
The Text() and Bitmap() commands are slow because they are TI's routines. Avoid them if possible. (I know, it's hard not to use Text(...but do your best :P ).
Note about the Fill() command: If you are initializing less than 10 bytes of data, it is better to just use ->{}^r 5 times instead.

Quote from: Runer112 on February 22, 2011, 03:52:13 pm

Also, a little-known fact regarding printing text at constant coordinates:
[/list]
Code: [Select]
.Coordinate=Y*256+X Text(30*256+20) Text "Stuff"Is 7 bytes smaller than:
Code: [Select]
Text(20,30,"Stuff")The same applies for text drawn to the home screen:
Code: [Select]
.Coordinate=X*256+Y Output(20*256+30) Disp "Stuff"Is smaller than:
Code: [Select]
Output(20,30,"Stuff")
By even more, 8 bytes.

The new DispGraphClrDraw command is the same as DispGraph:ClrDraw but twice as fast. Use it.

However, remember each new command you use in Axe creates its own subroutine in your program. Thus, (for size reasons) it’s best to avoid using Rect if you've already used Line and Pt-On instead of Pt-Change, (and preinvert the sprite).

Quote from: Quigibo

One major optimization that usually gets ignored is recycling large axe commands. Axe is not like BASIC and so each command needs its own subroutine to add to your program. For instance, lets say you use Pt-On() and Pt-Mask() in your code. Each one has to bring its own 100+ byte subroutine into your program. But you can probably get away with having just a Pt-Mask routine, recycling it to act like Pt-On by simply adding a 2nd layer to your sprite which is only 8 bytes extra instead of 100ish. Or you could do the opposite too and only have Pt-On() and Pt-Change() to manually change both buffers at once. This generally reduces the overall size of the program by a lot when you use the routines only once or very rarely in your code. And I don't mean calling it rarely, it could be the most used routine in your code, I just mean rarely appears in your source.

...And that's all I've got. Happy coding! Let me know if you have any questions, comments, additions, or corrections and I'll be happy to accommodate you :) This post will also be updated with any new major optimizations found.

Also special thanks to Quigibo and Runer112 who are the main contributors of all these optimizations :)

Oh and....666th POST >:D
[/list]

Title: Re: The Optimization Compilation
Post by: Munchor on December 28, 2010, 12:39:21 pm

I took a while to read this and I am astonished. Very good work. I'm astonished because I learn't lots of new stuff! I knew the basic ones, but the others, more complex will really help me making my programs faster. Thanks a lot.

Title: Re: The Optimization Compilation
Post by: Binder News on December 28, 2010, 12:42:05 pm

EPIC!!! I just HAD to bookmark this.

Title: Re: The Optimization Compilation
Post by: Munchor on December 28, 2010, 12:47:32 pm

Quote from: Binder News on December 28, 2010, 12:42:05 pm

EPIC!!! I just HAD to bookmark this.

Great idea, I copied it to a .txt file, WHICH I WILL NOT release in any way, don't worry. I bookmarked too and rated up.

Title: Re: The Optimization Compilation
Post by: Happybobjr on December 28, 2010, 01:03:17 pm

This is great.

Will this will be updated regularly when ever a new optimization is found?

Quote from: squidgetx on December 28, 2010, 12:17:53 pm

Part VI: Miscellaneous
All the stuff I couldn't fit into another category...

If you need to draw horizontal or vertical lines, use the Rect() function with a width or height of 1 instead:
Code: [Select]
Line(0,,20,0) to
Code: [Select]
Rect(0,,20,1)

this has speed improvement as well as size right?

Title: Re: The Optimization Compilation
Post by: Runer112 on December 28, 2010, 06:11:54 pm

Quote from: squidgetx on December 28, 2010, 12:17:53 pm

Part III: "HL is the Ans of Axe" -Runer112

My favorite optimization trick of all time ;)

Title: Re: The Optimization Compilation
Post by: Ikkerens on December 29, 2010, 07:16:44 am

I'm feeling slightly... astounded by this...
Remembering the code of splut, I'm missing ALOT of optimizations :w00t:

Title: Re: The Optimization Compilation
Post by: FinaleTI on December 29, 2010, 12:03:40 pm

Quote from: squidgetx on December 28, 2010, 12:17:53 pm

Now, if you are checking the same variable for more than one possible expression, then it yields a greater optimization to do this:
Code: [Select]
If A=EXP1 or (B=EXP2) to
Code: [Select]
If inData(A,Data(EXP1,EXP2,0)) You just have to make sure that you take care of the 0 case first, since this will return a non-zero value if the variable=0 :P

Shouldn't that first one be

Code: [Select]

If A=EXP1 or (A=EXP2)

Anyways, great guide. I'm definitely printing this out when I can.

Title: Re: The Optimization Compilation
Post by: Munchor on December 29, 2010, 12:12:38 pm

http://www.omnimaga.org/index.php?action=printpage;topic=5923.0 (http://www.omnimaga.org/index.php?action=printpage;topic=5923.0)

Right here :D

Title: Re: The Optimization Compilation
Post by: AngelFish on December 29, 2010, 12:16:59 pm

Quote from: Ikkerens on December 29, 2010, 07:16:44 am

I'm feeling slightly... astounded by this...
Remembering the code of splut, I'm missing ALOT of optimizations :w00t:

Alot? (http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better-than-you-at-everything.html) :p

Title: Re: The Optimization Compilation
Post by: Hot_Dog on December 29, 2010, 12:19:09 pm

Wow, I can see you put a lot of work into this. Great job!

Title: Re: The Optimization Compilation
Post by: Munchor on December 29, 2010, 12:19:11 pm

Quote from: Qwerty.55 on December 29, 2010, 12:16:59 pm

Alot? (http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better-than-you-at-everything.html) :p

:evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh: :evillaugh:

(http://4.bp.blogspot.com/_D_Z-D2tzi14/S8TRIo4br3I/AAAAAAAACv4/Zh7_GcMlRKo/s400/ALOT.png)

Title: Re: The Optimization Compilation
Post by: squidgetx on December 29, 2010, 12:36:10 pm

Lol, nice to see everyone enjoys it, and thanks for pointing out that small error FinaleTI.

Title: Re: The Optimization Compilation
Post by: Quigibo on December 29, 2010, 02:14:09 pm

By the way, you can't have non-constant data as part of the Data() command. And that whole inData() optimization only works for comparisons with numbers between 0-255. You can optimize it to this always though:

Code: [Select]

!If A-EXP1 [16bit and] (A-EXP2)

Title: Re: The Optimization Compilation
Post by: Michael_Lee on December 29, 2010, 05:46:06 pm

Ah...
I finally understand this:

Code: [Select]

0->X
While +26
  Do stuff...
  X-2->X
End

This is incredibly helpful - thanks!

Title: Re: The Optimization Compilation
Post by: Fast Crash on December 29, 2010, 05:49:30 pm

Very Nice tricks ! Congratulations !

Title: Re: The Optimization Compilation
Post by: calc84maniac on December 29, 2010, 06:46:57 pm

Quote from: Quigibo on December 29, 2010, 02:14:09 pm

By the way, you can't have non-constant data as part of the Data() command. And that whole inData() optimization only works for comparisons with numbers between 0-255. You can optimize it to this always though:

Code: [Select]
!If A-EXP1 [16bit and] (A-EXP2)

Wait, but you can't use bitwise-and like a logical "and" the same way you can with "or".

Title: Re: The Optimization Compilation
Post by: kindermoumoute on December 29, 2010, 06:55:44 pm

Wooooaa ! I need translate this in my french tutorial, can I ? :love:

Title: Re: The Optimization Compilation
Post by: squidgetx on December 29, 2010, 07:55:40 pm

Quote from: calc84maniac on December 29, 2010, 06:46:57 pm

Quote from: Quigibo on December 29, 2010, 02:14:09 pm
By the way, you can't have non-constant data as part of the Data() command. And that whole inData() optimization only works for comparisons with numbers between 0-255. You can optimize it to this always though:

Code: [Select]
!If A-EXP1 [16bit and] (A-EXP2)
Wait, but you can't use bitwise-and like a logical "and" the same way you can with "or".

Actually, yeah, i think calc84's right (I tried it today and it didn't work).

And kinder, go ahead :)

Title: Re: The Optimization Compilation
Post by: Quigibo on December 29, 2010, 08:45:05 pm

Oh you're right. But I just realized that your other optimization doesn't work either. You cannot do this for AND:

Code: [Select]

!If A-EXP1 + (B-EXP2)This doesn't work because there are more solutions than there should be. For instance, if A were EXP2 and B were EXP1 the expression would be zero when it should be non-zero.

But you can do this for a compound OR:

Code: [Select]

If A=EXP1 + (B=EXP2)
And you can do this for a compound AND:

Code: [Select]

!If A-EXP1 [16-bit or] (B-EXP2)
And speaking of optimizations. One major optimization that usually gets ignored is recycling large axe commands. Axe is not like BASIC and so each command needs its own subroutine to add to your program. For instance, lets say you use Pt-On() and Pt-Mask() in your code. Each one has to bring its own 100+ byte subroutine into your program. But you can probably get away with having just a Pt-Mask routine, recycling it to act like Pt-On by simply adding a 2nd layer to your sprite which is only 8 bytes extra instead of 100ish. Or you could do the opposite too and only have Pt-On() and Pt-Change() to manually change both buffers at once. This generally reduces the overall size of the program by a lot when you use the routines only once or very rarely in your code. And I don't mean calling it rarely, it could be the most used routine in your code, I just mean rarely appears in your source.

More examples: You can draw pixels and boxes with just sprite commands. Boxes can be drawn with just pixels or lines. Strait lines can be drawn with just boxes or pixels. Its best if you only have one type of DispGraph in your all your code. Pre-flip sprites so you don't need the flip commands in your code. etc.

The only downside is that it slightly reduces the speed since its not as specialized but its usually worth it because you save a lot of memory, especially in smaller programs.

Title: Re: The Optimization Compilation
Post by: ztrumpet on December 29, 2010, 11:06:12 pm

Awesome! Thanks squidgetx! ;D

Title: Re: The Optimization Compilation
Post by: Runer112 on December 29, 2010, 11:46:14 pm

In case anyone was wondering, here is the smallest loop structure that I know of in Axe. It will execute the loop n times, with A starting at n-1 and decreasing down to 0:

Code: [Select]

n
While 
  -1→A
  ;Code
  A
End

And if Quigibo adds conditional jumps, this sort of Do...While loop would be even smaller (which he has not yet, so don't bother with this code as of Axe 0.4.7):

Code: [Select]

n
Lbl L
  →A
  ;Code
!If A-1
  Goto L
End

EDIT: As a side note in case you read this Quigibo, any chance for Do...While loops? They're more optimized than normal While loops.

Title: Re: The Optimization Compilation
Post by: Michael_Lee on December 29, 2010, 11:50:51 pm

Quote from: Runer112 on December 29, 2010, 11:46:14 pm

In case anyone was wondering, here is the smallest loop structure that I know of in Axe.

By smallest, do you mean in terms of size, or in terms of speed? (or both?)

Title: Re: The Optimization Compilation
Post by: Runer112 on December 30, 2010, 12:01:33 am

Both ;) It's 8 bytes smaller and should save about 52 t-states per iteration.

Title: Re: The Optimization Compilation
Post by: Builderboy on December 30, 2010, 12:25:28 am

Wow that is a great loop optimization O.O i think with all the optimizations present here i could shave hundreds of bytes off of all of my Axe projects O.O

Title: Re: The Optimization Compilation
Post by: squidgetx on December 30, 2010, 07:21:32 am

Quigibo, when I use "+" I meant 16 bit "or" (I think I wrote that there next to it...) and wow, nice, Runer :)

Also, edited those new opts into the first post

Title: Re: The Optimization Compilation
Post by: DJ Omnimaga on December 30, 2010, 11:29:56 pm

Nice, so would it be better to use this instead of a For( loop?

Title: Re: The Optimization Compilation
Post by: squidgetx on January 02, 2011, 10:43:11 am

^yes.

I'm adding another opt that is rather specific (ofc by Runer112) (moving around a sprite) but I think that people will appreciate it. i also commented it so that you can understand what it's doing more easily (because at first you can't even begin to decipher what it's doing lol)

Title: Re: The Optimization Compilation
Post by: Munchor on January 02, 2011, 10:45:30 am

Nice squidget, the one from the other post, I went :crazy:

Title: Re: The Optimization Compilation
Post by: squidgetx on January 03, 2011, 06:48:45 pm

updated with discussion regarding +1 over 1 with respect to false expression testing (!If statements)

Again, thanks to Runer112 lol

Title: Re: The Optimization Compilation
Post by: Happybobjr on January 03, 2011, 07:13:12 pm

is this an optimization?

if x=5
0->A
end

to

If x=5
-1->A (minus)
end

Title: Re: The Optimization Compilation
Post by: Deep Toaster on January 03, 2011, 07:20:06 pm

Quote from: happybobjr on January 03, 2011, 07:13:12 pm

is this an optimization?

if x=5
0->A
end

to

If x=5
-1->A (minus)
end

Yep, it is, and really useful, too.

It's even better to do

Code: (Axe) [Select]

:!If A-5
:→A
:End

Title: Re: The Optimization Compilation
Post by: Happybobjr on January 03, 2011, 07:21:29 pm

:banghead:
i'm an idiot sometimes.
thank you very much

[edit]
also,
would this be better?
?*256

to

?*2*2*2*2*2*2*2*2

and.
?/8
to
?/2/2/2

Thanks in advance.
[/edit]

Title: Re: The Optimization Compilation
Post by: ztrumpet on January 03, 2011, 07:29:07 pm

That would make it faster, but the code would be bigger. :)

Title: Re: The Optimization Compilation
Post by: Deep Toaster on January 03, 2011, 07:31:29 pm

Quote from: happybobjr on January 03, 2011, 07:21:29 pm

:banghead:
i'm an idiot sometimes.
thank you very much

[edit]
also,
would this be better?
?*256

to

?*2*2*2*2*2*2*2*2

and.
?/8
to
?/2/2/2

Thanks in advance.
[/edit]

Nope, Axe already does the most optimized form. Look in the optimizations .txt for more info.

Title: Re: The Optimization Compilation
Post by: squidgetx on January 03, 2011, 08:31:51 pm

Just ran some tests:

Code: [Select]

.TEST
For(A,0,50000)
*2*2*2*2*2*2*2*2
End

Code: [Select]

.TEST
For(A,0,50000)
*256
End

The first completed in about 1.8 seconds, in a 50 byte size program
The second completed in about 1.3 seconds in a 45 byte size program

HOWEVER

Code: [Select]

.TEST
For(A,0,50000)
*2*2*2*2*2*2*2
End

completed in 1.8 seconds while

Code: [Select]

.TEST
For(A,0,50000)
*128
End

completed in 2.4 seconds

Seems like that trick only works on powers of 2 that are even?

I don't have much time to run more tests, can anyone confirm this?

Title: Re: The Optimization Compilation
Post by: DJ Omnimaga on January 03, 2011, 08:50:43 pm

Interesting stuff, the multiplication one seems weird O.o

Title: Re: The Optimization Compilation
Post by: Quigibo on January 03, 2011, 09:58:59 pm

Only *64 and *128 are more optimized as a bunch of *2's but it will make them larger by 1 and 2 bytes respectively. You can always look at the commands.inc file to see exactly what routines are used and how big they are. Also, I think *256/2 is faster than *128 but that only works if the original number is between 0 to 32,767.

Title: Re: The Optimization Compilation
Post by: Runer112 on January 03, 2011, 11:07:14 pm

Science lesson! YAY ;D (By the way I completely understand if you don't want to read this whole thing, it's a bit lengthy. I'll put the important parts in bold.)

All Axe operations rely on using the register pair hl as the "Ans" value, using it to hold the running value of calculations. For those who aren't fully familiar with z80 assembly, hl is a combination of h and l, two 8-bit (1-byte) "registers." Registers are like variables you might store in memory, but they're stored directly inside of the the processor, so they can be used quickly. The basic registers (a, b, c, d, e, h, and l) are all 8-bit values, so most commands were built to work with these 8-bit registers, hence the z80 being an 8-bit system. However, Zilog knew that 8 bits was a little restrictive, and especially for systems that would have more than 256 bytes of memory, being able to easily use and manipulate 16-bit values (like pointers) would be very helpful. Did you notice that the other 5 basic registers go in alphabetical order before randomly jumping to h and l? Well that's because h and l are two very special 8-bit registers, designed to easily be combined into the Higher and Lower halves of a 16-bit value. With this special designation come very useful 16-bit operations built-in.

*2, for instance, simply breaks down to one assembly instruction: add hl,hl. This simply adds the value of hl to itself, which in other words multiplies it by 2. Because Zilog knew a basic function like adding would be a core operation, they made sure to make it small and swift: 1 byte to call and 11 cycles to execute.

*256 is a multiplication by 2^8, so one could achieve this by adding hl to itself 8 times. But there's an easier way to think of this. Just like how multiplication by 10 in a decimal system shifts every digit left one place, multiplication by 2 in a binary system shifts every digit left one place as well. And because hl is a 16-bit value with the high 8 bits in h and the low 8 bits in l, shifting these bits left 8 places would just result in the value in l being shifted all the way out and into h, and 8 trailing zeros being shifted into l. So instead of using add hl,hl eight times, *256 uses the following instructions: ld h,l / ld l,0. The first costs 1 byte and 4 cycles and the second costs 2 bytes and 7 cycles, for a grand total of 3 bytes and 11 cycles. Just as fast as *2!

*128, however, isn't so easy. Again, the obvious approach is to add hl to itself 7 times. This would cost 7 bytes and 77 cycles. You may also think to use the previous technique to multiply hl by 256, and then divide it by 2. However, we have a problem (pun not intended). If we multiplied the value by 256 and then divided it by 2, we would have lost the highest bit from the multiplication by 256 before dividing it by 2 again! So that's a bit of a problem. Anyways, Axe is optimized for size, and we can do better: using a loop. And although the z80 is a pretty old system, they were nice enough to give us a built-in loop structure: ld b,7 / add hl,hl / djnz $-1. This loads 7 into the b register, adds hl to hl, and then repeats adding hl to hl 6 more (b-1) times. Although this is a good amount slower than either of the previous options, coming in at 170 cycles, it only takes 2 bytes to initialize the loop counter, 1 byte for the add instruction, and 2 bytes for the loop execution instruction, for a total of 5 bytes.

Sorry to bore you... But congratulations, you now all know at least a little bit about z80 assembly, the structure of compiled Axe code! And the more you know about Axe's internals, the more you can optimize it, whether it be for speed or size! ;)

EDIT: Wow, it took me a whole hour to write this? Major ninja'd.

Title: Re: The Optimization Compilation
Post by: Deep Toaster on January 03, 2011, 11:20:21 pm

Quote from: squidgetx on January 03, 2011, 08:31:51 pm

Just ran some tests:

Just seeing those numbers (For(A,0,50000)) shows how fast Axe is compared to BASIC O.O

And that's a really clear explanation, Runer! Hopefully it'll help people understans and find their own opts.

Title: Re: The Optimization Compilation
Post by: DJ Omnimaga on January 03, 2011, 11:52:43 pm

I like how in Runer112 post I understand what hl is and some stuff about it easily, while with ASM in 28 days, in my 3 attempts, it gave me a brain f***. I guess your post was explained in a way that suits me more than ASM in 28 days. Thanks for explaining how multiplication works in Axe, though. :D

Title: Re: The Optimization Compilation
Post by: Happybobjr on January 20, 2011, 01:43:03 pm

Which is faster?

Pxl-on(X,Y)
Pxl-on(X+1,Y)
Pxl-on(X,Y+1)
Pxl-on(X+1,Y+1)

or

Rect(X,Y,2,,)

Title: Re: The Optimization Compilation
Post by: Runer112 on January 20, 2011, 02:14:46 pm

The rectangle command will be faster than pixel commands. Assuming equally-distributed onscreen X and Y values, the Pxl-On() method will take an average of 1306.5 cycles, whereas the Rect() method will take an average of 790.5 cycles. Which reminds me that I need to add timing information for commands like Rect() to my commands list.

Title: Re: The Optimization Compilation
Post by: Happybobjr on January 20, 2011, 02:36:34 pm

Thank you very much.
One more question.

Which is faster.

For(A,0,5)
For(B,0,5)
Pt-On(A*8,B*8,Pic1)
End
End

Or

For(A,0,35)
Pt-On(A/6*8,A^6*8,Pic1)
end

Title: Re: The Optimization Compilation
Post by: Runer112 on January 20, 2011, 02:58:55 pm

I can say without testing that the first will definitely be faster, as it avoids using division and moduli. The fastest way to achieve this with Pt-On() logic that I can imagine would be the following:

Code: [Select]

5*8
Lbl LY
  →Y
  5*8
  Lbl LX
    →X
    Pt-On(,Y,Pic1)
  If X
    -8
    Goto LX
  End
If Y
  -8
  Goto LY
End

This would also be slightly faster if you used Pt-Off() instead, but I don't know if your situation requires Pt-On() logic. If you're okay with Pt-Off() logic, I could actually give you an even faster way probably.

Title: Re: The Optimization Compilation
Post by: Happybobjr on January 20, 2011, 03:04:42 pm

thank you very much

Title: Re: The Optimization Compilation
Post by: Munchor on January 21, 2011, 07:58:31 pm

Code: [Select]

While I
Disp "O
End

Will While I work as If I? If yes, is it faster than While I = 1?

Title: Re: The Optimization Compilation
Post by: Happybobjr on January 21, 2011, 08:01:44 pm

the code will create an endless loop.
that is faster than While I=1

Title: Re: The Optimization Compilation
Post by: rizoom on January 24, 2011, 11:56:11 am

And of course, instead of doing

Code: [Select]

:If EXP
:1->A
:End

Code: [Select]

:If EXP
:->A
:End

But this is much quicker AND smaller (saves 5 bytes)

Code: [Select]

:EXP->A

Title: Re: The Optimization Compilation
Post by: Builderboy on January 24, 2011, 12:08:32 pm

Unfortunately those are not equivalent pieces of code. In the first two examples, a 1 is stored to A upon a true expression, and it remains the same under a false expression. In the last, it is 1 when its true and 0 when its false.

Title: Re: The Optimization Compilation
Post by: rizoom on January 24, 2011, 12:13:13 pm

Oh yes, sorry!

Title: Re: The Optimization Compilation
Post by: Builderboy on January 24, 2011, 12:14:06 pm

Hey thats ok :) Sometimes this optimization works even if there is a difference ^^

Welcome to Omnimaga too! It's nice to see another Axe user around here :) Why don't you make a post in the Introduce Yourself page so we can all give you a proper hearty welcome :D

Title: Re: The Optimization Compilation
Post by: Happybobjr on January 24, 2011, 07:40:34 pm

Quote from: Builderboy on January 24, 2011, 12:14:06 pm

Hey thats ok :) Sometimes this optimization works even if there is a difference ^^

Welcome to Omnimaga too! It's nice to see another Axe user around here :) Why don't you make a post in the Introduce Yourself page so we can all give you a proper peanut filled welcome :D

fixed

Question mark!
0-A->B->C->D.....->Z->0

to

0->{l1-56}
fill(l1-56,56)

Title: Re: The Optimization Compilation
Post by: ztrumpet on January 24, 2011, 08:30:46 pm

Quote from: happybobjr on January 24, 2011, 07:40:34 pm

Question mark!
0-A->B->C->D.....->Z->0

to

0->{l1-56}
fill(l1-56,56)

How about this:
0->{^oA}
Fill(^oA,56)

This takes into account the moving of variables. :)

Title: Re: The Optimization Compilation
Post by: Runer112 on January 24, 2011, 09:15:05 pm

If you only need to initialize five or fewer 2-byte values like variables, it is best to initialize them without any special commands. With this method, you can also avoid changing the values of any memory locations that you don't want to initialize between values you do want to initialize.

Code: (18 bytes) [Select]

0→A→B→X→Y→{L₁}ʳ

However, if you need to initialize more than five values, it is best to use the Fill() method, like you suggested for zeroing out every variable. However, a few slight changes:

Storing 2 bytes to a constant address (e.g. °A) is more optimized than storing 1 byte.
Because of the above modification and the first two bytes now being equal to 0, the start address for Fill() can be increased by 1 (e.g. °A+1) to save some clock cycles by reducing the total number of bytes needed to be filled by 1.
There are only 27 variables, for a total of 54 bytes. Subtracting 1 for the above reason and 1 more due to the fact that the size argument of Fill() doesn't count the starting byte gives a grand total of 52 bytes to fill.

Code: (20 bytes) [Select]

0→{°A}ʳ
Fill(°A+1,52)

The above method can also be used to initialize 1-byte values by adjusting the size parameter of Fill() accordingly. Note, though, that it will only work for initializing 2-byte values whose low and high bytes are equal, like 0. To initialize six or fewer 2-byte values that do not adhere to this, use the first method mentioned in this post. To initialize more than six 2-byte values that do not adhere to this, use something like the following method. This example will initialize all 27 variables to the hex value 1337:

Code: (21 bytes) [Select]

ᴇ1337→{°A}ʳ
Copy(°A,+2,52)

Title: Re: The Optimization Compilation
Post by: squidgetx on February 03, 2011, 08:09:53 pm

Added discussion regarding preevaluating expressions. It seems easy, but sometimes you don't remember to do it :P (I know I didn't in SandLand)

Quote from: squidgetx on December 28, 2010, 12:17:53 pm

Pre-evaluating expressions: Especially in games that heavily reference arrays throughout a section of code, it is often good both for speed and memory to pre-evaluate expressions that do not change throughout the loop. Look at this code for drawing a scrolling 16x16 tilemapper with X and Y positions in pixels:
Code: [Select]
For(A,0,7) For(B,0,11) If {Y/8+B*16+(X/8)+A+L1} Pt-On(A*8-(X^8),B*8-(Y^8),{Y/8+B*32+(X/8)+A}*8+Pic0 End End EndThere is a HUGE speed gain from simply preevaluating some of the expressions before entering the loop:
Code: [Select]
X/8->E Y/8->F X^8->I Y^8->J For(A,0,7) For(B,0,11) If {F+B*16+E+A+L1}->C Pt-On(A*8-I,B*8-J,C*8+Pic0 End End End

...and also added the thing about Fill(). Man, this thing is getting hard to edit XD

Title: Re: The Optimization Compilation
Post by: calcdude84se on February 05, 2011, 09:51:28 pm

Code: [Select]

X/8->E
Y/8->F
X^8->I
Y^8->J
For(A,0,7)
*8-I->G
For(B,0,11)
If {F+B*16+E+A+L1}->C
Pt-On(G,B*8-J,C*8+Pic0
End
End
End

Even pre-evaluating for inner For( loops ;D

Title: Re: The Optimization Compilation
Post by: Happybobjr on February 05, 2011, 09:55:46 pm

Quote from: calcdude84se on February 05, 2011, 09:51:28 pm

X/8->E
Y/8->F
X^8->I
Y^8->J
For(A,0,7)
*8-I->G
For(B,0,11)
If {+F*16+E+A+L1}->C
Pt-On(G,B*8-J,C*8+Pic0
End
End
End

could you optimize that to... ^

Title: Re: The Optimization Compilation
Post by: calcdude84se on February 05, 2011, 09:58:12 pm

Yes you can :)

Title: Re: The Optimization Compilation
Post by: calc84maniac on February 05, 2011, 10:03:19 pm

No, that doesn't work. You're not guaranteed that the last result will be B in the first statement of the For( loop. In fact, I tested and it isn't. It would actually be 11-B (though I wouldn't rely on that since it might change in other versions of Axe)

Title: Re: The Optimization Compilation
Post by: Happybobjr on February 05, 2011, 10:05:32 pm

that would explain the random pixels remaining in one of my games

Title: Re: The Optimization Compilation
Post by: calcdude84se on February 05, 2011, 10:15:08 pm

Ah, okay. That would make my inserted line "A*8-I->G" then.

Title: Re: The Optimization Compilation
Post by: Runer112 on February 05, 2011, 11:46:14 pm

Here's the fastest code I could come up with. It uses optimized loop structures and even more pre-calculation.

Also, did you mean to use B as the y value in your code squidgetx? Because looping from 0-11 wouldn't make much sense for y, so I changed it. :P

Code: (Code) [Select]

Y and ᴇF8*2+(X/2/2/2)+L₁→D
⁻(X^8)→I
⁻(Y^8)→J
12→B
Lbl LB
  64
  Lbl LA
    →A
    If {*2+B+D}
      →C
      Pt-On(B*8+I,A+J,C*8+Pic0)
    End
  If A
    -8
    Goto LA
  End
If {°B-1}ʳ
  -256→{°B-1}ʳ
  Goto LB
End

Code: (Comments) [Select]

Tilemap offset; optimized version of Y/8*16+(X/8)+L₁→D
X offset; 
Y offset
For(B,12,0,⁻1)

For(A,64,0,⁻8)


If tile != 0
Store tile value
Display tile

Second part of For(A) loop



Second part of For(B) loop
Utilizes the fact that the second byte of A is 0 and -256 is faster than -1

Title: Re: The Optimization Compilation
Post by: Deep Toaster on February 05, 2011, 11:48:39 pm

Epic. I love return'd value hacks :D

Title: Re: The Optimization Compilation
Post by: calcdude84se on February 05, 2011, 11:49:38 pm

:crazy: That's amazing, Runer. Do you always come up with code like this? :P

Title: Re: The Optimization Compilation
Post by: Runer112 on February 05, 2011, 11:53:46 pm

Quote from: Deep Thought on February 05, 2011, 11:48:39 pm

Epic. I love return'd value hacks :D

My favorite Axe optimization trick! ;D

Quote from: calcdude84se on February 05, 2011, 11:49:38 pm

:crazy: That's amazing, Runer. Do you always come up with code like this? :P

I am so familiar with using optimized loops that I pretty much automatically come up with maximally optimized loops. ;) The rest of the stuff, though, just takes a bit of thinking and comparing potential command sizes/speeds. This (http://ourl.ca/8693) file I assembled a while ago actually helps a good deal when assessing options.

Title: Re: The Optimization Compilation
Post by: Deep Toaster on February 06, 2011, 12:04:09 am

Quote from: Runer112 on February 05, 2011, 11:53:46 pm

Quote from: Deep Thought on February 05, 2011, 11:48:39 pm
Epic. I love return'd value hacks :D

My favorite Axe optimization trick! ;D

Pretty much the Axe opt trick. Everything else builds on it :D

Title: Re: The Optimization Compilation
Post by: Quigibo on February 06, 2011, 05:57:55 am

That's going be a thing of the past. That optimization is built into Axe now. ;)

Title: Re: The Optimization Compilation
Post by: Deep Toaster on February 06, 2011, 10:01:18 am

Quote from: Quigibo on February 06, 2011, 05:57:55 am

That's going be a thing of the past. That optimization is built into Axe now. ;)

Which one?

Title: Re: The Optimization Compilation
Post by: squidgetx on February 06, 2011, 11:10:11 am

woah, that's mad nice, Runer. Is it always better to avoid For() loops and use Lbl/Goto structures instead (or the While -1 structure)?

Quigibo, do you mean that something like

Code: [Select]

A-1->A
!If A
...

would be auto-opted to

Code: [Select]

A-1->A
!If
...

Title: Re: The Optimization Compilation
Post by: Deep Toaster on February 06, 2011, 11:44:13 am

If so that'd be awesome. And maybe if a loop starts with While A or Repeat A, move the A to the end of the loop too?

Title: Re: The Optimization Compilation
Post by: Builderboy on February 06, 2011, 01:18:02 pm

Wow thats going to be auto optimized now? Thats gotta shave lots of bytes off of every program O.O

Title: Re: The Optimization Compilation
Post by: ztrumpet on February 06, 2011, 01:18:39 pm

Quote from: Quigibo on February 06, 2011, 05:57:55 am

That's going be a thing of the past. That optimization is built into Axe now. ;)

Woah, really? Awesome! ;D I can't wait for a new version. :D

Title: Re: The Optimization Compilation
Post by: calc84maniac on February 06, 2011, 02:48:26 pm

To take advantage of it, you'd still have to code in the same way, I think. But actually typing the variable name makes much more readable code :)

Title: Re: The Optimization Compilation
Post by: Builderboy on February 06, 2011, 02:52:07 pm

Quigibo, how is this being optomised? Is it actually looking into the assembly and seeing when there is a Ld ($) HL followed by a LD HL ($) ? or something along those lines?

Title: Re: The Optimization Compilation
Post by: Quigibo on February 06, 2011, 09:09:55 pm

Its not an auto-optimization, but it is a significant one.

To have a loop with a conditional at the end saves 3 bytes from a loop with conditional checking at the beginning. For instance:

Code: [Select]

Repeat getKey(15)
...code...
End

Can become:

Code: [Select]

While 1
...code...
EndIf getKey(15)

You can't do this with every loop since a lot of them need to check before entering the loop, but with situations like this example, it probobly doesn't matter. The EndIf can actually be used on any loop structure though, including do loops, while loops, repeat loops, and for loops. A do loop, which is also new, is just a "While 1" or "Repeat 0" which automatically get optimized to not do the check at all since they always loop.

Title: Re: The Optimization Compilation
Post by: ztrumpet on February 06, 2011, 09:38:46 pm

Cool! I love post test loops! ;D

Title: Re: The Optimization Compilation
Post by: squidgetx on February 13, 2011, 04:29:44 pm

Code: [Select]

{20*12+L6+2}^^r e4 This optimization relies on the fact that the numbers can basically be pre-computed: use the following formula to derive the numbers you should use:

Code: [Select]

{Y*12+L6+(X/8)}e(X^8) So for another example, the command pxl-Test(8,1) becomes {12+L6}e1.

The speed gain from this is so great that you can even still save (although not as much) even with a variable Y value. How you treat the constant X value remains the same as before, but simply substitute in your variable Y value in the above code. So for example, pxl-Test(31,Y) becomes {Y*12+L6+3}e7.

Edit: here are the numbers if anyone wants them
pxl-Test is 53 bytes, ~237 cycles, plus 66 to call. Then we have to load two constants, adding 20 cycles. Let's make a conservative estimate and say 320 cycles.
{} is 14 cycles. Loading a constant into it is 10 cycles. The fastest bit-check is e7 which is 20 cycles, while the slowest is e2 which is 49 cycles. So we're in the range of 44-73 cycles, less than a third of pxl-Test() O.o

For variable Y, pxl-Test(Y,constant) is about 326 cycles. Now, loading a var is 16 cycles, *12 is 52, plus a constant is 21 cycles. The speed of this is 123-152 cycles, still more than twice as fast.

I suspect that variable X and Y with this method is about the same speed as regular pxl-testing.

Title: Re: The Optimization Compilation
Post by: NinjaKnight on February 13, 2011, 10:17:33 pm

Is using Rect( to draw straight horizontal/vertical lines faster than Line( ?

Title: Re: The Optimization Compilation
Post by: Runer112 on February 13, 2011, 11:59:10 pm

Quote from: squidgetx on February 13, 2011, 04:29:44 pm

If you are pixel testing a constant pixel, like pxl-Test(20,20), you can more than halve the speed of this command with the following optimization:
Code: [Select]
{20*12+L6+2}e4

Remember, storing or recalling two-byte values at a constant address is always smaller and faster than storing or recalling one-byte values!

Code: [Select]

{20*12+L₆+2}ʳe4

Quote from: NinjaKnight on February 13, 2011, 10:17:33 pm

Is using Rect( to draw straight horizontal/vertical lines faster than Line( ?

Definitely. Here are the results I got for vertical lines. Also note the 3 bytes saved in the Rect() arguments with some nice hl "abuses." :P

Line(A,0,A,63): ~6132 cycles
Rect(A,0,+1,-2): ~3354 cycles

The results are even more profound for horizontal lines. Also, 1 byte saved in the Rect() arguments.

Line(0,A,95,A): 9809 cycles
Rect(0,A,^-1,+2): 2579 cycles

But if you really want a blazingly fast horizontal line drawer in pure Axe, use the following. It has a check to make sure that the y-value is valid, and if so, the whole process of calling the routine, checking the y-value, and drawing the line takes only 497 cycles. You would call it with something like Ysub(HL). Also, note the trick employed to check that the y-value is less than 64. That saves a few bytes and cycles over a normal comparison by utilizing the fact that we want a value precisely in the 6-bit range. By simply resetting these 6 bits and leaving any other bits unchanged, any 6-bit values will become zero and any out of range values will become nonzero, resulting in easy checking.

Code: [Select]

Lbl HL
  →r₆
  ReturnIf and b11000000
  ᴇFFFF→{r₆*12+L₆}ʳ
  Fill(,10)
Return

Title: Re: The Optimization Compilation
Post by: NinjaKnight on February 14, 2011, 10:28:17 pm

O.O That's a saving of OVER 9000 CYCLES!

Title: Re: The Optimization Compilation
Post by: squidgetx on February 16, 2011, 07:19:08 am

Here's something really cool I found out yesterday:
DispGraph^r is faster than DispGraph. By more than 10000 cycles, too. It varies from calc to calc, but on mine its a full 15000 cycles faster O.o This means that if you're making a monochrome game, and you're not using the backbuffer....(at the cost of around 15 bytes or so) you can make a

Quote from: NinjaKnight on February 14, 2011, 10:28:17 pm

saving of OVER 9000 CYCLES!

Title: Re: The Optimization Compilation
Post by: Deep Toaster on February 16, 2011, 09:26:08 am

Quote from: squidgetx on February 16, 2011, 07:19:08 am

Here's something really cool I found out yesterday:
DispGraph^r is faster than DispGraph. By more than 10000 cycles, too. It varies from calc to calc, but on mine its a full 15000 cycles faster O.o This means that if you're making a monochrome game, and you're not using the backbuffer....(at the cost of around 15 bytes or so) you can make a
Quote from: NinjaKnight on February 14, 2011, 10:28:17 pm
saving of OVER 9000 CYCLES!

Wait ... what? How did that work O.o

Title: Re: The Optimization Compilation
Post by: Happybobjr on February 16, 2011, 10:01:25 am

Quote from: squidgetx on February 16, 2011, 07:19:08 am

Here's something really cool I found out yesterday:
DispGraph^r is faster than DispGraph. By more than 10000 cycles, too. It varies from calc to calc, but on mine its a full 15000 cycles faster O.o This means that if you're making a monochrome game, and you're not using the backbuffer....(at the cost of around 15 bytes or so) you can make a
Quote from: NinjaKnight on February 14, 2011, 10:28:17 pm
saving of OVER 9000 CYCLES!

how? that's weird

Title: Re: The Optimization Compilation
Post by: Runer112 on February 16, 2011, 01:21:46 pm

I see you've been reading up on my Commands documentation, eh squidgetx? Yeah, that's an interesting thing I discovered when speed testing the display commands. On calculators like mine with the old, "good" screen drivers, the screen driver delay seems to be pretty low and constant from calculator to calculator. DispGraph could run just as fast or faster than DispGraph^r on these calculators. However, due to inconsistencies with the screen drivers in newer units, the routine may run too fast for the driver on some calculators, causing display problems, so Quigibo had to add a portion of code to pause the routine until the driver says it is ready. However, this pause itself adds some overhead time, making the routine slower.

Quigibo, the DispGraph^r routine doesn't have any throttling system in place, yet no problems have been reported with it on newer calculators. Could you just remove the throttling system from the DispGraph routine and add one or two time-wasting instructions to make each loop iteration take as many cycles as each DispGraph^r loop iteration?

EDIT: Hmm I don't know if Quigibo reads this thread and would see that, so I'm probably going to post that in a major thread he reads or send him a message about that.

Title: Re: The Optimization Compilation
Post by: Deep Toaster on February 16, 2011, 03:36:14 pm

Weird, so basically DispGraph^r has less of a delay than DispGraph, but it still works fine?

EDIT: Quigibo seems to read this occasionally (see the first post on this page). But posting in the Features Wishlist or something else would be a good idea, too, especially now that this thread is no longer in the Axe project subforum.

Title: Re: The Optimization Compilation
Post by: squidgetx on February 22, 2011, 04:03:28 pm

Quote

A quick and small way to determine the sign of a value (better than >>0) is
Code: [Select]
EXP//32768. It will return -1 if the value is negative, and 0 if the value is 0 or positive

Also added how Text(30*256+20):Text("Text") is much smaller than Text(20,30,"Text")

Credits to Runer112.

Title: Re: The Optimization Compilation
Post by: Freyaday on March 07, 2011, 12:43:26 am

I'd like to point out that DispGraph^r is not faster than DispGraph, it just uses less cycles. How can this be? DispGraph^r requires a 6MHz clock, because the screen requires a ~10microsec delay between successive inputs, otherwise the screen displays random garbage. DispGraph^r is faster than that @ Full speed, necessitating the slowdown. DispGraph, however, uses TI's own method to prevent this, an altermate display routine with the necessary delay built in.

Title: Re: The Optimization Compilation
Post by: Runer112 on March 07, 2011, 12:54:32 am

What do you mean DispGraph^r is not faster? Using fewer cycles at the same clock speed pretty much defines it as being faster. Although the actual byte retrieval, calculations, and outputting to the screen take more cycles, it doesn't have the safety checks built-in that DispGraph does, which slow the DispGraph routine down to being slower than the DispGraph^r routine.

Title: Re: The Optimization Compilation
Post by: Freyaday on March 07, 2011, 01:42:40 am

I'm trying to explain that the two commands are of roughly equal speed when the use of the Full and Normal commands are taken into account, and constantly switching between the two clock speeds so as not to slow down the rest of the program probably isn't what the compatability mode was meant for. Add to that the time of having to store to both buffers, and the advantages of DispGraph^r kinda get negated.
That said, if the program is intended for a 6MHz calc, then use ^r.

Title: Re: The Optimization Compilation
Post by: Runer112 on March 07, 2011, 01:59:01 am

I'm a little confused... I'm not sure in what context you would be switching clock speeds. And by a compatibility mode, do you mean the fact that the DispGraph routine waits for the screen driver to say that it's ready? I'm a bit confused about that because calling it a compatibility "mode" would suggest that there's something you can just turn on to automatically activate the safety.

But disregarding things I'm confused about, I do know that the two commands are close to the same speed speed, but because DispGraph^r avoids driver readiness checking, it is in fact faster than DispGraph at 6MHz. And you can't really compare them at 15MHz, because DispGraph^r doesn't work properly at that clock speed due to the lack of driver checking. Also, you don't need to store to the back buffer every frame for DispGraph^r, you can still display black and white images by just keeping the back buffer clear.

Title: Re: The Optimization Compilation
Post by: Freyaday on March 07, 2011, 02:12:31 am

'Twould be something like
Normal
DispGraph^r
Full
The 6MHz clock setting is a compatability mode. And Axe checkerboards the grey areas in DispGraph^r.

Title: Re: The Optimization Compilation
Post by: Runer112 on March 07, 2011, 02:15:18 am

Oh, well I haven't tested it but I think that still might be slightly faster than DispGraph at 15MHz, not 100% sure though. It would certainly be closer. And as long as you keep the back buffer blank, using DispGraph^r would result in the same image put on the screen as DispGraph.

EDIT: Actually let me test this right now...

EDIT 2: The results are in! Data first, analysis after.

[Normal speed] DispGraph^r: ~94 fps
[Normal speed] DispGraph: ~76 fps
[Full speed] DispGraph: ~114 fps

At normal speed (6MHz), DispGraph^r is about 25% faster than DispGraph. However, with an interesting turn of events, at full speed (~16MHz on my calculator) the results are pretty much reversed, with DispGraph being about 20% faster than DispGraph^r. Now that I think of it, this makes sense. The DispGraph^r routine runs about as fast as data can be outputted to the LCD driver. At 6MHz, the DispGraph routine is incapable of running this fast, because the driver readiness check slows things down a good amount for each failed check. However, at 15/16MHz, the check will be running very quickly and should continue execution much more swiftly after the driver becomes ready, resulting in only a little bit of overhead from the checking. After the check is exited, the remainder of the loop will then execute very quickly as well, making up most of the lost overhead from checking by performing the remainder of the loop in less than half the normal time. On top of this, the rest of the routine, including setup and the loop that the data sending loop is inside of, will run more than twice as fast, explaining the switch in the speed placings.

Title: Re: The Optimization Compilation
Post by: Freyaday on March 07, 2011, 03:04:13 am

Ta-Da! Counterintuitivity strikes again! But this is officially a lesson: Always double check. And make sure it's been single-checked, too.

Title: Re: The Optimization Compilation
Post by: Darl181 on June 09, 2011, 10:39:05 am

Ok, I have no idea what's going wrong with this...
I've been optimizing dimansion shift. Thing is, the inData() trick always returns as true ???
It worked in a little test program I made, but as soon as I put it into the shift source, it bugs.

In the screenie of the source, the unoptimized yet working line is shown just above the commented buggy line.
No matter wha the value of {P+D} is, it always returns true, be it any number that appears in the level (any num from 0 to 11)

btw, I tried storing {P+D} to a var and substituted {P+D} for the var in the code, still the same problem...

Title: Re: The Optimization Compilation
Post by: aeTIos on June 09, 2011, 10:44:35 am

uh, data(2,3,0) O.o?

Title: Re: The Optimization Compilation
Post by: squidgetx on June 09, 2011, 11:47:10 am

That is indeed very strange...(btw aeTIos, the Data(2,3,0) is correct)

Perhaps it may be because {P+D}=0? If it is equal to 0, then the inData() routine will return 3 (which will read as true). You'll have to deal with the zero case separately.

Title: Re: The Optimization Compilation
Post by: Darl181 on June 09, 2011, 02:30:58 pm

Quote from: self

No matter what the value of {P+D} is, it always returns true, be it any number that appears in the level (any num from 0 to 11)

Even when it equaled, say, 8, it would return as true (is that the right way to say it btw) and draw a door.
(what this if statement is part of is the level drawing code, if this line is true then draw a door)

Hm..might the nested For() loops it's inside of have anything to do with it?
EDIT: actually nvm that, it did the same thing in the repeat loop

EDIT2: info axe version 053, running on an 84pbe running os2.43. Has extra ram if that makes a diff :P

Title: Re: The Optimization Compilation
Post by: Runer112 on June 09, 2011, 04:56:55 pm

I tested that exact inData() situation and got the expected result of 0. All I can think of is either P+D is pointing to the wrong place or {P+D} equals zero, which will throw off the inData() command.

Anyway, if you want to test if the value equals 2 or 3, you can test much more efficiently with the following code:

Code: [Select]

!If {P+D}/2-1
  .{P+D}=2 or 3
End

Title: Re: The Optimization Compilation
Post by: Darl181 on June 09, 2011, 06:25:37 pm

I have something in place that works atm, but it seems wherever I use inData it does the same thing (doesn't work). Just in that one program..

Title: Re: The Optimization Compilation
Post by: calc84maniac on July 14, 2011, 11:02:47 pm

In Axe 1.0.0, new compiler optimizations allow you to further optimize If statements comparing to single-byte constants!
Whereas before you would do something like !If A-7, now you can save a byte and 10 clock cycles by doing !If A xor 7

Title: Re: The Optimization Compilation
Post by: shmibs on July 14, 2011, 11:24:28 pm

=D i'm amazed that axe can just continually get faster like it has been for months now.

Title: Re: The Optimization Compilation
Post by: Happybobjr on October 20, 2011, 10:19:51 pm

don't forget about just *2

Title: Re: The Optimization Compilation
Post by: squidgetx on October 21, 2011, 07:34:31 pm

^^Sorry, what? Explain a little more, please

This is getting out-dated :( but I haven't used 1.0.5 enough to figure out all the auto opts and new opts that can be (ab)used :P

Title: Re: The Optimization Compilation
Post by: Happybobjr on October 21, 2011, 08:12:21 pm

just like....
0->A+1->B
is more optimized than
0->A
1->B
....
4->A * 2->B
is more optimized than
4->A
8->B

||||||||||||||||||||||||

also, i think 'number less than 256' /256 might be more efficient than just 0

Title: Re: The Optimization Compilation
Post by: calc84maniac on October 21, 2011, 09:03:55 pm

Quote from: Happybobjr on October 21, 2011, 08:12:21 pm

also, i think 'number less than 256' /256 might be more efficient than just 0

Actually no, both of those are 3 bytes. But 'number less than 256' and 0 is only 2 bytes and it does the same thing :)

Title: Re: The Optimization Compilation
Post by: epic7 on November 11, 2011, 10:37:46 pm

I'm going to see how many bytes I can save now.

Title: Re: The Optimization Compilation
Post by: kindermoumoute on January 04, 2012, 04:09:47 pm

Currently, is this tutorial efficient yet ?

Title: Re: The Optimization Compilation
Post by: squidgetx on January 04, 2012, 04:18:09 pm

Umm it's a little outdated atm, though much of it is still OK.
Axe's peephole optimizer removes the need for things like

Code: [Select]

If A+1->A instead of

Code: [Select]

A+1->A
If A

Also DispGraph is just as fast as DispGraph^r
I'm planning to sit down sometime in the near future and update this a bit

Title: Re: The Optimization Compilation
Post by: squidgetx on January 04, 2012, 05:12:12 pm

Done.

For those of you who don't want to scan through, I've made a list of most of the new additions/changes

The HL section is quasi-obsolete
DispGraph^r is no longer faster than DispGraph
The !If A xor EXP structure is now better than the !If A - EXP unless you are using an optimized addition/subrtraction
When storing 0 to a 1 byte variable, and 0-> is better than 0->
I linked to Runer's index thread as well as stole the math opts table from him to be included here.
Otherwise cleaned up and reformatted

Title: Re: The Optimization Compilation
Post by: Builderboy on January 04, 2012, 05:28:37 pm

the 'and 0->' code only works if HL is less than 256

Title: Re: The Optimization Compilation
Post by: squidgetx on January 04, 2012, 10:39:41 pm

Hm, I thought that it doesn't matter because you're only storing the least significant byte into wherever you're storing?

Also I need to add stuff on the Select() command...Maybe later :P Edit: hm, and I want to discuss the structure of loops utilizing addition in place of mulitplication...So much to do so little time D:

Title: Re: The Optimization Compilation
Post by: Darl181 on January 10, 2012, 02:42:18 pm

Is there a way to make the inData() trick work with pointers and/or 16-bit numbers? If used erroneously will it mess up to the point of a crash?

Edit: or is there a different way now with the newer versions of Axe? :P

Title: Re: The Optimization Compilation
Post by: squidgetx on January 10, 2012, 07:50:29 pm

Well it really depends on the circumstances...
As far as 16bit numbers, I don't know for sure but I think not...
And what do you mean 'with pointers'?

Wow I'm so helpful. Haha but can you clarify a little bit?

Title: Re: The Optimization Compilation
Post by: Darl181 on January 10, 2012, 09:22:37 pm

Something like...
If inData({L1},Data(2,4,6,8,0))
..or maybe even...
If inData({L1}+1,Data(3,5,7,9,0))

Title: Re: The Optimization Compilation
Post by: squidgetx on January 14, 2012, 09:37:09 pm

That should work (sorry for the late reply lol)

No harm in trying these things out yourself btw ;) haha

Title: Re: The Optimization Compilation
Post by: Darl181 on January 14, 2012, 09:41:21 pm

I try, but more often than not test programs work differently for me :P and half the time I try random ambitious code weird stuff happens XD