Author Topic: The Optimization Compilation (Read 91007 times)

Happybobjr · « **Reply #30 on:** January 03, 2011, 07:13:12 pm »

is this an optimization?

if x=5
0->A
end

to

If x=5
-1->A (minus)
end

Deep Toaster · « **Reply #31 on:** January 03, 2011, 07:20:06 pm »

Quote from: happybobjr on January 03, 2011, 07:13:12 pm

is this an optimization?

if x=5
0->A
end

to

If x=5
-1->A (minus)
end

Yep, it is, and really useful, too.

It's even better to do

Code: (Axe) [Select]

:!If A-5
:→A
:End

Happybobjr · « **Reply #32 on:** January 03, 2011, 07:21:29 pm »

i'm an idiot sometimes.
thank you very much

[edit]
also,
would this be better?
?*256

to

?*2*2*2*2*2*2*2*2

and.
?/8
to
?/2/2/2

Thanks in advance.
[/edit]

ztrumpet · « **Reply #33 on:** January 03, 2011, 07:29:07 pm »

That would make it faster, but the code would be bigger.

Deep Toaster · « **Reply #34 on:** January 03, 2011, 07:31:29 pm »

Quote from: happybobjr on January 03, 2011, 07:21:29 pm

i'm an idiot sometimes.
thank you very much

[edit]
also,
would this be better?
?*256

to

?*2*2*2*2*2*2*2*2

and.
?/8
to
?/2/2/2

Thanks in advance.
[/edit]

Nope, Axe already does the most optimized form. Look in the optimizations .txt for more info.

squidgetx · « **Reply #35 on:** January 03, 2011, 08:31:51 pm »

Just ran some tests:

Code: [Select]

.TEST
For(A,0,50000)
*2*2*2*2*2*2*2*2
End

vs

Code: [Select]

.TEST
For(A,0,50000)
*256
End

The first completed in about 1.8 seconds, in a 50 byte size program
The second completed in about 1.3 seconds in a 45 byte size program

HOWEVER

Code: [Select]

.TEST
For(A,0,50000)
*2*2*2*2*2*2*2
End

completed in 1.8 seconds while

Code: [Select]

.TEST
For(A,0,50000)
*128
End

completed in 2.4 seconds

Seems like that trick only works on powers of 2 that are even?

I don't have much time to run more tests, can anyone confirm this?

DJ Omnimaga · « **Reply #36 on:** January 03, 2011, 08:50:43 pm »

Interesting stuff, the multiplication one seems weird O.o

Quigibo · « **Reply #37 on:** January 03, 2011, 09:58:59 pm »

Only *64 and *128 are more optimized as a bunch of *2's but it will make them larger by 1 and 2 bytes respectively. You can always look at the commands.inc file to see exactly what routines are used and how big they are. Also, I think *256/2 is faster than *128 but that only works if the original number is between 0 to 32,767.

Runer112 · « **Reply #38 on:** January 03, 2011, 11:07:14 pm »

Science lesson! YAY

(By the way I completely understand if you don't want to read this whole thing, it's a bit lengthy. I'll put the important parts in bold.)

All Axe operations rely on using the register pair hl as the "Ans" value, using it to hold the running value of calculations. For those who aren't fully familiar with z80 assembly, hl is a combination of h and l, two 8-bit (1-byte) "registers." Registers are like variables you might store in memory, but they're stored directly inside of the the processor, so they can be used quickly. The basic registers (a, b, c, d, e, h, and l) are all 8-bit values, so most commands were built to work with these 8-bit registers, hence the z80 being an 8-bit system. However, Zilog knew that 8 bits was a little restrictive, and especially for systems that would have more than 256 bytes of memory, being able to easily use and manipulate 16-bit values (like pointers) would be very helpful. Did you notice that the other 5 basic registers go in alphabetical order before randomly jumping to h and l? Well that's because h and l are two very special 8-bit registers, designed to easily be combined into the Higher and Lower halves of a 16-bit value. With this special designation come very useful 16-bit operations built-in.

*2, for instance, simply breaks down to one assembly instruction: add hl,hl. This simply adds the value of hl to itself, which in other words multiplies it by 2. Because Zilog knew a basic function like adding would be a core operation, they made sure to make it small and swift: 1 byte to call and 11 cycles to execute.

*256 is a multiplication by 2^8, so one could achieve this by adding hl to itself 8 times. But there's an easier way to think of this. Just like how multiplication by 10 in a decimal system shifts every digit left one place, multiplication by 2 in a binary system shifts every digit left one place as well. And because hl is a 16-bit value with the high 8 bits in h and the low 8 bits in l, shifting these bits left 8 places would just result in the value in l being shifted all the way out and into h, and 8 trailing zeros being shifted into l. So instead of using add hl,hl eight times, *256 uses the following instructions: ld h,l / ld l,0. The first costs 1 byte and 4 cycles and the second costs 2 bytes and 7 cycles, for a grand total of 3 bytes and 11 cycles. Just as fast as *2!

*128, however, isn't so easy. Again, the obvious approach is to add hl to itself 7 times. This would cost 7 bytes and 77 cycles. You may also think to use the previous technique to multiply hl by 256, and then divide it by 2. However, we have a problem (pun not intended). If we multiplied the value by 256 and then divided it by 2, we would have lost the highest bit from the multiplication by 256 before dividing it by 2 again! So that's a bit of a problem. Anyways, Axe is optimized for size, and we can do better: using a loop. And although the z80 is a pretty old system, they were nice enough to give us a built-in loop structure: ld b,7 / add hl,hl / djnz $-1. This loads 7 into the b register, adds hl to hl, and then repeats adding hl to hl 6 more (b-1) times. Although this is a good amount slower than either of the previous options, coming in at 170 cycles, it only takes 2 bytes to initialize the loop counter, 1 byte for the add instruction, and 2 bytes for the loop execution instruction, for a total of 5 bytes.

Sorry to bore you... But congratulations, you now all know at least a little bit about z80 assembly, the structure of compiled Axe code! And the more you know about Axe's internals, the more you can optimize it, whether it be for speed or size!

EDIT: Wow, it took me a whole hour to write this? Major ninja'd.

Deep Toaster · « **Reply #39 on:** January 03, 2011, 11:20:21 pm »

Quote from: squidgetx on January 03, 2011, 08:31:51 pm

Just ran some tests:

Just seeing those numbers (For(A,0,50000)) shows how fast Axe is compared to BASIC

And that's a really clear explanation, Runer! Hopefully it'll help people understans and find their own opts.

DJ Omnimaga · « **Reply #40 on:** January 03, 2011, 11:52:43 pm »

I like how in Runer112 post I understand what hl is and some stuff about it easily, while with ASM in 28 days, in my 3 attempts, it gave me a brain f***. I guess your post was explained in a way that suits me more than ASM in 28 days. Thanks for explaining how multiplication works in Axe, though.

Happybobjr · « **Reply #41 on:** January 20, 2011, 01:43:03 pm »

Which is faster?

Pxl-on(X,Y)
Pxl-on(X+1,Y)
Pxl-on(X,Y+1)
Pxl-on(X+1,Y+1)

or

Rect(X,Y,2,,)

Runer112 · « **Reply #42 on:** January 20, 2011, 02:14:46 pm »

The rectangle command will be faster than pixel commands. Assuming equally-distributed onscreen X and Y values, the Pxl-On() method will take an average of 1306.5 cycles, whereas the Rect() method will take an average of 790.5 cycles. Which reminds me that I need to add timing information for commands like Rect() to my commands list.

Happybobjr · « **Reply #43 on:** January 20, 2011, 02:36:34 pm »

Thank you very much.
One more question.

Which is faster.

For(A,0,5)
For(B,0,5)
Pt-On(A*8,B*8,Pic1)
End
End

Or

For(A,0,35)
Pt-On(A/6*8,A^6*8,Pic1)
end

Runer112 · « **Reply #44 on:** January 20, 2011, 02:58:55 pm »

I can say without testing that the first will definitely be faster, as it avoids using division and moduli. The fastest way to achieve this with Pt-On() logic that I can imagine would be the following:

Code: [Select]

5*8
Lbl LY
  →Y
  5*8
  Lbl LX
    →X
    Pt-On(,Y,Pic1)
  If X
    -8
    Goto LX
  End
If Y
  -8
  Goto LY
End

This would also be slightly faster if you used Pt-Off() instead, but I don't know if your situation requires Pt-On() logic. If you're okay with Pt-Off() logic, I could actually give you an even faster way probably.

Author Topic: The Optimization Compilation (Read 91007 times)

Happybobjr

Re: The Optimization Compilation

Deep Toaster

Re: The Optimization Compilation

Happybobjr

Re: The Optimization Compilation

ztrumpet

Re: The Optimization Compilation

Deep Toaster

Re: The Optimization Compilation

squidgetx

Re: The Optimization Compilation

DJ Omnimaga

Re: The Optimization Compilation

Quigibo

Re: The Optimization Compilation

Runer112

Re: The Optimization Compilation

Deep Toaster

Re: The Optimization Compilation

DJ Omnimaga

Re: The Optimization Compilation

Happybobjr

Re: The Optimization Compilation

Runer112

Re: The Optimization Compilation

Happybobjr

Re: The Optimization Compilation

Runer112

Re: The Optimization Compilation