### Author Topic: Unnamed Optimizing Compiler  (Read 426 times)


#### Xeda112358

• Xombie.
• Moderator
• LV12 Extreme Poster (Next: 5000)
• Posts: 4537
• Rating: +711/-6
• meow :3
##### Unnamed Optimizing Compiler
« on: April 17, 2019, 11:30:48 pm »
Hi, for a long time I've wanted to create a compiler (targeting the Z80) that was even better at optimizing. Back when I was first brainstorming a Grammer compiler, I had in mind the issues of existing Z80 C compilers and even Axe. The Z80 really is poorly suited to C, but Axe is actually pretty amazing (as many know). One thing that C, Axe, and Grammer all have in common is a standard way of passing arguments. For example, Axe and Grammer pass the first argument in HL and BC, respectively. This doesn't always produce optimal code, but it is a lot easier to work with.

So my fantasy was writing a program on my computer that tried to find a more optimal way of passing arguments. I haven't managed it, but it is a lovely dream.

I want to put this small, undocumented set of programs out for the public. I wrote it today and there are a bunch of issues, but on some code it works really well.

As an example, here is some input:
Code: [Select]
Disp((2*x+1)*6+x)
Disp((27*x+1)*4+x)
And output:
Code: [Select]
  ld hl,(var_x)
  add hl,hl \ inc l
  ld b,h \ ld c,l \ add hl,hl \ add hl,bc \ add hl,hl
  ld de,(var_x)
  add hl,de
  call disp_uint16
  ld de,27
  ld hl,(var_x)
  call mulHLDE
  inc hl
  add hl,hl \ add hl,hl
  ld de,(var_x)
  add hl,de
  call disp_uint16
irToZ80.db basically contains the conversions from a high-level token to low level assembly. Each routine has input and output registers defined, as well as size and timing to use as metrics for optimization.

I reused tokens.db from another project, all that is important is name, type, and precedence.

If you run ./compile, it passes test.txt through sy.py to convert the source code into an intermediate representation (basically RPN), then passes that file through irToZ80.py to generate Z80 assembly code.

I basically implemented what I think is Dijkstra's shortest-path algorithm to find the solution that minimizes the target metric. In this case, I try to find the code that minimizes (6*size)^2+(speed)^2.
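For the curious, here is a minimal sketch of that kind of best-first search (not the project's actual code; the op names and candidate costs are made up). Since adding a routine can only increase (6*size)^2 + speed^2, the first complete path popped off the priority queue is the cheapest one.

```python
import heapq

def metric(size, time):
    # The target metric from above: (6*size)^2 + (speed)^2
    return (6 * size) ** 2 + time ** 2

def best_code(ir_ops, candidates):
    """ir_ops: list of IR op names (hypothetical).
    candidates: op -> list of (size, time) costs, one per implementation.
    Best-first search: the metric never decreases as a path is extended,
    so the first completed path popped is optimal."""
    heap = [(0, 0, 0, 0)]  # (metric, ops consumed, total size, total time)
    while heap:
        m, i, size, time = heapq.heappop(heap)
        if i == len(ir_ops):
            return m
        for s, t in candidates[ir_ops[i]]:
            ns, nt = size + s, time + t
            heapq.heappush(heap, (metric(ns, nt), i + 1, ns, nt))
    return None
```

With two implementations of a hypothetical "load" op, the search correctly prefers the small-but-slow one when the size penalty dominates.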

#### Xeda112358

• Xombie.
• Moderator
• LV12 Extreme Poster (Next: 5000)
• Posts: 4537
• Rating: +711/-6
• meow :3
##### Re: Unnamed Optimizing Compiler
« Reply #1 on: April 20, 2019, 10:24:33 am »
I'm starting to switch over to using an Abstract Syntax Tree (AST) for processing, and it is honestly more fun than I thought it would be. The code for optimizing the subexpressions is much easier. For example, x*2^n ==> x<<n, or x+1 converted to x++ (in this case an increment, not increment and assignment).
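Those subexpression rewrites are pleasantly simple on an AST. A toy sketch (assumed tuple-based node structure, not the project's actual classes) of the two rules mentioned:

```python
# Nodes are ("op", left, right) tuples; leaves are variable names or ints.

def rewrite(node):
    if not isinstance(node, tuple):
        return node
    op, a, b = node
    a, b = rewrite(a), rewrite(b)  # rewrite children bottom-up first
    # x * 2^n  ==>  x << n  (power-of-two strength reduction)
    if op == "*" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
        return ("<<", a, b.bit_length() - 1)
    # x + 1  ==>  x++  (an increment here, not increment-and-assign)
    if op == "+" and b == 1:
        return ("++", a, None)
    return (op, a, b)
```

Because children are rewritten first, nested cases compose: (x+1)*4 becomes (x++)<<2 in one pass.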

I started back up the bad habit of hardcoding some things just to get it to work, so I'll need to work on that!

The current big issue is with more complicated expressions like (x+y)+(x+1). This doesn't look like it should be complicated, but it basically evaluates this tree:
Code: [Select]
    Disp
     |
     +
    / \
   /   \
  ++    +
  |    / \
  x   /   \
     y     x
It freezes up trying to compile the subtrees while getting the outputs in the right order. This is due to how terribly I hacked together the code that tracked input and output. My goal is to come up with a well-defined way to pass and receive arguments (including allowing stack and variable juggling as glue code).

#### Art_of_camelot

• ಠ_ಠ ( ͡° ͜ʖ ͡°)
• CoT Emeritus
• LV13 Extreme Addict (Next: 9001)
• Posts: 6164
• Rating: +191/-9
• YouTube channel has my solo work and collaboration
##### Re: Unnamed Optimizing Compiler
« Reply #2 on: April 20, 2019, 09:19:57 pm »
Looks pretty cool Xeda. Somewhat unrelated, but I was looking up info on Game Boy programming (the Game Boy uses a Z80 clone +/- some things) and apparently you can do C on it via SDCC (or so I've read). Not sure how efficient or fast it is, but it's a thing.

#### Xeda112358

• Xombie.
• Moderator
• LV12 Extreme Poster (Next: 5000)
• Posts: 4537
• Rating: +711/-6
• meow :3
##### Re: Unnamed Optimizing Compiler
« Reply #3 on: April 21, 2019, 03:45:43 pm »
Quote from: Art_of_camelot
Looks pretty cool Xeda. Somewhat unrelated, but I was looking up info on Game Boy programming (the Game Boy uses a Z80 clone +/- some things) and apparently you can do C on it via SDCC (or so I've read). Not sure how efficient or fast it is, but it's a thing.
SDCC produces working, but somewhat inefficient code. One of the lofty goals for this project is to be able to convert LLVM IR code to (efficient) Z80 assembly, for which there are existing C-->LLVM IR programs.

I know jacobly has done some pretty good work on llvm-z80, producing some pretty nice code, but there are some shortcomings still.

EDIT: I forgot to include an update; I'm tired.

I'm running into a problem with the AST method. I'm trying to compile directly from the AST, and while the code seems correct, it isn't as efficient. The reason for this is that I haven't figured out an analogous algorithm that will work on ASTs without needlessly searching the entire set of potential codes. Most operations have four to eight implementations using different inputs/outputs. So, even compiling a short 9-token expression like (4+x)*3+7-y could have hundreds of thousands to millions of potential paths. The previous algorithm is able to drastically reduce the search space.

That said, I'm still able to get more clever with algorithmic optimizations. For example, if you try something like 3*(x+1)+5*(x+1), it will be optimized to 8*(x+1) and then (x+1)<<3, producing the code:
Code: [Select]
 ld hl,(x)
 inc hl
 add hl,hl
 add hl,hl
 add hl,hl
 call disp_uint16
However, 3*(x+1)+9*(x+1) will convert to 12*(x+1) and try to make a routine faster than calling multiplication since only two bits are set in 12, producing this code:
Code: [Select]
 ld hl,(x)
 inc hl
 ld bc,(x)
 inc bc
 sla c
 rl b
 add hl,bc
 add hl,hl
 add hl,hl
 call disp_uint16
With the other method, this code would have compiled 12*(x+1) to something like:
Code: [Select]
 ld de,(x)
 inc de
 ld hl,(x)
 inc hl
 add hl,hl
 add hl,de
 add hl,hl
 add hl,hl
 call disp_uint16
It still isn't optimal since neither algorithm keeps track of the current state aside from input and output, but the latter code tries more code paths, even if they start out suboptimal. The AST-based algorithm makes local optimizations, while the previous algorithm took into account the whole code.

For reference, this is how I, a human (I swear), would implement it:
Code: [Select]
 ld hl,(x)
 inc hl
 ld b,h
 ld c,l
 add hl,hl
 add hl,bc
 add hl,hl
 add hl,hl
 call disp_uint16
The main difference is that I take advantage of the fact that x+1 is held in HL, so I don't need to compute it again to put it in BC or DE. This is because I am taking into account "state." That is, I know that HL already contains the value I want, whereas the compilers currently only know that HL holds some value.
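The shift-and-add plan behind that hand-written code can be derived mechanically from the bits of the constant. A hypothetical sketch (using DE for the saved copy rather than BC, and Horner's scheme over the multiplier's bits, MSB first):

```python
def mul_const_plan(n):
    """Plan HL = n * HL for n >= 1, keeping a copy of the original
    value in DE. Horner's scheme over the bits of n, MSB first:
    double for every bit, and add the saved copy when the bit is set."""
    steps = ["ld d,h", "ld e,l"]       # save a copy of HL in DE
    bits = bin(n)[2:]
    for bit in bits[1:]:               # leading 1-bit: value already in HL
        steps.append("add hl,hl")      # HL *= 2
        if bit == "1":
            steps.append("add hl,de")  # HL += original value
    return steps
```

For n=12 (binary 1100) this yields double, add, double, double, which computes 3x then 12x, matching the structure of the hand-written version above.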

One thing that I learned about in looking into LLVM is the idea of Static Single Assignment (SSA) form, which I think will make it a lot easier to track state and optimize accordingly, providing a great improvement in code quality. The challenge with that is trying to optimally allocate variables, using registers, stack space, or RAM.
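The core of the SSA idea fits in a few lines. A small sketch (hypothetical representation, not the project's): every assignment creates a fresh version of the variable, so every later use names exactly one definition, which is what makes state tracking easy.

```python
def to_ssa(stmts):
    """stmts: list of (target, expr) pairs for straight-line code.
    Exprs are ints, variable-name strings, or ("op", a, b) tuples."""
    version = {}
    out = []
    def ren(e):
        # Rename variable uses to their current SSA version.
        if isinstance(e, str) and e in version:
            return f"{e}{version[e]}"
        if isinstance(e, tuple):
            return tuple(ren(x) for x in e)
        return e
    for tgt, expr in stmts:
        expr = ren(expr)                          # uses see the old version
        version[tgt] = version.get(tgt, -1) + 1   # definition gets a new one
        out.append((f"{tgt}{version[tgt]}", expr))
    return out
```

So x -> 3; x -> x+1; y -> x*2 becomes x0 = 3; x1 = x0+1; y0 = x1*2, and a later pass can see that the "x" feeding y0 is precisely x1, wherever it happens to live.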
« Last Edit: April 21, 2019, 04:16:53 pm by Xeda112358 »

#### Xeda112358

• Xombie.
• Moderator
• LV12 Extreme Poster (Next: 5000)
• Posts: 4537
• Rating: +711/-6
• meow :3
##### Re: Unnamed Optimizing Compiler
« Reply #4 on: April 23, 2019, 11:23:08 am »
I finally compiled a real program!

With this new iteration of the program, I am using an AST to perform algorithmic optimizations, but I am compiling with a modified version of the original method. Now it is producing better code that is (importantly) correct. My first attempt with this version got really, really slow on slightly complicated inputs; I terminated the process after two hours trying to compile one line (it wasn't an infinite loop, either; the search space got huge).

So I modified it to allow limiting the number of paths it keeps in memory. Keeping a single path would produce code like the previous version, but I found that 60-100 paths produce decent code, while 200 produces better code.
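That path-limiting idea is essentially a beam search. A hedged sketch (names are made up; each candidate routine is (name, size, time), scored by the same metric as before):

```python
def beam_compile(ir_ops, candidates, max_paths=150):
    """Keep only the best max_paths partial code sequences per step,
    ranked by (6*size)^2 + time^2. Wider beams find better code."""
    paths = [(0, 0, [])]  # (total size, total time, chosen routine names)
    for op in ir_ops:
        nxt = []
        for size, time, chosen in paths:
            for name, s, t in candidates[op]:
                nxt.append((size + s, time + t, chosen + [name]))
        # Prune: keep only the most promising partial paths.
        nxt.sort(key=lambda p: (6 * p[0]) ** 2 + p[1] ** 2)
        paths = nxt[:max_paths]
    return paths[0][2]
```

The tradeoff is visible on tiny examples: a width-1 beam greedily keeps whichever prefix looks best now, while a wider beam can keep a prefix that looks worse locally but wins once later costs are added.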

It *still* doesn't track state, so there are lots of unnecessary ld hl,() and whatnot.

Anyways, here is what compiling via the commandline might look like:
Code: [Select]
#Convert the code to an intermediate representation
python sy.py test.txt
#Convert the intermediate representation to Z80 assembly, with the ti-83+ family header
python compile.py test.ir -TI8X -MAX_PATHS=150
#Compile
spasm test.asm bin/test.8xp -I inc -I routines
It will take a few seconds to generate the assembly, but it works! Attached is a screenshot of the following code:
Code: [Select]
3->x
6->y
Disp(3*(2*x+y*3)+9*(x+y))
Disp(1337)
Disp(3*(x+y)+9*(x+y))

#### Eeems

• Mr. Dictator
• LV13 Extreme Addict (Next: 9001)
• Posts: 6156
• Rating: +318/-36
• C'est la vie
##### Re: Unnamed Optimizing Compiler
« Reply #5 on: April 23, 2019, 11:30:23 am »
I'm excited to see where you go with this
I sure hope you put this in source control somewhere and allow us to submit pull requests if we want to help you with it.
/e

#### the_mad_joob

• Posts: 287
• Rating: +42/-0
##### Re: Unnamed Optimizing Compiler
« Reply #6 on: April 23, 2019, 11:43:09 pm »
Quote from: Eeems
I'm excited to see where you go with this
With that "Disp(1337)", not far...


GL xedy
« Last Edit: April 23, 2019, 11:50:15 pm by the_mad_joob »
"No human is trustworthy, not even me..." - the_mad_joob

#### Xeda112358

• Xombie.
• Moderator
• LV12 Extreme Poster (Next: 5000)
• Posts: 4537
• Rating: +711/-6
• meow :3
##### Re: Unnamed Optimizing Compiler
« Reply #7 on: April 24, 2019, 01:28:08 pm »
I put it up on GitHub. I added the GotoIf() function, which is really important for a programming language. From this, I'll build blocks like If, While, For, etc.
Here is input code:
Code: [Select]
4->k
0->x
1->y
lbl0:
  x+y->z
  x+1->x
  y+1->y
  Disp(10*z)
  k-1->k
  GotoIf(lbl0,k!=0)
2->x
Disp(x)
And attached is a screenshot of what the program displays.
As cool as that is, the generated assembly code is pretty ugly:
Code: [Select]
#include "jq.inc"
#include "ti83plus.inc"
scrap           = 8000h
var_k           = scrap+0
var_x           = scrap+2
var_y           = scrap+4
var_z           = scrap+6
.db $BB,$6D
.org $9D95
 ld hl,4
 ld (var_k),hl
 ld hl,0
 ld (var_x),hl
 ld hl,1
 ld (var_y),hl
lbl0:
 ld hl,(var_x)
 ld de,(var_y)
 add hl,de
 ld (var_z),hl
 ld hl,(var_x)
 inc hl
 ld (var_x),hl
 ld hl,(var_y)
 inc hl
 ld (var_y),hl
 ld hl,(var_z)
 add hl,hl
 add hl,hl
 ld de,(var_z)
 add hl,de
 add hl,hl
 call disp_uint16
 ld hl,(var_k)
 dec hl
 ld (var_k),hl
 ld a,h
 or l
 jqnz(lbl0)
 ld hl,2
 ld (var_x),hl
#include "disp_uint16.z80"
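Building If and While out of GotoIf is mostly label bookkeeping. A sketch of two such lowerings, emitting the source syntax used in the example above (the label names here are arbitrary, and the functions are hypothetical, not part of the project):

```python
def lower_do_while(body_lines, cond, label="lbl0"):
    # Bottom-tested loop: run the body, then jump back while cond holds.
    return ([f"{label}:"]
            + [f"  {line}" for line in body_lines]
            + [f"  GotoIf({label},{cond})"])

def lower_if(cond_false, body_lines, label="lbl_end0"):
    # If: skip the body when the *inverted* condition holds.
    return ([f"GotoIf({label},{cond_false})"]
            + body_lines
            + [f"{label}:"])
```

The example program's loop body is exactly what lower_do_while would produce for GotoIf(lbl0,k!=0).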

#### Art_of_camelot

• ಠ_ಠ ( ͡° ͜ʖ ͡°)
• CoT Emeritus
• LV13 Extreme Addict (Next: 9001)
• Posts: 6164
• Rating: +191/-9
• YouTube channel has my solo work and collaboration
##### Re: Unnamed Optimizing Compiler
« Reply #8 on: May 09, 2019, 09:38:53 pm »
Very cool stuff!