Author Topic: Unnamed Optimizing Compiler (Read 7464 times)

Xeda112358 · « **on:** April 17, 2019, 11:30:48 pm »

Hi, for a long time I've wanted to create a compiler (targeting the Z80) that was even better at optimizing. Back when I was first brainstorming a Grammer compiler, I had in mind the issues of existing Z80 C compilers and even Axe. The Z80 really is poorly suited to C, but Axe is actually pretty amazing (as many know). One thing that C, Axe, and Grammer all have in common is a standard way of passing arguments. For example, in the latter cases, the first arguments are passed in HL and BC (respectively). This doesn't always produce optimal code, but it is a lot easier to work with.

So my fantasy was writing a program on my computer that tried to find a more optimal way of passing arguments. I haven't managed it, but it is a lovely dream

I want to put this small, undocumented set of programs out for the public. I wrote it today and there are a bunch of issues, but on some code it works really well.

As an example, here is an example of input:

Code: [Select]

Disp((2*x+1)*6+x)
Disp((27*x+1)*4+x)

And output:

Code: [Select]

  ld hl,(var_x)
  add hl,hl \ inc l
  ld b,h \ ld c,l \ add hl,hl \ add hl,bc \ add hl,hl
  ld de,(var_x)
  add hl,de
  call disp_uint16
  ld de,27
  ld hl,(var_x)
  call mulHLDE
  inc hl
  add hl,hl \ add hl,hl
  ld de,(var_x)
  add hl,de
  call disp_uint16

irToZ80.db basically contains the conversions from a high-level token to low level assembly. Each routine has input and output registers defined, as well as size and timing to use as metrics for optimization.

I reused tokens.db from another project, all that is important is name, type, and precedence.

If you run ./compile, it passes test.txt through sy.py to convert the source code into an intermediate representation (basically RPN), then passes that file through irToZ80.py to generate Z80 assembly code.

I basically implemented what I think is Djikstra's Shortest Path algorithm to find the solution that minimizes the target metric. In this case, I try to find the code that minimizes (6*size)^2+(speed)^2.

Xeda112358 · « **Reply #1 on:** April 20, 2019, 10:24:33 am »

I'm starting to switch over to using an Abstract Syntax Tree (AST) for processing, and it is honestly more fun than I thought it would be. The code for optimizing the subexpressions is much easier. For example, x*2^n ==> x<<n, or x+1 converted to x++ (in this case an increment, not increment and assignment).

I started back up the bad habit of hardcoding some things just to get it to work, so I'll need to work on that!

The current big issue is with more complicated expressions like (x+y)+(x+1). This doesn't look like it should be complicated, but it basically evaluates this tree:

Code: [Select]

    Disp
     |
     +
    / \
   /   \
  ++    +
  |    / \
  x   /   \
     y     x

It freezes up trying to compile the subtrees while getting the outputs in the right order. This is due to how terribly I hacked together the code that tracked input and output. My goal is to come up with a well-defined way to pass and receive arguments (including allowing stack and variable juggling as glue code

)

TIfanx1999 · « **Reply #2 on:** April 20, 2019, 09:19:57 pm »

Looks pretty cool Xeda. Somewhat unrelated, but I was looking up info on Game Boy programming (the Game Boy uses a Z80 clone +/- some things) and apparently you can do C on it via SDCC (or so I've read). Not sure how efficient it is or fast, but it's a thing.

Xeda112358 · « **Reply #3 on:** April 21, 2019, 03:45:43 pm »

Quote from: Art_of_camelot on April 20, 2019, 09:19:57 pm

Looks pretty cool Xeda. Somewhat unrelated, but I was looking up info on Game Boy programming (the Game Boy uses a Z80 clone +/- some things) and apparently you can do C on it via SDCC (or so I've read). Not sure how efficient it is or fast, but it's a thing.

SDCC produces working, but somewhat inefficient code. One of the lofty goal for this project is to be able to convert LLVM IR code to (efficient) z80 assembly, for which there are existing C-->LLVM IR programs.

I know jacobly has done some pretty good work on llvm-z80, producing some pretty nice code, but there are some shortcomings still.

EDIT: I forgot to update

I'm tired.

I'm running into a problem with the AST method. I'm trying to compile directly from the AST, and while the code seems correct, it isn't as efficient. The reason for this is that I haven't figured out an analogous algorithm that will work on ASTs without needlessly searching the entire set of potential codes. Most operations have four to eight implementations using different inputs/outputs. so, even compiling a short 9-token expression like (4+x)*3+7-y could have hundreds of thousands to millions of potential paths. The previous algorithm is able to drastically reduce the search space.

That said, I'm still able to get more clever with algorithmic optimizations. For example, if you try something like 3*(x+1)+5*(x+1), it will be optimized 8*(x+1) and then (x+1)<<3, producing the code:

Code: [Select]

 ld hl,(x)
 inc hl
 add hl,hl
 add hl,hl
 add hl,hl
 call disp_uint16

However, 3*(x+1)+9*(x+1) will convert to 12*(x+1) and try to make a routine faster than calling multiplication since only two bits are set in 12, producing this code:

Code: [Select]

 ld hl,(x)
 inc hl
 ld bc,(x)
 inc bc
 sla c
 rl b
 add hl,bc
 add hl,hl
 add hl,hl
 call disp_uint16

With the other method, this code would have compiled 12(x+1) to something like:

Code: [Select]

 ld de,(x)
 inc de
 ld hl,(x)
 inc hl
 add hl,hl
 add hl,de
 add hl,hl
 add hl,hl
 call disp_uint16

It still isn't optimal since neither algorithm keeps track of the current state aside from input and output, but the latter code tries more code paths, even if they start out suboptimal. The AST-based algorithm makes local optimizations, while the previous algorithm took into account the whole code.

For reference, this is how I, a human (I swear), would implement it:

Code: [Select]

 ld hl,(x)
 inc hl
 ld b,h
 ld c,l
 add hl,hl
 add hl,bc
 add hl,hl
 add hl,hl
 call disp_uint16

The main difference is that I take advantage of the fact that X+1 is held in HL, so I don't need to compute it again to put in BC or DE. This is because I am taking into account "state." That is, I know that HL already contains the value I want, whereas the compilers currently only know that HL holds some value.

One thing that I learned about in looking into LLVM is the idea of Static Single Assignment (SSAs), which I think will make it a lot easier to track state and optimize accordingly, providing a great improvement in code quality. The challenge with that is trying to optimally allocate variables, using registers, stack space, or RAM.

Xeda112358 · « **Reply #4 on:** April 23, 2019, 11:23:08 am »

I finally compiled a real program!

With this new iteration of the program, I am using an AST to perform algorithmic optimizations, but I am compiling with a modified version of the original method. Now it is producing better code that is (importantly) correct. My first attempt with this version got really, really slow on slightly complicated inputs-- I terminated the process after two hours trying to compile one line (it wasn't an infinite loop, either; the search space got huge).

So I modified it to allow limiting the amount of paths it kept in memory Keeping a single path would produce code like the previous version, but I found that 60-100 would produce decent code, while 200 produces better code.

It *still* doesn't track state, so there are lots of unnecessary ld hl,() and whatnot.

Anyways, here is what compiling via the commandline might look like:

Code: [Select]

#Convert the code to an intermediate representation
python sy.py test.txt

#Convert the intermediate representation to Z80 assembly, with the ti-83+ family header
python compile.py test.ir -TI8X -MAX_PATHS=150

#Compile
spasm test.asm bin/test.8xp -I inc -I routines

It will take a few seconds to generate the assembly, but it works! Attached is a screenshot of the following code:

Code: [Select]

3->x
6->y
Disp(3*(2*x+y*3)+9*(x+y))
Disp(1337)
Disp(3*(x+y)+9*(x+y))

Eeems · « **Reply #5 on:** April 23, 2019, 11:30:23 am »

I'm excited to see where you go with this

I sure hope you put this in source control somewhere and allow us to submit pull requests if we want to help you with it.

the_mad_joob · « **Reply #6 on:** April 23, 2019, 11:43:09 pm »

Quote from: Eeems on April 23, 2019, 11:30:23 am

I'm excited to see where you go with this

With that "Disp(1337)", not far...

#####

GL xedy

Xeda112358 · « **Reply #7 on:** April 24, 2019, 01:28:08 pm »

I put it up on GitHub. I added the GotoIf() function, which is really important for a programming language. From this, I'll build blocks like If, While, For, etc.
Here is input code:

Code: [Select]

4->k
0->x
1->y
lbl0:
  x+y->z
  x+1->x
  y+1->y
  Disp(10*z)
  k-1->k
  GotoIf(lbl0,k!=0)
2->x
Disp(x)

And attached is a screenshot of what the program displays.
As cool as that is, the generated assembly code is pretty ugly:

Code: [Select]

#include "jq.inc"
#include "ti83plus.inc"
scrap           = 8000h
var_k           = scrap+0
var_x           = scrap+2
var_y           = scrap+4
var_z           = scrap+6
.db $BB,$6D
.org $9D95
 ld hl,4
 ld (var_k),hl
 ld hl,0
 ld (var_x),hl
 ld hl,1
 ld (var_y),hl
lbl0:
 ld hl,(var_x)
 ld de,(var_y)
 add hl,de
 ld (var_z),hl
 ld hl,(var_x)
 inc hl
 ld (var_x),hl
 ld hl,(var_y)
 inc hl
 ld (var_y),hl
 ld hl,(var_z)
 add hl,hl
 add hl,hl
 ld de,(var_z)
 add hl,de
 add hl,hl
 call disp_uint16
 ld hl,(var_k)
 dec hl
 ld (var_k),hl
 ld a,h
 or l
 jqnz(lbl0 )
 ld hl,2
 ld (var_x),hl
#include "disp_uint16.z80"

TIfanx1999 · « **Reply #8 on:** May 09, 2019, 09:38:53 pm »

Very cool stuff!

Author Topic: Unnamed Optimizing Compiler (Read 7464 times)

Xeda112358

Unnamed Optimizing Compiler

Xeda112358

Re: Unnamed Optimizing Compiler

TIfanx1999

Re: Unnamed Optimizing Compiler

Xeda112358

Re: Unnamed Optimizing Compiler

Xeda112358

Re: Unnamed Optimizing Compiler

Eeems

Re: Unnamed Optimizing Compiler

the_mad_joob

Re: Unnamed Optimizing Compiler

Xeda112358

Re: Unnamed Optimizing Compiler

TIfanx1999

Re: Unnamed Optimizing Compiler