Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life



[prev]
[next]

2021-04-25
A multi-platform "compiler"

All source code for this blog post can be found here.

Given enough time, every blog eventually programs a Brainfuck compiler. We can do that too. But we'll make it worth our while. Let's create a Brainfuck compiler that has multiple backends for different CPUs. We'll create a non-optimizing frontend and then backends for amd64, i386, Z80, and a generic C backend for all other CPUs. The compiler will automagically know what the native platform is on the machine that built it. It will also carry all other backends at all times and allow the user to cross compile with a simple command line switch.

I wrote the compiler in C. It's fairly simple to port to your favorite language.

Brainfuck

Brainfuck is an esoteric language invented in 1993 by Urban Müller. It is famous for being extremely simple to implement: the entire language contains only eight meaningful symbols, all of which are a single character. Anything outside those eight characters are considered comments. The mental model for us is not too dissimilar to a simple Turing Machine, complete with an "infinite" tape and the ability to move to any arbitrary cell, make any arbitrary cell contain any arbitrary value, and make jumping decisions based on if the current cell contains a value of 0.

Here is a table of the symbols in Brainfuck:

Let's code that up. The work can be found starting on line 175 of bfc.c. You'll notice that all of those functions are little more than shims that call the cg_ prefixed versions of the functions. Indeed, this helps with code organization: it lets us hide away the complexity of figuring what platform to generate code for to another C file, cg.c.

This is the entirety of our parser. I trust that if you are able to understand the parser from our Z80 assembler, then you will be able to understand this much simpler parser. The simplicity of this parser means that our backends will be non-optimizing, with the exception of the generic C backend, which will optimize because the C compiler will optimize the generated C code. This does mean the C backend is the best backend in terms of optimization. But our goal here is to learn how we can easily create multiple backends for a larger compiler suite.

Why not an Intermediate Language?

In real production compilers, you often see code compiled to an intermediate language, and from there moved into target-specific assembly (if GCC) or target-specific object code (if LLVM). Yes, intermediate code allows for a lot of powerful optimizations. Simple first, though.

You could try your hand at creating your own intermediate language, as you already have some of the tools to do so. You could, for example, create your own generic assembly language that your frontend targets, and from there create an assembler that, instead of creating a binary file like we did before, would instead output target-specific assembly.

Determining code generation

We keep a list of targets in bfc.h. The number attached to any individual backend does not matter so much, though for organization I decided the generic C backend should occupy the highest number (255) and the CPU-specific backends should count up to that number. It turns out we need ten code generation functions: one for each of the eight Brainfuck symbols plus startup and ending. The only logic we need in cg.c is to put a switch in each function to move us into the correct target code generation function. Each backend can then occupy its own C file for clear organization.

Writing backends

I began with amd64 for our first backend target. The code organization within amd64.c is exactly the same as cg.c except now we will write the assembly that corresponds to the Brainfuck symbol.

The i386 backend came next. I copied amd64.c to i386.c and performed a direct translation of the assembly from 64-bit to 32-bit.

Next came the generic C backend. This one is different from all the others since it is not true assembly code. The big difference is that to make C look good requires different indentation rules than assembly. Since I was making sure the assembly printed nicely for human consumption, I chose to do the same for the generated C code.

Last was the Z80 backend. This backend generates code for our assembler. It may not work with other Z80 assemblers out of the box, since I rely on a hidden extension in our assembler: allowing labels to begin with any non-numeric character, in this case, a period.

Configuring the native backend

The compiler comes with a simple configure script that determines the native backend for this machine. Remember that the compiler is always a cross compiler, and compiling for a different platform only requires the addition of a command line -t flag.

Conclusion

A Brainfuck compiler is about as simple as you can get for a compiler. Certainly a whole lot less eventful compared to our assembler!

Even so, we can see there are some interesting considerations to allow our frontend to target multiple backends. Perhaps this can be the beginning of our own backend system. Whenever I want to learn a new assembly language, I usually start by coding up a backend for an esoteric language. It helps me get familiar with the syntax and understand how memory is referenced and manupulated.

You should feel free to add additional backends for more CPUs. To get an idea of the amount of work involved, see the Z80 backend commit.

Top

RSS