Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life



[prev]
[next]

2021-08-29
Let's get hands-on with QBE

All source code for this blog post can be found here.

Now that we have finished our bootstrap PL/0 compiler, I want to test out a shiny new toy I found that we may select as an important piece of our self-hosting compiler. That shiny new toy is QBE: a small and fast compiler back end that reads in an intermediate language, performs optimizations on it, and outputs assembly code for either amd64 or aarch64. I have heard that riscv64 support is in the works as well. But even just amd64 and aarch64 support will allow us to have a self-hosting compiler that outputs assembly for multiple CPUs. And these days, most of us reading this will be using an amd64 or aarch64 machine most of the time, so just those two platforms should be good enough for us. Seems like it's small enough that you could implement a new assembly outputter for the CPU of your choice as well. In any event, it is claimed to work on Mac OS, FreeBSD, and Linux, and it works on our OpenBSD development platform as well without any modifications, so I think QBE will be a good thing for us to try out and see if we like it.

I began thinking about this based on an email I had received about our compiler, asking if I would consider having the compiler target LLVM intermediate representation. I hadn't considered that. LLVM IR has some additional challenges that we must overcome, most notably its strict SSA form. QBE IR, while it also uses SSA form, is not as strict of a representation as LLVM IR, as we will soon see. QBE is also many orders of magnitude smaller than LLVM and written in C so we could even include a complete copy of the QBE sources with our self-hosting compiler. Even so, QBE claims 70% of the performance of compilers like LLVM in 10% of the code. Are those number true? Well, the 10% of the code certainly seems true. Whether or not we really get 70% of the performance, I am not sure. I am not sure it even matters. Even if we only got half of that in terms of performance, that's still a significant improvement over writing unoptimized assembly code. It also is "more realistic" for some values of "more" and "realistic," as the compilers you are probably using, namely GCC and LLVM, have their front ends compile to an intermediate language. With GCC, it's GENERIC and GIMPLE (and RTL). With LLVM, it's LLVM IR. For us, it might just be QBE IR. Let's try it out, write a small program directly in QBE IR to test things out and see if we like if. If we do, I propose we write two back ends for our self-hosting compiler: we'll start with a C back end and then add a QBE back end.

Remember that this is my first go at writing anything with QBE IR. My understanding is based on reading the QBE documentation and actualizing that understanding in writing this small program. It's very possible and expected that my understanding is imperfect. If you happen to be a QBE expert, let us know how we can improve our understanding. Even with our imperfect knowledge, we will have real usable knowledge of QBE and we will be able to make a decision if we would like to target it as a back end for our self-hosting PL/0 compiler.

Let's write (another) Brainfuck compiler

I find Brainfuck compilers to be a good introductory task when learning a new language: it by no means tests all the different aspects of the language but it does enough work for me to get a feel for if I like a language or not, because you can make a very simple or very complex compiler very compactly. Let's write a simple Brainfuck compiler in QBE IR and see how it goes.

I guess these things also end up as pseudo-tutorials, though I'm by no means an expect. This is the only program I've ever written directly in QBE IR. But I hope following along with me gets you to understand how I evaluate languages to solve a particular problem, and hopefully it helps demystify the QBE IR language itself. I had to start writing code to figure out what was going on but now that I have, I have to say I much like QBE IR, with perhaps minor reservations.

Basic QBE constructs

Comments begin with # and extend to the end of the line.

Functions are written somewhat similar to C. Let's take a look at a direct comparison.

This function in C:

int
main(void)
{

	return 0;
}

Would be written in QBE IR as:

export function w $main() {
@start
	ret 0
}

Names of global variables must begin with $, local labels (called blocks in QBE documentation) must begin with @, and local temporary variables must begin with %. Immediates are written as-is. Types must be prefixed before every variable. There are first-class types for 32-bit integers (w), 64-bit integers (l), single-precision floats (s), and double-precision floats (d). There are additional non-first-class types for bytes (b) and half-words (h), but in practice I only used bytes for string constants; if I wanted to use an individual byte from a string, it needed to be promoted to a w. We need to think about QBE IR as if it were assembly, so for us on our 64-bit systems, memory locations take l. When you directly reference a global variable in your QBE statements, you are working with the address of the global variable. If you want to use the value of the global variable, you will have to load the value into a temporary, manipulate the temporary, then store the new value of the temporary back into the global. We'll see that in action a little later.

String literals

We're going to have this Brainfuck compiler output the same style of C code as our hand-written assembly compiler. As I understand the QBE documentation, string literals need to be marked with the data reserved word, followed by a global name (begins with $), followed by the non-NUL-terminated but otherwise C-style string encased in curly brackets. Because everything needs to be prefixed with a type, that C-style string must be prefixed with b for bytes. So a string literal looks like this:

data $string = { b "This is a string.\n", b 0 }

As you can probably imagine, that , b 0 at the end is appending a NUL-byte to the end of the string. That means you could have written this string literal as:

data $string = { b "This is a string.\n\0" }

These two different ways of writing a string literal will produce slightly different, though equivalent, assembly. On my OpenBSD/amd64 machine, the first syntax produces:

.data
.balign 8
string:
        .ascii "This is a string.\n"
        .byte 0
/* end data */

While the second syntax produces:

.data
.balign 8
string:
        .ascii "This is a string.\n\0"
/* end data */

I have no arguments with either of these. Note that this is GNU assembler assembly that's being output. That's fine by me. Both GCC and Clang understand GNU assembly syntax so I'd say we're covered. And of course you can use GNU as directly as well if you'd like.

The first syntax is what I found in the QBE documentation. So that's what I used in the Brainfuck compiler. But don't feel like you have to use it. If you prefer the second syntax, that's fine. Let's write out all the strings we need. I am not going to worry about hand-optimization to the extreme. I want to do everything straightforward for today:

# Strings
data $bmis = { b "bfc: bracket mismatch\n", b 0 }

data $prologue = { b "char a[65536], *p=a;\nint\nmain(void)\n{\n", b 0 }
data $left = { b "p-=1;\n", b 0 }
data $right = { b "p+=1;\n", b 0 }
data $dec = { b "*p-=1;\n", b 0 }
data $inc = { b "*p+=1;\n", b 0 }
data $gchar = { b "*p=getchar();\n", b 0 }
data $pchar = { b "putchar(*p);\n", b 0 }
data $lopen = { b "while(*p){\n", b 0 }
data $lclose = { b "}\n", b 0 }
data $epilogue = { b "return 0;\n}\n", b 0 }

Since we're already here, let's also add whatever global variables we might need. We definitely need one global at least: we'll need a buffer to store the character we are going to write using the write(2) function. In our hand-written assembly, we cheated a bit and just assumed that the environment would set up a stack for us in a sane location. Let's be more routine here and make a global variable whose address we can then take. I am also going to add a depth variable as well. Since we're trying things out, I am going to use functions liberally and I might end us using the depth variable in multiple places, so a global is just the easiest to maintain. Let's add these:

# Global variables
data $ch = { b 0 }
data $depth = { w 0 }

# Strings
data $bmis = { b "bfc: bracket mismatch\n", b 0 }

data $prologue = { b "char a[65536], *p=a;\nint\nmain(void)\n{\n", b 0 }
data $left = { b "p-=1;\n", b 0 }
data $right = { b "p+=1;\n", b 0 }
data $dec = { b "*p-=1;\n", b 0 }
data $inc = { b "*p+=1;\n", b 0 }
data $gchar = { b "*p=getchar();\n", b 0 }
data $pchar = { b "putchar(*p);\n", b 0 }
data $lopen = { b "while(*p){\n", b 0 }
data $lclose = { b "}\n", b 0 }
data $epilogue = { b "return 0;\n}\n", b 0 }

If we process just this with QBE, we get this output:

.data
.balign 8
ch:
	.byte 0
/* end data */

.data
.balign 8
depth:
	.int 0
/* end data */

.data
.balign 8
bmis:
	.ascii "bfc: bracket mismatch\n"
	.byte 0
/* end data */

.data
.balign 8
prologue:
	.ascii "char a[65536], *p=a;\nint\nmain(void)\n{\n"
	.byte 0
/* end data */

.data
.balign 8
left:
	.ascii "p-=1;\n"
	.byte 0
/* end data */

.data
.balign 8
right:
	.ascii "p+=1;\n"
	.byte 0
/* end data */

.data
.balign 8
dec:
	.ascii "*p-=1;\n"
	.byte 0
/* end data */

.data
.balign 8
inc:
	.ascii "*p+=1;\n"
	.byte 0
/* end data */

.data
.balign 8
gchar:
	.ascii "*p=getchar();\n"
	.byte 0
/* end data */

.data
.balign 8
pchar:
	.ascii "putchar(*p);\n"
	.byte 0
/* end data */

.data
.balign 8
lopen:
	.ascii "while(*p){\n"
	.byte 0
/* end data */

.data
.balign 8
lclose:
	.ascii "}\n"
	.byte 0
/* end data */

.data
.balign 8
epilogue:
	.ascii "return 0;\n}\n"
	.byte 0
/* end data */

.section .note.GNU-stack,"",@progbits

Yeah, I'm good with this. Very sensible. Let's write some logic.

Printing characters to a file descriptor

Yes we can of course use printf or puts but we can learn a little more by doing it by hand. We only have to worry about writing to stdout and stderr anyhow.

Some things we need to know. First, we need to know that in assignment statements, only temporaries may be assigned to. There are load and store functions to be able to manipulate globals. The second thing we need to know is that temporaries may be non-SSA temporaries. This means we do not have to follow strict SSA form when dealing with temporaries. And it is this second piece that makes writing QBE IR so much easier than LLVM IR and I think might be the killer feature that will allow us to use QBE IR in our self-hosting PL/0 compiler.

Let's take a look at this function and talk it out:

function $dputs(l %s, w %fd) {	# Static function with parameters
@start
@loop
	%ch =w loadub %s	# Get byte at current string location
	%cmp =w ceqw %ch, 0	# Is byte NUL? Return 1 if true, 0 if false
	jnz %cmp, @done, @pbyte	# cmp returned true ? done : pbyte
@pbyte
	storew %ch, $ch		# Put byte into write buffer
	call $write(w %fd, l $ch, w 1)
	%s =l add %s, 1		# Next character in string
	jmp @loop
@done
	ret			# All functions end with ret, jmp, or jnz
}

Our dputs function is a void function that will take two parameters, a string %s and a file descriptor %fd. I don't remember seeing it in the QBE documentation but some trial-and-error uncovered that if you omit a type declaration between function and the function name, you get a void function. Even though the string is marked as a temporary, when we call dputs we'll use a global for that parameter and everything works.

QBE IR has a little quirk in that you cannot jump to the first block in a function. You can have two blocks right on top of each other and jump to the second block. So that's what I'm doing here. We have to dereference a character from the current string location, loadub will load an unsigned byte. Next, we check to see if that byte is 0, since if it is, we're finished with our string.

The comparison functions are regularly named: all begin with c, then the comparator you want (eq for equals, in our case), and all end with the type of the two sides of the comparison, w in our case. The comparisons return 1 if true, 0 if false. That then leads directly to a conditional jump, jnz, or jump if not 0. This is the only conditional jump available in QBE IR. There is an unconditional jump, jmp as well.

The conditional jump takes three operands, each separated by a comma. The first operand is a variable. The second is the block to jump to if the variable is not 0. The third is the block to jump to if the variable is 0. Therefore, the comparator and conditional jump together check to see if the current byte in the string is 0. If it is (i.e. the comparison returned 1), we jump to @done. If not, we jump to @pbyte.

We can simplify this further:

function $dputs(l %s, w %fd) {	# Static function with parameters
@start
@loop
	%ch =w loadub %s	# Get byte at current string location
	jnz %ch, @pbyte, @done	# %ch != 0 ? @pbyte : @done
@pbyte
	storew %ch, $ch		# Put byte into write buffer
	call $write(w %fd, l $ch, w 1)
	%s =l add %s, 1		# Next character in string
	jmp @loop
@done
	ret			# All functions end with ret, jmp, or jnz
}

Instead of relying on the return value of the comparison, we can eliminate the comparison entirely and just check the byte directly.

Now we have to print the byte. Recall that the C prototype for write(2) is ssize_t write(int, const void *, size_t);. What I chose to do is put the value of temporary variable %ch into the global variable $ch and then use the global as the pointer. As I understand the QBE documentation, globals never directly pass their value, they always pass their address. You must indirectly use load and store to get values out of globals. Therefore, if we were to translate that write call back to C, it would be write(fd, &ch, 1);. And that's what we want.

Finally, we increment the string location by one using %s as a non-SSA temporary, which we can see immediately since we use it twice in the %s =l add %s, 1 statement. Having non-SSA temporaries is incredibly easy and convenient. LLVM IR does not permit this expression because it does not allow non-SSA temporaries. And then we loop again. When we're done, we return.

I wonder what the amd64 assembly for this function looks like:

.text
dputs:
	pushq %rbp
	movq %rsp, %rbp
	pushq %rbx
	pushq %r12
.Lbb1:
	movzbl (%rdi), %eax
	cmpl $0, %eax
	jz .Lbb3
	movl %eax, ch(%rip)
	movl $1, %edx
	movl %esi, %r12d
	leaq ch(%rip), %rsi
	movq %rdi, %rbx
	movl %r12d, %edi
	callq write
	movl %r12d, %esi
	movq %rbx, %rdi
	addq $1, %rdi
	jmp .Lbb1
.Lbb3:
	popq %r12
	popq %rbx
	leave
	ret
/* end function dputs */

Yeah this is about what I would expect the assembly to look like. It might have been nice to have recognized that addq $1, %rdi could be rewritten as incq %rdi in this context to save a byte. But on the other hand, addq $1 and incq are not quite equivalent, so avoiding potential pitfalls is likely wise with such a small program like QBE.

Unfortunately, QBE does not recognize that movl $0, %eax can be improved to xorl %eax, %eax. That would be nice to have. On my machine, if I write a small program to do nothing but return 0:

export function w $main() {
@start
	ret 0
}

I get the following amd64 assembly:

.text
.globl main
main:
	pushq %rbp
	movq %rsp, %rbp
	movl $0, %eax
	leave
	ret
/* end function main */
.section .note.GNU-stack,"",@progbits

I do wish that movl $0, %eax was xorl %eax, %eax. There is an interesting write-up about the topic for those interested.

Writing error

Now that we have our general writing function, let's handle errors next. We only worry about errors if we have a bracket mismatch, so I will call our error function mismatch. It doesn't need to take any parameters, since the error message will always be the same and it will always be printed to stderr:

function $mismatch() {		# Static function without parameters
@start
	call $dputs(l $bmis, w 2)
	call $exit(w 1)		# Can call C functions at any time
	ret
}

The ability to directly interoperate with C is another excellent feature. Sure, we can do that with assembly as well, but this is much easier since we don't need to remember things like what values need to go in what registers. We just call the C function as if it were C, remembering to prefix each parameter with its type.

We still need that return at the end of the function. If you forget the ret at the end, you'll be greeted with this error message: last block misses jump. It was not immediately clear to me what QBE was trying to say. In short, QBE considers jmp, jnz, and ret to be jumps and every block needs to end with one. This is explained in the QBE documentation.

The mismatch function becomes this amd64 assembly:

.text
mismatch:
	pushq %rbp
	movq %rsp, %rbp
	movl $2, %esi
	leaq bmis(%rip), %rdi
	callq dputs
	movl $1, %edi
	callq exit
	leave
	ret
/* end function mismatch */

All looks good here.

Compiler logic

Now it's time to write the main logic for our Brainfuck compiler. We'll just do the simple switch statement on the current character of the input file and immediately write out the C translation.

One thing to note is that the main() function in C is not marked static. In QBE IR, if you mark a function export, that makes it non-static. Otherwise, it is static. That means the functions we've written so far are static functions. Also, we need to make sure that main returns an integer, since in C the main() function is typed as an int. That means our boilerplate needs to look like this:

export function w $main() {
@start
	ret 0
}

First thing's first: write out the prologue:

export function w $main() {
@start
	call $dputs(l $prologue, w 1)
	ret 0
}

Now, we'd like to read in a character from stdin into our character buffer $c. As read(2) returns the number of characters read, we should assign the return value and check that it is 1. If it's not 1, then we are finished:

export function w $main() {
@start
	call $dputs(l $prologue, w 1)
@loop
	%r =w call $read(w 0, l $ch, w 1)
	%cmp =w ceqw %r, 1
	jnz %cmp, @maybeleft, @eof	# %cmp == 1 ? @maybeleft : @eof
@maybeleft
@eof
	%d =w loadsw $depth
	jnz %d, @mismatch, @done
@mismatch
	call $mismatch()
	ret 0
@done
	call $dputs(l $epilogue, w 1)
	ret 0
}

We must make the comparison here because read(2) can return -1 on error and we need to go to @eof in that case as well.

I also did the end of file bracket checking and writing out the epilogue if everything checks out. Remember that all blocks must end in jmp, jnz, or ret, hence the unreachable ret 0 at the end of @mismatch.

Alright, now let's write out a QBE translation for a switch statement:

export function w $main() {
@start
	call $dputs(l $prologue, w 1)
@loop
	%r =w call $read(w 0, l $ch, w 1)
	%cmp =w ceqw %r, 1
	jnz %cmp, @maybeleft, @eof	# %cmp == 1 ? @maybeleft : @eof
@maybeleft
	%b =w loadub $ch		# Bytes must be loaded as words
	%cmp =w ceqw %b, 60		# <
	jnz %cmp, @isleft, @mayberight
@isleft
	call $dputs(l $left, w 1)
	jmp @loop
@mayberight
	%cmp =w ceqw %b, 62		# >
	jnz %cmp, @isright, @maybedec
@isright
	call $dputs(l $right, w 1)
	jmp @loop
@maybedec
	%cmp =w ceqw %b, 45		# -
	jnz %cmp, @isdec, @maybeinc
@isdec
	call $dputs(l $dec, w 1)
	jmp @loop
@maybeinc
	%cmp =w ceqw %b, 43		# +
	jnz %cmp, @isinc, @maybegetchar
@isinc
	call $dputs(l $inc, w 1)
	jmp @loop
@maybegetchar
	%cmp =w ceqw %b, 44		# ,
	jnz %cmp, @isgetchar, @maybeputchar
@isgetchar
	call $dputs(l $gchar, w 1)
	jmp @loop
@maybeputchar
	%cmp =w ceqw %b, 46		# .
	jnz %cmp, @isputchar, @maybelopen
@isputchar
	call $dputs(l $pchar, w 1)
	jmp @loop
@maybelopen
	%cmp =w ceqw %b, 91		# [
	jnz %cmp, @islopen, @maybelclose
@islopen
	call $islopen()
	jmp @loop
@maybelclose
	%cmp =w ceqw %b, 93		# ]
	jnz %cmp, @islclose, @loop
@islclose
	call $islclose()
	jmp @loop
@eof
	%d =w loadsw $depth
	jnz %d, @mismatch, @done
@mismatch
	call $mismatch()
	ret 0
@done
	call $dputs(l $epilogue, w 1)
	ret 0
}

I decided to break out bracket handling to their own functions. But before we do that, let's take a look at the generated assembly for main:

.text
.globl main
main:
	pushq %rbp
	movq %rsp, %rbp
	movl $1, %esi
	leaq prologue(%rip), %rdi
	callq dputs
.Lbb1:
	movl $1, %edx
	leaq ch(%rip), %rsi
	movl $0, %edi
	callq read
	cmpl $1, %eax
	jnz .Lbb18
	movzbl ch(%rip), %eax
	cmpl $60, %eax
	jz .Lbb17
	cmpl $62, %eax
	jz .Lbb16
	cmpl $45, %eax
	jz .Lbb15
	cmpl $43, %eax
	jz .Lbb14
	cmpl $44, %eax
	jz .Lbb13
	cmpl $46, %eax
	jz .Lbb12
	cmpl $91, %eax
	jz .Lbb11
	cmpl $93, %eax
	jnz .Lbb1
	callq islclose
	jmp .Lbb1
.Lbb11:
	callq islopen
	jmp .Lbb1
.Lbb12:
	movl $1, %esi
	leaq pchar(%rip), %rdi
	callq dputs
	jmp .Lbb1
.Lbb13:
	movl $1, %esi
	leaq gchar(%rip), %rdi
	callq dputs
	jmp .Lbb1
.Lbb14:
	movl $1, %esi
	leaq inc(%rip), %rdi
	callq dputs
	jmp .Lbb1
.Lbb15:
	movl $1, %esi
	leaq dec(%rip), %rdi
	callq dputs
	jmp .Lbb1
.Lbb16:
	movl $1, %esi
	leaq right(%rip), %rdi
	callq dputs
	jmp .Lbb1
.Lbb17:
	movl $1, %esi
	leaq left(%rip), %rdi
	callq dputs
	jmp .Lbb1
.Lbb18:
	movl depth(%rip), %eax
	cmpl $0, %eax
	jnz .Lbb20
	movl $1, %esi
	leaq epilogue(%rip), %rdi
	callq dputs
	movl $0, %eax
	jmp .Lbb21
.Lbb20:
	callq mismatch
	movl $0, %eax
.Lbb21:
	leave
	ret
/* end function main */

It looks like QBE wrote out the switch statement assembly backwards from the way we wrote it in QBE IR, but that's fine.

QBE wrote out a movl $0, %edi line. Again, it would have been better to have written out xorl %edi, %edi. You save three bytes by doing so.

movl $0, %edi encodes to:

bf 00 00 00 00

Whereas xorl %edi, %edi encodes to:

31 ff

If we were hand writing the assembly, we might also recognize that movl $1, %esi being placed before each dputs call is redundant; we could set %esi to 1 after the read call and be done with it. But I'm sure this is because I used an immediate for the file descriptor in each dputs call so this is a case where I could have done better and was likely being unreasonable about what QBE could do with the code I wrote.

Handling brackets

Lastly, let's write functions to handle [ and ]:

function $islopen() {
@start
	%d =w loadsw $depth	# Load value of global variable
	%d =w add %d, 1		# Because you can only assign to temporaries
	storew %d, $depth	# Store value to global variable
	call $dputs(l $lopen, w 1)
	ret
}

function $islclose() {
@start
	%d =w loadsw $depth
	%d =w sub %d, 1
	storew %d, $depth
	%cmp =w ceqw %d, -1	# Is the depth level -1 ?
	jnz %cmp, @bad, @good	# %cmp == 1 ? @bad : @good
@bad
	call $mismatch()
	ret
@good
	call $dputs(l $lclose, w 1)
	ret
}

The comments attempt to spell out why I did what I did. In QBE, you can only assign to temporaries, so we need to load the word out of $depth into a temporary. Then we can modify the value of the temporary and then store the new value back into $depth. We need to do this for both [ and ]. For [, we're done at this point. For ], we need to make sure the value of $depth has not fallen below 0, as that would signal a bracket mismatch.

And here is the assembly these functions become:

.text
islopen:
	pushq %rbp
	movq %rsp, %rbp
	movl depth(%rip), %eax
	addl $1, %eax
	movl %eax, depth(%rip)
	movl $1, %esi
	leaq lopen(%rip), %rdi
	callq dputs
	leave
	ret
/* end function islopen */

.text
islclose:
	pushq %rbp
	movq %rsp, %rbp
	movl depth(%rip), %eax
	subl $1, %eax
	movl %eax, depth(%rip)
	cmpl $-1, %eax
	jz .Lbb10
	movl $1, %esi
	leaq lclose(%rip), %rdi
	callq dputs
	jmp .Lbb11
.Lbb10:
	callq mismatch
.Lbb11:
	leave
	ret
/* end function islclose */

Perhaps it would have been nice to recognize that incl %eax and decl %eax could be used in these contexts, but as mentioned before it's not such a huge deal. Other than that, this all looks good.

Using QBE as a front end target for a self-hosting compiler

I think even with the minor tweaks I'd like to see with QBE, it has a lot going for it that makes it an ideal option for a back end for our self-hosting PL/0 compiler. We can refine our back end progressively: we can start with a C back end to get things going and prove the self-hosting nature of the compiler, then add a QBE back end to learn about how to convert high-level languages into assembly and assembly-like languages, and finally we can add a direct to assembly back end to learn about challenges such as register allocation (which QBE solves for us).

QBE is small enough that we can place an entire copy of the source code in our repository. It is written in C so we know we can build it. And it outputs either amd64 or aarch64 assembly, so it will just work for most of us most of the time. There's a lot more that QBE is capable of that we didn't encounter, but what I've seen is enough for me to think that it has good potential for us.

It gets the nod of approval for me to use. We'll be seeing QBE again.

Top

RSS