Dr. Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life




2022-06-29
OpenBSD has two new C compilers: chibicc and kefir

In my never-ending quest to have oksh support every C compiler in existence, I have ported two more C compilers to OpenBSD: chibicc and kefir. As always, let's review them, and at the end I'll have links to unofficial ports so that you can play around with these C compilers.

chibicc

chibicc is a small C11 compiler written by Rui Ueyama, the original creator of the current LLVM lld linker. As an OpenBSD developer, I greatly appreciate Rui's work, as lld is the system linker on OpenBSD these days.

Unfortunately for me, chibicc was developed on Linux, and only tested there. So I am immediately wandering into unknown territory. With that said, I would not be very surprised if the port is relatively straightforward, as chibicc compiles C to assembly and then calls GNU as to assemble the output into machine code. All the modern Unixes (that I know of and use) these days follow the System V ABI, so I can be relatively certain that if the assembly code chibicc generates works on Linux, it'll work on OpenBSD (and all the BSDs) too. It should be mentioned that chibicc only supports amd64, so its utility as a multi-architecture compiler is nil. But its utility as a compiler to learn about compilers is high, since the purpose of chibicc is to be the focus of a compiler textbook.

Building chibicc

Building chibicc was very straightforward. In fact, it compiled pretty much as-is out of the box. That is not to say it worked out of the box, however. At first, chibicc would cause the assembler to complain. It turns out chibicc passes the -c flag to as, which enables DWARF debug section compression. The old as in the OpenBSD base system does not understand that flag. The newer gas from ports does understand it, but it seems that lld does not know what to do with the compressed sections. I don't know if that's an lld issue or an OpenBSD issue, but compressed DWARF sections are probably not that big of a deal for my purposes, so I just removed that flag from the assembler invocation. I continued to use gas, since chibicc emits instructions that the old as does not understand.
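To make that change concrete, here is a minimal sketch of what an assembler invocation without the compression flag might look like inside a driver written in C. The function name build_assembler_argv and the file names in the usage note are hypothetical and are not taken from chibicc's source; the only details that come from the text above are that the assembler is gas and that the -c flag is gone.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not chibicc's actual code: fill argv with an
 * assembler command line that uses gas from ports and omits "-c".
 * argv must have room for five entries, including the NULL terminator.
 */
void build_assembler_argv(const char *input, const char *output,
                          const char *argv[5])
{
    assert(input != NULL && output != NULL);

    argv[0] = "gas";    /* the newer assembler from ports */
    argv[1] = input;    /* assembly file produced by the compiler */
    argv[2] = "-o";
    argv[3] = output;   /* object file to write */
    argv[4] = NULL;     /* exec-style argument lists end with NULL */
}
```

A driver would then hand an array like this to something such as execvp(3) in a child process, for example after calling build_assembler_argv("tmp.s", "tmp.o", argv).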

I then needed to turn my attention to the linker invocation, as chibicc assumes you're on the author's Linux system. Adapting the linker invocation to OpenBSD is pretty straightforward. You can just look at the verbose linker invocation from the in-base clang and copy the flags over.

The tests assume GNU bash and GNU grep, so I just adapted the tests to call them correctly. There is no need to translate bash to POSIX sh or GNU grep to BSD grep.

If you're interested in what this all looks like, here is the commit that contains the entire port to OpenBSD.

Running chibicc

There was one last thing missing from complete support for chibicc: GNU extended assembly. OpenBSD uses this in a header and when chibicc sees the extended assembly it has no idea what to do and gives up. This problem does not solely affect chibicc; it also affects other small C compilers such as cproc.

The solution is fairly easy: we simply have to fix up this header for chibicc, not too dissimilar to what gcc does for itself. We will have chibicc look in a custom directory for headers before checking the system directories, and in that custom directory will be a rewritten header file with the extended assembly removed. We can do this because OpenBSD has C fallback functions for all the functions written in extended assembly, and so chibicc will simply use those functions when needed.

We can see this in the install routine I added to the chibicc Makefile. It does a simple delete using sed and puts the fixed up header in the custom directory. The original header does not get changed, so this is totally non-destructive for the system.
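To illustrate why removing the extended assembly is safe, here is a sketch of what a pure C fallback can look like. It assumes, purely for illustration, that the function in question is a 32-bit byte swap; the names SWAP32_FALLBACK and swap32_fallback are hypothetical and are not copied from any OpenBSD header.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical fallback: swap the four bytes of a 32-bit value using only
 * shifts and masks, so no inline assembly is required.
 */
#define SWAP32_FALLBACK(x)                      \
    ((((x) & 0x000000ffu) << 24) |              \
     (((x) & 0x0000ff00u) << 8)  |              \
     (((x) & 0x00ff0000u) >> 8)  |              \
     (((x) & 0xff000000u) >> 24))

/* Compile-time self-check of the macro. */
static_assert(SWAP32_FALLBACK(0x11223344u) == 0x44332211u,
              "byte swap fallback is wrong");

/* Function form, for callers that want type checking on the argument. */
uint32_t swap32_fallback(uint32_t x)
{
    return (uint32_t)SWAP32_FALLBACK(x);
}
```

A compiler that understands extended assembly can use a single instruction here, while one that does not can compile this version without any special support.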

With this, chibicc works on OpenBSD.

For some reason, the exit command does not work in a chibicc-built oksh. If you issue ^D you get a segfault. I am not sure why this is.

kefir

Kefir is an independent C17 compiler. Like chibicc, it targets amd64 only. Also like chibicc, kefir outputs assembly. Unlike chibicc, kefir claims to be supported on FreeBSD, so this might not be such uncharted territory.

Kefir also says in all bold letters in its README.md: Usage is strongly discouraged. This is [an] experimental project which is not meant for production purposes. That was all the encouragement I needed.

Building kefir

Kefir did not build out of the box on OpenBSD. The main issue is that kefir uses a number of multibyte to UTF-8, UTF-16, and UTF-32 conversion routines, not all of which are available on OpenBSD. For those that were not in OpenBSD libc, I found highly portable C versions in musl-libc and used those.
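For a feel of what such a conversion routine does, here is a minimal sketch that encodes one UTF-32 code point as UTF-8. It is not the musl code used in the port, and the name utf32_to_utf8 is hypothetical; the real libc routines also deal with locale and conversion state, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one UTF-32 code point as UTF-8.
 * Returns the number of bytes written to out (1 to 4), or 0 if cp is not
 * a valid Unicode scalar value (a surrogate, or above U+10FFFF).
 */
size_t utf32_to_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                    /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                 /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

Because a routine like this needs nothing beyond shifts and masks, it is easy to see why versions borrowed from another libc can be highly portable.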

It seems that kefir may also produce assembly that the old as doesn't understand, so the newer gas is recommended as well.

Kefir also requires you to build a runtime library, not unlike compiler-rt for clang and libgcc for gcc. However, the build system does not do this for you; you need to build and install it yourself. I took care of all this in the port; that is the libkefirrt.a library. I had to come up with the name; I am very original. I actually got it from the reserved prefix of its functions.

Installing kefir

Like chibicc, kefir does not understand GNU extended assembly, so the same trick used for chibicc had to be deployed. Additionally, kefir understands neither __aligned(__alignof__(long long)) nor __aligned(__alignof__(long double)), both of which the stddef.h header uses. So I also had to fix up that header.
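For comparison, the same alignment request can be spelled with the standard C11 keywords instead of the GNU-style attributes. The sketch below assumes the attributes sit on the members of a max_align_t-style struct; the type name sketch_max_align_t is hypothetical, and this is not necessarily how the port fixes up the header.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a max_align_t-style type, written with the
 * C11 _Alignas and _Alignof keywords rather than __aligned/__alignof__.
 */
typedef struct {
    _Alignas(long long) long long ll_member;
    _Alignas(long double) long double ld_member;
} sketch_max_align_t;

/* The struct must be at least as strictly aligned as each member type. */
static_assert(_Alignof(sketch_max_align_t) >= _Alignof(long long),
              "alignment is weaker than long long");
static_assert(_Alignof(sketch_max_align_t) >= _Alignof(long double),
              "alignment is weaker than long double");
```

The two compile-time checks document the intent of the original attributes: the type is aligned at least as strictly as both long long and long double.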

Using kefir

Unlike chibicc, and indeed every C compiler I've used up to this point, kefir does not feature a complete compiler driver. That means kefir only compiles C to assembly; it is up to you to pass that output to the assembler and then to the linker. All the other C compilers do that work for you in the driver.

That means when it comes to building oksh, we can't run the configure script using kefir. We have to call configure with some other C compiler. And then we can't even use the Makefile either. Here is a quick shell script that will build oksh with kefir once you've run configure with another C compiler:

#!/bin/sh

# Flags shared by every kefir invocation below.
KFLAGS="-I/usr/local/libexec/kefir/include -I/usr/include -D_ANSI_LIBRARY -DEMACS -DVI"

# kefir only emits assembly, so pipe each file's output straight into gas.
for i in *.c ; do
  o="${i%?}o"
  echo "kefir $KFLAGS $i | gas -o $o"
  kefir $KFLAGS "$i" | gas -o "$o"
done

# Rebuild the two files kefir rejects with the system C compiler.
echo cc -fno-PIC -DEMACS -DVI -c emacs.c
cc -fno-PIC -DEMACS -DVI -c emacs.c

echo cc -fno-PIC -DEMACS -DVI -c misc.c
cc -fno-PIC -DEMACS -DVI -c misc.c

# Link everything, pulling in the kefir runtime library.
echo cc -nopie -o oksh *.o -lcurses -L/usr/local/libexec/kefir -lkefirrt
cc -nopie -o oksh *.o -lcurses -L/usr/local/libexec/kefir -lkefirrt

In addition, kefir fails to compile emacs.c and misc.c. Here are the error messages:

kefir -I/usr/local/libexec/kefir/include -I/usr/include -D_ANSI_LIBRARY -DEMACS -DVI emacs.c | gas -o emacs.o
Failed to compile! Error stack:
No.  Message                                                                              Class          Subclass   Compiler ref.
  0| emacs.c@883:15 Expression value shall be assignable to function parameter type |     Error|         Analysis|  source/ast/analyzer/nodes/function_call.c:94

kefir -I/usr/local/libexec/kefir/include -I/usr/include -D_ANSI_LIBRARY -DEMACS -DVI misc.c | gas -o misc.o
Failed to compile! Error stack:
No.  Message                                                                             Class          Subclass   Compiler ref.
  0| misc.c@724:22 Expression value shall be assignable to function parameter type |     Error|         Analysis|  source/ast/analyzer/nodes/function_call.c:94

That is why the build script above has the system C compiler build those two files.

Comparing compilers

Let's do the fun bit of comparing generated code sizes. To be clear, neither compiler is an optimizing compiler, so we should not expect numbers anywhere near those of clang or gcc.

Here are the numbers for chibicc:

text    data    bss     dec     hex
753670  40034   29848   823552  c9100

And here are the numbers for kefir:

text    data    bss     dec     hex
2374884 12071   30120   2417075 24e1b3

Remember too that two of the source files in the kefir build were compiled with clang so that brings the numbers down somewhat.

Hardly a surprise that both compilers bring up the rear of the pack in terms of binary size. What is surprising is just how far away kefir is from all the other C compilers. This is not inherently a bad thing; if the code kefir produces is correct, then it is amazing that one person was able to create a complete C17 compiler and that fact should be celebrated.

Having read a good bit of kefir-generated assembly, it appears what's happening is that the kefir runtime library has a list of all the actions that the intermediate representation can represent (e.g., push, pop, add, sub, etc.). Then, instead of translating from IR into assembly, kefir converts the IR into a sequence of jumps into the functions in the runtime library.

To better understand this, imagine the following simple C program:

extern int puts(const char *);

int
main(void)
{

	puts("Hello");

	return 0;
}

Kefir will produce the following assembly:

# Globals
.global main

main:
# Begin prologue of main
    call __kefirrt_preserve_state
    sub %rsp, 16
    call __kefirrt_generic_prologue
# Load parameters of main
# End prologue of main
    lea %rbx, [__main_body]
    jmp [%rbx]
__main_body:
    .quad __kefirrt_push_impl, __kefirrt_string_literal0
    .quad __kefirrt_sfunction_puts_gate3, 0
    .quad __kefirrt_extend32_impl, 0
    .quad __kefirrt_pop_impl, 0
    .quad __kefirrt_push_impl, 0
    .quad __main_epilogue, 0
    .quad __main_epilogue, 0
__main_epilogue:
# Begin epilogue of main
    pop %rax
    mov %rsp, %r14
    add %rsp, 16
    jmp __kefirrt_restore_state
# End of main

__kefirrt_sfunction_puts_gate3:
    mov %r12, %rsp
    and %rsp, -16
    mov %rdi, [%r12 + 0]
    call puts
    mov %rsp, %r12
    add %rsp, 8
    push %rax
    add %rbx, 16
    jmp [%rbx]

.section .data
__kefirrt_module_static_vars:
    .byte 0x6d, 0x61, 0x69, 0x6e, 0x00

.section .rodata
__kefirrt_string_literal0:
    .ascii "Hello\000"

The functions __kefirrt_push_impl, __kefirrt_extend32_impl, and __kefirrt_pop_impl are all in the runtime library and won't be written out in the assembly file. Under the hood, kefir uses %rbx to keep track of where you are in the function and makes indirect jumps to hop from instruction to instruction. For instructions kefir cannot possibly know in advance, like the calls, it does write those out in the generated assembly. But it is the same idea: it is just another address to jump to in the sequence of jumps.

The fancy term for this is threaded code and I recently learned that this was the way the old B compiler generated code. I'll admit this is not something I would have thought of but it appears to work just fine. I tend to think of Forth when I think of threaded code.
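To see the idea outside of assembly, here is a small sketch of the same dispatch scheme in C. Every name in it (struct cell, op_push, op_add, op_halt, run_threaded) is hypothetical and has nothing to do with kefir's real runtime. Each cell pairs a handler with one operand, like the ".quad handler, operand" rows above. Standard C has no computed jumps, so a small loop stands in for the "add %rbx, 16" and "jmp [%rbx]" pair that ends each handler in the kefir output.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 16

struct vm;
typedef void (*handler_fn)(struct vm *, long);

/* One "instruction": a handler address plus a single operand. */
struct cell {
    handler_fn handler;
    long operand;
};

struct vm {
    const struct cell *ip;  /* plays the role of %rbx */
    long stack[STACK_MAX];
    size_t sp;              /* number of values on the stack */
    int running;
};

/* Push the operand onto the stack. */
void op_push(struct vm *vm, long operand)
{
    assert(vm->sp < STACK_MAX);
    vm->stack[vm->sp++] = operand;
}

/* Pop two values and push their sum; the operand is unused. */
void op_add(struct vm *vm, long operand)
{
    (void)operand;
    assert(vm->sp >= 2);
    long rhs = vm->stack[--vm->sp];
    long lhs = vm->stack[--vm->sp];
    vm->stack[vm->sp++] = lhs + rhs;
}

/* Stop the dispatch loop; the operand is unused. */
void op_halt(struct vm *vm, long operand)
{
    (void)operand;
    vm->running = 0;
}

/* Walk the cell sequence, calling each handler in turn.
 * Returns the value on top of the stack, or 0 if the stack is empty. */
long run_threaded(const struct cell *program)
{
    struct vm vm = { .ip = program, .sp = 0, .running = 1 };

    while (vm.running) {
        const struct cell *cell = vm.ip++;  /* advance to the next cell */
        cell->handler(&vm, cell->operand);  /* indirect call via the table */
    }
    return vm.sp > 0 ? vm.stack[vm.sp - 1] : 0;
}
```

A program is then just an array of cells: { op_push, 2 }, { op_push, 3 }, { op_add, 0 }, { op_halt, 0 } leaves 5 on top of the stack. The "compiled" code is nothing but data, a list of addresses and operands, which is also why the kefir output above consists mostly of .quad directives.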

In comparison, the assembly chibicc outputs is so routine as to be boring; though that is exactly how it should be.

Conclusion

Porting compilers to OpenBSD is fun. These two small C compilers are complete enough to compile real software and demonstrate that there are a myriad of ways to solve the same set of problems. That to me is what makes compilers and interpreters interesting: there are so many ways to solve the same problem that there is always something new to learn.

If you'd like unofficial ports of these two compilers, here is one for chibicc and one for kefir.
