Dr. Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life


Compiler numbers, and why they don't matter

OpenBSD recently upgraded their in-base compiler from clang 8.0.1 to clang 10.0.0. Neat.

One of the things that make measuring binary sizes so difficult (and why, despite using it as a rough metric for SnakeQR, I don't consider it a comparative metric) is that your compiler can do a lot behind the scenes that can cause binary sizes to differ. Fortunately for us, we no longer have to worry much about the size of our binaries. But let's compare things just for fun.

We will compare clang 8.0.1 to clang 10.0.0. And because I have the compilers on this machine, let's add in gcc 4.2.1, gcc 8.3.0, gcc, pcc 1.2.0 DEVEL 20200630, and lacc 0.0.1. That should give us a wide array of comparisons, though again this should not be considered scientific in any way.1 We will use total size in hex as our comparator.

Let's start with crt.o.

Compiler                    crt.o size (hex)
clang 10.0.0                3b
clang 8.0.1                 3b
gcc                         3b
gcc 8.3.0                   3b
gcc 4.2.1                   3b
pcc 1.2.0 DEVEL 20200630    3b
lacc 0.0.1                  N/A (footnote 2)

All the same. That's good. That means our assemblers are not doing anything fancy behind the scenes. There are four different assemblers being used here: clang 10.0.0 and clang 8.0.1 each have their own built-in assembler, gcc and pcc are using GNU as, and gcc 8.3.0 and gcc 4.2.1 are using GNU as 2.17.

Now let's look at snakeqr.o.

Compiler                    snakeqr.o size (hex)
clang 10.0.0                15e2
clang 8.0.1                 15d0
gcc                         16a1
gcc 8.3.0                   16dd
gcc 4.2.1                   178d
pcc 1.2.0 DEVEL 20200630    1bd0
lacc 0.0.1                  2509

Interestingly, clang 8.0.1 generates a smaller object file than clang 10.0.0. Newer versions of gcc produce smaller object code than older versions. pcc is not too far behind. And lacc brings up the rear and is noticeably worse at producing small object code than the other compilers.3
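Hex totals are awkward to compare by eye, so here is a minimal Python sketch that converts the snakeqr.o figures from the table above to decimal and reports how far each one sits from the smallest. The names SIZES_HEX and compare are made up for this illustration; "gcc" is the entry the table lists without a version number.

```python
# snakeqr.o total sizes (hex), copied from the table above.
# "gcc" is the entry that appears without a version number.
SIZES_HEX = {
    "clang 10.0.0": "15e2",
    "clang 8.0.1": "15d0",
    "gcc": "16a1",
    "gcc 8.3.0": "16dd",
    "gcc 4.2.1": "178d",
    "pcc 1.2.0 DEVEL 20200630": "1bd0",
    "lacc 0.0.1": "2509",
}


def compare(sizes_hex):
    """Return (compiler, bytes, extra bytes, percent larger) rows, smallest first."""
    sizes = {name: int(value, 16) for name, value in sizes_hex.items()}
    smallest = min(sizes.values())
    rows = []
    for name, size in sorted(sizes.items(), key=lambda item: item[1]):
        extra = size - smallest
        rows.append((name, size, extra, 100.0 * extra / smallest))
    return rows


if __name__ == "__main__":
    for name, size, extra, percent in compare(SIZES_HEX):
        print(f"{name:<26} {size:>5} bytes  +{extra:>4} ({percent:.1f}%)")
```

Run this way, the gap between the two clang versions works out to 18 bytes, roughly 0.3% of a 5584-byte object file, while lacc lands about 70% above the smallest result.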

So what can we draw from this? Probably not much.

clang 10.0.0 is a superior compiler to clang 8.0.1 and brings in support for newer versions of C++. That is more than worth the potential for larger binaries, and if we're being honest, the 18-byte increase here is effectively noise. Newer compilers are able to generate better code, but that had better be true, or else there would be little reason to use a newer compiler. clang and especially gcc have had the benefit of decades of time and generations of programmers and corporations throwing money at them to create high-quality, production-ready compilers. That hardly seems revelatory.

pcc, which has been around since the 1970s, can still hold its own against clang and gcc when it comes to standard C code. That might actually be the big take-away here for me.

And lacc shows us that C is a language small enough and versatile enough that a single person can produce a compiler that honestly puts up respectable numbers against the bigger compilers. That's the other big take-away. If you've ever wanted to write your own C compiler, producing something that you can be proud of and can produce good enough code for modern machines is within the reach of a single person.4

These comparisons might be interesting to look at but remember not to put any stock in them. Very few of us live in a world where these binary sizes would make or break our systems. There is no "best" compiler and the endeavor to crown one is little more than a distraction. The best compiler is the one that compiles the code you want to run in a time you can live with. Don't be blinded by shiny numbers; you'll no longer be able to see what really matters.

1 Further rendering this unscientific is that not all compilers accept the same options: neither gcc nor pcc understands -Oz, so I replaced it with -Os for them; lacc only understands -O1. Only clang understands -fno-ret-protector and -mno-retpoline, so they were left out when compiling with the other compilers. Additionally, lacc does not understand -ffreestanding, -fomit-frame-pointer, or -fno-stack-protector.

2 lacc cannot process assembly files; it processes C files directly into object files, though it can output assembly.

3 lacc does not have an optimizer so this is not much of a fair comparison.

4 lacc is an officially supported compiler for oksh and mksh. Creating a serious C compiler is within a single person's reach.
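The flag substitutions in footnote 1 are easier to see side by side, so here is a rough Python sketch of them. It assumes the clang build used exactly the six flags named in that footnote, in this order; the post does not show the actual command lines, so BASE_FLAGS, the two helper sets, and flags_for are all hypothetical.

```python
# Assumed clang baseline: the six flags named in footnote 1.
# The real build may have used more flags or a different order.
BASE_FLAGS = [
    "-Oz",
    "-ffreestanding",
    "-fomit-frame-pointer",
    "-fno-stack-protector",
    "-fno-ret-protector",
    "-mno-retpoline",
]

# Flags footnote 1 says only clang understands.
CLANG_ONLY = {"-fno-ret-protector", "-mno-retpoline"}

# Flags footnote 1 says lacc does not understand.
LACC_UNSUPPORTED = {"-ffreestanding", "-fomit-frame-pointer", "-fno-stack-protector"}


def flags_for(family):
    """Adapt BASE_FLAGS for a compiler family: 'clang', 'gcc', 'pcc', or 'lacc'."""
    if family == "clang":
        return list(BASE_FLAGS)
    if family not in ("gcc", "pcc", "lacc"):
        raise ValueError(f"unknown compiler family: {family}")
    # Every non-clang compiler loses the clang-only flags.
    flags = [flag for flag in BASE_FLAGS if flag not in CLANG_ONLY]
    if family == "lacc":
        # lacc also loses three more flags and only takes -O1.
        flags = [flag for flag in flags if flag not in LACC_UNSUPPORTED]
        return ["-O1" if flag == "-Oz" else flag for flag in flags]
    # gcc and pcc get -Os in place of -Oz.
    return ["-Os" if flag == "-Oz" else flag for flag in flags]


if __name__ == "__main__":
    for family in ("clang", "gcc", "pcc", "lacc"):
        print(family, " ".join(flags_for(family)))
```

Under these assumptions lacc ends up with nothing but -O1, which is one more reason not to read much into its place in the table.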