Dr. Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life




2023-07-23
Using non-GNU assemblers with the Portable C Compiler

The Portable C Compiler (PCC) is a robust little C99 compiler that is available for a decent number of modern and historic systems, such as Windows, all the major BSDs and even some minor BSDs, macOS, Linux, Solaris, more recently Illumos, and some other Unix variants. Wikipedia calls PCC a production quality compiler. It's a nice little compiler; it compiles code faster than GCC and Clang, though it does not optimize quite as well as they do. And it can recompile itself in only a few seconds.

PCC traces its history back to the 1970s and was used as the C compiler for BSD Unix for quite a long time.

While PCC is a complete compiler, including a libpcc.a runtime support library akin to libgcc.a for GCC and libcompiler_rt.a for Clang, PCC is not a complete compilation system. PCC relies on someone else to provide an assembler and a linker to turn its output into object code, shared libraries, and executables. In contrast, LLVM includes everything you need to go from zero to an executable: a preprocessor, C compiler, assembler, linker, and runtime support library, and it is even developing its own C standard library implementation. The GNU system also has a complete preprocessor, C compiler, assembler, linker, runtime support library, and C standard library implementation, all under the same umbrella if not all in a single monorepo like LLVM. PCC only provides a preprocessor, C compiler, and runtime support library.

More specifically, PCC assumes the use of the GNU assembler and GNU linker, or at least a GNU-style assembler and linker. The LLVM linker is known to work, as it is a drop-in replacement for the GNU linker, and Clang can be used as the assembler, which is done by default on FreeBSD.

Most people below a certain age have probably only ever used the GNU assembler and/or Clang when it comes to assembly. Maybe some Microsoft assembler if they are developing Windows applications. Or maybe SPIM in a college course. But that's likely it. This is not exactly surprising: for most open source developers, GCC and Clang are the preeminent compilation systems. It is an easy choice if you are developing your own compiler to compile to GNU-style assembly, and let tried-and-true software handle the remaining heavy lifting.

What I am interested in learning about today is if PCC can use other assemblers and linkers besides GNU and LLVM. While it may not be the most practical endeavor we can embark on, we might learn something about the assembly that PCC emits, and the compatibility of alternative assemblers and linkers.

Yasm

Yasm is a complete rewrite of the NASM assembler. One of the nice features of Yasm is that it understands both NASM syntax and GNU assembler syntax; you can select which syntax an input file uses with a command line switch. Indeed, it has been known to the PCC developers for quite some time. Yasm appears to have been the original assembler for the Win32 platform; at least, this commit is the earliest reference to Yasm in the PCC commit logs. It looks like Yasm was first discussed back in 2007. This makes a good bit of sense, as this would have been during the time that at least OpenBSD and NetBSD were actively working on PCC to replace GCC as the in-base compiler. This was before the time of Clang. OpenBSD and NetBSD would have wanted a BSD-licensed assembler, and Yasm is BSD-licensed.

The tendrils for Yasm support are still in PCC, but it seems like it was only ever used or tested on i386. When I tried using it on amd64, Yasm complained about a lot of things. First, it complained that I was trying to use 64-bit mnemonics when it was in 32-bit mode. This fix was straightforward: all I had to do was set the -f elf64 flag when compiling for amd64.

The next issue was that Yasm is extremely chatty. PCC when compiling for amd64 uses an .end directive at the end of each source module. The GNU assembler uses .end to stop processing. Yasm doesn't understand .end and warns when it encounters the .end directive. But as it is just a warning, we are able to continue; we just get an annoying useless warning for each .c file we compile.

Additionally, Yasm does not understand the .stabs and .stabn directives. This is the stabs debug information that PCC generates when using the -g flag. The fact that Yasm claims to not understand these directives confuses me a little. Yasm claims to be able to generate stabs debugging output, which suggests to me that the code for what to do with .stabs and .stabn directives might already be there.

However, just like the .end directive, Yasm outputs lots of warnings about .stabs and .stabn. So I taught PCC to pass the -w flag to Yasm to silence warnings. I also had to teach PCC not to pass the -k flag to Yasm. The GNU assembler uses the -k flag to generate PIC. Yasm does not appear to have any flag for generating PIC; instead, it appears to generate such code based on its input. At least on my OpenBSD machine, Yasm generates PIC. Or at least OpenBSD can smooth things over.
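To make the Yasm adjustments concrete, here is a minimal sketch of how a compiler driver might build the Yasm command line. This is not PCC's actual driver code (which is written in C). The function name, the default file names, the `-p gas` parser selection, and the `-f elf32` choice for i386 are my assumptions for illustration; only `-f elf64`, `-w`, and the absence of `-k` come from the changes described above.

```python
def yasm_args(arch, pic=False, asm_file="input.s", obj_file="output.o"):
    """Build a hypothetical Yasm command line for PCC-style assembly."""
    # Assumption: select Yasm's GNU-syntax parser explicitly.
    args = ["yasm", "-p", "gas"]

    if arch == "amd64":
        # Without this, Yasm rejects 64-bit mnemonics as 32-bit-mode errors.
        args += ["-f", "elf64"]
    else:
        # Assumption: a 32-bit ELF object format for i386.
        args += ["-f", "elf32"]

    # Silence the warnings about .end, .stabs, and .stabn.
    args.append("-w")

    # Note what is missing: even when pic is True, no -k flag is added,
    # because that is a GNU assembler flag and Yasm has no equivalent.
    args += ["-o", obj_file, asm_file]
    return args


if __name__ == "__main__":
    print(" ".join(yasm_args("amd64", pic=True)))
```

The `pic` parameter is accepted but intentionally ignored, which mirrors the point that Yasm decides based on its input rather than on a flag.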

If you're OK with losing debug information, Yasm can be used as the PCC assembler and you would be none the wiser. Yasm assembles everything PCC throws its way.

I sent a small patch upstream and it was committed.

Sun assembler

As a result of getting Oracle Developer Studio 12.6 working on Illumos, I had a copy of the Sun assembler. You can download the compiler suite for both Solaris/Illumos and Linux for free. On both, the Sun assembler is a 64-bit executable that can assemble both 64-bit and 32-bit code. A flag selects between the two: -m32 for 32-bit mode and -m64 for 64-bit mode. I already knew from porting PCC to Illumos that the Sun linker can be used without any problems. If the Sun assembler also works, then in theory one could use the Sun tools in full with PCC.

The Sun assembler in the Oracle Developer Studio 12.6 download is named fbe. I don't know why it has that name.

On i386, everything works fine with the Sun assembler without any changes. On amd64, the Sun assembler also does not understand the .end directive. Unlike Yasm where it is a warning, it is an error when using the Sun assembler. The Sun assembler also does not use the -v flag to print version information, instead using -V. I had to teach PCC to use the correct flag when using the Sun assembler.

Also, sometimes PCC emits a .short directive. The Sun assembler does not understand .short; it appears to be a GNU extension. The GNU binutils documentation says that .short is equivalent to .word unless there is architecture-specific documentation stating otherwise. I did not find anything about this in the x86-specific documentation, so I taught PCC to emit .word instead of .short when using the Sun assembler. That worked.
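The two directive incompatibilities can be illustrated with a small text filter. To be clear, this is not how my patch works: the patch changes what PCC emits in the first place. The filter below is a hypothetical standalone sketch, with a made-up function name, that applies the same two rules to an existing assembly file: drop a trailing .end and rewrite .short as .word.

```python
import re

# Match a .short directive at the start of a line, keeping indentation.
_SHORT = re.compile(r"^(\s*)\.short\b")
# Match a line that consists only of the .end directive.
_END = re.compile(r"^\s*\.end\s*$")


def adapt_for_sun_as(asm_text):
    """Rewrite GNU-style directives the Sun assembler rejects."""
    out = []
    for line in asm_text.splitlines():
        if _END.match(line):
            # The Sun assembler treats .end as an error, so drop it.
            continue
        # .short and .word are equivalent on x86, per the binutils docs.
        out.append(_SHORT.sub(r"\1.word", line))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "\t.short 1\n\tmovl $0, %eax\n\t.end\n"
    print(adapt_for_sun_as(sample), end="")
```

Dropping .end is only safe here because PCC emits it as the last line of each module, so nothing after it is expected to be skipped.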

Next, I had to teach PCC to issue the correct flag for PIC code. The Sun assembler uses -KPIC to generate PIC code.
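Putting the Sun assembler flags together, a driver's flag selection might look like the following sketch. Again, this is an illustration and not PCC's actual code: the function name, parameters, and default file names are hypothetical. The flags themselves (-m32, -m64, -KPIC, -V) and the fbe executable name are the ones discussed above.

```python
def sun_as_args(arch, pic=False, version=False,
                asm_file="input.s", obj_file="output.o"):
    """Build a hypothetical Sun assembler (fbe) command line."""
    args = ["fbe"]

    # One 64-bit executable handles both targets; a flag picks the mode.
    args.append("-m64" if arch == "amd64" else "-m32")

    if pic:
        # The Sun equivalent of the GNU assembler's -k.
        args.append("-KPIC")

    if version:
        # Capital -V here, where the GNU assembler uses -v.
        args.append("-V")

    args += ["-o", obj_file, asm_file]
    return args


if __name__ == "__main__":
    print(" ".join(sun_as_args("amd64", pic=True)))
```

Keeping each assembler's quirks inside its own function means the rest of a driver never needs to know which assembler is in use.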

I then taught the configure script a new option, --with-sun-as, that declares that you are using the Sun assembler. The configure script already has a --with-yasm option that does the same when using Yasm. It would be nicer to let the configure script automatically detect which assembler you are using, but a manual option is fine for those that choose to use Yasm or the Sun assembler.

Finally, I taught the PCC setup code for Solaris/Illumos, Linux, NetBSD, and FreeBSD to issue the Sun assembler the correct flags when using it on those platforms. Thanks to the Linuxulator, both FreeBSD and NetBSD can run unmodified Linux binaries. Sadly, OpenBSD doesn't have a Linuxulator. While I'm sure it's not so simple, maybe one day Oracle will open source the Sun assembler. The Sun linker was open sourced by Sun as part of OpenSolaris. We can dream.

I had a NetBSD virtual machine lying around, so I copied the Sun assembler to the virtual machine and installed it as /emul/linux/bin/as. I had to install the suse_base-15.5nb1 package. I think you can use an older Suse base, but this was the most recent one so it is the one I used.

I then tested PCC with the Sun assembler on Illumos and NetBSD on amd64. And it worked perfectly on both! I can build and rebuild PCC ad infinitum and lots of other C code. The Sun assembler assembles everything PCC throws at it. The Sun assembler does give a warning in one scenario: in some cases, PCC issues .quad directives. When compiling with -fPIC, the Sun assembler warns that those .quad directives create absolute relocations. It doesn't seem to affect the usability of the resulting binaries, and the GNU assembler might well do the same thing but not warn about it. Or perhaps it is an opportunity for PCC to output better code in those situations if possible.

But in any event, we can confidently say that the Sun assembler can be used with PCC. I have sent my changes upstream but they have not yet been committed. I can't say I am incredibly surprised by this, as even GCC can use the Sun assembler, but I am not sure anyone ever tried with PCC until now. And it is good to know these things even if most people won't use it.

Conclusion

It was fun to fix up Yasm and the Sun assembler for PCC. We learned some new information about the GNU assembler syntax compatibility in both Yasm and the Sun assembler. And we made PCC a little bit better. All in a morning's work.
