Dr. Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life




2024-01-22
Can GCC use Clang as its assembler?

There are, I think, two great open source compiler infrastructures: GCC and LLVM. They have been a great boon to all kinds of development, open source and closed source alike.

Today, we are going to create a bit of a franken-compiler toolchain. Once upon a time, the LLVM project had a plugin for GCC called DragonEgg that would cause GCC to use LLVM's optimizers and backend generators instead of GCC's. We will do something not quite that cool but something with potentially useful results.

Because LLVM contains an integrated assembler, clang can assemble assembly files if you pass them to it. GCC cannot do this, at least not directly: if you pass gcc an assembly file, it simply passes it on to your assembler. That is usually GNU as, but it can be another Unix assembler, such as the Sun assembler, which runs on Solaris and illumos.

And so I want to know if it is possible to replace GNU as with clang as the assembler in the GCC toolchain. There is potential to want this, at least on some systems. FreeBSD, specifically FreeBSD 14.0-RELEASE, where we will be running our experiment today, no longer ships a /usr/bin/as binary. With LLVM also providing a linker and many other utilities found in the GNU binutils, could GCC become a boutique frontend for the LLVM system?

Let's find out.

The plan

The plan is to have gcc complete a full 3-stage bootstrap of itself with clang specified as the assembler. That feels like enough of a real-world test that if gcc can compile itself, there will be a lot of code it can compile.

The version of clang found in FreeBSD 14.0-RELEASE is 16.0.6. The package you get if you run pkg install gcc is gcc12, but I think I'd prefer gcc13, as that is the most recent version.

Test setup

First, it is worth checking whether someone else has already thought of this and made our life very easy. I installed gcc12 with pkg install gcc. This gcc looks for its assembler at /usr/local/bin/as, which is where the binutils package installs GNU as. Since I'm going to be deleting this install of binutils anyway during our test, I simply copied /usr/bin/cc to /usr/local/bin/as.

GCC has a -pipe flag which pipes the output from GCC into the assembler. Without this flag, GCC will write assembly code out to a temporary file, then have the assembler read in that file. Clang supports this flag but does not actually do anything with it, as clang generates object code directly without the need for an assembler.

Unfortunately, this simple setup does not work. Clang complains that you need the -x flag to tell clang what the input is if the input is coming in on stdin.

That means we'll need to write a small wrapper program that adds this flag before executing clang.

Wrapper program for as in D

I suppose this can probably be done in shell, but I wrote a quick prototype in D to add the necessary -xassembler and -c flags to effectively turn clang into a command line assembler:

import std.string;
import std.process;

int main(string[] args) {
    string[] av = new string[args.length + 2];
    av[0] = "/usr/bin/cc", av[1] = "-xassembler", av[2] = "-c";

    size_t ac = 3;
    foreach (i; 1 .. args.length)
        av[ac++] = args[i];

    return spawnProcess(av).wait();
}

A quick compile with LDC, ldc2 -O -release -ofas as.d, and I had my wrapper program. I then installed it with sudo mv as /usr/bin/ && sudo chown root:wheel /usr/bin/as. As FreeBSD 14.0-RELEASE does not ship with a /usr/bin/as utility, we are not damaging anything by doing this.
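For comparison, here is a minimal sketch of what a shell version of this first wrapper might look like. It is not what was used in this experiment, and the CC_AS variable and as_wrapper function names are hypothetical, introduced only so the sketch can be tried without touching /usr/bin/as:

```shell
#!/bin/sh
# CC_AS is a hypothetical override for illustration; the default is the
# same compiler driver path that the D wrapper hard-codes.
CC_AS="${CC_AS:-/usr/bin/cc}"

# Run the compiler driver as an assembler: force the input language to
# assembly (-xassembler), stop after producing an object file (-c), and
# forward every original argument unchanged.
as_wrapper() {
    "$CC_AS" -xassembler -c "$@"
}
```

Saved as a script, it would end with a line such as as_wrapper "$@". Setting CC_AS=echo makes the function print the command line it would pass to clang instead of running it.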

As an aside, D has become my language of choice for prototyping just about every new project. If you have never used D, you owe it to yourself to download a compiler and try it. There is a great community that cares a lot about making tools that solve real problems. And it works great on OpenBSD. I keep an OpenBSD build of the HEAD of GCC at all times so I always have a bleeding-edge GDC. Even if D isn't the language a project of mine is ultimately written in, I'll still prototype it in D and then ask ChatGPT to rewrite it for me in another language.

FreeBSD has a port of LDC which you can install with pkg install ldc. I don't see any ports of DMD or GDC, but upstream creates DMD packages for FreeBSD for each release and beta. Walter Bright, the creator of D and other cool things, once told me that he really likes FreeBSD and at least some of his infrastructure ran on it.

Building

I used the tarball and patches from the gcc13 port, but I built things outside the ports infrastructure to make it easier on myself. I used all the same configure flags as the ports build, except changing the --with-as and --with-ld flags to --with-as=/usr/bin/as and --with-ld=/usr/bin/ld.

The complete configure invocation is:

$ ../gcc-13.2.0/configure --verbose --enable-languages=c,c++,objc,fortran,jit --with-as=/usr/bin/as --with-ld=/usr/bin/ld --disable-nls --enable-gnu-indirect-function --enable-host-shared --enable-plugin --libdir=/usr/local/lib/gcc13 --libexecdir=/usr/local/libexec/gcc13 --program-suffix=13 --with-gmp=/usr/local --with-gxx-include-dir=/usr/local/lib/gcc13/include/c++ --with-gxx-libcxx-include-dir=/usr/include/c++/v1 --with-system-zlib --without-zstd

First failure

We are able to build the stage1 compiler, but it fails right away with a message that the assembler does not understand the --64 flag.

It is interesting that while gcc (and clang) use -m64 and -m32 to select the target word size, GNU as uses --64 and --32. In any event, I needed to find a way to convert the --64 and --32 flags into -m64 and -m32, as I need to provide flags that clang understands. Fortunately, it is really easy to add this functionality to our wrapper program:

import std.string;
import std.process;

int main(string[] args) {
    string[] av = new string[args.length + 2];
    av[0] = "/usr/bin/cc", av[1] = "-xassembler", av[2] = "-c";

    size_t ac = 3;
    foreach (i; 1 .. args.length)
        av[ac++] = (args[i] == "--64" || args[i] == "--32") ? args[i].replace("--", "-m") : args[i];

    return spawnProcess(av).wait();
}

Again, let's build it with ldc2 -O -release -ofas as.d and install it with sudo mv as /usr/bin/ && sudo chown root:wheel /usr/bin/as.

Now we can restart the build. We should not need to rebuild the stage1 compiler, as it was built with clang.

Second failure

The stage1 compiler got a few lines farther in the configure script and then failed again. We were told:

configure:3831: error: in `/home/brian/build-gcc/x86_64-unknown-freebsd14.0/libgcc':
configure:3834: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details

And the issue in question is:

/tmp//cc3quiho.s:44:2: error: changed section flags for .eh_frame, expected: 0x2
        .section        .eh_frame,"aw",@progbits
        ^
/tmp//cc3quiho.s:190:2: error: changed section flags for .debug_str, expected: 0x30
        .section        .debug_str,"",@progbits
        ^
/tmp//cc3quiho.s:190:2: error: changed section entsize for .debug_str, expected: 1
        .section        .debug_str,"",@progbits
        ^

I am not sure if this is a bug in clang or a bug in gcc. In any event, this code does not error out when GNU as is the assembler.

Building a new clang

I eventually traced the error to this discussion which led me to this commit. Perhaps clang is being too restrictive as to changes in section flags. Or maybe gcc should not be emitting those directives. I will need to build a new clang in any event. Instead of rebuilding the base system's clang, I chose to build the llvm16 port with an added patch to remove the offending check. The patch looks like this:

--- ELFAsmParser.cpp.orig       2024-01-20 21:52:37.741189000 -0500
+++ ELFAsmParser.cpp    2024-01-20 21:52:56.375637000 -0500
@@ -683,6 +683,7 @@
   // Check that flags are used consistently. However, the GNU assembler permits
   // to leave out in subsequent uses of the same sections; for compatibility,
   // do likewise.
+#if 0
   if (!TypeName.empty() && Section->getType() != Type &&
       !allowSectionTypeMismatch(getContext().getTargetTriple(), SectionName,
                                 Type))
@@ -695,6 +696,7 @@
       Section->getEntrySize() != Size)
     Error(loc, "changed section entsize for " + SectionName +
                    ", expected: " + Twine(Section->getEntrySize()));
+#endif

   if (getContext().getGenDwarfForAssembly() &&
       (Section->getFlags() & ELF::SHF_ALLOC) &&

This may feel a little heavy-handed, as we are disabling all the checks. But we are only going for proof-of-concept here; I will leave it to the LLVM people to decide if this is indeed a bug, and a bug worth fixing.

I built and installed this new clang through the ports infrastructure. We will need to modify our wrapper program to use this new clang:

import std.string;
import std.process;

int main(string[] args) {
    string[] av = new string[args.length + 2];
    av[0] = "/usr/local/bin/clang16", av[1] = "-xassembler", av[2] = "-c";

    size_t ac = 3;
    foreach (i; 1 .. args.length)
        av[ac++] = (args[i] == "--64" || args[i] == "--32") ? args[i].replace("--", "-m") : args[i];

    return spawnProcess(av).wait();
}

Another build and install later, and we are ready to continue. Like last time, we just need to restart the gcc build. We don't need to start all over, as the stage1 compiler is built with clang.

There is also an llvm-mc utility installed with the port. This utility appears to be an analog to GNU as. But as FreeBSD does not ship this utility in the base system, and clang does have the integrated assembler built-in, I am going to stick with clang.

A linker failure

We hit a problem trying to link the 32-bit libgcc_s.so:

/home/brian/build-gcc/./gcc/xgcc -B/home/brian/build-gcc/./gcc/ -B/usr/local/x86_64-unknown-freebsd14.0/bin/ -B/usr/local/x86_64-unknown-freebsd14.0/lib/ -isystem /usr/local/x86_64-unknown-freebsd14.0/include -isystem /usr/local/x86_64-unknown-freebsd14.0/sys-include   -fno-checking -O2  -g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fpic -pthread -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector  -shared -nodefaultlibs -Wl,--soname=libgcc_s.so.1 -Wl,--version-script=libgcc.map -m32 -o 32/libgcc_s.so.1.tmp -g -O2 -m32 -B./ _muldi3_s.o _negdi2_s.o _lshrdi3_s.o _ashldi3_s.o _ashrdi3_s.o _cmpdi2_s.o _ucmpdi2_s.o _clear_cache_s.o _trampoline_s.o __main_s.o _absvsi2_s.o _absvdi2_s.o _addvsi3_s.o _addvdi3_s.o _subvsi3_s.o _subvdi3_s.o _mulvsi3_s.o _mulvdi3_s.o _negvsi2_s.o _negvdi2_s.o _ctors_s.o _ffssi2_s.o _ffsdi2_s.o _clz_s.o _clzsi2_s.o _clzdi2_s.o _ctzsi2_s.o _ctzdi2_s.o _popcount_tab_s.o _popcountsi2_s.o _popcountdi2_s.o _paritysi2_s.o _paritydi2_s.o _powisf2_s.o _powidf2_s.o _powixf2_s.o _powitf2_s.o _mulsc3_s.o _muldc3_s.o _mulxc3_s.o _multc3_s.o _divsc3_s.o _divdc3_s.o _divxc3_s.o _divtc3_s.o _bswapsi2_s.o _bswapdi2_s.o _clrsbsi2_s.o _clrsbdi2_s.o _mulbitint3_s.o _fixunssfsi_s.o _fixunsdfsi_s.o _fixunsxfsi_s.o _fixsfdi_s.o _fixdfdi_s.o _fixxfdi_s.o _fixunssfdi_s.o _fixunsdfdi_s.o _fixunsxfdi_s.o _floatdisf_s.o _floatdidf_s.o _floatdixf_s.o _floatundisf_s.o _floatundidf_s.o _floatundixf_s.o _divdi3_s.o _moddi3_s.o _divmoddi4_s.o _udivdi3_s.o _umoddi3_s.o _udivmoddi4_s.o _udiv_w_sdiv_s.o _divmodbitint4_s.o cpuinfo_s.o tf-signs_s.o sfp-exceptions_s.o _divhc3_s.o _mulhc3_s.o addtf3_s.o divtf3_s.o eqtf2_s.o getf2_s.o letf2_s.o multf3_s.o negtf2_s.o subtf3_s.o unordtf2_s.o fixtfsi_s.o fixunstfsi_s.o floatsitf_s.o floatunsitf_s.o fixtfdi_s.o fixunstfdi_s.o floatditf_s.o floatunditf_s.o fixsfbitint_s.o floatbitintsf_s.o fixdfbitint_s.o 
floatbitintdf_s.o extendhfsf2_s.o extendhfdf2_s.o extendhftf2_s.o extendhfxf2_s.o extendsfdf2_s.o extendsftf2_s.o extenddftf2_s.o extendxftf2_s.o extendbfsf2_s.o trunctfhf2_s.o truncxfhf2_s.o truncdfhf2_s.o truncsfhf2_s.o trunctfsf2_s.o truncdfsf2_s.o trunctfdf2_s.o trunctfxf2_s.o trunctfbf2_s.o truncxfbf2_s.o truncdfbf2_s.o truncsfbf2_s.o trunchfbf2_s.o fixtfbitint_s.o floatbitinttf_s.o eqhf2_s.o fixxfbitint_s.o floatbitinthf_s.o floatbitintbf_s.o floatbitintxf_s.o enable-execute-stack_s.o hardcfr_s.o strub_s.o unwind-dw2_s.o unwind-dw2-fde-dip_s.o unwind-sjlj_s.o unwind-c_s.o emutls_s.o libgcc.a -lc && rm -f 32/libgcc_s.so && if [ -f 32/libgcc_s.so.1 ]; then mv -f 32/libgcc_s.so.1 32/libgcc_s.so.1.backup; else true; fi && mv 32/libgcc_s.so.1.tmp 32/libgcc_s.so.1 && ln -s libgcc_s.so.1 32/libgcc_s.so
ld: error: duplicate symbol: __x86.get_pc_thunk.dx
>>> defined at crtstuff.c
>>>            /home/brian/build-gcc/./gcc/32/crtbeginS.o:(.gnu.linkonce.t.__x86.get_pc_thunk.dx+0x0)
>>> defined at crtstuff.c
>>>            /home/brian/build-gcc/./gcc/32/crtendS.o:(.gnu.linkonce.t.__x86.get_pc_thunk.dx+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at emutls.c
>>>            emutls_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.di
>>> defined at crtstuff.c
>>>            /home/brian/build-gcc/./gcc/32/crtbeginS.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)
>>> defined at unwind-c.c
>>>            unwind-c_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at unwind-dw2-fde-dip.c
>>>            unwind-dw2-fde-dip_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.di
>>> defined at crtstuff.c
>>>            /home/brian/build-gcc/./gcc/32/crtbeginS.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)
>>> defined at unwind-dw2-fde-dip.c
>>>            unwind-dw2-fde-dip_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.si
>>> defined at libgcc2.c
>>>            _absvdi2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.si+0x0)
>>> defined at unwind-dw2-fde-dip.c
>>>            unwind-dw2-fde-dip_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.si+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.cx
>>> defined at libgcc2.c
>>>            _absvsi2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.cx+0x0)
>>> defined at unwind-dw2-fde-dip.c
>>>            unwind-dw2-fde-dip_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.cx+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at unwind-dw2.c
>>>            unwind-dw2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.si
>>> defined at libgcc2.c
>>>            _absvdi2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.si+0x0)
>>> defined at unwind-dw2.c
>>>            unwind-dw2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.si+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.cx
>>> defined at libgcc2.c
>>>            _absvsi2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.cx+0x0)
>>> defined at unwind-dw2.c
>>>            unwind-dw2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.cx+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.di
>>> defined at crtstuff.c
>>>            /home/brian/build-gcc/./gcc/32/crtbeginS.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)
>>> defined at unwind-dw2.c
>>>            unwind-dw2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.di
>>> defined at crtstuff.c
>>>            /home/brian/build-gcc/./gcc/32/crtbeginS.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)
>>> defined at eqhf2.c
>>>            eqhf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.di+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at trunchfbf2.c
>>>            trunchfbf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at truncsfbf2.c
>>>            truncsfbf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at truncdfbf2.c
>>>            truncdfbf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at truncxfbf2.c
>>>            truncxfbf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at trunctfbf2.c
>>>            trunctfbf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at trunctfxf2.c
>>>            trunctfxf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at trunctfdf2.c
>>>            trunctfdf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: duplicate symbol: __x86.get_pc_thunk.ax
>>> defined at libgcc2.c
>>>            _mulvdi3_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)
>>> defined at truncdfsf2.c
>>>            truncdfsf2_s.o:(.gnu.linkonce.t.__x86.get_pc_thunk.ax+0x0)

ld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors)

Searching around for this, I came upon this blog post from one of the lld authors, which states:

"GNU linkers give .gnu.linkonce* sections COMDAT section semantics. ld.lld simply ignores such sections. https://bugs.llvm.org/show_bug.cgi?id=31586 tracks when the hack can be removed."

I take these errors to mean that the issue has not been resolved. That means we will need to change our configure options for gcc and change --with-ld=/usr/bin/ld to --with-ld=/usr/local/bin/ld and install the binutils package so that we can use GNU ld like the gcc13 port does. We will also need to restart our gcc build from scratch yet again.

You could use lld without issue if you add the --disable-multilib flag to configure. However, I want multilib on so we can test our assembler as much as possible.

TLS issues

Things were going well until we got to our stage2 libcpp. Then we encountered a confusing error message:

/home/brian/build-gcc/./prev-gcc/xg++ -B/home/brian/build-gcc/./prev-gcc/ -B/usr/local/x86_64-unknown-freebsd14.0/bin/ -nostdinc++ -B/home/brian/build-gcc/prev-x86_64-unknown-freebsd14.0/libstdc++-v3/src/.libs -B/home/brian/build-gcc/prev-x86_64-unknown-freebsd14.0/libstdc++-v3/libsupc++/.libs  -I/home/brian/build-gcc/prev-x86_64-unknown-freebsd14.0/libstdc++-v3/include/x86_64-unknown-freebsd14.0  -I/home/brian/build-gcc/prev-x86_64-unknown-freebsd14.0/libstdc++-v3/include  -I/home/brian/gcc-13.2.0/libstdc++-v3/libsupc++ -L/home/brian/build-gcc/prev-x86_64-unknown-freebsd14.0/libstdc++-v3/src/.libs -L/home/brian/build-gcc/prev-x86_64-unknown-freebsd14.0/libstdc++-v3/libsupc++/.libs  -I../../gcc-13.2.0/libcpp -I. -I../../gcc-13.2.0/libcpp/../include -I../../gcc-13.2.0/libcpp/include  -g -O2 -fno-checking -gtoggle -W -Wall -Wno-narrowing -Wwrite-strings -Wmissing-format-attribute -pedantic -Wno-long-long  -fno-exceptions -fno-rtti -I../../gcc-13.2.0/libcpp -I. -I../../gcc-13.2.0/libcpp/../include -I../../gcc-13.2.0/libcpp/include  -fPIC  -c -o charset.o -MT charset.o -MMD -MP -MF .deps/charset.Tpo ../../gcc-13.2.0/libcpp/charset.cc
/home/brian/build-gcc/prev-gcc/include/stddef.h:145:9: error: multiple types in one declaration
  145 | typedef __PTRDIFF_TYPE__ ptrdiff_t;
      |         ^~~~~~~~~~~~~~~~
In file included from ../../gcc-13.2.0/libcpp/charset.cc:20:
./config.h:373:19: error: declaration does not declare anything [-fpermissive]
  373 | #define ptrdiff_t int
      |                   ^~~
/home/brian/build-gcc/prev-x86_64-unknown-freebsd14.0/libstdc++-v3/include/x86_64-unknown-freebsd14.0/bits/c++config.h:309:11: error: multiple types in one declaration
  309 |   typedef __PTRDIFF_TYPE__      ptrdiff_t;
      |           ^~~~~~~~~~~~~~~~
./config.h:373:19: error: declaration does not declare anything [-fpermissive]
  373 | #define ptrdiff_t int
      |                   ^~~

This one was a bit tricky to hunt down. It looks like gcc does not know that FreeBSD has a sys/types.h header, which of course it does. The actual definition of ptrdiff_t is found in sys/_types.h, which sys/types.h pulls in.

But that's a symptom of the actual problem: gcc does not think FreeBSD has ANSI C headers because that check fails:

configure:4378: checking for ANSI C header files
configure:4398:  /home/brian/build-gcc/./prev-gcc/xgcc -B/home/brian/build-gcc/./prev-gcc/ -B/usr/local/x86_64-unknown-freebsd14.0/bin/ -B/usr/local/x86_64-unknown-freebsd14.0/bin/ -B/usr/local/x86_64-unknown-freebsd14.0/lib/ -isystem /usr/local/x86_64-unknown-freebsd14.0/include -isystem /usr/local/x86_64-unknown-freebsd14.0/sys-include   -fno-checking -c -g -O2 -fno-checking -gtoggle  conftest.c >&5
configure:4398: $? = 0
configure:4471:  /home/brian/build-gcc/./prev-gcc/xgcc -B/home/brian/build-gcc/./prev-gcc/ -B/usr/local/x86_64-unknown-freebsd14.0/bin/ -B/usr/local/x86_64-unknown-freebsd14.0/bin/ -B/usr/local/x86_64-unknown-freebsd14.0/lib/ -isystem /usr/local/x86_64-unknown-freebsd14.0/include -isystem /usr/local/x86_64-unknown-freebsd14.0/sys-include   -fno-checking -o conftest -g -O2 -fno-checking -gtoggle  -static-libstdc++ -static-libgcc  conftest.c  >&5
/usr/local/bin/ld: /tmp//ccNrkHRa.o: in function `main':
conftest.c:(.text.startup+0x37): undefined reference to `__emutls_v._ThreadRuneLocale'
collect2: error: ld returned 1 exit status
configure:4471: $? = 1
configure: program exited with status 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "cpplib"
| #define PACKAGE_TARNAME "cpplib"
| #define PACKAGE_VERSION " "
| #define PACKAGE_STRING "cpplib  "
| #define PACKAGE_BUGREPORT "gcc-bugs@gcc.gnu.org"
| #define PACKAGE_URL ""
| /* end confdefs.h.  */
| #include <ctype.h>
| #include <stdlib.h>
| #if ((' ' & 0x0FF) == 0x020)
| # define ISLOWER(c) ('a' <= (c) && (c) <= 'z')
| # define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c))
| #else
| # define ISLOWER(c) 		   (('a' <= (c) && (c) <= 'i') 		     || ('j' <= (c) && (c) <= 'r') 		     || ('s' <= (c) && (c) <= 'z'))
| # define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c))
| #endif
| 
| #define XOR(e, f) (((e) && !(f)) || (!(e) && (f)))
| int
| main ()
| {
|   int i;
|   for (i = 0; i < 256; i++)
|     if (XOR (islower (i), ISLOWER (i))
| 	|| toupper (i) != TOUPPER (i))
|       return 2;
|   return 0;
| }
configure:4482: result: no

And this causes a cascading effect that eventually leads to the error we saw.

I believe this means that gcc is unable to automatically detect if we have thread-local storage. We do, and we can force gcc to use it even if it fails to autodetect it properly. We will need to start the build over, adding the --enable-tls configure flag to do that.

I am not sure why gcc is unable to automatically detect if we have TLS when using clang as the assembler. It is able to do so when GNU as is used as the assembler. So there must be something subtly different between the two.
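One possible way to start narrowing this down is to compare what configure recorded for the TLS check under each assembler. The following is a minimal sketch, not part of the original experiment: check_result is a hypothetical helper that prints a named check from a config.log along with everything configure logged up to that check's result line. The search string "thread-local storage" and the gcc/config.log location are assumptions about how GCC's configure labels this check and where it writes its log.

```shell
#!/bin/sh
# check_result LOG PATTERN
# Print the "checking ... PATTERN ..." line from a config.log, plus every
# line up to and including the matching "result:" line.
check_result() {
    awk -v pat="$2" '
        index($0, "checking") && index($0, pat) { show = 1 }
        show { print }
        show && /: result: / { show = 0 }
    ' "$1"
}

# Assumed usage from the top of the build directory:
#   check_result gcc/config.log "thread-local storage"
```

Running it against the logs of a GNU as build and a clang-as-assembler build should show the exact assembler command line and any error output that made the two results differ.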

Success

This time, the build went all the way to the end and was successful!

While we had some kinks to work out, getting the entirety of gcc compiled suggests that clang is a potentially powerful replacement for GNU as that, while not identical, is good enough to be an assembler for gcc. This should not be too surprising, as the LLVM integrated assembler should be able to assemble any arbitrary code, since you cannot know ahead of time what code might be given to clang.

llvm-mc after all

I discovered that clang is missing an important feature. When gcc uses the -pipe flag, it does not issue - for stdin to the assembler. GNU as, if you omit an input file, assumes input will come on stdin. Unfortunately, clang does not do this, so you cannot compile anything when using the -pipe flag.

I looked to see what the llvm-mc utility does, and it behaves the same as GNU as. So we will need to switch to llvm-mc as our assembler.

But that has its own quirks: different flags to control 32-bit versus 64-bit output and no -v flag. When GNU as receives the -v flag, it prints a line of self-identification and then continues on assembling whatever you give it; llvm-mc does not have this flag, and giving it -version or --version causes it to display self-identification and immediately quit. It would be nice if llvm-mc emulated GNU as behavior with the -v flag, but it is not vital.

For the last time, we will need to update our wrapper program:

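import std.process;

int main(string[] args) {
    // Start with llvm-mc and ask for an object file rather than assembly text.
    string[] av;
    av ~= "/usr/local/bin/llvm-mc16";
    av ~= "--filetype=obj";

    // Translate GNU as flags; args[0] is our own name, so skip it.
    foreach (i; 1 .. args.length) {
        switch (args[i]) {
        case "--64":
            av ~= "--arch=x86-64";
            break;
        case "--32":
            av ~= "--arch=x86";
            break;
        case "-v":
            // llvm-mc has no -v flag, so drop it.
            break;
        default:
            av ~= args[i];
            break;
        }
    }

    return spawnProcess(av).wait();
}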

One last build and install later, and a rebuild of gcc just to be on the safe side, and all is good. We can now use the -pipe flag and things work as they should. I doubt many compiles that use the -Wa flag, which hands options straight through to the assembler, would work well, but I've never seen any use of -Wa in the wild.
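The same flag translation can also be sketched in POSIX shell, rebuilding the argument list in place. This is an illustration only, not the wrapper used for the build; the MC variable and mc_wrapper function are hypothetical names, and the flag mapping simply mirrors the D program above:

```shell
#!/bin/sh
# MC is a hypothetical override for illustration; the default is the
# llvm-mc path used by the D wrapper.
MC="${MC:-/usr/local/bin/llvm-mc16}"

mc_wrapper() {
    # Walk the original arguments once, appending the translated form of
    # each to the end of "$@" and shifting the original off the front.
    n=$#
    while [ "$n" -gt 0 ]; do
        case "$1" in
            --64) set -- "$@" "--arch=x86-64" ;;
            --32) set -- "$@" "--arch=x86" ;;
            -v)   ;;  # llvm-mc has no -v flag, so drop it
            *)    set -- "$@" "$1" ;;
        esac
        shift
        n=$((n - 1))
    done
    "$MC" --filetype=obj "$@"
}
```

With MC=echo, calling mc_wrapper --64 -v -o foo.o prints the llvm-mc arguments that would be used, which makes it easy to check the mapping before installing anything as /usr/bin/as.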

Conclusion

To answer our question: yes, it is possible to use clang as an assembler for gcc, though technically llvm-mc might be better. Is it useful? Your mileage may vary. I could imagine bringing up a new platform using LLVM and then wanting an Ada or Fortran compiler but not wanting to port both gcc and binutils. On the other hand, LLVM is growing a Fortran compiler; maybe someone wants to grow an Ada compiler for LLVM?

I wonder what changes would be needed to smooth over the differences we encountered during our experiment. Or perhaps the llvm-mc utility can be improved, or accompanied with a wrapper utility that emulates GNU as. That would bring things very close to having GCC be a boutique frontend for the LLVM system. I for one think that would be very cool.
