Dr. Brian Robert Callahan
academic, developer, with an eye towards a brighter techno-social life
Late last night, I released version 7.0 of the oksh shell, a portable version of the public domain KornShell distributed with OpenBSD. It is modestly packaged, with more packages always appreciated. There is a decently sized userbase for oksh and I fully intend to keep the project going far into the future.
Besides the release, I also added cproc to the list of officially supported compilers. This new compiler, written by a one-man team, combines a C11 frontend with the QBE backend. This brings the number of open source C compilers that can compile oksh to nine, with an additional three proprietary C compilers also supported. Happily, all nine open source C compilers run on OpenBSD/amd64. As such, I figured now would be an interesting time to do some size and runtime comparisons of oksh built with these different C compilers, because who doesn't like benchmarks and comparisons?
Here is a table of the compilers used in this test:
Compiler | Version | Notes |
---|---|---|
clang | OpenBSD clang version 11.1.0 | In-base /usr/bin/cc |
gcc | gcc (GCC) 12.0.0 20210929 (experimental) | Using GNU assembler (GNU Binutils) 2.37.50.20210929 |
pcc | Portable C Compiler 1.2.0.DEVEL 20211009 for x86_64-unknown-openbsd7.0 | Using GNU assembler version 2.17 (amd64-unknown-openbsd7.0) using BFD version 2.17 |
cparser | cparser 1.22.1(3e00440a231132c09af2927300bfdf57111e5564) using libFirm 1.22(b5269b56fc71ae323d344af3f6c28cf0a6a8ba4b) | Using /usr/bin/cc for assembling |
lacc | lacc version 0.0.1 | Git hash ac8c693 |
tcc | tcc version 0.9.27 mob:15e9b73 (x86_64 OpenBSD) | |
ccomp | The CompCert C verified compiler, version 3.9 | Using same assembler as pcc |
nwcc | nwcc 0.8.3 | Only available via openbsd-wip |
cproc | Git hash 0438799 | Using QBE to generate assembly and same assembler as pcc |
I built oksh with each compiler with the following invocation:
$ CC=compiler ./configure && make -j4 && strip oksh
The CFLAGS were set to -g -O2 -DEMACS -DVI for all compilers, which is the default CFLAGS generated by the configure script. For compilers that understand it, -w was also added to CFLAGS; -w does not alter generated code, it simply silences all warnings.
Here are the size comparisons for oksh built with each compiler. The text, data, bss, dec, and hex columns are the output of size; the last two columns are the file size as reported by ls -lh and ls -l:
Compiler | text | data | bss | dec | hex | ls -lh | ls -l |
---|---|---|---|---|---|---|---|
clang | 290366 | 9944 | 29520 | 329830 | 50866 | 296K | 303128 |
gcc | 268042 | 6392 | 29856 | 304290 | 4a4a2 | 271K | 277424 |
pcc | 264603 | 7842 | 29340 | 301785 | 49ad9 | 272K | 278048 |
cparser | 510704 | 6824 | 29332 | 546860 | 8582c | 508K | 520328 |
lacc | 479874 | 8112 | 29528 | 517514 | 7e58a | 479K | 490584 |
tcc | 333742 | 19272 | 29944 | 382958 | 5d7ee | 349K | 357264 |
ccomp | 242414 | 2092 | 29920 | 274426 | 42ffa | 242K | 247856 |
nwcc | 504677 | 33832 | 4064 | 542573 | 8476d | 529K | 541864 |
cproc | 273085 | 46640 | 672 | 320397 | 4e38d | 315K | 322984 |
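For anyone who wants to play with the size numbers, here is a small sketch in plain shell arithmetic. It checks that the dec column is the sum of text, data, and bss, and expresses one on-disk size as a percentage of another. The helper names dec_total and size_pct are made up for this example, and the percentage is truncated by integer division.

```shell
#!/bin/sh
# dec_total TEXT DATA BSS: print the sum, which size reports in its "dec" column.
dec_total() {
    echo $(( $1 + $2 + $3 ))
}

# size_pct A B: print A as a whole-number percentage of B (truncated).
size_pct() {
    echo $(( $1 * 100 / $2 ))
}

dec_total 290366 9944 29520   # clang row: prints 329830
size_pct 303128 247856        # clang vs ccomp, ls -l sizes: prints 122
```

By this measure the clang binary is roughly 22% larger on disk than the ccomp one, which is the smallest of the nine.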
It's hardly scientific, but I used shellbench to take some rudimentary performance benchmarks. The number in each cell represents the number of executions per second. All default timing options were used.
The CPU on this machine is nothing to write home about. The OpenBSD dmesg reports it as: cpu0: Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz, 2893.77 MHz, 06-3c-03.
Here are the results:
name | ccomp | clang | cparser | cproc | gcc | lacc | nwcc | pcc | tcc |
---|---|---|---|---|---|---|---|---|---|
assign.sh: positional params | 120,480 | 135,912 | 120,467 | 107,797 | 132,831 | 96,315 | 101,370 | 121,182 | 125,365 |
assign.sh: variable | 149,833 | 168,800 | 152,759 | 133,036 | 175,595 | 112,702 | 126,870 | 154,719 | 159,828 |
assign.sh: local var | 150,314 | 167,916 | 149,430 | 133,086 | 175,451 | 120,018 | 126,792 | 151,532 | 159,659 |
assign.sh: local var (typeset) | 146,472 | 168,290 | 149,651 | 133,826 | 179,307 | 119,384 | 124,353 | 150,597 | 159,561 |
cmp.sh: [ ] | 102,599 | 114,543 | 104,091 | 90,132 | 120,294 | 84,642 | 88,007 | 102,350 | 110,166 |
cmp.sh: [[ ]] | 134,883 | 148,904 | 137,297 | 120,988 | 159,494 | 108,223 | 111,817 | 136,210 | 144,857 |
cmp.sh: case | 143,116 | 161,184 | 142,177 | 126,498 | 168,059 | 114,009 | 118,066 | 144,064 | 153,134 |
count.sh: posix | 99,377 | 114,402 | 99,062 | 89,284 | 118,188 | 80,479 | 85,073 | 102,314 | 107,129 |
count.sh: typeset -i | 93,680 | 106,229 | 91,287 | 83,837 | 112,057 | 74,496 | 79,192 | 97,624 | 100,232 |
count.sh: increment | 122,121 | 139,373 | 123,634 | 109,692 | 141,372 | 98,588 | 103,631 | 124,997 | 126,625 |
eval.sh: direct assign | 102,757 | 112,158 | 103,060 | 91,531 | 116,397 | 81,716 | 86,173 | 101,820 | 107,854 |
eval.sh: eval assign | 61,284 | 71,082 | 65,642 | 58,574 | 72,221 | 51,591 | 54,339 | 64,123 | 68,345 |
eval.sh: command subs | 1,003 | 1,000 | 957 | 984 | 979 | 961 | 926 | 975 | 982 |
func.sh: no func | 157,653 | 177,901 | 157,760 | 141,130 | 183,365 | 124,218 | 131,318 | 158,251 | 167,751 |
func.sh: func | 123,812 | 133,315 | 123,988 | 112,766 | 143,663 | 98,281 | 102,353 | 123,969 | 113,735 |
null.sh: blank | 193,181 | 220,011 | 195,206 | 171,036 | 231,913 | 148,025 | 160,371 | 199,671 | 210,651 |
null.sh: assign variable | 152,246 | 169,445 | 153,762 | 135,307 | 179,523 | 121,333 | 127,024 | 154,443 | 164,533 |
null.sh: define function | 157,999 | 176,067 | 158,369 | 141,096 | 182,415 | 123,972 | 133,265 | 160,626 | 169,376 |
null.sh: undefined variable | 155,830 | 175,639 | 156,716 | 137,595 | 186,286 | 122,140 | 128,986 | 158,758 | 160,851 |
null.sh: : command | 155,128 | 175,486 | 158,795 | 126,223 | 186,013 | 124,761 | 131,694 | 159,717 | 167,526 |
subshell.sh: no subshell | 150,788 | 170,053 | 148,928 | 132,214 | 176,504 | 117,694 | 126,137 | 152,294 | 158,430 |
subshell.sh: brace | 145,805 | 161,929 | 144,323 | 130,041 | 165,539 | 116,066 | 122,843 | 146,184 | 141,735 |
subshell.sh: subshell | 1,144 | 1,155 | 1,126 | 1,116 | 1,138 | 1,095 | 1,062 | 1,144 | 1,139 |
subshell.sh: command subs | 1,080 | 1,058 | 1,015 | 1,046 | 1,045 | 994 | 1,009 | 1,086 | 1,051 |
All of these numbers need context. First, as clang is the in-base C compiler, it has been augmented with a number of security mitigations such as Retguard. Though the performance cost for such security mitigations is minimal, minimal does not equal zero. Even so, clang does manage to eke out a handful of runtime benchmark wins over gcc, which otherwise wins all other runtime benchmarks. Realistically, clang and gcc are generally within striking distance even with clang implementing additional security mitigations.
Another piece of context is that three of these compilers, ccomp, nwcc, and tcc, do not build position-independent executables. As with Retguard, PIE has negligible overhead on amd64, though the overhead can be significant on i386. Even so, the lack of PIE might contribute to the consistent third-place finishes for tcc. Maybe not. In any event, I am quite pleasantly surprised by the numbers tcc has put up. Coupled with the incredible compilation speed of tcc, that makes tcc quite a formidable compiler. If only tcc also supported PIE.
Of all the compilers, I am most surprised by cproc. I was somewhat hopeful that the runtime benchmarks would have been more favorable, as QBE claims 70% of the performance of advanced compilers in 10% of the code. To be clear, QBE is actually overdelivering on its promise: the runtime benchmarks show that cproc is getting between 74% and 82% of the performance of gcc. But that is still quite a bit away from matching the rest of the smaller optimizer class: ccomp, cparser, and pcc. And then there's whatever tcc is doing. I have no idea how tcc is outperforming the small optimizer class either. I must be doing something wrong.
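The percentages are just one compiler's executions per second divided by gcc's for the same benchmark row. A minimal sketch in shell arithmetic, using two rows from the results table with the thousands separators removed (speed_pct is a hypothetical helper name, and integer division truncates the result):

```shell
#!/bin/sh
# speed_pct A B: print A as a whole-number percentage of B (truncated).
speed_pct() {
    echo $(( $1 * 100 / $2 ))
}

# cproc vs gcc, executions per second taken from the results table.
speed_pct 107797 132831   # assign.sh: positional params -> prints 81
speed_pct 133036 175595   # assign.sh: variable          -> prints 75
```

Both rows land above the 70% figure that QBE advertises.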
The non-optimizing class, lacc and nwcc, are not too surprising in terms of performance. The lacc developer notes that improving code generation is on the radar. Unfortunately, nwcc has not seen any development since September 2017, so perhaps this is the best it will do unless someone wants to pick up nwcc and continue its development. Maybe that's you. Though tcc is also part of the non-optimizing class, and it outperforms everything except clang and gcc.
A final bit of context: cproc needed some tweaks in order to ignore the inline assembly in the <machine/endian.h> header on OpenBSD and to produce PIE executables. Additionally, the cproc author also gave me a patch that slightly improves the code generated by QBE.
I suppose you shouldn't dispose of clang or gcc, but if you need another C compiler you have some nice options. The Tiny C Compiler certainly comes to mind, but really any of these compilers would give you a perfectly good experience compiling and using oksh. And that's the best benchmark I can think of!
If you know of other C compilers that can compile oksh, definitely let me know. Bonus points if it works on OpenBSD, or you port it to OpenBSD.