Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life




2021-10-10
Benchmarking the oksh shell, or, let's look at a lot of C compilers

Late last night, I released version 7.0 of the oksh shell, a portable version of the public domain KornShell distributed with OpenBSD. It is modestly packaged, with more packages always appreciated. There is a decently sized userbase for oksh and I fully intend to keep the project going far into the future.

Besides the release, I also added cproc to the list of officially supported compilers. This new compiler, written by a one-man team, combines a C11 frontend with the QBE backend. This brings the number of open source C compilers that can compile oksh to nine, with an additional three proprietary C compilers also supported. Happily, all nine open source C compilers run on OpenBSD/amd64. As such, I figured now would be an interesting time to do some size and runtime comparisons of oksh built with these different C compilers, because who doesn't like benchmarks and comparisons?

Methodology

Here is a table of the compilers used in this test:

---------------------------------------------------------------------------------------------------------------------------
Compiler  Version                                                       Notes
---------------------------------------------------------------------------------------------------------------------------
clang     OpenBSD clang version 11.1.0                                  In-base /usr/bin/cc
gcc       gcc (GCC) 12.0.0 20210929 (experimental)                      Using GNU assembler (GNU Binutils) 2.37.50.20210929
pcc       Portable C Compiler 1.2.0.DEVEL 20211009                      Using GNU assembler version 2.17
          for x86_64-unknown-openbsd7.0                                 (amd64-unknown-openbsd7.0) using BFD version 2.17
cparser   cparser 1.22.1(3e00440a231132c09af2927300bfdf57111e5564)      Using /usr/bin/cc for assembling
          using libFirm 1.22(b5269b56fc71ae323d344af3f6c28cf0a6a8ba4b)
lacc      lacc version 0.0.1                                            Git hash ac8c693
tcc       tcc version 0.9.27 mob:15e9b73 (x86_64 OpenBSD)
ccomp     The CompCert C verified compiler, version 3.9                 Using same assembler as pcc
nwcc      nwcc 0.8.3                                                    Only available via openbsd-wip
cproc     Git hash 0438799                                              Using QBE to generate assembly and same assembler as pcc
---------------------------------------------------------------------------------------------------------------------------

I built oksh with each compiler with the following invocation:

$ CC=compiler ./configure && make -j4 && strip oksh

The CFLAGS were set to -g -O2 -DEMACS -DVI for all compilers, which are the default CFLAGS generated by the configure script. For compilers that understand it, -w was also added to CFLAGS. This does not alter the generated code; it simply silences all warnings.
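The invocation above, repeated once per compiler, can be scripted roughly as follows. This is a minimal sketch, not the exact script used for these numbers: it assumes every compiler is on $PATH under the names from the table, builds each one in a fresh copy of the source tree so no object files carry over between compilers, and keeps each stripped binary as oksh.<compiler>. The build_with and build_all names and the source-tree path are hypothetical, and the -w flag is left out.

```sh
#!/bin/sh
# Hypothetical helper: build oksh with one compiler in a fresh copy of
# the source tree, then keep the stripped binary as oksh.<compiler>.
build_with() {
	bw_cc=$1    # compiler name, handed to configure via CC
	bw_src=$2   # path to an unpacked oksh source tree (assumption)
	bw_dir="build-$bw_cc"

	rm -rf "$bw_dir"
	cp -R "$bw_src" "$bw_dir" || return 1
	# Same steps as the manual invocation: configure, build, strip.
	(
		cd "$bw_dir" &&
		CC="$bw_cc" ./configure &&
		make -j4 &&
		strip oksh
	) || return 1
	cp "$bw_dir/oksh" "oksh.$bw_cc"
}

# Hypothetical driver: try every compiler from the table, reporting
# failures instead of stopping at the first one.
build_all() {
	for ba_cc in clang gcc pcc cparser lacc tcc ccomp nwcc cproc; do
		build_with "$ba_cc" "$1" || echo "build failed: $ba_cc" >&2
	done
}

# Usage (the path is an assumption): build_all "$HOME/oksh-7.0"
```

Copying the tree per compiler costs a little disk space but avoids depending on any particular clean target in the generated Makefile.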

Binary sizes

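Here are the size comparisons for oksh built with each compiler. The text, data, bss, dec, and hex columns are the output of size(1), and the last two columns give the size of the stripped binary as reported by ls -lh and ls -l: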

------------------------------------------------------------------
Compiler      text    data     bss     dec     hex  ls -lh   ls -l
------------------------------------------------------------------
clang       290366    9944   29520  329830   50866    296K  303128
gcc         268042    6392   29856  304290   4a4a2    271K  277424
pcc         264603    7842   29340  301785   49ad9    272K  278048
cparser     510704    6824   29332  546860   8582c    508K  520328
lacc        479874    8112   29528  517514   7e58a    479K  490584
tcc         333742   19272   29944  382958   5d7ee    349K  357264
ccomp       242414    2092   29920  274426   42ffa    242K  247856
nwcc        504677   33832    4064  542573   8476d    529K  541864
cproc       273085   46640     672  320397   4e38d    315K  322984
------------------------------------------------------------------
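As a quick sanity check on the size numbers, the dec column is simply text + data + bss, and hex is that same total written in hexadecimal. The bss section occupies no space in the file itself, which is one reason the ls -l figures do not line up with dec. The size_totals function below is a hypothetical helper that recomputes both columns from the first three.

```sh
# Recompute the dec and hex columns of size(1)'s Berkeley-style output
# from the text, data, and bss columns.
size_totals() {
	st_total=$(( $1 + $2 + $3 ))
	printf '%d %x\n' "$st_total" "$st_total"
}

size_totals 290366 9944 29520   # clang row: prints 329830 50866
size_totals 273085 46640 672    # cproc row: prints 320397 4e38d
```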

Runtime testing

It's hardly scientific, but I used shellbench to take some rudimentary performance benchmarks. The number in each cell represents the number of executions per second. All default timing options were used.

The CPU on this machine is nothing to write home about. The OpenBSD dmesg reports it as: cpu0: Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz, 2893.77 MHz, 06-3c-03.

Here are the results:

---------------------------------------------------------------------------------------------------------------------------------
name                                ccomp      clang    cparser      cproc        gcc       lacc       nwcc        pcc        tcc
---------------------------------------------------------------------------------------------------------------------------------
assign.sh: positional params      120,480    135,912    120,467    107,797    132,831     96,315    101,370    121,182    125,365
assign.sh: variable               149,833    168,800    152,759    133,036    175,595    112,702    126,870    154,719    159,828
assign.sh: local var              150,314    167,916    149,430    133,086    175,451    120,018    126,792    151,532    159,659
assign.sh: local var (typeset)    146,472    168,290    149,651    133,826    179,307    119,384    124,353    150,597    159,561
cmp.sh: [ ]                       102,599    114,543    104,091     90,132    120,294     84,642     88,007    102,350    110,166
cmp.sh: [[ ]]                     134,883    148,904    137,297    120,988    159,494    108,223    111,817    136,210    144,857
cmp.sh: case                      143,116    161,184    142,177    126,498    168,059    114,009    118,066    144,064    153,134
count.sh: posix                    99,377    114,402     99,062     89,284    118,188     80,479     85,073    102,314    107,129
count.sh: typeset -i               93,680    106,229     91,287     83,837    112,057     74,496     79,192     97,624    100,232
count.sh: increment               122,121    139,373    123,634    109,692    141,372     98,588    103,631    124,997    126,625
eval.sh: direct assign            102,757    112,158    103,060     91,531    116,397     81,716     86,173    101,820    107,854
eval.sh: eval assign               61,284     71,082     65,642     58,574     72,221     51,591     54,339     64,123     68,345
eval.sh: command subs               1,003      1,000        957        984        979        961        926        975        982
func.sh: no func                  157,653    177,901    157,760    141,130    183,365    124,218    131,318    158,251    167,751
func.sh: func                     123,812    133,315    123,988    112,766    143,663     98,281    102,353    123,969    113,735
null.sh: blank                    193,181    220,011    195,206    171,036    231,913    148,025    160,371    199,671    210,651
null.sh: assign variable          152,246    169,445    153,762    135,307    179,523    121,333    127,024    154,443    164,533
null.sh: define function          157,999    176,067    158,369    141,096    182,415    123,972    133,265    160,626    169,376
null.sh: undefined variable       155,830    175,639    156,716    137,595    186,286    122,140    128,986    158,758    160,851
null.sh: : command                155,128    175,486    158,795    126,223    186,013    124,761    131,694    159,717    167,526
subshell.sh: no subshell          150,788    170,053    148,928    132,214    176,504    117,694    126,137    152,294    158,430
subshell.sh: brace                145,805    161,929    144,323    130,041    165,539    116,066    122,843    146,184    141,735
subshell.sh: subshell               1,144      1,155      1,126      1,116      1,138      1,095      1,062      1,144      1,139
subshell.sh: command subs           1,080      1,058      1,015      1,046      1,045        994      1,009      1,086      1,051
---------------------------------------------------------------------------------------------------------------------------------

Context

All of these numbers need context. First, as clang is the in-base C compiler, it has been augmented with a number of security mitigations such as Retguard. Though the performance cost for such security mitigations is minimal, minimal does not equal zero. Even so, clang does manage to eke out a handful of runtime benchmark wins over gcc. Overall, gcc posts the best result in 20 of the 24 benchmarks; clang takes two, and ccomp and pcc each narrowly take one of the fork-bound command substitution tests. Realistically, clang and gcc are generally within striking distance even with clang implementing additional security mitigations.
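To check claims like these against the results table, a small awk filter can report the fastest compiler on each row. This is a sketch that assumes the table rows are saved verbatim to a file (results.txt is a hypothetical name) and that the last nine fields of each row are the per-compiler numbers in the header's column order; best_per_row is likewise a hypothetical name.

```sh
# Print the fastest compiler for each benchmark row read on stdin.
# Benchmark names contain spaces, so the last nine fields of a row are
# taken as the per-compiler results, in the table's column order.
best_per_row() {
	awk '
	BEGIN { split("ccomp clang cparser cproc gcc lacc nwcc pcc tcc", cc, " ") }
	/^[a-z]+\.sh:/ {
		best = 0; who = ""
		for (i = 1; i <= 9; i++) {
			v = $(NF - 9 + i)
			gsub(/,/, "", v)               # drop thousands separators
			if (v + 0 > best) { best = v + 0; who = cc[i] }
		}
		name = $0
		sub(/( +[0-9,]+)+$/, "", name)     # strip the numeric columns
		print name ": " who
	}'
}

# Usage, with the table saved to a hypothetical results.txt:
#   best_per_row < results.txt
```

Ties go to the leftmost column, since only a strictly larger value replaces the current best.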

Another piece of context is that three of these compilers, ccomp, nwcc, and tcc, do not build position-independent executables (PIE). As with Retguard, PIE has negligible overhead on amd64, though the overhead can be significant on i386. Even so, the lack of PIE might contribute to tcc's mostly consistent third-place finishes. Maybe not. In any event, I am quite pleasantly surprised by the numbers tcc has put up. Coupled with its incredible compilation speed, that makes tcc quite a formidable compiler. If only tcc also supported PIE.
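Whether a given binary was built as PIE can be read from its ELF header: PIE executables have type ET_DYN, while traditional fixed-address executables have type ET_EXEC. The elf_type function below is a minimal, hypothetical sketch that reads the 2-byte e_type field at offset 16 with od(1). It assumes a little-endian ELF file, as on amd64, and it cannot tell a PIE apart from a shared library, which is also ET_DYN.

```sh
# Classify an ELF file by its e_type field: ET_EXEC (2) for a
# traditional fixed-address executable, ET_DYN (3) for a PIE (or a
# shared library). e_type is the 2-byte field at offset 16 of the ELF
# header; matching the bytes as "low high" assumes little-endian (amd64).
elf_type() {
	et_bytes=$(od -A n -t x1 -j 16 -N 2 "$1" | tr -d ' \n')
	case $et_bytes in
	0200) echo EXEC ;;
	0300) echo DYN ;;
	*) echo "unknown ($et_bytes)" ;;
	esac
}

# Usage, with the hypothetical oksh.<compiler> binaries from a build:
#   for f in oksh.*; do printf '%s: %s\n' "$f" "$(elf_type "$f")"; done
```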

Of all the compilers, I am most surprised by cproc. I was somewhat hopeful that the runtime benchmarks would have been more favorable, as QBE claims 70% of the performance of advanced compilers in 10% of the code. To be clear, QBE is actually overdelivering on its promise: on most of the runtime benchmarks, cproc gets between 74% and 82% of the performance of gcc (the : command test is an outlier at about 68%, and the fork-bound subshell and command substitution tests are essentially a tie). But that is still quite a bit away from matching the rest of the smaller optimizer class: ccomp, cparser, and pcc. And then there's whatever tcc is doing. I have no idea how tcc is outperforming the small optimizer class either. I must be doing something wrong.
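The cproc percentages come from dividing cproc's executions per second by gcc's on the same row of the results table. A small helper makes that easy to repeat for any pair of columns; pct_of is a hypothetical name, and this sketch simply strips the table's thousands separators before dividing.

```sh
# Percent of the second compiler's throughput achieved by the first,
# for one shellbench row. Arguments are executions per second and may
# keep the table's comma thousands separators.
pct_of() {
	awk -v a="$1" -v b="$2" 'BEGIN {
		gsub(/,/, "", a); gsub(/,/, "", b)
		printf "%.1f\n", 100 * a / b
	}'
}

pct_of 107,797 132,831   # assign.sh: positional params, prints 81.2
pct_of 126,223 186,013   # null.sh: : command, prints 67.9
```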

The non-optimizing compilers, lacc and nwcc, are not too surprising in terms of performance. The lacc developer notes that improving code generation is on the radar. Unfortunately, nwcc has not seen any development since September 2017, so perhaps this is the best it will do unless someone wants to pick up nwcc and continue its development. Maybe that's you. Then again, tcc is also part of the non-optimizing class, and on most benchmarks it outperforms everything except clang and gcc.

A final bit of context: cproc needed some tweaks in order to ignore the inline assembly in the <machine/endian.h> header on OpenBSD and to produce PIE executables. Additionally, the cproc author also gave me a patch that slightly improves the code generated by QBE.

Conclusion

I suppose don't dispose of clang or gcc, but if you need another C compiler you have some nice options. The Tiny C Compiler certainly comes to mind, but really any of these compilers would give you a perfectly good compiling and using experience with oksh. And that's the best benchmark I can think of!

If you know of other C compilers that can compile oksh, definitely let me know. Bonus points if it works on OpenBSD, or you port it to OpenBSD.
