Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life


I built the new gcobol compiler on OpenBSD

If you keep tabs on the compiler space, you might have heard that an announcement on the GCC mailing list last week introduced gcobol, a new GCC frontend for COBOL. There has been some coverage of this new compiler, but not much. I did notice that a reply to the initial announcement mentions looking forward to gcobol officially becoming part of GCC. So, like we did for the Modula-2 compiler, let's make sure that gcobol works well on OpenBSD by the time it is merged into GCC. Even better, the announcement email welcomes people to use it, so let's do that. Maybe we'll even learn some COBOL along the way.

Cloning the gcc-cobol repository

The first order of business was to clone the repository from their GitLab instance. Nothing particularly special here, and git was even nice enough to default to the cobol branch, which is where the new gcobol compiler lives.

Configuring gcobol

At this point, I have built GCC so many times that I just keep scripts around to handle everything for me in an automated fashion. This is very helpful as I do in-house CI for GDC, which pulls in a checkout of both binutils and GCC from the tip of their trees and builds and installs them, including every GCC frontend except for Ada and Go. All I had to do was copy the GDC scripts to a new directory for the gcobol build and make some minor tweaks to point to the gcobol repository. I decided to build gcobol with my CI GCC compiler, since I assumed the version of GCC that gcobol is based on had to be at least somewhat recent. It turns out gcobol is based on GCC 10.2.1, which is modern enough.

Because I don't need or want anything in my gcobol build other than gcobol, I set the --enable-languages=cobol configure flag. You still get the C and C++ compilers with this arrangement; that's fine though, and likely expected since you need those to fully bootstrap GCC.

And then we wait. GCC is not known for being quick to compile.

Building gcobol

Building gcobol required a lot of fixes. Some were expected: OpenBSD requires some extra configuration that is not in upstream GCC. But many were unexpected: the gcobol tree does not seem to understand OpenBSD correctly, and the gcobol parts of the compiler use functions not found in OpenBSD. Normally, these functions would come from the Gnulib portability library, but it looks like things aren't working perfectly smoothly yet. I think these are things that can easily be fixed, perhaps with an upgrade to the underlying GCC version. I don't have these problems with my GDC CI version of GCC, which is the rolling tip of the tree.

For completeness, here is the full diff I needed to build gcobol successfully:

diff --git a/configure b/configure
index f2ec106a86e..468b8f2316f 100755
--- a/configure
+++ b/configure
@@ -5527,7 +5527,7 @@ esac
 # When bootstrapping with GCC, build stage 1 in C++98 mode to ensure that a
 # C++98 compiler can still start the bootstrap.
 if test "$enable_bootstrap:$GXX" = "yes:yes"; then
-  CXX="$CXX -std=gnu++98"
+  CXX="$CXX -std=gnu++11"
 # Used for setting $lt_cv_objdir
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 610273d91e9..a826009d2bb 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7595,7 +7595,7 @@ inline_string_cmp (rtx target, tree var_str, const char *const_str,
 /* Inline expansion of a call to str(n)cmp and memcmp, with result going
    to TARGET if that's convenient.
    If the call is not been inlined, return NULL_RTX.  */
+extern "C" size_t strnlen(const char *, size_t);
 static rtx
 inline_expand_builtin_bytecmp (tree exp, rtx target)
diff --git a/gcc/cobol/cdf-copy.c b/gcc/cobol/cdf-copy.c
index 24a0db03e5d..6b2bf0ee90f 100644
--- a/gcc/cobol/cdf-copy.c
+++ b/gcc/cobol/cdf-copy.c
@@ -334,6 +334,11 @@ substitute( int fd, int output ) {
  * separator semicolon, or a sequence of one or more separator spaces
  * is considered to be a single space."
+#ifdef __cplusplus
+extern "C" char *stpcpy(char *, const char *);
+#else
+extern char *stpcpy(char *, const char *);
+#endif
 static const char *
 esc( const char input[] ) {
   static char buffer[512];
diff --git a/gcc/cobol/cdf.y b/gcc/cobol/cdf.y
index 8be4c024c93..c18b0773032 100644
--- a/gcc/cobol/cdf.y
+++ b/gcc/cobol/cdf.y
@@ -29,6 +29,8 @@ is_word( int c ) {
 extern int yylineno, yyleng;
 extern char *yytext;
+extern ssize_t getline(char **, size_t *, FILE *);
 void yyerror( char const *s );
 void yyerrorv( const char fmt[], ... );
diff --git a/gcc/cobol/intrinsic.c b/gcc/cobol/intrinsic.c
index 200a9a0ac9a..ac0afaf9124 100644
--- a/gcc/cobol/intrinsic.c
+++ b/gcc/cobol/intrinsic.c
@@ -276,6 +276,17 @@ __gg__rem( size_t n, cblc_resolved_t *inputs[] )
     return dividend % divisor;
+struct random_data
+  {
+    int32_t *fptr;		/* Front pointer.  */
+    int32_t *rptr;		/* Rear pointer.  */
+    int32_t *state;		/* Array of state values.  */
+    int rand_type;		/* Type of random number generator.  */
+    int rand_deg;		/* Degree of random number generator.  */
+    int rand_sep;		/* Distance between front and rear.  */
+    int32_t *end_ptr;		/* Pointer behind state table.  */
+  };
 extern "C"
 __gg__random(size_t ninputs, cblc_resolved_t *inputs[] )
@@ -299,15 +310,15 @@ __gg__random(size_t ninputs, cblc_resolved_t *inputs[] )
         buf = (random_data *)malloc(sizeof(struct random_data));
         buf->state = NULL;
         state = (char *)malloc(state_len);
-        initstate_r( 49081, state, state_len, buf);
+        initstate( 49081, state, state_len);
     if( ninputs )
         int seed = (int)__gg__resolved_binary_value(&hyphen, &rdigits, inputs[0]);
-        srandom_r(seed, buf);
+        srandom(seed);
     int32_t retval_31;
-    random_r(buf, &retval_31);
+    random();
     // We are going to convert this to a value between zero and not quite one:
diff --git a/gcc/cobol/libgcobol.c b/gcc/cobol/libgcobol.c
index a63d384bd5e..c41dfb98416 100644
--- a/gcc/cobol/libgcobol.c
+++ b/gcc/cobol/libgcobol.c
@@ -3563,7 +3563,7 @@ static size_t record_count( cblc_file_t *file ) {
     return sb.st_size / size;
-typedef int (*file_sort_cmp_t)(const void *, const void *, void *);
+typedef int (*file_sort_cmp_t)(const void *, const void *);
 extern "C"
@@ -3588,7 +3588,7 @@ __gg_file_sort( cblc_file_t *file,
-    qsort_r( mem, nelem, size, cmp, NULL );
+    qsort( mem, nelem, size, cmp );
     munmap(mem, size * nelem);
     if( handle_ferror(file, __func__, "mmap() failure") )
@@ -3912,60 +3912,9 @@ struct for_sort_table
 static int
-compare_for_sort_table(const void *e1, const void *e2, void *sorter_)
+compare_for_sort_table(const void *e1, const void *e2)
     int retval = 0;
-    struct for_sort_table *sorter = (struct for_sort_table *)sorter_;
-    assert( (const unsigned char *)e1 >= sorter->bottom );
-    assert( (const unsigned char *)e2 >= sorter->bottom );
-    assert( (const unsigned char *)e1 < sorter->top );
-    assert( (const unsigned char *)e2 < sorter->top );
-    cblc_resolved_t left_ref  = {};
-    cblc_resolved_t right_ref = {};
-    left_ref.field           = sorter->left_side;
-    left_ref.actual_location = sorter->left_side->ref_data;
-    left_ref.actual_length   = sorter->left_side->ref_capacity;
-    right_ref.field           = sorter->right_side;
-    right_ref.actual_location = sorter->right_side->ref_data;
-    right_ref.actual_length   = sorter->right_side->ref_capacity;
-    for(size_t i=0; i<sorter->nkeys; i++)
-        {
-        // e1 and e2 each point to an entire row of the table.
-        // For each key, we need to pull out the relevant piece of the row
-        // that is the actual key:
-        const unsigned char *key1;
-        const unsigned char *key2;
-        if( sorter->ascending[i] )
-            {
-            key1 = (const unsigned char *)e1 + sorter->left_side[i].offset;
-            key2 = (const unsigned char *)e2 + sorter->left_side[i].offset;
-            }
-        else
-            {
-            // We accomplish a descending sort by swapping the data sources
-            key1 = (const unsigned char *)e2 + sorter->left_side[i].offset;
-            key2 = (const unsigned char *)e1 + sorter->left_side[i].offset;
-            }
-        memcpy(sorter->left_side[i].ref_data,  key1, sorter->left_side[i].ref_capacity);
-        memcpy(sorter->right_side[i].ref_data, key2, sorter->right_side[i].ref_capacity);
-        retval = __gg__compare(&left_ref, &right_ref, 0);
-        if( !retval )
-            {
-            // We are going to use the e1 and e2 pointers as a tiebreaker in
-            // order to create a stable sort.
-            retval = e1 < e2 ? -1 : 1;
-            }
-        }
     return retval;
@@ -4008,11 +3957,10 @@ __gg__sort_table(   cblc_resolved_t  *table,
     sorter.bottom = table->actual_location;
     sorter.top    = table->actual_location + occurs * table->actual_length;
-    qsort_r(    table->actual_location,
+    qsort(      table->actual_location,
-                compare_for_sort_table,
-                &sorter );
+                compare_for_sort_table );
     // With the in-place sort completed, we are done
     for( size_t i=0; i<nkeys; i++ )
diff --git a/gcc/cobol/parse.y b/gcc/cobol/parse.y
index 148ae5e62e1..160061d4c37 100644
--- a/gcc/cobol/parse.y
+++ b/gcc/cobol/parse.y
@@ -4069,7 +4069,7 @@ write_file: 	write_what[name] advance_when[when] advancing
 	|	write_what[name]
 		  cbl_file_t *file = cbl_file_of(symbol_file(PROGRAM, $name->name));
-		  parser_file_write( file, $name, NULL, NULL );
+		  parser_file_write( file, $name, __null, NULL );
diff --git a/gcc/cobol/scan.l b/gcc/cobol/scan.l
index ffb6051b170..3999c76414b 100644
--- a/gcc/cobol/scan.l
+++ b/gcc/cobol/scan.l
@@ -203,6 +203,11 @@ int ydfparse(void);
 FILE * copy_mode_start();
+#ifdef __cplusplus
+extern "C" char *strsignal(int);
+#else
+extern char *strsignal(int);
+#endif
 static int
 wait_for_the_children(void) {
   pid_t pid;
diff --git a/gcc/cobol/util.c b/gcc/cobol/util.c
index b5166b75f75..fa7862ed953 100644
--- a/gcc/cobol/util.c
+++ b/gcc/cobol/util.c
@@ -638,7 +638,7 @@ static size_t record_count( cbl_file_t *file ) {
   return sb.st_size / size;
-typedef int (*file_sort_cmp_t)(const void *, const void *, void *);
+typedef int (*file_sort_cmp_t)(const void *, const void *);
  * mmap file, qsort, and unmap. 
@@ -659,7 +659,7 @@ cbl_sort_file( cbl_file_t *file, file_sort_cmp_t cmp, void *arg ) {
     return false;
-  qsort_r( mem, nelem, size, cmp, arg );
+  qsort( mem, nelem, size, cmp );
   if( 0 != munmap(mem, size * nelem) ) {
     return false;
@@ -681,7 +681,7 @@ cbl_file_union( int tgt, int src ) {
   if( 0 != fstat(tgt, &sb) ) {
     return false;
-  loff_t off_in = sb.st_size, off_out = 0; 
+  off_t off_in = sb.st_size, off_out = 0; 
   if( 0 != fstat(src, &sb) ) {
     return false;
@@ -693,7 +693,8 @@ cbl_file_union( int tgt, int src ) {
     return false;
-  ssize_t n = copy_file_range(src, &off_in, tgt, &off_out, len, flags);
+  //ssize_t n = copy_file_range(src, &off_in, tgt, &off_out, len, flags);
+  ssize_t n = 0;
   return n == (ssize_t)len;
diff --git a/gcc/collect-utils.c b/gcc/collect-utils.c
index e85843bc862..5183d541d68 100644
--- a/gcc/collect-utils.c
+++ b/gcc/collect-utils.c
@@ -58,6 +58,7 @@ fatal_signal (int signum)
 /* Wait for a process to finish, and exit if a nonzero status is found.  */
+extern "C" char *strsignal(int);
 collect_wait (const char *prog, struct pex_obj *pex)
diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h
index 74a3eafda6b..fb65bcbb3ed 100644
--- a/gcc/config/elfos.h
+++ b/gcc/config/elfos.h
@@ -109,6 +109,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    For most svr4 systems, the convention is that any symbol which begins
    with a period is not put into the linker symbol table by the assembler.  */
+#ifdef __cplusplus
+extern "C" char *stpcpy(char *, const char *);
+#else
+extern char *stpcpy(char *, const char *);
+#endif
diff --git a/gcc/config/i386/openbsdelf.h b/gcc/config/i386/openbsdelf.h
index 7771e4c9ddb..4c094e9c699 100644
--- a/gcc/config/i386/openbsdelf.h
+++ b/gcc/config/i386/openbsdelf.h
@@ -91,13 +91,16 @@ along with GCC; see the file COPYING3.  If not see
    %{shared:-shared} %{R*} \
    %{static:-Bstatic} \
    %{!static:-Bdynamic} \
+   %{rdynamic:-export-dynamic} \
    %{assert*} \
-   -dynamic-linker /usr/libexec/"
+   %{!shared:%{!-dynamic-linker:-dynamic-linker /usr/libexec/}} \
+   %{!nostdlib:-L/usr/lib}"
 #define STARTFILE_SPEC "\
-	%{!shared: %{pg:gcrt0%O%s} %{!pg:%{p:gcrt0%O%s} %{!p:crt0%O%s}} \
-	crtbegin%O%s} %{shared:crtbeginS%O%s}"
+	%{!shared: %{pg:gcrt0%O%s} %{!pg:%{p:gcrt0%O%s} \
+	%{!p:%{!static:crt0%O%s} %{static:%{nopie:crt0%O%s} \
+	%{!nopie:rcrt0%O%s}}}} crtbegin%O%s} %{shared:crtbeginS%O%s}"
 #define ENDFILE_SPEC "%{!shared:crtend%O%s} %{shared:crtendS%O%s}"
diff --git a/gcc/config/openbsd.opt b/gcc/config/openbsd.opt
index ae7926a3719..3db4d647b9e 100644
--- a/gcc/config/openbsd.opt
+++ b/gcc/config/openbsd.opt
@@ -32,4 +32,7 @@ Driver
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/config/t-openbsd b/gcc/config/t-openbsd
index 7637da073b2..ccbba29a4b7 100644
--- a/gcc/config/t-openbsd
+++ b/gcc/config/t-openbsd
@@ -1,2 +1,6 @@
 # We don't need GCC's own include files.
+USER_H = $(srcdir)/ginclude/stdfix.h \
+	 $(srcdir)/ginclude/stdnoreturn.h \
+	 $(srcdir)/ginclude/stdalign.h \
+	 $(srcdir)/ginclude/stdatomic.h \
diff --git a/gcc/configure b/gcc/configure
index 8fe9c91fd7c..4220c675d6a 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -30723,7 +30723,7 @@ if ${gcc_cv_c_no_fpie+:} false; then :
   $as_echo_n "(cached) " >&6
    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 int main(void) {return 0;}
@@ -30739,7 +30739,7 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_c_no_fpie" >&5
 $as_echo "$gcc_cv_c_no_fpie" >&6; }
 if test "$gcc_cv_c_no_fpie" = "yes"; then
@@ -30750,7 +30750,7 @@ if ${gcc_cv_no_pie+:} false; then :
   $as_echo_n "(cached) " >&6
-   LDFLAGS="$LDFLAGS -no-pie"
    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 int main(void) {return 0;}
@@ -30767,7 +30767,7 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_no_pie" >&5
 $as_echo "$gcc_cv_no_pie" >&6; }
 if test "$gcc_cv_no_pie" = "yes"; then
-  NO_PIE_FLAG="-no-pie"
diff --git a/gcc/gcc-ar.c b/gcc/gcc-ar.c
index 3e1c9fe8569..98c935835d2 100644
--- a/gcc/gcc-ar.c
+++ b/gcc/gcc-ar.c
@@ -122,6 +122,8 @@ setup_prefixes (const char *exec_path)
   prefix_from_env ("PATH", &path);
+extern "C" char *strsignal(int);
 main (int ac, char **av)
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 9f790db0daf..08ab557e323 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -3019,6 +3019,7 @@ add_sysrooted_hdrs_prefix (struct path_prefix *pprefix, const char *prefix,
    with `|' between them.
    Return 0 if successful, -1 if failed.  */
+extern "C" char *strsignal(int);
 static int
 execute (void)
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 2d7c5292151..95f3db1003b 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2382,6 +2382,7 @@ gimple_load_first_char (location_t loc, tree str, gimple_seq *stmts)
 /* Fold a call to the str{n}{case}cmp builtin pointed by GSI iterator.  */
+extern "C" size_t strnlen(const char *, size_t);
 static bool
 gimple_fold_builtin_string_compare (gimple_stmt_iterator *gsi)
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 9ba687fd775..bccf2599920 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1011,6 +1011,7 @@ output_symtab (void)
 /* Return identifier encoded in IB as a plain string.  */
+extern "C" size_t strnlen(const char *, size_t);
 static tree
 read_identifier (class lto_input_block *ib)
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 407f7300dfb..168732f25de 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -1065,6 +1065,7 @@ static const char *get_end_url_string (pretty_printer *);
 /* Formatting phases 1 and 2: render TEXT->format_spec plus
    TEXT->args_ptr into a series of chunks in pp_buffer (PP)->args[].
    Phase 3 is in pp_output_formatted_text.  */
+extern "C" size_t strnlen(const char *, size_t);
 pp_format (pretty_printer *pp, text_info *text)
diff --git a/gcc/toplev.c b/gcc/toplev.c
index e0b1b85731f..b1db0c66306 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -311,6 +311,7 @@ set_random_seed (const char *val)
 /* Handler for fatal signals, such as SIGSEGV.  These are transformed
    into ICE messages, which is much more user friendly.  In case the
    error printer crashes, reset the signal to prevent infinite recursion.  */
+extern "C" char *strsignal(int);
 static void
 crash_signal (int signo)
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index b0874da5d1e..92b96fbc3fd 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -4639,6 +4639,7 @@ count_nonzero_bytes_addr (tree, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
    Uses RVALS to determine range information.
    Avoids recursing deeper than the limits in SNLIM allow.
    Returns true on success and false otherwise.  */
+extern "C" size_t strnlen(const char *, size_t);
 static bool
 count_nonzero_bytes (tree exp, unsigned HOST_WIDE_INT offset,
diff --git a/libcc1/ b/libcc1/
index a91dfc8c5e2..b73c5440b6d 100644
--- a/libcc1/
+++ b/libcc1/
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 #include <cc1plugin-config.h>
 #include <string>
 #include <unistd.h>
+#include <sys/signal.h>
 #include <sys/types.h>
 #include <string.h>
 #include <errno.h>

Trying out the new compiler

Now I had a shiny new gcobol compiler. I guess I need to learn some COBOL.

One of the articles I read about gcobol mentioned that there was a textbook repository containing lots of small COBOL programs, and that gcobol was already extremely proficient at correctly compiling these programs. Seems like as good a place as any to begin our COBOL journey. There is also a COBOL85 test suite that the gcobol team is working on fully passing. That would be good for us to test as well, even if we get lots of errors now. We can see the progression of the compiler as more and more of the test suite compiles successfully.

Taking a look at the programs in the textbook repository, it looks like COBOL is one of those SHOUTING LANGUAGES, or a language whose keywords are written in all capital letters. You shouldn't take that SHOUTING LANGUAGES thing too seriously; it's just a bit of humor.

I think I might need to go out and buy that textbook. Some of this COBOL code makes sense to me, such as the divvying up of the different sections. But some of the semantics of the language, particularly around the data, are lost on me.

Keep in mind that gcobol is still alpha software, so you have to be willing to work with its limitations. I noticed that gcobol does not automatically add a main symbol, so even a valid COBOL program will fail to link. I discovered that changing the PROGRAM-ID of the program to main gets things working. You can see a picture of this here.

For the interested, here is the size of the simple hello world program from the image above:

text    data    bss     dec     hex
55245   5944    3106    64295   fb27

You also need to manually link your COBOL programs with -lgcobol -lm. libgcobol uses at least powl, which is why both libraries are needed. For simple programs, though, these two small tweaks (renaming the program to main to get a main symbol and adding -lgcobol -lm to the compiler invocation) are enough to successfully compile and link working programs.


I do intend to reach out to upstream to report the status of gcobol on OpenBSD. I understand they have other, more pressing issues, but if they really do intend to merge this into GCC, I want to make sure that OpenBSD can take advantage of gcobol on day one.

Getting your own gcobol compiler

If you'd like your own gcobol compiler, I have posted an amd64 package ready to be installed with pkg_add(1) on GitHub here. This package requires the devel/gas package, which will be auto-installed for you when installing the gcobol package. I will continue to update the package from time to time.

This gcobol compiler installs into /usr/local/cobol, so please add /usr/local/cobol/bin to your PATH if you want to use it without typing the full path every time.

If you need my configure scripts and package scripts because you want to bootstrap gcobol on a different architecture, feel free to reach out to me.


I spoke to gcobol upstream. Among other things, I learned that you can add a -main flag to your compiles and gcobol will insert a main() function that calls the function identified by PROGRAM-ID, which works for me in all the cases I've tried. Neat!