Dr. Brian Robert Callahan
academic, developer, with an eye towards a brighter techno-social life
If you keep tabs on the compiler space, you might have heard that last week an announcement on the GCC mailing list introduced gcobol, a new GCC frontend for COBOL. There has been some coverage of this new compiler, but not much. I did notice that a reply to the initial announcement mentions looking forward to gcobol officially becoming part of GCC. So, like we did for the Modula-2 compiler, let's make sure that gcobol works well on OpenBSD by the time it is merged into GCC. Even better, the announcement email welcomes people to use it, so let's do that. Maybe we'll even learn some COBOL along the way.
The first order of business was to clone the repository from their GitLab instance. Nothing particularly special here, and git was even nice enough to default to the cobol branch, which is where the new gcobol compiler lives.
At this point, I have built GCC so many times that I just keep scripts around to handle everything for me in an automated fashion. This is very helpful as I do in-house CI for GDC, which pulls in a checkout of both binutils and GCC from the tip of their trees and builds/installs them, including every GCC frontend except for Ada and Go. All I had to do was copy the GDC scripts to a new directory for the gcobol build and make some minor tweaks to point to the gcobol repository. I decided to build gcobol with my CI GCC compiler, since I figured the version of GCC that gcobol was using had to be at least somewhat recent. Turns out gcobol is using GCC 10.2.1, which is modern enough.
Because I don't need or want anything in my gcobol build other than gcobol, I set the --enable-languages=cobol configure flag. You still get the C and C++ compilers with this arrangement; that's fine though, and likely expected, since you need those to fully bootstrap GCC.
And then we wait. GCC is not known for being quick to compile.
There were lots of fixes necessary to build gcobol. Some were expected: OpenBSD requires some extra configuration not in upstream GCC. But many were unexpected: it appears that gcobol struggles to correctly understand OpenBSD and also the gcobol parts of the compiler are using functions not found in OpenBSD. Normally, these functions would be found in the Gnulib portability library, but it looks like things aren't working perfectly smoothly yet. I think these are things that can easily be fixed, perhaps with an upgrade to the underlying GCC version. I don't have these problems with my GDC CI version of GCC, which is the rolling tip of the tree.
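As one concrete case, gcobol's RANDOM intrinsic calls the reentrant random_r() family, which is a GNU extension, and the patch swaps in plain random(). The following is only a minimal sketch of how a random() result can be turned into "a value between zero and not quite one"; the helper name unit_random is hypothetical and is not part of gcobol.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of gcobol: return a pseudo-random
   double in the half-open range [0, 1). */
double unit_random(void)
{
    /* random() returns a long in the range 0 .. 2^31 - 1. */
    long r = random();
    assert(r >= 0);

    /* Dividing by 2^31 keeps the result strictly below 1.0. */
    return (double)r / 2147483648.0;
}
```

Note that the patch as posted calls random() without storing the result, so a more complete port would presumably assign it to retval_31 before the conversion.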
For completeness, here is the complete diff I needed to successfully build gcobol:
diff --git a/configure b/configure
index f2ec106a86e..468b8f2316f 100755
--- a/configure
+++ b/configure
@@ -5527,7 +5527,7 @@ esac
 # When bootstrapping with GCC, build stage 1 in C++98 mode to ensure that a
 # C++98 compiler can still start the bootstrap.
 if test "$enable_bootstrap:$GXX" = "yes:yes"; then
-  CXX="$CXX -std=gnu++98"
+  CXX="$CXX -std=gnu++11"
 fi

 # Used for setting $lt_cv_objdir
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 610273d91e9..a826009d2bb 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7595,7 +7595,7 @@ inline_string_cmp (rtx target, tree var_str, const char *const_str,
 /* Inline expansion of a call to str(n)cmp and memcmp, with result going
    to TARGET if that's convenient.
    If the call is not been inlined, return NULL_RTX.  */
-
+extern "C" size_t strnlen(const char *, size_t);
 static rtx
 inline_expand_builtin_bytecmp (tree exp, rtx target)
 {
diff --git a/gcc/cobol/cdf-copy.c b/gcc/cobol/cdf-copy.c
index 24a0db03e5d..6b2bf0ee90f 100644
--- a/gcc/cobol/cdf-copy.c
+++ b/gcc/cobol/cdf-copy.c
@@ -334,6 +334,11 @@ substitute( int fd, int output ) {
  * separator semicolon, or a sequence of one or more separator spaces
  * is considered to be a single space."
  */
+#ifdef __cplusplus
+extern "C" char *stpcpy(char *, const char *);
+#else
+extern char *stpcpy(char *, const char *);
+#endif
 static const char *
 esc( const char input[] ) {
   static char buffer[512];
diff --git a/gcc/cobol/cdf.y b/gcc/cobol/cdf.y
index 8be4c024c93..c18b0773032 100644
--- a/gcc/cobol/cdf.y
+++ b/gcc/cobol/cdf.y
@@ -29,6 +29,8 @@ is_word( int c ) {
 extern int yylineno, yyleng;
 extern char *yytext;

+extern ssize_t getline(char **, size_t *, FILE *);
+
 void yyerror( char const *s );
 void yyerrorv( const char fmt[], ... );
diff --git a/gcc/cobol/intrinsic.c b/gcc/cobol/intrinsic.c
index 200a9a0ac9a..ac0afaf9124 100644
--- a/gcc/cobol/intrinsic.c
+++ b/gcc/cobol/intrinsic.c
@@ -276,6 +276,17 @@ __gg__rem( size_t n, cblc_resolved_t *inputs[] )
   return dividend % divisor;
   }

+struct random_data
+  {
+    int32_t *fptr;      /* Front pointer.  */
+    int32_t *rptr;      /* Rear pointer.  */
+    int32_t *state;     /* Array of state values.  */
+    int rand_type;      /* Type of random number generator.  */
+    int rand_deg;       /* Degree of random number generator.  */
+    int rand_sep;       /* Distance between front and rear.  */
+    int32_t *end_ptr;   /* Pointer behind state table.  */
+  };
+
 extern "C"
 double
 __gg__random(size_t ninputs, cblc_resolved_t *inputs[] )
@@ -299,15 +310,15 @@ __gg__random(size_t ninputs, cblc_resolved_t *inputs[] )
     buf = (random_data *)malloc(sizeof(struct random_data));
     buf->state = NULL;
     state = (char *)malloc(state_len);
-    initstate_r( 49081, state, state_len, buf);
+    initstate( 49081, state, state_len);
     }
   if( ninputs )
     {
     int seed = (int)__gg__resolved_binary_value(&hyphen, &rdigits, inputs[0]);
-    srandom_r(seed, buf);
+    srandom(seed);
     }
   int32_t retval_31;
-  random_r(buf, &retval_31);
+  random();

   // We are going to convert this to a value between zero and not quite one:
diff --git a/gcc/cobol/libgcobol.c b/gcc/cobol/libgcobol.c
index a63d384bd5e..c41dfb98416 100644
--- a/gcc/cobol/libgcobol.c
+++ b/gcc/cobol/libgcobol.c
@@ -3563,7 +3563,7 @@ static size_t record_count( cblc_file_t *file ) {
   return sb.st_size / size;
 }

-typedef int (*file_sort_cmp_t)(const void *, const void *, void *);
+typedef int (*file_sort_cmp_t)(const void *, const void *);

 extern "C"
 void
@@ -3588,7 +3588,7 @@ __gg_file_sort( cblc_file_t *file,
     return;
   }

-  qsort_r( mem, nelem, size, cmp, NULL );
+  qsort( mem, nelem, size, cmp );

   munmap(mem, size * nelem);
   if( handle_ferror(file, __func__, "mmap() failure") )
@@ -3912,60 +3912,9 @@ struct for_sort_table
   };

 static int
-compare_for_sort_table(const void *e1, const void *e2, void *sorter_)
+compare_for_sort_table(const void *e1, const void *e2)
   {
   int retval = 0;
-  struct for_sort_table *sorter = (struct for_sort_table *)sorter_;
-
-  assert( (const unsigned char *)e1 >= sorter->bottom );
-  assert( (const unsigned char *)e2 >= sorter->bottom );
-  assert( (const unsigned char *)e1 < sorter->top );
-  assert( (const unsigned char *)e2 < sorter->top );
-
-  cblc_resolved_t left_ref = {};
-  cblc_resolved_t right_ref = {};
-
-  left_ref.field = sorter->left_side;
-  left_ref.actual_location = sorter->left_side->ref_data;
-  left_ref.actual_length = sorter->left_side->ref_capacity;
-
-  right_ref.field = sorter->right_side;
-  right_ref.actual_location = sorter->right_side->ref_data;
-  right_ref.actual_length = sorter->right_side->ref_capacity;
-
-  for(size_t i=0; i<sorter->nkeys; i++)
-    {
-    // e1 and e2 each point to an entire row of the table.
-
-    // For each key, we need to pull out the relevant piece of the row
-    // that is the actual key:
-
-    const unsigned char *key1;
-    const unsigned char *key2;
-    if( sorter->ascending[i] )
-      {
-      key1 = (const unsigned char *)e1 + sorter->left_side[i].offset;
-      key2 = (const unsigned char *)e2 + sorter->left_side[i].offset;
-      }
-    else
-      {
-      // We accomplish a descending sort by swapping the data sources
-      key1 = (const unsigned char *)e2 + sorter->left_side[i].offset;
-      key2 = (const unsigned char *)e1 + sorter->left_side[i].offset;
-      }
-
-    memcpy(sorter->left_side[i].ref_data, key1, sorter->left_side[i].ref_capacity);
-    memcpy(sorter->right_side[i].ref_data, key2, sorter->right_side[i].ref_capacity);
-
-    retval = __gg__compare(&left_ref, &right_ref, 0);
-
-    if( !retval )
-      {
-      // We are going to use the e1 and e2 pointers as a tiebreaker in
-      // order to create a stable sort.
-      retval = e1 < e2 ? -1 : 1;
-      }
-    }
   return retval;
   }

@@ -4008,11 +3957,10 @@ __gg__sort_table( cblc_resolved_t *table,
   sorter.bottom = table->actual_location;
   sorter.top = table->actual_location + occurs * table->actual_length;

-  qsort_r( table->actual_location,
+  qsort( table->actual_location,
            occurs,
            table->actual_length,
-           compare_for_sort_table,
-           &sorter );
+           compare_for_sort_table );

   // With the in-place sort completed, we are done
   for( size_t i=0; i<nkeys; i++ )
diff --git a/gcc/cobol/parse.y b/gcc/cobol/parse.y
index 148ae5e62e1..160061d4c37 100644
--- a/gcc/cobol/parse.y
+++ b/gcc/cobol/parse.y
@@ -4069,7 +4069,7 @@ write_file: write_what[name] advance_when[when] advancing
         | write_what[name]
           {
             cbl_file_t *file = cbl_file_of(symbol_file(PROGRAM, $name->name));
-            parser_file_write( file, $name, NULL, NULL );
+            parser_file_write( file, $name, __null, NULL );
             file_stack.push(file);
           }
         ;
diff --git a/gcc/cobol/scan.l b/gcc/cobol/scan.l
index ffb6051b170..3999c76414b 100644
--- a/gcc/cobol/scan.l
+++ b/gcc/cobol/scan.l
@@ -203,6 +203,11 @@ int ydfparse(void);

 FILE * copy_mode_start();

+#ifdef __cplusplus
+extern "C" char *strsignal(int);
+#else
+extern char *strsignal(int);
+#endif
 static int
 wait_for_the_children(void) {
   pid_t pid;
diff --git a/gcc/cobol/util.c b/gcc/cobol/util.c
index b5166b75f75..fa7862ed953 100644
--- a/gcc/cobol/util.c
+++ b/gcc/cobol/util.c
@@ -638,7 +638,7 @@ static size_t record_count( cbl_file_t *file ) {
   return sb.st_size / size;
 }

-typedef int (*file_sort_cmp_t)(const void *, const void *, void *);
+typedef int (*file_sort_cmp_t)(const void *, const void *);

 /*
  * mmap file, qsort, and unmap.
@@ -659,7 +659,7 @@ cbl_sort_file( cbl_file_t *file, file_sort_cmp_t cmp, void *arg ) {
     return false;
   }

-  qsort_r( mem, nelem, size, cmp, arg );
+  qsort( mem, nelem, size, cmp );

   if( 0 != munmap(mem, size * nelem) ) {
     return false;
@@ -681,7 +681,7 @@ cbl_file_union( int tgt, int src ) {
   if( 0 != fstat(tgt, &sb) ) {
     return false;
   }
-  loff_t off_in = sb.st_size, off_out = 0;
+  off_t off_in = sb.st_size, off_out = 0;

   if( 0 != fstat(src, &sb) ) {
     return false;
@@ -693,7 +693,8 @@ cbl_file_union( int tgt, int src ) {
     return false;
   }

-  ssize_t n = copy_file_range(src, &off_in, tgt, &off_out, len, flags);
+  //ssize_t n = copy_file_range(src, &off_in, tgt, &off_out, len, flags);
+  ssize_t n = 0;

   return n == (ssize_t)len;
 }
diff --git a/gcc/collect-utils.c b/gcc/collect-utils.c
index e85843bc862..5183d541d68 100644
--- a/gcc/collect-utils.c
+++ b/gcc/collect-utils.c
@@ -58,6 +58,7 @@ fatal_signal (int signum)
 }

 /* Wait for a process to finish, and exit if a nonzero status is found.  */
+extern "C" char *strsignal(int);

 int
 collect_wait (const char *prog, struct pex_obj *pex)
diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h
index 74a3eafda6b..fb65bcbb3ed 100644
--- a/gcc/config/elfos.h
+++ b/gcc/config/elfos.h
@@ -109,6 +109,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    For most svr4 systems, the convention is that any symbol which begins
    with a period is not put into the linker symbol table by the assembler.  */

+#ifdef __cplusplus
+extern "C" char *stpcpy(char *, const char *);
+#else
+extern char *stpcpy(char *, const char *);
+#endif
 #undef ASM_GENERATE_INTERNAL_LABEL
 #define ASM_GENERATE_INTERNAL_LABEL(LABEL, PREFIX, NUM) \
diff --git a/gcc/config/i386/openbsdelf.h b/gcc/config/i386/openbsdelf.h
index 7771e4c9ddb..4c094e9c699 100644
--- a/gcc/config/i386/openbsdelf.h
+++ b/gcc/config/i386/openbsdelf.h
@@ -91,13 +91,16 @@ along with GCC; see the file COPYING3.  If not see
    %{shared:-shared} %{R*} \
    %{static:-Bstatic} \
    %{!static:-Bdynamic} \
+   %{rdynamic:-export-dynamic} \
    %{assert*} \
-   -dynamic-linker /usr/libexec/ld.so"
+   %{!shared:%{!-dynamic-linker:-dynamic-linker /usr/libexec/ld.so}} \
+   %{!nostdlib:-L/usr/lib}"

 #undef STARTFILE_SPEC
 #define STARTFILE_SPEC "\
-   %{!shared: %{pg:gcrt0%O%s} %{!pg:%{p:gcrt0%O%s} %{!p:crt0%O%s}} \
-   crtbegin%O%s} %{shared:crtbeginS%O%s}"
+   %{!shared: %{pg:gcrt0%O%s} %{!pg:%{p:gcrt0%O%s} \
+   %{!p:%{!static:crt0%O%s} %{static:%{nopie:crt0%O%s} \
+   %{!nopie:rcrt0%O%s}}}} crtbegin%O%s} %{shared:crtbeginS%O%s}"

 #undef ENDFILE_SPEC
 #define ENDFILE_SPEC "%{!shared:crtend%O%s} %{shared:crtendS%O%s}"
diff --git a/gcc/config/openbsd.opt b/gcc/config/openbsd.opt
index ae7926a3719..3db4d647b9e 100644
--- a/gcc/config/openbsd.opt
+++ b/gcc/config/openbsd.opt
@@ -32,4 +32,7 @@ Driver
 pthread
 Driver

+rdynamic
+Driver
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/config/t-openbsd b/gcc/config/t-openbsd
index 7637da073b2..ccbba29a4b7 100644
--- a/gcc/config/t-openbsd
+++ b/gcc/config/t-openbsd
@@ -1,2 +1,6 @@
 # We don't need GCC's own include files.
-USER_H = $(EXTRA_HEADERS)
+USER_H = $(srcdir)/ginclude/stdfix.h \
+         $(srcdir)/ginclude/stdnoreturn.h \
+         $(srcdir)/ginclude/stdalign.h \
+         $(srcdir)/ginclude/stdatomic.h \
+         $(EXTRA_HEADERS)
diff --git a/gcc/configure b/gcc/configure
index 8fe9c91fd7c..4220c675d6a 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -30723,7 +30723,7 @@ if ${gcc_cv_c_no_fpie+:} false; then :
   $as_echo_n "(cached) " >&6
 else
   saved_CXXFLAGS="$CXXFLAGS"
-   CXXFLAGS="$CXXFLAGS -fno-PIE"
+   CXXFLAGS="$CXXFLAGS"
    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 int main(void) {return 0;}
@@ -30739,7 +30739,7 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_c_no_fpie" >&5
 $as_echo "$gcc_cv_c_no_fpie" >&6; }
 if test "$gcc_cv_c_no_fpie" = "yes"; then
-  NO_PIE_CFLAGS="-fno-PIE"
+  NO_PIE_CFLAGS=""
 fi


@@ -30750,7 +30750,7 @@ if ${gcc_cv_no_pie+:} false; then :
   $as_echo_n "(cached) " >&6
 else
   saved_LDFLAGS="$LDFLAGS"
-   LDFLAGS="$LDFLAGS -no-pie"
+   LDFLAGS="$LDFLAGS"
    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 int main(void) {return 0;}
@@ -30767,7 +30767,7 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_no_pie" >&5
 $as_echo "$gcc_cv_no_pie" >&6; }
 if test "$gcc_cv_no_pie" = "yes"; then
-  NO_PIE_FLAG="-no-pie"
+  NO_PIE_FLAG=""
 fi


diff --git a/gcc/gcc-ar.c b/gcc/gcc-ar.c
index 3e1c9fe8569..98c935835d2 100644
--- a/gcc/gcc-ar.c
+++ b/gcc/gcc-ar.c
@@ -122,6 +122,8 @@ setup_prefixes (const char *exec_path)
   prefix_from_env ("PATH", &path);
 }

+extern "C" char *strsignal(int);
+
 int
 main (int ac, char **av)
 {
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 9f790db0daf..08ab557e323 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -3019,6 +3019,7 @@ add_sysrooted_hdrs_prefix (struct path_prefix *pprefix, const char *prefix,
    with `|' between them.

    Return 0 if successful, -1 if failed.  */
+extern "C" char *strsignal(int);

 static int
 execute (void)
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 2d7c5292151..95f3db1003b 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2382,6 +2382,7 @@ gimple_load_first_char (location_t loc, tree str, gimple_seq *stmts)
 }

 /* Fold a call to the str{n}{case}cmp builtin pointed by GSI iterator.  */
+extern "C" size_t strnlen(const char *, size_t);

 static bool
 gimple_fold_builtin_string_compare (gimple_stmt_iterator *gsi)
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 9ba687fd775..bccf2599920 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1011,6 +1011,7 @@ output_symtab (void)
 }

 /* Return identifier encoded in IB as a plain string.  */
+extern "C" size_t strnlen(const char *, size_t);

 static tree
 read_identifier (class lto_input_block *ib)
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 407f7300dfb..168732f25de 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -1065,6 +1065,7 @@ static const char *get_end_url_string (pretty_printer *);
 /* Formatting phases 1 and 2: render TEXT->format_spec plus
    TEXT->args_ptr into a series of chunks in pp_buffer (PP)->args[].
    Phase 3 is in pp_output_formatted_text.  */
+extern "C" size_t strnlen(const char *, size_t);

 void
 pp_format (pretty_printer *pp, text_info *text)
diff --git a/gcc/toplev.c b/gcc/toplev.c
index e0b1b85731f..b1db0c66306 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -311,6 +311,7 @@ set_random_seed (const char *val)
 /* Handler for fatal signals, such as SIGSEGV.  These are transformed
    into ICE messages, which is much more user friendly.  In case the
    error printer crashes, reset the signal to prevent infinite recursion.  */
+extern "C" char *strsignal(int);

 static void
 crash_signal (int signo)
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index b0874da5d1e..92b96fbc3fd 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -4639,6 +4639,7 @@ count_nonzero_bytes_addr (tree, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
    Uses RVALS to determine range information.
    Avoids recursing deeper than the limits in SNLIM allow.
    Returns true on success and false otherwise.  */
+extern "C" size_t strnlen(const char *, size_t);

 static bool
 count_nonzero_bytes (tree exp, unsigned HOST_WIDE_INT offset,
diff --git a/libcc1/connection.cc b/libcc1/connection.cc
index a91dfc8c5e2..b73c5440b6d 100644
--- a/libcc1/connection.cc
+++ b/libcc1/connection.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 #include <cc1plugin-config.h>
 #include <string>
 #include <unistd.h>
+#include <sys/signal.h>
 #include <sys/types.h>
 #include <string.h>
 #include <errno.h>
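One consequence of the diff above is worth spelling out: because plain qsort() has no user-data argument, the patched compare_for_sort_table() loses its sorter context and always returns 0. The following is only a sketch of one common workaround, which parks the context in a file-scope pointer for the duration of the sort. Every name here (sort_ctx, active_ctx, compare_rows, sort_rows) is hypothetical and none of it is gcobol code; it also assumes a single byte-wise key per row, which is much simpler than what the real comparator does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context: which bytes of each fixed-size row form the key. */
struct sort_ctx {
    size_t key_offset;  /* offset of the key within a row */
    size_t key_len;     /* length of the key in bytes */
    int ascending;      /* nonzero for ascending order */
};

/* Plain qsort() cannot pass user data to the comparator, so the context
   is parked here. This makes the sort non-reentrant and not thread-safe. */
static const struct sort_ctx *active_ctx;

static int compare_rows(const void *e1, const void *e2)
{
    const unsigned char *k1 = (const unsigned char *)e1 + active_ctx->key_offset;
    const unsigned char *k2 = (const unsigned char *)e2 + active_ctx->key_offset;

    /* A descending sort is done by swapping the two operands. */
    return active_ctx->ascending ? memcmp(k1, k2, active_ctx->key_len)
                                 : memcmp(k2, k1, active_ctx->key_len);
}

/* Sort nrows rows of row_size bytes in place, ordered by the key in ctx. */
void sort_rows(void *rows, size_t nrows, size_t row_size,
               const struct sort_ctx *ctx)
{
    assert(ctx->key_offset + ctx->key_len <= row_size);
    active_ctx = ctx;
    qsort(rows, nrows, row_size, compare_rows);
    active_ctx = NULL;
}
```

The trade-off is that only one such sort can be in flight at a time, which is exactly the limitation that qsort_r() exists to remove.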
Now I had a shiny new gcobol compiler. I guess I need to learn some COBOL.
One of the articles I read about gcobol mentioned that there was a textbook repository containing lots of small COBOL programs, and that gcobol was already extremely proficient at correctly compiling these programs. Seems like as good a place as any to begin our COBOL journey. There is also a COBOL85 test suite that the gcobol team is working on fully passing. That would be good for us to test as well, even if we get lots of errors now. We can see the progression of the compiler as more and more of the test suite compiles successfully.
Taking a look at the programs in the textbook repository, it looks like COBOL is one of those SHOUTING LANGUAGES: a language whose keywords are written in all capital letters. Don't take that SHOUTING LANGUAGES thing too seriously; it's just a bit of humor.
I think I might need to go out and buy that textbook. Some of this COBOL code makes sense to me, such as the divvying up of the different sections. But some of the semantics of the language, particularly around data, are lost on me.
Keep in mind that gcobol is still alpha software, so you have to be willing to work with the limitations. I noticed that gcobol does not automatically figure out how to add a main symbol, so even though a COBOL program may be valid, gcobol won't be able to successfully link it. I discovered that changing the PROGRAM-ID of the program to main will get things working. You can see a picture of this here.
For the interested, here is the size of the simple hello world program from the image above:
text    data    bss     dec     hex
55245   5944    3106    64295   fb27
You also need to manually link your COBOL programs with -lgcobol -lm. At least powl is used in libgcobol, which is why both libraries are needed. But at least for simple programs, these two small tweaks (renaming the program to main to get a main symbol and adding -lgcobol -lm to the compiler invocation) are enough to successfully compile and link working programs.
I do intend to reach out to upstream to report the status of gcobol on OpenBSD. I understand they have other, more pressing issues, but if they really do intend to merge this into GCC, I want to make sure that OpenBSD can take advantage of gcobol on day one.
If you'd like your own gcobol compiler, I have posted an amd64 package ready to be installed with pkg_add(1) on GitHub here. This package requires the devel/gas package, which will be auto-installed for you when installing the gcobol package. I will continue to update the package from time to time.
This gcobol compiler will install into /usr/local/cobol, so please add /usr/local/cobol/bin to your PATH if you want to use it without typing in the full path every time.
If you need my configure scripts and package scripts because you want to bootstrap gcobol on a different architecture, feel free to reach out to me.
I spoke to gcobol upstream. Among other things, I learned that you can add a -main flag to your compiles and gcobol will insert a main() function that calls the function identified by PROGRAM-ID, which works for me in all the cases I've tried. Neat!