Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life




2021-04-09
Demystifying programs that create programs, part 3: Globals, passes, and error handling

All source code for this blog post can be found here.

Let's continue writing our assembler. Today, I want to set up any global variables we might need and also set up error handling.

Starting to do the work

First, though, let's write a first take of the assemble function that our main function calls. What does this function need to do? Let's keep it simple: this function will be the central point from which work is delegated. We already know how many lines we need to process: it's the number of items in our lines array. We have easy access to that in D through the .length property of lines: lines.length.

/**
 * Top-level assembly function.
 * Everything cascades downward from here.
 */
static void assemble(string[] lines, string outfile)
{
    for (lineno = 0; lineno < lines.length; lineno++)
        parse();

    fileWrite(outfile);
}

I guess that means we need a lineno variable too. I am going to make it a global variable, since I think I might use it in a number of places and a global is easier than passing it through each and every function: static size_t lineno;. I made it a size_t rather than a ushort on purpose. A legitimate program will never have that many lines, but a ushort tops out at 65535 and then wraps back around to 0. If someone maliciously tried to assemble a program of more than 65535 lines, lineno < lines.length would never become false and we would loop forever.
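To see the wraparound for yourself, here is a minimal sketch. The nextLine helper is a made-up name for illustration and is not part of our assembler:

```d
import std.stdio;

// Hypothetical helper for illustration: advance a 16-bit counter by one.
ushort nextLine(ushort n)
{
    n++;        // ushort arithmetic wraps: 65535 + 1 becomes 0
    return n;
}

void main()
{
    writeln(ushort.max);            // prints 65535
    writeln(nextLine(ushort.max));  // prints 0
}
```

A loop counter that falls back to 0 can never climb past a length of 65536 or more, which is exactly the infinite loop we are avoiding.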

If we complete our parsing, then we have a valid program. Let's make a new function called fileWrite that writes out the object code to a file:

/**
 * After all code is emitted, write it out to a file.
 */
static void fileWrite(string outfile) {
    import std.file : write;

    write(outfile, output);
}

Here, we need to make certain we use the correct write. We import both std.stdio and std.file at the top of our file, and each module has its own function named write: the std.stdio one prints its arguments to standard output, while the std.file one writes a buffer to a named file. By importing write from std.file inside the function, right before calling it, we ensure that our D compiler chooses the one that writes our bytes to a file.
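Here is a small sketch of the two write functions side by side. The saveBytes function and the demo.bin file name are made up for this example:

```d
import std.stdio;

// Hypothetical helper: same shape as fileWrite, but it takes the bytes
// as a parameter instead of reading a global.
void saveBytes(string path, const ubyte[] data)
{
    import std.file : write; // inside this function, write means std.file.write

    write(path, data);       // creates or overwrites the file with the bytes
}

void main()
{
    write("hello\n");        // std.stdio.write: prints to standard output

    ubyte[] code = [0x3e, 0x01];
    saveBytes("demo.bin", code);
}
```

The scoped import only affects saveBytes. Everywhere else in the file, write still means the std.stdio version.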

Some more globals

I used an output variable in the fileWrite function. I think that variable should hold our binary file as we are building it line by line. That makes it an array of ubytes, so another global: static ubyte[] output;. We should also create a global variable so that we know what the address of the current byte is. That one should be a ushort, since its maximum value (65535) equals the largest memory address the Z80 can address: static ushort addr;.
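As a rough sketch of how these two globals might work together, imagine a helper that adds one byte to the output. The emit function here is an assumption for illustration only; we have not written anything like it yet:

```d
import std.stdio;

static ubyte[] output;  // the binary as built so far
static ushort addr;     // address of the next byte

// Hypothetical helper: append one byte and advance the address.
static void emit(ubyte b)
{
    output ~= b;  // ~= appends to a dynamic array
    addr++;
}

void main()
{
    emit(0x3e);
    emit(0x01);
    writeln(output.length);  // prints 2
    writeln(addr);           // prints 2
}
```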

Passes

If you remember from the previous post, we mentioned that we had an issue with labels: we are allowed to reference a label before we define it. However, if that happens, how can the assembler possibly know what the substitution value is for the reference?

The solution is to introduce the idea of passes. That is, we will read through the source code more than once. For us, we need to read through the source code twice: the first pass, we will do nothing but record the addresses for all label definitions; the second pass, we will output code. During the second pass, because we already recorded all the label addresses, we will know what address is being asked for in the case that a label is referenced before it is defined.

This does mean we will need to parse the source code twice. But that is fine. It will still be very fast to assemble. Let's update our assemble function now:

/**
 * Top-level assembly function.
 * Everything cascades downward from here.
 */
static void assemble(string[] lines, string outfile)
{
    pass = 1;
    for (lineno = 0; lineno < lines.length; lineno++)
        parse();

    pass = 2;
    for (lineno = 0; lineno < lines.length; lineno++)
        parse();

    fileWrite(outfile);
}

That also means we need one more global, to keep track of which pass we are on. It doesn't really matter what type of integer it is, so I made it an int: static int pass;.
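To make the two-pass idea concrete, here is a minimal sketch. It uses a made-up toy input format rather than real Z80 assembly, and the resolveJumps function and its labels table are hypothetical names, not code from a80:

```d
import std.algorithm.searching : endsWith, startsWith;
import std.stdio;

// Toy input format, assumed for this example only:
//   "name:"    defines a label at the current address
//   "jp name"  is a 3-byte instruction that refers to a label
ushort[] resolveJumps(string[] lines)
{
    ushort[string] labels;  // label name -> address, filled in on pass 1
    ushort[] targets;       // addresses looked up on pass 2

    foreach (pass; 1 .. 3) {    // pass takes the values 1 and 2
        ushort addr = 0;        // every pass starts again at address 0
        foreach (line; lines) {
            if (line.endsWith(":")) {
                if (pass == 1)
                    labels[line[0 .. $ - 1]] = addr;
            } else if (line.startsWith("jp ")) {
                if (pass == 2)
                    targets ~= labels[line[3 .. $]];
                addr += 3;
            }
        }
    }

    return targets;
}

void main()
{
    // Both jumps refer to "end" before it is defined.
    writeln(resolveJumps(["jp end", "jp end", "end:"]));  // prints [6, 6]
}
```

Notice that both passes must advance the address in exactly the same way. Otherwise the addresses recorded on the first pass would not match where the code lands on the second.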

Error handling

We want to make sure that we display at least a basic diagnostic when there is an error in the assembly code. We can take the liberty of simply quitting the assembler without writing a binary file if we encounter an error too. But providing a diagnostic so that the programmer can fix the mistake would be ideal. It can be a simple function:

/**
 * Nice error messages.
 */
static void err(string msg)
{
    stderr.writeln("a80: " ~ to!string(lineno + 1) ~ ": " ~ msg);
    enforce(0);
}

From this point on, if we encounter something that should trigger an error, we can write err("diagnostic"); and what will really be printed is the line number and the diagnostic. We are using the string concatenation operator we learned about in the previous blog post, as well as a new to!string syntax from the std.conv module from the D standard library. That will take the number that is lineno + 1 (we add 1 since arrays in D begin at 0 but humans read assembly code beginning at line 1) and turn it into a string. That allows it to be concatenated with the other strings in our writeln. Although the to! syntax might look a little funny, it actually holds some quite excellent power. See, string is not the only thing we can convert with to!. We can convert to numeric types as well. We may or may not get to do that in future, but it is a good thing to remember.
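Here is a quick sketch of to! going in both directions. The prefix helper is a made-up name that only mirrors the string-building part of err:

```d
import std.conv : to;
import std.stdio;

// Hypothetical helper: the prefix err would print for a zero-based line index.
string prefix(size_t lineno)
{
    return "a80: " ~ to!string(lineno + 1) ~ ": ";
}

void main()
{
    writeln(prefix(0) ~ "oops");  // prints "a80: 1: oops"

    int n = to!int("42");         // string to number
    writeln(n + 1);               // prints 43
}
```

If the text is not a valid number, such as to!int("abc"), to! throws a ConvException rather than returning a bogus value.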

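Finally, we use enforce(0); to quit the program. It comes from std.exception and checks whether the value in its argument is true; if it is not, it throws an exception. Since 0 evaluates to false, the check always fails, and because nothing in our program catches the exception, the assembler terminates without ever reaching fileWrite.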
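If you want to watch enforce fail without the program dying, you can catch the exception. This is just a sketch for experimenting, and the failsCheck name is made up; our assembler deliberately does not catch anything:

```d
import std.exception : enforce;
import std.stdio;

// Hypothetical helper: report whether enforce rejects a value.
bool failsCheck(int value)
{
    try {
        enforce(value);  // throws an Exception when value is 0
        return false;
    } catch (Exception e) {
        return true;
    }
}

void main()
{
    writeln(failsCheck(0));  // prints true
    writeln(failsCheck(1));  // prints false
}
```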

What does our code look like now?

Here is all the code we have written up to now:

import std.stdio;
import std.file;
import std.algorithm;
import std.string;
import std.conv;
import std.exception;

/**
 * Line number.
 */
static size_t lineno;

/**
 * Pass.
 */
static int pass;

/**
 * Output stored in memory until we're finished.
 */
static ubyte[] output;

/**
 * Address for labels.
 */
static ushort addr;

/**
 * Top-level assembly function.
 * Everything cascades downward from here.
 */
static void assemble(string[] lines, string outfile)
{
    pass = 1;
    for (lineno = 0; lineno < lines.length; lineno++)
        parse();

    pass = 2;
    for (lineno = 0; lineno < lines.length; lineno++)
        parse();

    fileWrite(outfile);
}

/**
 * After all code is emitted, write it out to a file.
 */
static void fileWrite(string outfile) {
    import std.file : write;

    write(outfile, output);
}

/**
 * Nice error messages.
 */
static void err(string msg)
{
    stderr.writeln("a80: " ~ to!string(lineno + 1) ~ ": " ~ msg);
    enforce(0);
}

/**
 * All good things start with a single function.
 */
void main(string[] args)
{
    /**
     * Make sure the user provides only one input file.
     */
    if (args.length != 2) {
        stderr.writeln("usage: a80 file.asm");
        return;
    }

    /**
     * Create an array of lines from the input file.
     */
    string[] lines = splitLines(cast(string)read(args[1]));

    /**
     * Name output file the same as the input but with .com ending.
     */
    auto split = args[1].findSplit(".asm");
    auto outfile = split[0] ~ ".com";

    /**
     * Do the work.
     */
    assemble(lines, outfile);
}

On the next episode

Next, we will learn how to parse a line into tokens, the constituent parts of a line. This step is crucial to begin to make decisions as to what object code to output for each line of assembly code.
