Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life

2021-04-13
Demystifying programs that create programs, part 7: Further opcode processing

All source code for this blog post can be found here.

An assembler that can process over half of the Intel 8080 opcodes is pretty impressive. But we still need to teach the assembler about the remaining opcodes. Today, we will teach our assembler the fourth quarter of the 8080 opcode table, as that will allow us to write interesting programs from that point forward.

Instructions in the fourth quarter of the opcode table

The fourth quarter of the opcode table contains the following categories of instructions: the majority are jump, call, and return instructions; next come the push, pop, and exchange instructions; then arithmetic on immediates; and finally a few hardware control instructions.

Arithmetic on immediates

We still cannot do anything very meaningful with our assembler as it is since all of our registers are likely to be 0 and we have no way of putting that first non-zero number into any register. That changes now. Let's code up the arithmetic on immediates instructions. There are eight: one for each arithmetic operation we coded in the previous entry. Like the non-immediate versions, the a register is implied as the first operand and the location where the result will be placed.

The eight instructions are: adi, aci, sui, sbi, ani, xri, ori, and cpi. Each of these instructions takes a single argument, as before. The new challenge for us to tackle is that this time the argument is an immediate rather than a register.

Processing 8-bit immediates

/**
 * Get an 8-bit immediate.
 */
private void imm()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        /* A trailing 'h' marks a hexadecimal number.  */
        if (a1[a1.length - 1] == 'h')
            num = to!ubyte(chop(a1), 16);
        else
            num = to!ubyte(a1, 10);
    } else {
        if (pass == 2) {
            for (size_t i = 0; i < stab.length; i++) {
                if (a1 == stab[i].lab) {
                    num = stab[i].value;
                    found = true;
                    break;
                }
            }

            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2)
        output ~= cast(ubyte)num;
}

This will not be the final version of this function, but it works for right now. Eventually, we will need to extend this to support 16-bit immediates but that is a task for tomorrow.

We will support the user inputting either decimal or hexadecimal numbers. The original CP/M assembler also permitted binary and octal numbers. We can add those later.

The difference between decimal and hexadecimal numbers in our assembler is that hexadecimal numbers end with the letter h and decimal numbers do not. In our imm function, we check to see if the last character of the a1 string is h and if it is we chop off the h and convert the string to a hexadecimal number using the to! syntax we learned about in part 3 of the series. As the isDigit function is in std.ascii, we must add that to our import list as well.
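To see the number parsing in isolation, here is a minimal sketch. The helper name parseImm8 is hypothetical and not part of the assembler; it only mirrors the check described above. Note that because imm decides between number and label by looking at the first character, a hexadecimal number that starts with a letter needs a leading 0 (0ffh rather than ffh).

```d
import std.conv : to;
import std.stdio : writeln;
import std.string : chop;

/// Hypothetical helper (not in the assembler): parse an 8-bit number
/// written either in decimal or in hexadecimal with a trailing 'h'.
ubyte parseImm8(string text)
{
    if (text[$ - 1] == 'h')
        return to!ubyte(chop(text), 16);   // drop the 'h', parse as base 16
    return to!ubyte(text, 10);             // otherwise parse as base 10
}

void main()
{
    writeln(parseImm8("10"));    // 10
    writeln(parseImm8("10h"));   // 16
    writeln(parseImm8("0ffh"));  // 255
}
```

A value that does not fit in 8 bits, such as 100h, makes to!ubyte throw a conversion exception rather than silently truncating.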

But what makes our imm function really powerful is that we can instead use a label to substitute its value. If the first character of a1 is not a number then it must be a label. During the second pass, we check to see if the label was declared at some point during the first pass and if it was not, we issue an error. Whether we have a number or a label, we output the 8-bit number during the second pass.
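The label lookup is a plain linear search through the symbol table. As a rough standalone sketch, assuming a table shaped like the assembler's symtab (the names Symbol and lookup are hypothetical, chosen for illustration):

```d
import std.stdio : writeln;

/// Same shape as the assembler's symbol table entry.
struct Symbol
{
    string lab;     /// Symbol name
    ushort value;   /// Symbol value
}

/// Hypothetical helper: linear search for a label.
/// Returns true and fills in value when the label exists.
bool lookup(const Symbol[] table, string name, out ushort value)
{
    foreach (sym; table) {
        if (sym.lab == name) {
            value = sym.value;
            return true;
        }
    }
    return false;
}

void main()
{
    Symbol[] table = [Symbol("start", 0x0000), Symbol("loop", 0x0007)];
    ushort v;

    writeln(lookup(table, "loop", v), " ", v);  // true 7
    writeln(lookup(table, "nowhere", v));       // false
}
```

Keep in mind that imm emits only the low byte of whatever it finds: a label whose value is 0107h would be written out as 07h.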

Writing the arithmetic on immediates functions

Now we are ready to write the functions. They all have the same attributes: a single argument, two bytes in size, and must end with a call to imm. Using the opcode table, see if you can write the functions. My versions are below:

/**
 * adi (0xc6)
 */
private void adi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xc6);
    imm();
}

/**
 * aci (0xce)
 */
private void aci()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xce);
    imm();
}

/**
 * sui (0xd6)
 */
private void sui()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd6);
    imm();
}

/**
 * sbi (0xde)
 */
private void sbi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xde);
    imm();
}

/**
 * ani (0xe6)
 */
private void ani()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xe6);
    imm();
}

/**
 * xri (0xee)
 */
private void xri()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xee);
    imm();
}

/**
 * ori (0xf6)
 */
private void ori()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xf6);
    imm();
}

/**
 * cpi (0xfe)
 */
private void cpi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xfe);
    imm();
}

Did you write the same functions I did?
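If you look at the eight opcodes side by side, you may notice they are evenly spaced: each one is 8 higher than the one before it, starting at 0xc6. The sketch below just prints that pattern; immOps and immOpcode are hypothetical names used for illustration, and the assembler itself keeps one function per mnemonic.

```d
import std.stdio : writefln;

/// The eight immediate arithmetic mnemonics, in opcode-table order.
immutable string[8] immOps =
    ["adi", "aci", "sui", "sbi", "ani", "xri", "ori", "cpi"];

/// Opcode of the n-th immediate arithmetic instruction (n from 0 to 7).
ubyte immOpcode(int n)
{
    return cast(ubyte)(0xc6 + (n << 3));
}

void main()
{
    foreach (i, name; immOps)
        writefln("%s = 0x%02x", name, immOpcode(cast(int)i));
}
```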

Now you can hook these up to the mnemonic list in the process function. At this point, we can put any arbitrary 8-bit number into a using adi, mov that number into any other register, and perform any arithmetic between registers. This means that, as of now, any register can hold any arbitrary value.

Push, pop, and exchange

Next let's code up the functions for push, pop, and the exchanges. These are all relatively straightforward: each is one byte in size, the exchanges take no arguments and the pushes and pops take one argument that must be a register. However, the trick with the pushes and pops is that the register they accept is a 16-bit register as opposed to the 8-bit registers we saw with the mov and arithmetic operators last time.

For the exchanges, we have: xthl which swaps hl with the top of the stack, pchl which replaces the program counter with hl, xchg which swaps de with hl, and sphl which replaces sp with hl. They look like this:

/**
 * xthl (0xe3)
 */
private void xthl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe3);
}

/**
 * pchl (0xe9)
 */
private void pchl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe9);
}

/**
 * xchg (0xeb)
 */
private void xchg()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xeb);
}

/**
 * sphl (0xf9)
 */
private void sphl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf9);
}

Now for push and pop:

/**
 * push (0xc5 + 16-bit register offset)
 */
private void push()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc5 + regMod16());
}

/**
 * pop (0xc1 + 16-bit register offset)
 */
private void pop()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc1 + regMod16());
}

As you can see, these take a 16-bit register offset. Let's code that up:

/**
 * Return the 16-bit register offset.
 */
private int regMod16()
{
    if (a1 == "b") {
        return 0x00;
    } else if (a1 == "d") {
        return 0x10;
    } else if (a1 == "h") {
        return 0x20;
    } else if (a1 == "psw") {
        return 0x30;
    } else {
        err("invalid register for " ~ op);
    }

    /* This will never be reached, but quiets gdc.  */
    return 0;
}

This is very similar to the regMod8 function, just with fewer options to check.
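To double-check the arithmetic, here is a small sketch that prints the resulting opcode for every valid push and pop. The function pairOffset is a hypothetical stand-in for regMod16 that takes the register name as a parameter and throws instead of calling err.

```d
import std.stdio : writefln;

/// Hypothetical stand-in for regMod16: offset for a 16-bit register name.
int pairOffset(string reg)
{
    switch (reg) {
    case "b":   return 0x00;
    case "d":   return 0x10;
    case "h":   return 0x20;
    case "psw": return 0x30;
    default:    throw new Exception("invalid register: " ~ reg);
    }
}

void main()
{
    foreach (reg; ["b", "d", "h", "psw"]) {
        writefln("push %-3s = 0x%02x   pop %-3s = 0x%02x",
                 reg, 0xc5 + pairOffset(reg),
                 reg, 0xc1 + pairOffset(reg));
    }
}
```

So push covers 0xc5, 0xd5, 0xe5, and 0xf5, while pop covers 0xc1, 0xd1, 0xe1, and 0xf1.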

Let's hook all of these functions up to the mnemonic list in the process function.

Control instructions

There are four hardware control instructions: out to put data out to peripherals, in to get data from peripherals, di to disable interrupts, and ei to enable interrupts. out and in take an 8-bit immediate and are both two bytes in size because of it. di and ei take no arguments and are one byte in size.

Try to code them up before looking at my functions:

/**
 * out (0xd3)
 */
private void i80_out()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd3);
    imm();
}

/**
 * in (0xdb)
 */
private void i80_in()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xdb);
    imm();
}

/**
 * di (0xf3)
 */
private void di()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf3);
}

/**
 * ei (0xfb)
 */
private void ei()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xfb);
}

You'll notice I named the out and in functions i80_out and i80_in. This is because in and out are already keywords in D and therefore we cannot use them as function names.

Let's hook these up to the mnemonic list in the process function as well.

Jumps, calls, and returns

Now we can get to our jumps, calls and returns. All returns take no arguments and are one byte in size. All jumps and calls take one 16-bit address as an argument and are therefore three bytes in size. Let's code up a function to process a 16-bit address first. It is a little different from an immediate:

/**
 * Get a 16-bit address.
 */
private void a16()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        num = numcheck(a1);
    } else {
        for (size_t i = 0; i < stab.length; i++) {
            if (a1 == stab[i].lab) {
                num = stab[i].value;
                found = true;
                break;
            }
        }

        if (pass == 2) {
            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2) {
        output ~= cast(ubyte)(num & 0xff);
        output ~= cast(ubyte)((num >> 8) & 0xff);
    }
}

A lot of the logic is the same as the imm function, except the end where we output a 16-bit number in little endian order rather than an 8-bit number. Also it turns out that the logic for checking if a number is decimal or hexadecimal is exactly the same as in the imm function, so I made that logic its own separate function so that I can reuse it:
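Little endian means the low byte of the address is written first and the high byte second. A minimal sketch of that split, using a hypothetical helper named le16 that is not part of the assembler:

```d
import std.stdio : writefln;

/// Hypothetical helper: split a 16-bit value into two bytes,
/// low byte first, the same order a16 uses.
ubyte[2] le16(ushort value)
{
    ubyte[2] bytes;
    bytes[0] = cast(ubyte)(value & 0xff);         // low byte
    bytes[1] = cast(ubyte)((value >> 8) & 0xff);  // high byte
    return bytes;
}

void main()
{
    /* jmp 0107h: opcode 0xc3, then the address in little endian order.  */
    auto operand = le16(0x0107);
    writefln("%02x %02x %02x", 0xc3, operand[0], operand[1]);  // c3 07 01
}
```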

/**
 * Check if a number is decimal or hex.
 */
private ushort numcheck(string input)
{
    ushort num;

    if (input[input.length - 1] == 'h')
        num = to!ushort(chop(input), 16);
    else
        num = to!ushort(input, 10);

    return num;
}

Make sure to replace the decimal-or-hexadecimal check in the imm function with a call to numcheck as well. If you forget, I will have the complete assembler code at the end of this post as usual.

Now let's code up all our jumps, calls, and returns:

/**
 * rnz (0xc0)
 */
private void rnz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc0);
}

/**
 * jnz (0xc2)
 */
private void jnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc2);
    a16();
}

/**
 * jmp (0xc3)
 */
private void jmp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc3);
    a16();
}

/**
 * cnz (0xc4)
 */
private void cnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc4);
    a16();
}

/**
 * rz (0xc8)
 */
private void rz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc8);
}

/**
 * ret (0xc9)
 */
private void ret()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc9);
}

/**
 * jz (0xca)
 */
private void jz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xca);
    a16();
}

/**
 * cz (0xcc)
 */
private void cz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcc);
    a16();
}

/**
 * call (0xcd)
 */
private void call()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcd);
    a16();
}

/**
 * rnc (0xd0)
 */
private void rnc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd0);
}

/**
 * jnc (0xd2)
 */
private void jnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd2);
    a16();
}

/**
 * cnc (0xd4)
 */
private void cnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd4);
    a16();
}

/**
 * rc (0xd8)
 */
private void rc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd8);
}

/**
 * jc (0xda)
 */
private void jc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xda);
    a16();
}

/**
 * cc (0xdc)
 */
private void cc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xdc);
    a16();
}

/**
 * rpo (0xe0)
 */
private void rpo()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe0);
}

/**
 * jpo (0xe2)
 */
private void jpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe2);
    a16();
}

/**
 * cpo (0xe4)
 */
private void cpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe4);
    a16();
}

/**
 * rpe (0xe8)
 */
private void rpe()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe8);
}

/**
 * jpe (0xea)
 */
private void jpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xea);
    a16();
}

/**
 * cpe (0xec)
 */
private void cpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xec);
    a16();
}

/**
 * rp (0xf0)
 */
private void rp()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf0);
}

/**
 * jp (0xf2)
 */
private void jp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf2);
    a16();
}

/**
 * cp (0xf4)
 */
private void cp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf4);
    a16();
}

/**
 * rm (0xf8)
 */
private void rm()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf8);
}

/**
 * jm (0xfa)
 */
private void jm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfa);
    a16();
}

/**
 * cm (0xfc)
 */
private void cm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfc);
    a16();
}

That definitely took some time. We're almost done for today.
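If all that typing felt repetitive, there is a reason: the conditional instructions follow a regular pattern. Each condition (nz, z, nc, c, po, pe, p, m) adds another 8 to a base opcode: 0xc0 for returns, 0xc2 for jumps, and 0xc4 for calls. The sketch below prints the whole grid. The names conds and condOpcode are hypothetical and only illustrate the pattern; the unconditional ret (0xc9), jmp (0xc3), and call (0xcd) do not follow it.

```d
import std.stdio : writefln;

/// Condition suffixes, in opcode-table order.
immutable string[8] conds = ["nz", "z", "nc", "c", "po", "pe", "p", "m"];

/// Opcode for a conditional instruction: base plus (condition index * 8).
ubyte condOpcode(int base, int cond)
{
    return cast(ubyte)(base + (cond << 3));
}

void main()
{
    foreach (i, name; conds) {
        immutable c = cast(int)i;
        writefln("r%s = 0x%02x   j%s = 0x%02x   c%s = 0x%02x",
                 name, condOpcode(0xc0, c),
                 name, condOpcode(0xc2, c),
                 name, condOpcode(0xc4, c));
    }
}
```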

Reset vectors

Lastly are the reset vectors. There are eight of them, numbered 0-7. They were designed to work with peripherals, but what they do is push the program counter onto the stack and then jump to address (8 * reset vector number). That means it should be just one more function to code:

/**
 * rst (0xc7 + offset)
 */
private void rst()
{
    argcheck(!a1.empty && a2.empty);
    auto offset = to!int(a1, 10);
    if (offset >= 0 && offset <= 7)
        passAct(1, 0xc7 + (offset << 3));
    else
        err("invalid reset vector: " ~ to!string(offset));
}

We check to make sure that the argument, which must be a number, is between 0 and 7, inclusive, and produce an error if it is not. The output byte is 0xc7 + (reset vector number * 8), but as we learned before we can make it a little faster by turning the multiplication by eight into a left shift by three.
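Here is a quick sketch that lists all eight reset vectors with their opcode and the address each one jumps to. The helpers rstOpcode and rstTarget are hypothetical names for illustration only.

```d
import std.exception : enforce;
import std.stdio : writefln;

/// Hypothetical helper: opcode for reset vector n (0 to 7).
ubyte rstOpcode(int n)
{
    enforce(n >= 0 && n <= 7, "invalid reset vector");
    return cast(ubyte)(0xc7 + (n << 3));
}

/// Hypothetical helper: address that reset vector n jumps to.
ushort rstTarget(int n)
{
    enforce(n >= 0 && n <= 7, "invalid reset vector");
    return cast(ushort)(n * 8);
}

void main()
{
    foreach (n; 0 .. 8)
        writefln("rst %d = 0x%02x -> jumps to 0x%04x",
                 n, rstOpcode(n), rstTarget(n));
}
```

The opcodes run from 0xc7 for rst 0 up to 0xff for rst 7, and the targets from 0x0000 up to 0x0038.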

For our final step today, let's hook up any functions we haven't hooked up yet to the mnemonic list in the process function.

A growing assembler

Here is the full code for our assembler after today:

import std.stdio;
import std.file;
import std.algorithm;
import std.string;
import std.conv;
import std.exception;
import std.ascii;

/**
 * Line number.
 */
private size_t lineno;

/**
 * Pass.
 */
private int pass;

/**
 * Output stored in memory until we're finished.
 */
private ubyte[] output;

/**
 * Address for labels.
 */
private ushort addr;

/**
 * Intel 8080 assembler instruction.
 */
private string lab;      /// Label
private string op;       /// Instruction mnemonic
private string a1;       /// First argument
private string a2;       /// Second argument
private string comm;     /// Comment

/**
 * Individual symbol table entry.
 */
struct symtab
{
    string lab;         /// Symbol name
    ushort value;       /// Symbol value
};

/**
 * Symbol table is an array of entries.
 */
private symtab[] stab;

/**
 * Top-level assembly function.
 * Everything cascades downward from here.
 * Repeat the parsing twice.
 * Pass 1 gathers symbols and their addresses/values.
 * Pass 2 emits code.
 */
private void assemble(string[] lines, string outfile)
{
    pass = 1;
    for (lineno = 0; lineno < lines.length; lineno++) {
        parse(lines[lineno]);
        process();
    }

    pass = 2;
    for (lineno = 0; lineno < lines.length; lineno++) {
        parse(lines[lineno]);
        process();
    }

    fileWrite(outfile);
}

/**
 * After all code is emitted, write it out to a file.
 */
private void fileWrite(string outfile) {
    import std.file : write;

    write(outfile, output);
}

/**
 * Parse each line into (up to) five tokens.
 */
private void parse(string line) {
    /* Reset all our variables.  */
    lab = null;
    op = null;
    a1 = null;
    a2 = null;
    comm = null;

    /* Remove any whitespace at the beginning of the line.  */
    auto preprocess = stripLeft(line);

    /* Split comment from the rest of the line.  */
    auto splitcomm = preprocess.findSplit(";");
    if (!splitcomm[2].empty)
        comm = strip(splitcomm[2]);

    /* Split second argument from the remainder.  */
    auto splita2 = splitcomm[0].findSplit(",");
    if (!splita2[2].empty)
        a2 = strip(splita2[2]);

    /* Split first argument from the remainder.  */
    auto splita1 = splita2[0].findSplit("\t");
    if (!splita1[2].empty) {
        a1 = strip(splita1[2]);
    } else {
        splita1 = splita2[0].findSplit(" ");
        if (!splita1[2].empty) {
            a1 = strip(splita1[2]);
        }
    }

    /* Split op from label.  */
    auto splitop = splita1[0].findSplit(":");
    if (!splitop[1].empty) {
        op = strip(splitop[2]);
        lab = strip(splitop[0]);
    } else {
        op = strip(splitop[0]);
    }

    /**
     * Fixup for the label: op case.
     */
    auto opFix = a1.findSplit("\t");
    if (!opFix[1].empty) {
        op = strip(opFix[0]);
        a1 = strip(opFix[2]);
    } else {
        opFix = a1.findSplit(" ");
        if (!opFix[1].empty) {
            op = strip(opFix[0]);
            a1 = strip(opFix[2]);
        } else {
            if (op.empty && !a1.empty && a2.empty) {
                op = a1;
                a1 = null;
            }
        }
    }
}

/**
 * Figure out which op we have.
 */
private void process()
{
    /**
     * Special case for if you put a label by itself on a line.
     * Or have a totally blank line.
     */
    if (op.empty && a1.empty && a2.empty) {
        passAct(0, -1);
        return;
    }

    /**
     * List of all valid mnemonics.
     */
    if (op == "nop")
        nop();
    else if (op == "mov")
        mov();
    else if (op == "hlt")
        hlt();
    else if (op == "add")
        add();
    else if (op == "adc")
        adc();
    else if (op == "sub")
        sub();
    else if (op == "sbb")
        sbb();
    else if (op == "ana")
        ana();
    else if (op == "xra")
        xra();
    else if (op == "ora")
        ora();
    else if (op == "cmp")
        cmp();
    else if (op == "rnz")
        rnz();
    else if (op == "pop")
        pop();
    else if (op == "jnz")
        jnz();
    else if (op == "jmp")
        jmp();
    else if (op == "cnz")
        cnz();
    else if (op == "push")
        push();
    else if (op == "adi")
        adi();
    else if (op == "rst")
        rst();
    else if (op == "rz")
        rz();
    else if (op == "ret")
        ret();
    else if (op == "jz")
        jz();
    else if (op == "cz")
        cz();
    else if (op == "call")
        call();
    else if (op == "aci")
        aci();
    else if (op == "rnc")
        rnc();
    else if (op == "jnc")
        jnc();
    else if (op == "out")
        i80_out();
    else if (op == "cnc")
        cnc();
    else if (op == "sui")
        sui();
    else if (op == "rc")
        rc();
    else if (op == "jc")
        jc();
    else if (op == "in")
        i80_in();
    else if (op == "cc")
        cc();
    else if (op == "sbi")
        sbi();
    else if (op == "rpo")
        rpo();
    else if (op == "jpo")
        jpo();
    else if (op == "xthl")
        xthl();
    else if (op == "cpo")
        cpo();
    else if (op == "ani")
        ani();
    else if (op == "rpe")
        rpe();
    else if (op == "pchl")
        pchl();
    else if (op == "jpe")
        jpe();
    else if (op == "xchg")
        xchg();
    else if (op == "cpe")
        cpe();
    else if (op == "xri")
        xri();
    else if (op == "rp")
        rp();
    else if (op == "jp")
        jp();
    else if (op == "di")
        di();
    else if (op == "cp")
        cp();
    else if (op == "ori")
        ori();
    else if (op == "rm")
        rm();
    else if (op == "sphl")
        sphl();
    else if (op == "jm")
        jm();
    else if (op == "ei")
        ei();
    else if (op == "cm")
        cm();
    else if (op == "cpi")
        cpi();
    else
        err("unknown mnemonic: " ~ op);
}

/**
 * Take action depending on which pass this is.
 */
private void passAct(ushort size, int outbyte)
{
    if (pass == 1) {
        /* Add new symbol if we have a label.  */
        if (!lab.empty)
            addsym();

        /* Increment address counter by size of instruction.  */
        addr += size;
    } else {
        /**
         * Output the byte representing the opcode.
         * If the opcode carries additional information
         *   (e.g., immediate or address), we will output that
         *   in a separate helper function.
         */
        if (outbyte >= 0)
            output ~= cast(ubyte)outbyte;
    }
}

/**
 * Add a symbol to the symbol table.
 */
private void addsym()
{
    for (size_t i = 0; i < stab.length; i++) {
        if (lab == stab[i].lab)
            err("duplicate label: " ~ lab);
    }

    symtab newsym = { lab, addr };
    stab ~= newsym;
}

/**
 * nop (0x00)
 */
private void nop()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0x00);
}

/**
 * mov (0x40 + (8-bit register offset << 3) + 8-bit register offset)
 * We allow mov m, m (0x76)
 * But that will result in HLT.
 */
private void mov()
{
    argcheck(!a1.empty && !a2.empty);
    passAct(1, 0x40 + (regMod8(a1) << 3) + regMod8(a2));
}

/**
 * hlt (0x76)
 */
private void hlt()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0x76);
}

/**
 * add (0x80 + 8-bit register offset)
 */
private void add()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x80 + regMod8(a1));
}

/**
 * adc (0x88 + 8-bit register offset)
 */
private void adc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x88 + regMod8(a1));
}

/**
 * sub (0x90 + 8-bit register offset)
 */
private void sub()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x90 + regMod8(a1));
}

/**
 * sbb (0x98 + 8-bit register offset)
 */
private void sbb()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x98 + regMod8(a1));
}

/**
 * ana (0xa0 + 8-bit register offset)
 */
private void ana()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xa0 + regMod8(a1));
}

/**
 * xra (0xa8 + 8-bit register offset)
 */
private void xra()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xa8 + regMod8(a1));
}

/**
 * ora (0xb0 + 8-bit register offset)
 */
private void ora()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xb0 + regMod8(a1));
}

/**
 * cmp (0xb8 + 8-bit register offset)
 */
private void cmp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xb8 + regMod8(a1));
}

/**
 * rnz (0xc0)
 */
private void rnz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc0);
}

/**
 * pop (0xc1 + 16-bit register offset)
 */
private void pop()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc1 + regMod16());
}

/**
 * jnz (0xc2)
 */
private void jnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc2);
    a16();
}

/**
 * jmp (0xc3)
 */
private void jmp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc3);
    a16();
}

/**
 * cnz (0xc4)
 */
private void cnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc4);
    a16();
}

/**
 * push (0xc5 + 16-bit register offset)
 */
private void push()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc5 + regMod16());
}

/**
 * adi (0xc6)
 */
private void adi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xc6);
    imm();
}

/**
 * rst (0xc7 + offset)
 */
private void rst()
{
    argcheck(!a1.empty && a2.empty);
    auto offset = to!int(a1, 10);
    if (offset >= 0 && offset <= 7)
        passAct(1, 0xc7 + (offset << 3));
    else
        err("invalid reset vector: " ~ to!string(offset));
}

/**
 * rz (0xc8)
 */
private void rz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc8);
}

/**
 * ret (0xc9)
 */
private void ret()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc9);
}

/**
 * jz (0xca)
 */
private void jz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xca);
    a16();
}

/**
 * cz (0xcc)
 */
private void cz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcc);
    a16();
}

/**
 * call (0xcd)
 */
private void call()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcd);
    a16();
}

/**
 * aci (0xce)
 */
private void aci()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xce);
    imm();
}

/**
 * rnc (0xd0)
 */
private void rnc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd0);
}

/**
 * jnc (0xd2)
 */
private void jnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd2);
    a16();
}

/**
 * out (0xd3)
 */
private void i80_out()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd3);
    imm();
}

/**
 * cnc (0xd4)
 */
private void cnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd4);
    a16();
}

/**
 * sui (0xd6)
 */
private void sui()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd6);
    imm();
}

/**
 * rc (0xd8)
 */
private void rc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd8);
}

/**
 * jc (0xda)
 */
private void jc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xda);
    a16();
}

/**
 * in (0xdb)
 */
private void i80_in()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xdb);
    imm();
}

/**
 * cc (0xdc)
 */
private void cc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xdc);
    a16();
}

/**
 * sbi (0xde)
 */
private void sbi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xde);
    imm();
}

/**
 * rpo (0xe0)
 */
private void rpo()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe0);
}

/**
 * jpo (0xe2)
 */
private void jpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe2);
    a16();
}

/**
 * xthl (0xe3)
 */
private void xthl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe3);
}

/**
 * cpo (0xe4)
 */
private void cpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe4);
    a16();
}

/**
 * ani (0xe6)
 */
private void ani()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xe6);
    imm();
}

/**
 * rpe (0xe8)
 */
private void rpe()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe8);
}

/**
 * pchl (0xe9)
 */
private void pchl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe9);
}

/**
 * jpe (0xea)
 */
private void jpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xea);
    a16();
}

/**
 * xchg (0xeb)
 */
private void xchg()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xeb);
}

/**
 * cpe (0xec)
 */
private void cpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xec);
    a16();
}

/**
 * xri (0xee)
 */
private void xri()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xee);
    imm();
}

/**
 * rp (0xf0)
 */
private void rp()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf0);
}

/**
 * jp (0xf2)
 */
private void jp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf2);
    a16();
}

/**
 * di (0xf3)
 */
private void di()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf3);
}

/**
 * cp (0xf4)
 */
private void cp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf4);
    a16();
}

/**
 * ori (0xf6)
 */
private void ori()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xf6);
    imm();
}

/**
 * rm (0xf8)
 */
private void rm()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf8);
}

/**
 * sphl (0xf9)
 */
private void sphl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf9);
}

/**
 * jm (0xfa)
 */
private void jm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfa);
    a16();
}

/**
 * ei (0xfb)
 */
private void ei()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xfb);
}

/**
 * cm (0xfc)
 */
private void cm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfc);
    a16();
}

/**
 * cpi (0xfe)
 */
private void cpi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xfe);
    imm();
}

/**
 * Get an 8-bit immediate.
 */
private void imm()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        num = numcheck(a1);
    } else {
        if (pass == 2) {
            for (size_t i = 0; i < stab.length; i++) {
                if (a1 == stab[i].lab) {
                    num = stab[i].value;
                    found = true;
                    break;
                }
            }

            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2)
        output ~= cast(ubyte)num;
}

/**
 * Get a 16-bit address.
 */
private void a16()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        num = numcheck(a1);
    } else {
        for (size_t i = 0; i < stab.length; i++) {
            if (a1 == stab[i].lab) {
                num = stab[i].value;
                found = true;
                break;
            }
        }

        if (pass == 2) {
            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2) {
        output ~= cast(ubyte)(num & 0xff);
        output ~= cast(ubyte)((num >> 8) & 0xff);
    }
}

/**
 * Return the 16-bit register offset.
 */
private int regMod16()
{
    if (a1 == "b") {
        return 0x00;
    } else if (a1 == "d") {
        return 0x10;
    } else if (a1 == "h") {
        return 0x20;
    } else if (a1 == "psw") {
        return 0x30;
    } else {
        err("invalid register for " ~ op);
    }

    /* This will never be reached, but quiets gdc.  */
    return 0;
}

/**
 * Return the 8-bit register offset.
 */
private int regMod8(string reg)
{
    if (reg == "b")
        return 0x00;
    else if (reg == "c")
        return 0x01;
    else if (reg == "d")
        return 0x02;
    else if (reg == "e")
        return 0x03;
    else if (reg == "h")
        return 0x04;
    else if (reg == "l")
        return 0x05;
    else if (reg == "m")
        return 0x06;
    else if (reg == "a")
        return 0x07;
    else
        err("invalid register " ~ reg);

    /* This will never be reached, but quiets gdc.  */
    return 0;
}

/**
 * Check arguments.
 */
private void argcheck(bool passed)
{
    if (passed == false)
        err("arguments not correct for mnemonic: " ~ op);
}

/**
 * Check if a number is decimal or hex.
 */
private ushort numcheck(string input)
{
    ushort num;

    if (input[input.length - 1] == 'h')
        num = to!ushort(chop(input), 16);
    else
        num = to!ushort(input, 10);

    return num;
}

/**
 * Nice error messages.
 */
private void err(string msg)
{
    stderr.writeln("a80: " ~ to!string(lineno + 1) ~ ": " ~ msg);
    enforce(0);
}

/**
 * All good things start with a single function.
 */
void main(string[] args)
{
    /**
     * Make sure the user provides only one input file.
     */
    if (args.length != 2) {
        stderr.writeln("usage: a80 file.asm");
        return;
    }

    /**
     * Create an array of lines from the input file.
     */
    string[] lines = splitLines(cast(string)read(args[1]));

    /**
     * Name output file the same as the input but with .com ending.
     */
    auto split = args[1].findSplit(".asm");
    auto outfile = split[0] ~ ".com";

    /**
     * Do the work.
     */
    assemble(lines, outfile);
}

Assembling our first working program

Do you remember the Fibonacci program from a few posts ago? Here it is again. We can now assemble it into a fully working executable. Save this as fib.asm and run it through your newly compiled assembler:

; Fibonacci in Intel 8080 assembler.
; Results in b
start:
	xra	a	; zero out a
	mov	b, a	; b = a
	mov	c, a	; c = a
	adi	01h	; a = a + 1
	mov	c, a	; c = a
	xra	a	; zero out a
loop:	add	c	; a = a + c
	cmp	c
	jc	start	; jump if carry
	mov	b, a
	mov	a, c
	mov	c, b
	jmp	loop	; jump to loop

You should then run the resulting executable through our disassembler and you can see that while the assembly uses labels for the jumps, the disassembly correctly translated them to their addresses. Unfortunately, if you have an emulator you will not be able to see any output just yet. I wrote my own Intel 8080 CPU emulator in C that displays the current value of all the registers so I can confirm that this program does in fact work correctly. We are only a couple of days away from being able to see output via CP/M.
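If you want something to compare against, here are the bytes I would expect in fib.com, worked out by hand from the encodings in this series. This assumes assembly starts at address 0, which is where our addr counter begins, so start is 0000h and loop is 0007h. The array name fib is just for this sketch.

```d
import std.stdio : writef, writeln;

/// Hand-assembled bytes for fib.asm, assuming the program starts at 0000h.
immutable ubyte[] fib = [
    0xaf,              // 0000  xra a
    0x47,              // 0001  mov b, a
    0x4f,              // 0002  mov c, a
    0xc6, 0x01,        // 0003  adi 01h
    0x4f,              // 0005  mov c, a
    0xaf,              // 0006  xra a
    0x81,              // 0007  add c      (loop)
    0xb9,              // 0008  cmp c
    0xda, 0x00, 0x00,  // 0009  jc start   (0000h, low byte first)
    0x47,              // 000c  mov b, a
    0x79,              // 000d  mov a, c
    0x48,              // 000e  mov c, b
    0xc3, 0x07, 0x00   // 000f  jmp loop   (0007h, low byte first)
];

void main()
{
    foreach (b; fib)
        writef("%02x ", b);
    writeln();
    writeln(fib.length, " bytes");
}
```

The file should be 18 bytes long. If a hex dump of your fib.com differs, the two jump operands are a good first place to look.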

Next time

We will teach the assembler about the first quarter of the opcode table. Then we will be finished encoding all the 8080 instructions.

Let's also take a moment to congratulate ourselves for breaking the 1000-line mark. Not all of it is code, of course, but vim reports that our source code file is 1040 lines long. I think that's worth celebrating.
