Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life




2021-04-13
Demystifying programs that create programs, part 7: Further opcode processing

All source code for this blog post can be found here.

An assembler that can process over half of the Intel 8080 opcodes is pretty impressive. But we still need to teach the assembler about the remaining opcodes. Today, we will teach our assembler the fourth quarter of the 8080 opcode table, since that will allow us to write interesting programs from then on.

Instructions in the fourth quarter of the opcode table

The fourth quarter of the opcode table contains the following categories of instructions: most are jump, call, and return instructions; then come the push, pop, and exchange instructions; then arithmetic on immediates; finally, there are a few hardware control instructions and the reset vectors.

Arithmetic on immediates

We still cannot do anything very meaningful with our assembler as it is, since all of our registers are likely to be 0 and we have no way of putting that first non-zero number into any register. That changes now. Let's code up the immediate arithmetic instructions. There are eight: one for each arithmetic operation we coded in the previous entry. As with the non-immediate versions, the a register is the implied first operand and the destination of the result.

The eight instructions are: adi, aci, sui, sbi, ani, xri, ori, and cpi. The new challenge for us to tackle is that each of these instructions takes a single argument, and this time that argument is an immediate: a value stored directly in the instruction itself.

Processing 8-bit immediates

/**
 * Get an 8-bit immediate.
 */
static void imm()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        if (a1[a1.length - 1] == 'h')
            num = to!ubyte(chop(a1), 16);
        else
            num = to!ubyte(a1, 10);
    } else {
        if (pass == 2) {
            for (size_t i = 0; i < stab.length; i++) {
                if (a1 == stab[i].lab) {
                    num = stab[i].value;
                    found = true;
                    break;
                }
            }

            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2)
        output ~= cast(ubyte)num;
}

This will not be the final version of this function, but it works for now. Eventually we will need to extend it to support 16-bit immediates, but that is a task for tomorrow.

We will let the user write either decimal or hexadecimal numbers. The original CP/M assembler also permitted binary and octal numbers; we can add those later.

The difference between decimal and hexadecimal numbers in our assembler is that hexadecimal numbers end with the letter h and decimal numbers do not. In our imm function, we check whether the last character of the a1 string is h. If it is, we chop off the h and convert the rest of the string from hexadecimal, using the to! syntax we learned about in part 3 of the series; otherwise, we convert it as a decimal number. As the isDigit function is in std.ascii, we must add that to our import list as well.

But what makes our imm function really powerful is that we can use a label in place of a number and substitute the label's value. If the first character of a1 is not a digit, then it must be a label. During the second pass, we check whether the label was declared at some point during the first pass, and if it was not, we issue an error. Whether we have a number or a label, we output the 8-bit number during the second pass.
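The number-or-label decision above can be sketched as a standalone D function. resolveImm8 and the symbols table are illustrative stand-ins, not part of a80: the associative array plays the role of stab, there is no pass logic, and rejecting values above 255 with an error is a choice made for this sketch. One consequence of testing only the first character with isDigit is that a hexadecimal number must start with a digit, so you write 0ffh rather than ffh; the latter would be looked up as a label.

```d
import std.ascii : isDigit;
import std.conv : to;
import std.exception : assertThrown, enforce;
import std.string : chop;

/// Hypothetical helper: resolve an 8-bit operand the way imm() decides it.
/// A leading digit means a number (trailing 'h' = hex); anything else is a label.
ubyte resolveImm8(string arg, const ushort[string] symbols)
{
    enforce(arg.length > 0, "missing operand");

    uint value;
    if (isDigit(arg[0])) {
        if (arg[$ - 1] == 'h')
            value = to!uint(chop(arg), 16);   // "0ffh" -> "0ff" -> 255
        else
            value = to!uint(arg, 10);
    } else {
        auto p = arg in symbols;              // label lookup, like the stab loop
        enforce(p !is null, "label " ~ arg ~ " not defined");
        value = *p;
    }

    // Sketch-specific choice: refuse values that do not fit in one byte.
    enforce(value <= 0xff, "immediate out of range: " ~ arg);
    return cast(ubyte) value;
}

void main()
{
    ushort[string] symbols;
    symbols["count"] = 10;

    assert(resolveImm8("01h", symbols) == 0x01);
    assert(resolveImm8("0ffh", symbols) == 0xff);
    assert(resolveImm8("42", symbols) == 42);
    assert(resolveImm8("count", symbols) == 10);

    // "ffh" starts with a letter, so it is treated as an (undefined) label.
    assertThrown(resolveImm8("ffh", symbols));
    // 300 does not fit in a byte.
    assertThrown(resolveImm8("300", symbols));
}
```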

Writing the arithmetic on immediates functions

Now we are ready to write the functions. They all share the same attributes: each takes a single argument, is two bytes in size, and must end with a call to imm. Using the opcode table, see if you can write the functions. My versions are below:

/**
 * adi (0xc6)
 */
static void adi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xc6);
    imm();
}

/**
 * aci (0xce)
 */
static void aci()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xce);
    imm();
}

/**
 * sui (0xd6)
 */
static void sui()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd6);
    imm();
}

/**
 * sbi (0xde)
 */
static void sbi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xde);
    imm();
}

/**
 * ani (0xe6)
 */
static void ani()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xe6);
    imm();
}

/**
 * xri (0xee)
 */
static void xri()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xee);
    imm();
}

/**
 * ori (0xf6)
 */
static void ori()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xf6);
    imm();
}

/**
 * cpi (0xfe)
 */
static void cpi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xfe);
    imm();
}

Did you write the same functions I did?
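If you compare the eight functions, the opcodes climb in steps of eight (0xc6, 0xce, 0xd6, and so on up to 0xfe), in the same order as add through cmp (0x80 through 0xb8) from last time. As an alternative sketch, not how a80 is written, the opcode could be computed from the mnemonic's position in a table; aluImm and aluImmOpcode are hypothetical names:

```d
import std.stdio : writefln;

/// The eight immediate ALU mnemonics, in opcode-table order.
immutable string[] aluImm = ["adi", "aci", "sui", "sbi", "ani", "xri", "ori", "cpi"];

/// Hypothetical helper: opcode = 0xc6 + (index * 8), or -1 if not found.
int aluImmOpcode(string mnemonic)
{
    foreach (i, m; aluImm)
        if (m == mnemonic)
            return 0xc6 + cast(int)(i << 3);
    return -1;
}

void main()
{
    // These match the hand-written functions above.
    assert(aluImmOpcode("adi") == 0xc6);
    assert(aluImmOpcode("xri") == 0xee);
    assert(aluImmOpcode("cpi") == 0xfe);
    assert(aluImmOpcode("mov") == -1);

    foreach (m; aluImm)
        writefln("%s -> 0x%02x", m, aluImmOpcode(m));
}
```

Writing each function out by hand, as the post does, keeps every instruction easy to find and check against the opcode table; the table version trades that for less repetition.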

Now you can hook these up to the mnemonic list in the process function. At this point, we can put any arbitrary 8-bit number into a (by zeroing a with xra a and then using adi), mov that number into any other register, and perform any arithmetic between registers. In other words, as of now any register can hold any arbitrary value.
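To make that concrete, here is a sketch of the bytes for loading 2ah into b with xra a, adi 2ah, mov b, a. The encoders are hypothetical stand-ins that reuse the formulas from the assembler (0xa8 + register for xra, 0x40 + (dst << 3) + src for mov) and the register numbering from regMod8:

```d
import std.stdio : writefln;

// Register numbers as in regMod8: b=0, c=1, d=2, e=3, h=4, l=5, m=6, a=7.
enum Reg : ubyte { b, c, d, e, h, l, m, a }

// Hypothetical encoders mirroring xra, adi and mov in the assembler.
ubyte xra(Reg r) { return cast(ubyte)(0xa8 + r); }
ubyte[] adi(ubyte imm) { return [cast(ubyte) 0xc6, imm]; }
ubyte mov(Reg dst, Reg src) { return cast(ubyte)(0x40 + (dst << 3) + src); }

void main()
{
    // xra a ; adi 2ah ; mov b, a  ->  b now holds 0x2a
    ubyte[] code = [xra(Reg.a)] ~ adi(0x2a) ~ [mov(Reg.b, Reg.a)];

    immutable ubyte[] expected = [0xaf, 0xc6, 0x2a, 0x47];
    assert(code == expected);

    foreach (b; code)
        writefln("%02x", b);
}
```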

Push, pop, and exchange

Next let's code up the functions for push, pop, and the exchanges. These are all relatively straightforward: each is one byte in size; the exchanges take no arguments, while push and pop take one argument that must be a register. However, the trick with push and pop is that the register they accept is a 16-bit register pair, as opposed to the 8-bit registers we saw with mov and the arithmetic instructions last time.

For the exchanges, we have: xthl, which swaps hl with the top of the stack; pchl, which loads the program counter with hl; xchg, which swaps de with hl; and sphl, which loads sp with hl. They look like this:

/**
 * xthl (0xe3)
 */
static void xthl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe3);
}

/**
 * pchl (0xe9)
 */
static void pchl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe9);
}

/**
 * xchg (0xeb)
 */
static void xchg()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xeb);
}

/**
 * sphl (0xf9)
 */
static void sphl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf9);
}

Now for push and pop:

/**
 * push (0xc5 + 16-bit register offset)
 */
static void push()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc5 + regMod16());
}

/**
 * pop (0xc1 + 16-bit register offset)
 */
static void pop()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc1 + regMod16());
}

As you can see, these add a 16-bit register offset to the base opcode. Let's code that up:

/**
 * Return the 16-bit register offset.
 */
static int regMod16()
{
    if (a1 == "b") {
        return 0x00;
    } else if (a1 == "d") {
        return 0x10;
    } else if (a1 == "h") {
        return 0x20;
    } else if (a1 == "psw") {
        return 0x30;
    } else {
        err("invalid register for " ~ op);
    }

    /* This will never be reached, but quiets gdc.  */
    return 0;
}

This is very similar to the regMod8 function, just with fewer options to check.
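The register pair lives in bits 4 and 5 of the opcode, which is why the offsets step by 0x10. Here is a sketch of the same mapping written as a D switch on strings; pairOffset, encodePush and encodePop are hypothetical names, and throwing an Exception stands in for err(). In push and pop the fourth slot is psw, whereas in some other 8080 instructions that same slot means sp.

```d
/// Same mapping as regMod16: register pair -> offset in bits 4-5.
int pairOffset(string pair)
{
    switch (pair) {
    case "b":   return 0x00;
    case "d":   return 0x10;
    case "h":   return 0x20;
    case "psw": return 0x30;
    default:    break;
    }
    throw new Exception("invalid register pair: " ~ pair);
}

/// Hypothetical encoders for the one-byte push and pop instructions.
ubyte encodePush(string pair) { return cast(ubyte)(0xc5 + pairOffset(pair)); }
ubyte encodePop(string pair)  { return cast(ubyte)(0xc1 + pairOffset(pair)); }

void main()
{
    assert(encodePush("b") == 0xc5);
    assert(encodePush("psw") == 0xf5);
    assert(encodePop("d") == 0xd1);
    assert(encodePop("h") == 0xe1);
}
```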

Let's hook all of these functions up to the mnemonic list in the process function.

Control instructions

There are four hardware control instructions: out to send data to a peripheral, in to read data from a peripheral, di to disable interrupts, and ei to enable interrupts. out and in take an 8-bit immediate (the port number), so both are two bytes in size. di and ei take no arguments and are one byte in size.

Try to code them up before looking at my functions:

/**
 * out (0xd3)
 */
static void i80_out()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd3);
    imm();
}

/**
 * in (0xdb)
 */
static void i80_in()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xdb);
    imm();
}

/**
 * di (0xf3)
 */
static void di()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf3);
}

/**
 * ei (0xfb)
 */
static void ei()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xfb);
}

You'll notice I named the out and in functions i80_out and i80_in. This is because in and out are already keywords in D and therefore we cannot use them as function names.

Let's hook these up to the mnemonic list in the process function as well.

Jumps, calls, and returns

Now we can get to our jumps, calls, and returns. All returns take no arguments and are one byte in size. All jumps and calls take one 16-bit address as an argument and are therefore three bytes in size. Let's first code up a function to process a 16-bit address. It is a little different from an immediate:

/**
 * Get a 16-bit address.
 */
static void a16()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        num = numcheck(a1);
    } else {
        for (size_t i = 0; i < stab.length; i++) {
            if (a1 == stab[i].lab) {
                num = stab[i].value;
                found = true;
                break;
            }
        }

        if (pass == 2) {
            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2) {
        output ~= cast(ubyte)(num & 0xff);
        output ~= cast(ubyte)((num >> 8) & 0xff);
    }
}

A lot of the logic is the same as in the imm function, except at the end, where we output a 16-bit number in little-endian order (low byte first, then high byte) rather than an 8-bit number. It also turns out that the logic for checking whether a number is decimal or hexadecimal is exactly the same as in the imm function, so I moved that logic into its own function so that I can reuse it:

/**
 * Check if a number is decimal or hex.
 */
static ushort numcheck(string input)
{
    ushort num;

    if (input[input.length - 1] == 'h')
        num = to!ushort(chop(input), 16);
    else
        num = to!ushort(input, 10);

    return num;
}

Make sure to make the replacement in the imm function as well. If you forget, the complete assembler code is at the end of this post, as usual.
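To see the byte order that a16 produces, here is a small sketch that pairs numcheck's logic with a little-endian split. le16 is a hypothetical helper (a16 does the same thing inline), and the numcheck here is a restructured copy of the post's function rather than the assembler itself:

```d
import std.conv : to;
import std.string : chop;

/// Same logic as the post's numcheck: trailing 'h' means hex, else decimal.
ushort numcheck(string input)
{
    if (input[$ - 1] == 'h')
        return to!ushort(chop(input), 16);
    return to!ushort(input, 10);
}

/// Hypothetical helper: low byte first, then high byte, as a16 emits them.
ubyte[] le16(ushort value)
{
    return [cast(ubyte)(value & 0xff), cast(ubyte)(value >> 8)];
}

void main()
{
    // jmp 0100h should assemble to c3 00 01.
    ubyte[] code = [cast(ubyte) 0xc3] ~ le16(numcheck("0100h"));
    immutable ubyte[] expected = [0xc3, 0x00, 0x01];
    assert(code == expected);

    // Decimal works too: 4660 is 1234h, emitted as 34 12.
    immutable ubyte[] split = [0x34, 0x12];
    assert(le16(numcheck("4660")) == split);
}
```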

Now let's code up all our jumps, calls, and returns:

/**
 * rnz (0xc0)
 */
static void rnz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc0);
}

/**
 * jnz (0xc2)
 */
static void jnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc2);
    a16();
}

/**
 * jmp (0xc3)
 */
static void jmp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc3);
    a16();
}

/**
 * cnz (0xc4)
 */
static void cnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc4);
    a16();
}

/**
 * rz (0xc8)
 */
static void rz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc8);
}

/**
 * ret (0xc9)
 */
static void ret()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc9);
}

/**
 * jz (0xca)
 */
static void jz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xca);
    a16();
}

/**
 * cz (0xcc)
 */
static void cz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcc);
    a16();
}

/**
 * call (0xcd)
 */
static void call()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcd);
    a16();
}

/**
 * rnc (0xd0)
 */
static void rnc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd0);
}

/**
 * jnc (0xd2)
 */
static void jnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd2);
    a16();
}

/**
 * cnc (0xd4)
 */
static void cnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd4);
    a16();
}

/**
 * rc (0xd8)
 */
static void rc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd8);
}

/**
 * jc (0xda)
 */
static void jc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xda);
    a16();
}

/**
 * cc (0xdc)
 */
static void cc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xdc);
    a16();
}

/**
 * rpo (0xe0)
 */
static void rpo()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe0);
}

/**
 * jpo (0xe2)
 */
static void jpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe2);
    a16();
}

/**
 * cpo (0xe4)
 */
static void cpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe4);
    a16();
}

/**
 * rpe (0xe8)
 */
static void rpe()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe8);
}

/**
 * jpe (0xea)
 */
static void jpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xea);
    a16();
}

/**
 * cpe (0xec)
 */
static void cpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xec);
    a16();
}

/**
 * rp (0xf0)
 */
static void rp()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf0);
}

/**
 * jp (0xf2)
 */
static void jp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf2);
    a16();
}

/**
 * cp (0xf4)
 */
static void cp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf4);
    a16();
}

/**
 * rm (0xf8)
 */
static void rm()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf8);
}

/**
 * jm (0xfa)
 */
static void jm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfa);
    a16();
}

/**
 * cm (0xfc)
 */
static void cm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfc);
    a16();
}

That definitely took some time. We're almost done for today.
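The 24 conditional instructions above follow a regular pattern: the condition occupies bits 3 to 5 of the opcode, in the order nz, z, nc, c, po, pe, p, m, added to a base of 0xc0 for returns, 0xc2 for jumps, and 0xc4 for calls. The unconditional ret, jmp, and call (0xc9, 0xc3, 0xcd) sit outside that pattern. Here is a sketch of computing the opcodes instead of writing each function; condOpcode is a hypothetical name, and a80 itself keeps one function per mnemonic:

```d
/// Condition suffixes in opcode-table order; the index is the 3-bit code.
immutable string[] conds = ["nz", "z", "nc", "c", "po", "pe", "p", "m"];

/// Hypothetical helper: opcode and size for a conditional return, jump,
/// or call such as "rnz", "jc" or "cpe". Returns false for anything else.
bool condOpcode(string mnemonic, out ubyte opcode, out int size)
{
    if (mnemonic.length < 2)
        return false;

    int base;
    switch (mnemonic[0]) {
    case 'r': base = 0xc0; size = 1; break;  // return: no operand
    case 'j': base = 0xc2; size = 3; break;  // jump: 16-bit address
    case 'c': base = 0xc4; size = 3; break;  // call: 16-bit address
    default:  return false;
    }

    foreach (i, cc; conds) {
        if (mnemonic[1 .. $] == cc) {
            opcode = cast(ubyte)(base + (i << 3));
            return true;
        }
    }
    return false;
}

void main()
{
    ubyte op;
    int size;
    assert(condOpcode("rnz", op, size) && op == 0xc0 && size == 1);
    assert(condOpcode("jc", op, size) && op == 0xda && size == 3);
    assert(condOpcode("cm", op, size) && op == 0xfc && size == 3);
    assert(condOpcode("cp", op, size) && op == 0xf4);

    // Unconditional or unrelated mnemonics fall outside the pattern.
    assert(!condOpcode("call", op, size));
    assert(!condOpcode("cmp", op, size));
    assert(!condOpcode("ret", op, size));
}
```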

Reset vectors

Last are the reset vectors. There are eight of them, numbered 0-7. They were designed to work with peripherals, but what each one does is push the program counter onto the stack and then jump to address 8 * (reset vector number). That means there is just one more function to code:

/**
 * rst (0xc7 + offset)
 */
static void rst()
{
    argcheck(!a1.empty && a2.empty);
    auto offset = to!int(a1, 10);
    if (offset >= 0 && offset <= 7)
        passAct(1, 0xc7 + (offset << 3));
    else
        err("invalid reset vector: " ~ to!string(offset));
}

We check that the argument, which must be a decimal number, is between 0 and 7 inclusive, and produce an error if it is not. The output byte is 0xc7 + (reset vector number * 8); as we learned before, we can write the multiplication by eight as a left shift by three.
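A quick check of the formula: rst 0 encodes to 0xc7 and rst 7 to 0xff, and rst n transfers control to address n * 8. The sketch below mirrors rst() with hypothetical helpers rstOpcode and rstTarget, throwing an Exception where the assembler calls err():

```d
import std.conv : to;
import std.exception : enforce;

/// Hypothetical helper mirroring rst(): opcode for "rst n", n in 0..7.
ubyte rstOpcode(string arg)
{
    immutable n = to!int(arg, 10);
    enforce(n >= 0 && n <= 7, "invalid reset vector: " ~ arg);
    return cast(ubyte)(0xc7 + (n << 3));
}

/// Address the CPU jumps to after executing rst n.
ushort rstTarget(int n)
{
    return cast(ushort)(n << 3);   // same as n * 8
}

void main()
{
    assert(rstOpcode("0") == 0xc7);
    assert(rstOpcode("1") == 0xcf);
    assert(rstOpcode("7") == 0xff);
    assert(rstTarget(7) == 0x38);
}
```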

For our final step today, let's hook up any functions we haven't hooked up yet to the mnemonic list in the process function.
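The mnemonic list in process is a long if/else chain. One alternative, sketched here purely as a design option and not how a80 is written, is an associative array from mnemonic strings to function pointers. The stand-in functions below just print their name; in the assembler they would read the global a1 and a2 as they do now. Note that the table key is the mnemonic "in", so the D function can keep the name i80_in.

```d
import std.stdio : writeln;

// Stand-ins for a few of the assembler's per-mnemonic functions.
void nop()     { writeln("nop"); }
void i80_in()  { writeln("in"); }
void i80_out() { writeln("out"); }

/// Hypothetical lookup table: mnemonic -> handler function.
void function()[string] handlers;

static this()
{
    handlers = [
        "nop": &nop,
        "in":  &i80_in,
        "out": &i80_out,
    ];
}

/// Run the handler for op; returns false for an unknown mnemonic,
/// where process() would call err("unknown mnemonic: ...").
bool dispatch(string op)
{
    if (auto h = op in handlers) {
        (*h)();
        return true;
    }
    return false;
}

void main()
{
    assert(dispatch("in"));
    assert(!dispatch("foo"));
}
```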

A growing assembler

Here is the full code for our assembler after today:

import std.stdio;
import std.file;
import std.algorithm;
import std.string;
import std.conv;
import std.exception;
import std.ascii;

/**
 * Line number.
 */
static size_t lineno;

/**
 * Pass.
 */
static int pass;

/**
 * Output stored in memory until we're finished.
 */
static ubyte[] output;

/**
 * Address for labels.
 */
static ushort addr;

/**
 * Intel 8080 assembler instruction.
 */
static string lab;      /// Label
static string op;       /// Instruction mnemonic
static string a1;       /// First argument
static string a2;       /// Second argument
static string comm;     /// Comment

/**
 * Individual symbol table entry.
 */
struct symtab
{
    string lab;         /// Symbol name
    ushort value;       /// Symbol value
};

/**
 * Symbol table is an array of entries.
 */
static symtab[] stab;

/**
 * Top-level assembly function.
 * Everything cascades downward from here.
 * Repeat the parsing twice.
 * Pass 1 gathers symbols and their addresses/values.
 * Pass 2 emits code.
 */
static void assemble(string[] lines, string outfile)
{
    pass = 1;
    for (lineno = 0; lineno < lines.length; lineno++) {
        parse(lines[lineno]);
        process();
    }

    pass = 2;
    for (lineno = 0; lineno < lines.length; lineno++) {
        parse(lines[lineno]);
        process();
    }

    fileWrite(outfile);
}

/**
 * After all code is emitted, write it out to a file.
 */
static void fileWrite(string outfile) {
    import std.file : write;

    write(outfile, output);
}

/**
 * Parse each line into (up to) five tokens.
 */
static void parse(string line) {
    /* Reset all our variables.  */
    lab = null;
    op = null;
    a1 = null;
    a2 = null;
    comm = null;

    /* Remove any whitespace at the beginning of the line.  */
    auto preprocess = stripLeft(line);

    /* Split comment from the rest of the line.  */
    auto splitcomm = preprocess.findSplit(";");
    if (!splitcomm[2].empty)
        comm = strip(splitcomm[2]);

    /* Split second argument from the remainder.  */
    auto splita2 = splitcomm[0].findSplit(",");
    if (!splita2[2].empty)
        a2 = strip(splita2[2]);

    /* Split first argument from the remainder.  */
    auto splita1 = splita2[0].findSplit("\t");
    if (!splita1[2].empty) {
        a1 = strip(splita1[2]);
    } else {
        splita1 = splita2[0].findSplit(" ");
        if (!splita1[2].empty) {
            a1 = strip(splita1[2]);
        }
    }

    /* Split op from label.  */
    auto splitop = splita1[0].findSplit(":");
    if (!splitop[1].empty) {
        op = strip(splitop[2]);
        lab = strip(splitop[0]);
    } else {
        op = strip(splitop[0]);
    }

    /**
     * Fixup for the label: op case.
     */
    auto opFix = a1.findSplit("\t");
    if (!opFix[1].empty) {
        op = strip(opFix[0]);
        a1 = strip(opFix[2]);
    } else {
        opFix = a1.findSplit(" ");
        if (!opFix[1].empty) {
            op = strip(opFix[0]);
            a1 = strip(opFix[2]);
        } else {
            if (op.empty && !a1.empty && a2.empty) {
                op = a1;
                a1 = null;
            }
        }
    }
}

/**
 * Figure out which op we have.
 */
static void process()
{
    /**
     * Special case for if you put a label by itself on a line.
     * Or have a totally blank line.
     */
    if (op.empty && a1.empty && a2.empty) {
        passAct(0, -1);
        return;
    }

    /**
     * List of all valid mnemonics.
     */
    if (op == "nop")
        nop();
    else if (op == "mov")
        mov();
    else if (op == "hlt")
        hlt();
    else if (op == "add")
        add();
    else if (op == "adc")
        adc();
    else if (op == "sub")
        sub();
    else if (op == "sbb")
        sbb();
    else if (op == "ana")
        ana();
    else if (op == "xra")
        xra();
    else if (op == "ora")
        ora();
    else if (op == "cmp")
        cmp();
    else if (op == "rnz")
        rnz();
    else if (op == "pop")
        pop();
    else if (op == "jnz")
        jnz();
    else if (op == "jmp")
        jmp();
    else if (op == "cnz")
        cnz();
    else if (op == "push")
        push();
    else if (op == "adi")
        adi();
    else if (op == "rst")
        rst();
    else if (op == "rz")
        rz();
    else if (op == "ret")
        ret();
    else if (op == "jz")
        jz();
    else if (op == "cz")
        cz();
    else if (op == "call")
        call();
    else if (op == "aci")
        aci();
    else if (op == "rnc")
        rnc();
    else if (op == "jnc")
        jnc();
    else if (op == "out")
        i80_out();
    else if (op == "cnc")
        cnc();
    else if (op == "sui")
        sui();
    else if (op == "rc")
        rc();
    else if (op == "jc")
        jc();
    else if (op == "in")
        i80_in();
    else if (op == "cc")
        cc();
    else if (op == "sbi")
        sbi();
    else if (op == "rpo")
        rpo();
    else if (op == "jpo")
        jpo();
    else if (op == "xthl")
        xthl();
    else if (op == "cpo")
        cpo();
    else if (op == "ani")
        ani();
    else if (op == "rpe")
        rpe();
    else if (op == "pchl")
        pchl();
    else if (op == "jpe")
        jpe();
    else if (op == "xchg")
        xchg();
    else if (op == "cpe")
        cpe();
    else if (op == "xri")
        xri();
    else if (op == "rp")
        rp();
    else if (op == "jp")
        jp();
    else if (op == "di")
        di();
    else if (op == "cp")
        cp();
    else if (op == "ori")
        ori();
    else if (op == "rm")
        rm();
    else if (op == "sphl")
        sphl();
    else if (op == "jm")
        jm();
    else if (op == "ei")
        ei();
    else if (op == "cm")
        cm();
    else if (op == "cpi")
        cpi();
    else
        err("unknown mnemonic: " ~ op);
}

/**
 * Take action depending on which pass this is.
 */
static void passAct(ushort size, int outbyte)
{
    if (pass == 1) {
        /* Add new symbol if we have a label.  */
        if (!lab.empty)
            addsym();

        /* Increment address counter by size of instruction.  */
        addr += size;
    } else {
        /**
         * Output the byte representing the opcode.
         * If the opcode carries additional information
         *   (e.g., immediate or address), we will output that
         *   in a separate helper function.
         */
        if (outbyte >= 0)
            output ~= cast(ubyte)outbyte;
    }
}

/**
 * Add a symbol to the symbol table.
 */
static void addsym()
{
    for (size_t i = 0; i < stab.length; i++) {
        if (lab == stab[i].lab)
            err("duplicate label: " ~ lab);
    }

    symtab newsym = { lab, addr };
    stab ~= newsym;
}

/**
 * nop (0x00)
 */
static void nop()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0x00);
}

/**
 * mov (0x40 + (8-bit register offset << 3) + 8-bit register offset
 * We allow mov m, m (0x76)
 * But that will result in HLT.
 */
static void mov()
{
    argcheck(!a1.empty && !a2.empty);
    passAct(1, 0x40 + (regMod8(a1) << 3) + regMod8(a2));
}

/**
 * hlt (0x76)
 */
static void hlt()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0x76);
}

/**
 * add (0x80 + 8-bit register offset)
 */
static void add()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x80 + regMod8(a1));
}

/**
 * adc (0x88 + 8-bit register offset)
 */
static void adc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x88 + regMod8(a1));
}

/**
 * sub (0x90 + 8-bit register offset)
 */
static void sub()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x90 + regMod8(a1));
}

/**
 * sbb (0x98 + 8-bit register offset)
 */
static void sbb()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0x98 + regMod8(a1));
}

/**
 * ana (0xa0 + 8-bit register offset)
 */
static void ana()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xa0 + regMod8(a1));
}

/**
 * xra (0xa8 + 8-bit register offset)
 */
static void xra()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xa8 + regMod8(a1));
}

/**
 * ora (0xb0 + 8-bit register offset)
 */
static void ora()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xb0 + regMod8(a1));
}

/**
 * cmp (0xb8 + 8-bit register offset)
 */
static void cmp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xb8 + regMod8(a1));
}

/**
 * rnz (0xc0)
 */
static void rnz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc0);
}

/**
 * pop (0xc1 + 16-bit register offset)
 */
static void pop()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc1 + regMod16());
}

/**
 * jnz (0xc2)
 */
static void jnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc2);
    a16();
}

/**
 * jmp (0xc3)
 */
static void jmp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc3);
    a16();
}

/**
 * cnz (0xc4)
 */
static void cnz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xc4);
    a16();
}

/**
 * push (0xc5 + 16-bit register offset)
 */
static void push()
{
    argcheck(!a1.empty && a2.empty);
    passAct(1, 0xc5 + regMod16());
}

/**
 * adi (0xc6)
 */
static void adi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xc6);
    imm();
}

/**
 * rst (0xc7 + offset)
 */
static void rst()
{
    argcheck(!a1.empty && a2.empty);
    auto offset = to!int(a1, 10);
    if (offset >= 0 && offset <= 7)
        passAct(1, 0xc7 + (offset << 3));
    else
        err("invalid reset vector: " ~ to!string(offset));
}

/**
 * rz (0xc8)
 */
static void rz()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc8);
}

/**
 * ret (0xc9)
 */
static void ret()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xc9);
}

/**
 * jz (0xca)
 */
static void jz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xca);
    a16();
}

/**
 * cz (0xcc)
 */
static void cz()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcc);
    a16();
}

/**
 * call (0xcd)
 */
static void call()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xcd);
    a16();
}

/**
 * aci (0xce)
 */
static void aci()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xce);
    imm();
}

/**
 * rnc (0xd0)
 */
static void rnc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd0);
}

/**
 * jnc (0xd2)
 */
static void jnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd2);
    a16();
}

/**
 * out (0xd3)
 */
static void i80_out()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd3);
    imm();
}

/**
 * cnc (0xd4)
 */
static void cnc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xd4);
    a16();
}

/**
 * sui (0xd6)
 */
static void sui()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xd6);
    imm();
}

/**
 * rc (0xd8)
 */
static void rc()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xd8);
}

/**
 * jc (0xda)
 */
static void jc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xda);
    a16();
}

/**
 * in (0xdb)
 */
static void i80_in()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xdb);
    imm();
}

/**
 * cc (0xdc)
 */
static void cc()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xdc);
    a16();
}

/**
 * sbi (0xde)
 */
static void sbi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xde);
    imm();
}

/**
 * rpo (0xe0)
 */
static void rpo()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe0);
}

/**
 * jpo (0xe2)
 */
static void jpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe2);
    a16();
}

/**
 * xthl (0xe3)
 */
static void xthl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe3);
}

/**
 * cpo (0xe4)
 */
static void cpo()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xe4);
    a16();
}

/**
 * ani (0xe6)
 */
static void ani()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xe6);
    imm();
}

/**
 * rpe (0xe8)
 */
static void rpe()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe8);
}

/**
 * pchl (0xe9)
 */
static void pchl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xe9);
}

/**
 * jpe (0xea)
 */
static void jpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xea);
    a16();
}

/**
 * xchg (0xeb)
 */
static void xchg()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xeb);
}

/**
 * cpe (0xec)
 */
static void cpe()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xec);
    a16();
}

/**
 * xri (0xee)
 */
static void xri()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xee);
    imm();
}

/**
 * rp (0xf0)
 */
static void rp()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf0);
}

/**
 * jp (0xf2)
 */
static void jp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf2);
    a16();
}

/**
 * di (0xf3)
 */
static void di()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf3);
}

/**
 * cp (0xf4)
 */
static void cp()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xf4);
    a16();
}

/**
 * ori (0xf6)
 */
static void ori()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xf6);
    imm();
}

/**
 * rm (0xf8)
 */
static void rm()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf8);
}

/**
 * sphl (0xf9)
 */
static void sphl()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xf9);
}

/**
 * jm (0xfa)
 */
static void jm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfa);
    a16();
}

/**
 * ei (0xfb)
 */
static void ei()
{
    argcheck(a1.empty && a2.empty);
    passAct(1, 0xfb);
}

/**
 * cm (0xfc)
 */
static void cm()
{
    argcheck(!a1.empty && a2.empty);
    passAct(3, 0xfc);
    a16();
}

/**
 * cpi (0xfe)
 */
static void cpi()
{
    argcheck(!a1.empty && a2.empty);
    passAct(2, 0xfe);
    imm();
}

/**
 * Get an 8-bit immediate.
 */
static void imm()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        num = numcheck(a1);
    } else {
        if (pass == 2) {
            for (size_t i = 0; i < stab.length; i++) {
                if (a1 == stab[i].lab) {
                    num = stab[i].value;
                    found = true;
                    break;
                }
            }

            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2)
        output ~= cast(ubyte)num;
}

/**
 * Get a 16-bit address.
 */
static void a16()
{
    ushort num;
    bool found = false;

    if (isDigit(a1[0])) {
        num = numcheck(a1);
    } else {
        for (size_t i = 0; i < stab.length; i++) {
            if (a1 == stab[i].lab) {
                num = stab[i].value;
                found = true;
                break;
            }
        }

        if (pass == 2) {
            if (!found)
                err("label " ~ a1 ~ " not defined");
        }
    }

    if (pass == 2) {
        output ~= cast(ubyte)(num & 0xff);
        output ~= cast(ubyte)((num >> 8) & 0xff);
    }
}

/**
 * Return the 16-bit register offset.
 */
static int regMod16()
{
    if (a1 == "b") {
        return 0x00;
    } else if (a1 == "d") {
        return 0x10;
    } else if (a1 == "h") {
        return 0x20;
    } else if (a1 == "psw") {
        return 0x30;
    } else {
        err("invalid register for " ~ op);
    }

    /* This will never be reached, but quiets gdc.  */
    return 0;
}

/**
 * Return the 8-bit register offset.
 */
static int regMod8(string reg)
{
    if (reg == "b")
        return 0x00;
    else if (reg == "c")
        return 0x01;
    else if (reg == "d")
        return 0x02;
    else if (reg == "e")
        return 0x03;
    else if (reg == "h")
        return 0x04;
    else if (reg == "l")
        return 0x05;
    else if (reg == "m")
        return 0x06;
    else if (reg == "a")
        return 0x07;
    else
        err("invalid register " ~ reg);

    /* This will never be reached, but quiets gdc.  */
    return 0;
}

/**
 * Check arguments.
 */
static void argcheck(bool passed)
{
    if (passed == false)
        err("arguments not correct for mnemonic: " ~ op);
}

/**
 * Check if a number is decimal or hex.
 */
static ushort numcheck(string input)
{
    ushort num;

    if (input[input.length - 1] == 'h')
        num = to!ushort(chop(input), 16);
    else
        num = to!ushort(input, 10);

    return num;
}

/**
 * Nice error messages.
 */
static void err(string msg)
{
    stderr.writeln("a80: " ~ to!string(lineno + 1) ~ ": " ~ msg);
    enforce(0);
}

/**
 * All good things start with a single function.
 */
void main(string[] args)
{
    /**
     * Make sure the user provides only one input file.
     */
    if (args.length != 2) {
        stderr.writeln("usage: a80 file.asm");
        return;
    }

    /**
     * Create an array of lines from the input file.
     */
    string[] lines = splitLines(cast(string)read(args[1]));

    /**
     * Name output file the same as the input but with .com ending.
     */
    auto split = args[1].findSplit(".asm");
    auto outfile = split[0] ~ ".com";

    /**
     * Do the work.
     */
    assemble(lines, outfile);
}

Assembling our first working program

Do you remember the Fibonacci program from a few posts ago? Here it is again. We can now assemble it into a fully working executable. Save this as fib.asm and run it through your newly compiled assembler:

; Fibonacci in Intel 8080 assembler.
; Results in b
start:
	xra	a	; zero out a
	mov	b, a	; b = a
	mov	c, a	; c = a
	adi	01h	; a = a + 1
	mov	c, a	; c = a
	xra	a	; zero out a
loop:	add	c	; a = a + c
	cmp	c
	jc	start	; jump if carry
	mov	b, a
	mov	a, c
	mov	c, b
	jmp	loop	; jump to loop

You should then run the resulting executable through our disassembler. You will see that while the assembly source uses labels for the jumps, the disassembly shows the correct addresses in their place. Unfortunately, even if you have an emulator, you will not be able to see any output just yet. I wrote my own Intel 8080 CPU emulator in C that displays the current value of all the registers, so I can confirm that this program does in fact work correctly. We are only a couple of days away from being able to see output via CP/M.
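As a cross-check, the sketch below derives the 18 bytes that fib.asm should assemble to, using the same encoding formulas as the assembler. It assumes, as the assembler currently does, that the program starts at address 0, so start is 0000h and loop is 0007h. The encoder names are hypothetical helpers, not part of a80; comparing the printed listing against a hex dump of fib.com is one way to sanity-check the output.

```d
import std.stdio : writefln;

// Register numbers as in regMod8: b=0, c=1, d=2, e=3, h=4, l=5, m=6, a=7.
enum Reg : ubyte { b, c, d, e, h, l, m, a }

// Hypothetical encoders following the assembler's formulas.
ubyte mov(Reg dst, Reg src) { return cast(ubyte)(0x40 + (dst << 3) + src); }
ubyte add(Reg r) { return cast(ubyte)(0x80 + r); }
ubyte xra(Reg r) { return cast(ubyte)(0xa8 + r); }
ubyte cmp(Reg r) { return cast(ubyte)(0xb8 + r); }

/// Opcode followed by a 16-bit address, low byte first.
ubyte[] withAddr(ubyte opcode, ushort addr)
{
    return [opcode, cast(ubyte)(addr & 0xff), cast(ubyte)(addr >> 8)];
}

ubyte[] assembleFib()
{
    enum ushort start = 0x0000; // "start:" labels the first instruction
    enum ushort loop = 0x0007;  // seven bytes of code come before "loop:"

    ubyte[] code;
    with (Reg) {
        code ~= [xra(a), mov(b, a), mov(c, a)];       // xra a; mov b,a; mov c,a
        code ~= [cast(ubyte) 0xc6, cast(ubyte) 0x01]; // adi 01h
        code ~= [mov(c, a), xra(a)];                  // mov c,a; xra a
        assert(code.length == loop);
        code ~= [add(c), cmp(c)];                     // loop: add c; cmp c
        code ~= withAddr(0xda, start);                // jc start
        code ~= [mov(b, a), mov(a, c), mov(c, b)];    // mov b,a; mov a,c; mov c,b
        code ~= withAddr(0xc3, loop);                 // jmp loop
    }
    return code;
}

void main()
{
    immutable ubyte[] expected = [
        0xaf, 0x47, 0x4f, 0xc6, 0x01, 0x4f, 0xaf,
        0x81, 0xb9, 0xda, 0x00, 0x00,
        0x47, 0x79, 0x48, 0xc3, 0x07, 0x00,
    ];
    auto code = assembleFib();
    assert(code == expected);

    foreach (i, b; code)
        writefln("%04x: %02x", i, b);
}
```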

Next time

We will teach the assembler about the first quarter of the opcode table. Then we will be finished encoding all the 8080 instructions.

Let's also take a moment to congratulate ourselves for breaking the 1000-line mark. Not all of it is code, of course, but vim reports that our source code file is 1040 lines long. I think that's worth celebrating.
