Dr. Brian Robert Callahan
academic, developer, with an eye towards a brighter techno-social life
This blog post is the first in hopefully a series where we dive very deeply into a piece of code we've written. I'm calling the series code deep dive for now but if you think of a better name let me know.
I spent yesterday doing a single-afternoon game jam after watching this video in the morning. The idea of coding with extreme constraints sounds like a lot of fun, and the video did not make a game for the objectively best operating system no bias, so I wanted to see if I could get it done and get it done quickly. Had to be sub-12 hours, had to fit into a QR code, and could not use any facilities other than the kernel that were not self-written. What I came up with is SnakeQR.
Let's break down how this went from zero to completed game.
A game with constraints rivalling the Atari 2600 means that you have to plan up-front what you want to include and—perhaps more importantly—what you are willing to live without. When thinking about my design choices, I thought about the Brick Game-like device that I had when I was a child and its version of snake. A known grid, minimal buttons, one fruit on the screen at a time, and no fancy user information other than score. I was willing to live without score, and I think on the Unix terminal people expect things closer to 80x24 characters
Another important concession is that I did not want to play around with any dynamic memory handling. That felt like a disaster waiting to happen. Everything that SnakeQR needs will need to be hardcoded buffers. There is still plenty to get wrong even with hardcoded buffers.
With the design settled coding could begin.
What do you lose out on when you only have the kernel syscall API to do all your work? The short answer is most everything. Even the connection to the API has to be reconstituted. Let's begin with the basics. To start, we need a way to exit back to the shell after the game ends, we need a way to write to the screen, and we need a way to get user input. Remember too while we think about the main function in C being where the program begins, the actual entry point is _start.
While this could be written in C like so
void _start(void) { main(); _exit(0); }
I found that handwritten assembly produced smaller code. And the function is so small and simple I do not believe it poses a porting challenge.
.text .globl _start _start: callq main movl $1, %eax xorl %edi, %edi syscall .size _start,.-_start
But something is wrong. Even if we comment out the call to main, the program crashes immediately on startup! Worse, it appears that OpenBSD doesn't consider this an executable. After some Googling, I found an archived post from jasper@'s blog where he discovered that you must add some identifying information in a .note.openbsd.ident section.
.section ".note.openbsd.ident", "a" .p2align 2 .long 0x8 .long 0x4 .long 0x1 .asciz "OpenBSD" .long 0x0
More details about these numbers can be found in the OpenBSD src/lib/csu/os-note-elf.h source code.
jasper@ tried it with .align instead of .p2align and found that it did not work with ld.lld. I did not test that myself but I can confirm that .p2align works with ld.lld.
I could not figure out a way to write the syscall connection in C so I also wrote it in assembly. This will be the most difficult part to port to another platform, particularly other CPUs.
.globl _syscall _syscall: movq %rdi, %rax movq %rsi, %rdi movq %rdx, %rsi movq %rcx, %rdx movq %r8, %rcx movq %r9, %r8 syscall retq .size _syscall,.-_syscall
If you've never seen amd64 assembly before, when you call syscall, you first put the syscall number for the function you want into %rax and then follow the standard calling convention for any parameters. However, the calling convention of a function call does not include %rax. So what my _syscall function is doing is resorting all the needed information into the correct registers before calling the real syscall.
That wraps up the assembly portion of SnakeQR. As we saw, at least some of this could be rewritten in C if portability is a concern.
The snakeqr.c file is divided into two parts: the first half (from the beginning to the system function) comprises reconstituted header files from libc that provide the data structures we need. The second half (from the timer function to the end) is the actual game and represents the novel code of the project.
I began by walking through what needs to be done: draw the screen, initialize the snake and the first fruit, then enter the main loop which reads in input from the player and updates the game based on those inputs.
I first tried to draw the screen fully, all 80x24 characters, every update. This is untenable (but hey, you gotta try). In 2020, I believe we can be safe in assuming that all Unix consoles will support VT100 escape codes, so let's use those. But even to write out those escape codes, we need to be able to write, which is functionality we do not have yet!
We have our assembly function that calls syscall correctly. Now we need to use it in the C file.
The C translation of our assembly function is
extern void * _syscall(void *n, void *a, void *b, void *c, void *d, void *e);
Where n is the syscall number and a-e are the function parameters. This follows the syscall(2) API. The sys/syscall.h header file contains a list of syscalls and their syscall number. All we need to do is figure out which number corresponds to write: 4.
According to the write(2) manual page, there are three function parameters: int d, const void *buf, and size_t nbytes. Let's translate all the size types back into their underlying storage sizes (size_t == unsigned long, on OpenBSD) to give us int d, const void *buf, and unsigned long nbytes. We can now use this information to write our reconstructed write function.
static void write(int d, const void *buf, unsigned long nbytes) { _syscall((void *) 4, (void *) d, (void *) buf, (void *) nbytes, (void *) 0, (void *) 0); }
It is important to cast everything to void * since that's what our prototype and function expect. Unused parameters can be set to NULL, but no headers allowed so we translate NULL back to (void *) 0.
Now we can write anything we want to our terminal. Let's draw a fancy 80x23 screen (23 instead of 24 so the game can be played in tmux).
static void scrinit(void) { int x, y; write(1, "\033[?25l", 6); write(1, "\033[2J\033[H", 7); write(1, " ", 1); for (x = 1; x < 79; x++) write(1, "-", 1); write(1, "\n", 1); for (y = 1; y < 22; y++) { write(1, "|", 1); for (x = 0; x < 78; x++) write(1, " ", 1); write(1, "|\n", 2); } write(1, " ", 1); for (x = 1; x < 79; x++) write(1, "-", 1); }
Next let's draw our snake. Our snake should grow with each fruit it eats. Starting at 1. Let's add a variable called snakelen to keep track of it and begin the game by incrementing it to 1 before initializing our snake.
static void snakeinit(void) { char buf[2]; again: getentropy(buf, 2); snakex = buf[0]; snakey = buf[1]; if ((snakex < 8 || snakex > 72) || (snakey < 8 || snakey > 16)) goto again; location[snakey][snakex] = snakelen; direction = 3; }
A couple of things stand out here. First is the call to getentropy(2). This is absolutely not a great way to get random numbers but it was cheaper (code size-wise) than anything else I could think of. Let's write our getentropy syscall. It has a syscall number of 7 and takes two parameters, void *buf and size_t buflen. Again, we will translate size_t back to unsigned long.
static void getentropy(void *buf, unsigned long buflen) { _syscall((void *) 7, (void *) buf, (void *) buflen, (void *) 0, (void *) 0, (void *) 0); }
Another potential solution would be to create an unsigned long long that increments at the top of every loop cycle, with another loop added at the very beginning of the game to wait for the player to press a key to begin playing (so that the number is different every play) and use that number for all your random operations. But like I said, my goal is to get the smallest possible code and this was the smallest for me but maybe you can do it better (and it would be more portable that way too).
We then check to make sure that the snake spawns with enough space between it and any of the walls to ensure the game doesn't end as soon as it begins.
Next, we record the snake's location in a 23x79 grid (we can chop off the ends here because we know they're the border and thus the snake can never be there) that we declared as a static global variable.1
static short location[23][79];
Taking inspiration from the Brick Game of old, and because dynamic memory handling and objects are out of the question for this project, we do something clever with snakelen. This variable doubles as a timer to tell the game how many frames to draw the snake at that location. Every time the snake moves (i.e., every frame), we read through the array and if that location is not zero, we draw a piece of the snake at that location and then decrement the value at that location by one. When it reaches zero, we stop drawing at that location. This will give us a correct snake with easy collision checking a little later on. And it gives us a pleasing growth animation too, a nice bonus.
The direction variable tells us which way the snake is moving: 0 == up, 1 == down, 2 == left, and 3 == right. We default to moving right.
Next we want to draw our first fruit. We use the same function to draw every subsequent fruit.
static void fruitplace(void) { char buf[2], xs[2], ys[2]; int i, n; again: getentropy(buf, 2); fruitx = buf[0]; fruity = buf[1]; if ((fruitx < 2 || fruitx > 78) || (fruity < 2 || fruity > 22) || location[fruity][fruitx] != 0) goto again; ++snakelen; i = 0; n = fruitx; do { xs[i++] = n % 10 + '0'; } while ((n /= 10) > 0); i = 0; n = fruity; do { ys[i++] = n % 10 + '0'; } while ((n /= 10) > 0); write(1, "\033[", 2); if (fruity > 9) write(1, &ys[1], 1); write(1, &ys[0], 1); write(1, ";", 1); if (fruitx > 9) write(1, &xs[1], 1); write(1, &xs[0], 1); write(1, "H%", 2); }
It begins much the same as the snakeinit function, though with an extra check to make sure that the fruit isn't being drawn on top of any portion of the snake. But then we have all this mess. In order to use our write function, we need to know how many characters we are writing. But we do not know how many characters we need to write before the fruit's location is generated, and it could vary! Fortunately, we do not need to write all at once; we can break up the write and it will still work correctly. These do ... while loops generate ASCII characters that correspond to the number being translated (i.e., it does the reverse of atoi(3)). The astute among you will notice that the resulting ASCII characters are backwards: the 10s digit is in the 0th index and the 1s digit is in the 1st index. But that's ok we just need to remember that. We also know the number must be between 2 and 77 or 2 and 22, so we only need two cells in our temporary arrays to store the ASCII characters. We can then do a quick check for number value (as a stand-in for number of characters) and print appropriately.
We can finally draw our first frame.
static void drawscr(void) { char xs[2], ys[2]; int i, n, x, y; for (y = 1; y < 23; y++) { for (x = 1; x < 79; x++) { if (location[y][x] > 0) { i = 0; n = x; do { xs[i++] = n % 10 + '0'; } while ((n /= 10) > 0); i = 0; n = y; do { ys[i++] = n % 10 + '0'; } while ((n /= 10) > 0); write(1, "\033[", 2); if (y > 9) write(1, &ys[1], 1); write(1, &ys[0], 1); write(1, ";", 1); if (x > 9) write(1, &xs[1], 1); write(1, &xs[0], 1); if (--(location[y][x]) == 0) write(1, "H ", 2); else write(1, "H@", 2); } } } }
Iterate over the game board. If a location cell value is greater than zero, a segment of the snake is there. Decrement the value in the cell. Draw it using the same integer to ASCII dance as we did with the fruit. Clear it if the new cell value is zero.
If we sit in a loop awaiting player input (like getchar does), this will be a very boring game as the snake will only ever move when the player presses a key. No fun at all. Instead, we can set a timer using signals, specifically SIGALRM, which will move the snake every time the alarm expires. This has the added effect of having regular intervals of time between frames, instead of the potential for an overly short frame if we move every time the player presses a key. In our design, a player can press as many keys as they like and the last key pressed before the alarm went off will be the direction the snake moves for that frame.
While there is an alarm(3) function, it is not a syscall and it only has second resolution. One frame per second would still feel very slow. But there is a hint in that manual page: "This is a simplified interface to setitimer(2)."
That's a system call, so let's look into setitimer.
Let's set up our setitimer call. Function call number 69.
static void setitimer(int which, const struct itimerval *value, struct itimerval *ovalue) { _syscall((void *) 69, (void *) which, (void *) value, (void *) ovalue, (void *) 0, (void *) 0); }
We will also need to import the definition of an itimerval.
struct itimerval { struct timeval it_interval; struct timeval it_value; };
Which in turn needs a definition of a timerval.
struct timeval { long long tv_sec; long tv_usec; };
We can now create a new function called do_move which will run every time the timer expires.2
static void do_move(int signo) { static struct itimerval value; if (signo == 0) timer(do_move); value.it_value.tv_sec = 0; value.it_value.tv_usec = 100000; move(); setitimer(0, &value, (void *) 0); }
When we call do_move manually in main, we give it the parameter 0. This will establish the signal handlers necessary to run do_move every time the timer expires, which is 0.1 seconds. That felt like the right balance between speed and difficulty. When the timer expires, it will call do_move with SIGALRM (14 on OpenBSD) which moves the snake and resets the timer.
The very first time we call do_move, we need to establsh the signal handlers necessary for the alarm to do the right thing.
static void timer(void (*cb)(int)) { struct sigaction action; action.sa_mask = 0; action.sa_flags = 0; action.__sigaction_u.__sa_handler = cb; sigaction(14, &action, (void *) 0); }
As you can probably guess, this is going to pull in a lot. We need all the signal handling necessary for this to work. First we need to tell SnakeQR what a struct sigaction is.
struct sigaction { union { void (*__sa_handler)(int); void (*__sa_sigaction)(int, siginfo_t *, void *); } __sigaction_u; unsigned int sa_mask; int sa_flags; };
In order to understand struct sigaction, we need to know what a siginfo_t is too.
typedef struct { int si_signo; int si_code; int si_errno; union { int _pad[29]; struct { int _pid; union { struct { unsigned int _uid; union sigval _value; } _kill; struct { long long _utime; long long _stime; int _status; } _cld; } _pdata; } _proc; struct { void *_addr; int _trapno; } _fault; } _data; } siginfo_t;
And in order to understand siginfo_t, we need to know what a sigval is.
union sigval { int sival_int; void *sival_ptr; };
Finally, we need to tell it what the sigaction function is. Fortunately for us, sigaction(2) is a system call. Number 46.
static void sigaction(int sig, const struct sigaction *act, struct sigaction *oact) { _syscall((void *) 46, (void *) sig, (void *) act, (void *) oact, (void *) 0, (void *) 0); }
That was a lot but it was so worth it. We just turned a very boring non-game into something worth playing.
How do we actually move? We use the direction variable to update our x and y coordinates.
static void move(void) { switch (direction) { case 0: --snakey; break; case 1: ++snakey; break; case 2: --snakex; break; case 3: ++snakex; } checkcrash(); if (snakex == fruitx && snakey == fruity) fruitplace(); location[snakey][snakex] = snakelen; drawscr(); }
After that, we check to see if the snake has crashed into a wall or itself, as that would cause the game to end.
static void checkcrash(void) { char score[4]; int i, n; if ((snakex < 2 || snakex > 78) || (snakey < 2 || snakey > 22) || (location[snakey][snakex] > 0 && location[snakey][snakex] < snakelen)) { system("stty sane"); write(1, "\033[?25h", 6); write(1, "\033[2J\033[H", 7); i = 0; --snakelen; n = --snakelen; do { score[i++] = n % 10 + '0'; } while ((n /= 10) > 0); write(1, "Score: ", 7); if (snakelen > 999) write(1, &score[3], 1); if (snakelen > 99) write(1, &score[2], 1); if (snakelen > 9) write(1, &score[1], 1); write(1, &score[0], 1); write(1, "\n", 1); _exit(0); } }
This is also our exit point for the game. If we've crashed, we reset the screen, calculate our score by converting snakelen into ASCII text, print that text, and call _exit. That is syscall number 1.
static void _exit(int status) { _syscall((void *) 1, (void *) status, (void *) 0, (void *) 0, (void *) 0, (void *) 0); }
Previously, I said that I could live without scoring. Score printing was the very last thing I implemented, and only after ensuring that the game would still fit into a QR code (and even better, under 2 KB). The score printing is quite expensive, about 60 bytes.
If we haven't crashed into anything, we check to see if we've eaten a fruit. If we have, we need to generate a new fruit. Then we record the length of the snake (our liveness countdown value) into that cell of the location array. We can now draw a new frame with our updated information.
We are now finally able to enter our main loop. It does not do much.
while (1) { c = 0; read(0, &c, 1); switch (c) { case 'h': direction = 2; break; case 'j': direction = 1; break; case 'k': direction = 0; break; case 'l': direction = 3; } }
Loop forever, since we have an exit point, polling for input from the player with read. Function call number 3.
static long read(int d, void *buf, unsigned long nbytes) { return (long) _syscall((void *) 3, (void *) d, (void *) buf, (void *) nbytes, (void *) 0, (void *) 0); }
We are using vi(1) arrow keys, so the only valid keys for our game are h, j, k, and l. We will ignore everything else.
We are ready to play! Let's compile our game with make.3
# snakeqr Makefile CC = cc CFLAGS = -Oz -nostdinc -ffreestanding CFLAGS += -fno-PIE -fno-PIC -fno-ret-protector -fomit-frame-pointer CFLAGS += -fno-stack-protector -mno-retpoline CFLAGS += -Wno-int-to-void-pointer-cast PROG = snakeqr OBJS = crt.o snakeqr.o all: ${OBJS} /usr/bin/ld -nopie -o ${PROG} ${OBJS} /usr/bin/strip ${PROG} /usr/bin/strip -R .comment ${PROG} /usr/bin/gzexe ${PROG} qr: qrencode -r ${PROG} -8 -o ${PROG}.png clean: rm -rf ${PROG} ${OBJS} ${PROG}.core ${PROG}~
We will end up with a binary called snakeqr. Let's run it!
The game runs, but all that happens is the snake moves to the right and eventually crashes into the wall. None of our button presses do anything.
There is one last thing we need for it to work. Our terminal is buffered. We need to turn that off. There are a few ways we could handle this. We could convert our read into getchar. But that would add a ton of complexity as we would need to understand FILE * and its ilk. Not worth it. We could teach our game the termios functions in order to handle things. But that would also be very complex. There is another trick up our sleeve: why not let the shell handle it? We can use system(3) to call stty(1) to do the terminal settings work for us. While system is a libc function and not a syscall, it is easy to implement and on the small side.
static void system(const char *command) { char *argp[4] = { "/bin/sh", "-c", (void *) 0, (void *) 0 }; int cpid, pid, pstat; argp[2] = (char *) command; switch ((cpid = vfork())) { case -1: _exit(1); case 0: execve("/bin/sh", argp, (void *) 0); _exit(127); } do { pid = wait4(cpid, &pstat, 0, (void *) 0); } while (pid == -1); }
We need to teach SnakeQR three more system calls for system to work: vfork (66), execve (59), and wait4 (11). The system function in OpenBSD libc uses waitpid(3) instead of wait4 but as explained in the manual page: "The waitpid() call is identical to wait4() with an rusage value of zero."
static long vfork(void) { return (long) _syscall((void *) 66, (void *) 0, (void *) 0, (void *) 0, (void *) 0, (void *) 0); }
static void execve(const char *path, char *const argv[], char *const envp[]) { _syscall((void *) 59, (void *) path, (void *) argv, (void *) envp, (void *) 0, (void *) 0); }
static long wait4(int wpid, int *status, int options, void *rusage) { return (long) _syscall((void *) 11, (void *) wpid, (void *) status, (void *) options, (void *) rusage, (void *) 0); }
We can add a system("stty cbreak -echo") call to main at startup and a system("stty sane") call during checkcrash as we are cleaning up. Our final main function looks like this now.
void main(void) { int c; ++snakelen; scrinit(); system("stty cbreak -echo"); snakeinit(); fruitplace(); drawscr(); do_move(0); while (1) { c = 0; read(0, &c, 1); switch (c) { case 'h': direction = 2; break; case 'j': direction = 1; break; case 'k': direction = 0; break; case 'l': direction = 3; } } }
Let's recompile and play. Success! Our game works! We have managed to create a real game worth playing in a size small enough to fit into a QR code. But we have a few more tricks up our sleeve to get down to that small size.
You will notice in the Makefile that we are using clang's -Oz flag to aggressively optimize for size and there are a lot of CFLAGS disabling nearly all of the mitigations that OpenBSD users expect, all of which are enabled by default. There are always tradeoffs in the speed-size-performance-security space and we are consciously choosing size over everything else. We barely care about speed, so long as our 10 fps can be attained. We don't much care about performance either; it doesn't matter to us if we are using elegant and optimized algorithms, just small ones. And we are choosing to not care about security by turning off all the mitigations. You should not do this. You should choose to care about security. I believe that this program is safe but it does contain a system symbol, a "/bin/sh" string, and an execve symbol. You have been warned. Turn the mitigations back on if you intend to have this on your system for any length of time.
We strip the binary as usual. We then also strip the .comment section. This saves a few bytes, and I don't think anyone really cares if their binaries don't proudly announce that they were linked with LLD.
There is one final trick we can see in our Makefile: the use of gzexe(1) to compress the executable down. There are other ways of doing executable packing, such as the well-known UPX, but that is only available for OpenBSD/i386 and I did not want to spend the time porting it. I also wanted to avoid using things outside of the base operating system since we had already come so far with the least of tooling. I also experimented with rewriting the gzexe script to use other compression tools, discovering that Lzip gave by far the best compression, but ultimately abandoned the idea since again it would mean bringing something in from ports and because gzexe got the final binary down to under 2 KB, which was good enough for me.
I hope you enjoyed this very deep dive into SnakeQR and how you can code your own games with extreme minimalism and constraints. The code is ISC licensed, so you are free to do what you'd like with the game. If you port it to a new platform, whether a new CPU or a new Unix OS, please send in a Pull Request!
1 Writing this blog post made me notice a bug here. The location array was originally an array of chars, which even if unsigned would not be large enough to hold the maximum possible snakelen value. A short works. Fixed in a144479. Goes to show you how none of us are safe from careful code review.
2 Another code review change. Originally setitimer was called before move but I believe it makes more sense to call move first then setitimer. Fixed in 74a1942.
3 A third code review change. I discovered that the -fdata-sections -ffunction-sections -Wl,--gc-sections dance was increasing the final binary size. Fixed in 0d6fa81.