[Image: Nintendo 64 console with EverDrive cartridge]

Nintendo 64 Part 3: Building a Sample Program

Nintendo 64, Programming

More ramblings about Nintendo 64 homebrew. I’m trying to describe my process in detail and edit it into a coherent narrative.

Game Jam Updates

The jam has started! These blog posts are being written and posted on a bit of a delay, so the jam actually started almost a week ago.

The original announcement: YouTube: Nintendo 64’s First Game Jam.

Theme reveal: N64brew Game Jam #1 - Theme Reveal Trailer. The theme is size.

According to one of the folks on Discord, there are something like 32 solo participants and 12 teams in the jam. 44 total, nice! We’ll see how many people make it. Come December, I’m sure you’ll be able to count the number of finished games on your fingers. (I’m not even sure you’ll need both hands. We’ll see.)

The prize pot is also over $1,000. I donated to the prize pot because I want people to see how exciting this game jam is. There are a couple people in chat who are a bit unhappy with the size of the prize pot…

The current way that things are handled with the Prize pot makes my blood boil so hard. […] I expect there to be lots of bickering when the jam’s over […]

That’s the issue. When there’s money involved, it becomes progressively less and less your choice what the real goals are, and my issue is that this prize pool is going to directly affect the enjoyment and fun of the Jam

My thoughts on the prize pool:

Enough about the jam. Time to develop!

Getting an SDK

There appear to be a few different SDKs and libraries.

  1. LibUltra, Nintendo’s official (proprietary) SDK. People call this LibUltra even though it was historically just “the SDK”. Available for Windows and SGI IRIX.
  2. Libdragon, an open-source library for Nintendo 64.
  3. Pseultra, a collection of Nintendo 64 development tools, including a library called libpseultra.
  4. Libn64, an open-source library for Nintendo 64.
  5. LibreUltra, a matching decompilation of Nintendo’s LibUltra.

From comments in chat, it sounds like the open-source versions are not completely ready for making 3D games, although they’ll work well enough for 2D. I’ll start with the LibUltra SDK and explore other options later.

I’m using Linux, so I’ll start by trying the SGI IRIX toolchain. (This was the wrong choice! But I’ll learn about that later.) At the very least, it’s more likely that the tools will be distributed in a more familiar format (ideally, a tar file). I found the following items as CD-ROM images:

So, not only was the “Developer Tool Kit” CD-ROM image corrupted, but after going through the work of trying to extract files from it, I learned that it does not contain LibUltra or the Nintendo 64 toolchain! Instead, it contains additional libraries that run on top of the N64 toolchain, such as NuSystem. From the docs:

NuSystem reduces the amount of effort needed in the initial stage of program development, making N64 development easier to understand. In NuSystem, each N64 function is a component which can be controlled using callback and front-end functions - facilitating the progress of N64 programs. The flexible design takes processing speed, memory efficiency, and expandability into consideration. With NuSystem you can create a program without delving into the complicated aspects of N64 development.

It turns out that what I want is the disc called “OS/Library”, which I had not entirely understood. The reason it’s called “OS” is because the library, LibUltra, contains a basic operating system for the Nintendo 64 that is linked into N64 games. The latest version is 2.0L, and for maximum confusion, it’s written with a lower-case “L” so you can more easily confuse it with version 2.0I.

The software for SGI is distributed in the package format used by inst, the program that IRIX uses to install software. Since I don’t have an SGI IRIX workstation handy (yet; it’s been a dream of mine to have an Indigo2, Indy, O2, or Octane), I wrote a tool called SGI Extractor to extract files from these packages.

The OS/Library 2.0L CD contains only one package. After extracting it, I poked around looking for interesting files:

Some of the text files were in Japanese, with the Japanese EUC encoding. If you want to read them, you probably want to convert them to Unicode. For example:

$ cd usr/src/PR
$ ls
README.jp  assets  demos  demos_old  doc  libsrc
$ iconv -f eucJP -t utf-8 <README.jp
このディレクトリについて

    このディレクトリには、下記のものが含まれています。

    doc/
    リリースノートはこのディレクトリの下にあります。まずはリリース
    ノートをご覧下さい。過去のリリースノートは doc/relnotes_old/
[...]

Compiler Flags

I see the flags other people are using in the chat:

CFLAGS = -fno-PIC -mabi=32 -mno-shared -mno-abicalls \
    -march=vr4300 -mtune=vr4300 -mfix4300 -G 0

Let’s analyze these to see what we need.

Preprocessor Definitions

I also see people using some flags like -D_MIPS_SZLONG=32 and -D_MIPS_SZINT=32. Are these necessary? We can check:

$ mips64-gcc -dM -E -xc /dev/null | grep _MIPS_SZ
#define _MIPS_SZPTR 32
#define _MIPS_SZINT 32
#define _MIPS_SZLONG 32

It turns out that GCC defines these by default.
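If you do end up passing these macros by hand, one way to catch a mismatch is to compare them against what the compiler actually uses. The following is a minimal sketch, not something from the SDK: the `long_bits` helper and the enum names are made up for illustration, and the checks are guarded so that the file still compiles on a non-MIPS host where the `_MIPS_SZ*` macros do not exist.

```c
#include <limits.h>

/* Width of each basic type in bits, computed by the compiler itself. */
enum {
    INT_BITS  = sizeof(int) * CHAR_BIT,
    LONG_BITS = sizeof(long) * CHAR_BIT,
    PTR_BITS  = sizeof(void *) * CHAR_BIT
};

/* The _MIPS_SZ* macros only exist on MIPS targets, so guard each check.
 * If a -D flag on the command line disagreed with the real type widths,
 * one of these assertions would stop the build. */
#ifdef _MIPS_SZINT
_Static_assert(_MIPS_SZINT == INT_BITS, "_MIPS_SZINT does not match int");
#endif
#ifdef _MIPS_SZLONG
_Static_assert(_MIPS_SZLONG == LONG_BITS, "_MIPS_SZLONG does not match long");
#endif
#ifdef _MIPS_SZPTR
_Static_assert(_MIPS_SZPTR == PTR_BITS, "_MIPS_SZPTR does not match void *");
#endif

/* Hypothetical helper: reports the width of long for the current target. */
int long_bits(void) { return LONG_BITS; }
```

Compiled with the cross compiler, this should build silently when the predefined values are left alone, and fail loudly if an overriding -D flag lies about the type sizes.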

Position Independent Code

We know that GCC is not generating PIC code by default, because trying to enable it fails:

$ mips64-gcc -fpic -c -xc -
cc1: error: position-independent code requires '-mabicalls'

From reading GCC: MIPS Options, we can figure out that we don’t need -mno-abicalls because it is the default. We also don’t need -mno-shared because it has no effect without -mabicalls, and we likewise don’t need -fno-PIC.

Multiplication Bug

The -mfix4300 flag looks like it is intended to enable a workaround for a bug in the VR4300 silicon, but does this flag work, and what exactly does it do? It’s not documented in the GCC manual. However, we can find this note in the GCC source code:

Early VR4300 silicon has a CPU bug where multiplies with certain operands may corrupt immediately following multiplies. This is a simple fix to insert NOPs.

The surrounding code seems to just insert a nop after a floating-point multiply. We can test whether this nop is inserted with and without the -mfix4300 flag. Create test.c:

float f(float x, float y) {
    return x * y;
}

Compile it with and without -mfix4300:

$ mips64-gcc -S -O2 test.c && cat test.s
f:
        jr      $31
        mul.s   $f0,$f12,$f13
$ mips64-gcc -mfix4300 -S -O2 test.c && cat test.s
f:
        mul.s   $f0,$f12,$f13
        nop
        jr      $31
        nop

Looks like this option is necessary.

MIPS ABI

The last thing we need to figure out is to set the correct ABI. There are five ABIs: o32, n32, o64, n64, and eabi. The GCC flag -mabi=32 selects the “o32” ABI and -mabi=64 selects the “n64” ABI, and just to be completely clear here the “n” in “n64” stands for “native”, not “Nintendo”. If we choose the wrong ABI, we may still be able to build and link our code, but we may experience anything from mysterious data corruption to crashes.

I thought this would be fairly simple to write up, but as I investigate, I become less sure what the correct ABI option is.

We know that the Nintendo 64 uses a NEC VR4300, which is a MIPS R4300i, and implements the MIPS III instruction set. This is a 64-bit processor, but you already knew that, because Nintendo decided that using a 64-bit architecture was so important that they named the console after it (even though their next console, the Nintendo GameCube, had a 32-bit architecture).

However, since it is a 64-bit processor, the “o32” ABI does not make logical sense. What is “o32”? It is the ABI used for MIPS I and MIPS II processors, which are 32-bit processors. The “n32” ABI is for 64-bit processors only, and was created to provide an efficient ILP32 (32-bit int, long, and pointer) ABI for 64-bit MIPS processors. The n32 ABI makes logical sense, because the Nintendo 64 has a 64-bit processor and a 32-bit address space. From Whats Wrong With O32, N32, N64:

o32 has been an orphan for a long time. Somewhere in the mid-1990s SGI dropped it completely, because all their systems had been using real 64-bit CPUs for some time.
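To keep the candidates straight, here is a small summary of three of the ABIs as a C table. This is a sketch based on my understanding of the GCC and SGI documentation, covering only o32, n32, and n64; the struct and the `mips_abi_find` helper are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Properties of three of the MIPS ABIs discussed above. */
struct mips_abi {
    const char *name;
    int gpr_bits;      /* register width the ABI assumes */
    int long_bits;     /* width of long */
    int pointer_bits;  /* width of void * */
    int int_arg_regs;  /* number of integer argument registers */
};

static const struct mips_abi mips_abis[] = {
    { "o32", 32, 32, 32, 4 },
    { "n32", 64, 32, 32, 8 },
    { "n64", 64, 64, 64, 8 },
};

/* Hypothetical lookup helper; returns NULL for an unknown name. */
const struct mips_abi *mips_abi_find(const char *name) {
    for (size_t i = 0; i < sizeof(mips_abis) / sizeof(mips_abis[0]); i++) {
        if (strcmp(mips_abis[i].name, name) == 0)
            return &mips_abis[i];
    }
    return NULL;
}
```

The row that matters here is n32: it keeps 32-bit pointers like o32, but assumes 64-bit registers, which is why the two cannot be freely mixed.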

So, how can we find out what ABI the Nintendo 64 LibUltra toolchain uses? We could try looking at the object files in LibUltra:

$ mkdir objs
$ cd objs
$ ar x /path/to/libultra.a
$ mips64-readelf -h bcopy.o
ELF Header:
  Magic:   7f 45 4c 46 01 02 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           MIPS R3000
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          4736 (bytes into file)
  Flags:                             0x10000000, mips2
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         8
  Section header string table index: 2
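The Flags value in that header is a bit field, and it can be decoded by hand. The sketch below assumes the MIPS e_flags layout as I understand it from binutils’ include/elf/mips.h (the architecture level in the top four bits, an ABI field in bits 12 to 15, and a separate bit for n32). The function and macro names are my own, not part of any library.

```c
#include <stdint.h>

#define MIPS_ARCH_MASK 0xf0000000u  /* EF_MIPS_ARCH in binutils */
#define MIPS_ABI_MASK  0x0000f000u  /* EF_MIPS_ABI in binutils */
#define MIPS_ABI2_FLAG 0x00000020u  /* EF_MIPS_ABI2, set for n32 objects */

/* Name of the ISA level recorded in e_flags. */
const char *mips_arch_name(uint32_t e_flags) {
    switch (e_flags & MIPS_ARCH_MASK) {
    case 0x00000000u: return "mips1";
    case 0x10000000u: return "mips2";
    case 0x20000000u: return "mips3";
    case 0x30000000u: return "mips4";
    default:          return "other";
    }
}

/* Name of the ABI recorded in e_flags, or "unmarked" if none is set. */
const char *mips_abi_name(uint32_t e_flags) {
    if (e_flags & MIPS_ABI2_FLAG)
        return "n32";
    switch (e_flags & MIPS_ABI_MASK) {
    case 0x00001000u: return "o32";
    case 0x00002000u: return "o64";
    case 0x00003000u: return "eabi32";
    case 0x00004000u: return "eabi64";
    default:          return "unmarked";
    }
}
```

For the value 0x10000000 shown above, `mips_arch_name` gives "mips2", which matches what readelf prints, and `mips_abi_name` gives "unmarked" because the ABI field is zero.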

This header doesn’t give us the information we want. (Edit: Yes it does! The mips2 ISA is 32-bit, and it implies the o32 ABI. I didn’t realize this at the time I wrote this.) When we compile code with our mips64-gcc toolchain, the resulting object files have flags which specify which ABI we are using, but that information isn’t present here for whatever reason.

What other way can we be confident that we have the right ABI to link with LibUltra? Well, there are two main differences between the o32 and n32 ABIs. From N32 ABI Overview, we see that n32 has more argument registers, 8 compared to o32’s 4, but that would only make a difference for functions with more than four arguments. The other difference is how 64-bit values are passed, so we can look for a function with a 64-bit parameter and look at the disassembly. One such function is osSetTime(). Here is the declaration:

typedef u64 OSTime;
extern void osSetTime(OSTime);

And here is the disassembly:

$ mips64-objdump -d settime.o

settime.o:     file format elf32-bigmips


Disassembly of section .text:

00000000 <osSetTime>:
   0:   afa40000    sw      a0,0(sp)
   4:   8fae0000    lw      t6,0(sp)
   8:   afa50004    sw      a1,4(sp)
   c:   3c010000    lui     at,0x0
  10:   8faf0004    lw      t7,4(sp)
  14:   ac2e0000    sw      t6,0(at)
  18:   3c010000    lui     at,0x0
  1c:   03e00008    jr      ra
  20:   ac2f0004    sw      t7,4(at)
  ...

This is clearly the o32 ABI designed for MIPS II, because it splits a single 64-bit argument between the a0 and a1 registers. As a side note, I’m not used to reading MIPS and thought that objdump wasn’t showing me the entire function because the function didn’t end with a return. Of course the return is the second-to-last instruction (the jr), and the following instruction (the sw) is in the branch delay slot. MIPS, eh? I’m sure it made the silicon simpler.
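The register split in that disassembly can be modeled in plain C. This is only an illustration of the calling convention, assuming a big-endian o32 target where the high word travels in a0 and the low word in a1 (which is what the stores to 0(sp) and 4(sp) suggest). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* The two 32-bit halves of a 64-bit argument, as o32 passes them on a
 * big-endian target: high word in a0, low word in a1. */
struct arg_pair {
    uint32_t a0; /* high 32 bits */
    uint32_t a1; /* low 32 bits */
};

/* What the caller does: break the 64-bit value into a register pair. */
struct arg_pair split_u64(uint64_t value) {
    struct arg_pair p;
    p.a0 = (uint32_t)(value >> 32);
    p.a1 = (uint32_t)(value & 0xffffffffu);
    return p;
}

/* What the callee sees once both halves are stored back to back. */
uint64_t join_u64(struct arg_pair p) {
    return ((uint64_t)p.a0 << 32) | p.a1;
}
```

Under n32, the same OSTime argument would fit in a single 64-bit register, so a caller and callee that disagree about the ABI would disagree about where the value lives.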

Final Flags

The final set of compiler flags we use is:

CFLAGS = -mabi=32 -ffreestanding -mfix4300 -G 0

I added -ffreestanding because the Nintendo 64 certainly qualifies as a freestanding environment. See Language Standards Supported By GCC.
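In practice, freestanding means the compiler only promises a handful of headers such as <stddef.h>, <stdint.h>, <stdarg.h>, <limits.h>, and <float.h>, and anything else has to come from the SDK or from you. GCC also still expects memcpy, memmove, memset, and memcmp to exist somewhere. Here is a minimal sketch of code written to those rules; `is_hosted_build` and `fs_memset` are hypothetical names for illustration.

```c
#include <stddef.h>  /* size_t: one of the headers a freestanding compiler must ship */
#include <stdint.h>

/* 1 in a normal hosted build, 0 when compiled with -ffreestanding. */
int is_hosted_build(void) { return __STDC_HOSTED__; }

/* Hypothetical stand-in for memset that depends on no C library at all.
 * It fills count bytes at dst with value and returns dst. */
void *fs_memset(void *dst, int value, size_t count) {
    unsigned char *p = dst;
    while (count--)
        *p++ = (unsigned char)value;
    return dst;
}
```

On a desktop compiler without the flag, `is_hosted_build()` returns 1. Built with the Nintendo 64 flags above, it should return 0.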

Building a Sample Program

The SDK includes sample programs in /usr/src/PR/demos. I’d like to pick a small one and build it. Which one has the fewest lines of code?

$ cd usr/src/PR/demos
$ for dir in * ; do
    if test -d "$dir" ; then
      echo -n "$dir,"
      cloc "$dir" --csv | tail -n 1 | cut -d, -f5
    fi
  done | sort -n -t, -k2
Texture,49
greset,52
print,84
sramtest,134
ginv,188
gl,334
fault,359
onetri,412
onetri-fpal,416
topgun,507
[...]

The onetri demo looks the most promising. I’ll try to build it.

This is the makefile I made, based on the demo’s makefile and using the flags I figured out above. I just want to compile codesegment.o from the sample program.

CFLAGS := -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include

CC := mips64-gcc
LD := mips64-ld

CFLAGS += -DF3DEX_GBI_2
ifdef FINAL
CFLAGS += -O2 -DNDEBUG -D_FINALROM
N64LIB = ultra_rom
else
CFLAGS += -g -DDEBUG
N64LIB = ultra_d
endif

codefiles = onetri.c dram_stack.c rdp_output.c
codeobjects = $(codefiles:.c=.o)
datafiles = static.c cfb.c rsp_cfb.c
dataobjects = $(datafiles:.c=.o)

codesegment.o: $(codeobjects)
    $(LD) -nostdlib -r -o $@ $^ ../ultra/lib/lib$(N64LIB).a
clean:
    rm -f $(codeobjects) $(dataobjects) codesegment.o
.PHONY: clean

$ make
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o onetri.o onetri.c
onetri.c:31:10: fatal error: assert.h: No such file or directory
   31 | #include <assert.h>
      |          ^~~~~~~~~~
compilation terminated.
make: *** [<builtin>: onetri.o] Error 1
[Exit: 2]

I’ll create a really simple <assert.h> header to get this compiling, and put it in a folder named system.

/* Minimal stand-in for <assert.h>: every assertion compiles to nothing. */
#define assert(x) ((void)0)
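A stub like that is enough to get the demo compiling, but it throws away every check. If you want assertions that do something, a slightly larger version could look like the sketch below. Everything in it is hypothetical: the macro is named N64_ASSERT so it does not clash with a real <assert.h>, and the failure hook only records the failure, where a real game would probably print to a debug console or stop.

```c
/* Hypothetical failure hook. This version only records what failed. */
static int n64_assert_failures = 0;
static const char *n64_assert_last_expr = 0;

void n64_assert_fail(const char *expr, const char *file, int line) {
    (void)file;
    (void)line;
    n64_assert_last_expr = expr;
    n64_assert_failures++;
}

/* Like assert(): compiled out when NDEBUG is defined. */
#ifdef NDEBUG
#define N64_ASSERT(x) ((void)0)
#else
#define N64_ASSERT(x) \
    ((x) ? (void)0 : n64_assert_fail(#x, __FILE__, __LINE__))
#endif

/* Example use: halve a value that is expected to be even. */
int half_of_even(int n) {
    N64_ASSERT(n % 2 == 0);
    return n / 2;
}
```

With NDEBUG defined, as in the FINAL build of the makefile above, the macro expands to nothing and the check disappears, the same as the one-line stub.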

Update the Makefile:

CFLAGS := -mabi=32 -ffreestanding -mfix4300 -G 0 \
    -I../ultra/include -I../system

$ make
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -I../system -DF3DEX_GBI_2 -g -DDEBUG   -c -o onetri.o onetri.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -I../system -DF3DEX_GBI_2 -g -DDEBUG   -c -o dram_stack.o dram_stack.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -I../system -DF3DEX_GBI_2 -g -DDEBUG   -c -o rdp_output.o rdp_output.c
mips64-ld -nostdlib -r -o codesegment.o onetri.o dram_stack.o rdp_output.o ../ultra/lib/libultra_d.a
mips64-ld: ../ultra/lib/libultra_d.a(exceptasm.o): linking 32-bit code with 64-bit code
mips64-ld: failed to merge target specific data of file ../ultra/lib/libultra_d.a(exceptasm.o)
mips64-ld: ../ultra/lib/libultra_d.a(ll.o): linking 32-bit code with 64-bit code
mips64-ld: failed to merge target specific data of file ../ultra/lib/libultra_d.a(ll.o)
mips64-ld: codesegment.o: illegal section name `.gptab.data'
mips64-ld: final link failed: nonrepresentable section on output
make: *** [Makefile:21: codesegment.o] Error 1

The first message is “linking 32-bit code with 64-bit code”, so I’ll tackle that… by asking in chat. It turns out that the SGI version of the SDK and the Windows version of the SDK are different! The SGI version was compiled with the SGI compiler, and the Windows version was compiled with GCC. The toolchains are not compatible.

Getting the Windows Toolchain

We find an ISO image of the Windows OS/PC disc and mount it.

$ cabextract -d ~/os20l os20l_eng.exe
$ cd ~/os20l
$ unshield x data1.cab
$ find Ultra_Dev_*/usr/include -type f -exec dos2unix '{}' +

Of note, the library in this SDK is named libgultra, instead of libultra. The “g” stands for GNU or GCC.

Building A Sample Program, Mark Two

This toolchain has an <assert.h> header, so I can delete my version. Here are the changes to the makefile:

CFLAGS := -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include

codesegment.o: $(codeobjects)
    $(LD) -nostdlib -r -o $@ $^ ../ultra/lib/libg$(N64LIB).a

The code segment does build now.

$ make
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o onetri.o onetri.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o dram_stack.o dram_stack.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o rdp_output.o rdp_output.c
mips64-ld -nostdlib -r -o codesegment.o onetri.o dram_stack.o rdp_output.o ../ultra/lib/libgultra_d.a

We install Spicy and add rules to our makefile:

all: onetri.n64

onetri.n64: spec codesegment.o $(dataobjects)
    spicy -r $@ spec

$ make
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o onetri.o onetri.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o dram_stack.o dram_stack.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o rdp_output.o rdp_output.c
mips64-ld -nostdlib -r -o codesegment.o onetri.o dram_stack.o rdp_output.o ../ultra/lib/libgultra_d.a
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o static.o static.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o cfb.o cfb.c
mips64-gcc -mabi=32 -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o rsp_cfb.o rsp_cfb.c
spicy -r onetri.n64 spec
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x54ced9]

A panic usually means that there is something wrong with the program, so I made my own fork of Spicy, depp/spicy, with better error handling. With the forked version, I can make additional progress. Here is a change to the makefile:

onetri.n64: spec codesegment.o $(dataobjects)
    spicy -r $@ spec --toolchain-prefix=mips64-

Trying to compile it, I fail yet again.

$ make
spicy -r onetri.n64 spec --toolchain-prefix=mips64-
ERRO[0000] Error: spicy.LinkSpec: Error running 'mips64-ld': exit status 1: mips64-ld: codesegment.o: in function `__osInitialize_common':
(.text+0x4648): undefined reference to `__udivdi3'
mips64-ld: codesegment.o: in function `MonitorInitBreak':
../monutil.s:184: undefined reference to `__umoddi3'
mips64-ld: ../monutil.s:184: undefined reference to `__udivdi3'
mips64-ld: ../monutil.s:184: undefined reference to `__divdi3'
make: *** [Makefile:29: onetri.n64] Error 1

I know what this is. This means I haven’t linked in LibGCC! But there is only one copy of libgcc.a that I see, and on investigation, it uses the o64 ABI.
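For context, these symbols are the helper routines GCC calls when a target has no single instruction for an operation. On a 32-bit ABI such as o32, 64-bit division is the classic case. The functions below are a sketch of the kind of C that produces those references; the function names are made up, and the helper named in each comment is what I would expect GCC to emit for o32, not something I have confirmed for every compiler version.

```c
#include <stdint.h>

/* On a 32-bit ABI such as o32, GCC typically compiles these operators
 * into calls to libgcc helpers instead of inline instructions. */
uint64_t udiv64(uint64_t a, uint64_t b) { return a / b; }  /* __udivdi3 */
uint64_t umod64(uint64_t a, uint64_t b) { return a % b; }  /* __umoddi3 */
int64_t  sdiv64(int64_t a, int64_t b)   { return a / b; }  /* __divdi3 */
```

Compiling a file like this with -S and searching the output for `__udivdi3` is a quick way to check whether your own code will need libgcc at link time.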

Here I go again, rebuilding the toolchain.

$ mkdir build-binutils; cd build-binutils
$ ../binutils-2.35.1/configure \
  --target=mips32-elf --prefix=/opt/n64 \
  --program-prefix=mips32-elf- --with-cpu=vr4300 \
  --with-sysroot --disable-nls --disable-werror
$ make
$ sudo make install
$ cd ..
$ mkdir build-gcc; cd build-gcc
$ ../gcc-10.2.0/configure \
  --target=mips32-elf --prefix=/opt/n64 \
  --program-prefix=mips32-elf- --with-arch=vr4300 \
  --enable-languages=c,c++ --disable-threads \
  --disable-nls --without-headers --with-newlib
$ make all-gcc
$ make all-target-libgcc
$ sudo make install-gcc
$ sudo make install-target-libgcc

I’ve renamed things, so I have to change the Makefile. Note that GCC is now compiled to use the o32 ABI by default.

CFLAGS := -ffreestanding -mfix4300 -G 0 -I../ultra/include

CC := mips32-elf-gcc
LD := mips32-elf-ld

codesegment.o: $(codeobjects)
    $(LD) -nostdlib -r -o $@ $^ ../ultra/lib/libg$(N64LIB).a \
        /opt/n64/lib/gcc/mips32-elf/10.2.0/libgcc.a

onetri.n64: spec codesegment.o $(dataobjects)
    spicy -r $@ spec --toolchain-prefix=mips32-elf-

Compiling it this time works:

$ make
mips32-elf-gcc -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o onetri.o onetri.c
mips32-elf-gcc -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o dram_stack.o dram_stack.c
mips32-elf-gcc -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o rdp_output.o rdp_output.c
mips32-elf-ld -nostdlib -r -o codesegment.o onetri.o dram_stack.o rdp_output.o ../ultra/lib/libgultra_d.a /opt/n64/lib/gcc/mips32-elf/10.2.0/libgcc.a
mips32-elf-gcc -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o static.o static.c
mips32-elf-gcc -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o cfb.o cfb.c
mips32-elf-gcc -ffreestanding -mfix4300 -G 0 -I../ultra/include -DF3DEX_GBI_2 -g -DDEBUG   -c -o rsp_cfb.o rsp_cfb.c
spicy -r onetri.n64 spec --toolchain-prefix=mips32-elf-

The emulator crashes, and it’s because we’re missing the boot code. This is created by MakeMask, a program which has another modern replica: MakeMask. This computes a checksum on 1 MiB of data starting at offset 0x1000, but the program just panics because my ROM is not padded out to a large enough size. I made my own fork with some better error handling and made it pad out the ROM: depp/makemask. Additionally, the debug build seems not to work, so I need FINAL=1.
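The padding requirement follows directly from those two numbers: the file has to be at least 0x1000 + 1 MiB long for the checksum to have something to read. Here is a minimal sketch of that arithmetic. The names are hypothetical and this is not the code from makemask, and it deliberately leaves out the checksum itself.

```c
#include <stddef.h>

#define CHECKSUM_START  0x1000u    /* offset where the checksummed data begins */
#define CHECKSUM_LENGTH 0x100000u  /* 1 MiB of data is checksummed */

/* Smallest file size that still covers the whole checksummed region. */
size_t min_rom_size(void) { return CHECKSUM_START + CHECKSUM_LENGTH; }

/* Number of padding bytes to append to a ROM image of the given size. */
size_t rom_padding_needed(size_t rom_size) {
    size_t need = min_rom_size();
    return rom_size < need ? need - rom_size : 0;
}
```

So the smallest usable ROM is 0x101000 bytes, and a tiny demo like onetri needs most of that to be padding.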

Update: A previous version of this post wrote makerom instead of makemask below.

$ make clean
$ make FINAL=1
$ makemask onetri.n64
$ /usr/games/mupen64plus onetri.n64

[Image: Colored Square in Nintendo 64 Emulator]

I’m going to bed. At some point in the future I’ll figure out how to make a game in this Byzantine development environment.