Nintendo 64 console with EverDrive cartridge

Nintendo 64 Part 8: Files and Filesystems

Nintendo 64, Programming

We can’t embed tons of data into our main program. The Nintendo 64 only has 4 MiB of RAM (or 8 MiB with the Expansion Pak), so we’ll run out of memory pretty quickly. We need a way to organize data on the cartridge so we can load it into memory as needed.

This means creating something like a filesystem, but we don’t need much of a filesystem. Instead of referring to files by name, we’re going to refer to them by number.

Background: Linkers, Symbols, and Relocations

(Note: Skip to the next section if you already understand linkers, symbols, and relocations.)

Have you ever wondered what happens when you write code like this and compile it?

extern int my_global;
int get_my_global(void) {
  return my_global;
}

Think about this:

In order to return my_global, the machine code needs to have the address of the my_global variable. But the compiler doesn’t know what the address is, so what does it put there?

Let’s compile and look at the disassembly. We’re going to do this on x86 and then on MIPS for comparison. We use -O2 to make the code shorter and easier to read.

x86 Code

$ cc -O2 -c example.c
$ objdump -d example.o

example.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <get_my_global>:
   0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <get_my_global+0x6>
   6:   c3                      retq

At offset 0 you see the instruction which loads my_global. The disassembly shows what address is encoded in it: 0 (relative to %rip). The variable obviously isn’t at %rip. What’s happening here is that the compiler has emitted a relocation, which instructs the linker to fill in the correct address of my_global at link time. Relocations are also called “fixups” because the linker “fixes up” the code with the correct addresses.

You can think of the relocation as a hole in the program, where the linker has to put data in the hole.

We can look at these relocation entries with objdump:

$ objdump -r example.o

example.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000002 R_X86_64_PC32     my_global-0x0000000000000004

There is one relocation in the .text section. It takes the 32-bit PC-relative address of my_global, subtracts 4, and stores the result at offset 2 in our code (which is where the zeroes are). This relocation is applied at link time, and since it uses a relative address, the code can then be loaded anywhere in memory without modifying it. In other words, it’s position-independent code.
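To make the arithmetic concrete, here is a minimal sketch of what a linker does when it applies this kind of relocation. The function name apply_pc32 and the addresses used with it are made up for illustration; the formula is the usual “symbol + addend - place” calculation for PC-relative relocations. The addend is -4 because %rip points at the end of the instruction, which is 4 bytes past the start of the hole.

```c
#include <stdint.h>

// Apply an R_X86_64_PC32-style relocation to a code buffer.
// (Hypothetical helper, not part of any real linker.)
//   code:      bytes of the section being patched
//   offset:    position of the 32-bit hole within the section
//   load_addr: address where the section ends up (assumed)
//   sym_addr:  final address of the symbol (assumed)
//   addend:    constant from the relocation record (-4 above)
static void apply_pc32(uint8_t *code, uint64_t offset, uint64_t load_addr,
                       uint64_t sym_addr, int64_t addend) {
  // P: the address of the hole itself.
  uint64_t place = load_addr + offset;
  // S + A - P, truncated to the 32 bits that fit in the hole.
  uint32_t value = (uint32_t)(sym_addr + (uint64_t)addend - place);
  // x86 is little-endian, so the least significant byte goes first.
  for (int i = 0; i < 4; i++) {
    code[offset + i] = (uint8_t)(value >> (8 * i));
  }
}
```

If the code were placed at 0x1000 and my_global at 0x2000, the hole would receive 0xffa, and %rip (0x1006 after the instruction) plus 0xffa gives 0x2000.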

MIPS Code

$ mips32-elf-gcc -O2 -c example.c
$ mips32-elf-objdump -d example.o


example.o:     file format elf32-bigmips


Disassembly of section .text:

00000000 <get_my_global>:
   0:   03e00008        jr      ra
   4:   8f820000        lw      v0,0(gp)

$ mips32-elf-objdump -r example.o

example.o:     file format elf32-bigmips

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000004 R_MIPS_GPREL16    my_global

MIPS exercises your brain a bit when you read it. The function appears to return first and load the value of the global afterwards. That works because MIPS executes the instruction immediately after a jump (the branch delay slot) even when the jump is taken, so the lw still runs as part of the function.

However, this is showing $gp-relative addressing, which is not what we are using for our Nintendo 64 program. (Maybe we should?)

$ mips32-elf-gcc -O2 -c example.c -G 0
$ mips32-elf-objdump -d example.o

example.o:     file format elf32-bigmips


Disassembly of section .text:

00000000 <get_my_global>:
   0:   3c020000        lui     v0,0x0
   4:   03e00008        jr      ra
   8:   8c420000        lw      v0,0(v0)

$ mips32-elf-objdump -r example.o

example.o:     file format elf32-bigmips

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000000 R_MIPS_HI16       my_global
00000008 R_MIPS_LO16       my_global

This is the full 32-bit relocation, which in MIPS must be split into the high 16 bits and the low 16 bits, because MIPS needs two instructions to load a 32-bit constant. There’s even another instruction (the jr) in the middle of the load.
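One detail of the split is easy to miss: lw sign-extends its 16-bit offset, so the high half is not always just the top 16 bits of the address. The sketch below shows the calculation; the function names are made up for illustration.

```c
#include <stdint.h>

// Value placed in the lui instruction. Adding 0x8000 compensates for
// lw sign-extending its offset: when bit 15 of the address is set, the
// low half counts as negative, so the high half must be one larger.
static uint16_t reloc_hi16(uint32_t addr) {
  return (uint16_t)((addr + 0x8000u) >> 16);
}

// Value placed in the offset field of the lw instruction.
static uint16_t reloc_lo16(uint32_t addr) {
  return (uint16_t)(addr & 0xffffu);
}

// The address that the lui/lw pair ends up using.
static uint32_t effective_address(uint16_t hi, uint16_t lo) {
  uint32_t low = lo;
  if (low & 0x8000u) {
    low |= 0xffff0000u;  // sign-extend, as the CPU does
  }
  return ((uint32_t)hi << 16) + low;
}
```

For an address like 0x80018000, the halves are 0x8002 and 0x8000 rather than 0x8001 and 0x8000, and the pair still adds up to the original address.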

Tricky Linkers

So why do we care? Because we can use the linker to stick any value into a relocation, not just the address of a variable in memory. We are bamboozling the C compiler:

extern uint8_t my_symbol;

uintptr_t get_value(void) {
  return (uintptr_t)&my_symbol;
}

This example works the same way as the examples above. The C compiler cannot calculate the address of my_symbol because it’s defined in some other file, so the C compiler puts some zeros in the object file and adds a relocation record to tell the linker how to fix it up.

But… we’re not actually reading my_symbol. It doesn’t need to exist, because all we’re doing is converting the address to an integer and returning that integer.

We are going to use the linker to make it so that the symbol my_symbol is not the address of a variable, but just some number that we want to insert into our C program. We can do that in two ways. We can do it in the linker script:

my_symbol = 12345;

Or we can do it in an assembly language file:

.global my_symbol
my_symbol = 12345

Both of these techniques make it so the C code has the same effect as this:

uintptr_t get_value(void) {
  return 12345;
}

Note that my_symbol is gone. It never really existed—it was just a fiction that we created in order to let us use the linker to stick numbers into our program.

Packing Up Our Data

Now we are going to pack up our data files and add them to our ROM image. We are going to use the minimum number of files to prove that our code works: two. Here’s the basic structure which we will create:

Diagram showing a header and two files

The header will just be an array of these structures:

// Descriptor for an object in the pak.
struct pak_object {
  uint32_t offset;
  uint32_t size;
};
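As a rough sketch of how a packaging tool might fill in this header: the code below writes one offset/size pair per file. It stores the fields in big-endian byte order, since the N64 is big-endian and the loader later in this post reads the header straight into an array of these structures. Offsets count from the end of the header, matching how that loader adds the header size to each offset. The function names and the alignment parameter are assumptions, not the actual tool.

```c
#include <stddef.h>
#include <stdint.h>

// Store a 32-bit value in big-endian byte order.
static void put_be32(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)(v >> 24);
  p[1] = (uint8_t)(v >> 16);
  p[2] = (uint8_t)(v >> 8);
  p[3] = (uint8_t)v;
}

// Round n up to a multiple of align (align must be a power of two).
static uint32_t align_up(uint32_t n, uint32_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Fill in the header for count files with the given sizes.
// header must have room for 8 * count bytes. Each file starts at a
// multiple of align, counted from the end of the header.
// Returns the total size of the padded file data.
static uint32_t pak_write_header(uint8_t *header, const uint32_t *sizes,
                                 size_t count, uint32_t align) {
  uint32_t offset = 0;
  for (size_t i = 0; i < count; i++) {
    put_be32(header + 8 * i, offset);        // pak_object.offset
    put_be32(header + 8 * i + 4, sizes[i]);  // pak_object.size
    offset = align_up(offset + sizes[i], align);
  }
  return offset;
}
```

The tool would then write the header followed by each file, padding each one to the same alignment.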

This is pretty easy for a custom tool to create. Our asset packaging tool will consume a manifest, read the listed files, and produce the entire block of data that we will embed in our ROM. The manifest looks like this:

IMG_CAT test/Ariella_32x32.dat
IMG_BALL test/Ball.dat

The first column is the identifier for the file and the second column is the file path. Since we access our files by index, we use the manifest to generate a header file with all the indexes in it. This is the header file we end up with:

/* This file is automatically generated. */
#pragma once
#define PAK_SIZE 2
#define IMG_CAT 1
#define IMG_BALL 2
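Generating that header from the manifest takes very little code. Here is a minimal sketch, assuming the manifest is already in memory as text with one “identifier path” pair per line. The function name manifest_to_header and the buffer sizes are arbitrary choices for illustration, not the real tool.

```c
#include <stdio.h>
#include <string.h>

// Turn manifest text into the contents of the generated header.
// Returns the number of entries, or -1 if a line is malformed or a
// buffer is too small. Indexes are one-based.
static int manifest_to_header(const char *manifest, char *out,
                              size_t out_size) {
  char body[1024];  // the "#define NAME index" lines
  size_t body_len = 0;
  int count = 0;
  body[0] = '\0';
  const char *line = manifest;
  while (*line != '\0') {
    // Copy one line so sscanf cannot read past the newline.
    char buf[512], name[64], path[256];
    size_t line_len = strcspn(line, "\n");
    if (line_len >= sizeof(buf)) return -1;
    memcpy(buf, line, line_len);
    buf[line_len] = '\0';
    int fields = sscanf(buf, "%63s %255s", name, path);
    if (fields == 2) {
      count++;
      int w = snprintf(body + body_len, sizeof(body) - body_len,
                       "#define %s %d\n", name, count);
      if (w < 0 || (size_t)w >= sizeof(body) - body_len) return -1;
      body_len += (size_t)w;
    } else if (fields != EOF) {
      return -1;  // identifier without a path
    }
    // fields == EOF means a blank line, which is skipped.
    line += line_len;
    if (*line == '\n') line++;
  }
  int w = snprintf(out, out_size,
                   "/* This file is automatically generated. */\n"
                   "#pragma once\n"
                   "#define PAK_SIZE %d\n%s",
                   count, body);
  if (w < 0 || (size_t)w >= out_size) return -1;
  return count;
}
```

Because the indexes come from line order, adding or reordering manifest entries changes the numbers but not the C code that uses the names.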

We can embed our data file by appending it to our ROM before running makemask (this is inside a Bazel genrule, which is very similar to a Make recipe):

mips32-elf-objcopy -O binary $(location :Thornmarked.elf) $@
cat $(location //assets:assets.dat) >>$@
makemask $@

We need to know the location of our data in ROM, so we create an extra empty linker section for the end of the ROM file and export its load address (location in ROM) as a symbol:

pakdata : ALIGN(16) {
    . = .;
} >rom
_pakdata_offset = LOADADDR(pakdata);

We set up some structures for DMA:

static OSMesgQueue dma_message_queue;
static OSMesg dma_message_buffer;
static OSIoMesg dma_io_message_buffer;

osCreateMesgQueue(&dma_message_queue, &dma_message_buffer, 1);

This brings us back to the linker trick from earlier. The symbol _pakdata_offset is not the address of any variable; its “address” is the location of the Pak data in ROM, and we use that number to load data using DMA.

// Offset in cartridge where data is stored.
extern u8 _pakdata_offset[];

// Load data relative to the start of the Pak data in ROM.
static void load_pak_data(void *dest, uint32_t offset,
                          uint32_t size) {
  osWritebackDCache(dest, size);
  osInvalDCache(dest, size);
  dma_io_message_buffer = (OSIoMesg){
      .hdr =
          {
              .pri = OS_MESG_PRI_NORMAL,
              .retQueue = &dma_message_queue,
          },
      .dramAddr = dest,
      .devAddr = (uint32_t)_pakdata_offset + offset,
      .size = size,
  };
  osEPiStartDma(rom_handle, &dma_io_message_buffer, OS_READ);
  osRecvMesg(&dma_message_queue, NULL, OS_MESG_BLOCK);
  osInvalDCache(dest, size);
}

Once we can load data using these offsets, we can load the Pak header and write a function to load objects. Note that I am using one-based indexes for objects, because I want zero to be an invalid value. We are again aligning to 16 bytes so we don’t get cache tearing.

// Info for the pak objects, to be loaded from cartridge.
static struct pak_object pak_objects[PAK_SIZE]
    __attribute__((aligned(16)));

// Load the pak object with the given one-based index into dest.
static void load_pak_object(void *dest, int index) {
  struct pak_object obj = pak_objects[index - 1];
  load_pak_data(dest, sizeof(pak_objects) + obj.offset, obj.size);
}
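The function above trusts its index and assumes the destination is big enough. A more defensive lookup could be sketched like this. The name pak_lookup and the dest_size parameter are additions for illustration, and the struct is repeated only so the sketch compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Same layout as the descriptor defined earlier.
struct pak_object {
  uint32_t offset;
  uint32_t size;
};

// Look up a one-based index in the header. On success, stores the
// offset from the start of the pak data (header included) and the
// object size, and returns true. Returns false for index zero, for an
// index past the end, or for an object too large for the destination.
static bool pak_lookup(const struct pak_object *objects, int count,
                       int index, size_t dest_size,
                       uint32_t *pak_offset, uint32_t *size) {
  if (index < 1 || index > count) {
    return false;  // zero is the invalid value
  }
  const struct pak_object *obj = &objects[index - 1];
  if (obj->size > dest_size) {
    return false;  // would overflow the destination buffer
  }
  *pak_offset =
      (uint32_t)((size_t)count * sizeof(struct pak_object)) + obj->offset;
  *size = obj->size;
  return true;
}
```

With the two-entry header used here, index 1 maps to offset 16 (just past the header) and index 0 is rejected.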

Finally, this is all it takes to read data from cartridge memory. So easy!

static uint16_t img_cat[32 * 32] __attribute__((aligned(16)));
static uint16_t img_ball[32 * 32] __attribute__((aligned(16)));

// Load the pak header.
load_pak_data(pak_objects, 0, sizeof(pak_objects));

// Load objects.
load_pak_object(img_cat, IMG_CAT);
load_pak_object(img_ball, IMG_BALL);

We swap out one of the images in our drawing code for the new image and it works!

Picture of cat and picture of ball on pink background
She has a toy now.

Just to check, it does work on hardware. Do remember that any data you want to DMA from cartridge memory must be properly aligned. I believe 2-byte alignment is good enough.
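Since a misaligned transfer is easy to introduce by accident, a debug-build check before starting the DMA may be worth having. This is only a sketch: the helper names are made up, and the required alignments are passed in as parameters because the exact values are an assumption here rather than something verified.

```c
#include <stdbool.h>
#include <stdint.h>

// True if value is a multiple of alignment (a power of two).
static bool is_aligned(uintptr_t value, uintptr_t alignment) {
  return (value & (alignment - 1)) == 0;
}

// Check both ends of a transfer. ram_align and rom_align are whatever
// alignments the hardware turns out to need (assumed, not verified).
static bool dma_args_aligned(const void *dest, uint32_t rom_addr,
                             uintptr_t ram_align, uintptr_t rom_align) {
  return is_aligned((uintptr_t)dest, ram_align) &&
         is_aligned(rom_addr, rom_align);
}
```

Calling this at the top of load_pak_data in debug builds would catch a bad offset from the packaging tool before it turns into corrupted data.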