
Nintendo 64 Part 21: GP-Relative Addressing

Nintendo 64, Programming

In the MIPS ABI, register $gp (GPR 28) is called the “global area pointer”. What does that mean, and how is it used?

How GP-Relative Addressing Works

You can see how it works by looking at the assembly code generated by the compiler. Here is a simple function that loads a global variable and returns it:

int variable;

int get_variable(void) {
  return variable;
}

Compiled with the -G 0 flag that I’ve been using, the assembly output is:

get_variable:
        lui     $2,%hi(variable)
        jr      $31
        lw      $2,%lo(variable)($2)

It takes two instructions to load a global variable because the variable’s address is 32 bits and a MIPS instruction is also only 32 bits, with room for a 16-bit immediate at most, so the address has to be split across two instructions. The first instruction, lui, loads the top 16 bits of the address into the upper half of a register, and the lw instruction loads the variable using the low 16 bits of the address as an offset.
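As a rough illustration of that split, here is a small host-side sketch of what %hi and %lo work out to. The function names are hypothetical, not part of any toolchain. One detail worth noting: lw sign-extends its 16-bit offset, so the upper half is rounded up whenever bit 15 of the address is set.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half of the address, as used by lui. Adding 0x8000 first rounds
   the upper half up when the lower half will be treated as negative. */
uint16_t addr_hi(uint32_t addr) {
  return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Lower half of the address, as the signed 16-bit offset used by lw. */
int16_t addr_lo(uint32_t addr) {
  int32_t low = (int32_t)(addr & 0xffffu);
  if (low >= 0x8000) {
    low -= 0x10000;
  }
  assert(low >= -0x8000 && low <= 0x7fff);
  return (int16_t)low;
}

/* Recombine the halves the way the CPU does: lui, then base + offset. */
uint32_t addr_join(uint16_t hi, int16_t lo) {
  return ((uint32_t)hi << 16) + (uint32_t)(int32_t)lo;
}
```

For an address such as 0x8001a340, the upper half comes out as 0x8002 and the offset as -0x5cc0, which add back up to the original address.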

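The way $gp works is simple:

1. Place the small global variables together in one region of memory, at most 64 KiB in size.
2. Set $gp to point into that region.
3. Access those variables with a single instruction, using a 16-bit offset from $gp.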

GCC will handle step #3, generating the $gp-relative accesses, for us if we pass a different value for the -G flag. With -G 4, variables up to four bytes in size will be accessed relative to $gp. For example, the above code now compiles to this:

get_variable:
        jr      $31
        lw      $2,%gp_rel(variable)($28)

You can see that it only takes one instruction to load the variable now. The limitation is that the offset in lw and other load/store instructions is limited to 16 bits, so we can only access 64 KiB of data this way.

Variables accessed with GP-relative addressing are also called “small” objects and the compiler will place them in the .sdata and .sbss sections—the “small data” and “small BSS” sections.
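The size rule can be sketched as a one-line predicate. This is only a model of the rule as described here, with a hypothetical function name; the compiler's actual decision also depends on other factors, such as explicit section attributes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the -G rule: an object counts as "small" when its size is
   nonzero and no larger than the -G threshold. A threshold of 0 (-G 0)
   therefore means that no object is small. */
bool is_small_object(size_t size, size_t g_threshold) {
  return size > 0 && size <= g_threshold;
}
```

Under this model, with -G 4 a four-byte int is small, while an eight-byte double or a larger array is not.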

Using GP-Relative Addresses

First, I’ll change the value of the -G flag to -G 4, just as a starting point. This means that global variables that are 4 bytes or smaller will be accessed relative to $gp.

Next, I update the linker script to place the small data and small BSS sections next to each other, and create a _gp symbol for the value of the $gp register. The _gp symbol name is special, and recognized by the MIPS linker as the value for $gp. Since the BSS section comes right after the data section, I can place the small data at the end of the data section and the small BSS at the beginning of the BSS section. Here are the changes to the linker script:

.text : {
    ...
    *(.data .data.*)
    _gp = ALIGN(16) + 0x8000;
    *(.sdata .sdata.*)
    _text_end = .;
} >ram AT>rom

...

.bss (NOLOAD) : ALIGN(16) {
    _bss_start = .;
    *(.sbss .sbss.*)
    *(.bss .bss.*)
    ...
} >ram

Since -0x8000 is the lowest offset that you can put in a load or store instruction, I put $gp exactly 0x8000 after the start of the small data section.
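To make the arithmetic concrete, here is a small host-side sketch of the offset calculation. The function names and the example address are made up for illustration. With $gp placed 0x8000 past the start of the region, the signed 16-bit offsets cover exactly the 64 KiB beginning at the start of the small data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of $gp for a small data region that begins at `start`. */
uint32_t gp_for_region(uint32_t start) {
  return start + 0x8000u;
}

/* True if `addr` is reachable with a signed 16-bit offset from `gp`. */
bool gp_reachable(uint32_t gp, uint32_t addr) {
  int64_t offset = (int64_t)addr - (int64_t)gp;
  return offset >= -0x8000 && offset <= 0x7fff;
}

/* The offset that would be encoded in the load or store instruction. */
int16_t gp_offset(uint32_t gp, uint32_t addr) {
  assert(gp_reachable(gp, addr));
  return (int16_t)((int64_t)addr - (int64_t)gp);
}
```

For a region starting at 0x80100000, $gp is 0x80108000. The first byte of the region sits at offset -0x8000, the last reachable byte at offset 0x7fff, and anything before or after that window is out of range.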

I add code to the entry point to set the $gp register. On a real operating system on MIPS, the toolchain will do this for you.

_start:
        ...
        # Set up global pointer
        la      $gp, _gp
        ...

This is not enough! The Nintendo 64 OS, LibUltra, is written with the assumption that you are not using GP-relative addressing. When you create a new thread, the new thread will have $gp set to 0, instead of inheriting the value of $gp from the parent thread.

There are two ways I can solve this. One is to just fix the value of $gp at the beginning of every thread:

// This function must be called at the top of each thread, except
// boot.
static inline void thread_init(void) {
  __asm__("la $gp, _gp");
}

void idle(void *arg) {
  thread_init();
  ...
}

void main(void *arg) {
  thread_init();
  ...
}

Note: Inline assembly should normally have its inputs, outputs, and side effects specified. However, since this asm statement has no output operands, GCC treats it as implicitly volatile.

In general, an __asm__ block is not as simple as just “put assembly here”, so don’t think of it that way.

An alternative technique is to create a wrapper for osCreateThread which stores the current value of $gp in the new thread’s saved context. This means I don’t have to remember to call thread_init() at the beginning of every thread:

// Call this function instead of osCreateThread.
void thread_create(OSThread *thread, int thread_id,
                   void (*func)(void *arg), void *arg, void *stack,
                   int priority) {
  osCreateThread(thread, thread_id, func, arg, stack, priority);
  __asm__(
      ".set gp=64\n\t"
      "sd $gp, %0\n\t"
      ".set gp=default"
      : "=m"(thread->context.gp));
}
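The idea behind the wrapper can be modeled on a host machine with mock types. Everything in this sketch is a stand-in invented for illustration: the real OSThread and osCreateThread come from LibUltra, and the real $gp is a CPU register rather than a C variable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mock saved-register context; the real one holds many more registers. */
typedef struct {
  uint64_t gp;
  uint64_t sp;
} MockContext;

typedef struct {
  MockContext context;
} MockThread;

/* Stand-in for the CPU's $gp register. */
uint64_t mock_gp_register;

/* Mimics the behavior described above: the new thread's saved context
   starts out with gp set to 0. */
void mock_create_thread(MockThread *thread, void *stack) {
  assert(thread != NULL);
  memset(&thread->context, 0, sizeof thread->context);
  thread->context.sp = (uint64_t)(uintptr_t)stack;
}

/* Wrapper: create the thread, then copy the caller's gp into the saved
   context so the thread starts running with the right value. */
void mock_thread_create(MockThread *thread, void *stack) {
  mock_create_thread(thread, stack);
  thread->context.gp = mock_gp_register;
}
```

The point of the model is the ordering: the saved context is patched after creation but before the thread first runs, so the thread function itself never has to touch $gp.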

As one final gotcha, I need to ensure that my code doesn’t access any global variables in LibUltra using GP-relative addressing, because LibUltra was compiled with -G 0 and its variables won’t be placed in the small data sections. Fortunately, there is only one LibUltra variable that my code accesses directly, osTvType. I can tell GCC that the variable is in a specific section, and this will prevent GCC from using GP-relative addressing for that variable. The exact section is not important; what matters is that GCC knows the variable is not in .sdata or .sbss.

I add the following declaration to my code for osTvType:

// Avoid gprel access by declaring a different section.
extern s32 osTvType __attribute__((section(".data")));

It works! GP-relative addressing makes my code slightly smaller—the ROM shrinks by 64 bytes. Not worth the time I spent, but sometimes it’s just a learning experience.