NES Development Day 1: Creating a ROM

Recently I’ve been dabbling with some NES (Nintendo Famicom) development. NES development is a bit fun and a bit frustrating… but it’s the kind of frustration that makes you feel all warm and fuzzy inside when you finally get something to work.

Retro Homebrew: Why NES?

I knew I wanted to make a 2D retro game, but there are just so many systems to choose from, and many of them have good homebrew communities you can learn from. It was time to narrow the list down. My final list had four systems on it. I wanted a system small enough that it could barely do what I wanted it to do, with a good community and good documentation.

Of these systems, the NES is the most limited overall. The background tile layer on an NES is two regions of 32x30 tiles, each about the size of one screen of video. This makes it difficult to scroll both horizontally and vertically at the same time, which is why so many games didn’t have free scrolling, and the ones that did often had visual glitches when scrolling in one of the directions. The NES gives you a small palette to choose from, with a maximum of 25 different colors on screen at any given moment, and only three different colors in any sprite or background tile. This makes any kind of art direction a real challenge. The NES also only has 2 KiB of RAM available to the CPU, which is more or less a vanilla 6502 with binary-coded decimal mode disabled. Sound is limited to a single triangle wave, two pulse (PWM) channels, a noise generator, and a delta-modulation sample channel with a 7-bit output level.

By comparison, its competitor the Sega Master System had fewer limitations in general. The Master System’s background layer was larger than one full screen of video, which made it trivial to scroll freely in horizontal, vertical, or diagonal directions. It had 8 KiB of RAM, and some models had a sound chip which could do basic FM synthesis.

So why choose the NES? The technical challenges looked fun but not impossible, and the NES homebrew community was active and had good documentation. So off I went.

Getting Started

I use Linux for development these days, and my home workstation runs Fedora. The first step was to just get an assembler working, and produce a runnable NES ROM image.

It turns out that there are a ton of 6502 assemblers. This makes sense: the 6502 was an immensely popular CPU back in the day, popular enough that it’s still manufactured in some form, and countless programmers cut their teeth writing programs in assembly for the Apple II, Commodore 64, BBC Micro, VIC-20, etc. A Tour of 6502 Cross-Assemblers gives a pretty good overview of the options, noting that ca65 is “now the assembler recommended for use by the NES development community.” A NES game on GitHub, Nova the Squirrel, uses ca65 in its mk.bat build script, giving me a nice example to fall back on in case things mysteriously don’t work. Finally, the NesDev Wiki Programming Guide recommends installing ca65.

Fedora had no ca65 or cc65 packages in the main repositories or in RPM Fusion, so I built and installed the cc65 suite from source. This was straightforward and easy. The trick is getting ca65 to produce files in iNES format (see INES on Nes Dev Wiki), which is the format for ROM images used by NES emulators. Unlike some other 6502 assemblers, cc65 does not have built-in support for generating NES ROM images. But first, we have to understand a little bit of how the build process works.

The cc65 tools include both an assembler, ca65, and a linker, ld65. If you’ve compiled C or C++ code on the command line, this should be very familiar to you. The output of ca65 is an object file, containing data (including code) organized in sections (ca65 calls them segments), which can contain symbol definitions and have symbol references to be resolved later. The linker, ld65, resolves all the undefined symbols and organizes the data into an image using a linker script which you write yourself.
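
The two steps are easy to script. Here is a minimal sketch in Python, assuming a one-file project; the file names main.s, nes.cfg, and game.nes are placeholders I made up, not names the tools require. It only uses the -o flag of both tools and the -C flag of ld65, which selects the linker script.

```python
import subprocess
from pathlib import Path


def build_commands(source: str, config: str, output: str) -> list[list[str]]:
    """Return the assembler and linker command lines for a one-file project."""
    obj = str(Path(source).with_suffix(".o"))
    return [
        # Assemble: ca65 reads the source and writes an object file.
        ["ca65", "-o", obj, source],
        # Link: ld65 lays out the segments according to the linker script (-C).
        ["ld65", "-C", config, "-o", output, obj],
    ]


def build(source: str, config: str, output: str) -> None:
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(source, config, output):
        subprocess.run(cmd, check=True)
```

Calling build("main.s", "nes.cfg", "game.nes") would assemble and then link, provided ca65 and ld65 are on the PATH.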

Since the iNES format has a simple 16-byte header, all we need to do is put the header data into its own section, and then use the linker script to put that section at the start of the output. Here’s how we can write the iNES header to its own section. The section name “INES” is something we chose; it’s not a standard name that our tools understand.

;;; Size of PRG in units of 16 KiB.
prg_npage = 1
;;; Size of CHR in units of 8 KiB.
chr_npage = 1
;;; INES mapper number.
mapper = 0
;;; Mirroring (0 = horizontal, 1 = vertical)
mirroring = 1

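.segment "INES"
        .byte $4e, $45, $53, $1a	; "NES", then an MS-DOS EOF byte
        .byte prg_npage
        .byte chr_npage
        .byte ((mapper & $0f) << 4) | (mirroring & 1)	; low mapper nibble, mirroring
        .byte mapper & $f0		; high mapper nibble
        ;; The remaining 8 header bytes are left for the linker to zero-fill.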
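
To double-check the bit twiddling, here is the same header built in Python. This is only a sketch for sanity-checking the values; the function name ines_header is my own, and it assumes the last 8 bytes of the header are zero padding.

```python
def ines_header(prg_npage: int, chr_npage: int, mapper: int, mirroring: int) -> bytes:
    """Build a 16-byte iNES header from the same four settings."""
    flags6 = ((mapper & 0x0F) << 4) | (mirroring & 1)  # low mapper nibble, mirroring
    flags7 = mapper & 0xF0                             # high mapper nibble
    header = b"NES\x1a" + bytes([prg_npage, chr_npage, flags6, flags7])
    # The assembly emits only these 8 bytes; the other 8 are zero padding.
    return header.ljust(16, b"\x00")


header = ines_header(prg_npage=1, chr_npage=1, mapper=0, mirroring=1)
print(header.hex(" "))
# prints: 4e 45 53 1a 01 01 01 00 00 00 00 00 00 00 00 00
```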

Now that the data exists in the assembler output, we can describe the different sections to the linker in our linker script. Each section will map to a memory region of our own choosing, so the INES segment is mapped to a region which we name HEADER. Since the INES segment just contains ordinary data, we can use type “ro” (read-only).

This part of the script also describes a few other segments, like the zero page, BSS, the interrupt vector, and the CHR data (graphics).

SEGMENTS {
    ZEROPAGE: load = ZP, type = zp;
    BSS:    load = RAM, type = bss;
    INES:   load = HEADER, type = ro, align = $10;
    CODE:   load = PRG0, type = ro;
    VECTOR: load = PRG0, type = ro, start = $BFFA;
    CHR0a:  load = CHR0a, type = ro;
    CHR0b:  load = CHR0b, type = ro;
}

Once the segments are mapped to memory regions, we can emit an output image file with the memory regions at offsets of our choosing.

MEMORY {
    ZP:     start = $0000, size = $0100, type = rw;
    RAM:    start = $0300, size = $0400, type = rw;
    HEADER: start = $0000, size = $0010, type = rw,
            file = %O, fill = yes;
    PRG0:   start = $8000, size = $4000, type = ro,
            file = %O, fill = yes;
    CHR0a:  start = $0000, size = $1000, type = ro,
            file = %O, fill = yes;
    CHR0b:  start = $1000, size = $1000, type = ro,
            file = %O, fill = yes;
}

Note that ZP and RAM don’t get written to the output file. The segments only contain zeroes, and are just used so that we can have symbols that point to locations in RAM. The iNES format starts with the 16-byte header, followed by a series of 16 KiB PRG (CPU memory) blocks, and then a series of 8 KiB CHR (PPU memory) blocks. We only have one PRG block, named PRG0. The part of the cartridge mapped to PPU ROM contains two 4 KiB pattern tables, so we split this into two regions named CHR0a and CHR0b. Right now there is only one bank of PRG and one bank of CHR, but with bank switching this could grow in the future.
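
A quick way to confirm the linker produced this layout is to check the file against its own header. The following Python sketch does that; rom_layout is a hypothetical helper name, and it assumes a plain iNES file with no trainer block.

```python
HEADER_SIZE = 16
PRG_PAGE = 16 * 1024  # PRG is counted in 16 KiB units
CHR_PAGE = 8 * 1024   # CHR is counted in 8 KiB units


def rom_layout(rom: bytes) -> dict:
    """Split an iNES image into PRG and CHR using the counts in its header."""
    if rom[:4] != b"NES\x1a":
        raise ValueError("missing iNES magic bytes")
    prg_size = rom[4] * PRG_PAGE
    chr_size = rom[5] * CHR_PAGE
    expected = HEADER_SIZE + prg_size + chr_size
    if len(rom) != expected:
        raise ValueError(f"expected {expected} bytes, found {len(rom)}")
    prg_start = HEADER_SIZE
    chr_start = prg_start + prg_size
    return {
        "prg": rom[prg_start:chr_start],
        "chr": rom[chr_start:],
        "size": expected,
    }
```

For the ROM in this post, with one PRG block and one CHR block, the expected size is 16 + 16384 + 8192 = 24592 bytes.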

All this work, and the only thing we have to show for it is a NES ROM file that doesn’t even do anything. So we pick up a tutorial on NES Programming Basics, and write a program that beeps.

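To do this, we write handlers for the three interrupts, and put their addresses in the interrupt vector at address $BFFA. The CPU actually reads the vector from $FFFA, but with a single 16 KiB PRG bank the cartridge mirrors $8000-$BFFF at $C000-$FFFF, so $BFFA and $FFFA are the same bytes. The interrupt vector just contains three addresses, one for each interrupt:

$BFFA (seen by the CPU at $FFFA): NMI
$BFFC (seen by the CPU at $FFFC): RESET
$BFFE (seen by the CPU at $FFFE): IRQ

And here’s the assembly: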

.segment "VECTOR"
.addr nmi
.addr reset
.addr irq
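
The address arithmetic can be checked with a few lines of Python. This is a sketch assuming the single 16 KiB PRG bank from the MEMORY block above; prg_offset and read_vectors are names I made up for illustration.

```python
PRG_BASE = 0x8000
PRG_SIZE = 0x4000  # one 16 KiB bank, so it appears twice in $8000-$FFFF


def prg_offset(cpu_addr: int) -> int:
    """Map a CPU address in $8000-$FFFF to an offset inside the PRG bank."""
    if not 0x8000 <= cpu_addr <= 0xFFFF:
        raise ValueError("address is outside cartridge PRG space")
    return (cpu_addr - PRG_BASE) % PRG_SIZE


def read_vectors(prg: bytes) -> dict:
    """Read the three little-endian addresses the CPU fetches from $FFFA-$FFFF."""
    vectors = {}
    for i, name in enumerate(("nmi", "reset", "irq")):
        off = prg_offset(0xFFFA + 2 * i)
        vectors[name] = int.from_bytes(prg[off:off + 2], "little")
    return vectors
```

Both prg_offset(0xBFFA) and prg_offset(0xFFFA) come out as $3FFA, which is why a segment starting at $BFFA ends up where the CPU looks for its vectors.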

For now, we don’t need an NMI or IRQ handler, so these can be empty.

.code

.proc nmi
        rti
.endproc

.proc irq
        rti
.endproc

All of the interesting stuff goes into the RESET handler, but before we write that, define a few symbols so we aren’t pasting constants into our code everywhere.

;;; PPU registers.
PPUCTRL		= $2000
PPUMASK		= $2001
PPUSTATUS	= $2002
OAMADDR		= $2003
OAMDATA		= $2004
PPUSCROLL	= $2005
PPUADDR		= $2006
PPUDATA		= $2007

;;; Other IO registers.
OAMDMA		= $4014
APUSTATUS	= $4015
JOYPAD1		= $4016
JOYPAD2		= $4017

With that out of the way, we can initialize the NES hardware. The 6502 in the NES and its RAM start up with unknown contents, so we have to reset all of the processor’s state, zero the RAM, and wait for the PPU to warm up. According to the PPU Warmup Is Real forum post, we should wait for three vertical blanking periods before trying to draw anything on the screen. Since NMI is disabled at this point, we detect each one by polling bit 7 of PPUSTATUS.

.code
.proc reset
        sei			; Disable interrupts
        cld			; Clear decimal mode
        ldx #$ff
        txs			; Initialize SP = $FF
        inx
        stx PPUCTRL		; PPUCTRL = 0
        stx PPUMASK		; PPUMASK = 0
        stx APUSTATUS		; APUSTATUS = 0

        ;; PPU warmup, wait two frames, plus a third later.
        ;; http://forums.nesdev.com/viewtopic.php?f=2&t=3958
:	bit PPUSTATUS
        bpl :-
:	bit PPUSTATUS
        bpl :-

        ;; Zero ram.
        txa
:	sta $000, x
        sta $100, x
        sta $200, x
        sta $300, x
        sta $400, x
        sta $500, x
        sta $600, x
        sta $700, x
        inx
        bne :-

        ;; Final wait for PPU warmup.
:	bit PPUSTATUS
        bpl :-

After the hardware is initialized, all we do is copy and paste in some code which plays an audio tone, forever.

        lda #$01		; enable pulse 1
        sta APUSTATUS
        lda #$08		; period, low 8 bits
        sta $4002
        lda #$02		; period, high 3 bits
        sta $4003
        lda #$bf		; 50% duty, constant volume 15
        sta $4000
forever:
        jmp forever
.endproc
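
If you’re curious what those magic numbers mean, here is a small Python sketch that decodes them. It assumes an NTSC console (CPU clock of about 1.789773 MHz) and uses the usual pulse channel formula, frequency = CPU clock / (16 × (period + 1)); decode_pulse is just a name I picked.

```python
CPU_HZ_NTSC = 1_789_773  # NTSC CPU clock; PAL consoles run slower


def decode_pulse(reg0: int, reg2: int, reg3: int) -> dict:
    """Decode the values written to $4000, $4002 and $4003."""
    period = ((reg3 & 0x07) << 8) | reg2  # 11-bit timer period
    return {
        "duty": (reg0 >> 6) & 0x03,            # 2 selects a 50% duty cycle
        "length_halt": bool(reg0 & 0x20),      # bit 5: length counter halt
        "constant_volume": bool(reg0 & 0x10),  # bit 4: use volume, not envelope
        "volume": reg0 & 0x0F,
        "period": period,
        "frequency_hz": CPU_HZ_NTSC / (16 * (period + 1)),
    }


tone = decode_pulse(0xBF, 0x08, 0x02)
print(tone["period"], round(tone["frequency_hz"], 1))
# prints: 520 214.7
```

So the beep is a square wave at full volume, a little under 215 Hz.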

Results

Only 73 lines of assembly code and we’re able to annoy people with a continuous beep noise.