The letter C crossed out, in front of an AI-generated image of an ugly basement filled with computers.

Dear Retro Community, Stop Writing Tools in C

Jan 14, 2023, Programming

If you’re making a game for old DOS PCs, the Game Boy Advance, the Nintendo 64, or some other retro system, there’s a good chance that you’re writing C code anyway. Why not write your tools in C?

“C is portable,” you say. “You can run C programs on Windows, Linux, and macOS.”

“C is lightweight,” you say. “You don’t have to download a ton of dependencies.”

“C is fast,” you say.

Stop justifying C. You’re suffering. It’s time to end the pain and switch to a nice, happy language where you can just write some programs and move on with your life. If you choose a nice language, you finish the job sooner and go outside to play with your friends.

Let’s talk about the reasons C is a bad choice in more detail.

Debugging and Memory Safety

Let’s get this point out of the way first.

C is not memory-safe. When your C program has a memory error, such as writing past the end of an array or freeing the same pointer twice, you can get all sorts of weird behavior: crashes, corrupted output, or nothing at all until much later. Maybe free() segfaults because some earlier bug corrupted the heap. That bug could be anywhere in your program.

If you are not used to writing C, expect to spend hours or days debugging mysterious crashes and data corruption. Yes, this is frustrating. This is the biggest reason why nobody uses C these days if they can avoid it.
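For contrast, here is a small sketch of the same kind of bug in Python, a memory-safe language. Writing one element past the end of a list raises an IndexError at the exact line with the bug, instead of quietly corrupting memory that some later call to free() trips over. The function fill_buffer() is a made-up example:

```python
def fill_buffer(buffer, values):
    # Deliberate off-by-one bug: the loop writes one slot past the end.
    for i in range(len(values) + 1):
        buffer[i] = values[i % len(values)]

buffer = [0] * 4
error = None
try:
    fill_buffer(buffer, [1, 2, 3, 4])
except IndexError as err:
    # Python stops at the bad write and says what went wrong.
    error = err
    print("Caught:", err)  # Caught: list assignment index out of range
```

In C, the equivalent write is undefined behavior and may clobber whatever happens to sit next to the array, with the symptoms showing up somewhere else entirely.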

Parsing

Parsing strings in C is a miserable experience. Let’s work through an example to show why.

Let’s say that we are trying to parse the Wavefront OBJ format, which is a relatively common format for 3D models. (Of course you should normally use a library for this, but this is just an example to show what happens when you try to parse a string in C.)

An OBJ file is a line-oriented text file, so it should be straightforward to parse with C. Here’s a line you might see in an OBJ file:

f 10/8/7 11/9/8 12/10/3

This line describes a single triangle, with references to vertex position, texture coordinate, and normal data. As parser output, what we want is some data structure like this:

// Data structure for one vertex in a face.
typedef struct {
  int position;
  int texture_coordinate;
  int normal;
} FaceVertex;

// What the triangle will look like after parsing.
FaceVertex MyTriangle[3] = {
  {10, 8, 7},
  {11, 9, 8},
  {12, 10, 3},
};

How do we write the parser? First you have to read the line, and maybe check that the first token is an f, for “face element”.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// Parse face element line.
void ParseFace(char *line);

// Parse OBJ file.
void ParseOBJ(FILE *fp) {
  while (true) {
    char line[100];
    if (fgets(line, sizeof(line), fp) == NULL) {
      if (ferror(fp)) {
        perror("Could not read file");
        exit(1);
      }
      break;
    }
    switch (line[0]) {
    case 'f':
      ParseFace(line);
      break;
    }
  }
}
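Two details of that loop are easy to miss: fgets() keeps the trailing newline in the buffer, and a line longer than 99 characters arrives in several pieces, with only the first piece starting with f. For comparison, here is a minimal Python sketch of the same loop. The function name parse_obj and the use of io.StringIO as a stand-in for a real file are just for illustration:

```python
import io

def parse_obj(file):
    """Return the whitespace-separated tokens of every face ("f") line."""
    faces = []
    # Iterating over a file yields whole lines, however long they are.
    for line in file:
        # split() with no arguments also drops the trailing newline.
        tokens = line.split()
        if tokens and tokens[0] == "f":
            faces.append(tokens[1:])
    return faces

# io.StringIO stands in for an open OBJ file here.
sample = io.StringIO("v 1.0 2.0 3.0\nf 10/8/7 11/9/8 12/10/3\n")
print(parse_obj(sample))  # [['10/8/7', '11/9/8', '12/10/3']]
```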

Okay, now we have to split the string into a list of strings separated by spaces. If we were writing Python, we’d be able to do something super easy like this:

for token in s.split():
    ...  # process token

Instead, we ask around online or look at a textbook, and find out that strtok() is a function which splits strings into smaller strings. Let’s use that function. First, we split the line into its separate vertexes, which are separated by spaces:

#include <string.h>

// Parse vertex in face element.
void ParseVertex(FaceVertex *vertex, char *token);

// Maximum number of vertexes in a face.
#define MAX_FACE_VERTEXES 10

// Parse face element line.
void ParseFace(char *line) {
  char *token = strtok(line, " ");
  // Skip the "f" token.
  token = strtok(NULL, " ");

  // Parse each remaining vertex.
  FaceVertex vertexes[MAX_FACE_VERTEXES];
  int count = 0;
  while (token != NULL) {
    if (count >= MAX_FACE_VERTEXES) {
      fputs("Error: too many vertexes in a face\n", stderr);
      exit(1);
    }
    ParseVertex(&vertexes[count], token);
    count++;
    token = strtok(NULL, " ");
  }
  if (count < 3) {
    fputs("Error: too few vertexes in a face\n", stderr);
    exit(1);
  }
}

For now, let’s test to see if splitting the string works correctly. Instead of parsing each vertex, we’ll just print each vertex out. Instead of reading the data from a file, we’ll pass in a string with test data.

// Parse vertex in face element.
void ParseVertex(FaceVertex *vertex, char *token) {
  printf("Vertex: '%s'\n", token);
}

int main(int argc, char **argv) {
  ParseFace("f 10/8/7 11/9/8 12/10/3");
}

Okay, this program crashes. You get a segmentation fault when you try to run it.

$ ./a.out
32150 Segmentation fault      (core dumped) ./a.out

We’re just trying to split a string. What is so complicated about that, and why is it crashing?

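It turns out that you can’t use strtok() on a string literal. strtok() splits a string by writing null bytes over the delimiters, and modifying a string literal is undefined behavior; on most systems, literals live in read-only memory, hence the crash. Okay, fair enough. Let’s fix that by using an array for the string.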

char TestData[] = "f 10/8/7 11/9/8 12/10/3";
int main(int argc, char **argv) {
  ParseFace(TestData);
}

This program prints out the expected result:

$ ./a.out
Vertex: '10/8/7'
Vertex: '11/9/8'
Vertex: '12/10/3'
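Python has no equivalent of the string-literal trap. Strings are immutable, so str.split() builds a new list and leaves the original string alone. A quick check:

```python
line = "f 10/8/7 11/9/8 12/10/3"
tokens = line.split()
# split() built a new list; `line` itself was not modified.
print(tokens)  # ['f', '10/8/7', '11/9/8', '12/10/3']
print(line)    # f 10/8/7 11/9/8 12/10/3
```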

Now let’s write ParseVertex() the same way, using strtok().

#include <errno.h>
#include <limits.h>

// Parse vertex in face element.
void ParseVertex(FaceVertex *vertex, char *token) {
  printf("Vertex '%s'\n", token);
  int parts[3];
  for (int i = 0; i < 3; i++) {
    parts[i] = -1;
  }
  char *subtoken = strtok(token, "/");
  int count = 0;
  while (subtoken != NULL) {
    if (count >= 3) {
      fputs("Error: too many parts in vertex\n", stderr);
      exit(1);
    }
    // Only process non-empty parts in the vertex.
    if (*subtoken != '\0') {
      // This is how you check for errors in strtol().
      errno = 0;
      char *end;
      long value = strtol(subtoken, &end, 10);
      if (*end != '\0') {
        fputs("Error: invalid index in vertex\n", stderr);
        exit(1);
      }
      if (value < 0 || value > INT_MAX ||
          (value == LONG_MAX && errno == ERANGE)) {
        fputs("Error: index is out of range\n", stderr);
        exit(1);
      }
      parts[count] = value;
    }
    count++;
    subtoken = strtok(NULL, "/");
  }
  vertex->position = parts[0];
  vertex->texture_coordinate = parts[1];
  vertex->normal = parts[2];
}

Looks good so far, but we get an error when we run it:

$ ./a.out
Vertex '10/8/7'
Error: too few vertexes in a face

What is going on here? It looks like only the first vertex is getting parsed!

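It turns out that you cannot use strtok() in nested loops, because strtok() remembers its position in hidden static state between calls. The inner call, strtok(token, "/"), replaces the position saved by the outer loop, so when the outer loop asks for the next vertex, there is nothing left. The solution is to use strtok_r(), which keeps that state in a variable you pass in. (strtok_r() comes from POSIX, not standard C; Microsoft’s compiler provides strtok_s() instead.) Here is the final code: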

#include <errno.h>
#include <limits.h>
#include <string.h>

// Parse vertex in face element.
void ParseVertex(FaceVertex *vertex, char *token);

// Maximum number of vertexes in a face.
#define MAX_FACE_VERTEXES 10

// Parse face element line.
void ParseFace(char *line) {
  char *save;
  // Split on whitespace, including the newline that fgets() leaves at
  // the end of each line.
  char *token = strtok_r(line, " \t\r\n", &save);
  // Skip the "f" token.
  token = strtok_r(NULL, " \t\r\n", &save);

  // Parse each remaining vertex.
  FaceVertex vertexes[MAX_FACE_VERTEXES];
  int count = 0;
  while (token != NULL) {
    if (count >= MAX_FACE_VERTEXES) {
      fputs("Error: too many vertexes in a face\n", stderr);
      exit(1);
    }
    ParseVertex(&vertexes[count], token);
    count++;
    token = strtok_r(NULL, " \t\r\n", &save);
  }
  if (count < 3) {
    fputs("Error: too few vertexes in a face\n", stderr);
    exit(1);
  }
  puts("Vertexes:");
  for (int i = 0; i < count; i++) {
    printf("  %d, %d, %d\n", vertexes[i].position,
           vertexes[i].texture_coordinate, vertexes[i].normal);
  }
}

// Parse vertex in face element.
void ParseVertex(FaceVertex *vertex, char *token) {
  int parts[3];
  for (int i = 0; i < 3; i++) {
    parts[i] = -1;
  }
  char *save;
  char *subtoken = strtok_r(token, "/", &save);
  int count = 0;
  while (subtoken != NULL) {
    if (count >= 3) {
      fputs("Error: too many parts in vertex\n", stderr);
      exit(1);
    }
    // Only process non-empty parts in the vertex.
    if (*subtoken != '\0') {
      // This is how you check for errors in strtol().
      errno = 0;
      char *end;
      long value = strtol(subtoken, &end, 10);
      if (*end != '\0') {
        fputs("Error: invalid index in vertex\n", stderr);
        exit(1);
      }
      if (value < 0 || value > INT_MAX ||
          (value == LONG_MAX && errno == ERANGE)) {
        fputs("Error: index is out of range\n", stderr);
        exit(1);
      }
      parts[count] = value;
    }
    count++;
    subtoken = strtok_r(NULL, "/", &save);
  }
  vertex->position = parts[0];
  vertex->texture_coordinate = parts[1];
  vertex->normal = parts[2];
}

It works!

$ ./a.out
Vertexes:
  10, 8, 7
  11, 9, 8
  12, 10, 3

It cost us hours of debugging. We had to learn the weird quirks of how strings work in C. We wrote 64 lines of code, and all we’ve managed to do is parse one simple kind of line from the file.

This is bad. Processing strings in C is miserable.

For comparison, here’s the Python code:

import dataclasses

@dataclasses.dataclass
class Vertex:
    position: int
    texture_coordinate: int
    normal: int

def parse_vertex(token):
    parts = [None] * 3
    for n, part in enumerate(token.split('/')):
        if n < 3 and part:
            parts[n] = int(part)
    return Vertex(*parts)

def parse_face(tokens):
    vertexes = []
    for token in tokens[1:]:
        vertexes.append(parse_vertex(token))
    if len(vertexes) < 3:
        raise ValueError('too few vertexes in face')
    return vertexes

def parse_line(line):
    tokens = line.split()
    if tokens and tokens[0] == 'f':
        vertexes = parse_face(tokens)
        for vertex in vertexes:
            print(vertex)

parse_line("f 10/8/7 11/9/8 12/10/3")

Quick and easy to write, and it works:

$ python parse.py
Vertex(position=10, texture_coordinate=8, normal=7)
Vertex(position=11, texture_coordinate=9, normal=8)
Vertex(position=12, texture_coordinate=10, normal=3)
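The Python version also handles a case that the C version gets wrong. strtok() and strtok_r() treat a run of delimiters as a single delimiter, so they never return an empty token, and the `*subtoken != '\0'` check in ParseVertex() never fires. An OBJ vertex written as 1//3 (a position and a normal with no texture coordinate) comes out of the C parser as {1, 3, -1}, with the normal in the texture coordinate slot. Python’s str.split('/') keeps the empty field. A quick check, repeating parse_vertex() from above so the snippet runs on its own:

```python
import dataclasses

@dataclasses.dataclass
class Vertex:
    position: int
    texture_coordinate: int
    normal: int

def parse_vertex(token):
    parts = [None] * 3
    for n, part in enumerate(token.split('/')):
        if n < 3 and part:
            parts[n] = int(part)
    return Vertex(*parts)

# split('/') keeps the empty middle field instead of skipping it.
print("1//3".split('/'))     # ['1', '', '3']
print(parse_vertex("1//3"))  # Vertex(position=1, texture_coordinate=None, normal=3)
```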

Time for a Break

Take a moment to think.

Do you really want to spend another two weeks in this awful basement staring at a screen so you can make some development tool run slightly faster? Is this what you enjoy doing?

AI-generated image of a man working at a computer in a basement. — AI image generators haven’t quite figured out what computers look like, or what straight lines are.

Go for a walk outside, look at some trees.

At some point, ask yourself, “Do I get extra points for doing things the hard way?”

Speed

For some, the allure of C is its speed. C’s speed is its siren call, and thousands of software engineers lie shipwrecked at the bottom of the ocean. They dreamed about faster C programs and didn’t notice that they were drowning.

Here’s the catch: C is slow where it counts. Tools are for you, the developer, to use. Fast tools save you time, but making the fast tools wastes your time. There is no point in spending an extra week writing your tool in C so you can save a few hundred milliseconds here and there.

And yes, the amount of time you’re saving is probably measured in milliseconds or seconds. The formula is simple: Add up the time you saved (a few seconds), subtract the time you spent (days or weeks). You already know that this number is negative, you do not need to get out your calculator.
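To put rough numbers on it, here is the formula as a few lines of Python. Every input is a made-up assumption; plug in your own:

```python
# Hypothetical numbers, chosen only to illustrate the formula.
runs_per_day = 50             # times you run the tool each workday
seconds_saved_per_run = 0.5   # how much faster the C version is
workdays_per_year = 250
extra_hours_writing_c = 40    # one extra work week spent on the C version

hours_saved = runs_per_day * seconds_saved_per_run * workdays_per_year / 3600
net_hours = hours_saved - extra_hours_writing_c
print(f"Saved {hours_saved:.1f} hours per year, net {net_hours:.1f} hours")
# Saved 1.7 hours per year, net -38.3 hours
```

Even with these generous assumptions, it would take over twenty years of use to pay back the extra week.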

Your time is more valuable than the computer’s time.

Let’s look at the Python script we wrote above and see how fast it runs.

>>> def parse_line(line):
...     tokens = line.split()
...     if tokens and tokens[0] == 'f':
...         vertexes = parse_face(tokens)
...
>>> import timeit
>>>
>>> timeit.timeit(
...     'parse_line("f 10/8/7 11/9/8 12/10/3")',
...     globals={'parse_line': parse_line})
3.3218245949974516

The timeit.timeit() function runs the statement 1,000,000 times by default, so 3.32 seconds in total works out to about 3 microseconds to parse a single line.

That’s not as fast as C, but it’s fast enough for simple tools.
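If you want the per-line figure directly, you can pass timeit.timeit() an explicit number of repetitions and divide. Here is a self-contained sketch that repeats the parser, with parse_line() returning the vertexes instead of printing them. The 100,000 repetitions is an arbitrary choice, and the result will depend on your machine and Python version:

```python
import dataclasses
import timeit

@dataclasses.dataclass
class Vertex:
    position: int
    texture_coordinate: int
    normal: int

def parse_vertex(token):
    parts = [None] * 3
    for n, part in enumerate(token.split('/')):
        if n < 3 and part:
            parts[n] = int(part)
    return Vertex(*parts)

def parse_face(tokens):
    vertexes = [parse_vertex(token) for token in tokens[1:]]
    if len(vertexes) < 3:
        raise ValueError('too few vertexes in face')
    return vertexes

def parse_line(line):
    # Returns the parsed vertexes for face lines, None for anything else.
    tokens = line.split()
    if tokens and tokens[0] == 'f':
        return parse_face(tokens)
    return None

REPETITIONS = 100_000
total_seconds = timeit.timeit(
    'parse_line("f 10/8/7 11/9/8 12/10/3")',
    globals={'parse_line': parse_line},
    number=REPETITIONS)
microseconds_per_line = total_seconds / REPETITIONS * 1_000_000
print(f"{microseconds_per_line:.2f} microseconds per line")
```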

This example uses Python, a notoriously “slow” language, just to make a point. If you choose a language like Java, C#, Go, or Rust, you’ll undoubtedly see much better performance.

Language performance is a dull subject. Go look at a programming language shootout if you want better comparisons. (The claim that Python is a “slow” language has a lot of problems anyway. It’s slow the way I used it here.)

Portability

You can get a C compiler for just about anything, so in theory, C is portable.

Yes, you can get your C code to run on any system you like. You’re able to run C on Linux, macOS, and Windows. The problem is that compiling C binaries for multiple systems sucks, and almost any other language is easier to port than C.

Those other languages fall into three categories:

Some languages are compile once, run anywhere. This includes Java and C#. You write your program, compile it to a Java JAR or C# EXE file, and distribute it. Anyone with a copy of the JRE (for Java) or the .NET runtime (for C#) can run your program, no problem. One version of the program works for everyone.
Some languages have good cross-compilation support. This includes Go and Zig. If you want to create a Linux, macOS, and Windows binary, you can do that with only one computer—you can easily cross-compile your program for all three operating systems, no matter which operating system you are using to host the compiler.
Some languages are scripting languages. Your end-users just need to install a runtime for the scripting language and your program will run fine, directly from source code. This includes languages like Python, Ruby, JavaScript, and Perl.

C may be portable in a technical sense, but it’s much easier to make software for multiple platforms if you choose a different language.

Back in the 1990s, everyone was miserable porting their C programs between Windows and Sun Solaris and Macintosh, and when Java came along, Sun’s “Write once, run anywhere” slogan for Java was, well, incredibly compelling. If your developer experience is worse than the experience of a late-1990s software developer, maybe you should pick a different language.

Dependencies

Dependencies in C are the worst.

Let’s say that you want to use LibPNG with your C program. That sounds straightforward, right?

First, you have to get your copy of LibPNG. If you download it from the LibPNG website, you have to compile it. Don’t do that. Use a package manager instead.

Which package manager do you use? Depending on your system, you might install LibPNG with apt (Debian, Ubuntu, WSL), brew (macOS), dnf (Fedora), emerge (Gentoo), or pacman (Arch). The package name is different for different distros! You have to install libpng-dev for Debian, Ubuntu, and WSL, libpng-devel for Fedora, and libpng for Gentoo, Arch, and Homebrew. Good luck writing instructions for your users.

Now you have to get the compiler flags and linker flags for LibPNG. These can differ from one system to the next, which is why tools like pkg-config exist. There are also CMake scripts to find LibPNG if you’re using CMake.

If you want to distribute binaries, you can ask your users to install LibPNG alongside your program, or you can figure out a way to statically link LibPNG, or you can compile LibPNG as a dynamic library and ship it next to your application.

For comparison, how do you install a library for reading PNG files in Python? First, install Pillow:

$ python -m pip install pillow

There is no step two. Nothing needs to be configured. Some other languages, like Go, have PNG support built into the standard library (the image/png package), so not even this step is necessary.

Which Language Should I Use?

Pick any. Use one you’re familiar with. Just don’t pick C (and C++ is not much better). Pick a more modern language like Java, C#, Go, Rust, Python, Ruby… pick almost anything, just not C.

Use C Anyway

There are some reasons to use C anyway, against this general advice. Here are some reasons I thought of:

You are already very experienced at C, and nobody can stop you.
You are making developer tools to distribute, and you need to use libraries which are written in C.

I’m Not Calling You Out Specifically

This post is not directed at any one person. I’ve talked to multiple people who had these problems with strtok(), multiple people who spent hours or days debugging a crash in free(), multiple people who gave me the same “portability” justification. These are recurring issues that affect many people who write their tools in C; this is not a call-out post for any specific person.

If I convince just one person to use a different language for their tools instead of C, I will have earned back the time I spent writing this article.