
Nintendo 64 Part 23: Animations
First, the result. Here is how the animations look in-engine:
This took me about three days to implement. You’re only getting the finished product, after I went through and chased down dead ends and fixed mistakes. I assure you, I wrote many bugs and got plenty of garbled models or crashes while I was implementing this.
Importing Skeletal Animations with AssImp
I’ve been using AssImp to import models. It supports importing skeletal animations from FBX, and Blender also exports skeletal animations to FBX.
The biggest resource I found for dealing with skeletal animations using AssImp is Skeletal Animation With Assimp. It turns out that it’s not difficult to import skeletal animations from an FBX file, it just requires work and it’s easy to make mistakes. Here’s how to do it.
To start, walk through the node graph in the imported scene, keeping track of the global transformation matrix, which transforms from scene space to the space of the node.
// Get the global transformation matrix for each node.
void EvaluateNode(const aiNode *node,
                  const aiMatrix4x4 &parent_transform) {
  aiMatrix4x4 local_transform = node->mTransform;
  aiMatrix4x4 global_transform = parent_transform * local_transform;
  for (unsigned i = 0; i < node->mNumChildren; i++) {
    EvaluateNode(node->mChildren[i], global_transform);
  }
}The final transformation for each node are recorded; they’re needed later.
// A map from node name to global transformation matrix.
using Transforms = std::unordered_map<std::string, aiMatrix4x4>;
// Convert AssImp string type to std::string.
std::string ToString(const aiString &s) {
  return std::string(s.data, s.length);
}
// Get the global transformation matrix for each node.
void EvaluateNode(Transforms &transforms, const aiNode *node,
                  const aiMatrix4x4 &parent_transform) {
  aiMatrix4x4 local_transform = node->mTransform;
  aiMatrix4x4 global_transform = parent_transform * local_transform;
  transforms[ToString(node->mName)] = global_transform;
  for (unsigned i = 0; i < node->mNumChildren; i++) {
    EvaluateNode(node->mChildren[i], global_transform);
  }
}Next, iterate over the bones and use the corresponding node transforms to transform the vertexes in the mesh. The transform of each bone is equal to the transform of the node with the same name, multiplied by the bone offset matrix. The documentation and comments in the header file for AssImp are contradictory. The documentation is corect and the header is wrong—the matrix converts from bone space to mesh space, which is the only way that makes sense. (The header says that it converts from bone space to mesh space, but this is false.)
Scale the results of transforming each vertex by the bone weight for that vertex, and for each vertex, sum the results from each bone. The total sum of weights for a vertex is 1.0.
// Calculate vertex positions.
std::vector<aiVector3D> EvaluateMesh(const Transforms &transforms,
                                     const aiMesh *mesh) {
  // Start with a zero vector for each vertex.
  std::vector<aiVector3D> vertexes(mesh->mNumVertices,
                                   aiVector3D());
  // Iterate over bones.
  for (unsigned bone_idx = 0; bone_idx < mesh->mNumBones;
       bone_idx++) {
    const aiBone *bone = mesh->mBones[bone_idx];
    std::string bone_name = ToString(bone->mName);
    const aiMatrix4x4t transform =
        transforms.at(bone_name) * bone->mOffsetMatrix;
    // Iterate over vertexes, accumulate the results.
    for (unsigned weight_idx = 0; weight_idx < bone->mNumWeights;
         weight_idx++) {
      const aiVertexWeight &weight = bone->mWeights[weight_idx];
      vertexes.at(weight.mVertexId) +=
          (transform * mesh->mVertices[weight.mVertexId]) *
          weight.mWeight;
    }
  }
  return vertexes;
}By itself, this will give you the location of each vertex in the bind pose for the skeleton. To get vertex positions for an animated pose, read animation data from the scene and use it to override node transforms.
Start by modifying EvaluateNode to accept animation data.
// Get the global transformation matrix for each node.
void EvaluateNode(Transforms &transforms,
                  const Transforms &animation_transforms,
                  const aiNode *node,
                  const aiMatrix4x4 &parent_transform) {
  std::string name = ToString(node->mName);
  // Local transform may be overridden by animation data.
  aiMatrix4x4 local_transform = node->mTransform;
  auto animation_transform = animation_transforms.find(name);
  if (animation_transform != std::end(animation_transforms)) {
    local_transform = animation_transform->second;
  }
  aiMatrix4x4 global_transform = parent_transform * local_transform;
  transforms[name] = global_transform;
  for (unsigned i = 0; i < node->mNumChildren; i++) {
    EvaluateNode(node->mChildren[i], global_transform);
  }
}The overrides are calculated from the animation data, which contains keyframes for various nodes in the graph.
// Get transformation overrides for a frame of animation.
Transforms EvaluateFrame(const aiAnimation *animation,
                         double time) {
  Transforms result;
  // Iterate over channels in animation. Each channel contains the
  // transformation for one node.
  for (unsigned channel_id = 0;
       channel_id < animation->mNumChannels; channel_id++) {
    const aiNodeAnim *channel = animation->mChannels[channel_id];
    // Get the position, rotation, and scaling data for the current
    // point in time.
    const aiVector3D position =
        ReadChannel(time, channel->mPositionKeys,
                    channel->mNumPositionKeys, aiVector3D(0.0f));
    aiQuaternion rotation =
        ReadChannel(time, channel->mRotationKeys,
                    channel->mNumRotationKeys, aiQuaternion());
    const aiVector3D scaling =
        ReadChannel(time, channel->mScalingKeys,
                    channel->mNumScalingKeys, aiVector3D(1.0f));
    // Construct the node's local transformation matrix.
    result[Str(channel->mNodeName)] =
        aiMatrix4x4(scaling, rotation, position);
  }
  return result;
}The position, rotation, and scale are calculated by interpolating within the
animation’s keyframes.
The data types are different, but fortunately, it’s possible to write
the ReadChannel function once using a template:
// Read an object from an animation channel.
template <typename Key>
typename Key::elem_type ReadChannel(
    double time, const Key *keys, unsigned count,
    typename Key::elem_type default_value) {
  // If the channel has no data, return the default value.
  if (count == 0) {
    return default_value;
  }
  // Find the first keyframe that is after the current time.
  unsigned idx = 0;
  while (idx < count && time >= keys[idx].mTime) {
    idx++;
  }
  // Current time is before all key frames, return value from first
  // key frame.
  if (idx == 0) {
    return keys[0].mValue;
  }
  // Current time is after all key frames, return value from last
  // key frame.
  if (idx == count) {
    return keys[idx - 1].mValue;
  }
  // Interpolate between the previous and next key frame.
  const Key &prev = keys[idx - 1];
  const Key &next = keys[idx];
  double frac = (time - prev.mTime) / (next.mTime - prev.mTime);
  return Interpolate(prev.mValue, next.mValue, frac);
}This requires an interpolation function which works on both vectors and quaternions. AssImp defines spherical linear interpolation for quaternions, so you don’t need to implement that yourself, you just have to wrap it.
// Interpolate between vectors.
aiVector3D Interpolate(const aiVector3D &a, const aiVector3D &b,
                       float frac) {
  return a * (1.0f - frac) + b * frac;
}
// Interpolate between quaternions.
aiQuaternion Interpolate(const aiQuaternion &a,
                         const aiQuaternion &b, float frac) {
  aiQuaternion r;
  aiQuaternion::Interpolate(r, a, b, frac);
  return r;
}That’s all you need to evaluate skeletal animations using AssImp. Pretty short!
Animation Format for Nintendo 64
The new model format looks something like this, simplified a bit:
// A frame of animation for a model.
struct model_frame {
  float time;
  Vtx *vertex;
};
// Header for a model.
struct model_header {
  Gfx *display_list;
  int frame_count;
  struct model_frame frame[];
};On the cartridge, the pointers are stored as offsets from the beginning of the model, so the loader fixes up the pointers after loading:
// Convert an offset to a pointer.
void *pointer_fixup(void *ptr, uintptr_t base, size_t size) {
  uintptr_t value = (uintptr_t)ptr;
  if (value > size) {
    fatal_error(
        "Bad pointer in asset\nPointer: %p\nBase: %p\nSize: %zu",
        ptr, (void *)base, size);
  }
  return (void *)(value + base);
}
// Convert offsets in a model to pointers.
void model_fixup(struct model_header *model, size_t size) {
  uintptr_t base = (uintptr_t)model;
  model->display_list =
      model_fixup(model->display_list, base, size);
  for (int frame = 0; frame < model->frame_count; frame++) {
    model->frame[frame].vertex =
        pointer_fixup(model->frame[frame].vertex, base, size);
  }
}Here is how to render a frame, using an RSP segment to point to the vertex data (which is different for each frame) so the vertex addresses can be hard-coded into the display list:
// Render a specific frame of a model.
Gfx *render_model(Gfx *dl, const struct model_header *model,
                  int frame_id) {
  gSPSegment(dl++, 1, K0_TO_PHYS(model->frame[frame_id].vertex));
  gSPDisplayList(dl++, model->display_list);
  return dl;
}The code that runs in the engine is so simple! The CPU barely has to do anything—just construct two commands in a display list.
The tradeoff for this simplicity is the model size. Each vertex takes 16 bytes of data. The model uses 272 vertexes (a small percentage of these are duplicates). One full frame of vertex data consumes 16×272 = 4352 bytes. There are 44 frames of animation, which adds up to 191488 total bytes of vertex data.
200 KB is not a huge amount of data, but a stock Nintendo 64 only has 4 MiB of RAM, so I won’t be able to load very many models with this level of complexity into RAM at the same time.
There are some techniques that I could use to reduce memory usage:
- The vertex data could be streamed from the cartridge. According to Halley’s Comet Software, DMA from cartridge memory reads 3.7 bytes per cycle, which at 93.75 MHz means a bandwidth of 350 MB/s. At this speed, 4352 bytes would transfer in only 13 μs, which is definitely fast enough for streaming.
- Only the first 8 bytes of each vertex contain position data, so the remaining 8 bytes can be shared across all vertexes—but this would require some additional work to combine the two halves of each vertex during each frame.
- The model could use fewer frames if the engine interpolated between them.
The Result
Here are the resulting ROM images, available for both NTSC and PAL:
“Thornmarked” is the working name for the game.