My Cursed Website

First, see Rakhim’s Blogging vs. blog setups comic.

The program that generates this website is cursed. I call it “webgen”, and it’s grown over the years:

$ cloc webgen

It’s 9,227 lines of Go. That’s not even the whole thing—there’s also a program I wrote for deployment, which is currently at 11,504 lines of code. That’s over 20 thousand lines of code in total. May this blog post serve as a warning.

Overview: Code is a Burden

Programming is a cool way to solve problems, but you also create problems when you write code. I don’t think I understood that when I was young. Writing more code to fix problems is, in some ways, just digging yourself deeper into the hole.

The best code is the code that you never write.

We all run into this problem, and it happens because you need a completely different mindset when you’re in school compared to when you’re working.

In the learning mindset, you make progress as you write code. In the work mindset, writing code can actually set you back, since your code might just be a future maintenance burden.

The Gory Details of My Website

Here’s the history of my website, and how the associated code got so complicated. Maybe if I understand the decisions that led me to this point, I can figure out what went wrong with the decision-making process itself.

Plain HTML

The oldest commit in the website’s Git repo is from 2012, and it contains five HTML files, a few images, and a couple games I wrote. I’m sure that all I did to deploy the website was run rsync.

Here’s what my main page looked like:

<!doctype html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
    <title>Dietrich Epp’s page</title>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <div id="main">
      <h1>Dietrich Epp’s page</h1>
      <ul>
        <li><a href="bio.html">Biography</a></li>
      ...
    </div>
  </body>
</html>

Munging HTML with Python

In August 2012, I made a commit with the ominous message, “Begin work on Python web templating system.” The goal was to eliminate some of the boilerplate so I could focus on just writing content. At first it was simple. The program was under 200 lines, and it let me write my web page like this:

<h1>Dietrich Epp’s page</h1>
<ul>
  <li><a href="bio.html">Biography</a></li>
...

So much better! The program used html5lib to parse HTML, so it could do simple things like extract the text inside an <h1> tag and insert it into the <title> tag.

I started to add simple features, eliminating some of the steps that were previously manual.

Feature: File Sizes

I wanted to put the size of the download next to download links, like this:

<p><a href="Lonely_Star.zip">Download Lonely Star</a>
  v1.0 (1.14 MB)</p>

I didn’t want to do this manually, so I added some more code to my website generator, which let me write my page like this:

<p><a href="Lonely_Star.zip">Download Lonely Star v1.0</a></p>

Of course, I had strong opinions about the correct way to format file sizes—use the correct SI prefixes, show three digits of precision, and round to even. Getting all the edge cases correct was a fun challenge (for example, 999499 bytes is 999 kB, but 999500 is 1.00 MB), so of course I wrote the code myself instead of just picking a third-party library. This code eventually got converted to Go, and is published on GitHub: bytesize (pkg.go.dev).
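The rounding rules above can be sketched in a few lines. This is a minimal illustration in Python, not the actual webgen or bytesize code; the name `format_size` is made up for the example. It uses `decimal` so that round-half-to-even is applied to the exact byte count rather than to a binary float.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# SI prefixes: each step is a factor of 1000, not 1024.
PREFIXES = ["k", "M", "G", "T", "P", "E"]


def format_size(n: int) -> str:
    """Format a byte count with three significant digits (hypothetical helper)."""
    if n < 0:
        raise ValueError("size must be non-negative")
    if n < 1000:
        return f"{n} B"
    value = Decimal(n)
    for prefix in PREFIXES:
        value = value.scaleb(-3)  # exact division by 1000
        # Try 2, 1, then 0 decimal places; accept the first result that
        # still fits in three significant digits after rounding.
        for limit, quantum in ((10, "0.01"), (100, "0.1"), (1000, "1")):
            rounded = value.quantize(Decimal(quantum), rounding=ROUND_HALF_EVEN)
            if rounded < limit:
                return f"{rounded} {prefix}B"
    raise OverflowError("size too large to format")


print(format_size(999499))  # 999 kB
print(format_size(999500))  # 1.00 MB
```

The second call shows the edge case: 999.5 kB rounds to 1000 kB, which no longer fits in three digits, so the value moves up to the next prefix.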

Feature: Image Dimensions

The web generator would insert width and height attributes for image tags, which reduces content reflow. This feature would transform a simple image tag from this:

<img src="image.png" alt="an image that does not exist">

Into this:

<img src="image.png" alt="an image that does not exist"
     width="400" height="300">
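For PNG files, the dimensions can be read straight from the file header without an imaging library. The following is a rough sketch of that idea, not the generator's actual code; `png_dimensions` and `img_tag` are hypothetical names, and other image formats would need their own parsers.

```python
import html
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) from the IHDR chunk at the start of a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then width and height as big-endian 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def img_tag(src: str, alt: str, data: bytes) -> str:
    """Build an <img> tag with explicit dimensions (hypothetical helper)."""
    width, height = png_dimensions(data)
    return (
        f'<img src="{html.escape(src)}" alt="{html.escape(alt)}" '
        f'width="{width}" height="{height}">'
    )
```

Only the first 24 bytes of the file are needed, so the generator would not have to decode the image itself.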

Feature: Embedded Source Code

Pasting source code for typical languages like JavaScript or C into an HTML document is a pain because you have to escape all the tokens correctly. Getting the formatting right is also a bit of a pain—it’s much easier to put source code in a separate file and then use your website generator to insert it into the HTML.

Since my website processor worked on HTML, I used processing instructions to embed source code. The processor would read the file, convert it to HTML, and wrap it in <pre> and <code> tags. The HTML looked like this:

<?file src/main1.js tabsize=4 ?>

Syntax highlighting came from a JavaScript library called Prettify, which has since been abandoned.
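The conversion step itself is small. Here is a minimal sketch, assuming the file contents have already been read into a string; `embed_source` is a hypothetical name, and parsing the processing instruction is left out.

```python
import html


def embed_source(text: str, tabsize: int = 8) -> str:
    """Wrap source text in <pre><code>, escaping HTML special characters."""
    # Expand tabs first so indentation renders the same in every browser.
    expanded = text.expandtabs(tabsize)
    # quote=False: quotes are safe inside element content, so only
    # &, <, and > are escaped.
    return "<pre><code>" + html.escape(expanded, quote=False) + "</code></pre>"


print(embed_source("if (a < b && c) {\n\treturn;\n}\n", tabsize=4))
```

The `tabsize` parameter mirrors the `tabsize=4` option in the processing instruction above.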

Python 3 Migration

In January 2014, I migrated the script to Python 3. Something I wouldn’t have had to do if I hadn’t written any Python at all.

Refactoring Python

In May 2014, I spent a week refactoring and extending the web generator. I added tests, rewrote the HTML processing code, added a configuration file, and wrote support for rules describing how to process different parts of the site.

Here’s what the new YAML configuration file looked like:

content-type:
  .html: text/html;charset=UTF-8
  .txt: text/plain;charset=UTF-8
  .css: text/css
  .jpg .jpeg: image/jpeg
  .png: image/png
  .mp3: audio/mpeg
  .ogg: audio/ogg
  .js: application/javascript
  .xz .gz .zip .bz2 .tgz: application/octet-stream
index:
- index.html
exclude:
- /webgen
- /.git
- /_gen
rules:
- roots: /
  rules:
  - match: "*"
    handler: raw
  - match: "*.html"
    handler: html
- roots:
  - /smac
  - /diablo2
  - /fallout3
  - /ludumdare/ld26/play
  rules:
  - match: "*"
    handler: raw
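One plausible reading of these rules is sketched below. The semantics are an assumption, not something the configuration states: this sketch lets later rules and later groups override earlier ones, so `*.html` beats `*` at the site root, and the `/smac` group forces everything under it back to `raw`. The `RULES` structure and function names are hypothetical.

```python
from fnmatch import fnmatchcase

# A hand-written subset of the configuration above.
RULES = [
    {"roots": ["/"], "rules": [("*", "raw"), ("*.html", "html")]},
    {"roots": ["/smac", "/diablo2"], "rules": [("*", "raw")]},
]


def is_under(root: str, path: str) -> bool:
    """True if path is the root itself or lies inside it."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def handler_for(path: str):
    """Pick a handler for a site path; the last matching rule wins (assumed)."""
    handler = None
    filename = path.rsplit("/", 1)[-1]
    for group in RULES:
        if not any(is_under(root, path) for root in group["roots"]):
            continue
        for pattern, name in group["rules"]:
            if fnmatchcase(filename, pattern):
                handler = name
    return handler


print(handler_for("/bio.html"))         # html
print(handler_for("/smac/index.html"))  # raw
```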

Was this progress? Or was this just complication?

The new generator was certainly full of features I wanted—it could detect broken links site-wide, it rewrote links in HTML to the canonical URL, and it rewrote relative URLs to make them shorter.

Was my website tooling getting better, or getting larger? At this point, it was still only 1,343 lines of Python code, just 6% of its size seven years later, in May 2021.

“Big File” Support

In August 2014, I added the website generator’s worst, most complicated, most ill-advised feature. This feature was “big file” support.

Imagine this: I’m at a conference with my laptop, it’s 2014, and there are hundreds of people connected to the same congested Wi-Fi network. I want to add something to my website, but first I need to pull the latest version of my Git repo, which contains a bunch of binary files which I don’t need.

It’s a real scenario, but perhaps I went too far trying to address it. My solution was to store large files somewhere else and keep only the metadata (file size, SHA-256 hash) in the repository.

Such an innocuous idea—I want to “store large files somewhere else”—but the devil is in the details. Where do I store the files? How do I synchronize these files between multiple computers? Do I version these files?
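Computing the metadata is the easy part. A minimal sketch, assuming the metadata is just a size and a SHA-256 digest as described above (the function name and dictionary keys are made up for illustration):

```python
import hashlib


def file_metadata(path: str) -> dict:
    """Return the size and SHA-256 hash of a file, read in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fp:
        # Read 1 MiB at a time so large files never sit fully in memory.
        for chunk in iter(lambda: fp.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"size": size, "sha256": digest.hexdigest()}
```

Everything else (storage, synchronization, versioning) is where the complexity lives, and none of it is covered by this sketch.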

Faster Deployment

Everyone wants faster deployment, right?

For some reason, in January 2015, I decided to try and improve upon my old deployment script. The old deployment script was short and sweet. Here’s the relevant code that I removed:

import subprocess

def deploy(self, *, delete=False):
    """Deploy the website to the server. Returns True on success."""
    print('Exporting website to local directory...')
    success = self.export('.export')
    if not success:
        return False
    opts = [
        '--recursive',
        '--times',
        '--verbose',
        '--compress',
        '--perms',
        '--chmod=D755,F644',  # directories 755, files 644
    ]
    if delete:
        # Remove files on the server that are no longer in the export.
        opts.append('--delete')
    retcode = subprocess.call(
        ['rsync'] + opts + ['--', './.export/', 'gloin:/home/www/'])
    return retcode == 0

The new deployment code was much more complicated:

  1. Create an SSH socket using ssh -S -T.
  2. Upload missing “big files” which aren’t present on the server.
  3. Upload all the remaining files.
  4. Run a shell script which moves all the new files into place. Big files are hard linked, small files are moved.

This was… faster? I guess? I spent some time tweaking it over the years.

Python Major Refactor

In April-May 2015, I spent about a week refactoring the generator. The refactored generator used Mako templates for pages and added cache busting.

The Mako templates were fairly nice to work with. Here’s a snippet from the game index, an earlier version of /games/:

<table class="gallery">
  % for game in games:
    % if loop.even:
      <tr>
    % endif
    <td>
      <a href="${game['path']}/">
        <p>${game['name']}</p>
        <img src="${game['path']}/${game['thumb']}" width="256">
      </a>
    </td>
    % if loop.odd or loop.last:
      </tr>
    % endif
  % endfor
</table>

Cache busting is the real crazy feature here. The idea is that each different version of a file gets a different filename. This doesn’t apply to HTML files (because it would break links) or files which only get one version (like software downloads), but it does apply to files which I may decide to update—images, CSS, and JavaScript.

I chose to implement cache busting by adding a content hash to the filename. For example, the CSS file I’m using at the moment is named main.css, but when it’s deployed, it gets renamed to main.9b40a246aad2bac8.css.

This addresses the problem where I’d have to bypass the cache when reloading to see changes. In other words, I can just press F5 or ⌘R, instead of Ctrl-F5 or ⌘⇧R. I also liked the idea that I could set longer cache expiration on resources like CSS and images and not worry about what happens when I want to change them.
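The renaming step can be sketched as follows. The hash function and digest length are assumptions: the example filename has 16 hex digits, and this sketch takes them from a truncated SHA-256, which may not be what the generator actually used. `hashed_name` is a hypothetical helper.

```python
import hashlib
import posixpath


def hashed_name(filename: str, content: bytes, digits: int = 16) -> str:
    """Insert a truncated content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digits]
    stem, ext = posixpath.splitext(filename)
    return f"{stem}.{digest}{ext}"


# Same content always yields the same name; any change yields a new one.
print(hashed_name("main.css", b"body { margin: 0 }"))
```

Because the name depends only on the content, an unchanged file keeps its URL across deployments and stays cached.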

At this point, the script was 2,367 lines of Python code.

Rewriting in Go

In 2016, my opinions about Python were changing, and I wanted to move my website generator to Go. I liked Go’s simple type system, easy multithreading, and good libraries for working with things like HTML.

This rewrite apparently took a while. I deleted the Python website generator in October 2016, and implemented most of the new Go code in May 2017. The new Go code still worked with the Python Mako templates—the website tooling, written in Go, would run a child process written in Python to evaluate the templates.

Again, one of the crazy features was the cache busting. If I wanted to rename images, CSS, and JavaScript files so the filenames contained a hash of the content, the website generator had to process all the files in the correct order. You couldn’t compute the hash of one file until all the links in that file were updated, and you didn’t know which links a file had until you started processing it.

In a simple case, you might be processing a CSS file, but the CSS file contains a link to an image, so you need to process the image first (changing the filename) and process the CSS file afterwards (with the updated image link).

It would be easy to just process files type by type: images first, then CSS, and finally HTML. Unfortunately, there was one file that threw everything off—I used an external CSS stylesheet in an SVG file.

So of course I solved the problem by writing a more complicated, general-purpose website generator rather than simplifying my website.
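The ordering problem amounts to a depth-first traversal in which dependencies are discovered while a file is being processed. Here is a heavily simplified, single-threaded sketch in Python rather than Go. It assumes every reference looks like `url(name)` and that names are exact keys in the input, which the real site would not satisfy; all names in it are hypothetical.

```python
import hashlib
import posixpath
import re
from typing import Dict, Set, Tuple

# Assumption for this sketch: every reference looks like url(name) and
# names are exact keys in `sources`, with no relative path resolution.
LINK = re.compile(r"url\(([^)]+)\)")


def build(sources: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (old name -> hashed name, hashed name -> rewritten text)."""
    renamed: Dict[str, str] = {}
    outputs: Dict[str, str] = {}
    in_progress: Set[str] = set()

    def process(name: str) -> str:
        if name in renamed:
            return renamed[name]
        if name in in_progress:
            raise ValueError(f"circular reference involving {name}")
        in_progress.add(name)

        def rewrite(match: "re.Match[str]") -> str:
            target = match.group(1)
            if target not in sources:
                return match.group(0)  # external link, leave untouched
            # Dependencies are discovered here, while processing `name`,
            # and must be finished before `name` itself can be hashed.
            return f"url({process(target)})"

        text = LINK.sub(rewrite, sources[name])
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        stem, ext = posixpath.splitext(name)
        renamed[name] = f"{stem}.{digest}{ext}"
        outputs[renamed[name]] = text
        in_progress.discard(name)
        return renamed[name]

    for name in sources:
        process(name)
    return renamed, outputs
```

With this approach, file types do not matter: an SVG that references a stylesheet is handled the same way as a stylesheet that references an image. A change to an image also changes the name of every file that links to it, directly or indirectly.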

This Go system was rather complicated. It was 6,970 lines of code. It was multi-threaded.

Removing Python

In February 2018, I decided to remove the remaining Python code, remove Mako templates, and use Go for everything. Go has its own templating system which I could use. The Mako code looked like this:

<%inherit file="/_article.mako"/>
<%!
object = schema.Article(
    name='Chaos Tomb: Visualizing Gameplay with D3 and SQL',
    dateCreated='2015-05-07T04:00:00-07:00',
    keywords=['Ludum Dare', 'Chaos Tomb', 'Analytics'],
)
prettify = True;
css_files = ['analytics.css']
script_files = ['/assets/d3.js', 'map.js', 'charts.js', 'ui.js']
%>
<p>
  <a href="/games/chaos-tomb/">Chaos Tomb</a>
  is a little too hard. How do I know that?
</p>

After the switch to Go, it looked like this:

Name: Chaos Tomb: Visualizing Gameplay with D3 and SQL
Type: Article
Date-Created: 2015-05-07T04:00:00-07:00
Keywords: Ludum Dare, Chaos Tomb, Analytics

{{.AddCSS "analytics.css"}}
{{.AddScript "/assets/d3.js" "map.js" "charts.js" "ui.js"}}
{{.Prettify}}
<p>
  <a href="/games/chaos-tomb/">Chaos Tomb</a>
  is a little too hard. How do I know that?
</p>

Like the Red Queen says, “it takes all the running you can do, to keep in the same place.”

Glorious New Deployment System

Around April 2019, I got the itch to make my website deployment even faster. I had a dream of a reliable, sophisticated deployment tool.

The central idea was that I would have two tools. The first was “webgen”, the website generator, which was tailored to generate the files for my personal website and would be redesigned to produce a website package. The second was “webpkg”, which would operate on website packages: deploy them to web servers, serve them locally for testing, check them for broken links, and save old versions of a website so I could easily roll back to an earlier one.

Here’s what a package looks like:

Resource /
Content-Type: text/html; charset=UTF-8
Size: 16562
Hash: d51c49a3b4db0defa53e222b518219ecc0b6d9e4ff0ff446d54dd3a1d926c472
Data: build/files/min_0109.html

Resource /articles/
Content-Type: text/html; charset=UTF-8
Size: 1984
Hash: 6a47e6a189ab5c96064e1f42a9eda9997fe440548786cf8bd24f15f84da5969f
Data: build/files/min_0015.html

The webpkg tool took a while to write. I worked on it at various points in 2019 and 2020. It came to 11,504 lines of code, and it still can’t deploy to remote servers. I have to deploy to a local directory and then use rsync.

The First Step Is Acknowledging That You Have a Problem

Fast-forward to May 2021, and I’m working on a plan to simplify things.

Have you noticed a pattern? I seem to work on this around April or May: May 2014, April-May 2015, May 2017, April 2019, and now May 2021.

I think the only way to understand this is that the static website generator must not be something that I do in order to produce a website. Instead, it’s an activity that I enjoy, like rock gardening or watching TV.

It’s May again, so maybe what I can do this year is un-write some of the code I’ve written, un-implement features, and un-solve problems.

And in the future—don’t trust a coding session in May.