applied cartography

One status field per model

Any sufficiently old application starts to succumb to a pernicious form of technical debt known in street parlance as shitty data modeling.

Sometimes this manifests as the god object: a single model that represents the user, and the settings, and the DNS configuration, and twelve other things. Sometimes it takes the form of a table (or multiple tables) where the data modeling concerns from the early going of the project don't quite match the reality discovered along the way, and a series of subtle mismatches grind against each other the way tectonic plates do.

Data models, unlike other areas of tech debt, are correctly scary to refactor. Even in Django — an application framework with really robust, mature migration tooling — reshaping data in production is non-trivial. The weight associated with even relatively simple schema changes can be so overwhelming as to forever dissuade a would-be re-architect from making things right.

Therefore, it is that much more important to spend the extra mental energy early on to make sure, whenever possible, that your data model is a roughly correct one, and to course-correct early when it isn't.

There are many ways to do this, and the goal of describing a virtuous data model in its entirety is too large and broad a problem for this measly little essay. Instead, I want to share a heuristic that I have found particularly useful — one which is summed up, as many of my blog posts are, in the title.

Every data model must have at most one status field.

If you're thinking about making a change such that a model has more than one status field, you have the wrong data model.


Let me illustrate via self-flagellation and talk about Buttondown's own problematic model: the Newsletter object.

The Newsletter object has three status fields within its lush, expansive confines:

  1. A status field (the normal one)
  2. A sending_domain_status field
  3. A hosting_domain_status field

This is incorrect. We should have created standalone models for the sending domain and hosting domain, each with a simple status field of its own, and drawn foreign keys from the Newsletter onto those. We did not do this, because at the time it felt like overkill.
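Here is a minimal sketch of that shape, written with plain dataclasses and enums rather than Django models so that it runs on its own. The class names, the status values beyond `active` and `pending`, and the `status_field_names` helper are all hypothetical; this is not Buttondown's actual code.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class NewsletterStatus(Enum):
    ACTIVE = "active"
    DEACTIVATED = "deactivated"  # assumed value, for illustration only


class DomainStatus(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    FAILED = "failed"  # assumed value, for illustration only


@dataclass
class SendingDomain:
    name: str
    status: DomainStatus = DomainStatus.PENDING


@dataclass
class HostingDomain:
    name: str
    status: DomainStatus = DomainStatus.PENDING


@dataclass
class Newsletter:
    name: str
    status: NewsletterStatus = NewsletterStatus.ACTIVE
    # Stand-ins for nullable foreign keys onto the standalone domain models.
    sending_domain: Optional[SendingDomain] = None
    hosting_domain: Optional[HostingDomain] = None


def status_field_names(model):
    """Return the names of the fields on a dataclass that look like status fields."""
    return [f.name for f in fields(model) if "status" in f.name]


newsletter = Newsletter(
    name="example",
    hosting_domain=HostingDomain(name="news.example.com"),
)
# The domain's lifecycle advances without touching the newsletter's own status.
newsletter.hosting_domain.status = DomainStatus.ACTIVE
```

Each class now carries exactly one status, so "is the hosting domain ready?" is a question you ask the hosting domain rather than the newsletter.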

And so. You pay the price: not in any one specific bug, but in weirdness, in the difficulty of reasoning about the code. Is there a meaningful difference between a newsletter with a status of active and a hosting_domain_status of active, versus one with a status of active and a hosting_domain_status of pending? Which queries should return which combinations? The confusion compounds.
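To put a rough number on that compounding, here is a small sketch that counts the combined states three independent status fields can produce on one model. The sets of status values are assumed for illustration; only `active` and `pending` are mentioned above.

```python
from itertools import product

# Assumed value sets; the real fields may allow more or fewer values.
newsletter_statuses = ["active", "deactivated"]
domain_statuses = ["pending", "active", "failed"]

# Every (status, sending_domain_status, hosting_domain_status) triple
# is a state that queries and business logic have to account for.
combinations = list(
    product(newsletter_statuses, domain_statuses, domain_statuses)
)

print(len(combinations))  # 2 * 3 * 3 = 18
```

Even with these small value sets, one model ends up with eighteen possible states, and every new status field multiplies that number again.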

Again, I know this sounds trivial. But every good data model has syntactic sugar around the state machine, and every good state machine has a unary representation of its state. See also: enums.


About the Author

I'm Justin Duke — a software engineer, writer, and founder. I currently work as the CEO of Buttondown, the best way to start and grow your newsletter, and as a partner at Third South Capital.
