Skip to main content

The only constant in life is change.

Heraclitus

Improving stability and readability with Enum data type

When designing databases for web applications it is common practice to use a tiny int data type for configuration settings, published state or active/inactive toggles.

From a programmer’s perspective this makes perfect sense. Because most programmers love ones & zeros and because most programming languages can interpret ones & zeros as boolean values.

Let's say for example we have a document table with a published field. The published field has two possible states.

  • 0 = unpublished
  • 1 = published

If I wanted to set the published state I could do something like this

$document->published = true;

$otherDocument->published = false;

Checking the published state is easy too.

if ($document->published) {
     // do something with published documents
}

All of these examples are very readable and close to natural language, however there are some limitations to this approach that can be fixed through the use of the Enum data type.

Data integrity

The first limitation with using the tiny int data type is inability to insure data integrity. At the beginning of the example I defined the implied encoding rule being used for the $document->published column. (0 = unpublished & 1 = published)

However since there is no actual way of enforcing the encoding rule, it is possible to introduce a bug through improper encoding. Since there would be no warning to indicate that an improper encoding error had occurred, you wouldn't know until the inevitable customer complaint.

Because the Enum data type makes the encoding explicit and prevents invalid values, it giving us a direct way to define, enforce and maintain data integrity.

Complexity Management

Another limitation of the tiny int approach is that its implied encoding only makes sense with binary options. As an application matures, it is often the case that what first appeared to be a simple binary requires a more fine grained approach.

Continuing with the $document->published example, most document management applications support additional concepts like draft documents and archived documents. In order to support these new concepts using the tiny int, we have to increase the complexity of our encoding rules.

This increases the likelihood of a bug being introduced because of improper encoding. Additionally as the complexity of the encoding rules increases, the readability of our code base decreases. Because the encoding rules are completely dependent on specialized knowledge.

Without knowing the encoding rules, can you tell me with complete confidence what the published state of document is in the code below?

$document->published = 2;
$otherDocument->published = 3;  

Of course there are ways to mitigate confusion in the code base. We could perhaps create global constants for each value and use those to set the published state. But this still depends on the programmer having special knowledge that such global constants exist.

Both approaches increase the complexity of the implementation without sufficiently mitigating the risk of improper encoding bugs.

Using the Enum data type we avoid all this complexity by simply not double encoding our values. Can you tell me with complete confidence what the published state of the document is in the code below?

$document->published = ‘draft’;
$otherDocument->published = ‘archived’;  

The Enum data type doesn’t require specialized knowledge because the acceptable values are part of the fields definition. That is to say if you know there is an Enum field called published then you also know all the acceptable values of that field.

Even if you forget, the data integrity protection mentioned earlier will give you a clear error warning informing you of the mistake well before it ever makes it to production.

Data Readability

Another benefit of using Enum values instead of tiny int is that the data become readable in its raw form. Compare the two tables below and ask yourself which one is easier to understand

tiny int:
id title text published
1 Encapsulation One of the… 1
2 Why I don’t use else Many years … 2
3 Structured HTML form data Anyone who … 3
4 Improving stability and ... When designing… -1
Enum:
id title text published
1 Encapsulation One of the… published
2 Why I don’t use else Many years … draft
3 Structured HTML form data Anyone who … archive
4 Improving stability and ... When designing… trashed

Some of you might feel that this example is unfair because I introduced a published state that hasn't been discussed up to this point, so there was no way for you to guess what -1 meant.

You are right that it is unfair, but in my experience this is a very realistic representation of what working on an application that depends on special knowledge is like.

Conclusion

It seems to me that in order to create more resilient and flexible applications, it is important to reduce unnecessary complexity, use methods that enforce data integrity and eliminate specialized knowledge dependencies whenever possible.

I have found that using the Enum data type, instead of double encoded techniques like the tiny int, to be highly effective at achieving those goals. I hope you'll consider using them in your project.