This post is the third in a series of arguments for using relational databases instead of, say, flat files. As I mentioned previously, I'm trying to steer clear from the "normal" reasons one often hears (integrity, transactions, anomaly reduction, yadda, yadda).
In relational-speak, null is interpreted as either "the absence of any meaningful value" or simply "unknown". The distinction can be subtle at times. Apparently Codd considered four-valued logic to better delineate the meanings, but that way lies madness.
The relational null is special. For example, a Boolean data value can be True, False, or null. A numeric zero is distinct from null, as are the infinities. Empty strings are not null either. With the exception of SQL's
is identity operator, any expression containing null will evaluate to null. Null is not equal to null. Even more interesting, null is not not equal to null.
Null is the void; a black hole of nothingness.
The relational null doesn't exist in most imperative languages. The keyword "null" does pop up from time to time, but not in the context of true ternary logic. For example, the C language uses both
NULL (the end-of-string guard, and an invalid pointer address, respectively) but these are just aliases for zero. Other languages use similar terms:
Because of this, implementing null in a flat file becomes a bit of a challenge. How do you distinguish zero, the empty string, an empty list, or even False from null? Without special care, you can't. It's usually a simple matter to discern a null in a numeric or Boolean field (leave 'em blank). Things get murky for other datatypes. Leaving a string value empty might mean either an empty string or null (preferably the former). Which will it be? The inevitable "solution" is to encode a special sentinel value -- like "
***THIS IS A NULL***" or perhaps something less flamboyant -- but the developer is ever after left with the uneasy feeling that one of the infinite monkeys using the system will someday enter "
***THIS IS A NULL***" into a form, wreaking system-wide havoc.
Even if you can come up with an untypeable sentinel value, you now have to make sure that expressions involving your sentinel are evaluated according to the rules of ternary logic. If you concatenate "
Bill" with "
***THIS IS A NULL***", do you get "
Bill ***THIS IS A NULL***", "
Bill", or "
***THIS IS A NULL***" as the result? Hint, null concatenated (or added, multiplied, compared, inverted, exclusive-ored, etc.) to anything is null.
All of this begs the question: why use nulls at all? Well, if you can avoid them, more power to you. One of the goals when designing relational databases is to minimize the need for nulls.
But sometimes you just can't avoid the damned things. Take the example of recording the gender of people (e.g. employees, customers). What could be more straightforward? Male or Female. Case closed. Except real life always injects exceptions into the perfect design. Leaving aside hermaphrodites or SNL's Pat, there will always be cases where the gender field can't be filled in: it's not known yet, government regulations forbid asking in this one instance, or it's simply irrelevant for the task at hand. Without a null possibility, records end up with random genders or default male or female. Either way the data is incorrect rather than merely incomplete.
Long story short, nulls are an indispensable tool for dealing with the vagaries of meatspace data. That they're incorporated into every relational datatype is recognition of this fuzzy state of affairs. Rolling your own just to avoid RDBMS overhead is a false economy.
- John Marabeas on 20080324.Monday:
With all you said, would it be okay to include a "notes" field (tinyblob datatype, let's say) in, say, a members table even though not every member will have a note associated with him/her? That being the case, would I assign a "NULL" value to the field so as waste a minimum amount of storage space if the field is left blank for those members with no notes? (Or should a separate table me made (called "notes") such that it will only contain information if there are notes associated with those members who have them)?
- Dave on 20080327.Thursday:
John, I wouldn't worry about space, since a text/blob or varchar column will only take up a small amount of overhead if left empty and you may not get any of that back for a NULL anyway. The separate table idea is probably the most space-efficient, although it allows an item to have multiple notes if you're not careful about the PK (but maybe that's not a problem anyway). As for allowing NULL values for your notes or comments, unless you can clearly define the difference between a NULL note and an empty note, and see that definition as useful, I'd go with NOT NULL. NULL has its place, but avoiding it should be your default option.