Validation considered harmful
We don’t get a lot of enjoyment about trampling sacred cows here at Coactus, honest we don’t. But we see so much bad practice out there – more so recently – that we feel compelled to speak out.
Today’s sacred cow is document validation, such as is performed by technologies such as DTDs, and more recently XML Schema and RelaxNG.
Surprisingly though, we’re not picking on any one particular validation technology. XML Schema has been getting its fair share of bad press, and rightly so, but for different reasons than we’re going to talk about here. We believe that virtually all forms of validation, as commonly practiced, are harmful; an anathema to use at Web scale. Specifically, our argument is this;
Tests of validity which are a function of time make the independent evolution of software problematic.
Why? Consider the scenario of two parties on the Web which want to exchange a certain kind of document. Party A has an expensive support contract with BigDocCo that ensures that they’re always running the latest-and-greatest document processing software. But party B doesn’t, and so typically lags a few months behind. During one of those lags, a new version of the schema is released which relaxes an earlier stanza in the schema which constrained a certain field to the values “1”, “2”, or “3”; “4” is now a valid value. So, party A, with its new software, happily fires off a document to B as it often does, but this document includes the value “4” in that field. What happens? Of course B rejects it; it’s an invalid document, and an alert is raised with the human adminstrator, dramatically increasing the cost of document exchange. All because evolvability wasn’t baked in, because a schema was used in its default mode of operation; to restrict rather than permit.
Just because it only makes sense for a field in a document to contain the values “1”, “2”, or “3” today, does not necessarily mean “4”, “0”, or “9834” won’t be valid tomorrow. Similarly, just because a document doesn’t now contain a field named “Blarg”, it doesn’t mean it won’t later. A good rule of thumb in document design is to avoid making assumptions about what won’t be there in the future, and a rule of thumb for software is to defer checking extension fields or values until you can’t any longer.
On the Web, you need to be able to process messages from the future.
P.S. if you’re wondering what time-independent validity looks like, we’ll cover that at a later date (check the tags of this post for a hint).