Another way to look at the validation problem is through the eyes of a software architect. We’re often concerned about “encapsulation”, the practice of keeping related business logic and data together. That practice can take the form of “objects” in OO systems, or just through the partitioning of data and processing logic in non-OO systems. In either form, the value there is that maintainability is improved.
Schemas, on the other hand, when used in a typical “gatekeeper” style – where the software doesn’t see the document until after it’s passed validation – break encapsulation by enforcing some business rules separately from the software responsible for them. In practice, it’s rarely the case that software doesn’t also have some of these rules, meaning that they’re duplicated, creating a maintainability issue.
Yet another way of looking at the problem is as a communication problem (or lack thereof in this case). In the scenario described in the previous post, the meaning of the document with the field value of “4” is still unambiguously clear to both sender and recipient, it’s just that the recipient isn’t expecting that specific value, and so it rejects the whole message as a result.
It’s true that there will be cases where unknown values like this need special attention; for example, if the “4” represented “nuclear core shutdown procedure #4” and the software only currently knows numbers 1 to 3. But we know how to handle those cases – by defining sensible fallback behaviour as part of our extensibility rules – and schemas play no part in that solution. More often than not, software which can handle the value “3” can also handle the value “4” without any trouble; an age of “200”, a quantity of “100000”, the year “8000”, a SKU following a non-traditional grammar, etc…, and the only reason we don’t accomodate those values is because we don’t currently believe them to be reasonable.
But that’s really no way to write long-lived software, is it? To expose – essentially as part of the contract to the outside world – a set of assumptions made at some soon-to-be-long-past point in time? Is this anything less than an announcement to the world of the inability of the developers of this software to adapt? Isn’t it also an implementation detail, further eroding the separation of interface and implementation?
If the message can be understood, then it should be processed.