Calendar

January 2007
M T W T F S S
« Dec   Feb »
1234567
891011121314
15161718192021
22232425262728
293031  

January 2, 2007

Starting with the Web

Filed under: architecture,integration,webarch,webservices — Mark Baker @ 11:07 pm

Starting with the Web

A position paper for the W3C Workshop on Web of Services for Enterprise Computing, by Mark Baker of Coactus Consulting.

My position

For those that might not be aware of my position regarding Web services, I will state it here for clarity.

I believe that the Web, since its inception, has always been about services, and therefore that “Web services” are redundant. Worse than that though, while the Web’s service model is highly constrained in order to provide important degrees of loose coupling, Web services are effectively unconstrained, and therefore much more tightly coupled. As a result, Web services are unsuitable for use at scale; certainly between enterprises, but even between departments in larger enterprises.

With that in mind …

Problem statement

I begin from the assumption then, that Web services are not part of the solution, and that the Web is. And while I believe that the Web is far more capable as a distributed system than commonly believed, there is also little doubt that it is lacking in some important ways from an enterprise perspective … just not as much as some believe.

I suggest that the problem statement is this;

How can the Web be used to meet the needs of the enterprise?

Part of the Solution

When looking at how to use the Web – how to add new “ilities” (architectural properties) or modify amounts of existing ones – we need an architectural guide to help us understand the tradeoffs being made by either relaxing existing constraints, or adding new ones. I suggest that the REST architectural style is a good starting point, because in addition to being rigorously defined and providing well documented kinds and degrees of properties, it is also the style used to craft the foundational specifications of the Web; HTTP and URIs.

Tradeoffs

I anticipate that most of the improvements that are developed will be REST extensions; solutions which provide additional constraints atop REST, and therefore don’t require disregarding any of REST’s existing constraints. In my experience, this is a desirable situation in general because of the emphasis of REST on providing simplicity, and the role many of its constraints have in providing it. In addition, on the Web, it means that network effects are maximized and centralization dependencies (and other independent evolvability issues) avoided. It’s also often achievable in practice, as long as the problem continues to involve document exchange, in my experience… but not always.

For example, when performance is a make-or-break proposition, and other avenues for improving it have been exhausted, sometimes the stateless constraint will need relaxing. This comes with reasonably large costs – reliability, scalability, and simplicity are reduced for example – but if those costs are fully appreciated and the benefits worth it, then it is the right choice. Because, of course, the solution isn’t REST as a panacea, it’s REST as a starting point, a guide.

The Enterprise Web?

So what features do enterprises need exactly? Some of the common ones mentioned include reliability, security, and transactions. Let’s dig a little deeper into those …

Reliability

Reliability is often mentioned by Web services proponents as a property in which the Web is lacking. But what is meant by “reliability” in that context? Commonly, it’s used to refer to a quality of the messaging infrastructure; that HTTP messages are not, themselves, reliable. That’s true of course, but it’s equally true of any message sent over a network which is not under the control of any single authority, like the Internet; sometimes, messages are lost.

The issue, therefore, is how to build reliable applications atop an unreliable network.

In general, there is little that can be done for a Web services based solution. As arbitrary Web services clients and servers share knowledge of little more than a layer 6 message framing technology and processing model (SOAP), the only option for addressing the problem is at this level, using that knowledge. So we can take actions such as assigning messages unique identifiers in an attempt to detect loss or duplication, but not much else. Solutions at this level tend to do little more than tradeoff latency (i.e. wait time for automated retries) for slightly improved message reception rates. But even then, applications still have to deal with the inevitable case of message loss. Some might even argue that “reliable messaging” is an oxymoron, and that RM solutions are commonly attempts to mask the unmaskable.

Web servers and clients though, share a lot more than just a message framing and processing standard; they share a standardized application interface (HTTP GET, PUT, POST, etc.. as well as associated headers and features), and a naming standard (URIs), both richer (layer 7) artifacts that a generic Web services solution can’t take advantage of. As a result, the options for providing reliability are greater and, in my experience, simpler.

Consider that a reliable HTTP GET message would make little sense, as HTTP GET is defined to be safe; if a GET message is lost then another can be dispatched without fear of the implications of two or more of those messages arriving successfully at their destination. The same also holds for PUT and DELETE messages; though they’re not safe, they are still idempotent. Known safe and/or idempotent operations are a key tool in enabling the reliable coordination of activities between untrusted agents. Most important though, is that there exist known operations, as this makes a priori coordination possible in the first place.

HTTP POST messages, however, due to POST being neither safe nor idempotent, have some of the same reliability characteristics as do Web services. But despite this, the use of POST as an application layer semantic, together with standardized application identifiers, permits innovative solutions such as POE (POST Once Exactly), which uses single-use identifiers to ensure “only once” behaviour.

The “404″ response code is also sometimes mentioned as a reliability problem with HTTP, but it is not at all; quite the opposite, in fact. The 404 response code, once received, reliably tells the client that the publisher of the URI has, either temporarily or permanently, detached the service from that URI. A distributed system without an equivalent of a 404 error is one which cannot support independent evolvability, and therefore is inherrently brittle and unreliable.

Security

Security on the Web today is likely the best example of how the Web itself has been limited by the canonical “Web browsing” application. The security requirements for browsing were primarily to provide privacy, and the simplest solution providing privacy was to use SSL/TLS as a secure transport layer beneath HTTP (aka “HTTPS”).

For the enterprise, and for enterprise-to-enterprise integration in particular, privacy is necessary, but insufficient in the general case as requirements such as non-repudiation, intermediary based routing, and more are added (others can better speak to these requirements than myself). Of course, TLS doesn’t really begin to address these needs, and work is clearly needed.

Luckily, I’m not the first person to point this out. There have been at least two initiatives to add message-level security (MLS) to HTTP; SEA from 1996, HTTPsec from 2005. One might also consider S/MIME and OpenPGP as simple forms of MLS insofar as they’re concerned with privacy and sender authentication.

Interestingly, message-level security is actually more RESTful than transport level security in two respects; that visibility is improved through selective protection of message contents (versus encrypting the whole thing with transport layer security), and that layering and self-descriptiveness (possibly including statelessness) are improved by using application layer authentication instead of (potentially – when it’s used) transport layer authentication.

Transactions

This is obviously an immensely broad topic that we couldn’t begin to cover in a position paper in detail sufficient to actually make a point. Suffice it to say though, that as with reliability, my position is that the space of possible solutions is transformed considerably by the use of a priori known application semantics (and otherwise within the constraints of the REST architectural style), and therefore that solutions will look quite unlike those used for Web services.

Perhaps there will be an opportunity to work through a scenario at the workshop, if there’s enough interest.

Conclusion

The Web offers a more highly constrained, more loosely coupled service model than do “Web services”. While enterprises can often afford the expense of maintaining and evolving tightly coupled systems, inter-enterprise communication and integration is inherrently a much larger scale task and so requires the kind of loose coupling provided by the Web (not to suggest the Web has a monopoly on it, of course). Even within the enterprise though, loose coupling using existing pervasively deployed standards (and supporting tooling, both open source and commercial) could offer considerable benefits and savings. REST should be our guide toward that end.

• • •

Two more reasons why validation is still harmful

Filed under: integration — Mark Baker @ 12:45 am

Another way to look at the validation problem is through the eyes of a software architect. We’re often concerned about “encapsulation”, the practice of keeping related business logic and data together. That practice can take the form of “objects” in OO systems, or just through the partitioning of data and processing logic in non-OO systems. In either form, the value there is that maintainability is improved.

Schemas, on the other hand, when used in a typical “gatekeeper” style – where the software doesn’t see the document until after it’s passed validation – break encapsulation by enforcing some business rules separately from the software responsible for them. In practice, it’s rarely the case that software doesn’t also have some of these rules, meaning that they’re duplicated, creating a maintainability issue.

Yet another way of looking at the problem is as a communication problem (or lack thereof in this case). In the scenario described in the previous post, the meaning of the document with the field value of “4″ is still unambiguously clear to both sender and recipient, it’s just that the recipient isn’t expecting that specific value, and so it rejects the whole message as a result.

It’s true that there will be cases where unknown values like this need special attention; for example, if the “4″ represented “nuclear core shutdown procedure #4″ and the software only currently knows numbers 1 to 3. But we know how to handle those cases – by defining sensible fallback behaviour as part of our extensibility rules – and schemas play no part in that solution. More often than not, software which can handle the value “3″ can also handle the value “4″ without any trouble; an age of “200″, a quantity of “100000″, the year “8000″, a SKU following a non-traditional grammar, etc…, and the only reason we don’t accomodate those values is because we don’t currently believe them to be reasonable.

But that’s really no way to write long-lived software, is it? To expose – essentially as part of the contract to the outside world – a set of assumptions made at some soon-to-be-long-past point in time? Is this anything less than an announcement to the world of the inability of the developers of this software to adapt? Isn’t it also an implementation detail, further eroding the separation of interface and implementation?

If the message can be understood, then it should be processed.

• • •
Powered by: WordPress • Template by: Priss