XML News from Monday, December 8, 2008

The W3C XML Core Working Group has broken faith with the XML community by publishing an XML 1.0, fifth edition spec that is incompatible with all previous editions. The grammar has changed, and previously malformed documents are suddenly well-formed. Existing parsers cannot handle the syntax defined by this edition. XML 1.1 has failed, so now the W3C is trying to rewrite history and pretend that this is what they meant all along. (If that were true, why did we waste so much time on XML 1.1?) Apparently stability of standards is no longer a virtue at the W3C. This is even worse than the XML 1.1 debacle. At least there, the W3C admitted they were pushing a new, incompatible version, and gave users a hook (the version number in the XML declaration) to tell which version they were receiving. Now we don't even have that. As if XML weren't already confusing enough for people who don't spend 60 hours a week thinking about this stuff, now we have to explain that the well-formedness of a document depends on which version of which parser is being used and which edition of XML that parser implements, and no, there's nothing in the document to tell you which edition you should be using.

The ostensible goal of this edition is to improve the internationalizability of XML by enabling additional characters that might someday be needed by someone, somewhere to name an element or attribute. (Byzantine Greek musical symbols, anyone?) In practice, though, I fear it will do the opposite. The real rules are now far too confusing and far too poorly labeled for any person to follow, given the unadvertised version conflicts. The quick and dirty reality is now going to be, "Name everything with Latin-1". If you go beyond that, you're taking your chances that what works with your parser may not work with others'. Great for Western Europeans (except Greeks) and Americans; sucks for everybody else.
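To make the incompatibility concrete, here is a minimal Python sketch of the fifth edition's NameStartChar production. The function and variable names are my own, invented for illustration. The fourth-edition check is deliberately partial: it relies only on the fact that every name character allowed by the earlier editions lies in the Basic Multilingual Plane, and it does not reproduce the full fourth-edition Letter tables.

```python
# Code point ranges for NameStartChar, production [4] of XML 1.0, fifth edition.
NAME_START_RANGES_5E = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_name_start_char_5e(ch: str) -> bool:
    """True if ch may begin a name under the fifth edition grammar."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES_5E)


def certainly_not_name_start_4e(ch: str) -> bool:
    """Partial check only (an assumption-light shortcut, not the real table):
    the fourth edition permits no name characters outside the BMP, so any
    such character is definitely rejected by a fourth-edition parser."""
    return ord(ch) > 0xFFFF


# U+1D000 BYZANTINE MUSICAL SYMBOL PSILI
psili = "\U0001D000"
if is_name_start_char_5e(psili) and certainly_not_name_start_4e(psili):
    print("An element named U+1D000 is well-formed only under the fifth edition")
```

The same byte stream, with the same XML declaration, gets two different answers depending on which edition the parser's author happened to implement.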

Perhaps the time has come to say that the W3C has outlived its usefulness. Really, has there been any important W3C spec in this millennium that's worth the paper it isn't printed on? The W3C almost killed HTML, and browser vendors have effectively abandoned it. Between schemas and XML 1.0 5th edition, they seem intent on doing the same thing to XML. And don't get me started on the huge amount of effort and brain power being wasted on counting semantic angels on top of a URI-named pin. XSLT 2 and XPath 2 were stillborn, and the much more pragmatic XSLT 1.1 was killed. Maybe XQuery qualifies, but even that is far more complex and less powerful than it should be due to an excessive number of use cases and a poorly designed schema type system. I think we might all be better off if the W3C had declared victory and closed up shop in 2001.