XML News from Wednesday, January 26, 2005

The IETF has released RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, as an official standard (only the 66th they've published). This replaces RFC 2396 as the official definition of URIs, and it's about time. RFC 2396 looked simple on the surface, but when you started digging into it, it became obvious that it had lots of unaddressed cases and unanswered questions. 3986 is a vast improvement. For instance, it finally requires that non-ASCII characters be encoded in UTF-8 prior to percent escaping, rather than in whatever character set the author happens to prefer. Given the increasing importance of URIs to everything from billboards to the semantic web, it's a wonder we've come as far as we have on such shaky foundations.
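
To see why the choice of character set matters, here is a minimal sketch using Python's standard urllib.parse module. The sample character é is my own choice for illustration; it does not come from the RFC. The same character yields different escape sequences depending on the encoding applied before percent escaping, and a consumer that assumes the wrong one cannot recover the original text.

```python
from urllib.parse import quote, unquote

text = "é"  # U+00E9, an arbitrary non-ASCII sample character

# Percent-escape the same character under two different encodings.
latin1_form = quote(text, encoding="latin-1")  # one byte: E9
utf8_form = quote(text, encoding="utf-8")      # two bytes: C3 A9

print(latin1_form)  # %E9
print(utf8_form)    # %C3%A9

# A consumer that assumes UTF-8 cannot decode the Latin-1 form:
# the lone byte E9 is not valid UTF-8, so it becomes U+FFFD.
misread = unquote(latin1_form, encoding="utf-8", errors="replace")
print(misread == "\ufffd")  # True
```

When everyone agrees on UTF-8, the escaped form round-trips without any out-of-band knowledge of the author's character set.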


In addition, the IETF has issued RFC 3987, Internationalized Resource Identifiers (IRIs), as a proposed standard. IRIs are basically the same as URIs except that you don't have to escape non-ASCII characters, so you can write http://www.cafeconleche.org/reports/cω.html instead of http://www.cafeconleche.org/reports/c%CF%89.html. A lot of XML-related specs such as XInclude either implicitly or explicitly use IRIs instead of URIs.
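
The relationship between the two forms can be sketched in a few lines of Python. The function name iri_to_uri is hypothetical, and this is a simplification, not a full implementation of the RFC 3987 mapping: it skips Unicode normalization and does not apply IDNA to host names. It simply replaces each non-ASCII character with the percent-escaped bytes of its UTF-8 encoding and leaves ASCII untouched.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-escape every non-ASCII character as its UTF-8 bytes.

    Simplified sketch: no Unicode normalization, no IDNA handling of
    the host name, and existing ASCII characters are left as they are.
    """
    pieces = []
    for ch in iri:
        if ord(ch) < 128:
            pieces.append(ch)  # ASCII passes through unchanged
        else:
            # Each UTF-8 byte becomes %XX with uppercase hex digits.
            pieces.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(pieces)


uri = iri_to_uri("http://www.cafeconleche.org/reports/cω.html")
print(uri)  # http://www.cafeconleche.org/reports/c%CF%89.html
```

The ω (U+03C9) encodes to the two UTF-8 bytes CF 89, which is where the %CF%89 in the URI form comes from.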