XML News from Wednesday, April 21, 2004

Day 3 of XML Europe begins with Ken Holman talking about "Writing Formatting Specifications for XML Documents: A UBL Case Study." There are about 50 people in the room, a good turnout for 9:00 A.M. UBL is the Universal Business Language. This is based on what happened in the UBL Forms subcommittee. UBL is the payload for ebXML. ebXML says how things go around, not what goes around. The UBL 1.0 release package is being assembled this week.

Jon Bosak asked Ken to write some stylesheets for UBL 1.0, but the group didn't really know what they wanted the result documents to contain or look like. So Ken formed a committee to design standard forms presentations. These are not based on XSL-FO, but they do use XPath to identify the information in the documents to be displayed in the visual representation. Other technologies such as PDF or PostScript could be used instead.

Ken seems to believe the visual representation needs to be standardized, but this denies the value of XML. The normative document becomes the printout, not the XML.

I just noticed something weird: I've been at this conference for more than two days now and I haven't yet heard anyone say the words "architectural forms." Times have changed. I have heard a lot of people saying the words "RDF", "OWL", and "topic maps". I still don't really know what those words mean myself, but people are still saying them.

Second talk of the morning is Klaas Bals, CTO of Inventive Designers, the developer of the Scriptura XSL-FO rendering engine, on "Using XSL-FO 1.1 for Business Type Documents." He prefers XSLT pull (xsl:for-each, xsl:value-of) to XSLT push (xsl:apply-templates) for business type documents, because in a business type document such as an invoice, the layout drives where the content is placed. In forms there are lots of absolute positions and limited, if any, flowing of content.
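
To make the pull/push distinction concrete, here is a rough sketch in Python rather than XSLT. The invoice vocabulary, the function names, and the rule table are all made up for illustration; nothing here comes from the talk or from Scriptura. The pull renderer writes out a fixed layout and fetches each value by path, the way xsl:for-each and xsl:value-of do. The push renderer walks the document in its own order and lets matching rules decide the output, the way xsl:apply-templates does.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice; the element names are invented for this sketch.
INVOICE = """<invoice>
  <number>1042</number>
  <customer>Example Corp</customer>
  <line><item>Widget</item><amount>19.50</amount></line>
  <line><item>Gadget</item><amount>5.25</amount></line>
</invoice>"""


def render_pull(root):
    """Pull style: the layout is fixed here, and each slot in it
    reaches into the document for the value it needs."""
    out = ["INVOICE " + root.findtext("number"),
           "Bill to: " + root.findtext("customer")]
    for line in root.findall("line"):                    # like xsl:for-each
        out.append("  %s  %s" % (line.findtext("item"),  # like xsl:value-of
                                 line.findtext("amount")))
    return out


def render_push(node, rules):
    """Push style: walk the document in document order and let the
    rule matching each element decide what to emit."""
    rule = rules.get(node.tag)
    if rule is not None:
        return [rule(node)]
    out = []
    for child in node:                                   # like xsl:apply-templates
        out.extend(render_push(child, rules))
    return out


# One "template" per element name.
RULES = {
    "number": lambda e: "INVOICE " + e.text,
    "customer": lambda e: "Bill to: " + e.text,
    "line": lambda e: "  %s  %s" % (e.findtext("item"), e.findtext("amount")),
}

root = ET.fromstring(INVOICE)
pull_output = render_pull(root)
push_output = render_push(root, RULES)
```

For this input the two renderers produce the same lines. The difference shows up when the source order changes: moving the customer element above the number element reorders the push output but leaves the pull output alone, which is presumably why pull suits a form whose layout is fixed in advance.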

XSL-FO 1.1 is a working draft that will change in the future.

The absolute-position property can only be applied to fo:block-container!?!

BarcodeML is an XML application for bar codes that can be generated easily by XSLT. He suggests developing a ChartML for charts that would make it easier to generate charts from XSLT, and which could be processed by special purpose processors, just as MathML and SVG are processed by MathML and SVG renderers today. It's an interesting idea. It would certainly be easier than generating the SVG for a chart directly from the XML data using XSLT. You perhaps could implement a renderer on top of JFreeChart or similar libraries. I wish I had time to work on it. It might make a nice paper for XML 2004 in Philly in December, but I doubt I could do it in time for the deadline. Hmm, perhaps there's something like this built into OpenOffice or Excel? Do those products' XML formats use a special charting vocabulary or just a generic graphics vocabulary? I should check. He says Chrysalis has also begun work on a charting XML application.
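
As a thought experiment, here is roughly what a tiny special purpose processor for such a vocabulary might look like. To be clear, no ChartML exists as far as I know: the chart and point elements and their attributes are invented for this sketch, and the renderer is a toy that emits SVG rectangles using only the Python standard library, not JFreeChart.

```python
import xml.etree.ElementTree as ET

# Hypothetical chart vocabulary, invented for this sketch.
CHART = """<chart type="bar" title="Sales">
  <point label="Q1" value="30"/>
  <point label="Q2" value="45"/>
  <point label="Q3" value="15"/>
</chart>"""


def chart_to_svg(chart_xml, bar_width=40, gap=10, height=100):
    """Render the hypothetical <chart> document as a simple SVG bar chart."""
    chart = ET.fromstring(chart_xml)
    points = [(p.get("label"), float(p.get("value")))
              for p in chart.findall("point")]
    top = max(value for _, value in points)   # tallest bar fills the height
    width = len(points) * (bar_width + gap) + gap

    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                             "width": str(width),
                             "height": str(height + 20)})
    ET.SubElement(svg, "title").text = chart.get("title")
    for i, (label, value) in enumerate(points):
        bar_height = height * value / top
        x = gap + i * (bar_width + gap)
        ET.SubElement(svg, "rect", {"x": str(x),
                                    "y": "%.1f" % (height - bar_height),
                                    "width": str(bar_width),
                                    "height": "%.1f" % bar_height})
        # Label goes under the bar.
        text = ET.SubElement(svg, "text", {"x": str(x), "y": str(height + 15)})
        text.text = label
    return ET.tostring(svg, encoding="unicode")


svg_text = chart_to_svg(CHART)
```

The point of the exercise is the division of labor: the stylesheet would only have to produce the three point elements, and all the geometry would live in the renderer.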

Back to room D and the technical track for the final two sessions of the show. First Alex Brown is talking about "Refactoring XML." He wants to refactor XML itself, the technology, not XML instance documents. Oh god, he wants to talk about elements vs. attributes, again! Hasn't everything there is to say about this already been said? He thinks the question (which to use when) indicates a "bad smell" in XML. I disagree.

DocHeads (developers who work with narrative documents) work around XML by augmentation. DataHeads (developers who work with record-like documents) work around XML by reduction, e.g. SOAP forbidding processing instructions. Interesting point. It seems reasonable, and I hadn't thought of the split that way before.

He suggests that XML was not intended for human consumption based on the XML design goals, specifically "Terseness in XML markup is of minimal importance," "XML is verbose by design," and "XML is text, but isn't meant to be read." I disagree. First of all, he's confusing consumption with production. Secondly, I do not think lack of terseness is a problem for humans. He ignored Item 3, "XML documents should be human-legible and reasonably clear," and I can't find two of his other quotes in the spec. OK, they're from XML in 10 Points, not the spec. And the idea that XML isn't meant to be read is almost 180° wrong. Liam Quin appears to agree with me. He makes reference to Terry Pratchett's "Lies to Children", and suggests these goals are examples of such "Lies to Children." See The Science of Discworld. (Great book by the way. Last I checked it wasn't available in the States. You can order it from Amazon UK.)

He proposes to leave out everything from XML except tags and text: no processing instructions, DTDs, attributes, comments, etc. He wants to derive it from SGML rather than XML. He claims this will control proliferation of ad hoc XML subsets, and terminate permathreads on other formats. Then new lexical layers (non-angle brackets, short tags, SDATA entities, etc.) can be built on top of this. However, this is only for DataHeads, not DocHeads. The main benefit is simpler parser implementation (in other words a non-conformant parser that doesn't support real XML). Google Rick Jelliffe on Goldilocks and SML for a rebuttal.
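
For a sense of how small the proposed subset is, here is a rough sketch of a checker that reports which constructs a document uses beyond tags and text. It is my own illustration, not anything shown in the talk: the function name is hypothetical, and it leans on a full XML parser (Python's expat binding) to do the detecting, which is exactly the kind of parser the proposal wants to make unnecessary.

```python
import xml.parsers.expat


def beyond_tags_and_text(xml_text):
    """Return a sorted list of the constructs in xml_text other than
    elements and character data (hypothetical helper for illustration)."""
    found = set()
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # attrs is a dict of attribute names to values; empty if none.
        if attrs:
            found.add("attribute")

    parser.StartElementHandler = start_element
    parser.ProcessingInstructionHandler = (
        lambda target, data: found.add("processing instruction"))
    parser.CommentHandler = lambda data: found.add("comment")
    parser.StartDoctypeDeclHandler = (
        lambda name, sysid, pubid, has_internal_subset: found.add("DTD"))
    parser.StartCdataSectionHandler = lambda: found.add("CDATA section")

    parser.Parse(xml_text, True)   # True marks this as the final chunk
    return sorted(found)


minimal = beyond_tags_and_text("<a><b>text</b></a>")
richer = beyond_tags_and_text(
    '<?xml-stylesheet href="s.css"?><a id="1"><!-- note --></a>')
```

Here minimal comes back empty, while richer lists the attribute, the comment, and the processing instruction. Run something like this over real documents and I suspect very few would pass, even among the DataHeads.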

This is so wrong on so many levels, I don't know where to begin. I think this talk gets the booby prize for the single worst idea of the conference. Henry Thompson sums up, "Sorry, but no. I came with an open mind, but I'm not convinced."

Interesting historical point, according to Thompson, "Tim" (Berners-Lee? or Bray?) directly and personally rejected the original (processing instructions?) namespaces proposal, and Eliot Kimber walked out of the working group as a result. But in hindsight, Thompson thinks Tim was right. He didn't at the time.

Next Mark Birbeck is giving a late-breaking news presentation about XHTML and RDF (co-authored with Steven Pemberton). The goal is a new syntax for RDF and XHTML that would allow the two of them to integrate better.

HTML meta element's name attribute becomes the property attribute, which may appear on any element:
January 24, 2003
HTML href attribute becomes the resource attribute, which can appear on any element and makes that element a clickable link:
The England captain had his hair cut
The right choice of URIs is necessary to make this reliable metadata. The taxonomies are based on URIs.
Elements (not just head) can contain link and meta child elements to identify metadata about the element.
Add a content attribute. For instance this tells you which England captain (football or rugby) is referred to:
The England captain had his hair cut
Adds a datatype attribute:
2003-01-24
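
To get a feel for how a consumer might read these attributes, here is a rough sketch in Python. Only the attribute names (property, content, datatype) come from the talk. The sample markup, the dc:, xsd:, and ex: prefixes, the example.org URI, and the rule that content overrides the element's text are all my own guesses for illustration, not the presenters' examples or any published syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a guess at how the attributes might be used.
FRAGMENT = """<p>
  <span property="dc:date" datatype="xsd:date" content="2003-01-24">January 24, 2003</span>:
  <span property="ex:subject" content="http://example.org/people/football-captain">The England captain</span>
  had his hair cut.
</p>"""


def extract_metadata(xhtml_text):
    """Collect (property, value, datatype) for every element that
    carries a property attribute. The value is the content attribute
    if present, otherwise the element's text (an assumed rule)."""
    statements = []
    for element in ET.fromstring(xhtml_text).iter():
        prop = element.get("property")
        if prop is None:
            continue
        value = element.get("content")
        if value is None:
            value = "".join(element.itertext()).strip()
        statements.append((prop, value, element.get("datatype")))
    return statements


statements = extract_metadata(FRAGMENT)
```

With this fragment, statements holds the machine-readable date 2003-01-24 typed as xsd:date, plus the URI that pins down which captain is meant, while a browser would still show only the human-readable text. Note that the prefixes are plain strings inside attribute values, so nothing in the parser ties them to a namespace declaration.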

According to Pemberton, this is a snapshot of an unfinished work in progress. The XML limitation of one ID per element is apparently a problem that remains to be solved.

Overall, this seems interesting and it might be helpful, but it really doesn't do anything about the fundamental problem of getting content publishers to provide accurate, useful metadata. Maybe that's too harsh. This syntax would make adding metadata easier, which might expand its use somewhat. This syntax is a lot easier to stomach than traditional RDF syntax. Liam Quin points out that this has the problem of QNames in element content, which makes cut and paste fragile because you can lose the namespace declarations. Henry Thompson doesn't think this is such a big problem. Liam also notes validating these XHTML documents is a problem, but I don't see that. (Then again, I don't really care if my XHTML validates. This page doesn't.)

Conference chair Edd Dumbill is giving the closing keynote address on "The State of XML." "The state of XML is pretty good." He's been told Microsoft writes their schemas in RELAX NG and then translates them into W3C XML Schema Language. He's worried about a lot of the web services specs being developed outside the W3C (shows the Feigenbaum? Swale?, period doubling diagram) and suspects we're heading for a train wreck. He likes REST and document oriented web services. He's optimistic about XForms. "These days we all need to be librarians." He believes we need standard schemas and taxonomies to achieve interoperability. (I disagree.) Mobile devices (PDAs, etc.) will drive adoption of XHTML. He foresees more regulations governing the Web as it becomes more and more important to our daily lives. The general buzz of the conference is the human readability and editability of XML. Over 80% of attendees (at a previous conference) used a text editor to edit their XML. "A successful document type is a readable document type." Microsoft is starting to get this. Illegible XML is a problem for RDF. XML syntax may not be right for all applications (RELAX NG, RDF). He wants us to be inspired about the state of XML. His speech will be posted on xml.com tomorrow.

The conference didn't provide us with a CD or a printed copy of the proceedings. These should be posted on the conference web site soon, if they're not there already. I'll upload my own paper here on Cafe con Leche on Monday when I get back to the States.