Cafe con Leche News Monday, December 3, 2007

I've arrived at XML 2007 in Boston. There's some snow on the ground (though not a huge amount by Boston standards) so it's not clear how many people will get here in time for the morning keynote. I am glad I stayed in the conference hotel instead of the cheaper one a few blocks away. Normally it wouldn't be a big deal, but in the snow it's trickier.

I brought the camera to this show, so hopefully I can get a few pictures. However, it seems my ancient copy of Photoshop Elements 2 won't run on the Intel MacBook and/or Leopard, so my editing capabilities are limited.

This is the first time I've taken the new MacBook out for a spin in public. The battery definitely does not deliver the advertised four hours. It's running down in two or less.

This keyboard is going to take a little getting used to. I really miss my home, page up and page down keys. Seems like Function-up arrow and Function-down arrow do page up and page down respectively. I wonder if Function-left arrow and Function-right arrow do home and end? Yep, looks like they do.

The text cursor seems to disappear a lot. (I've used Universal Access to make it bigger. That may help. Hmm, no, it doesn't. The text cursor just takes too long to show up when I use the trackpad. That's weird.) I'm also not accustomed to Leopard yet, and not all of the preferences seem to have transferred over from my desktop. Among other things I seem to have lost all my keychain items. I hope I can remember the passwords to everything. Hopefully this won't be a big deal for the PowerPoint slides I have prepared for tonight.

Over 300 people are attending, a little drop-off from last year. The printed program is out of date. Check online.

The first keynote is a panel with Michael Day, Douglas Crockford, and C. Michael Sperberg-McQueen on "Does XML Have a Future on the Web?"

David Megginson, C. Michael Sperberg-McQueen, Douglas Crockford, Michael Day, IDEAlliance host

Michael Day

No. XML is used more on the server than sent directly to the client. "XHTML" is rarely well-formed. Will not replace HTML on web sites.

Douglas Crockford

"Certainly yes, and I'd offer as evidence of that you can still buy Cobol compilers." However "it's clearly trending down." "XML is really not a very effective data format." JSON is just easier to use. The Web itself is in danger as a result of the XML adventure. There has been no progress made on the Web since 1999. Security is the major issue. We've been too distracted by XML to give HTML the repairs it needs. (First thing he's said I agree with.) He wants to reexamine HTML and DOM and JavaScript with security in mind.

In Q&A, he elaborates that the problem is that different pieces of the HTML page (including JavaScripts and mashup programs from different sources) are not separated from each other and can each see the whole page. Multiple languages--HTML, XML, JavaScript, etc.--make securing this and finding the evil scripts inside the data very difficult. Good points, however when McQueen challenges him on JSON security, he's in complete denial about the specific security issues that he introduced with JSON. (And this continues in later Q&A.)

C. Michael Sperberg-McQueen

Yes, and "it ought to have a future on the Web, and it depends in part on what you mean by the Web." The Web is a "single connected information space," not even just HTTP. Internationalization and accessibility are keys. Compromised notations for specialized niches have non-trivial costs. We need loose coupling between client and server. He focuses on writing things. He needs richer markup than down translation to HTML allows. He publishes XML on the Web. "XML will die when you rip it out of my cold dead hands."

In Q&A he suggests that the WhatWG is broken. They are defining a parser spec rather than a language spec. He thinks that if publishers cared about interoperable interpretation of their documents they'd publish valid HTML.

For the next session, I have the choice between two different flavors of snake oil (microformats and XML hardware) and something really boring and mostly irrelevant (DITA). Maybe I'll flip a three-sided coin. OK. Microformats wins.

Melissa Utzinger from the Mitre Corporation is giving a basic introduction to microformats. Firefox 3 has an API for this. There are some Firefox extensions for editing these. Google Maps, Yahoo Local, Yahoo tech, Flickr are using this.

Melissa Utzinger

I'm not sure why, but this site is not updating as fast as it should. I'm not sure if it's a server caching issue or the local wireless network here at the hotel is caching or just what is going on. Hmm, looks like it's on the server. I've ssh'd into the server and used lynx from there and I still see a non-updated page. Hmm, wait: it does look like a client side SFTP problem. Maybe the local WAN or Cyberduck doesn't work with Leopard? I'll try updating it.

The new version of Cyberduck does seem to be more stable. However, it's changed the Upload menu accelerator from Command-U to Option-Up arrow. I hate it when programs do that. My fingers remember Command-U.

E-mail also seems to be a problem. My Speakeasy account works, but IBiblio/Metalab doesn't unless I turn off encrypted connections. I'll have to change my passwords once the conference is over. This may be the wireless proxy/firewall or it may be IBiblio. (Their server certificate expired unexpectedly on Thanksgiving, and they're running on a self-signed certificate for the moment.)

For the first afternoon session, we're not sure if the speaker is going to show up. Hmm, looks like he/she didn't. I'm switching over to the XML on the Web track for Mark Pruett talking about "Taming XML in Ajax". AJAX apps are one-page applications. (I knew there was something fishy about them. He just put his finger on it. Different resources should have different URLs. You shouldn't be able to change the resource without changing the URL, but too many AJAX apps obviously do that. Some AJAX apps like GMail get this right--each message has its own URL--but too many don't. We need more granularity than one URL per application. We need separate URLs/URIs for different states of the application. These URLs may be client generated, and the server from which the application code was downloaded may never even see them; but we still need the URLs.)
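One way to picture "separate URLs for different states" is to serialize the application state into the fragment part of the URL, which the browser keeps on the client and does not send to the server. The following is only a minimal sketch of that idea, not anything shown in the talk; the function names, the example.com URL, and the state keys are all hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

def state_to_url(app_url, state):
    """Encode a dict of application state into the URL fragment.

    Keys are sorted so the same state always yields the same URL.
    """
    return f"{app_url}#{urlencode(sorted(state.items()))}"

def url_to_state(url):
    """Recover the state dict from a URL built by state_to_url."""
    return dict(parse_qsl(urlsplit(url).fragment))

# Hypothetical one-page mail app: each message gets its own URL.
url = state_to_url("http://example.com/mail", {"view": "message", "id": "42"})
restored = url_to_state(url)
```

Because everything after the `#` stays in the browser, such URLs can be bookmarked and shared even though the server that delivered the application code never sees them.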

He's demoing four approaches to building one simple AJAX weather app. The same-domain problem is an issue. Approach #1 talks to a server based proxy that talks to the National Weather Service to get around this. Approach #2 uses server side XSLT. Approach #3 uses browser side XSLT. Approach #4 uses Yahoo Pipes and JSON. Apparently there's a script tag hack that can completely get around the cross-domain limitation. This opens up security issues, but he thinks these are not a big deal if you're just loading XML data.
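The script tag hack is presumably what is usually called JSONP: the page adds a script element pointing at another domain, and the remote server wraps its JSON in a call to a function the page has already defined. That identification is an assumption, as are the function name `jsonp_body` and the `showWeather` callback below; this is only a sketch of what the server side of such an exchange might produce, not the speaker's code.

```python
import json
import re

# Only allow plain JavaScript identifiers as callback names; echoing an
# arbitrary string back into a script would let a caller inject code.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def jsonp_body(callback, data):
    """Wrap JSON data in a call to the client-named callback function."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("callback must be a simple identifier")
    return f"{callback}({json.dumps(data)});"

body = jsonp_body("showWeather", {"temp_f": 28})
```

The response is executable script rather than inert data, so the page is trusting the remote server with everything on the page. That is the security trade-off mentioned above.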

In the next session Kurt Cagle talks about "The Trouble with DOM" and/or "Lightweight XML: An Exploration of E4X". Only he's not here. He's talking over the Net. Dan McCreary is hosting locally. Weird.

DOM is semantically neutral. The initial setup cost to use XPath and XSLT is a problem. The plumbing costs too much.

JSON is not a good document format because of "unique addressability". There can only be one value per key name. (I'm not sure that's true.) You can have lists or maps, but not both at the same time. "JSON is a degenerate case of XML." He warns us not to tell the AJAX people this because they'll get upset, but he doesn't know Douglas Crockford is sitting in the room. The disadvantages of a long-distance presentation. :-)
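The one-value-per-key claim is easy to poke at. RFC 4627 only says that names within an object SHOULD be unique, so what happens with duplicates is up to the parser. The sketch below shows what Python's standard json module does, which is one parser's behavior and not a statement about JSON in general; the sample data is made up.

```python
import json

DUPLICATE_KEYS = '{"author": "Crockford", "author": "Cagle"}'

# Default behavior: the object becomes a dict, so the last duplicate wins.
last_wins = json.loads(DUPLICATE_KEYS)

# object_pairs_hook receives every name/value pair in document order,
# so a parser can keep both values if it wants to.
all_pairs = json.loads(DUPLICATE_KEYS, object_pairs_hook=list)

# Maps and lists nest freely: here a map holds a list of maps.
mixed = json.loads('{"conf": "XML 2007", "speakers": [{"name": "Day"}, {"name": "Walsh"}]}')
```

So repeated names are at least syntactically possible, though a typical API exposes only one of them, and a single document can certainly mix lists and maps by nesting.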

In the third afternoon session I listen to Norm Walsh from Sun talk about XProc: An XML Pipeline Language. They're running late. They'll have to go back to second last call working draft. Should be finished by the Spring. XProc specifies what should be done to which XML documents and in what order. For example, XInclude, then validate; or validate, then XInclude. The output of one step flows into the input of the next step. Steps may have options (expected) or parameters (unexpected). It should be amenable to streaming.

<p:xslt name="db2html">
  <p:input port="source">
    <p:pipe step="expand"/>
  </p:input>
  <p:input port="stylesheet">
    <p:document href="docbook.xsl"/>
  </p:input>
  <p:option name="initial-mode" select="$imode" />
  <p:parameter name="foo" value="bar" />
</p:xslt>

Each step has a type and a name.
Steps have named input and output ports, which are parts of the signature. (How to handle multiple inputs and outputs such as may go into XInclude or come out of XSLT? There's a secondary output port that returns zero or more documents. Is there such a thing as an EntityResolver step that can map URIs to other sources? No, there isn't.)
Primary output of one step is default primary input of next step.
Literal inputs are allowed via p:inline.
XPath 2 is allowed.
Compound steps contain other steps. Users cannot define these, only atomic steps.
There is iteration for operating on a bunch of documents with the same step.
Selective processing handles data islands via p:viewport.
30 required atomic steps: add-attribute, add-xml-base, etc.
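The point about ordering (XInclude then validate, or validate then XInclude) can be illustrated outside XProc. This is a rough Python analogy, not XProc itself: each step is a function from document to document, and the pipeline feeds each step's output into the next. The step names, the in-memory fragment table, and the toy "a book must contain a chapter" validation rule are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" so the sketch needs no disk or network.
FRAGMENTS = {"ch1.xml": "<chapter><title>Intro</title></chapter>"}
BOOK = ('<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="ch1.xml"/></book>')

def load_fragment(href, parse, encoding=None):
    """Loader for ElementInclude: return the parsed fragment for href."""
    return ET.fromstring(FRAGMENTS[href])

def xinclude_step(doc):
    """Replace xi:include elements in place with the documents they name."""
    ElementInclude.include(doc, loader=load_fragment)
    return doc

def validate_step(doc):
    """Toy validation: a book must have at least one chapter child."""
    if doc.find("chapter") is None:
        raise ValueError("invalid: book has no chapter")
    return doc

def run_pipeline(steps, doc):
    """Feed the output of each step into the input of the next."""
    for step in steps:
        doc = step(doc)
    return doc

# XInclude first, then validate: the chapter is present by validation time.
expanded = run_pipeline([xinclude_step, validate_step], ET.fromstring(BOOK))
```

Running the same two steps in the opposite order raises ValueError, because validation sees only the unexpanded xi:include element. A pipeline language exists to pin down exactly that kind of ordering.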

There are about half a dozen implementations including one written in XQuery! Overall, this is quite interesting and potentially useful. It's the first practical and essentially new thing I've heard about at this conference so far. If time permits, maybe I should see if I can write a developerWorks article about this.

In the last afternoon session Intel's Stewart Taylor talks about XML and XPath in the Wild. They scraped files from the Web, which I suspect gives them a very non-representative sample. (Much, probably most XML, isn't on the public Web.) Types included XHTML, RSS, VoiceXML, SVG, SAML, and SMIL. They also scraped XPath expressions from open source projects. They did various statistics on this including principal component analysis.

XPath Results: 50/50 split between child and descendant axes. A third of the expressions were very simple. 47% had two or more steps but relatively few had three or four. Only 18% of expressions used predicates. About half of these were attribute value tests. Numeric comparisons were non-existent, so don't worry about type conversions. Half used functions, mostly string(), count(), text(), sum(), and boolean(). DOM usage is more common than XPath. They were looking in Java and .NET source code, not XSLT stylesheets and XQuery databases (which I expect would have had much more complex expressions on average).
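For readers who don't think in axes, here is roughly what the common shapes from that survey look like: a child-axis path, a descendant-axis path, and a predicate that tests an attribute value. The sketch uses the limited XPath subset in Python's xml.etree.ElementTree, and the RSS-like sample document is invented for illustration, not taken from the talk's data.

```python
import xml.etree.ElementTree as ET

FEED = """<rss><channel>
  <item id="a"><title>First</title></item>
  <item id="b"><title>Second</title></item>
</channel></rss>"""

root = ET.fromstring(FEED)

# Child axis only: every step names a direct child of the previous one.
child_titles = [t.text for t in root.findall("channel/item/title")]

# Descendant axis: .// searches at any depth below the context node.
descendant_titles = [t.text for t in root.findall(".//title")]

# Predicate with an attribute value test, the most common predicate kind.
second = root.find(".//item[@id='b']/title").text
```

All three are string matches on names and attribute values. None involves a numeric comparison.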
