Cafe con Leche News Wednesday, December 6, 2006

There are two kinds of conference keynotes. The first kind is given by techie celebrities talking about interesting things. The second kind is given by some CEO or marketing flack you've never heard of, talking about their company's "vision". The crucial difference between the two is that the conference usually pays a large amount of money for the first and is paid a large amount of money for the second. The difference is whether the conference is selling the speakers to the attendees or the attendees to the speakers. Most conferences try to do a little of both to greater or lesser degrees.

This one may actually be the first type. The speaker is Darrin McBeath from Reed Elsevier on Unleashing the Power of XML. Elsevier is not a conference sponsor, and is not selling us a product. They're an XML user, not a vendor. They use DTDs for documents and W3C schemas for some services. They've played with web services, but without a lot of success. They can only make this work with people they've been working with before, and with whom they have contracts; not with end users. (He seems a little surprised by this, but it's pretty much exactly what I'd expect from a SOAP-based system.) He claims web services haven't had a large impact on publishing. That doesn't surprise me either.

Namespaces are a major weakness of XML, mostly due to complexity. Schemas are too complex. In general, complexity of any kind and from any source seems like a big problem for him. Only the techies understand XML.

There were 3 papers on XQuery last year at XML 2005, and 9 this year. I'm not sure whether that reflects anything real or just the preferences of the referees. Most publishers are not yet using either hybrid (XQuery+SQL) or native (XQuery only) XML databases. If anything they're using Saxon or DataDirect XQuery. XQuery plus an XML database does speed up the development of publishing applications, and the execution speed of these applications. Unlike other search engines, XQuery has no predefined granularity.

First conference session of the morning is Yahoo's Douglas Crockford on "JSON, the Fat Free Alternative to XML". Of course, a fat free diet can kill you. I intend to listen very closely to this talk for advocacy of any nutritional deficiencies, fad diets, or eating disorders. I plan to explain why this approach is broken in my first panel session this afternoon. I expect I'll later write that up in a Cafes article.

"XML on the Web has effectively died" -- Simon St. Laurent.

Douglas Crockford's question is "How should the data be delivered?" More accurately, in what format? He has programmer-colored glasses. He wants all his data to look like a programming language's data structures, specifically JavaScript's. I will explain exactly what is wrong with this goal this afternoon.

JSON is language independent, text-based (Unicode; autodetected encoding), lightweight, easy-to-parse.
Only for data, not documents. (a false dichotomy, IMNSHO)
Based on quasi-literal notation.
Types include integers, reals, strings, booleans, null.
Objects are collections of name-value pairs. (Really it's just an associative array.)
Names are strings that need not be unique (though it is recommended).
Arrays are ordered lists.
application/json MIME type. "It appears that compliance with formal standards causes things to break."
No version.
YAML is a superset of JSON.
"JSON has become the X in AJAX"
eval is a security risk; use parseJSON instead
There are other security issues I don't fully understand
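
To make the notes above concrete, here is a minimal sketch in Python (not the JavaScript used in the talk) of what those types look like once parsed, what happens with non-unique names, and why a dedicated parser is safer than eval. The payload and its field names are made up for illustration; json.loads and object_pairs_hook are from Python's standard json module.

```python
import json

# Hypothetical payload touching each JSON type mentioned in the notes.
text = (
    '{"title": "XML 2006", "papers": 9, "ratio": 0.5, '
    '"open": true, "sponsor": null, "tracks": ["JSON", "XQuery"]}'
)

# A JSON parser treats the text purely as data; it never executes it.
doc = json.loads(text)

# Names need not be unique. Python's parser silently keeps the last value.
dup = json.loads('{"a": 1, "a": 2}')

# object_pairs_hook receives every name-value pair in document order,
# so the duplicate is still visible if the application cares.
pairs = json.loads('{"a": 1, "a": 2}', object_pairs_hook=list)

# Text that is code rather than JSON is rejected instead of being run,
# which is the point of preferring a parser over eval.
try:
    json.loads('__import__("os").getcwd()')
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(doc["tracks"], dup, pairs, rejected)
```

Other parsers may handle duplicate names differently (first value wins, or an error), which is one reason unique names are recommended.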

"Because the consumer of your data and the generator of your data both tend to be written in programming languages, I contend it's not a problem."

He explains objections to JSON:

Namespaces
No Validators
Not extensible
Not XML

I can't type fast enough to explain his arguments or rebut them, but he's missing a lot. He has a very narrow view of the world. A little more on that this afternoon.

There's a JSON-XML mapping, but someone screwed up. It's not fully round-trippable. I don't see any fundamental reason it couldn't be. It just isn't. (XML->JSON is much trickier.)
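
I don't have the details of the mapping shown in the talk, so the following is only a sketch of the kind of information loss that makes XML->JSON tricky. The naive_to_json function is a deliberately simplistic, hypothetical mapping, not the one Crockford presented.

```python
import xml.etree.ElementTree as ET


def naive_to_json(elem):
    """Hypothetical naive mapping: attributes and child elements both become keys."""
    if len(elem) == 0 and not elem.attrib:
        return elem.text  # leaf element becomes a plain string
    obj = dict(elem.attrib)
    for child in elem:
        obj[child.tag] = naive_to_json(child)
    return obj


# Two different XML documents: id as an attribute vs. id as a child element.
a = ET.fromstring('<book id="42"><title>XOM</title></book>')
b = ET.fromstring('<book><id>42</id><title>XOM</title></book>')

# Both collapse to the same object, so the original cannot be recovered.
same = naive_to_json(a) == naive_to_json(b)

# Mixed content fares worse: the text around <b> is dropped entirely.
mixed = naive_to_json(ET.fromstring('<p>Hello <b>world</b>!</p>'))

print(same, mixed)
```

A mapping that recorded node kinds and text order explicitly could round-trip, which fits the observation that nothing fundamental prevents it.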

JSON used to have comments, but people started putting metadata into comments, so they took them out. (Maybe people do need metadata?)

Next in this track, Jason Hunter discusses Web Publishing 2.0 and XQuery. Content sizes are increasing. People's expectations are growing for immediate, relevant, searchable access. XML doesn't fit well into a relational model. XML is a triangle (tree) and can't fit into rectangular tables (SQL). Excellent visualization. I'll have to remember that one.
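
The triangle-versus-rectangle point can be sketched in a few lines. This is my own illustration, not an example from the talk: shred and children are hypothetical helpers, and the list of tuples stands in for a relational table with columns (id, parent_id, tag, text).

```python
import xml.etree.ElementTree as ET


def shred(root):
    """Flatten a tree into rows of (id, parent_id, tag, text)."""
    rows = []

    def visit(elem, parent_id):
        node_id = len(rows)
        rows.append((node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return rows


def children(rows, parent_id, tag):
    """One 'self-join' step: rows with the given parent and tag."""
    return [r for r in rows if r[1] == parent_id and r[2] == tag]


doc = ET.fromstring(
    "<article><title>Trees</title><sec><p>One</p><p>Two</p></sec></article>"
)
rows = shred(doc)

# On the tree, the question is a single path expression.
from_tree = [p.text for p in doc.findall("./sec/p")]

# On the rows, each level of the path costs another join on parent_id.
from_rows = [p[3] for s in children(rows, 0, "sec") for p in children(rows, s[0], "p")]

print(rows)
print(from_tree, from_rows)
```

Both approaches return the same paragraphs here, but the row-based version needs one more join for every extra level of nesting, while the path expression just gets one step longer.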

He misses one of the crucial lessons of Google, though. Fielded search is powerful for professional librarians, but a failure for end users such as doctors (or really, anyone).

XML is the raw negative. It gives the publisher more information than Google (which only sees the HTML) so they can do better search. (True, but not important outside a few very special areas like Lexis-Nexis). Most searchers want broader search rather than more specialized.

He sees a lot of Web 2.0 startups using MySQL that he thinks should be using a native XML database. Well, maybe; but until there's a decent open source native XML database that's not going to happen. Mark Logic is way too expensive for a startup. Last time I checked, eXist was too unreliable. I do expect to see a solid open source native XML database, but probably not before 2008 at the earliest.

There are some interesting use cases here such as Safari U, but it's all Web 1.0. So far I don't see any Ajax or interactivity with the client. It's all done on the back end with Mark Logic's XML database. He's proposing some personalization of RSS and so forth, but these examples are hypothetical.

XQuery has much less impedance mismatch for web apps than relational databases and Java. Hibernate == translate Java to relational. JSF == translate Java to HTML. XQuery has less translation to do.

The final session was a panel on next generation XML APIs led by Norman Walsh. I talked about XOM. Eric van der Vlist talked about TreeBind and Philippe Poulard talked about Active Tags. That session didn't go so well since none of us had enough time to really explain our approach, and everyone was rushed. Plus none of the panelists could see or comment on the others' slides.

Most of the panels at this show, including these two, were made by combining proposals the referees didn't think were worth a full 45 minute slot; and that rarely works. In the future I think papers should be accepted or rejected without offering panel positions as consolation prizes. Panels are fun, but they need to be developed as a panel, not as an unrelated collection of 15 minute sessions.
