Cafe con Leche News Monday, August 21, 2006

Jez Higgins has posted a new version of Arabica, an open source C++ XML parser toolkit that supports SAX2 and DOM2 by wrapping an underlying parser such as expat, Xerces, libxml, or the Microsoft XML parser COM component. It supports various string types. According to Higgins, "The August 2006 release extends the XPath engine to support arbitrary strings types. A new dual DOM/Streaming parser has been added. By registering a callback function, partially built DOM trees can be processed, modified, manipulated or even discarded, before proceeding to build more of the tree. The release also includes assorted minor bug fixes." Arabica is published under a BSD-style license.

Pavel Sher has posted Juxy 0.7, "a simple unit testing library for XSLT written in Java. Juxy allows to call or apply individual XSLT templates from Java and does not use any specific features of XSLT processor for that purposes. It relies entirely on TRaX API and should work with any TRaX compliant XSLT processor." Juxy is published under the Apache 2.0 license. Java 1.4 or later is required.
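For readers who have not used TRaX directly, here is a minimal sketch of what "relies entirely on TRaX API" means in practice. It uses only the standard javax.xml.transform classes, not Juxy's own API, which is not shown in the announcement. The class name TraxSketch, the inline stylesheet, and the sample input are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxSketch {
    // Hypothetical stylesheet used only for this example.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/greeting'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies a stylesheet to an input document through TRaX and returns
    // the serialized result. Whichever TRaX-compliant processor is on the
    // classpath is picked up by TransformerFactory.newInstance().
    static String transform(String stylesheet, String input)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(input)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // A test-style check: transform a tiny document and inspect the output.
        String result = transform(STYLESHEET, "<greeting>hello</greeting>");
        System.out.println(result);
    }
}
```

A unit test built this way can only run a whole stylesheet against a whole document. Calling or applying individual templates, as Juxy advertises, needs additional machinery on top of this.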

Jacek Radajewski has posted an alpha of UTF-X, a JUnit extension for testing XSLT stylesheets. According to Radajewski, "We've developed it at USQ for unit testing our stylesheets about three years ago, and two years ago released it under GPL. Although still in Alpha versions UTF-X works well and has been used in reasonable size projects (over 1500 templates/tests). UTF-X tests or Test Definition Files (TDFs) are XML documents which can be validated and rendered. Being able to render your tests works well for the test-first-design approach as you can write all your tests, validate them against your and/or xhtml DTD and render them for visual inspection. If everything is OK you can write your templates untill all tests pass." UTF-X requires Java 5 and is published under the GPL.

Oleg Paraschenko has released XSieve, "an XML transformation language based on combination of XSLT and Scheme (a Lisp dialect). XSieve make XSLT to be a general-purpose language." I'm not quite sure why you'd want that when we already have excellent general-purpose languages like Python and Java, but there it is. XSieve allows XSLT extension functions to be written in Scheme. Since XSLT and Scheme are both functional languages, that may be a better match than extension functions written in imperative languages like Java and C.

XRules.org has released XmlVoyager, a spreadsheet-like XML browser for Windows. According to developer Waleed Abdullav, "XmlVoyager is especially useful when working with publicly standardized XML formats such as those created by vertical standards groups like UBL and OAGIS. Columns can be added to identify which nodes are populated by the XML creator and which ones are required or optional by the XML consumer (see example in the download). And, additional columns can be used to provide comments or describe business rules that govern the use of each node. XmlVoyager makes it easy to customize the view to show only relevant data as needed."

Orbeon has submitted the XML Pipeline Language (XPL) Version 1.0 (Draft) to the W3C. "An XPL program defines orchestrated sequences of operations on XML Information Sets (Infosets). Individual operations are encapsulated within components called XML processors. Operations include production, consumption, and transformation of XML Infosets. An XPL program supports unconditional operations, and may support as well conditions, loops, and change of control following runtime errors." This is an important idea, and a big hole in the existing XML family of specs. Whether this is the right implementation of this idea, I don't yet know.

Alex Milowski is working on smallx:

library and set of tools that is being developed to process XML infosets. It has two distinct features in that the infoset implementation allows streaming of documents and that processing of infosets can be accomplished using a concept called pipelines. The library contains a full compliment of technologies--include XPath and XSLT.

Pipelines provide the ability to chain together different components that perform different tasks to process a XML document. Some of these tasks might be decision points in the processing while other might transform the input (e.g. XSLT). All components in the pipeline have the ability to stream the infoset it they so choose.

The key difference of this code over others is that it allows streaming of infosets to be mixed in with non-streamed document-based processing. This allows large data sets to be processed in a minimal amount of memory while allowing traditional technologies like XSLT to still be used.

For example, in the following pipeline:
<p:pipe xmlns:p="http://www.smallx.com/Vocabulary/Pipeline/2005/1/0" name="scoped-xslt">
  <p:subtree select="/doc/part/subsection">
    <p:xslt src="translate.xsl"/>
  </p:subtree>
</p:pipe>
the XSLT transform "translate.xsl" is limited to only the elements that match the "/doc/part/subsection" XPath. Every other part of the document "flows" around the XSLT in a streaming fashion so that only the subsection subtree needs to be built and passed to XSLT. In the end, the pipeline puts all the pieces back together in the right order.

Again, whether this is the right implementation of this idea, I don't yet know.

IBM's alphaWorks has released XJ, a derivative of the Java programming language 1.4 that builds in native support for XML (like Cω does for C#). "In XJ, one can import XML schemas just as one does Java classes. All the element declarations in the XML schema are then available to programmers as if they were Java classes. Programmers can write inline XPath expressions on these classes, and the compiler checks them for correctness with respect to the XML schema. In addition, it performs optimizations to improve the evaluation of XPath expressions. A programmer may construct new XML documents by writing XML directly inline. Again, the compiler ensures correctness with respect to the appropriate schema." It sounds interesting, but the tight coupling to schemas is a serious mistake. I want my XPaths and XML literals to work regardless of what the schema says. Indeed, the schema agnosticity of both XML and XPath is one of their strengths. It's disturbing how many people keep trying to force the schemaless genie back into the bottle.
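To make the schema-agnostic point concrete, here is a small sketch in plain Java (not XJ) using the standard javax.xml.xpath API. No schema is loaded or consulted anywhere. The class name SchemalessXPath and the sample order document are assumptions invented for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class SchemalessXPath {
    // Evaluates an XPath expression against a document and returns the
    // string value of the result. The document only has to be well-formed.
    static String evaluate(String expression, String xml)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression,
            new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Hypothetical document; no schema for it exists.
        String doc =
            "<order><item sku='A1'>tea</item><note>rush</note></order>";

        // Both paths are evaluated purely against the instance document.
        System.out.println(evaluate("/order/item/@sku", doc));
        System.out.println(evaluate("/order/note", doc));

        // A path that matches nothing is not an error; it yields "".
        System.out.println(evaluate("/order/discount", doc).isEmpty());
    }
}
```

The last expression is the kind a schema-checking compiler would presumably reject if the schema declared no discount element. Here it simply selects nothing.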
