XML News from Thursday, October 30, 2008

XimpleWare has released VTD-XML 2.4, a free-as-in-speech (GPL) non-extractive Java/C/C# library for processing XML that supports XPath. This appears to be an example of what Sam Wilmot calls "in situ parsing". In other words, rather than creating objects that represent the content of an XML document, VTD-XML just passes around pointers into the original XML text. (These are the abstract pointers of your data structures textbook, not C-style addresses in memory. In this case the pointers are int indexes into the file.) You don't even need to hold the document in memory; it can remain on disk. This should reduce memory usage and improve speed, but I haven't verified that, and I don't trust their own benchmarks. Version 2.4 supports memory-mapped files and files up to 256 gigabytes. However, it's still not a minimally conformant XML parser, and it doesn't seem likely to become one. In particular, it only supports the five predefined entities, not others that may be declared in the internal DTD subset.
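
To make the "pointers into the actual XML" idea concrete, here is a minimal Java sketch of non-extractive tokenizing. It is only an illustration of the general technique: the class and method names (InSituSketch, Token, elementNames, text) are hypothetical, it does not use VTD-XML's API or reproduce its record layout, and the scanner is deliberately naive (it ignores comments, CDATA sections, and entities).

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of in situ (non-extractive) tokenizing.
// Hypothetical names; this is NOT VTD-XML's API or record format.
final class InSituSketch {

    /** A token is just two ints that point back into the original text. */
    static final class Token {
        final int offset;
        final int length;

        Token(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Records where each start-tag name sits; copies no characters. */
    static List<Token> elementNames(String xml) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while ((i = xml.indexOf('<', i)) != -1) {
            int start = i + 1;
            if (start < xml.length() && isNameStart(xml.charAt(start))) {
                int end = start;
                while (end < xml.length() && isNameChar(xml.charAt(end))) {
                    end++;
                }
                tokens.add(new Token(start, end - start));
                i = end;
            } else {
                // End tag, comment, PI, or declaration: skip it (naively).
                i = start;
            }
        }
        return tokens;
    }

    /** Materializes a string only when the caller actually asks for one. */
    static String text(String xml, Token t) {
        return xml.substring(t.offset, t.offset + t.length);
    }

    private static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == ':' || c == '-' || c == '.';
    }

    public static void main(String[] args) {
        String xml = "<doc><title>VTD</title><p id=\"a\">text</p></doc>";
        for (Token t : elementNames(xml)) {
            System.out.println(t.offset + "+" + t.length + " -> " + text(xml, t));
        }
    }
}
```

The point of the design is that the token list holds nothing but integers. No string is allocated for a name until something calls text, which is where the expected savings in memory and object creation would come from.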